Algorithms Interview Questions

Master your next Algorithms interview with our comprehensive collection of questions and expert-crafted answers. Prepare with real questions that top companies ask.

1. Can you describe the merge sort algorithm? What is its time complexity?

Merge sort is a divide and conquer algorithm that splits the array into halves until each piece is just one element. Then, it merges those pieces back together in a sorted order. First, the array is divided recursively into two halves until the base case of a single element is reached. Then, these halves are merged together using a 'merge' function that combines two sorted arrays into one larger sorted array.

Its consistency in breaking the array down and merging the pieces back in sorted order makes it highly efficient in most cases. The time complexity of merge sort in both the average and worst-case scenarios is O(n log n), where 'n' is the number of elements in the array. This efficiency comes from the logarithmic number of levels needed to break down the array and the linear time required to merge each level.
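As a sketch, the recursive split-and-merge described above could look like this in Python:

```python
def merge_sort(arr):
    """Sort a list with merge sort; O(n log n) time."""
    if len(arr) <= 1:
        return arr  # base case: a single element is already sorted
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    return merge(left, right)

def merge(left, right):
    """Combine two sorted lists into one sorted list in linear time."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])   # append whatever remains of either half
    result.extend(right[j:])
    return result
```

This version returns a new list rather than sorting in place, which keeps the sketch short at the cost of extra memory.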

2. How do you detect a cycle in a linked list?

To detect a cycle in a linked list, one common approach is to use Floyd’s Cycle-Finding Algorithm, also known as the Tortoise and Hare algorithm. You use two pointers: a slow pointer that moves one step at a time and a fast pointer that moves two steps at a time. If there's no cycle, the fast pointer will eventually reach the end of the list. However, if there is a cycle, the fast pointer will eventually meet the slow pointer within the cycle, which confirms the presence of a cycle. This method is efficient with a time complexity of O(n) and doesn't require extra space, making it a great solution for this problem.
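A minimal Python sketch of the tortoise-and-hare approach, assuming a simple `Node` class with a `next` pointer:

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

def has_cycle(head):
    """Floyd's cycle detection: O(n) time, O(1) extra space."""
    slow = fast = head
    while fast and fast.next:
        slow = slow.next         # tortoise moves one step
        fast = fast.next.next    # hare moves two steps
        if slow is fast:
            return True          # the pointers met inside a cycle
    return False                 # the hare reached the end: no cycle
```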

3. Explain the Travelling Salesman Problem and its significance in algorithm theory.

The Travelling Salesman Problem (TSP) involves finding the shortest possible route that allows a salesman to visit each city once and return to the original city. It's a classic optimization problem in algorithm theory. The difficulty stems from the fact that it’s an NP-hard problem, meaning that as the number of cities increases, the number of possible routes grows factorially, making it computationally infeasible to solve using brute-force for large datasets.

TSP's significance lies not just in theoretical aspects, but also in practical applications like route planning, manufacturing, and DNA sequencing. Solving or approximating TSP efficiently can lead to significant cost and time savings in various industries. Additionally, studying TSP has led to the development of various heuristic and approximation algorithms, which are useful for tackling other complex optimization problems.

4. What is a graph and how is it different from a tree?

A graph is a data structure consisting of a set of nodes (or vertices) and a set of edges that connect pairs of nodes. Graphs can be either directed, where edges have a direction, or undirected, where edges don't have a direction. Graphs can represent a wide range of real-world systems like social networks, web structures, and transport systems.

A tree is actually a specific type of graph—it's an acyclic, connected graph. This means a tree has no cycles (you can't start at a node, traverse some edges, and come back to the same node without retracing steps) and all its nodes are connected. Additionally, trees have a hierarchical structure with a single root node and any two nodes are connected by exactly one path. This makes trees a bit more restrictive and structured compared to general graphs.

5. Describe Kruskal's algorithm for finding the minimum spanning tree.

Kruskal's algorithm is used to find the minimum spanning tree (MST) in a graph. It works by sorting all the edges of the graph in ascending order based on their weights. You start by creating a forest, which is a set of trees where each vertex in the graph is a separate tree. Then, you iterate over the sorted edges and add each edge to the forest, but only if it doesn't form a cycle with the existing edges in the forest. You continue this process until there are exactly V - 1 edges in the forest, where V is the number of vertices in the graph. This gives you the MST.

To efficiently check for cycles when adding edges, Kruskal's algorithm often uses a Union-Find data structure, which supports the quick union and find operations necessary for keeping track of which vertices belong to which components (or trees). This allows the algorithm to run in O(E log E) time, where E is the number of edges in the graph, primarily due to the time complexity of sorting the edges.
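A compact Python sketch of the process, using a simple Union-Find with path compression (the `(weight, u, v)` edge-tuple format is an assumption for illustration):

```python
def find(parent, x):
    """Find the root of x's component, compressing the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def kruskal(num_vertices, edges):
    """edges: list of (weight, u, v). Returns MST edges; O(E log E)."""
    parent = list(range(num_vertices))   # each vertex starts as its own tree
    mst = []
    for weight, u, v in sorted(edges):   # ascending by weight
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:                     # adding this edge creates no cycle
            parent[ru] = rv              # union the two components
            mst.append((u, v, weight))
        if len(mst) == num_vertices - 1:
            break                        # V - 1 edges: the MST is complete
    return mst
```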

6. What is the difference between Prim's and Kruskal's algorithm?

Prim's algorithm and Kruskal's algorithm are both used to find the Minimum Spanning Tree (MST) in a connected, weighted graph, but they approach the problem differently. Prim's algorithm grows the MST from a starting vertex by adding the lowest-weight edge that connects a vertex in the MST to a vertex outside of it. It works well with dense graphs because it can take advantage of a priority queue to efficiently select the minimum edge.

In contrast, Kruskal's algorithm builds the MST by sorting all the edges in the graph by weight and then adding them one by one to the MST, as long as they don't form a cycle. It uses a union-find (disjoint-set) data structure to efficiently keep track of which vertices are in the same connected component. Kruskal's is particularly effective for sparse graphs since it focuses on edges rather than vertices.

To sum up, Prim’s algorithm is vertex-oriented and performs better in dense graphs with many edges, while Kruskal's algorithm is edge-oriented and excels in sparse graphs with fewer edges.

7. What are AVL trees and why are they used?

AVL trees are a type of self-balancing binary search tree. Named after their inventors Adelson-Velsky and Landis, they maintain a balance factor for each node, which is the difference between the heights of the left and right subtrees. The tree ensures that the balance factor is always -1, 0, or 1, which keeps the tree height logarithmic relative to the number of nodes.

They are used to ensure that operations like insertion, deletion, and lookup remain efficient, typically O(log n) in time complexity. This is particularly useful in applications where the input size can grow large, and maintaining a balanced tree is crucial for keeping performance optimal.

8. How does the binary search algorithm work?

Binary search is an efficient algorithm for finding an item from a sorted list of items. It works by repeatedly dividing in half the portion of the list that could contain the item, until you've narrowed down the possible locations to just one. You start by comparing the target value to the middle element of the list. If the target value is equal to the middle element, you've found the item. If the target value is less than the middle element, you repeat the search in the lower half, and if it's greater, you repeat the search in the upper half. This process continues until the target value is found or the search space is reduced to zero.
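The halving process above can be sketched as a short iterative function:

```python
def binary_search(items, target):
    """Return the index of target in sorted items, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        elif items[mid] < target:
            lo = mid + 1   # target can only be in the upper half
        else:
            hi = mid - 1   # target can only be in the lower half
    return -1              # search space reduced to zero
```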

9. What is the time complexity of quicksort and why?

The average-case time complexity of quicksort is O(n log n). This occurs because the algorithm typically partitions the array in a balanced way, effectively halving the problem size with each recursive call. In the best case, you consistently divide the array into two nearly equal parts, which allows the divide-and-conquer strategy to work efficiently.

However, in the worst case, quicksort's time complexity is O(n^2). This happens when the pivot always ends up being the smallest or largest element in the array, leading to unbalanced partitions and thus, a deep recursion stack. This unbalanced partitioning occurs, for example, when the input array is already sorted in ascending or descending order and the pivot selection doesn't help in balancing. Nevertheless, with good pivot strategies, like choosing a random pivot or using the median-of-three rule, you can generally avoid the worst-case scenario.
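With the random-pivot strategy just mentioned, a short (not in-place) Python sketch might look like:

```python
import random

def quicksort(arr):
    """Average O(n log n); a random pivot makes the sorted-input
    worst case very unlikely."""
    if len(arr) <= 1:
        return arr
    pivot = random.choice(arr)
    less = [x for x in arr if x < pivot]
    equal = [x for x in arr if x == pivot]
    greater = [x for x in arr if x > pivot]
    return quicksort(less) + equal + quicksort(greater)
```

Production quicksorts partition in place to avoid the extra allocations; this list-based version just keeps the idea visible.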

10. Describe the Floyd-Warshall algorithm.

The Floyd-Warshall algorithm is an all-pairs shortest path algorithm that works for both directed and undirected weighted graphs. It systematically examines all paths between all pairs of vertices and updates the shortest known path. The algorithm uses a dynamic programming approach, with a matrix to store the shortest path distances between each pair of nodes.

It initializes the distances based on the direct edges between nodes and iteratively updates the matrix by considering each node as an intermediate point. Specifically, if the distance from node A to node B through an intermediate node C is shorter than the previously known distance from A to B, the algorithm updates the distance. This process repeats for all nodes as intermediates, ultimately providing the shortest paths between every pair of nodes.
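The triple loop described above is short enough to show in full (the matrix-input format is an assumption; `INF` marks the absence of a direct edge):

```python
INF = float('inf')

def floyd_warshall(dist):
    """dist: n x n matrix where dist[i][j] is the direct edge weight,
    INF if no edge, and 0 on the diagonal. Updates and returns the
    all-pairs shortest distances in O(V^3)."""
    n = len(dist)
    for k in range(n):              # try each node as an intermediate
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist
```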

11. How would you balance a binary search tree?

I’d answer this in two parts:

  1. Clarify whether we’re talking about keeping the tree balanced over time, or fixing one that’s already unbalanced.
  2. Then explain the tradeoff between the common approaches.

For keeping it balanced as you update it:

  • I’d use a self-balancing BST.
  • The two standard choices are AVL and Red-Black trees.

How I think about them:

  • AVL trees are stricter.
  • They keep the height difference between left and right subtrees very small.
  • That gives very fast lookups.
  • The downside is more rotations during inserts and deletes.

  • Red-Black trees are a bit more relaxed.

  • They enforce a set of coloring rules instead of tight height balance.
  • In practice, they still keep operations at O(log n).
  • They usually do fewer rotations, so they’re often a better general-purpose choice.

If the tree is already unbalanced and I need to rebuild it:

  • Do an inorder traversal to get the nodes in sorted order.
  • Then rebuild the tree by recursively picking the middle element as the root.
  • Repeat that for the left and right halves.

That gives you a balanced BST in O(n) time if you store the inorder result in an array first.

So my short version would be:

  • For dynamic updates, use AVL or Red-Black.
  • For a one-time rebalance, inorder traversal plus rebuild from the middle out is the cleanest approach.

12. How does a trie (prefix tree) work?

A trie is a tree-like data structure that stores a dynamic set of strings. Essentially, each node of the trie represents a single character of the string. Strings are inserted into the trie character by character, with each character serving as a path from one node to the next. Starting from the root, each edge represents a character in the string, and traversing down from the root to a leaf node (or to a node that marks the end of a string) spells out the stored string.

To check if a string is in the trie, you'd start at the root and traverse the nodes following the characters of the string. If you can follow the characters all the way to a node that marks the end of a string, the string exists in the trie. This method makes searching for strings very efficient, O(l) where l is the length of the string. They are particularly useful for prefix-based searches, as you can retrieve all words with a given prefix by reaching the last node of the prefix and exploring all its descendant nodes.
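A minimal Python sketch of the structure, with dictionary children and an end-of-word flag:

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # char -> TrieNode
        self.is_end = False   # marks that a stored word ends here

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def search(self, word):
        """True if word was inserted; O(l) for a word of length l."""
        node = self._walk(word)
        return node is not None and node.is_end

    def starts_with(self, prefix):
        """True if any stored word begins with prefix."""
        return self._walk(prefix) is not None

    def _walk(self, s):
        node = self.root
        for ch in s:
            if ch not in node.children:
                return None
            node = node.children[ch]
        return node
```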

13. What is the purpose of the Bellman-Ford algorithm?

The Bellman-Ford algorithm is used to find the shortest paths from a single source vertex to all other vertices in a weighted graph. It's particularly useful for graphs with negative weight edges because it can detect negative weight cycles, something Dijkstra's algorithm can't do. Although it's less efficient than Dijkstra’s algorithm in graphs with only positive weights, its ability to handle negative weights makes it a valuable tool in certain scenarios.
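As a sketch, the relax-all-edges loop could look like this in Python (the `(u, v, weight)` edge-list format is an assumption for illustration):

```python
def bellman_ford(num_vertices, edges, source):
    """edges: list of (u, v, weight). Returns shortest distances from
    source, or None if a negative-weight cycle is reachable. O(VE)."""
    dist = [float('inf')] * num_vertices
    dist[source] = 0
    for _ in range(num_vertices - 1):   # relax every edge V - 1 times
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    for u, v, w in edges:               # one extra pass detects cycles
        if dist[u] + w < dist[v]:
            return None                 # still improving: negative cycle
    return dist
```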

14. Describe the differences between depth-first search (DFS) and breadth-first search (BFS).

Depth-first search (DFS) and breadth-first search (BFS) are two fundamental algorithms for traversing or searching tree or graph data structures. The main difference between them lies in their approach to exploring nodes.

DFS dives as deep as possible along each branch before backtracking. It uses a stack, either explicitly with a data structure or implicitly via recursion, to keep track of the next node to visit. This approach is useful for scenarios where you need to explore all possible paths, like puzzle solving or finding connected components in a graph.

BFS, on the other hand, explores all neighbors of a node before moving on to the neighbors' neighbors. It uses a queue to keep track of nodes to explore. This method is ideal for finding the shortest path in unweighted graphs, like in shortest route problems or web crawlers.
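Side by side, the two traversals differ only in the data structure that holds the frontier (an adjacency-list dict is assumed here):

```python
from collections import deque

def dfs(graph, start):
    """Iterative depth-first traversal using an explicit stack."""
    visited, stack, order = set(), [start], []
    while stack:
        node = stack.pop()
        if node not in visited:
            visited.add(node)
            order.append(node)
            stack.extend(reversed(graph[node]))  # preserve left-to-right order
    return order

def bfs(graph, start):
    """Breadth-first traversal using a queue."""
    visited, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order
```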

15. What is a heap data structure?

A heap is a specialized tree-based data structure that satisfies the heap property. In a max-heap, for example, every parent node has a value greater than or equal to the values of its children, making the largest value always at the root. Conversely, a min-heap has every parent node with a value less than or equal to its children, with the smallest value at the root. Heaps are typically implemented with arrays and are useful for algorithms like heapsort and for implementing priority queues.

16. What are hash tables and how are they implemented?

A clean way to answer this is:

  1. Start with what a hash table is.
  2. Explain the core pieces, array plus hash function.
  3. Mention collisions and the common ways to handle them.
  4. Close with performance and tradeoffs.

Here’s how I’d say it:

A hash table is a data structure for storing key -> value pairs, optimized for fast lookup.

The basic idea is simple:

  • You have an underlying array
  • A hash function takes a key, like a string or integer
  • It converts that key into an array index
  • You store the value at that location

So instead of scanning through data, you jump straight to where the item should be.

Core operations are usually:

  • Insert
  • Lookup
  • Delete

Under normal conditions, those are O(1) on average.

Implementation-wise, there are two main parts:

  • The bucket array
  • The hash function

A good hash function should:

  • Be fast to compute
  • Spread keys evenly across the array
  • Minimize collisions

A collision happens when two different keys map to the same index. Since that is unavoidable in practice, the table needs a collision strategy.

The two most common approaches are:

  • Chaining
  • Each array slot stores a small collection, often a linked list or dynamic list
  • If multiple keys hash to the same bucket, they go into that bucket’s chain

  • Open addressing

  • All entries live directly in the array
  • If a slot is taken, you probe for another one
  • Common probing strategies are linear probing, quadratic probing, and double hashing

You also have to manage the load factor, which is:

  • number of stored entries / number of buckets

If the load factor gets too high, performance starts to degrade because collisions become more frequent. So most implementations resize the array when it crosses a threshold, then rehash all existing entries into the new table.

Complexity-wise:

  • Average case for insert, lookup, delete: O(1)
  • Worst case: O(n), if lots of collisions happen

In practice, hash tables are great when you need very fast membership checks, lookups, counting, caching, or deduplication. The main tradeoff is that they give up ordering, and their performance depends heavily on good hashing and sensible resizing.
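To make the chaining approach concrete, here is a toy Python sketch (illustrative only, not production-quality; the 0.75 load-factor threshold is a common but arbitrary choice):

```python
class HashTable:
    """Minimal hash table: separate chaining plus load-factor resizing."""

    def __init__(self, capacity=8):
        self.buckets = [[] for _ in range(capacity)]
        self.size = 0

    def _index(self, key):
        return hash(key) % len(self.buckets)   # key -> bucket index

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)       # overwrite existing key
                return
        bucket.append((key, value))
        self.size += 1
        if self.size / len(self.buckets) > 0.75:
            self._resize()                     # too full: grow and rehash

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

    def _resize(self):
        old = self.buckets
        self.buckets = [[] for _ in range(2 * len(old))]
        self.size = 0
        for bucket in old:                     # rehash every entry
            for k, v in bucket:
                self.put(k, v)
```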

17. How do you find the kth largest element in an unsorted array?

One efficient way to find the kth largest element in an unsorted array is by using the Quickselect algorithm. Quickselect is a selection algorithm partially based on the QuickSort sorting algorithm. It works by first choosing a pivot element and partitioning the array into two sub-arrays: elements greater than the pivot and elements less than the pivot. Depending on the size of the sub-arrays relative to k, you can determine which sub-array contains the kth largest element and recursively apply Quickselect to that sub-array.

Alternatively, for a simpler approach especially when k is small and performance isn't a critical concern, you could use a min-heap. Insert the first k elements of the array into the heap. Then, for each remaining element, check if it's larger than the smallest element in the heap. If it is, remove the smallest element and insert the new element. By the end of the iteration, the root of the heap will be the kth largest element.
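The min-heap variant is only a few lines with Python's `heapq`:

```python
import heapq

def kth_largest(nums, k):
    """Keep a min-heap of the k largest values seen so far; O(n log k)."""
    heap = nums[:k]
    heapq.heapify(heap)
    for x in nums[k:]:
        if x > heap[0]:                  # bigger than the smallest of the top k
            heapq.heapreplace(heap, x)   # pop the smallest, push x
    return heap[0]                       # the root is the kth largest
```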

18. Explain the concept of memoization.

Memoization is a technique used to optimize programs by storing the results of expensive function calls and reusing them when the same inputs occur again. The main idea is to save time by avoiding the repeated computation of the same result. It's particularly useful in recursive algorithms where the same subproblems are solved multiple times. By caching previous results, you reduce the time complexity, often turning an exponential task into a polynomial one.
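The classic demonstration is Fibonacci, where naive recursion recomputes the same subproblems exponentially often:

```python
from functools import lru_cache

# Hand-rolled memoization: a dictionary caches results by argument
_fib_cache = {}

def fib(n):
    if n in _fib_cache:
        return _fib_cache[n]             # reuse a previously computed result
    result = n if n < 2 else fib(n - 1) + fib(n - 2)
    _fib_cache[n] = result
    return result

# The same idea using the standard-library decorator
@lru_cache(maxsize=None)
def fib_cached(n):
    return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)
```

Without the cache this is exponential; with it, each `fib(n)` is computed once, so the whole call tree collapses to linear work.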

19. How would you reverse a linked list?

Reversing a linked list involves changing the direction of the pointers so that each node points to the previous node instead of the next one. You can achieve this iteratively or recursively. Let’s go through the iterative approach as it’s more intuitive and usually more efficient with respect to space complexity.

You keep three pointers: prev, current, and next. Initially, set prev to null and current to the head of the list. Then, iterate through the list, doing the following at each step: set next to current.next, reverse the current.next to point to prev, move prev to current, and current to next. Repeat this process until you’ve traversed the entire list. At the end, prev will be the new head of the reversed list.

Here's a quick pseudocode to help visualize it:

prev = null
current = head
while current is not null:
    next = current.next
    current.next = prev
    prev = current
    current = next
head = prev

20. What are the advantages and disadvantages of using recursive algorithms?

Recursive algorithms can make code simpler and more readable, especially for problems that have a natural recursive structure like tree traversal or the Fibonacci sequence. They break down complex problems into smaller, more manageable ones and can be easier to implement for certain tasks.

However, recursion can be less efficient due to the overhead of multiple function calls and the risk of stack overflow for deep recursions. Memory usage can also be higher compared to iterative solutions. Tail recursion can help mitigate some issues, but not all languages optimize for it.

21. How does the Rabin-Karp string search algorithm work?

The Rabin-Karp string search algorithm uses hashing to find a pattern within a text efficiently. It works by first calculating a hash value for the pattern and then for each substring of the text with the same length as the pattern. Instead of checking the entire substring against the pattern each time, it first compares the hash values. If the hash values match, then it performs a direct comparison to account for any potential hash collisions. This can significantly speed up the search process, especially when multiple patterns need to be searched in a single text. The rolling hash technique is typically used to efficiently compute hash values of substrings.
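A Python sketch of the rolling-hash search (the base and modulus constants here are illustrative choices):

```python
def rabin_karp(text, pattern, base=256, mod=1_000_000_007):
    """Return all starting indices of pattern in text using a rolling hash."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)            # weight of the window's first char
    p_hash = t_hash = 0
    for i in range(m):                      # hash the pattern and first window
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    matches = []
    for i in range(n - m + 1):
        if p_hash == t_hash and text[i:i + m] == pattern:
            matches.append(i)               # verify directly on a hash hit
        if i < n - m:                       # roll the window one character
            t_hash = ((t_hash - ord(text[i]) * high) * base
                      + ord(text[i + m])) % mod
    return matches
```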

22. Explain the concept of Big O, Big Theta, and Big Omega with examples.

Big O, Big Theta, and Big Omega are notations used to describe the time complexity of an algorithm, helping to understand its efficiency in terms of input size.

Big O, O(n), describes the upper bound of an algorithm's runtime. It tells us the worst-case scenario: how the runtime grows relative to the input size. For example, if an algorithm takes at most 3n^2 + 2n + 1 operations, we say its complexity is O(n^2) because the n^2 term grows the fastest and dominates for large inputs.

Big Theta, Θ(n), provides a tight bound on the runtime. It gives a more precise description by accounting for both the upper and lower bounds of the runtime. For instance, an algorithm with runtime 2n^2 + 3n + 4 is Θ(n^2), since n^2 is the dominant term, and the growth rate can be sandwiched between c1n^2 and c2n^2 for some constants c1 and c2.

Big Omega, Ω(n), represents the lower bound of an algorithm's runtime, and is often associated with best-case analysis. If an algorithm's runtime is at least n log n for an input of size n, then we say it is Ω(n log n), indicating that, no matter what, the algorithm cannot perform asymptotically better than n log n operations.

23. Explain the Boyer-Moore string search algorithm.

Boyer-Moore is a pattern matching algorithm that is fast in practice because it does not scan left to right one character at a time.

The key idea is:

  • Compare the pattern against the text from right to left
  • When there is a mismatch, shift the pattern by more than one position
  • Use precomputed rules to decide how far to jump

The two main heuristics are:

  1. Bad character rule

  • Suppose you mismatch on text character c
  • Look at the pattern and find the last occurrence of c
  • Shift the pattern so that occurrence lines up with the text
  • If c does not appear in the pattern, you can skip past it entirely

  2. Good suffix rule

  • If a suffix of the pattern matched before the mismatch, reuse that information
  • Shift the pattern so that the matched suffix lines up with another occurrence of that same suffix, or with a valid prefix
  • This can produce even bigger jumps than the bad character rule

Why it is efficient:

  • It often skips large chunks of text
  • That makes it much faster than naive matching on real inputs
  • Especially good when the pattern is relatively long and the alphabet is reasonably large

Typical performance:

  • Preprocessing the pattern takes linear time
  • Worst-case search time is still linear with the refined versions
  • In practice, it is one of the classic high-performance string searching algorithms

Tiny intuition example:

If the pattern is EXAMPLE and you mismatch near the end, Boyer-Moore uses that mismatched character and any matched suffix to jump the pattern several positions, instead of moving it by just one.

So the one-line version is:

  • Boyer-Moore searches from right to left and uses mismatch information to skip ahead aggressively.
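As a simplified illustration, here is the Horspool variant of Boyer-Moore, which keeps only the bad character rule (full Boyer-Moore adds the good suffix rule on top):

```python
def horspool_search(text, pattern):
    """Boyer-Moore-Horspool: right-to-left comparison with
    bad-character shifts. Returns the first match index, or -1."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return -1
    # Shift table: jump distance keyed by the character aligned with
    # the window's last position
    shift = {c: m - i - 1 for i, c in enumerate(pattern[:-1])}
    default = m                     # characters not in the pattern skip fully
    i = m - 1                       # index of the window's last character
    while i < n:
        j, k = m - 1, i
        while j >= 0 and text[k] == pattern[j]:   # compare right to left
            j -= 1
            k -= 1
        if j < 0:
            return k + 1            # matched the whole pattern
        i += shift.get(text[i], default)          # jump ahead
    return -1
```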

24. How can you find the shortest path in a weighted graph?

The clean way to answer this is:

  1. Start with the graph type, because the algorithm depends on edge weights.
  2. Name the right algorithm.
  3. Mention the key idea and time complexity.
  4. Call out edge cases like negative weights or negative cycles.

A solid answer would be:

  • If all edge weights are non-negative, I’d use Dijkstra’s algorithm.
  • Set the source distance to 0, everything else to infinity.
  • Use a min-priority queue to always expand the node with the smallest known distance.
  • For each neighbor, relax the edge, meaning update its distance if going through the current node is cheaper.
  • With a heap, the runtime is typically O((V + E) log V).

  • If the graph can have negative edge weights, Dijkstra is not reliable.

  • In that case, I’d use Bellman-Ford.
  • It relaxes every edge V - 1 times, which guarantees correct shortest paths if there’s no negative cycle.
  • It can also detect negative cycles.
  • Runtime is O(VE).

  • If every edge has the same weight, or the graph is effectively unweighted, BFS is enough.

  • BFS finds the shortest path in terms of number of edges.
  • Runtime is O(V + E).

  • If I need shortest paths from every node to every other node:

  • Floyd-Warshall works for dense graphs, O(V^3).
  • Or run Dijkstra from each node if the graph is sparse and weights are non-negative.

If I were answering in an interview, I’d usually say:

“For a weighted graph, the default choice is Dijkstra if all weights are non-negative. You keep track of the best known distance to each node, repeatedly pick the closest unvisited node, and relax its outgoing edges. If negative weights are allowed, I’d switch to Bellman-Ford, because Dijkstra can give the wrong answer there. And if all weights are equal, BFS is the simplest and most efficient option.”
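The default Dijkstra case from the answer above can be sketched with a heap-based priority queue (the adjacency-list dict format is an assumption):

```python
import heapq

def dijkstra(graph, source):
    """graph: {node: [(neighbor, weight), ...]} with non-negative weights.
    Returns shortest distances from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)   # closest unfinished node
        if d > dist.get(node, float('inf')):
            continue                    # stale heap entry, skip it
        for neighbor, weight in graph.get(node, []):
            nd = d + weight
            if nd < dist.get(neighbor, float('inf')):
                dist[neighbor] = nd     # relax the edge
                heapq.heappush(heap, (nd, neighbor))
    return dist
```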

25. Describe the process of heap sort.

Heap sort involves two main steps: building a max heap and then repeatedly extracting the maximum element from the heap.

First, you build a max heap from the input array. This is done by rearranging the array elements so that they satisfy the heap property, where each parent node is greater than or equal to its children. You typically start from the last non-leaf node and move upwards, ensuring each subtree is a valid heap.

Once the max heap is built, the sorting process begins. You repeatedly swap the first element (the maximum) with the last element of the unsorted section of the array. After each swap, you reduce the heap size by one and heapify the root element to ensure the remaining elements still form a max heap. This process is repeated until the heap size is reduced to one. The array is now sorted.
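The two phases above, building the heap and repeatedly extracting the maximum, can be sketched as:

```python
def heapify(arr, n, root):
    """Sift arr[root] down so the subtree rooted there is a max heap,
    considering only the first n elements."""
    largest = root
    left, right = 2 * root + 1, 2 * root + 2
    if left < n and arr[left] > arr[largest]:
        largest = left
    if right < n and arr[right] > arr[largest]:
        largest = right
    if largest != root:
        arr[root], arr[largest] = arr[largest], arr[root]
        heapify(arr, n, largest)

def heap_sort(arr):
    n = len(arr)
    for i in range(n // 2 - 1, -1, -1):   # build the max heap bottom-up
        heapify(arr, n, i)
    for end in range(n - 1, 0, -1):       # repeatedly extract the maximum
        arr[0], arr[end] = arr[end], arr[0]
        heapify(arr, end, 0)              # restore the heap on the prefix
    return arr
```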

26. How can you determine if a binary tree is a valid binary search tree?

To determine if a binary tree is a valid binary search tree (BST), you can perform an in-order traversal of the tree and check if the values are in ascending order. An in-order traversal visits the nodes in the left subtree, then the root node, and then the nodes in the right subtree. If the sequence of visited nodes forms a strictly increasing sequence, it's a valid BST.

Alternatively, you can use a recursive approach where you pass down the range of valid values for each node. For the root node, the range is (-∞, +∞). For each node, you check if it falls within the allowed range, then recursively check its left subtree with the updated range (-∞, node's value) and the right subtree with (node's value, +∞). If all nodes satisfy this property, the tree is a valid BST.
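The range-passing approach is compact in code (a `TreeNode` with `value`, `left`, and `right` attributes is assumed):

```python
class TreeNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def is_valid_bst(node, low=float('-inf'), high=float('inf')):
    """Each node must lie strictly inside the (low, high) range
    inherited from its ancestors."""
    if node is None:
        return True
    if not (low < node.value < high):
        return False
    return (is_valid_bst(node.left, low, node.value) and
            is_valid_bst(node.right, node.value, high))
```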

27. How does the KMP (Knuth-Morris-Pratt) pattern matching algorithm work?

I like to explain KMP in two parts:

  1. Preprocess the pattern.
  2. Use that information to avoid re-checking characters during the search.

The core idea is simple:

  • When a mismatch happens, do not move all the way back to the start of the pattern.
  • Reuse what you already know matched.

1) Build the prefix table

KMP first computes an array, often called lps or "longest prefix suffix".

For each position in the pattern, lps[i] tells you:

  • the length of the longest proper prefix
  • that is also a suffix
  • for the pattern ending at i

"Proper prefix" just means not the whole string.

Example with pattern ababaca:

  • a -> 0
  • ab -> 0
  • aba -> 1, because a matches a
  • abab -> 2, because ab matches ab

So the table captures self-overlap inside the pattern.

2) Search the text

Now scan the text with two pointers:

  • i for the text
  • j for the pattern

If characters match:

  • move both forward

If j reaches the end of the pattern:

  • you found a match

If there is a mismatch:

  • if j > 0, jump j back to lps[j - 1]
  • if j == 0, just move i forward

That jump is the whole win. You do not re-compare characters that KMP already knows must match.

Why it is efficient

Naive matching can backtrack a lot and end up redoing work.

KMP avoids that by using the prefix table to decide the next best pattern position immediately.

That gives:

  • O(m) to build the prefix table
  • O(n) to scan the text
  • total O(n + m)

Intuition

If you matched, say, 5 characters and then failed on the 6th, KMP asks:

  • "Of the 5 characters I already matched, is there a smaller prefix of the pattern that could still be valid here?"

The lps table answers that instantly.

So instead of starting over, KMP shifts the pattern intelligently.
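The two parts described above, building the lps table and scanning the text, can be sketched as:

```python
def build_lps(pattern):
    """lps[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    lps = [0] * len(pattern)
    length = 0
    i = 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length > 0:
            length = lps[length - 1]   # fall back to a shorter border
        else:
            lps[i] = 0
            i += 1
    return lps

def kmp_search(text, pattern):
    """Return all starting indices of pattern in text; O(n + m) total."""
    if not pattern:
        return []
    lps = build_lps(pattern)
    matches, j = [], 0
    for i, ch in enumerate(text):
        while j > 0 and ch != pattern[j]:
            j = lps[j - 1]             # reuse what already matched
        if ch == pattern[j]:
            j += 1
        if j == len(pattern):
            matches.append(i - j + 1)  # full match ends at i
            j = lps[j - 1]             # keep searching for overlaps
    return matches
```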

28. What is the difference between an array and a linked list?

I’d explain it by comparing how they store data, and what that means for performance.

  • Array:
  • Stores elements in contiguous memory.
  • Fast random access, O(1) by index.
  • Insertions or deletions in the middle are expensive, usually O(n) because elements need to shift.
  • Usually better cache performance, so it tends to be faster in practice for reads.

  • Linked list:

  • Stores elements as separate nodes, each pointing to the next one.
  • No direct indexing, so access is O(n) because you have to walk through the list.
  • Insertions and deletions can be O(1) if you already have the node or pointer to the right spot.
  • Uses extra memory for pointers, and has worse cache locality.

The practical tradeoff is:

  • Use an array when you want fast lookups and mostly sequential reads.
  • Use a linked list when you expect lots of inserts and deletes, especially if they happen in places you already have a reference to.

A quick example:

  • If I need the 500th element, an array gets it instantly.
  • In a linked list, I’d have to traverse node by node until I reach it.

So in one line, arrays optimize access, linked lists optimize structural updates.

29. What is the principle behind Dijkstra’s algorithm?

A clean way to answer this is:

  1. Start with the core idea, the algorithm is greedy.
  2. Explain the invariant, once a node is chosen, its shortest distance is final.
  3. Mention the key condition, edge weights must be non-negative.

Example answer:

Dijkstra’s algorithm works on a greedy principle.

At each step, it picks the unvisited node with the smallest known distance from the source. The reason that works is, with non-negative edge weights, there is no later path that can make that node cheaper once it is the smallest available.

From there, it relaxes its outgoing edges, meaning it checks whether going through that node gives a shorter path to its neighbors, and updates those distances if it does.

So the main idea is:

  • keep track of the best distance found so far
  • always expand the closest unvisited node next
  • treat that distance as finalized
  • repeat until all shortest paths are found

One important caveat, Dijkstra’s algorithm only works correctly when all edge weights are non-negative.

30. Describe a situation where you would use a greedy algorithm.

I’d use a greedy algorithm when two things are true:

  1. A locally best choice is likely to move you toward the global best answer.
  2. The problem has some structure that makes those local choices safe, not just fast.

A clean example is interval scheduling.

Say I have a bunch of meetings, each with a start and end time, and I want to fit in the maximum number of non-overlapping meetings.

The greedy strategy is:

  • Sort meetings by earliest finish time
  • Pick the meeting that ends first
  • Then keep picking the next meeting that starts after the last one ends

Why greedy works well here:

  • Choosing the meeting that finishes earliest leaves the most room for everything else
  • You’re making the best decision for the current step without needing to backtrack
  • This actually gives the optimal answer, not just a decent approximation
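The three-step strategy above fits in a few lines of Python (function and variable names are illustrative):

```python
def max_meetings(meetings):
    # meetings: list of (start, end) tuples
    chosen = []
    last_end = float("-inf")
    # Sort by earliest finish time, then greedily take each
    # meeting that starts after the last chosen one ends.
    for start, end in sorted(meetings, key=lambda m: m[1]):
        if start >= last_end:
            chosen.append((start, end))
            last_end = end
    return chosen
```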

I’d also call out that greedy is not something I use blindly.

For example, coin change is a famous case where greedy sometimes works and sometimes doesn’t. If the denominations are standard, like 1, 5, 10, 25, taking the largest coin first is fine. But with unusual denominations, that same strategy can fail, so I’d switch to dynamic programming.

So my rule of thumb is:

  • Use greedy when I can justify that the local choice stays globally valid
  • Avoid it when early choices can block a better solution later

31. Explain the concept of backtracking with an example.

A clean way to explain backtracking is:

  1. Build a solution one decision at a time.
  2. Check early if the current partial solution can still work.
  3. If not, undo the last choice and try the next option.
  4. Keep going until you find a valid answer, or exhaust all possibilities.

So it is basically depth-first search with pruning. You explore a path, and the moment it becomes invalid, you stop exploring that branch.

A simple example is the N-Queens problem.

You want to place N queens on an N x N chessboard so that no two queens attack each other.

How backtracking works there:

  • Start with the first row.
  • Try placing a queen in each column.
  • If a position is safe, move to the next row.
  • If you reach a row where no column works, that means an earlier choice was bad.
  • Remove the last queen, try a different column, and continue.

For example, in 4-Queens:

  • Put a queen in row 1, column 1.
  • Move to row 2, try valid spots.
  • Keep going row by row.
  • If row 4 has no valid position, back up to row 3 and change that placement.
  • Repeat until you either find a full board configuration or confirm none exists.
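The row-by-row process above can be sketched as a short recursive Python function. The representation is an assumption for the example: index is the row, value is the column of the queen in that row.

```python
def solve_n_queens(n):
    # Returns one valid placement (list: index = row, value = column),
    # or None if no solution exists.
    placement = []

    def safe(row, col):
        for r, c in enumerate(placement):
            # conflict: same column, or same diagonal
            if c == col or abs(c - col) == abs(r - row):
                return False
        return True

    def place(row):
        if row == n:
            return True  # every row has a queen
        for col in range(n):
            if safe(row, col):
                placement.append(col)   # make a choice
                if place(row + 1):      # recurse
                    return True
                placement.pop()         # undo the choice (backtrack)
        return False

    return placement if place(0) else None
```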

Why backtracking is useful:

  • It avoids brute-forcing every complete combination blindly.
  • It cuts off bad paths early.
  • It works well for combinatorial problems like:
    • permutations
    • subsets
    • Sudoku
    • maze solving
    • constraint satisfaction problems

The key idea is, "make a choice, recurse, and undo the choice if it doesn’t lead anywhere."

32. Explain the concept of dynamic programming with an example.

A clean way to explain dynamic programming is:

  1. Identify a problem with overlapping subproblems.
  2. Define a recurrence, meaning how a bigger answer depends on smaller answers.
  3. Store intermediate results so you do not solve the same subproblem again.
  4. Build the answer either top-down with memoization or bottom-up with a table.

In simple terms, dynamic programming is about trading a little extra memory for a big speedup.

Example, Fibonacci:

  • Naive recursion for fib(n) recomputes the same values over and over.
  • For example, fib(5) needs fib(4) and fib(3).
  • Then fib(4) also needs fib(3) and fib(2).
  • So fib(3) gets recalculated multiple times.

With dynamic programming, you save each result once:

  • fib(0) = 0
  • fib(1) = 1
  • fib(2) = fib(1) + fib(0) = 1
  • fib(3) = fib(2) + fib(1) = 2
  • fib(4) = fib(3) + fib(2) = 3
  • fib(5) = fib(4) + fib(3) = 5

Why this matters:

  • Recursive brute force is exponential, about O(2^n).
  • Dynamic programming reduces it to O(n) time.
  • Space is typically O(n), or even O(1) if you only keep the last two values.
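Both flavors can be sketched in Python: top-down with memoization, and bottom-up with the O(1)-space trick of keeping only the last two values.

```python
from functools import lru_cache

# Top-down with memoization: each fib(k) is computed once, then cached.
@lru_cache(maxsize=None)
def fib_memo(n):
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

# Bottom-up with O(1) space: keep only the last two values.
def fib_iter(n):
    prev, curr = 0, 1
    for _ in range(n):
        prev, curr = curr, prev + curr
    return prev
```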

So the core idea is, solve each subproblem once, save it, and reuse it. That is the heart of dynamic programming.

33. How would you implement a minimum heap?

I’d implement a min heap with an array, because it gives you a really clean mapping between indices and the binary tree structure.

How I’d structure the answer in an interview:

  1. Start with the core representation.
  2. Explain the heap property.
  3. Walk through the main operations, insert, extractMin, maybe peek.
  4. Mention time complexity.
  5. If needed, talk through edge cases.

A clean implementation looks like this:

  • Store elements in a list called heap
  • For index i:
    • parent = (i - 1) // 2
    • left child = 2 * i + 1
    • right child = 2 * i + 2

The key rule is:

  • Every parent is less than or equal to its children
  • That means the minimum element is always at index 0

Main operations:

  • insert(x)
    • Append x to the end
    • "Bubble up" while it’s smaller than its parent
    • Swap until the heap property is restored
  • extractMin()
    • Save the root value
    • Move the last element to the root
    • Remove the last slot
    • "Bubble down" by swapping with the smaller child until the heap property is restored
  • peek()
    • Just return heap[0] if the heap isn’t empty
  • buildHeap(array)
    • Start from the last non-leaf node and heapify downward
    • This is more efficient than inserting one by one

Complexities:

  • insert: O(log n)
  • extractMin: O(log n)
  • peek: O(1)
  • buildHeap: O(n)

If I were coding it, I’d usually define helper methods like:

  • _heapify_up(i)
  • _heapify_down(i)
  • _swap(i, j)

Example flow:

  • Insert 5, 3, 8, 1
  • Array evolves like:
    • [5]
    • [3, 5]
    • [3, 5, 8]
    • [1, 3, 8, 5]

Because each insert bubbles the smaller value upward.

A concise Python-style implementation would be:

  • Keep self.heap = []
  • In insert, append and heapify up
  • In extractMin, handle the 1-element case separately, otherwise replace root with last element and heapify down
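Putting those bullets together, a minimal sketch might look like this (method names follow the helpers mentioned above; this is illustrative, not the only layout):

```python
class MinHeap:
    def __init__(self):
        self.heap = []

    def peek(self):
        if not self.heap:
            raise IndexError("peek from empty heap")
        return self.heap[0]

    def insert(self, x):
        self.heap.append(x)
        self._heapify_up(len(self.heap) - 1)

    def extract_min(self):
        if not self.heap:
            raise IndexError("extract from empty heap")
        if len(self.heap) == 1:
            return self.heap.pop()       # one-element case
        root = self.heap[0]
        self.heap[0] = self.heap.pop()   # move last element to the root
        self._heapify_down(0)
        return root

    def _heapify_up(self, i):
        # Bubble up while smaller than the parent.
        while i > 0:
            parent = (i - 1) // 2
            if self.heap[i] >= self.heap[parent]:
                break
            self.heap[i], self.heap[parent] = self.heap[parent], self.heap[i]
            i = parent

    def _heapify_down(self, i):
        # Bubble down by swapping with the smaller child.
        n = len(self.heap)
        while True:
            smallest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self.heap[child] < self.heap[smallest]:
                    smallest = child
            if smallest == i:
                break
            self.heap[i], self.heap[smallest] = self.heap[smallest], self.heap[i]
            i = smallest
```

Inserting 5, 3, 8, 1 with this class reproduces the array evolution shown above, ending at [1, 3, 8, 5].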

I’d also mention edge cases, because interviewers like that:

  • Extracting from an empty heap
  • Heap with one element
  • Duplicate values, which work fine
  • If you need custom ordering, store tuples like (priority, value)

If they want, I can also extend it to support decreaseKey or explain how this differs from a max heap.

34. Explain what a binary search tree (BST) is.

A binary search tree is a binary tree with one key rule:

  • Everything in the left subtree is smaller than the node
  • Everything in the right subtree is larger than the node

That ordering is what makes it useful.

Example:

  • Root is 10
  • Left child could be 5
  • Right child could be 15
  • 3 would go somewhere under the left side
  • 12 would go somewhere under the right side

Because the tree is ordered, you can search by repeatedly choosing left or right instead of scanning everything.
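A minimal Python sketch of insert and search on values like the example above (node layout and names are illustrative, and duplicates are simply ignored):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    # Walk left for smaller keys, right for larger ones.
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # duplicates are ignored in this sketch

def search(root, key):
    # Choose left or right at each node instead of scanning everything.
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root is not None
```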

Why it matters:

  • Search is efficient
  • Insert is efficient
  • Delete is also relatively efficient

Typical performance:

  • Average case, O(log n)
  • Worst case, O(n) if the tree gets badly unbalanced and turns into something like a linked list

That is also why balanced BSTs matter. Variants like AVL trees and Red-Black trees keep the height under control so those operations stay fast.

35. Describe the A* search algorithm and its use cases.

I’d explain A* in two parts:

  1. What it optimizes
  2. Why it’s faster than plain shortest-path search

A* is a graph search algorithm for finding the lowest-cost path from a start state to a goal state.

The key idea is that it ranks each candidate node with:

  • g(n), the exact cost from the start to node n
  • h(n), a heuristic estimate from n to the goal
  • f(n) = g(n) + h(n)

So instead of exploring blindly, it prefers nodes that look cheap so far and also promising going forward.

How it works, at a high level:

  • Start from the source node
  • Keep a priority queue ordered by lowest f(n)
  • Repeatedly expand the most promising node
  • For each neighbor, update its best-known path cost
  • Stop when the goal is popped from the queue

Why it’s useful:

  • If h(n) = 0, A* basically becomes Dijkstra’s algorithm
  • If the heuristic is admissible, meaning it never overestimates the true remaining cost, A* still finds an optimal path
  • A good heuristic can massively reduce the search space, which is why A* is often much faster than uninformed search

Typical use cases:

  • Pathfinding in games, like moving units on a map
  • Robotics, for navigation through obstacles
  • Road routing and GPS-style map search
  • Puzzle solving, like the 8-puzzle or 15-puzzle
  • State-space planning problems where you can define a meaningful heuristic

A simple example is grid navigation:

  • g(n) is the distance traveled so far
  • h(n) might be Manhattan distance to the target
  • A* uses both to head toward the goal without giving up shortest-path guarantees
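The grid example can be sketched in Python with a priority queue ordered by f = g + h. The grid format (0 = free, 1 = wall, 4-directional moves) and names are assumptions for the example:

```python
import heapq

def a_star(grid, start, goal):
    # grid: 2D list, 0 = free, 1 = wall; start/goal: (row, col) tuples.
    # Returns the length of a shortest 4-directional path, or None.
    rows, cols = len(grid), len(grid[0])

    def h(cell):  # Manhattan distance: admissible on a grid
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    g = {start: 0}                 # best known cost from start
    pq = [(h(start), start)]       # priority queue ordered by f = g + h
    while pq:
        f, cell = heapq.heappop(pq)
        if cell == goal:
            return g[cell]         # goal popped: its cost is final
        if f > g[cell] + h(cell):
            continue               # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_g = g[cell] + 1
                if new_g < g.get((nr, nc), float("inf")):
                    g[(nr, nc)] = new_g
                    heapq.heappush(pq, (new_g + h((nr, nc)), (nr, nc)))
    return None
```

With h returning 0 everywhere, this degenerates into Dijkstra's algorithm on the grid.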

The main tradeoff is memory. A* can store a lot of frontier nodes, so on very large search spaces it may need variants like IDA*, weighted A*, or hierarchical approaches.

36. How do you find the longest common subsequence in two sequences?

I’d answer this by giving the core idea first, then the recurrence, then the complexity.

The standard approach is dynamic programming.

  • Let dp[i][j] be the length of the LCS of:
    • the first i elements of sequence A
    • the first j elements of sequence B

Then fill the table like this:

  1. Base case: if i == 0 or j == 0, the LCS length is 0.
  2. Transition:
     • If A[i-1] == B[j-1], then
       dp[i][j] = dp[i-1][j-1] + 1
     • Otherwise,
       dp[i][j] = max(dp[i-1][j], dp[i][j-1])

Why this works:

  • If the current elements match, they can extend a common subsequence
  • If they do not match, the best answer must come from skipping one element from either A or B

Once the table is filled:

  • dp[m][n] gives the length of the longest common subsequence
  • If you want the actual subsequence, you backtrack from dp[m][n]:
    • move diagonally when characters match
    • otherwise move to the neighboring cell with the larger value

Complexity:

  • Time: O(mn)
  • Space: O(mn)

If I only need the length, I can optimize space to O(min(m, n)) by keeping just the previous row and current row.
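The table fill plus the backtrack can be sketched in Python (names are illustrative; this version returns both the length and one actual LCS):

```python
def lcs(a, b):
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of a[:i] and b[:j]; row/column 0 is the base case.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Backtrack from dp[m][n] to recover one subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])  # match: move diagonally
            i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1                # otherwise follow the larger neighbor
        else:
            j -= 1
    return dp[m][n], "".join(reversed(out))
```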

If this were an interview, I’d also call out the difference between subsequence and substring:

  • subsequence, characters stay in order but do not need to be contiguous
  • substring, characters must be contiguous

That distinction usually matters a lot.

37. Walk me through how you would design an algorithm to find the median of two sorted arrays, and explain the time and space complexity of your approach.

I’d use the classic binary search partition approach. The key idea is:

  • You do not need to merge the arrays.
  • You just need to find a split point where:
    • everything on the left side is less than or equal to everything on the right side
    • the left side has the same number of elements as the right side, or one extra if the total length is odd

How to think about it

  1. Let the arrays be A and B, with lengths m and n.
  2. Always binary search on the smaller array. That keeps the runtime at O(log(min(m, n))).
  3. Pick a partition in A, call it i.
  4. That determines the partition in B, call it j, so that the total number of elements on the left is (m + n + 1) // 2, using integer division.

So:

  • left part of A is A[0..i-1]
  • right part of A is A[i..]
  • left part of B is B[0..j-1]
  • right part of B is B[j..]

Now check whether the partition is valid.

What makes a partition valid

You want:

  • A_left_max <= B_right_min
  • B_left_max <= A_right_min

Where:

  • A_left_max = A[i-1], or -∞ if i == 0
  • A_right_min = A[i], or +∞ if i == m
  • B_left_max = B[j-1], or -∞ if j == 0
  • B_right_min = B[j], or +∞ if j == n

If both conditions hold, you found the correct split.

How binary search moves

  • If A_left_max > B_right_min, then i is too far right, move left.
  • If B_left_max > A_right_min, then i is too far left, move right.

Because both arrays are sorted, this decision is monotonic, so binary search works.

How to get the median

Once the partition is valid:

  • If total length is odd:
    • median is max(A_left_max, B_left_max)
  • If total length is even:
    • median is (max(A_left_max, B_left_max) + min(A_right_min, B_right_min)) / 2

Small example

Say:

  • A = [1, 3, 8]
  • B = [7, 9, 10, 11]

Total length is 7, so left side should contain 4 elements.

Try partition i = 2 in A:

  • left of A = [1, 3]
  • right of A = [8]

Then j = 4 - 2 = 2 in B:

  • left of B = [7, 9]
  • right of B = [10, 11]

Now:

  • A_left_max = 3
  • A_right_min = 8
  • B_left_max = 9
  • B_right_min = 10

Check:

  • 3 <= 10, yes
  • 9 <= 8, no

So i is too far left, move right.

Try i = 3:

  • A_left_max = 8
  • A_right_min = +∞
  • j = 1
  • B_left_max = 7
  • B_right_min = 9

Check:

  • 8 <= 9, yes
  • 7 <= +∞, yes

Valid partition.

Since total length is odd, median is max(8, 7) = 8.

Why this is better than merging

A straightforward merge would be:

  • merge both sorted arrays
  • take the middle

That works, but costs:

  • time: O(m + n)
  • space: O(m + n) if you build a merged array

You can optimize the merge-style approach to O(1) extra space by only walking until the median, but time is still O(m + n).

The binary search partition approach is better:

  • Time: O(log(min(m, n)))
  • Space: O(1)
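As a rough sketch, the partition search described above might look like this in Python (names are illustrative, and both inputs are assumed sorted):

```python
def find_median_sorted(A, B):
    # Binary search the partition point on the smaller array.
    if len(A) > len(B):
        A, B = B, A
    m, n = len(A), len(B)
    half = (m + n + 1) // 2  # size of the combined left part
    lo, hi = 0, m
    while lo <= hi:
        i = (lo + hi) // 2   # elements taken from A's left side
        j = half - i         # elements taken from B's left side
        # Sentinels handle empty sides of a partition.
        a_left = A[i - 1] if i > 0 else float("-inf")
        a_right = A[i] if i < m else float("inf")
        b_left = B[j - 1] if j > 0 else float("-inf")
        b_right = B[j] if j < n else float("inf")
        if a_left > b_right:
            hi = i - 1       # i is too far right, move left
        elif b_left > a_right:
            lo = i + 1       # i is too far left, move right
        else:                # valid partition found
            if (m + n) % 2 == 1:
                return max(a_left, b_left)
            return (max(a_left, b_left) + min(a_right, b_right)) / 2
    raise ValueError("inputs must be sorted")
```

Running it on the example above, find_median_sorted([1, 3, 8], [7, 9, 10, 11]) settles on i = 3, j = 1 and returns 8.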

Edge cases to mention in an interview

  • One array is empty
  • Arrays have very different sizes
  • Duplicates
  • All elements of one array are smaller than all elements of the other
  • Even vs odd total length

If I were explaining this in an interview, I’d structure it like this:

  1. Start with the brute force merge idea.
  2. Say we can do better by exploiting sorted order.
  3. Introduce partitioning instead of merging.
  4. Explain the two validity inequalities.
  5. Show why binary search applies.
  6. State the complexity:
     • time: O(log(min(m, n)))
     • space: O(1)

If they want, I can also write out the actual code for this in Python, Java, or C++.

38. Tell me about a time you had to optimize an algorithm or data-processing workflow for performance; what trade-offs did you consider and what was the outcome?

A good way to answer this is with a tight STAR structure:

  1. Situation, what system or workflow was slow, and why it mattered.
  2. Task, what you specifically owned.
  3. Action, what you changed, and how you evaluated trade-offs.
  4. Result, measurable impact, plus what you learned.

For this kind of question, interviewers usually want two things:

  • Your performance instincts, how you find bottlenecks and choose the right optimization.
  • Your judgment, especially what you did not optimize, and why.

Here is a strong example answer:

At my last team, we had a nightly data-processing pipeline that computed customer-level analytics from raw event logs. It started out fine, but as volume grew, runtime went from about 40 minutes to over 3 hours, which was causing downstream reports to miss their SLA by the start of business.

I owned figuring out where the slowdown was and getting it back under target without making the pipeline too complex to maintain.

The first thing I did was profile the workflow instead of guessing. I found two main bottlenecks. One was an expensive join step that was repeatedly scanning a very large events table. The second was a deduplication step that was doing a global sort, even though most duplicates were local to a smaller key range.

I looked at a few options. We could scale the job with more compute, rewrite the whole thing in a lower-level framework, or change the algorithm and data layout. I chose to first attack the algorithmic inefficiencies because that would reduce cost as well as latency.

I made three changes:

  • I partitioned the input data by customer and date so the join touched much less data.
  • I replaced the full global sort for dedup with a hash-based aggregation keyed on the business identifier, which brought that step closer to linear time in practice.
  • I added a small precomputed reference table for the most frequently joined metadata so we could avoid repeatedly recomputing or rescanning it.

The trade-offs were pretty real:

  • Hash-based dedup used more memory, so I had to validate that it would not create instability on peak days.
  • Partitioning improved runtime a lot, but it added some operational complexity, especially around backfills and skewed partitions.
  • Precomputing a reference table introduced a freshness trade-off, so we had to define acceptable staleness and add validation checks.

To manage that, I ran the new workflow in shadow mode for a week, compared outputs against the old pipeline, and added monitoring on memory usage, partition skew, and data-quality diffs.

The outcome was that runtime dropped from a little over 3 hours to about 28 minutes, and compute cost went down around 35 percent because we were processing less unnecessary data. More importantly, the pipeline became predictable again, and the business stopped missing reporting deadlines.

What I liked about that project was that the best answer was not just, make it faster. It was balancing speed, cost, correctness, and maintainability, then using measurement to make the trade-offs explicit.

39. Imagine your team is debating whether to use a dynamic programming solution or a greedy approach for a new optimization problem with tight latency constraints. How would you evaluate the options and make a recommendation?

I’d frame it as a tradeoff between optimality, predictability, and runtime.

How I’d evaluate it

  1. Start with the problem properties
     • Ask, does the problem have:
       • Optimal substructure, which makes DP viable
       • Greedy choice property, which makes a greedy method correct
     • If greedy is being considered, I would not trust it just because it is faster. I’d want either:
       • A proof of correctness, or
       • Strong evidence from the problem structure, like interval scheduling, Huffman coding, MST-style cut properties

  2. Quantify the latency budget
     • What is the actual SLA, p50, p95, p99?
     • How large can inputs get in production, not just average case?
     • Is latency per request, batch, or offline precomputation?
     • How much memory is available?

This matters because DP can be fine if the state space is small or can be precomputed, but impossible if it explodes at production scale.

  3. Compare asymptotics and constants
     • DP:
       • Usually gives exact answers
       • Runtime depends on state space and transitions
       • Can be too slow or memory-heavy under tight latency
     • Greedy:
       • Usually much cheaper, often O(n) or O(n log n)
       • Easier to operationalize
       • Risk is suboptimal output if the greedy rule is not actually valid

I’d also care about constant factors, not just big-O, if latency is tight.

  4. Validate correctness risk
     • For DP, correctness is usually easier to reason about once the recurrence is right.
     • For greedy, the biggest question is, can we prove local choices lead to a global optimum?
     • If we cannot prove greedy, I’d treat it as a heuristic, not a true alternative.

  5. Measure business impact of suboptimality
     • If greedy is occasionally 1 percent worse but 10 times faster, that might be the right choice.
     • If being suboptimal causes real revenue loss, fairness issues, or bad user outcomes, exact DP may be worth it.
     • This is often the deciding factor.

What recommendation I’d make

I’d recommend based on these scenarios:

  • Use greedy if:
    • We can prove it is optimal, or the problem has known greedy structure
    • Latency is very tight
    • Simplicity and robustness matter
    • Small quality loss is acceptable, if it is heuristic
  • Use DP if:
    • We need exact optimal answers
    • The state space fits comfortably within the latency and memory budget
    • Input sizes are bounded enough that worst-case behavior is safe
  • Use a hybrid if:
    • DP is too expensive online, but can be used offline for benchmarking or precomputation
    • Greedy can generate a fast candidate, and limited DP or local search can refine it
    • We can use DP on small cases, greedy on large cases

How I’d drive the decision as a team

I would propose a short, evidence-based process:

  1. Formalize correctness
     • Try to prove greedy.
     • If we cannot, define it as heuristic.

  2. Build both lightweight prototypes
     • Measure p50, p95, p99 latency, memory, and output quality on realistic workloads.

  3. Use DP as a gold standard if possible
     • Compare greedy’s solution quality against DP on smaller instances where DP is tractable.

  4. Decide with a clear rule. Example:
     • If greedy is provably optimal, ship greedy.
     • If not provably optimal, but within quality tolerance and comfortably meets latency, ship greedy with guardrails.
     • If greedy quality is not acceptable, either optimize DP or redesign the product constraints.

Concrete recommendation I’d likely give

If latency is truly tight, my default bias is toward greedy, but only if one of these is true:

  • It is provably correct, or
  • Its quality gap is empirically tiny and acceptable to the business

Otherwise, I’d avoid pretending a heuristic is equivalent to an exact algorithm. I’d present the tradeoff clearly, recommend the fastest approach that meets both SLA and quality targets, and back it with benchmark data rather than intuition.

If I were saying this in an interview, I’d close with something like: “I’d use theory to rule in or rule out greedy correctness, then use experiments to validate latency and quality on production-like inputs. Under tight SLAs, I’d prefer greedy if it is provably optimal or the approximation error is demonstrably acceptable. If exactness is non-negotiable and DP fits the budget, then DP is the right choice.”

40. Describe a project where you implemented or significantly modified a graph, tree, or indexing algorithm in production, and what you learned from the experience.

I’d answer this in a simple structure:

  1. Context, what the system did and why the existing approach was failing.
  2. My specific technical contribution, what algorithm or data structure I changed.
  3. Production constraints, latency, memory, correctness, rollout risk.
  4. Result, measurable impact.
  5. What I learned.

A concrete example:

At a previous company, I worked on a product catalog and search platform for a marketplace. One painful area was category and attribute filtering. We had millions of items, a deep category hierarchy, and a lot of faceted filters like brand, price range, seller region, and availability. The existing implementation treated the category tree mostly like relational joins plus some cached lookups. It worked at small scale, but under heavy traffic the latency was inconsistent, especially for broad categories like Electronics where users expected near-instant filter updates.

My main contribution was redesigning the indexing and query path around a tree-aware inverted index.

What I changed:

  • I represented the category hierarchy explicitly as a tree with precomputed entry and exit timestamps from a DFS traversal.
  • That let us answer subtree membership checks in O(1) using interval containment, instead of repeatedly walking parent pointers or doing recursive expansion at query time.
  • On top of that, I changed the search index so each category node had compressed posting lists for item IDs, and broad category queries could be served by either:
    • directly using a materialized subtree posting list for hot nodes, or
    • unioning child posting lists for colder paths.
  • For facets, I added per-category aggregate counts using a hybrid strategy:
    • precompute counts for the top N hottest categories and attributes,
    • compute the long tail on demand and cache it.
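The subtree-membership trick mentioned above, DFS entry and exit timestamps checked by interval containment, can be sketched like this. The tree shape and all names here are hypothetical, just to show the mechanism:

```python
def compute_intervals(tree, root):
    # tree: dict mapping node -> list of child nodes (a category hierarchy).
    # Assigns each node (entry, exit) timestamps from a DFS traversal, so
    # b is in the subtree of a iff entry[a] <= entry[b] and exit[b] <= exit[a].
    entry, exit_, clock = {}, {}, 0
    stack = [(root, False)]
    while stack:
        node, done = stack.pop()
        if done:
            exit_[node] = clock  # post-order timestamp
            clock += 1
            continue
        entry[node] = clock      # pre-order timestamp
        clock += 1
        stack.append((node, True))
        for child in tree.get(node, []):
            stack.append((child, False))
    return entry, exit_

def in_subtree(entry, exit_, a, b):
    # O(1) check: is b inside the subtree rooted at a?
    return entry[a] <= entry[b] and exit_[b] <= exit_[a]
```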

The interesting part was that this was not just an algorithm exercise. The production challenge was balancing CPU, memory, and index freshness.

A few tradeoffs I had to make:

  • Fully materializing posting lists for every subtree made queries fast, but index size exploded.
  • Computing everything dynamically reduced memory, but query latency became noisy.
  • Precomputing all facet counts was too expensive during reindexing.

So I introduced a tiered approach:

  • Hot categories got materialized subtree indexes.
  • Cold categories used child-list unions.
  • Facet aggregates were partially precomputed based on traffic patterns.
  • We kept separate incremental update paths for inventory changes, so in-stock filters stayed fresh without forcing full reindexing.

How I rolled it out:

  • First, I built an offline correctness harness that compared old and new query results on a big sample of production traffic.
  • Then I shadowed a percentage of live requests and logged diffs in result sets and facet counts.
  • Only after the diff rate was low and explainable did we start ramping traffic.

Impact:

  • P95 facet-query latency dropped by about 40 percent.
  • P99 improved even more, because we removed the recursive category expansion and a lot of repeated DB work.
  • Cache hit rate improved since the query shape became more stable.
  • We also cut database load materially during peak traffic.

What I learned:

  • The right algorithm in production is usually a hybrid, not the academically cleanest version. The best design came from understanding workload skew, especially hot versus cold categories.
  • Precomputation is powerful, but only if you’re disciplined about freshness and storage cost.
  • For graph or tree changes, correctness validation matters as much as speed. Shadow traffic and diff tooling saved us from subtle bugs around category moves and partial inventory updates.
  • Data distribution matters more than asymptotic complexity in a lot of real systems. A theoretically fine approach can still fall over if a few giant nodes dominate traffic.
  • Finally, I learned to design for operability. Metrics like index build time, memory growth, stale-count rate, and diff rate were just as important as the raw query latency.

If I were saying this live in an interview, I’d keep it to about two minutes and make sure to emphasize where I personally made the technical decisions, rather than describing the whole team’s work.

Get Interview Coaching from Algorithms Experts

Knowing the questions is just the start. Work with experienced professionals who can help you perfect your answers, improve your presentation, and boost your confidence.

Complete your Algorithms interview preparation

Comprehensive support to help you succeed at every stage of your interview journey

Still not convinced? Don't just take our word for it

We've already delivered 1-on-1 mentorship to thousands of students, professionals, managers and executives. Even better, they've left an average rating of 4.9 out of 5 for our mentors.

Find Algorithms Interview Coaches