Binary Search
Cheat Sheet
Prime Use Case
Use when searching for an element in a sorted array, or when the problem can be modeled as finding the first or last value that satisfies a monotonic boolean predicate (Binary Search on Answer).
Critical Tradeoffs
- Time: O(log N) vs Space: O(1) for iterative implementations
- Requires O(N log N) pre-sorting overhead if data is unsorted
- Random access (O(1) indexing) is a prerequisite for the O(log N) time complexity
Killer Senior Insight
Binary search is not just an array lookup tool; it is a generic optimization framework for any monotonic function where you are looking for a 'transition point' between true and false.
Recognition
Common Interview Phrases
Common Scenarios
- Finding elements in a sorted array or matrix
- Finding the peak element in a unimodal array
- Finding the first/last occurrence of a duplicate element
- Binary search on the answer (e.g., Koko Eating Bananas, Split Array Largest Sum)
Anti-patterns to Avoid
- Applying binary search to a Linked List (accessing the middle element is O(N), making the total time O(N))
- Using it on unsorted data without considering if the cost of sorting O(N log N) outweighs a simple linear scan O(N)
- Using it on very small datasets where cache-friendly linear scans might actually be faster
The Problem
The Fundamental Issue
Eliminates the need for a linear O(N) scan by leveraging the structural properties of sorted data to discard half the search space at each step.
What breaks without it
Time Limit Exceeded (TLE) errors on large datasets (N > 10^5)
Inability to solve 'optimization' problems where the search space is continuous or extremely large
Why alternatives fail
Linear search is too slow for high-frequency queries or massive datasets
Hash Maps provide O(1) lookup but require O(N) extra space and do not support range-based queries or 'closest element' searches
Mental Model
The Intuition
Think of finding a page in a physical dictionary. You open to the middle, see if your word is before or after that page, and rip the half you don't need in half again until you find the word.
Key Mechanics
Initialize two pointers: low (start of range) and high (end of range)
Calculate the midpoint to avoid integer overflow: mid = low + (high - low) / 2
Evaluate the condition at mid
Adjust pointers: if target > mid, low = mid + 1; else high = mid - 1 (or variants based on the predicate)
Framework
When it's the best choice
- The search space is sorted and allows O(1) random access
- The problem asks for an optimal value in a bounded range where the feasibility of a value follows a 'TTTTTFFFFF' pattern
When to avoid
- Data is frequently changing (insertions/deletions), making maintaining a sorted state expensive
- The search space is not monotonic (the 'check' function fluctuates)
Fast Heuristics
Tradeoffs
Strengths
- Extremely fast scaling: searching 1 billion elements takes only ~30 comparisons
- Constant O(1) space complexity for iterative versions
- Versatile: applies to discrete integers, floating point ranges, and abstract decision problems
Weaknesses
- Strict requirement for sorted data
- Implementation is notoriously prone to off-by-one errors and infinite loops
- Requires contiguous memory or O(1) indexing to maintain O(log N) performance
Alternatives
When it wins
When data is uniformly distributed (e.g., a phone book)
Key Difference
Estimates the position of the target based on its value, achieving O(log log N) average time
When it wins
When you only need exact matches and have extra memory
Key Difference
O(1) average time but loses ordering and range-query capabilities
When it wins
When searching in an unbounded or infinite list
Key Difference
Finds the range first by doubling the index, then performs binary search
Execution
Must-hit talking points
- Mention integer overflow prevention: 'low + (high - low) / 2' instead of '(low + high) / 2'
- Define the loop invariant clearly (e.g., 'the target is always within [low, high]')
- Discuss the choice of template: while(low <= high) vs while(low < high)
- Explain 'Binary Search on Answer' for optimization problems
Anticipate follow-ups
- Q:How would you modify this for a circular/rotated sorted array?
- Q:How do you handle duplicates (finding the leftmost vs rightmost index)?
- Q:What if the array is too large to fit in memory (External Binary Search)?
- Q:Can you implement this on a stream of data?
Red Flags
Infinite loops caused by incorrect boundary updates
Why it fails: Using 'low = mid' in a 'while(low < high)' loop can lead to a state where low and high never converge if mid is calculated with floor division.
Integer overflow in mid calculation
Why it fails: In languages like Java or C++, (low + high) can exceed the maximum value of a 32-bit integer before the division by 2 occurs.
Incorrectly identifying the search range
Why it fails: Setting 'high' to 'arr.length' instead of 'arr.length - 1' (or vice versa) without adjusting the loop condition leads to IndexOutOfBounds or missing the last element.