Binary Search

A divide-and-conquer search algorithm that finds the position of a target value within a sorted collection or a monotonic search space by repeatedly halving the search interval.

Cheat Sheet

Prime Use Case

Use when searching for an element in a sorted array, or when the problem can be modeled as finding the first or last value that satisfies a monotonic boolean predicate (Binary Search on Answer).

Critical Tradeoffs

  • Time: O(log N) vs Space: O(1) for iterative implementations
  • Requires O(N log N) pre-sorting overhead if data is unsorted
  • Random access (O(1) indexing) is a prerequisite for the O(log N) time complexity

Killer Senior Insight

Binary search is not just an array lookup tool; it is a generic optimization framework for any monotonic function where you are looking for a 'transition point' between true and false.
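A minimal sketch of that transition-point framework (the function and parameter names here are illustrative, not from any particular library):

```python
def first_true(lo, hi, check):
    """Smallest x in [lo, hi] with check(x) == True.

    Assumes check is monotonic: F F F ... T T T.
    Returns hi + 1 if no x in the range satisfies check.
    """
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        if check(mid):
            hi = mid - 1      # mid works; try to find a smaller x
        else:
            lo = mid + 1      # mid fails; answer must be to the right
    return lo

# Example: the first integer whose square reaches 30
print(first_true(0, 100, lambda x: x * x >= 30))  # 6
```

Classic sorted-array lookup is just the special case where check(i) compares arr[i] against the target.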

Recognition

Common Interview Phrases

Input is sorted or nearly sorted
Constraints involve N up to 10^12 (suggesting logarithmic time)
Phrases like 'Find the minimum value such that...', 'Find the maximum possible...', or 'Minimize the maximum...'
Searching for a specific value in a range where a 'check(x)' function is monotonic

Common Scenarios

  • Finding elements in a sorted array or matrix
  • Finding the peak element in a unimodal array
  • Finding the first/last occurrence of a duplicate element
  • Binary search on the answer (e.g., Koko Eating Bananas, Split Array Largest Sum)
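As a sketch of the last scenario, here is binary search on the answer applied to the Koko Eating Bananas problem mentioned above (minimum eating speed to finish all piles within the hour limit):

```python
import math

def min_eating_speed(piles, hours):
    # Feasibility is monotonic: if speed k finishes in time, so does any k' > k.
    def feasible(k):
        return sum(math.ceil(p / k) for p in piles) <= hours

    lo, hi = 1, max(piles)
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if feasible(mid):
            hi = mid          # mid might be the answer; keep it in range
        else:
            lo = mid + 1      # mid is too slow; discard it
    return lo

print(min_eating_speed([3, 6, 7, 11], 8))  # 4
```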

Anti-patterns to Avoid

  • Applying binary search to a Linked List (accessing the middle element is O(N), making the total time O(N))
  • Using it on unsorted data without considering if the cost of sorting O(N log N) outweighs a simple linear scan O(N)
  • Using it on very small datasets where cache-friendly linear scans might actually be faster

The Problem

The Fundamental Issue

Eliminates the need for a linear O(N) scan by leveraging the structural properties of sorted data to discard half the search space at each step.

What breaks without it

Time Limit Exceeded (TLE) errors on large datasets (N > 10^5)

Inability to solve 'optimization' problems where the search space is continuous or extremely large

Why alternatives fail

Linear search is too slow for high-frequency queries or massive datasets

Hash Maps provide O(1) lookup but require O(N) extra space and do not support range-based queries or 'closest element' searches

Mental Model

The Intuition

Think of finding a word in a physical dictionary. You open to the middle, check whether your word comes before or after that page, discard the half that cannot contain it, and repeat on the remaining half until you find the word.

Key Mechanics

1. Initialize two pointers: low (start of range) and high (end of range)
2. Calculate the midpoint in an overflow-safe way: mid = low + (high - low) / 2
3. Evaluate the condition at mid
4. Adjust pointers: if target > arr[mid], set low = mid + 1; otherwise set high = mid - 1 (or variants based on the predicate)
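The four steps above, as a canonical iterative implementation:

```python
def binary_search(arr, target):
    low, high = 0, len(arr) - 1        # step 1; invariant: target, if present, is in [low, high]
    while low <= high:
        mid = low + (high - low) // 2  # step 2: overflow-safe midpoint
        if arr[mid] == target:         # step 3: evaluate the condition at mid
            return mid
        if arr[mid] < target:          # step 4: adjust pointers
            low = mid + 1
        else:
            high = mid - 1
    return -1                          # target is absent

print(binary_search([2, 5, 8, 12, 16], 12))  # 3
```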

Framework

When it's the best choice

  • The search space is sorted and allows O(1) random access
  • The problem asks for an optimal value in a bounded range where the feasibility of a value follows a 'TTTTTFFFFF' pattern

When to avoid

  • Data is frequently changing (insertions/deletions), making maintaining a sorted state expensive
  • The search space is not monotonic (the 'check' function fluctuates)

Fast Heuristics

If N < 100, Linear Search is fine and often faster due to constant factors
If N > 10^5 and data is sorted, Binary Search is mandatory
If you need to find an element in a dynamic set, consider a Balanced BST or Skip List instead

Tradeoffs

Strengths

  • Extremely fast scaling: searching 1 billion elements takes only ~30 comparisons
  • Constant O(1) space complexity for iterative versions
  • Versatile: applies to discrete integers, floating point ranges, and abstract decision problems
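To illustrate the floating-point case: on a continuous range there is no exact convergence, so a common approach is to iterate a fixed number of times rather than compare low against high. A sketch, computing a square root:

```python
def fsqrt(x, iters=100):
    # Each iteration halves the interval; 100 iterations shrink it far
    # below floating-point precision, avoiding epsilon-comparison loops.
    lo, hi = 0.0, max(1.0, x)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mid * mid < x:
            lo = mid
        else:
            hi = mid
    return lo

print(abs(fsqrt(2.0) - 2.0 ** 0.5) < 1e-9)  # True
```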

Weaknesses

  • Strict requirement for sorted data
  • Implementation is notoriously prone to off-by-one errors and infinite loops
  • Requires contiguous memory or O(1) indexing to maintain O(log N) performance

Alternatives

Interpolation Search

When it wins

When data is uniformly distributed (e.g., a phone book)

Key Difference

Estimates the position of the target based on its value, achieving O(log log N) average time

Hash Table

When it wins

When you only need exact matches and have extra memory

Key Difference

O(1) average time but loses ordering and range-query capabilities

Exponential Search

When it wins

When searching in an unbounded or infinite list

Key Difference

Finds the range first by doubling the index, then performs binary search
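A sketch of that two-phase structure (doubling the bound, then binary searching inside it):

```python
def exponential_search(arr, target):
    if not arr:
        return -1
    # Phase 1: double the bound until arr[bound] >= target or we pass the end.
    bound = 1
    while bound < len(arr) and arr[bound] < target:
        bound *= 2
    # Phase 2: plain binary search inside [bound // 2, min(bound, len - 1)].
    low, high = bound // 2, min(bound, len(arr) - 1)
    while low <= high:
        mid = low + (high - low) // 2
        if arr[mid] == target:
            return mid
        if arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1

print(exponential_search([1, 3, 5, 7, 9, 11, 13], 9))  # 4
```

On a truly unbounded list, phase 1 would probe indices until an access fails or a sentinel appears; the bounded-array version above shows the same mechanics.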

Execution

Must-hit talking points

  • Mention integer overflow prevention: 'low + (high - low) / 2' instead of '(low + high) / 2'
  • Define the loop invariant clearly (e.g., 'the target is always within [low, high]')
  • Discuss the choice of template: while(low <= high) vs while(low < high)
  • Explain 'Binary Search on Answer' for optimization problems
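As an example of the while(low < high) template with an explicit loop invariant, here is a leftmost-occurrence search (equivalent in behavior to Python's bisect.bisect_left):

```python
def leftmost(arr, target):
    # Invariant: the answer index is always within [low, high].
    low, high = 0, len(arr)           # high = len(arr) encodes "not found"
    while low < high:
        mid = low + (high - low) // 2
        if arr[mid] >= target:
            high = mid                # mid could be the first occurrence; keep it
        else:
            low = mid + 1             # everything up to mid is too small
    return low                        # index of the first element >= target

print(leftmost([1, 2, 2, 2, 3], 2))  # 1
```

Note the asymmetry: high = mid (not mid - 1) because mid itself may still be the answer; the loop condition low < high guarantees termination despite that.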

Anticipate follow-ups

  • Q: How would you modify this for a circular/rotated sorted array?
  • Q: How do you handle duplicates (finding the leftmost vs rightmost index)?
  • Q: What if the array is too large to fit in memory (External Binary Search)?
  • Q: Can you implement this on a stream of data?
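For the rotated-array follow-up, one common approach is to note that at least one half of any [low, high] window is still sorted, and test whether the target falls inside that half. A sketch (assumes distinct elements):

```python
def search_rotated(arr, target):
    low, high = 0, len(arr) - 1
    while low <= high:
        mid = low + (high - low) // 2
        if arr[mid] == target:
            return mid
        if arr[low] <= arr[mid]:               # left half [low, mid] is sorted
            if arr[low] <= target < arr[mid]:
                high = mid - 1                 # target lies in the sorted left half
            else:
                low = mid + 1
        else:                                  # right half [mid, high] is sorted
            if arr[mid] < target <= arr[high]:
                low = mid + 1                  # target lies in the sorted right half
            else:
                high = mid - 1
    return -1

print(search_rotated([4, 5, 6, 7, 0, 1, 2], 0))  # 4
```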

Red Flags

Infinite loops caused by incorrect boundary updates

Why it fails: Using 'low = mid' in a 'while(low < high)' loop can lead to a state where low and high never converge if mid is calculated with floor division.
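The standard fix when the update is low = mid is to round the midpoint up instead of down, so mid can never equal low while the window still has two elements. A sketch, finding the last value satisfying a monotonic predicate (names are illustrative):

```python
def last_true(lo, hi, check):
    """Largest x in [lo, hi] with check(x) == True (pattern: T T T ... F F F)."""
    while lo < hi:
        mid = lo + (hi - lo + 1) // 2   # ceiling midpoint; with floor division,
                                        # mid == lo when hi == lo + 1 and the
                                        # update lo = mid would loop forever
        if check(mid):
            lo = mid                    # mid works; answer is mid or later
        else:
            hi = mid - 1                # mid fails; discard it
    return lo

# Last integer whose square stays within 30
print(last_true(0, 100, lambda x: x * x <= 30))  # 5
```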

Integer overflow in mid calculation

Why it fails: In languages like Java or C++, (low + high) can exceed the maximum value of a 32-bit integer before the division by 2 occurs.

Incorrectly identifying the search range

Why it fails: Setting 'high' to 'arr.length' instead of 'arr.length - 1' (or vice versa) without adjusting the loop condition leads to IndexOutOfBounds or missing the last element.