FAISS/HNSW
Cheat Sheet
Prime Use Case
Use when you need to retrieve the top-K most similar items from a corpus of millions or billions of high-dimensional embeddings with sub-linear latency.
Critical Tradeoffs
- Recall vs. Latency
- Memory Footprint vs. Search Speed
- Index Build Time vs. Query Throughput
Killer Senior Insight
HNSW effectively solves the 'curse of dimensionality' by creating a multi-layered graph where the top layers provide coarse-grained navigation (long-range jumps) and bottom layers provide fine-grained local search, mimicking a skip-list data structure for high-dimensional space.
Recognition
Common Interview Phrases
Common Scenarios
- Semantic Search Engines
- Large-scale Recommendation Systems (Candidate Generation)
- Image/Video Retrieval via CLIP embeddings
- Face Recognition databases
Anti-patterns to Avoid
- Using HNSW for datasets with fewer than 10,000 vectors where exact Flat search is faster and simpler.
- Applying HNSW to frequently changing metadata filters without a hybrid search strategy.
- Using HNSW on low-dimensional data (e.g., < 10 dimensions) where KD-Trees are more efficient.
The Problem
The Fundamental Issue
The fundamental bottleneck is the O(N * D) complexity of exhaustive (Flat) search, which becomes computationally prohibitive as the number of vectors (N) or dimensions (D) grows.
What breaks without it
Query latency exceeds SLA (e.g., > 500ms for a simple search).
CPU utilization spikes to 100% during retrieval tasks.
Linear scaling of hardware costs as the dataset grows.
Why alternatives fail
KD-Trees and Ball-Trees collapse to linear search performance in high-dimensional spaces (D > 20).
LSH (Locality Sensitive Hashing) often requires a massive number of hash tables to achieve high recall, leading to excessive memory usage.
Simple clustering (IVF) without quantization still requires loading all centroids and residuals into memory, which doesn't scale to billions.
Mental Model
The Intuition
Imagine a multi-story library. The top floor has only 10 books representing broad categories. You find the closest category, then take an elevator down to the next floor which has 100 books in that category. You repeat this until you reach the basement, which contains all the books, but you only search the small shelf where your target book is likely to be.
Key Mechanics
Probability-based layer assignment for nodes.
Greedy routing on a proximity graph.
M (number of neighbors) and efConstruction (search depth during build) as primary tuning knobs.
Delaunay-like graph properties to ensure connectivity.
Framework
When it's the best choice
- When low-latency (millisecond range) is the highest priority.
- When the dataset fits in RAM (HNSW is memory-intensive).
- When high recall (>95%) is required for the application.
When to avoid
- When memory is extremely constrained (use IVF-PQ instead).
- When the index needs to be rebuilt from scratch very frequently (HNSW build time is high).
- When you only need to search by exact ID or metadata.
Fast Heuristics
Tradeoffs
Strengths
- Superior query-per-second (QPS) performance compared to almost all other ANN methods.
- Excellent recall-latency curve.
- Supports incremental additions without full re-indexing.
Weaknesses
- High memory overhead (stores both vectors and graph edges).
- Slow index construction time due to neighbor link evaluations.
- Difficult to delete nodes (often requires 'marking' and periodic rebuilding).
Alternatives
When it wins
When the dataset is too large for RAM and needs to be compressed.
Key Difference
Uses clustering to narrow search space and quantization to compress vectors into short codes.
When it wins
When using Maximum Inner Product Search (MIPS) and seeking maximum CPU efficiency.
Key Difference
Uses anisotropic quantization to better preserve the direction of high-magnitude vectors.
When it wins
When you need a simple, static index that can be memory-mapped across multiple processes.
Key Difference
Uses a forest of random projection trees rather than a graph.
Execution
Must-hit talking points
- Mention the 'Recall-Latency-Memory' triangle and how HNSW sits on it.
- Discuss the importance of 'efSearch' as a runtime parameter to tune recall on the fly.
- Explain how Product Quantization (PQ) can be combined with HNSW to reduce the memory footprint.
- Highlight the difference between L2 distance and Inner Product (IP) and how they affect index choice.
Anticipate follow-ups
- Q:How would you handle a multi-tenant vector search where users can only see their own data?
- Q:How do you handle 'stale' embeddings when the underlying model is updated?
- Q:What is the strategy for sharding a vector index across a cluster?
Red Flags
Ignoring the memory overhead of HNSW edges.
Why it fails: HNSW requires storing 'M' pointers per vector; for M=32, this can double the memory requirement, leading to unexpected OOM errors.
Assuming 100% recall.
Why it fails: ANN is probabilistic. In critical systems (like legal or medical search), missing the 'exact' best match might be unacceptable without a re-ranking step.
Not normalizing vectors before using Inner Product search.
Why it fails: If the model wasn't trained for IP or vectors aren't unit-length, the similarity scores will be mathematically inconsistent with Cosine Similarity.