The Question

Design a Scalable Typeahead Suggestion System

Design a search autocomplete system (Typeahead) similar to Google or Amazon. The system should support 500 million daily searches with a p99 latency under 100ms. Discuss how you would store and query prefix data efficiently, how the system handles the massive write volume of search logs, and how you would ensure the suggestions remain relevant as search trends change. Focus on data structure selection, caching strategies, and the end-to-end data pipeline.

Trie

Redis

Kafka

Apache Spark

API Gateway

CDN

NoSQL

gRPC

Questions & Insights

Clarifying Questions

Q1: What is the scale of the system in terms of DAU and search volume?

Assumption: 100M Daily Active Users (DAU), 500M searches per day. Peak QPS ~25,000.

Q2: What is the required latency for suggestions?

Assumption: Suggestions must appear within 100ms of a keystroke (p99) to provide a seamless user experience.

Q3: How many suggestions should be returned per prefix?

Assumption: Top 10 most popular/relevant suggestions.

Q4: How fresh do the suggestions need to be?

Assumption: Suggestions do not need to be real-time. Daily or weekly updates are sufficient for an MVP to capture trends.

Q5: Are there personalization or language requirements?

Assumption: MVP will focus on global popularity in English. Personalization is out of scope.

Thinking Process

To design an ultra-low latency Typeahead system, we must move away from standard database queries and focus on specialized data structures and aggressive caching.

How do we store and search prefixes efficiently? Use a Trie (Prefix Tree) data structure. For performance, we pre-calculate and store the top-K results at each node.

How do we handle the massive read volume? Implement a multi-layered cache (Browser -> CDN -> Redis) to prevent requests from hitting the core service.

How do we update the suggestion rankings? Use an asynchronous pipeline. Collect search logs, aggregate them via batch processing (Spark), and rebuild/update the Trie weights periodically.

How do we scale the storage? Shard the Trie based on the prefix (e.g., shard by the first or second character) to distribute the load across multiple nodes.

Bonus Points

Succinct Data Structures: Use Louds-Trie or compressed Tries to fit massive datasets into RAM, reducing hardware costs significantly.

Client-Side Optimization: Implement "Smart Debouncing" and browser-side caching of the last few prefixes to eliminate redundant network calls.

Sampling for Scalability: At 500M searches/day, we don't need to process every log. Sample 1-5% of logs to build frequency weights, maintaining accuracy while saving massive processing costs.

Operational Safety: Use a "Blocklist Service" that applies filters in the suggestion path to instantly remove sensitive or inappropriate terms without rebuilding the entire Trie.

Design Breakdown

Functional Requirements

Core Use Cases:

Provide top 10 relevant suggestions based on user input prefix.

Track search term frequency to improve suggestion ranking.

Scope Control:

In-scope: Prefix matching, frequency-based ranking, batch updates.

Out-of-scope: Spell check (fuzzy search), personalized history, real-time "trending" (e.g., Twitter-style trends).

Non-Functional Requirements

Scale: Support 500M searches/day and 100M+ unique terms.

Latency: Sub-100ms response time from the user's perspective.

Availability & Reliability: 99.99% availability; the system must not fail even if the analytics pipeline is delayed.

Consistency: Eventual consistency for suggestion updates (it's okay if a new popular term takes 24h to show up).

Fault Tolerance: The Suggestion Service must be stateless to handle node failures gracefully.

Estimation

Traffic Estimation:

500M searches/day. Assume 4 characters typed per search = 2B requests/day.

Average QPS: 2B / 86400 ≈ 23,000 QPS.

Peak QPS (2x): 46,000 QPS.

Storage Estimation:

100M unique search terms. Average length 20 characters = 2GB raw data.

Trie overhead (with top-10 IDs at each node): ~30-50GB. This fits easily in a small cluster of Redis/Memory nodes.

Bandwidth Estimation:

Ingress: Minimal (prefix strings).

Egress: 10 suggestions 20 bytes = 200 bytes per request. 23k 200B ≈ 4.6 MB/s.

Blueprint

The architecture uses a decoupled approach: a Query Path for serving requests at high speed and a Data Collection Path for offline ranking updates.

Major Components:

API Gateway: Handles rate limiting and initial request routing.

Suggestion Service: Stateless service that fetches prefix results from the Trie.

Redis Cache: Stores the most frequent prefix results to minimize Trie lookups.

Trie Store: A persistent Key-Value store or specialized database containing the pre-built Prefix Tree.

Kafka: Ingests search events for asynchronous processing.

Aggregator (Spark): Processes raw logs to calculate search frequencies and update Trie weights.

Simplicity Audit: We avoid a real-time streaming Trie, which is complex to make thread-safe and consistent. Instead, we use batch updates to maintain high read performance.

Architecture Decision Rationale:

Performance: In-memory Tries provide O(L) lookup time where L is the prefix length, independent of the total number of terms.

Scalability: Decoupling ingestion (Kafka) from querying ensures peak search traffic doesn't crash the analytics pipeline.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing:

Use a global CDN to cache responses for the most common prefixes (e.g., single letters 'a', 'b', 's').

DNS uses latency-based routing to the nearest regional data center.

Security & Perimeter:

API Gateway: Implements JWT validation and strict rate limiting (e.g., 10 requests per second per IP) to prevent scraping or DDoS.

SSL Termination: Handled at the Gateway to reduce load on internal services.

Service

Topology & Scaling:

Suggestion Service: Stateless Go or Java pods running in Kubernetes. Scales horizontally based on CPU/Request count.

API Schema Design:

Endpoint: GET /v1/suggest?q={prefix}

Protocol: REST/JSON (MVP) or gRPC for internal low-latency.

Response: {"suggestions": ["apple", "apple watch", "iphone"]}

SLA: 50ms internal latency.

Resilience & Reliability:

Timeout: Set to 100ms. If the service is slow, the client should simply not show suggestions.

Circuit Breaker: If Redis is down, fallback to Trie Store; if both are slow, return empty list.

Storage

Access Pattern: 100% Read for queries, Batch Write for updates.

Database Table Design:

The Trie is stored in a Key-Value format (e.g., DynamoDB or RocksDB).

Key: Prefix (string).

Value: List of {suggestion: string, weight: int}.

Technical Selection:

Trie Store: Use a Distributed Key-Value store. For the MVP, we can even use a sharded Redis cluster with persistence enabled.

Distribution Logic:

Shard by prefix (e.g., hash(prefix[0:2])). This prevents a single node from being a hotspot for popular letters.

Cache

Purpose & Justification: Reduces the load on the Trie Store for the top 20% of prefixes that make up 80% of traffic.

Key-Value Schema:

Key: prefix:{query}

Value: Serialized JSON list of suggestions.

TTL: 1 hour (for popular trends).

Failure Handling: If Redis fails, the Suggestion Service queries the Trie Store directly.

Messaging

Purpose & Decoupling: Kafka decouples the critical query path from the heavy analytics path.

Event Schema: { "query": "apple", "timestamp": 12345678, "geo": "US" }.

Throughput & Partitioning: Partition by query string to ensure all identical searches land on the same aggregator node.

Data Processing

Processing Model: Daily batch processing using Spark.

Processing DAG:

Read Kafka logs (S3 dump).

Map-Reduce to count query frequencies.

Filter out banned words.

Update the weights in the Trie Store.

Technical Selection: Apache Spark for its robustness in handling large-scale joins and aggregations.

Wrap Up

Advanced Topics

Trade-offs: We chose Eventual Consistency over Strong Consistency. New terms won't show up instantly, but this allows for a much simpler and faster read path.

Reliability: If the aggregator fails, the system continues to serve old data. This is a graceful degradation.

Bottleneck Analysis:

Hot Keys: Prefixes like "s" or "g" will be extremely hot. Mitigated by CDN and Redis caching.

Memory Limits: As the Trie grows, we may need to implement a "Pruning" strategy where low-frequency nodes are deleted.

Security & Privacy:

PII Scrubbing: The Aggregator must strip any PII (names, SSNs) from search logs before storing them in the Trie weights.

Distinguishing Insights:

Trie Partitioning: Instead of simple hashing, use "Range Partitioning" for the Trie to keep related prefixes on the same physical node, allowing for more efficient range scans if needed in the future.