The Question
DesignScalable Typeahead Suggestion System
Design a highly available and low-latency autocomplete system similar to a major search engine's search bar. The system should provide the most popular suggestions based on user input in real-time, handling millions of requests per second while ensuring that the suggestions are updated periodically based on global search trends.
Trie
Redis
Spark
S3/HDFS
REST
Questions & Insights
Clarifying Questions
Scale and Traffic: What is the expected scale in terms of Daily Active Users (DAU) and Queries Per Second (QPS)?
Assumption: 100 Million DAU, resulting in ~50,000 QPS for suggestions (assuming 5-10 keystrokes per search).
Latency Requirement: What is the target "p99" latency for returning suggestions?
Assumption: To feel "instant," the end-to-end latency must be <100ms.
Freshness: How quickly must new trending searches appear in the suggestions?
Assumption: Daily updates are sufficient for the MVP; real-time trending is out of scope for YAGNI.
Scope of Suggestions: Are we supporting personalization (user history) or localization (language/region)?
Assumption: MVP will focus on global popularity based on the English language to minimize complexity.
Data Volume: How many unique prefixes and search terms are we indexing?
Assumption: Top 10 million most frequent queries.
Thinking Process
Core Strategy: Shift the heavy lifting to an offline pipeline. Instead of calculating popular queries in real-time, we pre-compute a Trie data structure and serve it from an in-memory cache.
Key Progression:
How do we find suggestions for a prefix? (Trie Data Structure).
How do we store and retrieve this Trie at scale? (Distributed Cache/Redis).
How do we keep the suggestions relevant? (Daily Batch Processing Pipeline).
how do we minimize client-side overhead? (Browser caching and debounce logic).
Bonus Points
Trie Partitioning: To handle massive Tries that exceed a single node's RAM, we can partition based on the first 1 or 2 characters (e.g., Server A handles 'a'-'m', Server B handles 'n'-'z').
Smart Sampling: Instead of logging every single keystroke (which would overwhelm the system), we log 1 out of every 10 or 100 requests to build our frequency models.
Edge Caching: Utilizing CDNs to cache popular prefix results (e.g., "g", "go", "goo") to reduce origin load significantly.
Trie Serialization: Storing the Trie as a Succinct Data Structure or a serialized Bloom Filter to optimize memory footprint.
Design Breakdown
Functional Requirements
Given a search prefix, return the top 5 most popular completed search terms.
Suggestions must update based on historical search frequency.
Support basic prefix matching (case-insensitive).
Non-Functional Requirements
Low Latency: Suggestions must appear as the user types (<100ms).
High Availability: The system must remain available even if the background update pipeline fails.
Scalability: Must handle spikes in traffic during peak hours.
Read-Heavy Optimization: 100:1 Read-to-Write ratio.
Estimation
QPS: 100M DAU 10 searches/day 5 keystrokes/search = 5 Billion requests/day. 5B / 86400s \approx 58,000 QPS.
Storage: 10M unique queries * 20 bytes/query = 200MB. Including Trie overhead and frequency weights, ~1-2GB total. This easily fits in a single high-memory Redis instance or a small cluster.
Network: 50k QPS * 0.5KB response size = 25MB/s.
Blueprint
Concise Summary: A read-optimized system where an offline Spark job aggregates search logs daily to build a frequency-weighted Trie, which is then serialized into Redis for sub-millisecond prefix lookups by a Suggestion Service.
Major Components:
Suggestion Service: Lightweight API that performs prefix lookups against the Trie.
Redis (Cache): Stores the serialized Trie or pre-computed top-k results for common prefixes.
Log Aggregator (Spark): Processes raw search logs to calculate query frequencies.
Trie Builder: A specialized worker that constructs the Trie from aggregated frequencies and pushes it to the Cache.
Simplicity Audit: This design avoids complex real-time streaming (Kafka/Flink) and complex NLP, using simple frequency counts and batch processing to meet the core user need.
Architecture Decision Rationale:
Why this architecture is the best for this problem?: It separates the high-frequency "Read" path (suggestions) from the compute-heavy "Write" path (calculating popularity), ensuring that search performance is never degraded by background analytics.
Functional Requirement Satisfaction: Uses a Trie structure to ensure O(L) lookup time where L is prefix length, satisfying the top-5 suggestion requirement.
Non-functional Requirement Satisfaction: In-memory storage (Redis) guarantees <10ms database latency, contributing to the overall <100ms end-to-end goal.
High Level Architecture
Sub-system Deep Dive
Service
Topology & Scaling: The Suggestion Service is a fleet of stateless microservices behind a Load Balancer. It scales horizontally based on CPU/Request count.
API Spec:
GET /v1/suggestions?prefix={query}Response:
{ "suggestions": ["apple", "amazon", "alphabet"] }Protocol: HTTPS/REST. For MVP, we use standard REST with a
Connection: keep-alive to minimize TCP handshake overhead.Storage
Data Model: The "Search Logs" are stored in an append-only distributed file system (S3/HDFS).
Database Logic: The Aggregator groups searches by
query_string and counts occurrences.Table Schema:
[query (string), frequency (long), last_updated (timestamp)].Cache
Data Structure: Redis
Hashes. Key:
prefix:{prefix_string}Value: Serialized List of top 5 strings.
Example:
prefix:ap -> ["apple", "amazon", "applied"].TTLs: Since the Trie is updated daily via batch, TTL is set to 24 hours + jitter to prevent thundering herds.
Eviction: LFU (Least Frequently Used) to keep popular prefixes in memory if the dataset grows.
Data Processing
Spark Job:
Read logs from T-1 day.
Filter out profanity/sensitive terms via a blacklist.
Map-Reduce to get
(query, count).Sort by count and take top 10M.
Trie Builder: Iterates through the top 10M queries, inserts them into a Trie where each node stores the top 5 children's metadata to avoid traversing the whole subtree during read time.
Wrap Up
Advanced Topics
Monitoring:
Prometheus/Grafana: Monitor P99 latency and 4xx/5xx error rates.
Cache Hit Rate: Monitor how many prefixes are served by Redis vs. missed.
Trade-offs:
Consistency vs. Performance: Suggestions are stale for up to 24 hours. This is acceptable for an MVP to ensure maximum system stability and low latency.
Bottlenecks: The Trie Builder could become a bottleneck if the number of unique queries explodes. In that case, we would shard the Trie by prefix.
Failure Handling:
Redis Replication: Use a Primary-Replica setup to prevent data loss.
Fallback: If Redis is unreachable, the Suggestion Service returns an empty list (fail-silent) to avoid blocking the main search experience.
Alternatives:
Elasticsearch: Could be used for prefix matching via
completion suggester. However, for a simple frequency-based autocomplete, Redis + Trie is significantly faster and cheaper to operate at scale.