The Question
DesignDesign a High-Performance Search Service
Design a scalable search service for an e-commerce platform with 100 million items. The system must support full-text search, faceted filtering (e.g., by category or price), and near real-time updates. The design should handle 10,000 queries per second with sub-200ms latency, while ensuring high availability and fault tolerance. Discuss how you would handle data ingestion from a primary database, indexing strategies, and query optimization.
Elasticsearch
Kafka
Redis
CDC
Inverted Index
BM25
Kubernetes
REST
Questions & Insights
Clarifying Questions
Scale of Data & Traffic: How many documents/items are we indexing (e.g., 10M vs 10B), and what is the expected peak QPS for searches?
Data Freshness: How quickly must updates to the source data appear in search results (near real-time vs. daily batches)?
Search Complexity: Does the service require full-text search, fuzzy matching, faceted filtering, or advanced ML-based re-ranking?
Content Type: Are we searching structured data (e.g., products in an e-commerce store) or unstructured data (e.g., web pages or logs)?
Assumptions for MVP:
Scale: 100 million searchable documents.
Traffic: 10,000 QPS (Read-heavy).
Latency: Sub-200ms P99 for search queries.
Freshness: Near Real-Time (NRT) - updates visible within seconds.
Scope: Focused on a product/document search with basic filtering and ranking.
Thinking Process
To design a robust search service, we must solve for efficient data ingestion and low-latency retrieval.
How do we efficiently transform raw data into a searchable format? We implement an "Ingestion Pipeline" that consumes events from a primary database and builds an Inverted Index.
How do we achieve sub-second search across millions of records? We utilize a specialized search engine (Elasticsearch/OpenSearch) which uses an inverted index structure.
How do we handle high read volume? We scale the search engine via read replicas and implement a caching layer for frequent queries.
How do we ensure search results are relevant? We apply query parsing and a scoring algorithm (like BM25) to rank documents by relevance.
Bonus Points
Hybrid Search: Combining keyword-based (BM25) with vector-based (Semantic) search using embeddings for better intent matching.
Segment-based Architecture: Explaining how Elasticsearch manages segments, merges, and the impact of "refresh" intervals on performance vs. freshness.
Query Rewriting: Implementing a query expansion layer using a synonym dictionary or LLMs to improve recall.
Data Backfill Strategy: Using a "Dual-Write" or "Snapshot-and-Stream" approach to re-index data without downtime during schema migrations.
Design Breakdown
Functional Requirements
Core Use Cases:
Text-based search with keyword matching.
Filtering results by categories, price, or tags (Faceting).
Sorting by relevance, date, or specific attributes.
Basic "Typeahead" suggestions.
Scope Control:
In-scope: Text search, indexing pipeline, and ranking.
Out-of-scope: Complex ML personalization, image search, and deep web crawling.
Non-Functional Requirements
Scale: Support for 100M documents and 10k QPS.
Latency: P99 search response under 200ms.
Availability & Reliability: 99.99% uptime; search must work even if the ingestion pipeline is delayed.
Consistency: Eventual consistency is acceptable (NRT).
Fault Tolerance: Automatic recovery from node failures in the search cluster.
Estimation
Traffic Estimation:
10k QPS average, 20k QPS peak.
Write QPS: ~1k document updates/sec.
Storage Estimation:
100M documents. Avg size 5KB = 500GB raw.
Indexing overhead (Inverted index, doc values) ~2x = 1TB total storage.
Bandwidth Estimation:
Outgoing: 10k QPS * 50KB (search results) = 500MB/s (4 Gbps).
Blueprint
The design follows an event-driven architecture to decouple the source of truth from the search index.
Search Service: A stateless microservice that parses user queries, applies filters, and interacts with the search engine.
Ingestion Worker: Consumes change events from the primary database and transforms them into the format required for indexing.
Elasticsearch Cluster: The core search engine providing inverted index capabilities.
Redis: Caches the most frequent and expensive search queries.
Simplicity Audit: This design avoids complex batch processing (Spark/Flink) for the MVP, relying on simple stream workers to keep the index updated, which is sufficient for 100M docs.
Architecture Decision Rationale:
Why this?: Using a dedicated search engine (Elasticsearch) is superior to SQL
LIKE queries for scale and relevance.Functional: Satisfies keyword search, filtering, and NRT updates.
Non-functional: Horizontal scaling of Elasticsearch replicas handles the 10k QPS requirement.
High Level Architecture
Sub-system Deep Dive
Edge (Optional)
Traffic Routing: AWS Route53 for global latency-based routing.
Security: API Gateway handles JWT validation and Rate Limiting (e.g., 100 requests/sec per user) to prevent scraping.
SSL: Termination at the Load Balancer.
Service
Topology & Scaling:
Search Service: Stateless pods running in Kubernetes (EKS). Auto-scales based on CPU/Request count.
Ingestion Worker: Consumer group in K8s, scales based on Kafka consumer lag.
API Schema Design:
GET /v1/search?q={query}&size=20&page=1&filter_category={cat}Protocol: REST/JSON.
Idempotent: Yes.
SLA: 200ms.
Resilience:
Circuit breaker (Resilience4j) on the connection to Elasticsearch to prevent cascading failures if ES is slow.
Exponential backoff for Ingestion Worker retries.
Storage
Access Pattern:
Write: Individual document updates via Ingestion Worker.
Read: High-frequency complex queries (aggregations, filters, text match).
Database Table Design (Elasticsearch Mapping):
id: Keyword (Primary Key)title: Text (Standard Analyzer)description: Text (Standard Analyzer)category: Keyword (For filtering)price: Scaled Float (For sorting/range)created_at: DateTechnical Selection: Elasticsearch
Rationale: Best-in-class for full-text search, natively supports horizontal sharding and replicas.
Distribution Logic:
Sharding: 10 shards (approx 10M docs per shard, ~50GB per shard).
Replication: 2 replicas per shard to handle read volume and failover.
Cache
Purpose: Reduce load on Elasticsearch for "Hot" queries (e.g., "iphone", "trending items").
Key-Value Schema:
key: search_hash(query_params), value: serialized_results.TTL: 5-10 minutes (balances freshness vs. performance).
Technical Selection: Redis (Cluster mode).
Messaging
Purpose: Decouples the Primary DB from the Search Index. If the Search Service or Indexer is down, events are persisted in Kafka.
Event Schema:
{ "op": "UPDATE", "doc_id": "123", "data": {...} }.Technical Selection: Kafka.
Rationale: High throughput and ability to replay events if the index needs to be rebuilt.
Wrap Up
Advanced Topics
Trade-offs: We choose Eventual Consistency over Strong Consistency. There might be a 1-2 second delay between a database update and search visibility. This is standard for search systems.
Reliability: If Redis fails, the system falls back to Elasticsearch. If Elasticsearch fails, the API Gateway returns a 503, but data is preserved in Kafka.
Optimization (Staff Level):
N-Gram Tokenization: For "Typeahead" to support partial word matching.
Force Merge: Periodically merging Elasticsearch segments during low-traffic hours to improve search performance.
Query Warming: Running "warm-up" queries against new Elasticsearch segments before they serve live traffic to populate the filesystem cache.