The Question

Design a High-Performance Search Service

Design a scalable search service for an e-commerce platform with 100 million items. The system must support full-text search, faceted filtering (e.g., by category or price), and near real-time updates. The design should handle 10,000 queries per second with sub-200ms latency, while ensuring high availability and fault tolerance. Discuss how you would handle data ingestion from a primary database, indexing strategies, and query optimization.

Topics: Elasticsearch, Kafka, Redis, CDC, Inverted Index, BM25, Kubernetes, REST

Questions & Insights

Clarifying Questions

Scale of Data & Traffic: How many documents/items are we indexing (e.g., 10M vs 10B), and what is the expected peak QPS for searches?

Data Freshness: How quickly must updates to the source data appear in search results (near real-time vs. daily batches)?

Search Complexity: Does the service require full-text search, fuzzy matching, faceted filtering, or advanced ML-based re-ranking?

Content Type: Are we searching structured data (e.g., products in an e-commerce store) or unstructured data (e.g., web pages or logs)?

Assumptions for MVP:

Scale: 100 million searchable documents.

Traffic: 10,000 QPS (Read-heavy).

Latency: Sub-200ms P99 for search queries.

Freshness: Near Real-Time (NRT) - updates visible within seconds.

Scope: Focused on a product/document search with basic filtering and ranking.

Thinking Process

To design a robust search service, we must solve for efficient data ingestion and low-latency retrieval.

How do we efficiently transform raw data into a searchable format? We implement an "Ingestion Pipeline" that consumes change events (CDC) from the primary database, transforms them into search documents, and writes them to the search engine, which builds the Inverted Index.

How do we achieve sub-200ms search across 100 million records? We use a specialized search engine (Elasticsearch/OpenSearch), which looks terms up in an inverted index instead of scanning documents.

How do we handle high read volume? We scale the search engine via read replicas and implement a caching layer for frequent queries.

How do we ensure search results are relevant? We apply query parsing and a scoring algorithm (like BM25) to rank documents by relevance.
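
To make the inverted index and BM25 ideas above concrete, here is a minimal in-memory sketch. It is not how Elasticsearch is implemented; the class name TinyIndex, the whitespace tokenizer, and the k1/b values (common defaults) are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict

K1, B = 1.2, 0.75  # commonly used BM25 defaults; assumed here


def tokenize(text):
    # Naive whitespace tokenizer; a real analyzer also strips punctuation, stems, etc.
    return text.lower().split()


class TinyIndex:
    """Toy inverted index: term -> {doc_id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.doc_len = {}

    def add(self, doc_id, text):
        tokens = tokenize(text)
        self.doc_len[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            self.postings[term][doc_id] = tf

    def search(self, query, top_k=10):
        n_docs = len(self.doc_len)
        if n_docs == 0:
            return []
        avg_len = sum(self.doc_len.values()) / n_docs
        scores = defaultdict(float)
        for term in tokenize(query):
            docs = self.postings.get(term, {})
            if not docs:
                continue
            # Lucene-style BM25 IDF: rare terms contribute more.
            idf = math.log(1 + (n_docs - len(docs) + 0.5) / (len(docs) + 0.5))
            for doc_id, tf in docs.items():
                # Length normalization: long documents are penalized.
                norm = tf + K1 * (1 - B + B * self.doc_len[doc_id] / avg_len)
                scores[doc_id] += idf * tf * (K1 + 1) / norm
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


index = TinyIndex()
index.add("a", "red running shoes")
index.add("b", "blue running shoes for trail running")
index.add("c", "leather wallet")
results = index.search("running shoes")
```

Only documents that share at least one term with the query are touched, which is why lookup cost depends on posting-list length rather than on the total number of documents.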

Bonus Points

Hybrid Search: Combining keyword-based (BM25) with vector-based (Semantic) search using embeddings for better intent matching.

Segment-based Architecture: Explaining how Elasticsearch manages segments, merges, and the impact of "refresh" intervals on performance vs. freshness.

Query Rewriting: Implementing a query expansion layer using a synonym dictionary or LLMs to improve recall.

Data Backfill Strategy: Using a "Dual-Write" or "Snapshot-and-Stream" approach to re-index data without downtime during schema migrations.
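
As a rough sketch of the dictionary-based query rewriting mentioned above, the function below expands each query token into a group of alternatives expressed as an Elasticsearch bool query. The synonym table, the choice of the title field, and the function name expand_query are hypothetical.

```python
# Hypothetical synonym dictionary; in practice this would be curated or mined from logs.
SYNONYMS = {
    "sneakers": ["trainers", "running shoes"],
    "tv": ["television"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Build a bool query where every token must match itself OR one of its synonyms."""
    clauses = []
    for token in query.lower().split():
        variants = [token] + synonyms.get(token, [])
        clauses.append({
            "bool": {
                "should": [{"match_phrase": {"title": v}} for v in variants],
                "minimum_should_match": 1,
            }
        })
    return {"bool": {"must": clauses}}


expanded = expand_query("cheap sneakers")
```

Expansion improves recall at the cost of precision, so the synonym list is usually kept small and reviewed against real queries.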

Design Breakdown

Functional Requirements

Core Use Cases:

Text-based search with keyword matching.

Filtering results by categories, price, or tags (Faceting).

Sorting by relevance, date, or specific attributes.

Basic "Typeahead" suggestions.

Scope Control:

In-scope: Text search, indexing pipeline, and ranking.

Out-of-scope: Complex ML personalization, image search, and deep web crawling.

Non-Functional Requirements

Scale: Support for 100M documents and 10k QPS.

Latency: P99 search response under 200ms.

Availability & Reliability: 99.99% uptime; search must work even if the ingestion pipeline is delayed.

Consistency: Eventual consistency is acceptable (NRT).

Fault Tolerance: Automatic recovery from node failures in the search cluster.

Estimation

Traffic Estimation:

10k QPS average, 20k QPS peak.

Write QPS: ~1k document updates/sec.

Storage Estimation:

100M documents. Avg size 5KB = 500GB raw.

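Indexing overhead (Inverted index, doc values) ~2x = 1TB per copy of the index. With the 2 replicas per shard chosen in the Storage section, the cluster holds 3 copies, or roughly 3TB in total.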

Bandwidth Estimation:

Outgoing: 10k QPS * 50KB (assumed average size of a search response) = 500MB/s (4 Gbps). At the 20k QPS peak this doubles to 1GB/s (8 Gbps).
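
The arithmetic behind these estimates can be checked with a few lines. The inputs are the figures stated above, in decimal units (1KB = 1,000 bytes); the variable names are illustrative.

```python
DOCS = 100_000_000
AVG_DOC_BYTES = 5_000        # 5KB per document
INDEX_OVERHEAD = 2           # inverted index + doc values, ~2x raw
AVG_QPS = 10_000
PEAK_QPS = 20_000
RESPONSE_BYTES = 50_000      # 50KB per search response

raw_gb = DOCS * AVG_DOC_BYTES / 1e9                 # 500.0 GB raw
indexed_tb = raw_gb * INDEX_OVERHEAD / 1e3          # 1.0 TB per index copy

avg_egress_mb_s = AVG_QPS * RESPONSE_BYTES / 1e6    # 500.0 MB/s
avg_egress_gbps = avg_egress_mb_s * 8 / 1e3         # 4.0 Gbps
peak_egress_gbps = PEAK_QPS * RESPONSE_BYTES * 8 / 1e9  # 8.0 Gbps

print(raw_gb, indexed_tb, avg_egress_gbps, peak_egress_gbps)
```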

Blueprint

The design follows an event-driven architecture to decouple the source of truth from the search index.

Search Service: A stateless microservice that parses user queries, applies filters, and interacts with the search engine.

Ingestion Worker: Consumes change events from the primary database and transforms them into the format required for indexing.

Elasticsearch Cluster: The core search engine providing inverted index capabilities.

Redis: Caches the most frequent and expensive search queries.

Simplicity Audit: This design avoids heavyweight data-processing frameworks (Spark/Flink) for the MVP, relying on simple stream workers to keep the index updated, which is sufficient for 100M docs and ~1k updates/sec.

Architecture Decision Rationale:

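Why this?: A dedicated search engine (Elasticsearch) is a better fit than SQL LIKE queries. A LIKE '%term%' predicate cannot use a regular B-tree index, so it degrades to a scan at this scale, and it offers no relevance scoring.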

Functional: Satisfies keyword search, filtering, and NRT updates.

Non-functional: Horizontal scaling of Elasticsearch replicas handles the 10k QPS requirement.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Traffic Routing: AWS Route53 for global latency-based routing.

Security: API Gateway handles JWT validation and Rate Limiting (e.g., 100 requests/sec per user) to prevent scraping.

SSL: Termination at the Load Balancer.
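
Rate limiting of the kind described above is often implemented as a token bucket. The sketch below is one such bucket with the 100 requests/sec figure from the text; the class name, the burst capacity, and the injectable clock are assumptions, and a real gateway would keep one bucket per user in shared storage.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate=100.0, capacity=100.0, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so the behavior can be tested
        self.tokens = capacity
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


fake_now = [0.0]
bucket = TokenBucket(rate=100.0, capacity=100.0, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(101)]   # 100 allowed, the 101st rejected
fake_now[0] = 0.5                              # half a second later: 50 tokens refilled
refill = [bucket.allow() for _ in range(51)]
```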

Service

Topology & Scaling:

Search Service: Stateless pods running in Kubernetes (EKS). Auto-scales based on CPU/Request count.

Ingestion Worker: Consumer group in K8s, scales based on Kafka consumer lag.

API Schema Design:

GET /v1/search?q={query}&size=20&page=1&filter_category={cat}

Protocol: REST/JSON.

Idempotent: Yes.

SLA: 200ms (P99).
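
One way the Search Service might translate this endpoint's parameters into an Elasticsearch request body is sketched below. The bool, multi_match, and term constructs are standard query DSL; the function name, the title boost, and the size limit of 100 are assumptions.

```python
def build_search_body(q, size=20, page=1, filter_category=None):
    """Translate GET /v1/search parameters into an Elasticsearch request body."""
    if page < 1 or not 1 <= size <= 100:
        raise ValueError("page must be >= 1 and size must be between 1 and 100")

    bool_query = {
        # Scored full-text match; title weighted twice as much as description (assumed).
        "must": [{"multi_match": {"query": q, "fields": ["title^2", "description"]}}]
    }
    if filter_category:
        # Filters do not affect scoring and can be cached by the engine.
        bool_query["filter"] = [{"term": {"category": filter_category}}]

    return {
        "from": (page - 1) * size,   # page is 1-based in the API
        "size": size,
        "query": {"bool": bool_query},
    }


body = build_search_body("iphone", size=20, page=3, filter_category="phones")
```

Keeping the category in the filter clause rather than in must means it narrows the result set without changing relevance scores.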

Resilience:

Circuit breaker (Resilience4j) on the connection to Elasticsearch to prevent cascading failures if ES is slow.

Exponential backoff for Ingestion Worker retries.
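
The two resilience patterns above can be sketched in a few lines. This is a simplified illustration of the ideas, not the Resilience4j API (which is a Java library); the thresholds, the full-jitter strategy, and all names are assumptions.

```python
import random
import time


def backoff_delays(base=0.1, cap=5.0, attempts=5, rng=random.random):
    """Yield 'full jitter' delays: a random value in [0, min(cap, base * 2**n)]."""
    for n in range(attempts):
        yield rng() * min(cap, base * 2 ** n)


class CircuitBreaker:
    """Fail fast after repeated errors, then allow a trial call after a cool-down."""

    def __init__(self, failure_threshold=3, reset_after=10.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            # Cool-down elapsed: fall through and let one trial call run (half-open).
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

In this design the breaker would wrap the Search Service's calls to Elasticsearch, while the Ingestion Worker would sleep for each delay from backoff_delays between retries.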

Storage

Access Pattern:

Write: Individual document updates via Ingestion Worker.

Read: High-frequency complex queries (aggregations, filters, text match).

Database Table Design (Elasticsearch Mapping):

id: Keyword (also used as the Elasticsearch document _id, which plays the role of a primary key)

title: Text (Standard Analyzer)

description: Text (Standard Analyzer)

category: Keyword (For filtering)

price: Scaled Float (For sorting/range; requires a scaling factor, e.g. 100 to store cents)

created_at: Date
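
Written out as an index-creation body, the mapping above might look like the following. The field types are real Elasticsearch types; the scaling_factor of 100 (two decimal places) and the variable name are assumptions.

```python
import json

PRODUCT_MAPPING = {
    "mappings": {
        "properties": {
            "id": {"type": "keyword"},
            "title": {"type": "text", "analyzer": "standard"},
            "description": {"type": "text", "analyzer": "standard"},
            "category": {"type": "keyword"},   # exact match, used for filters and facets
            # scaled_float stores price * scaling_factor as an integer (assumed: cents).
            "price": {"type": "scaled_float", "scaling_factor": 100},
            "created_at": {"type": "date"},
        }
    }
}

# This JSON would be sent as the body of the create-index request.
print(json.dumps(PRODUCT_MAPPING, indent=2))
```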

Technical Selection: Elasticsearch

Rationale: Best-in-class for full-text search, natively supports horizontal sharding and replicas.

Distribution Logic:

Sharding: 10 primary shards (approx 10M docs per shard; ~50GB of raw data per shard, or ~100GB on disk with the ~2x indexing overhead from the estimation).

Replication: 2 replicas per shard to handle read volume and failover.
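
The sharding and replication numbers above work out as follows. The routing function is only a stand-in: Elasticsearch routes documents with its own hash of the routing value modulo the number of primary shards, and md5 is used here purely to get a stable hash for illustration.

```python
import hashlib

PRIMARY_SHARDS = 10
REPLICAS_PER_SHARD = 2
INDEXED_TB = 1.0            # one copy of the index, from the storage estimate


def shard_for(doc_id, num_shards=PRIMARY_SHARDS):
    """Stand-in for hash-based routing: the same id always lands on the same shard."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


total_shard_copies = PRIMARY_SHARDS * (1 + REPLICAS_PER_SHARD)   # 30
cluster_tb = INDEXED_TB * (1 + REPLICAS_PER_SHARD)               # 3.0
gb_per_shard = INDEXED_TB * 1000 / PRIMARY_SHARDS                # 100.0

print(total_shard_copies, cluster_tb, gb_per_shard, shard_for("123"))
```

Because the shard is derived from the hash modulo the primary shard count, changing the number of primaries later generally means reindexing, which is one reason to size shards up front.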

Cache

Purpose: Reduce load on Elasticsearch for "Hot" queries (e.g., "iphone", "trending items").

Key-Value Schema: key: search_hash(query_params), value: serialized_results.

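TTL: 5-10 minutes (balances freshness vs. performance). Note that cached result pages can lag the index by up to the TTL, so fields that must stay fresh, such as price or stock, may need a shorter TTL or a lookup at render time.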

Technical Selection: Redis (Cluster mode).
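
A minimal cache-aside sketch for this layer is shown below. An in-memory TTLCache stands in for Redis so the example is self-contained (with the redis-py client, the equivalent calls would be get(key) and set(key, value, ex=ttl)); the key normalization rules and all names are assumptions.

```python
import hashlib
import json
import time


def cache_key(params):
    """Hash normalized query params so equivalent requests share one cache entry."""
    normalized = {}
    for name, value in params.items():
        if value is None or value == "":
            continue
        text = " ".join(str(value).split())          # trim and collapse whitespace
        # Only the free-text query is lowercased; keyword filters are case-sensitive.
        normalized[name] = text.lower() if name == "q" else text
    payload = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return "search:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


class TTLCache:
    """In-memory stand-in for Redis with per-entry expiry."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)


def cached_search(params, cache, search_fn):
    """Cache-aside: try the cache, fall back to the search engine, then populate."""
    key = cache_key(params)
    hit = cache.get(key)
    if hit is not None:
        return hit
    result = search_fn(params)
    cache.set(key, result)
    return result
```

Normalizing before hashing raises the hit rate for hot queries, since "iPhone" and " iphone " resolve to the same entry.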

Messaging

Purpose: Decouples the Primary DB from the Search Index. If the Ingestion Worker or the Elasticsearch cluster is down, change events are retained in Kafka and processed once the consumer recovers.

Event Schema: { "op": "UPDATE", "doc_id": "123", "data": {...} }.

Technical Selection: Kafka.

Rationale: High throughput and ability to replay events if the index needs to be rebuilt.
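
To illustrate what the Ingestion Worker does with these events, the sketch below converts a batch into the newline-delimited body expected by the Elasticsearch bulk API. It assumes each event's "data" carries the full document, and that "CREATE" and "DELETE" ops exist alongside the "UPDATE" shown in the schema; the index name "products" and the function names are hypothetical.

```python
import json


def to_bulk_lines(event, index="products"):
    """Map one change event to bulk API lines (action line, plus source for writes)."""
    op = event["op"].upper()
    meta = {"_index": index, "_id": event["doc_id"]}
    if op == "DELETE":
        return [json.dumps({"delete": meta})]
    if op in ("CREATE", "UPDATE"):
        # "index" overwrites the whole document, so replaying an event is harmless.
        return [json.dumps({"index": meta}), json.dumps(event["data"])]
    raise ValueError(f"unsupported op: {event['op']}")


def to_bulk_body(events, index="products"):
    lines = []
    for event in events:
        lines.extend(to_bulk_lines(event, index))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


bulk_body = to_bulk_body([
    {"op": "UPDATE", "doc_id": "123", "data": {"title": "Red running shoes", "price": 59.9}},
    {"op": "DELETE", "doc_id": "9"},
])
```

Batching events into bulk requests keeps write overhead low at ~1k updates/sec, and full-document "index" actions make Kafka replays idempotent.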

Wrap Up

Advanced Topics

Trade-offs: We choose Eventual Consistency over Strong Consistency. There might be a 1-2 second delay between a database update and search visibility, driven by pipeline lag plus the Elasticsearch refresh interval (1 second by default). This is standard for search systems.

Reliability: If Redis fails, the system falls back to querying Elasticsearch directly. If Elasticsearch fails, search requests return a 503, but pending index updates are retained in Kafka and applied once the cluster recovers.

Optimization (Staff Level):

N-Gram Tokenization: For "Typeahead" to support partial word matching.

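Force Merge: Merging Elasticsearch segments during low-traffic hours to improve search performance. Force merge is best suited to indices that no longer receive writes, so on this continuously updated index it should be used sparingly, for example right after a full re-index.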

Query Warming: Running representative "warm-up" queries against new nodes or a freshly built index before they serve live traffic, to populate the filesystem cache.
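
The N-gram idea for typeahead can be sketched with edge n-grams, i.e. prefixes of each token indexed at write time so that a prefix lookup becomes a single dictionary hit. Elasticsearch offers an edge_ngram tokenizer for this; the in-memory Typeahead class and the gram sizes below are illustrative assumptions.

```python
from collections import defaultdict


def edge_ngrams(token, min_gram=2, max_gram=10):
    """Return the prefixes of `token` with lengths from min_gram to max_gram."""
    token = token.lower()
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]


class Typeahead:
    """Toy prefix index: every edge n-gram points back to the full phrase."""

    def __init__(self):
        self.index = defaultdict(set)

    def add(self, phrase):
        for token in phrase.split():
            for gram in edge_ngrams(token):
                self.index[gram].add(phrase)

    def suggest(self, prefix, limit=5):
        # Alphabetical order for determinism; a real system would rank by popularity.
        return sorted(self.index.get(prefix.lower(), set()))[:limit]


typeahead = Typeahead()
typeahead.add("iPhone 15 case")
typeahead.add("iPad mini")
suggestions = typeahead.suggest("iph")
```

The trade-off is a larger index in exchange for cheap prefix lookups at query time, which suits a read-heavy workload.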