The Question
Design

Scalable E-commerce Search System

Design a high-performance search service for an e-commerce platform that supports full-text search, faceted filtering, and near real-time updates for millions of products while maintaining sub-150ms latency at high query volumes.
Elasticsearch
Kafka
Redis
CDC
Kubernetes
Questions & Insights

Clarifying Questions

What is the scale of the data? (Assumed: 10 million searchable items/documents, averaging 2KB each).
What is the expected traffic? (Assumed: 10,000 Queries Per Second (QPS) at peak, with 100 updates per second).
What are the specific search requirements? (Assumed: Full-text search, fuzzy matching, filtering by categories/attributes, and basic relevance ranking).
What is the consistency requirement for updates? (Assumed: Eventual consistency is acceptable; updates should reflect in search results within 1-2 seconds).
Is personalization or AI-driven ranking required? (Assumed: Not for MVP; standard BM25/TF-IDF ranking is sufficient).
Assumptions Summary:
Data Scale: 10M documents.
Traffic: 10k QPS Search, 100 QPS Write.
Freshness: Near Real-Time (NRT) ~1 second.
Features: Text search, facets (filtering), and basic autocomplete.

Thinking Process

Core Inversion: How do we transform unstructured or relational data into a format optimized for sub-second retrieval? (Focus: Inverted Index).
Read Path Efficiency: How do we handle 10k QPS without overloading the primary search engine? (Focus: Scaling the Search Engine and Caching).
The Freshness Pipeline: How do we sync changes from the Source of Truth (Database) to the Search Index without impacting user-facing latency? (Focus: Event-driven ingestion).
Ranking and Relevance: How do we ensure the first page of results is what the user actually wants? (Focus: Scoring functions and analyzers).

Bonus Points

Hybrid Search: Implementing Vector Search (using embeddings + HNSW index) alongside traditional keyword search for semantic understanding.
Search Result Diversity: Implementing "de-biasing" or "diversification" algorithms to ensure the top results aren't all from the same category/seller.
Index Aliasing: Using zero-downtime index migrations via aliases (Atomic switch from index_v1 to index_v2).
Query Rewriting: Implementing a pre-processor for query expansion (synonyms), spell check, and intent detection (e.g., "cheap laptops" -> filter by price < $500).
Design Breakdown

Functional Requirements

Users can search via free-text keywords.
Users can filter results by category, price range, and rating.
System provides basic autocomplete/type-ahead suggestions.
Results are paginated and ranked by relevance.
New/Updated items appear in search within 2 seconds.

Non-Functional Requirements

Low Latency: Search results returned in < 150ms (p99).
High Availability: 99.99% uptime; search must work even if ingestion is delayed.
Scalability: Support horizontal scaling of both storage and query volume.
Consistency: Eventual consistency for index updates.

Estimation

Storage: 10M docs * 2KB = 20GB. Including overhead for inverted indices and replicas, ~100GB total storage (fits in RAM or fast NVMe).
Search QPS: 10,000 QPS. If one node handles 500 QPS, we need ~20 search nodes.
Ingestion: 100 updates/sec is negligible for modern message queues (Kafka) and search engines (Elasticsearch).
Bandwidth: 10k QPS * 10KB response = 100MB/s outgoing traffic.

Blueprint

Concise Summary: A microservices-based architecture utilizing an event-driven pipeline to sync a relational database (Source of Truth) to an Elasticsearch cluster for high-performance retrieval.
Major Components:
API Gateway: Entry point for authentication, rate limiting, and request routing.
Search Service: Stateless service that translates user queries into Elasticsearch DSL.
Elasticsearch Cluster: Distributed inverted index providing full-text search and analytical aggregations.
Ingestion Worker: Consumes change events and updates the search index.
Kafka: Decouples the primary database from the search index for reliability.
Redis: Caches popular query results to reduce cluster load.
Simplicity Audit: This design avoids complex data processing frameworks (like Spark/Flink) and uses a managed search engine (Elasticsearch) to handle the heavy lifting of indexing and ranking.
Architecture Decision Rationale:
Why this architecture?: Elasticsearch is the industry standard for text search due to its rich API, horizontal scalability, and built-in support for facets/filtering.
Functional Satisfaction: Covers text matching, filtering (via aggregations), and NRT updates.
Non-functional Satisfaction: Kafka provides "buffer" against spikes, while Elasticsearch replicas ensure high availability and read throughput.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing: DNS routes users to the nearest regional Load Balancer.
Security & Perimeter:
API Gateway: Handles JWT validation.
Rate Limiting: Throttles users exceeding 10 searches per second to prevent scraping.
WAF: Protects against SQL injection or Lucene injection attacks.

Service

Topology & Scaling:
Search Service: Stateless pods in K8s, scaling based on CPU and Request Count.
Isolation: Search traffic is isolated from Ingestion traffic to prevent write-heavy indexing from slowing down user queries.
API Schema Design:
Endpoint: GET /v1/search
Params: q (string), filters (JSON), sort (string), limit/offset (int).
Response: List of document snippets, total count, and aggregation buckets for filters.
Resilience & Reliability:
Circuit Breaker: If Elasticsearch latency spikes, the service returns cached results or an empty state rather than hanging.
Retries: Exponential backoff for transient network errors between Service and ES.

Storage

Access Pattern: Heavy read (100:1 ratio). Searches involve complex filtering and keyword matching across multiple fields.
Database Table Design (Elasticsearch Mapping):
id: Keyword (Primary Key)
title: Text (Standard Analyzer)
description: Text (Standard Analyzer)
price: Double (Range indexed)
category: Keyword (For exact filtering)
created_at: Date
Technical Selection: Elasticsearch.
Rationale: Built-in BM25 ranking, native support for "Aggregations" (facets), and horizontal sharding.
Distribution Logic:
Sharding: Sharded by document_id. 5 primary shards (for 10M docs) with 2 replicas each for high read throughput.

Cache

Purpose & Justification: Reduces load on Elasticsearch for top 1% of queries (e.g., "iphone", "laptop") which account for 20% of traffic.
Key-Value Schema:
Key: search:hash(query_params)
Value: JSON-serialized search response.
TTL: 5-10 minutes (Short TTL to maintain freshness).
Technical Selection: Redis.
Failure Handling: If Redis is down, the Search Service falls back directly to Elasticsearch (Cache-aside pattern).

Messaging

Purpose & Decoupling: Ensures that if the search cluster is undergoing maintenance, updates from the Primary DB are not lost.
Event Schema: { "op": "UPDATE", "id": "123", "data": {...} }.
Throughput & Partitioning: Kafka topic partitioned by document_id to ensure that updates for the same document are processed in order.
Technical Selection: Kafka. High throughput and 7-day retention for replayability.

Infrastructure (Optional)

Observability:
Metrics: Monitor "Search Latency" and "Index Freshness" (time from DB update to ES availability).
Traces: Distributed tracing (Jaeger) to track a request from Gateway -> Search Service -> Elasticsearch.
Wrap Up

Advanced Topics

Trade-offs: We choose Eventual Consistency over Strong Consistency. A product update might take ~1s to appear in search. This is a standard trade-off for high-performance search systems (CAP: Availability over Consistency).
Reliability: If the Ingestion Worker fails, Kafka stores the events. We monitor "Consumer Lag" to ensure the search index doesn't fall too far behind.
Bottleneck Analysis:
Hot Shards: If one category is extremely popular, we ensure the sharding key is document_id (high cardinality) rather than category_id.
Deep Paging: Using from/size in ES is expensive for deep pages (>10,000). We limit UI pagination or use search_after (cursor-based) for deep results.
Distinguishing Insights:
N+1 Prevention: Ensure the Search Service returns enough data so the frontend doesn't have to query the Primary DB for every item in the list.
Query Performance: Use filter context in Elasticsearch instead of query context for price/category filters to leverage ES internal bitset caching.