The Question
Design a Local Discovery & Review System
Design a high-scale platform that allows users to discover local businesses based on geographical proximity, supporting features such as location-based search, business metadata management, and a user-generated review system with aggregate ratings.
Elasticsearch
PostgreSQL
Redis
Kafka
Geohash
Questions & Insights
Clarifying Questions
What is the scale of the system? (Assumed: 50 million businesses, 100 million Daily Active Users (DAU), 100,000 Search QPS, 2,000 Review QPS).
What are the core functional priorities? (Assumed: Proximity-based search by category/name and posting reviews).
Does the search need to be real-time? (Assumed: Business information changes slowly, so near real-time (seconds/minutes) latency for new businesses appearing in search is acceptable).
How do we handle high-density areas vs. low-density areas? (Assumed: The indexing strategy must account for the difference between Manhattan and rural Montana).
Thinking Process
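The core challenge of a Yelp-like platform is efficient spatial querying combined with high read throughput.
How do we represent geography? Map 2D latitude/longitude onto an index that supports fast proximity lookups: a 1D key such as a Geohash, or a hierarchical structure such as a QuadTree.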
How do we optimize search? Decouple the "Business Metadata" from the "Spatial Index" to allow independent scaling of the high-frequency search path.
How do we handle write-heavy reviews? Use an asynchronous path to update aggregate ratings so search performance isn't degraded.
How do we ensure global low latency? Implement a multi-layered caching strategy for frequently searched locations and business profiles.
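To make the "2D to 1D" step concrete, here is a minimal sketch of standard Geohash encoding. The function name `geohash_encode` and the default precision of 6 are illustrative choices, not something prescribed above. The encoder interleaves longitude and latitude bits (longitude first) and emits one base-32 character per 5 bits, so nearby points usually share a prefix.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # Geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a latitude/longitude pair as a Geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0        # accumulates the current 5-bit group
    bit_count = 0
    use_lon = True  # bits alternate: longitude, latitude, longitude, ...

    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0

    return "".join(chars)


if __name__ == "__main__":
    print(geohash_encode(57.64911, 10.40744, precision=6))
```

A shorter hash is always a prefix of a longer hash for the same point, which is what makes prefix-based cell lookups and cache keys possible. Points on opposite sides of a cell boundary can still have unrelated prefixes, so a real search would also query the neighboring cells.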
Bonus Points
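Google S2 Geometry: Propose using S2 cells instead of Geohashes. S2 projects the sphere onto the faces of a cube, so cell sizes vary less with latitude than Geohash cells do, and it orders cells along a Hilbert curve, which keeps spatially close cells close in the 1D key space and simplifies "neighbor" lookups.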
CQRS Pattern: Separate the command (editing a business) from the query (searching for a business) using a search-optimized engine like Elasticsearch.
Write-back Caching: Use Redis for real-time view counts or "trending" metrics to avoid saturating the primary DB.
Cell-based Sharding: Shard the spatial index based on Geohash prefixes to ensure location-based queries hit a limited number of shards.
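The cell-based sharding idea can be sketched as follows. This is one possible routing scheme under stated assumptions, not the only one: the names `shard_for_cell` and `shards_for_query`, the prefix length of 3, and the shard count of 64 are all hypothetical. A stable hash of the Geohash prefix picks the shard, so every business in the same coarse cell lands on the same shard and a proximity query touches only the shards of the cells it covers.

```python
import hashlib


def shard_for_cell(geohash: str, prefix_len: int = 3, num_shards: int = 64) -> int:
    """Map a Geohash to a shard using a stable hash of its prefix."""
    prefix = geohash[:prefix_len]
    # hashlib is used instead of hash() so the mapping is stable across processes.
    digest = hashlib.sha256(prefix.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shards_for_query(cells: list[str], prefix_len: int = 3, num_shards: int = 64) -> set[int]:
    """Return the distinct shards a query over the given cells must contact."""
    return {shard_for_cell(cell, prefix_len, num_shards) for cell in cells}


if __name__ == "__main__":
    # Two cells sharing the prefix "dr5" resolve to a single shard.
    print(shards_for_query(["dr5ru7", "dr5rsq"]))
```

Hashing the prefix spreads dense regions across shards more evenly than assigning contiguous prefix ranges, at the cost of losing locality between adjacent coarse cells. A very dense prefix can still become a hot shard, in which case a longer prefix for that region is one option.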
Design Breakdown
Functional Requirements
Users can search for businesses based on their current location (latitude/longitude) and radius.
Users can filter searches by category (e.g., "Restaurants") and ratings.
Users can view detailed business information, including photos and reviews.
Users can post ratings and text reviews for businesses.
Non-Functional Requirements
High Availability: The system must be available for discovery even if the review-posting service is lagging.
Low Latency: Search results should return in < 200ms.
Scalability: Handle 100k+ searches per second during peak hours.
Consistency: Eventual consistency is acceptable for reviews and ratings; search results don't need to reflect a new business in milliseconds.
Estimation
Storage: 50M businesses * 2KB metadata = 100GB.
Reviews: 50M businesses * 100 reviews * 1KB = 5TB.
Read QPS: 100k (Search) + 50k (Profile views).
Write QPS: 2k (New reviews).
Bandwidth: 100k search requests/sec * 5KB response = 500MB/s outgoing.
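The figures above can be checked with a few lines of arithmetic. This sketch assumes decimal units (1KB = 1,000 bytes), which is what makes the totals come out to the round numbers listed; the variable names are illustrative.

```python
KB = 1_000
MB = 1_000 * KB
GB = 1_000 * MB
TB = 1_000 * GB

businesses = 50_000_000

# Storage: 50M businesses * 2KB of metadata each.
metadata_bytes = businesses * 2 * KB

# Reviews: 50M businesses * 100 reviews each * 1KB per review.
review_bytes = businesses * 100 * 1 * KB

# Read QPS: search traffic plus profile views.
read_qps = 100_000 + 50_000

# Bandwidth: 100k search responses per second * 5KB each.
egress_bytes_per_sec = 100_000 * 5 * KB

if __name__ == "__main__":
    print(f"metadata: {metadata_bytes / GB:.0f} GB")
    print(f"reviews:  {review_bytes / TB:.0f} TB")
    print(f"read QPS: {read_qps:,}")
    print(f"egress:   {egress_bytes_per_sec / MB:.0f} MB/s")
```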
Blueprint
Concise Summary: A microservices architecture leveraging a Geospatial search engine (Elasticsearch) for discovery and a relational database (PostgreSQL) for source-of-truth metadata.
Major Components:
Load Balancer & API Gateway: Routes traffic and handles authentication/rate limiting.
Search Service: Performs proximity-based queries using Geohashes.
Business/Review Service: Manages CRUD operations for business profiles and user reviews.
Spatial Index (Elasticsearch): Provides high-speed geo-distance filtering and full-text search.
Message Broker (Kafka): Decouples business updates from the search index synchronization.
Simplicity Audit: This architecture uses standard, proven components (Elasticsearch/Postgres) rather than custom-built QuadTrees to minimize time-to-market.
Architecture Decision Rationale:
Why this architecture?: Elasticsearch natively supports geo_point and geo_shape fields and the geo queries that run over them, handling the complexity of spatial indexing out of the box.
Functional Satisfaction: Meets search and review requirements through specialized read/write paths.
Non-functional Satisfaction: High availability is achieved through database replication and search engine clustering.
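As a rough illustration of what "out of the box" means here, the sketch below builds an Elasticsearch query body that combines a `geo_distance` filter with optional category, rating, and text clauses, sorted by distance. The helper name `build_search_query` and the field names `location`, `category`, `avg_rating`, and `name` are assumptions about the index mapping, which the design does not spell out. The function only builds the request body as a dictionary; sending it to a cluster is left out.

```python
from typing import Any, Optional


def build_search_query(
    lat: float,
    lon: float,
    radius_km: float,
    category: Optional[str] = None,
    min_rating: Optional[float] = None,
    text: Optional[str] = None,
    size: int = 20,
) -> dict[str, Any]:
    """Build a query body for a proximity search (field names are assumed)."""
    origin = {"lat": lat, "lon": lon}

    # Filters do not affect scoring and can be cached by Elasticsearch.
    filters: list[dict[str, Any]] = [
        {"geo_distance": {"distance": f"{radius_km}km", "location": origin}}
    ]
    if category is not None:
        filters.append({"term": {"category": category}})
    if min_rating is not None:
        filters.append({"range": {"avg_rating": {"gte": min_rating}}})

    bool_query: dict[str, Any] = {"filter": filters}
    if text:
        bool_query["must"] = [{"match": {"name": text}}]

    return {
        "size": size,
        "query": {"bool": bool_query},
        # Closest businesses first.
        "sort": [{"_geo_distance": {"location": origin, "order": "asc", "unit": "km"}}],
    }


if __name__ == "__main__":
    import json
    print(json.dumps(build_search_query(40.758, -73.9855, 5, category="pizza"), indent=2))
```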
High Level Architecture
Sub-system Deep Dive
Service
Topology: Services are deployed as Dockerized containers in an auto-scaling Kubernetes cluster.
Search API: GET /v1/search?lat=x&long=y&radius=5km&category=pizza. Uses Geohash-based sharding to route queries.
Review API: POST /v1/biz/{id}/reviews. Accepts JSON payload with rating and text.
Storage
Data Model:
Businesses Table: id (UUID), name, lat, long, category_id, avg_rating, review_count.
Reviews Table: id, business_id, user_id, rating, comment, created_at.
Database Logic: PostgreSQL handles ACID transactions for reviews. Elasticsearch stores a flattened version of business data with a geo_point field for search.
Cache
Technology: Redis.
Usage: Stores "Hot" business profiles (key: biz_id) and popular search results for specific Geohash cells (key: geohash_prefix_6).
TTL: 15 minutes for search results; 24 hours for business profiles (evicted on update).
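The caching rules above amount to a cache-aside pattern with per-key TTLs. The sketch below uses a small in-memory `TTLCache` class as a stand-in for Redis so that it runs without a server; in a real deployment the same three operations would map to Redis GET, SET with an expiry, and DEL. The key formats `biz:{id}` and `search:{geohash6}:{category}` and all function names are assumptions made for illustration.

```python
import time
from typing import Any, Callable, Optional

SEARCH_TTL_S = 15 * 60         # 15 minutes for search results
PROFILE_TTL_S = 24 * 60 * 60   # 24 hours for business profiles


class TTLCache:
    """In-memory stand-in for Redis: get, set with TTL, delete."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._store: dict[str, tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # lazily drop expired entries
            return None
        return value

    def set(self, key: str, value: Any, ttl_s: float) -> None:
        self._store[key] = (self._clock() + ttl_s, value)

    def delete(self, key: str) -> None:
        self._store.pop(key, None)


def search_cache_key(geohash: str, category: str) -> str:
    """Key search results by the 6-character Geohash cell and category."""
    return f"search:{geohash[:6]}:{category}"


def get_profile(cache: TTLCache, biz_id: str, load_from_db: Callable[[str], Any]) -> Any:
    """Cache-aside read: try the cache, fall back to the database, then populate."""
    key = f"biz:{biz_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    profile = load_from_db(biz_id)
    cache.set(key, profile, PROFILE_TTL_S)
    return profile


def on_business_updated(cache: TTLCache, biz_id: str) -> None:
    """Evict the profile on update so the next read reloads it."""
    cache.delete(f"biz:{biz_id}")
```

Keying search results by cell rather than by exact coordinates lets users who are near each other share a cache entry, at the cost of results that are computed for the cell rather than for the precise query point.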
Messaging
Technology: Kafka.
Structure: business-updates topic.
Logic: When a business is updated or a new review is posted, a message is emitted. The Indexer Worker consumes these to update the Elasticsearch index and re-calculate average ratings asynchronously.
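The asynchronous rating update can be sketched as an incremental running average, which avoids rescanning all reviews on each event. The event shape (`review_id`, `rating`), the `RatingAggregate` class, and the in-memory set of seen IDs are assumptions for illustration; the Kafka consumer loop itself is omitted. The duplicate check matters because a consumer may see the same message more than once after a retry or rebalance.

```python
from dataclasses import dataclass, field


@dataclass
class RatingAggregate:
    """Per-business aggregate, mirroring avg_rating and review_count."""
    avg_rating: float = 0.0
    review_count: int = 0
    # Illustrative only: a production worker would bound or persist this set.
    seen_review_ids: set = field(default_factory=set)


def apply_review_event(agg: RatingAggregate, event: dict) -> bool:
    """Fold one review event into the aggregate. Returns False for duplicates."""
    review_id = event["review_id"]
    if review_id in agg.seen_review_ids:
        return False  # already applied, e.g. a redelivered message

    # new_avg = (old_avg * old_count + rating) / (old_count + 1)
    total = agg.avg_rating * agg.review_count + event["rating"]
    agg.review_count += 1
    agg.avg_rating = total / agg.review_count
    agg.seen_review_ids.add(review_id)
    return True


if __name__ == "__main__":
    agg = RatingAggregate()
    for i, rating in enumerate([4, 4, 4, 5]):
        apply_review_event(agg, {"review_id": f"r{i}", "rating": rating})
    print(agg.avg_rating, agg.review_count)
```

After applying an event, the worker would write the new `avg_rating` and `review_count` to both the Businesses table and the search index.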
Wrap Up
Advanced Topics
Monitoring: Prometheus for latency/error rates; ELK stack for log aggregation; Zipkin for distributed tracing across services.
Trade-offs: We choose Eventual Consistency for ratings. A user might post a review, but the "Average Rating" in the search results might take 30-60 seconds to update. This significantly reduces the lock contention on the Business table.
Bottlenecks: High-density areas (e.g., Times Square) might result in very large search result sets. Optimization: Use "Grid-based Pagination" where we limit the number of results per Geohash cell.
Failure Handling: PostgreSQL uses Leader-Follower replication. Elasticsearch uses cross-cluster replication to ensure search availability if one data center fails.
Alternatives: Could use PostGIS instead of Elasticsearch. Decision: Elasticsearch is preferred because it handles full-text search (e.g., "Best gluten-free pizza") better than a relational database.
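The "Grid-based Pagination" optimization mentioned under Bottlenecks is not defined in detail, so the following is one plausible reading rather than a definitive implementation: group candidate results by Geohash cell and keep only the top few per cell, ranked by rating. The function name `cap_results_per_cell`, the result fields `geohash` and `avg_rating`, and the default limits are all hypothetical.

```python
from collections import defaultdict


def cap_results_per_cell(
    results: list[dict],
    cell_precision: int = 6,
    max_per_cell: int = 2,
) -> list[dict]:
    """Keep at most max_per_cell top-rated businesses from each Geohash cell."""
    by_cell: dict[str, list[dict]] = defaultdict(list)
    for biz in results:
        by_cell[biz["geohash"][:cell_precision]].append(biz)

    capped: list[dict] = []
    for cell in sorted(by_cell):  # deterministic cell order
        ranked = sorted(by_cell[cell], key=lambda b: b["avg_rating"], reverse=True)
        capped.extend(ranked[:max_per_cell])
    return capped


if __name__ == "__main__":
    candidates = [
        {"id": "a", "geohash": "dr5ru7x", "avg_rating": 4.8},
        {"id": "b", "geohash": "dr5ru7y", "avg_rating": 3.9},
        {"id": "c", "geohash": "dr5ru7z", "avg_rating": 4.5},
        {"id": "d", "geohash": "dr5ru6a", "avg_rating": 4.1},
    ]
    print([b["id"] for b in cap_results_per_cell(candidates)])
```

This bounds the result size for a dense area by the number of cells covered times the per-cell limit, so a Times Square query returns a spread of businesses across the area instead of thousands from a single block.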