The Question
Design

Scalable Proximity-Based Discovery System

Design a local discovery and review platform similar to Yelp. The system must support searching for millions of businesses based on real-time geographic location, viewing rich business profiles, and handling high-concurrency user reviews. Focus on the geospatial indexing strategy, data consistency for business ratings, and how to scale the system for global traffic with low-latency search results.
PostgreSQL, PostGIS, Redis, Kafka, Geohash, S2 Geometry, CDN, Kubernetes, Microservices
Questions & Insights

Clarifying Questions

What is the scale of the data? (Assumption: 100 million businesses globally, 50 million Daily Active Users (DAU), and an average of 10-20 reviews per business).
What are the primary search criteria? (Assumption: Primarily location-based "near me" searches, with secondary filters for category, price, and rating).
What is the expected latency for search? (Assumption: P99 < 200ms for geospatial queries).
How frequent are business information updates? (Assumption: Low write volume for business metadata; high write volume for reviews/ratings, but eventual consistency is acceptable).

Thinking Process

Core Bottleneck: Efficiently querying 2D coordinate data (latitude/longitude) in a 1D database index.
Progressive Approach:
How do we store business metadata and handle basic CRUD?
How do we convert 2D coordinates into a searchable format (Geohashing vs. Quadtree)?
How do we scale the read-heavy search traffic?
How do we handle high-frequency review ingestion without impacting search performance?
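
To make the 2D-to-1D conversion step concrete, here is a minimal sketch of standard Geohash encoding in Python. The function name `geohash_encode` and the default precision are illustrative choices, not part of the original design. The idea is to interleave longitude and latitude bisection bits and emit one base-32 character per 5 bits, so nearby points tend to share a string prefix that an ordinary B-tree index can range-scan.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # Geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a latitude/longitude pair as a Geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0          # accumulator for the current 5-bit group
    bit_count = 0
    use_lon = True    # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)
```

A shorter hash is always a prefix of a longer hash for the same point, which is what makes prefix-based lookups possible. The known weakness is that two points on opposite sides of a cell boundary can have very different hashes, so a real search typically also checks the neighboring cells.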

Bonus Points

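S2 Geometry: Discuss using Google’s S2 library for spherical geometry. S2 projects the sphere onto the faces of a cube and orders cells along a Hilbert curve, which gives more uniform cell sizes and better locality than Geohash's Z-order layout, with fewer discontinuities at cell boundaries.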
Dynamic Grid Density: Explain how to adjust the depth of the spatial index (e.g., smaller cells in San Francisco, larger cells in rural Nebraska) to balance load.
Read-Optimized Views: Mention Materialized Views or CQRS to separate the complex business-searching logic from the metadata management.
Geo-Sharding: Partitioning data by geographic regions (e.g., continent or country) to ensure data locality and reduce cross-region latency.
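
As an illustration of dynamic grid density, the following is a minimal quadtree sketch in Python, assuming a hypothetical per-leaf `capacity` and `max_depth`. A leaf splits into four children once it holds more points than its capacity, so dense areas end up with small, deep cells while sparse areas stay coarse. The class and field names are invented for this example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]  # (lat, lon)


@dataclass
class QuadNode:
    south: float
    west: float
    north: float
    east: float
    capacity: int = 4      # hypothetical max points per leaf before splitting
    max_depth: int = 12    # stops runaway splitting on duplicate points
    depth: int = 0
    points: List[Point] = field(default_factory=list)
    children: List["QuadNode"] = field(default_factory=list)

    def insert(self, p: Point) -> None:
        if self.children:
            self._child_for(p).insert(p)
            return
        self.points.append(p)
        if len(self.points) > self.capacity and self.depth < self.max_depth:
            self._split()

    def _split(self) -> None:
        mid_lat = (self.south + self.north) / 2
        mid_lon = (self.west + self.east) / 2

        def make(s: float, w: float, n: float, e: float) -> "QuadNode":
            return QuadNode(s, w, n, e, self.capacity, self.max_depth, self.depth + 1)

        # Order must match the index computed in _child_for: SW, SE, NW, NE.
        self.children = [
            make(self.south, self.west, mid_lat, mid_lon),
            make(self.south, mid_lon, mid_lat, self.east),
            make(mid_lat, self.west, self.north, mid_lon),
            make(mid_lat, mid_lon, self.north, self.east),
        ]
        pending, self.points = self.points, []
        for p in pending:
            self._child_for(p).insert(p)

    def _child_for(self, p: Point) -> "QuadNode":
        mid_lat = (self.south + self.north) / 2
        mid_lon = (self.west + self.east) / 2
        idx = (2 if p[0] >= mid_lat else 0) + (1 if p[1] >= mid_lon else 0)
        return self.children[idx]

    def leaf_for(self, p: Point) -> "QuadNode":
        node = self
        while node.children:
            node = node._child_for(p)
        return node
```

Inserting a tight cluster of points around San Francisco and a single point in Nebraska leaves the Nebraska point in a much shallower (larger) leaf than the San Francisco points, which is the load-balancing effect described above.
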
Design Breakdown

Functional Requirements

Core Use Cases:
Users can search for businesses within a specified radius or bounding box.
Users can view detailed business profiles (metadata, reviews, photos).
Users can post reviews and ratings for a business.
Business owners can add or update their business information.
Scope Control:
In-scope: Search, Business Metadata, Reviews, Basic Ranking.
Out-of-scope: Reservation systems, Food delivery tracking, Social feeds/friends, Ad-bidding engines.

Non-Functional Requirements

Scale: Must support 500 million searches per day and 100 million businesses.
Latency: P99 under 200 ms for geospatial search (matching the clarifying assumption); fast profile loading, served mostly from cache.
Availability & Reliability: 99.99% availability (CAP: Availability over Consistency).
Consistency: Eventual consistency for search results and reviews (seconds to minutes delay is fine).
Security: OAuth2/JWT for user actions; protection against review spam.

Estimation

Traffic Estimation:
Search: 50M DAU * 10 searches/day = 500M searches/day.
QPS: 500M / 86400 ≈ 6,000 Average QPS. Peak QPS ≈ 12,000.
Writes: 50M DAU * 0.1 reviews/day = 5M reviews/day ≈ 60 QPS.
Storage Estimation:
100M businesses * 1KB/metadata = 100GB.
2B reviews * 2KB/review = 4TB.
Images: 5 photos/business * 100M businesses * 200KB/photo = 100TB (stored in an object store).
Bandwidth Estimation:
Outgoing: 6k QPS * (10 results * 5KB/result) ≈ 300 MB/s.
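
The arithmetic above can be reproduced with a short Python sketch. The input figures come from the estimates in this section; the 2x peak factor is inferred from the 6,000 versus 12,000 QPS figures, decimal units (1 KB = 1,000 bytes) are an assumption, and the function name `estimate` is illustrative.

```python
SECONDS_PER_DAY = 86_400
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12  # decimal units (assumption)


def estimate() -> dict:
    dau = 50_000_000
    businesses = 100_000_000

    searches_per_day = dau * 10            # 10 searches per user per day
    reviews_per_day = dau // 10            # 0.1 reviews per user per day
    avg_search_qps = searches_per_day / SECONDS_PER_DAY

    return {
        "searches_per_day": searches_per_day,
        "avg_search_qps": avg_search_qps,
        "peak_search_qps": 2 * avg_search_qps,      # assumed 2x peak factor
        "review_write_qps": reviews_per_day / SECONDS_PER_DAY,
        "metadata_bytes": businesses * 1 * KB,      # 1 KB per business
        "review_bytes": 2_000_000_000 * 2 * KB,     # 2B reviews at 2 KB each
        "image_bytes": 5 * businesses * 200 * KB,   # 5 photos at 200 KB each
        "egress_bytes_per_s": avg_search_qps * 10 * 5 * KB,  # 10 results at 5 KB
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:,.0f}")
```

The exact average is about 5,787 QPS, which the text rounds up to 6,000; using the unrounded figure gives roughly 289 MB/s of egress rather than 300 MB/s.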

Blueprint

Concise Summary: A microservices architecture leveraging a Geospatial-capable database (PostgreSQL + PostGIS) for precise searching and a distributed cache for high-traffic business profiles.
Major Components:
API Gateway: Handles authentication, rate limiting, and request routing.
Search Service: Executes geospatial queries using Geohashing or spatial indexes.
Business Service: Manages CRUD operations for business profiles.
Review Service: Handles the high-volume write stream of user ratings and reviews.
PostgreSQL + PostGIS: The source of truth providing robust spatial indexing.
Redis: Caches "hot" business profiles and frequently accessed search results.
Simplicity Audit: This architecture avoids complex stream processing or multi-tier spatial trees for the MVP. It relies on the mature PostGIS extension, which scales well vertically; if needed, the data can later be sharded horizontally by region at the application layer.
Architecture Decision Rationale:
Why this architecture?: Relational DBs with PostGIS are the gold standard for combined metadata/geospatial queries.
Functional Satisfaction: Meets search, profile, and review requirements through dedicated services.
Non-functional Satisfaction: Scalability is achieved via read-replicas and Redis; availability is managed through service redundancy.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing:
CDN: Use Cloudflare or CloudFront to cache static business photos and UI assets globally.
Anycast DNS: Routes users to the nearest regional data center.
Security & Perimeter:
API Gateway: Implements JWT validation and standard rate limiting (e.g., 100 requests/minute per user) to prevent scraping.
WAF: Standard protection against SQL injection on the search parameters.
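
The per-user rate limit mentioned above could look roughly like the following fixed-window counter in Python. This is an in-process sketch only: a real gateway would usually keep the counters in a shared store, and the class name `FixedWindowLimiter` and the injectable `clock` parameter are assumptions made for illustration and testing.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allow at most `limit` requests per user in each fixed time window."""

    def __init__(
        self,
        limit: int = 100,
        window_s: int = 60,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        # (user_id, window_index) -> request count.
        # Old windows are never evicted in this sketch.
        self._counts: Dict[Tuple[str, int], int] = defaultdict(int)

    def allow(self, user_id: str) -> bool:
        window = int(self.clock() // self.window_s)
        key = (user_id, window)
        if self._counts[key] >= self.limit:
            return False
        self._counts[key] += 1
        return True
```

Fixed windows are simple but allow short bursts of up to twice the limit across a window boundary; a sliding window or token bucket smooths that out at the cost of more state.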

Service

Topology & Scaling:
Stateless Services: All services are deployed in containers (Kubernetes) allowing horizontal scaling based on CPU utilization.
Isolation: Search and Review services are isolated so a spike in review-writing doesn't impact search availability.
API Schema Design:
GET /v1/search?lat=...&long=...&radius=...&category=... (REST)
Response: List of Business Summaries.
GET /v1/business/{id} (REST)
Response: Full profile + Top 5 reviews.
POST /v1/reviews (REST)
Request: BusinessID, Rating, Comment.
Idempotency: Client-generated UUID for the review.
Resilience & Reliability:
Retries: Exponential backoff for Search Service -> DB connections.
Circuit Breaker: If the Review Service is down, the Business Service can still serve profiles without reviews.
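
To illustrate the idempotency rule for POST /v1/reviews, here is a minimal in-memory sketch in Python. It assumes the client-generated UUID is used as the review's primary key and that ratings range from 1 to 5; both the `ReviewStore` class and the rating range are assumptions, not details from the design above.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Review:
    review_id: str     # client-generated UUID, doubles as the idempotency key
    business_id: str
    user_id: str
    rating: int
    comment: str


class ReviewStore:
    def __init__(self) -> None:
        self._by_id: Dict[str, Review] = {}

    def submit(self, review: Review) -> Tuple[Review, bool]:
        """Store a review once; return (stored_review, created_flag)."""
        if not 1 <= review.rating <= 5:   # assumed rating scale
            raise ValueError("rating must be between 1 and 5")
        existing = self._by_id.get(review.review_id)
        if existing is not None:
            # Retried request: return the original, do not write again.
            return existing, False
        self._by_id[review.review_id] = review
        return review, True
```

With this shape, a client that times out and retries the same request gets the original review back instead of creating a duplicate. In a database-backed service the same effect typically comes from a unique constraint on the review ID.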

Storage

Access Pattern:
Search: High Read, Complex Range Queries.
Profiles: High Read, Point Queries (Key-Value style).
Reviews: High Write, Time-series ordered.
Database Table Design:
Businesses: id (UUID), name, location (GEOGRAPHY), geohash (Varchar), metadata (JSONB).
Reviews: id, business_id, user_id, rating, comment, created_at.
Technical Selection: PostgreSQL + PostGIS.
Rationale: PostGIS provides GiST indexes, which behave like an R-tree for spatial queries and are more flexible than Geohash prefixes for complex polygons and radius searches.
Distribution Logic:
Sharding: Shard by location_id or city_id. Businesses in NYC rarely need to be queried alongside businesses in London.
Replication: 1 Primary (Writes) + 3 Read Replicas (Search traffic).
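
The radius search that the spatial index serves can be modeled in plain Python as a two-step filter: a cheap bounding-box check followed by an exact great-circle distance check. This is only a sketch of the query semantics, not of how PostGIS executes it; the function names and the `(id, lat, lon)` tuple layout are assumptions, and antimeridian wrap-around is ignored.

```python
import math
from typing import Iterable, List, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometers."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def search_radius(
    businesses: Iterable[Tuple[str, float, float]],
    lat: float,
    lon: float,
    radius_km: float,
) -> List[str]:
    """Return business IDs within radius_km, nearest first."""
    ang = radius_km / EARTH_RADIUS_KM          # angular radius in radians
    dlat = math.degrees(ang)
    cos_lat = math.cos(math.radians(lat))
    if cos_lat <= math.sin(ang):
        dlon = 180.0                           # circle reaches a pole
    else:
        dlon = math.degrees(math.asin(math.sin(ang) / cos_lat))

    hits = []
    for biz_id, blat, blon in businesses:
        # Step 1: bounding-box prefilter (what an index lookup would narrow to).
        if abs(blat - lat) > dlat or abs(blon - lon) > dlon:
            continue
        # Step 2: exact distance check.
        d = haversine_km(lat, lon, blat, blon)
        if d <= radius_km:
            hits.append((d, biz_id))
    return [biz_id for _, biz_id in sorted(hits)]
```

In the real system the bounding-box step is what the index accelerates, so only a small candidate set reaches the exact distance computation.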

Cache

Purpose & Justification: Reduces DB load for "Hot" businesses (e.g., Starbucks in Times Square) and popular search keywords.
Key-Value Schema:
Key: biz:{id}, Value: Serialized JSON Profile, TTL: 1 Hour.
Key: search:{geohash_prefix}, Value: List of IDs, TTL: 15 Minutes.
Technical Selection: Redis.
Failure Handling: If Redis fails, the system falls back to the Read Replicas. To avoid a "thundering herd" when many keys expire at the same moment, we add random jitter to each TTL so expirations are spread out.
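
The TTL jitter idea can be sketched as a small cache-aside wrapper in Python. A plain dictionary stands in for Redis here, and the `ProfileCache` class, the `loader` callback, and the 10% spread are all assumptions chosen for illustration.

```python
import random
import time
from typing import Any, Callable, Dict, Tuple


def jittered_ttl(base_s: float, spread: float = 0.1) -> float:
    """Return base_s varied by up to +/- spread (10% by default)."""
    return base_s * (1 + random.uniform(-spread, spread))


class ProfileCache:
    """Cache-aside lookup: serve from cache, else load and store with a jittered TTL."""

    def __init__(
        self,
        loader: Callable[[str], Any],          # e.g. a read-replica query
        base_ttl_s: float = 3600,              # 1 hour, as in the key schema
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.loader = loader
        self.base_ttl_s = base_ttl_s
        self.clock = clock
        self._entries: Dict[str, Tuple[Any, float]] = {}  # id -> (value, expires_at)

    def get(self, biz_id: str) -> Any:
        now = self.clock()
        entry = self._entries.get(biz_id)
        if entry is not None and entry[1] > now:
            return entry[0]
        value = self.loader(biz_id)
        self._entries[biz_id] = (value, now + jittered_ttl(self.base_ttl_s))
        return value
```

Jitter spreads out expirations, but it does not stop many concurrent requests from missing on the same hot key at once; that case usually needs request coalescing or a per-key lock in addition.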

Messaging

Purpose & Decoupling: Kafka is used to decouple the critical path of posting a review from the background tasks like updating the business's average rating or fraud detection.
Event / Topic Schema: review-events: {business_id, user_id, rating, timestamp}.
Technical Selection: Kafka.
Failure Handling: Use a Dead Letter Queue (DLQ) for malformed review events that fail processing.
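
The background rating update and the dead-letter path can be sketched in Python without a Kafka client, treating the topic as a plain iterable of event dictionaries. The function name `apply_review_events`, the 1 to 5 rating range, and the in-memory totals are assumptions for this example; a real consumer would persist the aggregates and publish rejected events to a separate DLQ topic.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def apply_review_events(
    events: Iterable[Any],
) -> Tuple[Dict[str, float], List[Any]]:
    """Fold review events into per-business average ratings.

    Returns (averages, dead_letters); malformed events are set aside
    instead of stopping the consumer.
    """
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # id -> [sum, count]
    dead_letters: List[Any] = []

    for event in events:
        try:
            business_id = event["business_id"]
            rating = int(event["rating"])
            if not 1 <= rating <= 5:          # assumed rating scale
                raise ValueError("rating out of range")
        except (KeyError, TypeError, ValueError):
            dead_letters.append(event)
            continue
        totals[business_id][0] += rating
        totals[business_id][1] += 1

    averages = {biz: s / n for biz, (s, n) in totals.items()}
    return averages, dead_letters
```

Keeping a running sum and count per business means the average can be updated incrementally per event, which fits the eventual-consistency requirement for ratings.
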
Wrap Up

Advanced Topics

Trade-offs: We chose PostGIS over a pure Geohash/NoSQL approach. While Geohash is easier to shard on a standard NoSQL DB, PostGIS handles "edge of the box" calculations much more accurately without complex application-level logic.
Reliability: Multi-AZ deployment for PostgreSQL, with synchronous replication to a standby in another zone, ensures that a single zone failure does not lose committed data.
Bottleneck Analysis: The primary DB index for spatial data can become a bottleneck. If this happens, we offload the 2D search to a dedicated tier, either a search engine such as Elasticsearch (using its geo-point fields) or a custom service that holds a Quadtree in memory.
Distinguishing Insight: For a true global Yelp clone, Cell-based Sharding (S2/H3) is superior to simple Latitude/Longitude sharding because it accounts for the curvature of the earth and provides cells of roughly equal area at a given level. Business density still varies by region, so shards are balanced by assigning more or fewer cells (or finer cells) to each shard rather than by cell area alone.