The Question

Scalable Proximity-Based Discovery System

Design a local discovery and review platform similar to Yelp. The system must support searching for millions of businesses based on real-time geographic location, viewing rich business profiles, and handling high-concurrency user reviews. Focus on the geospatial indexing strategy, data consistency for business ratings, and how to scale the system for global traffic with low-latency search results.

PostgreSQL

PostGIS

Redis

Kafka

Geohash

S2 Geometry

CDN

Kubernetes

Microservices

Questions & Insights

Clarifying Questions

What is the scale of the data? (Assumption: 100 million businesses globally, 50 million Daily Active Users (DAU), and an average of 10-20 reviews per business).

What are the primary search criteria? (Assumption: Primarily location-based "near me" searches, with secondary filters for category, price, and rating).

What is the expected latency for search? (Assumption: P99 < 200ms for geospatial queries).

How frequent are business information updates? (Assumption: Low write volume for business metadata; high write volume for reviews/ratings, but eventual consistency is acceptable).

Thinking Process

Core Bottleneck: Efficiently querying 2D coordinate data (latitude/longitude) in a 1D database index.

Progressive Approach:

How do we store business metadata and handle basic CRUD?

How do we convert 2D coordinates into a searchable format (Geohashing vs. Quadtree)?

How do we scale the read-heavy search traffic?

How do we handle high-frequency review ingestion without impacting search performance?

Bonus Points

S2 Geometry: Discuss using Google’s S2 library, which projects the sphere onto cells ordered along a Hilbert Curve. Compared with Geohash, this gives more uniform cell sizes and fewer discontinuities at cell boundaries.

Dynamic Grid Density: Explain how to adjust the depth of the spatial index (e.g., smaller cells in San Francisco, larger cells in rural Nebraska) to balance load.

Read-Optimized Views: Mention Materialized Views or CQRS to separate the complex business-searching logic from the metadata management.

Geo-Sharding: Partitioning data by geographic regions (e.g., continent or country) to ensure data locality and reduce cross-region latency.
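
The Dynamic Grid Density idea can be sketched with a point quadtree that splits a cell only when it holds too many businesses. Dense areas then end up with deep, small cells and sparse areas with shallow, large ones. The class name `QuadNode`, the capacity of 4, and the depth cap of 16 are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class QuadNode:
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float
    level: int = 0
    capacity: int = 4      # assumed max businesses per leaf before splitting
    max_level: int = 16    # assumed depth cap; also guards against duplicate points
    points: List[Tuple[float, float, str]] = field(default_factory=list)
    children: List["QuadNode"] = field(default_factory=list)

    def _child_for(self, lat: float, lon: float) -> "QuadNode":
        mid_lat = (self.min_lat + self.max_lat) / 2
        mid_lon = (self.min_lon + self.max_lon) / 2
        # Children are stored as [SW, SE, NW, NE].
        idx = (2 if lat >= mid_lat else 0) + (1 if lon >= mid_lon else 0)
        return self.children[idx]

    def insert(self, lat: float, lon: float, biz_id: str) -> None:
        node = self
        while node.children:
            node = node._child_for(lat, lon)
        node.points.append((lat, lon, biz_id))
        if len(node.points) > node.capacity and node.level < node.max_level:
            node._split()

    def _split(self) -> None:
        mid_lat = (self.min_lat + self.max_lat) / 2
        mid_lon = (self.min_lon + self.max_lon) / 2
        opts = dict(level=self.level + 1, capacity=self.capacity,
                    max_level=self.max_level)
        self.children = [
            QuadNode(self.min_lat, self.min_lon, mid_lat, mid_lon, **opts),  # SW
            QuadNode(self.min_lat, mid_lon, mid_lat, self.max_lon, **opts),  # SE
            QuadNode(mid_lat, self.min_lon, self.max_lat, mid_lon, **opts),  # NW
            QuadNode(mid_lat, mid_lon, self.max_lat, self.max_lon, **opts),  # NE
        ]
        pending, self.points = self.points, []
        for lat, lon, biz_id in pending:
            self._child_for(lat, lon).insert(lat, lon, biz_id)

    def leaf_level(self, lat: float, lon: float) -> int:
        """Depth of the leaf cell that covers the given point."""
        node = self
        while node.children:
            node = node._child_for(lat, lon)
        return node.level


world = QuadNode(-90.0, -180.0, 90.0, 180.0)
for i in range(50):  # a dense cluster around San Francisco
    world.insert(37.77 + i * 0.001, -122.42 + i * 0.001, f"sf-{i}")
world.insert(41.5, -99.8, "ne-1")  # two businesses in rural Nebraska
world.insert(41.6, -99.9, "ne-2")
print(world.leaf_level(37.77, -122.42), world.leaf_level(41.5, -99.8))
```

The San Francisco leaf ends up deeper than the Nebraska leaf, so a fixed-size result page touches a similar number of businesses in either place.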

Design Breakdown

Functional Requirements

Core Use Cases:

Users can search for businesses within a specified radius or bounding box.

Users can view detailed business profiles (metadata, reviews, photos).

Users can post reviews and ratings for a business.

Business owners can add or update their business information.

Scope Control:

In-scope: Search, Business Metadata, Reviews, Basic Ranking.

Out-of-scope: Reservation systems, Food delivery tracking, Social feeds/friends, Ad-bidding engines.

Non-Functional Requirements

Scale: Must support 500 million searches per day and 100 million businesses.

Latency: P99 < 200ms for geospatial search (matching the assumption in Clarifying Questions); fast profile loading, served from cache for hot businesses.

Availability & Reliability: 99.99% availability (CAP: Availability over Consistency).

Consistency: Eventual consistency for search results and reviews (seconds to minutes delay is fine).

Security: OAuth2/JWT for user actions; protection against review spam.

Estimation

Traffic Estimation:

Search: 50M DAU * 10 searches/day = 500M searches/day.

QPS: 500M / 86,400 s ≈ 5,800, rounded to 6,000 Average QPS. Peak QPS ≈ 12,000 (assuming a 2x peak-to-average ratio).

Writes: 50M DAU * 0.1 reviews/day = 5M reviews/day ≈ 60 QPS.

Storage Estimation:

100M businesses * 1KB/metadata = 100GB.

2B reviews * 2KB/review = 4TB.

Images: 5 photos/business * 100M businesses * 200KB/photo = 100TB (Stored in Object Store).

Bandwidth Estimation:

Outgoing: 6k QPS * (10 results * 5KB/result) ≈ 300 MB/s.
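
The estimates above can be reproduced with a few lines of arithmetic. This sketch uses decimal units (1KB = 1,000 bytes), takes 20 reviews per business (the upper end of the stated 10-20 range, which yields the 2B figure), and assumes a 2x peak factor to match the 12,000 peak QPS. All variable names are illustrative.

```python
SECONDS_PER_DAY = 86_400
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12  # decimal units (assumption)

# Traffic
dau = 50_000_000
searches_per_day = dau * 10                       # 10 searches per user per day
avg_search_qps = searches_per_day / SECONDS_PER_DAY
peak_search_qps = 2 * avg_search_qps              # assumed 2x peak factor
reviews_per_day = dau // 10                       # 0.1 reviews per user per day
review_write_qps = reviews_per_day / SECONDS_PER_DAY

# Storage
businesses = 100_000_000
metadata_bytes = businesses * 1 * KB
total_reviews = businesses * 20                   # upper end of 10-20 per business
review_bytes = total_reviews * 2 * KB
photo_bytes = businesses * 5 * 200 * KB

# Bandwidth, using the rounded 6k QPS figure from the text
egress_bytes_per_s = 6_000 * 10 * 5 * KB

print(f"search QPS avg={avg_search_qps:,.0f} peak={peak_search_qps:,.0f}")
print(f"review writes/s={review_write_qps:,.0f}")
print(f"metadata={metadata_bytes / GB:.0f}GB reviews={review_bytes / TB:.0f}TB "
      f"photos={photo_bytes / TB:.0f}TB egress={egress_bytes_per_s / MB:.0f}MB/s")
```

The exact average is about 5,800 searches per second; the text rounds it up to 6,000 for headroom.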

Blueprint

Concise Summary: A microservices architecture leveraging a Geospatial-capable database (PostgreSQL + PostGIS) for precise searching and a distributed cache for high-traffic business profiles.

Major Components:

API Gateway: Handles authentication, rate limiting, and request routing.

Search Service: Executes geospatial queries using Geohashing or spatial indexes.

Business Service: Manages CRUD operations for business profiles.

Review Service: Handles the high-volume write stream of user ratings and reviews.

PostgreSQL + PostGIS: The source of truth providing robust spatial indexing.

Redis: Caches "hot" business profiles and frequently accessed search results.

Simplicity Audit: This architecture avoids complex stream processing or multi-tier spatial trees for the MVP, relying on the mature PostGIS extension which scales vertically well and supports horizontal sharding by region if needed.

Architecture Decision Rationale:

Why this architecture?: PostgreSQL with PostGIS is a mature, widely used choice for queries that combine metadata filters with geospatial predicates in a single statement.

Functional Satisfaction: Meets search, profile, and review requirements through dedicated services.

Non-functional Satisfaction: Scalability is achieved via read-replicas and Redis; availability is managed through service redundancy.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing:

CDN: Use Cloudflare or CloudFront to cache static business photos and UI assets globally.

Anycast DNS: Routes users to the nearest regional data center.

Security & Perimeter:

API Gateway: Implements JWT validation and standard rate limiting (e.g., 100 requests/minute per user) to prevent scraping.

WAF: Standard protection against SQL injection on the search parameters.
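
The per-user limit mentioned for the API Gateway could be enforced with a fixed-window counter, sketched below. In production the counters would normally live in a shared store rather than process memory; the class name, the in-memory dict, and the injectable `clock` parameter are assumptions made so the example is self-contained and testable.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per user in each fixed time window."""

    def __init__(self, limit: int = 100, window_seconds: int = 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock
        # user_id -> (window index, requests seen in that window)
        self._counts: Dict[str, Tuple[int, int]] = {}

    def allow(self, user_id: str) -> bool:
        window = int(self._clock() // self.window_seconds)
        seen_window, count = self._counts.get(user_id, (window, 0))
        if seen_window != window:
            # A new window has started, so the counter resets.
            seen_window, count = window, 0
        if count >= self.limit:
            self._counts[user_id] = (seen_window, count)
            return False
        self._counts[user_id] = (seen_window, count + 1)
        return True


limiter = FixedWindowRateLimiter(limit=100, window_seconds=60)
results = [limiter.allow("user-1") for _ in range(101)]
print(results.count(True), results.count(False))
```

A fixed window is the simplest variant and can admit up to twice the limit across a window boundary; a sliding window or token bucket smooths that out at the cost of more state.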

Service

Topology & Scaling:

Stateless Services: All services are deployed in containers (Kubernetes) allowing horizontal scaling based on CPU utilization.

Isolation: Search and Review services are isolated so a spike in review-writing doesn't impact search availability.

API Schema Design:

GET /v1/search?lat=...&long=...&radius=...&category=... (REST)

Response: List of Business Summaries.

GET /v1/business/{id} (REST)

Response: Full profile + Top 5 reviews.

POST /v1/reviews (REST)

Request: BusinessID, Rating, Comment.

Idempotency: Client-generated UUID for the review.
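
A minimal sketch of how the client-generated UUID makes POST /v1/reviews safe to retry: the server stores the review under that key and returns the stored copy on a repeat instead of inserting a duplicate. The `ReviewStore` class, the in-memory dict standing in for a table with a unique constraint, and the 1-5 rating range are assumptions for illustration.

```python
import uuid
from typing import Dict, Tuple


class ReviewStore:
    """In-memory stand-in for a reviews table keyed by idempotency key."""

    def __init__(self) -> None:
        self._by_key: Dict[str, dict] = {}

    def post_review(self, idempotency_key: str, business_id: str,
                    rating: int, comment: str) -> Tuple[dict, bool]:
        """Return (review, created). A repeated key returns the original review."""
        if not 1 <= rating <= 5:  # assumed rating scale
            raise ValueError("rating must be between 1 and 5")
        existing = self._by_key.get(idempotency_key)
        if existing is not None:
            return existing, False
        review = {
            "id": idempotency_key,
            "business_id": business_id,
            "rating": rating,
            "comment": comment,
        }
        self._by_key[idempotency_key] = review
        return review, True


store = ReviewStore()
key = str(uuid.uuid4())  # generated once by the client, reused on every retry
first, created_first = store.post_review(key, "biz-1", 5, "Great coffee")
retry, created_retry = store.post_review(key, "biz-1", 5, "Great coffee")
print(created_first, created_retry, first is retry)  # True False True
```

With a real database the same effect comes from a unique index on the key plus an insert that ignores conflicts, so a client timeout followed by a retry never creates two reviews.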

Resilience & Reliability:

Retries: Exponential backoff for Search Service -> DB connections.

Circuit Breaker: If the Review Service is down, the Business Service can still serve profiles without reviews.
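
The retry policy for Search Service to DB connections might look like the following sketch: exponential backoff with full jitter, capped at a maximum delay. The function name, default delays, and the choice of `ConnectionError` as the retryable error are assumptions; the `sleep` parameter is injectable so the example can be tested without waiting.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying transient failures with jittered backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The ceiling doubles each attempt: 0.1s, 0.2s, 0.4s, ... up to max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter spreads retries so clients do not reconnect in lockstep.
            sleep(random.uniform(0, ceiling))
    raise RuntimeError("max_attempts must be at least 1")


attempts = {"n": 0}


def flaky_query() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("replica unavailable")
    return "rows"


print(retry_with_backoff(flaky_query, sleep=lambda s: None), attempts["n"])  # rows 3
```

Retries should stay bounded like this; once they are exhausted, the circuit breaker described above is what keeps callers from piling onto an unhealthy dependency.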

Storage

Access Pattern:

Search: High Read, Complex Range Queries.

Profiles: High Read, Point Queries (Key-Value style).

Reviews: High Write, Time-series ordered.

Database Table Design:

Businesses: id (UUID), name, location (GEOGRAPHY), geohash (Varchar), metadata (JSONB).

Reviews: id, business_id, user_id, rating, comment, created_at.

Technical Selection: PostgreSQL + PostGIS.

Rationale: PostGIS provides GiST indexes (R-Tree style) for spatial queries, which are more flexible than Geohash for complex polygons/radii.

Distribution Logic:

Sharding: Shard by location_id or city_id. Businesses in NYC rarely need to be queried alongside businesses in London.

Replication: 1 Primary (Writes) + 3 Read Replicas (Search traffic).
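
As a rough sketch of the distribution logic, a router can hash `city_id` to pick a shard, send writes to that shard's primary, and spread reads across its three replicas. The `ShardRouter` class, the node naming scheme, and the use of a hash (rather than an explicit city-to-shard lookup table) are assumptions for illustration.

```python
import hashlib
import itertools


class ShardRouter:
    """Map a city_id to a shard, then to a primary (writes) or a replica (reads)."""

    def __init__(self, shard_count: int, replicas_per_shard: int = 3) -> None:
        self.shard_count = shard_count
        # One round-robin iterator per shard to spread read traffic.
        self._next_replica = [
            itertools.cycle(range(replicas_per_shard)) for _ in range(shard_count)
        ]

    def shard_for(self, city_id: str) -> int:
        # A stable hash (unlike built-in hash()) gives the same shard in every process.
        digest = hashlib.sha256(city_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.shard_count

    def node_for(self, city_id: str, write: bool) -> str:
        shard = self.shard_for(city_id)
        if write:
            return f"shard{shard}-primary"
        replica = next(self._next_replica[shard])
        return f"shard{shard}-replica{replica}"


router = ShardRouter(shard_count=4)
print(router.node_for("nyc", write=True))
print([router.node_for("nyc", write=False) for _ in range(3)])
```

Plain modulo hashing reshuffles most cities when the shard count changes; a lookup table or consistent hashing avoids that if resharding is expected.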

Cache

Purpose & Justification: Reduces DB load for "Hot" businesses (e.g., Starbucks in Times Square) and frequently searched areas.

Key-Value Schema:

Key: biz:{id}, Value: Serialized JSON Profile, TTL: 1 Hour.

Key: search:{geohash_prefix}, Value: List of IDs, TTL: 15 Minutes.

Technical Selection: Redis.

Failure Handling: If Redis fails, the system falls back to the Read Replicas. To prevent a "thundering herd" of simultaneous cache misses, we add random jitter to each TTL so that hot keys do not all expire at the same moment.
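
A cache-aside read with a jittered TTL can be sketched as follows. A small in-memory `TTLCache` stands in for Redis so the example runs on its own; the ±10% jitter fraction and the `load_from_replica` callback are assumptions, while the `biz:{id}` key and 1-hour base TTL come from the schema above.

```python
import random
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Minimal in-memory stand-in for Redis GET/SET with expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries are dropped lazily on read
            return None
        return value

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._data[key] = (value, self._clock() + ttl_seconds)


def jittered_ttl(base_seconds: float, jitter_fraction: float = 0.1) -> float:
    """Spread expirations over base +/- jitter_fraction (assumed 10%)."""
    spread = base_seconds * jitter_fraction
    return base_seconds + random.uniform(-spread, spread)


def get_profile(biz_id: str, cache: TTLCache,
                load_from_replica: Callable[[str], dict]) -> dict:
    key = f"biz:{biz_id}"
    profile = cache.get(key)
    if profile is None:
        # Cache miss: read from a replica, then populate the cache.
        profile = load_from_replica(biz_id)
        cache.set(key, profile, jittered_ttl(3600))
    return profile


cache = TTLCache()
db_reads = []


def fake_replica(biz_id: str) -> dict:
    db_reads.append(biz_id)
    return {"id": biz_id, "name": "Example Cafe"}


get_profile("42", cache, fake_replica)
get_profile("42", cache, fake_replica)
print(len(db_reads))  # 1
```

Jitter only staggers expirations. If a single very hot key still causes a burst of misses, a per-key lock or request coalescing is the usual next step.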

Messaging

Purpose & Decoupling: Kafka is used to decouple the critical path of posting a review from the background tasks like updating the business's average rating or fraud detection.

Event / Topic Schema: review-events: {business_id, user_id, rating, timestamp}.

Technical Selection: Kafka.

Failure Handling: Use a Dead Letter Queue (DLQ) for malformed review events that fail processing.
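
The background consumer that keeps a business's average rating up to date might look like this sketch. It folds each review-events message into a running count and sum, and routes malformed events to a dead-letter list. A plain Python list stands in for the Kafka topic and for the DLQ; the `RatingAggregator` class and the 1-5 rating range are assumptions.

```python
from typing import Dict, List, Optional, Tuple


class RatingAggregator:
    """Consume review events and maintain per-business rating aggregates."""

    def __init__(self) -> None:
        self._stats: Dict[str, List[int]] = {}       # business_id -> [count, sum]
        self.dead_letters: List[Tuple[object, str]] = []

    def handle(self, event: object) -> None:
        try:
            business_id = event["business_id"]
            rating = int(event["rating"])
            if not 1 <= rating <= 5:  # assumed rating scale
                raise ValueError(f"rating out of range: {rating}")
        except (KeyError, TypeError, ValueError) as exc:
            # Malformed events are parked instead of blocking the partition.
            self.dead_letters.append((event, repr(exc)))
            return
        stats = self._stats.setdefault(business_id, [0, 0])
        stats[0] += 1
        stats[1] += rating

    def average(self, business_id: str) -> Optional[float]:
        count, total = self._stats.get(business_id, (0, 0))
        return total / count if count else None


topic = [  # stand-in for messages read from review-events
    {"business_id": "b1", "user_id": "u1", "rating": 5, "timestamp": 1},
    {"business_id": "b1", "user_id": "u2", "rating": 3, "timestamp": 2},
    {"business_id": "b1", "user_id": "u3", "rating": "oops", "timestamp": 3},
    {"user_id": "u4", "rating": 4, "timestamp": 4},
]
agg = RatingAggregator()
for message in topic:
    agg.handle(message)
print(agg.average("b1"), len(agg.dead_letters))  # 4.0 2
```

Storing count and sum rather than the average itself keeps the update a cheap increment. With at-least-once delivery, a real consumer would also need to deduplicate redelivered events, for example by review id.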

Wrap Up

Advanced Topics

Trade-offs: We chose PostGIS over a pure Geohash/NoSQL approach. While Geohash is easier to shard on a standard NoSQL DB, PostGIS handles "edge of the box" calculations much more accurately without complex application-level logic.

Reliability: Multi-AZ deployment for PostgreSQL, with synchronous replication to a standby in another zone, ensures that a single zone failure does not lose committed data.

Bottleneck Analysis: The primary DB index for spatial data can become a bottleneck. If this happens, we offload the 2D search to a dedicated search tier, such as Elasticsearch with its geo queries or a custom service holding a Quadtree in memory.

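Distinguishing Insight: For a true global Yelp clone, Cell-based Sharding (S2/H3) is superior to simple Latitude/Longitude sharding because the cells are defined on the sphere rather than on a flat lat/long grid. Cells at a given level have roughly similar area everywhere, whereas lat/long boxes shrink toward the poles. Combined with finer cell levels in dense regions, this gives a more even data distribution across shards.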
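
To illustrate the "edge of the box" point from the trade-offs above: a cell or bounding-box lookup only yields candidates, and an exact great-circle check is still needed to honor a radius. The sketch below does this by hand with a bounding-box prefilter followed by a haversine distance filter, which is roughly the application-level logic a pure Geohash design would carry and that PostGIS handles inside the database. Function names are illustrative, and the box calculation assumes the search circle does not touch a pole or cross the antimeridian.

```python
import math
from typing import Iterable, List, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def bounding_box(lat: float, lon: float, radius_km: float) -> Tuple[float, float, float, float]:
    """Box that fully contains the search circle (no pole/antimeridian handling)."""
    angular = radius_km / EARTH_RADIUS_KM
    dlat = math.degrees(angular)
    dlon = math.degrees(math.asin(math.sin(angular) / math.cos(math.radians(lat))))
    return lat - dlat, lon - dlon, lat + dlat, lon + dlon


def search_radius(businesses: Iterable[Tuple[str, float, float]],
                  lat: float, lon: float, radius_km: float) -> List[str]:
    min_lat, min_lon, max_lat, max_lon = bounding_box(lat, lon, radius_km)
    hits = []
    for biz_id, blat, blon in businesses:
        # Cheap prefilter: this is the part an index can answer.
        if not (min_lat <= blat <= max_lat and min_lon <= blon <= max_lon):
            continue
        # Exact check: drops candidates in the box corners outside the circle.
        distance = haversine_km(lat, lon, blat, blon)
        if distance <= radius_km:
            hits.append((distance, biz_id))
    return [biz_id for _, biz_id in sorted(hits)]


businesses = [
    ("times-square", 40.7580, -73.9855),
    ("empire-state", 40.7484, -73.9857),
    ("london", 51.5074, -0.1278),
]
print(search_radius(businesses, 40.7580, -73.9855, radius_km=2.0))
```

In the loop above the prefilter is a linear scan for clarity; in the real system it is the step a spatial index or geohash prefix lookup replaces.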