The Question
Design

Scalable Proximity-Based Discovery System

Design a local discovery and review platform similar to Yelp. The system must support searching for millions of businesses based on real-time geographic location, viewing rich business profiles, and handling high-concurrency user reviews. Focus on the geospatial indexing strategy, data consistency for business ratings, and how to scale the system for global traffic with low-latency search results.
PostgreSQL
PostGIS
Redis
Kafka
Geohash
S2 Geometry
CDN
Kubernetes
Microservices
Questions & Insights

Clarifying Questions

What is the scale of the data? (Assumption: 100 million businesses globally, 50 million Daily Active Users (DAU), and an average of 10-20 reviews per business).
What are the primary search criteria? (Assumption: Primarily location-based "near me" searches, with secondary filters for category, price, and rating).
What is the expected latency for search? (Assumption: P99 < 200ms for geospatial queries).
How frequent are business information updates? (Assumption: Low write volume for business metadata; high write volume for reviews/ratings, but eventual consistency is acceptable).

Thinking Process

Core Bottleneck: Efficiently querying 2D coordinate data (latitude/longitude) in a 1D database index.
Progressive Approach:
How do we store business metadata and handle basic CRUD?
How do we convert 2D coordinates into a searchable format (Geohashing vs. Quadtree)?
How do we scale the read-heavy search traffic?
How do we handle high-frequency review ingestion without impacting search performance?
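To make the 2D-to-1D conversion concrete, here is a minimal sketch of standard Geohash encoding. It interleaves longitude and latitude bits and emits one base-32 character per 5 bits, so nearby points tend to share a string prefix that an ordinary B-Tree index can range-scan. The function name and default precision are illustrative choices, not part of the design above.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # Geohash alphabet (no a, i, l, o)

def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a latitude/longitude pair as a Geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)
```

A shorter hash is always a prefix of a longer hash for the same point, which is what makes prefix queries work. Two points on opposite sides of a cell boundary can still have very different hashes, so a real search would also check the neighboring cells.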

Bonus Points

S2 Geometry: Discuss using Google’s S2 library for spherical geometry, which orders cells along a Hilbert Curve and offers more uniform cell coverage and fewer boundary "edge cases" than Geohash.
Dynamic Grid Density: Explain how to adjust the depth of the spatial index (e.g., smaller cells in San Francisco, larger cells in rural Nebraska) to balance load.
Read-Optimized Views: Mention Materialized Views or CQRS to separate the complex business-searching logic from the metadata management.
Geo-Sharding: Partitioning data by geographic regions (e.g., continent or country) to ensure data locality and reduce cross-region latency.
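The Dynamic Grid Density idea can be sketched with a point Quadtree that splits a cell only when it holds too many businesses, so dense cities end up with deeper (smaller) cells than rural areas. This is an illustrative in-memory structure; the class name, the capacity of 4, and the maximum depth of 16 are assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class QuadNode:
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float
    capacity: int = 4      # assumed split threshold
    depth: int = 0
    max_depth: int = 16    # assumed cap so identical points cannot split forever
    points: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def _mid(self):
        return (self.min_lat + self.max_lat) / 2, (self.min_lon + self.max_lon) / 2

    def _child_for(self, lat, lon):
        mid_lat, mid_lon = self._mid()
        # Children are stored in the order SW, SE, NW, NE.
        return self.children[(2 if lat >= mid_lat else 0) + (1 if lon >= mid_lon else 0)]

    def insert(self, biz_id, lat, lon):
        if self.children:
            self._child_for(lat, lon).insert(biz_id, lat, lon)
            return
        self.points.append((biz_id, lat, lon))
        if len(self.points) > self.capacity and self.depth < self.max_depth:
            self._split()

    def _split(self):
        mid_lat, mid_lon = self._mid()

        def make(lo_lat, lo_lon, hi_lat, hi_lon):
            return QuadNode(lo_lat, lo_lon, hi_lat, hi_lon,
                            self.capacity, self.depth + 1, self.max_depth)

        self.children = [
            make(self.min_lat, self.min_lon, mid_lat, mid_lon),  # SW
            make(self.min_lat, mid_lon, mid_lat, self.max_lon),  # SE
            make(mid_lat, self.min_lon, self.max_lat, mid_lon),  # NW
            make(mid_lat, mid_lon, self.max_lat, self.max_lon),  # NE
        ]
        pending, self.points = self.points, []
        for biz_id, lat, lon in pending:
            self._child_for(lat, lon).insert(biz_id, lat, lon)

    def leaf_depth(self, lat, lon):
        """Depth of the leaf cell covering a point (deeper means a smaller cell)."""
        node = self
        while node.children:
            node = node._child_for(lat, lon)
        return node.depth
```

Inserting a tight cluster of businesses around one city and a single business elsewhere leaves the city under a much deeper leaf, which is the load-balancing effect described above.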
Design Breakdown

Functional Requirements

Core Use Cases:
Users can search for businesses within a specified radius or bounding box.
Users can view detailed business profiles (metadata, reviews, photos).
Users can post reviews and ratings for a business.
Business owners can add or update their business information.
Scope Control:
In-scope: Search, Business Metadata, Reviews, Basic Ranking.
Out-of-scope: Reservation systems, Food delivery tracking, Social feeds/friends, Ad-bidding engines.

Non-Functional Requirements

Scale: Must support 500 million searches per day and 100 million businesses.
Latency: P99 < 200ms for geospatial search, matching the assumption above; fast profile loads, served mostly from cache.
Availability & Reliability: 99.99% availability (CAP: Availability over Consistency).
Consistency: Eventual consistency for search results and reviews (seconds to minutes delay is fine).
Security: OAuth2/JWT for user actions; protection against review spam.

Estimation

Traffic Estimation:
Search: 50M DAU * 10 searches/day = 500M searches/day.
QPS: 500M / 86400 ≈ 6,000 Average QPS. Peak QPS ≈ 12,000.
Writes: 50M DAU * 0.1 reviews/day = 5M reviews/day ≈ 60 QPS.
Storage Estimation:
100M businesses * 1KB/metadata = 100GB.
2B reviews * 2KB/review = 4TB.
Images: 5 photos/business * 100M businesses * 200KB/photo = 100TB (Stored in Object Store).
Bandwidth Estimation:
Outgoing: 6k QPS * (10 results * 5KB/result) ≈ 300 MB/s.
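The estimates can be reproduced with a few lines of arithmetic. This sketch uses decimal units (1 KB = 1,000 bytes) and treats the peak as 2x the average, which is the factor implied by the 6,000 and 12,000 QPS figures; the variable names are illustrative.

```python
SECONDS_PER_DAY = 86_400
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12  # decimal units, as in the rounded estimates

dau = 50_000_000
searches_per_day = dau * 10                       # 10 searches per user per day
avg_search_qps = searches_per_day / SECONDS_PER_DAY
peak_search_qps = 2 * avg_search_qps              # assumed 2x peak factor
reviews_per_day = dau // 10                       # 0.1 reviews per user per day
write_qps = reviews_per_day / SECONDS_PER_DAY

metadata_bytes = 100_000_000 * 1 * KB             # 100M businesses * 1KB
review_bytes = 2_000_000_000 * 2 * KB             # 2B reviews * 2KB
photo_bytes = 5 * 100_000_000 * 200 * KB          # 5 photos * 100M businesses * 200KB
egress_bytes_per_s = 6_000 * 10 * 5 * KB          # 6k QPS * 10 results * 5KB

print(f"avg search QPS  ~ {avg_search_qps:,.0f}")
print(f"peak search QPS ~ {peak_search_qps:,.0f}")
print(f"review write QPS ~ {write_qps:,.0f}")
print(f"metadata {metadata_bytes / GB:.0f} GB, reviews {review_bytes / TB:.0f} TB, "
      f"photos {photo_bytes / TB:.0f} TB, egress {egress_bytes_per_s / MB:.0f} MB/s")
```

The exact average works out to about 5,787 searches per second, which the estimate rounds up to 6,000.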

Blueprint

Concise Summary: A microservices architecture leveraging a Geospatial-capable database (PostgreSQL + PostGIS) for precise searching and a distributed cache for high-traffic business profiles.
Major Components:
API Gateway: Handles authentication, rate limiting, and request routing.
Search Service: Executes geospatial queries using Geohashing or spatial indexes.
Business Service: Manages CRUD operations for business profiles.
Review Service: Handles the high-volume write stream of user ratings and reviews.
PostgreSQL + PostGIS: The source of truth providing robust spatial indexing.
Redis: Caches "hot" business profiles and frequently accessed search results.
Simplicity Audit: This architecture avoids complex stream processing or multi-tier spatial trees for the MVP, relying on the mature PostGIS extension which scales vertically well and supports horizontal sharding by region if needed.
Architecture Decision Rationale:
Why this architecture?: PostgreSQL with PostGIS is a mature, widely used choice for queries that combine metadata filters with geospatial predicates in a single store.
Functional Satisfaction: Meets search, profile, and review requirements through dedicated services.
Non-functional Satisfaction: Scalability is achieved via read-replicas and Redis; availability is managed through service redundancy.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing:
CDN: Use Cloudflare or CloudFront to cache static business photos and UI assets globally.
Anycast DNS: Routes users to the nearest regional data center.
Security & Perimeter:
API Gateway: Implements JWT validation and standard rate limiting (e.g., 100 requests/minute per user) to prevent scraping.
WAF: Standard protection against SQL injection on the search parameters.
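As a rough illustration of the per-user limit at the gateway, here is a fixed-window counter sketch. It is an in-process stand-in: a real gateway would keep the counters in a shared store so every instance sees the same counts. The class name and the injectable clock are assumptions made for the example.

```python
import time

class FixedWindowRateLimiter:
    """Allow at most `limit` requests per user in each fixed time window."""

    def __init__(self, limit=100, window_seconds=60, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock          # injectable so the behavior can be tested
        self._counts = {}           # user_id -> (window_index, request_count)

    def allow(self, user_id: str) -> bool:
        window = int(self.clock() // self.window_seconds)
        seen_window, count = self._counts.get(user_id, (window, 0))
        if seen_window != window:   # a new window started: reset the counter
            count = 0
        if count >= self.limit:
            self._counts[user_id] = (window, count)
            return False
        self._counts[user_id] = (window, count + 1)
        return True
```

A fixed window is the simplest variant; it can let through up to twice the limit across a window boundary, which is why sliding-window or token-bucket limiters are often preferred when that burst matters.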

Service

Topology & Scaling:
Stateless Services: All services are deployed in containers (Kubernetes) allowing horizontal scaling based on CPU utilization.
Isolation: Search and Review services are isolated so a spike in review-writing doesn't impact search availability.
API Schema Design:
GET /v1/search?lat=...&long=...&radius=...&category=... (REST)
Response: List of Business Summaries.
GET /v1/business/{id} (REST)
Response: Full profile + Top 5 reviews.
POST /v1/reviews (REST)
Request: BusinessID, Rating, Comment.
Idempotency: Client-generated UUID for the review.
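The idempotency rule for POST /v1/reviews can be sketched as follows: the client-generated UUID is the review's key, so a retried request returns the stored review instead of creating a duplicate. The store is an in-memory stand-in, and the 1 to 5 rating range is an assumption, since the rating scale is not specified here.

```python
import uuid
from dataclasses import dataclass

@dataclass(frozen=True)
class Review:
    review_id: str      # client-generated UUID, doubles as the idempotency key
    business_id: str
    rating: int
    comment: str

class ReviewStore:
    """In-memory stand-in for the Reviews table."""

    def __init__(self):
        self._by_id = {}

    def __len__(self):
        return len(self._by_id)

    def post_review(self, review: Review):
        """Return (stored_review, created); a replayed review_id is a no-op."""
        if not 1 <= review.rating <= 5:          # assumed rating scale
            raise ValueError("rating must be between 1 and 5")
        existing = self._by_id.get(review.review_id)
        if existing is not None:
            return existing, False               # retry: nothing new is written
        self._by_id[review.review_id] = review
        return review, True

# Usage: the client generates the ID once and reuses it on every retry.
store = ReviewStore()
first = Review(str(uuid.uuid4()), "biz-1", 5, "Great coffee")
print(store.post_review(first)[1], store.post_review(first)[1])
```

In a relational store the same effect usually comes from a unique constraint on the review ID, so the database rather than application code rejects the duplicate.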
Resilience & Reliability:
Retries: Exponential backoff for Search Service -> DB connections.
Circuit Breaker: If the Review Service is down, the Business Service can still serve profiles without reviews.
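The retry policy for Search Service -> DB calls might look like the sketch below: exponential backoff with a cap, plus random jitter so that many clients do not retry in lockstep. The function name, the defaults, and the choice of `ConnectionError` as the retryable error are assumptions for illustration.

```python
import random
import time

def retry_with_backoff(operation, max_attempts=4, base_delay=0.1, max_delay=2.0,
                       retry_on=(ConnectionError,), sleep=time.sleep):
    """Call `operation` until it succeeds or `max_attempts` is reached."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise                      # out of attempts: surface the error
            # Exponential ceiling (0.1s, 0.2s, 0.4s, ...) capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # "Full jitter": sleep a random amount up to the ceiling.
            sleep(random.uniform(0, ceiling))
```

Only errors that are safe to retry should be listed in `retry_on`; retrying a non-idempotent write without an idempotency key can duplicate data.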

Storage

Access Pattern:
Search: High Read, Complex Range Queries.
Profiles: High Read, Point Queries (Key-Value style).
Reviews: High Write, Time-series ordered.
Database Table Design:
Businesses: id (UUID), name, location (GEOGRAPHY), geohash (Varchar), metadata (JSONB).
Reviews: id, business_id, user_id, rating, comment, created_at.
Technical Selection: PostgreSQL + PostGIS.
Rationale: PostGIS provides GiST indexes (R-Tree based) for spatial queries, which are more flexible than Geohash prefix matching for complex polygons and radius searches.
Distribution Logic:
Sharding: Shard by location_id or city_id. Businesses in NYC rarely need to be queried alongside businesses in London.
Replication: 1 Primary (Writes) + 3 Read Replicas (Search traffic).
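To show what a radius search has to compute, here is a plain-Python sketch of the usual two-step pattern: a cheap bounding-box prefilter (the part a spatial index accelerates) followed by an exact great-circle distance check. This is not PostGIS code; the function names and the tuple layout are assumptions, and the box does not handle wrap-around at the antimeridian.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def search_radius(businesses, lat, lon, radius_m):
    """businesses: iterable of (biz_id, lat, lon). Returns IDs, nearest first."""
    ang = radius_m / EARTH_RADIUS_M                 # angular radius in radians
    dlat = math.degrees(ang)
    cos_lat = math.cos(math.radians(lat))
    if ang >= math.pi / 2 or cos_lat <= math.sin(ang):
        dlon = 180.0                                # circle reaches a pole: no longitude cut
    else:
        dlon = math.degrees(math.asin(math.sin(ang) / cos_lat))
    hits = []
    for biz_id, blat, blon in businesses:
        if abs(blat - lat) > dlat or abs(blon - lon) > dlon:
            continue                                # outside the bounding box
        distance = haversine_m(lat, lon, blat, blon)
        if distance <= radius_m:                    # exact refinement step
            hits.append((distance, biz_id))
    return [biz_id for _, biz_id in sorted(hits)]
```

In the actual design the prefilter is done by the spatial index rather than a linear scan; the sketch only illustrates why the exact check is still needed after the box test.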

Cache

Purpose & Justification: Reduces DB load for "Hot" businesses (e.g., Starbucks in Times Square) and popular search keywords.
Key-Value Schema:
Key: biz:{id}, Value: Serialized JSON Profile, TTL: 1 Hour.
Key: search:{geohash_prefix}, Value: List of IDs, TTL: 15 Minutes.
Technical Selection: Redis.
Failure Handling: If Redis fails, the system falls back to the Read Replicas. To keep many hot keys from expiring at the same moment and causing a "thundering herd" on the database, we add random jitter to each TTL.
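The cache behavior for business profiles can be sketched as a cache-aside read with a jittered TTL and a fallback when the cache is unreachable. `TTLCache` is an in-memory stand-in for Redis, and the helper names, the 10% jitter, and the use of `ConnectionError` to signal a cache outage are assumptions for the example.

```python
import random
import time

class TTLCache:
    """Tiny in-memory stand-in for Redis GET / SET-with-expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:   # lazily drop expired entries
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

def jittered_ttl(base_seconds, jitter_fraction=0.1):
    """Spread expiries over +/- jitter_fraction of the base TTL."""
    spread = base_seconds * jitter_fraction
    return base_seconds + random.uniform(-spread, spread)

def get_business_profile(biz_id, cache, load_from_replica):
    key = f"biz:{biz_id}"
    try:
        cached = cache.get(key)
    except ConnectionError:
        return load_from_replica(biz_id)  # cache is down: serve from a read replica
    if cached is not None:
        return cached
    profile = load_from_replica(biz_id)
    try:
        cache.set(key, profile, jittered_ttl(3600))  # 1 hour base TTL
    except ConnectionError:
        pass                              # a failed cache write must not fail the read
    return profile
```

Jitter only spreads out expiries. If a single very hot key expires, many requests can still miss at once, which is usually handled separately by letting one request refill the key while the others wait.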

Messaging

Purpose & Decoupling: Kafka is used to decouple the critical path of posting a review from the background tasks like updating the business's average rating or fraud detection.
Event / Topic Schema: review-events: {business_id, user_id, rating, timestamp}.
Technical Selection: Kafka.
Failure Handling: Use a Dead Letter Queue (DLQ) for malformed review events that fail processing.
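A consumer of review-events could maintain each business's average rating incrementally and route bad events to a dead-letter list, roughly as sketched here. No Kafka client is used; events are plain dicts with the topic's fields, and the aggregate class, function name, and 1 to 5 rating range are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class RatingAggregate:
    count: int = 0
    total: int = 0

    @property
    def average(self) -> float:
        return self.total / self.count if self.count else 0.0

def process_review_events(events, aggregates, dead_letters):
    """Fold review events into per-business aggregates; park malformed ones."""
    for event in events:
        try:
            business_id = event["business_id"]
            rating = event["rating"]
            if not isinstance(rating, int) or not 1 <= rating <= 5:  # assumed scale
                raise ValueError(f"invalid rating: {rating!r}")
        except (KeyError, TypeError, ValueError) as exc:
            # Malformed event: record it for inspection instead of blocking the stream.
            dead_letters.append({"event": event, "error": repr(exc)})
            continue
        aggregate = aggregates.setdefault(business_id, RatingAggregate())
        aggregate.count += 1
        aggregate.total += rating
```

Storing the count and total instead of the average keeps each update O(1). Because a message queue can redeliver events, a production consumer would also need to deduplicate (for example by review ID) before updating the totals.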
Wrap Up

Advanced Topics

Trade-offs: We chose PostGIS over a pure Geohash/NoSQL approach. While Geohash is easier to shard on a standard NoSQL DB, PostGIS handles "edge of the box" calculations much more accurately without complex application-level logic.
Reliability: A Multi-AZ PostgreSQL deployment with synchronous replication to a standby keeps a single zone failure from causing data loss; asynchronous replicas can still lose the most recent writes.
Bottleneck Analysis: The primary DB's spatial index can become a bottleneck under peak search load. If this happens, we offload the 2D search to a dedicated search tier, such as a custom service holding a Quadtree in memory or Elasticsearch with its geo queries.
Distinguishing Insight: For a true global Yelp clone, Cell-based Sharding (S2/H3) is preferable to simple Latitude/Longitude sharding because cells are defined on the sphere and have roughly similar area at a given level, whereas fixed lat/long bands shrink toward the poles. Business density still varies by region, so dense cells need to be subdivided to keep load even across shards.