The Question
Real-Time Ride-Sharing System Design
Design the core backend for a ride-sharing application similar to Uber. The system should support 10,000 daily active users. Focus on the real-time driver-rider matching mechanism, efficient location tracking for moving vehicles, and the ride lifecycle state management. Discuss how you would handle geospatial data at scale and ensure high availability for the dispatching process while maintaining transactional integrity for ride assignments.
PostgreSQL
PostGIS
Redis
WebSockets
JWT
Docker
Geohash
Questions & Insights
Clarifying Questions
Geographic Scale: Is this for a single city or global?
Assumption: We are targeting a single large metropolitan area for the MVP to minimize cross-region latency and complexity.
Rider-to-Driver Ratio: What is the expected split of the 10K DAU between riders and drivers?
Assumption: 8,000 riders and 2,000 drivers.
Location Update Frequency: How often do drivers report their GPS coordinates?
Assumption: Every 5 seconds to provide a balance between real-time accuracy and battery/network efficiency.
Matching Logic Complexity: Do we need advanced surge pricing or carpooling for the MVP?
Assumption: No. Simple "nearest driver" matching and fixed/pre-calculated pricing based on distance/time.
Thinking Process
The core challenge of a ride-sharing app, even at 10K DAU, is the efficient handling of geospatial data and the state synchronization between two mobile clients (Rider and Driver).
How do we efficiently track and query thousands of moving drivers?
We use a geospatial index (Redis GEO) to store the latest coordinates of active drivers, allowing "radius searches" in sub-millisecond time.
How do we ensure a driver isn't assigned to two rides simultaneously?
We implement a state machine for the "Ride" entity and use distributed locking or atomic database updates to "claim" a driver.
How do we handle the real-time communication between the server and the apps?
We utilize WebSockets for low-latency notifications (e.g., "Ride Found," "Driver Arrived") rather than expensive polling.
Bonus Points
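Geospatial Indexing Choice: Using Uber's H3 (Hexagonal Hierarchical Spatial Index) instead of S2 or Geohash to avoid the edge effects of square-cell grids. A hexagon's six neighbors all sit at roughly the same distance from its center, whereas a square cell has edge neighbors and farther corner neighbors.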
Idempotency Keys: Implementing idempotency for ride requests and payment triggers to prevent double-charging during network retries.
Write-Heavy Optimization: Using a "Last-Write-Wins" strategy in Redis for location updates to avoid hammering the primary persistent database.
Graceful Degradation: If the matching service fails, allow the app to fall back to a "Service Unavailable" state rather than crashing, and preserve existing ride states in the local client cache.
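A minimal sketch of the idempotency-key idea above. It assumes an in-memory dictionary stands in for a persistent idempotency table and that the client sends the same request_id on every retry. The class and method names (IdempotencyStore, run_once) are hypothetical.

```python
import threading


class IdempotencyStore:
    """Hypothetical in-memory stand-in for a persistent idempotency table."""

    def __init__(self):
        self._results = {}
        self._lock = threading.Lock()

    def run_once(self, key, operation):
        # The lock is held for the whole operation, so a concurrent retry with
        # the same key waits for the first attempt instead of running twice.
        # (Simplification: it also serializes unrelated keys.)
        with self._lock:
            if key in self._results:
                return self._results[key]  # replay the stored result
            result = operation()
            self._results[key] = result
            return result


# Usage: a payment trigger retried after a network timeout.
charges = []


def charge_rider():
    charges.append(25.0)
    return {"status": "charged", "amount": 25.0}


store = IdempotencyStore()
first = store.run_once("req-123", charge_rider)
retry = store.run_once("req-123", charge_rider)  # same request_id on retry
```

In PostgreSQL, a similar guarantee can come from a unique constraint on request_id, so a retried INSERT fails instead of creating a second ride.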
Design Breakdown
Functional Requirements
Core Use Cases:
Riders can request a ride by specifying a destination.
Drivers can signal availability (Go Online/Offline).
The system matches a Rider with the nearest available Driver.
Real-time tracking of the driver’s location on the rider’s map.
Ride completion and basic payment processing.
Scope Control:
In-Scope: Rider/Driver onboarding, Real-time location tracking, Matching, Trip lifecycle.
Out-of-Scope: Surge pricing, Carpooling (Uber Pool), Driver ratings/reviews, Detailed Map tiles (use Google Maps API).
Non-Functional Requirements
Scale: Support 10K DAU with up to ~2,000 concurrent active connections at peak (20% of DAU).
Latency: Matching should complete in under 2 seconds; location updates should be delivered in under 500 ms.
Availability: High availability (99.9%) for the matching and tracking flows.
Consistency: Strong consistency for ride assignment (one driver per ride).
Security: TLS for all transit; storage of PII (names, phone numbers) must be encrypted.
Estimation
Traffic:
10K DAU, 20% concurrent at peak = 2,000 active users.
Drivers (500) send location every 5s = 100 QPS.
Riders (1,500) poll or receive WebSocket updates = ~300-500 QPS.
Storage:
10,000 rides/day. Each record ~1KB. 10MB/day.
1 Year = ~3.6GB. Extremely manageable for a single PostgreSQL instance.
Bandwidth:
100 QPS (Location) * 500 bytes = 50 KB/s (Inbound). Negligible.
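The estimates above can be reproduced with a few lines of arithmetic. The constants are the section's own assumptions. Decimal units (1 KB = 1,000 bytes) are used to match the rounded figures. The rider push rate additionally assumes that each online rider receives one update per 5-second driver update.

```python
DAU = 10_000
PEAK_CONCURRENCY = 0.20          # 20% of DAU online at peak
DRIVERS_ONLINE = 500
RIDERS_ONLINE = 1_500
UPDATE_INTERVAL_S = 5            # driver GPS update period
LOCATION_MSG_BYTES = 500
RIDES_PER_DAY = 10_000
RIDE_RECORD_BYTES = 1_000        # ~1 KB per ride row

peak_users = DAU * PEAK_CONCURRENCY                            # ~2,000
location_qps = DRIVERS_ONLINE / UPDATE_INTERVAL_S              # 100
# Assumption: one push per online rider per driver update (low end of ~300-500).
rider_push_rate = RIDERS_ONLINE / UPDATE_INTERVAL_S            # 300
inbound_kb_per_s = location_qps * LOCATION_MSG_BYTES / 1_000   # 50 KB/s
storage_mb_per_day = RIDES_PER_DAY * RIDE_RECORD_BYTES / 1e6   # 10 MB/day
storage_gb_per_year = storage_mb_per_day * 365 / 1_000         # ~3.65 GB/year

print(f"peak users: {peak_users:.0f}, location QPS: {location_qps:.0f}")
print(f"rider pushes/s: {rider_push_rate:.0f}, inbound: {inbound_kb_per_s:.0f} KB/s")
print(f"storage: {storage_mb_per_day:.0f} MB/day, {storage_gb_per_year:.2f} GB/year")
```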
Blueprint
The architecture uses a modular service approach to separate concerns between user management, real-time location tracking, and the transactional ride lifecycle.
Major Components:
API Gateway: Entry point for Auth, Rate Limiting, and routing requests to internal services.
Location Service: High-speed ingestion of driver coordinates using Redis.
Matching Service: Orchestrates the logic of finding the best driver for a request.
Ride Service: Manages the state machine of a trip (Requested, Accepted, In-Progress, Completed).
PostgreSQL: The source of truth for user profiles and ride history.
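The Ride Service's lifecycle can be written as an explicit transition table. That way, an out-of-order request, such as completing a ride that was never accepted, is rejected instead of silently corrupting state. This minimal sketch covers only the four states named above. RideStatus, ALLOWED_TRANSITIONS, InvalidTransition and transition are hypothetical names.

```python
from enum import Enum


class RideStatus(Enum):
    REQUESTED = "REQUESTED"
    ACCEPTED = "ACCEPTED"
    IN_PROGRESS = "IN_PROGRESS"
    COMPLETED = "COMPLETED"


# Only these forward moves are legal; COMPLETED is terminal.
ALLOWED_TRANSITIONS = {
    RideStatus.REQUESTED: {RideStatus.ACCEPTED},
    RideStatus.ACCEPTED: {RideStatus.IN_PROGRESS},
    RideStatus.IN_PROGRESS: {RideStatus.COMPLETED},
    RideStatus.COMPLETED: set(),
}


class InvalidTransition(ValueError):
    pass


def transition(current: RideStatus, target: RideStatus) -> RideStatus:
    """Return the new status, or raise if the move skips or reverses a step."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise InvalidTransition(f"{current.value} -> {target.value} is not allowed")
    return target


# Usage: walk a ride through its full lifecycle.
status = RideStatus.REQUESTED
for step in (RideStatus.ACCEPTED, RideStatus.IN_PROGRESS, RideStatus.COMPLETED):
    status = transition(status, step)
```

Because every state change goes through one function, the legal paths can be audited in a single place rather than scattered across handlers.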
Simplicity Audit: For 10K DAU, a single RDS instance and a few containerized services are sufficient. We avoid Kafka or complex micro-frontends to keep operational overhead low.
Architecture Decision Rationale:
PostgreSQL + PostGIS: Best-in-class for relational data with spatial query support for ride history.
Redis: Essential for the high-frequency "ephemeral" write pattern of GPS updates.
WebSockets: Superior to long-polling for real-time driver movement on the rider's map.
High Level Architecture
Sub-system Deep Dive
Edge (Optional)
Traffic Routing: Standard AWS ALB (Application Load Balancer) for L7 routing.
Security: JWT-based authentication at the Gateway. Rate limiting set to 50 requests/sec per user to prevent API abuse.
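The text does not say which algorithm enforces the 50 requests/sec limit. A token bucket is one common choice. The sketch below assumes a per-user bucket that refills at 50 tokens/sec with a burst capacity of 50. It keeps state in process memory, although a multi-instance gateway would more likely keep it in a shared store such as Redis. PerUserRateLimiter is a hypothetical name. The clock is injectable so that the behavior can be tested deterministically.

```python
import time


class PerUserRateLimiter:
    """Token bucket per user: refills at rate_per_s, holds at most burst tokens."""

    def __init__(self, rate_per_s=50.0, burst=50.0, clock=time.monotonic):
        self.rate_per_s = rate_per_s
        self.burst = burst
        self.clock = clock
        self._buckets = {}  # user_id -> (tokens, time of last refill)

    def allow(self, user_id):
        now = self.clock()
        # New users start with a full bucket.
        tokens, last = self._buckets.get(user_id, (self.burst, now))
        # Refill in proportion to elapsed time, never above the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate_per_s)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[user_id] = (tokens, now)
        return allowed


# Usage with a fake clock so the behavior is deterministic.
fake_now = [0.0]
limiter = PerUserRateLimiter(clock=lambda: fake_now[0])
burst_results = [limiter.allow("user-1") for _ in range(51)]
fake_now[0] = 0.1  # 100 ms later: 0.1 s * 50 tokens/s = 5 new tokens
after_refill = sum(limiter.allow("user-1") for _ in range(10))
```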
Service
Topology: Services deployed as Docker containers on a managed cluster (e.g., AWS ECS). Stateless design allows horizontal scaling.
API Schema:
POST /v1/rides: Create a ride request (idempotent via request_id).
PUT /v1/driver/location: Driver GPS update (protocol: WebSocket or REST).
PATCH /v1/rides/{id}/status: Transition ride state (e.g., ACCEPTED).
Resilience: Exponential backoff for mobile clients during network switches (LTE to WiFi).
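One way to implement the client-side backoff is "full jitter", which waits for a random delay between zero and an exponentially growing ceiling. This keeps many phones that reconnect after the same network blip from retrying in lockstep. The base delay (0.5 s), cap (30 s), the retried exception type (ConnectionError) and the function names are assumptions for illustration.

```python
import random
import time


def backoff_delay(attempt, base_s=0.5, cap_s=30.0, rng=random.random):
    """Full-jitter backoff: a random delay in [0, min(cap_s, base_s * 2**attempt))."""
    ceiling = min(cap_s, base_s * (2 ** attempt))
    return rng() * ceiling


def call_with_retries(request, max_attempts=5, sleep=time.sleep, rng=random.random):
    """Retry request() on connection errors, sleeping with jittered backoff between tries."""
    for attempt in range(max_attempts):
        try:
            return request()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(backoff_delay(attempt, rng=rng))


# Usage: a request that fails twice (e.g. during an LTE-to-WiFi switch), then succeeds.
outcomes = [ConnectionError(), ConnectionError(), "ok"]
slept = []


def flaky_request():
    result = outcomes.pop(0)
    if isinstance(result, Exception):
        raise result
    return result


# rng pinned to 1.0 so the recorded delays are the ceilings: 0.5 s, then 1.0 s.
response = call_with_retries(flaky_request, sleep=slept.append, rng=lambda: 1.0)
```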
Storage
Access Pattern:
Ride Service: Heavy ACID transactions for state changes.
Location Service: Extremely write-heavy for "Live" locations; read-heavy for "Nearby" queries.
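A sketch of the "Last-Write-Wins" location path from the Bonus Points. It assumes each update carries the device's GPS timestamp, so a delayed packet cannot overwrite a newer position. A plain Redis GEOADD simply keeps whichever write arrives last, so the timestamp check here is an assumed refinement. An in-memory dictionary stands in for Redis. LocationUpdate, LastWriteWinsLocations and ts_ms are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class LocationUpdate:
    driver_id: str
    lat: float
    lon: float
    ts_ms: int  # device timestamp of the GPS fix (assumed to be sent by the app)


class LastWriteWinsLocations:
    """Keeps only the newest known position per driver; nothing is persisted."""

    def __init__(self) -> None:
        self._latest: Dict[str, LocationUpdate] = {}

    def apply(self, update: LocationUpdate) -> bool:
        current = self._latest.get(update.driver_id)
        # Drop duplicates and late-arriving older fixes instead of rewinding the map.
        if current is not None and update.ts_ms <= current.ts_ms:
            return False
        self._latest[update.driver_id] = update
        return True

    def get(self, driver_id: str) -> Optional[LocationUpdate]:
        return self._latest.get(driver_id)


store = LastWriteWinsLocations()
store.apply(LocationUpdate("d1", 40.7580, -73.9855, ts_ms=10_000))
store.apply(LocationUpdate("d1", 40.7590, -73.9850, ts_ms=15_000))
late = store.apply(LocationUpdate("d1", 40.7585, -73.9852, ts_ms=12_000))  # delayed packet
```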
Database Table Design (PostgreSQL):
Users: id, role (rider/driver), profile_data.
Rides: id, rider_id, driver_id, status (enum), pickup_loc, destination_loc, price.
Technical Selection: PostgreSQL with PostGIS extension for persistent spatial queries (e.g., "Find all rides in this neighborhood last month").
Cache
Purpose: Store current driver locations and availability status.
Key-Value Schema:
Key: driver_locations (a Redis geospatial index, stored internally as a sorted set). Member: driver_id; Score: the geohash-encoded coordinates (computed by GEOADD).
Key: driver_status:{id}. Value: string (AVAILABLE, BUSY, OFFLINE).
Failure Handling: If Redis fails, drivers will appear "offline." We use Redis AOF (Append Only File) to ensure quick recovery of location data.
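To make the geohash score concrete, the sketch below implements the standard base32 geohash encoding. It alternately bisects the longitude and latitude ranges and packs the resulting bits into characters, five bits at a time. Points that share a prefix therefore lie in the same cell. Redis performs an equivalent bit interleaving internally when GEOADD is called (storing an integer as the sorted-set score), so application code would not normally do this. The function name geohash_encode is hypothetical.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    """Standard base32 geohash; each character adds 5 interleaved bits."""
    lat_bounds = [-90.0, 90.0]
    lon_bounds = [-180.0, 180.0]
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # geohash starts with a longitude bit
    while len(chars) < precision:
        bounds, value = (lon_bounds, lon) if use_lon else (lat_bounds, lat)
        mid = (bounds[0] + bounds[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            bounds[0] = mid  # keep the upper half
        else:
            bits <<= 1
            bounds[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


print(geohash_encode(57.64911, 10.40744, precision=5))  # u4pru
```

A 5-character prefix covers a cell roughly 4.9 km on a side at the equator, and the cell gets narrower east-west toward the poles. Two points just either side of a cell boundary get different prefixes even when they are meters apart. This is why a prefix-based search must also check neighboring cells, and it is the edge case the H3 bullet alludes to.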
Wrap Up
Advanced Topics
Consistency vs Availability: We prioritize Consistency for the matching process (using DB transactions) to ensure a driver isn't double-booked. We prioritize Availability for location updates; if one update is lost, the next one in 5 seconds will correct it.
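A sketch of the "claim" step as an atomic database update, assuming the Rides table described earlier. The UPDATE only succeeds while the ride is still REQUESTED, so a ride cannot get two drivers. A partial unique index rejects a second active ride for the same driver, so a driver cannot be double-booked. Python's built-in sqlite3 module stands in for PostgreSQL so that the example runs anywhere. The conditional UPDATE and the partial-index syntax are also valid in PostgreSQL. The index name, make_db and claim_ride are hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE rides (
               id        INTEGER PRIMARY KEY,
               rider_id  INTEGER NOT NULL,
               driver_id INTEGER,
               status    TEXT NOT NULL
           )"""
    )
    # At most one ACCEPTED / IN_PROGRESS ride per driver (partial unique index).
    conn.execute(
        """CREATE UNIQUE INDEX one_active_ride_per_driver
               ON rides (driver_id)
               WHERE status IN ('ACCEPTED', 'IN_PROGRESS')"""
    )
    return conn


def claim_ride(conn: sqlite3.Connection, ride_id: int, driver_id: int) -> bool:
    """Atomically assign a driver; returns False if the ride or driver is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "UPDATE rides SET driver_id = ?, status = 'ACCEPTED' "
                "WHERE id = ? AND status = 'REQUESTED'",
                (driver_id, ride_id),
            )
    except sqlite3.IntegrityError:
        return False  # this driver already holds an active ride
    return cur.rowcount == 1  # 0 rows means another driver claimed it first


conn = make_db()
with conn:
    conn.executemany(
        "INSERT INTO rides (id, rider_id, status) VALUES (?, ?, 'REQUESTED')",
        [(1, 100), (2, 200)],
    )
```

With the fan-out approach, every notified driver can call claim_ride concurrently. The database decides the single winner, and the others see False.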
Matching Logic: The Matching Service performs a GEOSEARCH query in Redis (the replacement for GEORADIUS, which is deprecated as of Redis 6.2) to find drivers within 5 km. It then filters the results by AVAILABLE status. To scale, we can use a "fan-out" approach where the top 3 drivers are notified, and the first to respond wins.
Optimization: To reduce battery drain, the app can decrease GPS update frequency when the driver is stationary (detected via accelerometer).
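The matching step can be sketched in memory with four steps:

1. Compute the great-circle distance to each driver.
2. Keep drivers within 5 km whose status is AVAILABLE.
3. Sort by distance.
4. Return the nearest three for the fan-out.

In production, the distance filter would be the Redis query itself (for example `GEOSEARCH driver_locations FROMLONLAT <lon> <lat> BYRADIUS 5 km ASC`). The haversine function and dictionaries here are illustrative stand-ins, and candidate_drivers is a hypothetical name. The status filter is applied before truncating to three, so busy drivers do not crowd available ones out of the result.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def candidate_drivers(rider_lat, rider_lon, locations, statuses,
                      radius_km=5.0, fan_out=3):
    """Nearest AVAILABLE drivers within radius_km, closest first, at most fan_out."""
    nearby = []
    for driver_id, (lat, lon) in locations.items():
        if statuses.get(driver_id) != "AVAILABLE":
            continue  # filter before truncating so busy drivers don't use up slots
        distance = haversine_km(rider_lat, rider_lon, lat, lon)
        if distance <= radius_km:
            nearby.append((distance, driver_id))
    nearby.sort()
    return [driver_id for _, driver_id in nearby[:fan_out]]


locations = {
    "d1": (0.010, 0.0),  # ~1.1 km north of the rider
    "d2": (0.020, 0.0),  # ~2.2 km, but BUSY
    "d3": (0.030, 0.0),  # ~3.3 km
    "d4": (0.040, 0.0),  # ~4.4 km
    "d5": (0.005, 0.0),  # ~0.6 km
    "d6": (0.100, 0.0),  # ~11 km, outside the radius
}
statuses = {d: "AVAILABLE" for d in locations}
statuses["d2"] = "BUSY"
print(candidate_drivers(0.0, 0.0, locations, statuses))  # ['d5', 'd1', 'd3']
```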