The Question
DesignReal-time Trading Platform Design
Design a high-concurrency retail trading platform like Robinhood. The system must support millions of users viewing real-time stock prices and executing buy/sell orders with strong consistency for financial ledgers. Focus on the architecture for low-latency market data distribution and a resilient asynchronous order execution pipeline that interacts with external market makers.
PostgreSQL
Kafka
Redis
WebSockets
Go
Protobuf
Kubernetes
mTLS
Questions & Insights
Clarifying Questions
What is the primary asset class for the MVP?
Assumption: We focus on US Equities (Stocks and ETFs) to minimize regulatory and technical complexity for the MVP.
What is the expected scale (DAU and Peak Traffic)?
Assumption: 20M total users, 5M DAU. Peak traffic occurs during market open (9:30 AM EST) and high-volatility events, requiring the system to handle 10x the average load.
Do we need an internal matching engine or do we route to external market makers/exchanges?
Assumption: To reflect the Robinhood business model (PFOF), we route orders to external market makers/exchanges via FIX protocol or proprietary APIs.
How "real-time" must the market data be?
Assumption: Sub-second latency for UI updates via WebSockets is required to provide a competitive trading experience.
Thinking Process
Core Bottleneck: High-concurrency write contention on account balances and order books during market volatility.
Key Questions for Design Evolution:
How do we ensure atomic transactions for buying/selling while maintaining high throughput? (Answer: Use RDBMS with strict ACID and row-level locking or Event Sourcing).
How do we deliver market price updates to millions of concurrent users without melting the backend? (Answer: Pub/Sub model via WebSockets and an Edge Cache).
How do we handle the asynchronous nature of trade execution with external parties? (Answer: Distributed Saga pattern or State Machine with a message broker like Kafka).
How do we protect the system from "thundering herd" during market open? (Answer: Pre-warmed caches, aggressive rate limiting, and prioritized queueing).
Bonus Points
Idempotency Keys: Use client-side generated UUIDs for every order request to prevent double-spending/double-buying during retry loops in unstable network conditions.
Event Sourcing for Audit: Implement an append-only ledger for all balance changes to ensure a perfect audit trail and simplify reconciliation with clearinghouses.
Hot-Partition Mitigation: Use specialized sharding for high-volume ticker data (e.g., TSLA, NVDA) to prevent single-database-node saturation.
Isolation Levels: Use
REPEATABLE READ or SERIALIZABLE isolation for the transaction ledger to prevent race conditions in margin calculations.Design Breakdown
Functional Requirements
Core Use Cases:
Users can view real-time market prices and charts.
Users can place Market and Limit orders (Buy/Sell).
Users can view their Portfolio (current holdings, total value).
Users receive notifications on order execution.
Scope Control:
In-Scope: Equity trading, real-time price streaming, basic order management, portfolio tracking.
Out-of-Scope: Options, Crypto, Margin trading, Social features, Instant Bank Transfers (ACH lifecycle).
Non-Functional Requirements
Scale: Support 5M DAU with peak 100k+ TPS (Transactions Per Second) during market events.
Latency: < 100ms for order placement acknowledgment; < 500ms for end-to-end price propagation.
Availability & Reliability: 99.99% availability during market hours. Zero data loss for financial records.
Consistency: Strong consistency for account balances and order states.
Fault Tolerance: Graceful degradation (e.g., disable charts but keep trading active).
Security & Privacy: PCA-DSS compliance, TLS 1.3, and strict mTLS between internal services.
Estimation
Traffic Estimation:
5M DAU * 10 orders/day = 50M orders/day.
Peak QPS (Market Open): ~50k-100k requests/sec.
Price Stream: 10,000 tickers * 10 updates/sec = 100k updates/sec to be fanned out to active users.
Storage Estimation:
1 order record ≈ 500 bytes. 50M orders/day = 25GB/day.
10 years of data = ~90TB (manageable with sharding).
Bandwidth Estimation:
Outgoing (Price stream): 5M active connections * 1KB/sec = 5GB/s (requires heavy CDN/Edge usage).
Blueprint
Concise Summary: A microservices-based architecture leveraging an asynchronous order processing pipeline. Market data is ingested via a streaming layer and broadcast to users over WebSockets, while trade execution is handled by a transactional Order Service backed by a relational database and Kafka for reliability.
Major Components:
API Gateway: Handles authentication, rate limiting, and routing for mobile/web clients.
Market Data Service: Ingests vendor feeds (Nasdaq/NYSE) and publishes to a WebSocket fleet.
Order Service: Manages the order lifecycle (Validate -> Lock Funds -> Route -> Execute).
Portfolio Service: Tracks user holdings and calculates real-time profit/loss (P&L).
Message Broker (Kafka): Decouples order placement from execution and downstream updates (notifications/ledger).
Simplicity Audit: This architecture uses a standard RDBMS for financial integrity and Kafka for async processing, avoiding complex distributed ledgers or custom matching engines not required for a PFOF-based MVP.
Architecture Decision Rationale:
Why this architecture is the best?: It separates the high-volume, read-heavy market data path from the high-integrity, write-heavy trading path.
Functional Requirement Satisfaction: WebSocket fleet handles real-time updates; Kafka ensures order execution is never lost.
Non-functional Requirement Satisfaction: RDBMS provides ACID for consistency; horizontal scaling of the WebSocket fleet handles the massive fan-out.
High Level Architecture
Sub-system Deep Dive
Edge (Optional)
Content Delivery & Traffic Routing: Use a Global Load Balancer with Geo-DNS. Static assets (UI) are served via CDN.
Security & Perimeter:
API Gateway: Implements JWT validation and request signing to prevent tampering.
Rate Limiting: Enforced at the user ID level to prevent "keyboard smashing" bots during volatility.
SSL/TLS: Terminated at the Gateway; mTLS used for internal service communication.
Service
Topology & Scaling: Stateless services (Order, Portfolio) deployed in K8s across multiple Availability Zones. Scaling is based on CPU and request latency.
API Schema Design:
POST /v1/orders: symbol, qty, side, type, limit_price, idempotency_key. Returns order_id.GET /v1/portfolio: Returns current holdings and equity.Resilience & Reliability:
Circuit Breakers: Implemented on the Execution Worker when calling External Exchange APIs.
Retries: Exponential backoff for non-terminal failures (e.g., 503 from exchange).
Security: RBAC for internal tools; sensitive data (SSN/Bank info) is encrypted at the application layer before storage.
Storage
Access Pattern:
Order DB: High write (new orders, status updates).
Portfolio DB: Read-heavy (UI display), moderate write (update on execution).
Database Table Design:
Orders Table:
id (PK), user_id (FK), ticker, side (BUY/SELL), status (PENDING/FILLED/CANCELLED), price, quantity, timestamp.Ledger Table:
id, user_id, amount, transaction_type, reference_id.Technical Selection: PostgreSQL with logical sharding by
user_id. Rationale: ACID compliance is non-negotiable for financial transactions.Distribution Logic: Sharding by
user_id ensures all data for a single user resides on one node, facilitating atomic balance updates.Cache
Purpose & Justification: Reduces load on the Market Data Service and provides sub-millisecond price lookups for the Order Service (to validate "limit price" against "current price").
Key-Value Schema:
ticker_symbol -> {last_price, bid, ask, timestamp}.Technical Selection: Redis. Use Pub/Sub internally to push updates from Ingestor to WebSocket Fleet.
Failure Handling: If Redis fails, fall back to the last known price in the Market Data Service memory.
Messaging
Purpose & Decoupling: Kafka acts as the source of truth for the order lifecycle.
Event / Topic Schema:
order-events: Contains state changes (PLACED, ROUTED, EXECUTED).portfolio-updates: Consumed by Portfolio Service to update holdings.Throughput & Partitioning: Partition by
user_id to ensure strict ordering of events for a specific user (e.g., Sell cannot be processed before Buy).Technical Selection: Kafka. Required for high-throughput and replayability (reconciliation).
Data Processing
Processing Model: Stream processing for market data ingestion.
Processing DAG:
Source: External FIX/UDP Feed.
Transform: Normalize data formats.
Sink: Update Redis and trigger WebSocket broadcast.
Technical Selection: Custom Go-based service for low-latency market data ingestion. Flink is skipped for MVP to reduce complexity.
Wrap Up
Advanced Topics
Trade-offs (PACELC): In the Order path, we choose Consistency (C) over Availability (A). If the DB is down, we cannot accept trades. In the Market Data path, we choose Availability (A) over Consistency (C) (it's okay if a price update is slightly delayed to some users).
Reliability: A "Dead Letter Queue" (DLQ) in Kafka handles failed order executions that require manual intervention or "compensating transactions."
Bottleneck Analysis: The primary bottleneck is the
user_id shard during high volatility. We mitigate this by using high-performance SSDs and optimizing the transaction length.Optimization: Use "Binary Packing" for WebSocket messages (e.g., Protobuf) to reduce egress bandwidth costs by up to 80% compared to JSON.