The Question
DesignReal-Time Online Chess Platform
Design a high-scale online chess platform capable of supporting 10 million daily active users. The system must handle real-time multiplayer gameplay with low latency (<200ms), manage distributed matchmaking based on player skill (Elo), and maintain a server-authoritative game state to prevent cheating. Consider how to handle clock synchronization, game persistence for millions of matches daily, and a global leaderboard that updates accurately as games conclude.
WebSockets
Redis
PostgreSQL
Kafka
Consistent Hashing
JWT
Protobuf
Questions & Insights
Clarifying Questions
Scale and Concurrency: What is the target for Daily Active Users (DAU) and peak concurrent games? (Assumption: 10M DAU, 500k concurrent games).
Latency Sensitivity: What is the acceptable "move-to-reflection" latency? (Assumption: < 200ms globally).
Matchmaking Criteria: Is it purely ELO-based, or do we consider geo-proximity and connection quality? (Assumption: Rank-based with secondary latency-based buckets).
Persistence: Do we need to store every move for every game indefinitely, or just the final result? (Assumption: Full move history for anti-cheat and "Game Review" features).
Game Logic: Should the game server run a full chess engine (e.g., Stockfish) for move validation and cheat detection? (Assumption: Lightweight validation on the game server; heavy analysis async).
Thinking Process
Real-Time Synchronicity: How do we maintain a persistent, low-latency duplex connection for 500k games simultaneously?
Server Authority: How do we prevent illegal moves and "time-hacking" in a distributed environment?
Distributed Matchmaking: How do we efficiently pair players of similar skill levels without creating bottlenecks or long wait times?
Reliable Rating Updates: How do we ensure Elo/Glicko-2 ratings are updated atomically and reflected on leaderboards at scale?
Bonus Points
Clock Drift Compensation: Implementing a "Network Time Protocol" (NTP) style handshake to calibrate game clocks, ensuring fair "flagging" (losing on time) despite network jitter.
Z-Order Curve for Matchmaking: Using spatial indexing or bucketing for multi-dimensional matchmaking (Skill + Latency + Behavior Score).
Sticky WebSocket Sessions: Leveraging consistent hashing at the Load Balancer level to route players of the same game to the same Game Server instance to minimize cross-server communication.
Delta-Compressed Move History: Using compact binary formats (Protobuf) for move notation to reduce bandwidth for mobile users.
Design Breakdown
Functional Requirements
Core Use Cases:
Players can join a matchmaking queue and be paired with opponents of similar rank.
Real-time move execution with server-authoritative validation.
Game clock management (Bullet, Blitz, Rapid).
Persistence of game history (PGN format).
Global leaderboard based on Elo ratings.
Scope Control:
In-scope: Matchmaking, Real-time moves, Leaderboards.
Out-of-scope: In-game chat, AI-bot play, tournament bracket management, social/friends lists.
Non-Functional Requirements
Scale: Support 10M DAU and 500k concurrent games.
Latency: Move propagation latency < 100ms (P99).
Availability: 99.99% availability; active games must survive server restarts (reconnection logic).
Consistency: Strong consistency for game state and rating updates.
Security: TLS encryption; server-side move validation to prevent cheating.
Estimation
Traffic:
500k concurrent games = 1M active WebSocket connections.
Avg moves/game = 40. Avg move frequency = 1 move per 5s.
Peak Move QPS = 1M users / 5s = 200k move requests/sec.
Storage:
10M games/day. 1 Game History (PGN) \approx 2KB.
Daily storage: 10M * 2KB = 20GB.
Yearly storage: ~7.3TB.
Bandwidth:
Incoming: 200k QPS * 500 bytes (move packet) \approx 100MB/s.
Outgoing: 200k QPS * 500 bytes (broadcast to opponent) \approx 100MB/s.
Blueprint
Concise Summary: A WebSocket-based architecture using a distributed Matchmaker and authoritative Game Servers. Game state is cached in Redis for low-latency access and persisted in PostgreSQL upon game completion.
Major Components:
API Gateway: Handles authentication and routing for REST and WebSocket upgrades.
Matchmaker Service: Uses Redis-sorted sets to group players by rank and pair them.
Game Server Cluster: Manages real-time game state, clocks, and move validation.
State Store (Redis): Stores ephemeral game data (current board, active clocks).
Rating Worker: Async processing of game results to update Elo and leaderboards.
Simplicity Audit: This design avoids complex distributed lock managers by assigning a specific Game Server instance as the "leader" for a specific Game ID via consistent hashing.
Architecture Decision Rationale:
WebSockets over HTTP: Essential for bidirectional real-time updates and minimizing header overhead.
Redis for Matchmaking: Extremely high throughput for atomic "pop" operations when pairing players.
PostgreSQL: Chosen for game history due to ACID requirements for rating integrity.
High Level Architecture
Sub-system Deep Dive
Edge (Optional)
Content Delivery & Traffic Routing: Use a Global Accelerator (Anycast IP) to route traffic to the nearest regional PoP, minimizing TCP handshake latency.
Security & Perimeter:
API Gateway: Performs JWT validation and terminates SSL.
Rate Limiting: Applied per UserID to prevent "move-spamming" DoS.
Service
Matchmaker Service:
Players are placed into "Elo Buckets" (e.g., 1200-1300).
If no match is found in 5s, the bucket range expands.
Once two players are matched, a
GameID is generated and they are redirected to a specific Game Server node.Game Server Cluster:
Stateful Management: Each server handles a subset of active games.
Authority: Validates move legality (FIDE rules). If a move is illegal, the server rejects it without broadcasting.
Heartbeats: Detects player disconnects. Starts a "grace period" timer for reconnection.
API Schema:
POST /v1/match/join: Join queue.WS /v1/game/{gameId}: WebSocket endpoint for moves.Message:
{ move: "e2e4", timestamp: 1625... }GET /v1/leaderboard: Fetch top players.Storage
Access Pattern: Heavy writes for game history (end of game), heavy reads/writes for active state (Redis).
Database Table Design:
Games Table: game_id (PK), white_user_id, black_user_id, pgn_history (TEXT), result, end_time.User_Stats Table: user_id (PK), elo_rating, games_played, wins, losses.Technical Selection:
PostgreSQL: Handles the relational nature of user stats and game history with strong consistency.
Redis: Sorted sets (ZSET) for the leaderboard (Score = Elo, Member = UserID).
Cache
Purpose: Low-latency storage for active board states and session data.
Key-Value Schema:
game:state:{gameId}: Hash map containing board_fen, white_clock_ms, black_clock_ms, last_move_ts.Failure Handling: If a Game Server crashes, the new assigned server fetches the state from Redis.
Messaging
Purpose: Decouples game completion from rating calculations and persistence.
Event Schema:
GameCompletedEvent { gameId, winnerId, pgn, duration }.Technical Selection: Kafka or RabbitMQ. Used to ensure that even if the Rating Worker is down, no game results are lost (durability).
Wrap Up
Advanced Topics
Trade-offs: We choose Consistency over Availability (CP) for the game state. A player would rather have the game pause (wait for state) than have the board desynchronize between opponents.
Reliability:
Redis Replication: Use Redis Sentinel or Cluster to ensure the active game state is not lost.
Reconnection Logic: Clients must be able to resume a WebSocket session using the
gameId and an auth token.Bottleneck Analysis:
Matchmaker Hotspots: At 10M DAU, a single Redis instance for matchmaking might bottleneck. Strategy: Shard the matchmaking queue by Elo ranges.
WebSocket Count: A single server can handle ~50k-100k connections. We need 10-20 servers for 1M connections.
Security:
Move Validation: Every move must be checked against the current FEN (Forsyth-Edwards Notation) stored in the server's memory/Redis.
Anti-Cheat: Periodically stream moves to an async Analysis Service running Stockfish to flag high accuracy/engine-like play.