The Question
DesignReal-time Multiplayer Chess Platform
Design a scalable, real-time online chess system. The solution must address low-latency move delivery for time-sensitive matches, server-side validation of game logic to prevent cheating, an efficient matchmaking algorithm based on player skill (Elo), and a reliable method for persisting game history and updating global leaderboards.
WebSocket
Redis
Kafka
PostgreSQL
Protobuf
ELO Rating
Questions & Insights
Clarifying Questions
Scale and Traffic: What is the target for Daily Active Users (DAU) and Peak Concurrent Users (PCU)?
Assumption: 1M DAU and 100k PCU.
Game Modes: Do we need to support Bullet (1 min), Blitz (3-5 mins), or just Daily chess?
Assumption: Support for real-time Blitz/Bullet (latency critical) is required.
Anti-Cheat: Is server-side engine analysis required for the MVP to detect cheaters?
Assumption: No. MVP will focus on basic server-authoritative move validation only.
Matchmaking Logic: Is it strictly Elo-based?
Assumption: Yes, Elo-based matchmaking within specific rating buckets.
Thinking Process
Core Strategy: Use a stateful WebSocket-based architecture for real-time move delivery, backed by a fast in-memory store (Redis) for game state, and an asynchronous pipeline for persistence and leaderboard updates.
Step 1: The Connection: How do we maintain low-latency bidirectional communication?
Use WebSockets with a Gateway layer that maps Player IDs to specific Game Server instances.
Step 2: The Logic: How do we ensure no illegal moves?
The Game Server maintains a virtual board (e.g., using a chess library) and validates every move against the current FEN (Forsyth-Edwards Notation) state before broadcasting.
Step 3: The Matchmaking: How do we pair players efficiently?
Use Redis Sorted Sets to bucket players by Elo and "Matchmaking Workers" to poll and create game sessions.
Step 4: The Scale: How do we handle 100k concurrent games?
Shard game sessions across a cluster of Game Servers. Use a "Game Registry" in Redis to track which server is hosting which Game ID.
Bonus Points
Clock Synchronization: Implementing "Lag Compensation" algorithms to ensure that a player's clock doesn't deplete due to network jitter (server-side timestamping vs. client-side intent).
Z-Order Curves for Matchmaking: Using advanced indexing for multi-dimensional matchmaking (e.g., Elo + Latency/Region + Game Type).
Turn-based Consistency: Using a "Sequence Number" in the WebSocket protocol to handle out-of-order packets and reconnections seamlessly.
Hot-Path Optimization: Using Protocol Buffers (Protobuf) instead of JSON for WebSocket payloads to reduce bandwidth and serialization overhead.
Design Breakdown
Functional Requirements
Users can join a matchmaking queue based on game type (e.g., 5-min Blitz).
Users can play real-time chess with server-validated moves.
The system manages game clocks (timers) accurately.
Users can view their Elo-based leaderboard.
Users can view game history (PGN format).
Non-Functional Requirements
Low Latency: Move delivery < 100ms for a smooth experience.
High Availability: Matchmaking and ongoing games should not drop on single-node failure.
Consistency: The game state must be strictly authoritative on the server.
Scalability: Support 100k concurrent games.
Estimation
CCU: 100k players = 50k concurrent games.
Bandwidth: 1 move every 2 seconds per game. 50k / 2 = 25k moves/sec. Each move payload ~200 bytes. 25k * 200B = 5MB/s (Very manageable).
Storage: 1M games/day. Each PGN ~1KB. 1GB/day for game history. 365GB/year.
Memory: 50k active games in Redis. Each game state ~2KB. 50k * 2KB = 100MB (Trivial for Redis).
Blueprint
Concise Summary: A WebSocket-centric architecture where a Matchmaking Service pairs players into sessions hosted on stateful Game Servers, with Redis handling ephemeral state and Kafka decoupling long-term persistence.
Major Components:
API Gateway: Handles Auth, REST requests, and WebSocket upgrades/routing.
Matchmaking Service: Uses Redis-based queues to pair players of similar skill.
Game Service (Stateful): Manages WebSocket connections, validates chess logic, and tracks game clocks.
Game Registry (Redis): A global map of GameID to GameServerAddr to allow cross-server communication.
Persistence Worker: Consumes finished games from Kafka to update Postgres and Leaderboards.
Simplicity Audit: This is the simplest viable design because it avoids complex distributed locking by pinning a game to a single server node, while using Redis for fast discovery.
Architecture Decision Rationale:
Why this architecture?: Chess is a low-bandwidth but high-frequency state machine. A stateful server per game is the most performant way to handle timers and validation.
Functional Satisfaction: Covers matchmaking, real-time play, and history via the async pipeline.
Non-functional Satisfaction: Scalable via horizontal sharding of Game Servers; Low latency via direct WebSocket pipes.
High Level Architecture
Sub-system Deep Dive
Service
Topology & Scaling
Game Service: Stateful. Scaling is based on "Connection Count" and "CPU" (logic validation). Multi-AZ deployment is critical; if a node fails, clients reconnect and the new node fetches state from Redis.
Matchmaking Service: Stateless. Scales based on queue depth.
API Schema Design
POST /v1/matchmaking/join: { game_type: "blitz", elo: 1500 }.WS /v1/game/{game_id}: Binary Protobuf stream.MoveRequest: { from: "e2", to: "e4", timestamp: 1234567 }.GameUpdate: { fen: "...", white_clock: 298, black_clock: 300, last_move: "e4" }.Resilience & Reliability
Circuit Breaker: Used in the Gateway to stop sending players to the Matchmaking service if queues are backed up.
Heartbeats: WebSockets send pings every 5s. If a client dcs for > 30s, the server flags the game for "Timeout Win."
Observability
Metrics: Move-to-Broadcast latency (P99 < 50ms), Matchmaking wait time, Active WebSocket count.
Security
JWT Auth: Validated at the Gateway during WS upgrade.
Server-Authoritative: Clients only send "Intent" (e2-e4). The server recalculates the board to prevent move-injection or illegal teleports.
Storage
Access Pattern
Game history: Write-heavy (at game end), read-light (player profile view).
Leaderboards: Read-heavy (top 100), write-heavy (incrementing Elo).
Database Table Design
Games Table:
game_id (UUID), white_id, black_id, result, pgn_data (TEXT), created_at.Users Table:
user_id, username, elo_rating, games_played.Technical Selection
PostgreSQL: For persistent game history and user metadata (ACID compliant).
Redis: For active game state and leaderboards (Sorted Sets).
Distribution Logic
Sharding: Postgres sharded by
user_id for history. Redis uses Cluster mode for horizontal scale.Cache
Purpose & Justification:
Matchmaking Cache: Stores the list of players waiting for a game. Needs O(log N) for Elo-range searches.
Active Game State: Stores FEN strings and clocks. Prevents losing game progress if a Game Service node restarts.
Key-Value Schema:
game:{id} -> {fen, white_id, black_id, start_time}.leaderboard:blitz -> SortedSet {score: elo, member: user_id}.Failure Handling: Redis Sentinel for high availability. If Redis fails, current games are lost (acceptable for MVP), but Postgres remains the source of truth for history.
Messaging
Purpose & Decoupling: Decouples the real-time game loop from slow DB writes and third-party notifications.
Event Schema:
GameFinishedEvent: {game_id, pgn, winner_id, end_reason}.Technical Selection: Kafka. High throughput and ensures that even if the History Worker is down, game results are eventually processed.
Failure Handling: Dead-letter queue (DLQ) for malformed PGNs or failed Elo updates.
Wrap Up
Advanced Topics
Trade-offs (PACELC): We choose Availability over Consistency (AP) for the connection (if a node dies, move to another), but Strong Consistency for the game state (a move is either valid or not).
Bottleneck Analysis: The Matchmaking Service can become a bottleneck if the pool is huge. We solve this by partitioning the pool by Elo ranges (0-1000, 1000-2000, etc.).
Security: Chess engines are the biggest threat. While not in the MVP logic, the architecture allows adding an "Analysis Worker" to the Kafka pipeline to flag suspicious centipawn loss in hindsight.
Distinguishing Insight: To handle "Bullet Chess" where every ms counts, we use WebSockets with TCP_NODELAY enabled to prevent Nagle's algorithm from buffering small chess moves.