The Question
DesignReal-time Multiplayer Tetris System Design
Design a high-scale, real-time multiplayer Tetris game. The system must support 1v1 matchmaking based on skill level, server-authoritative move validation to prevent cheating, and a global leaderboard. Focus on minimizing latency for a responsive 'feel,' handling 100,000 concurrent users, and ensuring game state consistency between opponents. Address how garbage lines are synchronized and how the system handles player disconnections.
WebSockets
Redis
PostgreSQL
Protobuf
JWT
CDN
Questions & Insights
Clarifying Questions
Is this primarily single-player with a leaderboard, or real-time head-to-head multiplayer?
Assumption: We will design for real-time 1v1 multiplayer as it presents the most significant architectural challenge (synchronization and latency).
What is the expected scale (DAU and Concurrent Users)?
Assumption: 1 million DAU, with 50,000 Concurrent Users (CCU).
Is the game logic server-authoritative or client-side?
Assumption: Server-authoritative logic to prevent cheating, especially for competitive leaderboards.
What is the latency threshold for a "good" experience?
Assumption: Player actions (rotation, drop) should feel responsive, targeting <100ms round-trip latency.
Thinking Process
Core Bottleneck: Real-time state synchronization between two players and a central server with minimal jitter.
Key Focus Areas:
Connection Management: Using WebSockets for persistent, low-latency bi-directional communication.
Matchmaking: Efficiently pairing players of similar skill levels using a memory-resident queue.
Server-Authoritative Game Loop: Processing inputs on the server to prevent "teleporting" blocks or score manipulation.
State Compression: Minimizing the payload size of board updates to stay within MTU limits and reduce lag.
Bonus Points
Input Prediction & Lag Compensation: Implementing client-side prediction where the client renders the move immediately while waiting for server confirmation, rolling back only on discrepancies.
Delta Compression: Instead of sending the whole 10x20 grid, only send the coordinates of the active piece and lines cleared to save bandwidth.
Geo-aware Matchmaking: Placing players in the same game session on a server closest to their geographic midpoint to equalize latency.
Bin-packing Game Sessions: Packing multiple game instances (lightweight threads/goroutines) into a single high-compute server instance to optimize costs.
Design Breakdown
Functional Requirements
Core Use Cases:
Users can join a matchmaking queue.
Users can play a real-time 1v1 Tetris match.
Users see their opponent's board and "garbage lines" sent to them.
Users can view a global leaderboard of top scores.
Scope Control:
In-scope: Real-time 1v1, Matchmaking, Anti-cheat logic, Global Leaderboard.
Out-of-scope: In-game voice chat, Replay system, Clan/Guild features, Cosmetic skins store.
Non-Functional Requirements
Scale: Support 50k concurrent games (100k CCU).
Latency: Sub-100ms latency for piece movement and rotation.
Availability: High availability for the Matchmaking and Leaderboard services (99.9%). Game sessions should be resilient but can be terminated if a server fails.
Consistency: Strong consistency for Leaderboard rankings; Eventual consistency for historical match stats.
Security: Prevent score spoofing via server-side validation of every piece movement.
Estimation
Traffic Estimation:
100k CCU = 50k active games.
Input frequency: ~5 actions/second per player.
Total QPS (Inputs): 100k users * 5 = 500k WebSocket messages/sec.
Storage Estimation:
User profiles + Stats: 1 million users * 1KB = 1GB.
Leaderboard: 1 million entries in a Sorted Set ≈ 100MB.
Bandwidth Estimation:
Packet size: ~200 bytes (JSON/Protobuf).
500k pkts/sec * 200 bytes = 100 MB/s total ingress/egress.
Blueprint
Concise Summary: A WebSocket-based real-time architecture utilizing a Stateful Game Server for authoritative logic and Redis for matchmaking and leaderboards.
Major Components:
WebSocket Gateway: Manages persistent connections and routes messages between clients and game servers.
Matchmaker: Uses Redis to pair players by skill (ELR/MMR) and assigns them to a Game Server.
Stateful Game Server: Hosts the actual game logic, validates moves, and calculates "garbage" lines.
Leaderboard Service: Manages global rankings using Redis Sorted Sets.
Simplicity Audit: This architecture avoids complex message brokers (like Kafka) for the real-time path, opting for direct WebSocket-to-Server routing to minimize latency.
Architecture Decision Rationale:
Why this architecture?: Tetris requires immediate feedback. A stateful server model minimizes the overhead of fetching state from a DB for every single block rotation.
Functional Satisfaction: Matchmaking pairs players; Game servers handle the 1v1 logic and garbage line mechanics.
Non-functional Satisfaction: WebSockets provide low latency; Redis Sorted Sets provide O(\log N) leaderboard updates for high scale.
High Level Architecture
Sub-system Deep Dive
Edge (Optional)
Content Delivery & Traffic Routing:
Use a global CDN for static assets (JS/WASM game engine).
Anycast DNS to route users to the nearest regional data center.
Security & Perimeter:
API Gateway: Handles JWT-based authentication.
Rate Limiting: Limits "Join Queue" requests to prevent spamming the Matchmaker.
SSL Termination: Performed at the Load Balancer to offload compute from Game Servers.
Service
Topology & Scaling:
Stateful Game Servers: Scaled based on "Session Count." Once a server hits its limit (e.g., 500 games), it marks itself as "full" in the Session Manager.
Matchmaker: Stateless service using Redis as the source of truth for the queue.
API Schema Design:
POST /v1/matchmaking/join: Starts the search.WebSocket /game/stream: Protobuf-based binary protocol for moves.msg_type: MOVE, direction: LEFT, timestamp: 162540...Resilience & Reliability:
Heartbeats: Clients send heartbeats every 5s. If missed, the server waits 10s for reconnection before declaring a forfeit.
Circuit Breaker: If the Leaderboard service is down, game servers buffer scores locally to retry later.
Storage
Access Pattern:
Heavy writes to Redis for matchmaking/leaderboards.
Low-frequency writes to PostgreSQL for match history.
Database Table Design (PostgreSQL):
users: user_id (PK), username, m_score, created_atmatch_history: match_id (PK), player1_id, player2_id, winner_id, p1_score, p2_score, ended_atTechnical Selection:
Redis: For real-time data due to sub-millisecond latency.
PostgreSQL: For relational integrity of user data and historical records.
Distribution Logic:
Sharding PostgreSQL by
user_id if the user base grows to 100M+.Cache
Purpose & Justification: Redis acts as both a cache and a primary store for transient data (active games and matchmaking).
Key-Value Schema:
matchmaking_queue:skill_level: Redis List or Sorted Set (Score = Entry Time).global_leaderboard: Redis Sorted Set (Member = UserID, Score = Tetris Rating).Failure Handling:
If Redis fails, matchmaking is paused. Since this is an MVP, we accept this risk rather than implementing a complex multi-master DB.
Wrap Up
Advanced Topics
Stateful vs Stateless: We chose Stateful Game Servers.
Trade-off: Harder to scale (requires sticky sessions/session mapping).
Benefit: Extremely low latency as the game state resides in the server's RAM.
Protocol Choice: We chose WebSockets over HTTP Long Polling.
Reasoning: Tetris requires high-frequency updates (multiple times per second). The overhead of HTTP headers in every move would be prohibitive.
Anti-Cheat Strategy:
The server maintains its own copy of the board.
When a client says "Rotate Piece," the server checks if the rotation is valid. If the client sends a "Line Clear" message but the server's board doesn't show a full line, the message is ignored or the player is flagged.
Optimizing Garbage Lines:
When Player A clears 4 lines, a "Garbage Event" is sent to the Session Manager, which then pushes a message to Player B's WebSocket. This is handled entirely in memory on the Game Server hosting that specific match.