The Question
Design: Real-Time Community Chat Platform
Design a large-scale real-time community chat platform similar to Discord. The system should support servers, channels, direct messaging, and live presence indicators for tens of millions of concurrent users, with sub-200ms message delivery and high availability across global regions.
WebSocket
Redis Pub/Sub
Cassandra
PostgreSQL
Consistent Hashing
Questions & Insights
Thinking Process
To design a system like Discord, the primary challenge is handling massive real-time fan-out and persistent connections.
Core Bottleneck: Managing millions of concurrent WebSocket connections and broadcasting a single message to thousands of members in a "Server" (Guild) simultaneously.
Progressive Questions:
Connection Management: How do we maintain stateful connections for millions of users? (Answer: Sharded WebSocket Gateways).
Message Fan-out: When a message is sent in a 100k-member server, how do we notify only the online users? (Answer: Pub/Sub with a Presence Service).
Storage Strategy: How do we store billions of chat messages while maintaining fast "scroll-back" performance? (Answer: Wide-column NoSQL like Cassandra/ScyllaDB).
Presence Fatigue: How do we prevent the "Thundering Herd" problem when a celebrity goes online? (Answer: Presence "Lazy Loading" and batched updates).
Bonus Points
ScyllaDB Optimization: Using ScyllaDB instead of Cassandra for its shard-per-core architecture to minimize P99 latency during massive chat bursts.
Discord’s "Read States": Implementing a custom "Read Pointer" service using a "compressed bitset" or "version clocks" to track unread messages without per-message DB writes.
Gateway Sharding: Using a "Consistent Hashing" ring for WebSocket Gateways to ensure that when a node fails, only a fraction of users are disconnected.
Voice Infrastructure: Mentioning SFU (Selective Forwarding Unit) over MCU to minimize server-side transcoding costs for voice channels.
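To make the Gateway Sharding point concrete, here is a minimal sketch of a consistent-hash ring. The class name HashRing, the use of SHA-256, the 100 virtual nodes per gateway, and the node names are all illustrative assumptions, not details from the design above. It shows that when one gateway is removed, only the users that were mapped to it get reassigned.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring (first 8 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (the vnode count is an assumption)."""

    def __init__(self, nodes, vnodes=100):
        self._vnodes = vnodes
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self._vnodes):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, user_id):
        """Return the first node clockwise from the user's hash."""
        idx = bisect.bisect_right(self._points, _hash(user_id)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["gateway_node_1", "gateway_node_2", "gateway_node_3", "gateway_node_4"])
users = [f"user:{i}" for i in range(10_000)]
before = {u: ring.lookup(u) for u in users}

ring.remove("gateway_node_3")   # simulate a gateway failure
after = {u: ring.lookup(u) for u in users}

moved = [u for u in users if before[u] != after[u]]
print(f"{len(moved)} of {len(users)} users had to reconnect")
```

With four gateways, roughly a quarter of the users move. A plain "hash modulo node count" scheme would instead remap most users whenever the node count changes.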
Design Breakdown
Functional Requirements
Users can create Servers (Guilds) and Channels.
Users can send/receive real-time text messages.
Users can see the online/offline status (Presence) of friends/server members.
Users can see message history (infinite scroll).
Non-Functional Requirements
Low Latency: Message delivery < 200ms.
High Availability: System must stay up even if a regional data center fails.
Scalability: Support 10M+ DAU and servers with 500k+ members.
Consistency: Eventual consistency for message history is acceptable; real-time delivery must feel ordered.
Estimation
DAU: 10 Million.
Messages per Day: 500 Million.
Average Message Size: 100 bytes.
Storage: 500M * 100 bytes = 50GB/day. ~18TB/year (excluding media).
Concurrency: Peak 1M concurrent WebSocket connections.
Throughput: ~6,000 Messages Per Second (MPS) average; 50k+ peak.
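The figures above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch using the stated inputs; the variable names are illustrative, and decimal units (1 GB = 10^9 bytes) are assumed.

```python
DAU = 10_000_000
MESSAGES_PER_DAY = 500_000_000
AVG_MESSAGE_BYTES = 100
SECONDS_PER_DAY = 86_400
PEAK_MPS = 50_000  # peak figure quoted in the estimate

storage_per_day_gb = MESSAGES_PER_DAY * AVG_MESSAGE_BYTES / 1e9   # 50.0 GB/day
storage_per_year_tb = storage_per_day_gb * 365 / 1e3              # 18.25 TB/year
avg_mps = MESSAGES_PER_DAY / SECONDS_PER_DAY                      # ~5,787 messages/s
messages_per_user = MESSAGES_PER_DAY / DAU                        # 50 messages/user/day
peak_to_avg = PEAK_MPS / avg_mps                                  # ~8.6x burst factor

print(f"{storage_per_day_gb:.1f} GB/day, {storage_per_year_tb:.2f} TB/year")
print(f"{avg_mps:,.0f} MPS average, peak is {peak_to_avg:.1f}x average")
```

The implied burst factor of roughly 8.6x is worth stating explicitly in an interview, since it drives how much headroom the Gateway and Pub/Sub tiers need.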
Blueprint
Concise Summary: A microservices architecture leveraging WebSockets for bi-directional communication, Redis for real-time presence/pub-sub, and Cassandra for scalable message storage.
Major Components:
Gateway Service (WebSocket): Manages persistent connections and routes incoming/outgoing real-time events.
Presence Service: Tracks user heartbeats and manages "who is online" state using an in-memory store.
Chat Service: Handles message validation, persistence logic, and fan-out triggers.
Metadata DB (Postgres): Stores relational data like Server settings, Channel permissions, and User profiles.
Message Store (Cassandra): Stores the actual chat history optimized for time-series retrieval.
Simplicity Audit: This design avoids complex stream processing (Flink/Spark) for the MVP, relying on Redis Pub/Sub for immediate message relay, which is sufficient for initial scale.
High Level Architecture
Sub-system Deep Dive
Service
Gateway Service (WebSocket):
Maintains long-lived TCP connections.
Upon connection, it maps User_ID -> Connection_ID in the Session Cache.
Uses a "Heartbeat" mechanism (every 30s) to detect stale connections.
API Spec:
POST /v1/channels/{id}/messages: Send a message.
GET /v1/channels/{id}/messages?limit=50: Fetch history.
WS /gateway: Upgrade to WebSocket for real-time events.
Storage
Data Model (Cassandra):
Table: messages_by_channel
Partition Key: channel_id (clusters all messages for one channel together).
Clustering Key: message_id (TimeUUID, descending) for fast retrieval of latest messages.
Metadata DB (Postgres):
Tables for Servers, Channels, Users, and Memberships.
Uses standard B-Tree indexing on server_id and user_id.
Cache
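Before moving on to the cache layer, the messages_by_channel layout described above can be illustrated with a small in-memory model. This is a sketch only, not Cassandra code: the class MessagesByChannel, the `before` cursor parameter, and the use of plain integers in place of TimeUUIDs are assumptions made for illustration. It shows why partitioning by channel_id and ordering by message_id makes "latest N" and scroll-back reads cheap.

```python
import bisect
from collections import defaultdict


class MessagesByChannel:
    """In-memory stand-in for the messages_by_channel table."""

    def __init__(self):
        # partition key (channel_id) -> message_ids kept in sorted order
        self._ids = defaultdict(list)
        self._bodies = {}

    def insert(self, channel_id, message_id, body):
        # message_id is assumed to be time-ordered, like a TimeUUID
        bisect.insort(self._ids[channel_id], message_id)
        self._bodies[(channel_id, message_id)] = body

    def fetch(self, channel_id, limit=50, before=None):
        """Return up to `limit` messages, newest first, older than `before`."""
        ids = self._ids[channel_id]
        end = len(ids) if before is None else bisect.bisect_left(ids, before)
        page = ids[max(0, end - limit):end][::-1]
        return [(mid, self._bodies[(channel_id, mid)]) for mid in page]


store = MessagesByChannel()
for i in range(1, 8):
    store.insert("channel_a", i, f"msg {i}")
store.insert("channel_b", 1, "other channel")

first_page = store.fetch("channel_a", limit=3)
# Scroll back: use the oldest message_id on the page as the next cursor.
next_page = store.fetch("channel_a", limit=3, before=first_page[-1][0])
print([mid for mid, _ in first_page], [mid for mid, _ in next_page])
```

Each page is a contiguous slice of one partition, which is the access pattern the partition key and descending clustering key are chosen for.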
Session Cache (Redis): Stores which Gateway node a user is connected to (e.g., user:123 -> gateway_node_5).
Presence Store (Redis):
Stores user_id: status (online, dnd, idle).
Uses Redis Sorted Sets to track last-active timestamps for pruning.
Eviction: No eviction for active sessions; Presence entries expire after 2x heartbeat interval.
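The presence expiry rule can be sketched as follows. This is an in-memory stand-in, not Redis client code: the PresenceStore class and its method names are hypothetical, and the dictionary of timestamps plays the role the sorted-set scores would play. In Redis itself, the equivalent operations would likely be ZADD on each heartbeat and ZREMRANGEBYSCORE for pruning.

```python
HEARTBEAT_INTERVAL_S = 30
PRESENCE_TTL_S = 2 * HEARTBEAT_INTERVAL_S   # entries expire after 2x heartbeat interval


class PresenceStore:
    def __init__(self):
        self._status = {}      # user_id -> "online" | "dnd" | "idle"
        self._last_seen = {}   # user_id -> timestamp (stands in for the sorted-set score)

    def heartbeat(self, user_id, now, status="online"):
        self._status[user_id] = status
        self._last_seen[user_id] = now

    def prune(self, now):
        """Drop users whose last heartbeat is older than the TTL; return them."""
        cutoff = now - PRESENCE_TTL_S
        stale = [u for u, ts in self._last_seen.items() if ts < cutoff]
        for u in stale:
            del self._status[u]
            del self._last_seen[u]
        return stale

    def status(self, user_id):
        return self._status.get(user_id, "offline")


presence = PresenceStore()
presence.heartbeat("user:123", now=0)
presence.heartbeat("user:456", now=0, status="dnd")
presence.heartbeat("user:123", now=30)   # user:456 misses this heartbeat

expired = presence.prune(now=61)
print(expired, presence.status("user:123"), presence.status("user:456"))
```

Using twice the heartbeat interval as the TTL tolerates one missed heartbeat before a user is shown as offline.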
Messaging
Redis Pub/Sub:
Used for intra-cluster communication.
When a message arrives for Channel_A, the Chat Service publishes to a Redis topic channel:Channel_A.
All Gateway nodes subscribed to that topic receive the message and push it to their locally connected users who are members of that channel.
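The relay path described above can be sketched with an in-memory broker standing in for Redis Pub/Sub. The Broker and Gateway classes, the `delivered` list (a placeholder for a WebSocket push), and the user and channel names are hypothetical and exist only to show the flow. A gateway subscribes to a topic the first time one of its local users joins that channel, so a published message reaches only the gateways that have members connected.

```python
from collections import defaultdict


class Broker:
    """Stand-in for Redis Pub/Sub: topic -> subscribed callbacks."""

    def __init__(self):
        self._subs = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subs[topic].append(callback)

    def publish(self, topic, message):
        for callback in self._subs[topic]:
            callback(topic, message)
        return len(self._subs[topic])   # number of gateways reached


class Gateway:
    def __init__(self, name, broker):
        self.name = name
        self._broker = broker
        self._members = defaultdict(set)   # topic -> locally connected user_ids
        self.delivered = []                # (user_id, message) pairs "pushed" to clients

    def connect(self, user_id, channel_ids):
        for channel_id in channel_ids:
            topic = f"channel:{channel_id}"
            if not self._members[topic]:
                # First local member of this channel: subscribe once.
                self._broker.subscribe(topic, self._on_message)
            self._members[topic].add(user_id)

    def _on_message(self, topic, message):
        for user_id in sorted(self._members[topic]):
            self.delivered.append((user_id, message))


broker = Broker()
gw1, gw2, gw3 = (Gateway(f"gateway_node_{i}", broker) for i in (1, 2, 3))
gw1.connect("user:1", ["Channel_A"])
gw1.connect("user:2", ["Channel_A", "Channel_B"])
gw2.connect("user:3", ["Channel_A"])
gw3.connect("user:4", ["Channel_B"])

# The Chat Service publishes once; each subscribed gateway fans out locally.
reached = broker.publish("channel:Channel_A", "hello")
print(reached, gw1.delivered, gw2.delivered, gw3.delivered)
```

The Chat Service does one publish regardless of member count; the per-user work is spread across the gateways that hold the connections.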
Wrap Up
Advanced Topics
Monitoring:
Prometheus/Grafana: Monitor WebSocket connection counts, message delivery latency, and Redis memory usage.
Tracing: Jaeger for tracing a message from API call to WebSocket broadcast.
Trade-offs:
Consistency vs. Availability: We choose availability (AP). If the Message Store is briefly inconsistent, users might see messages in slightly different orders, but the system remains functional.
Bottlenecks:
Large Servers: A single message in a 500k-member server must be delivered to every online member, so the cost of one write grows with server size (the "Fan-out Write" problem).
Optimization: For large servers, do not broadcast presence updates for every user. Only show the "Online" count and limit the member list to the top 100 active users.
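One way to read the optimization above is as a summary function that replaces per-user presence broadcasts. In this sketch, "top 100 active" is interpreted as the 100 most recently active users, which is an assumption; the function name presence_summary and the input shape (user_id -> last-active timestamp for online users) are also hypothetical.

```python
import heapq


def presence_summary(last_active, top_n=100):
    """Summarize presence for a large server instead of broadcasting every change.

    last_active maps user_id -> last-active timestamp for users currently online.
    """
    top = heapq.nlargest(top_n, last_active.items(), key=lambda kv: kv[1])
    return {
        "online_count": len(last_active),
        "members": [user_id for user_id, _ in top],
    }


online_users = {f"user:{i}": i for i in range(500)}   # timestamp == i for illustration
summary = presence_summary(online_users)
print(summary["online_count"], len(summary["members"]), summary["members"][0])
```

Clients then receive one small payload per refresh, regardless of how many members changed status in between.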
Failure Handling:
Gateway Failure: If a node dies, its clients reconnect with exponential backoff, and the Load Balancer routes them to a different node.
Cassandra Replication: Use a replication factor of 3 across different Availability Zones (AZs).
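The client-side reconnect behavior mentioned under Gateway Failure can be sketched as a delay schedule. The base delay, the 30s cap, and the random jitter are assumptions added for illustration; the design above only specifies exponential backoff. Jitter is included because, without it, every client dropped by the same failed node would retry at the same instants.

```python
import random


def backoff_delays(attempts, base=1.0, cap=30.0, rng=None):
    """Return one reconnect delay (seconds) per attempt, exponential with jitter."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * 2 ** attempt)   # 1s, 2s, 4s, ... capped at `cap`
        delays.append(rng.uniform(0, ceiling))    # pick a random point below the ceiling
    return delays


delays = backoff_delays(8, rng=random.Random(0))   # seeded only to make the demo repeatable
for attempt, delay in enumerate(delays):
    print(f"attempt {attempt}: wait {delay:.2f}s")
```

A real client would sleep for each delay before retrying the WS /gateway upgrade and stop at the first success.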
Alternatives:
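Kafka: Could replace Redis Pub/Sub if we needed durable message queuing. Redis Pub/Sub is fire-and-forget (a message published while a subscriber is disconnected is not redelivered), but its low latency and operational simplicity make it a reasonable fit for real-time delivery in an MVP.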