The Question
Design

Real-Time Community Chat Platform

Design a large-scale real-time community chat platform similar to Discord. The system should support servers, channels, direct messaging, and live presence indicators for tens of millions of concurrent users, with sub-200ms message delivery and high availability across global regions.
Key technologies: WebSocket, Redis Pub/Sub, Cassandra, PostgreSQL, Consistent Hashing
Questions & Insights

Thinking Process

When designing a system like Discord, the primary challenges are massive real-time fan-out and holding millions of persistent connections open at once.
Core Bottleneck: Managing millions of concurrent WebSocket connections and broadcasting a single message to thousands of members in a "Server" (Guild) simultaneously.
Progressive Questions:
Connection Management: How do we maintain stateful connections for millions of users? (Answer: Sharded WebSocket Gateways).
Message Fan-out: When a message is sent in a 100k-member server, how do we notify only the online users? (Answer: Pub/Sub with a Presence Service).
Storage Strategy: How do we store billions of chat messages while maintaining fast "scroll-back" performance? (Answer: Wide-column NoSQL like Cassandra/ScyllaDB).
Presence Fatigue: How do we prevent the "Thundering Herd" problem when a celebrity goes online? (Answer: Presence "Lazy Loading" and batched updates).

Bonus Points

ScyllaDB Optimization: Using ScyllaDB instead of Cassandra for its shard-per-core architecture to minimize P99 latency during massive chat bursts.
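Discord’s "Read States": Implementing a custom "Read Pointer" service that stores, per user and channel, the ID of the last message read (optionally compacted with a "compressed bitset" or "version clocks"). Unread state is then tracked without a DB write for every message read.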
Gateway Sharding: Using a "Consistent Hashing" ring to assign users to WebSocket Gateways. When a node fails, only the users on that node (roughly 1/N of the total) are disconnected and remapped, and everyone else keeps their existing assignment.
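Voice Infrastructure: Proposing an SFU (Selective Forwarding Unit) rather than an MCU (Multipoint Control Unit). An SFU forwards each speaker's stream without decoding or mixing it, which minimizes server-side transcoding costs for voice channels.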
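The sketch below shows gateway sharding with a consistent-hash ring. It assumes MD5 as the ring hash and 100 virtual nodes per gateway. Both are illustrative choices, and the class and method names are hypothetical. The example checks the property claimed above: removing one gateway remaps only the users that gateway owned.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Hypothetical ring mapping user IDs to gateway nodes via virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted (hash, node) pairs; each node owns `vnodes` points
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        # Stable 64-bit hash; Python's built-in hash() is randomized per process.
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def add_node(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._points, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._points = [p for p in self._points if p[1] != node]

    def node_for(self, user_id):
        # Walk clockwise to the first point at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self._points, (self._hash(str(user_id)), ""))
        return self._points[idx % len(self._points)][1]


ring = ConsistentHashRing([f"gateway-{i}" for i in range(10)])
users = range(10_000)
before = {u: ring.node_for(u) for u in users}
ring.remove_node("gateway-3")  # simulate a gateway failure
moved = [u for u in users if ring.node_for(u) != before[u]]
# Every remapped user was on gateway-3; users on the other nine gateways stay put.
assert all(before[u] == "gateway-3" for u in moved)
print(f"{len(moved) / len(users):.1%} of users reassigned")
```

Virtual nodes spread each gateway's share across many small arcs of the ring. As a result, a failed node's users are redistributed across all survivors instead of landing on one neighbor.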
Design Breakdown

Functional Requirements

Users can create Servers (Guilds) and Channels.
Users can send/receive real-time text messages.
Users can see the online/offline status (Presence) of friends/server members.
Users can see message history (infinite scroll).

Non-Functional Requirements

Low Latency: Message delivery < 200ms.
High Availability: System must stay up even if a regional data center fails.
Scalability: Support 10M+ DAU and servers with 500k+ members.
Consistency: Eventual consistency for message history is acceptable; real-time delivery must feel ordered.

Estimation

DAU: 10 Million.
Messages per Day: 500 Million.
Average Message Size: 100 bytes.
Storage: 500M * 100 bytes = 50 GB/day, or ~18 TB/year of raw data (excluding media and replication).
Concurrency: Peak 1M concurrent WebSocket connections.
Throughput: ~6,000 Messages Per Second (MPS) average; 50k+ peak.
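The estimation arithmetic is written out below so the assumptions can be changed in one place. The replication factor of 3 comes from the Failure Handling notes later in this section. The other inputs are the figures listed above.

```python
# Back-of-envelope capacity check using the estimation figures.
SECONDS_PER_DAY = 86_400
messages_per_day = 500_000_000
avg_message_bytes = 100
replication_factor = 3  # Cassandra RF from the Failure Handling notes

daily_bytes = messages_per_day * avg_message_bytes
yearly_tb = daily_bytes * 365 / 1e12
avg_mps = messages_per_day / SECONDS_PER_DAY

print(f"Storage/day: {daily_bytes / 1e9:.0f} GB")  # Storage/day: 50 GB
print(f"Storage/year: {yearly_tb:.2f} TB raw, "
      f"{yearly_tb * replication_factor:.2f} TB with RF={replication_factor}")
# Storage/year: 18.25 TB raw, 54.75 TB with RF=3
print(f"Average throughput: {avg_mps:,.0f} msg/s")  # Average throughput: 5,787 msg/s
```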

Blueprint

Concise Summary: A microservices architecture leveraging WebSockets for bi-directional communication, Redis for real-time presence/pub-sub, and Cassandra for scalable message storage.
Major Components:
Gateway Service (WebSocket): Manages persistent connections and routes incoming/outgoing real-time events.
Presence Service: Tracks user heartbeats and manages "who is online" state using an in-memory store.
Chat Service: Handles message validation, persistence logic, and fan-out triggers.
Metadata DB (Postgres): Stores relational data like Server settings, Channel permissions, and User profiles.
Message Store (Cassandra): Stores the actual chat history optimized for time-series retrieval.
Simplicity Audit: This design avoids complex stream processing (Flink/Spark) for the MVP, relying on Redis Pub/Sub for immediate message relay, which is sufficient for initial scale.

High Level Architecture

Sub-system Deep Dive

Service

Gateway Service (WebSocket):
Maintains long-lived TCP connections.
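Upon connection, it records User_ID -> Connection_ID locally and registers User_ID -> Gateway node in the Session Cache, so other services can locate the node holding a user's socket.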
Uses a "Heartbeat" mechanism (every 30s) to detect stale connections.
API Spec:
POST /v1/channels/{id}/messages: Send a message.
GET /v1/channels/{id}/messages?limit=50: Fetch history.
WS /gateway: Upgrade to WebSocket for real-time events.
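The sketch below shows the Gateway's stale-connection check. It assumes a connection is declared dead after two missed 30-second heartbeats, which matches the 2x window the Presence Store uses. `HeartbeatTracker` and its injectable clock are hypothetical. The clock is injectable so the logic can be exercised without waiting in real time.

```python
import time


class HeartbeatTracker:
    """Hypothetical per-gateway tracker of the last heartbeat seen on each connection."""

    def __init__(self, interval_s=30.0, missed_allowed=2, clock=time.monotonic):
        self.timeout_s = interval_s * missed_allowed  # 60s with the defaults
        self.clock = clock
        self._last_seen = {}  # connection_id -> clock() value at last heartbeat

    def on_heartbeat(self, connection_id):
        self._last_seen[connection_id] = self.clock()

    def reap_stale(self):
        """Remove and return connections whose last heartbeat is older than the timeout."""
        now = self.clock()
        stale = [c for c, t in self._last_seen.items() if now - t > self.timeout_s]
        for c in stale:
            del self._last_seen[c]
        return stale


fake_now = [0.0]
tracker = HeartbeatTracker(clock=lambda: fake_now[0])
tracker.on_heartbeat("conn-a")
tracker.on_heartbeat("conn-b")
fake_now[0] = 45.0
tracker.on_heartbeat("conn-b")  # conn-b is still alive
fake_now[0] = 61.0
print(tracker.reap_stale())  # ['conn-a']
```

In a real gateway, each reaped connection would be closed. Its Session Cache entry would also be cleared so fan-out stops targeting that node for the user.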

Storage

Data Model (Cassandra):
Table: messages_by_channel
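Partition Key: channel_id (clusters all messages for one channel together). For very busy channels, a time bucket is often added to the partition key, e.g. (channel_id, bucket), to keep partitions from growing without bound.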
Clustering Key: message_id (TimeUUID, descending) for fast retrieval of latest messages.
Metadata DB (Postgres):
Tables for Servers, Channels, Users, and Memberships.
Uses standard B-Tree indexing on server_id and user_id.
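The in-memory sketch below shows how the messages_by_channel layout serves infinite scroll. Message IDs are plain increasing integers standing in for TimeUUIDs. The `before` cursor is a hypothetical parameter, since the API spec above only shows `limit`. Each page returns the newest messages older than the cursor. Against a descending clustering order, that corresponds to a CQL query of the form `SELECT ... WHERE channel_id = ? AND message_id < ? LIMIT 50`.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class ChannelPartition:
    """Hypothetical stand-in for one messages_by_channel partition (one channel_id).

    IDs are time-ordered integers, so sorting by ID is sorting by send time.
    """
    ids: list = field(default_factory=list)     # ascending message IDs
    bodies: dict = field(default_factory=dict)  # message_id -> text

    def insert(self, message_id, body):
        bisect.insort(self.ids, message_id)
        self.bodies[message_id] = body

    def page(self, limit=50, before=None):
        """Newest-first page of up to `limit` messages with ID < `before`.

        The first page passes no cursor; each later page passes the oldest ID
        already shown, so scrolling back never re-reads or skips a message.
        """
        end = len(self.ids) if before is None else bisect.bisect_left(self.ids, before)
        start = max(0, end - limit)
        return [(mid, self.bodies[mid]) for mid in reversed(self.ids[start:end])]


chan = ChannelPartition()
for i in range(1, 121):
    chan.insert(i, f"msg {i}")
first = chan.page(limit=50)
second = chan.page(limit=50, before=first[-1][0])
print(first[0][0], first[-1][0])    # 120 71
print(second[0][0], second[-1][0])  # 70 21
```

Cursor-based paging keeps each scroll-back request a bounded range read within one partition, unlike offset paging, which re-scans skipped rows.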

Cache

Session Cache (Redis): Stores which Gateway node a user is connected to (e.g., user:123 -> gateway_node_5).
Presence Store (Redis):
Stores user_id: status (online, dnd, idle).
Uses Redis Sorted Sets to track last-active timestamps for pruning.
Eviction: No eviction for active sessions; Presence entries expire after 2x heartbeat interval.
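The sketch below is an in-memory stand-in for the Presence Store layout. It assumes a 30-second heartbeat and a 60-second expiry. In Redis, the `_zset` list would be a sorted set updated with ZADD, which overwrites a member's score. It would be pruned with ZRANGEBYSCORE / ZREMRANGEBYSCORE from `-inf` to the cutoff. The class and method names are hypothetical.

```python
import bisect


class PresenceStore:
    """Hypothetical in-memory model of the Redis presence data."""

    def __init__(self, heartbeat_s=30):
        self.ttl_s = 2 * heartbeat_s  # entries expire after 2x the heartbeat interval
        self.status = {}              # user_id -> "online" | "idle" | "dnd"
        self._score = {}              # user_id -> last-active timestamp
        self._zset = []               # (timestamp, user_id) pairs kept sorted, like a ZSET

    def heartbeat(self, user_id, status, now):
        old = self._score.get(user_id)
        if old is not None:
            # ZADD replaces an existing member's score; emulate by dropping the old pair.
            self._zset.pop(bisect.bisect_left(self._zset, (old, user_id)))
        bisect.insort(self._zset, (now, user_id))
        self._score[user_id] = now
        self.status[user_id] = status

    def prune(self, now):
        """Remove and return users whose last heartbeat is older than the TTL."""
        cutoff = now - self.ttl_s
        k = 0
        while k < len(self._zset) and self._zset[k][0] < cutoff:
            k += 1  # ordered by timestamp, so expired users form a prefix
        expired = [user_id for _, user_id in self._zset[:k]]
        del self._zset[:k]
        for user_id in expired:
            del self._score[user_id]
            del self.status[user_id]
        return expired


store = PresenceStore(heartbeat_s=30)
store.heartbeat("alice", "online", now=0)
store.heartbeat("bob", "dnd", now=10)
store.heartbeat("alice", "idle", now=50)  # refreshes alice's score
print(store.prune(now=75))  # ['bob'] (last seen 65s ago, past the 60s TTL)
print(store.status)         # {'alice': 'idle'}
```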

Messaging

Redis Pub/Sub:
Used for intra-cluster communication.
When a message arrives for Channel_A, the Chat Service publishes to a Redis topic channel:Channel_A.
All Gateway nodes subscribed to that topic receive the message and push it to their locally connected users who are members of that channel.
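The in-process simulation below shows the fan-out path. `PubSubBroker` imitates Redis Pub/Sub's fire-and-forget delivery, and like Redis PUBLISH it returns the number of receivers. Each gateway subscribes to `channel:<id>` only once, and only when a local user needs that channel. Both classes are hypothetical. In the design, the Chat Service persists the message before publishing; the sketch covers only the publish step.

```python
from collections import defaultdict


class PubSubBroker:
    """Hypothetical stand-in for Redis Pub/Sub: deliver only to current subscribers."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self._subscribers[topic]:
            callback(message)
        return len(self._subscribers[topic])  # number of subscribers reached


class GatewayNode:
    def __init__(self, name, broker):
        self.name = name
        self.broker = broker
        self.local_users = {}    # user_id -> set of channel IDs the user belongs to
        self.delivered = []      # (user_id, message) pairs "pushed" over WebSocket
        self._subscribed = set()

    def connect(self, user_id, channel_ids):
        self.local_users[user_id] = set(channel_ids)
        for ch in channel_ids:
            if ch not in self._subscribed:  # one subscription per channel per node
                self._subscribed.add(ch)
                self.broker.subscribe(f"channel:{ch}", lambda msg, ch=ch: self._fan_out(ch, msg))

    def _fan_out(self, channel_id, message):
        # Push only to users connected to this node who are members of the channel.
        for user_id, channels in self.local_users.items():
            if channel_id in channels:
                self.delivered.append((user_id, message))


broker = PubSubBroker()
gw1, gw2 = GatewayNode("gw-1", broker), GatewayNode("gw-2", broker)
gw1.connect("alice", ["Channel_A"])
gw1.connect("bob", ["Channel_B"])
gw2.connect("carol", ["Channel_A", "Channel_B"])

print(broker.publish("channel:Channel_A", "hello"))  # 2
print(gw1.delivered)  # [('alice', 'hello')]
print(gw2.delivered)  # [('carol', 'hello')]
```

The per-channel fan-out happens on the gateways, so one publish costs the Chat Service a single operation regardless of member count.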
Wrap Up

Advanced Topics

Monitoring:
Prometheus/Grafana: Monitor WebSocket connection counts, message delivery latency, and Redis memory usage.
Tracing: Jaeger for tracing a message from API call to WebSocket broadcast.
Trade-offs:
Consistency vs. Availability: We choose Availability (AP). If the Message Store is briefly inconsistent, users might see messages in slightly different orders, but the system remains functional.
Bottlenecks:
Large Servers: A single message in a 500k-member server must be fanned out to every online member, creating a "Fan-out Write" problem.
Optimization: For large servers, do not broadcast presence updates for every user. Only show the "Online" count and limit the member list to the top 100 active users.
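The sketch below shows the large-server optimization. It assumes a hypothetical 1,000-member threshold above which the member list is truncated. Above that threshold, only the online count and the 100 most recently active members are sent. Presence changes for members outside that slice therefore need not be broadcast.

```python
import heapq

LARGE_SERVER_THRESHOLD = 1_000  # hypothetical cut-over point
MEMBER_LIST_LIMIT = 100         # "top 100 active users" from the text


def member_list_view(online_last_active, member_count):
    """Build a hypothetical member-sidebar payload for one server.

    online_last_active: dict of user_id -> last-activity timestamp for online members.
    """
    if member_count <= LARGE_SERVER_THRESHOLD:
        # Small servers: show every online member.
        return {"online_count": len(online_last_active), "members": sorted(online_last_active)}
    # Large servers: count plus the most recently active slice only.
    top = heapq.nlargest(MEMBER_LIST_LIMIT, online_last_active, key=online_last_active.get)
    return {"online_count": len(online_last_active), "members": top}


online = {f"user{i}": i for i in range(5_000)}  # user4999 is the most recently active
view = member_list_view(online, member_count=500_000)
print(view["online_count"], len(view["members"]), view["members"][0])  # 5000 100 user4999
```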
Failure Handling:
Gateway Failure: If a node dies, clients reconnect to a different node via the Load Balancer using exponential backoff with random jitter, so users dropped at the same moment do not all reconnect at once.
Cassandra Replication: Use a replication factor of 3 across different Availability Zones (AZs).
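The sketch below shows the client reconnect schedule for a gateway failure. It uses the "full jitter" variant of exponential backoff, where each delay is drawn uniformly between zero and a doubling ceiling. The 1-second base, 60-second cap and attempt count are assumptions, not values from this design. `reconnect_delays` is a hypothetical helper.

```python
import random


def reconnect_delays(base_s=1.0, cap_s=60.0, attempts=8, rng=random.random):
    """Yield the wait (seconds) before each reconnect attempt.

    The ceiling doubles per attempt (1, 2, 4, ... capped at cap_s), and the delay is
    rng() * ceiling, i.e. uniform in [0, ceiling) with the default random.random, so
    clients dropped by the same gateway spread their reconnects out over time.
    """
    for attempt in range(attempts):
        ceiling = min(cap_s, base_s * 2 ** attempt)
        yield rng() * ceiling


# A stub rng that always returns 1.0 exposes the ceilings themselves.
print(list(reconnect_delays(rng=lambda: 1.0)))
# [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```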
Alternatives:
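Kafka: Could replace Redis Pub/Sub if we needed durable message queuing. Redis Pub/Sub is fire-and-forget: a subscriber that is disconnected when a message is published never receives it. Its sub-millisecond relay still makes it the faster choice for real-time delivery in an MVP.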