The Question
Design

Scalable Real-Time Messaging and Presence System

Design the core architecture for a large-scale real-time communication platform similar to Discord. The system must support millions of concurrent users across thousands of persistent 'servers' or groups. Focus on the end-to-end flow of a text message, the management of real-time user presence (online/offline status) at scale, and the persistent storage of billions of messages with low-latency retrieval. Address the challenges of 'fan-out' in large groups and the stateful nature of maintaining millions of persistent connections.
Technologies: WebSockets, ScyllaDB, Redis, PostgreSQL, Kafka, Elixir, Snowflake ID, Protobuf, Anycast DNS
Questions & Insights

Clarifying Questions

Scale: What is the target scale in terms of Daily Active Users (DAU) and Peak Concurrent Users (PCU)?
Assumption: 200M DAU, 20M PCU, 1M+ messages per second at peak.
Data Retention: Should messages be stored indefinitely, or is there a TTL?
Assumption: Messages are stored permanently unless deleted by the user.
Server Size: What is the maximum number of users in a single "Server" (Guild)?
Assumption: Up to 1M users per server (e.g., Midjourney, official game servers).
Media: Are we including Voice/Video/Screen-share in the MVP?
Assumption: MVP focuses on real-time text chat and Presence. Voice/Video is out of scope for the core architectural foundation but will be acknowledged for extensibility.
Consistency: Is strict ordering of messages required across all devices?
Assumption: Messages within a channel must appear in the same order for all viewers (a per-channel total order, which also preserves causality). No ordering is required across channels.

Thinking Process

The core challenge of Discord is maintaining massive real-time state (Presence) and high-throughput message delivery to millions of concurrent WebSockets.
How do we handle 20M+ concurrent connections? We use a distributed Gateway layer (Stateful WebSockets) backed by a pub/sub mechanism to route messages.
How do we store billions of messages efficiently? We utilize a NoSQL wide-column store (ScyllaDB/Cassandra) partitioned by channel_id to ensure localized reads and high write throughput.
How do we handle the "Presence" problem (who is online)? We fan out on write for small servers and switch to a lazy-pull strategy for large ones. This avoids the "Celebrity Problem", where a single status change in a huge server would trigger an enormous burst of pushes.
How do we ensure low latency? We co-locate the Gateway and the Message service, using a highly concurrent language runtime (e.g., Elixir/BEAM or Go) to manage massive lightweight processes.

Bonus Points

BEAM VM (Elixir/Erlang): Using Elixir for the Gateway allows for millions of lightweight processes, simplifying the "one process per user" model which is battle-tested by Discord.
ScyllaDB Optimization: Using a shard-aware driver to reduce cross-core communication and utilizing a "Time-Window Compaction Strategy" for efficient message aging.
Ramping Presence: For large servers (e.g., 50k+ members), we stop sending 1-to-1 presence updates and switch to a summarized "Member List" view to save bandwidth (Presence Capping).
Consistent Hashing for Gateway: Mapping users to specific gateway nodes to minimize the search space for message delivery while allowing for seamless rebalancing.
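
The consistent-hashing point above can be sketched as a hash ring with virtual nodes. This is a minimal illustration, not a description of Discord's actual routing: the class name HashRing, the node names, and the choice of 100 virtual nodes per gateway are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a stable 64-bit integer (md5 is used for spreading, not security)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (hypothetical helper)."""

    def __init__(self, nodes, vnodes=100):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (hash, node_name)
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each gateway appears at many ring positions to even out the load.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def lookup(self, user_id: int) -> str:
        """Return the first gateway clockwise from the user's hash."""
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._ring, (_hash(str(user_id)), ""))
        return self._ring[idx % len(self._ring)][1]
```

When a gateway is removed, only the users that were mapped to it move to other nodes; everyone else keeps the same gateway, which is the rebalancing property the design relies on.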
Design Breakdown

Functional Requirements

Core Use Cases:
Users can join Servers (Guilds) and Channels.
Real-time text messaging within channels.
Presence tracking (Online, Idle, DND, Offline).
Message history retrieval (infinite scroll).
Scope Control:
In-scope: Text chat, Presence, Server/Channel management.
Out-of-scope: Voice/Video (WebRTC), Search, File uploads, Rich Embeds, Nitro subscriptions.

Non-Functional Requirements

Scale: Support 20M+ concurrent users and 1M+ messages/sec.
Latency: End-to-end message delivery < 200ms.
Availability & Reliability: 99.99% (highly available even if regional nodes fail).
Consistency: Messages in a channel appear in the same order for every viewer, which also preserves causality within that channel.
Fault Tolerance: Automatic reconnection of WebSockets with exponential backoff.
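
A minimal sketch of the client-side reconnect delay, assuming the "full jitter" variant of exponential backoff. The function name reconnect_delay and the base and cap values are illustrative assumptions, not values taken from any real client.

```python
import random


def reconnect_delay(attempt: int, base: float = 1.0, cap: float = 60.0, rng=random) -> float:
    """Return a delay in seconds, uniform in [0, min(cap, base * 2**attempt)].

    `rng` is any object with a uniform(a, b) method; passing a seeded
    random.Random makes the delays reproducible.
    """
    # Clamp the exponent so very large attempt counts cannot overflow a float.
    exponent = min(max(attempt, 0), 32)
    ceiling = min(cap, base * (2 ** exponent))
    return rng.uniform(0.0, ceiling)
```

Randomizing across the whole interval spreads reconnecting clients out in time, so a node failure does not produce synchronized retry waves.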

Estimation

Traffic:
1M messages/sec (Write QPS).
Delivery rate is far higher than write QPS because of fan-out: if each message goes to a channel with 100 active users, 1M messages/sec becomes 100M deliveries/sec. These are WebSocket pushes, not database reads.
Storage:
1M msgs/sec * 86,400 sec/day = ~86.4B messages/day (an upper bound, since 1M/sec is the peak rate).
Avg msg (100 bytes) = ~8.6 TB/day.
~3.15 PB per year before replication (requires significant sharding).
Bandwidth:
Incoming: 1M * 100B = 100 MB/s.
Outgoing (Fan-out 1:50 avg): 100 MB/s * 50 = 5 GB/s.
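
The arithmetic above can be reproduced in a few lines. The constants come straight from the estimates in this section; the variable names are chosen here for illustration, and decimal units (1 TB = 10^12 bytes) are assumed.

```python
MSGS_PER_SEC = 1_000_000   # peak write rate from the estimate
AVG_MSG_BYTES = 100
SECONDS_PER_DAY = 86_400
AVG_FANOUT = 50            # average recipients per message

msgs_per_day = MSGS_PER_SEC * SECONDS_PER_DAY              # 86.4 billion
storage_per_day_tb = msgs_per_day * AVG_MSG_BYTES / 1e12   # about 8.64 TB
storage_per_year_pb = storage_per_day_tb * 365 / 1000      # about 3.15 PB
ingress_mb_per_sec = MSGS_PER_SEC * AVG_MSG_BYTES / 1e6    # 100 MB/s
egress_gb_per_sec = ingress_mb_per_sec * AVG_FANOUT / 1000 # 5 GB/s
```

These figures cover raw message payloads only; replication, indexes, and protocol overhead would add to them.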

Blueprint

The architecture centers on a Stateful Gateway Layer that maintains persistent WebSocket connections with clients. A Presence Service tracks user state via Heartbeats. Chat Services handle the persistence of messages to a partitioned ScyllaDB cluster, while a Pub/Sub (Redis) layer ensures that messages are routed to the correct Gateway node for real-time delivery.
Gateway Service: Manages WebSocket connections and routes incoming client events.
Chat Service: Business logic for message validation, permissions, and persistence.
Presence Service: High-speed, in-memory tracking of user online status.
ScyllaDB: High-performance NoSQL for message storage, partitioned by channel_id.
Redis (Pub/Sub): Facilitates inter-node communication for real-time message fan-out.
Simplicity Audit: This design avoids complex microservices by grouping logic into "Chat" and "Presence," focusing purely on the data flow from User A -> Gateway -> DB -> Pub/Sub -> User B.
Architecture Decision Rationale:
Why ScyllaDB?: Discord moved its message store from MongoDB to Cassandra and later to ScyllaDB. A wide-column store partitioned by channel_id handles very large data volumes with predictable p99 latency.
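Why Stateful Gateway?: Real-time delivery needs server-initiated pushes over a persistent connection. WebSockets provide that, which in turn means each Gateway node must hold per-connection state.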

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing: Global Anycast IP routes users to the nearest regional Data Center (DC).
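Load Balancing: L4 Load Balancers (NLB) terminate TLS and distribute TCP/WebSocket connections across Gateway nodes. An L4 balancer cannot see user_id, so any consistent hashing on user_id has to happen at a layer that knows the session identity (for example, a routing step after authentication).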
API Gateway: A lightweight L7 gateway (e.g., Envoy) handles REST requests for metadata (getting server lists, user profiles).

Service

Gateway Service (WebSocket)
Protocol: Binary Protobuf over WebSockets for bandwidth efficiency.
Heartbeats: Clients send heartbeats every 30s; a client that stops sending them is marked "Offline" once its presence TTL expires.
Session Resumption: Supports a session_id and last_sequence_number to replay missed messages during brief disconnects.
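
Session resumption as described above can be sketched with a bounded per-session event buffer. This is a simplified illustration: the class name SessionBuffer, the buffer size, and the "return None to force a full resync" convention are assumptions made for the example.

```python
from collections import deque


class SessionBuffer:
    """Bounded buffer of recent events for one session, keyed by sequence number."""

    def __init__(self, max_events: int = 1000):
        self._events = deque(maxlen=max_events)  # (seq, payload), oldest first
        self._next_seq = 1

    def append(self, payload) -> int:
        """Record an outgoing event and return its sequence number."""
        seq = self._next_seq
        self._events.append((seq, payload))
        self._next_seq += 1
        return seq

    def replay_after(self, last_seq: int):
        """Return events with seq > last_seq, or None if a full resync is needed."""
        if last_seq >= self._next_seq:
            return None  # the client reports a sequence that was never issued
        oldest = self._events[0][0] if self._events else self._next_seq
        if last_seq + 1 < oldest:
            return None  # some missed events were already evicted
        return [(seq, payload) for seq, payload in self._events if seq > last_seq]
```

A None result tells the gateway to fall back to a fresh session, where the client refetches state instead of replaying a partial stream.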
Chat Service
API: POST /channels/{id}/messages (REST/gRPC).
Idempotency: Clients generate a nonce (UUID) to prevent duplicate messages during retries.
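
One way to apply the nonce, sketched under assumptions: the Chat Service remembers each (author_id, nonce) pair for a short window and returns the original message_id on a retry. The class name, the 5-minute window, and the in-memory dictionary are illustrative; a production service would more likely keep this in a shared store.

```python
import time


class NonceDeduplicator:
    """Remembers recent (author_id, nonce) pairs so retries do not create duplicates."""

    def __init__(self, ttl_seconds: float = 300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._seen = {}  # (author_id, nonce) -> (expires_at, message_id)

    def submit(self, author_id, nonce, create_message):
        """Call create_message() once per nonce; retries get the first message_id back."""
        now = self._clock()
        # Linear sweep of expired entries: fine for a sketch, too slow at scale.
        self._seen = {k: v for k, v in self._seen.items() if v[0] > now}
        key = (author_id, nonce)
        if key in self._seen:
            return self._seen[key][1]
        message_id = create_message()
        self._seen[key] = (now + self._ttl, message_id)
        return message_id
```

Scoping the key to the author keeps one client's nonce from colliding with another's.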
Presence Service
Fan-out Strategy: When a user's status changes, the service fetches the list of "Mutual Servers" and broadcasts the update to those servers' active Gateway sessions.
Capping: For large servers (here, more than 1,000 members; the cutoff is a tuning parameter), presence updates are sent only to users who currently have the member in view, i.e. in their viewed channel or the visible portion of the member list.
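
The fan-out and capping rules above might be combined as follows. This is a sketch under assumptions: the Server dataclass, its field names, and the idea that each viewer registers the set of members currently visible to them are all hypothetical modelling choices.

```python
from dataclasses import dataclass, field

LARGE_SERVER_THRESHOLD = 1_000  # cutoff taken from the capping rule above


@dataclass
class Server:
    member_count: int
    online_members: set = field(default_factory=set)
    # viewer_id -> user ids currently visible in that viewer's member list
    visible_to: dict = field(default_factory=dict)


def presence_recipients(user_id, mutual_servers, threshold=LARGE_SERVER_THRESHOLD):
    """Return the user ids that should receive this user's presence update."""
    recipients = set()
    for server in mutual_servers:
        if server.member_count <= threshold:
            # Small server: push to every online member.
            recipients |= server.online_members
        else:
            # Large server: push only to viewers who have this user on screen.
            recipients |= {
                viewer for viewer, visible in server.visible_to.items()
                if user_id in visible
            }
    recipients.discard(user_id)
    return recipients
```

The returned set is deduplicated, so a user who shares several servers with the sender receives a single update.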

Storage

Access Pattern:
Write-heavy (every message).
Read-heavy (fetching history on scroll).
Database Table Design (ScyllaDB)
Table: messages
channel_id (Partition Key): Groups all messages in a channel on the same shard.
message_id (Clustering Key, Descending): Snowflake ID (time-sortable) for efficient range queries (fetching latest 50 messages).
author_id, content, attachments, timestamp.
Technical Selection: PostgreSQL is used for "Relational Metadata" (Servers, Channels, Permissions, User profiles) where ACID compliance is critical for structural changes.
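Distribution Logic: channel_id is the partition key, so a history fetch reads from a single partition and never fans out across nodes. This assumes a channel's messages fit comfortably in one partition. Very active channels can grow oversized partitions; a common mitigation is to add a time bucket to the partition key, e.g. (channel_id, bucket).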
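
The message_id clustering key described above is a Snowflake ID. A minimal generator is sketched here, assuming a 41-bit millisecond timestamp, a 10-bit worker ID, and a 12-bit sequence; the bit widths, the 2015-01-01 epoch, and the class name are assumptions for illustration, not a statement of the exact layout any production system uses.

```python
import threading
import time

EPOCH_MS = 1_420_070_400_000  # 2015-01-01T00:00:00Z, an assumed custom epoch
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    """Generates time-sortable 64-bit IDs: timestamp | worker | sequence."""

    def __init__(self, worker_id: int, clock_ms=lambda: time.time_ns() // 1_000_000):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self._worker_id = worker_id
        self._clock_ms = clock_ms
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = self._clock_ms()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            shifted_time = (now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS)
            return shifted_time | (self._worker_id << SEQUENCE_BITS) | self._sequence


def timestamp_ms(snowflake: int) -> int:
    """Recover the creation time (Unix ms) embedded in an ID."""
    return (snowflake >> (WORKER_BITS + SEQUENCE_BITS)) + EPOCH_MS
```

Because the timestamp occupies the high bits, sorting by message_id sorts by creation time, and a "latest 50 messages" query becomes a simple descending range scan.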

Cache

Purpose: Presence state is the most volatile data in the system.
Key-Value Schema:
u:{user_id} -> {status: "online", last_active: timestamp}.
TTL: 60 seconds (extended by heartbeats).
Technical Selection: Redis Cluster. High throughput, sub-millisecond latency.
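
The key schema and TTL above behave roughly like the following in-memory stand-in. It deliberately avoids a real Redis client so the example stays self-contained; the class name PresenceStore and the injected clock are assumptions made for illustration.

```python
import time

PRESENCE_TTL_SECONDS = 60  # TTL from the schema above


class PresenceStore:
    """In-memory stand-in for the u:{user_id} keys with a heartbeat-extended TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # "u:{user_id}" -> {"status", "last_active", "expires_at"}

    def heartbeat(self, user_id, status="online"):
        """Write the user's status and push the expiry out by one TTL."""
        now = self._clock()
        self._entries[f"u:{user_id}"] = {
            "status": status,
            "last_active": now,
            "expires_at": now + PRESENCE_TTL_SECONDS,
        }

    def status(self, user_id) -> str:
        """Return the stored status, or "offline" if the key is missing or expired."""
        key = f"u:{user_id}"
        entry = self._entries.get(key)
        if entry is None or entry["expires_at"] <= self._clock():
            self._entries.pop(key, None)
            return "offline"
        return entry["status"]
```

With 30-second heartbeats and a 60-second TTL, a client has to miss two consecutive heartbeats before it shows as offline.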

Messaging

Purpose: Redis Pub/Sub is used for real-time fan-out. Each Gateway node subscribes to topics for the channel_ids that its currently connected users are viewing.
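Failure Handling: Redis Pub/Sub is at-most-once, so a dropped message is not redelivered. The client recovers by resyncing from its "Last Sequence Number" on reconnection or by refetching channel history from the Chat Service; as a last resort, the user notices a gap and refreshes manually.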
Technical Selection: Kafka is used for non-critical async paths: indexing messages for Search, generating Notifications, and Trust/Safety analytics.
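
The per-channel subscription model used for real-time fan-out can be sketched with an in-memory broker standing in for Redis Pub/Sub. The Broker and GatewayNode classes, and the outbox list that stands in for WebSocket writes, are hypothetical names introduced for this example.

```python
from collections import defaultdict


class Broker:
    """In-memory stand-in for a pub/sub layer: channel_id -> subscribed gateways."""

    def __init__(self):
        self._subscribers = defaultdict(set)

    def subscribe(self, channel_id, gateway):
        self._subscribers[channel_id].add(gateway)

    def unsubscribe(self, channel_id, gateway):
        self._subscribers[channel_id].discard(gateway)

    def publish(self, channel_id, message) -> int:
        """Deliver to every subscribed gateway and return how many received it."""
        gateways = list(self._subscribers.get(channel_id, ()))
        for gateway in gateways:
            gateway.deliver(channel_id, message)
        return len(gateways)


class GatewayNode:
    """Subscribes to a channel only while at least one local user is viewing it."""

    def __init__(self, broker):
        self._broker = broker
        self._viewers = defaultdict(set)  # channel_id -> local user ids
        self.outbox = []                  # (user_id, message) stand-in for socket writes

    def open_channel(self, user_id, channel_id):
        if not self._viewers[channel_id]:
            self._broker.subscribe(channel_id, self)
        self._viewers[channel_id].add(user_id)

    def close_channel(self, user_id, channel_id):
        self._viewers[channel_id].discard(user_id)
        if not self._viewers[channel_id]:
            self._broker.unsubscribe(channel_id, self)

    def deliver(self, channel_id, message):
        for user_id in sorted(self._viewers[channel_id]):
            self.outbox.append((user_id, message))
```

Each message crosses the pub/sub layer once per interested gateway rather than once per user; the final per-user fan-out happens inside the node that owns the connections.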
Wrap Up

Advanced Topics

Trade-offs (Availability over Consistency): In the CAP theorem, Discord leans towards AP. It is better for a user to see a slightly delayed message or an outdated "Online" status than for the whole chat to be frozen while waiting for global consensus.
Snowflake IDs: Random UUIDs (v4) index poorly because they carry no time ordering, so consecutive inserts scatter across the index. We use a custom Snowflake ID (Timestamp + Worker ID + Sequence) so that IDs are k-sortable, which lets the database and the UI order messages chronologically without a separate timestamp index.
The "Thundering Herd": When a Gateway node goes down, 100k+ clients try to reconnect at once. We implement Exponential Backoff with Jitter on the client-side and a "Reconnection Token" to bypass full authentication for existing sessions.
Security:
End-to-End Encryption (E2EE): Not used for standard Discord text, which keeps server-side moderation possible. Traffic is still encrypted in transit, and mTLS is used for service-to-service communication.
Permissions: Every message write checks a "Permission Cache" (often local to the Chat Service) to ensure the user_id has SEND_MESSAGES rights for that channel_id.
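
A minimal sketch of the local permission cache mentioned above, assuming permissions are stored as a bitmask and cached per (user_id, channel_id) for a short TTL. The bit value for SEND_MESSAGES, the 30-second TTL, and the load_permissions callback (which would query the metadata store) are all assumptions made for illustration.

```python
import time

SEND_MESSAGES = 1 << 11  # illustrative bit position for the SEND_MESSAGES right


class PermissionCache:
    """Caches permission bitmasks per (user_id, channel_id) for a short TTL."""

    def __init__(self, load_permissions, ttl_seconds: float = 30.0, clock=time.monotonic):
        self._load = load_permissions  # hypothetical callback: (user_id, channel_id) -> int
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # (user_id, channel_id) -> (expires_at, bitmask)

    def can_send(self, user_id, channel_id) -> bool:
        """Return True if the cached or freshly loaded bitmask includes SEND_MESSAGES."""
        key = (user_id, channel_id)
        now = self._clock()
        cached = self._cache.get(key)
        if cached is None or cached[0] <= now:
            cached = (now + self._ttl, self._load(user_id, channel_id))
            self._cache[key] = cached
        return bool(cached[1] & SEND_MESSAGES)

    def invalidate(self, user_id, channel_id) -> None:
        """Drop one entry, e.g. after a role or channel override changes."""
        self._cache.pop((user_id, channel_id), None)
```

The TTL bounds how long a revoked permission can linger in the cache; explicit invalidation on role changes shortens that window further.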