The Question

Real-time Messaging and Presence System

Design a high-scale real-time communication platform similar to Discord. The system must support millions of concurrent users across various 'servers' and 'channels'. Focus on the architectural challenges of managing persistent WebSocket connections, efficient real-time message fan-out to large groups (up to 500k members), and a storage strategy for billions of messages with low-latency retrieval. Discuss how you would handle presence state (online/offline) at this scale and ensure system reliability during massive traffic spikes.

WebSockets

ScyllaDB

PostgreSQL

Redis

Consistent Hashing

Pub/Sub

TimeUUID

JWT

TLS

Questions & Insights

Clarifying Questions

Scale: What is the target DAU and Peak Concurrent Users (PCU)?

Assumption: 200M MAU, 20M DAU, and 1M+ PCU.

Core Features: Should the MVP include Voice/Video and Screen Sharing?

Assumption: No. Focus on the core persistent chat, server/channel structure, and real-time presence.

Server Size: What is the maximum number of members in a single "Guild" (Server)?

Assumption: Up to 500,000 members per server (similar to large community servers).

Data Retention: Do messages need to be stored forever?

Assumption: Yes, message history is permanent and searchable.

Thinking Process

The Connection Problem: How do we maintain 1M+ persistent connections efficiently?

The Fan-out Problem: When a message is sent in a server with 100k users, how do we notify everyone without melting the system?

The Storage Problem: How do we store trillions of messages while keeping read latency low for channel history?

The Presence Problem: How do we sync "Online/Offline" status across massive servers without O(N^2) traffic?

Bonus Points

ScyllaDB/Cassandra Partitioning: Using channel_id + bucket as the partition key to ensure messages for a specific channel are co-located but don't grow into giant partitions.

Gateway Sharding: Using consistent hashing to map users to specific Gateway (WebSocket) nodes to maintain session affinity.

Presence Lazy-Loading: Implementing a strategy where presence updates are only pushed to "visible" members in the UI to prevent broadcast storms in large servers.

Discord-specific Optimization: Using "Read States" (Atomic counters or specific markers) to track unread messages efficiently across devices.

Design Breakdown

Functional Requirements

Core Use Cases:

Users can create/join Servers (Guilds) and Channels.

Users can send and receive real-time text messages.

Users can see the online/offline status (Presence) of friends and server members.

Users can view message history for channels.

Scope Control:

In-scope: Text chat, Servers/Channels, Presence, Message History.

Out-of-scope: Voice/Video calls, File uploads (CDN), Message Search, Roles/Permissions complex logic.

Non-Functional Requirements

Scale: Support millions of concurrent WebSocket connections and billions of messages.

Latency: Real-time message delivery should be < 200ms.

Availability & Reliability: 99.99% availability; messages must not be lost once acknowledged.

Consistency: Eventual consistency for presence; Stronger consistency for message ordering within a single channel.

Fault Tolerance: Gateway node failures must not lose messages (reconnect and resume).

Estimation

Traffic:

20M DAU * 50 messages/day = 1 Billion messages/day.

Average Write QPS: ~11,500 msgs/sec.

Peak Write QPS: ~50,000 msgs/sec.

Storage:

1B msgs/day * 100 bytes/msg = 100 GB/day.

36.5 TB/year (Replication factor 3 = ~110 TB/year).

Bandwidth:

Outgoing bandwidth is significantly higher due to fan-out (1 message to 100 users).

Blueprint

Concise Summary: A WebSocket-based real-time architecture utilizing a distributed Gateway for persistent connections and a wide-column NoSQL store (ScyllaDB) for high-volume message persistence.

Major Components:

Gateway Service: Manages long-lived WebSocket connections for real-time event duplexing.

Guild Service: Manages the metadata of servers, channels, and memberships.

Message Service: Handles incoming messages, persists them to the database, and triggers the fan-out.

Presence Service: Tracks user heartbeats and broadcasts status changes to relevant peers.

ScyllaDB: Provides high-throughput, low-latency writes for the message firehose.

Simplicity Audit: This design avoids complex service meshes and uses a simple Pub/Sub model for message distribution, satisfying the MVP's need for speed and scale.

Architecture Decision Rationale:

Why this architecture?: WebSockets are the industry standard for low-latency bi-directional communication. NoSQL is chosen over SQL for messages because Discord's access pattern (Key: ChannelID, Sort: MessageID) maps perfectly to wide-column storage.

Functional Satisfaction: Covers real-time delivery and historical access.

Non-functional Satisfaction: Gateway sharding ensures horizontal scalability; ScyllaDB handles the massive write load.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing: Use a global DNS (Route53) with latency-based routing to direct users to the nearest regional data center.

Security & Perimeter:

API Gateway: Handles TLS termination and JWT validation.

Rate Limiting: Implemented at the Gateway level to prevent spam (e.g., 5 messages/second per user).

Service

Topology & Scaling:

Gateway Service (Stateful): Nodes are sharded by UserID. When a user connects, they are assigned a specific node. If a node fails, clients reconnect to a different node.

API Services (Stateless): Message and Guild services scale based on CPU/Request count.

API Schema Design:

POST /v1/channels/{channel_id}/messages: REST, JSON payload. Returns message_id.

GET /v1/channels/{channel_id}/messages: Returns paginated history.

WS /gateway: WebSocket upgrade request.

Resilience & Reliability:

Exponential Backoff: For client reconnections to avoid "Thundering Herd" after a Gateway node goes down.

Sequence Numbers: Every message sent to a client includes a sequence number so the client can detect gaps and request missed messages.

Storage

Access Pattern:

Writes: Very high (every message).

Reads: High for recent messages (fetching channel history).

Database Table Design:

ScyllaDB (Messages):

Partition Key: channel_id

Clustering Key: message_id (TimeUUID)

Columns: author_id, content, timestamp.

PostgreSQL (Metadata):

Tables: Guilds, Channels, Users, Memberships.

Use for relational data where consistency is key (e.g., who is in which server).

Technical Selection: ScyllaDB. It provides Cassandra-compatible API with much higher performance and lower tail latency, critical for a chat app.

Cache

Purpose & Justification: Redis is used for two purposes:

Presence State: Storing the current status (Online/DND) and the Last Seen timestamp.

Session Mapping: Mapping UserID -> GatewayNodeID so the Message Service knows which node to route a message to.

Failure Handling: If Redis fails, presence defaults to "Offline." Users will re-sync their presence state upon reconnection.

Messaging

Purpose & Decoupling: Redis Pub/Sub is used for the real-time fan-out.

Throughput & Partitioning:

Each ChannelID or GuildID can act as a topic.

Gateway nodes subscribe to the topics of the users currently connected to them.

Failure Handling: Redis Pub/Sub is "fire and forget." We rely on the client's sequence number tracking to recover from missed pulses during brief disconnects.

Wrap Up

Advanced Topics

Trade-offs: We choose Availability over Consistency (AP) for presence. It is better for a user to occasionally see a friend as "Online" when they just logged off than to have the whole chat system hang waiting for a global lock on presence state.

Reliability: To handle massive guilds (500k members), we do not push presence updates to everyone. We only push to users "visible" in the current channel's member list or based on recent activity.

Security: All messages are encrypted in transit (TLS). RBAC (Role-Based Access Control) is checked by the Guild Service before a user is allowed to subscribe to a channel's WebSocket stream.

Distinguishing Insights: To handle the "Large Server" problem, Discord uses a Lazy Loading approach for the member list. The client only requests presence for the ~100 members currently visible in the sidebar, significantly reducing the fan-out load.