The Question

Design a Scalable Real-time Messaging Platform

Design a real-time communication system similar to Discord that supports massive persistent connections (1M+ PCU), instantaneous message delivery across channels, and a scalable presence system. The system must handle 'megaservers' with hundreds of thousands of members and ensure message durability and ordering. Discuss your strategy for WebSocket management, storage selection for high-write chat logs, and the architectural trade-offs involved in presence broadcasting at scale.

ScyllaDB

PostgreSQL

Redis

Kafka

WebSockets

gRPC

Snowflake ID

CDN

Questions & Insights

Clarifying Questions

Scale: What is the expected scale for the MVP? (Assumed: 100M Registered Users, 10M DAU, 1M Peak Concurrent Users (PCU)).

Core Features: Are we including Voice/Video or just Text and Presence? (Assumed: Text chat, Servers/Channels, and Real-time Presence only for MVP).

Persistence: Do messages need to be stored forever or is there a TTL? (Assumed: Permanent storage with horizontal scalability).

Group Size: What is the maximum size of a "Server"? (Assumed: Support for "Megaservers" up to 500k members).

Client Support: Mobile and Desktop? (Assumed: Cross-platform support via WebSockets).

Thinking Process

The WebSocket Problem: How do we maintain 1M+ persistent connections and route messages to the correct user sessions?

The Presence Bottleneck: How do we broadcast "Online/Offline" status to thousands of friends/server-mates without creating an

O(N^2)

traffic storm?

The Storage Choice: Which database can handle heavy write throughput and provide low-latency retrieval by channel_id + timestamp?

The Delivery Guarantee: How do we ensure messages are delivered in order and handle offline synchronization?

Bonus Points

Snowflake IDs: Using decentralized k-sortable unique IDs to ensure message ordering across distributed shards without a central lock.

Presence Fan-out Optimization: Implementing a "Lazy Load" or "View-port" based presence strategy for large servers to prevent thundering herds.

ScyllaDB/Cassandra Compaction: Tuning the storage engine (Time Window Compaction Strategy) specifically for chat access patterns to minimize disk I/O.

Gateway Sharding: Using consistent hashing for Gateway nodes to minimize the impact of node failures on persistent connections.

Design Breakdown

Functional Requirements

Core Use Cases:

Users can create servers and channels.

Users can send and receive real-time messages in channels.

Users can see the real-time online/offline status of friends.

Users can view message history in channels.

Scope Control:

In-scope: Text messaging, Presence, Server/Channel management, Auth.

Out-of-scope: Voice/Video calls, File uploads (CDN), Search functionality, Message reactions.

Non-Functional Requirements

Scale: Support 1M PCU and 100k message writes per second.

Latency: End-to-end message delivery < 200ms.

Availability & Reliability: 99.99% availability; messages must not be lost once acknowledged.

Consistency: Eventual consistency for presence; strong ordering for messages within a channel.

Fault Tolerance: System must survive the failure of individual Gateway or Database nodes.

Estimation

Traffic Estimation:

1M PCU. If 10% of users send a message every 10 seconds:

1M * 0.1 / 10 = 10k

Messages/sec (Average).

Peak:

100k

Messages/sec.

Storage Estimation:

Average message size: 100 bytes.

100k \text{ msg/sec} * 86,400 \text{ sec/day} \approx 8.6 \text{ billion msgs/day}

8.6B * 100 \text{ bytes} \approx 860 \text{ GB/day}

Yearly storage:

\approx 310 \text{ TB}

Bandwidth Estimation:

Incoming:

100k * 100 \text{ bytes} = 10 \text{ MB/s}

Outgoing (Fan-out): If average channel has 20 active users,

10 \text{ MB/s} * 20 = 200 \text{ MB/s}

Blueprint

This architecture focuses on a high-concurrency Gateway layer to manage persistent connections and a distributed wide-column store for message persistence.

Gateway Service (WebSocket): Manages stateful connections and bi-directional real-time communication.

Chat Service: Handles message validation, persistence, and fan-out logic.

Presence Service: Tracks user heartbeats and broadcasts status updates.

ScyllaDB/Cassandra: Stores messages using a partition key of channel_id for fast sequential reads.

Redis: Stores transient session mapping and presence state.

Simplicity Audit: We avoid complex service meshes or global transactions. We use ScyllaDB because it scales linearly, which is a requirement even for a "Discord MVP" due to the inherent viral nature of the product.

Architecture Decision Rationale:

Why this architecture?: WebSockets are essential for low-latency bi-directional updates. A wide-column store (ScyllaDB) is chosen over RDBMS because chat history is write-heavy and grows indefinitely.

Functional Satisfaction: Covers real-time delivery via Gateway and history via ScyllaDB.

Non-functional Satisfaction: Redis provides sub-millisecond presence checks; ScyllaDB ensures no single point of failure for data.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery: Use CDN for static assets (CSS/JS) and emojis.

Load Balancing: L4 (TCP) Load Balancer for WebSocket connections using round-robin or least-connections. SSL termination happens here.

Gateway Cluster: Nodes maintain a mapping of User_ID -> WebSocket_Connection. Heartbeats every 30s to prune dead connections.

Service

Chat Service:

Protocol: gRPC for internal service communication.

Idempotency: Clients attach a nonce or client_msg_id.

Ordering: Generates a Snowflake ID (64-bit) for every message to ensure total ordering within a channel.

Presence Service:

Handles "Online", "Idle", "Do Not Disturb", and "Offline".

Optimization: For servers > 1000 members, presence is only sent for users currently looking at the member list (viewport-based presence).

Observability: Prometheus metrics for "Connection Count" and "Message E2E Latency".

Storage

Access Pattern:

Write: High volume inserts (Messages).

Read: Fetch last N messages for a channel_id.

Database Table Design:

Messages (ScyllaDB):

Partition Key: channel_id

Clustering Key: message_id (Snowflake, descending)

Fields: author_id, content, timestamp.

Metadata (PostgreSQL):

Tables for Servers, Channels, Server_Members.

Indexed on user_id and server_id.

Technical Selection: ScyllaDB (NoSQL Wide-column). It handles high write-throughput better than Postgres and supports partitioning by channel_id, allowing message history to be retrieved in a single disk seek.

Cache

Purpose: Store session metadata and user presence state.

Key-Value Schema:

user:presence:{id} -> {status, last_active_ts}.

user:gateway:{id} -> gateway_node_ip (used for routing incoming messages to the correct socket).

Technical Selection: Redis. High performance for TTL-based presence heartbeats.

Messaging

Purpose: Decouples the Chat Service from the Gateway. When a message is sent to a channel, it is published to a topic.

Event Schema: channel_id, message_payload, recipient_list.

Throughput: Kafka handles the fan-out. Each Gateway node subscribes to a subset of partitions.

Technical Selection: Kafka. Required for high-throughput message distribution and ability to "replay" if a Gateway node restarts.

Wrap Up

Advanced Topics

Trade-offs: We choose Availability over Consistency (AP) for Presence. It is acceptable if a user sees a friend as "online" for a few seconds longer than they actually are.

Reliability:

Gateway Failure: If a Gateway node dies, clients reconnect to another node. Redis is updated with the new User -> Gateway mapping.

Backpressure: If Kafka is slow, the Chat Service can buffer messages locally or return a 503 to the client.

Bottleneck Analysis:

Hot Channels: A channel with 100k active users (e.g., a major gaming event) creates a massive fan-out.

Optimization: Use "Sub-groups" for fan-out or limit the broadcast only to the "Top 100" active members in massive channels.

Security:

All messages encrypted in transit via TLS.

Member permissions checked in the Chat Service via Postgres Server_Members table before allowing a write.