The Question
Design

Real-Time Community Chat Platform

Design a large-scale real-time community chat platform similar to Discord. The system should support servers, channels, direct messaging, and live presence indicators for tens of millions of concurrent users, with sub-200ms message delivery and high availability across global regions.
WebSocket
ScyllaDB
Redis Pub/Sub
Consistent Hashing
Protobuf
Questions & Insights

Clarifying Questions

Scale: What is the target Daily Active User (DAU) count and concurrency?
Assumption: 100M DAU, 10M concurrent users, 100k+ messages per second peak.
Scope: Should the MVP include voice/video or focus on text chat and presence?
Assumption: Focus on real-time text chat, presence (online/offline), and server/channel structure. Voice/Video signaling is out of scope for the MVP.
Message Retention: Is message history permanent or ephemeral?
Assumption: Permanent history with infinite scroll capabilities.
Server Sizes: What is the maximum number of members per server/guild?
Assumption: Support servers up to 500,000 members.

Thinking Process

How do we handle real-time delivery at scale? Use a WebSocket-based Gateway architecture with a distributed Pub/Sub (Redis or NATS) to route messages to the correct connection handler.
How do we store billions of messages with high write throughput? Use a wide-column NoSQL store (ScyllaDB/Cassandra) partitioned by channel_id to ensure localized sequential writes and fast range scans for history.
How do we solve the "Presence Storm" problem? In large servers, fanning out presence updates to 500k users is O(N^2). We will use a "lazy-loading" presence model where updates are only pushed to active subscribers of a specific channel view, rather than the whole server.
How do we ensure message ordering? Use Snowflake IDs (k-sortable unique IDs) as the primary key to maintain chronological order across distributed nodes without a central bottleneck.

Bonus Points

ScyllaDB Performance: Opting for ScyllaDB over Cassandra for the message store to leverage its shard-per-core architecture, reducing tail latency (P99) in high-concurrency chat environments.
Discord’s "Read States" Optimization: Implement a compact bitmask or a separate high-speed counter service for "read-markers" to avoid heavy SQL queries when a user logs in and needs to see unread counts across 100 channels.
Gateway Sharding: Implementing a consistent hashing ring for Gateway nodes to minimize the blast radius during a node failure and ensure clients can deterministically reconnect.
Protocol Buffers: Use Protobufs for WebSocket communication instead of JSON to reduce payload size and CPU overhead during serialization/deserialization for 10M+ concurrent connections.
Design Breakdown

Functional Requirements

Users: Create accounts, add friends.
Servers/Channels: Create servers, organize channels, join/leave.
Messaging: Send/receive real-time text messages in channels and DMs.
Presence: View online/offline/idle status of friends and server members.
History: Load previous messages (infinite scroll).

Non-Functional Requirements

Low Latency: Real-time delivery < 200ms.
High Availability: 99.99% availability; the system must not go down if one region fails.
Scalability: Horizontal scaling for both the connection layer and the storage layer.
Consistency: Eventual consistency for presence; causal consistency for message ordering.

Estimation

DAU: 100M.
Write Throughput: 100M users * 50 msgs/day \approx 5B msgs/day \approx 60k msgs/sec (Average). Peak: ~200k msgs/sec.
Storage: 5B msgs/day * 200 bytes/msg \approx 1TB/day. 365TB per year.
WebSocket Connections: 10M concurrent connections. If 1 server handles 50k connections, we need ~200 Gateway nodes.

Blueprint

Concise Summary: A microservices architecture centered around a WebSocket Gateway for real-time duplex communication, backed by a high-throughput NoSQL store for message persistence.
Major Components:
Gateway Service: Maintains persistent WebSocket connections for real-time event pushing.
Message Service: Handles incoming message logic, persistence, and publishing to the fan-out layer.
Presence Service: Manages user heartbeats and status distribution.
ScyllaDB Cluster: Primary store for message history organized by channel.
Redis Cluster: Fast in-memory store for session mapping and presence state.
Simplicity Audit: This design avoids complex "Member List" sync algorithms and heavy media processing, focusing strictly on the core chat experience.
Architecture Decision Rationale:
Why this architecture is the best for this problem?: The separation of the persistent connection layer (Gateway) from the logic layer (API/Message Service) allows us to scale connection count and business logic independently.
Functional Requirement Satisfaction: WebSocket ensures real-time delivery; ScyllaDB handles the massive write-heavy history.
Non-functional Requirement Satisfaction: Partitioning by channel_id ensures that even as the total data grows, lookups for a specific channel remain O(1) or O(log N).

High Level Architecture

Sub-system Deep Dive

Service

Topology & Scaling:
Gateway Nodes: Stateless but maintain stateful TCP connections. Scaled via L4 Load Balancers using source-IP hashing or round-robin with session affinity.
Microservices: Deployed in Kubernetes clusters, auto-scaled based on CPU/Request count.
API Spec:
POST /v1/channels/{id}/messages: Send message via REST (fallback or primary).
WS /v1/gateway: WebSocket upgrade for real-time events.
Protobuf payload: { "op": "SEND_MSG", "d": { "content": "hi", "cid": "123" } }.

Storage

Data Model:
ScyllaDB (messages):
Partition Key: channel_id (Groups all messages of a channel together).
Clustering Key: message_id (Snowflake ID, ensures chronological ordering on disk).
PostgreSQL (metadata): Stores users, guilds, channels, and memberships. Uses standard B-Tree indexes on user_id and guild_id.
Database Logic: ScyllaDB uses a LSM-tree based storage engine, which is ideal for the high-volume append-only nature of chat.

Cache

Presence Cache (Redis): Stores user_id -> {status, last_active, gateway_id}.
TTL: Presence keys have a short TTL (e.g., 60s), refreshed by Gateway heartbeats. If the heartbeat stops, the key expires, and the user is marked "offline".
Eviction: LRU (Least Recently Used) for general metadata caching.

Messaging

Topic Structure: One topic per gateway_node_id or a global broadcast bus using Redis Pub/Sub.
Delivery: When a message is sent to Channel A, the Message Service identifies all active members, finds their gateway_id from Redis, and publishes the message to those specific Gateway node topics.
Guarantees: At-least-once delivery. Clients use message_id for de-duplication.
Wrap Up

Advanced Topics

Monitoring:
Metrics: WebSocket connection count, message end-to-end latency, ScyllaDB disk I/O, Redis memory usage.
Tools: Prometheus for metrics, Grafana for visualization.
Trade-offs:
Consistency vs Availability: We prioritize Availability (AP in CAP). In a partition, some users might see messages slightly later, or presence might be stale, but the system remains functional.
Bottlenecks: The Redis Pub/Sub could become a bottleneck at massive scales.
Failure Handling:
Gateway Fail: Client detects disconnect, reconnects to a new node, and requests missed messages using the last received message_id.
Alternatives & Optimization:
Alternative: Use Amazon DynamoDB if managed service is preferred over ScyllaDB, though costs at Discord's scale would be prohibitive.
Optimization: Use "Message Pull" (Delta sync) when a user wakes up from background state instead of pushing all missed history via WebSockets.