The Question
Design

Scalable Real-time Chat System Design

Design a real-time messaging platform similar to Discord that supports millions of concurrent users. The system must handle persistent message history, real-time delivery with low latency, and a presence system (online/offline status). Specifically address how the architecture handles massive group servers with over 100,000 concurrent members, focusing on storage partitioning, real-time fan-out challenges, and maintaining high availability during peak traffic spikes.
WebSockets
ScyllaDB
PostgreSQL
Redis
NATS
Snowflake ID
Anycast IP
Questions & Insights

Clarifying Questions

Scale: What is the target scale for Daily Active Users (DAU) and Peak Concurrent Users (PCU)?
Server Size: What is the maximum number of members a single "Server" (Guild) can hold (e.g., 1 million+, as with official game servers)?
Persistence: Do we need to store message history indefinitely, and should it be searchable?
Features: Should we focus strictly on text-based messaging for the MVP, or are Voice/Video and Screen Sharing required?
Presence: How real-time does the "Online/Idle/DND" status need to be across large servers?
Assumptions:
DAU: 100 million; PCU: 15 million.
Max Server Size: 1 million members.
Message History: Persistent forever.
MVP Focus: Real-time text chat, server/channel management, and presence. Voice/Video is out of scope for the MVP.

Thinking Process

Core Bottleneck: Real-time delivery to millions of concurrent users and the "Fan-out" problem (sending one message to 100k+ online members in a large server).
Step 1: How do we maintain persistent connections? We use a WebSocket Gateway to maintain bi-directional stateful connections for real-time delivery.
Step 2: How do we handle massive message storage? A single relational database struggles at this write volume without heavy manual sharding; we need a wide-column NoSQL store (ScyllaDB/Cassandra) that partitions messages natively.
Step 3: How do we scale presence? A distributed Pub/Sub mechanism is needed, but for massive servers, we must use "Lazy Loading" (only showing presence for users in the current UI viewport) to avoid O(N^2) traffic.
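The lazy-loading idea in Step 3 can be sketched as a small subscription index. This is a minimal illustration, assuming the client reports which slice of the member list is on screen; `ViewportPresence`, `set_viewport`, and `recipients_for` are hypothetical names, not part of any real gateway API.

```python
from collections import defaultdict


class ViewportPresence:
    """Tracks which members each connected session currently has on screen."""

    def __init__(self):
        # member_id -> set of session_ids that currently display that member
        self._watchers = defaultdict(set)
        # session_id -> set of member_ids visible in that session's viewport
        self._visible = {}

    def set_viewport(self, session_id, visible_member_ids):
        """Replace the session's subscriptions with the members now on screen."""
        new = set(visible_member_ids)
        old = self._visible.get(session_id, set())
        for member_id in old - new:
            self._watchers[member_id].discard(session_id)
            if not self._watchers[member_id]:
                del self._watchers[member_id]
        for member_id in new - old:
            self._watchers[member_id].add(session_id)
        self._visible[session_id] = new

    def recipients_for(self, member_id):
        """Sessions that should receive a presence change for member_id."""
        return set(self._watchers.get(member_id, ()))
```

With this shape, a status change costs one update per session that actually displays the member, rather than one per member of the server.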

Bonus Points

Message IDs: Use Snowflake-like IDs (k-sortable) so that IDs sort roughly chronologically across distributed shards without a central lock.
Read-states Optimization: Instead of a "read" flag per message/user, store a last_read_message_id per channel/user to drastically reduce write load.
Gateway Sharding: Implement consistent hashing on Gateway servers to ensure users from the same guild land on the same "Manifold" or "Session" worker to optimize internal pub/sub routing.
Data Locality: Discuss ScyllaDB's shard-per-core architecture to minimize CPU cache misses during high-volume chat spikes.
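To make the Snowflake-like ID point concrete, here is a minimal generator sketch. It assumes the classic layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence); the epoch constant and the class name `SnowflakeGenerator` are illustrative assumptions, and a production generator would handle clock skew more carefully.

```python
import threading
import time

EPOCH_MS = 1_420_070_400_000  # assumed custom epoch (2015-01-01 UTC); any fixed past instant works
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self._worker_id = worker_id
        self._clock = clock
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Clock went backwards: reuse the last timestamp so IDs stay monotonic.
                now = self._last_ms
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted: borrow the next millisecond instead of sleeping
                    # (a simplification for this sketch).
                    now = self._last_ms + 1
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self._worker_id << SEQUENCE_BITS)
                | self._sequence
            )


def timestamp_ms(snowflake):
    """Recover the creation time (Unix ms) embedded in an ID."""
    return (snowflake >> (WORKER_BITS + SEQUENCE_BITS)) + EPOCH_MS
```

IDs from one worker are strictly increasing; IDs from different workers are ordered by millisecond, which is what "k-sortable" means here.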
Design Breakdown

Functional Requirements

Core Use Cases:
Users can join/create Servers and Channels.
Users can send and receive real-time messages.
Users can see the online/offline status (Presence) of friends/server members.
Users can view message history (infinite scroll).
Scope Control:
In-scope: Text chat, Presence, Guild/Channel metadata.
Out-of-scope: Voice/Video, File attachments (beyond URLs), Search, Rich Embeds, Nitro/Billing.

Non-Functional Requirements

Scale: Support 100M+ DAU and billions of messages per day.
Latency: End-to-end message delivery < 200ms.
Availability: 99.99% (chat is an "always-on" utility).
Consistency: Eventual consistency for message history is acceptable, but message ordering within a channel must be strictly preserved.
Fault Tolerance: No single point of failure; Gateway servers must handle graceful reconnection.

Estimation

Traffic:
100M DAU * 20 messages/day = 2 Billion messages/day.
Average QPS: ~23,000 writes/sec.
Peak QPS (10x): ~230,000 writes/sec.
Storage:
2B messages * 100 bytes/msg = 200 GB/day.
200 GB/day * 365 days = 73 TB/year (before replication).
Bandwidth:
23k writes/sec * 100 bytes = 2.3 MB/s (Inbound).
Outbound is much higher due to fan-out: 2.3 MB/s * 100 recipients per message (assumed average channel size) = 230 MB/s.
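The arithmetic above can be reproduced in a few lines, which makes it easy to vary the assumptions. The variable names are illustrative; the inputs are the figures stated in this section.

```python
SECONDS_PER_DAY = 86_400

dau = 100_000_000
messages_per_user_per_day = 20
bytes_per_message = 100
peak_factor = 10
avg_fanout = 100  # assumed average recipients per message

messages_per_day = dau * messages_per_user_per_day
avg_writes_per_sec = messages_per_day / SECONDS_PER_DAY
peak_writes_per_sec = avg_writes_per_sec * peak_factor

storage_per_day_gb = messages_per_day * bytes_per_message / 1e9
storage_per_year_tb = storage_per_day_gb * 365 / 1e3

inbound_mb_per_sec = avg_writes_per_sec * bytes_per_message / 1e6
outbound_mb_per_sec = inbound_mb_per_sec * avg_fanout

print(f"avg writes/sec:  {avg_writes_per_sec:,.0f}")
print(f"peak writes/sec: {peak_writes_per_sec:,.0f}")
print(f"storage:         {storage_per_day_gb:.0f} GB/day, {storage_per_year_tb:.0f} TB/year")
print(f"bandwidth:       {inbound_mb_per_sec:.1f} MB/s in, {outbound_mb_per_sec:.0f} MB/s out")
```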

Blueprint

Concise Summary: A WebSocket-based real-time architecture utilizing a stateful Gateway for message delivery, backed by ScyllaDB for high-throughput message persistence and Redis for session/presence tracking.
Major Components:
WebSocket Gateway: Maintains persistent TCP connections to clients for real-time bi-directional events.
Chat Service: Handles incoming messages, validates permissions, and persists them to storage.
Presence Service: Tracks user heartbeats and broadcasts status changes.
Metadata DB (Postgres): Stores relational data like Guilds, Channels, and Roles.
Message Store (ScyllaDB): High-performance wide-column store for chat history.
Simplicity Audit: This design avoids complex stream processing (Flink) or heavy service meshes for the MVP, focusing on direct WebSocket-to-Service communication.
Architecture Decision Rationale:
ScyllaDB over Postgres: A single Postgres cluster would need application-level sharding to sustain the write throughput of billions of messages per day, whereas ScyllaDB partitions by key out of the box.
WebSocket over Long Polling: Reduced header overhead and lower latency for bi-directional communication.
Redis for Presence: High-speed, TTL-based storage is perfect for ephemeral heartbeats.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing: Use a Global Anycast IP to route users to the nearest regional Data Center.
Security & Perimeter:
API Gateway: Handles TLS termination and JWT validation.
Rate Limiting: Applied at the UserID level to prevent spamming messages.
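Per-user rate limiting could look roughly like the following. The source does not name an algorithm; a token bucket is used here as one common choice, and `TokenBucketLimiter` with its parameters is a hypothetical name. State is kept in process memory for illustration, whereas a real deployment would likely share it across instances.

```python
import time


class TokenBucketLimiter:
    """Allows `burst` messages at once, refilled at `rate_per_sec` per user."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self._rate = rate_per_sec
        self._burst = burst
        self._clock = clock
        self._buckets = {}  # user_id -> (tokens, last_refill_time)

    def allow(self, user_id):
        now = self._clock()
        tokens, last = self._buckets.get(user_id, (self._burst, now))
        # Refill in proportion to elapsed time, never above the burst size.
        tokens = min(self._burst, tokens + (now - last) * self._rate)
        if tokens >= 1:
            self._buckets[user_id] = (tokens - 1, now)
            return True
        self._buckets[user_id] = (tokens, now)
        return False
```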

Service

Topology & Scaling:
Gateway Servers: Stateful. Scaled horizontally based on concurrent connection count (usually 50k-100k per instance).
API Servers: Stateless. Scaled based on CPU/Request Latency.
API Schema Design:
POST /v1/channels/{id}/messages: REST for sending (provides ACK).
WS /gateway: WebSocket for receiving events.
Idempotency: Clients generate a nonce to prevent duplicate messages during retries.
Resilience & Reliability:
Exponential Backoff: For client reconnections to avoid a "Thundering Herd" after a Gateway restart.
Message ACKs: Clients must ACK a message ID; if not received, the Gateway retries or the client fetches missing data on reconnect.
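The reconnection backoff can be sketched as a delay schedule. This version adds random jitter ("full jitter") so that clients dropped by the same Gateway restart do not all retry at the same instant; the function name and default values are assumptions for illustration.

```python
import random


def reconnect_delays(base=1.0, cap=60.0, attempts=6, rng=random.random):
    """Yield one delay (in seconds) per reconnect attempt.

    The ceiling doubles each attempt up to `cap`; the actual delay is a
    random fraction of that ceiling so clients spread out over time.
    """
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng() * ceiling
```

A client would sleep for each yielded delay before the next connection attempt and reset the schedule after a successful handshake.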

Storage

Access Pattern:
Writes: Very high (new messages).
Reads: High (loading channel history, infinite scroll).
Database Table Design:
Message Table (ScyllaDB):
channel_id (Partition Key): All messages for a channel live together.
message_id (Clustering Key): K-sortable (Snowflake) for chronological ordering.
author_id, content, timestamp.
Metadata (Postgres):
guilds table, channels table, members table (roles/permissions).
Technical Selection: ScyllaDB. Chosen for its C++ implementation of Cassandra's model, offering lower p99 latency and better resource utilization for the massive write volume.
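The partition-key and clustering-key layout can be modeled in memory to show why it suits infinite scroll. This is only a stand-in for the real table, assuming integer message IDs; `MessageStore` and its methods are hypothetical, and the timestamp column is omitted for brevity.

```python
import bisect
from collections import defaultdict


class MessageStore:
    """One 'partition' per channel_id, rows kept sorted by message_id."""

    def __init__(self):
        self._ids = defaultdict(list)  # channel_id -> sorted list of message_ids
        self._rows = {}                # (channel_id, message_id) -> row

    def insert(self, channel_id, message_id, author_id, content):
        if (channel_id, message_id) not in self._rows:
            bisect.insort(self._ids[channel_id], message_id)
        self._rows[(channel_id, message_id)] = {
            "message_id": message_id,
            "author_id": author_id,
            "content": content,
        }

    def history(self, channel_id, before=None, limit=50):
        """Return up to `limit` messages older than `before`, newest first."""
        ids = self._ids.get(channel_id, [])
        end = len(ids) if before is None else bisect.bisect_left(ids, before)
        start = max(0, end - limit)
        return [self._rows[(channel_id, i)] for i in reversed(ids[start:end])]
```

Each page is a contiguous slice of one partition, and the oldest message_id on a page serves as the cursor (`before`) for the next page.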

Cache

Purpose & Justification:
Presence: Redis stores user_id -> {status, last_active}.
Session: Mapping user_id -> gateway_id to route messages to the correct stateful server.
Key-Value Schema:
Key: presence:{user_id}, Value: Hash of status. TTL: 60s (heartbeat based).
Failure Handling: If Redis fails, presence defaults to "Offline". This is acceptable for an MVP as it's non-critical data.
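The heartbeat-plus-TTL behavior can be illustrated with an in-memory stand-in for the Redis keys. `PresenceStore` is a hypothetical class, not a Redis client; it only mimics the expiry semantics described above, including the "default to offline" fallback.

```python
import time


class PresenceStore:
    """Mimics presence:{user_id} keys that expire unless refreshed by a heartbeat."""

    def __init__(self, ttl_seconds=60, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # user_id -> (status, expires_at)

    def heartbeat(self, user_id, status="online"):
        # Every heartbeat rewrites the entry and pushes the expiry forward.
        self._entries[user_id] = (status, self._clock() + self._ttl)

    def status(self, user_id):
        entry = self._entries.get(user_id)
        if entry is None:
            return "offline"
        status, expires_at = entry
        if self._clock() >= expires_at:
            # Missed heartbeats: treat the user as offline and drop the entry.
            del self._entries[user_id]
            return "offline"
        return status
```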

Messaging

Purpose & Decoupling: Used as the internal backbone for fan-out. When a message is persisted, it is published to the broker.
Throughput & Partitioning: Partitioned by channel_id to ensure message ordering within a specific chat.
Technical Selection: Redis Pub/Sub (for low-latency presence) or NATS/Kafka (for message fan-out). For a Discord-scale MVP, a lightweight distributed Pub/Sub such as NATS is preferred over a durable log such as Kafka, because messages are already persisted in ScyllaDB and the broker only needs to deliver them quickly.
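A rough sketch of the routing step behind fan-out, assuming a `session_map` of user_id to gateway_id like the one held in the cache. Both function names are hypothetical, and the CRC32-based partitioner is just one simple way to keep a channel on a single partition.

```python
import zlib
from collections import defaultdict


def partition_for(channel_id, num_partitions):
    """Map a channel to a fixed partition so its messages stay in order."""
    return zlib.crc32(str(channel_id).encode("utf-8")) % num_partitions


def plan_fanout(online_member_ids, session_map):
    """Group recipients by gateway so each gateway gets one publish per message.

    Users with no entry in session_map are treated as disconnected and skipped.
    """
    by_gateway = defaultdict(list)
    for user_id in online_member_ids:
        gateway_id = session_map.get(user_id)
        if gateway_id is not None:
            by_gateway[gateway_id].append(user_id)
    return dict(by_gateway)
```

Grouping by gateway means the broker carries one message per gateway rather than one per recipient; each gateway then writes to its own local WebSocket connections.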
Wrap Up

Advanced Topics

Trade-offs (PACELC): Under a network partition we choose Availability over Consistency (PA); in normal operation we favor Latency over Consistency (EL). A message that is delivered in real time but appears in history slightly late is better than the system being down.
Reliability:
Gateway Sharding: Users are assigned to Gateway "buckets" to limit the blast radius if one node crashes.
ScyllaDB Replication: 3x replication across Availability Zones.
Bottleneck Analysis (The 1M Member Server):
If 100k people are online in a server, one message generates 100k outgoing packets.
Optimization: The Gateway should not send presence updates for all 1M members. It only sends updates for members visible in the user's UI "member list" (Client-side Viewport).
Security:
Encryption in Transit: TLS 1.3 for all connections.
Permissions: Every message write checks the Postgres members table (cached in Redis) to ensure the user has SEND_MESSAGES permission for that channel_id.
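The cached permission check might be structured as below. This is a sketch under assumptions: permissions are modeled as a bitmask, the `SEND_MESSAGES` bit value is made up, the dictionary stands in for Redis, and `load_permissions` stands in for the Postgres members lookup.

```python
SEND_MESSAGES = 1 << 0  # hypothetical bit position for this sketch


class PermissionChecker:
    def __init__(self, load_permissions):
        # load_permissions(user_id, channel_id) -> int bitmask (the slow, authoritative lookup)
        self._load = load_permissions
        self._cache = {}  # (user_id, channel_id) -> bitmask; stands in for Redis

    def can_send(self, user_id, channel_id):
        key = (user_id, channel_id)
        if key not in self._cache:
            self._cache[key] = self._load(user_id, channel_id)
        return bool(self._cache[key] & SEND_MESSAGES)

    def invalidate(self, user_id, channel_id):
        """Call when roles change so the next check reloads from the database."""
        self._cache.pop((user_id, channel_id), None)
```

The invalidation hook matters because a stale cache entry would let a user keep posting after the permission is revoked.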