DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.
DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.
The Question
Design

Design a Scalable Real-time Messaging Platform

Design a real-time communication system similar to Discord that supports massive persistent connections (1M+ PCU), instantaneous message delivery across channels, and a scalable presence system. The system must handle 'megaservers' with hundreds of thousands of members and ensure message durability and ordering. Discuss your strategy for WebSocket management, storage selection for high-write chat logs, and the architectural trade-offs involved in presence broadcasting at scale.
ScyllaDB
PostgreSQL
Redis
Kafka
WebSockets
gRPC
Snowflake ID
CDN
Questions & Insights

Clarifying Questions

Scale: What is the expected scale for the MVP? (Assumed: 100M Registered Users, 10M DAU, 1M Peak Concurrent Users (PCU)).
Core Features: Are we including Voice/Video or just Text and Presence? (Assumed: Text chat, Servers/Channels, and Real-time Presence only for MVP).
Persistence: Do messages need to be stored forever or is there a TTL? (Assumed: Permanent storage with horizontal scalability).
Group Size: What is the maximum size of a "Server"? (Assumed: Support for "Megaservers" up to 500k members).
Client Support: Mobile and Desktop? (Assumed: Cross-platform support via WebSockets).

Thinking Process

The WebSocket Problem: How do we maintain 1M+ persistent connections and route messages to the correct user sessions?
The Presence Bottleneck: How do we broadcast "Online/Offline" status to thousands of friends/server-mates without creating an O(N^2) traffic storm?
The Storage Choice: Which database can handle heavy write throughput and provide low-latency retrieval by channel_id + timestamp?
The Delivery Guarantee: How do we ensure messages are delivered in order and handle offline synchronization?

Bonus Points

Snowflake IDs: Using decentralized k-sortable unique IDs to ensure message ordering across distributed shards without a central lock.
Presence Fan-out Optimization: Implementing a "Lazy Load" or "View-port" based presence strategy for large servers to prevent thundering herds.
ScyllaDB/Cassandra Compaction: Tuning the storage engine (Time Window Compaction Strategy) specifically for chat access patterns to minimize disk I/O.
Gateway Sharding: Using consistent hashing for Gateway nodes to minimize the impact of node failures on persistent connections.
Design Breakdown

Functional Requirements

Core Use Cases:
Users can create servers and channels.
Users can send and receive real-time messages in channels.
Users can see the real-time online/offline status of friends.
Users can view message history in channels.
Scope Control:
In-scope: Text messaging, Presence, Server/Channel management, Auth.
Out-of-scope: Voice/Video calls, File uploads (CDN), Search functionality, Message reactions.

Non-Functional Requirements

Scale: Support 1M PCU and 100k message writes per second.
Latency: End-to-end message delivery < 200ms.
Availability & Reliability: 99.99% availability; messages must not be lost once acknowledged.
Consistency: Eventual consistency for presence; strong ordering for messages within a channel.
Fault Tolerance: System must survive the failure of individual Gateway or Database nodes.

Estimation

Traffic Estimation:
1M PCU. If 10% of users send a message every 10 seconds: 1M * 0.1 / 10 = 10k Messages/sec (Average).
Peak: 100k Messages/sec.
Storage Estimation:
Average message size: 100 bytes.
100k \text{ msg/sec} * 86,400 \text{ sec/day} \approx 8.6 \text{ billion msgs/day}.
8.6B * 100 \text{ bytes} \approx 860 \text{ GB/day}.
Yearly storage: \approx 310 \text{ TB}.
Bandwidth Estimation:
Incoming: 100k * 100 \text{ bytes} = 10 \text{ MB/s}.
Outgoing (Fan-out): If average channel has 20 active users, 10 \text{ MB/s} * 20 = 200 \text{ MB/s}.

Blueprint

This architecture focuses on a high-concurrency Gateway layer to manage persistent connections and a distributed wide-column store for message persistence.
Gateway Service (WebSocket): Manages stateful connections and bi-directional real-time communication.
Chat Service: Handles message validation, persistence, and fan-out logic.
Presence Service: Tracks user heartbeats and broadcasts status updates.
ScyllaDB/Cassandra: Stores messages using a partition key of channel_id for fast sequential reads.
Redis: Stores transient session mapping and presence state.
Simplicity Audit: We avoid complex service meshes or global transactions. We use ScyllaDB because it scales linearly, which is a requirement even for a "Discord MVP" due to the inherent viral nature of the product.
Architecture Decision Rationale:
Why this architecture?: WebSockets are essential for low-latency bi-directional updates. A wide-column store (ScyllaDB) is chosen over RDBMS because chat history is write-heavy and grows indefinitely.
Functional Satisfaction: Covers real-time delivery via Gateway and history via ScyllaDB.
Non-functional Satisfaction: Redis provides sub-millisecond presence checks; ScyllaDB ensures no single point of failure for data.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery: Use CDN for static assets (CSS/JS) and emojis.
Load Balancing: L4 (TCP) Load Balancer for WebSocket connections using round-robin or least-connections. SSL termination happens here.
Gateway Cluster: Nodes maintain a mapping of User_ID -> WebSocket_Connection. Heartbeats every 30s to prune dead connections.

Service

Chat Service:
Protocol: gRPC for internal service communication.
Idempotency: Clients attach a nonce or client_msg_id.
Ordering: Generates a Snowflake ID (64-bit) for every message to ensure total ordering within a channel.
Presence Service:
Handles "Online", "Idle", "Do Not Disturb", and "Offline".
Optimization: For servers > 1000 members, presence is only sent for users currently looking at the member list (viewport-based presence).
Observability: Prometheus metrics for "Connection Count" and "Message E2E Latency".

Storage

Access Pattern:
Write: High volume inserts (Messages).
Read: Fetch last N messages for a channel_id.
Database Table Design:
Messages (ScyllaDB):
Partition Key: channel_id
Clustering Key: message_id (Snowflake, descending)
Fields: author_id, content, timestamp.
Metadata (PostgreSQL):
Tables for Servers, Channels, Server_Members.
Indexed on user_id and server_id.
Technical Selection: ScyllaDB (NoSQL Wide-column). It handles high write-throughput better than Postgres and supports partitioning by channel_id, allowing message history to be retrieved in a single disk seek.

Cache

Purpose: Store session metadata and user presence state.
Key-Value Schema:
user:presence:{id} -> {status, last_active_ts}.
user:gateway:{id} -> gateway_node_ip (used for routing incoming messages to the correct socket).
Technical Selection: Redis. High performance for TTL-based presence heartbeats.

Messaging

Purpose: Decouples the Chat Service from the Gateway. When a message is sent to a channel, it is published to a topic.
Event Schema: channel_id, message_payload, recipient_list.
Throughput: Kafka handles the fan-out. Each Gateway node subscribes to a subset of partitions.
Technical Selection: Kafka. Required for high-throughput message distribution and ability to "replay" if a Gateway node restarts.
Wrap Up

Advanced Topics

Trade-offs: We choose Availability over Consistency (AP) for Presence. It is acceptable if a user sees a friend as "online" for a few seconds longer than they actually are.
Reliability:
Gateway Failure: If a Gateway node dies, clients reconnect to another node. Redis is updated with the new User -> Gateway mapping.
Backpressure: If Kafka is slow, the Chat Service can buffer messages locally or return a 503 to the client.
Bottleneck Analysis:
Hot Channels: A channel with 100k active users (e.g., a major gaming event) creates a massive fan-out.
Optimization: Use "Sub-groups" for fan-out or limit the broadcast only to the "Top 100" active members in massive channels.
Security:
All messages encrypted in transit via TLS.
Member permissions checked in the Chat Service via Postgres Server_Members table before allowing a write.