The Question
Design

Global Instant Messaging System

Design the backend for a global instant messaging application like WhatsApp. The system must support real-time 1:1 and group chat for 500 million daily active users. Key requirements include low-latency message delivery, presence indicators (online/offline status), delivery and read receipts, and support for offline users via push notifications. Discuss how you handle millions of concurrent persistent connections, message ordering, and data persistence for high-volume writes.
WebSockets
Cassandra
Redis
Kafka
S3
gRPC
API Gateway
Push Notifications
Questions & Insights

Clarifying Questions

Scale and Traffic: What is the expected Daily Active User (DAU) count and average messages sent per user per day?
Feature Set: Does the MVP include 1:1 chat, group chat, and presence (online/offline status)? Should we handle media (images/video) or just text?
Data Retention: Should messages be stored permanently or deleted once delivered?
Network Constraints: Are we designing for high-latency mobile networks?
Assumptions for this design:
Scale: 500 million DAU, 20 billion messages per day.
Features: 1:1 messaging, group messaging (up to 512 members), delivery/read receipts, and presence status.
Media: MVP will support basic image/video uploads via an object store.
E2EE: End-to-end encryption is required (handled client-side, but the system must transport the keys/payloads).

Thinking Process

The core challenge is maintaining millions of persistent connections while ensuring low-latency delivery and handling massive write volumes.
Persistent Connectivity: How do we maintain open pipes for real-time delivery? (WebSockets for bi-directional communication).
Message Routing: How does the system know which server User B is connected to when User A sends a message? (Global Connection Registry/Presence Service).
Offline Delivery: How do we handle messages for users with no active connection? (Persistent storage + Push Notification triggers).
Group Fan-out: How do we efficiently send one message to 512 people without blocking the sender? (Asynchronous expansion via Message Queues).

Bonus Points

Consistency vs. Availability: Use a PACELC-aware design, prioritizing Availability and Partition Tolerance (AP) for presence, but ensuring "At-least-once" delivery for messages.
Connection Draining: Implementing a graceful way to migrate millions of WebSocket connections during deployments without causing a thundering herd.
Conflict-Free Replicated Data Types (CRDTs): Using CRDTs for message ordering and synchronization across multiple devices for the same user.
Last-mile Latency: Using Edge Locations/PoPs to terminate TLS/WebSocket connections closer to the user to reduce handshake latency.
Design Breakdown

Functional Requirements

Core Use Cases:
One-to-one messaging (Text and Media).
Group messaging (Fan-out).
Delivery and Read receipts (Ack/Nack).
Presence status (Online, Offline, Last Seen).
Scope Control:
In-scope: Real-time messaging, status, push notifications.
Out-of-scope: Voice/Video calling (Signaling only), message search, "Stories/Status" updates.

Non-Functional Requirements

Scale: Support 500M DAU and 10M+ concurrent connections per cluster.
Latency: Message delivery < 200ms for 99th percentile (p99) when both users are online.
Availability & Reliability: 99.99% availability; zero message loss for delivered/stored messages.
Consistency: Eventual consistency for presence; causal consistency for message ordering within a thread.
Security: TLS 1.3 for transport; client-side End-to-End Encryption (E2EE).

Estimation

Traffic Estimation:
20B messages / 86,400 seconds ≈ 231,500 Average QPS.
Peak QPS (3x): ~700k QPS.
Storage Estimation:
Avg message size: 100 bytes (text/metadata).
20B msgs/day * 100 bytes = 2 TB/day.
5-year retention: 2 TB 365 5 ≈ 3.6 PB.
Bandwidth Estimation:
Ingress: 231k QPS * 100 bytes ≈ 23 MB/s (text only).
Egress: Multiplying by group fan-out (assume avg 3x fan-out) ≈ 70 MB/s.

Blueprint

Concise Summary: A WebSocket-based architecture where a Connection Manager tracks active users, a Chat Service routes messages, and Cassandra provides high-throughput persistent storage for undelivered or historical messages.
Major Components:
Connection Manager: Maintains stateful WebSocket sessions for real-time delivery.
Presence Service: Tracks online/offline status in a low-latency cache.
Message Service: Handles message logic, receipts, and persistent storage.
Message DB: Distributed NoSQL store for high-volume message persistence.
Message Queue: Decouples group message fan-out and offline push notifications.
Object Store: Stores encrypted media files (images/videos).
Simplicity Audit: This architecture avoids complex distributed locking by using a "Push-on-Connect" and "Store-and-Forward" model, which is the most reliable way to handle intermittent mobile connectivity.
Architecture Decision Rationale:
WebSockets over HTTP: Essential for low-overhead, real-time bi-directional updates.
Cassandra: Chosen for its linear scalability and high write throughput, which fits the massive message volume.
Redis: Chosen for presence tracking because status changes are frequent and ephemeral.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing:
Use Global Server Load Balancing (GSLB) to route users to the nearest regional data center.
CDN used for caching static assets and media downloads.
Security & Perimeter:
API Gateway: Handles authentication via JWT and terminates TLS.
Rate Limiting: Applied per UserID to prevent spam/DDoS.

Service

Topology & Scaling:
Connection Managers: Stateful, horizontally scaled. Each node tracks which UserIDs are connected to it in local memory and updates the Presence Cache.
Chat Service: Stateless, scales based on CPU/QPS.
API Schema Design:
sendMessage(to_id, content_type, payload): gRPC for internal service communication; WebSocket for client-to-server.
getMessages(conversation_id, cursor): REST for historical sync.
Resilience & Reliability:
Heartbeats: Clients send heartbeats every 30s to keep the WebSocket alive and update presence.
Sequence Numbers: Every message has a client-generated monotonically increasing sequence number to handle deduplication and ordering.

Storage

Access Pattern: 90% writes (new messages/status) and 10% reads (historical sync).
Database Table Design:
Table: Messages
conversation_id (Partition Key)
message_id (Clustering Key - TimeUUID)
sender_id, content, timestamp, status (Sent/Delivered/Read)
Technical Selection: Cassandra.
Rationale: High write availability and efficient clustering keys for time-series message retrieval.
Distribution Logic: Sharded by conversation_id. For large groups, a sub-shard (bucket) is added to conversation_id to prevent single-partition hotspots.

Cache

Purpose & Justification: Presence tracking (Online/Offline/Last Seen).
Key-Value Schema:
Key: user:presence:{user_id}, Value: {status: "online", last_seen: timestamp, server_id: "conn-mgr-01"}.
TTL: Set to 60 seconds (expires if heartbeat fails).
Technical Selection: Redis.
Rationale: High IOPS for status updates and support for Pub/Sub (optional for friend status updates).

Messaging

Purpose & Decoupling: Kafka is used for:
Group Fan-out: Chat service produces a "GroupMsg" event; workers expand this into 1:1 messages for each member.
Offline Notifications: If a recipient is offline (checked via Presence Cache), a message is sent to the Push Notification Queue.
Failure Handling: Dead-letter queues (DLQ) for failed push notifications (e.g., invalid device tokens).
Technical Selection: Kafka.
Rationale: Message durability and high throughput for event-driven fan-out.
Wrap Up

Advanced Topics

Trade-offs: We chose Availability over Consistency (AP) for presence. If Redis is slightly out of sync, a user might see someone as "online" who just disconnected. This is acceptable for UX.
Reliability:
Retry Logic: Clients implement exponential backoff for WebSocket reconnection.
Acknowledgement: The "Message Received" flow: Client A -> Server (Ack to A) -> Client B -> Server (Ack from B) -> Client A (Delivered status).
Bottleneck Analysis:
Hot Shards: Large group chats (e.g., 500 members) can cause write spikes.
Solution: Use a separate asynchronous "Group Worker" to handle fan-out rather than doing it in the request-response cycle.
Security & Privacy:
PII Protection: Media stored in S3 is encrypted. Only the URL is stored in Cassandra.
End-to-End Encryption: The server never sees the raw message content, only the encrypted blob and routing metadata.