The Question

Real-time Chat Platform Design

Design a high-scale real-time chat platform supporting millions of concurrent users. The system must handle 1-on-1 and group messaging with low latency and guarantee message persistence. Key considerations include maintaining persistent connections at scale, managing user presence (online/offline status), and designing a storage schema optimized for chronological message retrieval across multiple devices.

WebSockets

Redis

Cassandra

Kafka

JSON Web Token

Load Balancer

Service Discovery

NoSQL

Questions & Insights

Clarifying Questions

What is the expected scale of the platform? (e.g., DAU, concurrent connections, and messages per second).

Assumption: 10 Million DAU, 1 Million concurrent connections (Peak), 500 Million messages per day.

What are the core features required for the MVP? (e.g., 1-on-1 chat, group chat, presence indicators, read receipts, media support).

Assumption: MVP includes 1-on-1 and group chat (up to 100 members), presence (online/offline), and text-only messages.

Do we need to support multi-device synchronization?

Assumption: Yes, users expect message history to be synced across multiple active sessions.

What are the latency and durability requirements?

Assumption: Real-time delivery (<100ms) and high durability (messages must not be lost once acknowledged).

Thinking Process

Core Bottleneck: Maintaining and scaling millions of persistent WebSocket connections while ensuring low-latency message routing between geographically distributed users.

Strategy Questions:

How do we track which server a specific user is connected to? (The "Session Mapping" problem).

How do we store and retrieve massive volumes of message history efficiently? (The "Storage Choice" problem).

How do we handle presence status without overwhelming the system with "heartbeat" traffic?

How do we ensure "at-least-once" delivery in an unstable network environment?

Bonus Points

Consistency Models: Guaranteeing message order only within a conversation (per-conversation causal ordering), for example with Lamport timestamps or time-sortable Snowflake IDs, rather than enforcing strict global ordering.
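A minimal sketch of a Snowflake-style ID generator, to make the idea of time-sortable IDs concrete. The bit layout (10 worker bits, 12 sequence bits), the EPOCH_MS value, and the class name are assumptions for illustration, not something the design specifies. Note that IDs from different workers are only roughly time-ordered, so a strict per-conversation order would still need a single sequencer per conversation or a logical clock.

```python
import threading
import time

EPOCH_MS = 1_600_000_000_000  # hypothetical custom epoch (assumption)
WORKER_BITS = 10              # assumed layout: up to 1024 workers
SEQ_BITS = 12                 # assumed layout: 4096 IDs per worker per ms
MAX_SEQ = (1 << SEQ_BITS) - 1


class SnowflakeGenerator:
    """Produces 64-bit-style IDs: timestamp | worker_id | sequence."""

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._clock = clock
        self._lock = threading.Lock()
        self._last_ms = -1
        self._seq = 0

    def next_id(self):
        with self._lock:
            # Never go backwards, even if the wall clock does.
            now = max(self._clock(), self._last_ms)
            if now == self._last_ms:
                self._seq = (self._seq + 1) & MAX_SEQ
                if self._seq == 0:
                    now += 1  # sequence exhausted: borrow the next millisecond
            else:
                self._seq = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQ_BITS))
                | (self.worker_id << SEQ_BITS)
                | self._seq
            )
```

IDs from one generator are strictly increasing, so sorting by ID within a single writer matches send order.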

Write-Optimized Storage: Implementing an LSM-tree-based store (like Cassandra or ScyllaDB) optimized for high-velocity writes and sequential reads keyed by conversation_id + timestamp.

Presence Optimization: Using a "Pull-on-Demand" or "Lazy Presence" model for large group chats to avoid the O(N^2) fan-out problem of status updates.

Connection Draining: Implementing graceful WebSocket handovers during deployments to prevent "thundering herd" reconnects that could crash the gateway layer.
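One way to picture connection draining is sketched below: the gateway spreads its "please reconnect" notices over a window, and clients reconnect with jittered exponential backoff so they do not arrive in lockstep. The function names, the 1 s base, and the 60 s cap are illustrative assumptions.

```python
import random


def reconnect_delay(attempt, base=1.0, cap=60.0, rng=random):
    """Full-jitter backoff: a delay drawn uniformly from
    [0, min(cap, base * 2**attempt)] seconds."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


def drain_schedule(conn_ids, window_s):
    """Spread reconnect notices evenly over window_s seconds.

    Returns a list of (connection_id, offset_seconds) pairs.
    """
    n = len(conn_ids)
    if n == 0:
        return []
    return [(cid, window_s * i / n) for i, cid in enumerate(conn_ids)]
```

For example, draining four connections over 20 seconds yields offsets of 0, 5, 10, and 15 seconds, so the replacement gateways see a steady trickle rather than a spike.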

Design Breakdown

Functional Requirements

Core Use Cases:

Send and receive 1-on-1 real-time messages.

Create and participate in group chats.

See "Online/Offline" status of contacts.

Fetch conversation history (infinite scroll).

Scope Control:

In-scope: Real-time delivery, presence, history, group chat (MVP size).

Out-of-scope: End-to-end encryption (E2EE), voice/video calls, file/media processing, read receipts.

Non-Functional Requirements

Scale: Support 1M+ concurrent WebSocket connections.

Latency: End-to-end message delivery under 100ms (P95).

Availability & Reliability: 99.99% uptime; messages must be persistent.

Consistency: Messages must appear in the same order for both sender and receiver.

Security: TLS for all connections; JWT-based authentication.

Estimation

Traffic Estimation:

500M messages / 86,400 s ≈ 5,800 avg QPS.

Peak QPS (2x avg) ≈ 12,000 writes/sec.

Storage Estimation:

1 message ≈ 100 bytes (Metadata + Text).

500M * 100 bytes = 50 GB/day.

5 years of data ≈ 90 TB (excluding replication).

Bandwidth Estimation:

Inbound: 12k * 100 bytes = 1.2 MB/s.

Outbound (Fan-out for groups): Assuming avg group size of 5, 5 * 1.2 MB/s ≈ 6 MB/s.
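The back-of-envelope figures above can be reproduced with a few lines of arithmetic. This is only a worked check using the stated assumptions (500M messages/day, 100 bytes per message, 2x peak factor, average fan-out of 5, 365-day years); the variable names are illustrative.

```python
MESSAGES_PER_DAY = 500_000_000
SECONDS_PER_DAY = 86_400
MESSAGE_BYTES = 100
PEAK_FACTOR = 2
AVG_FANOUT = 5
RETENTION_YEARS = 5

avg_qps = MESSAGES_PER_DAY / SECONDS_PER_DAY                    # ~5,787
peak_qps = avg_qps * PEAK_FACTOR                                # ~11,574
daily_storage_gb = MESSAGES_PER_DAY * MESSAGE_BYTES / 1e9       # 50 GB/day
five_year_tb = daily_storage_gb * 365 * RETENTION_YEARS / 1000  # 91.25 TB
inbound_mb_s = peak_qps * MESSAGE_BYTES / 1e6                   # ~1.16 MB/s
outbound_mb_s = inbound_mb_s * AVG_FANOUT                       # ~5.79 MB/s

print(f"avg QPS      : {avg_qps:,.0f}")
print(f"peak QPS     : {peak_qps:,.0f}")
print(f"storage/day  : {daily_storage_gb:.0f} GB")
print(f"storage/5 yr : {five_year_tb:.2f} TB (before replication)")
print(f"inbound      : {inbound_mb_s:.2f} MB/s")
print(f"outbound     : {outbound_mb_s:.2f} MB/s")
```

The rounded values in the text (5,800 QPS, 12,000 writes/sec, 90 TB, 1.2 MB/s, 6 MB/s) are consistent with these results.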

Blueprint

Concise Summary: A WebSocket-based architecture using a stateful Gateway layer to maintain persistent connections, a Redis-backed Session Store for message routing, and a NoSQL database for high-throughput message persistence.

Major Components:

WebSocket Gateway: Manages persistent bi-directional connections and heartbeats.

Chat Service: Orchestrates message persistence, routing, and group logic.

Presence Service: Tracks user online/offline status via heartbeats.

Session Store (Redis): Maps User_ID -> Gateway_ID for message routing.

Message Store (Cassandra): Provides high-speed writes for message history.

Simplicity Audit: This design avoids complex service meshes and heavy message brokers for the core path, focusing on direct routing via Redis to minimize latency.

Architecture Decision Rationale:

Why?: WebSocket is the industry standard for low-latency bi-directional communication. NoSQL (Cassandra) is chosen over RDBMS because chat data is write-heavy and fits a wide-column key-value pattern (ConversationID as Partition Key).

Functional Satisfaction: Meets real-time, history, and group requirements.

Non-functional Satisfaction: Scalable via horizontal sharding of Gateways and Cassandra.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing:

Use Geo-DNS to route users to the nearest regional data center.

Load Balancing: L7 Load Balancer (e.g., NGINX/Envoy) with WebSocket support and TLS termination.

Security:

Rate limiting at the LB level to prevent connection-exhaustion attacks.

Service

Topology & Scaling:

WebSocket Gateway: Stateful, horizontally scalable. Nodes are registered in Service Discovery (Consul/ZooKeeper) so the Chat Service knows how to reach them.

Chat Service: Stateless, scales based on CPU/Request count.

API Schema Design:

Send Message (WebSocket): { "to": "user_123", "text": "hi", "temp_id": "uuid" }

Fetch History (REST): GET /v1/history/{conv_id}?limit=20&cursor={ts}
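A sketch of how the server side might validate a Send Message frame and acknowledge it. It assumes that temp_id is a client-generated idempotency key, so a retried frame maps to the same server-side message ID; the source does not state this, and the SendHandler class and the ack shape are hypothetical.

```python
import itertools
import json


class SendHandler:
    """Validates send frames and de-duplicates retries by (sender, temp_id)."""

    REQUIRED_FIELDS = ("to", "text", "temp_id")

    def __init__(self):
        self._next_id = itertools.count(1)
        self._seen = {}  # (sender_id, temp_id) -> server message id

    def handle(self, sender_id, raw_frame):
        try:
            frame = json.loads(raw_frame)
        except json.JSONDecodeError:
            return {"type": "error", "reason": "invalid JSON"}
        if not isinstance(frame, dict):
            return {"type": "error", "reason": "frame must be an object"}
        for field in self.REQUIRED_FIELDS:
            value = frame.get(field)
            if not isinstance(value, str) or not value:
                return {"type": "error", "reason": f"missing or invalid field: {field}"}

        key = (sender_id, frame["temp_id"])
        duplicate = key in self._seen
        if not duplicate:
            self._seen[key] = next(self._next_id)
            # A real Chat Service would persist and route the message here.
        return {
            "type": "ack",
            "temp_id": frame["temp_id"],
            "message_id": self._seen[key],
            "duplicate": duplicate,
        }
```

Echoing temp_id in the ack lets the client match the acknowledgement to its optimistic local copy of the message.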

Resilience & Reliability:

Client-side retries with exponential backoff for connection loss.

Sequence numbers assigned by the server to detect gaps in message delivery.
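Gap detection with server-assigned sequence numbers could look roughly like the client-side sketch below. It assumes sequence numbers are per conversation, start at 1, and increase by 1; the GapDetector name and return shape are illustrative.

```python
class GapDetector:
    """Buffers out-of-order messages and reports the missing range."""

    def __init__(self, last_seq=0):
        self.last_seq = last_seq  # highest sequence delivered in order
        self._buffer = {}         # seq -> payload, for messages that arrived early

    def _missing(self):
        if not self._buffer:
            return None
        return (self.last_seq + 1, min(self._buffer) - 1)

    def on_message(self, seq, payload):
        """Return (payloads now deliverable in order, missing (lo, hi) or None)."""
        if seq <= self.last_seq or seq in self._buffer:
            return [], self._missing()  # duplicate: nothing new to deliver
        self._buffer[seq] = payload
        ready = []
        while self.last_seq + 1 in self._buffer:
            self.last_seq += 1
            ready.append(self._buffer.pop(self.last_seq))
        return ready, self._missing()
```

When a missing range is reported, the client can request just that range from the history endpoint instead of reloading the whole conversation.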

Storage

Access Pattern: 1:1 write/read ratio for real-time; read-heavy for history browsing.

Database Table Design (Cassandra):

Table: messages

conversation_id (Partition Key): Group or 1:1 ID.

bucket (Partition Key): To prevent partitions from growing too large (e.g., month/year).

timestamp (Clustering Key): Descending for fast recent message retrieval.

message_id, sender_id, content.
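To illustrate how the composite partition key and descending clustering order interact, here is an in-memory sketch. It assumes monthly buckets formatted as "YYYY-MM" in UTC and millisecond timestamps; MessageStore is a stand-in for the Cassandra table, not a driver call, and max_buckets is a hypothetical guard against scanning empty months indefinitely.

```python
from collections import defaultdict
from datetime import datetime, timezone


def bucket_for(ts_ms):
    """Monthly bucket (UTC) for a millisecond timestamp, e.g. '2024-01'."""
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).strftime("%Y-%m")


def previous_bucket(bucket):
    year, month = map(int, bucket.split("-"))
    return f"{year - 1}-12" if month == 1 else f"{year}-{month - 1:02d}"


class MessageStore:
    """Rows grouped by (conversation_id, bucket), kept newest first."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def insert(self, conversation_id, ts_ms, message_id, sender_id, content):
        rows = self._partitions[(conversation_id, bucket_for(ts_ms))]
        rows.append((ts_ms, message_id, sender_id, content))
        rows.sort(reverse=True)  # mimic a descending clustering order

    def fetch_history(self, conversation_id, cursor_ms, limit=20, max_buckets=12):
        """Return up to `limit` rows older than cursor_ms, newest first,
        walking backwards one bucket (partition) at a time."""
        out = []
        bucket = bucket_for(cursor_ms)
        for _ in range(max_buckets):
            for row in self._partitions.get((conversation_id, bucket), []):
                if row[0] < cursor_ms:
                    out.append(row)
                    if len(out) == limit:
                        return out
            bucket = previous_bucket(bucket)
        return out
```

Each page touches one partition at a time, and the timestamp of the last row returned becomes the cursor for the next page, which matches the cursor={ts} parameter of the history endpoint.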

Technical Selection: Cassandra.

Rationale: Linear scalability, no single point of failure, and excellent performance for time-series data like chat logs.

Cache

Purpose:

Session Store: Maps User_ID to Gateway_ID. Used by Chat Service to find where to push a message.

Presence Store: Stores User_ID -> Last_Seen_Timestamp.

Key-Value Schema:

sess:{user_id} -> gateway_id (TTL: 30s, refreshed by heartbeats).
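The session key with a heartbeat-refreshed TTL can be sketched with an in-memory stand-in for Redis, as below. The SessionStore class and route helper are hypothetical; in a real deployment the same behavior would presumably come from setting the key with an expiry on every heartbeat and reading it at routing time.

```python
import time


class SessionStore:
    """In-memory stand-in for a key-value store with per-key TTL."""

    def __init__(self, ttl_s=30, clock=time.monotonic):
        self._ttl = ttl_s
        self._clock = clock
        self._data = {}  # "sess:{user_id}" -> (gateway_id, expires_at)

    def heartbeat(self, user_id, gateway_id):
        """Create or refresh the session entry, resetting its TTL."""
        self._data[f"sess:{user_id}"] = (gateway_id, self._clock() + self._ttl)

    def lookup(self, user_id):
        """Return the gateway holding the user's connection, or None if expired."""
        key = f"sess:{user_id}"
        entry = self._data.get(key)
        if entry is None:
            return None
        gateway_id, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]
            return None
        return gateway_id


def route(store, recipient_id):
    """Decide between live delivery and the offline notification path."""
    gateway_id = store.lookup(recipient_id)
    if gateway_id is None:
        return ("push_notification", None)
    return ("deliver", gateway_id)
```

With a single gateway_id per user, only one connection per user is routable; supporting several active devices would likely need a set of gateway IDs per user, which is an extension beyond the schema shown.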

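Failure Handling: If Redis fails, live routing is unavailable, so the Message Store acts as the fallback: messages are still persisted there for clients to fetch on their next sync, and Push Notifications are triggered for all messages.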

Messaging

Purpose: Decouples the real-time path from the asynchronous "Push Notification" path.

Event Schema: { "recipient_id": "user_1", "preview": "Hello..." }.

Technical Selection: Kafka.

Rationale: High throughput; allows multiple consumers (Notifications, Analytics, Archival) to process the same message stream.

Infrastructure (Optional)

Distributed Coordination:

Consul: Used for service discovery of Gateway nodes.

Observability:

Prometheus metrics for "Connection Count per Gateway" and "Message Delivery Latency".

Wrap Up

Advanced Topics

Trade-offs: We chose Eventual Consistency for the Message Store to ensure high availability. This means a user might briefly see messages out of order if they refresh multiple devices simultaneously during a network partition.

Reliability: To handle "Message Lost" scenarios, we use a Client ACK model. The server sends a message; the client must respond with an ACK. If no ACK is received, the server retries or marks it for the next "Sync" when the client reconnects.
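A rough sketch of the server-side bookkeeping behind a Client ACK model follows. The timeout, the attempt limit, and the DeliveryTracker name are assumptions; timestamps are passed in explicitly to keep the example deterministic.

```python
class DeliveryTracker:
    """Tracks unacknowledged messages, retries them, and parks the rest for sync."""

    def __init__(self, ack_timeout_s=5.0, max_attempts=3):
        self.ack_timeout_s = ack_timeout_s
        self.max_attempts = max_attempts
        self._pending = {}  # (user_id, message_id) -> [payload, attempts, deadline]
        self._parked = {}   # user_id -> [(message_id, payload)] awaiting next sync

    def sent(self, user_id, message_id, payload, now):
        """Record a first delivery attempt."""
        self._pending[(user_id, message_id)] = [payload, 1, now + self.ack_timeout_s]

    def ack(self, user_id, message_id):
        """Return True if this ACK cleared a pending message."""
        return self._pending.pop((user_id, message_id), None) is not None

    def due_for_retry(self, now):
        """Return messages to resend; park those that ran out of attempts."""
        resend = []
        for key, entry in list(self._pending.items()):
            payload, attempts, deadline = entry
            if now < deadline:
                continue
            user_id, message_id = key
            if attempts >= self.max_attempts:
                del self._pending[key]
                self._parked.setdefault(user_id, []).append((message_id, payload))
            else:
                entry[1] = attempts + 1
                entry[2] = now + self.ack_timeout_s
                resend.append((user_id, message_id, payload))
        return resend

    def sync(self, user_id):
        """On reconnect, hand over everything parked for this user."""
        return self._parked.pop(user_id, [])
```

Because retries can deliver the same message twice, this gives at-least-once delivery; the client is expected to de-duplicate by message ID.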

Bottleneck Analysis: The Redis Session Store is a potential single point of failure. We mitigate this using Redis Sentinel or Cluster for high availability.

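Presence Optimization: At the assumed peak of 1M concurrent connections, heartbeats every 5s create about 200K req/sec (the figure would reach 2M req/sec only if all 10M daily users were online at once). We optimize by batching heartbeats and using a "Lazy Presence" update: only notifying friends who are actually looking at the chat screen.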
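The batching and lazy-notification idea might be sketched as follows. PresenceService, the 15-second online window, and the watch/unwatch calls are hypothetical names chosen for illustration; the point is that a heartbeat only produces a notification when a user transitions to online and someone currently has that user's chat screen open.

```python
class PresenceService:
    """Tracks last-seen times and notifies only active watchers."""

    def __init__(self, online_window_s=15):
        self.online_window_s = online_window_s
        self._last_seen = {}  # user_id -> time of last heartbeat
        self._watchers = {}   # user_id -> viewers with that user's chat open

    def is_online(self, user_id, now):
        seen = self._last_seen.get(user_id)
        return seen is not None and now - seen < self.online_window_s

    def watch(self, viewer_id, target_id, now):
        """Viewer opened a chat with target: subscribe and pull current status."""
        self._watchers.setdefault(target_id, set()).add(viewer_id)
        return self.is_online(target_id, now)

    def unwatch(self, viewer_id, target_id):
        self._watchers.get(target_id, set()).discard(viewer_id)

    def heartbeat_batch(self, user_ids, now):
        """Apply a batch of heartbeats.

        Returns {viewer_id: [users who just came online]} for watchers only.
        """
        notify = {}
        for user_id in user_ids:
            was_online = self.is_online(user_id, now)
            self._last_seen[user_id] = now
            if not was_online:
                for viewer in self._watchers.get(user_id, ()):
                    notify.setdefault(viewer, []).append(user_id)
        return notify
```

Steady-state heartbeats from users who are already online generate no outbound traffic at all, which is where most of the savings would come from.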