The Question
Design

Design a Massive-Scale Instant Messaging and Social Platform

Design a highly available and scalable communication system similar to WeChat. The system must support real-time 1:1 and group messaging for over 1 billion users, a social feed (Moments) with a complex social graph, and real-time presence tracking. Address the challenges of maintaining hundreds of millions of concurrent persistent connections, ensuring message delivery guarantees (no loss, correct ordering), and managing high-throughput write-heavy workloads for social updates. Discuss the trade-offs between push and pull models for feed distribution and the protocols used for mobile network resilience.
Cassandra
Redis
Kafka
WebSocket
gRPC
Flink
ZooKeeper
CDN
TLS
Anycast DNS
Questions & Insights

Clarifying Questions

Scale and Traffic: What is the expected Daily Active User (DAU) count and peak concurrent users (PCU)?
Assumption: 1 Billion DAU, 200 Million PCU, 100 Billion messages per day.
Core Features: Should we focus on the entire "Super App" or specific pillars?
Assumption: Focus on Instant Messaging (1:1 and Group Chat) and "Moments" (Social Feed). Payments and Mini-programs are out-of-scope for the MVP.
Message Types: What types of content are supported?
Assumption: Text, images, and short videos.
Data Retention: Are messages stored forever or for a specific duration?
Assumption: Messages are stored permanently but cold-storage strategies apply for older data.
Latency Requirements: What is the acceptable delay for message delivery?
Assumption: Under 200ms for 95th percentile (P95) in the same region.

Thinking Process

Core Bottleneck 1: Real-time Connection Management: How do we maintain 200M+ persistent connections efficiently without exhausting server resources?
Core Bottleneck 2: Message Ordering & Reliability: How do we ensure messages arrive in the correct order and are never lost, so that delivery feels exactly-once to the user (in practice, at-least-once delivery plus deduplication)?
Core Bottleneck 3: Social Graph Fan-out: How do we handle "Moments" updates when a user has 5,000 friends (Write-heavy vs. Read-heavy tradeoffs)?
Progressive Path:
Establish a Gateway Layer to handle long-lived connections (WebSockets/TCP).
Implement a "Sync-Check" or "Mailbox" mechanism for message delivery.
Design a distributed ID generator that assigns monotonically increasing sequence IDs, so that messages within a conversation can be ordered causally.
Optimize the "Moments" feed using a hybrid push/pull model based on user activity.
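
The hybrid push/pull step above can be sketched roughly as follows. This is a minimal in-memory illustration, not a production design: the class MomentsFeed, its method names, and the push_limit cutoff are hypothetical, and the sketch switches on audience size as an assumption (a real system might also weigh how active each reader is).

```python
from collections import defaultdict


class MomentsFeed:
    """Hybrid fan-out: push for small audiences, pull for large ones."""

    def __init__(self, push_limit):
        self.push_limit = push_limit          # hypothetical audience-size cutoff
        self.audience = defaultdict(set)      # author -> readers who see the author
        self.following = defaultdict(set)     # reader -> authors the reader sees
        self.authored = defaultdict(list)     # author -> [(ts, post_id)]
        self.inbox = defaultdict(list)        # reader -> [(ts, post_id)] pushed entries

    def follow(self, reader, author):
        self.audience[author].add(reader)
        self.following[reader].add(author)

    def is_pull_author(self, author):
        # Authors with a large audience are read on demand instead of fanned out.
        return len(self.audience[author]) > self.push_limit

    def publish(self, author, post_id, ts):
        self.authored[author].append((ts, post_id))
        if not self.is_pull_author(author):
            # Push model: one write per reader, cheap reads later.
            for reader in self.audience[author]:
                self.inbox[reader].append((ts, post_id))

    def read_feed(self, reader, limit=20):
        # Start from pushed entries, then merge in pull-model authors.
        entries = set(self.inbox[reader])
        for author in self.following[reader]:
            if self.is_pull_author(author):
                entries.update(self.authored[author])
        newest_first = sorted(entries, reverse=True)
        return [post_id for _, post_id in newest_first[:limit]]
```

Using a set when merging keeps a post from appearing twice if an author crosses the cutoff after some posts were already pushed.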

Bonus Points

Protocol Optimization: Mentioning the use of a custom binary protocol (like WeChat's Mars) over TCP/UDP to handle mobile network volatility (tailored heartbeat, quick reconnect).
Intelligent Diff-Sync: Instead of sending full message lists, use state synchronization (SyncKey) to only fetch the delta between the client's last known state and the server's current state.
Write-Once Timeline for Group Chats: Use a shared "Timeline" or "Inbox" model for group chats to avoid N-fold write amplification (each message is written once per group rather than once per member).
Multi-region Geo-local Storage: Storing message data close to the user's geographical location to comply with data sovereignty (e.g., GDPR) and reduce latency.
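
To make the group-chat point above concrete, here is a small sketch of a shared group timeline with per-member read cursors. The names GroupTimelines, post, and fetch_unread are hypothetical, and storage is an in-memory dict standing in for a real message store.

```python
from collections import defaultdict


class GroupTimelines:
    """Each group message is stored once; members only track a read cursor."""

    def __init__(self):
        self.timeline = defaultdict(list)  # group_id -> ordered messages (single copy)
        self.cursor = defaultdict(int)     # (group_id, user_id) -> messages already read

    def post(self, group_id, sender_id, content):
        # One write regardless of how many members the group has.
        self.timeline[group_id].append((sender_id, content))
        return len(self.timeline[group_id])  # sequence number within the group

    def fetch_unread(self, group_id, user_id):
        # Read side does the fan-out: slice from the member's cursor to the tail.
        log = self.timeline[group_id]
        start = self.cursor[(group_id, user_id)]
        self.cursor[(group_id, user_id)] = len(log)
        return log[start:]
```

The trade-off is that reads become slightly more expensive (one lookup per group), in exchange for a write cost that no longer grows with group size.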
Design Breakdown

Functional Requirements

Core Use Cases:
One-to-one instant messaging (Text/Media).
Group chats (up to 500 members).
User status (Online/Offline/Presence).
"Moments" feed (Post, Like, Comment).
Scope Control:
In-scope: IM core, Moments core, Presence, Media upload.
Out-of-scope: Payments, Official Accounts, Mini-programs, Voice/Video calls.

Non-Functional Requirements

Scale: Support 1 Billion users and high-throughput write operations (1M+ QPS).
Latency: End-to-end message delivery < 200ms at P95 within the same region.
Availability & Reliability: 99.99% availability; zero message loss.
Consistency: Causal consistency for chat (messages from A must appear in order to B); Eventual consistency for Moments.
Security & Privacy: End-to-end encryption for 1:1 chats; TLS for all transit.

Estimation

Traffic Estimation:
100B messages/day ÷ 86,400 s ≈ 1.16M messages/sec (average QPS).
Peak QPS (3x average) ≈ 3.5M QPS.
Storage Estimation:
Avg message size: 100 bytes (text/metadata).
100B messages × 100 bytes = 10 TB/day.
≈ 3.65 PB/year (excluding media). Media requires Object Storage (S3/OSS).
Bandwidth Estimation:
1.16M msg/s × 100 bytes ≈ 116 MB/s ingress (average).
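
The arithmetic above can be reproduced with a few lines of Python. The inputs are the stated assumptions (100B messages/day, 100 bytes per message, a 3x peak factor); decimal units are assumed (1 TB = 10^12 bytes), and the variable names are only for illustration.

```python
# Back-of-envelope check of the traffic, storage, and bandwidth estimates.
MESSAGES_PER_DAY = 100 * 10**9   # 100B messages/day
AVG_MESSAGE_BYTES = 100          # text + metadata
PEAK_FACTOR = 3                  # peak-to-average ratio
SECONDS_PER_DAY = 24 * 60 * 60

avg_qps = MESSAGES_PER_DAY / SECONDS_PER_DAY
peak_qps = avg_qps * PEAK_FACTOR
storage_tb_per_day = MESSAGES_PER_DAY * AVG_MESSAGE_BYTES / 10**12
storage_pb_per_year = storage_tb_per_day * 365 / 1000
ingress_mb_per_sec = avg_qps * AVG_MESSAGE_BYTES / 10**6

print(f"average QPS:  {avg_qps:,.0f}")
print(f"peak QPS:     {peak_qps:,.0f}")
print(f"storage/day:  {storage_tb_per_day:.1f} TB")
print(f"storage/year: {storage_pb_per_year:.2f} PB")
print(f"ingress:      {ingress_mb_per_sec:.1f} MB/s")
```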

Blueprint

Concise Summary: A microservices architecture leveraging persistent WebSocket/TCP connections through a Gateway, using a "Timeline" (Mailbox) storage model for IM and a Push-Pull hybrid for Moments.
Major Components:
Connection Gateway: Manages millions of persistent TCP/WebSocket connections and handles heartbeat/session state.
Sync Service: The core logic engine that orchestrates message delivery via the "SyncKey" mechanism.
Chat Storage: A high-write NoSQL store (Wide-column) to hold the message timeline per user.
Moments Service: Manages the social feed using a fan-out pattern to distribute posts to friend timelines.
Push Notification (APNs/FCM): Handles delivery for offline users.
Simplicity Audit: This design avoids complex distributed transactions by using idempotent "SyncKey" sequences and eventual consistency for social feeds.
Architecture Decision Rationale:
Why this?: The Mailbox/Timeline model is the industry standard for IM because it scales linearly and handles multi-device synchronization naturally.
Functional Satisfaction: Covers real-time chat, group chat, and social feeds.
Non-functional Satisfaction: Horizontally scalable gateway and storage layers handle the massive DAU and QPS.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing:
Use Anycast DNS to route users to the nearest Data Center (DC).
Static Content: Images/Videos served via CDN with edge-side resizing.
Security & Perimeter:
API Gateway: Handles authentication (JWT/OAuth2) and TLS termination.
Long-lived Connections: Connection Gateway maintains TCP/WebSocket sticky sessions. If a client moves from Wi-Fi to LTE, the gateway handles session resumption via a Session Token.

Service

Topology & Scaling:
Stateless services (Sync, Moments, Presence) deployed in Kubernetes clusters across multiple Availability Zones (AZs).
Auto-scaling based on CPU and Request Count.
API Schema Design:
SendMessage: POST /v1/chat/send (or the equivalent gRPC method). Request: {to_id, type, content, client_msg_id}. The server deduplicates on client_msg_id, so client retries are idempotent.
SyncMessages: GET /v1/chat/sync. Request: {last_sync_key}. Response: {messages: [], new_sync_key}.
Resilience & Reliability:
SyncKey Mechanism: Each user's mailbox has a monotonically increasing version number (SyncKey). The client sends its last_sync_key, and the server returns every message with a higher sequence number along with the new SyncKey. This ensures no message is missed even after a long offline period.
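
A minimal sketch of how idempotent sends and SyncKey-based delta sync might fit together. The Mailbox class and its in-memory lists are hypothetical stand-ins for the Sync Service and Chat Storage; only the field names (client_msg_id, last_sync_key, new_sync_key) come from the API schema above.

```python
from dataclasses import dataclass, field


@dataclass
class Mailbox:
    """Per-user timeline; the SyncKey is the sequence_id of the newest entry."""
    messages: list = field(default_factory=list)   # entry i has sequence_id i + 1
    seen: dict = field(default_factory=dict)       # (sender_id, client_msg_id) -> sequence_id

    def append(self, sender_id, client_msg_id, content):
        # A retried send with the same client_msg_id returns the original sequence_id.
        dedup_key = (sender_id, client_msg_id)
        if dedup_key in self.seen:
            return self.seen[dedup_key]
        seq = len(self.messages) + 1
        self.messages.append(
            {"sequence_id": seq, "sender_id": sender_id, "content": content}
        )
        self.seen[dedup_key] = seq
        return seq

    def sync(self, last_sync_key, limit=100):
        # Return only messages newer than the client's last known SyncKey.
        delta = self.messages[last_sync_key:last_sync_key + limit]
        new_sync_key = delta[-1]["sequence_id"] if delta else last_sync_key
        return {"messages": delta, "new_sync_key": new_sync_key}
```

A client that has been offline simply calls sync repeatedly with the returned new_sync_key until the message list comes back empty.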

Storage

Access Pattern: 90% write for chat (new messages); 10% read (initial sync/scrolling history).
Database Table Design:
Messages Table (NoSQL - Cassandra):
Partition Key: user_id (Mailbox owner).
Clustering Key: sequence_id (SyncKey).
Fields: sender_id, message_body, timestamp, msg_type.
Technical Selection:
Message Store: Cassandra or HBase. High write throughput and efficient range scans for a specific user's timeline.
User/Friend Metadata: PostgreSQL with read replicas (strong consistency for account data on the primary; replica reads may lag slightly).
Distribution Logic:
Shard the Messages table by user_id so that a single user's mailbox lives in one partition, ordered by sequence_id, and a sync becomes a single-partition range scan.

Cache

Purpose: Reduce DB load for presence and recent feed lookups.
Key-Value Schema:
Presence: presence:{user_id} -> {status: online/offline, last_seen: timestamp}.
Moments Feed: feed:{user_id} -> List[post_ids].
Technical Selection: Redis (Cluster mode).
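Failure Handling: Use a short TTL for presence so that stale entries expire on their own. If Redis is down, falling back to the DB for presence is too costly, so presence lookups "Fail-fast" (for example, by reporting the status as unknown), while feed reads "Fail-over" to the DB.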
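
The short-TTL presence idea can be illustrated with an in-memory stand-in for Redis keys that expire. PresenceCache, heartbeat, is_online, and the injectable clock are hypothetical names for this sketch; in Redis the same effect would typically come from setting the key with an expiry on every heartbeat.

```python
import time


class PresenceCache:
    """In-memory stand-in for presence keys that expire after a TTL."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested without sleeping
        self.expires_at = {}        # "presence:{user_id}" -> expiry time

    def heartbeat(self, user_id):
        # Each heartbeat refreshes the key; a silent client simply ages out.
        self.expires_at[f"presence:{user_id}"] = self.clock() + self.ttl

    def is_online(self, user_id):
        deadline = self.expires_at.get(f"presence:{user_id}")
        return deadline is not None and self.clock() < deadline
```

Because a user goes offline simply by not refreshing the key, no explicit "disconnect" write is needed when a mobile client drops off the network.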

Messaging

Purpose & Decoupling: Decouples the critical path (sending a message) from side effects (updating feed, pushing to APNs, analytics).
Throughput & Partitioning: Kafka with partitioning by user_id to ensure message ordering for individual users.
Failure Handling: Dead-letter queues (DLQ) for messages that fail to be stored or processed after retries.
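
Partitioning by user_id works because a stable hash of the key always maps the same user to the same partition, and ordering is preserved within a partition. The sketch below shows the idea in plain Python; partition_for is a hypothetical helper, and MD5 is only a stand-in for whatever hash the Kafka client actually applies to the record key.

```python
import hashlib


def partition_for(user_id: str, num_partitions: int) -> int:
    """Map a user_id to a partition deterministically."""
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo the partition count.
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Which user_id to key on is a design choice the text leaves open; keying on the recipient (the mailbox owner) would keep each mailbox's writes in order.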

Data Processing

Processing Model: Streaming (Flink) for real-time analytics and abuse detection (spam filtering).
Processing DAG: Message Stream -> Content Filter -> Sensitive Word Detection -> Statistics Sink.
Technical Selection: Apache Flink for low-latency windowed aggregations.
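
As an illustration of the kind of windowed aggregation meant here, the following plain-Python sketch counts messages per sender in fixed (tumbling) windows and flags senders over a limit. It is not Flink code; flag_senders and its parameters are hypothetical, and a real streaming job would also need to handle late and out-of-order events.

```python
from collections import Counter


def flag_senders(events, window_seconds, max_per_window):
    """events: iterable of (timestamp_seconds, sender_id).

    Returns the set of (window_start, sender_id) pairs that exceeded the limit.
    """
    counts = Counter()
    for ts, sender in events:
        # Tumbling window: each event belongs to exactly one fixed-size bucket.
        window_start = int(ts // window_seconds) * window_seconds
        counts[(window_start, sender)] += 1
    return {key for key, n in counts.items() if n > max_per_window}
```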

Infrastructure (Optional)

Observability: Prometheus for metrics, Jaeger for tracing "Sync" requests across services.
Distributed Coordination: ZooKeeper for service discovery and managing the sharding map of the Connection Gateways.
Wrap Up

Advanced Topics

Trade-offs (CAP/PACELC): For chat, we choose consistency over availability during a network partition (CP in CAP terms, PC in PACELC). It is better for a message to fail to send than to appear out of order or be lost.
Moments Fan-out Strategy:
Pull Model: For "Celebrities" (many followers), followers pull the post.
Push Model: For "Regular Users", the post is pushed to friends' timelines.
Optimization (Connection Migration): When a server in the Gateway Layer is overloaded, we do not simply kill connections. We send a WebSocket Close frame with status code 1001 ("Going Away"), or a special packet on raw TCP, telling clients to reconnect to a different, less-loaded node. Clients add random jitter to the reconnect delay to avoid a "Thundering Herd."
Security: Use "Double Ratchet Algorithm" or similar Signal-protocol-inspired encryption for 1:1 chats to ensure Perfect Forward Secrecy (PFS).
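
For the connection-migration point above, the client-side reconnect delay could look something like the following sketch of exponential backoff with full jitter. The function name and the base and cap values are hypothetical and would need tuning for real mobile clients.

```python
import random


def reconnect_delay(attempt, base=0.5, cap=30.0, rng=random):
    """Seconds to wait before reconnect attempt number `attempt` (0-based).

    Full jitter: pick uniformly between 0 and an exponentially growing ceiling,
    so clients disconnected at the same moment spread out their reconnects.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)
```

Without the random component, every client told to leave at the same instant would retry at the same instant, which only moves the overload to the next gateway node.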