The Question
Design

Design a Scalable End-to-End Encrypted (E2EE) Messaging System

Design a secure, real-time messaging application similar to Signal or WhatsApp that supports end-to-end encryption for 10 million daily active users. The system must handle asynchronous messaging (offline recipients), provide perfect forward secrecy, and ensure the server cannot access message content. Discuss the key exchange protocol (X3DH/Double Ratchet), scaling persistent connections, and the trade-offs involved in metadata privacy versus operational visibility.
WebSockets
Signal Protocol
X3DH
Double Ratchet
DynamoDB
Cassandra
Redis
Kafka
S3
AES-256
Questions & Insights

Clarifying Questions

Scale and Growth: What is the target DAU for the MVP, and what are the peak message volumes (QPS)?
Encryption Protocol: Should we implement a proprietary protocol or adopt an industry standard such as the Signal Protocol (X3DH + Double Ratchet)?
Persistence: Are messages stored on the server until delivered, or do we provide cloud backups (which complicates E2EE)?
Multi-device: Is the MVP limited to a single primary device (mobile), or is multi-device synchronization (e.g., WhatsApp Web) required?
Media Support: Does the MVP support large file attachments (video/audio), or is it strictly text and small images?
Assumptions for this Design:
DAU: 10 million.
Protocol: Signal Protocol (standard for high security).
Persistence: "Ephemeral-first" – messages are deleted from the server immediately after successful delivery.
Multi-device: Single primary device per account for MVP simplicity.
Content: Text, metadata, and small media (up to 10MB).

Thinking Process

The core challenge of E2EE is the asynchronous key exchange (allowing users to start encrypted sessions even if the recipient is offline) and maintaining message state across high-concurrency connections.
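Trust Establishment: How do we exchange keys through the server without letting it act as a "Man-in-the-Middle"? (Solution: X3DH Key Agreement using Pre-keys, with users able to verify each other's identity keys out of band, e.g. by comparing safety numbers).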
Perfect Forward Secrecy: How do we ensure that a compromised key doesn't leak historical messages? (Solution: Double Ratchet Algorithm).
Real-time Delivery: How do we route messages to a recipient who may be on a different server node or offline? (Solution: WebSocket Gateway + Redis Pub/Sub + Push Notifications).
Metadata Protection: How do we minimize the "who talks to whom" data footprint on our infrastructure? (Solution: Minimal logging and hashed identifiers).
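To make the forward secrecy point concrete, here is a minimal sketch of the symmetric-key half of the Double Ratchet, assuming HMAC-SHA256 as the chain KDF with the constant inputs 0x01 and 0x02 that the Double Ratchet specification suggests. The function name kdf_chain_step and the example seed are illustrative; a real chain key would come out of X3DH and the Diffie-Hellman ratchet, which is not shown here.

```python
import hashlib
import hmac
from typing import Tuple


def kdf_chain_step(chain_key: bytes) -> Tuple[bytes, bytes]:
    """Advance the chain one step and return (next_chain_key, message_key)."""
    # Each output uses a different constant so the two keys are independent.
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return next_chain_key, message_key


# Illustrative seed only: in practice this is derived from the key agreement.
chain_key = hashlib.sha256(b"example shared secret").digest()

message_keys = []
for _ in range(3):
    # The old chain key is overwritten, so earlier message keys cannot be
    # recomputed from the state that remains on the device.
    chain_key, message_key = kdf_chain_step(chain_key)
    message_keys.append(message_key)

print([key.hex()[:16] for key in message_keys])
```

Because HMAC is one-way, an attacker who steals the current chain key can derive future message keys in that chain but not past ones; the Diffie-Hellman ratchet is what limits the damage going forward.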

Bonus Points

Sealed Sender: Implementation of Signal's "Sealed Sender" logic to further obscure the sender's identity from the server during transit.
Post-Quantum Resistance: Mentioning the integration of PQXDH (Signal's post-quantum extension of X3DH) to protect today's key agreements against future quantum computing attacks on current asymmetric encryption.
Zero-Knowledge Architecture: Ensuring the server has no access to the user's social graph by using Private Information Retrieval (PIR).
Client-Side Storage: Discussing how the local SQLite database on the mobile device is encrypted at rest and indexed for fast full-text search over locally decrypted messages.
Design Breakdown

Functional Requirements

Core Use Cases:
User Registration & Identity Verification (SMS-based).
Asynchronous Key Exchange (fetching Pre-keys).
1-to-1 Encrypted Text Messaging.
Delivery/Read Receipts.
Media Upload/Download (Encrypted).
Scope Control:
In-Scope: E2EE messaging, offline delivery, media handling.
Out-of-Scope: Group chats (MVP), Voice/Video calls, Cloud backups.

Non-Functional Requirements

Scale: Support 10M DAU and 500M messages per day.
Latency: End-to-end delivery < 200ms (p99) when both parties are online.
Availability: 99.99% availability for the Key Service (blocking for new chats).
Consistency: High consistency for the Key Store; eventual consistency for message delivery receipts.
Fault Tolerance: Automatic failover of WebSocket nodes without losing message pointers.
Security: Mandatory E2EE; zero-knowledge of message content on servers.

Estimation

Traffic:
10M DAU * 50 messages/day = 500M messages/day.
Average QPS: ~5,800. Peak QPS: ~12,000.
Storage:
Keys: 10M users * 100 Pre-keys = 1B keys. At ~100 bytes per key, that is ~100GB (fits comfortably on SSD).
Messages: Only stored until delivered. If 10% are offline for 24h: 50M messages * 1KB = 50GB temporary storage.
Bandwidth:
12k QPS * 1KB = 12MB/s (Very manageable).
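The arithmetic above can be reproduced with a short script. The 2x peak factor is an assumption inferred from the ~5,800 average and ~12,000 peak figures; sizes use decimal units (1KB = 1,000 bytes).

```python
SECONDS_PER_DAY = 24 * 60 * 60

dau = 10_000_000
messages_per_user_per_day = 50
peak_factor = 2              # assumption: peak traffic is about 2x the average
prekeys_per_user = 100
key_size_bytes = 100
message_size_bytes = 1_000   # 1KB per encrypted message
offline_fraction = 0.10      # share of messages waiting up to 24h for delivery

messages_per_day = dau * messages_per_user_per_day
avg_qps = messages_per_day / SECONDS_PER_DAY
peak_qps = avg_qps * peak_factor

key_storage_gb = dau * prekeys_per_user * key_size_bytes / 1e9
offline_storage_gb = messages_per_day * offline_fraction * message_size_bytes / 1e9
peak_bandwidth_mb_s = peak_qps * message_size_bytes / 1e6

print(f"messages/day:     {messages_per_day:,}")
print(f"average QPS:      {avg_qps:,.0f}")
print(f"peak QPS:         {peak_qps:,.0f}")
print(f"key storage:      {key_storage_gb:.0f} GB")
print(f"offline storage:  {offline_storage_gb:.0f} GB")
print(f"peak bandwidth:   {peak_bandwidth_mb_s:.1f} MB/s")
```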

Blueprint

The design focuses on a stateful WebSocket layer for real-time delivery and a robust Key Management Service for E2EE bootstrapping.
WebSocket Gateway: Maintains persistent connections for real-time delivery and presence.
Identity & Key Service: Stores public identity keys and "Pre-keys" used for X3DH key agreement.
Message Store: A temporary buffer for messages intended for offline users.
Media Store: S3-compatible storage for encrypted blobs.
Simplicity Audit: This architecture avoids complex distributed locking by relying on the Signal Protocol's built-in handling of out-of-order messages (the Double Ratchet retains skipped message keys) and a simple Redis-based session tracker.
Architecture Decision Rationale:
Why this?: WebSockets are essential for the low-latency "typing" and "delivered" feedback users expect.
Functional Satisfaction: E2EE is enforced at the client; the server only sees encrypted blobs and routing metadata.
Non-functional Satisfaction: Horizontal scaling of Gateway nodes allows the system to grow to millions of concurrent connections.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery: A CDN is used for downloading encrypted media blobs to reduce latency for global users.
Traffic Routing: Latency-based DNS routing directs users to the nearest regional WebSocket cluster.
Security: TLS terminates at the load balancer; application-level E2EE is unaffected, so the LB only ever sees ciphertext payloads. Rate limiting is applied at the API Gateway to prevent SMS registration spam.

Service

Topology: Stateless API services (Identity, Media) and Stateful WebSocket nodes.
WebSocket Gateway:
Maintains a mapping of User_ID -> Connection_ID.
Upon receiving a message, it checks if the recipient is online via the Session Cache (Redis).
API Schema Design:
POST /v1/keys/upload: Client uploads Pre-keys.
GET /v1/keys/fetch/{user_id}: Fetch recipient's Pre-keys for X3DH.
POST /v1/media/upload: Returns a signed S3 URL.
Resilience:
WebSocket nodes use heartbeats. On a clean disconnect, the Session Cache entry is removed within seconds; after a silent drop, the entry expires when its TTL lapses.
Message Bus (Kafka) ensures that if a Gateway node crashes, the message can be re-routed or sent to the Offline Store.
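The gateway's routing decision described above can be sketched as follows. This is an in-memory illustration, not the actual service: plain dictionaries stand in for the Redis Session Cache, the per-node delivery queues, and the Offline Store, and the names Router, gateway_inboxes, and push_queue are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Router:
    # user_id -> server_id; stand-in for the Redis Session Cache.
    session_cache: Dict[str, str] = field(default_factory=dict)
    # server_id -> messages waiting to be pushed over that node's WebSockets.
    gateway_inboxes: Dict[str, List[dict]] = field(default_factory=dict)
    # recipient_id -> buffered messages; stand-in for the Offline Store.
    offline_store: Dict[str, List[dict]] = field(default_factory=dict)
    # Recipients that should receive a push notification.
    push_queue: List[str] = field(default_factory=list)

    def route(self, message: dict) -> str:
        recipient = message["recipient_id"]
        server_id = self.session_cache.get(recipient)
        if server_id is not None:
            # Recipient is online: hand the ciphertext to the owning node.
            self.gateway_inboxes.setdefault(server_id, []).append(message)
            return "forwarded:" + server_id
        # Recipient is offline: buffer the ciphertext and wake the device.
        self.offline_store.setdefault(recipient, []).append(message)
        self.push_queue.append(recipient)
        return "stored_offline"


router = Router()
router.session_cache["bob"] = "gw-7"
print(router.route({"recipient_id": "bob", "payload_blob": b"..."}))
print(router.route({"recipient_id": "carol", "payload_blob": b"..."}))
```

The router never inspects payload_blob; it needs only the recipient identifier, which is the routing metadata the server is allowed to see.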

Storage

Key Store: PostgreSQL or DynamoDB.
Schema: user_id (PK), identity_public_key, signed_pre_key, pre_key_bundle (JSON/Blob).
Choice: DynamoDB for its seamless scaling and predictable performance for key lookups.
Offline Message Store: Cassandra.
Access Pattern: High write (messages coming in), high delete (messages delivered).
Schema: recipient_id (Partition Key), message_id (Clustering Key), encrypted_payload, timestamp.
TTL: 30 days for undelivered messages; permanent server-side storage is not needed (YAGNI).
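Distribution: Partitioned by recipient_id, which spreads users evenly across nodes; a single very popular recipient still lands on one partition, so per-recipient write rates should be monitored.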
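One detail the Key Store schema implies but does not spell out is that each one-time Pre-key should be handed out at most once. A minimal in-memory sketch, assuming hypothetical KeyRecord and KeyStore classes; a real store would need a conditional write or transaction to make the pop atomic across concurrent fetches.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class KeyRecord:
    identity_public_key: bytes
    signed_pre_key: bytes
    one_time_pre_keys: List[bytes] = field(default_factory=list)


class KeyStore:
    def __init__(self) -> None:
        self._records: Dict[str, KeyRecord] = {}

    def upload(self, user_id: str, record: KeyRecord) -> None:
        self._records[user_id] = record

    def fetch_bundle(self, user_id: str) -> Dict[str, Optional[bytes]]:
        record = self._records[user_id]
        # Consume one one-time Pre-key so it is never served twice.
        # X3DH still works without one, so an empty pool returns None.
        one_time = record.one_time_pre_keys.pop() if record.one_time_pre_keys else None
        return {
            "identity_public_key": record.identity_public_key,
            "signed_pre_key": record.signed_pre_key,
            "one_time_pre_key": one_time,
        }


store = KeyStore()
store.upload("bob", KeyRecord(b"IK_bob", b"SPK_bob", [b"OPK_1", b"OPK_2"]))
first = store.fetch_bundle("bob")
second = store.fetch_bundle("bob")
third = store.fetch_bundle("bob")
print(first["one_time_pre_key"], second["one_time_pre_key"], third["one_time_pre_key"])
```

When the pool runs low, the client would replenish it through the key upload endpoint.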

Cache

Purpose: Tracks active user sessions and maps them to specific WebSocket server IDs.
Key-Value Schema: session:{user_id} -> {server_id}. TTL is 2 minutes, refreshed by heartbeat.
Technical Selection: Redis (Cluster mode).
Failure Handling: If Redis fails, the system falls back to broadcasting messages to all Gateway nodes (expensive) or pushing to the Offline Store.
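The session entry lifecycle above (set on connect, refreshed by heartbeat, expired after 2 minutes of silence) can be sketched in memory as follows. The SessionCache class and its injectable clock are illustrative assumptions; in Redis the same effect comes from writing the key with an expiry (SET with the EX option) on every heartbeat.

```python
import time
from typing import Callable, Dict, Optional, Tuple

SESSION_TTL_SECONDS = 120  # the 2-minute TTL from the schema above


class SessionCache:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # user_id -> (server_id, expiry timestamp)
        self._entries: Dict[str, Tuple[str, float]] = {}

    def heartbeat(self, user_id: str, server_id: str) -> None:
        # Every heartbeat rewrites the entry and pushes the expiry forward.
        self._entries[user_id] = (server_id, self._clock() + SESSION_TTL_SECONDS)

    def disconnect(self, user_id: str) -> None:
        # Clean disconnect: remove the entry immediately.
        self._entries.pop(user_id, None)

    def lookup(self, user_id: str) -> Optional[str]:
        entry = self._entries.get(user_id)
        if entry is None:
            return None
        server_id, expires_at = entry
        if self._clock() >= expires_at:
            # Silent drop: the heartbeat stopped, so treat the user as offline.
            del self._entries[user_id]
            return None
        return server_id


cache = SessionCache()
cache.heartbeat("alice", "gw-1")
print(cache.lookup("alice"))
```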

Messaging

Purpose: Decouples the ingest of a message from its delivery.
Event Schema: sender_id, recipient_id, payload_url (for media) or payload_blob.
Throughput: 12k Peak QPS. Kafka handles this easily with 10-20 partitions.
Technical Selection: Kafka.
Rationale: Provides the durability needed to ensure no message is lost between the "Ingest" and "Delivery" phase.
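A sketch of the event schema as a typed record, plus one possible partitioning rule. Keying partitions on recipient_id is an assumption, not something stated above; it keeps each recipient's messages in order within one partition. The SHA-256 based partition_for helper and the choice of 16 partitions are illustrative and do not reproduce Kafka's own default partitioner.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

NUM_PARTITIONS = 16  # assumption: a value inside the 10-20 range above


@dataclass(frozen=True)
class MessageEvent:
    sender_id: str
    recipient_id: str
    payload_blob: Optional[bytes] = None  # inline ciphertext for text messages
    payload_url: Optional[str] = None     # pointer to an encrypted media blob

    def __post_init__(self) -> None:
        # Exactly one of the two payload fields must be present.
        if (self.payload_blob is None) == (self.payload_url is None):
            raise ValueError("set exactly one of payload_blob or payload_url")


def partition_for(recipient_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # A stable hash, so the same recipient always maps to the same partition.
    digest = hashlib.sha256(recipient_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


event = MessageEvent(sender_id="alice", recipient_id="bob", payload_blob=b"ciphertext")
print(partition_for(event.recipient_id))
```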

Infrastructure (Optional)

Observability: Prometheus for monitoring WebSocket connection counts and message lag.
Security: mTLS between all internal microservices. KMS (Key Management Service) used to encrypt the "Offline Message Store" at rest (extra layer of protection).
Wrap Up

Advanced Topics

Consistency vs. Availability: We choose Availability for message delivery (Eventual Consistency). If a message arrives out of order, the Double Ratchet on the client handles it by keeping the skipped message keys until the late message shows up.
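Reliability: We implement "At-least-once" delivery. The client sends an ACK back to the server; the server deletes the message from the Offline Store (or commits the Kafka offset) only after receiving the ACK. Because a lost ACK causes redelivery, the client deduplicates by message_id.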
Bottleneck Analysis: The Key Store is the most critical component. If users cannot fetch Pre-keys, they cannot start new conversations. We mitigate this using multi-AZ replication in DynamoDB.
Security & Privacy:
No Metadata Persistence: Delete routing metadata as soon as the message is ACKed.
Media: Clients encrypt media with a random symmetric key, upload to S3, and send the key + URL to the recipient via the E2EE channel. The server never sees the media decryption key.
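The at-least-once flow from the Reliability point can be sketched as follows. This is an in-memory illustration under assumed names (OfflineStore, Client); it shows the server holding a message until it is ACKed and the client dropping the duplicate that a lost ACK produces.

```python
from typing import Dict, List, Set, Tuple


class OfflineStore:
    def __init__(self) -> None:
        # recipient_id -> {message_id: encrypted_payload}
        self._pending: Dict[str, Dict[str, bytes]] = {}

    def enqueue(self, recipient_id: str, message_id: str, payload: bytes) -> None:
        self._pending.setdefault(recipient_id, {})[message_id] = payload

    def pending(self, recipient_id: str) -> List[Tuple[str, bytes]]:
        return list(self._pending.get(recipient_id, {}).items())

    def ack(self, recipient_id: str, message_id: str) -> None:
        # Delete only once the client has confirmed receipt.
        self._pending.get(recipient_id, {}).pop(message_id, None)


class Client:
    def __init__(self) -> None:
        self.seen: Set[str] = set()
        self.inbox: List[bytes] = []

    def receive(self, message_id: str, payload: bytes) -> bool:
        """Return True if the message is new; duplicates are ignored."""
        if message_id in self.seen:
            return False
        self.seen.add(message_id)
        self.inbox.append(payload)
        return True


store = OfflineStore()
client = Client()
store.enqueue("bob", "m1", b"ciphertext-1")

# First attempt: the client processes the message, but its ACK is lost.
for message_id, payload in store.pending("bob"):
    client.receive(message_id, payload)

# Redelivery: the duplicate is dropped, and this time the ACK arrives.
for message_id, payload in store.pending("bob"):
    client.receive(message_id, payload)
    store.ack("bob", message_id)

print(len(client.inbox), len(store.pending("bob")))
```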