The Question
Design

Scalable Distributed Email System

Design a globally scalable email service capable of handling millions of users. The system must support sending and receiving messages, attachment storage, real-time inbox updates, and full-text search while ensuring high durability and low-latency access to the inbox.
SMTP
Cassandra
S3
Redis
Kafka
Elasticsearch
Questions & Insights

Clarifying Questions

What is the target scale for the MVP? (Assumption: 10M DAU, 1B total emails, average 50 emails/day per user).
What is the maximum attachment size? (Assumption: 25MB per email, similar to industry standards).
Do we need to support legacy protocols like IMAP/POP3 for the MVP? (Assumption: No, we will focus on a proprietary HTTP-based API for web/mobile clients first).
Is real-time notification (push) required? (Assumption: Yes, users expect to see new emails immediately).
What is the consistency requirement for the Inbox? (Assumption: High durability is non-negotiable; we cannot lose emails. Read-after-write consistency for the sender's "Sent" folder is expected).

Thinking Process

Core Bottleneck: The primary challenge is the write-heavy nature of receiving emails and the massive storage requirements for bodies and attachments.
Progressive Logic:
How do we ingest emails from the public internet? (SMTP Gateway).
Where do we store the massive volume of unstructured mail bodies vs. structured metadata? (Blob Storage vs. NoSQL).
How do we ensure the UI remains responsive during heavy load? (Asynchronous processing via Message Queues).
How do we provide "instant" search over billions of records? (Search Indexing service).

Bonus Points

Data Locality & Residency: Implementation of "Cells" or "Pods" to ensure user data stays within specific geographic regions for GDPR/compliance.
Delta Sync Protocol: Using a protocol like JMAP or custom sync IDs to fetch only changes since the last known state, reducing mobile data usage.
Storage Tiering: Automatically moving older emails (> 1 year) to colder, cheaper storage (e.g., S3 Glacier) to optimize COGS.
Searchable Encryption: Discussing the trade-offs of encrypting mail at rest while still allowing server-side indexing.
Design Breakdown

Functional Requirements

Users can send and receive emails.
Users can view a list of emails (Inbox/Sent/Drafts).
Users can search emails by keyword, sender, or date.
Support for attachments (up to 25MB).
Real-time notifications for incoming mail.

Non-Functional Requirements

High Durability: 99.9999999% (no mail loss).
High Availability: 99.99% for reading/sending.
Scalability: Must handle spikes in traffic (e.g., marketing blasts).
Low Latency: Rendering the inbox should take < 200ms.

Estimation

DAU: 10M.
Writes (Sending/Receiving): 10M users * 20 emails/day = 200M emails/day (~2,300 TPS).
Storage (Metadata): 200M emails * 1KB = 200GB/day. 73TB/year.
Storage (Bodies/Attachments): 200M emails * 100KB (avg) = 20TB/day. 7.3PB/year.
Search IOPS: Heavy indexing requirements for 200M documents/day.

Blueprint

Concise Summary: A microservices-based architecture leveraging a distributed NoSQL database for metadata and Blob storage for content, decoupled by message queues for reliability.
Major Components:
SMTP Gateway: Handles incoming/outgoing mail traffic using standard mail protocols.
Mail Service: Orchestrates sending, saving drafts, and retrieving mail lists via HTTP.
Search Service: Maintains an inverted index for fast full-text search.
Object Storage: Persists the raw email source and attachments.
NoSQL Metadata Store: Stores folder structures, flags (read/unread), and mail headers.
Simplicity Audit: This design avoids complex distributed transactions by using event-driven indexing and a single-writer model per user mailbox.
Architecture Decision Rationale:
Why this architecture?: Separating metadata from bodies allows us to scale storage independently and optimize the "Inbox View" (metadata) without loading heavy payloads.
Functional Satisfaction: Covers end-to-end mail flow, search, and storage.
Non-functional Satisfaction: Queues provide fault tolerance; NoSQL (e.g., Cassandra) provides horizontal scalability for metadata.

High Level Architecture

Sub-system Deep Dive

Service

Topology & Scaling: Stateless microservices deployed across multiple Availability Zones. Scaling is based on Request-Per-Second (RPS) for the Mail Service and Queue Depth for Workers.
API Schema Design:
POST /v1/messages/send: Sends an email. Protocol: REST/JSON. Idempotency: client_mutation_id.
GET /v1/messages/inbox: Returns list of mail summaries. Supports pagination via page_token.
GET /v1/messages/{id}: Fetches full email body and attachment links.
Resilience: Exponential backoff for SMTP delivery. Circuit breakers on the Search Service to prevent UI hangs if indexing is delayed.
Observability: RED metrics for APIs. Distributed tracing (e.g., Jaeger) to track an email from Inbound SMTP to the User's Inbox.
Security: OAuth2 for user AuthN. mTLS between internal services.

Storage

Access Pattern: Read-heavy for Inbox view (metadata), write-heavy for incoming mail.
Database Table Design (NoSQL - e.g., Cassandra):
Table: mail_metadata: Partition Key: user_id, Clustering Key: message_id (DESC).
Fields: subject, from_address, to_address, timestamp, has_attachments, is_read, snippet.
Technical Selection: Cassandra or Bigtable.
Rationale: Excellent write throughput and perfect fit for wide-column storage where each user's inbox is a partition.
Distribution Logic: Partitioned by user_id to ensure all mail for a single user is co-located, making "List Inbox" queries extremely fast.
Reliability: 3x replication factor. Cross-region replication for disaster recovery.

Cache

Purpose & Justification: Reduces load on Metadata DB for the most frequent operation: checking the first page of the Inbox.
Key-Value Schema: Key: inbox:{user_id}, Value: List of top 50 message summaries (JSON). TTL: 24 hours or until invalidated by new mail.
Technical Selection: Redis.
Failure Handling: If Redis is down, fall back to Metadata DB. Use "Write-through" or "Cache-aside" based on consistency needs.

Messaging

Purpose & Decoupling: Decouples the ingestion of mail from the indexing and delivery logic. Prevents SMTP spikes from crashing the Mail Service.
Event Schema: MailReceivedEvent: user_id, message_id, storage_path.
Throughput & Partitioning: Kafka with user_id as the partition key to ensure sequential processing of a single user's mail actions.
Technical Selection: Kafka for high throughput and replayability.

Data Processing

Processing Model: Stream processing using the Indexing Worker.
Processing DAG: Source (Mail Queue) -> Extract Text/Attachments -> Update Search Index (Elasticsearch).
Correctness: Idempotent updates to the search index using message_id as the document ID.
Technical Selection: Custom Go/Java workers for low-latency indexing.
Wrap Up

Advanced Topics

Monitoring: Prometheus for monitoring queue lag; Grafana for visualizing SMTP delivery success rates.
Trade-offs: We choose Eventual Consistency for the Search Index to maintain high availability and low latency for the core mail-sending flow.
Bottlenecks: The SMTP Gateway is a potential single point of failure if not globally distributed across multiple IP ranges (to avoid blacklisting).
Optimization: Use Protobuf for internal service communication to reduce payload size compared to JSON.