The Question
Design

Scalable Microblogging Platform (Twitter-like)

Design a high-scale microblogging service that supports posting tweets, following users, and generating a real-time home timeline for 300 million daily active users. Address the challenges of massive read/write asymmetry, the 'celebrity fan-out' bottleneck, and ensure sub-second timeline latency under peak load while maintaining high system availability.
Cassandra
Redis
Kafka
S3
Kubernetes
CDN
gRPC
ZSET
Questions & Insights

Clarifying Questions

Scale: What is the expected scale of the system in terms of Monthly Active Users (MAU) and Daily Active Users (DAU)?
Functional Scope: Are we focusing purely on the core features (Tweeting, Following, Home Timeline) or do we need to include Search, Media Processing (video/images), and Direct Messages for the MVP?
Timeline Behavior: Should the timeline be strictly chronological or algorithm-based?
Celebrity Problem: How should the system handle users with millions of followers (e.g., "The Justin Bieber problem")?
Consistency: Is eventual consistency acceptable for tweet delivery to timelines?
Assumptions:
DAU: 300 Million.
Scale: ~6,000 tweets per second (TPS) average; ~300,000 timeline reads per second (100:1 read/write ratio).
Scope: Core MVP (Tweet, Follow, Home Timeline). Media is stored in S3 but processing is out-of-scope for this architectural focus.
Timeline: Reverse-chronological for simplicity in MVP.
Consistency: Eventual consistency is highly acceptable for high availability.

Thinking Process

The Core Bottleneck: The primary challenge is the Fan-out problem—how to efficiently deliver a single tweet to millions of follower timelines without crushing the database or causing massive latency.
Progressive Logic:
How do we store tweets and follows for high-write throughput? (NoSQL Selection)
How do we build a user's timeline? (Pull vs. Push models)
How do we handle "Hot" users (Celebrities) to prevent fan-out storms? (Hybrid model)
How do we scale the read path for 300k+ QPS? (Pre-computation & Caching)

Bonus Points

Cell-based Architecture: Discussing how to partition users into isolated "cells" to limit the blast radius of failures.
Read-Repair & Anti-Entropy: Using Cassandra's background mechanisms to ensure consistency without sacrificing write speed.
Tail Latency Management: Implementing "Hedged Requests" or aggressive timeouts at the API Gateway to ensure 99th percentile latency remains low.
Selective Fan-out: Using user engagement signals to prioritize fan-out for active users while deferring it for inactive ones to save resources.
Design Breakdown

Functional Requirements

Core Use Cases:
Users can post tweets (280 characters + metadata).
Users can follow other users.
Users can view a "Home Timeline" (tweets from people they follow).
Users can view a "User Timeline" (their own tweets).
Scope Control:
In-scope: Tweeting, Following, Timeline generation, Basic scaling for celebrities.
Out-of-scope: Search, Analytics, Ad-engine, Direct Messages, Media transcoding.

Non-Functional Requirements

Scale: Support 300M DAU and 10k peak write QPS.
Latency: < 200ms for timeline loading; < 500ms for tweet propagation to followers.
Availability & Reliability: 99.99% availability (Highly Available over Strongly Consistent).
Consistency: Eventual consistency (Tweets appearing a few seconds late is okay).
Fault Tolerance: No single point of failure; multi-AZ deployment.

Estimation

Traffic Estimation:
Write QPS: 300M DAU 1 tweet/day / 86400s ≈ 3,500 avg QPS** (10k peak).
Read QPS: 3,500 100 (Read/Write ratio) ≈ 350,000 QPS**.
Storage Estimation:
Tweet size: 280 chars + metadata ≈ 1KB.
Daily storage: 300M tweets 1KB ≈ 300GB/day**.
5-year storage: 300GB 365 * 5 ≈ 547TB**.
Bandwidth Estimation:
Ingress: 3,500 * 1KB ≈ 3.5 MB/s.
Egress: 350,000 (20 tweets * 1KB) ≈ 7 GB/s** (assuming 20 tweets per timeline page).

Blueprint

The design utilizes a Push-based Fan-out model for standard users and a Pull-based model for celebrities to solve the write-amplification problem.
Tweet Service: Handles incoming tweet writes and persists them to a NoSQL store.
Social Graph Service: Manages follow/unfollow relationships.
Fan-out Workers: Asynchronously pushes tweet IDs into the Redis timelines of followers.
Timeline Service: Serves the pre-computed timelines from Redis.
Cassandra: Used for the primary tweet store due to high write-availability.
Redis: Stores materialized timelines for O(1) read access.
Simplicity Audit: We avoid complex graph databases and real-time heavy joins by using materialized views in Redis, which is the simplest way to hit sub-100ms read latencies at this scale.
Architecture Decision Rationale:
Scalability: Decoupling the write path from the fan-out path using Kafka ensures that a spike in tweets doesn't crash the timeline service.
Performance: Pre-computing timelines in Redis minimizes the work done during a user's read request.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing:
Static assets (JS/CSS/Images) served via Global CDN.
Geo-DNS routes users to the nearest regional data center.
Security & Perimeter:
API Gateway: Handles JWT-based Authentication and Rate Limiting (e.g., 100 tweets/hour per user).
SSL Termination at the Load Balancer level.

Service

Topology & Scaling: Stateless microservices deployed in Kubernetes (EKS/GKE) across multiple Availability Zones. Scaling is triggered by CPU utilization and Request-per-pod metrics.
API Schema Design:
POST /v1/tweets: { text: string, media_ids: [] }. Protocol: REST/HTTPS.
GET /v1/timeline/home: Returns list of Tweet objects. Protocol: gRPC for internal service communication.
Resilience:
Circuit Breaker: If Redis is down, the Timeline Service falls back to a "Pull" model from Cassandra (degraded performance but functional).
Retries: Exponential backoff for Fan-out workers.

Storage

Access Pattern:
Tweets: Heavy writes, read by ID.
Social Graph: Heavy read/write (who follows whom).
Database Table Design:
Tweets Table (Cassandra): tweet_id (K), author_id, content, created_at. Partitioned by author_id or tweet_id.
Follows Table (Cassandra): follower_id (K), followee_id (C), created_at.
Technical Selection: Cassandra.
Rationale: It is optimized for high write throughput and scales linearly. The wide-column store handles the social graph effectively without the overhead of a RDBMS.

Cache

Purpose: Materialized Timelines.
Key-Value Schema:
Key: timeline:{user_id}
Value: Redis Sorted Set (ZSET) containing [tweet_id: timestamp].
TTL: 72 hours for active users; inactive users are evicted.
Failure Handling: If a cache miss occurs, the Timeline Service reconstructs the timeline by querying the Social Graph Service and then fetching the latest tweets from the Tweet Store (Cassandra).

Messaging

Purpose: Decoupling and Load Leveling.
Event Schema: TweetCreatedEvent { tweet_id, author_id, timestamp }.
Throughput: Kafka handles the 10k peak TPS easily.
Partitioning: Partition by author_id to ensure ordering of tweets from the same user.
Technical Selection: Kafka. Chosen for its high durability and ability to replay events if fan-out workers fail.

Data Processing

Processing Model: Asynchronous Stream Processing.
Fan-out Logic:
Worker consumes TweetCreatedEvent.
Fetches list of followers from Social Graph Service.
For each follower, it inserts tweet_id into their Redis ZSET.
Celebrity Logic (Hybrid):
If follower_count > 100,000, the worker skips the fan-out for this user.
When a follower of a celebrity requests their timeline, the Timeline Service pulls the celebrity's tweets on-the-fly and merges them with the pre-computed Redis timeline.
Technical Selection: Custom Golang/Java Workers. They provide the lowest latency for simple I/O bound tasks like Redis updates.

Infrastructure (Optional)

Observability: Prometheus for metrics (Latency/Error Rates), ELK stack for logs, and Jaeger for tracing the fan-out path.
Distributed Coordination: Not heavily required for MVP; standard service discovery via Kubernetes is sufficient.
Wrap Up

Advanced Topics

Trade-offs: We chose Availability over Consistency (AP in CAP). If a tweet is delayed by 2 seconds, the user experience is largely unaffected, but if the system is down, it's a failure.
Bottleneck Analysis:
Redis Memory: Storing timelines for 300M users is expensive. Optimization: Only store timelines for users who have logged in within the last 30 days.
Hot Shards: High-frequency tweeters might cause hot partitions in Cassandra. Optimization: Use a synthetic shard key (e.g., author_id_bucket).
Security:
PII (emails/phone numbers) encrypted at rest in the User Service (not detailed here for YAGNI).
Rate limiting to prevent bot-driven spamming of the tweet endpoint.
Distinguishing Insight: Read-Path Merging. The "Celebrity" solution (Hybrid Pull/Push) is what separates a junior design from a staff design. It prevents the "Thunder Herd" problem where one tweet from a celebrity triggers 50 million Redis writes simultaneously.