The Question

Scalable Microblogging Platform (Twitter-like)

Design a high-scale microblogging service that supports posting tweets, following users, and generating a real-time home timeline for 300 million daily active users. Address the challenges of massive read/write asymmetry, the 'celebrity fan-out' bottleneck, and ensure sub-second timeline latency under peak load while maintaining high system availability.

Cassandra

Redis

Kafka

Kubernetes

CDN

gRPC

ZSET

Questions & Insights

Clarifying Questions

Scale: What is the expected scale of the system in terms of Monthly Active Users (MAU) and Daily Active Users (DAU)?

Functional Scope: Are we focusing purely on the core features (Tweeting, Following, Home Timeline) or do we need to include Search, Media Processing (video/images), and Direct Messages for the MVP?

Timeline Behavior: Should the timeline be strictly chronological or algorithm-based?

Celebrity Problem: How should the system handle users with millions of followers (e.g., "The Justin Bieber problem")?

Consistency: Is eventual consistency acceptable for tweet delivery to timelines?

Assumptions:

DAU: 300 Million.

Scale: ~3,500 tweets per second (TPS) average, ~10,000 at peak; ~350,000 timeline reads per second (100:1 read/write ratio).

Scope: Core MVP (Tweet, Follow, Home Timeline). Media is stored in S3 but processing is out-of-scope for this architectural focus.

Timeline: Reverse-chronological for simplicity in MVP.

Consistency: Eventual consistency is acceptable in exchange for high availability.

Thinking Process

The Core Bottleneck: The primary challenge is the Fan-out problem: how to efficiently deliver a single tweet to millions of follower timelines without crushing the database or causing massive latency.

Progressive Logic:

How do we store tweets and follows for high-write throughput? (NoSQL Selection)

How do we build a user's timeline? (Pull vs. Push models)

How do we handle "Hot" users (Celebrities) to prevent fan-out storms? (Hybrid model)

How do we scale the read path for 300k+ QPS? (Pre-computation & Caching)

Bonus Points

Cell-based Architecture: Discussing how to partition users into isolated "cells" to limit the blast radius of failures.

Read-Repair & Anti-Entropy: Using Cassandra's background mechanisms to ensure consistency without sacrificing write speed.

Tail Latency Management: Implementing "Hedged Requests" or aggressive timeouts at the API Gateway to ensure 99th percentile latency remains low.

Selective Fan-out: Using user engagement signals to prioritize fan-out for active users while deferring it for inactive ones to save resources.
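As an illustration of the "Hedged Requests" idea mentioned above, here is a minimal sketch in Python using asyncio. The function name hedged_call, the hedge_after delay, and the make_request callable are hypothetical names chosen for this example; a real gateway would likely send the backup request to a different replica and tune the delay from observed latency percentiles.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def hedged_call(make_request: Callable[[], Awaitable[T]],
                      hedge_after: float = 0.05) -> T:
    """Send one request; if it has not finished after `hedge_after`
    seconds, send a second copy and return whichever finishes first."""
    primary = asyncio.create_task(make_request())
    done, _ = await asyncio.wait({primary}, timeout=hedge_after)
    if done:
        # Fast path: the first request answered before the hedge delay.
        return primary.result()

    # Slow path: fire the backup and race the two requests.
    backup = asyncio.create_task(make_request())
    done, pending = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # drop the loser so it does not keep consuming resources
    # If the first finisher failed, its exception propagates to the caller.
    return done.pop().result()
```

The trade-off is extra load: every hedged call that crosses the delay doubles the work for that request, so the delay is usually set near a high latency percentile rather than the median.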

Design Breakdown

Functional Requirements

Core Use Cases:

Users can post tweets (280 characters + metadata).

Users can follow other users.

Users can view a "Home Timeline" (tweets from people they follow).

Users can view a "User Timeline" (their own tweets).

Scope Control:

In-scope: Tweeting, Following, Timeline generation, Basic scaling for celebrities.

Out-of-scope: Search, Analytics, Ad-engine, Direct Messages, Media transcoding.

Non-Functional Requirements

Scale: Support 300M DAU and 10k peak write QPS.

Latency: < 200ms for timeline loading; < 500ms for tweet propagation to followers.

Availability & Reliability: 99.99% availability (Highly Available over Strongly Consistent).

Consistency: Eventual consistency (Tweets appearing a few seconds late is okay).

Fault Tolerance: No single point of failure; multi-AZ deployment.

Estimation

Traffic Estimation:

Write QPS: 300M DAU * 1 tweet/day / 86,400s ≈ 3,500 avg QPS (10k peak).

Read QPS: 3,500 * 100 (Read/Write ratio) ≈ 350,000 QPS.

Storage Estimation:

Tweet size: 280 chars + metadata ≈ 1KB.

Daily storage: 300M tweets * 1KB ≈ 300GB/day.

5-year storage: 300GB * 365 * 5 ≈ 547TB.

Bandwidth Estimation:

Ingress: 3,500 * 1KB ≈ 3.5 MB/s.

Egress: 350,000 * (20 tweets * 1KB) ≈ 7 GB/s (assuming 20 tweets per timeline page).
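The estimates above can be reproduced with a few lines of arithmetic. This is a sketch in Python that uses only the inputs stated in this section (1 tweet per user per day, 1KB per tweet, 100:1 read/write ratio, 20 tweets per page); the variable names are chosen for this example, and decimal units (1KB = 1,000 bytes) are assumed.

```python
# Back-of-envelope estimates; every input is an assumption stated in the text.
DAU = 300_000_000
TWEETS_PER_USER_PER_DAY = 1
READ_WRITE_RATIO = 100
TWEET_SIZE_BYTES = 1_000      # ~1KB per tweet including metadata
TWEETS_PER_PAGE = 20
SECONDS_PER_DAY = 86_400

write_qps = DAU * TWEETS_PER_USER_PER_DAY / SECONDS_PER_DAY
read_qps = write_qps * READ_WRITE_RATIO

daily_storage_gb = DAU * TWEETS_PER_USER_PER_DAY * TWEET_SIZE_BYTES / 1e9
five_year_storage_tb = daily_storage_gb * 365 * 5 / 1e3

ingress_mb_s = write_qps * TWEET_SIZE_BYTES / 1e6
egress_gb_s = read_qps * TWEETS_PER_PAGE * TWEET_SIZE_BYTES / 1e9

print(f"write QPS      ~ {write_qps:,.0f}")
print(f"read QPS       ~ {read_qps:,.0f}")
print(f"daily storage  ~ {daily_storage_gb:,.0f} GB")
print(f"5-year storage ~ {five_year_storage_tb:,.1f} TB")
print(f"ingress        ~ {ingress_mb_s:,.2f} MB/s")
print(f"egress         ~ {egress_gb_s:,.2f} GB/s")
```

The exact average write rate is closer to 3,470 per second; the text rounds it to 3,500, which is why the rounded read and egress figures come out slightly higher than the computed ones.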

Blueprint

The design utilizes a Push-based Fan-out model for standard users and a Pull-based model for celebrities to solve the write-amplification problem.

Tweet Service: Handles incoming tweet writes and persists them to a NoSQL store.

Social Graph Service: Manages follow/unfollow relationships.

Fan-out Workers: Asynchronously push tweet IDs into the Redis timelines of followers.

Timeline Service: Serves the pre-computed timelines from Redis.

Cassandra: Used for the primary tweet store due to high write-availability.

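Redis: Stores materialized timelines so that a timeline read is a single sorted-set range query (O(log N + M) for M returned entries) instead of a join across all followees.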

Simplicity Audit: We avoid complex graph databases and heavy read-time joins by using materialized views in Redis, which is the simplest way to meet the < 200ms timeline latency target at this scale.

Architecture Decision Rationale:

Scalability: Decoupling the write path from the fan-out path using Kafka ensures that a spike in tweets doesn't crash the timeline service.

Performance: Pre-computing timelines in Redis minimizes the work done during a user's read request.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing:

Static assets (JS/CSS/Images) served via Global CDN.

Geo-DNS routes users to the nearest regional data center.

Security & Perimeter:

API Gateway: Handles JWT-based Authentication and Rate Limiting (e.g., 100 tweets/hour per user).

SSL Termination at the Load Balancer level.
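To make the rate-limiting rule concrete (e.g., 100 tweets/hour per user), here is a minimal sliding-window limiter sketched in Python. It keeps state in process memory purely for illustration; the class name and the injectable clock are assumptions of this example, and a real API Gateway would more likely keep the counters in a shared store so that all gateway instances agree.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowRateLimiter:
    """Allow at most `limit` events per user within `window_seconds`."""

    def __init__(self, limit: int = 100, window_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        events = self._events[user_id]
        # Drop timestamps that have aged out of the window.
        while events and now - events[0] >= self.window:
            events.popleft()
        if len(events) >= self.limit:
            return False  # over the limit: reject (e.g., HTTP 429)
        events.append(now)
        return True
```

A sliding window avoids the burst at window boundaries that a fixed hourly counter allows, at the cost of storing one timestamp per accepted request.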

Service

Topology & Scaling: Stateless microservices deployed in Kubernetes (EKS/GKE) across multiple Availability Zones. Scaling is triggered by CPU utilization and Request-per-pod metrics.

API Schema Design:

POST /v1/tweets: { text: string, media_ids: [] }. Protocol: REST/HTTPS.

GET /v1/timeline/home: Returns a list of Tweet objects. Protocol: REST/HTTPS at the edge; gRPC for internal service-to-service communication.

Resilience:

Circuit Breaker: If Redis is down, the Timeline Service falls back to a "Pull" model from Cassandra (degraded performance but functional).

Retries: Exponential backoff for Fan-out workers.
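The retry policy for Fan-out workers can be sketched as follows in Python. This is an illustrative helper, not a prescribed implementation: the function name, the default delays, the choice of ConnectionError as the retryable error, and the use of full jitter are all assumptions of this example.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(operation: Callable[[], T],
                       max_attempts: int = 5,
                       base_delay: float = 0.1,
                       max_delay: float = 5.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `operation`, retrying on ConnectionError with exponential backoff.

    The delay ceiling doubles on each attempt (base_delay * 2**attempt),
    capped at max_delay. The actual sleep is a random value below that
    ceiling ("full jitter") so that many workers do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
    raise ValueError("max_attempts must be at least 1")
```

Because the fan-out write is an idempotent insert of a tweet ID into a timeline, retrying it is safe; the jitter matters most when a Redis node recovers and many workers resume at once.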

Storage

Access Pattern:

Tweets: Heavy writes, read by ID.

Social Graph: Heavy read/write (who follows whom).

Database Table Design:

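Tweets Table (Cassandra): tweet_id (K), author_id, content, created_at. Partitioning by tweet_id serves point reads by ID; the pull path (User Timeline, celebrity tweets) needs a per-author lookup, which would typically be a second table partitioned by author_id and clustered by created_at.

Follows Table (Cassandra): follower_id (K), followee_id (C), created_at. This layout answers "whom does this user follow?"; fan-out needs the reverse lookup ("who follows this author?"), which would typically be a mirrored table keyed by followee_id.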

Technical Selection: Cassandra.

Rationale: It is optimized for high write throughput and scales linearly. The wide-column store handles the social graph effectively without the overhead of an RDBMS.

Cache

Purpose: Materialized Timelines.

Key-Value Schema:

Key: timeline:{user_id}

Value: Redis Sorted Set (ZSET) with tweet_id as the member and the tweet's timestamp as the score.

TTL: 72 hours for active users; inactive users are evicted.

Failure Handling: If a cache miss occurs, the Timeline Service reconstructs the timeline by querying the Social Graph Service and then fetching the latest tweets from the Tweet Store (Cassandra).
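The cache layout and the cache-miss path can be sketched together. The Python below uses an in-memory dictionary as a stand-in for the Redis ZSETs (in Redis the corresponding commands would be ZADD to insert and ZREVRANGE to read newest-first). The class and function names, the 800-entry cap, and the get_followees / get_recent_tweets callables are hypothetical and stand in for the Social Graph Service and the Tweet Store.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


class TimelineCache:
    """In-memory stand-in for Redis keys of the form timeline:{user_id}."""

    def __init__(self, max_len: int = 800) -> None:
        self.max_len = max_len  # assumed cap on entries per timeline
        self._zsets: Dict[str, Dict[int, float]] = {}

    def add(self, user_id: str, tweet_id: int, timestamp: float) -> None:
        zset = self._zsets.setdefault(f"timeline:{user_id}", {})
        zset[tweet_id] = timestamp          # member = tweet_id, score = timestamp
        if len(zset) > self.max_len:
            del zset[min(zset, key=zset.get)]  # trim the oldest entry

    def latest(self, user_id: str, count: int = 20) -> Optional[List[int]]:
        zset = self._zsets.get(f"timeline:{user_id}")
        if zset is None:
            return None                     # cache miss
        newest_first = sorted(zset, key=lambda t: (zset[t], t), reverse=True)
        return newest_first[:count]


def read_home_timeline(
    user_id: str,
    cache: TimelineCache,
    get_followees: Callable[[str], Iterable[str]],
    get_recent_tweets: Callable[[str, int], Iterable[Tuple[int, float]]],
    count: int = 20,
) -> List[int]:
    cached = cache.latest(user_id, count)
    if cached is not None:
        return cached
    # Cache miss: rebuild from the social graph and the tweet store.
    for followee in get_followees(user_id):
        for tweet_id, timestamp in get_recent_tweets(followee, count):
            cache.add(user_id, tweet_id, timestamp)
    return cache.latest(user_id, count) or []
```

Fetching `count` tweets per followee is enough to guarantee the merged top `count` is correct, since no single followee can contribute more than `count` entries to the first page.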

Messaging

Purpose: Decoupling and Load Leveling.

Event Schema: TweetCreatedEvent { tweet_id, author_id, timestamp }.

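Throughput: The 10k peak TPS of small TweetCreatedEvent messages is a light load for Kafka; the heavier load is the downstream fan-out writes to Redis.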

Partitioning: Partition by author_id to ensure ordering of tweets from the same user.

Technical Selection: Kafka. Chosen for its high durability and ability to replay events if fan-out workers fail.
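The effect of partitioning by author_id can be shown with a small Python sketch. It uses CRC32 purely for illustration (Kafka's Java client applies its own hash to the message key), and the partition count of 64 is an assumed value, not part of the design above.

```python
import zlib

NUM_PARTITIONS = 64  # assumed topic size for illustration


def partition_for(author_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map an author to a partition with a stable hash of the key.

    The same author_id always yields the same partition, so all of that
    author's TweetCreatedEvents land in one ordered log.
    """
    return zlib.crc32(author_id.encode("utf-8")) % num_partitions


if __name__ == "__main__":
    for author in ("user:1", "user:2", "user:1"):
        print(author, "->", partition_for(author))
```

Ordering is only guaranteed within a partition, which is why the key is the author rather than the tweet. The flip side is that a very prolific author concentrates all of their events on a single partition.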

Data Processing

Processing Model: Asynchronous Stream Processing.

Fan-out Logic:

Worker consumes TweetCreatedEvent.

Fetches list of followers from Social Graph Service.

For each follower, it inserts tweet_id into their Redis ZSET.
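The three fan-out steps above can be sketched as a single function. This Python example replaces Redis with a plain dictionary and the Social Graph Service with a get_followers callable; both are stand-ins invented for this sketch, as is the function name fan_out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable

# follower_id -> {tweet_id: timestamp}; stands in for one ZSET per follower.
Timelines = Dict[str, Dict[int, float]]


@dataclass(frozen=True)
class TweetCreatedEvent:
    tweet_id: int
    author_id: str
    timestamp: float


def fan_out(event: TweetCreatedEvent,
            get_followers: Callable[[str], Iterable[str]],
            timelines: Timelines) -> int:
    """Insert the tweet into every follower's timeline; return the write count."""
    delivered = 0
    for follower_id in get_followers(event.author_id):
        # In Redis this would be: ZADD timeline:{follower_id} <timestamp> <tweet_id>
        timelines.setdefault(follower_id, {})[event.tweet_id] = event.timestamp
        delivered += 1
    return delivered
```

Writing the same member twice leaves the timeline unchanged, so the worker can safely reprocess an event after a crash or a Kafka replay.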

Celebrity Logic (Hybrid):

If the author's follower_count > 100,000, the worker skips the fan-out for that author's tweets.

When a follower of a celebrity requests their timeline, the Timeline Service pulls the celebrity's tweets on-the-fly and merges them with the pre-computed Redis timeline.
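The hybrid rule has two halves: a write-side check and a read-side merge. The Python sketch below shows both under the assumption that each feed is already sorted newest-first as (timestamp, tweet_id) pairs; the function names are hypothetical, and the threshold is the 100,000 figure from the text.

```python
import heapq
from itertools import islice
from typing import Iterable, List, Sequence, Tuple

CELEBRITY_THRESHOLD = 100_000  # follower count above which fan-out is skipped

Feed = Sequence[Tuple[float, int]]  # (timestamp, tweet_id), newest first


def should_fan_out(follower_count: int) -> bool:
    """Write side: push to follower timelines only for non-celebrity authors."""
    return follower_count <= CELEBRITY_THRESHOLD


def merge_timelines(precomputed: Feed,
                    celebrity_feeds: Iterable[Feed],
                    count: int = 20) -> List[int]:
    """Read side: merge the cached timeline with each followed celebrity's
    recent tweets and return the newest `count` tweet IDs."""
    # heapq.merge streams the sorted inputs lazily, so only `count`
    # items are pulled from the merged sequence.
    merged = heapq.merge(precomputed, *celebrity_feeds, reverse=True)
    return [tweet_id for _, tweet_id in islice(merged, count)]
```

The read-side cost grows with the number of celebrities a user follows rather than with the celebrity's follower count, which is the point of moving that work from the write path to the read path.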

Technical Selection: Custom Golang/Java Workers. They keep per-event overhead low for simple I/O-bound tasks such as Redis updates.

Infrastructure (Optional)

Observability: Prometheus for metrics (Latency/Error Rates), ELK stack for logs, and Jaeger for tracing the fan-out path.

Distributed Coordination: Not heavily required for MVP; standard service discovery via Kubernetes is sufficient.

Wrap Up

Advanced Topics

Trade-offs: We chose Availability over Consistency (AP in CAP). If a tweet is delayed by 2 seconds, the user experience is largely unaffected, but if the system is down, it's a failure.

Bottleneck Analysis:

Redis Memory: Storing timelines for 300M users is expensive. Optimization: Only store timelines for users who have logged in within the last 30 days.

Hot Shards: High-frequency tweeters might cause hot partitions in Cassandra. Optimization: Use a synthetic shard key (e.g., author_id_bucket).
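One way to read the synthetic shard key suggestion is to append a bucket number to the author_id so that one author's tweets spread across several partitions. The Python sketch below derives the bucket from the tweet_id; the bucket count of 16 and both function names are assumptions made for this example.

```python
from typing import List, Tuple

NUM_BUCKETS = 16  # assumed number of buckets per author

PartitionKey = Tuple[str, int]  # (author_id, bucket)


def partition_key(author_id: str, tweet_id: int,
                  num_buckets: int = NUM_BUCKETS) -> PartitionKey:
    """Write side: spread one author's tweets over `num_buckets` partitions."""
    return (author_id, tweet_id % num_buckets)


def partitions_to_read(author_id: str,
                       num_buckets: int = NUM_BUCKETS) -> List[PartitionKey]:
    """Read side: listing an author's tweets must query every bucket
    and merge the results."""
    return [(author_id, bucket) for bucket in range(num_buckets)]
```

The trade-off is that writes no longer pile onto one partition, while reads of an author's full history become a scatter-gather over all buckets, so the bucket count should stay small.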

Security:

PII (emails/phone numbers) is encrypted at rest in the User Service (not detailed here, per YAGNI).

Rate limiting to prevent bot-driven spamming of the tweet endpoint.

Distinguishing Insight: Read-Path Merging. The "Celebrity" solution (Hybrid Pull/Push) is what separates a junior design from a staff design. It prevents the "Thundering Herd" problem, where one tweet from a celebrity triggers 50 million Redis writes at once.