The Question
Design: Social Media Microblogging Platform
Design a social media platform similar to Twitter. The system should support posting short messages, following other users, delivering personalized real-time feeds, handling viral content fan-out, and serving hundreds of millions of daily active users at low latency.
Redis
Kafka
Cassandra
PostgreSQL
Hybrid Fan-Out
Questions & Insights
Thinking Process
The core challenge of a Twitter-like system is timeline generation. With a heavily read-skewed workload (quoted here as 100:1, although the request rates in the Estimation section work out to roughly 10:1), pre-computing feeds is essential for low latency, but this creates a "fan-out" problem when users have millions of followers.
How do we handle the high read volume for timelines? We shift the work from read-time to write-time using a "Push" model (Fan-out) into an in-memory cache.
How do we handle the "Celebrity" (Hot Key) problem? We use a hybrid approach: Push for normal users and Pull (on-demand merge) for celebrities to prevent write-amplification.
How do we scale the social graph (Followers/Following)? Use a dedicated graph-optimized schema in a distributed database to handle rapid membership checks.
How do we ensure the system stays responsive during peak traffic? Use asynchronous processing for non-critical tasks like feed updates and notification triggers.
Bonus Points
Hybrid Fan-out Strategy: Dynamically categorize users based on follower count. Use "Push" for the long tail and "Pull" for "Whales" (celebrities) to optimize both latency and resource consumption.
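Bloom Filters: Put a Bloom filter in front of the Follower service to quickly determine whether a user might follow another. A negative answer is definitive, so lookups for pairs with no follow relationship never reach the database; a positive answer still has to be confirmed against the store.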
Geo-Sharding & Edge Caching: Store user timelines in Redis clusters geographically close to the user to minimize "Time to First Tweet."
Read-Your-Writes Consistency: Implement session-based consistency to ensure that when a user posts a tweet, it appears immediately on their personal timeline, even if the global fan-out is still processing.
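The Bloom filter idea above can be sketched in a few lines. This is a minimal illustration, not a production filter: the class name, the bit-array size, the number of hashes, and the `edge_key` helper are all assumptions made for the example. Note that a plain Bloom filter cannot delete entries, so unfollows would need a periodic rebuild or a counting variant.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive several bit positions by salting one hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def edge_key(follower_id, followee_id):
    # Hypothetical key format for one directed follow edge.
    return f"{follower_id}->{followee_id}"


follows = BloomFilter()
follows.add(edge_key(1, 2))  # user 1 follows user 2
```

A caller would query the database only when `follows.might_contain(edge_key(a, b))` returns true.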
Design Breakdown
Functional Requirements
Users can post tweets (max 280 characters).
Users can follow/unfollow other users.
Users can view a User Timeline (their own tweets).
Users can view a Home Timeline (tweets from people they follow).
Basic search for tweets by keyword.
Non-Functional Requirements
High Availability: The system must be available 99.99% of the time (reads are more critical than writes).
Low Latency: Home timeline generation must be < 200ms.
Eventual Consistency: It is acceptable if a tweet takes a few seconds to appear in a follower's feed.
Scalability: Must support 100M+ Daily Active Users (DAU).
Estimation
DAU: 100 Million.
Tweets per Day: 100M users * 1 tweet per user per day on average = 100M tweets/day (~1,160 TPS).
Read Volume: 100M users * 10 timeline views per user per day = 1 Billion views/day (~11,600 QPS).
Storage (Tweets): 100M tweets * 500 bytes (text + metadata) = 50GB/day.
Storage (Graph): 100M users * 200 follows/avg * 16 bytes (IDs) = 320GB total.
Cache: To store Home Timelines for active users (last 200 tweets): 100M users * 200 IDs * 8 bytes = 160GB RAM.
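The arithmetic above can be checked with a short script. It uses decimal gigabytes (1 GB = 10^9 bytes) and averages over a full day, so peak rates would be higher; the variable names are illustrative.

```python
SECONDS_PER_DAY = 86_400
DAU = 100_000_000

tweets_per_day = DAU * 1                                # 1 tweet per user per day
write_tps = tweets_per_day / SECONDS_PER_DAY            # about 1,157

views_per_day = DAU * 10                                # 10 timeline views per user
read_qps = views_per_day / SECONDS_PER_DAY              # about 11,574

tweet_storage_gb_per_day = tweets_per_day * 500 / 1e9   # 500 bytes per tweet
graph_storage_gb = DAU * 200 * 16 / 1e9                 # 200 follows, 16 bytes per edge
feed_cache_gb = DAU * 200 * 8 / 1e9                     # 200 IDs of 8 bytes per user

print(round(write_tps), round(read_qps))
print(tweet_storage_gb_per_day, graph_storage_gb, feed_cache_gb)
```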
Blueprint
Concise Summary: A microservices architecture leveraging a write-path fan-out mechanism to pre-compute home feeds into a Redis cache.
Major Components:
Tweet Service: Manages tweet creation and persistence in a NoSQL store.
Graph Service: Manages follow relationships using a sharded SQL database.
Timeline Service: Fetches pre-computed feeds from Redis or aggregates them on-the-fly for celebrities.
Fan-out Worker: An asynchronous processor that pushes new tweets into the Redis feeds of followers.
Simplicity Audit: This design avoids complex "pull-only" architectures which would crush the database with JOINs at read-time, opting instead for a simple Redis-based cache push.
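As a rough sketch of the Timeline Service read path, the function below merges a pre-computed feed with tweets pulled on demand from followed celebrities. It assumes tweet IDs are time-sortable (larger ID means newer) and that every input list is already sorted newest-first; the function and parameter names are hypothetical.

```python
import heapq
from itertools import islice


def home_timeline(pushed_feed, celebrity_feeds, limit=200):
    """Merge the cached feed with per-celebrity recent tweets.

    Every input is a list of tweet IDs in descending order, so a k-way
    merge yields the combined feed newest-first without a full sort.
    """
    merged = heapq.merge(pushed_feed, *celebrity_feeds, reverse=True)
    return list(islice(merged, limit))


# Cached feed from normal accounts plus two followed celebrities.
page = home_timeline([90, 70, 10], [[95, 60], [80]], limit=4)
```

The merge cost grows with the number of followed celebrities rather than with their follower counts, which is what makes the pull path affordable at read time.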
High Level Architecture
Sub-system Deep Dive
Service
Topology: Services are deployed as Docker containers in a Kubernetes cluster, autoscaling based on CPU/Request count.
API Spec:
POST /v1/tweets: Creates a tweet. Returns TweetID.
GET /v1/timeline/home: Returns a list of Tweet objects for the feed.
POST /v1/follow/{userId}: Updates the social graph.
Communication: REST/JSON for external; gRPC for internal service-to-service calls (e.g., Timeline Service calling Graph Service).
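A small sketch of the request validation that POST /v1/tweets might perform before persisting anything. The body shape (a JSON object with a `content` field) and the function name are assumptions; the 280-character limit comes from the functional requirements. Python's `len` counts Unicode code points, which may differ from how a production platform counts characters.

```python
MAX_TWEET_CHARS = 280


def validate_create_tweet(body):
    """Return a list of problems with a POST /v1/tweets body (empty if valid)."""
    errors = []
    content = body.get("content")
    if not isinstance(content, str) or not content.strip():
        errors.append("content must be a non-empty string")
    elif len(content) > MAX_TWEET_CHARS:
        errors.append(f"content exceeds {MAX_TWEET_CHARS} characters")
    return errors
```

A handler would return a 4xx response when the list is non-empty and otherwise hand the tweet to the Tweet Service.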
Storage
Data Model:
Tweet Store: Wide-column (Cassandra) or Key-Value (DynamoDB). Schema: tweet_id (PK), author_id, content, timestamp. Indexed on author_id for User Timelines.
Graph Store: PostgreSQL (sharded by follower_id). Schema: follower_id, followee_id, created_at. A composite index on (follower_id, followee_id) ensures fast lookups.
Database Logic: Tweets are immutable. High write throughput of NoSQL handles the tweet volume, while SQL's ACID compliance ensures follow/unfollow actions are consistent.
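The Graph Store schema can be tried out locally. The sketch below uses an in-memory SQLite database purely as a stand-in for PostgreSQL; the table name, helper functions, and the reverse index are assumptions. The reverse index is included because fan-out asks "who follows this author?", which the (follower_id, followee_id) index alone does not serve; with sharding by follower_id that query would also span shards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE follows (
        follower_id INTEGER NOT NULL,
        followee_id INTEGER NOT NULL,
        created_at  TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (follower_id, followee_id)  -- the composite index
    );
    -- Assumed reverse index for "followers of X" lookups during fan-out.
    CREATE INDEX follows_by_followee ON follows (followee_id, follower_id);
    """
)


def follow(follower_id, followee_id):
    # OR IGNORE makes a repeated follow a no-op (SQLite syntax).
    conn.execute(
        "INSERT OR IGNORE INTO follows (follower_id, followee_id) VALUES (?, ?)",
        (follower_id, followee_id),
    )


def unfollow(follower_id, followee_id):
    conn.execute(
        "DELETE FROM follows WHERE follower_id = ? AND followee_id = ?",
        (follower_id, followee_id),
    )


def is_following(follower_id, followee_id):
    row = conn.execute(
        "SELECT 1 FROM follows WHERE follower_id = ? AND followee_id = ?",
        (follower_id, followee_id),
    ).fetchone()
    return row is not None


def followers_of(followee_id):
    rows = conn.execute(
        "SELECT follower_id FROM follows WHERE followee_id = ? ORDER BY follower_id",
        (followee_id,),
    )
    return [row[0] for row in rows]
```

In PostgreSQL the idempotent insert would be written with `ON CONFLICT DO NOTHING` instead of `INSERT OR IGNORE`.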
Cache
Data Structure: Redis Lists or Sorted Sets.
Logic: Each active user has a key feed:{userId}. The value is a list of the 200 most recent TweetIDs. TTL: 72 hours for inactive users to save memory.
Eviction: LRU (Least Recently Used).
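The capped feed list can be modelled without a Redis server. The class below is an in-memory stand-in whose comments name the Redis list commands each step corresponds to (LPUSH, LTRIM, LRANGE); the class and method names are hypothetical, and the 72-hour TTL is only noted in a comment because the stand-in has no expiry.

```python
FEED_LIMIT = 200


class FeedCache:
    """In-memory stand-in for per-user Redis lists keyed as feed:{userId}."""

    def __init__(self):
        self._lists = {}

    def push(self, user_id, tweet_id):
        feed = self._lists.setdefault(f"feed:{user_id}", [])
        feed.insert(0, tweet_id)   # like LPUSH feed:{userId} tweet_id
        del feed[FEED_LIMIT:]      # like LTRIM feed:{userId} 0 199
        # In Redis, EXPIRE feed:{userId} 259200 would set the 72-hour TTL.

    def read(self, user_id, count=20):
        # like LRANGE feed:{userId} 0 count-1
        return self._lists.get(f"feed:{user_id}", [])[:count]


cache = FeedCache()
for tweet_id in range(250):
    cache.push(7, tweet_id)
```

Trimming on every push keeps each feed at a fixed size, which is what makes the 160GB cache estimate hold.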
Messaging
Component: Apache Kafka or RabbitMQ.
Topic Structure: tweet-posted topic.
Delivery Guarantees: At-least-once delivery to ensure followers eventually see the tweet.
Consumers: Fan-out Workers subscribe to this topic to initiate the feed update process.
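At-least-once delivery means a consumer may see the same message twice, so the feed update should be idempotent. The sketch below keys the feed by tweet ID, which is comparable to using a Redis Sorted Set with the tweet ID as the score instead of a plain list; the class is a hypothetical in-memory illustration and assumes time-sortable IDs.

```python
class IdempotentFeed:
    """Feed keyed by tweet ID, so re-applying a message changes nothing."""

    def __init__(self, limit=200):
        self.limit = limit
        self._ids = set()

    def apply(self, tweet_id):
        self._ids.add(tweet_id)              # comparable to ZADD with score = ID
        if len(self._ids) > self.limit:
            self._ids.remove(min(self._ids))  # drop the oldest entry

    def newest_first(self):
        return sorted(self._ids, reverse=True)


feed = IdempotentFeed()
for delivered in (5, 9, 5):  # tweet 5 is redelivered after a consumer restart
    feed.apply(delivered)
```

With a plain list, the redelivered message would show the same tweet twice unless the worker deduplicates before pushing.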
Data Processing
Component: Fan-out Workers.
Logic:
Retrieve TweetID and AuthorID from the message.
Query Graph Service for FollowerIDs of the author.
For each follower, LPUSH the TweetID into their Redis feed.
Celebrity Check: If followers > 100k, skip fan-out; instead, mark the tweet for "Pull" logic.
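The worker steps above can be sketched as a single function. Everything passed in is a hypothetical stand-in: `get_followers` for the Graph Service, `feeds` (a dict of lists) for the Redis feed cache, and `pull_only_authors` for wherever "Pull" markers are stored. The celebrity check runs before any write so that large accounts never start a fan-out. Fetching the full follower list just to count it is a simplification; a real worker would more likely read a cached follower count.

```python
CELEBRITY_THRESHOLD = 100_000
FEED_LIMIT = 200


def fan_out(message, get_followers, feeds, pull_only_authors):
    """Process one tweet-posted message and return the number of feeds written."""
    tweet_id, author_id = message["tweet_id"], message["author_id"]
    followers = get_followers(author_id) or []

    if len(followers) > CELEBRITY_THRESHOLD:
        pull_only_authors.add(author_id)  # timelines pull this author on read
        return 0

    for follower_id in followers:
        feed = feeds.setdefault(follower_id, [])
        feed.insert(0, tweet_id)  # LPUSH equivalent
        del feed[FEED_LIMIT:]     # keep only the newest 200 IDs
    return len(followers)


# Usage with a toy graph: author 1 is a normal user, author 2 a celebrity.
graph = {1: [10, 11], 2: list(range(100_001))}
feeds, pull_only_authors = {}, set()
written_normal = fan_out({"tweet_id": 500, "author_id": 1}, graph.get, feeds, pull_only_authors)
written_celebrity = fan_out({"tweet_id": 501, "author_id": 2}, graph.get, feeds, pull_only_authors)
```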
Wrap Up
Advanced Topics
Monitoring:
Prometheus/Grafana: Monitor "Fan-out Lag" (time between tweet post and appearing in feeds).
CloudWatch: Monitor Redis memory usage and DB IOPS.
Trade-offs:
Consistency vs. Availability: We choose Availability (AP). A follower might see a tweet 2 seconds late, but the system must never fail to load the feed.
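Bottlenecks: The "Celebrity" fan-out is the main bottleneck. If a user with 50M followers tweets, a naive push would issue 50M Redis writes for a single tweet and could overwhelm the cluster, which is why such accounts are served by the pull path.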
Failure Handling:
Replication: Redis Sentinel/Cluster for high availability.
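Dead Letter Queues: If a fan-out fails, the message is retried. Messages that still fail after the retry limit are moved to a dead letter queue for inspection and replay, so no tweets are silently "lost" from feeds.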
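A minimal sketch of the retry-then-park behaviour follows. The function name, the attempt limit, and the list used as a dead letter queue are assumptions for illustration; a real worker would also wait between attempts (for example with exponential backoff) and would rely on the broker's own dead-letter mechanism where one exists.

```python
def process_with_retries(message, handler, dead_letters, max_attempts=3):
    """Run handler(message) up to max_attempts times; park the message on failure."""
    last_error = None
    for _ in range(max_attempts):
        try:
            handler(message)
            return True
        except Exception as exc:  # a real worker would catch narrower errors
            last_error = exc
    # Retries exhausted: keep the message for inspection and later replay.
    dead_letters.append({"message": message, "error": repr(last_error)})
    return False


# Usage: one handler recovers on the third attempt, the other never does.
attempts = []


def flaky_handler(message):
    attempts.append(message)
    if len(attempts) < 3:
        raise RuntimeError("simulated Redis timeout")


def broken_handler(message):
    raise RuntimeError("simulated outage")


dead_letters = []
recovered = process_with_retries({"tweet_id": 1}, flaky_handler, dead_letters)
parked = process_with_retries({"tweet_id": 2}, broken_handler, dead_letters)
```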
Alternatives & Optimization:
Media Storage: For an MVP, we assume text-only. To add images/video, use S3 with a CDN (CloudFront) and store the URL in the Tweet metadata.
Snowflake ID: Use Twitter Snowflake or similar for generating time-sortable 64-bit unique IDs for tweets without a centralized DB auto-increment.
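For illustration, here is a sketch of a Snowflake-style generator. It follows the commonly described layout of a 41-bit millisecond timestamp, a 10-bit machine ID, and a 12-bit per-millisecond sequence, with the epoch constant from the open-source Snowflake project. The class and its injectable `clock` parameter are assumptions for the example, not a reference implementation.

```python
import threading
import time

TWITTER_EPOCH_MS = 1288834974657  # custom epoch used by the original Snowflake
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._clock = clock
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted: spin until the next millisecond.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp = now - TWITTER_EPOCH_MS
            return (
                (timestamp << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self._sequence
            )


generator = SnowflakeGenerator(worker_id=1)
tweet_ids = [generator.next_id() for _ in range(1000)]
```

Because the timestamp occupies the high bits, sorting by ID approximates sorting by creation time, which is what lets feeds store bare IDs and still render newest-first.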