The Question
Design: Social Media Microblogging Platform
Design a social media platform similar to Twitter. The system should support posting short messages, following other users, delivering personalized real-time feeds, handling viral content fan-out, and serving hundreds of millions of daily active users at low latency.
Redis
Kafka
gRPC
Sharding
Rate Limiting
Design Breakdown
Functional Requirements
Post Tweets: Users can create short text-based posts (up to 280 characters).
Follow/Unfollow: Users can follow other users to subscribe to their content.
Home Timeline: Users can view a chronological feed of tweets from people they follow.
User Timeline: Users can view their own tweet history.
Non-Functional Requirements
High Availability: The system must stay available even if individual components fail. The workload is read-heavy (reads far outnumber writes).
Scalability: Support 200M Daily Active Users (DAU) and 100M tweets/day.
Latency: Timeline generation should take < 200ms.
Eventual Consistency: It is acceptable if a tweet takes a few seconds to appear in a follower's feed.
Estimation
Writes: 100M tweets / 86,400 s ≈ 1,200 TPS (Transactions Per Second).
Reads: 200M users * 10 timeline views/day / 86,400 s ≈ 23,000 QPS (Queries Per Second).
Storage: 100M tweets * 300 bytes ≈ 30 GB/day, or about 11 TB per year.
Bandwidth: 1,200 tweets/s * 300 bytes ≈ 360 KB/s (Inbound); Outbound is roughly 20x-50x higher.
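The estimates above can be reproduced with a few lines of arithmetic. This is only a sketch: the variable names are illustrative, and the inputs (10 timeline views per user per day, 300 bytes per tweet) are the assumptions already stated in the estimates.

```python
SECONDS_PER_DAY = 86_400

# Inputs taken from the requirements and estimates above.
daily_tweets = 100_000_000
daily_active_users = 200_000_000
timeline_views_per_user = 10   # assumed views per user per day
bytes_per_tweet = 300          # assumed average tweet size

write_tps = daily_tweets / SECONDS_PER_DAY
read_qps = daily_active_users * timeline_views_per_user / SECONDS_PER_DAY
storage_gb_per_day = daily_tweets * bytes_per_tweet / 1e9
storage_tb_per_year = storage_gb_per_day * 365 / 1e3
inbound_kb_per_s = write_tps * bytes_per_tweet / 1e3

print(f"writes:   {write_tps:,.0f} TPS")
print(f"reads:    {read_qps:,.0f} QPS")
print(f"storage:  {storage_gb_per_day:,.0f} GB/day, {storage_tb_per_year:.2f} TB/year")
print(f"inbound:  {inbound_kb_per_s:,.0f} KB/s")
```

The unrounded write rate is about 1,157 TPS, so the inbound figure comes out near 347 KB/s; the 360 KB/s in the estimate uses the rounded 1,200 TPS.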
Blueprint
Concise Summary: A microservices-based architecture utilizing a "Push-on-Write" fan-out strategy to pre-compute timelines in a Redis cache for low-latency retrieval.
Major Components:
API Gateway: Handles authentication, rate limiting, and request routing.
Tweet Service: Manages creation and storage of tweets in a relational database.
Fan-out Worker: Asynchronously pushes new tweet IDs to the Redis caches of active followers.
Timeline Service: Fetches pre-computed tweet lists from Redis and hydrates them with content for the user.
Relational DB (PostgreSQL): Acts as the source of truth for user profiles, follows, and tweet metadata.
Redis (Timeline Cache): Stores lists of tweet IDs for each user's home timeline.
Message Queue (Kafka): Decouples the tweet creation from the expensive fan-out process.
Simplicity Audit: This architecture avoids complex stream processing and graph databases in favor of two proven patterns, "Cache-aside" and "Asynchronous Fan-out", which are sufficient at MVP scale.
High Level Architecture
Sub-system Deep Dive
Service
Topology & Scaling: Services are deployed as stateless Docker containers orchestrated by Kubernetes. Scaling is based on CPU/Memory utilization.
API Spec:
POST /v1/tweets: Payload {text: string}. Returns tweet_id.
GET /v1/timeline/home: Returns a list of Tweet objects. Supports pagination via max_id.
POST /v1/follows/{user_id}: Creates a follow relationship.
Communication: Internal communication via gRPC; External via REST/JSON.
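The max_id pagination on the home timeline endpoint could work roughly as follows. This is a sketch under two assumptions that the spec does not state: tweet IDs increase over time, and max_id is inclusive (the server returns IDs less than or equal to it). The function names are hypothetical.

```python
from typing import List, Optional


def page_timeline(tweet_ids_desc: List[int],
                  max_id: Optional[int] = None,
                  count: int = 20) -> List[int]:
    """Return up to `count` tweet IDs, newest first, with ID <= max_id.

    `tweet_ids_desc` is the user's timeline sorted newest first.
    Assumes IDs are time-ordered, so a larger ID means a newer tweet.
    """
    if max_id is None:
        return tweet_ids_desc[:count]          # first page
    return [t for t in tweet_ids_desc if t <= max_id][:count]


def next_max_id(page: List[int]) -> Optional[int]:
    """Cursor for the following page: one below the oldest ID just returned."""
    return page[-1] - 1 if page else None


timeline = list(range(109, 99, -1))            # IDs 109 down to 100
first_page = page_timeline(timeline, count=4)
second_page = page_timeline(timeline, max_id=next_max_id(first_page), count=4)
```

Compared with offset-based paging, a cursor like this stays stable when new tweets arrive at the head of the timeline between requests.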
Storage
Data Model:
Users: user_id (PK), username, email, created_at.
Tweets: tweet_id (PK), author_id (FK), content, created_at.
Follows: follower_id, followee_id. Composite PK on both columns.
Database Logic: PostgreSQL with horizontal sharding by user_id. Indexes on author_id and created_at for the User Timeline.
Cache
Data Structure: Redis Lists or Sorted Sets (ZSETs).
Logic: Each user_id maps to a list of tweet_ids.
TTL/Eviction: Timelines for inactive users (no login for 30 days) are evicted. Max size per list is capped at 1,000 tweet IDs to save memory.
Reconstruction: If a cache miss occurs, the Timeline Service queries the SQL DB (Follows + Tweets) to rebuild the Redis list.
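The capped list and the rebuild-on-miss path can be sketched as below. To keep the example self-contained, TimelineCache is an in-memory stand-in rather than a Redis client, and plain dictionaries stand in for the Follows and Tweets tables; all names are hypothetical. It also assumes tweet IDs are time-ordered.

```python
from typing import Dict, List, Optional, Set

MAX_TIMELINE_LEN = 1_000  # per-list cap described above


class TimelineCache:
    """In-memory stand-in for the Redis timeline cache (not a Redis client)."""

    def __init__(self, max_len: int = MAX_TIMELINE_LEN) -> None:
        self.max_len = max_len
        self._lists: Dict[int, List[int]] = {}

    def push(self, user_id: int, tweet_id: int) -> None:
        # Prepend, then trim: roughly what LPUSH followed by LTRIM does in Redis.
        timeline = self._lists.setdefault(user_id, [])
        timeline.insert(0, tweet_id)
        del timeline[self.max_len:]

    def get(self, user_id: int) -> Optional[List[int]]:
        return self._lists.get(user_id)        # None signals a cache miss

    def put(self, user_id: int, tweet_ids: List[int]) -> None:
        self._lists[user_id] = list(tweet_ids[:self.max_len])


def rebuild_timeline(user_id: int,
                     follows: Dict[int, Set[int]],
                     tweets_by_author: Dict[int, List[int]],
                     limit: int) -> List[int]:
    """Stand-in for the SQL query that joins Follows with Tweets."""
    ids = [t for followee in follows.get(user_id, set())
           for t in tweets_by_author.get(followee, [])]
    return sorted(ids, reverse=True)[:limit]   # newest first


def home_timeline(user_id: int, cache: TimelineCache,
                  follows: Dict[int, Set[int]],
                  tweets_by_author: Dict[int, List[int]]) -> List[int]:
    cached = cache.get(user_id)
    if cached is not None:
        return cached
    rebuilt = rebuild_timeline(user_id, follows, tweets_by_author, cache.max_len)
    cache.put(user_id, rebuilt)                # repopulate for the next read
    return rebuilt
```

With Sorted Sets instead of Lists, the score would carry the ordering and the trim would be done by rank, but the read path stays the same.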
Messaging
Topic Structure: tweet-events topic.
Guarantees: At-least-once delivery.
Consumers: Fan-out workers consume from Kafka. For every new tweet, the worker looks up followers of the author in the DB and updates their corresponding Redis lists.
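Because delivery is at-least-once, the same tweet event may reach a worker more than once, so the timeline update should be idempotent. The sketch below shows one way to do that. It uses a plain iterable in place of a Kafka consumer and dictionaries in place of the DB and Redis; the event shape and all names are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Set


def fan_out(events: Iterable[dict],
            followers_of: Dict[int, Set[int]],
            timelines: Dict[int, List[int]],
            max_len: int = 1_000) -> None:
    """Push each tweet ID onto every follower's timeline, newest first.

    `events` stands in for messages read from the tweet-events topic.
    `followers_of` stands in for the follower lookup in the DB.
    `timelines` stands in for the per-user Redis lists.
    """
    for event in events:
        tweet_id = event["tweet_id"]
        author_id = event["author_id"]
        for follower_id in followers_of.get(author_id, set()):
            timeline = timelines.setdefault(follower_id, [])
            if tweet_id in timeline:
                continue                       # redelivered event, already applied
            timeline.insert(0, tweet_id)
            del timeline[max_len:]             # enforce the per-list cap


events = [
    {"tweet_id": 100, "author_id": 1},
    {"tweet_id": 100, "author_id": 1},         # duplicate delivery
    {"tweet_id": 101, "author_id": 1},
]
timelines: Dict[int, List[int]] = {}
fan_out(events, followers_of={1: {2, 3}}, timelines=timelines)
```

With a Sorted Set keyed by tweet ID the duplicate check comes for free, since adding an existing member does not create a second entry.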
Wrap Up
Advanced Topics
Trade-offs:
Space vs Time: We trade storage space (redundant tweet IDs in millions of Redis lists) for read speed (timelines are pre-computed).
Consistency: We use "Eventual Consistency". One follower might see a tweet slightly later than another follower does, which is acceptable for social media.
Bottlenecks:
Celebrity Problem (Hot Key): If a user has 50M followers (e.g., Elon Musk), a single tweet forces the Fan-out worker to update 50M Redis lists, which delays delivery and backs up the queue.
Optimization: For "Power Users" (high follower count), we switch to a Pull Model. Their tweets are not pushed to caches; instead, the Timeline Service fetches them on-demand and merges them with the cached feed.
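The read-time merge for this hybrid model can be sketched as follows. The follower threshold is a hypothetical value, since the text does not specify one, and the function names are illustrative. The sketch assumes tweet IDs are time-ordered and that every input list is already sorted newest first.

```python
import heapq
from itertools import islice
from typing import List

CELEBRITY_THRESHOLD = 1_000_000  # hypothetical cutoff, not given in the design


def uses_pull_model(follower_count: int) -> bool:
    """Authors at or above the threshold are skipped by the fan-out worker."""
    return follower_count >= CELEBRITY_THRESHOLD


def merge_timeline(cached_ids: List[int],
                   celebrity_feeds: List[List[int]],
                   limit: int = 20) -> List[int]:
    """Merge the pre-computed feed with on-demand celebrity tweets.

    All inputs must be sorted in descending ID order (newest first).
    """
    merged = heapq.merge(cached_ids, *celebrity_feeds, reverse=True)
    return list(islice(merged, limit))         # stop once the page is full


page = merge_timeline([90, 70, 50], [[95, 60], [80]], limit=4)
```

Because heapq.merge is lazy, the merge only consumes as many IDs as the page needs, which keeps the extra read-time cost proportional to the page size and the number of celebrities followed.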
Failure Handling:
Redis Failure: If Redis fails, the system falls back to querying the SQL DB directly (degraded performance).
Kafka Lag: Monitor consumer lag; if the fan-out is too slow, we increase the number of partitions and worker instances.
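The Redis fallback can be expressed as a small wrapper around the two read paths. This is a sketch only: the callables are hypothetical, and Python's built-in ConnectionError stands in for whatever exception the real cache client raises when Redis is unreachable.

```python
from typing import Callable, List, Tuple

TimelineReader = Callable[[int], List[int]]


def read_timeline(user_id: int,
                  read_cache: TimelineReader,
                  read_db: TimelineReader) -> Tuple[List[int], str]:
    """Try the cache first; fall back to the database if the cache is down.

    Returns the tweet IDs plus the source used, so callers can emit a
    metric when the system is running in degraded mode.
    """
    try:
        return read_cache(user_id), "cache"
    except ConnectionError:
        # Degraded path: slower, but the timeline is still served.
        return read_db(user_id), "db"


def healthy_cache(user_id: int) -> List[int]:
    return [3, 2, 1]


def broken_cache(user_id: int) -> List[int]:
    raise ConnectionError("cache unreachable")  # simulated outage


def db_query(user_id: int) -> List[int]:
    return [3, 2, 1]
```

In practice the fallback path usually needs its own protection, such as a rate limit or circuit breaker, so that a cache outage does not send the full read load to the database.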
Alternatives:
NoSQL (Cassandra): Could be used for storing tweets instead of PostgreSQL for easier horizontal scaling, but RDBMS is chosen for the MVP for ACID compliance on user/follow data and ease of initial development.