The Question
Design a Massive Scale Discussion Platform (Reddit)
Design a community-based discussion platform similar to Reddit. The system must support millions of concurrent users who can join sub-communities, create posts with text or images, engage in deeply nested comment threads, and participate in a high-velocity upvote/downvote system. Focus specifically on how you would architect the feed ranking (Hot/Top) at scale, handle the massive write-throughput of the voting system, and ensure low-latency access to popular content during viral events. Discuss your storage choices for both relational metadata and high-volume event logs, and how you would maintain high availability for a globally distributed user base.
PostgreSQL
Cassandra
Redis
Kafka
CDN
S3
Kubernetes
gRPC
ZSET
Go
Questions & Insights
Clarifying Questions
Scale: What is the target scale for Daily Active Users (DAU) and concurrent users? (Assumption: 100M DAU, 500M monthly active users).
Content Types: Should we support rich media like hosted video and images, or just text and external links for the MVP? (Assumption: Text, external links, and image uploads. Video is out-of-scope for MVP).
Feed Complexity: Do we need personalized AI-driven "For You" feeds or just "Hot/New/Top" ranking per subreddit? (Assumption: Subreddit-specific ranking based on time-decayed vote scores).
Consistency: Is eventual consistency acceptable for vote counts and comment counts? (Assumption: Yes, highly available reads are prioritized over strict real-time vote count accuracy).
Search: Is full-text search required for the MVP? (Assumption: Yes, but basic keyword matching).
Thinking Process
To design a platform like Reddit, we must balance massive read volume with a high-velocity, low-latency write stream for votes.
How do we handle the nested comment structure efficiently? We store comments as flat rows (relational or document) indexed by parent id or by a materialized path, so that a thread's comments can be fetched with indexed queries and assembled into a tree in the application.
How do we scale vote counting without killing the database? We decouple vote ingestion from count updates using a write-behind cache and a messaging queue to aggregate increments.
How do we generate "Hot" feeds for millions of users? We pre-compute subreddit feeds in a cache and use a pull-model for user homepages, merging the top items from a user's subscribed subreddits.
How do we handle "Celebrity" posts (Hot shards)? We implement multi-level caching (CDN -> Redis -> Local App Cache) for viral threads.
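The path-based option for nested comments can be sketched as follows. This is a minimal illustration, assuming each comment stores a materialized path made of zero-padded ancestor ids; the `Comment` class, the `add_comment` and `render_thread` helpers, and the padding width are hypothetical names chosen for the example, not part of the design above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

PAD = 6  # hypothetical fixed width so that string order matches numeric order


@dataclass
class Comment:
    id: int
    parent_id: Optional[int]
    path: str  # e.g. "000001/000003"
    body: str


def add_comment(store: Dict[int, Comment], comment_id: int,
                parent_id: Optional[int], body: str) -> Comment:
    """Create a comment whose path is the parent's path plus its own padded id."""
    segment = str(comment_id).zfill(PAD)
    parent = store[parent_id] if parent_id is not None else None
    path = segment if parent is None else f"{parent.path}/{segment}"
    comment = Comment(comment_id, parent_id, path, body)
    store[comment_id] = comment
    return comment


def render_thread(comments: List[Comment]) -> List[str]:
    """Sorting by path yields depth-first order; depth is the number of separators."""
    lines = []
    for c in sorted(comments, key=lambda c: c.path):
        depth = c.path.count("/")
        lines.append("  " * depth + c.body)
    return lines


store: Dict[int, Comment] = {}
add_comment(store, 1, None, "top-level A")
add_comment(store, 2, None, "top-level B")
add_comment(store, 3, 1, "reply to A")
add_comment(store, 4, 3, "reply to reply")
thread = render_thread(list(store.values()))
```

With a path column, fetching a whole subtree becomes a prefix match on the path, at the cost of longer keys for very deep threads.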
Bonus Points
Hybrid Feed Generation: Using a "Pull-on-Demand" approach for long-tail subreddits and a "Push-to-Cache" approach for high-volume popular subreddits.
Probabilistic Counting: Using HyperLogLog for unique view counts to save massive storage space and compute.
Write-Behind Caching with Lua: Using Redis Lua scripts to atomically update vote counts and user-vote state, so the cache stays internally consistent before changes are synced to the database.
Geo-Sharding: Localizing subreddit data to specific regions to reduce latency for regional communities (e.g., r/London stored in UK-based shards).
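The atomic update behind the write-behind bullet comes down to one state transition: look up the user's previous vote, apply the difference to the counter, and store the new vote. The sketch below shows that transition in plain Python, with in-memory dicts standing in for Redis. In a real deployment this logic would run inside a Lua script so the read and the two writes happen atomically. The names `apply_vote`, `user_votes`, and `scores` are assumptions made for the example.

```python
from typing import Dict, Tuple

# In-memory stand-ins for Redis structures (assumption, for illustration only).
user_votes: Dict[Tuple[str, str], int] = {}  # (user_id, target_id) -> -1, 0, or +1
scores: Dict[str, int] = {}                  # target_id -> net score


def apply_vote(user_id: str, target_id: str, direction: int) -> int:
    """Record a vote and return the delta applied to the target's score.

    direction is +1 (upvote), -1 (downvote), or 0 (retract).
    Repeating the same vote returns 0, which makes the call idempotent.
    """
    if direction not in (-1, 0, 1):
        raise ValueError("direction must be -1, 0, or +1")
    previous = user_votes.get((user_id, target_id), 0)
    delta = direction - previous
    if delta != 0:
        user_votes[(user_id, target_id)] = direction
        scores[target_id] = scores.get(target_id, 0) + delta
    return delta


first = apply_vote("u1", "p1", +1)    # new upvote
repeat = apply_vote("u1", "p1", +1)   # duplicate request, no change
flipped = apply_vote("u1", "p1", -1)  # upvote changed to downvote
```

Returning the delta rather than the new total is convenient here because the delta is what would be published downstream for asynchronous aggregation.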
Design Breakdown
Functional Requirements
Core Use Cases:
Users can create subreddits (communities).
Users can create posts (text/images/links) within subreddits.
Users can comment on posts (nested threading).
Users can upvote/downvote posts and comments.
Users can view "Hot", "New", and "Top" feeds for subreddits and a personalized home feed.
Scope Control:
In-Scope: Post/Comment lifecycle, Voting, Basic Feed, Subreddit management.
Out-of-Scope: Private messaging (chat), live video streaming, advertising engine, sophisticated anti-spam/mod-tools (MVP focus).
Non-Functional Requirements
Scale: Support 100k+ RPS for reads; handle 10k+ RPS for votes.
Latency: Feed loading < 200ms; Vote acknowledgment < 50ms.
Availability & Reliability: 99.99% availability; "Read-your-own-write" consistency for posts/comments.
Consistency: Eventual consistency for vote counts (users see their own vote immediately, but the global counter updates asynchronously).
Fault Tolerance: Region-level failover for read-only traffic.
Estimation
Traffic:
100M DAU.
Read:Write Ratio is roughly 100:1 (Heavy consumption).
Votes are the most frequent write (10x more than comments).
Total Read QPS: ~1M (Peak).
Total Write QPS: ~10k (Posts/Comments) + ~100k (Votes).
Storage:
1M posts/day * 500 bytes = 500MB/day.
10M comments/day * 200 bytes = 2GB/day.
Total Metadata: ~1TB/year.
Media (Images): 1M images/day * 1MB = 1TB/day (Requires Object Store).
Bandwidth:
Outgoing: 1M QPS * 100KB (feed page) = 100GB/s (Requires heavy CDN).
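The storage and bandwidth figures above can be reproduced with a few lines of arithmetic. The sketch uses decimal units (1 MB = 10^6 bytes) and only the inputs stated in the estimation; the variable names are ours.

```python
# Inputs taken from the estimation above.
posts_per_day = 1_000_000
comments_per_day = 10_000_000
post_bytes = 500
comment_bytes = 200
images_per_day = 1_000_000
image_bytes = 1_000_000      # 1 MB per image
peak_read_qps = 1_000_000
feed_page_bytes = 100_000    # 100 KB per feed page

post_storage_per_day = posts_per_day * post_bytes            # 500 MB
comment_storage_per_day = comments_per_day * comment_bytes   # 2 GB
metadata_per_year = (post_storage_per_day + comment_storage_per_day) * 365
media_per_day = images_per_day * image_bytes                 # 1 TB
egress_bytes_per_sec = peak_read_qps * feed_page_bytes       # 100 GB/s

print(f"metadata/year: {metadata_per_year / 1e12:.2f} TB")
print(f"media/day:     {media_per_day / 1e12:.2f} TB")
print(f"peak egress:   {egress_bytes_per_sec / 1e9:.0f} GB/s")
```

Metadata comes to about 0.91 TB per year, which rounds to the ~1TB/year quoted above, while media grows by roughly that amount every day. That gap is why images go to an object store and feeds sit behind a CDN.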
Blueprint
The design utilizes a microservices architecture focused on decoupling the write-heavy voting system from the read-heavy feed system.
Major Components:
Post Service: Manages the lifecycle of posts and subreddit metadata.
Vote Service: High-throughput service that handles upvote/downvote streams via a message queue.
Feed Service: Aggregates and ranks content for users by merging cached, pre-ranked subreddit lists.
Comment Service: Specialized service for handling nested tree structures.
Object Store: S3-compatible storage for hosting user-uploaded images.
Simplicity Audit: This design avoids complex "Push" (fan-out) models for feeds. Fan-out is necessary for Twitter but overkill for Reddit, where content is shared in public communities rather than along personal follower graphs.
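The pull model for the home feed can be sketched as a k-way merge over per-subreddit lists that are already sorted by descending rank. This is an illustration under assumptions: the `cached` dict stands in for the cached subreddit lists, and `home_feed` and the sample data are hypothetical.

```python
import heapq
from itertools import islice
from typing import Dict, List, Tuple

# subreddit -> list of (hot_rank, post_id), sorted by hot_rank descending.
CachedFeeds = Dict[str, List[Tuple[float, str]]]


def home_feed(cached: CachedFeeds, subscriptions: List[str], limit: int) -> List[str]:
    """Merge the cached lists of the subscribed subreddits and return the top post ids."""
    lists = [cached.get(sub, []) for sub in subscriptions]
    # heapq.merge is lazy, so only about `limit` items are ever pulled.
    merged = heapq.merge(*lists, key=lambda item: item[0], reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]


cached = {
    "golang": [(9.5, "g1"), (7.0, "g2")],
    "python": [(8.2, "p1"), (3.1, "p2")],
    "news": [(9.9, "n1")],
}
feed = home_feed(cached, ["golang", "python"], limit=3)
```

Because each input list is pre-sorted, the cost of building a page grows with the page size and the number of subscriptions, not with the total number of posts.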
Architecture Decision Rationale:
Why this architecture?: Separating voting into an async pipeline avoids row-lock contention on hot posts in the database during viral events.
Functional Satisfaction: Meets all core posting and browsing needs.
Non-functional Satisfaction: Scalable via horizontal service scaling and caching at every layer.
High Level Architecture
Sub-system Deep Dive
Edge (Optional)
Content Delivery & Traffic Routing:
CDN: Cloudflare or CloudFront to cache static assets (images) and JSON responses for popular subreddit "Hot" feeds (TTL 60s).
Load Balancing: L7 Load Balancer (NGINX/Envoy) for SSL termination and path-based routing (e.g., /api/v1/posts -> Post Service).
Security & Perimeter:
Rate Limiting: Tiered limiting (e.g., 100 votes/minute per user) implemented at the Gateway.
WAF: Standard protection against SQLi and XSS.
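The per-user vote limit could be enforced with a sliding-window counter like the one below. This is a minimal single-process sketch: a real gateway would keep the counters in shared storage such as Redis. The class name and the explicit `now` parameter (used to keep the example deterministic) are assumptions.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per user within any `window_seconds` span."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: str, now: float) -> bool:
        hits = self._hits[user_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(limit=100, window_seconds=60)
# 101 votes in the first ten seconds: the last one is rejected.
results = [limiter.allow("u1", now=i * 0.1) for i in range(101)]
# One minute after the first vote, a slot has been freed.
allowed_later = limiter.allow("u1", now=60.0)
```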
Service
Topology & Scaling:
Stateless Go/Java services deployed on Kubernetes.
Horizontal autoscaling based on CPU utilization and request rate.
API Schema Design:
POST /v1/posts: Create a post.
GET /v1/sub/{id}/hot: Fetch ranked list.
POST /v1/votes: {target_id, target_type, direction}. Idempotent based on {user_id, target_id}.
Resilience & Reliability:
Circuit Breakers: Ensure that if the Vote Service is slow, it doesn't block the Feed Service.
Retries: Exponential backoff for post creation.
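The retry policy for post creation could look like the sketch below: the delay cap doubles after each failure and a random jitter spreads out retries from many clients. The function name, the default values, and the injectable `sleep` argument are assumptions made for the example.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` until it succeeds or `max_attempts` calls have failed."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return operation()
        except Exception:
            attempt += 1
            if attempt >= max_attempts:
                raise
            # Cap doubles each time: base, 2*base, 4*base, ... up to max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))  # full jitter


calls = {"n": 0}


def flaky_create_post() -> str:
    """Hypothetical operation that fails twice before succeeding."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "post-123"


delays: List[float] = []
result = retry_with_backoff(flaky_create_post, sleep=delays.append)
```

Retrying a create call is only safe if duplicates are filtered out, for example with a client-supplied idempotency key. The design above does not specify one, so treat that as an open assumption. In practice the `except` clause would also be narrowed to transient errors.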
Storage
Access Pattern:
Reads: Extremely high, mostly by subreddit_id + score/time.
Writes: High for votes, moderate for posts/comments.
Database Table Design:
Post Table (PostgreSQL): id (PK), author_id, subreddit_id, title, content_url, score, comment_count, created_at.
Vote Table (Cassandra): target_id (K, partition key), user_id (C, clustering key), direction. (Using NoSQL for high-write vote logs).
Technical Selection:
PostgreSQL: Used for posts and subreddits due to relational requirements and strong consistency for content creation.
Cassandra: Used for votes to handle massive write volume and provide tunable consistency.
Distribution Logic:
Sharding PostgreSQL by subreddit_id to keep community data together.
Cache
Purpose & Justification: Reduce DB load for feed generation.
Key-Value Schema:
sub:{id}:hot: Sorted Set (ZSET) in Redis where score = Hot Rank and member = Post ID.
user:{id}:votes: Bitmap or Set to track which posts a user has already voted on.
Failure Handling: Cache-aside pattern. If Redis is down, services fall back to DB with strict rate limits.
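The design above does not fix a formula for the Hot Rank used as the ZSET score. One widely cited option is the time-decayed formula from Reddit's formerly open-sourced code, sketched here as an assumption rather than a requirement of this design. A plain dict stands in for the sorted set, and the post ids and dates are made up for the example.

```python
import math
from datetime import datetime, timedelta, timezone

EPOCH_OFFSET = 1134028003  # reference timestamp used by the widely cited formula
DECAY_SECONDS = 45000      # 12.5 hours of age offsets a tenfold score difference


def hot_rank(ups: int, downs: int, created_at: datetime) -> float:
    """Log-scaled net score plus a term that grows with creation time."""
    score = ups - downs
    order = math.log10(max(abs(score), 1))
    sign = (score > 0) - (score < 0)
    seconds = created_at.timestamp() - EPOCH_OFFSET
    return round(sign * order + seconds / DECAY_SECONDS, 7)


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)

# Dict stand-in for the sub:{id}:hot sorted set (member -> score).
zset = {
    "old-popular": hot_rank(1000, 0, t0),
    "new-small": hot_rank(10, 0, t0 + timedelta(hours=30)),
}
# Equivalent of reading the set in descending score order.
top = sorted(zset, key=zset.get, reverse=True)
```

Because the rank depends only on the vote counts and the creation time, it needs recomputing only when a post's score changes. This fits the asynchronous vote pipeline.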
Messaging
Purpose & Decoupling: Decouples vote submission from database updates to protect the Relational DB from write-spikes.
Event Schema:
VoteEvent: {postId, userId, delta, timestamp}.
Throughput & Partitioning: Kafka topic partitioned by postId to ensure vote increments for a single post are processed sequentially to avoid race conditions.
Technical Selection: Kafka. High throughput and durability are critical here.
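The effect of keying by postId can be shown with a deterministic hash partitioner. Kafka's own client partitioners use a different hash function; CRC32 is used here only as a simple stand-in. The function name and the sample events are assumptions.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


def partition_for(post_id: str, num_partitions: int) -> int:
    """Map a post id to a partition; the same id always maps to the same partition."""
    return zlib.crc32(post_id.encode("utf-8")) % num_partitions


# (post_id, user_id) pairs in arrival order.
events: List[Tuple[str, str]] = [
    ("post-1", "u1"),
    ("post-2", "u2"),
    ("post-1", "u3"),
    ("post-3", "u4"),
    ("post-1", "u5"),
]

partitions: Dict[int, List[Tuple[str, str]]] = defaultdict(list)
for post_id, user_id in events:
    partitions[partition_for(post_id, 4)].append((post_id, user_id))

# All votes for post-1 sit in one partition, in their original order.
post1_partition = partition_for("post-1", 4)
post1_voters = [u for p, u in partitions[post1_partition] if p == "post-1"]
```

The trade-off is that a single viral post concentrates its votes on one partition, so a very hot post can still become a per-partition bottleneck.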
Data Processing
Processing Model: Streaming (for real-time ranking).
Processing DAG: Vote Queue -> Aggregator Service -> Batch Update DB Score + Update Redis ZSET.
Technical Selection: Custom Go consumer (Worker) is sufficient for MVP instead of Spark/Flink to keep complexity low (YAGNI).
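The aggregation step in the DAG can be sketched as collapsing a batch of vote events into one net delta per post before touching the database. The sketch is in Python for brevity even though the worker itself is proposed in Go. The field names mirror the VoteEvent schema in snake_case, and `aggregate`, `flush`, and the dict standing in for the score column are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class VoteEvent:
    post_id: str
    user_id: str
    delta: int
    timestamp: float


def aggregate(events: Iterable[VoteEvent]) -> Dict[str, int]:
    """Collapse a batch of vote events into one net delta per post."""
    totals: Counter = Counter()
    for event in events:
        totals[event.post_id] += event.delta
    # Posts whose votes cancelled out need no write at all.
    return {post_id: d for post_id, d in totals.items() if d != 0}


def flush(deltas: Dict[str, int], db_scores: Dict[str, int]) -> None:
    """Apply one increment per post (stand-in for a batched score update)."""
    for post_id, d in deltas.items():
        db_scores[post_id] = db_scores.get(post_id, 0) + d


batch = [
    VoteEvent("p1", "u1", +1, 0.0),
    VoteEvent("p1", "u2", +1, 0.1),
    VoteEvent("p1", "u3", -1, 0.2),
    VoteEvent("p2", "u1", +1, 0.3),
    VoteEvent("p3", "u4", +1, 0.4),
    VoteEvent("p3", "u5", -1, 0.5),
]
deltas = aggregate(batch)
db_scores = {"p1": 10}
flush(deltas, db_scores)
```

Here six events turn into two writes, and the saving grows with how concentrated the votes are on a few posts.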
Infrastructure (Optional)
Observability: Prometheus for metrics (RPS, Latency), ELK for logs, Jaeger for tracing request paths across services.
Wrap Up
Advanced Topics
Trade-offs (CAP Theorem): We choose Availability and Partition Tolerance (AP) for votes and feeds. It is better to show a slightly stale vote count than to fail the request.
Reliability: Use a "Dead Letter Queue" for failed vote processing. If the DB update fails, we can replay the vote events.
Bottleneck Analysis:
Hot Subreddits (e.g., r/news during an event): We use local memory caching (in-process) in the Feed Service for the top 100 items of the most active subreddits to bypass Redis.
Deep Comment Threads: Use a "More Comments" pagination approach to avoid fetching thousands of nested records in one query.
Security: RBAC for moderators and shadow-banning logic in the Post/Comment services to prevent spam.
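The in-process caching mentioned for hot subreddits can be sketched as a small TTL cache placed in front of Redis. The class name, the key format in the usage example, and the injectable clock are assumptions. A production version would also bound the number of entries and guard against many threads reloading the same key at once.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


class LocalTTLCache:
    """Tiny in-process cache: entries expire `ttl_seconds` after being loaded."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh, skip the remote lookup
        value = loader()
        self._entries[key] = (now + self.ttl, value)
        return value


fake_now = [0.0]   # controllable clock for the example
loads: List[int] = []


def load_hot_from_redis() -> List[str]:
    """Hypothetical loader standing in for a Redis read."""
    loads.append(1)
    return ["p1", "p2", "p3"]


cache = LocalTTLCache(ttl_seconds=5.0, clock=lambda: fake_now[0])
first = cache.get_or_load("sub:news:hot", load_hot_from_redis)
second = cache.get_or_load("sub:news:hot", load_hot_from_redis)  # served locally
fake_now[0] = 6.0
third = cache.get_or_load("sub:news:hot", load_hot_from_redis)   # expired, reloaded
```

A short TTL keeps staleness within the eventual-consistency budget already accepted for feeds and vote counts.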