The Question

Design a Massive Scale Discussion Platform (Reddit)

Design a community-based discussion platform similar to Reddit. The system must support millions of concurrent users who can join sub-communities, create posts with text or images, engage in deeply nested comment threads, and participate in a high-velocity upvote/downvote system. Focus specifically on how you would architect the feed ranking (Hot/Top) at scale, handle the massive write-throughput of the voting system, and ensure low-latency access to popular content during viral events. Discuss your storage choices for both relational metadata and high-volume event logs, and how you would maintain high availability for a globally distributed user base.

PostgreSQL

Cassandra

Redis

Kafka

CDN

Kubernetes

gRPC

ZSET

Questions & Insights

Clarifying Questions

Scale: What is the target scale for Daily Active Users (DAU) and concurrent users? (Assumption: 100M DAU, 500M monthly active users).

Content Types: Should we support rich media like hosted video and images, or just text and external links for the MVP? (Assumption: Text, external links, and image uploads. Video is out-of-scope for MVP).

Feed Complexity: Do we need personalized AI-driven "For You" feeds or just "Hot/New/Top" ranking per subreddit? (Assumption: Subreddit-specific ranking based on time-decayed vote scores).

Consistency: Is eventual consistency acceptable for vote counts and comment counts? (Assumption: Yes, highly available reads are prioritized over strict real-time vote count accuracy).

Search: Is full-text search required for the MVP? (Assumption: Yes, but basic keyword matching).

Thinking Process

To design a platform like Reddit, we must balance massive read volume with a high-velocity, low-latency write stream for votes.

How do we handle the nested comment structure efficiently? We use a flattened relational or document structure with path-based or parent-id indexing to allow efficient fetching.

How do we scale vote counting without killing the database? We decouple vote ingestion from count updates using a write-behind cache and a messaging queue to aggregate increments.

How do we generate "Hot" feeds for millions of users? We pre-compute subreddit feeds in a cache and use a pull-model for user homepages, merging the top items from a user's subscribed subreddits.

How do we handle "Celebrity" posts (Hot shards)? We implement multi-level caching (CDN -> Redis -> Local App Cache) for viral threads.

Bonus Points

Hybrid Feed Generation: Using a "Pull-on-Demand" approach for long-tail subreddits and a "Push-to-Cache" approach for high-volume popular subreddits.

Probabilistic Counting: Using HyperLogLog for unique view counts to save massive storage space and compute.

Write-Behind Caching with LUA: Using Redis LUA scripts to atomically update vote counts and user-vote state to ensure local consistency before syncing to the database.

Geo-Sharding: Localizing subreddit data to specific regions to reduce latency for regional communities (e.g., r/London stored in UK-based shards).

Design Breakdown

Functional Requirements

Core Use Cases:

Users can create subreddits (communities).

Users can create posts (text/images/links) within subreddits.

Users can comment on posts (nested threading).

Users can upvote/downvote posts and comments.

Users can view "Hot", "New", and "Top" feeds for subreddits and a personalized home feed.

Scope Control:

In-Scope: Post/Comment lifecycle, Voting, Basic Feed, Subreddit management.

Out-of-Scope: Private messaging (chat), live video streaming, advertising engine, sophisticated anti-spam/mod-tools (MVP focus).

Non-Functional Requirements

Scale: Support 100k+ RPS for reads; handle 10k+ RPS for votes.

Latency: Feed loading < 200ms; Vote acknowledgment < 50ms.

Availability & Reliability: 99.99% availability; "Read-your-own-write" consistency for posts/comments.

Consistency: Eventual consistency for vote counts (users see their own vote immediately, but the global counter updates asynchronously).

Fault Tolerance: Region-level failover for read-only traffic.

Estimation

Traffic:

100M DAU.

Read:Write Ratio is roughly 100:1 (Heavy consumption).

Votes are the most frequent write (10x more than comments).

Total Read QPS: ~1M (Peak).

Total Write QPS: ~10k (Posts/Comments) + ~100k (Votes).

Storage:

1M posts/day * 500 bytes = 500MB/day.

10M comments/day * 200 bytes = 2GB/day.

Total Metadata: ~1TB/year.

Media (Images): 1M images/day * 1MB = 1TB/day (Requires Object Store).

Bandwidth:

Outgoing: 1M QPS * 100KB (feed page) = 100GB/s (Requires heavy CDN).

Blueprint

The design utilizes a microservices architecture focused on decoupling the write-heavy voting system from the read-heavy feed system.

Major Components:

Post Service: Manages the lifecycle of posts and subreddit metadata.

Vote Service: High-throughput service that handles upvote/downvote streams via a message queue.

Feed Service: Aggregates and ranks content for users using a combination of cached subreddit lists.

Comment Service: Specialized service for handling nested tree structures.

Object Store: S3-compatible storage for hosting user-uploaded images.

Simplicity Audit: This design avoids complex "Push" models (Fan-out) for feeds, which is necessary for Twitter but overkill for Reddit where content is shared in public communities rather than personal follower graphs.

Architecture Decision Rationale:

Why this architecture?: Separating voting into an async pipeline prevents database locks during viral events.

Functional Satisfaction: Meets all core posting and browsing needs.

Non-functional Satisfaction: Scalable via horizontal service scaling and caching at every layer.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing:

CDN: Cloudflare or CloudFront to cache static assets (images) and JSON responses for popular subreddit "Hot" feeds (TTL 60s).

Load Balancing: L7 Load Balancer (NGINX/Envoy) for SSL termination and path-based routing (e.g., /api/v1/posts -> Post Service).

Security & Perimeter:

Rate Limiting: Tiered limiting (e.g., 100 votes/minute per user) implemented at the Gateway.

WAF: Standard protection against SQLi and XSS.

Service

Topology & Scaling:

Stateless Go/Java services deployed on Kubernetes.

Scaling based on CPU/Request count.

API Schema Design:

POST /v1/posts: Create a post.

GET /v1/sub/{id}/hot: Fetch ranked list.

POST /v1/votes: {target_id, target_type, direction}. Idempotent based on {user_id, target_id}.

Resilience & Reliability:

Circuit Breakers: Ensure that if the Vote Service is slow, it doesn't block the Feed Service.

Retries: Exponential backoff for post creation.

Storage

Access Pattern:

Reads: Extremely high, mostly by subreddit_id + score/time.

Writes: High for votes, moderate for posts/comments.

Database Table Design:

Post Table (PostgreSQL): id (PK), author_id, subreddit_id, title, content_url, score, comment_count, created_at.

Vote Table (Cassandra): target_id (K), user_id (C), direction. (Using NoSQL for high-write vote logs).

Technical Selection:

PostgreSQL: Used for posts and subreddits due to relational requirements and strong consistency for content creation.

Cassandra: Used for votes to handle massive write volume and provide tunable consistency.

Distribution Logic:

Sharding PostgreSQL by subreddit_id to keep community data together.

Cache

Purpose & Justification: Reduce DB load for feed generation.

Key-Value Schema:

sub:{id}:hot: Sorted Set (ZSET) in Redis where score = Hot Rank and member = Post ID.

user:{id}:votes: Bitmap or Set to track which posts a user has already voted on.

Failure Handling: Cache-aside pattern. If Redis is down, services fall back to DB with strict rate limits.

Messaging

Purpose & Decoupling: Decouples vote submission from database updates to protect the Relational DB from write-spikes.

Event Schema: VoteEvent: {postId, userId, delta, timestamp}.

Throughput & Partitioning: Kafka topic partitioned by postId to ensure vote increments for a single post are processed sequentially to avoid race conditions.

Technical Selection: Kafka. High throughput and durability are critical here.

Data Processing

Processing Model: Streaming (for real-time ranking).

Processing DAG: Vote Queue -> Aggregator Service -> Batch Update DB Score + Update Redis ZSET.

Technical Selection: Custom Go consumer (Worker) is sufficient for MVP instead of Spark/Flink to keep complexity low (YAGNI).

Infrastructure (Optional)

Observability: Prometheus for metrics (RPS, Latency), ELK for logs, Jaeger for tracing request paths across services.

Wrap Up

Advanced Topics

Trade-offs (CAP Theorem): We choose Availability and Partition Tolerance (AP) for votes and feeds. It is better to show a slightly stale vote count than to fail the request.

Reliability: Use a "Dead Letter Queue" for failed vote processing. If the DB update fails, we can replay the vote events.

Bottleneck Analysis:

Hot Subreddits (e.g., r/news during an event): We use local memory caching (in-process) in the Feed Service for the top 100 items of the most active subreddits to bypass Redis.

Deep Comment Threads: Use a "More Comments" pagination approach to avoid fetching thousands of nested records in one query.

Security: RBAC for moderators and shadow-banning logic in the Post/Comment services to prevent spam.