The Question
Design

Social Forum System Design

Design a high-scale social news aggregation platform where users can join communities, submit content, and participate in threaded discussions. The system must support a dynamic ranking mechanism for content based on community feedback (votes) and time-based decay, ensuring high availability and low latency for global users.
PostgreSQL
Redis
Kafka
CDN
Microservices
Questions & Insights

Clarifying Questions

Scale and Traffic: What is the target scale for the MVP? (Assumption: 50M DAU, 1B total posts, roughly 100:1 read-to-write ratio for content; votes are a separate, much heavier write path).
Feed Complexity: Does the "Hot" feed require complex machine learning, or a time-decay ranking algorithm? (Assumption: Standard decay-based ranking based on votes and age).
Content Types: Should we support video/image hosting or just external links and text? (Assumption: Internal text and image metadata; images/videos are stored in S3).
Comment Depth: Is there a limit to comment nesting? (Assumption: Unlimited nesting, requiring efficient tree retrieval).

Thinking Process

The core challenge of a Reddit-style platform is the massive read volume combined with the "Ranking Problem": calculating the Hot feed dynamically while absorbing high-velocity voting.
How do we handle high-frequency vote updates without locking the database? Use an asynchronous write-behind pattern with a distributed message queue to decouple the vote action from the persistent storage and ranking update.
How do we serve the 'Hot' feed at sub-second latency? Implement a multi-level caching strategy where feed "skeletons" (Post IDs) are pre-computed and stored in-memory.
How do we retrieve deeply nested comments? Use a "Materialized Path" (also called "Path Enumeration") approach in the database, where each comment stores the full path of its ancestors, so an entire comment thread can be fetched in a single query.
How do we scale for the "Slashdot effect" (viral spikes)? Utilize a CDN for static assets and edge-caching for popular subreddit front pages.

Bonus Points

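Hybrid Vote Persistence: Discussing a cache-first counter update for immediate UI feedback (optimistic UI) combined with buffered "Write-Behind" writes to the database to prevent IOPS exhaustion. A strict "Write-Through" cache would write to the database synchronously on every vote, which is exactly the load this design tries to avoid.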
Feed Sharding: Partitioning the feed cache by subreddit_id to ensure localized performance and horizontal scalability.
Vector Search for Discovery: Briefly mentioning the use of embeddings for "Related Communities" as a secondary growth lever.
Cold Storage Strategy: Moving posts older than 1 year to a columnar store or compressed archive to keep the operational Postgres DB lean.
Design Breakdown

Functional Requirements

Users can create subreddits (communities).
Users can create posts (text/image/link) within subreddits.
Users can upvote/downvote posts and comments.
Users can view "Hot", "New", and "Top" feeds for specific subreddits and a global "All" feed.
Users can comment on posts with nested threading.

Non-Functional Requirements

High Availability: The system must remain available even if the feed generation service lags.
Scalability: Support 100k+ concurrent users and bursts of votes during viral events.
Low Latency: Feed retrieval should be < 200ms.
Eventual Consistency: Vote counts can be eventually consistent (it's okay if a user sees 100 votes instead of 105 for a few seconds).

Estimation

DAU: 50M.
Read QPS: Assuming 20 feed views/user/day = 1B views/day ≈ 12,000 QPS.
Write QPS (Votes): 10 votes/user/day = 500M votes/day ≈ 6,000 QPS.
Storage: 5M posts/day at 1 KB of metadata per post = 5 GB/day, or about 1.8 TB/year. A standard relational database can handle this with partitioning.
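The back-of-the-envelope numbers above can be reproduced with a few lines of arithmetic. This is only a sketch of the averages; the variable names are illustrative, decimal units (1 KB = 1,000 bytes) are assumed, and peak traffic during viral events would be some multiple of these figures.

```python
SECONDS_PER_DAY = 86_400

dau = 50_000_000
feed_views_per_user = 20      # from the estimate above
votes_per_user = 10           # from the estimate above
posts_per_day = 5_000_000
bytes_per_post = 1_000        # assumption: 1 KB in decimal units

# Average (not peak) request rates.
read_qps = dau * feed_views_per_user / SECONDS_PER_DAY
vote_qps = dau * votes_per_user / SECONDS_PER_DAY

# Metadata growth only; images and videos live in object storage.
storage_gb_per_day = posts_per_day * bytes_per_post / 1e9
storage_tb_per_year = storage_gb_per_day * 365 / 1_000

print(f"read QPS  ~ {read_qps:,.0f}")          # ~11,574, rounded up to 12,000
print(f"vote QPS  ~ {vote_qps:,.0f}")          # ~5,787, rounded up to 6,000
print(f"storage   = {storage_gb_per_day:.1f} GB/day, "
      f"{storage_tb_per_year:.3f} TB/year")    # 5.0 GB/day, 1.825 TB/year
```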

Blueprint

Concise Summary: A microservices architecture leveraging a relational database for metadata and a distributed cache for feed ranking.
Major Components:
Post Service: Manages post/comment lifecycle and metadata storage.
Vote Service: High-throughput ingestion of upvotes/downvotes using a message queue.
Feed Service: Aggregates and ranks posts based on time-decay algorithms.
PostgreSQL: Primary source of truth for normalized data.
Redis: Stores vote counters and pre-computed feed IDs.
Simplicity Audit: This design avoids complex per-user "Fan-out on write" (like Twitter) because Reddit users follow subreddits, not millions of individual users. A ranked feed is computed once per subreddit and shared by every reader, which makes "Fan-out on read" with caching more efficient and easier to maintain.
Architecture Decision Rationale:
Why this architecture?: It decouples the write-heavy voting logic from the read-heavy feed logic.
Functional Satisfaction: Covers the full lifecycle from content creation to discovery.
Non-functional Satisfaction: Redis ensures low-latency reads; Kafka ensures the DB isn't overwhelmed by vote spikes.
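The design does not pin down the time-decay formula, so the following is only one plausible sketch: a log-scaled net vote count plus a linear bonus for recency. The function name `hot_score`, the 45,000-second constant, and the fixed reference epoch are all assumptions chosen for illustration.

```python
import math
from datetime import datetime, timedelta, timezone

DECAY_SECONDS = 45_000  # assumption: 12.5 hours of age offsets a 10x vote advantage
EPOCH = datetime(2020, 1, 1, tzinfo=timezone.utc)  # assumption: arbitrary fixed reference


def hot_score(ups: int, downs: int, created_at: datetime) -> float:
    """Log-scaled net votes plus a bonus that grows with creation time."""
    net = ups - downs
    magnitude = math.log10(max(abs(net), 1))   # first 10 votes count as much as the next 90
    sign = (net > 0) - (net < 0)               # -1, 0, or 1
    recency = (created_at - EPOCH).total_seconds() / DECAY_SECONDS
    return round(sign * magnitude + recency, 7)


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
older_small = hot_score(10, 0, t0)
older_big = hot_score(100, 0, t0)
newer_small = hot_score(10, 0, t0 + timedelta(seconds=DECAY_SECONDS))
```

With this form, a post's score changes only when it receives a vote. Older posts sink because newer posts start from a higher baseline, so the ranking can be stored as a plain sorted-set score without periodically rewriting every entry.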

High Level Architecture

Sub-system Deep Dive

Service

Topology: Services are deployed as Docker containers on Kubernetes.
API Spec:
POST /v1/posts: Create content.
POST /v1/votes: Submit vote {target_id, type: up|down}.
GET /v1/r/{subreddit}/hot: Retrieve ranked post IDs.
Communication: Internal services communicate via gRPC for low latency; external clients use REST/JSON.
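As an illustration of the vote endpoint's contract, the sketch below validates a `POST /v1/votes` body and turns it into an internal event. The `VoteEvent` type, the `parse_vote` helper, and the idea that `user_id` comes from the authenticated session are assumptions, not part of the spec above.

```python
from dataclasses import dataclass

# Map the public vote type to a signed delta for the counters.
VOTE_DELTAS = {"up": 1, "down": -1}


@dataclass(frozen=True)
class VoteEvent:
    user_id: str    # assumption: taken from the authenticated session
    target_id: str  # post or comment being voted on
    delta: int      # +1 or -1


def parse_vote(user_id: str, body: dict) -> VoteEvent:
    """Validate a {target_id, type} payload and build the event to enqueue."""
    target_id = body.get("target_id")
    vote_type = body.get("type")
    if not isinstance(target_id, str) or not target_id:
        raise ValueError("target_id must be a non-empty string")
    if not isinstance(vote_type, str) or vote_type not in VOTE_DELTAS:
        raise ValueError("type must be 'up' or 'down'")
    return VoteEvent(user_id=user_id, target_id=target_id, delta=VOTE_DELTAS[vote_type])


event = parse_vote("user_1", {"target_id": "post_123", "type": "down"})
```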

Storage

Data Model:
Subreddits: id, name, metadata.
Posts: id, subreddit_id, user_id, title, content, score, created_at.
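Comments: id, post_id, path (LTREE), content, score.
Database Logic: Use PostgreSQL with the ltree extension for comments. Each comment's path column holds the labels of its ancestors followed by its own label, which allows querying a whole subtree (the root comment plus all descendants) via WHERE path <@ '<root_path>'.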
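The materialized-path idea can be illustrated without a database. The sketch below is an in-memory stand-in, not the ltree implementation: the `Comment` type and `subtree` helper are hypothetical, and paths are assumed to be dot-separated comment ids. Like ltree's `<@` operator, the match includes the root itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    id: int
    path: str      # ancestor ids joined by dots, ending in this comment's id
    content: str


def subtree(comments: list[Comment], root_path: str) -> list[Comment]:
    """Return the root comment and all its descendants, parents before children."""
    prefix = root_path + "."
    matched = [c for c in comments if c.path == root_path or c.path.startswith(prefix)]
    # Sorting by the numeric path labels yields depth-first (thread) order.
    return sorted(matched, key=lambda c: [int(label) for label in c.path.split(".")])


def depth(comment: Comment) -> int:
    """Nesting level: 0 for a top-level comment."""
    return comment.path.count(".")


comments = [
    Comment(5, "1.5", "second reply"),
    Comment(9, "1.4.9", "nested reply"),
    Comment(1, "1", "top-level comment"),
    Comment(14, "14", "unrelated thread"),
    Comment(4, "1.4", "first reply"),
]
thread = subtree(comments, "1")
```

Matching on `root_path + "."` rather than on the bare prefix keeps path `14` out of the subtree of path `1`, which is the same label-boundary behavior a path-aware type provides over plain string prefixes.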

Cache

Redis Usage:
Counters: Atomic increments for vote counts (INCRBY).
Feeds: ZSET (Sorted Set) where score is the ranking score and value is the Post ID.
TTL: Subreddit "Hot" feeds have a 60-second TTL to ensure freshness while protecting the DB.
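To make the feed cache behavior concrete, here is a minimal in-memory stand-in for one sorted set per subreddit with a TTL. It is a sketch, not a Redis client: the `FeedCache` class and its methods are hypothetical, and in Redis the same roles would be played by commands such as ZADD, ZREVRANGE, and EXPIRE.

```python
import time


class FeedCache:
    """In-memory stand-in for a per-subreddit sorted set that expires after a TTL."""

    def __init__(self, ttl_seconds: float = 60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._feeds = {}  # subreddit -> (expires_at, {post_id: ranking_score})

    def put(self, subreddit: str, scores: dict) -> None:
        """Store a freshly computed feed skeleton and restart its TTL."""
        self._feeds[subreddit] = (self._clock() + self._ttl, dict(scores))

    def top(self, subreddit: str, n: int):
        """Return the n highest-scored post ids, or None on a miss or expiry."""
        entry = self._feeds.get(subreddit)
        if entry is None or entry[0] <= self._clock():
            self._feeds.pop(subreddit, None)
            return None  # caller rebuilds the feed from the database
        scores = entry[1]
        return sorted(scores, key=scores.get, reverse=True)[:n]


# Usage with a fake clock so expiry can be shown without sleeping.
now = [0.0]
cache = FeedCache(ttl_seconds=60.0, clock=lambda: now[0])
cache.put("python", {"p1": 1.0, "p2": 3.5, "p3": 2.0})
fresh = cache.top("python", 2)
now[0] = 61.0
expired = cache.top("python", 2)
```

Only post ids are cached here; the post bodies would be hydrated separately, which keeps each feed entry small.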

Messaging

Topic Structure: vote-events topic partitioned by post_id.
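Delivery: At-least-once delivery. Because a vote event can be redelivered, the consumer must be idempotent (for example, by deduplicating on a vote ID or on the user and target pair) so that retries do not double-count votes.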
Consumer: The Ranking Worker batches votes (e.g., every 100 votes or 1 second) to update the Postgres score column and the Redis ZSET score.
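The batching rule for the Ranking Worker (flush after 100 votes or 1 second, whichever comes first) can be sketched as follows. The `VoteBatcher` class and its `flush` callback are hypothetical; in a real worker the callback would update Postgres and Redis, and Kafka offsets would be committed only after the flush succeeds.

```python
import time
from collections import defaultdict


class VoteBatcher:
    """Aggregates vote deltas per post and flushes by size or age."""

    def __init__(self, flush, max_votes=100, max_age_seconds=1.0, clock=time.monotonic):
        self._flush = flush
        self._max_votes = max_votes
        self._max_age = max_age_seconds
        self._clock = clock
        self._deltas = defaultdict(int)
        self._count = 0
        self._opened_at = None  # time the current batch received its first vote

    def add(self, post_id: str, delta: int) -> None:
        if self._opened_at is None:
            self._opened_at = self._clock()
        self._deltas[post_id] += delta
        self._count += 1
        self.maybe_flush()

    def maybe_flush(self) -> None:
        """Flush if the batch is full or too old; also call this from a timer."""
        if self._count == 0:
            return
        full = self._count >= self._max_votes
        stale = self._clock() - self._opened_at >= self._max_age
        if full or stale:
            batch = dict(self._deltas)
            self._deltas.clear()
            self._count = 0
            self._opened_at = None
            self._flush(batch)  # one write per post instead of one per vote


# Usage with a fake clock and a small batch size for demonstration.
now = [0.0]
flushed = []
batcher = VoteBatcher(flushed.append, max_votes=3, clock=lambda: now[0])
batcher.add("p1", 1)
batcher.add("p1", 1)
batcher.add("p2", -1)   # third vote triggers a size-based flush
batcher.add("p1", 1)
now[0] = 1.5
batcher.maybe_flush()   # age-based flush
```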
Wrap Up

Advanced Topics

Monitoring:
Prometheus: Track API latency and Kafka lag.
Grafana: Dashboard for "Votes per second" and "Cache hit ratio".
Trade-offs: We trade strong consistency for availability. A user might see their own vote reflected in the UI immediately (local state) but others may see the updated count 1-2 seconds later.
Bottlenecks: The primary Postgres database is the bottleneck for writes. Sharding by subreddit_id is the first optimization when the MVP exceeds single-node capacity.
Failure Handling:
Redis Failover: If Redis fails, the Feed Service falls back to querying Postgres directly (with higher latency).
Kafka Buffer: If the Ranking Worker is down, votes stay in Kafka (for as long as the topic's retention period allows) and are processed once the worker recovers, preventing data loss.
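Optimization: Use Bloom Filters in the API Gateway to reject requests for subreddits that definitely do not exist before routing them, saving downstream resources. A Bloom filter never produces false negatives, so an existing subreddit is never rejected; an occasional false positive simply passes through to the downstream service, which performs the authoritative check.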
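A minimal Bloom filter for the gateway check might look like the sketch below. The class, its sizing defaults, and the use of SHA-256 with double hashing are illustrative assumptions; a production gateway would more likely use an existing library and size the filter from the expected number of subreddits.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, a small rate of false positives."""

    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 5):
        self._size = size_bits
        self._k = num_hashes
        self._bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item: str) -> list[int]:
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step avoids a zero stride
        return [(h1 + i * h2) % self._size for i in range(self._k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self._bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        """False means definitely absent; True means probably present."""
        return all(self._bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


known_subreddits = BloomFilter()
for name in ("python", "golang"):
    known_subreddits.add(name)


def should_route(subreddit: str) -> bool:
    """Gateway check: drop requests only when the subreddit is definitely unknown."""
    return known_subreddits.might_contain(subreddit)
```

The filter must be updated whenever a subreddit is created, otherwise requests for brand-new communities would be rejected at the gateway.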