The Question
Design

Design a High-Scalability Social News Aggregator

Design a system similar to Reddit that supports millions of active users. The system must handle community creation (subreddits), threaded discussions with deeply nested comments, and a high-concurrency voting system. Focus on the architecture for ranking feeds (Hot/Top), managing extreme write bursts on popular content, and ensuring low-latency read access to community feeds. Discuss how you would handle data consistency for vote counts and the strategy for scaling the storage of billions of comments.
PostgreSQL
Redis
Kafka
S3
CloudFront
Citus
Kubernetes
Elasticsearch
Protobuf
Questions & Insights

Clarifying Questions

Scale: What are the target metrics for Daily Active Users (DAU) and concurrent users?
Assumption: 50M DAU, with an average of 10-20 posts viewed per session.
Content Types: Does the MVP support media hosting (images/videos) or just text and external links?
Assumption: Support text, external links, and image hosting (via S3/CDN). Video is out of scope for MVP.
Feed Complexity: How is the "Hot" feed calculated? Is it personalized or global?
Assumption: A global/subreddit-level "Hot" algorithm based on vote count and time decay (Wilson score or similar).
Consistency Requirements: Does voting need to be reflected instantly globally?
Assumption: Eventual consistency is acceptable for vote counts and comment counts to favor high availability.

Thinking Process

Core Bottleneck: High read-to-write ratio (roughly 100:1) with extreme "hot keys" (popular threads).
Key Strategy: Decouple vote ingestion from feed generation using asynchronous processing and heavy multi-layer caching.
Progressive Walkthrough:
How do we store millions of posts and nested comments efficiently? (Relational DB with Path enumeration or Closure tables).
How do we handle a burst of 100k votes on a single "Front Page" post? (Message Queue buffering and write-behind caching).
How do we serve the "Hot" feed to 50M users without recalculating it per request? (Pre-computed feed caches).
How do we handle "Search" and "Discovery"? (Managed Search Service like Elasticsearch).

Bonus Points

Hybrid Feed Strategy: For very active subreddits, use a "Push" model to pre-compute feeds; for inactive ones, use a "Pull" model (Lazy loading) to save compute.
Vote Fraud Detection: Implementing a sliding-window rate limiter and anomaly detection at the ingestion layer to prevent bot manipulation.
Data Locality: Using Geo-DNS and Edge Side Includes (ESI) to cache subreddit fragments closer to users.
Post-GIS/Spatial Logic: If Reddit-style "Local" feeds are required, use spatial indexing for location-based post discovery.
Design Breakdown

Functional Requirements

Core Use Cases:
Users can create subreddits (communities).
Users can create posts (text, link, image) within subreddits.
Users can comment on posts (nested threading).
Users can Upvote/Downvote posts and comments.
Users can view "Hot", "New", and "Top" feeds.
Scope Control:
In-Scope: User Auth, Posts, Comments, Voting, Feed Generation.
Out-of-Scope: Direct Messaging, Live Chat, Video Streaming, Advertising Engine.

Non-Functional Requirements

Scale: 50M DAU; 1B+ total posts/comments.
Latency: Feed loading < 200ms; Vote reflection < 1s (UI feedback can be optimistic).
Availability & Reliability: 99.99% availability; No data loss for posts/comments.
Consistency: Eventual consistency for vote counts; Strong consistency for user account actions.
Fault Tolerance: Multi-AZ deployment; Circuit breakers for search/media services.
Security: Rate limiting for voting/posting to prevent spam; TLS encryption in transit.

Estimation

Traffic Estimation:
50M DAU * 20 reads/day = 1B reads/day (~11,500 Avg QPS; 25,000 Peak QPS).
50M DAU * 2 writes/day (votes/posts) = 100M writes/day (~1,200 Avg QPS; 3,000 Peak QPS).
Storage Estimation:
1M new posts/day * 5KB (metadata + text) = 5GB/day.
10M new comments/day * 2KB = 20GB/day.
Total Metadata: ~9TB per year (Postgres sharding required).
Media: Assuming 500k images/day @ 1MB = 500GB/day (S3/Object Storage).
Bandwidth Estimation:
Ingress: ~3,000 writes/sec * 5KB = 15MB/s.
Egress: ~25,000 reads/sec 10 posts/page 5KB = 1.25GB/s (Significant, needs CDN).

Blueprint

Concise Summary: A microservices-based architecture leveraging Sharded Relational Databases for metadata consistency, S3 for media, and Redis for high-speed feed delivery.
Major Components:
API Gateway: Handles Auth, Rate Limiting, and Routing.
Post/Comment Service: Manages the lifecycle of content in Sharded Postgres.
Vote Service: Uses Kafka to buffer high-frequency vote writes to protect the database.
Feed Service: Aggregates data from Post and Vote services to serve pre-computed lists from Redis.
Media Store: S3 for hosting images with CloudFront CDN for global delivery.
Simplicity Audit: This design avoids complex graph databases in favor of optimized SQL queries and heavy caching, which is easier to operate and scale for an MVP.
Architecture Decision Rationale:
Why this architecture is the best?: The decoupling of votes via Message Queues prevents write-locking the main database during viral events.
Functional Requirement Satisfaction: Handles nested comments via adjacency lists/path enumeration and feed ranking via background workers.
Non-functional Requirement Satisfaction: Scalability via sharding and availability via read-replicas and multi-region CDNs.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing: CloudFront CDN used for all static assets and image content. Dynamic requests are routed via Geo-DNS to the nearest regional API Gateway.
Security & Perimeter: WAF enabled on the Gateway to block known botnets. Rate limiting implemented at the User-ID and IP level (e.g., max 60 votes per minute).

Service

Topology & Scaling: Stateless microservices deployed on Kubernetes (EKS). Scaled based on CPU and Request Count.
API Schema Design:
POST /v1/posts: Create post (Auth required).
GET /v1/r/{subreddit}/hot: Returns list of post IDs and metadata.
POST /v1/votes: {target_id: uuid, direction: int}. Idempotent based on user_id + target_id.
Resilience & Reliability: Circuit breakers on the Media Store; if S3 is down, posts return text-only. Exponential backoff on vote retries.

Storage

Access Pattern: 95% reads (feeds), 5% writes (votes/comments).
Database Table Design:
Posts Table: id (UUID), subreddit_id, author_id, title, content_url, created_at.
Comments Table: id, post_id, parent_id (for nesting), body, path (L-Tree for fast subtree retrieval).
Votes Table: user_id, target_id, direction, updated_at.
Technical Selection: PostgreSQL with Citus or manual sharding by subreddit_id to ensure all data for one community lives together.
Distribution Logic: Sharded by subreddit_id to optimize for local feed generation.

Cache

Purpose & Justification: Minimize DB load for popular feeds and vote counts.
Key-Value Schema:
feed:{subreddit_id}:hot -> Sorted Set (Score = Rank, Value = PostID).
post:{id}:votes -> Integer (Atomic increments).
Technical Selection: Redis. High throughput, supports Sorted Sets (ZSET) for ranking.

Messaging

Purpose & Decoupling: Buffers votes. A single viral post can generate 10k votes/sec; Kafka absorbs this burst.
Event / Topic Schema: vote-events: {userId, targetId, voteType, timestamp}.
Technical Selection: Kafka. High durability and ability to replay events if the Aggregator fails.

Data Processing

Processing Model: Near real-time stream processing for vote aggregation; Batch processing (every 5 mins) for "Top" and "Hot" feed ranking updates.
Technical Selection: Custom Go-based workers for vote aggregation. Simple cron-based Spark or Flink jobs for feed ranking in the MVP stage.
Wrap Up

Advanced Topics

Trade-offs: We chose Eventual Consistency for vote counts. A user might see 1,000 votes, while another sees 1,005. This is a standard trade-off to ensure the system doesn't lock up during high traffic.
Reliability: If the Feed Service's Redis cache is wiped, the system falls back to a "Cold Start" strategy where it queries the Postgres read-replicas directly (with heavy rate limiting) while background workers rebuild the cache.
Bottleneck Analysis: The "Subreddit Owner" or "Celebrity Post" problem (hot shards). Solution: Use an LRU cache at the API Gateway layer to serve the most popular 0.1% of posts without even hitting the Post Service.
Optimization: Use Protobuf for service-to-service communication to reduce payload size and serialization latency compared to JSON.