The Question
Design

Design a Professional Social Network Feed and Connection System

Design the core architecture for a professional networking platform like LinkedIn. The system must support user profiles, a bidirectional connection graph (1st and 2nd degree), and a highly scalable news feed. Focus on how you would handle a scale of 300 million daily active users, including strategies for feed generation (push vs. pull), handling high-profile 'power users', and ensuring low-latency retrieval of content from a user's professional network. Explain your choice of data stores for both profile metadata and the connection graph.
PostgreSQL
Neo4j
Cassandra
Redis
Kafka
Elasticsearch
JWT
Kubernetes

Questions & Insights

Clarifying Questions

Scale and Traffic: What is the expected scale (DAU/MAU)? Assumption: 300M DAU, 800M total users.
Core Features: Which features are critical for the MVP? Assumption: User profiles, connection graph (1st/2nd degree), post creation, and a personalized news feed.
Feed Type: Is the feed chronological or ranked? Assumption: A hybrid ranked feed where recent activity from 1st-degree connections is prioritized.
Search: Do we need full-text search? Assumption: Yes, for people and posts.
Consistency: Is strong consistency required for connections? Assumption: Eventual consistency is acceptable for the feed and connection counts, but strong consistency is preferred for profile edits.

Thinking Process

The Connection Bottleneck: How do we efficiently query a graph of 800M nodes and billions of edges to generate a feed?
The Feed Fan-out Challenge: Should we use "Push" (Fan-out on Write) or "Pull" (Fan-out on Read)?
Hybrid Retrieval Strategy: How do we handle "Power Users" (celebrities with millions of followers) without crashing the fan-out workers?
Graph Traversal: Implementing "People You May Know" using 2nd-degree connection paths.
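To make the 2nd-degree idea concrete, here is a minimal sketch that ranks friends-of-friends by mutual connection count. It runs over an in-memory adjacency map; the `graph` dictionary and the `suggest_connections` function are hypothetical stand-ins for a graph database query, not part of any real API.

```python
from collections import Counter


def suggest_connections(graph, user_id, limit=10):
    """Rank 2nd-degree connections of user_id by number of mutual connections."""
    first_degree = graph.get(user_id, set())
    mutual_counts = Counter()
    for friend in first_degree:
        for candidate in graph.get(friend, set()):
            # Skip the user themselves and anyone they are already connected to.
            if candidate != user_id and candidate not in first_degree:
                mutual_counts[candidate] += 1
    # Most mutual connections first; ties broken by id for a stable order.
    ranked = sorted(mutual_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [candidate for candidate, _ in ranked[:limit]]


# Hypothetical toy graph; each bidirectional edge is stored in both directions.
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave", "erin"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol"},
    "erin": {"bob"},
}
suggestions = suggest_connections(graph, "alice")
print(suggestions)
```

In this toy graph, dave shares two connections with alice and erin shares one, so dave ranks first. At real scale the inner loop is the expensive part, which is why the design caps suggestions at 2nd degree.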

Bonus Points

Graph Partitioning: Discussing how to shard a graph database (e.g., using consistent hashing on UserID) to minimize cross-shard joins.
Feed Pre-computation Strategy: Using a "Read-ahead" buffer for active users to keep the P99 latency low.
TAO (The Associations and Objects): Mentioning Facebook's TAO, or a similar graph caching layer, for high-performance edge lookups.
Geo-sharding: Placing profile data and feed caches closer to the user's geographic region to reduce RTT.
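The consistent-hashing idea under Graph Partitioning could be sketched as below. This is an illustration only: the `HashRing` class, the shard names, and the virtual-node count are assumptions, and hashing on UserID only keeps one user's adjacency list on a single shard; it does not by itself remove cross-shard hops for 2nd-degree queries.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring mapping user IDs to graph shards (hypothetical helper)."""

    def __init__(self, shards, vnodes=100):
        # Each shard is placed on the ring many times to even out the load.
        self._ring = sorted(
            (self._hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def shard_for(self, user_id):
        # Walk clockwise to the first virtual node at or after the key's hash.
        idx = bisect.bisect_right(self._keys, self._hash(str(user_id)))
        return self._ring[idx % len(self._ring)][1]


ring3 = HashRing(["shard-0", "shard-1", "shard-2"])
ring4 = HashRing(["shard-0", "shard-1", "shard-2", "shard-3"])
moved = sum(ring3.shard_for(u) != ring4.shard_for(u) for u in range(1000))
print(f"{moved} of 1000 users moved after adding a shard")
```

The point of the ring is that adding a shard only relocates the keys that land on the new shard, rather than reshuffling every user as a plain `hash(user_id) % N` scheme would.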

Design Breakdown

Functional Requirements

Core Use Cases:
Users can create and update professional profiles.
Users can send/accept connection requests (bidirectional).
Users can create posts (text, images).
Users can view a news feed of 1st-degree connections' activity.
Scope Control:
In-scope: Profile, Graph, Feed, Search.
Out-of-scope: Messaging (IM), Job Board, Premium Subscriptions, Ads engine.

Non-Functional Requirements

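Scale: 300M DAU; 800M total users (rounded up to 1B for estimation); hundreds of billions of connection edges at roughly 500 connections per user.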
Latency: Feed loading < 200ms; Profile updates < 500ms.
Availability: 99.99% (Highly available for feed consumption).
Consistency: Eventual consistency for the feed (seconds of lag is okay); Strong consistency for profile changes.
Fault Tolerance: Region-level failover for the read-heavy feed service.

Estimation

Traffic Estimation:
300M DAU.
Feed Views: 10/day/user = 3B views/day ≈ 35k QPS.
Posts: 1% of users post daily = 3M posts/day ≈ 35 QPS.
Storage Estimation:
Profiles: 1B users × 10 KB = 10 TB.
Graph Edges: 1B users × 500 connections × 16 bytes = 8 TB.
Posts (Text): 3M posts/day × 1 KB × 365 days × 3 years ≈ 3.3 TB.
Bandwidth Estimation:
Outgoing (Feed): 35k QPS × 50 KB (feed metadata) ≈ 1.75 GB/s.
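The arithmetic above can be checked with a few lines of Python. The inputs are the ones stated in this section; the variable names are illustrative, decimal units (1 KB = 1000 bytes) are assumed, and the 1B user count is taken to be the 800M total rounded up.

```python
SECONDS_PER_DAY = 86_400
KB, GB, TB = 10**3, 10**9, 10**12  # decimal units, matching the round numbers above

dau = 300_000_000
total_users = 1_000_000_000  # assumption: 800M total users rounded up

# Traffic
feed_views_per_day = dau * 10            # 10 feed views per user per day
feed_qps = feed_views_per_day / SECONDS_PER_DAY
posts_per_day = dau // 100               # 1% of users post daily
post_qps = posts_per_day / SECONDS_PER_DAY

# Storage
profile_storage_tb = total_users * 10 * KB / TB
edge_storage_tb = total_users * 500 * 16 / TB          # 500 edges/user, 16 bytes/edge
post_storage_tb = posts_per_day * 1 * KB * 365 * 3 / TB

# Bandwidth
feed_bandwidth_gb_s = feed_qps * 50 * KB / GB

print(f"feed QPS ~{feed_qps:,.0f}, post QPS ~{post_qps:,.0f}")
print(f"profiles {profile_storage_tb} TB, edges {edge_storage_tb} TB, "
      f"posts {post_storage_tb:.2f} TB, feed egress {feed_bandwidth_gb_s:.2f} GB/s")
```

The unrounded feed rate is about 34.7k QPS, which gives roughly 1.74 GB/s; the 1.75 GB/s figure comes from rounding to 35k QPS first. These are averages, so peak traffic would need extra headroom.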

Blueprint

This design focuses on a microservices architecture optimized for high read-to-write ratios. The core is the Connection Service using a Graph Database for relationship management and a Feed Service utilizing a hybrid fan-out strategy.
API Gateway: Entry point for Auth, Rate Limiting, and Routing.
Profile Service: Manages professional identity data (RDBMS).
Connection Service: Manages the professional graph (Graph DB).
Post Service: Handles content creation and storage.
Feed Service: Aggregates content from connections to build the user's feed.
Search Service: Provides full-text search capabilities (Elasticsearch).
Simplicity Audit: We use a standard RDBMS for profiles to ensure data integrity and a specialized Graph DB for connections to avoid complex SQL recursive joins. We use Redis for feed caching to meet latency targets.
Architecture Decision Rationale:
Why Graph DB?: Social networks are inherently graph-based. Relational databases struggle with "N-degree" depth queries.
Functional Satisfaction: Covers profiles, networking, and content.
Non-functional Satisfaction: Redis-based feed pre-computation keeps the common read path (cache hit) within the sub-200ms latency target.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Traffic Routing: Latency-based DNS routing to the nearest regional data center.
API Gateway:
Handles JWT-based authentication.
Implements leaky bucket rate limiting (100 requests/sec per user).
SSL Termination.
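A minimal sketch of the leaky bucket limiter mentioned above, in its "meter" form: the bucket drains at a fixed rate and a request is rejected when it would overflow. The 100 requests/sec rate comes from this section; the `LeakyBucket` class, the burst capacity of 100, and the injectable clock are assumptions made for illustration. A gateway would keep one bucket per user ID.

```python
import time


class LeakyBucket:
    """Leaky bucket used as a meter: one unit per request, drained at a fixed rate."""

    def __init__(self, rate_per_sec=100.0, capacity=100.0, clock=time.monotonic):
        self.rate = rate_per_sec    # units drained per second
        self.capacity = capacity    # assumed burst size; not specified in the design
        self.clock = clock          # injectable so the limiter can be tested
        self.level = 0.0
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Drain the bucket for the time elapsed since the previous call.
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level + 1.0 <= self.capacity:
            self.level += 1.0
            return True
        return False  # bucket would overflow: reject (e.g. HTTP 429)


# Usage with a fake clock: 150 requests arrive at once, then 150 more after 0.5 s.
fake_now = [0.0]
bucket = LeakyBucket(rate_per_sec=100.0, capacity=100.0, clock=lambda: fake_now[0])
first_burst = sum(bucket.allow() for _ in range(150))
fake_now[0] = 0.5
second_burst = sum(bucket.allow() for _ in range(150))
print(first_burst, second_burst)
```

The first burst fills the bucket, and after half a second only the drained half is available again, which is what caps the sustained rate at 100 requests/sec.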

Service

Profile Service:
Stateless REST API.
Scaling: Horizontal scaling based on CPU usage.
API: GET /v1/profiles/{id}, PUT /v1/profiles/{id}.
Connection Service:
Manages "Follow" and "Connect" actions.
API: POST /v1/connections/request, GET /v1/connections/{id}/degrees/2.
Feed Service:
Pulls pre-computed feed IDs from Redis.
Hydrates IDs with post content from Post Service.
Fallback: If the cached feed is empty, perform a "Pull": fetch the user's 1st-degree connections from the Connection Service and read their recent posts from the Post DB.
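The Feed Service read path (cached IDs, hydration, pull fallback) might look roughly like this. Plain dictionaries stand in for Redis, the Connection Service, and the Post Service; `get_feed` and all parameter names are hypothetical.

```python
def get_feed(user_id, feed_cache, connections, posts_by_author, posts_by_id, limit=20):
    """Return hydrated posts for user_id, newest first."""
    post_ids = feed_cache.get(user_id)
    if not post_ids:
        # Cache miss: pull recent posts from every 1st-degree connection.
        candidates = []
        for author in connections.get(user_id, ()):
            candidates.extend(posts_by_author.get(author, ()))
        candidates.sort(key=lambda pid: posts_by_id[pid]["timestamp"], reverse=True)
        post_ids = candidates[:limit]
        feed_cache[user_id] = post_ids  # repopulate so the next read is a hit
    # Hydrate IDs into full posts, skipping any that have since been deleted.
    return [posts_by_id[pid] for pid in post_ids[:limit] if pid in posts_by_id]


# Hypothetical in-memory data standing in for the backing services.
posts_by_id = {
    "p1": {"post_id": "p1", "author_id": "bob", "timestamp": 100},
    "p2": {"post_id": "p2", "author_id": "carol", "timestamp": 200},
    "p3": {"post_id": "p3", "author_id": "bob", "timestamp": 300},
}
posts_by_author = {"bob": ["p1", "p3"], "carol": ["p2"]}
connections = {"alice": ["bob", "carol"]}
feed_cache = {}

feed = get_feed("alice", feed_cache, connections, posts_by_author, posts_by_id)
print([post["post_id"] for post in feed])
```

The first call takes the slow pull path and fills the cache; later calls only hydrate the cached IDs, which is the path the latency target depends on.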

Storage

Profile DB (PostgreSQL):
Table: Users (user_id PK, name, bio, experience_json, location).
Indexing: B-Tree on user_id.
Connection DB (Neo4j/AWS Neptune):
Nodes: User.
Edges: CONNECTED_TO (bidirectional), FOLLOWS (unidirectional).
Post DB (Cassandra):
Chosen for high write throughput.
Table: Posts (post_id, author_id, content, timestamp).
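Primary Key: author_id as the partition key (allows fast retrieval of a single user's posts), with timestamp and post_id as clustering columns so that one author's partition can hold many posts ordered by time.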

Cache

Feed Cache (Redis):
Purpose: Store the list of post_ids for a user's feed.
Schema: Key: feed:{user_id}, Value: ZSET (score: timestamp, member: post_id).
TTL: 72 hours.
Eviction: LRU.
Failure Handling: If Redis fails, the system falls back to querying the Post DB for 1st-degree connections (slower but functional).
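To show what the feed ZSET holds, here is an in-memory stand-in that mimics the relevant sorted-set behavior. The `FeedCache` class and the `max_entries` cap are assumptions (the design above specifies only the key format, TTL, and eviction policy); the comments name the Redis commands each step corresponds to.

```python
class FeedCache:
    """In-memory stand-in for per-user Redis sorted sets keyed by feed:{user_id}."""

    def __init__(self, max_entries=500):
        self.max_entries = max_entries  # assumed cap on feed length
        self._feeds = {}                # key -> {post_id: timestamp}

    def add(self, user_id, post_id, timestamp):
        feed = self._feeds.setdefault(f"feed:{user_id}", {})
        # Like ZADD: re-adding an existing member just updates its score.
        feed[post_id] = timestamp
        if len(feed) > self.max_entries:
            # Like ZREMRANGEBYRANK: drop the lowest-scored (oldest) entries.
            overflow = len(feed) - self.max_entries
            for old_id in sorted(feed, key=feed.get)[:overflow]:
                del feed[old_id]

    def latest(self, user_id, count=20):
        feed = self._feeds.get(f"feed:{user_id}", {})
        # Like ZREVRANGE: highest score (newest post) first.
        return sorted(feed, key=feed.get, reverse=True)[:count]


cache = FeedCache(max_entries=3)
for ts, post_id in enumerate(["p1", "p2", "p3", "p4"], start=1):
    cache.add("u1", post_id, ts)
print(cache.latest("u1"))
```

Capping each feed keeps memory per user bounded, so the 72-hour TTL and LRU eviction mainly reclaim feeds of users who have stopped reading.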

Messaging

Fanout Message Queue (Kafka):
Purpose: When a post is created, the Post Service produces a message.
Topic: post-events.
Consumers: Fanout Workers.
Strategy:
For regular users (< 5k connections), push the post_id to the feed cache of every connection.
For "Celebrities" (> 5k connections), do NOT fan out. Instead, the Feed Service will "Pull" celebrity posts and merge them into the cached feed at read-time.

Data Processing

Feed Fanout Workers:
Subscribes to post-events.
Queries Connection Service for 1st-degree connections.
Updates Redis ZSETs for those connections.
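The hybrid strategy (push for regular authors, pull and merge for celebrities) can be sketched as follows. Dictionaries stand in for Kafka events, the Connection Service, and Redis; the function names are hypothetical, and treating exactly 5,000 connections as a regular user is an assumption, since the design only gives "< 5k" and "> 5k".

```python
CELEBRITY_THRESHOLD = 5_000  # from the design; the boundary case is an assumption


def handle_post_event(event, connections, feed_cache, celebrity_posts):
    """Fanout worker step for one post-events message. Returns the path taken."""
    author = event["author_id"]
    entry = (event["timestamp"], event["post_id"])
    audience = connections.get(author, [])
    if len(audience) > CELEBRITY_THRESHOLD:
        # Too many recipients: store once and let readers pull it.
        celebrity_posts.setdefault(author, []).append(entry)
        return "pull"
    for user_id in audience:
        feed_cache.setdefault(user_id, []).append(entry)
    return "push"


def read_feed(user_id, feed_cache, followed_celebrities, celebrity_posts, limit=20):
    """Merge the pre-computed feed with celebrity posts at read time."""
    entries = list(feed_cache.get(user_id, []))
    for celebrity in followed_celebrities.get(user_id, []):
        entries.extend(celebrity_posts.get(celebrity, []))
    entries.sort(reverse=True)  # newest timestamp first
    return [post_id for _, post_id in entries[:limit]]


connections = {
    "bob": ["alice", "carol"],
    "star": ["alice"] + [f"u{i}" for i in range(5_000)],  # 5,001 connections
}
feed_cache, celebrity_posts = {}, {}
path_bob = handle_post_event(
    {"post_id": "p1", "author_id": "bob", "timestamp": 1},
    connections, feed_cache, celebrity_posts)
path_star = handle_post_event(
    {"post_id": "p2", "author_id": "star", "timestamp": 2},
    connections, feed_cache, celebrity_posts)
alice_feed = read_feed("alice", feed_cache, {"alice": ["star"]}, celebrity_posts)
print(path_bob, path_star, alice_feed)
```

The celebrity post costs one write instead of 5,001, at the price of an extra lookup and merge on every read by their audience.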

Wrap Up

Advanced Topics

Trade-offs (Push vs Pull):
Push (Fan-out on Write): Fast reads, slow writes, high storage overhead. Great for most users.
Pull (Fan-out on Read): Fast writes, slow reads. Better for celebrities.
Decision: Hybrid approach to balance the load.
Bottleneck Analysis:
Hot Shards: High-profile users (e.g., Bill Gates) cause read hotspots. Mitigation: Cache their posts globally.
Graph Depth: Calculating 3rd-degree connections is expensive. Mitigation: Limit feed to 1st degree and "Suggested" to 2nd degree.
Reliability:
Use a dead-letter topic (the DLQ pattern) in Kafka to capture failed feed updates for retry and inspection.
Circuit breakers (Hystrix/Resilience4j) on the Connection Service to prevent cascading failures during feed hydration.
Security:
Data at rest encryption for PII in Profile DB.
RBAC (Role-Based Access Control) for internal admin dashboards.
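For the circuit breaker listed under Reliability, the core state machine is small enough to sketch directly. This is not the Hystrix or Resilience4j API; `CircuitBreaker`, `CircuitOpenError`, the failure threshold, and the reset timeout are all illustrative assumptions.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable for testing
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; call skipped")
            # Timeout elapsed: half-open, so let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # Open on reaching the threshold, or re-open after a failed trial call.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def fetch_connections_or_empty(breaker, fetch, user_id):
    """Degrade to an empty list instead of failing the whole feed request."""
    try:
        return breaker.call(fetch, user_id)
    except Exception:
        return []
```

During feed hydration, the Feed Service could wrap its Connection Service client this way so that a struggling dependency is skipped quickly rather than tying up request threads.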