The Question

Design a Professional Social Network Feed and Connection System

Design the core architecture for a professional networking platform like LinkedIn. The system must support user profiles, a bidirectional connection graph (1st and 2nd degree), and a highly scalable news feed. Focus on how you would handle a scale of 300 million daily active users, including strategies for feed generation (push vs. pull), handling high-profile 'power users', and ensuring low-latency retrieval of content from a user's professional network. Explain your choice of data stores for both profile metadata and the connection graph.

PostgreSQL

Neo4j

Cassandra

Redis

Kafka

Elasticsearch

JWT

Kubernetes

Questions & Insights

Clarifying Questions

Scale and Traffic: What is the expected scale (DAU/MAU)? Assumption: 300M DAU, 800M total users.

Core Features: Which features are critical for the MVP? Assumption: User profiles, connection graph (1st/2nd degree), post creation, and a personalized news feed.

Feed Type: Is the feed chronological or ranked? Assumption: A hybrid ranked feed where recent activity from 1st-degree connections is prioritized.

Search: Do we need full-text search? Assumption: Yes, for people and posts.

Consistency: Is strong consistency required for connections? Assumption: Eventual consistency is acceptable for the feed and connection counts, but strong consistency is preferred for profile edits.

Thinking Process

The Connection Bottleneck: How do we efficiently query a graph of 800M nodes and billions of edges to generate a feed?

The Feed Fan-out Challenge: Should we use "Push" (Fan-out on Write) or "Pull" (Fan-out on Read)?

Hybrid Retrieval Strategy: How do we handle "Power Users" (celebrities with millions of followers) without crashing the fan-out workers?

Graph Traversal: Implementing "People You May Know" using 2nd-degree connection paths.

Bonus Points

Graph Partitioning: Discussing how to shard a graph database (e.g., using consistent hashing on UserID) to minimize cross-shard joins.

Feed Pre-computation Strategy: Using a "Read-ahead" buffer for active users to keep the P99 latency low.

TAO (The Associations and Objects): Mentioning graph caching layers in the style of Facebook's TAO for high-performance edge lookups.

Geo-sharding: Placing profile data and feed caches closer to the user's geographic region to reduce RTT.
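
The graph partitioning point above can be illustrated with a small consistent-hash ring. This is a minimal sketch, not a production router: the `HashRing` class, the shard names, and the choice of 100 virtual nodes per shard are all assumptions made for illustration. Hashing on UserID keeps a user's own adjacency list on a single shard, so a 1st-degree lookup is a single-shard read.

```python
import bisect
import hashlib


class HashRing:
    """Hypothetical consistent-hash ring mapping a user ID to a graph shard."""

    def __init__(self, shards, vnodes=100):
        # Each shard is placed on the ring many times (virtual nodes)
        # so that keys spread evenly across shards.
        self._ring = sorted(
            (self._hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        # First 8 bytes of SHA-256 as an integer position on the ring.
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def shard_for(self, user_id: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's hash,
        # wrapping around at the end of the ring.
        idx = bisect.bisect(self._keys, self._hash(user_id)) % len(self._keys)
        return self._ring[idx][1]


ring = HashRing(["shard-0", "shard-1", "shard-2"])
bigger = HashRing(["shard-0", "shard-1", "shard-2", "shard-3"])
users = [f"user-{i}" for i in range(1000)]
moved = [u for u in users if ring.shard_for(u) != bigger.shard_for(u)]
```

Adding a fourth shard should relocate only the keys that now belong to the new shard, which is the property that makes resharding cheaper than with plain modulo hashing.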

Design Breakdown

Functional Requirements

Core Use Cases:

Users can create and update professional profiles.

Users can send/accept connection requests (bidirectional).

Users can create posts (text, images).

Users can view a news feed of 1st-degree connections' activity.

Scope Control:

In-scope: Profile, Graph, Feed, Search.

Out-of-scope: Messaging (IM), Job Board, Premium Subscriptions, Ads engine.

Non-Functional Requirements

Scale: 300M DAU, 800M total users, and hundreds of billions of connection edges (see Estimation).

Latency: Feed loading < 200ms; Profile updates < 500ms.

Availability: 99.99% (Highly available for feed consumption).

Consistency: Eventual consistency for the feed (seconds of lag is okay); Strong consistency for profile changes.

Fault Tolerance: Region-level failover for the read-heavy feed service.

Estimation

Traffic Estimation:

300M DAU.

Feed Views: 10/day/user = 3B views/day ≈ 35k QPS.

Posts: 1% of users post daily = 3M posts/day ≈ 35 QPS.

Storage Estimation:

Profiles: 1B * 10 KB = 10 TB.

Graph Edges: 1B users * 500 connections * 16 bytes = 8 TB.

Posts (Text): 3M/day * 1 KB * 365 days * 3 years ≈ 3.3 TB.

Bandwidth Estimation:

Outgoing (Feed): 35k QPS * 50 KB (feed metadata) ≈ 1.75 GB/s.
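
The estimates above can be re-derived with a few lines of arithmetic. This sketch assumes decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes) and uses the same rounded inputs as the text, including 1B profiles as a round-up of the 800M total users; the variable names are illustrative.

```python
SECONDS_PER_DAY = 86_400

dau = 300_000_000

# Traffic
feed_qps = dau * 10 / SECONDS_PER_DAY        # 10 feed views per user per day
posts_per_day = dau // 100                   # 1% of users post daily
post_qps = posts_per_day / SECONDS_PER_DAY

# Storage, in TB (decimal units)
profiles_tb = 1_000_000_000 * 10_000 / 1e12          # 1B profiles * 10 KB
edges_tb = 1_000_000_000 * 500 * 16 / 1e12           # 1B users * 500 edges * 16 B
posts_tb = posts_per_day * 1_000 * 365 * 3 / 1e12    # 1 KB posts kept for 3 years

# Bandwidth, in GB/s, using the rounded 35k QPS figure
feed_gb_per_s = 35_000 * 50_000 / 1e9                # 50 KB of feed metadata

print(f"feed ≈ {feed_qps:,.0f} QPS, posts ≈ {post_qps:.1f} QPS")
print(f"profiles {profiles_tb} TB, edges {edges_tb} TB, posts {posts_tb:.2f} TB")
print(f"feed egress {feed_gb_per_s} GB/s")
```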

Blueprint

This design focuses on a microservices architecture optimized for high read-to-write ratios. The core is the Connection Service using a Graph Database for relationship management and a Feed Service utilizing a hybrid fan-out strategy.

API Gateway: Entry point for Auth, Rate Limiting, and Routing.

Profile Service: Manages professional identity data (RDBMS).

Connection Service: Manages the professional graph (Graph DB).

Post Service: Handles content creation and storage.

Feed Service: Aggregates content from connections to build the user's wall.

Search Service: Provides full-text search capabilities (Elasticsearch).

Simplicity Audit: We use a standard RDBMS for profiles to ensure data integrity and a specialized Graph DB for connections to avoid complex SQL recursive joins. We use Redis for feed caching to meet latency targets.

Architecture Decision Rationale:

Why Graph DB?: Social networks are inherently graph-based. Relational databases struggle with "N-degree" depth queries.

Functional Satisfaction: Covers profiles, networking, and content.

Non-functional Satisfaction: Redis-based feed pre-computation reduces the read path to a cache lookup plus hydration, which is how the design targets sub-200ms feed latency.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Traffic Routing: Latency-based DNS routing to the nearest regional data center.

API Gateway:

Handles JWT-based authentication.

Implements leaky bucket rate limiting (100 requests/sec per user).

SSL Termination.
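
The leaky bucket limit mentioned above could look roughly like the following. It is a sketch of the "leaky bucket as a meter" variant under stated assumptions: the `LeakyBucket` class is hypothetical, one instance would be kept per user, and the capacity of 100 (the burst allowance) is an assumption chosen to match the 100 requests/sec rate.

```python
import time


class LeakyBucket:
    """Hypothetical per-user limiter: the bucket drains at a fixed rate."""

    def __init__(self, rate_per_sec: float, capacity: float, clock=time.monotonic):
        self.rate = rate_per_sec      # how fast the bucket leaks
        self.capacity = capacity      # how much burst it can hold
        self.clock = clock            # injectable for testing
        self.level = 0.0
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Drain the bucket for the time elapsed since the last request.
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level + 1 > self.capacity:
            return False              # bucket would overflow: reject
        self.level += 1               # each accepted request adds one unit
        return True


# Usage with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
bucket = LeakyBucket(rate_per_sec=100, capacity=100, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(101)]   # 100 accepted, the 101st rejected
fake_now[0] = 0.5                               # half a second later, 50 units drained
after_wait = [bucket.allow() for _ in range(51)]
```

In a gateway the bucket state would typically live in a shared store rather than in process memory, so that all gateway instances see the same counters.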

Service

Profile Service:

Stateless REST API.

Scaling: Horizontal scaling based on CPU usage.

API: GET /v1/profiles/{id}, PUT /v1/profiles/{id}.

Connection Service:

Manages "Follow" and "Connect" actions.

API: POST /v1/connections/request, GET /v1/connections/{id}/degrees/2.
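
To make the 2nd-degree endpoint concrete, here is a minimal in-memory sketch of the traversal it implies. The adjacency-set representation and the `second_degree` function are assumptions for illustration; in the design above the same question would be answered by the graph database. Counting mutual connections also gives a simple ranking signal for "People You May Know".

```python
from collections import Counter


def second_degree(graph: dict[str, set[str]], user: str) -> Counter:
    """Return 2nd-degree connections of `user` with their mutual-connection counts."""
    first = graph.get(user, set())
    counts: Counter = Counter()
    for friend in first:
        for candidate in graph.get(friend, set()):
            # Skip the user themselves and anyone already connected directly.
            if candidate != user and candidate not in first:
                counts[candidate] += 1
    return counts


# Bidirectional connections are stored in both directions.
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}
suggestions = second_degree(graph, "alice").most_common()
# dave shares two mutual connections with alice, erin shares one.
```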

Feed Service:

Pulls pre-computed feed IDs from Redis.

Hydrates IDs with post content from Post Service.

Fallback: On a cache miss (no feed entry in Redis), fetch the user's 1st-degree connections from the Connection Service and "Pull" their recent posts from the Post DB.
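
The read path described above can be sketched with plain dictionaries standing in for Redis, the Connection Service, and the Post Service. Everything here is hypothetical scaffolding: `load_feed` and its parameters are illustrative names, and feed entries are modelled as `(timestamp, post_id)` tuples.

```python
def load_feed(user_id, feed_cache, connections, posts_by_author, posts, limit=20):
    """Return hydrated posts for a user's feed, newest first."""
    entries = feed_cache.get(user_id)
    if not entries:
        # Fallback "Pull": gather recent posts from every 1st-degree connection.
        pulled = [
            entry
            for author in connections.get(user_id, ())
            for entry in posts_by_author.get(author, ())
        ]
        entries = sorted(pulled, reverse=True)[:limit]
        feed_cache[user_id] = entries          # repopulate the cache for next time
    newest_first = sorted(entries, reverse=True)[:limit]
    # Hydration: swap post IDs for full post content.
    return [posts[post_id] for _, post_id in newest_first]


posts = {
    "p1": {"author": "bob", "text": "We are hiring"},
    "p2": {"author": "carol", "text": "New role announcement"},
}
posts_by_author = {"bob": [(100, "p1")], "carol": [(200, "p2")]}
connections = {"alice": ["bob", "carol"]}
feed_cache = {}

feed = load_feed("alice", feed_cache, connections, posts_by_author, posts)
```

The first call for `alice` takes the slow path and fills the cache; later calls are served from `feed_cache` directly.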

Storage

Profile DB (PostgreSQL):

Table: Users (user_id PK, name, bio, experience_json, location).

Indexing: B-Tree on user_id.

Connection DB (Neo4j/AWS Neptune):

Nodes: User.

Edges: CONNECTED_TO (bidirectional), FOLLOWS (unidirectional).

Post DB (Cassandra):

Chosen for high write throughput.

Table: Posts (post_id, author_id, content, timestamp).

Partition Key: author_id (allows fast retrieval of a single user's posts).

Cache

Feed Cache (Redis):

Purpose: Store the list of post_ids for a user's feed.

Schema: Key: feed:{user_id}, Value: ZSET of post_id members scored by timestamp.

TTL: 72 hours.

Eviction: LRU.

Failure Handling: If Redis fails, the system falls back to querying the Post DB for 1st-degree connections (slower but functional).

Messaging

Fanout Message Queue (Kafka):

Purpose: When a post is created, the Post Service produces a message.

Topic: post-events.

Consumers: Fanout Workers.

Strategy:

For regular users (up to 5k connections), push the post_id to every connection's feed cache.

For "Celebrities" (> 5k connections), do NOT fan out. Instead, the Feed Service will "Pull" celebrity posts and merge them at read-time.

Data Processing

Feed Fanout Workers:

Subscribes to post-events.

Queries Connection Service for 1st-degree connections.

Updates Redis ZSETs for those connections.
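
Putting the fan-out strategy and the worker steps together, a hybrid push/pull flow might be sketched as follows. This is an in-memory illustration only: dictionaries stand in for Kafka consumers, the Connection Service, and Redis ZSETs, and the names `fan_out`, `read_feed`, and `CELEBRITY_THRESHOLD` are assumptions rather than part of the design above.

```python
CELEBRITY_THRESHOLD = 5_000   # the 5k cut-off from the strategy above


def fan_out(post_id, author_id, timestamp, connections, feed_cache, celebrity_posts):
    """Write path: push to connection feeds unless the author is a celebrity."""
    audience = connections.get(author_id, ())
    if len(audience) > CELEBRITY_THRESHOLD:
        # No fan-out: keep the post in a per-author list to be pulled at read time.
        celebrity_posts.setdefault(author_id, []).append((timestamp, post_id))
        return 0
    for member in audience:
        feed_cache.setdefault(member, []).append((timestamp, post_id))
    return len(audience)


def read_feed(user_id, followed_celebrities, feed_cache, celebrity_posts, limit=20):
    """Read path: merge the pre-computed feed with pulled celebrity posts."""
    entries = list(feed_cache.get(user_id, ()))
    for celebrity in followed_celebrities:
        entries.extend(celebrity_posts.get(celebrity, ()))
    return [post_id for _, post_id in sorted(entries, reverse=True)[:limit]]


connections = {
    "bob": ["alice"],
    "star": ["alice"] + [f"user-{i}" for i in range(5_000)],   # 5,001 connections
}
feed_cache, celebrity_posts = {}, {}

pushed_bob = fan_out("p1", "bob", 100, connections, feed_cache, celebrity_posts)
pushed_star = fan_out("p2", "star", 200, connections, feed_cache, celebrity_posts)
alice_feed = read_feed("alice", ["star"], feed_cache, celebrity_posts)
```

The write for `star` touches one list instead of 5,001 caches, at the cost of an extra merge on every read by a user who follows `star`.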

Wrap Up

Advanced Topics

Trade-offs (Push vs Pull):

Push (Fan-out on Write): Fast reads, slow writes, high storage overhead. Great for most users.

Pull (Fan-out on Read): Fast writes, slow reads. Better for celebrities.

Decision: Hybrid approach to balance the load.

Bottleneck Analysis:

Hot Shards: High-profile users (e.g., Bill Gates) cause read hotspots. Mitigation: Cache their posts globally.

Graph Depth: Calculating 3rd-degree connections is expensive. Mitigation: Limit feed to 1st degree and "Suggested" to 2nd degree.

Reliability:

Use of a dead-letter topic (the DLQ pattern) in Kafka to park failed feed updates for retry or inspection.

Circuit breakers (Hystrix/Resilience4j) on the Connection Service to prevent cascading failures during feed hydration.
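
The circuit breaker idea above can be shown in a few lines. This is a simplified sketch and not how Hystrix or Resilience4j are implemented: the `CircuitBreaker` class, its thresholds, and the fallback convention are all assumptions for illustration.

```python
import time


class CircuitBreaker:
    """Hypothetical breaker: open after N consecutive failures, retry after a timeout."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock            # injectable for testing
        self.failures = 0
        self.opened_at = None         # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()     # open: fail fast without calling the dependency
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()   # open (or re-open) the circuit
            return fallback()
        self.failures = 0
        self.opened_at = None         # success closes the circuit
        return result


fake_now = [0.0]
attempts = []


def failing_lookup():
    attempts.append(1)
    raise RuntimeError("connection service unavailable")


breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: fake_now[0])
first = breaker.call(failing_lookup, lambda: "cached")
second = breaker.call(failing_lookup, lambda: "cached")   # threshold reached: circuit opens
third = breaker.call(failing_lookup, lambda: "cached")    # short-circuited, no real call
fake_now[0] = 11.0
recovered = breaker.call(lambda: "live", lambda: "cached")  # trial call succeeds
```

While the circuit is open the caller gets the fallback immediately, which keeps request threads from piling up behind a struggling dependency.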

Security:

Data at rest encryption for PII in Profile DB.

RBAC (Role-Based Access Control) for internal admin dashboards.