The Question
Design

Design a Scalable Photo-Sharing Platform (Instagram)

Design a system similar to Instagram that allows users to upload photos, follow other users, and view a personalized feed of content from their social graph. The system must support 100 million daily active users (DAU) and provide low-latency feed retrieval. Discuss how you would handle the massive read-to-write ratio, manage large-scale media storage, and solve the 'Celebrity' fan-out problem while ensuring high availability and system reliability.
Tags: PostgreSQL, Redis, S3, Kafka, CDN, ZSET, Signed URLs, Microservices
Questions & Insights

Clarifying Questions

Scale: What is the target scale for the MVP?
Assumption: 100 Million Daily Active Users (DAU), 1 photo upload per user/day average, 50-100 feed views per user/day.
Media Support: Do we need to support video and stories for the MVP?
Assumption: Focus strictly on high-resolution photos and captions to ensure simplicity and fast time-to-market (YAGNI for video).
Feed Logic: Should the feed be algorithmic or chronological?
Assumption: Reverse-chronological for the MVP to minimize complex data science/ML infrastructure.
Interactions: Are likes and comments in-scope?
Assumption: Yes, but strictly as atomic interactions. Search and Discovery (Explore) are out-of-scope for the MVP.
Follower Model: Is there a limit on the number of followers?
Assumption: We must handle "Celebrity" (Hot-key) issues where a user has millions of followers.

Thinking Process

Core Bottleneck: The primary challenge is the read-heavy workload (roughly 50:1 to 100:1 reads to writes, given 50-100 feed views per upload) and the feed-generation latency for users following thousands of accounts.
Key Questions:
How do we store and serve massive amounts of binary media efficiently?
Should we use a "Push" (Fan-out) or "Pull" model for feed delivery?
How do we prevent system degradation when "Celebrities" post new content?
How do we ensure high availability for feed viewing while maintaining consistency for the follow graph?

Bonus Points

Hybrid Feed Architecture: Implement a "Push" model for regular users and a "Pull" model for celebrities to prevent "Fan-out storms" and write amplification.
Edge Intelligence: Using Lambda@Edge or Cloudflare Workers for on-the-fly image resizing and format conversion (WebP/AVIF) to reduce egress costs.
Storage Tiering: Implementing a TTL-based or LRU-based migration from SSD-backed storage to cold S3 Glacier for historical photos (1+ year old).
Consistency vs. Availability: Using Eventual Consistency for "Likes" counters but Strong Consistency for the "Follow" relationship to prevent UX glitches.
Design Breakdown

Functional Requirements

Core Use Cases:
Users can upload photos with captions.
Users can follow/unfollow other users.
Users can view a reverse-chronological feed of photos from people they follow.
Users can like/comment on photos.
Scope Control:
In-scope: Image upload, Feed generation, Follow system, Likes/Comments.
Out-of-scope: Direct Messaging (DM), Stories, Reels/Video, Explore/Search, Photo Editing filters (handled client-side).

Non-Functional Requirements

Scale: Support 100M DAU and storage for billions of images.
Latency: Feed loading should be < 200ms (P95); Image upload should be < 2s for 5MB files.
Availability & Reliability: 99.9% availability (SLA); No data loss for uploaded photos.
Consistency: Eventual consistency for feed and likes; high consistency for follow relationships.
Security & Privacy: Private vs. Public profile support; secure media delivery via Signed URLs.

Estimation

Traffic Estimation:
Writes (Uploads): 100M users * 1 photo/day ≈ 1,150 QPS.
Reads (Feed): 100M users * 50 views/day ≈ 58,000 QPS.
Storage Estimation:
Image Storage: 100M photos/day * 2MB/photo (including thumbnails) = 200 TB/day.
Metadata: 100M rows/day * 1KB/row ≈ 100 GB/day.
Bandwidth Estimation:
Ingress: 1,150 QPS * 2MB ≈ 2.3 GB/s.
Egress: 58,000 QPS * 10 photos/page * 200KB/thumbnail ≈ 116 GB/s.
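The estimates above can be reproduced with a few lines of arithmetic. This is a minimal sketch using only the assumptions already stated in this section (100M DAU, 1 upload and 50 feed views per user per day, 2MB per photo, 1KB per metadata row, 10 thumbnails of 200KB per feed page) and decimal units (1 GB = 1,000 MB); the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

dau = 100_000_000
uploads_per_user_per_day = 1
feed_views_per_user_per_day = 50
photo_size_mb = 2          # per photo, including thumbnails
metadata_row_kb = 1
photos_per_feed_page = 10
thumbnail_size_kb = 200

# Traffic
write_qps = dau * uploads_per_user_per_day / SECONDS_PER_DAY
read_qps = dau * feed_views_per_user_per_day / SECONDS_PER_DAY

# Storage per day (MB -> TB and KB -> GB are both a factor of 1,000,000)
image_storage_tb_per_day = dau * uploads_per_user_per_day * photo_size_mb / 1_000_000
metadata_gb_per_day = dau * uploads_per_user_per_day * metadata_row_kb / 1_000_000

# Bandwidth
ingress_gb_per_s = write_qps * photo_size_mb / 1_000
egress_gb_per_s = read_qps * photos_per_feed_page * thumbnail_size_kb / 1_000_000

print(f"write QPS: {write_qps:,.0f}, read QPS: {read_qps:,.0f}")
print(f"images: {image_storage_tb_per_day:,.0f} TB/day, metadata: {metadata_gb_per_day:,.0f} GB/day")
print(f"ingress: {ingress_gb_per_s:.2f} GB/s, egress: {egress_gb_per_s:.2f} GB/s")
```

The unrounded results (about 1,157 write QPS, 57,870 read QPS, and 115.7 GB/s egress) agree with the rounded figures used in the estimate.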

Blueprint

Concise Summary: A microservices-based architecture utilizing a hybrid feed delivery system (Push for regular users, Pull for celebrities) with S3 for media and PostgreSQL for relational metadata.
Major Components:
API Gateway: Handles authentication, rate limiting, and request routing.
Media Service: Manages photo uploads, generates thumbnails, and interfaces with S3.
Feed Service: Aggregates posts from followed users and serves them to the client.
Follow Service: Manages the social graph (User A follows User B).
Notification Service: Handles async tasks like push notifications for likes/follows.
Simplicity Audit: Avoids complex ML-based ranking and heavy data processing (Spark) in favor of simple Redis-based caching and relational indexing for the MVP.
Architecture Decision Rationale:
Why this?: Sharded PostgreSQL provides ACID for the follow graph, while Redis provides the low-latency required for feed retrieval.
Functional: Meets all core posting and viewing needs.
Non-functional: Scalable via horizontal sharding and highly available via CDN-cached media.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing:
Use Amazon CloudFront to cache images globally.
Use signed URLs (S3 presigned URLs at the origin, or CloudFront signed URLs when serving through the CDN) for secure, time-limited access to private photos.
Security & Perimeter:
API Gateway (Kong/Envoy): Handles OAuth2/JWT validation.
Rate Limiting: Fixed-window limits at the User ID level to prevent scraping/spam.
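A fixed-window limiter keyed by user ID can be sketched as below. This is an in-memory illustration only: the class name, the limit, and the window length are assumptions, and a real deployment would more likely keep the counters in a shared store such as Redis so that all gateway instances see the same counts.

```python
import time


class FixedWindowRateLimiter:
    """Allows at most `limit` requests per user in each fixed time window."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock          # injectable so the limiter can be tested
        self._state = {}            # user_id -> (window_index, request_count)

    def allow(self, user_id):
        window = int(self.clock() // self.window_seconds)
        last_window, count = self._state.get(user_id, (window, 0))
        if last_window != window:
            count = 0               # a new window started: reset the counter
        if count >= self.limit:
            self._state[user_id] = (window, count)
            return False            # over the limit for this window
        self._state[user_id] = (window, count + 1)
        return True
```

A known weakness of fixed windows is that a client can send up to twice the limit in a short burst that straddles a window boundary; a sliding window or token bucket avoids this at the cost of extra state.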

Service

Topology & Scaling:
Stateless services deployed in multiple Availability Zones (AZs).
Auto-scaling based on CPU (for Image Processing) and Request Count (for Feed Service).
API Schema Design:
POST /v1/media/upload: Protocol: Multipart/REST. Returns media_id.
GET /v1/feed: Protocol: REST. Request: limit, cursor. Returns: List<Post>.
POST /v1/follow/{user_id}: Idempotent. Protocol: REST.
Resilience & Reliability:
Circuit Breaker: Used in Feed Service to fall back to a "Global Popular Feed" if the personalized cache is unavailable.
Retries: Exponential backoff for media uploads.
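The circuit-breaker fallback described above could look roughly like this. It is a simplified sketch, not a production library: the class, the failure threshold, and the reset timeout are hypothetical, and `primary` / `fallback` stand in for "read the personalized feed cache" and "return the Global Popular Feed".

```python
import time


class CircuitBreaker:
    """Skips a failing dependency for `reset_timeout` seconds after repeated errors."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, primary, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()   # open: do not touch the failing dependency
            # Timeout elapsed (half-open): let one trial call through below.
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()   # open (or re-open) the circuit
            return fallback()
        self.failures = 0           # success closes the circuit again
        self.opened_at = None
        return result
```

While the circuit is open, feed requests are answered from the fallback without waiting on timeouts against the unavailable cache, which keeps tail latency bounded during an outage.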

Storage

Access Pattern:
Metadata: High read/write (SQL).
Media: Write once, Read many (Object Storage).
Database Table Design:
Table users: id (PK), username, created_at.
Table posts: id (PK), user_id (FK), media_url, caption, created_at.
Table follows: follower_id, followee_id (Composite PK).
Technical Selection: PostgreSQL with sharding on user_id. Rationale: Relational integrity is vital for the follow graph.
Distribution Logic: Hash-based sharding to prevent hotspots, though "Celebrity" profiles stay on a single shard (mitigated via Cache).
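Hash-based routing on user_id can be illustrated as follows. This is a minimal sketch: the shard count of 64 and the function name are assumptions, and SHA-256 is used only because it is stable across processes (Python's built-in hash() is randomized per process for strings).

```python
import hashlib

NUM_SHARDS = 64  # hypothetical shard count


def shard_for(user_id, num_shards=NUM_SHARDS):
    """Maps a user_id to a shard index using a stable hash."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# All rows keyed by the same user_id land on the same shard.
print(shard_for(12345), shard_for(12345))
```

Plain modulo hashing spreads users evenly but remaps most keys when the shard count changes; consistent hashing or a lookup directory are common alternatives if resharding is expected.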

Cache

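Purpose & Justification: Building a feed from SQL at request time means querying many shards and merging the results, roughly O(N log M) work per request. Reading a page from a precomputed Redis Sorted Set costs O(log S + K) for a feed of S entries and a page of K posts, which is effectively constant for a capped feed.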
Key-Value Schema:
Key: feed:{user_id}.
Value: Redis Sorted Set (ZSET) storing post_id as member and timestamp as score.
Failure Handling: If Redis fails, the system performs a live query against PostgreSQL (degraded latency).
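The feed:{user_id} schema can be illustrated with an in-memory stand-in for the sorted set. This sketch does not talk to Redis; it only mirrors the behavior the design relies on (add with a timestamp score, read newest-first by score, trim old entries). The class name, the 500-entry cap, and the timestamp cursor are assumptions.

```python
import bisect


class InMemoryFeedCache:
    """Stand-in for Redis ZSETs keyed by feed:{user_id}, scored by timestamp.

    Unlike a real ZSET, this sketch does not deduplicate members.
    """

    def __init__(self, max_len=500):
        self.max_len = max_len
        self._feeds = {}  # key -> list of (timestamp, post_id), ascending

    def add(self, user_id, post_id, timestamp):
        feed = self._feeds.setdefault(f"feed:{user_id}", [])
        bisect.insort(feed, (timestamp, post_id))
        if len(feed) > self.max_len:
            del feed[: len(feed) - self.max_len]    # drop the oldest entries

    def page(self, user_id, limit=10, before=None):
        """Returns (post_ids newest-first, cursor for the next page)."""
        feed = self._feeds.get(f"feed:{user_id}", [])
        # Entries strictly older than `before`; (before,) sorts ahead of (before, id).
        end = len(feed) if before is None else bisect.bisect_left(feed, (before,))
        window = feed[max(0, end - limit):end]
        window.reverse()
        post_ids = [post_id for _, post_id in window]
        next_cursor = window[-1][0] if window else None
        return post_ids, next_cursor


cache = InMemoryFeedCache()
cache.add(user_id=1, post_id=101, timestamp=1000)
cache.add(user_id=1, post_id=102, timestamp=1010)
cache.add(user_id=1, post_id=103, timestamp=1020)
first_page, cursor = cache.page(user_id=1, limit=2)
second_page, _ = cache.page(user_id=1, limit=2, before=cursor)
```

A timestamp-only cursor can skip posts that share the exact timestamp at a page boundary; a cursor that combines timestamp and post_id avoids that.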

Messaging

Purpose & Decoupling: Decouples photo upload from feed updates and thumbnail generation.
Event Schema: PostCreatedEvent: {post_id, user_id, timestamp, media_url}.
Throughput & Partitioning: Kafka partitioned by user_id to ensure chronological processing for a specific user's posts.
Technical Selection: Kafka. Rationale: Persistence allows for "replaying" events if the Feed Cache needs to be rebuilt.
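The event schema and the per-user ordering guarantee can be sketched as follows. The field types and the to_record helper are assumptions, and CRC32 is only a stand-in for the partitioner: Kafka's default partitioner uses a different hash, but the principle is the same, namely that equal keys always map to the same partition, so one user's posts are consumed in order.

```python
import json
import zlib
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PostCreatedEvent:
    post_id: int
    user_id: int
    timestamp: int      # assumed to be epoch seconds
    media_url: str


def partition_for(user_id, num_partitions):
    """Stable key -> partition mapping (illustrative hash, not Kafka's)."""
    return zlib.crc32(str(user_id).encode("utf-8")) % num_partitions


def to_record(event, num_partitions):
    """Builds the key, value, and partition a producer would send."""
    return {
        "partition": partition_for(event.user_id, num_partitions),
        "key": str(event.user_id).encode("utf-8"),
        "value": json.dumps(asdict(event)).encode("utf-8"),
    }


first = to_record(PostCreatedEvent(1, 42, 1000, "https://example.com/a.jpg"), 12)
second = to_record(PostCreatedEvent(2, 42, 1001, "https://example.com/b.jpg"), 12)
same_partition = first["partition"] == second["partition"]
```

Ordering holds only within a partition, so events from different users may be processed in any relative order, which is acceptable for feed updates.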

Data Processing

Processing Model: Asynchronous Workers.
Processing DAG:
Trigger on S3 Upload.
Generate multiple resolutions (Thumb, Medium, High).
Write metadata to PostgreSQL.
Trigger Feed Fan-out (Push to followers' Redis ZSETs).
Technical Selection: Python/Celery or Go Workers. Simple and sufficient for MVP.
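The four pipeline steps can be sketched as one worker function. Everything here is illustrative: the target widths, the shape of the upload dict, and the plain dictionaries standing in for PostgreSQL and the Redis feed cache are assumptions, and actual image resizing is replaced by computing the output dimensions.

```python
# Hypothetical target widths in pixels for each generated variant.
RESOLUTIONS = {"thumb": 160, "medium": 640, "high": 1280}


def scaled_size(width, height, target_width):
    """Keeps the aspect ratio and never upscales."""
    if width <= target_width:
        return width, height
    return target_width, round(height * target_width / width)


def process_upload(upload, metadata_store, feeds, followers_of):
    """Runs steps 2-4 for one upload event (step 1 is the trigger itself)."""
    # Step 2: plan one variant per resolution.
    variants = {
        name: scaled_size(upload["width"], upload["height"], target)
        for name, target in RESOLUTIONS.items()
    }
    # Step 3: persist metadata (a dict stands in for the posts table).
    metadata_store[upload["post_id"]] = {
        "user_id": upload["user_id"],
        "created_at": upload["timestamp"],
        "variants": variants,
    }
    # Step 4: fan out to each follower's feed (a dict stands in for the ZSETs).
    for follower_id in followers_of.get(upload["user_id"], ()):
        feeds.setdefault(follower_id, []).append((upload["timestamp"], upload["post_id"]))
    return variants
```

Writing metadata before fanning out means a follower's feed never references a post_id that cannot yet be looked up.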

Infrastructure (Optional)

Observability:
Prometheus/Grafana for QPS and Latency.
ELK Stack for distributed log tracing via request_id.
Wrap Up

Advanced Topics

Trade-offs: We chose Eventual Consistency for Feed Fan-out. A user might not see their own post in their feed for a few hundred milliseconds, but the system stays highly available.
Fan-out Strategies:
Push: For users with < 50k followers, push the post_id to all follower caches at write time.
Pull: For "Celebrities" (≥ 50k followers), do not fan out on write. Instead, merge the celebrity's recent posts into the follower's feed at request time. Combining the two is the Hybrid approach.
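The hybrid strategy can be sketched in memory as follows. The class and method names are hypothetical, dictionaries stand in for the follow graph and feed caches, and treating an author at exactly the threshold as a celebrity is an assumption. The threshold is a constructor argument so the behavior can be exercised with small numbers.

```python
import heapq
from collections import defaultdict


class HybridFeed:
    def __init__(self, celebrity_threshold=50_000):
        self.celebrity_threshold = celebrity_threshold
        self.followers = defaultdict(set)   # author -> follower ids
        self.following = defaultdict(set)   # user -> author ids
        self.pushed = defaultdict(list)     # user -> [(timestamp, post_id)]
        self.posts = defaultdict(list)      # author -> [(timestamp, post_id)]

    def follow(self, follower, author):
        self.followers[author].add(follower)
        self.following[follower].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= self.celebrity_threshold

    def publish(self, author, post_id, timestamp):
        self.posts[author].append((timestamp, post_id))
        if not self.is_celebrity(author):
            # Push: write into every follower's precomputed feed.
            for follower in self.followers[author]:
                self.pushed[follower].append((timestamp, post_id))

    def read(self, user, limit=10):
        # Pull: merge the pushed feed with followed celebrities' posts, newest first.
        sources = [sorted(self.pushed[user], reverse=True)]
        for author in self.following[user]:
            if self.is_celebrity(author):
                sources.append(sorted(self.posts[author], reverse=True))
        result, seen = [], set()
        for _, post_id in heapq.merge(*sources, reverse=True):
            if post_id not in seen:         # guards against push/pull overlap
                seen.add(post_id)
                result.append(post_id)
            if len(result) == limit:
                break
        return result
```

The deduplication step matters when an author crosses the threshold: posts pushed before the change would otherwise appear twice once the author is also pulled at read time.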
Bottleneck Analysis:
Hot Shards: High-traffic users. Mitigated by using a CDN for media and a Hybrid Feed model.
Storage Growth: Sharding PostgreSQL and using S3 Lifecycle policies are critical for 10x growth.
Security: All traffic over TLS 1.3. PII (emails/phone) encrypted at rest using AES-256.