The Question

Global Video Streaming Platform Design

Design a high-scale video streaming platform similar to Netflix. The system must support global content distribution to 200M+ users, handle heavy-duty video transcoding for diverse device types, and provide low-latency discovery and playback experiences. Focus on how you would manage massive bandwidth requirements, ensure high availability of user metadata like watch history, and handle the ingestion of large video assets.

CDN

Cassandra

Redis

Kafka

Microservices

HLS/DASH

Adaptive Bitrate Streaming

Questions & Insights

Clarifying Questions

Scale & Traffic: What is the target scale? (e.g., 200M DAU, global footprint).

Functional Scope: Are we designing the internal content ingestion/transcoding pipeline, or strictly the user-facing discovery and playback?

Device Support: Do we need to support a wide range of bitrates and formats (Adaptive Bitrate Streaming) for various devices (Mobile, Smart TVs, Web)?

Regional Constraints: Does the system need to support multi-region active-active deployments for disaster recovery?

Content Library: What is the approximate size of the content library? (e.g., 10,000s of titles vs. millions).

Assumptions:

DAU: 200 Million.

Scope: Includes Video Ingestion (Transcoding), Discovery (Search/Browse), and Playback.

Traffic: Heavy read/stream traffic; writes are primarily watch history and internal content uploads.

Reliability: 99.99% Availability; global low-latency playback.

Thinking Process

The core challenge of Netflix is delivering high-bitrate video to millions of concurrent users with minimal buffering while maintaining a responsive discovery UI.

How do we handle the massive bandwidth of video delivery? We offload the heavy lifting to a Content Delivery Network (CDN) to move data closer to the edge.

How do we support diverse devices and network conditions? We implement a Transcoding Pipeline that generates multiple resolutions and bitrates (DASH/HLS).

How do we manage highly scalable metadata (Watch History/User Preferences)? We use a distributed NoSQL database (Cassandra) designed for high-write availability.

How do we ensure discovery is fast? We utilize a robust caching layer and an optimized Search service.

Bonus Points

Chaos Engineering: Mentioning the use of "Chaos Monkey" to proactively test system resilience by injecting failures in production.

Adaptive Bitrate Streaming (ABR): Implementing logic where the client switches video quality dynamically based on real-time bandwidth.

Open Connect (Custom CDN): Mentioning that for true Netflix scale, building a private CDN (placing hardware inside ISPs) is more cost-effective than using generic third-party providers.

Predictive Prefetching: Using machine learning to predict what a user might watch next and warming the cache at the edge.

Design Breakdown

Functional Requirements

Core Use Cases:

Users can browse/search for movie titles.

Users can stream video content in various qualities (4K, HD, SD).

Users can maintain a "Watchlist" and "Watch History" (resume playback).

Content Creators/Admins can upload new videos for processing.

Scope Control:

In-scope: Video ingestion, metadata management, discovery, and streaming architecture.

Out-of-scope: Billing/Subscription management, User social features, Content recommendation algorithms (the ML model itself).

Non-Functional Requirements

Scale: Support 200M+ users and peak concurrent streaming of millions.

Latency: Discovery page load < 500ms; Video start time (Time to First Frame) < 2s.

Availability & Reliability: 99.99% uptime; playback should not fail if a single region goes down.

Consistency: Eventual consistency for "Watch History" is acceptable; metadata (titles) can be eventually consistent.

Fault Tolerance: Handle CDN failures and microservice outages gracefully.

Estimation

Traffic:

200M DAU. If 10% watch at peak = 20M concurrent streams.

Average bitrate: 5 Mbps (HD).

Total Bandwidth: 20M * 5 Mbps = 100 Tbps.

Storage:

50,000 titles. Each title has ~10 versions (resolutions/codecs).

1 title (all versions) ≈ 50GB.

Total Video Storage: 50,000 * 50GB = 2.5 PB.

Metadata Storage:

200M users 100 history entries/user 1KB/entry = 20 TB (Cassandra).

Blueprint

Concise Summary: A microservices-based architecture that separates the heavy-write/compute ingestion path from the high-read playback path, utilizing a global CDN for content delivery.

Major Components:

API Gateway: Handles authentication, rate limiting, and routes requests to downstream services.

Discovery Service: Manages browsing, search, and video metadata.

Playback Service: Provides the manifest files (URLs) for video streaming.

Video Ingestion (Transcoder): A worker-based system that processes raw uploads into multiple formats.

CDN (Open Connect): Geographically distributed servers that store and serve video files to users.

Simplicity Audit: This architecture avoids complex distributed transactions and focuses on decoupling the static video assets from the dynamic metadata.

Architecture Decision Rationale:

Why this architecture?: Microservices allow independent scaling of the "Discovery" (CPU intensive) vs "Playback" (IO intensive) vs "Ingestion" (Compute intensive) components.

Functional Requirement Satisfaction: Meets browsing via Discovery Service and streaming via Playback/CDN.

Non-functional Requirement Satisfaction: CDN ensures low latency; Cassandra ensures high availability and scalability for user data.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing

CDN: Use a mix of 3rd party (Cloudfront/Akamai) and Netflix Open Connect. Open Connect appliances are placed inside ISP data centers to reduce backbone traffic.

DNS: Use latency-based routing to direct users to the nearest CDN node.

Security & Perimeter

API Gateway: Handles TLS termination and JWT-based authentication. Implement regional rate limiting to prevent DDoS.

Service

Topology & Scaling

Stateless Services: Discovery and Playback services are deployed in containers (K8s/ECS) across multiple Availability Zones.

Scaling: Scale based on CPU and Request Count per Target.

API Schema Design

GET /v1/metadata/{videoId}: Returns movie details (REST).

GET /v1/playback/manifest/{videoId}: Returns HLS/DASH manifest containing segment URLs (REST).

POST /v1/history: Updates user watch position (Idempotent via offset).

Resilience & Reliability

Circuit Breaker: Used in the API Gateway for the Discovery Service; if search is down, show "Trending" static results.

Retries: Exponential backoff for metadata fetches.

Storage

Access Pattern

High read volume for Metadata; Very high write/read volume for Watch History.

Database Table Design

Metadata (PostgreSQL/NoSQL): video_id (PK), title, description, release_date, tags.

Watch History (Cassandra): user_id (Partition Key), video_id (Clustering Key), offset_seconds, last_updated.

Technical Selection

Cassandra: Chosen for Watch History because it handles high-volume time-series writes across multiple regions effectively.

S3: Object storage for the "Source of Truth" video files.

Distribution Logic

Shard Watch History by user_id to ensure all history for a user is co-located on the same partition.

Cache

Purpose & Justification: Reduce load on Metadata DB and speed up "Continue Watching" features.

Key-Value Schema: Key: metadata:{videoId}, Value: Serialized JSON. TTL: 24 hours.

Technical Selection: Redis (Cluster mode) for sub-millisecond latency.

Failure Handling: If Redis fails, fall back to the primary database (Graceful degradation).

Messaging

Purpose & Decoupling: Decouple the Ingestion Service from the Transcoding Workers.

Event / Topic Schema: Topic video-upload-tasks. Payload: {s3_path, video_id, target_resolutions[]}.

Technical Selection: Kafka for high throughput and durability of job state.

Data Processing

Processing Model: Asynchronous Batch Processing for video transcoding.

Processing DAG: Upload -> Split into Chunks -> Transcode Chunks (Parallel) -> Merge -> Manifest Generation -> CDN Push.

Technical Selection: Custom worker pool (or AWS Elemental MediaConvert style architecture) for massive compute parallelization.

Wrap Up

Advanced Topics

Trade-offs: We choose Eventual Consistency for watch history. If a user switches devices instantly, they might be off by a few seconds in their playback position, but the system remains highly available.

Reliability: Implement Graceful Degradation. If the Transcoding pipeline is backed up, prioritize lower resolutions first so the video is "live" sooner.

Bottleneck Analysis: The "Hot Video" problem (e.g., a new season of Stranger Things) is handled by the CDN. The Metadata DB bottleneck is solved by aggressive caching and Cassandra's linear scalability.

Security: Use Signed URLs for CDN access. This ensures that only authenticated users with a valid playback session can download video segments from the CDN.

Adaptive Bitrate (ABR): The system generates a .m3u8 (HLS) or .mpd (DASH) file. The client player monitors the download speed of each segment and automatically requests the next segment at a higher or lower bitrate.