The Question

Global Video Streaming Platform Design

Design a global video-on-demand (VOD) streaming service like Netflix. The system must handle hundreds of millions of users, providing low-latency video playback across varying network conditions. Key areas to address include: 1) The end-to-end video ingestion and transcoding pipeline. 2) A scalable metadata management system for millions of titles. 3) Global content delivery strategies to minimize buffering. 4) High availability for user profiles and watch history tracking under extreme write loads. Discuss the trade-offs between storage costs and playback performance, and how the system remains resilient during regional outages.

Cassandra

Redis

Kafka

Elasticsearch

FFmpeg

HLS

DASH

CDN

Kubernetes

gRPC

JWT

Questions & Insights

Clarifying Questions

Scale: What is the expected number of Daily Active Users (DAU) and the total number of titles in the library?

Content Type: Are we focusing exclusively on Video-on-Demand (VOD) or do we need to support Live Streaming?

Geography: Is this a global service or a regional one? (Affects CDN and multi-region strategy).

Functionality: Should the MVP include the recommendation engine and social features, or just core playback, search, and metadata?

Device Support: Do we need to optimize for specific bitrates/codecs (e.g., 4K, mobile-only)?

Assumptions:

Scale: 200M+ total users, 50M DAU.

Content: VOD only (Live is out of scope for MVP).

Geography: Global distribution requiring a heavy CDN presence.

Functionality: Focus on Playback, Metadata (titles, genres), Search, and User Profiles.

Transcoding: High-quality multi-bitrate support is required for adaptive bitrate streaming (ABR).

Thinking Process

To design a system of this magnitude, we must move from the source of the data to the eyes of the user:

How do we handle massive video storage and processing? We use an asynchronous, distributed transcoding pipeline to convert raw uploads into multiple formats and resolutions.

How do we ensure low-latency playback globally? We leverage a Geo-distributed Content Delivery Network (CDN) to cache video chunks as close to the user as possible.

How do we manage highly available metadata? We utilize a NoSQL wide-column store (like Cassandra) to handle massive read/write volumes across multiple regions without a single point of failure.

How do we handle traffic spikes (e.g., a new "Stranger Things" release)? We implement stateless microservices behind a robust API Gateway with aggressive caching and circuit breakers.

Bonus Points

Chaos Engineering: Mention "Simian Army" concepts—injecting failures in production to ensure the system is resilient to regional outages.

Open Connect (Custom CDN): Discussing the strategy of placing ISP-local hardware appliances to bypass the public internet for video delivery.

Adaptive Bitrate (ABR): Explain how the client logic switches between manifests (HLS/DASH) based on real-time network throughput.

Multi-Region Active-Active: Moving beyond simple failover to a model where any region can serve any user, requiring complex data replication (e.g., Cassandra Multi-Region).

Design Breakdown

Functional Requirements

Core Use Cases:

Users can browse/search for video titles.

Users can stream videos in various qualities (ABR).

Admins/Content providers can upload new titles.

Users have profiles and watch history.

Scope Control:

In-scope: Video ingestion/transcoding, metadata management, global delivery, search.

Out-of-scope: Live streaming, billing/payment processing integration, complex recommendation ML models (MVP focus is infrastructure).

Non-Functional Requirements

Scale: Support 50M DAU and petabytes of video storage.

Latency: Startup latency < 200ms; no buffering during playback.

Availability & Reliability: 99.99% availability; the "Play" button must always work.

Consistency: Eventual consistency is acceptable for "Watch History," but strong consistency is preferred for user account changes.

Fault Tolerance: Resilience against CDN node failures or regional data center outages.

Security: Content protection via DRM (Digital Rights Management) and TLS for all API communication.

Estimation

Traffic:

50M DAU × 1 hour average watch time = 50M hours/day.

Average bitrate: 2 Mbps (720p/1080p mixed).

Total Bandwidth: 50M 3600s 2 Mbps = 360 Pb/day (mostly served from CDN).

Peak QPS for Metadata: ~100k - 500k QPS (browsing/searching).

Storage:

10k Titles × 5 resolutions/codecs × 10GB/movie = ~500 TB total library storage.

Bandwidth:

Ingress: Relatively low (uploading new content).

Egress: Massive (Terabits per second aggregate).

Blueprint

Concise Summary: The system utilizes an asynchronous ingestion pipeline to transcode videos into S3, indexed by a Cassandra metadata layer, and delivered via a global CDN.

Major Components:

Transcoding Pipeline: Distributed workers that split, encode, and package raw video into HLS/DASH segments.

Metadata Service: A set of microservices managing movie details, user profiles, and watch history.

CDN (Open Connect): A distributed network of servers caching video segments at the edge to minimize latency.

API Gateway: Handles authentication, routing, and rate limiting for all client requests.

Simplicity Audit: This design avoids complex "just-in-time" transcoding (which is CPU intensive) by pre-transcoding into all necessary formats, trading storage for performance.

Architecture Decision Rationale:

Why this architecture?: Separating the "Control Plane" (API/Metadata) from the "Data Plane" (Video Streaming) allows each to scale independently.

Functional Satisfaction: Users can search via the metadata service and stream via the CDN.

Non-functional Satisfaction: Cassandra provides the high availability needed for global scale, while the CDN handles the heavy egress requirements.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing:

Open Connect: Netflix-style custom CDN. For MVP, use a mix of CloudFront/Akamai and GSLB (Global Server Load Balancing).

DNS: Use Latency-based routing to direct users to the nearest CDN POP (Point of Presence).

Security & Perimeter:

API Gateway: Handles JWT validation and SSL termination.

WAF: Protects against SQL injection on metadata queries and DDoS on the playback API.

Service

Topology & Scaling:

Stateless microservices deployed on Kubernetes (EKS/GKE) across multiple Availability Zones.

Scaling based on CPU/Request count.

API Schema Design:

GET /v1/titles/{id}: Returns metadata and CDN manifest URL.

POST /v1/history: Updates user's progress (Asynchronous).

GET /v1/search?q=...: gRPC for internal service communication, REST for external.

Resilience:

Hystrix-style Circuit Breakers: If the search service is down, the UI gracefully hides the search bar but allows playback from "Continue Watching."

Retries: Exponential backoff for metadata fetches.

Storage

Access Pattern:

Metadata: Heavy Read (Browsing), Low Write (Updates).

Watch History: Heavy Write (Heartbeats every 10 seconds).

Database Table Design:

Cassandra (Titles): Partitioned by category, Clustered by rating/release_date.

Cassandra (UserHistory): Partitioned by user_id, Clustered by timestamp.

Technical Selection:

Cassandra: Chosen for its linear scalability and multi-region replication capabilities.

Elasticsearch: Required for the "Search Service" to support fuzzy matching and prefix search on titles/actors.

S3: Industry standard for durable, massive-scale blob storage.

Cache

Purpose: Reduce load on Cassandra for popular titles and trending lists.

Key-Value Schema:

Key: title_id, Value: JSON(metadata). TTL: 24 hours.

Key: trending_list, Value: List[ID]. TTL: 10 minutes.

Technical Selection: Redis (Cluster mode) for sub-millisecond response times.

Messaging

Purpose: Decouple video ingestion from the CPU-heavy transcoding process.

Event Schema: VideoUploadedEvent containing S3_URI and target_profiles.

Technical Selection: Kafka. It allows multiple consumers (Transcoder, Content Analytics, Notification Service) to process the same upload event.

Data Processing

Processing Model: Distributed workers pulling tasks from Kafka.

Processing DAG:

Chunking: Split 2GB movie into 5MB chunks.

Transcoding: Parallel encoding of chunks into H.264, H.265, VP9.

Merging: Reassembling chunks into manifests (M3U8/MPD).

Technical Selection: Custom worker pool using FFmpeg for video processing logic.

Infrastructure (Optional)

Observability:

Prometheus/Grafana for metric monitoring.

ELK Stack for log aggregation.

Zipkin/Jaeger for tracing the request flow from Gateway to DB.

Wrap Up

Advanced Topics

Trade-offs:

Consistency vs Availability: We choose Availability (AP) for the Playback path. If a user's watch history is slightly out of sync across devices due to eventual consistency, it's better than the video not playing at all.

Reliability:

Redundant Storage: S3 cross-region replication ensures that even if an entire AWS region fails, the source video files are safe.

Optimization:

VMAF (Video Multimethod Assessment Fusion): Mentioning Netflix's metric to optimize bitrate vs. perceived quality during transcoding to save bandwidth.

Security:

Widevine/FairPlay DRM: Ensuring video chunks are encrypted and only playable with a valid license key fetched via the Auth service.