The Question

Design a Scalable Short-Form Video Platform

Design a system similar to TikTok that supports high-volume video uploads, asynchronous video processing (transcoding), and a personalized recommendation feed for millions of users. The design should address low-latency video playback globally, efficient handling of 'viral' content, and the trade-offs between real-time feed updates and system performance. Discuss how you would handle massive storage requirements and ensure high availability of the feed service.

Kafka

Redis

Cassandra

CDN

HLS

DASH

FFMPEG

Spark

NoSQL

JWT

Questions & Insights

Clarifying Questions

What is the primary content format and scale? (Assumption: Short-form videos up to 60 seconds. Target 500M DAU with a 1:100 creator-to-viewer ratio).

How critical is the recommendation engine for the MVP? (Assumption: A "For You" feed is mandatory. We will implement a basic asynchronous scoring mechanism rather than a complex real-time deep learning model for the MVP).

What are the latency requirements for video playback? (Assumption: Start-up latency < 500ms. Global delivery via CDN is required).

Are social features like "Live Streaming" or "Real-time Messaging" in scope? (Assumption: No. Out-of-scope for MVP to satisfy YAGNI).

What is the consistency requirement for view counts and likes? (Assumption: Eventual consistency is acceptable for engagement metrics to prioritize availability).

Thinking Process

Core Bottleneck: The system is read-heavy (feed consumption) and write-intensive (video uploads/transcoding). The "Cold Start" problem for new videos is the primary challenge.

Strategy Steps:

Asynchronous Ingestion: Use a message-driven architecture to decouple video uploads from the heavy transcoding and metadata indexing process.

Edge-Heavy Delivery: Leverage CDNs for both caching video segments and terminating SSL/TLS closer to the user to reduce "Time to First Byte".

Pre-computed Feeds: Shift the heavy lifting of recommendations from "Read-time" to "Write-time" or "Background-time" using a ranking worker that populates a Redis-based Feed Cache.

Tiered Storage: Store raw videos in high-durability object storage and serve transcoded segments via optimized edge-friendly formats (HLS/DASH).

Bonus Points

Video Bitrate Adaptation: Implementing HLS/DASH with multiple resolution ladders to handle users on varying network conditions (3G vs. Fiber).

Vector Embeddings for Recs: Using a Vector Database (like Milvus or Pinecone) to perform K-Nearest Neighbor (KNN) searches for content similarity in the recommendation pipeline.

Quic/HTTP3 Protocol: Utilizing UDP-based protocols at the Edge to minimize head-of-line blocking for video segment streaming.

Write-back Cache for Metrics: Using a "Buffered Counter" pattern to batch "Like" and "View" increments in memory before flushing to the persistent DB to prevent hot-partition issues.

Design Breakdown

Functional Requirements

Core Use Cases:

Users can record and upload short videos.

Users can scroll through a "For You" feed (infinite scroll).

Users can follow other creators and see a "Following" feed.

Users can "Like" videos.

Scope Control:

In-Scope: Video upload, transcoding, feed generation, and global delivery.

Out-of-Scope: Live streaming, video editing tools, ads platform, and direct messaging.

Non-Functional Requirements

Scale: Support 500M DAU and 5M video uploads per day.

Latency: Feed generation < 200ms; Video start-up < 500ms.

Availability & Reliability: 99.9% (High availability for the feed is more critical than upload availability).

Consistency: Eventual consistency for feed updates and engagement counts.

Fault Tolerance: Automatic retry on transcoding failure; multi-AZ storage for video blobs.

Estimation

Traffic Estimation:

Read QPS: 500M DAU * 40 videos/day = ~230k QPS (Read).

Write QPS: 5M uploads/day = ~60 QPS (Write - low QPS but high bandwidth).

Storage Estimation:

5M videos/day * 20MB (avg compressed) = 100 TB/day.

36.5 PB per year (before replication/multi-resolution overhead).

Bandwidth Estimation:

Ingress: 60 QPS * 20MB = 1.2 GB/s.

Egress: 230k QPS * 2MB (initial segment) = 460 GB/s (handled by CDN).

Blueprint

Concise Summary: A globally distributed video platform using an asynchronous "upload-and-notify" pipeline and a pre-computed feed cache for low-latency discovery.

Major Components:

CDN: Caches video segments (TS files) globally to reduce origin load.

Upload Service: Handles multipart uploads and persists raw files to S3.

Transcoding Pipeline: Decoupled via Kafka, converts raw video into multiple resolutions.

Feed Service: Fetches pre-ranked video IDs from Redis for immediate user consumption.

Metadata Store: A NoSQL database (Cassandra) storing video attributes and user relations.

Simplicity Audit: This design avoids complex real-time ranking by using a background worker to populate feeds, fulfilling the MVP need for speed without a massive ML infrastructure.

Architecture Decision Rationale:

Why this?: Short-form video success depends on "instant-on" playback and relevant content. Pre-computation and CDNs are the standard for achieving this at scale.

Functional Satisfaction: Covers the full lifecycle from creation to consumption.

Non-functional Satisfaction: Scalable via horizontal service scaling and decoupled via messaging; low latency via Redis and CDN.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing:

CDN: Use CloudFront or Akamai. Static assets (JS/CSS) and dynamic video segments (.ts) are cached. Use signed URLs for security.

DNS: Latency-based routing to the nearest regional Load Balancer.

Security & Perimeter:

API Gateway: Handles JWT-based Authentication.

Rate Limiting: Applied at the user level (e.g., 5 uploads/hour) to prevent spam.

Service

Topology & Scaling:

Stateless microservices deployed across multiple Availability Zones (AZs).

Scaling based on CPU/Request count for Feed Service; Scaling based on Queue Depth for Workers.

API Schema Design:

POST /v1/videos/upload: Multipart upload, returns video_id.

GET /v1/feed?type=foryou: Returns a list of video objects (URLs, metadata).

POST /v1/videos/{id}/like: Idempotent like action.

Resilience:

Circuit Breakers: If the Ranking Engine is down, Feed Service falls back to a "Global Popularity" static list.

Retries: Exponential backoff for video segment uploads.

Storage

Access Pattern: Heavy writes for metadata during upload; massive random reads for metadata during feed generation.

Database Table Design (Cassandra):

videos: video_id (PK), creator_id, description, s3_url, created_at.

user_videos: user_id (PK), video_id, timestamp. (For "Following" feed).

Technical Selection: Cassandra.

Rationale: High write throughput and easy linear scaling. Partitioning by video_id ensures no single hot node for metadata.

Distribution Logic: Sharded by user_id for user-specific data and video_id for global video metadata.

Cache

Purpose: Reducing latency for "For You" feeds.

Key-Value Schema:

Key: feed:foryou:{user_id}

Value: List<video_id> (Max 200 items).

TTL: 30 minutes.

Technical Selection: Redis. Use Redis Clusters for high availability.

Failure Handling: If Redis is empty, fetch a "warm start" list from the Metadata DB based on popularity.

Messaging

Purpose: Decouples the user-facing upload from the long-running transcoding and ML tasks.

Topic Schema: video-uploaded-event containing video_id and raw_s3_path.

Throughput & Partitioning: Partitioned by video_id to ensure ordering for specific video processing stages.

Technical Selection: Kafka. High durability and replayability for failed transcoding jobs.

Data Processing

Processing Model:

Transcoding: Worker pulls from Kafka, downloads from S3, uses FFMPEG to create 360p, 720p, 1080p versions, and pushes back to S3.

ML Ranking: Spark Streaming job aggregates user interactions (likes, views) to update user preference vectors.

Technical Selection: Spark/Flink for real-time ranking updates; FFMPEG-based custom workers for transcoding.

Wrap Up

Advanced Topics

Trade-offs: We chose Eventual Consistency for engagement counts (likes/views). A user might see 100k likes while another sees 100,005. This allows us to use a write-back cache and avoid locking the DB on every single "like" click.

Reliability: Transcoding is the most brittle part. We use a Dead Letter Queue (DLQ) in Kafka. If a video fails to transcode 3 times, it's flagged for manual review or lower-quality fallback.

Bottleneck Analysis: The "Following" feed is a fan-out problem. For the MVP, we use the "Pull" model (fetching from followed users at read-time) rather than "Push" (delivering to all followers' feeds) to save storage and simplify the architecture.

Optimization: Video Chunking. Instead of waiting for the full video to upload, the client can stream chunks to the Upload Service, which pipes them directly to S3. This improves the perceived speed for users on slow connections.