The Question
Design

Short-Form Video Platform

Design a short-form video platform similar to TikTok. The system should support video uploads, algorithmic feed personalization based on engagement signals, viral content distribution via CDN, and seamless playback for hundreds of millions of daily active users.
Cassandra
S3
Redis
CDN
Kafka
Questions & Insights

Thinking Process

To design TikTok at scale, the primary challenge is the mismatch between two workloads: video uploads, which are far fewer than views but expensive in compute and storage, and feed consumption, which is extremely read-heavy and latency-sensitive.
How do we handle high-volume video uploads without blocking the user? We use an asynchronous ingestion pipeline. The user uploads to a raw landing zone; we acknowledge immediately while a background worker handles transcoding and CDN distribution.
How do we achieve sub-100ms latency for the "For You" feed? We cannot compute full recommendations in real time for every request. We use a hybrid approach: an offline job pre-computes a "candidate pool" of videos per user, and a lightweight ranking service sorts that pool at request time.
How do we scale the metadata for billions of videos? We utilize a distributed NoSQL store (e.g., Cassandra or DynamoDB) sharded by VideoID to handle the high write throughput of metadata and likes.
How do we minimize video playback "stutter"? We use Adaptive Bitrate Streaming (HLS/DASH), so the player can switch renditions as bandwidth changes, and a geographically distributed Content Delivery Network (CDN) to serve content from the edge.
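
The hybrid feed approach described above can be illustrated with a small sketch. It assumes the candidate pool already carries an offline score per video, and that the request-time step only filters out videos the user has seen and applies a cheap freshness boost. The `Candidate` fields, the `freshness_weight` parameter, and the scoring formula are illustrative assumptions, not part of the original design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    video_id: str
    base_score: float  # offline score from the pre-computed pool (assumed)
    age_hours: float   # time since upload


def rank_candidates(candidates, already_seen, limit=10, freshness_weight=0.1):
    """Cheap request-time re-ranking: drop seen videos, boost fresh ones."""
    scored = []
    for c in candidates:
        if c.video_id in already_seen:
            continue
        freshness = 1.0 / (1.0 + c.age_hours)
        scored.append((c.base_score + freshness_weight * freshness, c.video_id))
    # Highest score first; video_id breaks ties deterministically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [video_id for _, video_id in scored[:limit]]


pool = [
    Candidate("v1", base_score=0.90, age_hours=48.0),
    Candidate("v2", base_score=0.85, age_hours=0.0),
    Candidate("v3", base_score=0.95, age_hours=2.0),
]
feed = rank_candidates(pool, already_seen={"v3"}, limit=2)
print(feed)
```

The expensive work (building the pool) stays offline; the request path is a filter plus a sort over a small list, which is what keeps it within a tight latency budget.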

Bonus Points

QUIC Protocol: Mentioning the use of HTTP/3, which runs over QUIC, to improve performance in high-packet-loss environments (common in mobile networks).
Write-Path Optimization: Using an LSM-tree-based storage engine (as Cassandra does) for the metadata layer to absorb the bursty writes of likes and comments during viral events.
Edge Side Rendering/Compute: Using Lambda@Edge to handle localized content filtering (e.g., regional censorship or language tagging) at the CDN level.
Cold/Hot Storage Separation: Moving older, non-viral videos to S3 Standard-IA (Infrequent Access) to optimize storage costs while keeping trending videos in NVMe-backed caches.
Design Breakdown

Functional Requirements

Users can upload short videos (up to 60s).
Users can view a personalized "For You" feed.
Users can like videos.
Users can follow other users.

Non-Functional Requirements

Low Latency: Feed loading and video start time must be < 200ms.
High Availability: The feed must be available even if the recommendation engine is lagging (fallback to "Trending").
Scalability: Support 1 Billion Daily Active Users (DAU).
Durability: Uploaded videos must not be lost.

Estimation

DAU: 1 Billion.
Uploads: 1% of users upload 1 video/day = 10 Million videos/day.
Storage: 10M videos * 10MB avg (compressed) = 100 TB/day.
Read Volume: 1B users * 50 videos/day = 50 Billion views/day.
Bandwidth (Egress): 50B views * 5MB (avg watched portion) / 86400s ≈ 2.9 TB/s, or roughly 23 Tbps. This necessitates a heavy reliance on CDNs.
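
The arithmetic above can be checked with a short script. It uses only the figures stated in the estimation (decimal units, 1 MB = 10^6 bytes) and reports egress in both terabytes and terabits per second, since the two differ by a factor of 8.

```python
DAU = 1_000_000_000
UPLOADER_PERCENT = 1               # 1% of users upload one video per day
AVG_VIDEO_BYTES = 10 * 10**6       # 10 MB average compressed video
VIEWS_PER_USER = 50
AVG_WATCHED_BYTES = 5 * 10**6      # 5 MB average watched portion
SECONDS_PER_DAY = 86_400

uploads_per_day = DAU * UPLOADER_PERCENT // 100
storage_tb_per_day = uploads_per_day * AVG_VIDEO_BYTES / 10**12
views_per_day = DAU * VIEWS_PER_USER

egress_bytes_per_s = views_per_day * AVG_WATCHED_BYTES / SECONDS_PER_DAY
egress_tbytes_per_s = egress_bytes_per_s / 10**12
egress_tbits_per_s = egress_tbytes_per_s * 8

print(f"uploads/day:  {uploads_per_day:,}")
print(f"storage/day:  {storage_tb_per_day:.0f} TB")
print(f"views/day:    {views_per_day:,}")
print(f"egress:       {egress_tbytes_per_s:.1f} TB/s = {egress_tbits_per_s:.0f} Tbps")
```

These are daily averages; peak-hour traffic would be higher, which strengthens the case for serving video from the CDN edge.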

Blueprint

Concise Summary: A microservices architecture leveraging a decoupled ingestion pipeline for media and a pre-computed cache for feed delivery.
Major Components:
API Gateway: Routes traffic, handles authentication, and performs rate limiting.
Upload Service: Manages multi-part uploads and triggers the transcoding pipeline.
Media Pipeline: Transcodes raw video into multiple resolutions (720p, 1080p) and segments for HLS.
Feed Service: Aggregates video metadata from the cache to serve personalized content.
Recommendation Engine (Spark): Processes user interactions to generate candidate lists for feeds.
Simplicity Audit: This design avoids complex "real-time" graph traversals for every click, instead relying on asynchronous pre-computation (Recommendation Engine) and caching (Redis) to ensure the mobile app feels instantaneous.

High Level Architecture

Sub-system Deep Dive

Service

Topology & Scaling: All services are containerized (Kubernetes) and scaled horizontally based on CPU/Memory usage. The Upload Service is IO-bound, while the Feed Service is compute-bound (ranking).
API Spec:
POST /v1/video/upload: Returns a pre-signed URL for direct S3 upload.
GET /v1/feed: Returns a JSON list of video metadata and CDN URLs.
POST /v1/action/like: Asynchronous event to update engagement metrics.
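
To make the upload endpoint concrete, here is a minimal sketch of what "returns a pre-signed URL" means: the server signs an object path and an expiry, and the storage layer later verifies that signature without contacting the API. This is a simplified HMAC scheme written with the standard library for illustration only. Real S3 pre-signed URLs use AWS Signature Version 4 and would be generated by the AWS SDK; the host name, secret, and object-key layout below are assumptions.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"demo-secret"                  # hypothetical signing key
UPLOAD_HOST = "https://uploads.example.com"  # hypothetical raw landing zone


def _signature(object_key: str, expires_at: int) -> str:
    message = f"PUT\n{object_key}\n{expires_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def create_upload_url(user_id: str, video_id: str, now: int, ttl_seconds: int = 900) -> str:
    """What POST /v1/video/upload might hand back to the client."""
    object_key = f"raw/{user_id}/{video_id}.mp4"
    expires_at = now + ttl_seconds
    query = urlencode({"expires": expires_at, "sig": _signature(object_key, expires_at)})
    return f"{UPLOAD_HOST}/{object_key}?{query}"


def verify_upload_url(url: str, now: int) -> bool:
    """What the storage side checks before accepting the upload."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    object_key = parsed.path.lstrip("/")
    try:
        expires_at = int(params["expires"][0])
        provided = params["sig"][0]
    except (KeyError, ValueError):
        return False
    if now > expires_at:
        return False
    return hmac.compare_digest(provided, _signature(object_key, expires_at))


url = create_upload_url("u42", "vid-001", now=1_700_000_000)
print(url)
```

Because the client uploads straight to object storage, the Upload Service never proxies video bytes; it only issues URLs and reacts to the completed upload.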

Storage

Data Model:
Video Metadata (Cassandra): Partition Key: video_id. Attributes: user_id, cdn_url, thumbnail_url, created_at, tags_vector.
User Actions (BigTable): Row Key: user_id#timestamp. Columns: action_type, video_id.
Database Logic: Metadata is stored in NoSQL for linear write scaling. Read patterns for the feed are optimized by the Feed Service which primarily queries by IDs retrieved from the Cache.
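
One detail of the `user_id#timestamp` row key is worth illustrating. Bigtable orders rows lexicographically by key, so a timestamp embedded as text should be zero-padded to a fixed width; otherwise `999` sorts after `1000`. The sketch below assumes millisecond timestamps and a 13-digit width; the `UserAction` class and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserAction:
    user_id: str
    timestamp_ms: int
    action_type: str
    video_id: str


def row_key(action: UserAction) -> str:
    # Zero-pad to a fixed width so lexicographic order matches time order.
    return f"{action.user_id}#{action.timestamp_ms:013d}"


def parse_row_key(key: str):
    user_id, timestamp = key.rsplit("#", 1)
    return user_id, int(timestamp)


actions = [
    UserAction("u1", 1000, "like", "v2"),
    UserAction("u1", 999, "view", "v1"),
]
padded = sorted(row_key(a) for a in actions)
unpadded = sorted(f"{a.user_id}#{a.timestamp_ms}" for a in actions)
print(padded)    # chronological
print(unpadded)  # not chronological: "u1#1000" sorts before "u1#999"
```

With this layout, all actions for one user are contiguous and a time-range read becomes a single prefix scan. If newest-first reads dominate, a common variation is to store a reversed timestamp instead.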

Cache

Implementation: Redis Clusters.
Data Structures: ZSET (Sorted Sets) per user. Key: user_feed:{user_id}, Member: video_id, Score: ranking_score.
TTL & Eviction: Each feed key has a 24-hour TTL, so the feeds of inactive users expire on their own and free memory.
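
The behavior of the per-user sorted set can be sketched without a running Redis instance. The class below is an in-memory stand-in, not a Redis client: it mimics the semantics the design relies on (members ordered by score, a key-level TTL). The comments name the Redis commands that would play the same role; the class and method names are hypothetical.

```python
class FeedCache:
    """In-memory stand-in for the per-user feed sorted set."""

    def __init__(self, ttl_seconds=24 * 3600):
        self._ttl = ttl_seconds
        self._feeds = {}  # key -> (expires_at, {video_id: ranking_score})

    @staticmethod
    def _key(user_id):
        return f"user_feed:{user_id}"

    def put_feed(self, user_id, scored_videos, now):
        # Roughly: DEL key, ZADD key score member ..., EXPIRE key 86400.
        self._feeds[self._key(user_id)] = (now + self._ttl, dict(scored_videos))

    def top(self, user_id, count, now):
        # Roughly: ZREVRANGE key 0 count-1 (highest score first).
        key = self._key(user_id)
        entry = self._feeds.get(key)
        if entry is None:
            return []
        expires_at, scores = entry
        if now >= expires_at:
            del self._feeds[key]  # expired: behaves like a cache miss
            return []
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [video_id for video_id, _ in ranked[:count]]


cache = FeedCache()
cache.put_feed("u1", {"a": 0.2, "b": 0.9, "c": 0.5}, now=0)
print(cache.top("u1", 2, now=10))
```

An empty result after expiry is the signal the Feed Service would treat as a cache miss.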

Messaging

Implementation: Apache Kafka.
Topic Structure:
video_events: High-volume stream of views, likes, and skips.
upload_events: Triggers notifications or search indexing after transcoding is complete.
Delivery Guarantees: At-least-once delivery is sufficient for analytics and recommendation updates.
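
At-least-once delivery means a consumer may see the same event twice, for example after a rebalance or a retry. If duplicates would distort a counter, the consumer can deduplicate on a unique event identifier. The sketch below assumes each event carries an `event_id` field, which the original topic description does not specify, and keeps a bounded window of recently seen IDs.

```python
from collections import Counter, OrderedDict


class LikeCounter:
    """Idempotent consumer: counts each event_id at most once."""

    def __init__(self, max_remembered_ids=100_000):
        self.likes = Counter()
        self._seen = OrderedDict()  # insertion-ordered set of recent event IDs
        self._max = max_remembered_ids

    def handle(self, event) -> bool:
        event_id = event["event_id"]  # assumed field, see lead-in
        if event_id in self._seen:
            return False  # duplicate delivery, ignore
        self._seen[event_id] = None
        if len(self._seen) > self._max:
            self._seen.popitem(last=False)  # forget the oldest ID
        if event["type"] == "like":
            self.likes[event["video_id"]] += 1
        return True


consumer = LikeCounter()
deliveries = [
    {"event_id": "e1", "type": "like", "video_id": "v1"},
    {"event_id": "e2", "type": "view", "video_id": "v1"},
    {"event_id": "e1", "type": "like", "video_id": "v1"},  # redelivered
]
results = [consumer.handle(e) for e in deliveries]
print(results, dict(consumer.likes))
```

For purely approximate analytics, the design's position that occasional duplicates are tolerable also holds, and this step can be skipped.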

Data Processing

Implementation: Apache Spark Streaming.
DAG/Transformations:
Consume video_events from Kafka.
Aggregate engagement per video_id / user_interest_tag.
Update user profile vectors.
Generate new "Candidate Videos" (Collaborative Filtering).
Push top 100 IDs to Redis user_feed:{user_id}.
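
The steps above can be sketched as a single-process batch job to show the data flow. This is not Spark code: it is a plain-Python illustration that uses a very simple user-to-user co-engagement score as the collaborative-filtering step. The action weights and the event tuple layout are assumptions.

```python
from collections import Counter, defaultdict

WEIGHTS = {"view": 1.0, "like": 3.0, "skip": -1.0}  # hypothetical weights


def build_feeds(events, top_n=100):
    """events: iterable of (user_id, video_id, action) tuples."""
    # Aggregate engagement per (user, video).
    affinity = defaultdict(Counter)
    for user_id, video_id, action in events:
        affinity[user_id][video_id] += WEIGHTS.get(action, 0.0)

    # A crude "profile": the set of videos each user engaged with positively.
    liked = {u: {v for v, s in vids.items() if s > 0} for u, vids in affinity.items()}

    # Candidates: videos liked by users with overlapping taste, not yet seen.
    feeds = {}
    for user, own in liked.items():
        scores = Counter()
        seen = set(affinity[user])
        for other, theirs in liked.items():
            overlap = len(own & theirs)
            if other == user or overlap == 0:
                continue
            for video in theirs - seen:
                scores[video] += overlap
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        # In the real pipeline this list would be written to Redis.
        feeds[f"user_feed:{user}"] = [video for video, _ in ranked[:top_n]]
    return feeds


events = [
    ("alice", "v1", "view"),
    ("alice", "v2", "like"),
    ("bob", "v2", "like"),
    ("bob", "v3", "like"),
    ("carol", "v9", "like"),
]
feeds = build_feeds(events)
print(feeds)
```

The pairwise loop is quadratic in the number of users and is only meant to show the shape of the computation; a production job would use a distributed, approximate method.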
Wrap Up

Advanced Topics

Trade-offs: We trade Consistency for Availability in the Like/View counts. A user might see slightly different like counts across refreshes, but the feed will never be blocked by a locking database transaction.
Bottlenecks: The Transcoding process is the slowest part of the write path. We mitigate this by using a priority queue (e.g., verified creators get faster transcoding).
Failure Handling:
S3 Regional Failure: Multi-region replication for S3 buckets.
Redis Cache Miss: If Redis is empty, the Feed Service falls back to a "Global Trending" list from the Metadata DB.
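
The cache-miss fallback can be sketched as a small wrapper in the Feed Service. The two lookup callables are hypothetical stand-ins for the Redis read and the trending query, and the sketch also treats a cache connection error as a miss, which is an assumption beyond what is stated above.

```python
def get_feed(user_id, personalized_lookup, trending_lookup, count=10):
    """Serve the personalized feed, or fall back to trending on a miss."""
    try:
        video_ids = personalized_lookup(user_id, count)
    except (ConnectionError, TimeoutError):
        video_ids = []  # treat an unreachable cache like a miss (assumed policy)
    if video_ids:
        return {"source": "personalized", "video_ids": video_ids}
    return {"source": "trending", "video_ids": trending_lookup(count)}


# Hypothetical stubs standing in for the cache and the trending list.
def warm_cache(user_id, count):
    return ["v7", "v8"][:count]


def empty_cache(user_id, count):
    return []


def failing_cache(user_id, count):
    raise ConnectionError("cache unavailable")


def trending(count):
    return ["t1", "t2", "t3"][:count]


print(get_feed("u1", warm_cache, trending))
print(get_feed("u1", empty_cache, trending, count=2))
print(get_feed("u1", failing_cache, trending, count=2))
```

Returning the `source` alongside the IDs lets the client or metrics pipeline see how often the fallback path is taken.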
Alternatives & Optimization:
Alternative: Instead of Spark, one could use Flink for lower-latency stateful stream processing if "real-time" trends (e.g., < 1 minute) are critical.
Optimization: Use Protobuf instead of JSON for the internal service-to-service communication and the feed API response to reduce payload size and parsing overhead on mobile devices.
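
Returning to the transcoding bottleneck: the priority-queue mitigation can be sketched with a binary heap. The two priority levels and the class name are assumptions; a monotonically increasing counter keeps jobs within the same level in first-in, first-out order.

```python
import heapq
import itertools


class TranscodeQueue:
    """Priority queue for transcoding jobs; lower number is served first."""

    VERIFIED, STANDARD = 0, 1  # hypothetical priority levels

    def __init__(self):
        self._heap = []
        self._sequence = itertools.count()  # tie-breaker that preserves FIFO order

    def submit(self, video_id, verified_creator=False):
        priority = self.VERIFIED if verified_creator else self.STANDARD
        heapq.heappush(self._heap, (priority, next(self._sequence), video_id))

    def next_job(self):
        if not self._heap:
            return None
        _, _, video_id = heapq.heappop(self._heap)
        return video_id


queue = TranscodeQueue()
queue.submit("a")
queue.submit("b", verified_creator=True)
queue.submit("c")
order = [queue.next_job() for _ in range(4)]
print(order)
```

Strict priority can starve standard uploads during a surge of high-priority jobs, so a production scheduler would likely add aging or reserve worker capacity per level.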