The Question
DesignScalable Short-Video Platform (TikTok)
Design a high-scale, global short-video social platform capable of supporting 100M+ Daily Active Users. The system must handle high-throughput video uploads, automated asynchronous video processing (transcoding), and provide a low-latency, personalized 'For You' feed. Address how you would manage petabyte-scale storage, optimize video delivery for global users, and ensure the recommendation engine remains responsive under heavy load. Discuss trade-offs between consistency and availability in a massive social graph.
S3
Cassandra
Redis
Kafka
Flink
CDN
HLS
DASH
gRPC
Snowflake ID
Questions & Insights
Clarifying Questions
Scale: What is the target scale for Daily Active Users (DAU) and the expected upload-to-view ratio?
Content: What is the maximum video duration and file size? Do we support high-definition (4K) or just mobile-optimized resolutions?
Personalization: Does the MVP require a real-time recommendation engine ("For You" page) or a simple chronological feed?
Geographic Distribution: Is this a global launch or regional? This impacts CDN and data residency strategies.
Interactions: Are social features (likes, comments, shares, follows) required for the MVP?
Assumptions:
DAU: 100 Million.
Traffic: 1:100 write-to-read ratio (1 upload per 100 views).
Video: Max 60 seconds, ~10-20MB per file.
Personalization: An ML-driven "For You" feed is the core value proposition and must be included.
Storage: Petabyte-scale object storage for raw and transcoded videos.
Thinking Process
To design a short-video platform at scale, we must solve for high-bandwidth ingestion, massive storage, and ultra-low latency playback.
Core Bottleneck 1: Video Ingestion & Processing. How do we handle high-concurrency uploads and ensure videos are playable across all devices? (Solution: Async Transcoding Pipeline + S3 + Kafka).
Core Bottleneck 2: Feed Latency. How do we serve a personalized feed to 100M users in <200ms? (Solution: Pre-computed Feed Cache in Redis).
Core Bottleneck 3: Global Delivery. How do we prevent buffering? (Solution: Multi-tier CDN and Edge Caching).
Core Bottleneck 4: Data Scalability. How do we store trillions of metadata rows? (Solution: Wide-column NoSQL like Cassandra).
Bonus Points
QUIC Protocol: Use QUIC (HTTP/3) for video uploads to improve reliability over lossy mobile networks.
Adaptive Bitrate Streaming (ABS): Implement HLS/DASH with multiple resolution ladders to dynamically switch quality based on user bandwidth.
Vector Embeddings: Use a Vector Database (e.g., Milvus/Pinecone) for the recommendation engine to perform similarity searches on user/video embeddings.
Write-Back Cache: Use a write-back strategy for video view counts to protect the database from hot-key contention.
Design Breakdown
Functional Requirements
Core Use Cases:
Users can upload short videos (up to 60s).
Users can view a personalized "For You" feed.
Users can follow other creators.
Users can like/interact with videos.
Scope Control:
In-scope: Video upload, transcoding, feed generation, basic interactions.
Out-of-scope: Live streaming, video editing tools, monetization/ads, real-time messaging.
Non-Functional Requirements
Scale: Handle 100M+ DAU and 1M+ concurrent video streams.
Latency: Feed loading and video start time (TTFB) should be <200ms.
Availability: 99.99% availability (High availability for the Feed service is critical).
Consistency: Eventual consistency for likes, follower counts, and comment counts.
Fault Tolerance: System must survive the loss of an entire Availability Zone (AZ).
Security: Content moderation hooks and secure URL signatures for CDN access.
Estimation
Traffic Estimation:
100M DAU.
Uploads: 1% of users upload 1 video/day = 1M uploads/day (~12/sec avg, ~100/sec peak).
Feed Reads: 100M users * 20 videos/day = 2B views/day (~23k/sec avg, ~100k/sec peak).
Storage Estimation:
1M videos * 15MB (avg) = 15TB/day.
1 Year = ~5.4PB (before replication/transcoding).
Transcoding (3 resolutions) = ~3x storage = ~16PB/year.
Bandwidth Estimation:
Ingress: 100 uploads/sec * 15MB = 1.5 GB/s (12 Gbps).
Egress: 100k views/sec 15MB (cached/streamed) = 1.5 TB/s (12 Tbps) - Heavy reliance on CDN*.
Blueprint
The architecture utilizes an event-driven approach for video processing and a pre-computed cache for feeding.
API Gateway: Central entry point for auth, rate-limiting, and routing.
Upload Service: Handles multipart uploads and persists raw files to S3.
Transcoder Service: Async workers that convert videos into multiple formats (HLS/DASH).
Feed Service: Aggregates content from the ML-driven pre-computed cache.
Metadata DB (Cassandra): Stores video/user metadata with high write throughput.
ML Engine: Processes user interactions to update feed recommendations.
Simplicity Audit: We use S3 for storage and Cassandra for metadata to avoid the complexity of manual sharding in RDBMS. Redis serves as the "speed layer" for the feed.
Architecture Decision Rationale:
Cassandra is chosen for metadata because its LSM-tree structure handles massive write volumes (likes/views) better than traditional SQL.
Kafka decouples the ingestion from the heavy compute of transcoding, ensuring the user isn't blocked by processing.
High Level Architecture
Sub-system Deep Dive
Edge (Optional)
Content Delivery & Traffic Routing:
CDN: Multi-vendor CDN strategy (Cloudflare/Akamai) to deliver video segments (.ts files) via HLS.
Geo-DNS: Routes users to the nearest Point of Presence (PoP).
Security:
WAF: Protects against DDoS and scrapers.
Signed URLs: Upload/Download services generate short-lived tokens to prevent unauthorized S3 access.
Service
Upload Service:
Supports Multipart Upload for large files and resume-ability.
Generates a
VideoID using a distributed ID generator (Snowflake).Feed Service:
Protocol: gRPC for internal service communication; REST/JSON for the client.
Strategy: If the user's Redis cache is empty, it falls back to a "Popularity-based" cold-start feed from Cassandra.
Resilience:
Circuit Breakers: Applied to the ML Engine to prevent the Feed Service from hanging if recommendations are slow.
Retries: Exponential backoff for video metadata writes to Cassandra.
Storage
Access Pattern: High write for interaction logs; High read for video metadata.
Database Table Design (Cassandra):
videos: video_id (K), creator_id, s3_url, thumbnail_url, timestamp, description.user_feed: user_id (K), video_id (S), score.Technical Selection:Cassandra for linear scalability and high availability.
Distribution Logic: Shard
videos by video_id. Shard user_feed by user_id.Cache
Purpose: Store the pre-computed "For You" list for every active user.
Key-Value Schema:
Key:
feed:{user_id}Value: List of
video_id (e.g., [v123, v456, v789]).TTL: 24 hours for active users; 1 hour for inactive.
Failure Handling: If Redis is down, the system fetches a "Global Hot" list from Cassandra.
Messaging
Purpose: Decouple video upload from transcoding and ML indexing.
Event Schema:
UploadEvent { video_id, user_id, raw_s3_path, timestamp }.Throughput: Kafka partitions scaled to handle peak upload bursts (~1k events/sec).
Failure Handling: Dead-letter queues (DLQ) for failed transcoding jobs.
Data Processing
Processing Model: Flink for real-time interaction processing; Spark for batch ML model training.
Processing DAG: Kafka Source -> Feature Extraction -> ML Model Scoring -> Redis Sink (Feed update).
Technical Selection:Flink for sub-second latency in updating the "For You" feed based on the video just watched.
Wrap Up
Advanced Topics
Trade-offs (PACELC): We prioritize Availability (A) and Partition Tolerance (P) over Consistency (C). A user seeing a "Like" count off by a few digits is acceptable; the feed being down is not.
Reliability:
Transcoding Retries: If a worker fails, another picks up the message from Kafka.
S3 Cross-Region Replication: Critical for disaster recovery.
Bottleneck Analysis:
Hot Creators: Viral videos create "hot partitions" in Cassandra. Mitigation: Use a write-back cache in Redis for incrementing view counts before flushing to DB.
Security:
Moderation: Use AWS Rekognition or a custom CNN model at the end of the transcoding pipeline to flag NSFW content before it hits the feed.
Distinguishing Insights:
Video Tail-Latency: TikTok's "smoothness" comes from the app pre-fetching the next 2-3 videos in the feed background while the user is watching the current one. The API must support pagination and pre-fetch hints.
Edge Compute: Move the "Video URL Signing" logic to the CDN Edge (Cloudflare Workers) to reduce API Gateway load.