The Question

Scalable Audio Streaming Platform Design

Design a global music streaming service similar to Spotify. The system must support a library of 100 million tracks and 500 million users. Key challenges include low-latency audio delivery (minimal buffering), efficient search across a massive catalog, and high-throughput tracking of playback events for royalty and analytics purposes. Address how you would handle global content distribution, metadata management, and the trade-offs between consistency and availability in user-generated content like playlists.

PostgreSQL

Redis

Kafka

Elasticsearch

CDN

Flink

Kubernetes

gRPC

Questions & Insights

Clarifying Questions

Scale: What is the target Monthly Active User (MAU) count and song library size? (Assumption: 500M MAU, 100M tracks).

Audio Quality: Do we need to support multiple bitrates/formats (e.g., Ogg Vorbis, AAC) and lossless? (Assumption: Support 96kbps, 160kbps, and 320kbps for adaptive streaming).

Offline Mode: Is offline playback (encrypted local caching) required for the MVP? (Assumption: No, focus on real-time streaming).

Social/Discovery: Are real-time "Friend Activity" or complex ML recommendations in scope? (Assumption: MVP focuses on Search, Playback, and Playlists only).

Latency: What is the target "Time to First Byte" (TTFB) for audio playback? (Assumption: < 200ms to ensure a "snappy" feel).

Thinking Process

The Core Bottleneck: Delivering large volumes of binary audio data globally with low jitter and minimal buffering.

The Strategy:

How do we store and serve 100M+ high-quality audio files efficiently? (Object Storage + Multi-layered CDN).

How do we handle the high-read volume for track metadata and search? (Read-through Caching + Search Indices).

How do we ensure seamless playback across network transitions? (Chunked streaming via HTTP Range requests).

How do we track user "Plays" for royalty calculations and history? (Async event streaming via Kafka).

Bonus Points

Adaptive Bitrate Streaming (ABR): Implementing logic similar to HLS/DASH, where the client switches audio quality dynamically based on available network bandwidth.
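A minimal sketch of the client-side quality switch, assuming the 96/160/320 kbps ladder from the clarifying questions. The `choose_bitrate` name and the 0.8 safety factor are hypothetical; real HLS/DASH players also weigh buffer occupancy and add hysteresis so they do not flap between rungs.

```python
# Bitrate ladder taken from the assumptions above (kbps).
BITRATE_LADDER_KBPS = (96, 160, 320)


def choose_bitrate(measured_kbps: float, safety_factor: float = 0.8) -> int:
    """Pick the highest rung that fits within a fraction of measured throughput.

    safety_factor is a hypothetical headroom parameter so that a small dip in
    bandwidth does not immediately drain the playback buffer.
    """
    budget = measured_kbps * safety_factor
    fitting = [rate for rate in BITRATE_LADDER_KBPS if rate <= budget]
    # On very poor links, fall back to the lowest rung rather than stopping.
    return max(fitting) if fitting else BITRATE_LADDER_KBPS[0]


if __name__ == "__main__":
    for kbps in (100, 250, 1000):
        print(kbps, "kbps measured ->", choose_bitrate(kbps), "kbps stream")
```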

Content-Addressable Storage (CAS): Using hash-based IDs for audio files to eliminate duplicate uploads and ensure data integrity.
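As an illustration of content addressing, the sketch below derives each file's storage key from its SHA-256 digest (the hash choice is an assumption), so re-uploading identical bytes maps to the same key and a recomputed digest on read detects corruption. `ContentAddressedStore` is a hypothetical in-memory stand-in for the object store.

```python
import hashlib


def content_id(audio_bytes: bytes) -> str:
    """Storage key derived from the file's SHA-256 digest."""
    return hashlib.sha256(audio_bytes).hexdigest()


class ContentAddressedStore:
    """In-memory stand-in for an object store keyed by content hash."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, audio_bytes: bytes) -> tuple[str, bool]:
        """Store bytes and return (key, newly_stored); duplicates are no-ops."""
        key = content_id(audio_bytes)
        if key in self._blobs:
            return key, False
        self._blobs[key] = audio_bytes
        return key, True

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        # Integrity check: the key must still match the bytes' digest.
        if content_id(data) != key:
            raise ValueError(f"corrupted blob {key}")
        return data
```

Because the key is a pure function of the bytes, two labels uploading the same master file store it once.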

Geographic Request Steering: Using Anycast IP routing or geo-aware DNS to route users to the nearest "Edge POP" (Point of Presence) for audio chunks.

Write-optimized Metadata: Using an LSM-tree-based store or a write-optimized NoSQL database for user "Like" actions to absorb massive write bursts during global song releases.

Design Breakdown

Functional Requirements

Core Use Cases:

Users can search for tracks, artists, and albums.

Users can stream audio tracks with minimal buffering.

Users can create, update, and follow playlists.

The system tracks "Play Counts" for history and royalties.

Scope Control:

In-scope: Music streaming, metadata management, search, and simple playlisting.

Out-of-scope: Podcasts, Video, Real-time social feeds, AI recommendations (MVP uses basic popularity).

Non-Functional Requirements

Scale: Support 100M+ songs and 500M users.

Latency: Playback start < 200ms; Search results < 100ms.

Availability: 99.99% availability (Audio must play even if recommendation services are down).

Consistency: Eventual consistency for playlists; Strong consistency for user account/subscription status.

Fault Tolerance: Regional failover for metadata; multi-CDN strategy for audio delivery.

Estimation

Traffic Estimation:

500M MAU -> 50M DAU (assuming a 10% DAU/MAU ratio).

Avg 10 songs/day/user = 500M play requests/day (~6,000 QPS).

Peak QPS (3x): 18,000 QPS.

Storage Estimation:

100M songs * 3 formats * 5MB/song = 1.5 Petabytes for audio.

Metadata: 100M songs * 1KB/row = 100GB.

Bandwidth Estimation:

50M DAU * 10 songs * 5MB = 2,500 TB/day.

~230 Gbps average egress, or roughly 700 Gbps at the 3x peak (requires heavy CDN reliance).
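The estimates above can be reproduced with a short script. Its inputs are this section's own assumptions (decimal units, 5MB average per song across formats); the unrounded results land slightly below the rounded figures, at about 5,800 average and 17,400 peak QPS.

```python
# Back-of-envelope inputs, all taken from the assumptions in this section.
DAU = 50_000_000
SONGS_PER_USER_PER_DAY = 10
PEAK_FACTOR = 3
TRACKS = 100_000_000
FORMATS = 3
MB_PER_SONG = 5            # average across the three bitrates
METADATA_KB_PER_ROW = 1
SECONDS_PER_DAY = 86_400

plays_per_day = DAU * SONGS_PER_USER_PER_DAY
avg_qps = plays_per_day / SECONDS_PER_DAY
peak_qps = avg_qps * PEAK_FACTOR

audio_storage_pb = TRACKS * FORMATS * MB_PER_SONG / 1e9   # MB -> PB
metadata_gb = TRACKS * METADATA_KB_PER_ROW / 1e6          # KB -> GB

egress_tb_per_day = plays_per_day * MB_PER_SONG / 1e6     # MB -> TB
# TB/day -> bits/s -> Gbps (average; multiply by PEAK_FACTOR for peak).
egress_gbps = egress_tb_per_day * 1e12 * 8 / SECONDS_PER_DAY / 1e9

print(f"plays/day: {plays_per_day:,}")
print(f"avg QPS: {avg_qps:,.0f}, peak QPS: {peak_qps:,.0f}")
print(f"audio: {audio_storage_pb:.1f} PB, metadata: {metadata_gb:.0f} GB")
print(f"egress: {egress_tb_per_day:,.0f} TB/day ~ {egress_gbps:.0f} Gbps avg")
```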

Blueprint

Concise Summary: A microservices architecture leveraging Object Storage and CDNs for audio delivery, while using a relational database for metadata and Elasticsearch for discovery.

Major Components:

CDN (Content Delivery Network): Distributed edge nodes that cache audio chunks close to the user.

Music Service: Manages track metadata (Artist, Album, Song) and generates streaming URLs.

Audio Storage (S3): Source of truth for all encoded audio files.

Search Service: Provides full-text search over the music catalog.

Play Count Collector: Ingests play events via Kafka for analytics and royalty processing.

Simplicity Audit: This design avoids complex "Recommendation Engines" or "P2P Mesh" (which Spotify used early on but moved away from) in favor of standard, scalable cloud-native components.

Architecture Decision Rationale:

CDN: Essential because the primary cost/bottleneck is bandwidth and latency of large files.

Relational Metadata: Music data is highly structured (Artist -> Album -> Song), which makes Postgres a natural fit.

Asynchronous Play Tracking: Synchronously writing to the database on every "Play" (~6,000 QPS on average) would add latency to the playback path and load to the primary DB; Kafka buffers these writes.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing:

CDN: Use a multi-CDN strategy (e.g., Akamai and CloudFront) to serve .ogg or .aac audio chunks. Clients request chunks via HTTP Range headers.
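A minimal sketch of how a client might plan its Range requests, assuming a hypothetical fixed 512 KiB chunk size. Range offsets are zero-based and inclusive, so a 1,000-byte file split into 400-byte chunks yields `bytes=0-399`, `bytes=400-799`, and `bytes=800-999`. The open-ended form `bytes=N-` lets a client resume after a network switch without re-downloading what it already has.

```python
CHUNK_SIZE = 512 * 1024  # hypothetical chunk size in bytes


def range_headers(file_size: int, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split a file into Range header values with inclusive byte offsets."""
    headers = []
    for start in range(0, file_size, chunk_size):
        end = min(start + chunk_size, file_size) - 1
        headers.append(f"bytes={start}-{end}")
    return headers


def resume_header(bytes_already_received: int) -> str:
    """Open-ended range to continue a download after a network transition."""
    return f"bytes={bytes_already_received}-"


if __name__ == "__main__":
    print(range_headers(1000, 400))
    print(resume_header(1024))
```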

Global Load Balancing: Use latency-based DNS routing (e.g., Route 53) to point api.spotify.com to the lowest-latency region.

Security & Perimeter:

API Gateway: Handles JWT validation, rate limiting (100 requests/sec per user), and TLS (SSL) termination.
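The per-user limit can be enforced with a token bucket, one common algorithm; the design does not prescribe a specific one. In this sketch each user's bucket refills at 100 tokens/sec, and the burst capacity of 100 is an assumption. `TokenBucket` and `PerUserRateLimiter` are hypothetical names, and state lives in one process; a gateway fleet would need shared counters (for example in Redis) or per-node limits.

```python
import time
from typing import Dict, Optional


class TokenBucket:
    """Refills at rate_per_sec up to capacity; each request spends one token."""

    def __init__(self, rate_per_sec: float, capacity: float, now: float) -> None:
        self.rate_per_sec = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity          # start full so a new user can burst
        self.last_refill = now

    def allow(self, now: float) -> bool:
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_sec)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerUserRateLimiter:
    """One bucket per user_id, defaulting to the 100 requests/sec policy."""

    def __init__(self, rate_per_sec: float = 100.0, capacity: float = 100.0) -> None:
        self.rate_per_sec = rate_per_sec
        self.capacity = capacity
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, user_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        bucket = self._buckets.get(user_id)
        if bucket is None:
            bucket = TokenBucket(self.rate_per_sec, self.capacity, now)
            self._buckets[user_id] = bucket
        return bucket.allow(now)
```

A rejected request would typically map to HTTP 429 at the gateway.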

Service

Topology & Scaling: Stateless microservices running on Kubernetes (K8s), autoscaled on CPU utilization and request rate.
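Kubernetes' Horizontal Pod Autoscaler computes, per metric, `desired = ceil(current_replicas * current_value / target_value)`, skips changes within a tolerance (0.1 by default), and takes the largest proposal when several metrics are configured. The plain-Python sketch below mirrors that rule for a CPU metric and a request-rate metric. It is a simplification that ignores pod readiness and stabilization windows, and the default bounds of 2 and 100 replicas are hypothetical. Scaling on request count in a real cluster would also need a custom-metrics source.

```python
import math
from typing import Dict, Tuple


def desired_replicas(current_replicas: int,
                     metrics: Dict[str, Tuple[float, float]],
                     min_replicas: int = 2,
                     max_replicas: int = 100,
                     tolerance: float = 0.1) -> int:
    """metrics maps a name to (current_value, target_value), e.g. CPU % or
    requests/sec per pod. Returns the clamped replica count."""
    proposals = []
    for current_value, target_value in metrics.values():
        ratio = current_value / target_value
        if abs(ratio - 1.0) <= tolerance:
            proposals.append(current_replicas)          # close enough: no change
        else:
            proposals.append(math.ceil(current_replicas * ratio))
    # With several metrics, the largest proposal wins.
    desired = max(proposals) if proposals else current_replicas
    return max(min_replicas, min(max_replicas, desired))
```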

API Schema Design:

GET /v1/tracks/{id}: Returns metadata and a signed CDN URL for the audio file.

GET /v1/search?q={query}: Returns list of track/artist objects.

POST /v1/me/play-events: Idempotent endpoint to report a song play.
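One way to make the play-event endpoint idempotent is to have the client attach a UUID to each play and reuse it on every retry, so the server can acknowledge duplicates without publishing them twice. The `event_id` field, the 202/200 status choices, and the `PlayEventEndpoint` class are assumptions for illustration. A production dedup set would live in a shared store with an expiry rather than in process memory.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PlayEvent:
    event_id: str              # hypothetical client-generated UUID, reused on retry
    user_id: str
    track_id: str
    timestamp_ms: int
    duration_listened_ms: int


class PlayEventEndpoint:
    """Hypothetical handler for POST /v1/me/play-events that dedups by event_id."""

    def __init__(self) -> None:
        self._seen: Dict[str, PlayEvent] = {}
        self.published: List[PlayEvent] = []   # stand-in for the Kafka producer

    def handle(self, event: PlayEvent) -> int:
        if event.event_id in self._seen:
            return 200         # duplicate retry: acknowledge, do not re-publish
        self._seen[event.event_id] = event
        self.published.append(event)
        return 202             # accepted for asynchronous processing


if __name__ == "__main__":
    endpoint = PlayEventEndpoint()
    play = PlayEvent(str(uuid.uuid4()), "user-1", "track-9", 1_700_000_000_000, 45_000)
    print(endpoint.handle(play), endpoint.handle(play), len(endpoint.published))
```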

Resilience & Reliability:

Circuit Breakers: If the Search Service is slow or failing, the API Gateway returns a cached "Popular Tracks" list instead.

Retries: Exponential backoff for metadata fetches.
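The two resilience patterns above compose naturally. This is a minimal sketch, not a specific library's API. `retry_with_backoff` retries with exponentially growing, jittered delays, and `CircuitBreaker` opens after a run of consecutive failures and serves a fallback until a reset timeout passes. The thresholds, delays, and names are hypothetical, and "slow" is assumed to surface as a timeout exception from the client call.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(fn: Callable[[], T], attempts: int = 4,
                       base_delay: float = 0.05, max_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying failures with exponentially growing, jittered delays."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise                       # out of retries: surface the error
            # Cap doubles per attempt (0.05s, 0.1s, 0.2s, ...) up to max_delay;
            # "full jitter" sleeps a random time below the cap.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
    raise RuntimeError("unreachable")


class CircuitBreaker:
    """Opens after failure_threshold consecutive failures and serves the
    fallback until reset_timeout seconds pass; then one trial call runs."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()           # open: skip the failing dependency
            self.opened_at = None           # half-open: allow one trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0                   # success closes the breaker
        return result
```

Wrapping a call as `breaker.call(lambda: retry_with_backoff(fetch), fallback)` lets retries absorb brief blips while repeated exhausted retries trip the breaker. Retries like this suit idempotent reads such as metadata fetches.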

Storage

Access Pattern: 99% Read (Metadata/Search), 1% Write (Playlists/Likes).

Database Table Design (PostgreSQL):

Tracks: id (PK), album_id (FK), title, duration_ms, s3_path.

Artists: id (PK), name, genre, bio.

Playlists: id (PK), user_id, name, visibility.

Playlist_Tracks: playlist_id, track_id, position (Composite PK).
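To make the table design concrete, the sketch below creates these tables in an in-memory SQLite database, used here only as a runnable stand-in for PostgreSQL, and reads a playlist in order. The table design does not pin down which columns form the composite key. This sketch assumes (playlist_id, position), which also allows the same track to appear twice in a playlist. `album_id` has no foreign key because an Albums table is not defined here, and `playlist_track_titles` is a hypothetical helper.

```python
import sqlite3

# Portable subset of the schema above (SQLite stand-in for PostgreSQL).
SCHEMA = """
CREATE TABLE artists (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    genre TEXT,
    bio   TEXT
);
CREATE TABLE tracks (
    id          INTEGER PRIMARY KEY,
    album_id    INTEGER NOT NULL,  -- FK to an Albums table not defined here
    title       TEXT NOT NULL,
    duration_ms INTEGER NOT NULL,
    s3_path     TEXT NOT NULL
);
CREATE TABLE playlists (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL,
    name       TEXT NOT NULL,
    visibility TEXT NOT NULL
);
CREATE TABLE playlist_tracks (
    playlist_id INTEGER NOT NULL REFERENCES playlists(id),
    track_id    INTEGER NOT NULL REFERENCES tracks(id),
    position    INTEGER NOT NULL,
    PRIMARY KEY (playlist_id, position)  -- assumed composite key
);
"""


def playlist_track_titles(conn: sqlite3.Connection, playlist_id: int) -> list[str]:
    """Track titles of one playlist in playlist order."""
    rows = conn.execute(
        """SELECT t.title
           FROM playlist_tracks pt
           JOIN tracks t ON t.id = pt.track_id
           WHERE pt.playlist_id = ?
           ORDER BY pt.position""",
        (playlist_id,),
    )
    return [title for (title,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```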

Technical Selection:

PostgreSQL: For Tracks/Artists/Playlists due to relational requirements and ACID for user data.

Elasticsearch: For Search; indexes track_name and artist_name with N-gram analyzers for partial and as-you-type matching, combined with fuzzy queries for typo tolerance.
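To show why n-gram indexing makes partial input like "tay swi" match "Taylor Swift", the plain-Python sketch below imitates an edge n-gram tokenizer, a flavor of n-gram analysis that emits word prefixes, over a tiny inverted index. It is not the Elasticsearch API. In a real index this would be an `edge_ngram` tokenizer in the index's analysis settings, and typo tolerance would come separately from a query's `fuzziness` option. The gram sizes and helper names are assumptions.

```python
def edge_ngrams(term: str, min_gram: int = 2, max_gram: int = 10) -> list[str]:
    """Lowercased prefixes of a word, as an edge n-gram tokenizer would emit."""
    term = term.lower()
    return [term[:n] for n in range(min_gram, min(max_gram, len(term)) + 1)]


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Inverted index from each prefix gram to the doc ids containing it."""
    index: dict[str, set[int]] = {}
    for doc_id, text in docs.items():
        for word in text.split():
            for gram in edge_ngrams(word):
                index.setdefault(gram, set()).add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Every query word must match an indexed prefix (AND semantics)."""
    matches = [index.get(word.lower(), set()) for word in query.split()]
    return set.intersection(*matches) if matches else set()
```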

Distribution Logic: Shard Playlist_Tracks by playlist_id to ensure all tracks for a single playlist reside on one node.
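Routing by playlist_id only works if every service computes the same shard for the same id, so the sketch uses a stable digest rather than Python's built-in `hash()`, which is randomized per process for strings. `shard_for_playlist` is a hypothetical helper. Plain modulo remaps most playlists when the shard count changes; a fixed number of logical shards or consistent hashing reduces that churn.

```python
import hashlib


def shard_for_playlist(playlist_id: str, num_shards: int) -> int:
    """Stable hash routing: all Playlist_Tracks rows of one playlist share a shard."""
    digest = hashlib.md5(playlist_id.encode("utf-8")).digest()  # not for security
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    rows = [("pl-123", track) for track in range(5)] + [("pl-456", 1)]
    for playlist_id, track_id in rows:
        print(playlist_id, track_id, "-> shard", shard_for_playlist(playlist_id, 16))
```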

Cache

Purpose & Justification: Reduce DB load for "Hot Tracks" and "User Profiles".

Key-Value Schema:

track:{id} -> JSON metadata (TTL 24h).

user:{id}:profile -> User preferences (TTL 1h).

Technical Selection: Redis, with an LRU eviction policy (e.g., allkeys-lru) so the least recently used keys are evicted when memory is full.
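A minimal sketch of the track:{id} read path with a 24h TTL. It uses a cache-aside pattern, in which the service rather than the cache loads from the database on a miss. `LRUTTLCache` is a hypothetical in-process stand-in for Redis so the example runs without a server. Against Redis, the same flow would be a GET followed by SET with an expiry, with eviction handled by the server's configured policy.

```python
import json
import time
from collections import OrderedDict
from typing import Callable, Optional, Tuple


class LRUTTLCache:
    """In-process stand-in for Redis: per-key TTLs plus LRU eviction."""

    def __init__(self, max_entries: int, clock: Callable[[], float] = time.monotonic) -> None:
        self.max_entries = max_entries
        self.clock = clock
        self._data: "OrderedDict[str, Tuple[float, str]]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self.clock() >= expires_at:
            del self._data[key]              # lazily drop expired entries
            return None
        self._data.move_to_end(key)          # mark as most recently used
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._data[key] = (self.clock() + ttl_seconds, value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)   # evict least recently used


TRACK_TTL_SECONDS = 24 * 3600


def get_track(cache: LRUTTLCache, track_id: int,
              load_from_db: Callable[[int], dict]) -> dict:
    """Cache-aside read of track:{id}; falls back to the database on a miss."""
    key = f"track:{track_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    row = load_from_db(track_id)
    cache.set(key, json.dumps(row), TRACK_TTL_SECONDS)
    return row
```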

Messaging

Purpose & Decoupling: Decouple the critical playback path from royalty and history processing.

Event / Topic Schema:

Topic: song-plays. Payload: {user_id, track_id, timestamp, duration_listened}.

Technical Selection: Kafka, for high throughput (millions of events/sec). Use user_id as the partition key: Kafka preserves ordering only within a partition, so keying by user_id keeps each user's play history in chronological order.
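The sketch below shows why keying by user_id keeps a user's events together. Every record with the same key hashes to the same partition. Kafka producer clients do this automatically when a record has a key, and the Java client's default partitioner uses murmur2. The SHA-1 hash here is only an illustrative stand-in and will not pick the same partitions as a real Kafka client. The partition count of 64 and the millisecond units are assumptions; in practice you would pass user_id as the record key and let the client partition.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

NUM_PARTITIONS = 64   # hypothetical partition count for the song-plays topic


@dataclass(frozen=True)
class SongPlay:
    user_id: str
    track_id: str
    timestamp: int            # epoch milliseconds (assumption)
    duration_listened: int    # milliseconds (assumption)


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stand-in for the client's partitioner: same key -> same partition.
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def to_record(play: SongPlay) -> tuple[int, bytes, bytes]:
    """(partition, key, value) for the song-plays topic, keyed by user_id."""
    key = play.user_id.encode("utf-8")
    value = json.dumps(asdict(play)).encode("utf-8")
    return partition_for(play.user_id), key, value
```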

Data Processing

Processing Model: Stream processing for real-time history; Batch processing for daily royalty reports.

Technical Selection: Flink for real-time aggregation (e.g., "Top 50" charts updated hourly).
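As a plain-Python illustration of what an hourly "Top 50" job computes, the sketch buckets play events into one-hour tumbling windows and keeps the n most-played tracks per window. It is not Flink code. In Flink this would roughly correspond to a keyed tumbling event-time window (for example `TumblingEventTimeWindows` in the DataStream API) followed by a top-N step. Reading "updated hourly" as non-overlapping one-hour windows is an assumption.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

WINDOW_MS = 60 * 60 * 1000  # one-hour tumbling windows


def hourly_top_n(events: Iterable[Tuple[int, str]],
                 n: int = 50) -> Dict[int, List[Tuple[str, int]]]:
    """events are (timestamp_ms, track_id) pairs.

    Returns window_start_ms -> [(track_id, plays), ...] with at most n entries.
    """
    counts: Dict[int, Counter] = defaultdict(Counter)
    for ts_ms, track_id in events:
        window_start = ts_ms - ts_ms % WINDOW_MS   # align to the hour boundary
        counts[window_start][track_id] += 1
    return {start: c.most_common(n) for start, c in sorted(counts.items())}
```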

Wrap Up

Advanced Topics

Trade-offs: We choose Eventual Consistency for playlist updates across devices to favor Availability. If a user adds a song on mobile, it might take seconds to appear on desktop.

Reliability: If the Audio S3 bucket in one region fails, the CDN is configured to failover to a secondary region origin.

Bottleneck Analysis:

Hot Keys and Shards: A major release (e.g., a new Taylor Swift album) concentrates a massive traffic spike on a handful of track IDs and the shards that hold them.

Mitigation: Aggressive edge caching at the CDN level and pre-warming Redis caches for high-anticipation releases.

Security: Audio files are stored with unique, non-guessable keys and served via Signed URLs with short TTLs (e.g., 2 hours) to prevent link sharing outside the app.
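A minimal sketch of short-lived signed URLs using a generic HMAC-SHA256 scheme. CDN vendors each define their own signing formats (CloudFront, for example, signs policies with key pairs), so this is not any vendor's format. `SECRET`, `sign_url`, `verify_url`, the `expires`/`sig` query parameters, and the example host are all hypothetical. The signature covers the path and the expiry, so editing either invalidates the link.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET = b"replace-with-a-real-secret"   # hypothetical key shared by API and edge
URL_TTL_SECONDS = 2 * 60 * 60            # 2 hours, per the policy above


def _signature(base_url: str, expires_at: int, secret: bytes) -> str:
    message = f"{base_url}|{expires_at}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_url(base_url: str, expires_at: int, secret: bytes = SECRET) -> str:
    """Append expiry and signature; base_url must not already have a query string."""
    sig = _signature(base_url, expires_at, secret)
    return f"{base_url}?{urlencode({'expires': expires_at, 'sig': sig})}"


def verify_url(url: str, now: Optional[int] = None, secret: bytes = SECRET) -> bool:
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        expires_at = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    now = int(time.time()) if now is None else now
    if now > expires_at:
        return False                      # link has expired
    base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # Constant-time comparison avoids leaking signature bytes via timing.
    return hmac.compare_digest(sig, _signature(base_url, expires_at, secret))
```

The API would call `sign_url(path, int(time.time()) + URL_TTL_SECONDS)` when answering `GET /v1/tracks/{id}`, and the edge would call `verify_url` before serving bytes.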