The Question
Design

Video Conferencing Platform

Design a large-scale video conferencing platform similar to Zoom. The system should support real-time audio and video communication, screen sharing, meeting scheduling, recording, and breakout rooms for millions of simultaneous participants with low latency and high reliability.
WebSocket
SFU
Redis
PostgreSQL
SRTP/UDP
Questions & Insights

Thinking Process

To design a high-performance video conferencing system like Zoom, the core challenge shifts from standard CRUD operations to managing real-time, low-latency data streams and signaling.
Signaling vs. Media: Separate the control plane (joining, leaving, permissions) from the data plane (audio/video packets).
Media Routing Topology: Choose between Mesh (every client uploads to every other peer, too heavy for clients), MCU (the server decodes and mixes every stream, too heavy for servers), and SFU (Selective Forwarding Unit). For the MVP, an SFU is the usual choice because it forwards streams without transcoding, which keeps latency and server CPU low.
Logical Progression:
How do we connect two peers and exchange session metadata? (Signaling via WebSockets).
How do we scale to 10+ participants without crushing the uploader's bandwidth? (Introducing the SFU).
How do we ensure a user connects to the closest data center to minimize latency? (Geo-DNS and Global Edge locations).
How do we handle fluctuating network conditions? (Adaptive Bitrate and Simulcast).
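The topology trade-off above can be made concrete by counting streams. The following is a minimal sketch, assuming every participant publishes exactly one video stream and subscribes to everyone else; `StreamLoad` and `topology_load` are illustrative names, not part of any real library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamLoad:
    client_uploads: int    # streams each client sends
    client_downloads: int  # streams each client receives
    server_decodes: int    # streams the server must decode (0 = forward only)


def topology_load(topology: str, n: int) -> StreamLoad:
    """Per-meeting stream counts for n participants who all publish video."""
    if n < 2:
        raise ValueError("a meeting needs at least two participants")
    if topology == "mesh":
        # Every client sends to and receives from every other peer directly.
        return StreamLoad(n - 1, n - 1, 0)
    if topology == "mcu":
        # The server decodes all n streams and mixes them into one per client.
        return StreamLoad(1, 1, n)
    if topology == "sfu":
        # One upload per client; the server forwards packets without decoding.
        return StreamLoad(1, n - 1, 0)
    raise ValueError(f"unknown topology: {topology}")
```

For a 10-person call, `topology_load("mesh", 10)` gives 9 uploads per client, while `topology_load("sfu", 10)` gives a single upload and no server-side decoding, which is the reason the SFU is introduced at the "10+ participants" step.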

Bonus Points

Cascading SFUs: To support massive webinars (10k+ viewers), SFUs should be designed in a tree-like hierarchy to distribute the load across regions rather than a single hub.
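Simulcast & SVC: Explain the difference between Simulcast (the sender publishes several independent encodings of the same video at different resolutions) and Scalable Video Coding (SVC), which packs multiple resolution/frame-rate layers into one stream. In both cases the SFU can drop or switch layers for low-bandwidth participants without re-encoding.
UDP-based Transport: Detail why Zoom uses custom UDP protocols or SRTP over UDP rather than TCP: TCP's in-order delivery causes Head-of-Line blocking, which is fatal for real-time media.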
Jitter Buffer Management: Implementing dynamic jitter buffers on the client side to smooth out packet arrival variance at the cost of minimal, controlled latency.
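The layer-dropping idea from the Simulcast & SVC point can be sketched as a per-receiver selection step inside the SFU. This is a minimal sketch under stated assumptions: the three-layer ladder, its bitrates, and the 80% headroom factor are hypothetical values chosen for illustration, and `pick_layer` is not a real SFU API.

```python
from typing import Optional

# Hypothetical simulcast ladder: (layer name, bitrate in kbps), lowest first.
LAYERS = [("low", 150), ("medium", 500), ("high", 1500)]


def pick_layer(estimated_kbps: float, headroom: float = 0.8) -> Optional[str]:
    """Return the highest layer that fits within the receiver's bandwidth budget.

    estimated_kbps is the receiver's estimated downlink bandwidth; headroom
    reserves a share of it for audio and retransmissions (assumed value).
    Returns None when even the lowest layer does not fit (audio-only fallback).
    """
    budget = estimated_kbps * headroom
    chosen = None
    for name, kbps in LAYERS:
        if kbps <= budget:
            chosen = name  # keep climbing while the layer still fits
    return chosen
```

The SFU would re-run this whenever the bandwidth estimate for a receiver changes and then forward only packets from the chosen layer; no re-encoding is involved.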
Design Breakdown

Functional Requirements

Meeting Management: Users can create, join, and end meetings.
Real-time Video/Audio: Low-latency 1-to-N and N-to-N communication.
Signaling: Managing participant state (mute, camera on/off, hand raise).
Text Chat: Persistent and real-time messaging during the call.
Screen Sharing: High-resolution stream forwarding.

Non-Functional Requirements

Latency: Sub-200ms end-to-end latency for a seamless conversational experience.
High Availability: The signaling plane must be 99.99% available; the media plane must gracefully handle SFU failures.
Scalability: Support up to 1,000 participants per meeting in the MVP.
Reliability: Reconnect gracefully when a user's network switches (e.g., Wi-Fi to 5G).

Estimation

Traffic: 10M Daily Active Users.
Concurrency: 1M concurrent participants at peak.
Bandwidth: Average 1.5 Mbps per high-quality stream.
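1M users * 1.5 Mbps = 1.5 Tbps total egress, assuming each participant receives roughly one high-quality stream at a time (e.g., active-speaker view); gallery views with several visible streams push this higher.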
Storage: Metadata is negligible (~100 bytes per meeting).
SFU Capacity: A standard c5.4xlarge instance can handle ~200-500 concurrent streams depending on optimization.
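The estimates above can be checked with a few lines of arithmetic. This sketch assumes one 1.5 Mbps stream per concurrent participant and treats the 200-500 streams-per-node range as given; the function names are illustrative.

```python
import math

CONCURRENT_USERS = 1_000_000     # peak concurrent participants
STREAM_MBPS = 1.5                # average bitrate of one high-quality stream
STREAMS_PER_NODE = (200, 500)    # assumed capacity range of one SFU instance


def total_tbps(users: int, mbps_per_user: float) -> float:
    """Aggregate bandwidth in Tbps (1 Tbps = 1,000,000 Mbps)."""
    return users * mbps_per_user / 1_000_000


def nodes_needed(streams: int, streams_per_node: int) -> int:
    """Minimum SFU node count, rounded up, with no spare capacity."""
    return math.ceil(streams / streams_per_node)


if __name__ == "__main__":
    print(total_tbps(CONCURRENT_USERS, STREAM_MBPS))        # aggregate Tbps
    for capacity in STREAMS_PER_NODE:
        print(capacity, nodes_needed(CONCURRENT_USERS, capacity))
```

Under these assumptions the fleet lands between 2,000 and 5,000 SFU nodes before adding any headroom for failover or regional imbalance.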

Blueprint

Concise Summary: The system utilizes a WebSocket-based Signaling Service for session control and a cluster of Selective Forwarding Units (SFUs) for routing encrypted media packets with minimal latency.
Major Components:
Signaling Service: A WebSocket cluster that manages the "room" state, handles SDP (Session Description Protocol) exchange, and broadcasts participant metadata.
SFU (Selective Forwarding Unit): The media engine that receives video/audio from a sender and forwards it to all authorized receivers in the room without re-encoding.
Redis (State Store): Stores transient session data, mapping users to specific SFU nodes and signaling servers.
PostgreSQL: Persistent storage for user accounts, meeting history, and scheduled meeting metadata.
Simplicity Audit: This architecture bypasses the complexity of MCU (Mixing) to save on CPU costs and avoids the instability of P2P Mesh for meetings larger than 3 people.

High Level Architecture

Sub-system Deep Dive

Service

Topology & Scaling:
Signaling Service: Node.js or Go WebSocket servers (e.g., Socket.IO or gorilla/websocket) that keep room state in Redis rather than in process memory, so any node can serve any client. Scaled horizontally via K8s based on connection count.
Service Discovery: When a user joins, the Signaling Service queries the SFU Manager (logical part of the service) to find an SFU with available capacity in the user's closest region.
API Spec:
POST /v1/meetings: Create a meeting ID.
WS /v1/signal: WebSocket endpoint for SDP offer/answer exchange and ICE candidates.
GET /v1/meetings/{id}/sfu: Returns the IP of the assigned SFU node.
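To illustrate what might travel over the WS /v1/signal endpoint, here is a minimal sketch of a JSON message envelope. The field names (`type`, `meeting_id`, `sender_id`, `payload`) and the set of message types are assumptions made for this example; the design above does not specify a wire format.

```python
import json
from dataclasses import asdict, dataclass

# Assumed message types covering room membership and WebRTC negotiation.
ALLOWED_TYPES = {"join", "leave", "offer", "answer", "ice-candidate"}


@dataclass
class SignalMessage:
    type: str        # one of ALLOWED_TYPES
    meeting_id: str
    sender_id: str
    payload: dict    # e.g. {"sdp": "..."} for offer/answer


def encode(msg: SignalMessage) -> str:
    """Serialize a message to the JSON text frame sent over the WebSocket."""
    if msg.type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported message type: {msg.type}")
    return json.dumps(asdict(msg))


def decode(raw: str) -> SignalMessage:
    """Parse and validate a JSON text frame received from a client."""
    msg = SignalMessage(**json.loads(raw))
    if msg.type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported message type: {msg.type}")
    return msg
```

Validating the type on both encode and decode keeps malformed client frames from being broadcast to the rest of the room.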

Storage

Data Model:
Meetings Table: id (UUID), host_id, start_time, settings (JSONB), status (Active/Ended).
Participants Table: id, meeting_id, user_id, joined_at, left_at.
Database Logic:
Read-heavy for meeting validation during the "Join" phase.
Write-heavy for participant logs (used for post-meeting analytics).
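The data model above can be sketched as a schema plus the two access paths it describes. To stay runnable without a database server, this example uses Python's built-in `sqlite3` as a stand-in for PostgreSQL; column types are simplified accordingly (noted in comments), and the index and helper names are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE meetings (
    id         TEXT PRIMARY KEY,            -- UUID in PostgreSQL
    host_id    TEXT NOT NULL,
    start_time TEXT NOT NULL,               -- TIMESTAMPTZ in PostgreSQL
    settings   TEXT NOT NULL DEFAULT '{}',  -- JSONB in PostgreSQL
    status     TEXT NOT NULL CHECK (status IN ('Active', 'Ended'))
);
CREATE TABLE participants (
    id         INTEGER PRIMARY KEY,
    meeting_id TEXT NOT NULL REFERENCES meetings(id),
    user_id    TEXT NOT NULL,
    joined_at  TEXT NOT NULL,
    left_at    TEXT                         -- NULL while still in the call
);
CREATE INDEX idx_participants_meeting ON participants (meeting_id);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def can_join(conn: sqlite3.Connection, meeting_id: str) -> bool:
    """Read path: validate that the meeting exists and is still active."""
    row = conn.execute(
        "SELECT status FROM meetings WHERE id = ?", (meeting_id,)
    ).fetchone()
    return row is not None and row[0] == "Active"


def record_join(conn: sqlite3.Connection, meeting_id: str,
                user_id: str, joined_at: str) -> None:
    """Write path: append a participant log row for post-meeting analytics."""
    conn.execute(
        "INSERT INTO participants (meeting_id, user_id, joined_at) "
        "VALUES (?, ?, ?)",
        (meeting_id, user_id, joined_at),
    )
```

The index on `meeting_id` is there because analytics queries would typically group participant rows by meeting.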

Cache

Data Structures:
Meeting_Map (Hash): Maps MeetingID -> SFU_IP.
User_Presence (Set): Tracks active users in a specific MeetingID.
TTLs: Keys expire 2 hours after the last meeting activity so that entries for abandoned meetings do not accumulate in memory.
Logic: Used as a fast lookup to ensure all participants of a single meeting are routed to the same SFU (or same SFU cluster).
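The sticky routing and TTL behavior described above can be sketched with an in-memory class standing in for Redis. `MeetingCache` and its methods are hypothetical; in Redis the same idea would roughly correspond to setting the meeting key only if it does not exist and refreshing its expiry on every join.

```python
import time

MEETING_TTL_SECONDS = 2 * 60 * 60  # 2 hours of inactivity


class MeetingCache:
    """In-memory stand-in for the Meeting_Map and User_Presence structures."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._sfu_by_meeting = {}  # meeting_id -> (sfu_ip, expires_at)
        self._presence = {}        # meeting_id -> set of user_ids

    def _expire(self, meeting_id):
        # Drop the meeting's keys once the inactivity TTL has passed.
        entry = self._sfu_by_meeting.get(meeting_id)
        if entry and entry[1] <= self._clock():
            del self._sfu_by_meeting[meeting_id]
            self._presence.pop(meeting_id, None)

    def join(self, meeting_id, user_id, candidate_sfu_ip):
        """Return the SFU for this meeting, assigning the candidate only if
        no SFU is recorded yet, and refresh the TTL."""
        self._expire(meeting_id)
        sfu_ip, _ = self._sfu_by_meeting.get(meeting_id, (candidate_sfu_ip, None))
        expires_at = self._clock() + MEETING_TTL_SECONDS
        self._sfu_by_meeting[meeting_id] = (sfu_ip, expires_at)
        self._presence.setdefault(meeting_id, set()).add(user_id)
        return sfu_ip

    def participants(self, meeting_id):
        self._expire(meeting_id)
        return set(self._presence.get(meeting_id, set()))
```

The first joiner's candidate SFU wins; later joiners get the same address back even if the SFU Manager proposed a different node for them, which keeps the whole meeting on one SFU.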
Wrap Up

Advanced Topics

Trade-offs:
SFU vs. MCU: We chose SFU. Sacrifice: High client-side CPU (client must decode multiple streams). Gain: Extremely low server-side latency and lower infrastructure costs.
Bottlenecks:
SFU Bandwidth: Single SFU nodes can become egress-bound. We solve this by limiting participants per node and using "Cascading" for larger rooms.
Failure Handling:
Signaling Failover: If a signaling node dies, the client reconnects via the Load Balancer to a new node; session state is recovered from Redis.
SFU Failover: If an SFU node crashes, the Signaling Service detects the missed heartbeats and instructs all clients in that meeting to renegotiate against a fresh SFU node (a new offer/answer exchange, for example via an ICE restart).
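The SFU failover path can be sketched as a heartbeat tracker that reassigns meetings from dead nodes. This is a simplified sketch: the 5-second timeout, the least-loaded placement rule, and the `SfuHealthTracker` class are assumptions for illustration, and notifying clients over the signaling channel is left out.

```python
HEARTBEAT_TIMEOUT_SECONDS = 5.0  # assumed value


class SfuHealthTracker:
    def __init__(self, timeout=HEARTBEAT_TIMEOUT_SECONDS):
        self._timeout = timeout
        self._last_seen = {}  # sfu_ip -> timestamp of last heartbeat
        self._meetings = {}   # sfu_ip -> set of meeting_ids hosted there

    def heartbeat(self, sfu_ip, now):
        self._last_seen[sfu_ip] = now

    def assign(self, meeting_id, sfu_ip):
        self._meetings.setdefault(sfu_ip, set()).add(meeting_id)

    def dead_nodes(self, now):
        """Nodes whose last heartbeat is older than the timeout."""
        return sorted(ip for ip, seen in self._last_seen.items()
                      if now - seen > self._timeout)

    def fail_over(self, now):
        """Move meetings off dead nodes; returns {meeting_id: new_sfu_ip}."""
        dead = set(self.dead_nodes(now))
        healthy = [ip for ip in self._last_seen if ip not in dead]
        if dead and not healthy:
            raise RuntimeError("no healthy SFU available")
        moves = {}
        for ip in sorted(dead):
            for meeting_id in sorted(self._meetings.pop(ip, set())):
                # Place each meeting on the healthy node hosting the fewest.
                target = min(healthy,
                             key=lambda h: (len(self._meetings.get(h, ())), h))
                self.assign(meeting_id, target)
                moves[meeting_id] = target
            del self._last_seen[ip]
        return moves
```

The returned mapping is what the Signaling Service would use to tell each affected meeting's clients which SFU to reconnect to.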
Alternatives & Optimization:
Alternative: Use WebTransport instead of WebSockets/WebRTC for the data channel to reduce overhead once browser support matures.
Optimization: Implement Anycast IP for SFU clusters so users automatically route to the nearest edge node via BGP routing.