The Question
DesignCinema Ticket Booking System Design
Design a high-scale cinema ticket booking platform (like Fandango or BookMyShow) that can handle millions of users and peak traffic during blockbuster movie releases. The system must support movie browsing, real-time seat selection, and a secure booking workflow. Key challenges include preventing double-booking of seats under high concurrency, managing temporary seat holds during payment, and ensuring low latency for seat map availability.
PostgreSQL
Redis
Kafka
CDN
Kubernetes
JWT
gRPC
Prometheus
Questions & Insights
Clarifying Questions
What is the expected scale of the system? (Assumption: 10M Daily Active Users (DAU), with 1M tickets sold per day, peaking at 10x during blockbuster releases).
How long should a seat be "locked" during the checkout process? (Assumption: A 10-minute temporary hold/lock is required to allow the user to complete payment).
Does the system handle seat selection or just "best available"? (Assumption: Users select specific seats from a cinema map).
What is the geographic scope? (Assumption: National scale with multiple cities and cinema chains).
Are we handling the payment processing directly? (Assumption: We integrate with 3rd-party payment gateways but must manage the transaction state internally).
Thinking Process
Core Bottleneck: The primary challenge is preventing "double-booking" of the same seat under high concurrency (e.g., when tickets for a "Marvel" movie go on sale).
Key Progressive Questions:
How do we provide high-performance browsing of movies and showtimes for millions of users?
How do we ensure atomic seat allocation so two users cannot book the same seat at the exact same millisecond?
How do we manage the lifecycle of a "Locked" seat to ensure it is released if the payment fails or the user abandons the session?
How do we handle the "Thundering Herd" problem when a major movie's tickets are released at a specific time?
Bonus Points
Distributed Locking vs. DB Constraints: Implementing Redis-based distributed locks for high-performance temporary seat holds, backed by SQL
FOR UPDATE or unique constraints for final ACID guarantees.Transactional Outbox Pattern: Ensuring that the Booking Service and Payment/Notification services remain in sync even if the network fails after a successful DB commit.
Cache Warming for Blockbusters: Pre-loading high-demand movie/showtime data into distributed caches to prevent "Cache Cold Starts" and DB meltdowns during peak traffic.
Optimistic Concurrency Control (OCC): Using versioning in the
Seats table to minimize lock contention for less popular movies.Design Breakdown
Functional Requirements
Core Use Cases:
Users can search for movies by city, genre, or date.
Users can view available showtimes and real-time seat maps for a specific cinema.
Users can temporarily lock seats (10-minute TTL).
Users can complete payment to confirm the booking.
Users receive a booking confirmation/E-ticket.
Scope Control:
In-Scope: Seat selection, locking, payment state management, search.
Out-of-Scope: User reviews, dynamic pricing algorithms, cinema-side inventory management (admin portal).
Non-Functional Requirements
Scale: Must handle 50,000+ QPS for browsing and 5,000+ TPS for booking during peak sales.
Latency: Seat map loading should be < 200ms; booking confirmation < 2s.
Availability & Reliability: 99.99% availability for browsing. High reliability for booking (zero data loss).
Consistency: Strong Consistency is required for seat allocation (no double-booking). Eventual consistency is acceptable for movie search/metadata.
Fault Tolerance: Automatic seat release if the payment service is unreachable or if the user times out.
Estimation
Traffic: 10M DAU -> ~115 average QPS. Peak QPS (10x) = 1,150. Read-to-Write ratio is roughly 50:1 (browsing vs. booking).
Storage: 1M tickets/day * 1KB per record = 1GB/day. 365GB/year. Easily fits in a partitioned RDBMS.
Bandwidth: 50k QPS * 10KB (avg response) = 500 MB/s outgoing bandwidth.
Blueprint
Concise Summary: A microservices-based architecture utilizing a read-optimized Search Service with heavy caching and a write-optimized Booking Service using a relational database for ACID compliance and Redis for high-speed seat locking.
Major Components:
API Gateway: Handles rate limiting, authentication, and routing.
Search/Showtime Service: Scalable read-only service for browsing movies and cinemas.
Booking Service: Manages the seat-locking state machine and finalizes transactions.
Redis Cache: Stores showtime metadata and active seat-lock TTLs.
Relational DB (PostgreSQL/MySQL): The source of truth for seat occupancy and bookings.
Simplicity Audit: This design avoids complex event-sourcing or distributed transactions by centralizing seat state in a single SQL cluster per region, which is the simplest way to guarantee "exactly-once" booking.
Architecture Decision Rationale:
Why this?: Relational DBs are the industry standard for transactional integrity. Redis provides the low-latency locking required to handle the initial "click-to-lock" surge.
Functional Satisfaction: Meets all browsing and booking flows.
Non-functional Satisfaction: Scalable via read-replicas for search and sharding for bookings.
High Level Architecture
Sub-system Deep Dive
Edge (Optional)
Content Delivery & Traffic Routing: Use a CDN (Cloudflare/Akamai) to cache static movie posters and UI assets.
Security & Perimeter: API Gateway implements JWT-based Authentication and Rate Limiting (e.g., max 5 booking attempts per minute per user) to prevent bot-driven ticket scalping.
Service
Topology & Scaling:
Services are stateless and deployed in Kubernetes across multiple Availability Zones (AZs).
Scaling is triggered by CPU utilization and Request Count.
API Schema Design:
GET /v1/movies?city={id}: Returns list of movies.GET /v1/showtimes/{movie_id}: Returns cinema showtimes.POST /v1/bookings/lock: Request:
{showtime_id, seat_ids[]}Response:
{lock_id, expiry_timestamp}Idempotency:
X-Idempotency-Key header required.POST /v1/bookings/confirm:Request:
{lock_id, payment_token}Resilience & Reliability:
Circuit Breakers: Applied to the Payment Gateway Proxy to prevent Booking Service hangs during payment outages.
Saga Pattern: If payment fails, the Booking Service triggers a compensatory action to unlock the seats immediately instead of waiting for TTL.
Storage
Access Pattern:
High-frequency reads on
Showtimes and Seats status.High-frequency atomic updates on
SeatStatus.Database Table Design:
Showtimes:
id, movie_id, cinema_id, start_time.Seats:
id, showtime_id, seat_number, status (AVAILABLE, LOCKED, BOOKED), version_id.Bookings:
id, user_id, showtime_id, total_price, status (PENDING, PAID, CANCELLED).Technical Selection: PostgreSQL with Row-Level Locking.
Rationale: ACID compliance is non-negotiable. PostgreSQL handles concurrent transactions reliably.
Distribution Logic: Shard the
Bookings and Seats tables by showtime_id. This ensures all seats for a single movie screening live on the same physical shard, making atomic updates easier.Cache
Purpose & Justification:
Showtime Cache: Reduces DB load for browsing.
Lock Cache (Redis): Acts as a fast distributed lock for the 10-minute window.
Key-Value Schema:
Key:
lock:showtime:{id}:seat:{id}, Value: user_id, TTL: 600s.Failure Handling: If Redis fails, the system falls back to the DB status check, but with higher latency.
Messaging
Purpose & Decoupling: Kafka is used for downstream tasks that don't need to be synchronous with the booking (e.g., sending E-tickets, updating analytics).
Event Schema:
BookingConfirmedEvent {booking_id, user_id, seat_details, price}.Technical Selection: Kafka. High throughput and message persistence allow for replaying notifications if the email service goes down.
Infrastructure (Optional)
Observability: Prometheus for metrics (tracking 2xx vs 5xx errors), Grafana for dashboards, and Jaeger for distributed tracing to find bottlenecks in the booking flow.
Wrap Up
Advanced Topics
Trade-offs: We choose Consistency (C) over Availability (A) in the CAP theorem for the booking flow. It is better to show an error to a user than to sell the same seat twice.
Reliability & Failure Handling:
Zombie Locks: A background cron job scans the SQL
Seats table for LOCKED statuses where the updated_at is older than 10 minutes and resets them to AVAILABLE (in case the Redis TTL and DB state get out of sync).Bottleneck Analysis:
Hot Shards: A blockbuster movie in one city could overload a single DB shard. Solution: Use Redis to offload all seat status checks (
is available?), only hitting the DB for the final write.Security & Privacy:
Anti-Scalping: Implement "Virtual Waiting Rooms" (Queueing) for extremely high-demand releases to protect the backend from a massive surge.
Distinguishing Insights:
For massive scale, one could use a Two-Phase Lock approach: 1. Set a "soft lock" in Redis (very fast). 2. If successful, attempt a "hard lock" in the DB. This prevents the DB from even seeing requests for seats that are already being processed by another user.