The Question

Cinema Ticket Booking System Design

Design a high-scale cinema ticket booking platform capable of handling millions of users. The system must support movie discovery, real-time seat availability, and a secure booking process. A critical constraint is preventing double-booking during high-concurrency events (e.g., blockbuster releases). Address how you would handle temporary seat holds, payment consistency, and system reliability under bursty traffic patterns.
Technologies: PostgreSQL, Redis, Kafka, CDN, Kubernetes, gRPC, JWT, Stripe.
Questions & Insights

Clarifying Questions

What is the expected scale of the system? (Assumed: 10M Daily Active Users, 5,000 cinemas, and 50,000 screens globally).
What is the peak traffic pattern? (Assumed: Highly bursty; 100x traffic during blockbuster openings or "Friday 8 PM" windows).
How long should a seat be "held" during the checkout process? (Assumed: 10 minutes before the hold expires and the seat is released back to the pool).
Is global consistency required across regions? (Assumed: No, cinema bookings are naturally localized to a specific cinema/city; localized consistency is sufficient).
What is the payment integration strategy? (Assumed: Integration with 3rd-party providers like Stripe or PayPal).

Thinking Process

The "Double Booking" Problem: This is the core bottleneck. We must ensure that two users cannot book the same seat simultaneously under high concurrency.
Progressive Architecture Flow:
How do we handle massive read traffic for movie schedules? (CDN + Read Replicas).
How do we manage temporary seat holds efficiently? (Redis with TTL).
How do we ensure atomic seat confirmation? (Relational DB Transactions + State Machine).
How do we handle payment failures and seat releases? (Saga Pattern or Distributed Workflows).

Bonus Points

High-Concurrency Seat Locking: Using Redis SET with the NX and EX options (an atomic SETNX plus expiry) or Redlock for ultra-fast distributed locking before hitting the database, which reduces RDBMS contention.
Database Sharding: Partitioning the database by cinema_id or city_id to ensure localized traffic doesn't create a global bottleneck.
Optimistic vs. Pessimistic Locking: Implementing optimistic locking (versioning) for general metadata and pessimistic locking for the specific transaction window of seat confirmation.
Idempotency Keys: Using client-generated UUIDs to ensure that network retries during payment do not result in duplicate charges or bookings.
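A minimal sketch of the idempotency-key idea, written in Python because the document names no implementation language. PaymentService and charge are hypothetical names, and a dict stands in for the durable store. A real service would record keys atomically (for example, a table with a UNIQUE constraint on the key) so two concurrent retries cannot both pass the check. Stripe's API accepts an Idempotency-Key request header for the same purpose on the provider side.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class PaymentService:
    """Hypothetical in-memory stand-in for the Payment Service.

    Not safe for concurrent callers: a real deployment would persist
    idempotency records atomically (e.g. UNIQUE constraint in PostgreSQL).
    """
    charges: list = field(default_factory=list)   # every charge actually made
    _results: dict = field(default_factory=dict)  # idempotency_key -> stored result

    def charge(self, idempotency_key: str, booking_id: str, amount_cents: int) -> dict:
        # A retry with the same key replays the stored result instead of charging again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        charge_id = str(uuid.uuid4())
        self.charges.append((charge_id, booking_id, amount_cents))
        result = {"charge_id": charge_id, "booking_id": booking_id, "status": "Paid"}
        self._results[idempotency_key] = result
        return result


svc = PaymentService()
key = str(uuid.uuid4())  # generated once by the client per checkout attempt
first = svc.charge(key, "booking-42", 2400)
retry = svc.charge(key, "booking-42", 2400)  # e.g. network timeout, client retries
print(first == retry, len(svc.charges))      # True 1
```

The client, not the server, generates the key, because only the client knows that two requests are the same logical attempt.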
Design Breakdown

Functional Requirements

Core Use Cases:
Users can browse movies by city and cinema.
Users can view real-time seat availability for a specific show.
Users can temporarily "hold" seats for 10 minutes.
Users can complete payment and receive a digital ticket.
Scope Control:
In-Scope: Search, Seat Selection, Booking, Payment Integration.
Out-of-Scope: User reviews, cinematic trailers streaming (handled by CDN), loyalty programs, and complex AI-based recommendations.

Non-Functional Requirements

Scale: Must handle 10,000+ QPS during peak blockbuster releases.
Latency: Search and seat map rendering under 200ms.
Availability: 99.99% availability; booking must be available even if the recommendation engine is down.
Consistency: Strong Consistency for seat reservations (No double booking).
Fault Tolerance: Automatic seat release if the payment service or user session fails.
Security: PCI-DSS compliance for payment handling (via 3rd party redirection).

Estimation

Traffic Estimation:
10M DAU.
Peak QPS: 10,000 during high-demand windows, overwhelmingly seat-map and schedule reads; at the 100:1 ratio below, roughly 100/sec are booking writes.
Read/Write Ratio: 100:1 (Browsing vs. Booking).
Storage Estimation:
50,000 screens * 10 shows/day * 200 seats = 100M seat records per day.
100M records * 100 bytes = 10GB/day, or about 3.65TB per year.
Bandwidth Estimation:
Read: 10k QPS * 5KB = 50MB/s.
Write: 100 QPS * 2KB = 0.2MB/s.
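The storage and bandwidth arithmetic can be reproduced in a few lines. This sketch assumes decimal units (1 KB = 1,000 bytes, 1 GB = 10^9 bytes) and uses only the figures stated above.

```python
# Back-of-envelope check of the estimates above (decimal units).
SCREENS = 50_000
SHOWS_PER_SCREEN_PER_DAY = 10
SEATS_PER_SHOW = 200
BYTES_PER_SEAT_RECORD = 100

seat_records_per_day = SCREENS * SHOWS_PER_SCREEN_PER_DAY * SEATS_PER_SHOW
storage_gb_per_day = seat_records_per_day * BYTES_PER_SEAT_RECORD / 1e9
storage_tb_per_year = storage_gb_per_day * 365 / 1e3

READ_QPS, READ_KB = 10_000, 5
WRITE_QPS, WRITE_KB = 100, 2
read_mb_s = READ_QPS * READ_KB / 1e3
write_mb_s = WRITE_QPS * WRITE_KB / 1e3

print(f"{seat_records_per_day:,} seat records/day")                        # 100,000,000 seat records/day
print(f"{storage_gb_per_day:.0f} GB/day, {storage_tb_per_year:.2f} TB/year")  # 10 GB/day, 3.65 TB/year
print(f"read {read_mb_s:.0f} MB/s, write {write_mb_s:.1f} MB/s")           # read 50 MB/s, write 0.2 MB/s
```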

Blueprint

Concise Summary: A microservices-based architecture utilizing a Relational Database for ACID compliance on transactions and Redis for high-speed temporary seat locks.
Major Components:
API Gateway: Handles authentication, rate limiting, and request routing.
Movie Service: Manages movie metadata and schedules (Read-heavy, optimized with CDN).
Booking Service: Orchestrates seat selection, temporary locks, and finalization.
Payment Service: Interfaces with 3rd-party gateways and manages transaction states.
Simplicity Audit: This design avoids complex distributed transaction coordinators because every booking touches a single cinema shard backed by one primary RDBMS instance, so ACID properties are handled natively by the database.
Architecture Decision Rationale:
Why this architecture?: Relational databases are chosen over NoSQL because ticket booking is a classic "all-or-nothing" transaction problem where data integrity is more important than horizontal write scaling.
Functional Satisfaction: Covers the full flow from movie discovery to ticket issuance.
Non-functional Satisfaction: Scalability is achieved via sharding and read replicas, while consistency is guaranteed by RDBMS transactions.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing: CDN (Cloudflare/CloudFront) caches static movie posters, actor bios, and trailers to reduce origin load.
Security & Perimeter: API Gateway handles JWT validation and implements Leaky Bucket rate limiting to prevent bots from scraping seat maps during blockbuster releases.
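A minimal sketch of Leaky Bucket rate limiting in its "meter" form, assuming one bucket per client (e.g., per API key or IP). LeakyBucket and its parameters are hypothetical; a real gateway would keep bucket state in a shared store and use its own configuration rather than this code. Timestamps are passed in explicitly so the behaviour is deterministic.

```python
from dataclasses import dataclass


@dataclass
class LeakyBucket:
    """Leaky bucket as a meter: each request adds one unit, the bucket drains
    at leak_rate units per second, and a request that would overflow
    capacity is rejected."""
    capacity: float
    leak_rate: float
    level: float = 0.0
    last: float = 0.0

    def allow(self, now: float) -> bool:
        # Drain whatever has leaked out since the previous request.
        elapsed = max(0.0, now - self.last)
        self.level = max(0.0, self.level - elapsed * self.leak_rate)
        self.last = now
        if self.level + 1 > self.capacity:
            return False  # bucket full: reject (e.g. HTTP 429)
        self.level += 1
        return True


# Hypothetical per-client limit: bursts of up to 5 seat-map requests, draining at 1/s.
bucket = LeakyBucket(capacity=5, leak_rate=1.0)
burst = [bucket.allow(now=0.0) for _ in range(7)]
print(burst)              # [True, True, True, True, True, False, False]
print(bucket.allow(2.0))  # True: two units have drained by t=2s
```

A scraping bot firing requests back-to-back fills its bucket within the first burst and is then held to the drain rate, while ordinary users rarely notice the limit.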

Service

Topology & Scaling: Stateless microservices deployed on Kubernetes (EKS/GKE), horizontally auto-scaled on CPU utilization and request count.
API Schema Design:
GET /v1/shows/{id}/seats: Returns the seat map and availability for a specific show (REST, idempotent).
POST /v1/bookings/reserve: Initiates a 10-minute hold. Request: {show_id, seat_ids[]}. Returns booking_id.
POST /v1/bookings/{id}/confirm: Finalizes booking after payment.
Resilience & Reliability: Circuit breakers on the Payment Service to prevent cascading failures if Stripe/PayPal is down.
Observability: Prometheus for metrics (booking success rate) and Jaeger for tracing the lifecycle of a reservation.
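The circuit breaker on the Payment Service can be sketched as a small state machine. This is a generic, single-threaded illustration, not the API of any particular resilience library. CircuitBreaker, its thresholds, and flaky_provider_call are hypothetical, and the clock is injected so the example is deterministic.

```python
from typing import Callable, Optional


class CircuitOpenError(Exception):
    """Raised when calls are short-circuited while the breaker is open."""


class CircuitBreaker:
    """CLOSED -> OPEN after failure_threshold consecutive failures. Once
    reset_timeout seconds have passed, the next call is let through as a
    trial (HALF_OPEN): success closes the circuit, failure re-opens it.
    Not thread-safe; this is a sketch."""

    def __init__(self, failure_threshold: int, reset_timeout: float,
                 clock: Callable[[], float]):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None
        self.state = "CLOSED"

    def call(self, fn: Callable[[], object]):
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("payment provider unavailable; failing fast")
            self.state = "HALF_OPEN"  # allow a trial request through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
                self.state, self.opened_at = "OPEN", self.clock()
            raise
        self.state, self.failures = "CLOSED", 0
        return result


now = [0.0]
breaker = CircuitBreaker(failure_threshold=3, reset_timeout=30.0, clock=lambda: now[0])


def flaky_provider_call():
    raise TimeoutError("provider timeout")  # hypothetical failing provider


for _ in range(3):
    try:
        breaker.call(flaky_provider_call)
    except TimeoutError:
        pass
print(breaker.state)  # OPEN
```

While the circuit is open, booking requests fail fast instead of piling up threads waiting on a provider that is down; the seat holds simply expire through their TTL.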

Storage

Access Pattern: Read-heavy for movie info; Write-heavy for bookings during peak hours. Requires row-level locking for seat selection.
Database Table Design (PostgreSQL):
Movies: id (PK), title, duration, rating.
Shows: id (PK), movie_id, cinema_id, start_time.
Seats: id (PK), show_id, row, number, status (Available, Locked, Booked), version (for Optimistic Locking).
Bookings: id (PK), user_id, show_id, total_price, status (Pending, Paid, Cancelled).
Technical Selection: PostgreSQL. Rationale: Strong ACID support and reliable row-level locking (SELECT ... FOR UPDATE).
Distribution Logic: Sharded by cinema_id, so each shard holds a subset of cinemas and every booking touches exactly one shard. If one shard is overloaded (e.g., by a local premiere), cinemas on other shards are unaffected.
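The "no double booking" guarantee ultimately rests on a conditional write that only one transaction can win. This sketch uses Python's built-in sqlite3 with an in-memory database as a stand-in for one PostgreSQL shard. SQLite has no SELECT ... FOR UPDATE, so it shows the optimistic variant (compare-and-set on status and version); on PostgreSQL the pessimistic variant would lock the row with SELECT ... FOR UPDATE inside the transaction before updating it. The column row is renamed seat_row here, and try_lock_seat is a hypothetical helper.

```python
import sqlite3

# In-memory SQLite stands in for a PostgreSQL shard in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE seats (
           id INTEGER PRIMARY KEY,
           show_id INTEGER NOT NULL,
           seat_row TEXT NOT NULL,
           number INTEGER NOT NULL,
           status TEXT NOT NULL DEFAULT 'Available',
           version INTEGER NOT NULL DEFAULT 0
       )"""
)
conn.execute("INSERT INTO seats (id, show_id, seat_row, number) VALUES (1, 7, 'F', 12)")
conn.commit()


def try_lock_seat(conn: sqlite3.Connection, seat_id: int, expected_version: int) -> bool:
    """Compare-and-set Available -> Locked; True only if this caller won."""
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "UPDATE seats SET status = 'Locked', version = version + 1 "
            "WHERE id = ? AND status = 'Available' AND version = ?",
            (seat_id, expected_version),
        )
    return cur.rowcount == 1


# Two users both read the seat at version 0, then race to lock it.
(v,) = conn.execute("SELECT version FROM seats WHERE id = 1").fetchone()
alice = try_lock_seat(conn, 1, v)
bob = try_lock_seat(conn, 1, v)
print(alice, bob)  # True False
```

The losing request sees zero affected rows and can return "seat no longer available" immediately, without any application-level locking.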

Cache

Purpose & Justification: Redis is used for "Seat Holds". It reduces DB load by checking availability in memory before attempting a heavy DB transaction.
Key-Value Schema:
Key: lock:{show_id}:{seat_id}.
Value: user_id.
TTL: 600 seconds (10 mins).
Failure Handling: If Redis fails, the system falls back to the DB status check (Performance degrades, but consistency remains).
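The seat-hold scheme can be sketched without a running Redis server. FakeRedis below is a hypothetical in-memory stand-in that mimics two operations: SET key value NX EX ttl (which redis-py exposes as r.set(key, value, nx=True, ex=ttl)) and a compare-and-delete that in real Redis would typically be a small Lua script. hold_seats is likewise hypothetical; it makes a multi-seat hold all-or-nothing by releasing any seats it already grabbed when one seat is taken.

```python
import time
from typing import Callable, Dict, Iterable, List, Optional, Tuple

HOLD_TTL_SECONDS = 600  # 10-minute hold, as in the Cache schema


class FakeRedis:
    """In-memory stand-in for the two Redis operations the hold needs.
    Each method is atomic here only because the sketch is single-threaded;
    Redis gives the same guarantee by executing commands one at a time."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._data: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)
        self._clock = clock

    def _get_live(self, key: str) -> Optional[str]:
        item = self._data.get(key)
        if item is None or item[1] <= self._clock():
            self._data.pop(key, None)  # expired keys behave as absent
            return None
        return item[0]

    def set_nx_ex(self, key: str, value: str, ttl: float) -> bool:
        # Like SET key value NX EX ttl: succeeds only if the key is absent.
        if self._get_live(key) is not None:
            return False
        self._data[key] = (value, self._clock() + ttl)
        return True

    def delete_if_equals(self, key: str, value: str) -> bool:
        # Compare-and-delete, so a user can only release their own hold.
        if self._get_live(key) == value:
            del self._data[key]
            return True
        return False


def hold_seats(r: FakeRedis, show_id: int, seat_ids: Iterable[int], user_id: str) -> bool:
    """All-or-nothing hold: if any seat is taken, release the ones just acquired."""
    acquired: List[str] = []
    for seat_id in sorted(seat_ids):  # fixed order keeps behaviour predictable
        key = f"lock:{show_id}:{seat_id}"
        if not r.set_nx_ex(key, user_id, HOLD_TTL_SECONDS):
            for k in acquired:
                r.delete_if_equals(k, user_id)
            return False
        acquired.append(key)
    return True


now = [0.0]
r = FakeRedis(clock=lambda: now[0])
print(hold_seats(r, 7, [11, 12], "alice"))  # True
print(hold_seats(r, 7, [12, 13], "bob"))    # False: seat 12 is held by alice
now[0] = 601.0                              # alice's holds have expired
print(hold_seats(r, 7, [12, 13], "bob"))    # True
```

The Redis hold is only a fast pre-filter; the conditional update in PostgreSQL remains the source of truth, which is why losing Redis degrades performance but not consistency.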

Messaging

Purpose & Decoupling: Kafka decouples the Booking Service from downstream systems like Email/SMS notification and Analytics.
Event Schema: BookingCompleted { booking_id, user_id, email, qr_code_data }.
Technical Selection: Kafka. Rationale: High throughput and durability for audit logs.
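A minimal sketch of the BookingCompleted event as a (key, value) pair ready for a Kafka producer, assuming JSON encoding and booking_id as the message key (both assumptions; the document specifies neither). Records with the same key go to the same partition under Kafka's default partitioning, so events for one booking stay ordered. The pair would then be passed to a client such as confluent-kafka's Producer.produce(topic, value=..., key=...).

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class BookingCompleted:
    booking_id: str
    user_id: str
    email: str
    qr_code_data: str

    def to_record(self) -> Tuple[bytes, bytes]:
        """Return (key, value) bytes; keying by booking_id keeps all events
        for one booking in the same partition."""
        key = self.booking_id.encode("utf-8")
        value = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return key, value


evt = BookingCompleted("bk-42", "u-7", "user@example.com", "payload.signature")
key, value = evt.to_record()
print(key)  # b'bk-42'
```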
Wrap Up

Advanced Topics

Trade-offs: We choose Consistency over Availability (CP) for the booking flow. It is better to show an error than to sell the same seat twice.
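Reliability: Prefer a simple State Machine in the DB over Two-Phase Commit (2PC): external payment providers such as Stripe cannot participate in a 2PC protocol, and the single-shard design needs no coordinator. A seat remains Locked until the Payment Service emits a Success or Timeout event.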
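The seat lifecycle can be written as an explicit transition table. The event names, including payment_failed, are hypothetical labels for the Success, Timeout, and failure cases the document describes. In the database, each transition would be enforced by a conditional update such as UPDATE seats SET status = 'Booked' WHERE id = ? AND status = 'Locked'.

```python
from enum import Enum


class SeatStatus(Enum):
    AVAILABLE = "Available"
    LOCKED = "Locked"
    BOOKED = "Booked"


# A seat leaves LOCKED only on payment success (-> BOOKED) or on a
# payment failure / hold timeout (-> AVAILABLE).
TRANSITIONS = {
    (SeatStatus.AVAILABLE, "hold"): SeatStatus.LOCKED,
    (SeatStatus.LOCKED, "payment_succeeded"): SeatStatus.BOOKED,
    (SeatStatus.LOCKED, "payment_failed"): SeatStatus.AVAILABLE,
    (SeatStatus.LOCKED, "hold_timeout"): SeatStatus.AVAILABLE,
}


class InvalidTransition(Exception):
    pass


def apply_event(status: SeatStatus, event: str) -> SeatStatus:
    """Return the next status, or raise if the event is not allowed."""
    try:
        return TRANSITIONS[(status, event)]
    except KeyError:
        raise InvalidTransition(f"{event!r} not allowed from {status.value}") from None


s = apply_event(SeatStatus.AVAILABLE, "hold")
s = apply_event(s, "payment_succeeded")
print(s)  # SeatStatus.BOOKED
```

Because BOOKED has no outgoing transitions, a late Timeout event for an already-paid seat is rejected instead of silently releasing it.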
Bottleneck Analysis: The primary bottleneck is the DB write lock on the Seats table. This is mitigated by sharding and by using Redis to "pre-filter" available seats before hitting the DB.
Security: Ticket QR codes carry an HMAC tag computed with a server-side secret, so forged or altered tickets are rejected at the cinema entrance.
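A minimal sketch of HMAC-tagged QR payloads using Python's standard hmac module. The payload layout ("booking=...;show=...;seat=..."), the "payload.tag" format, and SECRET_KEY are assumptions for illustration; the key would come from a secrets manager, never from source code.

```python
import base64
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-key-from-a-secrets-manager"  # hypothetical key


def sign_ticket(payload: str, key: bytes = SECRET_KEY) -> str:
    """Return 'payload.tag', where tag is URL-safe base64 of HMAC-SHA256(payload)."""
    tag = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).digest()
    return payload + "." + base64.urlsafe_b64encode(tag).decode("ascii").rstrip("=")


def verify_ticket(qr_data: str, key: bytes = SECRET_KEY) -> bool:
    # The tag never contains '.', so splitting on the last '.' recovers it.
    payload, _, tag = qr_data.rpartition(".")
    expected = sign_ticket(payload, key).rpartition(".")[2]
    # compare_digest avoids leaking how many leading bytes matched.
    return hmac.compare_digest(tag.encode("utf-8"), expected.encode("utf-8"))


qr = sign_ticket("booking=bk-42;show=7;seat=F12")
tampered = "booking=bk-42;show=7;seat=F13." + qr.rpartition(".")[2]
print(verify_ticket(qr))        # True
print(verify_ticket(tampered))  # False: payload altered, tag no longer matches
```

With an HMAC, every scanner that verifies tickets must hold the same secret; that is acceptable when verification happens on trusted cinema hardware.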
Staff-level Insight: To handle "Thundering Herd" for blockbusters, implement a Virtual Waiting Room (Queue) at the API Gateway level to throttle users into the booking flow based on system capacity.
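A Virtual Waiting Room can be reduced to a FIFO queue plus an admission rate. In this sketch, WaitingRoom, admit_per_tick, and the tick-based scheduling are hypothetical; in practice the admission rate would be tuned to what the Booking Service and DB shards can absorb, and admitted users would typically receive a short-lived token checked at the API Gateway.

```python
from collections import deque
from typing import Deque, List, Set


class WaitingRoom:
    """FIFO waiting room: users join a queue, and on every tick at most
    admit_per_tick of them are let into the booking flow, oldest first."""

    def __init__(self, admit_per_tick: int):
        self.admit_per_tick = admit_per_tick
        self.queue: Deque[str] = deque()
        self.admitted: Set[str] = set()

    def join(self, user_id: str) -> int:
        """Enqueue a user (idempotently) and return their position."""
        if user_id not in self.admitted and user_id not in self.queue:
            self.queue.append(user_id)
        return self.position(user_id)

    def position(self, user_id: str) -> int:
        """0 means already admitted; otherwise the 1-based place in line."""
        if user_id in self.admitted:
            return 0
        return self.queue.index(user_id) + 1

    def tick(self) -> List[str]:
        """Admit the next batch, oldest first."""
        batch: List[str] = []
        while self.queue and len(batch) < self.admit_per_tick:
            user = self.queue.popleft()
            self.admitted.add(user)
            batch.append(user)
        return batch


room = WaitingRoom(admit_per_tick=2)
for u in ["u1", "u2", "u3", "u4", "u5"]:
    room.join(u)
print(room.tick())          # ['u1', 'u2']
print(room.position("u5"))  # 3
```

Because excess demand waits in the queue instead of hitting the Booking Service, the seat-lock contention described under Bottleneck Analysis stays bounded by the admission rate.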