The Question

Cinema Ticket Booking System Design

Design a high-scale cinema ticket booking platform (e.g., BookMyShow or Fandango) capable of handling millions of users and high-concurrency 'opening night' events for blockbusters. The system must ensure no two users can book the same seat, handle seat locks during a 10-minute payment window, and remain responsive under heavy search traffic. Detail your strategy for data consistency, concurrency control, and handling external payment gateway integrations.
Technologies: PostgreSQL, Redis, Kafka, Docker, Kubernetes, CDN, JWT, Prometheus, Grafana, Stripe
Questions & Insights

Clarifying Questions

Scale of Operations: What is the expected volume of theaters, movies, and concurrent users? (Assumption: 5,000 theaters, 50,000 shows per day, and 10 million Daily Active Users with significant spikes for blockbuster releases).
Booking Window & Locking: How long should a seat be held during the checkout process? (Assumption: 5-10 minutes "soft lock" before the seat is released).
Consistency Requirements: Is "eventual consistency" acceptable for seat availability, or is "strong consistency" required? (Assumption: Strong consistency for seat locking to prevent double-booking).
Geographic Distribution: Is the system global or regional? (Assumption: Regional focus initially, but designed to scale across multiple availability zones).

Thinking Process

Core Bottleneck: The primary challenge is the "Thundering Herd" problem during blockbuster ticket releases, when thousands of users attempt to book the same limited seats simultaneously.
Key Progressive Questions:
How do we ensure movie/showtime discovery remains fast despite high traffic? (Read-heavy optimization).
How do we manage temporary seat reservations without locking the entire database? (Distributed locking/Caching).
How do we handle the transition from a "temporary hold" to a "confirmed booking" atomically? (Transactional integrity).
How do we ensure the system survives a payment gateway failure or timeout? (Async reconciliation and state machines).

Bonus Points

Virtual Waiting Rooms: Implementation of a "Queue-it" style waiting room using Redis sorted sets to buffer incoming traffic during peak bursts, preventing downstream service collapse.
Optimistic Concurrency with Versioning: Using version numbers in the database for seat status to allow high throughput without heavy row-level locking during the initial "check" phase.
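Transactional Outbox Pattern: Keeping the Booking Service and the Notification Service in sync by writing each event to an outbox table in the same local database transaction as the booking itself, then relaying it to the Message Queue. An event is therefore never lost, and it is never published for a booking that did not commit.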
Hot-Partition Mitigation: Sharding showtime data by ShowID so that the many showtimes of a popular movie are spread across different database nodes.
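The Queue-it style waiting room can be sketched using Redis sorted-set semantics, with each user scored by arrival time. The class below is a hypothetical in-memory stand-in (`WaitingRoom`) so the example runs without a Redis server. Its comments map each method to the Redis command it imitates (`ZADD ... NX`, `ZRANK`, `ZPOPMIN`). Scoring by arrival timestamp and admitting users in fixed batches are assumptions.

```python
from __future__ import annotations

import bisect


class WaitingRoom:
    """In-memory stand-in for a Redis sorted set used as a virtual queue.

    Hypothetical mapping to Redis commands:
      join  -> ZADD waiting_room NX <arrival_ts> <user_id>
      rank  -> ZRANK waiting_room <user_id>
      admit -> ZPOPMIN waiting_room <batch_size>
    """

    def __init__(self) -> None:
        self._scores: dict[str, float] = {}
        # Sorted by (score, member), the same tie-break order Redis uses.
        self._ordered: list[tuple[float, str]] = []

    def join(self, user_id: str, arrival_ts: float) -> None:
        # NX semantics: re-joining (e.g. a page refresh) keeps the original spot.
        if user_id in self._scores:
            return
        self._scores[user_id] = arrival_ts
        bisect.insort(self._ordered, (arrival_ts, user_id))

    def rank(self, user_id: str) -> int | None:
        # 0-based position in line, like ZRANK; None if the user is not queued.
        if user_id not in self._scores:
            return None
        return bisect.bisect_left(self._ordered, (self._scores[user_id], user_id))

    def admit(self, batch_size: int) -> list[str]:
        # Pop the earliest arrivals, like ZPOPMIN. A scheduler would call this
        # at whatever rate the Booking Service can absorb.
        batch, self._ordered = self._ordered[:batch_size], self._ordered[batch_size:]
        for _, user_id in batch:
            del self._scores[user_id]
        return [user_id for _, user_id in batch]


room = WaitingRoom()
room.join("alice", 100.0)
room.join("bob", 100.5)
room.join("carol", 99.9)
room.join("alice", 105.0)   # refresh: alice keeps her original position
print(room.rank("alice"))   # 1
print(room.admit(2))        # ['carol', 'alice']
print(room.rank("bob"))     # 0
```

Because admission is a separate step, the rate at which users reach the seat map is controlled by the system rather than by the size of the incoming burst.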
Design Breakdown

Functional Requirements

Core Use Cases:
Users can search for movies by city, date, and genre.
Users can view real-time seat maps for specific showtimes.
Users can "hold" seats for 5-10 minutes while completing payment.
Users receive a booking confirmation (QR code/Email) upon successful payment.
Scope Control:
In-scope: Movie discovery, seat selection, booking workflow, payment integration, and basic notifications.
Out-of-scope: Popcorn/Concession ordering, User reviews/social features, and Cinema-side management portals.

Non-Functional Requirements

Scale: Must handle 100k+ concurrent requests during peak "opening night" scenarios.
Latency: Search and seat map rendering should be < 200ms.
Availability & Reliability: 99.99% availability; booking data must be durable.
Consistency: Strong consistency for seat reservations (No double-booking).
Fault Tolerance: System must handle payment gateway timeouts gracefully without losing ticket state.

Estimation

Traffic Estimation:
10M DAU.
Peak QPS: During major releases, load can reach 50-100x the average. If the average is 1k QPS, peak can hit 50k-100k QPS.
Storage Estimation:
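5,000 theaters * 10 screens * 3 shows/day = 150,000 showtimes/day (a deliberately generous figure; the clarifying questions assume 50,000 shows/day).
At 200 seats per showtime: 150,000 * 200 = 30M seat records daily.
1 year of seat/booking data: 30M * 365 ≈ 11B records (roughly 10B). At 100 bytes/record ≈ 1 TB/year.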
Bandwidth Estimation:
100k peak QPS * 5KB average payload ≈ 500 MB/s.
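The arithmetic above can be rechecked with a few lines of Python. The inputs are the figures stated in the estimation. A 365-day year and decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes) are assumptions.

```python
# Back-of-envelope estimation, recomputed from the stated inputs.
THEATERS = 5_000
SCREENS_PER_THEATER = 10
SHOWS_PER_SCREEN_PER_DAY = 3
SEATS_PER_SHOW = 200
BYTES_PER_RECORD = 100
AVG_QPS = 1_000
PAYLOAD_BYTES = 5_000  # 5 KB, decimal units (assumption)

showtimes_per_day = THEATERS * SCREENS_PER_THEATER * SHOWS_PER_SCREEN_PER_DAY
seat_records_per_day = showtimes_per_day * SEATS_PER_SHOW
seat_records_per_year = seat_records_per_day * 365
storage_tb_per_year = seat_records_per_year * BYTES_PER_RECORD / 1e12

peak_qps_range = (50 * AVG_QPS, 100 * AVG_QPS)  # 50x to 100x the average
peak_bandwidth_mb_s = peak_qps_range[1] * PAYLOAD_BYTES / 1e6

print(showtimes_per_day)               # 150000
print(seat_records_per_day)            # 30000000
print(seat_records_per_year)           # 10950000000 (~10B in the text)
print(round(storage_tb_per_year, 1))   # 1.1
print(peak_qps_range)                  # (50000, 100000)
print(peak_bandwidth_mb_s)             # 500.0
```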

Blueprint

Concise Summary: A microservices-based architecture utilizing an API Gateway for routing, Redis for high-speed distributed seat locking, and PostgreSQL for ACID-compliant final bookings.
Major Components:
Search Service: Handles movie and theater discovery using read-replicas.
Booking Service: Manages the seat reservation lifecycle and state machine.
Payment Service: Interfaces with external providers and manages transaction status.
Redis Cache: Maintains temporary seat "locks" to offload the database.
Message Queue: Decouples notification and analytics from the core booking flow.
Simplicity Audit: This design uses a standard RDBMS for the source of truth while leveraging Redis for the high-concurrency "locking" phase, avoiding complex distributed transaction protocols like 2PC where possible.
Architecture Decision Rationale:
Why this architecture?: Separating Search from Booking prevents heavy browsing traffic from impacting the critical booking path. Redis provides the low-latency locking required for seat selection.
Functional Requirement Satisfaction: Covers the end-to-end flow from discovery to ticket issuance.
Non-functional Requirement Satisfaction: Scalable via horizontal service scaling and DB sharding; highly available through multi-AZ deployment.
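The Booking Service state machine can be sketched as an explicit transition table. The sketch assumes the Pending/Paid/Cancelled statuses from the Bookings table. The event names (PaymentSucceeded, PaymentFailed, HoldExpired) are hypothetical. In the choreography-based saga they would arrive over the message queue, from the Payment Service or from a hold-expiry timer.

```python
from __future__ import annotations

from enum import Enum


class BookingStatus(Enum):
    PENDING = "Pending"
    PAID = "Paid"
    CANCELLED = "Cancelled"


# (current status, event) -> next status. Event names are hypothetical.
# Paid and Cancelled are terminal: no event leaves them.
TRANSITIONS = {
    (BookingStatus.PENDING, "PaymentSucceeded"): BookingStatus.PAID,
    (BookingStatus.PENDING, "PaymentFailed"): BookingStatus.CANCELLED,
    (BookingStatus.PENDING, "HoldExpired"): BookingStatus.CANCELLED,
}


class Booking:
    def __init__(self, booking_id: str, seat_ids: list[str]) -> None:
        self.booking_id = booking_id
        self.seat_ids = seat_ids
        self.status = BookingStatus.PENDING
        self.seats_to_release: list[str] = []

    def apply(self, event: str) -> BookingStatus:
        next_status = TRANSITIONS.get((self.status, event))
        if next_status is None:
            # Late or duplicate events (e.g. PaymentSucceeded after HoldExpired)
            # are rejected instead of silently changing state; handling them
            # (for example, with a refund) is a separate policy decision.
            raise ValueError(f"illegal transition: {self.status.value} + {event}")
        self.status = next_status
        if next_status is BookingStatus.CANCELLED:
            # Compensating step of the saga: these seat locks must be released.
            self.seats_to_release = list(self.seat_ids)
        return next_status


booking = Booking("bk-1", ["A7", "A8"])
print(booking.apply("HoldExpired").value, booking.seats_to_release)  # Cancelled ['A7', 'A8']
```

Making illegal transitions raise, rather than ignoring them, surfaces races between the payment result and the hold timer instead of letting them corrupt booking state.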

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing: CloudFront/CDN for static assets (movie posters, UI). Latency-based DNS routing to the nearest regional data center.
Security & Perimeter:
API Gateway: Handles JWT-based AuthN/AuthZ.
Rate Limiting: Tiered limiting (e.g., 10 seat-lock attempts per minute per user) to prevent bot-driven "seat sweeping".
WAF: Protects against SQL injection and DDoS attacks during high-traffic sales.
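The per-user limit of 10 seat-lock attempts per minute can be sketched as a sliding-window log. The sketch keeps state in process memory for illustration. Behind several gateway instances, the same counters would need to live in a shared store such as Redis. The class name and the choice not to count rejected attempts are assumptions.

```python
from __future__ import annotations

from collections import defaultdict, deque


class SlidingWindowLimiter:
    """At most `limit` attempts per `window_s` seconds per user (sliding-window log)."""

    def __init__(self, limit: int = 10, window_s: float = 60.0) -> None:
        self.limit = limit
        self.window_s = window_s
        self._attempts: defaultdict[str, deque[float]] = defaultdict(deque)

    def allow(self, user_id: str, now: float) -> bool:
        attempts = self._attempts[user_id]
        # Forget attempts that have slid out of the window.
        while attempts and attempts[0] <= now - self.window_s:
            attempts.popleft()
        if len(attempts) >= self.limit:
            return False  # rejected attempts are not recorded
        attempts.append(now)
        return True


limiter = SlidingWindowLimiter(limit=10, window_s=60.0)
results = [limiter.allow("user-42", now=float(t)) for t in range(11)]
print(results.count(True))                 # 10: the 11th attempt inside the minute is rejected
print(limiter.allow("user-42", now=60.0))  # True: the attempt at t=0 has left the window
```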

Service

Topology & Scaling:
Stateless Services: All services are deployed as Docker containers on K8s across multiple Availability Zones (AZs).
Scaling: Auto-scaling based on CPU and Request Count per Target.
API Schema Design:
POST /v1/bookings/reserve: Reserves seats and returns a reservation_id (REST; idempotent via request_id).
POST /v1/bookings/confirm: Finalize booking after payment.
Resilience & Reliability:
Circuit Breakers: Implemented for the Payment Service to fail fast if the external provider is down.
Retries: Exponential backoff for notification delivery.
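The Payment Service circuit breaker can be sketched as a small three-state machine (closed, open, half-open). The class, the thresholds (5 consecutive failures, 30 s cooldown), and the injectable clock are assumptions for illustration. The sketch is single-threaded; a production service would more likely rely on an existing resilience library.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised instead of calling the provider while the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.failures = 0                    # consecutive failures
        self.opened_at: float | None = None  # None means closed

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout_s:
            return "half-open"  # cooldown over: let a trial call through
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("payment provider unavailable, failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures in a row, (re)opens the breaker.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        self.opened_at = None  # any success closes the breaker
        return result


now = [0.0]  # fake clock so the example is deterministic
breaker = CircuitBreaker(failure_threshold=2, reset_timeout_s=30.0, clock=lambda: now[0])


def charge_card() -> str:
    raise TimeoutError("gateway timeout")  # simulated provider outage


for _ in range(2):
    try:
        breaker.call(charge_card)
    except TimeoutError:
        pass
print(breaker.state)  # open
now[0] = 31.0
print(breaker.state)  # half-open
print(breaker.call(lambda: "charge-ok"), breaker.state)  # charge-ok closed
```

Retries with exponential backoff would wrap calls outside the breaker, so that an open breaker stops the retry loop immediately instead of adding load to a failing provider.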

Storage

Access Pattern: Read-heavy for Search (90:10), Write-intensive for Booking during peak.
Database Table Design (PostgreSQL):
Table: Showtimes: id (PK), movie_id, theater_id, start_time, price.
Table: Seats: id (PK), showtime_id, row, number, status (Available, Reserved, Booked), version (for Optimistic Locking).
Table: Bookings: id (PK), user_id, showtime_id, total_price, status (Pending, Paid, Cancelled).
Technical Selection: PostgreSQL for its robust ACID properties and support for JSONB (for flexible theater layouts).
Distribution Logic: Partitioning Seats and Bookings tables by showtime_id to ensure queries for a single show remain on one shard.
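The version column enables optimistic locking. The row is read without locking it, and the write succeeds only if the version is unchanged. This is a minimal sketch using Python's built-in sqlite3 as a stand-in for PostgreSQL; the conditional UPDATE works the same way there. The helper names are hypothetical, and this sketch renames `row` to `row_label` to avoid a SQL keyword.

```python
from __future__ import annotations

import sqlite3

# SQLite stands in for PostgreSQL so the sketch runs anywhere.
seats_db = sqlite3.connect(":memory:")
seats_db.execute(
    """CREATE TABLE seats (
           id INTEGER PRIMARY KEY,
           showtime_id INTEGER NOT NULL,
           row_label TEXT NOT NULL,
           number INTEGER NOT NULL,
           status TEXT NOT NULL DEFAULT 'Available',
           version INTEGER NOT NULL DEFAULT 0
       )"""
)
seats_db.execute(
    "INSERT INTO seats (id, showtime_id, row_label, number) VALUES (1, 42, 'A', 7)"
)
seats_db.commit()


def read_seat(conn: sqlite3.Connection, seat_id: int) -> tuple[str, int] | None:
    """Optimistic read: no row lock is taken."""
    return conn.execute(
        "SELECT status, version FROM seats WHERE id = ?", (seat_id,)
    ).fetchone()


def write_if_unchanged(conn: sqlite3.Connection, seat_id: int,
                       expected_version: int, new_status: str) -> bool:
    """Apply the write only if the version is still the one we read.

    The caller is expected to have checked the status it read (e.g. 'Available').
    """
    cur = conn.execute(
        "UPDATE seats SET status = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_status, seat_id, expected_version),
    )
    conn.commit()
    # rowcount == 0: another writer bumped the version first; re-read and retry or give up.
    return cur.rowcount == 1


# Two users read the same snapshot of seat A7...
_, alice_version = read_seat(seats_db, 1)
_, bob_version = read_seat(seats_db, 1)
print(write_if_unchanged(seats_db, 1, alice_version, "Reserved"))  # True: version 0 -> 1
print(write_if_unchanged(seats_db, 1, bob_version, "Reserved"))    # False: Bob's version is stale
print(read_seat(seats_db, 1))                                      # ('Reserved', 1)
```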

Cache

Purpose & Justification: Redis is used for "Soft Locks" on seats.
Key-Value Schema:
Key: lock:showtime_id:seat_id
Value: user_id
TTL: 10 minutes.
Failure Handling: If Redis fails, the system falls back to the RDBMS (slower but safe). We use a Redis Cluster with replication to minimize this risk.
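This is a minimal sketch of the soft-lock semantics from the key-value schema. With a real Redis client such as redis-py, acquiring a lock is a single atomic `SET` with NX and EX (`r.set(key, user_id, nx=True, ex=600)`). Releasing it safely needs an owner check, typically a short Lua script, so that a user whose hold already expired cannot delete someone else's lock. To stay runnable without a server, the hypothetical `SeatLockStore` below emulates those semantics in memory with an explicit `now` timestamp.

```python
from __future__ import annotations

LOCK_TTL_S = 600  # 10 minutes, as in the key-value schema


class SeatLockStore:
    """In-memory stand-in for Redis soft locks (no Redis needed to run).

    acquire() mirrors: SET lock:<showtime_id>:<seat_id> <user_id> NX EX 600
    release() mirrors an owner-checked delete, which in Redis needs a Lua
    script (GET + compare + DEL) so the check and the delete are atomic.
    """

    def __init__(self) -> None:
        self._locks: dict[str, tuple[str, float]] = {}  # key -> (owner, expires_at)

    @staticmethod
    def key(showtime_id: int, seat_id: str) -> str:
        return f"lock:{showtime_id}:{seat_id}"

    def _live_owner(self, key: str, now: float) -> str | None:
        entry = self._locks.get(key)
        if entry is None or entry[1] <= now:
            self._locks.pop(key, None)  # expired keys behave as absent, like a TTL
            return None
        return entry[0]

    def acquire(self, showtime_id: int, seat_id: str, user_id: str, now: float) -> bool:
        key = self.key(showtime_id, seat_id)
        if self._live_owner(key, now) is not None:
            return False  # NX: the seat is already held
        self._locks[key] = (user_id, now + LOCK_TTL_S)
        return True

    def release(self, showtime_id: int, seat_id: str, user_id: str, now: float) -> bool:
        key = self.key(showtime_id, seat_id)
        if self._live_owner(key, now) != user_id:
            return False  # never delete a lock we do not own
        del self._locks[key]
        return True


locks = SeatLockStore()
print(locks.acquire(42, "A7", "alice", now=0))   # True
print(locks.acquire(42, "A7", "bob", now=10))    # False: held by alice
print(locks.release(42, "A7", "bob", now=20))    # False: bob is not the owner
print(locks.acquire(42, "A7", "bob", now=601))   # True: alice's hold expired
```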

Messaging

Purpose & Decoupling: Kafka is used to decouple the Booking Service from post-booking actions.
Event Schema: BookingConfirmedEvent containing booking_id, user_id, and metadata.
Technical Selection: Kafka for high throughput and replayability (useful if the Notification Service needs to re-process failed emails).
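The Transactional Outbox pattern fits this messaging path. In this minimal sketch, SQLite stands in for PostgreSQL and a `publish` callback stands in for a Kafka producer (both assumptions); the table and function names are hypothetical. The booking row and its BookingConfirmedEvent commit in one local transaction, and a relay publishes pending outbox rows afterwards.

```python
from __future__ import annotations

import json
import sqlite3
from typing import Any, Callable

# SQLite stands in for the PostgreSQL booking database in this sketch.
outbox_db = sqlite3.connect(":memory:")
outbox_db.executescript(
    """
    CREATE TABLE bookings (id TEXT PRIMARY KEY, user_id TEXT NOT NULL, status TEXT NOT NULL);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        event_type TEXT NOT NULL,
        payload TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
    """
)


def confirm_booking(conn: sqlite3.Connection, booking_id: str, user_id: str) -> None:
    # One local transaction: the booking row and its event commit together or not at all.
    with conn:
        conn.execute(
            "INSERT INTO bookings (id, user_id, status) VALUES (?, ?, 'Paid')",
            (booking_id, user_id),
        )
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("BookingConfirmedEvent",
             json.dumps({"booking_id": booking_id, "user_id": user_id})),
        )


def relay_outbox(conn: sqlite3.Connection,
                 publish: Callable[[str, dict[str, Any]], None]) -> int:
    """Publish pending events in insertion order, then mark each one as sent."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for event_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))  # e.g. a Kafka producer send
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (event_id,))
    return len(rows)


sent: list[tuple[str, dict[str, Any]]] = []
confirm_booking(outbox_db, "bk-1", "alice")
print(relay_outbox(outbox_db, lambda t, p: sent.append((t, p))))  # 1
print(sent)  # [('BookingConfirmedEvent', {'booking_id': 'bk-1', 'user_id': 'alice'})]
print(relay_outbox(outbox_db, lambda t, p: sent.append((t, p))))  # 0 (nothing pending)
```

Delivery is at-least-once. If the relay crashes between publishing and marking a row, the event is sent again, so the Notification Service should deduplicate on booking_id.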

Infrastructure (Optional)

Observability: Prometheus for metrics (RED/USE methods), Grafana for dashboards, and Jaeger for tracing the lifecycle of a reservation across services.
Distributed Coordination: Not explicitly needed beyond Redis locks for the MVP.
Wrap Up

Advanced Topics

Trade-offs (Consistency vs Availability): We prioritize Consistency (CP) for the booking flow to prevent double-booking. For the search/browsing flow, we prioritize Availability (AP) using read-replicas that might be slightly out of sync.
Reliability: We use a Saga Pattern (Choreography-based) to manage the distributed transaction between Booking and Payment. If the payment fails or times out, the Booking Service receives a message to release the seat locks.
Bottleneck Analysis: The Seat table in the DB could become a hotspot. Optimization: We perform the initial "Can I book?" check against the Redis cache. Only if Redis shows the seat is free do we attempt a DB transaction.
Security: PII (User emails/phones) is encrypted at rest. All payment processing is handled via PCI-compliant external providers (Stripe/Braintree) so we never store credit card numbers.
Distinguishing Insight: During a "Flash Sale" for a blockbuster, the system can implement Request Collapsing at the cache layer—if 1000 requests check for the same seat map in the same millisecond, the Search Service only performs one DB query and broadcasts the result to all callers.
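Request collapsing is often called "singleflight". This minimal asyncio sketch assumes a single Search Service process; `RequestCollapser`, `load_seat_map` and the `seatmap:42` key are hypothetical. The first caller for a key starts the load, and every concurrent caller awaits the same future.

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable


class RequestCollapser:
    """Concurrent callers asking for the same key share one in-flight load."""

    def __init__(self) -> None:
        self._inflight: dict[str, asyncio.Future[Any]] = {}

    async def get(self, key: str, load: Callable[[], Awaitable[Any]]) -> Any:
        fut = self._inflight.get(key)
        if fut is None:
            # First caller for this key starts the real query...
            fut = asyncio.ensure_future(load())
            self._inflight[key] = fut
            # ...and the entry is dropped once it finishes, so later requests reload.
            fut.add_done_callback(lambda _: self._inflight.pop(key, None))
        # Every caller awaits the same future; shield() keeps one caller's
        # cancellation from cancelling the shared load for everyone else.
        return await asyncio.shield(fut)


async def main() -> tuple[int, list[Any]]:
    db_queries = 0

    async def load_seat_map() -> dict[str, Any]:
        nonlocal db_queries
        db_queries += 1
        await asyncio.sleep(0.01)  # simulated database round trip
        return {"showtime_id": 42, "available": ["A7", "A8"]}

    collapser = RequestCollapser()
    results = await asyncio.gather(
        *(collapser.get("seatmap:42", load_seat_map) for _ in range(1000))
    )
    return db_queries, results


queries, results = asyncio.run(main())
print(queries, len(results))  # 1 1000
```

Each service instance collapses independently, so N instances send at most N concurrent queries per key. Collapsed callers all receive the same snapshot. That suits seat-map display, but the actual hold is still decided by the Redis lock and the database.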