The Question
Design

Design a Scalable Ticket Booking System

Design a system like BookMyShow or Ticketmaster capable of handling extreme concurrency during high-demand events. The system must support movie and event discovery, real-time seat availability, and a secure booking process. Address critical challenges including the prevention of double-bookings, handling payment failures, and maintaining low latency during traffic spikes. Explain your choice of data storage, locking mechanisms, and how you would scale the system for millions of concurrent users.
PostgreSQL
Redis
Kafka
Elasticsearch
CDN
Kubernetes
gRPC
JWT
Questions & Insights

Clarifying Questions

Scale: What is the anticipated scale in terms of Daily Active Users (DAU) and peak booking volume (e.g., during a blockbuster movie release)?
Assumption: 10M DAU, with peaks of 50,000 booking requests per second.
Geography: Is this a global system or localized to a single country?
Assumption: Primarily localized (e.g., India or USA), but multi-region for high availability.
Inventory Type: Are we supporting just movies, or also live events with complex seating (e.g., stadiums)?
Assumption: Movies and simple theater venues for the MVP.
Seat Lock Duration: How long should a seat be "held" once a user starts the checkout process?
Assumption: 5 to 10 minutes. If payment isn't completed, the seats are released.
Consistency: Is strict consistency required for seat selection?
Assumption: Yes, double-booking is a critical failure.

Thinking Process

The core challenge is managing high-concurrency inventory where many users compete for the same limited resources (seats) simultaneously.
How do we prevent double-booking? We utilize a distributed locking mechanism or RDBMS transactions with row-level locking to ensure atomic seat reservations.
How do we handle the "Thundering Herd" during blockbuster releases? We implement a virtual "Waiting Room" (Queue) and aggressive caching for show metadata to protect the database.
How do we manage transient seat states? Use a TTL-based cache (Redis) to represent temporary "locked" seats during the payment window, moving to persistent storage only upon payment success.
How do we keep the read path fast? Separate the Show/Movie catalog (Read-heavy) from the Booking flow (Write-heavy) using a CQRS-lite approach.

Bonus Points

Transactional Outbox Pattern: Ensure that booking data in the DB and "Ticket Confirmed" messages in Kafka stay in sync without distributed transactions (2PC).
Optimistic vs. Pessimistic Locking: Use optimistic locking (version column) for seat selection to maximize throughput, falling back to pessimistic locks only for the final payment commit.
Idempotency Keys: Use client-generated UUIDs for all booking attempts to prevent duplicate charges/bookings on network retries.
Database Sharding by Show_ID: Shard the booking and seat-map tables by Show_ID to distribute the load of a blockbuster movie's many shows across multiple database nodes.
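As a rough illustration of the sharding point, the routing step could look like the sketch below. It assumes a fixed shard count and plain hash-modulo placement; the function name `shard_for_show` and the shard count of 16 are hypothetical, and a production system might prefer consistent hashing or a lookup table so shards can be added without mass re-mapping.

```python
import hashlib


def shard_for_show(show_id, shard_count):
    """Map a show ID to a shard index with a stable hash (hypothetical helper).

    A cryptographic digest is used instead of Python's built-in hash(),
    because hash() on strings is randomized per process and would route
    the same show to different shards on different servers.
    """
    digest = hashlib.sha256(str(show_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Every seat and booking row for one show lands on the same shard.
SHARD_COUNT = 16  # assumption for illustration only
shard = shard_for_show("show-123", SHARD_COUNT)
print(f"show-123 is served by shard {shard}")
```

Because all rows for a show share one shard, seat reservation can stay a single-node transaction.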
Design Breakdown

Functional Requirements

Core Use Cases:
Users can search for movies by city, date, and genre.
Users can view theater layouts and real-time seat availability.
Users can select and "lock" seats for 10 minutes.
Users can complete payment and receive a digital ticket.
Scope Control:
In-scope: Browsing, Seat Selection, Booking, Payment Integration, Basic Notifications.
Out-of-scope: User reviews, loyalty programs, dynamic pricing based on demand, food/beverage pre-ordering.

Non-Functional Requirements

Scale: Must handle 50k QPS during peak bursts (e.g., "Avengers" ticket opening).
Latency: Seat selection response < 200ms; search results < 500ms.
Availability & Reliability: 99.99% availability for browsing; high reliability for booking (ACID).
Consistency: Strong Consistency for seat reservations; Eventual Consistency for movie catalog and search.
Fault Tolerance: System must handle payment gateway failures gracefully (refund/release logic).

Estimation

Traffic Estimation:
10M DAU -> ~115 average QPS (10M / 86,400 s, assuming roughly one request per user per day).
Peak factor 100x for blockbuster releases -> 11,500 to 50,000 Peak QPS.
Read:Write Ratio: 100:1 (Browsing vs. Booking).
Storage Estimation:
1,000 Cinemas * 10 screens * 5 shows/day * 200 seats = 10M seat-status records per day.
1 year storage = 3.65 Billion records. ~100 bytes per record -> ~365 GB/year.
Bandwidth Estimation:
Outgoing: 50k QPS * 10KB response = 500 MB/s peak.
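The arithmetic behind these estimates can be checked with a few lines of Python. The one-request-per-user-per-day figure is an assumption implied by the ~115 QPS number, not something stated in the requirements; the variable names are illustrative.

```python
# Back-of-the-envelope check of the Estimation figures.
SECONDS_PER_DAY = 24 * 60 * 60

dau = 10_000_000
requests_per_user_per_day = 1  # assumption implied by the ~115 QPS figure

avg_qps = dau * requests_per_user_per_day / SECONDS_PER_DAY
peak_qps_at_100x = avg_qps * 100

# 1,000 cinemas * 10 screens * 5 shows/day * 200 seats
seat_records_per_day = 1_000 * 10 * 5 * 200
seat_records_per_year = seat_records_per_day * 365
storage_gb_per_year = seat_records_per_year * 100 / 1e9  # 100 bytes per record

peak_bandwidth_mb_s = 50_000 * 10 / 1_000  # 50k QPS * 10 KB, converted to MB/s

print(f"average QPS: {avg_qps:.1f}")
print(f"peak QPS at 100x: {peak_qps_at_100x:.0f}")
print(f"seat records per day: {seat_records_per_day:,}")
print(f"storage per year: {storage_gb_per_year:.0f} GB")
print(f"peak outgoing bandwidth: {peak_bandwidth_mb_s:.0f} MB/s")
```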

Blueprint

Concise Summary: A microservices-based architecture using a Relational Database for ACID-compliant bookings and Redis for high-speed seat locking and show metadata caching.
Major Components:
Search Service: Uses Elasticsearch/OpenSearch for low-latency movie and cinema discovery.
Show Service: Manages theater metadata, show timings, and static seat maps.
Booking Service: Orchestrates seat locking, payment verification, and ticket issuance using PostgreSQL.
Payment Gateway: External integration with webhook callbacks.
Notification Service: Asynchronous ticket delivery via Kafka.
Simplicity Audit: This architecture avoids complex distributed actors or global locks by partitioning data by Show_ID and using localized RDBMS transactions for seat integrity.
Architecture Decision Rationale:
Why this?: Ticket booking is fundamentally a "Double-Spend" problem. RDBMS is the industry standard for financial-grade consistency.
Functional Satisfaction: Covers search-to-ticket flow seamlessly.
Non-functional Satisfaction: Scalable via sharding and caching; handles peak loads via asynchronous message processing for non-critical paths (notifications).

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing: CDN caches static movie posters and theater layouts. DNS uses latency-based routing to nearest regional API Gateway.
Security & Perimeter: API Gateway handles JWT validation and Rate Limiting (per UserID) to prevent bot scraping during high-demand releases.

Service

Topology & Scaling: Stateless microservices deployed on Kubernetes. Horizontal autoscaling is triggered by CPU/memory utilization.
API Schema Design:
POST /v1/bookings/reserve: Reserve seats.
Request: { showId, seatIds[], userId }
Response: { bookingId, expiryTime, totalAmount }
Idempotency: X-Retry-Key header.
POST /v1/bookings/confirm: Finalize after payment.
Resilience: Circuit breakers on the Payment Gateway. If a payment provider is down, the system fails over to a secondary provider or holds the seat longer.
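To make the idempotency contract of the reserve endpoint concrete, here is a minimal in-memory sketch. It assumes the value of the X-Retry-Key header is passed in as `retry_key`; the `ReserveHandler` and `Reservation` names are hypothetical, and a real service would keep both maps in the database or Redis rather than in process memory.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    booking_id: str
    show_id: str
    seat_ids: tuple


class ReserveHandler:
    """In-memory stand-in for POST /v1/bookings/reserve (illustrative only)."""

    def __init__(self):
        self._by_retry_key = {}  # retry key -> Reservation already returned
        self._held = set()       # (show_id, seat_id) pairs currently held

    def reserve(self, retry_key, show_id, seat_ids):
        # A replayed request returns the original result, not a second booking.
        if retry_key in self._by_retry_key:
            return self._by_retry_key[retry_key]

        wanted = {(show_id, seat_id) for seat_id in seat_ids}
        if wanted & self._held:
            raise ValueError("one or more seats are already held")

        self._held |= wanted
        reservation = Reservation(str(uuid.uuid4()), show_id, tuple(seat_ids))
        self._by_retry_key[retry_key] = reservation
        return reservation


handler = ReserveHandler()
key = str(uuid.uuid4())  # generated once by the client, reused on every retry
first = handler.reserve(key, "show-1", ["A1", "A2"])
retry = handler.reserve(key, "show-1", ["A1", "A2"])
print(first.booking_id == retry.booking_id)  # True
```

A retry with the same key is answered from the stored result, so a network timeout followed by a resend cannot create a second hold or a second charge.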

Storage

Access Pattern:
Show_Seats: Extreme write contention on specific Show_IDs.
Bookings: Write-once, read-often.
Database Table Design:
ShowSeats: (show_id, seat_id) composite PK, status (Available, Locked, Booked), user_id, version_col (for Optimistic Locking).
Bookings: booking_id (PK), user_id, show_id, status (Pending, Paid, Cancelled), total_price.
Technical Selection: PostgreSQL. It offers the best balance of ACID compliance and row-level locking performance.
Distribution Logic: Shard ShowSeats and Bookings tables by show_id. This ensures that contention for a specific movie show is isolated to a single DB shard.
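The version_col in ShowSeats supports a compare-and-set style update. The sketch below uses Python's built-in sqlite3 purely as a stand-in for PostgreSQL; the table layout follows the design above, while the helper names `read_seat` and `lock_if_unchanged` are hypothetical.

```python
import sqlite3


def create_show_seats(conn):
    conn.execute(
        """CREATE TABLE show_seats (
               show_id     TEXT NOT NULL,
               seat_id     TEXT NOT NULL,
               status      TEXT NOT NULL DEFAULT 'Available',
               user_id     TEXT,
               version_col INTEGER NOT NULL DEFAULT 0,
               PRIMARY KEY (show_id, seat_id))"""
    )


def read_seat(conn, show_id, seat_id):
    """Return (status, version_col) as seen by the caller."""
    return conn.execute(
        "SELECT status, version_col FROM show_seats WHERE show_id = ? AND seat_id = ?",
        (show_id, seat_id),
    ).fetchone()


def lock_if_unchanged(conn, show_id, seat_id, user_id, expected_version):
    """Lock the seat only if nobody changed the row since it was read."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            """UPDATE show_seats
                  SET status = 'Locked', user_id = ?, version_col = version_col + 1
                WHERE show_id = ? AND seat_id = ?
                  AND status = 'Available' AND version_col = ?""",
            (user_id, show_id, seat_id, expected_version),
        )
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
create_show_seats(conn)
with conn:
    conn.execute("INSERT INTO show_seats (show_id, seat_id) VALUES ('s1', 'A1')")

# Two users read the same version, then race to lock the seat.
_, seen_version = read_seat(conn, "s1", "A1")
first = lock_if_unchanged(conn, "s1", "A1", "alice", seen_version)
second = lock_if_unchanged(conn, "s1", "A1", "bob", seen_version)
print(first, second)  # True False
```

The losing writer sees zero affected rows and can report the seat as taken without ever holding a database lock while the user was choosing.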

Cache

Purpose: Reduce DB load for show information and handle transient "locks" for seats.
Key-Value Schema:
lock:show:{id}:seat:{id}: Value: userId, TTL: 600s.
show_metadata:{id}: JSON of movie/theater details.
Failure Handling: If Redis fails, the system falls back to the DB (pessimistic locking) to ensure consistency, albeit with higher latency.
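The seat-lock key behaves like a set-if-absent write with an expiry, which in Redis corresponds to `SET key value NX EX 600`. The sketch below simulates those semantics in memory so it runs without a Redis server; `TtlLockStore` and `seat_lock_key` are hypothetical names, and the injectable clock exists only to make expiry easy to demonstrate.

```python
import time


def seat_lock_key(show_id, seat_id):
    """Build the key in the lock:show:{id}:seat:{id} format used above."""
    return f"lock:show:{show_id}:seat:{seat_id}"


class TtlLockStore:
    """In-memory imitation of set-if-absent with expiry (not a Redis client)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (owner, expires_at)

    def acquire(self, key, owner, ttl_seconds):
        now = self._clock()
        current = self._entries.get(key)
        if current is not None and current[1] > now:
            return False  # someone else still holds an unexpired lock
        self._entries[key] = (owner, now + ttl_seconds)
        return True

    def release(self, key, owner):
        # Only the current, unexpired owner may release the lock.
        current = self._entries.get(key)
        if current is not None and current[0] == owner and current[1] > self._clock():
            del self._entries[key]
            return True
        return False


fake_now = [0.0]  # controllable clock for the demonstration
store = TtlLockStore(clock=lambda: fake_now[0])
key = seat_lock_key(42, "A1")

print(store.acquire(key, "user-1", 600))  # True: seat was free
print(store.acquire(key, "user-2", 600))  # False: held by user-1
fake_now[0] = 601.0                       # payment window has elapsed
print(store.acquire(key, "user-2", 600))  # True: the old lock expired
```

Checking the owner on release matters: without it, a user whose lock already expired could release a lock that now belongs to someone else.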

Messaging

Purpose: Decouple ticket generation and notifications from the critical payment path.
Event Schema: BookingConfirmedEvent: { bookingId, userId, ticketDetails }.
Technical Selection: Kafka. High throughput and durability for event-driven flows.
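One way to keep the booking row and the BookingConfirmedEvent in sync is the transactional outbox mentioned under Bonus Points. The sketch below uses sqlite3 as a stand-in for PostgreSQL and a plain callable as a stand-in for a Kafka producer; the outbox table layout, the `booking-confirmed` topic name, and the function names are all assumptions made for illustration.

```python
import json
import sqlite3


def create_booking_schema(conn):
    conn.execute(
        "CREATE TABLE bookings (booking_id TEXT PRIMARY KEY, "
        "user_id TEXT NOT NULL, status TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "topic TEXT NOT NULL, payload TEXT NOT NULL, "
        "published INTEGER NOT NULL DEFAULT 0)"
    )


def confirm_booking(conn, booking_id, user_id, ticket_details):
    """Mark the booking paid and record the event in ONE local transaction."""
    event = {"bookingId": booking_id, "userId": user_id, "ticketDetails": ticket_details}
    with conn:  # both writes commit together or not at all
        conn.execute("UPDATE bookings SET status = 'Paid' WHERE booking_id = ?", (booking_id,))
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("booking-confirmed", json.dumps(event)),
        )


def relay_outbox(conn, publish):
    """Publish pending rows, marking each as published after it is sent.

    Delivery is at-least-once: a crash between publish and the UPDATE
    re-sends the event on the next run, so consumers should deduplicate.
    """
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, topic, payload in rows:
        publish(topic, payload)
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


conn = sqlite3.connect(":memory:")
create_booking_schema(conn)
with conn:
    conn.execute("INSERT INTO bookings VALUES ('b1', 'u1', 'Pending')")

confirm_booking(conn, "b1", "u1", {"seats": ["A1"]})
sent = []
relay_outbox(conn, lambda topic, payload: sent.append((topic, payload)))
print(sent[0][0])  # booking-confirmed
```

The relay would typically run as a background worker, which keeps the Kafka publish off the payment request path.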
Wrap Up

Advanced Topics

Trade-offs: We choose Consistency over Availability (CP) for the booking flow. If the Booking DB is down, we cannot sell tickets. This is preferable to over-selling.
Reliability: A Cron Job / Worker scans the ShowSeats table for expired "Locked" seats that didn't convert to "Booked" and releases them back to "Available".
Bottleneck Analysis: The primary bottleneck is the DB write QPS for ShowSeats.
Optimization: Use a Distributed Semaphore in Redis to check availability before hitting the DB. This filters out 90% of requests that would have failed anyway due to sold-out seats.
Security: All PII (User emails/phones) is encrypted at rest. Payments are handled via PCI-compliant external gateways (Stripe/PayPal) so the system never touches raw credit card data.
Distinguishing Insights: For extreme loads (e.g., Olympic tickets), implement a Token Bucket at the Gateway level that only allows a number of users equal to Seat_Count * 2 into the booking flow, placing others in a "Virtual Queue" to prevent DB exhaustion.
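The admission rule in the last point can be sketched as a fixed-capacity gate with a FIFO queue behind it. Note that this is a concurrency cap rather than a rate-refilling token bucket, which seems closer to what the Seat_Count * 2 limit describes; the `WaitingRoom` class and its method names are hypothetical, and a real deployment would keep this state in a shared store rather than in one process.

```python
from collections import deque


class WaitingRoom:
    """Admit at most seat_count * factor users; queue everyone else in order."""

    def __init__(self, seat_count, factor=2):
        self.capacity = seat_count * factor
        self._active = set()    # users currently inside the booking flow
        self._queue = deque()   # users waiting, oldest first

    def enter(self, user_id):
        if user_id in self._active:
            return "admitted"
        # Newcomers may not jump ahead of users who are already waiting.
        if len(self._active) < self.capacity and not self._queue:
            self._active.add(user_id)
            return "admitted"
        if user_id not in self._queue:
            self._queue.append(user_id)
        return "queued"

    def leave(self, user_id):
        """Free a slot (booked, abandoned, or timed out) and admit waiters."""
        self._active.discard(user_id)
        admitted = []
        while self._queue and len(self._active) < self.capacity:
            next_user = self._queue.popleft()
            self._active.add(next_user)
            admitted.append(next_user)
        return admitted


room = WaitingRoom(seat_count=1)  # capacity is 1 * 2 = 2
print(room.enter("u1"))  # admitted
print(room.enter("u2"))  # admitted
print(room.enter("u3"))  # queued
print(room.leave("u1"))  # ['u3']
```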