The Question
Design: Distributed Inventory Reservation System
Design a high-concurrency inventory management system that supports atomic stock reservation with a 5-minute timeout. The system must provide interfaces to block inventory (with automatic release after 5 minutes if not confirmed), confirm orders, and query available stock. Focus on ensuring strict consistency to prevent overselling and handle race conditions between user confirmations and system timeouts.
Redis
Lua
SQS
RabbitMQ
Stateless Microservices
Questions & Insights
Clarifying Questions
Scale and Throughput: What is the expected peak QPS for blockInventory and getInventory? (Assumption: 10k+ QPS, requiring a distributed memory-first approach.)
Persistence Requirements: Does the inventory state need to survive a full data center outage, or is a high-availability distributed cache sufficient for the MVP? (Assumption: Redis with AOF/RDB persistence is sufficient for the MVP.)
Atomicity: Can a single blockInventory call involve multiple different product IDs? (Assumption: No, the interface specifies a single productId per call.)
Consistency Guarantee: Is "Strict Consistency" (no overselling) a hard requirement? (Assumption: Yes, we must never sell more than the available stock.)
Order Lifecycle: What happens if confirmOrder arrives after the 5-minute timeout has already released the inventory? (Assumption: The confirmation should fail if the inventory was already reclaimed.)
Thinking Process
Core Bottleneck: The "Check-then-Act" race condition. If two threads check inventory at the same time, both might see "1 item left" and both decrement it, leading to -1.
Strategy Step 1: Use Atomic Operations (Redis Lua Scripts) to combine "Check availability" and "Reserve amount" into a single indivisible step.
Strategy Step 2: Implement a State Machine for orders (PENDING, CONFIRMED, and EXPIRED, which the Redis schema stores as released) to handle the race between the 5-minute timeout and the user confirmation.
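Strategy Step 3: Leverage a Delayed Message Queue or TTL-based mechanism to trigger the "Release Inventory" logic 5 minutes after the block. Brokers deliver a delayed message at or shortly after its deadline rather than at an exact instant, so the release logic should tolerate a small lag.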
Strategy Step 4: Ensure Idempotency on all operations using the orderId as the unique key to prevent duplicate processing.
Bonus Points
Deterministic Cleanup: Instead of active polling, use a "Dead Letter Exchange" (DLX) or "Delayed Message" pattern to ensure O(1) complexity for triggering expiration.
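Transactional Integrity: Discuss using Redis Transactions (MULTI/EXEC) vs Lua Scripts. Commands queued inside MULTI/EXEC return their results only at EXEC, so the transaction cannot branch on a value it has just read; a Lua script can, which is why Lua is preferred here for the conditional logic (if count >= required) inside the atomic block.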
Distributed Locking vs. Atomic Counters: Choosing atomic counters in Redis over heavy-weight Redlock for better performance in high-frequency inventory updates.
Ghost Inventory Prevention: Handling the "Double Release" problem where a late confirmation and a timeout worker both try to modify the state simultaneously.
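The check-then-act race and the atomic-counter approach described above can be illustrated without a Redis server. The following is a minimal in-memory sketch, not the production path: `InventoryStore` and `simulate` are hypothetical names, and a `threading.Lock` stands in for the atomicity that a Redis Lua script would provide.

```python
import threading


class InventoryStore:
    """In-memory stand-in for Redis; the lock models the atomicity of a Lua script."""

    def __init__(self, stock: dict[str, int]) -> None:
        self._stock = dict(stock)
        self._lock = threading.Lock()

    def reserve(self, product_id: str, qty: int) -> bool:
        # Check and decrement happen inside one critical section, so no other
        # caller can observe the stock between the two steps.
        with self._lock:
            available = self._stock.get(product_id, 0)
            if available >= qty:
                self._stock[product_id] = available - qty
                return True
            return False

    def available(self, product_id: str) -> int:
        with self._lock:
            return self._stock.get(product_id, 0)


def simulate(stock: int, buyers: int) -> tuple[int, int]:
    """Run `buyers` concurrent single-unit reservations; return (successes, remaining)."""
    store = InventoryStore({"prod123": stock})
    results: list[bool] = []
    results_lock = threading.Lock()

    def buy() -> None:
        ok = store.reserve("prod123", 1)
        with results_lock:
            results.append(ok)

    threads = [threading.Thread(target=buy) for _ in range(buyers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results), store.available("prod123")
```

With 5 units and 50 concurrent buyers, exactly 5 reservations succeed and the counter never goes negative, which is the property the Lua-based design relies on.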
Design Breakdown
Functional Requirements
blockInventory: Atomically decrease available stock and create a 5-minute reservation.
confirmOrder: Permanently finalize the reservation, preventing it from being released.
getInventory: Return the current available (unreserved) stock for a product.
Auto-Release: Automatically return reserved stock to "available" if not confirmed within 5 minutes.
Non-Functional Requirements
High Concurrency: Support thousands of requests per second without race conditions.
Low Latency: getInventory and blockInventory should respond in < 50ms.
Accuracy: Zero overselling (strict consistency on inventory count).
Reliability: Reservations must eventually be released even if the application server crashes.
Estimation
Storage: 1 million products × 100 bytes/product = ~100 MB. 1 million active reservations × 200 bytes/reservation = ~200 MB. The ~300 MB total fits easily in a single small Redis node.
QPS: 10k QPS is well within the limits of a single-threaded Redis instance (which handles ~100k+ ops/sec).
Bandwidth: Negligible (small JSON/String payloads).
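The arithmetic behind these estimates can be checked with a few lines. This is only a back-of-the-envelope sketch using the figures stated above; the function names and the `ops_per_request` parameter are assumptions for illustration, and the byte counts cover raw payload only, not Redis's per-key overhead.

```python
PRODUCTS = 1_000_000
BYTES_PER_PRODUCT = 100
ACTIVE_RESERVATIONS = 1_000_000
BYTES_PER_RESERVATION = 200


def estimate_memory_mb() -> dict[str, float]:
    """Raw payload size in MB (1 MB = 1,000,000 bytes) for counters and reservations."""
    inventory_mb = PRODUCTS * BYTES_PER_PRODUCT / 1_000_000
    reservations_mb = ACTIVE_RESERVATIONS * BYTES_PER_RESERVATION / 1_000_000
    return {
        "inventory_mb": inventory_mb,
        "reservations_mb": reservations_mb,
        "total_mb": inventory_mb + reservations_mb,
    }


def throughput_headroom(peak_qps: int = 10_000,
                        redis_ops_per_sec: int = 100_000,
                        ops_per_request: int = 1) -> float:
    """How many times the assumed Redis capacity exceeds the peak load."""
    return redis_ops_per_sec / (peak_qps * ops_per_request)
```

Raising `ops_per_request` shows how quickly the headroom shrinks if each API call turns into several Redis commands instead of one script invocation.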
Blueprint
Concise Summary: A distributed Inventory Service using Redis for atomic state management and a Delayed Message Queue to handle the 5-minute expiration logic.
Major Components:
Inventory Service: A stateless API layer implementing the required interface and handling business logic.
Redis Cache: The primary source of truth for real-time inventory counts and reservation states.
Delayed Message Queue: Orchestrates the 5-minute timeout by holding "Check Expiration" messages.
Expiration Worker: A background consumer that processes timed-out messages and releases inventory if the order remains unconfirmed.
Simplicity Audit: This design avoids complex distributed locking and database polling, using the natural atomicity of Redis and the inherent timing capabilities of message brokers.
Architecture Decision Rationale:
Why this architecture fits the problem: It decouples the timing logic (5 mins) from the request-response cycle, ensuring the system remains responsive.
Functional Requirement Satisfaction: Lua scripts handle block, confirm updates the state, and get reads the counter.
Non-functional Requirement Satisfaction: Redis provides sub-millisecond latency and atomic operations for high-concurrency safety.
High Level Architecture
Sub-system Deep Dive
Service
Topology & Scaling: Stateless microservices deployed in a Multi-AZ Kubernetes cluster. Scaled based on CPU/Request count.
API Schema Design:
POST /v1/inventory/block: { productId, count, orderId } | Returns 200 (Success) or 409 (Insufficient Stock).
POST /v1/inventory/confirm: { orderId } | Returns 200 (Success) or 404 (Expired/Not Found).
GET /v1/inventory/{productId}: Returns { count }.
Resilience & Reliability:
Idempotency: Use orderId in Redis to check if a reservation already exists.
Retries: Client-side retries with exponential backoff for confirmOrder.
Observability: Metrics on inventory_exhausted_total and reservation_timeout_total.
Storage
Access Pattern: Heavy write (updates on block/confirm) and heavy read (get inventory). Requires low latency.
Database Table Design (Redis Key-Value Schema):
inv:{productId}: Integer (Atomic Counter).
res:{orderId}: Hash { prodId, qty, status } (Status: pending | confirmed | released).
Technical Selection: Redis.
Rationale: Support for Lua scripts ensures the "Check and Set" operation is atomic across concurrent threads.
Reliability & Recovery: Redis AOF (Append Only File) with appendfsync set to everysec, which limits data loss in a crash to roughly the last second of writes.
Cache
Purpose & Justification: Redis acts as the Primary Store for the MVP to meet latency and concurrency requirements.
Key-Value Schema:
inv:prod123 -> 50
res:ord456 -> {"p": "prod123", "q": 2, "s": "pending"}
Failure Handling: If a block succeeds but the message queue fails, we use a secondary periodic "reconciliation" scan or wrap both in a transaction (where possible).
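The reconciliation scan mentioned under Failure Handling could look roughly like the sketch below. It is an in-memory illustration under stated assumptions: the `created_at` field does not exist in the key-value schema above and would have to be added, `GRACE_SECONDS` is a hypothetical safety margin, and in Redis each individual release would need to run atomically rather than as a plain loop.

```python
from dataclasses import dataclass

RESERVATION_TTL_SECONDS = 300  # the 5-minute window from the requirements
GRACE_SECONDS = 30             # hypothetical slack so the scan does not race the normal worker


@dataclass
class Reservation:
    product_id: str
    qty: int
    status: str        # "pending" | "confirmed" | "released"
    created_at: float  # hypothetical field; the schema above stores no timestamp


def reconcile(inventory: dict[str, int],
              reservations: dict[str, Reservation],
              now: float) -> list[str]:
    """Release pending reservations whose expiry message was evidently lost.

    Returns the order IDs released by this pass.
    """
    released = []
    deadline = RESERVATION_TTL_SECONDS + GRACE_SECONDS
    for order_id, res in reservations.items():
        # Only stale, still-pending reservations are touched; confirmed and
        # already-released ones are left alone, so repeated scans are harmless.
        if res.status == "pending" and now - res.created_at >= deadline:
            inventory[res.product_id] = inventory.get(res.product_id, 0) + res.qty
            res.status = "released"
            released.append(order_id)
    return released
```

Running the scan on a schedule much longer than the reservation window keeps it a safety net rather than the primary expiry path.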
Messaging
Purpose & Decoupling: Acts as the timer for the 5-minute window.
Event Schema:
{ "orderId": "string", "timestamp": "long" }.
Failure Handling: Use a Dead Letter Queue (DLQ) if the Expiration Worker fails to process a release.
Technical Selection: RabbitMQ with Delayed Message Plugin or AWS SQS with DelaySeconds.
Rationale: High reliability for scheduled tasks.
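To make the timer role of the queue concrete, here is a small in-memory model of a delayed queue. It is a sketch only and does not use the RabbitMQ or SQS APIs: `DelayedQueue`, `expiry_event`, and the injected `now` clock are hypothetical, and the `timestamp` field is assumed to be epoch milliseconds since the schema above only says "long".

```python
import heapq
import itertools
import json

RESERVATION_DELAY_SECONDS = 300  # 5-minute reservation window


class DelayedQueue:
    """In-memory model of a broker that withholds messages until their delay has elapsed."""

    def __init__(self) -> None:
        self._heap: list[tuple[float, int, str]] = []
        self._seq = itertools.count()  # tie-breaker so equal due times stay FIFO

    def publish(self, body: str, now: float, delay: float) -> None:
        heapq.heappush(self._heap, (now + delay, next(self._seq), body))

    def poll(self, now: float) -> list[str]:
        """Return every message whose due time has passed, earliest first."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due


def expiry_event(order_id: str, now: float) -> str:
    # Mirrors the event schema above: orderId plus a millisecond timestamp.
    return json.dumps({"orderId": order_id, "timestamp": int(now * 1000)})
```

Passing the clock in as `now` keeps the model deterministic: a message published at time 0 is invisible at 299 seconds and delivered at 300.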
Data Processing
Processing Model: Simple event-driven consumer (Worker).
Processing DAG:
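Message Received -> Fetch Reservation Status from Redis -> IF Status == pending -> Atomic Increment Inventory + Set Status to released. The status check and the release must execute as one atomic step (for example, a single Lua script); otherwise a confirmOrder that lands between them could be overwritten.
Technical Selection: Python/Go/Java worker pool consuming from the MQ.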
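The processing steps above can be sketched as a consumer callback. This is an in-memory illustration, not a broker integration: `ReservationStore` and `handle_expiry_message` are hypothetical names, the lock stands in for a single atomic Redis script, and the reservation fields follow the p/q/s layout from the Cache section.

```python
import json
import threading


class ReservationStore:
    """In-memory stand-in for Redis; the lock models one atomic Lua script per call."""

    def __init__(self) -> None:
        self.inventory: dict[str, int] = {}
        self.reservations: dict[str, dict] = {}
        self._lock = threading.Lock()

    def release_if_pending(self, order_id: str) -> str:
        """Return stock and mark the reservation released, only if it is still pending."""
        with self._lock:
            res = self.reservations.get(order_id)
            if res is None:
                return "not_found"
            if res["s"] != "pending":
                return "skipped"  # already confirmed or already released
            self.inventory[res["p"]] = self.inventory.get(res["p"], 0) + res["q"]
            res["s"] = "released"
            return "released"


def handle_expiry_message(store: ReservationStore, body: str) -> str:
    """Consumer callback; safe to run more than once for the same message."""
    event = json.loads(body)
    return store.release_if_pending(event["orderId"])
```

Because the release is conditional on the pending status, a redelivered message is a no-op, which matters for brokers that only guarantee at-least-once delivery.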
Wrap Up
Advanced Topics
Trade-offs (Consistency vs Availability): We choose CP (Consistency/Partition Tolerance) in the CAP theorem. During a network partition, we prefer to reject inventory blocks rather than risk overselling.
Optimization (Lua Script for Blocking):
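-- KEYS[1] = inv:{productId}, KEYS[2] = res:{orderId}, ARGV[1] = quantity
local qty = tonumber(ARGV[1])
-- GET returns false for a missing key; treat that as zero stock
local stock = tonumber(redis.call('get', KEYS[1]) or '0')
if stock >= qty then
  redis.call('decrby', KEYS[1], ARGV[1])
  redis.call('hset', KEYS[2], 'p', KEYS[1], 'q', ARGV[1], 's', 'pending')
  return 1
else
  return 0
end
Bottleneck Analysis: Redis is a single-threaded bottleneck for writes. For 100x scale, we would implement Sharded Redis Clusters based on productId. In Redis Cluster a script may only touch keys in a single hash slot, so the reservation key would then need to share a hash tag with its product key.
Edge Case: What if confirmOrder and the Expiration Worker run at the exact same millisecond?
Solution: Use Redis WATCH or another Lua script for confirmOrder to ensure it only sets status to confirmed if it is currently pending.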
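The "only if currently pending" rule is a compare-and-set on the reservation status. The sketch below models it in memory, with a lock standing in for Redis executing one script at a time; `Reservation.transition` and `race_once` are hypothetical names chosen for illustration.

```python
import threading


class Reservation:
    """One reservation whose status changes only via compare-and-set."""

    def __init__(self) -> None:
        self.status = "pending"
        self._lock = threading.Lock()  # stands in for Redis running one script at a time

    def transition(self, expected: str, new: str) -> bool:
        # The status check and the write happen as one step; the loser sees a
        # non-pending status and must back off without touching inventory.
        with self._lock:
            if self.status != expected:
                return False
            self.status = new
            return True


def race_once() -> tuple[bool, bool, str]:
    """Fire confirmOrder and the expiration worker concurrently at one reservation."""
    res = Reservation()
    outcome: dict[str, bool] = {}
    barrier = threading.Barrier(2)  # release both threads at the same moment

    def confirm() -> None:
        barrier.wait()
        outcome["confirmed"] = res.transition("pending", "confirmed")

    def expire() -> None:
        barrier.wait()
        outcome["released"] = res.transition("pending", "released")

    threads = [threading.Thread(target=confirm), threading.Thread(target=expire)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome["confirmed"], outcome["released"], res.status
```

Whichever side wins, exactly one transition succeeds: a late confirmation gets a failure it can surface as 404, and the worker skips the inventory increment if the order was confirmed first.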