The Question
DesignScalable Distributed Rate Limiter
Design a high-performance, distributed rate limiting system capable of handling over 1 million requests per second. The system must support multiple identification strategies (User ID, IP address, API Key) and different quota tiers. Key constraints include a sub-2ms latency overhead and high availability, with a focus on how to handle race conditions in a distributed environment and how the system should behave during partial failures.
Redis
Lua
gRPC
PostgreSQL
Consistent Hashing
API Gateway
Kubernetes
Questions & Insights
Clarifying Questions
Scale and Throughput: What is the expected peak traffic? (Assumed: 1,000,000 Requests Per Second (RPS) with 100 million active users).
Latency Budget: What is the maximum acceptable latency overhead added by the rate limiter? (Assumed: < 2ms p99 overhead).
Granularity: What is the identifier for rate limiting? (Assumed: Per API-Key and Per IP address).
Accuracy: Is a "hard limit" required (strictly accurate) or is "soft limit" (eventual consistency) acceptable for better performance? (Assumed: Highly accurate for billing tiers, but performance is prioritized).
Deployment: Should this be a centralized service or a distributed sidecar? (Assumed: Centralized Rate Limiter Service behind an API Gateway for consistency).
Thinking Process
Core Bottleneck: The primary challenge is performing a "Read-Modify-Write" operation at a million-plus RPS without creating a massive database bottleneck or race conditions.
Progressive Logic:
How do we track counts across multiple servers? Use a centralized Distributed Cache (Redis).
How do we handle race conditions between concurrent requests? Use Lua Scripts within Redis for atomic operations.
How do we minimize latency for high-traffic users? Implement Local In-memory Cache for rule-set lookups and partial count batching.
How do we scale? Implement Consistent Hashing on Redis clusters to prevent hot partitions.
Bonus Points
Token Bucket vs. Sliding Window: Using a Sliding Window Counter algorithm to prevent "bursting" at window boundaries, which is a common flaw in Fixed Window designs.
Dynamic Tiering: Implementing a weighted rate limiter where different API endpoints or user tiers consume "tokens" at different rates.
Global vs. Regional Synchronization: Discussing the trade-off between a global Redis cluster (strict accuracy) vs. regional Redis clusters with asynchronous synchronization (lower latency, eventual consistency).
Failure Mode - Fail Open: In a high-scale system, if the rate limiter service dies, the system should Fail Open (allow traffic) to maintain availability, while firing high-priority alerts.
Design Breakdown
Functional Requirements
Core Use Cases:
Determine if a request should be allowed or throttled based on defined limits.
Return
429 Too Many Requests with Retry-After headers.Support different limits for different tiers (e.g., Free vs. Pro).
Scope Control:
In-Scope: Distributed counting logic, rule engine, and basic analytics.
Out-of-Scope: User authentication (handled by Auth Service), billing processing, and deep packet inspection.
Non-Functional Requirements
Scale: Must handle 1M+ RPS and support 100M+ unique keys.
Latency: Extremely low overhead (< 2ms) to avoid degrading user experience.
Availability & Reliability: 99.99% availability; the rate limiter must not become a Single Point of Failure (SPOF).
Consistency: High consistency for billing-related limits; eventual consistency for public API IP-based limits.
Fault Tolerance: Handle Redis cluster partitions or node failures gracefully.
Estimation
Traffic: 1M RPS.
Storage:
Each key (UserID/IP + Window) \approx 64 bytes.
100M users * 64 bytes \approx 6.4 GB.
Modern Redis clusters can easily hold this in memory.
Bandwidth:
Request: UserID (16 bytes) + Metadata \approx 100 bytes.
1M RPS * 100 bytes = 100 MB/s (Inbound to Rate Limiter).
Blueprint
Concise Summary: A distributed rate limiter using an API Gateway as the entry point, a stateless Rate Limiter Service for logic, and a Redis cluster for high-speed atomic counter management.
Major Components:
API Gateway: Acts as the enforcement point, intercepting requests and querying the Rate Limiter Service.
Rate Limiter Service: A stateless service that executes the Sliding Window Counter logic.
Redis Cluster: Stores counters and uses Lua scripts to ensure atomicity.
Config Store: A persistent DB to store rate limit rules per tier/user.
Simplicity Audit: This design avoids complex distributed locks by offloading atomicity to Redis Lua scripts, keeping the service layer stateless and easy to scale horizontally.
Architecture Decision Rationale:
Redis is the industry standard for this due to its sub-millisecond latency and built-in atomic primitives.
Sliding Window Counter provides the best balance between memory efficiency and accuracy (prevents the "boundary burst" problem of Fixed Window).
High Level Architecture
Sub-system Deep Dive
Edge (Optional)
Content Delivery & Traffic Routing: DNS redirects traffic to the nearest regional API Gateway.
Security & Perimeter:
The API Gateway performs SSL termination and initial IP-based blacklisting before hitting the rate limiter logic.
Rate Limiting: The Gateway makes a gRPC call to the Rate Limiter Service for a "Go/No-Go" decision.
Service
Topology & Scaling: Stateless Go or Java services deployed in a Kubernetes cluster across multiple Availability Zones (AZ). Scales based on CPU and Request Count.
API Schema Design:
POST /v1/is-allowedProtocol: gRPC (for low latency and binary serialization).
Request:
{ "key": "user_123", "tier": "pro", "cost": 1 }Response:
{ "allowed": true, "remaining": 99, "reset_time": 1625091200 }Resilience & Reliability:
Fail-Open Policy: If the Rate Limiter Service returns a 5xx or times out, the API Gateway allows the request by default to ensure system availability.
Timeout: Strict 50ms timeout for the gRPC call.
Storage
Access Pattern: Heavy Write (Incr) and Heavy Read (Check count).
Database Table Design (Config DB):
rule_id: UUID (PK)target_type: String (USER_ID, IP, API_KEY)limit_value: Integerwindow_size: Integer (seconds)Technical Selection: PostgreSQL for the Config DB (relational, ACID for rules), but Redis for the operational counters.
Distribution Logic: Partition Redis using Consistent Hashing on the
key (e.g., hash(user_id) % num_shards) to ensure even distribution of traffic.Cache
Purpose & Justification:
Rule Cache: A local in-memory cache (e.g., Caffeine or LRU) inside the Rate Limiter Service to store rules from the Config DB, preventing a DB hit for every request.
Redis Counter: Acts as the centralized truth for current request counts.
Key-Value Schema:
Key:
ratelimit:{user_id}:{window_timestamp}Structure: Redis Hash or Sorted Set for Sliding Window.
Failure Handling: Use Redis Sentinel or Cluster Mode for high availability.
Infrastructure (Optional)
Observability:
Metrics: Track
limit_exceeded_count, request_latency, and cache_hit_ratio.Alerting: Alert if the "Fail-Open" mechanism is triggered or if Redis latency spikes.
Wrap Up
Advanced Topics
Trade-offs: We choose Consistency over Availability for the counter (using Redis), but we choose Availability over Accuracy for the service itself (Fail-Open strategy). This is a classic CAP theorem trade-off (CP for storage, AP for the service wrapper).
Sliding Window Optimization: To implement a true sliding window without the memory overhead of Sorted Sets, we use a Sliding Window Counter:
Current Count = count_in_current_window + count_in_previous_window * (1 - overlap_percentage). This provides a very close approximation with minimal storage.Race Conditions: Solved by using the following Lua script in Redis:
local current = redis.call("INCR", KEYS[1])
if current == 1 then
redis.call("EXPIRE", KEYS[1], ARGV[1])
end
return currentBottleneck Analysis: The Redis cluster could become a bottleneck. We mitigate this by sharding and, for extremely high volume, implementing Local Batching: The service increments a local counter and syncs to Redis every 50-100ms (trading off strict accuracy for extreme scale).