The Question
DesignScalable Distributed Rate Limiter
Design a high-performance distributed rate limiting system capable of handling 10 million requests per second with sub-2ms latency. The system must support various granularities (User, IP, API Key) and dynamic rule updates. Focus on high availability, the trade-offs between consistency and performance, and ensuring the system does not become a single point of failure for the entire architecture.
Redis
Lua
gRPC
PostgreSQL
Token Bucket
API Gateway
Sidecar Pattern
Questions & Insights
Clarifying Questions
What is the scale of the system? (Assumed: 10M+ Requests Per Second (QPS) across a global user base).
What is the required latency overhead for a rate-limit check? (Assumed: Sub-2ms overhead to avoid impacting user experience).
Should the system fail "open" or "closed"? (Assumed: Fail "open" to prioritize availability—if the rate limiter is down, traffic should still flow).
What is the granularity of the limits? (Assumed: Support for multiple tiers—e.g., per-User ID, per-IP, and per-API endpoint).
Are rules static or dynamic? (Assumed: Rules are updated via an Admin API and should propagate within seconds).
Thinking Process
Core Algorithm Choice: How do we balance accuracy and memory? (Token Bucket for flexibility vs. Fixed Window for simplicity).
State Management: Where do we store counters to ensure distributed consistency? (Centralized high-performance K-V store like Redis).
Concurrency Control: How do we avoid the "read-modify-write" race condition? (Atomic operations using Redis Lua scripting).
Resilience Strategy: How do we ensure the rate limiter doesn't become a Single Point of Failure (SPOF)? (Local in-memory shadowing and fail-open mechanisms).
Bonus Points
Local Batching / Thick Client: Discuss reducing network round-trips to Redis by batching "token acquisitions" locally at the application or gateway level.
Hierarchical Rate Limiting: Implementing a global limit (Redis-based) paired with a local limit (In-memory) to handle massive bursts or "Thundering Herd" scenarios.
Clock Drift Mitigation: Handling inaccuracies in distributed systems when using "Sliding Window Log" or timestamp-dependent algorithms.
Shadow Mode: Deploying new rate limits in "dry-run" mode to analyze impact before enforcement.
Design Breakdown
Functional Requirements
Core Use Cases:
isAllow(key, limit, window): Determine if a request should be throttled.Return headers (X-Ratelimit-Limit, X-Ratelimit-Remaining, X-Ratelimit-Retry-After).
Dynamic rule configuration (Add/Update/Delete limits).
Scope Control:
In-scope: Distributed counter management, low-latency decision engine, and rule storage.
Out-of-scope: Client-side SDK enforcement (logic remains server-side/gateway), sophisticated fraud detection (WAF territory).
Non-Functional Requirements
Scale: Must handle 10M+ peak QPS.
Latency: P99 response time < 2ms.
Availability: 99.99% availability; system must fail-open.
Consistency: Eventual consistency is acceptable for global limits, but strict atomicity is required for local counters to prevent over-limit leakage.
Security: Prevent malicious actors from bypassing limits via IP spoofing or header manipulation at the edge.
Estimation
Traffic: 10M QPS.
Storage:
Each key (User ID + Rule ID): ~100 bytes.
100M active users/keys = 10GB RAM (well within a single large Redis cluster capacity).
Bandwidth:
Request: ~200 bytes per check (Key + metadata).
10M * 200B = 2 GB/s total network throughput to the Rate Limiter layer.
CPU: High-performance Lua execution in Redis is single-threaded per shard; will require sharding across 10-20 Redis nodes.
Blueprint
Concise Summary: A sidecar or gateway-integrated service that checks a centralized Redis cluster using the Token Bucket algorithm implemented via Lua scripts.
Major Components:
API Gateway / Sidecar: Intercepts incoming requests and queries the Rate Limiter service.
Rate Limiter Service: Stateless microservice that encapsulates the logic and interfaces with storage.
Redis Cluster: Stores real-time counters and executes atomic logic via Lua scripts.
Rule Repository: Relational database for persistent storage of rate-limiting configurations.
Simplicity Audit: This design avoids complex synchronization protocols (like Paxos/Raft) by delegating atomicity to Redis, which is standard for MVPs.
Architecture Decision Rationale:
Why this architecture?: Redis provides the necessary sub-millisecond latency and built-in atomic primitives (Lua) required for high-throughput counting.
Functional Satisfaction: Meets the need for
isAllow checks and dynamic rules.Non-functional Satisfaction: Scalable through Redis sharding; highly available through Redis Sentinel/Cluster and "fail-open" logic in the service layer.
High Level Architecture
Sub-system Deep Dive
Edge (Optional)
Content Delivery & Traffic Routing: DNS routes traffic to the nearest regional Load Balancer.
Security & Perimeter:
API Gateway: Performs SSL termination and basic AuthN before calling the Rate Limiter.
Rate Limiting: First line of defense against DDoS.
Fail-Open: If the Rate Limiter service returns an error or times out (>5ms), the Gateway logs the error and allows the request to pass.
Service
Topology & Scaling:
Stateless instances deployed in a Multi-AZ configuration.
Scaling based on CPU and Request Count (QPS).
API Schema Design:
POST /v1/checkProtocol: gRPC (for performance/low-overhead).
Request:
{ "key": "user_123", "action": "create_post" }Response:
{ "allowed": true, "remaining": 49, "reset_time": 1625000000 }Resilience & Reliability:
Circuit Breaker: If Redis latency spikes, the service stops querying and fails open immediately.
Retries: No retries for rate-limit checks (to keep latency low).
Storage
Access Pattern: Extremely high write/read frequency (1:1 ratio).
Database Table Design (Rule DB):
rule_id (PK), resource_path, method, limit_count, window_seconds, is_active.Technical Selection:
Rule DB: PostgreSQL (Relational) because rules change infrequently and require ACID for configuration management.
Counter Store: Redis (In-memory).
Distribution Logic:
Redis Sharding: Key-based sharding (CRC16 of UserID) to distribute load across the cluster.
Cache
Purpose & Justification: Redis acts as the primary state store (not just a cache).
Key-Value Schema:
Key:
ratelimit:{user_id}:{rule_id}Value: Current token count / Timestamp.
Lua Script (Token Bucket):
local tokens = redis.call('get', KEYS[1])
if tokens == nil then
tokens = ARGV[1] -- Initial limit
end
if tonumber(tokens) > 0 then
redis.call('decr', KEYS[1])
return 1
else
return 0
endFailure Handling: Use Redis Replication (Leader/Follower) for high availability.
Infrastructure (Optional)
Observability:
Metrics: Track
rate_limit_exceeded count per rule and redis_latency.Alerting: Alert if P99 latency > 5ms or if Redis memory usage > 80%.
Wrap Up
Advanced Topics
Trade-offs: We choose Availability over Consistency (AP over CP in CAP terms). If Redis is unreachable, we allow traffic. This prevents a rate-limiter outage from becoming a site-wide outage.
Reliability: Use a "Local Cache" in the Rate Limiter service to store rules from the Rule DB, refreshed every 30 seconds via a background thread to minimize DB load.
Bottleneck Analysis: Redis is the bottleneck. Mitigation: Use a "Thick Client" approach where the Gateway caches the fact that a user is already blocked for X seconds, avoiding redundant calls for blocked users.
Optimization: For global "heavy hitters" (e.g., an IP hammering the system), the Gateway can promote that IP to a "Deny List" at the L4/L7 Load Balancer level temporarily to bypass the Service Layer entirely.