The Question
Design

Distributed Rate Limiter Design

Design a high-performance distributed rate-limiting system capable of handling millions of requests per second across a global infrastructure. The system must support various limiting algorithms, maintain high availability even during storage failures, and ensure minimal latency impact on protected API services.
Redis Cluster
Lua Script
Sliding Window Counter
API Gateway Middleware
Consistent Hashing
Questions & Insights

Clarifying Questions

What is the scale of the system? (Assumption: 10M DAU, 100k peak Requests Per Second (RPS) across multiple geographic regions).
What is the required precision/accuracy? (Assumption: High precision is required; "Hard" limits are preferred over "Soft" approximations).
What is the granularity of the limits? (Assumption: Limits applied per User ID, API Key, or IP address, with support for multiple rules per request).
What should happen if the Rate Limiter service is unavailable? (Assumption: The system should "Fail-Open" to ensure high availability, allowing traffic through if the limiter cannot be reached).
What is the target latency overhead? (Assumption: The rate-limiting check must add < 5ms to the end-to-end request latency).

Thinking Process

Core Strategy: Use an API Gateway/Middleware layer that interacts with a high-performance, in-memory data store (Redis) using the Sliding Window Counter algorithm to balance accuracy and memory efficiency.
Progressive Logic:
How do we track counts across multiple application nodes? (Use a centralized Redis cluster).
How do we ensure atomicity and avoid race conditions? (Use Redis Lua scripts for "check-and-set" operations).
How do we minimize latency? (Deploy Redis instances close to the app servers and use local caching for static rate-limit configurations).
How do we handle massive spikes? (Implement a multi-tier approach: local memory for "hot" keys and Redis for global consistency).

Bonus Points

Lua Scripting Atomicity: Executing the counting logic entirely within Redis to eliminate the RTT (Round Trip Time) between the application and the cache for multiple operations.
Cellular/Sharded Architecture: Partitioning the rate-limiting keyspace across multiple Redis clusters based on UserID hash to prevent a single Redis node from becoming a bottleneck.
Consistent Hashing: Utilizing consistent hashing for the Redis cluster to ensure minimal remapping of counters during cluster resizing.
Hybrid Local-Global Limiting: Implementing a "Local Bucket" on the app server that periodically synchronizes with the "Global Bucket" in Redis to reduce the number of remote calls for extremely high-frequency users.
Design Breakdown

Functional Requirements

Limit requests based on unique identifiers (User ID, API Key, IP).
Support configurable time windows (e.g., 100 req/min, 10,000 req/hour).
Return standard HTTP 429 (Too Many Requests) with Retry-After headers.
Support multiple concurrent rules (e.g., Tier 1: 10/sec AND Tier 2: 500/min).

Non-Functional Requirements

Low Latency: Minimal impact on request processing time.
High Availability: The system must not be a Single Point of Failure (SPOF).
Scalability: Must handle growth in traffic linearly by adding more nodes/shards.
Accuracy: Provide precise enforcement of limits across a distributed cluster.

Estimation

RPS: 100,000.
Storage: Each counter (Sliding Window) takes approx. 64 bytes in Redis.
Total Users: 10M.
Memory for Counters: 10,000,000 \times 64 \text{ bytes} \approx 640\text{ MB}.
Network BW: 100k RPS with 1KB request/response \approx 100\text{ MB/s} throughput for the gateway.
Redis Throughput: A single Redis node handles ~100k OPS. For high availability and headroom, we require a 3-node cluster.

Blueprint

Concise Summary: A distributed rate limiter utilizing an API Gateway middleware that executes Redis Lua scripts to implement a Sliding Window Counter.
Major Components:
API Gateway (Middleware): Intercepts incoming requests and communicates with the Rate Limit Service/Cache to decide if the request should proceed.
Redis Cluster (Cache Layer): Stores the actual counts and window data in-memory for sub-millisecond access.
Config Database: Stores the policy definitions (e.g., "Tier-1 users get 1000 requests/min").
Simplicity Audit: This is the simplest design because it avoids complex stream processing or distributed consensus by leveraging Redis' single-threaded atomicity for count synchronization.
Architecture Decision Rationale:
Why this architecture?: Redis is industry-standard for high-speed counters. Lua scripts provide ACID-like atomicity for the "check-then-increment" pattern without the overhead of distributed locks.
Functional Requirement Satisfaction: Handles per-user/API key logic and supports 429 status codes via the Middleware.
Non-functional Requirement Satisfaction: Redis' in-memory nature ensures low latency; Sharding ensures scalability; Replication ensures availability.

High Level Architecture

Sub-system Deep Dive

Service

Topology & Scaling: The Rate Limiter logic is implemented as a library within the API Gateway. This avoids an extra network hop to a "Rate Limit Microservice." The Gateway scales horizontally behind a Load Balancer.
API Spec:
isAllowed(key, rules): Internal function returning a boolean and the remaining quota.
Response Headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset.

Storage

Data Model:
Table `Policies: rule_id (PK), resource_path, user_tier, limit, window_size.
Database Logic: Rules are cached in-memory within the API Gateway and refreshed every 5 minutes via a background poller to minimize DB load.

Cache

Data Structure: Redis Sorted Sets or Hashes. For MVP, we use a Sliding Window Counter using a Hash where keys are timestamps (minutes) and values are counts.
TTL: Each counter key in Redis is assigned a TTL equal to the window size + 10 seconds to ensure auto-cleanup of inactive users.
Lua Script Logic:
Calculate current window bucket.
Increment the bucket.
Sum the values of the last N buckets.
If sum > limit, return reject; else, return success.
Wrap Up

Advanced Topics

Monitoring:
RateLimit_Drop_Count: Number of 429s issued.
Redis_Latency: Time taken for the Lua script execution.
Cache_Hit_Rate: For local rule configurations.
Trade-offs:
Consistency vs. Latency: We choose "Eventual Consistency" for rules (cached locally) but "Strong Consistency" for counters (centralized Redis).
Bottlenecks: A single Redis node can be a bottleneck. We mitigate this by sharding the keyspace using the UserID.
Failure Handling:
Redis Down: The middleware catches the exception and allows the request (Fail-Open). It logs the failure for alerting.
Alternatives:
Token Bucket: Simpler to implement but less accurate for bursty traffic over fixed windows.
Envoy Global Rate Limit Service: A more robust, out-of-the-box alternative if using a service mesh.