The Question

Scalable Distributed Rate Limiter

Design a distributed rate-limiting system for a global high-traffic API. The system must support millions of users, maintain sub-5ms latency overhead, and handle different limiting tiers (e.g., free vs. premium). Discuss the choice of algorithm, handling of race conditions in a distributed environment, and the trade-offs between system availability and rate-limiting accuracy during partial network failures.

Redis

Lua

API Gateway

NoSQL

Redis Cluster

Questions & Insights

Clarifying Questions

What is the scale of the system? (Assumption: 10 million Daily Active Users (DAU), peak traffic of 1 million QPS).

What is the required latency overhead for the rate-limiting check? (Assumption: Extremely low, < 5ms, to minimize impact on API performance).

What is the granularity of the limits? (Assumption: Primarily by User ID or API Key, with fallback to IP address).

What should happen if the rate-limiting service is down? (Assumption: Fail-open to ensure high availability of the core API service).

Do we need strict accuracy across global regions? (Assumption: For MVP, local regional consistency is sufficient; global synchronization is secondary).

Thinking Process

Core Bottleneck: Performing a read-modify-write operation on every single API request without introducing significant latency or race conditions.

Progressive Logic:

How do we store and increment counters atomically? (Redis with Lua scripts).

How do we minimize network round-trips? (Middleware/Sidecar approach in the API Gateway).

Which algorithm balances memory efficiency and accuracy? (Sliding Window Counter).

How do we scale the configuration management of limits? (Centralized Configuration Service).

Bonus Points

Race Condition Prevention: Utilizing Redis Lua scripts to ensure that "check-then-set" operations are atomic, preventing "double-spending" of tokens in a distributed environment.

Memory Optimization: Using Redis Hashes or compressed data structures for sliding windows to keep the memory footprint under 1GB even for millions of active users.

Fail-Open Strategy: Implementing a local cache bypass or circuit breaker in the middleware so that if Redis latency spikes, the rate limiter steps aside rather than taking down the API.

Multi-tiered Limiting: Designing for both "hard limits" (block request) and "soft limits" (log/warn or throttle) using a single evaluation engine.

Design Breakdown

Functional Requirements

Core Use Cases:

Authenticate and identify the requester (User ID / API Key).

Track request counts per defined time window (e.g., 1000 requests/minute).

Block requests exceeding the limit with a HTTP 429 status code.

Return rate-limit metadata in HTTP headers (Remaining, Reset Time).

Scope Control:

In-Scope: Distributed counting, sliding window logic, configuration management.

Out-of-Scope: Tiered billing/monetization logic, complex WAF-style DDOS mitigation (e.g., packet inspection), deep user behavioral analysis.

Non-Functional Requirements

Scale: Must handle 1M peak QPS.

Latency: Rate limiting check must add < 5ms to the request lifecycle.

Availability & Reliability: 99.99% availability; the rate limiter must not be a single point of failure for the API.

Consistency: Eventual consistency across regions is acceptable; strong consistency within a single region.

Fault Tolerance: Graceful degradation if the counter store is unreachable.

Security: Prevent bypass via header spoofing or IP rotation.

Estimation

Traffic Estimation:

1M peak QPS.

Each request triggers 1 check.

Storage Estimation:

10M active users/day.

Each user key in Redis: UserID (16 bytes) + Counter (8 bytes) + Window overhead (8 bytes)

\approx

32 bytes.

Total RAM:

10M \times 32 \text{ bytes} \approx 320 \text{ MB}

Bandwidth Estimation:

Small payloads (headers only).

1M QPS

\times

100 bytes/request

\approx

100 MB/s network throughput between Gateway and Redis.

Blueprint

Concise Summary: A distributed rate-limiting architecture using an API Gateway middleware that communicates with a high-performance Redis cluster using the Sliding Window Counter algorithm.

Major Components:

API Gateway: Acts as the entry point, executing the rate-limiting logic via a plugin or middleware before routing to downstream services.

Redis Cluster: Provides the centralized, in-memory atomic storage for request counters.

Config Service: Manages and distributes rate-limiting rules (e.g., "tier-1-users: 5000/min") to the Gateway.

Simplicity Audit: This design avoids complex "Sliding Window Log" storage (which is memory-heavy) and avoids custom service-mesh overhead, sticking to a battle-tested Gateway + Redis pattern.

Architecture Decision Rationale:

Why this architecture?: Redis is industry-standard for sub-millisecond atomic increments. Centralizing counters ensures that a user hitting different Gateway nodes is still accurately limited.

Functional Satisfaction: Meets all requirements for blocking, headers, and identification.

Non-functional Satisfaction: Scalable via Redis Sharding; Low latency via in-memory operations; Highly available via Redis Sentinel/Cluster and fail-open middleware logic.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing: Global Load Balancer (Anycast) routes traffic to the nearest regional API Gateway.

Security & Perimeter:

API Gateway: Handles SSL termination and extracts the Authorization header for identity-based limiting.

Rate Limiting Middleware: Located inside the Gateway. It calculates the unique bucket key (e.g., rate_limit:v1:user_123:min_59).

Service

Topology & Scaling:

API Gateway is deployed as a stateless cluster across multiple Availability Zones (AZs).

Horizontal scaling based on CPU and Network I/O.

API Schema Design:

Endpoint: Intercepts all incoming API calls.

Protocol: REST/gRPC.

Response Headers:

X-Ratelimit-Limit: Max requests allowed.

X-Ratelimit-Remaining: Remaining requests in window.

X-Ratelimit-Reset: UTC epoch when the window resets.

Resilience & Reliability:

Circuit Breaker: If Redis round-trip exceeds 10ms, the middleware enters "bypass mode" (fail-open).

Local Cache: Gateway caches configuration rules from the Config Service for 1 minute to avoid external calls per request.

Storage

Access Pattern: 100% Write-heavy (every check is an increment).

Database Table Design:

Type: Redis (Key-Value).

Key Structure: {prefix}:{user_id}:{time_window_timestamp}.

Value: Integer (Counter).

TTL: Set to window size + 60 seconds for auto-cleanup.

Technical Selection: Redis.

Rationale: Support for Lua scripts allows the "Sliding Window Counter" logic (checking the previous window's count and current window's count) to be calculated in a single atomic step.

Distribution Logic:

Sharding: Redis Cluster sharding based on {user_id} hash tag to ensure all data for a specific user resides on the same node (enabling Lua atomicity).

Cache

Purpose: Local memory cache inside the API Gateway process.

Justification: Reduces latency for fetching rate-limit rules (e.g., "How many requests is a Basic User allowed?").

Key-Value Schema: rule:{user_tier} -> {limit_config_json}.

Failure Handling: If the Config Service is down, the Gateway uses the last known cached rules.

Wrap Up

Advanced Topics

Trade-offs:

Accuracy vs. Performance: The Sliding Window Counter algorithm is used over the Sliding Window Log. Log is 100% accurate but uses significantly more memory (storing every timestamp). Counter approximates the window boundary but uses fixed memory per user.

Reliability & Failure Handling:

Fail-Open: If the Redis cluster is unreachable, we log the error and allow the request through. It is better to let an occasional over-limit request through than to block legitimate users during a system hiccup.

Bottleneck Analysis:

Redis Hotkeys: If a specific user is extremely active (e.g., a bot), it creates a hotkey in Redis. Redis Cluster handles this via sharding, and high-performance Lua scripts keep the execution time sub-millisecond.

Security & Privacy:

Identity: Limits are applied after AuthN/AuthZ to prevent malicious actors from exhausting a legitimate user's quota via IP spoofing.

Optimization:

Batching Updates: For extremely high volume, the middleware could buffer increments locally for 100ms and send a single INCRBY to Redis, though this sacrifices some accuracy.