The Question

Distributed Rate Limiter Design

Design a distributed rate limiting system capable of handling 50,000 peak QPS. The system must support various algorithms (e.g., Sliding Window), provide sub-5ms latency, and ensure high availability across multiple application nodes. Discuss how you would handle race conditions, rule management, and system failures without significantly impacting the user experience.

Redis

Lua Scripting

API Gateway

PostgreSQL

Questions & Insights

Clarifying Questions

What is the scale of the system? (Assumption: 10M DAU, average 10,000 QPS, peak 50,000 QPS).

What is the desired latency overhead? (Assumption: The rate limiter must add

<5ms

to the request lifecycle).

Is this a centralized or distributed rate limiter? (Assumption: Distributed, as we have multiple application nodes).

What should happen if the rate limiter service or cache is down? (Assumption: Fail-open for MVP to ensure user experience, though this is configurable).

What are the limiting criteria? (Assumption: User ID, API Key, or IP Address).

Thinking Process

Core Bottleneck: Minimizing the "check-and-set" latency and preventing race conditions in a distributed environment.

Progressive Walkthrough:

How do we store counters across multiple servers? (Redis).

How do we prevent race conditions during concurrent updates? (Redis Lua scripts).

How do we minimize network round-trips? (Middleware/Sidecar pattern).

How do we handle different rules for different tiers? (Configuration Service).

Bonus Points

Race Condition Optimization: Using Lua scripts in Redis to ensure "Get-then-Increment" operations are atomic without needing heavy distributed locks.

Local Cache Tiering: Implementing a two-tier rate limiting strategy (L1 local in-memory for massive spikes, L2 Redis for global accuracy) to reduce Redis pressure.

Operational Safety: Implementing a "Shadow Mode" where the rate limiter logs "would-be-blocked" requests without actually dropping them to tune thresholds.

Clock Drift Resilience: Using Redis server time instead of application server time to ensure sliding window accuracy across a distributed cluster.

Design Breakdown

Functional Requirements

Core Use Cases:

Limit requests based on a defined window (e.g., 100 requests per minute).

Return 429 Too Many Requests when limits are exceeded.

Include headers (X-Ratelimit-Remaining, X-Ratelimit-Limit) in the response.

Scope Control:

In-scope: Distributed counter management, sliding window algorithm, and basic rule configuration.

Out-of-scope: Complex fraud detection, user-facing dashboard for limit management, or permanent IP blacklisting (WAF territory).

Non-Functional Requirements

Scale: Support 50k+ peak QPS with horizontal scalability.

Latency: Sub-millisecond lookup time in the hot path.

Availability: 99.99% availability; if the limiter is unreachable, the system should allow the request through (Fail-open).

Consistency: Eventual consistency is acceptable for some use cases, but high accuracy is preferred (Sliding Window Counter).

Fault Tolerance: Graceful degradation if the Redis cluster is partitioned.

Estimation

Traffic Estimation: 10,000 QPS average. Peak 50,000 QPS.

Storage Estimation:

Each counter (Key: UserID + Window)

\approx

64 bytes.

1M active users per minute = 1,000,000 keys.

1,000,000 \times 64 \text{ bytes} \approx 64 \text{ MB}

of RAM.

Redis can easily handle this in-memory.

Bandwidth Estimation:

Each request check is a small packet (~200 bytes).

50,000 \text{ QPS} \times 200 \text{ bytes} \approx 10 \text{ MB/s}

network overhead.

Blueprint

Concise Summary: A distributed rate limiter implemented as a middleware component within an API Gateway, utilizing a Redis-based Sliding Window Counter for high-accuracy throttling.

Major Components:

API Gateway/Middleware: Intercepts incoming requests and communicates with the rate limiter logic.

Redis Cluster: Stores the counts and timestamps for every user/key using an atomic sliding window approach.

Config Service: Stores the rate-limiting rules (e.g., Tier A = 100/min) and pushes them to the Gateway.

Simplicity Audit: This architecture avoids complex stream processing or persistent databases for the hot path, relying on Redis for its proven low-latency performance.

Architecture Decision Rationale:

Why this?: Redis Lua scripts solve the atomicity problem without the overhead of ZooKeeper or complex locking.

Functional Satisfaction: Meets all requirements for limiting and response headers.

Non-functional Satisfaction: High availability via Redis replication and low latency via in-memory operations.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing: Not strictly necessary for the limiter logic itself, but the Gateway sits at the edge to stop traffic before it hits expensive compute resources.

Security & Perimeter:

API Gateway: Acts as the enforcement point.

Rate Limiting: Applied at the Gateway level to protect downstream services from DDoS and resource exhaustion.

Service

Topology & Scaling:

Stateless Gateway nodes scaled based on CPU and Network I/O.

Middleware logic is embedded in the Gateway for zero-hop execution.

API Schema Design:

Internal Check: isAllowed(key, limit, windowSize)

Response Headers:

X-Ratelimit-Limit: Total quota.

X-Ratelimit-Remaining: Leftover quota in window.

X-Ratelimit-Retry-After: Seconds until the next window.

Resilience & Reliability:

Fail-open: If Redis returns an error or times out (e.g., > 50ms), the middleware logs the error and increments a "limiter_error" metric but allows the request.

Circuit Breaker: If Redis is consistently down, stop trying to connect to prevent Gateway latency bloat.

Storage

Access Pattern: 100% Key-Value lookups. 1:1 Read/Write ratio (every check is an increment).

Database Table Design (Config DB):

rule_id (PK), resource_path, method, limit, window_seconds, client_tier.

Technical Selection:

Redis: Specifically for the counter store.

PostgreSQL/S3: For the Rule Store (static configs).

Distribution Logic:

Redis Cluster with sharding by user_id or api_key to avoid hot partitions.

Cache

Purpose & Justification: Redis is used as the primary state store for the Sliding Window Counter.

Key-Value Schema:

Key: ratelimit:{user_id}:{timestamp_minute}.

Data Structure: Redis Sorted Set (ZSET) or simple Counters with Lua.

Algorithm (Sliding Window Counter):

Use ZREMRANGEBYSCORE to remove old entries.

Use ZCARD to get the current count.

Use ZADD to add the current request.

Set TTL on the key to 2x the window size.

Failure Handling: Use Redis Sentinel or Cluster for high availability.

Infrastructure (Optional)

Observability:

Metrics: Track dropped_requests_count, redis_latency, and cache_hits.

Alerting: Alert if dropped request rate exceeds 20% (potential attack or misconfiguration).

Wrap Up

Advanced Topics

Trade-offs: We choose Consistency/Availability (AP) over Strong Consistency. If one Redis node is slightly out of sync, a user might get 101 requests instead of 100. This is acceptable.

Reliability: Exponential backoff for rule-store fetching; if rules cannot be fetched, fall back to hardcoded "safety" defaults.

Bottleneck Analysis: The main bottleneck is Redis network throughput. This is mitigated by sharding and potentially using a local L1 cache for the most aggressive global limits.

Security: Prevent "Key Exhaustion" attacks by validating the API Key format before checking the rate limiter to prevent flooding Redis with junk keys.