The Question

Scalable Distributed Rate Limiter

Design a high-performance distributed rate limiting system capable of handling 10 million requests per second with sub-2ms latency. The system must support various granularities (User, IP, API Key) and dynamic rule updates. Focus on high availability, the trade-offs between consistency and performance, and ensuring the system does not become a single point of failure for the entire architecture.

Redis

Lua

gRPC

PostgreSQL

Token Bucket

API Gateway

Sidecar Pattern

Questions & Insights

Clarifying Questions

What is the scale of the system? (Assumed: 10M+ Requests Per Second (QPS) across a global user base).

What is the required latency overhead for a rate-limit check? (Assumed: Sub-2ms overhead to avoid impacting user experience).

Should the system fail "open" or "closed"? (Assumed: Fail "open" to prioritize availability—if the rate limiter is down, traffic should still flow).

What is the granularity of the limits? (Assumed: Support for multiple tiers—e.g., per-User ID, per-IP, and per-API endpoint).

Are rules static or dynamic? (Assumed: Rules are updated via an Admin API and should propagate within seconds).

Thinking Process

Core Algorithm Choice: How do we balance accuracy and memory? (Token Bucket for flexibility vs. Fixed Window for simplicity).

State Management: Where do we store counters to ensure distributed consistency? (Centralized high-performance K-V store like Redis).

Concurrency Control: How do we avoid the "read-modify-write" race condition? (Atomic operations using Redis Lua scripting).

Resilience Strategy: How do we ensure the rate limiter doesn't become a Single Point of Failure (SPOF)? (Local in-memory shadowing and fail-open mechanisms).

Bonus Points

Local Batching / Thick Client: Discuss reducing network round-trips to Redis by batching "token acquisitions" locally at the application or gateway level.

Hierarchical Rate Limiting: Implementing a global limit (Redis-based) paired with a local limit (In-memory) to handle massive bursts or "Thundering Herd" scenarios.

Clock Drift Mitigation: Handling inaccuracies in distributed systems when using "Sliding Window Log" or timestamp-dependent algorithms.

Shadow Mode: Deploying new rate limits in "dry-run" mode to analyze impact before enforcement.

Design Breakdown

Functional Requirements

Core Use Cases:

isAllow(key, limit, window): Determine if a request should be throttled.

Return headers (X-Ratelimit-Limit, X-Ratelimit-Remaining, X-Ratelimit-Retry-After).

Dynamic rule configuration (Add/Update/Delete limits).

Scope Control:

In-scope: Distributed counter management, low-latency decision engine, and rule storage.

Out-of-scope: Client-side SDK enforcement (logic remains server-side/gateway), sophisticated fraud detection (WAF territory).

Non-Functional Requirements

Scale: Must handle 10M+ peak QPS.

Latency: P99 response time < 2ms.

Availability: 99.99% availability; system must fail-open.

Consistency: Eventual consistency is acceptable for global limits, but strict atomicity is required for local counters to prevent over-limit leakage.

Security: Prevent malicious actors from bypassing limits via IP spoofing or header manipulation at the edge.

Estimation

Traffic: 10M QPS.

Storage:

Each key (User ID + Rule ID): ~100 bytes.

100M active users/keys = 10GB RAM (well within a single large Redis cluster capacity).

Bandwidth:

Request: ~200 bytes per check (Key + metadata).

10M * 200B = 2 GB/s total network throughput to the Rate Limiter layer.

CPU: High-performance Lua execution in Redis is single-threaded per shard; will require sharding across 10-20 Redis nodes.

Blueprint

Concise Summary: A sidecar or gateway-integrated service that checks a centralized Redis cluster using the Token Bucket algorithm implemented via Lua scripts.

Major Components:

API Gateway / Sidecar: Intercepts incoming requests and queries the Rate Limiter service.

Rate Limiter Service: Stateless microservice that encapsulates the logic and interfaces with storage.

Redis Cluster: Stores real-time counters and executes atomic logic via Lua scripts.

Rule Repository: Relational database for persistent storage of rate-limiting configurations.

Simplicity Audit: This design avoids complex synchronization protocols (like Paxos/Raft) by delegating atomicity to Redis, which is standard for MVPs.

Architecture Decision Rationale:

Why this architecture?: Redis provides the necessary sub-millisecond latency and built-in atomic primitives (Lua) required for high-throughput counting.

Functional Satisfaction: Meets the need for isAllow checks and dynamic rules.

Non-functional Satisfaction: Scalable through Redis sharding; highly available through Redis Sentinel/Cluster and "fail-open" logic in the service layer.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing: DNS routes traffic to the nearest regional Load Balancer.

Security & Perimeter:

API Gateway: Performs SSL termination and basic AuthN before calling the Rate Limiter.

Rate Limiting: First line of defense against DDoS.

Fail-Open: If the Rate Limiter service returns an error or times out (>5ms), the Gateway logs the error and allows the request to pass.

Service

Topology & Scaling:

Stateless instances deployed in a Multi-AZ configuration.

Scaling based on CPU and Request Count (QPS).

API Schema Design:

POST /v1/check

Protocol: gRPC (for performance/low-overhead).

Request: { "key": "user_123", "action": "create_post" }

Response: { "allowed": true, "remaining": 49, "reset_time": 1625000000 }

Resilience & Reliability:

Circuit Breaker: If Redis latency spikes, the service stops querying and fails open immediately.

Retries: No retries for rate-limit checks (to keep latency low).

Storage

Access Pattern: Extremely high write/read frequency (1:1 ratio).

Database Table Design (Rule DB):

rule_id (PK), resource_path, method, limit_count, window_seconds, is_active.

Technical Selection:

Rule DB: PostgreSQL (Relational) because rules change infrequently and require ACID for configuration management.

Counter Store: Redis (In-memory).

Distribution Logic:

Redis Sharding: Key-based sharding (CRC16 of UserID) to distribute load across the cluster.

Cache

Purpose & Justification: Redis acts as the primary state store (not just a cache).

Key-Value Schema:

Key: ratelimit:{user_id}:{rule_id}

Value: Current token count / Timestamp.

Lua Script (Token Bucket):

  local tokens = redis.call('get', KEYS[1])
  if tokens == nil then
    tokens = ARGV[1] -- Initial limit
  end
  if tonumber(tokens) > 0 then
    redis.call('decr', KEYS[1])
    return 1
  else
    return 0
  end

Failure Handling: Use Redis Replication (Leader/Follower) for high availability.

Infrastructure (Optional)

Observability:

Metrics: Track rate_limit_exceeded count per rule and redis_latency.

Alerting: Alert if P99 latency > 5ms or if Redis memory usage > 80%.

Wrap Up

Advanced Topics

Trade-offs: We choose Availability over Consistency (AP over CP in CAP terms). If Redis is unreachable, we allow traffic. This prevents a rate-limiter outage from becoming a site-wide outage.

Reliability: Use a "Local Cache" in the Rate Limiter service to store rules from the Rule DB, refreshed every 30 seconds via a background thread to minimize DB load.

Bottleneck Analysis: Redis is the bottleneck. Mitigation: Use a "Thick Client" approach where the Gateway caches the fact that a user is already blocked for X seconds, avoiding redundant calls for blocked users.

Optimization: For global "heavy hitters" (e.g., an IP hammering the system), the Gateway can promote that IP to a "Deny List" at the L4/L7 Load Balancer level temporarily to bypass the Service Layer entirely.