The Question
Design

Distributed Rate Limiter

Design a scalable distributed rate limiting system for a high-traffic API. The system should enforce per-user and per-endpoint request quotas across multiple gateway nodes, handle burst traffic gracefully, and add minimal latency overhead to each request.
Redis Cluster
Lua Script
Sliding Window Counter
API Gateway Middleware
Questions & Insights

Thinking Process

Core Bottleneck: High-frequency read-modify-write operations on shared state (counters) across distributed nodes, which can lead to race conditions and increased latency.
Atomic Operations: Using Redis Lua scripts to ensure that checking the limit and incrementing the counter happens as a single atomic operation, preventing "double-spending" of tokens.
Fail-Open Strategy: Designing the system to allow requests if the rate limiter itself is unreachable or times out, ensuring the rate limiter doesn't become a Single Point of Failure (SPOF) for the entire API.
Logic Progression Questions:
How do we ensure the rate limiting logic adds < 2ms of overhead to the request path?
How do we handle distributed race conditions when multiple app servers update the same user's counter?
Which algorithm provides the best balance between memory efficiency and protection against "burstiness"?
How do we ensure the system scales to millions of users without crashing the centralized counter store?

Bonus Points

Local Memory Tiering: Implementing a two-tier rate limit where a coarse-grained limit is checked in the application's local RAM (L1) before hitting Redis (L2) to reduce network I/O for heavily throttled malicious actors.
Generic Cell Rate Algorithm (GCRA): Mentioning GCRA as the most memory-efficient way to implement a leaky bucket/sliding window hybrid, used by companies like Heroku and Cloudflare.
Global Clock Synchronization: Discussing the impact of clock drift on sliding window algorithms and how using Redis's centralized time (TIME command) mitigates this.
Cluster Sharding: Using consistent hashing for Redis keys (e.g., {user_id}) to ensure all requests for a single user land on the same Redis shard, maintaining atomicity without cross-slot overhead.
Design Breakdown

Functional Requirements

Throttling: Limit requests based on User ID, IP Address, or API Key.
Decision Feedback: Return standard HTTP 429 (Too Many Requests) with Retry-After headers.
Configurability: Support different tiers (e.g., Basic: 100 req/min, Premium: 1000 req/min).

Non-Functional Requirements

Low Latency: The check must be extremely fast (< 2ms) to avoid degrading API performance.
Accuracy: The sliding window must be precise to prevent users from bypassing limits during window boundaries.
High Availability: The rate limiter must stay available even if one Redis node fails.
Scalability: Must handle millions of requests per second across a global user base.

Estimation

Traffic: 1,000,000 Requests Per Second (RPS).
Storage per Key: User ID (8 bytes) + Counter (4 bytes) + Timestamp (8 bytes) + Redis overhead ≈ 100 bytes.
Total Memory: 10,000,000 active users * 100 bytes = 1 GB.
Bandwidth: 1M RPS * 100 bytes per Redis call = 100 MB/s.
Redis Performance: A single Redis node can handle ~100k-150k operations per second. For 1M RPS, we need a Redis Cluster with at least 10 shards (plus replicas for HA).

Blueprint

Concise Summary: An edge-compatible middleware system that intercepts API requests, performs an atomic counter check against a Redis Cluster using a Sliding Window Counter algorithm, and allows or denies the request based on the result.
Major Components:
API Gateway / Middleware: Intercepts requests, extracts identifying keys (e.g., UserID), and communicates with the cache.
Redis Cluster (Cache): Acts as the high-speed, distributed state store for request counters and timestamps.
Simplicity Audit: This design avoids complex message queues or persistent databases in the critical path, utilizing Redis's native speed and atomic primitives to satisfy all requirements with minimal infrastructure.
Architecture Decision Rationale:
Why this architecture is the best?: Centralizing state in Redis ensures consistency across multiple distributed API servers while maintaining sub-millisecond data access.
Functional Requirement Satisfaction: Meets the need for throttling and 429 responses via middleware logic.
Non-functional Requirement Satisfaction: Redis provides the necessary low latency and high RPS throughput; clustering provides scalability and availability.

High Level Architecture

Sub-system Deep Dive

Service

Topology & Scaling: The Rate Limiter resides as a Middleware module within the API Gateway. It scales horizontally with the Gateway instances.
API Spec:
Logic: For every request, the middleware executes GET_LIMIT(key).
Protocol: Internal communication to Redis via RESP (Redis Serialization Protocol).
Fail-Safe: A circuit breaker wraps the Redis call. If Redis latency > 50ms or connection fails, the middleware logs the error and defaults to "Allow".

Cache

Data Model: Uses a Sliding Window Counter. Keys are formatted as limiter:{user_id}:{minute_timestamp}.
Database Logic:
Algorithm: For a 1-minute window, we track the count for the current minute and the previous minute.
Atomic Calculation: count = current_minute_count + (previous_minute_count * (remainder_of_seconds_in_window / 60)).
Implementation: A Lua script performs INCR on the current minute key and sets a 2-minute TTL (Time-To-Live) automatically.
Wrap Up

Advanced Topics

Monitoring:
Key Metrics: rate_limit_exceeded_count (to identify attacks), redis_latency (critical for API performance), and middleware_fail_open_count.
Tools: Prometheus for metrics collection, Grafana for visualization.
Trade-offs:
Consistency vs. Latency: We choose "Eventual Consistency" across Redis replicas for reads but "Strong Consistency" for writes on the master shard to ensure accurate counting. If the master fails, a slight inaccuracy may occur during failover.
Bottlenecks: The Redis CPU can become a bottleneck if Lua scripts are too complex. Optimization involves keeping Lua scripts O(1).
Failure Handling:
Redis Down: API Gateway uses a "Soft Fail" strategy (Allow request).
Data Eviction: Redis must be configured with noeviction policy to prevent it from deleting rate limit counters under memory pressure, which would inadvertently allow more traffic.
Alternatives & Optimization:
Alternative: Fixed Window (simpler but allows 2x burst at window boundaries).
Optimization: Local Cache Bloom Filter. Before calling Redis, check a local Bloom filter to see if the user is definitely not over the limit (requires sophisticated synchronization, usually YAGNI for MVP).