The Question
Design

Scalable Distributed Rate Limiter

Design a high-performance, distributed rate-limiting system capable of handling millions of requests per second across a global API infrastructure. The system must support multiple limiting strategies (e.g., sliding window), ensure sub-millisecond overhead, and maintain high availability even during partial network partitions or cache failures. Address how you would handle race conditions in a distributed environment and discuss the trade-offs between accuracy and system latency.
Redis
Lua
API Gateway
Distributed Cache
Circuit Breaker
Sliding Window
Questions & Insights

Clarifying Questions

What is the expected scale (QPS and active users)? Assumption: 1 million requests per second (RPS) at peak, supporting 100 million monthly active users.
What are the latency requirements for the rate-limiting check? Assumption: The check must add <2ms of overhead to the total request latency.
What level of accuracy is required (strict vs. soft)? Assumption: Soft limiting is acceptable during extreme partitions to preserve availability, but 99% accuracy is expected during normal operation.
Is the rate limiting per-user, per-IP, or per-API key? Assumption: Support hierarchical limiting (e.g., User ID + API Endpoint).
Should the system be centralized or distributed? Assumption: Distributed, as the API serves traffic across multiple regions.

Thinking Process

Core Bottleneck: The primary challenge is the "Read-Modify-Write" race condition when multiple parallel requests update a shared counter in a distributed environment.
Progression:
How do we track counts across multiple application nodes without a single point of failure? (Distributed Cache).
How do we ensure the counting logic is atomic and fast? (Redis + Lua Scripting).
How do we handle different windowing algorithms (Fixed vs. Sliding) for accuracy? (Sliding Window Counter).
How do we ensure the API remains available even if the Rate Limiter service fails? (Fail-open strategy).
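The core bottleneck above is a lost update. Below is a minimal, deterministic sketch of it, assuming a limit of 100 and a worst-case schedule in which every request reads the shared counter before any request writes it back. `SharedCounter` is a hypothetical stand-in for a cache key. The atomic variant merges the check and the increment into one step, which is the guarantee a server-side Redis Lua script provides.

```python
from dataclasses import dataclass

LIMIT = 100


@dataclass
class SharedCounter:
    """Hypothetical stand-in for a counter held in a shared cache."""
    value: int = 0


def naive_interleaved(counter: SharedCounter, n_requests: int) -> int:
    """Read-modify-write with the worst interleaving: all reads, then all writes.

    Returns how many requests were admitted.
    """
    reads = [counter.value for _ in range(n_requests)]  # every GET happens first
    allowed = 0
    for seen in reads:                                   # then every SET happens
        if seen < LIMIT:
            allowed += 1
            counter.value = seen + 1                     # overwrites peers' writes
    return allowed


def atomic_check_and_increment(counter: SharedCounter) -> bool:
    """Check and increment as one indivisible step.

    Nothing can interleave in this single-threaded sketch. In Redis the same
    guarantee comes from running the whole script on the server.
    """
    if counter.value < LIMIT:
        counter.value += 1
        return True
    return False
```

Start from a counter at 99 with 5 concurrent requests. The naive path admits all 5 and leaves the counter at 100. The atomic path admits exactly 1.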

Bonus Points

Local Cache Tiering: Use a small in-memory cache (L1) on the application server to filter extremely high-frequency "hot" keys before hitting Redis (L2) to reduce network overhead.
Lua Script Atomicity: Implement the Sliding Window Counter logic within a single Redis Lua script to eliminate race conditions and reduce round-trip times.
Client-Side Throttling: Return HTTP 429 (Too Many Requests) with a Retry-After header so that well-behaved clients back off, reducing server load.
Hybrid Consistency: Use Redis Cluster with asynchronous replication for performance, accepting that in rare failover scenarios, a user might slightly exceed their limit.
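Local cache tiering can be implemented in several ways. One simple interpretation, sketched here under that assumption, is a local "deny cache". Once Redis (L2) reports that a key is over its limit, the application server (L1) rejects that key in-process until its reset time, so a hot abusive key stops generating Redis traffic. `LocalDenyCache`, `check`, and the `remote_check` callback are hypothetical names. `reset_at` is assumed to use the same epoch-seconds clock as the L1 cache.

```python
import time
from typing import Callable, Dict, Tuple


class LocalDenyCache:
    """Hypothetical in-process L1 tier that remembers keys L2 already rejected."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._blocked_until: Dict[str, float] = {}
        self._clock = clock

    def is_blocked(self, key: str) -> bool:
        until = self._blocked_until.get(key)
        if until is None:
            return False
        if self._clock() >= until:
            del self._blocked_until[key]  # window has reset; forget the entry
            return False
        return True

    def block(self, key: str, until: float) -> None:
        self._blocked_until[key] = until


def check(key: str, l1: LocalDenyCache,
          remote_check: Callable[[str], Tuple[bool, float]]) -> bool:
    """Consult L1 first; only fall through to the shared L2 (e.g., Redis) if needed."""
    if l1.is_blocked(key):
        return False
    allowed, reset_at = remote_check(key)
    if not allowed:
        l1.block(key, reset_at)
    return allowed
```

This sketch only ever caches denials, so it cannot over-admit. The cost is one stale L1 entry per blocked key until its reset time passes.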
Design Breakdown

Functional Requirements

Core Use Cases:
Check if a request from a specific identifier (User/IP) is within the allowed quota.
Update the quota usage for the current time window.
Return the remaining quota and reset time in the response headers.
Scope Control:
In-scope: Distributed counting, windowing algorithms, and low-latency API integration.
Out-of-scope: Complex monetization/billing logic, permanent user blocking (WAF territory), and historical analytics.
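The third use case, returning the remaining quota and reset time in response headers, could look like the sketch below. The `X-RateLimit-*` header names are a widespread convention rather than a standard, so treat them as an assumption. `Retry-After` (delta-seconds) is defined by HTTP itself. `rate_limit_response` is a hypothetical helper. A `None` status means "forward the request upstream and attach these headers".

```python
from typing import Dict, Optional, Tuple


def rate_limit_response(allowed: bool, limit: int, remaining: int,
                        reset_at: int, now: int) -> Tuple[Optional[int], Dict[str, str]]:
    """Build the status (None = pass through) and headers for one decision."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(remaining, 0)),
        "X-RateLimit-Reset": str(reset_at),  # Unix epoch seconds
    }
    if allowed:
        return None, headers
    # Never advertise 0 seconds, so clients don't retry in a tight loop.
    headers["Retry-After"] = str(max(reset_at - now, 1))
    return 429, headers
```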

Non-Functional Requirements

Scale: Must handle 1M+ RPS across a distributed fleet of API gateways.
Latency: Sub-2ms response time for the "Allow/Deny" decision.
Availability & Reliability: 99.99% availability; if the rate limiter is down, the system should default to "Allow" (Fail-open).
Consistency: Eventual consistency across regions; strong atomicity within a single region's cache.
Fault Tolerance: Resilience against Redis node failures using replication and sharding.

Estimation

Traffic Estimation: 1M RPS peak.
Storage Estimation:
Key: user_id:api_endpoint:window_id (approx. 50 bytes).
Value: Integer (8 bytes).
Total: ~60 bytes per entry.
10M active users × ~60 bytes ≈ 600 MB of RAM for a 1-minute window (before Redis's own per-key overhead). This fits comfortably on a single Redis node; sharding is needed for throughput (QPS), not capacity.
Bandwidth Estimation:
1M requests/s × ~100 bytes (key + command) ≈ 100 MB/s of ingress to the cache layer.
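The back-of-envelope figures above can be reproduced directly (decimal MB). `buckets_per_key` is a hypothetical knob added for illustration. It shows how the RAM figure scales if each user's window is stored as 60 one-second buckets, as described in the Cache section, rather than as a single counter.

```python
from typing import Dict


def estimate(active_keys: int, bytes_per_entry: int, rps: int,
             bytes_per_request: int, buckets_per_key: int = 1) -> Dict[str, float]:
    """Rough RAM and cache-ingress sizing; ignores Redis per-key overhead."""
    ram_mb = active_keys * buckets_per_key * bytes_per_entry / 1e6
    ingress_mb_per_s = rps * bytes_per_request / 1e6
    return {"ram_mb": ram_mb, "ingress_mb_per_s": ingress_mb_per_s}


print(estimate(10_000_000, 60, 1_000_000, 100))
# {'ram_mb': 600.0, 'ingress_mb_per_s': 100.0}
```

With `buckets_per_key=60` the RAM figure becomes 36,000 MB. If per-second buckets are used, it is worth re-checking the single-node assumption.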

Blueprint

Concise Summary: A middleware-based distributed rate limiter using Redis as a centralized atomic counter with a sliding window algorithm.
Major Components:
API Gateway / Middleware: Intercepts incoming requests and communicates with the Rate Limiter.
Rate Limiter Service: Orchestrates the logic of checking and updating quotas.
Redis Cluster: Stores the counts using a sliding window counter pattern for high performance and atomicity.
Simplicity Audit: This design avoids complex stream processing or heavy databases, using only a high-performance cache and simple scripts to meet the sub-2ms latency target.
Architecture Decision Rationale:
Why this architecture? Redis provides the atomic operations (INCR, EXPIRE) and sub-millisecond performance needed for per-request checks.
Functional Satisfaction: Meets the need for real-time request tracking and quota enforcement.
Non-functional Satisfaction: Scalable via Redis sharding; highly available via Redis Sentinel/Cluster and fail-open middleware logic.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing: Global requests are routed to the nearest regional PoP via Latency-based DNS.
Security & Perimeter:
API Gateway: Performs SSL termination and basic AuthN before passing to the rate limiter.
Rate Limiting: Integrated as a plugin/middleware within the Gateway (e.g., Kong, Envoy).

Service

Topology & Scaling: Stateless middleware nodes deployed in an Auto-Scaling Group (ASG).
API Schema Design:
Internal Check API: POST /v1/limiter/check
Payload: { "key": "user_123", "limit": 100, "window": 60 }
Response: { "allowed": true, "remaining": 99, "reset_time": 1672531200 }
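Below is a minimal sketch of the internal check endpoint's request/response contract, assuming JSON bodies shaped like the payload and response above. `CheckHandler` is hypothetical and keeps counts in process memory. It also uses a plain fixed window so the contract stays easy to follow, whereas the design itself calls for a sliding window counter in Redis. Entries are never evicted here; Redis would rely on TTLs.

```python
import json
import time
from typing import Callable, Dict


class CheckHandler:
    """Hypothetical handler for POST /v1/limiter/check (fixed window, in memory)."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._counts: Dict[str, int] = {}
        self._clock = clock

    def handle(self, body: str) -> str:
        req = json.loads(body)
        key, limit, window = req["key"], int(req["limit"]), int(req["window"])
        window_id = int(self._clock()) // window
        bucket = f"{key}:{window_id}"
        used = self._counts.get(bucket, 0)
        allowed = used < limit
        if allowed:
            used += 1
            self._counts[bucket] = used
        return json.dumps({
            "allowed": allowed,
            "remaining": max(limit - used, 0),
            "reset_time": (window_id + 1) * window,  # start of the next window
        })
```

With the clock at 1672531140, the example payload from the schema yields `{"allowed": true, "remaining": 99, "reset_time": 1672531200}`.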
Resilience & Reliability:
Fail-Open: If the Rate Limiter Service returns a 5xx or does not respond within its timeout (e.g., 20 ms), the API Gateway lets the request through so that internal infrastructure issues don't break the user experience.
Circuit Breaker: After repeated failures, the gateway stops calling the Rate Limiter Service for a cool-down period (failing open in the meantime), so a struggling limiter is not overloaded further.
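Fail-open and the circuit breaker compose naturally, as in the sketch below. The failure threshold and cool-down values are hypothetical. The `check` callable is assumed to raise (for example `TimeoutError`) when the limiter errors or exceeds its timeout. Any error, and any period while the breaker is open, admits the request.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Minimal consecutive-failure breaker with a time-based half-open state."""

    def __init__(self, failure_threshold: int = 5, cooldown_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow_call(self) -> bool:
        if self._opened_at is None:
            return True
        # After the cool-down, let trial calls through (half-open).
        return self._clock() - self._opened_at >= self.cooldown_s

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()  # (re)open the breaker


def should_admit(check: Callable[[], bool], breaker: CircuitBreaker) -> bool:
    """Fail-open wrapper around a rate-limiter call."""
    if not breaker.allow_call():
        return True                 # breaker open: skip the limiter entirely
    try:
        allowed = check()
    except Exception:
        breaker.record_failure()
        return True                 # limiter error or timeout: fail open
    breaker.record_success()
    return allowed
```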

Storage

Access Pattern: Extremely high volume of both writes (incrementing counters) and reads (checking limits); every request performs both.
Database Table Design: (N/A for Redis, but Key Design is crucial)
Key Pattern: ratelimit:{user_id}:{api_path}:{timestamp_minute}
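TTL: Set to the window size plus ~10 seconds so stale counters clean themselves up. If the sliding window is computed from the previous and current fixed-window keys rather than from per-second buckets, the previous key must outlive the current window, so its TTL should be at least twice the window size.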
Technical Selection: Redis. Chosen for its in-memory speed and support for Lua scripting which ensures the "Check-and-Set" operation is atomic.
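Distribution Logic: Hash-based sharding on {user_id}, so all requests for a single user land on the same Redis shard. In Redis Cluster the braces act as a hash tag: only the text inside them is hashed. This also lets one Lua script touch all of a user's keys, since multi-key scripts require every key to live in the same slot.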
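Redis Cluster maps each key to one of 16384 slots using CRC16 (XMODEM variant) modulo 16384. If the key contains a non-empty `{...}`, only the text inside the first such pair is hashed. The sketch below reimplements that mapping to show that every key built from the pattern above lands in the same slot for a given user. `rate_limit_key` is a hypothetical helper, and the endpoint paths are illustrative.

```python
REDIS_CLUSTER_SLOTS = 16384


def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: str) -> int:
    """Cluster slot for a key, honoring hash tags."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # an empty "{}" is not a hash tag
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % REDIS_CLUSTER_SLOTS


def rate_limit_key(user_id: str, api_path: str, window_id: int) -> str:
    """Hypothetical key builder; the braces make user_id the hash tag."""
    return f"ratelimit:{{{user_id}}}:{api_path}:{window_id}"


k1 = rate_limit_key("user_123", "/v1/orders", 27875519)
k2 = rate_limit_key("user_123", "/v1/search", 27875520)
print(k1)                                # ratelimit:{user_123}:/v1/orders:27875519
print(hash_slot(k1) == hash_slot(k2))    # True
```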

Cache

Purpose & Justification: Redis is the primary state store. It solves the "Distributed Counter" problem.
Key-Value Schema:
Algorithm: Sliding Window Counter.
Logic: Split the window into buckets (e.g., a 1-minute window split into 1-second buckets) and sum the buckets that fall within the trailing window to get the current count.
Technical Selection: Redis with Lua scripting.
Failure Handling: Use a Redis Cluster with at least one replica per shard. If a shard is down, the middleware fail-open logic triggers.
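The Lua script itself isn't shown here. Instead, the Python sketch below mirrors the bucketed sliding window counter logic, assuming per-key buckets keyed by integer second. The lock stands in for the atomicity Redis gives a single script: the sum and the increment cannot be separated by another request. Eviction of old buckets happens inline here; in Redis it would be handled by TTLs. The class name and parameters are illustrative.

```python
import threading
from collections import defaultdict
from typing import Dict


class SlidingWindowCounter:
    """Bucketed sliding window counter (e.g., 60 s window of 1 s buckets)."""

    def __init__(self, limit: int, window_s: int = 60, bucket_s: int = 1) -> None:
        if window_s % bucket_s:
            raise ValueError("window must be a whole number of buckets")
        self.limit = limit
        self.window_s = window_s
        self.bucket_s = bucket_s
        self._buckets: Dict[str, Dict[int, int]] = defaultdict(dict)
        self._lock = threading.Lock()

    def allow(self, key: str, now: float) -> bool:
        current = int(now // self.bucket_s)
        oldest = current - self.window_s // self.bucket_s + 1
        with self._lock:  # stand-in for one atomic Lua script
            buckets = self._buckets[key]
            for b in [b for b in buckets if b < oldest]:
                del buckets[b]  # bucket slid out of the trailing window
            if sum(buckets.values()) >= self.limit:
                return False
            buckets[current] = buckets.get(current, 0) + 1
            return True
```

With a limit of 3, three requests at t=0 pass and a fourth is denied. At t=59 the key is still denied, because bucket 0 is inside the window. At t=60 that bucket slides out and requests are admitted again.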
Wrap Up

Advanced Topics

Trade-offs:
Accuracy vs. Performance: We use the Sliding Window Counter algorithm. It is more accurate than Fixed Window, which can admit up to twice the limit when a burst straddles a window boundary, at the cost of slightly more memory per key.
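The fixed-window edge burst is easy to reproduce. The sketch below assumes a limit of 100 per 60 s and sends 100 requests at t=59 s and another 100 at t=60 s. A fixed-window counter (one INCR key per window) admits all 200. An exact rolling log, which the sliding window counter approximates at bucket granularity, admits only 100. Both function names are illustrative.

```python
from collections import deque
from typing import Deque, Dict, List


def fixed_window_allowed(timestamps: List[float], limit: int, window: int) -> int:
    """Requests admitted by a fixed-window counter (one counter per window)."""
    counts: Dict[int, int] = {}
    allowed = 0
    for t in timestamps:
        w = int(t // window)
        if counts.get(w, 0) < limit:
            counts[w] = counts.get(w, 0) + 1
            allowed += 1
    return allowed


def sliding_log_allowed(timestamps: List[float], limit: int, window: int) -> int:
    """Requests admitted by an exact rolling window (accuracy reference)."""
    log: Deque[float] = deque()
    allowed = 0
    for t in timestamps:
        while log and log[0] <= t - window:
            log.popleft()  # drop requests older than the trailing window
        if len(log) < limit:
            log.append(t)
            allowed += 1
    return allowed


burst = [59.0] * 100 + [60.0] * 100
print(fixed_window_allowed(burst, 100, 60))  # 200
print(sliding_log_allowed(burst, 100, 60))   # 100
```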
Reliability:
Hard Limit vs. Soft Limit: During peak infrastructure failure, we switch to soft-limiting (allowing all) to prioritize availability over strict limit enforcement.
Bottleneck Analysis:
Redis Hotkeys: If a single key (e.g., a popular public API key, or a botnet reusing one credential) attracts a large share of traffic, the Redis shard that owns it can become a bottleneck.
Optimization: Implement "Local Batching." The middleware increments a local counter and only syncs to Redis every 100ms or every 50 requests. This significantly reduces QPS to Redis at the cost of slight inaccuracy.
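Local batching might look like the following sketch, assuming the thresholds above (flush every 50 requests or 100 ms). `flush_fn(key, delta)` is a hypothetical hook standing in for a Redis `INCRBY` that returns the new global total. Each node enforces the limit against "last known global total + its own unsent increments". Other nodes' unsent increments are invisible, so the worst-case overshoot grows with (number of nodes × batch size). In this sketch the time-based flush is only checked on the request path; a real node might also flush from a background timer.

```python
import time
from typing import Callable, Dict


class BatchingCounter:
    """Accumulate increments locally; sync to the shared store in batches."""

    def __init__(self, limit: int, flush_fn: Callable[[str, int], int],
                 max_pending: int = 50, max_age_s: float = 0.1,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.max_pending = max_pending
        self.max_age_s = max_age_s
        self._flush_fn = flush_fn
        self._clock = clock
        self._pending: Dict[str, int] = {}      # increments not yet sent
        self._last_global: Dict[str, int] = {}  # global total at last flush
        self._last_flush: Dict[str, float] = {}

    def allow(self, key: str) -> bool:
        now = self._clock()
        pending = self._pending.get(key, 0)
        if self._last_global.get(key, 0) + pending >= self.limit:
            return False
        pending += 1
        last = self._last_flush.setdefault(key, now)
        if pending >= self.max_pending or now - last >= self.max_age_s:
            self._last_global[key] = self._flush_fn(key, pending)  # e.g. INCRBY
            pending = 0
            self._last_flush[key] = now
        self._pending[key] = pending
        return True
```

With a fake `INCRBY`, 120 requests at the same instant produce two flushes of 50, and a request 200 ms later flushes the remaining 21.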
Security:
IP Spoofing: Rate limiting by IP is susceptible to spoofing. We prioritize Auth Tokens (JWT) or API Keys for identification.
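The identification priority can be expressed as a small resolution step in front of the limiter. In the sketch below, the exact order (API key, then authenticated user, then IP) is an assumption, and `limiter_identity` is a hypothetical helper. `user_id` is assumed to come from a JWT the gateway has already verified. The IP is used only as a fallback for unauthenticated traffic.

```python
from typing import Optional


def limiter_identity(api_key: Optional[str], user_id: Optional[str],
                     client_ip: str) -> str:
    """Pick the rate-limit identity, strongest credential first."""
    if api_key:
        return f"key:{api_key}"
    if user_id:
        return f"user:{user_id}"
    return f"ip:{client_ip}"  # weakest signal: shared by NATs, easier to abuse
```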