DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.
DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.
The Question
Design

Scalable Distributed Rate Limiter

Design a high-performance, distributed rate limiting system capable of handling over 1 million requests per second. The system must support multiple identification strategies (User ID, IP address, API Key) and different quota tiers. Key constraints include a sub-2ms latency overhead and high availability, with a focus on how to handle race conditions in a distributed environment and how the system should behave during partial failures.
Redis
Lua
gRPC
PostgreSQL
Consistent Hashing
API Gateway
Kubernetes
Questions & Insights

Clarifying Questions

Scale and Throughput: What is the expected peak traffic? (Assumed: 1,000,000 Requests Per Second (RPS) with 100 million active users).
Latency Budget: What is the maximum acceptable latency overhead added by the rate limiter? (Assumed: < 2ms p99 overhead).
Granularity: What is the identifier for rate limiting? (Assumed: Per API-Key and Per IP address).
Accuracy: Is a "hard limit" required (strictly accurate) or is "soft limit" (eventual consistency) acceptable for better performance? (Assumed: Highly accurate for billing tiers, but performance is prioritized).
Deployment: Should this be a centralized service or a distributed sidecar? (Assumed: Centralized Rate Limiter Service behind an API Gateway for consistency).

Thinking Process

Core Bottleneck: The primary challenge is performing a "Read-Modify-Write" operation at a million-plus RPS without creating a massive database bottleneck or race conditions.
Progressive Logic:
How do we track counts across multiple servers? Use a centralized Distributed Cache (Redis).
How do we handle race conditions between concurrent requests? Use Lua Scripts within Redis for atomic operations.
How do we minimize latency for high-traffic users? Implement Local In-memory Cache for rule-set lookups and partial count batching.
How do we scale? Implement Consistent Hashing on Redis clusters to prevent hot partitions.

Bonus Points

Token Bucket vs. Sliding Window: Using a Sliding Window Counter algorithm to prevent "bursting" at window boundaries, which is a common flaw in Fixed Window designs.
Dynamic Tiering: Implementing a weighted rate limiter where different API endpoints or user tiers consume "tokens" at different rates.
Global vs. Regional Synchronization: Discussing the trade-off between a global Redis cluster (strict accuracy) vs. regional Redis clusters with asynchronous synchronization (lower latency, eventual consistency).
Failure Mode - Fail Open: In a high-scale system, if the rate limiter service dies, the system should Fail Open (allow traffic) to maintain availability, while firing high-priority alerts.
Design Breakdown

Functional Requirements

Core Use Cases:
Determine if a request should be allowed or throttled based on defined limits.
Return 429 Too Many Requests with Retry-After headers.
Support different limits for different tiers (e.g., Free vs. Pro).
Scope Control:
In-Scope: Distributed counting logic, rule engine, and basic analytics.
Out-of-Scope: User authentication (handled by Auth Service), billing processing, and deep packet inspection.

Non-Functional Requirements

Scale: Must handle 1M+ RPS and support 100M+ unique keys.
Latency: Extremely low overhead (< 2ms) to avoid degrading user experience.
Availability & Reliability: 99.99% availability; the rate limiter must not become a Single Point of Failure (SPOF).
Consistency: High consistency for billing-related limits; eventual consistency for public API IP-based limits.
Fault Tolerance: Handle Redis cluster partitions or node failures gracefully.

Estimation

Traffic: 1M RPS.
Storage:
Each key (UserID/IP + Window) \approx 64 bytes.
100M users * 64 bytes \approx 6.4 GB.
Modern Redis clusters can easily hold this in memory.
Bandwidth:
Request: UserID (16 bytes) + Metadata \approx 100 bytes.
1M RPS * 100 bytes = 100 MB/s (Inbound to Rate Limiter).

Blueprint

Concise Summary: A distributed rate limiter using an API Gateway as the entry point, a stateless Rate Limiter Service for logic, and a Redis cluster for high-speed atomic counter management.
Major Components:
API Gateway: Acts as the enforcement point, intercepting requests and querying the Rate Limiter Service.
Rate Limiter Service: A stateless service that executes the Sliding Window Counter logic.
Redis Cluster: Stores counters and uses Lua scripts to ensure atomicity.
Config Store: A persistent DB to store rate limit rules per tier/user.
Simplicity Audit: This design avoids complex distributed locks by offloading atomicity to Redis Lua scripts, keeping the service layer stateless and easy to scale horizontally.
Architecture Decision Rationale:
Redis is the industry standard for this due to its sub-millisecond latency and built-in atomic primitives.
Sliding Window Counter provides the best balance between memory efficiency and accuracy (prevents the "boundary burst" problem of Fixed Window).

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing: DNS redirects traffic to the nearest regional API Gateway.
Security & Perimeter:
The API Gateway performs SSL termination and initial IP-based blacklisting before hitting the rate limiter logic.
Rate Limiting: The Gateway makes a gRPC call to the Rate Limiter Service for a "Go/No-Go" decision.

Service

Topology & Scaling: Stateless Go or Java services deployed in a Kubernetes cluster across multiple Availability Zones (AZ). Scales based on CPU and Request Count.
API Schema Design:
POST /v1/is-allowed
Protocol: gRPC (for low latency and binary serialization).
Request: { "key": "user_123", "tier": "pro", "cost": 1 }
Response: { "allowed": true, "remaining": 99, "reset_time": 1625091200 }
Resilience & Reliability:
Fail-Open Policy: If the Rate Limiter Service returns a 5xx or times out, the API Gateway allows the request by default to ensure system availability.
Timeout: Strict 50ms timeout for the gRPC call.

Storage

Access Pattern: Heavy Write (Incr) and Heavy Read (Check count).
Database Table Design (Config DB):
rule_id: UUID (PK)
target_type: String (USER_ID, IP, API_KEY)
limit_value: Integer
window_size: Integer (seconds)
Technical Selection: PostgreSQL for the Config DB (relational, ACID for rules), but Redis for the operational counters.
Distribution Logic: Partition Redis using Consistent Hashing on the key (e.g., hash(user_id) % num_shards) to ensure even distribution of traffic.

Cache

Purpose & Justification:
Rule Cache: A local in-memory cache (e.g., Caffeine or LRU) inside the Rate Limiter Service to store rules from the Config DB, preventing a DB hit for every request.
Redis Counter: Acts as the centralized truth for current request counts.
Key-Value Schema:
Key: ratelimit:{user_id}:{window_timestamp}
Structure: Redis Hash or Sorted Set for Sliding Window.
Failure Handling: Use Redis Sentinel or Cluster Mode for high availability.

Infrastructure (Optional)

Observability:
Metrics: Track limit_exceeded_count, request_latency, and cache_hit_ratio.
Alerting: Alert if the "Fail-Open" mechanism is triggered or if Redis latency spikes.
Wrap Up

Advanced Topics

Trade-offs: We choose Consistency over Availability for the counter (using Redis), but we choose Availability over Accuracy for the service itself (Fail-Open strategy). This is a classic CAP theorem trade-off (CP for storage, AP for the service wrapper).
Sliding Window Optimization: To implement a true sliding window without the memory overhead of Sorted Sets, we use a Sliding Window Counter:
Current Count = count_in_current_window + count_in_previous_window * (1 - overlap_percentage). This provides a very close approximation with minimal storage.
Race Conditions: Solved by using the following Lua script in Redis:
    local current = redis.call("INCR", KEYS[1])
    if current == 1 then
        redis.call("EXPIRE", KEYS[1], ARGV[1])
    end
    return current
Bottleneck Analysis: The Redis cluster could become a bottleneck. We mitigate this by sharding and, for extremely high volume, implementing Local Batching: The service increments a local counter and syncs to Redis every 50-100ms (trading off strict accuracy for extreme scale).