The Question
Design

Scalable Distributed Rate Limiter

Design a high-performance distributed rate limiting system capable of handling 10 million requests per second with sub-2ms latency. The system must support various granularities (User, IP, API Key) and dynamic rule updates. Focus on high availability, the trade-offs between consistency and performance, and ensuring the system does not become a single point of failure for the entire architecture.
Redis
Lua
gRPC
PostgreSQL
Token Bucket
API Gateway
Sidecar Pattern
Questions & Insights

Clarifying Questions

What is the scale of the system? (Assumed: 10M+ Requests Per Second (QPS) across a global user base).
What is the required latency overhead for a rate-limit check? (Assumed: Sub-2ms overhead to avoid impacting user experience).
Should the system fail "open" or "closed"? (Assumed: Fail "open" to prioritize availability—if the rate limiter is down, traffic should still flow).
What is the granularity of the limits? (Assumed: Support for multiple tiers—e.g., per-User ID, per-IP, and per-API endpoint).
Are rules static or dynamic? (Assumed: Rules are updated via an Admin API and should propagate within seconds).

Thinking Process

Core Algorithm Choice: How do we balance accuracy and memory? (Token Bucket for flexibility vs. Fixed Window for simplicity).
State Management: Where do we store counters to ensure distributed consistency? (Centralized high-performance K-V store like Redis).
Concurrency Control: How do we avoid the "read-modify-write" race condition? (Atomic operations using Redis Lua scripting).
Resilience Strategy: How do we ensure the rate limiter doesn't become a Single Point of Failure (SPOF)? (Local in-memory shadowing and fail-open mechanisms).
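The first trade-off is easier to see side by side. Below is a minimal single-process sketch of both algorithms. The class names, parameter names, and the injectable `now` clock are hypothetical. In the distributed design, the same state lives in Redis rather than in process memory.

```python
import time
from typing import Callable, Optional


class FixedWindowCounter:
    """Allows up to `limit` requests per aligned window of `window_s` seconds."""

    def __init__(self, limit: int, window_s: float, now: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.now = now
        self.window_start: Optional[float] = None  # start of the current window
        self.count = 0

    def allow(self) -> bool:
        t = self.now()
        start = t - (t % self.window_s)  # align the timestamp to its window boundary
        if start != self.window_start:
            self.window_start, self.count = start, 0  # new window: reset the counter
        if self.count < self.limit:
            self.count += 1
            return True
        return False


class TokenBucket:
    """Holds up to `capacity` tokens, refilled continuously at `rate` tokens/second."""

    def __init__(self, capacity: float, rate: float, now: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.now = now
        self.tokens = capacity  # start full so an idle client may burst
        self.last = now()

    def allow(self) -> bool:
        t = self.now()
        # Refill in proportion to elapsed time, never above capacity
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Consider a fake clock and a fixed window with a limit of 2 per 10 s. It admits two requests at t=9.9 s and two more at t=10.0 s, four in a tenth of a second, because the counter resets at the boundary. A token bucket with capacity 2 admits the two at t=9.9 s but refuses at t=10.0 s, since only 0.1 tokens have been refilled.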

Bonus Points

Local Batching / Thick Client: Discuss reducing network round-trips to Redis by batching "token acquisitions" locally at the application or gateway level.
Hierarchical Rate Limiting: Implementing a global limit (Redis-based) paired with a local limit (In-memory) to handle massive bursts or "Thundering Herd" scenarios.
Clock Drift Mitigation: Handling inaccuracies in distributed systems when using "Sliding Window Log" or timestamp-dependent algorithms.
Shadow Mode: Deploying new rate limits in "dry-run" mode to analyze impact before enforcement.
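One way to implement local batching is a token lease. Each gateway instance asks the global store for a batch of tokens in a single round-trip and then spends them locally. In the sketch below, `acquire_batch` is a hypothetical stand-in for the Redis call, for example a Lua script that debits up to `n` tokens and returns how many it granted. Batch size trades Redis load against accuracy, because tokens leased to one instance but not yet spent are unavailable to the others.

```python
import threading
from typing import Callable, Dict


class LeasedLimiter:
    """Serves checks from a local token lease, refilling it in batches.

    `acquire_batch(key, n)` is a hypothetical global-store call that debits up
    to `n` tokens for `key` and returns how many were actually granted (0..n).
    """

    def __init__(self, acquire_batch: Callable[[str, int], int], batch_size: int = 10):
        self.acquire_batch = acquire_batch
        self.batch_size = batch_size
        self.local: Dict[str, int] = {}  # leased but unspent tokens, per key
        # One lock keeps the sketch simple; a real gateway would lock per key
        self.lock = threading.Lock()
        self.remote_calls = 0  # round-trips to the global store (for illustration)

    def allow(self, key: str) -> bool:
        with self.lock:
            if self.local.get(key, 0) == 0:
                # Lease exhausted: a single round-trip fetches a whole batch
                self.remote_calls += 1
                self.local[key] = self.acquire_batch(key, self.batch_size)
            if self.local[key] > 0:
                self.local[key] -= 1
                return True
            return False
```

Take a global budget of 25 tokens and a batch size of 10. Thirty requests are answered with 8 round-trips instead of 30: three to lease 10, 10, and 5 tokens, then one per denied request. Those repeated calls for denied keys are what the blocked-user cache described under Advanced Topics removes.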
Design Breakdown

Functional Requirements

Core Use Cases:
isAllow(key, limit, window): Determine if a request should be throttled.
Return headers (X-Ratelimit-Limit, X-Ratelimit-Remaining, X-Ratelimit-Retry-After).
Dynamic rule configuration (Add/Update/Delete limits).
Scope Control:
In-scope: Distributed counter management, low-latency decision engine, and rule storage.
Out-of-scope: Client-side SDK enforcement (logic remains server-side/gateway), sophisticated fraud detection (WAF territory).
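Here is a minimal sketch of how an isAllow decision could be mapped to the response headers listed above. The `Decision` type and the `to_http` helper are hypothetical. Their fields mirror the `allowed`, `remaining`, and `reset_time` fields of the check API response. The status 429 is the standard HTTP "Too Many Requests" code.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Decision:
    allowed: bool
    limit: int       # configured limit of the matched rule
    remaining: int   # requests/tokens left for this key
    reset_time: int  # Unix timestamp (seconds) when capacity is restored


def to_http(decision: Decision, now: int) -> Tuple[Optional[int], Dict[str, str]]:
    """Return (status to send, or None to forward upstream; headers to attach)."""
    headers = {
        "X-Ratelimit-Limit": str(decision.limit),
        "X-Ratelimit-Remaining": str(max(0, decision.remaining)),
    }
    if decision.allowed:
        return None, headers  # forward the request, annotated with quota headers
    # Throttled: tell the client how many seconds to wait before retrying
    headers["X-Ratelimit-Retry-After"] = str(max(0, decision.reset_time - now))
    return 429, headers
```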

Non-Functional Requirements

Scale: Must handle 10M+ peak QPS.
Latency: P99 response time < 2ms.
Availability: 99.99% availability; system must fail-open.
Consistency: Eventual consistency is acceptable for global limits, but each individual counter update must be atomic to prevent over-limit leakage (e.g., two concurrent requests both consuming the last remaining token).
Security: Prevent malicious actors from bypassing limits via IP spoofing or header manipulation at the edge.
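A common defense against header manipulation is to derive the client IP from X-Forwarded-For by walking the header right to left and stopping at the first address that does not belong to your own proxies. Entries further left are client-supplied and can be forged. The function name and the trusted-proxy CIDR list below are assumptions. The sketch also assumes that every trusted proxy appends the address it received the connection from.

```python
import ipaddress
from typing import List, Optional


def client_ip(remote_addr: str, xff: Optional[str], trusted_proxies: List[str]) -> str:
    """Return the IP to rate-limit on: the right-most hop not owned by a trusted proxy."""
    nets = [ipaddress.ip_network(p) for p in trusted_proxies]

    def trusted(ip: str) -> bool:
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            return False  # malformed entries are never treated as our proxies
        return any(addr in n for n in nets)

    # Hops in order: client-supplied entries first, our direct peer last
    hops = [h.strip() for h in xff.split(",")] if xff else []
    hops.append(remote_addr)
    for ip in reversed(hops):
        if not trusted(ip):
            return ip
    return hops[0]  # every hop is one of ours: fall back to the left-most entry
```

A client that sends `X-Forwarded-For: 1.2.3.4` through a trusted proxy is still keyed by its real address, because the proxy appends that address to the right of the forged value.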

Estimation

Traffic: 10M QPS.
Storage:
Each key (User ID + Rule ID): ~100 bytes.
100M active users/keys = 10GB RAM (well within a single large Redis cluster capacity).
Bandwidth:
Request: ~200 bytes per check (Key + metadata).
10M * 200B = 2 GB/s total network throughput to the Rate Limiter layer.
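CPU: Redis executes Lua scripts on a single thread per shard, so script throughput scales only by adding shards. Spreading 10M QPS over 10-20 Redis nodes implies 0.5-1M script executions per second per shard, which is likely optimistic for a single-threaded shard. Size the cluster by benchmarking the actual script, and use local batching to cut the number of Redis calls.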
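The same back-of-envelope arithmetic as a small script. It uses decimal units (1 GB = 10^9 bytes) and the inputs assumed above. It also makes explicit the per-shard load implied by a given shard count. Replication and Redis per-key memory overhead are not modeled.

```python
def estimate(qps: int, active_keys: int, bytes_per_key: int,
             bytes_per_check: int, shards: int) -> dict:
    """Back-of-envelope sizing for the rate limiter's Redis tier."""
    return {
        "ram_gb": active_keys * bytes_per_key / 1e9,            # counter storage
        "bandwidth_gb_s": qps * bytes_per_check / 1e9,          # bytes per second
        "bandwidth_gbit_s": qps * bytes_per_check * 8 / 1e9,    # bits per second
        "ops_per_shard": qps / shards,                          # script calls per shard
    }


sizing = estimate(qps=10_000_000, active_keys=100_000_000,
                  bytes_per_key=100, bytes_per_check=200, shards=20)
# ram_gb = 10.0, bandwidth_gb_s = 2.0 (16 Gbit/s), ops_per_shard = 500000.0
```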

Blueprint

Concise Summary: A sidecar or gateway-integrated service that checks a centralized Redis cluster using the Token Bucket algorithm implemented via Lua scripts.
Major Components:
API Gateway / Sidecar: Intercepts incoming requests and queries the Rate Limiter service.
Rate Limiter Service: Stateless microservice that encapsulates the logic and interfaces with storage.
Redis Cluster: Stores real-time counters and executes atomic logic via Lua scripts.
Rule Repository: Relational database for persistent storage of rate-limiting configurations.
Simplicity Audit: This design avoids complex synchronization protocols (like Paxos/Raft) by delegating atomicity to Redis, which is standard for MVPs.
Architecture Decision Rationale:
Why this architecture?: Redis provides the necessary sub-millisecond latency and built-in atomic primitives (Lua) required for high-throughput counting.
Functional Satisfaction: Meets the need for isAllow checks and dynamic rules.
Non-functional Satisfaction: Scalable through Redis sharding; highly available through Redis Sentinel/Cluster and "fail-open" logic in the service layer.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing: DNS routes traffic to the nearest regional Load Balancer.
Security & Perimeter:
API Gateway: Performs SSL termination and basic AuthN before calling the Rate Limiter.
Rate Limiting: First line of defense against DDoS.
Fail-Open: If the Rate Limiter service returns an error or times out (>5ms), the Gateway logs the error and allows the request to pass.
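Here is a sketch of the gateway-side fail-open rule. It assumes a hypothetical `check_remote(key)` callable that wraps the Rate Limiter call. The 5 ms default matches the timeout above. Any exception or timeout is logged and the request is admitted. With a gRPC client, a per-call deadline would normally replace the thread pool used here.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable

log = logging.getLogger("ratelimit")
_pool = ThreadPoolExecutor(max_workers=32)  # runs remote checks off the request thread


def is_allowed_fail_open(check_remote: Callable[[str], bool], key: str,
                         timeout_s: float = 0.005) -> bool:
    """Ask the rate limiter; on error or timeout, log and allow (fail-open)."""
    future = _pool.submit(check_remote, key)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        log.warning("rate limiter timed out for %s; failing open", key)
        future.cancel()  # best effort: a call that is already running keeps running
        return True
    except Exception:
        log.exception("rate limiter error for %s; failing open", key)
        return True
```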

Service

Topology & Scaling:
Stateless instances deployed in a Multi-AZ configuration.
Scaling based on CPU and Request Count (QPS).
API Schema Design:
Endpoint: Check RPC (an HTTP/JSON equivalent would be POST /v1/check).
Protocol: gRPC (for performance and low overhead).
Request: { "key": "user_123", "action": "create_post" }
Response: { "allowed": true, "remaining": 49, "reset_time": 1625000000 }
Resilience & Reliability:
Circuit Breaker: If Redis latency spikes, the service stops querying and fails open immediately.
Retries: No retries for rate-limit checks (to keep latency low).
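The circuit-breaker behavior described above can be sketched as a minimal consecutive-failure breaker. The thresholds, the injectable clock, and the half-open probe policy are assumptions. Latency spikes count as failures when `check` raises on exceeding its deadline. While the circuit is open, checks skip Redis and fail open immediately.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures; probes again after `cooldown_s`."""

    def __init__(self, max_failures: int = 5, cooldown_s: float = 1.0,
                 now: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.now = now
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, check: Callable[[], bool]) -> bool:
        if self.opened_at is not None:
            if self.now() - self.opened_at < self.cooldown_s:
                return True  # open: skip Redis entirely and fail open
            # Cooldown elapsed (half-open): let this call through as a probe
        try:
            result = check()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.now()  # open, or re-open after a failed probe
            return True  # fail open on error
        self.failures = 0
        self.opened_at = None  # any success closes the circuit
        return result
```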

Storage

Access Pattern: Extremely high write/read frequency (1:1 ratio).
Database Table Design (Rule DB):
rule_id (PK), resource_path, method, limit_count, window_seconds, is_active.
Technical Selection:
Rule DB: PostgreSQL (Relational) because rules change infrequently and require ACID for configuration management.
Counter Store: Redis (In-memory).
Distribution Logic:
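Redis Sharding: Key-based sharding. Redis Cluster maps every key to one of 16,384 hash slots via CRC16(key) mod 16384. Wrapping the User ID in a hash tag (e.g., ratelimit:{user_123}:rule_7) means only the User ID is hashed, so all of a user's counters land on the same shard and load is spread by user.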
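To illustrate the slot mapping, here is a sketch of how Redis Cluster assigns a key to a slot. It uses CRC-16/XMODEM (polynomial 0x1021, initial value 0), whose check value for "123456789" is 0x31C3. It also applies the hash-tag rule: if the key contains `{` followed later by `}` with at least one character between them, only that substring is hashed. The key names in the usage are hypothetical.

```python
def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC-16/XMODEM: polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: str) -> int:
    """Map a key to one of 16384 cluster slots, honoring {hash tags}."""
    k = key.encode()
    start = k.find(b"{")
    if start != -1:
        end = k.find(b"}", start + 1)
        if end > start + 1:  # non-empty tag: hash only the text between the braces
            k = k[start + 1:end]
    return crc16_xmodem(k) % 16384


# Both rule counters for user_123 map to the same slot, and therefore the same shard
same = hash_slot("ratelimit:{user_123}:rule_1") == hash_slot("ratelimit:{user_123}:rule_2")
```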

Cache

Purpose & Justification: Redis acts as the primary state store (not just a cache).
Key-Value Schema:
Key: ratelimit:{user_id}:{rule_id}
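Value: Current token count and last-refill timestamp, stored as two fields of a Redis hash.
Lua Script (Token Bucket):
  -- KEYS[1]: bucket key; ARGV[1]: capacity; ARGV[2]: refill rate (tokens/sec, > 0); ARGV[3]: caller's clock (ms)
  local capacity = tonumber(ARGV[1])
  local rate = tonumber(ARGV[2])
  local now = tonumber(ARGV[3])
  local state = redis.call('HMGET', KEYS[1], 'tokens', 'ts')
  -- Missing fields come back as false, so tonumber() yields nil: start with a full bucket
  local tokens = tonumber(state[1]) or capacity
  local ts = tonumber(state[2]) or now
  -- Refill in proportion to elapsed time (ignoring clocks that went backwards), capped at capacity
  tokens = math.min(capacity, tokens + math.max(0, now - ts) / 1000 * rate)
  local allowed = 0
  if tokens >= 1 then
    tokens = tokens - 1
    allowed = 1
  end
  redis.call('HSET', KEYS[1], 'tokens', tokens, 'ts', math.max(ts, now))
  -- Let idle buckets expire once they would have refilled completely
  redis.call('PEXPIRE', KEYS[1], math.ceil(capacity / rate * 1000))
  return {allowed, math.floor(tokens)}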
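Failure Handling: Use Redis Replication (Leader/Follower) for high availability. Replication is asynchronous, so a failover can lose the most recent counter updates; under the fail-open (AP) stance this briefly over-admits traffic, which is acceptable.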

Infrastructure (Optional)

Observability:
Metrics: Track rate_limit_exceeded count per rule and redis_latency.
Alerting: Alert if P99 latency > 5ms or if Redis memory usage > 80%.
Wrap Up

Advanced Topics

Trade-offs: We choose Availability over Consistency (AP over CP in CAP terms). If Redis is unreachable, we allow traffic. This prevents a rate-limiter outage from becoming a site-wide outage.
Reliability: Use a "Local Cache" in the Rate Limiter service to store rules from the Rule DB, refreshed every 30 seconds via a background thread to minimize DB load.
Bottleneck Analysis: Redis is the bottleneck. Mitigation: Use a "Thick Client" approach where the Gateway caches the fact that a user is already blocked for X seconds, avoiding redundant calls for blocked users.
Optimization: For global "heavy hitters" (e.g., an IP hammering the system), the Gateway can promote that IP to a "Deny List" at the L4/L7 Load Balancer level temporarily to bypass the Service Layer entirely.
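The "Thick Client" mitigation from the bottleneck analysis can be sketched as a gateway-local record of blocked keys. In this sketch, `remote` is a hypothetical call to the Rate Limiter that returns `(allowed, retry_after_s)`. While a key's block is still in force, the gateway answers locally without a Redis round-trip. The dictionary is unbounded here; a production cache would cap its size (e.g., LRU).

```python
import time
from typing import Callable, Dict, Tuple


class BlockedCache:
    """Gateway-local record of keys known to be throttled until a deadline."""

    def __init__(self, now: Callable[[], float] = time.monotonic):
        self.now = now
        self.blocked_until: Dict[str, float] = {}

    def is_blocked(self, key: str) -> bool:
        deadline = self.blocked_until.get(key)
        if deadline is None:
            return False
        if self.now() >= deadline:
            del self.blocked_until[key]  # block expired: evict lazily
            return False
        return True

    def block(self, key: str, retry_after_s: float) -> None:
        self.blocked_until[key] = self.now() + retry_after_s


def check(key: str, cache: BlockedCache,
          remote: Callable[[str], Tuple[bool, float]]) -> bool:
    """Skip the remote call while `key` is cached as blocked."""
    if cache.is_blocked(key):
        return False  # answered locally, no Redis round-trip
    allowed, retry_after_s = remote(key)
    if not allowed:
        cache.block(key, retry_after_s)
    return allowed
```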