The Question
Design
Scalable API Rate Limiter
Design a high-throughput, low-latency rate-limiting system for a platform such as OpenAI that serves millions of users. The system must enforce quotas on both request counts and resource consumption (e.g., tokens), support multiple subscription tiers, and make enforcement decisions with sub-millisecond latency while remaining highly available.
Redis
Lua Script
Token Bucket Algorithm
API Gateway Middleware
PostgreSQL
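As a starting point for discussion, here is a minimal single-process sketch of the token bucket approach listed above, applied to the two quota dimensions in the prompt (requests and tokens). The tier names, the limits in `TIERS`, and the `RateLimiter` and `Bucket` classes are all hypothetical and chosen for illustration. A real deployment would likely keep bucket state in a shared store and run the check atomically there (for example in a Redis Lua script), which this sketch does not attempt.

```python
import time
from dataclasses import dataclass


@dataclass
class Bucket:
    """Token bucket holding up to `capacity` units, refilled at `rate` units/second."""
    capacity: float
    rate: float
    level: float
    updated: float

    def refill(self, now: float) -> None:
        # Add the units earned since the last update, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.level = min(self.capacity, self.level + elapsed * self.rate)
        self.updated = now


# Hypothetical tier limits: (requests per minute, tokens per minute).
TIERS = {"free": (3, 1_000), "pro": (60, 100_000)}


class RateLimiter:
    """In-memory limiter with one request bucket and one token bucket per user."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock   # injectable so tests can control time
        self.buckets = {}    # user_id -> (request bucket, token bucket)

    def _buckets_for(self, user_id: str, tier: str):
        if user_id not in self.buckets:
            rpm, tpm = TIERS[tier]
            now = self.clock()
            # Buckets start full; the per-minute limit doubles as burst capacity.
            self.buckets[user_id] = (
                Bucket(rpm, rpm / 60.0, rpm, now),
                Bucket(tpm, tpm / 60.0, tpm, now),
            )
        return self.buckets[user_id]

    def allow(self, user_id: str, tier: str, tokens: int) -> bool:
        """Admit only if both buckets can cover the request; charge both or neither."""
        req, tok = self._buckets_for(user_id, tier)
        now = self.clock()
        req.refill(now)
        tok.refill(now)
        if req.level < 1 or tok.level < tokens:
            return False
        req.level -= 1
        tok.level -= tokens
        return True
```

Calling `RateLimiter().allow("user-1", "free", 400)` charges one request and 400 tokens against the free tier. Checking both buckets before charging either keeps a rejected request from consuming quota in one dimension only.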
February 23, 2026