The Question
Design: User Profile and API Quota System
Design a high-scale user management system for a global AI platform. The system must handle user profiles, flexible preference configurations (e.g., model parameters), multi-device session management with revocation capabilities, and a high-throughput API quota enforcement engine. Ensure the design can handle 100M+ users with low-latency quota checks (<10ms) and high availability, while discussing the trade-offs between strict consistency and system performance.
PostgreSQL
Redis
JSONB
Lua
Kubernetes
JWT
API Gateway
Questions & Insights
Clarifying Questions
Scale: What is the expected scale in terms of Daily Active Users (DAU) and Peak QPS? (Assumption: 100M DAU, 1M Peak QPS for quota checks).
Quota Precision: Does API quota need to be strictly enforced (hard limit) or is a slight delay acceptable (soft limit)? (Assumption: Strict enforcement for paid tiers, eventual consistency for free tiers).
Multi-device: Are there limits on concurrent active sessions, and do we need to provide a "Logout from all devices" feature? (Assumption: Yes, session management is required with revocation capabilities).
Preference Structure: How complex are the user preferences? (Assumption: Mostly key-value pairs or small JSON objects like theme, default model, and language).
Thinking Process
Core Bottleneck: The high-frequency "Write" volume of API quota tracking and the high-frequency "Read" volume of session/profile lookups.
Progressive Questions:
How do we store user profiles and preferences such that they are highly available and evolvable?
How do we manage multi-device sessions to ensure fast authentication and global revocation?
How do we design a high-throughput, low-latency quota system that prevents double-spending or over-limit usage?
How do we scale this end-to-end architecture to handle 100M+ users?
Bonus Points
Write-Through Quota Pattern: Using Redis Lua scripts to perform "Check-and-Decrement" atomically to ensure strict quota enforcement without race conditions.
Global Data Locality: Implementing a "Home Region" for user profiles while caching session tokens at the Edge to minimize global latency.
Hybrid Storage: Using PostgreSQL with JSONB for flexible user preferences, balancing the reliability of RDBMS with the flexibility of NoSQL.
Token Bucket/Leaky Bucket Integration: Integrating quota management with the Rate Limiter layer to provide a unified "Request Budgeting" service.
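The atomic "Check-and-Decrement" idea can be sketched as follows. This is a minimal illustration, not a definitive implementation: the Lua text is one plausible script for the quota:{user_id} key and is only shown as a string (it is not executed here), while InMemoryQuotaStore is a hypothetical stand-in whose lock imitates the fact that Redis runs a script without interleaving other commands. The -1 return convention is an assumption.

```python
import threading

# Lua source one might load into Redis (EVAL / SCRIPT LOAD). Shown for reference
# only and not executed here. KEYS[1] is the quota key, ARGV[1] is the cost.
CHECK_AND_DECREMENT_LUA = """
local remaining = tonumber(redis.call('GET', KEYS[1]) or '-1')
local cost = tonumber(ARGV[1])
if remaining < cost then
  return -1
end
return redis.call('DECRBY', KEYS[1], cost)
"""


class InMemoryQuotaStore:
    """Hypothetical stand-in for Redis with the same check-then-decrement semantics."""

    def __init__(self):
        self._balances = {}
        self._lock = threading.Lock()  # plays the role of atomic script execution

    def set_balance(self, user_id, tokens):
        with self._lock:
            self._balances[f"quota:{user_id}"] = tokens

    def check_and_decrement(self, user_id, cost=1):
        """Return the new balance, or -1 if the key is missing or the balance is too low."""
        key = f"quota:{user_id}"
        with self._lock:
            remaining = self._balances.get(key, -1)
            if remaining < cost:
                return -1
            self._balances[key] = remaining - cost
            return self._balances[key]
```

Because the check and the decrement happen inside one critical section, two concurrent requests cannot both spend the last token, which is the race a separate GET followed by DECR would allow.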
Design Breakdown
Functional Requirements
Core Use Cases:
User Registration and Profile Management (CRUD).
Multi-device login and session tracking.
Retrieval and updates of user preference configurations.
Real-time API quota tracking and enforcement.
Scope Control:
In-Scope: Profile storage, Session management, Quota logic, Preference engine.
Out-of-Scope: Identity Provider (IdP) implementation (e.g., Auth0/Google OAuth logic), Billing/Payment processing (Stripe integration).
Non-Functional Requirements
Scale: Handle 100M+ users; Quota system must handle 1M+ TPS.
Latency: < 50ms for profile/preference retrieval; < 10ms for quota check.
Availability & Reliability: 99.99% (SLA critical for API users).
Consistency: Strong consistency for Quota; Eventual consistency for Preferences.
Security: Session hijacking protection; Encryption at rest for PII.
Estimation
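Traffic: 100M DAU. Assuming 10 API requests/user/day = 1B requests/day. Avg QPS ~12k (1B / 86,400 s); Peak QPS ~50k-100k, i.e. roughly 4-9x the average. The 1M+ TPS target for the quota system therefore leaves about 10x headroom over this estimate.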
Storage: 100M users * 5KB (profile + preferences) = 500GB.
Sessions: 100M users * 3 devices = 300M active sessions. 300M * 1KB = 300GB in Redis.
Bandwidth: 100k QPS * 2KB/response = 200MB/s outgoing.
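The estimates in this section can be reproduced with a few lines of arithmetic. The sketch below assumes decimal units (1 KB = 1,000 bytes) and takes the upper end of the peak range; the variable names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60

dau = 100_000_000
requests_per_user_per_day = 10

# Traffic: total daily requests and the average rate they imply.
requests_per_day = dau * requests_per_user_per_day
avg_qps = requests_per_day / SECONDS_PER_DAY

# Storage: 5 KB of profile and preference data per user.
profile_storage_gb = dau * 5_000 / 1e9

# Sessions: 3 devices per user, 1 KB per session record.
active_sessions = dau * 3
session_storage_gb = active_sessions * 1_000 / 1e9

# Bandwidth: upper-end peak QPS with 2 KB responses.
peak_qps = 100_000
egress_mb_per_s = peak_qps * 2_000 / 1e6
peak_to_avg_ratio = peak_qps / avg_qps
```

The average works out to about 11.6k QPS, so a 100k peak corresponds to a peak-to-average ratio of a little under 9.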
Blueprint
Concise Summary: A microservices architecture featuring a User Service for metadata, a Session Service for multi-device management using Redis, and a Quota Service utilizing atomic counters for high-speed usage tracking.
Major Components:
API Gateway: Handles AuthN, Rate Limiting, and routes requests to specific services.
User Service: Manages the lifecycle of user profiles and preferences stored in a relational database.
Session Service: Tracks active logins across devices using a fast KV store.
Quota Service: Dedicated high-performance counter service for real-time API budget management.
Simplicity Audit: This design avoids complex event-sourcing for an MVP, opting instead for a synchronous "Check-and-Act" pattern for quotas and standard RDBMS for profiles.
Architecture Decision Rationale:
Why this architecture?: Separating Quota from Profile allows us to scale the "Hot" quota traffic independently from the "Cold" profile traffic.
Functional Satisfaction: Meets all requirements for profile, multi-device login, and quota.
Non-functional Satisfaction: Redis ensures low-latency session and quota checks; PostgreSQL provides the reliability needed for core user identity.
High Level Architecture
Sub-system Deep Dive
Edge (Optional)
Content Delivery & Traffic Routing: Use a Global Load Balancer with Geo-DNS to route users to the nearest regional cluster.
Security & Perimeter: API Gateway performs JWT validation. Rate limiting is applied at the IP and User levels before reaching internal services.
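The IP-level and user-level rate limiting mentioned here is often implemented as a token bucket. The following is a minimal single-process sketch under that assumption; the class name, parameters, and injectable clock are illustrative and not part of the original design, and a real gateway would keep this state in a shared store.

```python
import time


class TokenBucket:
    """Token bucket: holds up to `capacity` tokens, refilled at a constant rate."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock              # injectable so the behaviour is testable
        self._tokens = float(capacity)   # start full to allow an initial burst
        self._last = clock()

    def allow(self, cost=1.0):
        """Refill for the elapsed time, then spend `cost` tokens if available."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


# One bucket per limiting key, e.g. "ip:203.0.113.7" or "user:u1" (hypothetical keys).
buckets = {}


def allow_request(key, capacity=10, refill_per_second=5.0):
    bucket = buckets.setdefault(key, TokenBucket(capacity, refill_per_second))
    return bucket.allow()
```

A request would typically have to pass both its IP bucket and its user bucket before being forwarded to internal services.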
Service
Topology & Scaling: Stateless microservices deployed in Kubernetes (EKS/GKE). Horizontal Pod Autoscaling (HPA) triggered by CPU and Request Count.
API Schema Design:
GET /v1/user/profile: Returns basic info.
PATCH /v1/user/preferences: Updates JSON configuration.
POST /v1/sessions/logout-all: Invalidation of all tokens for a user ID.
POST /v1/quota/consume: Atomic decrement of available tokens.
Resilience & Reliability:
Quota service uses a Fail-Open strategy for free-tier users if Redis is down (prioritize availability).
Circuit Breakers on the User Service to prevent cascading failures during DB maintenance.
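Both resilience measures can be sketched in a few lines. This is an illustration under stated assumptions: QuotaBackendUnavailable, FAIL_OPEN_TIERS, and CircuitBreaker are hypothetical names, and failing closed for paid tiers is an assumption drawn from the strict-enforcement requirement rather than something the design spells out.

```python
import time


class QuotaBackendUnavailable(Exception):
    """Raised by a (hypothetical) quota client when Redis cannot be reached."""


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the protected call is skipped."""


# Assumption: only the free tier fails open; paid tiers fail closed.
FAIL_OPEN_TIERS = frozenset({"free"})


def allow_request(tier, check_quota):
    """check_quota is a zero-argument callable returning True or False."""
    try:
        return check_quota()
    except QuotaBackendUnavailable:
        return tier in FAIL_OPEN_TIERS


class CircuitBreaker:
    """Opens after consecutive failures, then permits a trial call after a cool-down."""

    def __init__(self, failure_threshold=3, reset_after_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_seconds = reset_after_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_seconds:
                raise CircuitOpenError("circuit open; call skipped")
            # Cool-down elapsed: half-open, so a single failure re-opens the circuit.
            self._opened_at = None
            self._failures = self.failure_threshold - 1
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0
        return result
```

Callers of the User Service would wrap database-backed calls in breaker.call(...) and serve a cached or degraded response when CircuitOpenError is raised.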
Storage
Access Pattern:
User DB: Read-heavy (login/profile view).
Quota DB: Write-heavy (usage logging).
Database Table Design:
users: user_id (UUID, PK), email, password_hash, created_at.
preferences: user_id (FK), config_json (JSONB), updated_at.
quotas: user_id (PK), monthly_limit, current_usage, reset_date.
Technical Selection:
PostgreSQL: Primary source of truth for Users/Preferences. JSONB allows for dynamic OpenAI model settings.
PostgreSQL (Sharded) or Cassandra: For Quota archival/billing history.
Distribution Logic: Shard by user_id to ensure all data for a single user resides on the same partition.
Cache
Purpose & Justification:
Session Cache: Stores session tokens (UUID) mapping to User Metadata. Essential for <5ms auth checks.
Quota Cache: Stores "Current Balance". Atomic DECR in Redis is significantly faster than SQL updates.
Key-Value Schema:
sess:{token} -> {user_id, device_type, expiry} (TTL: 30 days).
quota:{user_id} -> {remaining_tokens} (TTL: Until month-end).
Failure Handling: Use Redis Sentinel or Cluster for high availability.
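The session schema can be modelled in memory to show how token lookup, expiry, and per-user revocation fit together. This is a sketch under assumptions: the user_sess:{user_id} index is a hypothetical addition (in Redis it could be a SET maintained with SADD and read with SMEMBERS) so that all of a user's tokens can be found without scanning the keyspace.

```python
import time
import uuid


class SessionStore:
    """In-memory model of the sess:{token} schema plus a per-user token index."""

    def __init__(self, ttl_seconds=30 * 24 * 3600, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._sessions = {}    # "sess:{token}" -> {user_id, device_type, expiry}
        self._user_index = {}  # "user_sess:{user_id}" -> set of tokens (hypothetical key)

    def create(self, user_id, device_type):
        token = str(uuid.uuid4())
        self._sessions[f"sess:{token}"] = {
            "user_id": user_id,
            "device_type": device_type,
            "expiry": self._clock() + self._ttl,
        }
        self._user_index.setdefault(f"user_sess:{user_id}", set()).add(token)
        return token

    def lookup(self, token):
        """Return the session record, or None if it is unknown or expired."""
        record = self._sessions.get(f"sess:{token}")
        if record is None:
            return None
        if record["expiry"] <= self._clock():
            # Emulate TTL expiry and keep the index tidy.
            del self._sessions[f"sess:{token}"]
            self._user_index.get(f"user_sess:{record['user_id']}", set()).discard(token)
            return None
        return record

    def logout_all(self, user_id):
        """Revoke every session of the user; return how many were removed."""
        tokens = self._user_index.pop(f"user_sess:{user_id}", set())
        return sum(
            1 for tok in tokens
            if self._sessions.pop(f"sess:{tok}", None) is not None
        )
```

Keeping the index alongside the token keys trades a small amount of extra memory for a revocation path that touches only the affected user's keys.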
Wrap Up
Advanced Topics
Trade-offs: In extreme failure scenarios, we choose Availability over Consistency (AP) for the quota system, relaxing the strong-consistency goal that applies during normal operation. It is better to let a user get a free request than to block a paying user because the counter service is slow.
Reliability: Exponential backoff on the client-side for quota check failures.
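Security: Multi-device login is managed by storing a unique session_id in the database; during a "Logout All" event, we purge all keys with the prefix sess:{user_id}: in Redis. This presumes that session keys are prefixed with the user ID (or that a per-user index of tokens is kept), since keys of the form sess:{token} cannot be matched by user.
Optimization: For extremely high-volume API keys, use Local In-Memory Buffering. Aggregate 100 requests in the API Gateway and send a single "Decrement 100" command to the Quota Service to reduce network overhead. The cost is precision: each gateway instance can hold up to 99 unreported requests, so a key may briefly exceed its limit.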
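The local buffering optimization might look like the sketch below. BufferedQuotaClient and its send_decrement callback are hypothetical names; the callback stands in for whatever call delivers a bulk decrement to the Quota Service.

```python
class BufferedQuotaClient:
    """Aggregates per-key usage locally and reports it in batches."""

    def __init__(self, send_decrement, flush_threshold=100):
        self._send = send_decrement          # callable(api_key, amount), assumed
        self.flush_threshold = flush_threshold
        self._pending = {}                   # api_key -> usage not yet reported

    def record(self, api_key, cost=1):
        """Count one request locally; flush once the batch size is reached."""
        self._pending[api_key] = self._pending.get(api_key, 0) + cost
        if self._pending[api_key] >= self.flush_threshold:
            self.flush(api_key)

    def flush(self, api_key):
        """Send whatever is pending for the key as a single decrement."""
        amount = self._pending.pop(api_key, 0)
        if amount:
            self._send(api_key, amount)
```

In practice a timer-driven flush would also be needed so that a quiet key does not hold unreported usage indefinitely, and the buffer should be flushed on shutdown.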