The Question
Design

User Profile and API Quota System

Design a high-scale user management system for a global AI platform. The system must handle user profiles, flexible preference configurations (e.g., model parameters), multi-device session management with revocation capabilities, and a high-throughput API quota enforcement engine. Ensure the design can handle 100M+ users with low-latency quota checks (<10ms) and high availability, while discussing the trade-offs between strict consistency and system performance.
PostgreSQL
Redis
JSONB
Lua
Kubernetes
JWT
API Gateway
Questions & Insights

Clarifying Questions

Scale: What is the expected scale in terms of Daily Active Users (DAU) and Peak QPS? (Assumption: 100M DAU, 1M Peak QPS for quota checks).
Quota Precision: Does API quota need to be strictly enforced (hard limit) or is a slight delay acceptable (soft limit)? (Assumption: Strict enforcement for paid tiers, eventual consistency for free tiers).
Multi-device: Are there limits on concurrent active sessions, and do we need to provide a "Logout from all devices" feature? (Assumption: Yes, session management is required with revocation capabilities).
Preference Structure: How complex are the user preferences? (Assumption: Mostly key-value pairs or small JSON objects like theme, default model, and language).

Thinking Process

Core Bottleneck: The high-frequency "Write" volume of API quota tracking and the high-frequency "Read" volume of session/profile lookups.
Progressive Questions:
How do we store user profiles and preferences such that they are highly available and evolvable?
How do we manage multi-device sessions to ensure fast authentication and global revocation?
How do we design a high-throughput, low-latency quota system that prevents double-spending or over-limit usage?
How do we scale this end-to-end architecture to handle 100M+ users?

Bonus Points

Write-Through Quota Pattern: Using Redis Lua scripts to perform "Check-and-Decrement" atomically to ensure strict quota enforcement without race conditions.
Global Data Locality: Implementing a "Home Region" for user profiles while caching session tokens at the Edge to minimize global latency.
Hybrid Storage: Using PostgreSQL with JSONB for flexible user preferences, balancing the reliability of RDBMS with the flexibility of NoSQL.
Token Bucket/Leaky Bucket Integration: Integrating quota management with the Rate Limiter layer to provide a unified "Request Budgeting" service.
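
The "Check-and-Decrement" idea from the first bonus point can be sketched as follows. This is a minimal illustration, not a production implementation: the Lua script text, the InMemoryQuotaStore class, and the -1 "over limit" return value are all assumptions made for this example. The Python class is an in-process stand-in that mirrors what the script would do inside Redis, with a lock playing the role of Redis's one-script-at-a-time execution.

```python
import threading

# Hypothetical Lua script (not from the original design). Redis runs a script
# atomically, so the read and the decrement cannot interleave with other clients.
# KEYS[1] is the quota key, ARGV[1] is the cost of the request.
CHECK_AND_DECREMENT_LUA = """
local remaining = tonumber(redis.call('GET', KEYS[1]) or '0')
local cost = tonumber(ARGV[1])
if remaining < cost then
  return -1
end
return redis.call('DECRBY', KEYS[1], cost)
"""


class InMemoryQuotaStore:
    """In-process stand-in for Redis that mirrors the script's semantics."""

    def __init__(self):
        self._balances = {}
        # The lock stands in for Redis executing one script at a time.
        self._lock = threading.Lock()

    def set_balance(self, user_id, tokens):
        with self._lock:
            self._balances[f"quota:{user_id}"] = tokens

    def check_and_decrement(self, user_id, cost=1):
        """Return the new balance, or -1 if the request would exceed the quota."""
        key = f"quota:{user_id}"
        with self._lock:
            remaining = self._balances.get(key, 0)
            if remaining < cost:
                return -1
            self._balances[key] = remaining - cost
            return self._balances[key]
```

Because the check and the write happen under one lock (or inside one script), two concurrent requests can never both spend the last token, which is the race the bonus point is guarding against.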
Design Breakdown

Functional Requirements

Core Use Cases:
User Registration and Profile Management (CRUD).
Multi-device login and session tracking.
Retrieval and updates of user preference configurations.
Real-time API quota tracking and enforcement.
Scope Control:
In-Scope: Profile storage, Session management, Quota logic, Preference engine.
Out-of-Scope: Identity Provider (IdP) implementation (e.g., Auth0/Google OAuth logic), Billing/Payment processing (Stripe integration).

Non-Functional Requirements

Scale: Handle 100M+ users; Quota system must handle 1M+ TPS.
Latency: < 50ms for profile/preference retrieval; < 10ms for quota check.
Availability & Reliability: 99.99% (SLA critical for API users).
Consistency: Strong consistency for Quota; Eventual consistency for Preferences.
Security: Session hijacking protection; Encryption at rest for PII.

Estimation

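Traffic: 100M DAU. Assuming 10 API requests/user/day = 1B requests/day. Avg QPS ~12k (1B / 86,400 s), Peak QPS ~50k-100k. This baseline is well below the 1M peak QPS assumed in the clarifying questions; the quota system is sized for the larger figure (1M+ TPS).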
Storage: 100M users * 5KB (profile + preferences) = 500GB.
Sessions: 100M users * 3 devices = 300M active sessions. 300M * 1KB = 300GB in Redis.
Bandwidth: 100k QPS * 2KB/response = 200MB/s outgoing.
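
The arithmetic behind these estimates can be reproduced with a few lines. This is only a back-of-the-envelope sketch: the function name estimate_capacity is made up for this example, and sizes use decimal units (1KB = 1,000 bytes) to match the round numbers above.

```python
SECONDS_PER_DAY = 86_400


def estimate_capacity():
    """Recompute the estimates above from their stated inputs (decimal units)."""
    dau = 100_000_000
    requests_per_day = dau * 10                      # 10 API requests/user/day
    avg_qps = requests_per_day / SECONDS_PER_DAY     # roughly 11.6k

    profile_storage_gb = dau * 5_000 // 10**9        # 5KB per user
    active_sessions = dau * 3                        # 3 devices per user
    session_memory_gb = active_sessions * 1_000 // 10**9  # 1KB per session

    peak_qps = 100_000                               # upper end of the peak range
    egress_mb_per_s = peak_qps * 2_000 // 10**6      # 2KB per response

    return {
        "requests_per_day": requests_per_day,
        "avg_qps": avg_qps,
        "profile_storage_gb": profile_storage_gb,
        "active_sessions": active_sessions,
        "session_memory_gb": session_memory_gb,
        "egress_mb_per_s": egress_mb_per_s,
    }


if __name__ == "__main__":
    for name, value in estimate_capacity().items():
        print(f"{name}: {value:,.0f}")
```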

Blueprint

Concise Summary: A microservices architecture featuring a User Service for metadata, a Session Service for multi-device management using Redis, and a Quota Service utilizing atomic counters for high-speed usage tracking.
Major Components:
API Gateway: Handles AuthN, Rate Limiting, and routes requests to specific services.
User Service: Manages the lifecycle of user profiles and preferences stored in a relational database.
Session Service: Tracks active logins across devices using a fast KV store.
Quota Service: Dedicated high-performance counter service for real-time API budget management.
Simplicity Audit: This design avoids complex event-sourcing for an MVP, opting instead for a synchronous "Check-and-Act" pattern for quotas and standard RDBMS for profiles.
Architecture Decision Rationale:
Why this architecture?: Separating Quota from Profile allows us to scale the "Hot" quota traffic independently from the "Cold" profile traffic.
Functional Satisfaction: Meets all requirements for profile, multi-device login, and quota.
Non-functional Satisfaction: Redis ensures low-latency session and quota checks; PostgreSQL provides the reliability needed for core user identity.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing: Use a Global Load Balancer with Geo-DNS to route users to the nearest regional cluster.
Security & Perimeter: API Gateway performs JWT validation. Rate limiting is applied at the IP and User levels before reaching internal services.

Service

Topology & Scaling: Stateless microservices deployed in Kubernetes (EKS/GKE). Horizontal Pod Autoscaling (HPA) triggered by CPU and Request Count.
API Schema Design:
GET /v1/user/profile: Returns basic info.
PATCH /v1/user/preferences: Updates JSON configuration.
POST /v1/sessions/logout-all: Invalidates all tokens for a user ID.
POST /v1/quota/consume: Atomically decrements the available tokens.
Resilience & Reliability:
Quota service uses a Fail-Open strategy for free-tier users if Redis is down (prioritize availability).
Circuit Breakers on the User Service to prevent cascading failures during DB maintenance.
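
The fail-open rule above might look like the sketch below. The names (QuotaBackendUnavailable, allow_request, FAIL_OPEN_TIERS) are hypothetical, and the choice to fail closed for every tier other than "free" is an assumption of this example; the design only specifies the free-tier behavior.

```python
class QuotaBackendUnavailable(Exception):
    """Raised by a (hypothetical) quota client when Redis cannot be reached."""


# Assumption: only the free tier fails open; all other tiers fail closed.
FAIL_OPEN_TIERS = frozenset({"free"})


def allow_request(user_tier, consume):
    """Decide whether to serve a request.

    `consume` is a callable that returns True if quota was available and has
    been decremented, False if the user is over the limit, and raises
    QuotaBackendUnavailable if the quota store is down.
    """
    try:
        return consume()
    except QuotaBackendUnavailable:
        # Store is down: prioritize availability only for fail-open tiers.
        return user_tier in FAIL_OPEN_TIERS
```

Keeping the policy in one small function makes it easy to change per tier later, for example if the team decides paid tiers should also fail open during an outage.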

Storage

Access Pattern:
User DB: Read-heavy (login/profile view).
Quota DB: Write-heavy (usage logging).
Database Table Design:
users: user_id (UUID, PK), email, password_hash, created_at.
preferences: user_id (FK), config_json (JSONB), updated_at.
quotas: user_id (PK), monthly_limit, current_usage, reset_date.
Technical Selection:
PostgreSQL: Primary source of truth for Users/Preferences. JSONB allows for dynamic OpenAI model settings.
PostgreSQL (Sharded) or Cassandra: For Quota archival/billing history.
Distribution Logic: Shard by user_id to ensure all data for a single user resides on the same partition.
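
As a rough illustration of sharding by user_id, the sketch below maps a UUID to a shard with a stable hash. The function name and the modulo scheme are assumptions for this example; the design does not specify a routing algorithm.

```python
import hashlib
import uuid


def shard_for_user(user_id: uuid.UUID, num_shards: int) -> int:
    """Map a user_id to a shard index in [0, num_shards).

    A cryptographic hash is used instead of Python's built-in hash() so that
    every service instance computes the same shard for the same user.
    """
    digest = hashlib.sha256(user_id.bytes).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    user = uuid.uuid4()
    print(f"user {user} -> shard {shard_for_user(user, 64)}")
```

Plain modulo routing moves most users when the shard count changes; consistent hashing or a lookup table of virtual shards is a common way to soften that, at the cost of extra machinery.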

Cache

Purpose & Justification:
Session Cache: Stores session tokens (UUID) mapping to User Metadata. Essential for <5ms auth checks.
Quota Cache: Stores "Current Balance". Atomic DECR in Redis is significantly faster than SQL updates.
Key-Value Schema:
sess:{token} -> {user_id, device_type, expiry} (TTL: 30 days).
quota:{user_id} -> {remaining_tokens} (TTL: Until month-end).
Failure Handling: Use Redis Sentinel or Cluster for high availability.
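
The sess:{token} schema and its 30-day TTL can be modeled in a few lines. This is an in-memory stand-in for the Redis session cache, written only to show the lookup and expiry behavior; the SessionCache class and its method names are hypothetical.

```python
import time
import uuid

SESSION_TTL_SECONDS = 30 * 24 * 60 * 60  # 30 days, as in the schema above


class SessionCache:
    """In-memory model of the session cache keyed by sess:{token}."""

    def __init__(self, clock=time.time):
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # "sess:{token}" -> session metadata

    def create(self, user_id, device_type):
        """Create a session and return its opaque token."""
        token = str(uuid.uuid4())
        self._entries[f"sess:{token}"] = {
            "user_id": user_id,
            "device_type": device_type,
            "expiry": self._clock() + SESSION_TTL_SECONDS,
        }
        return token

    def lookup(self, token):
        """Return session metadata, or None if the token is unknown or expired."""
        key = f"sess:{token}"
        entry = self._entries.get(key)
        if entry is None:
            return None
        if entry["expiry"] <= self._clock():
            del self._entries[key]  # drop expired entries lazily on access
            return None
        return entry
```

In Redis itself the TTL would be set on the key and expiry handled by the server, so the application code reduces to a single read per auth check.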
Wrap Up

Advanced Topics

Trade-offs: We choose Availability over Consistency (AP) for the quota system in extreme failure scenarios. It is better to let a user get a free request than to block a paying user because the counter service is slow.
Reliability: Exponential backoff on the client-side for quota check failures.
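Security: Multi-device login is managed by storing a unique session_id per device in the database; during a "Logout All" event, we purge all keys matching sess:{user_id}: in Redis. Because the cache schema keys sessions by token (sess:{token}), this requires either prefixing session keys with the user ID or maintaining a per-user index of active tokens.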
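
One way to support "Logout All" without scanning the keyspace is to keep a per-user set of active tokens next to the session entries. The sketch below models that in memory; the SessionRegistry class and the user_sessions:{user_id} key name are assumptions for this example, not part of the original schema. In Redis the index would map naturally onto a set (SADD, SMEMBERS, DEL).

```python
class SessionRegistry:
    """In-memory model of sessions plus a per-user index for bulk revocation."""

    def __init__(self):
        self._sessions = {}    # "sess:{token}" -> user_id
        self._user_index = {}  # "user_sessions:{user_id}" -> set of tokens (hypothetical key)

    def login(self, user_id, token):
        self._sessions[f"sess:{token}"] = user_id
        self._user_index.setdefault(f"user_sessions:{user_id}", set()).add(token)

    def is_active(self, token):
        return f"sess:{token}" in self._sessions

    def logout(self, token):
        """Revoke a single device's session."""
        user_id = self._sessions.pop(f"sess:{token}", None)
        if user_id is not None:
            self._user_index.get(f"user_sessions:{user_id}", set()).discard(token)

    def logout_all(self, user_id):
        """Revoke every session for the user; return how many were removed."""
        tokens = self._user_index.pop(f"user_sessions:{user_id}", set())
        for token in tokens:
            self._sessions.pop(f"sess:{token}", None)
        return len(tokens)
```

The index costs one extra write per login and logout, but it turns global revocation into a lookup plus a bounded number of deletes instead of a pattern match over hundreds of millions of keys.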
Optimization: For extremely high-volume API keys, use Local In-Memory Buffering. Aggregate 100 requests in the API Gateway and send a single "Decrement 100" command to the Quota Service to reduce network overhead.
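
The local buffering optimization could be sketched as below. BufferedQuotaClient and its send_decrement callback are hypothetical names; the callback stands in for whatever call the gateway makes to the Quota Service. The sketch is single-threaded for clarity, so a real gateway would need a lock or per-worker buffers, plus a timer-based flush so idle keys do not hold usage indefinitely.

```python
class BufferedQuotaClient:
    """Aggregates per-key usage locally and reports it in batches."""

    def __init__(self, send_decrement, batch_size=100):
        self._send = send_decrement   # callable(api_key, amount)
        self._batch_size = batch_size
        self._pending = {}            # api_key -> requests not yet reported

    def record(self, api_key):
        """Count one request; send a batched decrement when the buffer fills."""
        count = self._pending.get(api_key, 0) + 1
        if count >= self._batch_size:
            self._pending.pop(api_key, None)
            self._send(api_key, count)
        else:
            self._pending[api_key] = count

    def flush(self):
        """Report all buffered usage, e.g. on a timer or at shutdown."""
        pending, self._pending = self._pending, {}
        for api_key, count in pending.items():
            self._send(api_key, count)
```

The trade-off is precision: each gateway instance can hold up to batch_size - 1 unreported requests per key, so a key can overshoot its limit by roughly that amount times the number of instances before the Quota Service notices.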