The Question

User Profile and API Quota System

Design a high-scale user management system for a global AI platform. The system must handle user profiles, flexible preference configurations (e.g., model parameters), multi-device session management with revocation capabilities, and a high-throughput API quota enforcement engine. Ensure the design can handle 100M+ users with low-latency quota checks (<10ms) and high availability, while discussing the trade-offs between strict consistency and system performance.

Tags: PostgreSQL, Redis, JSONB, Lua, Kubernetes, JWT, API Gateway

Questions & Insights

Clarifying Questions

Scale: What is the expected scale in terms of Daily Active Users (DAU) and Peak QPS? (Assumption: 100M DAU, 1M Peak QPS for quota checks).

Quota Precision: Does API quota need to be strictly enforced (hard limit) or is a slight delay acceptable (soft limit)? (Assumption: Strict enforcement for paid tiers, eventual consistency for free tiers).

Multi-device: Are there limits on concurrent active sessions, and do we need to provide a "Logout from all devices" feature? (Assumption: Yes, session management is required with revocation capabilities).

Preference Structure: How complex are the user preferences? (Assumption: Mostly key-value pairs or small JSON objects like theme, default model, and language).

Thinking Process

Core Bottleneck: The high-frequency "Write" volume of API quota tracking and the high-frequency "Read" volume of session/profile lookups.

Progressive Questions:

How do we store user profiles and preferences such that they are highly available and evolvable?

How do we manage multi-device sessions to ensure fast authentication and global revocation?

How do we design a high-throughput, low-latency quota system that prevents double-spending or over-limit usage?

How do we scale this end-to-end architecture to handle 100M+ users?

Bonus Points

Write-Through Quota Pattern: Using Redis Lua scripts to perform "Check-and-Decrement" atomically to ensure strict quota enforcement without race conditions.

Global Data Locality: Implementing a "Home Region" for user profiles while caching session tokens at the Edge to minimize global latency.

Hybrid Storage: Using PostgreSQL with JSONB for flexible user preferences, balancing the reliability of RDBMS with the flexibility of NoSQL.

Token Bucket/Leaky Bucket Integration: Integrating quota management with the Rate Limiter layer to provide a unified "Request Budgeting" service.
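The "Check-and-Decrement" idea from the first bonus point can be sketched roughly as follows. The Lua text is only a guess at what such a script might look like: the return codes -1 and -2 are assumptions, not part of the original design. The Python class is an in-memory stand-in that applies the same logic under a lock, so the behavior can be tried without a Redis server.

```python
import threading

# Hypothetical Lua script for Redis EVAL; key layout and return codes are illustrative.
CHECK_AND_DECREMENT_LUA = """
local remaining = tonumber(redis.call('GET', KEYS[1]))
if remaining == nil then return -2 end      -- quota key not loaded
local cost = tonumber(ARGV[1])
if remaining < cost then return -1 end      -- over limit, nothing consumed
return redis.call('DECRBY', KEYS[1], cost)  -- new balance
"""


class InMemoryQuota:
    """Stand-in for Redis: the lock plays the role of atomic script execution."""

    def __init__(self):
        self._balances = {}
        self._lock = threading.Lock()

    def set_balance(self, user_id, tokens):
        with self._lock:
            self._balances[f"quota:{user_id}"] = tokens

    def consume(self, user_id, cost=1):
        """Return the new balance, -1 if over limit, -2 if the key is missing."""
        key = f"quota:{user_id}"
        with self._lock:
            remaining = self._balances.get(key)
            if remaining is None:
                return -2
            if remaining < cost:
                return -1
            self._balances[key] = remaining - cost
            return self._balances[key]
```

Because the check and the decrement happen in one atomic step, two concurrent requests cannot both spend the last token, which is the race the bonus point is guarding against.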

Design Breakdown

Functional Requirements

Core Use Cases:

User Registration and Profile Management (CRUD).

Multi-device login and session tracking.

Retrieval and updates of user preference configurations.

Real-time API quota tracking and enforcement.

Scope Control:

In-Scope: Profile storage, Session management, Quota logic, Preference engine.

Out-of-Scope: Identity Provider (IdP) implementation (e.g., Auth0/Google OAuth logic), Billing/Payment processing (Stripe integration).

Non-Functional Requirements

Scale: Handle 100M+ users; Quota system must handle 1M+ TPS.

Latency: < 50ms for profile/preference retrieval; < 10ms for quota check.

Availability & Reliability: 99.99% (SLA critical for API users).

Consistency: Strong consistency for Quota; Eventual consistency for Preferences.

Security: Session hijacking protection; Encryption at rest for PII.

Estimation

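Traffic: 100M DAU. Assuming 10 API requests/user/day = 1B requests/day. Avg QPS ~12k, Peak QPS ~50k-100k. The 1M TPS target stated in the requirements is roughly 10x this estimated peak, so it should be read as burst headroom rather than expected steady load.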

Storage: 100M users * 5KB (profile + preferences) = 500GB.

Sessions: 100M users * 3 devices = 300M active sessions. 300M * 1KB = 300GB in Redis.

Bandwidth: 100k QPS * 2KB/response = 200MB/s outgoing.
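The estimation arithmetic can be reproduced in a few lines. This is just a back-of-envelope check; decimal units (1 GB = 10^6 KB, 1 MB = 10^3 KB) are assumed so that the results line up with the round figures in the text.

```python
DAU = 100_000_000
REQUESTS_PER_USER_PER_DAY = 10
SECONDS_PER_DAY = 86_400

# Traffic
requests_per_day = DAU * REQUESTS_PER_USER_PER_DAY
avg_qps = requests_per_day / SECONDS_PER_DAY

# Storage: 5 KB per user (profile + preferences), converted KB -> GB
profile_storage_gb = DAU * 5 / 1_000_000

# Sessions: 3 devices per user, 1 KB per session, converted KB -> GB
active_sessions = DAU * 3
session_memory_gb = active_sessions * 1 / 1_000_000

# Bandwidth: 100k QPS at 2 KB per response, converted KB -> MB
bandwidth_mb_per_s = 100_000 * 2 / 1_000

print(f"requests/day: {requests_per_day:,}")
print(f"avg QPS: {avg_qps:,.0f}")
print(f"profile storage: {profile_storage_gb:.0f} GB")
print(f"session memory: {session_memory_gb:.0f} GB")
print(f"bandwidth: {bandwidth_mb_per_s:.0f} MB/s")
```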

Blueprint

Concise Summary: A microservices architecture featuring a User Service for metadata, a Session Service for multi-device management using Redis, and a Quota Service utilizing atomic counters for high-speed usage tracking.

Major Components:

API Gateway: Handles AuthN and rate limiting, and routes requests to specific services.

User Service: Manages the lifecycle of user profiles and preferences stored in a relational database.

Session Service: Tracks active logins across devices using a fast KV store.

Quota Service: Dedicated high-performance counter service for real-time API budget management.

Simplicity Audit: This design avoids complex event-sourcing for an MVP, opting instead for a synchronous "Check-and-Act" pattern for quotas and standard RDBMS for profiles.

Architecture Decision Rationale:

Why this architecture?: Separating Quota from Profile allows us to scale the "Hot" quota traffic independently from the "Cold" profile traffic.

Functional Satisfaction: Meets all requirements for profile, multi-device login, and quota.

Non-functional Satisfaction: Redis ensures low-latency session and quota checks; PostgreSQL provides the reliability needed for core user identity.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing: Use a Global Load Balancer with Geo-DNS to route users to the nearest regional cluster.

Security & Perimeter: API Gateway performs JWT validation. Rate limiting is applied at the IP and User levels before reaching internal services.
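The text does not say which algorithm the gateway uses for rate limiting; a token bucket is one common choice and is the one named in the bonus points. A minimal single-process sketch, with a hypothetical `TokenBucket` class and an injectable clock so it can be tested deterministically:

```python
import time


class TokenBucket:
    """Refills at `rate_per_sec` tokens per second, up to `capacity`."""

    def __init__(self, rate_per_sec, capacity, now=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self._now = now
        self._tokens = float(capacity)  # start full
        self._last = now()

    def allow(self, cost=1):
        """Return True and spend `cost` tokens if enough are available."""
        current = self._now()
        elapsed = current - self._last
        self._last = current
        # Add the tokens earned since the last call, capped at capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

In a gateway, one bucket would typically be kept per IP and per user ID. A multi-instance deployment would need the bucket state in a shared store, which this sketch does not attempt.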

Service

Topology & Scaling: Stateless microservices deployed in Kubernetes (EKS/GKE). Horizontal Pod Autoscaling (HPA) triggered by CPU and Request Count.

API Schema Design:

GET /v1/user/profile: Returns basic info.

PATCH /v1/user/preferences: Updates JSON configuration.

POST /v1/sessions/logout-all: Invalidates all tokens for a user ID.

POST /v1/quota/consume: Atomically decrements the available tokens.
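The exact semantics of the preferences PATCH are not specified here. One plausible option is JSON Merge Patch (RFC 7396), where nested objects are merged recursively and a null value deletes a key. The sketch below assumes that behavior; `merge_patch` is an illustrative helper name, not part of the design.

```python
def merge_patch(target, patch):
    """Apply an RFC 7396 style merge patch and return the new document."""
    if not isinstance(patch, dict):
        # A non-object patch replaces the target wholesale.
        return patch
    if not isinstance(target, dict):
        target = {}
    result = dict(target)  # shallow copy so the caller's document is untouched
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null means "remove this key"
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


current = {"theme": "dark", "model": {"name": "default", "temperature": 0.7}}
update = {"model": {"temperature": 0.2}, "language": "en", "theme": None}
print(merge_patch(current, update))
```

With this approach a client can change one model parameter without resending the whole configuration object.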

Resilience & Reliability:

Quota service uses a Fail-Open strategy for free-tier users if Redis is down (prioritize availability).

Circuit Breakers on the User Service to prevent cascading failures during DB maintenance.
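The fail-open rule can be expressed as a small policy function. The text only states that free-tier users fail open; which other tiers do so is left as a parameter in this sketch. `QuotaBackendDown` and `allow_request` are hypothetical names introduced for illustration.

```python
class QuotaBackendDown(Exception):
    """Raised by a (hypothetical) quota client when Redis is unreachable."""


def allow_request(tier, consume, fail_open_tiers=frozenset({"free"})):
    """Decide whether a request may proceed.

    `consume` is a zero-argument callable that returns the remaining
    balance (>= 0) on success, or a negative number when the quota is
    exhausted. It raises QuotaBackendDown if the backend is unavailable.
    """
    try:
        return consume() >= 0
    except QuotaBackendDown:
        # Backend outage: allow only the tiers configured to fail open.
        return tier in fail_open_tiers
```

Keeping the set of fail-open tiers configurable makes it possible to widen it during an incident without a code change.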

Storage

Access Pattern:

User DB: Read-heavy (login/profile view).

Quota DB: Write-heavy (usage logging).

Database Table Design:

users: user_id (UUID, PK), email, password_hash, created_at.

preferences: user_id (FK), config_json (JSONB), updated_at.

quotas: user_id (PK), monthly_limit, current_usage, reset_date.

Technical Selection:

PostgreSQL: Primary source of truth for Users/Preferences. JSONB allows for dynamic OpenAI model settings.

PostgreSQL (Sharded) or Cassandra: For Quota archival/billing history.

Distribution Logic: Shard by user_id to ensure all data for a single user resides on the same partition.
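Sharding by user_id could look something like the following. The shard count of 64 and the use of SHA-256 with a modulo are assumptions made for illustration; the text does not specify a hashing scheme.

```python
import hashlib
import uuid

NUM_SHARDS = 64  # assumed value, not from the original design


def shard_for(user_id: uuid.UUID, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard index in [0, num_shards)."""
    # Hash the raw UUID bytes so the result is stable across processes
    # (Python's built-in hash() is not guaranteed to be).
    digest = hashlib.sha256(user_id.bytes).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


user = uuid.UUID("12345678-1234-5678-1234-567812345678")
print(shard_for(user))
```

Plain modulo hashing remaps most keys when the shard count changes; consistent hashing or a lookup table of virtual shards are the usual ways to soften that, at the cost of extra machinery.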

Cache

Purpose & Justification:

Session Cache: Stores session tokens (UUID) mapping to User Metadata. Essential for <5ms auth checks.

Quota Cache: Stores "Current Balance". Atomic DECR in Redis is significantly faster than SQL updates.

Key-Value Schema:

sess:{token} -> {user_id, device_type, expiry} (TTL: 30 days).

quota:{user_id} -> {remaining_tokens} (TTL: Until month-end).

Failure Handling: Use Redis Sentinel or Cluster for high availability.
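The session schema above can be modeled in memory to show how lookup and "logout from all devices" might work together. The per-user index key `user_sessions:{user_id}` is an assumption added here so that revocation does not require scanning every session key; plain dictionaries stand in for Redis.

```python
import time
import uuid

SESSION_TTL_SECONDS = 30 * 24 * 3600  # 30 days, matching the TTL in the schema


class SessionStore:
    def __init__(self, now=time.time):
        self._kv = {}        # "sess:{token}" -> session dict
        self._by_user = {}   # "user_sessions:{user_id}" -> set of tokens (assumed index)
        self._now = now

    def login(self, user_id, device_type):
        """Create a session for one device and return its token."""
        token = str(uuid.uuid4())
        self._kv[f"sess:{token}"] = {
            "user_id": user_id,
            "device_type": device_type,
            "expiry": self._now() + SESSION_TTL_SECONDS,
        }
        self._by_user.setdefault(f"user_sessions:{user_id}", set()).add(token)
        return token

    def lookup(self, token):
        """Return the session, or None if it is unknown or expired."""
        session = self._kv.get(f"sess:{token}")
        if session is None or session["expiry"] <= self._now():
            return None
        return session

    def logout_all(self, user_id):
        """Revoke every session of the user; return how many were removed."""
        tokens = self._by_user.pop(f"user_sessions:{user_id}", set())
        for token in tokens:
            self._kv.pop(f"sess:{token}", None)
        return len(tokens)
```

In Redis the index would likely be a set, and the delete of the index plus the session keys would need to run in a transaction or script to avoid leaving a session behind.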

Wrap Up

Advanced Topics

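Trade-offs: In normal operation the quota path stays strongly consistent through atomic Redis operations. Only in extreme failure scenarios do we choose Availability over Consistency (AP) for the quota system: it is better to let a user get a free request than to block a paying user because the counter service is slow or unreachable.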

Reliability: Exponential backoff on the client side for quota check failures.

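Security: Multi-device login is managed by storing a unique session_id in the database; during a "Logout All" event, we purge every session key belonging to the user in Redis. Because sessions are keyed as sess:{token}, this requires either a per-user index of tokens or a key layout such as sess:{user_id}:{session_id}, so that the purge does not need a full key scan.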

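Optimization: For extremely high-volume API keys, use Local In-Memory Buffering. Aggregate 100 requests in the API Gateway and send a single "Decrement 100" command to the Quota Service to reduce network overhead. The cost is precision: each gateway instance can overshoot the limit by up to its unflushed batch, so this suits soft limits better than strictly enforced ones.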
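The local buffering idea might be sketched like this. `BufferedQuotaClient` and the `send_decrement(api_key, amount)` callback are hypothetical names; the sketch is single-threaded and omits the time-based flush a real gateway would also need so that small residual counts are not held indefinitely.

```python
from collections import defaultdict


class BufferedQuotaClient:
    """Accumulates usage per API key and sends it in batches."""

    def __init__(self, send_decrement, batch_size=100):
        self._send = send_decrement      # callable(api_key, amount), assumed interface
        self._batch_size = batch_size
        self._pending = defaultdict(int)

    def record(self, api_key, cost=1):
        """Count one request locally; flush once the batch is full."""
        self._pending[api_key] += cost
        if self._pending[api_key] >= self._batch_size:
            self.flush(api_key)

    def flush(self, api_key):
        """Send whatever is buffered for this key as a single decrement."""
        amount = self._pending.pop(api_key, 0)
        if amount:
            self._send(api_key, amount)
```

With a batch size of 100, 250 requests for one key would produce two "Decrement 100" calls and leave 50 buffered until the next flush.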