The Question
Design

Scalable User Profile and Quota System

Design a high-performance system to manage millions of user accounts, including basic metadata, persistent settings, multi-device session tracking, and real-time API usage limit enforcement.
Redis
PostgreSQL
Lua Script
Token Bucket
Microservices
Questions & Insights

Clarifying Questions

Scale and Traffic: What is the expected Scale? (Assumption: 100M total users, 10M DAU, and a peak of 50k QPS for profile lookups/quota checks).
Consistency Requirements: Is strict consistency required for all data, or just for API Quota? (Assumption: Strong consistency for API Quotas to prevent over-usage; Eventual consistency for UI preferences).
Session Management: Should we support remote logout for multi-device management? (Assumption: Yes, we need to track active session IDs per user to allow revocation).
Quota Granularity: Is the quota per-minute (Rate Limiting) or per-month (Billing)? (Assumption: Both. We need to handle sliding window rate limits and monthly token buckets).

Thinking Process

The Core Bottleneck: User profile and quota lookups are on the "Critical Path" for every single LLM request. High latency here degrades the entire user experience.
Strategy Path:
How to achieve sub-millisecond lookups for auth/quota? (Redis-first approach).
How to handle atomic quota decrements at scale? (Lua scripting in Redis).
How to ensure multi-device security? (Centralized Session Store vs. Stateless JWT).
How to persist structured metadata and preferences? (Relational DB for ACID compliance on user accounts).

Bonus Points

Write-Back Caching: Use a "Write-around" strategy for preferences but a "Write-through" for quotas to ensure durability without sacrificing latency.
Global Data Locality: Implement "Cell-based Architecture" or Global Tables (e.g., Spanner/DynamoDB Global Tables) to keep user data close to the inference edge.
Token Bucket Optimization: Implement the "Generic Cell Rate Algorithm" (GCRA) for sophisticated rate limiting instead of simple counters.
Negative Caching: Cache the "Non-existence" of users or blocked accounts to prevent DB hammering during brute-force or bot attacks.
Design Breakdown

Functional Requirements

Store and retrieve basic user info (email, name, tier).
Manage multi-device login sessions (active devices, last login, logout all).
Store and update user-specific configurations (model preferences, system prompts, theme).
Real-time API Quota tracking and enforcement (token usage).

Non-Functional Requirements

Latency: Profile/Quota checks must be < 20ms.
Availability: 99.99% (Users cannot use ChatGPT if the profile system is down).
Scalability: Support 100M+ users.
Accuracy: Quota enforcement must be precise to avoid financial loss or resource abuse.

Estimation

Storage: 100M users * 5KB (metadata + prefs) = 500GB (Easily fits in a single managed SQL instance, but sharded for availability).
Read Throughput: 10M DAU -> ~100k peak QPS for profile reads.
Write Throughput: Quota updates occur on every request. If avg user makes 50 requests/day -> 500M writes/day (~6k TPS average, 30k peak).
Cache Size: 10% active users in cache = 1M users * 2KB = 2GB (Very small memory footprint).

Blueprint

Concise Summary: A microservices-based architecture centered around a Profile Service that orchestrates data between a high-speed Redis cluster (for sessions and quotas) and a partitioned PostgreSQL cluster (for persistent metadata).
Major Components:
API Gateway: Handles authentication and acts as the enforcement point for quotas.
Profile Service: The core logic layer for CRUD operations on user data and preferences.
Session & Quota Store (Redis): Distributed cache for atomic quota increments and active session tracking.
User Metadata DB (PostgreSQL): The source of truth for user identity and persistent configurations.
Simplicity Audit: This design avoids complex event-streaming or specialized graph databases. It relies on proven relational storage for consistency and in-memory stores for performance.
Architecture Decision Rationale:
Why this architecture?: Separating the "Fast Path" (Quota/Auth) from the "Standard Path" (Metadata/Prefs) ensures that slow DB queries never block the LLM inference pipeline.
Functional Satisfaction: Covers the full lifecycle from login (Sessions) to usage (Quota) to customization (Prefs).
Non-functional Satisfaction: High availability via DB replication and low latency via Redis.

High Level Architecture

Sub-system Deep Dive

Service

Topology & Scaling: The Profile Service is a stateless Golang/Rust service deployed in Kubernetes, scaled horizontally based on CPU/Request count.
API Spec:
GET /v1/user/profile: Fetches basic info + preferences.
POST /v1/user/preferences: Updates UI/Model settings.
POST /v1/internal/quota/consume: Atomic decrement of tokens.
DELETE /v1/sessions/:id: Revokes a specific device session.

Storage

Data Model:
Users table: user_id (PK), email, password_hash, tier_id, created_at.
Preferences table: user_id (FK), key, value (JSONB).
Sessions table: session_id, user_id, device_info, last_active.
Database Logic: PostgreSQL with horizontal sharding by user_id. JSONB is used for preferences to allow flexibility in UI features without schema migrations.

Cache

Data Structures:
Quota: Redis Strings with INCRBY (Token Bucket). Key: quota:{user_id}:{period}.
Sessions: Redis Sets. Key: sessions:{user_id} containing a list of valid session_ids.
Profile Cache: Redis Hash. Key: user:{user_id} with TTL of 1 hour.
Eviction: Least Recently Used (LRU) for profile cache; Quotas have TTLs matching their period (e.g., 1 minute or 1 month).
Wrap Up

Advanced Topics

Monitoring:
Prometheus metrics for cache_hit_ratio and db_connection_pool saturation.
Tracing (OpenTelemetry) for profile fetch latency.
Trade-offs: We choose Availability over Consistency (AP) for preferences (it's okay if a theme change takes 1s to propagate), but Consistency over Availability (CP) for Quotas using Redis-Raft or synchronous DB writes for billing.
Bottlenecks: Redis becomes a single point of failure. Optimization: Use Redis Cluster with Sentinel for high availability and failover.
Failure Handling: If Redis is down, the system falls back to the PostgreSQL DB for quota checks (Degraded mode: Higher latency but functional).
Alternatives: Could use DynamoDB instead of Postgres + Redis to reduce operational overhead, but Postgres provides better complex querying for account management.