The Question
DesignGlobal Scalable URL Shortener
Design a high-performance, globally distributed URL shortening service. The system must support millions of concurrent redirects per second, manage a dataset of billions of records with high availability, and ensure that short URL IDs are generated uniquely and efficiently across multiple geographic regions without central bottlenecks.
DynamoDB
Redis
etcd
Base62 Encoding
GeoDNS
Bloom Filter
Anycast
Kubernetes
Questions & Insights
Clarifying Questions
What is the expected Read:Write ratio and total scale?
Assumption: 100:1 read-to-write ratio. Supporting 1M+ peak read QPS and 10k write QPS, with a total storage target of 100 billion URLs over 10 years.
What is the length requirement for the shortened URL?
Assumption: 7 characters using Base62 (a-z, A-Z, 0-9), providing 62^7 \approx 3.5 trillion unique combinations, which comfortably fits our 100B target.
Do we need to support custom aliases or expiration dates in the MVP?
Assumption: No custom aliases for the MVP to simplify ID generation; however, we will support optional expiration timestamps.
What are the geographic distribution requirements?
Assumption: Global users require sub-100ms redirect latency. Data must be replicated across at least three major geographic regions (NA, EU, APAC).
Are analytics (click tracking) required for the MVP?
Assumption*: Minimalist approach. Only the redirection is critical. Detailed analytics are out-of-scope** for the MVP to satisfy YAGNI.
Thinking Process
Core Bottleneck: Generating unique, non-colliding IDs at high velocity across multiple regions without a single point of contention.
Efficiency Point: How to handle 1M+ QPS on a budget? (Answer: Aggressive edge caching and optimized NoSQL point lookups).
Progressive Logic:
How do we generate IDs? (Distributed Range Allocator).
Where do we store the mappings? (Global NoSQL Table).
How do we ensure low latency? (Multi-layered Caching).
How do we route users? (Geo-DNS + Regional Load Balancing).
Bonus Points
Bloom Filters: Implement a Bloom Filter in the caching layer to prevent "Cache Miss + DB Miss" for non-existent short URLs, protecting the database from "Scraping" or "DDoS" on 404 paths.
Base62 Shuffling: Instead of sequential Base62 encoding (which makes URLs predictable), use a deterministic Feistel Cipher or a bit-shuffling step to make generated IDs appear random while remaining unique.
Cold Storage Migration: Implement a TTL-based archival strategy where URLs not accessed in 2 years are moved to cheaper storage (S3/Glacier) and deleted from the hot NoSQL index.
Design Breakdown
Functional Requirements
Core Use Cases:
Shorten URL: User submits a long URL and receives a 7-character short link.
Redirect URL: User visits a short link and is redirected (HTTP 301/302) to the original long URL.
Scope Control:
In-Scope: Shortening, redirection, global persistence, high-concurrency ID generation.
Out-of-Scope: User accounts, custom aliases, click analytics, UI/Frontend, API rate-limiting beyond basic protection.
Non-Functional Requirements
Scale: Handle 100B total records; 1M+ Read QPS; 10k Write QPS.
Latency: < 50ms for cache hits; < 200ms for DB lookups globally.
Availability & Reliability: 99.99% (SLA); No single point of failure (SPOF).
Consistency: Eventual consistency is acceptable for global replication (a delay of 1-2 seconds for a new link to work globally is fine).
Fault Tolerance: Regional failover capability; automated DB backups.
Estimation
Traffic Estimation:
Write QPS: 10,000/sec.
Read QPS: 1,000,000/sec.
Storage Estimation:
Record size: Short ID (7B) + Long URL (avg 500B) + Metadata (100B) \approx 600 bytes.
Total Storage (10 years, 100B records): 100 \times 10^9 \times 600 \text{ bytes} \approx 60 \text{ TB}.
Bandwidth Estimation:
Ingress (Writes): 10k \times 600 \text{ bytes} \approx 6 \text{ MB/s}.
Egress (Reads): 1M \times 600 \text{ bytes} \approx 600 \text{ MB/s}.
Blueprint
Concise Summary: A geo-distributed API layer backed by a Range-based ID Generator and a globally replicated NoSQL store (DynamoDB/Cassandra) with high-performance Redis caching.
Major Components:
ID Coordinator (Etcd): Manages numeric range allocations to ensure unique, non-overlapping short IDs across distributed nodes.
API Service: Stateless compute nodes that handle URL shortening (encoding) and redirection (decoding).
Redis Cache: Regional clusters to serve hot URLs with sub-millisecond latency.
NoSQL Store: Persistent, partitioned storage for billions of URL mappings.
Simplicity Audit: This design avoids complex "collision-retry" logic by using a pre-allocated range strategy, making the write path extremely fast and predictable.
Architecture Decision Rationale:
Why this?: Range-based allocation is superior to Snowflake IDs for URL shortening because it yields smaller, predictable-length strings.
Functional Satisfaction: Meets both shortening and redirect requirements with clear separation.
Non-functional Satisfaction: NoSQL scales horizontally for 60TB; Redis and Geo-DNS satisfy the "millions of redirects" requirement.
High Level Architecture
Sub-system Deep Dive
Edge (Optional)
Content Delivery & Traffic Routing:
Use Anycast IP or Latency-based DNS (Route53) to route users to the nearest regional data center.
L7 Load Balancer handles SSL termination and distributes traffic to API instances.
Security & Perimeter:
API Gateway provides basic rate limiting (e.g., 100 writes/min per IP) to prevent spamming the ID generator.
Service
Topology & Scaling:
Stateless API nodes deployed in Kubernetes (EKS/GKE) across 3 regions.
Scaling signal: CPU utilization and Request Count.
API Schema Design:
POST /v1/shorten: { "long_url": "string", "ttl": "int" } -> { "short_url": "http://tiny.url/abc1234" } (Idempotency: Client-side UUID).GET /{short_id}: Returns 301 Redirect to original URL.Resilience & Reliability:
API nodes cache a "range" of IDs locally (e.g., 10,000 IDs). If the Coordinator is down, the node can still function until its range is exhausted.
Storage
Access Pattern: 99% Read (Point lookup by
short_id). 1% Write (Put by short_id).Database Table Design:
short_id (String, PK): The 7-char hash.long_url (String): The destination.created_at (Timestamp).expires_at (Timestamp, Indexed for TTL).Technical Selection: Amazon DynamoDB (Global Tables) or Apache Cassandra.
Rationale: DynamoDB Global Tables provide multi-region active-active replication out-of-the-box, which is critical for low-latency global reads.
Distribution Logic:
Sharding key is
short_id. Since IDs are generated from a counter, we use a hash of the ID or a bit-shuffled version to ensure even distribution across partitions.Cache
Purpose & Justification: Handles the massive read throughput (1M+ QPS) that would otherwise melt the database.
Key-Value Schema:
Key:
url:{short_id}.Value:
long_url string.TTL: 24 hours (LRU eviction for older/inactive links).
Technical Selection: Redis Cluster.
Failure Handling: Cache-aside pattern. If Redis fails, API nodes fall back to the NoSQL store. Use a Bloom Filter (residing in Redis or local memory) to filter out 404s.
Infrastructure (Optional)
Distributed Coordination:
Etcd or Zookeeper maintains a global counter.
Logic: Node A asks for a range -> Etcd gives
[1,000,000 - 1,001,000] and increments global counter to 1,001,001.Observability:
Standard Prometheus/Grafana stack for monitoring QPS and P99 latency.
Wrap Up
Advanced Topics
Trade-offs: We choose Eventual Consistency for global replication. A user in London might shorten a URL, and a user in Tokyo might get a 404 for the first few hundred milliseconds until the DynamoDB Global Table replicates. This is acceptable for this use case.
Reliability: The Range Allocator prevents ID collisions. If the DB is slow, Redis absorbs the shock.
Bottleneck Analysis: The ID Coordinator (Etcd) is a potential bottleneck. However, by allocating large ranges (e.g., 100k IDs) to each API node, we reduce the frequency of calls to Etcd to once every few minutes per node.
Security: The "shuffled" ID generation prevents attackers from guessing sequential URLs and scraping the entire database.
Distinguishing Insights: To handle 10x Scale (10M QPS), we would move the "Redirect Logic" to the Edge (Cloudflare Workers / Lambda@Edge). This would move the logic even closer to the user and eliminate the trip to our regional Load Balancer.