The Question
Design

Scalable URL Shortening Service Design

Design a high-scale URL shortening service similar to Bitly or TinyURL. The system must support at least 100 million new URLs per day and handle 10 billion redirections daily with extremely low latency. You need to address unique ID generation without collisions, efficient storage of hundreds of billions of records, and a caching strategy for viral links. Consider how the system would handle custom short-links and automatic link expiration, while ensuring high availability even during regional failures.
NoSQL
DynamoDB
Cassandra
Redis
Zookeeper
Base62
Bloom Filter
Anycast
CDC
L7 Load Balancer
Questions & Insights

Clarifying Questions

What is the expected scale of the system? (Assumption: 100 million new URLs generated per day, with a 100:1 read-to-write ratio, resulting in 10 billion daily redirections).
What is the desired length and character set of the shortened URL? (Assumption: 7 characters using Base62 [a-z, A-Z, 0-9], which provides 62^7 \approx 3.5 trillion unique combinations).
Should the system support custom aliases and expiration dates? (Assumption: Yes, custom aliases are allowed if available; default expiration is 10 years).
Is low latency a priority for redirection? (Assumption: Yes, redirection must happen within < 10ms at the edge/service layer).

Thinking Process

How do we generate unique, non-colliding short IDs at scale? Use a distributed unique ID generator or a central counter with range-based allocation to avoid collisions without high-latency coordination.
How do we handle the 100:1 read-to-write ratio? Implement a multi-layered caching strategy (CDN + Distributed Cache) to serve popular "hot" URLs and minimize database lookups.
How do we store 300+ billion records efficiently? Use a NoSQL Key-Value store (e.g., DynamoDB or Cassandra) partitioned by the shortened ID to ensure linear scalability and high write throughput.
How do we ensure the system is globally performant? Deploy the redirection service across multiple regions with Anycast IP routing to direct users to the nearest point of presence.

Bonus Points

Bloom Filters: Implement Bloom filters in front of the cache/DB to instantly reject requests for non-existent short IDs, preventing "cache miss storms" and unnecessary storage I/O.
Global Footprint: Use Geohashing or Latency-based routing via Route53 to ensure the 302/301 redirect originates from the region closest to the user.
Write-Back Cache: For extremely high-volume custom URL requests, use a write-back strategy to acknowledge the user quickly while persisting to the database asynchronously.
Cold Storage Tiering: Move expired or non-accessed URLs (older than 2 years) to cheaper object storage (S3) while keeping a metadata index for rare "resurrection" requests.
Design Breakdown

Functional Requirements

Core Use Cases:
Shorten a long URL into a unique 7-character link.
Redirect users from a short URL to the original long URL (HTTP 301/302).
Allow users to specify a custom alias (if available).
Scope Control:
In-Scope: Shortening, Redirection, Custom Aliases, High-scale Read/Write.
Out-of-Scope: User accounts/auth (for MVP), detailed click analytics, link editing, private/password-protected links.

Non-Functional Requirements

Scale: Support 100M writes/day and 10B reads/day.
Latency: Redirection should be < 10ms for cached items; shortening < 200ms.
Availability & Reliability: 99.99% availability (High availability is critical; if redirection is down, the link is broken).
Consistency: Eventual consistency is acceptable for new links; custom aliases require strong consistency to prevent duplicates.
Fault Tolerance: No single point of failure in ID generation or storage.

Estimation

Traffic Estimation:
Write QPS: 100,000,000 / 86,400 \approx 1,157 requests/sec.
Read QPS: 10,000,000,000 / 86,400 \approx 115,740 requests/sec.
Storage Estimation:
Record size: Short ID (7B) + Long URL (avg 500B) + CreatedAt (8B) + Expiry (8B) \approx 523 bytes.
Daily storage: 100M \times 523B \approx 52 GB/day.
10-year storage: 52 GB \times 3,650 \approx 190 TB.
Bandwidth Estimation:
Incoming (Writes): 1.2k \times 500B \approx 600 KB/s.
Outgoing (Reads): 115k \times 500B \approx 57.5 MB/s.

Blueprint

Concise Summary: A high-throughput redirection engine leveraging a distributed ID generator and a partitioned NoSQL store, optimized with an aggressive caching layer.
Major Components:
API Gateway: Handles rate limiting and routes requests to Shortening or Redirection services.
ID Generation Service: Manages ranges of unique IDs using a distributed coordinator (Zookeeper) to ensure no collisions.
Shortening Service: Logic for mapping long URLs to IDs and handling custom aliases.
Redirection Service: High-speed lookup service that converts short IDs to original URLs.
Metadata Store: Highly scalable NoSQL database for mapping short IDs to original URLs.
Cache: Redis-based cluster to store frequently accessed mappings.
Simplicity Audit: This design avoids complex message queues or batch processing for the core loop, focusing purely on the critical path of mapping A \rightarrow B.
Architecture Decision Rationale:
NoSQL (DynamoDB/Cassandra): Best for key-value access patterns at 100TB+ scale; provides predictable performance.
Range-based ID Gen: Avoids the bottleneck of a single database auto-incrementing ID.
Cache-Aside Pattern: Reduces DB load significantly for viral links (80/20 rule).

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing
Anycast IP: Distribute traffic to the nearest regional Load Balancer.
CDN (Global): While long URLs are dynamic, the redirection response (301 Moved Permanently) can be cached at the CDN level for highly viral links to offload the origin.
Security & Perimeter
Rate Limiting: IP-based and User-based limiting at the Gateway to prevent scraping and DOS.

Service

Topology & Scaling
Stateless services deployed in multiple Availability Zones (AZs) using Kubernetes.
Scaling triggered by CPU utilization and Request Count (QPS).
API Schema Design
POST /api/v1/shorten:
Request: {long_url: string, custom_alias: string?, expire_at: timestamp?}
Response: {short_url: string}
GET /{short_id}:
Response: HTTP 301/302 Redirect.
ID Generation Logic
Range-based Counter: Each service instance requests a range (e.g., 1,000,000 IDs) from Zookeeper. It increments locally. When the range is exhausted, it requests a new one. This ensures zero coordination for most requests.
Encoding: The counter (Long/Int64) is converted to Base62 (e.g., ID 1234567\rightarrowa1B2c3d).

Storage

Access Pattern
Read:Write ratio is 100:1.
Lookup by short_id (Primary Key).
Database Table Design
Table: URL_Mapping
short_id (Partition Key, String)
original_url (String)
created_at (Timestamp)
expiry_at (Timestamp, TTL Index)
user_id (String - for future analytics)
Technical Selection
Amazon DynamoDB or Apache Cassandra: Chosen for sub-10ms performance at any scale and built-in TTL (Time To Live) support to automatically delete expired URLs.
Distribution Logic
Hashing the short_id ensures uniform distribution across partitions, avoiding hot spots.

Cache

Purpose & Justification: Reduces latency from ~20ms (DB) to <2ms (Cache) and protects the DB from 115k QPS.
Key-Value Schema:
Key: url:{short_id}
Value: original_url (String)
TTL: 24 hours (with LRU eviction).
Failure Handling: If Redis is down, traffic falls back to the NoSQL DB. Bloom filters (kept in memory on the service layer) prevent a flood of DB hits for non-existent keys.
Wrap Up

Advanced Topics

Trade-offs (Consistency vs Availability): According to the CAP theorem, we prioritize Availability for redirection. If a newly created link takes 1 second to propagate to a regional cache, it is acceptable. However, Custom Aliases require a conditional write to the DB (Strong Consistency) to ensure uniqueness.
Reliability & Failure Handling:
Zookeeper Cluster: If Zookeeper is down, services use their remaining cached ID ranges.
Database Replication: Multi-region replication (Global Tables) to survive regional outages.
Bottleneck Analysis: The ID generator range allocation could be a bottleneck if ranges are too small. We will set range sizes to support 1 hour of traffic per service instance.
Security & Privacy:
Phishing Detection: Integrate with Google Safe Browsing API during the shorten process to block malicious URLs.
ID Predictability: To prevent people from "guessing" URLs, we can append a random bit or shuffle the counter mapping (Feistel Cipher) before Base62 encoding to make IDs look non-sequential.