The Question
Design

Scalable URL Shortener Design

Design a high-concurrency URL shortening service like Bitly. The system must support generating shortened URLs from long ones, custom aliases, and high-performance redirection. Address challenges regarding unique ID generation without collisions, handling massive read-heavy traffic (100:1 ratio), and efficient storage management for billions of records over multiple years. Consider trade-offs between different shortening algorithms and data consistency models.
NoSQL
Redis
DynamoDB
Base62
Zookeeper
API Gateway
Snowflake ID
Questions & Insights

Clarifying Questions

Traffic Scale: What is the expected volume of new URL creations and redirections?
Assumption: 100 million new URLs per month and a 100:1 read/write ratio (10 billion redirects/month).
Short Link Length: Are there specific requirements for the length or character set of the shortened URL?
Assumption: Approximately 7 characters using Base62 ([a-z, A-Z, 0-3]) to support ~3.5 \times 10^{12} combinations.
Custom Aliases & Expiration: Do we need to support user-defined short links or expiration dates?
Assumption: Yes, both are required for a standard MVP.
Analytics: Is real-time click tracking and geo-analytics required for the MVP?
Assumption: No, to follow YAGNI. We will focus on the core redirection engine and basic click counts.

Thinking Process

Core Bottleneck: The system is read-heavy. The primary challenge is minimizing redirection latency while ensuring unique ID generation without collisions at scale.
Key Progressive Questions:
How do we generate a unique, non-colliding short key efficiently? (Base62 encoding of a distributed counter vs. Hashing).
How do we ensure sub-10ms redirection latency for "hot" links? (Layered caching strategy).
How do we handle storage growth over 5-10 years? (NoSQL horizontal scaling vs. RDBMS sharding).
How do we handle custom aliases without introducing race conditions? (Atomic "put-if-absent" operations).

Bonus Points

Base62 vs. Hashing: Using a distributed ID generator (like Snowflake) + Base62 encoding is superior to MD5 hashing because it avoids the overhead of collision resolution (check-then-insert).
Zookeeper-based Range Allocation: For the ID generator, services can fetch ranges of IDs (e.g., 1-1000) from Zookeeper to generate IDs locally, reducing the pressure on a central database.
Cache Warming: Pre-populating the cache with frequently accessed URLs or using a "Lazy TTL" approach where high-traffic links never expire from the cache.
Security against Crawlers: Implementing "Honey-token" short URLs to detect and rate-limit scrapers trying to enumerate the entire URL space.
Design Breakdown

Functional Requirements

Core Use Cases:
Shorten a long URL to a unique short link.
Redirect a short link to the original long URL (302 Found).
Support custom short link aliases.
Set optional expiration times for links.
Scope Control:
In-scope: Shortening, Redirection, Custom Aliases, Expiration.
Out-of-scope: Detailed user analytics, UI/Dashboard, User accounts/Auth (handled by external IAM), Bulk URL shortening.

Non-Functional Requirements

Scale: 40 Writes/sec, 4,000 Reads/sec (Average); Peak is 5x.
Latency: Redirection should be < 100ms; Shortening should be < 200ms.
Availability & Reliability: 99.99% availability. Links must never be lost (high durability).
Consistency: Eventual consistency for analytics; Strong consistency for custom alias creation (prevent duplicates).
Fault Tolerance: No single point of failure; multi-AZ deployment.
Security & Privacy: Protection against URL brute-forcing and rapid-fire shortening (Rate limiting).

Estimation

Traffic Estimation:
Writes: 100M/month \approx 40 URLs/sec.
Reads: 10B/month \approx 4,000 URLs/sec.
Storage Estimation:
Average URL size: 500 bytes.
100M URLs/month \times 12 months \times 5 years = 6 Billion URLs.
Total Storage: 6B \times 500 bytes \approx 3 TB.
Bandwidth Estimation:
Write: 40 \times 500B \approx 20 KB/s.
Read: 4,000 \times 500B \approx 2 MB/s.

Blueprint

Concise Summary: A high-performance redirection engine using a distributed NoSQL store for durability and Redis for low-latency lookups.
Major Components:
API Gateway: Handles rate limiting and request routing.
Shortening Service: Manages ID generation, Base62 encoding, and storage logic.
NoSQL Database (DynamoDB/Cassandra): Provides a highly scalable Key-Value store for URL mappings.
Redis Cache: Stores hot URL mappings to ensure ultra-low latency for frequent redirects.
Simplicity Audit: This design avoids complex batch processing or message queues, focusing purely on the synchronous request-response flow required for the MVP.
Architecture Decision Rationale:
Why this architecture?: NoSQL is chosen over SQL because the data is naturally a Key-Value pair (ShortURL -> LongURL) and needs to scale horizontally without complex joins.
Functional Requirement Satisfaction: Handles core shortening and redirection with 0(1) lookup time.
Non-functional Requirement Satisfaction: Redis ensures latency targets, while NoSQL ensures multi-region availability and scalability.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing:
Use Global Server Load Balancing (GSLB) to route users to the nearest regional deployment.
Security & Perimeter:
API Gateway: Implements JWT validation (if auth is added) and Rate Limiting (e.g., 100 requests/min per IP) to prevent system abuse.
WAF: Standard protection against common web attacks.

Service

Topology & Scaling:
Stateless Services: The Shortening Service is stateless, allowing auto-scaling based on CPU/Request count.
Multi-AZ: Deployed across 3 Availability Zones to handle data center failures.
API Schema Design:
POST /v1/shorten:
Request: { "long_url": "string", "custom_alias": "string?", "expiry_delta": "int?" }
Response: { "short_url": "string" }
GET /{short_url}:
Returns 302 Redirect to the long URL.
Resilience & Reliability:
Retries: Shortening service retries DB writes with exponential backoff.
Circuit Breaker: If the DB is slow, the service fails fast to prevent resource exhaustion.
Observability:
Metrics: Monitor p99 latency for redirects and DB write errors.

Storage

Access Pattern: 100:1 Read/Write. Reads are by Primary Key (ShortCode).
Database Table Design:
URL_Mapping Table:
short_id (String, Partition Key): The Base62 encoded string.
long_url (String): The destination.
created_at (Timestamp).
expiry_at (Timestamp, TTL Index).
Technical Selection: Amazon DynamoDB or Cassandra.
Rationale: DynamoDB provides predictable performance at any scale and has built-in TTL support to automatically delete expired URLs without manual cleanup scripts.
Distribution Logic:
The short_id serves as a natural partition key, ensuring even distribution across the cluster.

Cache

Purpose & Justification: To reduce read latency and decrease the load on the NoSQL database for viral/popular links.
Key-Value Schema:
Key: url:{short_id}
Value: long_url string.
TTL: 24 hours (with LRU eviction).
Technical Selection: Redis.
Rationale: In-memory performance with support for cluster mode.
Failure Handling: If Redis is down, the system falls back to the NoSQL DB (Cache-aside pattern).
Wrap Up

Advanced Topics

Trade-offs (Consistency vs. Availability): We prioritize Availability for redirects. Using a 302 (Found) instead of a 301 (Permanent) allows us to capture click data on every visit and change the mapping if needed.
ID Generation (Alternative): We could use a Counter in a DB, but that creates a single point of failure. A Distributed ID Generator (Snowflake-like) or a Range-based ID service (using Zookeeper/Etcd) is more robust for high-scale environments.
Collision Handling: By using a distributed counter and Base62 encoding it, we mathematically guarantee no collisions, which is more efficient than hashing and checking for existence in the DB.
Hot Spot Mitigation: Viral links (e.g., a link shared on Twitter) could hit a single DB partition. Redis caches these links, effectively absorbing the "thundering herd."
Security: To prevent "enumeration attacks," we can use a small random offset in our ID generation so short codes aren't perfectly sequential.