The Question

Scalable URL Shortener Design

Design a high-performance URL shortening service like Bitly. The system must support generating shortened URLs from long links, custom aliases, and expiration policies. On access, the short link must redirect to the original URL with a P99 latency under 100ms. The system should handle 100 million new links per month and a 100:1 read/write ratio. Key requirements include high availability, link durability, abuse prevention (enumeration defense), and a mechanism for capturing basic click analytics without impacting redirection speed.

NoSQL

DynamoDB

Redis

Kafka

Base62 Encoding

API Gateway

Bloom Filter

Anycast DNS

CDC

Questions & Insights

Clarifying Questions

What is the expected scale of the system? (Assumption: 100 million new URLs per month, with a 100:1 read-to-write ratio, totaling ~4,000 QPS for reads and ~40 QPS for writes).

What is the character set and length for the short URL? (Assumption: Base62 [a-zA-Z0-9], 7 characters long, providing

62^7 \approx 3.5

trillion unique combinations).

Should redirects be 301 (Permanent) or 302 (Found)? (Assumption: 302 Found to ensure all clicks hit our servers for accurate analytics collection, though 301 is better for SEO).

What is the default expiration for URLs? (Assumption: Default 5 years, but users can specify shorter durations).

Are custom aliases required to be globally unique? (Assumption: Yes, custom aliases must be unique and checked against the existing short URL space).

Thinking Process

Core Bottleneck: The system is read-heavy. Success depends on ultra-low latency redirection and efficient mapping of a short ID to a long URL.

Progressive Logic:

How do we generate unique, non-predictable short IDs at scale?

How do we handle the massive read volume while maintaining <50ms latency?

How do we store billions of mappings durably without performance degradation?

How do we capture analytics without slowing down the user's redirection experience?

Bonus Points

Bloom Filters: Use Bloom Filters in the Write path to quickly check if a custom alias exists before hitting the database, reducing unnecessary I/O.

Geographic Routing: Use Anycast IP or Latency-based DNS routing to direct users to the nearest Point of Presence (PoP) for the fastest possible TCP/TLS handshake.

Write-around Cache Strategy: Since most newly created URLs aren't clicked immediately, use a "write-around" strategy to avoid polluting the cache with one-time write data.

Consistent Hashing: Use consistent hashing for the caching tier to minimize cache misses during node scaling.

Design Breakdown

Functional Requirements

Core Use Cases:

Shorten a long URL to a unique 7-character short link.

Support custom user-defined aliases (e.g., bit.ly/my-promo).

Redirect users from short URL to long URL with low latency.

Set expiration times for links.

Scope Control:

In-scope: Shortening, redirection, basic analytics (click counts), custom aliases.

Out-of-scope: User accounts/auth (for MVP), advanced geo-analytics, link editing/management UI.

Non-Functional Requirements

Scale: Support 100M writes/month and 10B reads/month.

Latency: Redirects must be <100ms (P99).

Availability: 99.99% availability (High availability is critical; if the shortener is down, the link is broken).

Consistency: Strong consistency for link creation (no duplicate short IDs); eventual consistency for analytics.

Fault Tolerance: No single point of failure; data must be replicated across zones.

Security: Prevent URL enumeration (crawling all links) by using non-sequential IDs.

Estimation

Traffic Estimation:

Write QPS:

100M / (30 \times 24 \times 3600) \approx 40

requests/sec.

Read QPS:

40 \times 100 = 4,000

requests/sec.

Peak Read QPS:

4,000 \times 2 = 8,000

requests/sec.

Storage Estimation:

5 years of data:

100M \times 12 \times 5 = 6

Billion records.

Per record: Long URL (500B) + Short URL (10B) + Metadata (90B)

\approx 600

bytes.

Total Storage:

6B \times 600B \approx 3.6

TB.

Bandwidth Estimation:

Incoming (Writes):

40 \times 600B \approx 24

KB/sec.

Outgoing (Reads):

4,000 \times 600B \approx 2.4

MB/sec.

Blueprint

Concise Summary: A distributed URL shortening service utilizing a NoSQL store for high-scale persistence and a Redis caching layer for rapid redirection.

Major Components:

API Gateway: Handles rate limiting and request routing.

URL Service: Stateless service for creating links and looking up redirects.

ID Generator: Provides unique, non-sequential short IDs.

NoSQL Database: Stores the primary mapping of short_id to long_url.

Redis Cache: Stores hot-link mappings to minimize DB lookups.

Message Queue: Decouples analytics processing from the critical redirection path.

Simplicity Audit: This architecture uses standard, proven components. We avoid complex distributed locking by leveraging NoSQL's atomic "put-if-absent" for custom aliases and a pre-generated ID pool.

Architecture Decision Rationale:

Why this architecture?: Distributed NoSQL (e.g., DynamoDB/Cassandra) handles the 3.6TB storage and 4k QPS easily with horizontal scaling.

Functional Satisfaction: Meets all requirements including custom aliases and expiration via TTL features.

Non-functional Satisfaction: Redis ensures <10ms internal latency, and the async analytics pipeline ensures redirects are not delayed by logging.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing: Global Load Balancing (GSLB) routes users to the nearest regional cluster.

Security & Perimeter:

API Gateway: Implements JWT validation (if auth is added) and strict rate-limiting per IP to prevent scraping.

WAF: Protects against common web attacks and malicious bots.

Service

Topology & Scaling: Stateless URL Service deployed across multiple Availability Zones (AZs). Scaling is based on CPU and Request Count.

API Schema Design:

POST /v1/shorten:

Req: {long_url: string, alias?: string, expire_at?: timestamp}

Resp: {short_url: string}

GET /{short_id}:

Response: 302 Redirect to long_url.

Resilience & Reliability: Circuit breakers on the database and cache connections. If Redis is down, the service falls back directly to the NoSQL DB.

Storage

Access Pattern:

Write: Low frequency, high durability required.

Read: High frequency, low latency key-lookup.

Database Table Design:

short_id (String, Partition Key)

long_url (String)

created_at (Timestamp)

expire_at (Timestamp, TTL Index)

user_id (String, Optional)

Technical Selection: Amazon DynamoDB or Cassandra.

Rationale: DynamoDB provides single-digit millisecond latency at scale and includes a built-in TTL feature for automatic record expiration.

Distribution Logic: Partitioned by short_id hash to ensure even distribution across the cluster.

Cache

Purpose & Justification: Reduces read latency for "viral" or hot links.

Key-Value Schema:

Key: url:{short_id}

Value: {long_url}

TTL: 24 hours (LRU eviction handles the rest).

Technical Selection: Redis.

Failure Handling: If a cache miss occurs, the URL service queries the DB and populates the cache (Lazy Loading).

Messaging

Purpose & Decoupling: Offloads analytics (click count, user agent, referrer) from the request-response cycle.

Event Schema: {short_id: string, timestamp: long, ip: string, user_agent: string}.

Technical Selection: Kafka or AWS SQS.

Rationale: High throughput and persistence.

Data Processing

Processing Model: Simple consumer group (Analytics Workers) reading from the Message Queue.

Output Sinks: A time-series database or a relational DB for simple counting (e.g., PostgreSQL).

Technical Selection: Python/Go Workers.

Wrap Up

Advanced Topics

Trade-offs:

Consistency vs Availability: We choose Eventual Consistency for analytics to ensure high availability and low latency for the redirection.

301 vs 302: 302 is chosen to ensure every click is tracked at the server, though it adds one RTT compared to client-side caching of 301.

Reliability: Multi-AZ deployment of the DB and Cache.

ID Generation:

Alternative 1: Base62(Counter). Problem: Predictable.

Alternative 2: Random String + DB check. Problem: High collision/latency.

Selected Strategy: Pre-generate blocks of unique IDs (using a distributed service like ZooKeeper or a centralized SQL sequence) and hand them to URL servers in chunks of 1000. Use a random offset or shuffle within the chunk to prevent easy enumeration.

Security: Use a 7-character ID to make the search space large (

3.5 \times 10^{12}

) to prevent brute-force crawling.