The Question

Scalable URL Shortening Service

Design a global URL shortening service similar to Bitly. The system must support generating shortened aliases for long URLs and perform high-speed redirections. Consider a scale of 100 million new URLs per month and 10 billion redirections. Address the challenges of unique ID generation without collisions, low-latency redirection, and how to handle database growth over a 5-year period while maintaining high availability.

NoSQL

DynamoDB

Cassandra

Redis

ZooKeeper

CDN

Base62

Bloom Filter

API Gateway

Questions & Insights

Clarifying Questions

Scale & Traffic: What is the expected volume of new URLs created per month and the number of redirections (reads)?

Key Constraints: What is the desired length of the shortened URL? Should they be human-readable or random?

Features: Do we need to support custom aliases, link expiration, or private links?

Analytics: Is real-time click-tracking/analytics required for the MVP, or can we stick to basic redirection?

Availability vs. Consistency: In the event of a network partition, do we prioritize being able to create new links or ensuring that a newly created link is immediately available globally?

Assumptions for this design:

Scale: 100M new URLs/month, 10B redirections/month (100:1 read/write ratio).

Storage: 5-year retention, resulting in ~6 billion records.

Latency: Redirection should be < 20ms at the tail (p99).

Availability: 99.99% (highly available redirection).

Thinking Process

Core Bottleneck: Generating unique, non-colliding short IDs at scale without introducing a single point of failure or high latency during the write path.

Efficiency: How do we handle 10B redirects monthly without hammering the database? (Caching strategy).

Scale: How do we store 6B records efficiently while maintaining fast lookups?

Progressive Logic:

How do we generate a unique 7-character string? (Base62 encoding + ID Generation).

How do we prevent ID collisions during concurrent writes? (Key Generation Service/KGS).

How do we optimize the read path (Redirection) for global users? (CDN/Cache).

How do we shard the data to ensure the system scales horizontally?

Bonus Points

Key Generation Service (KGS): Using a dedicated service with pre-generated keys in a "range-based" allocation via ZooKeeper to eliminate database round-trips for ID uniqueness.

Bloom Filters: Implementing Bloom Filters in the cache layer to quickly identify non-existent short URLs, preventing "cache-miss attacks" on the database.

Geo-Proximity Routing: Utilizing Anycast IP or Latency-based DNS to route users to the nearest PoP (Point of Presence) for redirection.

Cold Storage Migration: Strategy for moving expired or "dead" links (no hits in 1 year) to cheaper S3-based cold storage to keep the hot database size manageable.

Design Breakdown

Functional Requirements

Core Use Cases:

Shorten a long URL to a unique short alias (e.g., tiny.url/abc123).

Redirect a short URL to the original long URL with minimum latency (301/302 Redirect).

Scope Control:

In-scope: Shortening, redirection, basic expiration.

Out-of-scope: Detailed user analytics dashboard, custom UI, link editing, malware scanning of destination URLs.

Non-Functional Requirements

Scale: Must handle 4,000+ RPS (Read) and 40+ RPS (Write) on average, with 10x peaks.

Latency: Redirection (Read) must be extremely fast (<20ms).

Availability: High availability is critical; if the service is down, all links are broken.

Consistency: Eventual consistency is acceptable; it is okay if a newly created link takes 1-2 seconds to be "visible" globally.

Fault Tolerance: No single point of failure in ID generation or storage.

Estimation

Traffic Estimation:

Writes: 100M / (30 days 24h 3600s) ≈ 40 URLs/sec.

Reads: 10B / (30 days 24h 3600s) ≈ 4,000 URLs/sec.

Storage Estimation:

Record size: URL ID (8 bytes) + Long URL (500 bytes) + CreatedAt (4 bytes) + Expiry (4 bytes) ≈ 520 bytes.

Total Storage (5 years): 6B records * 520 bytes ≈ 3.12 TB.

Bandwidth Estimation:

Incoming (Write): 40 * 520 bytes ≈ 21 KB/s.

Outgoing (Read): 4,000 * 520 bytes ≈ 2 MB/s.

Blueprint

Concise Summary: A microservices-based architecture utilizing a Key Generation Service for ID assignment and a distributed NoSQL store for high-performance lookups.

Major Components:

API Gateway: Handles rate limiting, authentication, and request routing.

URL Write Service: Manages the creation of short URLs and interacts with the KGS.

URL Read Service: Handles high-volume redirection lookups.

Key Generation Service (KGS): A stateful service that provides unique IDs to the Write Service using a range-based approach.

Redis Cache: Stores hot URL mappings to minimize database load.

NoSQL DB (DynamoDB/Cassandra): Persistent store for URL mappings, chosen for horizontal scalability.

Simplicity Audit: This design avoids complex distributed locking by using a range-based ID generator (KGS), making it highly performant and easy to scale.

Architecture Decision Rationale:

Why NoSQL?: Our data is simple (Key-Value), and we need to scale to billions of records. NoSQL offers better horizontal scaling than RDBMS for this specific schema.

Functional Requirement Satisfaction: KGS ensures unique IDs; Cache + NoSQL ensures fast redirection.

Non-functional Requirement Satisfaction: Multi-node deployment ensures High Availability; Redis ensures low Latency.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing:

Use a global CDN (Cloudflare/Fastly) to cache the redirection logic at the edge for top-tier "viral" links.

Latency-based DNS routing to direct users to the nearest regional data center.

Security & Perimeter:

API Gateway handles Rate Limiting (e.g., 100 requests/min per IP for creation) to prevent exhaustion attacks.

SSL Termination at the Load Balancer.

Service

Topology & Scaling:

Stateless services (Read/Write) deployed across multiple Availability Zones (Multi-AZ).

Auto-scaling based on CPU and Request Count.

API Schema Design:

POST /v1/shorten: Shortens a URL. Protocol: REST. Returns short_url.

GET /{short_id}: Redirects to long URL. Returns 301 (Permanent Redirect).

Resilience & Reliability:

Implement retries with exponential backoff for KGS requests.

Circuit breakers on the Database connection to prevent cascading failures.

Storage

Access Pattern: 100:1 Read/Write ratio. Primary key lookup is the dominant pattern.

Database Table Design:

short_id (String, Partition Key)

long_url (String)

user_id (String, Global Secondary Index)

created_at (Timestamp)

expiry_at (Timestamp)

Technical Selection: DynamoDB or Cassandra.

Rationale: Predictable performance at scale, managed (if DynamoDB), and optimized for key-value lookups.

Distribution Logic:

Hash-based sharding on short_id ensures even distribution across the cluster.

Cache

Purpose & Justification: Reduces database load for viral links.

Key-Value Schema: Key: short_id, Value: long_url. TTL: 24 hours (sliding window for active links).

Failure Handling: If Redis is down, fall back to Database. Use "Cache Aside" pattern.

Infrastructure (Optional)

Distributed Coordination:

ZooKeeper: Used by the KGS to manage ID ranges. Each KGS instance grabs a range of IDs (e.g., 1 million IDs at a time) to avoid constant locking of the database.

Wrap Up

Advanced Topics

Trade-offs: We choose Eventual Consistency for link creation across regions to ensure high availability for the 99.9% of users who are just clicking links.

Key Generation Alternatives:

UUID: Too long (36 chars), bad for "tiny" URLs.

MD5/SHA: Requires collision handling (Check-then-insert), which is slow at scale.

KGS (Range-based): Selected because it is the most efficient for generating short, unique, sequential-looking strings using Base62.

Security:

Short IDs are generated from a random range to prevent users from "guessing" the next URL in the sequence.

Bottleneck Analysis:

The KGS range allocation in ZooKeeper could be a bottleneck if ranges are too small. We will set large ranges (1M) to minimize coordination frequency.