DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.
DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.DowngradedOur downstream service providers are currently experiencing outages, and our engineering team is actively working on a resolution. Some services—including the Solver, Partner, and Tools—are temporarily degraded with higher latency and lower bandwidth. Rest assured, Intervipedia, Solutions, and the Question Bank features are not impacted and remain fully operational.
The Question
Design

Scalable URL Shortening Service

Design a global URL shortening service similar to Bitly. The system must support generating shortened aliases for long URLs and perform high-speed redirections. Consider a scale of 100 million new URLs per month and 10 billion redirections. Address the challenges of unique ID generation without collisions, low-latency redirection, and how to handle database growth over a 5-year period while maintaining high availability.
NoSQL
DynamoDB
Cassandra
Redis
ZooKeeper
CDN
Base62
Bloom Filter
API Gateway
Questions & Insights

Clarifying Questions

Scale & Traffic: What is the expected volume of new URLs created per month and the number of redirections (reads)?
Key Constraints: What is the desired length of the shortened URL? Should they be human-readable or random?
Features: Do we need to support custom aliases, link expiration, or private links?
Analytics: Is real-time click-tracking/analytics required for the MVP, or can we stick to basic redirection?
Availability vs. Consistency: In the event of a network partition, do we prioritize being able to create new links or ensuring that a newly created link is immediately available globally?
Assumptions for this design:
Scale: 100M new URLs/month, 10B redirections/month (100:1 read/write ratio).
Storage: 5-year retention, resulting in ~6 billion records.
Latency: Redirection should be < 20ms at the tail (p99).
Availability: 99.99% (highly available redirection).

Thinking Process

Core Bottleneck: Generating unique, non-colliding short IDs at scale without introducing a single point of failure or high latency during the write path.
Efficiency: How do we handle 10B redirects monthly without hammering the database? (Caching strategy).
Scale: How do we store 6B records efficiently while maintaining fast lookups?
Progressive Logic:
How do we generate a unique 7-character string? (Base62 encoding + ID Generation).
How do we prevent ID collisions during concurrent writes? (Key Generation Service/KGS).
How do we optimize the read path (Redirection) for global users? (CDN/Cache).
How do we shard the data to ensure the system scales horizontally?

Bonus Points

Key Generation Service (KGS): Using a dedicated service with pre-generated keys in a "range-based" allocation via ZooKeeper to eliminate database round-trips for ID uniqueness.
Bloom Filters: Implementing Bloom Filters in the cache layer to quickly identify non-existent short URLs, preventing "cache-miss attacks" on the database.
Geo-Proximity Routing: Utilizing Anycast IP or Latency-based DNS to route users to the nearest PoP (Point of Presence) for redirection.
Cold Storage Migration: Strategy for moving expired or "dead" links (no hits in 1 year) to cheaper S3-based cold storage to keep the hot database size manageable.
Design Breakdown

Functional Requirements

Core Use Cases:
Shorten a long URL to a unique short alias (e.g., tiny.url/abc123).
Redirect a short URL to the original long URL with minimum latency (301/302 Redirect).
Scope Control:
In-scope: Shortening, redirection, basic expiration.
Out-of-scope: Detailed user analytics dashboard, custom UI, link editing, malware scanning of destination URLs.

Non-Functional Requirements

Scale: Must handle 4,000+ RPS (Read) and 40+ RPS (Write) on average, with 10x peaks.
Latency: Redirection (Read) must be extremely fast (<20ms).
Availability: High availability is critical; if the service is down, all links are broken.
Consistency: Eventual consistency is acceptable; it is okay if a newly created link takes 1-2 seconds to be "visible" globally.
Fault Tolerance: No single point of failure in ID generation or storage.

Estimation

Traffic Estimation:
Writes: 100M / (30 days 24h 3600s) ≈ 40 URLs/sec.
Reads: 10B / (30 days 24h 3600s) ≈ 4,000 URLs/sec.
Storage Estimation:
Record size: URL ID (8 bytes) + Long URL (500 bytes) + CreatedAt (4 bytes) + Expiry (4 bytes) ≈ 520 bytes.
Total Storage (5 years): 6B records * 520 bytes ≈ 3.12 TB.
Bandwidth Estimation:
Incoming (Write): 40 * 520 bytes ≈ 21 KB/s.
Outgoing (Read): 4,000 * 520 bytes ≈ 2 MB/s.

Blueprint

Concise Summary: A microservices-based architecture utilizing a Key Generation Service for ID assignment and a distributed NoSQL store for high-performance lookups.
Major Components:
API Gateway: Handles rate limiting, authentication, and request routing.
URL Write Service: Manages the creation of short URLs and interacts with the KGS.
URL Read Service: Handles high-volume redirection lookups.
Key Generation Service (KGS): A stateful service that provides unique IDs to the Write Service using a range-based approach.
Redis Cache: Stores hot URL mappings to minimize database load.
NoSQL DB (DynamoDB/Cassandra): Persistent store for URL mappings, chosen for horizontal scalability.
Simplicity Audit: This design avoids complex distributed locking by using a range-based ID generator (KGS), making it highly performant and easy to scale.
Architecture Decision Rationale:
Why NoSQL?: Our data is simple (Key-Value), and we need to scale to billions of records. NoSQL offers better horizontal scaling than RDBMS for this specific schema.
Functional Requirement Satisfaction: KGS ensures unique IDs; Cache + NoSQL ensures fast redirection.
Non-functional Requirement Satisfaction: Multi-node deployment ensures High Availability; Redis ensures low Latency.

High Level Architecture

Sub-system Deep Dive

Edge (Optional)

Content Delivery & Traffic Routing:
Use a global CDN (Cloudflare/Fastly) to cache the redirection logic at the edge for top-tier "viral" links.
Latency-based DNS routing to direct users to the nearest regional data center.
Security & Perimeter:
API Gateway handles Rate Limiting (e.g., 100 requests/min per IP for creation) to prevent exhaustion attacks.
SSL Termination at the Load Balancer.

Service

Topology & Scaling:
Stateless services (Read/Write) deployed across multiple Availability Zones (Multi-AZ).
Auto-scaling based on CPU and Request Count.
API Schema Design:
POST /v1/shorten: Shortens a URL. Protocol: REST. Returns short_url.
GET /{short_id}: Redirects to long URL. Returns 301 (Permanent Redirect).
Resilience & Reliability:
Implement retries with exponential backoff for KGS requests.
Circuit breakers on the Database connection to prevent cascading failures.

Storage

Access Pattern: 100:1 Read/Write ratio. Primary key lookup is the dominant pattern.
Database Table Design:
short_id (String, Partition Key)
long_url (String)
user_id (String, Global Secondary Index)
created_at (Timestamp)
expiry_at (Timestamp)
Technical Selection: DynamoDB or Cassandra.
Rationale: Predictable performance at scale, managed (if DynamoDB), and optimized for key-value lookups.
Distribution Logic:
Hash-based sharding on short_id ensures even distribution across the cluster.

Cache

Purpose & Justification: Reduces database load for viral links.
Key-Value Schema: Key: short_id, Value: long_url. TTL: 24 hours (sliding window for active links).
Failure Handling: If Redis is down, fall back to Database. Use "Cache Aside" pattern.

Infrastructure (Optional)

Distributed Coordination:
ZooKeeper: Used by the KGS to manage ID ranges. Each KGS instance grabs a range of IDs (e.g., 1 million IDs at a time) to avoid constant locking of the database.
Wrap Up

Advanced Topics

Trade-offs: We choose Eventual Consistency for link creation across regions to ensure high availability for the 99.9% of users who are just clicking links.
Key Generation Alternatives:
UUID: Too long (36 chars), bad for "tiny" URLs.
MD5/SHA: Requires collision handling (Check-then-insert), which is slow at scale.
KGS (Range-based): Selected because it is the most efficient for generating short, unique, sequential-looking strings using Base62.
Security:
Short IDs are generated from a random range to prevent users from "guessing" the next URL in the sequence.
Bottleneck Analysis:
The KGS range allocation in ZooKeeper could be a bottleneck if ranges are too small. We will set large ranges (1M) to minimize coordination frequency.