CDN

A Content Delivery Network (CDN) is a geographically distributed network of proxy servers and data centers (Points of Presence or PoPs) designed to provide high availability and performance by distributing service spatially relative to end-users.

Cheat Sheet

Prime Use Case

Use when you have a global user base and need to minimize latency for static assets, large media files, or even dynamic content that can be cached at the edge.

Critical Tradeoffs

  • Reduced latency vs. Cache consistency challenges
  • Lower origin server load vs. Increased operational cost
  • Improved availability vs. Debugging complexity (black-box behavior)

Killer Senior Insight

Modern CDNs have evolved from simple static file caches into 'Edge Computing' platforms; they are the first line of defense (WAF/DDoS) and the first layer of logic (Edge Workers), effectively moving the 'Front Door' of your architecture thousands of miles closer to the user.

Recognition

Common Interview Phrases

The system must support millions of concurrent users globally.
Latency is a critical business metric (e.g., e-commerce, gaming).
The origin server is being overwhelmed by requests for static assets.
Requirement for high-bandwidth content like 4K video or large software binaries.

Common Scenarios

  • Static asset hosting (JS, CSS, Images).
  • Video on Demand (VoD) and Live Streaming (HLS/DASH).
  • API Acceleration (caching dynamic responses with short TTLs).
  • Security at the edge (DDoS mitigation and Web Application Firewalls).

Anti-patterns to Avoid

  • Using a CDN for a purely internal application with users in a single office.
  • Caching highly sensitive, frequently changing PII (Personally Identifiable Information) at the edge without marking responses 'Cache-Control: private' or 'no-store'.
  • Relying on a CDN for real-time, low-latency bidirectional communication like WebSockets (some CDNs can proxy these connections, but it is rarely their primary use case).

The Problem

The Fundamental Issue

The 'Speed of Light' problem: Physical distance between a server and a user creates unavoidable propagation delay, leading to high RTT (Round Trip Time) and poor user experience.
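
The propagation delay can be estimated with a little arithmetic. The sketch below is a rough lower bound only: it assumes light in fiber travels at about two thirds of its vacuum speed, that the cable runs in a straight line, and it ignores queuing and processing delays. The function name and the sample distances are illustrative, not measurements.

```python
# Rough lower bound on round-trip time imposed by distance alone.
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # speed of light in vacuum
FIBER_FACTOR = 2 / 3                   # assumption: light in fiber moves at ~2/3 c


def min_rtt_ms(distance_km: float) -> float:
    """Best-case RTT in milliseconds over a straight fiber path of this length."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_PER_S * FIBER_FACTOR)
    return 2 * one_way_s * 1000


if __name__ == "__main__":
    # Illustrative distances, not real cable routes.
    for label, km in [("same metro", 50), ("cross-continent", 4_000), ("trans-ocean", 10_000)]:
        print(f"{label} ({km} km): at least {min_rtt_ms(km):.1f} ms RTT")
```

Under these assumptions a 10,000 km path costs roughly 100 ms per round trip before any server work happens, while a PoP 50 km away costs well under 1 ms.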

What breaks without it

Origin servers crash under 'Flash Crowd' events (e.g., a product launch).

Global users experience multi-second load times due to TCP/TLS handshakes across oceans.
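
To see why handshakes multiply the distance penalty, the following sketch counts round trips for a brand-new HTTPS connection. It assumes a full (non-resumed) handshake, no DNS lookup time, and zero server processing time; the function name is hypothetical and the RTT values in the usage note are illustrative.

```python
# Round trips needed before the first response byte arrives on a new connection.
ROUND_TRIPS = {
    "tcp": 1,           # SYN / SYN-ACK
    "tls1.2": 2,        # full TLS 1.2 handshake
    "tls1.3": 1,        # full TLS 1.3 handshake
    "http_request": 1,  # request out, first byte back
}


def time_to_first_byte_ms(rtt_ms: float, tls: str = "tls1.3") -> float:
    """Network-only time to first byte for a cold connection, in milliseconds."""
    trips = ROUND_TRIPS["tcp"] + ROUND_TRIPS[tls] + ROUND_TRIPS["http_request"]
    return trips * rtt_ms
```

With a 150 ms RTT to a distant origin this gives 450 ms for TLS 1.3 and 600 ms for TLS 1.2 per cold connection; against an edge node 10 ms away the same handshake costs 30 ms. A page that opens several connections in sequence pays this repeatedly. On a cache miss the edge still has to reach the origin, though typically over connections it keeps warm.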

Bandwidth costs at the origin become prohibitively expensive.

Why alternatives fail

Vertical scaling of origin servers doesn't solve physical distance/latency.

Multi-region database deployments are complex and expensive for simple asset delivery.

Local caching (browser-side) only helps on repeat visits, not the critical first-load experience.

Mental Model

The Intuition

Think of a CDN like a global chain of convenience stores. Instead of every customer driving to the central factory (the Origin) to buy milk, the factory sends truckloads to local neighborhood stores (PoPs). Customers get their milk faster, and the factory doesn't have a traffic jam at its front gate.

Key Mechanics

1. DNS Resolution: Using CNAMEs to point to the CDN's managed DNS.
2. Anycast Routing: Routing the user to the topologically nearest edge node using the same IP address.
3. Cache-Control Headers: Directing the edge on how long to store content (TTL).
4. Purging/Invalidation: The mechanism to remove stale content from the edge globally.
5. Origin Shielding: An intermediate cache layer to protect the origin from 'thundering herd' cache misses.
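
The TTL and purge mechanics can be sketched as a tiny pull-through cache. This is a simplified illustration, not how any particular CDN is implemented: `EdgeCache`, `ttl_from_cache_control`, and the `fetch_origin` callback are hypothetical names, only a few Cache-Control directives are handled, and responses without a max-age are simply not stored (real CDNs often apply a default TTL instead).

```python
import re
import time
from typing import Callable, Dict, Tuple


def ttl_from_cache_control(header: str) -> int:
    """Return the shared-cache TTL in seconds; 0 means 'do not store at the edge'."""
    directives = [d.strip().lower() for d in header.split(",")]
    if "no-store" in directives or "private" in directives:
        return 0
    # s-maxage targets shared caches (such as a CDN) and takes priority over max-age.
    for name in ("s-maxage", "max-age"):
        for directive in directives:
            match = re.fullmatch(rf"{name}=(\d+)", directive)
            if match:
                return int(match.group(1))
    return 0  # simplification: no explicit lifetime means no caching


class EdgeCache:
    """Pull-through cache: fetch from origin on a miss, serve from memory until expiry."""

    def __init__(self, fetch_origin: Callable[[str], Tuple[bytes, str]],
                 clock: Callable[[], float] = time.monotonic):
        self._fetch_origin = fetch_origin  # returns (body, Cache-Control header value)
        self._clock = clock
        self._store: Dict[str, Tuple[bytes, float]] = {}  # path -> (body, expires_at)

    def get(self, path: str) -> Tuple[bytes, str]:
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and entry[1] > now:
            return entry[0], "HIT"
        body, cache_control = self._fetch_origin(path)
        ttl = ttl_from_cache_control(cache_control)
        if ttl > 0:
            self._store[path] = (body, now + ttl)
        return body, "MISS"

    def purge(self, path: str) -> None:
        """Invalidate one object so the next request goes back to the origin."""
        self._store.pop(path, None)
```

A real purge has to reach every PoP that holds the object, which is why global invalidation takes time and is usually exposed as an asynchronous API.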

Framework

When it's the best choice

  • When read-to-write ratio is high.
  • When content is static or semi-static.
  • When global availability and low latency are non-negotiable requirements.

When to avoid

  • When data is strictly private and cannot be stored on third-party infrastructure.
  • When content changes every second and has zero cacheability (though Dynamic Site Acceleration might still help with TCP optimization).

Fast Heuristics

  • If the asset is >100 KB and static: CDN is mandatory.
  • If the user base is distributed across continents: CDN is mandatory.
  • If the cost of egress from the cloud provider exceeds the CDN subscription: CDN is a cost-saver.
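
The cost heuristic is a straightforward comparison. The sketch below assumes a flat per-GB price on both sides and that cache misses are still billed at the origin's egress rate (some providers discount or waive origin-to-CDN traffic). The function name and every price in the usage note are hypothetical placeholders, not real vendor rates.

```python
from typing import Tuple


def monthly_cost_usd(traffic_gb: float, hit_ratio: float,
                     origin_egress_per_gb: float, cdn_per_gb: float,
                     cdn_fixed: float = 0.0) -> Tuple[float, float]:
    """Return (cost without CDN, cost with CDN) for one month of delivery."""
    without_cdn = traffic_gb * origin_egress_per_gb
    # Only cache misses are pulled from the origin once a CDN is in front.
    origin_miss_gb = traffic_gb * (1 - hit_ratio)
    with_cdn = cdn_fixed + traffic_gb * cdn_per_gb + origin_miss_gb * origin_egress_per_gb
    return without_cdn, with_cdn
```

For example, `monthly_cost_usd(50_000, 0.9, 0.09, 0.04, cdn_fixed=200.0)` compares about 4,500 USD of direct egress against about 2,650 USD with the CDN. The result is sensitive to the hit ratio: at a low hit ratio the origin egress term dominates and the saving shrinks or disappears.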

Tradeoffs

Strengths

  • Massive reduction in Time to First Byte (TTFB).
  • Can offload 90%+ of traffic from origin servers when most content is cacheable.
  • Built-in DDoS protection and global traffic management.
  • Reduced bandwidth costs via peering and compression (Brotli/Gzip).

Weaknesses

  • Cache invalidation is 'one of the two hard things in computer science'.
  • Serving stale content (e.g., under 'stale-while-revalidate') can leave users on mismatched asset versions, leading to UI inconsistencies.
  • Increased complexity in the request-response flow and debugging.
  • Vendor lock-in and potential high costs for premium features like 'Edge Compute'.

Alternatives

Multi-region Origin Deployment

When it wins

When the application is highly dynamic and requires low-latency database access rather than just asset delivery.

Key Difference

Involves deploying the full application stack in multiple geographic locations.

P2P Content Delivery

When it wins

For extremely large file updates (like game patches) where users can share fragments with each other.

Key Difference

Decentralizes delivery to the client devices themselves rather than managed edge servers.

Execution

Must-hit talking points

  • Mention 'Anycast' for routing users to the nearest PoP.
  • Discuss 'Cache Hit Ratio' (CHR) as the primary success metric.
  • Explain 'Pull' vs 'Push' CDN models.
  • Address the 'Thundering Herd' problem and how 'Origin Shielding' or 'Request Collapsing' mitigates it.
  • Talk about 'Tiered Caching' to improve hit rates.
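
Request collapsing can be illustrated with a small single-flight helper: when many requests miss on the same key at once, one of them goes to the origin and the rest wait for its result. This is a minimal in-process sketch with hypothetical names (`RequestCollapser`, `_Flight`, `fetch_origin`); a real edge or shield node does the same thing across connections rather than threads.

```python
import threading
from typing import Any, Callable, Dict


class _Flight:
    """State shared by all callers waiting on one in-progress origin fetch."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: Any = None
        self.error: Any = None
        self.waiters = 0  # followers currently waiting on this fetch


class RequestCollapser:
    def __init__(self, fetch_origin: Callable[[str], Any]) -> None:
        self._fetch_origin = fetch_origin
        self._lock = threading.Lock()
        self._in_flight: Dict[str, _Flight] = {}

    def get(self, key: str) -> Any:
        with self._lock:
            flight = self._in_flight.get(key)
            leader = flight is None
            if leader:
                flight = _Flight()
                self._in_flight[key] = flight
            else:
                flight.waiters += 1
        if leader:
            try:
                flight.result = self._fetch_origin(key)  # the only origin call
            except Exception as exc:
                flight.error = exc
            finally:
                with self._lock:
                    del self._in_flight[key]
                flight.done.set()  # wake every follower
        else:
            flight.done.wait()
        if flight.error is not None:
            raise flight.error
        return flight.result
```

In this sketch the result is only shared among requests that overlap in time; pairing it with a cache (so later requests hit) is what turns a burst of N misses into one origin fetch overall.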

Anticipate follow-ups

  • Q: How do you handle cache invalidation at scale?
  • Q: How does a CDN handle HTTPS/TLS termination at the edge?
  • Q: What happens if the CDN provider goes down? (Multi-CDN strategy).
  • Q: How do you secure private content (e.g., signed URLs/cookies)?
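
For the signed-URL question, a generic scheme looks like the sketch below: the application signs the path plus an expiry time with a secret shared with the edge, and the edge recomputes the signature before serving. Each CDN defines its own signing format and parameter names, so `sign_url`, `verify_url`, and the `expires`/`sig` query parameters here are assumptions for illustration only.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse


def _signature(path: str, expires_at: int, secret: bytes) -> str:
    message = f"{path}?expires={expires_at}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_url(path: str, secret: bytes, expires_at: int) -> str:
    """Return the path with an expiry timestamp and an HMAC-SHA256 signature attached."""
    query = urlencode({"expires": expires_at, "sig": _signature(path, expires_at, secret)})
    return f"{path}?{query}"


def verify_url(url: str, secret: bytes, now: Optional[float] = None) -> bool:
    """Edge-side check: the signature must match and the link must not be expired."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires_at:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(sig, _signature(parsed.path, expires_at, secret))
```

Because the signature covers the path and the expiry, changing either one invalidates the link; the cached object itself can stay shared at the edge, since access control happens per request.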

Red Flags

Setting TTLs too long without a robust invalidation strategy.

Why it fails: Users see outdated content (e.g., old CSS/JS), leading to broken UI or incorrect data that is hard to clear globally.
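
One common way to make long TTLs safe, offered here as an assumption rather than something the text above prescribes, is to put a content hash in the asset's file name. A changed file gets a new URL, so stale copies are never requested again and no global purge is needed. The helper name and paths are hypothetical.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(path: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short SHA-256 digest of the content before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))
```

For example, `fingerprinted_name("/static/app.css", css_bytes)` yields a name of the form `/static/app.<digest>.css`. The HTML that references these assets still needs a short TTL (or explicit invalidation), since it is the entry point that tells clients which fingerprinted URLs to load.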

Forgetting to vary cache keys by headers (like 'Accept-Encoding').

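Why it fails: Without 'Vary: Accept-Encoding' (or an equivalent cache-key rule), the edge stores one variant per URL, so a client that doesn't support Gzip might receive a compressed copy cached for someone else and fail to decode it, while clients that do support compression may be served the uncompressed copy.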
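
A cache key that accounts for encoding might be built as in the sketch below. It normalizes the raw Accept-Encoding header down to one of a few variants so the cache is not fragmented by every header spelling. This is a simplification: it only recognizes `br` and `gzip`, applies the server's preference order rather than the client's q-values, and only treats `q=0` or `q=0.0` as a refusal. The function names are hypothetical.

```python
def normalize_encoding(accept_encoding: str) -> str:
    """Collapse an Accept-Encoding header to 'br', 'gzip', or 'identity'."""
    offered = set()
    for part in accept_encoding.split(","):
        token, _, params = part.strip().partition(";")
        token = token.strip().lower()
        refused = params.strip().replace(" ", "") in ("q=0", "q=0.0")
        if token and not refused:
            offered.add(token)
    # Server-side preference order: Brotli first, then Gzip.
    for preferred in ("br", "gzip"):
        if preferred in offered:
            return preferred
    return "identity"


def cache_key(method: str, host: str, path: str, accept_encoding: str = "") -> str:
    """Build a cache key that keeps differently encoded variants apart."""
    return f"{method.upper()}:{host.lower()}{path}:{normalize_encoding(accept_encoding)}"
```

Normalizing first matters for hit ratio: keying on the raw header would create a separate cache entry for every browser's slightly different header string.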

Ignoring the 'Long Tail' of content.

Why it fails: If you have millions of unique assets that are rarely accessed, your Cache Hit Ratio will be low, and the CDN will provide little value while increasing cost.
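
The long-tail effect is easy to reproduce with a toy simulation. The sketch below models a single edge node as an LRU cache of fixed capacity and measures the Cache Hit Ratio for a request stream; LRU is an assumption here, since real CDNs use a variety of eviction policies, and the workloads in the usage note are synthetic.

```python
from collections import OrderedDict
from typing import Iterable


def lru_hit_ratio(requests: Iterable[str], capacity: int) -> float:
    """Replay requests through an LRU cache and return hits / total requests."""
    cache: "OrderedDict[str, bool]" = OrderedDict()
    hits = total = 0
    for key in requests:
        total += 1
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / total if total else 0.0
```

With a capacity of 100, a stream of 1,000 requests cycling over 10 popular assets (`[f"/hot/{i % 10}" for i in range(1000)]`) gives a hit ratio of 0.99, because only the first request for each asset misses. A stream of 1,000 distinct assets (`[f"/tail/{i}" for i in range(1000)]`) gives 0.0: every request goes to the origin, and the CDN adds cost without removing load.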