Load Balancing
Cheat Sheet
Prime Use Case
Whenever a system requires horizontal scaling, high availability (HA), or the ability to perform zero-downtime deployments.
Critical Tradeoffs
- Introduction of an additional network hop vs. increased system throughput
- Centralized management vs. single point of failure (if not redundant)
- Application-aware routing (L7) vs. high-performance packet switching (L4)
Killer Senior Insight
Load balancing is fundamentally a 'health-aware' proxy. Without aggressive, multi-layered health checks, a load balancer is simply a mechanism for distributing failure across your entire fleet.
Recognition
Common Scenarios
- Distributing HTTP requests across a cluster of web servers
- Internal service-to-service communication in a microservices architecture
- Global traffic management using Geo-DNS and Anycast
- Database read-replica load distribution
Anti-patterns to Avoid
- Using a single Load Balancer instance without an HA pair (Active-Passive)
- Relying solely on DNS Round Robin for high availability (resolver TTL caching keeps clients pointed at dead servers)
- Applying L7 balancing to high-volume, low-latency UDP traffic where L4 is more efficient
The Problem
The Fundamental Issue
Vertical scaling creates both a 'Single Server Bottleneck' and a Single Point of Failure (SPOF): one machine carries all traffic, and when it fails, the whole service fails with it.
What breaks without it
Total system outage if the primary server fails
Performance degradation as a single node reaches CPU/Memory limits
Inability to perform maintenance without taking the service offline
Why alternatives fail
DNS Round Robin cannot detect if a server is down in real-time, leading to 'black-holing' traffic
Client-side hardcoding of IPs is brittle and impossible to manage at scale
Manual failover is too slow for modern SLA requirements (e.g., 99.99%)
Mental Model
The Intuition
Imagine a busy restaurant with a host at the front. Instead of customers crowding one table, the host checks which waiters are free and which tables are clean, then directs guests to the most appropriate spot to ensure everyone is served quickly.
Key Mechanics
Health Checking: Periodically pinging backends to ensure they are 'alive' and 'ready'
Algorithm Selection: Choosing how to pick the next server (e.g., Round Robin, Least Connections, IP Hash)
Session Persistence: Ensuring a user stays on the same server if the app is stateful (Sticky Sessions)
SSL/TLS Termination: Handling SSL/TLS handshakes at the LB to offload CPU work from backends
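The first two mechanics (health checking and algorithm selection) can be sketched together. This is a minimal, illustrative Python model, not any real LB's API; the class and method names are invented for the example. It shows health-aware Round Robin and Least Connections picks:

```python
import itertools

class LoadBalancer:
    """Illustrative sketch: health-aware backend selection."""

    def __init__(self, backends):
        self.backends = dict.fromkeys(backends, 0)  # addr -> open connection count
        self.healthy = set(backends)                # addrs that passed health checks
        self._rr = itertools.cycle(backends)

    def mark_down(self, addr):
        # Called when a health check fails; the addr is skipped until recovery.
        self.healthy.discard(addr)

    def pick_round_robin(self):
        # n draws from the cycle visit every backend once, so this
        # terminates even when most backends are down.
        for _ in range(len(self.backends)):
            addr = next(self._rr)
            if addr in self.healthy:
                return addr
        raise RuntimeError("no healthy backends")

    def pick_least_connections(self):
        candidates = [a for a in self.backends if a in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy backends")
        return min(candidates, key=lambda a: self.backends[a])
```

Note that both strategies filter on health first; this is the point of the 'health-aware proxy' insight above. An unhealthy backend is invisible to every algorithm, not merely deprioritized.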
Framework
When it's the best choice
- L7 (Application Layer) when you need to route based on URLs, cookies, or HTTP headers
- L4 (Transport Layer) when you need maximum throughput and low latency for TCP/UDP
- Global Server Load Balancing (GSLB) for multi-region disaster recovery
When to avoid
- Extremely low-latency trading systems where the 1-2ms hop is unacceptable
- Simple internal tools with very low traffic where a single instance is sufficient and cost-effective
Tradeoffs
Strengths
- Seamless horizontal scaling
- Isolation of server failures
- Simplified security (centralized WAF and SSL management)
- Enables Blue-Green and Canary deployments
Weaknesses
- Increased architectural complexity
- Potential for 'Thundering Herd' if health checks are misconfigured
- Additional cost (hardware or cloud provider fees)
Alternatives
DNS Load Balancing (Round Robin / Geo-DNS)
When it wins
For global entry points where you want to distribute traffic across different IP ranges/regions.
Key Difference
Operates at the DNS level; no awareness of server health or connection state.
Client-Side Load Balancing
When it wins
In microservices (e.g., using gRPC or Netflix Ribbon) to eliminate the extra hop of a centralized LB.
Key Difference
The client holds a list of healthy endpoints and chooses one directly.
Anycast
When it wins
For routing traffic to the topologically nearest node (common in CDNs).
Key Difference
Multiple nodes advertise the same IP address; BGP routing delivers packets to the nearest one.
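The client-side pattern from the alternatives above can be sketched in a few lines. This is an illustrative shape only, not Ribbon's or gRPC's actual API; `fetch_endpoints` stands in for a hypothetical service-registry callback:

```python
import random

class ClientSideBalancer:
    """Sketch of client-side balancing: the client keeps its own
    endpoint list, refreshed from a service registry, and picks a
    target directly -- no central LB hop."""

    def __init__(self, fetch_endpoints):
        self._fetch = fetch_endpoints  # hypothetical registry callback
        self._endpoints = []

    def refresh(self):
        # In practice this polls a registry (e.g., Consul, Eureka, DNS SRV).
        self._endpoints = list(self._fetch())

    def choose(self):
        if not self._endpoints:
            self.refresh()
        return random.choice(self._endpoints)
```

The tradeoff mirrors the text: the hop disappears, but every client now owns the health/refresh problem that a central LB would solve once.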
Execution
Must-hit talking points
- Distinguish between L4 (TCP) and L7 (HTTP) load balancing
- Explain 'Consistent Hashing' for cache-friendly load balancing
- Discuss 'Deep Health Checks' (checking DB connectivity) vs 'Shallow Health Checks' (TCP ping)
- Mention the 'Load Balancer for the Load Balancer' (using Keepalived or VRRP for HA)
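Consistent hashing is worth being able to sketch on demand. A minimal illustration follows; the hash function and the `vnodes` count are arbitrary choices for the example, not a prescription:

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Any stable, well-distributed hash works; MD5 is used here for brevity.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Sketch of consistent hashing: each server owns several virtual
    points on a ring; a key maps to the next point clockwise, so adding
    or removing one server remaps only ~1/N of the keys."""

    def __init__(self, servers, vnodes=100):
        self._ring = sorted(
            (_hash(f"{s}#{i}"), s) for s in servers for i in range(vnodes)
        )
        self._points = [p for p, _ in self._ring]

    def lookup(self, key: str) -> str:
        i = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[i][1]
```

The cache-friendliness claim is the "~1/N remap" property: with plain modulo hashing (`hash(key) % N`), changing N remaps almost every key and cold-starts every cache at once.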
Anticipate follow-ups
- Q: How do you handle the 'Thundering Herd' problem when a server comes back online?
- Q: What are the security implications of SSL Termination at the LB?
- Q: How does the LB handle 'Slow Start' for new servers to prevent overwhelming them?
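For the 'Slow Start' follow-up, the core idea can be sketched as a linear weight ramp; the function and parameter names below are illustrative, not taken from any specific load balancer:

```python
import time

def slow_start_weight(started_at, full_weight=100, ramp_seconds=60, now=None):
    """Sketch of LB 'slow start': a freshly healthy backend's weight
    ramps from minimal to full over ramp_seconds, so it is not flooded
    with traffic (onto cold caches) the instant it passes health checks."""
    now = time.time() if now is None else now
    elapsed = max(0.0, now - started_at)
    fraction = min(1.0, elapsed / ramp_seconds)
    return max(1, int(full_weight * fraction))  # never zero, so it still warms up
```

This also answers the 'Thundering Herd' question for a recovering server: the ramp prevents the LB from instantly routing a full share of traffic to the node that just came back.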
Red Flags
Ignoring the 'Sticky Session' trap.
Why it fails: If one server becomes a 'hot spot' because it holds all the long-lived sessions, the LB cannot rebalance that traffic, leading to localized failure.
Misconfiguring Health Check intervals.
Why it fails: Too frequent checks can DDoS your own backends; too infrequent checks mean users hit dead servers for seconds before the LB notices.
Forgetting to scale the Load Balancer itself.
Why it fails: A single software LB instance has a maximum PPS (Packets Per Second) limit; failing to use an HA pair or a managed service leads to a SPOF.