Load Balancing

A core architectural component that sits between clients and servers, distributing incoming network traffic across a pool of healthy backend resources to optimize resource utilization, maximize throughput, and ensure high availability.

Cheat Sheet

Prime Use Case

Whenever a system requires horizontal scaling, high availability (HA), or the ability to perform zero-downtime deployments.

Critical Tradeoffs

  • Introduction of an additional network hop vs. increased system throughput
  • Centralized management vs. single point of failure (if not redundant)
  • Application-aware routing (L7) vs. high-performance packet switching (L4)

Killer Senior Insight

Load balancing is fundamentally a 'health-aware' proxy. Without aggressive, multi-layered health checks, a load balancer is simply a mechanism for distributing failure across your entire fleet.

Recognition

Common Interview Phrases

How do we handle a sudden spike in traffic?
What happens if one of our application servers crashes?
How do we route users to the closest data center?
We need to scale our web tier horizontally.

Common Scenarios

  • Distributing HTTP requests across a cluster of web servers
  • Internal service-to-service communication in a microservices architecture
  • Global traffic management using Geo-DNS and Anycast
  • Database read-replica load distribution

Anti-patterns to Avoid

  • Using a single Load Balancer instance without an HA pair (Active-Passive)
  • Relying solely on DNS Round Robin for high availability (TTL caching keeps clients pointed at dead servers)
  • Applying L7 balancing to high-volume, low-latency UDP traffic where L4 is more efficient

The Problem

The Fundamental Issue

The 'Single Server Bottleneck' and 'Single Point of Failure' (SPOF) inherent in vertical scaling.

What breaks without it

Total system outage if the primary server fails

Performance degradation as a single node reaches CPU/Memory limits

Inability to perform maintenance without taking the service offline

Why alternatives fail

DNS Round Robin cannot detect if a server is down in real-time, leading to 'black-holing' traffic

Client-side hardcoding of IPs is brittle and impossible to manage at scale

Manual failover is too slow for modern SLA requirements (e.g., 99.99%)

Mental Model

The Intuition

Imagine a busy restaurant with a host at the front. Instead of customers crowding one table, the host checks which waiters are free and which tables are clean, then directs guests to the most appropriate spot to ensure everyone is served quickly.

Key Mechanics

1. Health Checking: Periodically probing backends to ensure they are 'alive' and 'ready' (see the sketch after this list)

2. Algorithm Selection: Choosing how to pick the next server (e.g., Round Robin, Least Connections, IP Hash)

3. Session Persistence: Ensuring a user stays on the same server if the app is stateful (Sticky Sessions)

4. TLS Termination: Handling SSL/TLS handshakes at the LB to offload CPU work from backends
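
A minimal sketch of mechanics 1 and 2, assuming a hypothetical backend pool and a plain TCP-connect ('shallow') probe; production balancers run these probes concurrently on a timer and combine them with richer selection policies.

    import itertools
    import socket

    BACKENDS = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]  # hypothetical pool

    def is_healthy(address: str, timeout: float = 0.5) -> bool:
        """Shallow health check: can the backend accept a TCP connection at all?"""
        host, port = address.split(":")
        try:
            with socket.create_connection((host, int(port)), timeout=timeout):
                return True
        except OSError:
            return False

    class RoundRobinBalancer:
        """Rotates requests over the currently healthy subset of the pool."""

        def __init__(self, backends):
            self.backends = backends
            self.healthy = list(backends)
            self._cursor = itertools.count()

        def refresh(self):
            # In a real LB this runs on a background timer, not inline with requests.
            self.healthy = [b for b in self.backends if is_healthy(b)]

        def pick(self) -> str:
            if not self.healthy:
                raise RuntimeError("no healthy backends")
            return self.healthy[next(self._cursor) % len(self.healthy)]

    balancer = RoundRobinBalancer(BACKENDS)
    balancer.refresh()           # health-check pass
    # target = balancer.pick()   # route the next request to this backend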

Framework

When it's the best choice

  • L7 (Application Layer) when you need to route based on URLs, cookies, or HTTP headers
  • L4 (Transport Layer) when you need maximum throughput and low latency for TCP/UDP
  • Global Server Load Balancing (GSLB) for multi-region disaster recovery

When to avoid

  • Extremely low-latency trading systems where the 1-2ms hop is unacceptable
  • Simple internal tools with very low traffic where a single instance is sufficient and cost-effective

Fast Heuristics

  • If stateful app: use Consistent Hashing or Sticky Sessions (see the hashing sketch after this list)
  • If heterogeneous server hardware: use Weighted Round Robin
  • If long-lived connections (e.g., WebSockets): use Least Connections
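
For the stateful-app heuristic, a minimal consistent-hashing sketch (virtual nodes on an MD5 ring; the server names are hypothetical): adding or removing a server only remaps the keys that landed on its ring segments, so most sessions and cache entries stay put.

    import bisect
    import hashlib

    class ConsistentHashRing:
        """Maps a request key (e.g., user or session ID) to a stable backend."""

        def __init__(self, nodes, vnodes: int = 100):
            self._ring = []  # sorted list of (hash, node); vnodes smooth the distribution
            for node in nodes:
                for i in range(vnodes):
                    self._ring.append((self._hash(f"{node}#{i}"), node))
            self._ring.sort()
            self._hashes = [h for h, _ in self._ring]

        @staticmethod
        def _hash(value: str) -> int:
            return int(hashlib.md5(value.encode()).hexdigest(), 16)

        def node_for(self, key: str) -> str:
            # Walk clockwise to the first virtual node at or after the key's hash.
            idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
            return self._ring[idx][1]

    ring = ConsistentHashRing(["app-1", "app-2", "app-3"])   # hypothetical servers
    print(ring.node_for("user:42"))  # the same user always maps to the same server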

Tradeoffs

Strengths

  • Seamless horizontal scaling
  • Isolation of server failures
  • Simplified security (centralized WAF and SSL management)
  • Enables Blue-Green and Canary deployments

Weaknesses

  • Increased architectural complexity
  • Potential for 'Thundering Herd' if health checks are misconfigured
  • Additional cost (hardware or cloud provider fees)

Alternatives

DNS Round Robin

When it wins

For global entry points where you want to distribute traffic across different IP ranges/regions.

Key Difference

Operates at the DNS level; no awareness of server health or connection state.

Client-Side Load Balancing

When it wins

In microservices (e.g., using gRPC or Netflix Ribbon) to eliminate the extra hop of a centralized LB.

Key Difference

The client holds a list of healthy endpoints and chooses one directly.
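
A sketch of that pattern, assuming the client has already fetched a healthy endpoint list from a service registry (the registry call and send_request callable are hypothetical); because the client picks and retries locally, there is no central proxy hop to traverse.

    import random

    def call_with_client_side_lb(endpoints, send_request, retries: int = 2):
        """Pick a backend locally and fail over to another endpoint on error."""
        if not endpoints:
            raise RuntimeError("no endpoints available")
        candidates = random.sample(endpoints, k=min(retries + 1, len(endpoints)))
        last_error = None
        for endpoint in candidates:
            try:
                return send_request(endpoint)   # e.g., an HTTP or gRPC call
            except ConnectionError as exc:
                last_error = exc                # try the next endpoint
        raise last_error

    # endpoints = registry.lookup("orders-service")   # hypothetical discovery call
    # response = call_with_client_side_lb(endpoints, my_http_get)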

Anycast

When it wins

For routing traffic to the topologically nearest node (common in CDNs).

Key Difference

Multiple nodes advertise the same IP address; BGP routing determines which node receives each client's traffic.

Execution

Must-hit talking points

  • Distinguish between L4 (TCP) and L7 (HTTP) load balancing
  • Explain 'Consistent Hashing' for cache-friendly load balancing
  • Discuss 'Deep Health Checks' (checking DB connectivity) vs 'Shallow Health Checks' (TCP ping); the sketch after this list contrasts the two
  • Mention the 'Load Balancer for the Load Balancer' (using Keepalived or VRRP for HA)
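
A sketch contrasting the two probe styles, assuming the backend exposes a hypothetical /health endpoint that only returns 200 when its own dependencies (e.g., the database) are reachable; the shallow probe mirrors the earlier round-robin sketch and only proves the process is accepting connections.

    import socket
    import urllib.request

    def shallow_check(host: str, port: int, timeout: float = 0.5) -> bool:
        """Shallow: the process accepts TCP connections, nothing more."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    def deep_check(host: str, port: int, timeout: float = 2.0) -> bool:
        """Deep: the app's /health handler verifies its dependencies before
        answering 200, so a pass means 'ready to serve', not just 'alive'."""
        try:
            with urllib.request.urlopen(f"http://{host}:{port}/health", timeout=timeout) as resp:
                return resp.status == 200
        except OSError:
            return False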

Anticipate follow-ups

  • Q: How do you handle the 'Thundering Herd' problem when a server comes back online?
  • Q: What are the security implications of SSL Termination at the LB?
  • Q: How does the LB handle 'Slow Start' for new servers to prevent overwhelming them? (a ramp-up sketch follows this list)
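
For the slow-start follow-up, a sketch of a linear weight ramp, assuming the balancer supports per-backend weights and tracks when each backend (re)joined the pool; a freshly recovered server starts with a small share of traffic and only reaches its full weight after the warm-up window, so caches and connection pools fill gradually.

    import random
    import time

    WARMUP_SECONDS = 30.0

    def effective_weight(base_weight: float, joined_at: float, now: float) -> float:
        """Ramp a backend from ~5% to 100% of its weight after it (re)joins."""
        age = now - joined_at
        if age >= WARMUP_SECONDS:
            return base_weight
        return base_weight * max(age / WARMUP_SECONDS, 0.05)  # never exactly zero

    def pick_weighted(backends, now=None):
        """backends: list of (name, base_weight, joined_at) tuples (hypothetical shape)."""
        now = time.monotonic() if now is None else now
        weights = [effective_weight(w, joined, now) for _, w, joined in backends]
        names = [name for name, _, _ in backends]
        return random.choices(names, weights=weights, k=1)[0]

    # 'app-3' rejoined 3 seconds ago, so it receives roughly 10% of its normal share.
    now = time.monotonic()
    pool = [("app-1", 1.0, now - 600), ("app-2", 1.0, now - 600), ("app-3", 1.0, now - 3)]
    print(pick_weighted(pool, now))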

Red Flags

Ignoring the 'Sticky Session' trap.

Why it fails: If one server becomes a 'hot spot' because it holds all the long-lived sessions, the LB cannot rebalance that traffic, leading to localized failure.

Misconfiguring Health Check intervals.

Why it fails: Too frequent checks can DDoS your own backends; too infrequent checks mean users hit dead servers for seconds before the LB notices.

Forgetting to scale the Load Balancer itself.

Why it fails: A single software LB instance has a maximum PPS (Packets Per Second) limit; failing to use an HA pair or a managed service leads to a SPOF.