
Load Balancing & Service Reliability

Comparative Architecture

To understand how a Load Balancer differs from a Reverse Proxy and an API Gateway, and how they coexist in a production network path, see the Reverse Proxy vs. Load Balancer vs. API Gateway Guide.


Load Balancing Algorithms

| Algorithm | How | Best For |
|---|---|---|
| Round Robin | Rotate through servers in order | Equal-capacity servers, stateless services |
| Weighted Round Robin | More requests to higher-weight servers | Mixed-capacity servers |
| Least Connections | Route to server with fewest active connections | Variable request durations |
| IP Hash | Hash client IP → same server | Session affinity, WebSocket |
| Random | Random server selection | Simple, low overhead |
| Resource-based | Route based on CPU/memory | CPU-intensive workloads |
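
The selection rules in the table can be sketched in a few lines. This is an illustrative Python sketch, not how any particular load balancer implements them; the function names and the use of SHA-256 for IP hashing are assumptions made for the example.

```python
import hashlib
import itertools
from typing import Dict, Iterator, List


def round_robin(servers: List[str]) -> Iterator[str]:
    """Yield servers in a fixed rotation."""
    return itertools.cycle(servers)


def weighted_round_robin(weights: Dict[str, int]) -> Iterator[str]:
    """Naive weighted rotation: each server appears `weight` times per cycle."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def least_connections(active: Dict[str, int]) -> str:
    """Pick the server with the fewest active connections (ties: first listed)."""
    return min(active, key=active.get)


def ip_hash(client_ip: str, servers: List[str]) -> str:
    """Map a client IP to a stable server by hashing the address."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]
```

Note that this simple modulo-based IP hash remaps most clients when the server list changes; production systems often use consistent hashing to limit that churn.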

L4 vs L7 Load Balancing

| | L4 (Transport) | L7 (Application) |
|---|---|---|
| Works at | TCP/UDP level | HTTP/HTTPS level |
| Content awareness | No (opaque byte stream) | Yes (URL, headers, cookies) |
| TLS termination | Typically no (TLS is passed through to the backend) | Yes |
| Routing by | IP:port | URL path, headers, cookies |
| Performance | Higher | Slightly lower (parses headers) |
| Examples | AWS NLB, HAProxy (L4 mode) | AWS ALB, nginx, Envoy |

Load Balancer Architecture

Internet
   ↓
DNS (Route 53) → TTL-based failover across regions
   ↓
Global Load Balancer (Anycast IP)
   ↓
Regional Load Balancer (L7)
   ↓
Service Instance Pool
   ↓
Internal Load Balancer (service mesh / k8s)
   ↓
Downstream Services

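Health Checks

Types

| Type | Description | Use |
|---|---|---|
| Passive | Track errors/timeouts on live traffic | Real traffic quality signal |
| Active | Periodic synthetic request to health endpoint | Proactive failure detection |
| Hybrid | Both passive and active | Best coverage |

# nginx active health check config
# Note: the check* directives come from the third-party nginx_upstream_check_module
# (also bundled with Tengine); they are not part of stock open-source nginx.
upstream backend {
    server app1:8080;
    server app2:8080;

    # Probe every 3000 ms; 2 successes mark a server up, 3 failures mark it down.
    check interval=3000 rise=2 fall=3 timeout=1000 type=http;
    check_http_send "GET /actuator/health HTTP/1.0\r\n\r\n";
    check_http_expect_alive http_2xx;
}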
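
The rise/fall thresholds used by active checks amount to a small state machine. The following Python sketch shows one way to model it; the class name `BackendHealth` and its fields are hypothetical, and real load balancers may count probes differently.

```python
from dataclasses import dataclass


@dataclass
class BackendHealth:
    rise: int = 2         # consecutive successes needed to mark a backend up
    fall: int = 3         # consecutive failures needed to mark a backend down
    healthy: bool = True  # assumed starting state for this sketch
    successes: int = 0
    failures: int = 0

    def record(self, ok: bool) -> bool:
        """Feed one probe result and return the resulting health state."""
        if ok:
            self.successes += 1
            self.failures = 0
            if not self.healthy and self.successes >= self.rise:
                self.healthy = True
        else:
            self.failures += 1
            self.successes = 0
            if self.healthy and self.failures >= self.fall:
                self.healthy = False
        return self.healthy
```

Requiring several consecutive results in each direction keeps a single slow or lucky probe from flapping a backend in and out of the pool.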

Kubernetes Probes

livenessProbe:            # Is the container alive? Restart if it fails.
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3

readinessProbe:           # Is the container ready to serve traffic? Remove from LB if it fails.
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  periodSeconds: 5
  failureThreshold: 2

startupProbe:             # Is the app done starting? Don't liveness-kill a slow start.
  httpGet:
    path: /actuator/health
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
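
The probe settings above translate into time budgets by multiplying the period by the failure threshold. The Python helper below is a rough sketch of that arithmetic; it deliberately ignores `timeoutSeconds` and `initialDelaySeconds`, so the real reaction time can be somewhat longer.

```python
def failure_budget_seconds(period_seconds: int, failure_threshold: int) -> int:
    """Approximate time of consecutive failures before the probe trips."""
    return period_seconds * failure_threshold


# Values taken from the probe configuration above.
startup_budget = failure_budget_seconds(period_seconds=10, failure_threshold=30)
liveness_budget = failure_budget_seconds(period_seconds=10, failure_threshold=3)
readiness_budget = failure_budget_seconds(period_seconds=5, failure_threshold=2)
```

With these values the application gets roughly 300 seconds to start, a hung container is restarted after about 30 seconds of failed liveness checks, and an unready pod is pulled from the load balancer after about 10 seconds.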

Failover Strategies

Active-Passive (Hot Standby)

Primary: handles all traffic
Standby: idle, ready to take over

On primary failure:
  DNS failover (1–5 min TTL) → Standby
  OR
  Heartbeat + VIP (Virtual IP) → Standby claims IP

RPO (Recovery Point Objective): maximum acceptable data loss, measured as the time since the last successful replication
RTO (Recovery Time Objective): maximum acceptable time to restore service

Active-Active

Both primaries handle traffic
Traffic split across multiple regions/AZs

On failure of one:
Load balancer removes from pool
Remaining primary absorbs load

Multi-Region

Region A (us-east) ←→ Region B (eu-west)   [bidirectional replication]
        ↑                     ↑
    Users (US)            Users (EU)

On Region A failure:
  DNS → Route all traffic to Region B
  Region B reads may be slightly stale (replication lag)

Graceful Degradation

Design systems to provide reduced functionality rather than complete failure.

@CircuitBreaker(name = "recommendations", fallbackMethod = "defaultRecommendations")
public List<Product> getRecommendations(Long userId) {
    return recommendationService.getPersonalized(userId);
}

// Fallback: show popular items instead of personalized ones.
// Same parameters as the protected method, plus the triggering exception.
public List<Product> defaultRecommendations(Long userId, Exception ex) {
    log.warn("Recommendation service unavailable, using popular items fallback");
    return productService.getMostPopular(10);
}

Degradation Levels

Level 0: Full functionality (green)
Level 1: Personalization disabled, use cached/popular content (yellow)
Level 2: Read-only mode, no writes accepted (orange)
Level 3: Static maintenance page (red)
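
One way to make these levels actionable is to derive the current level from a few dependency signals. The Python sketch below is only an illustration; the three boolean signals and their mapping to levels are assumptions, not something prescribed by the levels themselves.

```python
from enum import IntEnum


class Level(IntEnum):
    FULL = 0                # green
    NO_PERSONALIZATION = 1  # yellow
    READ_ONLY = 2           # orange
    MAINTENANCE = 3         # red


def choose_level(recommendations_ok: bool, db_writable: bool, db_readable: bool) -> Level:
    """Pick the most severe level required by the failing dependencies."""
    if not db_readable:
        return Level.MAINTENANCE
    if not db_writable:
        return Level.READ_ONLY
    if not recommendations_ok:
        return Level.NO_PERSONALIZATION
    return Level.FULL
```

Checking the most severe condition first means a database outage is never masked by a less important failure such as the recommendation service being down.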

Chaos Engineering

Deliberately inject failures to find weaknesses before users do.

Principles

  1. Define steady state (normal behavior)
  2. Hypothesize what will happen during failure
  3. Introduce failure in controlled way
  4. Compare actual vs expected

Common Chaos Experiments

| Experiment | What to Test |
|---|---|
| Kill random pod | Service resilience, restart behavior |
| Introduce network latency | Timeout handling, circuit breakers |
| Drop packets | Retry logic, idempotency |
| Exhaust connection pool | Backpressure, error handling |
| Spike CPU to 90% | Autoscaling, latency under load |
| Kill a DB replica | Failover, replication handling |
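
As a small illustration of the latency experiment, the Python sketch below wraps a function so that a fraction of calls is delayed. The helper name `with_latency` and its parameters are hypothetical; the random source and sleep function are injectable so the behavior can be tested deterministically.

```python
import random
import time
from typing import Any, Callable


def with_latency(
    func: Callable[..., Any],
    probability: float,
    delay_seconds: float,
    rng: Callable[[], float] = random.random,
    sleep: Callable[[float], None] = time.sleep,
) -> Callable[..., Any]:
    """Return a wrapper that delays a fraction of calls before running func."""

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        if rng() < probability:
            sleep(delay_seconds)  # simulated network or dependency latency
        return func(*args, **kwargs)

    return wrapper
```

Wrapping a client call this way in a test environment gives a quick check that timeouts and circuit breakers actually trigger when a dependency slows down.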

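Chaos Monkey (Spring Boot)

# Chaos Monkey for Spring Boot (codecentric) is driven by configuration
# rather than a class-level annotation. Sketch of application.properties,
# assuming the chaos-monkey-spring-boot dependency is on the classpath and
# the "chaos-monkey" Spring profile is active:
chaos.monkey.enabled=true
# Watch beans annotated with @Service (e.g. OrderService)
chaos.monkey.watcher.service=true
# Inject latency into watched method calls
chaos.monkey.assaults.latencyActive=true

// The target remains a plain Spring service
@Service
public class OrderService { ... }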

Disaster Recovery

RTO & RPO Targets

| Tier | RTO | RPO | Strategy |
|---|---|---|---|
| Tier 1 (Critical) | < 1 hour | ~0 (zero data loss) | Active-active, synchronous replication |
| Tier 2 (Important) | < 4 hours | < 1 hour | Active-passive, async replication |
| Tier 3 (Standard) | < 24 hours | < 24 hours | Backup + restore |
| Tier 4 (Low) | < 72 hours | < 72 hours | Periodic backup |

DR Runbook Checklist

  • Identify failed component(s)
  • Assess data loss window (check last replication timestamp)
  • Activate DR environment
  • Point DNS to DR
  • Verify functionality with smoke tests
  • Notify stakeholders
  • Document incident timeline
  • Post-mortem within 48 hours
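
The "assess data loss window" step is simple timestamp arithmetic. The Python sketch below shows one way to compute it and compare it with an RPO target; the function names are hypothetical, and the comparison uses "less than or equal" as a simplifying assumption.

```python
from datetime import datetime, timedelta


def data_loss_window(last_replicated_at: datetime, failed_at: datetime) -> timedelta:
    """Time between the last successful replication and the failure."""
    return max(failed_at - last_replicated_at, timedelta(0))


def meets_rpo(last_replicated_at: datetime, failed_at: datetime, rpo: timedelta) -> bool:
    """True if the data written since the last replication fits within the RPO."""
    return data_loss_window(last_replicated_at, failed_at) <= rpo
```

For example, a failure 30 minutes after the last replication fits a one-hour RPO but not a five-minute one.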

Zero-Downtime Deployments

Blue-Green

Blue (current v1)  ← 100% traffic
Green (new v2)     ← Deploy + test with 0% traffic

Switch LB:
Blue (current v1)  ← 0% traffic (keep for rollback)
Green (new v2)     ← 100% traffic

Canary

v1: 95% traffic
v2: 5% traffic (canary)

Watch metrics for 30 min...

If OK: gradually increase v2 to 100%
If bad: route all back to v1 (instant rollback)

# Kubernetes canary with Argo Rollouts
# (fragment: metadata, selector, and pod template omitted)
apiVersion: argoproj.io/v1alpha1
kind: Rollout
spec:
  strategy:
    canary:
      steps:
        - setWeight: 5            # 5% to canary
        - pause: {duration: 10m}
        - setWeight: 25
        - pause: {duration: 10m}
        - setWeight: 100
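
The proceed-or-rollback decision between canary steps can be sketched as a small function. This Python example is a hypothetical gate, not Argo Rollouts code: the step list mirrors the weights above, and the error-rate guardrail of 1.5 times the baseline is an assumed threshold.

```python
from typing import Sequence

STEPS = (5, 25, 100)  # canary weights, in percent


def next_weight(
    current: int,
    canary_error_rate: float,
    baseline_error_rate: float,
    max_ratio: float = 1.5,
    steps: Sequence[int] = STEPS,
) -> int:
    """Return the next canary weight, or 0 to roll back to the stable version."""
    if canary_error_rate > baseline_error_rate * max_ratio:
        return 0  # guardrail breached: send all traffic back to v1
    for weight in steps:
        if weight > current:
            return weight
    return current  # already fully promoted
```

In practice the guardrail usually combines several signals (error rate, latency percentiles, business KPIs) rather than a single ratio.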

Rolling Update

v1: [pod1, pod2, pod3, pod4]
Update pod1 → v2, health check passes
Update pod2 → v2, health check passes
Update pod3 → v2, health check passes
Update pod4 → v2
Done: [pod1v2, pod2v2, pod3v2, pod4v2]
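
The sequence above can be simulated with a short loop that upgrades one pod at a time and halts on the first failed health check. This Python sketch uses a hypothetical `name@version` naming scheme and a caller-supplied health check; Kubernetes itself controls the pace with settings such as `maxSurge` and `maxUnavailable`.

```python
from typing import Callable, List


def rolling_update(
    pods: List[str],
    new_version: str,
    health_check: Callable[[str], bool],
) -> List[str]:
    """Replace pods one at a time; stop at the first failed health check."""
    updated = list(pods)
    for index, pod in enumerate(pods):
        name = pod.split("@")[0]
        candidate = f"{name}@{new_version}"
        if not health_check(candidate):
            break  # halt the rollout; remaining pods stay on the old version
        updated[index] = candidate
    return updated
```

Stopping at the first failure leaves most capacity on the known-good version, which is the main safety property of a rolling update.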

Anycast

CDNs use anycast to route users to the nearest PoP automatically.

Same IP address (e.g., 104.16.0.0) announced from multiple locations
BGP routing automatically sends packets to the nearest PoP

User in Tokyo → Tokyo PoP (same IP, different physical server)
User in London → London PoP

No DNS magic needed: the routing infrastructure handles it

Used by: Cloudflare, Google (8.8.8.8), root DNS servers.


Session Persistence (Sticky Sessions)

When application state is stored in a server's memory, all requests from a user must go to the same server.

User Alice → LB → Server A (session stored on A)
Next request from Alice → must go to Server A (not B or C)

Methods:
1. Cookie-based: LB injects SERVERID cookie
   Set-Cookie: SERVERID=server-a; Path=/

2. IP-based: hash source IP (breaks with NAT, since many users share one address)

3. Application-level: store session in Redis (preferred; servers stay stateless)
   → Eliminates need for sticky sessions

Best Practice

Avoid sticky sessions when possible. Store session data in a distributed cache (Redis) so any server can handle any request, which allows true horizontal scaling.
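
The effect of externalizing session state can be shown with a toy model. In this Python sketch a plain dictionary stands in for a shared store such as Redis, and the `SessionStore` and `Server` classes are hypothetical; the point is only that any server can serve any user once state lives outside the server.

```python
from typing import Dict, Optional


class SessionStore:
    """Stand-in for a shared store such as Redis (a plain dict here)."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, str]] = {}

    def load(self, session_id: str) -> Dict[str, str]:
        return self._data.setdefault(session_id, {})


class Server:
    """Stateless application server: all session data lives in the store."""

    def __init__(self, name: str, store: SessionStore) -> None:
        self.name = name
        self.store = store

    def handle(
        self, session_id: str, key: Optional[str] = None, value: Optional[str] = None
    ) -> Dict[str, str]:
        session = self.store.load(session_id)
        if key is not None and value is not None:
            session[key] = value
        return dict(session)
```

If server A writes Alice's cart and the next request lands on server B, B still sees the cart, so the load balancer is free to use any algorithm.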


nginx Load Balancer Configuration

upstream api_servers {
    least_conn;                          # algorithm

    server 10.0.0.1:8080 weight=3;
    server 10.0.0.2:8080 weight=2;
    server 10.0.0.3:8080 backup;         # only used when others are down

    keepalive 32;                        # keep up to 32 idle upstream connections per worker
}

server {
    listen 80;
    server_name api.example.com;

    location /api/ {
        proxy_pass http://api_servers;

        # Required for upstream keepalive to take effect
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        proxy_connect_timeout 5s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
    }

    location /static/ {
        root /var/www;
        expires 30d;
        add_header Cache-Control "public, immutable";
    }
}

Global Server Load Balancing (GSLB)

Distributes traffic across multiple data centers globally using DNS:

Primary DC: us-east-1 (203.0.113.10)
DR DC: eu-west-1 (198.51.100.20)

DNS-based GSLB:
  Healthy: api.example.com → 203.0.113.10
  Primary fails health check → DNS switches to 198.51.100.20
  (with low TTL for fast failover)

Latency-based GSLB (AWS Route 53):
  US user → us-east-1
  EU user → eu-west-1
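
Both GSLB policies reduce to choosing an address from a health map. The Python sketch below illustrates the decision logic only; it reuses the example addresses above, and the function names and inputs are hypothetical rather than part of any DNS provider's API.

```python
from typing import Dict

ENDPOINTS = {"us-east-1": "203.0.113.10", "eu-west-1": "198.51.100.20"}


def resolve_failover(
    healthy: Dict[str, bool],
    primary: str = "us-east-1",
    secondary: str = "eu-west-1",
) -> str:
    """Answer with the primary address unless its health check is failing."""
    region = primary if healthy.get(primary, False) else secondary
    return ENDPOINTS[region]


def resolve_latency(latency_ms: Dict[str, float], healthy: Dict[str, bool]) -> str:
    """Answer with the lowest-latency region among the healthy ones."""
    candidates = {r: ms for r, ms in latency_ms.items() if healthy.get(r, False)}
    if not candidates:
        raise LookupError("no healthy region available")
    return ENDPOINTS[min(candidates, key=candidates.get)]
```

Because resolvers cache answers for the record's TTL, clients keep using the old address until the cache expires, which is why the failover records need a low TTL.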

Interview Questions

Q: What load balancing algorithm would you use for WebSocket connections? Why?

A: Use least-connections (or weighted least-connections) because WebSockets are long-lived and uneven. It balances concurrent connection load better than round-robin.

Q: What is the difference between L4 and L7 load balancing?

A: L4 routes by IP/port and is fast/protocol-agnostic; L7 routes by HTTP attributes like path, host, or headers. L7 enables smarter routing and policy but adds more processing overhead.

Q: What is the difference between liveness and readiness probes in Kubernetes?

A: Liveness checks whether the container should be restarted; readiness checks whether it can serve traffic right now. Readiness protects rollouts and dependency warm-up phases.

Q: What is blue-green deployment and how does it enable zero-downtime releases?

A: Run old and new stacks in parallel, then switch traffic atomically to the new stack. Rollback is fast by flipping traffic back to the old environment.

Q: What is canary deployment? How do you decide when to proceed vs rollback?

A: Canary sends a small slice of traffic to the new version first and expands it gradually. Advance only if error rate, latency, and business KPIs stay within guardrails; otherwise roll back automatically.

Q: What is chaos engineering and why is it important?

A: Chaos engineering injects controlled faults to validate resilience assumptions in production-like conditions. It exposes hidden coupling before real incidents do.

Q: What is RPO and RTO? How do you design a system to meet given targets?

A: RPO is the acceptable data-loss window; RTO is the acceptable recovery time. Meet the targets with appropriate replication frequency, backup strategy, failover automation, and regular disaster drills.

Q: How do you implement graceful degradation in a microservices system?

A: Prioritize core paths and shed optional features during overload/failure. Use timeouts, fallbacks, cached defaults, and feature flags to keep essential functionality alive.

Q: What is the difference between active-passive and active-active failover?

A: Active-passive keeps standby idle until failover, simplifying consistency but increasing failover delay. Active-active serves traffic in multiple sites continuously, improving availability but complicating conflict handling.

Q: How do you design a multi-region system that remains consistent during a regional outage?

A: Classify data by consistency need: strongly consistent writes via quorum/primary strategy, and eventually consistent data via async replication. Automate failover with clear write-routing and reconciliation plans.

CDN & Network Level Load Balancing Questions

Q1. What is the difference between L4 and L7 load balancing?

L4 (transport layer) load balancing operates on TCP/UDP: it sees source/destination IP and port but not application data. It is fast and protocol-agnostic, but can't route by URL path or headers. L7 (application layer) operates on HTTP: it can route by URL, headers, cookies, or method; terminate TLS; perform content-based routing; and modify requests. It adds more overhead but is much more flexible.

Q2. What is the difference between round-robin and least-connections algorithms?

Round-robin cycles requests evenly across servers, which works well for stateless apps where each request takes similar time. Least-connections sends to the server with the fewest active connections, which is better when request duration varies significantly (e.g., some requests take 100ms, others take 10s). Least-connections prevents piling requests onto a slow server that's busy with long operations.

Q3. What are sticky sessions and why are they problematic at scale?

Sticky sessions (session affinity) ensure all requests from one user go to the same backend server, which is needed when session state lives in server memory. Problems: uneven load distribution; a server crash loses all of its users' sessions; true horizontal scaling is prevented. Solution: externalize session state to Redis (or a database), making all servers stateless and interchangeable.

Q4. How does anycast work and where is it used?

Anycast announces the same IP address from multiple geographic locations. BGP routing naturally sends packets to the topologically nearest location announcing that prefix. There's no DNS trick; the internet's routing infrastructure handles it. Used by: CDNs (serve from nearest PoP), public DNS resolvers (8.8.8.8 works from anywhere), root DNS servers, DDoS mitigation (absorb attacks at multiple locations).

Q5. What is the purpose of health checks in load balancing?

Health checks detect unhealthy backends before they cause user-facing errors. Active checks proactively send probes (HTTP GET to /health) and mark backends down before failures impact traffic. Passive checks detect failures from live traffic patterns. Without health checks, the LB would send requests to dead servers, causing errors for those users until the problem is noticed manually.

Q6. What is cache busting and why is it needed with CDNs?

When static assets (JS, CSS) are updated but clients have cached the old version, users see outdated code. Cache busting adds a content hash to filenames (app.a3f4b.js) so each new version has a unique URL and can never be served from a cached copy of the old one. The CDN serves app.a3f4b.js with a very long TTL (immutable); after deployment, the HTML references app.c7d2a.js (new hash), which triggers a fresh request.
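
Content-hashed filenames are usually produced by the build tool, but the idea fits in a few lines. This Python sketch assumes a bare filename (no directory part), and the helper name `hashed_name` and the 8-character SHA-256 prefix are illustrative choices.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(filename: str, content: bytes, length: int = 8) -> str:
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    path = PurePosixPath(filename)
    return f"{path.stem}.{digest}{path.suffix}"
```

Identical content always yields the same name, so unchanged assets stay cached across deployments, while any change to the bytes produces a new URL.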

Q7. How would you design a load balancer health check for a Spring Boot application?

Expose Spring Boot Actuator's /actuator/health endpoint. Configure it to check database connectivity, cache availability, and any critical dependencies. Return HTTP 200 when healthy, 503 when not. Configure the LB to send GET /actuator/health every 10–30s, mark unhealthy after 2–3 failures, and mark healthy after 2 successes. Protect the endpoint: restrict access to the LB's IP range or use a separate management port.

Q8. Explain CDN cache invalidation strategies.

Options: (1) TTL expiry: wait for the TTL to expire (simplest; acceptable for slow-changing content); (2) Versioned URLs: change the URL on update (hash in filename); old cached versions expire naturally; (3) API purge: call the CDN's purge API to immediately invalidate specific URLs or patterns (Cloudflare, Fastly, CloudFront all support this); (4) Cache tags/surrogate keys: tag cached objects and purge all objects with a tag (e.g., purge all objects tagged product:42).
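
Tag-based purging (option 4) relies on an index from tag to cached URLs. The Python sketch below models that index in memory; the `TaggedCache` class and its methods are hypothetical and do not correspond to any specific CDN's API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set


class TaggedCache:
    """Toy cache that supports purging every object carrying a given tag."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}
        self._urls_by_tag: Dict[str, Set[str]] = defaultdict(set)

    def put(self, url: str, body: bytes, tags: Iterable[str]) -> None:
        self._objects[url] = body
        for tag in tags:
            self._urls_by_tag[tag].add(url)

    def get(self, url: str) -> Optional[bytes]:
        return self._objects.get(url)

    def purge_tag(self, tag: str) -> int:
        """Remove all objects with this tag; return how many were evicted."""
        removed = 0
        for url in self._urls_by_tag.pop(tag, set()):
            if self._objects.pop(url, None) is not None:
                removed += 1
        return removed
```

A single purge of `product:42` then evicts the product page, its API response, and any listing that embedded it, without the caller needing to know each URL.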