Load Balancing & Service Reliability
To understand how a Load Balancer differs from a Reverse Proxy and an API Gateway, and how they coexist in a production network path, see the Reverse Proxy vs. Load Balancer vs. API Gateway Guide.
Load Balancing Algorithms
| Algorithm | How | Best For |
|---|---|---|
| Round Robin | Rotate through servers in order | Equal capacity servers, stateless |
| Weighted Round Robin | More requests to higher-weight servers | Mixed capacity servers |
| Least Connections | Route to server with fewest active connections | Variable request durations |
| IP Hash | Hash client IP → same server | Session affinity, WebSocket |
| Random | Random server selection | Simple, low overhead |
| Resource-based | Route based on CPU/memory | CPU-intensive workloads |
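As an illustration of the Weighted Round Robin row, here is a minimal sketch of the "smooth" weighted variant, where each pick raises every server's score by its weight and the winner is lowered by the total weight. The class name `WeightedRoundRobin` and the server names are hypothetical, and real load balancers add health state and dynamic reconfiguration on top of this.

```java
import java.util.ArrayList;
import java.util.List;

/** Smooth weighted round robin (illustrative sketch, not a production balancer). */
class WeightedRoundRobin {
    private static final class Node {
        final String name;
        final int weight;
        int current = 0; // running score, changes on every pick
        Node(String name, int weight) { this.name = name; this.weight = weight; }
    }

    private final List<Node> nodes = new ArrayList<>();
    private int totalWeight = 0;

    synchronized void add(String name, int weight) {
        nodes.add(new Node(name, weight));
        totalWeight += weight;
    }

    /** Raise every score by its weight, pick the highest, lower the winner by the total. */
    synchronized String next() {
        Node best = null;
        for (Node n : nodes) {
            n.current += n.weight;
            if (best == null || n.current > best.current) best = n;
        }
        if (best == null) throw new IllegalStateException("no servers configured");
        best.current -= totalWeight;
        return best.name;
    }
}

class WeightedRoundRobinDemo {
    public static void main(String[] args) {
        WeightedRoundRobin lb = new WeightedRoundRobin();
        lb.add("app1", 3); // hypothetical servers and weights
        lb.add("app2", 1);
        StringBuilder order = new StringBuilder();
        for (int i = 0; i < 4; i++) order.append(lb.next()).append(' ');
        System.out.println(order.toString().trim()); // prints "app1 app1 app2 app1"
    }
}
```

With weights 3 and 1, three of every four requests land on `app1`, and the higher-weight server's turns are interleaved rather than sent as one burst when weights are closer together.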
L4 vs L7 Load Balancing
| | L4 (Transport) | L7 (Application) |
|---|---|---|
| Works at | TCP/UDP level | HTTP/HTTPS level |
| Content awareness | No (binary stream) | Yes (URL, headers, cookies) |
| TLS termination | Typically no (TLS passthrough) | Yes |
| Routing by | IP:port | URL path, headers, cookies |
| Performance | Higher | Slightly lower (parses headers) |
| Examples | AWS NLB, HAProxy (L4 mode) | AWS ALB, nginx, Envoy |
Load Balancer Architecture
Internet
   ↓
DNS (Route 53): TTL-based failover across regions
   ↓
Global Load Balancer (Anycast IP)
   ↓
Regional Load Balancer (L7)
   ↓
Service Instance Pool
   ↓
Internal Load Balancer (service mesh / k8s)
   ↓
Downstream Services
Health Checks
Types
| Type | Description | Use |
|---|---|---|
| Passive | Track errors/timeouts on live traffic | Real traffic quality signal |
| Active | Periodic synthetic request to health endpoint | Proactive failure detection |
| Hybrid | Both passive + active | Best coverage |
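# nginx active health check config
# Note: the check* directives come from the third-party nginx_upstream_check_module
# (also bundled with Tengine). Stock open-source nginx only performs passive checks
# via max_fails / fail_timeout on each server line.
upstream backend {
    server app1:8080;
    server app2:8080;
    # probe every 3000 ms; 2 successes mark a server up, 3 failures mark it down
    check interval=3000 rise=2 fall=3 timeout=1000 type=http;
    check_http_send "GET /actuator/health HTTP/1.0\r\n\r\n";
    check_http_expect_alive http_2xx;
}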
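The `rise` and `fall` thresholds in the config above can be read as a small state machine: a backend only changes state after enough consecutive probe results contradict its current state. The sketch below shows that idea in isolation; `HealthState` is a hypothetical class, not part of nginx or any library.

```java
/** Tracks one backend's health using consecutive-result thresholds (illustrative sketch). */
class HealthState {
    private final int rise; // consecutive successes needed to mark a backend up
    private final int fall; // consecutive failures needed to mark a backend down
    private boolean healthy = true;
    private int streak = 0; // consecutive results that contradict the current state

    HealthState(int rise, int fall) {
        this.rise = rise;
        this.fall = fall;
    }

    /** Records one probe result and returns the resulting health state. */
    synchronized boolean record(boolean probeSucceeded) {
        if (probeSucceeded == healthy) {
            streak = 0; // result agrees with the current state, reset the counter
        } else if (++streak >= (healthy ? fall : rise)) {
            healthy = !healthy;
            streak = 0;
        }
        return healthy;
    }

    synchronized boolean isHealthy() {
        return healthy;
    }
}

class HealthStateDemo {
    public static void main(String[] args) {
        HealthState backend = new HealthState(2, 3); // rise=2, fall=3 as in the config
        boolean[] probes = {false, false, false, true, true};
        for (boolean ok : probes) {
            System.out.println("probe=" + ok + " healthy=" + backend.record(ok));
        }
    }
}
```

Requiring several consecutive results in both directions keeps a single slow response or one lucky probe from flapping a server in and out of the pool.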
Kubernetes Probes
livenessProbe:    # Is the container alive? Restart if fails.
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:   # Is the container ready to serve traffic? Remove from LB if fails.
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  periodSeconds: 5
  failureThreshold: 2
startupProbe:     # Is the app done starting? Don't liveness-kill a slow start.
  httpGet:
    path: /actuator/health
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
Failover Strategies
Active-Passive (Hot Standby)
Primary: handles all traffic
Standby: idle, ready to take over
On primary failure:
  DNS failover (1-5 min TTL) → Standby
  OR
  Heartbeat + VIP (Virtual IP) → Standby claims IP
RPO (Recovery Point Objective): maximum acceptable data loss, measured in time; with replication, the actual loss is the time since the last successful replication
RTO (Recovery Time Objective): maximum acceptable time to restore service
Active-Active
Both primaries handle traffic
Traffic split across multiple regions/AZs
On failure of one:
Load balancer removes from pool
Remaining primary absorbs load
Multi-Region
Region A (us-east) ←→ Region B (eu-west)   [bidirectional replication]
        ↑                      ↑
   Users (US)             Users (EU)
On Region A failure:
  DNS → Route all traffic to Region B
  Region B reads are slightly stale (replication lag)
Graceful Degradation
Design systems to provide reduced functionality rather than complete failure.
@CircuitBreaker(name = "recommendations", fallbackMethod = "defaultRecommendations")
public List<Product> getRecommendations(Long userId) {
    return recommendationService.getPersonalized(userId);
}

// Fallback: show popular items instead of personalized
public List<Product> defaultRecommendations(Long userId, Exception ex) {
    log.warn("Recommendation service unavailable, using popular items fallback");
    return productService.getMostPopular(10);
}
Degradation Levels
Level 0: Full functionality (green)
Level 1: Personalization disabled, use cached/popular content (yellow)
Level 2: Read-only mode, no writes accepted (orange)
Level 3: Static maintenance page (red)
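One way to make these levels usable in code is an enum that request handlers consult before doing optional work. This is only a sketch: the enum name, its constants, and the three capability methods are hypothetical, and how the current level gets chosen (feature flag, operator switch, automated signal) is left out.

```java
/** Degradation levels 0 to 3 from the list above, with the capabilities each one keeps. */
enum DegradationLevel {
    FULL,               // Level 0: full functionality
    NO_PERSONALIZATION, // Level 1: cached/popular content only
    READ_ONLY,          // Level 2: no writes accepted
    MAINTENANCE;        // Level 3: static maintenance page

    boolean allowsPersonalization() {
        return this == FULL;
    }

    boolean allowsWrites() {
        return ordinal() < READ_ONLY.ordinal();
    }

    boolean servesDynamicPages() {
        return this != MAINTENANCE;
    }
}

class DegradationLevelDemo {
    public static void main(String[] args) {
        for (DegradationLevel level : DegradationLevel.values()) {
            System.out.println(level
                    + " personalization=" + level.allowsPersonalization()
                    + " writes=" + level.allowsWrites()
                    + " dynamic=" + level.servesDynamicPages());
        }
    }
}
```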
Chaos Engineering
Deliberately inject failures to find weaknesses before users do.
Principles
- Define steady state (normal behavior)
- Hypothesize what will happen during failure
- Introduce failure in controlled way
- Compare actual vs expected
Common Chaos Experiments
| Experiment | What to Test |
|---|---|
| Kill random pod | Service resilience, restart behavior |
| Introduce network latency | Timeout handling, circuit breakers |
| Drop packets | Retry logic, idempotency |
| Exhaust connection pool | Backpressure, error handling |
| Spike CPU to 90% | Autoscaling, latency under load |
| Kill a DB replica | Failover, replication handling |
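Chaos Monkey (Spring Boot)
# Chaos Monkey for Spring Boot (codecentric, Netflix Chaos Monkey style).
# It is enabled through configuration properties rather than a class-level annotation.
# application.properties, with the chaos-monkey Spring profile active:
spring.profiles.active=chaos-monkey
chaos.monkey.enabled=true
# Instrument @Service beans such as OrderService
chaos.monkey.watcher.service=true
# Inject random latency into watched calls
chaos.monkey.assaults.latency-active=true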
Disaster Recovery
RTO & RPO Targets
| Tier | RTO | RPO | Strategy |
|---|---|---|---|
| Tier 1 (Critical) | < 1 hour | ~0 (zero data loss) | Active-active, synchronous replication |
| Tier 2 (Important) | < 4 hours | < 1 hour | Active-passive, async replication |
| Tier 3 (Standard) | < 24 hours | < 24 hours | Backup + restore |
| Tier 4 (Low) | < 72 hours | < 72 hours | Periodic backup |
DR Runbook Checklist
- Identify failed component(s)
- Assess data loss window (check last replication timestamp)
- Activate DR environment
- Point DNS to DR
- Verify functionality with smoke tests
- Notify stakeholders
- Document incident timeline
- Post-mortem within 48 hours
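The "assess data loss window" step in the checklist above reduces to simple time arithmetic: anything written after the last successful replication is what a failover would lose. The sketch below shows that calculation and a comparison against an RPO target; the class and method names are hypothetical, and the timestamps are made-up example values.

```java
import java.time.Duration;
import java.time.Instant;

/** Illustrative helpers for checking a data loss window against an RPO target. */
class RecoveryObjectives {
    /** Time between the last successful replication and the failure (never negative). */
    static Duration dataLossWindow(Instant lastReplicatedAt, Instant failureAt) {
        Duration window = Duration.between(lastReplicatedAt, failureAt);
        return window.isNegative() ? Duration.ZERO : window;
    }

    /** True when the observed loss window is within the RPO target. */
    static boolean meetsRpo(Duration window, Duration rpoTarget) {
        return window.compareTo(rpoTarget) <= 0;
    }
}

class RecoveryObjectivesDemo {
    public static void main(String[] args) {
        Instant lastReplicated = Instant.parse("2024-01-01T11:40:00Z"); // example value
        Instant failure = Instant.parse("2024-01-01T12:00:00Z");        // example value
        Duration window = RecoveryObjectives.dataLossWindow(lastReplicated, failure);
        System.out.println("Data loss window: " + window.toMinutes() + " min");
        System.out.println("Meets 1 hour RPO: "
                + RecoveryObjectives.meetsRpo(window, Duration.ofHours(1)));
    }
}
```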
Zero-Downtime Deployments
Blue-Green
Blue (current v1) ← 100% traffic
Green (new v2) ← Deploy + test with 0% traffic
Switch LB:
Blue (current v1) ← 0% traffic (keep for rollback)
Green (new v2) ← 100% traffic
Canary
v1: 95% traffic
v2: 5% traffic (canary)
Watch metrics for 30 min...
If OK: gradually increase v2 to 100%
If bad: route all back to v1 (instant rollback)
# Kubernetes canary with Argo Rollouts
apiVersion: argoproj.io/v1alpha1
kind: Rollout
spec:
  strategy:
    canary:
      steps:
        - setWeight: 5    # 5% to canary
        - pause: {duration: 10m}
        - setWeight: 25
        - pause: {duration: 10m}
        - setWeight: 100
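The "watch metrics, then proceed or roll back" decision at each pause can be expressed as a small gate function. The sketch below is one possible shape, assuming two signals (error rate and p99 latency) compared against the stable version; `CanaryGate` and both threshold constants are hypothetical values chosen for illustration, not Argo Rollouts settings.

```java
/** Decides whether a canary step may proceed (illustrative guardrails only). */
class CanaryGate {
    enum Decision { PROMOTE, ROLLBACK }

    // Hypothetical guardrails; real values depend on the service's SLOs.
    static final double MAX_ERROR_RATE_INCREASE = 0.005; // 0.5 percentage points
    static final double MAX_P99_RATIO = 1.20;            // canary p99 at most 20% slower

    static Decision evaluate(double stableErrorRate, double canaryErrorRate,
                             double stableP99Ms, double canaryP99Ms) {
        boolean errorsOk = canaryErrorRate - stableErrorRate <= MAX_ERROR_RATE_INCREASE;
        boolean latencyOk = canaryP99Ms <= stableP99Ms * MAX_P99_RATIO;
        return (errorsOk && latencyOk) ? Decision.PROMOTE : Decision.ROLLBACK;
    }
}

class CanaryGateDemo {
    public static void main(String[] args) {
        // Example metrics: 1.0% vs 1.2% errors, 200 ms vs 210 ms p99.
        System.out.println(CanaryGate.evaluate(0.010, 0.012, 200, 210));
        // Example metrics: canary error rate tripled.
        System.out.println(CanaryGate.evaluate(0.010, 0.030, 200, 210));
    }
}
```

Comparing the canary against the stable version, rather than against fixed absolute numbers, keeps the gate meaningful when overall traffic conditions shift during the rollout.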
Rolling Update
v1: [pod1, pod2, pod3, pod4]
Update pod1 → v2, health check passes
Update pod2 → v2, health check passes
Update pod3 → v2, health check passes
Update pod4 → v2
Done: [pod1v2, pod2v2, pod3v2, pod4v2]
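The sequence above can be simulated in a few lines: replace one pod at a time and stop as soon as a replacement fails its health check, so the remaining pods keep serving on the old version. This is a toy model under stated assumptions; `RollingUpdate.run` and the health predicate are hypothetical, and a real orchestrator also handles surge capacity, timeouts, and rollback.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Toy rolling update: one pod at a time, halting on the first failed health check. */
class RollingUpdate {
    static List<String> run(List<String> pods, String newVersion, Predicate<String> healthy) {
        List<String> result = new ArrayList<>(pods);
        for (int i = 0; i < result.size(); i++) {
            String candidate = pods.get(i) + newVersion;
            if (!healthy.test(candidate)) {
                break; // leave this pod and all later pods on the old version
            }
            result.set(i, candidate);
        }
        return result;
    }
}

class RollingUpdateDemo {
    public static void main(String[] args) {
        List<String> pods = Arrays.asList("pod1", "pod2", "pod3", "pod4");
        // Every replacement passes its health check.
        System.out.println(RollingUpdate.run(pods, "v2", pod -> true));
        // The third replacement fails, so the update halts there.
        System.out.println(RollingUpdate.run(pods, "v2", pod -> !pod.equals("pod3v2")));
    }
}
```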
Anycast
CDNs use anycast to route users to the nearest PoP automatically.
Same IP address (e.g., 104.16.0.0) announced from multiple locations
BGP routing automatically sends packets to the nearest PoP
User in Tokyo → Tokyo PoP (same IP, different physical server)
User in London → London PoP
No DNS magic needed: the routing infrastructure handles it
Used by: Cloudflare, Google (8.8.8.8), root DNS servers.
Session Persistence (Sticky Sessions)
When application state is stored in server memory, all requests from a user must go to the same server.
User Alice → LB → Server A (session stored on A)
Next request from Alice → must go to Server A (not B or C)
Methods:
1. Cookie-based: LB injects SERVERID cookie
   Set-Cookie: SERVERID=server-a; Path=/
2. IP-based: hash source IP (skews when many clients share a NAT address; breaks when a client's IP changes)
3. Application-level: store session in Redis (preferred; servers stay stateless)
   → Eliminates need for sticky sessions
Avoid sticky sessions when possible. Store session data in a distributed cache (Redis) so any server can handle any request, which allows true horizontal scaling.
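To make the externalized-session idea concrete, the sketch below has two application servers share one session store, so a request can land on either. The in-memory `SharedSessionStore` is a stand-in for Redis used only to keep the example self-contained; both class names and the session key are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory stand-in for an external session store such as Redis (assumption for the demo). */
class SharedSessionStore {
    private final Map<String, Map<String, String>> sessions = new ConcurrentHashMap<>();

    void put(String sessionId, String key, String value) {
        sessions.computeIfAbsent(sessionId, id -> new ConcurrentHashMap<>()).put(key, value);
    }

    String get(String sessionId, String key) {
        Map<String, String> session = sessions.get(sessionId);
        return session == null ? null : session.get(key);
    }
}

/** A stateless application server: it keeps no session data of its own. */
class AppServer {
    private final String name;
    private final SharedSessionStore store;

    AppServer(String name, SharedSessionStore store) {
        this.name = name;
        this.store = store;
    }

    void login(String sessionId, String user) {
        store.put(sessionId, "user", user);
    }

    String whoAmI(String sessionId) {
        String user = store.get(sessionId, "user");
        return name + " sees " + (user == null ? "anonymous" : user);
    }
}

class SharedSessionDemo {
    public static void main(String[] args) {
        SharedSessionStore store = new SharedSessionStore();
        AppServer a = new AppServer("server-a", store);
        AppServer b = new AppServer("server-b", store);
        a.login("session-1", "alice");                // first request lands on server A
        System.out.println(b.whoAmI("session-1"));    // prints "server-b sees alice"
    }
}
```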
nginx Load Balancer Configuration
upstream api_servers {
    least_conn;                        # algorithm
    server 10.0.0.1:8080 weight=3;
    server 10.0.0.2:8080 weight=2;
    server 10.0.0.3:8080 backup;       # only used when others are down
    keepalive 32;                      # keep N idle upstream connections open
}

server {
    listen 80;
    server_name api.example.com;

    location /api/ {
        proxy_pass http://api_servers;
        proxy_http_version 1.1;            # upstream keepalive requires HTTP/1.1
        proxy_set_header Connection "";    # clear the default "Connection: close"
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_connect_timeout 5s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
    }

    location /static/ {
        root /var/www;
        expires 30d;
        add_header Cache-Control "public, immutable";
    }
}
Global Server Load Balancing (GSLB)
Distributes traffic across multiple data centers globally using DNS:
Primary DC: us-east-1 (203.0.113.10)
DR DC: eu-west-1 (198.51.100.20)
DNS-based GSLB:
  Healthy: api.example.com → 203.0.113.10
  Primary fails health check → DNS switches to 198.51.100.20
  (with low TTL for fast failover)
Latency-based GSLB (AWS Route 53):
  US user → us-east-1
  EU user → eu-west-1
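Both GSLB modes above come down to choosing which address the DNS answer should contain. The sketch below models that choice with the two data centers from the example; the `Gslb` class, its `DataCenter` type, and the per-user latency figures are hypothetical, and a managed service such as Route 53 makes this decision internally rather than through code like this.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative GSLB answer selection: priority failover and latency-based routing. */
class Gslb {
    static final class DataCenter {
        final String region;
        final String ip;
        final boolean healthy;
        final long latencyMs; // hypothetical latency measured from the user's resolver

        DataCenter(String region, String ip, boolean healthy, long latencyMs) {
            this.region = region;
            this.ip = ip;
            this.healthy = healthy;
            this.latencyMs = latencyMs;
        }
    }

    /** DNS-based failover: answer with the first healthy data center in priority order. */
    static String resolveByPriority(List<DataCenter> byPriority) {
        for (DataCenter dc : byPriority) {
            if (dc.healthy) return dc.ip;
        }
        throw new IllegalStateException("no healthy data center");
    }

    /** Latency-based routing: answer with the healthy data center that has the lowest latency. */
    static String resolveByLatency(List<DataCenter> candidates) {
        DataCenter best = null;
        for (DataCenter dc : candidates) {
            if (dc.healthy && (best == null || dc.latencyMs < best.latencyMs)) best = dc;
        }
        if (best == null) throw new IllegalStateException("no healthy data center");
        return best.ip;
    }
}

class GslbDemo {
    public static void main(String[] args) {
        List<Gslb.DataCenter> primaryDown = Arrays.asList(
                new Gslb.DataCenter("us-east-1", "203.0.113.10", false, 20),
                new Gslb.DataCenter("eu-west-1", "198.51.100.20", true, 95));
        System.out.println(Gslb.resolveByPriority(primaryDown)); // prints "198.51.100.20"
    }
}
```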
Interview Questions
Q: What load balancing algorithm would you use for WebSocket connections? Why?
A: Use least-connections (or weighted least-connections) because WebSockets are long-lived and uneven. It balances concurrent connection load better than round-robin.
Q: What is the difference between L4 and L7 load balancing?
A: L4 routes by IP/port and is fast/protocol-agnostic; L7 routes by HTTP attributes like path, host, or headers. L7 enables smarter routing and policy but adds more processing overhead.
Q: What is the difference between liveness and readiness probes in Kubernetes?
A: Liveness checks if container should be restarted; readiness checks if it can serve traffic now. Readiness protects rollout and dependency warm-up phases.
Q: What is blue-green deployment and how does it enable zero-downtime releases?
A: Run old and new stacks in parallel, then switch traffic atomically to the new stack. Rollback is fast by flipping traffic back to the old environment.
Q: What is canary deployment? How do you decide when to proceed vs rollback?
A: Canary sends a small traffic slice to new version first and expands gradually. Advance only if error/latency/business KPIs stay within guardrails; otherwise auto-rollback.
Q: What is chaos engineering and why is it important?
A: Chaos engineering injects controlled faults to validate resilience assumptions in production-like conditions. It exposes hidden coupling before real incidents do.
Q: What is RPO and RTO? How do you design a system to meet given targets?
A: RPO is acceptable data loss window; RTO is acceptable recovery time. Meet targets with replication frequency, backup strategy, failover automation, and regular disaster drills.
Q: How do you implement graceful degradation in a microservices system?
A: Prioritize core paths and shed optional features during overload/failure. Use timeouts, fallbacks, cached defaults, and feature flags to keep essential functionality alive.
Q: What is the difference between active-passive and active-active failover?
A: Active-passive keeps standby idle until failover, simplifying consistency but increasing failover delay. Active-active serves traffic in multiple sites continuously, improving availability but complicating conflict handling.
Q: How do you design a multi-region system that remains consistent during a regional outage?
A: Classify data by consistency need: strongly consistent writes via quorum/primary strategy, and eventually consistent data via async replication. Automate failover with clear write-routing and reconciliation plans.
CDN & Network Level Load Balancing Questions
Q1. What is the difference between L4 and L7 load balancing?
L4 (transport layer) load balancing operates on TCP/UDP: it sees source/destination IP and port but not application data. Fast, protocol-agnostic, but can't route by URL path or headers. L7 (application layer) operates on HTTP: it can route by URL, headers, cookies, method; terminate TLS; perform content-based routing; modify requests. More overhead but much more flexible.
Q2. What is the difference between round-robin and least-connections algorithms?
Round-robin cycles requests evenly across servers, which suits stateless apps where each request takes similar time. Least-connections sends each request to the server with the fewest active connections, which works better when request duration varies significantly (e.g., some requests take 100ms, others take 10s). Least-connections prevents piling requests onto a slow server that's busy with long operations.
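A minimal sketch of least-connections selection follows, assuming the balancer is told when each request starts and finishes. `LeastConnections`, `acquire`, and `release` are hypothetical names; ties go to the server that was added first.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Least-connections selection over a fixed set of servers (illustrative sketch). */
class LeastConnections {
    private final Map<String, Integer> active = new LinkedHashMap<>();

    synchronized void addServer(String name) {
        active.put(name, 0);
    }

    /** Picks the server with the fewest active connections and counts the new one. */
    synchronized String acquire() {
        String best = null;
        for (Map.Entry<String, Integer> e : active.entrySet()) {
            if (best == null || e.getValue() < active.get(best)) best = e.getKey();
        }
        if (best == null) throw new IllegalStateException("no servers configured");
        active.put(best, active.get(best) + 1);
        return best;
    }

    /** Call when a request finishes so the server's count drops again. */
    synchronized void release(String name) {
        active.computeIfPresent(name, (k, v) -> Math.max(0, v - 1));
    }
}

class LeastConnectionsDemo {
    public static void main(String[] args) {
        LeastConnections lb = new LeastConnections();
        lb.addServer("a");
        lb.addServer("b");
        String slow = lb.acquire();       // long-running request goes to "a"
        String fast = lb.acquire();       // goes to "b"
        lb.release(fast);                 // "b" finishes quickly
        System.out.println(lb.acquire()); // prints "b", since "a" is still busy
    }
}
```

Round-robin would have sent the third request back to the busy server; tracking in-flight work is what lets the balancer avoid it.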
Q3. What are sticky sessions and why are they problematic at scale?
Sticky sessions (session affinity) ensure all requests from one user go to the same backend server, which is needed when session state is held in server memory. Problems: uneven load distribution, a server crash loses all its users' sessions, prevents true horizontal scaling. Solution: externalize session state to Redis (or a database), making all servers stateless and interchangeable.
Q4. How does anycast work and where is it used?
Anycast announces the same IP address from multiple geographic locations. BGP routing naturally sends packets to the topologically nearest location announcing that prefix. There's no DNS trick: the internet's routing infrastructure handles it. Used by: CDNs (serve from nearest PoP), public DNS resolvers (8.8.8.8 works from anywhere), root DNS servers, DDoS mitigation (absorb attacks at multiple locations).
Q5. What is the purpose of health checks in load balancing?
Health checks detect unhealthy backends before they cause user-facing errors. Active checks proactively send probes (HTTP GET to /health) and mark backends down before failures impact traffic. Passive checks detect failures from live traffic patterns. Without health checks, the LB would send requests to dead servers, causing errors for those users until the problem is noticed manually.
Q6. What is cache busting and why is it needed with CDNs?
When static assets (JS, CSS) are updated but clients have cached the old version, users see outdated code. Cache busting adds a content hash to filenames (app.a3f4b.js) so each new version has a unique URL and is never served from a cache entry for the old version. The CDN serves app.a3f4b.js with a very long TTL (immutable); after deployment, the HTML references app.c7d2a.js (new hash) → a fresh request.
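The filename scheme described here is usually produced by a build tool, but the underlying step is small: hash the file's bytes and splice a short prefix of the digest into the name. The sketch below assumes SHA-256 and an 8-character hex prefix; `CacheBusting.hashedName` is a hypothetical helper and the prefix length is an arbitrary choice.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Derives a content-hashed file name such as app.<hash>.js (illustrative sketch). */
class CacheBusting {
    static String hashedName(String fileName, byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 4; i++) {
                hex.append(String.format("%02x", digest[i] & 0xff)); // first 4 bytes as hex
            }
            int dot = fileName.lastIndexOf('.');
            if (dot < 0) return fileName + "." + hex;
            return fileName.substring(0, dot) + "." + hex + fileName.substring(dot);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }
}

class CacheBustingDemo {
    public static void main(String[] args) {
        byte[] v1 = "console.log('v1');".getBytes(StandardCharsets.UTF_8);
        byte[] v2 = "console.log('v2');".getBytes(StandardCharsets.UTF_8);
        System.out.println(CacheBusting.hashedName("app.js", v1));
        System.out.println(CacheBusting.hashedName("app.js", v2)); // different name, so a fresh URL
    }
}
```

Because the name depends only on the content, an unchanged file keeps its URL across deployments and stays cached, while any edit yields a new URL.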
Q7. How would you design a load balancer health check for a Spring Boot application?
Expose Spring Boot Actuator's /actuator/health endpoint. Configure it to check database connectivity, cache availability, and any critical dependencies. Return HTTP 200 when healthy, 503 when not. Configure the LB to send GET /actuator/health every 10-30s, mark unhealthy after 2-3 failures, mark healthy after 2 successes. Protect the endpoint: restrict access to the LB's IP range or use a separate management port.
Q8. Explain CDN cache invalidation strategies.
Options: (1) TTL expiry: wait for the TTL to expire (simplest; acceptable for slow-changing content); (2) Versioned URLs: change the URL on update (hash in filename); cached old versions expire naturally; (3) API purge: call the CDN's purge API to immediately invalidate specific URLs or patterns (Cloudflare, Fastly, CloudFront all support this); (4) Cache tags/surrogate keys: tag cached objects and purge all objects with a tag (e.g., purge all objects tagged product:42).
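Option (4) can be pictured as a reverse index from tag to cached URLs. The sketch below is an in-memory model of that idea only; `TaggedCache` and its methods are hypothetical and do not correspond to any CDN's API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** In-memory model of tag-based (surrogate key) cache purging (illustrative sketch). */
class TaggedCache {
    private final Map<String, String> objects = new HashMap<>();
    private final Map<String, Set<String>> urlsByTag = new HashMap<>();

    /** Caches a body under a URL and records which tags it belongs to. */
    void put(String url, String body, String... tags) {
        objects.put(url, body);
        for (String tag : tags) {
            urlsByTag.computeIfAbsent(tag, t -> new HashSet<>()).add(url);
        }
    }

    String get(String url) {
        return objects.get(url);
    }

    /** Removes every cached object carrying the tag; returns how many were purged. */
    int purgeTag(String tag) {
        Set<String> urls = urlsByTag.remove(tag);
        if (urls == null) return 0;
        int purged = 0;
        for (String url : urls) {
            if (objects.remove(url) != null) purged++;
        }
        return purged;
    }
}

class TaggedCacheDemo {
    public static void main(String[] args) {
        TaggedCache cache = new TaggedCache();
        cache.put("/products/42", "<html>detail</html>", "product:42");
        cache.put("/search?q=widget", "<html>results</html>", "product:42", "product:7");
        cache.put("/products/7", "<html>other</html>", "product:7");
        System.out.println(cache.purgeTag("product:42")); // prints 2
        System.out.println(cache.get("/products/7"));     // still cached
    }
}
```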