
Reverse Proxy vs. Load Balancer vs. API Gateway

Three terms appear at the entry point of almost every distributed system diagram, yet they are consistently conflated. All three sit between clients and backend servers. All three forward requests. But their primary responsibilities, operating layers, and architectural purposes are fundamentally different.

Understanding these distinctions is not just academic. Choosing the wrong component, or placing a component at the wrong layer, leads to concrete problems: gateway responsibilities such as authentication and rate limiting bolted onto load balancers, single points of failure from misunderstood health-check behavior, TLS terminated at the wrong layer, and authentication gaps that create security vulnerabilities.


The Airport Analogy

Before any diagrams, a concrete mental model:

Imagine a massive international airport:

Reverse Proxy (The Terminal Facade): Passengers never enter the maintenance hangars, fuel depots, or control tower. They see only the terminal building. The terminal hides the airport's entire internal layout, runs the security checkpoint (TLS termination), and presents one front door regardless of which aircraft or gate is actually serving the flight. In the same way, the reverse proxy hides the specific backend servers behind a single address.

Load Balancer (The Queue Manager): When 5,000 passengers arrive at security simultaneously, a queue manager directs them: "Lanes 1–5 are open, Lane 3 is full, Lane 6 is closed for maintenance." Its entire job is distributing load so no single lane is overwhelmed while others sit empty. It does not care who you are or where you are going, only which lane can serve you fastest right now.

API Gateway (The Customs and Border Control Officer): Before you board your international flight, an officer checks your passport (authentication), verifies your visa type (authorization), limits how many bags you can carry (rate limiting), and translates your customs declaration form if it's in the wrong format (protocol translation). The gateway is a smart, policy-enforcing checkpoint, not just a traffic router.

In a real airport, all three exist simultaneously, in a chain. The same is true in production systems.


🛡️ Deep Dive: Reverse Proxy

What It Is

A Reverse Proxy is an intermediary server positioned in front of one or more backend servers. Clients connect to it as if it were the destination; they never learn the backend's real address. The proxy forwards the request, receives the backend's response, and returns it to the client.

```text
Client (Internet)
    │ HTTPS → api.company.com
    ▼
[ Reverse Proxy: Nginx / Caddy / HAProxy ]
    │ HTTP → 10.0.1.15:8080 (internal, private subnet)
    ▼
[ Backend Server: Spring Boot / Node.js ]
```

The client sees only api.company.com. The backend's actual IP, port, and technology stack are completely hidden.

Forward Proxy vs. Reverse Proxy

To understand a reverse proxy, it helps to contrast it with a Forward Proxy (the kind most people are familiar with, similar in effect to a VPN or an IP-masking tool):

  • Forward Proxy (Client-Side): Acts on behalf of the client. It sits between the client and the public internet, masking the client's identity. The destination server thinks the request originated from the proxy, not the actual client.
  • Reverse Proxy (Server-Side): Acts on behalf of the server. It sits between the public internet and your backend infrastructure, masking the servers' identities. The client thinks they are talking directly to the destination server, but they are actually talking to the reverse proxy.

How It Works Internally

Step-by-step:

  1. TLS Termination: The proxy holds the TLS certificate and private key and decrypts HTTPS traffic at the edge. Traffic between the proxy and the backend can then travel over plain HTTP within a secured private network (VPC/subnet), offloading cryptographic work from the backend servers.

  2. Request transformation: Injects forwarding headers (X-Real-IP, X-Forwarded-For, X-Forwarded-Proto) so backends can see the original client information despite the proxy sitting in between.

  3. Cache check: If the response for this URL is already cached (static files, TTL-based responses), the proxy returns it directly without touching the backend at all.

  4. Backend forwarding: Forwards the request to the appropriate backend server by configured rules (URL path, hostname, etc.).

  5. Response transformation: Compresses the response (gzip/Brotli), strips internal headers that should not leak to clients, and returns to the client over TLS.
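Step 2 deserves a closer look, because `X-Forwarded-For` is a chain rather than a single value. nginx's `$proxy_add_x_forwarded_for` (used in the configuration below) appends the connecting peer's address to whatever `X-Forwarded-For` arrived, or uses the address alone if none did. Here is a minimal Java sketch of that rule, using a hypothetical `ForwardedHeaders` helper:

```java
// Hypothetical helper showing how a reverse proxy builds X-Forwarded-For.
// Mirrors nginx's $proxy_add_x_forwarded_for: append the peer address to any incoming value.
final class ForwardedHeaders {

    private ForwardedHeaders() {}

    /**
     * @param incomingXff   X-Forwarded-For received from the previous hop, or null if absent
     * @param remoteAddress IP address of the TCP peer that connected to this proxy
     * @return the X-Forwarded-For value to send to the backend
     */
    static String appendForwardedFor(String incomingXff, String remoteAddress) {
        if (incomingXff == null || incomingXff.isBlank()) {
            return remoteAddress;                           // first proxy in the chain
        }
        return incomingXff.strip() + ", " + remoteAddress;  // keep the chain, add this hop's peer
    }

    /** Left-most entry: the original client as reported by the first hop (clients can spoof it). */
    static String claimedClient(String xff) {
        return xff.split(",")[0].strip();
    }
}
```

Because clients can send their own `X-Forwarded-For`, the left-most entry is only as trustworthy as the first hop. A backend should rely on the entries appended by proxies it controls, or on `X-Real-IP` set by its own proxy.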

Core Responsibilities

| Responsibility | Description |
|---|---|
| Server anonymity | Hides backend IPs, ports, and internal topology from the public internet |
| TLS/SSL termination | Decrypts HTTPS at the edge; backends communicate over HTTP internally |
| Static asset serving | Serves CSS, JS, and images directly from disk, never forwarding them to app servers |
| Response caching | Caches backend responses by URL/headers to reduce repeated backend load |
| Compression | Applies gzip/Brotli compression to responses before delivery to clients |
| Request/response rewriting | Adds/removes headers, rewrites URLs, injects security headers (HSTS, CSP) |
| DDoS surface reduction | Backends are unreachable from the public internet; only the proxy is exposed |

Nginx Configuration: Production Example

```nginx
# nginx.conf (http {} context): production-grade reverse proxy with TLS,
# static-asset caching, compression, and security headers

# Upstream group referenced by proxy_pass below (address is illustrative)
upstream backend-pool {
    server 10.0.1.15:8080;
}

server {
    listen 443 ssl http2;
    server_name api.company.com;

    # TLS termination
    ssl_certificate     /etc/ssl/certs/company.crt;
    ssl_certificate_key /etc/ssl/private/company.key;
    ssl_protocols       TLSv1.2 TLSv1.3;
    ssl_ciphers         HIGH:!aNULL:!MD5;

    # Compression
    gzip on;
    gzip_types application/json text/plain text/css application/javascript;
    gzip_min_length 1024;

    # Static assets: served directly, never reach the app server
    location /static/ {
        root /var/www/html;
        expires 30d;
        add_header Cache-Control "public, immutable";
    }

    # API traffic: forward to the backend
    location /api/ {
        proxy_pass http://backend-pool;   # upstream group defined above
        proxy_http_version 1.1;

        # Preserve original client info
        proxy_set_header Host              $host;
        proxy_set_header X-Real-IP         $remote_addr;
        proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Security headers
        add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
        add_header X-Content-Type-Options "nosniff" always;
        add_header X-Frame-Options "DENY" always;

        # Timeouts
        proxy_connect_timeout 5s;
        proxy_send_timeout    60s;
        proxy_read_timeout    60s;
    }
}

# Redirect HTTP to HTTPS
server {
    listen 80;
    server_name api.company.com;
    return 301 https://$host$request_uri;
}
```

Core Limitations: A General-Purpose Utility

While a reverse proxy is highly capable, it is fundamentally a general-purpose network utility that operates at the connection and routing layer (typically Layer 7 for HTTP proxying, but without application-level policy awareness). It has key limitations in modern microservices architectures:

  • No API Semantics Awareness: It does not understand the business logic of your APIs. It does not know what /users means versus /orders, or whether a request is authenticated.
  • No Application Auth/AuthZ: Out of the box, it cannot validate user identities, verify JWT signatures and claims against an identity provider, or enforce user-specific scopes.
  • Domain Blindness: It routes HTTP requests using basic rules (hostnames, paths, headers) and has no notion of developer policies, business tiers, or client quotas.

Because a reverse proxy alone cannot solve these application-level challenges, systems must evolve to use more specialized components as they grow.

When to Use a Reverse Proxy

✅ You have one application server (monolith) and need TLS termination without burdening the app.
✅ You need to serve static files efficiently alongside a dynamic backend.
✅ You need to hide backend infrastructure from the public internet.
✅ You need basic URL rewriting, header injection, or response compression.
✅ You are in front of a single service, not distributing across a cluster (that is a load balancer's job).


⚖️ Deep Dive: Load Balancer

What It Is

A Load Balancer is a specialized component designed to distribute incoming traffic across a pool of identical backend servers. Its singular concern is availability and capacity: ensuring no single server becomes a bottleneck or single point of failure.

```text
                           ┌──► Backend Server A (healthy, 40 connections)
                           │
Client ──► [ Load      ] ──┼──► Backend Server B (healthy, 38 connections)
           [ Balancer  ]   │
                           └──► Backend Server C (unhealthy, removed from rotation)
```

L4 vs. L7: The Most Important Distinction

Load balancers operate at one of two layers, with fundamentally different capabilities and performance characteristics:

| Dimension | L4 (Transport Layer) | L7 (Application Layer) |
|---|---|---|
| Inspects | IP address + TCP/UDP port only | HTTP headers, URL path, cookies, request body |
| Routing basis | IP + port tuples | URL pattern, header value, HTTP method |
| TLS handling | Passthrough (cannot decrypt) or terminate | Terminates TLS, inspects decrypted content |
| Performance | Extremely high (hardware-speed) | Lower (must parse HTTP) |
| Use cases | Raw TCP services, databases, non-HTTP protocols | HTTP APIs, microservices, path-based routing |
| AWS equivalent | NLB (Network Load Balancer) | ALB (Application Load Balancer) |
| Examples | AWS NLB, HAProxy TCP mode | AWS ALB, Nginx upstream, HAProxy HTTP mode |
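To make the L7 column concrete: an L7 balancer can choose a backend pool from the URL path, which is information an L4 balancer never sees. The Java sketch below uses longest-prefix matching on whole path segments, with hypothetical pool names. This is one common rule (nginx prefix `location` blocks also prefer the longest match, although they do not enforce segment boundaries). AWS ALB, by contrast, evaluates listener rules in explicit priority order.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical L7 router: picks a backend pool from the URL path.
// An L4 balancer cannot do this, because it never sees the path, only IPs and ports.
final class PathRouter {

    private final Map<String, String> routes = new HashMap<>(); // path prefix -> pool name

    PathRouter route(String prefix, String pool) {
        routes.put(prefix, pool);
        return this;
    }

    /** Returns the pool registered under the longest prefix that matches the path. */
    Optional<String> resolve(String path) {
        String best = null;
        for (String prefix : routes.keySet()) {
            if (matches(path, prefix) && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        return Optional.ofNullable(best).map(routes::get);
    }

    // Match whole path segments, so "/api/users" does not capture "/api/usersettings".
    private static boolean matches(String path, String prefix) {
        return path.equals(prefix) || path.startsWith(prefix.endsWith("/") ? prefix : prefix + "/");
    }
}
```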

How It Works Internally: Health Checking

Active health checking is the most critical function of a load balancer, and also the most often overlooked. A load balancer that cannot detect failed backends is useless.

Health check configuration (AWS ALB example):

```hcl
# Terraform: ALB target group with health checks
resource "aws_lb_target_group" "api" {
  name     = "api-target-group"
  port     = 8080
  protocol = "HTTP"
  vpc_id   = aws_vpc.main.id

  health_check {
    enabled             = true
    path                = "/actuator/health" # Spring Boot Actuator endpoint
    port                = "traffic-port"
    healthy_threshold   = 2     # 2 consecutive successes → mark healthy
    unhealthy_threshold = 3     # 3 consecutive failures → mark unhealthy
    timeout             = 5     # seconds to wait for a response
    interval            = 10    # check every 10 seconds
    matcher             = "200" # only HTTP 200 counts as healthy
  }
}
```

Spring Boot health endpoint (what the load balancer calls):

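```java
// spring-boot-starter-actuator exposes /actuator/health automatically.
// It returns HTTP 200 for UP and 503 for DOWN or OUT_OF_SERVICE.

import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

// Custom health indicator: adds domain-specific checks to /actuator/health
@Component
public class DatabaseHealthIndicator implements HealthIndicator {

    private final DataSource dataSource;

    public DatabaseHealthIndicator(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    @Override
    public Health health() {
        try (Connection conn = dataSource.getConnection()) {
            // isValid returns false (rather than throwing) when the check fails within 1 second
            if (!conn.isValid(1)) {
                return Health.down().withDetail("database", "validation failed").build();
            }
            return Health.up()
                    .withDetail("database", "reachable")
                    .build();
        } catch (SQLException e) {
            // DOWN makes the endpoint answer 503, so the load balancer removes this instance
            return Health.down()
                    .withDetail("database", "unreachable")
                    .withException(e)
                    .build();
        }
    }
}
```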
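The two thresholds in the target group give health checking hysteresis: one failed probe does not evict an instance, and one success does not restore it. Below is a minimal Java model of that behavior. It assumes the consecutive-result counting described in the Terraform comments, and the `TargetHealth` class is hypothetical, not an AWS API.

```java
// Hypothetical model of how a load balancer tracks one target's health.
// With (2, 3): 2 consecutive successes restore a target, 3 consecutive failures evict it.
final class TargetHealth {

    private final int healthyThreshold;
    private final int unhealthyThreshold;
    private boolean healthy;
    private int consecutiveSuccesses;
    private int consecutiveFailures;

    TargetHealth(int healthyThreshold, int unhealthyThreshold, boolean initiallyHealthy) {
        this.healthyThreshold = healthyThreshold;
        this.unhealthyThreshold = unhealthyThreshold;
        this.healthy = initiallyHealthy;
    }

    /** Records one probe result (e.g. "HTTP 200 within the timeout") and returns the new state. */
    boolean record(boolean probeSucceeded) {
        if (probeSucceeded) {
            consecutiveSuccesses++;
            consecutiveFailures = 0;          // any success breaks a failure streak
            if (!healthy && consecutiveSuccesses >= healthyThreshold) {
                healthy = true;
            }
        } else {
            consecutiveFailures++;
            consecutiveSuccesses = 0;         // any failure breaks a success streak
            if (healthy && consecutiveFailures >= unhealthyThreshold) {
                healthy = false;
            }
        }
        return healthy;
    }

    boolean isHealthy() {
        return healthy;
    }
}
```

With `interval = 10` and `unhealthy_threshold = 3`, a crashed instance can keep receiving traffic for on the order of 30 seconds before it is removed. This is one reason clients and gateways still need retries for idempotent requests.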

Load Balancing Algorithms

| Algorithm | How it works | Best for |
|---|---|---|
| Round Robin | Distributes requests in sequential order | Stateless services with uniform request cost |
| Weighted Round Robin | Servers with higher weight receive proportionally more traffic | Mixed instance sizes (e.g., some pods have more CPU) |
| Least Connections | Routes to the server with the fewest active connections | Long-running or variable-cost requests (file uploads, streaming, heavy DB queries) |
| Least Response Time | Routes to the server with the lowest average response time | Latency-sensitive applications |
| IP Hash | Hashes the client IP to always route to the same server | Stateful applications requiring session affinity |
| Random | Picks a random healthy server | Simple, surprisingly effective at scale |
| Resource-based | Routes based on actual CPU/memory utilization | Cloud-native environments with heterogeneous pods |

Least Connections for Heterogeneous Traffic

Unlike Round Robin, which blindly distributes requests, the Least Connections algorithm dynamically adjusts to the actual workload. It is highly effective when request execution times vary significantly (e.g., when some users trigger expensive database queries or file uploads while others hit simple static endpoints).
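The difference between the first and third rows of the table is easiest to see side by side. The Java sketch below implements round robin and least connections over the same pool. It is a single-process illustration with hypothetical class names and no health checks.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical in-process sketches of two algorithms from the table above.
final class Balancers {

    private Balancers() {}

    /** Round robin: hands out servers in a fixed cycle, ignoring how busy each one is. */
    static final class RoundRobin {
        private final List<String> servers;
        private final AtomicInteger next = new AtomicInteger();

        RoundRobin(List<String> servers) {
            this.servers = List.copyOf(servers);
        }

        String pick() {
            // floorMod keeps the index valid even after the counter overflows
            return servers.get(Math.floorMod(next.getAndIncrement(), servers.size()));
        }
    }

    /** Least connections: picks the server with the fewest in-flight requests. */
    static final class LeastConnections {
        private final List<String> servers;
        private final Map<String, AtomicInteger> inFlight = new ConcurrentHashMap<>();

        LeastConnections(List<String> servers) {
            this.servers = List.copyOf(servers);
            this.servers.forEach(s -> inFlight.put(s, new AtomicInteger()));
        }

        /** Call when a request starts; ties go to the server listed first. */
        synchronized String acquire() {
            String best = servers.get(0);
            for (String s : servers) {
                if (inFlight.get(s).get() < inFlight.get(best).get()) {
                    best = s;
                }
            }
            inFlight.get(best).incrementAndGet();
            return best;
        }

        /** Call when the request finishes, whether it succeeded or failed. */
        void release(String server) {
            inFlight.get(server).decrementAndGet();
        }
    }
}
```

Suppose one request pins server `a` with a long upload. Round robin keeps sending it every other request anyway. Least connections routes new work to `b` until `a`'s in-flight count drops.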

Session Stickiness (Sticky Sessions)

For stateful applications that store session data in-memory (legacy apps not yet using distributed sessions), the load balancer can ensure a client always hits the same backend.

```text
Client with session cookie SESS=abc123
  → Load balancer reads the SESS cookie
  → Routes to Server B (which holds SESS=abc123 in memory)
  → NOT Server A or C
```

There are two primary ways to configure session stickiness:

  1. IP Hashing: Routes requests based on hashing the client's IP. However, this is highly prone to traffic imbalances when many clients connect through a shared gateway or NAT (such as a large corporate office).
  2. Cookie-Based Stickiness: The load balancer injects its own cookie (or reads an existing session cookie) to maintain the association. This is much more precise.
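Both stickiness strategies reduce to a small selection function, as the Java sketch below shows. The class is hypothetical. Real load balancers typically encode or sign the cookie value (AWS ALB's `AWSALB` cookie, for example) rather than storing a raw server name. They also tend to use consistent hashing rather than plain modulo, so that adding a server does not remap most clients.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the two stickiness strategies; not any product's implementation.
final class Stickiness {

    private Stickiness() {}

    /** IP hashing: a given client IP maps to the same server as long as the pool is unchanged. */
    static String byIpHash(String clientIp, List<String> servers) {
        // Plain modulo: adding or removing a server remaps most clients
        return servers.get(Math.floorMod(clientIp.hashCode(), servers.size()));
    }

    /**
     * Cookie-based: honour the server named in the stickiness cookie while it is healthy;
     * otherwise use the freshly balanced choice (the caller then sets a new cookie).
     */
    static String byCookie(String cookieValue, Set<String> healthyServers, String freshChoice) {
        if (cookieValue != null && healthyServers.contains(cookieValue)) {
            return cookieValue;
        }
        return freshChoice;
    }
}
```

The NAT problem from item 1 follows directly from `byIpHash`. Every employee behind one office gateway presents the same `clientIp`, so all of them land on the same server.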
Sticky Sessions Are a Scaling Anti-Pattern

Sticky sessions mean you cannot freely scale down, restart, or replace backend instances without losing active user sessions. For modern systems, favor a stateless architecture where session state is externalized (e.g., in Redis or a database), allowing the load balancer to distribute requests completely freely.

When to Use a Load Balancer

✅ You need horizontal scaling: multiple identical instances of the same service.
✅ You need high availability: automatic failover when an instance crashes.
✅ You need zero-downtime deployments: drain connections from old instances before decommissioning them.
✅ You have high raw throughput requirements: L4 load balancers handle millions of connections per second.
✅ You need to spread traffic across Availability Zones so that a single-zone outage does not take the service down.


🚪 Deep Dive: API Gateway

What It Is

An API Gateway is a specialized L7 reverse proxy purpose-built for microservices ecosystems. It acts as the single, smart, policy-enforcing entry point for all client API requests. Unlike a basic reverse proxy (which routes traffic) or a load balancer (which distributes it), the API Gateway understands API semantics and enforces cross-cutting policies: authentication, authorization, rate limiting, quota management, protocol translation, and response aggregation.

```text
Mobile App / Browser / Partner API
        │
        ▼
[ API Gateway: Kong / Spring Cloud Gateway / AWS API Gateway ]
    ├── Authenticate: validate JWT / API key
    ├── Authorize: check user scopes
    ├── Rate limit: 1000 req/min per client
    ├── Route: /v1/orders → Order Service
    └── Transform: REST/HTTP → gRPC (internal)
        │
        ├──► User Service (internal gRPC)
        ├──► Order Service (internal gRPC)
        └──► Payment Service (internal gRPC)
```

The Microservice Perimeter Problem

When scaling a system from a monolith to a microservices architecture, a major architectural pain point emerges: duplicating infrastructure concerns.

If you have 12 separate services (e.g., user service, order service, payment service), they all require authentication, rate limiting, logging, and error tracking. Implementing these concerns independently leads to 12 duplicate copies of the same infrastructure code, managed by different teams, written in potentially different languages. This creates severe maintenance overhead, configuration drift, and security vulnerabilities.

The API Gateway solves this by acting as the single "front door" or perimeter guard, handling these cross-cutting, domain-agnostic policies once at the edge, freeing the backend services to focus purely on their core business logic.

How It Works Internally: Request Pipeline

An API gateway processes each request through an ordered plugin/filter pipeline; each stage can inspect, modify, reject, or short-circuit the request.
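A framework-neutral Java sketch of such a pipeline follows. The `Request`, `Response`, and `Filter` types are hypothetical and far simpler than Kong plugins or Spring Cloud Gateway filters. The control flow is the same, though: filters run in order, and any filter can end the request early.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical, framework-neutral model of a gateway's ordered filter pipeline.
record Request(String path, Map<String, String> headers) {}

record Response(int status, String body) {}

interface Filter {
    /** Returns a response to short-circuit the pipeline, or empty to let the request continue. */
    Optional<Response> apply(Request request);
}

final class Pipeline {

    private final List<Filter> filters;                 // run in list order
    private final Function<Request, Response> backend;  // the routed upstream call

    Pipeline(List<Filter> filters, Function<Request, Response> backend) {
        this.filters = List.copyOf(filters);
        this.backend = backend;
    }

    Response handle(Request request) {
        for (Filter filter : filters) {
            Optional<Response> rejection = filter.apply(request);
            if (rejection.isPresent()) {
                return rejection.get();   // later filters and the backend never run
            }
        }
        return backend.apply(request);    // every filter accepted the request
    }
}
```

Order matters. Each filter only sees requests that every earlier filter accepted, so a per-user rate limiter has to run after the authentication filter that establishes the user's identity.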

Core Responsibilities In Depth

1. Authentication & Authorization

The gateway validates identity once at the perimeter; downstream microservices trust that any request reaching them is already authenticated.

```yaml
# Kong Gateway: JWT plugin configuration
plugins:
  - name: jwt
    config:
      key_claim_name: kid
      claims_to_verify:
        - exp   # token must not be expired
        - nbf   # token must not be used before its "not before" time
      secret_is_base64: false
      run_on_preflight: true
```
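```java
// Spring Cloud Gateway: JWT validation filter
// JwtTokenValidator is an application-specific component (not part of Spring) whose
// validate(token) returns a Mono of claims and signals an error for invalid tokens.
import java.util.Optional;
import org.springframework.cloud.gateway.filter.GatewayFilter;
import org.springframework.cloud.gateway.filter.GatewayFilterChain;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpStatus;
import org.springframework.http.server.reactive.ServerHttpRequest;
import org.springframework.stereotype.Component;
import org.springframework.web.server.ServerWebExchange;
import reactor.core.publisher.Mono;

@Component
public class JwtAuthenticationFilter implements GatewayFilter {

    private final JwtTokenValidator tokenValidator;

    public JwtAuthenticationFilter(JwtTokenValidator tokenValidator) {
        this.tokenValidator = tokenValidator;
    }

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        String authHeader = exchange.getRequest().getHeaders()
                .getFirst(HttpHeaders.AUTHORIZATION);

        if (authHeader == null || !authHeader.startsWith("Bearer ")) {
            return unauthorized(exchange);
        }

        String token = authHeader.substring("Bearer ".length());
        return tokenValidator.validate(token)
                .map(Optional::of)
                // Only validation failures become 401; errors raised later by downstream
                // services (inside chain.filter) must not be rewritten as 401
                .onErrorReturn(Optional.empty())
                .defaultIfEmpty(Optional.empty())
                .flatMap(maybeClaims -> maybeClaims
                        .map(claims -> {
                            // Inject validated claims as headers for downstream services;
                            // set() replaces any client-supplied copies of these headers
                            ServerHttpRequest mutatedRequest = exchange.getRequest().mutate()
                                    .headers(h -> {
                                        h.set("X-User-Id", claims.getSubject());
                                        h.set("X-User-Roles", String.join(",", claims.getRoles()));
                                    })
                                    .build();
                            return chain.filter(exchange.mutate().request(mutatedRequest).build());
                        })
                        .orElseGet(() -> unauthorized(exchange)));
    }

    private static Mono<Void> unauthorized(ServerWebExchange exchange) {
        exchange.getResponse().setStatusCode(HttpStatus.UNAUTHORIZED);
        return exchange.getResponse().setComplete();
    }
}
```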
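Both configurations delegate the cryptography to a library: Kong's `jwt` plugin, or the application's `JwtTokenValidator`. For intuition, the standard-library-only Java sketch below shows roughly what validating an HS256 token involves. It recomputes HMAC-SHA256 over `header.payload`, compares the result with the signature in constant time, and then checks the `nbf`/`exp` window from RFC 7519. It deliberately skips JSON parsing, `alg` header checks, and key rotation, and the `Hs256` class is hypothetical. Production gateways should use a maintained JWT library.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical HS256 helper: signature check plus exp/nbf window, no JSON parsing.
final class Hs256 {

    private static final Base64.Encoder ENCODER = Base64.getUrlEncoder().withoutPadding();
    private static final Base64.Decoder DECODER = Base64.getUrlDecoder();

    private Hs256() {}

    /** Builds header.payload.signature from raw JSON strings (useful for tests). */
    static String sign(String headerJson, String payloadJson, byte[] secret) {
        String signingInput = ENCODER.encodeToString(headerJson.getBytes(StandardCharsets.UTF_8))
                + "." + ENCODER.encodeToString(payloadJson.getBytes(StandardCharsets.UTF_8));
        return signingInput + "." + ENCODER.encodeToString(hmac(signingInput, secret));
    }

    /** True if the token has three parts and its signature matches the secret. */
    static boolean signatureValid(String token, byte[] secret) {
        String[] parts = token.split("\\.");
        if (parts.length != 3) {
            return false;
        }
        byte[] expected = hmac(parts[0] + "." + parts[1], secret);
        byte[] actual;
        try {
            actual = DECODER.decode(parts[2]);
        } catch (IllegalArgumentException e) {
            return false;                                  // not valid base64url
        }
        return MessageDigest.isEqual(expected, actual);    // constant-time comparison
    }

    /** RFC 7519 window: now >= nbf and now < exp, with leeway for clock skew. */
    static boolean timeValid(long nowEpochSeconds, long nbf, long exp, long leewaySeconds) {
        return nowEpochSeconds + leewaySeconds >= nbf && nowEpochSeconds - leewaySeconds < exp;
    }

    private static byte[] hmac(String data, byte[] secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.US_ASCII));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }
}
```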

2. Rate Limiting

The gateway enforces per-client request quotas, keeping the counters in Redis so that limits hold across multiple gateway instances.

Centralized Algorithms Guide

For a comprehensive architectural breakdown of rate-limiting algorithms, implementation code, and decision trade-offs, see the Rate Limiting Algorithms Guide.

Token Bucket algorithm (conceptual):

```text
Each client gets a "bucket" that holds N tokens.
Each request consumes 1 token.
Tokens refill at a fixed rate (e.g., 100 tokens/minute).
If the bucket is empty → reject with HTTP 429.
```
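```java
// Spring Cloud Gateway: Redis-backed rate limiter (token bucket)
import org.springframework.cloud.gateway.filter.ratelimit.KeyResolver;
import org.springframework.cloud.gateway.filter.ratelimit.RedisRateLimiter;
import org.springframework.cloud.gateway.route.RouteLocator;
import org.springframework.cloud.gateway.route.builder.RouteLocatorBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import reactor.core.publisher.Mono;

@Configuration
public class RateLimitConfig {

    @Bean
    public RouteLocator routes(RouteLocatorBuilder builder, RedisRateLimiter rateLimiter) {
        return builder.routes()
                .route("order-service", r -> r
                        .path("/v1/orders/**")
                        .filters(f -> f
                                .requestRateLimiter(config -> config
                                        .setRateLimiter(rateLimiter)
                                        .setKeyResolver(userKeyResolver())) // rate limit per user ID
                                .rewritePath("/v1/orders/(?<segment>.*)", "/api/${segment}"))
                        .uri("lb://order-service")) // service discovery via Eureka/Consul
                .build();
    }

    @Bean
    public RedisRateLimiter redisRateLimiter() {
        return new RedisRateLimiter(
                100, // replenishRate: tokens added per second
                200, // burstCapacity: max tokens in the bucket
                1);  // requestedTokens: tokens consumed per request
    }

    @Bean
    public KeyResolver userKeyResolver() {
        // Assumes the JWT filter has already set X-User-Id on this route; otherwise clients
        // could spoof the header to dodge limits. Unauthenticated traffic shares one bucket.
        return exchange -> Mono.justOrEmpty(
                        exchange.getRequest().getHeaders().getFirst("X-User-Id"))
                .defaultIfEmpty("anonymous");
    }
}
```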
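`RedisRateLimiter` keeps each client's bucket in Redis so that every gateway instance enforces the same limit. The bucket logic itself is small. The single-JVM Java sketch below gives `replenishRate` (tokens per second) and `burstCapacity` (bucket size) the same meaning. The `TokenBucket` class is hypothetical, and time is passed in explicitly to keep it testable.

```java
// Hypothetical single-JVM token bucket; RedisRateLimiter does the equivalent inside Redis.
final class TokenBucket {

    private final long capacity;          // burstCapacity: max tokens the bucket can hold
    private final double refillPerNano;   // replenishRate converted to tokens per nanosecond
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(long capacity, long tokensPerSecond, long nowNanos) {
        this.capacity = capacity;
        this.refillPerNano = tokensPerSecond / 1_000_000_000.0;
        this.tokens = capacity;           // start full so an idle client can burst
        this.lastRefillNanos = nowNanos;
    }

    /** True if the request may proceed; false means respond with HTTP 429. */
    synchronized boolean tryConsume(long requestedTokens, long nowNanos) {
        long elapsed = Math.max(0, nowNanos - lastRefillNanos);
        tokens = Math.min(capacity, tokens + elapsed * refillPerNano);  // lazy refill, capped
        lastRefillNanos = nowNanos;
        if (tokens >= requestedTokens) {
            tokens -= requestedTokens;
            return true;
        }
        return false;
    }
}
```

In production you would pass `System.nanoTime()`. Spring's implementation runs the equivalent refill-and-take step as a Lua script inside Redis, which makes the read-modify-write atomic across gateway instances; `synchronized` only gives that guarantee within one JVM.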

3. Service Discovery Integration

Unlike a reverse proxy with hardcoded backend IPs, an API gateway integrates with a service registry to dynamically resolve which instances are currently healthy and where they are running.

```yaml
# Spring Cloud Gateway application.yml
spring:
  cloud:
    gateway:
      discovery:
        locator:
          enabled: true              # auto-create routes from the service registry
          lower-case-service-id: true
      routes:
        - id: order-service
          uri: lb://order-service    # lb:// prefix = resolve from the service registry
          predicates:
            - Path=/v1/orders/**
          filters:
            # In YAML, write $\{segment}; a plain ${segment} is treated as a property placeholder
            - RewritePath=/v1/orders/(?<segment>.*), /api/$\{segment}
            - name: CircuitBreaker
              args:
                name: orderServiceCB
                fallbackUri: forward:/fallback/orders
```
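Two things happen on that route: the path is rewritten, and `lb://order-service` is resolved to a live instance. The rewrite uses ordinary Java regex syntax, in which `${segment}` in the replacement refers to the named group, so it can be checked directly with `String.replaceAll`. The registry lookup below is a hypothetical in-memory stand-in for Eureka, Consul, or Kubernetes, with round-robin instance selection.

```java
import java.net.URI;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a service registry plus the route's RewritePath filter.
final class DiscoveryRouting {

    private final Map<String, List<URI>> registry;   // service id -> currently healthy instances
    private final AtomicInteger counter = new AtomicInteger();

    DiscoveryRouting(Map<String, List<URI>> registry) {
        this.registry = registry;
    }

    /** Same regex and replacement as the RewritePath filter above (Java, not YAML, escaping). */
    static String rewrite(String path) {
        return path.replaceAll("/v1/orders/(?<segment>.*)", "/api/${segment}");
    }

    /** Resolves "lb://service-id" to one concrete instance, round robin. */
    URI resolve(URI lbUri, String rewrittenPath) {
        List<URI> instances = registry.getOrDefault(lbUri.getHost(), List.of());
        if (instances.isEmpty()) {
            throw new IllegalStateException("no instances registered for " + lbUri.getHost());
        }
        URI base = instances.get(Math.floorMod(counter.getAndIncrement(), instances.size()));
        return base.resolve(rewrittenPath);
    }
}
```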

4. API Composition / Aggregation (Backend-for-Frontend Pattern)

The gateway can make parallel calls to multiple microservices and merge their responses, reducing client round-trips.

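```java
// Gateway aggregates 3 microservice calls into one response for the dashboard.
// UserProfile, Order, Notification and DashboardResponse are application types (not shown).
import java.util.List;
import org.springframework.core.ParameterizedTypeReference;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestHeader;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

@RestController
public class DashboardAggregator {

    private final WebClient userClient;
    private final WebClient orderClient;
    private final WebClient notificationClient;

    public DashboardAggregator(WebClient.Builder builder) {
        // Base URLs are illustrative; resolving service names requires a @LoadBalanced builder
        this.userClient = builder.clone().baseUrl("http://user-service").build();
        this.orderClient = builder.clone().baseUrl("http://order-service").build();
        this.notificationClient = builder.clone().baseUrl("http://notification-service").build();
    }

    @GetMapping("/v1/dashboard")
    public Mono<DashboardResponse> getDashboard(@RequestHeader("X-User-Id") String userId) {

        // These Monos are lazy: nothing is sent until zip subscribes to all three
        Mono<UserProfile> userMono = userClient
                .get().uri("/users/{id}", userId)
                .retrieve().bodyToMono(UserProfile.class)
                .onErrorReturn(UserProfile.empty()); // fallback on failure

        Mono<List<Order>> ordersMono = orderClient
                .get().uri("/orders?userId={id}&limit=5", userId)
                .retrieve().bodyToMono(new ParameterizedTypeReference<List<Order>>() {})
                .onErrorReturn(List.of());

        Mono<List<Notification>> notifMono = notificationClient
                .get().uri("/notifications?userId={id}&unread=true", userId)
                .retrieve().bodyToMono(new ParameterizedTypeReference<List<Notification>>() {})
                .onErrorReturn(List.of());

        // zip subscribes to all three concurrently and waits for every result
        return Mono.zip(userMono, ordersMono, notifMono)
                .map(tuple -> new DashboardResponse(tuple.getT1(), tuple.getT2(), tuple.getT3()));
    }
}
```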

5. Protocol & Content Translation

The gateway can translate network protocols (e.g., exposing public HTTP REST/WebSocket endpoints but translating them to internal gRPC or AMQP message broker commands) and perform payload transformations.

  • Legacy Systems Integration: E.g., translating a modern client JSON payload into legacy XML format required by an old SOAP payment service.
  • Response Sanitization: E.g., stripping internal backend debug fields, database IDs, or sensitive stack traces from headers/bodies before returning responses to the client.

```text
External Client:  POST /v1/orders HTTP/1.1 (JSON over HTTP)
        ↓
API Gateway:      translates protocol and format
        ↓
Internal Service: OrderService.CreateOrder(CreateOrderRequest) (Protobuf over HTTP/2)
```
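Response sanitization is often the easiest translation step to start with. The minimal Java sketch below assumes a hypothetical policy: drop `Server` and `X-Powered-By`, plus any header that uses an `X-Internal-` prefix convention. HTTP header names are case-insensitive, so the matching is too.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical response-header sanitizer applied by the gateway before replying to clients.
final class ResponseSanitizer {

    private static final Set<String> BLOCKED = Set.of("server", "x-powered-by");
    private static final String INTERNAL_PREFIX = "x-internal-";

    private ResponseSanitizer() {}

    /** Returns a copy of the headers without anything that reveals backend internals. */
    static Map<String, List<String>> sanitize(Map<String, List<String>> headers) {
        Map<String, List<String>> clean = new LinkedHashMap<>();
        headers.forEach((name, values) -> {
            String lower = name.toLowerCase(Locale.ROOT);   // header names are case-insensitive
            if (!BLOCKED.contains(lower) && !lower.startsWith(INTERNAL_PREFIX)) {
                clean.put(name, values);
            }
        });
        return clean;
    }
}
```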

When to Use an API Gateway

✅ You have multiple microservices that need a unified, versioned public API surface.
✅ You need centralized authentication without duplicating JWT validation in every service.
✅ You need rate limiting, quota management, or API monetization.
✅ You need to shield clients from internal service topology changes (service renamed, split, merged).
✅ You need protocol translation (REST → gRPC, HTTP/1.1 → HTTP/2, REST → WebSocket).
✅ You need API versioning (/v1/, /v2/) without touching downstream services.


🌀 The Spectrum of Capabilities & Tool Overlap

To design scalable systems, it is vital to realize that Reverse Proxies, Load Balancers, and API Gateways are not competing, isolated technologies โ€” they represent an evolutionary spectrum of network capabilities.

Each stage builds upon the foundation of the previous one:

  1. Reverse Proxy (Foundational Layer): Focuses on raw connection handling, TLS termination, caching, compression, and basic IP hiding.
  2. Load Balancer (Scale Layer): Adds horizontal scaling, backend health awareness, failover handling, and L4/L7 traffic distribution.
  3. API Gateway (Application Layer): Adds fine-grained API semantics, authorization scopes, per-client rate limits, payload transformations, monetization, and developer portal policies.

Why the Lines Are Blurred: The Tool Overlap

Because these roles represent a spectrum, the software tools we use do not strictly respect these boundaries. A single product is often configured to wear multiple hats:

  • NGINX: Originally designed as a high-performance reverse proxy and web server. By adding an upstream block, it acts as a load balancer. By compiling it with Lua (via OpenResty) and injecting plugins for authentication and rate-limiting, it behaves as an API gateway.
  • Kong: A dedicated API gateway. However, underneath the hood, Kong is built directly on top of NGINX and OpenResty. It relies on NGINX's reverse proxying and load balancing capabilities, layering its own admin API and plugins on top.
  • AWS Application Load Balancer (ALB) vs. AWS API Gateway: An AWS ALB operates at Layer 7 and is capable of content-based routing (e.g., directing /users to a user service). However, it does not validate JWTs, rate limit per user, or perform payload translation. For those capabilities, you must layer the AWS API Gateway in front of or instead of the ALB.

Understanding these overlaps allows you to choose tools based on the specific capabilities your architecture demands, rather than the marketing names of the products.


⚖️ Alternatives & When to Choose What

Before picking a component, understand the full landscape including the emerging Service Mesh option.

Full Comparison Matrix

| Dimension | Reverse Proxy | L4 Load Balancer | L7 Load Balancer | API Gateway | Service Mesh |
|---|---|---|---|---|---|
| Primary concern | Hiding backends, TLS, caching | Raw TCP distribution | HTTP-aware distribution | API policy enforcement | East-west service-to-service |
| OSI Layer | L7 (L4 passthrough available) | L4 | L7 | L7 | L4 + L7 sidecar |
| Auth / AuthZ | ❌ Basic only | ❌ None | ❌ None | ✅ Full JWT/OAuth2 | ⚠️ mTLS only |
| Rate limiting | ⚠️ Basic (Nginx limit_req) | ❌ None | ❌ None | ✅ Per-client, per-route | ❌ None |
| Service discovery | ❌ Static config | ❌ Static | ⚠️ Manual | ✅ Dynamic (Eureka/Consul/K8s) | ✅ Automatic |
| Health checking | ⚠️ Passive only | ✅ Active | ✅ Active | ✅ Active + circuit breaker | ✅ Active |
| Protocol translation | ❌ | ❌ | ❌ | ✅ REST ↔ gRPC, HTTP ↔ WS | ❌ |
| API composition | ❌ | ❌ | ❌ | ✅ | ❌ |
| Observability | ⚠️ Access logs | ⚠️ Flow logs | ⚠️ Access logs | ✅ Distributed tracing | ✅ Automatic traces |
| Traffic encryption | TLS termination | TLS passthrough | TLS termination | TLS termination | mTLS between services |
| Operational complexity | Low | Very low | Low | Medium | High |
| Best for | Single app, monolith | High-volume TCP | HTTP scaling | Microservices API perimeter | Kubernetes-native service mesh |

1. Reverse Proxy Only (Monolith / Simple Services)

```text
Internet → [ Nginx ] → Single App Server
```

Choose when: One application, one server, TLS needed, static assets to serve. Adding a load balancer or gateway would be over-engineering.


2. Reverse Proxy + L4 Load Balancer (Scaling Without Smart Routing)

```text
Internet → [ AWS NLB ] → [ Nginx cluster ] → App Servers
```

Choose when: High raw throughput of TCP connections (millions/sec), no need for HTTP-level routing. NLB handles TCP distribution; Nginx handles TLS and static assets.


3. L7 Load Balancer with Path Routing (Simple Microservices)

```text
Internet → [ AWS ALB ] → /api/users  → User Service
                       → /api/orders → Order Service
```

Choose when: Small number of microservices, no complex auth needs, cost-sensitive (no gateway license needed). AWS ALB's listener rules cover simple path-based routing without a dedicated gateway.


4. Full Stack: L4 LB + API Gateway + Services (Production Microservices)

```text
Internet → [ NLB ] → [ API Gateway ] → [ Services ]
```

Choose when: Production microservices with auth, rate limiting, versioning, and dynamic service discovery. This is the gold standard for most enterprise systems.


5. Service Mesh: The Fourth Entrant

A Service Mesh (Istio, Linkerd, Consul Connect) solves a different problem than the other three: east-west traffic (service-to-service inside the cluster), not north-south (client-to-cluster).

```text
Without Service Mesh:
  Service A → Service B (plain HTTP, no auth, no retry, no circuit breaker)

With Service Mesh:
  Service A → [Sidecar Proxy] → [Sidecar Proxy] → Service B
  (mTLS, retries, circuit breaking, distributed tracing: all automatic)
```

Service Mesh vs. API Gateway:

| Concern | API Gateway | Service Mesh |
|---|---|---|
| Traffic direction | North-south (client → cluster) | East-west (service → service) |
| Auth mechanism | JWT / OAuth2 / API Key | mTLS (mutual TLS certificates) |
| Scope | External API surface | All internal service communication |
| Protocol translation | ✅ REST ↔ gRPC | ❌ |
| Rate limiting | ✅ Per external client | ❌ |
| Deployment model | Centralized gateway pod(s) | Sidecar injected into every pod |

They are complementary, not alternatives. A production Kubernetes cluster often runs both: an API Gateway for the external perimeter and a service mesh for internal security and observability.


How They Coexist in Production

Standard Enterprise Architecture

The Request Lifecycle (Every Layer Explained)

1. DNS Resolution: api.company.com resolves to the public IP of the WAF or CDN edge node. Route 53 can also apply geographic routing, directing Asia-Pacific users to the Singapore region and EU users to Frankfurt.

2. WAF (Web Application Firewall): Sits in front of everything. It inspects raw HTTP for SQL injection patterns, XSS payloads, and known CVE exploit patterns, and blocks malicious IPs. This is not a proxy, load balancer, or gateway; it is a security appliance. But it belongs in this request lifecycle.

3. CDN Edge (A Global Network of Reverse Proxies): A CDN (like Cloudflare, CloudFront, or Fastly) is essentially a massive, globally distributed network of reverse proxies. By placing edge servers geographically closer to users, the CDN terminates TLS at the edge, serves static assets (JS, CSS, images) and cached API responses instantly, and absorbs massive traffic spikes before they ever reach your origin network.

4. L4 Load Balancer (AWS NLB): Distributes raw TCP connections across multiple API Gateway instances. It does not inspect HTTP, and it usually passes TLS through (NLB can optionally terminate it). It operates at very high throughput, in the range of millions of requests per second, and provides cross-AZ redundancy for the API Gateway cluster itself.

5. API Gateway Cluster: Terminates TLS (if not done at NLB). Validates JWT tokens. Checks rate limits per user/IP in Redis. Routes based on URL path to the appropriate microservice. Injects authenticated user context as headers. Emits distributed trace spans.

6. Microservices (Private Subnet): Never exposed to the public internet. Accept only connections from the API Gateway's CIDR range via security group rules. Trust X-User-Id and X-User-Roles headers injected by the gateway (they do not re-validate JWTs). Communicate with each other via service mesh mTLS.

Zero-Downtime Deployment Flow


Senior Deep Dive

1. The API Gateway Monolith Anti-Pattern

This is the most dangerous long-term failure mode for API gateways. As teams add features, the gateway accumulates business logic that belongs in services.

```text
❌ Anti-pattern: business logic in the gateway
GET /v1/order-summary
  → Gateway fetches order
  → Gateway calculates discount based on user tier   (BUSINESS LOGIC)
  → Gateway fetches user loyalty points              (DOMAIN KNOWLEDGE)
  → Gateway computes final total                     (DOMAIN LOGIC)
  → Returns composed response

✅ Correct: gateway stays domain-agnostic
GET /v1/order-summary
  → Gateway authenticates + routes
  → Order Service computes everything (owns the domain)
  → Gateway returns the response unchanged
```

Rule of thumb: If removing the gateway's logic would require changing a service's interface or behavior, that logic does not belong in the gateway. The gateway should be replaceable with a different gateway product without rewriting business rules.


2. Circuit Breaker at the Gateway Layer

The gateway is the ideal place to implement circuit breakers: if a downstream service is degraded, the gateway can fail fast rather than letting all requests queue up and exhaust threads.

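```java
// Spring Cloud Gateway: Resilience4j circuit breaker plus retry
import java.time.Duration;
import java.util.Map;
import org.springframework.cloud.gateway.route.RouteLocator;
import org.springframework.cloud.gateway.route.builder.RouteLocatorBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@Configuration
class PaymentRoutes {

    @Bean
    public RouteLocator routesWithCircuitBreaker(RouteLocatorBuilder builder) {
        return builder.routes()
                .route("payment-service", r -> r
                        .path("/v1/payments/**")
                        .filters(f -> f
                                // Thresholds (failure rate, window size, wait duration) come from the
                                // Resilience4j configuration for "paymentServiceCB", not from this filter
                                .circuitBreaker(config -> config
                                        .setName("paymentServiceCB")
                                        .setFallbackUri("forward:/fallback/payment"))
                                // RetryConfig retries GET only by default, so POSTed payments are not replayed
                                .retry(retryConfig -> retryConfig
                                        .setRetries(2)
                                        .setStatuses(HttpStatus.SERVICE_UNAVAILABLE)
                                        // backoff: 100 ms, doubling each attempt, capped at 1 s
                                        .setBackoff(Duration.ofMillis(100), Duration.ofSeconds(1), 2, false)))
                        .uri("lb://payment-service"))
                .build();
    }
}

// Fallback controller: returns a degraded response when the circuit is open
@RestController
public class FallbackController {

    // @RequestMapping (not @GetMapping): a forwarded POST keeps its HTTP method
    @RequestMapping("/fallback/payment")
    public ResponseEntity<Map<String, String>> paymentFallback() {
        return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE)
                .body(Map.of(
                        "error", "Payment service temporarily unavailable",
                        "retryAfter", "30"));
    }
}
```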

Circuit breaker states:

CLOSED (normal): Requests pass through. Failure rate is monitored.
↓ (failure rate reaches the threshold)
OPEN (failing fast): Requests get the fallback (here, a 503) immediately. The backend gets no load.
↓ (after the configured wait duration)
HALF-OPEN (probing): A limited number of trial requests pass through to test recovery. If they fail, the breaker returns to OPEN.
↓ (probes succeed)
CLOSED (recovered): Full traffic resumes.
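The state transitions above can be sketched as a small count-based state machine. This is a simplified illustration, not Resilience4j's implementation: the `SimpleCircuitBreaker` class, its window size, failure-rate threshold, wait duration, probe count, and the injectable clock are all assumptions chosen for readability.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;

enum BreakerState { CLOSED, OPEN, HALF_OPEN }

// Simplified count-based circuit breaker (illustrative; not Resilience4j's implementation)
final class SimpleCircuitBreaker {
    private final int windowSize;               // how many recent calls are evaluated
    private final double failureRateThreshold;  // e.g. 0.5 opens the circuit at 50% failures
    private final long openWaitMillis;          // how long to fail fast before probing
    private final int halfOpenProbes;           // trial calls allowed while HALF_OPEN
    private final LongSupplier clock;           // injectable time source (millis)

    private final Deque<Boolean> window = new ArrayDeque<>(); // true = failed call
    private BreakerState state = BreakerState.CLOSED;
    private long openedAt;
    private int probesStarted;
    private int probeSuccesses;

    SimpleCircuitBreaker(int windowSize, double failureRateThreshold, long openWaitMillis,
                         int halfOpenProbes, LongSupplier clock) {
        this.windowSize = windowSize;
        this.failureRateThreshold = failureRateThreshold;
        this.openWaitMillis = openWaitMillis;
        this.halfOpenProbes = halfOpenProbes;
        this.clock = clock;
    }

    // Ask before forwarding a request; false means fail fast (serve the fallback / 503)
    synchronized boolean tryAcquire() {
        if (state == BreakerState.OPEN && clock.getAsLong() - openedAt >= openWaitMillis) {
            state = BreakerState.HALF_OPEN;        // wait elapsed: start probing
            probesStarted = 0;
            probeSuccesses = 0;
        }
        if (state == BreakerState.OPEN) {
            return false;
        }
        if (state == BreakerState.HALF_OPEN) {
            if (probesStarted >= halfOpenProbes) {
                return false;                      // probes already in flight; keep failing fast
            }
            probesStarted++;
        }
        return true;
    }

    // Report the outcome of a call that tryAcquire() allowed
    synchronized void record(boolean success) {
        if (state == BreakerState.HALF_OPEN) {
            if (!success) {
                trip();                            // probe failed: back to OPEN
            } else if (++probeSuccesses >= halfOpenProbes) {
                state = BreakerState.CLOSED;       // all probes succeeded: recovered
                window.clear();
            }
            return;
        }
        if (state == BreakerState.OPEN) {
            return;                                // late result from before the trip; ignore
        }
        window.addLast(!success);
        if (window.size() > windowSize) {
            window.removeFirst();
        }
        long failures = window.stream().filter(failed -> failed).count();
        if (window.size() == windowSize && (double) failures / windowSize >= failureRateThreshold) {
            trip();
        }
    }

    synchronized BreakerState state() {
        return state;
    }

    private void trip() {
        state = BreakerState.OPEN;
        openedAt = clock.getAsLong();
        window.clear();
    }
}
```

Production libraries add time-based windows, slow-call detection, and metrics, but the CLOSED → OPEN → HALF-OPEN → CLOSED loop is the same shape.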

3. TLS Termination Strategy: Where to Decrypt?

TLS can be terminated at different layers, each with different security and performance implications.

Option A: Terminate at the L4 load balancer (NLB)
Client → [NLB: TLS terminate] → [API Gateway: HTTP] → [Services: HTTP]
✅ Offloads crypto from the gateway
❌ Traffic between NLB and gateway is plain HTTP; acceptable only inside a trusted VPC

Option B: TLS passthrough at L4, terminate at the API Gateway
Client → [NLB: TCP passthrough] → [API Gateway: TLS terminate] → [Services: HTTP]
✅ Gateway can route on SNI and enforce its full TLS policy
✅ NLB never sees decrypted traffic
✅ A common production pattern

Option C: End-to-end TLS (mTLS to services)
Client → [NLB: TCP passthrough] → [Gateway: TLS terminate, re-encrypt] → [Services: mTLS]
✅ Encrypted all the way to the service (zero-trust posture)
✅ Often expected in regulated environments (PCI DSS, HIPAA)
❌ Each service needs certificate issuance and rotation (a service mesh can automate this)
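For Option C without a service mesh, the gateway itself must present a client certificate when it calls services. A minimal sketch using only JDK APIs (`javax.net.ssl` and `java.net.http.HttpClient`); the `MtlsClients` class name, the keystore contents, and the password handling are assumptions, and with a framework gateway you would configure its own HTTP client instead.

```java
import java.net.http.HttpClient;
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

// Builds the TLS client side for gateway -> service mTLS (hypothetical helper class)
final class MtlsClients {

    // identity: keystore holding the gateway's certificate chain and private key
    // trustAnchors: keystore holding the CA that issued the services' certificates
    //               (null falls back to the JDK's default trust store)
    static SSLContext mtlsContext(KeyStore identity, char[] keyPassword, KeyStore trustAnchors)
            throws Exception {
        KeyManagerFactory kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(identity, keyPassword);   // client certificate presented during the handshake

        TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustAnchors);            // used to verify each service's server certificate

        SSLContext ctx = SSLContext.getInstance("TLSv1.3");
        ctx.init(kmf.getKeyManagers(), tmf.getTrustManagers(), null);
        return ctx;
    }

    // Requests sent through this client are re-encrypted on the way to the service
    static HttpClient serviceClient(SSLContext ctx) {
        return HttpClient.newBuilder().sslContext(ctx).build();
    }
}
```

In practice the identity keystore would be loaded from a PKCS12 file via `KeyStore.getInstance("PKCS12")` and `load(InputStream, char[])`. Rotating those files on every service is the operational cost the ❌ line refers to, and it is what a mesh automates with short-lived certificates.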

4. Observability: What to Instrument at Each Layer

Each component should emit its own signals. Correlate them with a shared Trace ID (W3C traceparent header or X-B3-TraceId).

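// API Gateway: custom span per routed request (assumes Spring Boot 3 + Micrometer Tracing)
import io.micrometer.tracing.Span;
import io.micrometer.tracing.Tracer;
import org.springframework.cloud.gateway.filter.GatewayFilterChain;
import org.springframework.cloud.gateway.filter.GlobalFilter;
import org.springframework.cloud.gateway.route.Route;
import org.springframework.cloud.gateway.support.ServerWebExchangeUtils;
import org.springframework.http.HttpStatusCode;
import org.springframework.stereotype.Component;
import org.springframework.web.server.ServerWebExchange;
import reactor.core.publisher.Mono;

// GlobalFilter (not GatewayFilter) so the @Component bean applies to every route automatically
@Component
public class TracingFilter implements GlobalFilter {

    private final Tracer tracer;

    public TracingFilter(Tracer tracer) {
        this.tracer = tracer;
    }

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        // The matched route is stored as an exchange attribute before global filters run
        Route route = exchange.getAttribute(ServerWebExchangeUtils.GATEWAY_ROUTE_ATTR);
        Span span = tracer.nextSpan()
                .name("api-gateway.route")
                .tag("http.method", exchange.getRequest().getMethod().name())
                .tag("http.path", exchange.getRequest().getPath().value())
                .tag("gateway.route", route != null ? route.getId() : "unknown")
                .start();

        return chain.filter(exchange)
                .doOnSuccess(v -> {
                    HttpStatusCode status = exchange.getResponse().getStatusCode();
                    if (status != null) {
                        span.tag("http.status", String.valueOf(status.value()));
                    }
                })
                .doOnError(span::error)
                .doFinally(signal -> span.end());
    }
}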
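Correlation across layers only works if every hop forwards the same trace ID while minting its own span ID. A minimal sketch of the W3C Trace Context `traceparent` format (`version-traceid-parentid-flags`, lowercase hex); the `TraceParent` record and its methods are hypothetical, it handles version `00` only, and real deployments would rely on the Micrometer Tracing or OpenTelemetry propagators rather than hand-rolled parsing.

```java
import java.security.SecureRandom;
import java.util.HexFormat;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal W3C traceparent handling (version 00 only); illustrative, not a full propagator
record TraceParent(String traceId, String parentId, String flags) {

    private static final Pattern FORMAT =
            Pattern.compile("00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})");
    private static final SecureRandom RANDOM = new SecureRandom();

    // Parse an incoming header; empty if malformed or using the forbidden all-zero IDs
    static Optional<TraceParent> parse(String header) {
        if (header == null) return Optional.empty();
        Matcher m = FORMAT.matcher(header.trim());
        if (!m.matches()) return Optional.empty();
        String traceId = m.group(1);
        String parentId = m.group(2);
        if (traceId.chars().allMatch(c -> c == '0') || parentId.chars().allMatch(c -> c == '0')) {
            return Optional.empty();
        }
        return Optional.of(new TraceParent(traceId, parentId, m.group(3)));
    }

    // Start a new trace at the edge when no valid header arrived (flags 01 = sampled)
    static TraceParent newRoot() {
        return new TraceParent(randomHex(16), randomHex(8), "01");
    }

    // Same trace ID, fresh span ID: what each hop forwards to the next layer
    TraceParent child() {
        return new TraceParent(traceId, randomHex(8), flags);
    }

    String header() {
        return "00-" + traceId + "-" + parentId + "-" + flags;
    }

    private static String randomHex(int bytes) {
        byte[] b = new byte[bytes];
        RANDOM.nextBytes(b);
        return HexFormat.of().formatHex(b);   // lowercase hex, as the spec requires
    }
}
```

Because the trace ID survives every `child()` call, logs from the load balancer, gateway, and services can all be joined on that one value.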

Key metrics per layer:

| Layer | Key Metrics | Alerts |
|---|---|---|
| L4 Load Balancer | Active connections, new connections/sec, unhealthy target count | Unhealthy targets > 0, connection error spike |
| API Gateway | Request rate, error rate (4xx/5xx), p99 latency, auth failures, rate limit hits | Error rate > 1%, p99 latency > 2 s, rate limit hit rate growing |
| Reverse Proxy | Cache hit ratio, upstream response time, TLS handshake errors | Cache hit ratio drops, upstream 5xx rate increases |
| Backend Services | JVM heap, GC pause, DB pool utilization, business error rates | Health check failures (propagate to the LB immediately) |
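The API Gateway alert thresholds in the table (error rate above 1%, p99 latency above 2 s) can be evaluated over a window of request samples as below. This is a sketch only: the nearest-rank percentile method, the hypothetical `Sample` record and `GatewayAlerts` class, and counting only 5xx responses toward the error-rate alert are assumptions. Production setups usually compute these from histograms in a metrics backend rather than from raw samples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// One observed request at the gateway (hypothetical shape)
record Sample(int status, long latencyMillis) {}

final class GatewayAlerts {
    static final double MAX_ERROR_RATE = 0.01;   // alert when error rate > 1%
    static final long MAX_P99_MILLIS = 2_000;    // alert when p99 latency > 2 s

    // Nearest-rank percentile: the value at rank ceil(p/100 * n) in sorted order
    static long percentile(long[] values, double p) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    // Assumption: only 5xx responses count toward the error-rate alert
    static double errorRate(List<Sample> window) {
        if (window.isEmpty()) return 0.0;
        long errors = window.stream().filter(s -> s.status() >= 500).count();
        return (double) errors / window.size();
    }

    static List<String> firingAlerts(List<Sample> window) {
        List<String> alerts = new ArrayList<>();
        if (window.isEmpty()) return alerts;
        long p99 = percentile(window.stream().mapToLong(Sample::latencyMillis).toArray(), 99);
        if (errorRate(window) > MAX_ERROR_RATE) alerts.add("error-rate");
        if (p99 > MAX_P99_MILLIS) alerts.add("p99-latency");
        return alerts;
    }
}
```

With 100 requests where two slow calls returned 503, both alerts fire: the error rate is 2% and the 99th-ranked latency is one of the slow calls. This is why p99 catches tail problems that an average would hide.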

5. Choosing Between Kong, AWS API Gateway, and Spring Cloud Gateway

| Criterion | Kong Gateway | AWS API Gateway | Spring Cloud Gateway |
|---|---|---|---|
| Deployment model | Self-hosted (K8s / VM) or Kong's managed cloud offering | Fully managed AWS service | Self-hosted, runs as its own Spring Boot application |
| Protocol support | HTTP, gRPC, WebSocket, TCP | HTTP, WebSocket | HTTP, WebSocket, gRPC (via filter) |
| Plugin ecosystem | 100+ plugins (OSS + Enterprise) | Lambda integrations | Spring ecosystem (Spring Security, Resilience4j, etc.) |
| Service discovery | Consul, Kubernetes, custom | AWS integrations (VPC Link, Cloud Map for HTTP APIs) | Eureka, Consul, Kubernetes |
| Rate limiting | Built-in (Redis/Postgres backed) | Built-in (usage plans, throttling) | Built-in RequestRateLimiter filter (Redis-backed) |
| Auth | JWT, OAuth2, LDAP, OIDC plugins | IAM, Cognito, Lambda authorizer | Spring Security + custom filters |
| Operational overhead | Medium (K8s deployment) | Minimal (fully managed) | Low to medium (you deploy and scale the gateway app) |
| Cost | Open source (self-hosted) | Per-request pricing (e.g. $3.50 per million REST API requests at the first tier) | Free (Spring OSS) |
| Best for | Polyglot microservices on K8s | AWS-native serverless architectures | Spring Boot microservices teams |

🎯 Interview Decision Matrix

Decision Flow

Is it ONE server / monolith that needs TLS and static serving?
→ Reverse Proxy (Nginx)

Does it need to scale HORIZONTALLY across IDENTICAL instances?
→ L4 Load Balancer (AWS NLB) for TCP
→ L7 Load Balancer (AWS ALB) for HTTP with path routing

Does it serve a MICROSERVICES ecosystem needing auth + rate limiting + routing?
→ API Gateway (Kong / Spring Cloud Gateway / AWS API Gateway)

Do services need to communicate SECURELY with each other INSIDE the cluster?
→ Service Mesh (Istio / Linkerd), in addition to the gateway, not instead of it
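The same flow can be written as a function that returns the layers to deploy. This is only a sketch of the checklist above; the `Needs` record, its boolean flags, and the `EntryPointAdvisor` class are hypothetical, and real designs weigh more factors than a handful of yes/no questions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical answers to the decision-flow questions
record Needs(boolean singleServer, boolean horizontalScale, boolean httpPathRouting,
             boolean microservicesPerimeter, boolean eastWestSecurity) {}

final class EntryPointAdvisor {
    // Each question adds a layer; the answers are not mutually exclusive
    static List<String> recommend(Needs n) {
        List<String> layers = new ArrayList<>();
        if (n.singleServer()) {
            layers.add("Reverse Proxy (Nginx)");
        }
        if (n.horizontalScale()) {
            layers.add(n.httpPathRouting() ? "L7 Load Balancer (AWS ALB)" : "L4 Load Balancer (AWS NLB)");
        }
        if (n.microservicesPerimeter()) {
            layers.add("API Gateway (Kong / Spring Cloud Gateway / AWS API Gateway)");
        }
        // A mesh handles east-west traffic; it complements, never replaces, the north-south gateway
        if (n.eastWestSecurity()) {
            layers.add("Service Mesh (Istio / Linkerd)");
        }
        return layers;
    }
}
```

A typical Kubernetes microservices platform answers yes to scaling, perimeter, and east-west security, which yields the NLB + gateway + mesh stack described in the interview phrasing.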

Interview Q&A

Q: Since an API Gateway is technically a reverse proxy, why not just call it a reverse proxy?

A: While an API Gateway uses reverse-proxying mechanics, its semantic purpose is categorically different. A reverse proxy is a general-purpose network utility: it routes traffic, terminates TLS, and caches content. An API Gateway is an application architecture component that understands APIs: it validates tokens, enforces per-client rate limits, routes by API attributes (path, version, headers, consumer), aggregates responses, and translates protocols. Calling an API Gateway a "reverse proxy" is like calling a hospital's triage nurse a "receptionist" because they both stand at the front desk.

Q: Can Nginx replace Kong as an API Gateway?

A: Nginx can approximate some API gateway behaviors with Lua scripts (lua-resty-* modules) or the OpenResty distribution. For simple use cases, such as basic JWT validation or rate limiting with limit_req, this is viable. It becomes hard to stretch to full production API gateway requirements, though: per-client rate limit tracking across pods (limit_req counts per Nginx instance, so a shared store such as Redis is needed), dynamic service discovery without config reloads, a rich plugin ecosystem with managed upgrades, and route changes applied through an API instead of config file edits. Kong is itself built on Nginx/OpenResty; it adds the plugin architecture, admin API, and operational tooling that raw Nginx lacks.
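Why per-client limits need shared state: if each gateway pod keeps its own counter, N pods admit up to N times the intended limit. The sketch below uses a fixed-window counter behind a hypothetical `CounterStore` interface, with an in-memory map standing in for Redis (where the same idea is commonly built from an atomic `INCR` plus `EXPIRE`). The class names, limit, and window length are assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical abstraction over the counter backend (Redis INCR + EXPIRE in production)
interface CounterStore {
    long incrementAndGet(String key);
}

// In-memory stand-in; unlike Redis EXPIRE, old window keys are never cleaned up here
final class InMemoryCounterStore implements CounterStore {
    private final Map<String, Long> counts = new ConcurrentHashMap<>();

    public long incrementAndGet(String key) {
        return counts.merge(key, 1L, Long::sum);
    }
}

// Fixed-window limiter: at most `limit` requests per client per window
final class FixedWindowLimiter {
    private final CounterStore store;
    private final long limit;
    private final long windowMillis;

    FixedWindowLimiter(CounterStore store, long limit, long windowMillis) {
        this.store = store;
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    boolean allow(String clientId, long nowMillis) {
        long window = nowMillis / windowMillis;   // every pod derives the same key within a window
        return store.incrementAndGet(clientId + ":" + window) <= limit;
    }
}
```

With a limit of 5 and two pods each holding a local store, a client that spreads 20 requests across both pods gets 10 through; pointing both limiters at one shared store brings that back to 5.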

Q: What is the risk of putting too much logic into the API Gateway?

A: It turns the gateway into a logical monolith. When domain-specific business logic (discount calculation, loyalty point checks, order validation) migrates into gateway plugins, the gateway becomes tightly coupled to every service's domain model, and changes to any service's logic now require a gateway deployment. The correct boundary: the gateway handles generic, domain-agnostic perimeter concerns (auth, rate limiting, routing, TLS). Any logic that belongs to a specific service's domain must stay in that service.

Q: How does service discovery work in dynamic environments like Kubernetes?

A: Instead of hardcoding backend IPs in static config files (which would require a gateway redeployment for every pod scaling event), the API gateway integrates with the service registry. In Kubernetes this is typically CoreDNS: the gateway routes to order-service.default.svc.cluster.local, and Kubernetes DNS plus kube-proxy transparently distribute connections across ready pod IPs. In non-Kubernetes environments, the gateway integrates with Consul or Eureka, resolving instances from a locally cached copy of the registry that is refreshed periodically rather than querying the registry on every request. The gateway never needs to know a specific IP, only the logical service name.
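A minimal sketch of that non-Kubernetes path: resolve a logical service name from a cached registry snapshot that expires after a TTL, then pick an instance round-robin. The registry lookup function, the TTL, and the `CachedDiscoveryClient` class are hypothetical stand-ins for what a Consul or Eureka client combined with a client-side load balancer (such as Spring Cloud LoadBalancer) would provide.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Resolves logical names (as in "lb://order-service") to concrete instances
final class CachedDiscoveryClient {
    private record Entry(List<String> instances, long fetchedAt) {}

    private final Function<String, List<String>> registry;  // hypothetical registry lookup
    private final long ttlMillis;                            // how long a snapshot is trusted
    private final LongSupplier clock;
    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final Map<String, AtomicInteger> cursors = new ConcurrentHashMap<>();

    CachedDiscoveryClient(Function<String, List<String>> registry, long ttlMillis, LongSupplier clock) {
        this.registry = registry;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Pick the next instance for a logical service name (round-robin over the cached snapshot)
    String choose(String serviceName) {
        Entry entry = cache.compute(serviceName, (name, cached) ->
                cached == null || clock.getAsLong() - cached.fetchedAt() >= ttlMillis
                        ? new Entry(List.copyOf(registry.apply(name)), clock.getAsLong())  // refresh
                        : cached);                                                         // reuse
        if (entry.instances().isEmpty()) {
            throw new IllegalStateException("no instances registered for " + serviceName);
        }
        int i = cursors.computeIfAbsent(serviceName, k -> new AtomicInteger()).getAndIncrement();
        return entry.instances().get(Math.floorMod(i, entry.instances().size()));
    }
}
```

The trade-off is staleness: within one TTL the gateway may still send traffic to an instance that just deregistered, which is why health checks and retries remain necessary alongside discovery.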

Q: When would you use a Service Mesh instead of (or in addition to) an API Gateway?

A: They solve orthogonal problems. An API Gateway manages north-south traffic: external clients calling into your cluster. It enforces external-facing policies (API keys, OAuth2, rate limits, public URL structure). A Service Mesh manages east-west traffic: services calling each other inside the cluster. It enforces internal policies (mTLS between services, internal retries, circuit breaking) and emits telemetry without application code changes, although services must still forward trace headers for distributed traces to link up end to end. In Kubernetes production systems, you typically deploy both: the API Gateway as the external perimeter, and Istio or Linkerd as the internal service-to-service security and observability layer.

Interview Phrasing: Choosing the Stack

"For a production microservices platform, I would layer these components: an AWS NLB at L4 for raw TCP connection distribution and cross-AZ resilience, Kong or Spring Cloud Gateway as the API Gateway for JWT validation, rate limiting, and service routing, and each microservice exposing a Spring Actuator health endpoint so the load balancer can pull unhealthy instances from rotation automatically. For service-to-service communication inside the cluster on Kubernetes, I'd add Istio for mTLS and automatic distributed tracing, keeping the gateway focused purely on the external perimeter and not leaking internal service topology or business logic into it."


📚 Further Reading

See Also

  • Rate Limiting Algorithms: Explore the conceptual design, comparison, and pseudocode implementations of all core rate-limiting algorithms.