Skip to main content

Database Connection Pooling

:::info Who this guide is for


What is a Database Connection?โ€‹

A database connection is a persistent, stateful communication channel between your application and the database server. It is not just a network socket โ€” it carries:

  • An authenticated session (the DB knows who you are)
  • A transaction context (what isolation level, what's in-flight)
  • Server-side memory allocation (caches, temp buffers, cursor state)
  • A dedicated backend process on PostgreSQL (or thread on MySQL)
Your App (JVM) PostgreSQL Server
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
Thread 1 Backend Process 1 (~10MB RAM)
Thread 2 โ†โ”€โ”€ TCP Socket โ”€โ”€โ–บ Backend Process 2 (~10MB RAM)
Thread 3 Backend Process 3 (~10MB RAM)

Every physical connection consumes resources on both sides โ€” not just on the database.

Why creating a connection is expensiveโ€‹

Opening a new connection from scratch requires a full multi-step handshake before a single SQL query can run:

Step 1 โ€” TCP Handshake (1 network round trip)
Client SYN โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–บ Server
Client โ—„โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ SYN-ACK
Client ACK โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–บ Server
Duration: 0.5โ€“10ms (depends on network)

Step 2 โ€” TLS/SSL Negotiation (2โ€“4 round trips)
ClientHello โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–บ Server
โ—„โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ ServerHello + Certificate
Key Exchange โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–บ Server
Duration: 5โ€“50ms

Step 3 โ€” Database Authentication
Auth request + credentials โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–บ Server
โ—„โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ Permission check + session init
Duration: 2โ€“20ms

Step 4 โ€” Backend Process Allocation (PostgreSQL)
Server forks a new OS process for this connection
Allocates ~10MB RAM (work_mem, sort buffers, etc.)
Duration: 1โ€“5ms

Total: 10โ€“100ms before your first SQL query runs
A typical indexed SQL query: < 1ms

The implication: if every application request opened a fresh connection, connection setup would dominate your response time โ€” especially under load.

Connection cost comparisonโ€‹

OperationTypical duration
Open a new TCP connection0.5โ€“10ms
TLS handshake5โ€“50ms
Database authentication2โ€“20ms
Borrow from existing pool< 1ms
Simple indexed SQL query0.1โ€“5ms
Total new connection cost10โ€“100ms

What is a Connection Pool?โ€‹

A connection pool is a cache of pre-opened, pre-authenticated database connections that application threads borrow and return โ€” rather than creating and destroying a new connection on every request.

The library analogyโ€‹

Without poolingWith pooling
Every student registers for a new library card, gets a background check, receives a laminated card โ€” uses it once, and the card is destroyedThe library keeps 20 pre-made cards at the desk. Students borrow one, do their reading, return it โ€” the next student uses the same card immediately

How borrowing and returning worksโ€‹

Application starts:
Pool creates 10 connections to the DB (pre-authenticated, warm)
Pool: [C1] [C2] [C3] [C4] [C5] [C6] [C7] [C8] [C9] [C10]

HTTP request arrives (Thread A needs a connection):
Thread A borrows C1 โ† instantly, no TCP handshake
Pool: [ ] [C2] [C3] [C4] [C5] [C6] [C7] [C8] [C9] [C10]

Thread A runs queries on C1 (1ms)

Thread A finishes โ†’ returns C1 to pool:
Pool: [C1] [C2] [C3] [C4] [C5] [C6] [C7] [C8] [C9] [C10]

Thread B borrows C1 immediately (no waiting, no handshake)

What happens when all connections are borrowed?โ€‹

10 threads all holding connections simultaneously:
Pool: [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] โ† empty

Thread 11 arrives and asks for a connection:
โ†’ Waits in a queue for connection-timeout (e.g. 30 seconds)
โ†’ If no connection is returned within timeout:
โœ— Throws SQLTransientConnectionException
โœ— HTTP request fails with 500

This is called POOL STARVATION โ€” the most common connection pool failure.

HikariCP โ€” Spring Boot's Default Poolโ€‹

HikariCP is the fastest JVM connection pool. It is the default in Spring Boot since 2.x and is chosen for its extremely low overhead (single-digit microsecond borrow time) and robust failure detection.

Core parameters explainedโ€‹

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ HikariCP Pool (max=10) โ”‚
โ”‚ โ”‚
โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ” โ† Active (borrowed by threads) โ”‚
โ”‚ โ”‚ C1 โ”‚ โ”‚ C2 โ”‚ โ”‚ C3 โ”‚ โ”‚
โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚
โ”‚ โ”‚
โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ” โ† Idle (warm, ready) โ”‚
โ”‚ โ”‚ C4 โ”‚ โ”‚ C5 โ”‚ โ”‚ C6 โ”‚ โ”‚ C7 โ”‚ โ”‚
โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚
โ”‚ โ”‚
โ”‚ โ† minimum-idle keeps at least N idle connections warm โ†’ โ”‚
โ”‚ โ† maximum-pool-size is the total ceiling โ†’ โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
ParameterDefaultWhat it controlsRecommendation
maximum-pool-size10Total connections (active + idle). Hard ceiling the DB sees.Start at 10โ€“20; tune via load test
minimum-idle= maxMinimum idle connections kept warmSet equal to maximum-pool-size (fixed pool)
connection-timeout30,000msHow long a thread waits to borrow before exceptionReduce to 2,000โ€“5,000ms for OLTP
idle-timeout600,000msHow long an idle connection lives before removalOnly relevant if min-idle < max
max-lifetime1,800,000msMaximum age of any connection before recyclingSet 30s shorter than DB's wait_timeout
keepalive-time0 (off)Sends a test query to keep idle connections alive through firewallsSet to 30,000โ€“60,000ms in cloud
leak-detection-threshold0 (off)Logs warning if a connection is held longer than N msSet to 2,000โ€“5,000ms in staging
validation-timeout5,000msHow long to test a connection before declaring it deadKeep default
# application.yaml
spring:
datasource:
url: jdbc:postgresql://db-host:5432/mydb
username: ${DB_USER}
password: ${DB_PASSWORD}
hikari:
# Pool size โ€” tune for your DB server's CPU cores
maximum-pool-size: 20
minimum-idle: 20 # Fixed-size pool โ€” no dynamic churn

# Fail fast under load โ€” don't let threads queue for 30s
connection-timeout: 3000 # 3 seconds

# Recycle connections before DB kills them
max-lifetime: 1800000 # 30 minutes (set 30s below DB's wait_timeout)

# Keep connections alive through cloud firewalls
keepalive-time: 30000 # ping every 30 seconds

# Catch transactional leaks in staging
leak-detection-threshold: 5000

# Connection validation
connection-test-query: SELECT 1 # for MySQL; not needed for PostgreSQL
validation-timeout: 3000

# Pool name โ€” appears in logs and metrics
pool-name: HikariPool-Orders

Fixed-size pool vs dynamic poolโ€‹

Dynamic pool (minimum-idle < maximum-pool-size):
Quiet period: pool shrinks to minimum-idle (e.g. 5 connections)
Traffic spike arrives: pool must create 15 new connections on demand
Each new connection: TCP + TLS + auth = 20โ€“80ms
โ†’ First 15 requests of the spike experience high latency
โ•”โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•—
Response time: โ•‘ spike! โ•‘____
โ•šโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

Fixed-size pool (minimum-idle = maximum-pool-size):
Pool always keeps 20 connections warm regardless of traffic
Traffic spike arrives: all 20 connections already ready
โ†’ No latency spike on first requests
___________________________
Response time:

Use a fixed-size pool in production. The marginal cost of keeping extra connections warm is far less than the latency spike of establishing connections on demand.


Pool Sizingโ€‹

The common beginner mistakeโ€‹

"More connections = more parallel queries = higher performance. I'll set my pool to 500."

This is wrong. Here is why:

CPU parallelism limit: a database server with 8 CPU cores can physically execute exactly 8 queries simultaneously. If 500 connections all submit queries, 492 queries queue waiting for a CPU core. The OS spends more time on context switching than executing SQL.

Disk I/O bottleneck: a single SSD has a finite IOPS queue depth (typically 32โ€“128 concurrent I/O operations). 500 concurrent write transactions all competing for the same disk cause seek thrashing โ€” latency goes up, throughput goes down.

Lock contention: more concurrent transactions increase the probability of two transactions waiting for the same row lock. As active connections grow, lock wait time grows super-linearly.

Throughput vs Connection Count:
โ–ฒ
Throughput โ”‚ โ—
โ”‚ โ— โ—
โ”‚ โ— โ—
โ”‚ โ— โ—โ—โ—โ—โ—โ— โ† plateau, then degradation
โ”‚ โ—
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–บ
Connection count
โ–ฒ
sweet spot

The sizing formulaโ€‹

A widely-used empirical formula from PostgreSQL and Oracle benchmarks:

connections = (CPU_cores ร— 2) + effective_spindle_count

Where:
CPU_cores = physical cores on the DATABASE server (not app server)
effective_spindle_count = 1 for a single SSD
= RAID disk count for spinning disks
= 1 for most cloud RDS/Aurora instances

Examples:
4-core DB server, single SSD: (4 ร— 2) + 1 = 9 โ†’ start at 10
8-core DB server, single SSD: (8 ร— 2) + 1 = 17 โ†’ start at 20
16-core DB server, SSD RAID-10 (4 disks): (16 ร— 2) + 4 = 36 โ†’ start at 40

:::tip This formula gives you a starting point, not the final answer The formula captures CPU and I/O saturation. It does not account for:

  • Query mix (read-heavy vs write-heavy)
  • Average query duration
  • Lock contention patterns
  • Number of application instances sharing the pool

Always validate with load testing โ€” find the "knee" where adding connections stops improving throughput and starts increasing latency. :::

Sizing across multiple application instancesโ€‹

The formula gives the total connections the DB can handle. With multiple app instances, divide across them:

DB can handle: 20 connections total (from formula)
Running 4 app instances (pods/containers):

Max pool per instance = 20 / 4 = 5

# application.yaml on each instance:
hikari.maximum-pool-size: 5

If you scale from 4 to 8 app instances without adjusting the pool size, you go from 20 total connections to 40 โ€” overloading the database. Either reduce per-instance pool size or put a connection proxy (PgBouncer/RDS Proxy) in front.


Failure Modesโ€‹

Pool starvation โ€” the cascade failureโ€‹

Pool starvation occurs when all connections are borrowed and new requests queue, time out, and fail โ€” cascading into a service outage.

Normal state (pool size = 10):
Avg request holds connection for 5ms
At 200 req/sec: 200 ร— 0.005s = 1 connection in use at any time
Pool: 1 active, 9 idle โœ…

Slow query incident (avg hold time jumps to 500ms):
At 200 req/sec: 200 ร— 0.5s = 100 connections needed
Pool only has 10 โ†’ 90 threads queue
Queue fills โ†’ requests timeout at connection-timeout (30s by default)
All 200 req/sec fail with SQLTransientConnectionException
โ†’ Cascade: upstream services retry โ†’ more load โ†’ starvation deepens โŒ

How to detect it: hikaricp.connections.pending > 0 sustained for more than a few seconds.

How to prevent it:

  1. Eliminate slow queries โ€” add indexes, fix N+1 queries.
  2. Reduce connection-timeout to 2โ€“5 seconds so threads fail fast and don't pile up.
  3. Add a circuit breaker โ€” stop accepting requests when pool is exhausted.
  4. Scale the pool (but only after fixing the root cause).

Anti-pattern 1 โ€” Holding connections across slow operationsโ€‹

The most common cause of pool starvation is keeping a connection borrowed during non-database work โ€” external HTTP calls, file I/O, or heavy computation.

// โŒ DANGEROUS: DB connection held for the entire 2-second API call
@Transactional
public void processOrder(Long orderId) {
Order order = orderRepository.findById(orderId).orElseThrow();
// โ† connection is held from here...

PaymentResult result = paymentGateway.charge(order); // โ† 2000ms HTTP call
// โ† ...to here. Connection is idle but unavailable to other threads.

order.setStatus(result.isSuccess() ? PAID : FAILED);
orderRepository.save(order);
}
// With 10 concurrent requests: 10 threads ร— 2s = all 10 pool connections
// exhausted for 2 seconds. Request 11 waits 30s then fails.
// โœ… CORRECT: Connection is held only during actual DB work (ms, not seconds)
public void processOrder(Long orderId) {
// Transaction 1: fetch data โ€” holds connection for ~2ms
Order order = orderService.getOrderById(orderId);

// Outside any transaction โ€” no connection borrowed
PaymentResult result = paymentGateway.charge(order); // 2000ms HTTP call

// Transaction 2: update status โ€” holds connection for ~2ms
orderService.updateStatus(orderId, result.isSuccess() ? PAID : FAILED);
}

@Service
public class OrderService {
@Transactional(readOnly = true)
public Order getOrderById(Long id) { return repo.findById(id).orElseThrow(); }

@Transactional
public void updateStatus(Long id, OrderStatus status) { repo.updateStatus(id, status); }
}

Rule of thumb: the connection is held from the first database call inside a @Transactional method to when the method returns. Every millisecond your method spends on non-DB work is wasted borrowed connection time.


Anti-pattern 2 โ€” N+1 queries depleting the poolโ€‹

N+1 queries hold a connection while executing N serial round trips. Under load, this multiplies connection hold time by N.

// โŒ N+1: holds the connection for 1 + N DB round trips
@Transactional
public List<OrderDto> getOrdersWithUsers(List<Long> orderIds) {
List<Order> orders = orderRepository.findAllById(orderIds); // 1 query
return orders.stream()
.map(o -> {
User user = userRepository.findById(o.getUserId()).orElseThrow(); // N queries
return new OrderDto(o, user);
})
.collect(toList());
}
// 100 orders = 101 queries ร— (avg 2ms each) = 202ms connection hold time
// vs JOIN FETCH: 1 query ร— 5ms = 5ms connection hold time โ€” 40ร— better
// โœ… Fix: single JOIN FETCH โ€” one round trip, connection released 40ร— faster
@Query("SELECT o FROM Order o JOIN FETCH o.user WHERE o.id IN :ids")
List<Order> findAllWithUsers(@Param("ids") List<Long> ids);

Anti-pattern 3 โ€” Thread pool / connection pool mismatch deadlockโ€‹

If your web server has more threads than the pool has connections, you can hit a deadlock when threads depend on each other:

Setup:
Tomcat max-threads = 200
HikariCP maximum-pool-size = 10

Scenario:
200 requests arrive โ†’ Tomcat spawns 200 threads
Thread 1โ€“10 borrow all 10 connections
Thread 11โ€“200 queue waiting for connections

Thread 1 (holding C1) makes an internal REST call to /api/helper
โ†’ That request also needs a connection from the same pool
โ†’ All 10 connections are held by threads 1โ€“10
โ†’ Thread for /api/helper queues at connection-timeout (30s)
โ†’ Thread 1 waits for /api/helper โ†’ Thread 1 cannot release C1
โ†’ DEADLOCK โ€” no thread can make progress

Fix options:

FixApproach
Increase pool sizeAdd more connections to break the cycle (increases DB load)
Separate poolsUse a dedicated pool for internal sub-calls
Reduce connection-timeoutFail fast โ€” threads waiting > 3s throw an exception, breaking the cycle
Remove internal synchronous callsAsync messaging (Kafka/RabbitMQ) eliminates the dependency
@Transactional(readOnly = true)Ensures sub-requests don't also need write connections

Anti-pattern 4 โ€” @Transactional on public vs private methodsโ€‹

Spring's @Transactional uses AOP proxies โ€” it only intercepts calls through the Spring proxy, not direct this.method() calls:

@Service
public class OrderService {

// โŒ Self-invocation โ€” @Transactional is ignored
public void processAll(List<Long> ids) {
for (Long id : ids) {
this.processSingle(id); // โ† calls directly on 'this', not through proxy
}
}

@Transactional
public void processSingle(Long id) {
// This method's @Transactional has no effect when called from processAll()
// Result: no transaction, dirty reads possible, connection borrowed without transaction scope
orderRepository.findById(id);
}

// โœ… Fix: inject self-reference so calls go through the proxy
@Autowired
private OrderService self;

public void processAll(List<Long> ids) {
for (Long id : ids) {
self.processSingle(id); // goes through Spring proxy โ†’ @Transactional works
}
}
}

PgBouncerโ€‹

Why PostgreSQL needs a connection proxyโ€‹

PostgreSQL's architecture spawns a dedicated OS process for every connection โ€” not a lightweight thread. Each process consumes:

  • ~10 MB RAM for memory buffers (work_mem, sort buffers, temp storage)
  • OS process overhead (context switches, scheduling)
  • File descriptor allocation
10 connections โ†’ 100 MB RAM used on the DB server for processes alone
100 connections โ†’ 1 GB RAM
500 connections โ†’ 5 GB RAM โ†’ database server starts swapping โ†’ performance collapses

PgBouncer sits between your application and PostgreSQL, multiplexing many application connections onto a small number of real PostgreSQL connections:

Without PgBouncer:
[App instance 1] (pool=20) โ”€โ”€โ–บ
[App instance 2] (pool=20) โ”€โ”€โ–บ PostgreSQL (60 processes ร— 10MB = 600MB)
[App instance 3] (pool=20) โ”€โ”€โ–บ

With PgBouncer:
[App instance 1] (pool=20) โ”€โ”€โ”
[App instance 2] (pool=20) โ”€โ”€โ”ผโ”€โ”€โ–บ [PgBouncer] โ”€โ”€โ–บ PostgreSQL (20 processes ร— 10MB = 200MB)
[App instance 3] (pool=20) โ”€โ”€โ”˜
โ†‘ โ†‘
60 client connections 20 real DB connections
(cheap โ€” just sockets) (expensive โ€” OS processes)

PgBouncer pool modesโ€‹

A real PostgreSQL connection is assigned to a client only for the duration of one transaction. Once the transaction commits or rolls back, the connection is returned to PgBouncer's pool.

Client 1: BEGIN โ†’ INSERT โ†’ COMMIT โ† uses real connection C1
โ†• (between transactions, C1 is free)
Client 2: BEGIN โ†’ SELECT โ†’ COMMIT โ† uses real connection C1 (same physical conn!)

Multiplexing ratio: 3 real connections can serve 10+ clients if transactions are short.

Gotchas:

FeatureBroken in transaction mode?Fix
Prepared statements (JDBC)โœ… YesSet prepareThreshold=0 in JDBC URL or disable in Hibernate
SET session variablesโœ… YesUse SET LOCAL inside a transaction instead
Temporary tablesโœ… YesUse regular tables with session-scoped cleanup
Advisory locksโœ… YesAvoid or use application-level locks
LISTEN/NOTIFYโœ… YesUse a dedicated non-pooled connection
# Spring datasource URL โ€” disable prepared statement caching for PgBouncer transaction mode
spring.datasource.url=jdbc:postgresql://pgbouncer:5432/mydb?prepareThreshold=0

PgBouncer configurationโ€‹

# pgbouncer.ini
[databases]
mydb = host=postgres-server port=5432 dbname=mydb

[pgbouncer]
pool_mode = transaction
max_client_conn = 1000 # max clients connecting to PgBouncer
default_pool_size = 20 # real PostgreSQL connections per database
reserve_pool_size = 5 # extra connections for emergencies
reserve_pool_timeout = 3 # seconds before using reserve pool

# Connection to PostgreSQL
server_lifetime = 1800 # recycle real connections after 30 min
server_idle_timeout = 600 # remove idle real connections after 10 min
server_connect_timeout = 5
server_login_retry = 3

# Client connection limits
client_login_timeout = 60
query_timeout = 30 # kill queries running over 30 seconds
transaction_timeout = 300 # kill transactions over 5 minutes

# Auth
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

# Logging
log_connections = 1
log_disconnections = 1
stats_period = 60 # log pool stats every 60 seconds

RDS Proxy (for Serverless)โ€‹

The Lambda connection explosion problemโ€‹

AWS Lambda scales out horizontally โ€” each concurrent invocation is a separate execution environment that opens its own database connections:

Normal load: 10 Lambda instances ร— pool=5 = 50 connections โœ…
Traffic spike: 500 Lambda instances ร— pool=5 = 2,500 connections

2,500 connections hitting RDS:
PostgreSQL: 2,500 ร— 10MB RAM = 25 GB just for connection processes
Result: RDS runs out of memory โ†’ connection refused โ†’ all Lambdas fail โŒ

RDS Proxy solves this by acting as a managed connection pool between Lambda and RDS:

[Lambda 1] โ”€โ”€โ”
[Lambda 2] โ”€โ”€โ”ค
[Lambda 3] โ”€โ”€โ”ค All connect to RDS Proxy (thin, cheap connections)
... โ”€โ”€โ”ค
[Lambda 500]โ”€โ”€โ”˜
โ†“
[RDS Proxy] โ† maintains a small fixed pool of real RDS connections
โ†“
[Amazon RDS] โ† only sees 20โ€“50 real connections, not 2,500

RDS Proxy featuresโ€‹

FeatureDetail
Connection multiplexingThousands of Lambda connections โ†’ tens of real RDS connections
IAM authenticationLambda functions authenticate via IAM role, not hardcoded DB passwords
Failover handlingOn RDS Multi-AZ failover, RDS Proxy queues queries and routes them to the new primary automatically โ€” Lambda functions don't see the failover
Connection pinningSome SQL features (temp tables, SET session variables) "pin" a Lambda to a specific RDS connection โ€” reduces multiplexing efficiency
Secrets Manager integrationDB credentials auto-rotated without Lambda redeployment

When RDS Proxy helps vs hurtsโ€‹

ScenarioRDS Proxy helpful?
Serverless (Lambda, Fargate spot) โ€” many short-lived clientsโœ… Essential
Containers on ECS/EKS with fixed instance countโš ๏ธ PgBouncer may be cheaper
Traditional EC2 application with long-lived poolโŒ Adds latency with no benefit
RDS Multi-AZ with critical failover SLAโœ… Smooth failover
PostgreSQL with heavy use of session featuresโš ๏ธ Connection pinning reduces efficiency

Connection Validation Strategiesโ€‹

A connection can go stale โ€” the database server closes it (timeout, restart, network interruption) while your pool still considers it valid. Borrowing a stale connection results in an immediate SQLException.

How pools detect stale connectionsโ€‹

Before handing a connection to a thread, the pool sends a lightweight test query (SELECT 1). If it fails, the connection is discarded and a new one is created.

# HikariCP โ€” enable test on borrow
hikari:
connection-test-query: SELECT 1 # for older drivers; modern drivers use isValid()
validation-timeout: 3000 # fail if test takes > 3s

Overhead: ~0.1ms per borrow (negligible). Recommended for production.


Connection Pool Library Comparisonโ€‹

<!-- Already included in Spring Boot starter โ€” no extra dependency needed -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-jpa</artifactId>
</dependency>
  • Fastest borrow/return time (~microseconds)
  • Smallest codebase (~130KB jar)
  • Best diagnostics (leak detection, metrics)
  • Default in Spring Boot 2.x+

Choose HikariCP for all new Spring Boot applications.


Monitoring & Observabilityโ€‹

Auto-exposing HikariCP metrics via Actuatorโ€‹

<!-- pom.xml โ€” Actuator and Micrometer -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
# application.yaml โ€” expose metrics endpoint
management:
endpoints:
web:
exposure:
include: health, metrics, prometheus
metrics:
tags:
application: ${spring.application.name}
// Register HikariCP metrics with Micrometer
@Configuration
public class HikariMetricsConfig {

@Bean
public MeterRegistryCustomizer<MeterRegistry> hikariPoolMetrics(DataSource dataSource) {
return registry -> {
if (dataSource instanceof HikariDataSource hds) {
new HikariDataSourcePoolMetrics(hds, registry, Tags.empty()).bindTo(registry);
}
};
}
}

Key metrics and alert thresholdsโ€‹

MetricWhat it measuresAlert threshold
hikaricp.connections.activeCurrently borrowed connectionsApproaching maximum-pool-size sustained
hikaricp.connections.pendingThreads waiting for a connection> 0 for more than 5 seconds โ†’ pool starvation
hikaricp.connections.idleWarm, available connectionsDrops to 0 โ†’ imminent starvation
hikaricp.connections.acquireAvg time to borrow a connectionP99 > 10ms โ†’ pool under pressure
hikaricp.connections.creationTime to open a new physical connectionSpike โ†’ dynamic pool churn (fix: fixed-size pool)
hikaricp.connections.usageTime a connection is held per borrowSpike โ†’ long transactions or slow queries
hikaricp.connections.timeout.totalCount of connection-timeout exceptionsAny > 0 โ†’ pool exhaustion events

Grafana dashboard queries (PromQL)โ€‹

# Active connections as % of pool capacity โ€” alert at 80%
hikaricp_connections_active / hikaricp_connections_max * 100

# Pending threads โ€” alert if > 0 for > 30 seconds
hikaricp_connections_pending > 0

# Connection acquisition P99 latency
histogram_quantile(0.99, rate(hikaricp_connections_acquire_seconds_bucket[5m]))

# Timeout rate per minute
rate(hikaricp_connections_timeout_total[1m]) * 60

Connection leak detectionโ€‹

// Enable in staging and production
spring:
datasource:
hikari:
leak-detection-threshold: 5000 # warn if held > 5 seconds

// When a leak is detected, HikariCP logs:
// WARN HikariPool-Orders - Connection leak detection triggered for
// com.example.OrderService.processOrder(OrderService.java:42),
// stack trace follows ...
//
// Stack trace shows EXACTLY which method borrowed and didn't return the connection.

Production Patternsโ€‹

๐Ÿ”ฌ Senior deep-dive: connection pool per DataSource in multi-tenant systems

In a multi-tenant SaaS app where each tenant has their own database schema or database instance, you need a pool per tenant โ€” not one shared pool:

@Configuration
public class MultiTenantDataSourceConfig {

// Lazily created pool per tenant โ€” created on first request, reused thereafter
private final ConcurrentHashMap<String, HikariDataSource> pools = new ConcurrentHashMap<>();

public DataSource getDataSourceForTenant(String tenantId) {
return pools.computeIfAbsent(tenantId, this::createPool);
}

private HikariDataSource createPool(String tenantId) {
HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:postgresql://db-" + tenantId + ":5432/app");
config.setUsername(getTenantUser(tenantId));
config.setPassword(getTenantPassword(tenantId));
config.setMaximumPoolSize(5); // smaller pool per tenant
config.setMinimumIdle(2);
config.setConnectionTimeout(3000);
config.setPoolName("HikariPool-" + tenantId);
return new HikariDataSource(config);
}

// Shutdown pools gracefully on application stop
@PreDestroy
public void closeAll() {
pools.values().forEach(HikariDataSource::close);
}
}

Key consideration: if you have 1,000 tenants and each pool keeps 2 idle connections, you have 2,000 open connections to your database cluster. Use PgBouncer in front of each database or shared schema isolation to manage this.

๐Ÿ”ฌ Senior deep-dive: read/write splitting with separate pools

For read-heavy applications, route read-only queries to replicas and writes to the primary โ€” each with its own pool:

@Configuration
public class ReadWriteDataSourceConfig {

@Bean
@Primary
public DataSource routingDataSource(
@Qualifier("primaryDs") DataSource primary,
@Qualifier("replicaDs") DataSource replica) {

AbstractRoutingDataSource routing = new AbstractRoutingDataSource() {
@Override
protected Object determineCurrentLookupKey() {
// TransactionSynchronizationManager tells us if we're in a read-only tx
return TransactionSynchronizationManager.isCurrentTransactionReadOnly()
? "replica" : "primary";
}
};
routing.setTargetDataSources(Map.of("primary", primary, "replica", replica));
routing.setDefaultTargetDataSource(primary);
return routing;
}

@Bean("primaryDs")
public DataSource primaryDataSource() {
HikariConfig cfg = new HikariConfig();
cfg.setJdbcUrl("jdbc:postgresql://primary-db:5432/mydb");
cfg.setMaximumPoolSize(20);
cfg.setPoolName("HikariPool-Primary");
return new HikariDataSource(cfg);
}

@Bean("replicaDs")
public DataSource replicaDataSource() {
HikariConfig cfg = new HikariConfig();
cfg.setJdbcUrl("jdbc:postgresql://replica-db:5432/mydb");
cfg.setMaximumPoolSize(40); // more connections โ€” reads are higher volume
cfg.setPoolName("HikariPool-Replica");
return new HikariDataSource(cfg);
}
}

// Usage โ€” Spring routes to replica automatically
@Transactional(readOnly = true) // โ† routes to replica pool
public List<OrderDto> listOrders() { return orderRepository.findAll(); }

@Transactional // โ† routes to primary pool
public OrderDto createOrder(CreateOrderRequest req) { ... }
๐Ÿ”ฌ Senior deep-dive: graceful shutdown and connection draining

During a rolling deployment, you must drain connections gracefully โ€” in-flight queries must complete before the pool closes:

@Configuration
public class GracefulShutdownConfig {

@Bean
public HikariDataSource dataSource() {
HikariConfig config = new HikariConfig();
config.setMaximumPoolSize(20);
config.setConnectionTimeout(3000);
return new HikariDataSource(config);
}

// Spring Boot calls this before the JVM exits
@PreDestroy
public void closeDataSource(HikariDataSource dataSource) {
// HikariCP close() waits for active connections to be returned
// before physically closing the underlying TCP sockets.
// In-flight transactions will complete; new borrows will fail immediately.
dataSource.close();
log.info("Connection pool closed gracefully");
}
}
# application.yaml โ€” give enough time for in-flight requests to complete
server:
shutdown: graceful # wait for active requests before stopping

spring:
lifecycle:
timeout-per-shutdown-phase: 30s # 30 seconds to drain

Kubernetes readiness probe: set the pod to NotReady first (so no new traffic is routed), then wait for in-flight requests to drain, then close the pool. The pool's close() call blocks until all connections are returned.


Common Mistakesโ€‹

MistakeProblemFix
minimum-idle < maximum-pool-size (dynamic pool)Latency spike on traffic surges while new connections are establishedSet minimum-idle = maximum-pool-size for a fixed-size pool
connection-timeout: 30000 (default 30s)Threads queue for 30 seconds before failing โ€” request pile-up under starvationReduce to 2,000โ€“5,000ms to fail fast
External HTTP call inside @TransactionalConnection held idle during slow external call โ€” pool depletesMove external calls outside @Transactional scope
N+1 queries inside a transactionConnection held for N sequential DB round tripsFix with JOIN FETCH or @EntityGraph
Pool size = 500 "for performance"DB CPU thrash, context switching, lock contention โ€” performance collapsesUse the sizing formula: (cores ร— 2) + spindles
No max-lifetime configuredConnections silently stale โ€” DB closes them, JDBC gets broken pipe errorsAlways set max-lifetime: 1800000 (30 min)
leak-detection-threshold disabled in productionConnection leaks silently drain the pool with no log evidenceEnable at 5,000ms; the stack trace shows exactly where the leak is
PgBouncer transaction mode + prepared statementsERROR: prepared statement "S_1" does not existAdd prepareThreshold=0 to JDBC URL
Lambda without RDS Proxy500 Lambda instances ร— pool=5 = 2,500 connections โ†’ RDS OOM crashUse RDS Proxy for serverless; limit pool size on Lambda to 1โ€“2
Not accounting for multiple app instances5 instances ร— pool=20 = 100 connections โ€” exceeds DB formula of 20Divide total by instance count; use PgBouncer to centralise pooling

๐ŸŽฏ Interview Questionsโ€‹

Q1. What is connection pooling and why is it needed?

Opening a new database connection requires a TCP handshake, TLS negotiation, database authentication, and backend process allocation โ€” typically 10โ€“100ms. A connection pool maintains a cache of pre-opened, pre-authenticated connections. Threads borrow a connection (< 1ms), run their query, and return it. This eliminates the per-request setup cost, enables connection reuse, and caps the total number of connections the database must serve โ€” preventing resource exhaustion.

Q2. How does HikariCP determine when to time out a connection request?

When a thread calls getConnection(), HikariCP checks if an idle connection is available. If not, it places the thread in a queue and waits up to connection-timeout milliseconds (default 30s). If no connection becomes available within that window, HikariCP throws SQLTransientConnectionException. This is pool starvation โ€” the signal that either pool size is too small, queries are holding connections too long, or the system is overloaded.

Q3. Why is setting the pool size to a very large number a bad idea?

More connections does not equal more throughput. A database server with N CPU cores can execute exactly N queries simultaneously. Adding more concurrent connections beyond the optimal point causes: (1) CPU context switching overhead โ€” the OS wastes cycles saving and restoring thread state; (2) disk I/O contention โ€” concurrent writes compete for finite IOPS; (3) increased lock contention โ€” more active transactions mean more row-level and table-level lock waits. Performance peaks at the "knee" of the throughput curve and degrades beyond it. The empirical formula (CPU_cores ร— 2) + spindle_count gives a good starting point โ€” validate with load testing.

Q4. What is pool starvation and what causes it?

Pool starvation is when all connections are borrowed simultaneously, causing new requests to queue at connection-timeout and then fail. Common causes: (1) N+1 queries โ€” one request holds a connection across N serial DB round trips; (2) long transactions โ€” a @Transactional method makes a slow external HTTP call while holding the connection idle; (3) thread pool / connection pool mismatch โ€” more app threads than DB connections, causing threads waiting for connections to block threads holding them; (4) slow queries โ€” a query suddenly takes 5ร— longer, multiplying connection hold time across all concurrent threads.

Q5. What is the difference between PgBouncer session mode and transaction mode?

In session mode, a real PostgreSQL connection is assigned to a client for their entire session โ€” returned only when they disconnect. Safe for all features but provides limited multiplexing. In transaction mode, the real connection is assigned only for the duration of a transaction โ€” returned immediately on COMMIT/ROLLBACK. Enables 10ร— multiplexing (one real connection can serve multiple clients interleaved between transactions) but breaks connection-scoped features: prepared statement caching, temporary tables, SET session variables, and advisory locks. Transaction mode is the production default; prepared statement caching must be disabled via prepareThreshold=0 in the JDBC URL.

Q6. Why does Lambda without RDS Proxy crash the database during traffic spikes?

AWS Lambda scales horizontally โ€” each concurrent invocation runs in an isolated execution environment with its own database connections. A traffic spike from 10 to 500 concurrent Lambda invocations causes 500 separate connection pools to try connecting to RDS simultaneously. PostgreSQL spawns one OS process per connection (~10MB RAM each) โ€” 500 connections consume 5GB of RAM just for connection management, often exceeding the RDS instance's available memory. The database crashes with "too many connections" or runs out of memory. RDS Proxy solves this by multiplexing the 500 Lambda connections onto 20โ€“50 real RDS connections.

Q7. (Senior) How would you diagnose a connection pool leak in production without restarting the application?

Step 1: Enable leak-detection-threshold: 5000 in HikariCP config (requires restart or dynamic config update). This logs a warning with a full stack trace identifying exactly which method borrowed a connection and didn't return it within 5 seconds. Step 2: Check hikaricp.connections.pending in Prometheus/Grafana โ€” sustained pending count > 0 confirms starvation. Step 3: Check hikaricp.connections.active approaching maximum-pool-size โ€” confirm all connections are borrowed. Step 4: Query the database directly: SELECT pid, query, state, query_start, state_change FROM pg_stat_activity WHERE state = 'idle in transaction' โ€” connections in "idle in transaction" state have borrowed a connection but are not executing SQL (the leak). Step 5: Correlate the pid from pg_stat_activity with the pg_locks view to see if these connections hold locks blocking other queries.


See Alsoโ€‹