Java Concurrency & Utilities
A comprehensive guide to concurrent programming in Java โ from thread pools and atomic classes to advanced work-stealing algorithms like Fork/Join and modern structured concurrency.
For fundamental concepts on threads and locking, please refer to the dedicated pages:
- Threads & Processes: Thread creation, lifecycles, states, daemon threads, and deadlocks.
- Locks & Synchronization: Monitor locks (
synchronized), visibility (volatile), lock interfaces (ReentrantLock,ReadWriteLock),StampedLock, and AQS intro.
1. AQS Synchronization Utilitiesโ
While ReentrantLock provides basic mutual exclusion, the AQS framework powers several high-level coordination utilities essential for distributed systems and microservice architectures.
1. CountDownLatchโ
Allows one or more threads to wait until a set of operations being performed in other threads completes.
- Mechanism: Initialized with a count. The
await()methods block until the current count reaches zero due to invocations ofcountDown(). - Reusability: Cannot be reset. Once the count reaches zero, it stays zero.
Real-World Use Case: Microservice Health-Check Fan-Outโ
// Application startup: wait for ALL dependent services to become healthy
public class ApplicationBootstrap {
public void startApplication() throws InterruptedException {
CountDownLatch latch = new CountDownLatch(3);
executor.submit(() -> { waitForDatabase(); latch.countDown(); });
executor.submit(() -> { waitForRedis(); latch.countDown(); });
executor.submit(() -> { waitForMessageBroker(); latch.countDown(); });
boolean ready = latch.await(30, TimeUnit.SECONDS); // Timeout after 30s
if (!ready) throw new IllegalStateException("Dependent services not ready!");
System.out.println("All services initialized. Starting application.");
startHttpServer();
}
}
2. CyclicBarrierโ
Allows a set of threads to all wait for each other to reach a common barrier point.
- Mechanism: Initialized with the number of participating threads. Threads call
await()when they reach the barrier. Once the last thread callsawait(), the barrier is tripped, and all threads proceed. - Reusability: Can be reset and reused after the barrier is tripped.
Real-World Use Case: Multi-Phase Parallel Computationโ
// Simulation: all worker threads must complete phase N before any starts phase N+1
public class ParallelSimulation {
private static final int WORKER_COUNT = 4;
public static void main(String[] args) {
// Optional barrier action runs after ALL threads arrive (before any proceed)
CyclicBarrier barrier = new CyclicBarrier(WORKER_COUNT,
() -> System.out.println("--- All workers completed phase. Merging results... ---")
);
for (int i = 0; i < WORKER_COUNT; i++) {
final int workerId = i;
new Thread(() -> {
try {
for (int phase = 1; phase <= 3; phase++) {
computePhase(workerId, phase);
barrier.await(); // Wait for ALL workers to finish this phase
// Barrier resets automatically โ ready for next phase!
}
} catch (Exception e) { e.printStackTrace(); }
}).start();
}
}
}
3. Semaphoreโ
Maintains a set of permits. acquire() blocks if necessary until a permit is available. release() adds a permit, potentially releasing a blocking acquirer. Used heavily for rate limiting or resource pooling.
Real-World Use Case: Database Connection Pool Limitingโ
// Limit concurrent database connections to prevent overwhelming the DB
public class DatabaseConnectionPool {
private final Semaphore semaphore;
private final BlockingQueue<Connection> pool;
public DatabaseConnectionPool(int maxConnections) {
this.semaphore = new Semaphore(maxConnections, true); // fair=true
this.pool = new ArrayBlockingQueue<>(maxConnections);
// Pre-create connections...
}
public Connection borrowConnection() throws InterruptedException {
semaphore.acquire(); // Block if all connections are in use
return pool.take();
}
public void returnConnection(Connection conn) {
pool.offer(conn);
semaphore.release(); // Signal: a connection is available
}
}
4. Phaser โ The Flexible Barrierโ
Phaser is a reusable synchronization barrier that supports a dynamic number of participants (parties can register/deregister at any time). It subsumes the capabilities of both CountDownLatch and CyclicBarrier.
// Dynamic participant registration โ threads can join/leave between phases
Phaser phaser = new Phaser(1); // Register self (main thread)
for (int i = 0; i < 3; i++) {
phaser.register(); // Dynamically add participant
new Thread(() -> {
System.out.println(Thread.currentThread().getName() + " arrived at phase " + phaser.getPhase());
phaser.arriveAndAwaitAdvance(); // Wait for all parties
System.out.println(Thread.currentThread().getName() + " completed phase 1");
phaser.arriveAndDeregister(); // Done โ leave the phaser
}).start();
}
phaser.arriveAndAwaitAdvance(); // Main thread waits for phase 0
phaser.arriveAndDeregister(); // Main thread leaves
When to use
PhaseroverCyclicBarrier: UsePhaserwhen the number of participating threads is not known in advance, or when threads need to join/leave between phases. UseCyclicBarrierwhen the participant count is fixed.
5. Exchanger โ Two-Thread Data Swapโ
Exchanger<V> allows exactly two threads to swap data at a rendezvous point. Useful for pipeline-style processing where a producer and consumer exchange buffers.
Exchanger<List<String>> exchanger = new Exchanger<>();
// Producer fills buffer, swaps with consumer's empty buffer
Thread producer = new Thread(() -> {
List<String> buffer = new ArrayList<>();
buffer.add("data1"); buffer.add("data2");
List<String> emptyBuffer = exchanger.exchange(buffer); // swap
// Now producer has consumer's empty buffer to refill
});
Q: What is the core difference between CountDownLatch, CyclicBarrier, and Phaser?
- Who is waiting? In
CountDownLatch, usually one main thread waits for N other threads to finish. InCyclicBarrier, N threads wait for each other.Phasersupports both patterns. - Reusability:
CountDownLatchcount cannot be reset.CyclicBarrierresets automatically.Phaseradvances to next phase automatically. - Dynamic parties: Only
Phasersupports adding/removing participants at runtime.
2. Atomic Classes & CASโ
CAS (Compare-And-Swap)โ
CAS is a lock-free atomic operation supported by the CPU: "If the current value equals the expected value, update it. Otherwise, retry." Used extensively in java.util.concurrent.atomic.
๐ถ Beginner Example: AtomicInteger vs synchronizedโ
// โ Race condition without synchronization
private int unsafeCount = 0;
unsafeCount++; // read-modify-write โ NOT atomic
// โ
Option 1: AtomicInteger (lock-free, usually faster)
private final AtomicInteger atomicCount = new AtomicInteger(0);
atomicCount.incrementAndGet(); // Single atomic CPU instruction (CAS loop)
// โ
Option 2: synchronized (simpler, but acquires a lock)
private int syncCount = 0;
public synchronized void increment() { syncCount++; }
AtomicReference & The CAS Loop Patternโ
For atomically updating object references or custom logic:
// Thread-safe immutable state update using CAS loop
private final AtomicReference<ImmutableConfig> config =
new AtomicReference<>(ImmutableConfig.defaults());
public void updateTimeout(int newTimeout) {
ImmutableConfig prev, next;
do {
prev = config.get(); // Read current
next = prev.withTimeout(newTimeout); // Create new version
} while (!config.compareAndSet(prev, next)); // CAS: retry if someone else changed it
// No locks needed! Thread-safe via optimistic concurrency.
}
๐ง Senior: LongAdder & LongAccumulator โ High-Contention Countersโ
Under high contention (many threads incrementing the same counter), AtomicLong becomes a bottleneck because every thread CAS-retries on the same memory location. LongAdder solves this by striping the counter across multiple cells:
// AtomicLong: all threads compete on ONE variable โ CAS retries under contention
AtomicLong atomicCounter = new AtomicLong();
atomicCounter.incrementAndGet(); // Contention hotspot!
// LongAdder: each thread increments its own cell โ aggregate on read
LongAdder adder = new LongAdder();
adder.increment(); // Thread writes to its own cell (no contention)
long total = adder.sum(); // Aggregates all cells (slightly expensive read)
// LongAccumulator: generalized version with custom accumulation function
LongAccumulator maxFinder = new LongAccumulator(Long::max, Long.MIN_VALUE);
maxFinder.accumulate(42); // Thread-safe max tracking
long currentMax = maxFinder.get();
When to Use What?โ
| Class | Throughput under Contention | Read Cost | Use Case |
|---|---|---|---|
AtomicInteger/Long | Degrades with thread count | Cheap (single read) | Low-to-moderate contention; need exact reads |
LongAdder | Scales linearly | Moderate (sum() aggregates cells) | High-contention counters (metrics, request counts) |
LongAccumulator | Scales linearly | Moderate | Custom reductions (max, min, running stats) |
synchronized | Worst (OS mutex under contention) | Cheap | Complex multi-step operations |
Q: What is the ABA problem in CAS and how is it solved? If a value changes from A โ B โ A, CAS checks the value and sees 'A', incorrectly assuming it was never modified. This is dangerous for structures like lock-free linked lists.
Concrete scenario: Thread 1 reads head node A from a lock-free stack. Thread 2 pops A, pops B, then pushes A back. Thread 1's CAS succeeds (head is still A), but the stack structure has changed โ node B is lost!
Solution: Use AtomicStampedReference. It appends a version stamp (integer) to the reference. The CAS operation now checks both the value AND the version stamp.
AtomicStampedReference<String> ref = new AtomicStampedReference<>("A", 0);
int[] stampHolder = new int[1];
String current = ref.get(stampHolder); // stampHolder[0] = current stamp
// CAS checks BOTH value AND stamp โ detects AโBโA
boolean success = ref.compareAndSet(current, "B", stampHolder[0], stampHolder[0] + 1);
3. Java Memory Model (JMM)โ
๐ถ Why Does This Matter?โ
If you skip this section, you will one day write code that works perfectly on your MacBook but randomly fails on a 64-core production server. The JMM explains why โ and it's because your laptop's single CPU doesn't expose the reordering and caching bugs that multi-core systems do.
The Java Memory Model (JMM) specifies the contract between the Java code, the JVM, and the physical CPU hardware regarding how memory reads and writes are propagated across threads. It provides a formal framework for visibility, ordering, and synchronization guarantees.
๐พ Visibility, CPU Caches & Hardware Realityโ
In modern computers, processors execute instructions at gigahertz speeds, but reading from RAM takes hundreds of clock cycles (the memory wall). To bridge this latency gap, CPUs use L1, L2, and L3 hardware caches:
โโโโโโโโโโโโ โโโโโโโโโโโโ
โ Core 0 โ โ Core 1 โ
โ โโโโโโโโ โ โ โโโโโโโโ โ
โ โ L1 โ โ โ โ L1 โ โ
โ โโโโโโโโ โ โ โโโโโโโโ โ
โโโโโโฌโโโโโโ โโโโโโฌโโโโโโ
โโโโโโโโโโโฌโโโโโโโโ
โโโโโผโโโโ
โ L2 โ
โโโโโฌโโโโ
โโโโโผโโโโ
โ L3 โ (Shared Cache)
โโโโโฌโโโโ
โโโโโผโโโโ
โ RAM โ (Main Memory)
โโโโโโโโโ
When a thread modifies a variable:
- It writes the change to its local processor register or private store buffer.
- It propagates to the L1/L2 cache of that core.
- It may take some time before the dirty cache line is flushed to the shared L3 cache or main memory (RAM).
- Meanwhile, a thread running on another core reads the variable from its own L1/L2 cache, seeing a stale value. This is the visibility problem.
๐ Instruction Reordering & Data Racesโ
To maximize CPU pipeline throughput, both the compiler (JIT) and the CPU execution engine are allowed to reorder instructions as long as the behavior remains identical within a single thread (as-if-serial semantics). However, in concurrent environments, reordering can cause catastrophic failures.
Consider the classic data race example:
public class ReorderingExample {
int x = 0;
boolean ready = false;
// Thread 1
public void writer() {
x = 42; // Instruction A
ready = true; // Instruction B
}
// Thread 2
public void reader() {
if (ready) { // Instruction C
System.out.println(x); // Instruction D
}
}
}
Without synchronization:
- The JIT compiler or CPU can reorder
writer()to execute B before A (since they are independent variables). - If Thread 2 runs
reader()at the exact moment Thread 1 completes B (ready = true) but not A (x = 42), Thread 2 will print0(which should be impossible sequentially!). - JMM forbids this reordering if
readyis declaredvolatile.
๐ถ Beginner Example: synchronized Establishes Happens-Beforeโ
public class VisibilityExample {
private int sharedData = 0;
private final Object lock = new Object();
public void writer() {
synchronized (lock) {
sharedData = 42; // Write happens inside synchronized block
} // Monitor unlock โ happens-before...
}
public void reader() {
synchronized (lock) { // ...the next monitor lock on the same object
System.out.println(sharedData); // GUARANTEED to see 42
}
}
// Without synchronized: reader might print 0 (stale cached value)
}
๐ The 8 Happens-Before Rulesโ
The JMM defines thread interactions using happens-before relationships. If action A happens-before action B, the JMM guarantees that all memory writes made by A are visible to B, and that the compiler/CPU cannot reorder A after B.
Here are the 8 formal happens-before rules defined by the Java Language Specification (JLS):
| Rule | Description |
|---|---|
| 1. Program Order Rule | Within a single thread, each action happens-before any subsequent action in program order. |
| 2. Volatile Variable Rule | A write to a volatile variable happens-before every subsequent read of that same variable. |
| 3. Monitor Lock Rule | An unlock operation on a monitor (exiting a synchronized block) happens-before every subsequent lock operation on that same monitor. |
| 4. Thread Start Rule | A call to Thread.start() on a thread happens-before any action in the started thread's run() method. |
| 5. Thread Join Rule | All actions in a thread happen-before any other thread successfully returns from a join() call on that thread. |
| 6. Transitivity Rule | If A happens-before B, and B happens-before C, then A happens-before C. |
| 7. Default Value Rule | The initialization of default values for any object fields happens-before any actions in the constructor. |
| 8. Finalizer Rule | The completion of an object's constructor happens-before the start of its finalizer. |
๐ Memory Barriers (Fences)โ
To enforce happens-before relationships, the JVM inserts hardware-specific memory barriers (instructions that force the CPU to flush write buffers and invalidate read caches):
- StoreStore: Prevents writes before the barrier from being reordered with writes after the barrier.
- LoadLoad: Prevents reads before the barrier from being reordered with reads after the barrier.
- StoreLoad: A heavy fence. Flushes all writes to memory and blocks subsequent reads until flush completes.
- LoadStore: Prevents reads before the barrier from being reordered with writes after the barrier.
Volatile under the hood:โ
- Writing a
volatilevariable inserts:[StoreStore] -> volatile_write -> [StoreLoad] - Reading a
volatilevariable inserts:volatile_read -> [LoadLoad] -> [LoadStore]
โ๏ธ Final Field Guarantees (The Constructor Freeze)โ
In addition to happens-before, the JMM provides a special guarantee for final fields: Safe Publication.
When an object constructor completes, the JVM executes a freeze action on all final fields. If a reference to the object is published after the constructor completes, other threads are guaranteed to see the correctly initialized values of those final fields without any synchronization.
public class ImmutableHolder {
public final int value; // Final: guaranteed to be safely published
public int nonFinalValue; // Non-final: may be seen as 0 by other threads!
public ImmutableHolder() {
this.value = 42;
this.nonFinalValue = 99;
}
}
[!CAUTION] Escape during construction: If the constructor leaks the
thisreference (e.g., registeringthisto a listener inside the constructor), final field guarantees are completely voided, and other threads can see uninitialized state.
4. ThreadLocalโ
ThreadLocal provides per-thread isolated variables.
๐ถ Beginner Concept: Why Do We Need This?โ
SimpleDateFormat is not thread-safe. If 10 threads share one instance, dates get corrupted. Solutions:
- Create a new instance every time โ wasteful (object creation overhead)
synchronizedโ slow (serializes all date formatting)ThreadLocalโ each thread gets its own instance (no sharing, no locking)
// Each thread gets its own SimpleDateFormat โ no sharing, no locks
private static final ThreadLocal<SimpleDateFormat> dateFormat =
ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));
public String formatDate(Date date) {
return dateFormat.get().format(date); // Thread-safe!
}
Real-World Use Case: Request-Scoped MDC Logging in Springโ
// In a Spring web filter โ store request context per thread
public class RequestContextFilter implements Filter {
private static final ThreadLocal<RequestContext> CONTEXT = new ThreadLocal<>();
@Override
public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
throws IOException, ServletException {
try {
CONTEXT.set(new RequestContext(extractTraceId(req), extractUserId(req)));
MDC.put("traceId", CONTEXT.get().traceId()); // SLF4J MDC uses ThreadLocal
chain.doFilter(req, resp);
} finally {
CONTEXT.remove(); // CRITICAL: prevent memory leak in thread pool
MDC.clear();
}
}
public static RequestContext current() { return CONTEXT.get(); }
}
InheritableThreadLocal โ Parent-to-Child Propagationโ
Standard ThreadLocal values are invisible to child threads. InheritableThreadLocal copies the parent thread's value to child threads at creation time:
InheritableThreadLocal<String> userId = new InheritableThreadLocal<>();
userId.set("user-123");
new Thread(() -> {
System.out.println(userId.get()); // Prints "user-123" (inherited from parent)
}).start();
Limitation: Works for
new Thread()but does NOT work with thread pools (threads are reused, not newly created). For thread pools, use task-wrapping or the framework's built-in propagation (e.g., Spring'sTaskDecorator).
๐ง Senior: ScopedValues (Java 21+ Preview) โ The Modern Replacementโ
ScopedValues solves all the problems of ThreadLocal: no memory leaks, no mutable state, works natively with virtual threads, and automatically propagates to child scopes.
// Immutable, scoped, no cleanup needed, virtual-thread-friendly
private static final ScopedValue<String> CURRENT_USER = ScopedValue.newInstance();
public void handleRequest(String userId) {
ScopedValue.runWhere(CURRENT_USER, userId, () -> {
// All code in this scope (including nested calls) can read CURRENT_USER
processOrder(); // CURRENT_USER.get() returns userId
});
// Outside the scope, CURRENT_USER is no longer bound โ no cleanup required
}
| Feature | ThreadLocal | InheritableThreadLocal | ScopedValue (Java 21+) |
|---|---|---|---|
| Mutability | Mutable (set()/get()) | Mutable | Immutable (bound per scope) |
| Memory leak risk | High (thread pool reuse) | High | None (auto-scoped) |
| Virtual thread support | โ ๏ธ Costly (cloned per VT) | โ ๏ธ Costly | โ Designed for VT |
| Child thread propagation | โ | โ (at creation only) | โ (structured concurrency) |
| Cleanup required? | Yes (remove()) | Yes | No |
Q: Why does ThreadLocal cause memory leaks and how do WeakReferences play a role? Internally, each Thread has a ThreadLocalMap. The map uses ThreadLocal instances as WeakReference keys, but the values are strong references. If a ThreadLocal is garbage-collected, its key in the map becomes null, but the value remains referenced by the thread. In thread pools, where threads are never destroyed, this value lives forever.
Fix: Always call threadLocal.remove() in a finally block after use.
5. Thread Pools & Executorsโ
๐ถ Beginner: ThreadPoolExecutor Constructor Walkthroughโ
The Executors factory methods are convenient but dangerous. Understanding the raw constructor teaches you what's really happening:
ThreadPoolExecutor executor = new ThreadPoolExecutor(
4, // corePoolSize: always-alive threads
8, // maximumPoolSize: max threads under load
60, TimeUnit.SECONDS, // keepAliveTime: idle non-core threads die after 60s
new ArrayBlockingQueue<>(100), // workQueue: bounded! (unbounded = OOM risk)
new ThreadFactory() { // threadFactory: name your threads!
private final AtomicInteger counter = new AtomicInteger(1);
@Override
public Thread newThread(Runnable r) {
Thread t = new Thread(r, "order-processor-" + counter.getAndIncrement());
t.setDaemon(false);
return t;
}
},
new ThreadPoolExecutor.CallerRunsPolicy() // rejectionHandler: backpressure
);
Task Lifecycleโ
Task submitted
โ Is corePool full?
NO โ Create new core thread to execute task
YES โ Is workQueue full?
NO โ Add task to queue
YES โ Is maximumPoolSize reached?
NO โ Create new non-core thread
YES โ Execute RejectionPolicy
Production rule of thumb: Name your threads (
"order-processor-3"instead of"pool-1-thread-3") so thread dumps are readable during incidents.
๐ง Senior Deep Dive: The Mathematics of Thread Pool Starvationโ
Managing threads manually is an anti-pattern. The Executor framework decouples task submission from execution mechanics. However, blindly setting corePoolSize is a catastrophic senior mistake.
A poorly sized thread pool leads to CPU context-switching death or complete Application Starvation.
The Context Switch Costโ
If your Linux server has 8 CPU cores, it can only physically execute 8 threads simultaneously. If you configure a Thread Pool of 5,000 threads, the Linux kernel has to rapidly switch the 8 physical cores between the 5,000 threads.
- A context switch takes roughly 1 to 5 microseconds.
- If it context switches 100,000 times a second, your CPU spends 50% of its power simply managing threads rather than executing your business logic!
The Sizing Formula (Interview Critical)โ
1. CPU-Bound Tasks (e.g., Video encoding, heavy math, sorting arrays)
- Formula:
N_threads = CPU_Cores + 1 - Why? Adding more threads than cores physically degrades performance due to Context Switching. The
+1acts as a backup in case a working thread takes a page fault (memory swap).
2. I/O-Bound Tasks (e.g., DB queries, HTTP calls, File reads)
- Formula:
N_threads = CPU_Cores * Target_CPU_Utilization * (1 + Wait_Time / Compute_Time) - Rule of Thumb: If an API call takes 100ms, and compiling the JSON response takes 1ms... the thread is blocked waiting for the network 99% of the time! You need massively large Thread Pools (e.g., 200โ500 threads) to ensure the physical CPU cores aren't just sitting idle while threads sleep waiting for network packets.
ScheduledExecutorService โ Periodic & Delayed Tasksโ
ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);
// Run once after 5-second delay
scheduler.schedule(() -> sendReport(), 5, TimeUnit.SECONDS);
// Run every 30 seconds (fixed rate โ starts every 30s regardless of task duration)
scheduler.scheduleAtFixedRate(() -> collectMetrics(), 0, 30, TimeUnit.SECONDS);
// Run with 30-second delay BETWEEN executions (waits for previous to finish)
scheduler.scheduleWithFixedDelay(() -> cleanupTempFiles(), 0, 30, TimeUnit.SECONDS);
scheduleAtFixedRatevsscheduleWithFixedDelay: If your task takes 10s and interval is 30s โfixedRatefires at 0s, 30s, 60s.fixedDelayfires at 0s, 40s, 80s (30s gap after completion).
Rejection Policiesโ
AbortPolicy(Default): ThrowsRejectedExecutionException.CallerRunsPolicy: Runs task in the caller's thread (acts as natural backpressure).DiscardPolicy: Silently drops task.DiscardOldestPolicy: Drops oldest unhandled request and retries.
Q: Why do strict engineering guidelines forbid using Executors factory methods? * Executors.newFixedThreadPool() uses an unbounded LinkedBlockingQueue. If tasks build up faster than they process, it will cause an OOM.
Executors.newCachedThreadPool()allowsInteger.MAX_VALUEmaximum threads, leading to OOM by creating too many threads. Always explicitly configureThreadPoolExecutorto control queue sizes and thread limits.
๐ For a complete guide on how thread pools relate to Tomcat, Netty, HikariCP, and production sizing, see Thread Pools, Netty, Tomcat & HikariCP.
6. The Fork/Join Frameworkโ
Introduced in Java 7, the Fork/Join framework is designed for work that can be broken down recursively into smaller pieces (Divide and Conquer). It is the engine that powers Arrays.parallelSort() and parallel Streams.
Core Componentsโ
ForkJoinPool: The specialized executor.RecursiveTask<V>: A task that returns a result.RecursiveAction: A task that does not return a result.
The Work-Stealing Algorithmโ
Standard thread pools use a single shared queue, which can become a bottleneck. The ForkJoinPool gives every worker thread its own double-ended queue (deque).
Q: How does Fork/Join prevent idle threads? If a worker thread finishes all the tasks in its own deque, it becomes a "thief." It looks at the deques of other busy worker threads and steals tasks from the tail (the oldest, largest chunks of work). This minimizes contention, because the owner thread operates on the head of the deque, while the thief operates on the tail.
Example: Array Summationโ
public class SumTask extends RecursiveTask<Long> {
private static final int THRESHOLD = 1000;
private long[] array;
private int start, end;
public SumTask(long[] array, int start, int end) {
this.array = array; this.start = start; this.end = end;
}
@Override
protected Long compute() {
if (end - start <= THRESHOLD) {
long sum = 0;
for (int i = start; i < end; i++) sum += array[i];
return sum;
} else {
// Fork: divide task in half
int mid = start + (end - start) / 2;
SumTask left = new SumTask(array, start, mid);
SumTask right = new SumTask(array, mid, end);
left.fork(); // pushes to deque, executed asynchronously
long rightResult = right.compute(); // compute right half in current thread
long leftResult = left.join(); // block and wait for left half
return leftResult + rightResult;
}
}
}
// Usage:
ForkJoinPool pool = ForkJoinPool.commonPool();
long total = pool.invoke(new SumTask(massiveArray, 0, massiveArray.length));
๐ง Fork/Join vs Parallel Streams vs ExecutorServiceโ
| Feature | ExecutorService | ForkJoinPool | Parallel Streams |
|---|---|---|---|
| Task model | Independent tasks | Recursive divide-and-conquer | Data-parallel pipeline |
| Work stealing | โ | โ | โ (uses ForkJoinPool) |
| Best for | Independent I/O tasks | CPU-bound recursive problems | Collection processing |
| Queue | Single shared queue | Per-thread deque | Automatic (Spliterator) |
| Customization | Full control | Moderate | Minimal |
| Common pitfall | Pool starvation | Too-small threshold | Shared commonPool() contention |
- Threshold too small: If THRESHOLD is 1, you create millions of task objects. The object allocation overhead exceeds the parallelism benefit.
- Blocking I/O in Fork/Join: Fork/Join is designed for CPU-bound work. Blocking I/O (HTTP calls, DB queries) will starve the pool.
fork()thenfork(): Alwaysfork()one side andcompute()the other in-place. Forking both wastes threads.
7. CompletableFutureโ
CompletableFuture (Java 8+) provides a powerful API for composing asynchronous, non-blocking operations. By default, it uses the ForkJoinPool.commonPool().
๐ถ Real-World Example: Dashboard Data Aggregationโ
// Fetch user profile and recent orders IN PARALLEL, then combine into a Dashboard
public CompletableFuture<Dashboard> loadDashboard(String userId) {
ExecutorService ioPool = Executors.newVirtualThreadPerTaskExecutor();
CompletableFuture<User> userFuture = CompletableFuture
.supplyAsync(() -> userService.findById(userId), ioPool);
CompletableFuture<List<Order>> ordersFuture = CompletableFuture
.supplyAsync(() -> orderService.getRecentOrders(userId), ioPool);
// thenCombine: runs AFTER both complete, merges results
return userFuture.thenCombine(ordersFuture, (user, orders) -> {
return new Dashboard(user, orders, calculateStats(orders));
});
}
// Usage: non-blocking
loadDashboard("user-42").thenAccept(dashboard -> renderPage(dashboard));
Visual Pipelineโ
โโโโโโโโโโโโโโโ
โโโโถโ Fetch User โโโโ
supplyAsync() โ โโโโโโโโโโโโโโโ โ thenCombine() โโโโโโโโโโโโโโโโโ
โโโโโโโโโโโโโโโโโค โโโโโโโโโโโโโโโโโโโถโ Dashboard() โ
โ โโโโโโโโโโโโโโโโ โ โโโโโโโโโโโโโโโโโ
โโโโถโ Fetch Orders โโโโ
โโโโโโโโโโโโโโโโ
// Chain transformations and handle errors elegantly
CompletableFuture<Integer> result = CompletableFuture.supplyAsync(() -> fetchData())
.thenApply(data -> parse(data))
.exceptionally(ex -> {
log.error("Failed", ex);
return -1; // fallback value
});
// Parallel composition (combine)
CompletableFuture<String> combined = getPrice()
.thenCombine(getDiscount(), (price, discount) -> applyDiscount(price, discount));
Warning: Always provide a custom
Executoras the second argument tosupplyAsync()if you are doing I/O-bound tasks. The commonForkJoinPoolis sized for CPU-bound work and will quickly exhaust if blocked by database or network calls.
8. Concurrent Collectionsโ
| Collection | Description |
|---|---|
ConcurrentHashMap | Thread-safe Map. Read operations are entirely lock-free. |
CopyOnWriteArrayList | Creates a new array copy on every write. Ideal for read-heavy scenarios. |
ConcurrentSkipListMap | Thread-safe sorted Map (like a concurrent TreeMap). O(log n) operations. |
BlockingQueue | Interface for producer-consumer queues (ArrayBlockingQueue, LinkedBlockingQueue). |
ConcurrentHashMap Atomic Operationsโ
Beyond get()/put(), ConcurrentHashMap provides atomic compute methods that eliminate the need for external synchronization:
ConcurrentHashMap<String, LongAdder> metrics = new ConcurrentHashMap<>();
// computeIfAbsent: atomically create entry if missing, then use it
metrics.computeIfAbsent("request_count", k -> new LongAdder()).increment();
// merge: atomically combine old and new values
ConcurrentHashMap<String, Integer> wordCounts = new ConcurrentHashMap<>();
wordCounts.merge("hello", 1, Integer::sum); // If "hello" exists, add 1; else set 1
// compute: atomically read-modify-write
map.compute("key", (k, v) -> v == null ? 1 : v + 1);
Common mistake: Using
map.get()+map.put()is NOT atomic even withConcurrentHashMap. Always usecompute(),merge(), orcomputeIfAbsent()for atomic read-modify-write.
CopyOnWriteArrayList โ When to Use (and When NOT To)โ
// โ
Perfect for: event listener lists (written rarely, iterated frequently)
CopyOnWriteArrayList<EventListener> listeners = new CopyOnWriteArrayList<>();
listeners.add(new LoggingListener()); // Creates a new internal array (expensive)
for (EventListener l : listeners) { // Iterates over a snapshot (lock-free, safe)
l.onEvent(event);
}
// โ NEVER use for: frequently modified lists
// Each add/remove copies the ENTIRE array โ O(n) per write operation
BlockingQueue Implementation Comparisonโ
| Implementation | Bound | Internal Structure | Fairness | Best For |
|---|---|---|---|---|
ArrayBlockingQueue | Bounded (fixed) | Single array | Configurable | General producer-consumer |
LinkedBlockingQueue | Optionally bounded | Linked nodes | No | Higher throughput (separate head/tail locks) |
PriorityBlockingQueue | Unbounded | Heap | No | Priority-ordered processing |
SynchronousQueue | Zero capacity | No storage | Configurable | Direct hand-off (each put() blocks until take()) |
DelayQueue | Unbounded | Heap | No | Scheduled/delayed task execution |
LinkedTransferQueue | Unbounded | Linked nodes | No | Low-latency transfer (producer blocks until consumer takes) |
Q: How did ConcurrentHashMap change from JDK 1.7 to 1.8? * JDK 1.7: Used Segment-based locking (an array of Segments). Granularity was locked at the Segment level (default 16).
- JDK 1.8: Removed Segments. Uses a Node array + Linked List + Red-Black tree. Thread safety is achieved using CAS +
synchronized. It locks only the head node of the specific bucket being modified, massively reducing lock contention.
9. Virtual Threads (Java 21+)โ
Virtual threads (Project Loom) completely change the physical threading model of Java, solving the "thread-per-request" bottleneck without the callback-hell of Reactive Programming.
Because Virtual Threads represent a fundamental paradigm shift in the JVM, altering everything from OS Carrier Threads to
ThreadLocalallocations andsynchronizedpinning constraints, we have dedicated an entire architectural guide to it.
10. Structured Concurrency (Java 21+)โ
StructuredTaskScope enforces a discipline that child threads cannot outlive their parent โ solving the "orphaned thread" problem common in CompletableFuture chains.
// Structured Concurrency: all subtasks are scoped and managed together
try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
Subtask<User> user = scope.fork(this::fetchUser);
Subtask<Order> orders = scope.fork(this::fetchOrders);
scope.join() // wait for both
.throwIfFailed(); // propagate first failure
return new Dashboard(user.get(), orders.get());
} // scope closes โ any unfinished subtasks are cancelled automatically
ShutdownOnFailure vs ShutdownOnSuccessโ
| Policy | Behavior | Use case |
|---|---|---|
ShutdownOnFailure | Cancel all subtasks if any fails | All results required |
ShutdownOnSuccess | Cancel remaining once one succeeds | First-response-wins (redundant calls) |
// ShutdownOnSuccess: try multiple sources, use fastest response
try (var scope = new StructuredTaskScope.ShutdownOnSuccess<String>()) {
scope.fork(() -> fetchFromPrimaryDB());
scope.fork(() -> fetchFromCache());
scope.fork(() -> fetchFromReplica());
scope.join();
return scope.result(); // returns whichever completed first
}
CompletableFuture vs StructuredTaskScopeโ
| Feature | CompletableFuture | StructuredTaskScope |
|---|---|---|
| Thread lifecycle | Unstructured โ tasks can outlive caller | Structured โ tasks cannot outlive scope |
| Cancellation | Manual (easy to forget) | Automatic on scope close |
| Error propagation | Must chain .exceptionally() | throwIfFailed() propagates naturally |
| Thread dump readability | Flat โ no parent-child relationship visible | Hierarchical โ clear parent-child tree |
| Orphaned threads | โ ๏ธ Common pitfall | Impossible by design |
| Java version | 8+ | 21+ (preview) |
| Best for | Legacy code, complex async pipelines | New code with concurrent subtasks |
Key insight: Unlike
CompletableFuture.allOf(), structured concurrency ensures all forked threads terminate before the scope exits โ either normally or cancelled. No orphaned background work.
๐ง Custom StructuredTaskScope โ Aggregating Partial Resultsโ
You can extend StructuredTaskScope to implement custom completion policies, such as collecting all successful results even if some tasks fail:
// Collect all successful results, ignore failures (e.g., best-effort fan-out)
public class CollectingScope<T> extends StructuredTaskScope<T> {
private final ConcurrentLinkedQueue<T> results = new ConcurrentLinkedQueue<>();
private final ConcurrentLinkedQueue<Throwable> errors = new ConcurrentLinkedQueue<>();
@Override
protected void handleComplete(Subtask<? extends T> subtask) {
if (subtask.state() == Subtask.State.SUCCESS) {
results.add(subtask.get());
} else if (subtask.state() == Subtask.State.FAILED) {
errors.add(subtask.exception());
}
}
public List<T> successfulResults() { return List.copyOf(results); }
public List<Throwable> failures() { return List.copyOf(errors); }
}
// Usage: query 5 replicas, use whatever responds successfully
try (var scope = new CollectingScope<SearchResult>()) {
for (String replica : replicas) {
scope.fork(() -> queryReplica(replica));
}
scope.join();
List<SearchResult> results = scope.successfulResults(); // partial success OK
}
11. Advanced CompletableFuture Patternsโ
Combining Multiple Futuresโ
// Get all results after allOf completes (workaround for allOf's Void return)
CompletableFuture<List<String>> allResults = CompletableFuture.allOf(future1, future2, future3)
.thenApply(v -> Stream.of(future1, future2, future3)
.map(CompletableFuture::join) // safe โ all completed
.collect(Collectors.toList()));
Error Handlingโ
CompletableFuture<User> future = CompletableFuture.supplyAsync(this::fetchUser)
.exceptionally(ex -> User.anonymous()) // recover from error with default
.handle((user, ex) -> { // always runs โ inspect both
if (ex != null) return User.anonymous();
return user;
})
.whenComplete((user, ex) -> // side-effect logging only
log.info("Completed: user={}, error={}", user, ex));
Timeout (Java 9+)โ
CompletableFuture<String> withTimeout = CompletableFuture
.supplyAsync(this::slowExternalCall)
.orTimeout(2, TimeUnit.SECONDS) // throws after 2s
.completeOnTimeout("fallback", 2, TimeUnit.SECONDS); // or return fallback
thenCompose vs thenApplyโ
// thenApply: synchronous transform โ adapts T โ U
CompletableFuture<String> upper = future.thenApply(String::toUpperCase);
// thenCompose: flatMap โ use when transform itself returns a CompletableFuture
// Prevents CompletableFuture<CompletableFuture<Order>>
CompletableFuture<Order> orders = userFuture
.thenCompose(user -> fetchOrdersFor(user.getId()));
Retry Pattern with Exponential Backoffโ
public static <T> CompletableFuture<T> retryWithBackoff(
Supplier<CompletableFuture<T>> action,
int maxRetries,
ScheduledExecutorService scheduler) {
return action.get().thenApply(CompletableFuture::completedFuture)
.exceptionally(ex -> {
if (maxRetries <= 0) return CompletableFuture.failedFuture(ex);
long delay = (long) Math.pow(2, 3 - maxRetries) * 1000; // 1s, 2s, 4s...
CompletableFuture<T> delayed = new CompletableFuture<>();
scheduler.schedule(
() -> retryWithBackoff(action, maxRetries - 1, scheduler)
.whenComplete((val, err) -> {
if (err != null) delayed.completeExceptionally(err);
else delayed.complete(val);
}),
delay, TimeUnit.MILLISECONDS
);
return delayed;
})
.thenCompose(Function.identity());
}
// Usage:
retryWithBackoff(() -> CompletableFuture.supplyAsync(() -> callFlakyApi()), 3, scheduler);
Fan-Out / Fan-In โ Parallel Service Callsโ
// Query multiple search engines in parallel, aggregate results
public CompletableFuture<List<SearchResult>> fanOutSearch(String query) {
List<SearchEngine> engines = List.of(googleEngine, bingEngine, duckEngine);
// Fan-out: launch all searches in parallel
List<CompletableFuture<List<SearchResult>>> futures = engines.stream()
.map(engine -> CompletableFuture
.supplyAsync(() -> engine.search(query), ioExecutor)
.orTimeout(3, TimeUnit.SECONDS)
.exceptionally(ex -> List.of())) // Partial failure OK
.toList();
// Fan-in: wait for all, flatten results
return CompletableFuture.allOf(futures.toArray(CompletableFuture[]::new))
.thenApply(v -> futures.stream()
.flatMap(f -> f.join().stream())
.distinct()
.collect(Collectors.toList()));
}
Always use a custom executor for I/Oโ
// โ Uses ForkJoinPool.commonPool() โ designed for CPU-bound, not blocking I/O
CompletableFuture.supplyAsync(() -> httpClient.fetch(url));
// โ
Dedicated I/O executor (or virtual threads in Java 21)
ExecutorService ioExecutor = Executors.newVirtualThreadPerTaskExecutor();
CompletableFuture.supplyAsync(() -> httpClient.fetch(url), ioExecutor);
12. Producer-Consumer Patternโ
A fundamental pattern where producer threads generate data and consumer threads process it, communicating via a shared buffer.
While you can write this using wait()/notifyAll(), modern backend engineering relies on BlockingQueue:
BlockingQueue<String> queue = new ArrayBlockingQueue<>(100);
// Producer
executor.submit(() -> {
queue.put("payload"); // Blocks automatically if full
});
// Consumer
executor.submit(() -> {
String payload = queue.take(); // Blocks automatically if empty
process(payload);
});
Poison Pill Shutdown Patternโ
Gracefully stopping consumers without Thread.interrupt():
public class ProducerConsumerWithShutdown {
private static final String POISON_PILL = "__SHUTDOWN__";
private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(100);
// Producer: signals completion by sending a poison pill
public void produce(List<String> items) {
for (String item : items) {
queue.put(item);
}
queue.put(POISON_PILL); // Signal: "no more data"
}
// Consumer: processes until it receives the poison pill
public void consume() {
while (true) {
String item = queue.take();
if (POISON_PILL.equals(item)) {
log.info("Received shutdown signal. Exiting.");
break; // Clean exit โ no interrupt handling needed
}
process(item);
}
}
}
// For multiple consumers: send one POISON_PILL per consumer thread
BlockingQueue Implementation Quick Guideโ
| Queue | When to Use |
|---|---|
ArrayBlockingQueue | Default choice. Fixed capacity. Fair ordering optional. |
LinkedBlockingQueue | Higher throughput than Array (separate head/tail locks). Optional bound. |
SynchronousQueue | Direct hand-off. No buffering. Producer blocks until consumer takes. |
PriorityBlockingQueue | Process highest-priority items first. Unbounded. |
DelayQueue | Items only become available after a delay expires. |
๐ง Senior: LMAX Disruptor โ Ultra-Low-Latency Alternativeโ
For systems where BlockingQueue latency (microseconds from lock contention) is unacceptable (e.g., financial trading, game servers), the LMAX Disruptor achieves nanosecond-level inter-thread messaging:
- Ring buffer instead of linked list/array (cache-line friendly, pre-allocated)
- No locks โ uses CAS + memory barriers only
- Mechanical sympathy โ designed around CPU cache architecture
- Throughput: 100M+ messages/second on commodity hardware
When
BlockingQueuegives you microsecond latency but you need nanoseconds, the Disruptor is the industry standard. Used by LMAX Exchange, Apache Log4j 2 (async loggers), and Hazelcast.
Compare Nextโ
Interview Questionsโ
Q: How do you pick concurrency primitives in production services?โ
A: Follow the hierarchy of simplicity: (1) Start with immutability โ if data doesn't change, no synchronization is needed. (2) Use thread confinement (e.g., ThreadLocal, actor model) to avoid sharing entirely. (3) Use high-level utilities (ConcurrentHashMap, BlockingQueue, CompletableFuture) before reaching for low-level locks. (4) Only use synchronized or ReentrantLock when you need fine-grained control over shared mutable state. The simpler the primitive, the fewer bugs.
Q: What is the most common thread-pool failure pattern?โ
A: Pool starvation from blocking I/O on executors sized for CPU-bound tasks. Example: a ForkJoinPool.commonPool() with 8 threads serves an endpoint that makes HTTP calls taking 500ms each. After 8 concurrent requests, all threads are blocked waiting for network responses, and subsequent requests queue indefinitely. Fix: use separate I/O-bound pools sized with the formula N = cores ร (1 + wait/compute), or use virtual threads (Java 21+).
Q: How do you design backpressure in asynchronous pipelines?โ
A: (1) Bound your queues โ use ArrayBlockingQueue(capacity) instead of unbounded queues. (2) Choose rejection policies intentionally โ CallerRunsPolicy naturally slows down producers when consumers are overwhelmed. (3) Propagate load-shedding upstream โ return HTTP 429 or use circuit breakers to signal overload to callers. (4) Monitor queue depth as a key metric โ growing queues indicate a consumer that can't keep up.
Q: Why is lock ordering still relevant with modern utilities?โ
A: Even with ReentrantLock and ConcurrentHashMap, mixed synchronization paths can deadlock. Example: Service A acquires Lock-X then Lock-Y; Service B acquires Lock-Y then Lock-X. Modern tools don't prevent logical ordering violations. The fix is a global lock ordering convention (e.g., always acquire in alphabetical order by resource name) and using tryLock(timeout) to detect and recover from ordering mistakes.
Q: When are virtual threads not a silver bullet?โ
A: (1) CPU-bound work โ virtual threads don't add cores; you still need N = cores + 1 threads. (2) synchronized pinning โ a virtual thread inside a synchronized block pins its carrier thread, negating the benefit. Use ReentrantLock instead. (3) External bottlenecks โ if your DB connection pool has 20 connections, 1 million virtual threads still queue at the pool. Virtual threads solve the thread bottleneck, not the resource bottleneck.
Q: How do you evaluate CompletableFuture chains in code review?โ
A: Check these 5 things: (1) Executor selection โ is supplyAsync() using a custom I/O executor, not the common pool? (2) Error propagation โ does every chain have .exceptionally() or .handle()? Unhandled errors are silently swallowed. (3) Timeout behavior โ is orTimeout() or completeOnTimeout() set for external calls? (4) Cancellation โ does cancelling one future properly cancel dependent futures? (5) Thread safety of shared state โ lambdas in the chain may execute on different threads.
Q: What does senior-level concurrency testing include?โ
A: (1) Deterministic stress tests โ use CyclicBarrier to force all threads to start simultaneously, maximizing race condition probability. (2) Race-condition probes โ tools like jcstress (OpenJDK's concurrency stress testing harness) systematically explore thread interleavings. (3) Latency assertions under contention โ verify P99 latency doesn't degrade beyond SLA when thread count increases. (4) Deadlock detection โ programmatic ThreadMXBean.findDeadlockedThreads() in health checks.
Q: Why is SimpleDateFormat not thread-safe, and what are the fixes?โ
A: SimpleDateFormat uses internal mutable state (a Calendar field) during formatting/parsing. When multiple threads share one instance, they corrupt each other's intermediate state, producing garbled dates or NumberFormatException. Fixes: (1) ThreadLocal<SimpleDateFormat> โ one instance per thread. (2) DateTimeFormatter (Java 8+) โ immutable and thread-safe by design. (3) Create a new instance per call (wasteful but correct). In modern Java, always use DateTimeFormatter.
Q: How do you detect deadlocks in a production JVM?โ
A: (1) jstack <pid> โ prints all thread stack traces; the JVM automatically detects and reports monitor deadlock cycles at the bottom of the output. (2) ThreadMXBean.findDeadlockedThreads() โ programmatic API; wire this into a health-check endpoint or periodic monitoring. (3) Thread dump analysis tools โ fastThread.io, IBM TDMA parse thread dumps and visualize lock chains. (4) Prevention โ use tryLock(timeout) with ReentrantLock to avoid infinite blocking; log and alert on timeout.
Q: When should you use StampedLock optimistic reads?โ
A: When your workload is overwhelmingly read-heavy (>95% reads) and reads are short operations. The optimistic read path avoids acquiring any lock โ it grabs a stamp, reads data, then validates. If a writer intervened, it falls back to a standard read lock. Do not use if: (1) reads are long-running (validation failure wastes work), (2) writes are frequent (constant fallbacks negate the benefit), or (3) you need reentrancy (StampedLock is not reentrant โ calling it recursively deadlocks).
Q: How would you design a rate limiter using Semaphore?โ
A: Use a Semaphore with N permits representing the maximum concurrent requests. Each request acquire()s a permit before execution and release()s it in a finally block after. For time-based rate limiting (e.g., 100 requests/second), combine with a ScheduledExecutorService that periodically replenishes permits. Use tryAcquire(timeout) to fail fast instead of queuing indefinitely. For distributed rate limiting, use Redis + Lua scripts instead of in-process Semaphore.
Q: What is the difference between CompletableFuture and Structured Concurrency?โ
A: CompletableFuture provides unstructured concurrency โ tasks are fire-and-forget, with no guarantee that child tasks terminate when the parent does. This leads to orphaned threads, resource leaks, and unreadable thread dumps. StructuredTaskScope (Java 21+) enforces structured concurrency โ child tasks cannot outlive their parent scope, cancellation is automatic, and thread dumps show a clear parent-child hierarchy. Use CompletableFuture for complex async pipelines in pre-Java-21 code; use StructuredTaskScope for new concurrent subtask patterns.