
Processes & Threads: Complete Guide


What is a Process?

A process is a running instance of a program. When you double-click a Java application or run java -jar app.jar, the OS loads the program from disk into memory, creates a process to run it, and assigns it resources (memory, CPU time, file handles).

The key word is isolated: each process has its own private virtual address space. Process A cannot read or write Process B's memory without going through the OS. This isolation is a safety feature, but it has a performance cost, because sharing data between processes requires OS-mediated communication.

The restaurant analogy

Restaurant concept | OS equivalent
The restaurant building | Your computer
One restaurant kitchen | One process (isolated resources)
Cooks working in that kitchen | Threads (share the kitchen's tools)
Walls between restaurants | Process memory isolation
Shouting through a window to the next restaurant | Inter-process communication (IPC)
A cook opening the refrigerator | Thread accessing shared heap memory

Two restaurants cannot share their refrigerators (different process memory). But all cooks in the same kitchen can reach into the same fridge (shared heap within one process).

Process memory layout

When the OS loads a process, it lays out memory in a specific structure:

High Address (e.g. 0xFFFF_FFFF)
┌───────────────────────────────────────────────────┐
│ Stack                                             │
│   Local variables, function call frames,          │
│   return addresses, function arguments            │
│   ↓ grows downward                                │
├───────────────────────────────────────────────────┤
│                                                   │
│   (free space: stack grows down,                  │
│    heap grows up into this)                       │
│                                                   │
├───────────────────────────────────────────────────┤
│   ↑ grows upward                                  │
│ Heap                                              │
│   Dynamically allocated memory:                   │
│   new MyObject(), malloc(), ArrayList             │
├───────────────────────────────────────────────────┤
│ BSS Segment                                       │
│   Uninitialised global/static variables           │
│   (zeroed by OS at startup)                       │
├───────────────────────────────────────────────────┤
│ Data Segment                                      │
│   Initialised global/static variables             │
│   e.g. static int MAX = 100;                      │
├───────────────────────────────────────────────────┤
│ Text Segment                                      │
│   Compiled program bytecode / machine code        │
│   Read-only: prevents accidental modification     │
└───────────────────────────────────────────────────┘
Low Address (e.g. 0x0000_0000)

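Important: the stack and heap grow toward each other. A stack overflow happens when the stack grows past the limit reserved for it (typically by hitting a guard page placed below the stack); on modern systems that limit is reached long before the stack could collide with the heap region.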

In Java / JVM

JVM Process Memory:
┌────────────────────────────────────────────────────────┐
│ JVM Heap (-Xmx)                                        │
│  ┌──────────────┐  ┌──────────────┐                    │
│  │  Young Gen   │  │   Old Gen    │  ← GC manages      │
│  │ (Eden,S0,S1) │  │              │    all of this     │
│  └──────────────┘  └──────────────┘                    │
├────────────────────────────────────────────────────────┤
│ Metaspace (class metadata, method bytecodes)           │
├────────────────────────────────────────────────────────┤
│ Thread Stacks (one per thread, ~512KB-1MB each)        │
│ Thread 1 stack │ Thread 2 stack │ Thread 3 stack       │
├────────────────────────────────────────────────────────┤
│ Code Cache (JIT-compiled native code)                  │
└────────────────────────────────────────────────────────┘

Process Control Block (PCB)

The OS maintains a Process Control Block (PCB) data structure for every process. It is the process's "identity card": everything the OS needs to manage and resume the process.

// Conceptual PCB structure (simplified from Linux task_struct)
struct PCB {
    int pid;                    // Unique process ID (e.g. 4242)
    int ppid;                   // Parent process ID
    int state;                  // RUNNING, READY, WAITING, ZOMBIE

    // CPU context: saved when the process is descheduled
    void* program_counter;      // Address of NEXT instruction to execute
    long registers[16];         // General-purpose register values
    void* stack_pointer;        // Current top of stack
    long flags_register;        // CPU condition flags (zero, overflow, etc.)

    // Memory management
    PageTable* page_table;      // Maps virtual → physical memory addresses
    void* heap_start;
    void* stack_start;

    // Scheduling
    int priority;               // Scheduling priority
    long cpu_time_used;         // Total CPU time consumed (for billing/fairness)
    long last_scheduled;        // When this process last ran

    // I/O and resources
    File* open_files[1024];     // Table of open file descriptors
    Signal signal_handlers[];   // Registered signal handlers
};

When a context switch happens, the current process's CPU state is saved into its PCB so it can be resumed later exactly where it left off.
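Java code cannot read the PCB itself, but since Java 9 the ProcessHandle API exposes a few of the same fields: PID, parent PID, command, and accumulated CPU time. The sketch below is only an illustration; ProcessInfoDemo and describe are names made up for this example, and some fields can be unavailable on a given platform, which is why each Optional has a fallback.

```java
import java.time.Duration;

class ProcessInfoDemo {
    // Builds a one-line summary of what the OS reports about a process.
    static String describe(ProcessHandle handle) {
        ProcessHandle.Info info = handle.info();
        long parentPid = handle.parent().map(ProcessHandle::pid).orElse(-1L); // -1 if no parent is visible
        String command = info.command().orElse("<unknown>");
        Duration cpu = info.totalCpuDuration().orElse(Duration.ZERO);
        return "pid=" + handle.pid()
                + " ppid=" + parentPid
                + " alive=" + handle.isAlive()
                + " command=" + command
                + " cpuMillis=" + cpu.toMillis();
    }

    public static void main(String[] args) {
        // Describe the JVM process that is running this code.
        System.out.println(describe(ProcessHandle.current()));
    }
}
```

The output differs from run to run, since the PID and CPU time belong to whichever JVM process executes it.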


Process States and Lifecycle

            fork() / CreateProcess()
                        │
                        ▼
                    ╔═══════╗
                    ║  NEW  ║   ← process created but not yet admitted
                    ╚═══════╝
                        │ OS admits to memory
                        ▼
                   ╔═════════╗
        ┌─────────►║  READY  ║◄──────────────────────┐
        │          ╚═════════╝                       │
        │               │ scheduler dispatches       │ I/O completes /
        │               ▼                            │ event occurs
        │          ╔═════════╗  blocks on I/O   ╔═════════╗
        └──────────║ RUNNING ║─────────────────►║ WAITING ║
   preempted       ╚═════════╝                  ╚═════════╝
  (time slice)          │ exit()
                        ▼
                 ╔════════════╗
                 ║ TERMINATED ║   ← PCB kept until parent calls wait()
                 ╚════════════╝

State | What it means
New | Process created, not yet admitted by the scheduler
Ready | In memory, waiting for CPU time
Running | Currently executing on a CPU core
Waiting | Blocked on I/O, sleep, or synchronisation; not using CPU
Terminated | Finished; PCB kept until the parent reads the exit status

Zombie and Orphan processes

Zombie process:
Child exits → becomes a zombie (its PCB is kept, but its code is no longer running)
Parent hasn't called wait() → zombies accumulate
Problem: process table entries (and PIDs) are finite, so a flood of zombies can exhaust them
Fix: always call wait() or waitpid() in the parent, for example from a SIGCHLD handler

Orphan process:
Parent exits before child → child has no parent to read its exit status
OS reparents the orphan to init (PID 1; systemd on many Linux systems)
init calls wait() to clean up reparented children when they exit
Orphans are harmless: init manages them

Process Creation

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork(); // Creates a child process

    if (pid == 0) {
        // ── Child process ──
        // At this point the child is a copy of the parent (pages shared copy-on-write)
        // Replace the child's image with a different program:
        execl("/bin/ls", "ls", "-la", "/tmp", (char *)NULL);
        // exec replaces the process image, so code after it runs only if exec failed
        perror("exec failed");
        _exit(1); // _exit: do not flush stdio buffers inherited from the parent

    } else if (pid > 0) {
        // ── Parent process ──
        int status;
        waitpid(pid, &status, 0); // Wait for child; prevents zombie
        if (WIFEXITED(status)) {
            printf("Child %d exited with status %d\n", (int)pid, WEXITSTATUS(status));
        }

    } else {
        // pid < 0: fork failed
        perror("fork failed");
        exit(1);
    }
    return 0;
}

fork() + copy-on-write:

Before fork():          After fork():
Parent memory:          Parent memory:          Child memory:
┌─────────────┐         ┌─────────────┐         ┌─────────────┐
│ Code (R/O)  │         │ Code (R/O)  │─────────│ Code (R/O)  │  (shared, read-only)
│ Data=42     │         │ Data=42     │─────────│ Data=42     │  (shared, copy-on-write)
│ Heap        │         │ Heap        │─────────│ Heap        │  (shared until written)
└─────────────┘         └─────────────┘         └─────────────┘

When child writes to Data:
                        ┌─────────────┐         ┌─────────────┐
                        │ Data=42     │         │ Data=99     │  ← OS makes private copy
                        └─────────────┘         └─────────────┘
No upfront copying: pages are copied lazily, only when written.

Inter-Process Communication (IPC)

Since processes have separate memory spaces, they need OS-mediated mechanisms to communicate:

Mechanism | Direction | Latency | Persistence | Best for
Pipe | Unidirectional | Very low | In-memory only | Parent→child byte streaming
Named Pipe (FIFO) | Unidirectional | Very low | Filesystem entry | Unrelated processes on the same machine
Message Queue | Bidirectional | Low | Kernel-managed | Structured message passing
Shared Memory | Bidirectional | Lowest | RAM only | High-speed bulk data transfer
Unix Domain Socket | Bidirectional | Very low | In-memory | High-performance local IPC (Nginx→PHP-FPM)
TCP Socket | Bidirectional | Higher | Network-capable | Cross-machine or cross-container
Signal | Notification | Very low | None | Simple events (SIGTERM, SIGKILL)
Memory-Mapped File | Bidirectional | Low | File-backed | Large data, database files

Shared memory: the fastest IPC

// Requires <sys/ipc.h>, <sys/shm.h> and <stdio.h>
// Process A: creates a shared memory segment and writes to it
int shm_id = shmget(IPC_PRIVATE, sizeof(int) * 1000, IPC_CREAT | 0666);
int* data = (int*)shmat(shm_id, NULL, 0);  // attach to address space
data[0] = 42;                               // write directly to RAM

// Process B: attaches to the same segment and reads it.
// With IPC_PRIVATE, B must be a child forked after shmget() or must be handed shm_id;
// unrelated processes would agree on a key instead (e.g. via ftok()).
int* view = (int*)shmat(shm_id, NULL, 0);   // same physical RAM
printf("%d\n", view[0]);                    // reads 42 with no copy through the kernel

// Warning: no synchronisation here; pair it with a semaphore or a process-shared mutex

Java equivalent:

// Java NIO MappedByteBuffer (memory-mapped file: the OS maps the file into the address space)
// Needs java.io.RandomAccessFile, java.nio.MappedByteBuffer, java.nio.channels.FileChannel
try (RandomAccessFile file = new RandomAccessFile("shared.dat", "rw");
     FileChannel channel = file.getChannel()) {
    MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, 4096);

    buffer.putInt(0, 42);        // write: goes directly to mapped memory
    int val = buffer.getInt(0);  // read: from mapped memory
}

// Multiple JVM processes mapping the same file share the same physical RAM pages
// Used by: Kafka (offset index files), Chronicle Map, off-heap databases

What is a Thread?

A thread is the smallest unit of CPU execution within a process. All threads in a process share the same memory space (heap, code, global data, open files), but each thread has its own stack, program counter, register set, and thread ID:

Process (shared resources)
┌────────────────────────────────────────────────────┐
│ Heap (shared by all threads)                       │
│ Code Segment (shared)                              │
│ Data Segment (shared)                              │
│ Open File Descriptors (shared)                     │
├────────────────┬────────────────┬──────────────────┤
│ Thread 1       │ Thread 2       │ Thread 3         │
│ Stack ↓        │ Stack ↓        │ Stack ↓          │
│ PC: 0x4A2F     │ PC: 0x9C10     │ PC: 0x1F03       │
│ Registers      │ Registers      │ Registers        │
│ Thread ID      │ Thread ID      │ Thread ID        │
└────────────────┴────────────────┴──────────────────┘

Thread vs process: the key difference

Creating a new process (fork):
✗ Duplicate the address space mappings (even with CoW, the page tables are copied)
✗ New PCB, file descriptor table, signal handlers
✗ Time: on the order of 1 ms (rough figure; varies with OS and process size)
✓ Complete isolation: a crash in one process doesn't affect others

Creating a new thread:
✓ Share existing address space; only a new stack is allocated
✓ Share file descriptors, heap, code
✓ Time: on the order of 10 µs (roughly 100× faster than process creation)
✗ A fatal fault in one thread (e.g. a segmentation fault in native code) takes down the whole process; in Java, an uncaught exception such as StackOverflowError ends only the thread that threw it
✗ Shared memory means synchronisation is required
Why threads exist: the parallelism problem

Single-threaded server handling 3 requests:

Request A (100ms DB query) ──────────► response
Request B waits            ────────────────────► response
Request C waits            ──────────────────────────────► response
Total time: 300ms

Multi-threaded server (3 threads):

Thread 1: Request A (100ms DB query) ──────────► response
Thread 2: Request B (100ms DB query) ──────────► response
Thread 3: Request C (100ms DB query) ──────────► response
Total time: 100ms (3× faster for same work)
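The timeline above can be reproduced with a small experiment. This is a sketch under stated assumptions: fakeDbQuery stands in for a real database call by sleeping 100 ms, and LatencyDemo with its method names is invented for this example. Exact timings vary by machine, so no specific numbers are claimed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class LatencyDemo {
    // Stand-in for a blocking call such as a database query (assumed ~100 ms).
    static void fakeDbQuery() {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Handles the requests one after another on the calling thread.
    static long sequentialMillis(int requests) {
        long start = System.nanoTime();
        for (int i = 0; i < requests; i++) {
            fakeDbQuery();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Handles each request on its own pool thread, so the waits overlap.
    static long concurrentMillis(int requests) {
        ExecutorService pool = Executors.newFixedThreadPool(requests);
        long start = System.nanoTime();
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                Runnable task = LatencyDemo::fakeDbQuery;
                pending.add(pool.submit(task));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for every request to finish
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("sequential: " + sequentialMillis(3) + " ms");
        System.out.println("concurrent: " + concurrentMillis(3) + " ms");
    }
}
```

The speed-up comes from overlapping waits, not from extra CPU work: the three threads spend almost all of their time blocked.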

Threading Models

Model | Mapping | Parallelism | Used by | Trade-off
1:1 | 1 user thread = 1 kernel thread | ✅ True | Java platform threads, pthreads | OS overhead per thread
M:1 | M user threads = 1 kernel thread | ❌ None | Old green threads | One blocking call blocks all
M:N | M user threads = N kernel threads | ✅ True | Go (goroutines), Erlang | Complex scheduler

Java's threading model evolution

Java before 21 (platform threads, 1:1 model on mainstream JVMs; the earliest JVMs used green threads):
Each Java Thread = one OS kernel thread
Creating 10,000 threads → 10,000 OS threads → up to ~10 GB of reserved stack space (at ~1 MB per stack)
OS scheduler manages all 10,000 → context switch overhead

Java 21+ (adds virtual threads, M:N model via Project Loom):
Each virtual thread = lightweight JVM-managed thread
10,000,000 virtual threads → small number of OS carrier threads
JVM scheduler manages virtual threads; OS only sees carrier threads
When a virtual thread blocks on I/O → it is unmounted from its carrier → the carrier does other work

Java Thread Lifecycle

Creating and starting threads

// ── Option 1: Extend Thread (rarely used) ──
class WorkerThread extends Thread {
    @Override
    public void run() {
        System.out.println("Running in: " + Thread.currentThread().getName());
    }
}
new WorkerThread().start();

// ── Option 2: Pass a Runnable (functional style) ──
Thread t = new Thread(() -> System.out.println("Lambda thread"));
t.setName("worker-1");
t.setDaemon(true);                    // daemon threads don't prevent JVM shutdown
t.setPriority(Thread.NORM_PRIORITY);  // 1-10; the OS uses it as a hint
t.start();                            // don't call run() directly: that runs synchronously on the caller

// ── Option 3: ExecutorService (ALWAYS use in production) ──
ExecutorService pool = Executors.newFixedThreadPool(4);
Future<String> future = pool.submit(() -> "result from thread");
String result = future.get(5, TimeUnit.SECONDS); // blocks at most 5s
pool.shutdown();                                 // graceful: no new tasks; submitted tasks still finish
pool.awaitTermination(30, TimeUnit.SECONDS);     // wait up to 30s for them

Java thread states

NEW
 │ thread.start()
 ▼
RUNNABLE ◄──────────────────────────────────────────────────────────┐
 │                                                                  │
 ├── tries to enter synchronized block → BLOCKED ──────────────────►┤ (lock released)
 │                                                                  │
 ├── calls wait() / join() / LockSupport.park() → WAITING ─────────►┤ (notify / thread ends / unpark)
 │                                                                  │
 ├── calls sleep(n) / wait(n) / join(n) → TIMED_WAITING ───────────►┘ (timeout expires)
 │
 │ run() completes or exception thrown
 ▼
TERMINATED

State | Trigger | How to resume
NEW | new Thread() called | Call .start()
RUNNABLE | .start() called | OS schedules it
BLOCKED | Waiting for a synchronized lock | Other thread releases the lock
WAITING | wait(), join(), park() | notify(), thread completes, unpark()
TIMED_WAITING | sleep(n), wait(n), join(n) | Timeout expires, notify(), or interrupt
TERMINATED | run() returns or throws | Cannot restart

// Inspect thread state programmatically
Thread t = new Thread(() -> {
try { Thread.sleep(5000); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
});
System.out.println(t.getState()); // NEW
t.start();
Thread.sleep(100);
System.out.println(t.getState()); // TIMED_WAITING
t.join();
System.out.println(t.getState()); // TERMINATED
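The BLOCKED state can be observed the same way. In this sketch (BlockedStateDemo and observeContendedState are illustrative names), the calling thread holds a monitor while a second thread tries to enter it; the caller polls, with a five-second safety deadline, until the JVM reports the contender as BLOCKED.

```java
class BlockedStateDemo {
    // Returns the state of a thread that is stuck waiting for a monitor we hold.
    static Thread.State observeContendedState() {
        Object lock = new Object();
        Thread contender = new Thread(() -> {
            synchronized (lock) {
                // entered only after the caller releases the lock
            }
        });

        Thread.State observed;
        synchronized (lock) {                      // hold the lock before the contender starts
            contender.start();
            long deadline = System.nanoTime() + 5_000_000_000L;
            while (contender.getState() != Thread.State.BLOCKED
                    && System.nanoTime() < deadline) {
                Thread.onSpinWait();               // brief busy-wait until the contender reaches the monitor
            }
            observed = contender.getState();
        }                                          // lock released here; contender can proceed

        try {
            contender.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return observed;
    }

    public static void main(String[] args) {
        System.out.println(observeContendedState()); // prints BLOCKED
    }
}
```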

Context Switching Internals

A context switch is the OS saving one thread's CPU state and restoring another's. It is pure overhead: no useful work happens during a switch.

What gets saved and restored

Thread A is running → time slice expires → context switch → Thread B runs

Step 1: Save Thread A's CPU context to its PCB/TCB:
  program_counter   ← memory address of next instruction Thread A would execute
  stack_pointer     ← current top of Thread A's call stack
  registers[0..15]  ← all general-purpose register values (rdx, rsi, r8, ...)
  flags_register    ← condition codes (zero flag, carry flag, etc.)
  FPU state         ← floating-point unit registers (if used)

Step 2: If switching processes (not just threads):
  TLB flush         ← Translation Lookaside Buffer entries of the old process are invalidated
                      (virtual→physical address mapping changes per process;
                      CPUs with tagged TLBs, e.g. PCID on x86-64, can avoid a full flush)
  Page table swap   ← new process's page table base loaded into the CR3 register (x86)

Step 3: Load Thread B's CPU context from its PCB/TCB:
  (reverse of step 1: restore all saved state)

Step 4: CPU resumes executing at Thread B's saved program_counter

Context switch costs

Switch type | Typical cost | Primary cost driver
Thread switch (same process) | 1-5 µs | Register save/restore, scheduler
Thread switch (different process) | 5-15 µs | + TLB flush, page table swap
Virtual thread switch (Java 21) | < 1 µs | JVM-managed, no kernel transition

These figures are rough orders of magnitude; actual costs depend on the CPU, the OS, and the workload.

The TLB flush problem: the TLB (Translation Lookaside Buffer) caches virtual→physical address translations. When switching between processes, the old entries must be invalidated because the new process has a completely different address space. After the switch, the first access to each page triggers a TLB miss until the cache warms up again, which is why process switches are more expensive than thread switches within the same process.

Cache pollution: CPU L1/L2 caches contain the working set of the running thread. A context switch brings in a different thread's working set, evicting the previous thread's data. When that thread resumes, it faces cache misses until its data is reloaded.

When context switching hurts performance

// Anti-pattern: far more threads than CPU cores for CPU-bound work
// 8-core machine, 200 threads doing CPU-intensive computation:
ExecutorService oversizedPool = Executors.newFixedThreadPool(200);

// What actually happens:
// OS constantly context-switches 200 threads across 8 cores
// Each switch costs microseconds directly, plus cache and TLB misses afterwards
// The extra threads add overhead without adding any throughput

// Fix for CPU-bound work: match threads to available cores
int cpuCores = Runtime.getRuntime().availableProcessors();
ExecutorService cpuPool = Executors.newFixedThreadPool(cpuCores); // 8 threads, 8 cores
// Each core can run one thread almost continuously, so context switches become rare

// Fix for I/O-bound work: more threads are OK (they spend most of their time waiting)
// Or better: use virtual threads (Java 21), so no OS thread is blocked during the I/O wait

Java Thread Pool Tuning

The two workload types

CPU-bound work:
Uses 100% CPU during execution (sorting, encryption, image processing)
Context-switching between threads wastes CPU cycles
Optimal threads ≈ CPU cores (one thread per core keeps switching to a minimum)

ExecutorService cpuPool = Executors.newFixedThreadPool(
    Runtime.getRuntime().availableProcessors()
);

I/O-bound work:
Thread spends most time blocked waiting (DB query, HTTP call, file read)
CPU is idle during the wait, so other threads can use it
Optimal threads ≈ (CPU cores) / (1 - blocking_ratio), where blocking_ratio is the fraction of a task's time spent waiting
Example: 8 cores, 90% blocking → 8 / 0.1 = 80 threads
Or: use virtual threads (Java 21), so no kernel thread is blocked during I/O

ExecutorService ioPool = Executors.newFixedThreadPool(80);
// OR (Java 21+):
ExecutorService vtPool = Executors.newVirtualThreadPerTaskExecutor();
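The sizing rule can be wrapped in a small helper so the arithmetic is explicit. PoolSizing and ioBoundThreads are hypothetical names, and the result is only a starting point to validate with load tests. Rounding to the nearest integer avoids floating-point artefacts, since 1 - 0.9 is not exactly 0.1 in binary.

```java
class PoolSizing {
    // threads = cores / (1 - blockingRatio), rounded to the nearest whole thread.
    // blockingRatio is the fraction of a task's time spent waiting, in [0, 1).
    static int ioBoundThreads(int cores, double blockingRatio) {
        if (cores < 1) {
            throw new IllegalArgumentException("cores must be >= 1");
        }
        if (blockingRatio < 0.0 || blockingRatio >= 1.0) {
            throw new IllegalArgumentException("blockingRatio must be in [0, 1)");
        }
        return (int) Math.round(cores / (1.0 - blockingRatio));
    }

    public static void main(String[] args) {
        System.out.println(ioBoundThreads(8, 0.9)); // prints 80
        System.out.println(ioBoundThreads(8, 0.0)); // prints 8 (pure CPU-bound work: one thread per core)
    }
}
```

With a blocking ratio of 0 the formula collapses to the CPU-bound rule of one thread per core, which is a quick sanity check on the helper.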

ThreadPoolExecutor: full control

// Executors.newFixedThreadPool(n) is just a convenience wrapper.
// For production, use ThreadPoolExecutor directly for full control:

ThreadPoolExecutor executor = new ThreadPoolExecutor(
10, // corePoolSize: threads kept alive even when idle
50, // maximumPoolSize: extra threads are created only once the queue is full
60, TimeUnit.SECONDS, // keepAliveTime: how long idle threads above the core size survive
new LinkedBlockingQueue<>(1000), // workQueue: task buffer when all core threads are busy
new ThreadFactory() {
private final AtomicInteger counter = new AtomicInteger(0);
@Override
public Thread newThread(Runnable r) {
Thread t = new Thread(r);
t.setName("order-processor-" + counter.incrementAndGet());
t.setDaemon(false);
return t;
}
},
new ThreadPoolExecutor.CallerRunsPolicy() // saturation policy
);

// Saturation policies (what to do when queue is full AND all threads busy):
// AbortPolicy (default): throws RejectedExecutionException
// CallerRunsPolicy: calling thread executes the task (backpressure)
// DiscardPolicy: silently drops the task
// DiscardOldestPolicy: drops the oldest queued task, tries again

// Monitor the pool
int active = executor.getActiveCount();
int queued = executor.getQueue().size();
long total = executor.getCompletedTaskCount();
System.out.printf("active=%d queued=%d completed=%d%n", active, queued, total);

Bounded vs unbounded queues: the danger of unbounded

// ❌ DANGEROUS: LinkedBlockingQueue() with no bound
// When producers are faster than consumers:
// Queue grows without limit → OutOfMemoryError once the heap is exhausted
ExecutorService unboundedPool = Executors.newFixedThreadPool(10); // uses an unbounded queue!

// ✅ ALWAYS bound your queues in production:
ExecutorService boundedPool = new ThreadPoolExecutor(
    10, 10, 0L, TimeUnit.MILLISECONDS,
    new LinkedBlockingQueue<>(500),       // max 500 queued tasks
    new ThreadPoolExecutor.AbortPolicy()  // reject if queue full
);
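To see a bounded queue push back, the sketch below uses a deliberately tiny pool: one worker thread plus a small queue. BoundedQueueDemo and countRejections are illustrative names. Every task blocks on a latch, so the worker stays busy and the queue stays full while the remaining tasks are submitted; once both are saturated, AbortPolicy throws RejectedExecutionException.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedQueueDemo {
    // Submits `tasks` blocking tasks to a 1-thread pool with a bounded queue
    // and returns how many submissions were rejected.
    static int countRejections(int tasks, int queueCapacity) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy());
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await();   // keep the worker busy until all submissions are done
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++;                // worker busy and queue full: the task is refused
                }
            }
        } finally {
            release.countDown();               // let the accepted tasks finish
            pool.shutdown();
        }
        return rejected;
    }

    public static void main(String[] args) {
        // 1 task runs, 2 wait in the queue, the remaining 2 are rejected.
        System.out.println(countRejections(5, 2)); // prints 2
    }
}
```

In a real service the caller would translate the rejection into backpressure, for example an HTTP 503 or a retry with delay, instead of just counting it.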

ThreadLocal: per-thread data

// ThreadLocal stores a separate value per thread; no synchronisation is needed
// because each thread has its own copy
public class RequestContext {

private static final ThreadLocal<String> currentUserId = new ThreadLocal<>();
private static final ThreadLocal<String> requestTraceId = new ThreadLocal<>();

public static void set(String userId, String traceId) {
currentUserId.set(userId);
requestTraceId.set(traceId);
}

public static String getUserId() { return currentUserId.get(); }
public static String getTraceId() { return requestTraceId.get(); }

// CRITICAL: always clean up, because thread pool threads are reused!
// If you don't clear, the next request on this thread sees the previous request's values
public static void clear() {
currentUserId.remove();
requestTraceId.remove();
}
}

// In a Spring filter:
@Component
public class RequestContextFilter extends OncePerRequestFilter {
@Override
protected void doFilterInternal(HttpServletRequest req,
HttpServletResponse res,
FilterChain chain) throws IOException, ServletException {
try {
RequestContext.set(
extractUserId(req),
req.getHeader("X-Trace-Id")
);
chain.doFilter(req, res);
} finally {
RequestContext.clear(); // MUST clear in finally block
}
}
}
ThreadLocal with thread pools

In a thread pool, threads are reused across many requests. If you set a ThreadLocal value and don't clear it in a finally block, the next request handled by the same thread sees the previous request's stale value. This is a common source of security bugs (wrong user ID) and data leaks.
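The leak is easy to reproduce with a single-thread pool, where both "requests" are guaranteed to run on the same worker. This is a minimal sketch; ThreadLocalLeakDemo, CURRENT_USER, and the user name "alice" are invented for the example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ThreadLocalLeakDemo {
    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    // Runs two requests on the same pooled thread and returns what the second one sees.
    static String secondRequestSees(boolean clearAfterFirst) {
        ExecutorService pool = Executors.newSingleThreadExecutor(); // one worker, reused for both requests
        try {
            Runnable firstRequest = () -> {
                try {
                    CURRENT_USER.set("alice");
                    // ... handle the first request ...
                } finally {
                    if (clearAfterFirst) {
                        CURRENT_USER.remove();   // the cleanup that prevents the leak
                    }
                }
            };
            pool.submit(firstRequest).get();

            Callable<String> secondRequest = () -> CURRENT_USER.get();
            return pool.submit(secondRequest).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(secondRequestSees(false)); // prints alice (stale value leaked)
        System.out.println(secondRequestSees(true));  // prints null (cleaned up)
    }
}
```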


Memory Visibility & the JMM

The Java Memory Model (JMM) defines when one thread's writes are visible to another thread. This is non-trivial because:

CPU 1 (Thread A)        L1 Cache (Core 1)        Main RAM
────────────────        ─────────────────        ────────
x = 42 ───────────────► x = 42 (cached)
                        (not yet flushed)        x = 0  ← Thread B still sees 0!

CPU 2 (Thread B)        L1 Cache (Core 2)
────────────────        ─────────────────
read x ───────────────► ───────────────────────► x = 0 (stale!)

The picture is a simplification: real hardware keeps caches coherent, but store buffers, values held in registers, and instruction reordering by compilers and CPUs produce the same effect. Without explicit synchronisation, one thread's writes may not be visible to another.

The happens-before relationship

A happens-before relationship guarantees that a write by Thread A is visible to Thread B:

// Guaranteed happens-before relationships:
// 1. Within one thread: each statement happens-before the next (program order)
// 2. Thread.start(): all actions before start() happen-before any action in the started thread
// 3. Thread.join(): all actions in a thread happen-before join() returns
// 4. synchronized: release of a lock happens-before every later acquisition of that same lock
// 5. volatile: a write to a volatile field happens-before every subsequent read of that same field
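Rules 2 and 3 are what make the common start-then-join pattern safe even for ordinary fields. In this sketch (HappensBeforeDemo and plainField are illustrative names), the field is deliberately not volatile and no lock is used; the read is still guaranteed to see the write because join() returns only after the writer has finished.

```java
class HappensBeforeDemo {
    private static int plainField; // deliberately NOT volatile

    // Writes the field in another thread, joins it, then reads the field.
    static int writeInThreadThenJoin(int value) {
        Thread writer = new Thread(() -> plainField = value);
        writer.start();              // rule 2: everything before start() is visible to the writer
        try {
            writer.join();           // rule 3: the writer's actions happen-before join() returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return plainField;           // guaranteed to observe the writer's value
    }

    public static void main(String[] args) {
        System.out.println(writeInThreadThenJoin(42)); // prints 42
    }
}
```

Without the join() call there would be no happens-before edge, and the read could legally return the old value.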

volatile: visibility without locking

// Without volatile: the JIT may keep the flag in a register or hoist the read out of the loop,
// so other threads' updates may never be seen
private boolean running = true;

// Thread A:
running = false; // no guarantee that Thread B ever observes this write

// Thread B:
while (running) { /* work */ } // may loop forever ❌

// ─────────────────────────────────────────────

// With volatile: a write is visible to every later read of the field, from any thread
private volatile boolean running = true;

// Thread A:
running = false; // happens-before any subsequent read of running

// Thread B:
while (running) { /* work */ } // sees false on its next read ✅

volatile guarantees visibility (all threads see the latest write) and ordering (no reordering of volatile reads/writes). It does NOT guarantee atomicity: with volatile int counter, counter++ is still not thread-safe, because the read-modify-write is three separate operations.
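The atomicity gap can be demonstrated by incrementing a volatile int and an AtomicInteger side by side. This is a sketch with made-up names (VolatileCounterDemo, run). The atomic counter always reaches the exact total; the volatile counter can only match or fall short, and how many updates it loses varies from run to run, so no specific value is claimed for it.

```java
import java.util.concurrent.atomic.AtomicInteger;

class VolatileCounterDemo {
    private static volatile int volatileCount;                        // visible, but ++ is not atomic
    private static final AtomicInteger atomicCount = new AtomicInteger();

    // Returns {volatile result, atomic result} after all threads have finished.
    static int[] run(int threads, int incrementsPerThread) {
        volatileCount = 0;
        atomicCount.set(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    volatileCount++;               // read, add, write: another thread can interleave
                    atomicCount.incrementAndGet(); // single atomic read-modify-write
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] { volatileCount, atomicCount.get() };
    }

    public static void main(String[] args) {
        int[] result = run(4, 100_000);
        System.out.println("volatile: " + result[0] + " (may be less than 400000)");
        System.out.println("atomic:   " + result[1]);
    }
}
```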

synchronized: mutual exclusion + visibility

public class Counter {
    private int count = 0;

    // Only one thread at a time can execute a synchronized method on the same instance
    public synchronized void increment() {
        count++; // read-modify-write is now atomic
    }

    public synchronized int getCount() {
        return count; // guaranteed to see latest value
    }
}

// Equivalent with a private lock object (more flexible: outside code cannot lock on it).
// Every access to count must go through the same lock.
public class LockObjectCounter {
    private final Object lock = new Object();
    private int count = 0;

    public void increment() {
        synchronized (lock) { // intrinsic lock of the lock object
            count++;
        }
    }

    public int getCount() {
        synchronized (lock) {
            return count;
        }
    }
}

java.util.concurrent.locks.Lock: explicit locking

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.*;

public class ReadWriteCounter {

private final ReadWriteLock rwLock = new ReentrantReadWriteLock();
private final Lock readLock = rwLock.readLock();
private final Lock writeLock = rwLock.writeLock();
private int count = 0;

// Many threads can read simultaneously
public int getCount() {
readLock.lock();
try {
return count;
} finally {
readLock.unlock(); // ALWAYS unlock in finally
}
}

// Only one thread can write (exclusive lock, blocks all readers)
public void increment() {
writeLock.lock();
try {
count++;
} finally {
writeLock.unlock();
}
}

// tryLock with a timeout: waits at most the given time instead of blocking indefinitely
public boolean tryIncrement(long timeout, TimeUnit unit) throws InterruptedException {
if (writeLock.tryLock(timeout, unit)) {
try {
count++;
return true;
} finally {
writeLock.unlock();
}
}
return false; // lock not acquired within timeout
}
}

Atomic variables: lock-free thread safety

import java.util.concurrent.atomic.*;

// AtomicInteger uses CAS (Compare-And-Swap) CPU instructions, so no lock is needed
AtomicInteger counter = new AtomicInteger(0);
counter.incrementAndGet();       // atomic: increments, then returns the new value
counter.compareAndSet(5, 10);    // atomic: if current == 5, set to 10; returns whether it succeeded
int val = counter.getAndAdd(3);  // atomic: returns old value, adds 3

// AtomicReference for objects
AtomicReference<String> ref = new AtomicReference<>("initial");
ref.compareAndSet("initial", "updated"); // atomic swap; compares references (==), not equals()

// LongAdder: better than AtomicLong under high contention
// Maintains multiple internal cells and sums them on read
LongAdder adder = new LongAdder();
adder.increment();       // contended updates are spread across cells, which reduces contention
long sum = adder.sum();  // aggregates all cells; not an atomic snapshot during concurrent updates
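compareAndSet is usually used inside a retry loop: read the current value, compute the new one, and try to install it; if another thread got there first, repeat. A minimal sketch, with CasMaxDemo and updateMax as hypothetical names, that keeps a running maximum without any lock:

```java
import java.util.concurrent.atomic.AtomicInteger;

class CasMaxDemo {
    // Raises `max` to `candidate` if the candidate is larger, using a CAS retry loop.
    static void updateMax(AtomicInteger max, int candidate) {
        while (true) {
            int current = max.get();
            if (candidate <= current) {
                return;                              // nothing to do: the stored value is already larger
            }
            if (max.compareAndSet(current, candidate)) {
                return;                              // our value was installed atomically
            }
            // CAS failed: another thread changed the value in between, so re-read and retry
        }
    }

    public static void main(String[] args) {
        AtomicInteger max = new AtomicInteger(5);
        updateMax(max, 3);
        System.out.println(max.get()); // prints 5
        updateMax(max, 9);
        System.out.println(max.get()); // prints 9
    }
}
```

On Java 8 and later, AtomicInteger.accumulateAndGet(candidate, Math::max) packages the same loop; the explicit version is shown only to make the retry visible.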

Virtual Threads (Java 21, Project Loom)

The platform thread problem for I/O-heavy workloads

Traditional (platform) thread handling a DB query:

Thread A (OS kernel thread, ~1MB stack):
  receive HTTP request
  call DB query ──────────────────────────► DB responds
    [100ms: thread is blocked, OS thread idle]
  parse + respond

Problems at 10,000 concurrent requests:
→ 10,000 OS threads × 1MB stack = up to 10 GB of memory reserved just for stacks
→ OS scheduler managing 10,000 threads → heavy context switching overhead
→ Thread pool exhaustion → requests queue → latency spikes

How virtual threads solve this

Virtual thread handling the same DB query:

Virtual Thread A (JVM-managed, ~few KB):
  receive HTTP request
  call DB query ──────► JVM detects blocking I/O
    Virtual Thread A is UNMOUNTED from its OS carrier thread
    OS carrier thread is now FREE for other virtual threads
    Virtual Thread B, C, D... run on the freed carrier thread
                 ↓ DB responds
    Virtual Thread A is REMOUNTED onto a carrier thread
  parse + respond

Benefits at 10,000 concurrent requests:
→ JVM has ~8 carrier threads (by default, one per CPU core)
→ 10,000 virtual threads share 8 OS threads
→ Only ~8 OS threads to schedule → very little OS scheduling overhead
→ No thread pool exhaustion: create one virtual thread per request

Virtual thread implementation

// ── Creating virtual threads ──
// Direct creation
Thread vt = Thread.ofVirtual()
    .name("vt-", 0) // names vt-0, vt-1, vt-2, ...
    .start(() -> doWork());

// Via ExecutorService: preferred for server applications
try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
    // Creates one virtual thread per submitted task; no pool size needed
    for (int i = 0; i < 100_000; i++) {
        executor.submit(() -> handleRequest());
    }
} // auto-close waits for all tasks to complete

// In Spring Boot 3.2+, enable virtual threads with one config property:
// application.yaml:
// spring:
//   threads:
//     virtual:
//       enabled: true
// This configures Tomcat to use a virtual thread per HTTP request

Platform threads vs Virtual threads

Aspect | Platform threads | Virtual threads
Managed by | OS kernel | JVM
Stack size | ~1 MB reserved per thread by default | ~few KB (grows dynamically)
Creation cost | ~1ms (OS syscall) | ~1µs (JVM allocation)
Max practical count | ~10,000 | Millions
Blocking I/O behaviour | OS thread blocked and idle | Unmounted from carrier; carrier does other work
CPU-bound suitability | ✅ Excellent | ⚠️ Same as platform (still needs a carrier thread)
I/O-bound suitability | ⚠️ Thread-per-request doesn't scale | ✅ Excellent: thread-per-task scales to millions
ThreadLocal support | ✅ Full | ✅ Full (but consider ScopedValues)
Debuggability | Thread dump shows all | Classic thread dumps omit them; use jcmd <pid> Thread.dump_to_file

What virtual threads do NOT solve

// ❌ Virtual threads don't help CPU-bound work
// If your task burns CPU, the carrier thread is occupied the entire time
// 1,000,000 virtual threads all doing CPU work → still limited to 8 actual cores

// ❌ On JDK 21-23, synchronized blocks pin the virtual thread to its carrier thread
// The virtual thread cannot unmount while it blocks inside a synchronized block
// (JDK 24 removes this limitation with JEP 491)
// On affected versions, guard blocking calls with java.util.concurrent.locks.ReentrantLock instead

synchronized (monitor) {
    db.query(...);   // virtual thread PINNED: carrier thread is blocked too ❌
}

ReentrantLock lock = new ReentrantLock();
lock.lock();
try {
    db.query(...);   // virtual thread can unmount: carrier thread is freed ✅
} finally {
    lock.unlock();
}

// ❌ Pooling virtual threads the way you pool platform threads
// Executors.newVirtualThreadPerTaskExecutor() deliberately has no pool size
// Don't run virtual threads through a fixed-size thread pool; that defeats the entire model
// To cap concurrent access to a scarce resource, guard that resource with a Semaphore instead
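
Since virtual threads should not be pooled, a common way to cap concurrency is to guard the scarce resource itself. The following is a minimal sketch of that idea, assuming Java 21+. The class name, the permit count, and the `Thread.sleep` placeholder for the guarded call are all illustrative assumptions.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedVirtualThreads {

    // Runs `tasks` virtual threads but admits at most `permits` of them into the
    // guarded section at once. Returns the highest concurrency actually observed.
    static int runWithLimit(int tasks, int permits) {
        Semaphore limiter = new Semaphore(permits);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxInFlight = new AtomicInteger();

        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    try {
                        limiter.acquire();            // waiting here parks only the virtual thread
                        try {
                            int now = inFlight.incrementAndGet();
                            maxInFlight.accumulateAndGet(now, Math::max);
                            Thread.sleep(10);         // placeholder for the scarce resource call
                        } finally {
                            inFlight.decrementAndGet();
                            limiter.release();
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
        } // close() waits for all tasks
        return maxInFlight.get();
    }

    public static void main(String[] args) {
        System.out.println("max concurrent = " + runWithLimit(1_000, 10));
    }
}
```

The number of virtual threads stays unbounded, which keeps the thread-per-task model intact, while the semaphore bounds only the section that needs bounding.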

User-Level vs Kernel-Level Threads

| | User-Level Threads | Kernel-Level Threads |
|---|---|---|
| Managed by | User-space library / JVM | OS kernel |
| Context switch | Fast: no kernel transition, just a register and stack swap | Slower: kernel transition required |
| One thread blocks | Entire process blocks (pure M:1 model) | Other threads continue (1:1 model) |
| Parallelism | No (unless M:N on top of several kernel threads) | Yes: threads can run on different cores at the same time |
| Scheduling control | Runtime/app has full control | OS decides |
| Examples | Java virtual threads, Go goroutines (both M:N over kernel threads) | POSIX pthreads, Java platform threads |

Concurrency Primitives Comparison

// The concurrency tool decision tree:

// Need to run something on another thread?
//   → ExecutorService.submit() or CompletableFuture.supplyAsync()

// Need a result from another thread?
//   → Future<T> or CompletableFuture<T>

// Need a simple flag visible across threads?
//   → volatile boolean

// Need to increment/decrement safely without locks?
//   → AtomicInteger / AtomicLong / LongAdder

// Need to swap an object reference safely?
//   → AtomicReference<T>

// Need exclusive access to a block of code?
//   → synchronized or ReentrantLock

// Need many readers, one writer?
//   → ReadWriteLock (ReentrantReadWriteLock)

// Need to wait for N threads to reach a point?
//   → CountDownLatch (one-time) or CyclicBarrier (reusable)

// Need to limit concurrent access to a resource?
//   → Semaphore

// Need a thread-safe queue for producer-consumer?
//   → LinkedBlockingQueue or ArrayBlockingQueue

// Need per-thread isolated data?
//   → ThreadLocal (remember to clear in finally)

// Need to compose async operations?
//   → CompletableFuture.thenApply().thenCompose().exceptionally()
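
As one worked instance of the tree above, the sketch below combines two of its answers: a `CountDownLatch` to wait for N workers and a `LongAdder` as the lock-free counter. The class name, pool size, and timeout are arbitrary choices made for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class LatchDemo {

    // Starts `workers` tasks, waits until all have counted down, returns the total they added.
    static long runWorkers(int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        CountDownLatch done = new CountDownLatch(workers);
        LongAdder processed = new LongAdder();
        try {
            for (int i = 0; i < workers; i++) {
                pool.submit(() -> {
                    try {
                        processed.increment();   // lock-free counter shared by all workers
                    } finally {
                        done.countDown();        // always count down, even if the work fails
                    }
                });
            }
            if (!done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
            return processed.sum();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("processed = " + runWorkers(20));   // prints "processed = 20"
    }
}
```

A `CyclicBarrier` would be the better fit if the same group of threads had to meet repeatedly, since a latch cannot be reset.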

IPC in Modern Java / Spring

// ── Pipes: parent-child process communication ──────────────────────────
ProcessBuilder pb = new ProcessBuilder("wc", "-l");
pb.redirectInput(ProcessBuilder.Redirect.PIPE);   // PIPE is also the default
Process process = pb.start();
// Closing the writer closes the child's stdin, so wc sees EOF and prints its count
try (PrintWriter pw = new PrintWriter(process.getOutputStream())) {
    pw.println("line one");
    pw.println("line two");
}
String result = new String(process.getInputStream().readAllBytes()).trim();
// result = "2" (trim() removes the trailing newline and any padding wc adds)

// ── Shared memory equivalent in JVM: use concurrent data structures ────
// Threads share the heap, so just use thread-safe collections:
ConcurrentHashMap<String, Order> orderCache = new ConcurrentHashMap<>();
BlockingQueue<Event> eventQueue = new LinkedBlockingQueue<>(1000);

// Producer thread:
eventQueue.put(new OrderCreatedEvent(orderId));   // blocks if queue full

// Consumer thread:
Event event = eventQueue.take();                  // blocks if queue empty

// ── Cross-process in microservices: use Kafka, Redis, HTTP ─────────────
// "IPC" between microservices is just messaging/APIs
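
The producer and consumer fragments above can be assembled into a complete program. This is a minimal sketch rather than a production design: events are plain strings, the queue capacity of 2 is chosen only to force back-pressure, and the `POISON_PILL` sentinel is one common (assumed) way to tell the consumer to stop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProducerConsumerDemo {

    private static final String POISON_PILL = "__STOP__";   // sentinel that ends the consumer loop

    // Sends `count` events through a small bounded queue and returns what the consumer received.
    static List<String> transfer(int count) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(2);
        List<String> received = new ArrayList<>();           // touched only by the consumer until join()

        Thread consumer = new Thread(() -> {
            try {
                for (String event = queue.take(); !event.equals(POISON_PILL); event = queue.take()) {
                    received.add(event);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "consumer");
        consumer.start();

        try {
            for (int i = 0; i < count; i++) {
                queue.put("order-" + i);   // blocks while the queue is full
            }
            queue.put(POISON_PILL);
            consumer.join();               // join() makes the consumer's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(transfer(5));   // [order-0, order-1, order-2, order-3, order-4]
    }
}
```

Because the queue is FIFO and there is a single producer and a single consumer, the events arrive in the order they were sent.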

Production Patterns

🔬 Senior deep-dive: CompletableFuture for async orchestration
@Service
public class OrderService {

    @Autowired private InventoryClient inventory;
    @Autowired private PaymentClient payment;
    @Autowired private NotificationClient notification;

    // ❌ Sequential: total time = inventory + payment + notification
    public void processOrderSequential(Order order) {
        inventory.reserve(order);    // 50ms
        payment.charge(order);       // 100ms
        notification.send(order);    // 30ms
        // Total: 180ms
    }

    // ✅ Parallel where possible: total time = max(inventory, payment) + notification
    public CompletableFuture<Void> processOrderAsync(Order order) {
        // Reserve inventory and charge payment in parallel.
        // Without an Executor argument, runAsync uses the common ForkJoinPool;
        // for blocking remote calls, consider passing a dedicated executor.
        CompletableFuture<Void> inventoryFuture =
                CompletableFuture.runAsync(() -> inventory.reserve(order));

        CompletableFuture<Void> paymentFuture =
                CompletableFuture.runAsync(() -> payment.charge(order));

        // Wait for BOTH to complete, then send notification
        return CompletableFuture.allOf(inventoryFuture, paymentFuture)
                .thenRunAsync(() -> notification.send(order))
                .exceptionally(ex -> {
                    // `log` is assumed to be a logger field declared elsewhere (not shown)
                    log.error("Order processing failed: {}", ex.getMessage());
                    // Compensate: release inventory, refund payment
                    return null;   // the returned future then completes normally
                });
        // Total: max(50ms, 100ms) + 30ms = 130ms, about 28% less than 180ms
    }
}
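
To experiment with this orchestration outside Spring, the following self-contained sketch replaces the three clients with steps that simply record their names. Everything in it is an assumption for illustration: the class name, the `paymentFails` switch, the dedicated `ioPool`, and the use of `handle` to trigger a compensation step.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class OrderOrchestrationDemo {

    // Runs "reserve" and "charge" in parallel, then "notify"; returns the recorded step order.
    static List<String> process(boolean paymentFails) {
        List<String> steps = new CopyOnWriteArrayList<>();
        ExecutorService ioPool = Executors.newFixedThreadPool(4);   // dedicated pool, not the common pool
        try {
            CompletableFuture<Void> inventory =
                    CompletableFuture.runAsync(() -> steps.add("reserve"), ioPool);
            CompletableFuture<Void> payment = CompletableFuture.runAsync(() -> {
                if (paymentFails) {
                    throw new IllegalStateException("card declined");
                }
                steps.add("charge");
            }, ioPool);

            CompletableFuture.allOf(inventory, payment)              // completes once BOTH are done
                    .thenRunAsync(() -> steps.add("notify"), ioPool) // skipped if either one failed
                    .handle((ok, ex) -> {
                        if (ex != null) {
                            steps.add("compensate");                 // e.g. release stock, refund
                        }
                        return null;
                    })
                    .join();
            return steps;
        } finally {
            ioPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("success: " + process(false));
        System.out.println("failure: " + process(true));
    }
}
```

On success, "notify" is always recorded last, while "reserve" and "charge" may appear in either order. On failure, the notification is skipped and the compensation step runs after the inventory step has finished.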

🔬 Senior deep-dive: ForkJoinPool and parallel streams
// ForkJoinPool: designed for recursive divide-and-conquer tasks
// Uses work-stealing: idle threads steal tasks from busy threads' queues
ForkJoinPool pool = new ForkJoinPool(
        Runtime.getRuntime().availableProcessors(),
        ForkJoinPool.defaultForkJoinWorkerThreadFactory,
        null,   // uncaught-exception handler
        true    // async mode (FIFO for tasks that are never joined)
);

// Java parallel streams use the common ForkJoinPool by default
// WARNING: all parallel streams share the SAME common pool
// A heavy stream can starve other parallel streams
List<Order> orders = getOrders();
long total = orders.parallelStream()
        .mapToLong(Order::getTotal)
        .sum();

// Run the stream from inside a custom pool to isolate it from the common pool.
// This relies on long-standing but undocumented stream behaviour.
// get() throws InterruptedException and ExecutionException, which the caller must handle.
ForkJoinPool customPool = new ForkJoinPool(4);
try {
    long isolatedTotal = customPool.submit(() ->
            orders.parallelStream()
                    .mapToLong(Order::getTotal)
                    .sum()
    ).get();
} finally {
    customPool.shutdown();   // custom pools are not shut down automatically
}
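
The divide-and-conquer style that ForkJoinPool is designed for looks roughly like the sketch below, which sums an array with a `RecursiveTask`. The class names, the threshold of 10,000 elements, and the pool size of 4 are illustrative assumptions, not tuned values.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ParallelSum {

    static final class SumTask extends RecursiveTask<Long> {
        private static final int THRESHOLD = 10_000;   // below this size, sum sequentially
        private final long[] values;
        private final int from;                        // inclusive
        private final int to;                          // exclusive

        SumTask(long[] values, int from, int to) {
            this.values = values;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= THRESHOLD) {
                long sum = 0;
                for (int i = from; i < to; i++) {
                    sum += values[i];
                }
                return sum;
            }
            int mid = (from + to) >>> 1;
            SumTask left = new SumTask(values, from, mid);
            SumTask right = new SumTask(values, mid, to);
            left.fork();                                // queue the left half; idle workers may steal it
            return right.compute() + left.join();       // compute the right half here, then wait for left
        }
    }

    static long sum(long[] values) {
        ForkJoinPool pool = new ForkJoinPool(4);
        try {
            return pool.invoke(new SumTask(values, 0, values.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] values = new long[1_000_000];
        for (int i = 0; i < values.length; i++) {
            values[i] = i + 1;
        }
        System.out.println(sum(values));   // 1 + 2 + ... + 1,000,000 = 500000500000
    }
}
```

Forking one half and computing the other in the current thread keeps the current worker busy instead of leaving it idle while it waits on two forked subtasks.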

🔬 Senior deep-dive: diagnosing thread issues in production
# Get a thread dump of a running JVM process (pauses the JVM only briefly)
kill -3 <pid>      # sends SIGQUIT → JVM prints thread dump to stdout/log
jstack <pid>       # prints thread dump to console
jstack -l <pid>    # also lists owned java.util.concurrent locks (helps with deadlock analysis)

# Look for threads in these states:
# BLOCKED (waiting for monitor entry): potential deadlock or lock contention
# WAITING at sun.misc.Unsafe.park (jdk.internal.misc.Unsafe.park on JDK 9+): threads parked on a lock or condition
# RUNNABLE at java.net.SocketInputStream.socketRead: reported as RUNNABLE, but actually blocked on I/O

# Detect deadlocks automatically:
jstack -l <pid> | grep -i -A 10 "deadlock"

# Example deadlock in thread dump:
# Thread A: waiting to acquire lock 0x00000007d5a44ab8
#           locked 0x00000007d5a44b28
# Thread B: waiting to acquire lock 0x00000007d5a44b28
#           locked 0x00000007d5a44ab8
# ← Thread A holds what B needs; B holds what A needs = deadlock

// Programmatic thread monitoring
ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();

// Detect deadlocks
long[] deadlockedIds = threadBean.findDeadlockedThreads();
if (deadlockedIds != null) {
    ThreadInfo[] infos = threadBean.getThreadInfo(deadlockedIds, true, true);
    for (ThreadInfo info : infos) {
        log.error("DEADLOCK detected: thread={} state={} blockedOn={}",
                info.getThreadName(), info.getThreadState(),
                info.getLockName());
    }
}

// Get all thread states for monitoring
ThreadInfo[] allThreads = threadBean.dumpAllThreads(false, false);
Map<Thread.State, Long> stateCount = Arrays.stream(allThreads)
        .collect(Collectors.groupingBy(ThreadInfo::getThreadState, Collectors.counting()));
// Alert if BLOCKED count is high (lock contention) or WAITING count is abnormal
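
To see `findDeadlockedThreads()` report something, the sketch below creates a deadlock on purpose and polls for it. It is a demonstration only: the class name, thread names, and polling interval are assumptions, and the two threads are daemons so that they do not keep the JVM alive.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class DeadlockDetectionDemo {

    // Starts two daemon threads that deadlock on purpose; returns how many
    // deadlocked threads the JVM reports (0 if none were found in time).
    static int createAndDetectDeadlock() {
        Object lockA = new Object();
        Object lockB = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        startDaemon("demo-A", lockA, lockB, bothHoldFirstLock);
        startDaemon("demo-B", lockB, lockA, bothHoldFirstLock);   // opposite lock order

        ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {         // poll for up to about 5 seconds
            long[] ids = threadBean.findDeadlockedThreads();
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    private static void startDaemon(String name, Object first, Object second, CountDownLatch latch) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                latch.countDown();
                try {
                    latch.await();           // wait until the other thread holds its first lock
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // never reached: the other thread holds `second` and wants `first`
                }
            }
        }, name);
        t.setDaemon(true);                   // daemon threads do not block JVM exit
        t.start();
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads reported: " + createAndDetectDeadlock());
    }
}
```

The latch guarantees that each thread holds its first lock before either tries the second, which makes the deadlock deterministic rather than timing-dependent.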

# Spring Boot Actuator: expose thread metrics
management:
  endpoints:
    web:
      exposure:
        include: health, metrics, threaddump

# GET /actuator/threaddump: full thread dump as JSON
# GET /actuator/metrics/jvm.threads.states: thread counts by state
# GET /actuator/metrics/jvm.threads.peak: peak thread count

Common Mistakes

| Mistake | Problem | Fix |
|---|---|---|
| thread.run() instead of thread.start() | run() executes synchronously on the current thread; no new thread is created | Always call thread.start() |
| Unbounded thread creation (new Thread() per request) | 10,000 requests → 10,000 OS threads → OOM | Use a bounded ExecutorService or virtual threads |
| Not clearing ThreadLocal in thread pools | Next request sees previous request's stale values (a security bug) | Always threadLocal.remove() in finally |
| synchronized on virtual threads | Pins virtual thread to carrier, which defeats the purpose | Use ReentrantLock instead of synchronized |
| Shared mutable state without synchronisation | Race conditions: non-deterministic results, data corruption | Use synchronized, Lock, AtomicXxx, or immutable objects |
| Catching InterruptedException and swallowing it | Thread interruption mechanism broken; cannot shut down cleanly | Re-interrupt: Thread.currentThread().interrupt() then handle or rethrow |
| Unbounded LinkedBlockingQueue in ThreadPoolExecutor | Queue grows without limit → OOM under sustained overload | Always bound queues: new LinkedBlockingQueue<>(capacity) |
| CPU-bound tasks on virtual threads | Virtual threads don't add parallelism beyond carrier thread count | Use platform threads sized to CPU cores for CPU-bound work |
| Deadlock from acquiring two locks in different orders | Thread A holds lock1, waits for lock2; Thread B holds lock2, waits for lock1 | Always acquire multiple locks in the same order everywhere |
| parallel() streams without a custom pool | All parallel streams share one common ForkJoinPool; one heavy stream starves others | Use a dedicated ForkJoinPool for heavy parallel operations |
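
The ThreadLocal row is easy to reproduce. The sketch below is a hypothetical setup: a single-thread executor plays the role of the request pool and `CURRENT_USER` stands in for per-request context. It runs two "requests" on the same pooled thread, with and without cleanup.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalCleanupDemo {

    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    // Runs two "requests" on the same pooled thread; returns what the second one sees.
    static String secondRequestSees(boolean cleanUp) {
        ExecutorService pool = Executors.newSingleThreadExecutor();   // one worker, so it is reused
        try {
            pool.submit(() -> {
                CURRENT_USER.set("alice");
                try {
                    // ... handle the first request ...
                } finally {
                    if (cleanUp) {
                        CURRENT_USER.remove();   // the fix: clear before the thread returns to the pool
                    }
                }
            }).get();
            // Second request runs on the same worker thread
            return pool.submit(() -> CURRENT_USER.get()).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("without remove(): " + secondRequestSees(false));   // alice
        System.out.println("with remove():    " + secondRequestSees(true));    // null
    }
}
```

Without `remove()`, the second request inherits the first request's user, which is exactly the stale-value leak the table warns about.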

🎯 Interview Questions

Q1. What is the difference between a process and a thread?

A process is an isolated instance of a running program with its own private virtual address space: its code, heap, stacks, and data are all separate from those of other processes. A thread is a unit of execution within a process; all threads in a process share the same heap, code, and file descriptors, but each has its own stack, program counter, and registers. Processes communicate via IPC (pipes, sockets, shared memory), which is comparatively expensive. Threads communicate through shared memory, which is fast but requires synchronisation. A crash in one process doesn't affect others; a crash in one thread can kill the entire process.

Q2. What is a context switch and what are its costs?

A context switch is when the OS saves the current thread's or process's CPU state (program counter, registers, stack pointer, flags) into its control block (PCB) and loads the saved state of another thread or process. The switch itself is pure overhead: no useful work happens during it. The direct cost (saving and restoring registers, running the scheduler in kernel mode) is small; the indirect costs usually dominate. Switching between processes changes the address space and may flush the TLB, causing TLB misses on subsequent memory accesses, and the incoming thread evicts the previous thread's data from the L1/L2 caches. As a rough order of magnitude, expect 1-5 µs for thread switches and 5-15 µs for process switches, though the figures vary with hardware and workload. Virtual thread switches in Java 21 typically cost less than 1 µs because they are handled by the JVM without a kernel-level thread switch.

Q3. What is the difference between BLOCKED, WAITING, and TIMED_WAITING in Java?

BLOCKED: the thread is waiting to acquire a synchronized monitor lock that another thread holds. WAITING: the thread has deliberately given up the CPU using wait(), join(), or LockSupport.park() with no timeout; it waits indefinitely until woken by notify()/notifyAll(), the joined thread completing, or unpark(). TIMED_WAITING: the same as WAITING but with a timeout, e.g. sleep(n), wait(n), join(n). Note that a thread queued on a java.util.concurrent lock such as ReentrantLock shows up as WAITING (parked), not BLOCKED. BLOCKED usually deserves the most attention in production: a high BLOCKED count in a thread dump indicates heavy contention on synchronized monitors, and threads that stay BLOCKED on each other's locks indefinitely are deadlocked.
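
The three states can be observed directly with `Thread.getState()`. The sketch below drives a helper thread into each state and polls until it gets there. The class and method names are hypothetical, and the polling limit of about two seconds is arbitrary.

```java
import java.util.concurrent.locks.LockSupport;

public class ThreadStateDemo {

    // Polls until the thread reaches the expected state (or about 2 s pass); returns the last state seen.
    static Thread.State awaitState(Thread t, Thread.State expected) {
        for (int i = 0; i < 200 && t.getState() != expected; i++) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return t.getState();
    }

    static Thread.State blockedExample() {
        Object monitor = new Object();
        Thread t = daemon(() -> { synchronized (monitor) { } });
        synchronized (monitor) {             // the caller holds the monitor, so t cannot enter
            t.start();
            return awaitState(t, Thread.State.BLOCKED);
        }
    }

    static Thread.State waitingExample() {
        Thread t = daemon(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                LockSupport.park();          // no timeout; loop guards against spurious wakeups
            }
        });
        t.start();
        Thread.State state = awaitState(t, Thread.State.WAITING);
        t.interrupt();                       // park() returns once the thread is interrupted
        return state;
    }

    static Thread.State timedWaitingExample() {
        Thread t = daemon(() -> {
            try {
                Thread.sleep(5_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        Thread.State state = awaitState(t, Thread.State.TIMED_WAITING);
        t.interrupt();                       // wake it up early
        return state;
    }

    private static Thread daemon(Runnable body) {
        Thread t = new Thread(body);
        t.setDaemon(true);
        return t;
    }

    public static void main(String[] args) {
        System.out.println(blockedExample());
        System.out.println(waitingExample());
        System.out.println(timedWaitingExample());
    }
}
```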

Q4. What is a race condition and how do you prevent it?

A race condition occurs when the correctness of a program depends on the relative timing of thread execution. Classic example: counter++ is not atomic; it is three operations (read, increment, write). If two threads execute it concurrently, both may read the same value, each increment it, and both write the same result, so one increment is lost. Prevention: (1) synchronized blocks or methods for mutual exclusion; (2) AtomicInteger.incrementAndGet() for CAS-based lock-free atomicity; (3) immutable objects, so there is no shared mutable state; (4) volatile for simple visibility (not sufficient for compound operations); (5) confinement, where only one thread ever accesses a piece of data.
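
The lost-update problem with counter++ can be demonstrated in a few lines. In this sketch (names and thread counts are illustrative), the same workload runs once against a plain int field and once against an AtomicInteger.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CounterRaceDemo {

    static int unsafeCounter = 0;                                 // counter++ is read, add, write
    static final AtomicInteger safeCounter = new AtomicInteger();

    // Runs `threads` threads, each calling `increment` `perThread` times, and waits for all of them.
    static void hammer(int threads, int perThread, Runnable increment) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    increment.run();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
    }

    static int unsafeTotal(int threads, int perThread) {
        unsafeCounter = 0;
        hammer(threads, perThread, () -> unsafeCounter++);        // racy: updates can be lost
        return unsafeCounter;
    }

    static int atomicTotal(int threads, int perThread) {
        safeCounter.set(0);
        hammer(threads, perThread, safeCounter::incrementAndGet); // atomic read-modify-write
        return safeCounter.get();
    }

    public static void main(String[] args) {
        System.out.println("expected: " + (8 * 100_000));
        System.out.println("unsafe:   " + unsafeTotal(8, 100_000));
        System.out.println("atomic:   " + atomicTotal(8, 100_000));
    }
}
```

The atomic total always matches the expected value. The unsafe total is usually lower and changes from run to run, which is what makes race conditions hard to reproduce.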

Q5. What are virtual threads in Java 21 and how do they differ from platform threads?

Virtual threads (Project Loom) are lightweight JVM-managed threads that run on a small number of OS carrier threads. Unlike platform threads (1 Java thread = 1 OS kernel thread), a virtual thread is unmounted from its carrier when it blocks on I/O, which frees the carrier to run other virtual threads. This enables millions of concurrent virtual threads on a handful of OS threads. Platform threads reserve about 1 MB of stack by default and need an OS call to create; virtual threads start with a few KB of heap-allocated stack and take roughly 1 µs to create. For I/O-bound workloads (database calls, HTTP requests, other network I/O), virtual threads remove most thread pool sizing concerns: create one per task. For CPU-bound workloads, virtual threads offer no advantage over platform threads, since both are limited by the physical core count.

Q6. What is a zombie process and how do you prevent it?

A zombie process has completed execution, but its PCB entry remains in the process table because its parent hasn't called wait() to read its exit status. The child's code no longer runs, but its entry still occupies a slot (and a PID) in the process table. Those slots are finite (the PID limit has traditionally defaulted to 32,768 on Linux), so a flood of zombies can exhaust them and prevent any new process from being created, which is a form of DoS. Prevention: always call wait() or waitpid() in the parent after forking; register a SIGCHLD signal handler that calls waitpid(-1, NULL, WNOHANG) in a loop to reap all terminated children asynchronously; or double-fork so that the intermediate process exits immediately and the grandchild is reparented to init, which handles cleanup automatically.

Q7. (Senior) How does the JVM threading model change with virtual threads, and what pitfalls remain?

Virtual threads implement an M:N threading model: M virtual threads are multiplexed onto N OS carrier threads (N ≈ CPU cores by default). The JVM scheduler mounts a virtual thread onto a carrier for execution and unmounts it when it blocks. Unmounting only requires saving the virtual thread's stack frames (small, growable) to the heap; it does not involve the OS. Remaining pitfalls: (1) On JDK 21-23, synchronized blocks pin the virtual thread to its carrier, so the carrier cannot serve other virtual threads while the pinned virtual thread waits for I/O; use ReentrantLock there (JDK 24 lifts this restriction with JEP 491). (2) ThreadLocal still works but is potentially wasteful: millions of virtual threads each holding a ThreadLocal value consume significant memory. Consider ScopedValue (JEP 446, a preview API in Java 21) for immutable per-scope data. (3) CPU-bound tasks still occupy the carrier thread; virtual threads only help when the task spends time waiting, not computing. (4) Native methods (JNI) pin the virtual thread to its carrier if they block.


See Also