Processes & Threads: Complete Guide
- New learners: start at What is a Process? and What is a Thread? to build the foundational mental model before looking at concurrency.
- Senior engineers: jump to Context Switching Internals, Java Thread Pool Tuning, Memory Visibility, Virtual Threads, or Production Patterns.
What is a Process?
A process is a running instance of a program. When you double-click a Java application or run java -jar app.jar, the OS loads the program from disk into memory, creates a process to run it, and assigns it resources (memory, CPU time, file handles).
The key word is isolated: each process has its own private memory space. Process A cannot read or write Process B's memory without going through the OS. This isolation is both a safety feature and a performance cost.
The restaurant analogy
| Restaurant concept | OS equivalent |
|---|---|
| The restaurant building | Your computer |
| One restaurant kitchen | One process (isolated resources) |
| Cooks working in that kitchen | Threads (share the kitchen's tools) |
| Walls between restaurants | Process memory isolation |
| Shouting through a window to the next restaurant | Inter-process communication (IPC) |
| A cook opening the refrigerator | Thread accessing shared heap memory |
Two restaurants cannot share their refrigerators (different process memory). But all cooks in the same kitchen can reach into the same fridge (shared heap within one process).
Process memory layout
When the OS loads a process, it lays out memory in a specific structure:
High Address (e.g. 0xFFFF_FFFF)
+-------------------------------------------------+
| Stack                                           |
| Local variables, function call frames,          |
| return addresses, function arguments            |
| v grows downward                                |
+-------------------------------------------------+
|                                                 |
| (free space: stack grows down,                  |
|  heap grows up into this)                       |
|                                                 |
+-------------------------------------------------+
| ^ grows upward                                  |
| Heap                                            |
| Dynamically allocated memory:                   |
| new MyObject(), malloc(), ArrayList             |
+-------------------------------------------------+
| BSS Segment                                     |
| Uninitialised global/static variables           |
| (zeroed by OS at startup)                       |
+-------------------------------------------------+
| Data Segment                                    |
| Initialised global/static variables             |
| e.g. static int MAX = 100;                      |
+-------------------------------------------------+
| Text Segment                                    |
| Compiled program bytecode / machine code        |
| Read-only: prevents accidental modification     |
+-------------------------------------------------+
Low Address (e.g. 0x0000_0000)
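Important: in this classic layout the stack and heap grow toward each other. In practice a stack overflow happens when the stack exceeds its configured size limit (for example the `ulimit -s` limit on Linux, or `-Xss` for a Java thread), usually long before it could reach the heap region.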
In Java / JVM
JVM Process Memory:
+------------------------------------------------------+
| JVM Heap (-Xmx)                                      |
|   +--------------+   +--------------+                |
|   | Young Gen    |   | Old Gen      |  <- GC manages |
|   | (Eden,S0,S1) |   |              |     all of this|
|   +--------------+   +--------------+                |
+------------------------------------------------------+
| Metaspace (class metadata, method bytecodes)         |
+------------------------------------------------------+
| Thread Stacks (one per thread, ~512 KB to 1 MB each) |
| Thread 1 stack | Thread 2 stack | Thread 3 stack     |
+------------------------------------------------------+
| Code Cache (JIT-compiled native code)                |
+------------------------------------------------------+
Process Control Block (PCB)
The OS maintains a PCB (Process Control Block) data structure for every process. It's the process's "identity card": everything the OS needs to manage and resume the process.
// Conceptual PCB structure (simplified from Linux task_struct)
struct PCB {
int pid; // Unique process ID (e.g. 4242)
int ppid; // Parent process ID
int state; // RUNNING, READY, WAITING, ZOMBIE
// CPU context: saved when the process is descheduled
void* program_counter; // Address of NEXT instruction to execute
int registers[16]; // General-purpose register values
void* stack_pointer; // Current top of stack (an address, so a pointer type)
int flags_register; // CPU condition flags (zero, overflow, etc.)
// Memory management
PageTable* page_table; // Maps virtual -> physical memory addresses
void* heap_start;
void* stack_start;
// Scheduling
int priority; // Scheduling priority
long cpu_time_used; // Total CPU time consumed (for billing/fairness)
long last_scheduled; // When was this process last run
// I/O and resources
File* open_files[1024];// Table of open file descriptors
Signal signal_handlers[]; // Registered signal handlers
};
When a context switch happens, the current process's CPU state is saved into its PCB so it can be resumed later exactly where it left off.
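Some of the PCB fields above can be observed from user space. As a rough sketch (the class and method names are made up for illustration), Java's `ProcessHandle` API (Java 9+) exposes the PID, the parent PID, and the accumulated CPU time of the current JVM process. Availability of the last two depends on the platform, which is why they are returned as `Optional`.

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical demo class: reads PCB-like fields of the current JVM process.
final class ProcessInfoDemo {

    // Corresponds to the `pid` field in the conceptual PCB.
    static long currentPid() {
        return ProcessHandle.current().pid();
    }

    // Corresponds to `ppid`; empty if the OS does not expose a parent.
    static Optional<Long> parentPid() {
        return ProcessHandle.current().parent().map(ProcessHandle::pid);
    }

    // Corresponds to `cpu_time_used`; empty if the platform cannot report it.
    static Optional<Duration> cpuTimeUsed() {
        return ProcessHandle.current().info().totalCpuDuration();
    }

    public static void main(String[] args) {
        System.out.println("pid  = " + currentPid());
        System.out.println("ppid = " + parentPid().map(String::valueOf).orElse("unknown"));
        System.out.println("cpu  = " + cpuTimeUsed().map(Duration::toString).orElse("unknown"));
    }
}
```

The saved registers and page table pointer are deliberately not exposed this way; they are only meaningful inside the kernel.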
Process States and Lifecycle
        fork() / CreateProcess()
                  |
                  v
              +-------+
              |  NEW  |   <- process created but not yet admitted
              +-------+
                  |  OS admits to memory
                  v
              +---------+     I/O completes / event occurs
   +--------> |  READY  | <--------------------------------+
   |          +---------+                                  |
   |              |  scheduler dispatches                  |
   |              v                                        |
   |          +---------+   blocks on I/O or event    +---------+
   |          | RUNNING | --------------------------> | WAITING |
   |          +---------+                             +---------+
   |            |     |
   +------------+     |  exit()
   preempted          v
   (time slice)   +------------+
                  | TERMINATED |   <- PCB kept until parent calls wait()
                  +------------+
| State | What it means |
|---|---|
| New | Process created, not yet admitted by scheduler |
| Ready | In memory, waiting for CPU time |
| Running | Currently executing on a CPU core |
| Waiting | Blocked on I/O, sleep, or synchronisation; not using CPU |
| Terminated | Finished; PCB kept until parent reads exit status |
Zombie and Orphan processes
Zombie process:
Child exits → becomes a zombie (PCB kept, but its code is no longer running)
Parent hasn't called wait() → zombies accumulate
Problem: process table entries are finite, so a flood of zombies can exhaust them
Fix: always call wait() or waitpid() in the parent, or reap children in a SIGCHLD handler
Orphan process:
Parent exits before child → child has no parent to read its exit status
OS reparents the orphan to init (PID 1) / systemd
init calls wait() to reap reparented children when they exit
Orphans are harmless: init manages them
Process Creation
- Unix / Linux (fork + exec)
- Java (ProcessBuilder)
- Spring Boot: spawning child processes
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
int main(void) {
    pid_t pid = fork(); // Creates a child process
    if (pid == 0) {
        // --- Child process ---
        // At this point, the child is a copy of the parent (pages shared copy-on-write)
        // Replace the child's image with a different program:
        execl("/bin/ls", "ls", "-la", "/tmp", (char *)NULL);
        // exec() replaces the process image, so code after this runs
        // only if exec() failed and returned
        perror("exec failed");
        _exit(1);
    } else if (pid > 0) {
        // --- Parent process ---
        int status;
        waitpid(pid, &status, 0); // Wait for child; prevents zombie
        if (WIFEXITED(status)) { // WEXITSTATUS is only valid after a normal exit
            printf("Child %d exited with status %d\n", (int)pid, WEXITSTATUS(status));
        }
    } else {
        // pid < 0 = fork failed
        perror("fork failed");
        exit(1);
    }
    return 0;
}
fork() + copy-on-write:
Before fork():          After fork():
Parent memory:          Parent memory:        Child memory:
+-------------+         +-------------+       +-------------+
| Code (R/O)  |         | Code (R/O)  | <---> | Code (R/O)  |  (shared, read-only)
| Data=42     |         | Data=42     | <---> | Data=42     |  (shared, copy-on-write)
| Heap        |         | Heap        | <---> | Heap        |  (shared until written)
+-------------+         +-------------+       +-------------+
When child writes to Data:
                        +-------------+       +-------------+
                        | Data=42     |       | Data=99     |  <- OS makes private copy
                        +-------------+       +-------------+
No upfront copying: pages are copied lazily, only when written.
// Java creates child processes via ProcessBuilder
ProcessBuilder pb = new ProcessBuilder("ls", "-la", "/tmp");
pb.redirectErrorStream(true);
pb.directory(new File("/"));
Process process = pb.start();
// Read child's output
try (BufferedReader reader = new BufferedReader(
new InputStreamReader(process.getInputStream()))) {
reader.lines().forEach(System.out::println);
}
int exitCode = process.waitFor();
System.out.println("Exit code: " + exitCode);
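// On Unix-like systems: ProcessBuilder.start() launches the child with a fork/exec-style
// mechanism (posix_spawn or vfork + exec, depending on JDK version and platform)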
// On Windows: calls CreateProcess()
// In Spring Boot: use ProcessBuilder for CLI tools, scripts
@Service
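public class ImageProcessingService {
// SLF4J logger (assumed here so that log::debug below resolves; Lombok's @Slf4j would also work)
private static final Logger log = LoggerFactory.getLogger(ImageProcessingService.class);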
public void convertImage(Path input, Path output) throws IOException, InterruptedException {
ProcessBuilder pb = new ProcessBuilder(
"convert", input.toString(), "-resize", "800x600", output.toString()
);
pb.environment().put("PATH", "/usr/bin:/usr/local/bin");
pb.redirectErrorStream(true);
Process process = pb.start();
// Capture output in a separate thread to avoid blocking
CompletableFuture.runAsync(() -> {
try (var reader = new BufferedReader(new InputStreamReader(process.getInputStream()))) {
reader.lines().forEach(log::debug);
} catch (IOException ignored) {}
});
boolean finished = process.waitFor(30, TimeUnit.SECONDS);
if (!finished) {
process.destroyForcibly();
throw new RuntimeException("Image conversion timed out");
}
if (process.exitValue() != 0) {
throw new RuntimeException("Image conversion failed: exit " + process.exitValue());
}
}
}
Inter-Process Communication (IPC)
Since processes have separate memory spaces, they need OS-mediated mechanisms to communicate:
| Mechanism | Direction | Latency | Persistence | Best for |
|---|---|---|---|---|
| Pipe | Unidirectional | Very low | In-memory only | Parent-child byte streaming |
| Named Pipe (FIFO) | Unidirectional | Very low | Filesystem entry | Unrelated processes on the same machine |
| Message Queue | Bidirectional | Low | Kernel-managed | Structured message passing |
| Shared Memory | Bidirectional | Lowest | RAM only | High-speed bulk data transfer |
| Unix Domain Socket | Bidirectional | Very low | In-memory | High-perf local IPC (e.g. Nginx to PHP-FPM) |
| TCP Socket | Bidirectional | Higher | Network-capable | Cross-machine or cross-container |
| Signal | Notification | Very low | None | Simple events (SIGTERM, SIGKILL) |
| Memory-Mapped File | Bidirectional | Low | File-backed | Large data, database files |
Shared memory: the fastest IPC
// Process A: creates a shared memory segment and writes to it
int shm_id = shmget(IPC_PRIVATE, sizeof(int) * 1000, IPC_CREAT | 0666);
int* data = (int*)shmat(shm_id, NULL, 0); // attach to address space
data[0] = 42; // write directly to RAM
// Process B: attaches to the same segment and reads it
// (B needs the same shm_id: e.g. it is a child forked after shmget(),
//  or both sides use a shared key from ftok() instead of IPC_PRIVATE)
int* data = (int*)shmat(shm_id, NULL, 0); // same physical RAM
printf("%d\n", data[0]); // reads 42 without copying through the kernel
// Warning: no synchronisation here; pair it with a semaphore or mutex
Java equivalent:
// Java NIO MappedByteBuffer (memory-mapped file: the OS maps the file into the address space)
try (RandomAccessFile file = new RandomAccessFile("shared.dat", "rw");
     FileChannel channel = file.getChannel()) {
    MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
    buffer.putInt(0, 42); // write: goes directly to mapped memory
    int val = buffer.getInt(0); // read: from mapped memory
}
// Multiple JVM processes mapping the same file share the same physical RAM pages
// Used by: Kafka (index files), Chronicle Map, off-heap databases
What is a Thread?
A thread is the smallest unit of CPU execution within a process. All threads in a process share the same memory space (heap, code, global data, open files), but each thread has its own stack, program counter, registers, and thread ID:
Process (shared resources)
+------------------------------------------------------+
| Heap (shared by all threads)                         |
| Code Segment (shared)                                |
| Data Segment (shared)                                |
| Open File Descriptors (shared)                       |
+-----------------+-----------------+------------------+
| Thread 1        | Thread 2        | Thread 3         |
| Stack           | Stack           | Stack            |
| PC: 0x4A2F      | PC: 0x9C10      | PC: 0x1F03       |
| Registers       | Registers       | Registers        |
| Thread ID       | Thread ID       | Thread ID        |
+-----------------+-----------------+------------------+
Thread vs process: the key difference
Creating a new process (fork):
- Duplicates the address space mappings (even with CoW, the page tables are copied)
- New PCB, file descriptor table, signal handlers
- Time: on the order of ~1 ms
- Complete isolation: a crash in one process doesn't affect others
Creating a new thread:
- Shares the existing address space; only a new stack is allocated
- Shares file descriptors, heap, code
- Time: on the order of ~10 µs (roughly 100× faster than process creation)
- A fatal fault in one thread (e.g. a segmentation fault in native code) takes down the whole process
- Shared memory means synchronisation is required
Why threads exist: the parallelism problem
Single-threaded server handling 3 requests:
Request A (100ms DB query) ------------> response
Request B waits ------------------------------------> response
Request C waits ------------------------------------------------> response
Total time: 300ms
Multi-threaded server (3 threads):
Thread 1: Request A (100ms DB query) ------------> response
Thread 2: Request B (100ms DB query) ------------> response
Thread 3: Request C (100ms DB query) ------------> response
Total time: ~100ms (about 3× faster, because the waits overlap)
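The timeline above can be reproduced with a small experiment. This is a minimal sketch, assuming the "DB query" is pure waiting simulated with `Thread.sleep`; the class `OverlapDemo` and its methods are illustrative names, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class OverlapDemo {
    static final long QUERY_MILLIS = 100;

    // Stand-in for a blocking DB query (assumption: pure waiting, no CPU work).
    static void fakeDbQuery() {
        try {
            Thread.sleep(QUERY_MILLIS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Handles the requests one after another on the calling thread.
    static long sequentialMillis(int requests) {
        long start = System.nanoTime();
        for (int i = 0; i < requests; i++) {
            fakeDbQuery();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Handles each request on its own pool thread so the waits overlap.
    static long threadedMillis(int requests) {
        ExecutorService pool = Executors.newFixedThreadPool(requests);
        try {
            long start = System.nanoTime();
            Runnable task = OverlapDemo::fakeDbQuery;
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                futures.add(pool.submit(task));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for every request to finish
            }
            return (System.nanoTime() - start) / 1_000_000;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("sequential: " + sequentialMillis(3) + " ms");
        System.out.println("threaded:   " + threadedMillis(3) + " ms");
    }
}
```

The speed-up comes from overlapping waits, not from extra CPU; for CPU-bound tasks the gain is capped by the number of cores.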
Threading Models
| Model | Mapping | Parallelism | Used by | Trade-off |
|---|---|---|---|---|
| 1:1 | 1 user thread = 1 kernel thread | Yes (true parallelism) | Java platform threads, pthreads | OS overhead per thread |
| M:1 | M user threads = 1 kernel thread | None | Old green threads | One blocking call blocks all |
| M:N | M user threads = N kernel threads | Yes (true parallelism) | Go (goroutines), Erlang | Complex scheduler |
Java's threading model evolution
Java up to 20 (platform threads, 1:1 model on modern JVMs):
Each Java Thread = one OS kernel thread
Creating 10,000 threads → 10,000 OS threads → up to ~10 GB of stack memory reserved (at ~1 MB per thread)
OS scheduler manages all 10,000 → context switch overhead
Java 21+ (virtual threads, M:N model via Project Loom):
Each virtual thread = lightweight JVM-managed thread
10,000,000 virtual threads → small number of OS carrier threads
JVM scheduler manages virtual threads; OS only sees carrier threads
When a virtual thread blocks on I/O → it is unmounted from its carrier → the carrier does other work
Java Thread Lifecycle
Creating and starting threads
// --- Option 1: Extend Thread (rarely used) ---
class WorkerThread extends Thread {
@Override
public void run() {
System.out.println("Running in: " + Thread.currentThread().getName());
}
}
new WorkerThread().start();
// --- Option 2: Implement Runnable (functional style) ---
Thread t = new Thread(() -> System.out.println("Lambda thread"));
t.setName("worker-1");
t.setDaemon(true); // daemon threads don't prevent JVM shutdown
t.setPriority(Thread.NORM_PRIORITY); // 1-10, OS uses as a hint
t.start(); // don't call run() directly: that runs synchronously on the current thread
// --- Option 3: ExecutorService (ALWAYS use in production) ---
ExecutorService pool = Executors.newFixedThreadPool(4);
Future<String> future = pool.submit(() -> "result from thread");
String result = future.get(5, TimeUnit.SECONDS); // blocks at most 5s
pool.shutdown(); // graceful: wait for running tasks to finish
pool.awaitTermination(30, TimeUnit.SECONDS);
Java thread states
NEW
 |  thread.start()
 v
RUNNABLE <----------------------------------------------------------+
 |                                                                  |
 +-- tries to enter synchronized block  -> BLOCKED ---------------->+  (lock released)
 |                                                                  |
 +-- calls wait() / LockSupport.park()  -> WAITING ---------------->+  (notify/unpark)
 |                                                                  |
 +-- calls sleep(n) / wait(n) / join(n) -> TIMED_WAITING ---------->+  (timeout expires)
 |
 |  run() completes or exception thrown
 v
TERMINATED
| State | Trigger | How to resume |
|---|---|---|
| NEW | new Thread() called | Call .start() |
| RUNNABLE | .start() called | OS schedules it |
| BLOCKED | Waiting for a synchronized lock | Other thread releases the lock |
| WAITING | wait(), join(), park() | notify(), thread completes, unpark() |
| TIMED_WAITING | sleep(n), wait(n), join(n) | Timer expires or interrupted |
| TERMINATED | run() returns or throws | Cannot restart |
// Inspect thread state programmatically
Thread t = new Thread(() -> {
try { Thread.sleep(5000); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
});
System.out.println(t.getState()); // NEW
t.start();
Thread.sleep(100);
System.out.println(t.getState()); // TIMED_WAITING
t.join();
System.out.println(t.getState()); // TERMINATED
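The snippet above only reaches TIMED_WAITING. To observe BLOCKED, a thread has to contend for a monitor that another thread holds. A minimal sketch (the class name `BlockedStateDemo` is an illustrative assumption): the calling thread holds the lock, starts a contender, and polls until the contender is stuck at the `synchronized` entry.

```java
final class BlockedStateDemo {

    // Returns the state observed for a thread that is waiting to enter a monitor.
    static Thread.State observeContendedThread() {
        final Object lock = new Object();
        Thread contender = new Thread(() -> {
            synchronized (lock) {
                // entered only after the calling thread releases the lock
            }
        }, "contender");

        Thread.State observed;
        synchronized (lock) {                 // calling thread holds the monitor
            contender.start();
            // Poll until the contender has actually reached the monitor entry.
            while (contender.getState() != Thread.State.BLOCKED) {
                Thread.onSpinWait();
            }
            observed = contender.getState();
        }                                     // releasing the lock lets the contender proceed

        try {
            contender.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return observed;
    }

    public static void main(String[] args) {
        System.out.println(observeContendedThread());
    }
}
```

`Thread.getState()` is meant for monitoring and debugging; it should not be used for synchronisation in real code.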
Context Switching Internals
A context switch is the OS saving one thread's CPU state and restoring another's. It is pure overhead: no useful work happens during a switch.
What gets saved and restored
Thread A is running → time slice expires → context switch → Thread B runs
Step 1: Save Thread A's CPU context to its PCB/TCB:
program_counter: memory address of the next instruction Thread A would execute
stack_pointer: current top of Thread A's call stack
registers[0..15]: all general-purpose register values (rdx, rsi, r8, ...)
flags_register: condition codes (zero flag, carry flag, etc.)
FPU state: floating-point unit registers (if used)
Step 2: If switching processes (not just threads):
TLB flush: Translation Lookaside Buffer entries for the old address space must be invalidated
(the virtual→physical address mapping changes per process)
Page table swap: the new process's page table is loaded into the CR3 register
Step 3: Load Thread B's CPU context from its PCB/TCB:
(reverse of step 1: restore all saved state)
Step 4: CPU resumes executing at Thread B's saved program_counter
Context switch costs
| Switch type | Typical cost | Primary cost driver |
|---|---|---|
| Thread switch (same process) | 1-5 µs | Register save/restore, scheduler |
| Thread switch (different process) | 5-15 µs | + TLB flush, page table swap |
| Virtual thread switch (Java 21) | < 1 µs | JVM-managed, no kernel thread switch |
The TLB flush problem: the TLB (Translation Lookaside Buffer) caches virtual→physical address translations. When switching between processes, the old process's cached translations can no longer be used because the new process has a different address space. After the switch, memory accesses miss in the TLB until it warms up again, which is a large part of why process switches cost more than thread switches within the same process. (CPUs with tagged TLBs, such as PCID on x86-64, can avoid a full flush.)
Cache pollution: CPU L1/L2 caches contain the working set of the running thread. A context switch brings in a different thread's working set, evicting the previous thread's data. When that thread resumes, it faces cache misses until its data is reloaded.
When context switching hurts performance
// Anti-pattern: far more threads than CPU cores on CPU-bound work
// 8-core machine, 200 threads doing CPU-intensive computation:
ExecutorService oversizedPool = Executors.newFixedThreadPool(200);
// What actually happens:
// OS constantly context-switches 200 threads across 8 cores
// Each switch costs a few µs plus cache misses afterwards; at thousands of switches/sec that is measurable CPU waste
// and it adds no throughput, because only 8 threads can run at any moment
// Fix for CPU-bound work: match threads to available cores
int cpuCores = Runtime.getRuntime().availableProcessors();
ExecutorService cpuPool = Executors.newFixedThreadPool(cpuCores); // 8 threads, 8 cores
// Each core can run one thread almost continuously, so context switching is minimal
// Fix for I/O-bound work: more threads are OK (they spend most time waiting)
// Or better: use virtual threads (Java 21), so no OS thread is blocked during I/O wait
Java Thread Pool Tuning
The two workload types
CPU-bound work:
Uses 100% CPU during execution (sorting, encryption, image processing)
Context-switching between threads wastes CPU cycles
Optimal threads = CPU cores (1 thread per core, no switching needed)
ExecutorService cpuPool = Executors.newFixedThreadPool(
Runtime.getRuntime().availableProcessors()
);
I/O-bound work:
Thread spends most time blocked waiting (DB query, HTTP call, file read)
CPU is idle during the wait โ other threads can use it
Optimal threads = (CPU cores) / (1 - blocking_ratio)
Example: 8 cores, 90% blocking → 8 / 0.1 = 80 threads
Or: use virtual threads (Java 21), so no kernel thread is blocked during I/O
ExecutorService ioPool = Executors.newFixedThreadPool(80);
// OR (Java 21+):
ExecutorService vtPool = Executors.newVirtualThreadPerTaskExecutor();
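The sizing formula above is easy to get wrong with floating point (`1 - 0.9` is not exactly `0.1`), so a small helper can do it in integer arithmetic. This is only a sketch of the rule of thumb; `PoolSizing` and `ioBoundThreads` are hypothetical names, and the blocking percentage is something you would have to measure for your own workload.

```java
final class PoolSizing {

    // threads = cores / (1 - blockingRatio), with the ratio given as a whole percentage.
    // Rounds up so the pool is never smaller than the formula suggests.
    static int ioBoundThreads(int cores, int blockingPercent) {
        if (cores < 1) {
            throw new IllegalArgumentException("cores must be >= 1");
        }
        if (blockingPercent < 0 || blockingPercent >= 100) {
            throw new IllegalArgumentException("blockingPercent must be in [0, 100)");
        }
        int runnablePercent = 100 - blockingPercent;
        // Ceiling division: (a + b - 1) / b
        return (cores * 100 + runnablePercent - 1) / runnablePercent;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("CPU-bound pool size: " + ioBoundThreads(cores, 0));
        System.out.println("I/O-bound pool size (90% blocking): " + ioBoundThreads(cores, 90));
    }
}
```

Treat the result as a starting point for load testing rather than a final answer; downstream limits such as database connection pools often cap the useful thread count earlier.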
ThreadPoolExecutor: full control
// Executors.newFixedThreadPool(n) is just a convenience wrapper.
// For production, use ThreadPoolExecutor directly for full control:
ThreadPoolExecutor executor = new ThreadPoolExecutor(
10, // corePoolSize: threads kept alive even when idle
50, // maximumPoolSize: upper bound; extra threads start only once the queue is full
60, TimeUnit.SECONDS, // keepAliveTime: how long idle threads above corePoolSize survive
new LinkedBlockingQueue<>(1000), // workQueue: task buffer when all core threads are busy
new ThreadFactory() {
private final AtomicInteger counter = new AtomicInteger(0);
@Override
public Thread newThread(Runnable r) {
Thread t = new Thread(r);
t.setName("order-processor-" + counter.incrementAndGet());
t.setDaemon(false);
return t;
}
},
new ThreadPoolExecutor.CallerRunsPolicy() // saturation policy
);
// Saturation policies (what to do when queue is full AND all threads busy):
// AbortPolicy (default): throws RejectedExecutionException
// CallerRunsPolicy: calling thread executes the task (backpressure)
// DiscardPolicy: silently drops the task
// DiscardOldestPolicy: drops the oldest queued task, tries again
// Monitor the pool
int active = executor.getActiveCount();
int queued = executor.getQueue().size();
long total = executor.getCompletedTaskCount();
System.out.printf("active=%d queued=%d completed=%d%n", active, queued, total);
Bounded vs unbounded queues: the danger of unbounded
// DANGEROUS: LinkedBlockingQueue() with no bound
// When producers are faster than consumers:
// Queue grows without limit → OutOfMemoryError after consuming all heap
ExecutorService pool = Executors.newFixedThreadPool(10); // uses unbounded queue!
// SAFER: always bound your queues in production:
new ThreadPoolExecutor(10, 10, 0L, TimeUnit.MILLISECONDS,
new LinkedBlockingQueue<>(500), // max 500 queued tasks
new ThreadPoolExecutor.AbortPolicy() // reject if queue full
);
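To see the bounded queue push back, the sketch below uses a deliberately tiny pool (one thread, a queue of configurable capacity) and keeps the worker busy with a latch. The class `BoundedQueueDemo` is an illustrative assumption; only the `java.util.concurrent` types are real.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BoundedQueueDemo {

    // Submits `tasks` blocking tasks to a 1-thread pool with a bounded queue
    // and returns how many submissions were rejected by AbortPolicy.
    static int countRejections(int tasks, int queueCapacity) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy());
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await();   // keep the worker busy until released
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++;                // queue full and no spare thread
                }
            }
        } finally {
            release.countDown();               // let the accepted tasks finish
            pool.shutdown();
        }
        return rejected;
    }

    public static void main(String[] args) {
        // 1 running + 2 queued are accepted, the remaining 2 are rejected
        System.out.println("rejected = " + countRejections(5, 2));
    }
}
```

A rejection is a signal the caller can act on (retry later, shed load, return HTTP 503), whereas an unbounded queue hides the overload until memory runs out.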
ThreadLocal: per-thread data
// ThreadLocal stores a separate value per thread, so no synchronisation is needed
// because each thread has its own copy
public class RequestContext {
private static final ThreadLocal<String> currentUserId = new ThreadLocal<>();
private static final ThreadLocal<String> requestTraceId = new ThreadLocal<>();
public static void set(String userId, String traceId) {
currentUserId.set(userId);
requestTraceId.set(traceId);
}
public static String getUserId() { return currentUserId.get(); }
public static String getTraceId() { return requestTraceId.get(); }
// CRITICAL: always clean up, because thread pool threads are reused!
// If you don't clear, the next request on this thread sees the previous request's values
public static void clear() {
currentUserId.remove();
requestTraceId.remove();
}
}
// In a Spring filter:
@Component
public class RequestContextFilter extends OncePerRequestFilter {
@Override
protected void doFilterInternal(HttpServletRequest req,
HttpServletResponse res,
FilterChain chain) throws IOException, ServletException {
try {
RequestContext.set(
extractUserId(req),
req.getHeader("X-Trace-Id")
);
chain.doFilter(req, res);
} finally {
RequestContext.clear(); // MUST clear in finally block
}
}
}
In a thread pool, threads are reused across many requests. If you set a ThreadLocal value and don't clear it in a finally block, the next request handled by the same thread sees the previous request's stale value. This is a common source of security bugs (wrong user ID) and data leaks.
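The stale-value problem is easy to reproduce outside a web server. The following sketch runs two "requests" on a single-thread pool, so the second one is guaranteed to reuse the first one's thread; `ThreadLocalLeakDemo` and the user name are made up for the demonstration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ThreadLocalLeakDemo {
    private static final ThreadLocal<String> USER = new ThreadLocal<>();

    // Runs two "requests" back to back on one reused pool thread and
    // returns what the second request finds in the ThreadLocal.
    static String secondRequestSees(boolean clearAfterFirst) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable firstRequest = () -> {
                try {
                    USER.set("alice");            // request 1 stores its user
                } finally {
                    if (clearAfterFirst) {
                        USER.remove();            // the cleanup the filter does in finally
                    }
                }
            };
            Callable<String> secondRequest = USER::get;   // request 2 never calls set()

            pool.submit(firstRequest).get();
            return pool.submit(secondRequest).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("without clear: " + secondRequestSees(false));
        System.out.println("with clear:    " + secondRequestSees(true));
    }
}
```

Without the `remove()` call the second request reads the first request's user; with it, the second request reads `null`.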
Memory Visibility & the JMM
The Java Memory Model (JMM) defines when one thread's writes are visible to another thread. This is non-trivial because:
CPU 1 (Thread A)        L1 Cache (Core 1)        Main RAM
----------------        -----------------        --------
x = 42 ---------------> x = 42 (cached)
                        (not yet flushed)        x = 0  <- Thread B still sees 0!
CPU 2 (Thread B)        L1 Cache (Core 2)
----------------        -----------------
read x ---------------> ----------------------> x = 0 (stale!)
The diagram is a simplified mental model. Modern CPUs and compilers reorder instructions for performance, and values can sit in registers or store buffers. Without explicit synchronisation, one thread's writes may not be visible to another.
The happens-before relationship
A happens-before relationship guarantees that a write by Thread A is visible to Thread B:
// Guaranteed happens-before relationships:
// 1. Within one thread: each statement happens-before the next
// 2. Thread.start(): all actions before start() happen-before any action in the thread
// 3. Thread.join(): all actions in a thread happen-before join() returns
// 4. Synchronized: release of lock happens-before acquisition by another thread
// 5. volatile: write to a volatile field happens-before any subsequent read of that same field
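Rules 2 and 3 are the ones most code relies on without noticing. In the sketch below neither field is `volatile` and no lock is used, yet the result is well defined because `start()` and `join()` supply the happens-before edges. The class name `HappensBeforeDemo` is an assumption for illustration.

```java
final class HappensBeforeDemo {
    private static int input;    // deliberately NOT volatile
    private static int output;   // deliberately NOT volatile

    static int computeOnWorker(int value) {
        input = value;                                          // written before start()
        Thread worker = new Thread(() -> output = input * 2);   // rule 2: worker sees input
        worker.start();
        try {
            worker.join();                                      // rule 3: worker's writes are visible after join()
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(computeOnWorker(21));
    }
}
```

If the calling thread read `output` without calling `join()` first, there would be no happens-before edge and it could observe the old value.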
volatile: visibility without locking
// Without volatile: the JIT may hoist the read out of the loop or keep the flag in a register,
// so another thread's update may never be seen
private boolean running = true;
// Thread A:
running = false; // no guarantee this write ever becomes visible to Thread B
// Thread B:
while (running) { /* work */ } // may loop forever
// -----------------------------------------------------------------
// With volatile: reads and writes of the field are ordered and visible across threads
private volatile boolean running = true;
// Thread A:
running = false; // guaranteed to become visible to other threads
// Thread B:
while (running) { /* work */ } // sees false on a subsequent read and exits
volatile guarantees: visibility (all threads see the latest write) and ordering (no reordering of volatile reads/writes). It does NOT guarantee atomicity: with volatile int counter, counter++ is still not thread-safe (read-modify-write is three operations).
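The atomicity gap can be shown by letting several threads increment a `volatile int` and an `AtomicInteger` side by side. This is a sketch under the assumption of a multi-core machine; `VolatileCounterDemo` is an illustrative name. The volatile total is non-deterministic, so the only safe statement is that it never exceeds the expected count.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class VolatileCounterDemo {
    private static volatile int volatileCounter;
    private static final AtomicInteger atomicCounter = new AtomicInteger();

    // Returns {volatile total, atomic total} after `threads` threads
    // each perform `perThread` increments on both counters.
    static int[] race(int threads, int perThread) {
        volatileCounter = 0;
        atomicCounter.set(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    volatileCounter++;                 // read-modify-write: updates can be lost
                    atomicCounter.incrementAndGet();   // one atomic operation
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return new int[] { volatileCounter, atomicCounter.get() };
    }

    public static void main(String[] args) {
        int[] totals = race(4, 50_000);
        System.out.println("volatile: " + totals[0] + " (expected 200000, often less)");
        System.out.println("atomic:   " + totals[1]);
    }
}
```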
synchronized: mutual exclusion + visibility
public class Counter {
private int count = 0;
// Only one thread at a time can execute this method
public synchronized void increment() {
count++; // read-modify-write is now atomic
}
public synchronized int getCount() {
return count; // guaranteed to see latest value
}
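// Alternative: synchronize on a private lock object. It has its own method name here so the
// class compiles; in real code pick ONE lock, because `this` and `lock` are different monitors
private final Object lock = new Object();
public void incrementWithPrivateLock() {
synchronized (lock) { // intrinsic lock on the lock object
count++;
}
}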
}
java.util.concurrent.locks.Lock: explicit locking
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.*;
public class ReadWriteCounter {
private final ReadWriteLock rwLock = new ReentrantReadWriteLock();
private final Lock readLock = rwLock.readLock();
private final Lock writeLock = rwLock.writeLock();
private int count = 0;
// Many threads can read simultaneously
public int getCount() {
readLock.lock();
try {
return count;
} finally {
readLock.unlock(); // ALWAYS unlock in finally
}
}
// Only one thread can write (exclusive lock, blocks all readers)
public void increment() {
writeLock.lock();
try {
count++;
} finally {
writeLock.unlock();
}
}
// tryLock with timeout: waits at most `timeout` instead of blocking indefinitely
public boolean tryIncrement(long timeout, TimeUnit unit) throws InterruptedException {
if (writeLock.tryLock(timeout, unit)) {
try {
count++;
return true;
} finally {
writeLock.unlock();
}
}
return false; // lock not acquired within timeout
}
}
Atomic variables: lock-free thread safety
import java.util.concurrent.atomic.*;
// AtomicInteger uses CAS (Compare-And-Swap) CPU instructions, so no lock is needed
AtomicInteger counter = new AtomicInteger(0);
counter.incrementAndGet(); // atomic: increment, then return the new value
counter.compareAndSet(5, 10); // atomic: if current==5, set to 10
int val = counter.getAndAdd(3); // atomic: returns old value, adds 3
// AtomicReference for objects
AtomicReference<String> ref = new AtomicReference<>("initial");
ref.compareAndSet("initial", "updated"); // safe atomic swap (compares references, not equals())
// LongAdder: usually better than AtomicLong under high contention
// Maintains multiple internal counters, sums them on read
LongAdder adder = new LongAdder();
adder.increment(); // contended threads update different cells, which reduces contention
adder.sum(); // aggregates all cells on read
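`compareAndSet` is usually used inside a retry loop: read the current value, compute the new one, and retry if another thread got there first. The sketch below applies that pattern to an "atomic maximum"; `CasMaxDemo` and its methods are illustrative names, not JDK API.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CasMaxDemo {

    // Atomically raises `target` to `candidate` if candidate is larger.
    // Returns the maximum as seen by this call.
    static int updateMax(AtomicInteger target, int candidate) {
        while (true) {
            int current = target.get();
            if (candidate <= current) {
                return current;                        // nothing to do
            }
            if (target.compareAndSet(current, candidate)) {
                return candidate;                      // our write won
            }
            // CAS failed: another thread changed the value; re-read and retry
        }
    }

    // Feeds every value to updateMax from its own thread.
    static int maxOfConcurrently(int[] values) {
        AtomicInteger max = new AtomicInteger(Integer.MIN_VALUE);
        Thread[] workers = new Thread[values.length];
        for (int i = 0; i < values.length; i++) {
            final int v = values[i];
            workers[i] = new Thread(() -> updateMax(max, v));
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return max.get();
    }

    public static void main(String[] args) {
        System.out.println(maxOfConcurrently(new int[] { 3, 17, 5, 11 }));
    }
}
```

On Java 8+ the same effect is available as `target.accumulateAndGet(candidate, Math::max)`; the explicit loop is shown only to make the retry visible.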
Virtual Threads (Java 21, Project Loom)
The platform thread problem for I/O-heavy workloads
Traditional (platform) thread handling a DB query:
Thread A (OS kernel thread, ~1MB stack):
receive HTTP request
call DB query ------------------------------> DB responds
[100ms: thread is BLOCKED, OS thread idle]
parse + respond
Problems at 10,000 concurrent requests:
- 10,000 OS threads × ~1 MB stack = up to ~10 GB reserved just for stacks
- OS scheduler managing 10,000 threads → heavy context switching overhead
- Thread pool exhaustion → requests queue → latency spikes
How virtual threads solve this
Virtual thread handling the same DB query:
Virtual Thread A (JVM-managed, ~few KB):
receive HTTP request
call DB query → JVM detects blocking I/O
Virtual Thread A is UNMOUNTED from OS carrier thread
OS carrier thread is now FREE for other virtual threads
Virtual Thread B, C, D... run on the freed carrier thread
→ DB responds
Virtual Thread A is REMOUNTED onto a carrier thread
parse + respond
Benefits at 10,000 concurrent requests:
- JVM has ~8 carrier threads (by default, one per CPU core)
- 10,000 virtual threads share 8 OS threads
- Only ~8 carrier threads for the OS to schedule, so OS scheduling overhead stays low
- No thread pool exhaustion: create one virtual thread per request
Virtual thread implementation
// --- Creating virtual threads ---
// Direct creation
Thread vt = Thread.ofVirtual()
.name("vt-", 0) // names vt-0, vt-1, vt-2, ...
.start(() -> doWork());
// Via ExecutorService: preferred for server applications
try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
// Creates one virtual thread per submitted task, so no pool size is needed
for (int i = 0; i < 100_000; i++) {
executor.submit(() -> handleRequest());
}
} // auto-close waits for all tasks to complete
// In Spring Boot 3.2+, enable virtual threads with one config property:
// application.yaml:
// spring:
//   threads:
//     virtual:
//       enabled: true
// This configures Tomcat to use a virtual thread per HTTP request
Platform threads vs Virtual threads
| | Platform threads | Virtual threads |
|---|---|---|
| Managed by | OS kernel | JVM |
| Stack size | ~1 MB reserved by default | ~few KB (grows dynamically) |
| Creation cost | Tens of µs (OS thread creation) | ~1 µs (JVM allocation) |
| Max practical count | Thousands to tens of thousands | Millions |
| Blocking I/O behaviour | OS thread blocked and idle | Unmounted from carrier; carrier does other work |
| CPU-bound suitability | Excellent | No advantage over platform threads (still needs a carrier thread) |
| I/O-bound suitability | Limited: thread-per-request doesn't scale | Excellent: thread-per-task scales to millions |
| ThreadLocal support | Full | Full (but consider ScopedValue) |
| Debuggability | Thread dump shows all | Shown in dumps produced by `jcmd <pid> Thread.dump_to_file` |
What virtual threads do NOT solve
// Virtual threads don't help CPU-bound work
// If your task burns CPU, the carrier thread is occupied the entire time
// 1,000,000 virtual threads all doing CPU work are still limited to the 8 actual cores
// In Java 21-23, synchronized blocks pin the virtual thread to its carrier thread
// (JDK 24 removed most of this pinning via JEP 491)
// A pinned virtual thread cannot unmount while it blocks inside synchronized
// On affected versions, prefer java.util.concurrent.locks.Lock around blocking calls
synchronized (lock) {
    db.query(...); // Java 21-23: virtual thread PINNED, carrier thread blocked too
}
ReentrantLock reentrantLock = new ReentrantLock();
reentrantLock.lock();
try {
    db.query(...); // virtual thread can unmount, carrier thread is freed
} finally {
    reentrantLock.unlock();
}
// Pool sizing rules for platform threads don't apply to virtual threads
// Use Executors.newVirtualThreadPerTaskExecutor() with no pool size
// Don't wrap virtual threads in a fixed thread pool: that defeats the entire model
User-Level vs Kernel-Level Threads
| | User-Level Threads | Kernel-Level Threads |
|---|---|---|
| Managed by | User-space library / JVM | OS kernel |
| Context switch | Fast: no syscall, just register swap | Slow: kernel transition required |
| One thread blocks | Entire process blocks (M:1 model) | Other threads continue (1:1 model) |
| Parallelism | No (unless M:N with kernel support) | Yes: threads can run on different cores at the same time |
| Scheduling control | App has full control | OS decides |
| Examples | Java virtual threads, Go goroutines | POSIX pthreads, Java platform threads |
Concurrency Primitives Comparison
// The concurrency tool decision tree:
// Need to run something on another thread?
// โ ExecutorService.submit() or CompletableFuture.supplyAsync()
// Need a result from another thread?
// โ Future<T> or CompletableFuture<T>
// Need a simple flag visible across threads?
// โ volatile boolean
// Need to increment/decrement safely without locks?
// โ AtomicInteger / AtomicLong / LongAdder
// Need to swap an object reference safely?
// โ AtomicReference<T>
// Need exclusive access to a block of code?
// โ synchronized or ReentrantLock
// Need many readers, one writer?
// โ ReadWriteLock (ReentrantReadWriteLock)
// Need to wait for N threads to reach a point?
// โ CountDownLatch (one-time) or CyclicBarrier (reusable)
// Need to limit concurrent access to a resource?
// โ Semaphore
// Need a thread-safe queue for producer-consumer?
// โ LinkedBlockingQueue or ArrayBlockingQueue
// Need per-thread isolated data?
// โ ThreadLocal (remember to clear in finally)
// Need to compose async operations?
// โ CompletableFuture.thenApply().thenCompose().exceptionally()
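Several entries in the tree are often combined. As a rough illustration, the sketch below uses an `ExecutorService` to run work on other threads, a `LongAdder` for lock-free counting, and a `CountDownLatch` to wait for all workers. The class and method names (`PrimitivesDemo`, `countWithWorkers`) and the 10-second timeout are made up for this example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

final class PrimitivesDemo {
    /** Runs `workers` tasks that each increment a shared counter `incrementsPerWorker` times. */
    static long countWithWorkers(int workers, int incrementsPerWorker) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        LongAdder hits = new LongAdder();                 // contention-friendly counter
        CountDownLatch done = new CountDownLatch(workers); // one-time "everyone finished" gate
        try {
            for (int w = 0; w < workers; w++) {
                pool.execute(() -> {
                    try {
                        for (int i = 0; i < incrementsPerWorker; i++) {
                            hits.increment();
                        }
                    } finally {
                        done.countDown();                 // count down even if the task fails
                    }
                });
            }
            if (!done.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
            return hits.sum();                            // safe to read: await() happens-after countDown()
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(countWithWorkers(4, 100_000)); // prints 400000
    }
}
```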
IPC in Modern Java / Spring
// ── Pipes: parent-child process communication ──
ProcessBuilder pb = new ProcessBuilder("wc", "-l");
pb.redirectInput(ProcessBuilder.Redirect.PIPE); // PIPE is the default; shown for clarity
Process process = pb.start();
// Closing the writer closes the child's stdin, so wc sees end-of-input
try (PrintWriter pw = new PrintWriter(process.getOutputStream())) {
    pw.println("line one");
    pw.println("line two");
}
String result = new String(process.getInputStream().readAllBytes()).trim();
// result = "2" (trim() drops the trailing newline and any padding wc adds)
// ── Shared memory equivalent in JVM: use concurrent data structures ──
// Threads share heap, so just use thread-safe collections:
ConcurrentHashMap<String, Order> orderCache = new ConcurrentHashMap<>();
BlockingQueue<Event> eventQueue = new LinkedBlockingQueue<>(1000);
// Producer thread:
eventQueue.put(new OrderCreatedEvent(orderId)); // blocks if queue full
// Consumer thread:
Event event = eventQueue.take(); // blocks if queue empty
// ── Cross-process in microservices: use Kafka, Redis, HTTP ──
// "IPC" between microservices is just messaging/APIs
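The producer and consumer lines above leave out how the consumer knows when to stop. One common convention is a sentinel ("poison pill") element placed on the queue after the last real item. The sketch below assumes string events and a hypothetical marker value `__STOP__` that real events never use; `QueueHandOff` and `transfer` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class QueueHandOff {
    // Hypothetical end-of-stream marker; real events must never equal it
    private static final String POISON = "__STOP__";

    /** Sends every event through a bounded queue to a consumer thread; returns what it received. */
    static List<String> transfer(List<String> events, int capacity) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        List<String> received = new ArrayList<>(); // written only by the consumer, read after join()

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String event = queue.take();   // blocks while the queue is empty
                    if (POISON.equals(event)) {
                        break;                     // producer signalled end of stream
                    }
                    received.add(event);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "consumer");
        consumer.start();

        try {
            for (String item : events) {
                queue.put(item);                   // blocks while the queue is full (back-pressure)
            }
            queue.put(POISON);
            consumer.join();                       // join() makes the consumer's writes visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            consumer.interrupt();
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(transfer(List.of("created", "paid", "shipped"), 2));
        // prints [created, paid, shipped]
    }
}
```

A small capacity makes the back-pressure visible: the producer cannot run more than `capacity` items ahead of the consumer.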
Production Patterns
🔬 Senior deep-dive: CompletableFuture for async orchestration
@Service
public class OrderService {
    @Autowired private InventoryClient inventory;
    @Autowired private PaymentClient payment;
    @Autowired private NotificationClient notification;
    // ❌ Sequential: total time = inventory + payment + notification
    public void processOrderSequential(Order order) {
        inventory.reserve(order);   // 50ms
        payment.charge(order);      // 100ms
        notification.send(order);   // 30ms
        // Total: 180ms
    }
    // ✅ Parallel where possible: total time = max(inventory, payment) + notification
    public CompletableFuture<Void> processOrderAsync(Order order) {
        // Reserve inventory and charge payment in parallel
        CompletableFuture<Void> inventoryFuture =
            CompletableFuture.runAsync(() -> inventory.reserve(order));
        CompletableFuture<Void> paymentFuture =
            CompletableFuture.runAsync(() -> payment.charge(order));
        // Wait for BOTH to complete, then send notification
        return CompletableFuture.allOf(inventoryFuture, paymentFuture)
            .thenRunAsync(() -> notification.send(order))
            .exceptionally(ex -> {
                // log: a logger field whose declaration is not shown here
                log.error("Order processing failed: {}", ex.getMessage());
                // Compensate: release inventory, refund payment
                return null;
            });
        // Total: max(50ms, 100ms) + 30ms = 130ms → about 28% faster
    }
}
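The "Compensate" comment in the handler above is left open. One possible shape, sketched here under assumptions, is to check which of the two parallel steps actually succeeded and undo only that one. The clients are replaced by strings appended to an in-memory list, and `OrderFlowSketch`, `process`, and the event names are hypothetical. The sketch relies on `allOf` completing only after both inputs have completed, so both futures can be inspected safely inside the handler.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.CopyOnWriteArrayList;

final class OrderFlowSketch {
    /** Runs "reserve" and "charge" in parallel; on failure, undoes whichever step succeeded. */
    static List<String> process(boolean paymentFails) {
        List<String> events = new CopyOnWriteArrayList<>(); // stand-in for real side effects

        CompletableFuture<Void> reserve =
            CompletableFuture.runAsync(() -> events.add("reserved"));
        CompletableFuture<Void> charge = CompletableFuture.runAsync(() -> {
            if (paymentFails) {
                throw new IllegalStateException("card declined");
            }
            events.add("charged");
        });

        CompletableFuture.allOf(reserve, charge)
            .thenRun(() -> events.add("notified"))
            .exceptionally(ex -> {
                // Failures from earlier stages arrive wrapped in CompletionException
                Throwable cause = (ex instanceof CompletionException && ex.getCause() != null)
                    ? ex.getCause() : ex;
                events.add("failed: " + cause.getMessage());
                // Both inputs are complete here, so their outcome can be checked
                if (!reserve.isCompletedExceptionally()) {
                    events.add("released");
                }
                if (!charge.isCompletedExceptionally()) {
                    events.add("refunded");
                }
                return null;
            })
            .join(); // block only so the demo can return the recorded events

        return events;
    }

    public static void main(String[] args) {
        System.out.println(process(true));  // prints [reserved, failed: card declined, released]
        System.out.println(process(false)); // reserved/charged in either order, then notified
    }
}
```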
🔬 Senior deep-dive: ForkJoinPool and parallel streams
// ForkJoinPool: designed for recursive divide-and-conquer tasks
// Uses work-stealing: idle threads steal tasks from busy threads' queues
ForkJoinPool pool = new ForkJoinPool(
    Runtime.getRuntime().availableProcessors(),
    ForkJoinPool.defaultForkJoinWorkerThreadFactory,
    null,  // uncaught exception handler (none)
    true   // async mode (FIFO for tasks that are never joined)
);
// Java parallel streams use the common ForkJoinPool by default
// WARNING: all parallel streams share the SAME common pool
// A heavy stream can starve other parallel streams
List<Order> orders = getOrders();
long total = orders.parallelStream()
    .mapToLong(Order::getTotal)
    .sum();
// Use a custom pool to isolate from the common pool.
// Note: this relies on implementation behaviour (a stream started inside a
// ForkJoinPool task runs in that pool), not on a documented guarantee.
ForkJoinPool customPool = new ForkJoinPool(4);
try {
    long isolatedTotal = customPool.submit(() ->
        orders.parallelStream()
            .mapToLong(Order::getTotal)
            .sum()
    ).get(); // throws InterruptedException / ExecutionException
} finally {
    customPool.shutdown(); // release the pool's worker threads
}
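The snippet above mentions recursive divide-and-conquer but only shows streams. For reference, a direct use of the fork/join API might look like the following sketch. `RangeSum`, the 10,000-element threshold, and the pool size of 4 are arbitrary choices for illustration, not tuned values.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

final class RangeSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // below this size, sum sequentially
    private final long[] values;
    private final int from; // inclusive
    private final int to;   // exclusive

    RangeSum(long[] values, int from, int to) {
        this.values = values;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += values[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        RangeSum left = new RangeSum(values, from, mid);
        RangeSum right = new RangeSum(values, mid, to);
        left.fork();                      // queue the left half; an idle worker may steal it
        long rightSum = right.compute();  // work on the right half in this thread
        return left.join() + rightSum;    // wait for (or run) the left half, then combine
    }

    /** Sums the array on a dedicated pool so the common pool is left alone. */
    static long sum(long[] values) {
        ForkJoinPool pool = new ForkJoinPool(4);
        try {
            return pool.invoke(new RangeSum(values, 0, values.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(sum(data)); // 1 + 2 + ... + 1,000,000 = 500000500000
    }
}
```

Forking one half and computing the other in the current thread keeps the calling worker busy instead of parking it while both halves run elsewhere.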
🔬 Senior deep-dive: diagnosing thread issues in production
# Get a thread dump of a running JVM process (non-intrusive)
kill -3 <pid>     # sends SIGQUIT → JVM prints thread dump to its stdout/log
jstack <pid>      # prints thread dump to console
jstack -l <pid>   # includes lock information (deadlock detection)
# Look for threads in these states:
# BLOCKED on lock: potential deadlock or lock contention
# WAITING at sun.misc.Unsafe.park (jdk.internal.misc.Unsafe.park on JDK 9+): threads waiting on a condition
# RUNNABLE at java.net.SocketInputStream.socketRead: threads blocked on I/O (reported as RUNNABLE because they sit in a native read)
# Detect deadlocks automatically:
jstack -l <pid> | grep -A 10 "deadlock"
# Example deadlock in thread dump:
# Thread A: waiting to acquire lock 0x00000007d5a44ab8
#           locked 0x00000007d5a44b28
# Thread B: waiting to acquire lock 0x00000007d5a44b28
#           locked 0x00000007d5a44ab8
# → Thread A holds what B needs; B holds what A needs = deadlock
// Programmatic thread monitoring
ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
// Detect deadlocks
long[] deadlockedIds = threadBean.findDeadlockedThreads();
if (deadlockedIds != null) {
    ThreadInfo[] infos = threadBean.getThreadInfo(deadlockedIds, true, true);
    for (ThreadInfo info : infos) {
        log.error("DEADLOCK detected: thread={} state={} blockedOn={}",
            info.getThreadName(), info.getThreadState(),
            info.getLockName());
    }
}
// Get all thread states for monitoring
ThreadInfo[] allThreads = threadBean.dumpAllThreads(false, false);
Map<Thread.State, Long> stateCount = Arrays.stream(allThreads)
    .collect(Collectors.groupingBy(ThreadInfo::getThreadState, Collectors.counting()));
// Alert if BLOCKED count is high (lock contention) or WAITING count is abnormal
# Spring Boot Actuator: expose thread metrics
management:
  endpoints:
    web:
      exposure:
        include: health, metrics, threaddump
# GET /actuator/threaddump → full thread dump as JSON
# GET /actuator/metrics/jvm.threads.states → thread counts by state
# GET /actuator/metrics/jvm.threads.peak → peak thread count
Common Mistakes
| Mistake | Problem | Fix |
|---|---|---|
| `thread.run()` instead of `thread.start()` | `run()` executes synchronously on the current thread; no new thread is created | Always call `thread.start()` |
| Unbounded thread creation (`new Thread()` per request) | 10,000 requests → 10,000 OS threads → OOM | Use a bounded `ExecutorService` or virtual threads |
| Not clearing `ThreadLocal` in thread pools | Next request sees previous request's stale values (a security bug) | Always call `threadLocal.remove()` in `finally` |
| `synchronized` on virtual threads | Pins the virtual thread to its carrier (JDK 21-23), which defeats the purpose | Use `ReentrantLock` instead of `synchronized` |
| Shared mutable state without synchronisation | Race conditions → non-deterministic results, data corruption | Use `synchronized`, `Lock`, `AtomicXxx`, or immutable objects |
| Catching `InterruptedException` and swallowing it | Thread interruption mechanism broken; the thread cannot shut down cleanly | Re-interrupt: `Thread.currentThread().interrupt()`, then handle or rethrow |
| Unbounded `LinkedBlockingQueue` in `ThreadPoolExecutor` | Queue grows without limit → OOM under sustained overload | Always bound queues: `new LinkedBlockingQueue<>(capacity)` |
| CPU-bound tasks on virtual threads | Virtual threads don't add parallelism beyond carrier thread count | Use platform threads sized to CPU cores for CPU-bound work |
| Deadlock from acquiring two locks in different orders | Thread A holds lock1, waits for lock2; Thread B holds lock2, waits for lock1 | Always acquire multiple locks in the same order everywhere |
| `parallel()` streams without a custom pool | All parallel streams share one common ForkJoinPool, so one heavy stream starves others | Use a dedicated `ForkJoinPool` for heavy parallel operations |
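The ThreadLocal row is easy to reproduce. The following sketch uses a single-thread executor so that two consecutive tasks are guaranteed to share one pooled thread; the second task then reports what it finds in the ThreadLocal. `ThreadLocalCleanup`, `CURRENT_USER`, and the value `"alice"` are made-up names for the demonstration.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ThreadLocalCleanup {
    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    /** Returns what the NEXT task on the same pooled thread sees after a "request" ran. */
    static String valueSeenByNextTask(boolean cleanUp) {
        ExecutorService pool = Executors.newSingleThreadExecutor(); // one thread, reused
        try {
            // First "request": sets per-thread state
            pool.submit(() -> {
                CURRENT_USER.set("alice");
                try {
                    // ... handle the request ...
                } finally {
                    if (cleanUp) {
                        CURRENT_USER.remove(); // the fix from the table
                    }
                }
            }).get();
            // Second "request" on the same thread: what does it inherit?
            return pool.submit(() -> CURRENT_USER.get()).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(valueSeenByNextTask(false)); // prints alice (stale value leaked)
        System.out.println(valueSeenByNextTask(true));  // prints null  (cleaned up)
    }
}
```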
🎯 Interview Questions
Q1. What is the difference between a process and a thread?
A process is an isolated instance of a running program with its own private memory address space: code, heap, stack, and data are all separate from other processes. A thread is a unit of execution within a process; all threads in a process share the same heap, code, and file descriptors but have their own stack, program counter, and registers. Processes communicate via IPC (pipes, sockets, shared memory), which is expensive. Threads communicate via shared memory, which is fast but requires synchronisation. A crash in one process doesn't affect others; a crash in one thread can kill the entire process.
Q2. What is a context switch and what are its costs?
A context switch is when the OS saves the current thread/process's CPU state (program counter, registers, stack pointer, flags) into its control block (PCB) and loads another thread/process's saved state. The cost is pure overhead: no useful work happens. Costs include: saving/restoring 15+ registers (~10ns each), flushing the TLB when switching between processes (subsequent memory accesses then incur TLB misses), cache pollution (the new thread evicts the previous thread's L1/L2 cached data), and kernel overhead. Typical cost: 1-5µs for thread switches, 5-15µs for process switches. Virtual thread switches in Java 21 cost < 1µs because they are managed entirely by the JVM with no kernel syscall.
Q3. What is the difference between BLOCKED, WAITING, and TIMED_WAITING in Java?
`BLOCKED`: the thread is waiting to acquire a `synchronized` monitor lock that another thread holds. `WAITING`: the thread has deliberately given up the CPU using `wait()`, `join()`, or `LockSupport.park()` with no timeout; it waits indefinitely until explicitly woken by `notify()`, the joined thread completing, or `unpark()`. `TIMED_WAITING`: same as WAITING but with a timeout, for example `sleep(n)`, `wait(n)`, `join(n)`. BLOCKED is the most dangerous in production because it signals lock contention that can grow into deadlocks. High BLOCKED thread counts in a thread dump indicate excessive synchronisation.
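The three states can be observed directly with `Thread.getState()`. The sketch below parks one thread in each state and polls until the state shows up; the class name `ThreadStates`, the 5-second polling deadline, and the 60-second sleep are arbitrary choices for the demonstration.

```java
import java.util.Arrays;
import java.util.concurrent.TimeUnit;

final class ThreadStates {
    /** Polls until the thread reports the expected state (or gives up after 5 seconds). */
    private static Thread.State awaitState(Thread t, Thread.State expected)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (t.getState() != expected) {
            if (System.nanoTime() > deadline) {
                throw new IllegalStateException(t.getName() + " never reached " + expected);
            }
            Thread.sleep(1);
        }
        return expected;
    }

    /** Returns the observed states of three threads: {BLOCKED, WAITING, TIMED_WAITING}. */
    static Thread.State[] observe() {
        final Object monitor = new Object();
        final Object condition = new Object();
        final boolean[] released = {false}; // guarded by `condition`

        Thread blocked = new Thread(() -> {
            synchronized (monitor) { } // contends for a monitor the main thread holds
        }, "blocked");
        Thread waiting = new Thread(() -> {
            synchronized (condition) {
                while (!released[0]) {
                    try {
                        condition.wait(); // no timeout: WAITING until notified
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            }
        }, "waiting");
        Thread timed = new Thread(() -> {
            try {
                Thread.sleep(60_000); // timeout: TIMED_WAITING
            } catch (InterruptedException e) {
                // woken early by the demo's cleanup
            }
        }, "timed");

        Thread.State[] states = new Thread.State[3];
        try {
            synchronized (monitor) { // hold the monitor so "blocked" cannot enter
                blocked.start();
                states[0] = awaitState(blocked, Thread.State.BLOCKED);
            }
            waiting.start();
            states[1] = awaitState(waiting, Thread.State.WAITING);
            timed.start();
            states[2] = awaitState(timed, Thread.State.TIMED_WAITING);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while observing", e);
        } finally {
            synchronized (condition) { // let the helper threads finish
                released[0] = true;
                condition.notifyAll();
            }
            timed.interrupt();
        }
        return states;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(observe())); // prints [BLOCKED, WAITING, TIMED_WAITING]
    }
}
```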
Q4. What is a race condition and how do you prevent it?
A race condition occurs when the correctness of a program depends on the relative timing of thread execution. Classic example: `counter++` is not atomic; it is three operations (read, increment, write). If two threads execute it concurrently, both may read the same value, each increment it, and both write the same result, so one increment is lost. Prevention: (1) `synchronized` blocks or methods for mutual exclusion; (2) `AtomicInteger.incrementAndGet()` for CAS-based lock-free atomicity; (3) immutable objects, so there is no shared mutable state; (4) `volatile` for simple visibility (not sufficient for compound operations); (5) confinement, where only one thread ever accesses a piece of data.
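The lost-update effect is easy to reproduce. In the sketch below (class and field names are illustrative), several threads increment a plain `int` and an `AtomicInteger` side by side. The atomic counter always reaches the expected total, while the plain counter may fall short; how often and by how much depends on the machine and the JIT.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class LostUpdates {
    private int plain;                                        // unsynchronised: ++ can lose updates
    private final AtomicInteger atomic = new AtomicInteger(); // CAS-based: never loses updates

    /** Returns {plainCount, atomicCount} after `threads` threads each increment `perThread` times. */
    static int[] run(int threads, int perThread) {
        LostUpdates counters = new LostUpdates();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counters.plain++;                  // read, add, write: three separate steps
                    counters.atomic.incrementAndGet(); // one atomic step
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // join() also makes the workers' writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        return new int[] {counters.plain, counters.atomic.get()};
    }

    public static void main(String[] args) {
        int[] result = run(4, 250_000);
        System.out.println("plain  = " + result[0] + " (may be below 1000000)");
        System.out.println("atomic = " + result[1] + " (always 1000000)");
    }
}
```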
Q5. What are virtual threads in Java 21 and how do they differ from platform threads?
Virtual threads (Project Loom) are lightweight JVM-managed threads that run on a small number of OS carrier threads. Unlike platform threads (1 Java thread = 1 OS kernel thread), virtual threads are unmounted from their carrier when they block on I/O, which frees the carrier thread to run other virtual threads. This enables millions of concurrent virtual threads on a handful of OS threads. Platform threads reserve a fixed stack of ~1MB (OS allocation) and take ~1ms to create; virtual threads start at a few KB and take ~1µs to create. For I/O-bound workloads (databases, HTTP, file I/O), virtual threads eliminate thread pool sizing concerns: create one per task. For CPU-bound workloads, virtual threads offer no advantage over platform threads; both are limited by physical core count.
Q6. What is a zombie process and how do you prevent it?
A zombie process has completed execution but its PCB entry remains in the process table because its parent hasn't called `wait()` to read its exit status. The child's code no longer runs, but its entry occupies a slot in the process table. Process table slots are finite (typically 32,768 on Linux). A zombie flood can exhaust all slots, preventing any new process creation, which is a form of DoS. Prevention: always call `wait()` or `waitpid()` in the parent after forking; register a `SIGCHLD` signal handler that calls `waitpid(-1, WNOHANG)` to reap all terminated children asynchronously; or double-fork so the intermediate process exits immediately, reparenting the grandchild to `init`, which handles cleanup automatically.
Q7. (Senior) How does the JVM threading model change with virtual threads, and what pitfalls remain?
Virtual threads implement an M:N threading model: M virtual threads multiplex onto N OS carrier threads (N ≈ CPU cores). The JVM scheduler mounts a virtual thread onto a carrier for execution and unmounts it when it blocks. Unmounting requires saving only the virtual thread's stack frames (small, growable) rather than involving the OS. Remaining pitfalls: (1) `synchronized` blocks pin the virtual thread to its carrier, so the carrier cannot serve other virtual threads while the pinned virtual thread waits for I/O. Use `ReentrantLock` instead. (2) `ThreadLocal` still works but is potentially wasteful: millions of virtual threads each holding a ThreadLocal value consume significant memory. Consider `ScopedValue` (JEP 446) for immutable per-scope data. (3) CPU-bound tasks still occupy the carrier thread; virtual threads only help when the task spends time waiting, not computing. (4) Native methods (JNI) may pin the virtual thread to its carrier if they block.