JVM Internals: Memory, GC & Class Loading
A guide to the Java Virtual Machine: runtime memory areas, garbage collection algorithms and collectors, class loading, and monitoring tools.
1. JVM Architecture Overview
┌─────────────────────────────────────────────────────────────┐
│                             JVM                             │
│  ┌──────────┐    ┌───────────────────────────────────────┐  │
│  │  Class   │    │          Runtime Data Areas           │  │
│  │  Loader  │    │  ┌──────────┐  ┌───────────────┐      │  │
│  │ Subsystem│───▶│  │  Method  │  │     Heap      │      │  │
│  └──────────┘    │  │   Area   │  │ (Young + Old) │      │  │
│                  │  └──────────┘  └───────────────┘      │  │
│                  │  ┌──────────┐  ┌───────────────┐      │  │
│                  │  │    VM    │  │    Program    │      │  │
│                  │  │  Stack   │  │    Counter    │      │  │
│                  │  └──────────┘  └───────────────┘      │  │
│                  │  ┌──────────────────────────────┐     │  │
│                  │  │     Native Method Stack      │     │  │
│                  │  └──────────────────────────────┘     │  │
│                  └───────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                   Execution Engine                    │  │
│  │    Interpreter + JIT Compiler + Garbage Collector     │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
2. Runtime Memory Areas
Heap (Shared, GC-managed)
👶 Beginner Concept: The "Warehouse and the Desk"
- The Heap (The Warehouse): A massive, shared storage facility where every object you create (new User(), new ArrayList()) lives until nothing references it anymore. It is large and shared by all threads, but it needs a Garbage Collector "janitor" to clean up abandoned items.
- The Stack (The Desk): Every thread gets its own small, private working desk. You cannot put a giant ArrayList on the desk; it only holds primitives (int, boolean) and "remote controls" (references) that point to objects in the warehouse. When a method returns, its part of the desk (its stack frame) is wiped clean instantly.
The largest memory area. Stores all object instances and arrays. Divided into generations for GC efficiency:
Heap
├── Young Generation
│   ├── Eden Space (~80% of young gen)
│   ├── Survivor 0 (S0) (~10%)
│   └── Survivor 1 (S1) (~10%)
└── Old Generation (Tenured)
- Eden: New objects are allocated here.
- Survivors: Objects that survive a minor GC move between S0 and S1.
- Old Generation: Long-lived objects promoted from young gen after surviving multiple GC cycles (default threshold: 15).
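Method Area / Metaspace (Shared)
Logically stores class metadata, static variables, the constant pool, and compiled code. In HotSpot since JDK 8, class metadata lives in Metaspace, while static fields and interned strings are kept on the heap and JIT-compiled code sits in a separate code cache.
- JDK 7 and earlier: PermGen (permanent generation), fixed maximum size, prone to OutOfMemoryError: PermGen space
- JDK 8+: Metaspace, stored in native memory (not heap), grows dynamically unless capped with -XX:MaxMetaspaceSize
# PermGen (JDK ≤ 7)
-XX:PermSize=256m -XX:MaxPermSize=512m
# Metaspace (JDK 8+)
-XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=512m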
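To see how these areas show up in a running JVM, the standard java.lang.management API can list the memory pools and whether each one counts as heap or non-heap. This is a minimal sketch; the class name MemoryPools is an assumption made for illustration, and the pool names printed depend on the JVM and the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

public class MemoryPools {
    /** Returns one line per JVM memory pool: type, name, bytes used. */
    public static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryType type = pool.getType(); // HEAP or NON_HEAP
            // getUsage() may return null if the pool is no longer valid
            long used = pool.getUsage() == null ? -1 : pool.getUsage().getUsed();
            lines.add(type + " " + pool.getName() + " used=" + used);
        }
        return lines;
    }

    public static void main(String[] args) {
        describePools().forEach(System.out::println);
    }
}
```

On a typical HotSpot JVM the heap pools correspond to Eden, Survivor, and Old, while Metaspace appears among the non-heap pools.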
VM Stack (Per-Thread)
Each thread has its own stack. Each method call creates a stack frame containing:
- Local variable array: method parameters and local variables
- Operand stack: intermediate computation values
- Frame data: constant pool reference, return address
🧠 Senior Deep Dive: Escape Analysis & Scalar Replacement
Seniors know an important JIT optimization: objects do NOT always end up on the heap. Since Java 6 (enabled by default from 6u23), the C2 compiler runs Escape Analysis. If it can prove that an object created inside a method never "escapes" that method (it isn't returned, stored in a field, or handed to another thread or to code that cannot be inlined), it can apply Scalar Replacement: the object is broken apart and its fields are kept directly in CPU registers or on the VM stack. No heap allocation happens, so those objects add no garbage collection work.
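As an illustration, the loop below creates objects that are candidates for scalar replacement. This is only a sketch: the class and method names are invented for the example, and whether the JIT actually removes the allocation depends on inlining decisions and the JVM version. Running it with and without -XX:-DoEscapeAnalysis while watching GC logs is one way to observe the difference.

```java
public class EscapeCandidate {
    // Small value holder; instances below never leave sumOfSquares().
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        int squaredLength() { return x * x + y * y; }
    }

    /** Each Point is created, read, and dropped inside the loop body. */
    public static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, i + 1); // not stored, not returned, not shared
            total += p.squaredLength();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1_000_000));
    }
}
```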
Errors:
- StackOverflowError: too many nested calls (e.g., infinite recursion)
- OutOfMemoryError: cannot allocate new thread stacks
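The first error is easy to provoke. The sketch below (class name StackDepth is hypothetical) recurses until the thread stack is exhausted and reports how many frames fit; the number varies with -Xss, the frame size, and JIT state, so no fixed value should be expected.

```java
public class StackDepth {
    private static int depth;

    private static void recurse() {
        depth++;
        recurse();
    }

    /** Recurses until the thread stack is exhausted and returns the depth reached. */
    public static int measure() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The stack unwinds back to this frame, so the thread can continue.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("frames before overflow: " + measure());
    }
}
```

Catching StackOverflowError is done here purely for demonstration; in production code it usually signals a bug to fix.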
Program Counter (Per-Thread)
A small memory area holding the address of the bytecode instruction currently being executed. Its value is undefined while the thread runs a native method.
Native Method Stack (Per-Thread)
Similar to the VM stack but for native (JNI) methods. The HotSpot JVM uses a single stack per thread for both Java and native frames.
3. Object Lifecycle
Object Creation
When the JVM encounters a new instruction:
- Class loading check: Is the class loaded? If not, trigger class loading.
- Memory allocation: Allocate space in Eden. Two strategies:
  - Bump-the-pointer: if the free space is contiguous (compacted heap), just move the pointer forward
  - Free list: if the heap is fragmented, find a suitable gap
- Initialize to zero: Set all fields to default values (0, null, false)
- Set object header: Store class pointer, hash code, GC age, lock info
- Execute <init>: Run the constructor
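The ordering of the last steps (zeroing first, constructor second) is observable from plain Java. In this sketch, with class names made up for the example, the superclass constructor calls an overridden method before the subclass field initializer has run, so it sees the zeroed default rather than 42.

```java
public class InitOrder {
    static class Base {
        final int seenBySuper;
        Base() {
            // Runs before Derived's field initializer, so value() still reads the zeroed field.
            seenBySuper = value();
        }
        int value() { return -1; }
    }

    static class Derived extends Base {
        int answer = 42; // assigned inside Derived.<init>, after Base.<init> returns
        @Override int value() { return answer; }
    }

    public static void main(String[] args) {
        Derived d = new Derived();
        System.out.println(d.seenBySuper); // 0
        System.out.println(d.answer);      // 42
    }
}
```

Note that the field is deliberately non-final: a final field initialized with a compile-time constant would be inlined by the compiler and hide the effect.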
Object Memory Layout
┌─────────────────────────────────────────┐
│              Object Header              │
│  ┌───────────────┐   ┌───────────────┐  │
│  │   Mark Word   │   │ Class Pointer │  │
│  │   (hash, GC   │   │  (pointer to  │  │
│  │  age, lock)   │   │  Class meta)  │  │
│  └───────────────┘   └───────────────┘  │
├─────────────────────────────────────────┤
│              Instance Data              │
│   (fields from this class + parents)    │
├─────────────────────────────────────────┤
│           Padding (alignment)           │
└─────────────────────────────────────────┘
Compressed OOPs & Object Alignment (Memory Optimization)
On 64-bit JVMs, an uncompressed object reference (an Ordinary Object Pointer, or OOP) occupies 8 bytes (64 bits). Compared to a 32-bit JVM, this pointer widening can increase heap consumption by roughly 30% to 40%, depending on how reference-heavy the data is. To mitigate this, the JVM uses an optimization called Compressed OOPs (-XX:+UseCompressedOops), which is on by default for heaps below about 32 GB.
The 8-Byte Object Alignment Trick
In the HotSpot JVM, all heap objects are aligned to 8-byte boundaries. An object's size is therefore always a multiple of 8, and the JVM adds 0 to 7 bytes of padding at the end of the object layout to satisfy this constraint.
Because every object address is a multiple of 8, the lower 3 bits of any object address are always 000:
- Address 8 is 00001000
- Address 16 is 00010000
- Address 24 is 00011000
The JVM exploits this by storing references in the heap as 32-bit values with those 3 zero bits dropped. When a reference is loaded from the heap, the 32-bit value is shifted left by 3 bits (and a heap base address is added if needed) to rebuild the real address; when a reference is stored, the address is shifted right by 3 bits.
This bit-shifting trick allows a 32-bit value (which on its own can only address 4 GB) to reference up to 32 GB of heap space:
$$\text{Max Addressable Space} = 2^{32} \times 8 \text{ bytes} = 32 \text{ GB}$$
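The arithmetic can be checked with a few lines of Java. This is a toy model of the encoding, not HotSpot's implementation: it works on plain offsets, ignores the optional heap base, and all names are invented for the example.

```java
public class CompressedOopMath {
    static final int SHIFT = 3; // log2 of the 8-byte object alignment

    /** Encode an 8-byte-aligned offset into 32 bits (the form stored in the heap). */
    public static int encode(long alignedOffset) {
        if ((alignedOffset & 7) != 0) {
            throw new IllegalArgumentException("not 8-byte aligned");
        }
        return (int) (alignedOffset >>> SHIFT);
    }

    /** Decode the 32-bit form back to the full offset. */
    public static long decode(int compressed) {
        return (compressed & 0xFFFF_FFFFL) << SHIFT; // treat the int as unsigned
    }

    /** Largest span addressable with 32 bits and 8-byte alignment. */
    public static long maxAddressableBytes() {
        return (1L << 32) << SHIFT;
    }

    public static void main(String[] args) {
        long offset = 24L << 30; // 24 GiB, beyond the 4 GiB raw 32-bit limit
        System.out.println(decode(encode(offset)) == offset);
        System.out.println((maxAddressableBytes() >> 30) + " GiB");
    }
}
```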
⚠️ The 32GB Heap Threshold Trap (Interview Critical)
When the configured heap size (-Xmx) reaches roughly 32 GB (the exact cutoff is usually slightly below 32 GB and depends on the OS and JVM build), the JVM disables Compressed OOPs and reverts to raw 64-bit pointers.
- The trap: When Compressed OOPs are disabled, every reference widens from 4 bytes to 8 bytes.
- The impact: A heap configured for 33 GB can hold fewer actual objects than a heap configured for 31 GB, because the wider references consume more memory, which also raises GC pressure.
- Senior Heuristic: Never set your heap size just over the threshold (e.g., 33-36 GB). If you need more than 31 GB of heap, jump well past it (around 40 GB or more) to compensate for pointer widening.
Object Access
Two approaches:
- Direct pointer (HotSpot): The reference points directly to the object. Faster access.
- Handle pool: The reference points to a handle containing pointers to both instance data and class data. More resilient during GC (only the pointer inside the handle changes when an object moves).
4. Garbage Collection
How GC Identifies Garbage
Reference Counting
Each object has a counter that is incremented when a reference is added and decremented when one is removed. The object is garbage when the count reaches 0.
Problem: Cannot detect circular references (A → B → A).
Reachability Analysis (Used by JVM)
Starting from GC Roots, traverse all reachable objects. Anything unreachable is garbage.
GC Roots include:
- Objects referenced in VM stack (local variables)
- Static fields in the method area
- Objects referenced by active threads
- JNI references
- Synchronized monitors
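The marking idea can be sketched as a graph traversal. The example below is a simplified model, not JVM code: objects are represented by string names, the reference graph is a map, and all identifiers are hypothetical. It shows that a cycle (x and y) is still garbage when no root leads to it, which is exactly the case reference counting misses.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class Reachability {
    /** Returns every node reachable from the roots by following outgoing references. */
    public static Set<String> mark(Map<String, List<String>> refs, List<String> roots) {
        Set<String> live = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String node = pending.pop();
            if (live.add(node)) { // first visit: follow its references
                pending.addAll(refs.getOrDefault(node, List.of()));
            }
        }
        return live;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = Map.of(
            "root", List.of("a"),
            "a", List.of("b"),
            "b", List.of(),
            "x", List.of("y"),  // x and y reference each other...
            "y", List.of("x")); // ...but nothing reachable points at them
        Set<String> live = mark(refs, List.of("root"));
        System.out.println(live.contains("b"));
        System.out.println(live.contains("x"));
    }
}
```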
GC Algorithms
Mark-Sweep
- Mark all reachable objects
- Sweep (free) unmarked objects
Pros: Simple. Cons: Memory fragmentation (scattered free spaces).
Mark-Compact (Mark-Sweep-Compact)
- Mark reachable objects
- Compact: move live objects to one end
- Clear the rest
Pros: No fragmentation. Cons: Slower (requires moving objects).
Copying
Divide memory into two halves. Copy live objects from one half to the other, then clear the first half.
Pros: Fast, no fragmentation. Cons: Wastes 50% of memory.
The Young generation uses a modified copying algorithm with Eden + 2 Survivors, so only one Survivor space (~10% of the young generation) sits empty at any time.
Generational Collection
Most objects die young (weak generational hypothesis). The JVM exploits this:
| Generation | Algorithm | Trigger | Name |
|---|---|---|---|
| Young | Copying (Eden → Survivor) | Eden full | Minor GC / Young GC |
| Old | Mark-Compact or Mark-Sweep | Old gen full | Major GC / Old GC |
| Both | Full heap collection | Various | Full GC (stop-the-world) |
Minor GC flow:
- New objects allocated in Eden
- Eden fills up → Minor GC triggered
- Live objects in Eden + active Survivor → copied to the empty Survivor
- Ages incremented; objects exceeding threshold (default 15) → promoted to Old gen
- If Survivor can't hold all survivors → overflow to Old gen
5. Garbage Collectors
Serial Collector (-XX:+UseSerialGC)
Single-threaded, stop-the-world. Suitable for small heaps and single-CPU machines.
Parallel Collector (-XX:+UseParallelGC)
Multi-threaded young + old gen collection. Throughput-oriented: it minimizes total GC time at the cost of longer individual pauses. Default in JDK 8.
CMS (Concurrent Mark Sweep) (-XX:+UseConcMarkSweepGC)
Low-latency collector for the old generation. Most work is done concurrently with application threads:
- Initial Mark (STW): mark objects directly reachable from GC roots
- Concurrent Mark: traverse the object graph concurrently
- Remark (STW): catch up on references changed during concurrent mark
- Concurrent Sweep: free dead objects concurrently
Downsides: CPU-intensive, produces fragmentation (no compaction), and falls back to a stop-the-world full GC ("concurrent mode failure") if the old gen fills up during collection. Deprecated since JDK 9, removed in JDK 14.
G1 (Garbage First) (-XX:+UseG1GC)
Region-based collector. Divides the heap into equal-sized regions (it aims for about 2048 of them). Each region can be Eden, Survivor, Old, or Humongous (for large objects).
┌─────┬─────┬─────┬─────┬─────┬─────┐
│ Eden│ Old │Surv │ Eden│ Old │Hum. │
├─────┼─────┼─────┼─────┼─────┼─────┤
│ Old │ Eden│Free │ Old │ Old │ Eden│
├─────┼─────┼─────┼─────┼─────┼─────┤
│Free │ Old │ Old │Surv │Free │ Old │
└─────┴─────┴─────┴─────┴─────┴─────┘
Key features:
- Predictable pause times: -XX:MaxGCPauseMillis=200 (target, not guarantee)
- Mixed collections: Can collect young + some old regions selectively
- Compacting: Copies live objects between regions, so no fragmentation builds up
- Default in JDK 9+
ZGC (-XX:+UseZGC)
Ultra-low-latency collector (sub-millisecond pauses) using colored pointers and load barriers.
- Pause times are designed to stay below 1 ms and do not grow with heap size.
- Supports heaps from 8 MB up to 16 TB.
- Concurrent relocation: moves objects while application threads are running, resolving fragmentation without STW pauses.
- Production-ready since JDK 15.
🧠 Senior Deep Dive: Generational ZGC (Java 21+ / JEP 439)
Historically, ZGC was a single-generation collector, meaning every GC cycle had to concurrently mark the entire heap. Under high allocation rates, this design could lead to allocation stalls: application threads ran out of free memory before the concurrent cycle finished and had to wait for it.
To solve this, Java 21 introduced Generational ZGC (-XX:+UseZGC -XX:+ZGenerational; generational mode became the default in JDK 23), which leverages the weak generational hypothesis (most objects die young) by splitting the heap into two logical generations:
- Young Generation: Collected frequently in a fast, low-overhead cycle.
- Old Generation: Collected less frequently.
Key Benefits over Non-Generational ZGC:
- Higher Throughput: Collecting only young objects requires scanning a fraction of the heap, releasing CPU cycles back to application threads.
- Fewer Allocation Stalls: Rapid reclamation of short-lived objects makes allocation stalls rare even under heavy load.
- Sub-millisecond Latency: Retains the core concurrent design of ZGC, keeping pause times under 1 millisecond (typically under 100 microseconds).
Collector Selection Guide
| Collector | Pause Target | Heap Size | Use Case |
|---|---|---|---|
| Serial | N/A | Small (< 100 MB) | Embedded, single-core |
| Parallel | High throughput | Medium | Batch processing |
| G1 | < 200ms | Medium-Large | General purpose (default) |
| ZGC | < 1ms | Any (up to TB) | Latency-critical apps |
| Shenandoah | < 10ms | Large | Low-latency alternative |
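To confirm which collector a given JVM actually selected, the GarbageCollectorMXBean API reports the active collectors along with their collection counts and accumulated times. A minimal sketch follows; the class name is an assumption, and the bean names differ per collector and JDK version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class ActiveCollectors {
    /** Names of the garbage collector beans registered in this JVM. */
    public static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionTime() is the approximate accumulated time in milliseconds
            System.out.println(gc.getName()
                + " collections=" + gc.getCollectionCount()
                + " timeMs=" + gc.getCollectionTime());
        }
    }
}
```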
6. Class Loading
Class Loading Process
Loading → Verification → Preparation → Resolution → Initialization
   │           │              │             │             │
   │           │              │             │             └─ Execute <clinit>
   │           │              │             │                (static initializers)
   │           │              │             └─ Resolve symbolic
   │           │              │                references to direct
   │           │              └─ Allocate memory for
   │           │                 static fields (set defaults)
   │           └─ Verify bytecode correctness
   │              (format, semantics, bytecode, symbol)
   └─ Read .class file into memory,
      create Class object
Class Loaders
Java uses a hierarchical delegation model (parent delegation):
Bootstrap ClassLoader (C/C++)
└── loads: java.lang.*, java.util.* (core JDK)
Extension ClassLoader (Java; replaced by the Platform ClassLoader in JDK 9+)
└── loads: jars in java.ext.dirs (JDK 8 and earlier); platform modules such as java.sql (JDK 9+)
Application ClassLoader (Java)
└── loads: classpath classes (your code)
Custom ClassLoader (your implementation)
└── loads: special sources (network, encrypted, etc.)
Parent Delegation Model
When a class needs to be loaded:
- Check if already loaded
- Delegate to parent class loader first
- If parent can't load it, try loading it yourself
// Simplified from java.lang.ClassLoader
protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
    synchronized (getClassLoadingLock(name)) {
        // 1. Already loaded?
        Class<?> c = findLoadedClass(name);
        if (c == null) {
            try {
                // 2. Delegate to parent (a null parent means the bootstrap loader; omitted here)
                c = parent.loadClass(name, false);
            } catch (ClassNotFoundException e) {
                // 3. Parent failed: load it ourselves
                c = findClass(name);
            }
        }
        if (resolve) {
            resolveClass(c);
        }
        return c;
    }
}
Why parent delegation?
- Security: Prevents malicious code from replacing core classes (e.g., custom java.lang.String)
- Consistency: Ensures core classes are loaded by the same loader
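The delegation order can be observed with a tiny custom loader that counts how often its own findClass fallback runs. This is a sketch under stated assumptions: EmptyLoader and the class name com.example.DoesNotExist are hypothetical and exist only for the demonstration.

```java
public class DelegationDemo {
    /** A loader with no classes of its own; everything must come from its parents. */
    static class EmptyLoader extends ClassLoader {
        int findClassCalls = 0;

        EmptyLoader(ClassLoader parent) { super(parent); }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            findClassCalls++; // reached only after the whole parent chain has failed
            throw new ClassNotFoundException(name);
        }
    }

    /** Loads the class through a fresh EmptyLoader and reports how often the fallback ran. */
    public static int fallbackCallsAfterLoading(String className) {
        EmptyLoader loader = new EmptyLoader(ClassLoader.getSystemClassLoader());
        try {
            loader.loadClass(className);
        } catch (ClassNotFoundException e) {
            // expected for names that no loader in the chain can resolve
        }
        return loader.findClassCalls;
    }

    public static void main(String[] args) {
        System.out.println(fallbackCallsAfterLoading("java.lang.String"));
        System.out.println(fallbackCallsAfterLoading("com.example.DoesNotExist"));
    }
}
```

For java.lang.String the parents answer first, so the custom loader never gets a chance to supply its own version; only the unknown name reaches findClass.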
Thread Context ClassLoader (TCCL)
While the parent delegation model is excellent for security and consistency, it has a fundamental limitation: core classes loaded by parent loaders cannot load classes that only exist in child loaders.
The Service Provider Interface (SPI) Conundrum
Consider the Java Database Connectivity (JDBC) API:
- The JDBC framework class java.sql.DriverManager is part of the core Java API and is loaded by the Bootstrap ClassLoader (the Platform ClassLoader in JDK 9+).
- When DriverManager tries to establish a connection, it uses Java's SPI (ServiceLoader) to find and load concrete database driver implementations (like com.mysql.cj.jdbc.Driver) present on your application's classpath.
- However, the classpath is loaded by the Application ClassLoader. Since the loader of DriverManager is a parent, it cannot see classes loaded by its child (the Application ClassLoader). Parent delegation only goes up, not down.
Bootstrap ClassLoader (DriverManager)
        │
        ▼ Parent Delegation (DriverManager tries to load MySQL Driver but fails)
Application ClassLoader (mysql-connector.jar)
Breaking the Hierarchy
To solve this chicken-and-egg problem, Java introduced the Thread Context ClassLoader (TCCL). Each thread holds a reference to a ClassLoader (Thread.currentThread().getContextClassLoader()), which defaults to the Application ClassLoader.
Core classes in the parent ClassLoader can "break" the hierarchy by fetching the context loader from the current running thread and using it to load the child classes:
// How DriverManager breaks parent delegation (simplified)
ClassLoader cl = Thread.currentThread().getContextClassLoader();
ServiceLoader<Driver> loadedDrivers = ServiceLoader.load(Driver.class, cl);
⚠️ Senior Context: ClassLoader Memory Leaks in Containers
In application servers (like Tomcat) or plug-in systems where applications are deployed/undeployed dynamically, TCCL can cause severe memory leaks:
- When a web application is deployed, Tomcat creates a custom WebappClassLoader and sets it as the TCCL for the request thread.
- If the application starts a thread pool or registers a ThreadLocal that isn't cleaned up, the thread retains a strong reference to the WebappClassLoader via its context class loader.
- When the web application is undeployed, the GC cannot reclaim the classloader or any of the classes it loaded because the thread still references it. Repeated redeploys eventually lead to OutOfMemoryError: Metaspace.
- Mitigation: Always restore the original context classloader in a finally block, and stop custom threads on application shutdown.
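The restore-in-finally mitigation can be wrapped in a small helper. This is one possible shape, not a container API: ContextLoaderScope and callWith are names invented for this sketch, and the anonymous ClassLoader stands in for a webapp or plug-in loader.

```java
import java.util.function.Supplier;

public class ContextLoaderScope {
    /** Runs the task with the given context class loader, then always restores the previous one. */
    public static <T> T callWith(ClassLoader loader, Supplier<T> task) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            return task.get();
        } finally {
            current.setContextClassLoader(previous); // runs even if the task throws
        }
    }

    public static void main(String[] args) {
        ClassLoader before = Thread.currentThread().getContextClassLoader();
        ClassLoader plugin = new ClassLoader(before) {}; // stand-in for a webapp loader
        ClassLoader seen = callWith(plugin, () -> Thread.currentThread().getContextClassLoader());
        System.out.println(seen == plugin);
        System.out.println(Thread.currentThread().getContextClassLoader() == before);
    }
}
```

Because the previous loader is always put back, a pooled thread that outlives the application does not keep the plug-in loader reachable through its context class loader.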
7. Class File Structure
Every .class file follows a strict binary format:
ClassFile {
    u4 magic;                      // 0xCAFEBABE
    u2 minor_version;
    u2 major_version;              // Java 17 = 61
    u2 constant_pool_count;
    cp_info constant_pool[];       // literals, type refs, method refs
    u2 access_flags;               // public, final, abstract, etc.
    u2 this_class;
    u2 super_class;
    u2 interfaces_count;
    u2 interfaces[];
    u2 fields_count;
    field_info fields[];
    u2 methods_count;
    method_info methods[];
    u2 attributes_count;
    attribute_info attributes[];
}
Use javap -verbose MyClass.class to inspect the structure.
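The first three fields can also be read directly with a DataInputStream, since class files are big-endian. The sketch below reads the header of a class that is already on the runtime image; the class name ClassFileHeader is an assumption, and the major version printed depends on the JDK running the code.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ClassFileHeader {
    /** Returns {magic, minor_version, major_version} read from the start of a class file. */
    public static int[] read(Class<?> type) {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream in = type.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalStateException("class bytes not found: " + resource);
            }
            DataInputStream data = new DataInputStream(in);
            int magic = data.readInt();           // u4 magic
            int minor = data.readUnsignedShort(); // u2 minor_version
            int major = data.readUnsignedShort(); // u2 major_version
            return new int[] { magic, minor, major };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] header = read(Object.class);
        System.out.printf("magic=0x%08X major=%d%n", header[0], header[2]);
    }
}
```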
8. Important JVM Parameters
Heap Sizing
# Initial and maximum heap size
-Xms512m # initial heap (set equal to -Xmx to avoid resizing)
-Xmx2g # maximum heap
# Young generation size
-Xmn512m # young gen size
-XX:NewRatio=2 # old:young ratio (default 2, i.e. old is 2x young)
# Metaspace
-XX:MetaspaceSize=256m
-XX:MaxMetaspaceSize=512m
GC Configuration
# Select collector
-XX:+UseG1GC # G1 (default JDK 9+)
-XX:+UseZGC # ZGC
-XX:+UseParallelGC # Parallel (default JDK 8)
# G1 tuning
-XX:MaxGCPauseMillis=200 # target pause time
-XX:G1HeapRegionSize=4m # region size (1-32 MB, power of 2)
# GC logging (JDK 9+)
-Xlog:gc*:file=gc.log:time,uptime,level,tags
Thread Stack
-Xss512k # thread stack size (default ~1MB)
Troubleshooting
# Heap dump on OOM
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/path/to/dump.hprof
# Print GC details
-verbose:gc
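When a flag does not seem to take effect, it helps to check what the running JVM actually received. A small sketch using standard APIs (the class name JvmSettings is hypothetical): Runtime.maxMemory() reflects the effective maximum heap, and RuntimeMXBean.getInputArguments() lists the JVM options passed at startup.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class JvmSettings {
    /** Effective maximum heap in bytes (roughly what -Xmx resolved to). */
    public static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** JVM options passed on the command line, e.g. -Xmx2g or -XX:+UseG1GC. */
    public static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("max heap MB: " + maxHeapBytes() / (1024 * 1024));
        jvmArguments().forEach(System.out::println);
    }
}
```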
9. JIT Compilation (HotSpot C1 / C2)
The JVM doesn't just interpret bytecode; it dynamically compiles hot code to native machine code. Understanding the tiers is critical for diagnosing startup slowdowns and latency spikes.
Compilation Tiers
| Tier | Compiler | Description |
|---|---|---|
| 0 | Interpreter | Execute bytecode directly (cold start) |
| 1-3 | C1 (Client) | Quick compilation with basic optimization |
| 4 | C2 (Server) | Aggressive optimization: inlining, loop unrolling, escape analysis |
"Compilation storm": At startup, many methods reach the hot threshold simultaneously → the JIT compiler threads are overwhelmed → CPU spike, latency increase. Common in Kubernetes when pods receive traffic immediately.
Mitigation: GraalVM Native Image (AOT) for instant startup; on a regular JVM, tiered compilation (-XX:+TieredCompilation, on by default since JDK 8) lets C1 produce usable native code quickly while C2 catches up.
Deoptimization
JIT makes optimistic assumptions, e.g., that a virtual call site only ever sees one concrete type (a monomorphic call). When assumptions break:
// While only Dog instances reach this call site, the JIT can inline Dog.speak()
void speak(Animal a) { a.speak(); }
// First Cat appears → the inlining assumption is invalid → deoptimize → back to the interpreter
speak(new Cat());
Rarely taken code paths and rare types can therefore cause unexpected production latency spikes even after warm-up.
-XX:+PrintCompilation # See which methods JIT compiles
-XX:CompileThreshold=10000 # Invocations before compilation (default 10000; only applies when tiered compilation is disabled)
10. G1 GC: Internal Mechanics
Humongous Objects
Objects that are at least 50% of a region in size are allocated directly in humongous regions (one or more contiguous regions treated as part of the Old gen). Young collections never move them; they are reclaimed after a concurrent marking cycle or a full GC, and since JDK 8u60 some can be freed eagerly during young collections. Frequent humongous allocations waste space and fragment the heap.
# Fix: increase region size to reduce humongous allocations
-XX:G1HeapRegionSize=32m
Remembered Sets (RSet) and SATB
Remembered Sets: Each G1 region tracks references that point into it from other regions. This lets G1 collect a single region without scanning the entire heap.
SATB (Snapshot-At-The-Beginning): G1's write barrier during concurrent marking. When a reference is overwritten, G1 records the old value in an SATB log buffer. This ensures that objects alive at mark-start are treated as live even if the pointers to them are overwritten during marking.
// Before the store, the SATB pre-write barrier logs the previous value of obj.field
obj.field = newRef;
Without SATB, a concurrent mutator could hide a live object from the marking thread, causing premature collection.
Mixed Collections
After a full concurrent mark cycle, G1 picks the highest-garbage-density Old regions and collects them alongside Young gen:
-XX:G1MixedGCLiveThresholdPercent=85 # Only collect Old regions < 85% live data
-XX:G1HeapWastePercent=5 # Stop mixed GC if < 5% heap is reclaimable
# Diagnosing pauses:
-Xlog:gc*:file=gc.log:time,uptime,level,tags
# Look for: "Pause Full", which means G1 fell back to a stop-the-world full GC (bad!)
11. JDK Monitoring & Troubleshooting Tools
Command-Line Tools
| Tool | Purpose | Example |
|---|---|---|
| jps | List running JVM processes | jps -lv |
| jstat | GC and memory statistics | jstat -gcutil <pid> 1000 |
| jinfo | View/modify JVM flags | jinfo -flags <pid> |
| jmap | Heap dump and histogram | jmap -dump:format=b,file=heap.hprof <pid> |
| jstack | Thread dump (diagnose deadlocks) | jstack <pid> |
| jcmd | All-in-one diagnostic tool | jcmd <pid> GC.heap_info |
Graphical Tools
- JVisualVM: bundled with the JDK up to JDK 8 (a separate VisualVM download afterwards); monitors heap, threads, CPU
- JConsole: JMX-based monitoring console
- Eclipse MAT: heap dump analysis, find memory leaks
- Arthas: powerful runtime diagnostic tool (bytecode-level debugging)
Common Troubleshooting Scenarios
OutOfMemoryError: Java heap space
- Generate heap dump: -XX:+HeapDumpOnOutOfMemoryError
- Analyze with Eclipse MAT: find objects consuming most memory
- Check for memory leaks (growing collections, unclosed resources)
High CPU usage
- top -H -p <pid>: find the CPU-intensive thread (note the TID)
- jstack <pid>: find the thread by TID (convert it to hex and match the nid=0x... field)
- Analyze the stack trace
Deadlock detection
- jstack <pid>: JVM automatically detects and reports deadlocks
- Look for "Found one Java-level deadlock" in the output
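Deadlock detection is also available from inside the process through ThreadMXBean, which is useful for health checks. The sketch below deliberately creates a two-thread lock-order deadlock on daemon threads and then polls for it; DeadlockProbe and its method names are assumptions made for the example, and the polling interval is arbitrary.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class DeadlockProbe {
    private static Thread locker(Object first, Object second, CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await(); // wait until the other thread holds its first lock
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) { } // blocks forever: opposite lock order
            }
        });
        t.setDaemon(true); // let the JVM exit despite the deadlock
        return t;
    }

    /** Creates a two-thread deadlock and returns how many deadlocked threads the JVM reports. */
    public static int provokeAndDetect() {
        Object a = new Object(), b = new Object();
        CountDownLatch latch = new CountDownLatch(2);
        locker(a, b, latch).start();
        locker(b, a, latch).start();

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) { // poll for up to about 5 seconds
            long[] ids = threads.findDeadlockedThreads(); // null when no deadlock exists
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + provokeAndDetect());
    }
}
```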
Frequent Full GC
- jstat -gcutil <pid> 1000: monitor GC frequency and duration
- Check if old gen is filling up (memory leak?) or if young gen is too small (premature promotion)
- Consider switching to G1 or ZGC for better pause behavior
12. Java Agents & Instrumentation (Telemetry Hooks)
For Senior and Lead developers working on APM (Application Performance Monitoring) tools or custom frameworks, understanding Java Agents is essential.
What is a Java Agent?
A Java Agent is a pluggable JVM-level tool that uses the Java Instrumentation API (java.lang.instrument) to intercept and modify the bytecode of classes loaded into the JVM.
Execution Mechanisms
A Java Agent can be loaded in two ways:
1. Static Loading (premain)
The agent is specified at JVM startup using the -javaagent flag. The JVM runs the agent's premain method (in the class named by the Premain-Class attribute of the agent jar's manifest) before the application's main method starts.
// Command: java -javaagent:myagent.jar -jar myapp.jar
public static void premain(String agentArgs, Instrumentation inst) {
    inst.addTransformer(new MyClassFileTransformer());
}
2. Dynamic Attachment (agentmain)
The agent is loaded into an already running JVM through the Attach API's VirtualMachine class (tools.jar in JDK 8 and earlier, the jdk.attach module since JDK 9). The agent jar's manifest must name the class in its Agent-Class attribute.
public static void agentmain(String agentArgs, Instrumentation inst) throws UnmodifiableClassException {
    // true = this transformer may be applied when classes are retransformed
    inst.addTransformer(new MyClassFileTransformer(), true);
    // Force retransformation of already-loaded classes
    // (requires Can-Retransform-Classes: true in the agent manifest)
    inst.retransformClasses(TargetClass.class);
}
Bytecode Modification
Inside the ClassFileTransformer, you inspect the class bytes, modify them (usually using libraries like ByteBuddy, ASM, or Javassist), and return the modified byte array:
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;

public class MyClassFileTransformer implements ClassFileTransformer {
    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        // className uses the internal form with '/' separators
        if ("com/example/service/BillingService".equals(className)) {
            // Intercept billing methods, inject entry/exit logs or latency trackers
            return injectLatencyProfilingBytes(classfileBuffer);
        }
        return null; // Return null to indicate no changes
    }
}
Real-world Use Cases:
- APM Tooling (Datadog, Dynatrace, New Relic): Auto-instruments database drivers, HTTP controllers, and outbound clients to record transaction traces and execution metrics without changing application code.
- Dynamic Profiling (async-profiler, Arthas): Inspects class bytecode and system metrics dynamically in production.
- Frameworks & Testing (Lombok, Mockito): Lombok works at compile time through annotation processing and is not an agent, whereas Mockito uses runtime bytecode generation (ByteBuddy) to mock interfaces and classes.
13. Reference Types & GC
Java provides four reference types that influence garbage collection behavior:
| Reference Type | Class | GC Behavior | Use Case |
|---|---|---|---|
| Strong | (default) | Never collected while reachable | Normal references |
| Soft | SoftReference<T> | Collected when JVM is low on memory | Memory-sensitive caches |
| Weak | WeakReference<T> | Collected at next GC | WeakHashMap, canonicalizing maps |
| Phantom | PhantomReference<T> | Enqueued after finalization | Resource cleanup tracking |
// Soft reference: cache that yields to memory pressure
SoftReference<byte[]> cache = new SoftReference<>(new byte[1024 * 1024]);
byte[] data = cache.get(); // may be null if GC reclaimed it
// Weak reference: doesn't prevent GC
WeakReference<ExpensiveObject> ref = new WeakReference<>(new ExpensiveObject());
ExpensiveObject obj = ref.get(); // null after GC
ReferenceQueues for Cleanup
To cleanly handle post-mortem resources, you can register soft, weak, or phantom references with a ReferenceQueue.
When the garbage collector decides to reclaim the referent (the object referenced), it automatically clears the reference (sets it to null) and appends the reference container itself (the SoftReference or WeakReference instance) to the registered ReferenceQueue.
The application can poll or block on this queue in a background thread to safely release associated native resources (like database connections, file handles, or off-heap memory) without using slow, deprecated finalize() methods.
ReferenceQueue<ExpensiveObject> queue = new ReferenceQueue<>();
WeakReference<ExpensiveObject> ref = new WeakReference<>(new ExpensiveObject(), queue);
// ... later, after ExpensiveObject has been garbage-collected ...
Reference<? extends ExpensiveObject> clearedRef = queue.poll();
if (clearedRef != null) {
// Perform resource cleanup associated with this reference
}
👻 Phantom References Require ReferenceQueue
Unlike Soft and Weak references, a PhantomReference's get() method always returns null. This prevents the application from accidentally resurrecting the object during garbage collection.
A PhantomReference is useless without a ReferenceQueue. It serves purely as a notification mechanism: it is enqueued once the object has been finalized and its memory is about to be reclaimed by the GC.
⚙️ Production Example: DirectByteBuffer & Cleaner
The most notable use of PhantomReference and ReferenceQueue is Java's off-heap memory management:
- When you allocate off-heap memory using ByteBuffer.allocateDirect(10 * 1024), the JVM creates a DirectByteBuffer object on the heap.
- This heap object references a native memory address allocated outside the JVM heap.
- To prevent memory leaks, DirectByteBuffer registers a phantom reference with a Cleaner (which uses a ReferenceQueue internally).
- When the heap-based DirectByteBuffer is garbage-collected, the phantom reference is enqueued in the ReferenceQueue.
- A system-level daemon thread polls this queue and frees the associated off-heap native memory using unsafe.freeMemory().
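Application code can use the same pattern through the public java.lang.ref.Cleaner API (JDK 9+), which is a separate class from the internal cleaner used by DirectByteBuffer. The sketch below is hypothetical: NativeBufferSketch does not allocate real native memory, and a flag stands in for the free call so the behavior can be observed.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicBoolean;

public class NativeBufferSketch implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    // The cleanup action must not reference the owning object,
    // otherwise the owner would never become phantom reachable.
    private static final class Release implements Runnable {
        private final AtomicBoolean released;
        Release(AtomicBoolean released) { this.released = released; }
        @Override public void run() {
            released.set(true); // stand-in for freeing native memory
        }
    }

    private final AtomicBoolean released = new AtomicBoolean(false);
    private final Cleaner.Cleanable cleanable;

    public NativeBufferSketch() {
        this.cleanable = CLEANER.register(this, new Release(released));
    }

    public boolean isReleased() { return released.get(); }

    @Override public void close() {
        cleanable.clean(); // runs the action at most once, whether called here or after GC
    }

    public static void main(String[] args) {
        try (NativeBufferSketch buffer = new NativeBufferSketch()) {
            System.out.println(buffer.isReleased());
        }
    }
}
```

Explicit close() remains the preferred path; the Cleaner registration acts as a safety net for instances that are dropped without being closed.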
14. Common OOM Scenarios & Solutions
| Error | Cause | Solution |
|---|---|---|
| OutOfMemoryError: Java heap space | Heap exhausted | Increase -Xmx, fix memory leaks |
| OutOfMemoryError: Metaspace | Too many classes loaded | Increase -XX:MaxMetaspaceSize, fix classloader leaks |
| OutOfMemoryError: GC overhead limit exceeded | GC consuming over 98% of total time while recovering under 2% of the heap | Fix memory leaks, increase heap |
| StackOverflowError | Deep/infinite recursion | Fix recursion, increase -Xss |
| OutOfMemoryError: unable to create new native thread | Too many threads | Use thread pools, reduce stack size |
Advanced Editorial Pass: JVM Internals for Operational Excellence
Senior-Level Focus
- GC tuning is workload-specific and must be tied to SLO outcomes.
- Heap, metaspace, and thread configuration are architecture choices, not defaults.
- Classloading and JIT behavior can materially impact startup and latency profiles.
Failure Modes in Production
- Over-tuned JVM flags copied between services with different traffic patterns.
- Memory leaks masked by oversized heaps until incident windows.
- Misinterpreting GC logs without correlating application-level latency.
Practical Heuristics
- Treat JVM tuning as iterative experimentation with measurable hypotheses.
- Baseline key metrics before any flag change.
- Keep service-specific runbooks for memory, GC, and thread incidents.
Interview Questions
Q: How do you choose between G1 and ZGC for a backend service?
A: G1 is a strong default for balanced throughput and latency; ZGC is preferred for strict low-latency requirements with larger heaps.
Q: What metrics indicate GC tuning is required?
A: Rising tail latency, frequent long pauses, promotion failures, and high GC CPU share under normal load.
Q: Why is allocation rate often more important than heap size?
A: High allocation churn drives GC pressure even on large heaps, so reducing object churn often beats increasing memory.
Q: How do classloader leaks usually appear in production?
A: Metaspace growth over time after redeploy/plugin cycles and inability to reclaim old class metadata.
Q: What is a practical JVM tuning workflow for senior engineers?
A: Baseline, form a hypothesis, apply one controlled change, validate with load and latency data, then iterate.
Q: Why are full GC events high priority incidents?
A: They are stop-the-world and can trigger latency spikes, timeouts, and cascading failures.
Q: How do you explain JIT warmup impact during autoscaling?
A: New pods initially run colder code paths, so p95/p99 latency can temporarily degrade until optimization stabilizes.