
JVM Internals: Memory, GC & Class Loading

A guide to the Java Virtual Machine: runtime memory areas, garbage collection algorithms and collectors, class loading, and monitoring tools.


1. JVM Architecture Overview​

┌──────────────────────────────────────────────────────────────┐
│                             JVM                              │
│  ┌─────────┐    ┌─────────────────────────────────────────┐  │
│  │  Class  │    │           Runtime Data Areas            │  │
│  │  Loader │    │  ┌──────────┐   ┌───────────────┐       │  │
│  │Subsystem│──▶ │  │  Method  │   │     Heap      │       │  │
│  └─────────┘    │  │   Area   │   │ (Young + Old) │       │  │
│                 │  └──────────┘   └───────────────┘       │  │
│                 │  ┌──────────┐   ┌───────────────┐       │  │
│                 │  │    VM    │   │    Program    │       │  │
│                 │  │  Stack   │   │    Counter    │       │  │
│                 │  └──────────┘   └───────────────┘       │  │
│                 │  ┌──────────────────────────────┐       │  │
│                 │  │     Native Method Stack      │       │  │
│                 │  └──────────────────────────────┘       │  │
│                 └─────────────────────────────────────────┘  │
│  ┌────────────────────────────────────────────────────────┐  │
│  │                    Execution Engine                    │  │
│  │     Interpreter + JIT Compiler + Garbage Collector     │  │
│  └────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────┘

2. Runtime Memory Areas​

Heap (Shared, GC-managed)​

👶 Beginner Concept: The "Warehouse and the Desk"

  • The Heap (The Warehouse): This is a massive, shared storage facility where every object you create (new User(), new ArrayList()) lives until nothing references it anymore. It is huge and shared by all threads, but it needs a janitor, the Garbage Collector, to clean up abandoned items.
  • The Stack (The Desk): Every thread gets its own small, private working desk. You cannot put a giant ArrayList on the desk. You can only put primitives (int, boolean) and Remote Controls (pointers/references to objects in the warehouse) on it. When a method finishes, its part of the desk (its stack frame) is instantly wiped clean.

The largest memory area. Stores all object instances and arrays. Divided into generations for GC efficiency:

Heap
├── Young Generation
│   ├── Eden Space (~80% of young gen)
│   ├── Survivor 0 (S0) (~10%)
│   └── Survivor 1 (S1) (~10%)
└── Old Generation (Tenured)

  • Eden: New objects are allocated here.
  • Survivors: Objects that survive a minor GC move between S0 and S1.
  • Old Generation: Long-lived objects promoted from young gen after surviving multiple GC cycles (default threshold: 15).
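
To see how the running JVM actually names and splits these areas, the standard java.lang.management API can list the heap memory pools. This is a minimal sketch; the class name HeapPools is an illustrative assumption, and the pool names printed depend on the collector in use (for example "G1 Eden Space" under G1).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original guide.
class HeapPools {
    /** Returns the names of the heap memory pools reported by the running JVM. */
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Names vary by collector, so no fixed output is assumed here.
        heapPoolNames().forEach(System.out::println);
    }
}
```

Running it with different collector flags (for example -XX:+UseSerialGC versus -XX:+UseG1GC) is a quick way to compare how each collector lays out the generations.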

Method Area / Metaspace (Shared)​

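Stores class metadata and the runtime constant pool. The JVM specification also places static variables and compiled code here conceptually; in HotSpot since JDK 7/8, static fields live on the heap alongside the Class object and JIT-compiled code lives in a separate code cache.

  • JDK 7 and earlier: PermGen (permanent generation), fixed size, prone to OutOfMemoryError: PermGen space
  • JDK 8+: Metaspace, stored in native memory (not heap), grows dynamically

// PermGen (JDK ≤ 7)
-XX:PermSize=256m -XX:MaxPermSize=512m

// Metaspace (JDK 8+)
-XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=512m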

VM Stack (Per-Thread)​

Each thread has its own stack. Each method call creates a stack frame containing:

  • Local variable array: method parameters and local variables
  • Operand stack: intermediate computation values
  • Frame data: constant pool reference, return address

🧠 Senior Deep Dive: Escape Analysis & Scalar Replacement​

Seniors know a critical JIT optimization: objects do NOT always go to the heap. Since Java 6, the JIT compiler runs Escape Analysis. If the compiler proves that an object created inside a method never "escapes" that method (it isn't returned, stored in a field, or handed to another thread or to code that isn't inlined), it can perform Scalar Replacement. The JIT breaks the object apart and keeps its fields directly in CPU registers or stack slots. This avoids the heap allocation entirely, so those objects add no garbage collection work.
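
As a hedged illustration of a non-escaping object, the sketch below allocates a small Point inside a loop and only reads its fields locally. The class and method names are assumptions made for this example. Whether the JIT actually applies scalar replacement depends on inlining and warm-up, so the code only shows the shape of a candidate, not a guarantee.

```java
// Hypothetical example class, not part of the original guide.
class EscapeAnalysisSketch {
    /** Small value holder used only inside sumOfSquares. */
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // The Point is never returned, stored in a field, or passed elsewhere,
    // so once this method is JIT-compiled it is a candidate for scalar replacement.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, i + 1);
            total += (long) p.x * p.x + (long) p.y * p.y;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1_000_000));
    }
}
```

Comparing allocation rates with and without -XX:-DoEscapeAnalysis is one way to observe the effect on a real JVM.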

Errors:

  • StackOverflowError: too many nested calls (e.g., infinite recursion)
  • OutOfMemoryError: cannot allocate new thread stacks
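
The first error is easy to provoke deliberately. The sketch below (StackDepthProbe is an illustrative name) recurses until the thread's stack is exhausted and reports how many frames fit. The number it prints depends on -Xss, the frame size, and JIT state, so no specific value is assumed.

```java
// Hypothetical probe class, not part of the original guide.
class StackDepthProbe {
    private static int depth;

    // Unbounded recursion: each call pushes one more frame onto the thread's stack.
    private static void recurse() {
        depth++;
        recurse();
    }

    /** Returns the number of nested calls reached before StackOverflowError was thrown. */
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError expected) {
            // The stack unwinds to this frame, so the thread can continue normally.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("frames before overflow: " + measureDepth());
    }
}
```

Running it with a smaller stack, for example -Xss256k, should report a noticeably lower depth.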

Program Counter (Per-Thread)​

A small memory area holding the address of the current bytecode instruction being executed. Undefined for native methods.

Native Method Stack (Per-Thread)​

Similar to the VM stack but for native (JNI) methods. HotSpot JVM combines native method stack and VM stack.


3. Object Lifecycle​

Object Creation​

When the JVM encounters a new instruction:

  1. Class loading check: Is the class loaded? If not, trigger class loading.
  2. Memory allocation: Allocate space in Eden. Two strategies:
    • Bump-the-pointer: if heap is compacted, just move the pointer forward
    • Free list: if heap is fragmented, find a suitable gap
  3. Initialize to zero: Set all fields to default values (0, null, false)
  4. Set object header: Store class pointer, hash code, GC age, lock info
  5. Execute <init>: Run the constructor

Object Memory Layout​

┌─────────────────────────────────────────┐
│              Object Header              │
│  ┌───────────────┐ ┌───────────────┐    │
│  │ Mark Word     │ │ Class Pointer │    │
│  │ (hash, GC     │ │ (pointer to   │    │
│  │ age, lock)    │ │ Class meta)   │    │
│  └───────────────┘ └───────────────┘    │
├─────────────────────────────────────────┤
│              Instance Data              │
│   (fields from this class + parents)    │
├─────────────────────────────────────────┤
│           Padding (alignment)           │
└─────────────────────────────────────────┘

Compressed OOPs & Object Alignment (Memory Optimization)​

On 64-bit JVMs, object references (known as Ordinary Object Pointers / OOPs) occupy 8 bytes (64 bits) of memory. This pointer widening increases heap consumption by 30% to 40% compared to 32-bit JVMs. To mitigate this, the JVM uses an optimization called Compressed OOPs (-XX:+UseCompressedOops).

The 8-Byte Object Alignment Trick​

In HotSpot JVM, all objects allocated on the heap are aligned to 8-byte boundaries by default. This means an object's memory size is always a multiple of 8, and the JVM adds 0 to 7 bytes of padding at the end of the object layout to satisfy this constraint.

Because every object address is a multiple of 8, the lower 3 bits of any object memory address are always 000:

  • Address 8 is 00001000
  • Address 16 is 00010000
  • Address 24 is 00011000

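The JVM exploits this by storing references in the heap as 32-bit values with those three zero bits dropped: it shifts the address right by 3 bits when storing a reference into the heap, and shifts the 32-bit value left by 3 bits (adding the heap base address if one is in use) when loading it back.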

This bit-shifting trick allows a 32-bit pointer (which on its own can only address 4 GB of memory) to reference up to 32 GB of heap space:

$$\text{Max Addressable Space} = 2^{32} \times 8 \text{ bytes} = 2^{35} \text{ bytes} = 32 \text{ GB}$$
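
The arithmetic can be checked with a few lines of Java. This is a simplified model, assuming a zero heap base and the default 8-byte alignment; the class and method names are illustrative and do not correspond to HotSpot internals.

```java
// Hypothetical model of compressed-reference encoding, not HotSpot code.
class CompressedOopsMath {
    static final int SHIFT = 3; // log2 of the 8-byte object alignment

    /** Encodes an 8-byte-aligned address into a 32-bit compressed reference. */
    static int encode(long address) {
        if ((address & 7) != 0) {
            throw new IllegalArgumentException("address is not 8-byte aligned");
        }
        return (int) (address >>> SHIFT);
    }

    /** Decodes a compressed reference back to the full address (zero heap base assumed). */
    static long decode(int compressed) {
        return (compressed & 0xFFFF_FFFFL) << SHIFT;
    }

    /** 2^32 distinct compressed values, each naming one 8-byte slot. */
    static long maxAddressableBytes() {
        return (1L << 32) << SHIFT;
    }

    public static void main(String[] args) {
        long address = 31L << 30; // an address 31 GiB into the heap
        System.out.println(decode(encode(address)) == address);
        System.out.println(maxAddressableBytes() / (1L << 30) + " GiB");
    }
}
```

The round trip only works because the low three bits are always zero; an unaligned address would lose information in the right shift, which is why the sketch rejects it.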

⚠️ The 32GB Heap Threshold Trap (Interview Critical)​

When the configured heap size (-Xmx) reaches roughly 32 GB (the exact cutoff sits slightly below 32 GB and depends on the OS and JVM build), the JVM disables Compressed OOPs and reverts to raw 64-bit pointers.

  • The trap: When Compressed OOPs are disabled, all pointers instantly widen from 4 bytes to 8 bytes.
  • The impact: A heap configured for 33GB can hold fewer actual objects than a heap configured for 31GB because the wider 64-bit references consume more memory, leading to higher GC pressure.
  • Senior Heuristic: Never set your heap size just over the threshold (e.g. 33-36GB). If you need more than 31GB of heap, jump straight to 40GB+ to compensate for pointer widening.

Object Access​

Two approaches:

  • Direct pointer (HotSpot): Reference points directly to the object. Faster access.
  • Handle pool: Reference points to a handle containing pointers to both instance data and class data. More resilient during GC (only handle pointer changes).

4. Garbage Collection​

How GC Identifies Garbage​

Reference Counting​

Each object has a counter incremented/decremented when references are added/removed. Object is garbage when count = 0.

Problem: Cannot detect circular references (A → B → A).

Reachability Analysis (Used by JVM)​

Starting from GC Roots, traverse all reachable objects. Anything unreachable is garbage.

GC Roots include:

  • Objects referenced in VM stack (local variables)
  • Static fields in the method area
  • Objects referenced by active threads
  • JNI references
  • Synchronized monitors
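
Reachability analysis is a graph traversal, and a toy version shows why it handles cycles that reference counting cannot. The sketch below models objects as string ids and references as adjacency lists; these names and the graph are illustrative assumptions, not JVM data structures.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model of the mark phase, not JVM code.
class ReachabilitySketch {
    /** Marks every object reachable from the roots; edges maps an object id to the ids it references. */
    static Set<String> mark(Map<String, List<String>> edges, Collection<String> roots) {
        Set<String> live = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String node = work.pop();
            if (live.add(node)) { // first visit only, so cycles terminate
                work.addAll(edges.getOrDefault(node, List.<String>of()));
            }
        }
        return live;
    }

    /** Everything that was not marked is garbage. */
    static Set<String> garbage(Map<String, List<String>> edges, Collection<String> roots) {
        Set<String> dead = new TreeSet<>(edges.keySet());
        dead.removeAll(mark(edges, roots));
        return dead;
    }

    /** Demo graph: a root reaches A, while B and C only reference each other. */
    static Set<String> demoGarbage() {
        Map<String, List<String>> edges = Map.of(
                "local", List.of("A"),
                "A", List.<String>of(),
                "B", List.of("C"),
                "C", List.of("B"));
        return garbage(edges, List.of("local"));
    }

    public static void main(String[] args) {
        System.out.println(demoGarbage()); // prints [B, C]
    }
}
```

B and C each have a nonzero reference count because they point at each other, yet neither is reachable from a root, so the traversal correctly reports both as garbage.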

GC Algorithms​

Mark-Sweep​

  1. Mark all reachable objects
  2. Sweep (free) unmarked objects

Pros: Simple. Cons: Memory fragmentation (scattered free spaces).

Mark-Compact (Mark-Sweep-Compact)​

  1. Mark reachable objects
  2. Compact: move live objects to one end
  3. Clear the rest

Pros: No fragmentation. Cons: Slower (requires moving objects).

Copying​

Divide memory into two halves. Copy live objects from one half to the other, then clear the first half.

Pros: Fast, no fragmentation. Cons: Wastes 50% of memory.

The Young generation uses a modified copying algorithm with Eden + 2 Survivors (only ~10% wasted).

Generational Collection​

Most objects die young (weak generational hypothesis). The JVM exploits this:

| Generation | Algorithm | Trigger | Name |
|---|---|---|---|
| Young | Copying (Eden → Survivor) | Eden full | Minor GC / Young GC |
| Old | Mark-Compact or Mark-Sweep | Old gen full | Major GC / Old GC |
| Both | Full heap collection | Various | Full GC (stop-the-world) |

Minor GC flow:

  1. New objects allocated in Eden
  2. Eden fills up → Minor GC triggered
  3. Live objects in Eden + active Survivor → copied to the empty Survivor
  4. Ages incremented; objects exceeding threshold (default 15) → promoted to Old gen
  5. If Survivor can't hold all survivors → overflow to Old gen
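
The flow above can be mimicked with a deliberately simplified simulation. The sketch assumes every object that reaches a survivor space stays alive, counts capacity in objects rather than bytes, and ignores HotSpot's dynamic tenuring-age calculation; MinorGcSketch and its methods are hypothetical names for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of steps 3 to 5, not HotSpot behaviour in detail.
class MinorGcSketch {
    static final int TENURING_THRESHOLD = 15; // default quoted in the text

    private final int survivorCapacity;                     // max objects a survivor space holds
    private List<Integer> survivorAges = new ArrayList<>(); // ages in the active survivor space
    private int promoted;                                   // objects moved to the old generation

    MinorGcSketch(int survivorCapacity) {
        this.survivorCapacity = survivorCapacity;
    }

    /** One minor GC: all current survivors stay live, plus liveInEden objects from Eden. */
    void minorGc(int liveInEden) {
        List<Integer> target = new ArrayList<>();           // the empty survivor space
        for (int age : survivorAges) {
            copy(target, age + 1);                          // step 4: ages incremented
        }
        for (int i = 0; i < liveInEden; i++) {
            copy(target, 1);                                // step 3: Eden survivors copied
        }
        survivorAges = target;                              // the two survivor spaces swap roles
    }

    private void copy(List<Integer> target, int age) {
        if (age > TENURING_THRESHOLD || target.size() >= survivorCapacity) {
            promoted++;                                     // too old, or step 5: survivor overflow
        } else {
            target.add(age);
        }
    }

    int promotedCount() {
        return promoted;
    }

    int survivorCount() {
        return survivorAges.size();
    }

    public static void main(String[] args) {
        MinorGcSketch heap = new MinorGcSketch(10);
        heap.minorGc(2);                                    // two objects survive Eden
        for (int i = 0; i < 15; i++) {
            heap.minorGc(0);                                // they keep surviving later collections
        }
        System.out.println("promoted = " + heap.promotedCount());
    }
}
```

The same model shows premature promotion: with a survivor capacity of 1, a collection that finds three live objects in Eden pushes two of them straight into the old generation.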

5. Garbage Collectors​

Serial Collector (-XX:+UseSerialGC)​

Single-threaded, stop-the-world. Suitable for small heaps and single-CPU machines.

Parallel Collector (-XX:+UseParallelGC)​

Multi-threaded young + old gen collection. Throughput-oriented: minimizes total GC time at the cost of longer individual pauses. Default in JDK 8.

CMS (Concurrent Mark Sweep) (-XX:+UseConcMarkSweepGC)​

Low-latency collector for old generation. Most work is done concurrently with application threads:

  1. Initial Mark (STW): mark GC roots
  2. Concurrent Mark: traverse object graph concurrently
  3. Remark (STW): fix changes during concurrent mark
  4. Concurrent Sweep: free dead objects concurrently

Downsides: CPU-intensive, produces fragmentation (no compaction), "concurrent mode failure" if old gen fills during collection. Deprecated since JDK 9, removed in JDK 14.

G1 (Garbage First) (-XX:+UseG1GC)​

Region-based collector. Divides the heap into equal-sized regions (~2048). Each region can be Eden, Survivor, Old, or Humongous (for large objects).

┌─────┬─────┬─────┬─────┬─────┬─────┐
│ Eden│ Old │Surv │ Eden│ Old │Hum. │
├─────┼─────┼─────┼─────┼─────┼─────┤
│ Old │ Eden│Free │ Old │ Old │ Eden│
├─────┼─────┼─────┼─────┼─────┼─────┤
│Free │ Old │ Old │Surv │Free │ Old │
└─────┴─────┴─────┴─────┴─────┴─────┘

Key features:

  • Predictable pause times: -XX:MaxGCPauseMillis=200 (target, not guarantee)
  • Mixed collections: Can collect young + some old regions selectively
  • Compacting: Copies live objects between regions, so no fragmentation
  • Default in JDK 9+

ZGC (-XX:+UseZGC)​

Ultra-low-latency collector (sub-millisecond pauses) using colored pointers and load barriers.

  • Pauses are < 1ms regardless of heap size.
  • Supports multi-terabyte heaps (from 8MB to 16TB).
  • Concurrent relocation (moves objects in memory concurrently while application threads are running, resolving fragmentation without STW pauses).
  • Production-ready since JDK 15.

🧠 Senior Deep Dive: Generational ZGC (Java 21+ / JEP 439)​

Historically, ZGC was a single-generation collector, meaning it concurrently scanned the entire heap during every GC cycle. Under high allocation rate workloads, this design led to allocation stalls (where application threads ran out of memory before the concurrent collector finished scanning, freezing the application).

To solve this, Java 21 introduced Generational ZGC (-XX:+UseZGC -XX:+ZGenerational), which leverages the weak generational hypothesis (most objects die young) by splitting the heap into two logical generations:

  • Young Generation: Collected frequently in a very fast, low-overhead cycle.
  • Old Generation: Collected less frequently.

Key Benefits over Non-Generational ZGC:
  1. Higher Throughput: Collecting only young objects requires scanning a fraction of the heap, releasing CPU cycles back to application threads.
  2. Preventing Allocation Stalls: Rapid reclamation of short-lived objects makes allocation stalls extremely rare under heavy load.
  3. Sub-millisecond Latency: Retains the core concurrent guarantees of ZGC, keeping pause times under 1 millisecond (typically under 100 microseconds).

Collector Selection Guide​

| Collector | Pause Target | Heap Size | Use Case |
|---|---|---|---|
| Serial | N/A | Small (< 100 MB) | Embedded, single-core |
| Parallel | High throughput | Medium | Batch processing |
| G1 | < 200ms | Medium-Large | General purpose (default) |
| ZGC | < 1ms | Any (up to TB) | Latency-critical apps |
| Shenandoah | < 10ms | Large | Low-latency alternative |

6. Class Loading​

Class Loading Process​

Loading → Verification → Preparation → Resolution → Initialization
   │           │              │             │             │
   │           │              │             │             └─ Execute <clinit>
   │           │              │             │                (static initializers)
   │           │              │             └─ Resolve symbolic
   │           │              │                references to direct
   │           │              └─ Allocate memory for
   │           │                 static fields (set defaults)
   │           └─ Verify bytecode correctness
   │              (file format, metadata, bytecode, symbolic references)
   └─ Read .class file into memory,
      create Class object

Class Loaders​

Java uses a hierarchical delegation model (parent delegation):

Bootstrap ClassLoader (C/C++)
└── loads: java.lang.*, java.util.* (core JDK)

Extension ClassLoader (Java; replaced by the Platform ClassLoader in JDK 9+)
└── loads: extension classes (javax.*, java.ext.dirs)

Application ClassLoader (Java)
└── loads: classpath classes (your code)

Custom ClassLoader (your implementation)
└── loads: special sources (network, encrypted, etc.)

Parent Delegation Model​

When a class needs to be loaded:

  1. Check if already loaded
  2. Delegate to parent class loader first
  3. If parent can't load it, try loading it yourself
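
// Simplified sketch of the delegation logic in ClassLoader.loadClass
protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
    synchronized (getClassLoadingLock(name)) {
        // 1. Already loaded?
        Class<?> c = findLoadedClass(name);
        if (c == null) {
            try {
                // 2. Delegate to parent (the real JDK code also handles a null
                //    parent, which stands for the bootstrap loader)
                c = getParent().loadClass(name);
            } catch (ClassNotFoundException e) {
                // 3. Parent failed: load it ourselves
                c = findClass(name);
            }
        }
        if (resolve) {
            resolveClass(c);
        }
        return c;
    }
}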

Why parent delegation?

  • Security: Prevents malicious code from replacing core classes (e.g., custom java.lang.String)
  • Consistency: Ensures core classes are loaded by the same loader
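
Delegation can be observed directly: asking a child loader for a core class still yields the class defined by the bootstrap loader. In this minimal sketch, DelegationDemo and ChildLoader are illustrative names; the only JDK behaviour relied on is that Class.getClassLoader() returns null for classes defined by the bootstrap loader.

```java
// Hypothetical demo class, not part of the original guide.
class DelegationDemo {
    /** A loader with no classes of its own: every request is delegated up the parent chain. */
    static final class ChildLoader extends ClassLoader {
        ChildLoader(ClassLoader parent) {
            super(parent);
        }
    }

    /** Loads java.lang.String through a child loader and returns the loader that defined it. */
    static ClassLoader definingLoaderOfString() {
        ClassLoader child = new ChildLoader(ClassLoader.getSystemClassLoader());
        try {
            return child.loadClass("java.lang.String").getClassLoader();
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // null is how the API represents the bootstrap class loader.
        System.out.println(definingLoaderOfString());
    }
}
```

Even though the request went to the child first, the class comes from the top of the hierarchy, which is what keeps a user-supplied java.lang.String from ever being picked up.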

Thread Context ClassLoader (TCCL)​

While the Parent Delegation model is excellent for security and consistency, it has a fundamental design flaw: Core classes loaded by parent loaders cannot load classes that only exist in child loaders.

The Service Provider Interface (SPI) Conundrum​

Consider the Java Database Connectivity (JDBC) API:

  1. The JDBC framework class java.sql.DriverManager is part of the core Java API and is loaded by the Bootstrap ClassLoader (by the Platform ClassLoader since JDK 9, which has the same visibility limitation).
  2. When DriverManager tries to establish a connection, it uses Java's SPI (ServiceLoader) to find and load concrete database driver implementations (like com.mysql.cj.jdbc.Driver) present on your application's classpath.
  3. However, the classpath is loaded by the Application ClassLoader. Since the Bootstrap ClassLoader is a parent loader, it cannot see classes loaded by its child (the Application ClassLoader). Parent delegation only goes up, not down.

Bootstrap ClassLoader (DriverManager)
        │
        ▼ Parent Delegation (DriverManager tries to load MySQL Driver but fails)
Application ClassLoader (mysql-connector.jar)

Breaking the Hierarchy​

To solve this chicken-and-egg problem, Java introduced the Thread Context ClassLoader (TCCL). Each thread holds a reference to a ClassLoader (Thread.currentThread().getContextClassLoader()), which defaults to the Application ClassLoader.

Core classes in the parent ClassLoader can "break" the hierarchy by fetching the context loader from the current running thread and using it to load the child classes:

// How DriverManager breaks parent delegation (simplified)
ClassLoader cl = Thread.currentThread().getContextClassLoader();
ServiceLoader<Driver> loadedDrivers = ServiceLoader.load(Driver.class, cl);

⚠️ Senior Context: ClassLoader Memory Leaks in Containers

In application servers (like Tomcat) or plug-in systems where applications are deployed/undeployed dynamically, TCCL can cause severe memory leaks:

  • When a web application is deployed, Tomcat creates a custom WebappClassLoader and sets it as the TCCL for the request thread.
  • If the application starts a thread pool or registers a ThreadLocal that isn't cleaned up, the thread retains a strong reference to the WebappClassLoader via its context class loader.
  • When the web application is undeployed, the GC cannot reclaim the classloader or any of the classes it loaded because the thread context pointer is still active. This leads to OutOfMemoryError: Metaspace.
  • Mitigation: Always restore the original context classloader in a finally block or clean up custom threads upon application shutdown.
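
The mitigation in the last bullet can be wrapped in a small helper so the restore step is never forgotten. This is a sketch under the assumption that the work runs on the calling thread; ContextLoaderScope and callWith are hypothetical names.

```java
import java.util.function.Supplier;

// Hypothetical helper class, not part of the original guide.
class ContextLoaderScope {
    /** Runs the task with the given context class loader, then restores the previous one. */
    static <T> T callWith(ClassLoader loader, Supplier<T> task) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            return task.get();
        } finally {
            // Restore even if the task throws, so the thread does not pin the temporary loader.
            current.setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) {
        ClassLoader temporary = new ClassLoader(ClassLoader.getSystemClassLoader()) { };
        ClassLoader seen = callWith(temporary, () -> Thread.currentThread().getContextClassLoader());
        System.out.println(seen == temporary);
    }
}
```

Pooled threads outlive any single request, which is why the finally block matters: without it, the worker thread would keep the web application's loader reachable after undeploy.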

7. Class File Structure​

Every .class file follows a strict binary format:

ClassFile {
    u4 magic;                     // 0xCAFEBABE
    u2 minor_version;
    u2 major_version;             // Java 17 = 61
    u2 constant_pool_count;
    cp_info constant_pool[];      // literals, type refs, method refs
    u2 access_flags;              // public, final, abstract, etc.
    u2 this_class;
    u2 super_class;
    u2 interfaces_count;
    u2 interfaces[];
    u2 fields_count;
    field_info fields[];
    u2 methods_count;
    method_info methods[];
    u2 attributes_count;
    attribute_info attributes[];
}

Use javap -verbose MyClass.class to inspect the structure.
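
The first three fields can also be read programmatically, since a class file is a big-endian binary stream. The sketch below reads the header of java.lang.Object's own class file from the running JDK; ClassFileHeader is an illustrative name, and the major version it reports depends on the JDK it runs on.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Objects;

// Hypothetical reader class, not part of the original guide.
class ClassFileHeader {
    /** Returns {magic, minor_version, major_version} read from java.lang.Object's class file. */
    static int[] readHeader() {
        try (InputStream in = Objects.requireNonNull(
                     Object.class.getResourceAsStream("Object.class"), "class file not found");
             DataInputStream data = new DataInputStream(in)) {
            int magic = data.readInt();            // u4 magic
            int minor = data.readUnsignedShort();  // u2 minor_version
            int major = data.readUnsignedShort();  // u2 major_version
            return new int[] {magic, minor, major};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] header = readHeader();
        System.out.printf("magic=0x%08X minor=%d major=%d%n", header[0], header[1], header[2]);
    }
}
```

DataInputStream reads big-endian values, which matches the byte order of the class file format, so no manual byte swapping is needed.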


8. Important JVM Parameters​

Heap Sizing​

# Initial and maximum heap size
-Xms512m # initial heap (set equal to -Xmx to avoid resizing)
-Xmx2g # maximum heap

# Young generation size
-Xmn512m # young gen size
-XX:NewRatio=2 # old:young ratio (default 2 → old is 2x young)

# Metaspace
-XX:MetaspaceSize=256m
-XX:MaxMetaspaceSize=512m

GC Configuration​

# Select collector
-XX:+UseG1GC # G1 (default JDK 9+)
-XX:+UseZGC # ZGC
-XX:+UseParallelGC # Parallel (default JDK 8)

# G1 tuning
-XX:MaxGCPauseMillis=200 # target pause time
-XX:G1HeapRegionSize=4m # region size (1-32 MB, power of 2)

# GC logging (JDK 9+)
-Xlog:gc*:file=gc.log:time,uptime,level,tags

Thread Stack​

-Xss512k # thread stack size (default ~1MB)

Troubleshooting​

# Heap dump on OOM
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/path/to/dump.hprof

# Print GC details
-verbose:gc

9. JIT Compilation (HotSpot C1 / C2)​

The JVM doesn't just interpret bytecode; it dynamically compiles hot code to native machine code. Understanding the tiers is critical for diagnosing startup slowdowns and latency spikes.

Compilation Tiers​

| Tier | Compiler | Description |
|---|---|---|
| 0 | Interpreter | Execute bytecode directly (cold start) |
| 1-3 | C1 (Client) | Quick compilation with basic optimization |
| 4 | C2 (Server) | Aggressive optimization: inlining, loop unrolling, escape analysis |

"Compilation storm": At startup, many methods reach the hot threshold simultaneously → the JIT compiler threads are overwhelmed → CPU spike, latency increase. Common in Kubernetes when pods receive traffic immediately.

Mitigation: GraalVM Native Image (AOT) for instant startup. Tiered compilation (-XX:+TieredCompilation, enabled by default since JDK 8) already shortens warm-up by running C1-compiled code while profiles for C2 are collected, so keep it enabled.

Deoptimization​

JIT makes optimistic assumptions, e.g., that a virtual method is called with only one concrete type (monomorphic call). When assumptions break:

// JIT inlines Dog.speak() for all calls: optimized for monomorphic dispatch
void speak(Animal a) { a.speak(); }

// First Cat appears → JIT's inline prediction invalid → deoptimize → interpreter
speak(new Cat());

Cold code paths with rare types cause unexpected production latency spikes even after warm-up.

-XX:+PrintCompilation # See which methods JIT compiles
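-XX:CompileThreshold=10000 # Invocations before compilation when tiered compilation is disabled (default); with tiered compilation the -XX:Tier*Threshold flags apply instead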

10. G1 GC: Internal Mechanics

Humongous Objects​

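Objects larger than 50% of a region size are allocated directly in humongous regions (one or more contiguous regions that are treated as part of the Old gen). They are not moved by young collections. Dead humongous objects are reclaimed at the end of a concurrent marking cycle or during a full GC, and since JDK 8u60 G1 can also reclaim unreferenced ones eagerly during young collections. Frequent humongous allocation still fragments the heap and can trigger marking cycles early.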

# Fix: increase region size to reduce humongous allocations
-XX:G1HeapRegionSize=32m
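
A quick calculation shows why a larger region size helps. The sketch below applies the 50% rule as stated above and assumes a humongous object occupies whole contiguous regions; HumongousMath and its methods are hypothetical helpers, not G1 APIs.

```java
// Hypothetical helper class, not part of G1 or the original guide.
class HumongousMath {
    static final long MIB = 1024L * 1024L;

    /** An object is treated as humongous when it is larger than half a region. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    /** Number of contiguous regions needed to hold the object (rounded up). */
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    public static void main(String[] args) {
        long array = 3 * MIB; // e.g. a 3 MiB byte[]
        System.out.println("4 MiB regions:  humongous = " + isHumongous(array, 4 * MIB));
        System.out.println("32 MiB regions: humongous = " + isHumongous(array, 32 * MIB));
        System.out.println("regions for 9 MiB at 4 MiB: " + regionsNeeded(9 * MIB, 4 * MIB));
    }
}
```

The same 3 MiB array is humongous with 4 MiB regions but an ordinary Eden allocation with 32 MiB regions. Note also that a 9 MiB object takes three 4 MiB regions, so 3 MiB of the last region is left unused.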

Remembered Sets (RSet) and SATB​

Remembered Sets: Each G1 region tracks external references into it. Required so G1 can collect a single region without scanning the entire heap.

SATB (Snapshot-At-The-Beginning): G1's write barrier during concurrent marking. When a reference is overwritten, G1 records the old value in an SATB log buffer. This ensures that objects alive at mark-start remain live even if pointers are nulled during marking.

// SATB pre-write barrier fires before the store → logs the old obj.field reference
obj.field = newRef;

Without SATB, a concurrent mutator could hide a live object from the marking thread, causing premature collection.

Mixed Collections​

After a full concurrent mark cycle, G1 picks the highest-garbage-density Old regions and collects them alongside Young gen:

-XX:G1MixedGCLiveThresholdPercent=85 # Only collect Old regions < 85% live data
-XX:G1HeapWastePercent=5 # Stop mixed GC if < 5% heap is reclaimable
# Diagnosing pauses:
-Xlog:gc*:file=gc.log:time,uptime,level,tags
# Look for: "Pause Full", which means G1 fell back to a stop-the-world full GC (bad!)

11. JDK Monitoring & Troubleshooting Tools​

Command-Line Tools​

| Tool | Purpose | Example |
|---|---|---|
| jps | List running JVM processes | jps -lv |
| jstat | GC and memory statistics | jstat -gcutil <pid> 1000 |
| jinfo | View/modify JVM flags | jinfo -flags <pid> |
| jmap | Heap dump and histogram | jmap -dump:format=b,file=heap.hprof <pid> |
| jstack | Thread dump (diagnose deadlocks) | jstack <pid> |
| jcmd | All-in-one diagnostic tool | jcmd <pid> GC.heap_info |

Graphical Tools​

  • VisualVM: bundled with the JDK as jvisualvm through JDK 8 (a separate download since), monitors heap, threads, CPU
  • JConsole: JMX-based monitoring console
  • Eclipse MAT: heap dump analysis, find memory leaks
  • Arthas: powerful runtime diagnostic tool (bytecode-level debugging)

Common Troubleshooting Scenarios​

OutOfMemoryError: Java heap space

  1. Generate heap dump: -XX:+HeapDumpOnOutOfMemoryError
  2. Analyze with Eclipse MAT → find objects consuming most memory
  3. Check for memory leaks (growing collections, unclosed resources)

High CPU usage

  1. top -H -p <pid> → find the CPU-intensive thread (note the TID)
  2. jstack <pid> → find the thread by TID (convert it to hex; it appears as nid=0x... in the dump)
  3. Analyze the stack trace

Deadlock detection

  1. jstack <pid> → JVM automatically detects and reports deadlocks
  2. Look for "Found one Java-level deadlock" in the output
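
The same detection is available from inside the process through the standard ThreadMXBean, which is handy for health checks. This is a minimal sketch; DeadlockCheck is an illustrative name, and the example only counts deadlocked threads rather than printing their stack traces.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper class, not part of the original guide.
class DeadlockCheck {
    /** Returns how many threads are currently deadlocked on monitors or ownable synchronizers. */
    static int deadlockedThreadCount() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when there is no deadlock
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + deadlockedThreadCount());
    }
}
```

When the count is nonzero, ThreadMXBean.getThreadInfo can be called with the returned ids to obtain the thread names and the locks involved.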

Frequent Full GC

  1. jstat -gcutil <pid> 1000 → monitor GC frequency and duration
  2. Check if old gen is filling up (memory leak?) or if young gen is too small (premature promotion)
  3. Consider switching to G1 or ZGC for better pause behavior
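
Collection counts and accumulated GC time can also be sampled in-process with GarbageCollectorMXBean, for example to export them as metrics. In this sketch GcStats is an illustrative name; the bean names depend on the collector, and the reported time is the approximate accumulated collection time in milliseconds.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class, not part of the original guide.
class GcStats {
    /** Builds one line per collector: name, number of collections, accumulated time. */
    static String summary() {
        StringBuilder out = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            out.append(gc.getName())
               .append(": count=").append(gc.getCollectionCount())
               .append(", timeMs=").append(gc.getCollectionTime())
               .append(System.lineSeparator());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(summary());
    }
}
```

Sampling these values periodically and looking at the deltas gives the same frequency and duration picture as jstat, without attaching an external tool.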

12. Java Agents & Instrumentation (Telemetry Hooks)​

For Senior and Lead developers working on APM (Application Performance Monitoring) tools or custom frameworks, understanding Java Agents is essential.

What is a Java Agent?​

A Java Agent is a pluggable JVM-level tool that uses the Java Instrumentation API (java.lang.instrument) to intercept and modify the bytecode of classes loaded into the JVM.

Execution Mechanisms​

A Java Agent can be loaded in two ways:

1. Static Loading (premain)​

The agent is specified at JVM startup using the -javaagent flag. The JVM runs the agent's premain method before the application's main method starts.

// Command: java -javaagent:myagent.jar -jar myapp.jar
public static void premain(String agentArgs, Instrumentation inst) {
    inst.addTransformer(new MyClassFileTransformer());
}

2. Dynamic Attachment (agentmain)​

The agent is dynamically loaded into a running JVM after the application has already started, using the Attach API (com.sun.tools.attach.VirtualMachine, shipped in tools.jar through JDK 8 and in the jdk.attach module since JDK 9).

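public static void agentmain(String agentArgs, Instrumentation inst)
        throws UnmodifiableClassException {
    // canRetransform = true so the transformer also sees retransformed classes
    inst.addTransformer(new MyClassFileTransformer(), true);
    // Force retransformation of already-loaded classes
    // (requires Can-Retransform-Classes: true in the agent JAR manifest)
    inst.retransformClasses(TargetClass.class);
}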

Bytecode Modification​

Inside the ClassFileTransformer, you inspect the class bytes, modify them (usually using libraries like ByteBuddy, ASM, or Javassist), and return the modified byte array:

import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;

public class MyClassFileTransformer implements ClassFileTransformer {
    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        // className uses the internal form with '/' separators
        if ("com/example/service/BillingService".equals(className)) {
            // Intercept billing methods, inject entry/exit logs or latency trackers
            // (injectLatencyProfilingBytes is a placeholder for ASM/ByteBuddy code)
            return injectLatencyProfilingBytes(classfileBuffer);
        }
        return null; // Return null to indicate no changes
    }
}

Real-world Use Cases:​

  1. APM Tooling (Datadog, Dynatrace, New Relic): Auto-instruments database drivers, HTTP controllers, and outbound clients to record transaction traces and execution metrics without changing application code.
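  2. Dynamic Diagnostics and Profiling (Arthas, async-profiler): Attach to a running JVM in production. Arthas retransforms bytecode to trace and watch methods, while async-profiler is a native agent that samples stacks instead of rewriting bytecode.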
  3. Frameworks & Testing (Lombok, Mockito): Lombok uses compile-time annotation processing, but Mockito uses runtime bytecode generation (ByteBuddy) to mock interfaces and classes.

13. Reference Types & GC​

Java provides four reference types that influence garbage collection behavior:

| Reference Type | Class | GC Behavior | Use Case |
|---|---|---|---|
| Strong | (default) | Never collected while reachable | Normal references |
| Soft | SoftReference<T> | Collected when JVM is low on memory | Memory-sensitive caches |
| Weak | WeakReference<T> | Collected at next GC | WeakHashMap, canonicalizing maps |
| Phantom | PhantomReference<T> | Enqueued after finalization | Resource cleanup tracking |

// Soft reference: cache that yields to memory pressure
SoftReference<byte[]> cache = new SoftReference<>(new byte[1024 * 1024]);
byte[] data = cache.get(); // may be null if GC reclaimed it

// Weak reference: doesn't prevent GC
WeakReference<ExpensiveObject> ref = new WeakReference<>(new ExpensiveObject());
ExpensiveObject obj = ref.get(); // null once the referent has been collected

ReferenceQueues for Cleanups​

To cleanly handle post-mortem resources, you can register soft, weak, or phantom references with a ReferenceQueue.

When the garbage collector decides to reclaim the referent (the object referenced), it automatically clears the reference (sets it to null) and appends the reference container itself (the SoftReference or WeakReference instance) to the registered ReferenceQueue.

The application can poll or block on this queue in a background thread to safely release associated native resources (like database connections, file handles, or off-heap memory) without using slow, deprecated finalize() methods.

ReferenceQueue<ExpensiveObject> queue = new ReferenceQueue<>();
WeakReference<ExpensiveObject> ref = new WeakReference<>(new ExpensiveObject(), queue);

// ... later, after ExpensiveObject has been garbage-collected ...
Reference<? extends ExpensiveObject> clearedRef = queue.poll();
if (clearedRef != null) {
// Perform resource cleanup associated with this reference
}

👻 Phantom References Require ReferenceQueue

Unlike Soft and Weak references, a PhantomReference's get() method always returns null. This prevents the application from accidentally resurrecting the object during garbage collection.

A PhantomReference is completely useless without a ReferenceQueue. It is used purely as a notification mechanism to know exactly when an object has been fully finalized and its memory reclaimed by the GC.

⚙️ Production Example: DirectByteBuffer & Cleaner

The most notable use of PhantomReference and ReferenceQueue is Java's off-heap memory management:

  1. When you allocate off-heap memory using ByteBuffer.allocateDirect(10 * 1024), the JVM creates a DirectByteBuffer object on the heap.
  2. This heap object references a native memory address allocated outside the JVM heap.
  3. To prevent memory leaks, DirectByteBuffer registers a phantom reference with a Cleaner (which uses a ReferenceQueue internally).
  4. When the heap-based DirectByteBuffer is garbage-collected, the phantom reference is enqueued in the ReferenceQueue.
  5. A system-level daemon thread polls this queue and frees the associated off-heap native memory using unsafe.freeMemory().
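
Application code can use the same pattern through the public java.lang.ref.Cleaner API (JDK 9+). The sketch below is an assumption-laden illustration: NativeResource and State are hypothetical names, and an AtomicBoolean stands in for the native memory that a real implementation would free.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical resource wrapper, not the JDK's DirectByteBuffer implementation.
class NativeResource implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    /** Cleanup state. It must not reference the NativeResource, or the owner could never become unreachable. */
    private static final class State implements Runnable {
        private final AtomicBoolean released;

        State(AtomicBoolean released) {
            this.released = released;
        }

        @Override
        public void run() {
            released.set(true); // stand-in for freeing off-heap memory
        }
    }

    private final Cleaner.Cleanable cleanable;

    NativeResource(AtomicBoolean released) {
        // The Cleaner tracks this object with a phantom reference and runs State
        // after the object becomes phantom reachable, unless clean() ran first.
        this.cleanable = CLEANER.register(this, new State(released));
    }

    @Override
    public void close() {
        cleanable.clean(); // deterministic release; the action runs at most once
    }

    /** Opens and closes a resource, returning whether the cleanup action ran. */
    static boolean demo() {
        AtomicBoolean released = new AtomicBoolean(false);
        try (NativeResource resource = new NativeResource(released)) {
            // use the resource here
        }
        return released.get();
    }

    public static void main(String[] args) {
        System.out.println("released = " + demo());
    }
}
```

Combining AutoCloseable with a Cleaner gives deterministic release in the normal path and a GC-driven safety net when a caller forgets to close.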

14. Common OOM Scenarios & Solutions

| Error | Cause | Solution |
|---|---|---|
| OutOfMemoryError: Java heap space | Heap exhausted | Increase -Xmx, fix memory leaks |
| OutOfMemoryError: Metaspace | Too many classes loaded | Increase -XX:MaxMetaspaceSize, fix classloader leaks |
| OutOfMemoryError: GC overhead limit exceeded | GC taking over 98% of total time while recovering under 2% of the heap | Fix memory leaks, increase heap |
| StackOverflowError | Deep/infinite recursion | Fix recursion, increase -Xss |
| OutOfMemoryError: unable to create new native thread | Too many threads | Use thread pools, reduce stack size |

Advanced Editorial Pass: JVM Internals for Operational Excellence​

Senior-Level Focus​

  • GC tuning is workload-specific and must be tied to SLO outcomes.
  • Heap, metaspace, and thread configuration are architecture choices, not defaults.
  • Classloading and JIT behavior can materially impact startup and latency profiles.

Failure Modes in Production​

  • Over-tuned JVM flags copied between services with different traffic patterns.
  • Memory leaks masked by oversized heaps until incident windows.
  • Misinterpreting GC logs without correlating application-level latency.

Practical Heuristics​

  1. Treat JVM tuning as iterative experimentation with measurable hypotheses.
  2. Baseline key metrics before any flag change.
  3. Keep service-specific runbooks for memory, GC, and thread incidents.



Interview Questions​

Q: How do you choose between G1 and ZGC for a backend service?​

A: G1 is a strong default for balanced throughput and latency; ZGC is preferred for strict low-latency requirements with larger heaps.

Q: What metrics indicate GC tuning is required?​

A: Rising tail latency, frequent long pauses, promotion failures, and high GC CPU share under normal load.

Q: Why is allocation rate often more important than heap size?​

A: High allocation churn drives GC pressure even on large heaps, so reducing object churn often beats increasing memory.

Q: How do classloader leaks usually appear in production?​

A: Metaspace growth over time after redeploy/plugin cycles and inability to reclaim old class metadata.

Q: What is a practical JVM tuning workflow for senior engineers?​

A: Baseline, form a hypothesis, apply one controlled change, validate with load and latency data, then iterate.

Q: Why are full GC events high priority incidents?​

A: They are stop-the-world and can trigger latency spikes, timeouts, and cascading failures.

Q: How do you explain JIT warmup impact during autoscaling?​

A: New pods initially run colder code paths, so p95/p99 latency can temporarily degrade until optimization stabilizes.