Observability of the Java Virtual Machine
The JVM is one of the most observable runtimes. It provides us lots of tools for troubleshooting a JVM application in production.
1. Thread observability
Threads are how the JVM actually does work. When something is wrong in production, the symptom is almost always a thread: stopped, blocked, leaking etc. Thread dumps work on any JVM with no instrumentation, no agents, no restarts.
<Example project link with /threaddump endpoint>
// (1) Deadlock — two threads grab the same pair of locks in opposite order.
new Thread(() -> grab(LOCK_A, LOCK_B), "deadlock-A-then-B").start();
new Thread(() -> grab(LOCK_B, LOCK_A), "deadlock-B-then-A").start();
http://localhost:8080/actuator/threaddump
To list the JVMS, we can use the command below.
PS C:\observe-jvm> jps -lv
25296 jdk.jcmd/sun.tools.jps.Jps -Dapplication.home=C:\Program Files\Microsoft\jdk-21.0.3.9-hotspot -Xms8m -Djdk.module.main=jdk.jcmd
9588 com.observejvm.l01threads.ThreadDemo -javaagent:C:\Program Files\JetBrains\IntelliJ IDEA Community Edition 2025.2\lib\idea_rt.jar=60522 -Dfile.encoding=UTF-8 -Dsun.stdout.encoding=UTF-8 -Dsun.stderr.encoding=UTF-8
Now we know the PID of the JVM process so we can use the below command to get a thread dump:
PS C:\observe-jvm> jcmd 9588 Thread.print
...
Found one Java-level deadlock:
=============================
"deadlock-A-then-B":
waiting to lock monitor 0x000001c9c1a66340 (object 0x00000006043b0458, a java.lang.Object),
which is held by "deadlock-B-then-A"
"deadlock-B-then-A":
waiting to lock monitor 0x000001c9fe5f8390 (object 0x00000006043b0468, a java.lang.Object),
which is held by "deadlock-A-then-B"
Java stack information for the threads listed above:
===================================================
"deadlock-A-then-B":
at com.observejvm.l01threads.ThreadDemo.grab(ThreadDemo.java:62)
- waiting to lock <0x00000006043b0458> (a java.lang.Object)
- locked <0x00000006043b0468> (a java.lang.Object)
So the jcmd tool allows us to detect deadlocks and it provides which exact method call is causing that.
The second case we'll look at is a thread that waits for a stuck downstream call:
new Thread(() -> sleep(Long.MAX_VALUE), "stuck-downstream-call").start();
PS C:\observe-jvm> jcmd 2616 Thread.print
"stuck-downstream-call" #49 [668] prio=5 os_prio=0 cpu=0.00ms elapsed=16.36s tid=0x000001ce690a9e70 nid=668 waiting on condition [0x00000048faeff000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep0(java.base@21.0.3/Native Method)
at java.lang.Thread.sleep(java.base@21.0.3/Thread.java:509)
at com.observejvm.l01threads.ThreadDemo.sleep(ThreadDemo.java:79)
at com.observejvm.l01threads.ThreadDemo.lambda$main$0(ThreadDemo.java:42)
The third case we'll investigate is about 3 threads one of which is using a lot of CPU time but the others are mostly idle:
// (3) A small worker pool: one CPU-hot, two mostly idle.
var pool = Executors.newFixedThreadPool(3, r -> {
Thread t = new Thread(r);
t.setName("worker-" + t.threadId());
return t;
});
pool.submit(ThreadDemo::burnHot);
pool.submit(ThreadDemo::idleLoop);
pool.submit(ThreadDemo::idleLoop);
private static void burnHot() {
long x = 1;
while (true) {
x = x * 3 + 1;
}
}
private static void idleLoop() {
while (true) sleep(1000);
}
PS C:\Users\bahad> jcmd 27236 Thread.print
"worker-49" #49 [10640] prio=5 os_prio=0 cpu=10531.25ms elapsed=10.60s tid=0x0000022d87610d30 nid=10640 runnable [0x000000a85e3fe000]
java.lang.Thread.State: RUNNABLE
at com.observejvm.l01threads.ThreadDemo.burnHot(ThreadDemo.java:77)
"worker-50" #50 [14692] prio=5 os_prio=0 cpu=0.00ms elapsed=10.60s tid=0x0000022d8760f2f0 nid=14692 waiting on condition [0x000000a85e4ff000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep0(java.base@21.0.3/Native Method)
at java.lang.Thread.sleep(java.base@21.0.3/Thread.java:509)
at com.observejvm.l01threads.ThreadDemo.sleep(ThreadDemo.java:86)
at com.observejvm.l01threads.ThreadDemo.idleLoop(ThreadDemo.java:82)
Note: The JVM has a small number of housekeeping daemon threads. It is easy to recognize them and some of them are listed below:
Reference Handler
Finalizer
Signal Dispatcher
Common-Cleaner
Notification Thread
Attach Listener
Service Thread
2- Memory Observability
Every Out Of Memory Error (OOM) has a story. The heap dump is the witness statement for us.
Memory issues are the second-most-common production incident after threads. An OOM kills the whole process. Auto-restart hides the leak, the cycle repeats, nobody investigates until customers notice.
Heap dumps contain enormously more information: a byte-by-byte snapshot of every live object, every reference, every field.
<Example project link with /heapdump endpoint>
After running the sample application, we run the command below to take a heap dump:
PS C:\observe-jvm> jcmd 20484 GC.heap_dump C:\observe-jvm\leak.hprof
20484:
Dumping heap to C:\observe-jvm\leak.hprof ...
Heap dump file created [956318512 bytes in 0.678 secs]
We can also get the heap dump from the actuator endpoint:
http://localhost:8080/actuator/heapdump
Then we can open the hprof file using Intellij Idea:
(To be continued)
Comments