Observability of the Java Virtual Machine

The JVM is one of the most observable runtimes. It provides us lots of tools for troubleshooting a JVM application in production.

1. Thread observability

Threads are how the JVM actually does work. When something is wrong in production, the symptom is almost always a thread: stopped, blocked, leaking etc. Thread dumps work on any JVM with no instrumentation, no agents, no restarts.


<Example project link with /threaddump endpoint>


        // (1) Deadlock — two threads grab the same pair of locks in opposite order.

        new Thread(() -> grab(LOCK_A, LOCK_B), "deadlock-A-then-B").start();

        new Thread(() -> grab(LOCK_B, LOCK_A), "deadlock-B-then-A").start();



http://localhost:8080/actuator/threaddump


To list the JVMS, we can use the command below.


PS C:\observe-jvm> jps -lv

25296 jdk.jcmd/sun.tools.jps.Jps -Dapplication.home=C:\Program Files\Microsoft\jdk-21.0.3.9-hotspot -Xms8m -Djdk.module.main=jdk.jcmd

9588 com.observejvm.l01threads.ThreadDemo -javaagent:C:\Program Files\JetBrains\IntelliJ IDEA Community Edition 2025.2\lib\idea_rt.jar=60522 -Dfile.encoding=UTF-8 -Dsun.stdout.encoding=UTF-8 -Dsun.stderr.encoding=UTF-8


Now we know the PID of the JVM process so we can use the below command to get a thread dump:


PS C:\observe-jvm> jcmd 9588 Thread.print


...


Found one Java-level deadlock:

=============================

"deadlock-A-then-B":

  waiting to lock monitor 0x000001c9c1a66340 (object 0x00000006043b0458, a java.lang.Object),

  which is held by "deadlock-B-then-A"


"deadlock-B-then-A":

  waiting to lock monitor 0x000001c9fe5f8390 (object 0x00000006043b0468, a java.lang.Object),

  which is held by "deadlock-A-then-B"


Java stack information for the threads listed above:

===================================================

"deadlock-A-then-B":

        at com.observejvm.l01threads.ThreadDemo.grab(ThreadDemo.java:62)

        - waiting to lock <0x00000006043b0458> (a java.lang.Object)

        - locked <0x00000006043b0468> (a java.lang.Object)


So the jcmd tool allows us to detect deadlocks and it provides which exact method call is causing that. 


The second case we'll look at is a thread that waits for a stuck downstream call:


new Thread(() -> sleep(Long.MAX_VALUE), "stuck-downstream-call").start();


PS C:\observe-jvm> jcmd 2616 Thread.print


"stuck-downstream-call" #49 [668] prio=5 os_prio=0 cpu=0.00ms elapsed=16.36s tid=0x000001ce690a9e70 nid=668 waiting on condition  [0x00000048faeff000]

   java.lang.Thread.State: TIMED_WAITING (sleeping)

        at java.lang.Thread.sleep0(java.base@21.0.3/Native Method)

        at java.lang.Thread.sleep(java.base@21.0.3/Thread.java:509)

        at com.observejvm.l01threads.ThreadDemo.sleep(ThreadDemo.java:79)

        at com.observejvm.l01threads.ThreadDemo.lambda$main$0(ThreadDemo.java:42)


The third case we'll investigate is about 3 threads one of which is using a lot of CPU time but the others are mostly idle:


// (3) A small worker pool: one CPU-hot, two mostly idle.

        var pool = Executors.newFixedThreadPool(3, r -> {

            Thread t = new Thread(r);

            t.setName("worker-" + t.threadId());

            return t;

        });

        pool.submit(ThreadDemo::burnHot);

        pool.submit(ThreadDemo::idleLoop);

        pool.submit(ThreadDemo::idleLoop);



private static void burnHot() {

        long x = 1;

        while (true) {

            x = x * 3 + 1;

        }

    }


    private static void idleLoop() {

        while (true) sleep(1000);

    }



PS C:\Users\bahad> jcmd 27236 Thread.print


"worker-49" #49 [10640] prio=5 os_prio=0 cpu=10531.25ms elapsed=10.60s tid=0x0000022d87610d30 nid=10640 runnable  [0x000000a85e3fe000]

   java.lang.Thread.State: RUNNABLE

        at com.observejvm.l01threads.ThreadDemo.burnHot(ThreadDemo.java:77)


"worker-50" #50 [14692] prio=5 os_prio=0 cpu=0.00ms elapsed=10.60s tid=0x0000022d8760f2f0 nid=14692 waiting on condition  [0x000000a85e4ff000]

   java.lang.Thread.State: TIMED_WAITING (sleeping)

        at java.lang.Thread.sleep0(java.base@21.0.3/Native Method)

        at java.lang.Thread.sleep(java.base@21.0.3/Thread.java:509)

        at com.observejvm.l01threads.ThreadDemo.sleep(ThreadDemo.java:86)

        at com.observejvm.l01threads.ThreadDemo.idleLoop(ThreadDemo.java:82)


Note:  The JVM has a small number of housekeeping daemon threads. It is easy to recognize them and some of them are listed below:


Reference Handler

Finalizer

Signal Dispatcher

Common-Cleaner

Notification Thread

Attach Listener

Service Thread


2- Memory Observability

Every Out Of Memory Error (OOM) has a story. The heap dump is the witness statement for us.


Memory issues are the second-most-common production incident after threads. An OOM kills the  whole process. Auto-restart hides the leak, the cycle repeats, nobody investigates until customers notice.


Heap dumps contain enormously more information: a byte-by-byte snapshot of every live object, every reference, every field. 


<Example project link with /heapdump endpoint>


After running the sample application, we run the command below to take a heap dump:


PS C:\observe-jvm> jcmd 20484 GC.heap_dump C:\observe-jvm\leak.hprof

20484:

Dumping heap to C:\observe-jvm\leak.hprof ...

Heap dump file created [956318512 bytes in 0.678 secs]


We can also get the heap dump from the actuator endpoint:


http://localhost:8080/actuator/heapdump


Then we can open the hprof file using Intellij Idea:





(To be continued)



Comments

Popular posts from this blog

The WeakReference class, monitoring memory leak and garbage collection in a Java application

Simplescalar Simulator - Part 2: sim-outorder.c

Notes on Java Performance