Kotlin Language Features Related to Null Handling

Any software engineer with a Java background would find the null handling features in the Kotlin language interesting. Let's summarize this topic with some examples.

Nullable types: In Kotlin, types are non-nullable by default. If you want a variable to be able to hold a null value, you need to explicitly declare its type as nullable using the Type? syntax. For example, String? denotes a nullable string, while String represents a non-nullable string.

Safe calls (?.): Kotlin introduces the safe call operator (?.) for handling nullable types. It allows you to safely invoke a method or access a property on a nullable object. If the object is null, the expression returns null instead of throwing a NullPointerException.

Example:

data class Person(val name: String, val age: Int, val address: String?)

fun main() {
    // Create a person with an address and one whose nullable address is null
    val person1 = Person("John Doe", 25, "123 Main Street")
    val person2 = Person("Jane Doe", 30, null)

    // Safe call: evaluates to the length if address is non-null, otherwise to null
    println(person1.address?.length) // 15
    println(person2.address?.length) // null
}

Notes on Java Performance

Performance monitoring, performance profiling and performance tuning are different activities. For example, monitoring is not an intrusive action, but profiling is, because it may change the responsiveness of the running application.

User CPU utilization vs. kernel CPU utilization: the ideal situation is 0% kernel CPU utilization, so that all CPU cycles are spent on our application code.

The JVM's just-in-time (JIT) compiler performs dynamic optimizations: it makes decisions while the program is running and generates native code with better performance.

Java 1.0 was completely interpreted, but beginning with Java 1.1.8 there was a just-in-time compiler, which made it about 8 times faster. After that, parallelizing the garbage collector was another huge improvement.

There are hundreds of Java tuning flags.

https://github.com/ScottOaks/JavaPerformanceTuning

Boolean flags use this syntax: -XX:+FlagName enables the flag, and -XX:-FlagName disables the flag.
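
For example (a hedged illustration; flag availability varies by JDK version, and myapp.jar is just a placeholder):

java -XX:+PrintGCDetails -jar myapp.jar      # enables detailed GC logging
java -XX:-TieredCompilation -jar myapp.jar   # disables tiered compilation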

32 BIT VS 64 BIT

From OS Perspective:

A 32-bit system can address a maximum of 4 GB of RAM. The actual limit is often less, around 3.5 GB.
What's important is that a 64-bit computer (which means it has a 64-bit processor) can access more than 4 GB of RAM. If a computer has 8 GB of RAM, it had better have a 64-bit processor; otherwise, at least 4 GB of the memory will be inaccessible to the CPU.
From JVM Perspective:

64-bit vs. 32-bit really boils down to the size of object references, not the size of numbers. In 32-bit mode, references are four bytes, allowing the JVM to uniquely address 2^32 bytes of memory. This is the reason 32-bit JVMs are limited to a maximum heap size of 4GB (in reality, the limit is smaller due to other JVM and OS overhead, and differs depending on the OS).

In 64-bit mode, references are (surprise) eight bytes, allowing the JVM to uniquely address 2^64 bytes of memory, which should be enough for anybody. JVM heap sizes (specified with -Xmx) in 64-bit mode can be huge.

But 64-bit mode comes with a cost: references are double the size, increasing memory consumption. This is why Oracle introduced "compressed oops" (compressed ordinary object pointers). With compressed oops enabled, object references are shrunk to four bytes, with the caveat that the heap is limited to four billion objects (and roughly a 32 GB -Xmx).
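
A hedged example of how this is typically controlled on HotSpot (flag defaults vary by JVM version, and myapp.jar is again just a placeholder):

java -Xmx31g -XX:+UseCompressedOops -jar myapp.jar

Combining -XX:+PrintFlagsFinal with -version can be used to check whether compressed oops are actually in effect for a given heap size.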

volatile keyword in Java

Let's say there are two threads and they run on different CPU cores. These CPUs have their own memory caches in addition to the main memory. A thread can change the value of a variable, but the change may not be committed to main memory yet. In this case, the other thread may not be aware of the update. To guarantee visibility, we need to declare the variable as "volatile", so that each update is reflected in main memory.
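
A minimal sketch of this visibility use case (the class and field names are illustrative, not from the post):

public class VolatileFlagDemo {
    // Without volatile, the worker thread might keep using a cached value
    // of this field and never observe the update made by the main thread.
    private static volatile boolean running = true;

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (running) {
                // spin until the main thread clears the flag
            }
            System.out.println("Worker saw running = false and stopped.");
        });
        worker.start();

        Thread.sleep(100);   // let the worker start spinning
        running = false;     // volatile write: made visible to the worker
        worker.join();
    }
}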
volatile declaration is not always enough:

As soon as a thread needs to first read the value of a volatile variable and, based on that value, generate a new value for the shared volatile variable, volatile alone is no longer enough to guarantee correct behavior. The short time gap between reading the volatile variable and writing its new value creates a race condition: multiple threads might read the same value, each generate a new value, and, when writing the value back to main memory, overwrite each other's updates.

If our variable is volatile, it is guaranteed that everything that happened before the write to that variable will be visible to any thread that subsequently reads the same variable. This is called the happens-before relationship.
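
A hedged sketch of this happens-before guarantee (names are illustrative):

public class HappensBeforeDemo {
    private static int data = 0;                    // plain, non-volatile field
    private static volatile boolean ready = false;  // volatile guard

    public static void main(String[] args) {
        Thread writer = new Thread(() -> {
            data = 42;     // 1. plain write
            ready = true;  // 2. volatile write: publishes the earlier write to 'data' as well
        });

        Thread reader = new Thread(() -> {
            while (!ready) {
                // wait for the volatile flag
            }
            // The volatile read of 'ready' happens-after the volatile write,
            // so the earlier write to 'data' is guaranteed to be visible here.
            System.out.println(data); // prints 42
        });

        reader.start();
        writer.start();
    }
}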

http://tutorials.jenkov.com/java-concurrency/volatile.html

The volatile keyword does not block threads; for mutual exclusion or atomic read-modify-write operations we need the synchronized keyword or types like AtomicInteger.
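
A hedged sketch of the counter case described above (class name and iteration counts are illustrative): a volatile int would still lose updates under concurrent increments, whereas AtomicInteger performs the read-modify-write atomically.

import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCounterDemo {
    private static final AtomicInteger counter = new AtomicInteger(0);

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) {
                counter.incrementAndGet(); // atomic read-modify-write
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(counter.get()); // always 20000
    }
}
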
One of the performance characteristics of Java is that code performs better the more it is executed, because the JIT compiler progressively compiles and optimizes frequently run code. This initial phase is called the warm-up period.

Generational Garbage Collection

The heap is divided into a number of sections called generations, each of which holds objects according to their "age" on the heap. Empirical analysis of applications has shown that most objects are short-lived, so younger objects can be scanned more frequently than older ones, using minor/major collection cycles. Today, almost all garbage collectors are generational.

The heap has 3 generations: the young generation, the old generation and the permanent generation (PermGen, replaced by Metaspace since Java 8). The young generation consists of the Eden space, Survivor 1 and Survivor 2. In minor garbage collection cycles, young generation objects are scanned, and surviving objects are eventually promoted to the old generation. When the old generation fills up, a major garbage collection cycle is performed. Since the old generation takes more space and most objects die young anyway, major collections are expensive, so triggering fewer major collection cycles is good for performance.
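
A hedged sizing example with HotSpot flags (the values and the jar name are illustrative, not recommendations):

java -Xms2g -Xmx2g -Xmn512m -XX:SurvivorRatio=8 -jar myapp.jar

Here -Xmn fixes the young generation size and -XX:SurvivorRatio sets the ratio of Eden to each survivor space.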

Question: Is it possible to «resurrect» an object that became eligible for garbage collection?

Hint: finalize() method
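
A hedged sketch of the classic answer (class and field names are illustrative; finalize() is deprecated in modern Java, runs at most once per object, and System.gc() is only a hint, so the outcome is not guaranteed):

public class Resurrect {
    static Resurrect zombie; // a reachable static field used to "resurrect" the object

    @Override
    protected void finalize() {
        // Store a reference to this object in a reachable field,
        // making it strongly reachable again.
        zombie = this;
    }

    public static void main(String[] args) throws InterruptedException {
        Resurrect obj = new Resurrect();
        obj = null;          // the object becomes eligible for garbage collection
        System.gc();         // request a GC; finalize() may then run
        Thread.sleep(100);   // give the finalizer thread a chance to run
        System.out.println(zombie != null ? "resurrected" : "not (yet) resurrected");
    }
}

Because finalize() runs only once per object, a second collection after setting zombie back to null would reclaim the object for good.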

Memory issues

  • The fastest garbage collection is the garbage collection that does not happen, so try to decrease your memory usage.
  • Fat data: object header
    • new Object(); // 16 bytes (8-byte mark word used for locking, identity hash etc. + class pointer)
    • new byte[0]; // 16 bytes for the object header + 4 bytes for the array size + 4 bytes of padding = 24 bytes
    • Subclassing + padding can be a problem. Imagine a class that holds 1 byte: 16-byte object header, 1 byte of data and 7 bytes of padding. When you subclass it and the subclass adds 1 byte of data, that is again 1 byte of data and 7 bytes of padding! (See the sketch after this list.)
      • This means sometimes you may have to choose between nice design and good performance.
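
A hedged sketch of the subclassing scenario from the list above (class names are illustrative; exact sizes depend on the JVM version, compressed oops and alignment, and can be verified with the JOL library if it is on the classpath):

// Layout illustration only; actual sizes are JVM-dependent.
class Tiny {
    byte a;   // ~16-byte header + 1 byte of data + padding to the alignment boundary
}

class TinyChild extends Tiny {
    byte b;   // on many JVMs the superclass's padded block is kept, so another 1 byte + padding
}

public class LayoutDemo {
    public static void main(String[] args) {
        // If the (assumed) JOL dependency org.openjdk.jol:jol-core is available,
        // the real layout can be inspected, e.g.:
        //   System.out.println(org.openjdk.jol.info.ClassLayout.parseClass(TinyChild.class).toPrintable());
        System.out.println(new TinyChild().getClass().getSimpleName() + " created; see comments for layout notes.");
    }
}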



