JXRay Memory Analyzer analyzes binary heap dumps, which are essentially snapshots of the JVM memory. There are two main methods of obtaining heap dumps. To get a dump from a running application, invoke the jmap utility that comes with the JDK:
> jmap -dump:live,format=b,file=myheapdump.hprof <app JVM pid>
For more information, invoke jmap -help or check its online documentation.
You can also tell the JVM to generate a heap dump if your application fails with OutOfMemoryError. To do this, add the -XX:+HeapDumpOnOutOfMemoryError flag to the target JVM's command line. See the relevant HotSpot JVM documentation for more details. A heap dump obtained in this way can be more informative, since it contains a snapshot of your application's memory when it's completely filled up. This often helps to expose data structures that are the worst in terms of memory consumption, and thus should be optimized first.
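Besides the two methods above, a HotSpot JVM can also be asked to dump its own heap from inside the application, through the standard HotSpotDiagnosticMXBean management interface. The following is a minimal sketch under stated assumptions: the HeapDumper class and the output file name are hypothetical and not part of JXRay, and the code only works on JVMs that expose this MXBean.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of JXRay): asks the current HotSpot JVM
// to write a binary heap dump of itself.
final class HeapDumper {

    // liveOnly = true dumps only reachable objects, similar to jmap's "live" option.
    static Path dump(Path target, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // dumpHeap() typically fails if the file already exists, so this
        // sketch removes any previous dump with the same name first.
        Files.deleteIfExists(target);
        bean.dumpHeap(target.toAbsolutePath().toString(), liveOnly);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path out = dump(Paths.get("myheapdump.hprof"), true);
        System.out.println("Wrote " + Files.size(out) + " bytes to " + out);
    }
}
```

The file name ends in .hprof because newer JDK versions reject other extensions. The resulting file can be passed to jxray.sh like any other dump.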
Android produces JVM heap dumps in its own format. To convert an Android heap dump into the common format, you can use the hprof-conv tool provided in the Android SDK, for example:
> hprof-conv dump.hprof dump-converted.hprof
For more information, check this article.
To run the JXRay Analyzer, simply invoke the jxray.sh script from the command line like this:
> jxray.sh myheapdump.hprof myheapdump.html
The script expects that the JDK bin directory is on your PATH. Alternatively, you can specify the JDK location to the script via the JXRAY_JAVA_HOME environment variable, which should point to your JDK's root directory. To adjust the JVM settings (for example, if your dump is very big and you need to give the JVM more memory), you may either
You typically need to set -Xmx to at least the same value as the size of the dump file. For better performance, a bigger value (between 1.3x and 1.7x the size of the analyzed heap dump) is recommended, especially if your disk is slow and/or your heap dump contains a large number of small objects. It is also advised that your machine has at least 4 CPU cores available (otherwise, for one thing, the performance of the garbage collector in the JVM on which JXRay runs may suffer). 8 CPU cores are recommended for optimum performance, and a bigger number may further improve speed in some situations.
Note, however, that JXRay memory requirements may be relaxed for some heap dumps. The rule of thumb is that the better a heap dump compresses with zip or gzip, the less memory is needed to process it. For (rare) heap dumps that get compressed by a factor of 10 or so, JXRay can run with a heap that's considerably smaller than the dump file size. That's because such dumps tend to contain mostly duplicate objects, and thus the amount of information that JXRay has to keep while processing them is smaller than usual.
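The sizing guidance above is simple arithmetic, and can be sketched as follows. The JxrayHeapSizing class and its method names are hypothetical; the only inputs taken from the text are the "at least the dump size" minimum and the 1.3x to 1.7x recommended range.

```java
// Hypothetical helper: turns the heap sizing guidance into concrete numbers.
final class JxrayHeapSizing {

    private static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    // Minimum suggested -Xmx: the size of the dump file itself.
    static long minHeapBytes(long dumpBytes) {
        return dumpBytes;
    }

    // Lower end of the recommended range: 1.3x the dump size (rounded up).
    static long recommendedLowBytes(long dumpBytes) {
        return ceilDiv(dumpBytes * 13, 10);
    }

    // Upper end of the recommended range: 1.7x the dump size (rounded up).
    static long recommendedHighBytes(long dumpBytes) {
        return ceilDiv(dumpBytes * 17, 10);
    }

    // Formats a byte count as a whole-megabyte -Xmx flag, rounding up.
    static String xmxFlag(long bytes) {
        return "-Xmx" + ceilDiv(bytes, 1024L * 1024L) + "m";
    }

    public static void main(String[] args) {
        long dump = 10L * 1024 * 1024 * 1024; // a 10 GiB dump file, as an example
        System.out.println("minimum:     " + xmxFlag(minHeapBytes(dump)));
        System.out.println("recommended: " + xmxFlag(recommendedLowBytes(dump))
            + " to " + xmxFlag(recommendedHighBytes(dump)));
    }
}
```

For highly compressible dumps, as noted above, a smaller value than this minimum may still work.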
The output of JXRay goes to the specified file in HTML format. You can also use the -email <address> command line flag to make the tool send the report to the specified address. Note that for that to work, your machine should have the /usr/bin/mail utility configured properly.
JXRay has a number of other command line flags that are useful in some rare situations. You can learn about them by invoking jxray.sh -help.
One other flag that you may find useful occasionally is -extra_classes <class1,[class2,...]>. It allows you to specify one or more classes that the tool would not report otherwise, since their instances take too little (less than 0.1%) of memory. In most situations such classes are not interesting and would just clutter the output. However, when debugging some problems, you may need to know how many instances of some small, but important class are present in memory, what values they contain, etc. That's when this flag comes in handy.
A report generated by the JXRay Analyzer for the given heap dump consists of a number of sections. When you first open the page, all sections except the topmost one are collapsed to avoid clutter. You can click on the small black triangle to the left of the section title to expand and view the given section. Within most sections, many more things can be expanded and then collapsed again.
Each section is devoted to a specific heap metric or potential problem. Whenever possible, the tool reports the overhead (waste) associated with the given problem, in bytes and as a percentage of the whole used heap size, for example "560,810K (31.4%)". The overhead is how much memory you could save in the ideal case, if you completely eliminated the given problem.
MOST IMPORTANT ISSUES. The topmost part of the report tells you immediately if your dump has any serious issues. Only problems with overhead greater than a certain threshold are listed here. When the overhead, i.e. the degree of importance, of some problem is really high, it's marked in yellow or red. The details of each problem are given in the corresponding section later in the report.
Top-Level Stats. When you expand this section, you see the total number and size, in Kbytes, of all objects in the heap, as well as a breakdown by object type (instances vs object arrays vs primitive arrays) and live/garbage status. Some other general information about the heap is provided as well. If any thread threw an OutOfMemoryError when this dump was generated, its stack trace is shown in this section as well.
Where Memory Goes
The next few sections help you understand where memory goes: instances of what classes take memory (object histogram), what GC roots and reference chains hold all these objects in memory, how much memory and how many objects are live vs garbage, how much memory is used by object headers (rather than the useful data that's stored in objects), etc.
Where Memory Goes, by Class. This section, also called object histogram, gives a breakdown of memory used by every class taking 0.1% or more of the used heap space. The same 0.1% rule is generally used in other sections as well; this value is adjustable via the -min_reported_percent command line flag. For each class the report displays:
As you can see, shallow and implementation-inclusive sizes are the same for byte, short and int arrays, as well as most other objects (not shown here). However, these sizes are different for HashMap objects (as well as other collections that JXRay knows about), and for Strings, StringBuilders, and StringBuffers. That's because a collection is comprised internally of a number of objects: the main one (e.g. a HashMap instance) and the implementation objects (e.g. the table array backing the HashMap, a HashMap$Node object for each key-value pair, etc.). Similarly, a String is technically two objects: the String instance itself (that’s what shallow size is for) and a char array with the string contents. The (variable) size of that array, together with the fixed String instance size, is implementation-inclusive size. For HashMap and other collection types, the shallow size is the size of the "naked" HashMap instance, whereas implementation-inclusive size adds the size of all the implementation objects backing this collection.
The fourth column, direct retained memory for objects of the given class C, is the memory used by instances of C plus all the objects referenced directly by these instances. This metric is especially useful for classes such as java.net.URI, that "exclusively own" the objects that they reference (e.g. String host, protocol, etc.). For such objects, direct retained memory immediately indicates how much memory these objects practically consume.
If you want to find out where instances of the given class "come from" - that is, what data structures hold them in memory, all the way down to GC roots - you can click on the "Reference chains" line. That will open a subsection with all the reference chains leading to groups (clusters) of same-type objects (HashMap instances in our example above). An object cluster, by definition in JXRay, is a group of objects of the same type, all reachable via the same reference chain. Generally, only clusters that take at least 0.1% of the used heap are displayed.
One powerful feature of JXRay is that it joins together (aggregates) all objects that are reachable via the same reference chain. The intermediate objects in the reference chain itself are aggregated too. For example, an intermediate HashSessionManager._sessions reference in the above example may correspond to one or many references to ConcurrentHashMap objects in memory. The big advantage of aggregated reference chains is that the repetitive details that would otherwise clutter the output (potentially multiple almost identical ConcurrentHashMap instances) are removed. The remaining, distilled information allows you to see immediately what code is responsible for the fact that the given objects are held in memory.
To give you an idea of what kind of objects are contained in the given cluster, JXRay takes a random sample of objects. You can open this sample by clicking on the first line for the given class (the one that contains the class name, number of objects, etc.). Sometimes a sample of objects can be really revealing, as in the example below, where it turns out that most ArrayLists in the given cluster are very small. When practical, JXRay prints the contents of objects (or the first few elements for collections).
JXRay is able to further aggregate and distill reference chains, showing you the "Expensive fields". Quite often there are two or more different reference chains that end with the same data field, for example:
JXRay aggregates the above reference chains (plus some more, where each individual one holds just a small number of objects), and presents the result as
When many long reference chains ending with com.google.protobuf.LiteralByteString.bytes are joined in this way, it becomes clear that this class/field is responsible for a really high number of arrays and, consequently, used memory.
Where Memory Goes, by GC Root (Check for Memory Leaks). This information is crucial for understanding sources of high memory consumption and memory leaks.
To answer the "who holds all these objects in memory" question, we need to be able to track the reference chains from GC roots to objects. To collect this information, JXRay scans the object tree in the heap dump starting from GC roots. If some objects are reachable from several roots, the priority is given to (a) the more informative root, such as a static variable as opposed to a JNI reference, and (b) a shorter reference chain. Once the entire object tree has been scanned, the tool groups all objects by their respective GC roots.
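The scan described above can be pictured as a breadth-first traversal that starts from each GC root in turn. The sketch below is an illustration only and not JXRay's actual implementation: the RootAttribution class, the string object ids and the toy heap are all hypothetical. Roots are visited in order of decreasing informativeness, so an object reachable from several roots is attributed to the more informative one, and breadth-first order means each object is reached via a shortest chain from that root.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; not JXRay's actual algorithm.
final class RootAttribution {

    // heap:  object id -> ids of the objects it references.
    // roots: GC roots, listed from most to least informative.
    // Returns object id -> the root that the object is attributed to.
    static Map<String, String> attribute(Map<String, List<String>> heap, List<String> roots) {
        Map<String, String> owner = new LinkedHashMap<>();
        for (String root : roots) {
            Deque<String> queue = new ArrayDeque<>();
            if (owner.putIfAbsent(root, root) == null) {
                queue.add(root);
            }
            while (!queue.isEmpty()) {
                String obj = queue.remove();
                for (String ref : heap.getOrDefault(obj, Collections.<String>emptyList())) {
                    // An object already claimed by an earlier root is skipped.
                    if (owner.putIfAbsent(ref, root) == null) {
                        queue.add(ref);
                    }
                }
            }
        }
        return owner;
    }

    // Groups all attributed objects by their GC root.
    static Map<String, List<String>> groupByRoot(Map<String, String> owner) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : owner.entrySet()) {
            groups.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<String, List<String>> heap = new HashMap<>();
        heap.put("staticVar", Arrays.asList("A"));
        heap.put("A", Arrays.asList("B"));
        heap.put("jniRef", Arrays.asList("B", "C"));
        // B is reachable from both roots; the static variable comes first.
        System.out.println(groupByRoot(attribute(heap, Arrays.asList("staticVar", "jniRef"))));
    }
}
```

In this simplified version root priority always wins over chain length; how a real tool balances the two criteria is not specified here.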
If your application’s behavior suggests a memory leak or, generally, retention of a high amount of data, the GC roots and reference chain(s) holding most memory, at the top of this section, are what you should look at.
Live vs Garbage Objects. When JXRay scans the heap dump, it determines the live/garbage status of each object. The table in this section presents aggregated information for all objects. Types are sorted by the total live size of their instances. Note that in this section the tool uses only shallow size for each type (no implementation-inclusive size, as in the "Where Memory Goes, by Class" section). This is due to the fact that when collections and Strings become garbage, some of their implementation objects may be GCed and some not, leaving the remaining garbage data structures in an inconsistent state. That, in turn, would not allow the tool to accurately determine the implementation-inclusive size of such garbage objects.
Object Headers. Each object in the JVM memory has a header - essentially a JVM-internal record that contains a pointer to the object’s class, bits used by the GC and synchronization locking mechanisms, etc. The object header is not small: its size varies between 8 and 16 bytes depending on the JVM process mode (32-bit or 64-bit, with additional “narrow pointer” vs. “wide pointer” modes for the latter). An array uses an additional 4 bytes to designate its size. If your application creates a large number of small objects - those with a useful workload of a few bytes - the total size of the headers of these objects, compared to the amount of data that they contain, can become really large.
Memory Retained by Objects Awaiting Finalization. Any class can override the finalize() method of java.lang.Object to perform cleanup and similar operations on an instance before it gets garbage collected. In certain situations, this may cause some objects that are garbage from the application's standpoint (they are not reachable from any GC roots anymore) to stay in memory. For example, when a very large number of "finalizable" objects are being created by the application all the time, the single finalization thread may be unable to keep up. Another rare, but real problem is a finalize() method that gets blocked or deadlocked. In this case, the whole finalization thread is blocked and the number of unfinalized objects keeps growing, causing a memory leak.
The JVM puts all unfinalized objects into a single queue, with the root at the java.lang.ref.Finalizer.queue static field. If objects reachable from this root take a significant enough amount of memory, the whole tree will be presented in this section as a number of reverse reference chains similar to those in the previous sections.
Memory Waste due to Specific Problems
Each of the following sections is devoted to a specific problem that can potentially be fixed by making changes to the code. In each section, the first part shows the impact of a specific problem, such as duplicate strings or suboptimal collections, and presents the top offending objects or types (for example, string values that are duplicated most). The second part of the section presents reference chains that hold the problematic objects in memory.
Duplicate Strings. This is one of the most common memory problems observed in Java applications. Duplicate strings - that is, multiple String objects with the same logical value, such as “0” in the example below - are often created when a lot of data is read from external sources, e.g. a database. If such strings are then kept in memory for a long time, and/or each of them is big, they may become a significant source of waste. Getting rid of duplicate strings typically requires adding special “deduplication” code. The most common, scalable and reliable way to do that is by calling the String.intern() method. It's best to add this call at the location where duplicate strings are generated, or where these strings are assigned to permanent data structures (e.g. in constructors).
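As a minimal sketch of interning in a constructor, consider the hypothetical CustomerRecord class below (the class and its fields are assumptions made for illustration). Only the field that is expected to repeat across many records is interned.

```java
// Hypothetical record type populated from an external source such as a database.
final class CustomerRecord {
    final String name;
    final String country;

    // Assumes both arguments are non-null.
    CustomerRecord(String name, String country) {
        this.name = name;                // mostly unique values: interning would not help
        this.country = country.intern(); // few distinct values: all records share one copy
    }

    public static void main(String[] args) {
        // new String(...) simulates separate copies arriving from external data.
        CustomerRecord a = new CustomerRecord("Ann", new String("US"));
        CustomerRecord b = new CustomerRecord("Bob", new String("US"));
        System.out.println("same String object: " + (a.country == b.country));
    }
}
```

Interning fields whose values are mostly unique brings little benefit, so it is usually applied selectively to the fields that the report identifies as duplicated.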
In the subsection that follows, JXRay presents the familiar reverse reference chains leading to the biggest clusters of duplicate strings. When the problematic objects are unreachable (garbage), it attempts to find clusters of similar live strings - they may give you a clue about the source of such strings.
In the example below, 13,885K, or 2.8% of the used heap, is the overhead (waste) due to the duplication of strings reachable via the reference chain unfolded below. That is, if we interned all these strings, our heap consumption would be reduced by 2.8 percent. The number 134660 is the number of duplicate strings referenced by this reference chain, and 100% is the percentage of strings that are duplicate relative to all the strings reachable via this reference chain. In this case, all the strings have duplicates (but this is not always the case). The "(10 unique)" means that if all the copies were eliminated, we would see only 10 unique strings here. If you click on this line, it will expand, presenting a table with several specimens of the duplicate strings in this cluster.
To learn more about duplicate strings and changing the code to eliminate them, check the following article
Bad Collections. Java and Scala standard collection classes, such as java.util.ArrayList, java.util.HashMap etc., are very useful. However, applications may sometimes abuse them, resulting in wasted memory. The JXRay Analyzer currently recognizes the following problems with collections:
Empty collections are those that are initialized but don’t contain any elements (workload). Their overhead is defined as the size of the entire collection implementation. It can be really high, varying between around 100 bytes for an ArrayList to 500..1000 bytes for complex structures like ConcurrentHashMap. Empty collections are usually easy enough to avoid with the help of some additional code that allocates collections lazily and performs null checks where necessary.
Single-element collections contain only one element. Their overhead is defined in the same way as for empty collections. If in some place in the program all collections contain only one element, you may find that there was a mistake there, and you can simply replace a collection with a direct reference to the object. Otherwise, i.e. if for some legitimate reason most of the time most collections have only one element, but sometimes may contain several, you may implement custom code that replaces, say, an ArrayList<Foo> x data field with Object x. Then, depending on the number of objects to store, this code would set x to point to either a single Foo object or
Small collections are, by definition in JXRay Analyzer, those that contain between 2 and 4 elements. Such collections are suboptimal in most cases, since the size of their workload (2 to 4 object elements, i.e. object pointers) is still small compared to the size of the collection implementation. Furthermore, many collections that are backed by an array by default allocate arrays that are considerably bigger, typically 10 to 16 elements, resulting in even more waste. Note that, say, a hash map with a small number of elements will likely not perform searches and other
operations faster than a plain array with handwritten search functionality. Thus if some code in your application manages a large number of small collections such as
may be worth considering replacing them with plain arrays, that would result in much smaller overhead. At a minimum, it is worth allocating these lists with a small initial capacity: use
new ArrayList(2)' instead of '
Sparse collections are those that use an array internally (for example, ArrayList or HashMap), contain more than 4 elements, and have more than half of the internal array elements equal to null. This can happen if either a collection was allocated with initial capacity significantly larger than needed, or if many elements have been removed from it (most array-based collections never "shrink" when elements are removed). If your application has a large overhead due to sparse collections, consider allocating them with a more suitable initial capacity or re-creating them after many elements are removed.
Lists with too many null elements are, by definition in JXRay Analyzer, those where more than 25% of elements are null. If a list contains a large number of nulls, it is probably a sign of a bug or a suboptimal data structure. In the latter case, you may want to consider changing your code at a higher level to store the same information in a more compact form.
In the above table, the first column is the overhead for the given combination of a collection type (last column) and problem (second column). For example, if we got rid of all single-element ArrayLists, we would save 457,454K, or 16.7% of the used heap. The third column shows how many objects of this type have the given problem, and what's their percentage. That is, 84% of all ArrayLists in this heap dump are single-element, 11% of ArrayLists are small, and only the remaining 5% are "normal".
To learn more about suboptimal collections, and changing your code to get rid of them, check the following article
The next subsection, as in all other bad/duplicate object sections, contains the familiar reverse reference chains leading to the clusters of problematic collections.
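Two of the fixes described in this section, lazy allocation of collections and storing either a single object or a list in one field, can be sketched as follows. The Order class and its fields are hypothetical, and the second technique assumes that the element type is never itself a List.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class illustrating two ways to avoid empty and
// single-element collections.
final class Order {

    // Fix 1: allocate lazily; null means "no tags yet".
    private List<String> tags;

    void addTag(String tag) {
        if (tags == null) {
            tags = new ArrayList<>(2); // small initial capacity
        }
        tags.add(tag);
    }

    List<String> tags() {
        return tags == null ? Collections.<String>emptyList() : tags;
    }

    // Fix 2: one field holds null, a single String, or an ArrayList<String>.
    private Object items;

    @SuppressWarnings("unchecked")
    void addItem(String item) {
        if (items == null) {
            items = item;                       // one element: no list at all
        } else if (items instanceof String) {
            List<String> list = new ArrayList<>(2);
            list.add((String) items);
            list.add(item);
            items = list;                       // second element: switch to a list
        } else {
            ((List<String>) items).add(item);
        }
    }

    @SuppressWarnings("unchecked")
    List<String> items() {
        if (items == null) return Collections.emptyList();
        if (items instanceof String) return Collections.singletonList((String) items);
        return (List<String>) items;
    }

    // True while the single element is stored directly, without a list.
    boolean itemStoredDirectly() {
        return items instanceof String;
    }
}
```

The second technique trades type safety and some code complexity for memory, so it is probably worth considering only for classes with a very large number of instances.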
Bad Object Arrays. Similarly to bad collections, object arrays of wrong size, or with likely unused contents, result in wasted memory. The JXRay Analyzer currently recognizes the following problems with object arrays:
Empty arrays are those that contain only null elements. Their overhead is defined as the size of the entire array. Empty arrays can be avoided, for example, with the help of some additional code that allocates arrays lazily and performs null checks where necessary.
Length 0 arrays typically show up if some code allocates arrays of requested size without checking for zero. Their overhead is defined as the size of the entire array. If the code really needs zero-length arrays for some reason, multiple arrays can be easily replaced with a singleton zero-length array.
Length 1 arrays, whose overhead is also defined as the size of the entire array, can be avoided in the same way as single-element collections. That is, either Foo[] x can be replaced with Foo x, or a more complex solution may be implemented. The latter would replace the Foo[] x field with Object x, which would ultimately point either to an array or to a single
Single-element arrays, with overhead again defined as the entire array size, are those that have length greater than 1, but contain only one non-null element. They can be dealt with in the same way as length 1 arrays.
Sparse arrays, by definition in JXRay, are those that contain 70% or more of null elements. You may want to consider allocating arrays with smaller capacity, or changing your code at a higher level to store the same information in a more compact form.
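The singleton zero-length array mentioned above can be sketched as follows; the NameArrays class and its allocate method are hypothetical names chosen for illustration.

```java
// Hypothetical allocation helper that never creates more than one
// zero-length array.
final class NameArrays {

    // Zero-length arrays are immutable, so one instance can be shared safely.
    private static final String[] EMPTY = new String[0];

    static String[] allocate(int size) {
        if (size < 0) {
            throw new IllegalArgumentException("negative size: " + size);
        }
        return size == 0 ? EMPTY : new String[size];
    }

    public static void main(String[] args) {
        System.out.println("shared: " + (allocate(0) == allocate(0)));
    }
}
```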
Bad Primitive Arrays. Similarly to suboptimal collections and object arrays, primitive arrays of wrong size, or with likely unused contents, cause memory waste. The JXRay Analyzer currently recognizes the following problems with primitive arrays:
Empty arrays are those that contain only zeroes. Their overhead is defined as the size of the entire array. Empty arrays can be avoided with the help of some additional code that allocates arrays lazily and performs null checks where necessary. However, in some cases, e.g. with I/O buffers, it's hard to avoid having some temporarily empty byte, int etc. arrays in memory.
Length 0 arrays typically show up if some code allocates arrays of requested size without checking for zero. Their overhead is defined as the size of the entire array. Multiple length 0 arrays can be easily replaced with a singleton array.
Length 1 arrays, whose overhead is also defined as the size of the entire array, can sometimes be avoided in the same way as single-element collections. That is, int[] x can be replaced with int x. Alternative application-specific solutions might also be possible.
Arrays with a high number of trailing zeroes are, by definition in JXRay, those that have the last 25% or more of elements equal to zero. The overhead of such an array is the size, in bytes, of all the trailing zero elements. Such arrays are often buffers that have been created too big, and the problem with them may be addressed by reducing the buffer size.
Sparse arrays, by definition in JXRay, are those that contain 70% or more of zero elements. You may want to consider allocating arrays with smaller capacity, or changing your code at a higher level to store the same information in a more compact form.
Zero high bits arrays are those that contain 70% or more of elements with unused (zero) high half. This only applies to integer types taking two or more bytes, i.e. short, char, int and long. For example, an int array with zero high bits will only contain elements in the -32,768..32,767 range. If in some array all or most elements have zero high bits, you may consider changing its type to a narrower one, e.g. int instead of long, etc.
The breakdown by problems and the reporting of reference chains for bad primitive arrays are very similar to those for bad collections and bad object arrays.
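To make the "trailing zeroes" definition above concrete, the sketch below counts the trailing zero elements of a byte array, applies the 25% threshold, and produces a right-sized copy. It is an illustration under stated assumptions and not JXRay's implementation: the TrailingZeroes class is hypothetical, and trimming is only appropriate when the trailing zeroes are unused buffer space rather than meaningful data.

```java
import java.util.Arrays;

// Illustrative helper for byte buffers that were allocated too big.
final class TrailingZeroes {

    // Number of zero elements at the end of the array.
    static int count(byte[] a) {
        int n = 0;
        for (int i = a.length - 1; i >= 0 && a[i] == 0; i--) {
            n++;
        }
        return n;
    }

    // True when the trailing run of zeroes covers 25% or more of the array.
    static boolean isHigh(byte[] a) {
        return a.length > 0 && count(a) * 4L >= a.length;
    }

    // Returns a copy without the trailing zeroes.
    static byte[] trim(byte[] a) {
        return Arrays.copyOf(a, a.length - count(a));
    }

    public static void main(String[] args) {
        byte[] buffer = new byte[8];
        buffer[0] = 1;
        buffer[1] = 2;
        System.out.println("trailing zeroes: " + count(buffer)
            + ", wasteful: " + isHigh(buffer)
            + ", trimmed length: " + trim(buffer).length);
    }
}
```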
Boxed Numbers. Boxed numbers are instances of classes such as java.lang.Integer, java.lang.Long etc. One common reason for using them is the fact that the JDK doesn’t have standard collection classes that store plain longs etc. That is, if you want to create, say, a HashMap that maps names to numbers, you need to either instantiate a java.util.HashMap<String, Integer> or use third-party libraries.
Unfortunately, for most real-life applications data structures with boxed numbers would be a bad choice from the performance standpoint. That’s because, for example, a java.lang.Integer object that stores a 4-byte integer takes between 16 and 24 bytes (depending on whether the JVM runs in the "narrow pointer" or "wide pointer" mode). Plus, there is a pointer to that object that takes 4 or 8 bytes. That is not to mention other effects, like application execution slowdown due to reduced cache locality, multiple memory reads instead of a single one, immutability of boxed numbers (so that to increment one, for example, you need to create a new object), etc. Thus, if your application's heap contains a significant quantity of boxed numbers, your best choice is to stop using them, which likely means switching to third-party libraries with data structures that store plain numbers.
Reporting of boxed numbers in JXRay is similar to that for bad collections and arrays.
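The per-element cost quoted above (16 to 24 bytes per Integer plus a 4 or 8 byte pointer, versus 4 bytes for a plain int) can be turned into a rough estimate. The BoxedCost class below is a hypothetical back-of-the-envelope calculator: it assumes every value has its own Integer object, and it ignores array headers and any overhead of the enclosing collection.

```java
// Rough estimate of memory used by boxed vs. plain ints, using the
// per-object figures quoted in the text.
final class BoxedCost {

    // n values, each stored as its own Integer object reached via one pointer.
    static long boxedBytes(long n, int integerObjectBytes, int pointerBytes) {
        return n * (integerObjectBytes + pointerBytes);
    }

    // n values stored directly, e.g. in an int[] (array header ignored).
    static long plainBytes(long n) {
        return n * Integer.BYTES;
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        System.out.println("plain ints:             " + plainBytes(n) + " bytes");
        System.out.println("boxed, narrow pointers: " + boxedBytes(n, 16, 4) + " bytes");
        System.out.println("boxed, wide pointers:   " + boxedBytes(n, 24, 8) + " bytes");
    }
}
```

With these figures, boxed storage takes 5 to 8 times more memory than the plain values.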
Duplicate Objects of arbitrary types occur much more rarely than, for example, duplicate Strings. But when they do, they almost always cause significant overhead. Currently JXRay analyzes only instances of non-collection classes for duplication. The analysis is "shallow". That is, consider objects A and B that point to objects X and Y, respectively. If A and B have identical contents (except for the references to X and Y that are different), and X and Y have identical contents, A and B will still be considered different. Duplicate objects, if they are immutable, can often be easily eliminated using some form of a canonicalization cache.
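One possible shape of such a canonicalization cache is sketched below. Both classes are hypothetical: Canonicalizer is a minimal cache built on ConcurrentHashMap, and Endpoint stands in for any immutable class whose equals() and hashCode() define what "identical contents" means.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Minimal canonicalization cache: returns one shared instance per distinct value.
final class Canonicalizer<T> {
    private final ConcurrentMap<T, T> cache = new ConcurrentHashMap<>();

    T canonical(T value) {
        T existing = cache.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }

    int size() {
        return cache.size();
    }
}

// Hypothetical immutable value class.
final class Endpoint {
    final String host;
    final int port;

    Endpoint(String host, int port) {
        this.host = host;
        this.port = port;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Endpoint)) return false;
        Endpoint other = (Endpoint) o;
        return port == other.port && Objects.equals(host, other.host);
    }

    @Override
    public int hashCode() {
        return Objects.hash(host, port);
    }
}
```

A typical call site would be something like `endpoint = cache.canonical(new Endpoint(host, port))`. Note that this simple cache holds strong references and never releases entries; a long-running application may need a bounded or weakly referenced variant.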
The following subsection provides the standard reference chains leading to clusters of duplicate objects. For each cluster some specimen objects are presented in the same format as in the table shown above:
Duplicate Primitive Arrays. Such arrays are similar to duplicate strings: they are separate objects, but their contents are the same. Note that JXRay is smart and can distinguish between “standalone” char arrays and those that belong to String instances - the latter are not analyzed in this section.
Heap Size Configuration. The HotSpot JVM stores objects in memory in different formats depending on the maximum heap size that you specify via the -Xmx command line option. When the maximum heap size is less than about 32 gigabytes (the so-called "narrow pointer mode"), ordinary instances have a 12-byte header, arrays have 16-byte headers, and each object pointer takes 4 bytes. Otherwise, i.e. when -Xmx is higher than 32GB ("wide pointer mode"), ordinary instances have 16-byte headers, arrays have 20-byte headers, and each object pointer takes 8 bytes. This is necessary for the JVM to support object addresses in really big heaps. But it means that as soon as your -Xmx value crosses the 32GB threshold, objects in your application suddenly become bigger. It may not be a problem if your application works mostly with big primitive arrays - in this case the effect of increased object headers and pointers is often negligible. However, if your application employs many pointer-intensive data structures such as trees, hash maps, etc., or just has many small objects, your memory consumption may suddenly grow more than 1.5 times! For more information, see Why 35GB Heap is Less Than 32GB
When JXRay detects that your application uses wide pointers, it calculates how much memory all your objects would use if you switched to narrow pointers by setting -Xmx below the 32GB threshold. It turns out that sometimes savings are big, and your new used heap size will be well below 32GB. In that case, you may be able to greatly reduce memory consumption by just setting -Xmx31g or less. Keep in mind, however, that the heap dump may not be taken at the moment when your application's memory utilization is at its peak, and make sure that your new maximum heap size will not cause your application to fail with OutOfMemoryError, or the JVM to run garbage collection too frequently.
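The effect of the two pointer modes described in this section can be estimated with simple arithmetic. The sketch below uses the header and pointer sizes quoted above; the PointerModeSizes class is hypothetical, and the rounding of every object size up to a multiple of 8 bytes is an additional assumption (it is the HotSpot default object alignment, but it is not stated in the text).

```java
// Estimates object sizes in "narrow pointer" and "wide pointer" modes.
final class PointerModeSizes {

    // Assumption: object sizes are rounded up to a multiple of 8 bytes.
    static long align8(long bytes) {
        return (bytes + 7) & ~7L;
    }

    // Size of an ordinary instance that has only reference fields.
    static long instanceBytes(int referenceFields, boolean widePointers) {
        int header = widePointers ? 16 : 12;
        int pointer = widePointers ? 8 : 4;
        return align8(header + (long) referenceFields * pointer);
    }

    // Size of an object array of the given length.
    static long objectArrayBytes(int length, boolean widePointers) {
        int header = widePointers ? 20 : 16;
        int pointer = widePointers ? 8 : 4;
        return align8(header + (long) length * pointer);
    }

    public static void main(String[] args) {
        // A tree node with three references, e.g. left, right and value.
        long narrow = instanceBytes(3, false);
        long wide = instanceBytes(3, true);
        System.out.println("tree node: " + narrow + " vs " + wide + " bytes");
        System.out.println("Object[10]: " + objectArrayBytes(10, false)
            + " vs " + objectArrayBytes(10, true) + " bytes");
    }
}
```

Under these assumptions a node with three references grows from 24 to 40 bytes, which is consistent with the "more than 1.5 times" growth mentioned above for pointer-intensive data structures.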
Very Long (Over 1000 Elements) Reference Chains. Very long chains of objects can be a problem, in that they are slow to traverse, may result in an extra overhead object per workload object, etc. If any such chain is detected, it is presented in this section in the standard reverse format.
Off-heap (native) memory used by java.nio.DirectByteBuffers
Almost every Java application uses some memory outside the JVM heap. This is done primarily by the I/O code, since the OS can only read/write memory areas that are outside the JVM heap (within the heap, objects can be relocated at any moment by the GC). The standard way for a pure Java application to allocate native memory is through java.nio.DirectByteBuffer objects. When such an object is created via the ByteBuffer.allocateDirect() call, it allocates the specified amount (capacity) of native memory using the native malloc() call. This memory is released only when the given DirectByteBuffer object is garbage collected and its internal "cleanup" method is called (the most common scenario), or when this method is invoked explicitly.
Most applications use a modest amount of native memory, but in some situations, e.g. when DirectByteBuffers are not GCed in time, the memory reserved by them can grow too large. Since DirectByteBuffer has an internal capacity data field, JXRay can calculate how much native memory is used by all these objects. As usual, the tool presents reference chains leading to all the buffers holding significant amounts of memory.
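The relationship between a direct buffer's capacity and native memory can also be observed from inside a running application. The sketch below allocates a direct buffer and then reads the JVM's own accounting for the "direct" buffer pool through the standard BufferPoolMXBean interface. The DirectMemoryDemo class is hypothetical and unrelated to how JXRay computes this figure from a heap dump.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Illustrative only: shows native memory reserved by direct buffers.
final class DirectMemoryDemo {

    // Bytes currently used by the "direct" buffer pool, or -1 if the JVM
    // does not report it.
    static long directPoolBytes() {
        for (BufferPoolMXBean pool :
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Reserves 1 MB of native memory, outside the JVM heap.
        ByteBuffer buffer = ByteBuffer.allocateDirect(1 << 20);
        System.out.println("direct: " + buffer.isDirect()
            + ", capacity: " + buffer.capacity()
            + ", pool bytes: " + directPoolBytes());
    }
}
```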
To learn more about the ways that the JVM can use (and waste) native memory, check this article.