JXRay Memory Analyzer analyzes binary heap dumps, which are essentially snapshots of the JVM memory. There are two main methods of obtaining heap dumps. To get a dump
from a running application, invoke the
jmap utility that comes with the JDK:
> jmap -dump:live,format=b,file=myheapdump.hprof <app JVM pid>
For more information, invoke
jmap -help or check its online documentation.
You can also tell the JVM to generate a heap dump if your application fails with OutOfMemoryError. To do this, add the -XX:+HeapDumpOnOutOfMemoryError flag to the target JVM's command line. See the relevant HotSpot JVM documentation for more details. A heap dump obtained in this way can be more informative, since it contains a snapshot of your application's memory when it's completely filled up. This often helps to expose data structures that are the worst in terms of memory consumption, and thus should be optimized first.
Android produces JVM heap dumps in its own format. To convert an Android heap dump into the common format, you can use the hprof-conv tool provided in the Android SDK, for example:
> hprof-conv dump.hprof dump-converted.hprof
For more information, check this article.
To run JXRay Analyzer, simply invoke the jxray.sh script from the command line like this:
> jxray.sh myheapdump.hprof myheapdump.html
The script expects that the JDK
bin directory is on your
PATH. You can adjust the JVM settings directly in the script if, for example, your dump is very
big and you need to give the JVM more memory. You typically need to set -Xmx to at least the same value as the size of the dump file. For better performance, a bigger value (between 1.3x and 1.7x the
size of the analyzed heap dump) is recommended, especially if your disk is slow and/or your heap dump contains a large number of small objects. It is also advised that your machine has at least 4 CPU
cores available (otherwise, for one thing, the performance of the garbage collector in the JVM on which JXRay runs may suffer). 8 CPU cores are recommended for optimum performance, and a bigger
number may further improve speed in some situations.
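The sizing guidance above can be turned into a quick calculation. The following is only a sketch: the class and method names are hypothetical, and the factors (1x minimum, 1.3x to 1.7x recommended) are simply the ones quoted in the text. It prints candidate -Xmx values in megabytes for a given dump file.

```java
import java.io.File;

final class JxrayHeapSizing {
    private static final long MB = 1024L * 1024L;

    /** Returns dumpSizeBytes * percent / 100, rounded up to whole megabytes. */
    static long xmxMegabytes(long dumpSizeBytes, int percent) {
        long bytes = (dumpSizeBytes * percent + 99) / 100; // round up to a whole byte
        return (bytes + MB - 1) / MB;                      // round up to a whole megabyte
    }

    public static void main(String[] args) {
        // Falls back to a 1000 MB example when no dump file path is given.
        long dumpSize = args.length > 0 ? new File(args[0]).length() : 1000L * MB;
        System.out.println("minimum:          -Xmx" + xmxMegabytes(dumpSize, 100) + "m");
        System.out.println("recommended low:  -Xmx" + xmxMegabytes(dumpSize, 130) + "m");
        System.out.println("recommended high: -Xmx" + xmxMegabytes(dumpSize, 170) + "m");
    }
}
```

The resulting value would then be set in the jxray.sh script, as described above.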
The output of JXRay goes to the specified file in HTML format. You can also use the
-email <address> command line flag to make the tool send the report to the
specified address. Note that for that to work, your machine should have the
/usr/bin/mail utility configured properly.
A report generated by the JXRay Analyzer for the given heap dump consists of a number of sections. When you first open the page, all sections except the topmost one are collapsed to avoid clutter. You can click on the small black triangle to the left of the section title to expand and view the given section. Within most sections, many more things can be expanded and then collapsed again.
Each section is devoted to a specific heap metric or potential problem. Whenever possible, the tool reports the overhead (waste) associated with the given problem, in bytes and as a percentage of the whole used heap size, for example "560,810K (31.4%)". The overhead is how much memory you could save in the ideal case, if you completely eliminated the given problem.
MOST IMPORTANT ISSUES. The topmost part of the report tells you immediately if your dump has any serious issues. Only problems with overhead greater than a certain threshold are listed here. The details of each problem are given in the corresponding section later in the report.
Top-Level Stats. When you expand this section, you see the total number and size, in Kbytes, of all objects in the heap, as well as a breakdown by object type (instances vs object arrays vs primitive arrays) and live/garbage status. Some other general information about the heap is provided as well.
Where Memory Goes
The next few sections help you understand where memory goes: instances of which classes take memory (object histogram), which GC roots and reference chains hold all these objects in memory, how much memory and how many objects are live vs garbage, how much memory is used by object headers (rather than the useful data that's stored in objects), etc.
Where Memory Goes, by Class. This section, also called object histogram, gives a breakdown of memory used by every class taking 0.1% or more of the used heap space. The same
0.1% rule is generally used in other sections as well; this value is adjustable via the
-min_reported_percent command line flag. For each class the report displays the number
of instances and how much memory they occupy, both in shallow and implementation-inclusive form.
As you can see, the shallow and implementation-inclusive sizes are the same for byte arrays and INodeDirectory objects, but different for ArrayList objects (as well as other collections and Strings, which are not displayed here). That's because a collection is usually not one, but several (or many) objects: the main one (e.g. an ArrayList instance) and the implementation objects (e.g. an object array backing the ArrayList). Similarly, a String is technically two objects: an instance of the String class itself (that’s what shallow size is for) and a char array. The (variable) size of that array, together with the fixed String instance size, is the implementation-inclusive size. For ArrayList and other collection types that JXRay is aware of, the shallow size is the size of the "naked" ArrayList instance, whereas implementation-inclusive size includes the size of the object array associated with the given ArrayList and any other implementation objects.
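To make the distinction concrete, here is a rough back-of-the-envelope sketch for a String. It uses the narrow-pointer sizes quoted in the Heap Size Configuration section of this guide (12-byte instance header, 16-byte array header, 4-byte pointers). The 8-byte object alignment and the String field layout (one pointer to the char array plus one int hash field) are assumptions about a typical HotSpot JVM of this era, not figures taken from JXRay itself.

```java
final class StringSizeSketch {
    static final int INSTANCE_HEADER = 12; // narrow-pointer mode
    static final int ARRAY_HEADER = 16;    // narrow-pointer mode
    static final int POINTER = 4;          // narrow-pointer mode
    static final int ALIGNMENT = 8;        // assumption: default HotSpot object alignment

    /** Rounds a size up to the next multiple of the object alignment. */
    static long align(long size) {
        return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    /** Shallow size: the String instance alone (assumed fields: char[] pointer + int hash). */
    static long shallowStringSize() {
        return align(INSTANCE_HEADER + POINTER + 4);
    }

    /** Size of the backing char array: array header plus 2 bytes per char. */
    static long charArraySize(int length) {
        return align(ARRAY_HEADER + 2L * length);
    }

    /** Implementation-inclusive size: the String instance plus its char array. */
    static long implInclusiveStringSize(int length) {
        return shallowStringSize() + charArraySize(length);
    }

    public static void main(String[] args) {
        System.out.println("shallow: " + shallowStringSize());
        System.out.println("implementation-inclusive, 5 chars: " + implInclusiveStringSize(5));
    }
}
```

Under these assumptions the shallow size is the same for every String, while the implementation-inclusive size grows with the length of the text.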
If you want to find out where objects that take a lot of memory "come from" - that is, what data structures reference them, all the way down to GC roots - you can click on the "Reference chains" line after the given class. That will open a subsection with all the reference chains leading to groups (clusters) of same-type objects (INodeDirectory instances in our example above). A cluster here, by definition, is a group of objects of the same type, all reachable via the same distinct reference chain. Only clusters that take at least 0.1% of the used heap are displayed.
The important and powerful property of JXRay is that it lumps together (aggregates) all objects that are reachable via the same reference chain. The intermediate objects in the reference chain itself are aggregated too. For example, an intermediate INodeMap.map in the above example may correspond to one or many HashMap objects in memory. While some information is lost in this way, in reality this is seldom a problem. On the contrary, the big advantage of aggregated reference chains is that the repetitive details that would otherwise clutter the output are removed. The remaining, distilled information allows you to see immediately what code is responsible for the fact that the given objects are held in memory.
In some cases, JXRay aggregates and distills reference chains even further, showing you the "Expensive fields". That happens when there are two or more different reference chains leading to object clusters, that end with the same data field, for example:
In this case, JXRay aggregates the above information and presents it as
As you can see, when all the reference chains ending with com.google.protobuf.LiteralByteString.bytes are lumped together in this way, it turns out that this class/field is responsible for a really high number of arrays and percentage of memory.
Where Memory Goes, by GC Root (Check for Memory Leaks). This information is crucial for understanding sources of high memory consumption and memory leaks.
To answer the "who holds all these objects in memory" question, we need to know the reference chains from these objects down to GC roots. To reveal the reference chains, JXRay scans the object tree in the heap dump starting from GC roots. If some objects are reachable from several roots, the priority is given to (a) the more informative root, such as a static variable as opposed to a JNI reference, and (b) a shorter reference chain. Once the entire object tree has been scanned, the tool groups all objects by their respective GC roots, and further groups objects within each root's subtree into clusters. Here a cluster is a group of objects (not necessarily of the same type) reachable via the same reference chain and taking, together with their "children" objects, a significant enough portion of memory. JXRay then prints the biggest and most interesting clusters within each GC root's object subtree, together with the reverse reference chain leading to these clusters. The clusters are sorted by the amount of retained memory. Thus bigger clusters with highest retained memory, that usually have bigger subtrees of their own (not displayed) and shorter reference chains, are printed first.
If your application’s behavior suggests a memory leak or, generally, retention of a large amount of data, the GC roots and reference chains holding the most memory, at the top of this section, are what you should look at.
Live vs Garbage Objects. When JXRay scans the heap dump, it determines the live/garbage status of each object. The table in this section presents aggregated information for all objects. Types are sorted by the total live size of their instances. Note that in this section the tool uses only shallow size for each type (no implementation-inclusive size, as in the "Where Memory Goes, by Class" section). This is due to the fact that when collections and Strings become garbage, some of their implementation objects may be GCed and some not, leaving the remaining garbage data structures in an inconsistent state. That, in turn, would not allow the tool to accurately determine the implementation-inclusive size of such garbage objects.
Object Headers. Each object in the JVM memory has a header - essentially a JVM-internal record that contains a pointer to the object’s class, bits used by the GC and synchronization locking mechanisms, etc. The object header is not small: its size varies between 8 and 16 bytes depending on the JVM process mode (32-bit or 64-bit, with additional “narrow pointer” vs. “wide pointer” modes for the latter). An array uses an additional 4 bytes to designate its size. If your application creates a large number of small objects - those with a useful workload of a few bytes - the total size of the headers of these objects, compared to the amount of data that they contain, can become really large.
Memory Retained by Objects Awaiting Finalization. Any class can override the finalize() method of java.lang.Object to perform cleanup and similar operations on an instance before it gets garbage collected. In certain situations, this may cause some objects that are garbage from the application's standpoint (they are not reachable from any GC roots anymore) to stay in memory. For example, when a very large number of "finalizable" objects are being created by the application all the time, the single finalization thread may be unable to keep up. Another rare, but real problem is the finalize() method that gets blocked or deadlocked. In this case, the whole finalization thread is blocked and the number of unfinalized objects keeps growing, causing a memory leak.
The JVM puts all unfinalized objects into a single queue, with the root at java.lang.ref.Finalizer.queue static field. If objects reachable from this root take a significant enough amount of memory, the whole tree will be presented in this section as a number of reverse reference chains similar to those in the previous sections.
Memory Waste due to Specific Problems
Each of the following sections is devoted to a specific problem, that can potentially be fixed by making changes to the code. In each section, the first part shows the impact of a specific problem, such as duplicate strings or suboptimal collections, and presents the top offending objects or types (for example, string values that are duplicated most). The second part of the section presents reference chains that hold the problematic objects in memory.
Duplicate Strings. This is one of the most common memory problems observed in Java applications. Duplicate strings, that is, multiple String objects with the same logical value, such as “0” in the example below, are often created when a lot of data is read from external sources, e.g. a database. If such strings are then kept in memory for a long time, and/or each of them is big, they may become a significant source of waste. Getting rid of duplicate strings typically requires adding special “deduplication” code. The most common, scalable and reliable way to do that is by calling the String.intern() method. It's best to add this call at the location where duplicate strings are generated, or where these strings are assigned to permanent data structures (e.g. in constructors).
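As an illustration of interning in a constructor, consider the minimal sketch below. The Customer class and its country field are hypothetical names invented for this example; only String.intern() is a real JDK method.

```java
final class Customer {
    private final String country;

    Customer(String country) {
        // Intern at the point where the value is stored in a long-lived object,
        // so that all Customer instances with the same country share one String.
        this.country = (country == null) ? null : country.intern();
    }

    String country() {
        return country;
    }

    public static void main(String[] args) {
        // new String(...) simulates values read from an external source:
        // equal contents, but separate objects.
        Customer a = new Customer(new String("US"));
        Customer b = new Customer(new String("US"));
        System.out.println(a.country() == b.country()); // same String instance after interning
    }
}
```

The temporary, non-interned copies become garbage as soon as the constructor returns, so only one String per distinct value stays reachable.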
In the subsection that follows, JXRay presents the familiar reverse reference chains leading to the biggest clusters of duplicate strings. When the problematic objects are unreachable (garbage), it attempts to find clusters of similar live strings - they may give you a clue about the source of such strings.
In the example below, 13,885K, or 2.8% of the used heap, is the overhead (waste) due to the duplication of strings reachable via the reference chain unfolded below. That is, if we interned all these strings, our heap consumption would be reduced by 2.8 percent. The number 134660 is the number of duplicate strings referenced by this reference chain, and 100% is the percentage of strings that are duplicate relative to all the strings reachable via this reference chain. In this case, all the strings have duplicates (but this is not always the case). The "(10 unique)" means that if all the copies were eliminated, we would see only 10 unique strings here. If you click on this line, it will expand, presenting a table with several specimens of the duplicate strings in this cluster.
Bad Collections. Java and Scala standard collection classes, such as
java.util.ArrayList, java.util.HashMap etc., are very useful. However, applications may sometimes abuse them, resulting in wasted memory. The JXRay
Analyzer currently recognizes the following problems with collections:
Empty collections are those that are initialized but don’t contain any elements (workload). Their overhead is defined as the size of the entire collection implementation. It can be really high, varying between around 100 bytes for an ArrayList and 500 to 1000 bytes for complex structures like ConcurrentHashMap. Empty collections are usually easy enough to avoid with the help of some additional code that allocates collections lazily and performs null checks where necessary.
Single-element collections contain only one element. Their overhead is defined in the same way as for empty collections. If in some place in the program all collections contain only one element, you may find that there was a mistake there, and you can simply replace the collection with a direct reference to the object. Otherwise, i.e. if for some legitimate reason most collections have only one element most of the time, but sometimes may contain several, you may implement custom code that replaces, say, an ArrayList<Foo> x data field with Object x. Then, depending on the number of objects to store, this code would set x to point to either a single Foo object or an ArrayList<Foo>.
Small collections are, by definition in JXRay Analyzer, those that contain between 2 and 4 elements. Such collections are suboptimal in most cases, since the size of their workload (2 to 4 object elements, i.e. object pointers) is still small compared to the size of the collection implementation. Furthermore, many collections that are backed by an array allocate, by default, arrays that are considerably bigger, typically 10 to 16 elements, resulting in even more waste. Note that, say, a hashmap with a small number of elements will likely not perform searches and other operations faster than a plain array with handwritten search functionality. Thus, if some code in your application manages a large number of small collections such as ArrayLists, it may be worth considering replacing them with plain arrays, which would result in much smaller overhead. At a minimum, it is worth allocating these lists with a small initial capacity: use 'new ArrayList(2)' instead of 'new ArrayList()'.
Sparse collections are those that use an array internally (for example, ArrayList or HashMap), contain more than 4 elements, and have more than half of the internal array elements equal to null. This can happen if either a collection was allocated with an initial capacity significantly larger than needed, or if many elements have been removed from it (most array-based collections never "shrink" when elements are removed). If your application has a large overhead due to sparse collections, consider allocating them with a more suitable initial capacity or re-creating them after many elements are removed.
In the above table, the first column is the overhead for the given combination of a collection type (last column) and problem (second column). For example, if we got rid of all single-element ArrayLists, we would save 457,454K, or 16.7% of the used heap. The third column shows how many objects of this type have the given problem, and what's their percentage. That is, 84% of all ArrayLists in this heap dump are single-element, 11% of ArrayLists are small, and only the remaining 5% are "normal".
The following subsection, as in all other bad/duplicate object sections, contains the familiar reverse reference chains leading to the clusters of problematic collections.
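The remedies suggested for empty, single-element and small collections can be combined in one small holder class. The sketch below is only one possible shape of such custom code: CompactList and its nested Multi class are hypothetical names, and null elements are rejected to keep the logic simple. The single Object field is null when there are no elements, points directly at the element when there is exactly one, and points at a list with a small initial capacity only when there are two or more.

```java
import java.util.ArrayList;
import java.util.Objects;

final class CompactList<T> {
    // Private marker subclass, so a stored element that happens to be an
    // ArrayList is never mistaken for the internal multi-element list.
    private static final class Multi<E> extends ArrayList<E> {
        Multi(int initialCapacity) {
            super(initialCapacity);
        }
    }

    // null: empty; Multi: two or more elements; anything else: the single element.
    private Object items;

    void add(T element) {
        Objects.requireNonNull(element, "element");
        if (items == null) {
            items = element;                 // first element: no collection allocated
        } else if (items instanceof Multi) {
            asMulti().add(element);
        } else {
            Multi<T> list = new Multi<>(2);  // small initial capacity instead of the default
            list.add(single());
            list.add(element);
            items = list;
        }
    }

    int size() {
        if (items == null) {
            return 0;
        }
        return (items instanceof Multi) ? asMulti().size() : 1;
    }

    T get(int index) {
        if (items instanceof Multi) {
            return asMulti().get(index);
        }
        if (items != null && index == 0) {
            return single();
        }
        throw new IndexOutOfBoundsException("index: " + index);
    }

    @SuppressWarnings("unchecked")
    private Multi<T> asMulti() {
        return (Multi<T>) items;
    }

    @SuppressWarnings("unchecked")
    private T single() {
        return (T) items;
    }
}
```

The price of this approach is extra branching on every access, so it is probably worth it only for fields that the report shows to be responsible for a large share of the overhead.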
Bad Object Arrays. Similarly to bad collections, object arrays of wrong size, or with likely unused contents, result in wasted memory. The JXRay Analyzer currently recognizes the following problems with object arrays:
Empty arrays are those that contain only null elements. Their overhead is defined as the size of the entire array. Empty arrays can be avoided, for example, with the help of some additional code that allocates arrays lazily and performs null checks where necessary.
Length 0 arrays typically show up if some code allocates arrays of requested size without checking for zero. Their overhead is defined as the size of the entire array. If the code really needs zero-length arrays for some reason, multiple arrays can be easily replaced with a singleton zero-length array.
Length 1 arrays, whose overhead is also defined as the size of the entire array, can be avoided in the same way as single-element collections. That is, either Foo[] x can be replaced with Foo x, or a more complex solution may be implemented. The latter would replace the Foo[] x field with Object x, which would ultimately point either to an array or to a single Foo object.
Single-element arrays, with overhead again defined as the entire array size, are those that have length greater than 1, but contain only one non-null element. They can be dealt with in the same way as length 1 arrays.
Sparse arrays, by definition in JXRay, are those that contain 70% or more of null elements. You may want to consider allocating arrays with smaller capacity, or changing your code at a higher level to store the same information in a more compact form.
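The singleton zero-length array mentioned above might look like the following sketch. Foo stands in for an application class, and FooArrays.allocate is a hypothetical helper. Sharing is safe because a zero-length array has no elements that could be modified.

```java
final class Foo {
}

final class FooArrays {
    // One shared instance for every request of length 0.
    private static final Foo[] EMPTY = new Foo[0];

    /** Returns the shared empty array for length 0, and a fresh array otherwise. */
    static Foo[] allocate(int length) {
        return (length == 0) ? EMPTY : new Foo[length];
    }

    public static void main(String[] args) {
        System.out.println(allocate(0) == allocate(0)); // true: the same object is reused
        System.out.println(allocate(3).length);         // 3
    }
}
```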
Bad Primitive Arrays. Similarly to suboptimal collections and object arrays, primitive arrays of wrong size, or with likely unused contents, cause memory waste. The JXRay Analyzer currently recognizes the following problems with primitive arrays:
Empty arrays are those that contain only zeroes. Their overhead is defined as the size of the entire array. Empty arrays can be avoided with the help of some additional code that allocates arrays lazily and performs null checks where necessary. However, in some cases, e.g. with I/O buffers, it's hard to avoid having some temporarily empty byte, int etc. arrays in memory.
Length 0 arrays typically show up if some code allocates arrays of requested size without checking for zero. Their overhead is defined as the size of the entire array. Multiple length 0 arrays can be easily replaced with a singleton array.
Length 1 arrays, whose overhead is also defined as the size of the entire array, can sometimes be avoided in the same way as single-element collections. That is, int[] x can be replaced with int x. Alternative application-specific solutions might also be possible.
Arrays with a high number of trailing zeroes are, by definition in JXRay, those that have the last 25% or more of elements equal to zero. The overhead of such an array is the size, in bytes, of all the trailing zero elements. Such arrays are often buffers that have been created too big, and the problem with them may be addressed by reducing the buffer size.
Sparse arrays, by definition in JXRay, are those that contain 70% or more of zero elements. You may want to consider allocating arrays with smaller capacity, or changing your code at a higher level to store the same information in a more compact form.
The breakdown by problems, and reporting of reference chains for bad primitive arrays is very similar to that for bad collections and bad object arrays.
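The trailing-zeroes definition above is easy to express in code. The sketch below (with hypothetical class and method names) counts trailing zero elements of a byte array, applies the 25% threshold quoted in the text, and shows one possible remedy for a buffer that has finished filling: copying it into an array of exactly the needed length. Trimming is an assumption about what the application can tolerate; it is only appropriate when the trailing zeroes are unused space rather than meaningful data.

```java
import java.util.Arrays;

final class BufferTrimming {
    /** Number of zero elements at the end of the array. */
    static int trailingZeroes(byte[] a) {
        int end = a.length;
        while (end > 0 && a[end - 1] == 0) {
            end--;
        }
        return a.length - end;
    }

    /** True if the last 25% or more of the elements are zero. */
    static boolean hasManyTrailingZeroes(byte[] a) {
        return a.length > 0 && trailingZeroes(a) * 4L >= a.length;
    }

    /** Returns a copy without the trailing run of zeroes. */
    static byte[] trim(byte[] a) {
        return Arrays.copyOf(a, a.length - trailingZeroes(a));
    }

    public static void main(String[] args) {
        byte[] buffer = new byte[8192]; // oversized buffer
        buffer[0] = 1;
        buffer[1] = 2;
        System.out.println(trailingZeroes(buffer) + " wasted bytes, trimmed length " + trim(buffer).length);
    }
}
```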
Boxed Numbers. Boxed numbers are instances of classes such as java.lang.Integer, java.lang.Long etc. One common reason for using them is the fact that the JDK doesn’t have standard collection classes that store plain ints, longs etc. That is, if you want to create, say, a HashMap that maps names to numbers, you need either to instantiate a java.util.HashMap<String, Integer> or to use third-party libraries.
Unfortunately, for most real-life applications data structures with boxed numbers would be a bad choice from the performance standpoint. That’s because, for example, a java.lang.Integer object that stores a 4-byte integer takes between 16 and 24 bytes (depending on whether the JVM runs in the "narrow pointer" or "wide pointer" mode).
Plus, there is a pointer to that object that takes 4 or 8 bytes. That is not to mention other effects, like application execution slowdown due to reduced cache locality, multiple memory reads instead
of a single one, immutability of boxed numbers (therefore to e.g. increment it you need to create a new object), etc. Thus, if your application's heap contains a significant quantity of boxed
numbers, your best choice is to stop using them, which likely means switching to third-party libraries with data structures that store plain numbers.
Reporting of boxed numbers in JXRay is similar to that for bad collections and arrays.
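The per-number cost quoted above adds up quickly. The following sketch simply multiplies out the figures from the text (a 16 to 24 byte Integer object plus a 4 or 8 byte pointer to it, versus a 4-byte plain int). It deliberately ignores container overhead and the JVM's small-value Integer cache, so treat the results as rough estimates; the class and method names are hypothetical.

```java
final class BoxedCostSketch {
    /** Approximate bytes used by n boxed Integers, each referenced by one pointer. */
    static long boxedBytes(long n, boolean widePointers) {
        int integerObject = widePointers ? 24 : 16; // sizes quoted in the text
        int pointer = widePointers ? 8 : 4;
        return n * (integerObject + pointer);
    }

    /** Bytes used by n plain ints stored directly, e.g. in an int array. */
    static long primitiveBytes(long n) {
        return n * 4;
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        System.out.println("boxed, narrow pointers: " + boxedBytes(n, false));
        System.out.println("boxed, wide pointers:   " + boxedBytes(n, true));
        System.out.println("plain ints:             " + primitiveBytes(n));
    }
}
```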
Duplicate Objects of arbitrary types occur much more rarely than, for example, duplicate Strings. But when they do, they almost always cause significant overhead. Currently JXRay analyzes only instances of non-collection classes for duplication. The analysis is "shallow". That is, consider objects A and B that point to objects X and Y, respectively. If A and B have identical contents (except for the references to X and Y that are different), and X and Y have identical contents, A and B will still be considered different. Duplicate objects, if they are immutable, can often be easily eliminated using some form of a canonicalization cache.
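A canonicalization cache can be as simple as a map from each value to its first-seen instance. The sketch below is one minimal form of the idea, not a JXRay feature: Canonicalizer is a hypothetical name, and it assumes that the cached type is immutable and implements equals() and hashCode() consistently. Note that this version holds strong references to everything it has ever seen, so it suits a bounded set of distinct values.

```java
import java.math.BigInteger;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class Canonicalizer<T> {
    private final ConcurrentMap<T, T> cache = new ConcurrentHashMap<>();

    /** Returns the first instance seen that equals the argument. */
    T canonical(T value) {
        T existing = cache.putIfAbsent(value, value);
        return (existing != null) ? existing : value;
    }

    public static void main(String[] args) {
        Canonicalizer<BigInteger> canon = new Canonicalizer<>();
        BigInteger a = canon.canonical(new BigInteger("12345678901234567890"));
        BigInteger b = canon.canonical(new BigInteger("12345678901234567890"));
        System.out.println(a == b); // both references now point to one object
    }
}
```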
The following subsection provides the standard reference chains leading to clusters of duplicate objects. For each cluster some specimen objects are presented in the same format as in the table shown above:
Duplicate Primitive Arrays. Such arrays are similar to duplicate strings: they are separate objects, but their contents are the same. Note that JXRay is smart and can
distinguish between “standalone”
char arrays and those that belong to
String instances - the latter are
not analyzed in this section.
Heap Size Configuration. The HotSpot JVM stores objects in memory in different formats depending on the maximum heap size that you specify via the -Xmx command line option. When the maximum heap size is less than about 32 gigabytes (the so-called "narrow pointer mode"), ordinary instances have a 12-byte header, arrays have 16-byte headers, and each object pointer takes 4 bytes. Otherwise, i.e. when -Xmx is higher than 32GB ("wide pointer mode"), ordinary instances have 16-byte headers, arrays have 20-byte headers, and each object pointer takes 8 bytes. This is necessary for the JVM to support object addresses in really big heaps. But it means that as soon as your -Xmx value crosses the 32GB threshold, objects in your application suddenly become bigger. It may not be a problem if your application works mostly with big primitive arrays - in this case the effect of increased object headers and pointers is often negligible. However, if your application employs many pointer-intensive data structures such as trees, hash maps, etc., or just has many small objects, your memory consumption may suddenly grow more than 1.5 times! For more information, see "Why 35GB Heap is Less Than 32GB".
When JXRay detects that your application uses wide pointers, it calculates how much memory all your objects would use if you switched to narrow pointers by setting -Xmx below the 32GB threshold. It turns out that sometimes savings are big, and your new used heap size will be well below 32GB. In that case, you may be able to greatly reduce memory consumption by just setting -Xmx31g or less. Keep in mind, however, that the heap dump may not be taken at the moment when your application's memory utilization is at its peak, and make sure that your new maximum heap size will not cause your application to fail with OutOfMemoryError, or the JVM to run garbage collection too frequently.
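The effect of pointer width on a pointer-intensive structure can be estimated with simple arithmetic. The sketch below uses the header and pointer sizes given in this section. The 8-byte object alignment is an assumption about a default HotSpot configuration, and the example object (a tree node holding three references and nothing else) is hypothetical.

```java
final class PointerModeSketch {
    /** Rounds a size up to a multiple of 8 bytes (assumed default object alignment). */
    static long align(long size) {
        return (size + 7) / 8 * 8;
    }

    /** Size of an ordinary instance whose only fields are the given number of references. */
    static long instanceSize(int referenceFields, boolean widePointers) {
        int header = widePointers ? 16 : 12;
        int pointer = widePointers ? 8 : 4;
        return align(header + (long) referenceFields * pointer);
    }

    public static void main(String[] args) {
        // A tree node with left, right and value references.
        long narrow = instanceSize(3, false);
        long wide = instanceSize(3, true);
        System.out.println("narrow: " + narrow + " bytes, wide: " + wide + " bytes");
        System.out.println("growth factor: " + (double) wide / narrow);
    }
}
```

For such a node the size goes from 24 to 40 bytes under these assumptions, which is consistent with the "more than 1.5 times" growth described above.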
Very Long (Over 1000 Elements) Reference Chains. Very long chains of objects can be a problem, in that they are slow to traverse, may result in an extra overhead object per workload object, etc. If any such chain is detected, it is presented in this section in the standard reverse format.