JXRay Memory Analyzer Documentation

Obtaining a heap dump

 

JXRay Memory Analyzer analyzes binary heap dumps, which are essentially snapshots of the JVM heap. The most common way to obtain a heap dump is by invoking the jmap utility that comes with the JDK:

 

> jmap -dump:live,format=b,file=myheapdump.hprof <pid>

 

where <pid> is the PID of the JVM that runs your application. For more information, invoke jmap -help or check its online documentation.
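
A dump can also be triggered from inside the running application instead of via jmap. The sketch below is only an illustration and assumes a HotSpot-based JVM that exposes com.sun.management.HotSpotDiagnosticMXBean; the class name HeapDumper and the default file name are made up for this example.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of JXRay: writes a heap dump of the current JVM.
final class HeapDumper {
    // liveOnly = true dumps only reachable objects, similar to jmap's "live" option.
    static void dump(String file, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // The target file must not exist yet; recent JDKs also expect a .hprof suffix.
        bean.dumpHeap(file, liveOnly);
    }

    public static void main(String[] args) throws IOException {
        dump(args.length > 0 ? args[0] : "myheapdump.hprof", true);
    }
}
```

The resulting file can then be passed to the tool in the same way as a dump produced by jmap.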

 

Running the tool

 

To run JXRay Analyzer, simply invoke the jxray.sh script from the command line like this:

 

> jxray.sh myheapdump.hprof > myheapdump.txt

 

The script expects that the JDK bin directory is on your PATH. You can adjust the JVM settings directly in the script if, say, your dump is very big and you need to give the JVM more memory. For the tool to run fast, you typically need to set -Xmx to about the same value as the size of the dump. 
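
As a rough aid for the rule of thumb above, the following sketch derives a -Xmx value from the dump file size. The class XmxHint is hypothetical and not part of JXRay; it simply rounds the file size up to whole megabytes.

```java
import java.io.File;

// Hypothetical helper: suggests a -Xmx value roughly equal to the dump size.
final class XmxHint {
    static String suggest(long dumpBytes) {
        long mb = (dumpBytes + (1L << 20) - 1) >> 20;   // round up to whole megabytes
        return "-Xmx" + Math.max(mb, 1) + "m";
    }

    public static void main(String[] args) {
        // Usage: java XmxHint.java myheapdump.hprof
        System.out.println(suggest(new File(args[0]).length()));
    }
}
```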

 

By default, the output of JXRay Analyzer goes to stdout; in the above example it is redirected into the myheapdump.txt file. You can also use the -email <address> command-line flag to make the tool send the report to the specified address. Note that for that to work, your machine should have the /bin/mail or /usr/bin/mail utility configured properly.

 

Understanding the output

 

A report generated by JXRay Analyzer for the given heap dump consists of a number of sections, each devoted to a specific heap metric or potential problem. Whenever possible, a section contains the value of the overhead associated with the given problem, in bytes and as a percentage of the whole used heap size. The overhead is how much memory you could save in the ideal case, if you completely eliminated the given problem.
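
The way an overhead value relates to the used heap size can be sketched as follows. This is an illustration only: the class OverheadFormat is hypothetical, and the choice to truncate to whole kilobytes and to round the percentage to one decimal place is an assumption about the report format, not a documented rule.

```java
import java.util.Locale;

// Illustrative only: formats an overhead value in the "NNN,NNNK (P.P%)" style.
final class OverheadFormat {
    static String format(long overheadBytes, long usedHeapBytes) {
        long kb = overheadBytes / 1024;                          // whole kilobytes
        double percent = 100.0 * overheadBytes / usedHeapBytes;  // share of the used heap
        return String.format(Locale.US, "%,dK (%.1f%%)", kb, percent);
    }
}
```

For instance, 769,453K of overhead in a heap of 786,081K would be shown as 769,453K (97.9%).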

 

A JXRay report currently contains the following sections:

 

Overview of the most important issues. This part allows you to immediately see if your dump has any serious issues. Only problems with overhead greater than a certain threshold are listed here. The details of each problem are given in specific sections later in the report.

 

Example:

 

0. IMPORTANT ISSUES:

 

Section 1. CLASSES AND OBJECTS (threshold: 10%)

  High memory usage by String: 413,030K (27.3%)

  High memory usage by byte[]: 320,712K (21.2%)

 

Section 6. DUPLICATE STRINGS (threshold: 5%)

  High overhead of all duplicate strings: 388,639K (25.7%)

 

Section 8. DUPLICATE PRIMITIVE ARRAYS (threshold: 5%)

  High overhead of all duplicate arrays: 310,380K (20.5%)

 

Section 12. BAD COLLECTIONS (threshold: 2%)

  High overhead of all bad collections: 122,638K (8.1%)

  High overhead of type small j.u.ArrayList: 111,922K (7.4%)

 

Top-level general information. Here you see the total number and size, in bytes, of all objects in the heap dump, as well as a breakdown by object type (instances vs object arrays vs primitive arrays) and live/garbage status.

 

Example:

 

Total num of objects: 713,841

Instances: 573,092, object arrays: 29,164, primitive arrays: 111,585

Total size of all objects: 786,081K (100.0%)

Instances: 14,356K (1.8%), object arrays: 2,270K (0.3%), primitive arrays: 769,453K (97.9%)

 

Live objects: 704,573, garbage objects: 9,268

Live bytes: 785,483K (99.9%), garbage bytes: 597K (0.1%)

 

 

Class and object information. This section gives a breakdown for all classes with instances taking 0.1% or more of the used heap space. The same 0.1% rule is generally used in other sections as well, and the value is adjustable via the -min_reported_percent command line flag. For each class there is information on the number of instances and how much memory they occupy, both in shallow and implementation-inclusive form.

 

Example:

 

 #instances    Shallow size    Impl-inclusive size   Class name

---------------------------------------------------------------

   177,466      5,545K (5.5%)    20,229K (20.0%)     String

   139,974    10,818K (10.7%)    10,818K (10.7%)     byte[]

    33,917      7,066K (7.0%)      7,066K (7.0%)     org.jdom.Attribute

    39,828        933K (0.9%)      2,858K (2.8%)     j.u.ArrayList

 

As you can see, the shallow and implementation-inclusive sizes are the same for byte arrays and Attribute objects, but different for String and ArrayList objects. That’s because a String is technically two objects: an instance of the String class itself (that’s what the shallow size is for) and a char[] array associated with it. The (variable) size of that array, together with the fixed String instance size, is the implementation-inclusive size. Similarly, for ArrayList and other collection types that JXRay is aware of, the shallow size is the size of the "naked" ArrayList instance, whereas the implementation-inclusive size includes the size of the object array associated with the given ArrayList, etc.
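
The relationship between the two sizes for a String can be approximated with a small model. The numbers below are assumptions, not values reported by JXRay for your JVM: a 12-byte object header, 8-byte object alignment, a 4-byte array length field, 2 bytes per char, and a 32-byte shallow String size (roughly what the example table implies: 5,545K / 177,466 instances).

```java
// Back-of-the-envelope model, not JXRay's actual computation.
final class StringSizeModel {
    static final int HEADER = 12;          // assumed object header size
    static final int STRING_SHALLOW = 32;  // assumed shallow size of a String instance

    // Rounds a size up to the assumed 8-byte object alignment.
    static long align8(long size) { return (size + 7) & ~7L; }

    // Size of a char[] holding n characters: header + 4-byte length + 2 bytes per char.
    static long charArraySize(int n) { return align8(HEADER + 4 + 2L * n); }

    // Implementation-inclusive size = shallow String size + size of its char[] array.
    static long implInclusive(int n) { return STRING_SHALLOW + charArraySize(n); }
}
```

Under these assumptions even an empty string costs 48 bytes, of which only 32 show up in the shallow size.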

 

 

Live vs garbage objects. For each object in the dump, JXRay Analyzer is able to determine its live or garbage status. The table in this section presents aggregated information for all objects.

 

Example:

 

Live objects: 1,931,828, garbage objects: 154,758, total objects: 2,086,586

Live bytes: 87,592K (86.8%), garbage bytes: 13,369K (13.2%), total bytes: 100,962K (100.0%)

 

 #instances     Shallow size     #instances    Shallow size       Class name

  garbage       garbage           live         live                         

----------------------------------------------------------------------------

    23,268      4,422K (4.4%)      141,236    12,966K (12.8%)     char[]

    28,818      3,959K (3.9%)      111,156      6,859K (6.8%)     byte[]

     9,170      1,220K (1.2%)       24,747      5,845K (5.8%)     int[]

    23,252        726K (0.7%)      154,214      4,819K (4.8%)     String

 

 

Overhead of object headers. Each object in the JVM memory has a header - essentially a JVM-internal record that contains a pointer to the object’s class, bits used by the GC and locking mechanism, etc. The object header is not small: its size varies between 8 and 16 bytes depending on the JVM binary kind (32-bit or 64-bit, with additional “narrow pointer” vs. “wide pointer” modes for the latter). An array uses an additional 4 bytes to store its length. If your application creates a large number of small objects - those with a “workload” of a few bytes - the total size of the headers of these objects can become prohibitively large compared to the amount of data that they contain.

 

Example:

 

Object header size: 12 bytes

Total size of all object headers: 24,452K (24.2%)

 

 #instances    (Average) object   Total header     Class name

                    size              size                   

-------------------------------------------------------------

     177,466         32          2,079K (2.1%)     String

     164,504        108          1,927K (1.9%)     char[]

     139,974         79          1,640K (1.6%)     byte[]
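
The String row above can be reproduced with simple arithmetic. The sketch below is illustrative; the class HeaderOverhead is hypothetical, and it assumes that the total is the instance count multiplied by the header size, shown in whole kilobytes.

```java
// Illustrative arithmetic only; the input numbers come from the example table.
final class HeaderOverhead {
    // Total bytes spent on headers for a class with the given instance count.
    static long totalHeaderBytes(long instances, int headerSize) {
        return instances * headerSize;
    }

    // Share of an average object that is taken up by its header, in percent.
    static double headerShare(int headerSize, int avgObjectSize) {
        return 100.0 * headerSize / avgObjectSize;
    }

    public static void main(String[] args) {
        System.out.println(totalHeaderBytes(177_466, 12) / 1024 + "K"); // prints 2079K
        System.out.println(headerShare(12, 32) + "%");                  // prints 37.5%
    }
}
```

In other words, with a 12-byte header more than a third of every 32-byte String instance is header.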

 

 

Reference chains with high retained memory (may signal memory leak). This information is crucial for understanding sources of memory leaks and, generally, high memory usage. JXRay Analyzer scans the heap dump starting from GC roots, and then scans the remaining objects (garbage). In the process, it determines which objects are held in memory by which root. Note that many objects can be reachable from several roots. In this case (a) the more informative root, such as a static variable as opposed to an unknown method’s stack frame, and (b) a shorter reference chain, take precedence. Eventually, JXRay divides all objects by their reachability from different GC roots, and further divides objects within each such group by their reachability via a certain reference chain starting from the given root.

 

If your application’s behavior suggests a memory leak or, generally, long-term retention of a high amount of data, there is a very good chance that JXRay analysis will expose a reference chain, and ultimately a pointer to the data structure (class and field), that holds most of these objects in memory.
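
The idea of preferring a shorter reference chain can be illustrated with a breadth-first search over an object graph. This is a deliberately simplified sketch and not JXRay's implementation: objects are represented by plain names, there is a single root, and the class RefChains and its methods are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified illustration: breadth-first search from a GC root finds a
// shortest reference chain to every reachable object.
final class RefChains {
    // refs maps an object name to the names of the objects it references.
    static Map<String, String> shortestParents(String root, Map<String, List<String>> refs) {
        Map<String, String> parent = new HashMap<>();
        parent.put(root, null);                      // the root has no referrer
        Deque<String> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            String cur = queue.remove();
            for (String next : refs.getOrDefault(cur, Collections.emptyList())) {
                if (!parent.containsKey(next)) {     // first visit = shortest chain
                    parent.put(next, cur);
                    queue.add(next);
                }
            }
        }
        return parent;                               // objects missing here are garbage
    }

    // Renders a chain in the report's "<--" style, from the object back to the root.
    static String chain(String obj, Map<String, String> parent) {
        StringBuilder sb = new StringBuilder(obj);
        for (String p = parent.get(obj); p != null; p = parent.get(p)) {
            sb.append(" <-- ").append(p);
        }
        return sb.toString();
    }
}
```

Any object that never appears in the returned map is unreachable from the root, which corresponds to the garbage category in the report.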

 

Example:

 

  35,723K (35.4%) (com.intellij.openapi.util.objectTree.ObjectTree)

     <-- Java Static: com.intellij.openapi.util.Disposer.ourTree

  35,472K (35.1%) (git4idea.history.wholeTree.GitLogUI$6, com.intellij.codeInsight.daemon.impl.PsiChangeHandler, com.intellij.openapi.wm.impl.status.EncodingPanel etc.)

     <-- Object[] <-- gnu.trove.THashMap._set <-- com.intellij.openapi.util.objectTree.ObjectTree.myObject2NodeMap <-- Java Static: com.intellij.openapi.util.Disposer.ourTree

  --------

  13,369K (13.2%) (j.l.Thread, j.l.OutOfMemoryError, byte[] etc.)

     <-- Unreachable

  --------

  8,803K (8.7%) (com.intellij.openapi.wm.impl.IdeFrameImpl)

     <-- JNI Global@7ceab2998 (com.intellij.openapi.wm.impl.IdeFrameImpl)

  8,662K (8.6%) (com.intellij.openapi.project.impl.ProjectImpl)

     <-- com.intellij.openapi.wm.impl.IdeFrameImpl.myProject <-- JNI Global@7ceab2998 (com.intellij.openapi.wm.impl.IdeFrameImpl)

 

 

Very long (over 1000 elements) reference chains. Very long chains of objects can be a problem, in that they are slow to traverse, may result in an extra overhead object per workload object, etc.

 

Example:

 

  1K (1.0%), ref chain length: 3588

     <-- j.l.r.Finalizer.next <-- j.l.r.Finalizer.next <-- j.l.r.Finalizer.next <-- j.l.r.Finalizer.next <-- j.l.r.Finalizer.next <-- j.l.r.Finalizer.next <-- j.l.r.Finalizer.next <-- j.l.r.Finalizer.next <-- ... (1572 elements) ... <-- java.awt.image.BufferedImage.sData <-- com.intellij.ui.TreeExpandableItemsHandler.myImage <-- com.intellij.ide.projectView.impl.PackageViewPane$3.myExpandableItemsHandler <-- com.intellij.ide.util.treeView.AbstractTreeUpdater$1.myModalityStateComponent <-- Object[] <-- gnu.trove.THashMap._set <-- com.intellij.openapi.util.objectTree.ObjectTree.myObject2NodeMap <-- Java Static: com.intellij.openapi.util.Disposer.ourTree

 

 

Duplicate string stats. This is one of the most common memory problems observed in Java applications. Duplicate strings, that is, multiple separate String objects with the same logical value (such as “0” in the example below), are often created when a lot of data is read from external sources, e.g. a database. If such strings are then kept in memory for a long time, and/or each of them is big, they may become a significant source of waste. Getting rid of duplicate strings typically involves adding special “deduplication” filtering code - usually an invocation of the String.intern() method. It's best to add this call at the location where duplicate strings are generated, or where these strings are assigned to permanent data structures (e.g. in constructors).
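
A minimal sketch of interning in a constructor is shown here. The Customer class and its country field are hypothetical; only String.intern() itself is a standard JDK method.

```java
// Hypothetical data class; the intern() call in the constructor is the point.
final class Customer {
    private final String country;

    Customer(String countryFromDb) {
        // intern() returns the canonical copy, so equal values share one String object.
        this.country = countryFromDb == null ? null : countryFromDb.intern();
    }

    String country() { return country; }
}
```

With this change, two Customer objects created from separate but equal strings end up referencing the same String instance, and the duplicates read from the external source become garbage.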

 

Example:

 

Total strings: 780,786  Unique strings: 142,183  Duplicate values: 21,496  Overhead: 31,900K (5.9%)

 

Top duplicate strings:

    Ovhd         Num char[]s   Num objs   Value

 

    314K (0.5%)     8056        8057      ""

    306K (0.5%)     6533        6533      "0"

    300K (0.5%)     6416        6416      "d"

    263K (0.4%)     2406        2406      "com.ayala.lynx.StatsRecvr$Xa1C"

 

 

Overhead, number and reference chains for duplicate strings. In this section, JXRay Analyzer provides information on reference chains leading to the biggest “clusters” of duplicate strings. When the problematic objects are unreachable (garbage), it attempts to find clusters of similar live strings - they may give you a clue about the source of such strings.

 

 

Bad collection stats. Java and Scala standard collection classes, such as java.util.ArrayList, java.util.HashMap etc., are very useful. However, applications may sometimes abuse them, resulting in wasted memory. JXRay Analyzer currently recognizes the following problems with collections:

 

Empty collections are those that are initialized but don’t contain any workload. Their overhead is defined as the size of the entire collection implementation. It can be really high, varying between around 100 bytes for an ArrayList to 500..1000 bytes for big structures like ConcurrentHashMap. Fortunately, empty collections are usually easy enough to avoid with the help of some additional code that allocates collections lazily and performs null checks where necessary.

 

Single-element collections contain only one element. Their overhead is defined in the same way as for empty collections. If in some place in the program all collections contain only one element, you may find that there was a mistake there, and you can simply replace a collection with a direct reference to the object. Otherwise, i.e. if for some clear reason most of the time most collections have only one element, but occasionally may contain several, you may implement custom code that replaces, say, an ArrayList<Foo> x data field with Object x. Then, depending on the number of objects to store, this code would set x to point either to an ArrayList or to a single Foo object.

 

Small collections are, by definition in JXRay Analyzer, those that contain between 2 and 4 elements. Such collections are wasteful, since the size of their workload (2 to 4 object pointers) is still small compared to the size of the collection implementation. If, for example, some big higher-level data structure contains a large number of small collections such as ArrayLists, it may be worth considering replacing them with plain arrays, which would result in much smaller overhead.
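
The techniques described for empty and single-element collections (lazy allocation and an Object field that points either to one element or to a list) can be combined as in the following sketch. The class CompactHolder is hypothetical, and the sketch assumes that elements are non-null and are never ArrayList instances themselves.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch: one Object field holds nothing, a single element, or a list.
final class CompactHolder<T> {
    // null = empty, ArrayList = several elements, anything else = exactly one element
    private Object x;

    @SuppressWarnings("unchecked")
    void add(T item) {
        Objects.requireNonNull(item);
        if (x == null) {
            x = item;                       // no collection allocated for one element
        } else if (x instanceof ArrayList) {
            ((ArrayList<T>) x).add(item);
        } else {
            ArrayList<T> list = new ArrayList<>(4);
            list.add((T) x);
            list.add(item);
            x = list;                       // upgrade to a real list on the second element
        }
    }

    @SuppressWarnings("unchecked")
    List<T> items() {
        if (x == null) return Collections.emptyList();
        if (x instanceof ArrayList) return Collections.unmodifiableList((ArrayList<T>) x);
        return Collections.singletonList((T) x);
    }
}
```

The price of this approach is extra type checks and casts, so it is usually worth it only for fields that exist in very large numbers.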

 

Example:

 

Total collections: 62,470  Bad collections: 56,222  Overhead: 3,986K (3.9%)

 

Top bad collections:

    Ovhd           Problem     Num objs      Type

-------------------------------------------------

  1,221K (1.2%)     small        29080     j.u.ArrayList

    718K (0.7%)     empty         9191     j.u.ArrayList

    347K (0.3%)    1-elem         2965     scala.collection.mutable.HashMap

 

 

Bad object arrays stats. Similarly to bad collections, object arrays of wrong size, or with likely unused contents, result in wasted memory. JXRay Analyzer currently recognizes the following problems with object arrays:

 

Empty arrays are those that contain only null elements. Their overhead is defined as the size of the entire array. Empty arrays can be avoided, for example, with the help of some additional code that allocates arrays lazily and performs null checks where necessary.

 

Length 0 arrays typically show up if some code allocates arrays of requested size without checking for zero. Their overhead is defined as the size of the entire array. Multiple length 0 arrays can be easily replaced with a singleton array.

 

Length 1 arrays, whose overhead is also defined as the size of the entire array, can be avoided in the same way as single-element collections. That is, either Foo[] x can be replaced with Foo x, or a more complex solution may be implemented. The latter would replace the Foo[] x field with Object x, which would ultimately point either to an array or to a single Foo object.

 

Single-element arrays, with overhead again defined as the entire array size, are those that have length greater than 1, but contain only one non-null element. They can be dealt with in the same way as length 1 arrays.
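
Replacing many length 0 arrays with a singleton can be sketched as follows. The class NameArrays and the String element type are chosen only for illustration.

```java
// Hypothetical sketch: share one zero-length array instead of allocating many.
final class NameArrays {
    // A zero-length array has no elements to modify, so one instance can be shared.
    private static final String[] EMPTY = new String[0];

    static String[] allocate(int n) {
        return n == 0 ? EMPTY : new String[n];
    }
}
```

Sharing is safe as long as no code relies on the identity of the array, for example by synchronizing on it.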

 

Example:

 

Total object arrays: 604,883  Bad object arrays: 71,084  Overhead: 5,097K (1.0%)
 
Top bad object arrays:
    Ovhd             Problem     Num objs      Type
---------------------------------------------------
  3,974K (0.8%)       1-elem        27128     Object[]
    585K (0.1%)     1-length        25000     org.apache.hadoop.hbase.HRegionLocation[]

 

 

Bad primitive arrays stats. Similarly to suboptimal collections and object arrays, primitive arrays of wrong size, or with likely unused contents, result in wasted memory. JXRay Analyzer currently recognizes the following problems with primitive arrays:

 

Empty arrays are those that contain only zeroes. Their overhead is defined as the size of the entire array. Empty arrays can be avoided with the help of some additional code that allocates arrays lazily and performs null checks where necessary. However, in some cases, e.g. with I/O buffers, it's hard to avoid having some temporarily empty byte[], int[] etc. arrays in memory.

 

Length 0 arrays typically show up if some code allocates arrays of requested size without checking for zero. Their overhead is defined as the size of the entire array. Multiple length 0 arrays can be easily replaced with a singleton array.

 

Length 1 arrays, whose overhead is also defined as the size of the entire array, can sometimes be avoided in the same way as single-element collections. That is, int[] x can be replaced with int x. Alternative application-specific solutions might also be possible.

 

Arrays with a high number of trailing zeroes. JXRay Analyzer puts an array into this category if more than a certain percentage of its trailing elements are zeroes. The default value is 25%, and can be adjusted with the -min_trailing_zeroes_percent command line flag. The overhead of such an array is the size, in bytes, of all the trailing zero elements. Such arrays are often buffers that have been created too big, and the problem with them may be addressed by reducing the buffer size.
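
The trailing-zeroes check can be sketched as below. This is an interpretation rather than JXRay's exact rule: it assumes the percentage is measured against the total array length, and the class TrailingZeroes is hypothetical.

```java
import java.util.Arrays;

// Illustrative only; JXRay's exact rule may differ from this sketch.
final class TrailingZeroes {
    // Number of zero elements at the end of the array.
    static int countTrailingZeroes(byte[] a) {
        int i = a.length;
        while (i > 0 && a[i - 1] == 0) i--;
        return a.length - i;
    }

    // True if more than minPercent of the elements are trailing zeroes.
    static boolean isWasteful(byte[] a, int minPercent) {
        return a.length > 0
                && 100L * countTrailingZeroes(a) > (long) minPercent * a.length;
    }

    // Copy without the trailing zeroes, e.g. once a buffer has been completely filled in.
    static byte[] trim(byte[] a) {
        return Arrays.copyOf(a, a.length - countTrailingZeroes(a));
    }
}
```

Trimming is appropriate only when trailing zeroes are known to be unused space rather than meaningful data.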

 

Example:

 

Total primitive arrays: 6,630,260  Bad primitive arrays: 533,223  Overhead: 150,097K (9.1%)
 
Top bad object arrays:
    Ovhd             Problem     Num objs      Type
---------------------------------------------------
  112,974K (8.2%)   trail-0s      389,651      byte[]
 

 

Boxed number stats. Boxed numbers are instances of classes such as java.lang.Integer, java.lang.Long etc. One common reason for using them is that the JDK doesn’t include standard collection classes that store plain ints, longs etc. That is, if you want to create, say, a HashMap that maps names to numbers, you need either to instantiate a java.util.HashMap<String, Integer> or to use third-party libraries.

 

Unfortunately, for most real-life applications data structures with boxed numbers would be a bad choice. That’s because storing, for example, a 4-byte int in an instance of java.lang.Integer results in a big overhead: an 8- to 16-byte object header (see above) and a pointer to that object. That is not to mention other effects, like application execution slowdown due to reduced cache locality, multiple memory reads instead of a single one, immutability of boxed numbers (so to increment one you need to create a new object), and so on. Thus, if analysis shows that there are enough boxed number instances to cause noticeable overhead, your best choice is to stop using them, which likely means switching to third-party libraries with data structures that contain plain numbers.
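
The per-element cost described above can be estimated with a short calculation. The sketch is a rough model, not a measurement: it assumes 8-byte object alignment, and the header and reference sizes are parameters that depend on your JVM. The class BoxedCost is hypothetical.

```java
// Rough estimate under stated assumptions, not a measurement.
final class BoxedCost {
    // Rounds a size up to the assumed 8-byte object alignment.
    static long align8(long n) { return (n + 7) & ~7L; }

    // Bytes needed to store one int as a java.lang.Integer plus the pointer to it.
    static long boxedIntBytes(int headerSize, int refSize) {
        return align8(headerSize + 4) + refSize;
    }

    // Extra bytes compared to storing the same number of values as plain 4-byte ints.
    static long overhead(long count, int headerSize, int refSize) {
        return count * (boxedIntBytes(headerSize, refSize) - 4);
    }
}
```

With a 12-byte header and 4-byte references, each boxed int takes 20 bytes instead of 4; with a 16-byte header and 8-byte references, it takes 32.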

 

Example:

 

Total boxed number objects: 111,314  Overhead: 1,900K (2%)

 

Top boxed number objects:

    Ovhd            Shallow size    Num objs      Type

------------------------------------------------------

 1,116K (1.2%)      1,116K (1.2%)    70,478     j.l.Integer

 

 

Overhead, number and reference chains for boxed numbers. In this section, JXRay Analyzer provides information on reference chains leading to the biggest clusters of boxed numbers. When the problematic objects are unreachable (garbage), the tool attempts to find clusters of similar live objects - they may give you a clue about the source of such objects.

 

 

Duplicate objects. This is a special category of problem; currently, for performance reasons, only instances of non-collection classes that take more than a certain percentage of the heap are analyzed for duplication. By default, a given class should use 10% of the heap or more; this value is adjustable via the -min_heap_percent_for_class_to_check_dups command line flag. Duplicate objects, if they are immutable, can often be easily eliminated using some kind of a canonicalization cache.
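
One possible shape of a canonicalization cache is sketched here. It is a minimal example, not a recommendation of a specific design: the class Canonicalizer is hypothetical, it requires that the cached type is immutable and implements equals() and hashCode() consistently, and it never evicts entries.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Minimal sketch of a canonicalization cache for immutable values.
final class Canonicalizer<T> {
    private final ConcurrentMap<T, T> cache = new ConcurrentHashMap<>();

    // Returns the first instance seen that equals the given value.
    T canonical(T value) {
        T existing = cache.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }
}
```

Callers store the result of canonical(...) instead of the freshly created object, so equal values share a single instance. Because the cache holds strong references, it suits value sets that are bounded or that live as long as the application.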

 

Example:

 

Types of duplicate objects:
     Ovhd         Num objs  Num unique objs   Class name

 

129,449K (19.9%)   4147741         5349         com.ayala.lynx.ProbeMetric$InstantAndValue

 

Total objects of above types: 4,147,741  Unique objects: 5,349  Duplicate values: 5,184  Overhead: 129,449K (19.9%)

 

Top duplicate objects:
    Ovhd          Num objs    Value

 

  6,323K (1.0%)   202353     com.ayala.lynx.ProbeMetric$InstantAndValue(timestamp : org.joda.time.Instant@7a046b350, count : 1, value : 0.0)

  ....

 

 

Duplicate primitive array stats. Such arrays are similar to duplicate strings: they are separate objects, but their contents are the same. Note that JXRay is smart and can distinguish between “standalone” char[] arrays and those that belong to String instances - the latter are not analyzed in this section.
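
How duplicate arrays can be detected in principle is sketched below for int[] arrays. This is not JXRay's implementation; the class DuplicateArrays is hypothetical, and it relies on the fact that java.nio.IntBuffer compares and hashes by content.

```java
import java.nio.IntBuffer;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: counts how many int[] arrays are copies of an earlier one.
final class DuplicateArrays {
    static int countRedundant(List<int[]> arrays) {
        // IntBuffer.wrap gives content-based equals()/hashCode() without copying the data.
        Map<IntBuffer, Integer> seen = new HashMap<>();
        int redundant = 0;
        for (int[] a : arrays) {
            if (seen.merge(IntBuffer.wrap(a), 1, Integer::sum) > 1) {
                redundant++;               // same contents as an array seen before
            }
        }
        return redundant;
    }
}
```

The arrays must not be modified while they serve as map keys, since their hash codes depend on their contents.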

 

Example:

 

Total arrays: 67,278  Unique arrays: 1,925  Duplicate values: 561  Overhead: 8,072K (12.3%)

 

Top duplicate arrays:

    Ovhd          Num objs    Value

 

  1,626K (2.5%)      232     int[1798](0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...)

    768K (1.2%)       13     byte[65536](0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...)

    768K (1.2%)       13     byte[65536](0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...)

    367K (0.6%)     7832     char[16](, , , , , , , , , , ...)

 

 

Overhead, number and reference chains for duplicate primitive arrays. In this section, JXRay Analyzer provides information on reference chains leading to the biggest clusters of duplicate primitive arrays. When the problematic arrays are unreachable (garbage), the tool attempts to find clusters of similar live arrays. They may give you a clue about the source of such arrays.

© JXRay