Memory Leak Testing in Java
At xMatters we have a fast resolution initiative. The goal is to always find the right recipients even faster and start delivering notifications even sooner.
In support of this initiative, we recently made a minor tweak in our code to decrease overall memory consumption by getting rid of extra references that were no longer needed. Every time we make a bug fix or a code change, we write a unit test. Here’s what happened.
Fair warning: this blog post is intended for engineers. With that said…
Decreasing Memory Consumption
We need to track references to an object and detect the moment when it becomes unreachable. An object in Java, by the way, can be a combination of variables, functions, and data structures. At first we were thinking about soft and weak references. For help with the rest of this article, you might want to cruise around Wikipedia or check out the Java website.
In descending order of strength, references to an object can be strong, soft, weak, or phantom.
Running the Unit Test
The idea was to create this sort of reference and then check whether its get() method returns null. But with soft references, the garbage collector (GC) is not contractually obliged to clear them. It is allowed to, but it may choose not to if there is still plenty of heap space. With weak references, the GC clears them once the referent is no longer strongly or softly reachable, but that happens before finalization, so a cleared weak reference does not prove that the instance has actually been reclaimed. Neither of those approaches seems robust enough for a good unit test. The GC uses an object’s type of reachability to determine when to free the object.
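The get() polling idea described above can be sketched as follows. This is only an illustration of why we moved away from it, not our test code; the class name GetPollingSketch and the helper stillHeld are hypothetical. The values printed after the GC request are deliberately not documented here, because they depend on the JVM and on heap pressure.

```java
import java.lang.ref.Reference;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

final class GetPollingSketch {

    /** True while the reference still hands back its referent. */
    static boolean stillHeld(Reference<?> ref) {
        return ref.get() != null;
    }

    public static void main(String[] args) {
        Object payload = new Object();
        Reference<Object> soft = new SoftReference<>(payload);
        Reference<Object> weak = new WeakReference<>(payload);

        // Nothing has been dropped yet, so both references are expected to be held.
        System.out.println("before: soft=" + stillHeld(soft) + ", weak=" + stillHeld(weak));

        payload = null; // drop the only strong reference
        System.gc();    // a request, not a command

        // JVM dependent: the soft reference usually survives while heap space is
        // plentiful, and the weak reference is cleared only if a GC actually ran.
        System.out.println("after:  soft=" + stillHeld(soft) + ", weak=" + stillHeld(weak));
    }
}
```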
And indeed, this is a rare use case for phantom references. Phantom references are almost never used directly, but their purpose is to let you do something with instances after the GC has determined that they can be reclaimed. As you may know, phantom reachable instances are truly freed only after their phantom references have been cleared (before Java 9 the GC does not clear them automatically), which in practice means a later GC cycle and gives developers time to perform reference tracking diagnostics.
In our test we create phantom references for every instance we want to track. We run our processing up to the point at which we want to ensure that certain instances can be collected. At this point we invoke the GC and wait for it to complete. Then we check whether each reference has been enqueued.
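The basic mechanics of that check might look like the sketch below. The class PhantomProbe and its method awaitEnqueued are hypothetical names for illustration, and the sketch uses a plain System.gc() request with a timed wait on the queue instead of the more careful GC handling discussed next.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

final class PhantomProbe {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // The PhantomReference itself must stay strongly reachable,
    // otherwise the GC may discard it without ever enqueuing it.
    private final PhantomReference<Object> probe;
    private boolean enqueued;

    PhantomProbe(Object instance) {
        this.probe = new PhantomReference<>(instance, queue);
    }

    /** Requests a GC and waits up to timeoutMillis for the reference to show up in the queue. */
    boolean awaitEnqueued(long timeoutMillis) {
        if (enqueued) {
            return true;
        }
        System.gc(); // only a request; the JVM may ignore it
        try {
            // remove(0) would block forever, so wait at least one millisecond.
            Reference<?> ref = queue.remove(Math.max(1L, timeoutMillis));
            if (ref != null) {
                ref.clear();
                enqueued = true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return enqueued;
    }

    public static void main(String[] args) {
        Object tracked = new Object();
        PhantomProbe probe = new PhantomProbe(tracked);
        tracked = null; // the processing under test would drop its references here
        System.out.println("collected: " + probe.awaitEnqueued(1000));
    }
}
```

While the tracked instance is still strongly reachable, awaitEnqueued can only time out and return false, which is exactly the signal a leak test is looking for.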
If that sounds simple, then you have never worked with the JVM GC. There are huge caveats:
- Invoking the GC can be done via a System.gc() call, but this call is not guaranteed to do anything. First, it may be disabled via a command-line option (on HotSpot, -XX:+DisableExplicitGC). Second, it may be disabled at compile time. And third, it does nothing on the Azul JVM and on some others as well. To really invoke the garbage collector you are much better off using MemoryMXBean (“java.lang:type=Memory”). At least if that bean is unreachable, your test can report a nice failure.
- Waiting for the GC to complete is tricky. There is no isDone or isComplete call anywhere. But there is a collection of GC diagnostic MBeans available via ManagementFactory.getGarbageCollectorMXBeans(). In our test we simply used the sum of the GC counts on all of those beans as the metric. We wait for that number to rise above its initial value and then to stop increasing. We also cap that wait at 5ms, just to account for some GCs that always return -1 instead of the count.
- When you want to track a hierarchical structure with phantom references and the entire structure is ready to be collected, only the lowest level of it will be considered “phantom reachable” by the GC and placed in the phantom queue. Once you clear those phantom references, you may need to trigger another GC to clear the second layer of your structure.
- Clearing a phantom reference is tricky. You have to call clear() on each phantom reference instance and also drop the strong reference to the phantom reference itself that the test thread still holds in its local variables.
- The funniest bug, and one you can spend a lot of time investigating, is the null referent. The JVM will happily let you create a phantom reference to null, that is, to nothing. Of course that reference will never be enqueued. So if your method under test can ever return null, you need to guard against that.
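The first two caveats, requesting a GC through MemoryMXBean and waiting on the summed collection counts, can be sketched roughly like this. The class GcWaiter, its method names, the 10 ms polling interval, and the caller-supplied timeout are all assumptions made for the example. Note also that the Javadoc describes MemoryMXBean.gc() as effectively equivalent to System.gc(), so it remains a request.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcWaiter {

    /** Sum of collection counts over all GC beans; beans reporting -1 (undefined) are skipped. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = bean.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /**
     * Requests a GC, then polls until the total count has risen above its initial
     * value and stopped changing, or until the timeout expires.
     * Returns true if a rise in the count was observed.
     */
    static boolean requestGcAndWait(long timeoutMillis) {
        long initial = totalGcCount();
        long start = System.nanoTime();
        long timeoutNanos = timeoutMillis * 1_000_000L;

        ManagementFactory.getMemoryMXBean().gc();

        long previous = initial;
        while (System.nanoTime() - start < timeoutNanos) {
            try {
                Thread.sleep(10); // polling interval, chosen arbitrarily for this sketch
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            long current = totalGcCount();
            if (current > initial && current == previous) {
                return true; // count rose and then held steady for one interval
            }
            previous = current;
        }
        // Timed out: either no collector reports counts, or GC activity never settled.
        return totalGcCount() > initial;
    }
}
```

The timeout matters because a collector that always reports -1 never moves the sum, so without a cap the wait would never end.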
Once all of that is taken care of, you can finally write your assertReachable method and complete the test.
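One possible shape for such a helper is sketched below. It is not our actual assertReachable method: the class CollectableAssert, its method names, and the decision to silently skip null instances are assumptions for illustration. It folds in the remaining caveats by clearing each enqueued reference, dropping the strong reference to it, and running several GC passes so that nested structures can drain layer by layer.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.HashSet;
import java.util.Set;

final class CollectableAssert {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Strong references to the phantom references themselves, dropped once enqueued.
    private final Set<Reference<?>> pending = new HashSet<>();

    /** Tracks an instance; null is skipped because a phantom reference to null is never enqueued. */
    void track(Object instance) {
        if (instance == null) {
            return; // assumption: a stricter test might fail here instead
        }
        pending.add(new PhantomReference<>(instance, queue));
    }

    int pendingCount() {
        return pending.size();
    }

    /** Runs up to maxPasses GC passes and returns how many tracked instances remain uncollected. */
    int collect(int maxPasses, long waitMillisPerPass) {
        for (int pass = 0; pass < maxPasses && !pending.isEmpty(); pass++) {
            System.gc(); // only a request
            drain(waitMillisPerPass);
        }
        return pending.size();
    }

    /** Fails if any tracked instance is still not collected after the given passes. */
    void assertAllCollected(int maxPasses, long waitMillisPerPass) {
        int remaining = collect(maxPasses, waitMillisPerPass);
        if (remaining > 0) {
            throw new AssertionError(remaining + " tracked instance(s) were not collected");
        }
    }

    private void drain(long waitMillis) {
        try {
            // remove(0) would block forever, so wait at least one millisecond.
            Reference<?> ref = queue.remove(Math.max(1L, waitMillis));
            while (ref != null) {
                ref.clear();         // clear the phantom reference itself
                pending.remove(ref); // and drop our strong reference to it
                ref = queue.poll();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A test would call track() for each instance of interest, run the code under test, drop its own local references, and then call assertAllCollected with a pass count that matches the depth of the structure.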
It does not end here though. Consider what to do if the test fails. How do you track the rogue references? Surely you would want a heap dump. That is the final part of our unit test – to trigger a heap dump (via HotSpotDiagnosticMXBean) whenever assertReachable fails.
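Triggering the dump can be sketched as follows, assuming a HotSpot-based JVM where com.sun.management.HotSpotDiagnosticMXBean is available. The class HeapDumper, the method signature, and the file naming scheme are hypothetical. Two details are worth knowing: dumpHeap fails if the target file already exists, and recent JDKs require the file name to end in .hprof.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Path;

final class HeapDumper {

    /**
     * Writes a heap dump of live objects into the given directory and returns its path.
     * The label would typically be the name of the failing test.
     */
    static Path dumpHeap(Path directory, String label) {
        // A timestamp keeps the name unique, since dumpHeap refuses to overwrite a file.
        Path target = directory.resolve(label + "-" + System.currentTimeMillis() + ".hprof");
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            bean.dumpHeap(target.toString(), true); // true = dump only reachable objects
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }
}
```

In a test, the call would sit in the failure path, for example in a catch block around the assertion, so that the dump reflects the heap at the moment the leak was detected.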
We are dealing with large sets of hierarchically organized data and need to process that data at relatively high speed. While processing it we expect to run into complex corner cases and to resolve them in the best interest of our customers. To achieve this we need to examine the fine details of our processing engine and make sure they stay optimized as we add new functionality.
If we take care of the details, we can deliver a better product to you.