5 Coding Hacks to Reduce GC Overhead (takipiblog.com)
41 points by heapster on July 18, 2013 | 22 comments



This article is way inaccurate. It recommends approaches that were maybe important in 1997. It also breaks the "profile before optimising" rule.

Currently the Java GC works splendidly with lots of short-lived, small immutable objects. Using this approach can be far better for performance than worrying about telling the ArrayList constructor how many elements the list will probably have.


While VMs like Sun's may use techniques like escape analysis, I seriously doubt that Dalvik does (it seems like dx might, but that's going to be of limited value at compile time, and it isn't clear how it uses the information). A lot of people coding in Java these days are doing so for Android, where the garbage collector is a serious problem, sufficiently so that every garbage collection is logged to the system log. So as a developer you don't even need to do much work to profile it as an issue: you just see the pain in your console and then likely seek out this article.


Absolutely. Step 0 of this article should have been "profile your GC and establish that it is a problem."


These kinds of optimization/"hacks" articles need to include benchmarks. How does using one StringBuilder compare to using the implicit three or five? How many strings does one need to append before you should've used a StringBuilder? Etc.
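If someone actually wants those numbers, a small JMH harness is the least misleading way to get them. A sketch, under the assumption that the class name and the three fields are made up for illustration and you'd run it through the usual JMH runner:

  import org.openjdk.jmh.annotations.Benchmark;
  import org.openjdk.jmh.annotations.Scope;
  import org.openjdk.jmh.annotations.State;

  @State(Scope.Thread)
  public class ConcatBench {
      String a = "foo", b = "bar", c = "baz";

      @Benchmark
      public String implicitBuilders() {
          String s = a + b;   // javac emits one StringBuilder here
          s += c;             // ...a second one here
          return s + a;       // ...and a third here
      }

      @Benchmark
      public String oneBuilder() {
          return new StringBuilder().append(a).append(b)
                  .append(c).append(a).toString();
      }
  }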

I imagine that there are going to be novice programmers whose takeaway from the article is "HashMaps with ints as keys are slow" or "String concatenation in Java is slow".


Perhaps it should be renamed to "5 Coding Hacks to Reduce GC Overhead in Java", otherwise it's not clear what it's about (in fact, even in the article, Java is not mentioned before the third paragraph).


Articles like this make me believe that Java should have just gone with a model designed around fibers, where each fiber gets a fiber-local heap, and objects that survive the death of that heap or are accessed from multiple fibers are moved to a global heap.

It would solve the problem with "stop the world" GC dynamics and improve concurrent collection performance, while also adding native fibers to the JVM (and thus making it possible to write highly concurrent software in Java much more easily).


The DLG collector as used by Caml Light does what you describe.

Updating an object in the global heap to point to a thread-local heap can cause unbounded work to be done before that update as the entire pointed-to structure has to be copied into the global heap first. GC designers really hate write barriers (sections of code that are performed before a pointer is updated) that perform unbounded work.


"use streams instead of buffers" is completely wrong. It is much faster to read off a file or the network into a buffer rather than byte-by-byte so you can assume that anything which takes a stream, buffers it internally.

What you should do: allocate one read buffer with capacity > the expected length, overwrite-read into it, copy the data into a fresh byte array with capacity == the actual length, parse the copy, then empty and reuse the read buffer (sketch below).

What you definitely shouldn't do, despite it being tempting: allocate a read buffer each time and pass the whole thing around.
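In Java that pattern looks roughly like this. A minimal sketch: `in` is assumed to be some InputStream, parse() is a placeholder for whatever your handler is, and a real version would loop until the whole message has been read rather than trusting a single read():

  byte[] readBuf = new byte[256 * 1024];                    // one buffer, capacity > expected length

  int n = in.read(readBuf, 0, readBuf.length);              // overwrite-read into the same buffer
  byte[] message = java.util.Arrays.copyOf(readBuf, n);     // fresh array, capacity == actual length
  parse(message);                                           // parse the copy; readBuf gets reused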


An approach I've taken in .NET is to write a class that implements IDisposable and wraps a byte buffer of a standardized size. Behind the scenes it used an array of stacks for recycling the byte arrays. Normal use case would be something like:

    using (ByteBuffer buffer = pool.AllocateBuffer(128 * 1024))
    {
         // use buffer.array
    }
The pool used weak references to the buffers so they could eventually be GC'd. Dead weak references were cleaned out when found during allocation. Typically the byte arrays needed were big enough to land in the large object heap, the cutoff for which was 80k IIRC. The LOH is not collected until a gen2 collection.


I'm not sure 1. is relevant anymore. Almost all JVMs default to generational GCs, which would make all those allocations basically free.


Most of the 'hacks' were already irrelevant 10 years ago. The blog post was probably written to pitch their product.


I'm not sure how that's "basically free". If the nursery fills up, even if the objects are dead, it will still require a root-set scan and a copy/compact of the live objects. That's not basically free; that can be quite expensive.


Why should manually managing your StringBuilders be preferable to having the compiler do it for you? The example given is:

  String result = foo() + arg;
  result += boo();
  System.out.println("result = " + result);
creates 3 StringBuilders. I would have thought that it wouldn't be particularly hard to track that the input to a subsequent "+" was the output of a previous StringBuilder, allowing easy reuse?
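For reference, the hand-rolled version the article is asking for would be something like this (same foo, boo and arg as in the example above):

  StringBuilder sb = new StringBuilder("result = ");
  sb.append(foo()).append(arg);       // String result = foo() + arg;
  sb.append(boo());                   // result += boo();
  System.out.println(sb.toString());  // one builder instead of three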


+1 for Trove. Recently, when working with collections of tens of thousands of primitive elements, we saw substantial memory savings by using Trove.
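For anyone who hasn't used it, the difference is roughly this (a sketch, assuming Trove 3.x with gnu.trove.map.hash.TIntIntHashMap on the classpath):

  // java.util boxes every key and value into its own Integer object
  Map<Integer, Integer> boxed = new HashMap<Integer, Integer>();
  boxed.put(42, 7);

  // Trove keeps the primitives in parallel int[] arrays: no per-entry
  // objects to allocate, and nothing extra for the GC to trace
  TIntIntHashMap prim = new TIntIntHashMap();
  prim.put(42, 7);
  int v = prim.get(42);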


I was hoping for something that would be general for most GC languages, but it seems to only talk about Java.


The general advice is to produce less garbage. Strategies for doing so are specific to the language and APIs involved.


It's not just that you need to produce less garbage - you want to produce fewer live objects in general, as a large part of the cost during GC is determining which objects are live.


That depends - if your code is mostly single threaded, and you're using the concmarksweep collector on a multicore CPU, determining which objects are live is basically 'free'. On machines with a high number of cores, having all cores at 100% is very rare.


I might be wrong, but it might be a bit hard to write something general enough to be useful - even though most of the items can easily be translated into other languages, the advice would still be mostly implementation-specific and language-design specific.


Unless an article like this is written by Josh Bloch, or someone of his caliber, I don't think you can really believe it.


Another: buy more RAM if needed, and set -Xmx###M and -Xms###M to sufficiently high numbers so that the JVM never runs out of memory and forces a collection. You can also do System.gc() to "suggest" a collection be done soon, such as while you're busy doing something slow that's I/O bound like reading a file or getting network data. Though I wouldn't be surprised if most of the time modern JVM GCs already know when you're doing such things, making such a hint redundant.
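Something like the following, where the heap sizes, class name and file name are all placeholders, and with the caveat that System.gc() is only a hint (and -XX:+DisableExplicitGC turns it into a no-op):

  // launch with a fixed, generous heap so the JVM (ideally) never has to collect:
  //   java -Xms4096M -Xmx4096M com.example.MyApp

  // later, just before an I/O-bound stretch, hint that now is a cheap time to collect:
  System.gc();
  byte[] data = java.nio.file.Files.readAllBytes(
          java.nio.file.Paths.get("big-input.dat"));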


Buying more RAM only gets you so far. G1 handles large heaps fairly well, but you can still run into issues with long pauses if you are not careful and those pauses increase with the size of the heap. Most real time services don't have the luxury of manually triggering GCs and must do whatever they can to limit the stress they put on the collector.




