

5 Coding Hacks to Reduce GC Overhead - heapster
http://www.takipiblog.com/2013/07/18/5-coding-hacks-to-reduce-gc-overhead/

======
stevoski
This article is way inaccurate. It recommends approaches that were maybe
important in 1997. It also breaks the "profile before optimising" rule.

Currently the Java GC works splendidly with lots of short-lived, small
immutable classes. Using this approach can be far better for performance than
worrying about telling the ArrayList constructor how many elements the list
will probably have.
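For reference, the micro-tuning being argued against is just presizing the backing array. A minimal sketch (sizes invented for illustration); with a generational collector, the short-lived arrays discarded during resizing die cheaply in the young generation anyway:

```java
import java.util.ArrayList;
import java.util.List;

public class PresizeExample {
    public static void main(String[] args) {
        // The "hack": tell ArrayList the expected element count up front
        // so the internal array is never resized.
        List<String> presized = new ArrayList<>(10_000);

        // The plain version: resizes create short-lived garbage arrays,
        // which a generational GC reclaims very cheaply.
        List<String> plain = new ArrayList<>();

        for (int i = 0; i < 10_000; i++) {
            presized.add("item" + i);
            plain.add("item" + i);
        }
        System.out.println(presized.size() == plain.size());
    }
}
```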

~~~
saurik
While VMs like Sun's may be using techniques like escape analysis, I seriously
doubt that Dalvik does (it seems like dx might, but that's going to be of
limited value at compile time, and it isn't clear how it uses the
information). A lot of people coding in Java these days are doing so for
Android, where the garbage collector is a serious problem, sufficiently so
that all garbage collections are logged to the system log. As a developer you
don't even need to do much work to profile it as an issue: you just see the
pain in your console and then likely seek out this article.

------
ndnichols
These kinds of optimization/"hacks" articles need to include benchmarks. How
does using one StringBuilder compare to using the implicit three or five? How
many strings does one need to append before you should've used a
StringBuilder? Etc.

I imagine there are going to be novice programmers whose takeaway from the
article is "HashMaps with ints as keys are slow" or "String concatenation in
Java is slow".
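In that spirit, a naive (deliberately non-rigorous, no-JMH) comparison one could run; the counts are made up, and real benchmarks would need warmup and JMH:

```java
public class ConcatBench {
    public static void main(String[] args) {
        int n = 5_000;

        // Repeated += in a loop: each iteration copies the whole string
        // so far, O(n^2) character copying plus one garbage String each.
        long t0 = System.nanoTime();
        String s = "";
        for (int i = 0; i < n; i++) s += i;
        long concatNs = System.nanoTime() - t0;

        // One StringBuilder: amortized O(n) appends into one buffer.
        t0 = System.nanoTime();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.append(i);
        String s2 = sb.toString();
        long builderNs = System.nanoTime() - t0;

        System.out.println(s.equals(s2));
        System.out.println("concat/builder time ratio (unscientific): "
                + (concatNs / Math.max(1, builderNs)));
    }
}
```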

------
laurent123456
Perhaps it should be renamed to "5 Coding Hacks to Reduce GC Overhead in
Java", otherwise it's not clear what it's about (in fact, even in the article,
Java is not mentioned before the third paragraph).

------
DiabloD3
Articles like this make me believe that Java should have just gone with a
model designed around fibers, where each fiber gets a fiber-local heap, and
objects that survive the death of that heap or are accessed by multiple
fibers are moved to a global heap.

It would solve the problem with "stop the world" GC dynamics and improve
concurrent collection performance, while also adding native fibers to the JVM
(and thus making it much easier to write highly concurrent software in Java).

~~~
dumael
The DLG collector as used by Caml Light does what you describe.

Updating an object in the global heap to point to a thread-local heap can
cause unbounded work to be done before that update as the entire pointed-to
structure has to be copied into the global heap first. GC designers really
hate write barriers (sections of code that are performed before a pointer is
updated) that perform unbounded work.
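A toy model of that unbounded barrier (all names invented; no real collector works exactly like this, it only illustrates why the cost is proportional to the structure being promoted):

```java
public class WriteBarrierSketch {
    static class Node {
        int value;
        Node next;
        boolean global;  // which heap this node "lives" in (toy flag)
        Node(int value, Node next) { this.value = value; this.next = next; }
    }

    // The unbounded part: promotion must walk everything reachable
    // from the local value before the pointer store may proceed.
    static int promote(Node n) {
        int copied = 0;
        for (Node cur = n; cur != null && !cur.global; cur = cur.next) {
            cur.global = true;  // stand-in for "copy into global heap"
            copied++;
        }
        return copied;
    }

    // Barrier executed before globalHolder.next = localValue.
    static void writeWithBarrier(Node globalHolder, Node localValue) {
        if (globalHolder.global && !localValue.global) {
            int n = promote(localValue);
            System.out.println("promoted " + n + " nodes before the store");
        }
        globalHolder.next = localValue;
    }

    public static void main(String[] args) {
        Node local = null;  // build a 1000-node fiber-local list
        for (int i = 0; i < 1000; i++) local = new Node(i, local);

        Node globalObj = new Node(-1, null);
        globalObj.global = true;

        writeWithBarrier(globalObj, local);  // must promote all 1000 nodes
    }
}
```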

------
JulianMorrison
"use streams instead of buffers" is completely wrong. It is much faster to
read off a file or the network into a buffer than byte-by-byte, so you can
assume that anything which takes a stream buffers it internally.

What you should do: allocate one read buffer with a capacity > expected
length, overwrite-read into it, copy the data into a fresh capacity == actual
length byte array, parse the copy, empty and reuse the read buffer.

What you definitely shouldn't do, despite it being tempting: allocate a read
buffer each time and pass the whole thing around.
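The reuse pattern described above might look like this (buffer size and method names are invented for the sketch):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class ReuseReadBuffer {
    // One long-lived buffer, sized above the expected message length.
    private static final byte[] READ_BUFFER = new byte[64 * 1024];

    // Overwrite-read into the shared buffer, then hand out a
    // right-sized copy; the big buffer is reused on the next call.
    static byte[] readMessage(InputStream in) throws IOException {
        int total = 0;
        int n;
        while (total < READ_BUFFER.length
                && (n = in.read(READ_BUFFER, total, READ_BUFFER.length - total)) != -1) {
            total += n;
        }
        return Arrays.copyOf(READ_BUFFER, total);
    }

    public static void main(String[] args) throws IOException {
        byte[] first = readMessage(new ByteArrayInputStream("hello".getBytes()));
        byte[] second = readMessage(new ByteArrayInputStream("ok".getBytes()));
        System.out.println(new String(first));
        System.out.println(second.length);  // right-sized copy, not 64K
    }
}
```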

~~~
barrkel
An approach I've taken in .NET is to write a class that implements IDisposable
and wraps a byte buffer of a standardized size. Behind the scenes it used an
array of stacks for recycling the byte arrays. Normal use case would be
something like:

    using (ByteBuffer buffer = pool.AllocateBuffer(128 * 1024))
    {
        // use buffer.array
    }

The pool used weak references to the buffers so they could eventually be GC'd.
Dead weak references were cleaned out when found during allocation. Typically
the byte arrays needed were big enough to need to go into the large object
heap, the cutoff for which was 80k IIRC. The LOH was not collected until gen2
collection.
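A rough Java analog of that pattern (all names invented; a sketch, not barrkel's actual code): the pool holds returned arrays only weakly, so the collector can still reclaim them, and dead references are dropped during allocation.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayDeque;
import java.util.Deque;

public class BufferPool {
    private final int size;
    private final Deque<WeakReference<byte[]>> free = new ArrayDeque<>();

    public BufferPool(int size) { this.size = size; }

    public synchronized byte[] acquire() {
        WeakReference<byte[]> ref;
        while ((ref = free.poll()) != null) {
            byte[] buf = ref.get();
            if (buf != null) return buf;  // recycled buffer
            // dead weak reference: discard it and keep looking
        }
        return new byte[size];            // pool empty: allocate fresh
    }

    public synchronized void release(byte[] buf) {
        if (buf.length == size) free.push(new WeakReference<>(buf));
    }

    public static void main(String[] args) {
        BufferPool pool = new BufferPool(128 * 1024);
        byte[] a = pool.acquire();
        pool.release(a);
        byte[] b = pool.acquire();  // same array comes back from the pool
        System.out.println(a == b);
    }
}
```

(Java has no using/IDisposable, so callers must release explicitly; a try/finally would play the same role.)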

------
lucian1900
I'm not sure 1. is relevant anymore. Almost all JVMs default to generational
GCs, which would make all those allocations basically free.

~~~
ExpiredLink
Most of the 'hacks' were not relevant anymore 10 years ago. The blog is
probably written to pitch the product.

------
jbert
Why should manually managing your StringBuilders be preferable to having the
compiler do it for you? The example given is:

    String result = foo() + arg;
    result += boo();
    System.out.println("result = " + result);

which creates 3 StringBuilders. I would have thought that it wouldn't be
particularly hard to track that the input to a subsequent "+" is the output of
a previous StringBuilder, allowing easy re-use?
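For comparison, the hand-managed version presumably looks something like this (foo, boo, and arg are stubbed out here; this is a sketch, not the article's code):

```java
public class OneBuilder {
    static String foo() { return "foo"; }  // stand-in
    static String boo() { return "boo"; }  // stand-in

    public static void main(String[] args) {
        String arg = "arg";
        // One explicit StringBuilder instead of the three the compiler
        // generates for the concatenation-heavy snippet above.
        StringBuilder sb = new StringBuilder();
        sb.append("result = ").append(foo()).append(arg).append(boo());
        System.out.println(sb.toString());
    }
}
```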

------
joshdev
+1 for Trove. Recently, when working with collections of tens of thousands of
primitive elements, we saw substantial memory savings by using Trove.
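The saving comes from avoiding boxing. A plain-JDK illustration (Trove itself isn't used here, so this only shows the overhead its primitive collections are designed to avoid):

```java
import java.util.HashMap;
import java.util.Map;

public class BoxingOverhead {
    public static void main(String[] args) {
        // HashMap<Integer, Integer> boxes every key and value into its
        // own heap object, plus an entry object per mapping.
        Map<Integer, Integer> boxed = new HashMap<>();
        for (int i = 0; i < 100_000; i++) boxed.put(i, i * 2);

        // A primitive int-to-int map (e.g. Trove's TIntIntHashMap)
        // stores the same data in flat int[] arrays: no per-element
        // objects for the GC to trace at all.
        int[] keys = new int[100_000];
        int[] vals = new int[100_000];
        for (int i = 0; i < 100_000; i++) { keys[i] = i; vals[i] = i * 2; }

        System.out.println(boxed.get(50_000));
        System.out.println(vals[50_000]);
    }
}
```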

------
FreeFull
I was hoping for something that would be general for most GC languages, but it
seems to only talk about Java.

~~~
ryanpetrich
The general advice is to produce less garbage. Strategies for doing so are
specific to the language and APIs involved.

~~~
AlisdairO
It's not just that you need to produce less garbage - you want to produce
fewer live objects in general, as a large part of the cost during GC is
determining which objects are live.

~~~
RyanZAG
That depends - if your code is mostly single-threaded, and you're using the
ConcMarkSweep collector on a multicore CPU, determining which objects are live
is basically 'free'. On machines with a high number of cores, having all
cores at 100% is very rare.

------
xntrk
Unless an article like this is written by Josh Bloch, or someone of his
caliber, I don't think you can really believe it.

------
Jach
Another: buy more RAM if needed, and set -Xmx###M and -Xms###M to sufficiently
high numbers so that the JVM never runs out of memory and forces a collection.
You can also do System.gc() to "suggest" a collection be done soon, such as
while you're busy doing something slow that's I/O bound like reading a file or
getting network data. Though I wouldn't be surprised if most of the time
modern JVM GCs already know when you're doing such things, making such a hint
redundant.
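As a command line, that might look like the following (sizes are illustrative only and must be tuned per application; MyApp is a placeholder):

```
java -Xms4g -Xmx4g -verbose:gc MyApp
```

Setting -Xms equal to -Xmx also avoids heap-resize pauses, and -verbose:gc logs each collection so you can see whether a System.gc() hint actually changed anything.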

~~~
joshdev
Buying more RAM only gets you so far. G1 handles large heaps fairly well, but
you can still run into issues with long pauses if you are not careful, and
those pauses grow with the size of the heap. Most real-time services don't
have the luxury of manually triggering GCs and must do whatever they can to
limit the stress they put on the collector.

