
Java garbage collection can be really slow - ingve
http://jvns.ca/blog/2016/04/22/java-garbage-collection-can-be-really-slow/
======
_ph_
Probably a book needs to be written as in "Pragmatic Garbage Collection"
summarizing some good practices to avoid surprises as the author of the
article encountered. Having used Java since its creation and other GCed
languages, I would summarize them as follows:

\- avoid allocating objects on the heap which you do not have to allocate. The
fewer fresh allocations you have, the less the GC has to do. That does not mean
you should write ugly and complex code, but if the tool described in the
article was for example grep-like, then one should not have to allocate each
line read separately on the heap just to discard it. If possible use a buffer
for reading in, if the io libraries allow it.

\- generational GCs try to work around this a bit, as the youngest generation
is collected very quickly, assuming the majority of the objects is already
"dead" when it happens, only the "survivors" are copied to older generations.
Make sure that the youngest generation is large enough, that this assumption
is true and only objects are promoted to older generations which indeed have a
longer lifetime.

\- language/library design makes a huge difference in how much pressure there
is on the GC system. Fewer heap allocations help, as do languages which try not
to create overly complex heap layouts. In Java, an array of objects means an array
of pointers to objects which could be scattered around the heap, while in Go
you can have an array of structs which is one contiguous block of memory which
drastically reduces heap complexity (but of course, is more effort to
reallocate for growing).

\- good library design can bring a lot of efficiency. At some point in time,
just opening a file in Java would create several separate objects which
referred to each other (a buffered reader which points to the file object...).
My impression is, "modern" Java libraries too often create even larger object
chains for a single task. This can add to the GC pressure.

Of course, all these practices can be used equally "well" to bring down a
program with manual allocation to a crawl. So in summary I am a strong
proponent of GC, but one needs to be aware of at least the performance
tradeoffs different factorings of one program can bring. Modern GCs are
incredibly fast, but that is not a magic property.
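
A minimal sketch of the first point, assuming a grep-like task reduced to "count the lines that contain a given byte". The class name `LineScanner` and the 8 KiB buffer size are illustrative assumptions, not anything from the article. One byte buffer is reused for every read, so no per-line `String` is ever allocated:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class LineScanner {
    /**
     * Counts the lines in the stream that contain the byte `needle`.
     * `needle` is assumed not to be the newline byte itself.
     */
    static long countLinesContaining(InputStream in, byte needle) {
        byte[] buf = new byte[8192];   // allocated once, reused for every read
        long matches = 0;
        boolean seenInLine = false;    // state carried across buffer boundaries
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                for (int i = 0; i < n; i++) {
                    byte b = buf[i];
                    if (b == '\n') {
                        if (seenInLine) matches++;
                        seenInLine = false;
                    } else if (b == needle) {
                        seenInLine = true;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (seenInLine) matches++;     // last line without a trailing newline
        return matches;
    }

    public static void main(String[] args) {
        byte[] data = "axe\nbee\nfox\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(countLinesContaining(new ByteArrayInputStream(data), (byte) 'x'));
    }
}
```

The trade-off is readability: matching anything more complex than a single byte needs extra state for matches that span two reads.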

~~~
pjmlp
One problem with Java is, for the time being, the lack of proper value types.

Had a language like Eiffel, Oberon dialects or Modula-3 taken its role in the
industry, I bet we wouldn't be having these constant discussions of GC vs
manual, in terms of performance.

Another issue is that many developers, even in languages with GC and value
types, tend to be "new() happy", sometimes leading to designs that are very
hard to refactor when the need arises, given the differences in the type
system between reference and value types.

Eiffel is probably the only one I can remember, where the difference is a
simple attribute.

~~~
_ph_
One thing I like about Go is its strong Oberon heritage, picking up where
those languages left off.

~~~
pjmlp
That is what attracted me initially to it, but then I got disappointed with
the overall direction the language design was going.

I am more of a Swift/Rust guy than Go, in terms of features.

Even Oberon eventually evolved into Active Oberon and Component Pascal
variants, both more feature rich than Go.

To be honest, Niklaus Wirth's latest design, Oberon-07 is even more minimalist
than Oberon itself.

EDIT: Typo

~~~
vram22
>then I got disappointed with the overall direction the language design was
going.

Can you elaborate? Thanks.

~~~
pjmlp
For me the fact that Go is a descendant of Oberon-2 and Limbo is quite
interesting, but there are several features that a modern language should have
that aren't present in Go and never will be.

Hence I rather see the appeal of Go as a way to attract developers who
would otherwise use C to a safer programming language.

Many turn to C just because they don't know other AOT compiled languages
well, not because they really need any special C feature.

Regardless of the discussion regarding if it is a systems programming language
or not, I think it can be, given its lineage. It only needs someone to take the
bootstrapped version (1.6) and write a bare metal runtime, and then it would be
proven. Maybe a nice idea for someone looking for a PhD thesis in the OS area.

Me, I would rather make use of a .NET, JVM or ML influenced language as those
have type systems more to my liking.

------
pron
This has probably nothing to do with GC tuning[1] or with Java's GC being slow
(or any other GC), and most likely to do with either a bug in the program (a
leak) or a misunderstanding of how the program uses memory. It's not the "GC
ruining your day", but the GC not being able to fix a bug in your program
and/or cram a 5 GB RAM usage into a 4 GB heap.

[1]: Which is relevant if you're trying to turn a 100ms pause into a 15ms
pause, or get rid of the 2sec pause you get once every few hours.

~~~
kpil
There is either a leak: The application keeps pointers to a lot of objects
that are actually never going to be used.

Or running out of memory: The application keeps pointers to objects that
_will_ be used later.

Both problems are solvable, you remove the pointers or change the algorithm
respectively. (If you simply can't add more memory.)

The really hard problem is that the JVM takes a long time to report an OOM
error. But it's not unique to Java: who has not seen servers that have become
unresponsive in a low memory situation?

------
simula67
The problem is not just with the time it takes, but that most garbage
collection algorithms are stop-the-world ( not sure if _any_ of them are truly
concurrent ). This can introduce correctness problems.

I used to work on a network management software that used ICMP polling to
detect if network devices were down. We had a SEDA architecture, requests were
put on a queue, timers were set and if the device did not respond within a
timeout, we would mark the device as down.

Problem was, it so happened that in a high load system after we sent out the
request, the garbage collector would kick in and take eons to return the
system to running state. When the system returns, the timer events would fire
and the handlers would note that the timeout has expired and mark the devices
as down. The device could have responded in time to the requests but the
system would not have detected it.
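
One possible mitigation for the failure described above, sketched under assumptions: before a timeout handler marks a device as down, it first drains whatever replies are already waiting, so a reply that arrived during the pause still counts. The class `PollMonitor` and its method names are hypothetical, the queue stands in for wherever unread replies sit (for example a socket receive buffer), and this is not a description of how the system in question worked. `onTimeout` is assumed to run on a single thread.

```java
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentLinkedQueue;

final class PollMonitor {
    // Replies that have arrived but have not been processed yet.
    private final Queue<String> pendingReplies = new ConcurrentLinkedQueue<>();
    private final Set<String> answered = new HashSet<>();
    private final Set<String> down = new HashSet<>();

    /** Called by the (hypothetical) receiver when a reply from `device` arrives. */
    void onReply(String device) {
        pendingReplies.add(device);
    }

    /** Called when the timeout timer for `device` fires, possibly late. */
    void onTimeout(String device) {
        // Drain first: replies may have queued up while the process was stalled.
        String replied;
        while ((replied = pendingReplies.poll()) != null) {
            answered.add(replied);
        }
        // Only a device with no recorded reply is marked as down.
        if (!answered.remove(device)) {
            down.add(device);
        }
    }

    boolean isDown(String device) {
        return down.contains(device);
    }

    public static void main(String[] args) {
        PollMonitor monitor = new PollMonitor();
        monitor.onReply("router-1");      // reply arrived during the stall
        monitor.onTimeout("router-1");    // timer fires late
        monitor.onTimeout("router-2");    // never replied
        System.out.println(monitor.isDown("router-1") + " " + monitor.isDown("router-2"));
    }
}
```

This only removes the false positives caused by processing order; it does not tell you whether the reply really arrived within the deadline, which would need arrival timestamps taken outside the paused process.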

This is why I am wary of languages with mandatory garbage collection. I feel
it should be a library in any serious systems language.

~~~
hga
See
[https://news.ycombinator.com/item?id=11555017](https://news.ycombinator.com/item?id=11555017)
Azul's Zing is pauseless, never has to do a stop-the-world collection which
they say on a JVM takes about a second per GiB.

It has threads which concurrently collect as other threads mutate, uses clever
VM tricks such as bulk operations with only one TLB invalidation (or at least
they did that with an earlier version of the current collector, they couldn't
get it into the mainline Linux kernel and now use a DLKM). It's the only non-
toy currently maintained pauseless/concurrent GC that I know of.

------
Annatar
If one has to worry about allocating or not allocating objects on the heap,
what is the difference between worrying about memory management that way (and
suffering memory consumption and performance because of the garbage
collector), and doing memory management manually with alloca(3C) or malloc(3C)
in C, and having pretty much guaranteed performance???

~~~
panic
Well, when you use a GC, you don't have to figure out where to put the "free"
calls. This sounds like a minor thing, but it lets you write code in a very
different style (have you ever tried writing a functional program with
explicit malloc and free?)

That said, there are other ways to avoid writing "free" than using a garbage
collector. Regions
([https://en.wikipedia.org/wiki/Region-based_memory_management](https://en.wikipedia.org/wiki/Region-based_memory_management))
are faster at allocation than malloc (you just
increment a pointer to allocate) and faster at freeing than a GC (you throw
away the entire region when you're done with it). It seems tricky to base a
general-purpose programming language around them, though.
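
To make the two operations concrete, here is a minimal sketch of a region as a bump-pointer allocator over one preallocated byte array. The class `Region` and its method names are assumptions made for illustration; blocks are handed out as offsets into the array rather than as pointers.

```java
final class Region {
    private final byte[] memory;
    private int top = 0;   // bump pointer: offset of the next free byte

    Region(int capacity) {
        memory = new byte[capacity];
    }

    /** Returns the offset of a fresh block of `size` bytes, or -1 if it does not fit. */
    int allocate(int size) {
        if (size < 0 || size > memory.length - top) {
            return -1;
        }
        int offset = top;
        top += size;       // allocation is just an increment
        return offset;
    }

    /** Frees every block at once by resetting the bump pointer. */
    void releaseAll() {
        top = 0;
    }

    int used() {
        return top;
    }

    public static void main(String[] args) {
        Region region = new Region(64);
        System.out.println(region.allocate(16));  // first block starts at offset 0
        System.out.println(region.allocate(8));   // second block starts right after it
        region.releaseAll();
        System.out.println(region.used());
    }
}
```

Individual blocks cannot be freed, which is part of why regions are awkward as the only memory strategy of a general-purpose language.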

~~~
Annatar
> Well, when you use a GC, you don't have to figure out where to put the
> "free" calls.

That is precisely why I mentioned alloca(3C): it automatically frees the
memory for you, if you do not want to do it yourself. From the Solaris /
illumos alloca(3C) manual page:

    
    
      void *alloca(size_t size);
    
      The alloca() function allocates size bytes of space  in  the
      stack  frame  of  the  caller,  and returns a pointer to the
      allocated block. This temporary space is automatically freed
      when  the  caller  returns. If the allocated block is beyond
      the current stack limit, the  resulting  behavior  is  unde-
      fined.
    

> (have you ever tried writing a functional program with explicit malloc and
> free?)

I got my start on MOS 6502 / MOS 6510 / MC68000 assembler, so for me making
malloc(3C) and free(3C) calls when programming in a functional style is
completely normal. I have no problem with that whatsoever.

~~~
panic
The part where behavior is undefined when you overflow the stack makes alloca
difficult to use safely, but it is very nice when you can use it!

Did you write your 6502 code with closures, higher-order functions, and so on?
My point is that it can be hard to figure out when to free an object in this kind
of environment, where a value can be captured by multiple closures and may not
have a clear owner.

~~~
Annatar
Then either use C and manually manage memory, or use ANSI common LISP, and no
problem.

------
cmrdporcupine
My experience writing a bidder for realtime ad exchanges in Java -- which was
a mistake driven by our use of some 'legacy' code -- was that the numbers work
out on average but not in the 90+th percentile. Heavy tuning of the GC yields
better results but there's always something that comes along and causes burps.
Throughput is usually fine on average -- but the latency is spiky.

If your problem domain is fine with that, that's great. But I will never use
Java for something latency sensitive again.

After I left that job I worked on the other side of RTB, on the exchanges
themselves. They were both written in C++, and performance was reliable and
awesome.

I would only use something like C++ or Rust for this purpose.

~~~
poooogles
We're having a lot of success with our RTB app being written in Go. All we've
done is tune the back pressure on the GC up so we trade off some memory for
less GC time.

------
gravypod
Was this done with the G1 garbage collector brought in by Java 7? If not, this
is worth a new test.

It fixes a lot of problems that used to be introduced, and as long as you let
it run wild with memory, and you are doing some parallelized work like
everything real world, you should not see this problem.

There should be either no more, or less, stop the world.

There are a few things you can do to also avoid this problem altogether:

The biggest improvement in speed vs memory will come from not passing
primitives as function parameters. When you are doing this, you are
passing-by-value, not passing-by-reference. If you wrap a bunch of ints to a function
that you are using you can save a lot of allocation cycles.

Another good change that you can make would be an object pool. There is a
really good and fast implementation in JMonkeyEngine/LWJGL. They have a low
level, thread happy, object pool.

~~~
polyfractal
Just a note, G1 probably still isn't ready for rock-solid production usage.
E.g. bugs like this are still cropping up:
[https://bugs.openjdk.java.net/browse/JDK-8148175](https://bugs.openjdk.java.net/browse/JDK-8148175)

That's a pretty scary bug. Who knows how stuff like that will trash your data
if you aren't properly checksumming everything.

CMS and other GCs have the advantage of years of bug-squashing and tuning. G1
is exciting, but I wouldn't personally use it on anything important for quite
some time.

~~~
gravypod
That bug does not seem to affect Java 7, it only mentions 8u80 and 9. Still
very bad though.

~~~
polyfractal
Yep, just more recent versions, and it's been fixed too. But if you watch the
bugs that keep popping up for G1 you see stuff like this fairly regularly.

Of course, that's a totally unfair comparison: CMS has had a decade of bug
squashing...I'm sure it had equally scary bugs when it was new. But that's the
point. Don't use new, shiny GC's because they are still squishing bugs :)

(Sorry, preaching to the choir, I just get frustrated by everyone claiming G1
will solve all their problems without investigating potential downsides)

~~~
needusername
Actually CMS is only five years older than G1. The first published papers for
G1 are from 2004. But yes, G1 has a long history of scary bugs. There was a
time a few years ago where practically every week a crashing bug was fixed.

G1 addresses certain problem areas of CMS and replaces them with others.
Honestly I hope in ten years from now we have better choices in HotSpot than
CMS or G1 but right now it doesn't look like it (if you don't count Shenandoah
which has other issues).

Having said that, I have recently seen G1 performing exceptionally well in
production: 120ms GC pauses with a 120 MB/s sustained allocation rate with
basically default settings (apart from GC logging).

------
stevesun21
You remind me of the time I worked on a biology project in 2007. I developed
a Java program to analyze DNA sequences; each run could easily handle a DNA
sequence file of over 3GB without any issues. Just out of curiosity about how
fast the process could be, I rewrote the program in C. The result was that the
C version was about 3 times faster than the Java one, but developing the C
version cost me about a whole week (2 days for the Java one).

~~~
gozur88
That's why in practice Java programs tend to be faster if the application is
nontrivial. By the time you've finished the C version you would have completed
the initial version plus a few performance tuning cycles had you written it in
Java.

------
sdesol
I actually wrote a blog on how I keep an eye on JVM garbage collecting at

[http://gitsense.github.io/blog/realtime-process-
monitoring.h...](http://gitsense.github.io/blog/realtime-process-
monitoring.html)

My indexers are designed to automatically shutdown, if they are spending more
than 30% of their time doing garbage collecting, in the last 10 minutes. If
they shutdown on purpose, my background Perl script will restart them.

However, if they shutdown X number of times in a row, my Perl script won't
restart them. Multiple consecutive shutdowns, usually means I'm pushing the
system too hard and I'll need to tweak my indexers thread settings.
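
A minimal sketch of how such a check could be computed inside the JVM, using the standard `java.lang.management` beans. The class `GcWatchdog`, its method names, and the way samples are taken are assumptions for illustration and are not taken from the blog post; only the 30% threshold comes from the description above.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcWatchdog {
    /** Total accumulated GC time in milliseconds, summed over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();  // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Fraction of wall-clock time spent in GC between two samples. */
    static double gcFraction(long gcStartMs, long gcEndMs, long wallStartMs, long wallEndMs) {
        long wall = wallEndMs - wallStartMs;
        return wall <= 0 ? 0.0 : (double) (gcEndMs - gcStartMs) / wall;
    }

    /** The 30% threshold from the comment above. */
    static boolean shouldShutDown(double fraction) {
        return fraction > 0.30;
    }

    public static void main(String[] args) throws InterruptedException {
        long gcStart = totalGcMillis();
        long wallStart = System.currentTimeMillis();
        Thread.sleep(100);  // a real watchdog would sample over a 10 minute window
        double fraction = gcFraction(gcStart, totalGcMillis(), wallStart, System.currentTimeMillis());
        System.out.println("shut down: " + shouldShutDown(fraction));
    }
}
```

With several collector threads, or with collectors that report concurrent phases, the summed time is only an approximation of how long the application was actually stalled.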

------
matt2000
This is probably the wrong title for this article. In the article she gets
right up to the available memory limit and keeps trying to allocate more, so
the VM is forced to work increasingly hard to find the memory requested. I
wouldn't call that slow GC, just trying to use too much memory.

------
fpoling
The old rule of thumb is that with GC one needs at least twice as much memory as
the max live heap. With little free space GC disables a lot of possible
optimizations that bigger free space allows. Another Java-specific rule is
that with heaps over 1-2GB one must think how to make data structures GC-
friendly or how to split the application into separated processes. I guess the
example program violated both of these.

------
rootlocus
Here is a pretty comprehensive article on the Java garbage collector:
[https://plumbr.eu/handbook/garbage-collection-in-java](https://plumbr.eu/handbook/garbage-collection-in-java)
that has helped me a lot.

It touches the basic aspects of garbage collecting, and dives into the
different kinds of GC available for java at this time.

------
pjmlp
Correction, a specific GC implementation, of a specific Java implementation
can be really slow.

~~~
rootlocus
On a specific, memory-leaking, use case.

------
stevesun21
You can write Java code to skip GC and manage memory by yourself. Like C++
[http://www.mkyong.com/java/java-write-directly-to-memory/](http://www.mkyong.com/java/java-write-directly-to-memory/)
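
As a rough illustration of keeping data outside the garbage-collected heap, the sketch below stores `long` values in a direct `ByteBuffer`. This swaps in `ByteBuffer.allocateDirect` as the technique and is not necessarily what the linked article does; the class name `OffHeapLongArray` is an assumption.

```java
import java.nio.ByteBuffer;

final class OffHeapLongArray {
    private final ByteBuffer buffer;  // backing storage lives outside the Java heap

    OffHeapLongArray(int length) {
        buffer = ByteBuffer.allocateDirect(Math.multiplyExact(length, Long.BYTES));
    }

    void set(int index, long value) {
        buffer.putLong(index * Long.BYTES, value);   // absolute write, no objects created
    }

    long get(int index) {
        return buffer.getLong(index * Long.BYTES);   // absolute read
    }

    int length() {
        return buffer.capacity() / Long.BYTES;
    }

    public static void main(String[] args) {
        OffHeapLongArray values = new OffHeapLongArray(1_000);
        values.set(7, 42L);
        System.out.println(values.get(7));
    }
}
```

The GC never has to scan the stored values, but the native memory is still released only once the small buffer object itself is collected, so this is not manual `free` in the C sense.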

~~~
fauigerzigerk
It's nothing like C++.

In C++ you get automatically managed memory on the stack.

You get RAII and smart pointers to help with heap allocations/deallocations.

Most importantly, you get to _use_ the system's malloc implementation whilst
you have to _implement_ your own malloc with the Java off-heap solution you
suggest.

~~~
stevesun21
I'm not a C++ expert, RAII sounds to me like a synchronous GC in C++ --
resource auto alloc and dealloc by its lifetime
([https://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initia...](https://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization)),
correct me if I am wrong.

Java, by contrast, has an asynchronous GC -- it uses different generations
and does not release an object right away just because it went out of scope.

Anyway, when I said 'like C++', I meant the way C++ lets you manage memory
manually and directly. From what you mentioned, I should have said 'like C'.
Thanks.

~~~
catnaroek
RAII is an automatic memory (more generally, resource) management scheme, but
it isn't garbage collection. A garbage collector is a _runtime_ component that
reclaims unused storage. What RAII does is, at _compile time_ , insert the
appropriate resource-freeing calls at the right places.
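
Java has no RAII, but try-with-resources is probably the closest analogue to what is described here: the compiler inserts the `close()` calls at scope exit, in reverse order of acquisition, with no runtime collector involved. The class `ScopedResource` below is a hypothetical example, and unlike RAII it manages resources only; the objects' memory is still reclaimed by the GC.

```java
import java.util.ArrayList;
import java.util.List;

final class ScopedResource implements AutoCloseable {
    private final String name;
    private final List<String> log;

    ScopedResource(String name, List<String> log) {
        this.name = name;
        this.log = log;
        log.add("open " + name);
    }

    @Override
    public void close() {
        log.add("close " + name);
    }

    /** Opens two resources, does some work, and returns the order of events. */
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        try (ScopedResource a = new ScopedResource("a", log);
             ScopedResource b = new ScopedResource("b", log)) {
            log.add("work");
        }   // the compiler-generated code closes b, then a, right here
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```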

------
jkot
In my experience there are usually one or two collections which consume most
of the memory. It is easy to solve 90% of cases with a few simple optimizations.

------
PavlovsCat
Same for Javascript, at least when aiming for perfect 60FPS, the only good
garbage collection is the one that doesn't occur after init, and maybe when
switching maps or whatnot. Even the 0.09ms in the article is way too much and
means a skipped frame. Maybe a way to think of it is to treat GC like optional
automatic destructors, which should get called (more or less) when _you_ want
them to, not as something you just "don't have to think about" (if you run
something in a loop and need it to be silk smooth, that is).

Incidentally, this is great:
[https://www.mozilla.org/en-US/firefox/46.0beta/releasenotes/](https://www.mozilla.org/en-US/firefox/46.0beta/releasenotes/)

> Allocation and garbage collection pause profiling in the performance panel

~~~
ygra
> Even the 0.09ms in the article is way too much and means a skipped frame.

He wrote 0.09 _seconds_. In any case, if you're generating megabytes to
gigabytes of garbage _in a single frame_ you probably deserve a GC pause.

~~~
JoeAltmaier
It's very hard to control garbage. So many operations have unavoidable garbage
side effects, especially involving immutable arguments.

~~~
_ph_
A good GCed language should give you good control over garbage generation.
If you do not have this control, the language is to blame, not the concept of
GC. But indeed, I think one of the strongest disadvantages of Java is that it
does not give you much control over garbage generation.

~~~
whateveracct
What languages give you more control over GC than Java? My only experience is
Java and Go. Go gives you literally one knob, whereas Java allows you to tune
MANY factors on top of picking a collector.

~~~
_ph_
I was talking about garbage _generation_. For best performance, you want to
control heap allocation and the memory layout. Java gives you no controls
there. All objects in Java are heap allocated, you can reference them only by
pointer. In Go you can have value-types. An array in Go can be a block of
structures, while in Java it would have pointers to separately allocated
objects. Also, if you have objects as member variables in a Java object, they
cannot be part of that object, but need to be separate objects on the heap.
This limits quite a few optimizations.
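
The layout difference can be sketched in Java itself. A `Point[]` of N elements is N + 1 heap objects (the array plus one object per point), while the usual workaround, shown here under the assumed names `Point` and `PointsFlat`, keeps each field in its own primitive array. That is still not a single block as in Go, but it is one contiguous block per field and a fixed three objects regardless of N.

```java
final class Point {
    final double x;
    final double y;

    Point(double x, double y) {
        this.x = x;
        this.y = y;
    }
}

final class PointsFlat {
    private final double[] xs;  // all x values, contiguous
    private final double[] ys;  // all y values, contiguous

    PointsFlat(int size) {
        xs = new double[size];
        ys = new double[size];
    }

    void set(int i, double x, double y) {
        xs[i] = x;
        ys[i] = y;
    }

    /** Walks two primitive arrays; no pointer is followed per element. */
    double sumOfSquares() {
        double sum = 0;
        for (int i = 0; i < xs.length; i++) {
            sum += xs[i] * xs[i] + ys[i] * ys[i];
        }
        return sum;
    }

    /** Same computation over an array of references to separate objects. */
    static double sumOfSquares(Point[] points) {
        double sum = 0;
        for (Point p : points) {
            sum += p.x * p.x + p.y * p.y;
        }
        return sum;
    }

    public static void main(String[] args) {
        PointsFlat flat = new PointsFlat(2);
        flat.set(0, 3, 4);
        flat.set(1, 1, 0);
        Point[] boxed = { new Point(3, 4), new Point(1, 0) };
        System.out.println(flat.sumOfSquares() + " " + sumOfSquares(boxed));
    }
}
```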

------
karianna
Disclaimer: I'm the CEO of jClarity who produces Censum.

For those who are looking to read the arcane output of a Java GC log, you can
grab a 7-day free trial of Censum
([https://www.jclarity.com/product/censum-free-trial/](https://www.jclarity.com/product/censum-free-trial/)) - it parses
GC logs (Java 6-9 all collectors) and gives you a host of analytics and graphs
to help you figure out what's going on. We've also got blog posts on GC
([https://www.jclarity.com/blog](https://www.jclarity.com/blog)) and our
slideshare
[http://www.slideshare.net/jclarity](http://www.slideshare.net/jclarity)

~~~
sjmaple
Disclaimer: I don't work for jClarity who produces Censum.

This is a super valuable tool, which I recommend people take a look at should
they have the misfortune to need to read a Java GC log.

Great work, jClarity!

------
jstimpfle
Plugging my own follow-up:

    
    
        https://news.ycombinator.com/item?id=11555129

~~~
_ph_
A very good post, but like much discussion here it turns into a comparison of
Java vs. C++. It would be good to also compare to Go, which is GCed, but gives
you all the value types of C++. The value types of Go are probably the reason
that they have such good GC performance (version 1.6+) without all the
complexity of the hotspot GC.

------
geodel
I think at this point it is clear those who care about memory usage and
deterministic performance will use C/C++/Rust(maybe). Saying Java is not up
there will bring Java supporters arguing endlessly how Java is so superior to
anything else in market, how Java's GC is state of the art. It would not
matter to them how much expert level tuning it takes to make it work. Yeah and
then there is Azul zing: a heavily over provisioned system on top of already
over provisioned Java systems to have better GC compared to Oracle/OpenJDK GC.

Working in Java for 10 years make me realize that so many solutions Java/ JVM
ecosystem provides are to the problems that Java ecosystem created in first
place.

~~~
pjmlp
> Working in Java for 10 years make me realize that so many solutions Java/
> JVM ecosystem provides are to the problems that Java ecosystem created in
> first place.

There are lots of companies making a living selling tools to track down and
fix memory corruption issues in C and C++.

Java is not alone.

------
based2
[https://www.azul.com/products/zing/pgc/](https://www.azul.com/products/zing/pgc/)

------
mbfg
This article would have been much more informative had we seen the actual
program. Usually it doesn't much matter how big a file you read from your
program assuming you are reading for aggregation purposes. If you are in need
of individual records, then that is what databases are for. So I smell a large
red herring. Is GC a problem? Sure, it can be. Usually tho when it is, it's
more likely to be my problem, not the JVM's.

------
Roboprog
I forgot the exact number, but I remember reading somewhere that for a garbage
collector to work well you need to give it something like 3 or 4 times more
space than you will actually be using at maximum allocation - not counting
transient stuff.

This allows it to shuffle things around more effectively while it is cleaning
up, doing things like copying into a compacted area.

------
rmprescott
Two simple suggestions:

The author mentioned reading in files with x "number of lines". If they are
then parsing the lines into some structured format, there are likely many
opportunities to look for low cardinality aspects and to reduce object
tenuring by pooling strings using either String.intern or a hashset.

They should also consider increasing the eden size.
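
A minimal sketch of the hashset-style pooling mentioned above, assuming the parsed lines contain fields with few distinct values. The class `StringPool` is hypothetical; it maps each string to the first equal instance seen, so duplicates become garbage immediately instead of being tenured.

```java
import java.util.HashMap;
import java.util.Map;

final class StringPool {
    private final Map<String, String> canonical = new HashMap<>();

    /** Returns the first instance seen that equals `s`, remembering `s` if it is new. */
    String intern(String s) {
        String existing = canonical.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    int size() {
        return canonical.size();
    }

    public static void main(String[] args) {
        StringPool pool = new StringPool();
        String first = pool.intern(new String("GET"));
        String second = pool.intern(new String("GET"));
        System.out.println(first == second);   // same instance after pooling
    }
}
```

Compared with `String.intern`, a private pool like this can be dropped as a whole once parsing is done; it is not thread-safe as written.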

~~~
beachstartup
it's been a really, really, long time since i've read something about
computers and been completely and utterly baffled by what i saw.

thanks for making me feel young again.

------
merb
Do you really need to read all the 8 million lines into memory? Wouldn't it be
better to have some kind of streaming and read up to, do some work, read the
next??
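
A sketch of what such streaming might look like, assuming the work per line is a simple aggregation (here, summing one integer per line). The class `StreamingSum` is a made-up example; only the current line is reachable at any time, so the live heap stays small however long the input is.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class StreamingSum {
    /** Sums the integer found on each non-blank line of `source`. */
    static long sumLines(Reader source) {
        long total = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    total += Long.parseLong(trimmed);
                }
                // `line` becomes garbage here, before the next one is read
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumLines(new StringReader("1\n2\n\n39\n")));
    }
}
```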

~~~
jstimpfle
Why not just assume so? Aren't there valid reasons to keep all in memory?

------
askyourmother
Garbage collection can cost money to a firm. Working a few years ago in a
large investment firm, lots of the code used to trade via algos, make markets,
index arb, was all written in C++. Deterministic, performant, reliable - it
just worked as planned and expected.

New tech lead comes in, swaggers about, declares all the street now uses Java
for their trading code, we should too. I got the desk heads to listen and be
wise to the impending issues, and they told him fine - but if the new Java
based code had direct impact on PnL, then his budget, ultimately, he would pay
for it. Cocksure, he agreed.

Despite throwing bucket loads of Java devs at it, spending fortunes on
"tuning" consultants, performance suffered, GC pauses did hit under critical
trading conditions, and eventually he was exposed and kicked out.

The C++ code came back out of retirement, was updated for C++11/14, and still
serves them well to this day.

~~~
nvarsj
Quite a few of the top HFT firms use Java, almost exclusively.

The trick is to segment your critical path code (whatever is executing the
actual trades and is highly susceptible to delays) from any business logic.
Then you can focus on making critical path components fast - disable GC
completely, keep object allocations way down, audit every line of code, etc.
With such a setup you can even do better than typical C++ because you avoid
virtually all object allocation/deallocation costs that C++ has. Java object
allocation is dirt cheap, it just hurts when you GC, which you can basically
avoid or schedule for out of hours. If you want better I would not bother with
C++ at all personally, but use C / assembly or even look at FPGA type setups.

~~~
na85
If you're disabling GC, why use Java at all?

That's the point of using Java (or any interpreted language), isn't it?
Avoiding manual memory management?

If you're going to go through all that horrid FactoryFactory Factory = new
FactoryFactory(); bullshit, might as well get the benefits of the GC.

And if you don't want the GC, why not write your code in something else?

~~~
recursive
> If you're going to go through all that horrid FactoryFactory Factory = new
> FactoryFactory();

I get the impression that your only exposure to java is through jokes and
blogs from dynamic typing evangelists.

~~~
nv-vn
How would that show any connection to dynamic typing? I imagine that the GP is
coming from the perspective of C based on the fact that they're criticizing
the use of garbage collection.

------
stevenh
I had to solve stop-the-world pauses that were causing occasional intolerable
lag spikes in a multiplayer game. The solution was to switch to Java's G1 GC,
and also to deadpool and reuse every object and byte array I possibly could.
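
A minimal sketch of the kind of byte array pooling described here. The class `BufferPool` and its fixed buffer size are assumptions made for illustration, not the code used in that game, and the pool is not thread-safe as written.

```java
import java.util.ArrayDeque;

final class BufferPool {
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();
    private final int bufferSize;

    BufferPool(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    /** Hands out a pooled buffer if one is available, otherwise allocates a new one. */
    byte[] acquire() {
        byte[] buffer = free.pollFirst();
        return buffer != null ? buffer : new byte[bufferSize];
    }

    /** Returns a buffer to the pool; the caller must not use it afterwards. */
    void release(byte[] buffer) {
        if (buffer.length == bufferSize) {   // ignore foreign buffers
            free.addFirst(buffer);
        }
    }

    public static void main(String[] args) {
        BufferPool pool = new BufferPool(4096);
        byte[] first = pool.acquire();
        pool.release(first);
        System.out.println(pool.acquire() == first);  // the same array is reused
    }
}
```

Buffers come back with their old contents, so callers have to track how many bytes are valid or clear the array themselves.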

------
smegel
I wonder if Go's GC would exhibit pathological behavior in this case.

~~~
Jabbles
Go's current garbage collector isn't generational. And it aims to complete its
stop-the-world phase in < 10ms, so even the young generation collection which
the author describes as "fast" would be unacceptable (90ms).

------
naranha
Tldr: if you allocate nearly all heap memory, GC performance is bad. Who would
have thought. What an insightful post.

~~~
loevborg
The article may not be news for you, but it doesn't pretend to be a
comprehensive guide: it introduces common issues and explains their relation
through an experience report. We do need to talk more about garbage
collection. Many developers do not understand very well how GC works, and
there's a lack of high-quality discussion online. The article and the
discussion it's sparking helps demystify the topic.

~~~
naranha
I agree. It has a very click-baity title though. From my experience GC is
usually very fast, except when the program runs out of memory or over
allocates all the time due to bad programming.

------
needusername
So yeah, parallel old has high latencies. In other news water is wet and the
sky is blue. If you care about GC latencies don't use parallel old.

------
wmccullough
In other news, the sky is blue.

~~~
dang
Comments like this break the HN guidelines. Please post civilly and
substantively, or not at all.

~~~
recivorecivo
Can you make it part of the guidelines to not use ego? Without ego, comments
contain substance and are civil.

~~~
dang
That might be setting the bar a little high.

------
known
Containers will slow down your app

