
JVM Garbage Collectors Benchmarks Report - Sindisil
https://ionutbalosin.com/2019/12/jvm-garbage-collectors-benchmarks-report-19-12/
======
bootcat
Why isn't the benchmark code public? For people to be convinced and adopt the
results, they should be able to run the benchmark themselves and see a similar
pattern in the numbers.

~~~
tom_mellior
Some theories:

(a) laziness

(b) wanting to avoid discussions about whether the exact way something was
written was fair to some GC or another

(c) wanting to avoid "can't reproduce these exact numbers on a different CPU
or different JDK version" discussions

(d) the code doesn't matter since these microbenchmarks are not the actual
application you are interested in

Seriously, if you are interested enough in this that you would be willing to
run the benchmarks, you presumably have some JVM workload you care about. (Or
you could just grab a standard benchmark suite.) If you are interested enough,
you can run _your workload_ with the different GC flags and see what they do
_for you_ , in _your context_. Being able to reproduce the author's
microbenchmark numbers on your laptop would not tell you anything about how
your workload of interest runs on the big servers (or whatever).

------
vosper
Has anyone experienced a significant performance improvement just by switching
GCs? We have an Elasticsearch cluster with G1GC and I’ve been wondering if
it’s worth the time to replace a small number of nodes with different GCs. We
could swap out 10% of the G1GC nodes for ZGC nodes, for example.

~~~
mdasen
I know that the Shenandoah people have tested with Elasticsearch and seen
great improvements.

[https://d3i71xaburhd42.cloudfront.net/605efbc0fed86899dcbf87...](https://d3i71xaburhd42.cloudfront.net/605efbc0fed86899dcbf8733884a9256493acb25/8-Table3-1.png)

Christine Flood is pretty great and she has some wonderful talks on YouTube if
you're interested in how Shenandoah works. The image above comes from one of
her papers on GC ([https://www.semanticscholar.org/paper/Shenandoah%3A-An-
open-...](https://www.semanticscholar.org/paper/Shenandoah%3A-An-open-source-
concurrent-compacting-Flood-Kennke/605efbc0fed86899dcbf8733884a9256493acb250))
and shows order-of-magnitude improvements in pause times with average pause
times 1/10th that of G1 and total pause times 1/27th that of G1.

Shenandoah and ZGC are also made for larger heaps, and datastores often have
larger heaps.
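
If you do try a canary node, switching collectors is mostly a matter of JVM
flags. These invocations are illustrative, not tested against your setup: on
JDK 11-14 both ZGC and Shenandoah still sit behind the experimental-options
gate, and Shenandoah isn't shipped in Oracle's builds.

```shell
# Canary a node on ZGC (experimental before JDK 15)
ES_JAVA_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseZGC" ./bin/elasticsearch

# ...or on Shenandoah
ES_JAVA_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC" ./bin/elasticsearch

# Sanity-check which collector actually engaged
java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xlog:gc -version
```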

~~~
vosper
Thanks for that, I’ll definitely check out the paper! I don’t know much about
the JVM, or GCs, but it’s always interesting to learn.

------
ryanobjc
A good start, but it's really a brief survey and I wouldn't call it
actionable.

The greatest challenge for Java GC is when heap sizes are big and data is
turned over more slowly than in a purely transactional workload. In other
words: allocate, retain for a while, then free, with a 32GB heap. That's where
GC tuning becomes a dark art.

The bottom line is that rapid allocation on its own is easy enough to keep up
with. Mix in hundreds of parallel, live app threads, large memory structures
that live for minutes or hours, and transactional GC turnover on top of that,
and it gets hard.

I think Go does so well because if you want a cache in Go, you get smart and
use the C escape route.

~~~
grandinj
For large data structures I sometimes move them off the Java heap using the
NIO/ByteBuffer APIs, which keeps my normal working set smaller. Only works for
certain use-cases though.
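
A minimal sketch of what I mean, using nothing but the plain JDK APIs (sizes
and names are illustrative):

```java
import java.nio.ByteBuffer;

public class OffHeapSketch {
    public static void main(String[] args) {
        // 8 MB allocated outside the Java heap: the GC only sees the tiny
        // ByteBuffer wrapper object, not the 8 MB of payload behind it.
        ByteBuffer buf = ByteBuffer.allocateDirect(8 * 1024 * 1024);

        buf.putLong(0, 42L);                 // absolute write at byte offset 0
        System.out.println(buf.getLong(0));  // read it back
        System.out.println(buf.isDirect());  // confirms off-heap backing
    }
}
```

The trade-off is that you're doing manual serialization into the buffer, which
is part of why it only works for certain use-cases.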

~~~
kjeetgill
Java Unsafe can be great for moving things off heap too. And this is also the
kinda place where pooling in Java can help more than it hurts. But all these
solutions have a few sharp edges and aren't universally helpful.

If/when Valhalla lands it'll help soften a few of the rough edges here.
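
For the Unsafe route, the classic sketch looks something like this (the
theUnsafe reflection trick; it still works on current JDKs via the
jdk.unsupported module, but it's exactly where the sharp edges live: skip
freeMemory and you leak, read past your allocation and you can crash the VM):

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class UnsafeSketch {
    public static void main(String[] args) throws Exception {
        // Unsafe has no public constructor; grab the singleton by reflection.
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);

        long addr = unsafe.allocateMemory(1024); // raw memory, invisible to the GC
        try {
            unsafe.putInt(addr, 123);
            System.out.println(unsafe.getInt(addr));
        } finally {
            unsafe.freeMemory(addr); // no GC safety net: skip this and you leak
        }
    }
}
```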

------
sansnomme
What's the performance of Go's GC like against the JVM's? Especially since
both languages are about the same abstraction level and roughly the same speed
(3x slower than C for general unoptimized code excluding startup time).

~~~
hyperpape
I don't know of any direct comparisons of JVM and Go GC efficiency.

However, the biggest difference is that there's only one Go GC, while there
are several supported ones for Hotspot. With the exception of CMS, which is
deprecated, those collectors occupy different niches, meaning there are times
you'd use each of them.[0]

The Go collector is optimized for relative simplicity and low latency, and
currently tolerates significant overhead to hit that target.

[0] Maybe Shenandoah and ZGC are close enough that they're just direct
competitors? Perhaps they're even designed to supersede G1? I'm not quite sure
about that, but even so, that would leave at least 2-3 options, depending on
whether you count Serial/Epsilon as serious options.

~~~
HALtheWise
It's fundamentally difficult to compare them, because the features of the
language itself change the workload the GC faces in a typical program. For
example, Java boxes almost all values, so can't store very much on the stack.
That means that conventional Java programs (typically) generate many more
short-lived heap allocations than Go programs.
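
To make that concrete, here's a toy illustration (the exact allocation
behavior depends on JIT escape analysis and the small-Integer cache, so treat
it as a sketch):

```java
import java.util.ArrayList;
import java.util.List;

public class BoxingSketch {
    public static void main(String[] args) {
        // int[]: a single heap object with the values stored inline,
        // roughly what a Go []int gives you.
        int[] primitives = {1, 2, 3};

        // List<Integer>: every element is autoboxed into its own heap
        // object (values outside the -128..127 cache allocate each time).
        List<Integer> boxed = new ArrayList<>();
        for (int v : primitives) {
            boxed.add(v * 1000); // autoboxing allocates an Integer per add
        }
        System.out.println(boxed.get(1));
    }
}
```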

For more, see the excellent blog post here:
[https://blog.golang.org/ismmkeynote](https://blog.golang.org/ismmkeynote)

~~~
hyperpape
I haven't put real thought into this, but it seems that in principle, given
enough understanding of the runtimes, you can create good benchmarks of the
collector. You would have to find patterns that force it to allocate/collect
similar amounts of data, but that's doable.

However, to your point, insofar as the languages tend to allocate more or less
data on the heap, that might change which benchmarks look good, the same way
that you'll care about different automotive benchmarks depending on whether
you want to tow a trailer or race.

------
hyperpape
With the BurstHeapMemoryBenchmark, I wonder whether the effect is driven by
the concurrency of G1/Shenandoah/ZGC rather than by greater efficiency.

If the code doing allocation is single-threaded, then the parallel/serial GCs
would have stop-the-world pauses, while the other collectors let the
application thread keep allocating while the GC runs.

That's a real effect. It means that you can sometimes improve the throughput
of a process with a lot of sequential work by giving it extra CPUs and using a
concurrent collector. However, that's something of a niche case; more often,
you're choosing between throughput and lower pause times.

~~~
kjeetgill
I think the rock-paper-scissors dance between throughput, latency, and
efficiency that a GC plays is a pretty fascinating topic. It really brings out
our biases from the kinds of work we do.

If a service reduces its latency, that alone increases the throughput of the
overall system. The CPU/memory efficiency of a single process is often much
less important, because there's usually headroom, and a GC can help put that
underutilized CPU/memory to work.

------
twic
Do i read correctly that in the ConstantHeapMemoryOccupancyBenchmark, ZGC
delivers two operations per second? Two? Per second?!

The test is just:

    
    
      void test(Blackhole blackhole) {
          blackhole.consume(new byte[_8_MB]);
      }
    

So it's not like each "operation" is actually a large number of elementary
operations, either.

~~~
the8472
No, the test is more than that. The preallocated objects matter because they
are pathological cases for certain GC operations. Large allocations compound
this further.

I guess CMS manages to perform fine here because it can flexibly resize the
young generation and isn't region-based, while G1 has special optimizations
for what it calls humongous allocations.

It's a test of how collectors perform under two combined deviations from the
assumptions they're optimized for (i.e. that object graphs have a high
branching factor that allows parallelization, and that short-lived objects are
small on average).
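
Since the actual benchmark code isn't public, here's a rough guess at the
shape of such a test, standalone rather than JMH, with sizes picked
arbitrarily: a large retained set the collector must keep tracing, plus a big
short-lived allocation on top (humongous, in G1 terms).

```java
import java.util.ArrayList;
import java.util.List;

public class ConstantOccupancySketch {
    static final int MB = 1024 * 1024;

    public static void main(String[] args) {
        // Long-lived retained set: pins a chunk of the heap so the
        // collector has real work to do on every cycle.
        List<byte[]> retained = new ArrayList<>();
        for (int i = 0; i < 32; i++) {
            retained.add(new byte[MB]);
        }

        // The measured "operation": one large, short-lived allocation
        // (8 MB in a single object, a humongous allocation for G1).
        byte[] burst = new byte[8 * MB];

        System.out.println(retained.size() + " MB retained, burst=" + burst.length);
    }
}
```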

------
okr
Does anybody know the author of the article? Where does he get his content
from? Is he making a living as a speaker or is/was he also a coder?

~~~
chrisseaton
> Does anybody know the author of the article?

You know this article is on his website, don't you? His name is the domain
name.

[https://ionutbalosin.com/about-me/](https://ionutbalosin.com/about-me/)

> is/was he also a coder

Seems unlikely anyone would be writing an article about Java GC, including
having coded up benchmarks, without being able to code.

~~~
okr
I am just curious. In my 'likely' world, people doing this kind of work have a
reputation or a visible history. A live Q&A session on the internet could do
wonders.

~~~
chrisseaton
> have a reputation or a visible history

He does - it’s on his website.

~~~
okr
That's not enough for me. I write stuff all day on my own and other websites.
Who vouches for his work? The guy is secretive. One cannot reproduce the
benchmark. Why should I trust his results?

~~~
chrisseaton
> Who vouches for his work?

It's a blog, not a peer-reviewed research paper. The guy is widely known in
the JVM community. I'll vouch for his work if you want.

