
Improving G1 out-of-the-box performance - arunc
https://kstefanj.github.io/2020/04/16/g1-ootb-performance.html
======
joosters
Is it just me, or does all garbage collector development seem to go through an
endless loop of:

* We've optimised this release for throughput

* We've optimised this release for latency

* We've optimised this release for minimising memory usage

...and each release causes a regression in the stuff they weren't optimising
for, leading to another future release picking one of the other three targets.

Java and Go seem to have gone round this cycle more than once...

~~~
nvarsj
I don't really see that cycle. I think Hotspot's latency and memory usage have
always been atrocious, and they have slowly been working to improve that with
ParallelGC->CMS->G1.

I don't think things are as black and white as Hotspot people tend to claim.
You can have your cake and eat it - acceptable latency (<10 ms, or even
<1 ms) AND good throughput/memory usage is possible without excessive manual
tuning. golang is an example of this, as are GCs in other language
runtimes. In every other managed language I've worked in, I never even think
about the GC 99% of the time. Whereas with Hotspot you have to tune and think
about it 100% of the time. Hopefully it will get there one day and make
working on JVM projects less painful.

~~~
pron
Go's GC's throughput is significantly lower than that of all OpenJDK's GCs,
and its latency numbers are misleading because they don't account for
throttling. To give you an example, a GC that works as follows -- allocate by
pointer bumping from a thread-local buffer, and when it runs out, freeze the
thread forever -- would count as having zero latency according to Go's
metrics, because this infinite pause is per-thread rather than stop-the-world;
of course, the throughput will eventually drop to zero, too, but they don't
report throughput. Why, then, do you see Go applications performing more-or-
less OK? Because of two things: they allocate fewer objects, and they just run
significantly more slowly than Java, except this slowdown is paced. In terms
of algorithm, Go's GC is pretty much a simplified version (no young
generation) of OpenJDK's now-defunct CMS, plus throttling. G1 is a generation
beyond that, and ZGC is two.

~~~
Scaevolus
Generational garbage collection doesn't help much in a language with value
types, stack allocation, and decent escape analysis.

“It isn't that the generational hypothesis isn't true for Go, it's just that
the young objects live and die young on the stack. The result is that
generational collection is much less effective than you might find in other
managed runtime languages”
[https://twitter.com/davecheney/status/1019430967054819328](https://twitter.com/davecheney/status/1019430967054819328)

Go's GC improvements are revealed in P99.9 latencies of servers over time, not
just in the raw numbers of how long it's stopping the world.

~~~
pron
That doesn't change the fact that Go's GC is a simplified CMS. It is true that
Go does have an easier life -- Java does do escape analysis and allocates on
the stack, but it doesn't have value types just yet, and so the allocation
rates are higher, which is why Java is not _drastically_ faster. I.e. a cruder
GC that's similar to OpenJDK's GC from two generations ago works OK for Go.
Despite Go having an easier life, Java 14 performs noticeably better than Go,
partly due to having a better compiler, but also because its GCs are just
better.

------
blattimwind
> The effect is that out of the box a 1 MB region size will be used, while for
> the fixed heap case the region size will be 2 MB. It might sound like a
> small difference, but the benchmark uses a significant amount of large
> objects that need special treatment when using 1 MB regions. This special
> treatment leads to a lot of memory that can’t be used, which in turn leads
> to full collections and a poor overall experience.

This means the same problem as before can still be observed; you'll just need
slightly bigger objects to get there.
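As a rough sketch of why the region size matters here: G1 gives an allocation the "humongous" treatment once it reaches half the region size, so the same object can cross that threshold under 1 MB regions but not under 2 MB ones. The class and method names below are made up for illustration; only the half-region rule comes from G1's documentation.

```java
// Illustrative sketch (not JDK code): G1 classifies an allocation as
// "humongous" when its size is at least half the region size, so whether
// an object is humongous depends on the region size chosen at startup.
public class HumongousThreshold {

    // True if an object of `bytes` would get humongous treatment
    // with regions of `regionBytes`.
    static boolean isHumongous(long bytes, long regionBytes) {
        return bytes >= regionBytes / 2;
    }

    public static void main(String[] args) {
        long obj = 600 * 1024; // a ~600 KB array payload

        // Humongous with 1 MB regions (threshold 512 KB)...
        System.out.println(isHumongous(obj, 1024 * 1024));
        // ...but a regular allocation with 2 MB regions (threshold 1 MB).
        System.out.println(isHumongous(obj, 2 * 1024 * 1024));
    }
}
```

So with 1 MB regions the benchmark's large objects each burn most of a dedicated region, while 2 MB regions let them be allocated normally.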

~~~
winkeltripel
That's what I thought too. Perhaps this should be profiled and adjusted on
the fly (probably a massive undertaking)?

~~~
ashtonkem
The Java way is more about selecting a reasonable default, and then exposing
a tuning parameter for those whose use case falls well outside what the
default was designed for.
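For this particular case the tuning parameter already exists: the region size can be pinned explicitly instead of being left to heap-size ergonomics. The flag is real; the sizing below is illustrative, not a recommendation - it should come from profiling your own allocation pattern.

```sh
# Pin G1's region size so humongous-object behavior doesn't depend on
# heap-size ergonomics. The value must be a power of two, typically
# between 1 MB and 32 MB.
java -XX:+UseG1GC -XX:G1HeapRegionSize=2m -Xms4g -Xmx4g -jar app.jar
```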

------
bitcharmer
Great to see progress in this space. I remember trying out G1 for the first
time years ago and almost immediately noping out of it as the performance was
terrible.

Working in the low-latency space, the angle for me is much more about GC
latency (at microsecond scales). I've never seen anyone do a thorough
evaluation of GC pause times across different implementations.

Does anyone here have any interesting sources to share?

~~~
noelwelsh
I haven't spent much time in the world of JVM tuning, but if latency is your
main concern the relatively new ZGC should be your jam:
[https://wiki.openjdk.java.net/display/zgc/Main](https://wiki.openjdk.java.net/display/zgc/Main)
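One caveat if you want to try it: before JDK 15, ZGC sits behind an experimental flag and has to be unlocked first. A minimal invocation (the heap size is just an example):

```sh
# JDK 11-14: ZGC is experimental and must be unlocked explicitly.
java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xmx16g -jar app.jar

# From JDK 15 on, -XX:+UseZGC alone is enough.
```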

~~~
pron
Yep. ZGC will soon offer _worst-case_ pauses of <1 ms (it's around 2-3 ms
now).

~~~
maxpert
Is ZGC production-ready? Especially with JDK 11?

~~~
pron
It will be declared non-experimental in 15, but people are running it in
production on 14. GCs, and performance in general, tend to improve quite
significantly between releases.

~~~
MaxBarraclough
Great news. How about the rival Shenandoah GC?

~~~
pron
They're both non-experimental in 15:
[https://openjdk.java.net/projects/jdk/15/](https://openjdk.java.net/projects/jdk/15/)

------
genpfault
Not that[1] G1.

[1]:
[https://en.wikipedia.org/wiki/HTC_Dream](https://en.wikipedia.org/wiki/HTC_Dream)

