
Shenandoah GC in production: experience report - truth_seeker
http://clojure-goes-fast.com/blog/shenandoah-in-production/
======
ignoramous
This is a super nice article with plenty of relevant talks and links. An HTTPS
mirror: [https://outline.com/K96Geb](https://outline.com/K96Geb)

A presentation on Shenandoah by one of the principal developers (focuses on
the how):
[https://youtube.com/watch?v=4d9-FQZZoVA](https://youtube.com/watch?v=4d9-FQZZoVA)

Another presentation (focuses on the how, why, and when) at Devoxx:
[https://www.youtube.com/watch?v=VCeHkcwfF9Q](https://www.youtube.com/watch?v=VCeHkcwfF9Q)

Tangent: Go GC (generational, non-copying, concurrent mark and concurrent
sweep) and the improvements it saw from 1.5 -> 1.8
[https://blog.golang.org/ismmkeynote](https://blog.golang.org/ismmkeynote)

The mutator/pacer scheme, the tri-color mark scheme, and the use of read/write
barriers during the sweep phase are interesting points of contrast between the
two GCs.

~~~
ngrilly
> Tangent: Go GC (generational, non-copying, concurrent mark and concurrent
> sweep)

I think the Go GC is not generational. If I'm not mistaken, the keynote you
linked explains that they tried it and the gains were not there.

~~~
munificent
This makes sense for Go. A big part of the reason the generational hypothesis
is true for most applications is that they allocate short-lived objects inside
function bodies that are no longer needed when the function returns.

In Go, most of those objects actually do get allocated directly on the stack,
or as fields directly inside other objects. So they are never seen by the GC.
This is one of the aspects of the language design that I really admire.

Or, you could look at it as saying that Go _does_ have a generational memory
manager, and the first generation is "on the stack".

~~~
apta
> In Go, most of those objects actually do get allocated directly on the
> stack, or as fields directly inside other objects.

Most Go code I've seen still relies on pointers to objects, so unless its
escape analysis is better than, say, what the JVM or .NET offer, it's not
really that different from them. .NET already has value types, and the JVM
should be getting them soonish. So there really isn't that much difference in
this respect.

~~~
CamouflagedKiwi
There is a vast difference in this aspect: a []SomeStruct makes two
allocations (one for the control structures, one for the data), which might be
on the stack or not. Conversely, an ArrayList<SomeClass> makes a linear number
of allocations; each object in it is separately allocated.

As you say, value types are coming soon, but they don't exist now, so right
now there is a _lot_ of difference (and in practice will be for a long time,
because not everything is going to migrate to value types immediately).
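
The difference can be made concrete with a small sketch (class and field names are invented for illustration): in today's Java, a list of n small objects means roughly n separate heap allocations, while the closest current workaround to Go's `[]SomeStruct` is flattening the fields into parallel primitive arrays.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the allocation-count difference described above. Names are made up.
public class AllocCount {
    static class Point { double x, y; Point(double x, double y) { this.x = x; this.y = y; } }

    public static void main(String[] args) {
        int n = 1000;

        // ArrayList<Point>: one allocation for the list, one for its backing
        // Object[], then one per element -- roughly n + 2 allocations in total.
        List<Point> boxed = new ArrayList<>(n);
        for (int i = 0; i < n; i++) boxed.add(new Point(i, i));

        // The closest Java gets today to Go's []SomeStruct: flatten the fields
        // into primitive arrays -- two allocations total, regardless of n.
        double[] xs = new double[n];
        double[] ys = new double[n];
        for (int i = 0; i < n; i++) { xs[i] = i; ys[i] = i; }

        System.out.println(boxed.size() + " boxed elements, " + xs.length + "/" + ys.length + " flattened");
    }
}
```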

~~~
geodel
And even after value types arrive, it will be years before the library
ecosystem leverages them. Typical Java projects use dozens of third-party
libraries, and those are not going to see much effect in the next 5 years.

~~~
miskin
But most of the value of value types would be in your own code, removing
boilerplate. From the outside, a value type will surely look like just another
class, so libraries and frameworks do not need to know about your value types.

------
mcguire
A quick note: Shenandoah is _not_ generational, according to the article. Most
bog-standard web apps (including REST thingies; not sure why the author calls
those out) do strongly obey the generational hypothesis. For most web apps, in
my experience, if you can tune your GC to serve the vast majority of your
requests from a young generation, your latencies will be good, your
performance will be good, your pauses will be infrequent and short, and plump
unicorns and bunny rabbits will gather in your cubicle to share their
rainbows.

~~~
unlogic
Hi, author here. That's exactly what I was thinking before. But it turns out
that generational GCs have nasty failure modes when things don't go as
expected. E.g., if an upstream experiences its own difficulties and returns
responses more slowly, our service has to keep all the requests in memory
longer, so the heap fills up, and G1 performs a few fruitless young GCs
(without freeing much) and then tenures all those requests to OldGen, and now
you have a big OldGC pause bomb waiting for you.

Non-generational GCs don't have this problem, and it's one of the reasons why
Shenandoah suited us well there.
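
The failure mode can be sketched with a toy simulation (all numbers and names are invented): requests normally die young and are collected for free, but once they outlive a few young collections, the tenuring threshold promotes them wholesale into the old generation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of the tenuring spike described above. Each tick, 100 requests
// arrive and one young GC runs; a request surviving 4 collections is promoted.
public class TenuringSpike {
    public static void main(String[] args) {
        int tenuredNormal = simulate(1);   // requests complete within one young-GC cycle
        int tenuredSlow   = simulate(10);  // slow upstream: requests live 10 cycles
        System.out.println("normal: " + tenuredNormal + " tenured, slow upstream: " + tenuredSlow + " tenured");
    }

    static int simulate(int requestLifetimeCycles) {
        final int maxTenuringThreshold = 4;
        Deque<Integer> young = new ArrayDeque<>(); // each entry = a request's age in GC cycles
        int tenured = 0;
        for (int tick = 0; tick < 100; tick++) {
            for (int i = 0; i < 100; i++) young.add(0);       // new requests arrive
            Deque<Integer> survivors = new ArrayDeque<>();
            for (int age : young) {                            // a young GC runs
                if (age + 1 >= requestLifetimeCycles) continue;      // finished: collected young, free
                else if (age + 1 >= maxTenuringThreshold) tenured++; // survived too long: promoted
                else survivors.add(age + 1);
            }
            young = survivors;
        }
        return tenured;
    }
}
```

When requests die young, nothing is ever promoted; when the upstream slows down, nearly every request ends up in the old generation, which is the "pause bomb" waiting to be collected.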

------
grogers
Does anyone have experience with both ZGC and Shenandoah? They seem to have
very similar goals (10 ms max pause) even though the implementations are quite
different. So with both landing at about the same time, when would you prefer
one over the other? Both seem like such a huge advance over G1, and so similar
to each other, that the choice between them doesn't look like it matters much.

Our team is going to be targeting ZGC for the main reason that it's included
by default in JDK11. Yeah, builds of JDK11 with Shenandoah are available but
it's more work.
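
For reference, a sketch of how each collector is enabled in JDKs of this era (the jar name is a placeholder):

```shell
# ZGC ships in stock JDK 11+ (Linux/x64), behind an experimental flag:
java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xlog:gc -jar service.jar

# Shenandoah needs a JDK build that includes it (e.g. Red Hat's builds):
java -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -Xlog:gc -jar service.jar
```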

~~~
karussell
We have tried both in production, and both reduce pause times a lot in our
case. In the end, our use case was better served by ZGC: lower pause times and
lower latency.

We have throughput-intensive loads where pause times do not matter, and there
we use neither ZGC nor Shenandoah; it seems that in these cases even
ParallelGC is better than G1, but that is another story.

I've tried to summarize this experience in this post (incl. GraalVM and
different JDK versions):

[https://discuss.graphhopper.com/t/3011](https://discuss.graphhopper.com/t/3011)

------
crawshaw
Great to see the JVM experimenting with low-pause GC.

It looks like the maximum pauses of Shenandoah are still well over 1ms, which
will still cause a lot of tail latency in services. (Go reached ~5ms max
pauses a few releases ago, but there was still a significant improvement in
the behavior of services when pauses were clamped at ~100μs in more recent
releases.)

Definitely the right direction. I hope future versions of Shenandoah will
clamp GC pauses even lower.

~~~
pron
ZGC, the other low-latency OpenJDK GC, is now (as of JDK 12) at 1.5ms max
pause, 0.5ms average, with a 128GB heap, on SPECjbb2015
([https://www.jfokus.se/jfokus19-preso/ZGC-Concurrent-Class-Unloading.pdf](https://www.jfokus.se/jfokus19-preso/ZGC-Concurrent-Class-Unloading.pdf),
slides 36-40), with throughput that's, as always, _much_ better than Go's.

Also, I think Go's pause times are misleading, because I believe Go uses
throttling, and throttling pauses are not counted as GC pauses. One could
easily write a GC with an absolute zero max pause: make each allocation a
super-fast, simple pointer bump from a thread-local allocation buffer, and
when that's exhausted, throttle (i.e. block) the thread forever. Of course the
throughput of such a "collector" will end up converging to zero, too. So pause
times without throughput numbers are meaningless.
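
The degenerate "zero-pause collector" is easy to sketch. This toy version (class, field, and method names are all invented) shows why the fast path is so cheap, and where the throttling hides:

```java
// Minimal sketch of thread-local bump-pointer allocation: each thread carves
// objects out of a private buffer with a single pointer increment, and only
// takes a slow path when the buffer is exhausted.
public class TlabSketch {
    static class Tlab {
        final byte[] buffer = new byte[1024]; // stand-in for a raw memory region
        int top = 0;                          // the bump pointer

        // Fast path: allocation is just a bounds check and a pointer bump.
        int allocate(int size) {
            if (top + size > buffer.length) {
                return -1; // slow path: a real VM refills the TLAB or triggers GC;
                           // the degenerate "zero-pause collector" blocks here forever
            }
            int offset = top;
            top += size;
            return offset;
        }
    }

    public static void main(String[] args) {
        Tlab tlab = new Tlab();
        int a = tlab.allocate(16);
        int b = tlab.allocate(16);
        System.out.println(a + " " + b); // 0 16: consecutive bumps
    }
}
```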

~~~
pcwalton
Also, Go just doesn't compact the heap, ever. The stop-the-world pauses in the
JVM are typically from compaction. It's not really fair to compare a
compacting GC to a non-compacting one without acknowledging the huge tradeoff.

~~~
shipilev
In tracing collectors, marking (and generally walking the heap, in possibly
random order) takes a significant amount of time. After marking is done, you
may decide to move only a few objects, so the overhead of the copying itself
is not that large. Updating the references to all those moved objects might
take another bulk of the time. These stories get somewhat better with attempts
to segregate the heap (generational schemes, recording inter-region
references, etc.). That comes with associated runtime costs to maintain the
metadata supporting those partial collections, but on the upside it allows
minimizing the amount of work done (again), as it only walks/marks/copies/
updates the sub-heaps.

I would not agree with the blanket "The stop-the-world pauses in the JVM are
typically from compaction". Concurrent marking is done in CMS, G1, Shenandoah,
and ZGC [in the first two, there are nitty-gritty details about young
collections that complicate the story] -- and that resolves a significant
portion of stop-the-world time. Concurrent copying and reference updating are
done in Shenandoah and ZGC -- that resolves the rest of it.

Of course, you can skip the compacting part and just do a sweep, which frees
the implementation from dealing with the second part completely. This is not
without drawbacks, though: the allocation path gets more complicated,
fragmentation needs to be dealt with, etc. How far you can get with that
depends on what the use cases really are. As far as I can tell, the damned
"CMS concurrent mode failure" caused by heap fragmentation, plus an
unwillingness to part with uber-fast bump-ptr allocation paths, nudged most
JVM people to accept compaction as the go-to answer for reliable GC.
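
The fragmentation problem behind "concurrent mode failure" can be shown with a toy free-list heap (all numbers invented): total free space is ample, but no single hole fits the request, which is exactly the case a compacting collector avoids.

```java
import java.util.Arrays;

// Toy illustration: a non-compacting sweep leaves scattered holes, so an
// allocation can fail even though the total free space is more than enough.
public class FragmentationSketch {
    public static void main(String[] args) {
        // Suppose sweeping left four scattered 64-byte holes: 256 bytes free in total.
        int[] holes = {64, 64, 64, 64};
        int request = 128; // needs one *contiguous* 128-byte hole

        int totalFree = Arrays.stream(holes).sum();
        boolean satisfiable = Arrays.stream(holes).anyMatch(h -> h >= request);

        // A compacting collector would slide the live objects together, merging
        // the holes into one 256-byte region and letting the same request succeed.
        System.out.println("free=" + totalFree + " request=" + request + " ok=" + satisfiable);
    }
}
```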

------
foobarbazetc
We use Shenandoah on every JVM we run and it’s amazing. Probably saves us
10k-20k a year on additional compute.

The devs are extremely responsive on the mailing list.

------
ajross
Curmudgeon opinion, riffing off what I think is the most important takeaway
from the linked article:

> Garbage collection is by no means a solved problem.

I was reading articles like this one, about new ideas in JVM garbage
collection, _literally two decades ago_. And it's still "not a solved
problem".

I mean, it's mostly solved. Managed heaps work great for lots of applications
and have been very successful. But that last 5% has stretched out so long it
seems almost like a joke. We'll never get there, for the mythical "there"
where GC overhead and latency isn't an issue that needs to be tuned in
deployment. And IMHO it's time to start recognizing that fact instead of
trying to make the JVM and .NET do what we get from explicit heaps in C and
Rust. They just aren't going to get there.

~~~
pron
> We'll never get there, for the mythical "there" where GC overhead and
> latency isn't an issue that needs to be tuned in deployment.

I think that for the most part we're "there" already. Such improvements in
GCs, like Shenandoah, C4 and ZGC, are more than keeping up with most
applications' evolving requirements, and yield tremendous benefits for the
relatively low cost of added RAM footprint.

> it's time to start recognizing that fact instead of trying to make the JVM
> and .NET do what we get from explicit heaps in C and Rust.

To do better than the JVM, especially in concurrent applications, you have to
work hard. What we "get" from manual memory management we actually _buy_ for a
rather steep price (and remember that for shared data structures, which are
extremely important in concurrent applications, Rust also uses a GC, just a
particularly primitive one -- reference counting).

And even what we buy is not universally "better." I just saw a nice benchmark
the other day that compares allocation costs (even without concurrency). As
always, it gives a small part of the picture, but a very real one:
[https://github.com/rbehrends/btree-alloc](https://github.com/rbehrends/btree-alloc).
I don't think anyone can reasonably claim that manual memory management is
"generally better" than modern GCs, even once the effort is invested. It's
better in some respects and worse in others. So it's not even clear which of
them has that "last 5%" (and sometimes more) advantage.

~~~
erichocean
> _Rust also uses a GC, just a particularly primitive one -- reference
> counting_

This meme that reference counting and garbage collection are even remotely the
same needs to die in a fire. Reference counting is not, and never has been,
"primitive garbage collection", and garbage collection is not, and never has
been, "super sophisticated reference counting".

~~~
afiori
There was an article here a few months ago describing how reference counting
and tracing (liveness/reachability) are the two dual basic GC approaches, and
how most simple GCs are a combination of the two techniques.

~~~
pron
Was it this one? [https://www.cs.virginia.edu/~cs415/reading/bacon-garbage.pdf](https://www.cs.virginia.edu/~cs415/reading/bacon-garbage.pdf)

Tracing and reference counting are two GC approaches that the paper shows are
dual in some interesting ways and can have similar characteristics -- when
both are sophisticated enough.

~~~
afiori
yes, thank you.

------
abalone
_> Throughput reduction is predictable, and it's easy to plan for that — if
your program runs 10% slower, bring up ~10% more servers; that's about it. But
long GC pauses are rapid and volatile; you can't "autoscale" out of them, so
in order to not fall over, you must allot extra resources to handle them._

Exactly. Long tail latency is often a more important server metric than
throughput.

This is why I’m excited about Swift on the server. Unlike most other
languages, it uses automatic reference counting instead of pause-the-world
garbage collection. That means consistent latency.

~~~
chrisseaton
How does it collect cyclic data structures with consistent latency?

~~~
ghusbands
There exist runtimes that use reference counting and mark/sweep together:
mark/sweep is used to collect the cycles that reference counting misses. PHP
and Firefox's XPCOM both turn up in a Google search for "cycle collector".

Maybe Swift will gain it, one day.
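
The cycle problem being discussed is easy to demonstrate with a hand-rolled count (a sketch in Java, standing in for what Swift's ARC does automatically; the class and names are invented): two objects referencing each other never see their counts reach zero, which is exactly what a backup mark/sweep or cycle collector exists to catch.

```java
// Plain reference counting leaks cycles: after the program drops all external
// references, each object still holds the other's count at 1, so neither is freed.
class RcObject {
    int refCount = 1;  // one reference held by the creator
    RcObject partner;  // a strong reference to another object

    void retain() { refCount++; }
    void release() {
        refCount--;
        if (refCount == 0 && partner != null) partner.release(); // "free": drop our reference
    }
}

public class CycleLeak {
    public static void main(String[] args) {
        RcObject a = new RcObject();
        RcObject b = new RcObject();
        a.partner = b; b.retain();
        b.partner = a; a.retain();

        // Drop the external references:
        a.release();
        b.release();

        // Both counts are still 1 -- the cycle keeps itself alive forever.
        System.out.println(a.refCount + " " + b.refCount); // 1 1
    }
}
```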

~~~
abalone
_> Maybe Swift will gain it, one day._

Maybe not.

“We have discussed in the past using hybrid approaches like introducing a
cycle collector, which runs less frequently than a GC would. The problem with
this is that if you introduce a cycle collector, code will start depending on
it. In time you end up with some libraries/packages that work without GC, and
others that leak without it (the D community has relevant experience here). As
such, we have come to think that adding a cycle collector would be bad for the
Swift community in the large.”[1]

[1] [https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160208/009422.html](https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160208/009422.html)

------
azhenley
Here is a previous discussion about the Shenandoah GC from the Java 12
announcement:
[https://news.ycombinator.com/item?id=19435890](https://news.ycombinator.com/item?id=19435890)

------
nullwasamistake
This article is great even by HN standards, thank you! There's not much info
on Shenandoah in prod yet.

