
GC Fun at Twitch - _raz
https://modularfinance.se/blog/2019/1113-gc-fun
======
MaulingMonkey
Allocating 10GB of ballast, per the original Twitch blog, to tune GC specifics
does feel like a bit of a hack around missing API knobs - but it's elegant
enough. The fact that it relies on the specifics of the underlying GC, when
trying to tune the specifics of GC behavior, isn't much of a drawback so much
as it's just sane performance tuning.

This proposed alternative of toggling the GC on/off outright in a sleeping
loop feels like a pretty big sledgehammer - _and_ just as much of a hack. The
500ms sleeps are enough time for 5 GC cycles' worth of garbage to pile up,
going off the original Twitch blog's 10 GCs/second numbers, which would also
concern me as a potentially unwanted latency spike. I'm also curious what
happens when the GC is toggled off mid-collection. It's more code, and it
feels brittle. That ReadMemStats sync point may be worse than the GC spam in
the first place!

------
ijcd
We had internal discussions around the "hacky" nature of the solution. Both
sides had proponents. The proof was in the numbers: some teams utilized it,
others did not. In the end it was a few-line solution that solved the problem
neatly and didn't rely on a (terrifying) dynamic solution such as the one
proposed in this blog post. We expected the "hack" would be temporary, since
we expected the Go GC to quickly improve to the point where it was no longer
necessary.

~~~
panpanna
Since you seem to have analyzed this carefully, why couldn't object pools have
been used to reduce collectable garbage in the first place?

~~~
ijcd
That was done too, of course. There’s so much to get done in these big systems
that it’s often most efficient to take the quick win and move on, especially
when, as I mentioned, the world is expected to fix the problem for you for
free.

------
dwohnitmok
It's interesting to see how other GCs try to handle this. In particular the
JVM's (awesome) new Shenandoah low-latency GC has a ShenandoahAllocSpikeFactor
option precisely to deal with this kind of situation, where you can specify
what percentage of the heap you're willing to sacrifice in a spike in
allocation rates before the GC starts running wild trying to collect garbage.

The trade-off of the knob-less approach of Go's GC, I suppose.

~~~
apta
The JVM offers state-of-the-art GCs, which allow for selecting the best tool
for the job (throughput, latency, large heaps, etc.).

This is unlike the golang GC, which is tuned for latency at the expense of
throughput, with no way of modifying its behavior without resorting to hacks
like the one in the linked article.

~~~
dwohnitmok
To Go's credit, it predates Shenandoah and ZGC, before which your only real
option for low-latency GC on the JVM was Zing, which I don't think had too
many people using it (I certainly have no personal experience with it). I
can't say whether they were inspired by Go, but I do think that Go is
responsible for bringing the desirability of low-latency GC, even at
potentially high cost to throughput, to the forefront of the greater
programming community's attention.

~~~
astral303
Do you not consider G1 a low latency GC?

~~~
dwohnitmok
When considering latency, it's still not in the same league as Shenandoah and
ZGC. We still had occasional GC times over a second with G1 on some production
systems several years ago IIRC.

------
_bxg1
This is not my area of expertise, but from a software engineering perspective,
the proposal "Replace a constant in a configuration file with a new piece of
procedural code" smells like a huge new liability when it comes to
maintenance. Of course it could be truly necessary, but the author made it
sound like the "ballast" method was working just fine and simply felt hacky.
Personally, I'd rather document and maintain a single value change that's
"hacky" than 22 extra lines of Turing-complete code.

~~~
suresk
I think I can see both sides of this argument - the "ballast" method is hacky
not just because it's a sort of magic thing that might be tricky to remember
later, but because it relies on undocumented behavior that is not part of the
contract Go provides and could randomly break later.

The method presented in the article does seem better in that it uses
well-known and documented parts of Go's runtime API, but I think it might be
problematic for other reasons. Fiddling with GC behavior is always a little
risky because it works fine until you hit some weird corner case and it blows
up.

For example - What happens if that goroutine doesn't run for longer than you
expect and you leave GC turned off while another goroutine is creating a ton
of garbage? Might never be a problem, but it depends on allocation behavior
and how much headroom you have.

So it feels more correct, but also seems like it requires a lot more tuning
and testing to feel confident about it.

~~~
_bxg1
> it is relying on undocumented behavior that is not part of the contract Go
> provides and could randomly break later

Sort of. A change in the undocumented behavior might cause you to lose your
fine-tuning at some point in the future, but I wouldn't say it'll ever cause
it to _break_. You're just telling Go how much memory you want to pre-
allocate. It'll continue doing that; if that stops getting you the same GC
benefits you wanted, then at worst you'll be back in the same boat you were
originally.

Writing your own GC routine, on the other hand, gives you a ton of new
opportunities for introducing very real breakage via your own code.

------
twotwotwo
There is internal work on a SetMaxHeap API:
[https://github.com/golang/go/issues/23044](https://github.com/golang/go/issues/23044)
(there's a review of related code at
[https://groups.google.com/forum/#!topic/golang-codereviews/brkajcJ0mhI](https://groups.google.com/forum/#!topic/golang-codereviews/brkajcJ0mhI)).
It isn't perfect (notably, heap size and process size as seen by the OS are
not identical) but it seems like a step up from ballast or other workarounds.

In the issue thread Caleb Spare also proposed a minimum heap size so that you
get GOGC-ish behavior once your app uses enough RAM, but don't have constant
GCs with a tiny heap.

There's definitely a common issue where the GOGC heuristic doesn't take
advantage of situations where it can collect less often but still remain in
the "don't care" range of memory use. (CloudFlare talked about the same thing
making benchmark results weird:
[https://blog.cloudflare.com/go-dont-collect-my-garbage/](https://blog.cloudflare.com/go-dont-collect-my-garbage/) )

And there can definitely be situations where GC'ing a bit more would be worth
it to keep a process under an important memory threshold to avoid swapping or
OOM kills.

The designers famously don't want too many knobs, but some other ways to
convey user priorities to the runtime could certainly save users from some
awkward workarounds and fiddling with the existing knobs.

------
teej
This has interesting parallels to the issues folks have with autoscaling in
AWS. When people first start using autoscaling, it can be frustrating to find
the right heuristic to scale on, with the automated system overshooting or
undershooting the actual capacity needs.

What works well is when you calculate your own capacity needs, then just set
the autoscaler to change to that new capacity number. In other words, using
your knowledge of how your system works, you'll make better decisions than
just looking at secondary metrics like resource utilization.

I know I've done manually triggered GC in Ruby and Java but I don't know
enough about Go to say if the article's suggestion is reasonable.

~~~
ec109685
What does it mean to calculate the capacity needs of the application? Are you
saying to base it on something like the throughput your app can handle?

------
arcticbull
This reminds me of why I hate garbage collectors and think we shouldn't keep
investing in them. Instead, we should double down on languages that allow you
to express liveness constraints in a way the compiler can understand and
manage statically. I'm not saying we have the perfect one yet, though
continuing to add knobs to a gooey ball of complexity is at best a game of
whack-a-mole. Do something you haven't planned for and your whole app or
service takes a dirt-nap and you need to call in a crack squad of your most
senior engineers. Then what? Uh, maybe allocate 11GB? There's no
predictability -- or even causality -- to these optimizations.

There are enough rockets on the rocket-powered horse that is GC to make it to
the moon and back.

~~~
pjmlp
What we should do is learn that many GC-enabled languages also offer other
means to manage resources, and increase adoption of such features, instead of
throwing the baby out with the bathwater just because a couple of them use GC
for everything.

~~~
arcticbull
GC is a means, not an end, and we shouldn't be attached to it. We should focus
on developing languages that allow the compiler to statically assess and infer
lifetimes; then we don't need a giant for-loop over all of active memory. The
value GC provides is it gives the developer an escape-rope from an
insufficiently expressive language. If that solution involves some form of GC
so be it, but the goal should not be to preserve GC but rather to improve the
efficiency of the final product without substantially impacting developer
ergonomics.

~~~
pjmlp
So far, that sufficiently expressive language, usable by mainstream
developers, has yet to be invented. And no, Rust isn't it yet: there are too
many hurdles to overcome in common programming patterns.

------
MapleWalnut
off topic: It's annoying how the Twitch blog linked in the article doesn't
have an RSS feed. How do people read these blogs without one?

[https://blog.twitch.tv/en/tags/engineering](https://blog.twitch.tv/en/tags/engineering)

~~~
cyrusaf
The blog post is also available on Medium:
[https://medium.com/twitch-news/go-memory-ballast-how-i-learnt-to-stop-worrying-and-love-the-heap-26c2462549a2](https://medium.com/twitch-news/go-memory-ballast-how-i-learnt-to-stop-worrying-and-love-the-heap-26c2462549a2)

------
EdwardDiego
I presume this is why Java has quite specific initial/min/max heap parameters,
definable either as a set amount of RAM or as a percentage of what's
available.

------
erik_landerholm
If the ballast works and they can "afford" it within whatever parameters they
are using to define "afford", I'm all for that method.

------
tom_mellior
Interesting approach to the application defining a custom GC strategy. (I
wonder why the author gave it this strange title, since the article is really
about something that Twitch is _not_ doing.)

I'll save this for the next time someone posts something along the lines of
"you can't program X in a GC'd language because the GC is so unpredictable".

------
ncmncm
> _the ballast concept seems like a quite hacky solution to me._

"Quite a hacky solution" describes every single detail of every scrap of code
connected in any way to GC. It is the _whole point_ of the enterprise. If
hacky solutions make you unhappy, your only route to happiness is to run very
far away.

A lot gets done with very hacky solutions, and you will never need to throw a
rock very far to hit somebody who swears by them. Those of us who don't swear
by them haven't the time to get that work done, so for most of the world's
work, it's hacks or nothing.

------
zozbot234
Why are people using GC in this day and age for _anything_ other than
processing fully general graphs (where the tracing and automatic collection
is genuinely helpful)? Literally _everything_ else can be dealt with by using
more flexible memory management strategies that do not need a pre-allocated
10GB heap and will not hog CPU in wasteful and unpredictable ways when memory
utilization rises above a set percentage.

~~~
tracker1
GC is an inherent part of many higher-level languages/runtimes, including Go,
which is the main reference here, but also the .NET and Java runtimes. Yes,
you could use a lower-level language like C/C++, D, or Rust to work around the
issue in other ways, but that forgoes a lot of the productivity benefit
higher-level languages bring to the table.

I can't read the referenced Twitch article from work, so I can't comment on
it. I'm also not sure of the practical loads and implementation details, and
am surprised that the Go GC was generally an issue to begin with.

I know I've purposely triggered GC in languages that use it, for ETL jobs that
run on shared servers, to minimize memory usage.

~~~
ncmncm
They are "obligate-GC" languages. You don't get a choice whether to rely on
it.

It is fundamentally misleading to call C++ or Rust "lower-level" languages
than Go or Java. (As it is, also, to say "C/C++".) Both Rust and C++ support
much more powerful abstractions than either, making them markedly higher-
level. That they also enable actually coding abstractions to manage resources
(incidentally including memory resources) reduces neither their expressiveness
nor the productivity of skilled programmers.

The point of Java and Go is that less-skilled programmers can use them to
solve simpler problems more cheaply. Since most problems are simple, those
languages have a secure place.

~~~
jolux
High-level and low-level are ill-defined and I prefer to stick to more precise
language. Additionally, calling Rust high-level is inaccurate. It has high-
level features, but the linear typing used by the borrowing system ultimately
imposes cognitive overhead that does not exist in, say, OCaml. I think it’s
also interesting to note that garbage collectors for Rust that function
reasonably are still pretty experimental, with the most promising approach
I’ve seen so far being withoutboats’s Shifgrethor
([https://github.com/withoutboats/shifgrethor](https://github.com/withoutboats/shifgrethor)).

Haskell and Idris and the like (other languages with a type system in the
calculus of constructions) inarguably support a higher level of abstraction
than Rust does, and are also “GC-obligate” languages. So your example is
something of a red herring. I could say the same about Kotlin and Swift and
Scala, none of which really have a strong story for static memory safety like
Rust has, though it’s being considered for Swift. The only language that is
reasonably complete that I could think to compare it to is ATS, which is far
more complex as a result.

~~~
zozbot234
> garbage collectors for Rust that function reasonably are still pretty
> experimental

You can also use something like
[https://github.com/artichoke/cactusref](https://github.com/artichoke/cactusref)
\- which provides an equivalent of Rc<T> with nearly-seamless, timely
detection and collection of deallocation cycles. This gives you the equivalent
of full GC, but using a "zero-overhead" approach that integrates more cleanly
with how Rust idiomatically works.

~~~
pjmlp
Does it prevent stack overflows and stop-the-world delays in complex data
structures?

Two common problems in most reference counting implementations.

~~~
jashmatthews
IIUC Rust stack overflows are actually checked, hit a guard page, and unwind
the stack:
[https://github.com/rust-lang/compiler-builtins/blob/master/src/probestack.rs](https://github.com/rust-lang/compiler-builtins/blob/master/src/probestack.rs)

A cactusref is owned by a single thread so there's no STW issue, but you also
can't share them mutably between threads like structures available in some
GCed languages.

~~~
pjmlp
Thanks

------
marcrosoft
It is distracting to read this article and see Go code that hasn't been run
through gofmt.

