GC Fun at Twitch

MaulingMonkey · on Nov 15, 2019

Allocating 10GB per the original twitch blog, to tune GC specifics, does feel like a bit of a hack around missing API knobs - but it's elegant enough. The fact that it relies on the specifics of the underlying GC, when trying to tune the specifics of GC behavior, isn't much of a drawback, so much as it's just sane performance tuning.

This proposed alternative of just toggling the GC on/off outright in a sleeping loop feels like a pretty big sledgehammer - and just as much of a hack. The 500ms sleeps are enough to see 5 GC cycles, going off of the original twitch blog's 10 GCs/second numbers, which would also concern me - as a potentially unwanted latency spike. I'm also curious what happens when the GC is toggled back off mid-GC. It's more code, and feels brittle. That ReadMemStats sync point may be worse than the GC spam in the first place!

ijcd · on Nov 16, 2019

We had internal discussions around the "hacky" nature of the solution. Both sides had proponents. The proof was in the numbers and some teams utilized it, others did not. In the end it was a few-line solution that solved the problem neatly and didn't rely on a (terrifying) dynamic solution such as is proposed in this blog post. It was expected the "hack" would be temporary as we expected the Go GC to quickly improve to the point it was not necessary.

panpanna · on Nov 16, 2019

Since you seemed to have analyzed this carefully, why couldn't object pools be used to reduce collectable garbage in the first place?

ijcd · on Nov 16, 2019

That was done too, of course. There’s so much to get done in these big systems that it’s often most efficient to take the quick win and move on, especially when, as I mentioned, the world is expected to fix the problem for you for free.

dilyevsky · on Nov 16, 2019

Ikr? “ballast is hacky! let me just build my own gc real quick”. Fwiw i think go 1.14 will have the required knob in the runtime package.

dwohnitmok · on Nov 16, 2019

It's interesting to see how other GCs try to handle this. In particular the JVM's (awesome) new Shenandoah low-latency GC has a ShenandoahAllocSpikeFactor option precisely to deal with this kind of situation, where you can specify what percentage of the heap you're willing to sacrifice in a spike in allocation rates before the GC starts running wild trying to collect garbage.

The trade-off of the knob-less approach of Go's GC I suppose.

apta · on Nov 16, 2019

The JVM offers state of the art GCs, which allows for selecting the best tool for the job (throughput, latency, large heaps, etc.).

This is unlike the golang gc, which is tuned for latency at the expense of throughput, with no way of modifying its behavior without resorting to hacks like the article in the post.

dwohnitmok · on Nov 16, 2019

To Go's credit, it predates Shenandoah and ZGC, before which your only real option for low-latency GC on the JVM was Zing, which I don't think had too many people using it (I certainly have no personal experience with it). I can't say whether they were inspired by Go, but I do think that Go is responsible for bringing the desirability of low-latency GC, even at potentially high cost to throughput, to the forefront of the greater programming community's attention.

pjmlp · on Nov 16, 2019

They could offer other ways to manage memory manually instead of just escape analysis, like GC enabled systems languages, but that is not the nature of Go's design.

astral303 · on Nov 16, 2019

Do you not consider G1 a low latency GC?

dwohnitmok · on Nov 16, 2019

When considering latency, it's still not in the same league as Shenandoah and ZGC. We still had occasional GC times over a second with G1 on some production systems several years ago IIRC.

tus88 · on Nov 16, 2019

Except when the best tool for the job is a language without GC at all.

Thaxll · on Nov 16, 2019

Yes you can modify GC behaviour and it's one env variable, the fact that twitch didn't use it makes no sense.

MaulingMonkey · on Nov 16, 2019

The original blog post[1] does mention GOGC, and a list of pitfalls that made them prefer ballast / the pending heap size proposal[2] over it. Are they in error? Or do you find their reasoning unconvincing?

[1] https://blog.twitch.tv/en/2019/04/10/go-memory-ballast-how-i...

[2] https://github.com/golang/go/issues/23044

_bxg1 · on Nov 15, 2019

This is not my area of expertise, but from a software engineering perspective, the proposal "Replace a constant in a configuration file with a new piece of procedural code" smells like a huge new liability when it comes to maintenance. Of course it could be truly necessary, but the author made it sound like the "ballast" method was working just fine and simply felt hacky. Personally, I'd rather document and maintain a single value change that's "hacky" than 22 extra lines of turing-complete code.

suresk · on Nov 15, 2019

I think I can see both sides of this argument - the "ballast" method is hacky not just because of it being a sort of magic thing that might be tricky to remember later, but it is relying on undocumented behavior that is not part of the contract Go provides and could randomly break later.

The method presented in the article does seem better in that it is using well-known and documented parts of Go's runtime api, but I think it might be problematic for other reasons. Fiddling with GC behavior is always a little risky because it works fine until you hit some weird corner case and it blows up.

For example - What happens if that goroutine doesn't run for longer than you expect and you leave GC turned off while another goroutine is creating a ton of garbage? Might never be a problem, but it depends on allocation behavior and how much headroom you have.

So it feels more correct, but also seems like it requires a lot more tuning and testing to feel confident about it.

_bxg1 · on Nov 15, 2019

> it is relying on undocumented behavior that is not part of the contract Go provides and could randomly break later

Sort of. A change in the undocumented behavior might cause you to lose your fine-tuning at some point in the future, but I wouldn't say it'll ever cause it to break. You're just telling Go how much memory you want to pre-allocate. It'll continue doing that; if that stops getting you the same GC benefits you wanted, then at worst you'll be back in the same boat you were originally.

Writing your own GC routine, on the other hand, gives you a ton of new opportunities for introducing very real breakage via your own code.

lilyball · on Nov 15, 2019

Agreed, especially because this new code may have unintended consequences. For example, if the heap grows extremely fast in that 500ms sleep time then it can get dramatically larger than you'd like, when instead we want to run a GC right as it hits 20GB used.

twotwotwo · on Nov 15, 2019

There is internal work on a SetMaxHeap API: https://github.com/golang/go/issues/23044 (there's a review of related code at https://groups.google.com/forum/#!topic/golang-codereviews/b... ). It isn't perfect (notably, heap size and process size as seen by the OS are not identical) but seems like a step up from ballast or other workarounds.

In the issue thread Caleb Spare also proposed a minimum heap size so that you get GOGC-ish behavior once your app uses enough RAM, but don't have constant GCs with a tiny heap.

There's definitely a common issue where the GOGC heuristic doesn't take advantage of situations where it can collect less often but still remain in the "don't care" range of memory use. (CloudFlare talked about the same thing making benchmark results weird: https://blog.cloudflare.com/go-dont-collect-my-garbage/ )

And there can definitely be situations where GC'ing a bit more would be worth it to keep a process under an important memory threshold to avoid swapping or OOM kills.

The designers famously don't want too many knobs, but some other ways to convey user priorities to the runtime could certainly save users from some awkward workarounds and fiddling w/the existing knobs.

teej · on Nov 15, 2019

This has interesting parallels to the issues folks have with autoscaling in AWS. When people first start using autoscaling it can be frustrating finding the right heuristic to scale on, with the automated system over-shooting or under-shooting what the capacity needs are.

What works well is when you calculate your own capacity needs, then just set the autoscaler to change to that new capacity number. In other words, using your knowledge of how your system works, you'll make better decisions than just looking at secondary metrics like resource utilization.

I know I've done manually triggered GC in Ruby and Java but I don't know enough about Go to say if the article's suggestion is reasonable.

ec109685 · on Nov 16, 2019

What does that mean to calculate capacity needs of the application? Are you saying base it on something like throughput your app can handle?

arcticbull · on Nov 16, 2019

This reminds me of why I hate garbage collectors and think we shouldn't keep investing in them. Instead, we should double down on languages that allow you to express liveness constraints in a way the compiler can understand and manage statically. I'm not saying we have the perfect one yet, though continuing to add knobs to a gooey ball of complexity is at best a game of whack-a-mole. Do something you haven't planned for and your whole app or service takes a dirt-nap and you need to call in a crack squad of your most senior engineers. Then what? Uh, maybe allocate 11GB? There's no predictability -- or even causality -- to these optimizations.

There's enough rockets on the rocket-powered horse that is GC to make it to the moon and back.

pjmlp · on Nov 16, 2019

What we should do is learn that many GC enabled languages also offer other means to manage resources, and increase adoption of such features, instead of throwing the baby with baby water, just because a couple of them use GC for everything.

arcticbull · on Nov 18, 2019

GC is a means not an ends and we shouldn't be attached to it. We should focus on developing languages that allow the compiler to statically assess and infer lifetimes then we don't need a giant for loop over all of active memory. The value GC provides is it gives the developer an escape-rope from an insufficiently expressive language. If that solution involves some form of GC so be it, but the goal should not be to preserve GC but rather to improve the efficiency of the final product without substantially impacting developer ergonomics.

pjmlp · on Nov 19, 2019

So far that sufficiently expressive language, usable by mainstream developers is yet to be invented, and no Rust isn't yet it, too many hurdles to overcome in common programming patterns.

MapleWalnut · on Nov 15, 2019

off topic: It's annoying how the Twitch blog linked in the article doesn't have an RSS feed. How do people read these blogs without one?

https://blog.twitch.tv/en/tags/engineering

cyrusaf · on Nov 16, 2019

The blog post is also available on Medium: https://medium.com/twitch-news/go-memory-ballast-how-i-learn...

EdwardDiego · on Nov 16, 2019

I presume this is why Java has quite specific initial/min/max heap parameters, definable either as a set amount of RAM, or a percentage of available.

erik_landerholm · on Nov 15, 2019

If the ballast works and they can "afford" it within whatever parameters they are using to define "afford", I'm all for that method.

tom_mellior · on Nov 15, 2019

Interesting approach to the application defining a custom GC strategy. (I wonder why the author gave it this strange title, since the article is really about something that Twitch is not doing.)

I'll save this for the next time someone posts something along the lines of "you can't program X in a GC'd language because the GC is so unpredictable".

ncmncm · on Nov 16, 2019

> the ballast concept seems like a quite hacky solution to me.

"Quite a hacky solution" describes every single detail of every scrap of code connected in any way to GC. It is the whole point of the enterprise. If hacky solutions make you unhappy, your only route to happiness is to run very far away.

A lot gets done with very hacky solutions, and you will never need to throw a rock very far to hit somebody who swears by them. Those of us who don't haven't time to get that work done, so for most of the world's work, it's hacks or nothing.

zozbot234 · on Nov 15, 2019

Why are people using GC in this day and age for anything other than processing on fully-general graphs (where the tracing and auto collecting is genuinely helpful)? Literally everything else can be dealt with by using more flexible memory management strategies, that do not need a pre-allocated 10GB heap, and will not hog cpu in wasteful and unpredictable ways when memory utilization rises above a set percentage.

pjmlp · on Nov 16, 2019

Because GCs offer the best balance between performance and productive, even when going with the reference counting algorithms path.

Except for the heroic efforts from the Rust community, linear types are far from general consumption for any kind of software development.

Plus, having GC does not preclude being able to stack allocate, keep data on manual memory segment, or even resort to manually manage memory in unsafe code blocks.

Examples of GC enabled languages with such features, Modula-3, Mesa/Cedar, Active Oberon, Nim, D, Eiffel, C#, F#, System C# (M#), Sing#, Swift, ParaSail, Chapel.

Eventually Java might get such capabilities if Panama and Valhalla actually end up being part of the official implementation.

Manual memory management is required for some critical code paths, but so is Assembly, both are niches, not something to spend 100% of our coding hours.

tracker1 · on Nov 15, 2019

GC is an inherent part of many higher level languages/runtimes. Including Go, which is the main reference here, but also in .Net and Java runtimes. Yes, you could use a lower level language like C/C++, D, Rust etc to work around the issue in other ways, but that leaves a lot of productivity benefit higher level languages bring to the table.

I can't read the referenced twitch article from work so cannot comment. I'm also not sure of the practical loads and implementation details and am surprised that the Go GC was generally an issue to begin with.

I know I've purposely called GC for languages that use it for ETL jobs that run on shared servers to minimize memory usage before.

ncmncm · on Nov 16, 2019

They are "obligate-GC" languages. You don't get a choice whether to rely on it.

It is fundamentally misleading to call C++ or Rust "lower-level" languages than Go or Java. (As it is, also, to say "C/C++".) Both Rust and C++ support much more powerful abstractions than either, making them markedly higher-level. That they also enable actually coding abstractions to manage resources (incidentally including memory resources) reduces neither their expressiveness nor the productivity of skilled programmers.

The point of Java and Go is that less-skilled programmers can use them to solve simpler problems more cheaply. Since most problems are simple, those languages have a secure place.

jolux · on Nov 16, 2019

High-level and low-level are ill-defined and I prefer to stick to more precise language. Additionally, calling Rust high-level is inaccurate. It has high-level features, but the linear typing used by the borrowing system ultimately imposes cognitive overhead that does not exist in, say, OCaml. I think it’s also interesting to note that garbage collectors for Rust that function reasonably are still pretty experimental, with the most promising approach I’ve seen so far being withoutboats’s Shifgrethor (https://github.com/withoutboats/shifgrethor).

Haskell and Idris and the like (other languages with a type system in the calculus of constructions) inarguably support a higher level of abstraction than Rust does, and are also “GC-obligate” languages. So your example is something of a red herring. I could say the same about Kotlin and Swift and Scala, none of which really have a strong story for static memory safety like Rust has, though it’s being considered for Swift. The only language that is reasonably complete that I could think to compare it to is ATS, which is far more complex as a result.

zozbot234 · on Nov 16, 2019

> garbage collectors for Rust that function reasonably are still pretty experimental

You can also use something like https://github.com/artichoke/cactusref - which provides an equivalent of Rc<T> with nearly-seamless, timely detection and collection of deallocation cycles. This gives you the equivalent of full GC, but using a "zero-overhead" approach that integrates more cleanly with how Rust idiomatically works.

pjmlp · on Nov 16, 2019

Does it prevent stack overflows and stop-the-world delays in complex data structures?

Two common problems in most reference counting implementations.

jashmatthews · on Nov 16, 2019

IIUC Rust stack overflows are actually checked, hit a guard page, and unwind the stack https://github.com/rust-lang/compiler-builtins/blob/master/s...

A cactusref is owned by a single thread so there's no STW issue, but you also can't share them mutably between threads like structures available in some GCed languages.

pjmlp · on Nov 16, 2019

Thanks

tracker1 · on Nov 18, 2019

Lower-level meaning closer to the metal. Higher level would be further from the hardware, more abstracted. And in this context, it's more about the relativity to each other.

I wouldn't suggest that a given language favors more or less skilled programmers. There are plenty of skilled programmers that will choose a given language simply for the time to get things done. Not every problem needs absolute performance and memory scarcity, in fact I would suggest that most don't.

I've used some very low level languages as well as a bit of assembly in the past. All the same, JavaScript is the language I enjoy the most, simply because I can get things done in flexible ways, with many modules already written. There are times where you want absolute performance with minimal memory (embedded systems in particular, though even they're getting pretty powerful) and there are others where you can duct tape something together that only needs to run a couple times a day.

I've seen front end developers that couldn't handle conceptualizing SQL-style data storage... Likewise, I've seen backend developers unable to deal with breaking apart UI components or dealing with event based workflows or managing state outside a database context. I've seen systems developers create the most byzantine, overly complex and buggy systems you can imagine, that don't even work half the time. On the flip side, I've seen aircraft systems designed and built almost entirely in hardware... now some of that is truly impressive (and took years to design and develop, compared to weeks/months many developers will get).

Closer to when I was starting out, it was Visual Basic that was considered the proverbial whipping boy of "beginner" or "less skilled" languages. I've seen very good, and very bad implementations of a great many things in a great many languages over the past few decades. I'd say some of the worst of the worst code I've dealt with has come from the most arrogant people I've worked with. Generally, you aren't as smart or clever as you think you are. And I mean "you" in the colloquial sense. In the end, all anyone (or at least most people/users) really care about is it does the job, and is relatively easy to use.

weberc2 · on Nov 16, 2019

There are other factors in engineering besides CPU. For many applications, CPU is the cheaper resource (e.g., compared to developer time / opportunity cost).

marcrosoft · on Nov 16, 2019

It is distracting to read this article and see Go code not ran through gofmt.