
Go memory ballast: How I learnt to stop worrying and love the heap - lelf
https://blog.twitch.tv/en/2019/04/10/go-memory-ballast-how-i-learnt-to-stop-worrying-and-love-the-heap-26c2462549a2/
======
matahwoosh
I think the takeaway here is not that some corners of Go are not performant
(and should be swapped for C++/D etc.), but that there are tools in the Go
ecosystem that help you figure out what the issue is.

Until Twitch hit this bottleneck in their service, they had been reaping multi-
year benefits of faster development using a memory-safe language (a huge gain
for security).

Their short- (or mid-) term solution seems decent, and more importantly the Go
core team is working on addressing this particular issue (which, given the Go
1.x backward-compatibility promise, makes it really easy to reap all the
benefits with each Go release).

------
sharpneli
"Only if we attempt to read or write to the slice, will the page fault occur
that causes the physical RAM backing the virtual addresses to be allocated."

Tiny nitpick. On Linux at least, you can read from the slice. New memory
allocations are given a copy-on-write zero page (full of zeroes). So you can
easily read out that 10GB worth of zeroes and memory consumption still
wouldn't increase.

Only when you write is a page fault issued that gives the page real physical
backing.
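A rough, Linux-only sketch of this (the `rssKiB` helper and the 256 MiB size are illustrative assumptions, not anything from the article): read faults on untouched anonymous memory map the shared zero page, while write faults commit private physical frames.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// rssKiB parses VmRSS out of /proc/self/status (Linux-only; returns -1
// elsewhere). This helper is a sketch, not a standard-library API.
func rssKiB() int {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		return -1
	}
	for _, line := range strings.Split(string(data), "\n") {
		if strings.HasPrefix(line, "VmRSS:") {
			var kb int
			fmt.Sscanf(strings.TrimPrefix(line, "VmRSS:"), "%d", &kb)
			return kb
		}
	}
	return -1
}

func main() {
	before := rssKiB()

	// Reserve 256 MiB: the runtime mmaps demand-zero pages without touching them.
	buf := make([]byte, 1<<28)

	// Read one byte per 4 KiB page. On Linux, each read fault maps the shared
	// zero page, so no private physical frames are committed.
	var sum byte
	for i := 0; i < len(buf); i += 4096 {
		sum += buf[i]
	}
	afterRead := rssKiB()

	// Write one byte per page: each write fault now allocates a real frame,
	// so resident memory typically grows by roughly the buffer size.
	for i := 0; i < len(buf); i += 4096 {
		buf[i] = 1
	}
	afterWrite := rssKiB()

	fmt.Printf("sum=%d rss: before=%d afterRead=%d afterWrite=%d (KiB)\n",
		sum, before, afterRead, afterWrite)
}
```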

------
noelwelsh
I'm really surprised there are no flags to configure GC behaviour in Go. This
is fairly standard performance-tuning stuff on the JVM, which provides a
number of options to control GC behaviour, minimum and maximum heap size being
some of the basic ones.

~~~
sagichmal
Isn't it great? You don't need a degree to operate the thing. When there are
esoteric conditions like the one described in the post, the team is usually
super responsive to fix them if they're bugs, or give you reasonable and
lightweight workarounds when they're not.

~~~
erik_seaberg
You always have to tune GC if you're pushing the limits of the machine. If you
can't do it by turning knobs, you have to do it by rewriting code.

~~~
sagichmal
It's Go's position that the language (and/or runtime) should make this
unnecessary for the vast majority of use cases. I'm not sure what they're
targeting exactly, but I'm sure it's 99% or more. The resulting _lack of
knobs_ is actually a feature, and one I'm very glad to have.

I'm also happy to make my code less allocation-happy or more GC-friendly in
the extremely rare circumstances when it matters. Like this article!

But I think we probably just work in different environments.

~~~
erik_seaberg
I would much rather make the concise and clear implementation perform better,
than take the readability hit and risk of mistakes from rewriting it (probably
more than once).

~~~
sagichmal
This makes sense for application code, but I think it's a bad compromise for a
language runtime.

------
Akababa
This is a pretty hilarious hack. I'm surprised there's no way to do this
within the Go runtime.

------
brian-armstrong
So, isn't this a strong argument against Go? At the point that you need to
start devising clever workarounds to stop bad behavior from your runtime, it's
time to find a new language.

~~~
wrs
If you know of a language that has absolutely no unexpected behavior in its
runtime under any workload, please let us know what it is!

~~~
ummonk
Baremetal assembly

~~~
presumably
Counterexample:
[https://en.wikipedia.org/wiki/Pentium_FDIV_bug](https://en.wikipedia.org/wiki/Pentium_FDIV_bug)

~~~
saagarjha
Bugs are unintentional and can be (and are) fixed.

------
twotwotwo
FWIW, there's a related open issue about adding a minimum heap size for
situations like this, where you don't care about wasting a bit more RAM but do
care about the GC work:
[https://github.com/golang/go/issues/23044](https://github.com/golang/go/issues/23044)

It references a patch to add a SetMaxHeap call, which could help in these
kinds of situations too.

~~~
_ph_
Yes, it is a bit strange that you cannot tell the GC a few more things about
your requirements. The default setup usually works quite fine for me, but I
would like to be able to tell the GC to assume a certain minimum memory size.
If I absolutely know that an application is going to need a certain amount of
memory, the GC doesn't need to do any work before that is reached.
Furthermore, I would like to set the GC percent quite low if I don't have many
allocations beyond that size - but that is less optimal if the heap has to
grow to that size first.

~~~
twotwotwo
And there are other situations where if you _overshoot_ some memory limit, you
get killed or the OOM killer kills some random thing (ugh) or you just have
performance degradation (swapping, crowd out caches, whatever). Often better
to suffer n% more GC CPU usage than to deal with that. So min/max heap are
both important in different scenarios.

A blog post from someone working on .NET suggested "the user shouldn't have to
know about GC internals" as the heuristic for when you have too many knobs,
which I like. The user knows some things that the runtime does not about their
needs in terms of RAM vs. CPU use, and it seems reasonable to have ways to
communicate that.

(One other thing the user knows is their relative priorities for having a
consistent level of CPU used by the GC vs. minimizing the absolute amount of
CPU used. I think Go is hard-coded to target about 25% CPU use during a
collection right now. Not sure we're ever getting that knob, though.)

------
dastx
I'm confused. They say "above code" and whatnot, but I don't see any code in
this article. Is this the same for everyone?

How did they allocate that 10GB byte array?

~~~
karlnowak

    func main() {
        // Create a large heap allocation of 10 GiB
        ballast := make([]byte, 10<<30)

        // Application execution continues ...

        // Keep the ballast reachable for the lifetime of the program
        // (this also avoids an "unused variable" compile error).
        runtime.KeepAlive(ballast)
    }

~~~
bboreham
10<<30 is a lot. Perhaps you meant 2<<30?

~~~
Dylan16807
Are you confusing shifting with exponents? 10<<30 is exactly five times bigger
than 2<<30.

~~~
bboreham
Yes, I did.
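For anyone else who trips on this: `n<<30` multiplies n by 2^30 (one GiB); it is not an exponent. A quick check:

```go
package main

import "fmt"

func main() {
	// Shifting left by 30 multiplies by 2^30, so 10<<30 is 10 GiB.
	fmt.Println(10 << 30)               // 10737418240
	fmt.Println(2 << 30)                // 2147483648
	fmt.Println((10 << 30) / (2 << 30)) // 5
}
```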

------
innagadadavida
Has there been any discussion of, or proposal for, Go simply providing a
free() function? This could recursively free all objects reachable from the
given root pointer. The idea is to avoid hacking around the GC system and
instead give some hints to the runtime to better optimize GC - without
penalizing folks who don't care about this level of performance.

~~~
privateSFacct
If you free something that still has a reference to it, what would be the
effect? Can the GC still free it and then fault on the dangling reference, or
does the free throw an error (with a check while GCing)? If the latter, can
you do a free starting anywhere, or do you need a consistent global state?

~~~
setr
What's wrong with the first option? The runtime invalidates the pointer, then
you just segfault on use and are done with it.

It seems to me that if you're going to play with memory directly, it's
reasonable for the generally-memory-safe runtime to throw out its guarantees
on memory-safety

~~~
zbentley
I think it's reasonable to want that in principle, but in practice it gets
really messy. What if I use a library which returns me a manually-mis-managed
pointer (either due to a library bug or me not setting up manual memory
management correctly when initializing the library) which segfaults on read or
write? That's kind of spooky behavior in a language that prides itself on
simplicity.

There are ways that could be done (for example by making the type system aware
of whether or not something is hand-managed and potentially unsafe) but not
without adding considerable complexity to the language and, presumably, the
implementation.

~~~
Reelin
Doesn't your scenario essentially amount to asking what happens if you use a
poorly written library that does bad things and it breaks your program in
unexpected ways? That can happen in every language I'm aware of, regardless of
safety guarantees. For Go, consider the unsafe.Pointer type.

This commonly recurring idea that we need to somehow banish all insecurity and
risk is flawed in my opinion. When safe is the easiest and most convenient way
of doing something, it will generally be used. What we need are sane defaults,
combined with explicit and unambiguous interfaces for when we do choose to
take manual control. The real issues start to arise when a language exhibits
unexpected behavior, when automatic memory management is opt-in instead of
opt-out, and similar.

------
grandinj
Sounds like go is missing the Java equivalent of -Xms which sets the minimum
heap size before GC is necessary

~~~
chrisseaton
> Sounds like go is missing the Java equivalent of -Xms which sets the minimum
> heap size before GC is necessary

-Xms sets the initial heap size, but GC happens before the heap is exhausted, in most GC configurations. There is a smaller eden in the heap, and GC is needed to evacuate objects out of the eden into the main heap.

------
numlock86
> It turns out that for modern operating systems, sweeping (freeing memory) is
> a very fast operation, so the GC time for Go’s mark-and-sweep GC is largely
> dominated by the mark component and not sweeping time.

That paragraph got me interested. A lot of stuff in this article is based on
this statement. Where can I read more about this? By intuition I would have
guessed that the process of marking is the smaller portion of work ...

~~~
gizmo
The author didn't mention what percentage of objects is getting swept at
every GC stage, but it might be a small percentage of the total heap if the
application is complex.

Sweeping, in typical implementations, also requires locking/unlocking mutexes
and sorting/combining freed memory chunks to combat fragmentation, and that
can be slow.

My hunch is that sweeping is fast because there isn't much garbage (meaning
distinct allocations, not megabytes) per sweep cycle to start with.

Edit: the talk referenced in the article
([https://blog.golang.org/ismmkeynote](https://blog.golang.org/ismmkeynote))
provides a hint. Go has lightweight threads, and if you have many of them you
have to walk all stacks, closures, and all registers of all threads in order
to determine which objects are live. If they are processing millions of
requests per second, they might have a goroutine for every request, and that
might explain why marking is (relatively) expensive in his benchmarks.

------
stiray
Great one, I must say I laughed quite hard at the solution; it is logical and
simple. Good work.

What does annoy me is that GC'd languages don't provide the option of
allocations that are not handled by the GC but still use language primitives,
and/or an option to deallocate them manually. Developers could then decide
whether to leave an allocation to the GC or handle it on their own. I would
certainly love it; in most programs that I write I know exactly what the
lifetime of objects is, and I am leaving them to the GC only for lack of any
other option.

~~~
mister_hn
I think you might want to try modern C++ (C++17 and the soon to be released
C++20), if memory allocation and lifetime is your major concern.

The language has improved so much that it now has better syntax and less
verbosity.

~~~
fileeditview
I think he wants a GCed language that also allows manual memory alloc/dealloc.
Doesn't D offer both?

~~~
mister_hn
Yeah, but what's the point of a GC when you want to do the operations
manually?

~~~
Dylan16807
So that you can put the hottest 5% under [semi-]manual control and let the GC
do the rest.

~~~
stiray
Exactly this. :) I am quite familiar with C++ (no experience in D), but for
quick little projects Go is great. I wouldn't mind having more control over
the GC.

------
senko
> the Visage application which runs on its own VM with 64GiB of physical
> memory, was GC’ing very frequently while only using ~400MiB of physical
> memory

Why would you provision a 64GB machine if you only need less than 1% of that?

Especially given earlier:

> One approach to handle this is to keep your fleet permanently over-scaled,
> but this is wasteful and expensive. To reduce this ever-increasing cost,
> [..]

This already was over-scaled.

~~~
jerf
There aren't any cloud instances of anything with 32 cores but only 4 GB of
RAM. I've got some similarly lopsided systems, because I don't need the RAM.

Even if you physically deploy them, that would be a silly loadout. Take the
bit of insurance out and stick some RAM in it. The cloud providers don't offer
these instances because if you run the numbers they don't really make sense.

------
nirui
So the solution the article is suggesting is to "use a big buffer (and later
slice it for each client(????)) instead of creating a buffer for each
client"?

If the above is true, then I have one question: what if somewhere down the
line, somebody calls append on that buffer?

    
    
    bigBuffer := make([]byte, 1024)
    clientABuffer := bigBuffer[:512]
    clientBBuffer := bigBuffer[512:]

    clientABuffer = append(clientABuffer, []byte("Hello Client 2")...)

    // clientBBuffer now starts with [72 101 108 108 111 32 67 108 105 101
    // 110 116 32 50 0 0 ...] even though we never directly modified it.
    

Could be a downside.

[https://play.golang.org/p/JiVKypGHQdR](https://play.golang.org/p/JiVKypGHQdR)
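For what it's worth, this aliasing can be prevented with Go's three-index slice expression, which caps the sub-slice's capacity so append must allocate a fresh array instead of growing into the neighbour. A small sketch:

```go
package main

import "fmt"

func main() {
	bigBuffer := make([]byte, 1024)

	// The third index caps capacity at 512, so append cannot write past it.
	clientABuffer := bigBuffer[0:512:512]
	clientBBuffer := bigBuffer[512:]

	// Capacity is already full, so append copies to a newly allocated array
	// instead of clobbering clientBBuffer.
	clientABuffer = append(clientABuffer, []byte("Hello Client 2")...)

	fmt.Println(clientBBuffer[0], len(clientABuffer)) // 0 526
}
```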

~~~
jerf
"So the solution the article suggesting is to "Use a big buffer (and later
slice it for each client(????)) instead of create buffer for each client"?"

No, it's actually just "create a big buffer and don't drop the reference".
It's never used for anything except changing some numbers the GC uses to do
its logic. Since it doesn't actually end up in physical RAM, it doesn't
consume significant resources either. It's just a funny-looking way, at the
Go-language level, to twiddle some numbers and make the GC act differently.

Allocating a big slice and handing out chunks of it does have its uses: it's
basically arena allocation, flavored by being used in a GC'd language. But
that's not what this is.

~~~
nirui
Oh, I finally got it. I was wondering why the article does not mention how to
actually _use (read/write)_ that buffer after it's been allocated, other than
just keeping it alive (and doing nothing else).

Thanks for summing it up.

------
earthboundkid
A) This is from April. It's being resubmitted because the URL changed.

B) In a non-memory-managed language, wouldn't you just run into the same
problem, except it's called "heap fragmentation" instead, and malloc has to do
an unreasonable amount of work to find free blocks to use?

~~~
bboreham
B. no, it’s quite subtle but this is just tweaking a housekeeping parameter
inside the garbage collector.

------
threadec
If you're using less than 1% of the physical RAM in the server, and 30% of
your CPU on GC, you might be optimizing for the wrong thing.

~~~
jchw
They effectively did exactly what you’re implying by letting the heap grow
more in exchange for lower GC CPU usage.

------
bitwize
Or you could use Rust and avoid having to use "tricks" like this at all.

Go is chock full of "what were they thinking?!" decisions.

~~~
eleitl
Can you name a few examples?

~~~
arcticbull
Garbage collection, lack of proper generics except for standard library
collections, interface{} and the weird way of distinguishing visibility via
capitalization, and comments that affect code generation. Oh, and blowing the
cobwebs off some archaic Plan 9 assembler instead of using LLVM.

That’s not to say it’s all bad, far from it, but these were my wtf moments
exploring the language.

~~~
dgb23
> Garbage collection

This is hardly a weird choice. Garbage collection gives you the highest
productivity while being memory safe out of the three options (manual, GC,
static).

> lack of proper generics

They have been working on parametric polymorphism for a while now. It is hard
to get right in a language that has readability and simplicity as a main
focus.

> interface{}

I certainly agree on this one. It is a weird feature and often a 'smell' when
found in Go code.

> the weird way of distinguishing visibility via capitalization

It is a weird syntax choice in the sense of being unique/uncommon, but it fits
very well into the readability focus of Go.

> comments that affect code generation

I agree that they should have introduced syntax for this, assuming you mean
compiler directives (or whatever they're called).

> archaic plan 9 assembler instead of using LLVM

One of the goals of the language is to compile really fast, which they
certainly succeed at.

~~~
jerf
"> interface{}

I certainly agree on this one. It is a weird feature and often a 'smell' when
found in Go code."

The feature itself is not a problem. Every major static language has the
equivalent. It's often called something that involves the word "dynamic".

Having programmed in Go for many years now, I don't find myself using it very
often in my own code. I've come to think of this as something said by either
people who have never used Go at all, or people who used Go briefly but
insisted on programming Javascript-in-Go or something. The latter is
definitely a Bad Time... but it's always bad to program X-in-Y. If your code
is shot through with interface{}, you either chose Go for something _way_
outside of its domain, or you are not using it correctly.

"> comments that affect code generation

I agree that they should have introduced syntax for this, assuming you mean
compiler flags (or w/e they're called)."

This is another criticism that I think mostly comes from people with a
checklist criticism set of Go, because in practice, this is of negligible
concern. It doesn't come up often, it isn't proliferating (i.e., it's not like
with every point release we get another two or three new kinds of comments),
it's literally never been an issue of any kind for me in the last six years.
It's a complete non-issue. I am _far_ more annoyed by, say, the fact godoc
doesn't give me basic markdown than comments affecting compilation has ever
annoyed me, and that's just an occasional minor annoyance.

~~~
_ph_
I think interface{} is a great feature, when you want to write code that
safely handles any type. fmt.Printf is a great example of this. As type
information is attached to the value, type safety is fully maintained when
asserting the contained value back to its right type.

Where it is appropriate, it is a very cool thing that you can have fully
dynamically typed variables with all safeties in place. Of course, it
shouldn't be used in place of better abstractions, like specific interfaces or
properly factored code.
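A minimal sketch of that safety: a type switch over an interface{} value recovers the concrete type under runtime checks, rather than by a blind cast (the `describe` helper here is just an illustration).

```go
package main

import "fmt"

// describe recovers the concrete type stored in an interface{} value.
// The type switch is checked at runtime, so a wrong guess can never
// reinterpret memory; it simply falls through to another case.
func describe(v interface{}) string {
	switch x := v.(type) {
	case int:
		return fmt.Sprintf("int: %d", x)
	case string:
		return fmt.Sprintf("string: %q", x)
	default:
		return fmt.Sprintf("other: %T", x)
	}
}

func main() {
	fmt.Println(describe(42))
	fmt.Println(describe("hi"))
	fmt.Println(describe(3.14))
}
```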

