Go memory ballast: How I learnt to stop worrying and love the heap (twitch.tv)
148 points by lelf 3 days ago | 88 comments





I think the takeaway here is not that some corners of Go are not performant (and should be swapped for C++/D etc.), but that there are tools in the Go ecosystem that help you figure out what the issue is.

Until Twitch hit the bottleneck with their service, they were reaping multi-year benefits of faster development using a memory-safe language (a huge gain for security).

Their short- (or mid-) term solution seems decent, and more importantly the Go core team is working on addressing this particular issue (which, given the Go 1.x backward-compatibility promise, makes it really easy to reap all the benefits with each Go release).


"Only if we attempt to read or write to the slice, will the page fault occur that causes the physical RAM backing the virtual addresses to be allocated."

Tiny nitpick. On Linux at least you can read from the slice. New memory allocations are given a copy on write zero page (full of zeroes). So you can easily read that 10GB worth of zeroes out and still memory consumption wouldn't increase.

Only when you write is a page fault issued, and you'll get real backing for it.


I'm really surprised there are no flags to configure GC behaviour in Go. This is fairly standard performance tuning stuff on the JVM which provides a number of options to control GC behaviour, minimum and maximum heap size being some of the basic ones.

There is one flag: GOGC.

It was a deliberate choice not to have as many options as Java.


And then to the expectation of absolutely everybody outside the go team it turns out those flags exist not because of some wanker but because no matter your default they’re going to be pessimal for some users <shocked pikachu face>

Isn't it great? You don't need a degree to operate the thing. When there are esoteric conditions like the one described in the post, the team is usually super responsive to fix them if they're bugs, or give you reasonable and lightweight workarounds when they're not.

In this case, no. And what they’re doing doesn’t sound all that esoteric either. I’ve got 64gb to use but I’m only going to use 450mb and I’m going to burn a ton of CPU to keep it there does sound like a bug. And yeah, latency but sometimes that’s okay.

Agree the JVM is complicated, but you don’t have to be that complicated. A lot of it is legacy anyway.


You always have to tune GC if you're pushing the limits of the machine. If you can't do it by turning knobs, you have to do it by rewriting code.

It's Go's position that the language (and/or runtime) should make this unnecessary for the vast majority of use cases, I'm not sure what they're targeting exactly but I'm sure it's 99% or more. The resulting lack of knobs is actually a feature, and one I'm very glad to have.

I'm also happy to make my code less allocation-happy or more GC-friendly in the extremely rare circumstances when it matters. Like this article!

But I think we probably just work in different environments.


The golang position is to pretend complexity doesn't exist until it becomes a problem (this already happened before for example with the clock issue). This approach is already evident in the overly simplistic design of the language, which ends up pushing complexity into the code, and causing problems with it.

I would much rather make the concise and clear implementation perform better, than take the readability hit and risk of mistakes from rewriting it (probably more than once).

This is sensical for application code, but I think a bad compromise for a language runtime.

How can you read this post about a clear failure case that forced a code hack and think it's better than an environment flag? I really find the Go community's disdain for features confounding.

You don't need a degree to set a max heap.

I think their problem was the minimum heap size; that's why they have 10GB sitting doing nothing, as the GC only doubles/halves the heap size in Go.

The way a gaming engine would do it would be to disable the GC and allocate a large chunk of memory and read/write directly to that memory.

It's a pretty quick hack, not bad, really.


Reality is complex by definition. Just because golang pretends it doesn't exist doesn't make it so. Secondly, the JVM GCs allow selecting the best GC for the job, whether it be throughput, very large heaps (multi-GB to TB sized, which golang fails at), or low latency (ZGC and Shenandoah). Furthermore, the new low latency JVM GCs have very few knobs to set (2-3 last time I checked).

This is a pretty hilarious hack. I'm surprised there's no way to do this within Go runtime.

FWIW, there's a related open issue about adding a minimum heap size for situations like this, where you don't care about wasting a bit more RAM but do care about the GC work: https://github.com/golang/go/issues/23044

It references a patch to add a SetMaxHeap call, which could help in these kinds of situations too.


Yes, it is a bit strange that you cannot tell the GC a few more things about your requirements. The default setup usually works quite fine for me, but I would like to be able to tell the GC to have a certain minimum memory size. If I absolutely know that an application is going to need a certain memory size, the GC doesn't need to do any work before that is reached. Furthermore, I would like to set the GC percent quite low if I don't have many allocations beyond that size - but that is less optimal if the heap has to grow to that size.

And there are other situations where if you overshoot some memory limit, you get killed or the OOM killer kills some random thing (ugh) or you just have performance degradation (swapping, crowd out caches, whatever). Often better to suffer n% more GC CPU usage than to deal with that. So min/max heap are both important in different scenarios.

A blog post from someone working on .NET suggested "the user shouldn't have to know about GC internals" as the heuristic for when you have too many knobs, which I like. The user knows some things that the runtime does not about their needs in terms of RAM vs. CPU use, and it seems reasonable to have ways to communicate that.

(One other thing the user knows is their relative priorities for having a consistent level of CPU used by the GC vs. minimizing the absolute amount of CPU used. I think Go is hard-coded to target about 25% CPU use during a collection right now. Not sure we're ever getting that knob, though.)


So, isn't this a strong argument against Go? At the point that you need to start devising clever workarounds to stop bad behavior from your runtime, it's time to find a new language.

If you know of a language that has absolutely no unexpected behavior in its runtime under any workload, please let us know what it is!

The problem isn't the unexpected behavior, it's needing "clever workarounds" to fix it.

And they don't even have a workaround for the GC assist being way too aggressive, they're just sucking it up and letting the latency exist.


Baremetal assembly


Bugs are unintentional and can be (and are) fixed.

Yeah, the book on performance concerns for assembly on Intel CPUs is pretty brief, only a few hundred pages: https://software.intel.com/en-us/download/intel-64-and-ia-32...

Modern C++, since there is no 'runtime' (and don't say linked in functions are a 'runtime'). If you have an example of where you think this breaks down I'll be happy to come up with a solution.

So when you throw an exception in C++, and the stack is being unwound, it is not being unwound by your C++ runtime? And the C++ standard library, specified in the same standard as the language, that you have, that has code present at runtime, is not part of the runtime?

Someone should let Microsoft know they need to rename their “C++ Runtime Library” package.


They're all optional. Don't throw exceptions and they won't unwind. The standard library is optional; there are good alternatives like ATL and EASTL. Microsoft's package is optional too: change a compiler switch and your binary will stop using or requiring that package.

However, manual memory management makes development more expensive. Especially if you don't crunch numbers but parse strings. C++ probably ain't a good replacement for Go. For the last decades people and whole industries have been migrating the other way, from C++ to higher level memory safe languages, initially Java and C# then others followed.


Sigh.

I avoided mentioning that the C++ runtime was optional because I was trying not to be pedantic and I assumed in context it would be understood that this was irrelevant. I think I should’ve just made my point earlier, but:

- This all started with someone mentioning that C++ doesn’t have ‘bad runtime behavior’ because it doesn’t have a runtime, but this is false. It has a runtime, or to be exact, a specification for one, and I think it is safe to say a vast, perhaps extreme, majority of C++ developers are using it.

- If you opt to not use the runtime, then your runtime behavior is dictated somewhere else, but runtime behaviors don’t go away. In freestanding, you may be your own runtime, but then your runtime behavior is defined by the machine you’re running on. You just get to choose what layer of abstraction you are sitting on.

- edit: And also, as a point I forgot to mention initially, I don’t really feel like freestanding C++ vs standard Go is an apples-to-apples comparison.

I think the point was that C++ has less runtime behavior, since it doesn't have a scheduler or garbage collector, but extrapolating that to no runtime is wrong even if the runtime is optional.


As far as I understood the context, it was about language runtimes. Underneath all languages there are usually an OS kernel, drivers and hardware. They all have a non-trivial amount of unexpected behavior, especially under load. But these apply to code written in all languages, almost equally.

BTW, C++ does have a scheduler. Optional like the rest of the runtime, but it’s 1 line of code away in all modern compilers, that line starts with #pragma omp.


Yes, that seems about right. There’s always behavior at runtime, and your language shields it away from you to varying degrees. It is the same for all languages, which is exactly what I am trying to drive home.

However, even if you go to the extreme of not using any features that require runtime support, if you are using hosted C++, in practice you still have one bit of runtime: the entrypoint. Technically, an operating system could implement the bits that call main, but to the best of my knowledge none of them ever have. So every compiled binary from every hosted C++ implementation begins at the runtime library. (Admittedly, normally the C runtime library, since C++ doesn’t differ here, but that is just another layer deep of pedantics. In practice, everyone has a runtime.)

Nitpick: As far as I am aware OpenMP is not part of the language itself but an extension. But yeah, you could argue that is runtime scheduling in C++, I think.


If you don’t have static initializers you should just be able to set your entrypoint to be main, no?

Not really, because the entrypoint of executables is OS dependent and may not resemble the “main” function signature. I actually don’t know what Linux dumps on the stack/registers before calling your entrypoint, though, to be honest.

On a freestanding environment, the signature of main doesn't have to be the standard "int main(int argc, char *argv[])". Oh, and on Linux the arguments to main are put on the stack, which you can see is handled by _start: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86...

This isn't about freestanding C++. There's very little in the C++ standard library that can cause exceptions. Off the top of my head, it's throwing new (which nobody relies on anyway because of Linux OOM behavior); stuff like vector::at() that you have to use explicitly; and iostreams that are so horrible in general that they're usually avoided altogether.

There are tons of things that throw exceptions in the standard library with no alternative. std::bad_alloc can get thrown from half the standard library (and it's a lot less implausible than new throwing, because of address space exhaustion on 32-bit). Any constructor that takes input that can be invalid has to throw (e.g. giving a malformed regex to std::regex).

The really obnoxious ones are std::stoi, stol, etc. which were added in C++11 as a replacement for strtol and friends, but throw on overflow/invalid input instead of returning a result type or a bool with out parameter.


There is heap defragmentation.

About all it implies about Go is that the language as an integrated system isn't yet fully polished. There is a corner case of strange behaviour due to aggressive GC when the absolute memory requirements of the program aren't large. It can be worked around by making them larger by a ballast allocation. This is not "time to find a new language", it's a small issue with a known fix and there's a proposal to repair it at source in the pipeline.

A weak argument, maybe. The algorithm the pacer used seems reasonable in 99% of cases. Here's a fairly extreme case where it didn't.

But further, it seems fairly easy to fix and indeed it looks like the Go team are doing just that.


It's hacky but also simple to implement. Should the go gc improve to the point that it's unnecessary, it seems simple to remove as well. Wouldn't bother me much.

> Strong argument against Go

No

Weak argument against Go ?

No

Let me know which language was perfect

That there is a proposal to work this out and relevant people are working on this makes a strong argument for go IMO.

https://github.com/golang/go/issues/23044


I'm confused. They say "above code" and whatnot, but I don't see any code in this article. Is this the same for everyone?

How did they allocate that 10GB byte array?


    package main

    import "runtime"

    func main() {
        // Create a large heap allocation of 10 GiB
        ballast := make([]byte, 10<<30)

        // Application execution continues

        // Keep ballast reachable for the life of the program, so it
        // compiles (no unused variable) and the GC pacer keeps
        // counting it as live heap
        runtime.KeepAlive(ballast)
    }

10<<30 is a lot. Perhaps you meant 2<<30?

Are you confusing shifting with exponents? 10<<30 is exactly five times bigger than 2<<30.

Yes, I did.
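
Since the shift trips people up: `n<<30` means n times 2^30 (i.e. n GiB), not 2^n of anything. For example:

```go
package main

import "fmt"

func main() {
	// A left shift by 30 multiplies by 2^30 (one GiB).
	fmt.Println(int64(1) << 30)  // 1073741824
	fmt.Println(int64(2) << 30)  // 2147483648
	fmt.Println(int64(10) << 30) // 10737418240, five times 2<<30
}
```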

Has there been any discussion or proposals to Go to simply provide a free() function? This could recursively GC up all objects starting with the given root pointer. The idea is to avoid hacking around the GC system and instead give some hints to the runtime to better optimize GC - without penalizing folks that don’t care about this level of performance.
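
(For context: Go has no free() today; the closest existing tool is sync.Pool, which lets you recycle allocations explicitly without changing GC semantics. A rough sketch:)

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool recycles 4 KiB buffers: Get reuses a previously Put buffer
// when one is available, otherwise it calls New.
var bufPool = sync.Pool{
	New: func() interface{} { return make([]byte, 4096) },
}

func main() {
	buf := bufPool.Get().([]byte)
	// ... use buf ...
	bufPool.Put(buf) // hand it back for reuse instead of leaving it to the GC
	fmt.Println(len(buf)) // prints 4096
}
```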

If you free something that still has a reference to it, what would be the effect? Can the GC still free it and then fault on the reference, or does the free throw an error (and you do a check while GCing)? If the latter, can you do a free starting anywhere or do you need to have a good global state?

What's wrong with the first option? Runtime invalidates the pointer, and then just segfault on use and be done with it?

It seems to me that if you're going to play with memory directly, it's reasonable for the generally-memory-safe runtime to throw out its guarantees on memory-safety


I think it's reasonable to want that in principle, but in practice it gets really messy. What if I use a library which returns me a manually-mis-managed pointer (either due to a library bug or me not setting up manual memory management correctly when initializing the library) which segfaults on read or write? That's kind of spooky behavior in a language that prides itself on simplicity.

There are ways that could be done (for example by making the type system aware of whether or not something is hand-managed and potentially unsafe) but not without adding considerable complexity to the language and, presumably, the implementation.


Doesn't your scenario essentially amount to asking what happens if you use a poorly written library that does bad things and it breaks your program in unexpected ways? That can happen in every language I'm aware of, regardless of safety guarantees. For Go, consider the unsafe.Pointer type.

This commonly recurring idea that we need to somehow banish all insecurity and risk is flawed in my opinion. When safe is the easiest and most convenient way of doing something, it will generally be used. What we need are sane defaults, combined with explicit and unambiguous interfaces for when we do choose to take manual control. The real issues start to arise when a language exhibits unexpected behavior, when automatic memory management is opt-in instead of opt-out, and similar.


That was my question; the first option seems simplest and fair. If you can write code and manually manage things, that's all on you.

Rust's `drop<T>(_: T)` function is really neat in that it's basically an empty function. Since ownership of the argument moves to the drop function it explicitly leaves scope and gets torn down.

You can do something similar for go where you have a function that marks your locally-scoped variable dead, the compiler can track to make sure you don't use it afterwards, and if it's not the last reference to the underlying, it's not deallocated. If it is, it gets culled immediately.


The “toilet closure” also works.

  |_|()

Sounds like go is missing the Java equivalent of -Xms which sets the minimum heap size before GC is necessary

> Sounds like go is missing the Java equivalent of -Xms which sets the minimum heap size before GC is necessary

-Xms sets the initial heap size, but GC happens before the heap is exhausted, in most GC configurations. There is a smaller eden in the heap, and GC is needed to evacuate objects out of the eden into the main heap.


> It turns out that for modern operating systems, sweeping (freeing memory) is a very fast operation, so the GC time for Go’s mark-and-sweep GC is largely dominated by the mark component and not sweeping time.

That paragraph got me interested. A lot of stuff in this article is based on this statement. Where can I read more about this? By intuition I would have guessed that the process of marking is the smaller portion of work ...


The author didn't mention what percentage of objects is getting swept in each GC cycle, but it might be a small percentage of the total heap if the application is complex.

Sweeping, in typical implementations, also requires locking/unlocking mutexes and sorting/combining freed memory chunks to combat fragmentation, and that can be slow.

My hunch is that sweeping is fast because there isn't much garbage (meaning distinct allocations, not megabytes) per sweep cycle to start with.

Edit: the talk referenced in the article (https://blog.golang.org/ismmkeynote) provides a hint. Go has lightweight threads, and if you have many of them you have to walk all stacks, closures, and all registers of all threads in order to determine which objects are live. If they are processing millions of requests per second, they might have a goroutine for every request, and that might explain why marking is (relatively) expensive in his benchmarks.


I’d also be interested in reading more.

My guess would be all the pointer chasing during marking is expensive and since Go doesn’t use a compacting GC there isn’t anything to do other than push freed memory back into the allocators internal data structures during a sweep.


Great one, I must say I laughed quite hard at the solution; it is logical and simple. Good work.

What does annoy me is that GC languages don't provide the option of allocations that are not handled by the GC but still use language primitives, and/or the option to deallocate them manually. Developers could decide whether to leave an allocation to the GC or handle it on their own. I would certainly love it; in most programs that I write I know exactly what the lifetime of objects is, and I am leaving them to the GC only for lack of any other option.


C# is "a GC language" and generally has quite good support for explicit lifetime management, through stack allocated value types as well as memory pools and whatnot.

Here is a list of GC language that provide capabilities to handle allocations outside GC heap,

Mesa/Cedar, Modula-2+, Modula-3, Oberon, Oberon-07, Oberon-2, Active Oberon, Zonnon, D, Nim, C#, VB.NET, F#, Standard ML (with MLKit), Eiffel, Swift.

Just the list I tend to keep in mind, there are quite a few others.


I think you might want to try modern C++ (C++17 and the soon to be released C++20), if memory allocation and lifetime is your major concern.

The language has improved so much that it now has much better syntax and far less verbosity.


I think he wants a GCed language that also allows manual memory alloc/dealloc. Doesn't D offer both?

yeah but what's the point of a GC when you want to do manually the operations?

So that you can put the hottest 5% under [semi-]manual control and let the GC do the rest.

Exactly this. :) I am quite familiar with C++ (no experience in D), but for fast little projects Go is great; I wouldn't mind having more control over the GC.

C# provides this now with all the new Memory and Span constructs to manage memory seamlessly. You can build your own memory pools and allocators to manage lifetimes.

> the Visage application which runs on its own VM with 64GiB of physical memory, was GC’ing very frequently while only using ~400MiB of physical memory

Why would you provision a 64GB machine if you only need less than 1% of that?

Especially given earlier:

> One approach to handle this is to keep your fleet permanently over-scaled, but this is wasteful and expensive. To reduce this ever-increasing cost, [..]

This already was over-scaled.


There aren't any cloud instances of anything with 32 cores but only 4 GB of RAM. I've got some similarly lopsided systems, because I don't need the RAM.

Even if you physically deploy them, that would be a silly loadout. Take the bit of insurance out and stick some RAM in it. The cloud providers don't offer these instances because if you run the numbers they don't really make sense.


So the solution the article is suggesting is to "use a big buffer (and later slice it for each client(????)) instead of creating a buffer for each client"?

If the above is true, then I have one question: what if somewhere down the line, somebody calls append on that buffer?

    bigBuffer := make([]byte, 1024)
    clientABuffer := bigBuffer[:512]
    clientBBuffer := bigBuffer[512:]

    clientABuffer = append(clientABuffer, []byte("Hello Client 2")...)

    // clientBBuffer will now be [72 101 108 108 111 32 67 108 105 101 110 116 32 50 0 0 .... even we didn't directly modify it.
Could be a downside.

https://play.golang.org/p/JiVKypGHQdR
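
One way to guard against that aliasing, for what it's worth, is Go's full (three-index) slice expression, which caps the sub-slice's capacity so an append is forced to reallocate. A sketch:

```go
package main

import "fmt"

func main() {
	bigBuffer := make([]byte, 1024)

	// The third index caps capacity at 512, so an append that outgrows
	// the sub-slice copies to a fresh array instead of clobbering
	// clientBBuffer's half of bigBuffer.
	clientABuffer := bigBuffer[0:512:512]
	clientBBuffer := bigBuffer[512:]

	clientABuffer = append(clientABuffer, []byte("Hello Client 2")...)
	fmt.Println(clientBBuffer[0], len(clientABuffer)) // 0 526
}
```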


"So the solution the article suggesting is to "Use a big buffer (and later slice it for each client(????)) instead of create buffer for each client"?"

No, it's actually just "create a big buffer and don't drop the reference". It's never used for anything except changing some numbers the GC uses to do its logic. Since it doesn't actually end up in physical RAM, it doesn't consume significant resources either. It's just a funny-looking way at the Go-language level to twiddle some numbers to make the GC act differently.

Allocating a big slice and handing out chunks of it does have its uses, basically, arena allocation flavored by being used in a GC'd language. But that's not what this is.
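
For contrast, here's a bare-bones sketch of that arena pattern (the `arena` type and its method are made up for illustration):

```go
package main

import "fmt"

// arena hands out chunks of one big up-front allocation, so the GC
// sees a single long-lived object instead of many small ones.
type arena struct {
	buf []byte
	off int
}

// alloc returns the next n-byte chunk, capacity-capped so appends
// can't bleed into a neighbouring chunk; nil when exhausted.
func (a *arena) alloc(n int) []byte {
	if a.off+n > len(a.buf) {
		return nil
	}
	chunk := a.buf[a.off : a.off+n : a.off+n]
	a.off += n
	return chunk
}

func main() {
	a := &arena{buf: make([]byte, 1<<20)} // one 1 MiB backing slice
	b := a.alloc(256)
	fmt.Println(len(b), cap(b)) // 256 256
}
```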


Oh I got it finally. I was wondering why the article does not mention how to actually _use (read/write)_ that buffer after it's been allocated other than just keeping it alive (and do nothing else).

Thank you for summing it up.


A) This is from April. It's being resubmitted because the URL changed.

B) In a non-memory managed language, wouldn't you just run into the same problem except it's called "heap fragmentation" instead and malloc has to do an unreasonable amount of work to find free blocks to use?


No, the issue here is not fragmentation or allocation itself; it's that the GC heuristics make it work far too aggressively to keep memory usage down (burning CPU the application could use for productive things) despite only a small fraction of the machine's memory being used.

B. no, it’s quite subtle but this is just tweaking a housekeeping parameter inside the garbage collector.

If you're using less than 1% of the physical RAM in the server, and 30% of your CPU on GC, you might be optimizing for the wrong thing.

They effectively did exactly what you’re implying by letting the heap grow more in exchange for lower GC CPU usage.

Or you could use Rust and avoid having to use "tricks" like this at all.

Go is chock full of "what were they thinking?!" decisions.


That's one of the disadvantages of a language that's easy to learn: it's also easy to criticise. While some of the concepts of Rust are so complicated that most people don't ask themselves "what were they thinking?", but "I guess they must have had some reason to do it this way, I could read up on Rust to try to find out why" - and never actually get around to doing it...

Can you name a few examples?

Garbage collection, lack of proper generics except for standard library collections, interface{} and the weird way of distinguishing visibility via capitalization, and comments that affect code generation. Oh, and blowing the cobwebs off some archaic plan 9 assembler instead of using LLVM.

That’s not to say it’s all bad, far from it, but these were my wtf moments exploring the language.


> Garbage collection

This is hardly a weird choice. Garbage collection gives you the highest productivity while being memory safe out of the three options (manual, GC, static).

> lack of proper generics

They are working on parametric polymorphism, and have been for a while now. It is hard to get right in a language that has readability and simplicity as a main focus.

> interface{}

I certainly agree on this one. It is a weird feature and often a 'smell' when found in Go code.

> the weird way of distinguishing visibility via capitalization

It is a weird syntax choice in the sense of being unique/uncommon, but it fits very well with the readability focus of Go.

> comments that affect code generation

I agree that they should have introduced syntax for this, assuming you mean compiler flags (or w/e they're called).

> archaic plan 9 assembler instead of using LLVM

One of the goals of the language is to compile really fast, which they certainly succeed at.


"> interface{}

I certainly agree on this one. It is a weird feature and often a 'smell' when found in Go code."

The feature itself is not a problem. Every major static language has the equivalent. It's often called something that involves the word "dynamic".

Having programmed in Go for many years now, I don't find myself using it very often in my own code. I've come to think of this as something said by either people who have never used Go at all, or people who used Go briefly but insisted on programming Javascript-in-Go or something. The latter is definitely a Bad Time... but it's always bad to program X-in-Y. If your code is shot through with interface{}, you either chose Go for something way outside of its domain, or you are not using it correctly.

"> comments that affect code generation

I agree that they should have introduced syntax for this, assuming you mean compiler flags (or w/e they're called)."

This is another criticism that I think mostly comes from people with a checklist criticism set of Go, because in practice, this is of negligible concern. It doesn't come up often, it isn't proliferating (i.e., it's not like with every point release we get another two or three new kinds of comments), it's literally never been an issue of any kind for me in the last six years. It's a complete non-issue. I am far more annoyed by, say, the fact godoc doesn't give me basic markdown than comments affecting compilation has ever annoyed me, and that's just an occasional minor annoyance.


I think interface{} is a great feature, when you want to write code that safely handles any type. fmt.Printf is a great example for this. As type information is attached with the value, type safety is fully maintained when casting the contained value back to its right type.

Where it is appropriate, it is a very cool thing that you can have fully dynamically typed variables with all safeties in place. Of course, it shouldn't be used in place of better abstractions, like specific interfaces or properly factored code.
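
A minimal sketch of that safety: the dynamic type travels with the value, so a type switch recovers it with no unchecked casts (`describe` here is a made-up helper, not anything from the standard library):

```go
package main

import "fmt"

// describe accepts any value; the type switch safely recovers the
// dynamic type, which is what separates interface{} from a raw pointer.
func describe(v interface{}) string {
	switch x := v.(type) {
	case int:
		return fmt.Sprintf("int: %d", x)
	case string:
		return fmt.Sprintf("string: %q", x)
	default:
		return fmt.Sprintf("other: %T", x)
	}
}

func main() {
	fmt.Println(describe(42))   // int: 42
	fmt.Println(describe("hi")) // string: "hi"
	fmt.Println(describe(3.14)) // other: float64
}
```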


> It's a complete non-issue.

It is an issue for someone who doesn't write Go code often but occasionally reads/debugs it. At least the first couple of times I came across it, it made me scratch my head for a couple of hours, because I couldn't see what was causing the issue, it being 'hidden' in the comments. Which is fine. It just isn't in line with the general premise of Go's language design, so I wasn't even expecting it. So in a sense you are right!



