
Getting to Go: Garbage collection and runtime issues - ingve
https://blog.golang.org/ismmkeynote
======
kjeetgill
Now this is the kind of post I come to HN for. Deeply technical explanations
of real expert work!

I'm particularly keen on some of the details around goroutines since Project
Loom hopes to bring something similar to the JVM and it's nice to see what the
design space looks like.

Another thing is that Go sometimes gets dinged for not being modern enough in a
bunch of different directions, e.g. no Rust-like immutability/borrowing,
generics, or generational GC, but there is a lot of thought put into what they
are working on and where they want to focus the language.

------
steveklabnik
There are so many good quotes in this article. Two of my favorites:

"It isn't that the generational hypothesis isn't true for Go, it's just that
the young objects live and die young on the stack."

"... we are led to the conclusion that memory, or at least chip capacity, will
follow Moore's law longer than CPUs"

Kudos to the Go team!

~~~
ernst_klim
What? Are Go interfaces and strings stack allocated? If not, this assertion
looks dubious to me.

~~~
colonelxc
Go uses escape analysis at compile time to determine where to allocate each
variable. If a variable is shown not to 'escape' the current goroutine's
stack, then it can be allocated on the stack. Otherwise the value is said to
'escape' to the heap.

You can see what the escape analysis has determined for each variable by
building like so:

    go build -gcflags "-m"

Unfortunately it can be hard to know just by looking at code whether something
will be stack or heap allocated. You can allocate a struct and take its
pointer, and have it still end up on the stack, for instance (which is good).
You can also have seemingly obvious flows for a variable that the analyzer
can't reason about, so it escapes when you didn't think it would. You really
just have to run the tool if you want to be sure.

~~~
ychen306
I am not disagreeing with you but am confused on the issue. Java also has
escape analysis, so I wonder what makes the difference here.

~~~
bbatha
Most Go data consists of value types that can be assumed to be on the stack,
unless you introduce a pointer explicitly or use one of the reference-ish
types like interfaces, strings, slices, or maps. It's very easy for both the
programmer and the compiler to see when data escapes.

Java on the other hand is pathologically bad for escape analysis: outside of
ints and floats, literally every object in Java is presumed to be on the heap.
Even C# typically has less heap pressure than Java because of its richer value
type options, and not type erasing greatly improves the JIT's escape analysis.
After sufficient JITing, Java may reach Go's level of garbage generation, or
it may not: it's not deterministic.

~~~
mike_hearn
That's garbled in several ways I'm afraid.

Firstly, type erasure and escape analysis are unrelated. Whether an object
escapes or not has nothing to do with things like whether it's a List<Integer>
or a List<int>.

Secondly .NET doesn't do escape analysis at all:

[https://github.com/dotnet/coreclr/issues/1784](https://github.com/dotnet/coreclr/issues/1784)

Thirdly HotSpot will happily escape analyze and then scalar replace regular
objects, it isn't limited to ints and floats. It can make a big difference in
many situations.

Fourthly and finally, if it were "easy to see where data escapes" in Go there
wouldn't need to be tools to help the programmer find out, and Go developers
wouldn't spend time manually tweaking to try and ensure the optimisation
works.

~~~
jules
Type erasure and escape analysis are not entirely unrelated. If you have an
array of pairs and store a pair into that array then the pair is considered to
escape in Java because you're actually storing a pointer to your original
pair. In C# that array can store a true array of pairs rather than an array of
pointers to pairs. A pair you store gets copied into the array, so the
original pair does not have to be considered to escape and can be stored on
the stack and deallocated when the function exits.

Although C# does not have escape analysis, its value type vs reference type
distinction serves roughly the same purpose, but under manual control of the
programmer.

------
pcwalton
I still don't see any comparison of Go's current collector with a generational
GC _with bump allocation in the nursery_. The article states that copying
would have been too much work to implement in 2014. Fair enough. But that's an
engineering decision specific to Go's circumstances. It does not necessarily
support forgoing generational garbage collection in general.

The primary advantage of generational GC is the huge throughput advantage that
bump allocation in the nursery gives you. If you're not comparing against such
a GC, then you're not properly comparing non-generational GC to generational
GC. The write barriers you need reduce throughput, but they're ordinarily
counterbalanced by the throughput gains you get from bump allocation. Of
course you will take a throughput loss if your generational GC is non-copying;
that's why people don't usually deploy non-copying generational GCs.

~~~
barrkel
That's not my understanding of the primary win from generational GC. You can
use bump allocation with an old-fashioned two-space GC. In fact that's what
the allocate() function on
[https://en.wikipedia.org/wiki/Cheney%27s_algorithm](https://en.wikipedia.org/wiki/Cheney%27s_algorithm)
does.

Rather it's assuming the infant mortality hypothesis - that the probability of
an allocation being garbage is inversely proportional to its age. Thus you
split your allocations into three logical buckets. The first bucket is recent
allocations that are super quick to trace (e.g. no bigger than CPU cache), the
third bucket is long-lived allocations ideally reachable by global roots
rather than short-lived stack frames; and the second bucket is all the stuff
that survives collections on the first bucket (hopefully only rooted by code
in flight on the stack) and you want to keep out of the third bucket because
the third bucket is so damn expensive to trace.

You need write barriers for generational GC, but you need them for concurrent
GC anyway.

Generational GC is ideal for servers that have a predictable amount of scratch
allocation per request. If first and second buckets are sized correctly and
collected at the right time, and the server never makes anything reachable
from the third bucket (like a cache - you should use reusable serialized
arrays for your caches in generational GC languages, people! And not weak
references!), then you're in a great place. It approaches arena allocation in
terms of efficiency. I've written app servers in .NET back in the mid 2000s
that barely hit 2% of CPU time in GC following these principles, and they
allocated pretty freely.

I think generational GC outside of a server request loop context, or with
wildly variable allocation profiles, is less great. Not bad mind, but the
generational infant mortality hypothesis, and more importantly, lack of middle
age death, is less reliable.

(I wrote this before I finished with the actual article. The point in the
article about avoiding write barriers with hashing is interesting in
particular - somewhat surprised it scales to 100+G heaps, if indeed it does.)

~~~
pcwalton
Yeah, it depends on how you weight the improved speed of minor collections
(latency) vs. the improved performance of allocation itself (throughput).

(I don't consider pure semispace GCs to be practical because of memory usage.
One benefit of generational GC is that it makes Cheney scans practical by
confining the semispace collection to TLABs or similar, which are by their
nature small.)

------
collinf
What an amazing read. These folks really pulled off some fantastic engineering
without a JIT. It will be interesting to see if the team decides to add
additional knobs to configure the Go GC for different workloads like Java has
now.

~~~
kjeetgill
Go has one BIG advantage compared to Java with respect to both JIT and GC: Go
has structs. These introduce a little more complexity that Java chose
(reasonably) to avoid by making everything either a primitive or a reference;
Java has no "direct" objects or explicit pointers like Go does.

This may muddy up the language a little compared to Java, but it also gives
you a lot more tools to work your code around the runtime's limitations. And
that's a perfectly fine design point by me.

If you're going to have platform limitations (no JIT, simple GC) then I need
reasonable tools for when I hit them. It's much easier to avoid frequent
allocations or implement arenas in Go. This takes the pressure off the runtime
to be one-size-fits-all.

I know, I know, Java is getting Value Types some day! We'll see where that
lands.

~~~
staticassertion
It's then interesting to compare to C#, which also can do stack allocation,
but still chooses a generational GC.

~~~
kjeetgill
I knew it had them, but I have no C# experience. Does idiomatic code use value
types, or are they only for more niche optimizations?

~~~
pjmlp
All performance critical data structures are value types.

Meaning int, long, char, bool, ..., points, rectangles, pairs, colours,
enumerations.

Then generics are reified, so each value-type instantiation gets its own
specialized copy.

Additionally, you can explicitly allocate value types and arrays on the stack
and there are APIs for manual memory management.

------
kodablah
So they had to work through a few algorithms and tweaks. I wonder if there
would be value in exposing the GC (and allocator) interface so others can
implement one without recompiling the language. I would love to toy with a
pluggable GC. Last I looked at .NET's when it was open sourced, it was a
gargantuan C++ file. I know LLVM has it somewhat pluggable. We can see a lot
of Go's GC work at
[https://github.com/golang/go/blob/master/src/runtime/mgc.go](https://github.com/golang/go/blob/master/src/runtime/mgc.go)
but at a quick glance, I couldn't see how I could plug in an alternative
implementation. Surely they had to build an experimentation abstraction to
test their ideas, right? Or are GCs just so specialized, using so much
internal knowledge, that extracting a common interface is unreasonable?

Also,

> The math is absolutely fascinating, ping me for the design docs

Ping! Publish them please.

~~~
mike_hearn
The latest versions of OpenJDK have a pluggable GC interface, at the source
level at least - you can't literally drop a DLL into a directory and load it
dynamically, but the source code is modularised quite well and there are
several different GC engines available to be chosen at startup.

~~~
kodablah
Good to know. I knew I could choose different collectors at startup via java
args, but I was unaware there was a source-level interface. I see C++ code at
[0] and some adjacent impls. Is there a single C-level .h file, or is it C++
only? Any trivial hello-world impl?

0 -
[http://hg.openjdk.java.net/jdk/jdk11/file/a0de9a3a6766/src/h...](http://hg.openjdk.java.net/jdk/jdk11/file/a0de9a3a6766/src/hotspot/share/gc/shared)

~~~
mike_hearn
The hello world collector is called Epsilon. Epsilon is the simplest possible
collector: it never collects at all.

The next one up in complexity is called the serial collector. It's a stop the
world mark/sweep design straight out of the 1980s, but it can be quite
effective for small programs like command line tools, simple GUI apps etc
because there's no heap overhead and no runtime overhead when not collecting.

After that you get into the more advanced algorithms. In order they'd be
parallel, CMS, G1, Shenandoah/ZGC.

The different engines can be found one level up, here:

[http://hg.openjdk.java.net/jdk/jdk11/file/a0de9a3a6766/src/h...](http://hg.openjdk.java.net/jdk/jdk11/file/a0de9a3a6766/src/hotspot/share/gc/)

The directory you were looking at is code shared between the collectors.

~~~
kodablah
Ah yes, I saw the adjacent impls, but didn't check deeper to see that Epsilon
was exactly what I was looking for. Thanks. I see that sadly even the
interfaces are GPL, as are the impls. I am not sure the classpath exception
applies to the interfaces. And I fear even reading/learning from the impls
lest I want to use my GPL-gained knowledge on a distributed, non-GPL project.

~~~
mike_hearn
GPL doesn't affect knowledge, only code. You can go work on proprietary
software using things you learned by reading GPLd code, don't worry about
that.

------
qazwe
With all the improvements to the GC, is Go ready in 2018 for developing
real-time AAA games? In the past there have been discussions suggesting Go is
not well suited for game development (GC latency).

~~~
jy3
As previously stated, Go for video game servers feels like a no brainer.

~~~
koffiezet
That sounds a bit backwards: you would need to implement the network protocol
twice and keep the implementations in sync, which can be very tricky. Having a
common code-base for parsing the network protocol into internal game structure
objects is a huge advantage...

~~~
pjmlp
A big chunk of AAA game servers is already implemented in Java, .NET, or
Erlang.

Not all of them use C++ on the server side.

~~~
koffiezet
Many of them use C++ libs with language bindings for 'common' code though. And
that happens to be one of the weakest points of Go...

~~~
pjmlp
Yeah, that could be the case yes.

------
skybrian
I wonder if they'd consider changing the bit that indicates whether a word is
a pointer or not based on the value rather than the type?

Interfaces in Go seem pretty inefficient for unions of small types. If you
have a slice of interfaces, it's all pointers, even if many of the values fit
within a word and could be stored inside the interface.

~~~
weberc2
IIRC they originally were stored inside of the interface, but then the GC
would have to check whether they were a value or a pointer and this switching
was too costly.

------
abacate
The article is indeed interesting when it gets to the GC/allocation part and
it's a nice read, but it really gets C++ wrong when talking about
values/references. C++ is not a "reference-oriented language", in the same way
that it is not an object-oriented language.

For instance, what restricts one from doing:

    int blah(myobj a);

Instead of:

    int blah(myobj &a);

to pass your myobj by value?

The _usual_ argument is efficiency - since "myobj" may be big and have an
expensive copy constructor - and you may happen to see this pattern commonly
used in C++ programs, but restrictions? There are none. C++ even makes it
possible to enforce coherency with copy/assignment constructors, and modern
C++ supports the more efficient move semantics as well.

So, the point is: nothing in the language restricts you from passing things by
value. I'd go further and say that blindly passing things by reference is bad
practice and detrimental for simple objects, since you now have to consider
the lifetime/ownership of the reference you passed along, which will probably
complicate an otherwise simple piece of code.

The argument about fields in memory also doesn't make much sense. How is that
different from C++? Keeping references/pointers in structures is _possible_
but not something one has to do.

Otherwise the article is great; I just think it misses the point when
comparing to "C and C++", since they are not the same language, even though
the syntax is similar and there is some level of compatibility.

------
incadenza
I find this stuff fascinating but so much of it went over my head. Can anybody
point me in the right direction to read more about the relevant topics? Is
this standard material in compiler design, or another related field?

~~~
pcwalton
My favorite resource is:
[http://memorymanagement.org/](http://memorymanagement.org/)

Its glossary is invaluable. Even though the site is old, GC has been around
for decades and the state of the art hasn't changed much since it was written.
It's opinionated in a good way, explaining why production GCs work as they do.

~~~
incadenza
Awesome! Thanks.

------
riobard
Does anyone have comparable GC latency numbers comparing Go vs Java as of
2018?

~~~
nvarsj
It’s not even a contest on the latency front. Go's GC targets sub-ms pauses;
HotSpot easily hits hundreds of ms. A well-tuned HotSpot can reach tens-of-ms
pauses on average, with a long tail of 100 ms+. This is on typical apps.

~~~
pron
1\. This is not due to GC design but to language semantics (value types, which
aren't yet available in Java).

2\. In September '18, ZGC will be available in OpenJDK on Linux/x86 (available
today in early access), which also targets sub ms pauses (and guarantees under
10ms):
[https://wiki.openjdk.java.net/display/zgc/Main](https://wiki.openjdk.java.net/display/zgc/Main)

Note that at some point, worst-case latency becomes far less meaningful than
throughput, because unless running on a realtime OS, the OS introduces bigger
pauses, and it makes no sense for the GC to try and beat them.

~~~
ngrilly
> This is not due to GC design but to language semantics (value types, which
> aren't yet available in Java).

The fact that Go has value types helps a lot, but the sub-0.5 ms latency is
mainly the result of GC design, as explained in the discussed article
(especially the work on eliminating stop-the-world pauses as much as
possible).
> Note that at some point, worst-case latency becomes far less meaningful than
> throughput, because unless running on a realtime OS, the OS introduces
> bigger pauses, and it makes no sense for the GC to try and beat them.

This is already said in the article.

~~~
pron
Because you usually pay for latency with throughput, you can afford the low
latency achieved with a simpler design only if you have less concurrent work.

~~~
masklinn
Go is not magic, and does in fact pay for latency with throughput:
[https://www.reddit.com/r/golang/comments/5j7phw/modern_garba...](https://www.reddit.com/r/golang/comments/5j7phw/modern_garbage_collection/dbe958e/)

> Go: 67 ms max, 1062 pauses, 23.6 s total pause, 22 ms mean pause, 91 s total
> runtime

> Java, G1 GC, no tuning: 86 ms max, 65 pauses, 2.7 s total pause, 41 ms mean
> pause, 20 s total runtime

~~~
BuckRogers
I don't really understand the argument that Go is "almost" soft-realtime. If
you need that, you probably should just go realtime and use, say, Rust or C++.

Otherwise it seems to me that the Java/C# model is the best design for most
tasks, which is why they're so popular; it's not a mistake.

~~~
ngrilly
> Otherwise it seems to me that the Java/C# model is the best design for most
> tasks.

This is discussed in the article (basically, Google needed low latency
servers):

 _« If you want 10 answers ask for several more and take the first 10 and
those are the answers you put on your search page. If the request exceeds
50%ile reissue or forward the request to another server. If GC is about to
run, refuse new requests or forward the requests to another server until GC is
done. And so forth and so on.

All these are workarounds come from very clever people with very real problems
but they didn't tackle the root problem of GC latency. At Google scale we had
to tackle the root problem. Why?

Redundancy wasn't going to scale, redundancy costs a lot. It costs new server
farms. »_

~~~
pron
> but they didn't tackle the root problem of GC latency

But they did. The new low-latency Java GCs are more sophisticated than Go's,
and deliver pauses that are on the order of OS-caused pauses. The reason Go
was able to achieve low latency with a relatively _simple_ design is because
1. it suffers a hit to throughput and 2. that throughput hit, while
significant, is not catastrophic because Go relies heavily on value types.

~~~
ngrilly
> The new low-latency Java GCs

As you wrote, they are new, and weren't available when the decision was made
for the Go GC.

> The reason Go was able to achieve low latency with a relatively simple
> design is because 1. it suffers a hit to throughput and 2. that throughput
> hit, while significant, is not catastrophic because Go relies heavily on
> value types.

And are you saying these "new low-latency Java GCs" have no tradeoffs either?

I'm sorry, but we are discussing the keynote of the International Symposium on
Memory Management, which is a recognized event in the field, and you are
claiming things without any substantial material to offer. Maybe you're right,
but I need more than vague assertions to be convinced :-)

~~~
pron
There is always a tradeoff in throughput (and a commercial low-latency GC has
been available for Java for years, as well as realtime GCs). All I'm saying is
that the reason Go is able to achieve low latency with a relatively simple
design is because the language is designed to generate less garbage, so the
challenge is smaller. My point is that Go's GC is not some extraordinary
breakthrough in GC design (not that Hudson hadn't made some in the past) that
unlocks the secret to low-pause GCs, but more of an indirect exploitation of
the fact that the allocation rate is _relatively_ low. The same design with
much higher allocation rates would likely suffer an unacceptable hit to
throughput.

A recent presentation on ZGC is here:
[https://www.youtube.com/watch?v=tShc0dyFtgw](https://www.youtube.com/watch?v=tShc0dyFtgw)

~~~
ngrilly
I'm not really sure what you're trying to prove here. Java is definitely a
great platform and has a cutting-edge GC. Nobody contests that. Go's GC is
another point in the design space, starting with different constraints (low
latency and non moving). This is what makes the ISMM keynote interesting.

Hudson explains they tried to switch to a generational GC, but for this they
needed a write barrier. It was difficult to optimize the write barrier, by
eliding it whenever possible, because Go is moving to a system where there is
a GC safepoint at every instruction (this is because goroutines can be
preempted at any time, which is not a requirement for Java threads). In other
words, the GC design is also constrained by the way goroutines work.

Hudson also explains that because Go relies a lot on value types, escape
analysis is more effective, even without interprocedural analysis, which makes
generational collection less effective than in other languages using a GC.

Keeping the allocation rate low is part of Go's GC design. A language with a
"much higher allocation rate" would probably lead to a different design.

Thanks for the link to the presentation on ZGC! I'll watch it soon. But I saw
the slide showing the performance goals, and ZGC doesn't sound a lot better
than the numbers presented by Hudson for Go:

- "10 ms max pause time" for ZGC versus "two <500 microsecond STW pauses per
GC" for Go

- "15% max throughput reduction" for Java versus "25% of the CPU _during_ GC
cycle" for Go

By the way, I also note that ZGC is "single generation".

------
js2
Not currently loadable for me due to the Google Cloud Loadbalancer outage[0]:

Google Cache (irony...):

[http://webcache.googleusercontent.com/search?q=cache:BtSPqHc...](http://webcache.googleusercontent.com/search?q=cache:BtSPqHcAa0kJ:https://blog.golang.org/ismmkeynote&num=1&hl=en&gl=us&prmd=ivn&strip=1&vwsrc=0)

[0]
[https://news.ycombinator.com/item?id=17552532](https://news.ycombinator.com/item?id=17552532)

------
geodel
So Go sets the heap to 2x and 25% of CPU as GC overhead. On the Java side,
maybe due to the generational/copying GC, a heap of 3-4x and ~20% CPU is
recommended. This is a good read about Java GC tuning:

[https://engineering.linkedin.com/garbage-collection/garbage-collection-optimization-high-throughput-and-low-latency-java-applications](https://engineering.linkedin.com/garbage-collection/garbage-collection-optimization-high-throughput-and-low-latency-java-applications)

~~~
the8472
The article you're linking to is outdated: it talks about Java 7 and CMS,
which is deprecated as of OpenJDK 9. There are other collectors in OpenJDK,
and then there are other JVMs too. They all come with different overhead,
complexity, latency, and throughput tradeoffs.

------
amelius
Seems like Go may become an interesting target-language for compilers.

~~~
rurban
When they finally get a good generational copying collector, yes. Right now
it's just too slow for that.

------
corydominguez
Has anyone found a video of this keynote?

------
pjmlp
Very interesting.

