
Go GC: Latency Problem Solved [pdf] - xkarga00
https://talks.golang.org/2015/go-gc.pdf
======
Animats
The LISP community went through this in the 1980s. They had to; the original
Symbolics LISP machine had 45-minute garbage collections, as the GC fought
with the virtual memory. There's a long list of tricks. This one is to write-
protect data memory during the GC's marking phase, so marking and computation
can proceed simultaneously. When the code stores into a write-protected page,
the store is trapped and that pointer is logged for GC attention later. This
works as long as the GC's marker is faster than the application's pointer
changing. There are programs for which this approach is a lose. A large sort
of a tree, where pointers are being retargeted with little computation between
changes, is such a program.
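A minimal sketch of that trap-and-log idea, in Go for concreteness (names like `storePointer` and `dirty` are illustrative, standing in for an MMU trap, not how any real runtime implements its write barrier):

```go
package main

import "fmt"

type object struct {
	child *object
}

var (
	marking bool      // true while the concurrent marker is running
	dirty   []*object // pointers stored during marking, rescanned later
)

// storePointer plays the role of a trapped store into a write-protected
// page: if marking is in progress, the new pointer is logged for GC
// attention before the store completes.
func storePointer(slot **object, p *object) {
	if marking {
		dirty = append(dirty, p)
	}
	*slot = p
}

func main() {
	root := &object{}

	marking = true
	storePointer(&root.child, &object{}) // store during marking: logged

	marking = false
	storePointer(&root.child, &object{}) // store after marking: not logged

	fmt.Println(len(dirty)) // 1
}
```

The failure mode described above shows up here directly: if the mutator calls `storePointer` faster than the marker can drain `dirty`, the log only grows and the collection never finishes.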

If they're getting 3ms stalls on a 500MB heap, they're doing pretty well. That
the stall time doesn't increase with heap size is impressive.

Re _" avoid fragmentation to begin with by storing objects of the same size in
the same memory span."_ That's easy today, because we have so much memory and
address space. The simplest version of that is to allocate memory in units of
powers of 2, with each MMU page containing only one size of block. The size
round-up wastes memory, of course. But you can use any growth factor in the
range 1..2 and have, for example, block sizes every 20%. This approach is popular
with conservative garbage collectors (ones that don't know what's a pointer
and what's just data that looks like a pointer) because the size of a block
can be determined from the pointer alone.
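The scheme above can be sketched in a few lines of Go (a toy model with made-up names like `pageClass`, not any real allocator):

```go
package main

import "fmt"

// Each "page" holds blocks of exactly one size, so a block's size can be
// recovered from its address alone: the property that makes this layout
// attractive to conservative collectors.

const pageSize = 4096

// roundUpPow2 returns the smallest power of two >= n (for n >= 1),
// i.e. the size class a request gets rounded up to.
func roundUpPow2(n int) int {
	c := 1
	for c < n {
		c <<= 1
	}
	return c
}

// pageClass records the single block size stored in each page.
var pageClass = map[int]int{}

// sizeOf recovers a block's size from any address inside its page.
func sizeOf(addr int) int {
	return pageClass[addr/pageSize]
}

func main() {
	pageClass[0] = roundUpPow2(48) // page 0 holds only 64-byte blocks
	fmt.Println(roundUpPow2(48))   // 64: the round-up waste in action
	fmt.Println(sizeOf(100))       // 64: recovered from the address alone
}
```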

~~~
lispm
Symbolics added an Ephemeral GC in 1985. It keeps a bitmap of modified memory
pages with ephemeral objects in RAM. The Ephemeral GC then looked only at
those pages.

[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.125....](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.125.2438)

Macintosh Common Lisp later used a similar scheme on 68k machines with MMU.

~~~
nickpsecurity
Thanks for the link. This jumped out at me:

"The importance of designing the architecture and the hardware to facilitate
garbage collection is stressed."

Replace "GC" with reliability, security, concurrency, and so on, and it's still true.
It's why I advocated safe, high-level languages and RISC processors. I figured
they'd be easier to modify at compiler or hardware level as people invented
solutions to these problems. Doing the same on x86, Windows, and C++? I almost
gave up...

~~~
pjmlp
In the mid-90's while at the university I got to learn about Oberon and
eventually had the Native Oberon running on my PC.

It opened my mind about using GC-enabled systems programming languages.
Eventually I became an Oberon addict, and a Modula-3 one as well, as a side
effect of discovering a book about it gathering dust in a technical library.

OS vendors just need to care the same way they do about improving JavaScript
JIT compilers, for example.

~~~
nickpsecurity
Yeah it was pretty nice. The SPIN OS team wrote a whole OS in Modula-3 that
supported safe, dynamic linking of code into the kernel for performance
boosts. A recent discussion here showed that the Go language was partly an
attempt to re-create one of its author's experience coding in Oberon. The
combination of systems programming, safety, and productivity disappeared when
he switched to C.

As far as JS goes, that would help, but it's also a good example of why it
won't happen. There were many attempts, like Juice [1], to replace JavaScript
in browsers with something better. There were also attempts to solve, at the
OS level, the problems the Web solved. All of them were ignored, to the point
that we eventually got stuck with JavaScript & browsers as the only
presentation & computation layer that's on all devices. So, while _still_
ignoring better options, vendors continued to improve the speed of JavaScript
engines and their new JIT schemes.

So, the OS's could certainly benefit from the kind of activity that led to
JavaScript JIT performance. However, their existence says more in the other
direction.

[1]
[http://www.modulaware.com/mdlt69.htm](http://www.modulaware.com/mdlt69.htm)

~~~
pjmlp
Yeah sure, I just wanted to make the point that some technologies can be
improved if the companies that matter in the IT world bother to put money
into it.

Somehow I like to think that Android, WinRT/.NET Native, Swift's introduction,
and Mirage OS are all little steps in that direction.

~~~
nickpsecurity
Certainly they could be improved. Swift and MirageOS are great examples. IBM's
mainframes and AS/400 line are good examples too, given that IBM adapted them
to the most useful, popular technologies while staying backward compatible.
Remember, though, that backward compatibility constrains the biggest and
oldest systems in ways that prevent architectural improvements. All the
biggest companies lining Microsoft, Oracle, and SAP's pockets would have to
throw away apps that they can't even rebuild. Not happening.

So, our best bet is that small to midsized firms with more flexibility keep
adopting these technologies. That fuels investment into them, eventually
getting them to a level like Microsoft and Oracle. That enterprises have
switched to a service model helps, in that they keep some services on old tech
but implement some on newer, better stuff (e.g. Python at BofA). So, these
trends are what we have to bank on.

One thing, though: better get it right the first time, as your newer, better
tech will eventually be a legacy tech someone else is maintaining. It's why I
now focus a lot more on readability, interfaces, and type safety for
maintenance concerns.

------
rgbrenner
This page adds some context to the slides:
[https://sourcegraph.com/blog/live/gophercon2015/123574706480](https://sourcegraph.com/blog/live/gophercon2015/123574706480)

It was posted here 10 days ago:
[https://news.ycombinator.com/item?id=9854408](https://news.ycombinator.com/item?id=9854408)

------
joosters
Garbage collecting seems to get solved in each new release of Go and Java,
apparently.

~~~
rgbrenner
Do you actually follow Go development? 1.5 is a major improvement to the GC.

The only other stable release that had any change to the GC worthy of mention
in the release notes was 1.3, which was a minor change.

The only possible way to interpret your statement as true, would be if "get
solved" meant ANY change or bug fix was applied to the GC. Which is a
statement that would apply to so much code (not just the GC) that it would
make the statement completely meaningless.

~~~
joosters
Yeah, I'm being unfair in naming Go & Java specifically. But these stories of
'fixing' garbage collection come up all too often.

I wonder when we'll see a further GC update that trades latency for
throughput...

The problem seems to be that no matter how you tweak GC, you will always have
a class of program that it performs terribly for (and it seems to impact a
large group of programs, never just some obscure corner case). So I suspect
that this latest GC tweak will have unexpected results on some other class of
program, leading to another tweak, and so on...

~~~
xenadu02
I hope you realize that malloc is far from free in a non-GC world, right? (In
a GC world, allocating is just moving a pointer forward.) You pay the cost
somewhere.

The CLR has also done a lot of GC work to enable concurrent GC, thread-local
heaps, and "zero pause" (in reality extremely low constant time pauses).

The only way to avoid paying the cost for managing memory is to allocate
everything you need once and never release it.
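The "moving a pointer forward" claim can be illustrated with a toy bump allocator (a sketch only, ignoring alignment, per-thread caches, and the slow path that would trigger a collection in a real runtime):

```go
package main

import "fmt"

// bumpHeap models a compacting collector's allocation path: a contiguous
// region plus a cursor. Allocation is a bounds check and a pointer bump.
type bumpHeap struct {
	buf  []byte
	next int
}

// alloc returns the offset of a fresh n-byte block, or -1 when the region
// is exhausted (where a real collector would run a GC cycle instead).
func (h *bumpHeap) alloc(n int) int {
	if h.next+n > len(h.buf) {
		return -1
	}
	off := h.next
	h.next += n // the entire cost of allocation: one pointer bump
	return off
}

func main() {
	h := &bumpHeap{buf: make([]byte, 64)}
	fmt.Println(h.alloc(16)) // 0
	fmt.Println(h.alloc(16)) // 16
	fmt.Println(h.alloc(64)) // -1: out of space
}
```

The cost paid "somewhere" is the collection and compaction that keeps the region contiguous, which is exactly what this sketch leaves out.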

~~~
physguy1123
I hope you realize that stack allocation can replace a lot of allocation that
would be done by a GC? And that having control over memory layout can lend
itself to better performance? And that naively mallocing everywhere is not the
only or fastest way to manually manage memory, and sometimes isn't even the
easiest.

~~~
giovannibajo1
Go also does stack allocation for objects, based on escape analysis:
basically, if the compiler can prove that a variable doesn't escape, it is
allocated on the stack; otherwise, on the heap. Improvements to escape
analysis in the compiler thus also reduce heap size, by allocating more
things on the stack.
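For a concrete picture, compiling something like the following with `go build -gcflags=-m` should report that the local in `leak` is moved to the heap while `sum`'s accumulator stays on the stack (the exact diagnostic wording varies by Go version):

```go
package main

import "fmt"

// sum's accumulator s never escapes: no pointer to it outlives the call,
// so the compiler can keep it on the stack.
func sum(xs []int) int {
	s := 0
	for _, x := range xs {
		s += x
	}
	return s
}

// leak returns a pointer to a local, so t outlives the call and must be
// heap-allocated; escape analysis diagnostics flag it as escaping.
func leak() *int {
	t := 42
	return &t
}

func main() {
	fmt.Println(sum([]int{1, 2, 3})) // 6
	fmt.Println(*leak())             // 42
}
```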

------
jnordwick
Still slowish. Far far from "solved." The charts they zoom in on only go to
about 500MB in heap, showing 2 ms pause times. It makes me suspicious that the
nice linear trend he's showing doesn't hold up under more reasonable values --
my IDE takes up 500 MB and my web browser over a GB.

So by his possibly rosy calculations, a basic 3GB heap is still pausing 6
ms. God forbid I use a 500 GB heap and now we're into the one-second range
again. This is assuming the linear relationship holds up, but given his choice
of graph domain, I have a suspicion that there are issues to the right.

This seems typical of Google technology. They say they care about performance,
but I have yet to see a piece of Google tech that is actually useful if you
care about performance. People automatically assume Google is synonymous with
performance, but it definitely isn't.

Remember, he says this improved GC pause time is going to come at the expense
of Go top-line speed. Your Go will get slower, and you still will have
second-long pauses with any serious work.

~~~
mseepgood
The first chart goes up to 20 GB heap size.

> This is assuming the linear relationship holds up

Their goal for 1.6 is to make it constant, not linear:

"Zooming in, there is still a slight positive correlation between heap size
and GC pauses. But they know what the issue is and it will be fixed in Go
1.6."
[https://sourcegraph.com/blog/live/gophercon2015/123574706480](https://sourcegraph.com/blog/live/gophercon2015/123574706480)

~~~
jnordwick
His version of "slight" to me isn't so slight. He's brushing under the rug
that Go isn't suitable for many of the low latency, memory hungry domains that
comprise modern systems.

~~~
bostik
> _many of the low latency, memory hungry domains that comprise modern
> systems._

I would say it really depends on the situation, and especially on what your
definition of low-latency is.

For background: we run a betting exchange. Customers will notice and complain
if any action _with their money_ takes more than ~100ms. This threshold aligns
quite well with old research about human response times [0].

On the other hand, if we were running an interactive chat/forum system, it
would be acceptable to have >500ms latencies from click to comment display.
When it comes to communications, reliable persistence tends to be more
important than raw latency. (Or to put it another way: it is okay to delay
displaying of a fresh comment until it has been stored. That way the user
knows they do not need to rewrite their contribution.)

I personally have a background in embedded systems, where hard latency limits
are the norm for user experience. Developers, end users and companies are all
willing to sacrifice throughput for near-immediate feedback, and doubly so
when the system in question happens to control a vehicle dashboard.

At 60 frames per second, one frame refresh is about 17ms. When you need to
provide visibly immediate feedback to the user, you have _at most_ 6 frames to
display it. Because the data must be available before the rendering of the 6th
frame starts, you actually have on average no more than 5.5 * 17ms ≈ 93ms to
calculate the response.

The real trick is figuring out where you can get away with non-immediate
latency requirements. And incidentally, this has knock-on effects: if hard
low-latency is not necessary, some GC spikes should be tolerable. Spend the
engineering effort where it is crucial, not where it might be nice.

At least until you have more workforce than engineering problems.

0: [http://www.nngroup.com/articles/response-times-3-important-limits/](http://www.nngroup.com/articles/response-times-3-important-limits/)
(Nielsen, 1993)

------
jmount
I thought the issue with Go garbage collectors wasn't so much speed as
correctness (the Go team historically got GC speed by sacrificing
correctness, or is correctness a goal past version 1.3?).

~~~
mseepgood
The Go garbage collector is precise since Go 1.3:
[https://golang.org/doc/go1.3](https://golang.org/doc/go1.3)

------
shmerl
I prefer RAII approach to GC.

~~~
reagency
Explain?

~~~
gavazzy
Not OP, but RAII provides deterministic destruction. That is, you can prove
exactly when the object will be deleted.

From wikipedia: "Object destruction varies, however – in some languages,
notably C++, automatic and dynamic objects are destroyed at deterministic
times, such as scope exit, explicit destruction (via manual memory
management), or reference count reaching zero; while in other languages, such
as C#, Java, and Python, these objects are destroyed at non-deterministic
times, depending on the garbage collector, and object resurrection may occur
during destruction, extending the lifetime."

[https://en.wikipedia.org/wiki/Object_lifetime#Determinism](https://en.wikipedia.org/wiki/Object_lifetime#Determinism)

~~~
pjmlp
You can have RAII in C#, Java and Python by making use of using/try-with-
resources/with, or HOFs with monadic constructs.

Yes, it isn't as easy as declaring a templated handle manager class on the
_stack_ (it won't work for heap objects) in C++, but it also gets the job
done.

