
Go GC: Solving the Latency Problem in Go 1.5 - beliu
https://sourcegraph.com/blog/live/gophercon2015/123574706480
======
arcticbull
I've worked with garbage collected languages (ruby, Java, Objective-C in the
bad old days), automatic reference counting (Objective-C and Swift), in the
Rust model and manual reference counting/ownership in C over about 15 years
now.

Having thought about this a lot, I just don't really understand why people
continue to work on garbage collection. Non-deterministic lifecycle of
objects/resources, non-deterministic pauses, huge complexity and significant
memory and CPU overhead just aren't worth the benefits.

All you have to do with ARC in Swift and Objective-C is type 'weak' once in a
while (which effectively builds a directed acyclic graph of strong
references). With Rust you can get away with just structuring your code in
accordance with their conventions.

I'm sure this won't resonate with everyone but I think it's time to walk away
from GC. I'm curious, is there something I'm missing? The only true benefit I
can think of is reducing heap fragmentation; and there must be a better way to
address that.

~~~
rurban
> I just don't really understand why people continue to work on garbage
> collection.
> ... but I think it's time to walk away from GC

Well, look at the arguments. GC advantages:

  * less memory (unless you use a semi-space GC)
  * faster
  * can handle cyclic refs (graphs, not only trees and linked lists)
  * trivial to use, fewer programmer errors

What you got wrong: significant memory and CPU overhead.

If you compare the memory overhead and complexity of refcounting with
malloc to a non-semi-space GC, you'll be surprised. malloc is more complex
and worse in its CPU overhead, and refcounts in every data cell are a huge
overhead. GCs have it better.

Where GCs have a memory overhead (copying collectors), they do it on
purpose, on machines that have enough memory. Those copying collectors are
the fastest, but cannot be used on small devices. Without enough RAM, you
just use a trivial mark-and-sweep, which is much simpler than your malloc
implementation and manual refcounts.

The only real field where you cannot use a GC is when you cannot tolerate
pauses, as in real-time, latency-critical apps. But even there, incremental
GCs (like Boehm's) with real-time characteristics exist. The other case is
when you need immediate destruction of objects as they go out of scope,
rather than whenever the GC decides to destroy them later on. This can be
solved by the compiler, but usually isn't.

Of course people always walk away from GC and avoid it like the plague. GC
people, on the other hand, feel memory is too important to be trusted to
programmers. We have had these discussions for decades.

~~~
MrBuddyCasino
Also, Azul Systems solved that issue: the proprietary C4 GC has no pauses,
and the heap can be huge, like hundreds of GB. If that tech became
commonplace, maybe this discussion would be obsolete. But I think C4
requires kernel support, and the first attempt to get a patch accepted
didn't go well.

~~~
osi
To clarify, Azul's Zing _does_ have pauses, but they optimized the crap out of
them (the pauses are more time-to-safepoint rather than GC pauses). GC time
wrt application stopped time is constant regardless of heap size.

(I'm an Azul customer and Zing user)

~~~
MrBuddyCasino
Thanks, the marketing implies it's pauseless. What are typical application
stop times?

~~~
osi
My mean pauses are a few hundred micros. Standard deviation is slightly more
(300-500 micros), with a max of a millisecond or two.

------
alexbock
> ...there has been a virtuous cycle between software and hardware
> development. CPU hardware improves, which enables faster software to be
> written, which in turn...

This is the exact opposite of the experience I've had with (most) software. A
new CPU with a higher clock speed makes existing software faster, but most new
software written for the new CPU will burn all of the extra CPU cycles on more
layers of abstraction or poorly written code until it runs at the same speed
that old code ran on the old CPU. I'm impressed that hardware designers and
compiler authors can do their jobs well enough to make this sort of bloated
software (e.g. multiple gigabytes for a word processor or image editor)
succeed in spite of itself.

There are of course CPU advancements that make a huge performance difference
when used properly (e.g. SSE, multiple cores in consumer machines) and some
applications will use them to great effect, but these seem to be few and far
between.

~~~
nulltype
I'm not sure users really care how many gigabytes their word processor is. How
fast it is is probably more interesting to them. And while wasting CPU cycles
on abstraction layers isn't a great way to make a super fast program, if the
program still runs at 60fps and took half as much time to develop, then maybe
they're worth it.

Of course, when you end up with some standards-driven monstrosity like a
modern web browser, you do seem to have a lot of unnecessary abstraction
layers and also it's slow.

~~~
smegel
> and took half as much time to develop

Abstractions make coding faster now?

~~~
benaiah
No, it'd be way faster to just do it all in assembly. These fancy "high level
languages" and "memory management" and "libraries" are just cons foisted on
poor unsuspecting programmers by middle management and enthusiastic marketers.
Real Programmers (TM) don't need any of that shit.

(/s)

~~~
smegel
You don't seem to understand what "abstraction" means in computer science.
Hint: it's nothing to do with memory management. High-level languages like
Python/Ruby actually have fewer abstractions than lower-level languages like
Java because they don't need them, and Python/Ruby programmers tend to want
to get stuff done rather than write an ode to the Gang of Four in XML.

------
mseepgood
Better images of the plots:

[https://pbs.twimg.com/media/CJatKFQUkAE5qcR.png:large](https://pbs.twimg.com/media/CJatKFQUkAE5qcR.png:large)

[https://pbs.twimg.com/media/CJavrIAUMAAIaq8.png:large](https://pbs.twimg.com/media/CJavrIAUMAAIaq8.png:large)

------
_ph_
I am a bit surprised by most of the discussion here so far. Garbage
collection has, first of all, one fundamental advantage: correctness. You
are guaranteed never to have a pointer to a freed object, and that any
unreachable object does get freed. For almost all programs that get
written, correctness should take precedence over speed.

And speaking of speed, unless you require hard real-time behavior, garbage
collection can be quite beneficial. A generational GC offers faster
allocation than any malloc-based allocator, and collection of the nursery
generation is nearly instantaneous in most cases. ARC has the overhead of
counting on every reference/dereference, and while it may be predictable
about kicking in when dropping a reference frees memory, the time required
to free a given object depends entirely on how many objects get freed as a
consequence.

Furthermore, garbage collection helps to write clean code, as it is safe (and
usually cheap) to allocate memory during a function call and return results
referencing the memory.

Of course, badly written programs might perform badly with GC - but without
it, the same kind of programs would just be a disaster. And most strategies
for efficient memory usage used in non-GC languages (e.g. memory pools for
certain objects) can and should equally be used in GC languages.

------
davidw
Erlang's per-process (Erlang process, not Unix process) GC is pretty good from
this point of view. I'm surprised they didn't mention it as something to think
about.

~~~
andrewchambers
Why would you mention it? Go shares all memory; Erlang doesn't. It's a
different problem.

~~~
jzelinskie
Actually, IIRC Erlang does have a shared heap, but it is used as an
optimization to avoid copying very large objects.

~~~
andrewchambers
That doesn't contradict what I said. Go shares ALL memory; Erlang may share
some, sometimes.

------
eggnet
Interesting. I think the GC pauses are go's biggest problem.

Looks like they are tackling this head on with positive results.

~~~
nulltype
Have you encountered any problems with go GC pauses?

~~~
eggnet
Yes, I have. We used go in a system that had to keep track of essentially
large hashes containing popularity / scoring information, in addition to what
were effectively routing tables, and ran into >500ms pauses.

At scale, it seemed the largest part of the complexity of go was
manipulating data structures and code to avoid gc pauses. With enough work
we might have been able to decrease the pauses sufficiently, but we also
ran into raw requests-per-second numbers that were lower than we liked.

The direction we took was to switch to C++ for this application. Having
said that, the GC pauses were the primary reason for the change.

~~~
jerf
If it's possible, and you're interested in trying, it would be interesting to
pull that code out and try it again with Go 1.5, if it's easy.

If you've got a C++ solution, I would not suggest under any circumstances
short of Go suddenly and frankly mysteriously blowing the doors off of C++
that you _switch_... I'm just saying it would be an interesting comparison.

~~~
eggnet
I'll be recommending we try it in a lab with go 1.5, absolutely.

------
jdub
Was the presentation more in-depth than this summary? I'd love to read or
hear more about the changes.

~~~
zachgersh
The summary pretty much covers it in terms of what was presented. There
definitely would not have been enough time to discuss the changes they made at
length.

------
davidgrenier
And with this, I hope we'll see a race for GC pauses improvements.

------
hit8run
"Go programs will get a little bit slower in exchange for ensuring lower GC
latencies."

How much slower are Go1.5 programs compared to their Go1.4 version? Is this
relevant for web apps?

