
Go GC: Prioritizing low latency and simplicity - dbaupp
https://blog.golang.org/go15gc
======
anarcticpuffin
This is very similar to Java's CMS collector[1]. Unfortunately, 10ms is still
far too long for some applications. I have hopes for something like Azul's
pauseless GC[2] eventually becoming a common GC strategy, but I'm not holding
my breath. In the meantime, there's always sun.misc.Unsafe[3] in the JVM world
:-(

[1] - https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/cms.html

[2] - http://www.azulsystems.com/zing/pgc

[3] - http://mishadoff.com/blog/java-magic-part-4-sun-dot-misc-dot-unsafe/

~~~
nickpsecurity
+1 on Azul. They've pretty much solved it by improving on past methods,
combining hardware and software. Go could do the same thing. I keep wondering
about putting a dedicated FPGA on the memory bus that does nothing but
concurrent GC. Have a mechanism to keep the processor(s) and the FPGA from
stepping on each other's toes. Might work wonders.

~~~
thrownaway2424
Why would an FPGA be a better solution than software running on another core?

~~~
fulafel
... while keeping in mind that for some workloads GC is a large chunk of the
app's work. For example, 10 Xeon cores' worth of GC throughput (out of, say,
24) would be a pretty tall order for an FPGA, and as a fixed resource it
easily becomes an Amdahl's law bottleneck.

It would still be a cool thing to try, and maybe doable with COTS hardware:

https://www-ssl.intel.com/content/www/us/en/embedded/technology/quickassist/documentation.html

+ http://www.hotchips.org/wp-content/uploads/hc_archives/hc21/3_tues/HC21.25.500.ComputingAccelerators-Epub/HC21.25.532.Cantle-Nallatech-Xeon-Socket-FPGA.pdf

~~~
nickpsecurity
It's a tall order because you set it up to be. Real system design would call
for a balancing act, as usual. Remember that you can put a bunch of GCs on
one FPGA that all run concurrently with access to shitloads of I/O and/or a
fast memory bus. Amdahl's law shouldn't kick in any more than with concurrent
GCs in general. The parallelism, simplicity, and tech like in your link should
make it faster than an on-board collector. The concept isn't speculation, as
it's already been done in two different ways:

Fine-Grained Parallel Compacting Garbage Collection through Hardware-Supported
Synchronization (2010):
http://www.ikr.uni-stuttgart.de/Content/Publications/Archive/Hv_GarbageCollection_40071.pdf

Stall-free, real-time collector for FPGAs (2012):
http://researcher.watson.ibm.com/researcher/files/us-bacon/Bacon12StallFree.pdf

The question is, "Can modern CPUs and off-chip FPGAs keep in sync without
performance getting dragged down?" The FPGAs have gotten faster. The CPUs'
I/O has gotten faster. So, I'm sure it can be done, but it might be difficult
enough to be someone's master's thesis. ;)

Besides, I call for replacing current chips with open ones that are easy to
modify for acceleration and security. Gaisler's LEON4 SPARC, Rocket RISC-V,
Cambridge's BERI/CHERI MIPS64... these all come to mind. The plan was to put
them onto a high-end FPGA with concurrent GCs to test the scheme. Once it
worked, ASIC conversion time, baby. S-ASICs are $200-500k on average, with the
resulting production & packaging being _way_ cheaper after that. Just hoping
there are a few companies that would split the cost to eliminate most memory
and control flow issues forever. ;)

------
windlep
"To create a garbage collector for the next decade, we turned to an algorithm
from decades ago. Go's new garbage collector is a concurrent, tri-color, mark-
sweep collector, an idea first proposed by Dijkstra in 1978."
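For context, the tri-color scheme the quote refers to can be sketched like
this (illustrative structures only, not the Go runtime's actual
implementation): white means not yet seen, grey means reached but with
children unscanned, black means fully scanned. Anything still white when the
grey set empties is garbage.

```go
package main

import "fmt"

type color int

const (
	white color = iota // not yet reached by the collector
	grey               // reached, children not yet scanned
	black              // reached and fully scanned
)

type obj struct {
	c    color
	refs []*obj
}

// mark runs tri-color marking from the given roots: shade the roots grey,
// then repeatedly pop a grey object, shade its white children grey, and
// blacken it. Objects left white are unreachable.
func mark(roots []*obj) {
	var greys []*obj
	for _, r := range roots {
		r.c = grey
		greys = append(greys, r)
	}
	for len(greys) > 0 {
		o := greys[len(greys)-1]
		greys = greys[:len(greys)-1]
		for _, child := range o.refs {
			if child.c == white {
				child.c = grey
				greys = append(greys, child)
			}
		}
		o.c = black
	}
}

func main() {
	live := &obj{}
	root := &obj{refs: []*obj{live}}
	dead := &obj{} // never linked from a root
	mark([]*obj{root})
	fmt.Println(root.c == black, live.c == black, dead.c == white)
	// → true true true
}
```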

So, the best GC for the next decade was proposed in 1978. Whatever computer
scientists have proposed since then apparently hasn't been as good... so they
can all just pack it up and go home?

I'd be really, really curious to hear from some academic computer scientists
whether the best that can be done was first proposed back then, and they've
all just failed to find anything better since. Because this is just downright
depressing otherwise (if you actually want to believe that we're making
progress).

~~~
mholt
I'm no academic (just a CS student), but from the very little I understand
about GC algorithms, there's no free lunch. In other words, choosing a GC
isn't like shopping for a new condo: different algorithms have different
advantages and implementation difficulties. There may also be caveats with
some algorithms that others aren't plagued by. "Better" means different things
to different people and in different situations.

For what it's worth, many of today's finest, most important algorithms are
from the 50s-70s, or are special adaptations of them. Theory doesn't become
obsolete the way technologies do.

No need to get depressed by this. Hard problems are still hard (until proven
otherwise), and taking out the trash is the same today as it has ever been.

~~~
eternalban
> For what it's worth, many of today's finest, most important algorithms are
> from the 50s-70s, or maybe special adaptations of them.

This is only the case (and only sensible) for algorithms that have not been
practically improved upon.

------
mehrdada
> Today 16 gigabytes of RAM costs $100 and CPUs come with many cores, each
> with multiple hardware threads. In a decade this hardware will seem quaint.

This is a potentially dangerous worldview. While it is true for the
server/desktop/laptop segment, it is less of a certainty for phones, and
downright scary for smaller, low-power embedded/IoT devices.

~~~
eloff
It's not a limitation, just a tuning knob that can be used to trade off GC
cycles against memory usage. If you're in a low-memory environment, tune the
knob to use more CPU.

~~~
cwyers
What if you're in a low-everything environment?

~~~
millstone
Then don't use Go. Go was pitched as a systems language, but has found its
niche as a server language. It's to be expected that the maintainers will make
decisions that tune it for servers.

------
alberth
>> "Go's new garbage collector is a concurrent, tri-color, mark-sweep
collector"

So basically, it's the same GC that Mike Pall had planned to implement for
LuaJIT 3.0. Too bad Mike recently announced he was transitioning away from
doing daily LuaJIT work.

http://wiki.luajit.org/New-Garbage-Collector

~~~
eloff
Actually, he wanted to use a quad-color implementation, which AFAIK is Mike
Pall's tweak of a tri-color collector.

------
qznc
> Maintaining this invariant is the job of the write barrier, which is a small
> function run by the mutator whenever a pointer in the heap is modified.

Synchronization whenever a heap pointer is changed? That does not sound like
"low latency".

~~~
dfox
In GC terminology, "barrier" simply means that some kind of hook is run on a
read or write of a pointer; it does not necessarily imply a barrier in the
caching or synchronization sense.

A typical implementation of a write barrier involves adding an item to some
kind of thread-local queue.
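
A minimal sketch of that idea (hypothetical names, a plain slice standing in
for the collector's per-thread buffers; the real runtime's barrier is
generated by the compiler and far more careful):

```go
package main

import "fmt"

type node struct {
	value    int
	children []*node
}

// markQueue stands in for the collector's pending-mark buffer.
var markQueue []*node

// writeBarrier models a Dijkstra-style insertion barrier: before the
// mutator stores a pointer, the new referent is queued ("shaded") for
// marking, so the concurrent collector never loses a reachable object
// mid-cycle.
func writeBarrier(slot *[]*node, child *node) {
	markQueue = append(markQueue, child) // shade the new referent
	*slot = append(*slot, child)         // then perform the actual write
}

func main() {
	root := &node{value: 1}
	writeBarrier(&root.children, &node{value: 2})
	fmt.Println(len(markQueue), len(root.children))
	// → 1 1
}
```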

------
z92
I was wondering, why not use ARC, as in C++ and Obj-C?

~~~
mzl
Simple reference counting is quite possibly the worst choice for automatic
garbage collection, especially since it is so intuitively appealing. The
common objection is that cycles are hard to handle, leading either to broken
behaviour (leaking garbage) or to adding complicated cycle detection
(negating the premise of a simple solution).

Apart from cycles, reference counting has several other issues. Some of my pet
objections are:

* The need to add a mutable field to every object increases the active memory footprint significantly (especially bad for caches).

* The frequent updates to that field must be done using atomic operations if the objects may be accessed by different threads (even though the objects themselves may be immutable).

* The updates to the reference field trashes the caches in a multi-core processor.

* The amount of work done managing these updates can easily become a significant part of the run-time of the program, in some cases dwarfing the actual work.

* Uncontrolled, arbitrarily long pauses will happen when cascading deletions occur.

There are really cool and smart algorithms that make reference counting much
better, but then the appeal of a simple natural algorithm for collecting
garbage has come and gone.
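
The cycle problem in particular is easy to demonstrate with a toy
reference-counted object (illustrative names; Go itself obviously doesn't
work this way):

```go
package main

import "fmt"

// rcObj is a toy reference-counted object.
type rcObj struct {
	refs  int
	other *rcObj
	freed bool
}

func retain(o *rcObj) { o.refs++ }

// release decrements the count and "frees" the object when it reaches
// zero, cascading to anything it references.
func release(o *rcObj) {
	o.refs--
	if o.refs == 0 {
		o.freed = true
		if o.other != nil {
			release(o.other)
		}
	}
}

func main() {
	a, b := &rcObj{refs: 1}, &rcObj{refs: 1} // one external ref each
	a.other, b.other = b, a                  // create a cycle
	retain(b)                                // count a's pointer to b
	retain(a)                                // count b's pointer to a
	release(a)                               // drop external ref to a
	release(b)                               // drop external ref to b
	// Each object still holds one reference to the other, so neither
	// count ever reaches zero: the cycle leaks.
	fmt.Println(a.freed, b.freed, a.refs, b.refs)
	// → false false 1 1
}
```

A tracing collector has no such problem: the cycle is simply unreachable
from the roots and gets swept.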

~~~
fauigerzigerk
_> There are really cool and smart algorithms that make reference counting
much better, but then the appeal of a simple natural algorithm for collecting
garbage has come and gone._

That's probably true, but if the win is pauseless automatic memory management
that doesn't require twice as much memory as using a GC, then it may be worth
it.

I think, realistically, we'll have to accept that both tracing GC and
reference counting (however smart) will always suffer from some drawbacks that
manual memory management can solve. The reverse is also true.

~~~
mzl
Sure, a really smart implementation (coalescing reference count updates,
concurrent de-allocations to get pauselessness, etc.) could have some nice
properties.

However, while the amount of virtual memory needed for a tracing GC might seem
large, it is not active memory. Reference counting on the other hand will
significantly increase the amount of active used memory, especially if many
small objects are common.

As for GC vs. manual memory management, both have their uses. For most systems
I would argue that GC is the way to go since it is quite efficient and is much
more productive. On the other hand, I actually like writing code that has
manual memory management. My current interest is learning more Rust for that
kind of development.

------
copsarebastards
As always with Go, they assume nothing that has happened in the last 3 decades
is worth looking at, and retrofit their justifications for their choices to
fit that. It's inherently a language that is 3 decades behind other modern
languages. It's embarrassing.

------
w_t_payne
I'd prioritize _predictability_ over anything else.

------
issaria
Will this collector evolve into a generational one in 1.6? Or does it already do that?

------
spullara
Quite a lot of hubris in thinking they are just going to solve this as if no
one has been working on it. Who knows, maybe they have, but color me
skeptical.

~~~
seanmcdirmid
Good GC is an open problem. Even though there has been much work done on the
subject, there is still plenty of room for improvement!

Server GC, for example, typically tends to be less concurrent to improve
throughput. Client GC is more responsive, but sacrifices throughput. Since Go
is positioned for server work, it is very interesting that they chose what
are typically seen as client-side priorities.

~~~
eloff
Unpredictable latency is a huge no-no in these days of responsive server
applications. Long GC pauses are just not acceptable to many server
developers, so I don't think it's correct to consider it only a client-side
problem. If you look at the mechanical sympathy mailing list, you'll see a
large group of Official Knob Turners for the Java garbage collectors
discussing their trade.

~~~
pcwalton
The point is not that latency doesn't matter. It's that throughput and latency
are more or less a fundamental tradeoff in GC, and latency isn't the only
thing that matters.

~~~
eloff
Yes, that's true, but this GC lets you get more throughput by using more
memory, so you can still decide how you want to make that trade-off. But no
amount of knobs can save you from a long stop-the-world pause, so they made
absolutely the right decision, in my opinion. With multi-core server CPUs,
throughput is relatively cheap to obtain. Latency is hard; you can't just
throw money at the problem.

