
Proposed New Go GC: Transaction-Oriented Collector - zalmoxes
https://docs.google.com/document/d/1gCsFxXamW8RRvOe5hECz98Ftk-tcRRJcDFANj2VwCB0/edit
======
pizlonator
Interesting that they are publishing the algorithm before implementing it. I
wish more people did this. This way, if the algorithm doesn't work out, we'll
have some way of knowing. If they hadn't published the algorithm, tried it,
and then abandoned it, then nobody would have any way of knowing that there
had been a negative result.

I don't have high hopes for this algorithm, because my intuition is that
transactions aren't common enough to make a big difference. Looking forward to
seeing the implementation and empirical data.

~~~
twotwotwo
"Transactions" here doesn't mean roll-back-able units like in a database, just
small bits of code that terminate (and, the authors hope, leave some garbage
behind that you can clean up without scanning a bunch of RAM). Request-
oriented collector would have been about as good a name.

This is kind of a spin on generational collection, built around the
observation that a lot of Go programs are HTTP/RPC/whatever servers.
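The request-per-goroutine shape described above can be sketched as below (the handler and helper names are illustrative, not from the proposal): every allocation made while serving a request becomes garbage the moment the handler returns.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// process allocates freely; everything it creates is garbage as soon
// as the request that called it completes.
func process(q string) string {
	parts := strings.Fields(q)           // per-request allocation
	out := make([]string, 0, len(parts)) // per-request allocation
	for _, p := range parts {
		out = append(out, strings.ToUpper(p))
	}
	return strings.Join(out, " ")
}

func handler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintln(w, process(r.URL.Query().Get("q")))
	// On return, this goroutine's "transaction" ends; a request-oriented
	// collector could reclaim its garbage without a full heap scan.
}

func main() {
	http.HandleFunc("/", handler)
	fmt.Println(process("hello world"))
	// To actually serve requests: http.ListenAndServe(":8080", nil)
}
```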

~~~
Jweb_Guru
It's also not exactly original to Go... the Ur language, for instance, employs
a request-oriented collector (arguably, though the documentation claims it
does not use garbage collection).

~~~
ezyang
Ur/Web uses region based memory management. Details are in
[http://adam.chlipala.net/papers/UrWebICFP15/](http://adam.chlipala.net/papers/UrWebICFP15/)
search for "memory management".

~~~
Jweb_Guru
Thanks for the link! I was actually at that talk :) To quote from the relevant
slide:

* Transactions are integrated into Ur/Web at a deep level, so, whenever we run out of space, we can always abort the execution, allocate a larger heap, and restart.

* As a further optimization, we use region-based memory management, inferring a stack structure to allow freeing whole sets of objects at key points during execution.

------
DblPlusUngood
Interesting. The transactional hypothesis sounds like a refinement of the
generational hypothesis.

Seems like the main advantage of this transaction oriented collector over a
typical generational collector is that it precisely identifies the old-to-
young pointers and thus may scan fewer objects in total.

~~~
pcwalton
> Seems like the main advantage of this transaction oriented collector over a
> typical generational collector is that it precisely identifies the old-to-
> young pointers and thus may scan fewer objects in total.

I'm confused as to what this means. Write barriers, just as suggested here,
are the standard solution for this issue in generational garbage collectors,
and they precisely identify the old-to-young pointers as well. (In fact, the
nursery and tenured generations are often separated in memory, so you can test
just by masking a few bits off.)

I agree with the sibling comment that this seems to be a simple generational
GC, just with a different policy for tenuring than the typical textbook one.

~~~
ctangent
The standard solution is that the write barrier updates a card table, which
generally reserves a bit for every page. When it comes time to mark through
old-to-young pointers, the collector scans the card table - if it sees a bit
set in the card table, it must mark through /all objects/ on that page - even
if only one of the objects on the page has a cross-generational pointer.
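The card-table scheme described here can be sketched as follows (sizes and names are illustrative, not those of any real runtime): the barrier only records which card was touched, so collection must rescan every object on a dirty card.

```go
package main

import "fmt"

// Toy card table: the heap is divided into fixed-size cards, and the
// write barrier dirties the card containing the updated slot.
const (
	heapSize = 1 << 20 // 1 MiB toy heap
	cardSize = 512     // bytes per card
)

var cardTable [heapSize / cardSize]bool

// writeBarrier records that a pointer slot at heap offset addr was
// updated. It does not record *which* slot, only which card.
func writeBarrier(addr uintptr) {
	cardTable[addr/cardSize] = true
}

// dirtyCards returns the card indices whose objects must all be
// rescanned for old-to-young pointers -- the imprecision in question.
func dirtyCards() []int {
	var dirty []int
	for i, d := range cardTable {
		if d {
			dirty = append(dirty, i)
		}
	}
	return dirty
}

func main() {
	writeBarrier(0)
	writeBarrier(10)  // same card as offset 0
	writeBarrier(513) // next card
	fmt.Println(dirtyCards())
}
```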

~~~
pcwalton
Well, another solution (which I believe is common) is to just tenure the young
object (and recursively tenure objects reachable from it) if the write barrier
trips, which preserves the precision.
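A toy version of that tenure-on-barrier-trip policy, with an illustrative object representation (not any runtime's actual one): when an old object is made to point at a young one, the young object and everything reachable from it is promoted, so no old-to-young pointers survive.

```go
package main

import "fmt"

// object is a toy heap node with a tenured flag.
type object struct {
	tenured  bool
	children []*object
}

// tenure recursively promotes obj and everything reachable from it.
func tenure(obj *object) {
	if obj == nil || obj.tenured {
		return
	}
	obj.tenured = true
	for _, c := range obj.children {
		tenure(c)
	}
}

// writeBarrier models storing young into a field of old: if the barrier
// trips (old generation pointing at young), promote transitively.
func writeBarrier(old, young *object) {
	if old.tenured && !young.tenured {
		tenure(young)
	}
	old.children = append(old.children, young)
}

func main() {
	old := &object{tenured: true}
	young := &object{children: []*object{{}}}
	writeBarrier(old, young)
	fmt.Println(young.tenured, young.children[0].tenured)
}
```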

There isn't anything fundamental to generational GC that requires imprecision.
It's an implementation detail. In fact, David Ungar's original generational GC
paper from 1984 [1] uses precise remembered sets. GHC's GC [2] among others
uses precise remembered sets.

[1]:
[https://people.cs.umass.edu/~emery/classes/cmpsci691s-fall20...](https://people.cs.umass.edu/~emery/classes/cmpsci691s-fall2004/papers/p157-ungar.pdf)

[2]:
[https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage...](https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/GC/RememberedSets)

~~~
DblPlusUngood
Tenuring the young object in the write barrier is not correct since the
young objects reachable from the newly-tenured object won't be marked in the
young collection. That is why TOC then traces this newly-tenured object, which
is novel to my knowledge.

~~~
pcwalton
Yeah, of course you need to recursively tenure objects, and I should have been
more precise about that. I've heard of doing this before; it isn't novel.

------
vvanders
Looks a lot like an Arena allocator tied to a specific execution context.

~~~
Doradus
I'm not sure you've grasped the write barrier part. That's the really new
part.

------
cmurphycode
I'm having trouble understanding what the practical difference is between the
transaction concept and something like RAII. It may just be the example they
chose (goroutine processes an incoming request, does not "share" the contents
with anyone else, and terminates when complete) but nonetheless I'm having a
mental block. Can anyone clarify?

Note that I'm not saying that they should've used RAII along with existing GC,
I'm just trying to distinguish the transaction concept and the RAII concept.

~~~
pcwalton
RAII is completely static; the lifetimes of all (uniquely-owned) objects are
determined at compile time with a set of rules based on the language syntax.
But the "published"/"local" distinction as in this GC is dynamic: the compiler
does not (and cannot) statically work out which objects are published and
instead determines that at runtime.

This basic distinction leads to completely different implementations.
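A small illustration of why the published/local split is inherently dynamic (names here are made up): whether the allocation escapes its goroutine depends on a runtime value the compiler cannot know.

```go
package main

import "fmt"

type node struct{ v int }

var global *node

// publishSometimes shows why "published vs. local" must be tracked at
// runtime: whether n becomes visible to other goroutines depends on
// cond, which only exists at runtime.
func publishSometimes(cond bool) *node {
	n := &node{v: 42}
	if cond {
		global = n // published: now reachable by other goroutines
	}
	return n // otherwise still local to the allocating goroutine
}

func main() {
	fmt.Println(publishSometimes(true).v)
}
```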

------
justin_vanw
If you replace 'transaction' with 'stack'...

~~~
giovannibajo1
Go already allocates on the stack anything it can prove doesn't escape. I
don't see much work going on in escape analysis, and I hope they will improve
it.
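A sketch of the two cases (function names are illustrative; the compiler's actual decisions can be inspected with `go build -gcflags=-m`):

```go
package main

import "fmt"

type point struct{ x, y int }

// p never leaves this frame, so escape analysis can keep it on the
// stack: no garbage is created.
func sumOfSquares(a, b int) int {
	p := point{a, b}
	return p.x*p.x + p.y*p.y
}

// The returned pointer outlives the frame, so &point{...} escapes to
// the heap and becomes the GC's problem.
func newPoint(a, b int) *point {
	return &point{a, b}
}

func main() {
	fmt.Println(sumOfSquares(3, 4))
	fmt.Println(newPoint(1, 2).y)
}
```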

There was a promising proposal document, with a couple of interesting goals,
but I guess it was abandoned:
[https://docs.google.com/document/d/1CxgUBPlx9iJzkz9JWkb6tIpT...](https://docs.google.com/document/d/1CxgUBPlx9iJzkz9JWkb6tIpTe5q32QDmz8l0BouG0Cw/preview)

~~~
twotwotwo
Nah, some of those things went in. I remember seeing a change to keep small
string/byte slices' bytes and the smallest size map on the stack sometimes
now, for example.
[https://twitter.com/golang_cls](https://twitter.com/golang_cls) posts a
selection of change discussions if you like this kind of stuff. The
test/escape_* tests in the Go source tree show what escape analysis is
expected to do and not do. Dmitry Vyukov, the author of the doc you linked,
worked on a lot of the changes he was talking about; I'm impressed with the
range of his contributions (also including the race checker and go-fuzz).

------
kevan
I'm not at all an expert in this area, but to me this sounds like a similar
concept to Rust-esque ownership, at least internally in the runtime. If the
runtime knows that an object wasn't shared by its goroutine owner and the
owner goes out of scope then cleanup becomes (theoretically) much faster, if
only because scope is greatly limited vs GC that has to scan the entire heap.

~~~
pcwalton
It's no more similar to Rust than any generational collector is (which is to
say: not much at all). Rust's memory management story is primarily static,
while this is dynamic.

~~~
cwzwarich
Rust's memory management story is not primarily static. Anything that is
allocated on the stack or anchored by a stack allocation is dynamic; it can
fail, and Rust checks for failure at runtime.

~~~
dikaiosune
But Rust determines at compile time when almost all allocations need to be
freed (the exceptions being Rc/Arc reference counting boxes), whereas it seems
like a transaction/request GC would discover that at runtime.

In other words, the allocations and deallocations are determined statically,
they aren't static allocations in and of themselves.

------
twotwotwo
A nice thing about this is that it keeps the collections happening while no
users are waiting. Before seeing this I figured what they'd do for throughput
might be more of a Java/C#-like compacting young gen collection, which still
introduces (short!) pauses.

Also, fun wrinkle: seems like _occasionally_, wrapping a slowish, mostly-
self-contained, garbage-heavy function call in a goroutine could be a useful
hint to this collector. It seems like it's only useful when 1) a transactional
GC on the calling thread or full GC would really be slower (e.g. calling
thread keeps lots of live data so GC on it takes more work), and 2) the
launch/communication costs (expect at least µsecs) are small enough relative
to gains. But interesting it's even possible for it to work out that way.
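That wrinkle might look like the sketch below (all names illustrative); whether it ever helps would depend entirely on measurement, since the goroutine and channel overhead must stay below the GC work saved.

```go
package main

import "fmt"

// garbageHeavy is a self-contained computation that produces a lot of
// short-lived garbage.
func garbageHeavy(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		s := make([]int, 64) // short-lived garbage each iteration
		s[0] = i
		sum += s[0]
	}
	return sum
}

// viaGoroutine wraps the call so its allocations belong to a separate
// "transaction" (goroutine) from the caller's.
func viaGoroutine(n int) int {
	ch := make(chan int, 1)
	go func() { ch <- garbageHeavy(n) }()
	return <-ch
}

func main() {
	fmt.Println(viaGoroutine(10))
}
```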

------
Kapura
I have a generational art program that I wrote under Go 1.4 which was
absolutely crushed by 1.5's GC. The program heavily abuses goroutines, so I'm
cautiously optimistic that this optimisation would allow me to keep my version
of Go current once again.

~~~
gtirloni
What is a generational art program? If you don't mind me asking.

~~~
Kapura
Well, in my case specifically, it was a program that generated coloured images
based on a set of simple rules (namely, don't reuse the same RGB colour for
any pixel & try to be as close to the colour of neighbouring pixels as
possible). I wrote a brief overview around the time I finished the first
version[1].

More recently, I wrote a new routine which generates several of those and then
averages the RGB channel values on a per-pixel basis, and then repeatedly re-
averages the original images with those average images to create more
"generations." I haven't written anything up about that because the code
required to sum and average pixels in a PNG is significantly less interesting.

[1] [https://medium.com/@kapuramax/procedural-image-generation-
in...](https://medium.com/@kapuramax/procedural-image-generation-in-
go-7a57ff2e2e90#.7dxx5hzci)

------
tekacs
For folks looking for a collector with a similar ideal (though with a
potentially more useful constraint model), you might see the Pony language [1]
collector [2].

The essence being to take advantage of some form of isolation in a concurrent
system to dynamically infer something that looks like scoping/lifetimes.

[1]: [http://www.ponylang.org/](http://www.ponylang.org/)

[2]: [http://blog.acolyer.org/2016/02/18/ownership-and-
reference-c...](http://blog.acolyer.org/2016/02/18/ownership-and-reference-
counting-based-garbage-collection-in-the-actor-world)

------
kevindeasis
Somewhat off topic, but since there are growing opportunities for golang
developers, I'm gonna ask a golang development question.

Is anyone using golang with a different language like C, Rust, etc.? Are you
using it through linkage or through microservices?

What libraries are you using with golang that are written in a different
language? What does your company / app / system / software do?

Do you recommend using libraries in a different language or using related
golang libraries?

------
stcredzero
I'd like to see this tried, along with a couple of wrinkles. Could information
from the escape analysis done by the Golang compiler be used to provide
"hints" to the garbage collector to make it more efficient? Could such hints
be combined with further hints from the programmer, such as "finalize" calls
or tags? Could runtime tracing information be used to derive even more hints
for more efficient GC?

~~~
thwd
Note: "finalize"-hints in Go are just curly braces, which begin and end
blocks. Variables local to the scope are "finalized" (i.e. collectable,
assuming they live on the heap) when the block ends.
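That point can be illustrated as below (a sketch; in practice the compiler's liveness analysis, rather than the braces themselves, determines when memory becomes collectable):

```go
package main

import "fmt"

// The inner braces end buf's scope, so the large buffer becomes
// collectable (assuming it was heap-allocated) well before the
// function returns.
func process() int {
	total := 0
	{
		buf := make([]byte, 1<<20) // large temporary
		buf[0] = 1
		total += int(buf[0])
	} // buf is out of scope here; its memory is now collectable
	return total
}

func main() {
	fmt.Println(process())
}
```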

~~~
stcredzero
_Note: "finalize"-hints in Go are just curly braces, which begin and end
blocks._

But sometimes objects will survive the close braces.

~~~
thwd
I'm not sure I understand: do you mean that they are collected after the
program counter hits the last instruction of a given scope or are you implying
that scope-local variables outlive their scope somehow?

------
strictfp
I've been thinking about this before as RAII with optional deferring of
collection when batching would be advantageous, for instance when you have
high allocation pressure and small blocks.

~~~
Doradus
RAII is using one C++ misfeature (destructors) to work around another
misfeature (inability to do cleanup when leaving a lexical scope). Go has a
far more elegant solution to the latter; if you're not familiar with the defer
statement, check it out.

~~~
strictfp
Eh, defer is more similar to Java's try-with-resources or C#'s using, which
both put the burden on the user of the object and cannot handle recursive
cleanup (cleanup of trees). So I consider it much worse than destructors.

------
eternalban
We really need to stop thinking of memory as a 'store'. Treat it as a cache
and (a) GC naturally becomes a background process, and (b) you get persistent
memory for free.

~~~
SixSigma
I doubt it:

@rob_pike

Caches are bugs waiting to happen.

22 Mar 2014

[https://twitter.com/rob_pike/status/447202124753952768](https://twitter.com/rob_pike/status/447202124753952768)

~~~
eternalban
Another bon mot from Rob:

    
    
        My late friend Alain Fournier once told me that he
        considered the lowest form of academic work to be
        taxonomy. And you know what? Type hierarchies are
        just taxonomy.

Monday, June 25, 2012 [https://commandcenter.blogspot.com/2012/06/less-is-
exponenti...](https://commandcenter.blogspot.com/2012/06/less-is-
exponentially-more.html)

Possibly Mr. Fournier meant 'foundational' by "lowest". If not, he may wish to
revisit the foundational work performed by "lowly" academics such as a Mr.
Darwin in the 19th century.

[p.s. we need to alert the CPU designers that they have been cluelessly using
caches all these years. If only they had higher guidance from an infallible
mind..]

~~~
enneff
I think CPU designers would be a lot happier if they didn't need caches.

------
dboreham
Finally a reason for me to consider using Go vs Java.

~~~
eternalban
The decision tree remains the same as before.

Both systems have active runtime sub-systems, GC, scheduler, and a compiler
front-end.

Java, of course, has a virtual machine and intermediate representation (byte
codes), and a JIT to make it performant. The runtime class loading mechanism
is a key +input here. The JIT aspect is a mix, with the -input being the
upfront startup cost.

Go has far better C linkage, access to underlying memory object image, and
provides the (glaringly missing from JVM) unsigned integer types.

Java has a more modern type system and first-class metadata facilities, with
-input being "more rope". Go has an 'interesting' type system, with -input
being source level fragility, but +input here is "less rope".

Having extensively worked with concurrent systems in both languages, the
distinction ultimately boils down to a question of whether language-level
continuation (fiber/green) as provided by Go vs. library/ByteCodeEngineering
(various Java libs) is a critical requirement.

I honestly think it should be clear at this point that Go is really displacing
Python and the notion of Go vs. Java is a false dichotomy.

~~~
wheaties
Go isn't displacing Python any more than Elixir is displacing Ruby.

------
oleganza
This is similar to "stack promotion" in Swift
([http://fossies.org/linux/swift-
swift/lib/LLVMPasses/LLVMStac...](http://fossies.org/linux/swift-
swift/lib/LLVMPasses/LLVMStackPromotion.cpp)).

Objects that can be determined to stay within a local scope get allocated on
the stack instead of a heap, so cleanup is very efficient.

The difference is that the decision is made at compile time in Swift and at
runtime in Go.

~~~
leaveyou
Actually no. The Go compiler already has "stack promotion" (via "escape
analysis"). TOC is in addition to that, and I think it's intended to cover
the cases not covered by stack promotion. For example, Function A calls
Function B, which allocates some memory and returns it to Function A, which
"consumes" it before returning. In this case the allocation can't be promoted
to the stack because B returns and its stack gets destroyed, but the
allocated memory can still be safely released when Function A returns.
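That Function A / Function B scenario might look like this sketch (names and types are illustrative): b's allocation outlives b's frame, so escape analysis must put it on the heap, yet it is garbage before a returns.

```go
package main

import "fmt"

type report struct{ lines []string }

// b's result must be heap-allocated: the pointer survives b's stack
// frame, so stack promotion is impossible.
func b() *report {
	return &report{lines: []string{"x", "y", "z"}}
}

// a consumes the report and discards it before returning.
func a() int {
	r := b()
	n := len(r.lines) // "consume" the report
	// r is dead here; a transaction-oriented collector could reclaim it
	// when the goroutine's transaction ends, without a global collection.
	return n
}

func main() {
	fmt.Println(a())
}
```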

------
joosters
Each release of Go seems to have yet another garbage collection algorithm,
targeting a different use case each time. Presumably the previous use cases
are left behind, ready for a future update going back to target them too. This
leads to an infinite loop of GC alterations.

Does anyone feel confident that a 'good enough for everyone' GC will ever be
produced? If not, surely the language designers should settle on picking a
type of program that they will optimise for, and sticking to it.

~~~
ngrilly
> Presumably the previous use cases are left behind

No, they are not. The optimization discussed here adds something to the
existing GC, and removes nothing.

> Does anyone feel confident that a 'good enough for everyone' GC will ever be
> produced?

The Go GC is constantly improving, and its performance in Go 1.6 and 1.7 is
considered very good by most users.

~~~
vvanders
There's a subthread further down that points to this exact type of regression:
[https://news.ycombinator.com/item?id=11970313](https://news.ycombinator.com/item?id=11970313)

Without full memory control you're always going to be making trade-offs in
terms of performance. Hopefully the GC is "good enough" for your use case but
will never be a one-size fits all GC.

~~~
enneff
FWIW that "regression" looks like a program that depends on particular runtime
characteristics to generate "artistic" output. Not really the kind of issue
you can realistically develop around.

In almost all practical programs the new GC (in 1.5) was a significant win, or
at worst no change.

------
exabrial
Nice... why hasn't the JVM ever done this?

~~~
icholy
I'm guessing because the transactions are essentially goroutines which the JVM
doesn't have.

~~~
Doradus
Nope. Transactions, in the sense of the article, bear no relation to
goroutines.

The right answer is that Java's generational collector supports this
transaction lifetime reasonably well in many cases.

------
japasc
Why not let programmers choose transaction boundaries within a goroutine, so
they can have fine-grained control over allocations and deallocations?

~~~
enneff
Because that would be a different language altogether, and that language
exists: Rust. It's pretty cool.

