
Garbage Collection - tosh
http://craftinginterpreters.com/garbage-collection.html
======
ufo
An interesting story on generational garbage collectors: a couple of versions
ago, Lua experimented with a generational GC. But it didn't improve
performance, so they kept the old incremental collector. A long time later,
they looked at it more closely and realised that the actual problem was that
the generational collector was moving objects out of the nursery too eagerly.
Objects in the stack are live and will survive if the GC runs, but many of
them will stop being live as soon as their function returns. They fixed the
issue by making stack objects that survive one round of GC stay in the
nursery. The new GC is now much faster than the old one, and is one of the
big features of the upcoming Lua 5.4.

~~~
cracauer
When using generational GCs it is of utmost importance to match the policies
for moving things between GCs to the actual pointer interconnection pattern.

This can go really badly, for example in systems that keep
caches/repositories of preallocated things for faster re-use. If they are
anchored at global variables, or near-global ones, then you need to treat
those preallocated things specially. You can't just go and move them every
time, even when you know you will never collect them.

Generally, GC works better if you don't do tricks/optimizations with memory
allocation and just let everything flow freely into the heap. If you do have
to optimize allocation, you generally have to teach your GC about your hack.

~~~
hinkley
One consequence of this is that when you do in-process caching, you trade
improved average-case performance for degraded worst-case behavior, unless
the GC authors have taken some pains to deal with pointer chasing on writes
to the old generation.

I'm kind of a fan of out-of-process, same-system caches for this reason.

------
zackmorris
Great article, but I'm curious why automatic reference counting (ARC) and
smart pointers never seemed to really catch on outside of Objective-C and
Swift:

[https://en.wikipedia.org/wiki/Automatic_Reference_Counting](https://en.wikipedia.org/wiki/Automatic_Reference_Counting)

[https://en.wikipedia.org/wiki/Smart_pointer](https://en.wikipedia.org/wiki/Smart_pointer)

They almost "just work", except for circular references:

[https://en.wikipedia.org/wiki/Reference_count#Dealing_with_r...](https://en.wikipedia.org/wiki/Reference_count#Dealing_with_reference_cycles)

I'd like to see some real-world studies on what percentage of unclaimed memory
is taken up by orphaned circular references, because my gut feeling is that
it's below 10%. So that really makes me question why so much work has gone
into various garbage collection schemes, nearly all of which suffer from
performance problems (or can't be made realtime due to nondeterministic
collection costs).

Also I can't prove it, but I think a major source of pain in garbage
collection is mutability, which is exacerbated by our fascination with object-
oriented programming. I'd like to see a solid comparison of garbage collection
overhead between functional and imperative languages.

I feel like if we put the storage overhead of the reference count aside (which
becomes less relevant as time goes on), then there should be some mathematical
proof for how small the time cost can get with a tracing garbage collector or
the cycle collection algorithm of Bacon and Paz.

~~~
munificent
_> They almost "just work", except for circular references_

Well, and they're often slower, since they require mutating the reference
count every single time a field is stored. You can optimize that using lazy
reference counting, or not updating refcounts for references on the stack and
instead scanning the stack before a collection. But at that point... you're
halfway to implementing a tracing collector.

Every refcounting implementation eventually gets "optimized" to the point that
it has most of a tracing collector hiding inside of it. I think it's simpler,
cleaner, and faster to just start with tracing in the first place.

There's a reason almost no widely used language implementation relies on ref-
counting. CPython is the only real exception and they would switch to tracing
if they could. They can't because they exposed deterministic finalization to
the user which means now their GC strategy is a language "feature" that
they're stuck with.

That being said, ref-counting is fine for other kinds of resource management
where resources are pretty coarse-grained and don't have cyclic references.
For example, I think Delphi uses ref-counting to manage strings, which makes a
lot of sense. Many games use ref-counting for tracking resources loaded from
disc, and that also works well. In both of those cases, there's nothing to
trace through, and the overhead of updating refcounts is fairly low.

~~~
ridiculous_fish
Objective-C and then Swift are surely the most serious goes at _fast_ widely
used RC languages. Neither requires mutating the reference count on every
store, and neither seems at risk of growing a tracing GC.

ObjC had some elision tricks like objc_retainAutoreleasedReturnValue, but more
importantly the optimizer was taught about RC manipulation. Swift then
extended this with a new ABI that minimizes unnecessary mutations.

The big advantages of this scheme are efficient COW collections and a simpler
FFI (very important with Swift). More generally RC _integrates_ better.
Imagine teaching the JS GC to walk the Java heap!

~~~
pjmlp
So efficient that Swift's RC implementation loses against all major tracing GC
implementations.

[https://github.com/ixy-languages/ixy-languages](https://github.com/ixy-languages/ixy-languages)

~~~
jmull
That project doesn’t compare GC implementations, so it’s probably not that
useful here.

Also, the Swift implementation is a bit questionable if performance is a goal.
That is, why not try to remove the memory management from the inner loop?
Probably the first thing to try is value types instead of reference types,
which are more generally preferred anyway.

~~~
pjmlp
Sure it does: memory management impacts the performance of the task being
achieved, here writing a high-speed network driver.

~~~
jmull
By that measure, every project is a good measure of GC performance. Is that
really a good argument?

I believe all general-purpose languages let the code allocate memory and
therefore will let you allocate memory in a way that is inefficient for your
task.

~~~
pjmlp
The argument is that to eventually write Swift code whose performance matches
the competition, one needs to work around ARC's implementation.

Hence all the emphasis on value-driven programming alongside protocols in
WWDC talks.

------
dan-robertson
Whenever the topic of garbage collection comes up, I am reminded of the
following excellent reference,
[https://www.memorymanagement.org/](https://www.memorymanagement.org/) which
puts various different garbage collection (and memory management) techniques
into a wider context. It perhaps doesn’t explain some of the newer tricks (e.g.
using overlapping memory mappings to put the colour bits into the high(ish)
bits of pointers and get a hardware write barrier only when necessary without
needing to move lots of objects and have forwarding pointers).

The reference is provided by Ravenbrook, a consulting company formed from the
ashes of Harlequin (who made LispWorks, a Common Lisp implementation and IDE;
MLWorks, the same for SML; and ScriptWorks, a PostScript rasteriser which made
them all their money). I don’t know when the reference was created.

------
azhenley
Why was the title changed to include (2017)? It was published today.

~~~
detaro
Please email the mods at hn@ycombinator.com for such corrections.

~~~
Noumenon72
This comment seems like something others will want to see. Before the title is
corrected (five hours so far), the comment is useful for people who didn't
notice the wrong date, or did notice and wondered if anyone had emailed the
mods. After the title is corrected, the comment is a useful place to explain
why the mistake happened and learn a little about the process that changes the
titles.

~~~
detaro
And if the user is interested in that change to actually happen, contacting
the moderators is the best way of doing it. They might not be aware of that,
which is why I mentioned it, so that they and others reading the comment know
in the future.

------
jgon
Why is this marked 2017? This is the newest chapter of a book that isn't
finished yet, and was released literally today. It couldn't be more (2019) if
it tried!

~~~
jodrellblank
Tech moves fast; this morning's bleeding edge is this afternoon's yesteryear,
and you don't want to know what happens if you bookmark it to read tomorrow.

~~~
jacobush
Was this a subtle reference to a sweep and mark algorithm?

------
cracauer
If you are interested in optimizing generational GC with operating system
facilities:

[https://medium.com/@MartinCracauer/generational-garbage-coll...](https://medium.com/@MartinCracauer/generational-garbage-collection-write-barriers-write-protection-and-userfaultfd-2-8b0e796b8f7f)

And a review of LLVM's GC facilities:
[https://medium.com/@MartinCracauer/llvms-garbage-collection-...](https://medium.com/@MartinCracauer/llvms-garbage-collection-facilities-and-sbcl-s-generational-gc-a13eedfb1b31)

The overall message is that there are many different ways to get to the
destination, and creative solutions are mixed in.

~~~
dan-robertson
Do you know if LLVM got good at garbage collection support recently? My
understanding was always that their optimisations would mangle your
stack/registers so much that a decent incremental exact GC becomes super hard.
And that something like keeping one/two pointers to the tip of the heap in
dedicated registers is super hard/annoying too.

~~~
cracauer
I don't have anything new. I might have to re-write a GC for Clasp. Then I
know :)

A better question would be whether anybody has a moving, precise GC running
on LLVM. Clasp with MPS runs, but that might be coincidence/luck.

------
mwcampbell
I'm a bit surprised to find no discussion of reference counting versus tracing
GC, only a couple of passing mentions. On the one hand, I suppose this has
already been discussed to death. And if the author felt it would be more fun
to write and read about actually implementing mark-and-sweep GC, fair enough.
On the other hand, reference counting definitely has its place, and I'd be
curious to know more about the author's opinion on it, given his background in
game scripting languages. Some scripting languages that have been put to good
use in games, such as AngelScript [1] and Squirrel [2], use reference
counting.

Edit: My mistake; he discussed it back in chapter 3.

[1]:
[https://www.angelcode.com/angelscript/](https://www.angelcode.com/angelscript/)

[2]: [http://squirrel-lang.org/](http://squirrel-lang.org/)

~~~
carapace
FWIW, in re: reference counting vs. tracing GC

"A Unified Theory of Garbage Collection"
[https://researcher.watson.ibm.com/researcher/files/us-bacon/...](https://researcher.watson.ibm.com/researcher/files/us-bacon/Bacon04Unified.pdf)

> Tracing and reference counting are uniformly viewed as being fundamentally
> different approaches to garbage collection that possess very distinct
> performance properties. We have implemented high-performance collectors of
> both types, and in the process observed that the more we optimized them, the
> more similarly they behaved— that they seem to share some deep structure.

> We present a formulation of the two algorithms that shows that they are in
> fact duals of each other. Intuitively, the difference is that tracing
> operates on live objects, or “matter”, while reference counting operates on
> dead objects, or “anti-matter”. For every operation performed by the tracing
> collector, there is a precisely corresponding anti-operation performed by
> the reference counting collector.

> Using this framework, we show that all high-performance collectors (for
> example, deferred reference counting and generational collection) are in
> fact hybrids of tracing and reference counting. We develop a uniform cost-
> model for the collectors to quantify the trade-offs that result from
> choosing different hybridizations of tracing and reference counting. This
> allows the correct scheme to be selected based on system performance
> requirements and the expected properties of the target application.

~~~
pcwalton
This is one of my favorite papers. It's a very accessible article that dispels
the myth that reference counting is somehow not a form of garbage collection.

~~~
munificent
_> This is one of my favorite papers._

Mine too. It's a gem.

------
c-smile
While reading I was expecting to hit a moving/copying/compacting GC
discussion at some point, but no.

IMO, one of GC's benefits is that it maintains data locality. Otherwise a
given GC is not that different from reference-counting memory management
(loops in ownership graphs aside): it inherits the same memory-fragmentation
problems from malloc/free.

~~~
BubRoss
When you say locality, you mean moving memory allocations next to each other
so that there is no wasted space between them right?

~~~
cracauer
Not wasted space. Unrelated space would do.

Keep in mind that you only have a tiny amount of L1 data cache lines. They are
gone so quickly. If you can get a couple more struct instances in an array
into those cache lines (without the cache lines holding unrelated nonsense as
a byproduct of a memory fetch) that is a huge win.

The issue of L1 cache _lines_ is more important than the size of the L1 cache.
The granularity of the cache lines uses up the size of the cache very quickly
if all cache lines are padded with three-quarters nonsense that you don't need
right now.

------
vidarh
This is great. Garbage collection is one of those things that tends to scare
people, even though a basic garbage collector can be _very_ simple (though a
naive approach is usually also horribly slow...). Great to see the amount of
illustrations to make it easier to grasp.

~~~
cracauer
A GCed piece of nontrivial code can also be faster than a malloc/free
solution.

The reason is that allocation can be much faster. GCed code can allocate
memory with as little as a pointer increment (and that can be done
atomically, so no thread locking is needed).

malloc/free always do full function calls, and they may descend into dealing
with fragmentation (i.e. finding free space). Likewise, free() isn't free,
and cross-thread allocation/deallocation can further complicate things.

~~~
jashmatthews
Is this universally true? Redis, for example, vendors jemalloc, so it’s
entirely possible for malloc to mostly inline. IIUC, malloc isn’t a syscall,
unlike the underlying sbrk and mmap calls that malloc implementations use to
get memory from the kernel.

~~~
cracauer
Sure, you can change any of the individual properties.

But if you cannot move memory (adjust pointers like most GCs do) then you will
have to deal with fragmentation, which slows down allocation (or causes other
drawbacks).

A moving GC can also make the program faster due to memory compaction and
hence being more efficient wrt the CPU cache and TLB.

~~~
vidarh
Fragmentation issues depend greatly on the language. E.g. I'm (perpetually,
it seems) working on a Ruby compiler, and I slotted in a naive GC as a
starting point and instrumented it.

It turned out the vast majority of allocations are tiny object instances of a
handful of different sizes. As a result, minimizing fragmentation is as easy
as allocating pools of a fixed size for small objects and rounding larger
objects up to a multiple of the block size. There are still truly pathological
patterns possible, and a compacting / generational gc may still be worth it
later both to deal with that and to reduce the cost of a collection, but for a
lot of uses you can get away with simple options like that.

------
aiProgMach
I'm definitely a newbie at this topic, but I wonder if we can have some type
of static GC? I'm not talking about a radical change to the programming
language model (like Rust), but about a compile-time analyzer that can detect
when an object will go out of memory (edit: out of scope) and insert code to
deal with it, or make some predictions about the actual generation of the
object. Wouldn't this decrease the amount of work required from the GC at
runtime?

~~~
rng_civ
Isn't a compile-time analyzer that can detect when an object will go out of
memory precisely what Rust (and C++ RAII) does? It's just a form of escape
analysis.

Imagine creating a Rust program entirely with `Rc`. It's basically a GC'd
program at the point where the "roots" are managed by the reference counter.
The "list of roots" is only ever messed with whenever a `Rc` is
dropped/created, and one can optimize functions to take `&Rc` to reduce "GC"
pressure. I do not believe it's possible to automate this process in general
because if you could, I have a hunch the solution can be used to decide the
Halting Problem.

So sure, in general a GC can perform some heuristics to predict the lifetime
of an object, but usually the point of using a GC is that one does NOT know
the lifetime or it is insanely complex.

~~~
nostrademons
Such analysis also requires language design that limits the legal statements
in a program to make the analysis tractable.

Without these restrictions, this problem is isomorphic to the halting problem.
(Proof: assignment of a given memory object to a field within another
unrelated object creates another reference. The job of automatic memory
management is to determine when no such references exist. Now replace that
assignment with HALT. Any such automatic memory manager that operates
statically would be able to find all HALT statements within the program and so
solve the halting problem for an arbitrary program.)

That's why languages that manage memory statically like Rust & C++ must be
able to reject some programs as "not passing the borrow-checker", and
everything else requires run-time support via either GC or refcounting.

~~~
vidarh
Doing it perfectly requires a suitable language design, but even a
pathologically static-analysis-unfriendly language like Ruby still allows you
to determine it in many cases. You just need to accept that for such
languages it is an optimisation, and you still need to fall back on full gc.

------
a1369209993

      > if (previous != NULL) {
      >   previous->next = object;
      > } else {
      >   vm.objects = object;
      > }

Ergh. That's much more painful than it needs to be. Maybe try:

      static void sweep() {
        Obj** ref = &vm.objects;
        while (*ref != NULL) {
          if ((*ref)->isMarked) {
            (*ref)->isMarked = false;
            ref = &(*ref)->next;
          } else {
            Obj* unreached = *ref;
            *ref = unreached->next;
            freeObject(unreached);
          }
        }
      }

~~~
1f60c
Funny you should say that.

 _Baby’s First Garbage Collector_ [0] (from the same author) uses this
approach.

[0]: [https://journal.stuffwithstuff.com/2013/12/08/babys-first-ga...](https://journal.stuffwithstuff.com/2013/12/08/babys-first-garbage-collector/)

------
ufo
The long-awaited garbage collection chapter is finally here! I'm not sure the
discussion about white/gray/black objects is strictly necessary for a simplest
possible garbage collector, but it definitely will help those who want to read
more about the topic in the future.

~~~
munificent
_> I'm not sure the discussion about white/gray/black objects is strictly
necessary for a simplest possible garbage collector_

I wondered about that too. If you aren't doing an incremental collector, it's
not really necessary. But I think it helps build a visual intuition for the
algorithm (and other graph traversals for that matter), so I felt it was
worthwhile to put in there.

------
flavinoezs
Garbage collection solves many problems in virtual machines, such as memory
leaks and stray overwrites of main memory. Many programming languages now
implement something like garbage collection in their runtime, so the compiler
and runtime can manage memory for the user.

