
Baby's First Garbage Collector (2013) - rspivak
http://journal.stuffwithstuff.com/2013/12/08/babys-first-garbage-collector/
======
groovy2shoes
Shameless plug: I ported the code to Rust: [https://github.com/baguette/rs-
babygc](https://github.com/baguette/rs-babygc) (feedback is welcome).

~~~
Manishearth
You should be able to use RAII to pop things from the stack so that it mirrors
the actual stack. I think.

Otherwise, looks pretty good. Interestingly, this doesn't contain any unsafe
code, probably because Rc and RefCell are doing the actual memory/safety
management. Still, the purpose of this exercise is to focus on the mark-and-
sweep part, and that seems to have been done pretty well here!

~~~
groovy2shoes
All the GC implementations I've seen in Rust use unsafe code, and I _really_
wanted to avoid it. I struggled a bit with some of the RefCell stuff, but the
folks on #rust were really helpful. I'm still not sure that a practical GC can
be implemented without unsafe, given that RefCells can't be recursively
borrowed (so as soon as you have mutable recursive data structures, you have
to find some way to work around the RefCell limitation that the contents can
only be mutably borrowed once). I wonder if it wouldn't be nice to have
something similar to a RefCell that allows multiple mutable borrows, which I
think would be safe (only) if the application is single-threaded. Or perhaps a
variant that mutably borrows only non-RefCell bits of the contents, leaving
the nested RefCell free to be mutably borrowed.

Thanks for your feedback :) I'd had it sitting there for a while with no idea
if it was any good or not.

~~~
steveklabnik

      > I think would be safe (only) if the application is single-threaded.
    

Unfortunately, it's not that easy:
[http://manishearth.github.io/blog/2015/05/17/the-problem-
wit...](http://manishearth.github.io/blog/2015/05/17/the-problem-with-shared-
mutability/)

~~~
groovy2shoes
Good to know. Thanks.

~~~
Manishearth
In fact, RefCell can't even be shared between threads, it _has_ to be used in
a single threaded context.

RefCell and the "no mutable aliasing" isn't something to be "worked around",
it's something to embrace, because safety and many other things come from it.

------
mbrubeck
Here's another good "implement a simple garbage collector" tutorial:

[http://web.engr.illinois.edu/~maplant2/gc.html](http://web.engr.illinois.edu/~maplant2/gc.html)

------
ezyang
In fact, most garbage collection variants are very pleasant to implement if
you assume that your system only has one type of object. (Simple enough to ask
undergraduates to do as a weekly homework.) Things get more hairy when you (1)
want to support multiple objects, and (2) want to make it fast.

~~~
kazinator
Support for multiple objects and making it fast is dwarfed by ... requirement
for _threads_ (not those green ones).

Now all those regions of code where "we can muck with this object before
returning it because we know GC isn't running, because we don't call any
allocation functions that can trigger GC" have to be locked down. (Among other
problems to solve, like that a thread can be doing anything when you pause
it.)

~~~
Manishearth
Interestingly, for rust-gc, the initial _safe_ GC design was the hard part.
Concurrency is easy to bolt on to it[1].

Basically, once we had a GC design that was "safe" (In Rust terms; i.e. can't
be used in safe Rust to trigger memory or thread unsafety), making GCd objects
passable between threads was a clean, logical extension of the existing model.

This is probably because Rust's safety model in a single threaded situation is
very close to what you need for multithreaded safety too.

[1]: however it gets tricky again if you want to avoid pausing threads as much
as possible (i.e. only pause on a GC mutation).

------
munificent
Wow, has it been almost three years since I wrote this? Where did the time go?

~~~
michaelwww
Your brain garbage collected the intervening unneeded memories. ;-)

------
imtringued
This beats all of my christmas presents even if it's just a toy
implementation.

------
yason
Writing a gc for native C as a plug-in replacement for malloc/free/realloc is
also very much doable and a rather well-scoped exercise for those who wish to
spend few hours on Sunday hacking.

~~~
pjmlp
It only works if 100% of the code behaves properly, as Apple discovered in
their Objective-C GC failed attempt.

------
mentos
Very nice example contains only the code it needs to explain the concept.

Are there any languages that gc every tick?

~~~
munificent
Not that I know of. There are a bunch of games that use "arena allocators" or
"per-frame allocators". Often, the game engine creates a bunch of objects
during a frame of gameplay that it _knows_ won't be needed once that frame is
done. There's a clever strategy for managing that. When the game starts up,
you allocate a big block of memory, the arena, and then set a pointer to the
beginning of it.

To allocate an object, you just increment the pointer by the size of the
object being allocated and return the previous value. In other words, you just
hand out some bytes and move the pointer just past them.

When the frame is done, to "free" all of those objects... just reset the
pointer. Done.

But, to your original question, it's almost always a bad idea to run your GC
every tick or some other high frequency like that. GCs spend a certain amount
of time visiting objects that are still in use. Doing that work over and over
and over again burns tons of cycles for no point. To get the best overall
performance, you want to do as _few_ GCs as possible to minimize that
redundant work.

The problem then is that when you wait a long time, you end up having to a
big, slow GC and long pauses like that are a problem. Incremental and
concurrent GCs avoid long pauses and give you smoother results, at the expense
of slightly lower overall performance. They get better latency but worse
throughput.

