
Writing a Simple Garbage Collector in C - webkike
http://web.engr.illinois.edu/~maplant2/gc.html
======
munificent
This article teaches you how to write a _conservative_ GC, which is what you'd
do if you wanted to garbage collect in C or some other system where you don't
know the shape of objects at runtime.

If you do know that -- like you're writing a GC for a higher-level programming
language -- you can write a _precise_ GC instead and avoid all of the (very
interesting) muckiness this article goes through.

I wrote an article on that a while back:

[http://journal.stuffwithstuff.com/2013/12/08/babys-first-
gar...](http://journal.stuffwithstuff.com/2013/12/08/babys-first-garbage-
collector/)

~~~
Someone
Also, that avoids the very muckiness this article skips. In particular:

 _" Local variables can also be stored in registers, but we won't worry about
this because registers and usually dedicated to local variables, and by the
time our function is called they'll probably be saved on the stack anyway"_

Ignoring things that 'usually' or 'probably' do not happen is not a good idea
when programming, doubly so when doing systems programming. It is one thing
when your application crashes, another when the OS crashes or becomes
exploitable due to a bug.

If you want to implement a robust conservative garbage collector you will have
to study the calling conventions of the language you write it in, will have to
study the source of the compiler, and will have to study the release notes of
every OS or compiler upgrade. For example:

\- rust moved from segmented stacks to conventional ones. That probably broke
any code walking the stack.

\- on some systems, locals can get stored below the stack pointer
([http://en.m.wikipedia.org/wiki/Red_zone_(computing)](http://en.m.wikipedia.org/wiki/Red_zone_\(computing\)))

(Yes, this example code appears to be broken on x64)

~~~
tankenmate
Checking the registers is easy enough; just push them onto the stack and a
quick check of several consecutive memory addresses and you are done.

The other way to improve GC if you are using C is to make sure your local
allocations are in fact local; use alloca() and friends (but only for sub page
sizes, possibly less if you are doing lots of local allocations). This will
reduce the amount of GC required.

As far as Red Zone goes if you are in the GC then you can't be in another
local function; unless you are running your GC pre-emptively (signals etc), in
which case you need to be much much more careful.

------
millstone
> Local variables can also be stored in registers, but we won't worry about
> this because registers and usually dedicated to local variables, and by the
> time our function is called they'll probably be saved on the stack anyway

Heh. No wonder it's 32 bit! You can only hope to get away with this on a
register-poor ISA like i386.

I wonder if you could use setjmp to portably get access to register values?

~~~
cremno
It isn't portable. jmp_buf should be treated as opaque data type (just like
FILE or va_list), even if it isn't. Also, its content can be encrypted (glibc
does pointer mangling).

------
kazinator
1\. Writing a _simple_ anything isn't hard.

2\. Especially if _simple_ means you don't have any interrupts or threads.

3\. Writing anything is a lot easier if you carefully exclude "debugging" from
"writing".

When you have GC bugs, they don't always occur in the collector itself, but
rather when the code which relies on GC breaks the conventions that allow GC
to work correctly (because that code is generated by a C compiler which
doesn't understand your conventions, and so they are ensured by hand).

~~~
radicandor
Large, portable, robust programs are hard to write. But that's not what I'm
addressing. I'm trying to show how what is generally considered a complex
problem is simple when you strip it down to its essence.

~~~
kazinator
_" Local variables can also be stored in registers, but we won't worry about
this because registers and usually dedicated to local variables, and by the
time our function is called they'll probably be saved on the stack anyway"_

This is wrong; you have to take care of registers. The good noews is that an
easy way to do this that will work on a broad range of platforms is to store
the machine context to a jmp_buf that is on the stack. (You never use this
jmp_buf for a longjmp, obviously.)

~~~
MaulingMonkey
Relying on "probably" is completely the wrong foot to start out on, IMO. It
leads to this kind of craziness: [http://timetobleed.com/the-broken-promises-
of-mrireeyarv/](http://timetobleed.com/the-broken-promises-of-mrireeyarv/)

~~~
kazinator
I think the article does not really articulate the root cause properly. The
issue is that a Ruby object goes out of scope, while a low-level pointer to
its internals continues to be used (in the middle of code that allocates and
can trigger GC). How that problem translates to code generation on amd64 is
secondary.

What is happening is that the string object has been reduced to its C
fundamentals (pointer to data, and length), and the one and only variable
("str") which holds the original object from which these fundamentals are
derived has no next use after that.

The fix in the patch is not very good; the real fix is to get rid of the nasty
macro zstream_append_input2, and make it a real function.

The function still has to solve the problem of pinning the object while the
low level pointer is used, but it's more localized: we don't have to think
about the interaction of the macro with the site where it is expanded (and
will the hack always work).

In other words, don't write macros that break apart garbage-collected objects
into low-level C pointers to their internals that are invisible to the
collector. The macros spread dangerous code everywhere they are used.

~~~
MaulingMonkey
> The issue is that a Ruby object goes out of scope, while a low-level pointer
> to its internals continues to be used (in the middle of code that allocates
> and can trigger GC). How that problem translates to code generation on amd64
> is secondary.

"Out of scope" as defined how? Not by the C code, it can be in-scope in C but
optimized away by the compiler... which seems best illustrated by dissecting
the compiler's output.

> The fix in the patch is not very good; the real fix is to get rid of the
> nasty macro zstream_append_input2, and make it a real function.

That "real fix" doesn't actually _fix_ anything, though. Don't get me wrong,
macros are ugly (especially lowercase ones) and that's a fine refactoring to
suggest, but it's not a _fix_.

> In other words, don't write macros that break apart garbage-collected
> objects into low-level C pointers to their internals that are invisible to
> the collector. The macros spread dangerous code everywhere they are used.

In this context, there's extremely little difference between a macro and a
function that the compiler decides to inline.

------
IgorPartola
So am I reading this correctly, in that if I get allocated a segment at
0x12340 and then a random variable on the stack happens to have the value of
0x12340 for a different reason, then that segment would never be freed? Is
this strategy still sound?

Also, while this is a nice explanation, does anyone actually use GC's while
writing code in pure C? I never found the idea of calling free() that
troublesome. On the other hand, writing a GC in C for another language is
obviously a good use case.

~~~
webkike
Indeed, the garbage collector is not precise. However, in practice, the
situation you describe does not tend to occur that often. Intuitively there is
a 1/4 chance on 32 bit machines, but the chances are actually much due lower
to the heap being located at a high memory address, and most local integer
variables tend to have small values.

And yes there are definitely uses for a pure C garbage collector. For example,
one can use it to detect memory leaks, if for some reason valgrind isn't
around.

~~~
oppositelock
You are wrong.

I made a comment to that effect higher up before noticing this.

When you're dealing with real world data - timings, statistics, sizes, dates,
and then if you add floating point numbers whose representations look like big
ints, you will be hitting this a lot, and leaking memory, due to how your mark
and sweep works.

It is absolutely critical to have a sure way of disambiguating pointers from
data.

I've fixed bugs in a memory allocator that's running in an app deployed on
several hundred million PC's. It had a really fancy allocator that put a
header with a magic number around malloc output, in addition to what you do,
in an attempt to identify specific kinds of allocations. We'd see several tens
of thousands of crashes a day due to magic number collision with real world
data.

Your intuition is wrong. You can never assume that most local integer
variables tend to have small values. Assign -3.0 to a float, and map that to
an int, or read the current epoch time into an int, and see what happens.

A C garbage collector would certainly help write some code more easily, but it
has to work, and it has to be "precise". An imprecise GC is worthless in
production, but fun as an academic exercise.

~~~
rayiner
There are quite a few collectors in production that are conservative at least
for the stack. D, Boehm, JSC, SBCL, etc. It can be a problem, but it often
isn't, especially when you have precise knowledge of heap object layouts.

------
doty
"People seem to think that writing a garbage collector is really hard [...]
Well it's not. In fact, it's rather straight forward."

[...]

"Thirdly. Please don't use this code. I did not intend for it to be wholly
correct and there may be subtle bugs I did not catch."

~~~
scorpioxy
I am not sure what you're pointing at. I think this article aims to do the
same as any other "compiler for a toy language" article; namely remove the
fear barrier and explain that at their core, many of our black boxes utilize
simple concepts.

I've always thought a compiler was difficult to write until I wrote my first
one. Sure it is nothing when compared to a production-level compiler but
definitely no longer think that there's something magical about compilers,
even the more advanced ones.

~~~
doty
I'm sure that I don't need to tell you that there are many things in this
world that look easy, but are in fact very hard. Even things that are very
simple at the core turn out to be very difficult in practice.

I mean, to be a little flippant, all you need is abstraction and application,
and you can compute all computable problems, right?

More to the point: what I was trying to do was to point out the irony of
saying "building a garbage collector is easy" and then, only a few sentences
later, saying "this code may have subtle bugs." Saying "this code is not fast
enough for production" would be one thing, saying "this code is not correct"
seems to put lie to the assertion that building it is "easy."

I personally have never built a garbage collector in anger. But I suspect that
the problem of building a passable garbage collector (that is, one that
somebody else would be willing to use) is quite a bit harder than it is made
out to be in this article.

~~~
scott_s
The point of this post is education. The author is trying to de-mystify
garbage collectors. His approach - which I think is a good one - is to present
code that is quite simple, but still meets the criteria for being a garbage
collector.

This is a pedagogical technique. Your points are valid, but they also ignore
that the author is trying to teach concepts through code. That often require
simplification.

------
ChuckMcM
Not sure if webkike is Matt or not, but either way its a nice little mark and
sweep GC. It reminded me a bit of the first one that was written for Oak (the
pre-cursor language to Java). I have always been a fan of building versions of
technology like this to experiment with because it helps you figure out where
the hard parts are of the 'simple' programs. One of my favorite interview
questions is to start with traversing a list to arrive at an answer, now the
list is getting insertions from multiple sources ...

So too with GC that you start with a hook into your allocator and you end up
with coloring references and aging them and disabling compiler optimizations
and all sorts of other things that lead to really unexpected behavior in your
environment.

~~~
radicandor
I'm Matt (officially). And webkike is not me (also officially).

~~~
smeyer
What do you mean by "officially" in this sense? I'm a native English speaker
and all, but I'm not sure what you're implying. How would you unofficially be
Matt? Or unofficially not be webkike?

~~~
radicandor
It's a joke that is probably only funny to me. Because I am the authority on
who I am, adding (officially) is superfluous. Sorry for the confusion.

------
crazypyro
It would be helpful to note that some of the variables used in the first two
function examples are defined further down the page. I was trying to
understand how his first loop in add_to_free_list worked and couldn't figure
out how because *freep was never declared anywhere and that was his list of
free blocks. Took me a page of scribbling pointer diagrams and 5 minutes to
figure out that it was declared and set further down the page as a global.

------
rurban
re preciseness: You should really add your preciseness limitations. Also add
that pointers located in loaded shared libs are not detected with this method.
Boehm has special methods to detected the ranges of loaded shared libs.

Maybe you should mention weak refs and volatile also. volatile to keep
pointers on the stack, so that you don't have to inspect the registers also.
And some easy ways to make it precise. E.g. int tagging, float boxing, ...

------
pharaohgeek
I really appreciate seeing articles like this. This is not a topic most
developers typically encounter in their every day work, and is often not a
topic taught in college. Articles like this really help stretch my thinking
and let me look at a problem I'd otherwise never encounter. Thanks for
posting!

------
oppositelock
This code won't work.

It scans heap memory, and if a word looks like it points into allocated
memory, it's assumed to point into allocated memory, and that memory is marked
as used.

When dealing with numbers, such as times, file sizes, etc, you will inevitably
store an integer value that looks like a pointer into a valid block.

The reason you can do mark and sweep in Java, is that there is a distinct
wrapper for all pointers, so you can disambiguate pointers from data. You
can't do that in this algorithm, it's impossible. This is also the same reason
that disassembly of executables is so hard; it's not always clear what's data
and what's instructions.

~~~
webkike
This has been discussed before. It's the distinction between the collector
being precise and conservative. Conservative collectors are fine because they
err on the side of retaining memory rather than freeing it. As I stated
before, intuitively the odds of an integer containing a pointer-like value is
1/4 (on a 32 bit machine), but they actually tend to be much lower.

Even libgc has this issue and it is impossible to resolve in C.

~~~
cygx
s/unprecise/conservative/

~~~
webkike
Thank you.

------
allegory
I wrote a simple GC for C once. It wasn't simple. I gave up after I hit what I
felt like was the five-thousandth edge case and about 6000 LoC.

