

Demystifying Garbage Collectors - gmcabrita
http://xtzgzorex.wordpress.com/2012/10/11/demystifying-garbage-collectors/

======
RodgerTheGreat
If you can look at a word of memory and differentiate pointers from values,
garbage collection can become extremely simple. It's a shame that tagged
architectures have largely died out.

As an experiment, I tried writing a garbage collector which used high-order
bits of a word to identify pointers. The result is about a page of code in
Forth:

<http://hastebin.com/raw/gabunowelo.fs>

This example works exclusively with fixed-size "cons pair" allocations, but
generalizing to arbitrary-sized allocations only increases the complexity of
the system slightly. Obviously this bitflag technique is not "safe" in
general, as arbitrary values on the stacks could produce false positives, but
it's easy to imagine a 33-bit or 65-bit architecture that provided the
necessary hardware support without such caveats.

~~~
sedachv
> It's a shame that tagged architectures have largely died out.

Judging by benchmarks, it's a good thing they did. Byte-addressed but word-
aligned pointers, plus pipelining and superscalar execution (branch prediction
probably helps too), give you almost the same performance as explicit
instructions for tag support, and the flexibility and instruction-set
simplification you get in return are well worth it, IMO. SPARC had such
instructions, and on benchmarks (see
<http://www.cs.ucsb.edu/~urs/oocsb/papers/oo-hardware.html> and
<http://www.cs.cmu.edu/~ram/pub/arch.ps>) the difference in speed between
using them and not was less than 5%.

On a 64-bit machine, you have 30 different tags available (5 tag bits, since
the smallest thing you'd want to point at is going to be two 64-bit words big,
and two of the 5-bit patterns reserved for positive and negative fixnums, for
60-bit immediate integers).

The problem is a lot of algorithms (especially crypto) are designed around 32
or 64 bit words, which typically means you have to make contortions and
declarations to have your compiler operate on immediate representations.
Typically, the compiler will set aside some registers on the CPU for dealing
with these immediate values, so the garbage collector doesn't get confused. On
x86 this might mean you don't have enough registers to run the algorithm
efficiently, even if the compiler knows exactly what needs to be unboxed.

Another problem is floating point values. On 64-bit systems there's a
technique called NaN-boxing that can do something about this
(<http://wingolog.org/archives/2011/05/18/value-representation-in-javascript-implementations>),
but I don't yet understand how
it works.

I do agree on one thing - slapping an extra byte onto word size would be
awesome, as long as that extra byte is general purpose. Other things besides
tagging that you could do with it would be error-correcting codes (the
Symbolics Ivory actually had ECC for every word), and all kinds of metadata.
Maybe it would even make sense to keep this extra byte out of the memory bus
and just have it on registers; that's where most of the trouble with tagged
values on modern machines seems to come from. This would make sense for
load/store instruction sets.

~~~
tomp
In the IEEE 754 standard, double-precision floating point numbers are
represented using 1 bit for the sign, 11 bits for the exponent, and the
remaining 52 bits for the fraction (a number between 0 and 1). The exponent
0x7FF (all ones in binary) is reserved for _special values_ : NaN and
infinity. If the fraction is 0 (all zeroes), the number is +/- inf; otherwise
it is a NaN.

NaN-tagging works by using the unused bit patterns of NaN for pointers. The
trick is that most processors only ever generate a single canonical NaN bit
pattern, so all the other NaN patterns are never used to represent floating
point numbers and can thus be used to represent pointers. This works because
current x64 implementations only use 48 bits for virtual addresses, so every
pointer "fits" inside the 52-bit fraction field of a NaN.

~~~
lutusp
> In IEEE 754 standard, floating point numbers are represented using 1 bit for
> the sign, 11 bits for the exponent, and the remaining 52 bits for the
> fraction (number between 0 and 1).

On reading this I immediately read the standard, hoping to discover that the
word "fraction" wasn't used, only to find that "fraction" is defined as the
part of the significand (mantissa) lying to the right of the implied binary
point. In essence then (because the bit to the "left" of the binary point is
only ever implied, never explicit), "fraction" stands as a synonym for
significand.

Unfortunate terminology. And why? Here's why:

A fraction: 1/3

A significand, base ten: 0.3333333333333333

IEEE 754 can provide the second, but it cannot provide the first.

IEEE 754:

<http://kfe.fjfi.cvut.cz/~klimo/nm/ieee754.pdf>

Significand:

<http://en.wikipedia.org/wiki/Significand>

A quote: "The significand (also coefficient or mantissa) is part of a
floating-point number, consisting of its significant digits."

p.s. this is not a correction, it's a lament.

~~~
tomp
It's actually even more complicated...

Remember, this is binary, so the only possible digits are 0 and 1. For
example, 1/3 in binary is 0.01010101010101...;

Encoding that in "floating point", we first get:

      1.0101010101... * 2^-2

We can see that in this format, the first digit of the number is always 1;
thus, the first digit of the significand, for any number, is ALWAYS 1 (in
binary)! IEEE 754 exploits this by not storing the leading 1 as part of the
fraction, so the approximation to 1/3 is actually encoded as:

    
    
      0 01111111101 0101010101010101010101010101010101010101010101010101
      ^      ^            ^
      |      +-----+      |
      sign bit     |    fraction = 1.33333... = 1.01010101010101...0101010101010101b
                   |                              |---  this part is included  ---|
                   |
            exponent = 01111111101b - 1023 = 1021 - 1023 = -2

------
microtherion
"Garbage Collection" by Jones & Lins was, in my opinion, an excellent book
back in the day: <http://tinyurl.com/8lrveqm>

I noticed that Jones has a new book (The Garbage Collection Handbook) out now,
which presumably is even better: <http://tinyurl.com/8nl6con>

~~~
alexrp
It's an amazingly in-depth and concise book. I fully recommend it!

------
pcwalton
"It is very likely that the Rust language will go with a similar model [per-
thread instead of global garbage collection]."

Rust is using this model today.

~~~
alexrp
I was under the impression that the collector that's in place right now is
just a cycle collector, not a full-blown garbage collector. But please correct
me if I'm wrong!

~~~
pcwalton
Oh yes, you're right -- I was assuming you were grouping cycle collection
under garbage collection. In any case, the cycle collector is task-local, not
global.

------
batgaijin
I think a really cool tactic is Racket's places, which basically create
individual zones, each running its own module with its own GC (but objects
shared between cores don't take up extra space; there is a global table or
something).

~~~
alexrp
This sounds very much like how Erlang does it - each process gets an isolated
garbage collected heap and communication between processes happens through
message passing.

How lightweight are Racket places? I'm not very familiar with the language, so
I don't know if they're really comparable to Erlang processes.

~~~
batgaijin
Not lightweight at all, really; there's a 1:1 mapping between places and cores.

------
mseepgood
Nice article. What I didn't understand: How does a conservative GC without
type information know where the references are in an object? E.g. given this
object:

    
    struct {
        double a;    // 64 bit
        short  b;    // 16 bit
        int    c, d; // 32+32 bit
        FooPtr e;    // 64 bit
        int    f;    // 32 bit
        BarPtr g;    // 64 bit
    };
    

Does it assume that all fields are aligned to 64 bit boundaries? Does it
potentially consider a double to be a pointer? And how does it know where to
stop looking for references without knowing the size of the object?

~~~
alexrp
I'll start with your last question: It _does_ know the size of objects - this
size is passed to the GC when you allocate memory from it. Root ranges (such
as the stack, global variables, TLS areas, etc) also all have a static size.

(Not strictly true - some runtimes have dynamically growing stacks, but the GC
knows the size regardless.)

Most garbage collectors assume pointers are aligned on a word boundary; that
is, on a 4-byte boundary on 32-bit machines and an 8-byte boundary on 64-bit
machines. This is a reasonable assumption because accessing pointers that are
not word-aligned is _extremely_ slow on most architectures. The GC does not
care how fields that don't contain pointers are aligned, because their
contents are irrelevant (which means the compiler can pack some fields
together without worrying about breaking the GC).

So, a conservative GC will simply scan over every word in an object and
interpret it as a potential pointer, regardless of what it actually is. So yes
- even a double will be considered a pointer.

~~~
mseepgood
Thanks alexrp and digitalinfinity for your replies!

------
weirdkid
Oh, THOSE garbage collectors. I was rather hoping this would be an exposé on
the secret tech employed by curbside trash collection companies.

------
keikun17
i see alex is this busy. no wonder alex hasn't been online in steam recently

