
Rare Are GC Talks - pat_shaughnessy
http://furious-waterfall-55.heroku.com/ruby-guide/internals/gc.html
======
silentbicycle
Wilson's highly readable "Uniprocessor Garbage Collection Techniques" is a far
better overview:
[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.138....](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.138.5038)
You could write a decent GC with just the info in that paper.

It doesn't cover parallel or real-time garbage collection (which both get far
more complicated); for those, you want Jones et al.'s _The Garbage Collection
Handbook_ ([http://www.amazon.com/The-Garbage-Collection-Handbook-
Manage...](http://www.amazon.com/The-Garbage-Collection-Handbook-
Management/dp/1420082795/)) and plenty of time to explore its bibliography.
(His older book is also good, but doesn't cover those topics.)

~~~
mattyb
That _is_ a good paper. Thanks for the pointer!

------
gue5t
The reference counting metaphor leaves out a critical aspect of reference-
counting: cyclical structures and how to collect them. Books never write their
names in other books so this doesn't appear in his discussion, though it's one
of the more important complications for implementations.
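
To make the cycle problem concrete, here is a minimal sketch with a toy reference counter. The `Obj` class and the `incref`/`decref`/`point_to` helpers are made up for illustration and are not any real runtime's API.

```python
class Obj:
    """Toy heap object with a manual reference count (illustrative only)."""
    def __init__(self, name):
        self.name = name
        self.refcount = 0
        self.fields = []      # outgoing references to other Obj instances
        self.freed = False

def incref(obj):
    obj.refcount += 1

def decref(obj):
    obj.refcount -= 1
    if obj.refcount == 0:
        obj.freed = True
        # Releasing an object also releases everything it points to.
        for child in obj.fields:
            decref(child)
        obj.fields = []

def point_to(src, dst):
    src.fields.append(dst)
    incref(dst)

# Acyclic case: root -> a -> b. Dropping the root reference frees both.
a, b = Obj("a"), Obj("b")
incref(a)            # the "root" (say, a local variable) holds a
point_to(a, b)
decref(a)            # root goes away
acyclic_freed = (a.freed, b.freed)

# Cyclic case: root -> c -> d -> c. Dropping the root frees nothing.
c, d = Obj("c"), Obj("d")
incref(c)
point_to(c, d)
point_to(d, c)
decref(c)            # root goes away, but d still references c
cyclic_freed = (c.freed, d.freed)
cyclic_counts = (c.refcount, d.refcount)
```

In the cyclic case both counts stay at 1 even though nothing outside the pair can reach them, which is why pure reference counting needs a backup mechanism (a cycle detector, weak references, or a tracing pass).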

~~~
electrograv
Honestly, this article is really bad and probably harmful to new programmers.
I recommend against taking anything in it to be accurate, much less complete.

The author misses vital issues (like the inherent dangers of circular
references in reference counting "GC"). I wouldn't even call myself slightly
knowledgeable in GC, yet blatant holes in the clumsy analogies of this article
are obvious to me. As someone who is not even slightly knowledgeable in GC, it
scares me that people are learning from someone who has massive information
holes in the explanation -- holes in topics that every programmer on earth
should know.

On top of all that, the only "Cons" that are mentioned of RC are absurd: "It's
annoying if you forget to release your references." Wow. Profound stuff there
- bugs and incorrect code are annoying. This isn't even a problem if your
language supports RAII, in which it's impossible to "forget" in the first
place.

~~~
Groxx
I sincerely doubt people are going to read this, and then immediately write
and ship a garbage collector. For a square-zero general-idea overview to
someone who has zero concept of what a GC does, it's more than accurate
enough.

It could probably be about 1/2 the size, as it shoots off in weirdly-complex
directions for such an introductory text. And it could probably introduce e.g.
reference cycles, because they're a real problem with reference counting. But
it's a clearer and simpler text by far than I generally see on this topic.

------
drostie
One of the most fun discussions of the complications of reference counting and
how nuanced you have to be comes from the Python docs:

[http://docs.python.org/extending/extending.html#reference-
co...](http://docs.python.org/extending/extending.html#reference-counts)

It relates "a true story. An older version of Python contained variants of
this bug and someone spent a considerable amount of time in a C debugger to
figure out why his __del__() methods would fail" in a considerable amount of
detail.

------
ctide
_At the December 2008 Kyushu Ruby 01 Conference, I asked the audience "How
many of you here have some interest in garbage collection?" Out of 200 people,
only 3 raised their hands_

Can you imagine how many people would raise their hands if you asked that
question NOW at a ruby conference? Crazy how times change.

~~~
notatoad
can you explain a bit why that has changed? I don't really know anything about
ruby, but I would have thought that being able to ignore abstractions like
garbage collection was the reason that people used high-level languages. Does
programming in ruby involve a lot of interaction with the garbage collector,
or are you saying that a lot of the ruby community is actually becoming
involved with writing and improving low-level language features?

~~~
suresk
My experience in this regard comes from Java, not Ruby, but I think the basic
concepts are the same: You are correct in that, for a lot of uses, ignoring
abstractions like GC is just fine.

If you are building something that uses a lot of memory, requires high
throughput or low latency, or are getting into situations where GC is taking
up a decent chunk of your CPU time, then it can pay to have a better
understanding of the GC concepts so you can make beneficial changes to your
application or to the GC parameters.

Plus, stuff like GC is kind of interesting by itself for some people. Some of
my favorite sessions at JavaOne were always the GC and JVM internals ones,
despite the fact that their immediate utility to me wasn't always obvious.

------
kenmazy
Another great GC discussion, this one for LuaJIT:
<http://wiki.luajit.org/New-Garbage-Collector>

------
alinajaf
Interesting that there's no mention of generational GC in this article. Anyone
with a bit more expertise care to comment on why not?

~~~
jimm
Yeah, I noticed that. I first heard of generational GC in association with
Smalltalk VMs and I think that's where they made their debut.

~~~
cwzwarich
The first generational garbage collector was for Lisp:

[http://web.media.mit.edu/~lieber/Lieberary/GC/Realtime/Realt...](http://web.media.mit.edu/~lieber/Lieberary/GC/Realtime/Realtime.html)

------
adamnemecek
Articles like this are exactly why I come to HN.

~~~
pat_shaughnessy
Agreed! Also, beyond the great tech content there's something charming about
this one.

------
cronin101
In the article he states "[for Copying] Conservative GC is unable to determine
the difference between pointer and integer values. If an integer value was
overwritten the result would be disastrous. It is likely that the result of
1+1 would change occasionally."

However reading through the RHG, it is shown that Ruby allocates memory for
objects in 20 byte blocks and that this results in all pointers therefore
being multiples of four. This is handy as it allows the direct usage of
literals such as Fixnum (an always odd 'pointer' that you shift right by one
bit to access the value) and Symbol (a 'pointer' that always has 0x0e as the
trailing byte that you shift right 1 byte to access the unique ID) without
requiring object creation.

With this in mind, can someone enlighten me as to why Copying could not be
used inside Ruby? It seems as though it would be trivial for the GC to
differentiate between literals and pointers as otherwise they would not be
much use as literals.
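
For anyone who hasn't seen this kind of tagging, a rough sketch of the integer case follows. It assumes only the layout described above (odd words are integers, object slots are 4-byte aligned); the function names are hypothetical and are not Ruby's actual macros.

```python
FIXNUM_FLAG = 0x01   # low bit set marks an immediate integer (assumed layout)

def int_to_value(n):
    """Encode a small integer as a tagged word: shift left, set the low bit."""
    return (n << 1) | FIXNUM_FLAG

def value_to_int(v):
    """Decode a tagged integer by shifting the tag bit back out."""
    return v >> 1

def is_fixnum(v):
    """Any odd word is an immediate integer, never an address."""
    return (v & FIXNUM_FLAG) == FIXNUM_FLAG

def is_heap_pointer(v):
    """Object slots are 4-byte aligned, so a real pointer ends in binary 00."""
    return v != 0 and (v & 0b11) == 0

two = int_to_value(1) + int_to_value(1) - FIXNUM_FLAG   # tagged 1 + tagged 1
```

Adding two tagged integers and subtracting one copy of the flag gives the tagged sum, so `value_to_int(two)` is 2 without any object ever being allocated.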

------
kmm
I don't see how a Copying GC can be faster than Mark and Sweep. Transferring a
book may be easy, but copying an object can be quite costly. I can't imagine
copying almost all objects every time the GC runs.

~~~
waterhouse
Suppose that the program only needs about 1 MB of objects, but it will fill up
100 MB before performing a copying GC. In mark and sweep, you need to perform
an operation for every garbage object (of which there are 99 MB), whereas with
copying GC, you need only perform operations for every live object (of which
there is 1 MB). Even if copying an object takes ten times as long as returning
a chunk of memory to a free-list, the copying algorithm is faster. (Also,
popping a free cell off a free-list is more expensive than incrementing an
allocation pointer.) Of course, either way, the total work done for allocating
objects and freeing them is O([number of objects allocated]).
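
As back-of-the-envelope arithmetic, the comparison looks roughly like this. It is a sketch that treats each MB as one unit-sized object, and the cost constants are the assumed ones from the paragraph above, not measurements.

```python
def mark_sweep_ops(live, garbage, free_cost=1):
    """Eager mark-sweep: mark each live object, then free each dead one."""
    return live + garbage * free_cost

def copying_ops(live, copy_cost=10):
    """Copying GC: work is paid only for live objects; garbage is never visited."""
    return live * copy_cost

live_mb, heap_mb = 1, 100
garbage_mb = heap_mb - live_mb

ms_cost = mark_sweep_ops(live_mb, garbage_mb)   # 1 + 99
cp_cost = copying_ops(live_mb)                  # 1 * 10
```

With these numbers copying does a tenth of the work. The advantage flips once the heap is mostly live: `copying_ops(50)` exceeds `mark_sweep_ops(50, 50)`.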

Something like this is described in detail here: "Garbage Collection Can Be
Faster Than Stack Allocation", Andrew Appel:
<http://www.cs.princeton.edu/~appel/papers/45.ps>

~~~
silentbicycle
Not necessarily -- lazy sweeping only needs to do work proportional to live
data (like copying GC), rather than proportional to dead data.

I have a pretty straightforward implementation of a lazy mark-sweep collector
here (in C): <https://github.com/silentbicycle/oscar>

Basically, instead of marking live data, then sweeping all unmarked data, you
just mark live data, then the next time a slot is requested, you step the
allocator over the heap until it finds the next unmarked cell and re-use that.
If most cells are dead, this happens very quickly. If you have to sweep over
too many live cells (left deliberately vague), you grow the heap.
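
A toy version of that allocation loop, for illustration. This is a sketch written for this comment, not code from the linked repository, and the class and method names are hypothetical.

```python
class LazySweepHeap:
    """Toy heap of equally sized cells with lazy sweeping."""

    def __init__(self, size):
        self.cells = [None] * size       # payload stored in each cell
        self.marked = [False] * size     # mark bits kept in a side array
        self.cursor = 0                  # where the allocator resumes sweeping

    def mark(self, live_indices):
        """Mark phase: flag live cells and restart the sweep cursor."""
        self.marked = [False] * len(self.cells)
        for i in live_indices:
            self.marked[i] = True
        self.cursor = 0

    def alloc(self, payload):
        """Step over marked cells until an unmarked one turns up, then reuse it."""
        while self.cursor < len(self.cells):
            i = self.cursor
            self.cursor += 1
            if not self.marked[i]:
                self.cells[i] = payload
                return i
        # Ran off the end: a real collector would re-mark or grow the heap here.
        raise MemoryError("heap exhausted; collect or grow")

heap = LazySweepHeap(8)
for n in range(8):
    heap.alloc(f"obj{n}")            # fills cells 0..7

heap.mark({1, 5})                    # pretend only cells 1 and 5 are reachable
first = heap.alloc("new0")           # reuses cell 0
second = heap.alloc("new1")          # skips live cell 1, reuses cell 2
```

No separate sweep pass ever runs: dead cells are reclaimed one at a time, only when an allocation asks for them.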

~~~
waterhouse
Interesting... I still feel that this is for the most part merely deferring
the free-ing work to when you next allocate an object. It may succeed in
getting the "free" cost down to a single instruction--checking a mark bit--but
it is still a cost per dead object that a copying GC wouldn't have to pay.
(The difference may be, as some say, "academic".) It does have the slight
advantage of reducing the maximum pause time due to sweeping down to being
proportional to live objects rather than garbage.

Also, what if you want to allocate a large object? You might have to skip over
some unmarked cells. Either you'd waste the memory, or you would probably want
to put them on a free-list of some sort, in which case you are doing more than
just bumping a pointer and checking for a mark. ... I see, it says "equally-
sized chunks of memory".

Still, that's a cool idea. As mark-and-sweep goes, anyway. :-} I'm rather more
fond of copying garbage collectors--I like the determinism, the fact that
(normally) the memory address of each object after a full copying GC will
depend only on the structure of the graph of objects, and not on the memory
addresses of the objects prior to the GC. (In other words, it destroys the
information "what the memory address of each object was before this GC". This
implies a somewhat higher inherent cost, to eliminate this entropy. See [1]
for musings about the relation of this to thermodynamics.) In particular,
whether you have fragmentation issues won't depend on the past history of your
program. (You could say that the space lost to fragmentation is fixed at a
factor of 2.)

[1] <http://www.pipeline.com/~hbaker1/ThermoGC.html>
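
A small sketch of the determinism point, using a toy Cheney-style copy. The heap is modelled as a dict from address to the list of addresses that object points to; this representation and the function name are assumptions made for the example.

```python
def cheney_copy(from_space, roots):
    """Copy everything reachable from `roots` into a fresh to-space.

    Returns (to_space, new_roots); a new address is an index into to_space.
    """
    forward = {}          # old address -> new address (forwarding pointers)
    to_space = []

    def copy(addr):
        if addr not in forward:
            forward[addr] = len(to_space)             # bump-allocate
            to_space.append(list(from_space[addr]))   # fields still hold old addresses
        return forward[addr]

    new_roots = [copy(r) for r in roots]
    scan = 0
    while scan < len(to_space):                       # Cheney's scan pointer
        to_space[scan] = [copy(child) for child in to_space[scan]]
        scan += 1
    return to_space, new_roots

# Same object graph (root -> x -> {y, z}, y -> x) at two different sets of
# addresses, each with one unreachable garbage object thrown in.
heap_a = {100: [300, 200], 200: [], 300: [100], 999: []}
heap_b = {7: [3, 50], 50: [], 3: [7], 8: [3]}

layout_a = cheney_copy(heap_a, [100])
layout_b = cheney_copy(heap_b, [7])
```

Both calls produce the same to-space layout, because the new addresses depend only on the order in which the graph is traversed from the roots, not on where the objects used to live.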

~~~
silentbicycle
Checking the mark bit can be quite cheap, though, particularly if you store
the mark bits in their own array - the next several bits will be in-cache.
Still, the overall performance is likely to vary quite a bit for either scheme
depending on the project's other quirks. Mark-sweep has the advantage that it
doesn't need to move objects, but copying is easier with variable-size
allocations, etc.

Another disadvantage with lazy sweeping is I don't see a clear way to make it
generational. (Right?)

~~~
crumblan
Mark each arena with the last time something was collected from it and just
skip sweeping old pages until X cycles have passed? Just thinking out loud...
that check wouldn't save time for half-pages of old objects.

------
davnola
I recommend Elise Huard's recent talk on Ruby GC:
<http://skillsmatter.com/podcast/home/ruby-bin-men>

(Also, I'd love to see the Hungarian Folk Dance interpretation of different GC
approaches <http://t.co/l8ADbEQR>)

------
grimlck
nice introduction, but it seems to be missing some really important concepts

- generational GC

- concurrent GC, the parts of GC which can be concurrent (which depends on
your algorithm), and the tradeoffs needed to let your program run concurrently
with the GC without stopping the world.

------
zem
very nice article. given its likely target audience, though, he really needs
to explain explicitly why mark-and-sweep can't compact as it goes, rather than
leaving a fragmented heap.

