
A Unified Theory of Garbage Collection (2004) [pdf] - mpweiher
http://www.cs.virginia.edu/~cs415/reading/bacon-garbage.pdf
======
pcwalton
This is a classic paper that's a must-read for anyone interested in GC. The
basic idea is that not only is reference counting a form of garbage
collection, there's actually a _continuum_ of GC approaches between pure
tracing and pure reference counting. Each algorithm has its tradeoffs, and,
crucially, most optimizations—for example, generational garbage
collection—actually just move the scheme toward the opposite pole, acquiring
some of the upsides and downsides of the other algorithm. Viewed through this
lens, most of what we would consider well-optimized GCs deployed in the wild
(Java HotSpot, .NET, modern JS engines, etc.) are really best viewed as
hybrids of tracing and reference counting, balancing the upsides and downsides
of each approach.

~~~
zZorgz
> The basic idea is that not only is reference counting a form of garbage
> collection..

Oh, how this leads to debate. The confusion is whether having automatic
reference counting without a cycle detector is considered GC by the masses
(what language publicizes a GC and does this?), and if people question GC to
be a form of automatic memory management, where other forms may exist.

Briefly looking at the article I think the authors opinion on this matter is
pretty clear that an additional algorithm such as a cycle collector is
required to be considered GC.

edit: "Reference counting" itself also isn't even necessarily 'automatic'.
Just find it odd the way this gets phrased. GC can use reference counting, not
the other way around..

~~~
pcwalton
> Briefly looking at the article I think the authors opinion on this matter is
> pretty clear that an additional algorithm such as a cycle collector is
> required to be considered GC.

That is almost certainly _not_ what David Bacon thinks. I don't know how you
could get that idea from reading the introduction. Reference counting is a
form of garbage collection, period.

C++, Objective-C, etc. started muddying the waters in the '90s and 2000s and
got people thinking that somehow reference counting isn't GC and therefore
doesn't have the downsides of automatic memory management. But the literature
is clear on this point.

~~~
zZorgz
> That is almost certainly not what David Bacon thinks. I don't know how you
> could get that idea from reading the introduction. Reference counting is a
> form of garbage collection, period.

I'm not sure what you're reading. I will quote:

"Tracing and reference counting are uniformly viewed as being fundamentally
different approaches to garbage collection that possess very distinct
performance properties"

Saying reference counting is an approach that garbage collection can use is
certainly valid, but is not the same as saying that reference counting is a
_form of_ garbage collection, which is nonsensical usage IMO.

Further evidence that the author believes that reference counting alone is
insufficient for garbage collection:

"Finally, tracing collects cyclic garbage while reference counting does not.
As a result, when reference counting is applied to heaps that may contain
cyclic garbage, cycles must either be collected using a backup tracing
collector [43] or with a trial deletion algorithm"

"One of the primary disadvantages of reference counting collec- tors is that
they do not find cyclic garbage. Therefore, an additional mechanism is
required."

Regarding "who was first", although I don't have any supporting proof, I'd be
shocked if usage of manually managing reference counting hasn't predated both
automatic memory management and garbage collectors, which isn't a technique
constrained to, say Objective-C.

Also worthy of mention, some referenced counted garbage collection
implementations (e.g. CPython) vend an API for disabling the GC, which
certainly has no implications on reference counting.

~~~
pcwalton
> Saying reference counting is an approach that garbage collection can use is
> certainly valid, but is not the same as saying that reference counting is a
> form of garbage collection, which is nonsensical usage IMO.

You're drawing distinctions in the paper that the author almost certainly did
not intend. "An approach to garbage collection" means "a form of garbage
collection". If Bacon had meant to write "an approach that garbage collection
can use", he would have written that.

David Bacon's opinion is that RC systems _should_ have some mechanism to
collect cycles. This is understandable, given that he did more work than
anyone else to develop good cycle collectors (and his work in this area is a
good read). But nobody in the GC field, including him, would claim that
reference counting without cycle detection isn't garbage collection at all.

All garbage collection (automatic memory management) systems use a form of
approximation to determine which objects are reachable. Some approximations
involve reachability; that is what tracing GCs use. Some approximations
involve tracking the number of outstanding references; that is what reference
counting GCs use. A pure GC has a _more fine-grained approximation_ than a
pure reference counting one does. But both are approximations to the real
problem: whether some data will dynamically be accessed by the program again.
Reference counting has a less precise approximation, but that doesn't make it
not a form of garbage collection.

Required reading:

\- [http://www.memorymanagement.org/glossary/g.html#term-
garbage...](http://www.memorymanagement.org/glossary/g.html#term-garbage-
collection)

\- [http://www.memorymanagement.org/glossary/r.html#term-
referen...](http://www.memorymanagement.org/glossary/r.html#term-reference-
counting)

~~~
zZorgz
By saying "reference counting is a form of GC", I'm not sure whether or not
you think that reference counting has usage outside of (and could have
predated) GC. David did not write "a form of" either, although you think the
meaning conveyed is the same. Also one could also counter similarly with
saying that if David meant _should_ , he would have said so, but whatever.

I realize memorymanagement.org conflicts with my usage, although I'm not sure
I subscribe with the idea that the usage people have been using first
historically is sensible.

------
tybit
I'm curious how RAII approaches relate to this?

Would it be considered a form of binary reference counting(where reference
counting isn't explicitly used, i.e non shared_ptr code in C++) or it's just
completely separate and not part of the dual GC theory?

~~~
eru
At it's basic RAII is about re-using the a hierarchy of lifetimes (induced
from the hierarchy of stack frames in as strict language) for managing
resources other than memory.

------
eru
Great paper! Especially useful when talking to C++ fanatics who are proud of
their language's de-facto reference counting via 'smart pointers'.

~~~
todd8
Python uses reference counting with a backup garbage collector just to pick up
unreachable sets of object having circular references.

~~~
__s
Python is slow & CPython is ever slower because reference counting is part of
what makes removing the GIL such a challenge

~~~
dfox
The hard part of removing GIL are the language semantics themselves. Replacing
CPython's reference counting with some kind of tracing GC will still leave you
with problem of coming up with low-overhead thread-safe dictionary
implementation.

~~~
hyperpape
I agree with the first sentence. For the second part, what are you thinking
of? For reads, there's very low overhead for using an implementation like
Java's ConcurrentHashMap. The overhead for insertions is more significant, but
not enormous.

