
A Unified Theory of Garbage Collection [pdf] - adamnemecek
http://www.cs.virginia.edu/~cs415/reading/bacon-garbage.pdf
======
thamer
See also Michael Bernstein's articles and talks on the subject, including the paper linked: [http://michaelrbernste.in/2013/06/10/to-know-a-garbage-colle...](http://michaelrbernste.in/2013/06/10/to-know-a-garbage-collector-goruco-2013.html)

------
jkbyc
Garbage collection is quite similar to view maintenance in databases (or more
specifically, reason maintenance / truth maintenance). In particular, the fact
that they found a least-fixpoint formulation of the problem hints at a strong
similarity with computing and maintaining fixpoints in deductive databases,
that is, computing and maintaining materialized views from some base facts and
rules (a distinction that need not even be made; base facts can be seen as a
special kind of rule).
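
To make the fixpoint analogy concrete, here is a minimal sketch (Python, names mine) of reachability as a least fixpoint: the live set is the smallest set containing the roots and closed under the edge rule, which is exactly the shape of a materialized view computed from base facts and rules.

    # Least fixpoint of: live(v) if root(v); live(w) if live(v) and edge(v, w).
    def live_set(roots, edges):
        live = set(roots)           # base facts
        changed = True
        while changed:              # apply the rule until nothing changes
            changed = False
            for v, w in edges:
                if v in live and w not in live:
                    live.add(w)
                    changed = True
        return live                 # everything outside this set is garbage

    print(live_set({"r"}, [("r", "a"), ("a", "b"), ("c", "d")]))  # {'r', 'a', 'b'}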

In view maintenance, various counting algorithms have been known for around 30
years, for example the PF (Propagate and Filter) algorithm or the DRed (delete
and re-derive) algorithm. One difference is that in garbage collection you
always have the whole graph, edges and vertices, while in view maintenance you
typically have only the vertices. That gives rise to the "proof traces" method
of view/reason maintenance, where you build up the whole graph and keep the
edges even though they are useful only for maintenance (or for explanation).

The distinction between tracing garbage collection and reference-counting
garbage collection could also be seen as roughly analogous to the difference
between forward chaining and backward chaining in Datalog, or in rule systems
in general.

It seems like another case where greater awareness of results from different
but related fields could help bear fruit much faster. This paper in particular
seems like something that could have been inspired by the old view-maintenance
research. Maybe there are still some opportunities left for transferring
results from one of these fields to the other.

------
mrbbk
This is a beautiful result that I've written up and spoken about a few times -
[http://michaelrbernste.in/2014/02/24/papers-we-love-garbage....](http://michaelrbernste.in/2014/02/24/papers-we-love-garbage.html)
- it's surprisingly deep and formalizes an intuition that many developers
have.

------
rayiner
There are some neat hybrids of GC and RC as well. This is probably the state
of the art in this area:
http://www.cs.utexas.edu/users/mckinley/papers/rcix-oopsla-2013.pdf

Conceptually, the key insight is that both GC and RC involve tracing. In GC
you trace from live roots to find survivors. In RC, you trace from newly dead
objects to find others that should also become dead. This has interesting
implications when you combine the algorithms in a generational collector.
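
A toy illustration of the two directions (a Python sketch with hypothetical names, not the paper's formulation):

    def trace_live(roots, children):
        # GC: trace forward from the live roots; everything reached survives.
        live, stack = set(), list(roots)
        while stack:
            v = stack.pop()
            if v not in live:
                live.add(v)
                stack.extend(children[v])
        return live

    def trace_dead(obj, children, refcount):
        # RC: trace from a newly dead object, decrementing its children and
        # recursing into any whose count also drops to zero.
        dead = [obj]
        while dead:
            v = dead.pop()
            for w in children[v]:
                refcount[w] -= 1
                if refcount[w] == 0:
                    dead.append(w)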

Imagine the following steady-state situation. You're allocating at X MB/sec,
with a young-generation survival rate of 10%, so objects are promoted into the
old generation at 0.1X MB/sec. Since we're in a steady state, that is also the
death rate in the old generation. If you use GC in the young generation and RC
in the old generation, you'll trace 0.2X MB per second: 0.1X of
young-generation survivors plus 0.1X of old-generation deaths. This holds true
even for an extremely large old-generation live size. You never have to trace
the whole heap during an old-generation collection, just the objects that die
during that interval, and in a steady state that's proportional to your
allocation rate, not your heap size.
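
A quick back-of-the-envelope check of those numbers (Python, assuming X = 100 MB/sec):

    alloc_rate = 100.0                      # X: MB/sec allocated in the young gen
    promotion_rate = 0.1 * alloc_rate       # 10 MB/sec survives into the old gen
    old_death_rate = promotion_rate         # steady state: deaths match promotions

    # GC traces young-gen survivors; RC traces old-gen deaths.
    trace_rate = promotion_rate + old_death_rate   # 20 MB/sec = 0.2X
    # Note this is independent of how large the old generation's live set is.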

Now, ideally, in a tracing GC you can get your work down to being proportional
to your allocation rate, but even in the ideal case this requires a heap sized
at 2x the maximum live size. With an RC old generation, you can get away with
a heap not much bigger than your live size.
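
The 2x figure falls out of a simple model (a sketch, assuming each collection traces the live set and reclaims heap_size - live_size):

    # Tracing cost per second = (work per collection) * (collections per second)
    #                         = live_size * alloc_rate / (heap_size - live_size)
    def trace_cost(alloc_rate, live_size, heap_size):
        return live_size * alloc_rate / (heap_size - live_size)

    print(trace_cost(100.0, 1000.0, 2000.0))  # heap = 2x live: cost == alloc rate
    print(trace_cost(100.0, 1000.0, 1100.0))  # tight heap: cost balloons to 10x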

There are also certain advantages for concurrent collection. The nice thing
about tracing dead objects is that they are never mutated, so you can do it
concurrently with the application without worrying about mutations to the
object graph, the way a concurrent mark-sweep collector has to. The downside
is that the write barrier is more complicated. See:
http://dl.acm.org/citation.cfm?id=504309
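
Very roughly, the extra cost comes from having to remember overwritten referents. A Python-flavored sketch of such a barrier (names mine; the linked paper's actual scheme, built on sliding views, is more involved):

    # Hypothetical deferred-RC write barrier: before overwriting a pointer
    # field, log the old referent so the collector can decrement it later.
    # A tracing collector's barrier can be as cheap as marking a card.
    update_log = []

    def write_barrier(obj, field, new_ref):
        old_ref = getattr(obj, field)
        if old_ref is not None:
            update_log.append(old_ref)   # pending decrement
        setattr(obj, field, new_ref)     # increment handled at collection time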

~~~
haberman
It is strange, because you are almost exactly restating the central thesis of
the paper, but phrasing it as though you are saying something different.

~~~
rayiner
Posting a tl;dr of a theory paper you read years ago is low-hanging karma. ;)

------
jheriko
Still of the opinion that this is not a problem to solve with algorithms, but
by just doing your goddamned job and accepting that some of it is mindless
chore work that requires some basic level of attention to detail and skill.

My software doesn't leak, it doesn't take forever to develop, and it's easy to
debug memory problems because the architecture is trivial.

~~~
pkolaczk
Some important programming paradigms are extremely hard or nearly impossible
without GC; see functional programming or lock-free concurrent programming.
Sure, for many imperative programs you can get away with simple
memory-management schemes like RAII.

~~~
dandrews
I'd make your same observation. Like the parent, I'm proud of my imperative
code and never used to consider memory management a particularly onerous task.
I was brought up on it y'know, just a part of writing good code.

But then I started Lisp in earnest, found GC useful (necessary!) there, and
more recently have gone functional with Clojure. Couldn't do without GC now.

One of these days perhaps we'll have concurrent collectors a la Azul, running
on a specialized core/MMU on some multiprocessor SoC, and GC will fade into
the background and nobody will care so much about performance.

Hey, I can dream.

