

Why Garbage Collection Paranoia is Still (sometimes) Justified - signa11
http://prog21.dadgum.com/15.html

======
gizmo
> What language systems in existence are using a true incremental or
> concurrent garbage collector? I know of three: Java, Objective C 2.0 (which
> just shipped with OS X Leopard), and the .net runtime. Not Haskell. Not
> Erlang. Not Objective Caml. [...], not Smalltalk, not Ruby.

Oh, why could that possibly be? Hmm...

- Java: Sun

- Objective C: Apple

- .Net: Microsoft

Big companies with the manpower and determination to make improvements to
their platforms. All the other languages have either grown out of one-man
prototypes or academic projects. Building a good garbage collector is _hard_.
It's difficult because you can't keep track of much data while you're
allocating, and you can't simply walk the graph of allocated objects because
other threads are modifying the graph while you're working on it. When you
then add the constraint that you can never stop the world for more than 5ms
at a time, it becomes a Difficult Problem.
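
To make the difficulty concrete, here is a rough sketch (toy names and
layout, not any particular runtime's code) of the kind of write barrier a
concurrent collector needs so the mutator can't hide a reachable object from
an in-progress mark:

    #include <stddef.h>

    /* Toy Dijkstra-style insertion barrier: every pointer store "shades"
       the stored target so a concurrently running marker never loses a
       reachable object. Illustrative only; no overflow or race handling. */
    typedef enum { WHITE, GREY, BLACK } Color;

    typedef struct Obj {
        Color color;
        struct Obj *fields[4];
    } Obj;

    static Obj *mark_stack[1024];
    static size_t mark_top = 0;
    static int marking_in_progress = 1;  /* set while the collector runs */

    static void shade(Obj *o)            /* queue a white object as grey */
    {
        if (o != NULL && o->color == WHITE) {
            o->color = GREY;
            mark_stack[mark_top++] = o;
        }
    }

    /* Every mutator pointer store has to go through this. */
    static void write_field(Obj *holder, int i, Obj *target)
    {
        if (marking_in_progress)
            shade(target);
        holder->fields[i] = target;
    }

Every single pointer store in the program pays for that check, and a real
collector has to make it both race-safe and cheap enough not to matter --
which is a big part of what makes the problem Difficult.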

Languages like Ruby, Python, Lisp and Scheme don't have good generational
garbage collectors because they don't have the manpower to tackle Difficult
Problems. (See, for instance, the global interpreter lock in Python. It's
been an open problem for ten years now.)

To tackle a Difficult Problem you need (a) a few smart people and (b) to get
them to work on the problem for many consecutive hours over many consecutive
days. Volunteers just can't afford to spend 50+ hours a week for half a year
on a problem. A hundred people can chip in weekends and evenings for a year
and make no substantial progress at all.

This is completely different from the hypothetical "sufficiently smart
compiler" that is supposed to sense the intent of the programmer and optimize
accordingly. A sufficiently smart compiler cannot compensate for bad algorithm
design, so the performance differences there will always be marginal. The
state of the art in garbage collection, by contrast, is decades ahead of what
we see in Python, Ruby, Haskell, Scheme and so forth. The GC problem is
simple to state: spread the computational overhead of garbage collection
evenly so that the application doesn't stutter. A straightforward engineering
problem with a concrete, clear goal -- just difficult to implement.
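
"Spreading the overhead evenly" looks roughly like this in practice -- a toy
incremental marking step with a work budget, mirroring the barrier sketch
above (again, illustrative, not any shipping collector):

    #include <stddef.h>

    typedef enum { WHITE, GREY, BLACK } Color;

    typedef struct Obj {
        Color color;
        struct Obj *fields[4];
    } Obj;

    static Obj *mark_stack[1024];
    static size_t mark_top = 0;

    /* One incremental step: mark at most `budget` objects, then hand
       control back to the application. Called from the allocator (or a
       timer) so the total marking cost is paid in small, even slices. */
    static void mark_step(size_t budget)
    {
        while (budget-- > 0 && mark_top > 0) {
            Obj *o = mark_stack[--mark_top];
            o->color = BLACK;
            for (int i = 0; i < 4; i++) {
                Obj *child = o->fields[i];
                if (child != NULL && child->color == WHITE) {
                    child->color = GREY;
                    mark_stack[mark_top++] = child;
                }
            }
        }
    }

The loop itself is trivial; keeping it correct while the application mutates
the object graph between steps is the hard part, and that is exactly what the
write barrier pays for.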

~~~
DanielRibeiro
_Languages like Ruby, Python, Lisp and Scheme don't have good generational
garbage collectors because they don't have the manpower to tackle Difficult
Problems._

It seems that you've confused language with implementation. For example: want
a Ruby without a GIL and with a great garbage collector? Use JRuby (this
assumes we're comparing against the official implementation, YARV). One
language, different implementations.

~~~
gizmo
JRuby just piggybacks on Java. That doesn't invalidate my point, as far as I
can tell.

~~~
DanielRibeiro
JRuby uses the JVM; Java is the language. Same point: language !=
implementation. It wasn't always so, but nowadays we have many mature VMs, so
multiple implementations are more common.

------
ScottBurson
Historical note: the Lisp Machines had true incremental collection, supported
by microcode and hardware. Words in memory had a tag field, and a certain
value of the tag field indicated that the word was "GC-forwarded". When the GC
moved an object (it was a copying collector) it would leave behind these GC-
forward pointers in each word of the old location of the object, pointing to
the corresponding word of the new location. Subsequent references to one of
the old words would automatically indirect to the new one -- note that this is
an indirection triggered by the contents of _memory_ , not by an explicit
instruction. The point was, it was no longer necessary to find all references
to the old location and update them just to move one object. (The GC algorithm
guaranteed that all references to old objects would be updated eventually, so
that the space could actually be reclaimed.)
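
Spelled out in software (on the Lisp Machine it was microcode and hardware,
and these names are made up for illustration), the indirection looks
something like this:

    #include <stdint.h>

    /* Each word carries a tag; a "GC-forwarded" tag means the word now
       holds a pointer to the corresponding word at the object's new
       location. Every read chases forwarding words transparently. */
    enum { TAG_DATA, TAG_FORWARD };

    typedef struct Word {
        uint8_t tag;
        union {
            intptr_t     datum;    /* ordinary contents                */
            struct Word *forward;  /* the word's new home after a move */
        } u;
    } Word;

    static intptr_t read_word(Word *w)
    {
        while (w->tag == TAG_FORWARD)  /* indirect until we hit real data */
            w = w->u.forward;
        return w->u.datum;
    }

Moving an object then only means overwriting its old words with forwarding
pointers; stale references heal lazily, one read at a time.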

It's hard to do this without hardware support, and even harder on a
multiprocessor. Still, I do recall seeing a paper presenting a fully
incremental GC with a very short maximum-pause guarantee. The problem was
(IIRC) that it made such heavy use of CAS (compare-and-swap) instructions that
the overhead of these would be unacceptable.
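
I don't remember which paper it was, but the CAS-heavy pattern such
collectors rely on looks roughly like this (layout and names invented for
illustration, not taken from any particular paper): when the mutator and the
collector can race to relocate the same object, a compare-and-swap on a
forwarding slot picks exactly one winner.

    #include <stdatomic.h>
    #include <string.h>

    typedef struct Obj {
        _Atomic(struct Obj *) forward;  /* NULL until the object has moved */
        char payload[32];
    } Obj;

    /* Either thread may try to relocate `old` into its own `fresh` copy;
       the CAS guarantees exactly one copy wins and everyone else adopts
       it. Paying a CAS per moved object is the overhead in question. */
    static Obj *relocate(Obj *old, Obj *fresh)
    {
        memcpy(fresh->payload, old->payload, sizeof old->payload);
        atomic_init(&fresh->forward, NULL);

        Obj *expected = NULL;
        if (atomic_compare_exchange_strong(&old->forward, &expected, fresh))
            return fresh;              /* we won the race */
        return expected;               /* lost: adopt the winner's copy */
    }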

I wonder how much effort Intel and AMD have been putting into making CAS
faster.

~~~
yuhong
I remember that Intel made it faster in Nehalem.

------
yason
Note that a heap implementation with manual allocation and manual freeing
isn't free of problem cases either. The heap imposes an accumulating
maintenance overhead that needs to be cleared out eventually. As with a GC'd
heap, you can try to amortize that overhead into controllable, bounded chunks
of computation, but that doesn't mean pathological cases can't still occur.
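
As a toy illustration of the point (names invented, and real allocators are
far more sophisticated): a free list can make freeing O(1) by deferring all
maintenance, but the deferred coalescing pass then grows with everything
freed since the last pass -- the manual-heap analogue of a GC pause.

    #include <stddef.h>

    typedef struct Block {
        size_t size;            /* total bytes, header included */
        struct Block *next;
    } Block;

    static Block *free_list = NULL;

    static void cheap_free(Block *b)   /* O(1): just defer the work */
    {
        b->next = free_list;
        free_list = b;
    }

    /* The deferred bill: merge physically adjacent free blocks. The cost
       depends on how much has piled up since the last pass. */
    static void coalesce_pass(void)
    {
        for (Block *b = free_list; b != NULL; b = b->next) {
            Block *end = (Block *)((char *)b + b->size);
            for (Block **p = &free_list; *p != NULL; p = &(*p)->next) {
                if (*p == end) {       /* right neighbor is free: absorb */
                    b->size += end->size;
                    *p = end->next;
                    break;
                }
            }
        }
    }

One pass may not even fully coalesce; real allocators juggle exactly this
kind of trade-off between per-free cost and occasional maintenance work.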

------
j_baker
I don't think there's anyone credible out there who will claim that garbage
collection is appropriate in _every_ case. Sure, rocket guidance systems,
military software, video games, and software from 1983 aren't a good fit for
garbage collection.

However, in 2011 I think the _vast_ majority of software that programmers
write would gain more from garbage collection than it would lose.

------
Dylan16807
I don't think I would call a pause that is O(n) in the amount of allocated
data 'pathological'. Nor would I agree that the rarity of completely
incremental collectors proves anything about what is possible. I would say
it's simply that developers prefer a runtime that is a small constant factor
faster but has occasional pauses.

------
wingo
The V8 JavaScript implementation landed an incremental collector a couple of
months ago, FWIW.

------
Roboprog
I want to see the next post about the Erlang GC.

~~~
vukk
This post is from 2008, so the next post has already been published:
<http://prog21.dadgum.com/16.html>

