
Use and Abuse of Garbage Collected Languages - prajjwal
http://prog21.dadgum.com/134.html
======
knowtheory
A lot of people use garbage collected languages because garbage collection
isn't an issue in their projects... until it is. And then the folks who really
have to deal with it get to learn about their garbage collector and how to
manage memory allocation in their apps.

But claiming that the productivity savings folks get from deferring learning
the details of their language's garbage collector are somehow problematic,
just because in a subset of cases you _do_ care about object allocation and
garbage collection, is a pretty silly argument (especially in the face of the
article's admission that garbage collection has won out).

Garbage collection for most people is a tool of convenience. When the tool
becomes inappropriate, yes, people do have to learn how to optimize their
code. None of this should shock us, or keep us from using garbage collectors.

~~~
pcwalton
Well, the problem is that doing explicit memory management in a language
that's built around GC is usually far less convenient than doing explicit
memory management in a language that isn't (like C++). For example, in C++ you
can use templates and placement new to write generic free lists that get
specialized to the type of object in use at compile time, and they still allow
you to use the regular old "new" syntax for calling the constructor. By
contrast, you can write free lists in Java/C#/Python/Ruby/JS/etc, but you
won't get that nice "new" syntax. (Not to mention that you don't get the same
control over memory layout that you do in C++.)
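
To make the contrast concrete, here is a minimal sketch of a free list in Java (the `FreeList` class and its `acquire`/`release` names are made up for illustration): every allocation has to go through a method call rather than the language's `new` syntax, and the pool has no control over memory layout.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

// A minimal generic free list: released objects are recycled instead of
// being left for the collector. Note that callers must use
// acquire()/release() -- there is no way to keep the plain `new` syntax.
class FreeList<T> {
    private final ArrayDeque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;

    FreeList(Supplier<T> factory) {
        this.factory = factory;
    }

    T acquire() {
        T recycled = free.poll();
        return recycled != null ? recycled : factory.get();
    }

    void release(T obj) {
        free.push(obj);
    }
}
```

In C++, by contrast, a templated pool plus placement new would construct the object in pooled storage while keeping the normal constructor-call syntax at the call site.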

Explicit memory management in GC'd languages is also less predictable: Java
can allocate objects on the stack via escape analysis, but if an object is
determined to escape then it gets silently demoted to the heap instead. In
performance-critical code, this silent demotion can lead to a hard-to-find
performance regression.

To me, the obvious solution is to build constructs into our GC'd language that
nevertheless make explicit memory management predictable and convenient when
the programmer wants to use it. A requirement for predictability is that
whether an object is garbage collected needs to somehow be encoded into the
type of that object (either via a static type or via a dynamic type). This is
the philosophy we're following in Rust, and it's also the philosophy some of
my colleagues are working on for JavaScript (via the Binary Data API).

~~~
snprbob86
> you won't get that nice "new" syntax

What's nice about that syntax? In every language that I've used which has it
(C++, Java, C#, JavaScript), it's been a royal pain in the butt. It makes it
harder to swap in alternate factory behavior and requires remembering a few
arbitrary extra rules, like operator precedence (no matter how many times I
write `new Foo().bar()`, it still doesn't parse right in my head).

Ruby, Smalltalk, and Objective-C get this right: allocation is distinct from
initialization (new vs. init), and both use standard message passing / method
call semantics.

------
Rhapso
I've had some real issues with the Java garbage collector where it seems most
of my peers have not, so perhaps it is a symptom of my programming style or
the projects I am doing (some big numerical and physical simulations).

The ultimate solution I have found to bad garbage collectors is to avoid them
by using 'frame buffering'. Almost every problem can boil down to a state
machine, but your states can be very large (I end up using big N-dimensional
arrays). Building and destroying very large state records very rapidly in a
garbage-collected language tends to overwhelm the GC to the point where you
are filling the garbage queue faster than it deallocates, so you run out of
RAM and everything slows down while it waits for the garbage collector.

The solution I use is to make 2 static/global state records, use one as the
current state, write the new state to the other, then flip the pointers on the
two states, exactly like frame buffering. That way you never give anything to
the GC. I've started using this technique for all my big state machines in
garbage-collected languages, but every time somebody reads the code or I
describe the idea, they look at me like I am mad, as if the only way to do it
is to create a new state from scratch each iteration and then throw out the
old one. For one team member it took rewriting some of my code in the standard
OOP style (which made the project no longer run in anything like real time) to
prove my idea was good. Is it just my peer group that does not see this type
of idea as sane, or am I missing something important?
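
The flip described above can be sketched in a few lines of Java (the class name and the `+1.0` update rule are placeholders): both state records are allocated once up front, each step writes into the spare buffer, and then the two references are swapped, so the GC never sees a dead state record.

```java
// Double-buffered ("frame buffered") state machine: two preallocated
// buffers, written alternately and flipped by reference each step, so
// no per-iteration garbage is ever created.
class DoubleBufferedSim {
    private double[] current;
    private double[] next;

    DoubleBufferedSim(int size) {
        current = new double[size];
        next = new double[size];
    }

    void step() {
        for (int i = 0; i < current.length; i++) {
            next[i] = current[i] + 1.0; // stand-in for the real update rule
        }
        double[] tmp = current; // flip the buffers, like a frame buffer
        current = next;
        next = tmp;
    }

    double[] state() {
        return current;
    }
}
```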

~~~
Retric
Sounds like you're implementing an Object Pool by hand. That may be a
reasonable solution for a one-off problem, but you can probably leverage other
people's code for this stuff and gain some flexibility.

<http://en.wikipedia.org/wiki/Object_pool_pattern>

~~~
Rhapso
Yes, thank you!

------
eru
> Need to build up an intermediate list and then discard it? Easy! No worries!

In Haskell efforts are under way to have your cake and eat it, too. Stream
fusion (and more general deforestation techniques) allow you to have
intermediate data structures at the level of source code logic, without them
ever showing up in the compiler's output. Such tight loops don't produce any
garbage.
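
Haskell's stream fusion proper has no direct Java equivalent, but Java's `Stream` pipelines give a loose analogue of the same idea (the `FusionDemo` class below is invented for illustration): the map and filter stages are lazy and run as a single fused pass, with no intermediate collection ever materialized.

```java
import java.util.stream.IntStream;

class FusionDemo {
    // map and filter are lazy pipeline stages: the whole expression runs
    // as one fused loop over the range, producing no intermediate list.
    static int sumOfEvenSquares(int n) {
        return IntStream.rangeClosed(1, n)
                .map(x -> x * x)
                .filter(x -> x % 2 == 0)
                .sum();
    }
}
```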

~~~
lubutu
This is one of the examples I use to explain why higher-level functional
languages can be more drastically optimised than lower-level imperative
languages. These fusions are also sometimes called hylomorphisms.

~~~
eru
Yes. Though at the moment the stress in that sentence is mostly on the `can
be', not on `are'.

------
rwj
While garbage collection is useful, people often forget that memory is only
one resource that needs to be managed. File handles, database connections,
sockets, etc. also need to be "deallocated". RAII may require programmer
effort, but it at least is a complete approach. Generally, I don't think a
single technique will win.

~~~
rayiner
I don't like this argument. It ignores the fact that memory objects outnumber
file handles, database connections, etc, by orders of magnitude. Moreover,
things like file handles, database connections, and sockets can't refer to
other file handles, etc, making them terminal nodes in the in-memory data
graph. That makes them a lot easier to manage than memory objects which may
appear in any place within a data structure. Finally, the need to create
temporary memory objects is far greater, especially in functional programming,
than the need to create temporary file handles, etc.

~~~
andrewflnr
I'm not sure what your point is. What part of rwj's argument don't you like?
Do you dislike RAII? Personally, it's one of the few things I think I would
miss from C++ if I went back to low-level programming. Even if file handles,
etc are technically easier to handle, do we really lose anything by using a
general tool to manage them, at least sometimes?

I feel like I'm missing something. Please explain.

~~~
SamReidHughes
You're missing the fact that reclaiming memory under garbage collection does
not have any side effects, while reclaiming file handles does. That's why GC
is appropriate for memory but not for file handles.

~~~
dkarl
Reclaiming memory makes that memory available for subsequent allocation.
Reclaiming file descriptors makes those file descriptors available for
subsequent use. Closing database connections means the resources dedicated to
those connections are made available for reuse as well.

All of those things are resources and fit nicely with the name of the idiom:
Resource Acquisition Is Initialization. Resource management is 100% about side
effects.

Not that there's anything wrong with using RAII for things that aren't
normally thought of as resources (such as locks) but there's certainly nothing
strange about using it to manage a finite resource such as memory.

~~~
SamReidHughes
If you think reclaiming memory immediately is a visible side effect, you're
deliberately missing the point. Under GC the unused memory just gets reclaimed
later. The only visible side effect is the performance difference.

That's different than when holding on to a file handle or other resource, for
which holding on to the resource until the next GC cycle (which might never
happen, ever) would not be acceptable.

That's why GC is not acceptable for these resources that are not buffers of
memory.

At the same time, using RAII for memory management has the obvious downsides
that rayiner already iterated through. (One thing he didn't mention was the
tragic consequences of getting manual memory management wrong.)

(It's telling that you're talking about whether it's _strange_ to use RAII for
something, as if anybody here was disagreeing about its relative
_strangeness_. Either you've descended into arguing against an imaginary
opponent or you just like switching out your adjectives to different
adjectives that have different meaning.)

~~~
dkarl
Reclaiming memory is a visible side effect. If you don't do it, your program
leaks memory and dies. Very visible. It's just a resource like any other.

I'm not being deliberately obtuse; I'm just recognizing that RAII and garbage
collection are mutually exclusive. You're telling me that memory is special
because it's managed by GC, but that's a little unfair, because assuming GC
renders the entire discussion moot! If you assume garbage collection, then
you've ruled out RAII, and there's no point in discussing the suitability of
RAII for _any_ resource. You simply can't combine RAII with garbage
collection (at least pending some linguistic innovation that involves, say,
reference-counting objects only for the purpose of destruction and not for the
purpose of freeing memory, which sounds silly to me).

The reason is that RAII ties resource management to object lifetime, and with
GC, there is no deterministic destruction of heap objects, and therefore no
such thing as reliable RAII. Without deterministic object destruction, you end
up relying on the programmer to manually ensure the destruction of the
resource-managing objects. The best you can do in the presence of GC is "with"
constructs (such as CL's "with-open-file") which do not give you as strong a
guarantee as RAII does, because even with well-designed resource management
objects they rely on the programmer to know which types manage resources and
to never accidentally use them outside an appropriate "with" construct. Also,
there are inevitably times when a resource must outlive the scope of any
"with" construct. In the end you are much more reliant on programmer
discipline than with RAII.
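
For what it's worth, Java's try-with-resources is exactly such a "with" construct, and it shows both the strength and the weakness described above (the `WithDemo` class is a made-up example): cleanup is deterministic at scope exit, but only when the programmer remembers to put the resource in the try header.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class WithDemo {
    // The reader's close() runs deterministically when the try block
    // exits, normally or via exception -- but nothing stops a caller
    // from constructing a BufferedReader outside such a block and
    // leaking the handle until some future GC cycle.
    static String firstLine(String text) throws IOException {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.readLine();
        }
    }
}
```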

So the question is not whether RAII is suitable for some resources but not for
memory, because that is a contradiction given current technology, but rather
whether you 1) use RAII both for memory and for whatever additional resources
you like, or 2) use GC for memory and use more manual and error-prone (in my
opinion) methods for handling other resources.

rayiner's objection to RAII is twofold, first that it is simple to manage non-
memory resources and second that functional programming style results in a
very large number of short-lived objects which are more efficiently handled
via GC. To the first I say that managing file handles and database connections
may be easy for him, but it certainly hasn't been easy for all the programmers
I've worked with. When resource leaks stop being a problem, we'll stop looking
for solutions. To the second I say that you can't choose between GC and
reference counting on the basis of performance without considering a concrete
situation. In many situations it's better to have a program that is
consistently slower than one that is much faster on average but intermittently
unavailable. In other cases you only care about throughput and will therefore
prefer GC.

Personally, I prefer to choose on the basis of programmer convenience rather
than performance. If resource management is important, that tilts the scales
in favor of RAII. If your program naturally lends itself to object graphs with
cycles, that points strongly towards a garbage collected language. (Though
personally I have never found that reference counting versus garbage
collection has been the most important difference between two languages
anyway, and the only language I'm aware of that really supports RAII is C++,
so it's not like you'll ever make a straight-up decision between RAII and
garbage collection without a ton of other very strong factors in play.)

 _That's why GC is not acceptable for these resources that are not buffers of
memory._

Not to be pedantic... okay, I'm being pedantic ;-) but GC can be used for any
set of resources that are linked the way objects in memory are. For example,
the database guys at my last job wrote an ad-hoc garbage collector when they
realized that buggy triggers had left a ton of orphaned rows in our database.

~~~
SamReidHughes
You are not disagreeing with anybody.

> Reclaiming memory is a visible side effect. If you don't do it, your program
> leaks memory and dies. Very visible. It's just a resource like any other.

And you are failing at reading comprehension.

------
jhspaybar
Being a programmer who has written memory managers in C++ as well as used
garbage collected languages quite a bit like Java, I think the solution will
just eventually be better VMs. Having watched the progression of Java from the
90s to now, it seems obvious to me that these are the sorts of problems that
will be solved by the folks who put together VMs.

~~~
wsc981
Do we really need garbage collected languages when technology like
Objective-C's ARC transparently "automates" the use of manual memory
management?

See: <http://clang.llvm.org/docs/AutomaticReferenceCounting.html>

~~~
ryanpetrich
ARC is pretty slow in the presence of multiple CPUs, as it's still reference
counting under the hood (which requires atomic increments/decrements). Garbage
collection and manual memory management both have the potential to do much
better.

~~~
kmontrose
ARC also requires you to explicitly break cycles, and has additional "thinking
overhead" (strong/weak on properties, no object pointers in structs, and some
void* magic, iirc).

It's certainly _better_ than pure manual memory management, but it's not as
painless (in terms of freeing a programmer from thinking about memory
allocation) as a proper GC.

------
jhuni
Here are some relevant papers from Henry Baker on garbage collection:

Lively Linear Lisp -- 'Look Ma, No Garbage!'
<http://home.pipeline.com/~hbaker1/LinearLisp.html>

NREVERSAL of Fortune -- The Thermodynamics of Garbage Collection
<http://www.pipeline.com/~hbaker1/ReverseGC.html>

------
Dove
It is particularly concerning, at least in the case of Java, that I cannot
even fall back on the old paradigm and free memory manually. I am not
persuaded that figuring out how best to hoard objects is a better use of my
mental energy than proving they are properly allocated and freed.

------
dgrnbrg
It seems like one route to more performant, lower-overhead GC would be to
allocate GC pools that each contain only a specific kind of object, and then
generate specialized code to mark/sweep those pools (you could elide the
object headers, since everything in the pool is the same type). These pools
could be managed by a GC-pool GC, so that as new object types were created and
destroyed, their pools would get allocated and garbage-collected.

Are there any systems that currently do this?

~~~
pcwalton
Yes, modern garbage collectors usually allocate objects into bins, which don't
have headers since the objects are of fixed size.

------
dazbradbury
Here is an article discussing Stack Overflow's battle with the .NET garbage
collector. It covers everything from discovery through to their solutions; a
good read for those interested in scaling code in GC'd languages.

<http://samsaffron.com/archive/2011/10/28/in-managed-code-we-trust-our-recent-battles-with-the-net-garbage-collector>

------
based2
Then plug into native code or native memory if needed:

<http://incubator.apache.org/directmemory/>
<http://blog.centuryminds.com/2007/11/weak-and-soft-references-in-java/>

------
goggles99
It seems the main complaint in this blog is with unpredictable "pauses". Here
is a simple solution, then: let the GC run on a worker thread on a different
core. That should eliminate these pauses, right? Pauses are more likely to be
caused by context switching between processes anyway. I have never heard of a
programming language with GC being used on a real-time operating system, for
obvious reasons. This sounds like a rant of jealousy or anti-GC evangelism.

------
noduerme
If there's ever been a case for GC being fatal, it's the rise and fall of AS3.
My mainstay is heavy image processing and procedural animation in Flex/Flash.
While we do, finally, have a bit of control over the GC, it wasn't so until
very recently and after a lot of battles. So the best you could do was to
understand _how_ the GC worked, and try to write code that would mark objects
you wanted marked, and trigger the GC to run when you wanted it to run. In
other words, in AS3, the only real choice was to write a management layer over
the built-in buffer...

Not surprisingly, very few people employed to write Flash code actually
understood anything about how memory was allocated or managed; nor did they
care, because it was so easy to ignore what was going on in their little
embedded applications, since much of the time the AVM2 GC let you get away
with murder. Other than an endless stream of FlashKit threads where people
couldn't understand how a locally scoped Tween had suddenly stopped halfway
through completion, no one paid much attention. This gave rise to the
phenomenon of the massive-memory-munching Flash widget, and arguably to the
(as-yet-unofficial) death of Flash itself.

The problem was simply that AS3 is way too powerful and potentially dangerous
a language to put in the hands of coders who don't know or don't care about
memory management; it sits there in your browser basically running Java, but
with the casual auto-loading aspects of Javascript. Moreover, like Javascript,
Flash GC works differently depending on environment - the projector is very
clean; AIR has its own methods to call; individual browsers running Flash
plugins actually specify their own ideal levels of memory usage and prevent
the GC from running until those are reached, and there's nothing you can do to
make a user's browser collect sooner than it wants to. (Almost nothing. We
found out that opening some kinds of local connections forced GC in some
cases...among other tricks). Yet the subject was consistently downplayed by
Adobe and there was very little documentation on best practices to reduce the
footprint of programs. Guys like Grant, Jean Phillipe and I had to work out
for ourselves how the GC actually functioned, and then try to educate other
coders on the forums, etc.

Javascript 1.5 is another language ripe for GC problems, in which there is
_no_ hope of better tools to make the functioning of the GC less opaque,
barring a sudden massive adoption of JS 2.0. The definition of "badly written"
Flash code that everyone always complains about is heavy code that doesn't
consistently clean up after itself. You can write a particle system in AS3
with 100k particles on screen and no major CPU impact; the trick is knowing
how and when to force collection, how to (not) link events to the display
chain, etc. It wouldn't be an exaggeration to say that _most_ AS3 optimization
has to do with working around the GC in some way.

The same problems are rampant in JS. The non-coding population is not as
familiar with the results, because it's more transparent in the browser and,
by and large, the programs don't try to reach the same levels of complexity;
plus when they crash, users haven't been trained to think "oh damn, Javascript
crashed my browser again." However, GC is a problem that all front-end
developers need to make themselves extremely familiar with if they plan to
write browser code that doesn't hog memory and ultimately collapse...and it's
much harder to do that in a runtime browser script than it is in compiled
code.

