
Everybody thinks about garbage collection the wrong way (2010) - derefr
http://blogs.msdn.com/b/oldnewthing/archive/2010/08/09/10047586.aspx
======
nostrademons
BTW, it's not unheard of for short-lived command line utilities or
multiprocess apps to literally run with the null garbage collector. They just
never free memory and deliberately leak it until they exit, at which point the
OS cleans it up. If you expect total RAM requirements to be a few hundred K on
a multi-G system, this is completely reasonable, and faster than either manual
or automatic memory management to boot.

The analog in a long-running program is an arena: it keeps allocating memory
indefinitely, and then frees it all in one clump when the program is done with
that phase of operation.
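
As a concrete illustration (a minimal sketch of my own, not code from the thread), an arena in C++ can be little more than a bump pointer over malloc'd blocks, with a destructor that frees everything in one clump:

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Minimal arena sketch: allocation bumps a pointer within a block;
// nothing is freed individually. Assumes each request fits in one block.
class Arena {
    std::vector<char*> blocks_;
    size_t block_size_;
    size_t used_ = 0;
public:
    explicit Arena(size_t block_size = 1 << 16) : block_size_(block_size) {}

    void* alloc(size_t n) {
        // Round the size up so every allocation stays suitably aligned.
        n = (n + alignof(std::max_align_t) - 1) & ~(alignof(std::max_align_t) - 1);
        if (blocks_.empty() || used_ + n > block_size_) {
            blocks_.push_back(static_cast<char*>(std::malloc(block_size_)));
            used_ = 0;
        }
        void* p = blocks_.back() + used_;
        used_ += n;
        return p;
    }

    // The whole phase's memory goes away at once, like the null GC at exit.
    ~Arena() {
        for (char* b : blocks_) std::free(b);
    }
};
```

Freeing is O(number of blocks) rather than O(number of allocations), which is where the speed win over both manual and automatic management comes from.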

Multi-process languages like Erlang sometimes use this approach as a
degenerate case of a stop & copy garbage collector. They start a process with
a young generation heap sized large enough to contain all the memory the
process will ever allocate (you can get reasonable estimates on this from
previous runs), and so the GC never triggers and all the memory is freed in
one block when the process terminates.

~~~
derleth
> They just never free memory and deliberately leak it until they exit

Which is a _great_ way to kill that process, or arbitrary processes, when a
previously well-behaved program is run on a system that's a bit strapped for
RAM at the moment, or run on a somewhat larger-than-usual input, or both.

That, or bring the system to a standstill as the swapping subsystem flogs the
disk nonstop.

Possibly first one, and then the other.

~~~
nostrademons
For a lot of these simple command-line utilities, it wouldn't matter anyway.
They often work by building up some data structure that's then used to process
the rest of the input line-by-line, and then the program exits. If the data
structure is never freed, so what? It's needed until the end of execution
anyway.

Think of compiled regular expressions for grep, or total counts for wc, or a
rotating line buffer for tail, or the previous line for uniq.

It's quite possible to get pathological behavior even with GC or careful RAII,
as anyone who's ever leaked memory in a Java program could tell you.
Simulating a computer with infinite memory breaks down when the working set
you're touching exceeds the physical RAM of the machine. Actually, leaking
memory is the least of your concerns in that case, as leaked memory just gets
paged out and doesn't cause any problems unless it exceeds the computer's
_address space_.

~~~
andrewflnr
Actually, with tail -f that could become a problem.

~~~
rcthompson
I don't see how it could. Tail doesn't use more memory as it reads more lines.
It sets up a fixed-size buffer and then continually re-uses it. Yes, "tail -f"
could be a long-running process, but its memory usage should remain constant
over its entire lifetime.
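
To make that concrete (my own sketch, not tail's actual source), a rotating buffer for the last N lines just overwrites its oldest slot in place:

```cpp
#include <string>
#include <vector>

// Fixed-capacity line buffer: memory use depends only on N, never on
// how many lines have streamed through it. Assumes n > 0.
class LastN {
    std::vector<std::string> buf_;
    size_t next_ = 0;   // slot to overwrite next (the oldest line)
    size_t count_ = 0;  // lines stored so far, capped at buf_.size()
public:
    explicit LastN(size_t n) : buf_(n) {}

    void push(std::string line) {
        buf_[next_] = std::move(line);   // reuse the slot in place
        next_ = (next_ + 1) % buf_.size();
        if (count_ < buf_.size()) ++count_;
    }

    size_t size() const { return count_; }
};
```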

------
rdtsc
In case someone missed it, this is more about releasing external resources
than GC per se. (There is an interesting analogy of how GC simulates a machine
with infinite memory).

> but there are things you can do even if you have infinite memory that have
> visible effects on other programs

Moreover, there are things that can mess up your program even without these
external resources, namely running the GC in a highly concurrent application.
Except on systems with separate per-thread or per-actor heaps, when memory is
allocated vigorously, GC runs will start to affect your latency and parts of
the program will get blocked and frozen.

As for releasing resources: in vital systems it is important to run watchdogs
that release resources on behalf of the main application. Sometimes processes
crash hard (segfault) and never get to run their own finalizers. So these
resource lists might have to be sent to a watchdog process whose only job is
to monitor for the original program crashing and then release those
resources.

------
rational_indian
>Garbage collection is simulating a computer with an infinite amount of memory

I think he means "infinite amount of virtual memory". Virtual memory already
tries to "simulate a computer with an infinite amount of (random access)
memory". This makes me wonder if virtual memory can be used as a poor man's
garbage collector?

~~~
knz42
Virtual memory can be used as "poor man's GC" only with languages with a
variable number of bits in pointers (references). If you have a fixed number
of bits in your pointers, like in C & C++, then you will run out of
_addresses_ eventually, even with virtual memory.
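
A back-of-envelope illustration of that limit (the leak rate is made up): even with unlimited swap, a 32-bit process that never frees runs out of addresses almost immediately.

```cpp
// Seconds until a leak-everything process exhausts a fixed address space,
// given bytes of address space and a leak rate in bytes per second.
double seconds_to_exhaust(double address_space_bytes, double leak_bytes_per_s) {
    return address_space_bytes / leak_bytes_per_s;
}

// 32-bit: 2^32 bytes of address space; leaking 100 MiB/s exhausts it in
// roughly 41 seconds, no matter how much virtual memory backs it.
```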

~~~
rational_indian
Yup! Virtual memory as a poor man's GC seems to be viable for short-lived
programs with modest virtual memory requirements. I seem to recall Walter
Bright using something along the same lines to replace malloc/free in the D
compiler code. It led to a significant improvement in performance.

------
Yhippa
I didn't quite grok the concept of GC as simulating infinite memory. After
some research I ended up at, what else, SICP
([http://mitpress.mit.edu/sicp/full-text/sicp/book/node119.html](http://mitpress.mit.edu/sicp/full-text/sicp/book/node119.html))
with an explanation. Hope this helps someone else.

>The representation method outlined in section
[[http://mitpress.mit.edu/sicp/full-text/sicp/book/node118.html#sec:memory-as-vectors](http://mitpress.mit.edu/sicp/full-text/sicp/book/node118.html#sec:memory-as-vectors)]
solves the problem of implementing list structure, provided that we have an
infinite amount of memory. With a real computer we will eventually run out of
free space in which to construct new
pairs.[[http://mitpress.mit.edu/sicp/full-text/sicp/book/footnode.html#foot27912](http://mitpress.mit.edu/sicp/full-text/sicp/book/footnode.html#foot27912)]
However, most of the pairs generated in a typical computation are used only to
hold intermediate results. After these results are accessed, the pairs are no
longer needed--they are garbage. For instance, the computation
For instance, the computation

(accumulate + 0 (filter odd? (enumerate-interval 0 n)))

constructs two lists: the enumeration and the result of filtering the
enumeration. When the accumulation is complete, these lists are no longer
needed, and the allocated memory can be reclaimed. If we can arrange to
collect all the garbage periodically, and if this turns out to recycle memory
at about the same rate at which we construct new pairs, we will have preserved
the illusion that there is an infinite amount of memory.

------
twotwotwo
>If the amount of RAM available to the runtime is greater than the amount of
memory required by a program, then a memory manager which employs the null
garbage collector (which never collects anything) is a valid memory manager.

A bit of a tangent, but I wonder if GC is one reason Android devices moved to
larger RAM sizes faster than iOS did. (iPhone 5 == 1GB, GS3/N4/One == 2GB.)

Under manual memory management or refcounting, running with RAM 90% full isn't
slower than 50% full. (Your app code isn't slower, at least; who knows if it
has effects via the OS having less for cache or whatever.) But under GC, if 90% of
RAM is full of live objects, you'll be forced to GC several times as often as
you would if only 50% of RAM were full. So that 1GB->2GB bump might actually
_more_ than halve how often you have to collect.
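
The arithmetic behind that "more than halve" (numbers are illustrative, not measured): a collection is forced roughly once per fill of the headroom, i.e. RAM minus live data, so collection frequency scales with allocation rate over headroom.

```cpp
// Rough model: each collection is triggered when allocations have filled
// the free headroom, so collections/sec ~ alloc rate / (RAM - live set).
double gc_per_second(double ram_gb, double live_gb, double alloc_gb_per_s) {
    return alloc_gb_per_s / (ram_gb - live_gb);
}

// With 0.9 GB live and 1 GB/s of allocation:
//   1 GB RAM -> ~10 collections/s; 2 GB RAM -> ~0.9/s, about 11x fewer.
```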

There could be other reasons for the diff--maybe part of it is Android's more
permissive multitasking model, maybe non-GC-related memory-use differences
between Android/Java and iOS/ObjC, maybe greater focus on power consumption
from Apple or greater focus on specs from Android handset makers.

------
tehwalrus
This article got me thinking - I use quite a lot of C/C++ python extensions,
and I'm wondering if I could wrap some of my malloc calls (particularly the
ones that store an "inner C++ object" on a python class) as python byte
strings that are then pointer-cast to RAM for C++ to use, and then tidied up
by the garbage collector rather than a manual delete call later.

The only difficulty I can see is persuading the C++ constructor to run on an
arbitrary address?

~~~
anonymous
You can persuade the C++ constructor to run on an arbitrary address trivially
with "placement-new". See here
[https://en.wikipedia.org/wiki/Placement_syntax](https://en.wikipedia.org/wiki/Placement_syntax)
. In short, you can do:

    
    
        void* mem;
        ... snip ...
        MyClass* thing = new (mem) MyClass;
    

And that doesn't allocate memory, but calls the constructor on the mem
pointer. You are responsible for having allocated enough memory, at a suitable
alignment for MyClass. You cannot call delete normally on this object though.
You'll need to call the destructor manually, like so:

    
    
        thing->~MyClass();
    

before the memory is deallocated. If the destructor doesn't actually do
anything, you can safely skip this.

~~~
tehwalrus
Thanks, that's very helpful!

I am blissfully unaware of almost all C++ specific syntax beyond basic class
definitions, as I use it to add the odd syntactic construct to C (like
operator overloads for + etc for my math 2- and 3-vectors) so I had no idea of
this.

~~~
anonymous
If you want to learn more, I'd suggest at first reading (or at least skimming)
Bjarne Stroustrup's "The C++ programming language".

Also, the C++ FAQ
[http://www.parashift.com/c++-faq/](http://www.parashift.com/c++-faq/) is
useful for getting an overview on the corner cases of C++.

After which you should visit its inverse, the C++ FQA (frequently questioned
answers, i.e. an anti-C++ answer to the FAQ)
[http://www.yosefk.com/c++fqa/](http://www.yosefk.com/c++fqa/) to learn why
no-one in their right mind would ever use C++.

That won't teach you C++ per se, but would give you a useful look at the
breadth of the language and how and why things happen the way they do.

~~~
tehwalrus
Thanks. I have got a copy of a Stroustrup on my shelf (I think it's a standard
library reference, not the language definition) and I've been developing in "C
with classes" for a while, so I've discovered quite a few of the pitfalls. I
prefer to let clang++ point out when I stumble across something insane, like:

    
    
        vector<vector<T>>  // pre-C++11: parsed as a >> operator?!
        vector<vector<T> > // compiles correctly.
    

In some ways I prefer what Objective C did, as a proper superset, and in
others I'm drawn to projects like Cello[1]. I'm happy with using (and knowing
only) a very small set of C++ features, especially where they're useful for
defining simple algebras for objects! I'm sure I'll find another feature I
like one day - this one (GGP) may be one of them.

There is one OSS project I know that I never want to emulate: the code
literally fails to compile via the python installer until you have run the
makefile at least once, by which time it's finished generating all the header
files. Whiskey Tango Foxtrot!

[1]
[https://github.com/orangeduck/libCello](https://github.com/orangeduck/libCello)

------
cousin_it
Question to people who have used finalizers in GC languages: can you describe
what you used them for?

~~~
bcoates
If you call out to a C-style library through the FFI that's expecting a
cleanup call, you should call it in both a user-accessible close() function
and the finalizer.

I think the easiest way to think about it is, "finalizers don't prevent handle
leaks, they sometimes clean up a handle leak that already happened".

I'm not sure if any debugger environments give a warning when a finalizer gets
run due to GC but they ought to--it always indicates a programmer error.

------
joe_the_user
The argument seems logical as far as it goes. But having a piece of code which
may or may not run seems to introduce inherently greater mental complications
into the programming process.

~~~
salmonellaeater
The key point is that finalize() is not reliable, so you shouldn't depend on
it. The correct idiom is more or less

    
    
      myResource = foo();
      try {
         ...
      } finally {
         myResource.release();
      }
    

Some programmers (although I've never met one) try to avoid this boilerplate
by writing a finalize() method on myResource that releases it. It's an anti-
pattern. C# and Java have syntactic sugar now that makes the boilerplate
shorter, which should help.

