
A Garbage Collector for C and C++ - brudgers
http://www.hboehm.info/gc/
======
kentonv
This has been around for a long time. The problem is, GC is not just something
you can bolt on, it's a fundamental part of the language that colors
everything you write -- and it fundamentally clashes with the way C++ is
normally written.

In particular, C++ (and Rust, etc.) stresses RAII (Resource Acquisition Is
Initialization), in which you write classes with destructors that cause the
object to release its resources as soon as it goes out of scope. "Resources"
here often means heap memory, but can also mean a whole lot of other things
(file handles, child processes, network connections, arbitrary resources on
remote network services, etc.).
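
To make the RAII point concrete, here is a minimal sketch of a class whose destructor releases a non-memory resource (a file handle) at scope exit. The `File` class and `write_greeting` function are hypothetical names made up for illustration; only the `<cstdio>` calls are real.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical wrapper: owns a FILE* and closes it in the destructor.
class File {
public:
    File(const std::string& path, const char* mode)
        : handle_(std::fopen(path.c_str(), mode)) {
        if (!handle_) throw std::runtime_error("cannot open " + path);
    }
    ~File() {
        if (handle_) std::fclose(handle_);  // released as soon as the object dies
    }

    // Exactly one owner per handle: copying is disabled, moving transfers it.
    File(const File&) = delete;
    File& operator=(const File&) = delete;
    File(File&& other) noexcept
        : handle_(std::exchange(other.handle_, nullptr)) {}
    File& operator=(File&& other) noexcept {
        if (this != &other) {
            if (handle_) std::fclose(handle_);
            handle_ = std::exchange(other.handle_, nullptr);
        }
        return *this;
    }

    void write(const std::string& text) {
        if (std::fputs(text.c_str(), handle_) == EOF)
            throw std::runtime_error("write failed");
    }

private:
    std::FILE* handle_;
};

void write_greeting(const std::string& path) {
    File out(path, "w");
    out.write("hello\n");
}  // out's destructor runs here, on normal return or on an exception
```

The caller never writes a `close()` call; the same shape works for sockets, locks, or child processes by swapping what the destructor releases.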

In GC'd languages, the GC really only handles heap memory. Yes, you can
sometimes rely on finalizers to free other kinds of resources, but generally
you will find this doesn't work out well: most GC algorithms do not make any
guarantees about how soon an unreachable object will be cleaned up, and they
tend to use heuristics that assume that the cost of not cleaning up an object
is exactly whatever RAM it is using, nothing more. Often, if your process is
not allocating any new objects, the GC will never run at all.

Thus in GC'd languages you often see classes with methods like `.close()` or
`.dispose()` for eagerly releasing non-local resources. These resources have
to be managed manually, just like heap has to be managed in non-GC languages.
The beauty of RAII is that you can use it to manage any kind of resource, but
GC languages don't typically support RAII because it does not mix well with GC
semantics.

These differences completely change how APIs in these languages are designed.
Essentially, adding GC to a language -- or removing it -- gives you a
_different_ language that happens to share some syntax, and probably not a
very good language since all that syntax was designed around a fundamental
assumption that no longer holds.

I kind of wonder if a preference for GC languages is in part responsible for
the way distributed systems are typically designed these days -- namely, the
emphasis on stateless servers, which avoids the need to do remote resource
management (which is hard without RAII), at the expense of requiring a lot
more machinery to make performant (e.g. complicated caching layers because all
state changes have to hit storage which is sllloowwwwwww).

~~~
amelius
> In particular, C++ (and Rust, etc.) stresses RAII (Resource Acquisition Is
> Initialization)

But there is no reason that the RAII pattern couldn't be completely orthogonal
to memory management.

~~~
ExpiredLink
Could. A hypothetical new language _could_ be created to facilitate whole-
program escape analysis and thus completely avoid both GC and manual memory
(resource) management.

~~~
Others
This sounds awfully like the ownership concepts used in Rust...

------
eatonphil
Why would this outperform malloc for multi-threaded programs? Is that a
property of GCd programs in general?

Edit: That is, is it typical for GCd programs to outperform manually
memory-managed programs in multi-threaded environments?

~~~
zik
Fans of GC claim that GC outperforms malloc() based on some studies that were
done a long time ago. The argument is that repeated malloc/frees are less
efficient than a bunch of GC allocs and then a single garbage collection.

The reality seems to be a bit different. In practice GC programs tend to use
significantly more memory, which can hurt performance, and the trend towards
low GC pause times costs additional CPU. Beyond that, we're now using much
larger multi-gigabyte GC memory pools, which can also lead to poor GC
performance.

So overall people these days see lower performance with GC systems compared to
malloc.

~~~
pcwalton
Many of the claims that GC outperforms manual memory management compare
running dlmalloc to allocate _all data_ with a well-tuned generational garbage
collector on something like the Da Capo benchmark suite. Naturally, they find
that the ability to get bump allocation in the nursery of a good generational
GC ends up outperforming a malloc implementation that was never really
designed for that kind of workload.

But the real question, in my mind, is whether a well-tuned systems-level
program that uses stack, arena, and heap allocation with a good allocator like
jemalloc ends up being better with a garbage collector. And it's really hard
for me to see how that could possibly be the case. Performance-conscious
systems programmers will use the stack as their nursery, gaining all the
benefits of the nursery without the copying, tracing, or delayed reclamation.
Modern mallocs like jemalloc or tcmalloc are incredibly good at minimizing
fragmentation and satisfying requests quickly, using thread-local caches to
avoid synchronization. Most of the time, the tenured generation needs the same
bookkeeping that a modern malloc does, so you're not really gaining anything
by using GC for that generation. And a GC always has some kind of mark or
tracing phase (not to mention at least a write barrier if you want your pause
times to be reasonable), which is pure overhead over manual memory management.
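
For readers unfamiliar with the arena / bump allocation mentioned above, here is a rough sketch of the idea. The `Arena` class is a hypothetical illustration, not the allocator of any particular GC, jemalloc, or tcmalloc; it assumes a fixed capacity and power-of-two alignments.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical fixed-capacity arena: allocating is a pointer bump, and
// everything is released at once when the arena is reset or destroyed.
class Arena {
public:
    explicit Arena(std::size_t capacity)
        : buffer_(new unsigned char[capacity]), capacity_(capacity), used_(0) {}

    // Returns `size` bytes aligned to `align` (assumed to be a power of two),
    // or nullptr if the request does not fit in the remaining space.
    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
        std::uintptr_t base = reinterpret_cast<std::uintptr_t>(buffer_.get());
        std::uintptr_t current = base + used_;
        std::uintptr_t aligned =
            (current + align - 1) & ~(std::uintptr_t(align) - 1);
        std::size_t offset = static_cast<std::size_t>(aligned - base);
        if (offset > capacity_ || size > capacity_ - offset) return nullptr;
        used_ = offset + size;  // the "bump"
        return buffer_.get() + offset;
    }

    // Drops every allocation at once; no per-object bookkeeping is kept.
    void reset() { used_ = 0; }

    std::size_t used() const { return used_; }

private:
    std::unique_ptr<unsigned char[]> buffer_;
    std::size_t capacity_;
    std::size_t used_;
};
```

Giving each thread its own arena would avoid synchronization on the fast path. Objects with non-trivial destructors would need extra handling that this sketch leaves out.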

~~~
pjmlp
Yes, but in a language like Modula-3 you can have your "well-tuned systems-
level program that uses stack, arena, and heap allocation with a good
allocator", and still have a GC at your disposal.

Sadly HP/Compaq killed the Olivetti/Digital unit and Modula-3 died, but its
ideas can still be applied in modern languages, assuming similar capabilities.

------
abalone
Would it be conceivable to implement Automatic Reference Counting for C/C++?
Basically gets you the advantages of GC without the performance impact of GC
cycles, because it does all its magic at compile time.

On a related note I'm excited about Apple open sourcing and porting Swift to
Linux with an eye towards server-side code, specifically because of ARC.
Garbage collection has been a concern for web services in particular when it
comes to 99th percentile latency. Things run fine and then boom, GC runs,
slowing down a few unlucky requests. ARC just makes memory-management
performance deterministic, no tuning needed.
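
As a point of comparison, standard C++ already offers library-level reference counting through `std::shared_ptr`. It is not ARC (the compiler does not insert the retain/release calls for you, and the counting itself happens at run time), but it shows the same deterministic release point. The `Session` type and `run_request` function below are hypothetical names for illustration.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical request-scoped object that records when it is created and
// destroyed, so the release point is visible.
struct Session {
    std::vector<std::string>& log;
    explicit Session(std::vector<std::string>& l) : log(l) {
        log.push_back("open");
    }
    ~Session() { log.push_back("close"); }
};

std::vector<std::string> run_request() {
    std::vector<std::string> log;
    {
        std::shared_ptr<Session> a = std::make_shared<Session>(log);
        std::shared_ptr<Session> b = a;  // reference count is now 2
        log.push_back("count=" + std::to_string(a.use_count()));
        a.reset();                       // count drops to 1, object stays alive
        log.push_back("after a.reset");
    }  // b leaves scope: count reaches 0 and the destructor runs right here
    log.push_back("end of request");
    return log;
}
```

The destructor runs at a known point in the request rather than at some later collection. Cycles between objects still have to be broken by hand, for example with `std::weak_ptr`.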

~~~
pjmlp
C++/CX does ARC for C++ for those of us on Windows, but it comes with the cost
of making C++ classes into COM ones.

------
BinaryIdiot
I like that it can be used as a leak detector! I'm not sure I understand the
point beyond that but then again I am in love with RAII semantics so I'm
probably a bit biased here.

~~~
vidarh
Consider it a safeguard. On one hand it can detect (some) leaks. On the other
hand if/when your app leaks it can mitigate the consequences of those leaks.

For some very specific types of apps it can also be sufficient to ditch
traditional C/C++ memory management, and you can actually even gain speed
doing so.

Consider e.g. code that runs a job. Most of the time you know it will need
less than 32MB (or whatever) total over the (short) lifetime of the process.

So you tune the GC accordingly to not trigger any collection at all until
above that level. Now most of the time, a collection will never get triggered,
and you're happy - all the memory is just returned to the system at the end in
one swoop.

But if occasionally a job requires lots more, the GC may provide sufficient
cleanup to still make it viable not to include any memory management.

My first large CGI-era webapp was in C++ and relied on exactly this kind of
hack, such as deferring all memory de-allocation until exit to improve
performance (and it worked very well - it had a considerable performance impact;
other hacks included static linking to drastically cut the loading cost). We didn't use a
GC; instead we had to take extra care when we knew we were going to make large
allocations to do manual de-allocation in those cases if the request was
potentially long-running. I'd have loved to use the Boehm GC for that.
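
A rough sketch of that kind of policy, for a short-lived process: small blocks are deliberately never freed, and only large ones are handed back early. `ExitTimeHeap` and its threshold are hypothetical names and numbers chosen for illustration; this is not the code from that webapp and has nothing to do with the Boehm GC API.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical policy for a short-lived process: small blocks are never freed
// (the OS reclaims them at exit); only blocks at or above a size threshold
// are returned to the allocator early.
class ExitTimeHeap {
public:
    explicit ExitTimeHeap(std::size_t large_threshold)
        : large_threshold_(large_threshold) {}

    void* allocate(std::size_t size) {
        void* p = std::malloc(size);
        if (p) live_bytes_ += size;
        return p;
    }

    // The caller passes the size it allocated with. Returns true if the block
    // was actually freed, false if it was left for process exit.
    bool release(void* p, std::size_t size) {
        if (!p || size < large_threshold_) return false;  // leaked on purpose
        std::free(p);
        live_bytes_ -= size;
        return true;
    }

    std::size_t live_bytes() const { return live_bytes_; }

private:
    std::size_t large_threshold_;
    std::size_t live_bytes_ = 0;
};
```

This only makes sense when the process is known to exit soon; in a long-running server the same code is simply a leak.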

~~~
pjmlp
Walter Bright uses this trick in his D compiler. Memory is never freed by the
compiler itself, only reclaimed by the OS at exit.

[http://www.drdobbs.com/cpp/increasing-compiler-speed-by-over...](http://www.drdobbs.com/cpp/increasing-compiler-speed-by-over-75/240158941?queryText=Walter%2BBright)

------
lisper
A Lisp that uses the Boehm GC:

[https://github.com/rongarret/Ciel](https://github.com/rongarret/Ciel)

~~~
BruceM
CLASP
([https://github.com/drmeister/clasp](https://github.com/drmeister/clasp)), a
Common Lisp, can optionally use one of Boehm or the Memory Pool System
([http://www.ravenbrook.com/project/mps](http://www.ravenbrook.com/project/mps)).

Open Dylan ([http://opendylan.org/](http://opendylan.org/)), a Lisp-like
language without Lisp-like syntax, also can use Boehm or the Memory Pool
System.

------
denim_chicken
I feel that this library is a testament to the power of C and C++ (and their
ecosystem).

