
Garbage collection thoughts - networked
http://sebastiansylvan.com/2012/12/01/garbage-collection-thoughts/
======
aidenn0
With GCs that use some variant of Cheney's algorithm for the nursery, short-
lived heap objects are far less painful than you might think, as a nursery
collection takes time proportional to the amount of live data in the nursery.
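The "cost proportional to live data" point can be seen in a toy semispace-copy sketch. This is not any particular runtime's code: the `Obj` layout is made up, and indices stand in for raw pointers, but the key property holds: dead objects are never even touched.

```rust
// Toy semispace evacuation: only objects reachable from the roots are copied,
// so collection cost scales with live data, not with nursery size.
#[derive(Clone)]
struct Obj {
    children: Vec<usize>, // indices into the same space (stand-ins for pointers)
}

// Copy one object into to-space, installing a forwarding entry so that
// objects reached via multiple paths are copied only once.
fn copy(from: &[Obj], to: &mut Vec<Obj>, forward: &mut Vec<Option<usize>>, idx: usize) -> usize {
    if let Some(f) = forward[idx] {
        return f; // already evacuated: follow the forwarding pointer
    }
    let new_idx = to.len();
    to.push(from[idx].clone());
    forward[idx] = Some(new_idx);
    new_idx
}

// Evacuate everything reachable from `roots`. Cheney's trick: the to-space
// itself doubles as the work queue, scanned left to right.
fn evacuate(from: &[Obj], roots: &[usize]) -> (Vec<Obj>, Vec<usize>) {
    let mut to: Vec<Obj> = Vec::new();
    let mut forward: Vec<Option<usize>> = vec![None; from.len()];

    let new_roots: Vec<usize> = roots
        .iter()
        .map(|&r| copy(from, &mut to, &mut forward, r))
        .collect();

    let mut scan = 0;
    while scan < to.len() {
        let kids = to[scan].children.clone();
        let new_kids: Vec<usize> = kids
            .into_iter()
            .map(|c| copy(from, &mut to, &mut forward, c))
            .collect();
        to[scan].children = new_kids;
        scan += 1;
    }
    (to, new_roots)
}

fn main() {
    // Nursery with 5 objects; only 0 and 2 are live from root 0.
    let nursery = vec![
        Obj { children: vec![2] }, // 0: live (root)
        Obj { children: vec![3] }, // 1: dead
        Obj { children: vec![] },  // 2: live
        Obj { children: vec![] },  // 3: dead
        Obj { children: vec![] },  // 4: dead
    ];
    let (to, roots) = evacuate(&nursery, &[0]);
    assert_eq!(to.len(), 2); // only the live pair was copied
    assert_eq!(roots, vec![0]);
}
```

A mostly-dead nursery is collected almost for free, which is exactly why allocation-heavy functional styles are cheaper under generational copying collectors than a naive cost model suggests.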

------
rayiner
Regarding your third-to-last paragraph: this paper is worth looking at
(probably the most recent treatment of deferred reference counting):
[http://www.cs.utexas.edu/users/mckinley/papers/rcix-oopsla-2013.pdf](http://www.cs.utexas.edu/users/mckinley/papers/rcix-oopsla-2013.pdf).

------
theseoafs
> A strongly typed high level language should be immutable by default, or at
> the very least track mutability, which means we can statically know that
> large object graphs are acyclic (as immutable values must be, in a strict
> language) – no need to trace these in any cycle collector.

How would this work? Unless all values are immutable, this isn't generally
true. If all values _are_ immutable then you've got a pure functional language
and a lot of these ideas don't apply any more.

~~~
Gankro
If you disallow mutation of values in a collected context, then it Just Works.
The only way to complete a cycle is by mutating a managed value or "tying the
knot" via laziness (which is of course just mutation in a hat).

Rust almost has this. Only shared (immutable) references can be obtained to
values owned by a reference-counted pointer (which is nothing fancy -- just
not implementing mutably dereferencing RCs). Since mutability is inherited,
this freezes all reference counted data and everything it can reach.

However Rust also provides types which specifically _can_ be mutated in a
shared context (a Mutex being the most obvious case), so you can indeed make
reference-counted cycles and leak.

Still, by default you can't leak. You have to opt into these "interior
mutability" types.

Basically, "shared XOR mutable" does not necessitate forbidding mutability
universally. It just requires a way to freeze data when it _is_ shared. You
can even unfreeze it whenever you can prove it isn't actually shared (e.g.
reference count = 1).
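A minimal Rust sketch of the above (the `Frozen`/`Leaky` type names are made up for illustration): data behind a plain `Rc` is frozen, so cycles are unrepresentable, while opting into interior mutability re-enables them.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Reachable only through Rc (shared, hence immutable): a node can only point
// at values that already existed when it was built, so cycles can't form.
struct Frozen {
    _next: Option<Rc<Frozen>>,
}

// Opting into interior mutability (RefCell here; a Mutex in threaded code)
// allows mutation behind a shared reference -- and with it, cycles.
struct Leaky {
    next: RefCell<Option<Rc<Leaky>>>,
}

fn main() {
    // Acyclic by construction: each Rc<Frozen> can only point "down".
    let tail = Rc::new(Frozen { _next: None });
    let _head = Rc::new(Frozen { _next: Some(tail) });

    // With RefCell we can tie the knot after construction.
    let a = Rc::new(Leaky { next: RefCell::new(None) });
    let b = Rc::new(Leaky { next: RefCell::new(Some(a.clone())) });
    *a.next.borrow_mut() = Some(b.clone()); // a -> b -> a: a leaked cycle

    assert_eq!(Rc::strong_count(&a), 2); // held by `a` and by `b.next`
    assert_eq!(Rc::strong_count(&b), 2); // held by `b` and by `a.next`
    // Dropping `a` and `b` now only decrements the counts to 1 each:
    // the cycle keeps both allocations alive forever.
}
```

The usual escape hatch is `std::rc::Weak` for the back-edge, which breaks the ownership cycle without forbidding the mutation.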

~~~
pron
But that means you've just disallowed a whole class of very efficient data
structures (in the collected context). Those are the very same data structures
(trees, graphs) that are useful for objects with arbitrary lifetimes, i.e.
exactly the objects that would most benefit from a GC.

------
e28eta
As someone who primarily works on iOS, I would have liked to see Apple's ARC
mentioned. With the addition of weak pointers, I think it gives a good blend
of performance (no GC) and low mental overhead. I think marking pointers
strong or weak actually adds semantic meaning to the code.

But maybe that's just rationalization.

~~~
mikeash
ARC is one of the oldest (if not the oldest) kinds of GC out there. It's even
the subject of one of the famous AI Koans:

[http://www.catb.org/jargon/html/koans.html](http://www.catb.org/jargon/html/koans.html)
(see "Moon instructs a student")

It stopped being considered proper garbage collection long ago because of the
inability to collect circular references without help, which of course is the
subject of that story. Apple is doing decently well with their version of it,
but they're essentially reinventing ancient history.

Other GCs often have weak pointers too. You just use them less, because with a
GC that can deal with circular references you don't need them nearly as often.

~~~
hamstergene
You're mixing up Apple's ARC with reference counting in general.

A typical RC implementation suffers from either requiring manual
AddRef/Release calls (what Apple had in the pre-ARC era), or blindly doing too
many automatic calls. That's ancient history and cannot compete with a
moderately decent modern GC.

ARC has neither of those disadvantages and so provides a convincing
alternative to GC, and AFAIK it is the first to do so.

BTW, the correct term to use here is "automatic memory management" (AMM), not
garbage collection, and there is no such thing as "proper" AMM. Handling
circular references is not a must-have property of AMM; it is just the price
RC pays. In GC, the price is pauses and a higher memory footprint. Other
approaches exist too: for example, disallowing at compile time the creation of
structures that could be circular.

~~~
theseoafs
> ARC has neither of those disadvantages and so provides a convincing
> alternative to GC, and AFAIK it is the first to do so.

I love what Apple has done with ARC: they've taken a very inefficient approach
to garbage collection, one that has repeatedly been shown to be the "worst
case" algorithm with the worst performance characteristics, and then proceeded
to convince legions of developers that that same approach is actually the best
one.

~~~
pjmlp
Not only that, they managed to sell the idea that they gave up on GC not
because their engineers couldn't make it work with C semantics without
crashing left and right, but because ARC is way better than GC.

------
kazinator
> _the language should allow you to express simple memory ownership when you
> have it_

The problem is that you only _think_ you have such a thing. If you're wrong,
you're screwed. :)

> _The third and maybe most important optimization is to make garbage
> collection a per-thread operation. You simply don’t allow garbage collected
> heap allocations to transfer between threads._

This takes away much of the point of threads, leaving you better off with
processes---and process boundaries will better enforce this model. You cannot
cons up a message, pop it in a queue, and have another thread pick that object
out the other end: thus you have to use some IPC based on bytes in a pipe or
shared memory buffer.

In other words, you might as well fork a bunch of children that have not just
their own heaps but their own address spaces to enforce the containment of
those heaps. There is a lot of good in it: fault isolation, and more virtual
memory available to each little worker. There is a degree of attractiveness in
multiple processes with just one thread in each, and in single-threaded
garbage collection (which doesn't have to deal with hairy concurrency
problems, and so is faster and more robust).

> _To figure out which region a pointer refers to in this scheme, we simply
> loop through the different region sizes that our allocator allows, mask the
> appropriate number of bits from the pointer to get the start of the proposed
> region, read the region index in the first DWORD of this memory area (this
> may be garbage if we have the wrong size) use this to index into the region
> pointer array (taking care to avoid buffer overruns from garbage indices)
> and ensure that the pointer stored in that element refers right back to the
> proposed region. We also must ensure that the size of the region at this
> address actually contains the original pointer._

The problem of mapping a pointer to a region is already solved very well by
existing data structures. For instance, the Linux kernel uses a red-black tree
to map a virtual memory pointer to a "vm_area_struct" (these are the regions
allocated by mmap). The regions don't overlap, so a binary search tree ordered
by their base addresses works fine.
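The ordered-search idea can be sketched with Rust's `BTreeMap` (a B-tree rather than a red-black tree, but the same logarithmic lookup; the region layout below is made up for illustration): find the greatest base address not exceeding the pointer, then bounds-check.

```rust
use std::collections::BTreeMap;

// Non-overlapping regions keyed by base address, as in a kernel VMA lookup.
// Finding the region containing `addr` is one ordered-map query plus a
// bounds check -- no looping over candidate region sizes required.
struct Regions {
    by_base: BTreeMap<usize, usize>, // base address -> region length
}

impl Regions {
    fn find(&self, addr: usize) -> Option<(usize, usize)> {
        // range(..=addr) visits keys <= addr; next_back() is the greatest one.
        let (&base, &len) = self.by_base.range(..=addr).next_back()?;
        (addr < base + len).then_some((base, len))
    }
}

fn main() {
    let mut by_base = BTreeMap::new();
    by_base.insert(0x1000, 0x1000); // [0x1000, 0x2000)
    by_base.insert(0x4000, 0x0800); // [0x4000, 0x4800)
    let regions = Regions { by_base };

    assert_eq!(regions.find(0x1800), Some((0x1000, 0x1000)));
    assert_eq!(regions.find(0x3000), None); // falls in the gap
    assert_eq!(regions.find(0x4800), None); // one past the end
}
```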

Solving the problem of "does this pointer refer to a heap, and which one" in a
garbage collector is only necessary when we are doing conservative garbage
collection: we are given some pointer-sized value from somewhere (like a scan
of some raw memory area such as a C stack), and we have no idea whether or not
the value is a pointer to an object in the GC world. An always-precisely-
tracing garbage collector never has to solve this problem. It starts with root
values, which are known to be objects in the GC world. If they are heap
values, they point to objects whose structure is known to the collector. They
are precisely traversed to reach other objects and so on.

~~~
rbehrends
> This takes away much of the point of threads, leaving you better off with
> processes---and process boundaries will better enforce this model. You
> cannot cons up a message, pop it in a queue, and have another thread pick
> that object out the other end: thus you have to use some IPC based on bytes
> in a pipe or shared memory buffer.

Not really true; a lot depends on the implementation here. In a multi-threaded
setup you can often avoid context switches and kernel calls (the really
expensive part), and deep copying with a bump allocator is generally cheaper
than serializing/deserializing [1]. Often, you can also avoid copying entirely
if you do it right.

On top of that, while you can't manipulate pointers on the heap safely under
such a concurrency model, you can manipulate integers and floats across heaps
(which is relevant for a large number of concurrent algorithms) if you do it
right (i.e. by ensuring that you don't deallocate the objects containing the
data).
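A minimal sketch of the in-process alternative, using Rust's `std::sync::mpsc` (the `Message` type is made up for illustration): sending an owned value moves it through the channel, so the receiving thread gets the same heap allocation with no serialization and no byte-pipe round trip.

```rust
use std::sync::mpsc;
use std::thread;

// An owned message: sending it transfers ownership of the Vec's heap buffer
// to the other thread as-is, rather than flattening it to bytes and back.
struct Message {
    id: u64,
    payload: Vec<f64>,
}

fn main() {
    let (tx, rx) = mpsc::channel::<Message>();

    let worker = thread::spawn(move || {
        let msg = rx.recv().expect("sender hung up");
        // Same allocation the sender built; only the owner changed.
        msg.payload.iter().sum::<f64>() + msg.id as f64
    });

    tx.send(Message { id: 1, payload: vec![1.0, 2.0, 3.0] }).unwrap();
    let total = worker.join().unwrap();
    assert_eq!(total, 7.0);
}
```

Contrast this with a pipe to a child process, where the same `Vec` would have to be encoded on one side and parsed on the other, in addition to the kernel crossing.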

[1] Keep also in mind that any actual shared data that goes from one thread to
another will generally go through main memory or the L3 cache at one point or
another.

------
pjmlp
Oberon derivatives and Modula-3 are two examples of systems programming
languages that use value types by default with GC enabled.

Sadly, the market chose to ignore them.

~~~
_pmf_
> Sadly, the market chose to ignore them.

Most of the market probably did not know about it, which is different from
ignoring something.

~~~
pjmlp
They were quite well known in European universities.

ETHZ also created a spin-off, Oberon microsystems, which sold BlackBox
Component Builder, based on Component Pascal (an evolution of Oberon-2) and
featuring a Delphi-style IDE.

------
pron
The author tosses the word "performance" around a lot even though it means
different things in different environments, especially when it comes to GC.

First, the main cost a GC -- any GC -- always has is significantly increased
RAM overhead. Whether RAM is cheap (and can be traded off for things like
throughput/latency or convenience) or not depends a lot on the environment. If
RAM is expensive -- a GC is not the answer. If it isn't, it may well be the
best answer. OTOH, GCs at least potentially have an advantage that's very hard
to achieve by other means, which is compaction. Everything else is a tradeoff.

Second, I think the author jumps the gun when discussing various solutions
(ownership, stack allocation, reference counting) without first trying to
understand the problem, and the problem is -- first and foremost -- RAM
overhead. Whether or not that's a serious problem really depends on the
environment.

Also, before discussing solutions, we need to understand the domain. I think
that the domain is as follows. Objects belong in one of four lifecycles:

1\. Permanent -- objects live for the duration of the program

2\. Stack -- objects live for the duration of a single function (and the
functions it calls, of course)

3\. Arbitrary -- objects live for an undetermined lifetime

4\. Transaction -- objects live for a well-defined, time-bounded domain scope,
be it a request in a web server, a frame in a game, or a transaction in a DB.

Now, in spite of what some may think, very little of a serious program's data
can live on the stack. Server machines today come with hundreds of GBs of RAM,
and if you do the math, only a very, very small portion of that can be
exploited by kernel-thread stacks. Lightweight threads require growable
stacks, and those stacks are therefore heap objects themselves, with either
arbitrary or transaction lifetimes. The cost of allocating stack-scoped
objects on the heap is quite low, as those are very short-lived objects that
don't become tenured by the GC, but allocating on the stack is still better
because we'd like to avoid triggering young-gen collections, too, if we can
help it (and, indeed, Java is getting value types, although _mostly_ for a
different reason[1]).

With so much available RAM, the best use an application can make of it is to
cache as large a working set from the DB in RAM as possible. Those objects
have arbitrary lifetimes and often require scalable concurrent access (or
sharding, but that opens a whole new can of worms). Reference counting is not
a good solution here.

Finally, transaction-scoped objects are best handled by arenas. GCs like
HotSpot's G1 can attempt to automatically detect those transaction-scoped
objects, but the results are not as good. This is a tradeoff: a global GC's
simplicity vs. arenas' utility and low latency.
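A toy sketch of the arena idea (hypothetical API; it hands out indices rather than references to keep the sketch minimal, unlike a production arena): allocate freely during the transaction, then free the whole batch at once by resetting.

```rust
// A toy bump arena for transaction-scoped data: per-object allocation is an
// O(1) bump, and the whole transaction's garbage is released in one reset --
// no tracing, no per-object refcounts.
struct Arena<T> {
    slots: Vec<T>,
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { slots: Vec::new() }
    }

    fn alloc(&mut self, value: T) -> usize {
        self.slots.push(value); // bump allocation; no individual free
        self.slots.len() - 1
    }

    fn reset(&mut self) {
        self.slots.clear(); // end of transaction: drop everything at once
    }
}

fn main() {
    let mut arena: Arena<String> = Arena::new();
    for frame in 0..3 {
        // Everything allocated here lives exactly as long as the frame.
        let a = arena.alloc(format!("scratch for frame {frame}"));
        let b = arena.alloc(String::from("more scratch"));
        assert_eq!(b - a, 1);
        arena.reset();
        assert!(arena.slots.is_empty());
    }
}
```

This is the same discipline RTSJ's scoped memory (mentioned below in this comment) enforces: objects allocated in a scope die with the scope.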

Personally, I think the solution first depends on whether or not you're
targeting high performance in constrained environments (like C/C++ and Rust
do). If so, a solution like ref-counting is a great one. If not, I see no room
whatsoever for ref-counting (what scope does it target, and what advantage
does it have over a GC in an unconstrained environment?).

In _unconstrained_ environments I think the best approach is as follows:

1\. Stack allocation (value objects) where possible.

2\. A global GC for the arbitrary and permanent scope (no need to separate
permanent objects as they place a relatively low burden on a GC).

3\. Arenas for transaction scopes, _if_ you're willing to sacrifice some
simplicity for latency.

I don't see an absolute necessity for ownership types (and certainly not
reference counting) in such _unconstrained environments_ , but they may
certainly be useful in statically verifying that heap objects never reference
an arena object (or that arenas with wider scopes never reference shorter-
lived arenas).

In _constrained environments_ where manual memory management is required
because the GC's RAM overhead is onerous, ownership types are invaluable, as
is reference counting. You don't get compaction, but this may not matter as
such applications are usually much more careful with managing their
allocations and deallocations.

RTSJ -- Java for hard-realtime and safety-critical systems (which is getting a
new spec these days) -- makes the simplicity tradeoff and introduces arenas
(scoped memory):
[https://www.aicas.com/cms/en/rtsj](https://www.aicas.com/cms/en/rtsj) It also
makes permanent objects explicit, because it can't tolerate any additional
cost, even if that cost is mostly limited to program startup.

Finally, the author mentions both immutability and large objects, but the two
are often at odds. Persistent data structures, which are popular in immutable-
by-default languages like Clojure, rely on small objects with lots of pointers
to _reduce_ memory and performance costs.
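The small-objects-with-pointers point can be sketched with a persistent cons list (a toy type, far simpler than Clojure's actual tries): "updating" allocates one small node and shares the entire old tail, which is exactly the pointer-heavy shape a GC handles well and an immutability-aware collector could skip tracing for cycles.

```rust
use std::rc::Rc;

// A persistent (immutable) singly linked list. Prepending never copies the
// old list; the new node just points at it, so versions share structure.
enum List {
    Nil,
    Cons(i32, Rc<List>),
}

fn push(head: i32, tail: &Rc<List>) -> Rc<List> {
    // One small allocation per "update"; the tail is shared, not copied.
    Rc::new(List::Cons(head, Rc::clone(tail)))
}

fn len(list: &Rc<List>) -> usize {
    match list.as_ref() {
        List::Nil => 0,
        List::Cons(_, tail) => 1 + len(tail),
    }
}

fn main() {
    let shared = push(2, &push(1, &Rc::new(List::Nil)));
    // Two "different" three-element lists sharing their two-element tail.
    let with_ten = push(10, &shared);
    let with_twenty = push(20, &shared);
    assert_eq!(len(&with_ten), 3);
    assert_eq!(len(&with_twenty), 3);
    assert_eq!(Rc::strong_count(&shared), 3); // `shared` plus both new heads
}
```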

[1]: Cache friendliness of arrays and other data structures.

------
tacos
The article is from 2012, which is why it seems a little naive.

Then again, my first interview question is "what's wrong with garbage
collection?" and if you can't give me three specific use cases where it's a
non-starter then that's usually the end of the loop.

~~~
_ph_
If you phrased the interview question like that, I would automatically fail
any interview with you. I have been working with GCed languages for 20 years
now, and certainly know the ins and outs of GC a bit. So, if your question
were which situations are problematic with GC and which mistakes were made in
some popular GC languages, we could have an interesting discussion. But asking
"what is wrong with GC?" suggests to me that you want to hear why GC is a bad
idea, and asking like that would quickly weed out people who are very
knowledgeable in writing well-performing programs in GC languages.

~~~
tacos
I have been interviewing candidates for 20+ years now, and that precise
language has been refined. Any super-strong opinion one way or the other is
itself a great bit of information, especially from a candidate who lists both
Java/C# and C/C++ on his resume.

If it makes you feel any better, in the early 90s the question was "what's
wrong with malloc?" And I'm sure you'd do just fine. Amazing how many people
don't, though. Better predictor of success than any of this "walk a tree"
nonsense and no whiteboard required.

~~~
_ph_
That's fair, but then you should have phrased your post differently. You are
not literally requiring three use cases as an answer to your question or
ending the interview; rather, you are using it as a starter for a discussion.

Indeed, it is better to have a more general discussion about the foundations
one is to work on than to solve random riddles.

