
Real-Time Garbage Collection Is Real - Mongoose
http://michaelrbernste.in/2013/06/03/real-time-garbage-collection-is-real.html
======
rayiner
Studying garbage collection is a wonderful education in algorithm engineering.
Despite decades of work, there is no "best" GC algorithm. Instead, there are
different points in the space of tradeoffs among
space/throughput/latency/completeness/etc. Moreover, the various algorithms
are linked by deep correspondences (e.g. Bacon's result that all collectors
lie on a spectrum between pure tracing and pure reference counting, and that
things like generational collection are hybrids.)

~~~
snprbob86
Saving somebody 10 seconds, here is Bacon's paper:

<http://www.cs.virginia.edu/~cs415/reading/bacon-garbage.pdf>

~~~
apu
Adding the initial http:// links it automatically:

<http://www.cs.virginia.edu/~cs415/reading/bacon-garbage.pdf>

~~~
snprbob86
Heh, fixed mine to save somebody yet one more second. Wrote this comment to
waste that second that I saved.

That said, I'm reading this paper now. It's absolutely fascinating and very
approachable. Well worth checking out.

------
mcartyem
There have been a lot of posts related to garbage collection lately but none
of them touched upon what I see as a crucial issue: why is garbage collection
needed to begin with?

Could you do without it? What is the key point that made it necessary?

I'm aware of it being introduced by McCarthy in the original Lisp paper in
1960 of course. But I suspect what McCarthy originally meant is not what
garbage collection turned out to be. What I suspect he meant was that there
needs to be a way for memory to be managed. malloc/free offer a way for memory
to be managed, and presumably they weren't invented until C came along nine
years later. What McCarthy might have meant is what became malloc/free in C, which
doesn't need garbage collection.

C isn't the only flag here. Was there any OS on the IBM 704 used to implement
the original Lisp? Did the OS support multiprocessing? Because if it didn't
(UNIX wasn't invented until 1969 either) it would make sense for memory to be
available for a single process. And it would mean that when people said
garbage collection, they were envisioning malloc/free.

(Also, databases and operating systems can live without whatever makes garbage
collection necessary, since they don't use it, and those are pretty complex
and fast pieces of software.)

So, what makes garbage collection different than malloc/free, and why is it
necessary? I'd love to learn more about that.

~~~
jerf
I think one of the keys to understanding garbage collection is to understand
that it is on a continuum of memory management techniques, and the line is a
great deal less bright and shining than people often realize. malloc/free is
"manual memory management", right?

Well, not really. Truly manual memory management is getting an enormous block
of memory from the OS, and fully manually choosing what goes where within that
block of memory. This is indeed a real technique used when the situation is
dire enough. If you're not doing that, you're deferring _something_ to your
automated solution, the only question is, how much?
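
For a concrete picture, here's roughly what "truly manual" looks like in C --
one block from the OS, and the programmer picks every offset by hand (the
sizes and layout here are made up purely for illustration):

    #include <stdlib.h>

    #define POOL_SIZE (1 << 20)          /* one big block, made-up size */
    static unsigned char *pool;

    int main(void) {
        pool = malloc(POOL_SIZE);        /* the only allocation the OS ever sees */

        /* The layout is chosen by hand, not by an allocator: */
        int    *counters = (int    *)(pool + 0);            /* first 4 KB      */
        double *samples  = (double *)(pool + 4096);         /* next 64 KB      */
        char   *scratch  = (char   *)(pool + 4096 + 65536); /* everything else */

        counters[0] = 1;                 /* used like any other memory */
        samples[0]  = 3.14;
        scratch[0]  = 'x';

        free(pool);                      /* one free at shutdown, nothing in between */
        return 0;
    }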

malloc/free is a combination that still gives you lots of power, but it's not
perfect when you start pushing the abstraction hard enough. All malloc/free
combinations have some sort of pathological case, where whatever allocation
strategy they are choosing is wrong for some reason. That's why there isn't
just one implementation, there's many, and some applications can see big wins
switching, while others may see losses.

Garbage collection isn't really some sort of binary dichotomous decision vs.
malloc/free; both are forms of automated memory management. Garbage collection
techniques just step it up, and try to infer when you are done with memory.
The disadvantage is, they may not be as smart as a human, the advantage is
that they're a heck of a lot more consistent and pay much better attention.
Then, even within "garbage collection", you've got a gradient; you may see a
language like Rust with mostly deterministic memory management, but with an
easy call-out to GC if you need it. You may see a language like Perl, with
relatively consistent finalization as soon as an unshared reference leaves a
scope, or you may see something that only collects during sweeps. At the far
end you get imprecise garbage collectors, such as those used in Go right now
(though they are working on it, and have already made a lot of progress), so
even within the realm of GC there's a range of precision vs. speed vs. nuances
the programmer needs to know to use them.

GC is necessary because one particular point on this large spectrum isn't the
right choice for every program. malloc/free isn't even the maximally manual
solution; there are middle grounds between malloc/free and fully manual
memory management, such as arena allocation. There's a lot of fine gradation
on this scale.
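
For a feel of that middle ground, here's a minimal arena allocator sketch in C
(names are made up, and a real one would also handle alignment and growth):

    #include <stdlib.h>
    #include <stddef.h>

    /* A minimal arena: allocate by bumping a pointer, free everything at once. */
    typedef struct {
        unsigned char *base;
        size_t         used;
        size_t         cap;
    } arena_t;

    arena_t arena_new(size_t cap) {
        arena_t a = { malloc(cap), 0, cap };
        return a;
    }

    void *arena_alloc(arena_t *a, size_t n) {
        if (a->used + n > a->cap) return NULL;  /* real arenas grow or chain blocks */
        void *p = a->base + a->used;
        a->used += n;
        return p;
    }

    /* No per-object free: the whole arena is released in one shot. */
    void arena_free(arena_t *a) {
        free(a->base);
        a->base = NULL;
        a->used = a->cap = 0;
    }

You free a whole phase's worth of objects at once, which is more automatic
than free() per object but far more manual than a tracing collector.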

"(Also, databases and operating systems can live without whatever makes
garbage collection necessary, since they don't use it, and those are pretty
complex and fast pieces of software.)"

A bold statement. Have you ever heard the term "compaction" used within the
context of a database? That's a form of garbage collection. Contrary to what
you said, _almost all_ databases have some form of garbage collection.
(Probably all the ones you're thinking about.)

As for whether operating systems have garbage collection, it depends on your
precise definition of "operating system". Kernels may lack it, but as critical
as they are, and as much sheer staggering code as they may have for drivers and
stuff, conceptually they actually aren't that complicated compared to a lot
of userland things. Broaden your definition and you'll find some form of
garbage collection appear again. And if you include "reference counting"
approaches as a form of garbage collection, the Linux kernel uses that all
over the place. Is reference counting a form of garbage collection? Well, it
can actually be a tough call... because it's all on a continuum.
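
To make that concrete: the whole reference counting idea fits in a few lines
of C (a toy sketch, not the kernel's actual kref API):

    #include <stdlib.h>

    typedef struct obj {
        int refs;                        /* how many owners this object has */
        /* ... payload ... */
    } obj_t;

    obj_t *obj_new(void) {
        obj_t *o = calloc(1, sizeof *o);
        o->refs = 1;                     /* the creator is the first owner */
        return o;
    }

    void obj_retain(obj_t *o)  { o->refs++; }

    void obj_release(obj_t *o) {
        if (--o->refs == 0)              /* last owner gone: reclaim the "garbage" now */
            free(o);
    }

Nobody writes free() for a particular object; the object is reclaimed whenever
the last reference goes away. That's already a step toward letting the system
decide when you're done with memory, which is why the line is blurry.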

~~~
mcartyem
(thank you for the willingness to write such a detailed response)

There exists a point on the memory management continuum where management
starts being more automatic (gc) than manual (malloc/free). I would like to
understand the forces surrounding this specific point, right before the scale
tips towards automatic.

If you tried to build a dynamic language without automatic management, what
would break and why?

~~~
jerf
The biggest problem is scope control; as you start having closures that get
passed around freely, those closures drag values along with them that you
can't collect. It is not _impossible_ to write this with malloc/free, but I've
played that game and it's not very fun. And remember, what seems easy in one
little blog post isn't easy in a real program where you've got dozens of the
little buggers flying every which way. (And by dozens, I mean dozens of
distinct types of closures from different sources, not just dozens of
instances of the same code.)

Many of the dynamic languages fling closures around with wild abandon, often
without you even realizing it. (One object has a reference to another object
which is from the standard library which happens to have a closure to
something else which ends up with a reference to your original object... oops,
circular reference loop. Surprisingly easy.)
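
Here's that trap in miniature, with the same toy refcounting idea in C: two
objects end up holding each other, we drop our own handles, and neither count
ever reaches zero (names made up):

    #include <stdlib.h>

    typedef struct node {
        int          refs;
        struct node *other;              /* reference to some other node */
    } node_t;

    node_t *node_new(void) {
        node_t *n = calloc(1, sizeof *n);
        n->refs = 1;
        return n;
    }

    void node_release(node_t *n) {
        if (--n->refs == 0) free(n);
    }

    int main(void) {
        node_t *a = node_new();
        node_t *b = node_new();
        a->other = b; b->refs++;         /* a holds b */
        b->other = a; a->refs++;         /* b holds a: the cycle is closed */

        node_release(a);                 /* we drop our handles... */
        node_release(b);
        /* ...but both counts are still 1, so neither node is ever freed.
           A tracing collector would reclaim the pair; pure refcounting won't. */
        return 0;
    }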

There isn't much technically impossible with malloc/free (though IIRC there
are indeed some cases that are both sensible and actually can't be correctly
expressed in that framework, but the example escapes me), but there's lots of
practical code where the cost/benefit ratio goes absurdly out of whack if
you're trying to write the manual freeing properly. It's hard to write an
example here, because it arises from interactions in a large code base
exceeding your ability to understand them. It's like when people try to
demonstrate how dangerous old-style pthreading is; even though the problem is
exponentially bad in nature, anything that fits in a blog post is still
comprehensible. The explosion doesn't happen until you get to real code.

~~~
mcartyem
I can see how closures would take some work. Thanks.

There's evidence garbage collection was not the desired solution but a plan B.
McCarthy writes that the reason reference counting was not implemented was a
hardware limitation [1]:

"Since there were only six bits left in a word, and these were in separated
parts of the word, reference counts seemed infeasible without a drastic change
in the way list structures were represented. (A list handling scheme using
reference counts was later used by Collins (1960) on a 48 bit CDC computer).

The second alternative is garbage collection..."

[1] - <http://www-formal.stanford.edu/jmc/history/lisp/node3.html#SECTION00030000000000000000>

------
a-priori
While the "Metronome" has very predictable behaviour that makes it probably
the best GC collector for real-time purposes, it still has a maximum GC load
before it gets backed up. If it gets backed up too far... forget about timing
guarantees because the system will fail. The "Metronome" collector can
guarantee a known and tunable GC capacity over time (in terms of objects
collected/sec), which is good. But the flip side is that you need to be able
to guarantee that your application will never exceed that capacity, at least
not for any sustained period of time.

In order to provide hard real-time guarantees in a garbage collected system,
you need to know that there is no situation in which the system produces
garbage faster than the collector can collect it. With manual deallocation, you
can prove that with static analysis. With garbage collection you have to
demonstrate it empirically using dynamic analysis. That requires exhaustive
testing to make sure you've covered the worst-case scenario.

~~~
dllthomas
I don't see why you could not, in principle, prove it statically.

~~~
a-priori
Maybe there are tools that will do that. My knowledge of the industry is about
5 years out of date, and I was no expert even then. But we had no such
solution and, in fact, didn't use dynamic memory allocation at all, never mind
newfangled gizmos like garbage collection.

In principle, yes I think it's possible... at least, I don't think it reduces
to the halting problem. But it would be tricky. It would be relatively simple
to reason statically about the rate of memory _allocation_ (iterate through
all paths leading to a 'new' operator), but for this purpose you care about
cases where an object becomes garbage and can be _deallocated_. That occurs
when the _last_ reference to the object is overwritten or goes out of scope,
which is not so easy to determine in the presence of aliasing.
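
A tiny made-up C example of why aliasing gets in the way:

    #include <stdlib.h>

    static int *stashed;                 /* a long-lived alias, hypothetical */

    void maybe_stash(int *p, int cond) {
        if (cond)
            stashed = p;                 /* p escapes only on some runtime paths */
    }

    void demo(int cond) {
        int *q = malloc(sizeof *q);
        maybe_stash(q, cond);
        /* Is q the last reference when it goes out of scope here? That depends
           on 'cond' at run time, so a static analysis can't always decide
           whether the object has become garbage at this point. */
    }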

~~~
Dylan16807
>It would be relatively simple to reason statically about the rate of memory
allocation (iterate through all paths leading to a 'new' operator), but for
this purpose you care about cases where an object becomes garbage and can be
deallocated.

No I don't. All I care about is having enough free memory to make new
allocations. I don't care how _much_ garbage there is, I just care that when
there's garbage it's being freed fast enough to support my allocations.

~~~
moomin
Just to state the obvious: in practice, over the long term, you've got to be
deallocating faster than you're allocating, or something nasty will happen
that will at the very least violate your performance constraints.

~~~
Dylan16807
Right. And my point is that the deallocation vs. allocation ratio is the only
metric you really need. How fast garbage is made is completely irrelevant
because over the long term it's bounded by the allocation rate. You don't have
to solve the hard problem of figuring out how fast garbage can be made, you
can solve the much easier problem of bounding allocation. And of course in
either scenario you have to show that there are no leaks.
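
To put made-up numbers on it: if the collector is guaranteed to reclaim at
least 20 MB/s and you can show the application never allocates more than 15
MB/s (and doesn't leak), the heap stays bounded no matter how the garbage is
actually produced. The only thing left to size is the headroom needed to cover
the lag between an object becoming garbage and the collector getting to it.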

------
pron
There are actually quite a lot of hard real-time, mission critical systems
(mostly defense) using RTSJ (real-time specification for Java) implementations
in production, but I don't know how many make use of realtime GC (RTSJ allows
for semi-manual memory management, much simpler than malloc/free but not as
easy as a full GC). Some RTSJ implementations have a realtime GC, like IBM's
WebSphere Real Time (<http://www-03.ibm.com/software/products/us/en/real-time/>)
-- that's the one using Metronome -- and Aicas Jamaica VM
(<http://www.aicas.com/>). Sun/Oracle also had an RTSJ JVM with a realtime GC
(said to be better than Metronome), but it seems to have been discontinued
recently.

------
tinco
The author starts the piece enthusiastically marvelling at the fact that Real
Time Garbage Collectors exist, but the article doesn't go very deep into how
this particular one does it.

I myself was a bit disappointed when I read the limitations, which reveal that
the simple laws still hold: you can't make these guarantees without knowing
exactly the upper limit on the amount of memory you are going to allocate.

In the event that you design a real time system that dynamically allocates and
deallocates objects, wouldn't it be almost or just as easy to implement manual
memory management (through pools or whatnot) as it would be to correctly
identify the maximum amount of allocated memory?

~~~
eru
I guess that depends on your tools. Perhaps you have an automatic tool to
estimate your memory usage? Of course, programmes amenable to this kind of
automatic analysis must be written in a particular style---because in general
you can't prove anything about arbitrary programmes---but that style might
still be easier to bear than managing your own memory.

------
the8472
Or you could go for pauseless garbage collection, then you only have to
concern yourself with the collector throughput keeping up with the
allocations.

<http://paperhub.s3.amazonaws.com/d14661878f7811e5ee9c43de88414e86.pdf>

------
FooBarWidget
Finally a good use of the word "real-time". _This_ is what real-time means,
not web apps that stream data over WebSockets.

~~~
scotth
Can't it mean both?

~~~
lttlrck
Yes, providing it's appropriately prefixed with Soft or Hard...

~~~
qznc
The web stuff is not even soft realtime. A better term for the web stuff would
be "live" or "continuously updated" or "clickless" or whatever.

------
JulianMorrison
I wonder if this could also be improved by "time stealing": if the mutator is
idling, it waives its slice; if the GC doesn't expect to collect much, it
waives its slice. The result would be more irregular but still able to give
guarantees.

~~~
a-priori
What jjs says is true: this 'time stealing' is non-deterministic, which makes
it a no-go for hard real-time systems.

For soft real-time systems, it would be a good idea. It would improve the
average-case performance and/or power consumption while still providing the
same worst-case guarantees.

~~~
wtracy
Even in a hard real-time system, allowing the garbage collector to "catch up"
whenever the mutator is idling sounds reasonable. It should make the system
slightly more robust in the face of irregular heap usage.

~~~
a-priori
I'd argue that in a real-time system, the GC should be tuned such that it
never needs to 'catch up' (i.e., in each round of collection, the collector
always collects all garbage produced since the last round). If it does, that
should be treated as a non-fatal error condition. But I digress.

Keep in mind that real-time systems are unique in that -- unlike most software
-- they have well-understood requirements and limits. There shouldn't be
anything 'irregular'. If there is, then you don't fully understand your system
and need to do some analysis.

But that said, it's not up to you or me to determine what is 'reasonable'.
That's up to the certification body, and they're notoriously conservative
about what they will certify (with good reason, I might add). If something
causes non-deterministic behaviour, and is not necessary for the function of
the system, they will almost certainly ask 'why is that there?' and you'd
better have a good answer.

Anecdote: I once had a similar thing happen. As a rookie, I had to implement
a search algorithm for one reason or another, and I decided to use a recursive
implementation of binary search. This routine was flagged during
certification. The problem with recursion is that, unlike an iterative
solution, a recursive algorithm grows in memory as well as time as the problem
size grows (we couldn't assume the compiler would be smart enough to use tail
recursion), and it's hard to prove the maximum stack usage statically. I know
because I tried, and I ended up replacing it with an iterative implementation
of binary search.
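
For the curious, the iterative form is the one certification likes: constant
stack no matter how large the array (a generic sketch, obviously not the
actual flight code):

    #include <stddef.h>

    /* Iterative binary search: O(log n) time, O(1) stack, -1 if absent. */
    long search(const int *a, size_t n, int key) {
        size_t lo = 0, hi = n;           /* search window is [lo, hi) */
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (a[mid] == key) return (long)mid;
            if (a[mid] < key)  lo = mid + 1;
            else               hi = mid;
        }
        return -1;
    }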

------
jwatte
I'm not sure I agree that work-based allocators are out. Specifically, if you
follow/mark N references for each M words allocated, you can guarantee not to
fall behind and still have a strict upper bound on collection cost/run-time
jitter. This adds a linear-in-size cost to memory allocation, which already
typically has an amortized linear cost (because you touch all the memory you
allocate), so it's analytically very well behaved.
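
Roughly, the allocation hook would look like this (a toy sketch, all names
made up):

    #include <stddef.h>

    enum { REFS_PER_WORD = 1 };          /* N/M: marking steps per word allocated */

    /* Assumed hooks, not a real API: */
    extern void *raw_alloc(size_t words);    /* hands out memory from the heap    */
    extern int   mark_one_reference(void);   /* traces one pending reference;
                                                returns 0 when nothing is pending */

    void *gc_alloc(size_t words) {
        size_t budget = words * REFS_PER_WORD;
        while (budget-- > 0 && mark_one_reference())
            ;                            /* marking keeps pace with allocation */
        return raw_alloc(words);
    }

Because the marking work is charged to each allocation, tracing progress is
tied directly to allocation progress, and the extra cost per allocation is
bounded by the budget.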

------
justncase80
Would this also be deterministic? That seems like the fatal flaw with most
current GCs.

~~~
a-priori
Depends what you mean by 'deterministic', I suppose. Basically, the system
they're describing assigns a fixed time slice to the collector. In that sense,
yes, it is deterministic: the application is guaranteed to have the remaining
processor time.

What isn't deterministic is the load on the collector. That is determined both
by the rate at which your application generates garbage as it performs its
work, and how that garbage generation lines up with the collector's time
slices.

Under normal circumstances, that load should have no effect on your
application's behaviour or response times. But unlike a regular collector,
this one will not degrade gracefully (becoming steadily slower as GC load
increases): it will work 100% normally until it reaches a breaking point, at
which time it will fail catastrophically.

