
Garbage Collection in Ruby 2.1 - tmm1
http://tmm1.net/ruby21-rgengc/
======
AlexanderDhoore
It's interesting to compare this to how python does it (reference counting
with generations): [http://patshaughnessy.net/2013/10/30/generational-gc-in-
pyth...](http://patshaughnessy.net/2013/10/30/generational-gc-in-python-and-
ruby)

"At first glance, Ruby and Python seem to implement garbage collection very
differently. Ruby uses John McCarthy’s original mark and sweep algorithm,
while Python uses reference counting. But when we look more closely, we see
that Python uses bits of the mark and sweep idea to handle cyclic references,
and that both Ruby and Python use generational garbage collection in similar
ways. Python uses three separate generations, while Ruby 2.1 uses two.

This similarity should not be a surprise. Both languages are using computer
science research that was done decades ago – before either Ruby or Python were
even invented. I find it fascinating that when you look “under the hood” at
different programming languages, you often find the same fundamental ideas and
algorithms are used by all of them. Modern programming languages owe a great
deal to the ground breaking computer science research that John McCarthy and
his contemporaries did back in the 1960s and 1970s."

~~~
oleganza
I think Henry Baker had a paper on why generational and refcounting GCs are
ultimately equivalent. Couldn't find it quickly, though, to give you a link.

~~~
hencq
I think you might be thinking of this paper: A unified theory of garbage
collection [1], which shows how most modern collectors are hybrids of GC and
ref counting.

[1]
[http://atlas.cs.virginia.edu/~weimer/2008-415/reading/bacon-...](http://atlas.cs.virginia.edu/~weimer/2008-415/reading/bacon-
garbage.pdf)

~~~
brandonbloom
For the record: This is a _very good_ paper. It's extremely easy to read, even
for somebody who knows nothing about the background, and the scientific result
is absolutely beautiful!

~~~
kirubakaran
I just wanted to say thank you for this comment. I'm reading the paper because
of it. Can you please point me to your other favorites that are good +
accessible?

~~~
brandonbloom
I wouldn't even know where to start :-)

~~~
kirubakaran
Haha you aren't getting off that easy. Could you please tell me, say, 3 that
come to your mind?

------
RyanZAG
Also interesting to compare this with how the G1 collector for the JVM handles
it: [http://www.infoq.com/articles/G1-One-Garbage-Collector-To-
Ru...](http://www.infoq.com/articles/G1-One-Garbage-Collector-To-Rule-Them-
All)

EDIT: Better link:
[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.63....](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.63.6386&rep=rep1&type=pdf)

------
stcredzero
_So Ruby objects occupy 40 bytes each, inside pages on the eden heap._

That's pretty huge. That's several times larger than some other dynamic
languages -- languages that have essentially the same features as Ruby.

~~~
phamilton
Not really. Python primitives are smaller (24 bytes). Python strings are 37 +
1 byte per character. Python objects are 72 bytes. Python classes are 104
bytes. (Got these numbers playing around with sys.getsizeof in the python
console.)

Primitives are a bit bigger in ruby, but they aren't really primitives anyway
(everything is an object).

~~~
stcredzero
Python is another offender!

~~~
phamilton
What languages are you referring to than? Perl and PHP are really the only
others that are sort of in the same class as Ruby and Python.

~~~
stcredzero
Smalltalk, for one.

------
jeffdavis
To me, it's interesting that after all this time and with the prevalence of
languages that rely heavily on garbage collection, garbage collectors still
don't seem to scale to large heaps.

Large heaps mean long pauses when doing the "mark" phase of the generation
holding most of the objects. That seems like a big problem that will increase
as memory sizes increase.

Maybe the way I'm looking at it is too simplistic and there are better methods
now. But even in Java, which has had plenty of time to work these issues out,
I have seen issues with long GC pauses.

My impression right now is that GCs just don't scale to large heaps. To use
more memory, you need to either manage memory yourself (which not a lot of
modern languages allow), or increase the number of independent heaps (by using
more tasks/processes).

Please enlighten me if I'm wrong here.

~~~
eonwe
That seems to be the case in Oracle Java at least. With 80+ GB heaps, you can
experience multi-minute stop-the-world collections when a full GC strikes.
Luckily, that can be tuned to be very rare ( _for the usual load_ ).

For better latency without such hiccups, the only solution I know of is Azul's
Zing
([http://www.azulsystems.com/zing/pgc](http://www.azulsystems.com/zing/pgc))
which really did away with larger pauses at least with our software.

~~~
jeffdavis
What is "the usual load"?

Interesting; I'll check out the Azul one. I'm a little skeptical, but it might
be an improvement. Did you see better throughput or just reduced latency?

~~~
eonwe
By usual I mean what our application was tuned for.

Which, sadly, may or may not be what it encounters. There are still some usage
patterns that can cause large amount of time being spent doing gc (without
Zing).

Latency was greatly reduced, I cannot really say about throughput (except that
it was not clearly worse at least).

------
rurban
The important number is the avg pause time. Here with ruby 2.1 7ms for a minor
and 58ms for a major sweep for a typical ~500K heap app.

In my potion-based GC (a stack-scanning, compacting, cheney two-finger GC) the
avg GC needs 3ms, but it's not concurrent (multi-threaded) yet.

All of these can be considered real-time, i.e. < 15ms. For bigger heaps the
scans need to be incremental (saving and restoring GC state, which is easy
with libgc or cheney).

The best java GC I found needed 150ms pause time. Good lisp's have real-time
GCs.

------
venus
What's this RailsApp.preload_all method mentioned in this post? Is that a
github thing?

Great post btw. I am looking forward to getting some production apps onto 2.1
in the coming weeks and seeing how performance characteristics have changed in
the new release.

~~~
tmm1
We implement preload_all, which loops over app/{models,controllers}/ and
requires everything. This method is generally called from config.ru, and
happens in the unicorn master before it forks off workers.

I recently upstreamed a warmup method for Rack::Builder you can also use for
this purpose:
[https://github.com/rack/rack/pull/617](https://github.com/rack/rack/pull/617)

~~~
venus
Thanks for that link (and for submitting it!). I've done something similar in
deploy scripts, to warm up a fresh instance before bringing it into rotation -
even better having it able to be baked into rack like that.

------
ksec
And all these are all very similar to Google's Chrome V8 and Mozilla's
SpiderMonkey. Although both had incremental done already, for Ruby that is
scheduled for v2.2.

Now just when will Ruby get a Method or Tracing JIT?

