

Generational Garbage Collection for Ruby 2.1 - ksec
https://bugs.ruby-lang.org/issues/8339
At this rate MRI is gonna be getting most of Rubinius before Rubinius 2.0 ships.
======
judofyr
Summary:

Generational garbage collectors need to know about pointers in the old
generation that point to objects in the new generation. These objects in the
new generation can only be correctly garbage collected together with the old
generation. In a mutable language you need to have _write barriers_ to detect
when an object in the old generation sets a pointer into the new generation.
This has been difficult in Ruby because C-extensions get direct access to the
memory locations. If a C-extension modifies the Array-pointer directly
there's no way to detect this.
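
To make the write-barrier problem concrete, here is a minimal toy sketch in Python. It is not how MRI implements it; `Obj`, `remembered`, `write_ref` and `raw_write` are all hypothetical names made up for illustration. The barrier records an old object whenever it is made to point at a young one, while a raw write (standing in for a C-extension poking at the pointer directly) bypasses that bookkeeping.

```python
# Toy model of a write barrier. Every name here is hypothetical.
class Obj:
    def __init__(self, name):
        self.name = name
        self.old = False   # generation flag: False = new, True = old
        self.refs = []     # outgoing pointers


remembered = set()  # old objects known to point into the new generation


def write_ref(parent, child):
    """Store a pointer through the write barrier."""
    if parent.old and not child.old:
        remembered.add(parent)  # the collector now knows about this edge
    parent.refs.append(child)


def raw_write(parent, child):
    """What code holding the raw pointer effectively does: no barrier."""
    parent.refs.append(child)


old_a, old_b = Obj("a"), Obj("b")
old_a.old = old_b.old = True
young1, young2 = Obj("y1"), Obj("y2")

write_ref(old_a, young1)  # detected
raw_write(old_b, young2)  # invisible to the collector
```

After this runs, `old_a` is in `remembered` but `old_b` is not, even though both now point at young objects. That missing entry is exactly the case a generational collector cannot afford.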

RGenGC solves this by adding another flag on objects: an object is either
_sunny_ or _shady_. All objects start as sunny. If you e.g. use a C-extension
that tries to access the Array-pointer (using the RARRAY_PTR-macro) the object
becomes _shady_. So: An object is sunny if we know write barriers are used for
all updates; an object is shady if we do not know that. A shady object may or
may not be modified; all we know is that a C-extension has access to the
internal pointers and can do whatever it wants.

In addition, Ruby can't use a copying collector (because a C-extension might
store the memory location somewhere). Instead it stores both new and old
objects in the same space, only distinguished by a flag. When it does a minor
collection (new generation) it will traverse/mark all roots, but ignore
objects that are in the old generation. All traversed sunny objects will be
promoted to the old generation (causing them to be ignored in the next
collection). All traversed shady objects will stay in the new generation and
also be added to the shady-set. This shady-set is a part of the roots that are
traversed in the minor collection.
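
The minor collection described above can be sketched roughly like this. It is a toy model under my own assumptions, not the actual RGenGC code: `Obj`, `minor_gc`, `heap` and `shady_set` are hypothetical names, and the bookkeeping for old sunny objects that point at young ones (which the write barriers would handle) is left out.

```python
# Toy model of a minor collection with sunny/shady objects.
# All names are hypothetical.
class Obj:
    def __init__(self, name, shady=False):
        self.name = name
        self.shady = shady  # True once raw pointers may have escaped
        self.old = False    # generation flag, same space for both
        self.refs = []


def minor_gc(roots, shady_set, heap):
    """Run one minor collection and return the surviving objects."""
    marked = set()
    stack = list(roots) + list(shady_set)  # the shady-set acts as extra roots
    while stack:
        obj = stack.pop()
        if obj.old or obj in marked:
            continue  # old objects are ignored entirely
        marked.add(obj)
        stack.extend(obj.refs)
    for obj in marked:
        if obj.shady:
            shady_set.add(obj)  # stays young, gets re-scanned next time
        else:
            obj.old = True      # sunny survivor: promote
    # Old objects are never swept here; unmarked young objects are freed.
    return [o for o in heap if o.old or o in marked]


sunny = Obj("sunny")
shady = Obj("shady", shady=True)
garbage = Obj("garbage")
sunny.refs.append(shady)

shady_set = set()
heap = minor_gc([sunny], shady_set, [sunny, shady, garbage])

# Later, something writes into the shady object without a barrier.
child = Obj("child")
shady.refs.append(child)
heap = minor_gc([sunny], shady_set, heap + [child])
```

In the second collection `sunny` is already old and gets skipped, so `child` only survives because `shady` sits in the shady-set and is traversed again. That is the reason the shady-set has to be part of the roots.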

If a sunny object is in the old generation and becomes shady (e.g. RARRAY_PTR
gets called) it will be demoted (to the new generation) and added to the
shady-set.
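
A rough sketch of that demotion step, again as a toy Python model with made-up names (`Obj`, `raw_ptr`, `shady_set`); `raw_ptr` only stands in for the moment an internal pointer is handed out, it is not a real API:

```python
# Toy model of demotion when an old sunny object becomes shady.
# All names are hypothetical.
class Obj:
    def __init__(self):
        self.shady = False
        self.old = False
        self.items = []


shady_set = set()


def raw_ptr(obj):
    """Hand out the internal storage, as the RARRAY_PTR case would."""
    if not obj.shady:
        obj.shady = True           # writes can no longer be tracked
        if obj.old:
            obj.old = False        # demote to the new generation
            shady_set.add(obj)     # and make sure minor GCs re-scan it
    return obj.items


arr = Obj()
arr.old = True                     # pretend it was promoted earlier
buf = raw_ptr(arr)

young = Obj()
raw_ptr(young)                     # young objects just flip the flag
```

A young object that turns shady is not added here; going by the summary, it ends up in the shady-set the next time a minor collection traverses it.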

See the attached PDF for more details (pros/cons, comparisons to other Ruby
implementations, some performance numbers, internal API details).

All in all: A decent improvement of the garbage collector without breaking
compatibility with C-extensions.

~~~
mratzloff
Thanks for the excellent summary! For anyone else, I highly recommend reading
the paper itself; it's very interesting.

Correct me if I'm wrong, but my interpretation is that this trades memory for
speed, so the average memory consumption of Ruby programs would increase. If
memory is exhausted, a full 2.0.0-style mark and sweep GC would occur.

Programs that are heavily dependent on C extensions would not see much of a GC
speed increase.

Most would, however, and they could continue to improve GC performance
piecemeal by moving Ruby classes (starting with common classes like Array or
String) to use the new write barrier methods.

~~~
jondot
I think what you're assuming is only relevant for the old generation. Most
object allocations should be short-lived, so that memory should clear up.

------
gary4gar
Koichi Sasada first contributed the Ruby 1.9 VM (YARV) and now a new GC
algorithm. It's amazing how big a dent a single person can make in a
language/ecosystem used by millions around the world. #Respect

~~~
steveklabnik
Like many Open Source projects, there's a long tail of development with Ruby.
4 or 5 people make up 80% (roughly speaking) of the commits.

It's even easier to see with <http://contributors.rubyonrails.org/> :

The top 2 people have 4,000 vs 3,000 commits, 1/3rd more than the 3rd person.

3rd and 4th have 3,000 vs 2,000, 1/2 more than the fifth.

6th has half of the commits that 4th does.

I myself only started working on Rails about a year ago, and got 192 commits
in that time. That places me at #33 overall, out of 1735.

\-------------------------------

It's also interesting to look at numbers for releases, too. Here's Rails 4,
for example: <http://contributors.rubyonrails.org/edge/contributors>

Same thing, smaller scale.

\-------------------------------

I'm not saying this invalidates your point in any way, just that it's the same
for almost all projects; precious few people do much of the work that we rely
on every day.

------
mark_l_watson
That is great, and about time too :-)

I remember in the early 1980s when ephemeral (or generational) garbage
collection started to be available in Lisp systems - this made a large
improvement in run time performance. Good work!

~~~
jwr
I actually thought this was a repost of a _very_ old article when I first
noticed it on the HN front page. Then I realized that Ruby wasn't around in
the 80s. Oh.

I thought all modern dynamic languages use generational GC.

~~~
ksec
Well, it wasn't long ago that JS, arguably the most used dynamic language, got
GGC.

~~~
mraleph
fwiw V8 always had a generational collector.

~~~
ksec
And that was less than 5 years ago, and it was a stop-the-world GC until it
was updated to incremental less than 2 years ago. Meanwhile Mozilla's has had
incremental for a year, and they are still working on a generational collector.

The point is, apart from Java, which gets a lot of corporate backing, even the
most widely used programming languages only recently got a decent enough GC
implementation. As much as I hate this slow progress, most VMs still don't
have GGC as the default at all.

------
rubyfan
Any benchmarks out there indicating performance with the new GC?

~~~
steveklabnik
See the PDF attached to the ticket.

------
sciurus
The presentation that goes into the details is at
<https://bugs.ruby-lang.org/attachments/3686/gc-strategy-en.pdf>

------
pfortuny
It is (mildly) funny how the title of this post was for quite a while on the
same page as one about a gem for grammar/syntax correction.

~~~
pfortuny
It is mildly funny how it got corrected after my post and someone downvoted it.
Ha.

