Hacker News
Generational Garbage Collection for Ruby 2.1 (ruby-lang.org)
85 points by ksec 1632 days ago | 16 comments


Generational garbage collectors need to know about pointers in the old generation that point to objects in the new generation. Those objects in the new generation can only be correctly garbage collected together with the old generation. In a mutable language you need write barriers to detect when an object in the old generation stores a pointer into the new generation. This has been difficult in Ruby because C extensions get direct access to the memory locations. If a C extension modifies the Array pointer directly, there's no way to detect it.
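To make the write-barrier problem concrete, here is a minimal toy sketch in Python. It is not how Ruby implements anything; `Obj`, `store`, `raw_store` and `remembered_set` are hypothetical names made up for illustration. It only shows that a store going through the barrier is noticed, while a direct write (what a C extension can effectively do) is not.

```python
# Toy model of a generational write barrier. All names are hypothetical.

class Obj:
    def __init__(self, name):
        self.name = name
        self.old = False   # generation flag
        self.refs = []     # outgoing pointers


remembered_set = set()     # old objects that may point at young objects


def write_barrier(parent, child):
    """Record old -> young pointers so a minor GC can rescan the parent."""
    if parent.old and not child.old:
        remembered_set.add(parent)


def store(parent, child):
    """A managed pointer store: always goes through the barrier."""
    write_barrier(parent, child)
    parent.refs.append(child)


def raw_store(parent, child):
    """Stand-in for a C extension writing through a raw pointer."""
    parent.refs.append(child)  # no barrier, so the GC never hears about it


old_array = Obj("old_array")
old_array.old = True
young_a = Obj("young_a")
young_b = Obj("young_b")

store(old_array, young_a)                      # detected by the barrier
seen_after_store = old_array in remembered_set

remembered_set.clear()
raw_store(old_array, young_b)                  # goes unnoticed
seen_after_raw = old_array in remembered_set
```

After the raw store, `young_b` is reachable only from an old object the collector has no reason to rescan, which is exactly the case a minor collection would get wrong.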

RGenGC solves this by adding another flag on objects: an object is either sunny or shady. All objects start as sunny. If you, for example, use a C extension that accesses the Array pointer (using the RARRAY_PTR macro), the object becomes shady. So: an object is sunny if we know write barriers are used for all updates; an object is shady if we do not know that. A shady object may or may not have been modified; all we know is that a C extension has access to the internal pointers and can do whatever it wants.

In addition, Ruby can't use a copying collector (because a C-extension might store the memory location somewhere). Instead it stores both new and old objects in the same space, only distinguished by a flag. When it does a minor collection (new generation) it will traverse/mark all roots, but ignore objects that are in the old generation. All traversed sunny objects will be promoted to the old generation (causing them to be ignored in the next collection). All traversed shady objects will stay in the new generation and also be added to the shady-set. This shady-set is a part of the roots that are traversed in the minor collection.

If a sunny object is in the old generation and becomes shady (e.g. RARRAY_PTR gets called on it), it will be demoted to the new generation and added to the shady-set.
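The promotion and demotion rules above can be sketched as a small Python toy model. This is only my reading of the summary, not Ruby's actual code: `Heap`, `make_shady` and `minor_gc` are hypothetical names, and the sketch leaves out the remembered set that write barriers on sunny old objects would maintain.

```python
# Toy model of a sunny/shady minor collection. All names are hypothetical.

class Obj:
    def __init__(self, name):
        self.name = name
        self.old = False     # generation flag; new and old share one heap
        self.shady = False   # True once raw pointers may have escaped
        self.refs = []       # outgoing pointers


class Heap:
    def __init__(self):
        self.objects = []    # the single shared object space
        self.roots = []      # stand-in for stack and globals
        self.shady_set = []  # shady objects, rescanned on every minor GC

    def new(self, name):
        obj = Obj(name)
        self.objects.append(obj)
        return obj

    def make_shady(self, obj):
        """Models a C extension grabbing a raw pointer (e.g. via RARRAY_PTR)."""
        obj.shady = True
        if obj.old:
            obj.old = False             # demote to the new generation
            self.shady_set.append(obj)  # and rescan it from now on

    def minor_gc(self):
        marked = set()
        stack = self.roots + self.shady_set
        while stack:
            obj = stack.pop()
            if obj.old or obj in marked:
                continue                # the old generation is not traversed
            marked.add(obj)
            stack.extend(obj.refs)
        for obj in marked:
            if obj.shady:
                if obj not in self.shady_set:
                    self.shady_set.append(obj)  # shady survivors stay young
            else:
                obj.old = True                  # sunny survivors are promoted
        # Sweep: drop young objects that were not reached.
        self.objects = [o for o in self.objects if o.old or o in marked]


heap = Heap()
arr, raw, garbage = heap.new("arr"), heap.new("raw"), heap.new("garbage")
heap.roots = [arr, raw]
heap.make_shady(raw)       # young and shady: stays young
heap.minor_gc()
after_first = (arr.old, raw.old, raw in heap.shady_set, garbage in heap.objects)

heap.make_shady(arr)       # old sunny object becomes shady: demoted
child = heap.new("child")
arr.refs.append(child)     # raw store, no barrier involved
heap.roots = []
heap.minor_gc()
after_second = (arr.old, arr in heap.shady_set, child in heap.objects)
```

In the second collection `child` survives even though it was stored without a barrier and there are no ordinary roots left, because `arr` is rescanned as part of the shady-set.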

See the attached PDF for more details (pros/cons, comparisons to other Ruby implementations, some performance numbers, internal API details).

All in all: A decent improvement of the garbage collector without breaking compatibility with C-extensions.

Thanks for the excellent summary! For everyone else: I highly recommend reading the paper itself. It's very interesting.

Correct me if I'm wrong, but my interpretation is that this trades memory for speed, so the average memory consumption of Ruby programs would increase. If memory is exhausted, a full 2.0.0-style mark and sweep GC would occur.

Programs that are heavily dependent on C extensions would not see much of a GC speed increase.

Most would, however, and they could continue to improve GC performance piecemeal by moving Ruby classes (starting with common classes like Array or String) to use the new write barrier methods.

I think what you're describing is mostly relevant for the old generation. Most object allocations should be short-lived, so that memory should be freed again quickly.

Koichi Sasada first contributed the Ruby 1.9 VM (YARV) and now a new GC algorithm. It's amazing how big a dent a single person can make in a language/ecosystem used by millions around the world. #Respect

Like many Open Source projects, there's a long tail of development with Ruby. 4 or 5 people make up 80% (roughly speaking) of the commits.

It's even easier to see with http://contributors.rubyonrails.org/ :

The top 2 people have 4,000 vs 3,000 commits, 1/3rd more than the 3rd person.

3rd and 4th have 3,000 vs 2,000, 1/2 more than the fifth.

6th has half of the commits that 4th does.

I myself only started working on Rails about a year ago, and got 192 commits in that time. That places me at #33 overall, out of 1735.


It's also interesting to look at the numbers for releases. Here's Rails 4, for example: http://contributors.rubyonrails.org/edge/contributors

Same thing, smaller scale.


I'm not saying this invalidates your point in any way, just that it's the same for almost all projects; precious few people do much of the work that we rely on every day.

That is great! About time, too :-)

I remember in the early 1980s when ephemeral (or generational) garbage collection started to be available in Lisp systems - this made a large improvement in run time performance. Good work!

I actually thought this was a repost of a very old article when I first noticed it on the HN front page. Then I realized that Ruby wasn't around in the 80s. Oh.

I thought all modern dynamic languages use generational GC.

Ruby's internals are hilariously naive. 1.8 is an AST interpreter (!) and 1.9 and 2.0 are dumb bytecode interpreters, with no quickening or anything.

Not to knock it too much - if it gets the job done it gets the job done - but it would be nice if they considered adding at least some of the optimisation techniques you'd learn in an undergraduate-level interpreter class.

Well, it wasn't long ago that JS, arguably the most used dynamic language, got GGC.

fwiw V8 always had a generational collector.

And that was less than 5 years ago, and it was a stop-the-world GC until it was updated to incremental less than 2 years ago. Meanwhile, Mozilla's engine has had incremental GC for a year, and they are still working on a generational collector.

The point is, apart from Java, which gets a lot of corporate backing, even the most widely used programming languages only recently got a decent GC implementation. As much as I hate this slow progress, most VMs still don't have GGC by default.

Any benchmarks out there indicating performance with the new GC?

See the PDF attached to the ticket.

The presentation that goes into the details is at https://bugs.ruby-lang.org/attachments/3686/gc-strategy-en.p...

It is (mildly) funny how the title of this post was for quite a while on the same page as one about a gem for grammar/syntax correction.

It is mildly funny how it got corrected after my post and someone downvoted it. Ha.
