

Ruby 2.1 Garbage Collection: ready for production - sunseb
http://samsaffron.com/archive/2014/04/08/ruby-2-1-garbage-collection-ready-for-production

======
matthewmacleod
I think the takeaway is that there _actually is_ a straight-up bug in the
2.1.1 GC that causes unbounded memory growth, and that the new GC does
typically result in higher memory use.

The memory issue isn't really that serious, as it seems to be a tradeoff for
performance. Although it's not like Ruby is light on memory use as it is…

Far more interesting are some of the other issues, like this one:
[https://bugs.ruby-lang.org/issues/9262](https://bugs.ruby-lang.org/issues/9262)

 _For an app like Discourse 3-10% of request time is occupied looking up
methods, due to cache inefficiency._

That's _amazing_, and demonstrates that there's probably still quite a lot of
low-hanging performance fruit that Ruby can look to exploit.

All of that aside, performance is generally so much better in the 2.1.1 series
that it's really worth using.

~~~
rubiquity
I think it's part bug and part having a GC with only two generations (old and
young). When you have to choose between putting these tweener objects
somewhere, you have to be more conservative and move them to the old
generation. Once a third generation is added (Ruby 2.2?) this will be much
smoother.

> _For an app like Discourse 3-10% of request time is occupied looking up
> methods, due to cache inefficiency._

Hmmm, I thought Ruby 2.1 already had a per-class method cache, or maybe it was
just per-class method cache invalidation, but I don't know how you could have
one without the other. I'll have to reinvestigate this.

> _That's amazing, and demonstrates that there's probably still quite a lot
> of low-hanging performance fruit that Ruby can look to exploit._

I'm not sure I share as much of a positive outlook. Short of adding JIT
compilation, I think the gains from here on out will start to get smaller and
smaller. The performance gains of RGenGC were very impressive, though.

~~~
vidarh
I'm working on an "as static as possible" Ruby compiler as a hobby project, and
it's incredibly frustrating at times to see the generated code grow to
ridiculous size as I'm getting closer to actually complying with real Ruby
semantics... But I do still think there are substantial gains possible.

For starters, for most method calls there's no reason to do the expensive
method lookups that MRI still uses - cache or no cache - you can use C++ style
vtables, as long as you propagate updates to them downwards when a method is
re-defined. You do need to be able to fall back to handle dynamically created
methods with names not present when you generate the vtables, and optionally
reduce waste (as the vtables need to be the same size for all classes, with
unimplemented methods replaced with pointers to method_missing thunks), but in
terms of performance you can do fairly well and compared to this GC blowup,
the memory waste would be small.
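
A rough sketch of that idea, modelled in plain Ruby rather than generated
machine code (`SLOTS`, `MISSING`, and `dispatch` are illustrative names, not
anything from MRI or the compiler described):

```ruby
# Every method name known at "compile time" gets a fixed slot index shared
# by all classes; dispatch on the hot path is a plain array index, no hash
# lookup. Unimplemented slots point at a method_missing-style thunk, which
# is where the per-class memory waste mentioned above comes from.
SLOTS = { greet: 0, farewell: 1 }   # name -> global slot index

MISSING = ->(recv, *args) { raise NoMethodError, "missing on #{recv.class}" }

# One fixed-size table per "class"; all slots start at the missing thunk.
animal_vtable = Array.new(SLOTS.size, MISSING)
animal_vtable[SLOTS[:greet]] = ->(recv) { "generic hello" }

def dispatch(vtable, slot, recv, *args)
  vtable[slot].call(recv, *args)
end

puts dispatch(animal_vtable, SLOTS[:greet], :an_animal)  # => "generic hello"

# Redefinition: overwrite the slot in place. A real compiler would also
# propagate this write downwards into every subclass vtable that had
# inherited the old entry.
animal_vtable[SLOTS[:greet]] = ->(recv) { "noisy hello" }
puts dispatch(animal_vtable, SLOTS[:greet], :an_animal)  # => "noisy hello"
```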

But there's also not much alternative but going for proper JIT'ing of at least
some things.

~~~
pjmlp
Dynamic languages tend to gain more from JIT than from AOT due to such issues.

On the other hand, have a look at Dylan, as it might inspire you:

[http://opendylan.org/](http://opendylan.org/)

~~~
vidarh
For the method lookup, other than for methods that are dynamically generated
with names not known at compile time, the only additional gain you'll get from
JIT is by going to full on inline caches, but vtables gets you most of the
speedup without the hassle of inline caches and tracing, and doesn't _prevent_
using tracing and inline caching down the line.

~~~
pjmlp
With JITs you get devirtualization as well, so no need for vtables.

Something that is possible in AOT as well, to a certain extent, but it requires
a mix of profile-guided optimization coupled with whole-program analysis.

Which has issues across dll/so boundaries anyway, as those calls cannot be
optimized away as they can in JITs.

~~~
vidarh
> With JITs you get devirtualization as well, so no need for vtables.

That's what I referred to with "inline caches". The problem is that for Ruby
you need fully polymorphic inline caches, with guards all over the place,
because unless you do tons of analysis upfront, you will have problems knowing
whether or not the world has totally changed on you after any method call, and
almost anything is a method call. (Call into code you haven't verified can't
possibly call "eval", and you might find that adding two integers afterwards
doesn't in fact add them, but returns a string, changes global variables, and
what-not.)

The upshot is that compared to vtables, you're not actually saving all that
much. E.g., take "1 + 2 - 3". You could inline Fixnum#+ (and could reasonably
do so with an AOT compiler too). But you need to add a type guard before the
inlined fragment to verify that Fixnum#+ still is the Fixnum#+ you inlined,
which at the minimum costs you a comparison and a branch _or_ you need to
record _every_ call-site with inlined code and be prepared to overwrite it
with fixups if the implementation changes.

And if Fixnum#+ has been overridden, or the Fixnum#+ implementation has method
calls, chances are you will need another guard before "-" too, because you
might not even know for sure whether or not the object returned from "1 + 2"
will be a Fixnum, so you might find that the inlined method suddenly is for
the wrong class.
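
A hedged sketch of what such a call-site guard looks like, modelled in plain
Ruby (the `CallSite` class is purely illustrative; a real VM guards on a raw
class pointer plus a method-redefinition serial, not Ruby-level reflection):

```ruby
# A monomorphic inline cache: remember the receiver's class and the method
# the last lookup resolved to. Every call pays at least the guard (a class
# comparison and a branch) before it may use the cached/inlined path.
class CallSite
  def initialize(name)
    @name = name
    @cached_class = nil
    @cached_method = nil
  end

  def call(receiver, *args)
    klass = receiver.class
    unless klass.equal?(@cached_class)
      # Guard failed: slow-path lookup, then refill the cache. A real VM
      # would also check a redefinition serial here, so that redefining
      # e.g. Fixnum#+ invalidates the cached entry.
      @cached_class = klass
      @cached_method = klass.instance_method(@name)
    end
    @cached_method.bind(receiver).call(*args)
  end
end

site = CallSite.new(:+)
puts site.call(1, 2)    # 3   -- slow path, cache filled
puts site.call(4, 5)    # 9   -- guard passes, cached path
puts site.call(1.0, 2)  # 3.0 -- class changed, guard fails, cache refilled
```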

I'm planning on benchmarking inline caching for my compiler against vtables,
but absent evidence to the contrary I'm expecting that there will be a very
substantial number of cases where the complexity isn't worth it, or where they
might even turn out to be slower.

> Something possible in AOT as well to certain extent, but it requires a mix
> of profile guided optimizations coupled with whole programm analysis.

It does if you want to do _everything_ upfront, but you can pull things into
inline caches with a mostly-AOT compiler relatively easily with just a little
bit of extra information, and a few guards thrown in to do some basic tracing.

~~~
mieko
I implemented a handful of simple dynamic languages years ago, and
something I was interested in trying, but never did, was taking advantage of
the MMU to replace guard clauses.

For example, mapping a few pages for vtables/method dictionaries read only.
When something like `def` or `define_method` comes along, catch the segfault
(which in this case would actually mean "segmentation fault" instead of "I
fucked up") and rewrite all JIT blocks or method caches that depend on that
method table. Once everything has settled (generally after startup, when the
vtables tend to stay more stable), the overhead seems like it'd be negligible.

~~~
vidarh
Catching the vtable updates and propagating them downwards is pretty simple:
you "just" need every class to know which classes inherit from it. There's
an implementation for dynamic runtime updates of dispatch tables for Oberon,
of all languages (though that version sidesteps the "sparse vtables" issue by
splitting the vtables into interfaces, and adding one extra level of
indirection).
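
A minimal model of that downward propagation, with illustrative names
(`RClass` is not MRI's internal struct): each class records its subclasses,
and a redefinition cascades into every descendant that merely inherited the
slot, while classes with their own override are left alone.

```ruby
class RClass
  attr_reader :vtable, :subclasses

  def initialize(superclass = nil, slot_count = 4)
    # Start from a copy of the parent's vtable (inherited entries).
    @vtable = superclass ? superclass.vtable.dup : Array.new(slot_count)
    @own = {}          # slots this class defined itself (overrides)
    @subclasses = []
    superclass.subclasses << self if superclass
  end

  # Defining here marks the slot as overridden, then propagates the new
  # pointer into every subclass that is still inheriting this slot.
  def define(slot, impl)
    @own[slot] = true
    propagate(slot, impl)
  end

  def propagate(slot, impl)
    @vtable[slot] = impl
    @subclasses.each do |sub|
      sub.propagate(slot, impl) unless sub.overrides?(slot)
    end
  end

  def overrides?(slot)
    @own.key?(slot)
  end
end

GREET  = 0
animal = RClass.new
dog    = RClass.new(animal)
cat    = RClass.new(animal)

animal.define(GREET, -> { "..." })
cat.define(GREET, -> { "meow" })      # cat overrides the slot

animal.define(GREET, -> { "hello" })  # redefinition propagates downwards
puts dog.vtable[GREET].call  # "hello" -- inherited the update
puts cat.vtable[GREET].call  # "meow"  -- own definition untouched
```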

The tricky bit is if you have gone as far as inlining the method.

------
jordanthoms
Kind of depressing how far behind V8, HotSpot, the CLR, etc. Ruby is in terms
of GC sophistication, the non-existent JIT, and so on. Still hoping someday
someone will make the investment needed to catch up.

~~~
IPGlider
Like Rubinius? Or why not JRuby?

~~~
kaffeinecoma
Because there's always something different that needs to be done for them once
your project starts becoming non-trivial. You might need a different version
of a gem (e.g. pure-java Nokogiri), or they're behind recent MRI features. And
if you care about concurrency, it's different everywhere.

In my personal experience I've found MRI to be the best experience simply
because that's what most other people are using, and there's a lot to be
gained from being in the mainstream.

Don't get me wrong: I would actually love for JRuby to become the de facto
Ruby implementation. So many headaches are caused by native code in gems. And
we'd have a solid foundation for GC, concurrency, etc. But that's not the
current reality.

~~~
YorickPeterse
Rubinius does not need gem replacements the way JRuby does; it still has a
compatible C API.

> [...] or they're behind recent MRI features.

Part of this is due to MRI having literally no specification process at all.
Python has the PEP system, no such thing exists in Ruby land. People tried to
change this in the past
([http://rubyspec.org/design/](http://rubyspec.org/design/)) but with little
to no success so far. As a direct result of this there are only two ways to
keep up to date with what changes in Ruby:

1\. Follow every issue reported on bugs.ruby-lang.org, forever.

2\. Wait until users report issues about something not being present, behaving
differently, etc.

> And if you care about concurrency, it's different everywhere.

This is FUD. An implementation may offer different primitives for concurrency
(e.g. Rubinius has Rubinius::Channel) but they also offer shared APIs. For
example, the Thread class works across all implementations as you'd expect.
Whether you use this on JRuby or Rubinius the end result is the same: proper
concurrency.
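
For example, the core `Thread`/`Mutex` API behaves the same from the
programmer's point of view on MRI, JRuby, and Rubinius (the latter two
additionally run the threads in parallel, without a GIL):

```ruby
# Shared concurrency API: plain Thread plus Mutex, portable across
# implementations. The Mutex keeps the increments race-free everywhere.
counter = 0
lock = Mutex.new

threads = 4.times.map do
  Thread.new do
    1_000.times { lock.synchronize { counter += 1 } }
  end
end
threads.each(&:join)

puts counter  # 4000 on any conforming implementation
```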

------
adrianlmm
And still, there is no RubyInstaller 2.1 for Windows =(.

