Ruby 2.1 Garbage Collection to Address Criticism on Large Scale Deployments (infoq.com)
66 points by DanielRibeiro on Sept 16, 2013 | 62 comments



You should go and read the Ruby garbage collector implementation. It's very straightforward to read; the code is simple, but it reveals why it's so slow. I won't say anything bad about it, but it's not a stellar piece of software engineering.

The good thing is that there's plenty of room for improvement.


What surprises me is that there are a lot of companies using Ruby. Why don't they put their money where their mouth is? Hire a team of sufficiently skilled developers and pay them to improve the GC implementation. It isn't rocket science :)


I wonder if rocket scientists say "It's not like we're building a garbage collector".


There are only so many GC engineers in the world, and the majority work in academia, industrial research labs, and places like the Oracle VM team. So some web-app company isn't likely to hire one. Also, they aren't interested in making some toy GC for a language they don't use.


And this is why Silicon Valley has a hiring problem.

No, you hire someone who can code C and read the GC literature. The problem space is well understood. You are not breaking new ground, just applying known algorithms.


This is an absolutely ridiculous comment. Writing a GC for a dynamic language like Ruby is not just a question of knowing C and reading some literature. It's a massive problem to solve. This is not something you double your burn rate for in the hopes that 2-3 years down the line you can double the performance of your application cluster. It's simply not worth it. Startups have immediate problems to solve or they die.

And BTW, the reason we have a hiring problem is not because we are too picky about specific experience, it's because when it comes to software engineering some people have it and some people don't, and the ones who don't are the ones whose resumes are consistently out there filling up inboxes, and the ones who do are rarely looking for a job, and when they are the Facebooks, Twitters and Googles of the world are throwing around $200k + benefits like pocket change.


>This is an absolutely ridiculous comment. Writing a GC for a dynamic language like Ruby is not just a question of knowing C and reading some literature. It's a massive problem to solve.

No, it's not. There are tons of good GCs around. You can even improve Ruby's GC with 1990 technology.

(E.g. did the LuaJIT guy solve a "massive problem"? Mostly by himself?)


Some people routinely underestimate what a team of two or three smart guys and gals can do if they get in a room together and really work on a problem.


GC in Lua is a fundamentally much easier problem because it doesn't do threads.

Also, "Mike Pall can do it so everyone can" is not a very good argument -- Mike Pall is fucking brilliant, and quite a lot of his code is stuff that your typical great C coder can't ever do.


The OP was suggesting that only someone who had worked at Oracle could solve the problem. At least you are saying that someone who "has it" might have a chance.

The issue of whether startups would fund it is a different question. Quite clearly the performance of Ruby is not in fact a huge problem for most people, and we haven't had a huge organisation that could fund stuff in the way, say, Facebook has with PHP.

I haven't looked at Ruby's GC, but I am guessing it was written by one person originally, and that could be done again. The poster at the top said it was simple. It's a big problem, but as I said, it's not a perfect solution that's needed, just a better one.


There was a line from a talk about Rust[0] that really stuck with me, commenting on the design choice to only garbage-collect task-local heaps using reference counting with cycle detection: "we wanted to avoid spending the next 20 years writing another world-class garbage collector." It is not just a single person project to write a really good GC.

0: I think it was this one - http://www.infoq.com/presentations/Rust

Edit: wording and precision


Note that in Rust it's not as big a deal as in Ruby because of two factors: GC usage is not idiomatic, and since it's task-local it won't ever stop the world.


Absolutely. My point being they seem to have purposefully worked the attributes you mention into the design of the language largely because of how hard it is to write a really good GC. The Rust team has lots of people who know C and are very capable of reading literature, and still didn't really want to tackle it, because it's hard.


Implementing GC rarely seems as simple as "reading the literature" and implementing the algorithms they tell you. One problem is that you're working with an existing code base, and you can't afford to make mistakes in your GC. Sure, there are plenty of (good) GC algorithms out there that are public knowledge, but implementing it alone may be difficult, implementing it efficiently is another order of magnitude more-so.


So this leads to the question: why isn't there an open-source garbage collector that can be used across different projects?

Surely Python, V8, and Ruby have enough in common, allocation-wise, that the code can be shared? They're all C in the back-end. It's just pointers and more pointers at the lowest level.

It's not about making a toy GC, it's about making a top-rate GC that works and packaging it up as a tool.


Garbage collection is a particularly cross-cutting implementation issue, and tends to interfere with all kinds of abstraction. You're essentially making a system that knows which specific ways abstraction can be broken safely, so that you can defer resource management to it in the future. Retrofitting a GC is even harder. You can bolt on a conservative collector (like boehm-gc), sure, but that can lead to many unexpected performance issues (see e.g. http://timetobleed.com/6-line-eventmachine-bugfix-2x-faster-...).

Writing a virtual machine for a GC'd language and then trying to give it a portable, reasonably convenient C API convinced me that it's a significantly harder issue than most people realize. (Particularly after scrapping my third prototype!) The C API also impacts many things, and tends to bring up nasty edge cases in the GC.

Lua seems to have settled on a design that works unusually well, and is worth studying. Still, Lua can safely cut corners that Python and Ruby cannot - they have threads, which also break a lot of assumptions.


The Boehm GC [1] has been integrated with various other languages in the past. I'm aware of various attempts to get both Python and Ruby code running with it.

1. http://www.hpl.hp.com/personal/Hans_Boehm/gc/


To a certain extent that's what the JVM is -- you just have to target its bytecode :-).


That's not exactly a portable back-end toolkit you can just link in.


It's not really possible to do that with GC. It requires analyzing and dealing with basically every assumption about object lifetimes that is done within the system, and that typically requires you to read and understand every single line of code.

I have worked on GC projects, and imho bolting on a good GC system on an existing language runtime is considerably harder than reimplementing the entire runtime on top of a GC platform.


Really? Seriously?


I think the guy has a good point. GC is a relatively obscure area, with few people having expertise in it. This, plus the fact that the number of Rubyists who also know C (and know C well enough to work on the GC) is very small, doesn't help the current situation. The Ruby GC is also extremely tied to the VM implementation, making it generally very hard to hack on.

I made the Ruby garbage collector copy-on-write friendly several years ago.


Throwing another few boxes into your web app is significantly cheaper than hiring a GC engineer to rewrite the implementation.


It is too bad they aren't looking at the very nice MPS: http://www.ravenbrook.com/project/mps


Sad fact is that MRI, no matter how much effort we put in, would never be able to match the GCs of a battle-tested VM like the JVM.

So I am wondering whether it makes sense to stop designing a VM and just focus on the language. Perhaps something like Truffle can be used to do the optimizations for you. Ruby on Truffle is already 4-5x faster than MRI, and the Ruby implementation was done by an intern in just 6 months. All because of the power of the JVM.

        https://twitter.com/headius/status/362616159897534465
        http://www.oracle.com/technetwork/java/jvmls2013wimmer-2014084.pdf
Something to keep in mind.


Most Ruby implementations started with awesome numbers in a ridiculously short time. Remember the time when MacRuby was 6 times as fast as MRI? Those times are over after they implemented Ruby proper and ran real world programs.


Haven't seen them getting slower in later stages, and surely not THAT much slower. Any citations?

Their main problem is that the project got mostly abandoned in favor of the paid-for RubyMotion (which uses AOT compilation).

That said, Lua and JS are as dynamic as Ruby, and both are plenty faster.

And PyPy is plenty faster than Python, despite being 100% compatible now.


Yes, but PyPy was not written in 6 months. For a rather comprehensive (but not so recent) benchmark, see:

http://etehtsea.me/the-great-ruby-shootout
http://igor-alexandrov.github.io/blog/2012/11/05/yet-another...

MacRuby loses against MRI in many benchmarks, and MRI doesn't fare as badly as many people say.


PyPy wasn't, but as a result of the groundwork we laid in building it, I (and several contributors) were able to build Topaz in under a year: http://topazruby.com/


How much work do you expect it to take before it's capable of replacing MRI?


Curious to know: what are the things in "Ruby proper" that make it slow?


The short answer is that Ruby is "too dynamic" and "too malleable": An object can get a new class when you call a method on it, for example, or every method might be redefined, including seemingly safe ones like arithmetic on constant integers.

The longer explanation of this one issue (there are others):

Somewhere deep in the bowels of your code, some library calls eval(foo). What exactly is "foo"? If you can't track it back to a constant string, you're SOL from an optimisation standpoint: any and all method pointers and class pointers you may have cached to speed up method calls are now unsafe, as for all you know, the next time you try to do "2 + 2" you might get "5". Or a string. Or your hard drive might get formatted - the latter is particularly important: methods may change from not having side effects to having side effects, or back, potentially in the middle of evaluating an expression.

99% of the time, the changes will not affect core classes, but even so a language implementation that wants to be complete will need to at least guard against it even for basic things like adding two numbers.

That means there's lots of stuff you can't cache, or where you need logic to invalidate caches that can be very complex, or you need to be able to validate any cached information. And you either need to do this for every method call, or you need to find ways of safely rolling back calculations (and you'd need to ensure you're not inadvertently triggering side effects that you can't roll back if you choose that option), or you need to be able to deduce what code can trigger these things, and tread with appropriate caution afterwards.
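
To make this concrete, here is a tiny, contrived illustration of the kind of thing a complete implementation has to survive (2.x-era MRI, where integers are still Fixnum) - redefining core arithmetic through eval at runtime:

  # Contrived but legal Ruby: nothing stops a string fed to eval from
  # redefining arithmetic on a core class mid-run.
  puts 2 + 2     # => 4

  payload = "class Fixnum; def +(other); 5; end; end"
  eval(payload)  # could just as easily have arrived from a library or user input

  puts 2 + 2     # => 5 - every cached assumption about Fixnum#+ is now wrong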


For example, method cache invalidation is a hot topic in dynamic languages that is often ignored in synthetic benchmarks. A Ruby example of this is:

  output.extend(ContentTyped)
(that's from Sinatra: https://github.com/sinatra/sinatra/blob/154859f1553b8bacea97...)

I am not saying that these things cannot be implemented fast, but they are usually the things that get ignored in a first implementation and take a lot of work to get right and quick.


Now I want to slap a Sinatra developer.

Note that the problem is not making the example you gave fast. That example almost certainly won't be: implementing it without instantiating a new eigenclass just for that object and populating it with the right methods would be tricky, and that is not going to be particularly fast.

The problem, as you mention, is that method caches must be invalidated, affecting other calls that we would otherwise expect to be fast, and requiring overhead everywhere to prevent doing the wrong thing in the face of having the rug pulled out from under you.

In this case it's fairly benign: ContentTyped "just" installs attribute accessors, and "only" affects that object, and sufficient bookkeeping could restrict the cache invalidation accordingly.

(Thanks, btw., it's an interestingly rare example of a real use of extending objects directly)


The problem here is not just that it trashes the method cache for this object. In most versions of MRI, this trashes _all_ method caches, not just the ones for String.

A sufficiently clever JIT could find out that there is only ever a String instance extended (so, basically, the extension is monomorphic) and introduce the resulting type.

However, this example is not rare. Some more:

The DCI pattern: http://www.sitepoint.com/dci-the-evolution-of-the-object-ori... (scroll to source code)
ROAR, a popular representation gem: https://github.com/apotonick/roar


It shouldn't need to invalidate method caches for String at all. It is not extending String. It is extending a specific instance of String (or whichever other class that object happens to be - for the rest of this I'm assuming it's a String), and thus not even touching the String class, but the eigenclass of the object:

  >> module ContentTyped
  >> attr_accessor :content_type
  >> end
  => nil
  >> foo = "bar"
  => "bar"
  >> foo.extend(ContentTyped)
  => "bar"
  >> foo.content_type
  => nil
  >> "another string".content_type
  NoMethodError: undefined method `content_type' for "another string":String
  	from (irb):31
Externally this eigenclass is not readily visible, but within MRI it is, and for the purposes of a method call, the immediate superclass of foo above is not String, but its own object-local class object.
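
You can see that object-local class if you ask for it explicitly. The exact output varies by Ruby version (newer versions also show the singleton class itself in the list), but the shape is roughly:

  >> foo.singleton_class.ancestors
  => [ContentTyped, String, Comparable, Object, Kernel, BasicObject]
  >> String.ancestors
  => [String, Comparable, Object, Kernel, BasicObject]

ContentTyped only shows up in the lookup chain of that one object; String itself is untouched.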

If you are right this might be one of those areas where there's room for quick-wins. It doesn't take a "clever JIT" to localise this damage by ensuring the method cache is only invalidated for whichever entity is extended (which in this case is no existing class, unless the object has been previously extended).

As for your examples, I wonder if these people realize that their code is equivalent to instantiating a new class for every single object. To me, ROAR for example just looks horribly conceptually broken. In effect they are creating a new class per object in order to add non-stateful behaviour, which just makes me want to rage.

(EDIT: and DCI makes me want to rage too, for different reasons - they could achieve most of the same through composition, by wrapping the objects in static classes instead of dynamically extending the objects; to me these examples stand as prime examples of how the horrible performance of MRI leads people to make implementation choices they'd never make with a Ruby implementation where the things that can be made fast are fast).

I don't care if that is never a fast case, as they have options that would be faster: include the representer modules in the resource class like Article, subclass and include, wrap it when needed. I want it to be easy to write fast Ruby code - I don't particularly care if pathologically crazy implementation choices remain slow...


> It is not extending String. It is extending a specific instance of String, and thus not even touching the String class, but the eigenclass of the object:

Whoops!

  module ContentTyped
    # The extended hook fires when an object does foo.extend(ContentTyped),
    # so extending a single instance can still monkey-patch String itself.
    def self.extended(base)
      String.class_eval do
        def lol
          "lol"
        end
      end
    end
  end


I wanted to find the reference but I failed.

Anyway, I just wanted to counter that IIRC Mike Pall of LuaJIT fame has stated that, in his opinion, a VM/JIT laser-focused on a single language's semantics is much easier and in the end more effective than using a general-purpose one.

Alas, I can't find the reference, but the LuaJIT example stands on its own. Whether designing your own VM is a better use of money and time is largely a trade-off depending on what your targets are. E.g. HotSpot could have been a great target unless you are interested in fast startup, or whatever.


The numbers for Ruby on Truffle are meaningless until people have had a chance to hit it hard with all the particular oddities of Ruby. Consider that their 6 months got them to 45% of RubySpec. You can reach 45% of RubySpec fairly easily if you go for the softest targets (I'm not saying that's what they've done - I haven't checked).

[EDIT: I see they're doing some interesting things that certainly ought to beat MRI. If I understand it correctly, it seems like they are somehow collapsing type checks for multiple operations. Of course the devil is in the details - if they are trying to defer type or method checks, and throw away results if the checks fail (which should be rare), that will only be safe if modifications that do happen do not introduce or remove side effects that can't be "rolled back", but I might be misunderstanding their presentation]

The problem is the multitude of bizarre things that are legal Ruby. Like people doing eval("class Fixnum; def + other; 42; end; end;"). Yes, that's legal, and yes, that means any integer arithmetic in your application is suddenly broken. More importantly, it means any optimisations based on your beliefs about what any piece of code is meant to do, while most likely right, can turn out to be horribly wrong, and so are problematic for a VM or compiler without a substantial amount of logic to detect the change and bail out from optimised code to safe fallbacks. Doing so without slowing down the code when your guesses are right is hard because of how many ways there are of changing the behaviour of code in Ruby.

Unless your compiler understands eval() and it is possible for it to reason about the contents of the eval string, it can make pretty much zero guarantees about the state of the world after an eval() call, and so it can make pretty much zero guarantees about the state of the world after any method call that could reach such an eval() call.

Admittedly, that's a stupid thing to do, but it's legal in Ruby, and while the above example is extreme, you do find a lot of use that is roughly equivalent. E.g. autoload creates as much lack of predictability as eval. So does a 'require' or 'load' that might get triggered later in execution, for example.

The reason those are important is that it makes a massive amount of optimisations far harder: You can't blindly cache method pointers, for example, because any method call potentially invalidates them. You can't even cache class pointers, because they can change: You can return from a method call and suddenly an object has an eigenclass. You can't inline functions without guarding them somehow to fall back to the full method call when it turns out some idiot did redefine Fixnum#+. You can't assume seemingly "safe" stuff like Fixnum#+(some other Fixnum) will even return an object of the type you assume, for the same reason - someone might decide to implement a DSL that redefines it.
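
To illustrate the shape of the cost, here's a toy, hypothetical sketch (nothing like MRI's actual C code) of a guard-based call-site cache: every single call pays for a check against a global "did anything get redefined?" counter before the cached method can be trusted.

  # Toy sketch only, not MRI's real implementation.
  class World
    @version = 0
    class << self
      attr_reader :version
      def bump!          # imagine this being called on every method (re)definition
        @version += 1
      end
    end
  end

  class CallSite
    def initialize(selector)
      @selector = selector
      @cached = nil
      @cached_version = -1
    end

    def call(receiver, *args)
      if @cached_version != World.version                     # the guard, on every call
        @cached = receiver.class.instance_method(@selector)   # slow path: full lookup
        @cached_version = World.version
      end
      @cached.bind(receiver).call(*args)
    end
  end

  site = CallSite.new(:+)
  p site.call(2, 2)   # => 4 - slow path once, cached afterwards

  World.bump!         # e.g. someone eval'd a redefinition somewhere
  p site.call(2, 2)   # cache can no longer be trusted; full lookup again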

Frankly, it'd be fantastic to start deprecating some of the more obnoxious things like these, and weeding out the few uses of them, but as it stands today, a fast Ruby subset is "easy". A fast complete Ruby implementation is an entirely different beast. A fast incomplete Ruby implementation that refuses to support some of the most noxious corner cases would still be extremely useful for a lot of people, though.

(in the interest of disclosure since I'm talking about another Ruby implementation: I'm writing a series on my own slow process of writing a Ruby compiler, though my goals are very different - mostly focused on writing about the process)


I'm the author of Ruby on Truffle.

I'll talk you through exactly how we solve the problem of redefining Fixnum, as one example of how we've tackled these problems.

Whenever you use Fixnum#+ in one of your methods, we lookup what that method is and cache the method so we can call it quickly next time. We actually never again check that this cache is still valid. The trick is that we sort of do the opposite - any time you do something that could invalidate that cache, we find the installed machine code that uses it, and delete it. If the machine code is still running somewhere on some stack for some thread or fibre, we jump from the machine code into an interpreted version which looks up the method again and carries on.
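
In plain Ruby, the idea looks very roughly like the sketch below. It's a conceptual toy only: the real mechanism works on installed machine code, which is deleted or patched in place, so the fast path carries no validity check at all, whereas a Ruby sketch still needs a flag.

  # Conceptual toy, not Truffle's actual machinery: compiled code registers an
  # assumption, and redefining the method invalidates the code, instead of the
  # code re-checking a cache on every call.
  class Assumption
    def initialize
      @dependents = []
    end

    def depend(code)
      @dependents << code
    end

    def invalidate!         # called when e.g. Fixnum#+ is redefined
      @dependents.each(&:discard!)
      @dependents.clear
    end
  end

  class CompiledCode
    def initialize(&fast_path)
      @fast_path = fast_path
      @valid = true
    end

    def discard!
      @valid = false
    end

    def call(*args)
      return @fast_path.call(*args) if @valid  # fast path, no method lookup
      interpret(*args)                         # deoptimise: back to the interpreter
    end

    def interpret(*args)
      args.reduce(:+)   # stand-in for "look the method up again and carry on"
    end
  end

  fixnum_plus_unchanged = Assumption.new
  add = CompiledCode.new { |a, b| a + b }
  fixnum_plus_unchanged.depend(add)

  p add.call(2, 2)                   # => 4, via the fast path
  fixnum_plus_unchanged.invalidate!  # someone redefined Fixnum#+
  p add.call(2, 2)                   # => 4, via the interpreted fallback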

So Kernel#eval makes no difference - if something that you eval ruins your later cached method calls in the same method, that's not a problem because if you're still running the same machine code, then you can't have redefined Fixnum#+. If you had redefined it, you'd be back in the interpreter getting ready to compile again with new caches.

I'll also just point out that running RubySpec means we are successfully running something like 5000 lines of off-the-shelf unmodified systems code, just for the harness before we even get to the tests.

Our theory is that we can make Ruby very fast, without having to forgo any of your favourite random dynamic monkey-patching features.

Watch the video: http://medianetwork.oracle.com/video/player/2623645003001

Join us on the mailing list: http://mail.openjdk.java.net/mailman/listinfo/graal-dev


Would you mind providing some "grand order of things" hand-wavey estimate as to when exactly the public can expect to have Fast Ruby ™ © ®?

Also, will it be the very same Ruby we all know, compatible with everything? I.e. will it be the Christmas I envision?


I'm afraid I can't - sorry. Keep an eye on the mailing list or follow me on twitter (@ChrisGSeaton) though.


Done, thanks. And good luck to you guys.


>Unless your compiler understands eval() and it is possible for it to reason about the contents of the eval string, it can make pretty much zero guarantees about the state of the world after an eval() call, and so it can make pretty much zero guarantees about the state of the world after any method call that could reach such an eval() call.

That's also true for Javascript, but it hasn't stopped three separate teams (Mozilla, Apple, Google) making it crazy faster than Ruby.


First of all "Ruby" is not an implementation. The performance differences between MRI and the different patched versions of it, MacRuby, MagLev, jRuby, Rubinius etc. can be substantial.

[edit: I referenced https://github.com/cogitator/ruby-implementations/wiki/List-... here, but then I read through the rest of the list, and too much of it is "junk", so I added the list of the more mature implementations above instead]

But the point is not that making speed improvements is impossible. I strongly believe Ruby can be made as fast as current JS implementations or better.

But there's a vast gap between being able to, and for performance numbers based on 45% of RubySpec to mean anything about what we can expect to see in terms of performance of this particular implementation.

I'm working on a Ruby compiler myself (woefully incomplete; certainly vastly faster than MRI on the tiny subset it can compile, but pointless to benchmark for exactly the reasons I stated: I have no way of telling how much performance it'll lose to deal with method invalidation etc.), and I absolutely love that more people try to implement Ruby and it'd be fantastic if these performance gains stay as they flesh it out.

As someone else pointed out: several Ruby implementations started out with impressive numbers on some subset of the language. Then they slowed down more and more as code was added to handle the corner cases of the language.

Maybe these guys will do better. Maybe they won't. The current benchmark does not tell us either way.


>First of all "Ruby" is not an implementation.

Sure, but I was obviously referring to MRI. Not to mention that, back in my day, Ruby WAS an implementation.

>But there's a vast gap between being able to, and for performance numbers based on 45% of RubySpec to mean anything about what we can expect to see in terms of performance of this particular implementation.

Sure I agree.

>As someone else pointed out: several Ruby implementations started out with impressive numbers on some subset of the language. Then they slowed down more and more as code was added to handle the corner cases of the language.

I wonder then, if all that talent is not better served by implementing a BETTER version of the core 50% or 70% Ruby language as a new language, getting rid of edge cases and garbage. Perhaps add some good stuff from Python in for good measure.

Matter of fact, didn't Matz do something like that, with an embedded-oriented Ruby like language recently?


> I wonder then, if all that talent is not better served by implementing a BETTER version of the core 50% or 70% Ruby language as a new language, getting rid of edge cases and garbage.

No, if you want languages that are less expressive but faster, there are plenty of options available. The thing is, MRI itself keeps getting faster while implementing the whole of Ruby. The fact that new Ruby implementations that implement the easiest bits first tend to start out much faster than the mainline interpreter on code that only uses those most-straightforward-to-implement bits, but then tend to converge closer to the speed of the mainline interpreter as the implementation gets more complete, doesn't mean that it would be better to make a new language. Much of the experience of those alternative interpreters is relevant to making improvements in the mainline interpreter that speed up, or otherwise improve, "complete" Ruby (including times when the alternative interpreter becomes the mainline interpreter, as occurred with YARV, which was an alternative interpreter before it became the mainline interpreter in 1.9).

> Matter of fact, didn't Matz do something like that, with an embedded-oriented Ruby like language recently?

If you are referring to mruby, that's an embedded-oriented Ruby implementation, not a Ruby-like language.


Losing too much of Ruby would make it pointless. But there are certainly edge cases that we could lose readily and not care very much.

When was the last time you saw anyone redefine operators on Fixnum for a good reason, for example?

My pet peeve is small things like freezing many of the base classes by default, for example. As well as a "meta programming module" in the standard library that'd gather up as much as possible of what people (ab)use eval() for today in specific, narrowly tailored methods that implementations could provide specialised versions of. A limited "general purpose" eval() that is not allowed to modify the class hierarchy would be another good thing - anything that'd combine to allow implementations to defer type and method cache guards and invalidations as long as possible would make it far easier to improve performance, with very little impact on most developers.


> When was the last time you saw anyone redefine operators on Fixnum for a good reason, for example?

Special-case restrictions like the ones that would be necessary to prevent this increase, not decrease, the overall complexity of the language, even if they decrease the complexity of the implementation. That makes it harder to keep a mental grasp on the language.

> As well as a "meta programming module" in the standard library that'd gather up as much as possible of what people (ab)use eval() for today in specific, narrowly tailored methods that implementations could provide specialised versions of.

Rather than such a module, the normal evolution of the core has actually been to include, in the appropriate places (usually as methods on Object, Module, or Class), methods that capture the common use cases of eval. But those uses evolve, and removing general-purpose eval would both limit the flexibility of the language and limit the signal that eval usage patterns provide for the future development of the language.

> A limited "general purpose" eval() that is not allowed to modify the class hierarchy would be another good thing

An eval that interprets a different language that is almost like the Ruby that the implementation it is hosted in evaluates would be yet another layer of complexity in the language.

> A limited "general purpose" eval() that is not allowed to modify the class hierarchy would be another good thing - anything that'd combine to allow implementations to defer type and method cache guards and invalidations as long as possible would make it far easier to improve performance, with very little impact on most developers.

While the developers that directly use the impacted features might be a small number, the group of developers that use software that relies on those features under the hood would be much bigger.

There are plenty of languages that aim to be performant at the cost of expressiveness. There's no reason for Ruby to change to compete in that space.


>No, if you want languages that are less expressive but faster, there are plenty of options available.

I'm not sure there are in the style we're talking about. Only Lua comes to mind. Maybe I'd add Julia there too.

A modern Python/Ruby replacement, built for speed and with a large-ish community would be nice to see. Even with static inferred typing.

It's not like we can't have new languages anymore. After all both Ruby/Python came out of nowhere around 1992-4, a time where there was no modern web and even less resources to grow a language.


> A modern Python/Ruby replacement, built for speed and with a large-ish community would be nice to see.

Both Python and Ruby have performance as main areas of focus for improvement, having largely met their goals in terms of expressiveness. So, to a large extent, that's what each new version of Python and Ruby already is.

If you mean "built for speed first", then there are plenty of those (though they aren't really expressly aimed as Python/Ruby replacements, because "built for speed first" isn't Python or Ruby's focus, so something built that way isn't really targetting Python or Ruby, even though it may be targeting some subset of the places where Python and Ruby are currently applied.)

Many new languages fit this niche (languages designed with performance as a key focus that target some subset of the use cases of Python/Ruby). But for the most part, they aren't very Python/Ruby-like, because the difference in goals leads to much bigger changes than slicing off features of Python/Ruby.

> It's not like we can't have new languages anymore.

No, it's like we have lots of new languages, as well as lots of existing languages, and lots of use for improved versions of existing languages, so it doesn't make a lot of sense for people who aren't the people involved to say that people currently working on new Ruby implementations should stop working on them and instead work on new "Ruby-like" languages with reduced features that fit niches that aren't what Ruby is targeted at, but are what other existing and new languages are already targeted at.

Both because there are plenty of people already working on what you want, and because "people should stop working on what is important to them and spend their time working on what is important to me" is generally silly when you aren't paying the people in question for their time/effort.

> After all both Ruby/Python came out of nowhere around 1992-4

If you are talking first public release, that's "1991-1995", if you are talking 1.0 release that's "1994-1996" (in both cases, Python and then Ruby.)


The Ruby developer in me wants to have his lunch and eat it.

I'm sure there's a way to have the crazy whole that is Ruby AND the speed. Dammit, JavaScript is fast now, and it's not like it was more optimisation-ready than Ruby is to begin with.


> I wonder then, if all that talent is not better served by implementing a BETTER version of the core 50% or 70% Ruby language as a new language, getting rid of edge cases and garbage.

I would very much like to see that. But I think we first need more competition to MRI.

E.g. note the proposal for "refinements" for Ruby 2.1, which is an implementation nightmare for anyone who wants a fast implementation. See Charles Nutter's (of jRuby fame) comments about it, for example. Since MRI is as slow as it is, there's little incentive to keep features like that out - it won't kill MRI performance.
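
For anyone who hasn't run into them: refinements scope a monkey-patch lexically, so the same receiver and selector can resolve to different methods depending on whether a "using" is in effect at the call site - which is roughly why they are painful for method caches and JITs. A rough illustration (refinements were experimental in 2.0 and revised for 2.1):

  module Shouty
    refine String do
      def upcase
        super + "!!!"   # super calls the unrefined String#upcase
      end
    end
  end

  p "hi".upcase   # => "HI"     - refinement not activated here

  using Shouty    # from here to the end of the file, the refinement applies
  p "hi".upcase   # => "HI!!!"  - same call, different method, purely due to scope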

It's also hard to determine exactly what the viable subset should be, since most of those of us who would like a faster Ruby also love Ruby a lot because of how dynamic it is when it is used right, and would jump ship if the language lost too much flexibility in the quest for that performance.

Part of my own motivation for writing about writing a Ruby compiler is exploring what parts of Ruby can be implemented efficiently because frankly it's hard to even guess exactly how an "efficient Ruby" should look.

There are some obvious problems for implementations that want to boost performance, like too much reliance on eval() and defining eigenclasses on objects, as well as autoload, require and include, all of which can worst case trash all method caching and optimisations all over the place.

But throwing out all of that would be brutal, especially given common Ruby patterns like dynamically require'ing everything in a directory at application startup, which should be fine, vs. a "require" occurring later in execution. And it's not clear that there aren't other common patterns that'll cause a lot of pain.

I think we will at least see implementations with options to disable support for certain things, or with support to let applications declare "from now on, no shenanigans" so the implementation can take shortcuts; that would be very helpful.

There are a lot of things developers could do themselves that would let even a compliant implementation speed up. E.g. calling #freeze on all classes you have no intention of modifying, somewhere that is easily identified by relatively superficial analysis, would make a massive difference (suddenly you can cache lots of extra methods, and even inline and unroll things like "each" loops in many cases that would otherwise be an expensive flurry of method calls).

Other things are about developer practice: Freeze all objects you don't want to modify ASAP on creation. A good implementation could make good use of that too. But today there is no incentive for Ruby users to write Ruby that is amenable to fast execution because the implementations don't take advantage of it.
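
A minimal illustration of the freeze idea (today's implementations don't exploit it for speed, which is exactly the point, but a smarter one could treat a frozen class as "this will never change"):

  # Freeze a class you never intend to reopen. Once frozen, attempts to
  # monkey-patch it raise, so an implementation could in principle cache or
  # inline its methods without fear of later invalidation.
  class Point
    def initialize(x, y)
      @x, @y = x, y
    end
  end
  Point.freeze

  begin
    class Point
      def x; @x; end    # reopening a frozen class to add a method is an error
    end
  rescue => e
    puts e.class        # RuntimeError on older MRI, FrozenError on newer versions
  end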

Once there is an implementation that demonstrates the advantages of writing Ruby in a subset that is more amenable to fast execution, I think it'd be possible to get traction for deprecating and removing features that are a performance nightmare.

> Matter of fact, didn't Matz do something like that, with an embedded-oriented Ruby like language recently?

Sort of, though mruby appears to focus on size and ability to embed rather than specifically picking a subset that's amenable to a fast implementation. It's still a bytecode interpreter, for example, and it's not even aiming for complying with the (already limited) ISO standard for Ruby.


>Sad fact is that MRI, no matter how much effort we put in, would never be able to match the GCs of a battle-tested VM like the JVM.

Nothing impossible about it happening, though. Just money and time.


Ruby and Java were both released in 1995. It's now 18 years later and MRI can't keep up with a 1998 JDK. Could be a long wait...


It's not about absolute time. I mean, a GC that's worse than another GC that is 15 years older does not necessarily need 15 years to catch up to it.

It might never catch up, or it might catch up in just a few years, if people and funds are concentrated on that goal.

For example, starting from the legacy of the usual dog-slow JS implementations, we got a feature-complete V8 in less than 2 years, and it could run rings around them. Same for the JITed JS engines of Mozilla and Apple.

And while Google and Apple have tons of funds, Mozilla-scale development is not that far from the reach of a typical open source project with industry backers.


Is Github still running Ruby 1.8? vmg mentioned at last year's Rubyfuza they had "one of the largest 1.8 deployments in the world".


Nope, we're on 1.9. We were briefly on 2.0 (last week) but had some minor performance regressions that we need to understand before we go back.


For anyone interested in how it will be implemented, see ko1's presentation slides from Euruko 2013.

http://euruko2013.org/speakers/presentations/toward_more_eff...


Are the videos of the talks available? or more details about the implementation of the new architecture?


They will be eventually on http://www.baruco.org/. The conference has only just finished.





