*As a very crude overview of performance, JRuby+Truffle with the latest development version of GraalVM across 63 benchmarks is over 31x faster than MRI 2.3.0, over 32x faster than Rubinius 2.5.8, over 17x faster than Topaz (Ruby using PyPy technology) and over 24x faster than JRuby 9.0.4.0 with invokedynamic*
I haven't looked at the benchmarks, but are they synthetic or are they actually relevant to typical rails usage?
>> … almost 6x faster … over 50x faster … We think this shows how we can optimise more effectively on more realistic Ruby code than synthetic benchmarks. <<
Might the difference simply be test coverage?
-- The other Ruby implementations have been testing performance on those same synthetic benchmarks, and have already taken the opportunity to improve performance for those cases.
-- The other Ruby implementations have not been testing performance in other cases, and still have considerable opportunity to improve performance for those cases.
I think the difference is that the synthetic benchmarks are generally written in a way that is as tight as possible, avoids allocation and abstraction, and they certainly don't use metaprogramming. That stuff is easier for everyone to optimise.
Real Ruby code uses a lot of abstraction, allocates objects constantly, and uses metaprogramming. Optimising these aspects of Ruby is much more complex and doing it well requires some optimisations such as partial escape analysis and powerful allocation removal that we have and JRuby and Rubinius do not.
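As an illustration of the kind of code I mean, here's a minimal made-up sketch (the `Record` class and its attributes are invented for this example, not taken from any real gem) of the metaprogramming patterns that real Ruby code uses and synthetic benchmarks avoid:

```ruby
# A hypothetical record class built with metaprogramming, the kind of
# pattern common in real gems but absent from synthetic benchmarks.
class Record
  def self.attribute(name)
    # define_method creates closure-backed methods at runtime;
    # optimising calls through them requires inlining across the
    # metaprogramming boundary.
    define_method(name) { @attrs[name] }
    define_method("#{name}=") { |v| @attrs[name] = v }
  end

  attribute :width
  attribute :height

  def initialize
    @attrs = {}
  end

  # method_missing adds yet another dynamic dispatch layer.
  def method_missing(sym, *args)
    @attrs.key?(sym) ? @attrs[sym] : super
  end

  def respond_to_missing?(sym, include_private = false)
    @attrs.key?(sym) || super
  end
end

r = Record.new
r.width = 640
r.height = 480
```

Every call to `r.width` here goes through a runtime-generated method and a hash lookup, and a fast implementation has to see through all of that.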
My favourite example is this code from PSD.rb that implements a clamp routine. It does it by creating an array, sorting and finding the middle value. You wouldn't normally find code like this in a synthetic benchmark, but you would in real code.
def clamp(value, min, max)
  [value, min, max].sort[1]
end
In JRuby and Rubinius that code really will allocate an array, sort it using some library routine, and then index it. In JRuby+Truffle we compile that method to effectively:
def clamp(value, min, max)
  (value > max) ? max : ((value < min) ? min : value)
end
There's a massive difference between those two. One allocates objects on the heap, passes them into the runtime, runs a general-purpose sort routine, and so on: thousands of machine instructions. The other is just a couple of assembly instructions.
When you run this code as a benchmark, we're over 300x faster than Rubinius' LLVM-based JIT.
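If you want to try something like this yourself, here's a minimal harness sketch (the iteration count and workload are arbitrary choices of mine, not the actual benchmark; a real JIT comparison also needs proper warm-up and multiple runs):

```ruby
require 'benchmark'

def clamp(value, min, max)
  [value, min, max].sort[1]
end

# Run the kernel in a tight loop. On a JIT-compiled implementation
# you'd want enough iterations for compilation to kick in before
# trusting the numbers.
n = 100_000
time = Benchmark.realtime do
  n.times { |i| clamp(i % 300, 100, 200) }
end
puts format('%d clamps in %.3fs', n, time)
```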
Of course we still support the case where someone has redefined Array#sort or something like that, and you could still find that Array instance using ObjectSpace if you wanted to, thanks to deoptimisation.
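To make that concrete, here's a small sketch (my own illustration, not code from the post or any gem) of the semantics an optimising implementation must preserve, which is why deoptimisation is needed:

```ruby
def clamp(value, min, max)
  [value, min, max].sort[1]
end

raise unless clamp(5, 0, 10) == 5   # in range
raise unless clamp(-3, 0, 10) == 0  # below min

# Monkey-patching Array#sort at runtime: an implementation that
# compiled clamp down to bare comparisons must deoptimise and honour
# the redefinition, because Ruby allows this at any time.
class Array
  def sort
    reverse # deliberately wrong "sort" so the effect is visible
  end
end

# Now clamp picks the middle of the reversed literal array instead.
clamp(5, 0, 10) # => 0, because [5, 0, 10].reverse[1] == 0
```

The compiled fast path is only valid as long as Array#sort keeps its built-in meaning; the moment that assumption breaks, the runtime has to fall back.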
No. JRuby and Rubinius both have benchmark suites, but I believe they don't go as far as kernels from real gems, and neither of them track benchmarks in any kind of continuous integration system, which is why I developed Bench 9000 as part of my PhD.
But if they were to benchmark and see that things like that pack method were slow, I think it is unlikely they would be able to implement the algorithms needed to improve on this kind of code, given their current implementation techniques.
Rubinius is essentially a template compiler, emitting a chunk of LLVM IR for each bytecode. There isn't any sophisticated optimisation before it goes into LLVM, so nothing to, for example, partially evaluate a sort routine or remove allocations. The LLVM IR that comes out is far too complex for LLVM's optimisations to work on it.
JRuby relies on the JVM to do the sophisticated optimisations, and C2 (the server compiler) just doesn't have the optimisations or inlining scope needed to simplify code like the pack example. JRuby are massively improving on this with their IR, but they are going to have to reimplement some very complex optimisations themselves to make this work on methods like pack.
>>But if they were to benchmark and see that things like that pack method were slow, I think it is unlikely they would be able to implement the algorithms needed to improve on this kind of code, given their current implementation techniques.<<
That may be.
I think it unlikely they would be able to improve on this kind of code without performance tests for this kind of code.
I'm not sure what you're getting at any more. These aren't benchmarks we've pulled out of nowhere. It's existing Ruby code that people are running to make money right now. Any other implementation could have tried to improve on the performance by running it just as we have.
Correct. We do not currently support RubyGems. It's something we're working towards, but it's a fairly complex project, and not having OpenSSL support currently limits the ability to install gems quite a bit. However, we do ship with a tool that will install gems using JRuby without Truffle and then run your script with the $LOAD_PATH set up appropriately. Please see:
Re: Rails ... there's still a fair bit of work involved there. We've been working on passing all the ActiveSupport tests (we're currently at 99%), as that's a core dependency. We haven't looked much into the other gems. Things like ActiveRecord simply won't work since we currently don't run C or Java extensions. I think it's a bit more likely we'll start with a custom driver of some sort with an ActiveModel front-end. I strongly suspect the asset pipeline and Spring will present problems, as well. The rest of Rails should pull together somewhat quickly and that's a big goal for us in 2016.
News on Truffle/GraalVM has been spotty at best, but this report shows the tech is really quite phenomenal! One wonders how Nashorn (the new JavaScript impl) performance will turn out. Could Oracle end up owning the fastest JS?
Great work, Chris Seaton et al. I was very impressed with Chris's presentation at Rubyconf about Truffle a few years ago. I look forward to your efforts and results in 2016!
Congrats on the progress and thanks for the very useful update. Unlike many messages of this sort, yours answered all my questions, was appropriately respectful to alternative implementations, and included URL footnotes. Nice job.
I'd be lying if I said you could find something useful to contribute in just one evening's work, but if you have a few evenings you could probably pick a core library class, see what is currently missing in it, and fill in a few gaps.
You'd need to understand Java and Ruby, but we can help you with the rest.
Yes we have an interpreter for C to run C extensions. It runs real C extensions from Oily PNG and PSD Native 3x faster than native, due to inlining C and Ruby together.
People are now working on an LLVM bitcode interpreter instead, as this should support many languages not just C. There will be info about this at FOSDEM.
The Truffle backend in JRuby+Truffle uses a different class hierarchy that makes supporting the Java extension API tricky. It's certainly doable and we haven't ruled it out, but the API parts of JRuby really need to be ironed out first. Right now it's essentially anything that's public in JRuby core, which is a bad situation to be in. Various attempts at rectifying that have predictably broken third party code, so it needs to be handled with care.
Of course, rather than trying to fix the general problem, we could support the most common JRuby gems out there or the core subset of the API they use. These are just the sorts of things we need to look at a bit more.
Thanks for replying.
What you are talking about is potentially a big issue. Like it or not, the biggest usage of JRuby is in running Rails. Unlike Jython, which not a lot of people use, JRuby is extremely popular.
One of the reasons for its popularity is the way it can be hosted using Tomcat, or built into a jar and dropped into a container ... a nice little trojan horse into the world of enterprisey software.
I would be very excited for Truffle+JRuby if it outperforms in this use case. Where do you envision Truffle/JRuby being used: as a drop-in replacement for MRI or for JRuby?
Or are you thinking of a third ecosystem around this project ?
One of our stated goals is to run code without modification, including extensions. We have no intention of creating a third ecosystem. However, with Truffle's polyglot interface, someone certainly could write a JRuby+Truffle specific extension -- that's just not our general plan for extensions.
Early in its life, JRuby+Truffle was a separate project and its goal was to run everything MRI can. Thanks to efforts on both the JRuby and Oracle sides, the project was open-sourced and merged with JRuby. That brought along a ton of benefits for us, but did open the JRuby ext. can of worms.
I'm trying hard to answer your question without committing to solve a problem we haven't looked at in depth. I suspect we'll wind up supporting some portion of the JRuby ext. API, but whether that ends up via an alternative implementation, a sandboxed JRuby runtime, a bytecode processor using something like a hypothetical TruffleJava, or something entirely different, I don't know. We've had enough pure Ruby code to try to get working that we've been able to defer the extension discussion for a bit.
Having said all that, to really get the benefits of JRuby+Truffle you're going to want to use Graal. While JEP 243 [1] should make deployment scenarios much easier starting with Java 9, you'll probably still need to coordinate with your ops team since the JAR needs to be on the bootclasspath. That's really just to say that you won't be able to just drop the JRuby JAR in and get optimal performance -- there's still going to be a second step that you might not be able to "backdoor" into your stack.
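A rough sketch of what that second step might look like (the flag names and jar paths here are my assumptions based on the JVMCI mechanism JEP 243 describes for that era, not exact instructions from the project):

```shell
# Hypothetical launch: enable JVMCI, add the Graal jar to the boot
# classpath, then run JRuby with the Truffle backend (-X+T).
java -XX:+UnlockExperimentalVMOptions \
     -XX:+EnableJVMCI \
     -Xbootclasspath/a:graal.jar \
     -jar jruby-complete.jar -X+T my_script.rb
```

The point being that ops has to control the `java` invocation itself, which is exactly the coordination step you can't backdoor past.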
Look, I appreciate the engineering, but I'm genuinely curious if you think taking the JVM and building an MRI-compatible stack with C transpilation is the perfect first-order goal.
I'm not sure why you refer to the JRuby ext API as a can of worms - perhaps there are some VM-level dragons I'm not aware of... but the JRuby ecosystem is brilliant. I built my first software services startup on top of JRuby - custom enterprise software that runs on Tomcat (the idea came from Mingle, a JRuby-based enterprisey Basecamp from ThoughtWorks).
Even in a general use case, JRuby is brilliant when used with Puma. This deployment is unbeatable by any other Ruby server. However, startup times in the Java world are definitely a pain... which is why developers tend to use MRI. In fact I know lots of developers who develop in MRI and deploy on JRuby.
I think if you decide to take the direction of building a JRuby-compatible implementation with MRI-beating startup times and even one fourth of the performance you are talking about... tell me where to sign the cheque ;)
Hi there - no problem. Feel free to ask any question you'd like. You're not going to hurt our feelings, and we'd rather clarify any confusion than let it linger and manifest into some weird FUD. On that note, we're quite active in the JRuby IRC and Gitter rooms, so if you ever have a question you want answered in more real time, feel free to hop in and ask away.
The "can of worms" comment wasn't meant to be pejorative -- only that there's a lot to the problem and it's not exactly clean. There simply isn't an extension API in JRuby. There's just the JRuby JAR and whatever is publicly reachable.
The situation today is there are two backends available for JRuby: the standard IR-based one and the Truffle-based one. The two backends make use of shared infrastructure, but are otherwise quite distinct. E.g., both backends have different object models for Ruby-level classes & objects. Take Ruby strings for instance. They use a shared abstraction for the underlying implementation, but there's no shared interface -- it's all composition. Creating a Ruby string in a Java extension means importing org.jruby.RubyString and creating an instance of it. JRuby+Truffle represents strings differently, so creating a string that way presents a problem.
The simplest thing to do is translate everything between the two runtimes, but that incurs a lot of overhead, unnecessary object allocations, and grossly limits our ability to optimize calls. The harder solution is to provide a JRuby+Truffle implementation of the API, but given its current lack of definition, we'd have to support the de facto API for 100% compatibility and the de facto API is massive. We're working with the JRuby core team to annotate a portion of the API to be used for extensions, at which point the problem becomes much easier to manage.
FWIW - Prior to joining Oracle Labs I made a big gamble on JRuby for my start-up (Mogotest). I love JRuby and have been working with it for years outside of my current gig. I'd love to see JRuby+Truffle simply work with Java extensions, because they can be really nice (JDBC + Sequel is amazingly fast). Unfortunately, it's just a messy issue at the moment.
May I suggest a blog post on this? What you have written here on HN has been far, far more interesting than the original blog post and is making me excited to dive into the source code.
Unfortunately, our startup time is actually a bit worse than JRuby's. We end up having to essentially bootstrap two different runtimes, since JRuby+Truffle makes use of functionality from the main JRuby runtime. It's a known problem and we have concrete steps to take on improving the situation. It will get better, but you'll likely be disappointed with startup time today.
In addition to the steps we're taking, there's a sister project in Oracle Labs called the Substrate VM that would allow us to build a static binary with startup time on par with MRI. I can't speak to release plans or timelines on anything. I only mention it to show that startup time is a concern for us and we're thinking about how to fix it in a wide variety of ways.
Graal will be available as a plugin to stock OpenJDK's HotSpot in Java 9: http://openjdk.java.net/jeps/243 (although I don't know what it means regarding the production-readiness of Graal itself)