JIT optimizations are the way forward for dynamic languages, and as they showcase, it's really not that hard to improve Ruby / Python by a factor of 10 using a JIT (and other common optimizations such as inline caching). Java got a 20x improvement by moving to the HotSpot VM (which introduced JIT compilation).
Speeding up real-world applications can actually range from being very easy to incredibly difficult, depending on the facilities they rely on. And we are far from finished: while number-crunching applications can see substantial speed-ups, Rails is currently not accelerated by our modified interpreters.
As a services business, however, the CrossTwine Linker kernel allows us to quickly develop language-, domain-, and even problem-specific interpreters. In effect, we can “cheat” as much as we want if we know what the target problem is. (Cheat safely, that is; a problem-specific customization would retain 100% compatibility with the language—it would just be much faster on some kind of workloads.)
So while our demos currently feature generic interpreters for Python 2.6 (only in the labs for now) and 3.1, plus Ruby 1.8 and 1.9, we can “easily” whip up an enhanced custom interpreter tuned for whatever performance problem you are experiencing; and we can decide to integrate these improvements into the mainline on a case-by-case basis.
How similar or dissimilar is CrossTwine Linker to PyPy? It sounds quite similar from what I was able to read, in the sense that it is a JIT VM with pluggable frontends. Does it support pluggable backends as well, or just the frontend?
There are quite a few similarities, but the design trade-offs are radically different:
* CrossTwine Linker is designed to be integrated into an existing language interpreter, without modifying its behavior besides improving execution performance. Any other difference (including when dealing with source and binary C/C++ extensions) is probably a bug.
* It is not a VM (nor a VM compiler), but rather a (constantly evolving) set of C++ engine pieces which can be adapted using “policy-based design” (see the sketch at the end of this comment). So while the demos aim to be generally fast, we can tune the engine and its adapters for any specific use case; e.g.: you embed the Python interpreter in an application using the standard C API (no lock-in), and we make sure the binding behaves optimally with your application objects. (Note: we can also implement the binding as part of the service.)
* As such, everything is pluggable—except that we only target native hardware (PyPy is much more ambitious). But that's theory; I'll defer further comments on the backend to when we get the chance to implement a second one!
[Edit] Note that this means we support the full standard library—and all of its quirks—from day one. Which is both a curse and a blessing, because it is written with interpreted execution in mind, but that can be improved as time passes.
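To give a rough idea of what “policy-based design” means here, a minimal C++ sketch (the class and policy names are made up for illustration; this is not the actual CrossTwine Linker API):

    #include <cstddef>

    // Hypothetical allocation policy: chosen at compile time, no virtual dispatch.
    struct MallocAllocPolicy {
        static void* allocate(std::size_t n) { return ::operator new(n); }
        static void release(void* p) { ::operator delete(p); }
    };

    // A hypothetical engine piece parameterized by its policies.
    template <typename AllocPolicy>
    class Engine {
    public:
        void* make_object(std::size_t n) { return AllocPolicy::allocate(n); }
        void drop_object(void* p) { AllocPolicy::release(p); }
    };

    int main() {
        Engine<MallocAllocPolicy> engine;  // behavior selected at compile time
        void* obj = engine.make_object(64);
        engine.drop_object(obj);
    }

Swapping in a different policy type reconfigures an engine piece with zero runtime overhead, which is what lets the same kernel be tuned per language, per domain, or per customer.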
I applaud your efforts. One thing to note, though, is that the market share of Python 2.x is a lot larger than that of Python 3.1, and I think this will be the case for some time, as Python 3.x requires lots of porting.
It would be great if you could optimize an interpreter for web applications, where Python is used a lot. I would be very interested in paying for a web-application-optimized interpreter, and you could probably do this optimization more generally?
You're not the only one to mention Python 2.x, and we've learned quite a few things about the language since we started this effort. :) I have updated the web page to mention that we now have a fully functional xtpython2.6 in the lab! It will ship with the alpha 2 release, which is due around mid-May.
Even I was surprised that coming up with this took only a couple of days of work: the two versions of the interpreter are very similar in structure, so most adapters could be reused as-is (Ruby 1.9 → 1.8 was more… challenging, to say the least).
I have only started looking into web applications and what can be done about them. They are a more delicate target than e.g. scientific computing, because many pieces (a lot of which are written in C) are involved in each request. Interestingly, while Django currently sees very modest speedups (4%) on the unladen-swallow benchmarks, Spitfire is up by 70%. Oh well, this wouldn't be as fun if my TODO list were too short, would it?
Note that JIT compilation in itself does not improve speed that much, but it's an important enabler for partial specialization.
I actually believe HotSpot gets a lot of its performance from the fact that the language is “fully managed,” and thus allows for very efficient garbage collectors and other runtime mechanisms. AFAIK, HotSpot actually does not do that much partial specialization besides—very effective—method inlining and lightweight inline caching.
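For readers who haven't met the term: a monomorphic inline cache remembers, at a call site, the last receiver class and the method it resolved to, guarded by a cheap pointer comparison, so repeated calls skip the full lookup. A minimal C++ sketch of the idea (toy types, not HotSpot's or CrossTwine's actual machinery):

    #include <cstdio>
    #include <string>
    #include <unordered_map>

    using Method = void (*)();
    struct Klass { std::unordered_map<std::string, Method> methods; };
    struct Object { Klass* klass; };

    void greet() { std::puts("hello"); }

    // A call site with a monomorphic inline cache.
    void call_greet(Object* obj) {
        static Klass* cached_klass = nullptr;
        static Method cached_method = nullptr;
        if (obj->klass == cached_klass) {            // fast path: pointer check
            cached_method();
            return;
        }
        Method m = obj->klass->methods.at("greet");  // slow path: full lookup
        cached_klass = obj->klass;
        cached_method = m;
        m();
    }

    int main() {
        Klass k; k.methods["greet"] = greet;
        Object o{&k};
        call_greet(&o);  // slow path, fills the cache
        call_greet(&o);  // fast path, pointer compare only
    }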
Sigh, another Ruby interpreter claiming some n-times increase in performance. I really wonder how much faster Ruby APPLICATIONS would be if people spent more time updating gems and libraries to work with 1.9 and working on the MRI (YARV atm), rather than building oodles of Ruby interpreters.
Heh, what's funny is that I do it all the time. I'll see a library that has a bunch of cruft, and say "Well this looks like a good concept, but I'll just rewrite it!" It's almost comical how much it happens :)
From the link: "The CrossTwine Linker engine reduces the execution times by a factor of up to 4×." So let's say my program takes 10 seconds to run, then this will reduce my run time by 4 x 10 seconds, so it will run in -30s? I know it's pedantic, but things like this kill me.
Also, performing computationally intensive tasks in Ruby is somewhat akin to racing the Tour de France on a fixed-gear bicycle. Sure, you'll look cool, but you're doing it wrong.
Heh, heh. I'll see if I can come up with a better formulation; any suggestion?
I agree with your remark about Ruby, but we're now looking into what can be done for speeding up real-world code, Rails being an obvious target. Also, Python is quite popular for scientific computing, and we can integrate CrossTwine Linker into other environments—including proprietary ones.
Python is popular in scientific computing circles because of NumPy and SciPy, which use C++ and Fortran (!) libraries for the heavy lifting.
If you want to stick to the original phraseology, then I think "The CrossTwine Linker engine reduces the execution times of these synthetic benchmarks by up to 75%" is more appropriate, though I would prefer something that highlights the quixotic nature of optimizing computationally intensive Ruby code. I suggest the following.
"Here at CrossTwine, we understand that when all you have is a hammer everything looks like a nail. Our integrated CrossTwine hammer has been optimized specifically for pounding all kinds of nails! Got a nail with threads on the outside? CrossTwine! Nail with a hex head? CrossTwine!"
I kid, I kid. If I could earn a living optimizing interpreters all day long I certainly would. Incidentally, I really like the name "CrossTwine" despite the fact that I don't understand how it relates to your product at all. Can you explain it to me?
I totally agree about NumPy and SciPy, but that doesn't mean the glue between these parts should not be optimized! Lots of uses of Python involve modules made of native code, and we do not pretend we can eliminate them overnight. Quite the opposite: we want to make sure both worlds work well together, and empower people to do more ambitious things on top of these concrete (in both senses of the word) foundations.
Another variation: let's say you have a large application written in C/C++ and want to provide a dynamic language for scripting/extensibility (think expert systems or CAD, where extensive customization is necessary). Not only can we provide recommendations and do the integration work for you (we're a services company), but we can sprinkle the whole thing with a bit of customized magic which will give the “interpreter” an intimate knowledge of your domain objects, removing the “speed hit” barrier and opening new possibilities.
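For concreteness, the embedding side of that scenario is just the standard CPython C API; a minimal sketch of a host application starting an embedded interpreter:

    #include <Python.h>

    int main() {
        Py_Initialize();   // start the embedded Python interpreter
        PyRun_SimpleString("print('hello from the embedded interpreter')");
        Py_Finalize();     // shut it down
        return 0;
    }

Because the host only talks to the documented C API, a tuned interpreter can be swapped in underneath without the application changing, which is the “no lock-in” point made earlier.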
Which brings me to your question: what CrossTwine Linker does is look at the objects, functions and modules involved in a scenario, and try to weave the fast paths together so that little dynamism is left; i.e., it ties “objects” together with a native code “thread.” That napkin drawing looked awfully like a ball of twine, we needed a name, and it seemed to be the only domain on the Internet which had not been judiciously “parked.”
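One way to picture that weaving, as a sketch only (invented names, nothing like actual CrossTwine Linker output): once the engine has observed the concrete types flowing through a scenario, it can emit a straight-line fast path protected by guards, falling back to the generic machinery when an assumption breaks.

    #include <cstdio>

    enum class Tag { Int, Other };
    struct Value { Tag tag; long i; };

    Value generic_add(Value a, Value b);  // the fully dynamic slow path

    // Specialized fast path "woven" for the observed case: both operands ints.
    // The guard is all that is left of the original dynamism.
    Value add_int_int(Value a, Value b) {
        if (a.tag == Tag::Int && b.tag == Tag::Int)  // guard
            return Value{Tag::Int, a.i + b.i};       // straight-line native code
        return generic_add(a, b);                    // fall back when the guard fails
    }

    Value generic_add(Value a, Value b) {
        std::puts("generic path");
        return Value{Tag::Other, 0};
    }

    int main() {
        Value r = add_int_int({Tag::Int, 2}, {Tag::Int, 3});
        std::printf("%ld\n", r.i);  // prints 5 without touching the generic path
    }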
Hm, I just read the whitepaper linked from the URL. I would rather like to see updated benchmarks from the Computer Language Shootout, or at least have the source code of the examples available.
By the way, the “computed goto” patch mentioned in footnote 8 IS the implementation of the threaded-code technique mentioned for Ruby 1.9.x.
Mea culpa, and I agree. We have since switched to the Ruby Benchmark Suite and the unladen-swallow benchmarks as a baseline, so future numbers will at least reference those, and we will generally try to publish our branches and scripts (also for potential incorporation upstream).
P.S. The white paper was in fact written shortly before the “computed goto” patch appeared, and you are right that the dispatch techniques are very similar. Ruby 1.9 goes a bit further, however, by allowing a selection of different techniques at compile time; the default (on x86-64 Linux) uses the opcode handler addresses directly as “bytecode” (i.e. direct-threaded rather than token-threaded code, in current Wikipedia terminology). Koichi Sasada can probably tell you more than I can about the difference in performance for Ruby code, if there is any.
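For the curious, here is direct threading in miniature, using the GCC/Clang computed-goto extension (a toy instruction set, nothing like Ruby 1.9's real opcodes): the “bytecode” stores the handler addresses themselves, so each dispatch is a single indirect jump rather than a switch or a token-table lookup.

    #include <cstdio>

    // Toy direct-threaded stack machine: push1, push1, add, halt.
    int run() {
        // The "bytecode" is a sequence of label addresses (GCC/Clang extension).
        void* program[] = { &&op_push1, &&op_push1, &&op_add, &&op_halt };
        void** pc = program;
        int stack[16]; int sp = 0;

        goto **pc++;                       // initial dispatch
    op_push1:
        stack[sp++] = 1;
        goto **pc++;                       // jump straight to the next handler
    op_add:
        sp--; stack[sp - 1] += stack[sp];
        goto **pc++;
    op_halt:
        return stack[sp - 1];
    }

    int main() { std::printf("%d\n", run()); }  // prints 2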