
CrossTwine Linker Makes Ruby 4x Faster - quellhorst
http://crosstwine.com/linker/ruby.html
======
amix
They make Python 5x+ faster too: <http://crosstwine.com/linker/python.html>

JIT optimizations are the way forward for dynamic languages, and as they
showcase, it's really not that hard to improve Ruby / Python by a factor of 10
by using a JIT (and other common optimizations such as inline caching). Java
got a 20x improvement by moving to the HotSpot VM (which introduced JIT).

~~~
ztzg
Hi Amir,

Speeding up real-world applications can actually range from being very easy to
incredibly difficult, depending on the facilities they rely on. And we are far
from finished: while number-crunching applications can see substantial speed-
ups, Rails is currently not accelerated by our modified interpreters.

As a services business, however, the CrossTwine Linker kernel allows us to
quickly develop language-, domain-, and even problem-specific interpreters. In
effect, we can “cheat” as much as we want if we know what the target problem
is. (Cheat safely, that is; a problem-specific customization would retain 100%
compatibility with the language—it would just be _much_ faster on some kinds
of workloads.)

So while our demos currently feature generic interpreters for Python 2.6 (only
in the labs for now) and 3.1, plus Ruby 1.8 and 1.9, we can “easily” whip up
an enhanced custom interpreter tuned for whatever performance problem you are
experiencing; and we can decide to integrate these improvements into the
mainline on a case-by-case basis.

~~~
llimllib
How similar/dissimilar is CrossTwine Linker to PyPy? It sounds quite similar
from what I was able to read, in the sense that it is a JIT VM with pluggable
frontends. Does it support pluggable backends as well, or just the frontend?

~~~
ztzg
There are quite a few similarities, but the design trade-offs are radically
different:

* CrossTwine Linker is designed to be integrated into an existing language interpreter, without modifying its behavior besides improving execution performance. Any other difference (including when dealing with source and binary C/C++ extensions) is probably a bug.

* It is not a VM (nor a VM compiler), but rather a (constantly evolving) set of C++ engine pieces which can be adapted using “policy-based design.” So while the demos aim to be generally fast, we can tune the engine and its adapters for any specific use case; e.g. you embed the Python interpreter in an application using the standard C API (no lock-in), and we make sure the binding behaves optimally with your application objects. (Note: we can also implement the binding as part of the service.)

* As such, everything is pluggable—except that we only target native hardware (PyPy is much more ambitious). But that's theory; I'll defer further comments on the backend to when we get the chance to implement a second one!

[Edit] Note that this means we support the full standard library—and all of
its quirks—from day one. Which is both a curse and a blessing, because it is
written with interpreted execution in mind, but that can be improved as time
passes.

------
compay
Very interesting, but I think being closed source and not working well with
Rails are probably major obstacles to acceptance.

------
oomkiller
Sigh, another Ruby interpreter claiming some severalfold increase in
performance. I really wonder how much faster Ruby APPLICATIONS would be if
people spent more time updating gems and libraries to work with 1.9 and
working on the MRI (YARV atm), rather than building oodles of Ruby
interpreters.

~~~
quellhorst
Excellent point. There is a gut reaction among coders to throw away code when
they should try a bit harder to work with what's there already.

~~~
oomkiller
Heh, what's funny is that I do it all the time. I'll see a library that has a
bunch of cruft, and say "Well this looks like a good concept, but I'll just
rewrite it!" It's almost comical how much it happens :)

------
aminuit
From the link: "The CrossTwine Linker engine reduces the execution times by a
factor of up to 4×." So let's say my program takes 10 seconds to run, then
this will reduce my run time by 4 x 10 seconds, so it will run in -30s? I know
it's pedantic, but things like this kill me.

Also, performing computationally intensive tasks in Ruby is somewhat akin to
racing the Tour de France on a fixed-gear bicycle. Sure, you'll look cool, but
you're doing it wrong.

~~~
ztzg
Heh, heh. I'll see if I can come up with a better formulation; any suggestion?

I agree with your remark about Ruby, but we're now looking into what can be
done for speeding up real-world code, Rails being an obvious target. Also,
Python is quite popular for scientific computing, and we can integrate
CrossTwine Linker into other environments—including proprietary ones.

~~~
aminuit
Python is popular in scientific computing circles because of NumPy and SciPy,
which use C++ and Fortran (!) libraries for the heavy lifting.

If you want to stick to the original phraseology, then I think "The CrossTwine
Linker engine reduces the execution times of these synthetic benchmarks by up
to 75%" is more appropriate, though I would prefer something that highlights
the quixotic nature of optimizing computationally intensive Ruby code. I
suggest the following.

"Here at CrossTwine, we understand that when all you have is a hammer
everything looks like a nail. Our integrated CrossTwine hammer has been
optimized specifically for pounding all kinds of nails! Got a nail with
threads on the outside? CrossTwine! Nail with a hex head? CrossTwine!"

I kid, I kid. If I could earn a living optimizing interpreters all day long I
certainly would. Incidentally, I really like the name "CrossTwine" despite the
fact that I don't understand how it relates to your product at all. Can you
explain it to me?

~~~
ztzg
Thanks! And thanks for the laugh!

I totally agree about NumPy and SciPy, but that doesn't mean the glue between
these parts should not be optimized! Lots of uses of Python involve modules
made of native code, and we have no pretension of eliminating them
overnight. Quite the opposite: we want to make sure both worlds work
well together, and empower people to do more ambitious things on top of these
concrete (in both senses of the word) foundations.

Another variation: let's say you have a large application written in C/C++ and
want to provide a dynamic language for scripting/extensibility (think expert
systems or CAD, where extensive customization is necessary). Not only can we
provide recommendations and do the integration work for you (we're a services
company), but we can sprinkle the whole thing with a bit of customized magic
which will give the “interpreter” an intimate knowledge of your domain
objects, removing the “speed hit” barrier and opening new possibilities.

Which brings me to your question: what CrossTwine Linker does is look at the
objects, functions and modules involved in a scenario, and try to weave the
fast paths together so that little dynamism is left; i.e., it ties “objects”
together with a native code “thread.” That napkin drawing looked awfully like
a ball of twine, we needed a name, and it seemed to be the only domain on the
Internet which had not been judiciously “parked.”

------
sb
Hm, I just read the whitepaper linked from the URL. I would rather like to
see updated benchmarks from the Computer Language Shootout, or at least have
the source code of the examples available.

BTW, the "computed goto" patch mentioned in footnote 8 IS the implementation
of the threaded-code technique mentioned for Ruby 1.9.x.

~~~
ztzg
Mea culpa, and I agree. We have since switched to the Ruby Benchmark Suite and
the unladen-swallow benchmarks as a baseline, so future numbers will at least
reference those, and we will generally try to publish our branches and scripts
(also for potential incorporation in upstream).

P.S. — The white paper was in fact written shortly before the “computed-goto”
patch appeared, and you are right that the dispatch techniques are very
similar. Ruby 1.9 goes a bit further, however, by allowing a selection of
different techniques at compile-time, the default of which (on x86-64 Linux)
will use the opcode addresses directly as “bytecode” (i.e. direct-threaded vs.
token-threaded code according to current Wikipedia terminology). Koichi Sasada
can probably tell you more than I can about the difference in performance
for Ruby code—if there is any.

