

J is for JVM: Why the ‘J’ in JRuby? - icey
http://www.engineyard.com/blog/2009/j-is-for-jvm-why-the-j-in-jruby/

======
stephenjudkins
JRuby has been invaluable at scaling out our web scraping. Specifically, we've
managed to leverage its concurrency support very effectively.

We're scraping many millions of pages, so a single-threaded scraper wouldn't
be practical. Since a given thread spends most of its time waiting on IO, we
get our highest throughput with a few dozen scraper threads running at once.
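
As a rough illustration of that setup (a minimal sketch, not the actual
scraper), a fixed pool of worker threads can drain a shared job queue. The
`scrape_all` helper is hypothetical, and the `sleep` call is an assumption
standing in for time spent waiting on a socket:

```ruby
require "thread"

# Hypothetical helper: runs the given block over every item using
# `thread_count` worker threads and returns [item, result] pairs.
def scrape_all(items, thread_count)
  jobs = Queue.new
  items.each { |item| jobs << item }
  results = Queue.new

  workers = Array.new(thread_count) do
    Thread.new do
      loop do
        begin
          item = jobs.pop(true) # non-blocking pop; raises ThreadError when empty
        rescue ThreadError
          break                 # no work left, so this worker exits
        end
        results << [item, yield(item)]
      end
    end
  end

  workers.each(&:join)
  Array.new(results.size) { results.pop }
end

# Usage: 24 workers, each "request" mostly waits.
pages = scrape_all((1..100).to_a, 24) do |n|
  sleep 0.01 # assumption: stands in for waiting on IO
  "page #{n}"
end
```

Because each worker spends nearly all of its time waiting, the pool finishes
in roughly the time of a handful of sequential requests rather than a hundred.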

Trying to use multiple threads in MRI is pretty much a big mess. For whatever
reason, it starts falling down after three or four. Ruby 1.9's fibers might be
a good solution, but 1.9 wasn't out of beta when we started this project.

Using EventMachine with MRI might get us the same effect, but changing a
relatively large synchronous codebase to work asynchronously is a big task for
an uncertain benefit. With JRuby, we simply pointed our (synchronous) Ruby
code at an event-driven HTTP client which uses futures to block each thread.
Threads are certainly heavier-weight than some concurrency primitives, but
with only a few dozen the JVM is great at holding all the relevant state.

We've also been able to leverage tools in java.util.concurrent and use Scala
actors with pretty great effect. JRuby (and the JVM) isn't without its faults,
but when it comes to concurrency it's the Ruby implementation to beat.

~~~
bodhi
Just curious, for your MRI tests, were you using 1.8? You probably already
know, but 1.8 has green threads while 1.9 threads are native.

Someone mentioned in a sibling that 1.9 still has a GIL, thus making
concurrency a bit painful. I wonder if it is released when waiting on IO?

~~~
stephenjudkins
Yeah, MRI is Ruby 1.8.

I imagine 1.9 would perform a great deal better. Since we're scraping entirely
on single-CPU machines, the GIL probably wouldn't hurt us much. We rewrote the
scheduler and our interface to the HTTP client using Scala, however, so now
we're pretty wedded to the JVM.

However, if we were to use Ruby 1.9, we would have to spend more time
investigating HTTP libraries. From our experience, it's worth it to have
worker threads NEVER block directly on IO. We saw a huge increase in
reliability by having worker threads (1) send a request to an event-based
client running in its own thread and (2) block on a future with a set
timeout, waiting for the callback. Having individual threads talk directly to
the sockets, in either MRI or Java, wasn't a promising approach. If sockets
got wedged (and they do, even using very reputable HTTP libraries) the event-
based approach keeps on humming, while the blocking approach grinds to a halt.
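
A minimal sketch of that two-step pattern in plain Ruby, under stated
assumptions: `SimpleFuture` and `FakeEventClient` are hypothetical names made
up for illustration, and the "client" just looks responses up in a hash
instead of doing real HTTP. A URL with no entry simulates a wedged socket
whose callback never fires:

```ruby
require "thread"

# Hypothetical future: a value slot that a worker can wait on with a timeout.
class SimpleFuture
  def initialize
    @mutex = Mutex.new
    @cond = ConditionVariable.new
    @done = false
    @value = nil
  end

  # Called from the client thread when a response arrives.
  def fulfill(value)
    @mutex.synchronize do
      @value = value
      @done = true
      @cond.broadcast
    end
  end

  # Blocks the caller for at most `timeout` seconds.
  # Returns the value, or :timeout if no callback arrived in time.
  def value(timeout)
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
    @mutex.synchronize do
      until @done
        remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
        return :timeout if remaining <= 0
        @cond.wait(@mutex, remaining)
      end
      @value
    end
  end
end

# Hypothetical stand-in for an event-based HTTP client in its own thread.
class FakeEventClient
  def initialize(responses)
    @responses = responses # url => body; a missing url never gets a callback
    @requests = Queue.new
    @thread = Thread.new { run }
  end

  # Step (1): hand the request to the client thread, get a future back.
  def get(url)
    future = SimpleFuture.new
    @requests << [url, future]
    future
  end

  private

  def run
    loop do
      url, future = @requests.pop
      body = @responses[url]
      future.fulfill(body) if body # wedged requests are simply never fulfilled
    end
  end
end

client = FakeEventClient.new("http://example.com/a" => "page A")
urls = ["http://example.com/a", "http://example.com/wedged"]

# Step (2): each worker blocks on its future with a set timeout.
workers = urls.map { |url| Thread.new { client.get(url).value(0.2) } }
results = workers.map(&:value)
```

The worker handling the wedged URL gives up after its timeout and is free to
pick up other work, while the client thread keeps serving everyone else.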

There exist several event-based HTTP clients for Java that all work well
(though some significantly better than others). Compare that with Ruby, where
we couldn't find anything that mature.

That's not to mention all the concurrency primitives available in Scala,
Clojure, and the underrated java.util.concurrent.

Even if Ruby 1.9 might look much more competitive with JRuby in synthetic
tests of concurrent performance, in the real world the Java ecosystem features
tons of libraries that help one write reliable, predictable code quickly.

------
haasted
The article's title is a bit unfortunate. The article is actually a long and
very interesting summary of why the Java Virtual Machine is also a great
platform for other languages. The title hints more in the direction of an
anecdote.

~~~
ZeroGravitas
Some of it seems a bit ill informed and fanboy-ish though.

I'm not a JVM expert but claiming, for example, that _"Hotspot [is] available
wherever Java is available"_ is clearly false. Maybe he means JIT when he says
Hotspot, but that's not true either.

The Zero and Shark projects are currently trying to bring OpenJDK and then a
JIT to the platforms that Sun doesn't directly support (and even then doing it
in a hacky way, as doing it for real, i.e. rewriting Hotspot, would be too
much work):

<http://icedtea.classpath.org/wiki/ZeroSharkFaq>

And I'd love someone knowledgeable to compare the "free upgrades" you get from
JVM updates to the same effect you get from GCC or LLVM improvements, Profile
Guided Optimisation or updates in underlying libraries and OSes.

In general the summary could be "We build our wacky idiosyncratic language on
top of a mature, well-engineered, portable system that was designed and built
to do something else". Matz also built his Ruby on top of mature, well-
engineered portable systems (Unix etc.) and while Ruby is cool and benefitted
from building on that base, it wasn't magic pixie dust (or people wouldn't be
so keen on JRuby).

------
old-gregg
Looking at the graphs: would I trade 2x gain in _eventual_ performance for a
20x gain in startup speed, 3x leaner RAM consumption and great immediate
performance? I wasn't sure, but after 2 months with JRuby - yes I would.

My vote goes to Ruby 1.9.4 coupled with speedy and efficient C extensions.

~~~
qw
_"2x gain in eventual performance"_

It only takes 2 seconds before JRuby catches up with 1.9.2. That's not long,
so unless your program only runs for a few seconds, the wait is worth it.

~~~
old-gregg
The way I see it, languages like Ruby aren't meant for high performance
computing: one can pick from a dozen statically typed languages to implement
time-critical algorithms. The power and beauty of Ruby is its flexibility,
"fluidity", and ability to glue services provided by an operating system and a
million OS-native modules. Just look into your /usr/lib.

JRuby lacks that: it only "glues" JARs together and it introduces a shockingly
huge startup lag which makes your development feel like you're compiling C++
code between test runs. And even gluing JARs is kind of pointless: Java
itself, with its plethora of autocompleting IDEs, is a much better environment
to learn and experiment with APIs like Batik or Apache POI.

~~~
megaduck
Perhaps JRuby simply isn't for you. We're using JRuby with Rails, and the
tradeoffs are _well_ worth it. Performance is quite good, and the laggy
startup is irrelevant to a long-running server process like Rails.

More critically, our development time is significantly reduced by using a
highly expressive language like Ruby. Code is more readable, more
understandable, and a heck of a lot faster to write. Plus, we can still use
all the neat Java libraries like Jetty, Lucene, and JavaMail. Add in things
like Java NIO, and it matches our needs nicely.

While JRuby might not suit your particular application, it's doing wonderful,
useful work for a lot of people. Horses for courses, as they say.

