But running multi-process is heavy on the JVM, since it has to load the virtual machine and warm up the JIT on every process start, especially when the processes are short-lived.
Java solves it by encouraging running things as multi-threaded instead of multi-processed.
But Ruby wasn't built to run multi-threaded. And while CRuby can run threads, it does so under a global VM lock: only one thread per process can execute Ruby code at a time, and the other threads have to wait their turn.
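To make the lock's effect concrete, here is a minimal CRuby sketch (timings are illustrative and machine-dependent): CPU-bound work split across threads finishes in roughly the same time as running it serially, because only one thread executes Ruby code at a time.

```ruby
require 'benchmark'

# CPU-bound work: sum the first n integers in pure Ruby.
def busy(n)
  total = 0
  i = 0
  while i < n
    total += i
    i += 1
  end
  total
end

N = 2_000_000

# Run the work twice in a row on one thread...
serial = Benchmark.realtime { 2.times { busy(N) } }

# ...then run the same two chunks on two threads.
threaded = Benchmark.realtime do
  threads = 2.times.map { Thread.new { busy(N) } }
  threads.each(&:join)
end

# Under CRuby's global VM lock the threaded version is not ~2x faster;
# it usually lands close to the serial time.
puts format('serial: %.2fs, threaded: %.2fs', serial, threaded)
```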
It looks like the multi-process implementation is good enough for CRuby, and JRuby's attempt to turn it into a multi-threaded application didn't improve things.
The solution would be a JVM that can load the virtual machine, JIT, and start executing applications as quickly as C programs do.
That’s...not really true. Ruby wasn’t really written with parallelism in mind at all, because it mostly ran on machines that couldn’t run really parallel processes. MRI threads were originally green threads, which allow a high degree of concurrency without parallelism, with less overhead than native threads.
When old MRI was replaced with YARV in 1.9 (which became the new MRI), it got native threads with a global VM lock (GVL, a similar idea to Python’s GIL), which allowed thread-safe native code to run with real parallelism while only one thread runs Ruby code at a time. This made Ruby thread-based concurrency somewhat more expensive, but made some parallelism possible in Ruby (since native code can release the GVL, and some basic common operations like waiting on I/O do).
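A small sketch of that flip side (assuming CRuby 1.9+): operations that release the GVL, like sleeping or waiting on I/O, really do overlap across threads.

```ruby
require 'benchmark'

# Simulated I/O waits: sleep releases the GVL, so the threads wait
# concurrently instead of one after another.
elapsed = Benchmark.realtime do
  threads = 4.times.map { Thread.new { sleep 0.2 } }
  threads.each(&:join)
end

# Four 0.2s waits overlap instead of adding up to 0.8s.
puts format('elapsed: %.2fs', elapsed)
```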
And Ruby 3.0 introduces a new parallelism model with Ractors (basically inspired by the Actor model), which are logically above the thread level (each contains its own set of nonshared threads) and below the process, don’t share mutable state within the VM, and each have their own VM lock, allowing a higher degree of Ruby parallelism without going multiprocess.
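A minimal Ractor sketch (Ractors are experimental as of Ruby 3.0, and Ruby prints a warning on first use; the API may still change):

```ruby
# Each Ractor has its own VM lock, so CPU-bound Ruby code can run in
# parallel without going multi-process.
results = 2.times.map { |i|
  # Ractor blocks can't capture outer locals; state is passed in
  # explicitly as an argument, and only shareable objects cross the
  # boundary (integers are fine).
  Ractor.new(i) { |n| (1..100_000).sum + n }
}.map(&:take)

p results  # [5000050000, 5000050001]
```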
This is a valid summary of some facts, but not of the article or facts relevant to the article. Some quotes:
"Jekyll is not forking processes, so that is not the issue."
"The area where JRuby and TruffleRuby shine are long running processes that have had time to warm up. Based on suggestions I put together a repo of a simple small Jekyll build being built 20 times by the same process in a repo here. After 20 builds with the same running process the build times do start to converge, but even after that MRI Ruby is still fastest."
> I think there are two reasons for this:
> * Real-World projects like Jekyll involve a lot more code, and JITing that code has a high start-up cost.
> * Real-world code like Jekyll or Rails is optimized for MRI Ruby, and many of those optimizations don’t help or actively hinder the JVM.
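For reference, the shape of that repeated-builds measurement can be sketched like this. This is a stand-in workload rather than a real `jekyll build`, so the numbers are illustrative only; the point is the per-iteration timing loop in a single long-lived process.

```ruby
# Run a workload N times in one process and watch per-iteration times.
def build_once
  # Stand-in for a Jekyll build: some allocation-heavy string work.
  10_000.times.map { |i| "page #{i}" }.join("\n").length
end

times = 20.times.map do
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  build_once
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
end

# On JRuby/TruffleRuby the later iterations should get faster as the
# JIT warms up; on MRI they stay roughly flat from the start.
puts times.map { |t| format('%.4f', t) }.join(' ')
```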
The title seems a bit provocative, though I guess if you were reading it from within the Jekyll community, it makes sense without further disclaimers, but otherwise the article seems fairly even-handed.
The threading stuff just seems like a special case of his second point.
Commercial versions of "a JVM that can load the virtual machine, JIT and execute applications fast enough like C programs" have been available since around 2000, like Excelsior JET or WebSphere Real Time JVM among others.
The JIT cache used in recent versions of HotSpot started as part of the JRockit JVM, also a commercial-only product.
The AOT compiler only runs when the device is idle.
Starting with Android 10, they introduced a mechanism to upload PGO data into the store, so that when an APK is installed, if such data is already available the JIT/AOT don't have to relearn everything from scratch regarding the application.
They also improved the GC several times; the latest generation is quite good.
There are several Google IO talks about how they went through this.
Never used it though, so no idea how it works in practice.
As it runs standalone it can also take advantage of the whole server for itself for more resource hungry optimizations without impacting running code.
The generated code is then reflected back into the client JVM instances.
In fairness, the Java version had an easier path for optimization, but there are no free lunches.
Sounds like they were determined to write Python on Java. Doing it that way likely has a lot of performance costs. However, you can’t assume that idiomatic Java code would take that much longer than Python code for a team that was familiar with Java. Likely it comes down to which languages and frameworks a team is familiar with.
We used the Spring Expression Language for dynamic evaluation of (user-defined) expressions in our code, and in most cases we could compile and cache the expressions after first use; invoking them was really fast, close to pure Java expressions.
We also had some JVM–Python interop which we eventually got rid of (in favor of Kotlin), because we were unable to optimize it after a month of effort and it continued to be the biggest bottleneck in the system.
So I am not entirely convinced that there are real-world usage scenarios that inherently demand so much runtime dynamism that most benefits of JVM optimizations are nullified.
Of course, I'd love to be enlightened otherwise, but rather happy with JVM as of now.
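For what it's worth, the compile-once-and-cache pattern described above (Spring EL on the JVM) looks roughly like this in Ruby terms. This is a hedged sketch, not production code: real code would validate and sandbox the user input before handing it to `eval`.

```ruby
# User-defined expressions are compiled to lambdas on first use and
# reused afterwards, so later invocations skip parsing entirely.
EXPR_CACHE = {}

def compiled(expr)
  # DANGER in real systems: eval-ing untrusted input. Illustration only.
  EXPR_CACHE[expr] ||= eval("->(x) { #{expr} }")
end

puts compiled('x * 2 + 1').call(10)  # => 21 (compiles, then caches)
puts compiled('x * 2 + 1').call(21)  # => 43 (cache hit: no recompilation)
```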
Also, if you are looking for interop, then GraalVM might be worth a look — not the better-known AOT part, but the runtime one, which can seamlessly do interop between a number of languages, and it even optimizes between them!
What I intended to convey in my previous comment was that, using strategies like pre-compilation (eg. Spring EL), it is possible to get good performance even for dynamic logic that isn't known until runtime.
So I was curious what was so dynamic about this use case that JVM performance drops down to pythonesque level.
I don't want to speculate - maybe there is something the JVM is unable to optimize; maybe something weird is happening in the library; or maybe Python has gotten really good recently, or this use case was able to benefit from some Python lib with native bindings.
I’m pretty sure it was more than 10 years ago that Charles Nutter wrote a detailed description of why that wasn’t going to happen without breaking compatibility with Ruby, identifying the specific language features preventing that.
One thing that comes to mind is that a lot of performance-critical stuff in Ruby is implemented via native libraries. The JRuby ecosystem has alternate implementations for a lot of that, but it is probably also able to interface with native code directly. That interop could potentially be a bit of a bottleneck, and any alternate Java-based replacements for whatever is being called might have their own issues/bugs/etc.
But instead of hypothesizing what the problem might be (and getting that wrong repeatedly), profiling tends to be much more effective. I've done this a couple of times to diagnose performance issues, and it is rarely anything you'd expect. Once you know where it is spending its time, you can usually mitigate the issues. Use a profiler, add some logging, instrument the JVM, etc. There are lots of ways to do this. Even just knowing how often it starts a new process would be good to know. It's apparently more than once, because otherwise you'd expect --dev not to speed things up like it did.
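As a concrete example of cheap instrumentation: if the question is "how often does this start a new process?", you can wrap the spawn entry points and count calls before reaching for a full profiler. A rough sketch (monkey-patching is crude, but fine for a one-off diagnosis):

```ruby
# Count calls to the common subprocess entry points by prepending a
# module ahead of Kernel in the lookup chain.
module SpawnCounter
  COUNTS = Hash.new(0)

  def system(*args, **opts)
    COUNTS[:system] += 1
    super
  end

  def spawn(*args, **opts)
    COUNTS[:spawn] += 1
    super
  end
end
Object.prepend(SpawnCounter)

# Any shelling-out in the code under test is now tallied.
system('true')
p SpawnCounter::COUNTS
```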
I guess both because CRuby has gotten a lot faster, and because in practice any difference is easily paved over with extra hardware (which is cheap).
A lot of people have realised that dynamic typing hinders the maintenance of long-lived projects, and the tooling and dev experience with type-safe languages have also gotten much better over the last few years.
Despite having worked with Ruby for multiple years, I pick Kotlin/C# for new projects.
I know Ruby has recently introduced support for typing, but until the wider ecosystem embraces type safety it's gonna be an uphill battle to write type-safe code in Ruby.
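For illustration, Ruby 3's typing support centers on RBS signature files that live alongside the code rather than in it. A hypothetical example (the class and methods here are made up, not from any real project):

```rbs
# sig/greeter.rbs — hypothetical RBS signatures for a Greeter class
class Greeter
  def initialize: (name: String) -> void
  def greet: () -> String
end
```

Tools like Steep (or Sorbet, with its own annotation format) can then check the implementation against these signatures, but only for code that actually has them, which is the ecosystem-adoption problem.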
This process and lack of data in itself sounds like a recipe for performance problems in the JVM version, unless some low-probability coincidence prevented getting the profile data only in this case. Good measurements are a prerequisite for sustainable and maintainable performance work.
Q: Why would one do that? A: The ability to bundle your code onto Hadoop machines you don’t control.
That said, I wish the article would just include numbers without the startup times. I also remember people claiming back at that time JRuby would be much faster than MRI.
Even something like mmap can drastically improve performance, since it lets the kernel handle I/O asynchronously from your program's execution (so your code doesn't block as much, or as easily, on I/O).
JRuby and its active maintainer @headius have been great to work with. It's a surprisingly vibrant and active community.
I thought looking into why something was slow was worth writing down as a blog post.
The biggest gotchas in JRuby are finding alternatives to native extensions and memory use. What's neat is plugging into JMX from Ruby land.
I don't think a lot of people are aware of just how much faster the Ruby ecosystem has gotten in the past few years (especially when you leave Rails out of the equation which is known not to do well in microbenchmarks).
We don't deserve Matz. He's too good to be true. :)
Crystal may be fast, but it's definitely not Ruby. Choices for fast not-Ruby are not lacking.
> Why are we surprised when attempts to shoe-horn it into something else (Ruby 3.0, JRuby, Truffle Ruby) fall flat?
Weird that you don’t put Ruby 1.9+ on that list, though that was as much or more of a shift from what immediately preceded it, with parallelism as an improvement area, as 3.0 is (sure, Ractors are a bigger language change, but going from green threads to native threads with a VM lock was a major implementation change). The difference is that 3.0’s relevant improvements are still experimental, and it's easier to misrepresent “haven’t yet stabilized and seen wide production use” as “fell flat” than it would be to claim the same thing about 1.9’s improvements. But it's not true in either case.
Whilst they are doing their multiyear rewrites, it would be nice if someone could optimise the Ruby runtime a bit for them.
This means many man-years, and a certain break in the continuity, maybe with small but noticeable deviation in the VM's behavior. If we squint just so, we can consider Ruby 3.0 to be such a rewrite.
Crystal is a similar rewrite effort, but one that isn't trying to stay backwards-compatible.
I don't follow Lua, but I got the feeling it is pretty much stuck on its last version.