
JIT and Ruby's MJIT - stanislavb
http://engineering.appfolio.com/appfolio-engineering/2019/7/18/jit-and-rubys-mjit
======
ksec
I really do wish the community as a whole would embrace TruffleRuby rather than
leaving it on the sidelines. GraalVM, which TruffleRuby is based on, together
with TruffleRuby represents literally hundreds of millions of dollars of
investment in an implementation over the years.

Much of the problem has to do with C extensions. Despite the team's effort to
have C code compiled with TruffleRuby itself, they also had to go all the way
and fix all the C libraries that didn't work with TruffleRuby due to undefined
behaviour.

~~~
onli
Nothing touched by Oracle will be adopted, nor should it be. It would be
reckless and stupid. The risk of basing your technology stack on such a plainly
evil actor is just not worth it, no matter how many millions the investment
was.

~~~
pjmlp
Your loss then.

Graal has an open source license, and all corporations are alike, regardless
of what hippie developers think about sticking it to the man and such stuff.

Instead, other language stacks will get adopted to the detriment of Ruby, e.g.
Go.

~~~
tsomctl
> all corporations are alike

lol.

> "what you think of Oracle is even truer than you think it is. There has been
> no entity in human history with less complexity or nuance to it than Oracle"

> "this company is about one man and his alter ego and what he wants to
> inflict upon humanity"

> You need to think of Larry Ellison the way you think of a lawnmower. You
> don't anthropomorphize your lawnmower, the lawnmower just mows the lawn, you
> stick your hand in there and it'll chop it off, the end. You don't think
> 'oh, the lawnmower hates me' -- lawnmower doesn't give a shit about you,
> lawnmower can't hate you. Don't anthropomorphize the lawnmower. Don't fall
> into that trap about Oracle.

[https://www.youtube.com/watch?time_continue=2318&v=-zRN7XLCR...](https://www.youtube.com/watch?time_continue=2318&v=-zRN7XLCRhc&feature=emb_logo)

~~~
pjmlp
As you wish, HN is not the place for this kind of quality content anyway.

------
hirundo
> Unfortunately, there is one huge problem with Ruby’s current MJIT. At the
> time I write this in mid-to-late 2019, MJIT will slow Rails down instead of
> speeding it up.

> That’s a pretty significant footnote.

Oops.

> In addition to memory usage, there’s warmup time. With JIT, the interpreter
> has to recognize that a method is called a lot and then take time to compile
> it. That means there’s a delay between when the program starts and when it
> gets to full speed.

Why not cache the compiled methods to make warmup a once-per-version delay? It
would be a JIT/precompiled hybrid. Call it gradually compiled.

~~~
TylerE
It's a dynamic language. Typical JIT compilation involves generating
specialized versions of methods for specific types. You need run-time
profiling to know what to generate.

~~~
hirundo
Seems like the profile from run 1 would be pretty predictive of run 2. And the
profile and cache can be updated during each run.

~~~
sams99
Some experiments are getting underway soon. The current implementation uses a
cache that is tightly scoped to a specific run, including things such as class
serial numbers that cannot safely be reused between runs.

------
elliotlarson
I’m excited about this work. Having a faster Ruby would be some wonderful
icing on the cake. But, for me, using Ruby (and Rails) has always been about
optimizing for developer hours over system performance. IMO Ruby is not a race
horse, but it’s plenty “fast”. The real value is how quickly my team can
iterate, and how enjoyable the process is.

~~~
pjmlp
Languages like Smalltalk or Lisp can have it both ways; the problem for Ruby
is the manpower needed to make a comparable JIT available, although MJIT isn't
the only one already out there.

~~~
igouy
> Languages like Smalltalk or Lisp can have it both ways...

You seem to be suggesting that Smalltalk implementations are a whole lot
faster than current Ruby?

They are a whole lot faster than Matz's 2008 Ruby 1.8.7, but so is current
Ruby:

[https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/yarv-mri.html](https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/yarv-mri.html)

~~~
pjmlp
I don't see any commercial versions there, which is what I was referring to.

~~~
igouy
If you meant to say that no commercial implementations are shown on the
benchmarks game website — that is not true.

[https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/smalltalk.html](https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/smalltalk.html)

and

"Largest Provider of Commercial Smalltalk

...Cincom is the largest commercial provider of Smalltalk in the world, with
twice as many partners and customers than all other commercial providers
combined."

[http://www.cincomsmalltalk.com/main/products/visualworks/](http://www.cincomsmalltalk.com/main/products/visualworks/)

------
Lerc
I was a bit surprised by

>When a method has been called a certain number of times (10,000 times in
current prerelease Ruby 2.7), MJIT will mark it to be compiled into native
code and put it on a “to compile” queue. MJIT’s background thread will pull
methods from the queue and compile them one at a time into native code.

I would have thought that you wouldn't wait for 10,000 iterations, but would
instead start from the beginning, compile the methods with the most calls, and
keep the compiler thread busy. Flush compiled code that is used less
frequently or no longer needed, and cap the total at some defined resource
limit. You'd probably win overall.
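In toy Ruby terms (this has nothing to do with MJIT's actual implementation, just the policy I have in mind): keep call counts, and have the compiler thread always pick the hottest not-yet-compiled method until some cap is hit.

```ruby
CAP = 1  # toy resource limit: compile at most one method

counts   = Hash.new(0)
compiled = {}

# Simulated call traffic gathered by the interpreter.
1_000.times { counts[:render] += 1 }
10.times    { counts[:boot]   += 1 }

# The "compiler thread": always pick the hottest uncompiled method.
pick_next = lambda do
  counts.reject { |m, _| compiled[m] }.max_by { |_, c| c }&.first
end

while compiled.size < CAP && (m = pick_next.call)
  compiled[m] = true  # stands in for native compilation
end

compiled.keys  # => [:render] -- the hot method wins, :boot never compiles
```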

~~~
Twirrim
For what it's worth, the JVM defaults to a method being called 10,000 times
before deciding it is hot, so that threshold is not without precedent (and it
can be adjusted via -XX:CompileThreshold).

~~~
nycdotnet
Interesting. In .NET Core 2.1, they decided just 30 calls was enough to
recompile from the fast “tier 0” JIT code to the fully optimized “tier 1”
JITed code.

[https://github.com/dotnet/coreclr/blob/master/Documentation/...](https://github.com/dotnet/coreclr/blob/master/Documentation/design-
docs/tiered-compilation.md#tiered-compilation-policy-21-rtm)

~~~
Twirrim
10,000 calls really isn't that many when you consider the sorts of operating
environments that the JVM is targeted at.

30 seems crazy low to me. It seems like you'd spend a bunch of compute time
early on compiling stuff that may only be used during the start-up stages of
your code.

------
felixarba
Would be interesting to hear why it doesn't play nice with Rails yet.

~~~
haimez
Because Rails is extremely eager to redefine classes and monkey patch
instances by design. This is hostile to almost all JIT models. V8 spends a
very large amount of engineering time optimizing the common cases of this, but
it's non-trivial.
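A trivial example (not Rails code, just the general pattern) of the kind of runtime redefinition I mean; any JIT that specialized call sites against the first definition has to throw that work away when the method changes:

```ruby
class String
  def shout
    upcase + "!"
  end
end

before = "hi".shout   # a JIT could now specialize call sites on this body

# Rails-style runtime redefinition: same method name, new behavior.
# Any compiled code assuming the old String#shout is now invalid.
class String
  def shout
    upcase + "!!!"
  end
end

after = "hi".shout    # => "HI!!!"
```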

~~~
hsm3
Aren't most of those shenanigans done early in the VM's lifespan (as Rails
boots, as actions are called for the first time, ...), after which the world
would be stable and JIT-related caches wouldn't have to be thrown away?

~~~
t-writescode
New Rails dev here: I believe the entire ActiveRecord model is constant monkey
patching.

~~~
toasterlovin
Do you have something specific in mind? From my understanding of ActiveRecord,
most of the method generation should happen the first time the class is
defined.

~~~
viraptor
Closer to the first time it's used: normally the schema gets queried and the
methods get added the moment you do the first query on the model.
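A toy sketch (not actual ActiveRecord internals) of the idea: accessors get defined from the schema the first time the model is touched, not when the class body is parsed.

```ruby
class ToyModel
  COLUMNS = %w[id name email].freeze  # stands in for a schema query

  def self.define_attribute_methods!
    return if @defined
    COLUMNS.each do |col|
      define_method(col)       { @attrs[col] }        # reader
      define_method("#{col}=") { |v| @attrs[col] = v } # writer
    end
    @defined = true
  end

  def initialize
    self.class.define_attribute_methods!  # happens on first use
    @attrs = {}
  end
end

user = ToyModel.new   # methods appear here, not at class definition
user.name = "Ada"
user.name             # => "Ada"
```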

~~~
mperham
This is not true in production, where eager loading is always on.

------
saagarjha
> And while it’s possible to tune a JIT implementation to be okay for warmup
> time, most JIT is not tuned that way.

Most JavaScript JITs?

~~~
gsnedders
They are the notable exception, yes. Note that aside from V8, all have
multiple JITs (and V8 is gaining a mid-tier compiler, so they can move away
from the interpreter sooner), which adds a fair amount of extra complexity and
maintenance cost. All have pretty advanced interpreters, which also helps make
warmup smoother.

------
joelbluminator
Did I get this right: the JVM also uses a JIT, so basically first you compile
your Java program into Java bytecode, and then once it's running there's also
a JIT running to further optimise the bytecode?

~~~
pjmlp
Kind of.

First of all, there are many JVM implementations; the commercial ones used to
offer AOT compilation as well, going back to the early 2000s.

Then the ones that have JITs offer multiple flavours.

One way is to initially interpret the bytecodes; after enough information is
gathered, the first-level JIT gets into action and compiles that block into
native code. Here a block is usually a function, but it can be something else.

This first-level compiler is rather simple and does only basic optimizations.

The runtime keeps profiling execution and eventually notices that an
already-compiled block (native code) keeps being used heavily; now it is time
to bring in the big-brother JIT, which is roughly equivalent to -O3 on gcc,
and recompile to native code using all major optimizations.

Other JVMs (like JRockit) never interpret: when they start, the first level is
already the simple tier-one compiler straight to native code.

Then all of them now support JIT caches, meaning that after a run the JITed
methods get saved and reused by the next execution; the profiler gets to learn
from previous runs, and execution already starts from a much better
performance state.
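A toy Ruby sketch of that tiered policy (thresholds invented for illustration, nothing like real JVM internals): count calls per method, promote to a baseline tier at a small threshold and to the optimizing tier at a much larger one.

```ruby
TIER1_THRESHOLD = 100     # made-up: promote to quick baseline JIT
TIER2_THRESHOLD = 10_000  # made-up: promote to the "-O3"-style JIT

counts = Hash.new(0)
tiers  = Hash.new(:interpreted)

record_call = lambda do |m|
  counts[m] += 1
  if counts[m] >= TIER2_THRESHOLD
    tiers[m] = :optimized   # big-brother JIT, all major optimizations
  elsif counts[m] >= TIER1_THRESHOLD
    tiers[m] = :baseline    # simple first-level compiler
  end
end

10_000.times { record_call.call(:hot_method) }
50.times     { record_call.call(:cold_method) }

tiers[:hot_method]   # => :optimized
tiers[:cold_method]  # => :interpreted (never reached tier 1)
```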

~~~
thenewnewguy
Would their interpretation be correct for the HotSpot JVM? (As in, does
HotSpot start with interpretation and then JIT "hot" code?)

~~~
pjmlp
Yes, although HotSpot has multiple layers, not just two.

It initially interprets, and when a specific threshold is reached (you can
configure it), the C1 compiler gets called into action, doing basic
optimizations.

After a while, if that generated native code keeps getting even hotter, the C2
compiler (the one with -O3-like capabilities) gets called into action.

In both cases the optimized code gets safety guards that validate that the
assumptions made by the JIT still hold. For example, if a dynamic dispatch
always lands on the same method, it gets replaced by a direct call instead. If
that assumption is ever proven wrong, the JIT throws the optimized code away
and starts over with the new assumptions.
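A toy Ruby sketch of that guard idea (real JITs do this with machine-code checks and recompilation, not a class comparison like this): speculate that a call site always sees one receiver class, check the guard on each call, and "deoptimize" when it fails.

```ruby
class SpeculativeCallSite
  attr_reader :deopts

  def initialize
    @expected_class = nil
    @deopts = 0
  end

  def call(receiver, name)
    @expected_class ||= receiver.class        # specialize on first observation
    unless receiver.class == @expected_class  # the guard
      @deopts += 1                            # "throw the optimized code away"
      @expected_class = receiver.class        # re-specialize
    end
    receiver.public_send(name)                # stands in for the direct call
  end
end

site = SpeculativeCallSite.new
site.call("abc", :length)   # => 3, specializes on String
site.call("defg", :length)  # => 4, guard holds
site.call([1, 2], :length)  # => 2, guard fails, one deoptimization
site.deopts                 # => 1
```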

Then, in what concerns OpenJDK, you actually have two C2-level JIT compilers
available: HotSpot's C2, written in C++ and still the default, and Graal,
written in Java and taken from the GraalVM (née MaxineVM) project. Currently
Graal is much better than C2 at escape analysis, for example, but worse in
other scenarios.

In both cases, OpenJDK has inherited the JIT cache infrastructure from
JRockit, so you also get to save the native code between runs and start from a
much better performance state on subsequent runs.

As a note, even though it is usually not a good idea, if you set the
interpreter threshold to zero, C1 kicks in right at the beginning, but it
won't have any profiling information available, so the generated code will
most likely be worse than just interpreting.

------
ddtaylor
Heads up this could have been submitted as an HTTPS link instead.

------
mrtweetyhack
"If you can’t write the .c files to compile, you can’t compile them." So it
generates C; why not just compile the C into an executable?

~~~
sosodev
It does. If you’re asking why not AOT-compile a whole Ruby app: Ruby is a
really dynamic language, and the generated C might not be valid after the app
begins execution. I guess theoretically you could enumerate all of the
different possible class states, or the final class states, but that sounds
really hard.

If you actually need a compiled Ruby binary, mruby is the best option, but it
has its own drawbacks, of course.

------
seaghost
I’m closely following both the Ruby and PHP communities, and I can say that,
unfortunately, PHP is light years away from Ruby in terms of language
features.

------
sunseb
Ruby: too little, too late. It's sad. :(

~~~
thomasfedb
The majority of workloads still respond better to optimising for programmer
satisfaction than for raw speed. Ruby still has plenty of relevance.

