
Mir: A lightweight JIT compiler project - ksec
https://developers.redhat.com/blog/2020/01/20/mir-a-lightweight-jit-compiler-project/
======
pizlonator
This project is trying to do too many things.

It seems like Ruby needs a profile guided optimizer, which means building an
IR that is suitable for profile-guided optimization. That’s way different from
classic IRs like this since it means having provisions for OSR exit.

I recommend looking at these slides to learn how to do it.

[http://www.filpizlo.com/slides/pizlo-speculation-in-jsc-slides.pdf](http://www.filpizlo.com/slides/pizlo-speculation-in-jsc-slides.pdf)
[http://www.filpizlo.com/slides/pizlo-splash2018-jsc-compiler-slides.pdf](http://www.filpizlo.com/slides/pizlo-splash2018-jsc-compiler-slides.pdf)

~~~
chrisseaton
I agree - I think MIR is trying to produce better machine code more quickly,
but that isn't the problem Ruby faces! The problem Ruby faces is having to
inline through ten levels of metaprogramming in order to get any meaningfully-
sized compilation unit that you can optimise, _and only then_ does it make
sense to worry about code generation.

However, the reason that this is not so simple is that Ruby is a vastly more
complicated language than JavaScript (I've worked on implementing both.) Ruby
has an enormous standard library and most Ruby programs are just endless calls
to that library, so your compiler must be able to understand the library
semantically, either by rewriting it in Ruby (not likely at scale in MRI) or
by adding tens of thousands of individually optimised intrinsics (again not
likely).

Ruby is digging itself further into a local optimum with these optimisation
approaches, rather than looking further around for a better global optimum.

~~~
pizlonator
Your description of optimizing Ruby sounds like it is exactly like optimizing
JS. JSC’s main optimizing IR (the DFG) is all about understanding the standard
library semantically. We write the standard library in JS+hacks (called
“builtins”) or C++ (pick on a per function basis). Some of those functions
have opcodes in DFG, or are built out of primitives that have their own
specialized opcodes in DFG.

Maybe the reason why Ruby optimization has problems is nobody has done it the
JSC way.

~~~
chrisseaton
> Your description of optimizing Ruby sounds like it is exactly like
> optimizing JS.

Yes, it's the same problem, and it's unique to neither Ruby nor JS - the
difference is just scale.

Ruby has a larger library, so needs more rewritten from the current C into
Ruby+builtins. And then that rewrite doesn't maintain Ruby semantics (the C
API currently used does not exactly match Ruby semantics) so a rewrite
matching semantics is often very hard.

> Maybe the reason why Ruby optimization has problems is nobody has done it
> the JSC way.

People have been trying the approach you use in Core (for over a decade,
starting with Rubinius, now TruffleRuby and others), but I think (based on
practical experience working on optimising both languages) that it's just a
larger problem in Ruby which is why it hasn't been conquered yet.

~~~
pizlonator
I think that the way you’re describing the solution for Ruby tells me that you
don’t see the problem the way that I see it. The problem isn’t rewriting
things in builtins. The problem is making an IR in which you can reason about
the standard library at scale: reason about its speculation opportunities, all
of the implied dependencies and effects, its GC impacts, etc.

None of the IRs I’ve seen people try for Ruby does that. JSC’s DFG IR does a
lot of this. Hence I don’t think folks have really tried the JSC approach for
Ruby.

~~~
jashmatthews
> The problem is making an IR in which you can reason about the standard
> library at scale

How do you deal with functions written in C++ which invoke functions written
in JS?

~~~
pizlonator
The important ones are intrinsics. The C++ function is understood by the JIT
so well that it just emits the code for it itself.

Our C++->JS calling convention sucks, partly because we just avoid going down
that path.

We do have a C++->JS call IC that we could use more.

~~~
chrisseaton
> The C++ function is understood by the JIT so well that it just emits the
> code for it itself.

But how would that work for Ruby, where C functions on the critical
performance path are often third-party code that the compiler author has never
seen before?

We'd need to let the JIT understand third-party unseen C functions. There are
experiments to do that (Sulong, MIR, Rubinius sort of tried it) but I think
it's more of an open problem than you're implying.

If you treat calls to unknown third-party C functions as an opaque native call
then you're really going to struggle to build a meaningful compilation unit,
in my experience.

~~~
tln
It looks like the author is intending to use a C to MIR compiler so that
existing CRuby code can be inlined into generated MIR code. Third party code
could be compiled the same way, right?

"The blue parts show the new data-flow for MJIT. When building CRuby, we could
generate MIR code for the standard Ruby methods written in C. We can load this
MIR code as a MIR binary. This part could be done very quickly."

~~~
chrisseaton
That's what MIR is doing, not what JavaScriptCore is doing.

Filip thinks this isn't needed for Ruby - '[t]his project is trying to do too
many things' - and that JavaScriptCore could do it already.

I think the fact that MIR and TruffleRuby both believe they have to do
something else (lifting C code into their IR) shows us that JavaScriptCore's
approach isn't quite as immediately applicable as Filip thinks it is.

~~~
pizlonator
It’s more that I think that you won’t get enough semantic understanding of C
extensions lifted into any IR for that to be useful.

------
bakery2k
From GitHub [1]:

    
      "Plans to try MIR light-weight JIT first for CRuby or/and MRuby implementation"
      "MIR is strongly typed"
    

Is there an explanation of how the project bridges the gap between
dynamically-typed Ruby and statically-typed MIR?

More generally, I'd love to see something like MRuby+MIR be successful. It
would be great to see an alternative to the aging LuaJIT.

[1] [https://github.com/vnmakarov/mir](https://github.com/vnmakarov/mir)

~~~
vnorilo
Seems to me that MIR operates on a (much) lower level, basically abstracting
away the physical machine and its finite register set. As such, it would
replace the LLVM (or GCC) middle and back end. The goal is much faster
compilation without sacrificing more than ~20% of performance.

Dynamic types and garbage collection would then be implemented on/for the
abstract MIR machine.

------
nn3
Interesting. Seems to be the first modern non-trivial compiler that avoids
using SSA internally. I thought SSA had cornered the market, but perhaps now
there is an opposite trend (OK, one example doesn't make a trend).

<quote> No SSA (static single assignment form) for:

- Faster optimizations for a short optimization pipeline and small functions
(a target usage scenario)
- Currently SSA could be used only for two optimizations (CCP and GCSE). SSA
usage would mean 4 additional passes over the IR. If we implement more
optimizations, an SSA transition is possible when the additional time for the
expensive in/out-of-SSA passes becomes less than the additional time for
non-SSA optimization implementations
- Simpler and more compact generator code, because we can avoid implementing a
lot of nontrivial code (dominator and dominance frontier calculation, a good
out-of-SSA pass) </quote>
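
For context, a rough sketch of the trade-off being weighed: SSA renames each variable so it is assigned exactly once, inserting φ-functions at control-flow merges, which makes optimizations like CCP and GCSE simpler and sparser at the cost of the conversion passes into and out of SSA form.

```
; non-SSA form              ; SSA form
x = 1                       x1 = 1
if (c) goto L               if (c) goto L
x = 2                       x2 = 2
L:                          L:
y = x + 1                   x3 = φ(x1, x2)
                            y1 = x3 + 1
```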

~~~
tom_mellior
> Seems to be the first modern non trivial compiler that avoids using SSA
> internally.

Of course it's debatable what "modern" and "non trivial" mean, and if they
apply to this project. It's very small, but on the (one!) small benchmark the
author cites it seems to do quite well, so it's certainly not completely
naive.

For whatever it's worth, CompCert doesn't use SSA either, and that's certainly
a non-trivial compiler, though arguably the non-triviality does not stem from
any advanced optimizations it does.

------
ksec
There are quite a few decent Ruby implementations under active development at
the moment.

CRuby with MJIT

MIR ( This )

JRuby ( Ruby on JVM )

TruffleRuby ( Ruby on Graal )

Artichoke ( Ruby on Rust )

And I remember someone mentioning making a Ruby with a tracing JIT (not
Topaz). Unfortunately my Google-fu is not good enough and I can no longer find
it.

~~~
boulos
Rubinius. Here are Evan’s slides from the 2009 LLVM developer conference:

[https://llvm.org/devmtg/2009-10/Phoenix_AcceleratingRuby.pdf](https://llvm.org/devmtg/2009-10/Phoenix_AcceleratingRuby.pdf)

EngineYard took Rubinius in a few directions, but I think the main lasting
impact was all the RSpec work they did along the way.

~~~
ksec
Edited my original post. I meant Implementation that are still actively
developed. Both Rubinius and Topaz are no longer being maintained.

~~~
YorickPeterse
I believe Rubinius _is_ being maintained (again), it's just going in a
direction very different from Ruby.

------
mratsim
I would be quite interested in an IR/JIT assembler specialized for vector
instructions with ARM Neon and x86-64 SSE~AVX512 output.

Ideally it would handle register allocation and caching of generated functions
as well. The current JIT assemblers (asmjit, Xbyak) require you to handle
register allocation yourself. LLVM, as mentioned, is quite a heavy dependency
to have.

~~~
Asm2D
asmjit has a register allocator - for sure not the highest-quality one, but
it's there in asmjit's Compiler infrastructure.

------
MaxBarraclough
The article stresses how their JIT is much more lightweight than GCC/LLVM,
which is perfectly valid, but why not compare MIR against the _other_ portable
lightweight JIT engines out there? They're not the first to think of it.

The article mentions Cranelift, but that's a 'middleweight JIT' with a proper
SSA IR. I was surprised to see LibJIT has more LOC than Cranelift - I thought
it was lighter. (Imperfect proxy for runtime 'weight', of course.)

If you want a lightweight portable JIT engine, there's already _GNU Lightning_
[0], and the atrociously-named _Lightening_ fork [1] (used in the new JIT in
the _GNU Guile_ Scheme interpreter, which turned up on the HN front page
recently).

Here's a 1996 paper (preprint) on a research JIT named _VCODE_ which executed
around 8 instructions to generate each instruction in its output. [2] (Sadly
it was never released, as far as I can tell, and is presumably long dead.)

Anyway, with all that said, I wish this project well. No-one's managed to get
good performance out of Ruby yet, so it's certainly ambitious. Google gave up
on Unladen Swallow, and that was a JIT for Python, which, as I understand it,
is more amenable to JIT than Ruby. Even failing that, having a quality rival
to GNU Lightning would be worthwhile.

[0]
[https://www.gnu.org/software/lightning/](https://www.gnu.org/software/lightning/)

[1] [https://www.wingolog.org/archives/2019/05/24/lightening-run-time-code-generation](https://www.wingolog.org/archives/2019/05/24/lightening-run-time-code-generation)

[2] [http://www-leland.stanford.edu/class/cs343/resources/vcode-annotated.pdf](http://www-leland.stanford.edu/class/cs343/resources/vcode-annotated.pdf)

------
scythe
How is this meaningfully different in scope and intention vs Parrot? That
project went on for a long time until every language (including its original
target, Raku) decided they’d rather build their own more specialized JIT. What
would prevent MIR from meeting the same fate?

~~~
jashmatthews
MIR is much more flexible. It's like Cranelift but more basic.

Anything you write in MIR can be compiled by MIR. Parrot only JITs Parrot
bytecode, so the applications are much more limited.

------
tom_mellior
> implement the GCC C extensions necessary for the CRuby JIT implementation

That's the point where "C is nice and simple, it's easy to whip up a compiler"
invariably turns into "why the #^(&)# didn't I use an existing frontend?".
Real-world C code is _messy_. The entire sub-project of implementing a C
compiler is a needless distraction that will turn into a huge time-suck with
zero benefit to the author.

See also: "Who Says C is Simple?"
[https://people.eecs.berkeley.edu/~necula/cil/cil016.html](https://people.eecs.berkeley.edu/~necula/cil/cil016.html)

------
mark_l_watson
Good analysis of memory usage and how it rules out use in mobile and IoT
projects.

Ruby was my go-to language for a long dry spell when I had few Lisp
development jobs (except for Clojure). Ruby is a great language; as Matz says,
Ruby is designed for developer happiness. I stopped using Ruby when more
Common Lisp work came my way and then I used Python for five years of deep
learning work. I have favorite languages but I used what customers wanted.

That said, I still keep up with Ruby news.

------
kbumsik
Hey Mir display server is still alive :)

~~~
DC-3
Yet another drastic pivot!

------
DominoTree
Hey everyone, please stop naming things Mir for a while. Thanks!

~~~
giancarlostoro
>
> [https://en.wikipedia.org/wiki/Mir_(disambiguation)#Science_a...](https://en.wikipedia.org/wiki/Mir_\(disambiguation\)#Science_and_technology)

I feel like this list is missing some more, but yeah, there's been a few 'Mir'
projects.

~~~
jordigh
I was thinking of this Mir:

[http://docs.mir.dlang.io/latest/index.html](http://docs.mir.dlang.io/latest/index.html)

~~~
giancarlostoro
I was thinking of Rust's MIR being missing too:

[https://blog.rust-lang.org/2016/04/19/MIR.html](https://blog.rust-lang.org/2016/04/19/MIR.html)

~~~
wyldfire
> That is, we are introducing a new intermediate representation (IR) of your
> program that we call MIR: MIR stands for mid-level IR, because the MIR comes
> between the existing HIR ("high-level IR", roughly an abstract syntax tree)
> and LLVM (the "low-level" IR).

It's so terribly confusing because LLVM itself defines a .mir (Machine IR)
that is a syntax between its IR and the backend (which could be thought of as
mid-level IR). When I heard about Rust's MIR, I assumed it was some clever way
to generate target-dependent IR.

The good news is that we now have M(L)IR: the one IR to bring them all and in
the darkness bind them.

~~~
anp
If anything I expect MLIR to make things more confusing as Rust’s MIR is
probably going to be around for the long haul.

