
Introducing MIR - steveklabnik
http://blog.rust-lang.org/2016/04/19/MIR.html
======
kibwen
I'm incredibly psyched for MIR. It'll represent the first conceptual "leap"
for the language since Rust 1.0, checking off (or at least unblocking) a ton
of the improvements that we've had in mind for years now (the list in the OP
is hardly comprehensive!).

It's also interesting that Swift, another LLVM-based project, has
transitioned to using their own language-specific intermediate representation
(SIL). I think it's fascinating that, despite the availability of a robust and
mature pluggable language backend, two professional languages independently
decided that they needed _another_ layer of abstraction between AST and LLVM
IR. Not sure if this should be interpreted as a bug report against LLVM IR, or
if it possibly represents a future trend in compiler design.

EDIT: I should also mention that MIR has been available on play.rust-lang.org
for a while, if anyone would like to play with it in their browser:
[https://play.rust-
lang.org/?gist=fee8ccf28bae2c89107d&versio...](https://play.rust-
lang.org/?gist=fee8ccf28bae2c89107d&version=nightly)

~~~
nostrademons
It's been pretty common for compilers to require 3 different intermediate
representations (parse tree, MIR, and LIR). My copy of _Advanced Compiler
Design and Implementation_ [1] (over 10 years old at this point) lists the 3-IR
architecture on page 8. It references Sun's SPARC compilers, DEC's Alpha
compilers, Intel's x86 compilers, and SGI's MIPS compilers as implementations
that use it.

What _has_ changed are the boundaries. Traditionally, semantic analysis
(including typechecking) operated on the parse tree, which resembled the
human-written language extensively. Optimization operated on MIR, then it'd be
lowered to an architecture-specific LIR via register allocation and
instruction selection, and another round of optimization (instruction
scheduling, etc.) would be applied. The purpose of all of this was to provide
multiple language front-ends on top of a single compiler. In the 80s and 90s,
compilers were often written by the hardware vendor, so they would tightly
optimize the MIR and LIR for their own architecture, and use the parse-tree =>
MIR lowering to support multiple surface languages.

LLVM chose a LIR that's closer to what many compilers had been using as MIR,
and then moved the optimization passes into the LIR, and hid the architecture-
specific backends within the LLVM project itself, out of the eyes of language
designers. It could do this because of open-source: with a shared body of
compiler code owned by everyone and gradual consolidation of the hardware
market, it became easier to contribute your backend to LLVM than to maintain
your own compiler stack and fight for adoption. (GCC actually had a fairly
similar architecture first, but the GCC IR was very difficult to comprehend if
you weren't a GCC maintainer, which made it impractical as a compilation
target for outside projects. Those projects generated C code instead and let GCC compile
it.) That in turn made it much easier to write a compiler and experiment with
language design, since you only had to figure out how to translate your
language to LLVM's IR rather than work out the details of scheduling and
register allocation. _That_, in turn, allowed greater complexity in language
features: Swift and Rust have language features that go beyond what cutting-edge
research was less than 10 years ago, and do so in a production language that you
can use _now_. And so it's not surprising that they're now re-introducing a
MIR to manage the additional complexity introduced by the new language
features.

[1] [http://www.amazon.com/Advanced-Compiler-Design-
Implementatio...](http://www.amazon.com/Advanced-Compiler-Design-
Implementation-Muchnick/dp/1558603204)

~~~
infogulch
I tried to upvote you, but I fat-finger downvoted you instead. Sorry. :(

~~~
BinaryIdiot
A slight side note: this is one of the most difficult parts with using Hacker
News on a mobile device. I fat finger upvotes _all the time_! I wouldn't mind
the ability to change a downvote to an upvote and vice versa, even if it was
time limited so I could correct any incorrectly voted items.

~~~
tetraodonpuffer
couldn't this be easily fixed by putting the downvote triangle on the other
side of the time?

(upvote arrow) name xxx hours ago | parent | flag | (downvote arrow)

would make it less fat finger prone and still be pretty straightforward to use
(plus having downvote close to flag IMHO makes sense as well from a context
perspective)

~~~
odbol
Could be easily fixed by one line of CSS to make the dang buttons larger, but
apparently Hacker News can't be bothered to improve their shitty website.

EDIT: great community, shitty website.

------
skrebbel
I suspect that the MIR idea could (should?) become a pretty common concept for
a large number of LLVM-based languages. The LLVM IR is very low-level, and in
most modern languages there's a big gap between the programmer-friendly
features they have and "the bare essentials that still capture the same
ideas".

I also suspect that the act of designing a MIR for a language is a fun one, and
should be pretty damn enlightening. It really forces you to focus on what's
"core" about the language.

In my own, limited, experience with language design and compilers, I've found
that doing something like a MIR greatly simplifies other parts. Code written
in MIR (in my case, it didn't even have a concrete syntax, just an abstract
syntax tree) is full of duplication and is ridiculously explicit, but it's
super easy to process in all kinds of ways.

Just plainly guessing, but based on this blog post, if I were to write a Rust
interpreter, I'd probably run it on MIR and not on plain Rust or LLVM IR.
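
To make the "ridiculously explicit but super easy to process" point concrete,
here is a minimal sketch of a MIR-style interpreter. Everything in it is a toy
assumption for illustration: `Rvalue`, `Terminator`, `BasicBlock`, `interpret`
and `sum_below` are made-up names, and the real rustc MIR is far richer than
this. The idea it shows is only that once a loop has been flattened into basic
blocks with explicit jumps, the interpreter is one small loop.

```rust
// Toy illustration only: these types are hypothetical and much simpler
// than rustc's real MIR data structures.

/// A right-hand side. No nested expressions: operands are always locals.
#[derive(Clone, Copy)]
enum Rvalue {
    Const(i64),
    Add(usize, usize), // `_a + _b`
    Lt(usize, usize),  // `_a < _b`, producing 1 or 0
}

/// How control leaves a basic block.
#[derive(Clone, Copy)]
enum Terminator {
    Goto(usize),
    /// Jump to `then_bb` if the local is non-zero, otherwise to `else_bb`.
    If { cond: usize, then_bb: usize, else_bb: usize },
    Return,
}

struct BasicBlock {
    /// Each statement assigns one rvalue to one local.
    statements: Vec<(usize, Rvalue)>,
    terminator: Terminator,
}

/// Run a function body. By convention here, local `_0` is the return value.
fn interpret(blocks: &[BasicBlock], mut locals: Vec<i64>) -> i64 {
    let mut bb = 0;
    loop {
        let block = &blocks[bb];
        for &(dest, rv) in &block.statements {
            locals[dest] = match rv {
                Rvalue::Const(c) => c,
                Rvalue::Add(a, b) => locals[a] + locals[b],
                Rvalue::Lt(a, b) => (locals[a] < locals[b]) as i64,
            };
        }
        match block.terminator {
            Terminator::Goto(target) => bb = target,
            Terminator::If { cond, then_bb, else_bb } => {
                bb = if locals[cond] != 0 { then_bb } else { else_bb };
            }
            Terminator::Return => return locals[0],
        }
    }
}

/// Hand-lowered form of:
///     let mut sum = 0; let mut i = 0;
///     while i < n { sum += i; i += 1; }
///     sum
/// Locals: _0 = sum, _1 = n, _2 = i, _3 = loop condition, _4 = constant 1.
fn sum_below(n: i64) -> i64 {
    let blocks = vec![
        // bb0: initialise locals, then jump to the loop header
        BasicBlock {
            statements: vec![
                (0, Rvalue::Const(0)),
                (2, Rvalue::Const(0)),
                (4, Rvalue::Const(1)),
            ],
            terminator: Terminator::Goto(1),
        },
        // bb1: loop header, evaluates `i < n`
        BasicBlock {
            statements: vec![(3, Rvalue::Lt(2, 1))],
            terminator: Terminator::If { cond: 3, then_bb: 2, else_bb: 3 },
        },
        // bb2: loop body, then back to the header
        BasicBlock {
            statements: vec![(0, Rvalue::Add(0, 2)), (2, Rvalue::Add(2, 4))],
            terminator: Terminator::Goto(1),
        },
        // bb3: loop exit
        BasicBlock { statements: vec![], terminator: Terminator::Return },
    ];
    interpret(&blocks, vec![0, n, 0, 0, 0])
}
```

Calling `sum_below(5)` walks bb0, then alternates bb1/bb2 five times, then
exits through bb3. The duplication and verbosity are obvious, but so is how
little the interpreter has to know.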

~~~
eddyb
You are correct that a Rust interpreter would work great on MIR, which is why
Scott Olson (@tsion) has been working on one:
[https://github.com/tsion/miri](https://github.com/tsion/miri) (check out the
slides and the report).

------
drostie
What I want to see in a language would be a bunch of syntactic sugar which
corresponds to a MIR like Rust's by a well-specified reduction that is likely
to be consistent from build to build.

The reason that this could be really cool is that you could hypothetically
hash these MIRs and store these hashes in a public database of open-source
code. (You would have to include hashes of whatever functions they call/depend
on, and you would have to solve foreign-function imports.)

Then you would have this cool language where modules are simply saying "here's
a mapping of names to hashes in the public database," with automatic
recognition of "hey, these two functions you're importing from these two
modules are actually the same function, I don't need to disambiguate them." If
your dependency graph isn't a tree and one of the branches can't cope with a
dependency-upgrade, that's fine too -- you upgrade the dependency on the one
branch only. The point is, all of your dependency-semantics are now reduced to
a really straightforward reasoning about unique names.

~~~
kibwen
Why would you prefer to hash an intermediate representation, rather than
hashing the source?

~~~
lsaferite
Maybe two different sources compile to the same IR?

~~~
cwzwarich
If the IR contains file and line number info, that seems pretty unlikely.

~~~
drostie
Right, the point would be for the hash to strip out file info, line numbers,
whitespace, comments, any stylistic choices which turn out to be equivalent
"under-the-hood" in some straightforward way, and probably even the bound
variable names (with de Bruijn indexes).
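
A rough sketch of the de Bruijn part, using a tiny lambda calculus rather than
anything resembling real MIR. All names here (`Named`, `Nameless`,
`to_nameless`, `content_hash`, and the `var`/`lam`/`app` helpers) are invented
for the example, and `DefaultHasher` is only a stand-in: a real public
database would presumably want a cryptographic hash with a stable, specified
encoding.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Surface-level term with programmer-chosen variable names.
enum Named {
    Var(String),
    Lam(String, Box<Named>),
    App(Box<Named>, Box<Named>),
}

/// The same term with bound names replaced by de Bruijn indices:
/// `Bound(0)` is the nearest enclosing lambda, `Bound(1)` the next one out.
#[derive(Hash, PartialEq, Eq, Debug)]
enum Nameless {
    Bound(usize),
    Free(String),
    Lam(Box<Nameless>),
    App(Box<Nameless>, Box<Nameless>),
}

/// `scope` holds the binders currently in scope, outermost first.
fn to_nameless(term: &Named, scope: &mut Vec<String>) -> Nameless {
    match term {
        // Search from the innermost binder outwards so shadowing works.
        Named::Var(name) => match scope.iter().rev().position(|n| n == name) {
            Some(index) => Nameless::Bound(index),
            None => Nameless::Free(name.clone()),
        },
        Named::Lam(param, body) => {
            scope.push(param.clone());
            let body = to_nameless(body, scope);
            scope.pop();
            Nameless::Lam(Box::new(body))
        }
        Named::App(f, arg) => Nameless::App(
            Box::new(to_nameless(f, scope)),
            Box::new(to_nameless(arg, scope)),
        ),
    }
}

/// Hash of the name-independent form of a term.
fn content_hash(term: &Named) -> u64 {
    let mut hasher = DefaultHasher::new();
    to_nameless(term, &mut Vec::new()).hash(&mut hasher);
    hasher.finish()
}

// Small constructors to keep examples readable.
fn var(name: &str) -> Named {
    Named::Var(name.to_string())
}
fn lam(param: &str, body: Named) -> Named {
    Named::Lam(param.to_string(), Box::new(body))
}
fn app(f: Named, arg: Named) -> Named {
    Named::App(Box::new(f), Box::new(arg))
}
```

With this, `content_hash(&lam("x", var("x")))` and
`content_hash(&lam("y", var("y")))` agree, because both terms become
`Lam(Bound(0))` before hashing, while free names such as an imported function
still contribute to the hash.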

------
haberman
I'm curious how this new architecture preserves safety invariants. The post
says MIR contains operations not available in Rust because they are unsafe.
Does this mean every MIR transformation needs to be safety-preserving? Or is
safety-checking the last stage that runs directly before LLVM IR generation?

~~~
pcwalton
MIR transformations do have to be safety-preserving, but this isn't any
different from all of the LLVM IR-level optimizations.

~~~
haberman
But the MIR definition of "safety" must be more encompassing than LLVM IR,
right? LLVM IR has no concept of a borrow. Where in the MIR pipeline does
borrow checking happen?

~~~
pcwalton
I believe that when it's implemented it will happen first before any
optimization, in order to provide the best diagnostics possible and to avoid
doing wasted optimization if there are errors. But Niko would really be able
to answer this more definitively.

------
nbaksalyar
That's awesome work, thank you!

I'm wondering if MIR will make it possible to replace the backend IR. For
instance, to use WebAssembly instead of LLVM IR.

Or is it still a better idea to base it off of LLVM?

~~~
eddyb
There is discussion of a WebAssembly backend which lowers MIR directly, see
[https://github.com/rust-lang/rust/issues/33205](https://github.com/rust-
lang/rust/issues/33205).

------
sudeepj
Wow! More than the actual concept (which is cool), I found the technical
writing to be of a high calibre and highly educational. It's a pleasure to read
such articles.

------
cm3
Will this allow us to get TCO? I'm used to writing recursive functions and
miss TCO all the time.

~~~
pcwalton
In theory it could allow for sibling call optimization by rewriting those
functions into loops. It would not allow full unrestricted TCO--that would
require both ABI work and thorny high-level design issues around destructors
that may well not be solvable at all.

However, in practice LLVM already has full support for sibling call
optimization internally, so implementing such an optimization at the MIR level
wouldn't really buy us anything over what we already have.
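
For readers unfamiliar with the term, a small hand-written illustration of
what "rewriting those functions into loops" means. This is not compiler
output; `gcd_recursive` and `gcd_loop` are just example names, and the second
function is what such a rewrite would conceptually produce from the first.

```rust
/// Tail-recursive form: the recursive call is the very last thing the
/// function does, and the callee has the same signature as the caller.
fn gcd_recursive(a: u64, b: u64) -> u64 {
    if b == 0 {
        a
    } else {
        gcd_recursive(b, a % b)
    }
}

/// Loop form: the arguments become mutable locals and the tail call
/// becomes "update the locals and jump back to the top".
fn gcd_loop(mut a: u64, mut b: u64) -> u64 {
    while b != 0 {
        let next_b = a % b;
        a = b;
        b = next_b;
    }
    a
}
```

This case is easy because nothing in the function needs to be dropped after
the recursive call returns. Once a local with a destructor is still alive at
the call site, the call is no longer really in tail position, which is where
the design issues mentioned above come in.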

~~~
cm3
Then maybe rustc ought to have diagnostics that warn if a recursive function
won't be optimized fully, and then I can either avoid recursion or rewrite to
fit into the optimizer, but having to appease a changing optimizer isn't what
I'd call maintainable code, so a function attribute might make more sense.

~~~
geofft
There's a reserved and unimplemented keyword "become", which is like "return"
except it requires that it can optimize the tail call (and is a compile-time
error if it can't). See this comment: [https://github.com/rust-
lang/rfcs/issues/271#issuecomment-18...](https://github.com/rust-
lang/rfcs/issues/271#issuecomment-189386398)

Nobody's tried implementing it recently, though.

------
kccqzy
This really sounds like GHC's Core, a simplified, desugared, explicit
intermediate language used during compilation.

------
kevindqc
Not super familiar with Rust, but will this have any impact on debugging?

I imagine LLVM is generating debug symbols from what it receives (which would
be fairly representative of the original program).

Now with this extra step, won't LLVM generate debugging symbols for the MIR,
which doesn't really map well to the original code (i.e., LLVM won't know all
those gotos were originally a loop)?

~~~
steveklabnik
MIR keeps track of where in the original source it was generated, so you still
get good debugging output.

~~~
kevindqc
nice! thanks :)

------
perflexive
I'm surprised to see no mention in the article or on HN about how this might
affect writing crypto code in Rust. Maybe it's a little tangential, but I'm
dying for constant time operations.

Avoiding branches, etc. in Rust doesn't mean LLVM won't add some as an
optimization, which is frustrating to say the least. It would be awesome to be
able to define a block - similar to "unsafe" - that tells the compiler to
disable optimizations that could introduce non-constant time operations. When
I started reading the article, I thought maybe this new development would open
the door to something like that, but it doesn't appear to be the case.

There's some work to do constant time ops in rust, but it's very experimental
and untrustworthy. :/
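
For context, this is roughly what "avoiding branches" looks like at the source
level. It is only a sketch: `ct_eq` and `ct_select` are made-up names, and as
described above nothing in the language stops the optimizer from turning this
back into branching code, so it should not be read as a real constant-time
guarantee.

```rust
/// Compare two byte slices without exiting early at the first mismatch.
/// The lengths are assumed to be public, so branching on them is fine.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    // Accumulate every differing bit; the loop always runs to the end.
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}

/// Branch-free select: returns `a` if `choice` is 1 and `b` if it is 0.
/// Any other value of `choice` is a caller error in this sketch.
fn ct_select(choice: u8, a: u32, b: u32) -> u32 {
    // 1 becomes 0xFFFF_FFFF and 0 stays 0.
    let mask = (choice as u32).wrapping_neg();
    (a & mask) | (b & !mask)
}
```

The source has no data-dependent branch, but whether the generated machine
code has one is up to the backend, which is exactly the frustration.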

~~~
steveklabnik
Since this is before the LLVM passes, it doesn't really change anything, and
can't.

~~~
perflexive
I have no experience with LLVM internals, but couldn't you split up the code
and disable some of LLVM's optimizations on blocks defined by the programmer?

Or do you have to pass the entire program at once to LLVM? Maybe the ability
to disable optimizations only on certain functions is possible, idk, but I
think it certainly would be nice.

~~~
steveklabnik
I'm not knowledgeable about that, to be honest.

------
munificent
This is some really fantastically well done exposition. I loved it.

------
nv-vn
Awesome. This reminds me of the recent changes to the OCaml compiler
(flambda). If it's anything similar, can't wait for the performance
improvements from this (which can hopefully continue bringing Rust up to C
speed).

~~~
steveklabnik
In general, if Rust is significantly slower than equivalent C, it's a bug.
Please file them if you find them.

------
mushishi
What is the reason the MIR was not a part of the compiler right from the
start?

~~~
pcwalton
Because (a) I resisted it due to (mistakenly) thinking it was not necessary;
(b) the language was changing a lot and having a MIR to change would have made
that harder. If we had had a MIR from Rust 0.1, it probably would have had to
be completely overhauled by this point.

------
merb
Hopefully the IntelliJ Plugin of Rust will iterate faster and support more
completions. Actually it would be awesome to edit Rust the same way as
Scala/Java.

~~~
chrisseaton
I don't think this new work would be useful for that. It's an internal
compiler IR for programs in the process of being optimised and lowered, not a
general format for representing Rust programs for user applications.

~~~
jdmichal
Specifically, based on the thread covering WebAssembly as an output target
[0], rewinding the graph into higher control structures like loops may be non-
trivial.

[0] [https://github.com/rust-
lang/rust/issues/33205#issue-1509804...](https://github.com/rust-
lang/rust/issues/33205#issue-150980448)

------
sievebrain
It's interesting to compare and contrast this against bytecode from other
platforms like CIL or JVM bytecode. They're both essentially a kind of
compiler IR and they allow for incremental compilation, can be interpreted,
dynamically manipulated, structured as basic blocks, etc. But MIR is represented
using a textual form that still has scoping and which still uses Rust syntax.
I can see the appeal. But I wonder how much it complicates working with MIR.
You'd need a more complex parser, I guess.

~~~
pcwalton
It's kind of a moot point, because MIR never leaves its in-memory
representation in normal compilation. (However, that may well change with
future improvements to incremental compilation, though the on-disk format will
likely never be formally defined and stabilized--it's too much work for little
gain.)

Surface syntax of the compiler IR isn't particularly important anyway.

~~~
eddyb
Well, actually, we store MIR in crate metadata. However, it's not a special
serialization format, just the rustc_serialize infrastructure, i.e. you could
also serialize MIR to JSON if you really wanted to.

------
esarbe
I wonder: can someone explain to me how this compares to Scala's TASTY? It
seems to be on about the same layer, but I understand too little about
compilers to be sure.

------
thomnottom
I assume this is not the X Window replacement from Canonical.

~~~
donatj
I thought before clicking they were going to announce that the X Window
replacement was being rewritten in Rust, lol. They should change the name.

~~~
kibwen
I don't think they're concerned overly much with the marketing presence of a
compiler implementation detail. :P

------
spv
I am not sure if Julia already has something like this, because there was some
work on precompilation. Can someone shed some light on this?

~~~
ihnorton
Yes, Julia uses a similar "lowered" form:
[http://docs.julialang.org/en/latest/devdocs/ast/#lowered-
for...](http://docs.julialang.org/en/latest/devdocs/ast/#lowered-form)

If you are interested in more details, there was a talk two years ago about the
compiler design: [https://www.youtube.com/watch?v=osdeT-
tWjzk](https://www.youtube.com/watch?v=osdeT-tWjzk) (many type names and
implementation details have changed since then, but conceptually it is still
very applicable).

------
spv
I am not sure if Julia already has something similar to this? Can anyone
shed some light on this?

------
monomaniar
every rust topic, i vote.

------
mirimir
MIR riffing on peace or world?

