
A possible new back end for Rust - obl
https://jason-williams.co.uk/a-possible-new-backend-for-rust
======
pizlonator
This is really great. The world needs more diverse compiler tech. The llvm
monoculture is constraining what kind of compiler research folks do to just
the things that are practical to do in llvm.

I particularly suspect that if something like Cranelift gets evolved more then
it will eventually reach throughput parity with llvm, likely without actually
implementing all of the optimizations that llvm has. It shouldn’t be assumed
that just because llvm has an optimization, that optimization is profitable
anywhere outside llvm, or at all.

Final thought, someone should try this with B3.
[https://webkit.org/docs/b3/](https://webkit.org/docs/b3/)

~~~
cfv
I still remember when Clang bringing LLVM along was seen as SO OUT THERE and
I'm just mentioning it because I find it weird to be old enough to see fads in
system languages come and start to go.

Just curious, do you have any examples of these "limitations" you speak of?
Sounds like a very interesting read.

~~~
pizlonator
Llvm makes some questionable choices about how to do SSA, alias analysis,
register allocation, and instruction selection. Also it goes all in on UB
optimizations even when experience from other compilers shows that it’s not
really needed. Maybe those choices are really fundamental and there is no
escaping them to get peak perf - but you’re not going to know for sure until
folks try alternatives. Those alternatives likely require building something
totally new from scratch because we are talking about things that are
fundamental to llvm even if they aren’t fundamental to compilers in general.

~~~
temac
I dislike UB, but at the language level. By the time LLVM is reached, UB can
only continue to be removed, never added (from a global point of view;
applying general as-if rules, a compiler can always generate its own
boilerplate in which it knows something cannot happen, then maybe later
leverage "UB" to e.g. trim impossible paths that really are impossible in
this case -- at least barring other language-level "real" UB). So is there
really any drawback to internal exploitation of "UB" (maybe we should call it
something else then) if, for example, the source language had none?
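A minimal Rust sketch of that kind of internal "UB" exploitation, assuming
callers uphold the stated promise (`std::hint::unreachable_unchecked` is the
real std API for this; the function and its logic are illustrative):

```rust
// Hypothetical example: a front end that knows a condition cannot occur
// can tell the optimizer so, letting it trim the impossible path.

fn div_rounded(n: u32, d: u32) -> u32 {
    if d == 0 {
        // Promise to the optimizer: callers guarantee d != 0, so this
        // branch is dead and the zero-check may be dropped entirely.
        // Breaking that promise is genuine UB.
        unsafe { std::hint::unreachable_unchecked() }
    }
    // Divide n by d, rounding to nearest.
    (n + d / 2) / d
}
```

The point is that the "UB" here is bounded by an explicit, checkable promise,
rather than the open-ended variety.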

~~~
pizlonator
It is true that compilers sometimes have to have operations whose semantics
are defined only if some conditions hold. But LLVM's and C's interpretation
of what happens when the conditions don't hold is extraordinarily liberal,
and I'm not sure that is either beneficial or sane.

Like, LLVM tries not to add UB, but design choices it made to support
optimization with UB do sometimes result in new UB being introduced, like the
horror show that happens with `undef` and code versioning.

So, I think that optimizing with UB internally is fine but only if it's some
kind of bounded UB where you promise something stronger than nasal demons.

------
dwheeler
One cool advantage of having multiple compilers for a language is that you can
use one as a check on the other.

For example, if you're worried that one of the compilers might be malicious,
you can use the other compiler to check on it: [https://dwheeler.com/trusting-trust](https://dwheeler.com/trusting-trust)

Even if you're not worried about malicious compilers, you can generate code,
compile it with multiple compilers, send the resulting programs the same
inputs, and see when they differ in their outputs. This has been used as a
fuzzing technique to detect subtle errors in compilers.
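A toy sketch of the differential idea, with two hypothetical implementations
of a trivial "spec" standing in for two compilers (real tools generate whole
programs and compare the compiled binaries' outputs instead):

```rust
// Two independent implementations of the same spec: integer squaring.
fn impl_a(x: i64) -> i64 {
    x * x
}

fn impl_b(x: i64) -> i64 {
    x.pow(2)
}

/// Return every input on which the two implementations disagree.
/// An empty result means consensus on these inputs, not correctness.
fn differential_check(inputs: &[i64]) -> Vec<i64> {
    inputs
        .iter()
        .copied()
        .filter(|&x| impl_a(x) != impl_b(x))
        .collect()
}
```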

~~~
gbrown_
> For example, if you're worried that one of the compilers might be malicious,
> you can use the other compiler to check on it:
> [https://dwheeler.com/trusting-trust](https://dwheeler.com/trusting-trust)

This still requires the use of a trusted compiler, though. Comparing two
compilers arbitrarily shows whether there is _consensus_; it does not give
guarantees about _correctness_.

From the link:

        In the DDC technique, source code is compiled twice: once with a second
        (trusted) compiler (using the source code of the compiler’s parent), and then
        the compiler source code is compiled using the result of the first
        compilation. If the result is bit-for-bit identical with the untrusted
        executable, then the source code accurately represents the executable.
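The final comparison step can be sketched as follows (paths and names are
illustrative; producing the stage-2 rebuild is the hard part done beforehand):

```rust
use std::fs;
use std::io;
use std::path::Path;

/// The bit-for-bit check at the end of the DDC technique: the stage-2
/// rebuild must be identical to the untrusted executable for the source
/// to be considered an accurate representation of it.
fn ddc_matches(stage2: impl AsRef<Path>, untrusted: impl AsRef<Path>) -> io::Result<bool> {
    Ok(fs::read(stage2)? == fs::read(untrusted)?)
}
```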

~~~
dwheeler
First, I forgot to disclose: I am the author of
[https://dwheeler.com/trusting-trust](https://dwheeler.com/trusting-trust) .

As discussed in detail in that dissertation, if you are using diverse double
compiling to look for malicious compilers, the trusted compiler does not have
to be perfect or even non-malicious. The trusted compiler could be malicious
itself. The only thing you're trusting is that the trusted compiler does not
have the same triggers or payloads as the compiler it is testing. The diverse
double compiling check merely determines whether or not the source code
matches the executable given certain assumptions. The compiler could still be
malicious, but at that point the maliciousness would be revealed in its source
code, which makes the revelation of any malicious code much, much easier.

You're absolutely right about the general case merely showing consistency, not
correctness. I completely agree. But that still is useful. If two compilers
agree on something, there is a decent chance that their behavior is correct.
If two compilers disagree on something, perhaps that is an area where the spec
allows disagreement, but if that is not the case then at least one of the
compilers is wrong. The check by itself won't tell you which one is wrong,
but at least it will tell you where to look. In a lot of compiler bugs, having
some sample code that causes the problem is the key first step.

~~~
et2o
Sounds fascinating. Are there real-world examples of malicious compilers?

~~~
411111111111111
I read a story about a compiler adding malware to the compiled binary once.

They kept getting owned until they supposedly found a pretty dumb hack which
just appended the backdoor to the final compilation on the build server...

No clue if it was just a story though, as I personally haven't experienced
anything like that before.

~~~
gryfft
I don't think this is what you're looking for, but Coding Machines[1] is a
great little story in which the Ken Thompson hack[2] plays a role.

[1][https://www.teamten.com/lawrence/writings/coding-machines/](https://www.teamten.com/lawrence/writings/coding-machines/)

[2][https://www.win.tue.nl/~aeb/linux/hh/thompson/trust.html](https://www.win.tue.nl/~aeb/linux/hh/thompson/trust.html)

------
liquidify
>>"That’s Bjorn3, he decided to experiment in this area whilst on a summer
vacation, and a year & half later single-handedly (bar a couple of PRs)
achieved a working Cranelift frontend."

Is this guy human? This is amazing, and this guy should be given an award.

------
WalterBright
The D programming language has 3 compilers: one with LLVM (LDC), one with GCC
(GDC), and one with the Digital Mars back end (DMD).

It's great to have all three, as they each have different characteristics in
terms of speed, generated code, debug support, platform support, etc.
Supporting these three also helps maintain proper semantic separation of code
gen from front end.

~~~
stjohnswarts
Has the D community been growing or shrinking over the past decade or so?
Staying relatively the same size?

------
korpiq
This feels welcome to me. I tend to think a language needs multiple
independent implementations that only share the same source language spec, in
order to really tear a clear spec apart from the quirks of any particular
implementation.

I find Rust (the spec, though also the implementation) quite safe and
practical (a balance). It deserves some independent implementations to secure
a long and stable future.

On the other hand, I want to use it on non-ARM embedded platforms, where
current cross-compilation through C produces unusably big binaries. I dream
this might increase hope for that, too, eventually.

~~~
thesuperbigfrog
>> I find Rust (the spec, though also the implementation) quite safe and
practical (a balance). It deserves some independent implementations to secure
a long and stable future.

Where is the Rust spec? Unless something happened really quickly that I was
not aware of there is only the implementation.

~~~
steveklabnik
[https://doc.rust-lang.org/stable/reference/](https://doc.rust-lang.org/stable/reference/)
is the closest thing we have. It is not yet complete.

~~~
thesuperbigfrog
Thank you! I look forward to the day when there is a spec, but I was surprised
to see it mentioned and was wondering if I missed something big.

------
andrewprock
The thing that struck me most about the article was this quote from the Rust
Survey (2019):

“Compiling development builds at least as fast as Go would be table stakes for
us to consider Rust”

Go was designed from the ground up to have super fast compile times. In fact,
there are some significant language issues related to that design decision.

Using one of the primary design goals that shaped the language's structure as
"table stakes" is almost certainly going to require a lot of effort, with some
serious unintended consequences.

Improving compilation times sounds good. Aiming high is good. But reaching
"best of breed" performance is a major initiative.

~~~
pjmlp
If you mean generics, D, Delphi, Ada and plenty of other languages prove you
can have them and still be pretty fast.

~~~
andrewprock
I mean interface{}

[https://golang.org/doc/effective_go.html#interfaces_and_type...](https://golang.org/doc/effective_go.html#interfaces_and_types)

------
Waterluvian
When writing a language like Rust, is the biggest challenge simply deciding
what Rust's features and behaviors should be? And implementing the syntax and
Rust -> LLVM compiler is really just a chore for the individuals who are super
familiar with the implementation of these languages? Or is the technical
implementation also genuinely challenging and non-obvious?

~~~
steveklabnik
First of all, deciding features and behaviors is _not_ simple. :)

There are a number of technical implementation challenges in the compiler.

It is a large project, and Rust's got a really intense stability policy.

The compiler was bootstrapped very early, when the rate of change of the
language itself was still "multiple things per day." This introduced
significant architectural debt.

There have been multiple projects that have re-written massive parts of the
compiler, and more ongoing. For example, non-lexical lifetimes required
inventing an entire additional intermediate language, re-writing the compiler
to use it, and making sure that everything kept working while doing so.

More recently, the compiler has been moving from a classic multi-pass
architecture to a more Roslyn-like, "query-based" one. Again, this is being
done entirely "in-flight", while keeping a project that's used by a _lot_ of
folks stable. The rust-analyzer project has made this even more interesting; a
"librarification" strategy is underway to make the compiler more modular.

For some numbers on this kind of thing,
[https://twitter.com/steveklabnik/status/1211667962379276288/...](https://twitter.com/steveklabnik/status/1211667962379276288/photo/1)
and
[https://twitter.com/steveklabnik/status/1211717308143587334/...](https://twitter.com/steveklabnik/status/1211717308143587334/photo/1)

~~~
j88439h84
> Rust's got a really intense stability policy.

I know the code won't stop running, but I wonder how soon it stops being
idiomatic. If it's not idiomatic, it's harder to maintain due to unfamiliar
style and structure. Does Rust have measures to deal with this issue?

~~~
steveklabnik
I think the closest thing is enforced rustfmt. I don't hack on the compiler
though, so maybe there's some stuff that the team does that they don't
broadcast super widely.

~~~
j88439h84
I don't mean the code of the Rust compiler, I mean code written in Rust
becomes unidiomatic as the idioms change. How fast does that happen, is it a
problem, is it being addressed?

~~~
ansible
As it happens, Steve can provide some metrics for this:

[https://words.steveklabnik.com/how-often-does-rust-change](https://words.steveklabnik.com/how-often-does-rust-change)

[https://www.reddit.com/r/rust/comments/fz8mwm/how_often_does...](https://www.reddit.com/r/rust/comments/fz8mwm/how_often_does_rust_change/)

~~~
steveklabnik
I don't think this analysis really captures idiom questions, so while it's
related, I'm not sure it's the right thing here :)

------
gok
Novel compiler backends are a super cool idea, but I don't think it's going to
help Rust compile speeds as much as this post suggests. The complexity of
Rust's type system puts a pretty high lower bound on compile times because of
work the front end needs to do. Plain C compiles quickly even with an LLVM
backend, for example.

~~~
pjmlp
Haskell, OCaml, SML, Idris also compile quite fast, with complex type systems.

Their secret? Multiple backends with different kinds of optimizations.

You don't need to compile for the ultimate release performance when in the
middle of compile-debug-edit cycle.

~~~
isatty
From my (limited) experience, Haskell does not compile fast, especially if
you’re doing something that needs lenses.

~~~
pjmlp
It surely does, because Haskell is not a one-compiler language: not only does
it have multiple implementations (though I concede almost everyone cares only
about GHC), there are interpreters and a REPL experience as well.

You don't need to compile your program in one go using GHC's LLVM backend,
many times a GHCi session is more than enough.

------
Koshkin
I've been wondering lately whether modern compilers should all be using C as
the intermediate language (or whether some language-specific code optimization
opportunities would be lost if they did).

~~~
edwintorok
The semantics of C aren't very well defined, there is a lot of ambiguity in
the form of undefined and implementation defined behaviour. This ambiguity is
often needed to build an efficient optimizing compiler.

When you have a higher level language with more accurately defined semantics,
running it all through C would risk introducing undefined behaviour.

With an IR you can control and define the semantics more closely to what your
language needs.
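A small sketch of that semantic gap, assuming a source language that defines
integer overflow to wrap (the function is illustrative):

```rust
// Rust precisely defines wrapping arithmetic, while a naive translation
// to C's `a + b` on signed ints is undefined behavior on overflow. A
// C-as-IR backend would therefore have to emit extra casts or boilerplate
// to preserve the source language's semantics.

fn src_lang_add(a: i32, b: i32) -> i32 {
    // Source-language spec: two's-complement wraparound on overflow.
    a.wrapping_add(b)
}
```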

~~~
ansible
I've got to wonder whether any of the existing intermediate representations
would be appropriate for other programming languages.

~~~
steveklabnik
This is true to varying degrees; you could say that LLVM IR and Java bytecode
are two examples of this in action.

------
jlebar
If there are any rust people here, you've probably considered that you can
speed up your debug llvm builds by enabling some optimizations. SimplifyCFG
comes to mind, but, like, you can experiment. I presume the reason you haven't
is because you want to preserve debug info, and llvm isn't great at that when
optimizations are on.

~~~
the8472
You can customize the debug profile or create an intermediate profile between
release and debug in your Cargo.toml. Debug info and optimization levels can
be configured separately.

If by speed-up you mean compile times and not runtime behavior, then there's
also an unstable compiler flag that allows adding specific LLVM passes.
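For instance, a sketch of what such Cargo.toml tuning can look like (values
are illustrative, not recommendations):

```toml
# Debug builds keep debug info but get light optimization:
[profile.dev]
opt-level = 1
debug = true

# Dependencies can be optimized harder than your own crate,
# since they are compiled once and then cached:
[profile.dev.package."*"]
opt-level = 3
```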

------
crad
While it appears that cg_clif compiles faster, does it provide any runtime
performance benefit compared to cg_llvm? Are the compiled binaries as fast as
LLVM-compiled binaries? If not, is the use case for development purposes only?

~~~
__s
Correct, cranelift is meant for faster development build cycles

[https://github.com/bytecodealliance/wasmtime/blob/da02c913cc...](https://github.com/bytecodealliance/wasmtime/blob/da02c913cc5a71de955b071a05bc157de39b20be/cranelift/rustc.md)

------
mttyng
This is awesome. It doesn’t even seem that long ago when Boa was started! Man,
time flies and people do great things. Kudos to the author and co-contributors
for what Boa has become.

------
tyrion
Thanks for the nice article! Hoping the author reads the comments, I would
like to leave some hopefully useful feedback.

It would greatly improve the reading experience of your blog if you made the
footnotes/references clickable.

For example when you say:

> I’ve taken the chart from the 2016 MIR blog post[3]

I have to scroll to the end of the page to find the blog post (and then scroll
back to resume reading). If [3] were clickable it would be great. It would be
even better if [MIR blog post] were an actual link itself.

------
tester3
How do they ensure that output of both compilers is correct?

e.g. LLVM's output is A but the new backend's is B; how do they deal with
different results between backends?

------
Myrmornis
There wouldn't be any surprises, or cognitive dissonance, from using very
different paths for debug versus release builds?

On a small project, personally I use --release sometimes during development
because the compile time doesn't matter that much and the resulting executable
is much faster: if I don't use --release I can get a misleading sense of UX
during development.

~~~
steveklabnik
This already happens a bunch, even with the current setups. It's very natural
if you come from a compiled language, and not if you don't. The first step of
someone saying "hey why is Rust slow?" is five people replying "did you use
--release".

~~~
sfink
Can confirm.

I had a graph traversal program written in Python. I ported it to Rust, and
the runtime was identical -- 68.4 seconds, down to the tenth of a second.
(Kinda blew my mind -- I had to triple check that I was running and timing
what I thought I was!) I had a bit of a crisis of faith.

I poked at it a few times over the next week, then finally got on the IRC
channel and quickly received the advice mentioned above. Same input, with
--release: 6.2 seconds.

------
amelius
Does it support JIT compilation, i.e. specialization at runtime?

~~~
steveklabnik
Cranelift has a JIT, but I am not sure what the status of it is as a rustc
backend.

------
stackzero
+1, enjoyed how accessible this write-up was

------
brokenbotnet
Really great.

