
Closing the gap: cross-language LTO between Rust and C/C++ - pedrow
http://blog.llvm.org/2019/09/closing-gap-cross-language-lto-between.html
======
pornel
It's nice to see such a post on the LLVM blog (as opposed to the typical
Rust-only outlets). Feels like recognition that Rust is a serious and
important LLVM user.

~~~
masklinn
Now if only LLVM could feel it was important enough to make noalias work
reliably.

~~~
fluffything
LLVM devs don't really care about Rust. They haven't fixed noalias in years,
there still isn't a freeze intrinsic in the IR, the LLVM-IR semantics are
often not documented enough for the Rust compiler to know whether it is
generating IR that has UB or not...

I see a couple of Rust developers working on LLVM almost full-time (like
nikic), but there should be more. The Rust language needs to become a more
important stakeholder, and for that it needs more paid full-time LLVM
developers. Infinite loops are UB in LLVM-IR, too...

~~~
ncmncm
LLVM developers are focused on the needs of (literally!) millions of C and C++
programmers. Until you can point to a few hundred-thousand production Rust
coders, or get somebody with deep pockets to depend on it, Rust is just not
important enough. The solution is to fund your own LLVM developers. If you
can't raise the money to pay them, who should?

People with deep pockets are typically advised not to depend on unsupported
infrastructure. It is hard and, often, unwise to argue with such advice.

It is still early days. Give it ten years: Rust will either have taken off or
sunk, by then. Maybe somewhere in those years there will be some new hotness
to jump on instead.

~~~
Hello71
Rust has in fact had almost 10 years.

~~~
Jweb_Guru
The Rust of ten years ago bears almost no resemblance to the Rust of today.
Its history is interesting to people involved in the project, but it wasn't
really used seriously until 1.0 was released.

~~~
ChrisSD
I make it 9 years since the first release and 4 years since the first stable
version (aka 1.0).

Either way it's a short time in which to gain acceptance anywhere near the
level of C++ so it's not surprising it hasn't. Although companies (including
Microsoft and some Google teams) are just starting to take it seriously.

~~~
imtringued
It's not a short time. C++ is only 34 years old and back then there were fewer
developers than today.

~~~
hyperman1
C with classes, which would be comparable to pre-1.0 Rust, started in 1979, so
40 years ago. The first commercial C++ and the book were in 1985, 34 years
ago. I'd guess that's a good 1.0 milestone, comparable with Rust's 4 years.

------
angrygoat
This is awesome, especially with the gains for Firefox, but this bit seemed
odd to me:

> We quickly learned, however, that the LLVM linker plugin crashes with a
> segmentation fault when trying to perform another round of ThinLTO on a
> module that had already gone through the process.

It sounds like they worked around this, rather than fixing the segfault and
putting some error handling in place? Might make it easier for the next bunch
of people working in this part of clang.

~~~
xiphias2
You're right, but it still looks like a big improvement. It means that Firefox
devs can write every new functionality in Rust, no matter how small it is.
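
To make the "no matter how small" point concrete, here is a minimal sketch of the kind of tiny function that is only worth moving to Rust if calls from C++ can be inlined. The name `clamp_to_byte` is hypothetical and not taken from Firefox. Without cross-language LTO, a C++ caller sees an opaque call; when both sides are compiled to LLVM bitcode (rustc's `-C linker-plugin-lto`, Clang's `-flto`), the linker plugin may be able to inline it.

```rust
/// Hypothetical helper: clamps a value into the 0..=255 range.
/// `extern "C"` gives it the C calling convention so C++ could call it;
/// a real build would also need an unmangled symbol name and a matching
/// declaration on the C++ side.
pub extern "C" fn clamp_to_byte(value: i32) -> u8 {
    value.clamp(0, 255) as u8
}

fn main() {
    println!("{}", clamp_to_byte(300)); // prints 255
    println!("{}", clamp_to_byte(-5)); // prints 0
}
```

A function this small costs more in call overhead than in actual work, which is why inlining across the language boundary matters for it.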

~~~
phkahler
> It means that Firefox devs can write every new functionality in Rust, no
> matter how small it is.

True, but they should still focus on oxidizing whole modules and subsystems in
their entirety whenever possible.

------
azakai
The need to use "compatible" versions of LLVM between the C++ and Rust
compilers is scary. Anything aside from the exact same LLVM revision could in
theory lead to bad results, including bugs or security vulnerabilities (if
LLVM changes the meaning of something in its IR).

This isn't Rust or Clang's fault, of course, it's just a consequence of using
LLVM IR as the data for LTO, that LLVM IR has no backwards compatibility
guarantees, and that Rust is out-of-tree for LLVM.

In theory using a stable format for LTO would avoid issues like this. Wasm is
one option that has working cross-language inlining already today, but on the
other hand it has less rich an IR, and the optimizers are less powerful than
LLVM.

~~~
bla3
LLVM bitcode is backwards compatible. It is however not forward compatible, so
the linker needs to understand the newer bitcode format that clang and rustc
use.

~~~
azakai
The issue isn't being able to load the bitcode (which LLVM has gotten
pretty good at supporting in a backwards compatible way). It's that the
meaning of things might change, undefined behavior may be handled differently,
and so forth.

In other words a newer optimizer running on older IR may emit broken code.

~~~
bla3
I thought bitcode backwards compat included that too. If it didn't, Apple's
collecting bitcode for watch apps for transparent 64-bit support wouldn't
work.

~~~
azakai
It's possible to try to support that, but you can never be sure.

For example, imagine that LLVM has a known bug with some pass, and it has a
workaround somewhere else that disables generating IR that would hit that bug.
A different version of LLVM may fix that bug, and remove the workaround - but
then optimizing bitcode from another version could be vulnerable.

Another example is undefined behavior in LLVM IR. It may be handled
differently in different versions, and it's hard to know what might happen
from mixing them.

In general, LLVM is heavily tested - on each revision. I'm not aware of any
large-scale project that tests all LLVM versions on LLVM IR from all other
versions. That's untested. I'd be afraid to rely on that.

I don't know what Apple does with user-supplied bitcode, but if I were them
I'd be recompiling old bitcode with the old LLVM that matches it, or something
else (like subset the bitcode to remove undefined behavior, etc.).

~~~
glandium
I found an LLVM IR incompatibility once, and it was already fixed. It _seems_
some checks run for compatibility with released versions. I'm not sure whether
they're entirely automatic and systematic, but they do happen.

However, the way I found it is that it affected the random version of LLVM
trunk that stable rustc was using at the time... That's one of the reasons why
stable Rust should stay away from LLVM trunk.

------
bla3
> No problem, we thought and instructed the Rust compiler to disable its own
> ThinLTO pass when compiling for the cross-language case and indeed
> everything was fine -- until the segmentation faults mysteriously returned a
> few weeks later even though ThinLTO was still disabled. [...] Since then
> ThinLTO is turned off for libstd by default.

Instead of fixing the crash, they landed a workaround.

> We learned that all LLVM versions involved really have to be a close match
> in order for things to work out. The Rust compiler's documentation now
> offers a compatibility table for the various versions of Rust and Clang.

It's cool they got it working, but it sounds like this is currently proof-of-
concept quality and not very productionized yet. To me, the overall tone of
the article sounds like they ran into a bunch of issues and opted for duct
tape instead of systemic fixes. Which is fine to get things off the ground, of
course! But I hope they take the time to go back and fix the underlying issues
they ran into too.

~~~
pcwalton
This isn't anything new. Rust has had to land workarounds for lots of LLVM
issues in its history. For example, Rust had to stop using noalias on function
parameters because LLVM miscompiled too many functions with it: Rust can use
it far more than C/C++ do, so it hadn't received much upstream test coverage.
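
As a rough illustration of why Rust can use noalias so much more often than C/C++: a `&mut` parameter is guaranteed not to overlap any other reference that is live at the same time, so in principle every such parameter can carry the attribute. A minimal sketch (the function name `add_twice` is made up for illustration):

```rust
/// `dst` is an exclusive reference and `src` is a shared one, so the
/// borrow checker guarantees they never point at the same `i32`.
/// An optimizer that trusts this may load `*src` once and reuse it,
/// even though there is a store to `*dst` between the two reads.
fn add_twice(dst: &mut i32, src: &i32) {
    *dst += *src;
    *dst += *src;
}

fn main() {
    let mut total = 1;
    let step = 10;
    add_twice(&mut total, &step);
    println!("{}", total); // prints 21
}
```

The equivalent C function taking two `int *` parameters would need `restrict` on them to permit the same optimization, which is why the attribute is comparatively rare in C/C++ code.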

~~~
bla3
Rust could fix upstream issues it runs into, no?

~~~
bzbarsky
They could, and they do.

That said, they don't have infinite time, and if, as in this case, the
upstream fix would: (1) be pretty involved and (2) be very likely to get
regressed because upstream doesn't have the capability to run tests that would
prevent that (e.g. because upstream only runs C++ compilation tests and there
is no way to exercise the relevant bugs via C++ code), then investing in
fixing upstream may not be the right tradeoff.

In theory, one could first change upstream's test harness to allow Rust code,
but that involves upstream tests depending on the Rust compiler frontend,
which apart from being a technical problem is probably a political one.

Maybe it would have been possible to do upstream tests via bitcode source
instead of Rust or C++; I don't know enough about LLVM to say offhand. But in either
case this is not as easy as just "fix a simple upstream bug"...

~~~
pcwalton
Upstream tests are generally done at the LLVM IR level actually. It's mostly
just a question of (1) time; (2) worries about ongoing maintenance work
upstream; (3) a general feeling that perhaps such optimizations are best done
on MIR anyway, because they'll be more effective there than they would be in
LLVM.

~~~
comex
You're suggesting that rustc should do noalias optimizations on MIR? I'm
skeptical of that idea... A lot of duplicate loads that would benefit from
being coalesced are only visible after LLVM inlining.
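
A small sketch of what that can look like, with hypothetical names (`Config`, `twice_limit`). Before inlining, the optimizer sees two calls to `limit()`; only once they are inlined do they become two loads of the same field separated by an unrelated store, which aliasing information could then merge into one. Whether a given compiler version actually does so depends on its flags and passes.

```rust
struct Config {
    limit: i32,
}

impl Config {
    /// Tiny accessor: after inlining, each call is just a load of `self.limit`.
    fn limit(&self) -> i32 {
        self.limit
    }
}

/// `config` (shared) and `scratch` (exclusive) cannot overlap, so the store
/// to `*scratch` cannot change `config.limit` between the two reads.
fn twice_limit(config: &Config, scratch: &mut i32) -> i32 {
    let first = config.limit();
    *scratch = 0;
    let second = config.limit();
    first + second
}

fn main() {
    let config = Config { limit: 7 };
    let mut scratch = 99;
    println!("{}", twice_limit(&config, &mut scratch)); // prints 14
}
```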

~~~
pcwalton
Obviously MIR inlining needs to happen first (and I think it does happen
already?) But to me it's clearly the right solution going forward. LLVM's
noalias semantics are much weaker than what we can have on MIR, with full
knowledge of transitive immutability/mutability properties.

~~~
comex
'Classic' LLVM noalias as a function parameter attribute is weak, but the
metadata version is much more flexible. I looked into it in the past and IIRC
it's not a perfect match for Rust semantics, but close enough that rustc could
still use it to emit much more fine-grained aliasing info; it just doesn't.
But there was also a plan on LLVM's end to replace it with yet a third
approach, as part of the grand noalias overhaul that's also supposed to fix
the bug. Not sure if there's been any progress on that.

As far as I can tell, MIR inlining currently happens with -Z mir-opt-level=2
or higher, and that is not implied by -O. But I have no idea what future plans
exist in that area.

I admit I have a bias here: it feels to me like everyone (not just Rust) is
running away from doing things at LLVM IR level, and the resulting duplication
seems inelegant. But on reflection, part of the reason I have that feeling is
that I've recently been spending time fixing up Clang's control-flow graph
functionality... which itself is 12 years old, so it's not a new trend at all!

------
zelly
Rust needs easy interop with C++'s ABI. Easy as in I should be able to "import
Boost" and have it all mapped to Rust structures without doing anything.

A big reason C++ took off was backward compatibility with C. Network effects.
Today C++ has the role that C had in the 80s.

No one uses any other compiler but LLVM for Rust anyway, so who cares about
compatibility with MSVC and others. This will also force adoption of LLVM,
which can be a good incentive for LLVM to support it.

~~~
comex
I think this is both possible and desirable. Basically, create a bridge
between rustc and Clang, kind of like Swift's C importer, but far more complex
in order to be able to do things like instantiate C++ templates on demand
(perhaps even with Rust types as arguments!).

However, it would also be _extremely hard_; I don't think any programming
language has ever created such a tight bridge to C++. The existing approaches
I've seen are:

- Swig, rust-bindgen, etc.: Pass through auto-generated C ABI wrappers;
support for generics is limited and requires declaring up front what types you
want to instantiate them with.

- D: You can bind directly to C++ if you rewrite your C++ header in D,
generics and all... including the implementations of all inline functions.

Both very limited, especially in template-happy modern C++.
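
The "declare up front" limitation of the C ABI wrapper approach can be sketched in the mirror direction, using a Rust generic instead of a C++ template so the example stays self-contained. All names here (`max_of`, `max_of_i32`, `max_of_f64`) are hypothetical. A generic has no single symbol until its type parameter is chosen, so each instantiation that should cross the C ABI needs its own wrapper written (or generated) ahead of time.

```rust
/// Generic function: no single symbol or ABI exists until `T` is chosen.
fn max_of<T: PartialOrd>(a: T, b: T) -> T {
    if b > a { b } else { a }
}

/// Wrapper for the `T = i32` instantiation, using the C calling convention.
pub extern "C" fn max_of_i32(a: i32, b: i32) -> i32 {
    max_of(a, b)
}

/// Wrapper for the `T = f64` instantiation. Any other type a foreign
/// caller wants would need yet another wrapper like this one.
pub extern "C" fn max_of_f64(a: f64, b: f64) -> f64 {
    max_of(a, b)
}

fn main() {
    println!("{}", max_of_i32(3, 9)); // prints 9
    println!("{}", max_of_f64(2.5, 1.0)); // prints 2.5
}
```

A bridge that instantiates templates on demand would remove the need to enumerate these wrappers in advance, which is the hard part described above.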

~~~
pjmlp
You forgot about COM/UWP, which has taken the role originally intended for
.NET on Longhorn.

The whole point of UWP was to improve COM to make it even better for language
interop, increasing the kind of language features that can get exposed as COM
libraries.

It is also the number one reason why, if Rust wants to succeed as a systems
language on Windows, it needs first-class support for COM/UWP.

------
jokoon
I'm not a huge fan of Rust, but that would make Rust much more attractive and
simple to use.

Keeping C++ software while making sure important parts are bug free sounds
awesome...

~~~
mlindner
You don't have to love Rust to still use it in specific narrow areas.

