As long as the Rust language specification isn't stable, GCC's Rust implementation would just fall behind all the time.
As an alternative, it has been suggested to provide a GCC frontend for LLVM IR, which the Rust compiler generates, since that specification is stable.
If Rust's specification becomes stable at some point, it would probably be a good idea to start a Bountysource campaign (like we recently did for the m68k and AVR backends) to help get the Rust frontend brushed up and ready to be merged into the GCC source tree.
>  https://www.bountysource.com/issues/80706251-m68k-convert-th...
>  https://www.bountysource.com/issues/84630749-avr-convert-the...
Compare that to C/C++, which has new releases every 3 years. A compiler that only supports C++14 is still perfectly usable for most C++ codebases out there. In fact, Godot engine still hasn't migrated to C++11 yet, and it's no exception to the norm at all. Even if your compiler has parity with Rust 1.31, released one year ago, you'll have trouble with most projects: even if the project itself doesn't depend on newer compiler releases, one of its transitive dependencies will.
Because then you are bound by the release schedule of GCC. Of course one can think about separate projects like the gcc-rust one where you can have whatever release schedule you want. But then it'd be a separate project building on GCC and not a part of the larger GCC project itself.
LLVM having a separate release schedule from rustc isn't a problem. rustc can skip an LLVM release without any problem. A rustc 1.35.0 compiled with LLVM 5 can compile a rustc 1.36.0 compiled with LLVM 7. However, GCC can't skip releases. The rustc 1.36.0 frontend needs at least a 1.35.0 frontend to compile itself. And most Rust programs in the ecosystem work with rustcs compiled with older LLVM releases, but most Rust programs that have dependencies do need newer rustc releases.
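To make the bootstrapping point concrete, here is a minimal sketch. It assumes (as a simplification) that each rustc 1.N can only be built by 1.(N-1) and that no newer binary is available; the function name `bootstrap_chain` is made up for illustration.

```rust
/// Minor versions of rustc 1.x that would have to be built, in order, to get
/// from the compiler you have to the one you want.
/// Assumption for this sketch: each 1.N is built by exactly 1.(N-1).
fn bootstrap_chain(have_minor: u32, want_minor: u32) -> Vec<u32> {
    if want_minor <= have_minor {
        // Nothing to build: the compiler we have is already new enough.
        return Vec::new();
    }
    ((have_minor + 1)..=want_minor).collect()
}

fn main() {
    // A frontend frozen at 1.31 that needs to reach 1.36.
    let chain = bootstrap_chain(31, 36);
    for minor in &chain {
        println!("build rustc 1.{}.0", minor);
    }
    println!("{} builds needed", chain.len());
}
```

Under that assumption, a frontend that stops tracking releases has to step through every intermediate version, whereas the LLVM version underneath can jump freely.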
As for stability of MIR: currently I think MIR is serialized to disk in a glorified mmap way, basically following the Rust memory representation. That's great if both the creator of the MIR and the part that reads the MIR are written in Rust. Furthermore, there are libraries describing what the memory layout should look like that are provided to codegen backends. Those libraries are written in Rust and not really usable outside of Rust. So currently, unless someone serializes MIR using e.g. bincode and provides C bindings for those layout libraries, there are good reasons to write the codegen backend in Rust itself, at least the part that translates MIR to the next stage.
In this fashion, LLVM IR is different as it allows a multitude of languages to communicate with it.
I also just realized that it's not the same backend as the one written by "redbrain" here.
>  https://github.com/redbrain/gccrs
gcc-rust currently needs LLVM to build because that's the only good way to build Rust code, and gcc-rust (unlike, say, mrustc) uses Rust code. Once gcc-rust is on track, Rust code used by gcc-rust can be built by gcc-rust, and LLVM becomes unnecessary.
Interesting. This is the first I've heard of Cranelift being used as anything other than a code generator for wasm runtimes. When would Cranelift be used instead of LLVM, and vice versa?
But I would expect that in a few more years the rate of Rust language change will be significantly lower.
The (simplified) way that rustc works is that it translates Rust source code into a custom intermediate language called MIR, does a bunch of analysis and a few optimizations, and at the very end translates MIR to LLVM IR and passes that into LLVM. This project looks to add support for translating MIR to GCC's own IR instead of LLVM IR, allowing it to reuse all the other work that has been done writing rustc.
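The shape of that pipeline can be sketched in a few lines. Everything here is hypothetical: `ToyMir`, `Backend`, `LlvmLike` and `GccLike` are invented names, and the emitted text is made-up pseudo-output, not real LLVM IR or GIMPLE. The only point is that the frontend is shared and just the last lowering step is swapped.

```rust
/// A toy stand-in for MIR: a flat list of simple statements.
/// (Hypothetical; real MIR is a control-flow graph with much more detail.)
enum ToyMir {
    Assign { dest: &'static str, value: i64 },
    Add { dest: &'static str, lhs: &'static str, rhs: &'static str },
    Return { src: &'static str },
}

/// Stand-in for the shared frontend (parsing, type checking, borrow checking).
/// In this sketch it ignores its input and returns a fixed program.
fn frontend(_source: &str) -> Vec<ToyMir> {
    vec![
        ToyMir::Assign { dest: "a", value: 2 },
        ToyMir::Assign { dest: "b", value: 3 },
        ToyMir::Add { dest: "c", lhs: "a", rhs: "b" },
        ToyMir::Return { src: "c" },
    ]
}

/// The only part that differs per backend: lowering one statement.
trait Backend {
    fn lower(&self, stmt: &ToyMir) -> String;
}

struct LlvmLike;
struct GccLike;

impl Backend for LlvmLike {
    fn lower(&self, stmt: &ToyMir) -> String {
        match stmt {
            ToyMir::Assign { dest, value } => format!("%{} = const {}", dest, value),
            ToyMir::Add { dest, lhs, rhs } => format!("%{} = add %{}, %{}", dest, lhs, rhs),
            ToyMir::Return { src } => format!("ret %{}", src),
        }
    }
}

impl Backend for GccLike {
    fn lower(&self, stmt: &ToyMir) -> String {
        match stmt {
            ToyMir::Assign { dest, value } => format!("{} = {};", dest, value),
            ToyMir::Add { dest, lhs, rhs } => format!("{} = {} + {};", dest, lhs, rhs),
            ToyMir::Return { src } => format!("return {};", src),
        }
    }
}

/// Run the shared frontend once, then hand every statement to the chosen backend.
fn compile(source: &str, backend: &dyn Backend) -> Vec<String> {
    frontend(source).iter().map(|stmt| backend.lower(stmt)).collect()
}

fn main() {
    let source = "fn f() -> i64 { 2 + 3 }";
    for line in compile(source, &LlvmLike) {
        println!("{}", line);
    }
    for line in compile(source, &GccLike) {
        println!("{}", line);
    }
}
```

Both backends consume the same `ToyMir`, which is why a new backend gets to reuse everything that happens before lowering.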
For a from-scratch reimplementation of (most of) a Rust frontend, see mrustc ( https://github.com/thepowersgang/mrustc ), which is written in C++, and served to demonstrate that (a prior version of) rustc did not feature a reflections-on-trusting-trust attack (via diverse double compilation).
FWIW, there is an out-of-tree m68k backend for LLVM. I have already done some experimental work on Rust for m68k based on this backend.
>  https://github.com/M680x0/M680x0-llvm/
>  https://github.com/glaubitz/rust/tree/m68k-linux
* Alpha is more than a decade past end of life.
* Hppa is EOL as of 6 years ago.
* ia64 goes EOL next month.
... and the list goes on. These aren't going concerns outside of some hobbyist work, and GCC has had difficulty retaining maintainers for some of the hardware and has threatened deprecation of them (in fact IA-64 is facing this exact situation and is going to be marked deprecated in GCC 10, and vanish in 11 https://gcc.gnu.org/ml/gcc/2019-06/msg00125.html). With instruction set support in that state, the odds are reasonable that bugs will be exposed, and that they will be difficult to support and fix (exacerbated by the scarcity of hardware).
Adding support for obsolete hardware is not much of a good argument for taking on the work of supporting two compilers.
There are some back-end changes coming to GCC, and no one had ported m68k to them (presumably there is no maintainer for m68k in GCC?). The back-end code that m68k was relying upon was going away, and it needed to be ported to the modern one. Someone has stepped up and done that work, so it gets to live another day.
Based on https://gcc.gnu.org/backends.html, it looks like avr, cris, h8300, mn10300 and vax are yet to be ported away from cc0.
This currently shows no sign of happening. What do you expect to change such that this happens?
It's unlikely that LLVM would have been maintained otherwise
Again, why? You're making this assertion as if it's fact, but there are a huge number of maintained pieces of software that do the same task as other maintained pieces of software. A great many pieces of software exist that didn't even get started until after similar software was already established. Do you have any logic behind your thinking beyond the fact that they're two pieces of software that do similar tasks?
That's why the LLVM people won't switch over to GCC, sure. But what do the GCC people think of it? If they're content to keep working on GCC, what's going to change such that they want to stop? I could imagine that if the LLVM group gets orders of magnitude ahead (seems unlikely - thus far, they seem to be mostly keeping up with each other in terms of performance and so on - but it could happen), GCC might start to be seen as a niche piece of legacy software and start to head towards retirement. Any other obvious pathways for the end of GCC?
Just natural turnover. I expect GCC to struggle to attract new developers. When it was clearly the best open-source optimizer there was a certain cachet to it, but now all the advantages are with LLVM.
> Do you have any logic behind your thinking beyond the fact that they're two pieces of software that do similar tasks?
Apple in particular are documented as having tested the limits of GCC's licensing before funding work on LLVM; I believe other corporate contributors have made comments along the same lines.
I would welcome another rust compiler as it would give us choices.
Why was MIR chosen instead of WASM?
MIR is Rust's own accurate low-level representation, with much more type information than WASM, and it's aware of the native target's specification, so it can be optimized better and has better interoperability with native code.
But rust in the general case is agnostic about 32 vs 64 bit pointers and explicitly targets WASM.
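As a small illustration of what "agnostic about pointer width" means in practice, here is a sketch using only the standard library. The helper names `pointer_bits` and `describe_target` are made up; `usize` and `cfg!(target_pointer_width = ...)` are the real mechanisms.

```rust
use std::mem::size_of;

/// Pointer width of the compilation target, in bits.
/// `usize` is defined to be pointer-sized, so this follows the target.
fn pointer_bits() -> usize {
    size_of::<usize>() * 8
}

/// The same information, resolved at compile time through `cfg!`.
fn describe_target() -> &'static str {
    if cfg!(target_pointer_width = "32") {
        "32-bit pointers"
    } else if cfg!(target_pointer_width = "64") {
        "64-bit pointers"
    } else {
        "other pointer width"
    }
}

fn main() {
    // The same source builds unchanged for a 64-bit native target or for a
    // 32-bit one such as wasm32-unknown-unknown; only the answers differ.
    println!("{} bits: {}", pointer_bits(), describe_target());
}
```

Code that sticks to `usize` for sizes and indices, rather than hard-coding `u32` or `u64`, compiles the same way for both kinds of target.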
I'm not familiar with GCC's IR, but unsandboxed AOT WASM compiled thru LLVM IR is astoundingly fast.
> This average benchmark has speed in microseconds and is compiled using GCC -O3 -march=native on WSL. “We usually see 75% native speed with sandboxing and 95% without. The C++ benchmark is actually run twice – we use the second run, after the cache has had time to warm up. Turning on fastmath for both inNative and GCC makes both go faster, but the relative speed stays the same”, the official website reads.
> “The only reason we haven’t already gotten to 99% native speed is because WebAssembly’s 32-bit integer indexes break LLVM’s vectorization due to pointer aliasing”, the WebAssembly researcher mentions. Once fixed-width SIMD instructions are added, native WebAssembly will close the gap entirely, as this vectorization analysis will have happened before the WebAssembly compilation step.
It's important for core infrastructure to have multiple competing implementations. On a related note, does Rust have a standard yet or are they still doing the reference implementation thing?
> I was under the impression that llvm is better than gcc?
And I thought that tabs were better than spaces, BSD beat Linux, Emacs was the one true god... what were we arguing about again?
I'd be curious to know whether this would provide cross-language outlining during LTO using gcc. I believe some form of this is possible with llvm?
For many practical purposes I think the closest thing to a language definition is the set of testsuites visible to Crater.
(That is: when the compiler people are considering a change, they don't say "we can't change this because we're past 1.0 and the change is technically backwards-incompatible", or "we can't change this because the Reference specifies the current behaviour"; they say "let's do a Crater run and see if anything breaks".)
It’s a bit weird how laser-focused the Rust community is on backwards compatibility, not seeming to believe that forward compatibility is also important.
e.g., if I write code targeting C++17, I can be reasonably sure it compiles with an older version of the compiler, as long as that version also claims to support C++17, modulo bugs. Not the case if I write code targeting Rust 2015 as they’re still adding features to that from time to time. Let alone Rust 2018 which changes every 6 weeks.
Will there ever be a version of Rust that the community agrees “OK, this language definition is not changing unless we find some major soundness bug” ?
This is a big blocker for mainstream adoption in Linux distributions since the maintainer wants to be able to land one specific version of rustc in the repositories, not rely on people downloading new versions with rustup continuously. But old versions of rustc are effectively useless due to the lack of forward compatibility guarantees.
g++ 4.4 implemented several key parts of C++11, including notably rvalue references, and adapted libstdc++ to use rvalue references in C++11 mode. However, the committee had to make major revisions to rvalue references subsequent to this implementation, to the point that you can't use libstdc++ 4.4's header files (such as <vector>) with a compliant C++11 compiler. So when you try to use newer clang (which prefers to use system libstdc++ for ABI reasons) on systems with 4.4 installed (because conservative IT departments), the result is lots and lots of pain.
Furthermore, it absolutely is the case that newer versions of compilers will interpret old language standards differently than older versions of the compiler. You don't notice it for the most part because the changes tend to revolve around pretty obscure language wordings involving combinations of features that most people won't hit. Compilers are going to try hard not to care about language versions past the frontend of the compiler--if the specification change lies entirely in the middle or backend, then that change is likely to be retroactively applied to past versions because otherwise the plumbing necessary is too difficult.
(And has a number of other disadvantages too, like constant cognitive load having to re-learn the language every 6 weeks).
Also, editions could be a snapshot of the language definition at a point in time, without being a snapshot of the compiler. There are still new versions of Clang and GCC coming out with new bugfixes, better optimizations, improved error messages, support for different hardware, WIP support for future language editions, etc., without changing the C++17 standard.
I mention the reason in my prior comment: to allow people to continually upgrade their compiler version without needing to change any code. Rust doesn't have a stable ABI, so all crates in a Rust project ultimately need to be built with the same compiler (and furthermore, crates must always be able to interoperate regardless of which edition they're on). That means that every new version of the compiler needs to support every old edition, because the alternative is to have users stuck on old versions of not just the compiler but also on old versions of dependencies that have since begun using features only supported by newer compilers. In Rust's case avoiding such a fundamental fracture in the community was more important, since, after all, there's still nothing stopping anyone from voluntarily sticking with an older version of the compiler if they're willing to deliberately endure such a situation.
> (And has a number of other disadvantages too, like constant cognitive load having to re-learn the language every 6 weeks).
This is quite hyperbolic. Rust introduces no more features than any other language, it simply rolls them out on a more fine-grained schedule. Furthermore, Rust hardly requires re-learning every six weeks; a "feature" introduced by a new version is often nothing more than a new convenience method in the standard library. The fact that we have established that Rust goes out of its way even to keep "old" code compiling and compatible with the rest of the ecosystem should demonstrate how little it demands that users re-learn anything.
While the cadence may be faster, the pace of language change in Rust is slower than other popular languages such as C++ or JS.
Right now with C++20 around the corner, C++14 is still the safest bet for portable code, whereas in Rust we still see relevant crates that depend on nightly.
* The new lifetime model (non-lexical lifetimes)
* Async fn
* Procedural macros
* ? operator
* Import name resolution changes
* impl Trait
* C-style unions
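For readers who haven't followed these additions, here is a small sketch touching three of the Rust items listed above (the `?` operator, `impl Trait`, and non-lexical lifetimes). The function names are invented for the example.

```rust
use std::num::ParseIntError;

/// `?` operator: return early with the error if parsing fails.
fn parse_and_double(input: &str) -> Result<i32, ParseIntError> {
    let n: i32 = input.trim().parse()?;
    Ok(n * 2)
}

/// `impl Trait` in return position: the concrete iterator type stays unnamed.
fn evens(limit: u32) -> impl Iterator<Item = u32> {
    (0..limit).filter(|n| n % 2 == 0)
}

/// Non-lexical lifetimes: the shared borrow `first` ends at its last use,
/// so the later `push` is accepted even though `first` is still in scope.
/// Under the older lexical model this function was rejected.
fn nll_demo() -> Vec<i32> {
    let mut v = vec![1, 2, 3];
    let first = &v[0];
    let copy = *first;
    v.push(copy);
    v
}

fn main() {
    println!("{:?}", parse_and_double("21"));
    println!("{:?}", evens(7).collect::<Vec<_>>());
    println!("{:?}", nll_demo());
}
```

None of these change what older code means; they mostly let newer code be shorter, which is part of why old crates keep compiling.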
Here's the list of features added in C++17 and C++20:
* constexpr if
* Structured binding
* Type deduction helpers
* <=> operator
* Expanded the set of expressions and statements that qualify as constexpr to the point that it's a very different feature from what it was in C++11.
In the same amount of time, C++ has added roughly the same number of features, but I would qualitatively say that C++'s feature additions are more impactful than Rust's feature additions, especially in terms of making newer code unrecognizable to programmers used only to the old version.
That's what I meant by pace versus cadence--overall, C++ has changed more, but it tends to change in triennial bursts instead of every six weeks.
The 3 years for each ISO revision, plus around three until the standard is actually usable in a portable way.
My old dusty non-maintained MFC applications still compile on latest Visual Studio.
There are supported versions of RHEL that still only support C++11, FWIW.
Rust dearly needs a stable specification; the lack of one is the main reason the language hasn't been more widely adopted.
E.g., if someone thinks "I'm going to target 2015 because I want my code to run on the rustc shipped with various slow-moving Linux distros", it doesn't help, because you might still not be able to target their code, unless they specifically target an older version of rustc, which nobody does.
There has been discussion of a Rust LTS channel alongside stable/beta/nightly, which would try to solve that problem, but it has not been prioritized yet: https://github.com/rust-lang/rfcs/pull/2483
An actual frozen language is also a possibility, but probably won't happen until more work happens on an independent specification. Which, in fact, people are also working on: https://ferrous-systems.com/blog/sealed-rust-the-pitch/
It really depends on how strictly you define the term specification. The Rust Reference is not required to be accurate. Though many other language compilers/implementations don’t fully implement their respective specs so, :shrug:.
If I have a question whose answer isn't obvious, it's far more likely that I have to go trawling around in RFCs than that there's an answer in the reference.
I think most languages of a similar age (eg Go, Swift) are doing better.
Could you cite sources, please?
Now, in terms of end results, llvm and gcc each have their qualities. When llvm was released, gcc typically produced faster binaries but llvm optimizations were easier to understand. Since then, both have evolved and I haven't tried to catch up.
Bottom line: having two back-ends for rust or any other language is good. Among other things, it's a good way to check for bugs within an implementation of the compiler, it can be used to increase the chances of finding subtle bugs in safety-critical code, etc.
The most recent restrict bug the rustc developers found (which made them disable restrict again) was found in both LLVM and GCC (they made a C reproducer, so they could test in both). See: https://github.com/rust-lang/rust/issues/54878 (rustc) https://bugs.llvm.org/show_bug.cgi?id=39282 (LLVM) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87609 (GCC).
I even filed a version of it myself:
I'm actually surprised that Rust enabled noalias usage with this known outstanding issue. When I worked on Rust years ago, it was definitely common knowledge on the compiler "team" that this was broken.
I'm equally surprised that GCC had that bug, since their pointer aliasing model is equipped to correctly handle this situation (and is why they were able to fix it quickly).
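For context, this is roughly what the noalias/restrict assumption buys the optimizer. It is a minimal sketch of the guarantee itself, not a reproduction of the bug linked above; `write_then_read` is an invented name.

```rust
/// Two `&mut` parameters can never point at the same memory in safe Rust.
/// That is the property rustc can pass on to the backend as `noalias`
/// (the counterpart of C's `restrict`).
fn write_then_read(a: &mut i32, b: &mut i32) -> i32 {
    *a = 1;
    *b = 2;
    // With the no-alias guarantee, the write through `b` cannot have
    // changed `*a`, so an optimizer may fold this load to the constant 1.
    *a
}

fn main() {
    let mut x = 0;
    let mut y = 0;
    println!("{}", write_then_read(&mut x, &mut y));
    // `write_then_read(&mut x, &mut x)` does not compile: the borrow checker
    // rejects two simultaneous mutable borrows of `x`.
}
```

The miscompilations discussed here come from the backend applying that assumption incorrectly in more complicated situations, not from the guarantee being wrong at the source level.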
And the number of architectures by LLVM:
GCC supports vastly more targets.
> The C standard uses the term byte to mean the minimum addressable unit in the implementation, which is char, which means a byte on these targets is 16 bits. This is in conflict with the widespread use of byte to mean 8 bits exactly. This is an unfortunate disagreement between C terminology and widespread industry terminology that TI can't do anything about.
In the past, architectures differed wildly in the number of bits per byte, e.g. 36 for the machine where the Pascal language was created.
Today, the industry has mostly standardized on 8 bits per byte, but see e.g. the PIC architecture for a currently relevant example with a different choice: 8-bit bytes for data, but 10-bit bytes for instructions.
I think that's an anachronistic/incorrect usage. A lot of machines (including several with 36-bit words that you mentioned) supported larger basic addressable units of memory, but didn't call these larger units "bytes", and distinguished between "bytes" and "words". In fact, one of the elements of the early RISC philosophy was that CPU support for byte accesses (as opposed to word accesses) was extraneous, based on statistics gathered from real programs. Early MIPS/Alpha/etc. machines did not support byte addressing, but the people using them still called 8 bits a byte.
That is my experience too.
For code with a high level of nesting, meaning high potential for inlining (typically C++), GCC is close to unbeatable, even compared with highly optimised compilers like Intel ICC.
LLVM's IR is not stable by design. [1, 2]
> while GCC refuses to do so over the decades for political and strategic reasons.
That was a long time ago. Since GCC 4.5 (released in 2010) GCC supports external plugins. [3,4] These plugins, like the rest of GCC, use GENERIC and GIMPLE as their IR.