A possible new back end for Rust (jason-williams.co.uk)
526 points by obl 46 days ago | 218 comments

This is really great. The world needs more diverse compiler tech. The llvm monoculture is constraining what kind of compiler research folks do to just the things that are practical to do in llvm.

I particularly suspect that if something like Cranelift gets evolved more then it will eventually reach throughput parity with llvm, likely without actually implementing all of the optimizations that llvm has. It shouldn’t be assumed that just because llvm has an optimization that this optimization is profitable anywhere but llvm or at all.

Final thought, someone should try this with B3. https://webkit.org/docs/b3/

Devil's advocate: more diverse compiler tech will mean a more fragmented community and a larger probability of divergence across implementations.

People think the C compiler community is dominated by GCC and Clang, and it is, but there are literally 1000s of implementations out there in the wild. Most are necessary, because we need code generated for some obscure processor architecture that's completely proprietary, but you can create that "backend" in LLVM itself - it's a new target architecture instead of e.g x86.

The great thing about LLVM is that it's effectively the quickest (and probably the best) way to generate machine code without putting in too much effort, for a language. Whether that language be a research language or an existing industry language (say, C), that kind of establishment is hugely valuable.

A great example of a good monoculture is the Go monoculture. Sure, there's gccgo, but the proportion of people using that vs. the reference implementation is minimal, and that reduced fragmentation is actually a good thing for practitioners (which most engineers are, not PL researchers).

Llvm is absolutely not the least effort for generating machine code. In many settings, it takes a fraction of the effort of integrating llvm to create a template compiler that goes straight to machine code. In many other cases, your best bet is to have your compiler emit C and then feed that to a C compiler of your choice.

It’s good to have divergence. Competition is good. Otherwise people stop trying new things.

Depends on your goals. Writing a front end, optimizer and backend quickly gets to more work. I can write a c++ compiler in a few months. It won't be good and to make it good would be many many years of work. If I write a llvm backend it might take a little longer (I doubt it), but I automatically get all the optimizations llvm has plus a good front end that doesn't have bugs in obscure corner cases. (not claiming llvm is perfect but there will be less bugs)

> I can write a c++ compiler in a few months.

Not that it substantially detracts from your point, but I strongly doubt this. Or did you mean a heavily restricted subset of C++? A C++ front end alone is so complex to build that these guys make a living off of licensing their front end code: https://www.edg.com/

(Fun fact: Microsoft rebuilt IntelliSense for C++ on the EDG front end. Yes, that Microsoft with the MSVC compiler. See https://devblogs.microsoft.com/cppblog/rebuilding-intellisen... and https://old.reddit.com/r/cpp/comments/bdt8ep/does_msvc_still...)

Even without compatibility cruft, you're looking at multiple 100k LOC if their code base is anything to go by. That's man-years, not man-months...

EDG is writing a good front end, with good error handling and all of the other things that make a commercial program a few hundred times harder than a quick prototype. I'd be writing a brute-force front end that is slow and goes straight to assembly. If there is a syntax error I'll handle it by crashing. When a variable is used twice in a row I'll store the intermediate value back into memory and reload it into my register.

There have been "write a C++ compiler" classes that did it in a semester. I think they also do the standard library (but that might be a year-long course).

I'm still awfully skeptical that you can get even close to understanding all the template, lookup, lifetime, lambda, exception and initialization rules in a semester (I assume that CS students are almost never this deep into C++ at that point). Not to speak of actually implementing all of it.

I'd be curious where these classes draw the line. Do you happen to have a syllabus or so? I don't doubt you can implement a meaningful portion of C++ in a semester, but converting 500 pages of standardese into code within as many hours seems like an impossible goal for a class to me.

http://www.cppgm.org/ was a sort of experiment called "C++ Grand Master Certification." It seems to have died out, but the goal was to produce a self-hosted C++-11 compiler and standard library implementation. It took about a year to go from nothing to code generation in 9 programming assignments (I think I completed the first 6 before I got too busy to continue).

Still not as good as emitting C code in most cases? C code gets optimized using either llvm or any other optimizer so it’s a more portable compile target.

When you emit C, you're limited by C, at least if you want to emit C as opposed to inline assembly wrapped in C. For example, it's harder to have a function return more than one value in C than it is in most architectures, you can't do things with processor flags (on architectures which have them), you're at the mercy of the C compiler's optimizer as to vectorization and loop unrolling, you can't always preserve semantic information in the source code even when a "reasonable" compiler would be able to use it to improve the machine code...

LLVM was created to replace emitting C, by providing programmers a way to turn source code into a representation that is lower-level than C without having to write the whole optimization and assembly code generation pipeline.

I think we can all agree that LLVM IR is a more powerful compilation target than C. However, what Pizlo was saying is that generating C can be simpler than generating LLVM IR. A bunch of printfs can get you very far.
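To make the "bunch of printfs" point concrete, here is a minimal sketch of a code generator that emits C as plain text. Everything in it is an assumption made for illustration: the nested-tuple AST, the restriction to `long` arithmetic, and the names `emit_expr` and `emit_function` do not come from any real compiler.

```python
# Hypothetical mini-language: nested tuples such as ("+", 1, ("*", "x", 3)).
# emit_expr and emit_function are illustrative names, not from any real tool.

def emit_expr(node):
    """Return a C expression string for a nested-tuple AST node."""
    if isinstance(node, int):
        return str(node)
    if isinstance(node, str):  # a variable reference
        return node
    op, lhs, rhs = node
    if op not in ("+", "-", "*"):
        raise ValueError(f"unsupported operator: {op}")
    # Parenthesize everything so C precedence never changes the meaning.
    return f"({emit_expr(lhs)} {op} {emit_expr(rhs)})"


def emit_function(name, params, body):
    """Return C source for a function computing `body` over long arguments."""
    args = ", ".join(f"long {p}" for p in params)
    return f"long {name}({args}) {{\n    return {emit_expr(body)};\n}}\n"


if __name__ == "__main__":
    # Print the generated C; it could then be handed to any C compiler.
    print(emit_function("f", ["x"], ("+", 1, ("*", "x", 3))))
```

The generator never needs to know about registers or instruction selection; the downstream C compiler handles that, which is the trade-off being discussed here.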

There are very few things that you can say in LLVM IR that you can't say in C.

I use C as a target for my own compiler and I agree that there are very few things that I miss. But the ones we do miss do stick out like a sore thumb. One of the main ones for me is a way to get an accurate stack map for precise garbage collection. With C you usually gotta go for a conservative GC, or you need to maintain a shadow stack. (Although I'll admit that the GC story with LLVM isn't super great either)

Except for _e.g._ memory alignment, or types of a given width (until C99).

For lots of us the extensions that gcc and clang support are part of C. So, a backend that emits C would get access to those things if it wanted.

Until you want to write code that works on platforms not supported by these compilers.

How many platforms does LLVM support that Clang does not?

LLVM, while a very successful project, isn't anything new as an idea.

IBM had several LLVM-like projects during the '70s, and that is how their surviving IBM i and z/OS work anyway, with language environments that AOT-compile at installation time.

Likewise, there were projects like the Amsterdam Compiler Kit, among others, during the early '80s.

> it's harder to have a function return more than one value in C than it is in most architectures

Biggest issue is the cultural aversion to returning structs and tagged unions.

And it's not even hard, just ugly. Which is much less of a problem for a compiler IR.

That’s true. If you’re doing a lot of that then llvm ir might be nicer. Matter of taste of course.

Some syntactic sugar using anonymous structs and type inference would probably fix the ugly.

What about the usecase provided in the blog post, where one is used for fast debugging/dev builds but you use LLVM for production releases? Basically: why not both?

I'm guessing this could create some divergence in terms of what is supported by the compiler but I'm curious how much that would matter in reality - for day-to-day serious project development. I'm not familiar with language dev at the compiler level, so I'm curious to hear if that's practical or sane.

Worth mentioning other alternative small backends: http://c9x.me/compile/

I use qbe, it's great. Here's a mostly feature-complete C11 compiler based on qbe:


I see that cproc is under quite heavy development, but qbe had last commit at the end of November. Is it considered feature complete? I heard about it some months ago and was quite interested in QBE, but it did not enjoy high tempo of changes. It may be considered advantage, I know too little to judge.

It's complete enough to compile C11 programs - to me, that's as good of a benchmark as anything. The main thing qbe is missing for cproc's purposes is inline assembly and VLAs. DWARF support would also be nice, but no one seems to care enough to do the work yet.

Are you saying that qbe does not generate any debug information, or just not DWARF format?

qbe does not generate any debug information. Though you can pass some flags to get some ideas of its internal code generation process.

While the GP doesn’t state this as an advantage, the Rust community would benefit from a fully Rust toolchain.

Why? Other than to prove it can be done what is the point.

If rust was a huge community okay, but face it, they are not. It is better therefore to focus their efforts where they can make a difference. A new x where the existing ones are just fine (this includes well maintained) is a waste of resources.

There are many possible good answers to the above question. However I'm not sure they apply, and worse I believe they will split resources that could be used to make something else better.

Cranelift - the compiler toolchain being discussed in this post (previously known as Cretonne) - is actually completely written in Rust, being developed (obviously) by Rust programmers, that are members of the Rust community. Its development started at Mozilla, which still employs some of its developers to work on it full-time.

So.. the claim that the Rust community is not big enough to achieve this is wrong, since they have already done it..

The reason they are doing it, is that LLVM is not fine: it is super _super_ slow. People want Rust to compile instantaneously, and are willing to pay people full time to work on that.

D, for example, compiles much faster than C and C++, and does this by having its own backend for unoptimized builds. I don't know how big the D community is, so I can't compare its size to the Rust community, but they did it, and it paid off for them big time, so I don't see why it wouldn't pay off for Rust as well.

DMD inherited the backend from DMC++, which was the end of a long line of optimizing C and C++ compilers going back over a decade before the earliest D alphas.

I didn't claim rust isn't big enough to do it. (that may well be true given the large effort that went into llvm over many years to make it a good optimizer - this is a different debate though and I'm not sure if it is true)

What I said was rust is better off focusing on problems that are not solved well by other people. A fast modern web browser (with whatever features is lacking) for example.

> LLVM is not fine: it is super _super_ slow

Source? LLVM is fast for what it does.

What people usually complain about is rustc being slow overall, not the LLVM passes.

> What people usually complain about is rustc being slow overall, not the LLVM passes.

The LLVM phases are usually the dominating factor in Rust compile times (the other big single contender is the linking phase). However, when the Rust developers point this out, they are also careful to mention that this may be due to the rustc frontend generating suboptimal IR as input to LLVM; we can both acknowledge that LLVM is often the bottleneck for Rust compilation while also not framing it as a failure on LLVM's part (though at the same time it is uncontroversial to state that LLVM does err on the side of superior codegen versus minimal compilation time, hence the niche that alternative compilers like Cranelift seek to fill).

This is true to some degree — Rust does more work than most programming languages, and that work will always take some time — but the Cranelift backend is also measurably faster than the LLVM one.

Why phrase it as "other than to prove it can be done" if you already know there are good answers? I think the following obviously do apply:

1) much easier for Rust community to contribute to the compiler from end-to-end.

2) lower coordination cost with LLVM giving complete, Rust-focussed control over code generation/optimisation. Think about e.g. fixing noalias.

3) lower maintenance cost for LLVM integration/fork.

It's also obvious that this needs to be weighed against the loss of LLVM accumulated technology and contributors. This is easy to underestimate (although I think 2)/3) are also easy to underestimate).

Because I don't think the possible good answers apply.

Sure it is harder to contribute to the backend, but does it matter? I've been doing c++ for years and never looked at the backend.

I'll grant lower coordination costs. However, I believe they do not outweigh the advantages of the other LLVM contributions.

If they need to fork LLVM, that is a problem. Either merge it back in and be done (with some tests so whatever they need doesn't break), or there is a compelling reason why LLVM won't work with their changes.

Lower coordination cost is a big deal. Having your own backend means you can do frontend-backend codesign. You can implement language specific optimizations in the backend. Those things are not in the cards if you’re using llvm. (I mean they might be, but unless you fork, the time it’ll take for the changes to make it into llvm will be comparable to the time it takes to write your own backend.)

Yes, it does matter, because LLVM is an incredibly complex piece of software. And when you work on a compiler, it turns out you'll have to work on the backend. When I worked on a compiler day-in-and-out, there were single files in LLVM that were bigger than our entire in-house compilation backend put together. Which do you think is more appealing to debug? When a bug in code generation causes compiled programs to segfault, it is not necessarily easy to debug if you aren't intimately familiar with the project, and this fact is compounded when you consider not everyone hacking your compiler is also a C++ programmer, knows LLVM's architecture, and so on. It is literally hundreds of thousands of lines of C++. The trigger test case is probably a massive generated IR program generated by some toolchain written in a completely foreign language, for a foreign language. Playing the game of "recover the blackbox from the crash site" is not always fun.

You can file bug reports, but not every part of the project is going to receive the same level of attention or care from core developers, and not everyone has the same priority. For example the Glasgow Haskell Compiler had to post-process LLVM generated assembly for years because we lacked the ability to attach data directly next to functions in an object file (i.e. at an offset directly preceding the function). Not doing this resulted in serious, meaningful performance drops. That was only fixed because GHC developers, not any other LLVM users, fixed it after finding the situation untenable after so long. But it required feature design, coordination, and care like anything else and did not happen immediately. On the other hand the post-processing stuff was a huge hack and broke in somewhat strange ways. We had other priorities. In the end GHC, LLVM, and LLVM users benefitted, but it was not exactly ideal or easy, necessarily.

On the other hand, "normal" code generation bugs like register misallocation or whatever, caused by extreme cases, were occasionally fixed by upstream developers, or patches were merged quickly. But absolutely none of this was as simple as you think. LLVM is largely a toolchain designed for a C compiler, and things like this show. Rust has similarly stressed LLVM in interesting ways. Good luck if your language has interesting aliasing semantics! (I gave up on trying to integrate LLVM plugins into our build system so that the code generator could better understand e.g. stack and heap registers never aliased. That would have resulted in better code, but I gave up because it turns out writing and distributing plugins for random LLVM versions your users want to use isn't fun or easy, which is a direct result of LLVM's fast-moving release policy -- and it is objectively better to generate worse code if it's more reliable to do so, without question.)

Finally, LLVM's compilation time issues are very real. Almost every project that uses LLVM in my experience ends up having to either A) just accept the fact LLVM will probably eat up a non-negligible amount of the compilation time, or B) you have to spend a lot of time tuning the pass sets and finding the right set of passes that work based on your design and architecture (e.g. earlier passes outside of LLVM, in your own IR, might make later passes not very worth it). This isn't exactly LLVM's fault, basically, but it's worth keeping in mind. Even for GHC, a language with heavy "frontend complexity", you might suspect type checking or whatever would dwarf stuff -- but the LLVM backend measurably increased build times on large projects.

> Either merge it back in and be done

It's weird how you think coordination costs aren't a big deal and then immediately say afterwards "just merge it back in and be done". Yeah, that's how it works, definitely. You just email the patch and it gets accepted, every time. Just "merge it back in". Going to go out on a limb and say you've never actually done this kind of work before? For the record, Rust has maintained various levels of LLVM patches for years at this point. They may or may not maintain various ones now, but I wouldn't be surprised if they still did. Ebbs and flows.

I'm not saying LLVM isn't a good project, or that it is not worth using. It's a great project! If you're writing a compiler, you should think about it seriously. If I was writing a statically typed language it'd be my first choice unless my needs were extreme or exotic. But if you think the people working on this Rust backend are somehow unaware of what they're dealing with, or what problems they deal with, I'm going to go out on a limb and suggest that: they actually do understand the problem domain much, much better than you.

Based on my own experience, I strongly suspect this backend will not only be profitable in terms of compilation time, which is a serious and meaningful metric for users, but will also be more easily understood and grokked by the core developers. And Cranelift itself will benefit, which will extend into other Rust projects.

Your points are well taken. Now imagine the rust compiler 15 years from now after great effort has made the backend optimizers great - most of your criticisms to llvm will apply there. It will be rust, and lack some code that isn't needed to optimize rust, but otherwise it will be extremely complex and hard to get into. Merging new fixes will take a long time because it is so hard.

C++ isn't a great language, but learning C++ is the least difficult part of contributing to LLVM.

One advantage is that you don't need a C++ compiler to build the Rust compiler. Dealing with building C++ projects can be a major headache.

Historically writing a compiler in the language that you’re promoting is a good way to really understand the limitations of your language.

I think this works so well because language designers tend to understand compilers better than they understand other software.

I heard Niklaus Wirth would only allow new compiler optimizations (in his compilers for Pascal, Oberon, Modula-2) that proved themselves by speeding up the compiler itself.

Hahaha that sounds excessive!

JavaScriptCore does it differently: many of our benchmarks are either interpreters or compilers written in JavaScript.

One of those benchmarks, Air, is just the stack slot coloring phase of JSC’s FTL JIT (that JIT has >90 phases) rewritten in JavaScript instead of C++. It runs like 50x slower in JS than C++ even in JSC, which wins on that test. So, probably it won’t be possible to write a JS VM in JS anytime soon. I mean, surely it’ll be possible, but it’ll also be hilariously shitty.

The specific metric, IIRC, is self-compilation of the compiler. Adding optimizations to the compiler needed to speed up compilation of the compiler more than their added complexity slowed down compilation of the compiler.
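A rough way to picture that rule is as a budget check. The sketch below is a toy model under stated assumptions, not a description of how Wirth's compilers or Chez Scheme measure anything: `speedup` is how much faster the compiler binary runs once built with the optimization, `pass_cost_seconds` is the extra time the new pass adds to each self-compile, and both function names are made up for illustration.

```python
def self_compile_time(base_seconds, speedup, pass_cost_seconds):
    """Toy model of self-compile time with the optimization enabled.

    The compiler runs `speedup` times faster because it was built with the
    optimization, but each compilation also pays for running the new pass.
    """
    return base_seconds / speedup + pass_cost_seconds


def accept_optimization(base_seconds, speedup, pass_cost_seconds):
    """Accept only if self-compilation gets strictly faster overall."""
    return self_compile_time(base_seconds, speedup, pass_cost_seconds) < base_seconds


if __name__ == "__main__":
    # A 10% speedup that costs 5s on a 100s self-compile pays for itself.
    print(accept_optimization(100.0, 1.10, 5.0))
    # A 2% speedup at the same cost does not.
    print(accept_optimization(100.0, 1.02, 5.0))
```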

This is the mandatory rule for Chez Scheme which was only broken once when their entire backend was rewritten, and also (from what I have heard) a large guiding principle for the C# compiler at Microsoft.

It's extreme but it's a good idea because it treats compilation time like an actual budget, which it is. You can't just add things endlessly. But it's not easy to achieve in practice.

Which might or might not be a good idea. When I'm writing code at my desk the faster it builds the better. I just need my unit tests to finish and they are small. When it's the same code running on my embedded system with lots of data being thrown at it in real time and the CPU load is near (sometimes over) 100%, I'll take every optimization of the final code I can get no matter how long it takes to build.

It would be nice if a gcc replacement compiler made speed of building code the goal. I'll even accept speed of building the compiler after it was compiled with gcc (clang, msvc...) as the benchmark if that is faster.

because bootstrapping entire systems without C.

I still remember when Clang bringing LLVM along was seen as SO OUT THERE and I'm just mentioning it because I find it weird to be old enough to see fads in system languages come and start to go.

Just curious, do you have any examples of these "limitations" you speak of? Sounds like a very interesting read.

As an example, WebKit had an LLVM-based JavaScript optimizer in 2014 (https://webkit.org/blog/3362/introducing-the-webkit-ftl-jit/), but dropped it for another one in 2016 (https://webkit.org/blog/5852/introducing-the-b3-jit-compiler...)

In broad strokes, LLVM chooses to optimize for generating good code for statically compiled code more than for, for example, memory usage, compilation speed, or ability to dynamically change compiled code. That doesn’t make it optimal for JavaScript, a language that’s highly dynamic and often is used in cases where compilation time can easily dwarf execution time.

Worth noting that B3’s biggest win was higher peak throughput. It generated better code than llvm. It achieved that by having an IR that lets us be a lot more precise about things that were important to our front end compiler.

It’s not even about what language you’re compiling. It’s about the IR that goes into llvm or whatever you would use instead of llvm. If that IR generally does C-like things and can only describe types and aliasing to the level of fidelity that C can (I.e. structured assembly with crude hacks that let you sometimes pretend that you have a super janky abstract machine), then llvm is great. Otherwise it’s a missed opportunity.

> It achieved that by having an IR that lets us be a lot more precise about things that were important to our front end compiler.

Do you have any examples off-hand? I presume caring about patchpoints and OSR is as fair gain to start with?

And aliasing. The aliasing story in B3 is so wonderful. That was one of the biggest wins - being able to say for example that something can side-exit (and can do weird shit after exit) but doesn't write any state if it falls through.

LLVM's MCJIT library is 17MB. If you have a language that you want to JIT and you thought you could embed your language like lua (<100k), Python (used to be ~250k but now <3M), you're looking at almost 20MB out of the gates. Not ideal!

Also if you want to use llvm as a backend for your project and expect to build llvm as part of a vendored package, the llvm libraries with debug symbols on my machine were about 3GB. Also not ideal.

Llvm makes some questionable choices about how to do SSA, alias analysis, register allocation, and instruction selection. Also it goes all in on UB optimizations even when experience from other compilers shows that it’s not really needed. Maybe those choices are really fundamental and there is no escaping them to get peak perf - but you’re not going to know for sure until folks try alternatives. Those alternatives likely require building something totally new from scratch because we are talking about things that are fundamental to llvm even if they aren’t fundamental to compilers in general.

I dislike UB, but only at the language level. By the time LLVM is reached, UB can only have been removed, and can only continue to be removed, never added (from a global point of view: applying general as-if rules, a compiler can always generate its own boilerplate in which it knows something cannot happen, then maybe later leverage "UB" to e.g. trim impossible paths that really are impossible in this case, at least barring other language-level "real" UB). So are there really any drawbacks to internal exploitation of "UB" (maybe we should call it something else then) if, for example, the source language had none?

It is true that compilers sometimes have to have operations whose semantics are defined only if some conditions hold. But LLVM's and C's interpretation of what happens when the conditions don't hold is extraordinarily liberal and I'm not sure that is either beneficial or sane.

Like, LLVM tries not to add UB, but design choices it made to support optimization with UB do sometimes result in new UB being introduced, like the horror show that happens with `undef` and code versioning.

So, I think that optimizing with UB internally is fine but only if it's some kind of bounded UB where you promise something stronger than nasal demons.

> Llvm makes some questionable choices about how to do SSA, alias analysis, register allocation, and instruction selection.

Do you mind expanding more on these points or directing me to some places where I can learn more about them? Compilers are a fairly new field for me, so anything I can learn about their design decisions and tradeoffs is worth its weight in gold.

HN isn't the place to go for conservative opinions on compilers :)

This is also why I think it was great that Maxime eventually graduated into GraalVM.

Another tool for compiler research using modern approaches with type safe languages.

Isn't GraalVM completely tied to LLVM bitcode, and therefore has all the same problems that LLVM has?

Not at all. There’s an LLVM bitcode interpreter built on top of GraalVM, but the VM itself is heavily reliant on the internals of OpenJDK.

Isn’t that just the (née sulong) llvm frontend? IIUC GraalVM is deeply dependent on OpenJDK internals.

Not even remotely.

There are non-llvm compilers.

FreePascal for example has its own x86, arm, mips, sparc and powerpc backends

We are also seeing MLIR emerging as a compiler framework and LLVM being a dialect of that. This is happening within the LLVM project itself. From this point, it may be easier to write compilers without bringing in all of LLVM with it.

Note that the LLVM monoculture came about because of how much of a pain GCC is to work with.

And GCC being a pain to work with is a deliberate decision by Stallman to avoid his baby being expanded upon by corporations

That sentiment is about 10 years out-of-date. Today, GCC supports modules better than clang/LLVM, and has moved to a minimal C++ coding standard. And time has proven that clang and LLVM are no less a moving target than GCC--it turns out that simply writing things in C++ with OOP doesn't automatically guarantee API compatibility while preserving the ability to hack on the implementation.

GCC can't do runtime retargeting. This is a major drawback because suddenly you need your distro to think about your pet niche target. Suddenly you need your build system to choose the correct linker instead of just being able to use lld. I'm a big fan of GNU and the GPL, but clang is much better in this regard.

It's unfair to rms to say that. He would be happy for corporations to use and contribute to any project associated with the GNU project (like GCC), if everyone wanted to play along in GPL land (which of course isn't reality).

I think it's partially fair; gcc, in order to make it impossible to add proprietary add-ons, deliberately has a non-modular architecture, which makes it hard even for open source extensions to exist.

Unfortunately not. For years people wanted gcc to output a nice parse tree for C++, which would have been plenty useful for open source text editors, but was banned by RMS as it would also be useful for closed source systems.

This was a legitimate concern, which helped in the days GCC originated in, and hurt later on. Parsing C and C++ well and doing intermediate code generation and optimization was the hard part; taking the result and generating code for a target architecture might well have been proprietary for many architectures if it had been allowed to be, in the era when UNIX and other OS vendors were fighting against each other for every scrap of differentiation. And frontends for some languages would have been proprietary, as well.

LLVM's extensible architecture was its most critical property; its permissive license is an unfortunate side effect of a rewrite.

If GCC had come up with the "GCC Runtime Library Exception" way back in the day, and provided a modular architecture, half the innovation happening around LLVM might have happened around GCC instead. (Might, not "would have"; we can only speculate on what alternate history might have occurred.)

Not sure, but I think his concern was more about opaque steps being introduced in the compiler and becoming something people depend on. A weak analogy might be nVidia drivers on Linux: imagine a new arch where part of the toolchain is a closed blob.

It turns out that hasn't happened yet with LLVM and allowing such things under LGPL may have worked.

I agree with this point of view; the three E's, Embrace, Extend, Extinguish, are already rampant in the compiler industry, and I see his choices as sacrificing ease of use for more transparency.

Having a GCC monoculture wouldn't be much better.

"Expanded upon" is a funny way of saying "incorporated into systems that removed their users' freedom".

Now it's called monoculture? Rather strange. But anyway: in your terms you're just replacing an LLVM monoculture with a Rust monoculture, aren't you?

Yes, it's a monoculture when the majority of all compiler work/research is happening on one compiler chain. (I feel like GCC is still competitive enough to keep up some competition, but Clang does have a lot of backing.) And yes, if we made a rust replacement and that somehow eclipsed all other compiler suites it would be a monoculture and be bad, but that's unlikely and creating an alternative to the most popular option reduces monoculture issues by adding more options.

LLVM has a library-based, modular approach which makes it easier for people to contribute just in their area of expertise instead of having to find their way through the hundreds of thousands of lines of GCC.

So if we join forces and create a reusable compiler backend so not every compiler writer has to implement the same optimizers and code generators over and over again, then this is bad because it's a monoculture? How strange is that?

To me, it sounds more like political propaganda from a few idealists who want to justify why - instead of participating in a joint project - they want to develop everything themselves from scratch in their favourite technology. For this there is, nota bene, also a common term: "Not invented here" syndrome.

> So if we join forces and create a reusable compiler backend so not every compiler writer has to implement the same optimizers and code generators over and over again, then this is bad because it's a monoculture? How strange is that?

Why is that strange? You now have a diverse set of frontends and a monoculture on the backend. A world with Chrome, Chromium, Edge, Brave, and the Yandex browser is still a browser engine monoculture.

The word "monoculture" means something completely different. LLVM is by definition not a "monoculture" because it integrates a diverse set of functions and is developed and maintained by a diverse set of people, and there is a plethora of versions and applications. With your definition each country or company would be a monoculture; and mathematics would be a monoculture because it always applies the same rules; and all fuel-driven cars would be a monoculture; and electrical engineering would be a monoculture because everyone applies Maxwell's equations; you name it. This misuse of terms is done with the intention of creating a negative connotation where none is justified.

One cool advantage of having multiple compilers for a language is that you can use one as a check on the other.

For example, if you're worried that one of the compilers might be malicious, you can use the other compiler to check on it: https://dwheeler.com/trusting-trust

Even if you're not worried about malicious compilers, you can generate code, compile it with multiple compilers, send the resulting programs the same inputs, and see where their outputs differ. This has been used as a fuzzing technique to detect subtle errors in compilers.
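A minimal sketch of that differential idea, in Rust. The two functions below only stand in for "the same program built by two different compilers"; their names and the planted discrepancy are made up for illustration, and a real setup would run actual binaries on generated inputs.

```rust
/// Stand-in for the program as built by compiler A (hypothetical).
fn built_by_a(x: u32) -> u32 {
    x.wrapping_mul(3).wrapping_add(1)
}

/// Stand-in for the same program as built by compiler B (hypothetical),
/// with a deliberately planted "miscompilation" on some inputs.
fn built_by_b(x: u32) -> u32 {
    if x % 1000 == 999 {
        x.wrapping_mul(3) // the planted bug: the `+ 1` is lost
    } else {
        x.wrapping_mul(3).wrapping_add(1)
    }
}

/// Feed the same inputs to both builds and collect the inputs on which
/// their outputs differ. Agreement shows consistency, not correctness.
fn find_disagreements(
    f: fn(u32) -> u32,
    g: fn(u32) -> u32,
    inputs: impl Iterator<Item = u32>,
) -> Vec<u32> {
    inputs.filter(|&x| f(x) != g(x)).collect()
}
```

Calling `find_disagreements(built_by_a, built_by_b, 0..3000)` returns the inputs worth looking at; it does not say which of the two builds is the wrong one.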

> For example, if you're worried that one of the compilers might be malicious, you can use the other compiler to check on it: https://dwheeler.com/trusting-trust

This still requires the use of a trusted compiler though. Comparing two compilers arbitrarily shows whether there is consensus; it does not give guarantees about correctness.

From the link.

    In the DDC technique, source code is compiled twice: once with a second
    (trusted) compiler (using the source code of the compiler’s parent), and then
    the compiler source code is compiled using the result of the first
    compilation. If the result is bit-for-bit identical with the untrusted
    executable, then the source code accurately represents the executable.

First, I forgot to disclose: I am the author of https://dwheeler.com/trusting-trust .

As discussed in detail in that dissertation, if you are using diverse double compiling to look for malicious compilers, the trusted compiler does not have to be perfect or even non-malicious. The trusted compiler could be malicious itself. The only thing you're trusting is that the trusted compiler does not have the same triggers or payloads as the compiler it is testing. The diverse double compiling check merely determines whether or not the source code matches the executable given certain assumptions. The compiler could still be malicious, but at that point the maliciousness would be revealed in its source code, which makes the revelation of any malicious code much, much easier.

You're absolutely right about the general case merely showing consistency, not correctness. I completely agree. But that still is useful. If two compilers agree on something, there is a decent chance that their behavior is correct. If two compilers disagree on something, perhaps that is an area where the spec allows disagreement, but if that is not the case then at least one of the compilers is wrong. The check by itself won't tell you which one is wrong, but at least it will tell you where to look. In a lot of compiler bugs, having some sample code that causes the problem is the key first step.

Ha, I didn't even notice the username! I agree consensus (or lack thereof) is a useful property to demonstrate. I think I may have been a bit of a pedant in my prior comment.

Sounds fascinating. Are there real-world examples of malicious compilers?

Yes, there was a malicious compiler system for Apple iOS that was released in China a few years back and subverted a large number of mobile applications, including apps used in the US and Europe. There was also a subverted Delphi compiler a number of years back, though I don't think the subversion was dangerous; it was more like a test case. And of course, Ken Thompson demonstrated the attack in the 1980s. There may be others, but I remember those offhand.

IIRC this was feasible because people in China are behind the GFW which throttles/blocks the mac app store, so most people download from in-country caches, which circumvents a lot / all of the app signing that Apple uses.

i read a story about a compiler adding malware to the compiled binary once.

they kept getting owned until they supposedly found a pretty dumb hack which just appended the backdoor to the final compilation on the build server...

no clue if it was just a story though, as i personally haven't experienced anything like that before.

I don't think this is what you're looking for, but Coding Machines[1] is a great little story in which the Ken Thompson hack[2] plays a role.



Yes, that's right, that's another story about a subverted compiler. I don't have any way to verify it, but I have no reason to doubt the story. It is quite possible, and not even that difficult to do if you want to be that malicious. I don't have a URL for it, maybe someone else can provide that.

Neat. Reduces attacks to conspiracies.

Please don't quote with code blocks. Makes reading on mobile very difficult.

The quote reformatted:

> In the DDC technique, source code is compiled twice: once with a second (trusted) compiler (using the source code of the compiler’s parent), and then the compiler source code is compiled using the result of the first compilation. If the result is bit-for-bit identical with the untrusted executable, then the source code accurately represents the executable.

Yep! This is a very good property, and part of why mrustc is a big deal.

>>"That’s Bjorn3, he decided to experiment in this area whilst on a summer vacation, and a year & half later single-handedly (bar a couple of PRs) achieved a working Cranelift frontend."

Is this guy human? This is amazing, and this guy should be given an award.

The D programming language has 3 compilers: one with LLVM (LDC), one with GCC (GDC), and one with the Digital Mars back end (DMD).

It's great to have all three, as they each have different characteristics in terms of speed, generated code, debug support, platform support, etc. Supporting these three also helps maintain proper semantic separation of code gen from front end.

Has the D community been growing or shrinking over the past decade or so? Staying relatively the same size?

This feels welcome to me. I tend to think a language needs multiple independent implementations that only share the same source language spec, in order to really tear a clear spec apart from the quirks of any particular implementation.

I find Rust (the spec, though also the implementation) quite safe and practical (a balance). It deserves some independent implementations to secure a long and stable future.

On the other hand, I want to use it on non-ARM embedded platforms, where current cross-compilation through C produces unusably big binaries. I dream this might increase hope for that, too, eventually.

>> I find Rust (the spec, though also the implementation) quite safe and practical (a balance). It deserves some independent implementations to secure a long and stable future.

Where is the Rust spec? Unless something happened really quickly that I was not aware of there is only the implementation.

https://doc.rust-lang.org/stable/reference/ is the closest thing we have. It is not yet complete.

Thank you! I look forward to the day when there is a spec, but I was surprised to see it mentioned and was wondering if I missed something big.

The thing that struck me most about the article was this quote from the Rust Survey (2019):

“Compiling development builds at least as fast as Go would be table stakes for us to consider Rust“

Go was designed from the ground up to have super fast compile times. In fact, there are some significant language issues related to that design decision.

Using one of the primary design goals that impacted language structure as "table stakes" is almost certainly going to require a lot of effort, with some serious unintended consequences.

Improving compilation times sounds good. Aiming high is good. But reaching "best of breed performance" is a major initiative.

If you mean generics, D, Delphi, Ada and plenty of other languages prove you can have them and still be pretty fast.

When writing a language like Rust, is the biggest challenge simply deciding what Rust's features and behaviors should be? And implementing the syntax and Rust -> LLVM compiler is really just a chore for the individuals who are super familiar with the implementation of these languages? Or is the technical implementation also genuinely challenging and non-obvious?

First of all, deciding features and behaviors is not simple. :)

There are a number of technical implementation challenges in the compiler.

It is a large project, and Rust's got a really intense stability policy.

The compiler was bootstrapped very early, when the rate of change of the language itself was still "multiple things per day." This introduced significant architectural debt.

There have been multiple projects that have re-written massive parts of the compiler, and more ongoing. For example, non-lexical lifetimes required inventing an entire additional intermediate language, re-writing the compiler to use it, and making sure that everything kept working while doing so.

More recently, the compiler has been moving from a classic, multiple-pass architecture to a more Roslyn-like, "query-based" one. Again, this is being done entirely "in-flight", while keeping a project that's used by a lot of folks stable. The rust-analyzer has made this project even more interesting; a "librarification" strategy is underway to make the compiler more modular.

For some numbers on this kind of thing, https://twitter.com/steveklabnik/status/1211667962379276288/... and https://twitter.com/steveklabnik/status/1211717308143587334/...

> Rust's got a really intense stability policy.

I know the code won't stop running, but I wonder how soon it stops being idiomatic. If it's not idiomatic, it's harder to maintain due to unfamiliar style and structure. Does Rust have measures to deal with this issue?

I think the closest thing is enforced rustfmt. I don't hack on the compiler though, so maybe there's some stuff that the team does that they don't broadcast super widely.

I don't mean the code of the Rust compiler, I mean code written in Rust becomes unidiomatic as the idioms change. How fast does that happen, is it a problem, is it being addressed?

Okay, so hilariously, I thought you meant that at first, but re-read your comment, and said "oh, but we're talking about the compiler's maintenance and I mentioned how often the language is changing, so I must have misunderstood." Should have stuck with my gut!

It is not a problem. A lot of processing steps go on before the meat of the work gets done; many new idioms end up boiling away entirely as part of this process. Like, the borrow checker doesn't even know about loops; by the time the code gets there, it's all been turned into plain old gotos. The further you get into the compiler, the simpler the language gets, and everything is defined as sugar over the next IR down.
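To make the "sugar" point concrete, here is a rough sketch of the first step of that boiling-away for a `for` loop, written out by hand in surface Rust. The function names are made up for illustration, and this only approximates the desugaring; the later lowering into basic blocks and gotos is not shown.

```rust
/// Sum written with the `for` sugar.
fn sum_for(xs: &[i32]) -> i32 {
    let mut total = 0;
    for x in xs {
        total += *x;
    }
    total
}

/// Roughly what the `for` above turns into before later stages see it:
/// an explicit iterator, a bare `loop`, and a `match` on `next()`.
fn sum_desugared(xs: &[i32]) -> i32 {
    let mut total = 0;
    let mut iter = IntoIterator::into_iter(xs);
    loop {
        match iter.next() {
            Some(x) => total += *x,
            None => break,
        }
    }
    total
}
```

Both functions compute the same result; the second just spells out what the first leaves implicit.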

I don't think this analysis really captures idiom questions, so while it's related, I'm not sure it's the right thing here :)

Honestly, it’s probably the perfect time to dive in, now that async/await has dropped.

During my time in rust, the major changes in idiomatic code have been around Results/Errors, async/futures, and a few macros and syntactic sugar goodies have evolved. None of these evolutions were problematic to migrate to, and all of them were moving in the right direction, IMO.

Is the rust book still the best place to start if you're a veteran c/c++ programmer? https://doc.rust-lang.org/book/

Depends on opinion; some also really like the O'Reilly book.

Using the word "simply" is a quirk of mine. I'm very much trying to express that I think it's by far the hardest part. Thanks for your response!

It's all good :) You're welcome!

The concept of lifetime management is relatively novel and uncharted territory, if I understand correctly. There's only some prior art. So implementing that must have been an adventure and a half.

And while I'm sure the folks who work on these languages are wonderfully intelligent people, let's dispel this notion that you need to be a super genius to implement a compiler or something like that!

It seems magical, like one of the hardest things you could program-- but take a look through crafting interpreters, if you will: http://craftinginterpreters.com/

"Nothing is particularly hard if you break it down into small jobs." - Henry Ford

I walked through the Java portion of Crafting Interpreters and indeed, they can be much simpler than you might imagine in your early years. I was more just paying a compliment than suggesting that maintainers are superhuman. Everyone who ships productive code is a wizard and you can be too!

I modified my original question to avoid a potential distraction from what I want to talk about. Thanks!

Do you have any links for the prior art? I'm sincerely interested, thank you if you do. I'd also be interested in any articles (or even blog posts) that describe Rust's process for that from a compiler writer's point of view.

A language like Rust aims for "zero-cost abstractions", which means the features and behaviors of the language must be evaluated in the context of the implementation.

Let me attempt to unpack this into language I think I understand:

It's up to Rust's compiler to verify all the contracts made, because in order for the binary to be "zero-cost", none of those checks are being done at runtime. If you looked at the output assembly, you would see what looks like very carefully written code that shows no explicit signs of protecting itself. I.e., there are no swaths of boilerplate assembly doing borrow checking, out of bounds checking, etc.

You're not wrong, exactly, but I'm not 100% sure this is right. The way I would put it is this: Rust has made certain commitments about performance. This means that language changes have to be made in the context of how they are implemented, because releasing a language feature that causes a significant performance degradation would make it not a good fit by definition.

That might be a stronger way of putting it than I would.

The canonical example is iterator chains with complex logic compiling down into vectorised and unrolled loops. Powerful logical abstractions are used by the compiler to generate code that does what it says to do without the runtime cost of closures or whatever other logical but not mechanically necessary things you have.

Iterators can't go out of bounds, so the compiler can elide those checks. There are still some runtime costs at the intersection of safety, ergonomics, and performance. Bounds checking, overflow checking. But they have escape hatches and are relatively rare in the language. Most things do compile out.
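A small sketch of the cases mentioned above, with made-up function names. Whether a particular bounds check is actually removed depends on the optimizer, so the comments describe what is possible rather than what is guaranteed.

```rust
/// Indexed loop: each `xs[i]` is a bounds-checked access in principle
/// (the optimizer can often, but not always, prove the check away).
fn sum_indexed(xs: &[u32]) -> u32 {
    let mut total = 0u32;
    for i in 0..xs.len() {
        total = total.wrapping_add(xs[i]);
    }
    total
}

/// Iterator chain: there is no index, so there is no bounds check to elide.
fn sum_even_squares(xs: &[u32]) -> u32 {
    xs.iter()
        .filter(|&&x| x % 2 == 0)
        .map(|&x| x.wrapping_mul(x))
        .fold(0u32, |acc, x| acc.wrapping_add(x))
}

/// Escape hatch for bounds checking: `get_unchecked` skips the check,
/// so the code itself must guarantee the index is valid.
fn first_unchecked(xs: &[u32]) -> Option<u32> {
    if xs.is_empty() {
        None
    } else {
        // SAFETY: the slice was just checked to be non-empty, so index 0 is in bounds.
        Some(unsafe { *xs.get_unchecked(0) })
    }
}

/// Overflow handling is chosen explicitly: `checked_add` reports overflow,
/// `wrapping_add` wraps around without a check.
fn add_u8_both_ways(a: u8, b: u8) -> (Option<u8>, u8) {
    (a.checked_add(b), a.wrapping_add(b))
}
```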

Novel compiler backends are a super cool idea, but I don't think it's going to help Rust compile speeds as much as this post suggests. The complexity of Rust's type system puts a pretty high lower bound on compile times because of work the front end needs to do. Plain C compiles quickly even with an LLVM backend, for example.

Rustc normally spends way more time in LLVM than in the frontend. Rust parsing and type checking are very fast in comparison to LLVM's codegen.

Here is a chart from last September showing where the time goes in compiling a large Rust codebase (rustc itself):


(Scroll down to the large horizontal bars once dependencies have been built.) (Sorry if GitHub is down at the moment; try later if it doesn't load.)

The blue part of each bar is time in the frontend, the purple part is time in LLVM. The largest bar (rustc) spans 105 seconds in LLVM out of 140 total, or 75% in LLVM. Many of the subcrates are even more dominated by LLVM time, for example look at rustc_metadata or rustc_traits where >95% of compile time is spent in LLVM.

Rust is famous for throwing garbage IR at LLVM and hoping it cleans it all up. They've made a lot of progress but comparing the timing is very misleading when the work is intentionally offloaded to LLVM.

My comment is in response to "super cool idea, but I don't think it's going to help Rust compile speeds". Even a compiler that emits garbage IR from the frontend would get 20x faster with a magic instant backend if 95% of time is currently spent in the backend.

A magic instant backend is unrealistic, so Rust will need to move some of the current backend work to frontend work for things that can be done more efficiently on the frontend. But the fact remains that there is an opportunity for big improvement from a much faster backend.
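The arithmetic behind the 20x figure is just Amdahl's law. A tiny sketch, where the function and parameter names are my own and the only inputs taken from the thread are the 95% and 75% backend fractions:

```rust
/// Overall build speedup when a fraction `backend_fraction` of the time
/// is spent in the backend and the backend becomes `backend_speedup`
/// times faster (Amdahl's law).
fn overall_speedup(backend_fraction: f64, backend_speedup: f64) -> f64 {
    1.0 / ((1.0 - backend_fraction) + backend_fraction / backend_speedup)
}
```

With an infinitely fast backend, `overall_speedup(0.95, f64::INFINITY)` comes out at about 20 and `overall_speedup(0.75, f64::INFINITY)` at 4. A backend that is merely twice as fast on a crate that spends 75% of its time there gives about 1.6x overall, which is why moving work out of the backend matters too.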

Isn't that kind of the point of having a relatively high-level backend - to avoid the need for every front-end to do the same tedious optimizations?

You do have to have some sort of balance. With a large enough input, any program will become slow.

It sounds like this is addressed in the post, no? LLVM IR doesn't always map cleanly to Rust representations, so a backend that supports a cleaner mapping should obviate the need for more frontend optimization.

I'm basically a layperson when it comes to this topic, but that's my understanding of one of the potential benefits of a different backend.

I think you might be surprised just how much time is spent in codegen- and optimization-related code.

For example, a bit over 75% of the time needed to compile the regex crate can be attributed to codegen- and optimization-related events, with a bit over 64% of that time spent in LLVM-related events specifically [0]. Granted, I'm not certain whether this is a release or debug build, but it does show that there is room for significant wins by switching backends.

As for why C can compile quickly with an LLVM backend while Rust can't, I'm not sure. I've read in the past that rustc generates pretty bad LLVM IR to pass to the backend, and it takes time for LLVM to sort through that, but there's probably some other factors in there too.

[0]: https://blog.rust-lang.org/inside-rust/2020/02/25/intro-rust...

I wonder how much of it is just code style. A simple for-loop in C is probably going to be an iterator blob in Rust with an order of magnitude more code for the backend to chew through.

Part of it may be code style, but another part comes from just how much LLVM IR the frontend generates for just about any code style.

Part of the reason LLVM runs so much faster on C than on Rust is that Clang is smarter about generating less/better IR from the start, so LLVM's optimizer has less of a hole to dig itself out of.

I think it's actually mostly the language. Rust idioms just generate a lot of code.

Clang is in a worse spot than rustc is in terms of emitting good LLVM IR, since it has no IR aside from the AST. By contrast, rustc has MIR which is more amenable to optimizations. At this point I'm fairly sure the problem is just that C code naturally generates less IR than Rust code does. All those function calls that go into iterators, array indexing, containers, etc. etc. add up quick.

Modern C++ is in a similar situation with iterators, indexing, etc. and it's still often faster for LLVM to get through.

MIR optimization can help close the gap, but that still involves generating "garbage" that takes extra time to codegen (in Debug) or optimize out (in Release).

Being smarter about generating IR, whether MIR or LLVM IR, is still an area rustc has a lot of room to improve, even given Rust's idioms. E.g. stuff like this: https://github.com/rust-lang/rust/issues/69715

I think people are way too hard on rustc. That linked issue is pretty esoteric and is a good example of the low-hanging fruit being picked. I doubt it'll move the needle much.

It's hardly esoteric when it affects some of the most common (and commonly-inlined) methods in the entire language, which are a large part of Rust's unique idioms.

We can compare them! https://godbolt.org/z/-Fzuqs

(My screen is small so it's tough for me to read these results, to be honest...)

I believe that is the IR after the optimizer has chewed through it.

Oh duh. Should be obvious that you can drop the -O flags, at least.

Yes, which gives about 40 lines of IR for C, and about 1300 for Rust (Godbolt truncates it at 500).

While the type system does add to compile times, profiling generally doesn't show that it's the current limiting factor for compile times. Additionally, tools like rust-analyzer will give you type errors pretty much instantaneously, though of course that work is not finished.

Also of note, this blog post isn't speculation; they posted numbers from actually doing it.

A 30% speedup is nothing to sneeze at, but it's not putting Rust within spitting distance of Go or C for similar amounts of code.

Absolutely. I don't see where anyone is claiming that.

Haskell, OCaml, SML, Idris also compile quite fast, with complex type systems.

Their secret? Multiple backends with different kinds of optimizations.

You don't need to compile for the ultimate release performance when in the middle of compile-debug-edit cycle.

From my (limited) experience, Haskell does not compile fast, especially if you’re doing something that needs lenses.

It surely does, because Haskell is not a one-compiler language. Not only does it have multiple implementations (though I concede almost everyone only cares about GHC), there are interpreters and a REPL experience as well.

You don't need to compile your program in one go using GHC's LLVM backend, many times a GHCi session is more than enough.

Haskell and Idris both have very slow compilation time.

Only when using release-mode compilers; they also have interpreters and REPLs to choose from.

I've been wondering lately whether modern compilers should all be using C as the intermediate language (or whether some language-specific optimization opportunities would be lost if they did that).

If you're trying to implement a language with substantially different semantics from C (e.g. a substantially different memory model, or without UB) the semantics of C make it really unsuitable as an IR.

You can't use C's casts (undef for out of range float -> int conversions, for example), arithmetic (undef for signed overflow), or shift operators (implementation-defined behavior for signed right shifts, undefined behavior for left shifts into the signbit or shift counts not in [0, n)). You can work around these by defining functions with the semantics that your language needs, but they get gross pretty quickly (they are both much more verbose and more error-prone than having an IR with the semantics you really want, and they require optimizer heroics to reassemble them into the instructions you really want to generate). Alternatively, you can use intrinsics or compiler builtins, but then you're effectively locking yourself to a single backend anyway, and might as well use its IR.
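For contrast, here is a sketch of what "the semantics your language needs" can look like when the source language defines them, using Rust as the example. The wrapper names are made up; the methods they call are from Rust's standard library. A C backend would have to reproduce each of these results exactly.

```rust
/// Float-to-int conversion with fully defined results: in current Rust,
/// `as` truncates toward zero, saturates at the integer's range,
/// and maps NaN to 0.
fn float_to_i32(x: f32) -> i32 {
    x as i32
}

/// Signed addition that wraps on overflow instead of being undefined.
fn add_wrapping(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}

/// Left shift that reports an out-of-range shift count instead of being
/// undefined. Shifting into the sign bit is allowed and well defined.
fn shl_checked(x: i32, count: u32) -> Option<i32> {
    x.checked_shl(count)
}

/// Right shift on a signed value: always arithmetic (sign-propagating),
/// with the shift count wrapped to the bit width.
fn shr_arithmetic(x: i32, count: u32) -> i32 {
    x.wrapping_shr(count)
}
```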

The issues around memory models (especially aliasing, but also support for unaligned access, dynamic layouts, etc) are worse.

Even LLVM IR is too tightly coupled to the semantics of C and C++ to be easily usable as a really generic IR for arbitrary languages (Rust, Swift, and the new Fortran front end have all had some struggles with this, and they're more C-like than most languages). C is much worse in this regard.

I agree 99.44%.

The behavior of shift operations on signed integers will be fixed in C++20 and C2x, as part of the effort to require twos complement representation. It is a massive potential source of UB in currently standardized C and C++.

All the other problems listed remain.

[1]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p090...

[2]: http://eel.is/c++draft/expr.shift#2

Even after C2x is finalized, people will be using C compilers that don't conform to C++20 and C2x for at least another decade, so you'll forgive me if I don't hold my breath =)

C is a pretty lousy intermediate language:

* It's missing several useful operators, such as classic bit manipulation (count trailing zero, byteswap), or even 8- and 16-bit arithmetic. Checked arithmetic is another useful one that's not present (or even really possible in C's ABI).

* Signed integer overflow is UB.

* Utterly no support for SIMD types.

* Proper IEEE 754 floating-point control is kind of spotty, although it tends to be as bad or worse in most other languages.

* ABI control is poor. You can't come up with any way to return multiple register values, for example.

* Anything that's not a vanilla function isn't supported. No uparg function support (required for Pascal), multiple entry points (required for Fortran), or zero-cost exception handling (required for C++). Hell, even computed goto isn't actually supported.

And all of this is assuming you have strong control over how you expect implementation-defined behavior (e.g., sizeof(int)) to work.
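As a point of comparison for the first bullet, these are the kinds of operations a front end may want to express directly. The sketch uses Rust's standard integer methods; the wrapper names are made up, and nothing here says how a given backend lowers them.

```rust
/// Count trailing zero bits; defined even for 0, where it returns the bit width.
fn ctz(x: u32) -> u32 {
    x.trailing_zeros()
}

/// Reverse the byte order of a 32-bit value.
fn bswap(x: u32) -> u32 {
    x.swap_bytes()
}

/// Genuine 8-bit arithmetic: no promotion to a wider type, wraps at 256.
fn add_u8(a: u8, b: u8) -> u8 {
    a.wrapping_add(b)
}

/// Checked arithmetic: the wrapped sum plus a flag saying whether it overflowed.
fn add_overflowing(a: i32, b: i32) -> (i32, bool) {
    a.overflowing_add(b)
}
```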

Just out of curiosity, what would be a great intermediate language to transpile to?

Of the ones I'm familiar with, LLVM IR is probably the best, although it has other issues of its own (in particular, floating point is done even worse than C). I'm not aware of any language which is going to beat a retargeting compiler's processor-agnostic IR.

But even the "better C" languages tend to not really attempt to expand C structurally. The changes amount to fixing the egregious semantics (fixed-size types, no int-promotion, define signed overflow, etc.), add vector types and other operators, maybe tweak ABI a little bit, and add a whole lot of syntactic sugar. And those languages that explore beyond C's limited structural repertoire do so at the cost of C's specificity.

That said, ever since the last time someone asked me this kind of question, I've been trying to design a portable assembly language.

Interesting. I need to look into LLVM IR a bit more to understand this subject better.

>> I've been trying to design a portable assembly language.

Couldn't something like Forth fulfill this role?

> in particular, floating point is done even worse than C

Do you mind expanding on this or pointing me to places where I can read more?

There is a hidden floating-point environment that affects, and is affected by, every single floating-point instruction. Predominantly, this is rounding mode control, sticky bits, and exception control (does overflow cause a SIGFPE?), although most processors have some form of flushing denormals or treating them as 0s, which isn't in IEEE 754.

LLVM's floating point instructions assume that there is no floating point environment [1]. And there's no real facility to indicate that floating point instructions might be affected. To remedy this, they've been working on adding constrained floating point intrinsics.

[1] More specifically, that the environment is set up to the default rounding mode (round-nearest), all exceptions are masked, and no one will ever care about sticky bits.

Is GIMPLE any worse or better in this regard?

I don't have much experience with GCC innards, but after doing a quick test [1], it seems GCC is as bad as clang in terms of not supporting STDC FENV_ACCESS, so I suspect the answer is that it's pretty much the same.

It looks like the only major compiler that actually supports IEEE 754 correctly is icc. MSVC, gcc, and clang all optimize floating-point operations without considering if the dynamic environment is the same.

[1] https://godbolt.org/z/x2ERVM

In practice I think it is hard to beat C in this regard. You can go pretty far if you are willing to adapt to its quirks! And while C doesn't always allow for the best optimization (such as returning multiple values via registers), the workarounds often are still pretty fast.

On a more theoretical level there has been some research on what a better intermediate language would look like. One project I found interesting was Mu VM, which offers some niceties for compiling languages with a garbage collector.


Firstly the programming language itself, then something like LLVM IR. This is an answer to a slightly different question but rewriting into the same language (i.e. C++ to C++ then LLVM) can make debugging much simpler and implementing features and specific optimizations much more feasible if you don't have control over the backend.

IRs should be terse, simple and dumb. I'm not sure any "real" programming language fits that.

The semantics of C aren't very well defined, there is a lot of ambiguity in the form of undefined and implementation defined behaviour. This ambiguity is often needed to build an efficient optimizing compiler.

When you have a higher level language with more accurately defined semantics, running it all through C would risk introducing undefined behaviour.

With an IR you can control and define the semantics more closely to what your language needs.

> When you have a higher level language with more accurately defined semantics, running it all through C would risk introducing undefined behaviour.

No, it wouldn't. When you target C you need to write a proper backend for its abstract machine, rather than naively rewriting code, of course.

The C abstract machine is a fine IR, especially in the later editions of the standard.

I've got to wonder if any of the existing intermediate representations would be appropriate with other programming languages.

This is true to varying degrees, you could say that LLVM-IR and Java bytecode are two examples of this in action.

Modern compilers generally have a language-specific front end that generates an intermediate representation of the program logic, which is then transformed into an abstract representation (such as static single assignment form) for optimization. That is then transformed into an abstract machine description language, which gets further transformed by the back end into concrete machine instructions or assembly code.

Outside of the language-specific front end, compilers generally have no knowledge of the programming language itself. There is no technical advantage to transforming Rust into C when it comes to the middle and back ends, which form the bulk of the compiler.

There are no language-specific optimization opportunities. There are, of course, restrictions on what you can do in some languages that eliminate optimization opportunities, but you're not suddenly going to be able to take advantage of those opportunities by transforming your code into a language that lacks the restrictions, because then you change the semantics of your code.

> There is no technical advantage to transforming Rust into C

There is a key one: the ability to use any C compiler out there (including proprietary ones). This allows you to target all platforms out there.

A dumb interpreter for the IR as a bootstrapping stage is a better alternative.

Plus very few platforms have only support for C and nothing else, unless we are speaking about esoteric embedded CPUs.

We have an interpreter for MIR. It isn't fast enough.

I clearly mentioned "as a bootstrapping stage", for nothing else.

I see, I think I misinterpreted you. Sorry!

Of course we are talking about embedded CPUs. And no, they are not "esoteric". They are everywhere around you.

Then you can also add custom CPUs and systems, FPGAs, etc. Those are way more rare, but still something people use daily in some industries.

While a C backend is great for compatibility, is it a sufficient IL to express everything? For example, Rust has some extra guarantees with aliasing that I'm unsure if C or C extensions support yet that could offer greater optimizations (currently not fully being used due to bugs in the LLVM backend).

It's a sad thing that you've been downvoted for posting a thought. Others have already said the drawbacks of this idea, but there are also pros.

Indeed. Time-to-market is an obvious one, as is esoteric platform support, and maybe also debuggability.

That’s how C++ started out but as far as I know this had lots of limitations in terms of optimization so they started writing native C++ compilers.

I would be surprised if optimization was the actual goal. C++ is not faster than C.

Not only does C++ provide features that straight C optimizers won't be able to match, like templates and constexpr; C++ also shares a common subset with C, and the libc of all major C compilers is actually written in C++ with extern "C" entry points nowadays.

This "C is faster than C++" is a bit dated by now.

I didn't say C is faster than C++, I said C++ is not faster than C and so speed is probably not the main driver behind implementing a C++-->Assembly pipeline without intermediate C.

Playing word games here?

"I said C++ is not faster than C" implies that C++ compilers don't beat C compilers, which, as many in the HPC, HFT and GPGPU computing domains know, has been false for years now, and no, restrict doesn't help that much against template metaprogramming and constexpr.

I'm not playing word games.

Template metaprogramming and constexpr don't help you go faster in HPC or GPGPU; they help reduce the redundancy of your code, for example if you want a generic algorithm on float, double, int, complex.
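To make the "one generic algorithm for several numeric types" point concrete, here is a minimal sketch using Rust generics as a stand-in for C++ templates. The function name `dot` and the choice of trait bounds are assumptions made for illustration; they are not taken from any library mentioned in this thread.

```rust
use std::ops::{Add, Mul};

/// Dot product written once and instantiated for any element type that
/// supports copy, a zero-like default, addition and multiplication.
/// (Illustrative helper, not part of any library discussed here.)
fn dot<T>(a: &[T], b: &[T]) -> T
where
    T: Copy + Default + Add<Output = T> + Mul<Output = T>,
{
    // zip stops at the shorter slice; the accumulator starts at T::default().
    a.iter()
        .zip(b.iter())
        .fold(T::default(), |acc, (&x, &y)| acc + x * y)
}
```

The same source compiles to separate integer and floating-point versions, for example `dot(&[1i32, 2, 3], &[4, 5, 6])` and `dot(&[1.0f64, 2.0], &[0.5, 0.25])`. That removes duplicated code, but it does not by itself change how the data moves through memory.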

What helps speed is being able to control memory allocations and having the tool to place the data required on registers, L1 cache or L2 cache as required by your kernel (and similarly for GPU).

On current architectures, what is hard to optimize is memory and data movement, if your data is at the wrong place or not prefetched at the right time it will be literally 100 times more costly than a saved addition from constexpr.

Enough theory,

"Scientific Computing: C++ Versus Fortran" (1997)


"Micro-Optimisation in C++: HFT and Beyond"


"The Speed Game: Automated Trading Systems in C++"


"When a Microsecond Is an Eternity: High Performance Trading Systems in C++"


It might be the Internet and the difficulty of conveying emotions across it, but you sound quite taken by this issue.

Anyway, I stand by what I say and I'm backed by my high performance code:

- Writing matrix multiplication that is as fast as Assembly, complete with analysis and control on register allocations, L1 and L2 cache tiling and avoiding TLB cache misses:

- https://github.com/numforge/laser/blob/master/laser/primitiv...

- Code, including caveat about hyperthreading: https://github.com/numforge/laser/blob/master/laser/primitiv...

- The code is all pure Nim and is as fast as or faster than OpenBLAS when multithreaded. Caveat: the single-threaded kernels are slightly slower, but it scales better on multiple cores.

- I've also written my own multithreading runtime. It scales better and has lower overhead than Intel TBB. There is no constexpr; you need type-erasure to handle everything people can use a multithreading runtime for. Same comparison on GEMM: https://github.com/mratsim/weave/tree/v0.4.0/benchmarks/matm...

- More resources on the importance of memory bandwidth: optimizing convolutions https://github.com/numforge/laser/wiki/Convolution-optimisat...

- Optimizing matrix multiplication on GPUs: https://github.com/NervanaSystems/maxas/wiki/SGEMM. Again, it's all about memory and cache optimization.

- Let's switch to another domain with critical perf needs: cryptography. Even when the bounds of iterating on a bigint are known at compile time, compilers are very bad at producing optimized code, see GCC vs Clang https://gcc.godbolt.org/z/2h768y

- And crypto is the one thing where integer templates are very useful since you know the bounds.

- Another domain? VM interpretation. The slowness there is due to function call overhead and/or switch dispatching and not properly using hardware prefetchers. Same thing: C++ constexpr doesn't help, the problem is lower-level. See resources: https://github.com/status-im/nimbus/wiki/Interpreter-optimiz...

Also all the polyhedral research, and deep learning compiler research including the Halide compiler, Taichi, Tiramisu, Legion, DaCE confirm that memory is the big bottleneck.

Now since you want to stop on the theory and you mentioned HPC, pick your algorithm: it could be matrix multiplication, QR decomposition, Cholesky, ... Any fast C++ code (or C, or Fortran or Assembly) that you find will be fast because of careful memory layout and use of all levels of cache, not constexpr.

If you have your own library in one of those domains I would be also very happy to have a look.

As a simple example, let's pick an out-of-place transposition kernel to transpose a matrix. Show me how you use constexpr and template metaprogramming to speed it up. Here is a detailed analysis on the impact of 1D-tiling and 2D tiling: https://github.com/numforge/laser/blob/master/benchmarks/tra..., throughput can be increased 4x with proper usage of memory caches.
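For readers who want to see what the tiling argument looks like in code, here is a minimal sketch of an out-of-place transpose in Rust, once naively and once with 2D tiling. It is not the Nim kernel from the linked benchmark; the function names, the row-major layout, and the `tile` parameter are assumptions made for illustration, and no speedup figure is claimed for this version.

```rust
/// Naive out-of-place transpose of a row-major `rows x cols` matrix.
/// Reads are sequential, but writes stride through `dst` by `rows` elements.
fn transpose_naive(src: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    assert_eq!(src.len(), rows * cols);
    let mut dst = vec![0.0; rows * cols];
    for i in 0..rows {
        for j in 0..cols {
            dst[j * rows + i] = src[i * cols + j];
        }
    }
    dst
}

/// 2D-tiled transpose: the matrix is processed in `tile x tile` blocks so
/// that the source rows and destination rows of a block can stay in cache
/// while the block is being copied.
fn transpose_tiled(src: &[f32], rows: usize, cols: usize, tile: usize) -> Vec<f32> {
    assert_eq!(src.len(), rows * cols);
    assert!(tile > 0, "tile size must be non-zero");
    let mut dst = vec![0.0; rows * cols];
    for ii in (0..rows).step_by(tile) {
        for jj in (0..cols).step_by(tile) {
            // Clamp the block so partial tiles at the edges are handled.
            let i_end = (ii + tile).min(rows);
            let j_end = (jj + tile).min(cols);
            for i in ii..i_end {
                for j in jj..j_end {
                    dst[j * rows + i] = src[i * cols + j];
                }
            }
        }
    }
    dst
}
```

Both functions produce the same result; the only difference is the order in which memory is touched, which is exactly the property the comment argues dominates performance. A suitable `tile` value depends on the cache sizes of the target machine and would need to be measured.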

Ah, so now the opinions of experts in the matter don't count, only what I prove myself?

I guess that is why NVidia has spent 10 years doing hardware design to optimize their cards for C++ execution.

Apparently that was wasted money, they should have kept using C.

I mentioned theory and experts, you said enough theory.

I switched to practical applications and walked the talk by showing my code, and then you backed off and wanted to go back to opinions.

I see now that you want me to back myself with experts, since reproducible code and runnable benchmarks are not enough.

Apparently you recognize Nvidia as an expert so let's talk about CuDNN where optimizing convolution is all about memory layout, source: https://github.com/soumith/convnet-benchmarks/issues/93#issu... and it's not about C vs C++ vs PTX.

Or let's hear about what Nvidia says about optimizing GEMM: https://github.com/NVIDIA/cutlass/blob/master/media/docs/eff..., it's all about memory locality and tiling.

Or maybe Stanford, the US government and Nvidia Research are also wrong when pouring significant research in Legion? https://legion.stanford.edu/

> Legion is a data-centric parallel programming system for writing portable high performance programs targeted at distributed heterogeneous architectures. Legion presents abstractions which allow programmers to describe properties of program data (e.g. independence, locality). By making the Legion programming system aware of the structure of program data, it can automate many of the tedious tasks programmers currently face, including correctly extracting task- and data-level parallelism and moving data around complex memory hierarchies. A novel mapping interface provides explicit programmer controlled placement of data in the memory hierarchy and assignment of tasks to processors in a way that is orthogonal to correctness, thereby enabling easy porting and tuning of Legion applications to new architectures.

Are you saying they should have just called it a day once they were done with C++?

Or you can read the DaCE paper on how to beat CuBLAS and CuDNN: https://arxiv.org/pdf/1902.10345.pdf, it's all about data movement. In section 6.4, Case Study III: Quantum Transport (optimizing transistor heat dissipation), Nvidia's strided matrix multiplication was improved upon by over 30%, and this part is pure Assembly; the improvement came from better utilizing the hardware caches.

Nah, I was answering the whole "C vs C++" issue.

But then, since you saw it was a losing battle going down that path, you pulled the hardware rabbit trick out of the magician's hat.

So we moved from the "C++ is not faster than C" assertion to memory layouts, hardware design and data representation.

Now you are even asserting that it's not about C vs C++ vs PTX, and going down quantum transport lane?

Yeah, whatever.

Obviously you didn't read what I posted. Quantum Transport is a compute-intensive physics problem that has a lot of optimization research going on behind it. One of the main bottlenecks in solving this problem is Strided Matrix Multiplication.

There is no C vs C++ issue. You keep saying that constexpr and template metaprogramming matter in high performance computing and GPGPU; I have given you links, benchmarks and actual code showing that what makes a difference is memory locality.

Ergo, as long as your language is low-level enough to control that locality, be it C, C++, Fortran, Rust, Nim, Zig, ... you can achieve speedups of several orders of magnitude, and that control is absolutely required to get high performance.

Constexpr and template metaprogramming don't matter in high performance computing, prove me wrong, walk the talk, don't drink the kool-aid.

There are plenty of well studied computation kernels you can use: matrix multiplication, convolution, ray-tracing, recurrent neural network, laplacian, video encoding, Cholesky decomposition, Gaussian filter, Jacobi, Heat, Gauss Seidel, ...

It’s not faster but it’s also not much slower. I think some features of C++ would be hard to optimize by a compiler that doesn’t understand them, so a C++ to C compiler may produce slow or bloated code.

In embedded development C++ is often frowned upon because in the past people used C++ features that bloated the code.

Regarding features that may not be properly optimized, there are exceptions and move semantics I guess, but exceptions are often avoided like the plague anyway and code can be refactored to get the same effect as move semantics.

Am I missing something?

Doesn't Nim do that?

Just because you shouldn't doesn't mean you can't.

(C itself is not specified very thoroughly but C - a C implementation - is, in the sense that it only does one thing for a given line of code)

Yes, Nim uses C and GCC and this gave it very fast compile times and similar performance. It also runs on most devices supported by GCC.

cfront demonstrated how this is a bad idea. And it was for C++, about as C friendly as you can get.

If there are any rust people here, you've probably considered that you can speed up your debug llvm builds by enabling some optimizations. SimplifyCFG comes to mind, but, like, you can experiment. I presume the reason you haven't is because you want to preserve debug info, and llvm isn't great at that when optimizations are on.

You can customize the debug profile or create an intermediate profile between release and debug in your Cargo.toml. Debug info and optimization levels can be configured separately.
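As a rough sketch of what that could look like, the Cargo.toml fragment below tweaks the built-in `dev` profile and defines an in-between profile. The profile name `dev-opt` and the specific values are assumptions chosen for illustration, and custom named profiles require a reasonably recent Cargo.

```toml
# Keep the default debug profile, but turn on light optimization.
[profile.dev]
opt-level = 1   # optimization level is set independently of debug info
debug = true    # keep full debug info

# Hypothetical in-between profile: release-style optimization plus debug info.
[profile.dev-opt]
inherits = "release"
debug = true
```

With a fragment like this, `cargo build` would use the adjusted `dev` settings and `cargo build --profile dev-opt` would select the intermediate profile.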

If by speed up you mean compile times and not runtime behavior then there's also some unstable compiler flag that allows adding specific llvm passes.

While it appears that cg_clif is faster to compile, does it provide any performance benefit compared to cg_llvm? Are the compiled binaries as fast as LLVM-compiled binaries? If not, is the use case for development purposes only?

Correct, cranelift is meant for faster development build cycles


From the article, it is pretty clear that the resulting code is not as optimized as the LLVM backend. I didn't see any claims of how much slower it would be, but clearly that will vary greatly. Fast to compile is still really handy while developing.

This is awesome. It doesn’t even seem that long ago when Boa was started! Man, time flies and people do great things. Kudos to the author and co-contributors for what Boa has become.

Thanks for the nice article! Hoping the author reads the comments, I would like to leave some hopefully useful feedback.

It would greatly improve the reading experience of your blog if you could make the footnotes/references clickable.

For example when you say:

> I’ve taken the chart from the 2016 MIR blog post[3]

I have to scroll to the end of the page to find the blog post (and then scroll back to resume reading). If [3] were clickable it would be great. It would be even better if [MIR blog post] were an actual link itself.

How do they ensure that output of both compilers is correct?

e.g. if the LLVM output is A but the new one is B, how do they deal with different results between backends?

There wouldn't be any surprises, or cognitive dissonance, from using very different paths for debug versus release builds?

On a small project, personally I use --release sometimes during development because the compile time doesn't matter that much and the resulting executable is much faster: if I don't use --release I can get a misleading sense of UX during development.

This already happens a bunch, even with the current setups. It's very natural if you come from a compiled language, and not if you don't. The first step of someone saying "hey why is Rust slow?" is five people replying "did you use --release".

Can confirm.

I had a graph traversal program written in Python. I ported it to Rust, and the runtime was identical -- 68.4 seconds, down to the tenth of a second. (Kinda blew my mind -- I had to triple check that I was running and timing what I thought I was!) I had a bit of a crisis of faith.

I poked at it a few times over the next week, then finally got on the IRC channel and quickly received the advice mentioned above. Same input, with --release: 6.2 seconds.

It's funny, because I do the exact opposite.

As a developer I usually have a pretty powerful machine, and I've found that debug mode is a good way to approximate slow computers, and something that is unbearably slow in debug will bother some users later on.

This is an interesting idea, but I guess my one question is how much does the slowness of debug relate to HOW it will be slow in release? Since release optimizations can do pretty radical things to the assembly generated it feels like it wouldn't really be apples to apples.

I thought it would be an issue as well at first, but it has really rarely been an issue.

The performance degradation might not be even, but generally it approximates pretty well a slower system, without having to use a slower system in my experience. You can deal with the few edge cases individually.

If you really care about performance on slower computers, then at some point you'll need to use one for real. But at least this way you have a fast feedback loop.

Does it support JIT compilation, i.e. specialization at runtime?

Cranelift has a JIT, but I am not sure what the status of it is as a rustc backend.

+1 enjoyed how accessible this write up was

Really great.
