
How to speed up the Rust compiler - nnethercote
https://blog.mozilla.org/nnethercote/2016/10/14/how-to-speed-up-the-rust-compiler/
======
keyle
I've never used Rust, but seeing how oriented it is towards correctness and
type checking, I'm not surprised it's a little slow to compile!

From someone stuck with dynamic languages, mark my words: it's better for your
compiler to be a bit slow than for your brain to be the compiler! :-)

~~~
vintagedave
Why? You can have both fast compilers and safe languages.

Take Delphi for example: statically typed, with strong type correctness; you
have to work to avoid type safety. It compiles a million lines of code in
_4 seconds_ (including linking).

* [https://www.youtube.com/watch?v=Yq2mNUzcpE0](https://www.youtube.com/watch?v=Yq2mNUzcpE0) (1 million lines, 4 seconds)

* [https://www.youtube.com/watch?v=kvuxeaMDmMc](https://www.youtube.com/watch?v=kvuxeaMDmMc) (277,000 lines of much more complex code (the most complex, compiler-stressing library Delphi has) in 16 seconds, counting linking)

~~~
chamakits
I'm not familiar enough with the different feature sets and the different kinds
of work each compiler does to compare their speeds, but WOW, that's an INSANELY
quick build speed.

Was compilation speed always an explicit goal of theirs? Is there any
documentation of specific things they did to achieve that build speed?

~~~
mike_hearn
I used to use Delphi a lot, in the 1990s. It's a few things.

Compiler designed for compilation speed above all else. Delphi never attempted
to compete with other languages/toolchains on the raw runtime performance of
the generated code... I don't recall ever seeing Borland/Inprise boast that a
new version generated better-optimized code. This is because the code the
compiler produced was fast enough for Delphi's target market (desktop business
apps); it was still a lot faster than Visual Basic, and a lot more compact than
Java, which were its primary competitors. Delphi sold new versions to
developers by giving them new productivity tools: for desktop apps that spend
all their time idling, raw code performance was rarely an issue.

Single pass compiler. Structuring Delphi apps could be a nightmare. You could
not have circular unit dependencies and breaking such dependencies could often
be irritating as hell. Functions had to be declared in the right order.

No preprocessor or include based model. "Units" defined their header and
implementation in the same file, so the compiler could rapidly load a binary
representation of what a unit exported whilst it was compiling.

No generics. Thus, no explosion of code to generate at compile time, like with
C++ templates.

Grammar that was very easy to parse.

Compiler codebase that at its core was designed for very old computers, so
is/was very tightly written.

Probably more. That's just what I recall.

~~~
chamakits
Sweet. Thanks for the info. That provides a lot more context.

I never wrote Delphi myself, but I always appreciated it, because it was used
to develop the first IDE I used for my first programming class: Bloodshed's
Dev-C++. Quite an impressive IDE, so I imagine Delphi was (still is? No idea) a
great platform for desktop app development.

~~~
pjmlp
It is, but since Borland lost their vision of how to sell software and the
tools division changed hands a few times, they lost most of their customers.

Nowadays most of them are enterprise customers with big maintenance contracts.

Also, for shops focusing on Windows-only customers, Delphi cannot compete with
.NET.

------
nickpsecurity
I still think they should investigate the LISP solution to this problem, where
you combine several options:

1\. Interpreter (eg REPL) that executes it instantly with dynamic checks for
rapid prototyping of logic.

2\. Fast, non-optimizing compile that at least does the safety checks.

3\. Fully-optimizing compile that can be done in background or overnight while
programmer works on other things like tests or documentation.

4\. Incremental, per-function version of 2 or 3 for added boost. Rust is
already adding something like that.

This combo maximizes the programmer's mental flow, lets them tinker with
algorithms, quickly produces executables for test deployments, and will often
complete full compiles without interrupting flow with wait times. It's the
workflow available in a good LISP, but it doesn't have to be exclusive to LISP.
Other languages should copy it. I'll go further and say they should even design
the language and compiler to make it easy to pull that off.

~~~
qwertyuiop924
In CLs, anyway. In Schemes, we usually only have 1 and 3. Maybe 2 and 4, but
only sometimes.

~~~
nickpsecurity
The Scheme I was thinking of using to re-learn LISP and Scheme was Racket. Its
environment + How to Design Programs looked like a strong combo. Do you know
how it fares on this list?

~~~
qwertyuiop924
Not really. I'm pretty sure Racket is bytecode compiled, but it's not
native-code compiled. It has a realtime interpreter, of course.

I'm more of a Chicken fan, myself, and Chicken has a REPL/interpreter and an
optimizing compiler. Its compiler isn't especially fast, even at lower
settings.

Racket really is the C++ to Scheme's C in that it's similar to its parent
(although unlike C++, it's actually not compatible), and is more complex (I
would say too complex, but unlike C++, it's fairly well designed, so it's a
matter of opinion: if you like it, good on you).

Guile is also bytecode compiled. I don't know how fast compilation is.

~~~
soegaard
Racket can compile a Racket source file to platform-independent bytecode
(stored as .zo files). When the bytecode is run, it isn't interpreted; instead
a just-in-time compiler kicks in, generates machine code, and that code is run.

As far as I know, a new compiler is in the works.

------
acqq
As far as I know, the Rust compiler is written in Rust? That should mean that
if the features of the language are used, a lot of "small" allocations
shouldn't happen at all; instead, the language is supposed to allow short-lived
variables (and objects) to exist only on the stack, costing effectively nothing
to allocate them.

Moreover, compilers in general have the advantage of knowing that most objects
are needed only during specific phases, allowing optimized allocation and
bundled deallocation. I see that "a lot" of "arena" allocators are mentioned,
so it seems there is something like that, just maybe too fine-grained?

In short, there's no need for the Rust compiler to have any measurable
overhead due to allocations.

Beyond that, only changes to the compilation algorithms and the underlying data
structures (how often things are traversed and why, whether something is kept
or recalculated; sometimes it's even more expensive to produce too much data to
keep in memory, and sometimes the data structures aren't optimal for most of
the uses) should have the potential to change the compilation speed.
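
For illustration, here is a minimal sketch of the arena/bundled-deallocation
idea referred to above (hypothetical code, not rustc's actual arena types):
nodes live in one backing Vec, refer to each other by index, and are all freed
together when the arena is dropped.

    // Hypothetical index-based arena: one backing allocation, one bulk free.
    struct NodeId(usize);

    enum Expr {
        Num(i64),
        Add(NodeId, NodeId),
    }

    struct Arena {
        nodes: Vec<Expr>,
    }

    impl Arena {
        fn alloc(&mut self, e: Expr) -> NodeId {
            self.nodes.push(e);
            NodeId(self.nodes.len() - 1)
        }
    }

    fn main() {
        let mut arena = Arena { nodes: Vec::with_capacity(1024) };
        let a = arena.alloc(Expr::Num(1));
        let b = arena.alloc(Expr::Num(2));
        let _sum = arena.alloc(Expr::Add(a, b));
        // All nodes share the Vec's single backing allocation and are freed
        // together when `arena` goes out of scope here.
    }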

~~~
Chirono
Don't know why you are being downvoted. Even if you are incorrect, it would be
interesting to hear why. Anyone care to comment?

~~~
__s
Compilers spend a lot of time with hashtables, which live on the heap. Rust
isn't some magical language that makes the heap an unnecessary entity. One
still has objects whose lifetimes do not fit the stack model or whose size is
dynamic.

Their comment went on to claim that, given their hypothesis, allocation
optimization shouldn't be impactful. Yet this post demonstrates otherwise,
perhaps showing that their hypothesis is false (if A then B; B is false;
therefore A is false).

One of the post's optimizations was to use Cow<str> over String, though, which
is an example of reducing allocations with borrows.
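
For readers unfamiliar with Cow, a hedged sketch of the general pattern
(illustrative only, not the actual change from the post): the function borrows
in the common case and only allocates when it really has to produce a modified
string.

    use std::borrow::Cow;

    // Return a borrowed &str in the common case; allocate a new String only
    // when escaping is actually needed.
    fn escape(input: &str) -> Cow<'_, str> {
        if input.contains('"') {
            Cow::Owned(input.replace('"', "\\\""))
        } else {
            Cow::Borrowed(input) // no allocation on this path
        }
    }

    fn main() {
        assert!(matches!(escape("plain"), Cow::Borrowed(_)));
        assert!(matches!(escape("say \"hi\""), Cow::Owned(_)));
    }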

Additionally, it's often mentioned that a weak point in Rust's compile time is
LLVM. This has to do with various things, some of which are Rust's fault (it
could generate less IR), but it points out that there's a lot of time being
spent in C++ code. LLVM's great, but its guidelines essentially suggest 'if you
want to make LLVM play nice, generate IR like clang'.

~~~
acqq
> Their comment went on to claim that, given their hypothesis, allocation
> optimization shouldn't be impactful.

And what was the hypothesis? That the code for the compiler itself should use
the features imagined to be used by the language, just like your example:

> One of the post's optimizations was to use Cow<str> over String

Then:

> allocation optimization shouldn't be impactful. Yet this post demonstrates
> otherwise

"not impactful": It looks to me like that: 36.526s vs 32.373s for the longest
compile tested. It is measurable, but not something that I as the programmer
will be able to notice. My definition of impactful for the compilation times
is "at least a factor of 2" not 1.1.

And the claim holds once the language's possibilities are actually used. If the
code is written differently, of course not much is to be expected. The moderate
speedup shows exactly that the language's possibilities weren't always used in
a way that minimizes allocations, but the difference is still not, to my
perception, significant (10% is hard to notice unless measured, or unless the
compilations last hours rather than seconds). But I still claim that the much
bigger potential gains lie in actual changes to the algorithms and the data
structures. The data structures especially can hugely change the result if
they are made to be really cache-friendly, and together with changes in the
algorithms such modifications have the potential to produce order-of-magnitude
speedups, if they are actually possible.
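
As a hypothetical illustration of the cache-friendliness point (not code from
rustc): iterating a contiguous Vec<Node> walks memory sequentially, while a
Vec<Box<Node>> chases one pointer per element into nodes scattered across the
heap.

    struct Node {
        value: u64,
    }

    fn sum_contiguous(nodes: &[Node]) -> u64 {
        nodes.iter().map(|n| n.value).sum() // sequential access
    }

    fn sum_boxed(nodes: &[Box<Node>]) -> u64 {
        nodes.iter().map(|n| n.value).sum() // one extra indirection per node
    }

    fn main() {
        let flat: Vec<Node> = (0..1000u64).map(|v| Node { value: v }).collect();
        let boxed: Vec<Box<Node>> =
            (0..1000u64).map(|v| Box::new(Node { value: v })).collect();
        assert_eq!(sum_contiguous(&flat), sum_boxed(&boxed));
    }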

------
pjmlp
In conclusion, allocating memory is expensive, regardless of how it is done. :)

~~~
kzrdude
The allocation fixes only scratch the surface of the compiler. Something more
fundamental must underlie the long compilation times.

~~~
gribbly
Not sure how much is due to this, but LLVM has gotten much slower in recent
releases, probably due to improved optimizations.

I would not be surprised if down the line, the Rust devs decided to write
their own Rust backend rather than relying on LLVM.

~~~
pjmlp
That would be a good way to remove the C++ dependency from their toolchain,
but it would take decades to achieve parity with the quality of LLVM's
optimisers and supported ISAs (which are themselves a subset of GCC's).

C and C++ optimisers weren't that good in the '80s and '90s, even against
junior Assembly developers; it's the 40 years of compiler research into
optimizing C and C++ code that made them what they are nowadays.

~~~
gribbly
While I certainly agree that it would take time, I'm not so sure 'decades'
would be needed; I believe the scope of a Rust-specific backend would be quite
different from that of an everything-but-the-kitchen-sink backend like LLVM.

Also not sure that reaching optimization parity with LLVM is an absolute
necessity; Clang/LLVM is not as good at optimizing as GCC, but that hasn't
rendered it unpopular.

Looking at Go as a benchmark, the compiler seems to be improving its
optimization at a good pace and supports X86, ARM, PPC, MIPS, and S390, despite
being a relatively young language and also having recently rewritten its
compiler.

~~~
pcwalton
> Also not sure that reaching optimization parity with LLVM is an absolute
> necessity; Clang/LLVM is not as good at optimizing as GCC, but that hasn't
> rendered it unpopular.

Clang/LLVM is essentially as good as GCC. It depends on the benchmark you're
looking at. Getting up to LLVM parity would take years and years.

> Looking at Go as a benchmark, the compiler seems to be improving its
> optimization at a good pace and supports X86, ARM, PPC, MIPS, and S390,
> despite being a relatively young language and also having recently rewritten
> its compiler.

It will take years and years for Go to achieve LLVM's level of optimization
(assuming that they want to; I suspect that they won't when it starts
regressing compile times).

~~~
gribbly
> It will take years and years for Go to achieve LLVM's level of optimization

Looking at Rust vs Go over at 'benchmarksgame' (granted these are micro-
benchmarks), Go seems to compare very well for tests that are mainly
computational and thus avoid involving the GC very much (even beating Rust on
a couple).

[http://benchmarksgame.alioth.debian.org/u64q/compare.php?lan...](http://benchmarksgame.alioth.debian.org/u64q/compare.php?lang=go&lang2=rust)

And let's not forget that Go hasn't really had any focus on optimizations up
until the last release where SSA was introduced.

Anyway, maybe I'm projecting as I would like to see a full Rust toolchain in
Rust, and my (very possibly entirely wrong) impression is that Rust devs have
been struggling with getting the full performance potential of Rust through
LLVM.

~~~
pcwalton
The benchmarks game is one thing. Doing well on real code that isn't
ridiculously microoptimized is quite another.

It makes zero sense to write a new backend for Rust from scratch because Go
does well on a few microbenchmarks. Go's compiler infrastructure is across the
board worse than that of LLVM (in compile-time performance too; some of Go's
algorithms have worse asymptotics than the more sophisticated ones in LLVM).
LLVM doesn't do things like SROA and alias analysis just for fun.

> my (very possibly entirely wrong) impression is that Rust devs have been
> struggling with getting the full performance potential of Rust through LLVM.

I'm a Rust compiler dev and I have no idea what you're talking about. LLVM is
working wonderfully for us. If we had done what Go did, we'd quite possibly
have failed.

I have a different proposal. Instead of reinventing the world for no reason,
let's treat bugs in LLVM/rustc as bugs, and _fix them_.

------
eridius
> _malloc and free are still the two hottest functions in most benchmarks.
> Avoiding heap allocations can be a win._

This is a non-sequitur. If you see a hot function in a profile, that is an
indication that you should optimize that function if you can. But it does not
mean that you should try to reduce calls to that function elsewhere (unless
you can cut out a lot in one go). As a trivial example, if you have a super-
fast function that cannot be optimized any further, but it's called millions
of times, that function may appear hot, but any individual invocation of the
function might be taking just a few microseconds. Getting rid of a handful of
calls to this function (or even a few hundred) isn't going to make an
appreciable difference. Of course, if you can cut the number of calls in half,
then that would be good.

I guess another way of saying this is, if you're trying to optimize a hot
function by reducing the number of times it's called, you need to reduce it by
a significant percentage of the total calls rather than reducing it by a
specific number of times. No matter how many times it's called, if you reduce
the number of calls by 50%, the overall time taken by the function drops by
50%. But if you reduce the number of calls by 50, then whether that's
meaningful or not depends entirely on how long each individual call to the
function takes.
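
A quick back-of-the-envelope version of the same point, with invented numbers:

    fn main() {
        // Invented numbers, purely for illustration.
        let per_call_secs = 1e-7; // a 100 ns function
        let calls = 10_000_000.0; // called ten million times
        let total_secs = per_call_secs * calls; // 1.0 s total: looks "hot"

        let saved_by_removing_50 = 50.0 * per_call_secs; // ~5 µs: unnoticeable
        let saved_by_removing_half = total_secs * 0.5; // 0.5 s: worthwhile

        println!("{}s vs {}s saved", saved_by_removing_50, saved_by_removing_half);
    }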

Edit: Why am I being downvoted for this? I'm making a factually correct
statement. It's not like this is some sort of controversial opinion. If I'm
wrong, please tell me why.

~~~
elihu
The original statement isn't a non-sequitur. If the compiler is spending a lot
of time in malloc, it makes sense to replace those heap allocations with stack
allocations where possible.

You are also correct that removing mallocs might not be low-hanging fruit,
since it might entail changing a lot of malloc call sites rather than just a
few, but that doesn't mean the original statement was wrong.
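
As a trivial, hypothetical sketch of the heap-to-stack substitution being
discussed (not code from rustc):

    // The boxed version pays one malloc/free pair per call; the stack version
    // pays nothing beyond adjusting the stack pointer.
    fn sum_boxed() -> u64 {
        let buf: Box<[u64; 1024]> = Box::new([1; 1024]); // heap allocation
        buf.iter().sum()
    }

    fn sum_stack() -> u64 {
        let buf = [1u64; 1024]; // lives entirely on the stack
        buf.iter().sum()
    }

    fn main() {
        assert_eq!(sum_boxed(), sum_stack());
    }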

~~~
eridius
Fair enough, but it's a good bet that most of those mallocs cannot be replaced
with stack allocations.

~~~
MaulingMonkey
So grab a fixed-size pool allocator. Malloc is very generic, and faster
alternatives aren't limited to stack allocators.

EDIT: Or do what the article does: eliminate allocations that were straight-up
unnecessary (because they were unused, or were over-eager deep copies).
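
A minimal, hypothetical sketch of the fixed-size pool idea (not anyone's
production allocator): slots are preallocated once and recycled through a free
list, so steady-state inserts and removals never touch malloc/free.

    struct Pool<T> {
        slots: Vec<Option<T>>,
        free: Vec<usize>, // indices of vacant slots
    }

    impl<T> Pool<T> {
        fn with_capacity(n: usize) -> Self {
            Pool {
                slots: (0..n).map(|_| None).collect(),
                free: (0..n).rev().collect(),
            }
        }

        // Returns None when the pool is full; no heap allocation happens here.
        fn insert(&mut self, value: T) -> Option<usize> {
            let i = self.free.pop()?;
            self.slots[i] = Some(value);
            Some(i)
        }

        // Recycles the slot instead of freeing it.
        fn remove(&mut self, i: usize) -> Option<T> {
            let v = self.slots[i].take()?;
            self.free.push(i);
            Some(v)
        }
    }

    fn main() {
        let mut pool: Pool<[u8; 64]> = Pool::with_capacity(2);
        let a = pool.insert([0; 64]).unwrap();
        let _b = pool.insert([1; 64]).unwrap();
        assert!(pool.insert([2; 64]).is_none()); // full: caller decides what to do
        let _ = pool.remove(a);
        assert!(pool.insert([3; 64]).is_some()); // reuses the recycled slot
    }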

