
How to speed up the Rust compiler in 2018 - nnethercote
https://blog.mozilla.org/nnethercote/2018/04/30/how-to-speed-up-the-rust-compiler-in-2018/
======
ajross
So... grumpy old man response here:

These are all tiny, targeted microoptimizations worth a percent or three of
benefit in specific tests. They're worth doing (or at least evaluating) in any
mature product and I have no complaint.

Nonetheless rustc remains _really_ slow relative to other similar
technologies, including C++ compilers. Is there any consensus as to why?

I mean, with C++, the answer is something to the effect of "template expansion
happens syntactically and generally has to be expressed in headers, leading to
many megabytes of code that has to be compiled repeatedly with every
translation unit". And that isn't really amenable to microoptimization. We all
agree that it sucks, and probably can't be fixed with the language as it's
specified, and chalk it up to a design flaw.

What's the equivalent quip with rustc? I mean... is it going to get faster (in
a real sense, not micro), or is it not? Is this fixable or not, and if not
why?

~~~
woolvalley
In Swift, it's the combination of type inference and operator overloading that
causes long compile times. The + operator, for example, can have a lot of
implementations, and since types have to be inferred instead of being declared
upfront, the combination creates slow compile times:

For example, there is this infamous error in Swift:

"Expression was too complex to be solved in reasonable time; consider breaking
up the expression into distinct sub-expressions"

for something like this:

matrix11 = (g * (x23 * (x12 + x32) + x13 * (-x22 + x32) - (x12 + x22) * x33) * sin(a)) /
(x13 * (x22 * x31 - x21 * x32) + x12 * (-x23 * x31 + x21 * x33) + x11 * (x23 * x32 - x22 * x33))

C++ would have no problem compiling that expression quickly.

Since Rust and Swift are similar languages, there is probably a similar issue.

~~~
eridius
> _Since Rust and Swift are similar languages, there is probably a similar
> issue._

Swift's type checker works quite differently than Rust's. I'm not aware of any
exponential time complexity issues in type-checking Rust code.

~~~
shepmaster
There is an exponential performance cliff when compiling associated types with
equality constraints [1]. The issue was first opened in 2015 and was only
recently addressed. Rust certainly has the potential for exponential compile
times when performing large amounts of inference.

[1]: [https://github.com/rust-lang/rust/issues/22204](https://github.com/rust-lang/rust/issues/22204)
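
The exact reproduction is in the issue; below is only a hypothetical sketch of
the general shape, with made-up trait and type names. Each layer of nesting
adds an associated-type projection with an equality constraint for the compiler
to normalize, and deep chains of such projections are where type-checking time
has historically blown up:

    // Illustrative only -- not the code from issue #22204.
    trait Step {
        type Output;
        fn step(self) -> Self::Output;
    }

    struct Base;
    impl Step for Base {
        type Output = Base;
        fn step(self) -> Base { self }
    }

    // Each wrapper layer projects through another associated type.
    struct Wrap<T>(T);
    impl<T> Step for Wrap<T>
    where
        T: Step,
        T::Output: Step,
    {
        type Output = <T::Output as Step>::Output;
        fn step(self) -> Self::Output {
            self.0.step().step()
        }
    }

    // A bound written with explicit equality constraints.
    fn run_twice<A, B, C>(a: A) -> C
    where
        A: Step<Output = B>,
        B: Step<Output = C>,
    {
        a.step().step()
    }

    fn main() {
        // Forces the compiler to normalize the nested projections.
        let _ = Wrap(Wrap(Base)).step();
        let _b: Base = run_twice(Base);
    }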

------
mastax
If you're interested in the current "grand plans" for performance, check out
this issue: [https://github.com/rust-lang/rust/issues/48547](https://github.com/rust-lang/rust/issues/48547)

The query parallelization work has been coming along. It was also recently
discovered that building LLVM with a recent Clang and doing cross-language LTO
speeds up builds by ~25%.

------
oblio
Is there any dedicated effort to speed up LLVM compilation? I imagine that
would benefit a lot of languages.

~~~
noelwelsh
What I've heard is that the output that the Rust compiler gives to LLVM is
much more complex than it needs to be, and optimising this is the simplest way to
reduce the time spent in LLVM. This seems plausible to me as there aren't many
complaints that other compilers using LLVM are slow.
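
If you want a rough sense of this yourself, rustc can dump the IR it hands to
LLVM (the file name here is just a placeholder):

    # Emit the LLVM IR for a crate; writes main.ll next to the source.
    rustc -O --emit=llvm-ir main.rs
    # The amount of IR is a crude proxy for how much work LLVM is given.
    wc -l main.ll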

~~~
arghwhat
That depends on what you are comparing to. Both GCC and Clang are very slow
compared to, say, the Go standard compiler.

However, comparing compilation speed across languages and compiler
implementations is a bit difficult.

~~~
noelwelsh
Right, but compilation performance is not a race in which you have to be
first, but a minimum threshold that you must meet. Compilation just has to be
fast enough that it doesn't annoy developers most of the time. Once you
achieve that they'll mostly stop complaining. I believe Clang et al are fast
enough.

(This is true for performance and optimisation in general: fast enough is good
enough.)

~~~
arghwhat
Compiling larger projects, I certainly do not feel that Clang et al. are fast
enough. It takes _minutes_ to compile stuff at work with 32 Xeon threads
powering through at full load.

Once it takes seconds, we can talk about "fast enough".

~~~
iClaudiusX
Not to pick on you or the parent comment, but this is such a common mistake
that it's worth correcting.

"Clang" is a thing not a person, therefore it's appropriate to use et cetera,
not et alii. The former is for things, the latter for people (usually limited
to authors of academic papers).

~~~
threatofrain
I interpreted that as a deliberate and sassy way to write. It's as if I said
with Python et al. eating its lunch, Clojure better light a fire under its own
ass.

~~~
beojan
et al. refers to a group of authors working together, so I would say a phrase
like "Python et al. eating its lunch" is simply wrong.

~~~
arghwhat
comex's research above would appear to disagree.

------
jeffdavis
A recent good experience with Rust: wanted to learn a little about machine
learning. Found a crate called rusty-machine, copied some examples from the
docs, and built them.

No problems, compiling all of the dependencies was fast, examples worked, and
in literally about 2 minutes I had a neural net and was hacking on it.

This is without touching Rust in months.

~~~
orib
Great insights into improving the Rust compiler's performance! What else would
you do to speed it up?

------
leshow
Is there a reason why you'd create 2 distinct repositories instead of just
creating branches and adding new remotes?

~~~
lambda
So that you can easily rebuild and re-run each one with differences in build
settings, instrumentation, etc., and get comparisons between the baseline and
the modified version, without having to check out a different branch each
time.

Sometimes it's just quicker to work with two separate working directories than
switching branches in one.

~~~
ben0x539
`git worktree` is pretty neat btw!
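
For anyone who hasn't used it: it checks out a second branch of the same clone
into another directory, sharing one object store, so you get the two-directory
setup above without a second repository (paths and branch names below are just
examples):

    # Check out a second branch of the same repo into a sibling directory.
    git worktree add ../rustc-baseline master
    # See what's checked out where, and clean up when done.
    git worktree list
    git worktree remove ../rustc-baseline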

------
Rafuino
Total noob question here, but how does the hardware you're using affect
compilation speed? What's the baseline hardware being used in this benchmark
he runs? There's no way to recreate the benchmark environment unless we know
this.

When someone like ajross says rustc is _really_ slow, how slow are we talking?

~~~
steveklabnik
> how slow are we talking?

It really depends on your machine, as you ask above. And on your project. And
if it’s a fresh build or a rebuild. And all sorts of stuff.

It’s slow enough that improving this is a priority for us.

~~~
Rafuino
Thanks, Steve, for the response. Certainly there are a lot of variables so I'd
like to see benchmark reports actually disclose what's under the hood.

Let's say, for example, you're on a regular rebuild cadence for your work.
What makes the biggest difference for speed in such a scenario: CPU cores, CPU
speed, DRAM speed, SSD latency, SSD throughput, something else?

~~~
steveklabnik
I only own one computer so I can’t really get empirical data here. We do track
compile times on the same machine at perf.rust-lang.org, but I don’t believe
anyone has done those kinds of tests. It’d be neat to have though!

~~~
Rafuino
Awesome. I've been checking out the site, especially the wall-time numbers. To
the point of the original question, do you know what's under the hood on the
machine you're using to test at perf.rust-lang.org? I'm not seeing it called
out on the site so far in my searching...

~~~
steveklabnik
I'm not sure, might make a good question for internals.rust-lang.org.

------
shmerl
Is rustc still using a forked LLVM, or has it already switched to upstream?

~~~
mastax
Nothing's really changed with that recently. Still using a fork (though less
divergent than it has been?), but it also builds fine with upstream.

~~~
shmerl
Are there plans to merge remaining differences to upstream and switch to it?

~~~
steveklabnik
We generally send our patches upstream, but they can take some time to land.
This will just always be the case. We test that everything builds with stock
LLVM; you’ll probably just get some test failures from the unpatched bugs.

~~~
shmerl
So the fork is mostly needed for convenient fast iteration?

~~~
steveklabnik
Yup.

------
MikkoFinell
Why is Rust so popular on HN? Honest question.

~~~
olavk
I find it interesting because of the novel approach to memory management. It
is deterministic and zero-overhead like C, C++ and similar, while at the same
time safe like garbage collected languages. I think of it as compile-time
garbage collection.

I wouldn't use it as an alternative to garbage collected languages since
garbage collection is just simpler overall, but I would consider using it as a
safer alternative to C or C++.
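
To make the "compile-time garbage collection" point concrete, here is a minimal
sketch: the compiler works out at compile time where each value's owner goes
out of scope and frees it exactly there, with no runtime collector involved:

    fn main() {
        let names = vec![String::from("alpha"), String::from("beta")];
        {
            let taken = names; // ownership moves; `names` can no longer be used
            println!("{}", taken.len());
        } // `taken` goes out of scope here: the Vec and its Strings are freed
          // at this exact point, determined at compile time -- no GC involved

        // println!("{}", names.len()); // compile error: value was moved
    }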

~~~
danieldk
_I wouldn't use it as an alternative to garbage collected languages since
garbage collection is just simpler overall,_

I do not disagree, but I think one could argue to the contrary as well.

Since most garbage collected languages do not guarantee that objects are
actually collected in a timely fashion, you cannot tie the lifetime of other
resources (file descriptors, sockets, locks) to object lifetimes. So, the
burden is on the programmer to ensure that resources are correctly finalized.
Whereas in RAII languages you can properly tie all finalization to object
lifetimes.

Note: I am not arguing that GC-ed languages cannot do RAII. AFAIR D has a GC
and supports RAII.

~~~
jerf
In most modern languages, there is some way to tie object cleanup to a scope,
even for a GC'd value. Note I say "cleanup", as in "I closed the file handle"
or something, not "finalization" which is a GC-specific term. It isn't
necessarily as rigorous as RAII but it works in principle in much the same
way. "with" in Python, for instance, "defer" in Go (less automatic but if used
properly fits the 80/20 nature of Go), Haskell has a motley crew of solutions,
most other languages you can put something together yourself if nothing is
provided.

Where things get complicated is when you don't have a clear scope to tie
lifetime to and so you can't use these, but then, that applies to RAII too.

I'm not saying C++ doesn't have a bit of an advantage here, but I think it
often gets oversold as "C++ has RAII and other languages have nothing even
remotely resembling it", which isn't true.

In practice this isn't a problem that I encounter in GC'd languages anywhere
near often enough to justify even a slight preference for a "true RAII"
language.

~~~
danieldk
_It isn't necessarily as rigorous as RAII but it works in principle in much
the same way._

I would say that _in principle_ they work very differently ;).

They work superficially in the same way in that if you tie a particular object
to the current scope in a RAII language, the cleanup happens at the same point
as defer, with, try-with-resource, etc. would. However, there are cases where
you _really_ want cleanup to be tied to the object's lifetime.

For example, I have a Tensorflow binding for Go. However, I cannot pass Go-
allocated memory (e.g. memory allocated to a slice), because Go does not
allocate slice memory on 32-byte boundaries (using Go memory would cause an
allocation + memcpy in Tensorflow to align the memory). So, you allocate
memory in C-land and have Go structs that wrap your pointers. However, now
cleanup becomes interesting. You do not want to rely on finalizers, since they
are not guaranteed to run. However, using a _Close_ method is also an
annoyance. For tensors that live for the duration of a graph run, it is fine
(you can use _defer_ ), but other tensors live longer and are reused between
runs, shared by models, etc. It becomes unclear pretty quickly who is
responsible for closing the model.

I also use a Tensorflow binding for Rust, which is drastically more convenient
in this respect. Since ownership is clear, the lifetime of a tensor is bound
to the scope or object that owns it. If the owner is dropped, the tensor is
also dropped. If you need to share a tensor, you make an Rc/Arc the owner.
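
A rough sketch of that pattern (this is not the actual Tensorflow binding's
API; the type and the allocation details are made up, with std::alloc standing
in for the C-side allocator): the foreign buffer is released in Drop, so
whichever owner is dropped last, including an Arc shared between models,
determines exactly when it is freed.

    use std::alloc::{alloc, dealloc, Layout};
    use std::sync::Arc;

    // Stand-in for a buffer allocated outside Rust's normal ownership,
    // aligned to 32 bytes as mentioned above.
    struct Tensor {
        ptr: *mut u8,
        layout: Layout,
    }

    impl Tensor {
        fn new(len: usize) -> Tensor {
            let layout = Layout::from_size_align(len, 32).expect("bad layout");
            let ptr = unsafe { alloc(layout) };
            assert!(!ptr.is_null());
            Tensor { ptr, layout }
        }
    }

    impl Drop for Tensor {
        fn drop(&mut self) {
            // Runs exactly when the last owner goes away -- no Close() to forget.
            unsafe { dealloc(self.ptr, self.layout) };
        }
    }

    fn main() {
        // Shared by two owners; the buffer is freed when the second is dropped.
        let shared = Arc::new(Tensor::new(1024));
        let also_shared = Arc::clone(&shared);
        drop(shared);
        println!("still alive: {} bytes", also_shared.layout.size());
    } // `also_shared` dropped here; Tensor::drop releases the buffer.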

~~~
jerf
Well, to be honest, my position has consistently been that using Go for
scientific programming is not a good idea, and will never be a good idea
because the Go team is never going to give you the features you need for it,
and that you probably _should_ just use Rust. (Or something else. Rust is not
the only choice.) The Go answer is probably that, yes, you need a .Close
method, and yes, I agree that in this use case that's _really annoying_ and I
would suggest this argues against using Go for this.

People tend to then get annoyed at me for expressing this opinion, but this
sort of thing is the reason why. Go is really only merely adequate at
interfacing with libraries in other languages [1] which scientific programming
does a lot of, and Go has a type system that seems almost precisely tuned to
get in your way if you try to program mathematical code in a typed manner
_but_ at the same time isn't so weak that you can pull something like a NumPy
where at the Python level everything is just untyped so as long as you
assemble it correctly up there, the C level can work it all out. Nor can you
practically program in that manner, because while you can slather interface{}
everywhere, you can't make it convenient to work with like a dynamically-typed
language. I think Go is approaching maximally pessimal for scientific-type
programming, personally.

I should clarify that when I say Go has "something like" RAII, I do mean pure-
Go code only. And by no means is "defer" perfect. (I'm definitely in the camp
that it should have been block scoped, not function scoped, and the
performance hit can be quite annoying.) It's just that, as I said, it's not
like the choice is "either RAII or you're in some manual-management only
horrorland"... lots of languages have block-scoped constructs (not just Go)
that can be used to 80/20 RAII. That last 20 may be important in some cases,
but it's quite often a great deal less important than the 80.

(This post is brought to you by your friendly local "HN poster who has been
accused of being unreasonably positive about Go".)

[1] Pretty much every modern language claims to have "great" interfacing with
C, despite IMHO wild variances in difficulty. Go is "adequate" because it's
not too difficult to simply call a C function, and with not much labor you can
get binary-level-compatible structs between the two, which is a nice advantage
over Python or Perl or something. But the semantic mismatch is pretty rough
around memory management and threading model, and that manifests in slowness
in the calls in addition to general semantic mismatch.

~~~
mratsim
Rust is better than Go for sure ( _cough_ Generics), but it's already hard
enough to fight the mathematics, statistics and machine learning; scientists
will not want to fight the Rust syntax as well.

For me, the most promising compiled and statically-typed language for
scientific computing is Nim.

Disclaimer: I am the author of a Numpy/Torch/Tensorflow-like library written
from scratch in pure Nim, the look and feel is pretty similar to Python
Pytorch + Keras for neural networks:
[https://github.com/mratsim/Arraymancer](https://github.com/mratsim/Arraymancer)

