
How to speed up the Rust compiler some more - nnethercote
https://blog.mozilla.org/nnethercote/2016/11/23/how-to-speed-up-the-rust-compiler-some-more/
======
ronjouch
The post goes directly from _"heap allocations were frequent within rustc"_
to _"effort to minimize heap allocations"_, and proceeds to detail the
speedups.

→ Systems programming newbie question: why are heap allocations bad for
performance? Is it the additional level of indirection? The cost of calling
your memory allocator? Something else?

My background, if that helps focusing answers: python/js programmer, did a
tiny bit of C/C++, am ~approximately~ familiar with the stack (call frames,
each with its context) vs. the heap (where to allocate memory for big/long-
lived objects e.g. arrays and trees).

~~~
valarauca1

        why are heap allocations bad for performance?
    

1\. You have to call into your allocator, which searches for a free block of
approximately the right size, flags that memory as used, marks the remaining
memory in that block as unused, and updates its internal bookkeeping to match
the new layout.

This is done efficiently, but modifying these data structures takes time. And
it's still faster than:

2\. Calling the kernel to mmap in new virtual memory, adding that to the pool,
and then restarting this whole process all over again.

Allocation time is a big cost, and there is work underway at the moment to
make allocation lazy by default in the Rust compiler.

~~~
marcosdumay
Also, make sure #1 is done atomically. Your request will either get in a
queue, wait for a mutex, or create many more of #2 than needed because the
pool isn't shared between threads.

~~~
valarauca1
Rust uses jemalloc, which has per-thread pooling (sort of). There is very
little lock contention, as threads are spread across multiple pools.

Also, per-thread pooling doesn't use more memory than not using it. In some
cases it actually uses less. Citation:
[https://people.freebsd.org/~jasone/jemalloc/bsdcan2006/jemal...](https://people.freebsd.org/~jasone/jemalloc/bsdcan2006/jemalloc.pdf)

------
stevedonovan
Here's some thinking outside the box. Traditional compilers are focused on
building an executable as fast as possible, throwing away an enormous amount
of state each time. The new rustc incremental compilation attempts to re-use
some of that computed state, although it's still early days. If, however, the
compiler's state remains persistent (it is running as a daemon) then small
code changes should usually be pretty fast - it's analogous to the code
scanners of Eclipse. If the target of the compiler isn't a native executable,
but a continuously updated image containing code, then the link gets pretty
fast as well. The resulting code will not be fast, but it will be _fast enough_ to
test changes.

~~~
barrkel
The Delphi compiler, when hosted in the IDE, has done this since 1996 or so.

The symbol table, generated code etc. for every compilation unit is kept in
memory and only discarded if the source is modified. All imports across
compilation units are done via a double indirection, so the symbols can be
unlinked and relinked more easily.

(Sub-second recompiles in Delphi are the norm, not the exception. In large
projects, most of the time during dev builds is linking, and that's usually
only a few seconds.)

~~~
pjmlp
Which is why I kind of think that it is a bit sad when millennials think Go
compilation speed is some kind of achievement.

Especially given Delphi's features vs Go's.

~~~
jerf
I've said a couple of times that new language designers should thank their
lucky stars that C++ has set the bar for "fast compilation" so low. Coming out
of the gate with a compiler that's "faster than C++!" is pretty doable even
for a relatively compile-time heavy language. If you had to come out of the
gate with something as fast as Delphi to be taken seriously, we wouldn't get
very many new languages.

(I mention this in the context of my observation that the minimum bar to be
taken seriously for a language is going steadily up. You certainly need a
standard library that is powerful out of the gate, whether or not it is
necessarily "part" of the language, and we're getting perilously close to the
language being required to ship some heavy-duty HTTP stuff, possibly a server
implemented in the language, before it stands a chance. Rust may have snuck in
under the wire on that, though of course that stack is developing apace even
so.)

~~~
stevedonovan
Go (despite its faults) definitely raised that minimum bar; a language needs
to arrive with a package manager and doc tools, as well as enough standard
library to get non-trivial things going. Rust made the right decision on HTTP
- let the ecosystem provide. Of course, the result of that is summarized in
the quote by Grace Hopper: “The wonderful thing about standards is that there
are so many of them to choose from.”

~~~
marcosdumay
There's always the Python way:

Let the ecosystem provide, then, once there's a few clean winners, pick one as
the official (while keeping the others there, obviously).

~~~
paulddraper
C++/Boost does that

~~~
marcosdumay
Yes (which is quite in line with the C tradition).

What's interesting is that I can't think of a lot of languages that do it. It
should be the default, as it's easy, safe, and gets good results. Somehow, it
isn't.

And then there's Haskell's way: let the community informally choose the one
best option, and when somebody uses any of the others, somebody nearby tells
them "nobody goes there anymore, come to this other place". It works very
well, but it's a bit confusing for newbies.

------
wyldfire
Can Rust capitalize on LTO/PGO? Even if that's not quite ready for primetime,
if we're spending 50% or more time in the LLVM backend, that certainly can be
built with LTO/PGO.

Seems like it might be worth the trouble/bootstrapping challenge if it yields
another ~5%.

~~~
corndoge
we should just make computers faster, Intel is failing, Moore's law

~~~
thecopy
We know how to make them faster. How to make faster CPUs viable & affordable
to the public though, that's a hard problem to solve.

~~~
RX14
Intel essentially has a monopoly on the market for desktop CPUs, which can't
help.

~~~
thecopy
Affordable air-cooled x86 CPUs at >4 GHz will disrupt that very quickly.

~~~
k__
Is x86 still needed?

I mean, ARM is getting more traction, and maybe it will run rings around x86
in the next few years.

~~~
wbl
POWER already did, by sucking vast amounts of electricity and dumping heat.
Getting high IPC isn't easy and leads to lots of thermal dissipation. Odd
architectures might make a comeback, but they don't help much.

------
vanderZwan
Nice article, although I would have liked to see a before/after summary bar
chart of the benchmarks for all PRs as a whole; I'm curious how all these
incremental improvements add up together.

~~~
snovv_crash
You can find roughly that here, unfortunately mixed in with everything else
though:

[http://perf.rust-lang.org/](http://perf.rust-lang.org/)

------
TazeTSchnitzel
Is heap allocation actually unavoidable? PHP uses an arena allocator for its
AST. I don't know how applicable that could be to Rust.

~~~
hackcrafter
Not that applicable.

Rust's memory model, built around strong ownership and borrow checking,
differs from GC languages that can use generational strategies and lots of
heap-allocation re-use.

All languages ultimately allocate from the OS; Rust just doesn't have a GC,
as PHP does, to intermediate.

~~~
TazeTSchnitzel
PHP's compiler is written in C, and its AST isn't garbage-collected. I think
we just throw out the entire arena, actually.

~~~
slededit
Technically it's a very limited form of garbage collection... Then again, by
that definition, so is never freeing anything and letting the OS clean up
after process exit.

~~~
TazeTSchnitzel
I believe it's manually freed.

