
How to speed up the Rust compiler some more in 2018 - nnethercote
https://blog.mozilla.org/nnethercote/2018/06/05/how-to-speed-up-the-rust-compiler-some-more-in-2018/
======
breakingcups
I liked this post, not only for showing an overview of all the big and small
wins but also for showing approaches that were tried and didn't work out.

It's nice to see someone write about failures in a neutral and informative
way; failure is a daily part of our jobs and lives, yet most of what you read
online is (understandably) about successes.

~~~
glangdale
Good point. I'm working on a project at the moment that's 100% performance
oriented and it's sort of a pain to think about writing about it _responsibly_
- i.e. keeping all the intermediate steps and half-good ideas around (and
bug-free, and sorta-usable) long enough to be able to do performance analysis
on all the stuff.

It's hard not to just sweep all the bad ideas and half-way steps under the
rug and hit the world with an "aren't I clever"-looking final product
(without really helping anyone learn how to get there on their own
projects).

~~~
acdha
One of the other benefits is keeping those notes around in case some
assumptions change in the future and the optimization you picked is no longer
viable.

~~~
glangdale
Absolutely, yes. Or the optimization landscape suddenly changes. I had a
super-cool trick for doing state transitions in a DFA at the rate of the
_throughput_ of a shuffle instruction rather than at the rate of the latency
of a shuffle instruction, and smugly congratulated myself about how well it
worked on Ivy Bridge (latency = 1, reciprocal throughput = 0.5). Then Haswell
came along and repurposed the second shuffle unit for 256-bit shuffles
(latency = 1, reciprocal throughput = 1), and the clever trick became obsolete
overnight. :-(
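
The latency-vs-throughput distinction is the whole trick: a naive DFA walk is
one long dependency chain, so each transition costs the instruction's latency,
while two independent chains interleaved can issue at its throughput. A scalar
sketch of the idea (a made-up toy DFA and hypothetical function names, not the
actual shuffle-based code):

```rust
// Hypothetical 4-state DFA over bytes; the table is made up for illustration.
const STATES: usize = 4;

fn transition_table() -> Vec<[u8; STATES]> {
    // table[byte][state] -> next state; a toy DFA that counts
    // (mod 4) how many 'a' bytes it has seen.
    let mut table = vec![[0u8; STATES]; 256];
    for byte in 0..256 {
        for state in 0..STATES {
            table[byte][state] = if byte == b'a' as usize {
                ((state + 1) % STATES) as u8
            } else {
                state as u8
            };
        }
    }
    table
}

// Serial walk: every step depends on the previous one, so the chain
// runs at the *latency* of the transition operation.
fn run_serial(table: &[[u8; STATES]], input: &[u8]) -> u8 {
    let mut state = 0u8;
    for &b in input {
        state = table[b as usize][state as usize];
    }
    state
}

// Two independent inputs interleaved: the chains don't depend on each
// other, so the CPU can overlap them and approach the *throughput*
// of the transition operation.
fn run_interleaved(table: &[[u8; STATES]], a: &[u8], b: &[u8]) -> (u8, u8) {
    let (mut sa, mut sb) = (0u8, 0u8);
    for (&x, &y) in a.iter().zip(b.iter()) {
        sa = table[x as usize][sa as usize];
        sb = table[y as usize][sb as usize];
    }
    (sa, sb)
}
```

In the trick described above, the transition is a single shuffle instruction,
and the second interleaved chain is what soaked up the spare shuffle capacity
that Ivy Bridge had and Haswell took away.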

~~~
acdha
I remember a few stories like that from the P3/P4/AMD era, where a researcher
ended up ripping out his hand-tuned assembly because the C reference
implementation had become faster on newer hardware. Fortunately they were
conscientious about testing both implementations regularly, so there was no
concern about subtle incompatibilities.

------
cesarb
> Cachegrind’s output showed that the trivial methods for the simple BytePos
> and CharPos types in the parser are (a) extremely hot and (b) not being
> inlined.

To me, this points to a deficiency in the rustc compiler. All trivial methods
(trivial in that they will always be cheaper than a function call) should be
automatically treated by the compiler as if they had been marked #[inline].
Currently a Rust programmer has to annotate every single trivial getter and
setter with that attribute, which is both tiresome and easy to forget.
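
Concretely, the annotation in question looks like this (a hypothetical newtype
standing in for rustc's `BytePos`):

```rust
// Hypothetical newtype mimicking the shape of rustc's BytePos.
pub struct BytePos(pub u32);

impl BytePos {
    // Without #[inline], this one-instruction method is only guaranteed
    // to be inlinable within the defining crate; the attribute makes its
    // body available to downstream crates as well.
    #[inline]
    pub fn to_usize(&self) -> usize {
        self.0 as usize
    }

    #[inline]
    pub fn from_usize(n: usize) -> BytePos {
        BytePos(n as u32)
    }
}
```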

~~~
bluejekyll
I’d love for someone more knowledgeable to correct me, but my understanding is
that it does inline properly within a crate; across crate boundaries, though,
you must mark a method as #[inline]...

~~~
steveklabnik
As far as I know, that is true.

~~~
hobofan
Isn't that only true for the default "release" profile? AFAIK you have to turn
LTO on if you want to get the most performance out of it (LTO isn't on in any
profile by default), but it's been some time since I last optimized to that
extent.
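
For reference, enabling LTO is a one-line profile setting in Cargo.toml
(shown for the release profile; `codegen-units = 1` further helps cross-unit
inlining, at the cost of build time):

```toml
[profile.release]
lto = true          # link-time optimization; off by default
codegen-units = 1   # a single codegen unit lets LLVM inline across the whole crate
```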

~~~
steveklabnik
I’m not sure to be honest, this is a corner of the language I find hard to
remember.

