
Compiler Performance and LLVM - jondgoodwin
http://pling.jondgoodwin.com/post/compiler-performance/
======
pcwalton
Look at Cranelift [1], in particular "Cranelift compared to LLVM" [2].
Cranelift is in some ways an effort to rearchitect LLVM for faster compilation
speed, written by some longtime LLVM contributors. For example, Cranelift has
only one IR, which lets it avoid rebuilding trees over and over, and the IR is
stored tightly packed instead of scattered throughout memory.
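A toy contrast, to make the layout point concrete (my sketch for illustration, not Cranelift's actual data structures): instead of a graph of separately heap-allocated instruction objects, you keep one flat table of fixed-size instruction records and refer to them by index:

```python
# Pointer-chased style: every instruction is its own heap object,
# and operands are references to other objects scattered in memory.
class Node:
    def __init__(self, op, args):
        self.op = op          # opcode name
        self.args = args      # references to other Node objects

# Packed style: one contiguous table; operands are integer indices
# into the same table, so fixed-size rows pack tightly and iterate
# cache-friendly.
class FlatIR:
    def __init__(self):
        self.ops = []         # opcode per instruction
        self.args = []        # (operand index, operand index) per instruction

    def emit(self, op, a=-1, b=-1):
        self.ops.append(op)
        self.args.append((a, b))
        return len(self.ops) - 1   # an instruction's index is its "name"

ir = FlatIR()
x = ir.emit("iconst")
y = ir.emit("iconst")
s = ir.emit("iadd", x, y)     # operands are just the indices 0 and 1
```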

[1]:
[https://github.com/CraneStation/cranelift](https://github.com/CraneStation/cranelift)

[2]: [https://cranelift.readthedocs.io/en/latest/compare-llvm.html](https://cranelift.readthedocs.io/en/latest/compare-llvm.html)

~~~
mshockwave
I think one of the key differences is that Cranelift is more low-level

------
htfy96
Sometimes I feel many comparisons against LLVM's performance are somewhat
unfair: you can easily match LLVM's -O0/-O1 output with much simpler
infrastructure and only a limited number of cheap passes, which yields a
considerable compile-time boost, but many competitors like Cranelift that
claim fast compilation will never reach -O2 output quality without major
infrastructure changes.

These compilers are mainly designed for extreme runtime performance. People
complaining about LLVM's slow compilation must never have used ICC's -fast
mode, where a hello world can take ~30s to compile. Developers still spend
thousands on it because it squeezes out every drop of performance.

~~~
sanxiyn
I am not sure what is unfair about it. LLVM is inferior if you want fast build
and -O1 performance. Many people are looking for fast build and -O1
performance, so it makes sense to let them know that LLVM is not what they
want.

~~~
Crinus
I'd say most people, by far, are perfectly fine with -O1 performance. -O2 (and
higher) is only needed for very small and specific parts of a codebase (e.g.
the inner parts of encoding and rendering).

The problem is getting both at the same time (I haven't seen any build
configuration try to mix optimization levels) without compromising compile
speed for -O1, so projects that require -O2 for 0.1% of their codebase apply
it to 100% of it.

In theory depending on the language you could mix different compilers, but
that is a big can of worms (and other bugs).
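To be fair, per-file mixing is at least mechanically easy to express in most build tools; e.g. with GNU make's target-specific variable values (file name invented for illustration):

```make
# Everything builds at -O1...
CFLAGS := -O1
# ...except the hot translation unit, which gets -O2.
render.o: CFLAGS := -O2
```

It's just that, as said, I haven't seen projects actually organized this way.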

~~~
nmca
LuaJIT springs to mind here... Famously performant.

~~~
tachyonbeam
LuaJIT also does a lot less work for a given piece of code than LLVM. It
generates relatively well optimized code for a dynamic language, but it
doesn't do much of the low level optimizations that LLVM and GCC do.

~~~
Crinus
Yes, that is how it usually goes: you trade code performance for compile
performance. But my point (a couple of messages above) is that most of the
time this performance is perfectly fine, and only a tiny part of the codebase
may need the extra low-level optimizations that LLVM and GCC can do (if it
needs them at all). Of course this is a generality and the specifics depend on
the project: chances are, the rendering parts of a CPU-based raytracer for CGI
movies will need these optimizations much more than most projects, whereas a
file manager most likely won't need them at all.

------
sanxiyn
The single most important topic of LLVM compilation time is missing: FastISel.
As shown, all the time is spent in the IR-to-machine-code step. That's because
LLVM's default instruction selector is the optimizing one. For debug builds,
you want to set LLVM's codegen to fast mode.
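For the curious, the fast path can be requested explicitly when driving LLVM yourself; a sketch with llc (flag spelling per recent LLVM releases, so check `llc --help` on your version):

```shell
# Request the fast instruction selector instead of the optimizing
# SelectionDAG path; -O0 already implies it on most targets.
llc -O0 -fast-isel=true input.ll -o input.s
```

Embedders can get the same effect through the C++ API; as far as I know the switch lives in TargetOptions::EnableFastISel.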

~~~
BubRoss
How do you turn that option on? (A Google search actually returns your comment
as the first result and some more links to JIT PDFs)

------
ahaferburg
I would be curious what the performance looks like on bigger, more realistic
source files. And what happens if you disable any optimizations, will that
influence the obj generation? What about link times?

The post made me look into string interning for my compiler. I wasn't
convinced it would be that useful: I thought most unsuccessful string
comparisons are fast anyway, because I store the length of each token, and
with a hash map you still have to do one comparison for every lookup, plus
compute the hash. But the hash greatly increases the odds that that one
comparison is the right one, and once you've interned the strings you don't
need to look them up at all anymore.
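The scheme boils down to something like this (a minimal Python sketch, not my actual implementation): each distinct string gets a small integer id, so later identifier and keyword checks are integer compares instead of string compares.

```python
class Interner:
    """Map each distinct string to a small integer id. Interning pays the
    hash + compare cost once; all later equality checks are id == id."""

    def __init__(self):
        self._ids = {}        # string -> id
        self._strings = []    # id -> string

    def intern(self, s):
        i = self._ids.get(s)
        if i is None:
            i = len(self._strings)
            self._ids[s] = i
            self._strings.append(s)
        return i

    def text(self, i):
        return self._strings[i]

interner = Interner()
KW_IF = interner.intern("if")         # keywords pre-interned at startup,
KW_WHILE = interner.intern("while")   # so "is this a keyword?" is an
tok = interner.intern("if")           # integer compare: tok == KW_IF
```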

I (very sloppily) implemented a hash map, and integrated it into the lexer.
Despite the poor implementation, and having to build the map in the lexer, it
does speed up the check whether an identifier is a keyword, and reduced the
parse time to about 70%. I get similar gains for code generation, because it
speeds up the symbol lookup, but it's probably going to be less useful here,
since I still have terrible O(n) lookup for globals. The absolute gains are
still worth it, though.

So yeah. Thanks for encouraging me to look into it!

~~~
jondgoodwin
Hey - I am thrilled to hear about your positive experience with interning
strings. I never actually did a performance test, so I am delighted to hear
your gains were substantial.

As for your questions about performance on bigger source files and twiddling
optimization options, I too am curious about that. I will likely revisit those
questions at some point in the future. It will be easier to do once I have
baked these diagnostics into the compiler.

------
fooker
Also see Google's Subzero and Apple's B3.

