

Source Code Optimization (2009) [pdf] - dodders
http://www.linux-kongress.org/2009/slides/compiler_survey_felix_von_leitner.pdf

======
nkurz
I like that this article shows the generated assembly for the same source with
different compilers, and the same compiler with different optimization levels.
Even among C programmers, there's a discouraging trend of treating the
compiler's decisions as unimpeachable. Occasionally the compiler does a great
job, but if you make a habit of looking at the generated code, you will
quickly realize it is often anything but ideal.

What this article implies, but doesn't really concentrate on, is that while
one can write portable code that will produce correct results when compiled
with different compilers, it's very difficult to write code that will have
portable performance. A code construct that produces a 100% speedup with one
compiler might just as well yield a 50% slowdown on another.

Even in C, optimizing code is largely an act of faith. One way around this,
which I'm surprised isn't used more often, is to "lock the code down" to
assembly after it is written. This doesn't mean you have to actually write
anything in assembly. Rather, once you finally manage to get a compiler to
produce the level of performance you want for the function that most impacts
performance, you extract the generated assembly and paste it into a file,
keeping the C code around for portability to different processors. Given the
stable x86-64 ABI, you can usually even mix and match between compilers ---
try several, and copy the code generated by the one that does best.
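
To make this concrete, here's a minimal sketch of what such a layout might
look like. The function, the file names, and the USE_LOCKED_ASM build flag
are all made up for illustration; hot_locked.s would be a saved snapshot of
compiler output (e.g. from gcc -O3 -S hot.c) that the build assembles
directly on x86-64 instead of compiling this definition.

    /* hot.c -- portable reference version of the hot function. */
    #include <stdint.h>

    #if !(defined(__x86_64__) && defined(USE_LOCKED_ASM))
    /* Compiled on every target except x86-64 builds that link the
     * locked assembly snapshot (hot_locked.s) instead. */
    uint32_t hot_popcount(uint64_t x)
    {
        uint32_t n = 0;
        while (x) {
            x &= x - 1;   /* clear the lowest set bit */
            n++;
        }
        return n;
    }
    #endif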

While this obviously won't work across different instruction sets, it works
quite well across different generations of processors. CPU makers are rightly
concerned with making their new chips faster than the old, and thus do a
fairly good job of avoiding performance regressions. While there might be a
faster method for the new chip, it's not common to find that the new chip
executes identical code substantially slower. I'd usually rather bet on this
than assume that future generations of the compiler will keep the same magic
combination of instructions for the old chip while simultaneously generating
new magic for the new chip.

~~~
sitkack
Similarly, one can use the compiler as a CASE [0] tool: write in an HLL, use
the compiler and optimizer to generate assembly, and then hand-modify that
assembly. There are many instructions the compiler won't emit that can lead
to speedups.

I used the above technique for some bit-rotation code years ago. I couldn't
figure out how to express it in C, so I wrote the function signature, dumped
the whole thing to assembly using the compiler, and replaced the function
body with my own routine, which was trivial to write in assembler.
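
For what it's worth, assuming 32-bit rotation was the goal, the now-standard
C idiom looks like the sketch below. Current gcc and clang recognize it and
emit a single rotate instruction, though compilers of that era often didn't,
which is exactly why the assembly detour was needed.

    #include <stdint.h>

    /* Rotate x left by n bits.  Writing the right shift as (-n & 31)
     * keeps both shift counts in [0, 31], avoiding the undefined
     * behavior that x >> (32 - n) would hit when n == 0. */
    static inline uint32_t rotl32(uint32_t x, unsigned n)
    {
        n &= 31;
        return (x << n) | (x >> (-n & 31));
    }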

It would be interesting to have a tool for this that could be invoked as part
of an optimizing build step:

  * given a set of compilers (gcc, clang, icc, open64),
  * project the source to assembler at various optimization levels,
  * benchmark all of them using supplied data,
  * give a side-by-side view of the fastest assembly and the HLL code,
  * let the programmer optimize the assembly, then benchmark against
    historical runs, testing for correctness and speed.

The tool would be a combination of text editor, benchmark-history database,
and compiler driver. This feedback loop could even be broadcast on a website
for lots of organic agents to optimize... ;)
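
As a toy sketch of just the benchmarking piece, assuming two candidate
implementations stand in for the output of two different compiler/flag
combinations (a real tool would compile and load the candidates
automatically):

    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    /* Stand-in candidates: imagine each came from a different compiler
     * or optimization level. */
    static uint32_t rot_branchy(uint32_t x, unsigned n)
    {
        n &= 31;
        return n ? (x << n) | (x >> (32 - n)) : x;
    }

    static uint32_t rot_idiom(uint32_t x, unsigned n)
    {
        n &= 31;
        return (x << n) | (x >> (-n & 31));
    }

    /* Time one candidate over a fixed workload. */
    static double bench(uint32_t (*fn)(uint32_t, unsigned))
    {
        struct timespec t0, t1;
        volatile uint32_t sink = 0;   /* keeps the loop from being elided */
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (uint32_t i = 0; i < 100000000u; i++)
            sink ^= fn(i, i & 31);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    }

    int main(void)
    {
        printf("branchy rotate: %.3fs\n", bench(rot_branchy));
        printf("idiom rotate:   %.3fs\n", bench(rot_idiom));
        return 0;
    }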

[0] http://en.wikipedia.org/wiki/Computer-aided_software_engineering

