
On the Madness of Optimizing Compilers - augb
http://prog21.dadgum.com/217.html
======
chadaustin
Speaking of madness, James Hague drives me a little crazy. I get it - people
tend to focus too much on performance. But it seems like his blog is a series
of repeated arguments that nobody needs maximum performance out of their
machine. Try compiling your web browser or operating system with -O0 and then
tell me the performance is satisfactory. Or your video encoder. Or VMware.
There is a huge market for high-performance software where performance comes
from more than a few inner loops, and optimizing compilers, while complicated
and huge and occasionally buggy, are necessary for the user experience we
desire.

~~~
vmorgulis
> But it seems like his blog is a series of repeated arguments that nobody
> needs maximum performance out of their machine.

Not false. He does sometimes seem contradictory.

[http://www.dadgum.com/james/performance.html](http://www.dadgum.com/james/performance.html)

~~~
junke
I don't understand what is contradictory here.

> The golden rule of programming has always been that clarity and correctness
> matter much more than the utmost speed. Very few people will argue with
> that. And yet do we really believe it? If we did, then 99% of all programs
> would be written in something like Python. Or Erlang. Even traditional
> disclaimers such as "except for video games, which need to stay close to the
> machine level" usually don't hold water any more.

He claims that you can _usually_ choose any language you want to develop your
software and that it will not make any visible difference. Systems are so fast
nowadays that optimizing for speed requires an effort that is probably not
justified.

~~~
w0utert
That may be true, but suggesting this means you could just as well write
everything in Python or some other 'simple language' just doesn't make any
sense. First, because they all introduce their own unique set of problems
(e.g. no type safety, much less static, compile-time checking, the need for a
runtime, etc). Second, because the way people typically use them promotes
programming styles that are inefficient in ways far beyond some constant-time
compiler optimizations (using a dict or list for everything, for example, or
having no concept of data locality and cache-friendliness). And third, because
it's perfectly possible to write C/C++ code that is efficient enough,
readable, and easy to maintain. I agree that most of the time the performance
of the most obvious implementation of some part of a program is more than
sufficient, but to me that just proves the problem is not with compiled
languages like C/C++: you can use 'readable, maintainable' as the default way
to implement things, and only drop back to obscure tricks if you really need
to (which is rarely).

I don't really get the point the article is trying to get across anyway. I
don't care about the complexity of the compiler at all, and 99.99999% of the
time I don't care about what the assembly it spews out looks like. I'm just
using the compiler as a tool, I don't need or want to know how difficult it
was to get it to compile my code. The only thing I care about is that it
produces valid code, and that the code gets a speedup when I compile with
optimization enabled. That's about the extent of my interest in the compiler,
and modern compilers have served me very well over the years. Why complain
that they need to be complex to produce faster code?

~~~
vmorgulis
> Why complain that they need to be complex to produce faster code?

Maybe because the resources could be invested in something else (like Erlang).

I share similar views but for another subject (web browsers).

------
MichaelBurge
If I'm writing code that actually needs to be optimized, then I usually put it
into its own object file and tweak the compiler flags as necessary to get the
assembly to look the way I want. The C compiler supporting this saves me time.

For any language other than C (Haskell, Java, C++, C#, Go, etc.), I don't
actually care that much about compiler performance as long as it does the
obvious optimizations. Generally you can look at performance at a higher
semantic level as soon as your code starts implying calls to malloc() - and
conversely, you can start looking at the assembly as soon as you've removed
all calls to malloc() in the offending section.

The other thing that transcends compilers or languages are the actual bits
that get stored in memory or on disk. Even if I'm writing Perl, it makes a
difference if I'm packing something into a contiguous array or just packing a
bunch of pointers or references to all over my heap. Or if I can shave a
couple bytes off every record in a hundred million records.

Javascript might be the one exception, since people using it don't have any
other choice. I don't know - is there such a thing as an FFI for Javascript,
using a browser extension or something?

~~~
dclowd9901
Asm.js lets you write a strictly typed subset of JS. Most prototypes of it
are compiled C (compiled to JavaScript, yes) running 3D games. The asm.js
annotations hint to the JIT so that it can optimize around the loose typing
of JS.

------
sanxiyn
> I've seen arguments that some people desperately need every last bit of
> performance, and even a few cycles inside a loop is the difference between a
> viable product and failure.

This is a strawman. The usual argument is that 1% better performance is, well,
1% better. That's a whole server if you have 100 servers. People are not
desperate, but do want an additional server for free.

> Assuming that's true, then they should be crafting assembly code by hand.

This, and "a few cycles inside a loop", is another strawman. What usually
happens is that a good compiler saves a bit everywhere, which helps when the
program has been optimized to a flat profile with no hotspots. Since the
savings are everywhere, "crafting assembly" amounts to lots and lots of
assembly, not just a little in strategic places.

~~~
dkopi
And a good compiler doesn't only optimize for speed. It also optimizes for
code size and memory usage.

I do however agree with the underlying premise of this post: code that works
100% of the time is better than higher-performance code that explodes in your
face every once in a while.

~~~
0x07c0
> And a good compiler doesn't only optimize for speed. It also optimizes for
> code size, memory usage.

No, they don't. Code size is rarely a problem (embedded probably cares).
Optimization will in most cases lead to larger code: loop unrolling, inline
expansion, etc. In C the compiler can't, in most cases, do anything about
memory usage. (It could probably pack structs, but that is not something the
compiler would do, I think; you would break the ABI with non-optimized code.)
In managed languages, memory usage probably depends more on the runtime
system's GC than on the compiler.

GCC does have a flag for code size optimization, -Os:
[https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html](https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html)

~~~
dkopi
From that exact link: "Turning on optimization flags makes the compiler
attempt to improve the performance _and /or code size_ at the expense of
compilation time and possibly the ability to debug the program."

"With -O, the compiler tries to reduce code size _and_ execution time"

So actually yes, the compiler optimizes for both size _AND_ speed. Sometimes
you can optimize for maximum speed (-O3), and sometimes for minimum code size
(-Os). You can even optimize for the best debugging experience or the fastest
compilation, and forget about size/speed.

As for memory usage -

1\. First of all - code uses up memory. Smaller programs = less memory used
for holding the executable code.

2\. Consider the various ways you could compile a switch(-case) statement.
You could build it as a set of conditional branches one after the other, you
could compile it into a binary tree search (or a hash dictionary lookup), or
you could jump to an address found in a lookup table.

The last method is also called a "branch table".
[https://en.wikipedia.org/wiki/Branch_table](https://en.wikipedia.org/wiki/Branch_table)

A branch table would be the fastest method for implementing a switch with a
few sequential values (0,1,2,3,4,5), but would be a really bad idea for
sparse values (0,200,5000,10000,200000).

Choosing not to use a branch table for sparse values _is_ an optimization for
memory usage. And compilers make this decision all the time.

------
yxhuvud
> The straightforward "every program all the time" compiler is likely within
> 2-3x of the fully optimized version (for most things)

Citation needed. Pulling numbers out of your ass doesn't an argument make.

~~~
0x07c0
That number may have been true pre-2000, but with the move to parallel
architectures it is grossly wrong. Optimization is not about saving a few
cycles here and there. It is about using the whole machine: multiple chips,
all with multiple cores. These cores are vector cores (AVX/SSE, AltiVec,
etc.). The chips are on different NUMA domains, the cores have pipelines,
caches, etc., that have to be exploited... Compilers really don't do much
about this. This is now the job of the programmer. The speedup for code that
can exploit this is at least 20-30x, which adds up to a lot of compute time.
(Add multiple accelerators (GPGPUs, MIC), and you can have a 200-300x speedup
on one node.)

------
rlpb
"If you get any of this wrong, even in an edge case, the user is presented
with non-working code for a valid program, and the compiler writer has failed
at his or her one task."

Perhaps languages could be designed so as to make this task easier?

In language design we end up torn between making things easier for the
programmer and making things easier for the compiler-designed-for-performance.
Doing both things well is perhaps something that language design will be able
to improve upon in the future.

~~~
dkopi
Languages and compilers actually do have constructs for programmers to make
things easier for the compiler.

"inline" and "register" are two very known examples. Other examples are
function attributes in gcc: [https://gcc.gnu.org/onlinedocs/gcc/Function-
Attributes.html](https://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html)

~~~
nkurz
I presume you're aware that you've chosen two controversial examples?
"register" has been formally deprecated in C++, and has been ignored by
(almost?) all C compilers for decades:
[http://stackoverflow.com/questions/20618008/replacement-for-...](http://stackoverflow.com/questions/20618008/replacement-for-deprecated-register-keyword-c-11)

And most current wisdom is that "inline" no longer serves as a useful
performance suggestion to the compiler, but instead should only be used as a
linkage directive to allow multiple declarations of the same function:
[http://stackoverflow.com/questions/29796264/is-there-still-a...](http://stackoverflow.com/questions/29796264/is-there-still-a-use-for-inline)

~~~
dkopi
Great points. As compilers evolve, optimizers know even better than the
programmer what should be put in a register or inlined.

But those are still two well-known features that were added intentionally to
help improve performance.

------
legulere
So this article says that optimizing compilers are unreliable? I don't think
this is the case.

Compiler bugs are pretty rare; a bigger problem is insane language standards
that have undefined behaviour lurking around every corner.

~~~
randomsearch
Compiler bugs are not rare, quite the opposite, and this is often because of
premature optimisation.

For example, GPGPU compilers are awful:

[http://www.doc.ic.ac.uk/~afd/homepages/papers/pdfs/2015/PLDI...](http://www.doc.ic.ac.uk/~afd/homepages/papers/pdfs/2015/PLDI_Fuzzing.pdf)

~~~
DannyBee
It's more that compiler vendors who compete mainly on the basis of worthless
benchmarks do awful things.

This is one of the reasons gcc/llvm won: they don't care about that stuff. I
mean, they care about performance, but not at all costs like these guys care.

At some point, people wanted compilers that did less stupid things. So they
moved to the vendors who did that.

The reason it's so awful in the GPGPU space is because none of these vendors
produce compilers yet. Instead, the vendors you have are those whose main goal
in life is to generate better performance numbers than the other guy.

Ask anyone about the old days of fortran compilers and early optimizing C
compilers and what have you.

What exists now is essentially "what happened after the market pushed back".
Sadly, it didn't go as far as some people want (in either direction).

~~~
loup-vaillant
> _This is one of the reasons gcc /llvm won: they don't care about that stuff.
> I mean, they care about performance, but not at all costs like these guys
> care._

Really?
[http://www.complang.tuwien.ac.at/kps2015/proceedings/KPS_201...](http://www.complang.tuwien.ac.at/kps2015/proceedings/KPS_2015_submission_29.pdf)
[https://news.ycombinator.com/item?id=11219874](https://news.ycombinator.com/item?id=11219874)

While GCC and LLVM do care about implementing the standard correctly, they
show little mercy to the poor developer who failed to avoid one of the many
sharp corners of undefined behaviour.

~~~
DannyBee
Yes. Really.

There was a long thread about this just a few days ago
([https://news.ycombinator.com/item?id=11219874](https://news.ycombinator.com/item?id=11219874))

I already commented there, and since I didn't always have pleasant things to
say, I'm not going to repeat them here.

------
riscy
This is a meaningless article. A compiler that chooses to perform an
optimization (transformation) which does not preserve the defined semantics of
the language has a bug, by the definition of what a compiler is. What is the
argument?

~~~
jfoutz
Something like -Ofast is well defined. The thing is, it defines another
language closely related to the source language but with some extra
constraints. So you have C, and also C'. It may be the case that all the C in
your program happens to be valid C', but probably not. If you're not acutely
aware of the rules of C', you won't get what you want. And some poor hapless
C programmer may find themselves working on a C' project and not know it, or
the expert is suddenly a beginner again, because they need to learn the extra
rules of the game.

It's all well defined. It's a social problem. It's something to be aware of.

~~~
riscy
> If you're not acutely aware of the rules of C', you won't get what you want.

The same can be said of any language that lacks a rigorous definition;
misusing -Ofast is just an example of one. Even then, the definition can be
too large or tricky for anybody to truly understand, but that's a bit of a
grey area... at some point you have to draw the line and say the programmer
made a mistake. The only sensible thing to do is draw the line at the
definition of the language, otherwise there is chaos.

------
sharpneli
I don't understand the point of the article. Using optimizations in a
compiler is a fully voluntary act. You can just compile everything with -O0
and be done with it.

Some of us actually do need them. Soft realtime programs (games, for example)
are a perfect example where they are crucial. Reaching the same level of
performance with a non-optimizing compiler would balloon development costs
multiple times over.

I would understand the complaint if compilers messed things up and you
couldn't get around it.

------
gambiting
"I've seen arguments that some people desperately need every last bit of
performance, and even a few cycles inside a loop is the difference between a
viable product and failure. Assuming that's true, then they should be crafting
assembly code by hand, or they should be writing a custom code generator with
domain-specific knowledge built-in."

Well, I work in the games industry, and that "last bit of performance" is
quite frequently the difference between having a 60fps game and something
that doesn't quite hit the vsync and looks a lot worse. Seriously, this is
one of the very few industries where I can say with a straight face "don't do
context switching because the 5 microsecond delay is too much" and "an L2
cache miss costs us 50 cycles, and it's a major problem here".

------
Rexxar
I think the problem is not too much optimization but the detection of
undefined behaviors/compiler bugs. There should be an "optimized debug" mode
that applies all requested optimizations but adds asserts to check every
assumption made by the compiler. That way we could run the "optimized debug"
binary and check that it doesn't trigger any assert.

------
dalailambda
While I understand the argument the author is making, I think it's important
to consider that even if a mild focus on performance only benefits your code
a little, applied across your entire stack the benefits add up and become
worthwhile.

------
lmm
But why would people who don't need that last 1% of performance be using C at
all in this day and age? The article talks about crafting assembly by hand,
but C is often spoken of as a cross-platform assembler.

------
mpweiher
I blame:

(a) free compilers

(b) the C standard

Crazy talk, surely! Let me explain: each of these is a Good Thing™, but
together they have put us in our current predicament. It used to be that
compilers were reasonably expensive products, so as a vendor you had a clear
incentive not to piss off your customers. In addition (in the very olden
days), there as no C standard so the definition of "correct" was very
customer-based: if customer code compiled well, you were doing a good job, if
not, you would probably suffer.

With compilers becoming free, the incentive structure changed, customers now
matter very little. Instead, you get ahead in that pecking order by improving
some SPEC benchmark by 0.2%. At the same time, the C standard, which was
worded very inclusively so even compilers for odd architectures could be
ANSI/ISO compliant, was used as a substitute for "my customers' code works",
and people started to say silly things like "if you have undefined behaviour,
demons can fly out your nose".

Well: I had a pre-ANSI compiler. ALL its behaviour was "undefined". And yet it
took fewer liberties with the code than many of today's compilers do. Because
the creators (a) were reasonable people and (b) had real customers.

