The paper shows that some of these perceived inefficiencies can be resolved by manual changes to a few lines of code, such as eliminating redundancies. However, it then demonstrates that such manual changes can sometimes make performance worse, because after the change a particular compiler may no longer be able to inline or vectorize.
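To make that concrete, here's a hypothetical sketch (mine, not from the paper) of how a "cleanup" can backfire: deduplicating code into a helper looks harmless, but the caller's loop now vectorizes only if the compiler decides to inline the helper, and compilers differ on that.

    /* Hypothetical example, not from the paper: factoring a duplicated
       clamp out into a helper is a reasonable cleanup, but the loop
       below vectorizes only if clamp01() gets inlined. */
    static float clamp01(float x) {
        return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
    }

    void normalize(float *v, int n) {
        for (int i = 0; i < n; i++)
            v[i] = clamp01(v[i]);   /* depends on inlining to vectorize */
    }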
The paper concludes the optimization study by stating, "no compiler is better or worse than another production compiler in all cases." (The study was on GCC, LLVM, and ICC.)
However, in my experience icc crashed more than any other compiler I've ever used (internal compiler errors). That kind of bug in a compiler makes me really wonder what else is wrong under the hood. And of course, being closed source, I can't possibly know...
It's been ~17 years since then, so I can't recall the exact options, but seeing different results emerge at a higher optimization level caused an unanticipated puff of smoke from my little brain.
A possible moral of the story -- test every little change.
Then I went on to other things and realized that developers are experts in their domains, and those "obvious" things were only clear to someone steeped in the code, not obvious at all. In fact, some of them are no longer obvious to me after decades of doing other things. Hopefully I've become at least slightly less arrogant and thoughtless by now.
Yes, this can trip up even quite knowledgeable people, as with the FP post from Lemire the other day: he thought gcc was making a mistake, but it was doing the correct thing.
But there are plenty of others, for example telling the compiler not to worry about aliasing. I might use this on certain modules or individual files that were carefully written with that in mind.
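The C99 restrict qualifier is one way to make that promise function by function; a rough illustrative sketch, not the actual code I had in mind:

    /* Illustrative only: restrict promises the compiler that dst and src
       never overlap, so it can vectorize without emitting runtime
       overlap checks.  Breaking the promise is undefined behaviour. */
    void saxpy(float *restrict dst, const float *restrict src,
               float a, int n) {
        for (int i = 0; i < n; i++)
            dst[i] += a * src[i];
    }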
Or asking for more aggressive loop unrolling, which could increase code size but improve performance (or not!).
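Roughly like this (the GCC pragma needs GCC 8 or later, and the unroll factor of 8 is just a guess to be measured, not a recommendation):

    /* Sketch of asking for unrolling on one hot loop rather than
       globally via -funroll-loops.  Whether it helps is target-dependent. */
    float dot(const float *a, const float *b, int n) {
        float s = 0.0f;
    #pragma GCC unroll 8   /* clang: #pragma clang loop unroll_count(8) */
        for (int i = 0; i < n; i++)
            s += a[i] * b[i];
        return s;
    }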
99.9% of the time the defaults are fine.
> The accuracy of the floating-point operations (+, -, *, /) and of the library functions in <math.h> and <complex.h> that return floating-point results is implementation-defined, as is the accuracy of the conversion between floating-point internal representations and string representations performed by the library functions in <stdio.h>, <stdlib.h>, and <wchar.h>. The implementation may state that the accuracy is unknown.
There was an article recently talking about the bizarre rounding that happens when the x87 FPU registers get involved. You get one rounding for operations that stay in a register (which is 80 bits wide) and another if a value gets kicked out of the register and into RAM (where a double is only 64 bits wide).
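If I understand the effect right, it only shows up when you actually build for the x87 (e.g. gcc -m32 -mfpmath=387); a rough demo of keeping a value in the extended-precision register vs. forcing it out to a 64-bit double:

    /* Sketch of the register-vs-memory rounding difference.  The volatile
       variables keep the compiler from folding everything at compile time.
       Built for the x87, the two printed values typically differ, because
       the volatile store rounds the 80-bit intermediate down to a 64-bit
       double; built with SSE math (the x86-64 default) they match. */
    #include <stdio.h>

    int main(void) {
        volatile double a = 1e16, b = 2.9999;
        double in_reg = a + b - a;        /* may be evaluated in 80 bits throughout */
        volatile double spilled = a + b;  /* store forces rounding to 64 bits */
        double via_mem = spilled - a;
        printf("%.4f vs %.4f\n", in_reg, via_mem);
        return 0;
    }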
The paper isn't concentrating on what are probably the two most important performance governors these days, vectorization and fitting the memory hierarchy, each of which can make a factor-of-several difference. ("Your mileage may vary", of course.) Then it isn't obvious whether comparing different generations of compilers was even sufficiently controlled; their default tuning may target different micro-architectures, whichever were considered most relevant at the time. GCC performance regressions are monitored by maintainers, but I wonder how many real optimization bugs uncovered in a maintained version have been reported as such from this sort of work.
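For the memory-hierarchy side, the standard illustration is cache blocking; a sketch (the tile size of 32 is just a placeholder to be tuned per cache):

    /* Illustrative cache blocking: walk the matrix in TILE x TILE tiles
       that fit in cache instead of striding across whole rows.  TILE = 32
       is an assumption; the right size depends on cache and element size. */
    #define TILE 32
    void transpose_blocked(double *dst, const double *src, int n) {
        for (int ii = 0; ii < n; ii += TILE)
            for (int jj = 0; jj < n; jj += TILE)
                for (int i = ii; i < ii + TILE && i < n; i++)
                    for (int j = jj; j < jj + TILE && j < n; j++)
                        dst[j * n + i] = src[i * n + j];
    }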
In large-scale scientific code, such compiler optimizations may be relatively unimportant anyway compared with time spent in libraries of various sorts: numerical (like BLAS, FFT), MPI communication, and filesystem I/O. Or not: you must measure and understand the measurements, which is what users need to be told over and over.
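Even a crude wall-clock timer around the suspect region beats guessing; a minimal sketch assuming POSIX clock_gettime (the loop is just a stand-in for the real work):

    /* Minimal measurement sketch: time the region you care about and
       check it against what the profiler reports. */
    #include <stdio.h>
    #include <time.h>

    static double now_sec(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int main(void) {
        double t0 = now_sec();
        volatile double s = 0.0;
        for (long i = 0; i < 100000000L; i++)  /* stand-in for the real work */
            s += i * 1e-9;
        printf("elapsed: %.3f s\n", now_sec() - t0);
        return 0;
    }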
CCTLib, a framework for fine-grained monitoring and execution-wide call-path collection: https://github.com/CCTLib
Some call it a "snowclone". (Previous comment: https://news.ycombinator.com/item?id=19072271)
It's a rhetorical device from headline writing that's been with us since before computers, e.g. in newspaper headlines.