

How to trick C/C++ compilers into generating terrible code? - l0stman
http://www.futurechips.org/tips-for-power-coders/how-to-trick-cc-compilers-into-generating-terrible-code.html

======
caf
These issues are why various compiler hints exist.

In the dead code example, the gcc function attribute 'const' can be applied to
the declaration of bar(), telling the compiler that it is a pure function
whose result depends on nothing but its arguments.
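
A minimal sketch of what that declaration could look like, assuming GCC or Clang. The body of bar() and the caller sum_scaled() are hypothetical stand-ins, not the article's code:

```c
/* GCC/Clang extension: the result depends only on the argument values;
   the function reads no memory and has no side effects. */
__attribute__((const)) int bar(int x);

/* Hypothetical caller: bar(k) does not change between iterations. */
int sum_scaled(const int *a, int n, int k)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        /* With the attribute, the compiler may evaluate bar(k) once,
           outside the loop, even when bar's body lives in another
           translation unit. */
        total += a[i] * bar(k);
    }
    return total;
}

/* Defined here only to keep the sketch self-contained. */
int bar(int x)
{
    return x * x + 1;
}
```

Whether the call is actually hoisted is best confirmed by reading the generated assembly.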

In the pointer example, the C99 standard 'restrict' qualifier can be applied
to a, b and c to tell the compiler that the values pointed to by these
variables do not overlap.
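
As a rough illustration, assuming the pointer example is an element-wise loop over three arrays (the function name add_arrays and its signature are made up here):

```c
#include <stddef.h>

/* restrict is a promise by the caller that, inside this function, the
   memory written through a is not also reached through b or c.
   Breaking that promise is undefined behavior. */
void add_arrays(int *restrict a, const int *restrict b,
                const int *restrict c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Without restrict, a store to a[i] might change b[i+1] or
           c[i+1], which limits reordering and vectorization. */
        a[i] = b[i] + c[i];
    }
}
```

The qualifier is a promise, not a check: calling add_arrays(x, x, y, n) compiles fine but breaks the contract.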

'restrict' will also help the global variable example - the reason that N is
loaded each time around the loop is that, as far as the compiler knows, one
of the a[i]s could alias with N.
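
A sketch of the global variable case under the same assumption (the names N and fill, and the value 4, are hypothetical):

```c
int N = 4; /* hypothetical global loop bound */

void fill(int *restrict a, int value)
{
    /* Because a is restrict-qualified, anything modified through a
       may not also be accessed by another name in this function.
       A store to a[i] therefore cannot legally change N, so the
       compiler is allowed to keep N in a register. */
    for (int i = 0; i < N; i++)
        a[i] = value;
}
```

This only grants permission; whether a particular compiler uses it is worth checking in the assembly output.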

~~~
ajross
Mild quibble: the attribute you want is "pure", not "const". The distinction
is that a const function inspects nothing but its arguments, but a pure
function is allowed to read (but not write) external memory. Both are without
side effects and can be optimized out of loops, but pure is looser. Every
const function is also pure, but not all pure functions can be const.
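
To make the distinction concrete, here is a small sketch using the GCC attribute syntax. All three functions and the global scale are invented for illustration:

```c
#include <stddef.h>

static int scale = 3; /* hypothetical global state */

/* const: looks at nothing but the argument values. */
__attribute__((const)) int square(int x)
{
    return x * x;
}

/* pure: reads a global, writes nothing. */
__attribute__((pure)) int scaled(int x)
{
    return x * scale;
}

/* pure, but not const: the result depends on the memory s points to,
   not just on the pointer value. */
__attribute__((pure)) size_t count_char(const char *s, char c)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (*s == c)
            n++;
    return n;
}
```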

~~~
caf
Yes, that's why I said _"...whose result depends on nothing but its
arguments."_ - the example bar() function in the original article does not
read global memory, so it can be declared __attribute__((const)).

~~~
ajross
Sure. I was just pointing out that that is a stricter constraint than required
for the optimization in question. Doing loop hoisting and CSE wants "pure"
functions, because pure "means" a function without side effects.

The "meaning" of const is that the function depends on nothing but its
arguments, and can therefore have its value computed at compile time, or be
part of a global CSE pass. That's a different optimization.

~~~
caf
It seems to me that loop hoisting would also be easier with ((const)), because
to do so with a ((pure)) function requires further assessing that the loop
does not modify any global state that might be visible to the ((pure))
function. A ((const)) function can be hoisted out of a loop even if the loop
modifies globals, or values through pointers that might point at globals.
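
A sketch of that difference, with made-up names (counter, twice, plus_counter) and a loop that deliberately writes a global:

```c
int counter = 0; /* hypothetical global modified inside the loops */

/* const: the result cannot depend on counter. */
__attribute__((const)) int twice(int x)
{
    return 2 * x;
}

/* pure: the result may depend on counter. */
__attribute__((pure)) int plus_counter(int x)
{
    return x + counter;
}

int run_const(int n, int k)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        counter++;
        total += twice(k); /* may be hoisted despite the store above */
    }
    return total;
}

int run_pure(int n, int k)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        counter++;
        /* Cannot be hoisted: counter changes on every iteration and
           plus_counter is allowed to read it. */
        total += plus_counter(k);
    }
    return total;
}
```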

------
notJim
I'm glad to see pointer aliasing mentioned here. Back in the day, I was
writing fluid mechanics simulations in FORTRAN. Writing in FORTRAN is awful,
of course, so I did some research into why FORTRAN is considered to be faster
than C++ for these simulations. Of course, a lot of it is history, and the
fact that the people writing the simulations are engineers first and
programmers a distant second. But another thing that seemed to come up was
pointer aliasing: because it is absent in FORTRAN (or at least made more
explicit), FORTRAN compilers were able to implement some important
optimizations that C++ compilers couldn't. I wanted to experiment a little bit
with the C99 restrict keyword, to see if it would produce similar results to
FORTRAN, but I never really got around to it.

------
eliasmacpherson
Although it's a poorly titled article, it was interesting to read. Surely the
objective is to trick the compiler into generating the best code. I was
surprised that the vectorisation it mentions was not performed automatically.

One other way to get terrible code would be to target, via flags, different
hardware than the code actually runs on, or to use an AMD CPU with the Intel
compiler mentioned in the article. There was a very short discussion about
this on reddit yesterday:
[http://www.reddit.com/r/programming/comments/lj1ze/ask_rprog...](http://www.reddit.com/r/programming/comments/lj1ze/ask_rprogramming_do_intel_compilers_still_screw/)

~~~
JoeAltmaier
Depends entirely upon the compiler. Don't worry about this stuff in general
unless you're doing embedded work, where the compilers are often
problematic. Any desktop compiler will know more about generating code than
you do.

~~~
sliverstorm
Embedded compilers are perfectly intelligent; I would hazard that you just
wind up doing weird things more often, and/or you care more what exactly it
does with this or that function because of your 32kHz clock and/or 1KB of
program memory.

~~~
tryp
Many embedded compilers and assemblers are terribly buggy. Having worked with
dozens, my impression is that the compilers targeting 8-bit microcontrollers
are generally of similar quality to 80's x86 compilers. Emission of incorrect
assembly given correct code is rare, but optimizations are incredibly weak and
the compilers segfault relatively often.

Some of the DSP tool suites have solid compilers that optimize insightfully
for their target architecture but whose in-circuit debuggers are tied to flaky
IDEs. I'm currently working with an XDP debugger on a Sandy Bridge board that
requires the debugger software to be restarted nearly hourly, often corrupting
the project file requiring me to enter the memory map again.

Lately I've been thrilled to spec in ARM micros because I can just use GCC, an
el-cheapo universal USB JTAG adapter through OpenOCD, and expect everything to
work.

------
dhruvbird
Interesting point about ICC generating faster code and the link order making a
difference in the running time of the application.

~~~
larsberg
The biggest thing you'll notice in practice from ICC is that it is much more
likely to unroll a loop and transform the body to use SSE instructions when
requested. If you have dense numeric code and aren't either using ICC or hand-
unrolling and using GCC intrinsics, you're probably leaving performance on the
floor.

But, there's still no free lunch. For example, as of about two months ago, ICC
will unroll loops whose increment is "i++" but will _not_ unroll loops whose
increment is "i+=1". Some insight, looking at output assembly, etc. is still
required.

~~~
dhruvbird
Interesting. Another thing I noticed is that code runs faster if floating
point and integer arithmetic instructions are interleaved rather than
"blocked" together.

------
DarkShikari
_When I declared N as a global variable, the compiler left it in memory and
added a load instruction to the loop. When I put N as a local variable, the
compiler loaded it into a register. While I do blame the compiler for this
particular behavior (because I did not declare N as volatile), we have to work
with what we have._

This is because a global variable can be modified at runtime unless it's
const. Without sufficiently powerful link-time optimization, the compiler
cannot rule out that some other file declares it extern and modifies it.
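
One common workaround, sketched here with hypothetical names (N, scale_all), is to read the global into a local once, which is what the article's local-variable version effectively does:

```c
int N = 4; /* external linkage: another file could declare it extern
              and modify it */

void scale_all(int *a, int factor)
{
    /* Snapshot the global once. The local n cannot be changed by
       stores through a or by other translation units, so it can
       stay in a register for the whole loop. */
    const int n = N;
    for (int i = 0; i < n; i++)
        a[i] *= factor;
}
```

Note that this changes behavior if something really does modify N during the loop, so it is only valid when the bound is meant to be fixed. Declaring N as const or static are other options, assuming nothing else needs to write it.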

------
b0b0b0b
These issues are also why SQL hints exist.

