On the other hand, if I have a function that works on parameters in a specified range (say, 0.0 to 1.0) and returns results that are supposed to be correct to a specified accuracy, I would love to have the compiler do all the possible optimizations without my having to specify a compiler flag (doing so for, say, just a single function can be quite annoying!). Maybe approaches like Haskell's type inference will eventually produce significantly faster code because they can do things like this?
One example is interval arithmetic. http://en.wikipedia.org/wiki/Interval_arithmetic
None of them seem very mainstream: the predominant thinking is that floats are "the right way" to badly approximate real numbers, that works well enough except when you're working with pounds and pence, when "everyone" knows the right way is to use fixed point...
That's not going to help the above associativity problem, but it fits in the theme of "LAAAA LAAAA LAAAA let's gloss over the bits of floating point that don't follow the rules"
As already pointed out, for floating point data types the two versions could (and probably will) result in slightly different answers due to rounding issues.
In this particular case, the difference due to re-ordering of multiplications is likely to be relatively small, but it doesn't hurt to be aware of the potential problems.
Different operation orders giving different results comes down to rounding. At the end of each individual operation the intermediate result needs to be stored in a floating point data type, and at that point it is rounded to fit the limits of that type. Changing the order of the operations to optimise in the way being talked about changes the number of intermediate results that get rounded (and so changes the error at the end of the process), and some numbers "suffer" more from the rounding than others (0.5 can be represented exactly in binary, 0.6 can not). So sometimes the difference between the values you get from operations that are mathematically equivalent (until you consider the accuracy of intermediate results) will be much larger than at other times.
Of course if all compilers optimised a*a*a*a*a*a to (a*a*a)^2 then it would be OK: for the same data type, all code would give the same result. But since not all code does do this optimisation, for consistency it is safer for none to do it unless the programmer explicitly codes it that way (and thereby takes responsibility for the fact that the "optimal" version may give different results compared to code using the unaltered version, rather than the compiler deciding that this is OK).
The same is true for any chained floating point operations. Changing a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a to a*20 (if one multiplication is faster than 19 additions) can give different results.
An optimisation that could change the result (even by the smallest possible amount) is not a safe optimisation. As a general rule, unless the build script tells it otherwise, a compiler won't perform an optimisation that could make the optimised version give a different result from the basic one.
Of course you may decide it is safe enough, in which case you might find your compiler has an option to tell it that you would like such optimisations to be considered. Stepping away from arithmetic for a moment, variable aliasing is another example of this. If your code could potentially update a value several ways through pointer jiggery-pokery, a compiler may decide certain register-based optimisations are not safe, and there are often directives you can give to tell the compiler that it can consider these optimisations because you are sure the code does nothing that would make them a problem (the compiler can't decide this for itself, as that would potentially take forever due to the halting problem).
As long as they follow the "as-if" rule, which states that any transformation done to code that does not invoke undefined behavior must produce the same final result as if the transformation were not done. Reordering floating point multiplications can change the result, so it does not satisfy this rule.
(Watching stuff in a debugger doesn't count for this rule)
2. The compiler is allowed to either leave the float inside the FPU or store it back to main memory after each operation, is it not? These give different results.
Somewhere around here I have a chart of the differences in floating point outputs of different GCC versions. And no, fastmath is not on.