
When GCC’s “-ffast-math” isn’t - jjuhl
http://programerror.com/2009/09/when-gccs-ffast-math-isnt/
======
amluto
-ffast-math does two separate things: it changes code generation, and it mucks with the floating-point environment at startup in a bid to make floating-point operations faster. On x86 systems with SSE2, this involves writing to the MXCSR register.

It would be interesting to separate these two effects. You can get the former
by /compiling/ with -ffast-math and the latter by /linking/ with -ffast-math.

BTW, to all the library writers out there: never ever link a shared library
with -ffast-math. See
[https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522)
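
A quick way to watch the link-time half in isolation is to read MXCSR back at
runtime. A minimal sketch (assuming x86-64 with SSE2; _mm_getcsr() is the
standard intrinsic):

        /* mxcsr_probe.c - print the FTZ/DAZ bits of MXCSR at startup.
         *
         * Build and link it two ways and compare:
         *   gcc -c mxcsr_probe.c && gcc mxcsr_probe.o && ./a.out
         *   gcc -c mxcsr_probe.c && gcc -ffast-math mxcsr_probe.o && ./a.out
         * The second link pulls in crtfastmath.o, whose startup code sets
         * both bits, even though the object itself was compiled without
         * -ffast-math.
         */
        #include <stdio.h>
        #include <xmmintrin.h>

        int main(void)
        {
            unsigned int csr = _mm_getcsr();
            printf("MXCSR = 0x%04x  FTZ=%u  DAZ=%u\n", csr,
                   (csr >> 15) & 1u,   /* bit 15: flush-to-zero */
                   (csr >> 6) & 1u);   /* bit 6: denormals-are-zero */
            return 0;
        }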

~~~
teraflop
Oh wow, that's a nasty bug. From the linked openjpeg issue:

        >>> s = '4.74303020008e-322'
        >>> f = float(s)
        >>> print(str(f))
        4.74303020008e-322
        >>> import osgeo
        >>> print(str(f))
        0.0

~~~
hn9780470248775
Well... The option _is_ called "unsafe-math-optimizations".

~~~
mikeash
I'd never expect such an option, no matter how "unsafe," to affect code
outside of what was built with it.

~~~
Sanddancer
The number used in that example is called a subnormal or denormalized number
[1]. When you get into really tiny ranges, the FPU compensates by shifting the
significand further and further to the right, at the cost of lost accuracy and
a lot of speed. The example given is only a few binary places from zero [2],
so it is indeed one of the smallest numbers an FPU can operate on.

One of the things that happens when you enable "unsafe" options is that flags
are set in the SSE control register (MXCSR) to treat subnormal numbers as
zero: flush-to-zero (FTZ) and denormals-are-zero (DAZ). Because subnormal
numbers slow things down considerably, making them not happen is one of the
unsafe ways to speed up code. Importing that library runs initialization code
that sets the control register to get the extra, precision-losing speed and,
as a consequence, turns all those tiny numbers into zero, hence the results
you see.

[1]
[https://en.wikipedia.org/wiki/Denormal_number](https://en.wikipedia.org/wiki/Denormal_number)

[2]
[http://www.binaryconvert.com/result_double.html?decimal=0520...](http://www.binaryconvert.com/result_double.html?decimal=052046055052051048051048050048048048056101045051050050)
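
You can reproduce the flush directly. A sketch (assuming x86-64, where doubles
go through SSE2; the intrinsics come from xmmintrin.h/pmmintrin.h):

        /* ftz_demo.c - flush a subnormal double to zero by hand.
         * Compile without -ffast-math, e.g.: gcc -O0 ftz_demo.c && ./a.out
         */
        #include <stdio.h>
        #include <xmmintrin.h>
        #include <pmmintrin.h>

        volatile double tiny = 4.74303020008e-322; /* subnormal, far below DBL_MIN */

        int main(void)
        {
            printf("before: %g\n", tiny * 1.0);  /* prints the subnormal */

            /* Set the same MXCSR bits that -ffast-math's startup code sets. */
            _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
            _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);

            printf("after:  %g\n", tiny * 1.0);  /* prints 0 */
            return 0;
        }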

~~~
mikeash
It makes sense given what's going on, but I think that flags which affect
global process behavior when you merely use them in a library ought to be
called out much more heavily. The default ought to be to either leave these
global flags alone, or save/restore them at the library entry points. If you
really want to change the whole global state, it ought to be described as
something beyond merely "unsafe" in my opinion.
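
Something like this at every exported entry point would do it. A sketch, with
do_fast_kernel() as a hypothetical library call that wants FTZ/DAZ internally
without leaking it to the caller:

        #include <xmmintrin.h>

        double do_fast_kernel(const double *x, int n)
        {
            unsigned int saved = _mm_getcsr();  /* save the caller's MXCSR */
            _mm_setcsr(saved | 0x8040);         /* FTZ (bit 15) + DAZ (bit 6) */

            double sum = 0.0;
            for (int i = 0; i < n; i++)
                sum += x[i];                    /* hot loop runs with FTZ/DAZ on */

            _mm_setcsr(saved);                  /* restore on the way out */
            return sum;
        }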

------
Joky
What would be more interesting is the analysis of what went wrong in the
optimizer with -ffast-math!

~~~
gradstudent
Yeah, I came to post the same thing. What /exactly/ is this code doing that
makes turning off optimisations faster? What does the assembly look like, with
-ffast-math and without? Can these differences be used to draw a hypothesis,
one that could be tested empirically?

Despite the title, which promises an article with some interesting technical
depth, I just don't feel like I _learned_ anything. On the whole, the piece
comes off as a glib anecdote: lazy and not very informative.

~~~
frozenport
Further, after six years they might have fixed the problem.

------
santaclaus
I've seen floating point 'optimizations' cause issues with iterative methods
that depend on the specifics of strict IEEE754 behavior (causing massive
spikes in the number of iterations and thus hurting overall performance).
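
The classic instance (not necessarily what I hit, but the easiest to show) is
Kahan compensated summation: under strict IEEE 754 the correction term
recovers the low-order bits lost in each add, but -ffast-math's reassociation
lets the compiler simplify (t - sum) - y to zero, silently turning it back
into naive summation:

        double kahan_sum(const double *x, int n)
        {
            double sum = 0.0, c = 0.0;  /* c carries the lost low-order bits */
            for (int i = 0; i < n; i++) {
                double y = x[i] - c;
                double t = sum + y;
                c = (t - sum) - y;      /* algebraically zero; numerically not */
                sum = t;
            }
            return sum;
        }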

------
caf
(2009)

------
personjerry
Aw, I read the title as "_WHY_ GCC’s “-ffast-math” isn’t" and was confused
by the missing explanation. Also, smart quotes in the title, ugh.

------
TwoBit
I'm pretty sure the VC++ version of fast math (/fp:fast) really is faster.

