When GCC’s “-ffast-math” isn’t (programerror.com)
40 points by jjuhl on July 29, 2015 | hide | past | favorite | 15 comments


-ffast-math does two types of things: it changes the code generation, and it mucks with the floating point settings at startup in a bid to make them faster. On x86 systems with SSE2, this involves writing to the MXCSR register.

It would be interesting to separate these two effects. You can get the former by /compiling/ with -ffast-math and the latter by /linking/ with -ffast-math.

BTW, to all the library writers out there: never ever link a shared library with -ffast-math. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522


Oh wow, that's a nasty bug. From the linked openjpeg issue:

    >>> s = '4.74303020008e-322'
    >>> f = float(s)
    >>> print(str(f))
    4.74303020008e-322
    >>> import osgeo
    >>> print(str(f))
    0.0


Well... The option is called "unsafe-math-optimizations".


I'd never expect such an option, no matter how "unsafe," to affect code outside of what was built with it.


The number used in that example is called a subnormal or denormalized number [1]. When you get into the really tiny ranges, the fpu can compensate by shifting the number further and further to the right, at the cost of accuracy and a lot of speed. The example given is only a few binary digits from zero [2], so it is indeed one of the smallest numbers an fpu can operate on.

One of the things that happens when you enable "unsafe" options is that it sets a flag in the SSE control register to treat subnormal numbers as zero. Because subnormal numbers slow things down considerably, making them not happen is one of the unsafe ways to speed up code. Importing that library runs initialization code that sets the control register to get the extra, precision-losing speed and, as a consequence, sets all those tiny numbers to zero, hence the results you see.

[1] https://en.wikipedia.org/wiki/Denormal_number

[2] http://www.binaryconvert.com/result_double.html?decimal=0520...


It makes sense given what's going on, but I think that flags which affect global process behavior when you merely use them in a library ought to be called out much more heavily. The default ought to be to either leave these global flags alone, or save/restore them at the library entry points. If you really want to change the whole global state, it ought to be described as something beyond merely "unsafe" in my opinion.


Well, almost all hardware implements floating-point (and most other arithmetic) in a manner that has global side-effects and dependencies on flags. Complete sanity is impossible on such hardware.


Dependencies on flags, yes, but what global side effects?


What would be more interesting is the analysis of what went wrong in the optimizer with -ffast-math!


Yeah, I came to post the same thing. What /exactly/ is this code doing that makes turning off optimisations faster? What does the assembly look like, with -ffast-math and without? Can these differences be used to draw a hypothesis, one that could be tested empirically?

Despite the title, which promises an article with some interesting technical depth, I just don't feel like I learned anything. On the whole the piece comes off like a glib anecdote; lazy and not very informative.


Further, after six years they might have fixed the problem.


I've seen floating point 'optimizations' cause issues with iterative methods that depend on the specifics of strict IEEE754 behavior (causing massive spikes in the number of iterations and thus hurting overall performance).


(2009)


Aw I read the title as "WHY GCC’s “-ffast-math” isn’t" and was confused as to the missing explanation. Also, smart quotes in the title, ugh.


I'm pretty sure the VC++ version of fast math really is faster.



