-ffast-math does two types of things: it changes the code generation, and it mucks with the floating-point settings at startup in a bid to make arithmetic faster. On x86 systems with SSE2, the latter involves writing to the MXCSR register.
It would be interesting to separate these two effects. You can get the former by /compiling/ with -ffast-math and the latter by /linking/ with -ffast-math.
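A quick sketch of how to see the split in practice (gcc on Linux assumed; the filenames are made up). Linking with -ffast-math pulls in crtfastmath.o, which sets the flush-to-zero bits before main runs, regardless of how the code was compiled:

```shell
# demo.c computes a subnormal at run time (volatile blocks constant
# folding), so its output reveals the floating-point control state.
cat > demo.c <<'EOF'
#include <stdio.h>
#include <float.h>
int main(void) {
    volatile double x = DBL_MIN;   /* smallest normal double */
    printf("%g\n", x / 2.0);       /* subnormal under IEEE rules */
    return 0;
}
EOF

# 1. Code generation only: compile with -ffast-math, link without it.
gcc -O2 -ffast-math -c demo.c && gcc demo.o -o demo_codegen
./demo_codegen    # tiny nonzero value: startup code left the FPU alone

# 2. Startup effect only: compile normally, link with -ffast-math,
#    which pulls in crtfastmath.o and enables flush-to-zero before main.
gcc -O2 -c demo.c && gcc -ffast-math demo.o -o demo_ftz
./demo_ftz        # prints 0: subnormal results are flushed
```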
The number used in that example is called a subnormal or denormalized number [1]. When you get into the really tiny ranges, the FPU can compensate by shifting the number further and further to the right, at the cost of lost accuracy and a lot of speed. The example given is only a few binary places from zero [2], so it is indeed one of the smallest numbers an FPU can operate on.
One of the things that happens when you enable "unsafe" options is that a flag gets set in the SSE control register that flushes subnormal numbers to zero. Because subnormal numbers slow things down considerably, making them not happen is one of the unsafe ways to speed up code. Importing that library runs initialization code that sets the control register to get the extra, precision-losing speed and, as a consequence, sets all those tiny numbers to zero, hence the results you see.
It makes sense given what's going on, but I think that flags which affect global process behavior when you merely use them in a library ought to be called out much more heavily. The default ought to be to either leave these global flags alone, or save/restore them at the library entry points. If you really want to change the whole global state, it ought to be described as something beyond merely "unsafe" in my opinion.
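The save/restore discipline is cheap to implement. A sketch of what a considerate library entry point could look like on x86 with SSE (`my_library_call` is a made-up name):

```c
#include <xmmintrin.h>   /* _mm_getcsr, _mm_setcsr */

/* Hypothetical entry point: whatever the internals do to MXCSR
   (rounding mode, FTZ/DAZ), the caller's state survives the call. */
double my_library_call(double a, double b) {
    unsigned int saved = _mm_getcsr();           /* save caller's MXCSR */
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);  /* e.g. go fast internally */
    double result = a * b;                       /* ... real work here ... */
    _mm_setcsr(saved);                           /* restore on the way out */
    return result;
}
```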
Well, almost all hardware implements floating-point (and most other arithmetic) in a manner that has global side-effects and dependencies on flags. Complete sanity is impossible on such hardware.
Yeah, I came to post the same thing. What /exactly/ is this code doing that makes turning off optimisations faster? What does the assembly look like, with -ffast-math and without? Can these differences be used to draw a hypothesis, one that could be tested empirically?
Despite the title, which promises an article with some interesting technical depth, I just don't feel like I learned anything. On the whole the piece comes off like a glib anecdote; lazy and not very informative.
I've seen floating point 'optimizations' cause issues with iterative methods that depend on the specifics of strict IEEE754 behavior (causing massive spikes in the number of iterations and thus hurting overall performance).
BTW, to all the library writers out there: never ever link a shared library with -ffast-math. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522