

Tricks with the Floating Point Format (2012) - brudgers
https://randomascii.wordpress.com/2012/01/11/tricks-with-the-floating-point-format/

======
nkurz
This is a fine series of articles done over several months. The final one
includes links to the rest:
[https://randomascii.wordpress.com/2012/05/20/thats-not-normalthe-performance-of-odd-floats/](https://randomascii.wordpress.com/2012/05/20/thats-not-normalthe-performance-of-odd-floats/)

This last one was also the one that (in retrospect) I found most important:
when optimizing floating point routines, do not underestimate the performance
penalty of denormal math. You've probably realized that NaN can hurt
performance, but denormals can be much more insidious. Not only can each one
cost a few hundred extra cycles for the operation, it also resets the branch
prediction buffer, so that the next couple of executions of every if statement
and loop elsewhere in the program suffer an extra 15-cycle penalty!

But it gets worse! Like a disease that quickly kills the host, NaN has the
saving grace that it usually only happens once, since it generally
contaminates the result. But adding or subtracting denormals (numbers larger
than zero but less than FLT_MIN) can go on, and on, while your performance
slows to a crawl. And unless you know to be on the lookout for floating point
exceptions, simple profiling can hide the source of the problem in the noise
of all the missed branch predictions.

Making it even trickier, different compilers have different defaults. Intel's
compilers default to FTZ (flush-to-zero), but GCC (and I think Clang?) leaves
denormal support enabled. So if you test with ICC but distribute source code,
the first sign of a problem might be an end user whose build runs 1000x slower
than it does on your test system, but only on certain inputs!

Here are a couple more articles on this (IMO underreported) issue:

[https://software.intel.com/en-us/articles/x87-and-sse-floating-point-assists-in-ia-32-flush-to-zero-ftz-and-denormals-are-zero-daz](https://software.intel.com/en-us/articles/x87-and-sse-floating-point-assists-in-ia-32-flush-to-zero-ftz-and-denormals-are-zero-daz)

[http://devblogs.nvidia.com/parallelforall/cuda-pro-tip-flush-denormals-confidence/](http://devblogs.nvidia.com/parallelforall/cuda-pro-tip-flush-denormals-confidence/)

~~~
brucedawson
I think NaNs tend to pollute results more effectively than denormals. However
NaNs can be trapped easily by enabling the appropriate floating-point
exceptions so that they turn into crashes.

Note that on most CPUs flush-to-zero is a CPU state so it's not really a
default of a compiler, but of a run-time. Also, it is a per-thread state, so
you can end up with code that runs faster (and with different results) on one
thread than on another!

I'm glad you liked the articles. There have actually been several in the
Floating Point category since then. I think this one is the most important:
[https://randomascii.wordpress.com/2014/01/27/theres-only-four-billion-floatsso-test-them-all/](https://randomascii.wordpress.com/2014/01/27/theres-only-four-billion-floatsso-test-them-all/)
but the entire set is available here:
[https://randomascii.wordpress.com/category/floating-point/](https://randomascii.wordpress.com/category/floating-point/)

~~~
nkurz
Thanks for the articles and the response! Yes, it's a CPU state, but I'm
pretty sure that by default different compilers set it differently at thread
start (or alternatively, choose to leave it as is). The ICC to GCC switch was
the one that tricked me. So while runtime changes can override either default,
in the absence of an explicit override you'll often see different behavior
based on compiler choice. Useful details and links here:
[http://stackoverflow.com/questions/9314534/why-does-changing-0-1f-to-0-slow-down-performance-by-10x](http://stackoverflow.com/questions/9314534/why-does-changing-0-1f-to-0-slow-down-performance-by-10x)

------
fiatmoney
My favorite FP trick: embedding meaningful data in the "ignored" bits of a
floating point NaN. That gives you 22 payload bits (the mantissa must stay
nonzero overall, so the value doesn't decode as infinity) to play with in a
32-bit float representation, or 51 in a 64-bit double.

As I recall, at one point someone used this at battlecode [1] to lightly
obscure their communication protocol. It's also a nasty but effective way to
pack more nuanced metadata than you "should" be able to in a packed Java
float[] or double[].

[1] [http://www.battlecode.org/](http://www.battlecode.org/)

~~~
yoklov
This is used in some dynamic language runtimes as a way to represent a compact
(64bit) tagged union.

It's known as NaN-boxing, and a couple of years ago most of the JS VMs did it
(in fairly different ways), but I'm not sure they still do; my understanding
is that it was mainly useful for the pre-optimized code path.

------
malkia
Denormalized numbers used to be the plague of the audio programmer: spikes of
CPU usage in audio processing just because denormals were being produced, most
of the time by some filter (IIR).

[http://www.dspguru.com/dsp/faqs/iir/basics](http://www.dspguru.com/dsp/faqs/iir/basics)

Recommendations went along the lines of adding white noise, and some other
tricks.

