
Comparing Floating-Point Numbers Is Tricky - mrkline
http://bitbashing.io/comparing-floats.html
======
jepler
The author overlooks my pet peeve when it comes to comparing floating-point
numbers: in C++ the "standard associative containers" (std::set and std::map)
are based on ordering using a less-than relationship which must be a "strict
weak ordering". Many times, the methods suggested for comparing floating point
types do not satisfy the requirements of "strict weak ordering", and in this
case the C++ standard says you've entered the realm of undefined behavior. In
the code at $DAY_JOB, that undefined behavior turned out to include such
pleasant side-effects as double frees(!).

Specifically: when you have a less-than relationship "<", then !(a<b) &&
!(b<a) means that a and b are equivalent (a==b as far as the container is
concerned). And if a==b and b==c then it must be the case that a==c, or the
requirements of the ordering predicate are not met. Unfortunately, under most
of these FP comparison schemes, for numbers a and b that are "close but not
too close", it's the case that a<b, but for x=(a+b)/2, a==x and x==b!
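
To make that concrete, here's a minimal sketch (the tolerance EPS and the
helper fuzzy_less are illustrative assumptions, not from the article):

        // A fuzzy "less than" of the kind often suggested for floats.
        constexpr double EPS = 1e-6;

        bool fuzzy_less(double a, double b) {
            return a < b - EPS;  // "definitely less than"
        }

        // With a = 0.0, b = 1.5e-6, and x = (a + b) / 2 = 0.75e-6:
        //   fuzzy_less(a, b)                        -> true:  a <  b
        //   !fuzzy_less(a, x) && !fuzzy_less(x, a)  -> true:  a "==" x
        //   !fuzzy_less(x, b) && !fuzzy_less(b, x)  -> true:  x "==" b
        // The equivalence is not transitive, so this is not a strict weak
        // ordering, and handing it to std::map/std::set is undefined
        // behavior.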

~~~
Negitivefrags
This is actually caused by the use of the x87 80 bit floating point registers.
(Infamous GCC bug #323)

When the float is first inserted into the set/map it has 80 bit precision, but
that gets truncated to float or double precision during the store. This breaks
the ordering as you are saying, but it's not an inherent flaw with floats as
such.

The problem goes away if you compile with -mfpmath=sse because then the math
will be performed in the same precision as the storage format.
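
A sketch of the effect (assuming x87 code generation, e.g. gcc -m32
-mfpmath=387 with optimization off; under -mfpmath=sse the comparison below
is always false):

        #include <cstdio>

        int main() {
            volatile double x = 1.0, y = 3.0;  // volatile: force a runtime divide
            double q = x / y;    // 80-bit result truncated to 64 bits on store
            if (x / y != q)      // recomputed and compared in an 80-bit register
                std::puts("register and stored values differ");  // may print on x87
        }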

Bug #323 is responsible for a huge amount of mistrust of floats that they
don't deserve. Other compilers don't have this problem because they truncate
the floats before any comparison.

~~~
jepler
Yes, though I'm specifically talking about when you decide to define a 'bool
less_than(double, double)' that uses some kind of fuzzy comparison approach
internally. This can affect any platform, not just ones with the "bug #323"
behavior.

~~~
Negitivefrags
You don't need (and shouldn't use) any kind of fuzzy comparison for less-than
on floats.

~~~
jepler
You sure might imagine you need to. For instance, suppose you want to average
pieces of data that are timestamped "almost the same", but could arrive for
processing with varying delays, including out-of-order arrival (so you can't
just ask "is this datum at about the same time as the datum received just
prior?"). There are better approaches, but the one I inherited involved a
std::map which used a fuzzy less-than as the ordering predicate; and my main
task was to diagnose why, once in a blue moon, a segmentation fault could
occur during some operation on the map (insertion, I think).

~~~
enqk
It seems more logical to pre-quantise the timestamps before insertion. (Actual
solution I've seen practiced)
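
Something like the following sketch (the bucket width and names are made up
for illustration):

        #include <cstdint>
        #include <cmath>
        #include <map>
        #include <vector>

        constexpr double BUCKET_SECONDS = 0.001;  // what "almost the same" means here

        // Quantise a timestamp to an integer bucket number.
        std::int64_t bucket(double timestamp) {
            return std::llround(timestamp / BUCKET_SECONDS);
        }

        // Keys are exact integers, so plain operator< is a valid strict
        // weak ordering.
        std::map<std::int64_t, std::vector<double>> samples;

        void insert(double timestamp, double value) {
            samples[bucket(timestamp)].push_back(value);
        }

Two timestamps just either side of a bucket boundary still land in different
buckets, but at least the container's invariants always hold.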

------
noobermin
>When comparing to some known value—especially zero or values near it—use a
fixed ϵ that makes sense for your calculations.

If you're ever doing mathematical calculations of any sort, it is good
practice to have a handle on the scale your numbers will lie within. If
nothing else it makes you a better professional, and it helps you choose an ϵ
that matches.
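
For instance (the numbers below are placeholders; the point is that ϵ is
derived from the known scale):

        #include <cmath>

        constexpr double SCALE = 1e3;            // values known to be around 1e3
        constexpr double EPS   = SCALE * 1e-12;  // tolerance matched to that scale

        bool about_equal(double a, double b) {
            return std::abs(a - b) < EPS;
        }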

------
fnj
Comparing floating point numbers for _equality_ is tricky. In fact it is a
classic fool's errand. Comparing for lesser or greater is not tricky.

~~~
bubblethink
Why is that so ? If you can do a<b and a>b reliably, you can get equality as
!(a<b || b<a). Inequality has the same issues for very close values.

~~~
greglindahl
Try putting NaN into your equation: not equal to itself.
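
Concretely (a small check, assuming IEEE semantics):

        #include <cassert>
        #include <cmath>

        int main() {
            double nan = std::nan("");
            // Every ordered comparison involving NaN is false...
            assert(!(nan < 1.0) && !(1.0 < nan));
            // ...so equality derived as !(a<b || b<a) wrongly reports
            // NaN == 1.0, even though:
            assert(nan != 1.0 && nan != nan);
        }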

------
keithnz
There was another good article posted on HN on floating points -
[http://lemire.me/blog/2017/02/28/how-many-floating-point-num...](http://lemire.me/blog/2017/02/28/how-many-floating-point-numbers-are-in-the-interval-01/)

where the key takeaway is that 50% of all floats live between -1 and 1.
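
The bit-level intuition, as a back-of-the-envelope check (not from the
article): positive floats below 1.0f occupy nearly half of all finite
positive bit patterns, and the same holds for negatives by sign symmetry.

        #include <cstdint>
        #include <cstdio>

        int main() {
            std::uint32_t below_one = 0x3F800000u;  // bit pattern of 1.0f
            std::uint32_t finite    = 0x7F800000u;  // bit pattern of +infinity
            std::printf("%.1f%%\n", 100.0 * below_one / finite);  // about 49.8%
        }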

------
hprotagonist
comparing floats is arguably nondeterministic.

[https://randomascii.wordpress.com/2013/07/16/floating-point-...](https://randomascii.wordpress.com/2013/07/16/floating-point-determinism/)

------
kwhitefoot
Comparing numbers is easy. The operators are there in the manual.

The hard part is understanding when it is appropriate to compare floating
point numbers and how to produce them.

I regularly use floating point numbers as keys in dictionaries and of course
all the code quality tools whine about comparisons being inexact. But in my
case there is no fuzziness because the keys are all produced by the same
method and hence do not suffer from any different rounding errors.

Pretty much every new member of the team sees floats being compared and has a
heart attack, yet the code in question is in the oldest and most reliable
component of the whole million-line program.

Just don't expect two numbers produced by different expressions that would be
mathematically equivalent but have operators in a different order to produce
identical results.
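
A sketch of that last point (the values are the classic associativity
example, not from the comment above):

        #include <cassert>
        #include <map>

        int main() {
            std::map<double, int> m;
            m[(0.1 + 0.2) + 0.3] = 42;                // the one key-producing expression
            assert(m.count((0.1 + 0.2) + 0.3) == 1);  // same expression, same bits
            assert(m.count(0.1 + (0.2 + 0.3)) == 0);  // 0.6000000000000001 vs 0.6
        }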

------
Safety1stClyde
I got confused reading this, because it opens with a picture of the 64 bit
floating point layout, which is "double", not "float", yet the program code
compares float to int32_t.

What I would say is that when you're considering comparison of floating point
numbers, it's important to understand what the operation means in terms of
the data you're representing with them; in other words, what it means, in
terms of the data, for two values to be equal or not. Usually there is a
precision inherent in the data itself which will guide you in formulating
equality, if necessary.
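
For example (the millimetre figure is hypothetical): if the values are
lengths measured to the nearest millimetre, equality at any finer resolution
is meaningless, and the tolerance writes itself:

        #include <cmath>

        // Lengths in metres, measured to the nearest millimetre.
        bool same_length(double a, double b) {
            return std::abs(a - b) < 0.0005;  // half the measurement resolution
        }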

------
exDM69
Here's the less-than-scientific floating point near-equality test I use.

    
    
        #include <cfloat>  // FLT_EPSILON

        bool zero(float x) { return x*x < FLT_EPSILON; }
        bool equal_float(float a, float b) {
            return (zero(a) && zero(b)) ||   // both are zero
                zero((a-b)*(a-b) / (a*a + b*b)); // or relative error squared is zero
        }
    

This checks equality to about four decimal digits for 32 bit single precision
and seven digits for 64 bit floats. Inf/NaN special values are not considered.

Critique and comments welcome.

~~~
janco
`FLT_EPSILON` represents the minimum difference between two adjacent floats
around 1.0; it should be scaled according to the input argument. E.g. your
`equal_float` returns `true` for 2e-6 and 4e-6, which are clearly not the
same number.

A better comparison would check for zero something like this:

    
    
        #include <cfloat>  // FLT_EPSILON
        #include <cmath>   // std::abs

        bool zero(float x) { return std::abs(x) <= std::abs(x)*FLT_EPSILON; }

~~~
exDM69
Yes, this is by design and it works as intended.

I typically use this with doubles and DBL_EPSILON, which is much much smaller
than FLT_EPSILON.

With FLT_EPSILON, this roughly amounts to "zero" meaning "less than 0.001".
If the zero check is omitted, there's going to be a division by near-zero,
which makes the results nonsense (and you have to draw the line somewhere).
With DBL_EPSILON, "zero" is "less than 0.000000001".

If this is too loose, then `zero(x) = abs(x) < FLT_EPSILON` makes it much
stricter (about 1e-7).

This is good enough for my purposes; I don't deal with very small numbers in
float, and doubles give more than enough precision.

NOTE: I usually use this kind of comparison in testing by comparing known
"gold" figures against the results of the code being tested. I don't test
accuracy, I test for "in the ballpark" because the stuff I deal with has
built-in inaccuracy in the algorithm and numerics.

The version you posted will always return false if I read it correctly.

------
hcrisp
There was a very good article explaining this using MATLAB, but I can't find
it right now. This one is pretty close and explains the concepts of overflow,
underflow, etc. The diagrams about "eps" are pretty good, even if your
language of choice is Python, C/C++, etc.

[http://blogs.mathworks.com/cleve/2014/07/07/floating-point-n...](http://blogs.mathworks.com/cleve/2014/07/07/floating-point-numbers/)

~~~
ubernostrum
Python has math.isclose() in the standard library, with configurable
tolerances:

[https://docs.python.org/3/library/math.html#math.isclose](https://docs.python.org/3/library/math.html#math.isclose)
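
Its formula (from PEP 485) is easy to port; a rough C++ rendering with
math.isclose's own defaults:

        #include <algorithm>
        #include <cmath>

        // abs(a-b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)
        bool isclose(double a, double b,
                     double rel_tol = 1e-9, double abs_tol = 0.0) {
            return std::abs(a - b) <=
                   std::max(rel_tol * std::max(std::abs(a), std::abs(b)), abs_tol);
        }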

------
mikeash
Formatting note for the author: on Safari Mac, something is causing ff and fl
ligatures to be applied even to the monospaced code, which makes it look kind
of weird.

~~~
mrkline
Is it better now?

~~~
mikeash
I still see it in a couple of places where code-formatted text is inline with
regular text, for example "relative_difference." The code blocks themselves
look good.

~~~
mrkline
Hopefully that fixes it. Sorry I suck at CSS.

~~~
mikeash
Looks good now!

------
mjevans
I try to always remember:

Floats are great for quick, mostly correct, math.

Think REALLY carefully about any process that then 'compares' the result.
Usually you're "doing it wrong" if that's the case.

I think I might even find it useful if compilers could be instructed to warn
whenever comparison operators were used on a float type value.
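
For what it's worth, GCC and Clang get partway there: -Wfloat-equal warns
whenever == or != is applied to floating-point operands (ordered comparisons
like < are not covered):

        // Compile with: g++ -Wfloat-equal example.cpp
        bool eq(float a, float b) {
            return a == b;  // flagged by -Wfloat-equal
        }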

------
munro
Hm, this is interesting: in over a decade, the only times I can think of ever
needing to compare floats are 1) deduping duplicate data, where naive
comparison is what I want, and 2) disambiguating messy user data, where I
would take floats over strings any day.

------
JoelJacobson
Can someone explain why float is "better" than numeric in this example?

        pg1:joel=# select 1/7::numeric *7;
                ?column?
        ------------------------
         0.99999999999999999998
        (1 row)

        pg1:joel=# select 1/7::float*7;
         ?column?
        ----------
                1
        (1 row)


~~~
luhn
Floating point math is a bit fuzzy, so many languages will round to the
nearest integer if the float is within a certain margin.

Numeric types are meant to be exact, so they will be represented as an exact
value.

~~~
tmyklebu
Floating-point math is carefully-defined. There are multiple independently-
developed but interoperable implementations and an IEEE standard that talks in
detail about how floating-point math is supposed to work. It's not "a bit
fuzzy."

~~~
ghettoimp
It most certainly _is_ a bit fuzzy. For a fun critique of the newer IEEE
standard by a real FP expert, see:

[http://www.russinoff.com/papers/ieee.pdf](http://www.russinoff.com/papers/ieee.pdf)

For something more concrete, consider Section 8, "Variations Allowed by the
IEEE Floating-Point Standard", of the TestFloat tool for testing floating
point implementations for IEEE compliance:

[http://www.jhauser.us/arithmetic/TestFloat-3c/doc/TestFloat-...](http://www.jhauser.us/arithmetic/TestFloat-3c/doc/TestFloat-general.html)

And of course, many arithmetic operations (e.g., trig functions) aren't even
covered by the standard, which occasionally provokes consternation like
this...

[https://forums.theregister.co.uk/forum/1/2014/10/10/intel_un...](https://forums.theregister.co.uk/forum/1/2014/10/10/intel_underestimates_error_bounds_by_13_quintillion/)

~~~
simonbyrne
The problem with the "floating point is fuzzy" comment is that people start
treating it as some sort of black box, or as if the results are random
somehow. Sure there are a few weird things with floating point status flags,
but mostly "fuzziness" is perfectly understandable when you grasp what it is
doing (including the usual 0.1 + 0.2 != 0.3).
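
That usual case, for instance, is fully deterministic once you print enough
digits (a quick check, assuming IEEE doubles):

        #include <cstdio>

        int main() {
            std::printf("%.17g\n", 0.1 + 0.2);  // 0.30000000000000004
            std::printf("%.17g\n", 0.3);        // 0.29999999999999999
        }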

Also the standard does specify trig functions (§9.2 Recommended correctly
rounded functions), but that's one of the optional parts, and as far as I know
no one has actually implemented them fully (CRlibm came close, but I don't
think their pow function has been fully proven to be correctly rounded, and in
any case it isn't widely used).

This is actually a big problem with most standards: a lot of them contain
finicky details about which only a very small subset of people care. As far as
I know, there still isn't a C compiler that implements all the PRAGMAs
specified in the C-1999/C-2011 specs.

------
rusk
My basic rule of thumb is to only use floats at the presentation layer, for
storage, or for measurements. If you're doing any serious calculations you
need to normalise to a more reliable format first.
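
For instance (a hypothetical money example): convert once at the boundary to
an exact integer representation, do the arithmetic there, and only go back to
floating point for display:

        #include <cstdint>
        #include <cmath>

        // Parse once at the boundary: dollars become exact integer cents.
        std::int64_t to_cents(double dollars) {
            return std::llround(dollars * 100.0);
        }

        // All subsequent arithmetic on cents is exact; convert back only
        // for display.
        double for_display(std::int64_t cents) {
            return cents / 100.0;
        }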

