Demystifying NaN for the Working Programmer (lucidchart.com)
57 points by Zacru on March 5, 2022 | 36 comments



NaN is a cancer. The choice to make NaN == NaN false is just wrong. Every type, every variable can have multiple reasons for being invalid. Yet no other type has ever chosen to make invalid values unequal to themselves.

Pointers can be invalid. They can be invalid for any number of reasons: lack of memory, object not found, etc. No one ever suggests that null should not equal null.

File handles can be invalid. They can be invalid for any number of reasons: file not found, access denied, file server is offline. No one has ever made invalid handles unequal to themselves.

The justification for NaN not being equal to themselves is just bonk.


In a world without generic programming, NaN not being equal to itself makes a certain amount of sense for some kinds of numeric code. But in a world with reusable generic algorithms the calculation changes -- here equality/ordering relations really must be well-behaved (reflexive, transitive) or weird shit happens. In C++ it's undefined behavior to call `std::sort` or `std::unique` on a list of floats containing NaN.

Most languages nowadays have standard-library functions/types that require well-behaved equality, so why have a builtin type for which equality is not well-behaved?
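
A minimal sketch of the problem described above, in Python rather than C++. `unique_adjacent` is a hypothetical helper written for this illustration (loosely in the spirit of `std::unique`, not a port of it); it assumes `==` is reflexive, and quietly stops deduplicating once NaN shows up:

```python
import math

def unique_adjacent(values):
    """Collapse runs of equal neighbours using ==, assuming == is reflexive."""
    out = []
    for v in values:
        # Keep v only if it differs from the last element we kept.
        if not out or not (out[-1] == v):
            out.append(v)
    return out

nan = float("nan")
data = [1.0, 1.0, nan, nan, 2.0, 2.0]
result = unique_adjacent(data)

# The runs of 1.0 and 2.0 collapse, but the two NaNs survive,
# because nan == nan is False.
nan_count = sum(1 for v in result if math.isnan(v))
```

Python only gives a wrong answer here; the C++ standard library makes no promise at all when the comparison breaks its requirements.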


>The justification for NaN not being equal to themselves is just bonk.

It makes a lot of sense to me. NaN indicates data has been lost. You did something and you stored the result in a number datatype but the result isn't a number. Data was lost. You lost the data and have only 'your answer wasn't a number.'

Comparing NaN with NaN is asking the computer 'we have two buckets that have overflowed, were their contents the same?' The answer is 'we don't know' which means, to err on the side of safety, the answer is 'no.'

No?


Let's say you make a particular NaN equal to itself.

But then it's sensible for different operations to give you different NaN values.

And you still wouldn't say that 4 < NaN is true, or NaN < 4 is true, would you?

So it's still going to confuse the user. Is just changing equality going to give you a better system overall?
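
For reference, a small sketch of how the ordered comparisons behave today, using Python floats (which are IEEE 754 doubles on mainstream platforms). The variable names are just for illustration:

```python
nan = float("nan")

less = nan < 4.0            # False
greater = nan > 4.0         # False
equal = nan == 4.0          # False
self_equal = nan == nan     # False
self_unequal = nan != nan   # True: the only comparison that holds
```

So even if `nan == nan` were changed to True, the two ordering comparisons would still both be False for the same pair of values.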


Infinity is NaN, but 4 < infinity is true.


Signed infinity represents aggressive rounding, but you still know roughly what number it is. It works out well to let it still participate in equality and ordering. NaN can be created in many different ways and there is no way to say basically anything about what numbers it could be related to.

You could arbitrarily make NaN sort as if it was a certain value, and that would be useful when you want to sort a big array, but it would have unpleasant side effects when you're doing math. IEEE decided "always false" was less likely to cause problems, but to be clear you get problems no matter what you choose.
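
A rough sketch of both halves of that comment, in Python. Infinity takes part in ordering as-is, while NaN needs an explicit (and arbitrary) placement rule before a sort is well defined. `nan_last_key` is a hypothetical helper, and "NaN sorts last" is just one possible choice:

```python
import math

inf = float("inf")
nan = float("nan")

# Signed infinity participates in equality and ordering.
inf_is_ordered = (4.0 < inf) and (-inf < 4.0) and (inf == inf)

def nan_last_key(x):
    # Arbitrary choice for this sketch: every NaN sorts after every number.
    return (1, 0.0) if math.isnan(x) else (0, x)

data = [3.0, nan, 1.0, inf, -inf, 2.0]
ordered = sorted(data, key=nan_last_key)
```

The key function only affects this one sort; arithmetic and ordinary comparisons on the same values keep the IEEE behavior.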


My point was you can have multiple types of “Not a Number” used at the same time.

Defining just error, undefined, positive infinity, and negative infinity is far from an exhaustive list but they obviously should be treated differently in many contexts.


Well let's look at wikipedia's list of things that cause IEEE NaN, since they already split out Infinity.

0/0, ∞/∞, ∞%n, n%0, ∞-∞, results with imaginary components

The first five have no meaningful approximation or way to interact with anything. And there's no good way to pretend a single float is a complex number.

So those results get the "this doesn't exist" treatment. Coder's choice if NaN triggers errors or not.

Would you try to define any more behavior for any of those NaNs?
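
A quick sketch of two entries from that list in Python, plus one place where the language makes its own "coder's choice". Note the assumption: this is Python's behavior specifically; plain float division by zero raises instead of returning NaN, which many other languages do not do:

```python
import math

inf = float("inf")

inf_minus_inf = inf - inf   # NaN
inf_over_inf = inf / inf    # NaN

# Python-specific: 0.0 / 0.0 raises rather than producing NaN.
try:
    zero_over_zero = 0.0 / 0.0
except ZeroDivisionError:
    zero_over_zero = None
```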


Only if you wanted to treat some of them as more severe types of NaN, e.g. (∞-∞) - ∞ can evaluate to ∞-∞, but (∞-∞) / 0 = 0/0.

Though really that choice is just about what kind of errors you’re showing users.


infinity is not NaN


IEEE 754 NaNs are arguably defined as different things, but that’s not the only system to use floating-point math and NaN.

Mathematically different infinities are not equivalent because infinity is not a number. How you represent that is dependent on the specific system involved.


Since math is one of the few things I feel confident talking about:

It is true that different infinities exist[0][1] and there are whole areas of logic examining them.

It is also true that _number_ without any extra qualifier generally means the Real numbers (R) or Complex numbers (C), and those domains do not define infinity as a number. But even then there are only a few good ways to add infinity to each number system:

In R you generally add either a single projective point at infinity ∞ [2], which sometimes makes geometry nicer, or two signed infinities (-∞ and +∞), which make calculus nicer (especially limits and integration).

In C it is often simpler, as typically you want to treat the numbers as a sphere and so add one extra point so that the inversion f(x) = 1/x is a well-behaved function. In this domain you often end up working with holomorphic functions[3], and then there is not really an intrinsic difference between a function like f(x) = 1/x and g(x) = x: each simply has a _pole_, f at 0 and g at infinity.

If you want to get trippy, even integers can have unusual definitions [4], and then there is always one of my favorite topics in math: surreal numbers [5] (for which I recommend both [6] and [7]), a field where √∞ < ∞/2 < ∞ - 1 < ∞ < ∞ + 1 is perfectly well defined (but still 0/0 doesn't have any meaning in any of these theories; that is a tough nut to crack).

[0] https://en.wikipedia.org/wiki/Ordinal_number
[1] https://en.wikipedia.org/wiki/Cardinal_number
[2] https://en.wikipedia.org/wiki/Projective_geometry
[3] https://en.wikipedia.org/wiki/Holomorphic_function
[4] https://en.wikipedia.org/wiki/Algebraic_integer
[5] https://en.wikipedia.org/wiki/Surreal_number
[6] https://www-cs-faculty.stanford.edu/~knuth/sn.html
[7] https://books.google.it/books?id=tXiVo8qA5PQC&redir_esc=y


Note (without disagreeing): in SQL, NULL = NULL is not true either; the comparison yields NULL (unknown).


This article conflates the representational limits of floating point with the concept of NaN in a way that I suspect will lead to more confusion, not less.

Zero/zero doesn’t return NaN because it isn’t representable within floating point - it returns NaN because it is an expression that has no mathematical meaning.

The fact that sqrt(-1) has two valid nonreal answers has nothing to do with why it returns NaN - after all, sqrt(4) has two valid real answers so is also technically not representable by a single floating point value, but that doesn’t typically result in NaN.

NaN is just an error value you get when you ask floating point math a dumb question it can’t usefully answer.

Far more interesting and subtle are the ways in which positive and negative infinity and positive and negative zero let you actually still obtain useful (at least for purposes of things like comparison) results to certain calculations even if they overflow the representable range.
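
A short sketch of that last point, assuming Python floats are IEEE 754 doubles (true on mainstream platforms). The specific constants are arbitrary picks for illustration:

```python
import math

big = 1e308
overflowed = big * 10.0               # exceeds the double range, becomes +inf
still_ordered = overflowed > big      # True: comparison is still meaningful

tiny = -1e-320 * 1e-320               # underflows to negative zero
sign_kept = math.copysign(1.0, tiny)  # -1.0: the sign survives the underflow
zeros_equal = (tiny == 0.0)           # True: -0.0 compares equal to 0.0

back_to_zero = 1.0 / overflowed       # 0.0
```

The overflowed and underflowed results have lost their magnitude, but they still carry enough information (direction, sign) to compare sensibly.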


> The only reliable way to test for NaN is to use a language-dependent built-in function; the expression a === NaN is always false

Well, you test for it by comparing the value against itself and seeing if that returns false.

(There’s also a bit of confusion on by value vs. by reference comparison and the actual bit value on a NaN, which isn’t quite right.)
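
A minimal sketch of the self-comparison test next to a built-in, in Python. `is_nan_by_self_compare` is a made-up name for this example; `math.isnan` is the standard-library function:

```python
import math

def is_nan_by_self_compare(x: float) -> bool:
    # NaN is the only float value for which x != x holds.
    return x != x

samples = [0.0, -0.0, 1.5, float("inf"), float("-inf"), float("nan")]
agree = all(is_nan_by_self_compare(s) == math.isnan(s) for s in samples)
```

Both approaches give the same answer on these samples; the built-in mostly wins on readability.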


Signaling NaNs raise exceptions in some operations. Is comparison one of these?


They “raise exceptions” in the IEEE 754 sense, which is not at all the same thing as what most programming languages mean by “raise exception”. It means that they set a sticky flag in a register that may be queried at a later point, not that program control flow is redirected.
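
To make the "sticky flag versus redirected control flow" distinction concrete, here is an analogy rather than the real thing: as far as I know Python's standard library does not expose the hardware floating-point status flags for binary floats, so this sketch uses the `decimal` module, whose contexts have both a flag and an optional trap per signal. Treat it as an illustration of the two models only:

```python
from decimal import Decimal, InvalidOperation, localcontext

# Model 1: trap disabled. The invalid operation returns NaN and sets a flag.
with localcontext() as ctx:
    ctx.traps[InvalidOperation] = False
    ctx.clear_flags()
    result = Decimal("Infinity") - Decimal("Infinity")
    flag_after_invalid = bool(ctx.flags[InvalidOperation])
    _ = Decimal(1) + Decimal(2)            # a perfectly valid operation
    flag_still_set = bool(ctx.flags[InvalidOperation])  # sticky until cleared

# Model 2: trap enabled. The same operation redirects control flow.
with localcontext() as ctx:
    ctx.traps[InvalidOperation] = True
    try:
        Decimal("Infinity") - Decimal("Infinity")
        trapped = False
    except InvalidOperation:
        trapped = True
```

The first model is what the comment above describes for IEEE 754 status flags; the second is what most programmers expect when they hear "raises an exception".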


The only use I saw for this is that you can enable compiler flags to crash the program when NaNs are encountered. Useful for testing Fortran code, in my experience. I didn’t see any support for other languages I’ve used.


C lets you set and query the flag state with the `<fenv.h>` functions in theory, but compiler support for rigorously adhering to IEEE 754 semantics around these operations is pretty limited in most compilers, to say the least. Clang has been making some progress on support recently.


I dislike this article, as it tries repeatedly to imply that the use of NaN is somehow a restriction caused by floating point.

No IEEE 754 operation ever produces a NaN result unless the operation has no valid result in the set of real values.

Similarly the behaviour in comparisons: if you want NaN to equal NaN you have to come up with a definition of equality that is also consistent with

    NaN < X

    NaN > X

    NaN == X

The logical result of this is that NaN does not equal itself, and I believe mathematicians agree on that definition. Again not a result of the representation, but a result of the mathematical rules of real values.

I want to be very clear here: floating point math always produces the correct value rounded (according to rounding mode) to the appropriate value in the represented space unless it is fundamentally not possible. The only place where floating point (or indeed any finite representation) produces an incorrectly rounded result is the transcendental functions, where some values can only be correctly rounded if you compute the exact value, but the exact value is irrational.

People seem hell bent on complaining about floating point behavior, but it is fundamentally mathematically sound. IEEE754 also specifies some functions like e^x-1 explicitly to ensure that you get the best possible accuracy for the core arithmetic operations
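
A small sketch of both claims, assuming Python floats are IEEE 754 doubles with round-to-nearest (the usual case). `Fraction` is used here only as an exact reference; `math.expm1` is Python's standard-library version of e^x - 1:

```python
import math
from fractions import Fraction

# Exact rational sum of the two doubles actually stored for 0.1 and 0.2.
exact_sum = Fraction(0.1) + Fraction(0.2)
correctly_rounded = float(exact_sum)   # nearest double to the exact sum
float_sum = 0.1 + 0.2                  # hardware addition

# e^x - 1 for small x: the naive form cancels most of the significant digits.
x = 1e-10
naive = math.exp(x) - 1.0
dedicated = math.expm1(x)
naive_error = abs(naive - x)           # x is a good reference for tiny x
dedicated_error = abs(dedicated - x)
```

The famous 0.30000000000000004 is the correctly rounded sum of the two inputs as stored, not an arithmetic mistake.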


Greater-than and less-than already make no sense around NaN, so you won't make things much worse; I don't get what you're trying to point out with them. This is less a question about mathematical correctness (which there isn't much of around NaN anyway), and more a practical one. Having this annoying NaN that breaks everything if it's in an array to be sorted, or in a set, or a key in a map is just pure awful.


Correct, they don’t make sense, but given that < and > return a Boolean in the IEEE environment, they need to produce a deterministic value.

As you say, relations with NaN don’t make sense, but given the requirement of a single value, NaN != NaN makes the most “sense” mathematically, and a core principle of IEEE 754 was ensuring the most accurate rendition of true maths with a finite representation (see a bunch of papers by Kahan).

Of course x87’s IEEE 754 implementation does actually have multiple NaNs, infinities, and representations of the same value. For all its quirks, remember x87 was what demonstrated that the IEEE 754 specification could be implemented fast and affordably, which non-Intel manufacturers were all claiming was impossible. The only real “flaw”* in x87 was the explicit leading 1, which was an artifact of Intel being sufficiently ahead of the curve to predate dropping it.

* The x87 transcendentals are known to be hopelessly inaccurate, but that in theory could have been fixed, whereas the format could not be.


Mathematically, yes. In practice, NaN != NaN just kills any hope of having any amount of sanity for operations that don't care about floating-point and just want to generally compare things. It's not very nice to say "sorting, hashmaps & hashsets containing NaNs cause the entire operation/structure to be completely undefined behavior", especially given that NaNs kind of exist to allow noticing errors, not cause even more of them.
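
A sketch of what that looks like with hash-based containers, using CPython as the example. The behavior shown relies on CPython checking object identity before `==` during lookups, so treat it as implementation-specific:

```python
a = float("nan")
b = float("nan")   # a second, distinct NaN object

nan_set = {a, b}
set_size = len(nan_set)               # 2: the NaNs never compare equal

table = {a: "stored"}
found_with_same_object = a in table   # True, via the identity shortcut
found_with_other_nan = b in table     # False: no NaN equals the stored key
```

Nothing crashes here, but a key you can only find again if you kept the original object is not much of a key.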


I did a code assignment for a potential JavaScript-heavy job in 2014. I think isNaN was already part of the language then, because I remember deciding not to use it (but I could be misremembering). At any rate, I did Number(x) !== Number(x) at some point.

In the meeting where they went over the code, the guy who did it said, "We were wondering why you did this?" So I had to explain NaN to him. He really did not know it existed. I thought that was a weird thing to know nothing at all about.


Related: I’ve met developers who think NaNs are a language or library (notably pandas) feature.


Imagine doing if(x) ..., where x can be NaN. Shouldn't that throw an exception in most cases? Why are our compilers not doing it that way?


Should it? It isn't obvious to me at all that throwing an exception in this case is the best behaviour. Throwing an exception when testing a value for 'truthiness' is extremely surprising.

On the other hand, I would strongly discourage 'if(x)' where x is a float that may be NaN purely because the 'correct' behaviour here isn't clear to me.
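
For what it's worth, a tiny sketch of what truthiness does with NaN in Python (C behaves the same way for `if (x)`, since NaN is not equal to zero):

```python
nan = float("nan")

nan_is_truthy = bool(nan)    # True: only zero is falsy for floats
zero_is_truthy = bool(0.0)   # False
```

So `if x:` silently takes the "true" branch for NaN, which is probably not what anyone writing that check had in mind.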


How about the case where x is (y > 0)? If y is NaN, shouldn't x be boolean-NaN? And shouldn't if(x) throw an exception? Or shouldn't (y > 0) throw an exception if you don't want boolean-NaNs?


That's easy: y > 0 is False, not NaN.

You may not think this is wise, but this is very much how comparisons with NaN are defined.

And I think this is better than exception raising. Again, I think it would be _really_ weird for simple value comparisons to throw.


> Again, I think it would be _really_ weird for simple value comparisons to throw.

But ... why?

You may say that NaN > 0 is defined as False, but we know that's not how programmers think, most of the time.

In code like if(y > 0) steer_car_to_left() I don't want the compiler or the IEEE standard to make any choices for me! Let it throw, so emergency systems can kick in.
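
A minimal sketch of the "let it throw" option, done at the library level rather than in the compiler. `checked_gt` and `choose_action` are hypothetical names, and the action strings are placeholders for whatever the real system would do:

```python
import math

def checked_gt(y: float, threshold: float) -> bool:
    """Compare, but refuse to answer when either side is NaN."""
    if math.isnan(y) or math.isnan(threshold):
        raise ValueError("comparison involves NaN")
    return y > threshold

def choose_action(y: float) -> str:
    try:
        return "steer_left" if checked_gt(y, 0.0) else "hold_course"
    except ValueError:
        # The caller decides what "emergency" means; nothing is chosen silently.
        return "emergency_stop"

actions = [choose_action(v) for v in (1.0, -1.0, float("nan"))]
```

The cost is an extra check on every comparison, which is the tradeoff the replies here are arguing about.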


The compiler presumably can't know in most cases, but the runtime might be able to throw. It depends on the language implementation and the tradeoffs.


Excellent article, this helped me understand the issues of working with floating-point numbers. I work with them quite a lot when developing business logic, and oftentimes NaN can be a pain. Understanding why helps a lot.


I love NaNs, especially their "infectious" quality. Initializing float variables to NaN before first assignment can make a lot of errors immediately obvious. I wish there were a NaN for integers.
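
A small sketch of that "infectious" quality in Python. `average_speed` and its NaN default are made up for the example; the point is only that a forgotten value poisons everything computed from it instead of looking plausible:

```python
import math

def average_speed(distance_km: float, hours: float = float("nan")) -> float:
    # hours defaults to NaN as an "unset" marker in this sketch.
    return distance_km / hours

forgot_hours = average_speed(120.0)      # NaN
downstream = forgot_hours * 2.0 + 10.0   # still NaN
with_hours = average_speed(120.0, 2.0)   # 60.0
```

Had the default been 1.0 or 0.0, the first call would have produced a believable but wrong number, or an unrelated division error.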


What about a “nullable” double? In C#, you’d use `double?`, Rust would be Option<f64>, C++ would be std::optional<double>. Then any operation would throw upon an unset value?


That would require every operation on a floating point value to return an optional, which you’d then need to unwrap and branch on.
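
A rough sketch of what that unwrapping looks like, using Python's `Optional[float]` as a stand-in for `double?` / `Option<f64>` / `std::optional<double>`. `add` and `scale` are hypothetical helpers:

```python
from typing import Optional

def add(a: Optional[float], b: Optional[float]) -> Optional[float]:
    # Every operation has to branch on "unset" before doing any arithmetic.
    if a is None or b is None:
        return None
    return a + b

def scale(a: Optional[float], k: float) -> Optional[float]:
    if a is None:
        return None
    return a * k

unset: Optional[float] = None
result = scale(add(1.5, unset), 2.0)   # None propagates through both calls
ok = scale(add(1.5, 0.5), 2.0)         # 4.0
```

This is more or less NaN propagation done by hand, with a branch per operation instead of hardware support.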


Don’t initialize them and turn on UBSan :)



