An Interview with the Old Man of Floating-Point (1998) (eecs.berkeley.edu)
115 points by rjeli on Oct 29, 2021 | 76 comments



When I teach about floating point, the two things I try to impress on the students are: it remains a truly incredible engineering feat to believably fit the entire real number line (plus infinities) into 32 or 64 bits, and, it was an incredible political feat to get so many competing companies to agree on one particular way of doing this; both are thanks to Kahan's leadership. Complaints about the quirks of using floating point could be tempered with some appreciation of the hard design decisions that were made, and with gratitude for the people who pulled it off.


Yes. And one take-away for me: when yet another article comes out along the lines of "floating point sucks, and here's a much simpler and better replacement", and the author neither mentions Kahan nor shows in detail that they understand the design tradeoffs and decisions made back then (in IEEE 754), then there's a very good chance that you can toss it.


+1. There are a lot of messy things in the world, and someone comes along and says, "The old way is too messy and complicated, here's a new, simpler, better way," and they don't understand why the old messy thing was messy in the first place. And they don't understand what it was about the old thing that allowed it to last for all those decades.

Somebody posted this anecdote a few days/weeks back, and it stuck with me:

> There exists in such a case a certain institution or law; let us say, for the sake of simplicity, a fence or gate erected across a road. The more modern type of reformer goes gaily up to it and says, “I don’t see the use of this; let us clear it away.” To which the more intelligent type of reformer will do well to answer: “If you don’t see the use of it, I certainly won’t let you clear it away. Go away and think. Then, when you can come back and tell me that you do see the use of it, I may allow you to destroy it.”

https://fs.blog/chestertons-fence/


A lot of the design tradeoffs are not really relevant anymore[0]. There are some ways in which 754 effectively makes a "this is UB, up to the manufacturer" choice (to appease manufacturers of the day) which these days would probably not fly; it's a much easier sell to declare "no UB" (or the equivalent for hw) because we have the benefit of hindsight about all the times those choices caused problems, and the hw manufacturers have far LESS power than the application consumer these days.

[0] for example iirc cray had a wonky multiplier, don't remember if it was 754, that (I guess) they thought made it faster but resulted in noncommutative multiplication for many cases.


Some (a lot?) of those got cleaned up in the 2008 revision [1]. And for many practical purposes, once x87 went away as part of x86-64, we now have a world of much more sensible agreement.

We're likely coming back into a period of divergence between ARM and x86 parts on edge cases that aren't strictly stated as MUST in the standard (various things related to qNaNs, sNaNs, denormal handling, +-0), but they're minor compared to the "old days".

[1] https://en.m.wikipedia.org/wiki/IEEE_754-2008_revision



noncommutative multiplication? what the hell were they doing?


My guess is they were chunking the multiplier cascade and handling the chunks with parallel circuits, so that the waveforms could be independent across the chunks?

Edit: found it.

2240004 3-24 C (you will want to search by this key, it's a long document!)

http://ed-thelen.org/comp-hist/CRAY-1-HardRefMan/CRAY-1-HRM....

"Note that reversing the multiplier and multiplicand operands could cause slightly different results, that is, A x B is not necessarily the same as [B] x A"


Looks like I have some weekend reading to do. Thanks!


I found it instructive to do 8-bit floating point. Keep it simple, 1 bit for sign, 3 bits for exponent, 4 for mantissa, and don't worry about the 'free' extra bit you get on the mantissa. Just compute - exactly! - the values those 256 patterns represent. Now, what pattern of the values emerges? How do the numbers around zero work, vs numbers on the extremes? Are there applications to use your 8-bit floats instead of, say, fixed-point 8-bit representations? Bonus: why do some graphics processors have 16-bit floats?
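
A quick way to do that exercise is to enumerate all 256 patterns in a few lines of Python. This is only a sketch of one possible layout: the exponent bias of 3 is an assumption (the bias is a free design choice here), and, as the comment suggests, there is no implicit leading bit:

    from fractions import Fraction

    BIAS = 3  # assumed; pick whatever bias you like for the exercise

    def decode(byte):
        sign = -1 if byte & 0x80 else 1
        exp = (byte >> 4) & 0x7        # 3 exponent bits
        mant = byte & 0xF              # 4 mantissa bits, no hidden bit
        # exact value as a rational number
        return sign * Fraction(mant, 16) * Fraction(2) ** (exp - BIAS)

    values = sorted(set(decode(b) for b in range(256)))
    print(len(values), "distinct values")   # many patterns collide without a hidden bit
    positives = sorted(v for v in values if v > 0)
    print(positives[:6])                    # tightly spaced just above zero
    print(positives[-4:])                   # widely spaced near the top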


8-bit floating-point numbers, laid out much like your choice, have been used for digitally encoded telephone audio (PCM audio at an 8 kHz sampling rate). They came in two variants, mu-law and A-law, which differed in their treatment of denormals: A-law was like IEEE floats, while mu-law requires a longer explanation.

They are probably still used in various older communication equipment.

The purpose of using floating point was to reduce the quantization noise, compared with 8-bit fixed-point numbers, to a level corresponding to 12-bit or 13-bit fixed-point numbers.

So this was effectively a method of compressing the bit rate of voice signals by 50%.

Later, much better audio compression algorithms were developed, allowing e.g. 10x compression, and such algorithms are used in modern mobile phones.

Nevertheless, 8-bit floating-point was used for many decades in telephony.


The smallest float you can do that still shows all the patterns is 5-bit (2-bit exponent). If you had only 1 bit for the exponent, you'd go straight from denormal to infinity.


Modern AI hardware is starting to do 8-bit float.


True, it's a combo technical and political achievement -- an extremely rare feat.

Some complaints should be tempered, some others should be flattened into a bare acknowledgement that floating point is simply the wrong tool for the job, e.g. currency.

Maybe one "complaint" that remains is that floating point is too good and displaces progress in development and support for other number formats that are needed in their niche, like bfloat or fixed point.


> Some complaints should be tempered, some others should be flattened into a bare acknowledgement that floating point is simply the wrong tool for the job, e.g. currency.

IBM implemented IEEE 854 Radix-10 floating point (which later got subsumed back into IEEE 754) back in the System z9 days.

In fact, testing currency handling is the only thing that Bitcoin is actually useful for, LOL. If you take Bitcoin, you get smacked with a lot of fractional digits that break your currency system right away if you didn't do it right.


That's news to me: apparently 754 was updated in 2008 to include base-10 representations. I didn't even know IEEE standards had revisions at all; multiple mind blows here.

https://en.m.wikipedia.org/wiki/IEEE_854-1987


Decimal floats are also available in gcc

https://gcc.gnu.org/onlinedocs/gcc/Decimal-Float.html


good point re currency computations. Support for binary-coded decimal is something that deserves to be improved in modern languages (COBOL had it).
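
For a concrete taste of what decimal support in a modern language looks like, Python's decimal module (which broadly follows the IEEE 854 / 754-2008 decimal model rather than BCD, but serves the same purpose for currency) makes the difference obvious:

    from decimal import Decimal

    # Binary floats cannot represent 0.10 exactly, so cents drift:
    print(0.10 + 0.20)                      # 0.30000000000000004
    print(sum([0.10] * 10) == 1.00)         # False

    # Decimal arithmetic keeps currency amounts exact:
    print(Decimal("0.10") + Decimal("0.20"))               # 0.30
    print(sum([Decimal("0.10")] * 10) == Decimal("1.00"))  # True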


>the entire real number line

I'm not a mathematician, but perhaps one could confirm: this is wrong, no? Irrationals, e.g. pi and sqrt(2), cannot be represented. The floating-point numbers are like a finite number of teeth on a comb pressed against the continuous real line.


Exactly. Floating-point numbers cannot represent such numbers without error, yet we still see people asserting that floats represent "the entire real number line." The first format to represent such numbers honestly was the original unum format, where the last bit of the fraction indicates if the number is exact or represents the open interval between exact numbers. Like saying pi is 3.14... means pi is between 3.14 and 3.15, a mathematically honest statement. The presence or absence of "..." as a bit in the number was the main idea behind unum arithmetic.


Sure. Obviously, it is only a finite sampling of the real number line, and it is only sampling at a particular set of rational numbers. But it is a sampling with a density that is (roughly) scale invariant. That enables FP computations to have (roughly) the same precision regardless of the overall scaling (choice of units), and that underlies the illusion that you have the whole number line at your disposal when you do computations with FP numbers.
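
That rough scale invariance is easy to see directly (Python 3.9+ for math.ulp): the absolute spacing between adjacent doubles grows with magnitude, but the relative spacing stays near 2^-52 across an enormous range of scales:

    import math

    for x in [1e-10, 1e-3, 1.0, 1e3, 1e10, 1e100]:
        print(f"x={x:<8.0e} ulp={math.ulp(x):.3e} ulp/x={math.ulp(x) / x:.3e}")
    # ulp/x stays between roughly 1.1e-16 and 2.2e-16 for every x above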


I don't get the purpose of this comment. Yes, it's great that we have IEEE floats standardised across architectures. And yes, it's also true that IEEE floats have serious flaws and that in the intervening decades vastly superior alternatives have been proposed. How are these two assertions incompatible? Are we supposed to be forever thankful for the political and standardisation feat that we never again attempt to improve on a flawed design?


I wonder what the serious flaws of the IEEE floats are, in your opinion.

I am familiar with most of the floating-point number formats that have ever been used in computers and there is no doubt that as a general-purpose numeric format all the previous floating-point alternatives have been vastly inferior to the IEEE formats.

For special-purpose niches, usually when a low precision is good enough, it is indeed possible to use other numeric formats suitable for approximate numbers, e.g. logarithmic numbers, fixed-point numbers, low-precision FP numbers, unums/posits and a few others, which may have certain advantages over the IEEE FP numbers, e.g. a lower cost or higher speed for a given (low) precision. But even for those niches, using IEEE FP numbers is usually a decent alternative, not a vastly inferior one.


You mention posits, which are in almost every conceivable aspect superior to IEEE floats. If a similar standardising effort was pushed today we could have vastly better FP in a decade.


Unums and posits are ingenious and interesting.

For low precision applications posits may be a good choice, sometimes the best.

However it remains to be demonstrated that for high precisions they can be implemented in hardware with similar cost and speed as traditional floating-point numbers.

Regarding the actual advantages of posits, the papers of Gustafson et al. do not inspire much confidence, because besides some correct arguments about genuine advantages of posits, the papers are also full of BS claims, e.g. the claim that posits have the advantage of not generating NaNs.

They do not generate NaNs, because they generate an exception, which is exactly what the IEEE floats also do when you enable trap-on-undefined-operation.

There are also other very dubious claims. While it is very likely that for low precisions posits should be superior and that they might have been a much better alternative to the proliferation of various 16-bit FP formats for machine learning, I have yet to see a single example with posits behaving better than traditional FP formats at high precisions.

For low precisions, there is also the alternative of logarithmic numbers, which allow very fast arithmetic operations but which require look-up tables for addition and subtraction. It is not clear in which applications posits are preferable and in which applications logarithmic numbers are preferable.


>Low precision

This is not true. You base your whole comment on the notion that posits are only good for low precision? This is not correct, for instance 32-bit posits outperform 64-bit floats.

> e.g. the claim that posits have the advantage of not generating Nans. They do not generate NaNs, because they generate an exception

This is incorrect also. There are no exceptions in the posit proposal! The point is that e.g. a division by zero is simply a bug in the code, like violating any other ordinary invariant/precondition in ordinary code. If you want exceptions, turn on debugging in your compiler like you would turn on bounds checks on array access for instance. It's silly to require always-on debug assertions and maximum performance at the same time, which is what IEEE floats attempt and fail at.

Regarding NaN, the advantage in posits is that there is only one such value (unsigned infinity), rather than the quadrillions of NaN bit patterns in IEEE floats.

> I have yet to see a single example with posits behaving better than traditional FP formats at high precisions.

Again, then you cannot have read the posit proposal thoroughly x) they show that 32-bit posits outperform 64-bit floats in precision, sacrificing only some dynamic range, which is useless anyway.

Consider now that most HPC applications nowadays are I/O bound... switching to 32-bit posits increases precision and slashes memory requirements by half. It's immense.


Claiming that a 32-bit numeric format can outperform a 64-bit format is an example of a BS claim.

All 32-bit formats have the same number of points distributed over the real numbers, partitioning them into intervals.

The difference in the possible numeric formats is only in the position of the points, so the number of intervals is identical. When for a numeric format the intervals in a certain area are smaller, i.e. the precision is better, that means that in another area the intervals must be larger, so the precision must be worse. Which areas are more important depends on the problem that must be solved.

The posits are just floating-point numbers where the partitioning between the logarithmic part (the exponent) and the linear part (the fraction) is not fixed, but variable.

The posits close to 1 have more fraction bits, while the posits closer to 0 and to infinities have more exponent bits and fewer fraction bits.

For certain problems 32-bit posits can outperform 32-bit IEEE floats, but 32-bit posits cannot outperform any 64-bit numeric format, because the 64-bit format has billions of times more intervals, so a 32-bit format has no chance of approximating a number better.
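
The interval-counting point is easy to make concrete for the IEEE formats themselves (standard library only; this says nothing about posits specifically, it just shows the gap any 32-bit format has to close):

    import math, struct

    def nearest_binary32(x):
        """Round a double to the nearest IEEE binary32 value and back."""
        return struct.unpack("<f", struct.pack("<f", x))[0]

    x = math.pi                                # nearest binary64 to pi
    print(abs(nearest_binary32(x) - x) / x)    # ~2.8e-08: best any binary32 can do here
    print(math.ulp(x) / x)                     # ~1.4e-16: binary64 spacing near pi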

Posits can outperform standard floats only at low precision, because in the formats with few bits there are not enough bits to reserve for the exponent, so the exponent range is small, which makes overflows and underflows very likely.

Posits can have a much larger exponent range, while maintaining a good precision close to 1, paying their extended exponent range with reduced precision towards 0 and infinities, i.e. posits have a gradual underflow and overflow, which are more gradual than the IEEE gradual underflow.

At double precision or higher precisions, the IEEE floats have enough exponent range that underflows and overflows become very unlikely, so posits no longer have any advantage.

The standard floating-point numbers have an almost constant relative approximation error over their exponent range.

For most serious numerical work this is the approximation property that is desired. Posits have a variable relative approximation error that becomes worse and worse for small and large numbers. This is normally undesirable. Complex physical models always have both very small numbers and very large numbers that must be approximated well, even better than the numbers close to 1 that are preferred by posits.

Nevertheless for small floating-point numbers, e.g. 16-bit FP, avoiding overflows and underflows becomes more important than the loss of precision at exponent range extremities, so posits are better.

32 bits is around the threshold where posits transition from being better to being worse than standard FP numbers.

Depending on the problem, 32-bit floats or 32-bit posits may be better. On the other hand, I have never seen any problem where 64-bit posits, with their worse relative errors, could be better than IEEE double precision.


The original Stanford talk on posits suggested that they generate an exception and not the equivalent of a NaN. A few months later, I changed my mind and the (unique) NaR bit pattern serves the same purpose as NaN does in floats. We have also learned that the best exponent size (es or eS) is 2 bits, independent of the precision of the posit. So there have been some tweaks, but the basic concept is unchanged.


Many things are possible in our imagination. I'm grateful that in my actual world, today, we have working floating point.


Okay, I still don't get the purpose of your comment.

Imagine rust is being built and you reply

> Many things are possible in our imagination. I'm grateful in my actual world we have working BCPL.

:^)


It's an interesting topic and seems to have an interesting history; are there any good books on it? Something less academic and dry than I'd expect a floating-point book to be, but more all-encompassing and accessible?


"The End of Error: Unum Computing" is written for a popular audience, not fellow mathematicians. Only high school math is needed, and it's got plenty of humor and full-color illustrations and figures, in an attempt to make a very dry topic into something interesting.

A book on posit arithmetic is in the works.


Favorite quotes:

> members of the committee, for the most part, were about equally altruistic. IBM's Dr. Fred Ris was extremely supportive from the outset even though he knew that no IBM equipment in existence at the time had the slightest hope of conforming to the standard we were advocating. It was remarkable that so many hardware people there, knowing how difficult p754 would be, agreed that it should benefit the community at large. If it encouraged the production of floating-point software and eased the development of reliable software, it would help create a larger market for everyone's hardware. This degree of altruism was so astonishing that MATLAB's creator Dr. Cleve Moler used to advise foreign visitors not to miss the country's two most awesome spectacles: the Grand Canyon, and meetings of IEEE p754.

> In the usual standards meetings everybody wants to grandfather in his own product. I think it is nice to have at least one example -- IEEE 754 is one -- where sleaze did not triumph. CDC, Cray and IBM could have weighed in and destroyed the whole thing had they so wished. Perhaps CDC and Cray thought `Microprocessors? Why should we worry?' In the end, all computer systems designers must try to make our things work well for the innumerable ( innumerate ?) programmers upon whom we all depend for the existence of a burgeoning market.

> Epilog: The ACM's Turing award went to Kahan in 1989.


This is quite the whitewashing of something that was super contentious.

IEEE 754 is okay as a technical standard. I've seen more denormals as a place to stash data than I ever have as a computational result, but, meh.

However, the political reason wasn't that everybody was being magically altruistic. IBM and DEC, especially, were killing it and were in no way going to allow the other to set the standard. And everybody else was keen to stop IBM or DEC from having the de facto standard which would cement their dominance further.

For example, if I remember correctly, Cray arithmetic was notorious for never actually being anything compliant (something about their multipliers).


I believe that you are right, and we are very lucky for these historical circumstances, in which both IBM and DEC preferred to support the Intel floating-point number format rather than accept the format of their major competitor.

The Intel format, which is due to William Kahan, but also to Jerome Coonen and John Palmer and a few others with lesser contributions, was a huge improvement over the IBM and DEC floating-point formats.

I wrote my first programs when I was in high school, for some IBM mainframes and DEC PDP-11 computers. Then I used the Microsoft compilers for several languages, for Z80 / Intel 8080, which used FP formats similar to those of DEC.

The IBM and DEC formats were really ugly and writing programs for them included many pitfalls. When I began to use an IBM PC, with its much more foolproof FP format, all problems disappeared.

In general, all processors introduced by Intel from their beginning until the nineties (when their competition withered) lacked any innovative features. Every improvement in the early Intel CPUs had already been introduced earlier in CPUs from competitors.

To this lack of innovation in CPU architecture there is one very important exception: the IEEE 754 standard, which was based on the Intel format with very minor modifications.

The Intel FP number format was one of the most important events in the history of floating-point numbers, the only other events with similar importance were when IBM made the first computers with hardware FP units (IBM 704 and NORC, in 1954) and when IBM introduced fused multiply-add (IBM POWER, in 1990).


> The IBM and DEC formats were really ugly and writing programs for them included many pitfalls. When I began to use an IBM PC, with its much more foolproof FP format, all problems disappeared.

I remember the IBM format was based on hexadecimal, and was thus criticized for wobbling precision. I thought the DEC format was pretty similar to the Intel one. What problems did you have with it, that disappeared when you switched to the IBM PC?


Yeah, I don't remember the DEC format being poor, and some of the differences which did exist really felt like "make sure IEEE-754 is gratuitously incompatible with DEC".

Denormals/gradual underflow support in hardware was one of those, for example. DEC generally trapped on those and let a library handle it. Most of the time, gradual underflow bites people by hiding the precision loss rather than being something useful to exploit for speed.

Hardware support for it meant that people tended to "stash" information in there just like they do for NaN-boxing. AutoCAD, for example, was notorious for having a zillion denormals, none of which had anything to do with calculation.

The number of people who ever benefited from gradual underflow was ridiculously tiny while the number of people who suffered performance loss was huge. This was one of those tradeoffs that wasn't worth it for a very long time--only now that hardware is practically free does it not matter.


The DEC single-precision format was close to the IEEE format, the main difference being that it did not have a useful behavior if the exceptions were masked, so they had to be handled correctly.

On the other hand, the double-precision format, in order to reduce the cost in the hardware FPU, had the same exponent range as single-precision, not like the IEEE formats, which have greater exponent ranges in the longer formats.

The exponent range typical for single-precision FP formats is really too small for solving many problems of physical modelling and simulation.

With the DEC FP format, I ran into overflows very frequently, so I had to add various scale factors in many equations.

On IBM PCs, such overflows never appeared, so there was no need to waste time determining adequate scale factors. The precise reason floating-point numbers were introduced was to stop wasting time on scale factors, as is needed for computations with fixed-point numbers.


I note you said "DEC format". Were you perhaps on PDPs instead of VAXen?

I'm pretty sure that VAX had support for G_floating contemporaneously with the 8087 (I actually think it predates it but I can't prove it). I can see that MicroVAX had support for it back in 1985 which would seem to imply that mainline VAX had it probably for several years prior.


When it was launched, in October 1977, VAX supported only the "D" double-precision floating-point format, which was the same as in the previous DEC computers, like the PDP-11, and which had too small an exponent range.

In 1977 there was the first publication about the Intel floating-point format and the standardization work started soon after that.

The first devices that supported the Intel FP formats in hardware, which were later adopted as IEEE 754, were the AMD Am9512 in 1979 (second-sourced by Intel as the 8232), then the Intel 8087 in 1980. (At that time Intel and AMD were frequently partners; only several years later, when Intel began to make tons of money from the IBM PC, did they stop wanting to share that money with anyone, so they severed the links with AMD.)

Meanwhile the debates about the future standard continued, and DEC acknowledged that their "D" format had too small an exponent range, so they introduced the "G" format in VAX, which was similar, but not identical (different exponent bias), to the format proposed by Intel.

In 1978 DEC VAX did not have the "G" format and in 1982 it had the "G" format, so it was introduced between 1979 and 1981, possibly at about the same time as the Intel 8087, but in any case it was not completely compliant with the proposed standard and DEC still hoped that they might succeed in imposing their own FP formats.


A couple threads from way back:

An Interview with the Old Man of Floating-Point (1998) - https://news.ycombinator.com/item?id=7769303 - May 2014 (17 comments)

An Interview with the Old Man of Floating-Point (1998) - https://news.ycombinator.com/item?id=6656197 - Nov 2013 (21 comments)


One time a friend and I were having an animated conversation on the 7th floor of Soda Hall at Berkeley, and William Kahan came out and gave us a coupon for Sizzlers. I think that was his way of telling us to get the fuck out.


Random anecdote: When I built my first 386 box that had a socket for a 387, I was super eager to fill that socket because even then PC builders were the same as today... but realized there wasn't any software that I used which would utilize it (my QuickC C-compiler didn't even support it!) The first app I remember that used it was Excel. It wasn't till the 486 that commodity games started using it.


Byte magazine!


He taught numerical analysis at Berkeley, and though he was a great guy, I think he was waay too smart to be teaching undergrads... he'd go off on examples about literally every way that things like SVD could go wrong b/c of FP quirks, or how Matlab implements things incorrectly, etc.


As someone who has implemented a lot of low-level functions using all the tricks of floating-point math, I have very mixed thoughts on floating point. NaN and -0.0 both seem like aggressively bad ideas to me. I can totally see why it was believed at the time that they would be good, but if you want to do things right they just add a ton of special cases that slow everything down. IMO, it would have been much better if we got errors instead of NaN (like we do for integer division by zero).

That said, the ability to use double-double schemes to extend precision is wonderful and makes things much easier than they are in most of the Floating Point alternatives that have been proposed (eg Posits).
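
For readers who haven't seen it, the building block of those double-double schemes is an error-free transformation such as Knuth's TwoSum, which only works because IEEE addition is correctly rounded. A rough sketch (the function name is mine):

    def two_sum(a, b):
        """Return (s, e) with s = fl(a + b) and a + b = s + e exactly (Knuth's TwoSum)."""
        s = a + b
        bb = s - a
        e = (a - (s - bb)) + (b - bb)
        return s, e

    # The rounding error of 1.0 + 1e-20 is captured exactly in the low word:
    hi, lo = two_sum(1.0, 1e-20)
    print(hi, lo)    # 1.0 1e-20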


Anyone who does not like NaNs should remember that using NaNs is just an option, nobody forces you to use them.

It is enough to enable exception generation for undefined operations and it is guaranteed that no NaNs can appear in any results.

It turns out that most people think that it is too much work to write suitable exception handlers, so they prefer to mask all exceptions and deal with NaNs and negative zeros.

In practice this just means that you must be careful when you write conditional expressions that compare floating-point numbers, because the order relation becomes partial, so there are 14 possible relational operations instead of the 6 that suffice for a total order. You also must write the compared values in a way where the sign of a zero does not matter (in most expressions the sign of a zero does not influence the result; you have to make some effort to distinguish a negative zero from a positive zero).
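
A few lines of Python illustrate both points, the partial order and the (mostly) invisible sign of zero:

    import math

    nan = float("nan")

    # Every comparison involving NaN is false, so < and >= stop being negations:
    print(nan < 1.0, nan >= 1.0, nan == nan)     # False False False

    # The sign of zero is invisible to comparisons...
    print(-0.0 == 0.0)                           # True
    # ...and only shows up if you go looking for it:
    print(math.copysign(1.0, -0.0))              # -1.0
    print(math.atan2(0.0, -0.0))                 # 3.141592653589793 (vs 0.0 for +0.0)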


From my perspective of implementing algorithms it matters because if I want to follow the spec correctly, I have to deal with them. Even if you don't want to use them, you pay about 10% for pretty much any function that needs special NaN handling (which is most of them). Also, pre AVX-512, these checks make vectorized algorithms way harder to write efficiently.


I agree that if you are writing a library to be used by others, you cannot know how they will choose to configure their floating-point environment, so you are forced to handle all the possible cases in your code.


I'd heard of posits before but hadn't been motivated to learn about them until your comment; thanks. Reminds me of CIDR for IPv4


IMO, 8 and 16 bit posits are way better than the float equivalents, but for 32 and 64 bits, floating point math is easier to analyze.


And before our master Kahan, there was Pat H. Sterbenz. I still have my "Floating Point Computation" in the photocopied bound sheaf that was handed out to numerical analysis grad students at ASU in the late 80s. I learned an enormous amount about what digital computation means in the presence of algorithms in that class.

EDIT: I have an 8087 chip that always sits on my monitor base. Because Kahan.


Note that I have worked with chips designed in this century that did not implement denormals in hardware.


> I think it is nice to have at least one example -- IEEE 754 is one -- where sleaze did not triumph.

A != A if A is a NaN: that's pretty sleazy.

https://en.wikipedia.org/wiki/Law_of_identity


Why? All we know is that both are not-numbers. It's just a label for something that cannot usefully be further identified.

I'm pretty sure mathematicians can come up with many different not-numbers that have to share the same label, with a rather slim chance of being equal ...


They don't have to share the same label. 64-bit floats have roughly 2^53 different NaN bit patterns. Even Float16 has over 2000, which should be more than enough for all the indeterminate forms.
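
For the curious, stuffing a payload into a quiet NaN and reading it back is a few lines of struct twiddling. The helper names are made up, and whether the payload survives later arithmetic is another matter:

    import math, struct

    def nan_with_payload(payload):
        # quiet NaN: sign 0, exponent all ones, quiet bit set, payload in the low 51 bits
        bits = (0x7FF << 52) | (1 << 51) | (payload & ((1 << 51) - 1))
        return struct.unpack("<d", struct.pack("<Q", bits))[0]

    def payload_of(x):
        bits = struct.unpack("<Q", struct.pack("<d", x))[0]
        return bits & ((1 << 51) - 1)

    x = nan_with_payload(12345)
    print(math.isnan(x), payload_of(x))    # True 12345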


Collisions will come up. If you have some allocator for NaN, it will have to start reusing values. The allocator could be inefficient. How do you handle threads? Does each thread have its own NaN-allocating counter? What if threads communicate numeric results to each other and have NaN-counters close in value?

The underlying problem is that a system of calculation which propagates error symbols up the expression tree to indicate error is mixed with a system of Boolean calculation which has no such symbol. The bad calculation bubbles a NaN up to the level of the comparison. There, the not-a-number gets eaten and becomes a Boolean true or false --- rather than becoming not-a-truth and continuing to bubble up.

There is no satisfactory way to plug NaNs into an expression that produces a clean two-valued Boolean truth with no error indication. You must separately test for the NaN.


If X and Y are the same label, they should compare equal to satisfy the Law of Identity.

If you perform two calculations whose values come up as equal, but you didn't check whether both produced the same NaN, that is your problem.

Suppose you are looking for the result of the two calculations being unequal, and they produce NaN (or at least one of them does). That's also a false positive. There is no way to get around checking for NaN.


Numeric equality is not the same as bitwise equality. Use the right operation for your intention.


Nothing which cheerfully concludes that two bitwise-identical operands are different can be called equality.

The equality operation should only apply its own specific logic to a pair of operands which fail the bitwise test.

(That doesn't mean all bits have to be looked at, like if an object has padding bits that don't contribute to the value.)

In floating point, like IEEE 754, a positive and negative zero still compare equal; they fail the bitwise test, so then the numeric logic still concludes they are the same number.


>Nothing which cheerfully concludes that two bitwise-identical operands are different can be called equality.

You are asserting that as if it were some kind of law of nature, but I'm afraid that's just your personal misconception. The IEEE754 equality function is defined such that it can sometimes be false when applied to identical operands. There are very good reasons why that is the case, just like there are very good reasons why NULL = NULL is false in SQL.


> some kind of law of nature

See upthread: law of identity. That's even higher than nature.

> The IEEE754 equality function is defined such that it can sometimes be false when applied to identical operands.

Cool! So, OK, (1) don't promote that function into the fundamental equality operator of programming languages; put it in a library. (2) provide a real equality in parallel.

I should be able to do things like this:

   double x = find_number_in_list(list, number, not_found_nan);

   if (x == not_found_nan) { /* wasn't found */ }

I see you are referring to "identical operands"; that requires the intellectual basis of an identity relation under which entities are identical to themselves. Otherwise there is no foundation for your sentence.

We can have all sorts of weird functions whose properties are useful insofar that they remove complexity or verbosity from specific scenarios in which they are used.

For instance, it's useful to have a numeric equality operator which tests that two values are within a small epsilon of each other and reports true. That's not even an equivalence relation though; near(A, B) and near(B, C) do not imply near(A, C).

It would be pretty irresponsible to put this behavior into the principal equality, like the one and only built-in == operator of a language or what have you.

> there are very good reasons why NULL = NULL is false in SQL.

I don't know much about SQL, but I understand why, in a join of two tables based on some field equivalence, you wouldn't want to include the entries where the fields are NULL.

However, there is no conflict here with the Law of Identity.

In a database, a NULL field is an external representation stored in a table. We can define a model of computation whereby when the database record is internalized for processing, each NULL field value maps to a unique object. This is very similar to the concept of an uninterned symbol in Common Lisp:

   (eq '#:null '#:null) -> NIL
The #: syntax, whenever scanned, creates a new symbol object. There are two different objects in the above eq call which only look the same when printed, because they have the same name. This idea could be used to implement null database field values.

However, we can retain the idea that if we pull a null value from a specific field from a specific record, that null value is equal to itself. We do not have to throw the Law of Identity under the bus, in other words:

    (let ((sym '#:null)) (eq sym sym)) -> T

If we do the following in SQL (pardon me if I don't have the syntax right), I expect the table to be replicated in the selection:

    select * from X where X.Y = X.Y

Records where Y is NULL should not be missing. If A and B are different records in this X table, such that A.Y and B.Y are NULL, then A.Y = B.Y can be false; that is perfectly okay.


No one was talking about bitwise equality though. Bitwise equality means positive and negative zero do not compare as equal, and some NaNs compare as equal while others don't, in ways that will vary according to exactly where those NaNs came from. There may be times where that might be useful but it would be uncommon.


> A != A if A is a NaN

Among other things, this is super useful insofar as it gives a reliable way to test for NaN.
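
In practice you'd call math.isnan, but the self-inequality is exactly why the classic portable test works:

    def is_nan(x):
        # NaN is the only float value for which x != x holds
        return x != x

    print(is_nan(float("nan")), is_nan(1.0))   # True False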


Usable and reliable is not the same thing as good design or good idea.

The fact that Windows reserves the PRN file name in every directory is probably usable and reliable to someone. That doesn't mean it's a good design for providing access to a printer device.

To test a set membership property of an object, you want a predicate function which takes the object as its only argument. For instance isnan(x).

This predicate can be efficiently implemented without relying on a bastardized equality operation.


Consider that the compiler cannot optimize A != A into false, or A == A into true, because NaN values can occur at run time.

While you might not explicitly write A == A into your code, it could occur implicitly due to some macro expansion, inline expansion or other code transformation.

I think GCC with -ffast-math gets rid of this NaN rule and does such optimizations anyway. (Your code just has to avoid generating NaNs so that the optimizations are valid.)


What would you do? The comparison has no meaning if NaN.


If it has no meaning, then the comparison should have an unspecified result, not true. Otherwise it has a meaning: the meaning of producing true! However, it is a poorly considered meaning which requires a thing to be different from itself.

Since NaN values are valid representations which play a role in the system, and can be used in operations (such as comparing a NaN to a number, which is false), each of them must compare equal to itself.

If the bits on the left are the same as the bits on the right, the comparison is true. Distinct NaN bit patterns are unequal. Simple as that.

Whatever I would do, I would make sure that a comparison observes the Law of Identity.

(I'd rather not have Inf and NaN at all; operations should just generate an exception if they can't come up with a number.)


You have a fairly complex logic being stuffed into a binary operation — NaN == NaN and NaN != NaN are both irresponsible. The same with comparing to INF. The correct answer is that boolean operations don’t successfully represent the possibilities, and shouldn’t be offered in the first place.

It’s the same with NULL in SQL.


That is a valid view. NaN is supposed to propagate an error value, and that concept should continue through Boolean expressions. So that is to say, there has to be a NaT (not a truth) value which results instead of true or false, if a NaN is involved in a relational expression.

Problem is, that is impractical. Programming languages tend to have two-valued Booleans baked into their DNA; it's implicit in if/then/else conditionals, which will have to treat NaT as false --- back to square one.

Programming languages with two-valued Booleans are not going to accommodate such a thing (it is not as easy to sneak in as NaN into floating-point). Even if they were to, programmers are going to be reluctant to turn every if/then situation into a three-way switch.


Besides the case of floating-point numbers with NaNs there are many other cases of partial order relations.

The problem is that most people know how to handle only total order relations, for which only 6 operations with Boolean results can be defined (equal and not-equal, less and greater-or-equal, greater and less-or-equal).

While it is possible to handle partial orders using ternary logic, it is easier to handle them with operations with Boolean results, so this is what all programming languages either provide or they should provide.

The difference is that for partial orders you no longer have only 6 operations with Boolean results (3 plus their negations), but you have 14 operations (7 + their negations).

One operation pair is ordered / unordered (ordered means either equal or less or greater).

The other 6 pairs correspond to the 6 well known relational operators from the total order, which now no longer are each other's negation, together with their 6 negations, which now include the possibility that the 2 operands are unordered.

For example, corresponding to the negation pair less and greater-or-equal from the total order, for a partial order there are 2 negation pairs, less and not-less (i.e. greater, equal or unordered) and greater-or-equal and neither-greater-nor-equal (i.e. less-or-unordered).

In the education of programmers there should be more stress on the relational operators possible for partial order relations, because partial orders appear in many situations, both in FP computations and in databases. Handling them is only slightly more complex than handling total orders, but many people are not accustomed to it.
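
A small sketch of that last negation pair (the predicate name is made up): with a NaN operand, "not less" is no longer the same thing as "greater or equal":

    import math

    def unordered(a, b):
        # the operands cannot be ordered if either is a NaN
        return math.isnan(a) or math.isnan(b)

    a, b = float("nan"), 1.0
    print(a < b)            # False
    print(a >= b)           # False -- not the negation of the line above
    print(not (a < b))      # True  -- "greater, equal, or unordered"
    print(unordered(a, b))  # True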


Do you also think NULL should equal NULL in SQL?

Read that Chesterton quote somebody posted above. It sounds like you need to study the standard a bit more, because you don’t seem to understand why NaNs are the way they are. You’re arguing from principles that don’t apply.


I don't know enough to say; all I remember from the few interactions I've had with SQL over my long programming career is that it's intellectually unsavory as a whole.

I do think NIL should be EQ to NIL in Lisp.


You must like dividing by zero and never knowing about it. There's a reason why NaN blows things up. It's by design so that math errors don't propagate everywhere.


No; I want an exception. The software image dies, unless it's handled.


I hope I'm not too late to the party to correct some things I see here. The big accomplishment of Kahan and IEEE 754 was to get companies to agree on where the sign, exponent, and fraction should go, so that data interchange finally became possible between different computer brands.

Kahan wanted decimal floats, not binary, and he wanted Extended Precision to be 128, not 80. I've had many hours of conversation with the man about how Intel railroaded that Standard to express the design decisions that had already been made for the i8087 coprocessor. John Palmer, who I also worked with for years, was proud of this, and told me "Whatever the i8087 is, THAT is the IEEE Standard."

Posits have a single exception value, Not a Real (NaR) for all things that fall through the protections of C and Java and all the other modern languages for things like division by zero, and the square root of a negative value. Kahan wanted the quadrillions of Not a Number (NaN) patterns to be used to encode the address of the instruction in the program to pinpoint where it happened, but the support for this in computing languages never happened. By around 2005, vendors noticed they could trap the exceptions and spend hundreds of clock cycles handling them with microcode or software, so the FLOPS claims only applied to normal floats, not subnormals or NaN or infinities, etc. This is true today for all x86 and ARM processors, and SPARC for that matter. Only the POWER series from IBM can still claim to support IEEE 754 in hardware; hardware support for IEEE 754 is all but extinct.

There are over a hundred papers published comparing posits and floats, both for accuracy on applications and difficulty of implementation. LLNL and Oxford U have definitively shown that posits are much more accurate than floats on a range of applications, so much so that a lower (power-of-two) precision can be used. Like 32-bit posits instead of 64-bit floats for shock hydrodynamics, and 16-bit posits instead of 32-bit floats for climate and weather prediction. For signal processing, 16-bit posits are about 10 dB more accurate (less noise) than 16-bit floats, which means they can perform lossless Fast Fourier Transforms (FFTs) on data from 12-bit A-to-D converters.

For the same precision, posit hardware add/subtract units appear slightly more expensive than float add/subtract, and multiplier units are slightly cheaper for posits than for floats. This echoes what was found comparing the speed of the Berkeley SoftFloat emulator with that of Cerlane Leong's SoftPosit emulator. Naive studies say posits are more expensive because they first decode the posit into float subfields, apply time-honored float algorithms, then re-encode the subfields into posit format. This does not exploit the perfect mapping of posits to 2's complement integers.

Float comparison hardware is quite complicated and expensive because there are redundant representations like –0 and +0 that have to test as equal, and redundant NaN exceptions that have to test as not equal even when their bit patterns are identical. Posit comparison hardware is unnecessary because they test exactly the same way as 2's complement integers. NaR is the 2's complement integer that has no absolute value and cannot be negated, 1000...000 in binary. It is equal to itself and less than any real-valued posit.

The name is NaR because IEEE 754 incorrectly states that imaginary numbers are not numbers, and sqrt(–1) returns NaN. The Posit Standard is more careful to say that it is not a _real_.

The Posit Standard is up to Version 4.13 and close to full approval by its Working Group. Don't use any Version 3 or earlier. The one on posithub.org may be out of date. In Version 4, the number of eS bits was fixed at 2, greatly simplifying conversions between different precisions. Unlike floats, posit precision can be changed simply by appending bits or rounding them off, without any need to decode the fraction and the scaling. It's like changing a 16-bit integer to a 32-bit integer; it costs next to nothing, which really helps people right-size the precision they're using.
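
The two's-complement ordering claim is easy to check with a toy decoder. This is a rough sketch of an 8-bit posit with eS = 2 (per the newer draft described above), written only to verify the ordering property; it is not a conformant implementation of the Standard:

    ES, USEED_EXP = 2, 4    # eS = 2, so useed = 2**(2**eS) = 2**4

    def posit8_to_float(p):
        """Decode an 8-bit posit (eS = 2) to a Python float. Sketch only."""
        if p == 0:
            return 0.0
        if p == 0x80:
            return float("nan")          # NaR
        sign = -1.0 if p & 0x80 else 1.0
        if p & 0x80:
            p = (-p) & 0xFF              # two's-complement negate for negative posits
        bits = format(p, "08b")[1:]      # the 7 bits after the sign bit
        r0 = bits[0]                     # regime: run of identical bits
        run = len(bits) - len(bits.lstrip(r0))
        regime = run - 1 if r0 == "1" else -run
        rest = bits[run + 1:]            # skip the regime-terminating bit
        exp_bits = rest[:ES]             # truncated exponent bits count as zero
        exponent = int(exp_bits, 2) << (ES - len(exp_bits)) if exp_bits else 0
        frac_bits = rest[ES:]
        fraction = 1.0 + (int(frac_bits, 2) / (1 << len(frac_bits)) if frac_bits else 0.0)
        return sign * fraction * 2.0 ** (USEED_EXP * regime + exponent)

    # Sort the 256 bit patterns as signed bytes; the decoded values must then be
    # strictly increasing, with NaR (0x80) as the most negative pattern.
    signed = sorted(range(256), key=lambda b: b - 256 if b >= 128 else b)
    vals = [posit8_to_float(b) for b in signed]
    reals = [v for v in vals if v == v]            # drop NaR (the only v != v case)
    print(all(x < y for x, y in zip(reals, reals[1:])))   # True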



