I met William Kahan about 15 years ago and asked him about this. His response: "If your program says double then yes, of course it's a bug if the compiler does long-double precision arithmetic instead. Isn't that obvious? Why are you even asking this?"
If the author of the IEEE 754 standard says that the way you do floating-point arithmetic is obviously a bug, you're probably doing it wrong.
> If the author of the IEEE 754 standard says that the way you do floating-point arithmetic is obviously a bug, you're probably doing it wrong.
That's interesting, because it seems to contradict his attitude toward FLT_EVAL_METHOD.
Kahan is a strong supporter of FLT_EVAL_METHOD==2 (in fact, he's quite possibly the single person most responsible for its existence), and FLT_EVAL_METHOD==2 has been roundly rejected by compilers on newer architectures for being totally unintuitive. (For those who don't know, FLT_EVAL_METHOD==2 means that "x∗y+z" can have a different result from "double tmp = x∗y; tmp + z", even if x, y, and z are doubles. It's a very similar situation to the bug, just with expression temporaries vs. explicit storage locations instead of register-memory spills.)
In particular, Kahan wrote a widely-shared, hyperbolic essay called "Why Java Floating Point Hurts Everyone Everywhere" (which is still sometimes shared around today as a way to attack Java, despite being obsolete) castigating Java for having FLT_EVAL_METHOD==0. Java added the little-used "strictfp" in response, but history proved Java's original behavior right—with the advent of SSE on x86-64, GCC switched to FLT_EVAL_METHOD==0 by default, matching Java.
Besides, Kahan's x87 FPU design is arguably responsible for this bug to begin with. The 80-bit FP register stack design makes sense when you consider ancient C compilers that didn't keep variables in registers unless explicitly told to do so. In that case, the register stack allows for a simple implementation of FLT_EVAL_METHOD==2. But if assignment to a variable doesn't result in a spill (as in virtually any non-toy compiler since 1990), this becomes awkward, because register spills no longer coincide with those evaluation rules. This is just a design mistake in the x87 FPU, and it's a large part of the reason why we have scalar FP math in SSE.
I did a quick reread of "Why Java Floating Point Hurts Everyone Everywhere" and I can't see FLT_EVAL_METHOD mentioned anywhere. What I see is Kahan arguing that Java should let him use 80 bits if he wants to, and that float arithmetic without explicit notation to the contrary should always be done at least in double, "as in K&R C." Basically his argument is: don't limit what I can program to your strange subset of what's already in the standard and in the hardware, and when the programmer doesn't specify explicit behavior, provide defaults that are numerically safer. In the case of the sample program, that can't mean "one sum is different from another sum" when both times a double is the explicit result of the sum operations. It's not a question of "should it spill or not"; the compiler has to respect the size the programmer specified.
P.S. answer to pcwalton's post below: no, it's certainly not "pretty much" "hey, you lose precision when you evaluate in fewer bits." It's: let me do the full IEEE standard in Java. It's: if the programmer doesn't specify the sizes of the calculations in the expressions, do those calculations with the biggest sizes (but predictably!), and if he does specify, respect what he specified(1). In the sample program we're discussing, the doubles as results are explicit. Those doubles can then be compared even in 80 bits and they will still be the same. It's not a "spill"; it's an explicit cast that should happen.
1) "For example, tan(x) should deliver a double result no matter whether x is float or double, but tan((float)x) should deliver a float result provided a suitable tan method is available. Thus, a programmer who gives no thought to the question gets the safer default." This is obviously not "hey, you lose precision when you evaluate in fewer bits"; he wants to be able to choose.
FLT_EVAL_METHOD is the modern term. What he calls the Java semantics we now call FLT_EVAL_METHOD==0; what he calls K&R C (evaluate everything in double) is FLT_EVAL_METHOD==1; and the x87-style extended-precision evaluation he champions is FLT_EVAL_METHOD==2. Contrary to that essay, for years now C on the most popular architectures has followed the Java semantics by default.
His argument is pretty much "hey, you lose precision when you evaluate in fewer bits". Well, yeah, of course you do. The obvious counterargument is "if you need long double, write 'long double'; don't count on the compiler to automatically widen in expressions". The minor precision benefits in code that was fragile to begin with if written in C are by no means worth the giant refactoring hazard of "introducing temporaries can break your code".
> It's: if the programmer doesn't specify what the sizes of the calculations in the expressions are, do these calculations with the biggest sizes (but predictably!), and if he specifies, respect what he specified(1).
The programmer is specifying. When you say "tan(x)" with x as a float, then most programmers would expect to get a float out. In Haskell speak, people expect the type of "tan" to be "Floating a => a -> a" (which is, unsurprisingly, exactly what the type of "tan" is in Haskell [1]). Why? Because it's consistent with the rest of the language: everyone knows that x/y with x and y both ints yields an int, not a double. Likewise, x/y with x and y both floats intuitively yields a float, not a double (and with FLT_EVAL_METHOD==0, the norm in most cases nowadays, this is in fact how it works).
Saying that FLT_EVAL_METHOD==0 robs you of the ability to choose to evaluate in high precision makes no more sense than saying that the way that x/y truncates the remainder robs you of the ability to do floating point division. As all C programmers know, if you want a fractional result from integer division, you write (double)x/y. Likewise, if you want the double-precision tangent of the single-precision x, you would expect to write tan((double)x).
There is a slightly stronger argument that Java doesn't expose access to the 80-bit precision in the x86 FPU. That's true. But the right solution to that is to introduce "long double" in Java (which is what C did), not to complicate the semantics. An implementation of 80-bit floating point that doesn't allow you to store an 80-bit value in a variable is pretty much a toy implementation no matter what. You will need first-class "long double" in order to claim to have true 80-bit support, and once you have it, you have no need for FLT_EVAL_METHOD==2.
> Saying that FLT_EVAL_METHOD==0 robs you of the ability to choose
Kahan never defined something like FLT_EVAL_METHOD==0; he wrote the PDF you linked to, the PDF doesn't contain claims in the form you say it does, and I'm not making them either. He says in his suggestion to "fix" Java, among other things: calculate the partial results which aren't specified explicitly to higher precision, and respect the target sizes as the programmer specified them. That is not what the example program produced with the -O flag. And I don't see anywhere that what Kahan suggested is equivalent to
"FLT_EVAL_METHOD==2: all operations and constants evaluate in the range and precision of long double. Additionally, both float_t and double_t are equivalent to long double"
He never mentions anything like "float_t" and "double_t" there, as far as I saw. He mentions K&R C, which never had any of these, and where float x = something meant x is 32 bits, tan( x ) meant pass the 32-bit x to the double tan and return a double, and float y = tan( x ) meant store the double returned from tan as 32 bits.
The program in the example that started the discussion is explicitly producing two double sums (that is, that's what the programmer wrote) so it should never compare one 80-bit and one 64-bit sum. That behavior is surely not something that you can find in Kahan's suggestions.
Please read Kahan's document once more, carefully, to see whether he writes what you believe he writes (you keep referring to a definition he didn't make, and he explicitly knew that not "everything should be long double"). I claim he doesn't.
> An implementation of 80-bit floating point that doesn't allow you to store an 80-bit value in a variable is pretty much a toy implementation no matter what.
x87 instructions have it: FSTP m80fp and Kahan wanted that in the higher level languages too (pg. 77).
"Names for primitive floating-point types or for Borneo floating-point classes:"
"long double = 10+-byte IEEE 754 Double Extended with at least 64 sig. bits etc."
He also writes (pg. 43 of his PDF):
"Of course explicit casts and assignments to a narrower precision must round superfluous digits away as the programmer directs"
And on page 80, regarding false arguments:
"At first sight, a frequently occurring assignment X = Y¤Z involving floats X, Y, Z in just one algebraic operation ¤ appears to require that Y and Z be converted to double, and that Y¤Z be computed and rounded to double, and then rounded again to float to be stored in X. The same result X (and the same exceptions if any) are obtained sooner by rounding Y¤Z to float directly." That is, the standard and the x87 are already designed so that this can be done directly.
> "calculate the partial results which aren't specified explicitly to higher precision"
I understand the paper. That is exactly what I'm arguing is the wrong behavior, and C compilers have stopped doing it on popular hardware too.
"Partial results which aren't specified explicitly" is really misleading terminology in a typed language. Given int x and int y, the type of "x/y" is specified explicitly given the typing rules of the language. And that's why programmers rightly expect that "int x = 5, y = 2; double z = x/y;" yields z=2.0, not z=2.5.
> "At first sight, a frequently occurring assignment X = Y¤Z involving floats X, Y, Z in just one algebraic operation ¤ appears to require that Y and Z be converted to double, and that Y¤Z be computed and rounded to double, and then rounded again to float to be stored in X . The same result X ( and the same exceptions if any ) are obtained sooner by rounding Y¤Z to float directly."
Yes, the unintuitive FLT_EVAL_METHOD==2 semantics can be optimized into the less confusing FLT_EVAL_METHOD==0 semantics in many cases. That doesn't change the fact that they're confusing semantics.
> Given int x and int y, the type of "x/y" is specified explicitly given the typing rules of the language. And that's why programmers rightly expect that "int x = 5, y = 2; double z = x/y;" yields z=2.0, not z=2.5
Red herring. Kahan advocated the approach of K&R C. K&R C didn't perform the division of two integers in floating point. Your example would be 2 in K&R C too.
> the unintuitive FLT_EVAL_METHOD==2 semantics can be optimized into the less confusing FLT_EVAL_METHOD==0 semantics in many cases.
The example given is not "optimization" at all, it's required by the definition of IEEE 754, mostly designed by Kahan.
> That doesn't change the fact that they're confusing semantics.
I see that as confusion from you repeatedly misdirecting the discussion to the "FLT_EVAL_METHOD" which is neither what Kahan wrote in his Java paper, nor what's in IEEE 754 and even less in K&R C.
Disclaimer: I actually studied IEEE 754 shortly after it was standardized and followed Kahan's work afterwards. The first C I learned was K&R C, directly from their first book. I've implemented some hardware doing IEEE 754 operations completely to the letter of the standard (to the last bit, rounding direction, and trap; the control of rounding directions and "meaningful" trap handling is also what Kahan complained that Java didn't do), and I've implemented some compilers that targeted the x87.
> Red herring. Kahan advocated the approach of K&R C. K&R C didn't perform the division of two integers in floating point. Your example would be 2 in K&R C too.
You missed the point. Kahan's proposed semantics are inconsistent with what programmers expect. The 5/2 example is intended to show why programmers expect Java-like semantics.
> The example given is not "optimization" at all, it's required by the definition of IEEE 754, mostly designed by Kahan.
How does an optimization being guaranteed to be correct (which all optimizations sure ought to be!) make it not an optimization?
> I see that as confusion from you repeatedly misdirecting the discussion to the "FLT_EVAL_METHOD" which is neither what Kahan wrote in his Java paper, nor what's in IEEE 754 and even less in K&R C.
We're talking past each other at this point. FLT_EVAL_METHOD is precisely what Kahan is talking about in the Java paper: it is the term introduced in the C99 spec for the precision that intermediate floating-point computations are performed in. The difference between Java and K&R C is that Java uses FLT_EVAL_METHOD==0, while K&R C uses FLT_EVAL_METHOD==1. I don't know how to make that clearer.
I can imagine that is "confusing" to you, but "what programmers expect" is not what the people who talk about "spills" (like you did) expect. I understand where you come from when you mention them. But spills etc. are all irrelevant to how we started this discussion:
When cperciva stated that Kahan said the behavior of the given example (the code where two sums of some numbers are stored in doubles and compared for equality) should never give "false", you wrote "it seems to contradict his attitude ..." and your "proof" was Kahan's Java paper.
I showed that there's nothing in Kahan's Java paper or in K&R that would support such behavior. I quoted explicit lines from Kahan's paper which explicitly say that evaluation to doubles should properly round away the bits that don't fit. And you continued to claim "but, FLT_EVAL_METHOD" anyway.
> We're talking past each other at this point.
I agree about that. You also gave the false example 5/2, and you now explain it was your "intention", even though it matches neither K&R nor what Kahan argued. You also use the name "optimization" for a basic design property of IEEE 754 operations, probably the best feature of IEEE 754 compared to all the "common practices" of the time it was designed. I'd be glad to explain to you what it is, but you'd actually have to make the effort to understand the design principles of IEEE 754. If you're really interested, please post some place where we could start a new thread for that topic; I won't post more in this thread. There we can also discuss, if you're really interested, what Kahan actually says in the paper you posted.
The integer example is not a red herring -- it's an argument for consistency, by comparison with an isomorphic evaluation semantics.
If you look at it closely, though, you might find material for your position in integer promotion. But I think you're missing the point of what is being argued vis-à-vis refactoring of expression trees changing calculation results.
> the point of what is being argued vis-à-vis refactoring of expression trees changing calculation results
That was not the topic that started the discussion, and I never discussed it here. Check for yourself. It started with pcwalton's (I simplify) "the bug is because of Kahan, see his Java paper", and I showed the opposite: whoever reads Kahan's paper can see that, following his suggestions in the paper, the bug can only be a bug. I also cited the exact pages.
I know there are a lot of people who would like to handle FP arithmetic identically to integer arithmetic, but that can't work, by the definition of FP. And please see my other post here -- it's surely time to continue the discussion somewhere else.
I have also met Kahan. He gives new meaning to the word "pedantic".
The reality is that numerical analysts want the computer to do exactly what they tell it to, in precisely the order described. The rest of the human race wants approximately the right answer reasonably quickly. (Floating point is a bad choice if you want exactly the right answer.) And the people who make purchase decisions read benchmark reports they don't understand.
> The rest of the human race wants approximately the right answer reasonably quickly
But numerical analysts really require being able to specify the semantics of calculations precisely in order to write libraries that mere mortals can use to get the approximate right answer.
A quick perusal of any numerical analysis textbook should convince you that order of evaluation, order of truncation, overflow modes, and the like are very relevant for being able to write a library that has the right order of error.
Something like Kahan summation will have vastly different results depending on whether intermediate calculations are done at a higher precision or whether the compiler is allowed to reorder expressions. In fact, an overly aggressive optimizing compiler will make Kahan summation simply fail [1].
If all you're doing is adding a few numbers together, by all means, you don't need or want to think about precise semantics. But if you've ever used BLAS or Numpy or any other math package, you very much depend on languages and compilers being pedantic as possible in spelling out how floating point gets evaluated.
>But numerical analysts really require being able to specify the semantics of calculations precisely in order to write libraries that mere mortals can use to get the approximate right answer.
>A quick perusal of any numerical analysis textbook should convince you that order of evaluation, order of truncation, overflow modes, and the like are very relevant for being able to write a library that has the right order of error.
I see your point. I've done more than peruse numerical analysis textbooks, and as an old-school CPU logic designer, I've forgotten more about implementing floating point than most programmers know to start with.
But your point is not in conflict with mine. Numerical analysis requires precise operation ordering. Most people not only don't care, they don't have sufficient understanding of the issues that you could even convince them to care.
Actually, BLAS implementations usually do guarantee error bounds, but the implementation is almost always tailored for speed over precision. That said, you can always opt for extended precision at the cost of performance.
Interesting. This reminds me of a bug I had to track down when rewriting a bunch of Java image analysis code in C# (yeah I would have preferred neither language for this sort of thing, but that's a different story.)
Java has a flag which lets you specify how 32-bit floating-point arithmetic happens (strictfp). C# does not, and was doing extended-precision math on floats (80-bit registers) and then truncating the result. This led to small differences in feature-extraction output.
And I believe he's right, if he was asked specifically what that program should do. If all the statements in the program are really just:
#include <stdio.h>

void test(double x, double y)
{
    const double y2 = x + 1.0;
    if (y != y2) printf("error\n");
}

int main(void)
{
    const double x = .012;
    const double y = x + 1.0;
    test(x, y);
    return 0;
}
Then the comparison has to be done on doubles, and the "but it's extended precision internally" mentioned in some answers is the wrong answer. All the variables in the program are declared as doubles. If I were writing the x86 and x87 asm instructions myself, I could easily write them so that they work as expected, even without SSE2.
However, if you compile with some "special" optimizations that are documented to relax specific code-generation rules, then it can be OK for those specific optimizations to behave differently; but then you asked for it by choosing that documented optimization.
In the answers there, Richard Henderson wrote 2001-01-16 06:30:04 UTC:
"You are seeing the results of excess precision in the FPU. Either change the rounding precision in the FPCR, or work around the problem with -ffloat-store."
He's also right, provided -ffloat-store is documented to be turned off by -O, and provided it's documented that the flag also affects calculations precomputed at compile time. If that second detail is not documented, the results could be unexpected, but again it depends on what you expect from the optimizer. A properly written compiler and optimizer should perform calculations during code generation correctly to the definition and not cut corners. If the generated code actually performs the != at run time, then the compile-time calculations are irrelevant; but the sane default should be something like -ffloat-store when not using SSE2, even with -O, if -O has the semantics "optimizations that should not produce really unexpected effects." It sounds complicated, but it has its reasons.
P.S. answer to nsajko's post below: I mention -ffloat-store, not -ffast-math. Please read the description of -ffloat-store.
Yes, that's the correct optimization. That code obviously can't execute printf("error\n") to manifest the bug, and obviously both additions were done at compile time, producing results that are equal.