
This is a rant we came across in the early design of Rust and eventually decided was pretty misguided. Very few languages actually address these complaints.

The biggest complaint here is that, in Java, when you perform arithmetic on 32-bit floats you perform 32-bit arithmetic with 32-bit intermediate results. Java's behavior is, in other words, what you'd expect. In C, though, if you perform arithmetic on 32-bit floats you're actually performing 64-bit arithmetic on 64-bit intermediate results (at least most of the time; it's been a while since I consulted the rules). Java's behavior (which basically amounts to doing what you asked to do) infuriated the author, so he gave a bunch of examples of scientific calculations that need 64 bits of precision. But that's totally unconvincing to me: if you want 64-bit floating point precision, use doubles. C's implicit conversions in general have caused more harm than good and what the author is complaining about is having to tell the compiler to do 64-bit arithmetic on 32-bit values instead of having it follow complex promotion rules that few programmers even know about.
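To make the 32-bit versus 64-bit intermediate distinction concrete, here is a minimal Rust sketch (not taken from the paper or from this thread). The function names `narrow` and `wide` are made up for illustration, and the input 2^24 is chosen because 2^24 + 1 cannot be represented in an f32.

```rust
/// Evaluates (a + b) - a entirely in 32-bit arithmetic.
fn narrow(a: f32, b: f32) -> f32 {
    (a + b) - a
}

/// Widens both operands to f64, evaluates, then rounds once back to f32.
fn wide(a: f32, b: f32) -> f32 {
    ((a as f64 + b as f64) - a as f64) as f32
}

fn main() {
    let a = 16_777_216.0_f32; // 2^24: a + 1 is not representable in f32
    let b = 1.0_f32;
    println!("narrow = {}", narrow(a, b)); // prints "narrow = 0"
    println!("wide   = {}", wide(a, b)); // prints "wide   = 1"
}
```

With 32-bit intermediates the added 1.0 is absorbed and lost; with 64-bit intermediates it survives until the final rounding.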

The biggest problem is that Java's designers ignored the numerical aspects of the language almost completely. IEEE 754 is an international standard for FPU calculations that has been implemented on practically all relevant CPUs for at least 20 years.

Java designers practically ignored the standard since they obviously didn't understand it or decided that serious computations will never be important for their target group. A lot of people commenting here don't understand it either. OK, a lot of people don't need it. But those that need such calculations simply can't implement decent numerical calculations in Java.

So the next time somebody claims that "Java is comparably fast to C" remember that it can be true only for some of the algorithms.

If you ignore IEEE 754 for Rust you're creating one more language that will not be "as good as C." And here I don't mean "C as defined by language lawyers" but "C as implemented in the compilers to at least minimally support the use of the underlying hardware for those who need that."

And those who need that are the people who use computers for computations. Don't ignore them. Or at least don't be surprised that they stick to old Fortran and C compilers.

Sure, I agree we shouldn't neglect the needs of scientific computing. But we need concrete suggestions. It's not our intent to "neglect IEEE 754" (Graydon Hoare was on that committee!)

I haven't seen any concrete reason that eliminating implicit floating point promotion affects scientific code. Just use doubles if you want 64-bit precision. We match the underlying LLVM IR much more closely than C does.


C didn't have "in a language standard" IEEE support for a while, but it had in practice. C99 had it standardized, and it's not "just libraries." If you make a serious language, at least that's what you shouldn't ignore.

http://en.wikipedia.org/wiki/C99 (IEEE 754 floating point support)

Still, practice is even more important than standardization. In the compilers I used, I actually had to do "bit tricks" for some manipulations of doubles and floats. The last time I checked, even initializing variables to the desired values was something where you actually "danced" around the standard. It works and it's needed, even if you won't find it in the standards. For an example of what people actually have to use and what C "allows," see:


If you make a language that is "nice theoretically" but doesn't allow you to use the hardware, it will remain just "nice theoretically."

Edit: regarding "just use doubles": it's an engineering trade-off. For a lot of uses you want to read from and store to floats but do all the calculations in doubles. Only calculations that are specifically designed to use floats as partial results should be done in floats. The language should make this pattern easy. Computation simply has a different logic than "classes" and "theoretical types."
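A rough Rust sketch of the "store as float, compute in double" pattern described above. The function names `sum_narrow` and `sum_wide` and the test data are assumptions made for illustration, not anything from the thread.

```rust
/// Sums f32 data using an f32 accumulator (32-bit partial results).
fn sum_narrow(data: &[f32]) -> f32 {
    let mut acc = 0.0_f32;
    for &x in data {
        acc += x;
    }
    acc
}

/// Sums the same f32 data using an f64 accumulator, rounding once at the end.
fn sum_wide(data: &[f32]) -> f32 {
    let mut acc = 0.0_f64;
    for &x in data {
        acc += x as f64;
    }
    acc as f32
}

fn main() {
    // One million copies of 0.1, stored as 32-bit floats.
    let data = vec![0.1_f32; 1_000_000];
    println!("f32 accumulator: {}", sum_narrow(&data));
    println!("f64 accumulator: {}", sum_wide(&data));
}
```

The storage cost is identical in both versions; only the width of the running total differs, and the f32 accumulator drifts visibly away from 100000 while the f64 one does not.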

Yes, for that stuff I totally agree that we should do it right. It isn't very well defined at the moment on Rust, as we mostly delegate to what LLVM does; when we do spec it, we should make sure to spec it correctly. (Note that delegating to what LLVM does helps us here, as they have to implement C99 correctly and we can piggyback on that.)

Would syntax for choosing between 64-bit intermediate vs. 32-bit intermediate be appropriate?

> syntax for choosing between 64-bit intermediate vs. 32-bit intermediate

That's exactly what Kahan proposed adding to Java in the paper under discussion, "How Java’s Floating-Point Hurts Everyone Everywhere." (Most of the people arguing here haven't digested it fully, which I understand: people who haven't had to do numerical computations aren't familiar enough with the subject.) See page 78:

"To fit in with Java’s linguistic proclivities, Borneo allows a programmer to declare a minimum width to which everything narrower is to be widened before Java’s rules are invoked:

    anonymous float          follow Java’s rules ( Borneo’s default; it must match Java’s )
    anonymous double         widen every narrower operand to double as does K-R C
    anonymous long double    widen every narrower operand to long double ( use on Intel )
    anonymous indigenous     widen every narrower operand to indigenous.

Of course, Java should be repaired promptly to adopt anonymous double as its default, which would then become Borneo’s default too. The scope of an anonymous declaration is a block."

"just use doubles"

Okay, and all of the data that I send off to my GPU for either computation or rendering - I should just accept the huge performance hit I get when using double on them?

Or I am up against the cache limits - I should just make everything double and accept that my code now thrashes as the CPU's cache misses?

Sorry, that probably comes off as antagonistic. But this is not a 'just do X' situation. Those of us that do heavy computation tend to think about this a lot, if it was as trivial/cost-free as changing float to double we would have done that already.

What I mean is that if you want to do 64-bit math on 32-bit values, convert them to 64-bit first. (This matches what the compiler frontend is inserting into the IR and assembly anyway.) Not "use 64 bit everywhere".

To provide meaningful results, 64-bit math should be implicit, 32-bit explicit.

The reason is this: just as you believe that "it's the same" or "it's not important" (it's not the same, and it is very often important), there are a lot of programmers who don't want to spend any energy on "making the computer do what's right"; they simply assume that the computer "does what's right." In your language's case, if it's hard to get 64-bit partial results, programmers won't use them where they should. And correct computations are serious business, like calculating whether a bridge will fall or not.

OK, so we agree that this is serious, and we need to make sure that programmers understand whether 32-bit or 64-bit arithmetic is being performed, lest the bridge collapse.

In Rust a programmer familiar with the rules would write it this way, making it clear what's happening:

    bridge.d = (bridge.a as f64 * bridge.b as f64 + bridge.c as f64) as f32;
Taking advantage of the implicit conversions, a programmer familiar with C++'s rules might write:

    bridge.d = bridge.a * bridge.b + bridge.c;
Now suppose that, years later, someone comes along and wants to refactor the code by adding a temporary. In Rust:

    let tmp = bridge.a as f64 * bridge.b as f64;
    bridge.d = (tmp + bridge.c as f64) as f32;
In C++:

    auto tmp = bridge.a * bridge.b;
    bridge.d = tmp + bridge.c;
And now they've changed the meaning of the code in a serious way: adding a temporary silently changes the result.

It seems you missed something:

    auto tmp = bridge.a * bridge.b;   
The type of the right-hand expression is 64-bit FP under the rule we're discussing, so "auto" means "double" in your example. The type remains 32-bit only if you do a simple assignment; every "+", "-", "*", etc. promotes it to 64. This can actually be implemented in your language too, and the rest of the compiler will keep functioning.

    #include <stdio.h>

    int main() {
        float a = 1.0f;
        float b = 2.0f;
        auto c = a + b;
        printf("%zu\n", sizeof(c));
        return 0;
    }

    $ clang++ test.cpp
    $ ./a.out
    4
This is a perfect example of how C++'s floating point conversion rules are so complex nobody knows them!

Have you tried float a = 1, b = 7; auto c = a / b? It should give you 8 (I don't have clang here). Note that in the example you constructed you have practically a constant which doesn't lose precision by staying as 32 bits?

Have you also checked compiler options that affect the computations? The proper numerics is sometimes (often!) not default, but C and C++ compilers are not "one size fits all."

Tried that, also in -std=c++11 mode, still 4…

Edit: By default clang++ has FLT_EVAL_METHOD set to 0 (i.e. Rust's behavior). When I tried with -mno-sse FLT_EVAL_METHOD was set to 2 (long double promotion), but the sizeof was still 4.
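For comparison, a small Rust sketch of the same size check. It is an illustration only, and it shows the size of the inferred type, not the precision a particular target uses internally.

```rust
use std::mem::size_of_val;

fn main() {
    let a = 1.0_f32;
    let b = 7.0_f32;

    // With no annotation, the quotient of two f32 values is inferred as f32.
    let c = a / b;
    // Widening has to be requested explicitly with `as f64`.
    let d = a as f64 / b as f64;

    println!("{}", size_of_val(&c)); // prints 4
    println!("{}", size_of_val(&d)); // prints 8
}
```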

Now you know the pain of the people who do numerics. C99 should use 8 bytes for partial results, at least for the real ones (not the constants computable by the compiler). You should see that in the debugger too. C++ is unfortunately another topic entirely. I also admit that I don't know whether clang implements numerics properly; I've never used it. This looks suspicious and at least conceptually wrong: it loses precision "just because."

After something like:

        fld dword ptr a
        fdiv dword ptr b
there is a full 64-bit result in the FPU waiting to be stored even if the recent compiler implementers think "32-bits should be enough for everybody." That's that "bridges-can-fall-once dangerous" thing I've mentioned.

I've also learned something: "auto" seems to be more dangerous than I expected! Wow.

g++ seems to duplicate your clang results, for the record (as more or less expected).

> Okay, and all of the data that I send off to my GPU for either computation or rendering - I should just accept the huge performance hit I get when using double on them?

When necessary, manually cast your floats to doubles, perform the calculation, then assign the result back to a float. It achieves the same thing, though it is, of course, verbose. For example:

  float result = (float) ((double) floatVar1 + (double) floatVar2);

You obviously don't have to write numerical code. I also guess, if you had to, you'd also be fast to start avoiding language that forces you to do the above dance everywhere.

> You obviously don't have to write numerical code. I also guess, if you had to, you'd also be fast to start avoiding language that forces you to do the above dance everywhere.

I didn't claim it would be pretty, I was just demonstrating that it's possible.

Note that this "rant" is by Velvel Kahan [1], who won the Turing Award for designing the IEEE 754 floating point standard [2] (among other things), which is used for specifying floating point computations on almost all computers today. So perhaps he has a little more context on this than yourself.

[1] http://en.wikipedia.org/wiki/William_Kahan [2] http://en.wikipedia.org/wiki/IEEE_754-2008

I'm not questioning the author's credentials, nor do I even think he's wrong about what's most useful in his field (scientific computing). I just think that programming language design considerations (not wanting to admit implicit conversions) outweigh his objections.

If you claim that Java was right to implement it that way, and that the next language that ignores his complaints is right too, then just accept that both Java and that next language are weak at expressing computational tasks, from the point of view of the people who have to write that kind of code.

Note that for a very long time Windows didn't use the FPU at all. They also "didn't need it." And I think they started to think differently once they had looked closely enough at what Apple does. A lot can be programmed without an FPU, but the edge in the industry is gained by using the hardware fully, not by doing what's easiest.

And the FPU hardware doesn't introduce penalties when reading 32-bit values and doing 64-bit computations.

That example is not comparable. The issue is not whether something is expressible. All 32-bit and 64-bit arithmetic operations are expressible in both C and Java (although I'd argue that Java makes it easier to know which you're getting). The issue is just what the default should be, and whether introducing temporaries should change the value of an expression. This is impossible to measure objectively, because we're talking formally about the exact same set of operations, just about which default is more intuitive/convenient/leading to fewer bugs/etc.

I don't agree that it's "impossible to measure objectively." The practice of storing floating-point values with fewer bits but calculating with more is very old; specifically, it is older than C. It's not accidental: it survived because it's needed. As long as we don't have infinite memory and want to have huge inputs and outputs, we'll need 32-bit floats.

And as long as we want to calculate with less error, and we need to, we'll need doubles for intermediate calculations.

None of this is relevant to my point. You can use doubles for intermediate calculations in C, Java, and Rust. The only difference is what the defaults are.

Is the group that wants your choice as the default sizeable? It's hard to imagine it's comparable to all the people who do numerical computations assuming the standard behavior. Now all the algorithms will produce different results, and the ones in Rust will be less accurate. I don't think you can reasonably expect to avoid a lot of pain with this choice. It would be nice to at least have a compiler flag or something instead of a requirement to write explicit casts everywhere.

But that sounds like an appeal to authority.

Firstly, the title page says "presented March 1, 1998". That's pretty old. And then the PDF shows creation date 'July 30, 2004'. I'll assume that there were 6 years of updates to this file between the 2 dates.

So that is still a document about 10 years old. How have things changed since then?

Appeal to authority? Ah, I think you misunderstand why that is considered an invalid form of argument. An appeal to authority occurs when one appeals to an authority in an unrelated field in order to bolster an argument from a different field. For example, while watching football yesterday, I saw a commercial where a football player told me that CenturyLink had the fastest internet. This is an implicit appeal to authority. The player is undoubtedly an authority on how to play football, but this authority is unrelated to the question of choosing a good internet service. On the other hand, if this player were to tell me which football video games were most realistic, this would not be an invalid form of argument, since he in fact has some authority on what makes a football video game realistic (Of course, there may be higher authorities yet).

In this case, William Kahan actually has authority on the proper use of floating point, as the designer of IEEE 754 and a well-known authority in numerical analysis. His credentials in programming language design, granted, are weaker, but you cannot deny that his authority does in fact extend to the object of our discussion.

His credentials, however, do not make his argument correct.

Java's FP has historically had problems of its own. The requirement for exact portable behaviour crippled performance when the available hardware primitives weren't a good match.

For example, x87 performs all calculations using a configurable intermediate precision, 32, 64 or 80 bit. But if you want to configure it, you need to change the control word - not something you want to do every other instruction. The main other way of specifying precision is to not use most of the processor's internal registers (FP stack), and instead store intermediate results to memory.

That's changed a bit over the years, and Java has loosened up somewhat; you need strictfp to get the old behaviour.

Wouldn't you only need to change the control word when switching between 32-bit and 64-bit arithmetic? I don't think that would happen in every instruction, at least for the FP-intensive apps I work on (graphics).

Yeah, that. Providing portable, perfectly-consistent universal FP is needed as an option (going without it kills whole classes of networked computing applications), but should never be mandatory, not for a language that claims to be general-purpose.

You might be right about bit-size (although I disagree) but that is not the only complaint: the lack (at least for that age, I do not know about now) of true 'exceptions' for NaNs, infs, etc... is very very important in many ordinary circumstances and cannot be dismissed as a whim. One should be able to check the status of an operation without having to implement a strange concoction (a 'trap').
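As a rough illustration of checking an operation's outcome without a trap, here is a Rust sketch. `checked_div` is a hypothetical helper, and it inspects the result value with `f64::classify` rather than reading the IEEE 754 status flags, so it is only an approximation of what is being asked for here.

```rust
use std::num::FpCategory;

/// Hypothetical helper: performs the division, then reports NaN and
/// infinite results as errors instead of letting them propagate silently.
fn checked_div(x: f64, y: f64) -> Result<f64, FpCategory> {
    let q = x / y;
    match q.classify() {
        FpCategory::Nan => Err(FpCategory::Nan),
        FpCategory::Infinite => Err(FpCategory::Infinite),
        _ => Ok(q),
    }
}

fn main() {
    println!("{:?}", checked_div(1.0, 4.0)); // prints Ok(0.25)
    println!("{:?}", checked_div(1.0, 0.0)); // prints Err(Infinite)
    println!("{:?}", checked_div(0.0, 0.0)); // prints Err(Nan)
}
```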

In what context would I want less accurate results from my arithmetic expressions? I mean, I have some numbers, I write an expression with them, and then I expect to get the most accurate possible result. Doesn't it make a lot more sense to let users explicitly specify that a given expression is to be evaluated using only 32-bit float values than to make everybody always cast every value in every expression to 64-bit float? I can tell you that having to sprinkle (double) all over your Java code (and then an additional (float) in front) to be sure you're getting the most available precision is pretty tedious.

Two reasons:

1. I mostly write graphics and browser layout code when I use floating point. I never want 64-bit precision. I want the fastest thing available, and I'm willing to accept some error. It's surprisingly tricky to get C to actually do 32-bit floating point arithmetic, and I suspect we may actually gain a little performance over existing browser engines with Rust here (probably negligible, but hey).

2. Wouldn't you expect these two expressions to be the same (assuming b, c, d are all f32)?

    let a: f32 = b + c / d;

    let tmp = c / d;
    let a: f32 = b + tmp;
In Kahan's preferred semantics, they are not the same. This interferes with refactoring, complicates the language, and violates intuition, IMHO. Introducing temporaries without changing the semantics is something programmers should be allowed to do.
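A small Rust sketch of the refactoring point. The function names are made up, `widened` is only an approximation of the "widest intermediate" semantics under discussion, and the input was picked so that the two roundings land on different sides of a rounding boundary.

```rust
/// One expression, evaluated entirely in f32.
fn direct(b: f32, c: f32, d: f32) -> f32 {
    b + c / d
}

/// The same computation with the quotient pulled out into a temporary.
fn with_tmp(b: f32, c: f32, d: f32) -> f32 {
    let tmp = c / d; // inferred as f32
    b + tmp
}

/// Hypothetical "widest intermediate" reading: evaluate in f64, round once.
fn widened(b: f32, c: f32, d: f32) -> f32 {
    (b as f64 + c as f64 / d as f64) as f32
}

fn main() {
    let b = -1.0_f32 / 134_217_728.0; // -2^-27, exactly representable
    let (c, d) = (1.0_f32, 3.0_f32);
    println!("direct == with_tmp: {}", direct(b, c, d) == with_tmp(b, c, d));
    println!("direct == widened:  {}", direct(b, c, d) == widened(b, c, d));
}
```

With explicit types, introducing the temporary cannot change the result. For this input the widened version rounds to a different f32, which is the discrepancy a refactoring would silently introduce if only full expressions were widened.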

1. Faster? Really? Those values only live in registers for a short while, is saving one or two cycles all that important?

2. That is an interesting point. Gut instinct says tmp should end up being f64 and then both expressions would be the same. Equally valid is the interpretation that the user selected explicitly for conversion to f32 of the intermediate value. We all know computers don't perform platonically perfect math with real numbers, there isn't much you can do about it, but I think the choice which results in the most precision should win at the end of the day.

Perhaps in a language which doesn't require specifying types like Rust, the best thing to do is to simply dispense with 32-bit floating point values entirely. I'd be equally happy with compiler flags like -fmore-precision. Or maybe you could have other arithmetic operators like +64 and +80 which cast all their arguments to 64/80-bit floating point numbers and produce a result with that precision. So I'd write

    let a: f32 = b (+80) c (/80) d as f32
Given that Rust is a pretty young language and still far from a 1.0 release, do consider if you can provide any kind of syntax to actually make full use of computers' numerical capabilities.

Regarding point (1), it starts to matter when autovectorization kicks in. If this expression is in a loop and the compiler can autovectorize (which rustc does, if you have a new enough version), you really want to be able to pack more values into a SIMD register if you can. (This is one point in which this paper is out of date…)
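To put numbers on the SIMD packing argument, here is a minimal Rust sketch. The helper `lanes`, the 16-byte register width, and the `saxpy` loop are assumptions chosen for illustration; whether a given compiler actually vectorizes the loop is not something this code demonstrates.

```rust
use std::mem::size_of;

/// Number of values of type T that fit in a register of `register_bytes` bytes.
fn lanes<T>(register_bytes: usize) -> usize {
    register_bytes / size_of::<T>()
}

/// y[i] = a * x[i] + y[i], kept entirely in f32. This is the kind of loop
/// an autovectorizer could process several elements at a time.
fn saxpy(a: f32, x: &[f32], y: &mut [f32]) {
    for (yi, &xi) in y.iter_mut().zip(x) {
        *yi = a * xi + *yi;
    }
}

fn main() {
    // A 128-bit (16-byte) register holds twice as many f32 values as f64 values.
    println!("f32 lanes: {}", lanes::<f32>(16)); // prints "f32 lanes: 4"
    println!("f64 lanes: {}", lanes::<f64>(16)); // prints "f64 lanes: 2"

    let x = [3.0_f32, 4.0];
    let mut y = [1.0_f32, 2.0];
    saxpy(2.0, &x, &mut y);
    println!("{:?}", y); // prints [7.0, 10.0]
}
```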

As for point (2), well, I think that making the size of "tmp" f64 would interact badly with other features like operator overloading and generics, since you're playing fast and loose with types.

Gory details (warning, type theory ahead): "+" is implemented by a trait, Add(RHS,Result) where RHS and Result are concrete types and there is a functional dependency so that Self + RHS → Result. This fundep is necessary because otherwise "let tmp = b + c" wouldn't work, as the compiler doesn't know what the type of "tmp" is (is it f32 or f64? It won't guess.) So you can't simultaneously implement f32 : Add(f32,f32) and f32 : Add(f32,f64). We'd have to introduce a lot more experimental type machinery to make this work (a "floating point functional dependency", I guess?) and I'm not sure there wouldn't be fallout (for example, I can foresee issues with higher-kinded type parameters).
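In present-day Rust the trait is spelled `Add<Rhs>` with an associated type `Output`, which plays the role of the functional dependency described above. The sketch below uses a hypothetical newtype `Stored` to show how a single result type per operand pair can still be a widened one; it is an illustration, not a proposal from this thread.

```rust
use std::ops::Add;

/// Hypothetical newtype: a value stored as f32 whose `+` yields an f64.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Stored(f32);

impl Add for Stored {
    // Exactly one Output per (Self, Rhs) pair, so `let tmp = a + b`
    // always has a known type.
    type Output = f64;

    fn add(self, rhs: Stored) -> f64 {
        self.0 as f64 + rhs.0 as f64
    }
}

// A second `impl Add<Stored> for Stored` with `type Output = f32` would be
// rejected by the compiler as a conflicting implementation.

fn main() {
    let tmp = Stored(16_777_216.0) + Stored(1.0); // tmp is an f64
    println!("{}", tmp); // prints 16777217
}
```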

It's very simple: as soon as the floating value is read, it should go to the 64-bit virtual register unless you specify that register to be 32-bit too.

If you want to use 32-bit partial results, you should say that explicitly. Autovectorization can be applied even if the rules are like suggested.

I don't know, I wouldn't expect that tbh but I am used to C. I also don't see why I should need it, I am not going to compare non-zero floating point numbers anyway but I would take better accuracy in long computations any day. I think there is value in following the standard (what people doing floating point computations are used to) and you need very good reason to deviate from it. You state that performance gains will be negligible if any in your domain but you could potentially cause a lot of headache in other domains. I don't think elegance/simplicity/negligible performance gains (if you are right about it at all) is enough of a reason to deviate from standard behavior.

> I don't think elegance/simplicity (if you are right about it at all) is enough of a reason to deviate from standard behavior.

The elegance and simplicity in this case are what are needed to make generics work. (See my other response for the gory details.)

Still, the fact is that computation has different goals than "generics" or whatever you want from the rest of the language's "cleanness." On x86 and most other CPUs, reading a 32-bit FP variable and doing the computation with 64 bits doesn't cost you anything: it's just how you fill the bits of the register during the load. As long as you don't have FP register spills, the computation has the same speed, and the final store to the destination is of course faster if you store only 4 bytes instead of 8.

You can omit some aspects from your language, but don't be surprised that people who use C continue to use C. If you really want bigger acceptance, even if it's "harder" to do it "properly," you can't pretend that people don't need it.

Cleanliness is not about theoretical purity in this case. It's about type soundness, something that C doesn't have but which is a major design goal of Rust.

And it does cost you something: autovectorization, as I explained earlier.

The problem with these kinds of discussions is that we aren't talking about expressive power: it's obvious that you can do both 64-bit or 32-bit math in either C or Rust. The issue is just what the defaults should be. In these discussions, small communities (like scientific computing in this case) tend to have oversimplified notions of how things should "obviously" be without understanding the subtleties from other perspectives (like PL design). You don't "just" make the "+" operator return different types depending on context in a language with typeclasses. That can break soundness.

Do you really think that C introduced that accidentally? Don't you think that it were also easier for all the compilers written in the last 40-50 years to simply have 1-1 correspondence between stored variables and partial results?

The reason they didn't do this was that the implementers knew that most of calculations turn out wrong if the partial results are too short.

> Don't you think that it were also easier for all the compilers written in the last 40-50 years to simply have 1-1 correspondence between stored variables and partial results?

Not really; the size of intermediate representations wasn't even specified prior to C99. It was easier to just leave 80-bit intermediate representations in e.g. the x87 FPU unit.

You're wrong regarding "80 bits." The intermediate has been double for at least twenty years: the practice is older than the standards, as I already claimed. You can check the code; I claim that the first thing C standard libraries do on x86 with 80-bit registers is set the precision to 64 bits. The FPU has 80-bit registers, but the computation precision can be fixed, and it was.

It's not "easier" to leave it at 80-bits: you get bad numerical outputs if you don't implement the spilling to 80-bits too and if you don't have proper language support.

Numerics is hard but it should be done right.

Note the practice. The standards were intentionally "almost useless" because they reflected the influences of those involved.

No, I was correct about the standards: http://www.altdevblogaday.com/2012/03/22/intermediate-floati...

C++98: "5.10 The values of the floating operands and the results of floating expressions may be represented in greater precision and range than that required by the type; the types are not changed thereby."

You're right that most compilers in practice changed the precision to double. I think we're in violent agreement (and I learned a lot from this debate!) :)

On page 43, Kahan gives four rules of thumb. According to rule 3 (declare temporary/local variables all with the widest finite precision that is not too slow nor too narrow), I think Kahan would actually agree with you.

Maybe you confused Kahan's rant with C semantics?

More precisely, I think the scenario is different with type inference. What is the type of "c/d", if c and d are f32? Would it be unreasonable, if the expression had type f64 or even more precise?

This was fixed in Java 1.2 I think: http://en.wikipedia.org/wiki/Strictfp
