Hacker News new | past | comments | ask | show | jobs | submit login

Sorry! As consolation, I’ll tell you the following (totally obvious unless you understand floating-point, in which case it is shocking) little lemma I happened upon the other day: assuming default rounding, the result of x+x+x+x is exact.

And please do write your next blog post anyway, I love this sort of thing =)




Is • one of the four operations +, -, * , /?

For -, x-x-x-x is -2x, representable if there is no overflow, and obtained with only exact steps.

For +, the result is 4x, again representable. The second and third operations do not seem like they are going to be exact but the error might just compensate. Should this be studied by case analysis on the last bit of the significand?

For * and /, I just don't see it.


That should have been +. Fixed.

As you say:

> The second and third operations do not seem like they are going to be exact but the error might just compensate. Should this be studied by case analysis on the last bit of the significand?

That’s exactly right. There’s a simultaneously ugly and clever proof by looking at the 3 low-order bits of the intermediate terms. I haven’t found a more elegant one yet.

It follows trivially that x+x+x+x+x is correctly rounded, but it breaks down when you get to x+x+x+x+x+x.


I don't know if this can be considered an improvement over brute-force case analysis:

If x is a power of two, then each floating-point operation in x+x+x+x is exact.

If x isn't a power of two:

fl(3x) = 3x + e1 |e1|≤ulp(3x)/2

fl(fl(3x) + x) = fl(3x) + x + e2 |e2|≤ulp(4x)/2

        (ulp(4x) because once set apart the case when x is a power of two,
        there is no way fl(fl(3x) + x) can be in a binade other than the binade of 4x)
Now there are two cases:

- if 3x is in the binade of 2x, then ulp(3x) = ulp(2x) = ulp(4x)/2. In this case fl(fl(3x) + x) is within (3/4)ulp(4x) of 4x. The only floating-point number within (3/4)ulp(4x) of 4x is 4x.

- if 3x is in the binade of 4x:

Then the real number 3x cannot be a midpoint for its binade. If floating-point numbers of the binade of 4x are all multiples of 2^p, then all midpoints for this binade are multiples of 2^(p-1). But x is of the form n * 2^(p-2), and 3x = (3n/2) * 2^(p-1) cannot be a midpoint.

Since 3x is not a midpoint for its binade, we can in fact improve the inequality |e1|≤ulp(3x)/2 into |e1|<ulp(3x)/2.

It follows that fl(fl(3x) + x) is in this case again strictly within ulp(4x) of 4x.

I do feel that this is a little hand-wavy as proofs go. In particular, I set apart the case when x is a power of two to make it clear that fl(fl(3x) + x) and 4x are in the same binade in the rest of cases, but I wonder whether it's possible that fl(3x) is a power of two and whether that deserves special attention.




Applications are open for YC Winter 2023

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: