
Give me a (real-life) example where these errors matter. Any real data will be noisy anyway, and relying on anything at the level of ppb (parts per billion) looks suspicious.

Clarification: I am not saying that treating floats carefully is unimportant, and I know that doing things like subtracting large numbers can lead to inaccurate leading digits. This is clear and part and parcel of any numerical work. But that is not what the article is about. I am claiming instead that the article is looking for a problem where there is none (or, let's say, one that I fail to see).




If you're developing floating point code, you're in for a huge surprise eventually. What happens, for example, when you subtract two close numbers that each carry an error like this? There are a lot of mechanisms for small errors to become significant.
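A tiny Python sketch of that "subtract two close numbers" case (toy values of my own, assuming ordinary float64):

    # b is exact; a cannot be stored exactly and picks up an absolute
    # error of roughly 6e-9 just from being written down as a float.
    a = 100000000.1
    b = 100000000.0

    diff = a - b                    # the subtraction itself is exact here
    print(diff)                     # ~0.09999999403953552, not 0.1
    print(abs(diff - 0.1) / 0.1)    # relative error ~6e-8: a's tiny storage
                                    # error is now a big chunk of the result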

Many of us have had our attitudes readjusted by a bug where these "insignificant" errors became anything but.

We're often wrong in ways we can't imagine, so it is wise to remain humble and strive for correctness over our potentially misguided ideas of how things work.

For example, you'd better still have multiple layers of security even if you've already made some other layer "impenetrable". And still check for those "impossible" error cases before they create nearly undebuggable issues.


Yes, one has to be careful about floating point, but honestly the article is not about that. Actually, it is more about something where you do not have to be careful.


The article explicitly says that while you might get away with it with some luck, thanks to the characteristics of real-world data, you do have to be careful.

From TFA:

"Now, in the real world, you have programs that ingest untold amounts of data. They sum numbers, divide them, multiply them, do unspeakable things to them in the name of “big data”. Very few of the people who consider themselves C++ wizards, or F# philosophers, or C# ninjas actually know that one needs to pay attention to how you torture the data. Otherwise, by the time you add, divide, multiply, subtract, and raise to the nth power you might be reporting mush and not data.

One saving grace of the real world is the fact that a given variable is unlikely to contain values with such an extreme range. On the other hand, in the real world, one hardly ever works with just a single variable, and one can hardly every verify the results of individual summations independently.

Anyway, the point of this post was not to make a grand statement, but to illustrate with a simple example that when using floating point, the way numbers are added up, and divided, matters."
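A quick Python sketch of that last point (my own toy data, not the article's): the same numbers, summed in different ways, give different answers.

    import math

    # One large value followed by many small ones.
    data = [1e16] + [1.0] * 1000

    naive = 0.0
    for x in data:
        naive += x              # each 1.0 is below the rounding step at 1e16

    print(naive)                # 1e+16 -- the thousand 1.0s vanished
    print(math.fsum(data))      # 1.0000000000001e+16 -- correctly rounded sum
    print(sum(sorted(data)))    # summing small-to-large also recovers them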


I work with avionics and it does matter.

Yes, real data is noisy, but testing needs to be precise and repeatable. For example, if we need to test that a value is <=100, then it must pass at 100 and fail at 100.00000001. And yes, we use margins and fixed point numbers too, but sometimes we need to be precise with floating point numbers.

It also matters in some calculations, for example when doing GCD/LCM with periods and frequencies. For that, we eventually switched to rational numbers because precision of source data was all over the place.
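For what it's worth, here's a sketch of what the rational-number route looks like with Python's fractions module (the periods are made up, not our real data):

    from fractions import Fraction
    from math import gcd, lcm

    # Toy periods in seconds, kept exact as rationals instead of floats
    # like 0.0125 that have no exact binary representation.
    periods = [Fraction("0.02"), Fraction("0.0125"), Fraction("0.1")]

    def rational_lcm(a, b):
        # lcm of two fractions: lcm of numerators over gcd of denominators
        return Fraction(lcm(a.numerator, b.numerator),
                        gcd(a.denominator, b.denominator))

    common = periods[0]
    for p in periods[1:]:
        common = rational_lcm(common, p)

    print(common)   # 1/10 -> these periods line up every 0.1 s, exactly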

We all know that parts per billion rarely make sense, but where do we draw the line? Sometimes it really matters (e.g. GPS clocks); sometimes PI=3 is fine. So when in doubt, use the highest precision possible; you can relax later if you can characterize your error.


I didn't work in avionics, but in a field that used a lot of GPS data. Depending on where your error is, it can be +/- 100 feet, or more, or less, depending on where the cutoff is. Like you say, you need to know what your target is.


Of course, the fact that the threshold has to be so precise is because those are the rules, not because of anything physical about aviation. It's a human-imposed requirement. A threshold at 100 probably has about one significant figure, sometimes less.


Well, not necessarily. The "and repeatable" can be quite a constraint.

Imagine three redundant systems on a plane that want to compute the same sum from the same input array (and then check whether they all agree). You want to implement this, and figure that (since there's a lot of data) it makes sense to parallelize the sum. Each thread sums some part, and at the end you accumulate the per-thread results. No race conditions, clean parallelization. And suddenly warning lights go off, because the redundant systems computed different results. Why? Different thread team assignments may lead to different summation orders and thus different results, because floating point addition is not associative.
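A toy Python version of that failure mode (made-up data; the "threads" are just two different chunkings of the same array):

    import random

    random.seed(0)
    data = [random.uniform(-1e12, 1e12) for _ in range(100_000)]

    # "System A": plain left-to-right sum.
    total_a = 0.0
    for x in data:
        total_a += x

    # "System B": same data split into 8 chunks, each chunk summed
    # separately and the partial sums combined at the end -- the shape
    # a parallel reduction typically has.
    total_b = sum(sum(data[i::8]) for i in range(8))

    print(total_a == total_b)       # almost certainly False
    print(abs(total_a - total_b))   # small, but enough to trip a bit-exact
                                    # comparison between redundant systems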

More generally, whenever you want fully reproducible results (like "the bits hash to the same value"), you need to take care of these kinds of "irrelevant" problems.


Aviation is all about rules, and sometimes a bit of flying.


And most of the rules are written in blood.


Then use more bits for your floats and use the naive average. This article is more like "how to squeeze the last few bits out of a floating point operation."

And even if you use these techniques, the butterfly effect will kick in eventually.


I was accumulating deltas in a PDE simulation on photographic images, and when the calculation was tiled, different tiles had noticeable differences in brightness.

Real image data, roughly 1 megapixel per tile at that time, and 32-bit floats.
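A stripped-down sketch of that effect (toy data and float32 via NumPy; nothing to do with the actual PDE code):

    import numpy as np

    rng = np.random.default_rng(0)
    # Roughly one tile's worth of small per-pixel deltas, stored as float32.
    deltas = rng.uniform(0.0, 1e-3, size=1_000_000).astype(np.float32)

    acc = np.float32(0.0)
    for d in deltas:
        acc += d                # running 32-bit accumulator, like a
                                # per-tile brightness total

    reference = float(np.sum(deltas, dtype=np.float64))

    print(float(acc), reference)
    print(abs(float(acc) - reference) / reference)
    # The relative difference is orders of magnitude above parts per
    # billion -- and it differs from tile to tile, hence visible seams.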


A field where floating point errors accumulate quickly and can totally ruin your results is computational geometry. See this paper https://people.mpi-inf.mpg.de/~mehlhorn/ftp/classroomExample... for examples.



In general, floating point error can accumulate in iterative algorithms where you are feeding the result of one calculation into the next step.
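The classic minimal example in Python: stepping a value forward one iteration at a time instead of computing it directly.

    # Advance simulated time by dt, one step at a time, the way an
    # iterative solver does, and compare with computing it directly.
    dt = 0.1
    t = 0.0
    for _ in range(1_000_000):
        t += dt                   # each step feeds the next

    print(t)                      # something like 100000.0000013...: the
                                  # million tiny roundings add up
    print(1_000_000 * dt)         # 100000.0 -- a single rounding, no drift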


Yes. If you have a positive Lyapunov coefficient, that will happen. But then what the article does won't help you either.
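A few lines of Python make the point (logistic map at r=4, a standard chaotic example; the 1e-15 is standing in for a single rounding error):

    # Two trajectories of the chaotic logistic map x -> 4x(1-x),
    # starting 1e-15 apart -- about one double-precision rounding error.
    x, y = 0.4, 0.4 + 1e-15
    for step in range(60):
        x = 4.0 * x * (1.0 - x)
        y = 4.0 * y * (1.0 - y)

    print(x, y)   # after ~60 steps the two are completely unrelated;
                  # no clever summation scheme recovers that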


If you know about Lyapunov coefficients then you should also know about the concept of order of accuracy. If I'm developing a code that is supposed to be second order in time and space, then when I'm running my convergence study I'd better be able to control the floating point error in my computation of the total energy, so that I can properly attribute the error I do observe to power series truncation, or what have you.
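A minimal convergence-study sketch along those lines (central difference of sin in Python, just to show the two error regimes; not anyone's actual solver):

    import math

    # Central difference for d/dx sin(x) at x = 1: truncation error ~h^2,
    # floating point error ~eps/h. Shrinking h helps only until rounding
    # takes over, after which the observed "order of accuracy" is garbage.
    x = 1.0
    exact = math.cos(x)
    for k in range(1, 12):
        h = 10.0 ** -k
        approx = (math.sin(x + h) - math.sin(x - h)) / (2.0 * h)
        print(f"h=1e-{k:02d}  error={abs(approx - exact):.3e}")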


The article is not saying that this is even all that useful or revelatory -- but it's just something to keep in mind.


Rounding errors might be small, but catastrophic cancellation can make errors much larger than parts per billion. Maybe for your application it doesn't matter, but sometimes it does, and you need to be aware of these cases.

You wouldn't want those kinds of rounding errors in financial applications, rocket science applications, etc.
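The textbook case is the quadratic formula. A Python sketch with toy coefficients, showing how cancellation turns a ~1e-16 relative rounding error into a ~25% error, and how a rearrangement avoids it:

    import math

    # Solve x^2 - 1e8*x + 1 = 0. The exact smaller root is ~1e-8.
    a, b, c = 1.0, -1e8, 1.0
    sqrt_disc = math.sqrt(b * b - 4.0 * a * c)

    # Textbook formula: -b and sqrt_disc nearly cancel.
    small_naive = (-b - sqrt_disc) / (2.0 * a)

    # Algebraically equivalent form that avoids the cancellation.
    small_stable = (2.0 * c) / (-b + sqrt_disc)

    print(small_naive)    # ~7.45e-09 -- off by about 25%
    print(small_stable)   # ~1e-08 -- accurate to essentially full precision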


And how does the averaging procedure mentioned in the article help with that?


Sure. Here's an example of blatantly incorrect font rendering due to (what I assume is) floating point error in macOS: https://twitter.com/pcwalton/status/971825249464475648



