I still think the best way to deal with this stuff is to use interval arithmetic. Calculate the numbers with rounding set to round up, then separately with rounding set to round down, and compare the results. See PyInterval https://pyinterval.readthedocs.io/ for an implementation.
>>> from interval import interval
>>> data = (interval(1_000_000_000.1), interval(1.1)) * 50_000
>>> sum(data) / len(data)
interval([500000000.5971994, 500000000.601347])
The result is definitely between the two bounds. If you try another method of doing the calculation (e.g. sorting first), you will know which one was better, without the answer having to be obvious as it is in this case.
You might find one method is better at the lower bound and the other is better at the upper bound, which means you can narrow the value further than using either individually. It's a really powerful system that I think should be much better known.
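As a sketch of that bound-combining idea (plain tuples here rather than PyInterval objects, and the second set of bounds is made up purely for illustration):

# The true value lies in both enclosures, so it lies in their intersection.
def intersect(a, b):
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    assert lo <= hi, "both intervals should enclose the true value"
    return (lo, hi)

bounds_unsorted = (500000000.5971994, 500000000.601347)  # from the run above
bounds_sorted   = (500000000.5999990, 500000000.602500)  # hypothetical bounds from a second method
print(intersect(bounds_unsorted, bounds_sorted))          # a tighter enclosure than either alone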
You should be okay with a sum but interval arithmetic can blow up in a way that is not correlated with the numerical accuracy of the algorithm.
Newton's method, for example, is strongly convergent but produces diverging intervals.
You can use other algorithms in these cases. For example, you can converge an interval on a root by testing different values and checking whether the result is above or below the root (or within the error bounds, in which case you can't tell).
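A rough sketch of that idea, assuming a hypothetical f_bounds(x) that returns rigorous (lower, upper) bounds on f(x), with f(a) < 0 < f(b) across the bracket:

# Narrow a root bracket [a, b] using only the sign of rigorously bounded evaluations.
# f_bounds(x) is assumed to return (lo, hi) with lo <= f(x) <= hi.
def bracket_root(f_bounds, a, b, steps=60):
    for _ in range(steps):
        mid = (a + b) / 2
        lo, hi = f_bounds(mid)
        if lo > 0:        # f(mid) is definitely positive -> root is to the left
            b = mid
        elif hi < 0:      # f(mid) is definitely negative -> root is to the right
            a = mid
        else:             # zero is inside the bounds: can't decide, stop here
            break
    return a, b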
That, and more generally the inability to take correlations between errors into account, plus the fact that the propagated intervals are only upper bounds for some operations, such as division.
Kahan summation is better than naive summation, but not perfect. The stable_mean algorithm in the article is not perfect either. It just has more precision than the naive one, roughly what you would get from twice-as-precise floats.
>>> data = (1_000_000.1, 1.1, -1_000_000.1, -1.1) * 25_000
Note that in Python there is a math.fsum function which returns a correctly rounded sum of floats. There are multiple techniques for achieving that, but they are more complex than Kahan summation.
Namely, the "naive" mean yielded the smallest error (due to cancellation lining up just right such that it's capped after two rounds), fsum yielded a nominally larger one, and stable_mean yielded the biggest error.
Of course, if you sort the values (so that we're accumulating -0.2 + ... + -0.2 + -0.1 + ... + -0.1 + 0.3 + ... + 0.3), then sum performs much worse, stable_mean is a little worse, and fsum performs exactly the same.
edit: Naturally, if you want accurate decimal calculations, you should consider using decimal.Decimal or fractions.Fraction (albeit at a significant performance cost). Using numpy.float128 will help, but it's still subject to rounding errors due to imprecise representation.
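For anyone who wants to reproduce this kind of comparison, a rough harness (running_mean below is only a stand-in for the article's stable_mean, which I haven't copied here; the exact reference uses Fraction):

import math
from fractions import Fraction

data = (1_000_000.1, 1.1, -1_000_000.1, -1.1) * 25_000

def naive_mean(xs):
    return sum(xs) / len(xs)

def running_mean(xs):
    # incremental mean update; a stand-in, not the article's exact code
    m = 0.0
    for i, x in enumerate(xs, 1):
        m += (x - m) / i
    return m

def fsum_mean(xs):
    return math.fsum(xs) / len(xs)

exact = sum(map(Fraction, data)) / len(data)   # exact rational reference
for f in (naive_mean, running_mean, fsum_mean):
    print(f.__name__, float(Fraction(f(data)) - exact))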
Some of these results are indeed very surprising. The "correct" result of (double(0.3) - double(0.2) - double(0.1)) is -1 * 2^-54 = -5.5511151231257827e-17, and dividing by three comes out to -1.8503717077085942e-17.
Hence, as far as those functions are concerned, the naive mean yielded the worst error in both tests, but stable_mean coincidentally happens to be the best in the first test. I'm slightly surprised that none of them got the correct result.
Heh, I reasoned about the exponent by hand and got -55 but then wasn't sure whether it was correct so I searched for discussions of 0.3 - 0.2 - 0.1 and took the value they quoted (2^-54). That being an off-by-one error does resolve my question.
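For anyone who wants to check without searching: the stored doubles can be compared exactly with fractions.Fraction (exact rational arithmetic on the binary values), and the residual comes out as -2^-55:

from fractions import Fraction

residual = Fraction(0.3) - Fraction(0.2) - Fraction(0.1)  # exact arithmetic on the stored doubles
print(residual == -Fraction(1, 2**55))   # True
print(0.3 - 0.2 - 0.1)                   # -2.7755575615628914e-17, the same value, since
                                         # both subtractions happen to round exactly here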
The two behave differently, so you cannot strictly compare them. Most of the time stable_mean will give more accurate results, but this is not always the case. For example, you can create test cases where the naive approach will return an exact result but stable_mean will have some rounding error.
Kahan summation, however, is always at least as precise as naive summation.
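For reference, the whole of Kahan (compensated) summation is only a few lines; a minimal sketch:

def kahan_sum(xs):
    s = 0.0
    c = 0.0                # running compensation: the low-order bits lost so far
    for x in xs:
        y = x - c          # fold the previously lost low-order part back in
        t = s + y          # big + small: low-order digits of y may be lost here...
        c = (t - s) - y    # ...recover exactly what was lost
        s = t
    return s

vals = [1e16] + [1.0] * 10000
print(sum(vals), kahan_sum(vals))   # the naive sum absorbs every 1.0; the compensated sum keeps them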
Interestingly enough I was given this problem (well, numerically stable online variance, but close enough) a while ago as a take-home job interview and I ended up researching this quite a bit.
It has been a while, but if I remember correctly pairwise summation was in practice as good as or better than Kahan summation, but faster (and both were more stable than Welford's).
But by far the fastest, and still very stable, solution was to use the legacy 80-bit x87 FPU as the accumulator. Which is not surprising, as Kahan explicitly added 80-bit precision to the x87 for intermediate computations.
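From memory, Welford's online update looks roughly like this (single pass, numerically stable mean and variance):

def welford(xs):
    n = 0
    mean = 0.0
    m2 = 0.0                      # sum of squared deviations from the running mean
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # note: uses the *updated* mean
    return mean, (m2 / (n - 1) if n > 1 else 0.0)   # sample variance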
FWIW, Ruby the language doesn't have Bignum any more, though, just Integer which handles arbitrarily large numbers. Under the hood, there are some optimizations for small integers.
It looks like the second. For many of its operations, the statistics module first converts floats to integer ratios (integers are bignums in Python), does intermediate operations using integer arithmetic, and converts back to floats at the end.
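Roughly the idea (not the statistics module's actual code):

from fractions import Fraction

def exact_mean(xs):
    # Every float is an exact ratio of two integers; summing those ratios with
    # arbitrary-precision ints defers all rounding to the single final conversion.
    total = sum(Fraction(*x.as_integer_ratio()) for x in xs)
    return float(total / len(xs))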
According to Radford Neal,
it is possible to sum a sequence of n-bit floating point numbers exactly, i.e. rounding to the n-bit floating point value closest to the exact result.
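Python's math.fsum already gives that guarantee (it is not Neal's algorithm, but it tracks exact partial sums internally and returns the float closest to the exact sum):

import math
xs = [0.1] * 10
print(sum(xs))        # 0.9999999999999999 -- every intermediate addition rounds
print(math.fsum(xs))  # 1.0 -- the double closest to the exact sum of ten double(0.1)s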
I have a special hatred of floating point numbers. I wish and advocate for a better standard.
My issue is that I've come across errors due to floating point in weird places. One example: searching a list of strings for another string. If the sort function decides those strings look like numbers, it may coerce them into doubles (ints being too small to hold them). Then the sort won't match correctly because the least significant bits get rounded off.
Not sure what this could possibly have to do with floating point numbers, nor am I aware of any function that does what you're describing. Is this some kind of quirky JavaScript behavior? Nothing about floating point numbers has anything to do with coercing strings.
I've definitely been bitten by similar (not identical) stuff. I've seen data serialized and sent over the wire by I-don't-know-what, where several parsers can't correctly parse and re-serialize it so that the same thing gets written back. Be it 32-bit float, 64-bit float, string parsed as 32-bit float, string parsed as 64-bit float, or string parsed as a big decimal: I don't care; the fact is there are buggy serializers, buggy parsers, and sometimes even parsers that go into an infinite loop (the JVM is famous for that: there was a bug where Double.parseDouble could be sent into an infinite loop by feeding it a carefully crafted string).
I think the lesson is rather: don't use text based protocols for data serialization if you care about precision. Such things should never happen with a binary protocol.
I've gone back and forth over the years. Having implemented some of the lowest-level floating point routines, taking care to handle all the corner cases, it definitely is a chore. But is there an alternative?
After all of this, I still think that IEEE 754 is pretty darn good. The fact that it is available and fast in pretty much all hardware is a major selling point.
That said, I absolutely hate -0 (minus zero). It's a stupid design wart that I've never seen any good use case for. But it is observable and therefore cannot be ignored, leading to even more icky special cases.
After staring at the MATLAB [https://www.mathworks.com/help/matlab/ref/atan2.html#buct8h0...] explanation, I have to admit I don't really get that. Minus zero is an encoding artifact. Values of [x,y] with either `x == 0` or `y == 0` don't have, in the mathematical sense, a quadrant. The fact that a convention preserves an encoding artifact doesn't make it useful.
E.g. I've seen it argued that +0 and -0 represent limits of functions that approach zero from the negative side or the positive side. But that makes no sense, because no other floating point number represents a limit going from above or below the number, so why should 0? Also, all this walking on eggshells around a possible -0 (because that represents the limit of 0 coming from below 0) is just lost when someone writes 0. Admittedly, I've not seen the guts of most numerical code, but the only time I have ever seen a minus 0 written into code was to test the minus-zero handling of something.
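For what it's worth, the ways -0.0 is observable are easy to demonstrate in Python:

import math
print(0.0 == -0.0)               # True  -- they compare equal...
print(math.copysign(1.0, -0.0))  # -1.0  -- ...but the sign bit is still there
print(math.atan2(0.0, -1.0))     #  3.141592653589793
print(math.atan2(-0.0, -1.0))    # -3.141592653589793  -- the atan2 convention discussed above
print(str(-0.0))                 # '-0.0'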
Just use a programming language that doesn't coerce strings into numbers.
Or use a comparison function/operator that explicitly compares arguments as strings and not as numbers.
That isn't floating point's fault, but that of whoever wrote the buggy parser.
Floating point is overall pretty good. Sometimes I wish there were a version without so many ways to store NaN (for 16-bit floats it is so wasteful), and maybe versions of sqrt/div/rcp/rsqrt that would clamp invalid inputs to 0, but other than that it is hard to see how you make any major improvements given the fixed number of bits.
Actually, one other possible improvement would be a way to request that the hardware dither when rounding down to the fixed bit count, although I'm not sure how it could do anything besides pure white noise; still, better than nothing.
> If the sort function decides those strings look like numbers it may coerce them into doubles (ints being too small to hold them).
You sometimes even see that with APIs that inconsistently wrap some values in quotes (seemingly at random). And then it gets trickier, because the parser may correctly parse some of the values into, say, big decimals, while others get automagically, imprecisely, parsed into floating-point numbers.
The problem is of course made worse by people overall overusing decimal numbers where integers would be totally fine.
One of the most beautiful floating-point bugs ever is this one (from about ten years ago, as I recall): language preference in browsers could be set as a decimal number between 0 and 1. Java (and other languages) had a broken floating-point implementation where parsing a certain floating-point number would lead to an infinite loop. So you could tie up one core of any webserver with that bug simply by requesting a page with the language preference set to that number. Rinse and repeat a few times and all cores were spinning at 100%, effectively DoS'ing the webserver.
> I have a special hatred of floating point numbers. I wish and advocate for a better standard.
Like what? Packing the real number line into 32 (or 64) bits is an incredible trick. The roughly scale-invariant distribution of representable values means you can do math without regard to choice of units, which would be impossible with any fixed point scheme.
I suppose if you're not bothered with wasting lots of time, do the following for a precise sum: Select the two smallest numbers from the list, remove them and insert their sum. Repeat until only one number is left.
Summing [4, 5, 6, 7] would yield the running sums 4, 9, 15, 22 with "sort then add", but what the previous poster described would first sum 4+5 (leaving the list [9, 6, 7]), then 6+7 (leaving the list [9, 13]), finally arriving at 22.
That algorithm could be improved (for accuracy, not speed) by replacing repeated additions of identical smallest numbers with a multiplication, as I suppose that multiplying three or more identical numbers loses less precision than adding them one by one.
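A sketch of the smallest-two-first scheme with a heap (without the multiplication refinement; for mixed signs you would probably want to order by magnitude instead):

import heapq

def smallest_first_sum(xs):
    # Repeatedly pop the two smallest values and push back their sum (Huffman-style),
    # so each addition tends to combine numbers of similar magnitude.
    if not xs:
        return 0.0
    h = list(xs)
    heapq.heapify(h)
    while len(h) > 1:
        a = heapq.heappop(h)
        b = heapq.heappop(h)
        heapq.heappush(h, a + b)
    return h[0]

print(smallest_first_sum([4, 5, 6, 7]))   # 4+5 -> 9, 6+7 -> 13, 9+13 -> 22, as in the example above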
If you do, in fact, need to worry about floating point error, then there is not some magical "it will never happen if you have this much precision and use this neat algorithm". Heck, even fixed/decimal/rational numbers aren't perfect solutions, they just make it easier to tell if you're staying within your precision.
Without infinite storage, you can't retain infinite precision, so the only real question is how much error you are willing to tolerate. And that depends very much on what you're doing: I can tolerate a lot more error in rolls in an MMO than in my bank account balance...
Yes. Similarly, if you design something with total disregard to optimization, it will likely need optimization.
YAGNI is a prediction, not a rule. Just like premature optimization, alternatives to floating point - or using longer floating point types - have real costs, which must be balanced against the needed precision.
Well, his final example suggests otherwise, but I kind of suspect that there must be something else going on there (that I'm too lazy to try to track down). Still, while fixing most floating point errors seems like rearranging the deck chairs on the Titanic, if there's one algorithm that's 99.999% accurate and another that's only 99.9% accurate, you might as well use, and be aware of, the more accurate one.
The error in the 'Variance calculated using Statistics::Descriptive' is ~0.0002% of the mean (and of each value).
As the OP says: "One saving grace of the real world is the fact that a given variable is unlikely to contain values with such an extreme range"
My point is that 99.999% algorithm is going to have a cost over the 99.9% algorithm (more complex code, more resource usage), and depending on what you're doing, you may have to consider that cost.
double fsum(const double *p, size_t n) {
size_t i;
double s;
for (s = i = 0; i < n; ++i) s += p[i];
return s;
}
Add one line of code and it not only becomes numerically stable but benchmarks faster too.
double fsum(const double *p, size_t n) {
size_t i;
double s;
if (n > 8) return fsum(p, n / 2) + fsum(p + n / 2, n - n / 2);
for (s = i = 0; i < n; ++i) s += p[i];
return s;
}
Losers always whine about how their laziness and ignorance is a calculated tradeoff.
What we need to consider is that sometimes there is no tradeoff.
That depends to an extreme degree on your environment, your language, your compiler... etc. What you need to consider is that tradeoffs have two sides.
Because it reduces worst case roundoff error growth from linear to logarithmic. Why is that the case? The best intuitive explanation I can think of is that floating point behaves more like VHS than DVD. If you wanted to bootleg a thousand copies of Titanic, would you have each copy be made from the previous copy? No. It'd look terrible. You would instead have a stack of VCRs where the copies could be made in a hierarchical way that minimizes the critical path of copies. The overall number of operations may be the same, but if we consider that errors compound as the intermediate results become less accurate, then it should be simple to understand why divide and conquer improves numerical stability.
If you are looking for such accuracy (where 50_000_000.00006 is different from 50_000_000.06) then you need a specialised library and very careful coding.
If you are averaging bazillions of numbers that range from weeny to huge, and you think 50_000_000.000006 != 50_000_000.06 matters for your purposes, then it is hard for me to see that you are doing anything sensible.
This is, except in very specialised cases that I cannot imagine, a non issue.
Just add them up and divide (and check for overflow; that could ruin your day!).
That's the wrong way to think about floating point. Double gives you ~15 digits of precision. If the answer is 500000000.600000 but your algorithm says 500000000.601239 then 25% of the bits you've got are kaput. It's also a question of scalability. Increase the author's example array size to 1 gigabyte and now 40% of your double is garbage data. Increase it to web scale, and you're toast. What's the solution? Divide and conquer. Not only will that let you farm out the summation across machines, but it also eliminates the linear growth in errors.
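One way to see how much of a result is garbage is to compare against a correctly rounded reference, e.g.:

import math
data = [1_000_000_000.1, 1.1] * 50_000
naive = sum(data) / len(data)
reference = math.fsum(data) / len(data)   # correctly rounded sum, then one division
print(naive, reference, abs(naive - reference) / abs(reference))   # relative error of the naive mean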
Give me an (real life) example where these errors matter. Any real data will be noisy anyway and relying on anything at the level of ppb (parts per billion) looks suspicious.
Clarification: I am not saying that treating floats carefully is unimportant, and I know doing stuff like subtracting large numbers can lead to inaccurate leading digits. This is clear and part and parcel of any numerical work. But this is not what the article is about. I am claiming instead that the article is looking for a problem where there is none (or let's say one I fail to see).
If you're developing floating point code, you're in for a huge surprise, eventually. What happens for example when you subtract two close numbers with an error like this? There are a lot of mechanisms for small errors to become significant.
Many of us have had their attitudes readjusted by a bug where these "insignificant" errors became anything but.
We're often wrong in ways we can't imagine, so it is wise to remain humble and try to strive for correctness over our potentially misguided ideas how things work.
For example, you'd better still have multiple layers of security even though we've already made some other layer "impenetrable". And still check for those "impossible" error cases before they create nearly undebuggable issues.
Yes, one has to be careful about floating point, but honestly the article is not about that. Actually, it is more about something where you do not have to be careful.
The article explicitly says while you might get away with it with some luck due to characteristics of the real world data, you do have to be careful.
From TFA:
"Now, in the real world, you have programs that ingest untold amounts of data. They sum numbers, divide them, multiply them, do unspeakable things to them in the name of “big data”. Very few of the people who consider themselves C++ wizards, or F# philosophers, or C# ninjas actually know that one needs to pay attention to how you torture the data. Otherwise, by the time you add, divide, multiply, subtract, and raise to the nth power you might be reporting mush and not data.
One saving grace of the real world is the fact that a given variable is unlikely to contain values with such an extreme range. On the other hand, in the real world, one hardly ever works with just a single variable, and one can hardly ever verify the results of individual summations independently.
Anyway, the point of this post was not to make a grand statement, but to illustrate with a simple example that when using floating point, the way numbers are added up, and divided, matters."
Yes, real data is noisy but testing needs to be precise and repeatable. For example, if we need to test that a value is <=100, then it must pass at 100 and fail at 100.00000001. And yes, we use margins and fixed point numbers too, but sometimes we need to be precise with floating point numbers.
It also matters in some calculations, for example when doing GCD/LCM with periods and frequencies. For that, we eventually switched to rational numbers because precision of source data was all over the place.
We all know that parts per billion rarely make sense but where do we draw the line? Sometimes, it really matters (ex: GPS clocks), sometimes, PI=3 is fine. So in doubt, use the highest precision possible, you can relax later if you can characterize your error.
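On the GCD/LCM point: with rationals that becomes exact. A sketch, assuming the periods arrive as exact ratios rather than floats (math.lcm needs Python 3.9+):

from fractions import Fraction
from math import gcd, lcm

def lcm_periods(a: Fraction, b: Fraction) -> Fraction:
    # lcm of reduced fractions: lcm of the numerators over gcd of the denominators
    return Fraction(lcm(a.numerator, b.numerator), gcd(a.denominator, b.denominator))

print(lcm_periods(Fraction(1, 60), Fraction(1, 50)))   # 1/10: a 60 Hz and a 50 Hz signal realign every 0.1 s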
I didn't work in avionics but a field that used a lot of GPS bits. Depending on where your error is, it can be +/- 100 feet or more or less depending on what point the cut off is. Like you say you need to know what your target is.
Of course, the fact that the threshold has to be so precise is because that's the rules, not because of anything related to aviation. That's a human-enforced quality. A threshold at 100 probably has about one significant figure, sometimes less.
Well, not necessarily. The "and repeatable" can be quite a constraint.
Imagine three redundant systems on a plane that want to compute the same result from the same input array (and then check whether they all agree). You want to implement this, and figure that (since there's a lot of data) it makes sense to parallelize the sum. Each thread sums some part, and at the end you accumulate the per-thread results. No race conditions, clean parallelization. And suddenly warning lights go off, because the redundant systems computed different results. Why? Different thread team assignments may lead to different summation orders and different results, because floating point addition is not associative.
More generally, whenever you want fully reproducible results (like "the bits hash to the same value"), you need to take care of these kinds of "irrelevant" problems.
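A minimal illustration of why the per-thread partial sums can disagree:

print((0.1 + 0.2) + 0.3)   # 0.6000000000000001
print(0.1 + (0.2 + 0.3))   # 0.6
# Same inputs, same additions, different grouping -> different bits.
# A different thread-team split is exactly a different grouping of the same sum.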
I was accumulating deltas in a pde simulation on photographic images and when the calculation was tiled, different tiles had noticeable difference in brightness.
Real image data, roughly 1 megapixel per tile at that time, and 32-bit floats.
If you know about Lyapunov coefficients then you should also know about the concept of order of accuracy. If I'm developing a code that is supposed to be second order in time and space, then when I'm running my convergence study, I better be able to control the floating point error in my computation of the total energy, so that I can properly attribute the error I do observe to power series truncation, or what have you.
Rounding errors might be small, but catastrophic cancellation can make errors much larger than parts per billion. Maybe for your application it doesn't matter, but sometimes it does, and you need to be aware of these cases.
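A classic instance of catastrophic cancellation, where the error is far larger than parts per billion:

import math
x = 1e-8
naive  = (1 - math.cos(x)) / x**2        # cos(x) rounds to exactly 1.0, so this gives 0.0
stable = 2 * math.sin(x / 2)**2 / x**2   # algebraically identical, no cancellation: ~0.5
print(naive, stable)                     # the true value is 0.5 - x**2/24 + ..., i.e. essentially 0.5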
You wouldn't want those kind of rounding errors in financial applications, in rocket science applications, etc ...