Replicating this for double-precision inputs would have been impossible, of course; but fortunately knowing that the worst-case single precision rounding error occurred in computing
(3/4 + 12582909 / 16777216 i) * (5592409 / 8388608 + 5592407 / 8388608 i)
Wat. I had never even heard about this, but the more I read and think about it, the more this rule makes sense to me.
How did I miss this - is this commonly taught in school and I just did not pay attention? It seriously worries me that I don't even know how to round properly - I always thought of myself as a math-and-sciency guy...
Benford's law applied to rounding floats is splitting hairs, really - but it goes to show how even "simple" things like rounding can get really difficult once you think about them for long enough.
Benford's law only applies to certain sets of numbers.
Not all numbers follow Benford's law. Therefore Benford's law should not be considered when talking about rounding unless there's a reason to think it applies to your particular set of numbers.
Check Knuth's discussion of it in "The Art of Computer Programming", section 4.2.4, where he talks about the distribution of floating-point numbers. Benford's research drew on 20,229 numbers from a wide variety of sources. Additionally, the phenomenon is bolstered by the observation that, in the pre-calculator days, the front pages of logarithm books tended to be more worn than the back pages.
The whole discussion covers about four pages, and contains what I think is adequate mathematics demonstrating the point.
If your test oracle is code itself, then it's not unlikely you'll make the same bug twice. I can't find the paper I'm after, but NASA did some work here.
If your test oracle isn't code, you have the problem of working out the correct outputs for the entire function space (at which point maybe you'd be better off with some kind of look-up table anyway)
The next problem is that your input combinations grow much faster than you are able to test. Exhaustively testing a function of one float is very feasible, but most functions take three or four streams of input, and keep internal state.
The moment you have these combinations you run into massive problems of test execution time, even if you can parallelise really well.
Butler and Finelli did some work (http://www.cs.stonybrook.edu/~tashbook/fall2009/cse308/butle... - also at NASA) investigating whether sampling the input space was worthwhile, but they concluded that it isn't helpful, and that any statistical verification is infeasible.
In my report I ended up concluding that it isn't a particularly useful approach. It seems that following good engineering practice has better returns on your investment.
- My function does something complicated: chances are you will be replicating your function in your test code. Hand-code assertions.
- My function can be tested using a function that is provided by the runtime/framework: great! Remove your function and use the one in the framework/runtime instead.
- My function can be tested using a function that is provided by the runtime/framework, but is written for another platform or has some performance trick: test as described in the article.
- My function should never return NaNs or infinities: test as described in the article.
> And, you can test every float bit-pattern (all four billion!) in about ninety seconds. [...] A trillion tests can complete in a reasonable amount of time, and it should catch most problems.
Your developers will never run said tests. Unit tests are supposed to be fast. However, these exhaustive tests could be run on the CI server.
Also, these aren't unit tests; they're property tests.
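To make that concrete, here's a sketch of what exhaustive bit-pattern testing looks like - in Python rather than C for brevity, and restricted to two small slices of the 2^32 patterns so it runs quickly. The `fast_abs_bits` function is a hypothetical "optimized" implementation invented for illustration; the point is the enumeration harness around it:

```python
import struct

def bits_to_f32(bits):
    """Reinterpret a 32-bit pattern as an IEEE-754 single (Python widens it to a double)."""
    return struct.unpack('<f', struct.pack('<I', bits))[0]

def fast_abs_bits(bits):
    """Hypothetical 'optimized' abs: just clear the sign bit."""
    return bits & 0x7FFFFFFF

def first_failure(start, stop):
    """Compare fast_abs_bits against the language's abs() for every pattern in [start, stop).
    Returns the first failing bit pattern, or None if all pass."""
    for bits in range(start, stop):
        expected = struct.pack('<f', abs(bits_to_f32(bits)))
        got = struct.pack('<f', bits_to_f32(fast_abs_bits(bits)))
        if expected != got:  # compare bit patterns, so -0.0 vs 0.0 mismatches are caught
            return bits
    return None

# Two slices: small positive denormals and their negative counterparts.
# (Ranges chosen to avoid NaN patterns, whose payload bits needn't survive a round trip.)
result = (first_failure(0, 1 << 16),
          first_failure(0x80000000, 0x80000000 + (1 << 16)))
```

Scaling the ranges up to the full 2^32 patterns is the ninety-second version from the article; a compiled language makes that comfortable.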
Very true but that really should have been stated up-front by the author. Next thing you know "some guy at Valve said all unit tests with floats should test all 4 billion cases." You would be surprised at how much people take away from a skim over something - e.g. a poor understanding of Apps Hungarian created the monster that is Systems Hungarian.
> in which case there should always be a slow reference implementation.
Good point - however, keep in mind that bug fixes would need to be made in two places. Not only does it mean the maintainer needs to grok two pieces of code, but they may have to determine if the original code is providing bad truths. Hopefully there wouldn't be many places where this type of testing would be required as the low hanging fruit is often enough: however keep in mind certain industries do optimize to ridiculous degrees - and in some of those industries optimization can happen first depending on the developers [well-seasoned] intuition, and can be a frequent habit. The author's industry is a brilliant example of where the typical order of optimization (make it work, make it correct, make it fast) takes a back seat - life is incredibly different at 60/120Hz.
Only if the bug actually occurs in both implementations. This does happen sometimes, so it's indeed good to keep in mind - but my experience experimenting with these kinds of comparison tests hasn't borne this out as the more common case. YMMV, of course.
> Not only does it mean the maintainer needs to grok two pieces of code, but they may have to determine if the original code is providing bad truths.
I find automated comparison testing of multiple implementations to be extremely helpful in grokking, documenting, and writing more explicit tests of corner cases by way of discovering the ones I've forgotten, frequently making this easier as a whole, even with the doubled function count.
> Hopefully there wouldn't be many places where this type of testing would be required as the low hanging fruit is often enough: however keep in mind certain industries do optimize to ridiculous degrees - and in some of those industries optimization can happen first depending on the developers [well-seasoned] intuition, and can be a frequent habit.
I don't trust anyone who relies on intuition in lieu of constantly profiling and measuring in this sea of constantly changing hardware design - I've yet to see those who've honed worthwhile intuitions drop the habit :). Even in the lower hanging fruit baskets, it's frequently less a matter of optimizing the obvious and more a matter of finding where someone did something silly with O(n^scary)... needles in the haystack and where you'd least expect it.
An example that comes to mind: at some point I was working on an optimized paintbrush tool that was to give perceptually similar results to drawing the brush image repeatedly along a path. I replaced the individual draws with convolution, but there was no way to make this exactly correspond with the "over" compositing operator that it was replacing. So I used a post-processing step to adjust it.
In this case, the divergence from the reference implementation was so great that you couldn't really compare them. But the results looked good, and the performance gains were tremendous. The only way to compare it would have been to create a new reference implementation that imitated the new algorithm, which would have told me very little. Instead, I did lots and lots of real world testing to ensure there were no missed edge cases.
With non-ECC memory and a simplistic brute-force algorithm it doesn't make much sense to talk about doing 2^64 tests, because the hardware memory errors that occur along the way will swamp any real errors you'd hope to detect. Depending on whose error-rate numbers you select (1e-10? 1e-15?) you can get whatever predetermined result you want (LOL).
At the 1-in-2^32 range it's commodity desktop hardware territory. Pulling off 1-in-2^64 hardware error rates is way harder.
And, skipping testing of large numbers and NaNs can lead to missing bugs. One of the fixed ceil functions handled everything except NaNs correctly. Implementations that failed on exponents beyond 63 or 64 are easy to imagine.
But, testing doubles requires compromises. Testing of special values plus random testing plus exhaustive testing of the smallest ten trillion or so numbers should be sufficient -- better than the current status quo at least.
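As a sketch of that compromise for doubles - hand-picked special values plus uniformly random bit patterns (function names here are my own invention, not from the article):

```python
import math
import random
import struct

def bits_to_f64(bits):
    """Reinterpret a 64-bit pattern as an IEEE-754 double."""
    return struct.unpack('<d', struct.pack('<Q', bits))[0]

def interesting_doubles(n_random=10_000, seed=1234):
    """Specials first, then uniformly random bit patterns - random patterns
    hit NaNs, infinities, denormals, and huge exponents for free."""
    specials = [
        0.0, -0.0, 1.0, -1.0,
        math.inf, -math.inf, math.nan,
        5e-324,                   # smallest positive denormal
        1.7976931348623157e308,   # largest finite double
    ]
    rng = random.Random(seed)
    return specials + [bits_to_f64(rng.getrandbits(64)) for _ in range(n_random)]

vals = interesting_doubles()
```

Feeding a list like this to both the fast implementation and the reference, plus an exhaustive sweep over the smallest patterns, covers most of the failure modes mentioned above (NaN mishandling, exponent-range bugs).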
Quick--why would you sort an array of floating point numbers before adding them up? And in which order?
Floats are a significand (the quantity) and an exponent (the magnitude). Adding two floats requires aligning both to the larger of the two exponents, because larger values are more important than smaller ones. So the smaller float essentially gets shifted down to fit the larger magnitude, losing its low bits; in the extreme case, it becomes zero. So when adding a very large number and a very small number, the result will be the same as adding zero to the very large number.
If you add numbers largest to smallest, then thanks to this, at a certain point you are simply doing $SUM = $SUM + 0 over and over.
But if you start with the smallest numbers, they have a chance to add up to a value with a magnitude high enough to survive being added to the largest numbers.
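A tiny illustration of the effect: 2^53 is the first double whose neighbors are 2 apart, so adding 1.0 to it rounds straight back to 2^53.

```python
# Ten 1.0s and one huge value; only the order of summation differs.
values = [1.0] * 10 + [2.0 ** 53]

def naive_sum(xs):
    total = 0.0
    for x in xs:
        total += x
    return total

ascending = naive_sum(sorted(values))                  # smallest numbers first
descending = naive_sum(sorted(values, reverse=True))   # biggest number first

# Largest-first: 2**53 + 1.0 rounds back to 2**53, ten times over.
# Smallest-first: the 1.0s accumulate to 10.0, which survives the final add.
```

So ascending order recovers all ten small contributions, while descending order loses every one of them.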
It sucks that most people don't really care about performance any more. It sucks when people casually dismiss people who do care about performance with arguments like "premature optimization" and so on.
So you know what? It makes me feel happy to use the most efficient type that will work. I like my single precision floats.
I can die happy knowing that I didn't waste everyone's collective network, disks and CPUs with needless processing. Can't it be good to save just for the sake of saving?
So you want a practical reason, do you?
I shaved over 2 gigabytes off the download size of our product simply by using more efficient data types. Half-size floats (16 bits) are good enough for some data.
And let me tell you about a data type that we used to use called a char. 8 bits would you believe! Turns out those x86 CPUs that are sitting there a few VMs deep still support those things.
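For what it's worth, even Python's struct module speaks half floats (format `'e'`, IEEE-754 binary16), which makes both the 2-byte size and the precision trade-off easy to see - the helper name below is just for illustration:

```python
import struct

def to_half_and_back(x):
    """Store a value as a 16-bit half float and read it back."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

half_size = struct.calcsize('<e')   # half floats really are 2 bytes

exact = to_half_and_back(1.5)       # 1.5 is exactly representable in binary16
lossy = to_half_and_back(0.1)       # 0.1 is not; you get the nearest half instead
```

Values with short binary expansions round-trip exactly; everything else lands on the nearest representable half, which is the deal you're making for the 4x size reduction over doubles.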
Why? Really. To my mind it's good that we no longer have to care about these implementation details, and can concentrate on writing code that expresses what we mean. Like writing on paper rather than scratching letters into tree bark.
I have a hard time finding programs that don't push the computer's limits interesting - media programs and so on interest me more than anything else. There's the top 1% of programs that people actually use, and then there's a huge wasteland of stuff nobody uses. That top 1% usually has a lot of attention paid to its performance.
I can think of websites that don't really demand computing power and are interesting, but only as businesses - not as programs themselves. And even they have performance requirements in the background.
The realm of interesting things that are not squeezing as much performance as possible out of your code is enormous.
There's UX to be improved, there's robots to be built, and there's an almost limitless trove of science that could be applied to everyday life, or to business, or to art. There's improving human input methods. There are whole new genres of games to be invented. There's research into algorithms, where the objective is measured in big-O, not milliseconds, because we won't have hardware to make them practical for another six years.
Sometimes, you need micro-optimized code to do these things well. Often you don't.
Don't get me wrong: I love digging into a hot spot and optimizing the hell out of it. But I'm going to have to agree with lmm that it's a good thing that we have the option of whether or not to think about it.
And if a super smart compiler were released tomorrow that could take any program you threw at it and give you the absolute optimal implementation, the part of me that would be sad for the loss of that intellectual pursuit would be much smaller than the part of me that would be crazy excited. Of course, there would still be lossy optimization to explore, but even if it could magically solve that, I'd still never run out of interesting problems to play with.
All of the things you suggest seem to me to have important performance requirements, maybe not as hard as micro-optimized, but still they are there.
The few examples I can think of where the performance of code of popular products is not up to scratch - it's from users complaining - and a better performing product would have a competitive advantage.
I don't think that people who go after delivering the better performing software should be derided as being dullards pursuing something less interesting.
Sure, but in most of them, those performance requirements can be met simply by following a reasonably idiomatic style in any of a number of popular high level languages.
But I don't see anyone deriding people who find performance tuning interesting as dullards. You find that more interesting than other problems, while many people find the other problems more interesting.
We currently have plenty of use for both kinds of interest, and will for the foreseeable future. Compilers that completely magick away the need for optimization are a long way off, and there are plenty of applications where performance is paramount. But there are plenty of other applications where the performance you get from using high level tools is good enough, and choices like "float" or "double" are pretty trivial.
And high level tools are allowing more and more for you to make different choices when it really is important. In JS, for example, you have things like asm.js, and if you need the efficiency of single precision floats, you have Float32Array. Soon, we'll even have abstractions for vector instructions in the browser. And of course, if you really need extra horsepower for something on the server side, you can always pipe in and out of a carefully tuned native program.
But even if your end goal is efficiency, it actually makes sense for a language to use a less efficient default so that you don't have to think about things like precision requirements. The time you save on the 95% of that code that takes up 10% of your resources is time you can spend optimizing the 5% of the code that takes up the other 90%.
> It sucks that most people don't really care about performance any more. It sucks when people causally dismiss people who do care about performance with arguments like "premature optimization" and so on.
"Why? Really. To my mind it's good" etc etc
I also disagree with the "most of us" and "more interesting" angle, also exhibited here:
I think a dismissive attitude to performance is a poor attitude, I'm not saying that high level tools should not be used.
I also don't think that arguments against premature optimization are a dismissive attitude toward performance. Avoiding premature optimization is about picking your battles. In an ideal world, you could optimize everything.
But optimized code takes longer to write, and longer to maintain, and in this subideal world, we have limited time to spend on code. The point of avoiding premature optimization isn't to excuse low performance code. It's to give you time to optimize the code that makes the biggest difference.
My third job was an insurance trading platform. Most of the interesting stuff was around the representation of contracts, making it possible to manipulate them by computer, and compare different possibilities, but still having them look like the paper contracts users were used to.
My second job was at a music streaming service, and was mostly about doing simple things in ways that performed well, or scaled up to millions of users. I found this a lot less interesting or creative than my first or third job. Of course, that could just be personal taste.
That's not actually true. JS engines are highly optimized. For example, anything that fits in a 32-bit int (and is an integer, of course) is stored as such, at least in V8.
Consider a database: if data arrives at 10 inserts per second, and the application lifetime is expected to be 10 years, 32 bits is just enough to index it. However, that leaves room for only 36% growth, so make it 64-bit - unless performance is important, in which case watch the index and project when it will run out of space.
The point is, you consider the cases, and make an educated choice, this means reasoning about it both ways.
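The arithmetic behind that example, spelled out (the figures are from the comment above, not measurements):

```python
INSERTS_PER_SEC = 10
SECONDS_PER_YEAR = 60 * 60 * 24 * 365
LIFETIME_YEARS = 10

# ~3.15 billion rows over the application's lifetime
rows = INSERTS_PER_SEC * SECONDS_PER_YEAR * LIFETIME_YEARS

# fraction of growth left before a 32-bit index overflows (~36%)
headroom = 2 ** 32 / rows - 1
```

That 36% figure is the whole decision: comfortably inside 2^32, but with too little margin to bet ten years of production data on it.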
Because rounding issues that look like a non-problem for a single-precision number will be a non-problem squared for a double-precision one. Thus, if I'm wrong (or end up wrong after a change in scope), the program will still work.
Take SSE2 instructions as an example (Every X64 processor supports these, the old x87 FPU has been obsolete for over 10 years). You can either calculate 4 floats per instruction or 2 doubles per instruction. Similar considerations also apply to ARM.
Also even if you might want to use doubles as intermediate representation you still might wish to use floats as storage. E.g read from float, convert into double, calculate stuff, write back a float.
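In Python terms (where every float is a double anyway), that pattern looks roughly like this - struct narrows to 32 bits only at the storage boundary, while the arithmetic in between runs at full precision. The helper names are mine, not from any library:

```python
import struct

def load_f32(raw):
    """Read 4 stored bytes as a float32; Python widens it to a double for free."""
    return struct.unpack('<f', raw)[0]

def store_f32(x):
    """Narrow a double back down to 4 bytes of float32 storage."""
    return struct.pack('<f', x)

raw = store_f32(1.1)          # 4 bytes on disk / in the buffer
x = load_f32(raw)             # ~1.10000002..., a double from here on
y = x * x + 2.0 * x + 1.0     # all intermediate math in double precision
raw_out = store_f32(y)        # narrowed again only at the boundary
```

You pay the float32 representation error once per value at the edges, instead of once per operation throughout the computation.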
qsort() can get 15% better performance with uint64_ts than uint32_ts (sorting identical arrays of 8 bit numbers represented differently).
On the other hand, a naive bubble sort implementation was managing 5% better performance with 16 bit datatypes over any of the others.
If you start measuring energy usage as well, it becomes even stranger - using a different datatype can make it run 5% faster, while using 10%+ less energy (or in the qsort() case, take less time, but use more energy).
I'd tested a number of sort algorithms (bubble, insertion, quick, merge, counting), so also testing the sort function from stdlib seemed like a logical continuation.
It was done rather quickly, so there wasn't time to properly investigate, merely look at the numbers and go "that's strange".
If you look on arXiv.org, there are publications looking more at alignment on Cortex and XCore architectures. There's probably work relating to x86, though the longer instruction pipeline makes it more complicated to reason about than the simpler architectures.
Also, if the dynamic range of your data allows it [and for physical applications it usually does], you can first rescale it to [0,1] in order to further increase precision. [It's the interval where floats are most dense.]
Having worked for a few years now in computational geometry, i'm more and more convinced that "float by default" is the way to go. Doubles should be used for intermediate computations.
Why not scale to [0,0.5], where they are even more dense? ;-)
Even CD-quality audio only has 16 bits per sample. My digital camera only records 12 bits per pixel, and that's before the image is converted to a more reasonable 8-bit-per-channel format.
Single precision is still overkill in a number of applications.
When doing arithmetic you can lose as much as 1 bit of precision per floating-point operation. So it pays off to use more precision for calculations even when the storage format doesn't persist it.
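You can watch that per-operation loss by simulating single precision - rounding after every step - against a double accumulator. This is a rough sketch, not a precise error model; the exact figures will vary:

```python
import struct

def f32(x):
    """Round a double to the nearest float32, simulating single-precision arithmetic."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

N = 100_000
step = 0.1
acc32 = 0.0
acc64 = 0.0
for _ in range(N):
    acc32 = f32(acc32 + f32(step))   # every operation rounded to 24-bit precision
    acc64 = acc64 + step             # rounded to 53-bit precision

err32 = abs(acc32 - N * step)
err64 = abs(acc64 - N * step)
```

After a hundred thousand additions, the simulated-single accumulator has drifted visibly while the double one is still accurate to many decimal places - which is exactly the argument for computing in more precision than you store.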
Besides, I remember reading somewhere that 32-bit float instructions aren't actually any faster than 64-bit ones (except for the cache effects, of course...). They might even be slower because the FPU has to extend them to 64-bit before performing the operation.
Nowadays when you do computations with single float/double values in your registers they are equally fast.
The biggest difference comes from memory bandwidth and the ability to vectorize. Your CPU can calculate either 4 floats or 2 doubles in one instruction (assuming a pre-AVX x64 processor). With AVX it's 8 floats or 4 doubles.
If it's something you're worried about, I'd suggest measuring the difference in your actual application, both in terms of performance and accuracy. If I can come up with a decent test case I can legally publish, I'll post results, but your mileage may still vary.
So I could use a chip that costs more, results in shorter battery life, more thermal problems, and is slower so smaller BW processed and less audio filtering possible. But, at least I'm using doubles instead of floats, which doesn't even matter when it results in no measurable signal metric improvement while every other measurable performance and economic metric tanks downward.
You can do a lot of DSP and plain ole signal processing with fixed point of course, and sometimes doing weird things requires doubles.
If you care about precision - doubles still aren't precise. For currency use bigdecimals; for some stuff ratios can give precise results; for irrational things your only option is doubles, but be aware that you aren't precise, even for simple-looking operations. There's no way to compute 10% of something precisely with doubles.
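The canonical demonstration, with Python's decimal module standing in for "bigdecimals":

```python
from decimal import Decimal

# Binary doubles can't represent 0.1, 0.2, or 0.3 exactly,
# so this familiar comparison comes out False.
binary_ok = (0.1 + 0.2 == 0.3)

# A decimal type computes 10% of a price exactly.
price = Decimal('19.99')
ten_percent = price * Decimal('0.10')   # exactly 1.9990
```

The double version is off by a couple of ulps; the decimal version is exact, which is why currency code reaches for decimal types rather than tolerating "close enough".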
That's obviously not true. For irrational things, if exact values matter, use symbolic arithmetic.
- there's a problem with how much precision you need. Different countries specify how to round in money calculations differently (even with the same currency - see the Euro). When you use ints, you have the assumption about precision repeated all over your codebase, which makes it harder to change when tax law changes or when you want to support other countries. With BigDecimals most of the code is universal.
For financial calculations you use a dynamic integer type that expands as needed.
I sort-of agree with you, in that it is easy to get bitten, and that if people have a dynamic integer type available on their platform that's probably safest. If the choice is between using integers for fixed point vs. using double's though, I'd go for the fixed point any day.
but that's a bit much to type from a phone so I went for "use doubles". And now I'll never be so lazy with my comments again!
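A minimal sketch of the integers-as-fixed-point option: amounts in cents, multiplication exact, and explicit rounding only where division happens. The half-up rule here is just one example - as noted elsewhere in the thread, real rounding rules are jurisdiction-specific:

```python
# All amounts are integer cents; multiplication stays exact.
price_cents = 1999                  # $19.99
quantity = 3
subtotal = price_cents * quantity   # 5997 cents, exact

def percent_of(cents, pct):
    """pct% of an amount in cents, rounded half-up to the nearest cent."""
    return (cents * pct + 50) // 100

discount = percent_of(subtotal, 10)   # 10% of $59.97 -> $6.00 after rounding
total = subtotal - discount
```

The precision assumption (two decimal places, half-up) lives in one helper instead of being smeared across the codebase - which is the maintainability complaint about raw ints above, addressed rather than avoided.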
If there is a long chain of calculations that use those numbers, you might do the intermediate calculations in double, or even long double.
Remember, the determinant value is not the rounded value of the exact computation with those things unless it is zero; it's still "sloppy" if it's far from zero. You get only 1 bit of guaranteed accuracy in a 64-bit FP number if the result is not zero.
I think there is a little bit of truth to the "sloppiness" charge, in the fact that we carry very inexact results in FP. Even "good enough" FP computation is PhD-level.
Given all the special and unique features of the number one, I sincerely cannot tell if he's being sarcastic or not.
When your individual test is a handful of instructions, you can make such impressive claims. I only wish $work's test suite was so svelte!
If you use ruby 2.1, be sure to upgrade to BigDecimal 1.2.4.
The bug is basically: if the divisor is less than 1, the result will be 0.0.