[For finding worst-case inputs] the approach taken was an exhaustive search of IEEE single-precision inputs (taking several hours on the second author's laptop), using only a few arguments from Theorem 1 to prune the search.
Even though (2^32)^4 possible 4-tuples of single-precision inputs were far too many to test directly, a pruned search was still able to cover or exclude the entire search space.
I have noticed that there is a lot of high quality mathematical/numerical software coming out of INRIA; and have often wondered of late why we do not have an equivalent in the US. Do you have any thoughts on why this is the case; and what the closest equivalent in the USA may be?
After that they were working with INRIA on collaboration software fifteen years before WebEx was a thing. They also had large divisions looking at VR and data visualization.
The good news is that if your code works with doubles or multiple inputs, you might still be able to test it exhaustively using an SMT solver like Z3 which supports a "theory of floating point numbers". A solver like this symbolically models floating point operations at a binary level, letting it use an efficient pruning search to solve problems about the exact behavior of floating point numbers.
I haven't tried it myself so I don't know quite how well it scales, but it should definitely work better than a loop across all possible inputs!
Z3's support for floating point theories is really cool, but sometimes the simpler brute-force approach works too!
IOW, you could change the message to "Your formula's failure rate is either exactly zero or larger than a few per billion, so random testing is good enough", and I think it would still be good, insufficiently-applied advice.
It's worth tuning your random input generator to produce inputs that commonly cause problems like -0.
I can still think of a few situations where exhaustiveness might be worth the effort:
- verifying code passed in from the outside (security considerations?)
- verifying code generated automatically (ie as part of program synthesis)
- verifying very general code you expect to be used in hard-to-predict contexts (ie library routines or compiler optimizations)
If you're going to do random testing then you need to have a huge set of special numbers (denormals, NaNs, infinity, zero, exact powers of two, one more or less than exact powers of two) and you should do millions or billions of random numbers. Then you can start to feel confident.
This would definitely include denormals. As well as +-inf, nan, odd int, even int, negative odd and even int. +-1. Some .5s.
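A sketch of such a tuned generator in Python (the particular special values and the 10% mix ratio are arbitrary choices of mine, not from the thread):

```python
import random
import struct

# Values that frequently trigger edge-case bugs: signed zeros, infinities,
# NaN, +-1, 0.5s, the smallest float32 denormal, and the 2^24 integer boundary.
SPECIALS = [0.0, -0.0, 1.0, -1.0, 0.5, -0.5,
            float('inf'), float('-inf'), float('nan'),
            2.0**-149, 2.0**24 - 1, 2.0**24, 2.0**24 + 1]

def random_test_float():
    # 10% of the time return a hand-picked special value; otherwise a
    # uniformly random float32 bit pattern (which also covers denormals
    # and NaNs, just less often).
    if random.random() < 0.1:
        return random.choice(SPECIALS)
    bits = random.getrandbits(32)
    return struct.unpack('<f', struct.pack('<I', bits))[0]
```

Sampling raw bit patterns rather than uniform reals is the key trick: it hits every exponent range, including denormals and NaNs, with realistic frequency.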
For instance, if an algorithm failed only when all three inputs were exactly 0, it would fail on just 1 in (2^32)^3 input triples, even though its failure rate is not exactly 0. So exhaustive testing really only works with a single input, since 2^96 cases are much harder to search exhaustively, and statistical testing may not be enough either, since 1/2^96 << 1/2^32.
We can imagine a situation where it fails with each of the values being a number besides 0, making this nontrivial to check for. (For example, if the inputs are normalized somehow before being processed.)
For round/ceil/floor a common failure case is the number immediately preceding 0.5. If you're "lucky" then your algorithm fails for lots of numbers ending in 0.5, but if it's just the one number then a random search will never find it. And, an exhaustive search over all of the doubles from 0.0 to 1.0 turns out to cover a really big search space (one quarter of all doubles, roughly).
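That predecessor-of-0.5 case is easy to poke at in Python (math.nextafter needs 3.9+); the buggy "add 0.5 and floor" rounding shown here is a stand-in for the kind of algorithm being discussed:

```python
import math

x = math.nextafter(0.5, 0.0)   # the largest double strictly below 0.5
print(x)                       # 0.49999999999999994
print(round(x))                # 0 -- correct rounding
print(math.floor(x + 0.5))     # 1 -- x + 0.5 rounds up to exactly 1.0
```

The addition x + 0.5 lands exactly halfway between two doubles and round-to-even picks 1.0, so the naive formula misrounds this one input and no other nearby one.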
Minor, but I think it's not never, just very rare. If you're sampling from all floats, then the problematic inputs should be possible.
I think the distinction people aren't seeing is what you are testing. If it's the sin routine, then I think you're making valid points.
But if we're talking about testing for correctness and the overall logic of the program or the abstract expression of the calculation, there are only a limited number of variables. Trying to divide them and vary them under this kind of control seems like a valuable tactic. It might not be sufficient but I've seen way more bugs in shitty if else boolean soups than I have from bits getting flipped.
If the output is so sensitive to the value of a certain calculation, there should probably be a secondary check or a checksum. Unless you're doing games, where... framerates.
I'm not sure I agree with that, but I'll meet you halfway: those errors rarely matter in practice. (There are only ~2^33 people, so generating 2^96 events (or any significant portion of that) is, well, impractical. So failure on a single random input is unlikely to matter.)
Which you can easily do in pytest directly using parametrized test cases.
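For instance, a minimal sketch (the property and the value list are illustrative, not from the thread):

```python
import math
import pytest

# A handful of "special" floats that often expose edge-case bugs.
SPECIALS = [0.0, -0.0, 1.0, -1.0, 0.5, float('inf'), float('-inf'), 5e-324]

@pytest.mark.parametrize("x", SPECIALS)
def test_abs_clears_the_sign_bit(x):
    # abs() must return a value with a positive sign bit, even for -0.0
    assert math.copysign(1.0, abs(x)) == 1.0
```

Each special value becomes its own test case in the report, so a failure pinpoints exactly which input broke.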
The relevancy is that a sci-fi author thought about applying this method over 60 years ago, and at the time it seemed like an insurmountable task for humans that could be solved with massive computers. With the unexpected consequence of ending the universe. And now anyone can do this on their desktop.
That is the natural entropic end-state of all large social news sites. HN has antibodies against that, downvoting a bit more aggressively than might seem warranted at first glance, but all in all I think it's a wise choice.
Then you'll find far better on Slashdot where free speech seems to actually happen, as opposed to here where you're not allowed to joke about anything.
I mean, it's a timeless classic, but the fact that it's three years old is still worth acknowledging.
But it wasn't really documented or guaranteed that it would, so I wrote some bit-bashing code to generate all 2^32 bit patterns.
if(isfinite(testValue.f) && isfinite(refValue.f)
? testValue.i == refValue.i
: (fpclassify(testValue.f) == fpclassify(refValue.f)
&& signbit(testValue.f) == signbit(refValue.f)))
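For context, here's a self-contained Python sketch of that comparison (the test/reference functions are hypothetical stand-ins; a full 2^32 sweep is only practical in C, so this demo sweeps a small slice of bit patterns around -0.0):

```python
import math
import struct

def bits_to_float(bits):
    # Reinterpret a 32-bit pattern as a float32 value.
    return struct.unpack('<f', struct.pack('<I', bits))[0]

def float_to_bits(f):
    return struct.unpack('<I', struct.pack('<f', f))[0]

def same_result(test, ref):
    # Mirrors the C comparison above: exact bit equality for finite values,
    # otherwise just compare classification (NaN/inf) and sign.
    if math.isfinite(test) and math.isfinite(ref):
        return float_to_bits(test) == float_to_bits(ref)
    return (math.isnan(test) == math.isnan(ref)
            and math.isinf(test) == math.isinf(ref)
            and math.copysign(1.0, test) == math.copysign(1.0, ref))

# Toy functions under test: ref_fn mishandles -0.0 relative to abs().
test_fn = abs
ref_fn = lambda x: -x if x < 0 else x

# Sweep a small slice starting at -0.0 (bit pattern 0x80000000); in C the
# same loop over all 2^32 patterns runs in seconds.
mismatches = [b for b in range(0x80000000, 0x80000010)
              if not same_result(test_fn(bits_to_float(b)),
                                 ref_fn(bits_to_float(b)))]
```

Comparing at the bit level, rather than with ==, is what catches the -0.0 discrepancy here (and would catch NaN sign/class differences too).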
I doubt it's that I know more than Bruce Dawson about FP; is it about working in the distant past (2014, cough) without compiler support for C99 stuff like fpclassify and signbit?
To some extent it comes down to personal taste and how much knowledge of FP you want embedded in the test.
Weird, because I am running standard chrome on windows 7.
- Haskell, Erlang: QuickCheck
- Elixir: StreamData
- Clojure/ClojureScript: clojure.spec
Other languages have their own implementations. It may not cover everything, but it will cover more than manually hardcoding values in your tests.
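The core idea is simple enough that a toy version fits in a few lines of stdlib Python (a sketch of the technique only, not how any of the libraries above work internally):

```python
import random
import struct

def random_float32():
    # Uniformly random 32-bit pattern, reinterpreted as a float32 value.
    return struct.unpack('<f', struct.pack('<I', random.getrandbits(32)))[0]

def check_property(prop, trials=200_000):
    # QuickCheck-style driver: return a counterexample, or None if none found.
    for _ in range(trials):
        x = random_float32()
        if not prop(x):
            return x
    return None

# "x + 0.0 == x" looks safe but fails for NaN, and random bit patterns
# hit NaNs fairly often (~0.4% of float32 patterns are NaNs).
counterexample = check_property(lambda x: x + 0.0 == x)
```

Real libraries add shrinking (minimizing the counterexample) and smarter generators, but the search loop is essentially this.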
- firstly, binary ops really need not 4G ops to test them all, but 4G^2 ops, which is not a "few seconds" but 4G times a few seconds
- secondly, as a chip designer, by the time you've got silicon it's essentially too late; testing 4G or 4G^2 operations in the original verilog/vhdl model is very much not a few seconds, or even 4G times a few seconds
The author is complaining about this particular (bad) implementation of ceil and related functions giving incorrect results (some due to said rounding rule).