Maybe I'm just getting incurably academic, but I think "I found a bug!" is not nearly as interesting as "I found a bug!" and one of:
1) It affects millions of people!
2) It affects large amounts of money!
3) It is a great example of a new or rare class of bug. Here's how you can avoid introducing similar bugs into your code! Here's how we can detect these things automatically! etc.
Here, the only thing that springs to mind is IEEE floating point being used internally by Google Calculator for some reason, which I find hard to believe. Where's the insight?
Well, apparently, it was obvious. AFAICT, no one has come up with a better explanation than using IEEE floating point where bignums should be used, which I pointed out in my original comment.
The guy who was the maintainer of Google Calculator from 2005 to 2007 confirmed that this is the correct explanation. He said he wanted to switch it to use bignums, but wasn't able to (and not due to technical reasons).
it's 80-bit ieee floats internally, but there's some specialized code
to clamp it to zero when it looks like it ought to be zero.
as far as I know it should otherwise behave exactly like 80-bit ieee
(it wouldn't surprise me if the clamping code is being a little overzealous
in this case)
The algorithm probably first decides how to do the math and then does the math.
EDIT: single-precision does behave this way:
-bash-3.2$ cat precision.c
#include <stdio.h>
int main() {
    float f1 = 999999999999999;
    float f2 = 999999999999997;
    printf("%f\n", f1 - f2);
    return 0;
}
-bash-3.2$ gcc precision.c
-bash-3.2$ ./a.out
0.000000
I am saying that if it's not consistent with single-precision across multiple inputs, there's probably code that looks at the numbers involved and decides what kind of numeric type to use to do the calculations. In this case, single-precision floating-point happens to be a bad choice.
Second edit: replacing the 7 with a 6 still prints 0. The parent is right: something else is going on here. I like the theory espoused elsewhere in this discussion that it's a result of some truncation to avoid returning nonzero when the result should be zero.
What exactly do you mean, and how does it make "using floating point arithmetic with insufficient precision" a viable explanation for this, given that none of the usual floating-point formats behave in this way?
Insufficient precision implies too few bits to store the number in.
These are unsigned integers; they could have been represented as integers in a 64-bit unsigned value. If the number of bits in the variable destined to hold the input is not enough and you keep on working as though there are, that's overflow all right.
It should have simply thrown an error: operand too large.
I have a dark brown feeling that it won't be long before it does though :)
My definition of "overflow" is along the lines of "value is too big to be represented by said type". If they had been using integers, ok. But they're using floats, and floats can hold numbers bigger than that.
It all boils down to what type you consider the entries to have, as opposed to what Google considers.
I did not realize they use floats to represent all types; 64.64 fixed point would have been an excellent choice for this and would go a long way toward solving the issue reported.
I remember a programmer once griping that you still need to be careful with floats, and that double precision still did not have enough significant digits to store the dollar to Turkish lira conversion rate.
No, you can do realtime graphics just fine with fixed point arithmetic.
You can do nice matrix and vector math entirely in 16.16 fixed point. It's not as fast as having a co-processor, but there was a time that those were not standard items.
This is clearly a floating point precision problem. It happens when there are no bits in the mantissa small enough to express the difference between the two values.
This is definitely not just a matter of using double where something longer would have been better. 999999999999999 and 999999999999997 are both exactly representable as 64-bit IEEE floating-point numbers (i.e., doubles).
They're not using single-precision floats either; with one 9 fewer in each number, everything's fine, but those numbers are way out of range for 32-bit IEEE floats.
Doesn't look like fixed-precision decimals, either: reduce the second number by 1 and you correctly get a difference of 3.
Tweaking that second number, I find that when the difference is 2 or less we get 0, but otherwise we get the right answer. More evidence that it's not simply a FP imprecision problem. Further, if I approximately halve them both (just change the initial 9 to a 4) then a difference of 1 or less produces 0 and other results are correct.
I think what's going on is more likely this: whoever implemented this wanted to avoid giving nonzero answers to calculations whose correct answer is zero, and therefore put in some sort of deliberate truncation somewhere. (It's not clear to me exactly what -- maybe every addition or subtraction that produces an answer much much smaller than its inputs gets clamped to 0, or something.)
I don't think the truncation is happening purely at the output stage: if I ask it to do the original calculation and multiply the result by 100, e.g., I still get 0. (I had to write it like that because otherwise HN's asterisks-mean-italics hack scrambles it. Even though the supposedly matching asterisk is in a different paragraph. PG, if you're reading this, I'd love it if you were to change that behaviour.)
In other words, I think we're seeing an unfortunate effect of the same feature that makes it say that cos(2*atan(1)) is 0 rather than reporting it as about 6.123 x 10^-17 (the actual result in IEEE doubles).
One of the reasons Lisp is a nice language for math is that its numbers behave more like true mathematical numbers than the approximations of numbers that are easy to implement in finite computer hardware. For instance, integers in Common Lisp can be almost arbitrarily large rather than being limited by the size of a machine word. And dividing two integers results in an exact ratio, not a truncated value. And since ratios are represented as pairs of arbitrarily sized integers, ratios can represent arbitrarily precise fractions.
On the other hand, for high-performance numeric programming, you may be willing to trade the exactitude of rationals for the speed offered by using the hardware's underlying floating-point operations. So, Common Lisp also offers several types of floating-point numbers, which are mapped by the implementation to the appropriate hardware-supported floating-point representations. Floats are also used to represent the results of a computation whose true mathematical value would be an irrational number.
Finally, Common Lisp supports complex numbers--the numbers that result from doing things such as taking square roots and logarithms of negative numbers. The Common Lisp standard even goes so far as to specify the principal values and branch cuts for irrational and transcendental functions on the complex domain.
These days quite a few languages already support bignums, rationals and even complex numbers. CL's support for these was unique back then, but I guess it has lost its edge in this domain.
Still, support for strict read/write invariance of floating point numbers (e.g. as described in Burger & Dybvig 1996 and Clinger 1990) doesn't seem to have spread widely outside the CL/Scheme camp. The same goes for the concept of exact/inexact numbers.
IEEE 754 double precision can represent numbers with about 16 decimal digits of precision; however, Google Calculator (from anecdotal evidence) seems to only reliably represent numbers with 14 digits of precision (ex. http://www.google.com/search?q=99999999999999+-+999999999999...)
Interestingly, for pure decimal operations it seems some other floating point representation is used. For example, http://www.google.com/search?q=1.0+-+0.2+-+0.2+-+0.2+-+0.2+-... returns the mathematically correct result of zero. If double precision were being used, it should return the mathematically incorrect result of 0.0000000000000000555 because of floating point representation errors.
Here is what seems to have happened: Note that whenever the answer is not accurate, Google uses scientific notation. For example, 1e15 + 30 = 1.0 × 10^15. We accept this as true because Google only claims it is true to two sig figs.
However, in this case, Google means to say 0.0 × 10^15, indicating two sig figs of accuracy. Google's mistake is probably that the internal notation used does not allow it to distinguish between "0" and "0.0 x 10^15", and so it does not report that the result is only accurate to two sig figs as it should.
So, if Google somehow reports the "zero" result in scientific notation with two sig figs, then it will be consistent with the rest of their calculator and clear that it may not be an accurate result.
Even in non-standard models of arithmetic, where you could use infinity as an operand, the value of that expression would have to be undefined (not 1, as you seem to expect).
Yeah ... I was suggesting that a more sophisticated algorithm for arithmetic on very large integers could eliminate substantial similarities, leaving the 'zone of difference'. A limited 'bit window' needs to be more versatile.
For example, a while back a man got a bill for $35 billion ... that wouldn't happen with more intelligence built into routines. (OUR minds look at 999...999 and 999...997 and see quickly what can be eliminated.)
What about limit as x goes to infinity of x - (x - 1)? We do the algebra first, right? Now, that's a special case of y - (x - 1) where y = x, so can we get a different residue by going about the limit in two dimensions?
The thing about 4chan is that there's no order. That means not even chaos is an absolute must. Pictures of transsexuals and underage girls and people covered in fecal matter go alongside programming jokes, debates about Obama, and discussions of existentialism. This was closer to the latter.
I don't go on 4chan often - once or twice a year at most - but when I do, it's to get that completely random mix of things all at once. Every time, I come away with 3-4 things I didn't know/hadn't thought of before. This was one of the things I found this time.
The ability to ignore and avoid thinking about certain things is essential when visiting 4chan. If you practice, you can read sentences like the above without having mental images of the described event. More advanced filters can prevent you from thinking about it with imagery even after seeing something you wish you hadn't seen. ;)
Impressive. I find that extremely hard, it's like things play on an internal monitor that I'm forced to watch and can not ignore no matter how much I want to.
I run a webcam site and over the years we've had some pretty weird stuff happening there and I really wished I had not seen any of that.
It's a combination of ignorance and desensitization. My generation grew up with tubgirl and 2girls1cup. I first saw tubgirl/lemon party when I was 13. When I was I think 17, I came across goatse for the first time; by then, a man * his wide wasn't enough to faze me.
I wrote an essay about how this widespread desensitization will affect society and culture. It'll be interesting to see the world in twenty years.
I have to remember to shut that site down. I removed almost all my stuff back in June. It's all archived, but the archives aren't online yet. I'm trying to figure out how best to put it all up again.
When I read those sentences I didn't even come close to thinking about underage girls, girls covered in fecal matter, or underaged girls covered in fecal matter. ;)
Hey man, I have a hard enough time getting my head wrapped around learning (in no particular order) django, python and lisp right now :)
But I appreciate the pointer and I've filed it for future reference.
One thing though, my 'vivid imagination' really comes in to its own while reading books, I literally see the scene in front of me as though completely immersed in it.
Unfortunately, I cannot tell you exactly how to develop this ability, because I only discovered I had it by trying, but I remember not having the ability to NOT think about a pink elephant, so it can be developed. Somehow. :)
Perhaps it's not a unique case - icey's link explained the cause, which incidentally I hadn't known before posting this - but it's still one that's perhaps worth a conversation for some people.