
'9223372036854775807' == '9223372036854775808' - moe
https://bugs.php.net/bug.php?id=54547
======
extension
I can understand the rationale for coercing strings to numbers for an
operation that is not valid on strings, but coercing strings to numbers _just
because it's possible_ is clearly a terrible idea. It's like they looked at
JavaScript and decided that the == operator was just not hazardous enough.

~~~
nikic
Please have a look at the comments by "jabakobob at gmail dot com" and myself
("nikic@php.net"). They explain why such behavior is actually good, in most
cases.

~~~
ctdonath
Throwing away data without unavoidably good reason is BAD behavior. Period.

We've got gigabytes of RAM and terabytes of storage to work with; don't throw
away characters in a string just because they don't fit in a compact data type
used only because there is a passing resemblance of one to the other.

If I'm comparing two 53-digit barcodes, and they differ only by the last digit
( _checksum_ ), then it's very important that comparing those two STRINGS
comes up FALSE.

~~~
nbpoole
> _If I'm comparing two 53-digit barcodes, and they differ only by the last
> digit (checksum), then it's very important that comparing those two STRINGS
> comes up FALSE._

Then use === and do a type-strict comparison.

    
    
        <?php
    
        // Prints bool(true)
        var_dump('9223372036854775807' == '9223372036854775808');
    
        // Prints bool(false)
        var_dump('9223372036854775807' === '9223372036854775808');

~~~
ctdonath
Last week I spent 4 hours trying to find what turned out to be a missing =
because C++ will not complain, but will behave very differently, when = is
confused with ==. Now you want to add === to the mix?

I'll stand by the axiom that throwing away data should _NOT_ occur unless no
other sensible option is available. If I'm comparing two literal strings, I
shouldn't have to start with the obscure knowledge that a simple comparison
will result in an aggressive attempt to perform two consecutive non-obvious
type casts with a high risk of data loss.

I'm reminded of the great Belkin router fiasco: wireless routers were shipped
with the "hold muh beer" great idea that random web page requests would be
redirected to Belkin ad pages. I don't buy Belkin products any more (and that
was years ago now) because knowing they would go there broke the trust that
they wouldn't. Ditto here: if PHP is going to go to great lengths to try to
throw away critical data (hey, I'm storing those numbers as strings BECAUSE I
need _all_ the digits), then I can't trust that the language won't do other
similarly stupid things. I'm working in an industry where such a cavalier
attitude to data can cost MILLIONS of $$$ over one failure, and can't afford
to use a language where such failures are systemic. That there exists a
workaround is inadequate. </tangent>

Fine. I could use ===.

The problem remains that a fundamental axiom of the language design is that
casting lossless to lossy data types, without direction or warning, is
considered acceptable. Ya know, if PHP wants to convert my numeric strings to
integers for comparison, fine ... IF it maintains precision and preserves all
the data. I shouldn't have to know of and use other operators/functions to
explicitly avoid a pathological pursuit of forgetfulness.

~~~
ajross
No offense, but it took you four hours to chase down a syntax error? If it
survived long enough to see that kind of debugging effort, it was almost
certainly dead code; seems like a unit test would have caught it before the
first commit. And what compiler are you using? GCC certainly will issue
warnings when the result of a = operator is used in a boolean context.

~~~
to3m
"No offence", yes, always a good way to start one's post! I think you missed
out the "just saying", though. You need that too.

'=' vs '==' is not a syntax error. Consider "x=y=z" vs "x=y==z". And it's in
somebody else's code. And they wrote it 2 months ago, but the programmer who's
using it has only just started working with it. And they are super busy and
don't have time to look at it. And it sort of looks like the problem is in the
code you changed last week.

You can easily lose 4 hours over this stuff... have some imagination ;)

~~~
ajross
I'm just sayin', but this is a ridiculous strawman. (Well, you're right that
it's not a syntax error in the sense of compiler output. I should have been
more precise and called it a "syntax goof" instead.)

I addressed the "someone else wrote it two months ago" point above: If that
happened, and this was in the code, _it was dead code for two months_ ,
because it clearly couldn't have been running correctly. That's a process
problem, not a syntax issue, and the appropriate fix is clearly not to modify
the syntax of the language.

( _Edit for ctdonath: good grief. 1.) the reply was to to3m's post, not yours.
2.) The "two months" thing comes straight out of his example, please read it.
3.) It was a JOKE, based on his chiding me for language. 4.) Why are you still
flaming about this?_ )

~~~
ctdonath
"two months ago" is your own straw man. You made that up.

I'd written the code the day before, and it was failing a pre-commit unit
test. As I posted elsewhere, this kind of "forgot the second =" error _can_
compile without warning, esp. within a complex evaluation. The process was
running fine, as it caught the existence of the logic error early. That it
took hours to find was a matter of tracing symptoms back to cause in an
embedded system not easily debugged when running.

One could make a valid argument that this is a problem of language syntax, as
everyone has been bitten by the = vs == difference. As such, and in line with
this thread OP, you'd think a new popular language would learn from that
mistake and would not throw === into the mix as a solution to an even more
obscure problem (casting a string to a float? really?).

------
fpp
interesting when you see your comments on hacker news as submissions to hacker
news a day later: ( <http://news.ycombinator.com/item?id=3825132> )

"...My 5cents on that (recently published issues):

        <? if ('9223372036854775807' == '9223372036854775808') { echo "I can not count!\n"; } ?>

(see <https://bugs.php.net/bug.php?id=54547>)

or

built-in PHP web server dies with a large Content-Length header value: The
value of the Content-Length header is passed directly to a pemalloc() call in
sapi/cli/php_cli_server.c on line 1538. The inline function defined within
Zend/zend_alloc.h for malloc() will fail, and will terminate the process with
the error message "Out of memory". (see
<https://bugs.php.net/bug.php?id=61461>) Luckily we are getting Javascript
ready to replace all PHP on the server sooner or later ;-)..."

~~~
m_for_monkey
You are bragging as if you invented that bug :)

------
hiddenbayes
This behavior is documented here:
<http://php.net/manual/en/language.operators.comparison.php>

    
    
      If you compare a number with a string or the comparison
      involves numerical strings, then each string is converted
      to a number and the comparison performed numerically.
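
A minimal Python sketch of what that rule implies for the two strings in the title, assuming (as other comments in this thread describe) that values too large for a 64-bit integer end up as double-precision floats. The helper name `compare_numerically` is made up for illustration; this models the documented rule and is not PHP's implementation.

```python
# Two numeric strings that differ only in the last digit.
a = "9223372036854775807"  # 2**63 - 1
b = "9223372036854775808"  # 2**63


def compare_numerically(x: str, y: str) -> bool:
    """Hypothetical helper: compare two numeric strings as doubles."""
    return float(x) == float(y)


# As strings they differ.
print(a == b)                     # False
# As doubles both round to 2**63, so they compare equal.
print(compare_numerically(a, b))  # True
print(float(a) == 2.0 ** 63)      # True
```

A double has a 53-bit significand, so near 2**63 it cannot tell neighbouring integers apart.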

~~~
ctdonath
Ah, an obscure point of absurdity which utterly kills my pending interest in
the language. If this sort of thing exists under the hood, revealed only by a
detailed analysis of the specification, what other nonsense is there? Going so
far as analyzing a string to determine whether it consists entirely of numbers
for the non-sequitur process of then and only then converting it to what it
isn't for logical evaluation is working pretty hard to do something counter-
intuitive; might be tolerable if it actually preserved all digits, but not
only does it work hard to convert a string to an integer, it then converts
large integers into floating-point values - not just one, but _two_ layers of
explicitly undesired and unnecessary and unreasonable typecasting.

I'm currently working with barcodes: numerical strings from 6 to 55 digits. In
no way can I risk having one barcode be evaluated as equal to a literally
different barcode just because the symbols in that string just happen to
exhibit a passing resemblance to data of a different type.

Again, it's not just that it has loose typing. It's that it's taking what is
OBVIOUSLY a string, converting it to an integer, THEN converting it to yet
another data type which imposes data loss.

Intolerable for real-world use. A toy language. Alas, PHP, we hardly knew
you...

ETA: Oh, I'd love to know the justification for the downvoting.

~~~
JadeNB
> Going so far as analyzing a string to determine whether it consists entirely
> of numbers

I was about to give an outraged reply that, if PHP is like Perl, then it
doesn't scan the string afresh, just keeps a flag indicating whether or not it
thinks a string is numeric. However, it turns out that's not true at all.
`Perl_looks_like_number`, defined in `sv.c`, calls `Perl_grok_number`, defined
beginning on l. 577 (as of v5.14.2) in `numeric.c`, which (after some book-
keeping) does this:

    
    
        if (s == send) {
          return 0;
        } else if (*s == '-') {
          s++;
          numtype = IS_NUMBER_NEG;
        }
        else if (*s == '+')
        s++;
    
        if (s == send)
          return 0;
    
        if (isDIGIT(*s)) {
          UV value = *s - '0';
          if (++s < send) {
            int digit = *s - '0';
            if (digit >= 0 && digit <= 9) {
              value = value * 10 + digit;
              if (++s < send) {
                digit = *s - '0';
                if (digit >= 0 && digit <= 9) {
                  value = value * 10 + digit;
                  if (++s < send) {
                    digit = *s - '0';
                    if (digit >= 0 && digit <= 9) {
                      value = value * 10 + digit;
    

and goes on and on and on and on in the same vein. Sheesh! (I didn't forget to
close that last brace; the next line is de-dented, but that seems to be a
mistake.)

~~~
chromatic
Perl _does_ cache whether an SV contains something usable as an integer (the
IOK flag) or a floating point number (the NOK flag). That's why you almost
never see `looks_like_number` on its own and always called after using one of
the appropriate flag checking macros.

------
jrockway
This is why I like programming languages with type systems and "numerical
towers":

    
    
        Prelude> "9223372036854775807" == "9223372036854775808"
        False
        Prelude> "9223372036854775807" == 9223372036854775808
    
        <interactive>:1:25:
            No instance for (Num [Char])
              arising from the literal `9223372036854775808'
                       at <interactive>:1:25-43
            Possible fix: add an instance declaration for (Num [Char])
            In the second argument of `(==)', namely `9223372036854775808'
            In the expression: "9223372036854775807" == 9223372036854775808
            In the definition of `it':
            it = "9223372036854775807" == 9223372036854775808
        Prelude>
    

Yes. Strings are not numbers.

~~~
bromagosa
You don't actually need type systems to test equality and identity the right
way...

~~~
xyzzyz
The point here is that these strings are neither equal nor identical.

~~~
jrockway
Yes, it's the implicit conversion that matters. If you try to write 2 == 2.0
in Haskell, it will blow up, because doubles and integers are not the same
type. You need to explicitly convert one of them to another representation
before you can compare them. That guarantees defined and repeatable semantics
at compile time, which I think is excellent.

(This is not strictly required, of course; you can write a typeclass that
defines a two-parameter ==, instead of a -> a -> Bool, it could be a -> b ->
Bool. But that's dumb, so nobody does.)

~~~
syaramak
Uh? I just tried it and it works without blowing up.

        Prelude> 2 == 2.0
        True

------
postfuturist
As a working programmer who has to use PHP, I just use === all the time and
have long since moved on from even thinking about the insanity of PHP's ==
operator. Kinda like JavaScript programmers.

~~~
prodigal_erik
This. PHP's "==" is yet another trap of incompetent language design and almost
all code that ever used it does the wrong thing for some inputs. $x == $y &&
$y == $z doesn't even tell you that $x == $z, much less that $a[$x] == $a[$y].
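
A rough Python model of the non-transitivity, under two stated assumptions: the coercion rule quoted from the manual elsewhere in this thread, and that a non-numeric string compared against a number counts as 0 (which, to the best of my knowledge, matches PHP 5). `loose_eq`, `to_number` and the `NUMERIC` pattern are hypothetical names, and the pattern is a simplification rather than PHP's exact grammar.

```python
import re

# Simplified pattern for a "numeric string" (an assumption, not PHP's grammar).
NUMERIC = re.compile(r"^[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?$")


def to_number(value):
    """Numbers pass through; numeric strings are parsed; other strings become 0."""
    if isinstance(value, (int, float)):
        return float(value)
    return float(value) if NUMERIC.match(value) else 0.0


def loose_eq(a, b):
    """Hypothetical model of a coercing == operator."""
    both_strings = isinstance(a, str) and isinstance(b, str)
    if both_strings and not (NUMERIC.match(a) and NUMERIC.match(b)):
        return a == b  # two strings, at least one non-numeric: plain comparison
    return to_number(a) == to_number(b)


x, y, z = "abc", 0, "def"
print(loose_eq(x, y), loose_eq(y, z), loose_eq(x, z))  # True True False

# Equal under loose_eq, yet distinct as dictionary (array) keys.
p, q = "9223372036854775807", "9223372036854775808"
table = {p: "first", q: "second"}
print(loose_eq(p, q), table[p] == table[q])  # True False
```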

------
ww520
A somewhat related story dealing with MaxInt in Javascript.

One of the worst bugs I've encountered years ago involved the conversion of
Javascript int from string to number. Javascript's long integer has only 53
bits, while most other languages have 64-bit long int. When the backend
language generated Javascript snippets (JSON) containing integers greater than
53 bits, the horror started at the frontend. Javascript happily truncated the
int to 53 bits upon conversion from string to int. It was not a happy tale
since those long integers were account numbers. The wrong accounts ended up
getting updated, randomly at first appearance.
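
The truncation can be reproduced in a few lines of Python, assuming the frontend stores every number as an IEEE 754 double. The account number and the helper `survives_double` are made up for illustration.

```python
# Doubles have a 53-bit significand, so every integer up to 2**53 is exact.
SAFE_LIMIT = 2 ** 53


def survives_double(n: int) -> bool:
    """Hypothetical helper: True if n is unchanged by a trip through a double."""
    return int(float(n)) == n


account = 9007199254740993  # 2**53 + 1, a made-up account number

print(survives_double(SAFE_LIMIT))  # True
print(survives_double(account))     # False
print(int(float(account)))          # 9007199254740992, a different account
```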

~~~
mikeash
I think the lesson there is that numeric types should only be used for things
you actually want to do arithmetic with. An account ID that just happens to be
all digits should still be stored and transmitted as a string.
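
A small sketch of that advice using Python's `json` module. The consumer that parses every number as a double is emulated with `parse_int=float`; the field name `account_id` and the ID itself are invented for the example.

```python
import json

account = 9007199254740993  # made-up ID, one above 2**53

# Sent as a JSON number: a consumer that parses numbers as doubles
# (emulated here with parse_int=float) receives a different value.
as_number = json.dumps({"account_id": account})
received_number = json.loads(as_number, parse_int=float)
print(int(received_number["account_id"]) == account)  # False

# Sent as a JSON string: every digit arrives intact.
as_string = json.dumps({"account_id": str(account)})
received_string = json.loads(as_string, parse_int=float)
print(received_string["account_id"] == str(account))  # True
```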

~~~
ww520
The lesson I got was to be very careful about data type limitation when going
across language boundary. The problem is not limited to numeric types.
Different encoding and code page can screw up string values as well.

~~~
mikeash
If you're not using UTF-8 everywhere then you're doing it wrong. Exceptions
made for legacy systems, but you should get that data into UTF-8 as soon as
possible.

~~~
ww520
It's unwise to lazily adopt a silver bullet without understanding the context
and thinking through the consequence. I can say if you are not using XML with
encoding specified to encode everything everywhere, then you are doing it
wrong. You should get all your data into XML as soon as possible. Of course it
sounds ludicrous.

~~~
mikeash
XML is just one data storage and exchange format among many, with no
particularly interesting properties and no compelling reason to use it. UTF-8
is the _only_ encoding that's ASCII compatible, widely accepted/expected, and
can represent any text you'll ever encounter.

I can come up with half a dozen reasons to use something other than XML for
data storage. I've yet to hear anyone give me a compelling reason to use
something other than UTF-8 for encoding strings. Just because what I said is
absurd when you replace UTF-8 with XML doesn't mean the original was absurd.

~~~
ww520
UTF-8 is not efficient for random access.

I don't have problem with UTF-8. I have problem with the silver bullet
attitude advocating using an approach for all cases without thought. That's
just intellectually lazy.

~~~
mikeash
No encoding that can handle all the necessary languages will be efficient for
random access.

I'm not saying don't think about it. But once you think about it, I think
there's really only one sane conclusion to reach.

~~~
ww520
Never say never. UTF-32 handles them just fine.

~~~
mikeash
Precomposed versus decomposed accents? Jamo versus precomposed Hangul
characters? The Unicode code point is rarely a useful thing to know about on its
own, and code which assumes that one code point equals one "character", for
whatever definition of a character is in use, is likely to work poorly with
UTF-32.
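
A short Python illustration of the precomposed versus decomposed point, using only the standard `unicodedata` module. It shows one "character" occupying one or two code points, so fixed-width code points alone do not give random access to characters.

```python
import unicodedata

precomposed = "\u00e9"  # é as a single code point
decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT

print(len(precomposed), len(decomposed))  # 1 2
print(precomposed == decomposed)          # False

# Normalizing both to NFC makes them comparable.
nfc = unicodedata.normalize("NFC", decomposed)
print(nfc == precomposed)                 # True

# Fixed-width UTF-32 still spends two units on the decomposed form.
utf32_units = len(decomposed.encode("utf-32-le")) // 4
print(utf32_units)                        # 2
```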

------
Spoom
Some of the comments on the bug report asking for the operation of == to
change are misguided. Such a change would break many real-world applications.
As I understand it, PHP is casting number-like strings to integers, and this
fails because both numbers generated from the cast are above PHP_INT_MAX, so
their values are undefined.

This is easily solved by using the type-checking === operator, which exists
for that purpose.

I hesitate to say that this is a feature, not a bug, but it is clear that this
is documented behavior.

------
pestaa
In this perspective, writing `strcmp` everywhere is not boilerplate, but a
requirement.

(Such that

    
    
        strcmp('9223372036854775807', '9223372036854775808');
    

returns -1, meaning the strings are _not_ equal.)

~~~
vladev
No, you just need strict comparison (like you need it in JS and any other
language with weak typing):

    
    
        php > var_dump('9223372036854775807' == '9223372036854775808');
        bool(true)
        php > var_dump('9223372036854775807' === '9223372036854775808');
        bool(false)

~~~
eblume
I beg your pardon, but you do not 'need it in JS', if 'it' is referring to
using === instead of == to compare strings.

Here is what node.js says:

    
    
        > "9223372036854775807" == "9223372036854775808"
        false

~~~
mikeryan
Ah but more fun with Javascript

        >>> "9223372036854775807" == "9223372036854775808"
        false

        >>> 9223372036854775807 == "9223372036854775808"
        true

        >>> 9223372036854775807 == 9223372036854775808
        true

I believe the grandparent post is more referring to "general" use cases than
this one. Personally I now default to strict comparison operators both in JS
and PHP unless I explicitly want a loose comparison and end up missing most of
these strange vagaries these days.

------
DiabloD3
The fact PHP fails at floating point math isn't news (and I swear I've seen
this exact bug somewhere, same numbers and everything, somewhere else)...

It's the fact PHP refuses to fix it that is the news.

~~~
gloob
This sort of thing is an intrinsic property of floating point math. It has
limited precision. When the numbers get sufficiently large, that precision is
insufficient to distinguish successive integers. That is to say, this is
symptomatic of PHP implementing floating point stuff correctly.

This is what ghci (Haskell) says:

    
    
      Prelude> 9223372036844775807 == 9223372036844775808
      False
      Prelude> 9223372036844775807.0 == 9223372036844775808.0
      True
    

This is what Python says:

    
    
      >>> 9223372036844775807 == 9223372036844775808
      False
      >>> 9223372036844775807.0 == 9223372036844775808.0
      True
    

Here is what SBCL (Common Lisp) says:

    
    
      * (= 9223372036844775807 9223372036844775808)
      NIL
      * (= 9223372036844775807.0 9223372036844775808.0)
      T
    

Lua:

    
    
      > print(9223372036844775807 == 9223372036844775808)
      true (!!!!!)
      > print(9223372036844775807.0 == 9223372036844775808.0)
      true
    

Javascript:

    
    
      alert(9223372036844775807 == 9223372036844775808)
      true (!!!!!)
      alert(9223372036844775807.0 == 9223372036844775808.0)
      true
    

Other languages that will also do this[1]: Javascript, Lua. Languages that
won't: anything with actual, honest-to-god integers, and not floats or doubles
masquerading as them.[2] Languages that actually handle numbers sensibly:
Lisp.[3] I'm not familiar with any others that actually treat rational numbers
like rational numbers, but I expect there are some. (It's still, of course,
impossible to treat real numbers like real numbers, meaning that this sort of
thing will also happen there.)

[1] Well, not the string-to-number bit, but whatever.

[2] Except for the niggle that they'll still do this when you're using
floating point numbers, because this is what floating point numbers _do_.

[3] <https://en.wikipedia.org/wiki/Numerical_tower>
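
For what it's worth, the precision gap and the exact-rational point can both be checked in Python, whose integers are arbitrary precision and whose standard library ships a `fractions` module. A minimal sketch; the variable names are only for illustration.

```python
from fractions import Fraction

# Near 2**63 adjacent doubles are 2048 apart, so +1 is lost entirely.
gap = float(2 ** 63 + 2048) - float(2 ** 63)
print(gap)                                 # 2048.0
print(float(2 ** 63 + 1) == float(2 ** 63))  # True

# Python integers are arbitrary precision, so the integer comparison is exact.
ints_equal = 9223372036854775807 == 9223372036854775808
floats_equal = 9223372036854775807.0 == 9223372036854775808.0
print(ints_equal, floats_equal)            # False True

# Exact rationals versus binary floating point.
exact = Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10)
approx = 0.1 + 0.2 == 0.3
print(exact, approx)                       # True False
```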

~~~
eblume
I think you may be missing the point, or maybe it's I who am missing the
point. Your three very thorough examples are a good way of showing how (most?)
languages handle floating-point arithmetic vs. arbitrary-precision arithmetic.

But it seems to me - and let me stress that I am not a PHP developer and won't
be bothered to install PHP on my machine at this time - that PHP is failing to
exhibit exactly the behavior your code examples are giving.

Put it another way - type coercion 'run amok' being another thing entirely,
you are correct that this bug stems from the fact that PHP is converting these
integers to floating-point, and the standard floating point implementations
will all behave in this exact way (thus, not a PHP bug.)

However, the issue here is that (again, "most?") languages also provide an
easy way to get to arbitrary-precision arithmetic - and indeed, in the three
examples you posted, you simply encode in the most natural way (by simply
writing them) the two integers and they automatically compare correctly.

My understanding is that this is not the case in PHP, and that is a shame.

~~~
gloob
I agree that, when we are talking about high-level languages, I prefer ones
that will transparently convert integers to bignum when required. I'm just
replying to the contention that (paraphrased) "this is a bug in PHP's handling
of floating point numbers".

~~~
eblume
Ah, point taken, sorry about that. :)

------
fexl
This is how the strtol function in the standard C library works. Here's a test
program:

    
    
        #include <stdio.h>
        #include <stdlib.h> /* strtol, strtod */
    
        /* Convert string to long and return true if successful. */
        int string_long(char *beg, long *num)
            {
            char *end;
            *num = strtol(beg, &end, 10);
            return *beg != '\0' && *end == '\0';
            }
    
        int main(void)
            {
            char *x_str = "9223372036854775807";
            char *y_str = "9223372036854775808";
    
            long x;
            long y;
    
            int x_ok = string_long(x_str, &x);
            int y_ok = string_long(y_str, &y);
    
            printf("x_str = %s\n", x_str);
            printf("y_str = %s\n", y_str);
    
            printf("x = %ld (ok=%d)\n", x, x_ok);
            printf("y = %ld (ok=%d)\n", y, y_ok);
            printf("x and y are %s\n", x == y ? "equal" : "not equal");
    
            return 0;
            }
    
    

And here's the output:

    
    
        x_str = 9223372036854775807
        y_str = 9223372036854775808
        x = 9223372036854775807 (ok=1)
        y = 9223372036854775807 (ok=1)
        x and y are equal
    

I compiled it like so:

    
    
        gcc -c -Wall -Werror -ansi -O3 -fPIC src/test_num.c -o obj/test_num.o
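
Note that `strtol` does report the overflow: it returns `LONG_MAX` and sets `errno` to `ERANGE`, which the test program above never checks, hence `ok=1` for both values. A checked conversion might look like the following Python sketch. The helper `parse_long` and the 64-bit bounds are assumptions made for the example, and Python's own integers never overflow, so the range test is explicit.

```python
LONG_MIN, LONG_MAX = -(2 ** 63), 2 ** 63 - 1  # assumes a 64-bit long


def parse_long(text: str) -> int:
    """Hypothetical helper: parse a decimal string, refusing out-of-range values."""
    value = int(text, 10)  # Python integers do not overflow
    if not LONG_MIN <= value <= LONG_MAX:
        raise OverflowError(f"{text} does not fit in a 64-bit long")
    return value


print(parse_long("9223372036854775807"))  # 9223372036854775807

try:
    parse_long("9223372036854775808")
except OverflowError as exc:
    print("rejected:", exc)
```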

~~~
fexl
Incidentally if you try using those numeric constants directly in a C program,
it fails to compile:

    
    
        #include <stdio.h>
    
        int main(void)
            {
            long x = 9223372036854775807;
            long y = 9223372036854775808;
    
            printf("x = %ld\n", x);
            printf("y = %ld\n", y);
            printf("x and y are %s\n", x == y ? "equal" : "not equal");
    
            return 0;
            }
    

The error message is:

    
    
        gcc -c -Wall -Werror -ansi -O3 -fPIC src/test_num2.c -o obj/test_num2.o
        src/test_num2.c: In function ‘main’:
        src/test_num2.c:6:11: error: integer constant is so large that it is unsigned [-Werror]
        src/test_num2.c:6:2: error: this decimal constant is unsigned only in ISO C90 [-Werror]
        cc1: all warnings being treated as errors

------
ars
Judging by the huge number of stupid comments on that bug report this bug was
posted on reddit.

If you don't actually use PHP (as most of the commenters seem not to) don't
comment on the bug, it has nothing to do with you and you are just making
noise.

------
caioariede
You can do literal comparison using ===

But the problem, as I see it, is that this is very error-prone. A less
error-prone solution would be to convert a string value to a number only when
the other operand is really a number. For example:

'9223372036854775807' == 9223372036854775808

~~~
mistercow
> A less error-prone solution would be to convert a string value to a number
> only when the other operand is really a number.

That is, in fact, exactly how JS handles it.

------
ndefinite
It's the same with JavaScript regardless of whether you use == or ===

        9223372036854775807 == 9223372036854775808
        true

        9223372036854775807 === 9223372036854775808
        true

~~~
burgerbrain
Is _"javascript does it too"_ supposed to be a defence?

~~~
ndefinite
More of an observation than anything else. I don't actually use PHP but find
these recent PHP focused articles are helping me learn a thing or two about
languages I do use.

~~~
burgerbrain
Fair enough.

