Why MD5('240610708') is equal to MD5('QNKCDZO')? (stackoverflow.com)
81 points by paulgb on April 12, 2023 | 80 comments



By the way, in JavaScript:

  > let a = "0e462097431906509019562988736854"; // = md5("240610708")
  > let b = "0e830400451993494058024219903391"; // = md5("QNKCDZO")
  > let c = 0;
  > a == c
  true
  > b == c
  true
but

  > a == b
  false
(Of course, also solved by using ===)


Oh that is nasty.

For anybody who cares: `a == c` coerces `a` into a number. `a` is a hexadecimal string that happens to have only one non-numeric 'digit', and that digit happens to be an 'e'. This triggers parsing as scientific notation. For both `a` and `b`, the only digit in the significand is 0, so the result is 0.
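
To make the coercion concrete, here is a minimal sketch of what happens to the two digests. The variable names are only for illustration.

```javascript
// md5("240610708") and md5("QNKCDZO"), as quoted above.
const digestA = "0e462097431906509019562988736854";
const digestB = "0e830400451993494058024219903391";

// string == number converts the string by the same rules as Number():
// "0e<digits>" is read as 0 × 10^<digits>, which is 0.
const numA = Number(digestA);
const numB = Number(digestB);
console.log(numA, numB); // 0 0

// string == string performs no conversion, so the digests stay different.
const looseSame = digestA == digestB;
console.log(looseSame); // false
```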


Roughly 7 years ago, I started using strict equality comparisons (===) in PHP and JS to avoid issues like this. Doing so has had zero negative impact on my code and obviously a MUCH bigger upside. Is there any reason to use loose equality (==) beyond mere convenience?


I can't even remember the last time I worked in a Javascript project where you could find "==" in the codebase. It even got phased out as a convenient "null or undefined" check.

Between it being an ancient footgun and the ubiquity of linters, making fun of "==" code ends up just making fun of beginners.


I still use it for null/undefined checks, because it’s almost always what’s intended/expected. I was surprised to find out recently that this is even a remotely controversial take.

I do try to use the newer language features like nullish-coalescing and optional chaining to make the check less necessary, but sometimes it makes more sense (clearer code, less unnecessary code execution) to check for null/undefined and return early.
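
A small sketch of how the two styles can sit side by side. The `cityOf` function and the shape of `user` are made up for illustration.

```javascript
// Hypothetical lookup: `user` and its fields are invented for this example.
function cityOf(user) {
  // Early return: `== null` matches null and undefined, and nothing else.
  if (user == null) return "no user";
  // Optional chaining stops at a missing `address`; `??` supplies the fallback.
  return user.address?.city ?? "no city";
}

const results = [
  cityOf(undefined),
  cityOf({}),
  cityOf({ address: { city: "Taipei" } }),
  cityOf({ address: { city: "" } }), // "" is kept: ?? only replaces null/undefined
];
console.log(results);
```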


Even with a linter that can approve of that one use case, the problem with == these days is that it catches everyone off guard, and every developer who sees the code has to do the same "did they mean to use this or-- oh, it's for a null|undefined check" every single time.

Just because of that I've created an `isNil = (x) => x === undefined || x === null` function and banished == from my code.
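
A runnable version of such a helper might look like the following. The name `isNil` comes from the comment above; the sample list is an assumption picked to cover the usual falsy values.

```javascript
// Sketch of the helper: true only for null and undefined, no loose equality.
const isNil = (x) => x === undefined || x === null;

// It agrees with `x == null` on each sample, including the other falsy values.
const samples = [undefined, null, 0, "", false, NaN, [], {}];
const agrees = samples.every((x) => isNil(x) === (x == null));
console.log(agrees); // true
```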

== can lead to what we've jokingly called tollbooth code: code that can make every reader stop and pay a mental toll while they confirm that the writer did intend to write the code that way.


This is surprising to me! I/my team do use such a linter rule (non-strict equals errors for all cases except directly comparing to null), and I don’t even notice it while reading code because the linter is satisfied, and because it reads as (obviously, to me) correct.

It never occurred to me that it could be so noticeable to others as to add any cognitive overhead. I’m not discounting that at all, to be clear, only that it’s not something I’d ever considered before now.


We have banished it as well for this exact reason — the “toll” — if you want 1 more piece of anecdata.


Thank goodness for nullish coalescing.


I think it's a lot less nasty than the Python version. At least the JS version only gives you a weird answer if you're comparing a number to a string, while Python will automatically do a weird type conversion even if you're just comparing two "number looking strings" to begin with.

Edit: I'm an idiot, thanks for the responses for correcting me, was thinking PHP but had Python on the brain I guess. Original bad edit: I was downvoted pretty hard, but I think I was just unclear. In Python, doing md5("foo") == md5("bar") returns true because Python converts both strings to numbers first because they have an e in them. That is, in my opinion, Python is doing an unnecessary conversion there that JS doesn't do in that case. See https://stackoverflow.com/questions/12598407/why-does-php-co...


... no? Python's == never does implicit type conversions unless one of the types overrides it to do so (neither str nor int does)


There's one small gotcha with Python.

  >>> 1 == True
  True


Yes, but not because of type conversion: Python's booleans are a subclass of int.


I think you keep writing "Python" when you intended to write "PHP".


Lol, thanks, I'm an idiot, it was the brain fart that kept going.


Python doesn’t do any automatic type conversion for strings


That's false, and that's the point I was trying to make, albeit unclearly - see https://stackoverflow.com/questions/12598407/why-does-php-co...


That's about PHP


That's more sensible though. It's just trying to interpret the intention when comparing different types. The fact that == breaks even when the types are the same is much more surprising.


This is not a criticism of Javascript; this is history. There has been a lot of work done on Javascript over the decades. But historically, it is a language that was put together in a week in an emergency, last-ditch effort to keep Java from being the language instead. That's literally a week, not a rhetorically-exaggerated week.

For a one-week language, it's pretty darned good! But it was still a one-week language. The behavior of the == operator is one of the places where that really shows. I'm sure Brendan Eich would have cleaned that up with more time, but there wasn't any. And the behavior of the == operator is just one of those things that simply can not be fixed for backwards compatibility reasons. That's why an additional one had to be added later that works more sensibly.

There are a few other old corners of JS that have survived into the modern era. See also the excitement of hasOwnProperty and all that involves. This was a huge mess for a long time.

The work on cleaning it up, against the requirements of backwards compatibility and the uphill battle of having so much of it deployed out in the world, has been exemplary. But there are still some things that just can't have much done about them, == being one of them.


You're mistaken about == in the first ten days Mocha implementation. If types differed, its result was false. I changed it after that, due to internal Netscape requests from the LiveWire team to make it sloppy. Their use cases included comparing HTTP headers reflected as strings to numbers such as 401, and database fields containing numeric strings to numbers. I've written about this many times, e.g., https://twitter.com/BrendanEich/status/1641067035391332354.
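
For illustration, those use cases look roughly like this in today's JS. The variable names and values are hypothetical stand-ins, not LiveWire code.

```javascript
// Values that arrive as text, as in the use cases described above.
const statusHeader = "401"; // hypothetical HTTP status reflected as a string
const quantityField = "12"; // hypothetical database field holding digits

const looseStatus = statusHeader == 401;          // the string is converted to 401
const strictStatus = statusHeader === 401;        // string vs number, no conversion
const explicitQty = Number(quantityField) === 12; // conversion spelled out by hand
console.log(looseStatus, strictStatus, explicitQty); // true false true
```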


If it had been Java, we would have saved billions of dollars, multiple billions of programmer hours. Sounds like a colossal mistake in retrospect.


Might be true in 2023, won't argue either way. In ~1995, it was insane. Java startup costs were dozens of seconds, and that's assuming it didn't have to push anything into swap to do it. The browser needed some sort of language that was always present and didn't have such a staggering footprint, some sort of built-in language that could be parsed and executed without all that. In a perfect world we might have gotten an existing language that fit the bill, but in literally-a-week, it was easier to put something together new inside the browser than to integrate anything into the codebase.

The reason for the deadline wasn't that Java was a good idea. You should also bear in mind we're not talking the Java of today, but the Java of not-even-1.0 yet. Java in 1995 wasn't a very good language. Sun was just jamming it everywhere they could, and the browser looked like an appealing target from a marketing point of view. From a technical point of view it was insane.

Java grew up, certainly. But it was forced on the world on the back of an enormous marketing budget and a full court press, not the technical merits of Java circa 1995. Many people around here remember the XML push forcing a technology everywhere, even where it doesn't belong, well, the one before that was Java. Completely inorganic.


Eich is a smart man, and already had experience with Scheme at the time. A subset of Java with a fast interpreter would have been much preferable. Lots of great small languages existed. It makes for a great heroic story, but really it should be one of a smart person making a hasty design that basically sht on the future.

One of the inspirations for Java was the language "Bob" by David Betz

https://github.com/dbetz/bob

Even it would have been a win over JS.

    @Article{Betz:1991:YOT,
    author =       "David Betz",
    title =        "Your own tiny object-oriented language",
    journal =      j-DDJ,
    volume =       "16",
    type =         "PL",
    number =       "9",
    pages =        "26, 28, 30, 32--33, 86, 88--89",
    month =        sep,
    year =         "1991",
    CODEN =        "DDJOEB",
    ISSN =         "1044-789X",
    bibdate =      "Tue Sep 10 09:11:02 MDT 1996",
    bibsource =    "http://www.ddj.com/index/author/index.htm;
                    http://www.math.utah.edu/pub/tex/bib/dr-dobbs-1990.bib",
    note =         "Reprinted in \cite{Betz:1994:YOT}.",
    acknowledgement = ack-nhfb,
    classification = "C6140D (High level languages); C6150C (Compilers,
                    interpreters and other processors)",
    keywords =     "Bob; C++; C-like syntax; Class system; Interpreter;
                    Lisp; Tiny object-oriented language",
    thesaurus =    "C listings; High level languages; Object-oriented
                    programming; Program interpreters",
    }
http://www.txbobsc.com/misc/bob-article/bob-article.html

It shouldn't be excused. Every feature is the potential for an equal or larger detractor.


When does == break when the types are the same? Can't think of any case.


I'd imagine they are referring to the post we are commenting on (whose context is PHP).


Ah right, forgot the stack overflow post is about PHP.


What "breaks" when types are the same? The strings do not have the same contents.


I'm referring to the article about PHP. Even the strings being different, they compare to true.


Thanks, I should have checked context and figured it was PHP. I met Rasmus in 2015 while in Taipei. We got on fine, two old C/Unix hackers. PHP has evolved a lot but I have to say some of its quirks make me slightly less ashamed of JS's quirks ;-).


Well, we are still in the "play silly games" territory, aren't we? What should comparing a string to a number equal anyway?


The program is semantically incorrect, the compiler or interpreter should complain that you are attempting to ask for equality of a string and a number, and it doesn't know how to answer such a question. Then the programmer can fix the mistake.

Here's Rust for example: error[E0277]: can't compare `&str` with `{integer}`

[The full diagnostic points to where exactly your code tries to perform such a comparison and explains that you could do this if PartialEq<{integer}> was implemented for &str, which it is not, and then suggests things which are PartialEq for &str, such as other strings and various string-like objects a sane person might compare them to]


I'm not sure about comparing strings to numbers.

But I know I appreciate it when I can query my database with:

  SELECT * FROM orders WHERE delivery_day
    between '2023-04-10' and '2023-04-12'
Instead of using

  SELECT * FROM orders WHERE delivery_day 
    between PARSE_DATE('%Y-%m-%d', '2023-04-10')
    and PARSE_DATE('%Y-%m-%d', '2023-04-12')
So I can see why language designers put in some implicit conversions.


The difference is that it won't turn "abcjhd" into "1970-01-01", but rather issue an error.


> What should comparing a string to a number equal anyway?

I think there are two valid options: false and signaling an error (by any means such as returning null, throwing, etc). JavaScript chooses the third (for some strings)
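
Both options are easy to sketch in JS itself. `sameTypeEquals` is a hypothetical helper, and `typeof` is only a coarse notion of type (it reports "object" for null, for example).

```javascript
// Hypothetical helper: refuses to compare values of different types.
function sameTypeEquals(a, b) {
  if (typeof a !== typeof b) {
    throw new TypeError(`cannot compare ${typeof a} with ${typeof b}`);
  }
  return a === b;
}

const digest = "0e462097431906509019562988736854";

// Option 1: mixed types are simply unequal, which is what === does.
const optionFalse = digest === 0;

// Option 2: mixed types are an error.
let optionError = null;
try {
  sameTypeEquals(digest, 0);
} catch (err) {
  optionError = err.message;
}
console.log(optionFalse, optionError);
```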


I'll cop to having cooled a lot on dynamic languages in general over the past 10 years, but broadly speaking I think it's reasonable to say the decades of experience of the various dynamic languages leans in favor of == being strict in terms of the types of the operands and making it either "false" or an exception. The convenience of comparing a string "124" to the number 124 is outweighed by the near-continuous stream of bugs generated by the languages that will happily yield "true" for that. If I were writing a new dynamic language I'd look to emulate Python's == more than Perl or Javascript here.

In general == should not coerce types automatically. Automatic type coercion, even in dynamic languages, is the worst aspect of dynamic typing.


What should 0x40 == 64 equal? Ought hexadecimal integer and decimal integer be the same type?

As long as conversion between representations is well-defined and not hugely surprising, what does it matter?


> As long as conversion between representations is well-defined and not hugely surprising, what does it matter?

Surely the MD5 example demonstrates exactly how this ends up being a problem.

On the surface "1e5" == 100000 seems like a reasonable conversion between representations and unsurprising, but by silently converting things and then comparing the result we get this astonishing result.

Conversions need to be made explicit. This is an opportunity for programmers to consider whether they actually wanted a conversion (the MD5 programmer did not) and for API designers to offer multiple distinct conversions when appropriate. "1e5" == 100000 is only one reasonable interpretation. Maybe it's a lowercase hexadecimal value. If you make the programmer explicitly convert they can choose.
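
As a sketch of what choosing explicitly looks like in JS, the same text gives three different numbers depending on which conversion is asked for:

```javascript
const text = "1e5";

const asDecimalFloat = Number(text);         // scientific notation: 1 × 10^5
const asHexInteger = parseInt(text, 16);     // hexadecimal digits: 0x1e5
const asDecimalInteger = parseInt(text, 10); // stops at the "e", leaving 1
console.log(asDecimalFloat, asHexInteger, asDecimalInteger); // 100000 485 1
```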


> On the surface "1e5" == 100000

Coercing an explicit string e.g. "1e5" to a numeric is boneheaded. It's fine to support different numeric formats like exponent or hex format. Weak typing and type coercion is a major footgun in a language.


> Weak typing and type coercion is a major footgun in a language.

Type coercion is an expression of strong typing, and it's inevitable if you want a language that's usable.

For example, "array of int of length 5" is a type, and "array of int of length 6" is a different type. Do you want to write a separate sort function for each of those types?


I don't want to write a separate sort function, the computer can and should do all that work for me automatically.

Because Default is tricky to genericise, there literally are 32 distinct implementations provided for Rust's [T; 1] through [T; 32] of Default::default()

https://doc.rust-lang.org/src/core/array/mod.rs.html#467


They are the same type and represent the same number. Just different literals. "ABC" is also the same as "A" + "B" + "C".


> As long as conversion between representations is well-defined and not hugely surprising, what does it matter?

In Javascript:

    let a = '0';
    a == !a; // True. Why?
    let b = !a;
    (a == b) == (!a == !b) // False. Why?
None of the coercions involved is particularly surprising, but the result most definitely is.
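
Spelled out one step at a time, as a sketch with intermediate names added for readability:

```javascript
const a = "0";
const notA = !a; // false: any non-empty string is truthy, even "0"

// "0" == false: the boolean becomes 0, then the string "0" becomes 0.
const first = a == notA; // true

const b = notA;               // false
const left = a == b;          // true, the same comparison as above
const right = !a == !b;       // false == true, which is false
const second = left == right; // true == false, which is false
console.log(first, second);   // true false
```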


It should throw FileNotFound.


For those who missed the reference: https://thedailywtf.com/articles/What_Is_Truth_0x3f_


It’s always fun to have an equality operator that isn’t transitive.
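
A short sketch of the failure, using the digests from the top of the thread plus a shorter classic triple:

```javascript
const a = "0e462097431906509019562988736854";
const b = "0e830400451993494058024219903391";
const c = 0;

// a equals c and c equals b, yet a does not equal b.
const viaZero = [a == c, c == b, a == b];
console.log(viaZero); // [ true, true, false ]

// The same shape with shorter values: "" also converts to 0.
const classic = ["0" == 0, 0 == "", "0" == ""];
console.log(classic); // [ true, true, false ]
```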


Every single JS/TS codebase I have ever worked on, == was disallowed by the linter. There is zero reason to use == instead of ===, so while this is a cool quirk, it's hardly a 'gotcha, look how bad javascript is'.


The "triple equal" or "really really equal" or whatever we call it is the stupidest thing a language can have. If your language needs something like that, it already failed* somewhere along the way.

* Which doesn't imply it's not useful.


As others are saying, I don't think "==" vs "===" are a bad thing, what is bad is allowing 2 objects of different types to be compared, which is really why this md5 bug occurs. The implicit coercion from one type into another automatically creates a rats nest of problems.


No, here they were both the same type, but they were cast to int anyway.


I understand, but the root cause is allowing equality checks between different types. As a result of that decision you need cascading rules on which conversions take higher priority than others. Clearly PHP has decided that when a string goes through an equality check against another object, integer conversion is placed above byte comparison.


In python, I find it pretty intuitive. It's very convenient to have "do these two objects have the same contents" vs "are these two the same object at the same memory address."

Other language design options are to use pointers / references, be strongly typed or be functional. Seems like you're saying all weakly-typed object-oriented languages are failures?


I don't think "===" in PHP is the same as "is" in Python, though? Identity is important (why would you want to check for 0, None, "", etc. in separate cases, unless you want to?) but I don't think that is what "===" is in PHP.


PHP exists in a place where everything's a string (HTTPland). Type coercion makes more sense there.


Oh you're right. I actually was thinking python "is" was written "===" because I got confused with javascript, yikes.

And I don't speak PHP.

Anyway yeah being able to compare identity and equivalence separately seems important.


Pretty hard disagree. There are fundamentally different definitions of "equal", and pretty much all languages I can think of support that (e.g. reference equality vs. Object.equals in Java).

You can argue that triple equals is a confusing syntax, but the underlying problem is inherent in the world we live in.


> You can argue that triple equals is a confusing syntax

To me this is the problem with triple equals. The different types of equality should be explicit in their use rather than overloading a sigil.


Shouldn't that logic also apply to ==, which overloads the assignment operator (=) in the same way?


Very much agree with this one, but would phrase it as using = to represent assignment was the bad original choice.


If the average keyboard had support, characters like ≡ could be used as the equivalence sigil and ≢ for not equivalent.


This makes me so glad I don't use PHP.


This is great news! Everyone who doesn't use PHP increases my hourly rate ;-)


Every language has its quirks.


That only means you have to distinguish by frequency and magnitude of quirks, and languages are very different then.


That's not a quirk. That's a fundamentally broken design. It flagrantly violates the principle of least surprise. The language is almost helping people write bugs in their code.


It's the kind of thing you get with weak typing.

Perl does better here by requiring "eq" for strings, so there's less chance for such an issue to happen. But you have the "zero but true" fun quirk -- "0e0" evaluates to 0 numerically, but 1 logically because it's not an empty string.


Casting both operands when they already have the same type, into a narrower type space, with non-trivial rules, behind your back, is a really acid level of surprise. The kind that would make removing the "==" operator reasonable.


Yeah, but there are levels of weak typing. JS does not do this: when both operands are strings, it compares them as strings. That makes a ton more sense than this. Not that JS is the most sane language out there, but PHP tends to make it a lot easier to shoot yourself in the foot.


Python does weak typing and doesn't have 1% of the unexpected nonsense PHP has.


You're probably thinking of dynamic typing, which Python has, but Python's typing is strong, not weak, so e.g. '1' == 1 returns False.


Python has strong typing, but it does use duck typing.

Weak typing is when a language inserts implicit coercions in unexpected places, such as 42 == “42”. It’s not a precisely defined term. Another example is in C where pointers and values may be mixed and matched, although less so nowadays.


Python has dynamic but strong typing. There's no hidden type coercion. A dev might run into behavior they don't expect but the behavior is documented and regular.


Python is strongly typed though.


And this one had a purpose. Remember, PHP was designed (as much as it was "designed") for less-technical people. And everything comes through $_GET and $_POST as strings.

The way I see it, this was a crude attempt to make things like $_GET['age'] > 18 work without having to remember to convert types manually.

Javascript was the same with input.value

The == conversions become more intuitive if you keep this use case in mind and think about semantic equality instead of programming's more traditional exact equality.
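
A rough JS analogue of that use case, as a sketch. The query string is hypothetical and `URLSearchParams` stands in for $_GET or input.value.

```javascript
// Hypothetical query string: every value comes back as a string.
const params = new URLSearchParams("age=21&nickname=alice");

const age = params.get("age"); // "21"
const looseAdult = age > 18;   // "21" is converted to 21 before comparing

const nickname = params.get("nickname"); // "alice"
const nonsense = nickname > 18;          // "alice" becomes NaN, so this is false

// The explicit alternative: convert once, validate, then compare numbers.
const parsedAge = Number.parseInt(age, 10);
const strictAdult = Number.isInteger(parsedAge) && parsedAge > 18;
console.log(looseAdult, nonsense, strictAdult); // true false true
```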



Man, this takes me back.

I remember trying to help a friend with their PHP assignment in college (~2001). I had only ever worked with C, C++, and Java in my classes and I just remember being confused as hell since I couldn't understand why PHP worked the way it did.


Uh the joys of PHP's type juggling. Fairly sure this bug is still present in many systems.


Learnt about this 8 years ago when it was on top of HN: https://news.ycombinator.com/item?id=9484757


A linter would have caught this error. Seriously, use a linter and define a shortcut for it. It catches stupid errors like these, missing brackets, uninitialized variables, etc.


WAT



