IMO, any language with a type system should only allow comparisons between values of the same type (or derived types) unless a developer explicitly overrides an operator--and even then, overriding operators is sometimes considered bad practice because it violates the principle of least astonishment. The types of a comparison operator's operands matter precisely because the type system exists. Allowing things like "5" == 5 suddenly introduces human expectations of logical equality--"they're both '5', therefore they're equal." Programming languages shouldn't codify these human interpretations.
My favorite language, C#, has some unfortunate history. The == operator when dealing with references indicates reference equality (Object.ReferenceEquals), but can be overridden to indicate value equality (Object.Equals) even with reference types. Then, when dealing with value types, == suddenly means value equality. It would've been better to define two separate operators for reference and value equality. This would've helped make usages self-documenting.
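For instance (plain C#, nothing overridden):

using System;

var a = new object();
var b = new object();
Console.WriteLine(a == b);        // False: for reference types, == defaults to reference equality
Console.WriteLine(a.Equals(b));   // False: Object.Equals also defaults to reference equality here

int x = 5, y = 5;
Console.WriteLine(x == y);        // True: for value types, == means value equality

// And any reference type may overload == to mean value equality instead,
// so the same operator reads differently depending on the operand types.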
You can already kinda sorta do that if you consistently use == assuming value equality, and use ReferenceEquals() - which doesn't need to be qualified, since everything inherits from Object - for reference equality.
You still have == working on all references in this case, but this can be interpreted as reference types having the default value semantics of "every instance is logically different", so referenced values are only equal to themselves.
The annoying part of this approach is that, in practice, ReferenceEquals() ends up being used far more often than == for reference types, yet it's much longer to type and more awkward to use, being a function. Python and VB, with their "is" operator for reference equality, handled this one best. Unfortunately, C# wasted "is" as a keyword for something far less common - a type check (and now also pattern matching).
> IMO, any language with a type system should enforce comparisons only between those types (or derived types)
There are languages with a type system that have a universal type (with all other types derived from it); in those languages it makes sense that "5" == 5 is defined and false.
A universal type doesn't necessarily have to be arbitrarily comparable between instances. The strictly correct response to "5"==5 really is "not comparable"; it's just a question of how you choose to surface it. One popular choice is to define the equality operator as returning false if the operands aren't comparable - but then you have to do the reverse for inequality, which is far less obvious.
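To make that last point concrete, here's a minimal C# sketch (the `Value` wrapper is hypothetical, not a library type) of surfacing "not comparable" as false, and what that forces onto inequality:

// Hypothetical wrapper, not a real library type:
readonly struct Value
{
    public readonly object Raw;
    public Value(object raw) => Raw = raw;

    // "Not comparable" surfaced as false:
    public static bool operator ==(Value a, Value b) =>
        a.Raw?.GetType() == b.Raw?.GetType() && Equals(a.Raw, b.Raw);

    // != is forced to be the exact negation, so incomparable operands
    // come out "unequal" (true) rather than erroring - the non-obvious reverse:
    public static bool operator !=(Value a, Value b) => !(a == b);

    public override bool Equals(object o) => o is Value v && this == v;
    public override int GetHashCode() => Raw?.GetHashCode() ?? 0;
}

With that, new Value("5") == new Value(5) evaluates to false, and new Value("5") != new Value(5) to true - "unequal", even though "not comparable" was the honest answer.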
That historical... erm... glitch... in C# is very unfortunate and burned us many times back when I was working in a .NET shop (people getting creative was usually the culprit). I've even seen some codebases where people decided to implement their own equality operators (lesson through experience: don't do this).
They picked it up from Java, which also used == to mean both reference equality and value equality.
But C# also tried to improve on Java. One of the things the latter was commonly criticized for was the need to use .equals() for string value comparisons - which is almost always what you want - while avoiding the more obvious ==. So they overloaded == for strings to mean value equality, while retaining them as reference types.
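Which leads to the classic gotcha: the overload is resolved statically, so it vanishes the moment you look at the strings through object-typed variables:

using System;

string a = "5";
string b = new string('5', 1);            // same contents, distinct object

Console.WriteLine(a == b);                // True: == on string is overloaded to value equality
Console.WriteLine(ReferenceEquals(a, b)); // False: still two separate references

object oa = a, ob = b;
Console.WriteLine(oa == ob);              // False! via object, == falls back to reference equality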
IMO, the bigger mistake here was to keep strings as reference types - they really ought to be value types (the implementation can still wrap a reference to the actual string data internally, and share it between copies, but the client shouldn't be able to observe that).
Funny how they say SQL gets it right, but then show an example which is the opposite of my experience.
From PostgreSQL:
# select '5' = 5;
 ?column?
----------
 t
(1 row)
I believe this is standard SQL. Putting something in quotes just makes it a literal; it doesn't necessarily make it a string. Of course if you explicitly cast it to a string, then you get an error. As in:
select '5'::text = 5;
ERROR:  operator does not exist: text = integer
LINE 1: select '5'::text = 5;
                         ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.
I agree with the premise - SQL gets it right, but perhaps only in PostgreSQL?
> I agree with the premise - SQL gets it right, but perhaps only in PostgreSQL?
I don't agree with the premise. Both MySQL[1] and MS SQL Server [2] do an implicit cast and then compare them as numbers. I don't know whether this is ANSI standard behavior though.
The only acceptable answer is a type error. His example of Python being able to handle heterogeneous collections is fine, but a good language will make you be explicit about the types.
let a: String = "5"
let b: Any[] = [5,6,7]
let c: Int[] = [5,6,7]
a in b // false - we don't know what's in b, so we check, and a is not in b
a in c // type error - there's no way a is in c. Convert a to an Int, and handle the error cases involved in that.
In dynamic languages, like Python, all variables have a static type of "Any", so "5" == 5 has matching static types. Then the runtime types (or tags as some prefer to call them) are checked in the implementation of == itself. This is perfectly equivalent to what must happen in your example with `a in b`. Python can't express something like Int[], though.
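You can see the same mechanics in a static language by erasing the types yourself; in C#, for instance:

using System;

object a = "5";   // the static type "object" plays the role of "Any"
object b = 5;

// The runtime types get checked inside Equals itself, just as inside Python's ==:
Console.WriteLine(a.Equals(b));   // False, not a type error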
I've spent a lot of time with languages that use runtime enforced types, and then a lot of time with compiler enforced types. The latter is just better. We can argue about how strict the typing system should be, how much it should guess at types or coerce types, etc. But there is no world where `"5" == 5` shouldn't at very least throw up a warning to let the programmer know that STUFF is happening here that you might not expect.
This isn't a function of static vs dynamic typing, though. You can write "5"==5 in Python because, while == is always value equality there, the root class provides a default implementation that simply compares references. They could have made that default implementation raise an exception, though.
I believe the main reason it doesn't is that it allows you to write kinda sorta generic code that always uses == for comparison, and have it work for the most common cases for both value and reference types (defined semantically here: reference types are those where object identity matters in some way, value types are those where it doesn't). For example, functions that work on lists, or even the standard operator "in". If not for this hack, everything that needs to compare things would have to take an explicit comparator function for every separate comparison context to ensure that the caller can request "==" or "is" as appropriate for their use case.
FWIW, I believe that explicitly parametrized comparisons should be the norm in generic code, not the exception, even if it's more verbose. But mainstream PL design hasn't really embraced this. Much like how we keep repeating the mistake of making strings indexable and iterable without specifying what is being indexed or iterated (bytes, codepoints, graphemes, etc.) - because programmers are so used to the convenience of doing that, not because it was ever a good idea.
> They could have made that default implementation raise an exception, though.
That wouldn't be an improvement, though. Runtime exceptions mean you have to exercise that code path to find something as trivially detectable as comparing a string to an integer. It has to be a compile-time check (or a validation step for an interpreted language).
To specifically address your scenario:
> If not for this hack, everything that needs to compare things would have to take an explicit comparator function for every separate comparison context to ensure that caller can request "==" or "is" as appropriate for their use case.
This is a solved problem in statically typed languages. All it takes is a protocol/interface called "Equatable" or "Comparable" or something like that. Collection.contains is only available if the collection's contents implement that interface/protocol. For every comparable thing, == is a required function. A good, clear example of this is `Equatable` in Swift https://developer.apple.com/documentation/swift/equatable
Swift takes the stance that nothing is Equatable unless it opts into the protocol. It also synthesizes a definition of == if the type is simple enough. Another language might say, OK, everything is equatable; the default definition is just comparing memory addresses (a->ptr == b->ptr) - but that would still require a and b to be of the same type. If you want to compare an Apple and an Orange, `==` is the wrong tool. Something like `apple.sameCaloricContent(orange)`, i.e. a function that describes how you're comparing two un-alike things, would be the right way to handle it.
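The same shape in C# terms (ContainsEq is a made-up helper, not a standard method; IEquatable<T> stands in for Swift's Equatable):

using System;
using System.Collections.Generic;

static class EquatableDemo
{
    // Made-up helper: only callable when T has opted into equality
    public static bool ContainsEq<T>(this IEnumerable<T> items, T needle)
        where T : IEquatable<T>
    {
        foreach (var item in items)
            if (item.Equals(needle)) return true;
        return false;
    }

    static void Main()
    {
        Console.WriteLine(new[] { 5, 6, 7 }.ContainsEq(5));   // True: int is IEquatable<int>
        // new[] { 5, 6, 7 }.ContainsEq("5");   // compile error: the types don't unify
    }
}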
That would be an improvement, surely? Just not one that is good enough...
But at that point we're discussing a much broader problem (or feature, depending on who you ask) with dynamic typing in general, which is different and orthogonal to the problem with implicit conversions that "5"==5 is all about: whether it's an error in the first place or not.
> This is a solved problem in statically typed languages.
It's only a solved problem if you follow the patterns consistently, which many modern languages do not (e.g. look at C# and Java, which have ==, Object.Equals()/.equals(), and IEquatable<T>/Comparable, and where collections of equatable items aren't themselves equatable).
But even then, it doesn't really solve the "contains-problem", because the request is ambiguous by the very nature of a collection of references - perhaps you do, in fact, want to check if a reference to an object is present in the collection or not, rather than checking for something that is value-equal. This also goes for comparing whole collections for equality element-by-element etc. All those cases, at the minimum, need a way for the caller to specify which kind of comparison they need. Worse yet, there's no clear default here - both are common - so, ideally, it should always be explicit.
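For what it's worth, C#'s LINQ does let the caller pick: Enumerable.Contains has an overload taking an IEqualityComparer<T>, and .NET 5+ ships ReferenceEqualityComparer for the identity case. A minimal sketch:

using System;
using System.Collections.Generic;
using System.Linq;

string a = "5";
string b = new string('5', 1);         // value-equal to a, but a distinct object
var list = new List<string> { a };

Console.WriteLine(list.Contains(b));   // True: the default is value equality
// The caller can explicitly ask for identity instead (ReferenceEqualityComparer is .NET 5+):
Console.WriteLine(list.Contains(b, ReferenceEqualityComparer.Instance));   // False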
Equality in particular is a tricky subject with languages that have reference types and subtypes.
Even C# and Java define equality between all types ("5".Equals(5) is false, not an error).
Still, I wasn't trying to defend dynamic typing over static typing, just pointing out how to conceptualize the dynamic language decision in terms of type theory, to point out that it's not surprising.
The author is right, the answer should be a type error. If you want to compare them as ints, convert the string to an int first (and deal with the potential failure in the case that the string does not represent an integer). If you want to compare them as strings, convert the int to a string first. There are multiple approaches to transform this invalid comparison into a valid comparison, choose one. Hint: prefer the one that most closely aligns with the abstract meaning of the domain.
A point that other commenters brought up is what to do in the case of user input. Let's assume it's unreasonable to expect the user to select a type explicitly. In this case I'd say there are actually three distinct types: int, string, unknown-currently-represented-as-a-string. This doesn't solve the problem of needing to select a type conversion, it just makes it more obvious that a type conversion must occur.
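A minimal sketch of that in C# - the TryParse call is the explicit point where "unknown, currently a string" becomes an int or stays a string:

using System;

string raw = Console.ReadLine() ?? "";   // unknown-currently-represented-as-a-string

if (int.TryParse(raw, out int n))        // the explicit, visible conversion point
    Console.WriteLine(n == 5);           // parsed: compare as ints
else
    Console.WriteLine(raw == "5");       // didn't parse: compare as strings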
This reminds me of Go's const numbers, which are actually represented as abstract numbers with arbitrary precision during compilation and only get a concrete value once they're assigned to a variable that has a concrete type. This allows, for example, an expression using Pi to be assigned to both float32 and float64 while utilizing each type's full precision.
> This reminds me of Go's const numbers, which are actually represented as abstract numbers with arbitrary precision during compilation and only get a concrete value once they're assigned to a variable that has a concrete type.
Haskell does something similar. Literal integers and decimal fractions are translated into expressions like "fromInteger (12345 :: Integer)" for "12345" or "fromRational (314159 % 100000 :: Ratio Integer)" for "3.14159" (where the % operator, defined in Data.Ratio, constructs a Ratio value from a numerator and denominator). The inputs have arbitrary precision, and the concrete type of the expression is resolved through the normal type unification process.
IMHO the static type error, runtime type error, and always-false interpretations all have some merit, though naturally static typing without any implicit conversion has the best chance of detecting mistakes early on. The only one I would reject as obviously invalid is the one that says that "5"==5 is true.
Brendan Eich created the language in 10 days. Think about the billions of dollars and billions of man-hours that have been spent on a language that wasn't carefully designed, didn't slowly earn its userbase, and didn't come out of academia - just a guy with a job. One guy. One guy in 10 days decided the fate of the internet. That's crazy.
Tcl isn't quite the same, since "5" and 5 are literally the same exact value in it, in all respects - not just equality. It's more of a problem in languages where, at the same time, 5+5==10 but "5"+"5"=="55".
In PHP, this is also true - or false, if you use the strict comparison operator (===).
I'd argue that there are some valid use cases for "5" and 5 to be equal. For example, if you take user input on an admin screen that accepts either a user ID or a username, you don't know whether the type is a string or a number. Another common example is when you don't have a known encoding for a packet yet, so you treat it as a binary string. In these cases, it's easier to grok, as a human, what is expected.
On the other hand, what about "V" == 5? Or one of the literally dozens of unicode characters representing the number five in different writing systems? Maybe even "five" == 5?
Accepting both a numerical user id and a username string is a very valid scenario. However, I believe you should handle that by explicitly trying to parse it as a number, not hoping that automatic coercion will do the right thing. It's just a footgun waiting to go off.
> Accepting both a numerical user id and a username string is a very valid scenario. However, I believe you should handle that by explicitly trying to parse it as a number, not hoping that automatic coercion will do the right thing. It's just a footgun waiting to go off.
This is why many people recommend questioning the use of basic or naked types. Details vary between languages, but this should be a type UserID with a UserID::from(String) and maybe a UserID::from_symbolic_name(String), and everything but some edge code should work on UserIDs. This makes code easier to read, too.
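A sketch of that in C# (UserId, From, and FromSymbolicName just mirror the pseudocode above; the username lookup is left unimplemented):

using System;

// Hypothetical wrapper type: the rest of the code traffics only in UserId.
readonly struct UserId : IEquatable<UserId>
{
    private readonly long _value;
    private UserId(long value) => _value = value;

    public static UserId From(string raw) =>
        long.TryParse(raw, out var id)
            ? new UserId(id)
            : FromSymbolicName(raw);       // not numeric: treat it as a username

    public static UserId FromSymbolicName(string name) =>
        throw new NotImplementedException("resolve a username to an id here");

    public bool Equals(UserId other) => _value == other._value;
    public override bool Equals(object o) => o is UserId u && Equals(u);
    public override int GetHashCode() => _value.GetHashCode();
}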
PHP also has a nice thing I wish more languages had, which stops you shooting yourself in the foot here - the concatenation operator (.) is different from addition (+).
The reason other languages don't have that is that very few languages will implicitly convert that "5" to a number. Instead, other languages generally only have the implicit conversion to string, as it's considered less magical.
> if that string can be a number, that's obviously what was intended
Maybe that would be a reasonable assumption in a world where programmers were omniscient and never made mistakes. In practice these "do what I mean" rules just tend to make it easier for poorly-specified programs to produce nonsense without signaling that anything is wrong. If your program intends to add two numbers and one of the inputs turns out to be a string then someone gave you bad input; the right solution to that is to report an error, not continue with bad data. Perhaps it's only an accident that the string happens to look like a number this time. Maybe it wasn't intended to be a number at all, and some parameters got swapped or an input file was corrupted.
Also, that rule is too broad. The string "5" could be treated as a number, sure, but so could "five" or "⑨" or "¾". So why is "five" + "⑨¾" an error and not 14.75 (or even "14¾")? If you want to accept number-like strings as numbers, fine, but there should be an explicit process where the programmer defines exactly what form the input can take and how to perform the conversion to a number. Implicit conversions skip that step and expose the internal defaults of the language to the end user.
If you could do relative comparisons like "5" < 5 and get a boolean answer, that would be surprising. I don't see anything controversial, however, about unequivocally reporting that objects of different types are unequal. In Lua anything can be tested for (in)equality but asking whether a string is greater than or less than a number is an error, as is performing an ordered comparison involving any other type(s).