
Equality Is Hard - ohjeez
https://www.craigstuntz.com/posts/2020-03-09-equality-is-hard.html
======
lazulicurio
As much as null is considered the "billion dollar mistake", personally, I
think that Java giving Object a virtual equals() and hashCode() was an even
bigger mistake. NPEs blow up loudly in your face and are fairly easy to lint
for. By comparison, bad equals() and hashCode() implementations can cause
difficult-to-track-down errors. Unfortunately, this design was inherited by C#
and other OO languages. C# does have the IEqualityComparer interface, which
helps in some cases, but many containers and other framework methods rely on
an overridden instance equality method.
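
A minimal sketch of the kind of quiet failure described above, written in Python, whose `__eq__`/`__hash__` protocol has the same shape as Java's `equals()`/`hashCode()`. The `Account` class and its field are hypothetical, chosen only for illustration.

```python
class Account:
    """Hypothetical key type whose equality and hash depend on a mutable field."""

    def __init__(self, account_id):
        self.account_id = account_id

    def __eq__(self, other):
        return isinstance(other, Account) and self.account_id == other.account_id

    def __hash__(self):
        return hash(self.account_id)


acct = Account(1)
seen = {acct}

# Mutating a field that feeds __hash__ means later lookups compute a
# different hash from the one the set stored for this object.
acct.account_id = 2

found_by_lookup = acct in seen                       # hash-based lookup
found_by_scan = any(acct is item for item in seen)   # linear identity scan
print(found_by_lookup, found_by_scan)
```

Nothing raises here: the object is still in the set, but the hash-based lookup no longer finds it, which is the sort of error that is hard to track down.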

~~~
mjevans
I agree. It might make sense to use something like memcmp() on structs to
assert that they are exactly the same, or to have a dedicated comparison
function that makes sure they are functionally equivalent for some defined
need.

However, objects should have only the second, and it might also need to be an
overloaded function (I think replacing a function pointer might work) if
something has extended a base object type. I'm not actually positive that
could work if there are two extensions to a base type.

~~~
usrnm
> It might make sense to use something like memcmp() on structs to assert that
> they are exactly the same

No, actually, it doesn't make sense in general. Due to padding, structs may
have arbitrary meaningless bytes between the actual data fields, and memcmp
will break on those. It works because structs are often filled with zeroes on
initialization, but it isn't required or enforced in any way. I was burnt by
this problem more than once, and it's not really fun to debug.

~~~
Gibbon1
Fun, create a struct on the stack. Set the elements individually. Padding is
now filled with garbage. Any elements you forget to set will also be garbage.
When you copy the struct the garbage goes with it.

I've been burnt too.
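
The padding problem can be reproduced without a C compiler. The sketch below uses Python's `ctypes` to lay out a C-style struct; the `Pair` layout and the `make_pair` helper are hypothetical, and the result assumes the platform's native alignment inserts padding between the two fields (typical, but not guaranteed).

```python
import ctypes


class Pair(ctypes.Structure):
    # Hypothetical layout: with native alignment, padding bytes typically
    # sit between the 1-byte tag and the 4-byte value.
    _fields_ = [("tag", ctypes.c_uint8), ("value", ctypes.c_uint32)]


def make_pair(tag, value, fill):
    """Start from memory filled with `fill`, then set each field individually."""
    raw = bytes([fill]) * ctypes.sizeof(Pair)
    pair = Pair.from_buffer_copy(raw)
    pair.tag = tag
    pair.value = value
    return pair


a = make_pair(1, 2, fill=0x00)   # struct built on zeroed memory
b = make_pair(1, 2, fill=0xAA)   # struct built on top of stale memory

fields_equal = (a.tag, a.value) == (b.tag, b.value)
bytes_equal = bytes(a) == bytes(b)   # roughly what memcmp() would check
print(ctypes.sizeof(Pair), fields_equal, bytes_equal)
```

When padding is present, the fields compare equal while the raw bytes do not, because the padding still holds whatever was in memory before the fields were set.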

------
dvt
> First, programming languages should make it simple to create types where
> equality comparison is disabled because it makes no sense

This is kind of a silly "solution" as some languages don't even have types,
others have "soft" types, and others support arbitrary typecasting. (Not even
going to get into duck typing, union types, etc. Type theory is _hard_.)

> Programming languages must make the difference between structural equality
> and reference equality crystal clear

This is probably intractable. Author doesn't even touch cases where you have
objects that contain other objects. Are these references or values? How does
the equality operator work there? How deep does the rabbit hole go?

If you _only_ deal with values, then you're probably crippling performance.
There are reasons for these design concessions, it's unfair to assume that
people that designed these languages are complete doofuses.

> One might look at the length of this post and say, “Wow, equality is really
> complicated! I’m going to give up coding and become a soybean farmer.”

Farming is orders of magnitude harder than what the average programmer does
day-to-day :)

~~~
lmm
> This is kind of a silly "solution" as some languages don't even have types,
> others have "soft" types, and others support arbitrary typecasting.

Well, more fool those languages. It's a good solution, and if your language
can't offer an equivalent, take it up with your language's maintainers.

> This is probably intractable. Author doesn't even touch cases where you have
> objects that contain other objects. Are these references or values?

Let's not declare it intractable before we've actually tried. The language
would need to draw a clear distinction between (possibly compound) values and
objects-with-identity - like the distinction between struct and class in C#,
or between case class and class in Scala - and ripple that distinction all the
way up; values should only be allowed to contain other values, should be
immutable, and should be able to be treated in a generic way (i.e. a record
system). Objects-with-identity should be rarer and heavily encapsulated,
almost like Erlang actors; the simplest example is a reference cell (i.e. a
mutable thing holding a value of a particular type that can be get and set).
Doesn't sound like such a bad language design.
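
A rough sketch of that split, using Python only as a notation: an immutable record compared structurally, and a reference cell compared by identity. The `Person` and `Ref` names are hypothetical, and Python does not enforce the "values may only contain values" rule, so that part is a convention here rather than a guarantee.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Person:
    """A value: immutable, compared field by field, holding only other values."""
    name: str
    offspring: Tuple["Person", ...] = ()


class Ref:
    """A reference cell: an object with identity that holds a value.

    It deliberately does not override __eq__, so == falls back to identity.
    """

    def __init__(self, value):
        self._value = value

    def get(self):
        return self._value

    def set(self, value):
        self._value = value


john_a = Person("John")
john_b = Person("John")
values_equal = john_a == john_b               # structural comparison

cell_a = Ref(john_a)
cell_b = Ref(john_a)
cells_equal = cell_a == cell_b                # identity: two distinct cells
same_contents = cell_a.get() == cell_b.get()  # compare the values they hold
print(values_equal, cells_equal, same_contents)
```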

~~~
dvt
I'd have to think about this more, but I think your purported solution (mostly
immutable record-type objects) would tax performance and memory constraints.

~~~
lmm
Haskell already works that way, and is a high-performance language. (The same
is true of strict ML-family languages, so laziness is not required to make
this practical). Those languages generally just don't bother with reference
comparisons at all, but one could imagine a more hybrid language that included
support for traditional OO - in fact idiomatic Scala works like this (and
again is a high-performance language, though to some extent that relies on
what's probably the best garbage collector in the world). I don't know what
kind of support Erlang has for making comparisons between actor identities,
but that might be an even better example.

------
jeffadotio
I really like this about Rust, implementing equality is just another trait.
The same is true of comparison operators. If you don’t want a type to be able
to be compared by == you can make it always evaluate to false or even panic
and give a message that the type cannot be used in that way. That might
aggravate coders who do not test comprehensively but it is an option.

Edit: This is a response to the concerns of the article. For general info on
this, the page below is a great introduction to Rust’s awesome documentation.

[https://doc.rust-lang.org/std/cmp/trait.Eq.html](https://doc.rust-lang.org/std/cmp/trait.Eq.html)

~~~
tus88
You mean there is no compile time way to accomplish that? Doesn't sound very
Rust-like.

~~~
re
It is determined at compile time; I'm not sure why GP is suggesting returning
false if you don't want types to be compared. By default structs are not
comparable. If you opt in to the standard PartialEq implementation using the
`derive` annotation[1], it will only allow a struct to be compared for
equality with other structs of the same type. You can choose to add
implementations to compare against other types[2].

[1] [https://doc.rust-lang.org/book/ch05-02-example-structs.html#...](https://doc.rust-lang.org/book/ch05-02-example-structs.html#adding-useful-functionality-with-derived-traits)

[2] [https://doc.rust-lang.org/std/cmp/trait.PartialEq.html#how-c...](https://doc.rust-lang.org/std/cmp/trait.PartialEq.html#how-can-i-compare-two-different-types)

~~~
artursapek
Rust is such a delightful language.

------
cperciva
To correct the opening statement: The two hard problems in computer science
are cache invalidation, naming things, and off-by-one errors.

~~~
aaron695
The [1] is worth a read -

[https://skeptics.stackexchange.com/questions/19836/has-phil-...](https://skeptics.stackexchange.com/questions/19836/has-phil-karlton-ever-said-there-are-only-two-hard-things-in-computer-science)

------
smallnamespace
I’m surprised the article discusses extensional equality for functions, then
doesn’t mention why languages don’t try to derive it: it’s equivalent to
solving the Halting Problem (per Rice’s Theorem).
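
For contrast, a small sketch of the one case where extensional equality is easy to check: a finite input domain, where every input can be tried. The two `xor` functions and the `extensionally_equal` helper are hypothetical names. Nothing like this exhaustive check is available for arbitrary programs over unbounded inputs.

```python
from itertools import product


def xor_a(p, q):
    return p != q


def xor_b(p, q):
    return (p or q) and not (p and q)


def extensionally_equal(f, g, domain):
    """True if f and g agree on every input tuple in a finite domain."""
    return all(f(*args) == g(*args) for args in domain)


bool_pairs = list(product([False, True], repeat=2))

same_object = xor_a is xor_b   # reference equality of the function objects
same_behavior = extensionally_equal(xor_a, xor_b, bool_pairs)
print(same_object, same_behavior)
```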

~~~
guerrilla
That's only for Turing complete languages. It's not true for total functions.
Total functions can be written in Coq, Agda and Idris among others.

~~~
TheAsprngHacker
Funnily, in programming languages based on Martin-Löf Type Theory, the topic
of function extensionality is a whole rabbit hole for reasons other than the
halting problem.

There is a distinction between intensional Martin-Löf type theory and
extensional Martin-Löf type theory, which differ in their treatments of
equality. First, one must distinguish between judgmental equality x = y, which
is a statement in the mathematical metalanguage that two terms are equal, and
propositional equality, which "internalizes" judgmental equality within the
programming language through the equality type Id(A, x, y), the type of proofs
that x and y are judgmentally equal. In extensional type theory, judgmental
equality also follows from propositional equality, and together with the eta-
conversion rule for functions, function extensionality is provable [0].
However, typechecking extensional type theory is undecidable [1].

Alternatively, in Homotopy Type Theory (HoTT), function
extensionality is provable from the univalence axiom, which states that
isomorphic types are equal [2]. Function extensionality is trivially provable
in Cubical Agda [3], which gives a computational interpretation of HoTT's
notion of equality.

[0]
[https://cstheory.stackexchange.com/questions/46331/extension...](https://cstheory.stackexchange.com/questions/46331/extensional-type-theory-and-function-extensionality/46342#46342)

[1]
[https://cs.stackexchange.com/a/112559](https://cs.stackexchange.com/a/112559)

[2]
[https://ncatlab.org/nlab/show/function+extensionality#relati...](https://ncatlab.org/nlab/show/function+extensionality#relation_to_the_univalence_axiom)

[3]
[https://agda.readthedocs.io/en/v2.6.0.1/language/cubical.htm...](https://agda.readthedocs.io/en/v2.6.0.1/language/cubical.html)

~~~
guerrilla
Yeah, I didn't mean to insinuate that function extensionality is provable or
testable in those languages, just that one can ensure that one is actually
writing total functions today. I'm sure the same could be done in a simple or
polymorphic type theory, I'm just not aware of one. Is there a Haskell
extension like Idris's totality checking?

I was gonna say, just wait until the OP hears about HoTT.

Thanks though, I didn't know Cubical Agda exists.

~~~
ghostwriter
> Is there a Haskell extension like Idris's totality checking?

There's LiquidHaskell:
[https://ucsd-progsys.github.io/liquidhaskell-blog/](https://ucsd-progsys.github.io/liquidhaskell-blog/)

------
earthboundkid
If no mainstream programming languages implement equals like math—an obvious
and well known example throughout the history of computing—maybe it’s because
that’s a bad idea.

~~~
rossdavidh
Or really hard, when you're communicating with a computer and not another
human. Lots of concepts that are easy to communicate to a human, are difficult
to communicate to a computer, not just equality.

It is a good point, though, that if virtually EVERY mainstream language gets
it wrong, when every single one of the people creating those languages knew
about the math =, there has to be a reason.

~~~
dajohnson89
and what might that reason be?

~~~
TheOtherHobbes
It depends what the meaning of "is" is.

~~~
earthboundkid
Philosophers distinguish three senses of "is":

- Identity: "Clark Kent is Superman." Anything true of Superman is true of
  Clark Kent and vice versa, since these are two names for the same thing.
- Predication: "Superman's cape is red." Red is a quality of the cape. It's not
  reflexive because not all red things are capes.
- Existence: "There is no Superman." In English, existence sentences usually
  begin with "there is" or more poetically, you say "Superman is ~not~."
  Existence is not a relation.

In the case of Clinton, he was being evasive about "is" versus "was" in the
time dimension. There was a sexual relationship between him and Lewinsky, but
he claimed there "is" no relationship in general.

------
marcus_holmes
I know that opening line as "there are two hard problems in computer science:
cache invalidation, naming things, and off-by-one errors"

~~~
capableweb
I'm pretty sure the original statement had just "cache invalidation" and
"naming things", and "off-by-one errors" was added later. When I first
learned about this quote, it never included the "off-by-one errors" part, but
nowadays it always seems to include all three.

------
bovermyer
I have to admit that I expected an article on a different type of equality
before I clicked on the link.

~~~
AnimalMuppet
I, uh, equally expected that.

------
jerome-jh
Is reference equality ever useful in itself, in a high level language?

Of course reference equality is a straightforward optimization for structural
equality, as mentioned in the post. But it is not required to make it visible
to the programmer.

Reference equality is mandatory for the garbage collector. Again it may not be
visible to the programmer.

Finally there is the case of two data structures sharing a reference to a
single instance of some data. Usually that is a bad idea if the pointed to
data is mutable, and often makes the program leaky.

The paper reads: "So reference equality is a sensible default in a language
which supports mutation.". Is it not a bad reason to support reference
equality?
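
One way to read the quoted rationale, sketched in Python under the assumption that the point is about aliasing: with mutation, structural equality is only a snapshot, while reference equality tells you whether a change made through one name will be visible through the other. The variable names are illustrative only.

```python
a = [1, 2, 3]
alias = a          # second name for the same list object
copy = list(a)     # structurally equal, but a distinct object

# Before mutation: == cannot tell the alias and the copy apart, `is` can.
before = (a == alias, a == copy, a is alias, a is copy)

a.append(4)        # mutate through one name

# After mutation: only the alias still compares equal.
after = (a == alias, a == copy)
print(before, after)
```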

------
ldeangelis
> _Never use == in JavaScript. Use === instead._

Or better, know what == [0] and === [1] do by looking at the ECMAScript
reference

[0]: [https://www.ecma-international.org/ecma-262/10.0/index.html#...](https://www.ecma-international.org/ecma-262/10.0/index.html#sec-abstract-equality-comparison)

[1]: [https://www.ecma-international.org/ecma-262/10.0/index.html#...](https://www.ecma-international.org/ecma-262/10.0/index.html#sec-strict-equality-comparison)

------
vadiml
Hm...

    let john = { Name = "John"; Age = 15; Offspring = [] }
    let jane = { Name = "Jane"; Age = 47; Offspring = [ john ] }
    let sue = { Name = "Sue"; Age = 35; Offspring = [ john ] }

Not very realistic :)

------
dnautics
I believe the Erlang VM and the languages on top of it get all four conditions
correct (not counting rebinding a variable to a new value, which you can do in
some BEAM languages).

