
ValueObject - ZainRiz
http://martinfowler.com/bliki/ValueObject.html
======
edejong
Equality is an often overlooked complexity in denotational semantics. Dijkstra
wrote an EWD [1] on the subject, which should give insight into the
difficulties of this presumably simple relation.

In most languages, I prefer to think of equality as a property outside of
structure or class. Languages such as Java bind equals() strongly with the
object, but this is a mistake. First of all, it promotes ordering where there
is none. Equality is reflexive and symmetric. Second of all, it does not allow
for multiple forms of equality. Lastly, since it is defined in Object, it
forces the equality relation to all underlying types, even when there is no
equality defined.

[1]
[https://www.cs.utexas.edu/users/EWD/transcriptions/EWD10xx/E...](https://www.cs.utexas.edu/users/EWD/transcriptions/EWD10xx/EWD1073.html)
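
As a rough illustration of equality living outside the class, here is a minimal Python sketch. The helper names (`exact`, `ignoring_case`, `contains`) are hypothetical and not from any library; the point is only that the caller chooses the relation, so several forms of equality can coexist for one type.

```python
from typing import Callable, Iterable


# An equivalence relation is just a two-argument predicate (hypothetical alias).
Equivalence = Callable[[str, str], bool]


def exact(a: str, b: str) -> bool:
    """Character-for-character equality."""
    return a == b


def ignoring_case(a: str, b: str) -> bool:
    """A second, equally valid notion of equality for the same type."""
    return a.casefold() == b.casefold()


def contains(items: Iterable[str], needle: str, equal: Equivalence) -> bool:
    """Membership test parameterized by the equality relation to use."""
    return any(equal(item, needle) for item in items)


names = ["Alice", "Bob"]
found_exact = contains(names, "alice", exact)
found_loose = contains(names, "alice", ignoring_case)
```

Nothing about `str` itself had to change to support the second relation, which is the property the comment above is asking for.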

~~~
jstimpfle
Multiple forms of equality: I think there should be just one "equality": the
obvious structural one, which can be inferred by a compiler. What examples are
there of cases where there is no obvious equality but still the official
"equality" tag is needed?

EDIT: got downvoted, why? It's a serious question. I've never had any use for
any equality besides the structural one. Honestly want to see if anyone can
come up with a valid case.

~~~
Coincoin
Is the angle -90 equal to 270?

Is the quotient 1/2 equal to 3/6

Is the string "Bébé" equal to "Bebe"?

Is the float 0.555555559 equal to 0.6?

Is the color RGB(1,1,1) equal to Lab(100,0,0)?

Is the equation y=3x+2 equal to y-2=3x?

Is the function x=>2+x equal to y=>2+y?

~~~
jstimpfle
> Is the angle -90 equal to 270?

> Is the quotient 1/2 equal to 3/6?

I think these are good examples that come up in the real world a lot. The
representation structures we use are not sufficient for precise specification.
So we have to overspecify (making a superfluous choice in representation, -90,
270, 630?).

My opinion still is that data must be normalized with explicit procedure
calls. Doing stuff implicitly is neither beneficial to efficiency nor to
comprehensibility. Explicit normalization lets us leverage what the compiler
_can_ infer.
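
A small Python sketch of what explicit normalization could look like for the two examples above. `normalize_angle` is a hypothetical helper; `fractions.Fraction` is the standard-library type, which reduces to lowest terms on construction.

```python
from fractions import Fraction


def normalize_angle(degrees: int) -> int:
    """Map any whole-degree angle onto the canonical range [0, 360)."""
    return degrees % 360


# After normalization, plain structural comparison gives the expected answer.
same_angle = normalize_angle(-90) == normalize_angle(270) == normalize_angle(630)

# Fraction normalizes in its constructor, so 3/6 is stored as 1/2.
half = Fraction(3, 6)
same_quotient = Fraction(1, 2) == half
```

Once the values are in canonical form, the comparison itself is the obvious field-by-field one.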

The other questions I have to return, because it's not even clear to me:
_should_ they be?

~~~
dllthomas
> My opinion still is that data must be normalized with explicit procedure
> calls. Doing stuff implicitly is neither beneficial to efficiency nor to
> comprehensibility.

It can certainly benefit comprehensibility. Needing to explicitly normalize
before I can apply standard comparisons leaves more space for things to go
wrong. "Having a value of this type means that I have a normalized value"
means I have fewer things to keep track of. (I do note that there is
significant space between "it can benefit comprehensibility" and "it will
always benefit comprehensibility" - that depends on what I need to be readily
able to pull out of the implementation from a quick read of surface syntax)

Also, some things may have an easy equality check while normalizing may be
expensive or impossible - although I sadly can't think of any examples off the
top of my head.

------
jstimpfle
Fairly long post with a simple message: Data is the truth, it is useful to
think about the values that live in the program, and about their relations. It
is useful to think about what things are "objects" and thus need identities.

This is not obvious to OOP people. In OOP everything (each object) has implied
identity: their address in memory. This is brutally wrong for most use cases
in most programming domains. What use case justifies _new Integer(3) != new
Integer(3)_? Most languages support overriding an equality method as a partial
fix, but that's far from elegant.

On the other hand it should be fairly obvious to people who like relational
databases.

~~~
lacampbell
> This is not obvious to OOP people.

Citation needed!!

> In OOP everything (each object) has implied identity: their address in
> memory.

That's not an implied identity at all. That's a default identity. You're meant
to override it. What sane default would you propose? Objects are not structs
or records.

> This is brutally wrong for most use cases in most programming domains. What
> use case justifies new Integer(3) != new Integer(3)?

What OOP language provides numeric literals that don't have a built in integer
class, with an equality method that makes sense!? Complete strawman.

> Most languages support overriding an equality method as a partial fix, but
> that's far from elegant.

A partial fix? It is how you solve the problem with objects. I am not sure
what other method you would propose.

~~~
jstimpfle
It is an identity you can't get rid of, and which is often misused, or
accidentally used, to create a lot of complexity. No, you can't override '=='
in Java and other OO languages. You can override .equals() which defaults to
==.

> A partial fix? It is how you solve the problem with objects. I am not sure
> what other method you would propose.

You mentioned it yourself: structs or records, a.k.a value semantics. Here is
a resource from people with more authority (esp. Guy Steele)
[http://cr.openjdk.java.net/~jrose/values/values-0.html](http://cr.openjdk.java.net/~jrose/values/values-0.html)

~~~
lacampbell
> It is an identity you can't get rid of, and which is often misused, or
> accidentally used, to create a lot of complexity. No, you can't override
> '==' in Java and other OO languages. You can override .equals() which
> defaults to ==.

Using Java as an example to prove something about OO isn't very useful IMO.
It'd be like holding up Rust as an example for how functional languages do
things - yes, it's fairly functional but hardly canonical.

In Ruby, for example, you can indeed override ==.

> You mentioned it yourself: structs or records, a.k.a value semantics. Here
> is a resource from people with more authority (esp. Guy Steele)
> [http://cr.openjdk.java.net/~jrose/values/values-0.html](http://cr.openjdk.java.net/~jrose/values/values-0.html)

I think you're conflating structural equality on the one hand, and reference
semantics on the other. The two aren't related in real OO. For example, here's
a class that defines equality the same way an ML record might:

    
    
      class Value
        def ==(other)
          if other.class != self.class then 
            return false
          end
          
          self.instance_variables.each do |v|
            if instance_variable_get(v) != other.instance_variable_get(v) then
              return false
            end
          end
          
          return true    
        end
      end
    

Then we can use it to define a simple point class:

    
    
      class Point < Value
        def initialize(x, y)
          @x, @y = x, y
        end
      end
    

And there you go - structural equality, but still passed as references.

    
    
      Point.new(0, 0) == Point.new(1, 2)  # => false
      Point.new(0, 0) == Point.new(0, 0)  # => true

------
kazinator
TXR Lisp:

    
    
      This is the TXR Lisp interactive listener of TXR 157.
      Use the :quit command or type Ctrl-D on empty line to exit.
      1> (defstruct point nil
           x y
           (:method equal (me) (list me.x me.y)))
      #<struct-type point>
      2> (equal (new point x 0 y 0) (new point x 1 y 1))
      nil
      3> (equal (new point x 0 y 0) (new point x 0 y 0))
      t
      4> (hash-equal (new point x 0 y 0))
      536869888
      5> (hash-equal (new point x 0 y 0))
      536869888
      6> (hash-equal (list 0 0))
      536869888
    

See what I did there? When designing this object system, I realized that
rather than a binary equal method which takes this object and that, what we
need is a _unary_ equal method. When an object must be compared _or hashed_
for equality under the equal function, and it has the equal method, then that
method is called to retrieve a representative object. That object is then used
in place of the original object. The method just has to return something which
supports equal directly.

This feature is called _equality substitution_.
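
For readers who don't know TXR, the idea can be approximated in Python roughly as follows. This is only a sketch of the concept, not how TXR implements it: `EqualBySubstitute` and `equal_key` are hypothetical names, and unlike the listener session above this version only compares against other substituting objects.

```python
class EqualBySubstitute:
    """Mixin: equality and hashing are delegated to a representative value."""

    def equal_key(self):
        """Return a value that already supports == and hash()."""
        raise NotImplementedError

    def __eq__(self, other):
        if not isinstance(other, EqualBySubstitute):
            return NotImplemented
        return self.equal_key() == other.equal_key()

    def __hash__(self):
        # Hash the substitute, so equal objects always hash alike.
        return hash(self.equal_key())


class Point(EqualBySubstitute):
    def __init__(self, x, y):
        self.x, self.y = x, y

    def equal_key(self):
        # One unary method replaces a hand-written binary __eq__ plus __hash__.
        return (self.x, self.y)


differ = Point(0, 0) == Point(1, 1)
agree = Point(0, 0) == Point(0, 0)
same_hash = hash(Point(0, 0)) == hash((0, 0))
```

Because both `__eq__` and `__hash__` go through the same substitute, the two can't drift out of sync.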

Doc link:
[http://www.nongnu.org/txr/txr-manpage.html#N-00790C76](http://www.nongnu.org/txr/txr-manpage.html#N-00790C76)

~~~
jstimpfle
That will be terribly inefficient for asserting inequality of large objects.

~~~
kazinator
So: don't write huge objects---it's a code smelly anti-pattern anyway; or
don't make every single member of a large object participate in equality; or
else, cache the equality substitution and re-compute it if the object is
modified.

I'm adding a dirty flag to the object system in the next release. If any slot
is modified, the object will be marked dirty; an API will be provided for
testing and clearing the dirty flag. This will make it a cinch to write the
above point class so that it returns the same list if the object is clean.
Also, when the object is dirty and the representative list must be recomputed,
the cons cells of the old list can be re-used; we don't have to allocate a new
list.

Most of the time, you don't want that anyway; objects should be immutable as
much as possible. If you have objects in a hash table, you don't want to
fiddle with them in ways that affect their equality.

Speaking of which, this is one of the ways in which the equality substitution
is used in real code. Certain slots of the objects are treated as immutable
and the equality substitution is based on those slots. The objects are put
into hash tables based on that equality. Yet they have other slots treated as
mutable. Those don't count toward equality. If they did, the hashing would be
in trouble, obviously.
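
The same split between identity-defining slots and mutable slots can be sketched in Python with the standard `dataclasses` module. The `Account` class and its fields are made up for illustration; `field(compare=False)` excludes a field from both the generated `__eq__` and the generated `__hash__`.

```python
from dataclasses import dataclass, field


@dataclass(unsafe_hash=True)
class Account:
    # Treated as immutable by convention; this alone defines equality and hash.
    account_id: str
    # Mutable slot: excluded from equality and hashing.
    balance: int = field(default=0, compare=False)


account = Account("acc-1")
index = {account: "first"}

# Mutating the non-participating slot does not disturb the hash table.
account.balance += 100
still_found = index[account]
equal_despite_balance = account == Account("acc-1", balance=5)
```

If `balance` did take part in equality, the lookup after the mutation would be unreliable, which is the trouble the comment above refers to.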

~~~
jstimpfle
> So: don't write huge objects---it's a code smelly anti-pattern anyway

I tend to agree. But even for structures of only, say, 4 primitives, it's
still faster to only compare two primitives in the common case, instead of
hashing 8 and comparing two hashes.

In the same way it could also be argued that needing hash functions for
testing equality is a code smell. Equality only makes sense for small clean
data structures of primitives. Equality for those should be inferred by the
compiler from the primitives' equalities.

Also, how can hash collisions be avoided?

------
Traubenfuchs
Please avoid hurting Java's reputation by using outdated code.

1\. Since Java 8 there are LocalDateTime/ZonedDateTime/etc. classes. They are
immutable. LocalDateTime even works with Hibernate.

2\. date.setDate(int date) has been deprecated for nearly 20 years. The method's
Javadoc clearly states that you are changing the object. The method even
returns void. Don't use it. Why are you still using it?

3\. If you still want to use the less comfortable Date and Calendar, they work
fine if you use them as you are supposed to.

~~~
talideon
He does actually refer to this, but he could be more explicit on point (2)
rather than referring to it obliquely.

------
GuiA
_> In many situations using references rather than values makes sense. If I'm
loading and manipulating a bunch of sales orders, it makes sense to load each
order into a single place. If I then need to see if the Alice's latest order
is in the next delivery, I can take the memory reference, or identity, of
Alice's order and see if that reference is in the list of orders in the
delivery._

Presumably, an order would consist of much more than a list of items (order #,
delivery address, etc.), including a unique one (order #), so comparing by
value here would still work?

If you're properly building your hierarchies, I'm not sure I see cases where
comparing by reference would logically be desirable (of course, in the real
world you probably want to still compare by reference in cases where
performance might matter, or you're doing lower level work).

I guess this isn't disagreeing with the author's main point - that Value
Objects are desirable - but pushing it further: Value Objects are desirable
the vast majority of the time.

The second part, regarding immutability, I'm fully onboard with.

------
chrismorgan
I like Rust.

`==` is implemented by types as they desire and as makes sense, via the
PartialEq trait (and the type can mark whether it’s a partial equality or a
total equality by whether Eq is also implemented).

Because of Rust’s strong ownership model, the whole referential equality
matter actually becomes comparatively irrelevant, though if you actually need
it you can cast to raw pointers and compare those
(`x as *const _ == y as *const _`).

This strong ownership model also means that the whole aliasing problem becomes
irrelevant also: it’s obvious from the code where you alias things, and you
can only do so in a memory-safe way.

Really, this whole article becomes delightfully irrelevant for Rust.

------
verytrivial
I have always used "value" vs "identity" to distinguish these classes. e.g.
"Two men called John" (value, equal, no identity) "That man called John, and
that man called John." (with identity, so "non-identical", but may have the
same value in some aspect). Every comparison must first be distinguished by
whether you are comparing identity or some value of the object. With that
distinction made, the arguments about referential integrity (and when it
matters) are easier to digest. Pretty obvious I guess, but coming from C++, it
is amazing how many people fumble through without having this distinction
clear.

~~~
jstimpfle
I've realized that there's seldom an obvious "identity", because identity is
never absolute, but needs context ("semantics"). Mathematically identity is
just a function (of some key data to more data) but in practice part of the
key is implied (in the program, runtime, database connection, external
policies, whatever).

For example, conventional URLs are often frowned upon because they don't
"exactly" identify a document. Instead URLs involving hashsums, GUIDs etc. are
proposed. I think that's a bad idea because while the hash sum is a quite
precise identity, it's actually often _too precise_. I _want_ to be able to
edit a web document without having to create a new identity. When someone
surfs the URL and gets the latest "version" that's fine. We could consider all
versions "identical" for our purposes. Or alternatively we could say that some
of the identity was implied in the context, not the URL (the "get the newest"
part).

~~~
verytrivial
Hmm .. I think you might have that the wrong way around. URLs do exactly
_identify_ a document in the sense that the document that is returned _is_ the
document in question. That you might get a different document each time is a
different matter and relates, again, to value, not identity. (Then there's
idempotency which is a useful concept that only appears to cause eyeballs to
roll. Even the top hit in Google is snarky.) And identity and value are both
arbitrary -- John today is John tomorrow, but even that will also cause
philosophers to twitch. I guess it's not that surprising that programming
languages and the like encode the same confusion!

~~~
jstimpfle
The confusion only goes to show that there is no one obvious definition.
That's the beauty of the relational model. John is John, but what it can
identify is context-dependent (a string, a human in a group of humans, a human
in a group of humans in time?).

------
sly010
Other good examples of value objects are:

\- EmailAddress

\- PhoneNumber

\- FullName

\- Address

Perhaps it's just my domain, but I find myself very often defining equality to
various degrees for these concepts.

In fact I create value classes for any value that can have multiple valid
encodings (e.g. keys and hashes in crypto, dollar value, etc). It makes it
very easy to have all encoding/decoding code in one obvious place (in or near
the object code). Encoding and decoding can then happen at the edges, and the
core of the logic becomes much more readable ... but this gets into the Domain
Object territory.

------
wwwigham
Let me attempt to improve upon Fowler's ValueObject pattern in JS, and tell
you the upsides:

    
    
        function Point(x, y) {
            const hash = Point.__hash(x, y);
            const cached = Point.__cache.get(hash);
            if (cached) return cached;
            const newPoint = {get x() { return x; }, get y() { return y; } };
            Point.__cache.set(hash, newPoint);
            return newPoint;
        }
        Point.__hash = function(ptx, pty) {
            return `${ptx}|${pty}`;
        }
        Point.__cache = new WeakMap();
    

This does what you would expect a ValueObject (or an algebraic type or struct
depending on the languages you normally use) to do in another non-JS language
- it interns all equivalent objects so that simple reference equality is
sufficient to determine equality - rendering the nonstandard "equals" method
unneeded. This also solves all the issues with ".includes" and so forth,
again, because the built in reference equality is sufficient to determine
equality. This also has much better memory characteristics in a system where
many equivalent objects are created, as only one copy is ever stored in
memory. The only drawback is the small overhead of the WeakMap used to cache
all the references and the overhead of "hashing" at object creation time -
neither of which should be noticeable in most applications, and the memory
benefits should outweigh these concerns in most performance applications
regardless.

~~~
phpnode
Unfortunately your example won't work. You can't use a primitive value as a
key in a weak map, which negates many of their possible use cases.

~~~
wwwigham
You're right, I hadn't realized that the spec forbade primitives as WeakMap
keys - so while the approach should actually work in other languages, in JS it
would leak memory as you'd be forced to use a normal map (or clean up after
yourself - ich).
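
As one example of the approach in another language, here is a hedged Python sketch of the same interning idea. It relies on the standard `weakref.WeakValueDictionary`, which holds its *values* weakly, so a plain tuple can serve as the key and the cache entry disappears once no one references the point any more. The `Point` class and its `_cache` attribute are made up for illustration.

```python
import weakref


class Point:
    # Maps (x, y) -> the single live Point with those coordinates.
    # Values are held weakly, so unused points can still be collected.
    _cache = weakref.WeakValueDictionary()

    def __new__(cls, x, y):
        key = (x, y)
        cached = cls._cache.get(key)
        if cached is not None:
            return cached
        point = super().__new__(cls)
        point._x, point._y = x, y
        cls._cache[key] = point
        return point

    @property
    def x(self):
        return self._x

    @property
    def y(self):
        return self._y


p1 = Point(2, 3)
p2 = Point(2, 3)
p3 = Point(2, 4)
interned = p1 is p2       # same coordinates, same object
distinct = p1 is not p3   # different coordinates, different object
```

With interning in place, the default identity-based `==` and `in` behave like value comparison, at the cost of a lookup on every construction.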

------
dmalvarado
> const p1 = {x: 2, y: 3};

> const p2 = {x: 2, y: 3};

> assert.notEqual(p1,p2); // NOT what I want

> Sadly that test passes. It does so because JavaScript tests equality for js
> objects by looking at their references, ignoring the values they contain.

What? Has 'assert' been standardized?

Shouldn't it read: "It does so because my assert function tests equality for
js objects by looking at their references, ignoring the values they contain."

~~~
mcbits
I think his point would be the same (and maybe clearer) if he said
assert.isFalse(p1 == p2).

I.e. he's not concerned about the testing semantics, but JavaScript's equality
semantics which we're expected to assume the test function is using.

------
hprotagonist
this sounds suspiciously like "labeled products are good", which seems fairly
intuitive.

easy-ish to do in statically typed languages, but something like attrs
([https://attrs.readthedocs.io/en/stable/](https://attrs.readthedocs.io/en/stable/)
) or just namedtuples will do this in python.
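
For instance, a minimal sketch with the standard-library `collections.namedtuple` (the `Point` type here is just an illustration):

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])

p1 = Point(2, 3)
p2 = Point(x=2, y=3)

equal_by_value = p1 == p2          # compared field by field
separate_objects = p1 is not p2    # still two distinct objects
found_in_list = p1 in [p2]         # membership uses value equality
usable_as_key = {p1: "here"}[p2]   # hashing follows the values too

# Fields cannot be reassigned after construction.
try:
    p1.x = 5
    mutated = True
except AttributeError:
    mutated = False
```

One caveat worth knowing: a namedtuple also compares equal to a plain tuple with the same contents, so `Point(2, 3) == (2, 3)` holds as well.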

------
bertan
One of the reasons I love Golang is that structs are compared by their
values, not their references.

To do so, the compiler complains if you have circular struct type dependencies,
but I think it is OK.

------
ronreiter
Reminds me of Python articles from 2005

------
maxxxxx
Does anybody get anything from his writing? I have seen several posts of his
here and they all are either trivial and/or give a fancy name to some standard
coding construct. I think what he is describing here is the advantages of
immutability which anyone who has heard only a little about FP knows already.

~~~
jdlshore
This isn't about immutability, this is about the power of introducing value
objects to a system. (Immutability is just a property of good value objects.)

Most code I read does _not_ use value objects. It's more common for me to see
primitive obsession [1], data clumps [2], arbitrary structs, or the "struct +
service" antipattern (looking at you, Angular 1). Granted, I'm not working
with FP languages.

When you migrate code to use value objects, then follow up by moving
associated behavior into the value object, there are significant knock-on
benefits to your design.

Value objects may be simple, but they're not obvious. If they were, more code
would use them.

[1]
[http://www.jamesshore.com/Blog/PrimitiveObsession.html](http://www.jamesshore.com/Blog/PrimitiveObsession.html)

[2]
[http://www.martinfowler.com/bliki/DataClump.html](http://www.martinfowler.com/bliki/DataClump.html)

~~~
lorddoig
> Immutability is just a property of good value objects.

It sounds rather like value objects are strictly superior to immutable data
structures, based on this statement. I'd love to hear an expansion of this
rationale.

~~~
int_19h
It's actually the other way around. If your object is immutable, whether it
has identity or not (i.e. whether it's a "value object" or not) doesn't
matter! There's nothing useful you could derive from that identity, so it
might as well not exist.

So, languages should stop thinking in terms of "this is a reference type" and
"this is a value type", and start thinking about "this references something
mutable" vs "this references something immutable". For the latter, the
implementation can then use by-value semantics for perf reasons, where
appropriate.

~~~
comex
Not sure what you mean. Identity comparisons are more useful for mutable
objects, but even an immutable object can use one as an alternative to a
unique ID. For example, you might have a queue of input events, where once an
event is posted nothing about it needs to change, but multiple events with the
same properties are possible (e.g. pressing the same key twice in a row).
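
A small Python sketch of that event-queue situation, using a hypothetical `KeyPress` class: the two events are equal by value, yet identity still tells them apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyPress:
    key: str


# The same key pressed twice in a row: two immutable events, same contents.
queue = [KeyPress("a"), KeyPress("a")]
first, second = queue

same_value = first == second              # value equality cannot distinguish them
different_events = first is not second    # identity can

# Locating a specific event by identity rather than by value.
position_of_second = next(i for i, ev in enumerate(queue) if ev is second)
```

Whether identity should be doing this job, or an explicit sequence number or ID field, is the open question in this subthread.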

On the other hand, mutable objects with value semantics can provide the
ergonomic advantage that mutation has in some cases (e.g. 'point.y += 10'
rather than 'point = Point(point.x, point.y + 10)') as well as more predictable
performance, while avoiding bugs caused by accidental aliasing.

~~~
int_19h
> Identity comparisons are more useful for mutable objects, but even an
> immutable object can use one as an alternative to a unique ID.

I think that's part of the problem. Object identity should not be used for
unique IDs - when you need a unique ID, use a UniqueIdGenerator or something
like that.

Java and C# and others have got this so very wrong, when they did things like,
"okay, all objects have identity, let's just reuse it for other stuff", and
made it possible to e.g. synchronize on arbitrary objects - synchronized(x) in
Java, lock(x) in C#. Now they can't get rid of object identity, because it's
part of the language semantics - even if you never synchronize on objects of
your class, something else might, and making them identity-less will break
that.

At the very least, let's make it explicit. When you define a class, it should
be stated upfront whether it has identity or not. If it doesn't have identity,
it should be immutable. If it does have identity, it still _can_ be immutable
if you want (if you need that identity for something, like your event
example). But this state of affairs - immutable with identity - definitely
shouldn't be _normal_. If it's needed, it should be requested explicitly.

I guess the real point here is that object identity has a heavier cost than it
would seem from the first glance, and so it should be opt-in rather than
opt-out (and it should definitely be possible to opt out).

As for `point.y += 10`, it doesn't preclude point from being immutable. It
just means that the language has to desugar it into `point = Point(point.x,
point.y + 10)` for you. That way, there's no accidental aliasing (since
objects are still immutable - it's the reference that is mutating - so no
alias can observe the change). And then the code generator, knowing that
there's no aliasing, can replace it with actual in-place field update, when
and where it makes sense.
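
Python has no such desugaring built in, but the effect can be written out by hand as a rough sketch using a frozen dataclass and the standard `dataclasses.replace` (the `Point` class is illustrative):

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Point:
    x: int
    y: int


point = Point(1, 2)
alias = point  # a second reference to the same immutable value

# What a language could generate for `point.y += 10`:
# build a new value and rebind the variable, leaving the old value untouched.
point = replace(point, y=point.y + 10)

updated = point   # Point(x=1, y=12)
unchanged = alias  # still Point(x=1, y=2): the alias never observes the change
```

Only the variable `point` was rebound; the object that `alias` refers to was never modified, so no accidental aliasing bug is possible here.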

