
Invalid Object Is an Anti-Pattern - mjbellantoni
http://solnic.eu/2015/12/28/invalid-object-is-an-anti-pattern.html
======
grandalf
While this article hits on a very true aspect of validation, I think the real
problem that needs to be solved is a bit deeper:

There are really four levels of validity that we care about:

1) absolutely invalid data that indicates an unanticipated system error

2) invalid data that the caller (or user) must fix.

3) invalid data that the caller/user can optionally fix but that we wish to
warn about (via UX or in the log)

4) invalid data that is optional and so can be made valid by deleting it.

I have not yet seen a system based on types/typing that elegantly captures the
four cases above. ActiveRecord validations feel like an antipattern because
they are in some ways weaker than typing, but in some cases more flexible.

So we typically resort to ad-hoc approaches that are essentially custom code
doing the same thing across many projects. ActiveRecord validations remove
boilerplate and make it easy to give user feedback for simple flat object
structures with predictable semantics and a low probability of logic errors.
This changes with nested structures.

I wonder if it would be possible to do something along the lines of the Maybe
monad to handle scenarios 1-4 above via pattern matching...
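
A sketch of how those four levels might look as distinct result objects
dispatched by pattern matching (all names here are hypothetical, and Ruby's
`case/in` only arrived in 2.7, well after this thread):

```ruby
SystemFailure = Struct.new(:message)          # 1) unanticipated system error
UserError     = Struct.new(:field, :message)  # 2) caller/user must fix
Advisory      = Struct.new(:field, :message)  # 3) optionally fixable, warn only
Droppable     = Struct.new(:field)            # 4) optional; delete it to fix
Valid         = Struct.new(:value)

# one dispatch point handles every level of (in)validity
def handle(result)
  case result
  in Valid(value:)               then "ok: #{value}"
  in SystemFailure(message:)     then "500: #{message}"
  in UserError(field:, message:) then "fix #{field}: #{message}"
  in Advisory(field:, message:)  then "warn #{field}: #{message}"
  in Droppable(field:)           then "dropping #{field}"
  end
end
```

Structs support `deconstruct_keys`, so each case both matches on the type and
binds its fields in one step.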

~~~
solnic
This is exactly what I'm trying to solve with both dry-validation and dry-
data.

I wouldn't agree with the low probability of logic errors in AR validations
though. I think a DSL with lots of options is a recipe for logic errors. This
is the reason why I decided to use predicate logic as the base for dry-
validation. It supports nested structures easily, and it also supports
defining high-level rules that rely on results from low-level rules (typically
type checking with additional constraints, who would have thought ;)).

I'd love to know more about your idea for using Maybe monad in validations.
Using pattern matching crossed my mind more than once but so far I haven't
seen a real need for it (in dry-validation).
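
The predicate-logic idea can be illustrated without the library (a hand-rolled
sketch, not dry-validation's actual API): low-level predicates compose into
higher-level rules, and a failed type check short-circuits the constraint that
depends on it.

```ruby
# low-level predicates as plain lambdas
filled = ->(v) { !v.nil? && v != "" }
int    = ->(v) { v.is_a?(Integer) }
gteq   = ->(n) { ->(v) { v >= n } }

# a high-level rule reuses the low-level results: the type check runs first,
# so "21" (a String) never reaches the numeric comparison
adult_age = ->(v) { int.call(v) && gteq.call(18).call(v) }
```

Because `&&` short-circuits, constraints can safely assume the type their
predecessors already established.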

This is an awesome comment btw. These 4 points describe the problem in a very
clear way. It's such a fundamental problem, and I believe people should have a
good understanding of what it involves.

There's one thing I'd add to this - coercion logic. It's very common that you
need to coerce values coming from "the outside", and there are different rules
for coercion depending on the context. That's why dry-data has different
coercible type categories, and the "form" category is dedicated to web forms.
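
A hand-rolled sketch of context-dependent coercion (illustrating the idea, not
dry-data's API; the rule sets and keys are made up): the same raw value is
coerced differently depending on where it came from.

```ruby
module Coerce
  # "form" context: everything arrives as a string from a web form
  FORM = {
    age:   ->(v) { Integer(v, 10) },
    admin: ->(v) { %w[1 true on].include?(v) }
  }.freeze

  # "api" context: JSON already carries types; only light normalization
  API = {
    age:   ->(v) { Integer(v) },
    admin: ->(v) { !!v }
  }.freeze

  # apply one rule set to a params hash, key by key
  def self.call(rules, params)
    params.to_h { |key, value| [key, rules.fetch(key).call(value)] }
  end
end
```

So `"on"` from a checkbox becomes `true` in the form context, while a JSON
payload's `false` passes through untouched.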

~~~
encoderer
I think the coercion you talk about is a mapping that should take place
outside the object. I would love to see a validation library like yours made
for Django.

------
dasil003
A lot of things in Rails are anti-patterns in large code bases, but
pragmatic in small ones. ActiveRecord itself is a prime example: when you
start an app, putting your business logic directly in ActiveRecord objects
works pretty well in most cases, but later on as the models proliferate and
grow, you realize that some of them contain business logic which is far too
complex to warrant being conflated with persistence concerns. The result is
difficulty grokking the higher level business logic that crosses database
table boundaries, and potentially slow tests because you can't reliably test
complex logic without hitting the database. Of course it's easy enough to add
a service layer in a Rails app, Ruby is very flexible, but there's no
convention for it so there is a high barrier for making this decision since
you lose the benefits of shared patterns.

Some might argue this is bad and Rails got it wrong not to anticipate this
problem, but I think it's a good decision to avoid incidental complexity for
the 95% of Rails apps that will never grow beyond a certain point. In fact
this is a significant reason Rails was able to get traction: in 2004 Java had
solid, proven solutions to all possible concerns, but the combinatorial
complexity of configuring and making them all play together made simple apps
take 10x as long to get working. It's smarter to ship fast and then refactor
based on real-world feedback than attempt to implement the perfect
architecture before you even know if you're designing the right solution.

~~~
solnic
You are right about everything you wrote except one thing: ship fast and then
refactor is not only often extremely difficult and risky; there are cases
where it becomes almost impossible.

That's one of the reasons why I've been working on libraries that could
provide a similar level of convenience and allow you to rapidly prototype
something _but_ with a better foundation where things can be refactored more
easily.

~~~
saneshark
You're being overly modest. You should mention your contributions to the ruby
community with respect to Ruby Object Mapper as an example. I have not used it
in a project, but in any future undertaking I will most certainly give it a
look.

One of the reasons I've started to look more at Elixir and Ecto as an ORM is
because issues like immutability come up more and more.

In an ideal world we'd have a dynamic client side querying language like
GraphQL, a simple application layer that maps REST to CRUD without constraints
pushed upon it based on the ORM, and an ORM abstraction that allows for
immutability in our models. Concurrency and race conditions shouldn't be a
developer's concern when working with a higher-level framework.

------
talles
I'm not a Rails dev (.NET actually) but I have stumbled on this question
before: should I allow invalid objects to lie around my domain?

It may be tempting to validate as soon as the constructor/setter is hit. And
this works great for things like a name length or an email format as the
article points out. The problem with this approach is when you hit a
validation that is complex and/or needs IO. What if there's some info in the
database that determines whether you can construct the object the way you
need? Would you go to the database and check that info in your constructor?

Strike one.

Some validation only makes sense on a specific operation. The object you
constructed, due to some business rule, may be invalid to delete but valid to
update, for instance. It feels more natural to validate the object with a
context: is the object valid _for this operation_?

Strike two.

Not to mention the inconvenience of never being able to have an invalid entity
for things like integration with other systems and logging. What if you
receive, from a webservice or something, a valid object format-wise but
invalid from the business rule perspective? Being able to deal with an invalid
instance feels more natural when logging such exceptional behavior, informing
the other system about it, etc. In this scenario of inevitable invalid data
it's better to handle it through a proper entity rather than fighting against
a bag of properties without any form at all.

Strike three.

~~~
ArtB
> Some validation only makes sense on a specific operation. The object you
> constructed, due to some business rule, may be invalid to delete but valid
> to update for instance.

These are two different concepts I feel you are mixing up. An object may be
valid, but that doesn't mean it's valid for all operations. For example, a
file object may pass basic validation because it exists, but if you only have
read access to it, .append("foo") will fail on it. It's still a valid file.
Your constructor should be checking for the first kind of validity, and your
methods for the second.
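
That split might be sketched like this (a hypothetical FileHandle class): the
constructor guards "is this a valid object at all?", while each method guards
"is this operation valid right now?".

```ruby
class ReadOnlyError < StandardError; end

class FileHandle
  attr_reader :path, :mode

  def initialize(path, mode)
    # object-level validity: a handle must have a path and a known mode
    raise ArgumentError, "path required" if path.to_s.empty?
    raise ArgumentError, "unknown mode"  unless %i[read write].include?(mode)
    @path, @mode, @content = path, mode, +""
  end

  def append(text)
    # operation-level validity: a perfectly valid handle may still refuse this
    raise ReadOnlyError, "#{path} is read-only" unless mode == :write
    @content << text
  end
end
```

A read-only handle constructs just fine; only the operation that needs write
access rejects it.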

> What if you receive, from a webservice or something, a valid object format-
> wise but invalid from the business rule perspective?

That is what you need an [anti-corruption
layer](http://www.markhneedham.com/blog/2009/07/07/domain-driven-design-anti-corruption-layer/)
for. The object makes sense within that other service's domain but not within
yours. That domain may be closely related to yours, but it isn't you. You need
a separate object for that (a simple struct ought to suffice usually). Then
you act as a gatekeeper, only allowing conforming objects into your system.
Basically, filter out the shit and make sure that if it makes it past your
gate it is clean. Otherwise you might want to use the [state
pattern](https://en.wikipedia.org/wiki/State_pattern) to allow for different
validation rules for the same object (eg legacy accounts might not have an
email address but all new ones must).
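
A minimal sketch of such a gatekeeper (all names hypothetical): the external
payload is parsed into a plain struct, and only conforming data is promoted to
a domain object.

```ruby
# the "other domain's" shape: a dumb bag of fields, no behavior
ExternalCustomer = Struct.new(:name, :email, keyword_init: true)

# our domain object: if you hold one of these, it is sane
class Customer
  attr_reader :name, :email

  def initialize(name:, email:)
    @name, @email = name, email
  end
end

module CustomerGateway
  EMAIL = /\A[^@\s]+@[^@\s]+\z/

  # returns a domain Customer, or nil if the payload doesn't conform
  def self.admit(payload)
    external = ExternalCustomer.new(name: payload["name"],
                                    email: payload["email"])
    return nil if external.name.to_s.empty? || external.email !~ EMAIL
    Customer.new(name: external.name, email: external.email)
  end
end
```

Everything past the gateway can trust its `Customer` instances; everything
before it deals only with the throwaway struct.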

> What if there's some info in the database that determines whether you can
> construct the object the way you need? Would you go to the database and
> check that info in your constructor?

This is touchy. Sometimes, if it's important enough, then yes. Sometimes you
can code around it (eg having case numbers auto-generated at time of
persistence). Other times you bite the bullet. Other times you rearchitect the
data (storing it in memory to do the check quickly). Or, as a last resort,
accept it as a compromise and start coding up some roll-back functionality.
Aim for the ideal, and know when to step backwards towards the practical: that
is the art of software development.

~~~
talles
I _kinda_ agree with your first remark. I once used exactly what you described
as my rule of thumb. But after some time I found out that deciding whether a
validation is bound to the operation or not, and whether the cost (IO) of
doing it up front is worth it, isn't as black and white as we'd like; they all
come in different shades of gray.

As for the second, this anti-corruption layer is dealing with validating the
format of the data (typical deserialization problems such as missing
properties and invalid types) or the actual content? If it's the first it
should be done outside the domain, that's a transport/markup specific thing.
If it's the latter, that's the domain's job. My problem is when you don't
instantiate the actual entity (that you know is well formed) to check its
content; dealing with a bag of properties for such a thing (which should be
done inside the domain) is awful. The domain should deal with its entities and
nothing else.

The third, from the experience I have, is a big flashy "no no". Our typical
boring OO systems may not be as religious about side-effects as FP is, but
that doesn't mean we shouldn't strive a little to isolate them. Unlike the
actual operations, the instantiation of entities may take place in a myriad of
places. When you mix that with IO side-effects such as latency, you help
create a little monster that you have to peek and poke at to find out what's
going on.

~~~
ArtB
> If it's the latter, that's the domain's job.

The domain's job is to represent and encapsulate valid transformations in the
domain. If it's not valid within the domain then it doesn't belong in it. If
you get a well-formed XML file that has -7 as a social security number, that
is not something that your domain has to deal with. It should be caught by the
anti-corruption layer. It's not a valid value for your domain. Where I work we
regularly build an import domain that allows users to see all the invalid
data and manually correct it before allowing it into the rest of the system.
Once I have an instance of a Person object I should be able to trust that it
is sane.

> The third, from the experience I have, is a big flashy "no no".

It depends; I'd say it's on a case-by-case. Doing a read can be much more
acceptable than a write. It's all about trade-offs at that point, but the
benefit of knowing "if I have an instance I know that it is sane" means you
don't need to litter your code with guards and that pays dividends in
maintenance and bugs and agility and testability and, yes, even performance. A
big factor is the cost of load. I usually work on low-load (~14 concurrent
users) high-importance systems (eg global pricing management) where
correctness is at a premium and we usually have system resources to spare.
YMMV. As I said: it depends.

~~~
talles
> If you get a well-formed XML file that has -7 as a social security number,
> that is not something that your domain has to deal with.

Agreed. That's a type/format problem. But if, for whatever reason, the domain
should process social security numbers that start with 9 differently, by no
means should that be outside the domain. That is your business rule; you
should "trap it" in the domain.

> where correctness is at a premium

The correctness of both is the same. The programming effort and the
performance differ between them, but it's never a correctness trade-off.

---

But instead of arguing back and forth, let me give you a problem that I had to
deal with before:

Suppose that to have an entity with a state X there must already exist in the
system an entity with state Y. Also, to have an entity with state Y there must
already exist in the system an entity with state X.

How do you solve this "chicken or the egg" deadlock if you never allow invalid
entities to exist?

If you do allow invalid entities it's pretty simple: you instantiate both and
hand both, together, to the domain.
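
A sketch of that resolution (hypothetical names): neither entity validates
alone, so the pair is validated and admitted as one unit.

```ruby
Entity = Struct.new(:id, :state, keyword_init: true)

def valid_pair?(entities)
  states = entities.map(&:state)
  # an X may only enter alongside a Y, and vice versa
  states.include?(:x) == states.include?(:y)
end

def admit_pair(a, b)
  raise ArgumentError, "X and Y must enter together" unless valid_pair?([a, b])
  [a, b] # hand both, together, to the domain
end
```

Each entity is momentarily "invalid" on its own, but the unit of validation is
the pair, so the deadlock never arises.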

------
abvdasker
That first example is a little silly. Rails validations have little to do with
instantiating objects and are mainly concerned with preventing invalid state
from being persisted (which is why they are designed to run when "save" is
called). This
incredibly useful as a safeguard against persisting bad data, which is much
more harmful than simply instantiating an object with bad data. Validating
objects during instantiation sounds great, but it isn't really the purpose of
ActiveRecord's validations.

That said, the second example is very cool/interesting. Anything that makes
type more explicit in Ruby is a move in the right direction.

~~~
benmmurphy
The second example misses what is most important about Rails validations,
which is spitting errors (plural is important) out for a form. By using an
exception that goes off when a field assignment is not valid, it is restricted
to giving back only a single error.
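
The difference can be shown with a hand-rolled validator (hypothetical fields)
that accumulates every failure instead of raising on the first one:

```ruby
# returns a hash of field => message, one entry per failed check,
# ready to render next to each form field
def validate(params)
  errors = {}
  errors[:name]  = "can't be blank"   if params[:name].to_s.empty?
  errors[:email] = "is not an email"  unless params[:email].to_s.include?("@")
  errors[:age]   = "must be a number" unless params[:age].to_s.match?(/\A\d+\z/)
  errors
end
```

An exception-raising design would stop at the blank name; here the user sees
all three problems in one round trip.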

~~~
solnic
dry-validation supports the typical validation behavior known from Rails,
where errors are gathered and given string representations. In fact it's much
"smarter" than AR validations, as it only executes the validation functions
that are needed and fills in other _potential_ error messages based on rule
definitions. This is one of the reasons why it's multiple times faster than AR
validations and more extensible.

With dry-data constrained types it's a different story. They raise type errors
because we're dealing with lower-level objects of your system. Personally I
prefer to see a meaningful type error rather than exceptions like "foo called
on nil" etc. Not to mention that constrained types increase the chance of
spotting type-related bugs earlier.

------
bluesnowmonkey
> Both libraries are using each other, which is a cool synergy [...]

Or an ominous circular dependency, depending on how you look at it.

~~~
jeremiep
I had the same reaction, "why didn't he abstract away the dependencies in a
third library?"

~~~
solnic
That's gonna happen, as I mentioned they are both young libs. A shared
dependency will be introduced very soon. I should've mentioned that in the
post.

------
JFlash
The validity of an object depends on the observer. If you require all
Customers to have a billing address but a person is in the middle of a multi-
page checkout, should their Customer record be invalid if they've only filled
in their email so far? What if they want to come back to the form later?

Rails tries to fix this with the `:on` option for validations
([http://api.rubyonrails.org/classes/ActiveModel/Validations/ClassMethods.html#method-i-validate](http://api.rubyonrails.org/classes/ActiveModel/Validations/ClassMethods.html#method-i-validate)),
which goes woefully unused in Rails apps.

Now you can do:

      customer.save(context: :checking_out)
      # which is different than
      customer.save
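
The idea behind that `on:`/context behavior can be sketched in plain Ruby (a
hand-rolled mimic with hypothetical rules, not ActiveModel's implementation):

```ruby
class Customer
  # nil context means "always enforced", otherwise only in that context
  RULES = [
    { attr: :email,           context: nil },
    { attr: :billing_address, context: :checking_out }
  ].freeze

  attr_accessor :email, :billing_address

  def valid?(context = nil)
    RULES.all? do |rule|
      applies = rule[:context].nil? || rule[:context] == context
      !applies || !public_send(rule[:attr]).to_s.empty?
    end
  end
end
```

The half-finished checkout record is valid by default and only becomes invalid
once you ask the checkout-specific question.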

------
scient
I'm struggling to see the improvement here. The validations are harder (more
confusing?) to use, require more code/work, and then apparently try hard to
provide the exact same integration that the original thing provides? Sooo
what's different or better now?

------
gue5t
Functional programming phrases this as the "make illegal states
unrepresentable" maxim.
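
In Ruby terms (a sketch with hypothetical classes), the maxim can look like
giving each state its own class carrying exactly the data that state can have,
instead of one class with a state flag and sometimes-meaningful fields:

```ruby
Disconnected = Struct.new(:host) do
  def connect
    # the string stands in for a real socket handle (hypothetical)
    Connected.new(host, "socket-to-#{host}")
  end
end

Connected = Struct.new(:host, :socket) do
  def send_data(bytes)
    "#{bytes.bytesize} bytes via #{socket}"
  end
  # Disconnected has no send_data at all, so "send while disconnected"
  # is not merely checked at runtime; it cannot even be written
end
```

Without a compiler this is weaker than the ML-family version of the trick, but
the illegal call still fails at the boundary instead of deep inside the logic.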

~~~
ionforce
Now that I'm on the functional side of things, I can say confidently to my
past self and all the "OOP but never functional" practitioners out there...

Learn more about this concept. Make illegal states unrepresentable in your
program.

It will greatly help how you think about programming.

------
DanielBMarkham
Meh, I'm not so sure.

"Why do we validate data? Typically, to make sure that invalid state doesn’t
leak into our systems."

Surprisingly, no! We validate data to put all the business validation logic
into one place, regardless of whether that data is being persisted or not. In
fact, a good argument can be made that the most important purpose of
centralized object validation is to _allow_ storing objects that are out of
whack -- and that our code should degrade gracefully on load when the object
is invalid -- whether that load occurs from the UI, a feed, or a persistent
datastore.

Perfectly fine to argue the other way. In that case, nope, it doesn't make
sense.

~~~
solnic
It's not feasible to have validations centralized. You don't validate objects;
you validate data that your system received from somewhere. How it's done
depends on the type of the external system, the format, and sometimes the
context in which you validate, which is specific to your system. This is one
of the reasons why AR validations don't scale so well.

Having said that, there are many systems where AR and centralized validation
works just fine.

~~~
DanielBMarkham
I hate these discussions because you're right, but in a limited way.

Yes, it is impossible. I should have said something like "locally centralized
to the compilation unit".

As grandalf pointed out, "IsValid" really asks a bunch of questions at the same
time, and talking about it in the abstract is always going to miss the full
depth. As a for-instance, there's clean data in terms of my datastore and then
there's clean data in terms of whatever app I'm looking at. In a system of any
size, these are most likely not the same thing. (And yes, that drives us
crazy, but such is the way of large datastores)

------
rubiquity
I think I'd rather just use a language with a good type system than write Ruby
in that sort of style. The value of types is greatly diminished if there isn't
a compiler to make all sorts of wonderful optimizations and find bugs for you.
At least until Ruby the language implements gradual/soft typing directly.

The library mentioned also raises exceptions for bad input. Unfortunately,
exceptions cripple the runtime performance of Ruby so I wouldn't consider that
a smart thing to do.

~~~
solnic
You'd get these runtime exceptions anyway, but they would be obscure and often
hard to debug. Once you define core objects in your system that you construct
from external sources (like a database) you establish a contract that the
source is trusted and it's expected to provide valid data. If that doesn't
happen, for some reason (and it happens), you want to see it as early as
possible and a meaningful error can be very helpful.
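
A hand-rolled sketch of that contract (illustrating the idea behind
constrained types, not dry-data's API): construction fails early with a
meaningful error instead of a later "foo called on nil".

```ruby
class ConstraintError < TypeError; end

# a constrained "type" as a callable: returns the value or fails loudly
NonEmptyString = lambda do |value|
  unless value.is_a?(String) && !value.empty?
    raise ConstraintError, "#{value.inspect} violates NonEmptyString"
  end
  value
end

User = Struct.new(:name) do
  def initialize(name)
    super(NonEmptyString.call(name)) # fail at construction, not at first use
  end
end
```

If the "trusted" source hands over a nil name, the error names the constraint
and the offending value at the point of entry.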

This obviously doesn't come even close to a statically typed language but I
believe it's a huge improvement over the typical approach in Ruby where
you...just don't care and hope for the best ;)

~~~
rubiquity
> _You'd get these runtime exceptions anyway, but they would be obscure and
> often hard to debug._

Only if you use the bang versions of save/update/create, which I don't.

------
saneshark
I'm a strong advocate for this and, to an even larger extent, some of the
improvements that the Trailblazer project has introduced.

Reform, a component of Trailblazer, uses virtus for coercion and the
Dry::Validation methodologies described here.

It's worth a look for anyone who found this post enlightening.

~~~
solnic
I should mention that dry-data will replace virtus in reform, as it's a)
faster b) simpler c) more flexible :)

------
davidbanham
From the example in the article the libraries smell a lot like a reinvention
of JSON schema. What am I not seeing?

