
Parse, Don’t Validate - undreren
https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/
======
Izkata
This is describing what I've known as "Typed Hungarian notation", and have
seen a few times before, though I can't seem to find it now.

The original intent of Hungarian notation was to encode metadata into the
variable name - for example, "validData = validate(rawData);
showValidData(validData)". The idea was that the prefixes would make it just
look wrong to the programmer if someone accidentally used mismatched names,
such as "showValidData(rawData)", indicating a likely bug.

This notation mutated into the far more simplistic and arguably less useful
"prefix with the datatype", such as iFoo indicating Foo is an integer. This
mutation became known as Systems Hungarian, while the original became Apps
Hungarian.

The suggestion in this post is taking Apps Hungarian, and instead of relying
on variable names, encoding it into the type system itself.

The first time I recall seeing something like this suggested was actually in
Java, with examples in that blog post involving class attributes:

    
    
      public String Name;
      public String Address;
    

And creating classes to represent these values, preventing them from being
used the wrong way:

    
    
      public NameType Name;
      public AddressType Address;
    

...which is why I remember this as "Typed Hungarian" - making the language's
type system handle the work that a person would normally have to think about
in Apps Hungarian.
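In TypeScript terms, a minimal sketch of the idea might look like the following (the class and function names are my own invention, echoing the validate/showValidData example above):

```typescript
// Hypothetical sketch: distinct wrapper classes play the role of the
// "raw"/"valid" prefixes from Apps Hungarian, but checked by the compiler.
class RawData {
  constructor(public readonly text: string) {}
}

class ValidData {
  // A private member makes the class nominally distinct, so a RawData
  // cannot be passed where a ValidData is expected.
  private readonly brand: undefined = undefined;
  constructor(public readonly text: string) {}
}

// Stand-in validation rule: accept only non-empty input.
function validate(raw: RawData): ValidData | null {
  return raw.text.length > 0 ? new ValidData(raw.text) : null;
}

function showValidData(data: ValidData): string {
  return "valid: " + data.text;
}

// showValidData(new RawData("oops")) // compile-time error - the point of the exercise
```

The mismatched call that would merely "look wrong" under Apps Hungarian becomes a type error here.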

~~~
radicalbyte
The Java example you've provided is known as Domain Modeling; it's well
described (and honestly greatly expanded on) in Domain Driven Design by Eric
Evans.

It's applied heavily in the enterprise world simply because it's so powerful.
In general as the domain logic becomes more complex the benefits of doing this
increase.

Actually, not encountering this style in a code-base which is solving a
complex problem is a massive warning sign for me. It usually means that
concepts are poorly defined and that the logic is scattered randomly all over
the code-base.

Apps Hungarian is just a logical style: name functions, variables, types so
that they're meaningful within your domain. The result of which is code which
is very easy to understand for someone who understands the domain. This
doesn't mean long names - if you're doing anything math intensive then using
the short names which conform to the norms of the field is perfect. For a
business process it probably isn't :)

~~~
gambler
_> Actually, not encountering this style in code-base which is solving a
complex problem is a massive warning sign for me._

There is a fine line between capturing some common meaning in a reusable type
(e.g. Optional<T>) and spending most of your time crafting a straitjacket
of type restrictions that are supposed to thwart some imaginary low-level bug
in the future.

Java is actually a great example of how this mentality backfires in real life.
The language, from day 1, had a zillion ways to force someone else to do
something they didn't want to. Abstract classes. Private methods. Final
classes. Etc. It was supposed to ensure good design, but in practice just led
to the endless reenactment of the banana-gorilla-jungle problem.
([https://pastebin.com/uvr99kBE](https://pastebin.com/uvr99kBE))

At the same time, the language designers didn't add such basic features as
lambdas and streams until 1.8. This led to the creation of an ungodly amount
of easily preventable bugs, since everyone was writing those awful imperative
loops.

~~~
monoideism
The banana-gorilla-jungle problem results from OOP. This article (and presumably,
the GP) is describing encoding domain-driven logic in a statically-typed
functional language, and thus the OOP issues are not a problem.

Functional languages don't carry around their environment with them, so you
just get a banana. No gorilla or jungle.

~~~
gambler
_> Banana-gorilla-jungle problem results from OOP._

Not really. There are specific design decisions that Java made which enabled
this issue to manifest. I recently took a foray into Smalltalk, and it
doesn't have this problem, because methods depend on _object interfaces_
rather than _class types_. AFAIK, OCaml also got this issue right while
keeping static type checks - they decoupled classes from types.

 _> Functional languages don't carry around their environment with them_

It is entirely possible to recreate the issue in a functional language, as
long as there are sufficiently powerful mechanisms to enforce type safety
during compilation. In fact, if you don't understand this and follow the
advice in the article too far, that's exactly where you will end up. You will
have a jungle monad containing gorilla monad holding hostage the banana data
structure the user actually cares about.

~~~
monoideism
On the off chance you're reading this 9 days later, I'd like to push back
against this response (working on an OCaml project at work, actually).

> AFAIK, OCaml also got this issue right while keeping static type checks -
> they decoupled classes from types.

If you mean that the types correspond with OCaml classes themselves and not
the objects that classes create, yes, that's correct. But it sounds like you
might mean the opposite of that. In any case, people rarely use the O part of
OCaml much. Most people find they prefer the first-class modules for almost
everything that needs greater modularity than what functions provide. I've
found that the only thing the OCaml object system is good for is
modeling/wrapping existing OOP or prototypal APIs, like certain XML ones, or
some web APIs.

> You will have a jungle monad containing gorilla monad holding hostage the
> banana data structure the user actually cares about.

While it's true that monads are a way to make normally explicit environments
implicit, monads aren't really what the book I'm talking about is about, nor
this article.

Yes, monad-heavy code could (with effort) end up with an application monad
displaying problems similar to the banana-gorilla-jungle problem. I've frankly
never seen monadic code like that, but perhaps you have.

Instead, what this article is advocating for is simply modeling your data
correctly. Really, this is a universal programming theme, that helps a lot
regardless of programming language. But the structures available in
statically-typed functional languages are more powerful and able to model
domains with greater granularity. If you're familiar with the saying "make
invalid states unrepresentable", then you understand what I mean.

------
darkkindness
Yes, yes yes! Encode invariants in your data, don't enforce invariants on your
data. I particularly think this point needs to be stressed more, because
practically speaking, not every language has the typechecking faculties of
Haskell:

> Sometimes, making an illegal state truly unrepresentable is just plain
> impractical given the tools Haskell provides, such as ensuring an integer is
> in a particular range. In that case, use an abstract newtype with a smart
> constructor to “fake” a parser from a validator.

For instance, few type systems will let you encode prime numbers as datatypes,
so you'd have to do something in your language like

    
    
        newtype Prime = Prime Integer
        mkPrime :: Integer -> Maybe Prime
        mkPrime p | isPrime p = Just (Prime p)
                  | otherwise = Nothing
    

which is parsing!
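For readers more at home in C-like syntax, here's a rough TypeScript rendering of the same smart-constructor idea (the naive `isPrime` helper is my own stand-in, purely for illustration):

```typescript
// A branded number type: a plain number can't be used where a Prime is
// expected, mirroring `newtype Prime` with a hidden constructor.
type Prime = number & { readonly __brand: "Prime" };

// Naive trial-division primality test, just for the sketch.
function isPrime(n: number): boolean {
  if (n < 2) return false;
  for (let d = 2; d * d <= n; d++) {
    if (n % d === 0) return false;
  }
  return true;
}

// The only way to obtain a Prime, mirroring mkPrime :: Integer -> Maybe Prime.
function mkPrime(n: number): Prime | null {
  return isPrime(n) ? (n as Prime) : null;
}
```

Downstream code that takes a `Prime` never has to re-check primality; that knowledge is recorded in the type.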

~~~
mc3
This is why learning Elm has been valuable for me.

Without the convoluted type system of Haskell, you have to do this more, and
it is a lot simpler to understand and maintain. You are never going to be able
to use the type system to guarantee everything about your data, so let your
runtime code check it, add lots of tests and keep it isolated (Elm calls them
"Opaque Types") so it is easy to reason about.

The equivalent is possible in OO with classes, but with the caveat that only
if you make everything immutable and ideally have a layer of compile time null
checking on top.

In short, with Haskell I am learning and relearning language features
constantly and banging my head. With Elm I grok the language and I am focused
on solving the problem.

~~~
lexi-lambda
I think it’s fantastic that Elm works well for you; it’s a wonderful language
and there’s no doubt that it is much simpler than Haskell. I agree with you
that, in many cases, the technique you’re describing is sufficient, and it
requires much less sophisticated machinery than some of the fancier Haskell
techniques.

That said, I do think what you’re describing is _different_ from what is
described in the blog post. Opaque/abstract types reduce the trusted code
significantly, which is fantastic, but the example provided in the blog post
is actually even better than that: because NonEmpty is correct by
_construction_, the amount of trusted code goes down to zero. A lot of the
tools Haskell provides, like GADTs and type families, can be used to capture
more invariants in a correct-by-construction way, rather than relying on
abstraction boundaries. There are advantages and disadvantages to both.
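A rough TypeScript rendering of the distinction (my own sketch, not from the post): an opaque type trusts a checked constructor behind a module boundary, while a correct-by-construction NonEmpty has no empty value to guard against at all:

```typescript
// Correct by construction: a non-empty list is a first element plus a
// (possibly empty) rest, so an "empty NonEmpty" simply cannot be built.
type NonEmpty<T> = { readonly first: T; readonly rest: readonly T[] };

// No runtime check and no trusted module boundary: the head always exists.
function head<T>(list: NonEmpty<T>): T {
  return list.first;
}

// By contrast, an opaque/abstract type would wrap a plain T[] and rely on
// a smart constructor (trusted code) having checked length > 0.
const example: NonEmpty<number> = { first: 1, rest: [2, 3] };
```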

I certainly don’t have any ill will toward anyone who finds the complexity of
Haskell not worth the extra benefits, so to be quite clear, I’m not trying to
sell you on Haskell instead of Elm! I do think being aware of the distinction
is still useful, though.

~~~
mc3
Good reply. My view of Haskell is that if I had to do it 40 hours a week in a
team for 50 weeks, I'd get used to it, would see the benefits, and would get
itchy using any language without them. I'd have wiser colleagues to help me
learn and I'd see the patterns in an existing code base.

But as a hobby FP'er looking from the 'outside' Elm is more suited to me. So
the point is my Haskell vs Elm preference is somewhat circumstantial.

It's like the home DIY person preferring to order a kitchen from IKEA and
install it, instead of making their own cabinetry.

------
schoen
I learned a lot from this article and hope to revisit it.

I had one quarrel:

> Consider: what is a parser? Really, a parser is just a function that
> consumes less-structured input and produces more-structured output. By its
> very nature, a parser is a partial function—some values in the domain do not
> correspond to any value in the range—so all parsers must have some notion of
> failure.

I think it's reasonable to include some total functions as parsers as well,
because some grammars -- structures that are recognized by the parser and that
specify languages -- generate _all strings_ as the corresponding language. In
that case, there are no failure cases because every string is a valid input.

I came up with some more-trivial examples, but I think a less-trivial example
is the split() function in many languages. Its Haskell type signature is
String -> [String], and it is a total function. It handles the case where the
input is a nonempty string with no delimiters (the output is a list containing
a single token equal to the input), as well as the case where input is an
empty string (the output is an empty list), and the case where the input
consists _only_ of delimiters (in many implementations, the output is a list
containing n+1 empty strings).
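The totality is easy to see concretely; here is a TypeScript sketch with the semantics described above (JavaScript's built-in split returns [""] for empty input, so that case is handled explicitly):

```typescript
// A split with the behavior described above: every input string maps to
// some list of tokens, so there is no failure case - the parser is total.
function split(input: string, delimiter: string): string[] {
  if (input.length === 0) return []; // empty input -> empty list
  return input.split(delimiter);     // otherwise defer to the built-in
}
```

With a space delimiter: split("a b", " ") gives ["a", "b"], split("abc", " ") gives ["abc"], and split("  ", " ") gives ["", "", ""] - n+1 empty strings for n delimiters, as described.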

Another trivial-ish case is where the grammar allows an arbitrary trailing
string following the object of interest (similar to C's atoi). In that case, a
parser will always return the empty object if there is no interesting or
relevant data in the input. (For example, atoi returns 0 if the input string
doesn't begin with an integer.) This could be viewed as a kind of error, but
there may be applications in which this behavior is useful.

~~~
lexi-lambda
I think this is a reasonable point, and your split() example is a good one. I
wasn’t sure while I was writing the blog post if I considered isomorphisms to
be parsers, and I’m still not entirely sure if I do or not, but there’s an
argument to be made that they are simply parsers where the set of possible
failures is empty. I don’t have strong feelings about that distinction one way
or the other, though.

~~~
tylerhou
Some other random feedback: the short phrase makes me think that I should
parse _instead_ of validating. Might I suggest: "Don't just validate: parse"?

~~~
lexi-lambda
The idea really is that parsing subsumes validation. If you’re parsing, you
don’t have to validate, because validation is a natural side-effect of the
parsing process. And indeed, I think that’s the very thesis of the blog post:
parsing _is_ validation, just without throwing away all the information you
learned once you’ve finished!

~~~
tylerhou
I understood the article, and I see your point, but my main feedback is that
when I read the title I assumed that you were going to describe some way where
you would parse input without validating it.

~~~
erikpukinskis
FWIW, I don’t read that implication in the title.

~~~
torstenvl
If validation is necessarily included in parsing, then the title -- parse &&
!validate -- is impossible.

The title of the post says "don't validate," and so I initially expected it to
relate to the mantra of being liberal in what you accept, etc.

~~~
erikpukinskis
One thing to bear in mind is that headlines are not prose, they are shorthand.

So you have to assume there could be missing words, since that’s how headlines
work.

In this case there is a dropped “just”:

Don’t (just) validate: parse!

If the headline was “Parse, never validate” then I would agree with your
reading. But “don’t” is pretty lightweight. I’m having a hard time thinking of
a lighter discouragement word. Given that, I think it’s unfair to assume the
strongest discouragement is implied.

------
matheusmoreira
Language-theoretic security was the first thing that came to mind when I read
the title. Was pleasantly surprised to see it referenced at the end of the
article.

[http://langsec.org](http://langsec.org)

The idea is to formally specify the structure of inputs and reject invalid
data instead of trying to process it.

~~~
danharaj
langsec is directly cited in the article :)

~~~
tome
The citation of langsec is directly mentioned in the comment :)

~~~
danharaj
That's what I get for not reading the whole comment :^)

------
iddan
Dear awesome Haskell writers. You are writing great articles, but I can't
understand the code examples, as Haskell is too far from the programming
languages I know. Can you provide examples in TypeScript / Rust / Swift /
Reason or another C-like language? If not, I'm curious why? Love, Iddan

~~~
lexi-lambda
I have provided a translation of the NonEmpty datatype into Java in this
comment:
[https://news.ycombinator.com/item?id=21478322](https://news.ycombinator.com/item?id=21478322)

However, to answer your “why” question: mostly just because I write Haskell
for a living, and my primary personal motivation for writing this blog post
was to share with my coworkers and to point to during code reviews. Therefore,
it makes the most sense for the examples to be in Haskell! Haskell is also
especially well-suited for communicating this sort of thing _if_ you already
understand the notation, as you can sort of see from the Java translation:
it’s significantly more verbose just to get the same idea across. Still, it’d
be great if someone were to provide a “translation” of the same ideas to other
languages, no doubt!

------
QuinnWilton
I really enjoyed this article!

I think that leveraging the type system to enforce invariants about your data
model is one of the most effective things you can do as an engineer.

I gave a talk on this topic at my workplace, using Elixir and Haskell as
examples. It's a little haphazard, since it wasn't written for public
consumption, but someone might find it interesting:
[https://github.com/QuinnWilton/programming-with-types-
talk](https://github.com/QuinnWilton/programming-with-types-talk)

~~~
verttii
Really good stuff, is that talk uploaded anywhere?

~~~
QuinnWilton
It's not, unfortunately - I just gave it over lunch to my team about a year ago.
Thanks though!

------
z3t4
Not only was this the first article about Haskell that I could actually
understand, but it was also the first article where type annotations make
sense - where they actually help you think about edge cases, instead of just
annoying you.

In my personal experience, type annotations (not to be confused with type
systems) make the code harder to read, for very little benefit. I would like
a type checker where, instead of you having to babysit the type checker, the
type checker babysits you - instead of the type checker crying over trivial
issues and needing your comfort. The type checker should comfort _me_ when
I'm sad.

While manually writing human-friendly error messages helps, you still have to
discover edge cases yourself. It would be nice to have a tool, not necessarily
baked into the type system, that automatically detects edge cases - for
example, detecting when a variable has the possibility of becoming
undefined/null. Maybe achieved via fuzzing.

A colleague once asked me "How do I get rid of the damn nulls?"

Nulls are basically a lazy error/exception. Error handling is so laborious we
often "forget" to do it. And this is very confusing for beginners. There are
basically two camps: one that thinks errors are exceptions, and one that
thinks errors should be first-class citizens. But there are different domains,
of course. I don't want my CPU to maybe return an error message when I ask it
to multiply two numbers; I want that to always work. I don't care if the
result gets shifted (that would be a job for the type checker to detect, or
better, automatically use mult64 instead of mult32), because handling errors
at that level of detail would be too annoying and bad for performance.
Writing network services, however, it is crucial that I get notified if a
request fails.

~~~
tejon
For what it's worth: it's _very_ rare for a type annotation to be required in
Haskell. It's just considered best practice (supported by compiler warning
options) to have them on all top-level declarations, for two reasons:

- it's a good extra layer of _human-readable_ documentation about basic
behavior and requirements (as in this article); and

- it lets the compiler give better error messages.

The compiler is _always_ going to do its own inference, whether you give it a
signature or not. If it infers a valid type _which isn't the one you thought
you wrote_, and you didn't specify what the type _should_ be, that
function/statement will compile -- you won't get an error until later, when
you attempt to consume it and the types don't match. This can be harder to
trace, especially if there's a bunch of polymorphic code in the middle where
any type will pass. Supplying a manual annotation lets the compiler
immediately say "hey, these don't match."

------
hcarvalhoalves
I would like to see an example of something less trivial than encoding `not
empty` in the type. At some point the difference between what the author calls
"parsing" and "validating" gets blurry, or the type system can't encode it.

~~~
danharaj
You can always use abstract data types to indicate in the type system that you
know something that can't directly be expressed. The point is not that
"parsing" and "validating" are distinct concepts, but that they're two
different points of view on the same problem.

------
beders
It's pretty cute. Still no static types for "Strings that end with a dot" or
"Strings that match [a-z]+" or "Strings that are proper nouns". ;)

I'm not sure I understand the benefits of this compared to a traditional
parser generator.

"Use a data structure that makes illegal states unrepresentable." is not
effort spent well. (And given Gödel's incompleteness theorems, unachievable in
the _general_ case.)

A proper parser generator using a grammar will give you the structural
guarantees you need, a lexer will reject invalid terms. And a transformation
step gives you the target structure. Eventually you will have to validate
though at runtime.

~~~
lexi-lambda
> Still no static types for "Strings that end with a dot" or "Strings that
> match [a-z]+"

Sure there are. :) Technically speaking, anyway. Here’s a type for “strings
that end with a dot”:

    
    
      -- | A string that ends in the character '.',
      -- represented as a string without its trailing dot.
      newtype StringEndingInDot = StringEndingInDot String
    
      fromString :: String -> Maybe StringEndingInDot
      fromString str = case reverse str of
        '.':cs -> Just $ StringEndingInDot (reverse cs)
        _      -> Nothing
    
      toString :: StringEndingInDot -> String
      toString (StringEndingInDot str) = str ++ "."
    

And here’s a type for “strings that match [a-z]+”:

    
    
      data Letter = A | B | C | D | ... | X | Y | Z
      type StringAZ = NonEmpty Letter
    

Now, admittedly, I would never use either of these types in real code, which
is probably your actual point. :) But that’s a situation where there is a
pragmatic “escape hatch” of sorts, since you can create an abstract datatype
to represent these sorts of things in the type system without having to
genuinely _prove_ them:

    
    
      module StringEndingInDot(StringEndingInDot, fromString, toString) where
        newtype StringEndingInDot = StringEndingInDot { toString :: String }
    
        fromString :: String -> Maybe StringEndingInDot
        fromString str = case reverse str of
          '.':_ -> Just $ StringEndingInDot str
          _     -> Nothing
    

You may rightly complain that’s a lot of boilerplate just to represent this
one property, and it doesn’t compose. That’s where the “Ghosts of Departed
Proofs” paper ([https://kataskeue.com/gdp.pdf](https://kataskeue.com/gdp.pdf))
cited in the conclusion can come in. It provides a technique like the above
that’s a little more advanced, but brings back composition.

~~~
colllectorof
_> Technically speaking, anyway. Here’s a type for “strings that end with a
dot”:_

These are just two functions that wrap and unwrap a string. Your "type"
doesn't generate a compile-time error when constructed with the wrong data.
This doesn't even handle the simplest use case:

    
    
       putStrLn (toString (fromString "something."))

~~~
erdeszt
That code doesn't compile because fromString returns Maybe StringEndingInDot
and toString takes StringEndingInDot. So it does protect you from misuse.

~~~
colllectorof
A properly enforced static type would not need to return Maybe, because it
would be impossible to set it to the wrong value in the first place.

Not to mention that to be truly useful a static type would need some sort of
literal that would emit a compile-time error if the supplied init data was
wrong.

In short, examples above do not demonstrate Haskell's ability to statically
enforce rules like "string needs to end with a dot".

Now, you could make a type that always prepends a dot when asked for a string
representation, which happens to enforce this specific constraint, but you
cannot use a trick like this for most constraints (such as the second example
of "only alpha characters").

~~~
mrgriffin
I think (but cannot guarantee) there's nothing much stopping you writing
template Haskell to construct valid values at compile time if you want. It's
just that most of the time you're (or at least _I_ am) happy to use "unsafe"
functions because the verification is simple enough to do in your head, and
the other 99% of the time I'm creating values it's from run-time data.

------
magical_h4x
Nice article! It got me thinking about an issue I've noticed in dynamically
typed languages (namely JavaScript) where it's very easy to end up doing lots
of (potentially redundant) validation all over the place because it's much
more difficult / unwieldy to pass that information along.

~~~
ricardobeat
In this case not even TypeScript can rescue you. You might naively implement:

    
    
        const head = <A,>(input: A[]) => input[0]
    

and it will happily infer a return type of _A_. Then you'll fall flat on your
face the first time you encounter an empty array:

    
    
        head([]) // -> undefined
    

To make it correct you need to explicitly define a return type of _(input:
A[]): A | undefined_, just as with the _Maybe_. It's obviously impossible for
TS to guarantee that an array will not be empty at runtime, but I wish this
specific case triggered some help from the type checker.

~~~
tylerhou
> It's obviously impossible for TS to guarantee that an array will not be
> empty at runtime, but I wish this specific case triggered some help from the
> type checker.
    
    
      type NonEmptyArray<T> = { head: T, rest: T[] };
    
      function makeNonEmptyArray<T>(array: T[]): NonEmptyArray<T> | null {
        if (array.length == 0) return null;
        return { head: array[0], rest: array.slice(1, array.length) };
      }
    
      function head<T>(array: T[]): T | null;
      function head<T>(array: NonEmptyArray<T>): T;
      function head(array: any): any {
        if (Array.isArray(array)) {
          if (array.length === 0) return null;
          return array[0];
        }
    
        if (typeof array === "object") {
          return array.head;
        }
      }
    

[http://www.typescriptlang.org/play/#code/C4TwDgpgBAcg9gOwKIF...](http://www.typescriptlang.org/play/#code/C4TwDgpgBAcg9gOwKIFsygIICcsEMQA8AKgHxQC8UA3lABYS4AmAXFEQDRRYQDOwrRANoBdKAF8A3ACgpAMwCuCAMbAAloigpcAawjxkaTDnzESAClzGQAkQEpW+1OhDY8hUlAA+UBPIA2ftRSUFCqslAWVgB0fhAIAObAtBSUAAy2XBDA8lgIPv5+0iHc2bnUdAwsUJZugqnCnNx8rDX4UTx+qkoQZgCMnK0gMXGJtBmSUmIyCspqGvRMppFuNsL2bF75AdIzKup5C4xLgw6ITkZuputEO4p785XL+C0IIOu4r0EhYRGubao8P4gJ5vDJUYIhULhEHDBJJFJpDIlHJ5XzbCHFLIo6pWOrCIriGTfaGgSBwcKDBFQABEcAARgArCAqalgjGZUp5QZRQ4EqZiIA)

It's impossible for Haskell to guarantee an array will not be empty at runtime
as well; that's why we can write a new type + a smart constructor to track
runtime properties.

~~~
s_severus
A possibly more elegant TypeScript solution:

    
    
      type NonEmpty<T> = [T, ...T[]];
      
      function head<T>(input: NonEmpty<T>): T {
        return input[0];
      }
    

[https://www.typescriptlang.org/play/?ssl=5&ssc=2&pln=1&pc=1#...](https://www.typescriptlang.org/play/?ssl=5&ssc=2&pln=1&pc=1#code/C4TwDgpgBAcg9gOwKIFsygDwBUB8UC8UA2lgDRQB0VWRAurQNwCwAUKwGYCuCAxsAJaIoACwgBDACbYcACn4IwnYAC5YiVOhDSAlKqxQA3qyhQAThGCdTCKPMXAiABkasAvq1ajJMutuYsAegCoAEFTAHNOFAgEYCg4dihQSCgAcjpU2wBnKAQ4OLEsrP5whDEAIwAbaGA4KDAxUzFo4AhTeMTk6FT4ZDRMbgBrPIB3BBxUilYgkwAFUzhIU1A0x0z+HJQN4oRw2xsutIyocqUzCABHTn5zCX2k8G7ejQGEYbgxiam2Fi8JHwATLQ-
FAZnBBgBCDwsHiILJxBBRcptLKqREoZGmOgEYgARhcv3E-3RmKyfmmwTCkWisQ6DxSqRJbWOG1y+SghWKpQq1SSdQaTRabTphx66n6WiZpi+0NhCHhbL6mhgSJRqmeEowUrwhCI+P8fxkeSVoBVGJRILBkKAA)

------
bretthopper
I just watched a good Elm conference talk about this same concept called
"Making Impossible States Impossible" -
[https://www.youtube.com/watch?v=IcgmSRJHu_8](https://www.youtube.com/watch?v=IcgmSRJHu_8)

------
mehrdadn
Am I right to suspect the only reason "there can’t be any performance
overhead" is that this is being done in a lazy language like Haskell? Meaning
the statement won't hold in >99% of practical cases? Or did I misunderstand
something?

~~~
spuz
No, this doesn't have anything to do with Haskell's lazy evaluation (at least
not in the NonEmpty list example presented). The general idea is that if you
are going to perform validation checks at some point in your dynamic language,
you won't lose any performance performing those checks up-front in your static
language.
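To make that concrete, a small sketch of my own in TypeScript, mirroring the article's NonEmpty example: both functions run exactly one length check; the parser just keeps what it learned in a type instead of discarding it.

```typescript
// Validate-only: the result of the check is a boolean, so the knowledge
// that the list is non-empty is thrown away and must be re-checked later.
function validateNonEmpty(xs: readonly number[]): boolean {
  return xs.length > 0;
}

// Parse: the same single check, but its result is recorded in a type
// that downstream code can rely on without checking again.
type NonEmpty = { readonly first: number; readonly rest: readonly number[] };

function parseNonEmpty(xs: readonly number[]): NonEmpty | null {
  return xs.length > 0 ? { first: xs[0], rest: xs.slice(1) } : null;
}
```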

~~~
mehrdadn
But how can this possibly be true in general? As an example, just imagine I
want to check that a string represents a list of space-delimited integers,
where each one is also less than 10 digits. It's far more trivial to verify
that than to actually parse the integers. And by performing the verification
pass separately before the parsing pass, I can reject invalid inputs early,
leading to much faster rejections than if I parse them as I validate them. The
only way I can see there being zero cost difference is if everything is
implicitly lazy, such that at runtime the verification won't even happen until
the parsing needs to be performed too. Right?

~~~
papln
It's true that in the _failure_ case, you can get faster results by short-
circuiting. If failure cases are a substantial portion of your runtime, then
yes, doing a fast pre-pass can be more efficient.

I'd speculate that in the real world, 99% of cases have time dominated by
success-cases. Exceptions would be things like DOS attacks.

~~~
mehrdadn
No, that's just wrong. It's not just failures. Imagine if I verified they were
all zero; then I wouldn't have to do a full parse. Or if I verify they have
only a few digits, then I would avoid bignum parsing. I think this proves what
I'm saying - this simply isn't true in general.

~~~
garethrowlands
It's not necessarily wrong, though what you say makes sense.

Any code that you write, you can move into the parser module. In your example,
you have a function that checks the strings are all zero, and you wouldn't
call the parser if it returned true. But you can simply declare this code to
be part of the parser and just not call the rest of the parser. The difference
is in the API: in the one case, you return false and in the other you return a
parse error.

Now, you may say that, knowing this information, you're going to parse into
ints instead of BigNums. But the parser can do this too.

You might also say that I happen to know, due to some context, that all the
numbers have 10 digits or fewer. And that therefore I can do better than the
compiler. But if you make the context and the strings the input to the parser,
then you bring it level again.

Or you might say that my application has some context that gives it an edge
but the parser is a general purpose library that does not understand the
context. In that case, your application does indeed have an edge. But that
applies to any general purpose library and is not a point about the merits of
parsing versus validation.

What's maybe more interesting is if you might only need some of the parsed
results, or the parsed results wouldn't fit into memory. But for that you just
need an incremental or streaming parser.

This is actually pretty common in real world compilers. For example,
javascript parsers typically don't lex javascript functions in their initial
pass through the source.

~~~
mehrdadn
> In your example, you have a function that checks the strings are all zero,
> and you wouldn't call the parser if it returned true. But you can simply
> declare this code to be part of the parser and just not call the rest of the
> parser.

Again -- like with the previous argument about "error" vs. "success", this is
a red herring built around the fact that my example was so trivial and easy to
describe this way. What if the logic was supposed to be "if the digits are all
zero then don't store the numbers in the database, otherwise subtract them
from my account balance"? You avoid a database hit if they're all zero, so are
you going to say debiting your account is now "parsing" too?

------
larusso
I understand the concept of wrapping values in specific types, which gives you
certain guarantees at compile time. And I really like this concept and will
play around with it some more, because the empty-something issue is something
I struggle with myself. But what really irks me is the usage of throw in the
exceptional case. My go-to type in these situations would be Either rather
than a throw. But this would create nearly the same issues at the call site as
a Maybe would. Now one could argue that this is an exceptional case the user
or program can't possibly handle. So how do you handle this then? My main
usage is in Rust, and there the panic! macro seems to be used as often as
unsafe raw pointers :)

------
giomasce
Using this programming style requires a rather powerful type system, if you
want to go past the simple examples the author shows and encode more complex
validity conditions. I am still learning these things, but AFAIU the extreme
edition of all this is the Calculus of Constructions used in Coq and Lean,
where you can encode theorems and algorithm specifications as very complicated
types; proving a theorem corresponds to exhibiting an element of the theorem
itself (seen as a type), and if you find an element of a type encoding a
certain specification, Coq can extract for you a program that implements that
specification.

This is what I understood, at least. Things are quite complicated!
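As a tiny illustration of the propositions-as-types idea described above (a Lean 4 sketch, not from the comment): proving a theorem literally means constructing a term of the corresponding type.

```lean
-- Propositions are types; a proof is a program (a term) of that type.
-- Proving "p ∧ q → q ∧ p" means writing a function of that type:
example (p q : Prop) : p ∧ q → q ∧ p :=
  fun ⟨hp, hq⟩ => ⟨hq, hp⟩
```

The function destructures a proof of `p ∧ q` into its two halves and reassembles them in the other order; the type checker accepting this term *is* the proof.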

~~~
stoops
A pragmatic interpretation of this is to:

* subclass or compose String whenever possible. (Id, Name, Address, AddressLine1)

* subclass or compose Number/Matrix types whenever possible (e.g. Users, Packages, Widgets)

* Use immutable collections (e.g. Google Guava library in Java)

I have built very powerful software with small teams using these principles.

At scale, your day to day programming becomes easy as the compiler is doing
all the error checking.

It is also very slow to program this way. You write A LOT of boilerplate, and
there are only so many code-generation libraries you can throw at it before it
becomes a pain to set up your IDE.

But it is worthwhile for complex applications. We did not have bugs.
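In a nominally typed language like Java you would subclass String directly, as the comment suggests. A TypeScript sketch of the same principle (TS types are structural, so the usual workaround is a "brand"; all names here are made up):

```typescript
// Plain aliases like `type Name = string` are interchangeable in TS,
// so a phantom "brand" field is used to make them nominal-ish.
type Name = string & { readonly __brand: "Name" };
type Address = string & { readonly __brand: "Address" };

// Hypothetical smart constructors: the only sanctioned way to
// obtain the branded types.
const mkName = (s: string): Name => s as Name;
const mkAddress = (s: string): Address => s as Address;

function label(name: Name, address: Address): string {
  return `${name} @ ${address}`;
}

const n = mkName("Ada");
const a = mkAddress("12 Main St");
label(n, a);    // fine
// label(a, n); // compile error: Address is not assignable to Name
```

This is the boilerplate cost the comment mentions: one brand and one constructor per domain concept, in exchange for the compiler catching swapped arguments.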

------
jillesvangurp
Kotlin's smart cast is nice in this context.

One example is with nullable types: Whatever and Whatever? are two separate
types.

    
    
      val foo: Whatever? = ... // might be null
      foo.someMethod()         // compilation error because foo is not of type Whatever
      if (foo != null) {
          foo.someMethod()     // works because foo was smart cast to Whatever after the null check
      }
    

In the same way doing a switch on a type smart casts the variable to that
type. Smart casting of nullable types also gets rid of a lot of clumsy things
like Optional, Maybe, etc. you'd need in other languages where you'd have to
call a method to extract the value, assign it to a new variable, etc.
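TypeScript's control-flow narrowing behaves much the same way; a sketch (TS rather than Kotlin, with a made-up interface):

```typescript
// `Whatever | null` and `Whatever` are distinct types, and a null
// check narrows one to the other, much like Kotlin's smart cast.
interface Whatever {
  someMethod(): string;
}

function use(foo: Whatever | null): string {
  // foo.someMethod(); // compile error: foo is possibly null
  if (foo !== null) {
    return foo.someMethod(); // narrowed to Whatever inside this branch
  }
  return "was null";
}
```

As with Kotlin, no wrapper type or extraction method is needed: the check itself refines the type for the rest of the branch.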

------
z3t4
What is the equivalent in TypeScript?

    
    
      const head = (arr: Array<number>): number => arr[0];
    

This happily returns undefined if you pass an empty array (even with strict
null checks).
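One way to approximate the article's trick in TypeScript (a sketch, since the article's version is Haskell; the type and function names are made up) is a non-empty type produced by a parse step at the boundary:

```typescript
// A non-empty list: the first element is stored separately, so
// `head` cannot fail and needs no `undefined` in its type.
type NonEmpty<T> = { head: T; tail: T[] };

// The "parse" step: runs once, at the boundary.
function parseNonEmpty<T>(arr: T[]): NonEmpty<T> | null {
  return arr.length > 0 ? { head: arr[0], tail: arr.slice(1) } : null;
}

// The "use" step: total, no undefined possible.
const head = <T>(ne: NonEmpty<T>): T => ne.head;

const parsed = parseNonEmpty([1, 2, 3]);
if (parsed !== null) {
  head(parsed); // the empty case was already ruled out at parse time
}
```

The caveat from lexi-lambda's reply below still applies: TypeScript will not stop you from casting your way around this, so it is best effort rather than a proof.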

~~~
lexi-lambda
TypeScript is sadly very unsound by design, so doing this kind of thing in
TypeScript is always going to be more “best effort” and less “exhaustive
proof.” That’s not necessarily wrong or bad _per se_, as after all, Haskell
has escape hatches, too (and so do dependently-typed languages, for that
matter). However, there are philosophical differences between the way the
languages’ type systems are designed.

When I’ve talked to people who develop and use TypeScript, the general
impression I’ve gotten from them is that TS is actually as much about tooling
as it is about correctness. TS enables things like autocomplete, jump to
definition, and quick access to documentation, and it does that by leveraging
the typing information to “see through” the dynamic dispatch that makes that
sort of thing infeasible in dynamically-typed JavaScript. The TS type system
does provide some correctness benefits, don’t get me wrong, but where ease of
use and correctness are in conflict, usually ease of use is preferred.

This is true even with all the “strictness” options enabled, like strict null
checking. For some examples of TypeScript unsoundness, see [1] and [2]. Flow
is actually generally a lot better than TS in this regard (though it does
still have some major soundness holes), but it looks like TS has basically
“won,” for better or for worse. But again: TS won because its tooling was
better, not because its ability to actually typecheck JS programs was better,
which I think reinforces what I said in the previous paragraph on what TS is
really about.

[1]
[https://twitter.com/lexi_lambda/status/1068704405124452352](https://twitter.com/lexi_lambda/status/1068704405124452352)

[2]
[https://twitter.com/lexi_lambda/status/1068705142109810688](https://twitter.com/lexi_lambda/status/1068705142109810688)
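One well-known instance of the kind of unsoundness described above (a sketch; the linked tweets show others) is that TypeScript treats mutable arrays covariantly:

```typescript
// Arrays are treated covariantly even though they are mutable,
// so this compiles with all strictness flags enabled.
interface Animal { name: string }
interface Dog extends Animal { bark(): string }

const dogs: Dog[] = [{ name: "Rex", bark: () => "woof" }];
const animals: Animal[] = dogs; // allowed: Dog[] is assignable to Animal[]
animals.push({ name: "Felix" }); // a plain Animal sneaks into `dogs`

// The type system believes dogs[1] is a Dog, but bark is missing here.
const maybeBark = (dogs[1] as Partial<Dog>).bark;
```

Flow rejects this particular program by making array element types invariant, which is part of what the comment means by Flow being sounder in this regard.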

------
anasbarg
When we started writing the code for Heavenly-x
([https://heavenlyx.com](https://heavenlyx.com)), the first thing we needed to
write before anything else was the definition of a data structure that
represents the requirements as closely as possible (a concrete syntax tree).

We’re building a tool with a custom DSL for building CRUD GraphQL APIs, with
auth and data validation and transformation. So our architecture consists of
three phases: Parser, Setup, and Runtime.

There’s no way the parser would succeed if the input is not valid. We’re
writing our software in Scala and we are using parboiled2 for parsing the DSL
into a concrete syntax tree, so if it succeeds then it’s valid and if it
fails, it fails early and we don’t have to worry about validation in later
phases. We wrote some custom validation logic that traverses the concrete
syntax tree after the parser to validate for some requirements that we
couldn’t encode in the concrete syntax tree, but it’s really a small portion
of our codebase and it’s easy to maintain.

At the Setup and the Runtime phase we assume that the concrete syntax tree is
valid.

At the Runtime phase we have a running GraphQL server so we have a parsing
phase too but it’s handled by Sangria so we don’t have to worry about it.

We are also building a UI for those who don’t like using our language. It’s a
React app where the state data structure looks exactly like our concrete
syntax tree.

We’re launching soon. You can subscribe for early access here:
[https://heavenlyx.com/get-started](https://heavenlyx.com/get-started)

------
jdonaldson
Haxe has a nice mechanism to handle this called "abstract types":
[https://haxe.org/manual/types-abstract.html](https://haxe.org/manual/types-abstract.html)

The critical validation step happens when the abstract type is created, not
when it is used or passed to other functions...similar to the example in TFA.

The added benefit is that you get a type that behaves pretty closely to a
class, but adds no typing or runtime memory overhead. E.g. here's an example
for an "unsafe" string :
[https://gist.github.com/jdonaldson/9a09287e540e7392857f](https://gist.github.com/jdonaldson/9a09287e540e7392857f)

Another benefit is that abstract types can define relationships between
multiple types in this way, making it possible to perform validation for
functions that need to handle "Either<T>"-style types.

------
jes5199
Woah cool. Back when I was using Haskell (uh, ten years ago, now), people kept
telling me that they were _pretty sure_ there was a flag that detected
incomplete pattern matches at compile time, but no one could ever tell me what
it was: -Wincomplete-patterns. Now I know. Thanks!
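TypeScript has a rough analogue of -Wincomplete-patterns, shown here as a sketch (it uses the conventional `never` trick rather than a compiler flag; the Shape type is made up):

```typescript
// Exhaustiveness checking via `never`: if a variant is added to Shape
// and the switch is not updated, the `never` assignment stops compiling.
type Shape =
  | { kind: "circle"; radius: number }
  | { kind: "square"; side: number };

function area(shape: Shape): number {
  switch (shape.kind) {
    case "circle":
      return Math.PI * shape.radius ** 2;
    case "square":
      return shape.side ** 2;
    default: {
      const unreachable: never = shape; // compile error on a missed case
      throw new Error(`unhandled: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

GHC's flag does this for every pattern match automatically; here each switch has to opt in with the `default` arm.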

~~~
papln
[https://downloads.haskell.org/~ghc/latest/docs/html/users_gu...](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/flags.html)

------
cryptica
It doesn't make sense to use JSON as the interchange format for a statically
typed language. There is an impedance mismatch between the two. You are forced
to infer types which are not actually there.

The reason why JSON is so popular is the same reason why dynamic typed
languages became popular. The interfaces between components are simple and
human-readable. A library written in a dynamically typed language can be
effectively integrated into any system without the possibility of type
mismatches.

If you have an API written with Protocol Buffers, every system which interacts
with yours needs to agree with your type system; this creates tighter
coupling.

~~~
whateveracct
Doesn't this hold true for all serialization ever? Replace JSON with "bytes"
(aka any serialized data) and it still holds:

It doesn't make sense to use bytes as the interchange format for a statically
typed language. There is an impedance mismatch between the two. You are forced
to infer types which are not actually there.
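The standard answer in a typed setting is to parse the untyped payload once, at the boundary; a TypeScript sketch (hypothetical User shape, hand-rolled rather than using a schema library):

```typescript
// Parse, don't validate, at the serialization boundary: untyped JSON
// becomes a typed value exactly once, or is rejected exactly once.
interface User {
  name: string;
  age: number;
}

function parseUser(json: string): User | null {
  let raw: unknown;
  try {
    raw = JSON.parse(json);
  } catch {
    return null; // not even JSON
  }
  if (
    typeof raw === "object" && raw !== null &&
    typeof (raw as Record<string, unknown>).name === "string" &&
    typeof (raw as Record<string, unknown>).age === "number"
  ) {
    const r = raw as { name: string; age: number };
    return { name: r.name, age: r.age };
  }
  return null; // JSON, but not the shape we expected
}

parseUser('{"name":"Ada","age":36}'); // a typed User from here on
parseUser('{"name":"Ada"}');          // null: rejected at the boundary
```

The impedance mismatch doesn't go away, but it is confined to this one function instead of leaking into every consumer of the data.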

------
phreack
I love this kind of work, where the compiler prevents you from creating
business logic bugs. If you're into this and Kotlin, take a look at the
contracts experiment:
[https://blog.kotlin-academy.com/understanding-kotlin-contracts-f255ded41ef2](https://blog.kotlin-academy.com/understanding-kotlin-contracts-f255ded41ef2)

------
privethedge
> head :: [a] -> a

> This function returns the first element from a list. Is it possible to
> implement?

But the type itself doesn't say that it must return the first element.

> If you see why parseNonEmpty is preferable, you understand what I mean by
> the mantra “parse, don’t validate.”

Okay, what about more complex cases? E.g. how do you describe type "such value
doesn't exist in the db table yet"?

~~~
lexi-lambda
> But the type itself doesn't say that it must return the first element.

Sure, I didn’t say it does. That type isn’t a full specification of head. This
blog post isn’t about proving literally everything in your program correct,
which is impractical anyway because if you did that it would be as easy to
prove the wrong thing as to write a bug in the first place. Some functional
tests are still needed.

> Okay, what about more complex cases? E.g. how do you describe type "such
> value doesn't exist in the db table yet"?

I’ve addressed that kind of question more generally in another comment[1], but
for that specific case, I think it probably isn’t worth proving, because a
database lives outside of your application, and it’s the database’s
responsibility to manage that kind of data integrity, not your application
logic. That’s one of the kinds of “execution-phase failure modes” that I
describe in this paragraph of the blog post:

> Parsing avoids this problem by stratifying the program into two
> phases—parsing and execution—where failure due to invalid input can only
> happen in the first phase. The set of remaining failure modes during
> execution is minimal by comparison, and they can be handled with the tender
> care they require.

[1]:
[https://news.ycombinator.com/item?id=21478427](https://news.ycombinator.com/item?id=21478427)

~~~
privethedge
> Sure, I didn’t say it does.

Sorry, I implied it from your "foo" example.

> two phases

So maybe we need to change the mantra to “parse, and validate as little as
possible”?

------
bambax
I have done, and still do, a lot of "shotgun validation" ;-(

But after having read the article, it's still not clear how to log illegal
values or entities if they can't even be represented in the data model. How do
you talk about things you can't name?

~~~
matt_kantor
You could return an error type wrapping the original value (and any other
useful context) using Either instead of Maybe.
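Concretely (a TypeScript sketch, since the suggestion itself is language-agnostic; names are made up): the error side of the Either carries the rejected raw input, so there is something to log even though it never enters the validated data model.

```typescript
// The failure branch keeps the original value plus context, so rejected
// inputs can still be logged even though they are never represented
// in the validated data model.
type Either<E, A> =
  | { tag: "left"; left: E }
  | { tag: "right"; right: A };

interface ParseError {
  input: string; // the raw value we refused to admit
  reason: string;
}

function parseAge(input: string): Either<ParseError, number> {
  const n = Number(input);
  return Number.isInteger(n) && n >= 0
    ? { tag: "right", right: n }
    : { tag: "left", left: { input, reason: "not a non-negative integer" } };
}

const r = parseAge("-3");
if (r.tag === "left") {
  console.log(`rejected ${JSON.stringify(r.left.input)}: ${r.left.reason}`);
}
```

The illegal value lives only in the error type, which is precisely a name for "things that failed to parse", answering bambax's "how do you talk about things you can't name".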

------
qtplatypus
I agree broadly with this; however, in many cases that I have been dealing
with, I don't think there can be a type system rich enough to encode the
validation laws.

For example how would you encode "This purchase request must be made before
the closing date for the purchase"?

~~~
lkitching
This isn't about encoding the validation rules in the type system, only the
requirement that validation with respect to some properties has been done. So
in your example you would just define a function

    
    
        createPurchaseRequest :: UserRequest -> Purchase -> Maybe PurchaseRequest
    

Now every function that takes a PurchaseRequest can assume it is valid.
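A rough TypeScript analogue of that signature (field names are hypothetical; the point is that the deadline check runs exactly once, inside the parser):

```typescript
// The type does not encode "made before the closing date"; it only
// encodes "this value went through createPurchaseRequest".
interface UserRequest {
  userId: string;
  requestedAt: Date;
}
interface Purchase {
  item: string;
  closingDate: Date;
}

type PurchaseRequest = {
  readonly userId: string;
  readonly item: string;
  readonly __checked: true; // brand: only the function below produces this
};

function createPurchaseRequest(
  req: UserRequest,
  purchase: Purchase
): PurchaseRequest | null {
  if (req.requestedAt > purchase.closingDate) return null; // too late
  return { userId: req.userId, item: purchase.item, __checked: true };
}
```

Downstream code accepts only PurchaseRequest, so the "before the closing date" rule is enforced by construction rather than re-checked everywhere.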

------
noobiemcfoob
I like the concept of encoding properties, like a boolean of non-empty, in an
object.

Beyond that, this article further convinced me type systems are for the
pedantic. A given function signature is impossible? Seems like just another
strength of a dynamic language.

~~~
aidenn0
What does your dynamic language do when you take the first element of an empty
list? There is no obvious "correct" thing to do. Furthermore, whatever you do
return is unlikely to be the same sort of thing that is returned for a non-
empty list.

A dynamic language will have a behavior that corresponds to some sort of type
signature, and it's not possible to write behavior that corresponds with the
type signatures given as examples of "impossible" in the article.

A type signature is merely a statement about behavior, so it's nonsensical to
make a false statement about the behavior, and Haskell catches this.

~~~
millstone
What would be the type signature of, say, Python's `pickle.load()`?

~~~
tome
It would be this

[https://www.stackage.org/haddock/lts-13.21/base-4.12.0.0/Pre...](https://www.stackage.org/haddock/lts-13.21/base-4.12.0.0/Prelude.html#v:read)

~~~
millstone
I think that's different. `read` requires you to know what you're
deserializing up-front, while `pickle` decodes the type dynamically from the
data.

Dynamic languages really can have functions whose behavior cannot be expressed
as some sort of type signature.

~~~
mrgriffin
I'm pretty confident that you could write something that was equivalent to all
the useful `pickle` calls. By that I mean you'll need to know which operations
you'll want to do on your unpickled object:

    
    
      readAny :: forall c r. [TypeWithReadAnd c] -> String -> (forall a. c a => a -> r) -> Maybe r
      readAny types string handle 
    

I think it's fair to say "hey, pickle doesn't require me to list all my types
explicitly", but on the other hand, it's not like pickle can conjure those
types out of thin air--it considers only the types that are defined in your
program.

Here's an example that uses Read as the serialization format and only deals
with Int, Char and String; but hopefully you can imagine that I could replace
the use of read with a per-type function that deserializes from a byte string
or whatever.

[https://repl.it/@mrgriffin/unpickle](https://repl.it/@mrgriffin/unpickle)
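A cruder TypeScript sketch of the same idea (hypothetical names and a made-up "tag:body" wire format): the program supplies the set of decoders it knows, and the payload can only select among them.

```typescript
// Like readAny above: deserialization can only produce types the
// program already registered; it cannot conjure new ones out of thin air.
type Decoder = (payload: string) => unknown;

const registry: Record<string, Decoder> = {
  int: (p) => parseInt(p, 10),
  str: (p) => p,
};

// Payloads look like "tag:body" in this toy format.
function unpickle(data: string): unknown {
  const i = data.indexOf(":");
  if (i < 0) return null; // malformed payload
  const decoder = registry[data.slice(0, i)];
  return decoder ? decoder(data.slice(i + 1)) : null;
}

unpickle("int:42");  // 42
unpickle("str:hi");  // "hi"
unpickle("blob:xx"); // null: no registered decoder for that tag
```

The caveat from the comment carries over exactly: this needs the decoder list up front, whereas pickle discovers the classes from the importing program's environment.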

~~~
aidenn0
IIRC the pickle format can define new classes, but I haven't looked at it in
over a decade so I might be misremembering.

~~~
mrgriffin
You might be right, it's been a long time for me too. But if it _can_ define
new classes, then I'd expect that the code for those classes' methods would
also be in the pickled format, at which point there's no particular reason you
couldn't deserialize it in Haskell too... with some caveats about either
needing to know what types those functions should have, or needing to build a
run-time representation of those types so that they can be checked when
called, or (hopefully!) crashing if you're okay with just unsafeCoercing
things around.

