
Types as axioms, or: playing god with static types - pcr910303
https://lexi-lambda.github.io/blog/2020/08/13/types-as-axioms-or-playing-god-with-static-types/
======
hardwaregeek
One corollary of using positive space is that you realize we overuse
booleans a lot. I routinely use a lot of boolean logic in my JavaScript and
TypeScript code: to decide when to display a component, whether a component is
being hovered, when something is odd, etc.

Recently I wrote a simple tree walk interpreter and had to implement boolean
unary negation:

    (UnaryOp::Not, Value::Bool(r)) => Ok(Value::Bool(!r)),

Why is this line important? Well, it's quite possibly the first[^1] time I have
used the boolean negation operator (`!`) in Rust!

Turns out, when you have proper enums, booleans aren't the best tool.
Something like `isLoading` can now be `data LoadingState = Loading | Success`.

Why is that better? Well if you want to add a third or fourth state, it's
trivial.

    data LoadingState = Loading | Success | Failure

You can even give it data to pass along:

    data LoadingState a err = Loading | Success a | Failure err

Turns out you can have a lot more states than two. Oftentimes, in the
equivalent boolean code, you end up with a couple of booleans: isLoading,
isError, etc.

But then you end up with invalid boolean states, like isLoading = true,
isError = true. Huh? By making a multi-state type, you can restrict values to
only the possible states.

[^1]: Meh actually more like 5th but you get my point
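The same idea carries over to TypeScript with a discriminated union. A minimal sketch (the tag names and `describe` helper are made up for illustration):

```typescript
// One tag per state: the contradictory "isLoading && isError"
// combination simply cannot be constructed.
type LoadingState<A, E> =
  | { tag: "loading" }
  | { tag: "success"; data: A }
  | { tag: "failure"; error: E };

function describe(state: LoadingState<string, Error>): string {
  // The compiler checks that every tag is handled.
  switch (state.tag) {
    case "loading":
      return "Loading...";
    case "success":
      return `Got: ${state.data}`;
    case "failure":
      return `Error: ${state.error.message}`;
  }
}
```

Adding a fourth state later (say, `{ tag: "retrying" }`) makes every non-exhaustive `switch` a compile error, which is exactly the extension story the Haskell version gives you.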

~~~
valenterry
I fully agree. Not only is it easier to extend, it also makes the code much
easier to read and prevents mistakes, whereas with booleans you have to rely
on variable names and hope not to pass them around incorrectly.

------
piinbinary
This reminds me a lot of Parse, don't Validate (by the same author,
apparently): [https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-
va...](https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/)

I think there's another reason to make invalid states inexpressible in the
type. When you need to change existing code (perhaps years after it was first
written) in a way that breaks key assumptions, you find that the current types
don't allow expressing that change. This is a good warning sign that other
code will need to be updated to take the new states into account. Without the
type enforcing that invariant, it might have been much harder to tell that
this particular change (out of all the changes you make over the years) is the
one that breaks a key assumption. (A key assumption that might have been
forgotten by now!)

Types become a message to your future self.

------
valenterry
I find it a bit strange - the example of "a list with an even number of
elements" can be modeled in TypeScript as well (simply using a list of tuples
with two elements).

The distinction between the two "views" does not make much sense to me. In
Haskell, writing `data A = X | Y` also "restricts" `A` to X or Y.

However, as much as I like and appreciate Haskell's type system, I find union
types very powerful. All the reasons for not having subtyping in Haskell seem
to boil down to problems with type inference or other "practical" issues that
look fixable by a more advanced compiler or by accepting some inconvenience -
but maybe someone can enlighten me?

Also, being able to do `type X = A | B; type Y = C | D; type Z = X | Y` is
very powerful and helps a lot when using code to describe business
requirements without having to use wrappers and deal with the overhead that
they bring. Even more so when a type system is powerful enough to express a
pattern match on some type `Z` and do `if (NOT A) ... else if (A) ...` while
checking exhaustiveness. I wish more languages would adopt this.
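A rough TypeScript sketch of that composition (the type names `A` through `Z` are placeholders), including narrowing on the negated case:

```typescript
// Hypothetical domain types, each with a discriminant.
type A = { kind: "a" };
type B = { kind: "b" };
type C = { kind: "c" };
type D = { kind: "d" };

type X = A | B;
type Y = C | D;
type Z = X | Y; // flattens to A | B | C | D, no wrapper required

function handle(z: Z): string {
  if (z.kind !== "a") {
    // Here z is narrowed to B | C | D ("NOT A").
    return `not a: ${z.kind}`;
  }
  // Here z is narrowed to A.
  return "a";
}
```

The `if (NOT A)` branch falls out of ordinary control-flow narrowing, and adding a fifth member to `Z` would be flagged anywhere a `switch` over `kind` stopped being exhaustive.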

~~~
chriswarbo
As the author says, this distinction is mostly arbitrary and 'philosophical'.

The (arbitrary, 'philosophical') difference between Haskell's `data A = X | Y`
and TypeScript's `type A = "X" | "Y"` is that the TypeScript values "X" and
"Y" already exist (they're strings). Functions which produce and consume these
values already exist: there are certainly many functions which act on
`string`, and perhaps there are silly examples which produce such strings too,
e.g. a 2D geometry library which uses them to represent axes.

> without having to use wrappers and dealing with the overhead that they bring

I think part of the author's argument is that including such wrappers in our
values can sometimes be useful, to convey more information about the domain.
As an extreme example, we can represent pretty much everything in number
theory using `int`; e.g. the fundamental theorem of arithmetic tells us that
`int` is equivalent to `prime[]` (and of course Euclid proved that `prime` is
equivalent to `int`), but it's probably more useful to use the second
representation, even though it introduces wrappers and overhead.

As a more sensible example, I prefer Haskell's `Maybe` type compared to the
"nullable types" offered by some languages, since the latter loses all
structure: `Just Nothing` represents a different failure case to `Nothing`
(and we can easily collapse them using `join`), whilst `null` doesn't provide
that distinction.
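For illustration, here is one way such a `Maybe` might be sketched in TypeScript (the `tag` and `join` names are my own, not from any particular library), preserving the `Just Nothing` vs `Nothing` distinction that `null` collapses:

```typescript
// A structural Maybe: one extra layer of wrapping keeps the
// "present but empty" and "absent" cases distinguishable.
type Maybe<A> = { tag: "just"; value: A } | { tag: "nothing" };

const just = <A>(value: A): Maybe<A> => ({ tag: "just", value });
const nothing: Maybe<never> = { tag: "nothing" };

// A map lookup whose stored values may themselves be missing:
// Maybe<Maybe<number>> distinguishes "key absent" (nothing)
// from "key present, value absent" (just(nothing)).
const lookup: Maybe<Maybe<number>> = just(nothing);

// `join` collapses the two layers when we no longer care.
function join<A>(m: Maybe<Maybe<A>>): Maybe<A> {
  return m.tag === "just" ? m.value : nothing;
}
```

With a nullable type, `number | null | null` is just `number | null`, so the two failure cases above would be indistinguishable.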

~~~
valenterry
> I think part of the author's argument is that including such wrappers in our
> values can sometimes be useful, to convey more information about the domain.

And I think that is why the article, while it touches on a very interesting
topic in general, leaves a bad aftertaste for me.

It makes it sound as if one method is better than the other - almost like
"union types take away your power, better do it like this". However, both
approaches have their place.

> As a more sensible example, I prefer Haskell's `Maybe` type compared to the
> "nullable types" offered by some languages, since the latter loses all
> structure

And you just gave a very good example. The two are not the same by any means
and I am happy to be able to work in a language that has a Maybe type in the
standard library. The classic example is having a map where querying by a key
might or might not return a value of type A. Because A can also be null/none,
without a Maybe type, this cannot be properly expressed without losing
information.

However, and I really want to emphasize this: union types _can_ emulate sum
types like Maybe - simply by creating a wrapper and using it, i.e. creating
two unrelated classes/structs Just and None and then defining

    struct Just A = ...
    struct None = ...
    type Maybe A = Just A | None

There is Maybe, emulated by union types.
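In TypeScript, that pseudocode might be made concrete roughly like this (a sketch; `getOrElse` is just an illustrative helper):

```typescript
// Two unrelated classes, unioned after the fact -
// the sum type emerges from the union plus the wrappers.
class Just<A> {
  constructor(readonly value: A) {}
}
class None {
  readonly isNone = true;
}
type Maybe<A> = Just<A> | None;

function getOrElse<A>(m: Maybe<A>, fallback: A): A {
  // instanceof narrows the union to the matching class.
  return m instanceof Just ? m.value : fallback;
}
```

Note that `Just` and `None` know nothing about each other or about `Maybe`; the union alias is what ties them together, which is precisely the emulation being described.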

The other way around is not possible, though. Using Haskell's capabilities, it
is not possible to emulate TypeScript's (or other languages') union types,
which I find very sad. They are a useful thing to have. The only thing that
comes close is Liquid Haskell with some fancy type-level programming
machinery to emulate union types.

------
gweinberg
This was interesting, but it seems to me that if you really want your type
system to let you do everything you might reasonably want to do, while also
catching any errors you could reasonably expect it to catch, you not only need
a concept of subtype - you also need to be able to tell your compiler that a
function that can handle multiple types returns different types based on the
type passed in.

For example, let's say I have a square root function that will happily handle
real or even complex input, and I want to pass its output to another function
that only accepts real input. Maybe I'm asking too much from life, but it
seems to me that I need to tell the compiler that if I pass in a nonnegative
real input, I get a nonnegative real output. AND I need to be able to assure
my compiler that the number I'm passing in is a nonnegative real.
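TypeScript's function overloads get partway there: the declared return type can depend on the argument's type, though the "nonnegative" refinement is beyond what its type system can express. A hedged sketch (the `Complex` type and the principal-root formula are my own, not from any library):

```typescript
type Complex = { re: number; im: number };

// Overloads: real in, real out; complex in, complex out.
function sqrt(x: number): number;
function sqrt(x: Complex): Complex;
function sqrt(x: number | Complex): number | Complex {
  if (typeof x === "number") {
    if (x < 0) throw new RangeError("negative real input; pass a Complex");
    return Math.sqrt(x);
  }
  // Principal square root of a complex number a + bi.
  const r = Math.hypot(x.re, x.im);
  const re = Math.sqrt((r + x.re) / 2);
  const im = Math.sign(x.im || 1) * Math.sqrt((r - x.re) / 2);
  return { re, im };
}

const real: number = sqrt(4);                   // statically typed as number
const cplx: Complex = sqrt({ re: -4, im: 0 });  // statically typed as Complex
```

The compiler now knows `sqrt(4)` is real, but "nonnegative real" stays a runtime check; that last step is what dependent or refinement types (mentioned in the reply below) provide.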

~~~
valenterry
You are right and it seems you have a very good gut-feeling about these
things! :)

What you describe is commonly called a "(value) dependent type system". There
are languages that support this, e.g. Idris which was specifically built for
that: [https://www.idris-lang.org/](https://www.idris-lang.org/)

Here is an example where the printf function is built purely in Idris:
[https://paulosuzart.github.io/blog/2017/06/18/dependent-
type...](https://paulosuzart.github.io/blog/2017/06/18/dependent-types-and-
safer-code/)

The compiler will parse an input string such as "some number %d and a string
%s" at compile time (!) and check whether the arguments subsequently passed to
printf match the shape of the string.

------
duijf
My favorite real world example of how you can use these techniques to reduce
bugs:

    -- Bad.
    data Session = Session
      { authenticated :: Bool
      , challenge :: Maybe Challenge
      , userId :: Maybe UserId
      }

    -- Good.
    data Session
      = Unauthenticated
      | Authenticating Challenge
      | Authenticated UserId

(Credits to my friend Arian for the example [1])

The first type permits a value like `Session { authenticated = True, challenge
= Challenge "<some challenge>", userId = UserId 1 }`. That's a nonsensical
value in terms of the business logic.

Generally, when you have types which permit values like this:

- Someone (you or a coworker) will eventually write some code that constructs
such nonsensical values.

- This means that you need a lot of tests to ensure that all code using this
type works correctly when given such nonsensical values.

By using the second definition of `Session`, you don't have to worry about
this at all. Nonsensical values can never exist, so you will not accidentally
construct them. Therefore you need fewer tests to ensure your code is correct.

---

For some reason, people are really keen to get into code like this when
boolean flags are involved. Consider the following code (this time in Python):

    from dataclasses import dataclass

    @dataclass
    class Options:
        connect_tls: bool
        verify_cert: bool
        some_other_setting: int

It does not make sense to have `Options(connect_tls=False, verify_cert=True,
...)`. There is no certificate to validate when you connect without TLS.

(This generally happens when someone is tasked with implementing the
`verify_cert` option. They see the existing options type, the existing flag
for `connect_tls` and just add a second boolean.)

When I review code like this, I generally advocate for using Enums:

    from dataclasses import dataclass
    from enum import Enum, auto

    class ConnectionOptions(Enum):
        # Could also be: `PLAIN = 'plain'` or `PLAIN = 1`
        PLAIN = auto()
        TLS_UNVERIFIED = auto()
        TLS = auto()

    @dataclass
    class Options:
        connection_options: ConnectionOptions
        some_other_setting: int

It's almost the same safety level as in Haskell (although the Enum has a
default serialization, which you need to think about e.g. when you store it in
a database).

[1]:
[https://twitter.com/ProgrammerDude/status/124908893689234637...](https://twitter.com/ProgrammerDude/status/1249088936892346371)

------
city41
The TypeScript EvenList<T> type at the end does not work as the author is
implying.

[https://www.typescriptlang.org/play?#code/C4TwDgpgBAogbhAdgG...](https://www.typescriptlang.org/play?#code/C4TwDgpgBAogbhAdgGQJYGdgB4AqA+KAXigG0BdKAH1JwBoo7YEUNt8yBuAKC4GMB7RJigAjAFxMkaTFkQBXALYiIAJwLESARloAmTn0HCAhhPhTWsxcrVFS2nbQDMtACz6gA)

I am not a TS type expert by any means, but I don't believe it's possible to
express an even-length list in TS's type system.

But with that said, I enjoyed the article and I agree with the points it's
making.

~~~
lexi-lambda
I (the author) did not imply that the definition of EvenList I gave would
admit [1, 2, 3, 4]—it only admits [1, 2, [3, 4, []]].

Indeed, this is the _whole point_ of the second half of the blog post: you can
enforce lots of invariants by modifying the way you represent your data, at
the cost of having to write some additional code to work with your alternative
representation. Reread the conclusion.
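For readers who want to see the shape being described, here is a TypeScript sketch along these lines (my reconstruction, not necessarily the post's exact definition):

```typescript
// An even-length list as nested pairs: empty, or two elements plus
// another even-length list.
type EvenList<T> = [] | [T, T, EvenList<T>];

const ok: EvenList<number> = [1, 2, [3, 4, []]]; // type-checks
// const bad: EvenList<number> = [1, 2, 3, 4];   // rejected by the compiler

// Recovering a flat array costs a traversal of the alternative
// representation - the trade-off the post's conclusion discusses.
function flatten<T>(xs: EvenList<T>): T[] {
  return xs.length === 0 ? [] : [xs[0], xs[1], ...flatten(xs[2])];
}
```

The invariant is enforced by construction: there is no way to build an odd-length value of this type, but you pay for it by working with the nested form instead of a plain array.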

~~~
city41
Yup, my bad. Thanks for poking me.

------
layer8
What the article describes is a simple consequence of the Curry-Howard
equivalence, that function types correspond to logical implication, and
constructors are just a special case of function types.

~~~
chriswarbo
I wouldn't say it's a "simple consequence" or "just" this-or-that. Yes, Curry-
Howard gives us a logical perspective on our code; but there are still better
and worse ways to represent things from a logical perspective.

For example, we could reframe the author's point from the logic side, and
complain that lots of mathematics suffers from defining things as sets-with-
constraints, when it may be simpler to follow a more algebraic approach.

~~~
layer8
Actually, my aim was to give a broader perspective and draw attention to the
fact that the inference rules as presented in the article are not limited to
constructors, but can apply to any function. Thinking in those terms, you want
to give functions a return type that matches the guarantees you want the
function to give the caller (and argument types that match the guarantees the
caller should give to the function). Furthermore, it means that using types in
terms of a deductive system is not restricted to languages with ML/Haskell-
like data constructors, but is generally useful in any statically-typed
language that supports user-defined types.

