
Sum Types Are Coming (2015) - signa11
https://chadaustin.me/2015/07/sum-types/
======
Cieplak
Once you get the hang of algebraic data types (ADTs), definitely check out
generalized ADTs (GADTs). They reduce the amount of code required to express a
solution, make the type system do a ton of work for you to prevent bugs from
compiling, and make it easier to embed domain-specific languages in your code.
A lot of the GADT tutorials out there are a little abstract for my taste and I
wish there were more examples with tangible applications, but this one is
decent:

[http://joeyh.name/blog/entry/making_propellor_safer_with_GAD...](http://joeyh.name/blog/entry/making_propellor_safer_with_GADTs_and_type_families/)

~~~
Peaker
Why do they reduce the amount of code?

They add the ability for constructors to carry constraints, which aids
expressivity in some cases.

But the majority of GADT uses purely add safety. You can use an ordinary ADT
without a type variable that is correlated with the data constructor, and
crash at runtime if the constructors don't match.

~~~
tom_mellior
> Why do they reduce the amount of code? [...] crash at runtime if the
> constructors don't match.

You answered yourself: If you cannot have these runtime type mismatches, you
don't need to write the corresponding bookkeeping and checking code.

~~~
Peaker
You don't have to write it - you just have to avoid -Wall :-)

------
gameswithgo
The last couple of years I've been doing a lot of exploring different
languages and language features. Sum types are one of the few features I've
come across that seem to have the potential for large, practical benefit. Not
only do you get the potential to eliminate null from a language with Option or
Maybe types, but many cases where you might use multiple classes and files to
express some hierarchy, can be replaced with just a few lines of code that is
so easy to read you can even show it directly to the non programmers working
on the project, to work through the definition together.

This especially happens nicely when the syntax is built into the language in a
succinct way as in F# / OCaml (possibly others as well!)

~~~
masklinn
It's also a convenient way to implement "value-based" state machines (where
the entire machine is a single type with a value for each state, as opposed to
type-based, where each state is implemented as a different, and possibly
completely incompatible, static type).

~~~
tomjakubowski
+1. They are a huge help for managing things like connection state. You can
create powerful, statically-enforced invariants, like that a lifetime of a
socket is 1:1 with a state of your state machine.
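
A value-based machine of this kind might look like the following TypeScript sketch. The names `ConnState`, `onTimeout` and `socketId` are made up for illustration, and `socketId` merely stands in for a real socket handle; the point is only that the handle exists in exactly one state.

```typescript
// Each state carries only the data that is valid in that state.
type ConnState =
  | { kind: "disconnected" }
  | { kind: "connecting"; attempt: number }
  | { kind: "connected"; socketId: number }; // hypothetical stand-in for a socket

// A transition is a function from one state value to the next.
// Every case returns, so the compiler can check the switch is exhaustive.
function onTimeout(state: ConnState): ConnState {
  switch (state.kind) {
    case "disconnected":
      return { kind: "connecting", attempt: 1 };
    case "connecting":
      return { kind: "connecting", attempt: state.attempt + 1 };
    case "connected":
      // Leaving this state also drops the only field that holds the handle.
      return { kind: "disconnected" };
  }
}
```

Because `socketId` is only a field of the "connected" variant, code holding a "disconnected" value has no way to refer to a socket at all.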

------
bcherny
Note that TypeScript and Flow have implemented sum types since this post was
published.

~~~
catnaroek
No, they haven't. Union types and sum types are semantically very different
from each other.

~~~
brabel
How do they differ?

~~~
tel
The Union of Int and Int is exactly Int; the Sum is not.

The union of two types contains all values that are in either component type.
The sum of them contains all of the values of the first type, each one
associated with a tag like “I came from the left side” plus all of the
elements from the second type with a similar tag.

The Union of Bool and Bool has 2 values, the Sum has 4.

If we define Maybe[A] to be the process of unioning on “null” to the values of
A then Maybe[Maybe[A]] = Maybe[A]. If it’s defined with a Sum then that
equation doesn’t hold.
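
The collapse described above can be seen in TypeScript, which has unions natively and sums only by explicit tagging. This is a rough sketch; `MaybeU`, `MaybeS`, `describeU`, `describeS` and `BoolSum` are illustrative names, not library types.

```typescript
// Union-style "maybe": null is simply added to the values of A.
type MaybeU<A> = A | null;

// Sum-style "maybe": every value carries an explicit tag.
type MaybeS<A> = { tag: "none" } | { tag: "some"; value: A };

// MaybeU<MaybeU<number>> is just number | null, so "no outer value" and
// "an outer value that is itself missing" are the same value.
const outerMissingU: MaybeU<MaybeU<number>> = null;
const innerMissingU: MaybeU<MaybeU<number>> = null;

// With the sum, the two situations remain distinguishable.
const outerMissingS: MaybeS<MaybeS<number>> = { tag: "none" };
const innerMissingS: MaybeS<MaybeS<number>> = { tag: "some", value: { tag: "none" } };

function describeU(m: MaybeU<MaybeU<number>>): string {
  return m === null ? "missing" : `value ${m}`;
}

function describeS(m: MaybeS<MaybeS<number>>): string {
  if (m.tag === "none") return "outer none";
  if (m.value.tag === "none") return "inner none";
  return `value ${m.value.value}`;
}

// Bool + Bool as a sum has four values; Bool | Bool as a union has two.
type BoolSum = { tag: "left"; value: boolean } | { tag: "right"; value: boolean };
const boolSumValues: BoolSum[] = [
  { tag: "left", value: false },
  { tag: "left", value: true },
  { tag: "right", value: false },
  { tag: "right", value: true },
];
const boolUnionValues: (boolean | boolean)[] = [false, true];
```

Here `describeU` returns "missing" for both union values, while `describeS` returns "outer none" for one sum value and "inner none" for the other.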

------
nixpulvis
Sum types are one of the best things, as they empower very good `match` style
conditionals.

~~~
sevensor
I can attest to this. I've just done a project in F# -- my first experience
using a statically typed functional language beyond messing around in a REPL
-- and the expressive power of this combination is intoxicating. I may have
gone overboard; I found it hard to write procedural code in F#, so I split up
my procedures into steps, with a type to model each step and a sum type to
model the whole procedure. Coming from C / Python, where I was used to
procedural code mostly working most of the time and then failing in hilarious
ways, this was a revelation. A language where I can make the compiler check my
procedural logic for me! Liberating!

~~~
yetanotheruser
Does F# have a good REPL?

What ML-style language has the closest thing to a Lisp-style REPL?

~~~
tom_mellior
Not saying it's the closest to what you have in mind, but you could look at
the Emacs Tuareg mode for OCaml. Here is a reference card:
[http://www.ocamlpro.com/files/tuareg-mode.pdf](http://www.ocamlpro.com/files/tuareg-mode.pdf)

If you are a SLIME fan, you might notice that some of the key bindings to
interact with the toplevel (the OCaml word for "REPL") are identical in
Tuareg.

That said, although this will sound tongue-in-cheek: Having used Isabelle and
Coq with their interactive editors, I find their approach better than a REPL.
In these systems, you execute code directly from your source file, as opposed
to (conceptually) sending snippets of code to a REPL. There is no concept of
code you have written in the source (or the REPL) which is not synced up with
the state of the REPL (or the source code, respectively). I wish other
languages had modes like this. There seems to be one for Prolog, but I've
never used it and don't know how well it works in practice:
[https://www.metalevel.at/ediprolog/](https://www.metalevel.at/ediprolog/)

------
IshKebab
Hopefully flow typing will be next. It's a really cool idea and I'm surprised
it hasn't caught on more.

[https://en.wikipedia.org/wiki/Flow-sensitive_typing](https://en.wikipedia.org/wiki/Flow-sensitive_typing)

~~~
girvo
One of my favourite features of Flow is the ‘*’ type and having Flow handle
the given type based on control flow via “if” checks. You can achieve it
somewhat in TypeScript too, IIRC. Neither is quite as powerful as I would
like, but then what I would like is basically magic!
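
For readers who have not seen flow-sensitive typing, here is a small TypeScript sketch of narrowing through "if" checks. The function name `describeInput` is invented for the example.

```typescript
// The declared type of `input` never changes, but the type the checker
// assigns to it gets narrower after each test.
function describeInput(input: string | number[] | null): string {
  if (input === null) {
    return "nothing";
  }
  // Here `input` is string | number[].
  if (typeof input === "string") {
    // Here `input` is string, so .length is the string length.
    return `text of length ${input.length}`;
  }
  // Only number[] remains, so array methods are allowed without a cast.
  const total = input.reduce((sum, n) => sum + n, 0);
  return `numbers summing to ${total}`;
}
```

Calling `describeInput("abc")` gives "text of length 3" and `describeInput([1, 2, 3])` gives "numbers summing to 6"; no casts appear anywhere in the body.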

------
reacweb
Are they really coming? As a student, I first encountered them in Ada83
(discriminant records) in 1990, then in Gofer (a variant of Haskell developed
at Oxford) in 1992, then in Caml Light (the ancestor of OCaml) the same year.

~~~
Crespyl
I think maybe "coming to the mainstream" fits better. I too saw it in Ada in
school, and of course Haskell and friends have had them for ages.

There does seem to be a recent trend of popular, highly visible languages
adding the feature, though. I think the biggest thing is really that the
pattern matching syntax that makes them so easy to use is becoming more
common.

You can have Sum types without pattern matching, and vice versa, but it seems
to be much more common now to expect any new language to include pattern
matching.

------
jstimpfle
One thing I really need is getting at "the type field". This is useful
information by itself, and I want it most of the time - but I think in Haskell
it's not possible to get at it without manually defining a "type type" and a
function that computes that type from a given sum-typed data.

I don't think that lambdas/closures have such a problem in most programming
languages. It's still unclear how to handle function pointers for
lambdas/closures, but it should be a lot easier to make a lambda into a real
function than changing a datatype definition when it's already used across the
codebase.

So, I think sum types are really another one of those ideas which may prevent
some bugs sometimes, but in the end are not generally super useful, and are
just a little too clever and end up causing more work than they save. At
least unless there's a well-defined representation from which we can extract
the type information, in which case I'd probably be happy to use them.

~~~
tom_mellior
Could you give an example in a (possibly hypothetical) programming language of
your choice of what you mean by "the type field", and what you would do with
it? Maybe we could point you to something that does what you want.

Though I do wonder if reflection on type information (at runtime?) would be
useful and easy for you if you find basic algebraic data types "a little too
clever" and "causing more work than they save".

~~~
jstimpfle
Regarding the type field, look for "enum EventType type;" in the article. (In
practice I would not use "enum EventType", but simply "int" - but anyway).

What I would do with it? Like the result of (x+y), or any type of data, I can
use it for whatever I need. I might simply want to count the number of
occurrences of each "type". Often I'm simply not interested in the data besides
the "type", and in these cases I find that case-matching is a lot of
boilerplate. And it doesn't reify the "type" to authoritative runtime values.
Which, as you say, might be done with reification support from the language,
but I don't want to bother with that -- and it would not allow me to choose
the possible values of the type field, which might be a requirement if the
options are defined somewhere else.

~~~
tom_mellior
> Regarding the type field, look for "enum EventType type;" in the article.

OK. If I understand correctly, you want to use that field in other contexts
than doing a switch on it, whereas in Haskell the usual way of consuming the
type tag is to do such a switch, as in the article's example:

    
    
        case event of
            ClickEvent x y -> <do click stuff>
            PaintEvent color -> <do paint stuff>
            -- there are no other possible cases, so no default case needed
    

> For example, I could separate a large array of sum-type data into separate
> arrays, and also process them separately.

Sure, but you can do that with a case expression as above. Here is a complete
example:

    
    
        data Event = ClickEvent Int Int | PaintEvent Int
    
        -- separate a list of Events into two lists, the first holding the payloads
        -- of the click events, the other the payloads of the paint events
        separate :: [Event] -> ([(Int, Int)], [Int])
        separate [] = ([], [])
        separate (event : events) =
            case event of
                ClickEvent x y -> ((x, y) : clicks, colors)
                PaintEvent color -> (clicks, color : colors)
            where
                (clicks, colors) = separate events
    
        example = [ClickEvent 1 2, PaintEvent 42, ClickEvent 3 4, PaintEvent 23]
    

Running it:

    
    
        > separate example
        ([(1,2),(3,4)],[42,23])
    

You can now go ahead and process the list [(1,2),(3,4)] separately from
[42,23].

(Depending on the exact specification and performance requirements, the code
might be written a bit differently in the real world.)

EDIT: Ah, you edited your post while I was writing. Oh well. Regarding
reifying constructors as integers or whatever, yes, you might have to write a
two-line function to do that.
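
For comparison, in languages where sum types are encoded as tagged objects, the tag is an ordinary runtime value and needs no extra function. A TypeScript sketch of the counting use case, assuming a hypothetical `AppEvent` type modelled on the Haskell `Event` above (`AppEvent`, `kind` and `countByKind` are illustrative names):

```typescript
// TypeScript analogue of the Event type, with an explicit tag field.
type AppEvent =
  | { kind: "click"; x: number; y: number }
  | { kind: "paint"; color: number };

// The set of possible tags, derived from the type: "click" | "paint".
type AppEventKind = AppEvent["kind"];

// Count occurrences of each tag without looking at any payload.
function countByKind(events: AppEvent[]): Record<AppEventKind, number> {
  const counts: Record<AppEventKind, number> = { click: 0, paint: 0 };
  for (const e of events) {
    counts[e.kind] += 1;
  }
  return counts;
}

const sampleEvents: AppEvent[] = [
  { kind: "click", x: 1, y: 2 },
  { kind: "paint", color: 42 },
  { kind: "click", x: 3, y: 4 },
  { kind: "paint", color: 23 },
];
```

The trade-off is that the tag values here are chosen by the programmer as string literals, which is closer to the C "type field" than to Haskell's opaque constructors.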

~~~
jstimpfle
Yep, sorry for the edit. I did the arrays example but then noticed it was not
really an answer to your question, since it can be done with sum types in
almost the same way.

------
s-kim
Hi everyone, similar to the link, I wrote a tech memo about an algebraic data
type to share what I have learned while studying the data type.

[https://kstreee.github.io/techmemo/algebraic.pdf](https://kstreee.github.io/techmemo/algebraic.pdf)

------
fermigier
For Python (as a library, not a core language construct):

[https://github.com/radix/sumtypes/](https://github.com/radix/sumtypes/)

(Same year - 2015 - as the original post ;)

~~~
fphilipe
If you're looking for a simple sum type without associated value, the standard
library's _enum_ works quite well:
[https://docs.python.org/3/library/enum.html](https://docs.python.org/3/library/enum.html)

------
gumby
Since this article was written, C++17 has added std::variant and things like
std::any. But his complaint about C++ matching being verbose is really a
complaint that C++ typically doesn’t have a first-class type system (unless
RTTI is used, and even then you can’t create new types at runtime). This lack
is either a limitation or a benefit depending on your objectives for the
program you are developing.

There isn’t a One True Approach; I do all my work in Common Lisp or C++ and
although both are multi-paradigm languages they aren’t substitutable for each
other.

------
okket
What is the difference between these 'sum types' and Swift's enum type?

Also I think the name 'sum type' is bad. Nothing gets added or summed up, just
combined and selectively chosen. For this reason I find supercharging 'enum',
like Swift does, a more logical solution.

~~~
dbaupp
Swift enums are sum types.

Various languages use different keywords to access the same concept:

- Rust: 'enum'

- Swift: 'enum'

- Haskell: 'data'

- F#: 'type'

- OCaml: 'type'

> Also I think the name 'sum type' is bad. Nothing gets added or summed up,
> just combined and selectively chosen. For this reason I find supercharging
> 'enum', like Swift does, a more logical solution.

They're the same solution...

In any case, the name sum comes from viewing types "algebraically" as the
article discusses. Doing addition of types with a sum type has a lot of
similar properties to doing addition of numbers, and similarly doing
multiplication of types with a product type (tuple, struct or class, in Swift)
has a lot of similar properties to doing multiplication of numbers.

~~~
gavinlynch
why would you want to view a programming language algebraically?

~~~
yorwba
It lets you reuse your existing knowledge about algebra to transform programs.
For example if you have a data type that has two different cases (= sum) each
of which has a bunch of fields (= factors in a product) some of which are
shared (= common factors in a sum of products) then you can literally factor
them out, just like you can factor A * B * C + A * D * C into A * (B + D) * C.
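
A concrete version of that factoring, sketched in TypeScript with invented field names (`id` plays the role of A, `note` of C, and the two payloads are B and D). The functions `factor` and `expand` are illustrative and show that the two shapes carry the same information.

```typescript
// A * B * C + A * D * C: two cases that both repeat `id` and `note`.
type Unfactored =
  | { kind: "text"; id: number; body: string; note: string }
  | { kind: "count"; id: number; amount: number; note: string };

// A * (B + D) * C: the shared fields are pulled out of the cases.
type Factored = {
  id: number;
  payload: { kind: "text"; body: string } | { kind: "count"; amount: number };
  note: string;
};

function factor(u: Unfactored): Factored {
  switch (u.kind) {
    case "text":
      return { id: u.id, payload: { kind: "text", body: u.body }, note: u.note };
    case "count":
      return { id: u.id, payload: { kind: "count", amount: u.amount }, note: u.note };
  }
}

function expand(f: Factored): Unfactored {
  const p = f.payload;
  switch (p.kind) {
    case "text":
      return { kind: "text", id: f.id, body: p.body, note: f.note };
    case "count":
      return { kind: "count", id: f.id, amount: p.amount, note: f.note };
  }
}
```

After factoring, code that only needs `id` or `note` no longer has to match on the case at all.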

~~~
seanmcdirmid
Conjunction and disjunction are more apt analogies that have the same
properties. There are also no division types, while subtraction is used very
rarely.

~~~
DougBTX
I didn't think to call it that at the time, but I think that I came across a
division type in react-redux the other day:
[https://github.com/DefinitelyTyped/DefinitelyTyped/blob/8f8f...](https://github.com/DefinitelyTyped/DefinitelyTyped/blob/8f8f6c439296ade0ab205751a72dfd626dd73364/types/react-redux/index.d.ts#L50)

Playing with that a little, if adding a property to an interface is a product,
eg:

    
    
        interface Foo {
            foo: string;
            bar: number;
        }
    

where Foo is a product of string and number, then removing a property from an
interface is division:

    
    
        type Diff<T extends string, U extends string> = ({ [P in T]: P } & { [P in U]: never } & { [x: string]: never })[T];
        type Omit<T, K extends keyof T> = Pick<T, Diff<keyof T, K>>;
    
        interface Foo {
            foo: string;
            bar: number;
        }
    
        interface Bar {
            bar: number;
        }
    
        type Out = Omit<Foo, keyof Bar>;
    

the output type Out is equivalent to:

    
    
        interface Out {
            foo: string;
        }
    

or phrasing it another way:

    
    
        Out = Foo / Bar = (string * number) / number = string
    

I can't think what a 1/number type would be used for other than to remove
number from a T*number, in other words I can't think what a rational type
would be used for unless it simplified down to a "normal" type. But I wouldn't
like to bet that there's no other use :-)

~~~
seanmcdirmid
What is the set theory intuition of that? If product is intersection, division
is...?

~~~
yorwba
Product is _not_ intersection. The equivalent to product types in set theory
is the Cartesian product (
[https://en.wikipedia.org/wiki/Cartesian_product](https://en.wikipedia.org/wiki/Cartesian_product)
).

The closest thing to a division would be a quotient set (
[https://en.wikipedia.org/wiki/Quotient_set](https://en.wikipedia.org/wiki/Quotient_set)
), but there you're dividing by an equivalence relation. It is however
possible to define an equivalence that undoes set multiplication: (A × B)/~ ≅
A if (a₁, b₁) ~ (a₂, b₂) holds iff a₁ = a₂, ignoring the other component.

~~~
seanmcdirmid
That isn’t how product (intersection) types work in typescript. If we are
talking about typescript, of course.

~~~
yorwba
Product and intersection are different things in TypeScript, as well. The
product of string and number would be a type like { first: string; second:
number }, which combines two different values into one; whereas the
intersection string & number is the type of all values that are both strings
and numbers.
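
The difference can be checked in a few lines of TypeScript. This is only a sketch; the type and variable names are made up for the example.

```typescript
// Product: a value holds BOTH a string and a number, side by side.
type StringTimesNumber = { first: string; second: number };
const widthPair: StringTimesNumber = { first: "width", second: 3 };

// Intersection of object types: ONE value that fits both descriptions.
type Named = { name: string };
type Sized = { size: number };
const namedBox: Named & Sized = { name: "box", size: 3 };

// Intersection of primitives: no value is both a string and a number,
// so the compiler reduces this type to `never`.
type StringAndNumber = string & number;

// This can be declared, but there is no argument it could be called with.
function impossible(value: StringAndNumber): never {
  return value;
}
```

So `string & number` is empty, whereas the product of `string` and `number` has one inhabitant for every (string, number) pair.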

~~~
seanmcdirmid
Right, but if we are going to call union types sum types, why aren’t
intersection types called product types? Anyway, this is why the whole
sum type thing breaks down and union is more apt, since we can describe A|B
and A&B using the same terminology family.

~~~
yorwba
A | B and A + B are only "the same" (but not identical) if A and B are
disjoint (i.e. there is no value that is in both types). That's why + is also
called _disjoint union_. You can simulate + with | by introducing artificial
tags to make A and B disjoint, but in the end they are different operations.
TypeScript doesn't really have first-class support for sum types because it
needs to remain interoperable with JavaScript, so this simulation is the
closest you are going to get.

~~~
seanmcdirmid
I agree, so it really isn’t an example of the popularity of sum typing.
Typescript does have support for user-supplied tagging, so you can also
approximate it to some extent.

~~~
DougBTX
Yea, support isn't first class. When I was learning to use union types in
TypeScript, initially I wrote custom type guards to distinguish them, using
any property that was unique to a particular constituent type to differentiate
them.

Nowadays I consider that a mistake, and all unions have a discriminator
property, and there is no need for custom type guards (exceptions would be
when writing .d.ts files for JS that uses them untagged, that sort of thing)
since it makes the code much more explicit.
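
The two styles side by side, as a TypeScript sketch with invented shape types (`CircleShape`, `RectShape`, `TaggedShape` and the area functions are illustrative names):

```typescript
// Untagged union: a hand-written type guard keyed on a property that
// happens to be unique to one member.
type CircleShape = { radius: number };
type RectShape = { width: number; height: number };

function isCircle(shape: CircleShape | RectShape): shape is CircleShape {
  return "radius" in shape;
}

function areaUntagged(shape: CircleShape | RectShape): number {
  return isCircle(shape)
    ? Math.PI * shape.radius * shape.radius
    : shape.width * shape.height;
}

// Tagged union: an explicit discriminator property, no custom guard needed.
type TaggedShape =
  | { kind: "circle"; radius: number }
  | { kind: "rect"; width: number; height: number };

function areaTagged(shape: TaggedShape): number {
  switch (shape.kind) {
    case "circle":
      return Math.PI * shape.radius * shape.radius;
    case "rect":
      return shape.width * shape.height;
  }
}
```

The untagged version silently breaks if a later member also gains a `radius` property; the tagged version keeps working because the discriminator is the only thing the switch inspects.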

------
Taniwha
Of course there's nothing new under the sun: this is one of the features of
Algol68 (1968) that Wirth discarded when he made Pascal.

They're not coming; they've been and gone, probably a couple of times.

~~~
tom_mellior
All I know about Algol 68 is what I just gleaned from Wikipedia, but there I
got the impression that what it had was a limited form. Namely, the type
indexed things mentioned in the featured article as present in C++ and D,
where you don't have tags and thus cannot usefully distinguish different
variants with the same underlying type.

~~~
Taniwha
Nope, A68 unions had hidden tags (strictly speaking, the spec didn't say they
had tags, but it was the only obvious implementation). It did have a case
statement to switch on the tags and extract the values, as described above.

------
ManDeJan
I prefer the C++ name, variant. It's a super useful new addition in C++17.

~~~
humanrebar
Also, std::optional, which is analogous to Maybe.

Of course, both have been in boost for some time now. C++ so far has been
doing this at the library level. But it remains to be seen if they will make
their way into actual interfaces, for example in standard algorithms.

------
platz
Sum types are not all that different from subtyping+inheritance; they are
simply the other side of the expression problem.

~~~
yen223
Unlike subtyping, sum types are bounded - you can't add new variants without
modifying the originally-defined type.

That's a huge difference in terms of code architecture - it enables things
like _exhaustive_ pattern matching, and allows the programmer to enforce
stronger invariants in the code.
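
A sketch of what the closed set of variants buys you, using TypeScript's `never` type to emulate exhaustive matching. `CalcToken`, `renderToken` and `assertNever` are illustrative names, and `assertNever` is a common hand-written helper rather than a built-in.

```typescript
type CalcToken =
  | { kind: "num"; value: number }
  | { kind: "plus" }
  | { kind: "minus" };

// Accepts only values of type `never`, i.e. values that cannot exist.
function assertNever(x: never): never {
  throw new Error(`unhandled variant: ${JSON.stringify(x)}`);
}

function renderToken(token: CalcToken): string {
  switch (token.kind) {
    case "num":
      return String(token.value);
    case "plus":
      return "+";
    case "minus":
      return "-";
    default:
      // All variants are handled above, so `token` has type `never` here.
      // If a variant is added to CalcToken without a matching case,
      // this call stops type-checking.
      return assertNever(token);
  }
}
```

With an open class hierarchy, the compiler cannot know that the list of cases is complete, so no equivalent check is available without sealing the hierarchy.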

~~~
rbehrends
Subtyping can be closed, too (see Scala's sealed classes, for example) and
algebraic data types can be open (see OCaml's extensible variants [1] or
polymorphic variants [2]).

This is simply a distinction between closed and open sum types.

[1] [https://caml.inria.fr/pub/docs/manual-ocaml/extn.html#s%3Aex...](https://caml.inria.fr/pub/docs/manual-ocaml/extn.html#s%3Aextensible-variants)

[2] [http://caml.inria.fr/pub/docs/manual-ocaml-400/manual006.htm...](http://caml.inria.fr/pub/docs/manual-ocaml-400/manual006.html#toc36)

------
taneq
Do we really need a new name for this? It's basically "safe C-style unions",
but pretty much everything in C is unsafe, so you might as well just call it
"unions".

~~~
saurik
This is not a new name... in the theory of algebraic data types there is a
model of what it means to add and multiply types, resulting in "sum" and
"product" types. I am not 100% sure but I believe these terms have been used
since the early 70s.

The term "union" as used in C is highly related but not quite the same. If you
add a tag to the union (and so have a "tagged union", often represented in C
as a struct with an enum and a union) then you are equivalent in an
information sense (but the usage is still different, as a C union says
something very specific about how memory is used).

~~~
Taniwha
As mentioned elsewhere, 'union' originally came from Algol68, where
(eventually) they were implemented exactly as described in the article.

------
Fufjeudndh
I sure hope so

~~~
nixpulvis
same <3

