
Sum Types in Go - luu
http://www.jerf.org/iri/post/2917
======
alangpierce
You can get a stronger sum type using the visitor pattern (which is the
typical way to get sum types in Java). I ported the code in the post to use
that approach, and here's what it looks like:

[http://play.golang.org/p/U_CZG9IMZ6](http://play.golang.org/p/U_CZG9IMZ6)

The main advantage of this approach is that, like with "real" sum types, you
get compile-time checking that you've handled every case. It also seems to
mesh better with Go's interfaces, since there's no need to introduce a dummy
method so the interface can work as a tag.

On the other hand, bypassing Go's type switch construct is a little
disappointing, since the automatic casting in type switches makes the syntax
pretty clean.

It's also interesting to see that having "r" as an assignable return value
comes in handy here (since you can't return from inside a lambda). In Java,
I'd solve the same problem by making Visit usable as an expression (like case
expressions in functional languages) by giving it a parameterized return type,
but the lack of generics makes that annoying in Go.

------
simias
_If you 've never used a language with good support for sum types, they sound
worthless_.

That's definitely my experience, I never used a language featuring sum types
before I started learning rust a few months ago but now I miss them as soon as
I go back to C.

For those of you who are not yet enlightened here's a couple examples: I had
to implement a time counter struct. It was either counting, in which case I
would only store the date corresponding to the time 0 (and I would compute the
current counter value by substracting the current time to this date instead of
having to update the counter every second) or halted in which case I would
just store the last counter value.

So my first try was:

    
    
        struct Counter {
            counting: bool,
            /// When counting is true
            tzero:      u32,
            /// When counting is false
            last_count: u32,
        }
    

Now of course, that's error prone and wasteful. Wasteful because you store two
u32 when only one of them will be used at any given time and error prone
because you have to be careful to always correctly check the value of
_counting_ before using one of the other values. You could save some space by
using a single "tzero_or_last_count" u32 but that makes it even more error
prone.

And then I saw the light:

    
    
        enum Counter {
            Counting(u32),
            Halted(u32),
        }
    

This does exactly what I needed: No wasted space since this only takes the
space for a single u32 and then a little more for the tag, the equivalent of
my _counting_ flag above but handled by the language directly. And it's so
much nicer to handle: the language won't let me access the underlying integers
unless I match the enum which forces me to check. What was an implicit
convention left for the coders to enforce is now checked by the compiler.

An other example I have is the following:

enum CacheState { Valid(Cache), Dirty, }

I can only access the _Cache_ when it's not _Dirty_. In C or C++ you could get
similar results by using encapsulation and have accessor functions that check
the dirty flag but you have to implement and debug all that by yourself. With
sum types it's built directly into the language.

~~~
Cyther606
As another illustrative example, let's say we want to create a library for
working with mathematical expressions such as x^2 + 5*x. It's natural to
define our data types like this:

    
    
        type
          FormulaKind = enum
            fkVar,        ## element is a variable like 'X'
            fkLit,        ## element is a literal like 0.1
            fkAdd,        ## element is an addition operation
            fkMul,        ## element is a multiplication operation
            fkExp         ## element is an exponentiation operation
    
        type
          Formula = ref object
            case kind: FormulaKind
            of fkVar: name: string
            of fkLit: value: float
            of fkAdd, fkMul, fkExp: left, right: Formula
    

Nim's enum [1] is an old-school typesafe enum, as in Ada, without any fields.
To avoid name clashes, it's common to prefix the enum values with a two-letter
abbreviation. The case part in the object declaration introduces a checked
union. So the access of f.name will raise an exception if f.kind != fkVar.

[1] [http://goran.krampe.se/2014/10/20/i-missed-
nim/](http://goran.krampe.se/2014/10/20/i-missed-nim/)

------
Animats
_A sum type is a type in a language that can have multiple different kinds of
values, which themselves may contain values of differing types. To put it in C
terms, it is a "union" in a struct with an element that says which member of
the union is currently the active element._

That's called a discriminated variant type. Pascal had it, as did Ada and the
various Modulas. Rust has something like it, as an extension to enums. In C++
and Go, you can sort of fake it with the subclassing mechanisms.

Go tends to over-use "interface{}" as an "any" type. This adds cost to things
like unpacking SQL rows, which come back as "[]interface{}" and then have to
be converted to language types item by item. Go could use something built-in
mechanism for

    
    
        err := Pack(Titem, interfaces)
    

to pack up an item of type T into an array of interfaces, and

    
    
        err := Unpack(interfaces, Titem)
    

to unpack one. It's easy to compile hard code for that, but inefficient to
dismantle the structure with reflection. It's useful whenever there's
marshaling needed - JSON, SQL, XML, etc.

------
pbnjay
Seems to be trying to fit Haskell into Go, which is probably not what you want
to do. What exactly do "sum types" get you that a more direct implementation
does not? This may be a case where the simplified example isn't describing the
problem very clearly.

Instead of using interfaces for simple "tagging", have them do something
useful to all the types in the system, for example return the value (identity
or the operation):
[http://play.golang.org/p/bDbzb_ItfZ](http://play.golang.org/p/bDbzb_ItfZ)
This gets you a shorter, more idiomatic representation which is easier to
follow and doesn't rely on an eval function.

It also allows you to later add in type-checking:
[http://play.golang.org/p/WkEWF0rg-P](http://play.golang.org/p/WkEWF0rg-P) or
even further down the road, automatic casting of bool to int:
[http://play.golang.org/p/YfCTcoPwuh](http://play.golang.org/p/YfCTcoPwuh)

~~~
jerf
"Instead of using interfaces for simple "tagging", have them does something
useful to all the types in the system,"

I said that already:

"Of course, the tag interface is only used if you have no meaningful interface
to define for your sum type. If you do have a real interface you can define
you don't need a fake is* function, just define the interface."

The point is precisely that this is still useful even if you've got nothing
else in common with the types.

In the spirit of Go simplicity ( _cough_ ), I actually prefer that the central
pump define all the processing in one place when using a message pump, rather
than scatter them around OO-style.

------
jesstaa
"A possible exception can be made for things like int, but even then I prefer
defining type aliases for things as much as possible, so the type system is as
helpful as possible."

It's important to note that Go doesn't allow you to define type aliases.
Embedding is the closest you get.

~~~
TheDong
What do you mean? Isn't this a type alias:
[http://play.golang.org/p/J2gHol1Egd](http://play.golang.org/p/J2gHol1Egd)

When I say "type MyInt int" didn't I just type-alias int?

Edit: I'm aware that they're not able to be used interchangeably, but I think
I've heard the term used in this context before. I think I'm actually
incorrect here and have just heard people use an incorrect term and thought
that was an actual go term for some malformed type alias.

~~~
evmar
Grandparent is correct. You didn't make an alias, because it's not
interchangable:

[http://play.golang.org/p/fiz0NnduNF](http://play.golang.org/p/fiz0NnduNF)

(try running it)

------
colin_mccabe
What I usually find is that in the case where I have a Golang function that
needs to return either a foo or a bar, I can just return a tuple of (foo, bar)
and have one of them be nil. Yes, you miss out on the compiler enforcing the
fact that one is nil, but that hasn't been a big problem for me in practice.
The same goes with passing a "foo or bar" to a channel.. I just send a struct
where one field is nil. It works very well in practice.

It would be interesting to see a discussion of why Golang didn't implement sum
types. I suspect that part of it is efficiency... right now, the sizes of Go
structs are known at compile time. With a sum type, you would either have to
give that up, or pad to the size of the largest possible value. This is
similar to unions in C, where the size is dictated by the size of the largest
element.

I think that in the case he gives, of an expression tree, using interfaces
like he did is the right way to go. I would only say, that "eval" should be a
method in the interface, to eliminate the clunky case statement based on the
value type. This would also allow him to get rid of the isExpr method and
write the code in a more idiomatic way, I think...

~~~
alangpierce
Changing eval to an interface method would work, but wouldn't get quite the
same effect.

The way I see it, there's a two-dimensional problem here: one dimension is the
different cases for the type (I, B, Add, Mul, Eq), and the other dimension is
the different operations you want to perform (eval, typecheck, optimize,
etc.). Each operation handles each case, so you have n*m pieces of code, but
you're stuck with a (pretty much) one-dimensional text file, so as a
programmer, you have to choose how to slice it, and I think different problems
make more sense sliced in different ways. Sum types are nice because they let
you put all the cases for an algorithm in the same place, which often makes
the algorithm easier to understand. That said, maybe Go's flexibility in
method ordering (the fact that you can declare method anywhere, not just next
to the struct they act on) makes this a non-issue. You could define your
different case types at the top of the file, then put all of your eval method
together, all your typecheck methods together, etc. It wouldn't be as easy to
work with as pattern matching, but it's certainly better than having the
different pieces scattered everywhere.

Another thing to consider is extensibility, particularly if you're writing a
library or plugin system or something like that. If you use an interface to
declare "an expression is something that knows how to be evaluated,
typechecked, and optimized", then someone using your code as a library could
add another expression case without a lot of trouble, but they would have no
(safe) way to add an operation. With typical sum types, you have the opposite
constraint: "an expression is either I, B, Add, Mul, or Eq", so a library user
can safely add additional operations, but they couldn't add additional
expression cases. I think that making both dimensions extensible is a
fundamentally hard problem: for example, if plugin 1 wanted to add a new case
and plugin 2 wanted to add a new operation, who would be responsible for
handling the new operation in the new case? So as a library author, you have
to choose which dimension is fixed and which dimension is extensible, and sum
types are the way to keep the list of cases fixed.

~~~
colin_mccabe
I apologize if this sounds dismissive, but I think you are still not quite
looking at this from a Golang point of view. If you want to group all the Eval
methods together for multiple types, please do! This ain't Java. You don't
have to use one file per type, unless it makes sense to do that. As you said,
Go gives you flexibility in method ordering.

 _Another thing to consider is extensibility, particularly if you 're writing
a library or plugin system or something like that. If you use an interface to
declare "an expression is something that knows how to be evaluated,
typechecked, and optimized", then someone using your code as a library could
add another expression case without a lot of trouble, but they would have no
(safe) way to add an operation._

So, there's a few ways to do this. One is to have your external module use an
external function with a case switch, like you did. Another way is to create a
new interface for your new operation. Then, your plugin library could create
wrapper types for the basic types, and implement the new interface for them.
Go makes wrapper types pretty easy to create through the anonymous struct
member mechanism.

 _I think that making both dimensions extensible is a fundamentally hard
problem: for example, if plugin 1 wanted to add a new case and plugin 2 wanted
to add a new operation, who would be responsible for handling the new
operation in the new case? So as a library author, you have to choose which
dimension is fixed and which dimension is extensible, and sum types are the
way to keep the list of cases fixed._

Maybe I'm missing something, but this doesn't seem that hard in Go. Plugin 1
can add a new type which implements whatever interfaces it wants to. Plugin 2
can add a new interface which other people can implement, or not, as they
choose. Once you get away from the idea of type hierarchies, this stops being
difficult.

I agree that there are some cool things in SML, OCaml, and other languages
that Go doesn't have (like algebraic pattern matching and currying), but there
are other things it does have which can be very helpful. Right tool for the
job, and all that.

~~~
alangpierce
Thanks for the thoughts. I've worked with Go enough that I know pretty much
all of the language features, but not enough to be comfortable with idiomatic
programming, so it's nice to see a more experienced perspective.

 _Maybe I 'm missing something, but this doesn't seem that hard in Go. Plugin
1 can add a new type which implements whatever interfaces it wants to. Plugin
2 can add a new interface which other people can implement, or not, as they
choose. Once you get away from the idea of type hierarchies, this stops being
difficult._

Yeah, it's not so hard if you allow operations to be optional, but I think
there are some cases where you can't get away with that. As a more concrete
example, let's say we're writing a flexible compiler and we're hoping to
provide two extension points:

* People should be able to add support for additional architectures. They do this by writing a new function (or something conceptually equivalent) that takes a syntax tree and outputs a list of instructions in some particular instruction set.

* People should be able to extend the language by adding new syntactic elements (new cases for a syntax tree node).

If one plugin adds do-while loops and a completely separate plugin adds ARM
support, there won't be any code for compiling do-while loops to ARM, so the
compiler would simply be broken for that case (there's no acceptable "default
behavior"). But I think that's a fundamentally hard problem that can't be
solved with any language feature.

But if we assume that only one plugin is allowed (or that plugins have a
strict ordering), then I think either of the solutions you mentioned (type
switch or wrapper types) sound pretty reasonable, and they're more flexible
than sum types because sum types would force the syntax tree format to always
stay the same. Still, I don't think any of these solutions is perfect. For
example, when using wrappers, the Sum, Mul, and Eq cases would still point at
the non-wrapped types, so I think you'd have to do a bunch of conversions.
With the type-switch (but only for plugins) approach, the built-in x86 code
generator would be implemented in a completely different way from the ARM code
generator plugin, which seems weird to me. Also, neither of your approaches
has a way to enforce at compile time that every case was handled/wrapped.

------
TheDong
One thing he says I think is incorrect.

"interface{} is a code smell unless your function truly takes anything" seems
wrong to me because I frequently find myself using interface anytime my
function can take multiply typed maps or slices (as a substitute for a
generic) and I don't think you can use an expression to indicate all slices of
all types.

In that case, you start with a type assertion using reflection (is a slice of
something) but you can't avoid taking an interface.

It's a minor nit; overall this article is very well written and rings true.

~~~
TheHydroImpulse
While I'm not a Go programmer, the reason it _should_ be a code smell is
because you lose type safety and incur a run-time penalty cost for using
reflections. Perhaps the things people build in Go will make the later point
moot, but the former shouldn't.

~~~
TheDong
Oh, I absolutely agree it's bad to lose type safety, but you have to realize
that go does not give you another option. When you want to do something that
takes a "generic" slice, you don't have generics. If you're using reflection,
you're already so far out in left field in terms of safety that you might as
well just give up on your type safety because you don't have any other safety
either.

I fully agree that this is bad, but sicne Go is not a good language it's what
you do and thus this is not a code smell in Go, but rather an artifact of Go
lacking generics.

------
hc5
> Most Go-ic? Maximally Go-ful? What's Go's equivalent to "Pythonic"?

Gophery

~~~
jvehent
Idiomatic. [http://www.merriam-
webster.com/dictionary/idiomatic](http://www.merriam-
webster.com/dictionary/idiomatic)

------
toolslive
here are some attempts to do this in C++, [http://www.stroustrup.com/OOPSLA-
typeswitch-draft.pdf](http://www.stroustrup.com/OOPSLA-typeswitch-draft.pdf)

seems there are no good solutions.

------
DaNmarner
On a side note, Swift's enum provides complete support for sum type.

~~~
dons
They may not be recursive however

~~~
panic
You can work around that using a "Box" type that wraps a value in a reference:

    
    
        class Box<T> {
            var contents : T
            init(_ initialContents : T) {
                contents = initialContents
            }
        }
    
        enum BinaryTree {
            case Leaf
            case Branch(Box<BinaryTree>, Box<BinaryTree>)
        }

