Hacker News new | past | comments | ask | show | jobs | submit login
Sum Types in Go (jerf.org)
62 points by luu on Dec 26, 2014 | hide | past | favorite | 36 comments



You can get a stronger sum type using the visitor pattern (which is the typical way to get sum types in Java). I ported the code in the post to use that approach, and here's what it looks like:

http://play.golang.org/p/U_CZG9IMZ6

The main advantage of this approach is that, like with "real" sum types, you get compile-time checking that you've handled every case. It also seems to mesh better with Go's interfaces, since there's no need to introduce a dummy method so the interface can work as a tag.

On the other hand, bypassing Go's type switch construct is a little disappointing, since the automatic casting in type switches makes the syntax pretty clean.

It's also interesting to see that having "r" as an assignable return value comes in handy here (since you can't return from inside a lambda). In Java, I'd solve the same problem by making Visit usable as an expression (like case expressions in functional languages) by giving it a parameterized return type, but the lack of generics makes that annoying in Go.


If you've never used a language with good support for sum types, they sound worthless.

That's definitely my experience, I never used a language featuring sum types before I started learning rust a few months ago but now I miss them as soon as I go back to C.

For those of you who are not yet enlightened here's a couple examples: I had to implement a time counter struct. It was either counting, in which case I would only store the date corresponding to the time 0 (and I would compute the current counter value by substracting the current time to this date instead of having to update the counter every second) or halted in which case I would just store the last counter value.

So my first try was:

    struct Counter {
        counting: bool,
        /// When counting is true
        tzero:      u32,
        /// When counting is false
        last_count: u32,
    }
Now of course, that's error prone and wasteful. Wasteful because you store two u32 when only one of them will be used at any given time and error prone because you have to be careful to always correctly check the value of counting before using one of the other values. You could save some space by using a single "tzero_or_last_count" u32 but that makes it even more error prone.

And then I saw the light:

    enum Counter {
        Counting(u32),
        Halted(u32),
    }
This does exactly what I needed: No wasted space since this only takes the space for a single u32 and then a little more for the tag, the equivalent of my counting flag above but handled by the language directly. And it's so much nicer to handle: the language won't let me access the underlying integers unless I match the enum which forces me to check. What was an implicit convention left for the coders to enforce is now checked by the compiler.

An other example I have is the following:

enum CacheState { Valid(Cache), Dirty, }

I can only access the Cache when it's not Dirty. In C or C++ you could get similar results by using encapsulation and have accessor functions that check the dirty flag but you have to implement and debug all that by yourself. With sum types it's built directly into the language.


As another illustrative example, let's say we want to create a library for working with mathematical expressions such as x^2 + 5*x. It's natural to define our data types like this:

    type
      FormulaKind = enum
        fkVar,        ## element is a variable like 'X'
        fkLit,        ## element is a literal like 0.1
        fkAdd,        ## element is an addition operation
        fkMul,        ## element is a multiplication operation
        fkExp         ## element is an exponentiation operation

    type
      Formula = ref object
        case kind: FormulaKind
        of fkVar: name: string
        of fkLit: value: float
        of fkAdd, fkMul, fkExp: left, right: Formula
Nim's enum [1] is an old-school typesafe enum, as in Ada, without any fields. To avoid name clashes, it's common to prefix the enum values with a two-letter abbreviation. The case part in the object declaration introduces a checked union. So the access of f.name will raise an exception if f.kind != fkVar.

[1] http://goran.krampe.se/2014/10/20/i-missed-nim/


That `enum Counter` example (and Either in general) is a union type and not a sum type. It's either Counting (with a u32) or Halted (with a u32) but never both at the same time.


A sum type is a type in a language that can have multiple different kinds of values, which themselves may contain values of differing types. To put it in C terms, it is a "union" in a struct with an element that says which member of the union is currently the active element.

That's called a discriminated variant type. Pascal had it, as did Ada and the various Modulas. Rust has something like it, as an extension to enums. In C++ and Go, you can sort of fake it with the subclassing mechanisms.

Go tends to over-use "interface{}" as an "any" type. This adds cost to things like unpacking SQL rows, which come back as "[]interface{}" and then have to be converted to language types item by item. Go could use something built-in mechanism for

    err := Pack(Titem, interfaces)
to pack up an item of type T into an array of interfaces, and

    err := Unpack(interfaces, Titem)
to unpack one. It's easy to compile hard code for that, but inefficient to dismantle the structure with reflection. It's useful whenever there's marshaling needed - JSON, SQL, XML, etc.


Seems to be trying to fit Haskell into Go, which is probably not what you want to do. What exactly do "sum types" get you that a more direct implementation does not? This may be a case where the simplified example isn't describing the problem very clearly.

Instead of using interfaces for simple "tagging", have them do something useful to all the types in the system, for example return the value (identity or the operation): http://play.golang.org/p/bDbzb_ItfZ This gets you a shorter, more idiomatic representation which is easier to follow and doesn't rely on an eval function.

It also allows you to later add in type-checking: http://play.golang.org/p/WkEWF0rg-P or even further down the road, automatic casting of bool to int: http://play.golang.org/p/YfCTcoPwuh


Sum types are much more direct than other variants in a language that effectively implements them.

For example, in Rust, who's `enum`s are effectively sum types, we can take advantage of the type system while using channels that are well-typed.

https://github.com/servo/servo/blob/master/components/msg/co...

    pub enum Msg {
        ExitMsg,
        FailureMsg(Failure),
        InitLoadUrlMsg(Url),
        LoadCompleteMsg,
        FrameRectMsg(PipelineId, SubpageId, Rect<f32>),
        LoadUrlMsg(PipelineId, LoadData),
        ScriptLoadedURLInIFrameMsg(Url, PipelineId, SubpageId, IFrameSandboxState),
        NavigateMsg(NavigationDirection),
        PainterReadyMsg(PipelineId),
        ResizedWindowMsg(WindowSizeData),
        KeyEvent(Key, KeyState, KeyModifiers),
        /// Requests that the constellation inform the compositor of the title of the pipeline
        /// immediately.
        GetPipelineTitleMsg(PipelineId),
        /// Requests that the constellation inform the compositor of the a cursor change.
        SetCursorMsg(Cursor),
    }

Here we can define a sum type called `Msg` that has various variants. These are the only thing a channel constraint to that type can consume. Each destructing or pattern matching has to cover all possible cases, otherwise it's a compiler error.

This makes it effortless to have complex communication where the data is composed of multiple things, multiple types.

[1] Here's another example of a sum type being used as a messaging type in Servo. [2] Shows pattern matching on all variants and variants of variants, which produces well-typed code.

[1]: https://github.com/servo/servo/blob/master/components/net/re...

[2]: https://github.com/servo/servo/blob/master/components/net/re...

I probably use `enum`s pretty frequently, especially when working with channels. I don't use primitive types-based channels too often come to think of it.


"Instead of using interfaces for simple "tagging", have them does something useful to all the types in the system,"

I said that already:

"Of course, the tag interface is only used if you have no meaningful interface to define for your sum type. If you do have a real interface you can define you don't need a fake is* function, just define the interface."

The point is precisely that this is still useful even if you've got nothing else in common with the types.

In the spirit of Go simplicity (cough), I actually prefer that the central pump define all the processing in one place when using a message pump, rather than scatter them around OO-style.


"A possible exception can be made for things like int, but even then I prefer defining type aliases for things as much as possible, so the type system is as helpful as possible."

It's important to note that Go doesn't allow you to define type aliases. Embedding is the closest you get.


What do you mean? Isn't this a type alias: http://play.golang.org/p/J2gHol1Egd

When I say "type MyInt int" didn't I just type-alias int?

Edit: I'm aware that they're not able to be used interchangeably, but I think I've heard the term used in this context before. I think I'm actually incorrect here and have just heard people use an incorrect term and thought that was an actual go term for some malformed type alias.


Grandparent is correct. You didn't make an alias, because it's not interchangable:

http://play.golang.org/p/fiz0NnduNF

(try running it)


When I say "type MyInt int" didn't I just type-alias int?

No, you created a new type.

  cmccabe@keter:~/tmp> cat inta.go
  package main
  import "fmt"
  type myfoo int
  func printint(i int) {
    fmt.Printf("i = %d\n", i)
  }
  func main() {
    var foo myfoo
    printint(foo)
  }
  cmccabe@keter:~/tmp> go build inta.go 
  # command-line-arguments
  ./inta.go:13: cannot use foo (type myfoo) as type int in   argument to printint
You are probably thinking about "typedef" in C++, but go works differently in this case.


I would not consider your example to be one of a "type alias". Alias usually means that the names can be used interchangeably without a difference in behavior (i.e. casting). Go actually has some built-in aliases where this is the case. "byte" is an alias for "uint8". They can be used interchangeably. Here is an example of the distinction: http://play.golang.org/p/ukYCBqFeW_


It's not an alias, it's a new type. eg. http://play.golang.org/p/bXgryxNT0H


Correct. I will fix that when I'm no longer traveling.


What I usually find is that in the case where I have a Golang function that needs to return either a foo or a bar, I can just return a tuple of (foo, bar) and have one of them be nil. Yes, you miss out on the compiler enforcing the fact that one is nil, but that hasn't been a big problem for me in practice. The same goes with passing a "foo or bar" to a channel.. I just send a struct where one field is nil. It works very well in practice.

It would be interesting to see a discussion of why Golang didn't implement sum types. I suspect that part of it is efficiency... right now, the sizes of Go structs are known at compile time. With a sum type, you would either have to give that up, or pad to the size of the largest possible value. This is similar to unions in C, where the size is dictated by the size of the largest element.

I think that in the case he gives, of an expression tree, using interfaces like he did is the right way to go. I would only say, that "eval" should be a method in the interface, to eliminate the clunky case statement based on the value type. This would also allow him to get rid of the isExpr method and write the code in a more idiomatic way, I think...


well it's a hack to make up for the lack of go expressiveness. Period.

Go needs static polymorphism AND generics.it doesnt need to look like Haskell.


Changing eval to an interface method would work, but wouldn't get quite the same effect.

The way I see it, there's a two-dimensional problem here: one dimension is the different cases for the type (I, B, Add, Mul, Eq), and the other dimension is the different operations you want to perform (eval, typecheck, optimize, etc.). Each operation handles each case, so you have n*m pieces of code, but you're stuck with a (pretty much) one-dimensional text file, so as a programmer, you have to choose how to slice it, and I think different problems make more sense sliced in different ways. Sum types are nice because they let you put all the cases for an algorithm in the same place, which often makes the algorithm easier to understand. That said, maybe Go's flexibility in method ordering (the fact that you can declare method anywhere, not just next to the struct they act on) makes this a non-issue. You could define your different case types at the top of the file, then put all of your eval method together, all your typecheck methods together, etc. It wouldn't be as easy to work with as pattern matching, but it's certainly better than having the different pieces scattered everywhere.

Another thing to consider is extensibility, particularly if you're writing a library or plugin system or something like that. If you use an interface to declare "an expression is something that knows how to be evaluated, typechecked, and optimized", then someone using your code as a library could add another expression case without a lot of trouble, but they would have no (safe) way to add an operation. With typical sum types, you have the opposite constraint: "an expression is either I, B, Add, Mul, or Eq", so a library user can safely add additional operations, but they couldn't add additional expression cases. I think that making both dimensions extensible is a fundamentally hard problem: for example, if plugin 1 wanted to add a new case and plugin 2 wanted to add a new operation, who would be responsible for handling the new operation in the new case? So as a library author, you have to choose which dimension is fixed and which dimension is extensible, and sum types are the way to keep the list of cases fixed.


I apologize if this sounds dismissive, but I think you are still not quite looking at this from a Golang point of view. If you want to group all the Eval methods together for multiple types, please do! This ain't Java. You don't have to use one file per type, unless it makes sense to do that. As you said, Go gives you flexibility in method ordering.

Another thing to consider is extensibility, particularly if you're writing a library or plugin system or something like that. If you use an interface to declare "an expression is something that knows how to be evaluated, typechecked, and optimized", then someone using your code as a library could add another expression case without a lot of trouble, but they would have no (safe) way to add an operation.

So, there's a few ways to do this. One is to have your external module use an external function with a case switch, like you did. Another way is to create a new interface for your new operation. Then, your plugin library could create wrapper types for the basic types, and implement the new interface for them. Go makes wrapper types pretty easy to create through the anonymous struct member mechanism.

I think that making both dimensions extensible is a fundamentally hard problem: for example, if plugin 1 wanted to add a new case and plugin 2 wanted to add a new operation, who would be responsible for handling the new operation in the new case? So as a library author, you have to choose which dimension is fixed and which dimension is extensible, and sum types are the way to keep the list of cases fixed.

Maybe I'm missing something, but this doesn't seem that hard in Go. Plugin 1 can add a new type which implements whatever interfaces it wants to. Plugin 2 can add a new interface which other people can implement, or not, as they choose. Once you get away from the idea of type hierarchies, this stops being difficult.

I agree that there are some cool things in SML, OCaml, and other languages that Go doesn't have (like algebraic pattern matching and currying), but there are other things it does have which can be very helpful. Right tool for the job, and all that.


Thanks for the thoughts. I've worked with Go enough that I know pretty much all of the language features, but not enough to be comfortable with idiomatic programming, so it's nice to see a more experienced perspective.

Maybe I'm missing something, but this doesn't seem that hard in Go. Plugin 1 can add a new type which implements whatever interfaces it wants to. Plugin 2 can add a new interface which other people can implement, or not, as they choose. Once you get away from the idea of type hierarchies, this stops being difficult.

Yeah, it's not so hard if you allow operations to be optional, but I think there are some cases where you can't get away with that. As a more concrete example, let's say we're writing a flexible compiler and we're hoping to provide two extension points:

* People should be able to add support for additional architectures. They do this by writing a new function (or something conceptually equivalent) that takes a syntax tree and outputs a list of instructions in some particular instruction set.

* People should be able to extend the language by adding new syntactic elements (new cases for a syntax tree node).

If one plugin adds do-while loops and a completely separate plugin adds ARM support, there won't be any code for compiling do-while loops to ARM, so the compiler would simply be broken for that case (there's no acceptable "default behavior"). But I think that's a fundamentally hard problem that can't be solved with any language feature.

But if we assume that only one plugin is allowed (or that plugins have a strict ordering), then I think either of the solutions you mentioned (type switch or wrapper types) sound pretty reasonable, and they're more flexible than sum types because sum types would force the syntax tree format to always stay the same. Still, I don't think any of these solutions is perfect. For example, when using wrappers, the Sum, Mul, and Eq cases would still point at the non-wrapped types, so I think you'd have to do a bunch of conversions. With the type-switch (but only for plugins) approach, the built-in x86 code generator would be implemented in a completely different way from the ARM code generator plugin, which seems weird to me. Also, neither of your approaches has a way to enforce at compile time that every case was handled/wrapped.



  > It would be interesting to see a discussion of why Golang 
  > didn't implement sum types.
The discussion is alluded to in the FAQ:

http://golang.org/doc/faq#variant_types

"We considered adding variant types to Go, but after discussion decided to leave them out because they overlap in confusing ways with interfaces. What would happen if the elements of a variant type were themselves interfaces? "

I would be interested to read the original discussion, if it was carried out via publicly accessible mailing lists.

  > I suspect that part of it is efficiency... right now, 
  > the sizes of Go structs are known at compile time.
I don't think this would warrant much concern. To use your own example, emulating sum types via a tuple of `(foo, bar)` where both `foo` and `bar` are nil-able implies that they are interfaces (or pointers), which (AFAICT) means that the return type of your function is always going to be the size of two pointers at runtime (16 bytes on a 64-bit platform). If you replaced that with an actual sum type, the runtime representation would consist of a single pointer and a tag, which would be between 9 and 16 bytes (depending on padding and alignment), so even in the worst case it's no more wasteful than the tuple approach. Additionally, using tuples here, every time that you add another potential variant (i.e. `(foo, bar, baz, qux)`) you inflate the size of the return type by another eight bytes, whereas the sum type scales gracefully without getting larger. And if you're worried about a sum type where one of the variants is some huge struct, then you're free to store a pointer to an instance of that struct instead, in which case your runtime size is, again, no worse than 16 bytes (assuming all the other variants aren't huge structs as well), and it's not like this introduces any greater runtime efficiency than the tuple approach given that interfaces already (again, AFAICT) demand a pointer indirection.

(I'm not an expert on Go though, I could be wrong! Corrections welcome!)


Hi kibwen,

I think it's pretty clear that you can minimize the size of either a sum type or a non-sum type by using pointers rather than values, as you pointed out. But, obviously, Go supports returning values, not just pointers. In that case, Go will have to allocate space for the maximum-sized component of the value type, which could be arbitrarily large. This is a non-obvious fact unless you're a language implementor, or just really interested in the implementation. This somewhat violates the concept that the performance costs of code should be clear.

This is less of an issue in languages like Java that do not have value types. When everything is a pointer, all of the elements of the sum type have the same size (or at least, a size less than 8 bytes, in the case of primitives). This may not be the only, or even the best reason for not implementing them, but it's still something to consider.

It's true that sum types can be more space-efficient than using nullable struct members like my example did. But this mainly seems like an issue when you have more than two types in the sum type, which seems to come up very rarely. The difference between returning two 8-byte pointers and returning a type tag and an 8-byte pointer is not a big deal on a 64-bit machine. You are using 2 64-bit registers in any case. Maybe there's cases I haven't thought of, but at first glance, the efficiency argument for sum types seems very weak.

[Variant types are] alluded to in the FAQ:

Thanks! I didn't see that before. Interesting.


One thing he says I think is incorrect.

"interface{} is a code smell unless your function truly takes anything" seems wrong to me because I frequently find myself using interface anytime my function can take multiply typed maps or slices (as a substitute for a generic) and I don't think you can use an expression to indicate all slices of all types.

In that case, you start with a type assertion using reflection (is a slice of something) but you can't avoid taking an interface.

It's a minor nit; overall this article is very well written and rings true.


While I'm not a Go programmer, the reason it should be a code smell is because you lose type safety and incur a run-time penalty cost for using reflections. Perhaps the things people build in Go will make the later point moot, but the former shouldn't.


Oh, I absolutely agree it's bad to lose type safety, but you have to realize that go does not give you another option. When you want to do something that takes a "generic" slice, you don't have generics. If you're using reflection, you're already so far out in left field in terms of safety that you might as well just give up on your type safety because you don't have any other safety either.

I fully agree that this is bad, but sicne Go is not a good language it's what you do and thus this is not a code smell in Go, but rather an artifact of Go lacking generics.


>"interface{} is a code smell unless your function truly takes anything" seems wrong to me because I frequently find myself using interface anytime my function can take multiply typed maps or slices (as a substitute for a generic) and I don't think you can use an expression to indicate all slices of all types.

It's still a code smell -- just one that's made necessary to add for certain cases by the language (for the lack of better mechanisms).


It's code smell because you're subverting the type system, and your program is not checked to be type-safe.


It's a "code smell" and not an "error" for that reason. You can't always eliminate code smells, but you should always be keeping an, errr, nose on them. (Keep an eye on them isn't quite right, is it?)


> Most Go-ic? Maximally Go-ful? What's Go's equivalent to "Pythonic"?

Gophery



Goistic?


here are some attempts to do this in C++, http://www.stroustrup.com/OOPSLA-typeswitch-draft.pdf

seems there are no good solutions.


On a side note, Swift's enum provides complete support for sum type.


They may not be recursive however


You can work around that using a "Box" type that wraps a value in a reference:

    class Box<T> {
        var contents : T
        init(_ initialContents : T) {
            contents = initialContents
        }
    }

    enum BinaryTree {
        case Leaf
        case Branch(Box<BinaryTree>, Box<BinaryTree>)
    }




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: