
Sum Types in Julia and Rust - xiaodai
https://andreaskroepelin.de/blog/sum_types/
======
amitport
I may be wrong, but I don't think this is what people mean by 'sum types'.

I was expecting union types (e.g.,
[https://www.typescriptlang.org/docs/handbook/unions-and-
inte...](https://www.typescriptlang.org/docs/handbook/unions-and-
intersections.html#union-types) or [https://dotty.epfl.ch/docs/reference/new-
types/union-types.h...](https://dotty.epfl.ch/docs/reference/new-types/union-
types.html) )

I guess enums are a type of sum type, but seems this is more specifically
about enums

~~~
millstone
Simple way to think about it is plus. Sum plus sum increases, union plus union
does nothing.

1\. A union type is like int | double. It means the value is one of a set of
possible types. And it's a true set: `int | int | double` is indistinguishable
from `int | double`.

2\. A sum type is a new type built from a _list_ of other types, which assigns
a 'tag' to each possible list element, like a Rust enum. The tags are not
themselves types: they are like a struct field name. It's quite possible to
have a sum type with N tags, each wrapping `int`.

So with this understanding Rust has sum types but not union types, while
TypeScript has union types but not sum types.

~~~
paavohtl
It is possible (and popular) to have sum types in TypeScript by manually
adding a tag, though.

    
    
      type Maybe<T> =
        | { kind: "Some", value: T }
        | { kind: "None" }

~~~
carlmr
This is also why sum types are often referred to as tagged unions. They are a
special case of a union, in Rust's case with support from the type system.

------
siraben
When I think of sum types, I like the categorical definition the best, which
is that a sum A+B has two morphisms (i.e. constructors) Inl : A -> A+B and Inr
: B -> A+B, with a simple commuting diagram[0]. Or in Rust,

    
    
      enum Sum<A, B> {
         Inl(A),
         Inr(B),
      }
    

Why do I prefer this definition? Well, category theory abstracts away
irrelevant details, and sums have a "universal property" associated with them.
Roughly speaking that means that it doesn't matter how you define sum types in
your language, if they fit the universal property of sums (up to isomorphism)
then they truly can be considered sum types. In the Rust PlayerClass example
the corresponding sum is (Solarian + (Polarian + Centaurian)), and morphisms

    
    
      Sol  = Inl . Inl
      Pol  = Inl . Inr
      Cent = Inr . Inr
    

[0]
[https://en.wikipedia.org/wiki/Coproduct](https://en.wikipedia.org/wiki/Coproduct)

~~~
millstone
I don't know any category theory whatsoever. What happens in your example if A
= B? Would it have two morphisms or just one?

~~~
cakoose
What might be causing confusion is the difference between tagged unions and
untagged unions.

A union type "A | B" means "a value of type A or a value of type B". Example:

    
    
        function f1(): String | Integer {
            if (rand()) {
                return "hello"
            } else {
                return 12
            }
        }
    
        function f2(x: String | Integer) {
            switch (typeof x) {
                case String: return "string: " + x
                case Integer: return "integer: " + x
            }
        }
    

The type "String | String" is exactly equivalent to "String".

A tagged union (aka sum type) "A + B" means "either a left value or a right
value; if it's the left, it has type A, if it's the right it has type B".

    
    
        function g1(): String + Integer {
            if (rand()) {
                return Inl("hello")
            } else {
                return Inr("bye")
            }
        }
    
        function g2(x: String + String) {
            switch (x) {
                case Inl(s): return "left value: " + x
                case Inr(s): return "right value: " + x
            }
        }
    

The type "String + String" has one bit of additional information than just
"String".

------
celrod
> Use subtyping in Julia and enums in Rust.

I would use enums in Julia, making the enum you define a field of the
PlayerClass:

    
    
      @enum PlayerStar Sol Pol Cent
      struct PlayerClass
          star::PlayerStar
          # other fields
      end
    

Julia's compiler will split small unions into static dispatches behind
branches, and can also decide whether or not to specialize on types or create
a generic functions.

But if your code is performance sensitive, I'm more comfortable controlling
the behavior than relying on these optimizations. The problem is, dispatch is
often a more convenient coding style than long branches.

------
richardwhiuk

      fn greet<T: Player>(&self, other: &T);
    

Yeah, this doesn't work, but the equivalent to what you are writing in Julia
does:

    
    
      fn greet(&self, other: Box<dyn Player>);

~~~
JoshMcguigan
I’m not sure what is meant here, because that top example would compile.

edit - Alright I think I see what the original article was trying to express.
In the original article that top example method was defined as part of a
trait. Because that method is generic that would make the trait not object
safe. On its own that is fine (this would still compile), but there was some
previous example code which relied on this trait being object safe.

------
muglug
Great article — I’ve never written a line of Julia or Rust, but could still
understand it perfectly

------
chalst
Two thoughts:

For this problem, I would use union types in Julia. Union types are a sort of
sum, but they are amalgamated sums whilst sums types in PL semantics usually
means disjoint sums. The difference is that disjoint sums 'mark' whether a
value is of the left or right type, while with amalgamated sums the value may
belong unmarked to the intersection. The distinction does not matter in the
example the post gives.

Second, Julia does not give the benefit that Rust gives of type coverage, that
is, ensuring that functions that take the sum type as argument actually are
defined for each branch of the sum type. The Rust compiler guarantees this
automatically. AFAICS, with Julia it is up to the user to provide tests
exploring the branches.

~~~
bocckoka
Julia is a dynamic language, I don't think having the rigor of a compiled
language (probably the most rigorous out there) is a reasonable expectation.
It would be nice to have as an external linter service, probably.

~~~
pa7x1
I think you are mixing two different things.

Julia is a compiled language and Rust is too. Julia is dynamically typed while
Rust is statically typed.

------
snicker7
Traits are possible in Julia [0] and may be a better solution.

[0] [https://ahsmart.com/pub/holy-traits-design-patterns-and-
best...](https://ahsmart.com/pub/holy-traits-design-patterns-and-best-
practice-book/)

------
daralthus
> Use subtyping in Julia and enums in Rust. One caveat though: Enums can not
> be extended from the “outside”. If you plan to let users of your library add
> new type variants, you will have to use some kind of trait approach.

This extensibility from the outside is the most important idea behind Julia!

For example this is why a generic deep learning library doesn't need any extra
implementation details to be able to run on GPU as well.
[https://fluxml.ai/Flux.jl/stable/gpu/](https://fluxml.ai/Flux.jl/stable/gpu/)

------
thom
Someone Rustier than I am, what's the story with this enum solution when I
want to add a _new_ star later? Can one implement this in a way that respects
the open-closed principle?

~~~
richardwhiuk
You can flag an enum as `non_exhaustive` to indicate that it might contain
additional values in the future.

If you do that, then anyone writing the code must add code to handle other
cases, or it's a compile error (i.e. you must match the enum as:

    
    
      #[non_exhaustive]
      enum Foo {
          CaseA,
          CaseB
      }
    
      let f : Foo = get_foo();
    
      match foo {
          CaseA => {},
          CaseB => {},
          // Exhaustive matching
          _ => {},
      }
    

[https://doc.rust-
lang.org/reference/attributes/type_system.h...](https://doc.rust-
lang.org/reference/attributes/type_system.html)

~~~
thom
But how can the PlayerClass trait methods get extended in the future without
opening them back up to add cases to the match expression?

~~~
ragnese
I'm slightly confused. First, PlayerClass is not a trait. And the methods
would have to be edited when you add a case to the enum. Is that what you're
asking?

It's different than if you were implementing those methods on a trait and just
had several types that implemented the trait. But the article does show the
potential complications that sometimes arise with Rust traits.

The difference is that a trait is "open" and an enum is "closed". So the
enum's methods can/must account for all variants. A trait method calls to the
implementor for work, much like inheritance in other languages.

~~~
thom
Sorry, somehow my brain had thought impl and trait went hand in hand, I see
that's incorrect.

My question is just - do people really accept this tradeoff? The Rust advice
in the article seems to create a maintenance burden compared to the Julia
version. Every time you want to extend the system you have to go back and
crack open old code. Is the performance difference really that great? Is this
really the default approach for Rust programmers? Not snarking about Rust at
all, I'm fascinated, but the open-closed principle would be front of mind for
me when deciding on these sorts of abstractions (both as a language designer
and just a developer).

~~~
ragnese
Your concerns are not unwarranted. I've done several Rust projects and
_usually_ there really isn't any trade-off.

If you want something open for extension, use a trait unless you can prove to
yourself that you "can't". If it's closed or unlikely to be extended, use an
enum.

Think of it this way: In Rust you have a choice between open and closed (trait
vs enum). In Java, you get open (interface) and that's it. Other languages do
also have both, like Rust: Swift, Kotlin, apparently Julia.

Traits have a few sharp edges because of the nature of Rust being built around
zero-cost abstractions. But, most of the time, they work just like an
interface in Java. The snag I most often hit is if one wants to return Self in
a trait method. The most obvious way to work around that, IIRC, is to just
return a Box<dyn MyTrait>. Keep in mind that in many languages, these things
actually work similarly except that the language will just happily throw stuff
on the heap transparently. Rust requires you to acknowledge that the former
case needs to be boxed and likely on the heap, since you can't know its size
at compile time.

In this particular case of extending an enum, I don't see it nearly as
horrible as you're describing it. Let's look at it from the POV of another
implementation in another language (I don't know Julia): You have an
interface, Player. That interface defines a couple of methods. You implement
three classes that conform to that interface. Now, six months later you come
back and want to add a fourth. You make a new class, say that it implements
Player and then your IDE/compiler yells at you until you implement those
methods. In the Rust-enum version, you have an enum with three variants and a
couple of methods, each of which has to match on the variants. Six months
later, you want to add a fourth PlayerClass variant. So, you add it and then
your compiler yells at you until you implement the fourth match branch in the
couple of methods on the class.

That doesn't seem that different.

To answer some of your inner questions: Yes, Rust devs reach for enums a lot.
But really only if you don't expect the cases to change much. Writing version
1.0 in Rust usually takes longer than in other languages because Rust is still
a low-level language. Rust's performance advantage doesn't matter for most
applications. People choose Rust for much more than the performance: its error
philosophy, its enums and good matching, etc.

~~~
thom
This is an excellent answer, thank you. I suppose the only remaining
discomfort I anticipate is if you have an internal abstraction that might one
day be exposed externally in a library. That refactoring doesn’t seem fun, but
I suppose these things rarely sneak up on you.

~~~
ragnese
> I suppose the only remaining discomfort I anticipate is if you have an
> internal abstraction that might one day be exposed externally in a library.
> That refactoring doesn’t seem fun, but I suppose these things rarely sneak
> up on you.

The place where this ends up happening most is around error handling. Crafting
your public error types in Rust is an art form. Usually in Rust libraries,
errors are implemented in enums. So whoever calls your API can see that it
failed and then can match on the reasons it may have failed (or just wrap it
in their own error type and bubble it up). If you're not careful, you can
cause breaking changes in your API by doing something as simple as changing a
dependency (your dep's concrete error type was wrapped inside one of your
error variants).

I have somewhat mixed feelings on it, but Rust does allow us to mark enums
with a special annotation that forces all matches on it to include a wildcard
match (even if it matches all current variants explicitly). It is generally
considered good practice to mark your error enums with said annotation so that
you can add failure modes in the future without requiring a major version bump
for just an extra error case. One could use the same annotation for any enum,
of course.

------
bocckoka
Julia enums do not equal to Rust enums. You can achieve that functionality
(that a type can be one of a set of cases, type algebra in general) with
subtypes. Which is fine I'd say, but there could be terser syntax for it
(which is available through packages, and it's slower). Having abstract types
which cannot be instantiated is also nice.

------
amelius
There are sum types, and product types, but what would "exponentiation types"
look like?

~~~
guerrilla
Exponential types are functions [1]. The type A -> B could be written B^A.
Interestingly, if you take functions to be lookup tables, then that's related
to the number of possible unique lookup tables. First, notice that the type A
+ B has |A| + |B| values, the type A × B has |A| + |B| values and then we can
see that A -> B has |B|^|A| values by remembering that functions can be seen
as binary relations with the property of being functional. This means that
there are |B|^|A| possible unique functions. Try an example by counting the
number of functions from the set { 0, 1 } to { 2, 3, 4 }. You will see that
there are 3^2 = 9 possible unique functions.

1\.
[https://en.wikipedia.org/wiki/Function_type](https://en.wikipedia.org/wiki/Function_type)

~~~
amelius
Interesting! Now I wonder what "tetrated" types correspond to [1]

[1]
[https://en.wikipedia.org/wiki/Tetration](https://en.wikipedia.org/wiki/Tetration)

~~~
guerrilla
Well, I don't know of any type theory that has that but you could make
something up. C^(B^A) = (B^A) -> C = (A -> B) -> C. So you'd get nested
higher-order monomorphic functions like if the base is A and the height |B| =
5 then the result of tetration would be (((A -> A) -> A) -> A) -> A. I don't
really know what that means though. Maybe an expert Haskeller could recognize
a pattern.

