Enums in Rust – and why they feel better (shuttle.rs)
70 points by nathanjclark on Nov 23, 2023 | 78 comments



The term "Newtype" more specifically describes a wrapper around a single type. It's like an alias.

It's incorrect to describe an enum with two variants as a 'newtype'.

I think the example was misunderstood from https://rust-unofficial.github.io/patterns/patterns/behaviou... (which uses Password, but doesn't use an enum with two variants).

The use of enums in the example is fine; but it's not a "newtype".


More specifically the newtype pattern is mostly so that you can implement foreign traits on foreign types.

You can't normally do this since there's a high risk of duplicate implementations. By allowing only local types with foreign traits, or foreign types with local traits (and of course local types with local traits), this duplication can't occur: there's only ever one implementation of a trait per type, and the ownership of those implementations is very clear and well defined.

The newtype pattern simply makes a "local" type that wraps a foreign type, allowing you to implement the foreign trait on the new local type, and forwarding calls of the trait methods to the inner type - all while maintaining the single trait impl and ownership constraints rust imposes.

Further, given all of the compiler optimizations that happen, newtypes are effectively free.

For those who haven't seen it, this is what newtype looks like:

    use some::ForeignType;
    
    struct LocalType(ForeignType);
As the parent comment states, it has nothing to do with enums.
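
To make the "foreign trait on a foreign type" case concrete, here is a minimal sketch. `Names` is a made-up wrapper for illustration; `Vec<String>` and `Display` both come from std, so implementing `Display` directly for `Vec<String>` would be rejected by the orphan rule.

```rust
use std::fmt;

// Local newtype around a foreign type (Vec<String> is defined in std).
struct Names(Vec<String>);

// Display is a foreign trait as well. Implementing it for the local wrapper
// is allowed, and the impl simply forwards to the inner value.
impl fmt::Display for Names {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "[{}]", self.0.join(", "))
    }
}

fn main() {
    let names = Names(vec!["ferris".to_string(), "corro".to_string()]);
    println!("{names}");
}
```

The wrapper is a distinct type, so none of `Vec`'s methods are available on `Names` unless you forward them or go through `.0`.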


> More specifically the newtype pattern is mostly so that you can implement foreign traits on foreign types.

For some definition of "mostly". I use most of my newtypes neither in Rust nor in Haskell to implement traits, but most of the time to declare - newtypes ;) that wrap other (primitive) types.


Hey, author here! Thanks for the feedback.

I agree that this could have been worded better - I'll make an adjustment to change this.


Other people have mentioned it in the comments, but they chose the wrong name for these (as did Swift). They're not enums, they're (pick one):

- tagged unions
- sum types
- algebraic datatypes
- variants

There are some rust packages that transform rust enums into proper enums, the best one being const_table, or my own table_enum. Java got it right except for bundling the data with the Enum object instead of storing it in separate tables.

Having said that, I could not be happier that sum types have entered the mainstream.


I think I have much more problem with the idea that table_enum is "proper enums" than with the choice to name Rust's sum types enum.

Because Rust is a low-level, bit-banging language, union needs to exist, which I think rules out calling enum "tagged unions" even if that's one possible way to look at the implementation. In a high-level language which lacks union types this wouldn't be an issue, but Rust is not that language.


Java has sum types now with sealed classes!


I always hated the name. You're not enumerating anything. You are picking a variant, encoding a choice. Maybe "sum type" is too "monad", but surely "variant record" or even "variant" would have been a better name.


In early Rust, the keyword was `tag`. In order to improve familiarity with C++ programmers, bikeshedding raged between `enum` and `union` for years before finally landing on the former.


TBH it's kinda funny how they narrowed it down to two options and then went with the "obviously" wrong option (also to get C++ programmers on board, wouldn't the obvious name not be "variant"?).

I really wonder what that decision-making process looked like :)


> the "obviously" wrong option

But it's not, hence the aforementioned bikeshedding. When Rust enum variants have data, they're like C unions with automatically-inserted tag checking. When Rust enums don't have data, they're like C enums (they're literally how you do FFI interop with enums in C, which should demonstrate beyond a shadow of a doubt that `enum` isn't categorically wrong). They have the properties of both depending on how they're defined, but at the end of the day they technically have more in common with C enums than C unions (which is why Rust also has `union` for doing C FFI).
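
A small sketch of the two shapes side by side, for anyone who hasn't written both. `Color`, `Shape` and `area` are invented names for illustration only.

```rust
use std::f64::consts::PI;

// Fieldless enum: behaves like a C enum. #[repr(C)] asks for a C-compatible
// representation, which is what you would use across an FFI boundary.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Color {
    Red = 0,
    Green = 1,
    Blue = 2,
}

// Data-carrying enum: behaves like a union plus a tag that the compiler
// checks for you on every access.
#[derive(Debug)]
enum Shape {
    Circle { radius: f64 },
    Rect { width: f64, height: f64 },
}

fn area(shape: &Shape) -> f64 {
    match shape {
        Shape::Circle { radius } => PI * radius * radius,
        Shape::Rect { width, height } => width * height,
    }
}

fn main() {
    // Fieldless variants cast to their integer discriminant.
    for color in [Color::Red, Color::Green, Color::Blue] {
        println!("{:?} = {}", color, color as i32);
    }
    println!("{}", area(&Shape::Rect { width: 2.0, height: 3.0 }));
    println!("{}", area(&Shape::Circle { radius: 1.0 }));
}
```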

> to get C++ programmers on board, wouldn't the obvious name not be "variant"?

Most early Rust contributors were also C++ programmers (due to Mozilla), and I don't recall a single person ever suggesting `variant` as the keyword. I suspect people here are overestimating the mindshare of boost::variant as of 2014.


Don’t forget rust also has unions. Not using the name “union” for a feature that’s different than unions doesn’t feel “obviously wrong” to me.


Rust's Union is for C FFI, right?

Why not call Enum Union, and the FFI one CUnion or something? Like, is there something meaningful that Union is for other than C interop? Isn't it functionally identical to a tagged union in every other respect?


C FFI is a very useful thing, sure, but I don’t think it’s exclusive to that. It’s a nice primitive to have access to. I think Rust would feel weird with tagged unions and not untagged ones, just like having both sum and product types is good.


> Rust's Union is for C FFI, right?

No :)

> is there something meaningful that Union is for other than C interop?

Look more closely at the Rust standard library, MaybeUninit<T> is a union.


The right solution was to make union safe with an opt-in way to make it unsafe IMO.


The thing is that accessing an untagged union's data is just inherently unsafe. Say you made a union with a `u8` field and a `bool` field. If you had an instance of it and set its value to `5`, that's just not a valid `bool`. Hence the requirement of `unsafe`; you need to tell the compiler that you _know_ it's a valid value at the point you're accessing it.
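
Here is roughly what that `u8`/`bool` example looks like in code. This is only a sketch: `ByteOrBool` and `checked_flag` are names I made up, and I added `#[repr(C)]` so that both fields are guaranteed to start at offset 0.

```rust
// Untagged union: both fields share the same single byte.
#[repr(C)]
union ByteOrBool {
    byte: u8,
    flag: bool,
}

// Looks at the raw byte first and only reads it as a bool when that is valid.
fn checked_flag(value: &ByteOrBool) -> Option<bool> {
    // SAFETY: every bit pattern is a valid u8.
    let byte = unsafe { value.byte };
    match byte {
        // SAFETY: 0 and 1 are the only valid bit patterns for bool.
        0 | 1 => Some(unsafe { value.flag }),
        // Reading `flag` here would be undefined behaviour.
        _ => None,
    }
}

fn main() {
    // Creating the union and writing a field are safe; only reads need `unsafe`.
    let mut value = ByteOrBool { byte: 5 };
    println!("{:?}", checked_flag(&value));
    value.byte = 1;
    println!("{:?}", checked_flag(&value));
}
```

The compiler has no way to do the check in `checked_flag` for you, because nothing in the union records which field was written last. That bookkeeping is exactly what the tag in an enum provides.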


Then make them tagged, that's what rust enums are, tagged unions, except they're called enums instead of unions.


I mean... yeah, that's exactly why Rust has tagged unions as well... both are useful in the right contexts, but tagged unions are easier to work with in general. That doesn't mean that (untagged) unions should be safe to work with though, because they can't be.


In concrete terms, what would that look like?


Syntax-wise, Zig does it with union(enum). Unions and enums are both concepts directly related to C and can be used separately too.


Union did end up meaning another thing that is just like C union and just as unsafe.


> to get C++ programmers on board, wouldn't the obvious name not be "variant"?

The use of "enum" in Rust predates the standardization of C++17, where std::variant was introduced.


Boost as "stdlib prototype" had that type much earlier though, going back to 2002 or so:

https://www.boost.org/doc/libs/1_64_0/doc/html/variant.html

The name and concept were also common enough that I used "Variant" in my own C++ utility class collection around 2005 or so.


Yes, but `boost::variant` existed since 2004 https://www.boost.org/users/history/version_1_31_0.html


"C++ calls it variant."

"OK, let's not go with that then"

"C calls it union"

"OK, we DEFINITELY can't call it that"


Rust already has things called unions, which are C-style unions. Using the same word for two different things would definitely have been more confusing, not less...


They are tagged unions right?

"Enum" is not the worst term here. In fact you are enumerating the variants. It's only slightly misleading. It implies that you're comparing an integer.

There is an actual enum backing it to encode the choice (tag), and a union to encode the data (if it has data). Or at least that's what I would expect.
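
You can observe the tag separately from the data with `std::mem::discriminant`. A quick sketch follows; `Message` and `same_variant` are hypothetical names, and the exact memory layout is left to the compiler.

```rust
use std::mem::discriminant;

#[derive(Debug)]
enum Message {
    Quit,
    Move { x: i32, y: i32 },
    Write(String),
}

// True when both values are the same variant, whatever data they carry.
fn same_variant(a: &Message, b: &Message) -> bool {
    discriminant(a) == discriminant(b)
}

fn main() {
    let a = Message::Write("hello".to_string());
    let b = Message::Write("world".to_string());
    // Same tag, different payload.
    println!("{}", same_variant(&a, &b));
    // Different tags.
    println!("{}", same_variant(&a, &Message::Quit));
    println!("{:?}", Message::Move { x: 1, y: 2 });
}
```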

"Variant" is used in Rust lingo to name the variants, not the whole type.


While I agree with you, I can understand going with enum so that they don't scare away the C and C++ crowd with being too fartsy with theoretic stuff.


It is especially confusing for C and C++ coders, because Rust enums are a completely different thing than C enums, which makes the naming choice worse than just inventing a new name.

Better names would be "tagged union" (which is spot-on descriptive - because it *is* a union with a tag slapped on - but "tagged_union" of course is a bit unwieldy as keyword) or "variant" (which is the name for that thing in C++ and elsewhere since forever - it's not like Rust invented those tagged-union-thingies, it just added some convenient syntax sugar to the language to deal with them).


Maybe it's because you can use a Rust enum like in C by doing a field-less or unit-only enum?


I agree with you! I dislike the name enum and really like how F# does it. In F#, everything is type and what kind of type you create comes down to the type construction literal you use.

    (* enum *)
    type 'a Option = Some of 'a | None

    (* struct - well: record. *)
    type Company = { Name : string; Age : int }

Fudge. No idea how to format that in HN.


In practice, now C developers get to hear from the Rust crowd that C doesn't have enums because what C has is unlike Rust "enums".


they scared me away when I tried to do from/to u32 and comparing them, etc.
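
For what it's worth, the usual way to go to and from `u32` and to compare values looks something like this. It's a sketch with an invented `Level` enum: going to an integer is an `as` cast, coming back is a fallible conversion you write yourself, and comparison comes from the derives.

```rust
use std::convert::TryFrom;

#[repr(u32)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Level {
    Low = 1,
    Medium = 2,
    High = 3,
}

// Integer to enum can fail, so it is expressed as TryFrom rather than a cast.
impl TryFrom<u32> for Level {
    type Error = u32;

    fn try_from(raw: u32) -> Result<Self, Self::Error> {
        match raw {
            1 => Ok(Level::Low),
            2 => Ok(Level::Medium),
            3 => Ok(Level::High),
            other => Err(other),
        }
    }
}

fn main() {
    let raw = Level::High as u32;
    println!("{raw} {:?}", Level::try_from(raw));
    // The derived ordering follows the discriminant values.
    println!("{}", Level::Low < Level::Medium);
    println!("{:?}", Level::try_from(7));
}
```

The hand-written `match` is the boilerplate part; there are derive-macro crates that generate it, but the std-only version above shows what they expand to in spirit.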


OCaml uses the terminology “(Polymorphic) Variant” for this kind of thing. Since OCaml is awesome, I consider this a strong precedent.


Some schema languages call it “choice”.


"coproduct type"


Better enums, but not better ADTs.

I didn’t write a lot of compilers in Rust yet, but I find it limiting that you can’t pattern match across a Box<T>, and all ASTs require boxing for recursive references.
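
To illustrate the limitation on stable Rust: a nested pattern like `Neg(Neg(x))` can't be written in one go when the children are boxed, so you end up matching in two steps. A minimal sketch with a made-up `Expr` type and a `simplify` pass that rewrites `--x` to `x`:

```rust
#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    Neg(Box<Expr>),
    Add(Box<Expr>, Box<Expr>),
}

// Rewrites double negation. The inner node has to be matched in a second
// `match` after dereferencing the Box, because the pattern itself cannot
// look inside it.
fn simplify(expr: Expr) -> Expr {
    match expr {
        Expr::Neg(inner) => match *inner {
            Expr::Neg(innermost) => simplify(*innermost),
            other => Expr::Neg(Box::new(simplify(other))),
        },
        Expr::Add(lhs, rhs) => {
            Expr::Add(Box::new(simplify(*lhs)), Box::new(simplify(*rhs)))
        }
        num => num,
    }
}

fn main() {
    let double_neg = Expr::Neg(Box::new(Expr::Neg(Box::new(Expr::Num(3)))));
    let sum = Expr::Add(Box::new(Expr::Num(1)), Box::new(double_neg));
    println!("{:?}", simplify(sum));
}
```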


I think what you want is to enable box_patterns[0]. I have no idea why it isn't stable yet.

[0] https://doc.rust-lang.org/beta/unstable-book/language-featur...


The Rust language team wants a more general solution that also works with types like `Arc`, instead of hardcoding a special case for `Box` into the language. This is generally called "deref patterns." But there are tricky open design questions, so the effort is stalled until someone comes up with a proposal addressing them.


I believe it's not stable because it gives std::boxed::Box special treatment in the language. They'd rather find a solution that could work equally well for smart pointers defined outside of std.

Box patterns are likely to be superseded by one of these:

https://github.com/rust-lang/rust/issues/42640

https://github.com/rust-lang/rust/issues/87121


Often, when you have a compiler-y recursive ADT, you will want to allocate the individual nodes in an arena rather than from the global allocator as with Box. And at that point, you can use references (with the lifetime of the arena) rather than Boxes, and you can pattern match across those.

(Note that this doesn't work if your "arena" is just a Vec, which is something people have sort of hijacked the term "arena" for. It needs to be an arena that doesn't reallocate its contents on growth, something like https://docs.rs/bumpalo/latest/bumpalo/, in order to hand out references that live across allocations like this.)
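
A rough sketch of the reference-based shape, std only. To keep it self-contained, plain local variables stand in for arena allocations; with a real arena the nodes would be references tied to the arena's lifetime instead. `Expr`, `is_times_zero` and `eval` are names invented for the example.

```rust
#[derive(Debug)]
enum Expr<'a> {
    Num(i64),
    Add(&'a Expr<'a>, &'a Expr<'a>),
    Mul(&'a Expr<'a>, &'a Expr<'a>),
}

// Nested patterns see straight through the references, which is the part
// that does not work with Box on stable.
fn is_times_zero(expr: &Expr<'_>) -> bool {
    matches!(
        expr,
        Expr::Mul(Expr::Num(0), _) | Expr::Mul(_, Expr::Num(0))
    )
}

fn eval(expr: &Expr<'_>) -> i64 {
    match expr {
        Expr::Num(n) => *n,
        Expr::Add(lhs, rhs) => eval(lhs) + eval(rhs),
        Expr::Mul(lhs, rhs) => eval(lhs) * eval(rhs),
    }
}

fn main() {
    // Assumption: these locals play the role of arena-allocated nodes.
    let zero = Expr::Num(0);
    let two = Expr::Num(2);
    let sum = Expr::Add(&two, &two);
    let product = Expr::Mul(&sum, &zero);
    println!("{} {}", is_times_zero(&product), eval(&product));
}
```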


I do find myself wanting deref patterns in the rustc codebase often :)


> I didn’t write a lot of compilers in Rust yet

what a pleb


An interesting aspect of sum types (what Rust calls enums) is that you can implement them in the language as a library if you have real unions, but not vice-versa.

Here's my example of sum types being implemented in julia as a regular package: https://github.com/MasonProtter/SumTypes.jl


I don't find "you can implement tagged unions with regular unions but not vice versa" all that interesting of a statement. Since one forces tags on you, it's not enlightening that you can't remove them, and with any other structuring mechanism you can add tags back in.


Yes, that's correct, but I think the way people talk about these things, it's not always clear why that's the case. I.e. I think from that description it's not particularly obvious to me that you can't find sneaky ways of removing the tags.

The important thing here is that the tags are fundamental to the subtyping relations. So unless your language is flexible enough to let one redefine what it means to be a subtype, then there's no chance of making unions out of tagged unions.

TaggedUnion{A, B} does not have the property that A <: TaggedUnion{A, B}, whereas with regular unions you must have that property, which is a very hard thing to hack into a language after the fact.


> A commonly said piece of feedback from someone who's learning Rust as a second language tends to be that enums are far better supported in Rust than any other language.

This is nonsense. I doubt that this is common feedback and it is just not correct that it is far better supported in Rust than in any other language. Why would anyone say that? Rust pretty much has support for ADTs, sum types and product types. Just like any modern (functional) language has.

Haskell, Scala, F#, OCaml, etc.

(And I love rust, write it in my day-to-day as well as on my spare time.)


> Haskell, Scala, F#, OCaml, etc.

Rust is the first not-weirdass-academic-wtf language to have these features ;)

(I love functional programming and apply the functional approach no matter what language I’m using - but why is it that functional languages are always so deeply down the academic rabbithole that they think e.g. “car” and “cdr” are intuitive names for “retrieve the first element of a list” and “retrieve the rest of the list” functions??)


I'm sure you know car and cdr are named for historic reasons and that they are not featured in the ML-family languages listed above. But coming back to the name "sum type": I find it very potent when used together with "product type" to reason about the properties of algebraic data types, like a * (b + c) = a * b + a * c, and I don't understand why "mainstream" programming languages so often avoid calling it that.
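
The distributive law can be written down directly as a pair of conversions that undo each other. A small sketch, where `Either`, `distribute` and `factor` are names chosen just for this example:

```rust
// b + c: a value is either a B or a C.
#[derive(Debug, PartialEq)]
enum Either<B, C> {
    Left(B),
    Right(C),
}

// a * (b + c)  ->  a * b + a * c
fn distribute<A, B, C>(pair: (A, Either<B, C>)) -> Either<(A, B), (A, C)> {
    match pair {
        (a, Either::Left(b)) => Either::Left((a, b)),
        (a, Either::Right(c)) => Either::Right((a, c)),
    }
}

// a * b + a * c  ->  a * (b + c)
fn factor<A, B, C>(sum: Either<(A, B), (A, C)>) -> (A, Either<B, C>) {
    match sum {
        Either::Left((a, b)) => (a, Either::Left(b)),
        Either::Right((a, c)) => (a, Either::Right(c)),
    }
}

fn main() {
    let original: (bool, Either<u8, char>) = (true, Either::Right('x'));
    let distributed = distribute(original);
    println!("{:?}", distributed);
    // Going back recovers the original value, so no information is lost.
    println!("{:?}", factor(distributed));
}
```

Tuples play the role of the product and the enum plays the role of the sum, which is where the two names come from.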


Except Haskell, Scala and F#. They are neither academic nor rabbitholes. You're simply not familiar with them - perhaps a trip outside of your safe space could be something? :)

Car and cdr are from Lisp and date from the '60s, basically. You can liken them to assembly instructions because that's kinda sorta why they're named like that.

I still stand by what I said about them.

If you write Rust for 1-2 years, I bet you'll come to look at programming in those languages completely differently. I have a very high carry-over, going into Rust from those languages.


`car` (Contents of Address part of Register) and `cdr` (Contents of Decrement part of Register) are historical terms from the IBM 704, used in LISP. They're legacy from early computing, not academic jargon.


McCarthy transformed them into academic jargon in fact. Knowing that the words originated in the IBM 704 idiosyncrasies not found in other machines, he used them anyway in papers about symbolic computation.

Lisp pairs are flexible objects that can be coupled together into shapes and uses that are not lists. car and cdr can mean first and rest, and ANSI Common Lisp [1994] has those synonyms, but there are uses of cons cells where first and rest do not make sense.

McCarthy must have realized that words that do not invoke any connotations are good for the elements of a flexible pair structure. Choices like first/rest, left/right, top/bottom and others are saddled with semantics that don't match every use.

Programs that need a pair structure in which the two pieces have very specific roles can provide their own synonyms, if their authors feel it makes code more readable.

I seem to recall that Knuth, in TAOCP, at one point, calls the pointers of a binary tree node ALINK and BLINK.


To be fair though, Scala's sum types are a bit of a hack aren't they, or at least pretty cumbersome? Declaring a `sealed abstract class` (or is it trait) and then a bunch of case classes syntactically disconnected from the sealed abstract thing that defines the sum type.


Scala 3 has enums, which can also serve as ADTs: https://docs.scala-lang.org/scala3/reference/enums/enums.htm...


Ah right, thanks. I've only used Scala 2. To what extent do Scala 3 enums give you what you want in the way of sum types? E.g. compared to Rust enums?


car and cdr are no more confusing than the historical C functions that turn entire sentences into initialisms. And surely nobody would call C an academic language in this context.


There is a group of folks that had learnt these features via Rust, and for whatever reason always praise Rust for "inventing" them.


A similar thing happened about 15 years ago with Ruby and “metaprogramming”.


This is certainly happening, but one of the reasons that Rust became popular is that it included already existing, if maybe not mainstream, concepts (lifetimes included).


Because many people start learning with one of the industry incumbents, languages that are not necessarily good by design.

Java, C, C++, python, Javascript, go...

Someone learning Rust as a second language is quite a different audience from someone who has studied language design or made excursions into functional programming.


Java enums are also quite powerful versus what C, C++ and C# can do.

They are real objects, can have associated data and methods.

So if they are surprised by Rust, they haven't learnt Java properly.


Java enums store the associated data once per option, while Rust enums store it per value. So they're very different. You couldn't model Rust's `Option` or `Result` as a java enum.


> You couldn't model Rust's `Option` or `Result` as a java enum.

For reference, they can be modeled as a "sealed" class since Java 17. https://openjdk.org/jeps/409

Prior to Java 17, the sealed classes feature can be partially recreated via private constructors in the super classes. This approach forces you to put all classes in a single Java file, which can be awkward to navigate.


Sure, I didn't say it was the same, only they are more powerful than C, C++ and C#.


I haven't used Java in ages.

Though from a quick search, it's still just scalar/unary values. Though one of the better implementations of those.

But the big advantage of the more sum-type-style enums Rust provides is the ability to nest values in them while preserving type safety guarantees. So I still see them as miles better as a language concept.


Java enums are a class with a fixed number of instances.

The thing in Java that is closest to Rust enums would be sealed classes, which were a preview feature in Java 15 and were finalized and released with Java 17.

https://openjdk.org/jeps/409


Still, better than C, C++ and C#, which was my point.

For the real deal, there are sealed classes now, modeled on how Scala does it.


erh no, those in java allow you to do just that. It is one of my favorite features of java. In essence, you get to embed multiple synchronized maps inside a java enum (if you want to). All handled 'invisibly' for you.


No, Java enums can contain data but do so in a manner orthogonal to Rust enums. Java enums are not sum types; each variant is simply a (`final`) instance of the enum type which, mostly, is just a regular class type with a private constructor so that no other instances can be created. All variants have the same shape and the author of the enum decides what data the variants contain.

Rust enums are sum types, which means that each variant can have a totally different shape and contain different data, and it's the user of the enum that supplies the data when they create an instance of a variant. This quintessential use case for sum types is not possible with Java enums (although it is now sort of possible with records and sealed interfaces):

    enum Result<T, E> {
      Ok(T),
      Err(E)
    }
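
To put the two styles next to each other in Rust terms, here is a sketch with invented names (`Planet`, `Lookup`, `find`). The first enum mimics the Java style, where the author fixes one set of data per variant; the second is the sum-type style, where the caller supplies data for each value.

```rust
// Java-style: a fixed set of instances, data decided by the enum's author.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Planet {
    Mercury,
    Earth,
}

impl Planet {
    // One constant per variant, looked up from the variant itself.
    fn radius_km(self) -> f64 {
        match self {
            Planet::Mercury => 2439.7,
            Planet::Earth => 6371.0,
        }
    }
}

// Sum-type style: each variant has its own shape, and the data is supplied
// when a value is created.
#[derive(Debug, PartialEq)]
enum Lookup {
    Found(Planet),
    Missing(String),
}

fn find(name: &str) -> Lookup {
    match name {
        "mercury" => Lookup::Found(Planet::Mercury),
        "earth" => Lookup::Found(Planet::Earth),
        other => Lookup::Missing(other.to_string()),
    }
}

fn main() {
    for name in ["earth", "mercury", "vulcan"] {
        match find(name) {
            Lookup::Found(planet) => println!("{:?}: {} km", planet, planet.radius_km()),
            Lookup::Missing(unknown) => println!("no planet called {unknown}"),
        }
    }
}
```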


Still, you get to see all fields from all variants, while Rust variants are more isolated.


>Why would anyone say that?

Most people have not had any exposure to functional programming languages or functional programming in general. When they say "any other language" they mean, any other language that they have used or know anything about I guess.

This is one of those cases where we fail to realize that other people's reality is totally different than ours. Everyone does this all of the time, not just you specifically to be clear, this is a limitation in our brains.


Obviously they would say that because they're not familiar with the other languages. Like it or not they probably come from Python/C++/JavaScript or similar.


I might say something like that. My background is a lot of JavaScript and a little bit of other things like c++, python, Ruby, php, etc.

To me, enums are one of the great things about Rust, and I wish I had the same capabilities in JavaScript and other languages.


I think that's one of the important features why I LOVE Rust! I recently wrote a Chip-8 emulator and creating an enum with all commands = AMAZING!

(I come from C/Embedded world)


The author talks about people who learn it as a second language. Not third or tenth.


C++ class enums are perfectly fine, it's just the old style C enums that are awful.


> Other languages like Go do not necessarily have enums, but you can represent enums by using something like this (in Go):

This misses the important point that Go lacks pattern matching -- one uses a switch statement instead -- and the Go compiler lacks the ability to check that all branches have been handled in a switch statement.

Go's concurrency stuff is beautiful. If Go had sum types / Rust-style enums with pattern matching and handled errors as a Result sum type, then it would be a much nicer language.
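
For comparison, here is a sketch of what the exhaustiveness check and `Result`-based errors look like on the Rust side. `Command`, `parse` and `describe` are hypothetical names used only for this example.

```rust
#[derive(Debug, PartialEq)]
enum Command {
    Start,
    Stop,
    Resize { width: u32, height: u32 },
}

// Errors are ordinary return values: the caller gets a Result and has to
// deal with both cases.
fn parse(input: &str) -> Result<Command, String> {
    let parts: Vec<&str> = input.split_whitespace().collect();
    match parts.as_slice() {
        ["start"] => Ok(Command::Start),
        ["stop"] => Ok(Command::Stop),
        ["resize", w, h] => {
            let width = w.parse::<u32>().map_err(|e| e.to_string())?;
            let height = h.parse::<u32>().map_err(|e| e.to_string())?;
            Ok(Command::Resize { width, height })
        }
        _ => Err(format!("unknown command: {input}")),
    }
}

fn describe(command: &Command) -> String {
    // No wildcard arm: if a variant is added to Command later, this match
    // stops compiling until the new case is handled.
    match command {
        Command::Start => "starting".to_string(),
        Command::Stop => "stopping".to_string(),
        Command::Resize { width, height } => format!("resizing to {width}x{height}"),
    }
}

fn main() {
    for input in ["start", "resize 80 24", "resize eighty 24", "dance"] {
        match parse(input) {
            Ok(command) => println!("{}", describe(&command)),
            Err(message) => println!("error: {message}"),
        }
    }
}
```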



