enum class Handle : uint32_t { Invalid = 0 };
Handle h { 42 }; // OK
One of their examples demonstrates the number one issue for me with enums, which was not fixed with `enum class`. Since values outside the declared enumerators are still valid, you constantly need to check for invalid values in any function that takes an enum [class]. That ruins any attempt at a "parse, don't validate" style in C++ and completely undermines the "type safety" which C++ people are always going on about.
IMO that's the typical experience with many of the features in modern C++ standards. You read about a really neat, useful thing they added, something that seems to provide a safe and practical way to overcome a shortcoming in the language. You may even get a little excited... until you try to actually use it and realize it's full of new footguns and weird limitations.
Yes, you read about std::variant on a blog and think that it is a sum type. Then you try it out and realize that it's a thin (type-safe) wrapper over tagged unions that is at least three times slower and has about 5 unreadable alternatives that replace simple switch statements.
Then you find out that members of a "variant" are not really variant members but just the individual types that can be assigned to a union. For example, assigning to a non-const reference does not work (and obviously cannot work once you realize that std::variant is just syntax sugar over a tagged union).
Most of these new additions since C++11 are just leaky abstractions and wrappers.
> and obviously cannot work once you realize that std::variant is just syntax sugar over a tagged union
It would be easy to make it work, there isn't necessarily a strict relation between the template parameter and the actual stored object. Not having reference variant members was a conscious decision, same as optional<T&>. Hopefully this will be fixed in the future.
A few code snippets of what you see as weaknesses of std::variant may be appropriate, as I couldn't figure out your complaint. Assigning to a variant taken by non-const& works fine for me.
I personally would have liked to see recursive variant types and multi-visitation (as supported by boost::variant).
std::variant is not a true algebraic data type, since the individual element constructors do not construct the variant type automatically. Compare to OCaml, written in a verbose and unidiomatic way that is similar to C++:
# type foo = Int of { n : int } | Float of { f : float };;
type foo = Int of { n : int; } | Float of { f : float; }
# Int { n = 10 };;
- : foo = Int {n = 10}
# let r = ref (Int { n = 10 });;
val r : foo ref = {contents = Int {n = 10}}
Notice that the constructor Int { n = 10 } automatically produces a foo type and assigning to a mutable ref works.
The same in C++, using assignment to a pointer to avoid the lvalue ref error that is irrelevant to this discussion:
#include <variant>

struct myint {
    int n;
    myint(int n) : n(n) {}
};

struct myfloat {
    float f;
    myfloat(float f) : f(f) {}
};

using foo = std::variant<myint, myfloat>;

int
main()
{
    const foo& x = myint{10}; // works
    foo *z = new myint{10};   // error: cannot convert ‘myint*’ to ‘foo*’
}
As stated above, this obviously cannot work since C++ has no way of specifying a myint constructor that -- like in OCaml -- automatically produces the variant type foo.
C++ would need true algebraic data types with compiler support (that would hopefully be as fast as switch statements). To be useful, they would need a nice syntax and not some hypothetical abomination like:
using foo = std::variant<myint, myfloat> where
struct myint of foo { ... };
There is a difference between an API promising that a value won't be null and a buggy program setting a null where it should not. A reference is only null if someone fucked up. As a programmer you can usually rely on a reference not being null, and you couldn't do anything about it anyway within the constraints of the language.
No, dereferencing a null pointer is UB. An enum class value outside the declared enumerators is perfectly valid.
You could design a language feature where integer to enum is checked, but that's not enum.
Enum classes already add scoping, forbid implicit conversions and allow explicit underlying types. Those are pure extensions. Making undeclared values invalid or UB would be very surprising to people used to normal enums.
One of those cases happens accidentally all the time (in more complex variants than the motivating example you responded to), the other never happens except on purpose. It's like complaining guard rails are pointless because people being launched with catapults might still fly over them and plunge to their deaths.
I've never seen a nullptr somehow sneak into a reference. Never.
What I have seen is automatic variables escaping their scope as a reference, which Rust protects against. And that is also much more dangerous, because dereferencing a nullptr is de facto defined behavior (a trap) on most platforms.
Well, apparently it was fixed with `enum class`… until it was unfixed in C++17 for unfathomable reasons. It’s honestly crazy that C++ doesn’t have simple exhaustiveness-checked enums. The obvious actual solution for the valid use case of deserializing an enum from an integer would be something like
IIRC for a brief period GCC considered values outside of the enumeration as UB and heavily optimized according to this. At some point this interpretation made it as far as at least a draft standard. Then it got reverted as it went against decades of common usage, and instead the opposite was made explicit in the standard.
Making it UB for enum classes was considered, but the strongly typed alias use case was considered safer and more useful.
Unfortunate. Then again, constrained-value integer types are something that would be useful in general and perhaps could be provided independently of enum class.
An event loop often wants an event type enum that has defined values for internal events, where everything out of range is passed on to the user's handler. There are other variations where you need to pass an enum value through without caring what it means.
At least in C++ a template needs to be known at compile time, but I want to build my event loop and later add more values that should be handled without rebuilding it.
Using Rust's retarded enum keyword for your example doesn't exactly make your point understandable, especially when talking in the context of actual enumerations.
Anyway, the point of what gp describes is to do this in a single integer without overhead so a naive sum type is not the solution.
> Using Rust's retarded enum keyword for your example doesn't exactly make your point understandable, especially when talking in the context of actual enumerations.
data Event a = System SystemEvent | User a
> Anyway, the point of what gp describes is to do this in a single integer without overhead so a naive sum type is not the solution.
Virtually all event systems end up carrying auxiliary event data of some form, so sum types are what you want regardless.
I don't know C++ too deeply but from the little bit in the article, it seems like these `enum class` enums are actually Newtypes with associated constants instead of actual enumerations. They clearly don't 'enumerate' all possible values but they do give different behavior from their underlying type.
Is it possible to override the constructor so it limits the allowed values? Implement other methods on the `enum class`? If so then that makes them still very useful, albeit not in the usual `enum` sense.
That's by design. Consider mapping a file or reinterpret-casting a network buffer that contains a structure with such an enum: if it has been written by a different version of an application, the possible enum values might be different. That was considered, among other things, an important use case to support.
You can easily build your own safe enum on top if you really want.
Edit: someone else pointed out the bitmask use case else thread. Strongly typed integrals is also a common use case.
You can justify the escape hatches for every feature in C++. The problem is that we eschew sensible defaults to handle the edge cases. Without going into a rust war, I think rust's unsafe is a great way to handle this - for 99% of use cases, you _really_ don't want to put an invalid enum value in there. But, in the number of cases where you do, you should have an escape hatch to do so.
If you could do:
my_enum_type foo(std::vector<char>& buf, int& offset)
{
    // bounds check omitted for brevity
    // return my_enum_type{buf[offset++]}; // outside an unsafe-style block would cause a compile error, or a throwing constructor. TBD
    std::unsafe {
        return my_enum_type{buf[offset++]};
    }
}
you would reduce the possible impact areas to places where you explicitly want to do dangerous stuff.
Instead we end up with a feature that is a "zero cost abstraction" which just pushes the bookkeeping onto the user - every switch statement needs to handle the case where someone has passed in 42.
I assure you I'm very critical of C++ bad defaults. But in this case I think it was the right solution. There were three options:
1. make invalid values UB.
2. make invalid values non-representable by enforcing checks.
3. enum class is just a strong integer typedef with named constants.
Luckily 1 was rejected: already too much UB. 2 would require runtime checking and was not considered viable by many; it would also prevent a lot of useful cases. 3 was the remaining option and was consistent with existing enum usage.
I don't think reinterpret casting enums is something that should be encouraged and uncovering such points where unexpected values can be introduced is exactly the point of having a strict enum. Just because putting random values into the enum isn't itself undefined behavior doesn't mean that your program isn't going to blow up later on.
Bitmask enums are also a hack and the better solution is to have different types for the individual flags (an enum) as well as a type for combinations of flags (custom type built around an integer).
Yes, these are existing uses but when enum class was designed there were no existing usages for that.
Having tried the solution with a separate type for the bitset as opposed to the enum itself, in my experience the complexity is just not worth the improvement.
> but when enum class was designed there were no existing usages for that.
IIRC diverging from standard enum was considered, but rejected as it was deemed a surprising change in behaviour.
The main difference between old ("unscoped") and new ("scoped") enums, besides dropping implicit conversions from/to integral types, is the scope of the named constants. With unscoped enums, the constants are in the surrounding scope, which means that constants with the same name but of different enum types collide with each other. One solution is to wrap them in a dummy struct or class (`struct Foo { enum Bar { baz = 0 } }` => `Foo::baz`). Scoped enums are more or less a shortcut for that, where with `enum class Foo { bar }` you have to write `Foo::bar`, or you can use `using enum Foo` to be able to write plain `bar` again.
The reason other values of the underlying integral type beyond the named ones are allowed, is that enums are often used to specify bit values or bit masks that you AND/OR with each other. It is what it is.
The other question is, how would you go about converting an integral value to an enum type if only specific values were allowed? Either you need an explicit case distinction (potentially large switch) for all the allowed values, or there would have to be a hidden structure/array specifying the valid values at runtime, generated by the compiler. C++ is rather conservative in adding such runtime structures.
> The reason other values of the underlying integral type beyond the named ones are allowed, is that enums are often used to specify bit values or bit masks that you AND/OR with each other. It is what it is.
Except bitwise operations on plain enum values yield the underlying type and not an enum value. And for enum classes the operators don't exist at all. So you need to write custom operators (or manual casts) for this use case anyway, so you might as well go all the way and write a proper typed bitset type instead of abusing enums. Allowing this for old enums makes sense for C compat, but that doesn't mean enum class couldn't have been stricter.
While I'm not necessarily sold on making tons of stuff UB in release builds and checked in debug builds, it seems maybe better than not having exhaustive enums at all. Here's an example of this feature from some other language: https://ziglang.org/documentation/master/#enumFromInt
What's the alternative? Let's say you have a file, you parse a uint32_t, and you want to convert that into the Handle type. If the enum is closed, how do you do it? A giant switch? That would break the fundamental principle that C++ abstractions are zero-cost.
You either need that giant switch or checks at every use anyway in order to protect yourself against bad data. Better yet would be an explicit mapping of numbers in the file and enum values so that your file format and runtime enum are decoupled. The compiler can still optimize that into a simple range check when the values match up.
We could allow zero-cost non-conversion from the primitive type (and, as a consequence, invalid values at rest), but check and complain about invalid enum values at runtime, opt-in, when we care: either in some kind of novel exhaustive pattern-matching statement (not an old-school switch statement), which could simply treat invalid values as an erroneous default case, or by invoking a synthesized validation method of the enum class (and, if possible, validating variants of conversion from the primitive type, copy constructors, and the like).
Most enum use is served adequately by never letting in potentially invalid values from untrusted input; an enum variable that is set to an enum constant will necessarily have a valid value. "Decayed" primitive values, like a bitmask formed by bitwise operations between valid enum values, aren't normally intended to be re-ingested as enum values.
I don't think that "abstractions are zero cost" necessarily applies to every serialization format you could choose to use, or to any other thing that isn't part of the language.
If you want to guarantee that some value is within the allowed range of values, normally you'd have to run some code to do that, yes.
It applies to the enum. If you couldn’t lift an integer into the enum without branching, then the enum isn’t zero cost - it requires more resources than C would to do the same thing. Bounds checking is never mandatory in C++.
When C++ has broken this rule, developers have tended to turn those features off or ban them in style guides.
Providing exhaustive enums wouldn't prevent the language from providing non-exhaustive enums or users from placing values into them cheaply using memcpy.
Why is an enum a class? What does that even mean semantically? Why can't an enum simply be ... an enum?
Even if under the covers an enum is/was implemented as a special class, why would the language syntax leak this implementation detail to the programmer?
Perhaps it would be cleaner to make enum its own concept, rather than trying to shoehorn it into classes. It is a fundamentally different thing than a class after all. Is there a reason, why this is not done?
Because C++ has no way to compatibly adjust its syntax, it's important to them to reuse keywords where possible.
So enum and class were two existing keywords whereas scoped would be a new keyword.
There's also a sentiment among C++ proponents that classes (a user defined product type with implementation inheritance) are all you really need anyway.
My guess currently is that C++ 29 will get some sort of pattern matching. It's too big a feature to board the C++ 26 train in autumn 2024 and there's nowhere close to consensus on how it should work or how it's spelled yet.
So, assume C++ 29 has pattern matching. That causes people to attempt various idiomatic constructions from languages which always had pattern matching, and of course they're all rather awkward in C++. I think that realisation might point towards an actual language sum type, possibly via abortive attempts to "fix" matching and/or some of the pseudo-sum types provided by the C++ stdlib. So then you'd expect proposals in C++ 32 or C++ 35.
I don't think they're complaining that Sum types are equivalent to any or all of those things, but rather that what they wanted was Sum types and C++ has given them all this other stuff claiming it will solve their problem, instead of a Sum type which does solve the problem.
If what I need is a bookshelf then it's true that a wardrobe, a cardboard box, a Kindle, and a shredder are all different things and a bookshelf would not substitute for any of them, but giving me the wardrobe, cardboard box, Kindle and shredder doesn't solve my problem, I needed a bookshelf.
It makes no sense to mention dynamic_cast or inheritance (clearly referencing virtual functions here) when languages with sum types also have equivalents. It just shows a misunderstanding of how these features are used.
I don't think I've ever seen an analogy contribute to a discussion, and yours is no exception.
You seem to have a very narrow C++ view of what sum types are. Sum types are related to the expression problem, and the given example is the canonical instance of the expression problem.
Tagged unions are an implementation and one that's often a poor fit for the problem. Sum types are an idea from type theory, rather than an implementation detail. It's very on-brand for C++ to have standardized a poor implementation detail rather than the useful idea.
Look at how much hoop jumping was required to make std::optional<T&> work for C++ 26 and then compare how Rust's Option<&T> isn't even special, that's just naturally what happens.
> In computer science, a tagged union, also called a variant, variant record, choice type, discriminated union, disjoint union, sum type, or coproduct, is a data structure used to hold a value that could take on several different, but fixed, types. [1]
C++ class inheritance is not a sum type for a couple reasons:
- It does not allow you to discriminate on the type.
- You can add RTTI, which allows you to discriminate, but then it is not a fixed set of types.
Obviously, Rust does sum types better than C++, but that is really irrelevant.
I already knew what a sum type is. Fortunately I discovered Rust, which actually has proper sum types rather than trying to make do with a tagged union and a whole lot of squinting.
uint8_t value = static_cast<uint8_t>(Permissions::Read);
The latter seems clearly more expressive, not less. You could tighten it up a bit by using auto on the LHS instead of the redundant uint8_t. In the former I don't know what's going on and I have to go read another header to figure out the type of `value`.
The expressibility of a language is what can be expressed in it, not what it can express to you. This additional feature allows you to express this projection without being explicit.
That's true, but unrelated to the common usage of the word "expressiveness" when talking about programming languages. There's often a tradeoff between expressiveness and the thing you're talking about, which I'll call clarity. For example, macros increase expressiveness but decrease clarity. In Racket (or other lisps), you can define a macro `(my-let x 17 (+ x 1)) -> 18`. This is expressive, as most other languages don't let you define binding constructs so easily, but also bad for clarity because `my-let` doesn't look different than a regular function, so it's hard to tell at a glance that it's doing something a function could never do.
The literature on programming languages contains an abundance of informal claims on the relative expressive power of programming languages, but there is no framework for formalizing such statements nor for deriving interesting consequences. As a first step in this direction, we develop a formal notion of expressiveness and investigate its properties. To validate the theory, we analyze some widely held beliefs about the expressive power of several extensions of functional languages. Based on these results, we believe that our system correctly captures many of the informal ideas on expressiveness, and that it constitutes a foundation for further research in this direction.
I think expressibility is trickier than that. You can think in a language with an imperative mindset where you are describing what to do, or in another with a declarative mindset.
The former is more expressive because it doesn't require a template to manually deduce the underlying type of an enum as `static_cast<std::underlying_type_t<Permissions>>(Permissions::Read)`
The first is expressive in that you know that the "value" variable definitely ends up with the underlying numerical type of the Permissions enum, whereas in the second perhaps that static_cast is changing from a different numeric type.
And when the enum values don't fit into the uint8_t? If you want the enum variables constrained to an integer type do that in the enum definition itself. The compiler can then bark at you when you add a member outside the range.
Speaking of enums: is there a serialization framework for C++ like Serde in Rust? And then what about Linked Data support, and of course form validation?
Linked Data triples have (subject, predicate, object) and quads have (graph, subject, predicate, object).
RDF has URIs for all Subjects and Predicates.
RDF Objects may be URIs or literal values like xsd:string, xsd:double, xsd:int (32-bit signed value), xsd:integer, xsd:long, xsd:time, xsd:dateTime, xsd:duration.
RDFS then defines Classes and Properties, identified by string URIs.
How best to get from an Enum with (type, attr, {range of values}) to an rdfs:range definition in a schema with a URI prefix?
Python has dataclasses which is newer than attrs, but serde also emits Python pickles
Wt:Dbo years ago but that's just objects to and from SQL, it's not linked data schema or fast serialization with e.g. Arrow.
There should be easy URIs for Enums, which I guess map most closely to the rdfs:range of an rdfs:Property. Something declarative and/or extracted by another source parser would work, to keep schema definitions DRY.
re Serde: If you want JSON, https://github.com/beached/daw_json_link is about as close as we can get. It interops well with the reflection (or similar) libraries such as Boost Describe or Boost PFR, and will work well with std reflection when it is available.
Adding reflection will be simple and backwards compatible with existing code, as it would only come into play when someone hasn't manually mapped a type. This leaves the cases where reflection doesn't work (private member variables) still workable too.
Haven't looked at JSONLD much, but it seems like it could be added but would be a library above I think. Extracting the mappings is already doable and is done in the JSON Schema export.
> SHACL is used for expressing integrity constraints on complete data, while OWL allows inferring implicit facts from incomplete data; SHACL reasoners perform validation, while OWL reasoners do logical inference.
- "Show HN: Pg_jsonschema – A Postgres extension for JSON validation" https://news.ycombinator.com/item?id=32186878 re: json-ld-schema, which bridges JSONschema and SHACL for JSONLD
Actually they haven't. Using glob imports inside small functions can enhance readability without causing confusion. Otherwise any kind of aliasing and importing would be polluting the scope since they are bringing other items into your module without the full path.
I disagree, because the scope doesn't escape those functions and it can greatly reduce the amount of ::
Usually in an entire codebase or a module there are a lot of competing, similar symbols. But in a function that's not the case, because functions do something small. So there's no benefit to being more "explicit", because the function already states what it does.
For example you might not want to use std::chrono at file scope. But in, say, a series of timing functions you really don't want to be writing std::chrono::steady_clock::now(), do you? You can just add a using-declaration and then call now() or whatever.
in my opinion. Both objects/structs/whatevers can have a state type of their own but the shorthand form does not indicate whether or not the two can be used interchangeably. As a workaround, you could rename the setter to set_computation_state and set_system_state but then you're just repeating yourself in a different manner that's not as precise.