The `any*` fat pointer in C3 is a version of this: https://c3-lang.org/anyinterfaces/ Once this is available the step to embedding some runtime type information isn't far.
Interesting. And reading all the changes from C, it looks like what occasional C developers should always remind themselves when coming back to C. That's a useful list by itself.
I did not know about C3. I love C because it is the language I first learned to program in: the simplicity, the complete lack of magic. Looks like I will love C3 as well:
Polymorphic types are certainly useful, and I wish C had a better way to go about it, but this proposal feels like an odd patch to C's type system. Especially this part:
```
Using a run-time value of type _Type T in _Var(T) can be allowed in general (and is useful), but
needs to be restricted to contexts where full type information is not required at compile-time.
```
Semantic rules that apply only in some contexts are a common tendency of the C++ standard, and one that many C programmers dislike.
I don't think this is a deficiency, but an inherent property of powerful type systems. If you make them very expressive, then you cannot have full type checking at compile time. On the other hand, if you restrict them so that you can do full static type checking, then they are very limited.
It's a natural limit – but it doesn't make it easy to figure out in advance when it is/isn't permissible to do what you want to do. (Now, if C had type inference...)
I am not sure I agree. What is allowed by the typing rules should always be clear. What may not always be clear is whether you statically get an error during compilation or only a run-time error. But considering that we now randomly get silent miscompilation which may or may not cause a run-time problem, if you cast a void* to the wrong type, I think this is a clear improvement.
Here is another way in which this is a little patch-like. The author observes that:
(a) it is desired that _Var(T)* and void* be compatible;
(b) however, pointers-to-T for different T are not guaranteed to be compatible in general,
as a consequence, it is NOT guaranteed that _Var(T)* is compatible with T*, which is a theoretical wart worthy of C++. I can see it becoming especially annoying if you’re trying to introduce _Var(T) polymorphism into a code base that currently uses preprocessor macros that textually substitute types.
Perhaps the better solution would be to forgo giving void* any special status whatsoever. However, that means that this type of polymorphism can’t be implemented using void* as a polyfill.
It's worth keeping in mind that code aesthetics is an important aspect of C codebases. There's a lot of C code that is exclusively lowercase, sans the macros. So introducing keywords like _Type and _Var will serve to hinder their adoption, because it'd make the code that much more "ugly". Just like what happened with _Generic - a reasonable feature, bad keyword selection -> barely any field use.
The C specification mandates that new keywords use _Keyword naming conventions to ensure backwards compatibility by not overriding potentially existing identifiers in codebases. That is why the C specification has reserved identifiers that begin with an underscore and either an uppercase letter or another underscore.
Typically, a <stdkeyword.h> header is provided that contains macros for the lowercase variants. E.g., this is how _Bool was handled: <stdbool.h> provides the lowercase `bool` variant.
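The header pattern can be seen with the C99/C11 headers that toolchains still ship. A minimal sketch, assuming a C11 or later compiler; the two function names are made up for illustration:

```c
#include <assert.h>    /* static_assert: lowercase spelling of _Static_assert (C11) */
#include <stdalign.h>  /* alignof: lowercase spelling of _Alignof (C11) */
#include <stdbool.h>   /* bool, true, false: lowercase spelling of _Bool (C99) */

/* Written with the reserved underscore keyword only. */
_Bool is_nonzero_raw(int x) { return x != 0; }

/* The same function with the header-provided lowercase spelling. */
bool is_nonzero(int x) { return x != 0; }

/* Both spellings name the same type, so size and alignment match. */
static_assert(sizeof(bool) == sizeof(_Bool), "bool is _Bool");
static_assert(alignof(bool) == _Alignof(_Bool), "same alignment");
```

Code that never includes the headers is unaffected, which is the whole point of the underscore spelling.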
C23 is scheduled to promote bool, alignof & co. to keywords, so the concern for using _Xxx keywords is recognized by the committee. They introduce _Xxx keywords, sometimes alias them to lowercase versions with macros and let this age. Then, some time after, they switch to the "primary spelling", which is how the lowercase versions are referred to.
You can't easily lowercase _Type and _Var, so practically speaking it will take years before these features could be suitable for wide-spread adoption. Hence my original comment - given the friction, is it worth expanding the language this way at all then?
_Bool etc. came with convenience macros defined in <stdbool.h> and so on, but _Generic never did, suggesting that the underscored version was meant to stay that way forever. (Otherwise it should have been named something like _Generic_switch and later renamed to generic_switch...) Maybe _Type and _Var are similarly intended.
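In practice _Generic tends to sit behind a lowercase macro anyway, so the keyword rarely shows up at call sites. A small sketch, assuming C11; `type_name` and `type_name_is` are hypothetical helpers, not anything from a standard header:

```c
#include <string.h>

/* Hypothetical helper: maps the type of an expression to a printable name. */
#define type_name(x) _Generic((x), \
    int:     "int",                \
    double:  "double",             \
    char *:  "char *",             \
    default: "other")

/* Returns 1 when the two names match; only here to exercise the macro. */
int type_name_is(const char *got, const char *want) {
    return strcmp(got, want) == 0;
}
```

The selection happens entirely at compile time; only the chosen string literal ends up in the generated code.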
IMHO, the C committee should just copy a subset from C++. In addition to getting new features for free, you also make sure C and C++ are 100% compatible.
It's for backwards compatibility of C compilers (and lack of proper namespaces).
All identifiers starting with an underscore are reserved for use by compiler intrinsics and such. Although most compilers don't complain if your variable names start with a leading underscore, it is recommended to not have such identifiers in your code.
This is so that the new keywords don't collide with existing code (the combination of underscore followed by a capital letter is reserved). For instance until C23, 'bool' was actually called '_Bool' internally.
Just as with stdbool.h before, there could be a stdlib header which wraps those internal names into something more human-friendly.
You can still use virtual function calls in plain C, you just do things the same way that C++ does things internally. Your first member of the struct is a Vtable, and you need to assign that member when you create the object. Your first parameter for the virtual method calls is a "this" pointer.
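A minimal sketch of that layout, with hypothetical `Shape`/`Rect` names chosen only for illustration:

```c
struct Shape;  /* forward declaration for the "this" pointer */

/* Table of virtual methods; each one takes the object as its first parameter. */
struct ShapeVtable {
    int (*area)(const struct Shape *self);
};

/* Base object: the vtable pointer is the first member. */
struct Shape {
    const struct ShapeVtable *vtable;
};

/* A "derived" type embeds the base as its first member. */
struct Rect {
    struct Shape base;
    int w, h;
};

int rect_area(const struct Shape *self) {
    /* Valid because base is the first member of struct Rect. */
    const struct Rect *r = (const struct Rect *)self;
    return r->w * r->h;
}

const struct ShapeVtable rect_vtable = { rect_area };

/* "Constructor": the vtable has to be assigned by hand. */
struct Rect rect_make(int w, int h) {
    struct Rect r = { { &rect_vtable }, w, h };
    return r;
}

/* Virtual call: look the function up through the object's vtable. */
int shape_area(const struct Shape *s) {
    return s->vtable->area(s);
}
```

A caller would write `struct Rect r = rect_make(3, 4);` and then `shape_area(&r.base)`. Forgetting to set the vtable in a constructor is the kind of mistake the compiler cannot catch here.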
The problem with rolling your own virtual tables without compiler support is that devirtualization is generally not possible, even in fairly easy cases.
If by "devirtualization" you mean inlining the virtual method and associated optimizations, that can be done with If-else (checking which vtable it is), then explicitly calling the function. Failure to find a known method can proceed to calling the function pointer.
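A sketch of that if-else approach, assuming a hypothetical `Frobber` interface with two implementations; the known vtable gets a direct call, and anything else falls back to the function pointer:

```c
struct Frobber;  /* forward declaration for the "this" pointer */

struct FrobberVtable {
    int (*frob)(struct Frobber *self, int x);
};

struct Frobber {
    const struct FrobberVtable *vtable;  /* first member, set at creation */
    int bias;
};

/* Two hypothetical implementations of the virtual method. */
static int foo_frob(struct Frobber *self, int x) { return x + self->bias; }
static int bar_frob(struct Frobber *self, int x) { return x * self->bias; }

const struct FrobberVtable foo_vtable = { foo_frob };
const struct FrobberVtable bar_vtable = { bar_frob };

int frobber_frob(struct Frobber *f, int x) {
    if (f->vtable == &foo_vtable)
        return foo_frob(f, x);      /* direct call the compiler can inline */
    return f->vtable->frob(f, x);   /* unknown vtable: ordinary indirect call */
}
```

Every call site that wants the fast path needs this check written out, which is the boilerplate a compiler would otherwise generate.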
Speculative or unconditional devirtualization are reasonable things to do if you are the compiler.
Writing that code by hand is quite unwieldy, since the compiler doesn't know that a `FinalFooFrobber` always has its vtable set to `FrobberVtable foo_vtable`, a fact which often appears only after inlining.
Asking C programmers to do inlining by hand is not sane. This isn't like interpreted languages where inlining sucks - here, the compiler is quite capable of it, just the vtable stuff confuses it.
By the same argument we don't really need C to begin with, because things can still be coded in assembly. It all boils down to convenience of reducing boilerplate. As the meme goes - "You don't need sneakers to run, but they sure help."
And if I understand your point, it's that we should embrace higher level tools rather than trying to build abstractions in lower level tools. In that case OP should just use C++ rather than trying to build an abstraction into C. Going higher level is actually why C++ was created in the first place.
The point was that if there's a certain coding pattern that can be accommodated by adding a new language feature, it is worth considering. The fact that it's readily doable with some effort in some other way is not an argument against it.
C++ started as a reasonable extension to C, but now it's just something else (nor does it even want to be seen related to C now anyway). Extending C with something new, without rushing, is a perfectly fine idea.
> The following for [sic] declaration [sic] are then all equivalent:
int i;
int *ip;
typeof(i) *i;
_Var(_Typeof(i)) *i;
What? How are "int i" and "int *ip" equivalent?
_Var(_Typeof(i)) ident strikes me as incredibly silly. The type value produced by _Typeof(i) should be directly usable as a declaration specifier, without requiring hoisting to a name.
FFS, GNU C has had a sane typeof working for years.
To bind a type to an identifier, we don't need a _Type type at all. typedef does this!
I.e. not:
_Type x = int;
but
typedef int x;
Existing, decades-old typedef is how you bind an identifier to a compile-time type value. There is no need to do this via a _Type type specifier, which then forces us to move the type we want into the initializer. We use the typedef storage class specifier, which then lets us have the type we want as another specifier, and we don't need an initializer.
This works in GNU C:
int x;
typedef typeof(x) tx; // same as typedef int tx;
The problem of declared, named containers to capture type is long solved.
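A slightly fuller sketch of the same idea, assuming GCC or Clang: `__typeof__` is the GNU spelling that works in any language mode, and C23 standardizes the operator as `typeof`. The names `tx_ptr` and `add_tx` are hypothetical:

```c
int x;

typedef __typeof__(x) tx;        /* same as: typedef int tx; */
typedef __typeof__(x) *tx_ptr;   /* same as: typedef int *tx_ptr; */

/* _Generic (C11) confirms at compile time that tx really is int. */
_Static_assert(_Generic((tx)0, int: 1, default: 0), "tx is int");

/* A function declared purely in terms of the captured type. */
tx add_tx(tx a, tx b) { return a + b; }
```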
Is it just me, or are disillusioned C++ refugees who've gone back to C now trying to wreck it a second time? Probably just me. Since the 80's, C++ served as an excellent decoy to absorb craziness and keep C sane.
The only thing I'm confused about is why more C standards keep coming out. C is what it is. Work with the archaic parts of it as need be. You use C for ultimate cross-platform compatibility (every exotic platform and its mother has an ANSI-C/C99 compiler). If you're able to run on the most bleeding-edge C compiler supporting the most recent WG version of C... then why not just use another language? I say this as someone who loves C. If there are bits that feel old, you just create a DSL to get around these problems that compiles down to C... but you don't change C itself. Just my perception working with all this for many years now.
For me the big problem with most C competitors is they go overboard. The only alternative I've tried that sparked my fancy was D with -betterC flag. It's unfortunate that it's not its own language, but rather a "profile" of a much much larger language.
I switched back to C from C++. The reasons were that C removes a lot of unnecessary complexity, which helps with concentrating on the problem at hand. It also removes a lot of features that, when used incorrectly, make the code much harder to understand. So if one often has to work with code written by inexperienced programmers (as I have to do), C++ is very problematic. The stronger type checking is largely a myth.
A trivial example is assigning pointers to incompatible types or passing them to incompatible function parameters not being an error. GCC 14 and Clang 16 are only just now making these errors by default. C++ never had this problem.
Another example is having type-safe generic functions and classes. Using void* to pass around arbitrary types is something that you don't need to do in C++ because it supports generic programming.
In C++ you can for example, pass an array of a specific length to a function, and make it a compile time error to pass an array of incorrect length, using std::array. In C arrays decay to pointers.
static_cast is also a thing in C++ that allows you to cast between types much more safely.
There are so many things that I could probably spend several hours typing them out and I don't think that is really needed.
C++ is certainly more type safe than C. Whether or not you think the tradeoffs it makes are worth it is something to debate for sure, but it gives you tools to write type-safe code that simply don't exist in C.
Ok, let's debunk this point for point:
- passing incompatible types is a constraint error in C just as in C++. Whether this is a warning or not depends on the compiler and on compiler flags. And as you observed yourself, the defaults are now even the same.
- Passing void* around is also generally not necessary in C.
- There is no reason to let arrays decay into pointers in C. One can take the address of an array and then get the same type checking.
- Whether static_cast really makes much of a difference is debatable. In my C code I avoid casts, and the rare exceptions I wrap in type-safe macros.
So in summary, I think this just confirms that C++ people think it is safer because they do not know how you would do this in modern C.
>passing incompatible types is a constraint error in C just as in C++. Whether this is a warning or not depends on the compiler and on compiler flags.
Much of the existing C code out there is filled with these things because they weren't errors until recently, and people ignore warnings. This is why Fedora and Gentoo are doing so much work to port all of their software to "modern C", so that it will actually compile with new compilers, and hopefully fix bugs at the same time. Existing C++ code did not have this problem because it was always an error.
Note that "modern compiler" in this case means bleeding edge, released literally within weeks or months of this post's date. Most distros are not yet compiling the world with these, and most users are not using them.
>Passing void* around is also generally not necessary in C.
This is not true.
I know several examples in real libraries that require this; two off the top of my head are PAM and Wayland. Both of these require void pointers to give you access to some state object that you create and need passed around. The library has no way to know what this type will be ahead of time, and C has no other way to do this.
Another example is data structures or algorithms on data structures like sorting: you either use macros to emulate generics or you use void pointers. qsort is a perfect example.
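For reference, this is what the qsort pattern looks like with the standard `<stdlib.h>` interface. A minimal sketch; `compare_ints` and `sort_ints` are illustrative names:

```c
#include <stdlib.h>

/* Comparator with the signature qsort demands: both arguments are const void *. */
int compare_ints(const void *a, const void *b) {
    int lhs = *(const int *)a;   /* nothing checks that the elements really are int */
    int rhs = *(const int *)b;
    return (lhs > rhs) - (lhs < rhs);  /* avoids the overflow risk of lhs - rhs */
}

void sort_ints(int *values, size_t count) {
    qsort(values, count, sizeof values[0], compare_ints);
}
```

Passing an array of `double` with this comparator, or the wrong element size, compiles without complaint, which is the type-safety gap being discussed.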
>One can take the address of an array and then get the same type checking.
I don't think this is true, C will not include the length in the type. Passing std::array<T, N> in C++ causes an error if T or N are different, in C this length information is not part of the type and does not get type checked.
There is plenty of old C code, but this is irrelevant. Equally irrelevant is that you found a library that uses a void pointer. I can show type-unsafe C++ code as well, and it would be equally meaningless. We are talking about whether C++ is more type safe in principle, and it is not.
Here is the array example:
https://godbolt.org/z/P8494W3Pf
Note how grotesquely bad the C++ syntax is for exactly the same type safety.
>>One can take the address of an array and then get the same type checking.
> I don't think this is true, C will not include the length in the type.
No, you can take the address of an array of length N, as long as you're content with having a function that requires the array be that particular length. For example, if you wanted to write a function that adds one to every element of an array with ten elements, and require that it must have ten elements, you can write it like this:
void add_one(int (*array)[10]) {
    size_t i = 0;
    do {
        (*array)[i] += 1;
    } while (++i < 10);
}
However, you usually don't see people do this, because they want their functions to work on all arrays, regardless of their length. So, they pass a pointer to the first element and a value representing the length.
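The two styles side by side, as a sketch with hypothetical function names:

```c
#include <stddef.h>

/* Fixed-length style: the parameter type only matches the address of an int[10]. */
void add_one_fixed(int (*array)[10]) {
    for (size_t i = 0; i < 10; ++i)
        (*array)[i] += 1;
}

/* General style: pointer to the first element plus an explicit length.
   The compiler cannot check that n matches the real array size. */
void add_one_n(int *array, size_t n) {
    for (size_t i = 0; i < n; ++i)
        array[i] += 1;
}
```

With `int five[5];`, the call `add_one_fixed(&five)` passes an incompatible pointer type and must be diagnosed, while `add_one_n(five, 10)` compiles silently and runs off the end of the array.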
You will see people pass pointers to arrays when writing functions that operate on 2D arrays, because that's how you pass a pointer to the first element of a 2D array. The address of the first element of an array of size 10 arrays of int, is a pointer to a size 10 array of int.
// array_2d: a pointer to the first element of an (n x 10) 2D array
void add_one_to_array_of_array10(int (*array_2d)[10], size_t n) {
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < 10; ++j)
            array_2d[i][j] += 1;
    }
}
In function signatures, array declarations decay to be equivalent to pointers, so you could also write the function signature like this:
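A sketch of the two spellings side by side (the function name is hypothetical). Both declare the same function type, so the compiler accepts them as compatible redeclarations:

```c
#include <stddef.h>

/* Pointer spelling: pointer to the first row of an (n x 10) array. */
void add_one_rows(int (*rows)[10], size_t n);

/* Array spelling: the outermost dimension is adjusted to a pointer,
   so this redeclares the same function with the same type. */
void add_one_rows(int rows[][10], size_t n);

void add_one_rows(int rows[][10], size_t n) {
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < 10; ++j)
            rows[i][j] += 1;
}
```

Only the outermost dimension is adjusted; the inner `[10]` stays part of the type and is still checked.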
Many people, especially since C++ has become a playground for CS language theorists who invent more and more overcomplicated language "features" with little to no practical use.
Getting features into WG21 still requires some effort to get them through, regardless of what people in the outside think, and yeah I do agree it could get some more direction.
C++ may seem to get everything dumped into it. However, any language nerd who feels like diving into the history of languages of similar age (Python, Perl, Ada, C#, Java, F#, OCaml, ...) across all their versions, standard libraries, and main interpreters/compilers will find out that those aren't much better either.
There's a lot of feature gap between C and C++. Many including me like to program in C but want just a _little_ more safeguards, syntax sugar and some established ideas (like defer) that improve the programming experience.
Now there are languages like Odin, Zig and C3 that are trying to fill that niche but having C evolve slowly to accommodate such users is also a great idea.
C99 added Designated Initializers which IMHO is probably the most innovative declarative syntax to initialize anything and it took C++ roughly 20 years to adopt such a thing (and then they had to nerf it).
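A short C99 sketch for anyone who hasn't used them; `WindowConfig` and its fields are made up for illustration:

```c
struct WindowConfig {   /* hypothetical struct */
    int width, height;
    const char *title;
    int fullscreen;
};

struct WindowConfig default_config(void) {
    /* Members are named explicitly, may appear in any order,
       and anything not mentioned is zero-initialized. */
    struct WindowConfig cfg = {
        .title  = "demo",
        .width  = 640,
        .height = 480,
    };
    return cfg;
}

/* Array designators work the same way; unlisted elements are zero. */
const int days_in_month[12] = { [0] = 31, [1] = 28, [11] = 31 };
```

The C++20 version requires designators to follow declaration order and has no array designators, so neither initializer above would be accepted there as written.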
1. Type safety for generic interfaces such as qsort.
2. Language interoperability. Then you need to have a dynamic framework for constructing types and function calls.
Hi Martin, thank you for writing this proposal. This is just my two cents, but one-off void* functions, like qsort, are less of a pain point relative to generic containers. With generic containers it's common to have a collection of void* functions that must be consistently invoked with identical type T. Correct me if I'm wrong, but this proposal cannot genericize a struct field, i.e. it can genericize type 'T' but not 'T->someField'. The latter would be useful for something like 'vec_push(v,p)' where 'v->data[]' is the type T needed to determine if 'p' is a compatible type.
Tangentially related, the macro-based containers you've written here [1] are the best answer for type-generic containers I've come across. One "gotcha" is that the container name must be a valid C identifier, otherwise it doesn't token paste correctly (see Example #2 of your README, where you typedef'd string* as string_ptr to work around this). Would you give consideration to a new preprocessor mechanism for concatenating a list of tokens into a single valid C identifier? I.e. something like CONCAT(struct Foo *) would produce struct_Foo_Ptr? The result is guaranteed to be token paste-able.
Thanks. I agree that data structures are even more important. So I plan to make this work for data structures too (this is not a full proposal yet, just testing the waters):
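The token-pasting gotcha can be reproduced in a few lines. This is a generic sketch with hypothetical `VEC`/`DEFINE_VEC` macros, not the API of the library referenced in [1]:

```c
/* Builds the struct tag by pasting the type name onto a prefix. */
#define VEC(T) vec_##T

/* Hypothetical macro defining a minimal typed "vector". */
#define DEFINE_VEC(T) \
    struct VEC(T) { T *data; unsigned len; }

/* VEC(char *) would expand to "vec_char *", which is not a single
   identifier, so a one-word alias is needed first. */
typedef char *char_ptr;

DEFINE_VEC(int);       /* struct vec_int      { int *data;      unsigned len; } */
DEFINE_VEC(char_ptr);  /* struct vec_char_ptr { char_ptr *data; unsigned len; } */
```

Any type whose spelling is more than one token (`struct Foo`, `unsigned long`, `char *`) needs the same typedef detour.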
#define foo(tag, T) struct tag { _Var(T)* p; }
Then we also need a solution for the problem you mention. One option I thought about is to allow strings as tags: https://godbolt.org/z/cMc3aPjsK And if the builtin that transforms types to strings could be used as a string literal, this could work nicely. But even better may be to just allow parametrizing the tag with a type: struct foo(T)
The other point: For interoperability with other languages you often need to construct function calls at run-time. For this one needs to be able to describe types. Adding this by external libraries is always a pain because one has to carefully support the ABI of all supported architectures. A compiler can support this seamlessly. A _Typeof operator that returns a type that can be passed around and interrogated at run-time solves this in a very elegant and simple way.
Generic libraries. There are programming domains in the embedded world where code reuse can shorten the development timeline in a significant way, which gives engineers more time to test and verify.
Aviation would be my personal experience. We have to implement services on the aircraft for structured communication support with the ground tower. Those would really benefit from something like that, since we wouldn't have to implement a backend for every project and we could simply reuse the whole library everywhere.
I use C at my day job and our company works with ANSI C. I believe it's C89. I was curious to find out - who's using C23 in prod? Please let me know in the comments. I'm super curious!
We make file formats. It's all C and we work on the low-level FFmpeg C API and low-level jpeg & brotli apis. We use GCC. I didn't know there's C89 and gnuc89 lol
It's really frustrating to watch this happen, first with using arrays to mean non void pointers, then with _Generic for dispatching on types, and now this. The worst part to me is not necessarily that they're complicating or "bloating" C, because these are necessary features. The worst part is that they seem to simultaneously recognize these features are necessary and not want to commit to a full and proper implementation of them, e.g. proper generics or simple templates. Instead they create stuff like this, where they haven't fully committed, so it's weird and awkward to use and doesn't feel fully integrated into the language, with less clear semantics. In the end it ends up kind of worse than C++. What I wish they'd do, if they're going to bring features from C++ to C, is just fully and properly implement the feature, but make it as simple and limited and basic as possible. The basic concept of templates and monomorphization makes sense to me, but I'm not exactly sure how this works at compile time.
At this point, if I want generics, I'd rather just use C++.
Note that a type system with polymorphic and dependent types is far more powerful than generics or simple templates. In my opinion monomorphization is rather bad, because it creates a lot of compile-time bloat while still not being very expressive from a type system point of view.
It is not more unsafe than fixed-sized arrays on the stack and stack clash protection (which you need anyway) protects against this. Also if you compare with C++ and use std::vector, surprise, a CVE about to happen: https://godbolt.org/z/cTG71aTsf
(yes, one can activate library assertions, but still - by default - unsafe)
C++ has added more new design warts on top of C than C ever had though (which means that C++ is now caught in an infinite cycle of trying to fix warts it had introduced itself a couple of years ago).
At least in C there's a possibility to learn from C++'s mistakes, and only integrate the 'good parts' (and the C++ template system certainly isn't one of those good parts).
Yeah, that would be the ideal: take just the simplest, cleanest version of C++'s good parts and integrate them. But like the sibling post said, that's not what they're doing. Instead they're designing their own new versions of each feature that they add, and because it seems they're not willing to fully commit to new features or changing the language much, the features are always weirder, with more gotchas, and less baked than the C++ versions. It isn't so bad right now because the language is overall much smaller, but if they eventually grow to anything close to the size of C++ (and C++ isn't even that big of a language anymore compared to other modern languages) it'll actually be way worse.
Having switched from C++ to C, I think C is a lot less weird than C++. But I also observed that C++ programmers usually do not "get" C (but think they do, because "it is just a subset") and often think some parts are weird simply because they are different.
In any case, if you need a clean subset of C++ (but nobody I have met really agreed about what this is) nobody is stopping you from defining one.
> Having switched from C++ to C, I think C is a lot less weird than C++.
Overall? Without a doubt. But these specific features they've been adding are implemented very unnecessarily strangely just to avoid fully committing to them, so that they end up in this design space where they may not be weirder than the full Lovecraftian horror of the equivalent C++ feature, but they are much weirder than the ideal/abstract concept of the equivalent C++ feature, which is what I wish they would have pulled in. Like, looking at this proposal, it isn't clear to me at all how the compiler is actually compiling or the runtime is executing these polymorphic function calls at all, in cases where they don't just collapse to void*. And it isn't clear to me how exactly type scoping and substitution works either. Whereas in concept at least templates are crystal clear and also easy to expand to do much more advanced macro-less metaprogramming, like D does.
> and often think some parts are weird simply because they are different.
I have way more experience with C than C++, but I'm usually a Rust programmer which works more like C++ so maybe that's where my bias is, but I really think this implementation is unnecessarily weird in general too — I've used a lot of languages.
It seems some things are crystal clear to you because you are already used to them, but here you do not "see" how this will work, so you have your doubts. Fair enough.
To me it is crystal clear how this will work, and I much prefer this to generics in C++ and Rust or other similar languages.
But I am not saying that what C++ and Rust do is wrong. So if you prefer this, those languages are there, ready to be used. I just think there should also be a good alternative for people who do not like how this works there.
Curious, why do you think Rust does not look like a better C++ (in the sense of not making the mistakes of it)? Is there something you find impractical?
Yeah, I'm a huge Rust fan, but some of the decisions they've been making with the newer RFCs, especially around things like convenience syntax, and the fact that they now have async/await and neutered coroutines, instead of having proper coroutines capable of doing both ala Kotlin, among other things, seem really iffy to me. I'm afraid it'll turn into Scala or something soon, with a huge bag of redundant or partially overlapping features.
I've been somewhat successful using Ada instead. It has a lot of inconveniences of its own, but compared to C++ it's a language where it's hard to get things to work at all, but once you get them to work, they are likely to do what you want, whereas in C++ it's easy to get things to compile, but then heaven only knows what that code will do.
Also, Ada has a very decent interop with C. To the point that w/o any prior experience, I needed to use some functions from SQLite that weren't already exposed through Ada, and it was a pretty smooth sailing: very little "glue" code, all code that connected C and Ada was written in Ada.
My guess is this is what most people do. The few remaining holdouts are busy turning C into something that isn't C because they don't like things that aren't C.
In a way, Zig to me is Modula-2 with a revamped syntax for C minded folks, with a nice meta-programming story. In terms of language features.
Personally I am not a big fan, because it doesn't have a good story for use-after-free (other than adopting analyser tooling similar to that of C-derived languages), if I want to type @ all over the place I can use Objective-C, and the community doesn't seem very keen on supporting binary library distribution.
That is me, we don't have to like all the same things.
Even with the UAF caveat, it is still better than plain old C.