C23 is finished: Here is what is on the menu (thephd.dev)
257 points by ingve on Aug 5, 2022 | 196 comments



> N2888 - Exact-width Integer Types May Exceed (u)intmax_t

This creates a problem. How do you portably printf() an integer type that you don’t (and can’t) know the size of, like, say, a uid_t?

Before, when intmax_t was guaranteed to be the largest type, you could reliably and portably cast a uid_t userid to an intmax_t and printf() that: printf("%" PRIdMAX "\n", (intmax_t)userid);

But now, when any type might be larger than an intmax_t? What do you do?

(The same problem exists when reading an integer string; how do you read, say, a port number, when you don’t know what type in_port_t is? Previously, you could call strtoimax() on the string, detect overflow, and then cast the resulting intmax_t to in_port_t and finally compare for equality to make sure it wasn’t truncated.)
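
Concretely, the pre-C23 idiom looked something like the sketch below (parse_port is a made-up name; in_port_t comes from POSIX):

    #include <errno.h>
    #include <inttypes.h>   /* strtoimax, intmax_t */
    #include <netinet/in.h> /* in_port_t (POSIX) */
    #include <stdbool.h>

    /* Parse a decimal port number, detecting both overflow of intmax_t
       (via errno) and truncation by the final cast (via a round trip). */
    static bool parse_port(const char *s, in_port_t *out)
    {
        char *end;
        errno = 0;
        intmax_t v = strtoimax(s, &end, 10);
        if (errno == ERANGE || end == s || *end != '\0')
            return false;            /* overflow, or not a pure number */
        if (v < 0 || (intmax_t)(in_port_t)v != v)
            return false;            /* the cast would truncate */
        *out = (in_port_t)v;
        return true;
    }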

> There is active work in this area to allow us to transition to a better ABI and let these two types live up to their promises

Yes please. Introducing new wider integer types in your CPU, but pretending you haven’t created a new architecture with an associated new ABI, is silly.


I mentioned this in another article of mine. Robert Seacord closed the loop on "how do I print numbers the non-shit way?" in C23 with N2680:

https://thephd.dev/c-the-improvements-june-september-virtual...

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2680.pdf


Great, thanks! But that only solves printing. How do I do the reverse operation? I.e. parsing a string to an integer type of an unknown size, while checking for overflow?


I think this would work: first, determine the range. For unsigned variants, we have whatever_max = (whatever)(-1), if it isn’t already defined.

That makes parsing the unsigned variant relatively simple. Undefined behavior would be end-of-game, but unsigned addition and multiplication will not invoke it (if you need help, look at your favorite multiple-precision library for ideas about detecting wraparound).

For parsing signed integers, derive the min and max values of the signed variant from the max value of the unsigned one ((whatever_max - 1) / 2, or something like that).

Then parse your input to the unsigned variant first, and check whether the result fits in the signed range.
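
A sketch of that recipe, assuming two's complement (which C23 now mandates) and that the signed and unsigned variants have the same width; whatever_t/uwhatever_t are made-up stand-ins for the unknown pair:

    #include <stdbool.h>

    typedef long          whatever_t;   /* stand-ins for the unknown types */
    typedef unsigned long uwhatever_t;

    #define UWHATEVER_MAX ((uwhatever_t)-1)
    #define WHATEVER_MAX  ((whatever_t)(UWHATEVER_MAX / 2))
    #define WHATEVER_MIN  (-WHATEVER_MAX - 1)

    /* After parsing the magnitude into a uwhatever_t, range-check it. */
    static bool fits_signed(uwhatever_t magnitude, bool negative)
    {
        if (negative)
            return magnitude <= (uwhatever_t)WHATEVER_MAX + 1; /* |MIN| */
        return magnitude <= (uwhatever_t)WHATEVER_MAX;
    }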


¯\_(ツ)_/¯

Someone's got to write a paper to get it in, hopefully based on some sort of existing practice.


I think it's best to regard intmax_t as a failed experiment. E.g. GCC has long supported __int128, clang has _ExtInt, other compilers have similar extensions, but they refuse to increase intmax_t due to ABI concerns.


If people use intmax_t and then expect the ABI to stay stable, then they are at fault, not libc, LLVM or GCC for changing it when the actual "max int" changes.


But all major C compilers promise that they won't make ABI changes. So I'd rather say that standardizing intmax_t was a mistake.


If you don't standardize intmax_t, programs will invent the concept for themselves through some combination of preprocessing directives and shell scripts that probe the toolchain.

If you want to know "what is the widest signed integer type available", you will get some kind of answer one way or another.
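
For instance, a minimal sketch of the kind of probing that happens in the wild (my_widest_int is a made-up name; __SIZEOF_INT128__ is a GCC/Clang predefined macro):

    #include <stdint.h>

    /* If the compiler advertises __int128, treat it as "the widest";
       otherwise fall back to intmax_t. */
    #if defined(__SIZEOF_INT128__)
    typedef __int128          my_widest_int;
    typedef unsigned __int128 my_widest_uint;
    #else
    typedef intmax_t  my_widest_int;
    typedef uintmax_t my_widest_uint;
    #endif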


Or even worse, if they don't know their build tools, some kind of union based monstrosity...


GCC changed long to 64-bits and x86 codebases had to accept that. The base integer types are designed to be variable sized and you code accordingly based on the minimum size guarantee. Assuming types are frozen in time is how you get things like LLP64.
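
Coding to the guarantees rather than to a frozen ABI can look like this minimal sketch:

    #include <assert.h>  /* static_assert (C11) */
    #include <limits.h>

    /* The standard only promises minimums (long >= 32 bits). If the code
       genuinely needs more, say so explicitly instead of assuming an ABI: */
    static_assert(sizeof(long) * CHAR_BIT >= 64,
                  "this code requires a 64-bit (or wider) long");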


No, gcc added support for ABIs that defined long as 8 bytes. They didn't change the size of long for any ABI.


Long is 32 bits on x86, 64 bits on x86_64. x86 code bases broke because they made non-portable assumptions.


No. It's 32-bits minimum. It can be anything larger. That's what the language says. Assuming otherwise is nonportable.


Why are you saying “no” and then agreeing with what I wrote? Am I misunderstanding something?


Because it isn't "64-bits on x64". It's 32-bit minimum on all platforms. Assuming larger sizes than the minimum is how brittle code gets written. A compiler vendor can set the sizes to whatever they want regardless of the target architecture and claim conformance if they meet the minimum.


Talking about the SysV ABI. There are multiple standards involved, not just the language standard itself.

> A compiler vendor can set the sizes to whatever they want regardless of the target architecture and claim conformance if they meet the minimum.

Compilers claim conformance with other standards besides just the C standard.


You are confusing ABIs with what the C standards mandate. A compiler emitting code for a particular architecture needs to conform to that architecture's ABI, or else it won't be able to interface with syscalls or other libraries. This goes for other things too, like calling conventions and setup of stack frames. As klodolph states, long is 64 bits on x86_64 (32 on x86_64-win) and 32 on x86.


Because "32 bits or greater" != "exactly 32 bits"


Right. One is what the C standard says, and the other is what the ABI standard says. Maybe I’m missing something here.

Obviously, when someone says “type X is size Y on architecture Z”, they’re not talking about the C standard, they’re talking about some particular ABI.


> the other is what the ABI standard says

There are multiple ABIs for x86_64. I think the major compilers use 32 bits for int and 64 for long, but there could very well be a compiler that used different sizes, with a different ABI.


Yep, the x32 ABI on Linux is based on the amd64 instruction set, registers, etc. but uses 32-bit long and pointers (which in my experience is a nice performance boost, I measured up to 20% in some cases)


Right but the reason we don’t have ILP64 is mostly legacy code.


The mistake was not properly versioning the ABI


I feel the fault instead lies with the operating system, for wanting to run old binaries so much that it pretends the system architecture (and associated ABI) has not changed, even when there exist larger integers than what the previous architecture had set as intmax_t.


The problem is not old binaries, it is old code that assumes an upper bound to the size of intmax_t.


I’m not sure what you mean. I agree that the problem is not the binaries, but I believe the problem to be the operating system. The source code may not have assumed a specific upper size limit on intmax_t, but the binaries were compiled in good faith for a specific machine architecture, which had a specific width of intmax_t, namely the largest integer type which that architecture could support. If new CPU models then come out with larger integer types, but the operating system still runs that same binary straight up (without any emulation layer), this exposes the old binary to larger integer types, breaking the binary. The binary is not at fault; it’s the operating system which committed the sin of running the binary on what is effectively a different architecture without any consideration of what architecture the binary was actually compiled for.


The OS could do this for old binaries but not for old source. A lot of FOSS code has baked-in assumptions about the max size of intmax_t, and the compiler and the OS cannot do much about broken logic.


> A lot of FOSS code has baked in assumptions about the max size of intmax_t

Well, that code is not, and was not ever, standards compliant.


And it turns out that doesn't matter much in terms of adoption and popularity.

After all, if predictable behavior in the future across all possible configurations were a high priority requirement for the tools people use to make computers do things, neither C nor C++ would have gotten off the ground. Instead, people don't memorize the entire standard. They write code that works on their machine, publish it, fix the bugs when people say it doesn't work on their machine, and maybe converge towards something that is standards compliant but, more likely, they converge towards something that works on the standard implementation on 90% of the operating systems available.


> A lot of FOSS code has baked in assumptions about the max size of intmax_t

Would you be able to name three examples?


I realize I used FOSS improperly, I had no reason to single it out; my comment reads like a critique of open source which was not my intention.

My intention was to make a good case for old code being compiled today by third parties.

But to my understanding intmax_t was introduced in C99, before the widespread dominance of 64-bit platforms. So it would not be surprising that there exists code with lower assumptions about its size.

But actually I also made another mistake: the major problem had little to do with faulty logic and more to do with dynamic linking [0]

I will see myself out today.

[0] https://thephd.dev/intmax_t-hell-c++-c


intmax_t can rationally be regarded as being outside of the ABI concept. It should give you the maximum integer available, not the maximum integer that your compiler had 15 years ago for the sake of compatibility.

Anyone who uses intmax_t in a durable API (public function argument, member of public structure) is just a goof.


> It should give you the maximum integer available

How would you compile that to machine code which would, when run, still give you the maximum integer on next year’s CPU with all-new super-duper wide integers?

It is more reasonable to think of a binary to be compiled to a specific architecture triplet, and if the CPU changes, the architecture must change, and therefore the ABI, and if the OS wants to run a binary from an older architecture, some emulation layer is needed.

Of course, this would be a lot of work for the operating system people if they want to run binaries from lots of what are now different architectures, and it was apparently easier to just force the C standard to abandon the entire concept of intmax_t being the largest.


In C, you get what you pay for. When the platform changes, it's not unreasonable to recompile in order to take advantage of new hardware features. The OS is certainly different, even if it's still called "Whatever OS", it had to be recompiled in order to support the new hardware, unless, of course, it doesn't. We're all lazy when it comes to our own work, whether it's supporting new hardware in a compiler or OS or super web service on some new cloud platform or a large system vendor.

intmax_t by intent is the maximum integer available on a given platform. Abandoning that idea is a disservice to compiler writers, programmers -- especially scientific programmers -- everywhere.


Oh, I agree completely. Distributions should have bit the bullet and announced a new architecture when the new CPUs with new integer types were released.


> if the CPU changes, the architecture must change, and therefore the ABI

That would add a lot of friction to changing CPUs, and make distributing software more complicated.


Lotta goofs write a lotta code, though. I'm not sure that I disagree with you, but this feels like the sort of thing that could really hurt to break.


There are a lot of goofs out there running goofy binaries for which they have lost the goofy source code. These people are unfortunately also the committee’s problem.


OK, fine, but how, then, can the two problems I described be solved?


Well, what did you do before intmax_t was added to C99?

- POSIX defines that in_port_t is equal to uint16_t, so for that type there's no problem at all ;)

- Don't use printf/scanf for this, roll your own

- Refuse to print/read uid_t that are outside intmax_t range

- Use an autoconf test


My program was first written when intmax_t already existed; AFAICT, intmax_t was introduced in C99. I also need to parse and/or print uid_t, gid_t, and pid_t, so I guess that in the future I’ll have to implement my own integer parser and printer, like user “orlp” suggested elsethread.

Until then I guess I’ll have to do something like

  #if sizeof(pid_t) > sizeof(uintmax_t)
  #error Nope
  #endif
…repeated for all the types I need to parse and/or print.


You can't use sizeof in #if, but you can use C11 static_assert instead.

Also this doesn't just apply "in the future" -- integer types larger than intmax_t already exist, C23 just updates the standard to match reality.
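
For example (a sketch; pid_t is from POSIX):

    #include <assert.h>     /* static_assert (C11) */
    #include <stdint.h>
    #include <sys/types.h>  /* pid_t (POSIX) */

    /* The same compile-time check, no sizeof-in-#if needed: */
    static_assert(sizeof(pid_t) <= sizeof(uintmax_t), "Nope");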


You can do this:

  #if SIZEOF_PID_T > SIZEOF_UINTMAX_T
  ...
  #endif
Where you detect these sizes from the toolchain and deposit them as #define constants in some "config.h" header.

There are ways to detect sizes by compiling a source file to an object file, and then analyzing the object file (no execution), so things work under cross-compiling.

I've used a number of tricks over the years, and settled on this one:

https://www.kylheku.com/cgit/txr/tree/configure?h=txr-278#n1...

The basic idea is that we can take a value like sizeof(long), which is a constant, and using nothing but constant arithmetic, we can convert it to the characters " 8". These characters can be placed into a character array where they are delimited by some prefix that we can look for in the compiled object file with some common utilities.

This is quite durable in the sense that it can be reasonably expected to work through a wide variety of object formats.

In that test program, I have a structure with some character arrays. The larger character arrays hold the informative prefixes. The two-byte arrays hold decimal digits for sizes. Two digits go up to 99 bytes so we are safe for a number of years.

The DEC macro calculates the ASCII decimal value of its argument, expanding to a two-byte initializer for a character array:

  #define D(N, Z) ((N) ? (N) + '0' : Z)
  #define UD(S) D((S) / 10, ' ')
  #define LD(S) D((S) % 10, '0')
  #define DEC(S) { UD(S), LD(S) }
E.g.

  DEC(42) -> /* the equivalent of */ { '4', '2' }
But actually

  DEC(42) -> { UD(42), LD(42) }
          -> { D((42) / 10, ' '), D((42) % 10, '0') }
          -> { ((42) / 10) ? ((42 / 10)) + '0' : ' ',
               ((42) % 10) ? ((42 % 10)) + '0' : '0' }
The space instead of a leading zero is so that if we pull this into shell arithmetic, it isn't confused for an octal number.


That problem could be a really good application for a _Generic macro, which is another newish language feature in C.
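
Something along these lines, as a minimal sketch (FMT_OF and PRINT_INT are made-up names, and types not listed would need their own cases):

    #include <stdio.h>

    /* Select a format string from the argument's type at compile time. */
    #define FMT_OF(x) _Generic((x),        \
        int:                "%d\n",        \
        long:               "%ld\n",       \
        long long:          "%lld\n",      \
        unsigned int:       "%u\n",        \
        unsigned long:      "%lu\n",       \
        unsigned long long: "%llu\n")

    #define PRINT_INT(x) printf(FMT_OF(x), (x))

    int main(void)
    {
        unsigned long uid = 1000;  /* stand-in for a uid_t value */
        PRINT_INT(uid);            /* picks "%lu\n" automatically */
        return 0;
    }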


To the first problem you can use a classic mod 10 divide by 10 loop to convert into a string digit by digit.

Similarly for the second problem you can read digit by digit in reverse order and create the number using x = 10x + d while detecting overflow.

I'm not saying that this is a great thing to need to do, but the mentioned problems aren't particularly fundamental and the solutions are CS101 stuff.
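
For illustration, minimal sketches of both loops, written against a made-up stand-in typedef (substitute whatever uid_t-style type you are dealing with):

    #include <stdbool.h>
    #include <stdio.h>

    typedef unsigned long long my_uint;    /* stand-in for the unknown type */
    #define MY_UINT_MAX ((my_uint)-1)

    /* Problem 1: print digit by digit (mod 10 / divide by 10). */
    static void print_my_uint(my_uint v)
    {
        char buf[64];                      /* plenty for 64-bit and beyond */
        char *p = buf + sizeof buf;
        *--p = '\0';
        do { *--p = (char)('0' + v % 10); v /= 10; } while (v);
        fputs(p, stdout);
    }

    /* Problem 2: parse with x = 10*x + d, refusing to overflow. */
    static bool parse_my_uint(const char *s, my_uint *out)
    {
        my_uint v = 0;
        if (*s == '\0') return false;
        for (; *s; s++) {
            if (*s < '0' || *s > '9') return false;
            my_uint d = (my_uint)(*s - '0');
            if (v > (MY_UINT_MAX - d) / 10) return false; /* would wrap */
            v = v * 10 + d;
        }
        *out = v;
        return true;
    }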


You may be right that it’s the best solution. It’s horrible to contemplate, but it might be so.


The best solution is to say that uid_t et al are probably no larger than uint64_t, and introduce a static_assert that makes the compilation fail if that's not the case.

Sure, someone might come up with an architecture where that's not true. But even in an intmax_t-is-really-the-largest-integer-type world [which hasn't been true for the major C compilers for the past decade or so!], I can almost assuredly come up with some compilation modes that conform with whichever C version you use but still cause your code to fail horribly. I don't think there's any architecture where uid_t et al would exceed uint64_t, and until such an architecture comes to exist, it's quite frankly not worth caring about.


Oh I agree, it's bad.


> "To the first problem you can use a classic mod 10 divide by 10 loop to convert into a string digit by digit."

But the "physical" size of the number as represented in memory can change across platforms, no?


Maybe C just needs to provide a way to print/format any built-in type without specifying it. They got stdc_popcount to work with any integer type (it looks like function overloading in C through macros), so can the same be done for printing integers?


Don’t forget the other side of the coin; we also need to read a string into any integer type (while also detecting overflows).


> But now, when any type might be larger than an uintmax_t? What do you do?

Complain to POSIX so they add printf macro defines for all their implementation-sized integer types by the year 2100.


For printf(), that would work. For sscanf(), POSIX could add macros there, too, but sscanf() does not detect integer overflow. That’s why the only previous solution was strtoimax() which does detect overflow.


scanf not detecting integer overflow is arguably another problem. It's usually the easiest way to parse numbers out of simple strings, but the lack of overflow detection is a problem. (Actually we use sscanf for this and ignore the overflow problem, as long as it doesn't cause any security or correctness concern.) IIRC glibc actually does overflow detection, so there's not anything inherently problematic with adding it.


> "How do you portably printf() an integer type that you don’t (and can’t) know the size of"

"Portably" across what? You need to narrow down the problem space and better define it.

If it's portable across a myriad of processor architectures, hardware profiles and operating systems, then you would need to take the intermediate representation route (like Java). There's very little else that can be done in the face of infinite, non-compatible ABIs at every layer.


I mean portable over all platforms which claim to support standard C.

> you would need to take the intermediate representation route (like Java). There's very little else that can be done in the face of infinite, non-compatible ABIs at every layer.

No, I don’t think so. There’s an obvious solution, dictated by the architectural choices made by C (i.e. every architecture which introduces a new larger integer type by necessity redefines intmax_t and creates a new ABI), but the community (for reasons of their own) did not want to do it that way, and instead broke the promise of intmax_t always being the largest type.


I added one of those features, #embed, to my Cedro C source code preprocessor and posted about it yesterday: https://news.ycombinator.com/item?id=32341326

Here is the direct link: “Use #embed from C23 today with the Cedro pre-processor” https://sentido-labs.com/en/library/cedro/202106171400/use-e...

As I write there, “The advantage of using Cedro is that the source code is the same and this way it is very easy to use different compilers, only adding or removing the #pragma line.”

I also added a command-line option to embed the bytes as strings instead of byte literals; embedding an 8 MB file, the strings variant compiled between 28 and 72 times faster depending on which compiler I used, gcc or clang. It is more efficient, but less compatible; that’s why it’s optional.

I’m considering adding some low-hanging fruit, like the number literal separators: 1'384'849 → 1384849


For number literal separators, I'd suggest using underscore rather than apostrophe. Ada, C#, D, Haskell, Java, Kotlin, OCaml, Perl, Python, PHP, Ruby, Go, Rust, Julia, and Swift all use the underscore. C++ wanted to use the underscore, but was forced to use the apostrophe because of user-defined literals.


Yes, I meant implementing it as specified for C23 which was done for compatibility with C++, although I do prefer the underscores.

Maybe that’s a better idea, implement it with underscores in Cedro as it does not need C++ compatibility.

Edit: alright, it works. I also extended the parser to accept the apostrophe as part of number tokens. The only thing it does is to remove the underscores when writing.

I might need another command line option to specify whether the output should be C23:

    1_234_567 → 1'234'567
or pre-C23:

    1_234_567 → 1234567
    1'234'567 → 1234567


> N3042 - Introduce the nullptr constant

They should have formally defined NULL as void* instead.

> Someone recently challenged me, however: they said this change is not necessary and bollocks, and we should simply force everyone to define NULL to be void*.

I'm glad someone brought it up. The scenarios nullptr "fixes" are so minor that adding a new null concept to fix them is the sort of overkill solution I expect from the C++ committee, not the C committee. It's a shame this wasn't thought out more.

> I said that if they’d like that, then they should go to those vendors themselves and ask them to change and see how it goes.

The C committee has already broken compatibility on numerous occasions: forcing 2's complement representation, changing the type of u8"" string literals from char to unsigned char, removing mixed wide string literal concatenation (allowed in C11, disallowed in C23). It's difficult to be convinced by the argument of backwards compatibility when the committee has already broken it on multiple occasions. Furthermore, I seriously doubt compiler vendors who define NULL as 0 would take issue with redefining it as void*. This change is so minor and inconsequential there should be no push back.


I am not all that standards savvy, so pardon my ignorance.

>Introduce the nullptr constant

>They should have formally defined NULL as void* instead.

That doesn't make any sense to me. void* is a type isn't it? nullptr is a value. I don't see the overlap.


> That doesn't make any sense to me. void* is a type isn't it? nullptr is a value. I don't see the overlap

The C standard permits NULL to be defined as (void*)0 or 0. The issue is that this creates an inconsistency when NULL is used with _Generic selection: a compiler defining NULL as (void*)0 will cause NULL to be caught by a pointer case, whereas a compiler defining NULL as 0 will cause it to be caught by an integer case. This inconsistency could be solved by mandating that NULL be defined as (void*)0 so it will always be caught by the pointer case.
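
A small program makes the inconsistency visible (KIND is a made-up macro):

    #include <stddef.h>
    #include <stdio.h>

    #define KIND(x) _Generic((x),       \
        void *:  "pointer",             \
        int:     "integer",             \
        default: "something else")

    int main(void)
    {
        /* Prints "pointer" where NULL is (void*)0, "integer" where it's 0. */
        puts(KIND(NULL));
        return 0;
    }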

The issue is that instead of mandating (void*)0, the C committee decided to introduce a whole new nullptr concept. This duplicates the concept of null in the language and requires all C compiler vendors to modify the language's type system to account for it. So it creates more work for vendors and duplicates the concept of null, all to solve this minor use case.


For anybody else who has no idea what C23 is and for whom the article's intro really wasn't helpful, this is about a proposed major standard revision for the C language (https://en.wikipedia.org/wiki/C2x).


> memset_explicit

Exciting to see this in the standard. Once this is in all compilers, this will probably have security effects beyond C.

Fun fact: Miguel Ojeda linked to @cperciva's article and its following HN discussion in the note (N2897) to propose this: https://news.ycombinator.com/item?id=8270136


I'm confused, why didn't they just mandate memset_s instead of introducing memset_explicit?


My understanding, after reading https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2897.htm is that it was simpler to "salvage" a single function from "Annex K" by renaming it, considering that "Annex K" was both optional and controversial (some called for removing it).


Because people don't care about security (or they just have no idea), hence they roll their own _FORTIFY_SOURCE hacks, but they still don't get memset_s right. So they named it for what it does: protect from compiler optimizations, but not from cache side-channel attacks, which the _s variant would, because that is the secure variant.

memset_explicit is just to overcome stubborn optimizer people who insist on optimizing functions away (without warnings!), even when they have no idea about side effects.


> but not from cache side-channel attacks, which the _s variant would do

Wow, was that a requirement for memset_s? I'd never heard of timing being something guaranteed by the *_s functions, regardless of security being in their name.


When the ISO standard Annex K was written, they didn't know of those security problems yet. No one knew. But it is documented to be the secure variant, so known vulns need to be fixed there. It can be slow, it just needs to be secure.


Because Annex K sucks and should be removed entirely.


For those, like me, who weren't up to speed with why this argument is made, https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1967.htm seems to be a good summary of the problems with it.

(if it isn't, hopefully somebody who knows more will post a better link)


I'm not familiar with C at all, but I'm wondering why they chose to add a keyword just for this specific case instead of providing a way to granularly tell the compiler not to optimize specific lines of code or what optimizations are permitted.


This is mainly useful for implementing things like explicit_bzero(), to guarantee memory is zeroed before freeing it (free() makes no guarantees about the contents of freed memory). This is hugely important for discarding things like private keys.

Up until now, safely implementing explicit_bzero() has depended on various hacks that a new compiler version might drag out from under you. This should end the "arms race".
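
One classic such hack, sketched (memset_v is a made-up name): route the call through a volatile function pointer, which the compiler must actually read at the call site, so it can't prove the call is memset and elide it:

    #include <string.h>

    /* volatile forces a real load of the pointer before every call,
       so the store to the buffer can't be optimized away. */
    static void *(*const volatile memset_v)(void *, int, size_t) = memset;

    static void secure_zero(void *p, size_t n)
    {
        memset_v(p, 0, n);
    }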


I'm guessing that this would have been either too complex, or taken too many years. The pragmatic, lower scope approach, already required quite a lot of work.


Volatile is exactly for this case. If you write or read a volatile variable then the compiler isn't allowed to remove the operations for any reason.


It's a bit more subtle than that; variables aren't volatile, objects are volatile. A volatile pointer is a hint to the compiler that the object pointed to might be volatile. If it can't disprove that "might" into "definitely not" then it treats it as volatile.


That's what I thought, but apparently this is wrong.

Had a discussion about this here:

https://news.ycombinator.com/item?id=27804810


I don't believe they are correct, or at least their arguments don't explain why it wouldn't work. Volatile wouldn't work with memset, since memset doesn't take a pointer to volatile memory, that is true; but you could set the volatile data yourself, and it shouldn't be optimized away as per the spec. The data living on the stack doesn't matter; volatile data on the stack works perfectly fine.

I tried the following code in a few compilers; it works perfectly for overwriting buffers that go out of scope. I also tested removing the volatile to ensure that the memory write was then ignored, and I reproduced the vulnerability where the previous message was still there the second time I tried to read a message. From the way I read the spec, this shouldn't be optimized away by any compiler:

    void memset_explicit(char* p, char c, size_t n) {
        volatile char* vp = p;
        while (n--)  *vp++ = c;
    }
Casting other data types to a pointer to volatile data is within the spec, and the compiler should then treat it like any other volatile data.


It's a shame that attribute((cleanup)) wasn't standardized since it's very widely used in real C code, but at least the badly designed and massively overengineered "defer" suggestion was rejected.


Widely used in GCC C code.


It's supported by Clang, so widely used in open source projects - systemd, libvirt, everything using glib, some Linux kernel tools, dracut, qemu, glibc, ..


Apparently, no one has bothered to update clang's documentation in that regard: https://releases.llvm.org/14.0.0/tools/clang/docs/AttributeR...

Many of those open source projects happen to only compile with GCC, exactly because they rely on GCC C.

Google has spent several man years making the Linux kernel compilable with clang, and not all of it has reached upstream to this day.


https://clang.llvm.org/docs/AttributeReference.html#cleanup. We use it in several projects and compile those with clang.

Here's nbdkit compiled with clang 14:

https://gitlab.com/nbdkit/nbdkit/-/jobs/2814022460

  checking whether clang is Clang... yes
  [...]
  checking if __attribute__((cleanup(...))) works with this compiler... yes


Thanks for the example.


> and not all of it has reached upstream to this day.

Which parts haven't? Asking as the project lead.


As far as I can tell, only Android's Linux kernel compiles cleanly out of the box with clang, not the regular Linux kernel.

At least that is my impression from occasional Linux talks.

By the way what's up with NDK, Android packages, and that roadmap with better C++ development?

Thankfully no longer a problem of mine.


> not so with regular Linux kernel.

  $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
  $ cd linux
  $ make LLVM=1 -j$(nproc) defconfig all -s
  $ echo $?
  0
  $

See also the official kernel docs which describe more which architectures can be expected to compile cleanly out of the box. There are something like 10^6000 possible kernel configurations though, so no promises (regardless of toolchain). https://www.kernel.org/doc/html/latest/kbuild/llvm.html#supp...

We support clang-11 and newer.

Linus Torvalds has been using clang to build the kernels he runs personally, for ~2 years now.

https://lore.kernel.org/lkml/CAHk-=wiN1ujyVTgyt1GuZiyWAPfpLw...

>> I build the kernel I actually _use_ with clang, and make sure it's clean in sane configurations

Not sure what's going on these days in Android land, thankfully no longer a problem of mine.


I stand corrected, and apparently the love for the NDK experience goes both ways.


Yes, Clang is pretty bad about updating their documentation of GCC features they support.


> N3020 - Qualifier-preserving Standard Functions

I just read the actual proposal on it. They describe an example implementation using macros and _Generic.

What’s really bad about this is that the arguments provided to these macros are used more than once in the macro body. This will cause:

- gibberish compiler errors in case of typos in the arguments,

- worse compilation times, as the compiler has to process more code.

Constructs like these may lead to an exponential explosion in preprocessed source code size. I hope that nobody implements it that way.


I've encountered this problem in real code bases when you have a getter function for a complex data structure. You have to decide between:

    field *my_struct_get_field(my_struct *s)
    const field *my_struct_get_field(const my_struct *s)
In practice you will choose the first instance, so you simply won't use const where you could. Which is unfortunate.

It would be nice to have a shorthand for this. The nested macro shown in the paper looks like it can fix the problem, but is quite complicated. I would like something like:

    autoconst field *my_struct_get_field(autoconst my_struct *s)
The returned pointer inherits the qualifiers of the argument. Inside the getter function the pointer is const, and there is only one implementation of the function.
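
Lacking that, something close can be sketched today with _Generic: one non-const implementation plus a dispatching macro (all names here made up):

    typedef struct { int value; } field;
    typedef struct { field f; } my_struct;

    /* Single implementation; const is cast away in one audited place. */
    static field *my_struct_get_field_impl(const my_struct *s)
    {
        return (field *)&s->f;
    }

    /* Hand back a pointer whose constness matches the argument's. */
    #define my_struct_get_field(s) _Generic((s),                       \
        const my_struct *: (const field *)my_struct_get_field_impl(s), \
        my_struct *:       my_struct_get_field_impl(s))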


I agree. While I didn't make an effort to actually understand the approach here, N3020 was the one that I disliked because it solves too small of an issue in practice and there probably can't be a simple enough solution. Any solution is probably complicated because the issue is only a symptom really.

The real issue, in my eyes, is that "const" simply doesn't transfer well across function call boundaries. "const" is always only a local judgement - data that is const for one function is not necessarily const for another function. I've even read an old conversation where Dennis Ritchie was uttering concerns about const when it was initially designed.

Typically I use the const keyword only for global static data, because there it makes a real technical difference. Sometimes I declare function parameters as pointer-to-const, partly for reasons of documentation, but I'm always aware of the strstr() issue.


The compiler does not need to implement it with the macros shown. They could use other techniques, so long as they have the same net result of preserving the constness of the argument. For example, the macro definitions could easily use a macro that resolves to a magic generic intrinsic (which becomes a call to the existing function implementation) that maintains the constness of the argument.

An exponential code size increase even with the example implementation would only happen if people are nesting these functions inside each other. And while that could happen, it would be unusual to nest them very deeply. So for compilers that just want an easy implementation, and don't really care about quality of error messages or maximizing compile speed, the shown approach is probably fine.


_Generic is an awful feature relative to attribute((overloadable)) anyway so that's probably a non-issue


How exactly can this lead to exponential explosion?


Code that does things like strcat(strcat(strcat(strcpy(...), ...), ...), ...)


If I understand the implementation section of the proposal correctly, the preprocessor will just add one cast to (const char*) per function call (assuming the innermost argument is const). Nothing exponential.


For the ease of discussion here is the relevant piece of code:

    #define IS_POINTER_CONST(P) _Generic(1 ? (P) : (void *)(P)  \
        , void const *: 1                                       \
        , default : 0)
    #define STATIC_IF(P, T, E) _Generic (&(char [!!(P) + 1]) {0}  \
        , char (*) [2] : T                                        \
        , char (*) [1] : E)
    #define _STRING_SEARCH_QP(T, F, S, ...)       \
        STATIC_IF (IS_POINTER_CONST ((S))         \
            , (T const *) (F) ((S), __VA_ARGS__)  \
            , (T *) (F) ((S), __VA_ARGS__))

    #define memchr(S, C, N) _STRING_SEARCH_QP(void, memchr, (S), (C), (N))
    #define strchr(S, C) _STRING_SEARCH_QP(char, strchr, (S), (C))
    #define strpbrk(S1, S2) _STRING_SEARCH_QP(char, strpbrk, (S1), (S2))
    #define strrchr(S, C) _STRING_SEARCH_QP(char, strrchr, (S), (C))
    #define strstr(S1, S2) _STRING_SEARCH_QP(char, strstr, (S1), (S2))
Here `IS_POINTER_CONST` expands `P` twice, so `_STRING_SEARCH_QP` expands `T`, `F`, `S` and each variadic argument 2, 2, 4 and 2 times respectively. Note that only `S` and the variadic arguments are expressions and are evaluated only once, but the number of expanded tokens can blow up exponentially.


I'm just happy that both C and C++ now have bit stuff like byteswap, popcount, rol/ror, etc.

It was a bit silly that stuff like that wasn't in either standard until recently.
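
For example, popcount is spelled stdc_count_ones in C23's new <stdbit.h> and is type-generic; a sketch, assuming a toolchain with C23 library support:

    #include <stdbit.h>  /* C23 */
    #include <stdio.h>

    int main(void)
    {
        unsigned long long x = 0xF0F0;
        /* The type-generic macro works across the unsigned integer types. */
        printf("%u\n", (unsigned)stdc_count_ones(x));  /* prints 8 */
        return 0;
    }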


These things are much more complicated to define than you think when you consider the kinds of strange hardware C supports, where bytes are not 8 bits and things like that. We tried to get endianness conversion functions into C23 but ran out of time, just because defining these functions so that their functionality is clear on non-standard hardware is so difficult.


It would have been reasonable to define the functions only for the CHAR_BIT == 8 case, and if any implementers that support weird architectures want the functionality let them come forward with a proposal. I got the impression that the people working on the proposal got into an avoidable mess of complications.
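
For the CHAR_BIT == 8 case a 32-bit byteswap is just shifts and masks anyway; a sketch (bswap32 is a made-up name; compilers typically recognize the pattern and emit a single bswap instruction):

    #include <limits.h>
    #include <stdint.h>

    #if CHAR_BIT == 8
    static inline uint32_t bswap32(uint32_t x)
    {
        return  (x >> 24)
             | ((x >>  8) & 0x0000FF00u)
             | ((x <<  8) & 0x00FF0000u)
             |  (x << 24);
    }
    #endif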


Given that weird platforms are so dependent on C, and since we release versions so infrequently, we try to get it right the first time. Often even defining what is excluded becomes complicated.


if C throws away portability it is no longer C


C's secret to portability is called #ifdef spaghetti and monstrosities like autoconf.


oh, I didn't see that byteswap wasn't in there. I googled the "stdc_" part and saw a paper that included byteswap. I guess that was an older draft.

Damn, so close. Well I hope it's included next time.


I'll do my best to get ror/rol and byteswap in!


> We do not have namespaces in C, which means any time we add functionality we basically have to square off with users.

Has anyone seriously proposed adding namespace to C? This feels like an obviously good addition to me. What’s the argument against it?


Namespaces are often called for, especially by programmers used to C++ but doing C. They seem like such a simple feature to implement. I'm still not sure I like them:

- What is the benefit of using A::B::C versus A_B_C, really, in terms of avoiding name clashes? I can point out one immediate benefit of doing the latter - code is easier to read because line noise is reduced (perhaps subjective but I think it will be hard arguing the other way around).

- Is it a good thing that half the code will now refer to A::B::C using only B::C or even only C? Without assistance of a program with good semantic insight (a solid IDE, etc) this only makes identifier search harder.

- Namespaces "enforce" discipline in one way - you can be sure that the symbols at the binary level will be properly prefixed. But with only a little discipline programmers can do the prefixing themselves, and in return they can move code between files (/namespaces) more freely, which is good for refactoring.

And as a sibling commenter pointed out, one factor why namespaces get called for could be lack of understanding that we can hide the "guts" of any function (or variable) using "static" linkage. There is less discipline (prefixes..) needed for these functions because they are only visible in the current translation unit.


> What is the benefit of using A::B::C versus A_B_C, really, in terms of avoiding name clashes?

Many C programs do not use hierarchical names like `A_B_C` as they ideally should. They instead pick a short identifier that is (often incorrectly) believed to be distinct enough.

> Is it a good thing that half the code will now refer to A::B::C using only B::C or even only C? Without assistance of a program with good semantic insight (a solid IDE, etc) this only makes identifier search harder.

If you meant that identifier search should be possible with just grep: yeah, namespaces make it harder, but they're not the only cause, and this doesn't explain why many other languages with less overall IDE support than C/C++ have namespaces. And I believe it is possible to design a namespace system that only needs a ctags-level automation for proper identifier search, though I don't know how you feel about ctags.


> Many C programs do not use hierarchical names like `A_B_C` as they ideally should. They instead pick a short identifier that is (often incorrectly) believed to be distinct enough.

What is your list of name clashes that you experienced? How many real headaches did they give you? Or is it all a non-problem? Not a rhetorical question - I'm mostly working on smaller projects < 100KLOC.

> I don't know how you feel about ctags.

I use ctags from time to time when I have to, but I still don't like it when there are multiple namespaces that contain the same set of names, which confuses navigation operations - even vim with ctags, I think. Maybe even IDEs like Visual Studio; I'd have to check what works and what doesn't.


> What is your list of name clashes that you experienced? How many real headaches did they give you?

Here are a dozen examples from Xlib, where Windows happened to choose the same short common names for many things: https://gitlab.freedesktop.org/xorg/proto/xorgproto/-/blob/m...

... and IIRC this list is no longer sufficient; you need to add to it in order to compile with the current Windows SDK. (Upon closer inspection, it was updated two weeks ago, so maybe it's fine... until the next Windows SDK update.)

I'll admit that this happens less often on Linux, where the system headers are smaller (and everybody uses X11 so there's historic reason to avoid all the short names that were gobbled up by Xlib in the '80s), but I've still run into occasional clashes between different library headers, or between legacy code and updated headers. eg. bool/Bool/BOOL are common collisions among pre-C99 libraries (including libraries that require C99 but don't remove the old names for backwards compatibility reasons), as well as min/MIN/max/MAX which still aren't in standard C as far as I can tell.

The headaches it gives are real, but not large in the grand scheme of things. The lack of defer (or otherwise standardized and cross-platform __attribute__(cleanup) ) is a bigger headache, for example.


I've programmed C for 35 years, and namespaces clashes have happened for me exactly once. There are two JSON libraries that use json_* as a prefix and have conflicting symbols. This actually caused a quite difficult to track down bug: https://bugzilla.redhat.com/show_bug.cgi?id=2001062

However this is not a reason to add namespaces. (In fact the bug was fixed using symbol versioning, an already existing feature of ELF.)


> What is your list of name clashes that you experienced? How many real headaches did they give you? Or is it all a non-problem? Not a rhetorical question - I'm mostly working on smaller projects < 100KLOC.

I too refrained from using C for software that large, so I don't have many examples either; but in one case I was using TweetNaCl, where you have to supply `extern void randombytes(unsigned char*, unsigned long long)` for the CSPRNG, and I had to rename it for some reason I can no longer recall.


> What is your list of name clashes that you experienced?

My experience is that sharing libraries is so wildly difficult in both C and C++ that code is not shared and wheels are reinvented. This has more to do with build systems than namespaces. But namespaces are a factor.


> many other languages with less overall IDE support than C/C++ have namespaces.

Because they are not C-style languages. In Python or Java, for example, you need a separate file to create a package. In C++ you can add namespaces anywhere you want. This makes C++ namespaces harder to maintain even with automated tools.


Namespaces can be assigned aliases to avoid collisions; good luck doing that with A_B_C.


Good point, but then again I'm not that sure it's a net benefit because I like everything in a codebase to refer to a given object using the exact same name.

I suppose aliasing is useful for widely used libraries, for example if a big software project wants to include two different versions of the same library. Or Team A wants to rename their module but make it easier for other teams that use their module to follow suit.

All that can provide some ease of use in the short term but produces more mess to clean up in the long term. IMHO.

Real name clashes between two different libraries are quite unlikely, and namespaces would only solve that problem on the source code level, not on the binary / symbol level.


Namespaces can enforce good project structure. If you're using A::B::C and A::B::D you can be 100% sure that both C and D definitely live under B, and your editor can work with that as well.


> you can be 100% that both C and D definitely live under B

And that brings you what benefit exactly? I've seen a number of projects that are preoccupied with "proper nesting", while it is 99% bureaucracy, and all those projects are still a mess.

The benefits of "living under this or that" are technically zero, and with respect to human factors are minimal given that 1) you can get most of the organizational benefits of namespaces by doing A_B_C, and 2) you can also add new members to proper namespaces from external files in most languages, including C++, so there really isn't a difference w.r.t "knowing for sure".


The benefit is a nice mental model of where each function lives (if you don't overdo it). This type of packaging also lends itself to module systems, a thing I desperately wish for in C.


This type of "packaging" is completely orthogonal to module systems. It's merely a syntactic discipline that adds one more complication to deal with even where it has zero benefit.

You could say that C/C++ already has "modules" if you look at object files and header files. Of course, they are a limited kind of module, because they are a bit low-level and C has the preprocessor problem, making it slow to "import" modules. But none of that has to do with namespaces.


Well, they enforce having a lot of project structure. Java and C# use their namespaces to ensure that everything useful is under five layers of pointless names like System.Collections.com.net.org.ArrayList.

I think not having namespaces may be an effective way to prevent enterprise programming from happening.


Yea, I've got to be honest in that I don't understand Java or C# hierarchies.

Node.js and Rust do module systems just fine.


Yes, people have floated the idea. I argued against it; I was asked at the WG14 meeting to explain my position, and the following was my reply:

I was asked during the meeting why I think that namespaces are a bad idea, so here are my thoughts. I must warn you that the following may sound like an anti-C++ rant, because well .... it is.

Namespaces have several bad aspects. Most of them boil down to making the code a lot less readable.

First, if I see a function call in source code, I can't assume what function is called. There may be a namespace somewhere above, or in a header file somewhere, that changes the meaning of the code. It matters a lot, for instance, when you copy code from one file to another and all of a sudden the code just means something completely different. Namespaces add state that changes the meaning of the code around it. When your code isn't doing what it should and you are trying to figure out why, you need to be able to trust that the code you are reading on screen is actually what you think it is. I dislike it for the same reason I dislike overloading: it makes it less clear what the things you are reading really are.

(It may be argued that you already can't trust your eyes in C, because you can do lots of devious things with the preprocessor, and to that I say: yes, it's bad enough, so let's not add more of it.)

Namespaces split the identifier in half, but it's still a unique identifier.

    my_module::create();

is and has to be as unique as:

    my_module_create();

So what are we accomplishing? Do we have the same level of collisions? No, in fact we have more collisions, because the parts can collide individually! If there are multiple my_module they will collide, and if there are multiple create in different namespaces they will collide!

    my_module::create();  // one name space
    my_module::destroy(); // another name space

Collision! How about:

    using namespace my_module;       // namespace with a create
    using namespace my_other_module; // namespace with a create

    create();

Collision! If we instead write plain old C:

    my_module_create();
    my_module_destroy();

No collision.

    my_module_create();
    my_other_module_create();

No collision.

Beyond creating new opportunities for collisions, we create confusion by making it possible to hide half the identifier somewhere else. Yes, it can save some typing, but programming is never hard because of typing; programming is hard because it's hard to understand what is going on. Namespaces solve the easy stuff by making the hard stuff harder.

It's the age-old trap of C++: adding something clever out of convenience that turns out to be unclear and something that the programmer has to manage.

Further, namespaces encourage people to use short common names even more, because they think it saves them typing, and they think if it goes wrong they can always manage it with namespaces; and then they end up having to manage something they shouldn't have had to manage in the first place.

I have never had a namespace collision in 20+ years of C programming. Why? Because I use long descriptive names that always start with where the functionality resides. It's readable, straightforward, and works. Namespaces are a complicated system for managing a problem that should never happen in the first place, unless the user is very careless. We should not encourage carelessness.

I think C should take some blame for this being a problem, because the standard library has far shorter names than is advisable. It has made people think that names like "jn" or "erf" are good examples of unique, clear and descriptive naming. The added wrinkle of "significant characters" has made people think that C mandates short names, something that no major implementation requires. There also seems to be a persistent belief that display technology has not yet evolved to the level where we can display more than 80 characters per line, and that we therefore need to use short cryptic names for everything. This is also an argument from a different age.

Namespace collision in C almost never happens between 3rd party libraries; it is almost exclusively a problem of the standard library, because it is so poorly named. If we wanted to fix this by hiding the standard library behind a namespace, we might as well just add a c_standard_lib_ prefix to all functions, and keep the garbage that is namespaces out of C. Not that we would do either, since both would break backwards compatibility. So why would we add namespaces if they wouldn't be used by the standard lib, the very library we have name collision problems with? In fact, if we added an optional namespace for the standard lib, all we would accomplish is to pollute the namespace with one more identifier.


    namespace my_better_name = my_module;

    //....

    my_better_name::create(); // collision gone
Problem solved, better improve your C++ knowledge.


This addresses one (very uninteresting) "problem" shown by the person you are replying to. A problem which can be solved regardless of the existence of "namespace a = b;" by simply not using the namespace. There are still all the other problems, which are more concrete but which namespaces can't solve.

The C standard library should have (maybe as early as C11 or C99) picked stdc_ as a "namespace", reserved it, made it a warning to put things in it, and used it for everything going forwards.

What C needs is not namespaces, it's a module system, so that symbol visibility can be more tightly controlled.


Good luck with that.


> Yes, it can save some typing, but programing is never hard because of typing, programing is hard because its hard to understand what is going on.

I agree that optimizing the number of keystrokes is a bad goal, but in my opinion that isn't the selling point of namespaces. The benefit is better readability. Code written with namespaces lets your eye focus on the semantic name of the method, not the extra character noise that was added as a form of manual name-mangling.

This is highly opinionated, of course, because there is a tradeoff with the ability to uniquely identify a method at a glance, as you mentioned. Which camp you fall into is likely to depend on what sort of software you write and whether you use an IDE or a text editor.


This is a misunderstanding of C++ principles, not an anti-C++ rant.

In C++, it is intentionally not straightforward to know what code gets called when a statement is executed - even without namespaces. We have:

* Function overloads

* Non-trivial (and not-built-in) implicit conversions

* Non-trivial copy and move constructors

* Template and concept resolution, which can be tricky

* Operator overloads, so that foo[n] or bar() may do something very different than what you expect.

And if you add in macros to the mix, you can really go crazy. For example, in this SO answer: https://stackoverflow.com/a/70328397/1593077 I explain how to implement an infix operator which lets us write:

    int x = whatever();
    if ( x is_one_of {1, 3, 42, 7, 69, 550123} ) {
        do_stuff(x);
    }
C is simply not that kind of language (well, macros notwithstanding I suppose).


This is nice if your goal is to come up with convoluted ways to say the same thing in fewer characters (ignoring the code that you had to add to enable the shorter version in the first place).

With your example, both a switch and simple if-else chains would be perfectly readable and also easy to write.


I agree with everything you say but want to add a caveat to the "long descriptive names". I find that there must be a balance where names are clear and unique enough while not tiring the eyes too much by making them read long boring repetitive prefixes all the time. There is a fine art to creating words and also sentences (statements).

By now if a "module prefix" is bigger than say 4-5 characters, I get unhappy and know I need to improve, to find a short mnemnonic. Same goes for local variables names, which I try keep at 1-5, rarely they get 10+ characters long. There is always this tension between names being "self-documenting" and "just long enough to remind of the purpose that was explained in a not too distant context". Variables that are frequently used should be shorter. Variables that have a clear intuitive use (like "x" or "i" which most often have very clear meaning with only little context added) should be shorter. Module names should _always_ be very short because (I assume) there are only few modules, and it's better to remember the purposes of a few modules together with their abbreviated names, than to have to read a long repetitive module name each other line. Function name suffixes (without the module prefix) should often be long because there are many different functions and not all of them can be cached with their meanings in the programmer brain - so I allow function names to be a little self-documenting typically.

I agree that the names in POSIX and C are too short and cryptic but think they made more sense in the context of Unixes of the 1970s when projects weren't as big as today.

I still try to stay within 80 or 100 columns, because line length is a readability / eye-strain concern as well; but given that I'm in the 8-spaces camp, I don't freak out anymore if there is the occasional 140-character line and I'm too lazy to trim it down. Judicious insertion of line breaks is most often useless busywork. Then again, some function signatures take too many arguments (glVertexAttribPointer/glBufferData... or the Win32 API come to mind) and inserting line breaks in calls can improve readability sometimes.


> “I find that there must be a balance where names are clear and unique enough while not tiring the eyes too much by making them read long boring repetitive prefixes all the time.”

I have that problem, repetitive prefixes getting in the way of reading the code, and I’ve been doing some experiments with what could be described as “local namespacing”, only inside an expression or statement, with the prefix/suffix feature of the backstitch macro in my C preprocessor:

https://sentido-labs.com/en/library/cedro/202106171400/#back...

For instance, writing a program with libxml2, I write:

    Next(reader) @xmlTextReader...;
which gets translated to

    xmlTextReaderNext(reader);
and I find the first easier to read.

Whether that’s the case for others, I don’t know.

A longer example from the link I wrote above, this time for libuv:

    @uv_thread_...
        t hare_id,
        t tortoise_id,
        create(&hare_id, hare, &tracklen),
        create(&tortoise_id, tortoise, &tracklen),

        join(&hare_id),
        join(&tortoise_id);
Result:

    uv_thread_t hare_id;
    uv_thread_t tortoise_id;
    uv_thread_create(&hare_id, hare, &tracklen);
    uv_thread_create(&tortoise_id, tortoise, &tracklen);

    uv_thread_join(&hare_id);
    uv_thread_join(&tortoise_id);


In C you can use functions the file doesn't see, and since namespaces change what symbol an expression refers to depending on what symbols exist in the program, this would lead to a lot of strange scenarios. C++ doesn't allow this, and that is the main way C++ isn't compatible with C.


Technically, they are a part of the attribute syntax in C23. Maybe a backdoor way to get them into a future standard.


I'm skeptical of it. I think it is a

std::opinion::bad_idea()

;)


maybe we could call the language with such extensions "C++"


Except that C++ doesn't serve that role, C++ is its own whole thing. I think "C but with namespaces and a method call syntax (and templates?)" would be a great language which would occupy a completely different space than C++.


if you introduce namespaces and method calls you have to introduce name mangling, to differentiate

    namespace foo { void x(); }
and

    namespace bar { void x(); }
and then you have to rely on compiler vendors to use the same mangling everywhere; otherwise you end up exactly in the C++ position, where there are multiple incompatible name mangling schemes, and thus C code compiled with e.g. cl.exe would not be able to call a C function compiled with gcc (and FFIs wouldn't be able to either, so you lose the "easy language bindings" "feature" of the operating system ABIs)


Namespaces by themselves aren't a reason why name mangling is needed. There isn't a technical reason why foo::x couldn't be the symbol name, literally. (Not sure what ELF and PE/COFF etc would think of these names in current implementations).

Name mangling is needed if you want to overload functions and qualify them only by the types (not names) of their in and out parameters. This is where it gets ugly on the binary level.
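As an experiment, GCC and Clang let you set the assembler-level name of a function directly with an asm label, so one could probe exactly this question today; whether the assembler and object format will accept "::" in the name is the open part:

    /* hypothetical: give a C function the literal symbol "foo::x" via the
       GCC/Clang asm-label extension; whether the assembler tolerates "::"
       depends on the platform, which is exactly the question above */
    void foo_x(void) __asm__("foo::x");
    void foo_x(void) { /* ... */ }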


> There isn't a technical reason why foo::x couldn't be the symbol name, literally.

how do you do when you want to access your C function from a language which is binary-compatible with C but uses :: for something else? [a-zA-Z_][a-zA-Z0-9_]* identifiers are the only thing that the whole world more-or-less standardized around.

e.g. in fortran you can directly import a C function and call it. But "::" in the middle of a function name would very likely fail (I don't know enough fortran to tell for sure but given how the syntax looks...):

     subroutine foo (a, n) bind(c)
       import
       real(kind=c_double), dimension(*) :: a
       integer (kind=c_int), value :: n
     end subroutine foo
lets you call a "void foo(double *a, int n);" function defined in C. I imagine that

     subroutine ns::foo (a, n) bind(c)
would likely not work


> how do you do when you want to access your C function from a language which is binary-compatible with C but uses :: for something else?

Typically you'd do this by picking a different internal name for the function and putting the external symbol in quotes (in Fortran, something like bind(c, name="foo::x")).


Fair point. The C naming convention with flat names is quite conservative and consequently allows the names to be used directly from most other languages without any language-aware mapping / compatibility layer.


Rust doesn't allow function overloading. Name mangling is used there to implement linking multiple versions of an external module into the final build.


Although Rust doesn't have ad-hoc polymorphism, it does have polymorphism, and so it needs to track multiple versions of the same function anyway.

Take String::contains(). The equivalent feature in C++ is an overloaded function, so there are, I believe, three versions of this function which take different parameters: a string, a char, and a pointer to chars. They do similar things in practice, but the compiler has no idea; there are just three functions with the same name. The Rust feature, however, is polymorphic: there are N versions of this function depending on which monomorphisations are chosen at compile time. The compiler knows these are all the same function, but the parameters have different types in practice at runtime, so the generated machine code is different. If your program calls String::contains(cat_photo_jpg), the compiler will produce the code to call it with your Jpeg type or whatever, and it will need to keep that distinct from the version where it takes a String or a char or whatever.

Rust does this more often than C++ because it cannot choose ad hoc polymorphism. So if there should be a foo function which can take parameters bar, baz or quux, we need to decide either that bar, baz and quux all implement some trait which foo takes, or we need three separate functions foo_with_bar, foo_with_baz and foo_with_quux.


This question is very OT considering what TFA is about, but…

What does the term ad-hoc polymorphism mean in your comment? Are you saying Rust does not have ad-hoc polymorphism because the syntax acts as if the function is parametrically polymorphic, and only once monomorphisation occurs are the different per-argument-type functions generated, yielding N different functions? Does this line of thinking also imply Haskell does not have ad-hoc polymorphism?

And in case any reader wonders, I am genuinely interested in your response to these questions (i.e. this isn’t bait nor rhetorical).


Indeed; I am wondering why the GP does not consider traits in Rust a form of ad hoc polymorphism.


Ad hoc polymorphism is distinct from parametric polymorphism in that, rather than the parameter type itself being a parameter, there is just an arbitrary (ad hoc determined) set of types allowed.

The C++ standard library defines contains(x) so that it'll work for a string x, a single char x or a pointer to chars x. Nothing else can work, those are the arbitrary list of types which work, that's ad hoc polymorphism.

A separate implementation is provided for each of those three cases, which is why if I've got a JPEG, I can't instead ask if the string contains the JPEG, that's not one of the three implementations provided.

The Rust standard library just defines the type of x in terms of a trait, Pattern. So, my Jpeg type can just implement Pattern and now I can ask if Strings contain the Jpeg and that works because contains delegates the matching problem to Pattern and Jpeg implements Pattern -- only the specific idea of "contains" as distinct from "begins with" or "split" or a dozen other functions is handled by the contains function.

I am not a (serious) Haskell programmer but I would argue that Haskell also lacks ad hoc polymorphism here as I understand it (and to be clear: I think this is in general a good or at worst reasonable choice)

The place where ad hoc really shines is when you've got a function that, say, makes complete sense with exactly two (or maybe three; fewer than two is irrelevant and more than three seems unlikely to provide reasonable ergonomics) specific types, which otherwise have nothing useful in common.

For example suppose I've got a whole variety of bird types, Goose, Chicken, Ostrich, Penguin, Sparrow and so on, and I've got this function thunk() and I realise that, oddly it makes sense to thunk a Sparrow or an Ostrich, but literally no other birds at all. I think hard about it, but the best I can come up with to describe this property is "Thunkable" since all it really means is you can thunk() them. And there's no "content", there's no special implementation work in "Thunkable" that shouldn't live in thunk() for maintenance anyway. In this case ad hoc polymorphism is great because it saves needing to make this stupid "Thunkable" trait / type class / type-of-types / interface just to group together Sparrow and Ostrich for this single purpose.

But I'd argue the "Thunkable" case is rare, and C++ has a lot of cases where ad hoc polymorphism was the wrong choice and they fell into it.

I mentioned three types, really C++ contains() does only two things but it spells one of them two ways for 20+ year old reasons. It can do strings (as std::string, but also via C's char * type) and it can do a single character char (one code unit, so, no poop emoji).

Rust provides four Patterns, they are a string reference, a single char (a Unicode scalar, so yes a poop emoji works), a slice of chars (any of the chars matches), or a predicate which matches characters.

Now, do any of those four feel like things you'd definitely never want in C++? Because if C++ wanted all of them that's now five overloads for contains. And five overloads for find, and for every other matching function, on every string or string-like type...

I believe ad hoc polymorphism is so rarely what you really want, and yet it so often detracts from the rest of the language facilities that "We don't have that" is a sensible language design choice, same as for multiple inheritance.


I think my issue is with your definition. Type classes in Haskell are the implementation for ad-hoc polymorphism, and likewise traits in Rust. I think the definition you are using is not the commonly given one and that is where my confusion came from.

Typeclasses (in a Haskell context) were formalized in the Wadler and Blott paper "How to make ad-hoc polymorphism less ad hoc". And for Rust, in [1] traits are explicitly stated to be the method by which Rust achieves ad-hoc polymorphism.

1: Klabnik, Steve; Nichols, Carol (2019-08-12). "Chapter 10: Generic Types, Traits, and Lifetimes". The Rust Programming Language (Covers Rust 2018)


I didn't know either of these things, I guess now I have some reading to do, so thanks.

Edited to add: Hmm. Actually though, surely that second reference is just "The Book" as it's called, did it actually say it's about ad hoc polymorphism? Because I've read this section of The Book, although I hadn't in 2018, and it doesn't mention "Ad hoc polymorphism".

What's there now (I just checked) is a description of how you'd approach this problem in Rust, using traits, but doesn't claim this is ad hoc polymorphism, and sure enough doesn't involve an arbitrary set of types which is the sort of the point of why "ad hoc" is there in the name.


The standard definition of ad-hoc polymorphism is attributed to Strachey:

Strachey chose the adjectives ad-hoc and parametric to distinguish two varieties of polymorphism [Str67]. Ad-hoc polymorphism occurs when a function is defined over several different types, acting in a different way for each type. A typical example is overloaded multiplication: the same symbol may be used to denote multiplication of integers (as in 3*3) and multiplication of floating point values (as in 3.14*3.14).

That is from the Wadler paper where typeclasses are formalized. Typeclasses and Traits are the implementation details for those function symbols that vary in implementation for each type. The restrictions on the types (like which types implement a trait or have a type class defined for it) are the types the symbol can be used on.

You seem to be focusing on the ‘arbitrary set of types’ point, but the only connection between the types accepted by Rust’s generic functions (which are functions that accept a type provided it has some trait) are that they take types which have an impl for that trait.

I think there is a bit of ambiguity regarding the term ad-hoc polymorphism at play. You seem to think the trait/typeclass implementation of ad-hoc polymorphism (which was invented to formalize a well-behaved class of ad-hoc polymorphic functions) makes it no longer ad-hoc. My position echoes Wadler: it's still ad-hoc, just less 'ad-hoc' (i.e. more formalized).


It comes to this phrase about a function "acting in a different way for each type".

In C++ there are literally three separate implementations of std::string's contains method for three type signatures. This is pretty clearly what is being discussed as "acting in a different way".

In Rust there's just one, here's the entire function body of contains: pat.is_contained_in(self)

OK, well that's just buck passing right? Clearly this is_contained_in() method on Pattern is really just the contains() implementation, we're passing the work to this function that as you point out needs to be implemented by each of the matching types for Pattern.

Except, wait, Pattern actually defines is_contained_in(haystack), thus: self.into_searcher(haystack).next_match().is_some()

Sure enough Pattern implementations although they're not forbidden from implementing is_contained_in themselves, do not in fact do that, they just implement into_searcher. Our hypothetical Jpeg type can provide a suitable into_searcher implementation which results in a Searcher for the Jpeg somehow, without knowing what contains() or split_once() or trim_start_matches() do, and now they will work on Jpegs.

So the "acting in a different way for each type" for contains() ends up only being because of details about the inner behaviour of that type, which is exactly parametric polymorphism so far as I can see.


Rust breaks parametricity via functions like size_of and in unstable features such as specialization.


You do realistically need a kind of name mangling, yes, if you want to keep using [a-zA-Z0-9_] as the set of characters in symbol names. But without overloading or other fancy stuff, you could have a super simple name-mangling scheme: separate the names with a double underscore "__".
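A sketch of what that lowering could look like:

    /* hypothetical "__" scheme: namespace foo { void x(); } and
       namespace bar { void x(); } would lower to plain C symbols */
    void foo__x(void);
    void bar__x(void);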


What saddens me is that this is yet another missed opportunity for fixing the C strings library, which is sadly mostly broken beyond repair or straight unusable on modern systems.

Annex K tanking badly was just sad. At least they finally standardized strdup().


How do you intend to fix the strings library? beyond strlen() and _maybe_ the search functions like strchr() there really isn't much you should use. And those functions are _fine_.

Some of the string functions, like strtok() or strcat(), and I would also include strdup(), are broken beyond repair and should simply not be used. The only way to "fix" the library would be to remove these functions, but I assume this would create more trouble with legacy code than it's worth.

C strings are mostly a useful storage format for small strings, including string literals, and as a "default" string type for the little basic functionality that is expected to be part of C to support a small program growing up - including printf() and friends.

For longer strings, you are expected to build datatypes that fit your usecase, and that includes decisions about memory management. It should not be the responsibility of the language to make these decisions for you.
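For illustration, such a bespoke type might look like this minimal sketch; the names and the growth policy are illustrative, not from any standard:

    #include <stddef.h>
    #include <stdlib.h>
    #include <string.h>

    /* one possible bespoke string: pointer + length + capacity, with
       the memory-management policy chosen by the program, not libc */
    struct str {
        char  *data;
        size_t len, cap;
    };

    /* append n bytes, growing geometrically; returns 0 on allocation
       failure (overflow checking omitted for brevity) */
    int str_append(struct str *s, const char *src, size_t n) {
        if (s->cap - s->len < n) {
            size_t cap = s->cap ? s->cap * 2 : 16;
            while (cap - s->len < n) cap *= 2;
            char *p = realloc(s->data, cap);
            if (!p) return 0;
            s->data = p;
            s->cap  = cap;
        }
        memcpy(s->data + s->len, src, n);
        s->len += n;
        return 1;
    }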


By having something like SDS[0] as part of the standard library, but that is clearly asking too much.

[0] - https://github.com/antirez/sds


This is BSTR for the Windows/COM folks.


Replaced by HSTRING in modern times. :)


I get strtok and strcat, but what's wrong with strdup?


It's only very marginally useful, strdup(x) is equivalent to malloc(strlen(x)). It is one of the few functions that allocate, but it doesn't have "alloc" in its name, putting more of an onus on the programmer to free() appropriately. Also, it's advocating bad programming style. Just mallocing random small things all over the place is not how you structure programs well. (Instead, most strings should be in a pool or in another context that is more local than the malloc() heap, so individual strings can be released at once without having to track all the strings attached to a million different structures).
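A minimal sketch of that pool idea, assuming a fixed backing buffer; the names are illustrative:

    #include <stddef.h>
    #include <string.h>

    /* bump-allocator string pool: strings live together and are
       released all at once, instead of one malloc() per string */
    struct str_pool { char *buf; size_t cap, used; };

    char *pool_strdup(struct str_pool *p, const char *s) {
        size_t n = strlen(s) + 1;
        if (p->cap - p->used < n) return NULL;   /* pool exhausted */
        char *dst = memcpy(p->buf + p->used, s, n);
        p->used += n;
        return dst;
    }
    /* releasing every string at once is just: p->used = 0; */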


Other POSIX functions, such as glob, also allocate memory and don't have alloc in the name.

It's something that I use. Maybe not frequently, but I definitely use it.

Also, it is in no way equivalent to malloc(strlen(x)). You're off by one byte and you've not copied any of the string.


Oops, see how my mind became lazy when thinking about such a boring, tasteless function? A function like memdup() would be slightly more useful.
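For what it's worth, a memdup() would be nearly a one-liner (a sketch, not a standard function):

    #include <stdlib.h>
    #include <string.h>

    /* duplicate n bytes from src into fresh heap memory */
    void *memdup(const void *src, size_t n) {
        void *dst = malloc(n);
        return dst ? memcpy(dst, src, n) : NULL;
    }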

At least, glob() has its own globfree() function, leaving room for slightly more mindful organization.


You can't even do strcpy(malloc(strlen(x)+1), x) because it doesn't check that malloc succeeded. strdup does all of this for you! :-)


Maybe your mind would become more engaged if you took the time to understand how the functions worked properly.


I do of course understand how this "works".


> strdup(x) is equivalent to malloc(strlen(x))


Dude. Have you ever typed or said total nonsense because your mind was already busy with the next, more interesting thing?


If things frequently seem nonsense to you perhaps greater reading comprehension skills may help you keep up with busy-minded folk.


What are you up to? I made a fairly clear and simple point but you seem to be hung up on a stupid "typo". And what do you allude to with regards to reading comprehension?


If you frequently make stupid typos perhaps you may enjoy writing lessons to go with the reading comprehension ones.


I think I could recommend a couple lessons to you as well. Have a good day.


Annex K is appalling. It depends on global shared mutable state, so it is incompatible with threading.
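For example, where Annex K is implemented (guarded by __STDC_LIB_EXT1__), the constraint handler is a single process-wide setting:

    #define __STDC_WANT_LIB_EXT1__ 1
    #include <stdlib.h>

    int main(void) {
    #ifdef __STDC_LIB_EXT1__
        /* the Annex K constraint handler is process-global mutable
           state: this call changes behaviour for every thread at once */
        set_constraint_handler_s(ignore_handler_s);
    #endif
        return 0;
    }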


The committee is still run by fossils carrying the misguided torch of endless backwards compatibility. No one is compiling 1980s C on GCC trunk releases, but by god we still need to make sure it compiles cleanly.


It usually doesn’t; often it will compile, but with warnings.


I really don't know any C beyond K&R years ago, but this was an enjoyable read.


Probably worth clarifying whether you actually mean the Second Edition of Kernighan and Ritchie's book "The C Programming Language" which is about ANSI C and thus approximately ISO C89, or you literally meant what is called "K&R C" the language described in the 1978 edition, by the same name, which is a language somehow composed entirely of sharp edges.


Since you asked, it was the second edition which came in an academic bundle with iirc Symantec Think C for Macintosh.


and cute post/pre increment/decrement tricks!


I started reading through / doing the exercises earlier this year, and I noticed there were a lot of "cute" tricks in their code examples, such as loops where all the functionality lives in the control-flow definition (init/condition/increment) of a for loop, and instead of a loop body there is just a semicolon.

Perhaps they're just there to give you a better understanding of the more minor details/intricacies of the language, but I couldn't help thinking that with a lot of their code, if I saw a coworker write something like that, I'd roll my eyes that they didn't do it in a more obvious way.
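For readers who haven't seen the style, here are two made-up examples in that vein, where all the work happens in the loop header or the condition and the body is an empty statement:

    #include <stddef.h>

    /* K&R-style copy: the assignment's value doubles as the loop test */
    void str_copy(char *s, const char *t) {
        while ((*s++ = *t++))
            ;
    }

    /* empty-body for loop: init/condition/increment do everything */
    size_t str_len(const char *s) {
        size_t n;
        for (n = 0; s[n] != '\0'; n++)
            ;
        return n;
    }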


I think that the way K&R writes C is probably idiomatic C just de facto. But it's also reasonable to believe this style is completely unacceptable for maintainable code today and should fail review for example.

So it's not crazy to argue that, because idiomatic C isn't acceptable, an organisation should rule out C as a language entirely. Clearly, if you don't have an alternative, this highlights a problem rather than itself being a solution. But it might be a reason not to do whatever it was you were thinking of doing at all. Do you really need to write micro-controller firmware for your project? Isn't there an off-the-shelf component where it's not your problem to maintain the horrible C code that makes it work?

[ Or you could go all Oxide Computer Company and decide no, C is just terrible so we are going to use Rust and if we need to rewrite the firmware on this switch that's what we'll do. That might make total sense, if you go into it clear eyed ]


When will they add a slice (fat pointer) type? I.e. a generic version of

> struct T_slice { T* ptr; size_t len; };

with all the syntactic sugar and bounds checks added in. This could catch so many out of bounds issues and doesn't seem that hard to do.
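Until then, the idea can be hand-rolled; a minimal sketch for a single element type (illustrative only, not a proposal):

    #include <assert.h>
    #include <stddef.h>

    /* hand-rolled fat pointer for int; a real language feature would be
       generic over the element type and could enforce or elide the check */
    typedef struct { int *ptr; size_t len; } int_slice;

    static inline int int_slice_get(int_slice s, size_t i) {
        assert(i < s.len);          /* bounds check */
        return s.ptr[i];
    }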


The author listed "wide pointers" in the second paragraph of things they are impatient for, and would like a faster release cycle for. It sounds like it could be another 10 years.


The original authors of the language also advocated for this, and Walter Bright's D compiler has a BetterC mode that also includes an

int arr[..];

syntax for such things. But, our plates are full and the proposal wasn't written: it'll be up for next release, for sure, and it's one of the many things I would like to put in the language.


I am so glad to see #embed.

I always thought it was silly that the best way to embed a resource in a binary using CMake was to either convert it to hexadecimal C array manually or use specific flags with ld to convert it into an object (and good luck getting that to work reliably with cross-compiling!).
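For comparison, the C23 directive collapses all of that to a few lines (the file name here is hypothetical):

    /* #embed expands to a comma-separated list of the file's bytes */
    static const unsigned char icon_png[] = {
    #embed "icon.png"
    };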


I'm pleased by making types with the same tag and fields compatible within a TU. That was a stupid thing to work around with offsetof.
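Concretely, as I understand the accepted change (N3037), something like this becomes well-defined, where before C23 the two declarations named incompatible types within one translation unit:

    struct point { int x, y; };

    void f(void) {
        /* same tag, same members: a distinct, incompatible type before
           C23; compatible with the file-scope declaration in C23 */
        struct point { int x, y; } p = { 1, 2 };
        (void)p;
    }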

Unfortunate characterisation of constexpr in C++ over revisions as feature creep. It's more that any C++ should be executable at compile time, with constexpr ultimately going the way of the register keyword, but selling people on things like compile-time file I/O is a long game.

Ultimately, though, it's hard to be too excited. I don't have a use case for C any more, beyond keeping some old code alive (with -fstrict-aliasing et al. and a sense that I really should port it to something else before I can no longer get any compiler to build a program out of it).


Extensive discussion about #embed a couple of weeks ago.

https://news.ycombinator.com/item?id=32201951


What will the __STDC_VERSION__ be?


The current draft says 202311L. But I guess it's not set in stone yet?
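So a feature check against the draft value would look like this, with the caveat that the final constant could still change:

    /* guard on the current draft value; not final until publication */
    #if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
        /* C23 features available here */
    #endif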


They really want to kill C right?

If you want a better C, do yourself a favor and use D

Same syntax, more secure and with modules


This kind of comment is not helpful at all. I use Rust today, yet I sometimes cannot deny that it's better to use C, depending on the target and the restrictions of vendor hardware compilers.

Every language has downsides/upsides and its time. There is a nice quote by Titus Winters: “Software engineering is programming integrated over time”[0]. This helped me a lot in learning that context matters for tools/languages/frameworks, and that you can learn from their mistakes.

[0]: https://youtu.be/tISy7EJQPzI


Which change in particular do you dislike?


It's not about specific things I dislike, that's subjective; it's about their priorities and the features they think are needed

I'm still waiting for tagged unions, tuple and pattern matching

auto and #embed are nice to have though


I don't think we have any proposals for language-supported tagged unions (have to do them yourself), tuples (write your own macro/struct), or pattern matching (lole).
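The do-it-yourself version most C programmers end up writing looks something like this (names illustrative):

    /* a hand-rolled tagged union: an enum discriminant plus a union,
       dispatched by hand with a switch */
    enum shape_tag { SHAPE_CIRCLE, SHAPE_RECT };

    struct shape {
        enum shape_tag tag;
        union {
            struct { double radius; } circle;
            struct { double w, h; }   rect;
        };
    };

    double shape_area(const struct shape *s) {
        switch (s->tag) {
        case SHAPE_CIRCLE: return 3.141592653589793 * s->circle.radius * s->circle.radius;
        case SHAPE_RECT:   return s->rect.w * s->rect.h;
        }
        return 0.0; /* unreachable if tag is valid */
    }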

You'll have a hard time pushing for those: C will likely never get them in those forms. At least, not for another 10-20 years, since there's no existing C compiler that implements any of those things yet.


There you've highlighted what I meant by "they want C dead"


I like to imagine that their goal is similar, yet less ambitious: to keep C alive only for those who really need it.



