A Special Kind of Hell: intmax_t in C and C++ (2020)

bsder · on Sept 6, 2022

The problem isn't "intmax_t". The problem is "int".

If you have an ABI, you need to put an explicit size and signedness on every parameter and return value. Period. No excuses.

No "int". No "unsigned int". If I'm being really pedantic, don't even use "char".

It should be "int32_t", "uint32_t", and "uint8_t".

Every time I see objections, it's always someone who wants to use some weird 16-bit architecture. The problem is that those libraries probably won't work anyhow since nobody tests their libraries on anything other than x86 and maybe Arm. If your "int" is 16 bits, you're likely to have a broken library anyway.

CodesInChaos · on Sept 6, 2022

There are a few variable size types that make sense in an ABI. Like `size_t` and `(u)intptr_t`.

> If I'm being really pedantic, don't even use "char".

Careful about that one. I believe `uint8_t` isn't required to be a character type, which has implications for type based aliasing.

bsder · on Sept 6, 2022

> Careful about that one. I believe `uint8_t` isn't required to be a character type, which has implications for type based aliasing.

The fact that every single compiler typedefs "uint8_t" and "unsigned char" and yet that isn't guaranteed by the standard is the kind of thing that just makes you want to cry.

I'm actually genuinely curious about this: people talk about the possibility of existence, but I haven't seen anybody point to an actual compiler that implements uint8_t as an extended numeric type rather than being an equivalent typedef to "unsigned char". Is there a compiler that really does this?

jlokier · on Sept 6, 2022

> The fact that every single compiler typedefs "uint8_t" and "unsigned char" [...] and yet that isn't guaranteed by the standard

They don't. The standard is right on this. It's supposed to let authors know what can be relied on portably on real C/C++ systems in use. (Targets where char is not even 8 bits might make you cry even more, but I think those are legitimately obscure now.)

> Is there a compiler that really does this?

Here's a popular one with hobbyists in 2022. The definition of uint8_t in the Arduino system header files is not "unsigned char". I have seen this definition used elsewhere too:

  typedef unsigned int uint8_t __attribute__((__mode__(__QI__)));

https://forum.arduino.cc/t/definition-of-uint8_t/177947 https://forum.arduino.cc/t/mismatching-integer-type-definiti...

bsder · on Sept 6, 2022

Except that __attribute__((__mode__(__QI__))) makes that effectively equivalent to "unsigned char".

From https://groups.google.com/g/comp.lang.c/c/sE6I4sgGyPs

> A few tests on generated code, however, suggest that gcc treats these exactly as signed char and unsigned char, both in how the react to _Generic and in how the compiler handles them for alias analysis.

So, that means that there are currently zero existing compilers that treat uint8_t differently from unsigned char.

JonChesterfield · on Sept 6, 2022

Pretty sure I remember a GCC bug that amounted to uint8_t behaving differently to unsigned char wrt aliasing. It's the sort of thing I'd expect clang to do too.

Gibbon1 · on Sept 6, 2022

Personal opinion, 'int' is way above NULL as a bad idea.

bsder · on Sept 6, 2022

"int" wasn't a bad idea "back in the day".

Different architectures had all manner of different "native" sizes. 8-bit micros were still relatively expensive even up through mid-1980s. The 80286, for example, was 16-bit and went away only in the early 1990s. IBM AS/400s were still 48-bits through the mid-1990s. Apple was still dealing with non-32 bit clean applications in the mid 90s. Linux running on Alpha was still cleaning up x86-isms in mid 1990s.

C99 finally introduced "stdint.h". By the mid-2000s, everything had converged to 32-bit (or 64-bit).

C11 should have deprecated "int" so that compilers could throw warnings on it. But, then, we still haven't removed K&R signatures from the C standards, so here we are.

Wevah · on Sept 6, 2022

IIRC K&R signatures are out in C23.

gpderetta · on Sept 6, 2022

As a developer I complain that passing unique_ptr is a couple if cycles slower than it needs to be because of the ABI and I wish the committee/GCC was more aggressive with ABI breaking.

As an user I complain that I can't run linux games from 20 years ago because GCC broke the libstdc++ ABI 15 years ago and how WIN32 is the only stable unix ABI.

Luckily I'm not a compiler developer.

neeeeees · on Sept 6, 2022

Out of curiosity, what field do you work in that a couple of cycles matters/is measurable?

gpderetta · on Sept 6, 2022

I do work in a field where cycles matter (HFT), but in this specific case I was definitely hyperbolic :).

RcouF1uZ4gsC · on Sept 6, 2022

> the vast majority of the shared ecosystem depends on shared libraries/dynamically linked libraries for the standard.

The more I use C and C++, the more I am convinced that shared libraries are the biggest technical debt for the whole ecosystem. It is these shared libraries that are the driving impetus behind “ABI stability”. It is because of shared libraries that we can’t have nice performance or safety enhancing features.

Right now the C and C++ ecosystem is groaning under the weight of shared library technical debt.

Gibbon1 · on Sept 6, 2022

My dog in this hunt is I don't care about ABI stability because everything is statically linked in my world. And when I see this stuff my sour thought is why aren't they versioning their ABI's?

I remember a friend in the dark ages worked on a mixed Pascal and C codebase and they had some tool that generated an adapter layer between the two. All it usually did was flip stuff around on the stack before calling the routine. And then clean things up before returning.

pjmlp · on Sept 6, 2022

Those shared libraries are a way to create plugins for commercial software.

Surely one can use OS IPC instead, like we used to do during the days static linking ruled the compiler world, but then don't complain about higher resource usage when every single plugin is its own process.

UncleMeat · on Sept 6, 2022

It used to be the case that resource limitations meant that shared libraries were meaningfully more efficient. I know that some software communities like to complain about modern software bloat but statically linking everything is so incredibly valuable across so many different dimensions that it is absolutely worth the cost. Heck, modern link-time optimization probably means that applications run faster despite the code-size bloat.

Shared libraries not only require this absolutely crippling adherence to the ABI they also are a security mess since you need updates from both your software provider and your operating system vendor to ensure that anything is safe.

pjmlp · on Sept 6, 2022

Static linking is even worse in what concerns security unless you can compile the world from scratch, every single time a new update comes out.

UncleMeat · on Sept 6, 2022

If you can't recompile your application then you are already fucked for security regardless of static or dynamic linking.

And even for vulns in libraries, to me, recompiling and shipping an update is a more trustworthy approach than just hoping that my users are getting library updates from their OS vendors.

pjmlp · on Sept 6, 2022

Dynamic libraries can be updated independently of the main application, that is the whole point about plugins, and commercial software is a thing.

UncleMeat · on Sept 7, 2022

Yes but they often aren't and you have no control over whether a user will update their dynamic libraries. And you still need to be able to recompile and redistribute your application because you are going to have vulns in your code as well so this hasn't actually saved you the responsibility of recompiling to protect your users.

RcouF1uZ4gsC · on Sept 6, 2022

> Those shared libraries are a way to create plugins for commercial software.

That is an entirely reasonable use. And one that can have specific types and ABI guarantees for the commercial software and the plug-in. In addition, this would only be required if you were the developer of the commercial software or a plug-in. One example of this is COM which had specific types and memory allocation/deallocation.

However, the case where the C standard library is being dynamically linked, forces everyone to pay the price for ABI stability.

pjmlp · on Sept 6, 2022

Except that the C standard library is compiler specific, only in the UNIX world it blends with the OS API, due to their symbiotic relationship.

krastanov · on Sept 6, 2022

Wasn't dynamical linking crucial 20 years ago, when drive space and bandwidth were expensive?

dralley · on Sept 6, 2022

It's still pretty important today for things like security-critical libraries, libc, plugin systems, etc.

However it is probably overused relative to the situations where it is genuinely useful.

smitty1e · on Sept 6, 2022

Hence the increasing popularity of docker for interpreted projects.

However, if we're decoupled from the system .so files, then the 'Hell' of the The Famous Article might be reduced to a Purgatory.

gpderetta · on Sept 6, 2022

How would you link against, say, an hardware specific libopengl.so without shared libraries?

intelVISA · on Sept 6, 2022

I haven't used a shared lib in many years thankfully, outside of OpenSSL most libs aren't too painful to link statically.

mburee · on Sept 6, 2022

And glibc, but most people interested in static linking are using musl/dietlibc/uclib anyways

kazinator · on Sept 6, 2022

intmax_t should be kept out of stable ABI definitions, and out of API's.

There has to be an ABI for it because we have to pin down what it means to pass an intmax_t as an argument to a function, how it is aligned on the stack if passed that way, and how it's placed into a structure and so on.

However, there could be a provision that the ABI treatment of intmax_t is not guaranteed; it is subject to change due to the redefinition of intmax_t.

And, for that reason, it should be kept out of API's.

That leaves API's that deal specifically with intmax_t itself rather than using it to represent something. Those can use aliasing and versioning.

Say we had a function like this:

  struct intmax_quot_rem intmax_div(intmax_t, intmax_t);

today intmax_t might be 64 bits, so the application should be compiled in such a way that the call to intmax_div goes to some __intmax_div_64, which looks like this:

  struct intmax_quot_rem __intmax_div_64(int64_t, int64_t);

Even when intmax_t changes to 128, that compiled program continues to reference __intmax_div_64 which uses int64_t parameters and structure members. A newly compiled program calls __intmax_div_128.

A particular problem would be functions in the printf family. Say we have a conversion specifier which prints intmax_t which is 64 bits today. Here, the solution is even simpler. The "PRI" macros introduced in C99 provide it. Given an intmax_t value x, we print it like this:

  printf("x = %" PRIdMAX "\n", x);

so today that might expand to some conversion specifier that is identical to the one for PRIx64. And so that compiled program will have that baked into its conversion string, so everything will continue to be the same even if the platform moves to a 128 bit intmax_t.

A newly compiled program on the 128 bit intmax_t platform will get a different PRIdMAX string from the header file, which expands to a conversion specifier matching int128_t.

Basically all the issues are solvable except the issue of some application code carelessly using intmax_t in its APIs without any plan for versioning.

Const-me · on Sept 6, 2022

I think C# did it right in 2000. Regardless of CPU architecture, integer types like short, int, or ulong have fixed size of 16, 32 and 64 bits respectively.

There're couple special types with machine-dependent size, like IntPtr, but these only used for opaque handles and C interop.

Macha · on Sept 6, 2022

This is something it inherited from Java and is standard in newer languages like Rust and Go too