IMO, defining your own types is one step too far. Now everyone who is already familiar with C types has to learn your own quirky system to understand one program. I think it does probably make sense to be specific about the sizes though, e.g. using uint32_t over just uint (and expecting to receive some architecture-dependent size you might not get with uint). These types should be defined in the right header (I think it depends on the compiler?) It's been a while since I wrote any amount of C, so my apologies if this isn't correct.
Sure, that's not true for 16 bit targets. But are you really going to port a 5Mb program to 16 bits? It's not worth worrying about. Your code is highly unlikely to be portable to 16 bits anyway.
The problem is with `long`, which is 32 bits on some machines and 64 bits on others. This is just madness. Fortunately, `long long` is always 64 bits, so it makes sense to just abandon `long`.
So there it is:
char - 8 bits
short - 16 bits
int - 32 bits
long long - 64 bits
Done!
(Sheesh, all the endless hours wasted on the size of an `int` in C.)
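(Minimal sketch, not from the comment above: if you do bet on char/short/int/long long being 8/16/32/64 bits, a few C11 _Static_asserts make the bet explicit and cost nothing to check at compile time.)

#include <limits.h>

_Static_assert(CHAR_BIT == 8, "char is not 8 bits");
_Static_assert(sizeof(short) == 2, "short is not 16 bits");
_Static_assert(sizeof(int) == 4, "int is not 32 bits");
_Static_assert(sizeof(long long) == 8, "long long is not 64 bits");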
Yet another issue is that `char` is signed on some platforms but unsigned on others. It is signed on x86 but unsigned on RISC-V. On ARM it could be either (ARM standard is unsigned, Apple does signed).
I therefore use typedefs called `byte` and `ubyte` wherever the data is 8-bit but not character data.
I also use the aliases `ushort`, `uint` and `ulong` to cut down on typing.
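(For concreteness, a sketch of what such aliases might look like; the exact definitions, in particular whether byte is signed, are my guess at the intent rather than the commenter's actual code. Note also that ushort/uint/ulong may already be provided by <sys/types.h> on some Unix systems.)

typedef signed char byte;    /* 8-bit data that isn't character data */
typedef unsigned char ubyte;
typedef unsigned short ushort;
typedef unsigned int uint;
typedef unsigned long ulong;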
On the other hand, the types in <stdint.h> are often recognised by syntax colouring in editors where user-defined types aren't.
Then you're better off using custom types - that way people will immediately know your type is non-default - as opposed to hiding your customization away in a makefile, pranking people who expect built-ins to behave a certain way.
The people who understand that it can be either, depending on a compiler switch, are exactly the people who use an explicit sign (typically via a typedef) to ensure their code always works.
The people who say that char is de facto signed and everyone should just deal with it, are the people who end up writing broken code.
Yes, the optional sign on char is also madness. C had a chance in 1989 to make it unsigned, and muffed it. (When the C89 committee decided between value-preserving and sign-preserving semantics, they could have also said char was unsigned, and saved generations of programmers from grief.)
D's `char` type is unsigned. Done. No more problems.
Oh I don't even know where to start with this. Given that C is the lingua franca of embedded development, and each processor and compiler has different opinions of what an int is, I would never claim that an int is 32 bits.
It's just so much less error-prone to use a uint32_t. That's guaranteed to be the same size everywhere.
Yeah, almost the only time I'm writing C anymore is embedded, where I want to reason about type widths (while taking on as light a cognitive load as is possible). I have enough code that gets compiled to an 8, 16, or 32 bit target depending on context that having the bit width right on the tin is valuable. And it doesn't even cost me "hours and hours".
Also: Embedded is almost the only time you really, truly need to care about how many bits a type is, and only when you're interacting with actual hardware.
For almost every other routine task in programming, I would argue that it really doesn't matter if your int is 32 bits wide or 64 bits wide. Why go through the trouble of insisting on int32_t or int64_t? It probably doesn't matter for the things you are counting.
Some programmers will say "Well, we should use int64_t here because int32_t might overflow!" OK, so why weren't you checking for overflow if it was an expected case? int64_t might overflow too, are you checking after every operation? Probably not. "OK, let's use uint64_t then, now we get 2x as many numbers!" Now you have other overflow (and subtraction) problems to handle.
Nowadays, I just use int and move on with my life. It's one of those lessons from experience: "When I was younger, I used int and char because I didn't know any better. When I was older, I created this complex, elaborate type system because I knew better. Now that I'm wise, I just use int and char."
> It's one of those lessons from experience: "When I was younger, I used int and char because I didn't know any better. When I was older, I created this complex, elaborate type system because I knew better. Now that I'm wise, I just use int and char."
Right on, dude. I've gone full circle on that, too.
I also spent years wandering the desert being enamored with the power of the C preprocessor. Eventually, I just ripped it out as much as possible, replacing it with ordinary C code. C is actually a decent language if you eschew the damned preprocessor.
> Other models are very rare. For example, ILP64 (8/8/8: int, long, and pointer are 64-bit) only appeared in some early 64-bit Unix systems (e.g. UNICOS on Cray).
My computer being one of those (rare?) architectures. Though I think it is not entirely dependent on the processor and the OS choice also affects this.
Umm, sorry, I remembered that wrong. Turns out int isn't 64 bits on my machine. I should double check before posting next time. (I mixed up long with int, and long isn't 64 bits on some systems.) I can't delete it now.
Yep. DSPs always have weird architectures, but in most cases, one isn't compiling the same code for multiple DSP architectures. As an example, the C2000 line has a 16-bit `char`; There is no support for "bytes".
Exactly this (plus floating point types and unsigned qualifier) and done. It’s standard C, there is no need to invent yet another unnecessary “type” system for standard C native types. I do like bool though.
I quit using "long" because sometimes a long is 32 bits and sometimes 64, and I can never remember which compiler does which. But "int" is 32 bits and "long long" is always 64 bits, so that's what I stick with.
The type that is 32 bits in C is int32_t, and the 64 bit one is int64_t; if you really want those specific widths, you can just use those types.
The type long is the smallest ranking basic type that is at least 32 bits wide. Since int is only required to go to 32767, you use long if you need a signed type with more range than that. That made a lot of sense on platforms where int really did go up to just 32767, and long provided the 32 bit one.
Now long, while at least 32 bits, is not required to be wider than 32; if you need a signed type that goes beyond 2147483647, then long long is it.
Those are the portability rules. Unfortunately, those rules will sometimes lead you to choose types that are wider than necessary, like long when int would have worked.
Where that matters, it's best to make your code tunable with your own typedefs. I don't mean typedefs like i32 but abstract ones, like ISO C's time_t or clock_t, or POSIX's pid_t. You can adjust your types without editing numerous lines of code.
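(A sketch of what such an "abstract" typedef buys you; rowcount_t and ROWCOUNT_NEEDS_64 are made-up names for illustration, in the spirit of time_t or pid_t.)

/* The name says what the type is for; its width is tuned in one place. */
#if ROWCOUNT_NEEDS_64
typedef long long rowcount_t;
#else
typedef long rowcount_t; /* at least 32 bits, per the portability rules */
#endif

rowcount_t rows_scanned = 0;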
Choosing integer sizes in C is pretty easy. The standard guarantees certain minimum ranges.
1. Consider the char and short types only if saving storage is important. Do not declare "char number_of_wheels" for a car, just because no car has anywhere near 127 wheels, unless it is really important to get it down to one byte.
2. Prefer signed types to unsigned types, when saving storage is not important. Unsigned types bend the rules of arithmetic around zero, and mixtures of signed and unsigned arithmetic add complexity and pitfalls. Do use unsigned for bitmasks and bitfields.
3. Two's complement is ubiquitous: feel free to assume that signed char gives you -128, and short gives you -32768, etc. ISO C now requires two's complement.
4. Use the lowest ranking type whose range is adequate, in light of the above rules: rule out the chars and shorts, and unsigned types, unless saving space or working with bits.
For instance, for a value that ranges from 0 to 65535, we would choose int. If it were important to save storage, then unsigned short.
The ISO C minimum required ranges are:
char 0..255 if unsigned; -128..127 if signed; therefore, portably: 0..127
signed char -128..127
unsigned char 0..255
short -32768..32767
unsigned short 0..65535
int -32768..32767
unsigned int 0..65535
long -2147483648..2147483647
unsigned long 0..4294967295
long long -9223372036854775808..9223372036854775807
unsigned long long 0..18446744073709551615
If you're working with bitfields, and saving storage isn't important, start with unsigned int, and pick the type that holds all the bits required. For arrays of bitfields, prefer unsigned int; it's likely to be fast on a given target. It's good to leave that configurable in the program. E.g. a good "bignum" library can easily be tuned to have "limbs" of different sizes: 16, 32 or 64 bit, and mostly hides that at the API level.
If you're working with a numeric quantity, remove the unsigned types, shorts and chars, unless you need to save storage (and don't need negative values). Then pick the lowest ranking one that fits.
E.g. if saving storage, and don't need negative values, search in this order: char, signed char, unsigned char, short, unsigned short, int, unsigned int, long, unsigned long, long long, unsigned long long.
If saving storage, and negatives are required: signed char, short, int, long, long long.
If not saving storage: int, long, long long.
If the quantity is positive, and doesn't fit into long long, but does fit into unsigned long long, that's what it may have to be.
Yes it does bend rules. Say that a, b and c are small integers (we don't worry about addition overflow). Given an inequality formula like:
a < b + c
we can safely perform this derivation (add -b to both sides):
a - b < c
This is not true if a, b and c are unsigned. Or even if just one of them is, depending on which one.
What I mean by "bend the rules of arithmetic" is that if we decrement from zero, we suddenly get a large value.
This is rarely what you want, except in specific circumstances, when you opt into it.
Unsigned tricks with circular buffer indices will not do the right thing unless the circular buffer is power-of-two sized.
Using masking on a power-of-two-sized index will work with signed, due to the way two's complement works. For instance, say we have a [0] to [15] circular buffer. The mask is 15 / 0xF. A negative index like -2 masks to the correct value 14: -2 & 15 == 14. So if we happen to be decrementing we can do this: index = (index - 1) & MASK even if index is int.
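(A minimal sketch of that, assuming two's complement; BUF_SIZE/BUF_MASK are illustrative names.)

#define BUF_SIZE 16
#define BUF_MASK (BUF_SIZE - 1)

int buf[BUF_SIZE];
int index = 0;

/* Decrementing past zero still lands in range: (-1) & 15 == 15, (-2) & 15 == 14, ... */
index = (index - 1) & BUF_MASK; /* index is now 15 */
index = (index - 1) & BUF_MASK; /* index is now 14 */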
> What I mean by "bend the rules of arithmetic" is that if we decrement from zero, we suddenly get a large value.
Yes completely consistent with rules of modular arithmetic. A programmer ought to be able to extend math horizons beyond preschool. Which is ironic because I can explain this concept to my 6 year old on a clock face and it’s easy for them to grasp.
> Unsigned tricks with circular buffer indices will not do the right thing unless the circular buffer is power-of-two sized.
How will they “not do the right thing”? With power of 2 you avoid expensive modulo operations, but nothing breaks if you choose to use a non-power-of-2 size.
> two's complement
Two’s complement is not even mandated in C. You are invoking implementation defined behavior here. Meanwhile I can just increment or decrement the unsigned value without even masking the retained value and know the result is well defined.
Like I get 2s complement is the overwhelming case, but why be difficult, why not just use the well defined existing mechanism?
And there’s no tricks here, literally just using the fucking type as it was designed and specified, why clutter things with extra masking.
In the N3096 working draft it is written: "The sign representation defined in this document is called two’s complement. Previous revisions of this document additionally allowed other sign representations."
Non-two's complement machines are museum relics, and are no longer going to be supported by ISO C.
> why clutter things with extra masking.
Because even if the circular buffer is a power of two, its size doesn't necessarily line up with the range of a given unsigned type.
If the buffer doesn't have a width of 256, 65536, or 4294967296, then you're out of luck; you can't just use uint8_t, uint16_t or uint32_t as the circular buffer index without masking to the actual power-of-two size.
(Note that uint16_t and uint8_t promote to int (on the overwhelming majority of platforms where their range fits into that type), so you don't get away from reasoning about signed arithmetic for those.)
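(A tiny illustration of the promotion point; nothing here is from the parent comment, it's just the standard integer-promotion behaviour on platforms where int is wider than 8 bits.)

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t a = 0, b = 1;
    printf("%d\n", a - b);       /* prints -1: the subtraction happens in int */
    uint8_t c = a - b;           /* only the stored result wraps, to 255 */
    printf("%u\n", (unsigned)c); /* prints 255 */
    return 0;
}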
> If the buffer doesn't have a width of 256, 65536, or 4294967296, then you're out of luck
Why so much hyperbole?
You’re not out of luck. You can atomically increment/add the unsigned index no matter the buffer size. You don’t worry about overflow like you would with a signed type. You can mask after.
And you continue to avoid answering the simple question: what is the advantage of the signed type. I’ve already outlined the one with unsigned, especially with atomics.
The main advantage is not foisting unsigned on the user of the API.
(You can do that while using unsigned internally, but then you have to convert back and forth.)
The most important decision is what is the index type at the API level of the circular buffer, not what is inside it. But it's nicer if you can just use the API one inside.
The sizeof operator yielding the type size_t which is unsigned has done a lot of harm. Particularly the way it spread throughout the C library. Why do we have size_t being unsigned? Because on small systems, where we have 16 bit sizes, signed means limiting to 32767 bytes, which is a problem. In all other ways, it's a downer. Whenever you mention sizeof, you have unsigned arithmetic creeping into the calculation.
The author of the above blog article has the right idea to want a sizeof operator that yields ptrdiff_t instead of size_t. (Unfortunately, the execution is bungled; he redefined a language keyword as a macro, and on top of that didn't wrap the macro expansion in parentheses, even.)
> Yes completely consistent with rules of modular arithmetic.
In modular arithmetic, there is no such thing as <. (To put it precisely, ℤ_𝑛 is not an ordered ring.) Or are you teaching your 6-year old that 9:00 today is later than 7:00 tomorrow?
Unsigned arithmetic is useful for wrapping clocks, like interrupt tick counters and whatnot. There is always some current value, "now". There is a range of it defined as the future. Everything outside of that range is considered past. Timers are never set farther into the future beyond the range, and are expired in a timely way so that unexpired timers never recede sufficiently far into the past that they appear to flip to the future. One way of doing it is to just cut the range in half: take the difference between two times t1 - t0 and cast it to the same-sized signed type. If the difference is positive, then t1 is in the future relative to t0. If negative, t1 is in the past relative to t0.
This is one of those niche uses of unsigned.
You probably want to hide it behind an API, where the domain is opaque and abstract and you have function such as a time_before(t1, t0) predicate.
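(A sketch of the API hinted at above; tick_t and time_before are illustrative names, and the unsigned-to-signed cast relies on the usual two's-complement behaviour.)

#include <stdint.h>
#include <stdbool.h>

typedef uint32_t tick_t; /* free-running, wrapping counter */

/* true if t1 is in the (half-range) past relative to t0 */
static bool time_before(tick_t t1, tick_t t0)
{
    return (int32_t)(t1 - t0) < 0;
}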
True, but this is not valid if they are signed, either. Take
a = INT_MIN
b = 1
c = 2
Then
a < b + c
is true. But
a - b < c
invokes undefined behavior.
Edit: missed
> Say that a, b and c are small integers (we don't worry about addition overflow)
Ah, well that makes this example vacuously true, however I'm not sure what the utility in that restriction is. We've only moved the goalposts from "bend[ing] the rules of arithmetic around zero" to bending the rules of arithmetic outside of "small integers".
> however I'm not sure what the utility in that restriction is.
We have moved the goalposts much farther apart.
If we are using a 32 bit integer type, all we need is that a, b and c fit into 31 bits. Then there is no way that b + c or a - b overflow. For a single addition or subtraction, we just need one bit of headroom.
I.e. the values do not actually have to be that small.
There are all kinds of situations in which programs work with small integers, where the calculations could bork if an unsigned creeps in.
A cliff near zero is qualitatively different from clipping at two extremes. An electronic device that clips everything below zero volts will distort even the faintest waveform. One that clips near the power rails has clean headroom.
If b = 0x7fffffff and c = 0x7fffffff, b and c both fit in 31 bits, and b + c overflows to -2 in signed int32 twos-complement math (I think).
If b = 0x40000000 and c = 0x40000000, b and c both fit in 31 bits, and b + c overflows to -2147483648 in signed int32 twos-complement math (I think).
Maybe the definition of "32 bit integer type" you're using is meant to encompass only 32 bits as all unsigned (but then there are a - b terms that would overflow if b > a).
They don't fit into a 31 bit two's complement (i.e. signed) representation, in terms of representing their interpretation as the familiar 32 bit INT_MAX.
31 bit two's complement goes from -0x40000000 to 0x3FFFFFFF. There is a 0x7FFFFFFF bit pattern, which represents -0x00000001. It has a sign bit which is 1. (So, adding that to itself does go to -2, but under that interpretation there is no overflow.)
Any pair of values in that range can be added or subtracted in 32 bit two's complement.
Including the most negative value: -0x40000000 + -0x40000000 = -0x80000000.
so... what I'm seeing is that C got it wrong relative to the way things actually work and get used.
the fact that you had to have tribal knowledge about all of this is why C shouldn't stay for the long term and we should phase out languages into ones with stronger more correct defaults.
would a new programmer use "long long"? would they notice immediately that things didn't work if they didn't use it?
Rust got it correct by labeling the bits with the type directly
Rust's integer types are poorly abstracted. The use of specifically sized types for quantities that are not related to hardware is comically ridiculous.
In the C world, only the goofballs do things like use char or int8_t for the number of children in a family, or wheels on a car.
yet that is what Rust code looks like. Almost every Rust code sample I've ever seen sets off my bozon detector just for this reason.
The author did qualify it with personal coding style. Frankly the standard types are too verbose and I wish this guy's elegant and clear list had been the one that was adopted way back when.
That ship has sailed ages ago. There are some things you should just accept about C, or any programming language really. Just because you can do something doesn't mean you should do something. I don't know how many years of experience in C this guy has, but this is a "been there, done that" case for me. I stick to stdint and stdbool today, and even if only half the code/libs I interface with do that, it's already worth the extra _t-typing all the time. Just the fact that he uses the i prefix for signed, and s for string, means there's a high chance that his s8 string type gets confused with an 8-bit signed int.
But as you say, it's a personal style, and the author seems to be aware of that:
> I’m not saying everyone should write C this way, and when I contribute code to a project I follow their local style.
Because that's by far the most important rule to follow in any language.
I think the rest is less controversial, the 0 vs. NULL thing has been going on forever; I didn't check recently but I'd assume "const somestruct *foo" would still sometimes help out the compiler to optimize vs. the non-const version.
> The author did qualify it with personal coding style. Frankly the standard types are too verbose and I wish this guy's elegant and clear list had been the one that was adopted way back when.
They didn't adopt it for the same reason that it is a bad idea now - too many programs already contained at least one variable named after his types.
If the standard had adopted his convention, too many programs would have broken, which is why his convention is currently unsuitable for any existing project.
> Wouldn’t existing programs just continue working?
Only ones which don't have variables named `i8` or `b32` (which is common, but not for booleans).
I've seen many projects which used the pattern [a-z][0-9]+ for variable names. Those programs with a variable called `i8` won't compile if the standard made a type called `i8`.
In particular, when the standard reserves names it reserves entire patterns to itself, so it could not reserve a pattern like [a-z][0-9]+ without breaking those programs. They could, and did, reserve the pattern *int*_t for themselves.
But that problem exists for any C project that uses an external library. If the library defines something that the project already uses, then the project will not work.
In my mind that's not a problem with the decisions taken by the author of the article, it's more of a symptom of C's limitations.
> But that problem exists for any C project that uses an external library. If the library defines something that the project already uses, then the project will not work.
For libraries, yes, but we're talking about why the standard didn't do it.
The standard did not want[1] to reserve keywords that current programs were already using.
A library that conflicts on keywords will only break with those programs that use it. A standard that conflicts on keywords breaks all programs in that language.
> In my mind that's not a problem with the decisions taken by the author of the article, it's more of a symptom of C's limitations.
One of the constraints of taking decisions is to work within the limits of the existing framework - if you're avoiding the alternatives that don't break things, then it's the decision-maker's bug, not the framework's.
The framework has limitations, widely published and known. You make decisions within those limitations.
[1] Although they do do it, it's only with reluctance, not on a whim to save typing a few characters.
Cannot declare a variable called `u8` when there is a typedef of `u8`.
And even when you can declare a variable called (for example) `int`, that effectively "breaks" the program by not being even a tiny bit readable anymore.
Those may be terrible variable names but they were understandable back in the 70s and 80s when disk space was at a premium and compilers only cared about the first 6 characters in a variable or function name. That's the downside of a 50 or so year old programming language: you have to worry about not breaking legacy code that did things based on the hardware limitations of that time.
Because those existing programs surely don't use the same identifiers for other stuff? Certainly there is no code out there using s8 for "signed char" instead of "utf8 string"? :-)
I did too. Humans use context to resolve ambiguities in language, and in this case the context was very much statistically favouring the library; if you're using it, glib is literally "someone else's type system".
It is unfortunate that the two have such similar names because there's a lot of room for confusion. It doesn't help that they have somewhat adjacent functionality almost.
Rust made the correct choice: things used most often should be assigned the shortest names. This "Huffman encoding" style is what natural languages have evolved toward as well. In 2023, if I were to write C, and didn't have existing guidelines to adhere to, I'd most probably introduce the same typedefs as the author here has done.
It’s not great but they’re just aliases so they’re interchangeable, which means you can keep everything consistent within a project and it won’t cause any problems when interacting with outside code
Until you include a header written by someone with the same opinion, and now you get compile errors because they both defined 'u8'.
I gotta be honest, all of those style suggestions look good until you try them in a non-solo and non-isolated project, and then you see what a mess you created.
We've all been there, as C programmers, and we've all done that in the past, which is why we don't do it anymore
> Unless they were defined to completely different types, that shouldn’t be an error
In this case it almost certainly will be - after all, the blog post's `byte` is defined as char, which could be signed or unsigned. A correct typedef for `byte` is `uint8_t`, so it's almost guaranteed that this will conflict.
Which is why I said it's best not to redefine the primitive types - you're almost certain to conflict with someone else who defined it differently.
On my keyboard layout it's one keypress. And since code is read about 100 times as much it's written, I don't particularly care about reaching 250 WPM while writing code. The difficulty of writing code is thinking about it, not actually physically writing it.
> These types should be defined in the right header
stdint.h
It's always been amazing to me how many different projects I've worked on (not that I've been in professional C for about 7 years now) that include their own painstaking recreation of this file.
Reusing them and effectively translating them just to your own name is just annoying to the reader IMHO. I am reminded of a C++ project I worked on, where I questioned the extensive use of typedefs around collections of things, various forms of references and compound objects etc. I was informed by one of the more experienced C++ folks that it made the code easier to comprehend.
Later I saw the typedef cheat-sheet sellotaped to the side of his monitor...
> It's always been amazing to me how many different projects I've worked on (not that I've been in professional C for about 7 years now) that include their own painstaking recreation of this file.
How many of them started before stdint.h existed? AFAIK, it's a somewhat recent addition to the C language, and IIRC, for a long time even after it became part of the C standard, some popular C compilers still didn't have it.
As recently as eight years ago, on projects started within the previous handful of years. It’s more to do with a lot of C programmers being stuck in a sort of stasis IMHO. (I’m sure I was too in many ways).
And yes, Microsoft were the outlier and absolutely dragged their heels on stdint, but you could always grab a compliant implementation from one of the FOSS projects that produced one.
There's no requirement for a born-1995 codebase to still build on a 1995 system in 2023.
I work on a born-1995 codebase. We started requiring ISO C11 plus GNU extensions¹ several years ago and are actively removing "compatibility" checks and kludges that are outdated.
[¹ to be fair - not needing to support Windows is a godsend for any C project.]
Just a note: defining your own integer types makes sense for resource-limited platforms. The most common type I see is something like "dim_t", which is 32-bit or 64-bit depending on use-case. 32-bit integers are often used even on 64-bit platforms in pointer compression schemes (for example, allocate your own heap and only store 32-bit offsets). This not only gives a 2x improvement on memory usage for <4GB workloads, but it also improves performance due to better cache locality.
More 'pointers' (32-bit offset ints) fit on a single cache line. Or, put differently, a list of offsets takes half the space, and hence everything on the list is twice as close to everything else on the list.
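(A rough sketch of the offset-instead-of-pointer idea; names and layout are invented for illustration, not taken from any particular allocator.)

#include <stdint.h>

typedef uint32_t ref32; /* byte offset from the heap base, instead of a full pointer */

struct node {
    ref32 next;    /* 4 bytes instead of 8 */
    int32_t value;
};

static unsigned char *heap_base; /* set up by your own allocator */

static inline struct node *deref(ref32 r)
{
    return (struct node *)(heap_base + r);
}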
IMO, defining your own aliases for stdint.h types is innocent enough, but manually defining prototypes for Win32 calls instead of including standard headers is one step too far. You don’t own the ABI here: things like HANDLE and WPARAM have already changed sizes once, who’s to say ULONG_PTR will stay the same as uintptr_t for all future architectures? There is a good reason why doing the same on Unix is discouraged.
With slightly different semantics; as I recall, the Linux uXX and iXX types have natural alignment (equal to size), while stdint.h types are not required to.
The big issue with custom integer types is that while they are awesome in the implementation files, they are problematic for libraries in headers. And if you want to avoid a divergence between header and implementation files you're kinda stuck with the inttypes.h ones in practice.
> Now everyone who is already familiar with C types has to learn your own quirky system
Oh I dunno. On one hand yeah learning a quirky system is an annoyance at times. On the other hand when you're coming from a language with a real type system dealing with custom types is standard operating procedure.
I've had to patch a lot of C over the years. I can't say I've ever been bothered by types. It's always the usual suspects; hard coded offsets peppered throughout the codebase, stack smashing, baby's first callback implementation, "parsing" that omits lexing/tokenizing, archaic business logic that may-or-may not have ever been correct.
Assuming you can trust those types to be what they look like, the code is readable.
I've worked with C for well over 30 years; custom typedefs are par for the course. Work with OpenMAX libs? You have OMX_U32. On Windows? You have DWORD. Using Glib? guint32 ...
But much C code is bringing in library headers which contain their author's own pet choices for these, which inevitably are not the same and the result is extremely confusing when you have that in play as well as the stdint.h ones.
The kernel contains a mixture of "pet" types like u32 and stdint ones, it's already confusing.
He also does make a "crazy" choice later to call his string class "s8" which clashes with his nomenclature here.
I agree. A lot of languages have settled on those same names or something similar. We don’t live in a world with a single word size anymore so carrying bit length in the name is critical, and so is keeping identifier names short. His trade off is exactly the one I would make.
Because we've used those names since forever, but that's archaic random crap really. Nothing apart from maybe "byte" makes sense here, the rest is completely arbitrary historic cruft. Could as well have called the rest timmy, britney and hulk.
Of course plenty of people are confused, the overhead of "short/long" just makes no sense, but yet another bad design from the past carefully preserved
Why would we ever need 128-bit CPUs? I remember the PS2 had something like that (with details and caveats I don't understand), but subsequent games consoles went back to a more usual register size: https://en.wikipedia.org/wiki/128-bit_computing
All I know is we keep having this issue with saying "nah, this is it. Nobody will ever need more than this." And then inevitably the time comes when we need more.
Back in the 80s, 16 bit programmers knew that 32 bit code was coming, so they carefully crafted the code to be portable to 32 bits.
Of course, none of it worked on 32 bit machines because the programmers had never written 32 bit code before and did the portability measures all wrong.
We used to call 16-bytes a paragraph, so the nostalgic geek in me would love to see ‘para’ catch on. I never thought I’d be slinging around whole paragraphs of memory in registers!
In that case D should probably start to have an internal conversation about what they're going to call 128 bits then, 'cause its going to become a thing sooner or later.
stdint already has that covered though: (u)int128_t
That's really interesting, and for me a totally unexpected name, having never seen that nomenclature before - would be interesting to see how consensus around that was arrived at - but hey, we gotta call it something!
(But not DoubleQuadWord please ... )
But they are buggy (correct code cannot depend on the sign of `char`), which is usually the result of typedefing primitive types to save typing 3 characters on each use.
I'm guessing this is lacking an outer pair of parentheses (i.e. it's not `((size)sizeof(x))`) on the grounds that they're unnecessary. In terms of operator precedence, casting binds tightly, so if you write e.g. `sizeof(x) * 3`, it expands to `(size)sizeof(x) * 3`, which is equivalent to `((size)sizeof(x)) * 3`: the cast happens before the multiplication. Indeed, casting binds more tightly than anything that could appear on the right of sizeof(x) – with one exception which is completely trivial.
But just for fun, I'll point out the exception. It's this:
(size)sizeof(x)[y]
Indexing binds more tightly than casting, so the indexing happens before the cast.
In other words, it's equivalent to `(size)(sizeof(x)[y])`, not `((size)sizeof(x))[y]`.
But you would never see that in a real program, since the size of something is not a pointer or array that can be indexed. Except that technically, C allows you to write integer[pointer], with the same meaning as pointer[integer]. Not that anyone ever writes code like that intentionally. But you could. And if you do, it will compile and do the wrong thing, thanks to the macro lacking the extra parentheses.
…On a more substantive note, I quite disagree with the claim that signed sizes are better. If you click through to the previous arena allocator post, the author says that unsigned sizes are a "source of defects" and in particular the code he presents would have a defect if you changed the signed types to unsigned. Which is true – but the code as presented also has a bug! Namely, it will corrupt memory if `count` is negative. You could argue that the code is correct as long as the arguments are valid, but it's very easy for overflow elsewhere in the code to make something accidentally go negative, so it's better for an allocator not to exacerbate the issue.
With unsigned integers, a negative count is not even representable, and a similar overflow elsewhere in the program would instead give you an extremely high positive count, which the code already checks for.
Personally I prefer to use unsigned integers but do as much as possible with bounds-checked wrappers that abort on overflow. Rarely does the performance difference actually matter.
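(One way to do that; this sketch leans on the GCC/Clang __builtin_add_overflow extension rather than portable ISO C.)

#include <stdint.h>
#include <stdlib.h>

static uint64_t checked_add_u64(uint64_t a, uint64_t b)
{
    uint64_t r;
    if (__builtin_add_overflow(a, b, &r))
        abort(); /* treat overflow as a bug, not a wrap */
    return r;
}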
> I could use _Bool, but I’d rather stick to a natural word size and stay away from its weird semantics.
This is even more subjective, but personally I like _Bool's semantics. They mean that if an expression works in an `if` statement:
if (flags & FLAG_ALLOCATED)
then you can extract that same expression into a boolean variable:
_Bool need_free = flags & FLAG_ALLOCATED;
The issue is that `flags & FLAG_ALLOCATED` doesn't equal '0 if unset, 1 if set', but '0 if unset, some arbitrary nonzero value if set'. (Specifically it equals FLAG_ALLOCATED if set, which might be 1 by coincidence, but usually isn't.) This kind of punning is fine in an `if` statement, since any nonzero value will make the check pass. And it's fine as written with `_Bool`, since any nonzero integer will be converted to 1 when the expression is implicitly converted to `_Bool`. But if you replace `_Bool` with `int`, then this neither-0-nor-1 value will just stick around in the variable. Which can cause strange consequences. It means that
if (need_free)
will pass, but
if (need_free == true)
will fail. And if you have another pseudo-bool, then
if (need_free == some_other_bool)
might fail even if both variables are considered 'true' (i.e. nonzero), if they happen to have different values.
_Bool solves this problem. Admittedly, the implicitness has downsides. If you're refactoring the code and you decide you don't really need a separate variable, you might try to replace all uses of `need_free` with its definition, not realizing that the implicit conversion to _Bool was doing useful work. So you might end up with incorrect code like:
if ((flags & FLAG_ALLOCATED) == true)
Also, if you are reading a struct from disk or otherwise stuffing it with arbitrary bytes, and the struct has a _Bool, then you risk undefined behavior if the corresponding byte becomes something other than 0 or 1 – because the compiler assumes that the implicit conversion to 0 or 1 has been done already.
#define FLAG_63 (1ULL << 63)
unsigned long long flags = FLAG_63;
In this case,
if (flags & FLAG_63) pass();
will pass, but
typedef int BOOL;
BOOL set = flags & FLAG_63;
if (set) pass();
won't pass, due to truncation.
Question: Would you argue that a datatype that holds the smallest (1-bit) datum should be as wide as the largest integer type just to handle such cases?
If so, that would be highly inefficient for storage purposes. Note that Win32 has 32-bit BOOL type, but internally NT uses 8-bit BOOLEAN type to store bools in structures.
> if (need_free == true)
> Is such a horrible code smell to me. You have a perfectly good boolean. Why compare it to a second boolean to get a third boolean?
> if (need_free)
You are probably interested in whether the `need_free` flag is set to true, and not in `need_free` itself. It is true that `if (need_free)` has the same behaviour, but it is some steps farther from what you are interested in.
This feels to me like you're introducing the same unnecessary extra layer into your text as in the original code. I mean, why not
"You are probably interested in whether it's true that the 'need_free' flag is set to true"
leading to
> if ((need_free == true) == true)
? Answer: because that extra layer of indirection adds nothing, and just gives you a bit of extra cognitive load and an extra opportunity to make mistakes. I think the same is true about going from "need_free" to "need_free is set to true".
(This becomes less clear if you have variable names like 'need_free_flag'. I say: so don't do that then! It's almost always appropriate to give boolean values and functions that return boolean values names that reflect what it means when the value is true.)
Actually, and this is probably surprising to many, this is equivalent to
(size)(sizeof ((x)[y]))
sizeof is not a function but a unary operator, and indexing (as well as function calling...) binds more tightly than the sizeof operator. It is not a function, not even syntactically! That's why I strongly prefer putting a space after the sizeof keyword, and not using parens for the operand unless needed.
That's a good catch. The moral of the story is that unless your macro definition expands to a single token (e.g. #define X 123) you should always, always, always surround it with parentheses. Because C's precedence rules are damn complicated.
> Because C's precedence rules are damn complicated.
This particular part is not actually complicated: the postfix operators bind the most tightly, then the prefix ones, then the infix ones. (The last part is quite messy, though.)
So (int)x[y] parses the same way as, for example, *p++, which should be familiar to a C programmer.
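(A tiny demonstration of that precedence, in case it helps; the values are only there to make the parse visible.)

#include <stdio.h>

int main(void)
{
    double x[3] = {1.5, 2.5, 3.5};
    int y = 1;
    int a = (int)x[y];        /* parses as (int)(x[y]) -> 2 */
    size_t s = sizeof (x)[y]; /* parses as sizeof((x)[y]) -> sizeof(double) */
    printf("%d %zu\n", a, s);
    return 0;
}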
> While I still prefer ALL_CAPS for constants, I’ve adopted lowercase for function-like macros because it’s nicer to read.
"ALL_CAPS" in C was not for constants, but for preprocessor macros. It's shouting in all-caps, because it means "Look out! There's a cpp macro expansion here!"
Related, please stop using "ALL_CAPS" for constants in other languages. Not only does shouting constants as the most prominent syntax in the code make no sense, but there are much better uses for shouting in a programming language.
(For an example of a good use of "ALL_CAPS": if your language ever acquires Scheme-like template-based hygienic macro transformers, "ALL_CAPS" (or "ALL-CAPS") is excellent for making template pattern variables stand out within the otherwise literal code blocks.)
The case of what is a constant, and whether or not it even really is one, is not always clear in C. As an embedded developer (almost always on bare metal), variables declared with the const qualifier are usually (but not always, it depends on the linker script) placed in read-only memory. For those kinds of variables (read-only ones, they're not really constants as in C++ constexpr) I don't use all caps. But for preprocessor macros, always. Even "#define MY_CONSTANT 10" is a macro, and not a constant or a variable. And it should be treated with caution, because it is dangerous (inexperienced programmers might change it to #define MY_CONSTANT 2 * OTHER_CONSTANT, which opens up a can of worms).
It's the reason Rust makes us use an exclamation mark with macro calls: beware! magic! here!
I like this as a convention, but not necessarily as a grammar rule. Printing values is common enough that it shouldn't require shouting for constant attention, simple code shouldn't trigger sensory overload.
I do think that the default convention of SHOUT_CASE for constants in Rust is too in-your-face given that there's nothing about Rust constants that would particularly require them to stand out. I might have gone with CamelCase given that some things in Rust already straddle the "types are CamelCase, terms are snake_case" delineation (enum variants, even data-less ones, are CamelCase, as well as the implicitly defined constant `Foo` for any unit struct `Foo`).
There is some gotcha: they're copied on use, which means you could end up with more than one copy in your binary (unlikely with an optimizing compiler), or, worse, that if you have interior mutability in them it just won't work.
Interesting take, but good luck with this fight against all caps (snake case) constants, at this point it's almost a consensus, a shared culture element, a deeply ingrained habit that the vast majority of developers have and recognize.
Changing this would be a huge undertaking I'd be afraid of engaging in, if I cared that much.
Thanks for bringing this up. There is no reason for modern languages that have proper constants (not preprocessor macros which happen to sometimes be used for them) to use this tedious style.
Constants are so innocent and useful. Why indirectly discourage their use by making their usage an eye-bleed?
So you don’t have to wonder if it’s a variable. Old Mac style uses prefixes: kConstant, gGlobalVar, TType, mMemberVar. Remember this was when all coding was done in black and white in a plain text editor.
I do it mainly as a form of namespacing and aid in readability.
do_thing(foo); // foo is variable
do_thing(FOO); // FOO is constant (i.e this call should always do the same thing)
foo = FOO; // I wanna name a variable the same as a constant
nah, sorry, I'm going to keep using ALL_CAPS_CONSTANTS (and enum members, because they're also constants). I like having constants be visually distinct from the rest of my code.
I wrote and still maintain an open source C project for 20+ years. Once a year I get a new guy coming in and telling me I am doing it wrong: you should typedef all data types, you should stop using const, and so on. It stopped being funny after the first couple times.
I'm getting an "I don't use const, and here's my view on it" vibe from the author much more than "you shouldn't use const". I'm really not getting any demand that you change your coding style, just someone reflecting on their work and explaining it to others. And... Whether I agree with their choices or not, I find that very cool and informative.
It wasn't clear to me if he's talking about const as a variable declaration qualifier - I never used it - or const in pointer types, which is very useful.
For me it's the other way around -- I use const for global variables because it makes a real difference, the data will be put in a .ro section.
Pointer-to-const on the other hand (as in "const Foo *x") is a bit of fluff and it spreads like cancer. I agree with the author that const is a waste of time. And it breaks down in situations like the one showcased by strstr().
I use pointer-to-const in function parameter lists though (most of the time it does not actually break like in strstr()): as documentation, and to be compatible with code that zealously attaches const everywhere where there (currently) is no need to mutate.
But overall my use of const is very very little and I generally do not waste my time (anymore) with it. I almost never have to use "const casts" so I suppose I can manage to keep it in check. In C++ it is a bit worse, when implementing interfaces, like const_iterator etc. That requires annotating constness much more religiously, and that can lead to quite a bit of cruft and repetition.
You're right, I also mark array definitions as const to control the section they go in, I was thinking about things like const int a = 5; ... anything I want a simple const var for is done with #define for me.
const is indeed viral when eg, used in apis, but it's a strong indication at a glance for api users what they can expect to happen to the memory the pointer points to, whether it's just for input or is modified... and the virality is only a pain (it can be a pain) if you didn't use it from the start so all the things it might call are already kitted out with it.
It's kinda both for me. If you have certainty from top to bottom that something will always be const, then maybe.
This is often not true, and even if you think so, you're often wrong. I can't recall all the consequences of the flaws in the system (promote/demote const), but it's not fun to deal with.
I've seen so many things wind up passed to a function or going through an interface eventually that's non-const (or lets not forget is "const'd for safety").
This is where some would say you should give up on practical grounds... if the mission is to determine which const scenarios can be ensured, you argue this is not practically possible and throw the whole thing out.
I will say that using const can make a huge mess of code, especially if you care about adhering to (very reasonable) guidelines.
There's a really good chance you will either have to promote or cast away the const, which I hate.
I tend to agree regarding not using const. It's been a while, so I don't have an example off the top of my head, but it's incredibly easy to break the const mechanism and have to deal with these annoying flaws.
I've just seen this go really bad with any kind of code that has a split responsibility between teams. Eventually you will have to pass to a non-const interface, that you aren't supposed to change.
So perhaps it makes sense if you have control from the top down and can ensure that the constness is maintained, or completely not, if it ends up non-const (then you could also try to move the interface to const, if it truly is)...
... I also suspect in many projects you'd just have to come to the conclusion that nothing can be const'd, because it ends up non-const anyway. Thus leading to the conclusion "just don't use const".
P.S. I'm a bad boy that didn't read TA yet. This is just based on my past experience where we didn't really have the authority to change stuff in the stack... oftentimes there was e.g. an MCU interface at the end that was non-const... guess we could contact the silicon manufacturer... sure they'll get right on that.
It's more work. Not only to put all the annotations correctly, but only because it causes some real headaches. It's easy (implicit) to transition from non-const to const 1 pointer level deep. But the other way around -- it's really awkward to "remove" a const.
The strstr() signature is probably the shortest example / explanation why. To implement strstr(), you have to hack the const away to create the return value. Alternatively, create a mutable_strstr() variant that does the exact same thing. This is the kind of boilerplate that we don't want in C (and that C is bad at generating automatically).
Think about it this way: Real const data doesn't exist. It always gets created (written) somewhere, and usually removed later. One way where this works cleanly is where the data is created at compile time, so the data can be "truly" const, and be put in .ro section, and automatically destroyed when the process terminates. But often, we have situations where some part of the code needs to mutate the data that is only consumed as read only by other parts of the code. One man's const data is another man's mutable data.
In C, the support for making this transition work fluently is just very limited (but I think it's not great in most other languages, either).
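(The shape of the problem, as a minimal sketch; this is not a real libc implementation, just enough to show where the cast is forced.)

#include <string.h>

char *my_strstr(const char *haystack, const char *needle)
{
    size_t n = strlen(needle);
    if (n == 0)
        return (char *)haystack;
    for (; *haystack; haystack++)
        if (strncmp(haystack, needle, n) == 0)
            return (char *)haystack; /* the cast that "hacks the const away" */
    return NULL;
}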
> The strstr() signature is probably the shortest example / explanation why. To implement strstr(), you have to hack the const away to create the return value.
It seems to me the "hacking" is exactly the side-effect that is wanted. It's like the requirement in Rust to do certain kinds of things in an `unsafe { }` block (or using the `unsafe` package in Go): not that you want the compiler to prevent you from doing things completely, but that you want the compiler to prevent you from doing things by accident.
> One man's const data is another man's mutable data.
Yes; and the point of `const` for function parameters is to make sure that data isn't mutated unexpectedly.
> It seems to me the "hacking" is exactly the side-effect that is wanted.
It is not. It's broken at the surface level. If you passed a pointer that is already const on your side, you get back a non-const pointer that allows you to write to your const memory.
And the C library's hacks around not being able to overload functions (which is the only reason for strstr et al's weird signature) wouldn't stop me from using const. It can be really useful both for documentation and for correctness. Think memcpy, not strstr.
But read 2 sentences further, where I had addressed this already.
> Think memcpy, not strstr.
See my other comments, I do think that making const function parameters is generally good for documentation and compatibility. strstr() is only a showcase for the limitations. Typically, const works for function parameters but not data structures.
You want to avoid promoting or casting away the const. If you start using const, you almost inevitably wind up with a mismatch.
This introduces flaws in the type system. I wish I had a better breakdown on the impact of these concerns, but I'd rather not worry at all.
Anyway, if you don't use const, this goes away. Bear in mind the minor amount of "safety" it provides, because you can just ignore it later, as you arguably tend to be doing anyway when you pass a const to non-const or vice versa.
Inevitably, outside of really small insular project (and often times even then), there's something down the line that winds up being non-const that you don't want to change.
C developers of this mindset tend to come to the conclusion that since you will immediately break the type system anyway, you might as well give up on the whole game.
Edit, adding at least one example:
Example: You define something as const and cast the const away later. If anything writes through the non-const pointer, this is undefined behavior.
Example: I believe the above is actually true for const promotion if you modify the non-const version... I think this is only after the call (edit: i.e. after it becomes const; really interested in the answer).
No undefined behavior:

/* I imagine this would be okay */
si_non_const = si_non_const + GetMagicValue();
/* Const is promoted here */
const int fparam = si_non_const;
/* Writing to fparam is undefined past here */
f_const(&fparam);

Undefined behavior:

/* I imagine writing after using as const is also not defined, but is fine at this point */
si_non_const = si_non_const + GetMagicValue();
/* Here we now have a constant value that will never be written to */
const int fparam = si_non_const;
f_const(&fparam);
/* I think this would also be UB, even though it's accessed through a different symbol */
si_non_const = si_non_const + GetMagicValue();

Interested in other opinions; I may think on it later... would it be valid for the compiler to remove that last assignment?
I'm not even agreeing with the const part. The standard did define const APIs, so why shouldn't we follow? I know that C consts are only half of C++ consts, but still.
I'm still catching const errors here and there because I do use const in APIs.
I only agree with "Declare all functions static except for entry points".
s8(s) is only for literal strings; it should be called s8_c instead and keep s8 for the default ctor.
The struct return part is okay, but I've never used it. This is not Common Lisp.
Can you explain the static functions thing? What’s the benefit of declaring all functions static if you’re compiling them as a single translation unit anyway?
I get that everyone has their own coding style, but ditching established conventions in C for personal aesthetic seems a bit much. Like, using u8 or i32 instead of the standard uint8_t or int32_t might save a few keystrokes, but it could confuse anyone else looking at the code. And the custom string type over null-terminated strings? C's built around those, and deviating from that just feels like making life harder for anyone else who might need to work with your code.
And manually writing out Win32 API prototypes instead of including windows.h might shave off some compile time, but it's like ignoring a well-maintained highway to trek through the woods. Just seems like a lot of these changes are about personal preference rather than sticking to what makes C code easy for everyone to work with.
> Like, using u8 or i32 instead of the standard uint8_t or int32_t might save a few keystrokes [...]
It is not about saving keystrokes, it is about reducing sensory load when reading it.
Sorry, I know it may sound like I'm splitting hairs, but every single time the argument of verbosity vs conciseness in programming languages comes around, this "keystrokes" argument gets trotted out, and it is extremely flawed. The core belief that conciseness is only better for faster typing but that verbosity is somehow always better than conciseness for reading is just plain wrong, and we should stop repeating it. And yes, verbosity has some advantages for reading comprehension, but so does conciseness; neither side is a clear winner, it is all about the different compromises.
I think you completely missed the point of his comment. C isn't a new programming language. There are well worn conventions and making custom types because you don't like them is like forking your own custom dialect nobody can understand for very little benefit.
u16 etc show up a lot and are unlikely to confuse programmers.
Where it does go to pieces is when two different programs both define u16, use them in header files, and then a third program tries to include both those header files at the same time. The big advantage of <stdint.h> is avoiding that failure mode.
The namespaced library type equivalent is something like libname_u32, at which point it's tempting to write uint32_t instead of the libname:: or libname_ prefix.
> Like, using u8 or i32 instead of the standard uint8_t or int32_t might save a few keystrokes, but it could confuse anyone else looking at the code.
This seems a theoretical possibility _at best_, and a fairly strawman-y one at that. I doubt any competent C programmer would get "confused". Irritated at a different style, maybe.
Too, everything is hard to read before you learn to read it. -- Rich Hickey
> To beginners it might seem like “wasting memory” by using a 32-bit boolean
Maybe I'm a beginner then. He lists a few cases where it's not worse than sticking to 8-bit bools, but no cases where it's actually an improvement. It still wastes memory sometimes, e.g. if you have adjacent booleans in a struct, or boolean variables in a function that spill out of registers onto the stack. Sure it's only a few bytes here and there, but why pessimize? What do you gain from using a larger size?
It depends entirely on the architectures | CPUs; that said, the obvious case from past experience is numeric processing jobs where (say) you flow data into "per cycle" structs that lead with some conditionals and fill out with (say) 512 | 1024 | 2048 sample points for that cycle (32 or 64 bit ints or floats) .. the 'meat' of the per cycle job.
My specific bugbear here was a junior who insisted on "saving space" by packing the structs and using a single 8-bit byte for the conditionals.
Their 'improved' code ground throughput down on Intel chips by a factor of 10 or so and generated BUS ERRORs on SPARC RISC architectures.
By packing the header of the structs they misaligned the array of data values, such that the Intel chips were silently fetching two 32-bit words (say) to get half a word from each, splicing them together to form a 32-bit data value that straddled a word boundary, piping it into the ALU, and then doing something similar to repack on the other end - the SPARCs quite sensibly were throwing a fit at non-aligned data.
Point being - sometimes it makes sense to fit data to the architecture and not pack data to "save" space (this is all for throughput-piped calculations, not long-term file storage, in any case).
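(A rough sketch of the situation described; struct names and sizes are invented, and __attribute__((packed)) is a GCC/Clang extension.)

#include <stdint.h>

struct cycle_packed {
    uint8_t flags;       /* "saves" 3 bytes...                             */
    float samples[1024]; /* ...and drags every sample off 4-byte alignment */
} __attribute__((packed));

struct cycle_aligned {
    uint32_t flags;      /* word-sized conditionals        */
    float samples[1024]; /* samples stay naturally aligned */
};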
This is the use case for `uint_fast8_t` (part of the C99 standard); it should use whatever width of unsigned integer is enough to store a byte, but fastest for the platform. You always know that the type can be serialized as 8 bits, but it might be larger in memory. So long as you don't assume too much about your struct sizes across platforms, it should be a good choice for this. Although, if alignment is an issue, it might be a bit more complicated depending on platform.
10 years ago when ATmegas were still around and your 32 bit variable was generating 3 instructions for addition I would say „right on“ but now everything is a 32 bit Cortex-M and please stop polluting your code with this nonsense
Please understand, I am still in a position where I am writing new code for a platform which only has one compiler, a proprietary fork of GCC from nearly 20 years ago. I assume other C programmers might have similar situations.
I think it's not a GPL violation if you keep the fork non-public.
Though I'm not entirely sure when something is considered private or public. You can obviously make changes to a GPL repo, compile it and run the executable yourself and just never release the source code.
But what happens when you start sharing the executable with your friends, or confine it to a company?
"I made this GCC fork with some awesome features. You can contact me at joe@gmail.com if you're intere$ted ;)"
My understanding is that the GPL only requires the source code to be made available on request for at least 3 years (or as long as you support the software, if more than 3 years). If you want to require people who want the source to write to you via the Post Office and pay shipping+handling+cost of a disc to receive the source code, I believe this is permitted by the GPL as long as you don't profit off of the cost.
Of course, for almost all practical cases, the source code for a GPLed program is made available as a download off the Internet because the mail order disc route seems really archaic these days and probably would be removed altogether in a GPL version 4 if some prominent company used this loophole to evade the spirit of the GPL. Either that or somebody would jump through your hoops to get the source and just stick it on a public GitHub repo. If you then DMCA that repo, you'd be in violation of the GPL.
If you share an GPLed executable with your friends or with other people at a company, then they'd presumably be able to request the source code. But if you run a Cloud GCC service with your fork, you could get away with keeping your source code proprietary because GCC isn't under AGPL.
All the GPL says on source code access is that you need to make the source code available to whoever you distributed your program to. If the program never leaves a closed circle of people, neither does the source code.
For example, Microchip XC16 [1]. It is GCC with changes to support their PIC processors. Some of the changes introduce bugs, for example (at least as of v1.31) the linker would copy the input linker script to a temporary location while handling includes or other pre-processor macros in the linker script. Of course if you happen to run two instances at exactly the same time one of them fails.
As far as the licensing part goes they give you the source code, but last time I tried I could not get it to compile. Kind of lame and sketchy in my opinion.
But if you don't use packed attributes, then the compiler will still add padding as necessary to avoid misalignment, while not wasting space when that's not necessary.
The key part (for myself) of ForkMeOnTinder's comment was:
> Maybe I'm a beginner then. He lists a few cases where it's not worse than sticking to 8-bit bools, but no cases where it's actually an improvement. It still wastes memory sometimes
They key part of my response is sometimes "wasting memory" (to gain alignment) is a good thing.
If someone, a beginner, is concerned about perceived wasted memory then of course they will use "packed".
As for the guts of your comment, I agree with your sentiment but would exercise caution about expecting a compiler to do what you expect in practice - especially for cross-architectural projects that are intended to be robust for a decade and more - code will be put through multiple compilers across multiple architectures, and many, many flags will be applied that may conflict in unforeseen ways with each other.
In general I support the notion of sanity-check routines that double-check assumptions at runtime: if you want data aligned, or require data to be big-endian or little-endian, etc., then have some runtime sanity checks that can verify this for specific executables on the target platform.
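A rough sketch of the kind of startup sanity check being described (the specific assumptions checked here are illustrative only):

#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Call once at startup to verify layout assumptions on the target platform. */
static void sanity_checks(void)
{
    /* Width assumption. */
    assert(sizeof(double) == 8);

    /* Endianness: see how a known 32-bit pattern lands in memory.
       Here we require little-endian, purely as an example. */
    uint32_t probe = 0x01020304u;
    const unsigned char *bytes = (const unsigned char *)&probe;
    assert(bytes[0] == 0x04);

    /* Alignment of a field the data format depends on. */
    struct sample { uint8_t flag; uint32_t value; };
    assert(offsetof(struct sample, value) % 4 == 0);
}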
If you have three chars next to each other in a struct, there's a good chance they'll take 4 bytes of memory due to padding. 4 32-bit bools guarantee it'll take 12 at least, if not 16.
Most of the time an easy optimization is to pad fields of your struct to a 32 bit boundary. Almost any compiler will do this for you (look up "struct alignment / padding"). If the compiler is going to do this anyway, might as well use the memory yourself instead of letting it be empty space. If it doesn't happen, you leave performance on the table, so doing this raises the chance that your struct/fields will be aligned.
The nuance is that each field should be at an address divisible by the field's size or the word size, not some magic constant of 32. The entire struct should also be padded to a multiple of the largest field's size. In practice this usually means 32-bit alignment.
Architectures are generally optimized for aligned access (or disallow unaligned access), but what counts as "aligned" is different for each type.
A char type that is used for a bool can be accessed on any byte boundary because the alignment of a char is 1. The alignment of a 32-bit value is 4.
However, architectures are generally more optimized for 32-bit operations in registers. If you're dealing with a char in a register, the compiler will generally treat it as a 32-bit value, clearing the top bits. (This is one of those places where C's UB can bite you.)
However, there are architectures where 32-bit access is optimized.
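A small illustration of those alignment and padding points (the values printed are typical for mainstream 32/64-bit targets, not guaranteed):

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

struct flags_char  { char a, b, c; };               /* alignment 1: typically 3 bytes */
struct flags_int32 { int32_t a, b, c; };            /* alignment 4: 12 bytes          */
struct mixed       { char flag; int32_t value; };   /* padding inserted after 'flag'  */

int main(void)
{
    printf("_Alignof(char)    = %zu\n", _Alignof(char));
    printf("_Alignof(int32_t) = %zu\n", _Alignof(int32_t));
    printf("sizeof(struct flags_char)     = %zu\n", sizeof(struct flags_char));
    printf("sizeof(struct flags_int32)    = %zu\n", sizeof(struct flags_int32));
    printf("offsetof(struct mixed, value) = %zu\n", offsetof(struct mixed, value));
    return 0;
}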
I disagree about the structs vs out-parameters thing. I’ve found it makes functions that could return an error much harder to compose and leads to a proliferation of types all over the place. In practice almost all functions can fail (assuming you are handling OOM), so having a predictable style of returning errors is more important.
Which almost noone ever does. It's very hard and almost never has any benefit. At that point you have way different problems than programming style choices...
You'd get normal errno/out-param semantics if you had access to semantic struct unpacking. But you don't. Even then, composing optional values in C has always been a bit of a pain. The two choices, in pretty much every language, seem to be exceptions and monads, and neither works in C, or would even be compatible with the philosophy most C programmers have. I guess you could attempt to pull off some kind of macro, but it only works on simple one-to-the-other calls. C++ optionals, as terrible as C++ is, are certainly more fun to use than anything C offers.
Returning option<foo> (or sum<foo, error>) is the right thing but a real pain to write in C. I'm not sure the pattern of `if (thing(...)) goto fail` on every function call is particularly wonderful either, though the Go crowd seem to like it.
Otherwise there's thread_local mylibrary_errno, which might actually be the right thing for within a library, translating it to an enum return on the boundaries.
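For reference, one hedged sketch of what returning a sum<foo, error> can look like in plain C (the names and the error enum are made up for illustration):

#include <stdbool.h>
#include <stdlib.h>

typedef enum { ERR_NONE, ERR_NOMEM, ERR_PARSE } Error;

/* A poor man's sum<Buffer, Error>: check 'err' before touching 'value'. */
typedef struct {
    Error err;
    struct { char *data; size_t len; } value;
} BufferResult;

static BufferResult buffer_make(size_t len)
{
    BufferResult r = { .err = ERR_NONE };
    r.value.data = malloc(len);
    if (!r.value.data) { r.err = ERR_NOMEM; return r; }
    r.value.len = len;
    return r;
}

/* Call site: the error check stays explicit, Go-style. */
static bool example(void)
{
    BufferResult r = buffer_make(4096);
    if (r.err != ERR_NONE) return false;
    /* ... use r.value.data / r.value.len ... */
    free(r.value.data);
    return true;
}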
Well, I should probably just say "We're done here." and stop reading the rest of the article. "Signed sizes" are an extremely surprising abstraction break that are just asking for disaster.
> No const. It serves no practical role in optimization, and I cannot recall an instance where it caught, or would have caught, a mistake.
Should you even be writing C if you haven't hit this? People mix up "in buffers" and "out buffers" all the time. "const" flags this immediately.
> Declare all functions static except for entry points. Again, with everything compiled as a single translation unit there’s no reason to do otherwise.
And when you go trying to debug something and get at a variable or function that you can't find because everything is "static", you'll curse the one who wrote the code.
> Another change has been preferring structure returns instead of out parameters.
Which is a great way to accidentally return a pointer to your stack and open a big ass security hole. Passing in the output buffers makes clear the ownership semantics.
This guy seems like he mostly writes code for 64-bit systems. The coding advice is ... okay, I guess? Maybe? In that domain?
In a 32-bit embedded domain, some of these guidelines are a good way to get yourself into a lot of trouble in a real hurry.
Eh, I hard disagree with this memo. He's either dismissing or unaware of the biggest advantage of unsigned types, namely that they make invalid state unrepresentable. And essentially all of his criticism of unsigned types is really criticism of the sloppy way old C and C++ compilers let you mix signed and unsigned numbers in math operations.
Modern C/C++ compilers can and will warn you (quite aggressively) if you mix signed and unsigned numbers without thinking about it.
A lot of the examples also seem weird. Eg, he gives a negative example of a function:
In this, he complains that you can still write buggy code:
area(height1-height2, length1-length2);
He's right - that is potentially buggy. But that code would be buggy whether the area function took signed or unsigned numbers as input. However, the signed version of this function is still worse imo because it could hide the logic bug for longer. If the area function should always return a positive number, I'd much rather that invalid input results in an area like 4294967250 than a small negative number.
Similarly, accidentally passing a negative index to a vec is much more dangerous with signed indexes because v[-2] will probably quietly work (but corrupt memory). However, v[4294967294] will segfault on the problematic line of code. That'll be much easier to find & debug.
And a lot of the examples he gives, you'd get nice clear compiler warnings in most modern compilers if you use unsigned integers. You won't get any warnings with signed integers. Your program will just misbehave. And thats much worse. I'd rather an easy to find bug than a hard to find bug any day of the week.
The advantage of using signed types is that you can reliably find overflow bugs using UBSan and protect against exploiting such errors by trapping at run time. For unsigned types, wrap-around bugs are much harder to find and your program will silently misbehave.
With unsigned you can actually check for overflow yourself very easily
z=x+y; if(z < x || z < y) // overflow
And bounds checks are just a single comparisons against an upper bound (handles both over and underflow)
size = x + y;
// or
size = x - y;
if(size < bound) // good to go
Prior to C23 (stdckdint.h) it's very error-prone to check for signed overflow, since you have to rearrange equations to make sure no operation could ever possibly overflow.
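For what it's worth, the C23 checked-arithmetic helpers look roughly like this (requires a compiler and libc that ship <stdckdint.h>):

#include <stdckdint.h>   /* C23 */
#include <stdio.h>

int main(void)
{
    int a = 2000000000, b = 2000000000, sum;

    /* ckd_add returns true when the mathematical result doesn't fit in 'sum'. */
    if (ckd_add(&sum, a, b)) {
        puts("overflow detected");
    } else {
        printf("sum = %d\n", sum);
    }
    return 0;
}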
You can write correct programs with both. The reality is that people often fail to do this. But you can automatically detect signed overflow and protect against it, while unsigned wrap detected at run-time could be a bug or could be just fine (e.g. because you did your own "overflow" check and handle it correctly). This makes it extremely hard to find unsigned wraparound bugs and impossible to trap at run-time.
The no-const people will never be satisfied, so just use const as necessary, propagate as required, and ignore them when they complain. If they take it out, put it back in. They'll always get bored first. I've been doing this for 25 years, and I'm still here.
(The static thing might depend on the tooling. I went static-by-default about 15 years ago, around the same time I went full size_t, and I've yet to have a problem with it.)
I’ve been here 30 years. I’ve never found much use for const. I value brief, simple code that doesn’t rely on things like const to tell people what’s going on.
Codebases have their own conventions and design patterns. If you have that const is a needless formality.
Code should be simple and clean first; constantly stating things that are obvious 90% of the time isn’t that.
What happened to the conditional expressions? Move them to the interiors of doX() and doZ().
That was an interesting point. Not sure that it's always valid but I guess it depends where you want the abstraction to lay, and how it affects the mental construct around the code.
e.g.
deleteRecords();
is not better than
if let x = deadRecords()
deleteRecords(x);
Sure, it looks messier but there is value in showing upfront that you're pruning and not wiping.
If the author wisely renames his function, e.g. pruneDeadProjects(), yes. But merely moving the condition within the function can be dangerous for context and be a leaky abstraction.
typedef all structs - yes, helps with conciseness. Use typedefs liberally, I say. But only typedef the things themselves, not pointers to the things. You can always use (type *) when you need a pointer. In particular, for function pointers, typedef the function, not the function pointer. Then you can use the function typedef for function declarations too, which gives you parameter type checking without needing to fix declarations everywhere if you change a function signature. I see most C codebases get this one wrong, typedef'ing the function pointer and still needing to manually write out all function declarations for that pointer definition.
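A quick sketch of the function-typedef point (the callback names are hypothetical):

/* Typedef the function type itself, not a pointer to it. */
typedef int EventHandler(void *ctx, int event);

/* Declarations via the typedef: parameter types are checked in one place,
   and changing the signature only requires editing the typedef. */
EventHandler on_key, on_click;

/* Definitions still spell out the signature (a typedef can't define one). */
int on_key(void *ctx, int event)   { (void)ctx; return event; }
int on_click(void *ctx, int event) { (void)ctx; return -event; }

/* Form a pointer only where one is actually needed. */
static EventHandler *handlers[] = { on_key, on_click };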
I'm not sold on the structs as return types thing. I prefer just a numeric error code as a return value, and out parameters for any other returns.
I prefer to use typedef's for opaque structs to emulate classes with all private fields, and use 'struct' for plain ol' data structures. Classes should only be accessed via functions, while structs can be accessed directly.
I think this is more-or-less a C/POSIX standard convention. E.g., `pthread_t` vs. `struct stat`.
> I prefer to use typedef's for opaque structs to emulate classes with all private fields, and use 'struct' for plain ol' data structures. Classes should only be accessed via functions, while structs can be accessed directly.
That's all fine, but you cannot have nicely behaved stack allocated structs and use the data hiding method outlined in that blog post, which I think is a pretty big caveat
Yes, allowing clients to control allocation of a struct is a crucial feature of any C API, especially if it's going to be used on embedded targets where heap is unavailable or restricted.
The pattern I like to use for this is to expose class definitions, and declare each field with an underscore suffix to indicate it's private.
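Something like this, as a rough sketch (hypothetical names):

#include <stddef.h>

/* The definition is visible so callers can put a Logger on the stack,
   but the trailing underscore marks fields as "private; use the functions". */
typedef struct {
    int    fd_;
    size_t count_;
} Logger;

void logger_init(Logger *l, int fd);
int  logger_write(Logger *l, const char *msg);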
> But only typedef the things themselves, not pointers to the things.
I agree with this. One of the things I dislike about SDL_net, etc, is they do exactly what you're describing. It's a pointer but they typedef it as if it's a value type.
I’ve started writing a bare metal OS for Arm64. It’s very early but I’ve done some similar things. I’m using pascal strings, I’ve also renamed the types (though I’m using “int8” style, not “i8”).
I quickly decided that I never intend to port real software to it, so I really don’t have to conform to standard C library functions or conventions. That’s given me more freedom to play around. C is old enough to have a lot of baggage from when every byte was precious, even in function names.
It’s nice to get away from that. Much like the contents of this post, that plus other small renamed just ended up feeling like a nice cleanup.
Yeah, old C compilers would only look at the first 6 characters of a name, and the rest were insignificant. That's how you get names like "strcpy" and "malloc" instead of something like "string_copy" or "mem_allocate" (I still think "memory_allocate" would be long enough to be annoying to type).
One of last vestiges of this fact AFAIK was libjpeg, which had a macro NEED_SHORT_EXTERNAL_NAMES that shortens all public identifiers to have unique 6-letter-long prefixes. Libjpeg-turbo nowadays has removed them though [1].
Assuming float is 32 bits and double is 64 bits sounds like a foot-gun. OpenCV defines a float16_t [0], CUDA implements half-precision floats [1], micro-controllers implement whatever they want.
C++23 introduces fixed width floating-point types [2], but I'm not aware of any way to enforce this in C. What I would suggest is to have a macro to check that data is not lost, at compile time.
Generally I agree with others, it might be better to leave some of these things as default for readability, even if it is not concise.
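In C, a compile-time guard along these lines is one option (a sketch using C11 _Static_assert; the exact conditions depend on what the code actually assumes):

#include <limits.h>

_Static_assert(CHAR_BIT == 8 && sizeof(float) == 4,
               "this code assumes float is 32 bits");
_Static_assert(sizeof(double) == 8,
               "this code assumes double is 64 bits");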
The notion of "personal style" is problematic, even for hobby projects, because (good) programming is ultimately a social activity.
Even Linux started as a personal project, but because of its quality and the need it met it quickly spread. So please write your code in such a way that experienced other C programmers can read it easily.
In isolation, I like some of his ideas, but some issues with C remain, and he is perhaps just too comfortable with C to jump ship and embrace Rust, which has many things he likes and more (e.g. no buffer overflows by design).
I adopted the style of writing all macros in lower case with the prefix "macro_", so i can grep through all macros. So macro_ becomes like a keyword. Same with enum.
I use almost the same naming scheme for i32,f32, etc., but i typedef for example size_t const to usz, and size_t to usz_ (or use macro mut(x) x ## _) . So all shorter type names are const by default. I use a single header of 35 sloc to do that.
I see that this way of coding can be confusing for other people, so i avoid it when writing code that other people have to work with. But for personal code its really enjoyable for me.
That breaks any macro that uses sizeof in its expansion, and subtly changes any code snippet you might bring into the code that uses sizeof, even if those macros are defined first.
Speaking of which, if you define a macro for a C keyword before including any standard header, the behavior is undefined.
It's an unparenthesized unary expression, which has a lower precedence than postfix. sizeof(x)[ptr] will turn into (size)sizeof(x)[ptr] which parses as (size) ( sizeof(x)[ptr] ).
This reminds me of "#define max ..." in Windows.h. Not as bad, but if you autoreplace `sizeof(ptrdiff_t)` with `sizeof(size)`, good luck, because it will output size of type of size variable, if it exists in the scope.
Yeah that's why the uppercase/lowercase hardline is a good one. Even Lisp, where free variable capture is a legitimate design pattern, has troubles with this.
Please don't. `const` is incredibly valuable, not only to the reader, but to the compiler.
Take for example:
int Foo_bar(Foo const* self);
Just looking at this signature, I know that calling `bar()` will not modify the state of the object. This is incredibly valuable information to the reader.
Furthermore, if I want to create a `Foo` constant, I can only call this function if it is `const`.
static Foo const a_foo = FOO_INIT(&some_params);
return Foo_bar(&a_foo); // Will not compile without 'const' in function
`const` is valuable to the compiler, since `a_foo` can be placed into ROM on some platforms like MCUs, saving precious RAM.
You only know there are no mutations if Foo itself does not contain any indirections. Additionally, the compiler generally cannot assume that Foo_bar does not modify Foo as it is legal to cast away const as long as it is not originally a variable declared as const (so in your static Foo example it would be UB to cast away const).
static + const is valuable, but const parameters are merely a convention, there is no actual enforcement around them and due to aliasing the compiler generally can’t assume the parameter doesn’t actually change anyway.
> Additionally, the compiler generally cannot assume that Foo_bar does not modify Foo as it is legal to cast away const
No, but it can warn you!
The type is meant to capture programmer intention, and if you use `const` the compiler can warn you that your intention does not match the intention of the existing code (like, the intention of the author who wrote Foo_Bar).
const in C and C++ is an abomination. On a pointer it doesn’t tell the compiler to do shit, because it can’t.
That I can agree with TFA. However I agree with the GP that dismissing it entirely is a little misplaced. It serves as a hint/documentation and I think the article undersells the value of rodata (not the pointer use of const which is basically shit).
I mean I have seen at least a few SIGSEGV/aborts due to attempted writes to ro memory. Also like, one of the few modern justifications for C, embedded, const still has important link time meaning.
Final only protects the variable from being assigned a new reference (similar to a const pointer). It doesn’t protect any of the underlying data held by the object from being changed, unless the entire hierarchy has every field declared final as well. I still use final heavily in all of my Java code, but it doesn’t convey the full intent I would like it to.
I remember James Gosling saying, a long time ago, that the whole class should be either mutable or not so you do not need to tag some methods with const.
The consequence is that you may define two classes, one non-mutable and one mutable like String/StringBuilder.
It means you have to triplicate each mutable class, because besides the immutable variant you also need the common interface (e.g. CharSequence), in order to pass mutable instances to read-only functions.
Yes, so three classes. I’m counting a Java interface as a class, because it is the same as a purely abstract class. In any case, three different named types.
As a side note, I would say the interface is unmodifiable, not immutable, because references of the interface type may refer to mutable instances that can mutate while you use them through the interface. Immutable = doesn’t change state; unmodifiable = you can’t change its state via that reference (but it might change its state due to other concurrent code holding a mutable reference). This nomenclature comes from the “unmodifiable” collection wrappers in Java, which don’t make the underlying object immutable.
I'm afraid you are mistaken. In particular for pointers, const does not guarantee that the memory at the location pointed to won't change. Const only guarantees that the address itself doesn't change.
> const does not guarantee that the memory at the location pointed to won't change
I didn't say this. I said a `const` function tells the reader that the state of an object doesn't change.
Another reader correctly pointed out that there are ways to modify the state of a `const` parameter (indirection and const cast), but I would argue that such an API is poorly designed.
To qualify my original comment, a reader only knows a function doesn't change an object's state if the API is well-designed.
Even then, some other function can change the memory at the address of self while this one is executing, especially in concurrent systems. Additionally, any other pointer pointing to the same address can also modify self's memory. const in this case is really just "scout's honour".
float4 is nice, but in video games, especially graphics, the vast majority of our floats are 32 bits but we also use 16 bits floats quite extensively, and in this case it is quite practical to follow the same convention. f32 -> f16 f32x4 -> f16x4
And when working on some heavily optimized SIMD code on the CPU side, I tend to use the default types even less.
My preference for float4 comes from HLSL so I'm aware. For vectors of 16 bit floats my preference would be half4 etc. (Although I do concede that in HLSL land that doesn't actually do what you want on older language versions.)
IMO including the number of bits in all primitive types is usually an overcorrection from trauma caused by C/C++'s historic loose definitions of primitive type sizes. However I don't write much CPU SIMD code and can definitely see how you'd develop your preference from that context.
If you're dead set on doing this, the correct way would be to name the macro in all caps e.g. #define SIZEOF(x) as C is case sensitive. It is somewhat self-documenting to the next guy that SIZEOF() != sizeof().
Considering '#defines' are handled in a textual pre-processing pass by the C pre-processor, they don't know much at all about the C language. You can redefine int, long, struct or anything.
I have seen many people redefine 'for' and 'while'. These people often argue that it is an improvement.
This is a very common macro to get static array lengths, and I'm not sure there is any other way to do the same thing (i.e. give it a static array, get back the number of items in it).
countof(foo()) looks like foo() is only called once, but would actually be called twice. That's what GP is talking about, it's evaluated twice after the expansion when the code is actually running, not during the expansion.
It is not evaluated for regular arrays. It is evaluated for arrays with variable size, so you need to be a bit careful. But this rarely happens to be a problem.
The general rule for sizeof is to apply it only to variable names or directly to typenames.
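For reference, the array-length macro being discussed usually has this shape (my sketch, not a quote from the article):

#include <stdio.h>

/* The operand of sizeof is not evaluated, except when it has a variably
   modified (VLA) type - the caveat mentioned above. */
#define countof(a) (sizeof(a) / sizeof((a)[0]))

int main(void)
{
    int table[] = { 3, 1, 4, 1, 5, 9 };
    printf("%zu items\n", countof(table));   /* prints 6 */
    return 0;
}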
For example, it would be illegal to do the following:
> #define int long
Because you're replacing the int keyword with something else.
The standard says:
> 17.6.4.3.1 [macro.names] paragraph 2: A translation unit shall not #define or #undef names lexically identical to keywords, to the identifiers listed in Table 3, or to the attribute-tokens described in 7.6.
That's C++.
But I couldn't find the same restriction in C. In fact it seems that C allows it as long as you don't include any of the standard C headers.
> 7.1.2 "Standard headers" §5 [...] The program shall not have any
macros with names lexically identical to keywords currently defined prior to the inclusion of the
header or when any macro defined in the header is expanded.
And so you still can re-define keywords, but only after you've included all the standard headers you want. Which makes sense: the meaning of including a standard header is entirely standard-mandated (they are not even required to be actual files) so making anything that could potentially mess with the implementation's implementation of standard headers UB is reasonable.
While there are a few disagreeable points, I like the article.
I've always felt that C is unfairly maligned. Yes, it's very low level, it's meant to be. Yes, it lets you shoot yourself in the foot, but what language doesn't?
Most of the problems with C are really issues with the standard library, the Unix (now Posix) interfaces, and the string type.
None of these are actually part of C, but are part of how C is normally used. So those problems can be avoided, and use C for what it's good at.
It's not unfairly maligned, it's just that everyone remembers their college/university 'learning experience' which made no distinction between C/C++, they were told to use the Borland compiler, and when trying to learn printing "hello world" they only got a `segmentation fault` error instead of a stack trace. When they asked why it's so hard, they were told C/C++ is hard - so they dropped the class.
Then they picked up a JS or Python class, were told high-level languages are easy and voila! they started to understand programming.
That's the reason people are spiteful of it. They had a terrible learning experience right out the gate.
No; it's up to the program author to link against a library that provided back-traces (and maybe install a signal handler to call into that unwinder). Even then, some kind of information needs to be retained in the binary that's normally not (-gmlt comes to mind).
Usually folks attach a debugger to capture a stack trace. Usually the debugger uses debug info to determine where the program is and what its stack trace is. Or it can walk frame pointers. Depends on if either is even used, which is a compile-time decision.
The issue is that it's a spectrum: how easily you can shoot yourself in the foot, especially on accident, without awareness of the risks. And perhaps what the consequences are when you do. Risk and consequence. C is high risk and also high consequence.
In higher level languages, you can't shoot yourself in the foot nearly as easily in such a way as to trivially create a correctness problem and security vulnerability (like a buffer under/overflow). Languages like Java and C# make it pretty difficult to shoot yourself in the foot this way (though you still can in other ways, like with incorrect concurrency). Rust makes it a lot harder to shoot yourself in the foot across the board, especially by accident (i.e., without being aware that you're doing something dangerous and low-level, viz. `unsafe`).
The "categorically" part is a useless qualification, you don't program in a binary world, the ease with which a footgun is possible in a language is very important and can't be reduced to isPossible
Haha, you have no idea how powerful my dissociation is! I could very well be programming in a binary world, if my dissociative identity is linked to the computer!!
Jokes aside though, I doubt that no programmers are immersed in the computer while they code. It totally is a different world, just implemented inside this one.
> I've always felt that C is unfairly maligned. Yes, it's very low level, it's meant to be. Yes, it lets you shoot yourself in the foot, but what language doesn't
Isn’t it a beauty of lower level languages that creating higher level abstractions provides more value?
I like it a lot. Especially the part about ditching const qualifiers. They clutter function declarations, don't make the intent any clearer, and almost never improve performance. Restrict, on the other hand, I've found makes the compilers emit better code in many cases.
But I don't like using 1 and 0 instead of booleans. Many standard C functions (fclose for example) return 0 on success. Better to be explicit here.
I like using the const keyword, and believe it serves a real purpose for readability. I feel like most things are read access by default, which is why it seems cluttered. I believe Rust gets immutable-by-default correct.
I use an exitint typedef to signify "0 is success, non-0 is failure" and boolint equivalent to his b32. Not typesafe of course, so it's just info for fallible humans.
I think C really needs an update to the standard library that includes these shorter types. Seems like a fine list though.
Some of my own style changes this year:
I try really hard to write functional code. Mainly try to keep functions pure, and write declarative code. I find that this makes the code easier to write (not necessarily read), and I'm less scared of bugs.
I also avoid malloc unless I absolutely need it. You can usually preallocate space on the stack or use a fixed length buffer, which pretty much avoids all fears of memory leaks or use after free type bugs. You will sometimes waste memory by allocating more than you need, but it's a lot more predictable.
> This seems like a bad idea, because the whole point of an assert is that something shouldn't happen, but might due to a (future?) bug.
And so it’s a bad idea because…?
The whole idea is to notice a bug before it ships. Asserts are usually enabled in test and debug builds. So having an assert hit the “unreachable” path should be a good way to notice “hey, you’ve achieved the unexpected” in a bad way. You’re going to need to clarify in more detail why you think that’s a bad thing. I’m guessing because you would prefer this to be a real runtime check in non debug builds?
Yikes. I did have to go down a little rabbit hole to understand the semantics of that builtin (I don’t normally write C if that wasn’t immediately obvious from the question) but that seems like a really questionable interpretation of “this should never happen”. I would expect the equivalent of a fault being triggered and termination of the program, but I guess this is what the legacy of intentionally obtuse undefined behavior handling in compilers gets you.
The builtin itself is fine. It works exactly as intended. It says "I've double and triple checked this. Trust me, compiler. Just go fast". But you should not use it to construct an assert.
Eh. I absolutely get what you're saying. And this is for sure flying very close to the knife's edge. But if your assertion checks don't run in release mode, and due to some bug, those invariants don't hold, well, your program is already going to exhibit undefined behaviour. Why not let the compiler know about the undefined behaviour so it can optimize better?
The nice thing about this approach is that the assertion provides value both in debug and release mode. In debug mode, it checks your invariants. And in release mode, it makes your program smaller and faster.
Personally I quite like rust's choice to have a pair of assert functions: assert!() and debug_assert!(). The standard assert function still does its check in both debug and release mode. And honestly thats a fine default these days. Sure, it makes the binary slightly bigger and the program slightly slower, but on modern computers it usually doesn't matter. And when it does matter (like your assertion check is expensive), we have debug_assert instead.
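As a concrete sketch of the trade-off being discussed (using the GCC/Clang builtin; this is my illustration, not the article's exact macro):

#include <assert.h>

/* Debug builds: check the invariant. Release builds (NDEBUG): promise the
   compiler the condition holds so it can optimize on that assumption.
   If the promise is ever false, behaviour is undefined. */
#ifdef NDEBUG
#  define ASSUME(c) do { if (!(c)) __builtin_unreachable(); } while (0)
#else
#  define ASSUME(c) assert(c)
#endif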
> But if your assertion checks don't run in release mode, and due to some bug, those invariants don't hold, well, your program is already going to exhibit undefined behaviour. Why not let the compiler know about the undefined behaviour so it can optimize better?
Usually in release mode you want to log the core dump and then fix the bug.
Yeah; thats why I like rust's approach. You can either leave assertions in in release mode, so you get your core dump. Or you can take them out if you're confident they won't fire in order to make the program faster.
The unreachable pragma suggested by the author is just a more extreme version of the latter choice.
Really lovely. A lot here reminds me of design in Odin lang. Short integral types, no const, composite returns over out params. Big fan of the approach of designing for a single translation unit and exploiting the optimisations that provides from RVO etc.
It makes no sense to use the word "length" to mean one less than the number of items. You could call it maxindexof perhaps.
There may be good arguments for zero based indexing, but we have to also accept that there are downsides. One is that your code has to feature an artificial quantity obtained by subtracting one from a meaningful quantity.
OK, thanks I don't use C much and definitely forgot about that. But my point still stands doesn't it, in that the author is using these preprocessor macros for general arrays, which don't have a special terminator sentinel.
I came back to C after a good long time in Go, and I found that my C style had picked up some of these same good ideas, which I attributed to Go. In particular I also swore off NUL terminated strings, and started using structure returns to send back multiple values.
Pretty much agree with most of this. My own personal evolved style is pretty similar. I suspect this kind of pragmatic style offends the theoretical and the academic. That can be intimidating. I'm glad at least one other person out there is like me.
Sharing preferences and opinions on code ergonomics certainly has value for me, and I bet it has to other people too. This is, after all, a developer's forum.
I'm certain your opinion on petunias and your possible distaste for orchids will be welcomed in a flower-news type orange site. :-)
Rugby has much lower injuries than American football, it's often argued because rugby players don't use the helmets and padding and so are less willing to make catastrophically dangerous hits.
I would not use typedefs, as this should not be in the C syntax (like enum, switch, and much more).
The primitive type names should be native, but to "fix" C I prefer using the C preprocessor (I use it for namespace/name mangling too). This is not perfect, but should be already way more than enough.
With proper preprocessor usage (without going amok), one can fairly easily write single-compilation-unit software.
>> typedef all structures. I used to shy away from it
it seems this may be more an issue with being shy, than a coding issue :P
in my view, language is a Style, so coding in C is c-style coding.
when i started coding in python (small projects initially), my code cried out Java Java (typing, packaging, naming, oop), and took nearly as long to write - I laughed my head off when I realized, just because I could doesn't mean I should.
Can anyone explain the use of ptrdiff_t instead of size_t (he has typedef'd "size" for ptrdiff_t)? The macros are wrapping _Alignof and sizeof, both of which can't return negative numbers, I thought.
Great article. One thing for me is that I think we named our variables wrong. We should be specifying the number of bytes, not bits. I use
U1, U2, U4, U8, I1, I2, etc
Also S for "slot" aka unsigned pointer sized integer (usize_t)
Another big point is formatting code to line up instead of with an autoformatter. When you are doing something which is almost the same but slightly different, it helps readability considerably. It is also a sign of a well-loved codebase, since I've never seen an autoformatter that can do it.
Maybe we could make formatters at least auto _detect_ that code is already aligned and to just leave that code alone. Some kind of "love heuristic"
Could you elaborate on the reasoning behind using byte length instead of bit length?
Most of the time when I use fixed-width int types I’m trying to create guarantees for bitwise operators. From my perspective I feel like it therefore makes the most sense to name types on a per-bit level.
We almost always talk in bytes. When trying to reason about alignment it's bytes, when reading a serial IO from a file it's bytes. I hardly ever think in bits and when I do, I think in hex not decimal.
I also like that it makes all the type names the same width (notably U1/U2 vs u8/u16).
Honest question: why C? I've been writing C++ firmware for SoCs, supposedly an area where C is supposed to be king, and C++ was just as good, except it came with all the batteries included. So, why C?
Because C++ without exceptions is not C++, and those languages cannot catch exceptions, nor call overloaded functions, nor delete or create objects using new and delete, nor refer to fields with classes, nor call methods on objects.
you can run valgrind and it will just point out to you "you freed the memory on this line and then tried to reuse it 10 lines below here, fix it." -- every time.
And once you fix it, you have built a light weight library that you can use from any other language.
Also, this pointer manipulation is what gives C its power.
I don't use any dynamically allocated memory in my firmware projects.
I do sometimes use statically allocated pools for things like packet buffers and allocate chunks out of them, but their lifetimes are not scope based, so automatic call of constructors/destructors would not be of any help.
What do you want me to say? If nothing in your code owns a resource and needs to dispose of it when it's done, then obviously, you don't need smart pointers. I'm not here to evangelize, I'm trying to understand why someone would straight out refuse to use a language that offered more options if needed.
I'm also wondering if you've heard about std::span, given your use case. I would be surprised if you weren't rebuilding much of that functionality.
Everything around the rules of destruction in derived classes from a base class with/without a virtual destructor.
How about capture of values in a lambda within a loop? How to prevent a template expansion from killing your build process? When are move semantics sufficient for the std container classes and when are they not? What's the order of construction of multiple objects at file scope? When should a copy constructor be written so that the default shallow copy is prevented?
All of those are footguns, because if you do them wrong the program has runtime bugs without any compiler warnings.
They are all absent from C.
I'm typing on a phone, so won't go into detail, but if you want to make such an insane claim, bear in mind that Scott Meyers himself said that C++ is too complex for him.
You are not disagreeing with me when you make that claim, you're disagreeing with one of the world's foremost experts on C++.
But you have an uncontrollable urge to write here? Someone's holding a gun to your head?
> You are not disagreeing with me when you make that claim, you're disagreeing with one of the world's foremost experts on C++.
I guess that settles everything then. Never mind that you're misquoting him.
Look, if you hadn't written the last two paragraphs, I'd have replied to your points, but they strongly indicate it would fall into deaf ears. You're clearly more interested in entertaining the peanut gallery more than actual discussion.
> I guess that settles everything then. Never mind that you're misquoting him.
I'm not misquoting him - you are free to provide a link to the context in which he said what he said.
The worlds foremost expert in C++, author of dozens of books on C++, disagrees with you. I'm merely agreeing with him.
> Look, if you hadn't written the last two paragraphs, I'd have replied to your points, but they strongly indicate it would fall into deaf ears. You're clearly more interested in entertaining the peanut gallery more than actual discussion.
The fact that you entered a thread about C practices, then got all salty when you tried to go with the "but why not use C++?" argument, then devolved into personal attacks is ... well "classy" is not the word I'd use.
EDIT: You can't respond to those points - those are all well-known footguns that are present in C++ but not in C. What were you going to respond with? "No, C++ doesn't have those!"?
Because I can read someone else's C code without needing to look up almost all the quirks they used, while I cannot read someone's C++ code without having Google handy.
When I started reading the first section, about his own types, I couldn't help thinking: oh my, sounds like "Hungarian notation"[1] :)
I think those definitions go in a header file. But how different would it be if he used existing types with an abbreviation system? And is this feature available in some IDE?
i love this blog, i hold Chris Wellons in very high esteem, BUT i utterly disapprove of the usage of macros so much, especially to wrap cstd types, functions, etc.
I'm currently on a C++ (mostly C with C++ compiler) trip. It does make some things easier and some things harder. It makes it easier to work with C++ developers :-). I sometimes use the more involved C++ features but often regret it after because of complications.
But one thing that makes it worth it is the removal of the struct tag space. I have a strong dislike for the struct tag boilerplate in C, but the alternative -- typedef boilerplate -- in C is unbearable to the point that I have a macro to define structs in C that does this automatically.
#define STRUCT(name) typedef struct name name; struct name
STRUCT(Foo) {
    int x;
    int y;
};
But macros often come with disadvantages. In this case it's that many IDEs have trouble finding the struct definitions from a usage site.
In general, the style the author has adopted is to introduce brevity where he can, and use wrappers over what would have been standard and idiomatic C code. In most situations, these conventions aren't good in a non-solo project, because they simply aren't as obvious to the programmer.
He says so himself:
> I don’t intend to use these names in isolation, such as in code snippets (outside of this article). If I did, examples would require the typedefs to give readers the complete context. That’s not worth extra explanation. Even in the most recent articles I’ve used ptrdiff_t instead of size.
You require extra work to understand his basic types before reading even a short snippet, so he doesn't use it when he wants people to read short snippets.
Introducing additional stuff the programmer must remember that does not add any safety is pointless busywork.
A non-complete summary of his conventions:
1. typedef standard typenames to 3-char symbols,
2. remove qualifiers like const,
3. use macros to reduce the amount of typing the programmer does,
4. typedef all structs (and enums too, I assume)
5. A macro-ized string type with prefixed length.
> Starting with the fundamentals, I’ve been using short names for primitive types. The resulting clarity was more than I had expected,
This isn't clear: `int8_t` is a lot clearer to a C programmer than `i8`, because a C programmer has already internalised the pattern of the stdint.h types. This is going to lead to subtle bugs as well: quick, according to his convention, what is the % specifier for `byte`?
You can use %c, but that gives you an ascii character (which is not what we think of when we say 'byte').
If you use PRIu8 the compiler might give warnings because `char` might be signed. The best option is to just not use `byte` and use `uint8_t` instead (or, in his system, `u8`).
Same with `b32` vs `i32` - it's a distinction without a difference and mixing these types won't give compiler warnings, while it is almost certainly an error on the part of the developer. Use `bool` if you don't like `_Bool`.
In general I try to take advantage of whatever typing C provides; I don't try to subvert it because I want the compiler to warn me when my intention doesn't match the code I wrote.
> No const. It serves no practical role in optimization, and I cannot recall an instance where it caught, or would have caught, a mistake.
I disagree with dropping `const`.
1. It's useful as an indicator to the caller that the returned value must/must not be freed. It's a convention I use that makes it easy to visually spot memory leaks.
Of the two functions below, it's clear to me which one needs the returned value `free()`ed and which one doesn't.
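A minimal illustration of that convention (hypothetical function names):

/* A const-qualified return means "borrowed, don't free";
   a non-const return transfers ownership to the caller. */
const char *config_get(const char *key);   /* borrowed; do not free() */
char       *config_dump(void);             /* caller must free() the result */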
2. It actually does catch a lot of problems, because the compiler warns me when I attempt to modify a value that some other code I wrote never intended to be modified. It's about intention, and when I know it is safe to modify the `const` value, then I have to explicitly cast away the const to compile my program. Anyone reading the program will know that the modification of the const-qualifed value is intentional, and therefore safe.
> #define s8(s) (s8){(u8 *)s, lengthof(s)}
This is interesting. I will try this out in my next project. I do think that there'll be quite a few compiler warnings for sign mismatch though. This is the second "I wonder what the sign is" question for programmers reading his code - it means that his code has to compile with the flags that he compiles it with (I assume he's passing a flag to force chars to a particular sign). You can't simply compile his code in another project unless you copy his flags, and those flags may conflict with the new project's flags.
I also wish that he'd showed a few examples of how having the length helps - what is presented in the post doesn't show any additional string safety over using nul-terminated strings. All those macros, including the one that creates the struct, could be written to operate on null-terminated strings. In essence, the length can be simply unused for everything! Where's the safety!?
> It’s also led to a style of defining a zero-initialized return value at the top of the function, i.e. ok is false, and then use it for all return statements. On error, it can bail out with an immediate return.
I use a similar pattern, but I use `goto cleanup` on all errors; you can't, as a general pattern, return early in a non-trivial C function without leaking resources. You can, as a general pattern, `goto cleanup` in every C function to clean up resources. I prefer the general pattern that I use everywhere rather than having to ensure that all resources acquired up to that particular return statement are released.
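A minimal sketch of that goto-cleanup shape (the function and its behaviour are hypothetical, just to show the pattern):

#include <stdio.h>
#include <stdlib.h>

int load_first_block(const char *path, char **out, size_t *out_len)
{
    int ok = 0;
    FILE *f = NULL;
    char *buf = NULL;

    f = fopen(path, "rb");
    if (!f) goto cleanup;

    buf = malloc(4096);
    if (!buf) goto cleanup;

    *out_len = fread(buf, 1, 4096, f);
    *out = buf;
    buf = NULL;              /* ownership transferred to the caller */
    ok = 1;

cleanup:                     /* every exit path releases what was acquired */
    free(buf);               /* free(NULL) is a no-op */
    if (f) fclose(f);
    return ok;
}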
> rather than include windows.h, write the prototypes out by hand using custom types.
I think this is a very bad idea: you can't depend on the headers not changing after a compiler or library update. Sure, maybe in practice, all the Windows types and declarations don't change all that much, but I wouldn't want to be the developer trying to hunt down a bug because the interface to some function has changed and the compiler isn't giving me errors.
All in all, I dunno if I would look forward to working on a team with these conventions - the code is harder to read, doesn't work in isolation, needs custom flags, and introduces a string type without introducing any string safety with it.
Fascinating how he's incorporated lessons from Go and Rust. The short primitives are definitely a boon for readability and using the s8 type over null-terminated strings is something that should be built in.
Hopefully I never need to review code written with these definitions. It's an awful idea to do this.
The first section shows why some languages have no 'typedef': introducing another layer of aliases is just not a good idea. It's confusing, it changes the appearance to basically a new language. Just use the standard names, instead of redefining your language, like everyone else. This style is almost as bad as '#define begin {'.
Many of the other defs are obfuscations or language changes -- this coerces C to something else. I'd not like to read code written with this, as it heavily violates the principle of least astonishment (POLA).
(As a side note, I don't understand the #define for sizeof. The operator sizeof returns size_t -- it's size_t's definition, so what is this for?)
That one is dubious. Char has magic aliasing properties that uint8_t might not have (iirc that was contentious in a GCC bug report) and it will be signed on some platforms and unsigned on others, which changes implicit integer conversions.
Missing from this is to embrace attribute((overloadable)) and attribute((cleanup)).
Overloadable is the sane, useful alternative to the thing standardised as _Generic. The C _Generic will let you define an overload set, with some weirdness around type conversions, provided you write the entire set out as a single _Generic expression, probably wrapped in a macro. If you want to dispatch on more than one argument, you nest _Generic expressions. If you want to declare different functions in different headers - maybe you want 'size(T)' defined on various types in the codebase - you can't. If you don't like the idea of thousands of lines of distracting nonsense in the preprocessed output, tough. Or - use overloadable, get open overload sets, minimal compile time cost, obvious intermediate IR, everything works. Prior art is all of C++, so talking decades of the tooling learning to deal with it.
Cleanup is either a replacement for raii, or a means to have debug builds yell at you when you miss a free. It looks like that got warped into a thing called 'defer' with different behaviour that didn't make it through the committee last time.
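For anyone unfamiliar, the cleanup attribute looks roughly like this (GCC/Clang extension; the demo function is hypothetical):

#include <stdio.h>
#include <stdlib.h>

/* The handler receives a pointer to the variable going out of scope. */
static void free_charpp(char **p) { free(*p); }

static void demo(void)
{
    __attribute__((cleanup(free_charpp))) char *buf = malloc(64);
    if (!buf) return;                /* handler still runs; free(NULL) is fine */
    snprintf(buf, 64, "hello");
    puts(buf);
}                                    /* buf freed here automatically */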
Other than that, ad hoc code generators work really well with C. Especially if you're willing to use some compiler extensions. Code generators + overloadable will give a fair approximation to templated data structures without going deep into the insanity of the preprocessor. If the overloadable functions are static inline forwarding things in a header they don't even mess up symbol names; you just get a straightforward translation to vector_float_size or whatever.
Personally I've given up on ISO C. I'd quite like to code in a dialect of C99 with a few of the GNU extensions and the equivalent of `fno-strict-aliasing`, but C with the pointer provenance modelling and an accretion of C++ features has no personal value. Currently still using clang with flags to make it behave like that but I'm conscious that's on borrowed time - the application performance friendly aliasing rules are the default and gaining popularity, and relying on opt-out flags is a means of opting into compiler bugs.
Semi-actively seeking something that will let me write assembly without the hassle of manual register allocation and calling conventions. Old style C with some of the warts bashed off would be good for that.
I might just be a grumpy old dev but a lot of this stuff gets an immediate no from me because it’s so unidiomatic. You have to unlearn the accepted way of doing things and you end up with a codebase that is just so foreign to anyone looking at even a small chunk of it, unless they are committed to really learning to do things your way.
Everyone knows what a uint32_t is when they see it. The cognitive overhead (until it becomes second nature, obviously) just feels like a heavy price to pay in order to save yourself a few characters.
(Some other stuff in the proposed coding style still gets a thumbs up from me, though.)
Unpopular opinion: something being unusual does not necessarily mean it is bad. Yes, it will look foreign to random people looking at it, but if someone wants to seriously work with it, it will only take a few days to get familiarised with it.
The justification of "cognitive overhead" is, from what I have seen, a shibboleth for rejecting "outsider" code written by someone not conforming to the language standards by claiming it is harder to understand. Personally, I would say that says more about the person's inflexibility and/or OCD, not the writer's style.
I am not saying every style is good (some simply obfuscate things and/or make things overly verbose or unreadable) but rejecting a style solely based on it being "non-idiomatic" is not a good thing.
Well said. This is also something that I don't buy from the criticism towards Lisp. Something along the lines of: "Lisp did not become mainstream because everyone writes their own little language for their project, and so no one can understand other project's code."
pg wrote excellent arguments against this criticism in "On Lisp" § 4.8 Density, which apply just as well to the discussion above:
“If your code uses a lot of new utilities, some readers may complain that it is hard to understand. People who are not yet very fluent in Lisp will only be used to reading raw Lisp. In fact, they may not be used to the idea of an extensible language at all. When they look at a program which depends heavily on utilities, it may seem to them that the author has, out of pure eccentricity, decided to write the program in some sort of private language.
[...]
If people complain that using utilities makes your code hard to read, they probably don’t realize what the code would look like if you hadn’t used them. Bottom-up programming makes what would otherwise be a large program look like a small, simple one. This can give the impression that the program doesn’t do much, and should therefore be easy to read. When inexperienced readers look closer and find that this isn’t so, they react with dismay.”
I don't even think that's a controversial opinion. Breaking convention isn't inherently bad; it has costs, some of which you described. In this case specifically, the novelty is not justified by any significant benefit.
The u32, i8, etc type aliases are the least offensive parts of this to me, even though I rarely see them in C code. I think those are pretty clear.
b32, size (ptrdiff_t), usize (size_t), nothing for ssize_t... what? Those are unidiomatic and also kind of weird. The macros... some are fine, some are weird.
If this makes the author more productive in C, it might behoove them to see if a higher level language like Rust would meet their needs.
I want to both be polite to the OP but also agree.
Writing correct C is hard, so I’m not going to knock anyone who found stuff that helps them.
But pound-defining shit to things you know via your Hungarian notation? Write some elisp. My Haskell programs don’t actually have Unicode lambda in them.
Pascal strings? Yeah, that’s probably the better call, but why not use C++ or Rust or something where a bunch of geniuses got it right already?
>Pascal strings? Yeah, that’s probably the better call, but why not use C++ or Rust or something where a bunch of geniuses got it right already?
I'll be diving into C fairly heavy for the first time ever next year. I intend to skip right past pascal strings and implement/use free pascal's AnsiString or UnicodeString, both of which are reference counted, have a length (with no limit) and are guaranteed null terminated. I've stored a gigabyte in them in a few milliseconds. There's no need to allocate or free memory either... it's like freaking magic.
If at all possible, I'll just lift the one from Free Pascal. Otherwise, it's yet another chore in the process of bringing MStoical (a modern port of the STOIC language) to life.
Too many geniuses spoil the soup. We all see many of thousands of recipes and techniques over the course of our careers and it makes sense that each of us are continuously curating the small subset that we reach for in every project. I enjoy seeing the workbenches of other craftsmen, and nothing here looks unfamiliar.
> I might just be a grumpy old dev [...] Everyone knows what a uint32_t is when they see it.
You might not be old enough then :-P Many codebases typedef their own integer types: see glib (gint, gshort, gint32, etc.) and SDL (Sint32, Uint32, etc.) off the top of my head, and there are plenty that define types like "int32" or "i32", just like the linked article.
I cut my teeth on DWORD, PHALF_PTR, and friends, so my issue is not so much “don’t know how to grok this” as it is “we finally have sane, universal type names and you’re throwing them away.”
Sure, the _t suffix may be an eyesore but I’ll take size_t over “size” any day.
I mean, to each their own, but in my own experience I value being able to easily and reliably copy and paste code snippets across projects or files (and I have a million of them, across several evolutions of my own personal coding styles and conventions) without worrying about whether the typedefs are in scope, whether they pollute a namespace with possibly conflicting names or macros, etc.
I also have often found myself publishing “for my own use only” code as open source later and like to keep things understandable to maybe help teach someone something someday.
How so? As the page you linked mentions, simply casting 'const T *' to regular 'T *' is well-defined; it's only modifying a const object through the pointer that's UB (C17 6.7.3/7).
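A tiny illustration of that distinction, using made-up variables:

    int main(void)
    {
        int x = 1;                 /* object not defined const */
        const int c = 1;           /* object defined const     */

        const int *px = &x;
        const int *pc = &c;

        *(int *)px = 2;            /* fine: x itself was never const          */

        int *stripped = (int *)pc; /* the cast alone is well-defined          */
        /* *stripped = 2; */       /* ...but writing through it would be UB   */
        (void)stripped;
        return x;
    }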
> I don't believe assignments are sequence points and only the function call is.
Assignments within expressions don't create sequence points. However, the expression of an expression statement is a full expression (i.e., not a subexpression of another expression), and there is a sequence point between each pair of full expressions (C17 6.8/4). In other words, the semicolons create the sequence points.
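A small made-up example of the difference:

    int main(void)
    {
        int i = 0;

        i = i + 1;   /* each statement is a full expression, and the ';'   */
        i = i * 2;   /* between them is a sequence point, so this pair is  */
                     /* well-defined (i ends up as 2)                      */

        /* i = (i = 5) + (i = 6); */
        /* ^ undefined: both inner assignments modify i within a single
             full expression, with no sequence point between them          */

        return i;
    }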
To be clear, it's only UB if the object was defined const, which is the case given he wrote:
> One small exception: I still like it as a hint to place static tables in read-only memory closer to the code. I’ll cast away the const if needed.
So you are correct on this point. Funnily enough, such objects are relatively rare IME, so I had to double-check to see that he was advocating it specifically in the rare case where it must not be applied.
Given that this particular undefined behavior usually causes crashes in practice, I expect the author is talking about casting away the const but not actually writing to the pointer. Which is legal.
He never said he needs to cast away const to do what he is attempting to do, he just said that he wants to cast away const to reduce clutter, even though the program would have the same semantics as if he kept the const.
If only there were a way to indicate the function argument isn't mutated. </s>
My spidey senses tingle whenever I see const-ness cast away, because it almost always means something is wrong. Either a function is missing a qualifier on an argument, or something very unsafe is happening. Why force callers to cast away const-ness and hope everything will be fine when you can just write the correct function signature?
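For example (a generic sketch, not code from the article), a read-only parameter can simply be declared const, and then no caller ever has to cast:

    #include <stddef.h>
    #include <stdio.h>

    /* The const in the signature documents (and lets the compiler check)
       that the function only reads through the pointer. */
    static void print_all(const int *xs, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            printf("%d\n", xs[i]);
    }

    int main(void)
    {
        static const int table[] = {1, 2, 3};   /* can live in read-only memory */
        print_all(table, sizeof table / sizeof table[0]);  /* no cast needed    */
        return 0;
    }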
C is wonderful, so if you can find a project at work with a lot of C code, it'll remain forever. All these other fad languages will die before C ever does.
Well, from a team perspective, it's extremely opinionated and hostile to newcomers and messes with core language features at the expense of readability. If it's your personal codebase then do whatever, obviously.
It doesn't mess with a core language feature to alias 'u8' to 'uint8_t'. It's a reasonable use for the name and one used in other languages (e.g., Rust). There's nothing in the C standard that defines or uses the 'u8' name.