
A critique of "How to C in 2016" - gbugniot
https://github.com/Keith-S-Thompson/how-to-c-response
======
avian
> Zeroing memory often means that buggy code will have consistent behavior; by
> definition it will not have correct behavior. And consistently incorrect
> behavior can be more difficult to track down.

I agree that using calloc should not be an excuse for writing sloppy
initialization code. But in my experience, inconsistently incorrect behaviour
(e.g. heisenbugs that tend to appear and disappear depending on the content of
some random chunk of uninitialized memory that happens to be in an unfortunate
state after malloc) is one of the hardest types of bug to fix. It's the type
of bug that occasionally happens in production and typically can't be
reproduced in a controlled manner in a development environment. I certainly
prefer consistently incorrect behavior.

~~~
dietrichepp
The tools for rooting out these problems have gotten a lot better, thankfully.
I'm very fond of the address sanitizer. Using calloc() everywhere can prevent
the address sanitizer from finding problems.

~~~
neopallium
GCC (>=4.8) & Clang now have Address Sanitizer support for fast memory error
detection. About a 2x slowdown [1].

0.
[https://github.com/google/sanitizers/wiki/AddressSanitizer](https://github.com/google/sanitizers/wiki/AddressSanitizer)
1.
[https://github.com/google/sanitizers/wiki/AddressSanitizerPerformanceNumbers](https://github.com/google/sanitizers/wiki/AddressSanitizerPerformanceNumbers)
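
A minimal sketch of the kind of bug ASan catches (build with
-fsanitize=address -g on a recent GCC or Clang):

    #include <stdlib.h>
    
    int main(void)
    {
        int *a = malloc(10 * sizeof(int));
        int x = a[10];   /* heap-buffer-overflow: one element past the end */
        free(a);
        return x;
    }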

------
stupidcar
Ironic that he begins by taking issue with the idea that you should avoid
writing C if you can, then proceeds to provide the best evidence possible for
why it's true. Why on Earth would you voluntarily code in a language where
people can debate something as simple as which type to use for integers,
unless you absolutely had to?

~~~
EliRivers
Because it's simple and low-overhead and compilers for it exist on every lump
of hardware I've ever had to work with, from a clunky x64 to a PIC to a
TigerSHARC to a bizarre DSP thingy from a tiny fab lab somewhere.

A great many of the problems that people experience with C code can actually
be nullified (zing!) by finding a competent C programmer who does the job
properly. A tool being easy to misuse doesn't mean we shouldn't use it; it
just means we should demand high standards of the programmers and not employ
some chancer who learned how to program from the back of a cereal packet. I
give young children plastic safety scissors. I give competent adults chainsaws
and scalpels.

~~~
krylon
The key phrase here is "_if_ you can avoid it".

When your target platform is an 8-bit microcontroller or an embedded system
with hard real-time requirements, your options are very limited.

But there are plenty of situations where there are alternatives, and often
these allow you to solve your problem in less time, with much lower risk of
bugs than in C. As always in life, one has to evaluate one's options, consider
their respective advantages and weaknesses and then make a choice.

If I had to write a document like that, I would use a different phrasing from
"if you can avoid it". But the sentiment - to avoid using C where superior
alternatives exist[1] - is something I agree with.

> A great many of the problems that people experience with C code can actually
> be nullified (zing!) by finding a competent C programmer who does the job
> properly.

That is kind of like saying the solution to traffic accidents is to only let
people drive who don't cause accidents, or to demand that people drive
sufficiently carefully. Sure, that is something worth wishing for - but it is
not very realistic to expect it to actually happen.

Not that I disagree that C is a very demanding language. Even the best
programmers make mistakes. But this is the reason the author of the original
document (the one the linked article responds to) advised people to avoid C
where possible. Most programmers will be better off for most of their
projects with a language like Go, or Python, or whatever.

[1] Of course, if by "alternatives" I mean languages like Perl, Python or
Ruby, it is worth keeping in mind that their main implementations are written
in C.

~~~
pjmlp
> When your target platform is an 8-bit microcontroller or an embedded system
> with hard real-time requirements, your options are very limited.

You mean like the CPUs supported by MikroElektronika's Pascal and Basic
compilers?

[http://www.mikroe.com/8051/](http://www.mikroe.com/8051/)

[http://www.mikroe.com/pic/](http://www.mikroe.com/pic/)

[http://www.mikroe.com/avr/](http://www.mikroe.com/avr/)

~~~
krylon
I get your point. However, I did not say there were _no_ alternatives, just
that they were "very limited" (when compared to, say, Debian running on
amd64).

------
legulere
I disagree with many of the arguments. Many of them are ignoring the reality
of today.

One example:

> If you want bytes, use unsigned char. If you want octets, use uint8_t

Bytes and octets are the same today. I never came across a system where this
wasn't true. I was never told about a system where this wasn't true. I wasn't
even told that there once were systems where this wasn't true until fairly
recently.

~~~
ue_
>Bytes and octets are the same today. I never came across a system where this
wasn't true.

The fact that all modern systems use eight bit bytes is no reason to assume
that a byte is eight bits. The fact is that a byte is not eight bits large.

People may wish to run code on "historical" systems, and someone may wish to
create a 9-bit byte system for fun.

To write C code with an assumption that things are one way, when the standard
does not define this assumption, screams of unportable code. You might as well
use GCC extensions - after all, for a long time, what would it matter?
_Everyone_ used GCC, right?

~~~
lultimouomo
By the same argument we shouldn't assume that bits have only two values -
_what, you mean I can't write C for my ternary logic computer?_

You need to make a decision about which simplifying assumptions you make - C
has decided not to assume there are 8 bits to the byte. I think it's arguable
that in 2016 this is not the best decision, and they could get rid of some
cognitive overhead by making the simplifying assumption.

~~~
dietrichepp
Unfortunately, there are real world systems with C compilers that do not have
8 bit bytes. It's easy to forget the embedded world.

C is one of the very, very few languages that you can reasonably expect to run
on your weird embedded microprocessor, microcontroller, DSP, or whatever.

Bit, however, means "binary digit". That's the literal definition of the word.
Historical usage of the word "byte" has varied.

~~~
lultimouomo
> C is one of the very, very few languages that you can reasonably expect to
> run on your weird embedded microprocessor, microcontroller, DSP, or
> whatever.

I'm not an expert on DSP processing in any way, but I'd argue that if the
environment is so peculiar that it doesn't have 8-bit bytes, you won't be able
to use so many of the (formally optional) parts of C that you might as well
call your code "weirdC" and do without char. It's not like you're going to use
normal C libraries, and you are most probably relying on a specific compiler
implementation and a lot of platform specific weirdnesses.

~~~
hamiltonkibbe
You can use char, and for the most part you won't notice any difference, until
you start looking at the memory display in your IDE and have a short-lived WTF
moment. A C string looks strange in the memory view because there's a null
octet between each character, but that's an implementation detail that you
don't really notice 99% of the time, because everything on the processor is
designed to use 16-bit words. If you write a buffer of chars to the UART tx
register, it will only put the half of the word with the character data onto
the wire, just like with any other microcontroller, just like with PySerial,
and just like with JS and websockets. That's why the C standard doesn't say a
char is 8 bits, it just says that sizeof(char) is 1. I would argue that any
code doing anything under the assumption that a char variable occupies 8 bits
in memory and that the next char in an array is physically adjacent to it with
no wasted space in between is just poorly written and inherently not portable.

As for why this is the case: generally, DSPs are designed to do a few tasks
very well (multiply-accumulate being the most obvious example, simultaneous
reads from X and Y memory, etc.) and are generally optimized for a specific
word size. If you really need to access an octet on, e.g., a 16-bit TI DSP
you can, you just need to shift and mask. That's obviously not very efficient,
because that's not really what it's optimized for. If that's what you really
need, you picked the wrong part. You're gonna be filter-/FFT-/DCT-/whatever-
ing way more 12- or 16-bit ADC samples per second, per dollar, and per
milliwatt on that 16-bit DSP than you would be on your octet-addressable
MSP430.
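
A minimal sketch of that shift-and-mask, assuming a part where CHAR_BIT is 16
and two octets are packed per word (which half holds which octet varies by
platform):

    /* hypothetical: split one 16-bit char into its two octets */
    void split_octets(unsigned char word, unsigned *hi, unsigned *lo)
    {
        *lo = word & 0xFFu;                 /* low octet */
        *hi = ((unsigned)word >> 8) & 0xFFu; /* high octet */
    }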

Regarding "Normal" c libraries, If they're valid C they should work just fine
for the most part, With some obvious exceptions: you're not going to have a
fun time with your "floats" on a processor without FP HW. That's why there's a
standard, if you're not relying on undefined or implementation-defined
behavior, you'll be fine(ish)

~~~
lultimouomo
> I would argue that any code doing anything under the assumption that a char
> variable occupies 8 bits in memory and that the next char in an array is
> physically adjacent to it with no wasted space in between is just poorly-
> written and inherently not portable

I don't think that's fair at all. It's well written and perfectly portable,
_under that assumption_. If that assumption happens to hold for all the
systems the code is meant to run on, why shouldn't you make it? Why is
assuming that unsigned integers wrap around to 0 fair, while assuming 8-bit
bytes is not?
They are both pragmatic choices. I think arguing that the C standard didn't
make the best choice is at the least a reasonable position. Saying that people
who disagree with you on this are writing bad code is not the right way to
approach the subject, IMO.

> If they're valid C they should work just fine for the most part

I've yet to find a library that is valid C, in the strictest sense of the term
- i.e., no floats or doubles, no 8-bit char, no uint8_t, etc. I'm sure there
are, among libraries _meant to run on DSPs_. Does it make sense for them and
libraries meant to run on bigger processors to adhere to the same exact
standard? I'm not sure.

~~~
GFK_of_xmaspast
> If that assumption happens to hold for all the systems that could is meant
> to run on, why shouldn't you make it?

Because what's going to happen two years down the road when you need to get
that code working on a different platform?

(In the old days, they called that thinking "all the world's a VAX", which
became "all the world's Linux" in the not-quite-as-old days.)

~~~
lultimouomo
> Because what's going to happen two years down the road when you need to get
> that code working on a different platform.

And what happens when the different platform handles overflow differently? Or
has non-IEEE floats? Or breaks whatever assumption the C standard already
makes?

Here's the thing: C is _already_ built on assumptions, which might or might
not hold for a specific platform.

Choosing which assumptions we hold true is a pragmatic choice that must be
made. C does it, and does it consciously - but that doesn't mean it nails
every choice.

~~~
GFK_of_xmaspast
If you're in a situation where you need specific implementation details of
floating point arithmetic, then may the devil have mercy on your soul, because
god certainly won't.

~~~
Solarsail
Wouldn't debugging / testability imply this is a fairly large proportion of
programs? Having a calculation give different results on different hardware
would make testing difficult:
[http://yosefk.com/blog/consistency-how-to-defeat-the-purpose-of-ieee-floating-point.html](http://yosefk.com/blog/consistency-how-to-defeat-the-purpose-of-ieee-floating-point.html)

~~~
GFK_of_xmaspast
Your own link explains how easy it is to get different results even on the
same hardware depending on compiler flags.

------
coldtea
I find myself in disagreement with almost ALL the points this "critique"
makes.

They are either minor pedantic corrections, "it's how it's always been done"
things, or "you might have your reasons for doing it in a bizarro way"
affairs.

Sorry, but for 90% of use cases, the original article has better advice.

~~~
Manishearth
Agreed. Most of the integer stuff is pedantry ("doesn't make it non-standard"
-- who cares, the point being made was that these types are better). There
are some valid and useful points being made, but most would serve better as
caveats on the original article rather than straight out "don't listen to
this".

------
eloy
Not to be confused with Ken Thompson.

~~~
rocks
Almost fell for that....

------
rplnt
What's up with creating "empty" repositories for articles? Isn't there a
better solution? I think using gist would be more suitable if you want the
article to be attached to your github username...

------
Anilm3
One of the most important reasons to use calloc, other than the zeroing of the
allocated memory, is the fact that it checks for integer overflows (at least
on most implementations).

For example, when allocating an array of n elements, with malloc you would do
something like the following:

    ptr = malloc(sizeof(int) * n);

This could potentially lead to an overflow of the size passed to malloc, hence
allocating a significantly smaller buffer and ending up accessing adjacent
memory which hasn't been allocated to the buffer (a buffer overflow).
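
A hypothetical concrete case, assuming 32-bit size_t and 4-byte int:

    size_t n = 0x40000001;          /* e.g. an attacker-controlled count */
    ptr = malloc(sizeof(int) * n);  /* 4 * n wraps to 4: a 4-byte buffer,
                                       not the > 4 GiB that was asked for */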

Calloc requires two arguments, one for size and one for number of elements,
which allows it to check for an overflow before performing an allocation (e.g.
by using SIZE_MAX):

    ptr = calloc(n, sizeof(int));

This is not to say that zeroing memory is not useful. There are many
situations in which a zero in a memory block represents the end of the usable
data, as for example in a string, so zeroing memory in those situations is
definitely recommended.

~~~
xcgvgh
Using calloc instead of malloc, just for checking arithmetic overflow, is
superfluous. You can always write a check or a wrapper for malloc that does
that automatically. It is so easy I can write it here:

    
    
      #include <stdbool.h>   /* bool, false, true */
      #include <stdint.h>    /* SIZE_MAX */
      
      /* true if a * b fits in a size_t; the b == 0 guard avoids
         dividing by zero */
      bool Check( const size_t a , const size_t b )
      {
        return b == 0 || a <= SIZE_MAX / b;
      }
    

(If proper warnings are enabled, it also provides extra integer type
checking.)

~~~
Anilm3
Agreed, but there's little point in reinventing the wheel in this particular
case.

~~~
xcgvgh
There is no reinventing going on here. In C you have to write relatively low
level code. This includes manually calling allocs for objects, which includes
the code for checking your arithmetic. At some point you will have to write
that code. If you are smart you will wrap it into a function.

~~~
Anilm3
I don't see the point of the discussion: that piece of code (or a rough
equivalent) is already performed by a libc function (calloc, in this case);
how is that less convenient than writing it yourself?

It has nothing to do with how smart you are or how low-level C is; you are
effectively reinventing that part of calloc by wrapping malloc.

~~~
masklinn
There's a point to be made that people _will_ be idiots and risk bugs for the
sake of efficiency.

That's why OpenBSD has reallocarray(3) (which, given a NULL pointer, also
serves as a fine overflow-checked malloc; the kernel, having no reallocation,
has mallocarray(9) instead).
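
A sketch of how that looks in practice; reallocarray is in OpenBSD's libc and
glibc >= 2.26 (where it may need _DEFAULT_SOURCE), and err(3) is assumed
available:

    #include <err.h>
    #include <stdlib.h>
    
    int *grow(int *p, size_t n)   /* n: desired element count */
    {
        int *q = reallocarray(p, n, sizeof *q);   /* overflow-checked */
        if (q == NULL)
            err(1, "reallocarray");
        return q;
    }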

------
appleflaxen
> The first rule of C is don't write C if you can avoid it.

My C skills are non-existent, but it occurs to me while reading the response
that when two people who have so carefully considered the subject can arrive
at such different places... then yes, you probably shouldn't write C if you
can avoid it.

~~~
benbenolson
C is the only language that will run on certain strange systems, and that is
the only reason that most of these quibbles even matter. It's more of an
argument of "What is the most portable way to write C?"

C is a very small and elegant language that never lies to you. There aren't
any useless functions that are specific to one use case, and things that you
don't really, really need are simply left out of the language. This means that
instead of spending hours learning the language and all of its quirks, you get
to focus on the problem that you're trying to solve, and come up with the most
efficient solution possible (there's a reason why almost all performance
comparisons use C as the baseline of 1). You don't have any language quirks
like those that are discussed
in this article, unless you are running an architecture that has similar
quirks, in which case you're going to have quirks no matter what you do. Edge
use cases, as with everything, may cause a bit of quirkiness.

That being said, don't be scared off from C just because of these pedantic
arguments.

~~~
GregBuchholz
>C is a very small and elegant language that never lies to you.

I've liked:

[http://www.gowrikumar.com/c/index.php](http://www.gowrikumar.com/c/index.php)

..and...

[https://www.google.com/search?q=expert+c+programming+deep+c+secrets](https://www.google.com/search?q=expert+c+programming+deep+c+secrets)

~~~
GregBuchholz
Also don't miss:

[http://blog.robertelder.org/7-weird-old-things-about-the-c-preprocessor/](http://blog.robertelder.org/7-weird-old-things-about-the-c-preprocessor/)

------
grabcocque
This, ladies and gentlemen, is why C gives me the willies. And why this
happens:

[http://www.cvedetails.com/vulnerability-list/year-2016/month-1/January.html](http://www.cvedetails.com/vulnerability-list/year-2016/month-1/January.html)

~~~
benbenolson
Those are due to programmer error. The fact that C doesn't lie to you means
that it takes a bit more experience to program properly; as with all things,
with great power comes great responsibility. You don't blame the gun for
killing people, you blame the person for shooting it.

~~~
whyever
> Those are due to programmer error.

How does that help? Almost all security vulnerabilities are due to programmer
error.

> You don't blame the gun for killing people, you blame the person for
> shooting it.

It's more like choosing between a gun that makes it really easy to shoot
yourself in the foot by accident and one that is safer.

------
oldmanjay
Reading this as someone who neither loves nor loathes C, I would not have
written this as a series of rebuttals. It makes the essay seem defensive and
nitpicky. I get more of a sense of emotional attachment to a particular way
of doing things than any sort of reasoning.

------
unwind
Aah, this felt like a much-needed piece. The original text was so long, it
kind of made my eyes glaze over and I never found the time to do this level of
detailed critique. Well done!

------
xcgvgh
> size_t is defined as "an integer capable of holding the largest array
> index"

>> No, it isn't.

Could anyone elaborate?

I don't see how someone could allocate more than SIZE_MAX bytes with
(m/c/re)alloc, since they all take size_t as an argument, and size_t is the
type used for defining array sizes. Thus size_t can hold the largest array
index (in fact it can hold the largest_index+1, since arrays are zero-based).
Any counterarguments?

~~~
chrisseaton
Maybe he's taking it very literally, as I think size_t is literally defined as
'the integer type of the result of the sizeof operator'. So it literally isn't
defined as 'an integer capable of holding the largest array index', as that's
not the definition and wording the standard uses.

~~~
viraptor
But if sizeof returns size_t, then you can't have indexes larger than that;
otherwise you could point beyond the largest in-memory object possible (using
the smallest (byte) indexing).

But yes, he may be arguing the definition rather than the meaning.

------
cesarb
> Converting to uintptr_t will give you some result -- but that result won't
> be useful. Unless you're writing low-level system code, don't do any pointer
> comparison or arithmetic unless both pointers point into or just past the
> end of the same object.

I disagree. There are situations where you need an ordering between unrelated
objects, but the precise order doesn't matter, only that it's consistent.

An example is when you have a complex data structure with one lock per node,
and you need to lock more than one node at a time for some algorithm.

To prevent a deadlock, you must always take the locks in the same order.
Comparing the address of the locks gives you a total order over all the locks,
which is enough.

~~~
zodiac
I think comparing (using <) pointers to unrelated objects is undefined
behavior. So you do not get a consistent ordering.

~~~
cesarb
The part I quoted is talking about casting the pointers to uintptr_t (which is
defined behavior) and comparing the resulting uintptr_t (since it's an
integer, comparing it is defined behavior). It's not talking about comparing
the pointers directly (which is undefined behavior). His argument is that,
while defined behavior, "(uintptr_t)ptr_a < (uintptr_t)ptr_b" is useless; I
disagree, since it's a well-known trick to prevent AB/BA type deadlocks.
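
A sketch of the trick, with a hypothetical node type carrying its own mutex
(the a == b case is left to the caller):

    #include <pthread.h>
    #include <stdint.h>
    
    struct node { pthread_mutex_t lock; /* ... payload ... */ };
    
    /* lock the lower address first: a consistent total order over locks */
    void lock_pair(struct node *a, struct node *b)
    {
        if ((uintptr_t)a > (uintptr_t)b) {
            struct node *t = a; a = b; b = t;
        }
        pthread_mutex_lock(&a->lock);
        pthread_mutex_lock(&b->lock);
    }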

------
fithisux
The discussion usually boils down to

a. We need a better C

b. We need better programmers.

Personally I believe C tries too hard to be high level instead of what it
really is. Newer revisions should try to make it lower level. For example, I
never could understand why we need enums. Enums are very useful for Java or
Golang, which are high-level. C never needs enums. Why should I ever put const
in the argument of a function? It makes sense for D, but for C, no. Why should
I ever use complex in C? Is there any HW that supports complex numbers? If it
becomes standard in hardware, then why not?

C should be good to write kernels and simple utilities and simple compilers.
For the rest, use a high-level language.

~~~
woodman
> For example I never could understand why we need enums.

> Why should I ever put const in the argument of a function?

Reducing potential state and conveying programmer intent. Both lead to safer
and more performant code (by way of compiler optimizations).
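
A small sketch of both points; the names are made up:

    #include <stddef.h>
    
    enum mode { MODE_IDLE, MODE_RUN, MODE_DONE };  /* the state space is
                                                      three named values */
    
    /* const documents and enforces that the buffer is read-only */
    size_t checksum(const unsigned char *buf, size_t len);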

~~~
fithisux
What you say is incompatible with the "cross-platform assembler", simple-
language ideals.

If that was the intent, why not throw namespaces and other goodies in?

~~~
woodman
Well, those ideals are pretty subjective. I'd say that the ideal language would
be flexible, performant, safe and productive. As those goals conflict with one
another, the ideal language would find the perfect balance. I don't know if C
has found that balance, but it has had a pretty good run.

Defining sets (enums) and memory access constraints (consts). I wouldn't
describe that as high level from a conceptual or practical perspective.
Namespaces are simple conceptually but not in practice. The same can be said
about the beautiful logical consistency of S-expressions.

------
cataphract
Isn't another problem of "ptrdiff_t diff = (uintptr_t)ptrOld -
(uintptr_t)ptrNew;" that you're assigning an unsigned value to a signed
variable?

If ptrNew is larger than ptrOld this can give a very large value via wrapping
around, so large that it will probably be bigger than the max value of
ptrdiff_t, making the assignment a form of signed overflow.
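
A sketch with made-up addresses, assuming 64-bit pointers (uintptr_t from
stdint.h, ptrdiff_t from stddef.h):

    uintptr_t old_a = 0x1000, new_a = 0x2000;   /* ptrOld, ptrNew */
    ptrdiff_t diff = old_a - new_a;   /* the unsigned subtraction wraps to
                                         0xFFFFFFFFFFFFF000, which does not
                                         fit in a ptrdiff_t */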

~~~
xcgvgh
The assignment is a conversion from unsigned to signed, if the value cannot be
represented by the signed type, the result is implementation defined or an
implementation defined signal is triggered.

~~~
cataphract
Right. Implementation defined, not undefined. Still not great for portable
code, which is what the article is all about.

~~~
xcgvgh
It's also very dangerous.

------
geocar
I'm divided.

I think it is good to share our ideas and our theories with each other because
the discourse can stretch our minds and help us better imagine the perspective
of another. Dialogue can foster creativity and invention, and an honest
argument can embiggen friendships like nothing else.

On the other hand, opinions are like assholes.

------
Mayzie
> But zeroing a structure with memset, though it will set any integer members
> to zero, is not guaranteed to set floating-point members to 0.0 or pointers
> to NULL (though it will on most systems).

Why is that? On those systems, is a sequence of 0 bits not considered 0.0f or
NULL?

~~~
masklinn
Not sure about fp, but the C standard very carefully and explicitly notes that
a null pointer doesn't mean a runtime value of 0. The comp.lang.c FAQ 5.17[0]
further provides examples of nonzero null pointers, e.g.

> The CDC Cyber 180 Series has 48-bit pointers consisting of a ring, segment,
> and offset. Most users (in ring 11) have null pointers of 0xB00000000000.

This is a bit confusing because the "null constant" is an integral 0, e.g.
`(void *) 0` is a null pointer, but the compiler is free to translate it to
whatever the platform's null representation is. `(void *) foo` where foo is 0
at runtime may not be a null pointer at all.

[0]
[http://c-faq.com/null/machexamp.html](http://c-faq.com/null/machexamp.html)
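
A sketch of the distinction (uintptr_t from stdint.h):

    char *p = 0;   /* null pointer constant: the compiler emits the
                      platform's null representation, whatever that is */
    int zero = 0;
    char *q = (char *)(uintptr_t)zero;   /* a runtime zero converted to a
                                            pointer: not guaranteed to be
                                            a null pointer */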

------
lmm
> int in particular is going to be the most "natural" integer type for the
> current platform. If you want signed integers that are reasonably fast and
> are at least 16 bits, there's nothing wrong with using int

The slight performance gain on a few systems that you get from using int
rather than int16_t is very rarely worth the hugely increased risk of
introducing platform-specific undefined behaviour, IMO.

> float and double are very commonly IEEE 32-bit and 64-bit floating-point
> types, particularly on modern systems, but there's no guarantee of that in
> the language.

True. Is there an alternative? Something like float32_t?

> But more often all you need is a particular range of values. For that, you
> can use either the [u]int_leastN_t or [u]int_fastN_t types, or one of the
> predefined types.

Sure - but again, how much performance do you gain from using a _leastN type,
and how much reliability do you lose?

> It's capable of holding the size of the largest object your implementation
> supports. (There's an argument that that's not necessarily guaranteed, but
> for practical purposes you can rely on it.) It can hold the largest memory
> offset if all offsets are within a single object.

But you just said in the previous section that it only makes sense to perform
pointer arithmetic within a single object anyway.

> There is a widespread convention, particularly in Unix-like systems, for
> functions to return 0 for success and some non-zero value (often -1) for
> failure. In many cases different non-zero results denote different kinds of
> failure. It's important to follow this convention when adding new functions
> to such an interface. (0 is used for success because typically there's only
> one way for a function to succeed, and multiple ways for it to fail.)

The convention is not as widely accepted as this section implies. When adding
to an existing system one should follow that system's conventions. When
writing for Unix one should follow the Unix conventions. But there are other
conventions (e.g. the VMS one) that also see use in C.

> I don't often use automatic formatting tools myself. Perhaps I should.

Yes, you should.

> Zeroing memory often means that buggy code will have consistent behavior; by
> definition it will not have correct behavior. And consistently incorrect
> behavior can be more difficult to track down.

Disagree strongly. Consistently incorrect behaviour is much easier to track
down than inconsistently incorrect behaviour.

> if you're trying to program defensively, you might consider initializing
> allocated memory to some value that's known to be invalid rather than one
> that might be valid.

Agreed - but how? One of the biggest problems with C is that it is extremely
hard to mark a state as invalid (hence the whole integers for error codes
discussion earlier, where really an error code should have a different type
from a valid integer result).
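
One common shape for that (a sketch, not from the article): fill fresh
allocations with a recognizable junk byte, so reads of uninitialized memory
fail loudly instead of quietly looking like zero:

    #include <stdlib.h>
    #include <string.h>
    
    void *malloc_poisoned(size_t n)   /* hypothetical helper */
    {
        void *p = malloc(n);
        if (p != NULL)
            memset(p, 0xAA, n);   /* 0xAA: an unlikely-to-be-valid pattern */
        return p;
    }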

~~~
snakeanus
>The slight performance gain on a few systems that you get from using int
rather than int16_t

int16_t is also unportable, as he mentions later.

>the hugely increased risk of introducing platform-specific undefined
behaviour

What risk? Please explain how int introduces undefined behaviour.

>how much performance

It's not about performance. intN_t is NOT guaranteed to exist.

>But you just said in the previous section that it only makes sense to perform
pointer arithmetic within a single object anyway.

Yes and that's right.

~~~
lmm
> int16_t is also unportable as he mentions later

It's theoretically unportable. I have yet to see a real system to which it's
unportable.

> What risk? Please explain how int introduces undefined behaviour.

(Signed) integer overflow is undefined behaviour. It is very common to develop
on platforms on which int is 32 or 64-bits. This carries a high risk of
accidentally overflowing the limits of a 16-bit integer and not realizing.
When you do this you don't notice anything (because it doesn't show up on your
development machines), but you've introduced undefined behaviour that can
easily manifest in practice on machines with 16-bit int.
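
A sketch of how that bites, with made-up numbers:

    int a = 200, b = 200;
    int total = a * b;   /* 40000: fine where int is 32 bits, but signed
                            overflow (undefined behaviour) where int is
                            16 bits */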

