
Guide to Advanced Programming in C - pfacka
http://pfacka.binaryparadise.com/articles/guide-to-advanced-programming-in-C.html
======
ghswa
I'm enjoying this so far although a couple of things have made me scratch my
head in the example code allocating a vector.

    
    
      int create_vector(struct vector *vc, int size) {
    
        vc->data = 0;
        vc->size = 0;
    
        if (vc == NULL) {
          return VECTOR_NULL_ERROR;
        }
    
        /* check for integer and SIZE_MAX overflow */
        if (size == 0 || size > SIZE_MAX) {
          errno = ENOMEM;
          return VECTOR_SIZE_ERROR;
        }
    

Accessing the fields of vc before the NULL check seems like an error. Also,
would it not be simpler to change the type of the size parameter to size_t so that
it's the same type as the size field of the vector struct?

~~~
vinkelhake
Yes, by having the NULL check after those members have been accessed, you're
telling the compiler that vc cannot be NULL (because accessing the members if
vc is NULL would be undefined behavior). The compiler can (and GCC _will_ )
eliminate the NULL check completely.

~~~
signa11
huh :) have you actually _tried_ doing that i.e. call the function with NULL
params and see ? you will see a segfault. anyways, rather than rely on
vagaries of -O2 implementation on compilers, i would rather be explicit about
it and just assert invariants...

~~~
vinkelhake
I believe you got my comment completely bass ackwards.

I'm not describing some optimization that you, the programmer, can do. I'm
talking about what the compiler can (and will) do by assuming your code
doesn't have undefined behavior.

In this particular case, the compiler recognizes that if the pointer is NULL,
undefined behavior has already been invoked and it can therefore eliminate the
check.

I recommend this series of posts on the LLVM blog:
[http://blog.llvm.org/2011/05/what-every-c-programmer-should-...](http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html)
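
A minimal sketch of the ordering discussed above, not the article's actual code: the NULL test comes before the first dereference, and the size parameter is a size_t. The error-code values, the int element type, VECTOR_ALLOC_ERROR and destroy_vector are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical error codes and element type; the article's own
   definitions are not quoted in this thread. */
enum { VECTOR_OK = 0, VECTOR_NULL_ERROR = 1, VECTOR_SIZE_ERROR = 2, VECTOR_ALLOC_ERROR = 3 };

struct vector {
    int *data;
    size_t size;
};

int create_vector(struct vector *vc, size_t size) {
    /* Test the pointer before the first dereference, so the compiler
       has no earlier access from which to infer that vc is non-NULL. */
    if (vc == NULL) {
        return VECTOR_NULL_ERROR;
    }
    vc->data = NULL;
    vc->size = 0;

    /* Reject zero and any element count whose byte size would not
       fit in a size_t. */
    if (size == 0 || size > SIZE_MAX / sizeof *vc->data) {
        errno = ENOMEM;
        return VECTOR_SIZE_ERROR;
    }

    vc->data = malloc(size * sizeof *vc->data);
    if (vc->data == NULL) {
        return VECTOR_ALLOC_ERROR;
    }
    vc->size = size;
    return VECTOR_OK;
}

/* Helper in the "assert the invariant" style: callers must pass a
   valid vector, and a debug build aborts if they do not. */
void destroy_vector(struct vector *vc) {
    assert(vc != NULL);
    free(vc->data);
    vc->data = NULL;
    vc->size = 0;
}
```

Whether to return an error code or assert is a design choice: the error return suits a public entry point, while the assert documents a precondition that only a programming error could violate.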

~~~
signa11
> In this particular case, the compiler recognizes that if the pointer is
> NULL, undefined behavior has already been invoked and it can therefore
> eliminate the check.

ah, makes sense thank you !

------
cjensen
Wow. That's full of falsehoods like these:

"What happens is that variable i is converted to unsigned integer." No: 'long
i' is converted to 'unsigned long'.

"Usually size_t corresponds with long of given architecture." No: For example,
on Win64 size_t is 64 bits whereas long is 32 bits.

If you're going to write about Advanced Programming, you should be careful to
actually be correct.

~~~
ahy1
> Wow. That's full of falsehoods like these:

> "What happens is that variable i is converted to unsigned integer." No:
> 'long i' is converted to 'unsigned long'.

Actually, _unsigned long_ is an unsigned integer. He didn't write _unsigned
int_.

> "Usually size_t corresponds with long of given architecture." No: For
> example, on Win64 size_t is 64 bits whereas long is 32 bits.

"Usually" is the keyword here. He could have said "Usually _size_t_ has at
least the same amount of bits as _long_ " and it would be better related to
the referred rule.
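
A small sketch of the conversion being argued about, assuming the article's example compares a long with an unsigned long (the article's code is not quoted here, and the function names are made up). The cast in naive_less spells out what the usual arithmetic conversions do implicitly in a mixed comparison such as `i < u`.

```c
#include <assert.h>
#include <limits.h>

/* What a mixed comparison `i < u` does: i is converted to
   unsigned long first, so negative values become very large. */
int naive_less(long i, unsigned long u) {
    return (unsigned long)i < u;
}

/* Compares the mathematical values instead. */
int safe_less(long i, unsigned long u) {
    return i < 0 || (unsigned long)i < u;
}

/* Usage: -1 is less than 1, but not after the conversion. */
void conversion_demo(void) {
    assert((unsigned long)-1L == ULONG_MAX);
    assert(naive_less(-1L, 1UL) == 0);
    assert(safe_less(-1L, 1UL) == 1);
}
```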

~~~
zurn
The "short" long is fantastically tasteless; it's not like source compatibility
with Win16 has resulted in 64-bit builds of apps appearing.

------
deletes
How does this check for anything??:

    
    
       if ( size == 0 || size > SIZE_MAX ) {...
    

size is not given a type, but if it is a size_t, then the second comparison is
constant and the if condition is always false unless size == 0. The second part
is useless, since the maximum value of a size_t is SIZE_MAX.

~~~
jwise0
Worse, it is not even a certain check against integer overflow. A particularly
large expression could have overflowed substantially enough that it becomes
positive again!

This, in fact, was the source of a vulnerability in PHP [1]. (The way they
'resolved' it, uh, isn't much better.)

There is almost never a magic bullet for integer overflow. Programming secure
systems requires thought.

[1]
[http://use.perl.org/use.perl.org/_Aristotle/journal/33448.ht...](http://use.perl.org/use.perl.org/_Aristotle/journal/33448.html)

~~~
comex
Note that since it says 'size == 0', it's not merely "not a certain check",
but a check that passes for almost all overflowing values :)

------
greenyoda
I'm not sure how malloc/free could be considered "advanced" C programming
techniques. It's pretty hard to write any non-trivial program in C without
using malloc/free and knowing the differences between static, stack-allocated
and heap-allocated memory.

~~~
adultSwim
You'd be surprised with how far you can get without dynamic memory or
recursion...

Still, I agree with your general sentiment. That some of these are "advanced"
topics is troubling. Too many programmers today don't know how a computer
works. Our ideas of "mastery" are way too low.

~~~
erobbins
in the modern climate of bootcamped rails hackers, knowing that memory even
needs to be allocated at some point is advanced.

------
sdegutis
I like the idea behind this article, but I'm a little skeptical of some of its
advice.

For instance, I don't think it's a good idea to recommend using Boehm GC as a
general solution to avoid memory management. It can't really tell the
difference between pointers and pointer-sized integers, which means it usually
leaks memory.

~~~
pfacka
I see, but could you please provide a better alternative? I couldn't find
anything more alive and representative than Boehm GC.

~~~
sdegutis
C is inherently not suitable for a GC. If you want automatic memory
management, it's better to use something like Go. I'm not saying this is a bad
GC library, just that I'm not sure I would recommend it for general use when
writing C code.

~~~
TheCoelacanth
So to avoid using a conservative GC, you should instead use Go, a language
which also has a conservative GC.

------
eliteraspberrie
Good advice. Integer arithmetic is one of the trickiest aspects of C, and
dangerous in combination with the manual memory management. For more
information, see the free chapter of TAOSSA:
[http://pentest.cryptocity.net/files/code_analysis/Dowd_ch06....](http://pentest.cryptocity.net/files/code_analysis/Dowd_ch06.pdf‎)

There are (were) a couple of bugs in the example code. Here are some guidelines
that will help avoid those, and most problems with integer operations and the
heap in general.

First, don't mix unsigned and signed types in arithmetic; and always prefer
the size_t type for variables representing the size of an object.

Second, check for overflow before an operation, not after, like so:

    
    
        if (size > SIZE_MAX / 2) {
            goto error;
        }
        newsize = size * 2;
    

Third, always double-check the arguments to memory allocation functions,
especially for zero, because the result is not always well defined.

    
    
        if (size >= SIZE_MAX - n) {
            goto error;
        }
        foo = malloc(size + n);
        foo[size] = ...;
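
The "check before the operation" rule can be sketched as a pair of helpers. checked_mul and alloc_array are hypothetical names, not functions from the article or from any library; this is one way to do it, assuming sizes are carried as size_t throughout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: stores a * b in *out and returns 1, or returns 0
   (leaving *out untouched) if the product would not fit in a size_t. */
int checked_mul(size_t a, size_t b, size_t *out) {
    assert(out != NULL);
    if (b != 0 && a > SIZE_MAX / b) {
        return 0;            /* detected before multiplying */
    }
    *out = a * b;
    return 1;
}

/* Hypothetical helper: allocates room for count elements of elem_size
   bytes each. Returns NULL for a zero-sized request, on overflow, or
   when malloc itself fails. */
void *alloc_array(size_t count, size_t elem_size) {
    size_t bytes;
    if (count == 0 || elem_size == 0) {
        return NULL;
    }
    if (!checked_mul(count, elem_size, &bytes)) {
        return NULL;
    }
    return malloc(bytes);
}
```

Testing after the fact, as in `size > SIZE_MAX`, cannot work for a size_t because the wrapped result is always in range; the division-based test runs before any wraparound can happen.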

~~~
nkurz
The chapter sounded interesting, but your link didn't work. Here's a fixed up
version:
[http://pentest.cryptocity.net/files/code_analysis/Dowd_ch06....](http://pentest.cryptocity.net/files/code_analysis/Dowd_ch06.pdf)

(Not your fault. I too miss the good-old-days when copying a PDF link from
Google didn't involve multiple steps or URL decoding.)

------
antirez
Boehm is bad advice in general... there are situations where it could work
maybe, like if you implement an interpreter for the fun of it or something
like that.

For many kinds of programs reference counting is the way to go for C; it still
is manual, but an order of magnitude safer...
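
A minimal sketch of manual reference counting in C, with made-up names (rc_buf and friends are illustrative, not taken from any particular project). It is not thread-safe and assumes every retain is paired with exactly one release.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted byte buffer. */
struct rc_buf {
    size_t refs;            /* number of live owners */
    size_t len;
    unsigned char *bytes;
};

/* Creates a zero-filled buffer owned by the caller (refs == 1). */
struct rc_buf *rc_buf_new(size_t len) {
    struct rc_buf *b = malloc(sizeof *b);
    if (b == NULL) {
        return NULL;
    }
    b->bytes = calloc(len ? len : 1, 1);   /* avoid a zero-sized request */
    if (b->bytes == NULL) {
        free(b);
        return NULL;
    }
    b->refs = 1;
    b->len = len;
    return b;
}

/* Registers one more owner and returns the same buffer. */
struct rc_buf *rc_buf_retain(struct rc_buf *b) {
    assert(b != NULL && b->refs > 0);
    b->refs++;
    return b;
}

/* Drops one owner. Returns 1 if this call freed the buffer, else 0. */
int rc_buf_release(struct rc_buf *b) {
    assert(b != NULL && b->refs > 0);
    if (--b->refs > 0) {
        return 0;
    }
    free(b->bytes);
    free(b);
    return 1;
}
```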

~~~
freyrs3
It also makes profiling much harder. Boehm doesn't play nice with valgrind
without a fair bit of work.

------
bch
The article says two different things about how free() works:

    
    
      1. In case of NULL pointer free does no action.
      2. In "double free corruption" section, it says "Could be caused by calling free with pointer, which is [..] NULL pointer"
    

So: which is it? Otherwise, there's no point in the NULL/assert() dance, you
can freely free() with impunity:

    
    
      struct foo *a, *b, *c;
      a=NULL; b=NULL; c=NULL;
      a=malloc(sizeof *a);
      b=malloc(sizeof *b);
      c=malloc(sizeof *c);
    
      if(!(a && b && c)) {free(a); free(b); free(c); return 1;}

~~~
yan
What the article intended to say is calling free() on the same pointer twice
is considered 'double free'. Also an issue with the c++ delete operator.

Calling free() on NULL is a no-op.
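
One common way to make an accidental second free harmless is to reset the pointer right after freeing it. FREE_AND_CLEAR is a hypothetical macro name, and this is only a sketch: it does not help when a second copy of the pointer exists elsewhere.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical macro: frees p and resets it to NULL, so that a repeated
   call degrades to free(NULL), which the C standard defines as a no-op. */
#define FREE_AND_CLEAR(p) do { free(p); (p) = NULL; } while (0)

/* Usage: the second call would be a double free without the reset. */
void free_demo(void) {
    int *p = malloc(sizeof *p);
    FREE_AND_CLEAR(p);   /* releases the block, or does nothing if malloc failed */
    FREE_AND_CLEAR(p);   /* p is NULL here, so this is just free(NULL) */
    assert(p == NULL);
}
```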

------
bstamour
Is the phrase "locator value" defined by the C standard? I've never heard of
it before. I always knew the 'l' in "lvalue" as meaning "this expression can
appear on the left-hand side of an assignment operation."

EDIT: Found the answer (thanks, draft C11 standard.)

Section 6.3.2.1 contains the definition of lvalue, and it does not use the
phrase "locator value" at all. However if you want to use it as a reminder
that modifiable lvalues can be assigned to, then more power to ya :-)

I can't help it - I'm a sucker for standardese.

------
jheriko
"Check for NULL at beginning of function or blocks which are dereferencing
pointers to dynamically allocated memory"

I really strongly disagree with this. Better for it to crash if you expect
this memory to always have been allocated. This way you can fix your bug
instead of putting your app into some potentially unexpected state... if
allocations fail it doesn't make sense to just 'carry on anyway' in many
situations.

~~~
dkersten
I disagree. Unless you can know for absolute certainty that there is no
cleanup required, it is not a good idea to simply crash and leave resources in
an inconsistent state.

Otherwise, you need a way for the cleanup code to dispose of the resources and
leave the system in a consistent state before crashing.

OK, it's obviously good practice to develop your software in a way that a
random outage doesn't leave anything in an inconsistent state, but a lot of
software fails to do this. At the very least, you may end up with something
like when a server crashes and cannot restart because the port is still
considered in use by the OS.

~~~
jheriko
if you are writing software which has to have some kind of guaranteed
reliability then maybe there is a sort of argument for this - but unless you
are handling the failure case in a deterministic and well behaved way then I
still think that is a much worse state to be in than a crash. Even in
production. In either case you are putting the system in a potentially weird
or unrecoverable state - the difference with a hard crash is that you know
straight away and it's easier to debug, even after it's shipped.

Imagine the customer error report or imagine it's a critical system and it
starts making mistakes...

For most software smoke testing heavily is enough to make it super rock solid.
I know that most software does not do this - just using a web browser or a
smartphone makes it painfully obvious that even the big software houses have
some seriously shoddy practices and that testing gets seriously neglected (it
may be that it's impractically big... I struggle to buy that, tbh)

------
Scene_Cast2
The sum() example does not work as the article says it does. Under Visual
Studio 2013 compiling for x86 (C++ compiler, but C++ also has integer
promotion rules), the function returns zero. The reason is: 65535 = 2^16-1.
uint16_t is signed, therefore it has 15 bits to represent the value. When
executing "int16_t a = 65535;" under a debugger, "a" is set to -1.

~~~
burstmode
If uint16_t really is a signed 15bit value under VS2013, somebody in the
compiler development department has a great sense of humor.

~~~
Scene_Cast2
My bad - that's a typo. I meant int16_t, the same as in the sample code.

------
aktau
Just a small question:

    __m128i c = _mm_set1_epi16(2) __attribute__((aligned(16)));

Don't gcc/clang take care of aligning that type automatically? Isn't the
attribute thus redundant?

------
arunc
Wondering why C++ style has been followed, at least, for the * association
with the type and not with the variable.

~~~
rquirk
What about that funky brace style? Code with wrong braces or whitespace
mess-ups indicates a lack of care. If the coder can't put a { in the right spot,
what else has he screwed up?

------
zurn
+1 for promoting Boehm GC.

GCC uses it.

edit: also Inkscape, w3m.

------
adultSwim
Worth reading.

