
C Questions and Answers - skazka16
http://kukuruku.co/hub/programming/i-do-not-know-c
======
ChuckMcM
I think this was silly, the author clearly does know C but they are
complaining about optimizing compilers which do things "behind your back" and
are becoming an increasing nuisance. It's sort of a passive aggressive "I
think this should be an error but it isn't an error because twisted logic that
the compiler uses with respect to undefined operation."

That people can teach themselves what to expect the compiler to do isn't all
that surprising, and it also isn't surprising that a "modern" compiler does
stuff an "old" C programmer might think is ridiculous.

I've engaged in this particular argument a few times only to throw up my hands
in frustration over some exquisitely twisted line of reasoning that gave the
compiler hacker person a fraction of a percent improvement[1] by exploiting
this kind of situation. As long as I have a compiler flag that turns it all
off it's tolerable. But sheesh, sometimes I think these are C programmers who
don't have the guts to become Rust programmers. You want to start fresh dudes,
clean slate. Embrace it.

[1] "But Chuck, over the millions of machines out there it's like an entire
computer's worth of CPU cycles you can use for something else!"

~~~
userbinator
On the topic of (overly) aggressively optimising compilers:

[http://blog.metaobject.com/2014/04/cc-osmartass.html](http://blog.metaobject.com/2014/04/cc-osmartass.html)

...and attempts at turning C into something a bit less programmer-hostile:

[https://news.ycombinator.com/item?id=8233484](https://news.ycombinator.com/item?id=8233484)

My point of view is that compilers should be optimising at the level of
machine instructions, not by attempting to second-guess the programmer and
remove code that it thinks invokes UB. I've looked at tons of compiler output
over the years, and there's plenty of opportunity for optimisation in
instruction selection and register allocation... C should be a "do what I say,
not what I mean" type of language.

~~~
xenadu02
Meh, the problem is the preprocessor. A lot of really silly things are
produced by macro expansion, so the "obviously silly" optimizations really do
end up mattering.

~~~
pmr_
That sounds interesting, but I'm suspicious. Do you have any examples of such
macros and maybe some statistics or ideas how often they are actually seen in
the wild?

~~~
eatonphil
There are also libraries such as libCello[0] and Viola[1] (mine) to really
simplify C through macros. libCello is some really cool stuff.

[0] [http://libcello.org/](http://libcello.org/)
[1] [https://github.com/eatonphil/viola](https://github.com/eatonphil/viola)

------
EpicEng
Well apparently I do know C. If the author wanted to be as contrived as
possible there are certainly more devious edge cases which could have been
trotted out. The fact that e.g. the compiler may optimize out a NULL check
_after_ you've already dereferenced the darn thing shouldn't be surprising.
Just fix your silly bug.

~~~
rav
The problem is that (correct me if I'm wrong) an expression like "&foo->bar"
counts as a dereference of foo, even though the result of the expression is
simple pointer arithmetic involving foo (adding the offset of the bar member).

~~~
pcwalton
Yes, you're correct [1], and that's why offsetof is a builtin in gcc and
clang.

Even more insidious is the fact that memcpy of zero bytes from or to null is
undefined behavior.
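A minimal sketch of both points, assuming a hypothetical `struct packet` that
is not from the article. `offsetof` from `<stddef.h>` yields the member offset
without ever forming `&p->payload` on a null pointer, and a small wrapper (the
name `copy_bytes` is mine) returns early on a zero length so a null pointer
never reaches `memcpy`.

```c
#include <stddef.h>  /* offsetof, size_t, NULL */
#include <string.h>  /* memcpy, strcmp */

/* Hypothetical struct, used only for illustration. */
struct packet {
    int  header;
    char payload[16];
};

/* Offset of 'payload' computed by the implementation; no pointer,
   null or otherwise, is dereferenced. */
size_t payload_offset(void)
{
    return offsetof(struct packet, payload);
}

/* As noted above, memcpy with a null source or destination is undefined
   even for zero bytes, so skip the call entirely when n == 0. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    if (n == 0)
        return dst;
    return memcpy(dst, src, n);
}
```

With the wrapper, `copy_bytes(NULL, NULL, 0)` is a harmless no-op that returns
its first argument.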

[1]: [http://stackoverflow.com/questions/26906621/does-struct-name...](http://stackoverflow.com/questions/26906621/does-struct-name-null-b-cause-undefined-behaviour-in-c11)

------
ternaryoperator
This reminds me of the quiz books that were popular years ago. They'd show
some code that inadvertently tripped some obscure corner of the language.

Rarely did the quizzes provide great insight. Rather, they confirmed the
benefits of keeping your code idiomatic.

~~~
weland
It's pretty much what pops into my head every time I see things like these. I
got most of those correct, but my universal reaction was _why the hell would I
write something like that in the first place_?

~~~
ben0x539
I figure a lot of the time, you don't write it like that, but you get weird
behavior in a much more complicated situation without obvious defects that
eventually can be reduced to an example that would fit in with those quizzes.

~~~
weland
I don't remember any of that, either. The worst I've ever had was a silly bug
due to operator precedence. Barring some _truly_ uninspired things (like
signed integer overflow being undefined), I really think most of those are
cases one shouldn't run into, not even in a much more complicated situation.

~~~
JoeAltmaier
Many of the complicated situations arise from macros and templates that use
arguments in contexts the coder doesn't know. E.g. the stl. Also the macro
writer doesn't know the context wherein the macro will be expanded. You can
end up with issues of precedence, correct statement construction, expression
evaluation order etc.

~~~
weland
STL is not a problem in C land and fishy macros don't make it past code review
in my book :-).

I don't disagree on the usefulness of teasing your brain with these things
once in a while. However, I think the best way to ensure you don't hit bugs
caused by such things is to avoid the situation altogether.

~~~
JoeAltmaier
It doesn't have to be a very fishy macro at all to be problematic. How about

    
    
      #define  Sum(a,b) a+b
    

This of course fails in any context where the precedence of '+' doesn't match
your intent, e.g. Sum(1,5)\*7 becomes 1+5\*7.

Do you remember to always define your expression macros with parentheses? Any
time you didn't do that, you have a bug.
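For comparison, a small sketch of the macro with and without parentheses. The
names `SUM_BAD` and `SUM_OK` and the two wrapper functions are mine, chosen
only so both expansions can sit side by side.

```c
/* Unparenthesized: the expansion is textual, so the precedence of the
   surrounding expression leaks in. */
#define SUM_BAD(a, b) a + b

/* Parenthesize each argument and the whole expression. */
#define SUM_OK(a, b) ((a) + (b))

/* Expands to 1 + 5 * 7, which is 36. */
int bad_result(void)
{
    return SUM_BAD(1, 5) * 7;
}

/* Expands to ((1) + (5)) * 7, which is 42. */
int good_result(void)
{
    return SUM_OK(1, 5) * 7;
}
```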

~~~
lmm
> Do you remember to always define your expression macros with parentheses?

Yes - that's a necessary part of being idiomatic. Any macro that's defined
without brackets sticks out like a sore thumb, and won't pass code review,
even if it's (initially) used in a place where they wouldn't be necessary.

------
bgvopg
Surprisingly accurate, author did his research.

5 is also missing a check for a NULL pointer. ;)

Most of these rules are unfortunately ignored as obscure information. The
worst offender I see in wild code is 5.

The second most ignored is not checking values before computation: 10, 11, 12.

No.7 is very interesting, rarely violated, most programmers don't even know
that is a thing or just assume the processor won't trap on an unaligned read.
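To make "checking values before computation" concrete, here is one possible
sketch for signed addition. It assumes the questions in that group are about
signed arithmetic that can overflow; the helper name `checked_add` is mine and
does not come from the article.

```c
#include <limits.h>   /* INT_MAX, INT_MIN */
#include <stdbool.h>

/* Stores a + b in *out and returns true if the sum fits in an int.
   Returns false and leaves *out untouched if it would overflow.
   The range test runs before the addition, so no overflow ever occurs. */
bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}
```

The same shape (test the operands against the limits, then compute) carries
over to negation, subtraction and multiplication, with different bounds.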

~~~
DanWaterworth
> 5 is also missing a check for a NULL pointer. ;)

Not necessarily. It could be up to the caller to ensure that the pointer isn't
NULL.

------
ericfontaine
Although most comments use these examples to argue that C is a bad language, I
would argue the opposite, that these examples show how C is an extremely
useful language. C has occupied a niche position as the lowest commonly-used
language that is both human-readable (at the level of expressing algorithms,
data structures, functionality, and control flow) but not specific to any one
instruction set architecture (and thus can compile to any architecture). Each
C construct or statement maps efficiently into machine instructions, while not
being an assembly language. I have trouble imagining how to make something
like C any lower without becoming architecture-specific or cumbered with
details that are more economically-suited for a compiler. But if too much
higher, then the programmer becomes detached from this close relationship to
the computer. The programmer can focus on implementing important speed & space
performance details of an algorithm while not getting bogged down by more
mundane details (such as register allocation, matching jump statements to
their targets labels, keeping track of the return stack of a function, etc.)
that are better suited for a compiler to handle. (That being said, I think
Rust or something like it is a strong successor and can additionally express
concurrency).

These examples illustrate well-intended and useful features of C, not flaws. I
will explain why for each in a comment below:

~~~
ericfontaine
#1. Tentative definitions are historical baggage from Fortran Common blocks
([https://blogs.oracle.com/ali/entry/what_are_tentative_symbol...](https://blogs.oracle.com/ali/entry/what_are_tentative_symbols)),
which may have helped adoption of C by Fortran users. Although it remains in
the C language specification, this feature can be ignored.

#2. Treating dereferencing a NULL pointer as undefined behavior means that the
compiler is not required to generate additional instructions such as asserts
or crashes to guard against potentially dangerous side effects. The C compiler
assumes that the programmer is in control of his/her code. In this example, it
can be assumed that a careful C programmer has already guaranteed that the
pointer will not be null when it is dereferenced. This C feature is an
optimization to avoid generating redundant or unnecessary asserts or handling
code.

3\. C allows the programmer to handle pointers directly, allowing for low-level
optimizations that may not be possible in higher-level languages. A careful C
programmer may have taken steps outside the function to handle the situation
where yp==zp, or may otherwise be unconcerned about a particular case, for
performance reasons.

4\. A correct implementation of IEEE 754.

5\. Since C is designed to efficiently compile to any computer architecture,
it needs to be aware of the distinction between the arithmetic width of an
instruction set vs the width of addresses. Ints are optimized to default to
the natural arithmetic width of an instruction set (so that compilation
doesn't produce unnecessary packing/unpacking instructions whenever they are
accessed) but are guaranteed to be at least 16 bits wide. However, since a data
structure can be as large as addressable memory, it is necessary for size_t to
be the width of addresses.

6\. Allowing size_t to be unsigned allows all bits of a size_t variable to be
utilized for expressing size.

7\. Undefined behavior is, again, an optimization feature, allowing each
compiler to implement as it sees fit.

8\. Comma operator is useful when first operand has desirable side effects,
such as compactly representing parallel assignment or side effects in for
loops.

9\. C allows unsigned integers to wrap around 0 and UINT_MAX. This feature can
be utilized as an optimization, for example as a free (no additional
instruction) deliberate modulus operation. This is usually how unsigned
integers behave in assembly.

10 & 11 & 12\. Some ISAs, like MIPS, treat overflow of signed numbers as an
exception. Others simply treat the result as a valid two's-complement value.
Since C is machine independent, C's official specification for overflow of
signed numbers must be compatible with all ISAs. Simply treating the result as
undefined does the trick, and means the compiler doesn't have to make
guarantees or provide a version for each ISA.
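As an illustration of point 9, a short sketch of unsigned wraparound used as a
free modulus. The function names and the 256-entry ring buffer are my own
assumptions, not something from the article.

```c
#include <limits.h>  /* UINT_MAX */

/* Unsigned arithmetic is defined to wrap modulo UINT_MAX + 1,
   so UINT_MAX + 1u yields 0. */
unsigned next_counter(unsigned n)
{
    return n + 1u;
}

/* Advancing an index into a hypothetical 256-entry ring buffer.
   Converting the result to unsigned char reduces it modulo 256
   when CHAR_BIT is 8, with no explicit % operator. */
unsigned char next_slot(unsigned char slot)
{
    return (unsigned char)(slot + 1);
}
```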

~~~
TorKlingberg
Good comment, but I'd like to add that int is not really the "natural
arithmetic width of an instruction set" any more. We will never see 64-bit
ints. The sizes of int and long seem to be "whatever works, and is compatible
with what it used to be".

~~~
_kst_
I've seen 64-bit ints (on Cray systems).

One disadvantage of making int 64 bits, even if that's the natural size, is
that if char is 8 bits, then short has to be either 16 or 32 bits (or 64) --
which means that you can't have predefined types covering all the common sizes
(8, 16, 32, 64).

That's not quite true, since C99 introduced _extended integer types_ \-- but I
don't know of any C compiler that has provided them.

(The intN_t and uintN_t types in <stdint.h> don't solve this; they still have
to be defined in terms of existing types.)

------
belovedeagle
> (especially C programmers)

Author needs to stop his anti-intellectual everyone-is-as-ignorant-as-me
bullshit. I see that a lot re programming to justify a lot of silly positions.
If you only know Javascript, that's great, I rather like having shiny things
in my browser (I like it too much, even). That doesn't mean that C programming
is obsolete; some of us know C. For example, I got #5 and #9 wrong, and the
rest I got right including the general idea of the justifications. (10/12 is
pretty good for someone who grew up in the Java era, but I want to get
better.)

~~~
oneeyedpigeon
I think the opinions "c is too complicated and buggy and, therefore, obsolete"
and "javascript is a 'shiny' language for amateurs" are just as narrow-minded
as each other.

~~~
zamalek
> javascript is a [...] language for amateurs

He never said that or implied it.

~~~
oneeyedpigeon
"If you only know Javascript" \- why even bring that up? There is no mention
of Javascript in the original article at all.

~~~
icebraining
Because it's a language that many people know exclusively?

------
chadaustin
The only one I got wrong was the one about IEEE semantics. It clearly is
possible to know C. :P
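For anyone else who tripped on that one, a sketch of how the reciprocal can
come back as infinity even though the assert passes. It assumes IEEE 754
doubles with subnormals enabled (no flush-to-zero); the names `reciprocal` and
`is_pos_inf` are mine.

```c
#include <assert.h>
#include <float.h>   /* DBL_MAX */

/* Same shape as the function in the question. */
double reciprocal(double x)
{
    assert(x != 0.);
    return 1. / x;
}

/* True when r is positive infinity: no finite double exceeds DBL_MAX. */
int is_pos_inf(double r)
{
    return r > DBL_MAX;
}
```

A subnormal argument such as `1e-310` is nonzero, so the assert holds, but its
reciprocal is about 1e310, which is beyond `DBL_MAX` and rounds to infinity.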

------
dllthomas
I'll readily admit that there are corners of C where I would not be able to
tell you just what breaks. But I know about where they are and enough to stay
away from them. If I'm not sure, the next guy won't be either.

I got 12/12 here. I would say I know C.

------
Veedrac
A trickier variant of #11 is to show people

    
    
        bool is_zero(int x) {
            return x == -x;
        }
    
        bool is_zero(float x) {
            return x == -x;
        }
    

and ask them which is wrong and for what value. Most of the time the
instinctive response is that it must be the float code (because floats are
evil, duh).

This works even in languages with defined overflow for integers.

~~~
rav
But you should give the functions different names, since it is not valid to
define the same symbol more than once (and there's no name mangling in C that
enables function overloading).
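With distinct names, the two checks might look like the sketch below. Since
negating the most negative int is undefined in C, the integer version models a
language with wrapping overflow by doing the negation in unsigned arithmetic;
that modelling choice and both function names are my assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models "x == -x" for a 32-bit integer whose negation wraps.
   Unsigned arithmetic is defined to wrap in C, so nothing here is
   undefined, even for INT32_MIN. */
bool is_zero_wrapping_int(int32_t x)
{
    uint32_t ux = (uint32_t)x;
    return ux == (uint32_t)(0u - ux);
}

/* For floats, 0.0f and -0.0f compare equal, and NaN compares unequal
   to everything, so this only accepts zeros. */
bool is_zero_float(float x)
{
    return x == -x;
}
```

The integer version reports true for `INT32_MIN` as well as for 0, because
`INT32_MIN` is its own negation under wrapping; the float version is the one
that behaves.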

------
copsarebastards
1.

    
    
        int i = 10;
    

Q. Is this code correct?

A. Yes.

2.

    
    
        extern void bar(void);
        void foo(int *x)
        {
          if(x == NULL)
          {
            return;
          }
    
          int y = *x;
          bar();
          return;
        }
    

Q. It turns out if you check the validity of your variables before you use
them it prevents you from having to understand undefined behavior.

A. Is there a question here?

3\. There was a function:

    
    
        #define ZP_COUNT 10
        void func_original(int *xp, int *yp, int *zp)
        {
          int i;
          for(i = 0; i < ZP_COUNT; i++)
          {
            *zp++ = *xp + *yp;
          }
        }
    

I optimized it this way:

...because nobody had any idea what it was doing and so it wasn't used
anywhere.

4.

    
    
        double f(double x)
        {
          assert(x != 0.);
          return 1. / x;
        }
    

Q. Is it possible for this function to return inf?

A. If you're at the point where you're asking that question you should have
been using a decimal library a long time ago.

5.

    
    
        int my_strlen(const char *x)
        {
          int res = 0;
          while(*x)
          {
            res++;
            x++;
          }
          return res;
        }
    

Q: The function above should return the length of a null-terminated
string. Find the bug.

A. They didn't use `strlen()`.

------
_RPM
This reminds me of tests I took in earlier CS classes. Knowing those things
is utterly useless in practice.

~~~
gamegoblin
My CS101 prof would give 50-100 line blocks of code, and hidden in them
somewhere would be something like:

    
    
        if(condition); {
            some stuff
        }
    

Note the semicolon after the if-statement.

The tests became games of "find the semicolon or = in place of ==". Ugh.

~~~
oneeyedpigeon
Thankfully, your compiler probably warns about this nowadays, making such
pointlessness obsolete.

~~~
ygra
And the static analysis tool you're hopefully using.

------
shele
Regarding the first answer: What is called a "tentative definition" is of
course a "declaration".

~~~
sirclueless
That's not actually true in this case. The author is correct about this.

When "int i = 10;" is encountered, the tentative definition _behaves_
effectively as a declaration. If the compiler were to reach the end of the
translation unit and the variable i was never defined elsewhere, however, "int
i;" serves as a definition.
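A small file-scope sketch of both cases, using my own variable names (`total`
and `spare`) rather than the article's:

```c
/* File scope: a tentative definition followed by a real definition of
   the same object. Together they define 'total' once, with value 10. */
int total;
int total = 10;

/* Never given an initializer anywhere in this translation unit, so at
   the end of the unit the tentative definition acts like
   'int spare = 0;'. */
int spare;

int read_total(void) { return total; }
int read_spare(void) { return spare; }
```

This only holds for C at file scope; the same two lines inside a function
body are a plain redefinition error, and C++ rejects the file-scope version
too.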

~~~
shele
I think it is the other way around - the declaration is a tentative
definition, so a definition unless there is a definition at file scope.

------
belovedeagle
Why the heck was the name of this post changed? It rather conveniently puts
the author in a better light by downplaying the anti-intellectual nature of
the article I commented about above. I thought the general rule was that posts
should be titled with the title of the linked article, which was the case
before but now is not the case.

EDIT: To answer my own question, the submitter is clearly the author based on
his submission history. So yes, this was an act of self-censorship to try to
hide the author's disgusting attitudes.

~~~
skazka16
Not exactly true. I publish articles, which we translate into English (mainly
from Russian). The original author is Dmitri Gribenko and it seems he is not a
member of the HN community.

------
kdoherty
Unsigned int >= 0 in a decrementing for-loop is a classic trap

~~~
bgvopg
Some would twitch at this:

for(i = length - 1; i < SIZE_MAX ; i--)

but it is completely valid.
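A sketch of why it works, next to a more common spelling of the same loop. The
search functions and their names are hypothetical, picked only to give the
loops something to do; both return SIZE_MAX when the value is absent.

```c
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* SIZE_MAX */

/* The loop from above: when i-- steps below 0 it wraps to SIZE_MAX,
   the test i < SIZE_MAX fails, and the loop ends. With length == 0,
   i starts at SIZE_MAX and the body never runs. */
size_t find_last_wrap(const int *a, size_t length, int value)
{
    for (size_t i = length - 1; i < SIZE_MAX; i--)
        if (a[i] == value)
            return i;
    return SIZE_MAX;
}

/* A common alternative: test and decrement before the body, so i takes
   the values length-1 down to 0 inside the loop. */
size_t find_last_idiom(const int *a, size_t length, int value)
{
    for (size_t i = length; i-- > 0; )
        if (a[i] == value)
            return i;
    return SIZE_MAX;
}
```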

~~~
joosters
The twitchers are right. Whoever wrote that line of code is a smartass who is
deliberately obfuscating stuff. Don't work with these people!

~~~
bgvopg
I prefer to learn, rather than stay ignorant, when I find new tech.

The only people I would not want to work with are ignoramuses and ironically,
smart-asses. So you.

------
seba_dos1
5 is IMO nothing else than nitpicking. 2 and 4 (and maybe 6) might have some
importance in real life, while others are easy and kinda expected (although
comma operator in 8 might not be known to less experienced programmers). I
can't really see the point of this article other than "hey, do you remember
that there is a concept called undefined behavior in C?".

~~~
ygra
You'd be surprised how many C or C++ programmers don't know about UB. Some
(I've worked with one of them) even went so far as to say »I know what
assembly the compiler generates from that, even if it's UB I know how it
behaves.« And those people then wonder why their code does something
different when moving to a new compiler version, or when switching to a
different compiler.

------
nspattak
I was so surprised by the first question that I actually tried it on my
computer...

Am I the only one who tried compiling :

        #include<stdio.h>
        #include<stdlib.h>
        int main(int argc, char *argv[]) {
            int i;
            int i=10;
            printf("i=%d\n", i);
            return 0;
        }

and got the redeclaration error I expected?

~~~
MikeTaylor
The original article makes it clear that this is how things work with C
globals, not locals. If you compile and run this program with _gcc -Wall
-pedantic x.c && ./a.out_ there will be no errors and it will emit _i=10_ as
expected:

    
    
      int i;
      int i=10;
      #include<stdio.h>
      int main(int argc, char *argv[]) {
          printf("i=%d\n", i);
          return 0;
      }

------
kenjackson
I just got the first one wrong and haven't written C code in a decade. I have
to admit I haven't seen that construct used in any programs. I guess, no harm
no foul, but still seems like an odd construct to allow.

~~~
foogoespop
The first one would appear when using global variables which are shared across
files, you just wouldn't see it as the pieces would usually be in separate
files.

Something like:

    
    
        /* file.h */
        int global_i;
    

and you have:

    
    
        /* file.c */
        int global_i = 0;
    

Then if file.c includes file.h both will be in the same _file_ during
compilation.

------
ConAntonakos
I realize this has probably been asked, but can anyone recommend good resources
for learning C? And not necessarily just the legendary textbooks, but any
clever tutorials or fun learning resources? Thanks!

------
bottled_poe
Does anyone know of a tool which can automatically scan source code for these
types of oversights?

Assuming of course the compiler doesn't already check for all of these...

~~~
pcwalton
You can use the clang sanitization flags to catch them dynamically (at least,
the null pointer dereference and signed overflow).
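Roughly, and with a hypothetical file name `demo.c`, that could look like the
sketch below. The two functions are my own minimal triggers for the checks
named above; they are well defined for ordinary arguments and only misbehave
for the inputs mentioned in the comments.

```c
/* Signed overflow when x == INT_MAX: undefined behavior that
   -fsanitize=signed-integer-overflow is designed to report at run time. */
int increment(int x)
{
    return x + 1;
}

/* Null dereference when p is a null pointer: the kind of access that
   -fsanitize=null is designed to report at run time. */
int load(const int *p)
{
    return *p;
}

/* Build with both checks enabled, or use -fsanitize=undefined to turn
 * on the whole group:
 *   clang -g -fsanitize=signed-integer-overflow,null demo.c
 */
```

These checks are dynamic, so they only fire on code paths that a test run
actually executes with the offending values.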

------
halayli
Knowledge is not absolute. Yes you can be a C expert and still not know every
single little detail about C or any other language for that matter.

------
dang
Since the title seems to be controversial we changed it to a neutral one.

------
fizixer
He's discussing subtle points of the C language and yet there is no mention of
which compiler he's using, whether the results might be different for
different compilers, hardware platforms, and how they correlate with multiple
C standards (C89, C99, C11/C1X, etc).

He succeeded in convincing me that he does not know C!

~~~
sjolsen
There is no mention of which compiler he's using or whether the results might
be different for different compilers and/or hardware platforms precisely
because he's discussing the C language— not the idiosyncrasies of particular
implementations.

>and how they correlate with multiple C standards

There's nothing going on here specific to any particular revision of ISO C
(that I can see).

~~~
fizixer
Too much of a coincidence!

Did he make sure the behavior he mentioned is uniform across all ISO
standards? What is his source for coming up with a certain answer to a certain
piece of code? any of the standard documents? K&R book? gcc output?

As an example:

In the answer to point 2, he claims:

> ... the compiler thinks that x cannot be a null pointer ...

First of all, this gives a strong indication that he's analyzing a compiler
output, a compiler that he didn't reveal in the article.

But even if we ignore that, and he truly is going by a rule that is uniform
across all of K&R, C89, C99, and so on, could you or he point me to any page
in the C99 standard document where it is explicitly stated along the lines
that the compiler "should assume" the pointer to be not NULL after an
undefined dereferencing in line (1) and hence ignore (2) and (3)? (Based on my
experience, I have a very strong hunch that a standard would not enforce
assumptions as a result of an undefined operation.)

If you could, you/he "may" have a point ("may", because I still have 11 other
points to critique). If not, he and you clearly have no idea what you guys are
talking about!

~~~
wtallis
The questions are about the C language, not any particular implementation
thereof. Question 2 requires you to know that different compilers may at times
legally do different things with the same piece of code. Answering question 2
does not require you to know _which_ compiler is in use. Instead it requires
you to think in the mindset of someone trying to write portable code that will
work as intended with _any_ compliant compiler.

In this case, the code is invalid because it invokes undefined behavior, and
the compiler is allowed to do literally anything. The author of a portable C
program is not allowed to rely on any particular behavior. I don't have any of
the ISO C standards docs handy, but I'm quite certain that they all agree
here. The (probably hypothetical) compiler in use here is apparently trying to
apply several heuristics that are useful in other situations but fail here
because there is no right answer.

In general, the C standards avoid requiring a specific behavior where choosing
to require a specific behavior could hurt portability. Most C compilers make
use of their leeway to interpret things in a manner that helps make bad code
run and many compilers make promises beyond those required by the standard to
aid in writing non-portable code.

~~~
fizixer
> ... the compiler is allowed to do literally anything ...

and yet he makes the claim that "bar() is invoked" which is not only incorrect
(bar may or may not be invoked), but also misleading for C newbies who are
actually trying to learn something by reading this article. Hence my original
comment.

~~~
wtallis
"bar() is invoked" is not incorrect. It's perfectly legal for a compiler to
produce this result. It's just not _mandatory_ that all compilers do so.
There's nothing wrong with the question postulating that a particular compiler
behaves this way; the _point_ of the question is to remind the programmer that
they have to expect and be prepared for variance between compilers when it's
permitted by the standard.
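To make the pattern concrete, a sketch reconstructed from the discussion here
rather than copied from the article. `bar` is given a stand-in body with a
call counter (my addition) so the two orderings can be compared.

```c
#include <stddef.h>  /* NULL */

static int bar_calls = 0;

/* Stand-in for the external bar(); counts how often it is reached. */
static void bar(void)
{
    bar_calls++;
}

/* Dereference first, check second. If x is null the dereference is
   already undefined, so a compiler may treat the check as dead code. */
void foo_deref_first(int *x)
{
    int y = *x;
    (void)y;
    if (x == NULL)
        return;
    bar();
}

/* Check first: nothing undefined for a null argument, and bar() is
   reached only with a valid pointer. */
void foo_check_first(int *x)
{
    if (x == NULL)
        return;
    int y = *x;
    (void)y;
    bar();
}
```

For a valid pointer both versions call `bar()`. Only `foo_check_first` has a
defined result for a null argument on every conforming compiler.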

------
fdik
Already the first example is wrong. So I stopped reading.

        % cc -std=c11 -o dingens dingens.c
        dingens.c:6:9: error: redefinition of 'i'
            int i = 10;
                ^
        dingens.c:5:9: note: previous definition is here
            int i;
                ^
        1 error generated.
        % cat dingens.c
        #include <stdio.h>

        int main()
        {
            int i;
            int i = 10;

            printf("hello, world\n");
            return 0;
        }
        %

~~~
anant
He clearly mentions that the statements are _not_ in a function body, rather
just in the global portion of a C file.

