

The most stupid C bug ever - Garbage
http://www.elpauer.org/?p=971

======
jrockway
This is a good article because it shows the author's thought process. That
means we can see what he did wrong and learn to think differently.

The first thing that sticks out as problematic is the assertion that a level
of indirection around tmpfile is going to be slow. There are two ways we can
determine whether this is true: Try It And See, or read the output of the
compiler to see what code it generates. Let's try the second one.

We'll make a file called slow.c with the contents:

    
    
        int f(void){
          return 42;
        }
    
        int g(void){
          return f();
        }
    

Then we can see what code gcc generates with the -S option:

    
    
        $ gcc -S slow.c
    

and looking at slow.s. Here's the important part of g (with some annotations
removed):

    
    
        g:
        .LFB1:
    	pushq	%rbp
    	movq	%rsp, %rbp
    	call	f
    	popq	%rbp
    	ret
    

So, we're calling f here. But you don't compile your production code with -O0,
so let's try -O3. Here's f:

    
    
        f:
        .LFB0:
    	movl	$42, %eax
    	ret
    

And here's g:

    
    
        g:
        .LFB1:
    	movl	$42, %eax
    	ret
    

Notice how we're not calling f anymore; we just get f's body right in our
function. So there's no speed cost to writing a function that only calls
another function. This is the simplest possible case for an optimizer to
optimize, and of course it does.

If you want to be explicit, you can always inline g. Then, at g's call site,
we see:

    
    
        main:
        .LFB2:
    	pushq	%rbp
    	movq	%rsp, %rbp
    	call	f
    	movl	$0, %eax
    	popq	%rbp
    	ret
    

Which is almost the same thing as what -O3 produced, except g's body is copied
into its caller instead of f's body being copied into g. (And this is at -O0.)

Another case where the thought process goes awry is the guess that there is a
preprocessor bug. It's easy to verify: run gcc with -E and see what you get.
Here's our test case:

    
    
        #define foo f
    
        int f(void){
           return 42;
        }
    
        int g(void){
          return foo();
        }
    

Preprocess:

    
    
       $ gcc -E buggy.c
    
        int f(void){
          return 42;
        }
    
        int g(void){
          return f();
        }
    

And look. No bug!

So let's move on to the final problem, the weird comment. We'll again see what
our preprocessor output looks like on a minimal test case:

    
    
        #define FILE int
    
        #ifdef _WIN32
        #  define tmpfile w32_tmpfile
        #endif
    
        FILE *tmpfile(void) { return (FILE *) 0; }
        FILE *w32_tmpfile(void) { return (FILE *) 0; }
    
        void test(void){
          FILE *g;
    
          if (code == 200) {     // Downloading whole file
            /* Write new file (plus allow reading once we finish) */
            // FIXME Win32 native version fails here because Microsoft's version of tmpfile() creates the file in C:\
            g = fname ? fopen(fname, "w+") : tmpfile();
          }
        }
    

We end up with:

    
    
        int *tmpfile(void) { return (int *) 0; }
        int *w32_tmpfile(void) { return (int *) 0; }
    
        void test(void){
          int *g;
    
          if (code == 200) {
    
    
    
          }
        }
    

Which looks like our code has suspiciously gone missing. Let's try removing
the comments:

    
    
        int *tmpfile(void) { return (int *) 0; }
        int *w32_tmpfile(void) { return (int *) 0; }
    
        void test(void){
          int *g;
    
          if (code == 200) {
            g = fname ? fopen(fname, "w+") : tmpfile();
          }
        }
    

And hey, our code is back. Must be something to do with the comments.

Anyway, the C toolchain is very simple, so let the computer do the compilation
for you when you suspect the compiler isn't doing what you want. Read and
understand its output; don't rely on a tool that does something that you
couldn't do yourself.

Something I mentioned in that code completion article the other day was "don't
rely on your editor to be your brain". This got a lot of downvotes, but this
is exactly the problem the author has here. His editor didn't interpret the
source code in the same way that the compiler did, and that's what matters.

Learn the language you are using so that you can spot bugs! Don't think
something isn't a comment just because your editor doesn't show it in italic
orange! Suspect an editor bug, not a compiler bug.

And finally, if you had used correct grammar in your comments, this wouldn't
have happened:

    
    
        /* We use our own tmpfile implementation here, since Windows' writes directly to C:/. */

~~~
roel_v
Alternatively, use a debugger. This issue would have become obvious in seconds
if he had stepped through it with a debugger. Of course, if you only have gdb
as a debugger, that's easier said than done - I'd rather stab myself in the
eye with a hot poker than use gdb again. Proper tools make issues like this
much less painful to solve.

~~~
pestaa
What was so painful about it that you had to highlight gdb specifically? I'm
just starting out with it; so far it seems a lot more flexible than its visual
counterparts.

~~~
roel_v
The lack of an easy way to see the surrounding code you're stepping through.
Its cumbersome interface (having to type commands to inspect variables rather
than just being able to point at them). The difficulty of learning it.

And all of that for no benefit - I spent hours learning vim too, but at least
there I got flexibility and a supremely powerful editor in return; there is
nothing gdb can do that the Visual Studio debugger can't. And there you just
set a breakpoint and press run, while with gdb it takes you a day just to get
integration with your editor set up the first time. I'm not sure what you mean
by 'flexible' - a debugger is by its very nature interactive, so the typical
advantages of text-based tools don't apply. But of course YMMV etc.

~~~
jrockway
gdb can "list" and "where", which gives you the lexical and dynamic contexts
respectively.

Here come the downvotes... but if you don't know the code running in the
debugger well enough to remember the details, well, you've found your bug.
Read and understand your code very carefully before you waste your time
single-stepping through it.

(This is the biggest problem I see with developers that use IDEs -- they
forget that they have a brain, and that your brain is always going to be
faster than clicking shit in Eclipse.)

~~~
roel_v
Oh great, another internet toughman boasting about how all those IDE-using
pussies are just stupid and how Real Programming should be left to Real
Programmers (cue story of Mel here somewhere). _rollseyes_

When you have to type commands to see what you would otherwise see
immediately, how is that _not_ a usability deficiency? GDB fans just have
Stockholm syndrome.

And if you think there is anyone in this world who has a complete mental model
of each line of code in a 10k-line project, let alone a 100k or 1m one, you're
insane. I'm not talking about debugging a 15-line function. But even then,
single-stepping through a function of that size doesn't even cost extra time
when you use proper tools (and not that POS gdb), so 'waste your time' doesn't
even apply there.

Debuggers have a bad rep with some crowds because of the damage gdb did -
people think that that's how debuggers work and can only work. Navel-gazing at
its worst.

~~~
jrockway
_Oh great, another internet toughman boasting about how all those IDE-using
pussies are just stupid and how Real Programming should be left to Real
Programmers (cue story of Mel here somewhere). rollseyes_

If you want to talk like this, go back to Reddit. I'm not an Internet
toughman, I'm describing the motivations behind the interface of gdb. Turns
out that just because someone disagrees with you doesn't mean they're dumber
than you.

In this case, I think it's worth listening to me because I've used pretty much
every development tool imaginable to write hundreds of thousands of lines of
code. I may know what I'm talking about! Shocking!

 _When you have to type commands to see what you would otherwise see
immediately, how is that not a usability deficiency? GDB fans just have
Stockholm syndrome._

Information overload is just as bad as too little information. GDB lets you
see what you need to see when you need to see it. It doesn't guess what you
want it to do; you tell it. You're the brain and it's the computer.

And, it hooks into your text editor if you want that little => next to the
line of code GDB is on. I've personally never found this to be useful. (As I
mention in a comment above, it's rare for modern compilers to produce code
that runs top-to-bottom exactly as you've typed it anyway. The source code is
an explanation of the sort of steps you want the computer to take, but it's
not an exact model of what the computer's going to do. Saying "we're now
running this line of code" can be very misleading.)

 _And if you think there is anyone on this world who has a complete mental
model of each line of code in a 10k-line project, let alone a 100k or 1m one,
you're insane. I'm not talking about debugging a 15-line function._

This is just plain wrong. The debugger is the wrong tool for analyzing a bug
that persists over 10k lines of code. The key to debugging is to reduce the
scope of the problem, and the way to do that is by not tightly-coupling large
chunks of code. If you have a bug that requires a mental model of 10k-1m lines
of code, unit tests and refactoring are the tools you need to be using.

But, with that in mind, 10k lines is not that much to keep in one's memory. I
maintain cperl-mode.el, which is 10k lines of elisp that I did not write. I
haven't touched it for months, but I still know where the important parts are
and how they interact. Reading is one of the most important skills a
programmer can have. If you can't remember what you've read, you need to slow
down. Lack of understanding causes bugs.

 _single-stepping through a function of that size doesn't even cost extra time
when you use proper tools (and not that POS gdb), so 'waste your time' doesn't
even apply there_

Yes it does. Single-stepping requires a round-trip between the computer and
your brain for every line of code. When your IDE jumps you to a new file
because you stepped into a function, your brain has to context-switch and read
the surrounding code. All in all, it's a slow and complicated operation
compared to the normal process of running your unit tests and scanning the
logs if there is a failure. That case is fast because most parts are done by
the computer; it just gives you a few pieces of information that you can use
to analyze, tweak, and re-run. And you can drink some coffee while the
computer does most of the work!

A debugger is for a very special case where unit tests cannot find bugs, or
the effort to write these test would be too high. A good example is examining
a core dump from a production process. You load it up in your debugger and can
inspect the program state in great detail. You see that the current call frame
is 0x626f6f66. That looks like the text "foob", which seems like it might have
come from some text your program was working with, "the foobar chronicles".
But you're using bounds-checking for your string operations, so why the bug?
Use the debugger to inspect some local in-memory state, like the "len"
parameter to strlcpy. It's 2938473873! Must be an integer overflow somewhere.

The key here is that everything we did was interactive and required human
thought at each step. We didn't know what the problem was, so we asked the
debugger for some data. Then we thought about it for a while, came up with a
theory to test, and asked the debugger for more information to prove or
disprove our theory. After repeating this a few times, we came to the
conclusion that we are doing an addition wrong in the foo function, and that's
making the program segfault.

With that in mind, we switch back to our text editor, write a test for the
length-calculation math, and then fix the math. We shouldn't need a trip to
the debugger for a while, because we use the debugger to collect information
in odd cases, not to watch our program run normally.

So anyway, I guess my hangup with the graphical IDE-based debuggers is "why".
Why are you writing code that needs to be single-stepped regularly? Why is
your codebase so messy that you can't understand it? Why is your time so
worthless to you that you are doing the computer's work for it, instead of the
other way around?

A debugger is a tool to use for investigating weird things, not something you
should bust out everyday.

 _Debuggers have a bad rep with some crowds because of the damage gdb did -
people think that that's how debugger work and can only work. Navel-gazing at
its worst._

A lot of insults, but not a lot of information. If you're so sure that Visual
Studio's debugger has solved all of programming's problems, why not share a
concrete example of how you use it? Your post adds nothing to the conversation
and makes people wish your computer would punch you in the face.

~~~
Mithaldu
> I'm not an Internet toughman, I'm describing the motivations behind the
> interface of gdb.

Actually, yes, you did that, but you still are one.

For the plain and simple reason that you repeatedly stooped to insults while
making your points instead of letting them stand on their own. You should not
be surprised people think little of you if you cannot teach without belittling
people.

~~~
jettero
It's funny how both sides in any argument or disagreement think one side is
belittling the other by doing nothing more than assuming the other side
doesn't understand something or another (this includes you just now btw).
Certes pointing out ad hominem counts as ad hominem every single time too.

~~~
Mithaldu
> this includes you just now

Most certainly not. The factual divide is one issue and his politeness
another. The posts of jrockway are laced with flat out insults that are
unnecessary and have nothing to do with any assumptions about understanding.

Let me point one out:

> Your post [...] makes people wish your computer would punch you in the face.

(Omitted part not being an insult.)

~~~
jettero
Surely there was insult and ad hominem in his post (not arguing the fact), but
I believe it was in reply to the insult and ad hominem from the post he was
replying to. Calling someone an internet toughman counts as ad hominem just as
much as calling someone computerfacepunch.

Although, the more I think about it, your point was probably: no need to stoop
to the same level.

~~~
Mithaldu
Consider that it was preceded by:

> they forget that they have a brain
>
> where -O3'd code magically executes as though the compiler didn't do any
> optimizations on it.
>
> And finally, if you use correct grammar in your comments

These have varying degrees of subtlety, but they are definitely aimed digs.

There was a definite precedent that warranted the toughman comment and made it
seem like an accurate description to me even when I was actively trying to be
nice.

Edit: I understand your point and think about it fairly often, i.e. when
programmers in one language attack another and the attacked group goes on to
point out how they refrain from attacks. I just do not think I have any ash to
sprinkle on my shoulder in this case.

~~~
jrockway
It's not a dig. I do think you're dumb if you don't like gdb. Naturally, my
opinion comes out in my writing, which I think is my Creative License. You
should realize that anything you read is biased. You should read the post for
content, mix it with its credibility, and then reach a decision.

In the end, I think that my experience programming adds more credibility than
my love for snide comments subtracts. That doesn't make me a toughguy, that
makes my writing interesting to read.

~~~
Mithaldu
Let me put it like this: mst you are not.

------
angusgr
I agree with the various other commenters on the blog post - that treating the
wrapper function as an unacceptable amount of indirection is insane premature
optimisation at the expense of clarity.

Not to mention the wrapper can be inlined in C99, or would be automatically
inlined with -O3 in gcc, not that it matters one iota in this case.

The other glaring bug would seem to be Windows' implementation for POSIX
tmpfile() requiring admin access. But I guess that's not too surprising.

~~~
nandemo
Indeed. What's a wrapper function's overhead compared to _opening a file_,
which will subsequently be _written_ to?

Frankly, I wish this sort of blog post wouldn't be upvoted here. It's one
thing when someone asks a basic question on StackOverflow. Even experienced
programmers can learn with the answers. But when someone writes an
"authoritative" blog post that misses the basics, it's more noise than signal.

~~~
joshu
Well, jrockway's deconstruction was pretty nice to read even if the original
was boring.

~~~
nandemo
Point taken. I learned something with the comments if not with the original
article.

------
burgerbrain
I've actually encountered this, but I can't say it stumped me at all; my
syntax highlighting made it immediately apparent what would have happened had
I tried to build.

A coworker of mine also ran into a similar issue (not realizing something was
commented out, thus odd behaviour) with a _#if 0 ... #endif_ block. Vim
highlights that correctly too but his emacs setup did not (no idea how stock
it was).

~~~
hartror
That was my reaction. Syntax highlighting makes syntax errors a breeze to
notice.

~~~
Jach
I thought this as well, then added a point to "Backslash directory separators"
on my list of reasons to go back in time to kill Bill Gates...

~~~
StavrosK
Hey, none of this would have happened if the author used proper punctuation!

------
zvrba
"That has a performance impact on all platforms..." I mean, tmpfile() is about
to make AT LEAST one system call, access the disk, and do a bunch of stuff he
has no control over whatsoever, yet he's worried about one extra CALL
instruction. Talk about premature optimization... (and incompetence, if you
will).

------
chalst
The comment from SteveB in the linked-to thread was interesting:

    
    
       Most editors I have used highlight the next line in green as a comment,
       however I have come across one that took me over a year to solve, that is
       remarkably similar.
    
            //why does this next line not compile??/
            x = 0;
    
       because the ??/ operator is a trigraph meaning \
       That has never been highlighted correctly in any editor I have used.

~~~
s3b
Wait. So it should have been compiling before the comment was added. Why then
was the comment added?

~~~
kc5tja
To prove a point.

------
splicer
Compiling with gcc's -Wall option, which implies -Wcomment, would spit out
"warning: multi-line comment". Unless you have a good reason to do otherwise,
_always_ use -Wall. I recommend -Wextra and -Werror as well.

~~~
Jach
Also -ansi and -pedantic if you're aiming at the C89 standard for some reason;
it's great to look in the gcc manpage and see all the little flags that are
around. I don't often like -Werror since it can interrupt my flow. If I'm
testing runtime correctness, and something like "suggest parentheses around
assignment used as truth value" pops up when that was intentional, I don't
want to make the context switch to go back, fix it, and recompile before
context-switching again back to test whatever runtime correctness I was aiming
for. I'll fix it later. Philosophically, -Werror assumes I'm the type of
person who ignores warnings completely for the duration of a project, and it
will stop you from ignoring them in the first place even if there may be
reasons to; I don't think most people are that way.

------
exDM69
"That has a performance impact on all platforms: the direct call to tmpfile()
is now an indirect, which defeats optimization, and well, everything."

This couldn't be any more wrong. First of all, even if there were a
performance impact, it would be meaningless compared to the time it takes to
perform the system calls required for tmpfile().

Second, the compiler will inline the function call if it can. You can use a
regular "inline" function, and if you're not convinced that's good enough,
use __attribute__((always_inline)) or equivalent. Even if the damn function is
not an inline function, it will get inlined if it's in the same translation
unit as where it is called (look at the asm output of gcc -O3). In the (not so
far) future we'll be able to enjoy link-time optimization and won't have to
care about translation units w.r.t. inlining.

~~~
msftguy
Even without inlining, it will be subject to tail call elimination with any
sane compiler: my_tmpfile will just branch to _imp_tmpfile instead of
returning.

------
gfodor
I too am tired of paying the price for C's notoriously expensive function
calls.

~~~
jrockway
Especially around the otherwise speedy task of disk IO. On Windows.

------
wglb
Martin Fowler in "Refactoring" supported one of my prejudices, which is that
"comments are bugs". So we could have nipped this bug in the beginning by not
commenting here.

Then there is the question of why we are even thinking about optimization
issues around calling tmpfile. Are we doing billions of these calls, or more
on the order of tens of them? But in either case, wouldn't it be evident prior
to any optimization or measuring effort that the time taken by the OS to
actually create the file would swamp any gain from this optimization?

One might then argue that the bugs are 1) using comments and 2) attempting an
optimization that simple pre-calculation shows has no real benefit.

------
signa11
emacs unfortunately fails on the trigraph sequence "??/" and of course the
line immediately following doesn't get executed. Having AST-based syntax
highlighting would of course fix the issue. Most editors, afaik, use
hacked-together regexes which break down under various conditions.

------
JoeAltmaier
Similar bug:

    
    
      int i = 10;
      int *p = &i;
    
      int j = 100/i; // j==10 right? right
    
      int k = 100/*p; // k==10 right?
    

That last line doesn't compile.

~~~
reemrevnivek
Use spaces between arithmetic operators. You've already got them around the =
sign. It looks much better, and prevents this problem:

    
    
      int i = 10;
      int *p = &i;
    
      int j = 100 / i; // j==10 right? right
    
      int k = 100 / *p; // k==10 right? right
    

Also, any decent syntax highlighter would have caught the latter problem. The
problem in the blog post was much more subtle, because most (all?) syntax
highlighters won't catch it.

------
qntm
In addition to everything else, please don't use a negative condition in an
if/else statement.

    
    
        if(NULL != fname) {
            g = fopen(fname, "w+");
        } else {
            g = tmpfile();
        }
    

Better:

    
    
        if(NULL == fname) {
            g = tmpfile();
        } else {
            g = fopen(fname, "w+");
        }

~~~
arethuza
I'm not particularly disagreeing with the point you are making - but would it
really be asking too much for people to provide explanations of _why_ one form
is preferred over another?

~~~
qntm
Certainly.

The else block in this case is executed when "NULL != fname" is false. That's
an unnecessary double negative and takes a nonzero amount of mental effort to
unscramble. The example I gave reverses this. This would also be acceptable:

    
    
        if(fname) {
            g = fopen(fname, "w+");
        } else {
            g = tmpfile();
        }
    

(I also don't regard either statement as an "exceptional condition", they both
seem equally likely to me.)

~~~
arethuza
Thanks - I've started getting a bit grumpy about coding rules that come
without any explanation of _why_ you should do something (simply saying
something is "best practise" often being the worst "explanation" of all).

~~~
VBprogrammer
For what it's worth I couldn't disagree more. I use the guard pattern very
often (as described here <http://video.ias.edu/webfm_send/1207> [Video]).

Rather than writing:

    
    
       if (argc == 3) {
          if (fname) {
             /* Do stuff */
          } else {
             printf ("File name cannot be null");
          }
       } else {
          printf ("Wrong number of arguments");
       }
    

I would write:

    
    
       if (argc != 3) {
          printf ("Wrong number of arguments");
       } else if (!fname) {
          printf ("File name cannot be null");  
       } else {
          /* Do stuff */
       }

~~~
qntm
That has merit. Personally I usually find myself doing something more like
this:

    
    
       if (argc != 3) {
          printf ("Wrong number of arguments");
          // throw an exception or return NULL or something
       }
    
       if (!fname) {
          printf ("File name cannot be null");
          // throw an exception or return NULL or something
       }
    
       /* Do stuff */
    

The important thing, to my mind, is the absence of an else block.

~~~
VBprogrammer
Yeah, that is a pattern I occasionally use, but I prefer what I have written
above as the control flow is more explicit. A missing return (or even one
which is quite deliberately omitted) is much harder to spot.

------
5hoom
I loved the punchline, and probably wouldn't have spotted that one myself.

As others have noted, the moral is to not use C++ '//' style comments in C
code or at least hope that you are using a clever editor ;)

~~~
StavrosK
It would have produced the same bug in C++ code as well.

~~~
5hoom
I guess it would, it's just not something I've run into.

The only time I've ever seen backslashes used in C or C++ as a line-break is
in huge multi-line macros with lots of variables that have been broken up for
readability.

It never occurred to me that they could be used to make multi-line comments,
but there you are.

------
dools
Maybe this is why my C tutor used to say over and over again "never use C++
style comments even though Visual Studio lets you do it".

~~~
jmj42
My thoughts exactly. Perhaps I'm a little overly pedantic, but it annoys me
to extremes when I find C++ style comments littering up C code.

~~~
burgerbrain
What I do is turn on my compiler's warnings for C++ comments in C code, then
use C++ comments for all my TODO comments. That way I know if there are any
outstanding TODOs when I go to build.

~~~
mahmud
It's not a warning in GCC. The -ansi flag, and older standards, error out on
C++ comments, but they're still valid input in C99 mode(s).

~~~
burgerbrain
With -pedantic you can get:

    
    
      warning: C++ style comments are not allowed in ISO C90 [enabled by default]
    

Might be worth noting at this point, since I'm already halfway through the GCC
manpage again, that -Wcomment seems to be exactly for the sort of situation
the author of the article found himself in.

------
spitfire
The most stupid C bug ever? Pointers.

Or rather, value-of rather than dereference by default. Ada got this right:
dereference by default, no pointer arithmetic.

~~~
keeperofdakeys
It isn't a bug, it's a feature. C wasn't designed to abstract away the
reality of pointers, which are the basic building blocks of assembly. If you
don't like pointers, then don't use C. All programming languages have their
time and place (even something like Whitespace).

The advantage of not abstracting pointers is that it makes low-level
programming much easier. Instead of having to write your operating system
kernel in assembly, you can write it in a portable language that is much
easier to read!

