The Little C Function From Hell

scott_s · on June 19, 2012

If this sort of thing is interesting to you, the author and his colleagues wrote an excellent PLDI 2011 paper about their research which systematically checks various compilers for exactly these kinds of bugs: "Finding and Understanding Bugs in C Compilers": http://www.cs.utah.edu/~regehr/papers/pldi11-preprint.pdf

Perhaps my favorite part of that paper is the fact that there is no ground truth (section 2.6). They discover bugs by testing random programs against multiple compilers. If the result from any of the compilers disagree, then there must be a bug. (They guarantee that the inputs are legal.) In theory, it's possible that all of the compilers could be wrong in the same way, which means they wouldn't discover a bug. In practice, this is extremely unlikely. But you can't know for sure. (In practice, they never saw an instance where there were three different results from three compilers; at least two of the compilers always agreed.)

Deestan · on June 19, 2012

> They discover bugs by testing random programs against multiple compilers. If the result from any of the compilers disagree, then there must be a bug.

How does this make sense?

If the result differs from the specification, it is a bug.

If the result is unspecified in the specification, the different compilers can differ as much as they want without any of them being considered buggy.

Wilya · on June 19, 2012

From the linked paper: "Although this compiler-testing approach has been used before [6, 16, 23], Csmith’s test-generation techniques substantially advance the state of the art by generating random programs that are expressive—containing complex code using many C language features—while also ensuring that every generated program has a single interpretation. To have a unique interpretation, a program must not execute any of the 191 kinds of undefined behavior, nor depend on any of the 52 kinds of unspecified behavior, that are described in the C99 standard."

loumf · on June 19, 2012

I took "They guarantee that the inputs are legal" to mean that they limited it to programs with specified behavior. They don't know what the behavior is -- just that it is specified.

If they can do this, it finds a subset of bugs, with no false positives.

silentbicycle · on June 19, 2012

C compilers can be buggy, particularly when you start working with vendor-supplied compilers for embedded platforms. A colleague was furious when he realized that his board's compiler didn't support function pointers.

joshAg · on June 19, 2012

that does seem like a rather large omission. how did he work around that issue?

pmjordan · on June 19, 2012

I could imagine a DSP architecture that doesn't intrinsically support indirect jumps. (especially as DSPs frequently use the Harvard memory model) That would make implementing function pointers tricky. I'd probably work around this by making a set of dispatch macros that expand into a giant switch block where each case is a (static) function call. The other option would be self-modifying code, which is annoying to do, to say the least, particularly for Harvard systems.

kragen · on June 20, 2012

If your CPU supports keeping function return addresses on a stack that you can push other things onto, you can do an indirect jump by pushing the address you want to jump to and then "returning" to it. That's a lot easier than self-modifying code or massive switch statements, and just as easy on Harvard as on von Neumann architectures.

silentbicycle · on June 20, 2012

To be honest, I don't know, except that there was a lot of scowling. I think it was a PIC micro.

scott_s · on June 19, 2012

Both loumf and Wilya are correct. In support of their answers, remember that the specification does not specify the results of interesting programs. It says "if you do this, this must be the result." But if you limit yourself to only testing such simple cases, you're not going to find any interesting bugs - because such simple programs are likely to have already been tested.

cube13 · on June 19, 2012

>If the result differs from the specification, it is a bug.

A large part of the C standard is implementation defined(see acqq's post here: http://news.ycombinator.com/item?id=4131828 ), so the result could be different on multiple compilers, not a bug, and STILL completely within spec.

scott_s · on June 19, 2012

If I recall correctly, they only generated programs that had well-specified behavior. Not just legal, but specified.

MidwestMuster · on June 19, 2012

Isn't this called "fuzzing"

dllthomas · on June 19, 2012

It's certainly related. The difference here is that 1) they're comparing output of multiple systems, rather than looking for obviously erroneous behavior of one (segfaults, memory leaks, failed assertions); and 2) the input data is all correct - fuzzing (per my understanding) usually implies tossing bad data in to see if the system breaks (frequently just slightly bad data is more interesting than complete garbage, but either falls under "fuzzing").

scott_s · on June 19, 2012

It's rather like inverse fuzzing. So, zuffing.

_delirium · on June 20, 2012

This kind of coinage is a rather large rabbit hole! Once upon a time in 1963, someone asked: what happens if you take the Fourier transform of a Fourier transform? Well, a Fourier transform gives you a spectrum, so let's call a Fourier transform of that, a new concept called a cepstrum. So what are its bins, analogous to frequency bins? Let's call them quefrency bins, and the cepstrum is therefore a quefrency cepstrum. What's the operation when you modify quefrencies in the cepstrum in some manner other than uniformly, analogous to how one might run a frequency spectrum through a frequency-domain filter? Why, liftering, of course.

andrewcooke · on June 20, 2012

in case anyone else is confused - it (a cepstrum) is the ft of the log of the modulus of an ft. the ft of an ft is the original signal. https://en.wikipedia.org/wiki/Cepstrum

raverbashing · on June 19, 2012

A little C tip I learned from hard experience

NEVER, EVER, NOT IN A MILLION YEARS use a signed int/char etc, unless you are 200% certain you're doing the right thing (that is, it's for something specific you need it)

You WILL have problems, period.

"Oh it's just a matter of knowing the C spec" then please go ahead as I grab the popcorn.

ajross · on June 19, 2012

"For something specific you need it" meaning ... a negative number, like an array or memory address offset? I mean, sure, I agree that you should be doing anything sensitive to 2's complement behavior on unsigned quantities. And if you know the values are positive-definite, unsigned has much cleaner behavior. And I'd even entertain the argument that unsigned should have been the default in the language spec and that signed quantities are the ones that should have specially declared types.

But... your advice as written is just insane. They are real, and required routinely. You can't just apply this as a "for dummies" rule without essentially ruling out half of the code they'll need to write.

raverbashing · on June 19, 2012

". a negative number, like an array or memory address offset? "

A negative number yes, but not really a memory offset (you shouldn't mix negative numbers and memory offset, really)

But yeah, if you're doing "math" go for it, but it's on rare occasions where you need negative numbers (subtraction yes)

The most common case I remember may be sound samples, where you have signed chars.

For all other cases you would be using floating point or decimal numbers

Natsu · on June 19, 2012

> For all other cases you would be using floating point or decimal numbers

If I'm trying to avoid mathematical anomalies, floating point is not what I would run to... "Equal" is a matter of degree, you have to be careful with anything near zero and you can't carelessly mix together numbers that are a few orders of magnitude different than each other.

raverbashing · on June 20, 2012

Absolutely

But for most of "math" you would go for floating point. You won't reinvent some fixed point math using integers just because...

tedunangst · on June 19, 2012

signed usually promotes to unsigned in nice ways, such that if you really want to store -1 in your unsigned, it will just work. I've found using all unsigned types, with the occasional signed value jammed into them, is less error prone than mixing and matching types throughout the program. ymmv, and of course, here be the odd dragon or two.

lawnchair_larry · on June 20, 2012

My goodness, no, this is terrible advice. Never do this. Go fix all the code you wrote immediately. It is full of security vulnerabilities. I'm not kidding, this is so bad.

cperciva · on June 20, 2012

I'm with the OpenBSD developer on this one.

lawnchair_larry · on June 21, 2012

I'm pretty confused. I didn't know who Ted was, but a quick google search shows he is an OpenBSD dev and worked for Coverity. Coverity itself will flag this error. Now you are backing up that position. Historically, this exact thing has been the cause of many security vulnerabilities. It's especially precarious with the POSIX API due to many functions returning negative for error. I recall OpenSSH making sweeping changes to rid the code of signed ints being used for error, for this reason.

Can you explain why you would advocate this? Am I misunderstanding you, or missing something?

I replied to the other comment in this thread with an openbsd vulnerability caused by doing what is being advocated (I did choose openbsd to be funny).

cperciva · on June 21, 2012

Looks to me like the vulnerability you linked to demonstrates the exact opposite of what you think it demonstrates: The problem is that an unsigned value (a length) was being stored in a signed integer type, allowing it to pass a validation check by masquerading as a negative value.

lawnchair_larry · on June 23, 2012

Well, no, select() takes a signed value for the length (it is actually not a length, but the number of descriptors, later used to derive a length), and there is no changing that interface obviously. This is the source of the "negative value" in this example. The problem arises because internally, openbsd "jammed a negative value into an unsigned int", as Ted put it, and made it a very large positive value, leading to an overflow.

cperciva · on June 24, 2012

If the bounds check was performed after casting to unsigned, there would have been no problem. The vulnerability occurred because a bounds check was incorrectly performed on a signed value.

rwallace · on June 20, 2012

Can you give some examples of why you think this is more prone to security vulnerabilities than using signed types?

lawnchair_larry · on June 21, 2012

http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2002-142...

rwallace · on June 22, 2012

Ah, thanks. From reading the summary, it seems that case would have been prevented by using signed integers throughout, but would also have been prevented by using unsigned integers throughout?

barrkel · on June 19, 2012

I've seen more problems from unsigned ints than signed ints (in particular, people doing things like comparing with n-1 in loop boundary conditions). There's a reason Java, C# etc. default to signed integers. Unsigned chars, I have no quibble (and Java, C#, use an unsigned byte here).

rcfox · on June 19, 2012

Unsigned integer overflow has defined behaviour in C, while signed overflow doesn't. Is it really better to protect people from a simple logical error by exposing them to possible undefined behaviour?

With signed integers, you'll run into the same problem with comparing to n+1 at INT_MAX or n-1 at INT_MIN.

barrkel · on June 19, 2012

0 is a really common value for an integer variable in programs. INT_MAX and INT_MIN are not.

It's just my experience. Don't get too wound up about it ;)

raverbashing · on June 19, 2012

"(in particular, people doing things like comparing with n-1 in loop boundary conditions"

There's your problem: that's like saying cars are more dangerous than motorcycles because your finger can get squeezed by the door.

"There's a reason Java, C# etc. default to signed integer"

Legacy? And in Java/C#, and you usually use ints, not so much chars, shorts, etc and casts are probably more picky

I stand by my point, you should only use signed if you know what you're doing and for a specific use only (like math)

barrkel · on June 19, 2012

Signed arithmetic is generally only problematic when you hit the upper or lower bound. The right answer is almost never to use unsigned; instead, it's to use a wider signed type.

It's far too easy to get things wrong when you add unsigned integers into the mix; ever compare a size_t with a ptrdiff_t? Comes up all the time when you're working with resizable buffers, arrays, etc.

And no, Java did not choose signed by default because of legacy. http://www.gotw.ca/publications/c_family_interview.htm:

"Gosling: For me as a language designer, which I don't really count myself as these days, what "simple" really ended up meaning was could I expect J. Random Developer to hold the spec in his head. That definition says that, for instance, Java isn't -- and in fact a lot of these languages end up with a lot of corner cases, things that nobody really understands. Quiz any C developer about unsigned, and pretty soon you discover that almost no C developers actually understand what goes on with unsigned, what unsigned arithmetic is. Things like that made C complex. The language part of Java is, I think, pretty simple. The libraries you have to look up."

Unsigned is useful in a handful of situations: when on a 32-bit machine dealing with >2GB of address space; bit twiddling where you don't want any sign extension interference; and hardware / network / file protocols and formats where things are defined as unsigned quantities. But most of the time, it's more trouble than it's worth.

raverbashing · on June 19, 2012

"The right answer is almost never to use unsigned; instead, it's to use a wider signed type.""

Depends on the case of course, but yes, you'll rarely hit the limits in an int (in a char, all the time)

"It's far too easy to get things wrong when you add unsigned integers"

I disagree, it's very easy to get things wrong when dealing with signed

Why? For example, (x+1) < x is never true on unsigned ints. Now, think x may be a user provided value. See where this is going? Integer overflow exploit

Edit: stupid me, of course x+1 < x can be true on unsigned. But unsigned makes it easier (because you don't need to test for x < 0)

"what unsigned arithmetic is"

This is computing 101, really (well, signed arithmetic as well). Then you have people who don't know what signed or unsigned is developing code. Sure, signed is more natural, but the limits are there, and then you end up with people who don't get why the sum of two positive numbers is a negative one.

barrkel · on June 19, 2012

As you admit, you just made a mistake in an assertion about unsigned arithmetic. You're not very convincing! ;)

raverbashing · on June 20, 2012

As I said, if I'm not convincing please go ahead and used signed numbers as I get the popcorn ;)

Here's something you can try: resize a picture (a pure bitmap), with antialiasing, in a very slow machine (think 300MHz VIA x86). Difficulty: without libraries.

jbooth · on June 19, 2012

Java actually uses a signed byte type, which was I guess to keep a common theme along with all the other types, but in practice leads to a lot of b & 0xFF, upcasting to int etc when dealing with bytes coming from streams.

Roboprog · on June 19, 2012

There's a certain aspect of "functions have domains, not just ranges" at work here as well -- e.g. - restricting the (math) tan() function to the domain of -90 to 90 degrees (exclusive), unless you really get off on watching it cycle over madness. If you are going to be playing around the edges of something, it behooves you to put some kind of pre-condition in with an assert of similar mechanism.

In fairness, I guess a function like this is a good example of why you should put in preconditions, as well as a good demonstration that "not all the world is a VAX" (nor MS C 7, nor GCC version N) :-)

duaneb · on June 20, 2012

Actually, a colleague recently convinced me to start using signed ints for, e.g., for loops instead of unsigned ints. His reasoning was that if you're overflowing with signed integers, you'll probably overflow with unsigned integers too (we work with very large numbers), but it's easier to notice if you have a negative number rather than silently wrapping around to a still-valid value.

cperciva · on June 20, 2012

99% of the time, you should not be using any sort of int for a loop variable. Loop variables are almost always array indexes, and array indexes should be size_t.

cpeterso · on June 20, 2012

Google's C++ style guide strongly recommends using signed ints over unsigned ints:

https://google-styleguide.googlecode.com/svn/trunk/cppguide....

regularfry · on June 19, 2012

It's not just the C spec you've got to watch. I saw a wonderful bug last week where the author hadn't spotted that write(2) returned a ssize_t, not a size_t (or didn't appreciate the difference), so was gleefully checking an unsigned variable for a -1 error result.

nitrogen · on June 19, 2012

How did the bug manifest itself? You can store 0xffffffff(ffffffff) in a 32(64)-bit unsigned int, or a 32(64)-bit signed int. In the one case you'll get UINT_MAX, -1 in the other, but they should compare as equal. If you have -Wextra turned on, gcc will give a warning, though.

Here's some sample C code tested on a 32-bit Linux system:

  #include <stdio.h>
  
  int main(int argc, char *argv[])
  {
          unsigned int val1 = 0xffffffff;
  
          printf("val1 == -1: %d\n", val1 == -1);
  
          return 0;
  }

The result:

  val1 == -1: 1

mikeash · on June 20, 2012

It works for == -1, but not for < 0, which is a common way to check for error returns from UNIX calls.

Any decent compiler should warn that such a check is always false, but people don't always pay attention to that stuff....

nitrogen · on June 20, 2012

C is my absolute favorite language, and as such, I learned a long time ago to pay very close attention to compiler warnings and Valgrind memory errors.

mikeash · on June 20, 2012

Sadly, a lot of people have never learned this valuable lesson, and happily build their code with hundreds or even thousands of warnings.

regularfry · on June 20, 2012

Got it in one :-)

timbre · on June 19, 2012

I'm rusty on C, and confused. What should I use instead of signed int?

rcfox · on June 19, 2012

unsigned int :)

bvdbijl · on June 19, 2012

and for negative numbers?

rcfox · on June 19, 2012

In that case, you can be pretty certain that you need a signed integer.

esrauch · on June 20, 2012

What if you only need to store negative integers?

duaneb · on June 20, 2012

Isn't that the same situation as only storing signed integers? The arithmetic is just a little different.

ralph · on June 20, 2012

No, you get room for one more negative int. x = x < 0 ? -x : x; is a common buggy abs(), e.g. when printing an integer. One should test if it's negative, perhaps print the '-', and then ensure it's negative for the rest of the per-digit code.

acqq · on June 19, 2012

Before starting reading everything note that author assumed when writing the article that (provided sizeof( int ) > sizeof( char )):

    char c = (char)128;
    printf( "%d\n", c );

should always be -128, whereas his commenter Mans (March 4, 2011 at 4:15 am) points out that this conversion is implementation-deﬁned according to the standard, that is, compiler authors are free to decide what the result of such conversion should be.

dchest · on June 19, 2012

Should the implementation-defined behavior be always the same? That is,

    char a = (char)128;
    char b = (char)128;

Will a == b for every implementation?

lmm · on June 19, 2012

Yes; that's the distinction between "implementation-defined" and "undefined" behaviour.

beagle3 · on June 19, 2012

While that's how it is generally understood, there is nothing stopping the implementation from defining the behaviour as:

    conversion is rounded up to the nearest multiple of 20 on odd lines. 
    conversion is rounded down to the nearest multiple of 17 on even lines.

Older versions of gcc would have such a (fully standard compliant) behavior when you used #pragma, that included running rogue or nethack and other stuff -- but later versions actually succumbed to implementing useful pragmas.

dllthomas · on June 19, 2012

That depends on what you mean by "should". Is it required by the standard? AFAIK, no. Would it be a braindead decision on the part of the compiler writers to do otherwise? Almost certainly. In practice, the answer to

> Will a == b for every implementation?

is, I suspect, yes.

TazeTSchnitzel · on June 19, 2012

I'd be surprised if it's non-deterministic, so yes.

caf · on June 20, 2012

It's true, but it doesn't affect the analysis. When x is CHAR_MAX and char is signed, the result of ++x is an implementation-defined value with type char - but it doesn't matter what this implementation-defined value is, it must always be less than or equal to CHAR_MAX, so ++x > CHAR_MAX is false no matter what the resolution of the implementation-defined behaviour is.

acqq · on June 20, 2012

If the ++x is performed using int, there must be a downcast to store it in the char x if you even need to store it (and in the given function you don't), but nowhere I see that the standard insists that that very int can't be used in the comparison, so 128 > 127 is OK. Note that the right side of comparison can be int from the start, even if the right side can fit the char. The only thing wrong was the older clang.

So the whole big article can be shortended just to: "clang in 2.7 had a bug but newer version doesn't, all other compilers are OK."

caf · on June 20, 2012

The standard specifies that ++x is equivalent to x+=1, and that the value of an assignment expression like x+=1 is "the value of the left operand after the assignment" (and that "the type of an assignment expression is the type of the left operand").

So, since x is of type char, the expression ++x has type char and its value is the value of x after the assignment, so it must be in the range CHAR_MIN to CHAR_MAX.

acqq · on June 20, 2012

You miss that ++x is not an "x" it's an expression for which the op also writes "The usual arithmetic conversions ensure that two operands to a '+' operator of type signed char are both promoted to signed int before the addition is performed" and I believe that that can be found in the text of the standard too. The underlying idea is to allow most of the run-time calculations to be performed with the "native" most efficient type, which is int. I still claim it's implementation dependent and allowable to evaluate to "128 > 127". I won't discuss this more, I accept that you have a different opinion.

caf · on June 20, 2012

The usual arithmetic conversions do cause the x and the 1 in the implicit x+=1 to be promoted to int before the addition, and the addition is performed in an int. However, since the lvalue that is the left hand operand of the assignment has type char, the result is converted back to char (which, if char is signed and the result is not in the range of char gives an implementation-defined result) and stored in the object designated by the lvalue. The result of the expression is this converted char value.

The important point here is that the result of an assignment expression (including pre- and post- increment and decrement) is the value that was stored in the object that was assigned to.

You can readily check this by examining the value of sizeof ++x (where x has type char).

Dove · on June 19, 2012

You know, not long ago I took somebody's "how well do you know C?" quizzes, and it was fully of this sort of question -- what C does with overflows and underflows in various circumstances. And I must admit, I felt like I had been asked what happens, exactly and specifically, to a particular piece of memory after dividing by zero. "I don't know, I try to avoid doing that!"

I don't know. I can admire the analysis, but I don't understand the motive. Do people really write code that relies on this sort of behavior? Or is it just trivia for trivia's sake?

pmjordan · on June 19, 2012

C's type system is a minefield. (and by extension, C++'s and Objective-C's) I've been bitten so many times by implicit signed/unsigned conversions and int-promotion that I'm now practically paranoid about the whole thing. Chars are promoted to int practically anywhere you use them for anything other than just copying, so if the compilers produce faulty code in those situations, there's no way you can win.

Oh, and speaking of chars: Objective-C's BOOL is really just a char. Yes, it's signed, and yes it gets int-promoted a lot. I dread to think how many bugs are lurking out there in Objective-C code because of that. I wonder if you could catch some of those by comparing the code generated by compiling with the usual BOOL = char typedef, and the same code but with BOOL typedef'd to _Bool (a real boolean type).

rcfox · on June 19, 2012

If it was a week or two ago, the quiz was probably from the same person as this article.

He's not just doing it for fun; he's a professor at the University of Utah, and he's researching this area, looking for bugs in compilers. In fact, he's developed a tool for this: http://embed.cs.utah.edu/csmith/

These tiny bits of strange code are condensed versions of what you might see in the wild, especially after preprocessing.

Nobody's doing ++x > y, but they do something that looks reasonable like foo(x) > bar(x), where foo() and bar() return chars.

nitrogen · on June 19, 2012

Nobody's doing ++x > y, but they do something that looks reasonable like foo(x) > bar(x), where foo() and bar() return chars.

I might write something like "++x > y"; preincrement followed by comparison is a common operation.

pgeorgi · on June 19, 2012

It's interesting when trying to do automated code analysis, and C is an important target for that, since it's used everywhere. But computer programs have a hard time to know when type limits are important and when they're not (or in fact not anticipated by the developer) - so they have to cover all cases.

(IMHO) For obscure cases, there's ideally some clearer version of the same behaviour that could be recommended to the developer - either helping them to find a potential error or to have them use a less ambiguous and easier to understand notation.

jlarocco · on June 20, 2012

Nobody purposely writes code that relies on that sort of behavior.

Much more likely is some code relying on it without realizing it, and getting "random" bugs for some input values.

egonschiele · on June 19, 2012

I do similar deep digging for some topics. It may seem useless, and a lot of times it is. But sometimes digging into a problem helps me understand the bigger picture better, and then it seems worthwhile. Lately I've become more picky about what I'll dig into though.

simcop2387 · on June 19, 2012

It's most likely this kind of code is encountered on embedded systems where you end up with smaller integers and things. This particular case I don't know if I've ever done it though.

sltkr · on June 19, 2012

To be absolutely accurate, the program still invokes implementation-defined behaviour. From the C standard on casting integers: “if the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised”.

Therefore, the author's conclusion that “the behavior is well-defined and every correct compiler must emit this output” is plain wrong. A correct compiler might emit a signal instead of outputting anything.

(However, printing 1 for the last case is still wrong, because there is no possible way for ++x to yield a value greater than INT_MAX, so this cannot be consistent with any implementation-defined behaviour.)

sltkr · on June 19, 2012

^ In the above, I meant CHAR_MAX instead of INT_MAX. Oops!

dchest · on June 19, 2012

Just tried with Plan 9-derived compilers shipped with Go, and there's no bug (provided that my re-implementation of the test case is correct).

Arcticus · on June 19, 2012

Dan Saks did a great set of presentations called "C and C++ Gotchas" on these types of things at the 2006 Embedded Systems Conference in San Jose.

Sorry couldn't find a link that wasn't behind a paywall but here is one for reference.

http://eetimes.com/electrical-engineers/education-training/t...

WalterBright · on June 19, 2012

The Digital Mars C compiler returns the correct answer with or without optimization turned on.

krollew · on June 20, 2012

"That’s a lot of trouble being caused by a two-line function. C may be a small language, but it’s not a simple one." I disagree. C is very simple. You need to know C is not suppoused to work the same on platform and I guess behaviour you have tested is not defined by standard. I think every good C programmer know when he have to be causious because behaviour may be platform dependent. In matter of fact you did as well. I guess case you've studdied is not common to be used in real code. If so, there is very easy solution:

  #if ((char)CHAR_MAX) + 1 > ((char)CHAR_MAX)
   /* some code here */
  #else
   /* some code here */
  #endif

There is no problem to me.

cperciva · on June 20, 2012

Every problem has a solution which is simple, elegant, and doesn't work. This is it.

Section 6.10.1, paragraph 4: "... For the purposes of this token conversion and evaluation, all signed integer types and all unsigned integer types act as if they have the same represenation as, respectively, the types intmax_t and uintmax_t..."

Inside preprocessor directives, your chars aren't chars any more.

krollew · on June 21, 2012

So what they are? Why it doesn't work? Your quote don't clarify.

cperciva · on June 21, 2012

Your "char" in a preprocessor directive is either a uintmax_t or an intmax_t. Either way, it's going to end up as #if 128 > 127 or #if 256 > 255 -- so the first case will always end up being included.

krollew · on June 22, 2012

It's standard behaviour or one compiler does so? Outside preprocessor char has another meaning? Thanks anyway.

cperciva · on June 22, 2012

Outside of the preprocessor, the char type is a one-byte integer (whether it's signed or not is implementation-defined).

krollew · on June 22, 2012

So the good solution is make a test that finds it out, for example in configuration script and set proper preprocessor constant and test that constant instead.

DanielBMarkham · on June 19, 2012

I confess to just scanning the code and being cold on C right now, but isn't ++x a post increment? That is, it occurs after the rest of the expression has been evaluated? (as opposed to x++, which is a pre-increment) Just guessing that the perhaps the issue is when the actual overflow is occurring.

acqq · on June 19, 2012

Mnemonic: read as when you would read the text, left to right:

   ++x increment than use the value x. 
   x++ use the value x then increment.

vhf · on June 19, 2012

It is the opposite, like in many other languages. e.g.

  int i = 0;
  printf("%i %i", i++, ++i); // prints "0 2"

Same goes in C, C++, Java, PHP, ...

[EDIT] Turns out this is a bad example, as "the order in which function arguments are evaluated is undefined" (cf. below) Correct is :

  int i = 0;
  printf("%i", i++); // prints 0
  printf("%i", ++i); // prints 2

decklin · on June 19, 2012

Actually, the order in which function arguments are evaluated is undefined. http://c-faq.com/expr/comma.html

You can use a comma in other expressions to introduce a sequence point: http://c-faq.com/~scs/cgi-bin/faqcat.cgi?sec=expr#seqpoints

vhf · on June 19, 2012

Thanks for your clever comment and for those emitting similar concerns.

You surely are right. I wanted to give a quick example, turns out it was a bad one. Next time I'll write :

  int i = 0;
  printf("%i", i++); // prints 0
  printf("%i", ++i); // prints 2

hythloday · on June 19, 2012

turns out it was a bad one

Given that you were making a point on an article about the complexity of C, I'd say it was an unintentionally excellent example.

sirclueless · on June 19, 2012

If you really want to make your brain hurt, there was an article on HN a while ago about the following statement:

    i = i++;

Evaluating that expression takes a real dive into the guts of the C spec.

simias · on June 19, 2012

I believe you trigger an undefined behaviour there. You modify i twice in the same statement.

jim_kaiser · on June 19, 2012

The standard says, evaluation order of function arguments is undefined.. but in general, printf evaluates right to left. This was a common puzzle as I remember.

excuse-me · on June 19, 2012

Is that defined behaviour?

Is there any rule saying that args to a function have to be evaluated in a particular order - ie is ',' a function point?

CodeMage · on June 19, 2012

It's precisely the other way around: ++x is a pre-increment and x++ is a post-increment.

dag11 · on June 19, 2012

++x evaluates to x's new (incremented) value.

x++ evaluates to x's old (pre-incremented) value.

loeg · on June 19, 2012

Nope, you've got the two backwards.

TwoBit · on June 19, 2012

Summary: ++x for signed char is supposed to convert x to int before the ++, but some compilers get it wrong.

robot · on June 20, 2012

The user must know about overflows and act accordingly. Compiler behavior may naturally change based on optimisations since it is undefined territory.

malkia · on June 19, 2012

That's undefined behavior, the compiler can do whatever it pleases to. It can even print 666, and be done.

AceJohnny2 · on June 19, 2012

No it's not. I had the same thought, but the author carefully points out from the standard: "The usual arithmetic conversions ensure that two operands to a “+” operator of type signed char are both promoted to signed int before the addition is performed."

He does assume sizeof(int) > sizeof(char), which is true on all platforms he has tried. It would be undefined on an AVR or other microcontroller where sizeof(int) == sizeof(char) though.

nova · on June 19, 2012

undefined on an AVR or other microcontroller where sizeof(int) == sizeof(char) though.

Just noting that ints are 16 bits wide in AVR-GCC, unless you use the -mint8 option which violates C standards.

JoachimSchipper · on June 19, 2012

Absolutely true. Note, though, that sizeof(int) == sizeof(char) is allowed if you have 16-bit chars.

cperciva · on June 20, 2012

16-bit chars aren't allowed in modern versions of C -- CHAR_BIT is required to be 8.

JoachimSchipper · on June 20, 2012

Thanks for the update! (For the curious: C99 or POSIX both require CHAR_BIT == 8.)

pascal_cuoq · on June 20, 2012

C99 only requires CHAR_BIT to be at least 8 (5.2.4.2.1:1). POSIX requires it to be exactly 8.

cperciva · on June 20, 2012

Oops, quite right. I missed the "equal or greater in magnitude" line when I was reading that section.

malkia · on June 19, 2012

Thanks! Should read the article more carefully.