Perhaps my favorite part of that paper is the fact that there is no ground truth (section 2.6). They discover bugs by testing random programs against multiple compilers. If the results from the compilers disagree, then there must be a bug. (They guarantee that the inputs are legal.) In theory, it's possible that all of the compilers could be wrong in the same way, which means they wouldn't discover a bug. In practice, this is extremely unlikely. But you can't know for sure. (In practice, they never saw an instance where there were three different results from three compilers; at least two of the compilers always agreed.)
How does this make sense?
If the result differs from the specification, it is a bug.
If the result is unspecified in the specification, the different compilers can differ as much as they want without any of them being considered buggy.
Since they can restrict tests to fully specified behaviour, the approach finds a subset of bugs, with no false positives.
A large part of the C standard is implementation-defined (see acqq's post here: http://news.ycombinator.com/item?id=4131828 ), so the result could be different on multiple compilers, not be a bug, and STILL be completely within spec.
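For instance, whether plain char is signed is one such implementation-defined choice; this tiny program (a minimal sketch) can legitimately print different results under different conforming compilers:

    #include <stdio.h>

    int main(void) {
        /* Whether plain char is signed is implementation-defined:
           a conforming compiler may print -1 (signed char) or 255
           (unsigned char), and neither is buggy. */
        char c = (char)0xFF;
        printf("%d\n", c);
        return 0;
    }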
NEVER, EVER, NOT IN A MILLION YEARS use a signed int/char etc., unless you are 200% certain you're doing the right thing (that is, you need it for something specific)
You WILL have problems, period.
"Oh it's just a matter of knowing the C spec" then please go ahead as I grab the popcorn.
But... your advice as written is just insane. Signed types are real, and required routinely. You can't just apply this as a "for dummies" rule without essentially ruling out half of the code they'll need to write.
A negative number, yes, but not really a memory offset (you shouldn't mix negative numbers and memory offsets, really)
But yeah, if you're doing "math", go for it; it's only on rare occasions that you need negative numbers (subtraction, yes)
The most common case I remember may be sound samples, where you have signed chars.
For all other cases you would be using floating point or decimal numbers
If I'm trying to avoid mathematical anomalies, floating point is not what I would run to... "Equal" is a matter of degree; you have to be careful with anything near zero, and you can't carelessly mix together numbers that are a few orders of magnitude apart.
But for most of "math" you would go for floating point. You won't reinvent some fixed point math using integers just because...
Can you explain why you would advocate this? Am I misunderstanding you, or missing something?
I replied to the other comment in this thread with an OpenBSD vulnerability caused by doing what is being advocated (I did choose OpenBSD to be funny).
With signed integers, you'll run into the same problem with comparing to n+1 at INT_MAX or n-1 at INT_MIN.
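And it's worse than a wrong answer: signed overflow is undefined behaviour, so the compiler may assume it never happens. A minimal sketch (always_greater is a hypothetical name):

    #include <limits.h>
    #include <stdio.h>

    /* n + 1 overflows when n == INT_MAX, which is undefined behaviour;
       many optimizers assume that never happens and compile this
       function to a constant 1. */
    int always_greater(int n) {
        return n + 1 > n;
    }

    int main(void) {
        printf("%d\n", always_greater(INT_MAX));  /* often still prints 1 */
        return 0;
    }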
It's just my experience. Don't get too wound up about it ;)
There's your problem: that's like saying cars are more dangerous than motorcycles because your finger can get squeezed by the door.
"There's a reason Java, C# etc. default to signed integer"
Legacy? And in Java/C# you usually use ints, not so much chars, shorts, etc., and casts are probably more picky
I stand by my point, you should only use signed if you know what you're doing and for a specific use only (like math)
It's far too easy to get things wrong when you add unsigned integers into the mix; ever compare a size_t with a ptrdiff_t? Comes up all the time when you're working with resizable buffers, arrays, etc.
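A minimal sketch of that trap (the values are made up, and it assumes the usual platforms where the two types have the same width):

    #include <stddef.h>
    #include <stdio.h>

    int main(void) {
        size_t    len = 10;  /* unsigned: buffer length */
        ptrdiff_t pos = -1;  /* signed: e.g. a "not found" sentinel */

        /* The usual arithmetic conversions turn pos into a size_t,
           so -1 becomes a huge value and the comparison is false. */
        if (pos < len)
            puts("in range");        /* not printed */
        else
            puts("out of range?!");  /* printed */
        return 0;
    }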
And no, Java did not choose signed by default because of legacy. http://www.gotw.ca/publications/c_family_interview.htm:
"Gosling: For me as a language designer, which I don't really count myself as these days, what "simple" really ended up meaning was could I expect J. Random Developer to hold the spec in his head. That definition says that, for instance, Java isn't -- and in fact a lot of these languages end up with a lot of corner cases, things that nobody really understands. Quiz any C developer about unsigned, and pretty soon you discover that almost no C developers actually understand what goes on with unsigned, what unsigned arithmetic is. Things like that made C complex. The language part of Java is, I think, pretty simple. The libraries you have to look up."
Unsigned is useful in a handful of situations: when on a 32-bit machine dealing with >2GB of address space; bit twiddling where you don't want any sign extension interference; and hardware / network / file protocols and formats where things are defined as unsigned quantities. But most of the time, it's more trouble than it's worth.
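On the bit-twiddling point, a minimal sketch assuming a typical 32-bit two's-complement int:

    #include <stdio.h>

    int main(void) {
        int          si = -16;
        unsigned int ui = 0xFFFFFFF0u;  /* same bit pattern, 32-bit int */

        /* Right-shifting a negative signed value is implementation-defined
           (usually arithmetic, dragging copies of the sign bit in);
           unsigned right shifts are always logical, shifting zeros in. */
        printf("%d\n", si >> 2);  /* typically -4 */
        printf("%u\n", ui >> 2);  /* always 1073741820 (0x3FFFFFFC) */
        return 0;
    }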
Depends on the case of course, but yes, you'll rarely hit the limits in an int (in a char, all the time)
"It's far too easy to get things wrong when you add unsigned integers"
I disagree, it's very easy to get things wrong when dealing with signed
Why? For example, (x+1) < x is never true on unsigned ints. Now, think x may be a user provided value. See where this is going? Integer overflow exploit
Edit: stupid me, of course x+1 < x can be true on unsigned. But unsigned makes it easier (because you don't need to test for x < 0)
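That's the usual well-defined unsigned idiom; a minimal sketch (add_checked is a hypothetical name):

    #include <limits.h>
    #include <stdio.h>

    /* Unsigned wraparound is well defined, so the check below is a
       legal overflow test that can't be optimized away. */
    unsigned add_checked(unsigned x, unsigned y) {
        if (x + y < x) {  /* true exactly when the sum wrapped */
            fputs("overflow\n", stderr);
            return UINT_MAX;
        }
        return x + y;
    }

    int main(void) {
        printf("%u\n", add_checked(10u, 20u));      /* 30 */
        printf("%u\n", add_checked(UINT_MAX, 1u));  /* overflow, UINT_MAX */
        return 0;
    }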
"what unsigned arithmetic is"
This is computing 101, really (well, signed arithmetic as well). Then you have people who don't know what signed or unsigned is developing code. Sure, signed is more natural, but the limits are there, and then you end up with people who don't get why the sum of two positive numbers is a negative one.
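A minimal sketch of that surprise, using 16-bit values so nothing here is undefined (the out-of-range conversion back to int16_t is implementation-defined, typically two's-complement wraparound):

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        int16_t a = 30000, b = 10000;
        /* a + b is computed in int (40000), then converted back to
           int16_t; on typical two's-complement systems the result
           wraps to -25536: two positive numbers, negative sum. */
        int16_t sum = (int16_t)(a + b);
        printf("%d\n", sum);
        return 0;
    }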
Here's something you can try: resize a picture (a pure bitmap), with antialiasing, in a very slow machine (think 300MHz VIA x86). Difficulty: without libraries.
In fairness, I guess a function like this is a good example of why you should put in preconditions, as well as a good demonstration that "not all the world is a VAX" (nor MS C 7, nor GCC version N) :-)
Here's some sample C code tested on a 32-bit Linux system:
    #include <stdio.h>

    int main(void) {
        unsigned int val1 = 0xffffffff;
        /* -1 is converted to unsigned int (UINT_MAX), so this is true */
        printf("val1 == -1: %d\n", val1 == -1);
    }

Output:

    val1 == -1: 1
Any decent compiler can warn about signed/unsigned comparisons like that, but people don't always pay attention to that stuff....
    char c = (char)128;  /* out of range for a signed 8-bit char: result is implementation-defined */
    printf( "%d\n", c ); /* typically prints -128 on two's-complement systems */
    char a = (char)128;
    char b = (char)128;

Suppose a perverse but conforming implementation documents that the conversion is rounded up to the nearest multiple of 20 on odd lines and rounded down to the nearest multiple of 17 on even lines; then a != b.
> Will a == b for every implementation?
The answer, I suspect, is yes.
So the whole big article can be shortened to just: "clang 2.7 had a bug, but newer versions don't; all other compilers are OK."
So, since x is of type char, the expression ++x has type char and its value is the value of x after the assignment, so it must be in the range CHAR_MIN to CHAR_MAX.
The important point here is that the result of an assignment expression (including pre- and post- increment and decrement) is the value that was stored in the object that was assigned to.
You can readily check this by examining the value of sizeof ++x (where x has type char).
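A minimal sketch of that check:

    #include <stdio.h>

    int main(void) {
        char x = 0;
        /* sizeof does not evaluate its operand (x is never incremented);
           it reports the size of the expression's type, and ++x has
           type char, so this prints 1 rather than sizeof(int). */
        printf("%zu %zu\n", sizeof ++x, sizeof(int));
        return 0;
    }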
I don't know. I can admire the analysis, but I don't understand the motive. Do people really write code that relies on this sort of behavior? Or is it just trivia for trivia's sake?
Oh, and speaking of chars: Objective-C's BOOL is really just a char. Yes, it's signed, and yes it gets int-promoted a lot. I dread to think how many bugs are lurking out there in Objective-C code because of that. I wonder if you could catch some of those by comparing the code generated by compiling with the usual BOOL = char typedef, and the same code but with BOOL typedef'd to _Bool (a real boolean type).
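A minimal C sketch of the kind of bug that invites, using the historical signed-char typedef (the values are made up):

    #include <stdio.h>

    typedef signed char BOOL;  /* Objective-C's historical definition */
    #define YES ((BOOL)1)

    int main(void) {
        /* Converting 256 to signed char typically yields 0, so a
           "true" result whose low byte is zero tests as false. */
        BOOL found = (BOOL)256;
        printf("found: %s\n", found ? "YES" : "NO");  /* prints NO */

        /* And any nonzero value other than 1 breaks == YES tests. */
        BOOL flag = 2;
        printf("flag == YES: %d\n", flag == YES);  /* prints 0 */
        return 0;
    }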
He's not just doing it for fun; he's a professor at the University of Utah, and he's researching this area, looking for bugs in compilers. In fact, he's developed a tool for this: http://embed.cs.utah.edu/csmith/
These tiny bits of strange code are condensed versions of what you might see in the wild, especially after preprocessing.
Nobody's doing ++x > y, but they do something that looks reasonable like foo(x) > bar(x), where foo() and bar() return chars.
I might write something like "++x > y"; preincrement followed by comparison is a common operation.
(IMHO) For obscure cases, there's ideally some clearer version of the same behaviour that could be recommended to the developer, either helping them find a potential error or nudging them toward a less ambiguous, easier-to-understand notation.
Much more likely is some code relying on it without realizing it, and getting "random" bugs for some input values.
Therefore, the author's conclusion that “the behavior is well-defined and every correct compiler must emit this output” is plain wrong. A correct compiler might raise a signal instead of outputting anything.
(However, printing 1 for the last case is still wrong, because there is no possible way for ++x to yield a value greater than INT_MAX, so this cannot be consistent with any implementation-defined behaviour.)
Sorry, couldn't find a link that wasn't behind a paywall, but here is one for reference.
    /* note: casts like (char) are meaningless in #if -- remaining
       identifiers are replaced by 0 -- so the test is written without them */
    #if CHAR_MAX + 1 > CHAR_MAX  /* always true: evaluated in intmax_t */
    /* some code here */
    #else
    /* some code here */
    #endif
Section 6.10.1, paragraph 4: "... For the purposes of this token conversion and evaluation, all signed integer types and all unsigned integer types act as if they have the same representation as, respectively, the types intmax_t and uintmax_t..."
Inside preprocessor directives, your chars aren't chars any more.
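A minimal runnable sketch of that effect:

    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        /* Preprocessor arithmetic happens in intmax_t, so CHAR_MAX + 1
           never wraps here, regardless of what a real char would do. */
    #if CHAR_MAX + 1 > CHAR_MAX
        puts("#if branch: no wraparound in the preprocessor");
    #else
        puts("never taken");
    #endif
        return 0;
    }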
++x: increment, then use the value of x.
x++: use the value of x, then increment.
int i = 0;
printf("%i %i", i++, ++i); // prints "0 2"
[EDIT] Turns out this is a bad example, as "the order in which function arguments are evaluated is undefined" (cf. below)
Correct is:
int i = 0;
printf("%i", i++); // prints 0
printf("%i", ++i); // prints 2
You can use a comma in other expressions to introduce a sequence point: http://c-faq.com/~scs/cgi-bin/faqcat.cgi?sec=expr#seqpoints
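For instance, a minimal sketch:

    #include <stdio.h>

    int main(void) {
        int i = 0;
        /* The comma operator is a sequence point: i++ completes before
           ++i starts, so this is well defined (unlike the printf
           arguments above): i becomes 2, j becomes 2. */
        int j = (i++, ++i);
        printf("i=%d j=%d\n", i, j);  /* prints i=2 j=2 */
        return 0;
    }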
You surely are right. I wanted to give a quick example; turns out it was a bad one. Next time I'll write:
Given that you were making a point on an article about the complexity of C, I'd say it was an unintentionally excellent example.
i = i++;
Is there any rule saying that args to a function have to be evaluated in a particular order - i.e. is ',' a sequence point?
x++ evaluates to x's old (pre-incremented) value.
He does assume sizeof(int) > sizeof(char), which is true on all platforms he has tried. It would be undefined on an AVR or other microcontroller where sizeof(int) == sizeof(char) though.
Just noting that ints are 16 bits wide in AVR-GCC, unless you use the -mint8 option which violates C standards.