Both true and false: a Zen moment with C (markshroyer.com)
177 points by Niten on June 27, 2012 | 51 comments

It's crazy to me that we as C programmers are flying blind about when we invoke undefined behavior. There is no blinking red light that warns us that something needs to be fixed, only strange and unexplainable behavior, or in some cases things just work fine because the undefined behavior happens to follow our expectations on some combination of platform and compiler.

I keep wishing that there was a Valgrind-like tool that could detect and report undefined behavior. It would have to be a dynamic, runtime tool, since many/most cases of undefined behavior cannot be robustly detected statically. But it would need to have more information at runtime about the original C program than Valgrind has; the assembly alone does not contain enough information to know if the source C program is invoking undefined behavior.

I feel certain that if this tool existed, we would find scores of undefined behavior in all but the most conscientious C programs.

Clang has -fcatch-undefined-behavior: "Turn on runtime code generation to check for undefined behavior." ( http://clang.llvm.org/docs/UsersManual.html )

Afaik, GCC has some flags for specific types of undefined behavior.

That flag catches four specific instances of undefined behavior, a tiny speck in a sea of all possible ways that undefined behavior can be invoked.

If you are building your project with Clang, and find that your code causes undefined behavior not caught by any existing in-compiler test, and you can reproduce it, you might want to ...

- sign up for the developers' list at http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev ,

- skim over http://clang.llvm.org/hacking.html ,

- build from 'trunk' by carefully following a few simple steps in http://clang.llvm.org/get_started.html ,

- wade in and reproduce your bug ,

... and then share your findings.

Together we can build better tools. Anyone with patience and interest can help.

I think we may have our wires crossed. I'm concerned about undefined behavior in my own programs, not undefined behavior in Clang itself. Someone mentioned Clang as a tool that can help detect UB in other programs, but I mentioned that these checks are far from exhaustive.

Sorry, the fault is mine alone. I used an ambiguous description above, now edited.

The only point I wanted to make was that a collective effort might help cut down on ambiguity in expression that the compiler does not identify. I think we could use "more warnings".

He means to say you're welcome to submit patches to make them more exhaustive.

Build this code with any reasonable warning setting (gcc's defaults are really minimal) and you will get a warning for this, since it's detectable at compile time. Other undefined behavior is harder to catch at compile time, but clang (and possibly other compilers) has tools to catch it at runtime.

As is mentioned in the comments, in the author's original code that inspired the post, it was an uninitialized struct member. With pointer arithmetic and everything, I would have a really hard time detecting an uninitialized struct member at compile time.

http://embed.cs.utah.edu/ioc/ will detect one common cause of undefined behavior (integer overflow), which is useful and can catch a lot of exploitable security bugs, but of course it's far from complete.
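To make the integer-overflow case concrete: signed overflow is undefined in C, so a manual check has to happen before the addition rather than after it. Here is a minimal sketch of a pre-checked add. The name `checked_add` is made up for illustration and has nothing to do with how the IOC tool linked above works internally.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: adds a and b only if the result fits in an int.
 * Returns true and stores the sum in *out on success; returns false
 * (leaving *out untouched) if the addition would overflow. */
bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b)) {
        return false;   /* would overflow: never perform the add */
    }
    *out = a + b;
    return true;
}
```

The comparisons only ever compute `INT_MAX - b` for positive `b` and `INT_MIN - b` for negative `b`, so the check itself cannot overflow.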

I feel like valgrind would catch this particular error though.

That's what Java programmers say.

Don't code C if you don't know what you are doing.

Only a fool spurns tools out of machismo.

Also, I'm willing to bet that if you've written any significant amount of C, you have some undefined behavior that you're not aware of. Just because it works doesn't mean it's defined behavior according to the standard.

It's so much fun to explore why certain instances of undefined behavior do what they do. I never thought of this as being a possible effect of using uninitialized variables.

Logically, you're doing two things. You're negating the bool and then you're testing that negated value. This is what's expressed in the code generated from the non-optimized version. In the optimized version, the compiler has optimized this into a single jne instruction. I can understand why they wouldn't want to do it for non-optimized builds.

It sounds like gcc always uses xor for negating bools. If all you're doing is negating a bool, it's just a single instruction that involves no branching. It could be the fastest way.

I wonder what the fastest way to negate (boolean-wise) a value other than a bool would be. Maybe you just do a test and then set the value based on a xor of the zero flag.

> I wonder what the fastest way to negate (boolean-wise) a value other than a bool would be. Maybe you just do a test and then set the value based on a xor of the zero flag.

Probably "NOT reg". Exists in most CPU architectures, and is generally smaller (in variable-size instruction sets) than "XOR reg, immed".

NOT, at least in x86, is bitwise, so it won't do the right thing for values other than 0 and ~0 (-1).
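The same distinction is visible from C, where `~` is the bitwise operator and `!` the logical one. A small sketch (the function names are just for illustration):

```c
/* Bitwise complement: flips every bit of x. */
unsigned int bitwise_not(unsigned int x)
{
    return ~x;
}

/* Logical negation: 1 if x is zero, 0 for any other value. */
int logical_not(unsigned int x)
{
    return !x;
}
```

For an input like 5, `logical_not` gives 0 while `bitwise_not` gives another non-zero value, so only the all-zeros and all-ones patterns make the two agree on truthiness.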

Oh, I misunderstood. I thought the point was to make it bitwise. In that case, it is very architecture-dependent. In x86, you can use TEST coupled with SETcc or CMOV for microarchitectures where branch mispredictions are very costly (P4 comes to mind). If not, and the result is not random, you're better off with a branch.

Another trick is to use the carry flag, i.e., NEG reg; MOV reg, 0; ADC reg, 0. This works for all 386-compatible processors and is reasonably fast.

Unrelated to the discussed topic in the post, but it reminded me of The Codeless Code, an absolute gem. http://thecodelesscode.com/contents

Therein, quite related: http://thecodelesscode.com/case/20

“The sage and the fool go to their graves alike in this respect: both believe the sage to be a fool. Where, then, may wisdom be found?”

Super cool, but I was hoping for a "bluff combinator" like C value. See: "How to circumvent church numerals", Mayer Goldberg and Mads Torgersen, Nordic Journal of Computing, Volume: 9, Issue: 1, 2002. Sometimes I can find a PDF, but right now, I can't.

Assuming the 'bluff combinator' is like what I think it is, you can do similar things in C++ by creating classes with overloaded operators. E.g. overload operator (bool) to return true, and overload operator ! to also return true.

Interestingly, . . . it's a pointless exercise delving into ASM to "see what happens" when you have undefined behaviour in C code. A slight flag change and the only code produced could equally well just be a single NOP.

Still, gcc is a pretty serious piece of software (no more starting nethack these days), so while the behaviour is undefined it's still interesting to see why the compiler ended up generating a code that behaves so weirdly.

If you need to find a use for it, it might make it easier to spot undefined behaviour later and find its cause ("hey, this evaluates both to true and false, it must be an uninitialized bool!").

So it's still interesting and might not be completely pointless.

LLVM is a more interesting piece of software; gcc is kind of crusty and warty.

Still, I'm glad you're interested in the language, if you're interested in undefined behaviour then you should carefully and closely read the latest free draft standard (N1570) and perhaps implement a small C compiler of your own ( Appel : Modern Compiler Implementation ).

There are many C compilers, and they are free to do whatever they want when behaviour is undefined. In many cases you'll unfortunately not see anything odd at all until you port your code, change your flags or upgrade the compiler, and then you'll have a seething mass of "well this stuff worked when we tested it" errors.

Strapping yourself into a vehicle and driving off a cliff teaches you a little about the dangers of accidents and gives you a reason to drive safely, it doesn't particularly teach much about the practice of safe driving.

The example code drove off the road, what happened next really isn't all that exciting.



Many serious pieces of software are crusty and warty.

It's not pointless at all. Yes, you can't rely on the output being that way all the time, but it still teaches you about the compiler, the language implementation, and the CPU architecture to investigate this stuff.

shrug I do think it's interesting to find out, in concrete terms, the sort of real-world assumptions that a compiler might make, which make C's undefined behavior what it is. Yes, pointless in terms of fixing the actual program, though... to each his own :)

I think the compiler is just wrong. In C, zero is false and non-zero is true. Testing just the bottom bit was wrong.

Wikipedia, Boolean in C:

  if, while, for, etc., treat any non-zero value as true

You're misreading what's going on. The if test is indeed testing for a non-zero value. The "problem" is the negation only flipped the LSB in the register, thus leaving behind a non-zero value that passed the second if test. This is perfectly legitimate, since the value is undefined, and if it wasn't the compiler would have already ensured it only contained 0 or 1.
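A rough model of that explanation in well-defined C, using an `unsigned char` to stand in for the byte that backs the bool. This is only a sketch of the idea, not the code gcc actually emits, and the function names are invented for the example.

```c
#include <stdbool.h>

/* Models the single-instruction negation: flip only the lowest bit of
 * the byte. This is a correct negation when raw is 0 or 1. */
unsigned char xor_negate(unsigned char raw)
{
    return (unsigned char)(raw ^ 1u);
}

/* Models what an `if` test checks: is the byte non-zero? */
bool is_truthy(unsigned char raw)
{
    return raw != 0;
}
```

With a stray byte such as 55, both `is_truthy(55)` and `is_truthy(xor_negate(55))` hold, which is the "both true and false" effect. With 0 or 1, the two always disagree, as a negation should.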

I read all that. If false is zero and true is anything else, it's incorrect behavior to assume anything about the value - a test for zero or non-zero is the only legitimate test. I may have passed in 55 as a value for 'true' since any non-zero value is supposed to be true (not false anyway).

You still don't understand.

The if test is correct. It's using non-zero as "truth". That's what you're talking about, and there's absolutely no question that it's doing the right thing here.

The negation is not using non-zero for truth, because it's a negation of a bool, and bools are restricted to only being 0 and 1. However since this particular bool is undefined, the in-memory representation happens to not meet that restriction. But that doesn't matter. The value is undefined. The negation could, quite legally, cause demons to fly out your nose. The fact that you've observed, prior to the negation, that the value appears to be true has no meaning. Undefined values remain undefined even after observation and after manipulation.

Ah. So it's the bool type itself that's broken. If an int type was used to store logical values, the contradictory result is impossible (both true and not true).

Nothing is broken. The compiler would be within its rights to do the same thing to an int (or, more realistically, a char or similar type that's shorter than the registers it's stored in).

Yes, sure, but why design a language that way? Who is it helping? No language implementer WOULD do that to an int. My development group avoids bool like the plague, using int instead - it has well-defined size, behavior and sensible warnings.

>Yes, sure, but why design a language that way? Who is it helping?

Compiler writers. C exists mostly to be a portable language that's easy to compile. A better question is why would you write programs in C in 2012.

>No language implementer WOULD do that to an int.

It certainly happened to float/double on some older architectures where the internal representation included different flag bits. I wouldn't be at all surprised if it happened to char and short. I guess there's an argument for using int for all variables in your program (since memory isn't usually constrained enough nowadays for it to be worth using short etc.), but again, if you weren't memory-constrained why would you be using C?

>My development group avoids bool like the plague, using int instead - it has well-defined size, behavior and sensible warnings

Int might not suffer from this particular behaviour, but if you use an uninitialized variable it will bite you sooner or later. The article's takeaway isn't "avoid bool", it's "initialize your variables"
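A minimal sketch of that takeaway, assuming a struct with a bool member (the thread mentions the original bug was an uninitialized struct member). The struct and function names here are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical struct with a flag member. */
struct options {
    int  count;
    bool verbose;
};

/* Returns a fully initialized struct: members that are not named in the
 * initializer are zeroed, so verbose starts out as false. */
struct options default_options(void)
{
    struct options opts = { .count = 1 };
    return opts;
}
```

Because every member has a defined value, `opts.verbose` and `!opts.verbose` can never both be true.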

Maybe you misapprehend the bool issue? It isn't that bool doesn't match a register; it's that the value is stored in memory in a subset of the storage unit, i.e. 1 bit out of the byte. That's not true of any other scalar.

Suppose you have a char followed by a long in a struct; the compiler will insert seven bytes of padding after the char. There's nothing to prevent it setting the padding to 0 when initializing the char, and using a 64-bit load instruction to load it into a register - in which case you'd get exactly the same behaviour as seen here with bool, only even more confusing.
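The padding itself is easy to measure. A small sketch follows; note that "seven bytes" assumes an 8-byte long with 8-byte alignment, as on typical 64-bit Linux, and the number differs on other ABIs (for example 3 where long is 4 bytes). The struct and function names are invented for the example.

```c
#include <stddef.h>

/* A char followed by a long, as described above. */
struct char_then_long {
    char c;
    long l;
};

/* Number of padding bytes the compiler placed between c and l. */
size_t padding_after_c(void)
{
    return offsetof(struct char_then_long, l) - sizeof(char);
}
```

The contents of those padding bytes are not something a program can rely on, which is what leaves room for the scenario in the comment.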

This is true for integers and pointers. The bool type (introduced in C99), though, is only required by C to hold the value 0 or 1. If you assign 55 to a bool, there are rules that implicitly convert that to a 1. If an implementation chooses to make bool only store 0 and 1, and you somehow manage to squeeze another value into that bool, it's perfectly fine for this to be undefined behavior. The standard's "Boolean type" clause says:

"When any scalar value is converted to _Bool, the result is 0 if the value compares equal to 0; otherwise, the result is 1"

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf page 43

The built-in boolean type only has two values, 0 and 1. Anything else in the memory means it's not actually a value of that type.
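The conversion rule is simple to check from code. A minimal sketch, with `to_bool` being a name made up for the example:

```c
#include <stdbool.h>

/* Converting any scalar to bool yields exactly 0 or 1:
 * 0 stays 0, anything else becomes 1. */
bool to_bool(int value)
{
    return value;   /* implicit conversion to bool happens here */
}
```

So `to_bool(55)` is 1, not 55. The only way to get another bit pattern into a bool is to bypass this conversion, for example by never initializing it.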

Heh, interesting. So I guess it's also possible that "b && !b" evaluates to true... which is probably more ammo for C detractors.

That one does not work with g++ and actually does not produce a warning with -Wall without optimisations.

It makes me wonder if that's actually undefined behaviour in C++, because unlike the example in the fine article this time the behaviour of the program does not depend on the (undefined) value of b. Or maybe it's just g++ being clever.

There is plenty of undefined behavior in C++ [1]. It is important to understand that these aren't bugs in a particular compiler, but deliberate definitions in the standard itself to give compiler writers leeway in their implementations. Amusingly, that means "undefined behavior" is explicitly defined in the standard.


On the other hand, a strength reduction pass might just collapse it to false.

Good old C compilers did not have the bool keyword and did not have this silly undefined behaviour. Standardisation committees should understand that the quality of a standard is inversely proportional to its length.

I'm guessing you missed the MySQL security vulnerability the other week? That was caused precisely by the lack of a proper boolean in older versions of C and an innocuous-looking workaround.
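I haven't reproduced the MySQL code here, but the general pitfall with byte-sized stand-ins for a boolean can be sketched as follows. Everything in it is an assumption for illustration: the `byte_bool` typedef and function names are hypothetical, and the example assumes 8-bit bytes.

```c
#include <stdbool.h>

/* Hypothetical pre-C99 style boolean: just a byte. */
typedef unsigned char byte_bool;

/* Storing an int "truth value" in a byte keeps only the low bits
 * (the low 8 bits on the usual 8-bit-byte platforms). */
byte_bool as_byte_bool(int value)
{
    return (byte_bool)value;
}

/* Conversion to a real bool compares against zero instead. */
bool as_real_bool(int value)
{
    return value;
}
```

A non-zero value such as 256 comes out of `as_byte_bool` as 0 (false) but out of `as_real_bool` as 1 (true), so the workaround silently turns some "true" results into "false".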

It's not really Zen until "p" evaluates to "mu".

That's essentially what undefined behaviour is.

There was a bare unlabelled power line wire hanging from a pole and I touched it and was fine, then another friend touched it another day and was electrocuted.

How come the voltage on an unknown wire is just allowed to change like that - what are the laws of physics for!
