void f(struct some_struct* p) {
    int x = p->some_field;  /* dereference happens before the check */
    /* ... */
    if (p != NULL) {
        /* this block might be executed even if p == NULL */
    }
}
Because reading `p->some_field` is already undefined behavior unless `p != NULL`, the compiler is free to assume that `p != NULL` is always true, and might avoid the check.
If the memory access doesn't crash the program for whatever reason (maybe it got reordered somewhere else or eliminated as dead code, I dunno), then calling that function with a NULL pointer drops you into undefined behavior, which might manifest as the very check you put right there being skipped.
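To make that concrete, here is a rough sketch of one outcome a conforming optimizer is allowed to produce from the function above -- not required, and not what every compiler will do:

void f(struct some_struct* p) {
    int x = p->some_field;  /* still undefined behavior if p == NULL */
    /* ... */
    /* the compiler has folded `p != NULL` to true, so what used to be
       the guarded block now runs unconditionally -- even when p is NULL */
    {
        /* body of the former NULL check */
    }
}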
> If the memory access doesn't crash the program for whatever reason (maybe it got reordered somewhere else or eliminated as dead code, I dunno), then calling that function with a NULL pointer drops you into undefined behavior, which might manifest as the very check you put right there being skipped.
No; if `p` is NULL, this function has undefined behavior. Full-stop. It is a 100% meaningless function as soon as `p` is NULL, because the "NULL check" happens after the pointer is dereferenced.
So the issue isn't that the compiler can make incorrect optimizations -- the compiler makes optimizations that are entirely correct, assuming that the code that you wrote isn't meaningless.
The code isn't meaningless, it just exhibits undefined behaviour. This doesn't make the program wrong, it just means the standard has nothing to say about what precisely might happen. If the compiler chooses to infer the presence of an equivalent to VC++'s __assume (http://msdn.microsoft.com/en-us/library/1b3fsfxw%28v=vs.90%2...) from the presence of undefined behaviour, it's within its rights to do so, but this particular approach is by no means mandatory.
In fact, most of the compilers I've used actually don't do this, and are (in my view) all the better for it.
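(For what it's worth, GCC and Clang have a rough analogue that the programmer can opt into explicitly rather than having it inferred; the ASSUME macro below is my own illustrative name, built on __builtin_unreachable():

#include <stddef.h>

/* If cond is ever false at runtime, behavior is undefined -- which is
   precisely the license the optimizer exploits. */
#define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)

int first_element(const int *a) {
    ASSUME(a != NULL);  /* the optimizer may now treat a as non-null */
    return a[0];
}

The point is the same as with __assume: the hint is something the programmer states deliberately, not something derived behind their back.)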
> The code isn't meaningless, it just exhibits undefined behaviour.
That's what undefined behavior means. Semantically speaking, the C language assigns no meaning to that function if the input pointer is NULL, so in that case the function is "wrong" by any reasonable definition of the word -- and the compiler is free to make a whole array of optimizations based on the assumption that the input pointer is not NULL.
I think that interpretation is too strict. The standard is fairly clear on what the result of undefined behaviour might be, defining it as:
``behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements.''
``NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, *to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message)*, to terminating a translation or execution (with the issuance of a diagnostic message).'' (italics mine)
This sounds like a long way from "meaningless" in my book. To my reading, the purpose of undefined behaviour appears to be to avoid unduly constraining implementations by not mandating behaviour that could be inefficient, costly or impossible to provide.
You (or anybody else!) may disagree on how far this inch given could or should be taken. But I think the fact the standard explicitly suggests that undefined behaviour could do something reasonable is evidence that programs producing undefined behaviour do not necessarily have to be considered meaningless.
(As a concrete example I have worked on one system where NULL was a pointer to address 0, and where address 0 was readable. Not only that, but address 0 actually contained useful information, and some system macros used it. It was some kind of process information block, so there was a whole family of macros that looked like "#define getpid() (((uint32_t *)0)[0])", "#define getppid() (((uint32_t *)0)[1])", that sort of thing. I'd say this is rather odd, but the standard would appear to allow it. (However, perhaps needless to say, gcc was not the system compiler.))
(See also, the approved manner for using objc_msgSend, since time immemorial.)
On the contrary. The standard is perfectly clear that anything can happen when you write code that employs UB.
> behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements.
NO REQUIREMENTS -- so, semantically, programs that employ undefined behavior are completely meaningless.
With regard to the second quotation, the part you italicized is nice, but the part you didn't is just as important:
> Possible undefined behavior ranges from ignoring the situation completely with unpredictable results
The entire quotation basically says "when you write code with undefined behavior, anything can happen; the results can be unpredictable, or they can appear to be sensible. An error could also be triggered." But that's the point: no behavior is specified. There's no restriction on what might happen.
Take the function above, which dereferences a NULL pointer whenever the input pointer is NULL. That function could be compiled in such a way that it writes an ASCII penguin to stdout if the input pointer is NULL; the compiler is totally within its rights to do that. Your mental model of how C programs work is entirely inaccurate if you expect undefined behavior in C to do something that you deem sensible.
With your summing up of the quotation, I think you're again being too strict. If the standard doesn't define undefined behaviour, which it doesn't - well, what then? You claim this renders any program that invokes undefined behaviour meaningless; I claim (as I think the standard wording implies) that this simply means the standard doesn't define the results, which then necessarily depend on the implementation in question.
(It may be OK for anything to then happen, but as a simple question of quality - and common decency ;) - an implementation should strive to ensure that the result is not terribly surprising to anybody familiar with the system in question. And I'm not really sure that what gcc does in the face of undefined behaviour, conformant though it may be, passes that test.)
In C99 there are three levels of not fully specified behaviour:
1. Undefined: anything is permitted, the standard imposes no requirement whatsoever. A typical example is what happens when a null pointer is dereferenced.
2. Unspecified: anything from a constrained set is permitted. An example is the order of evaluation of function arguments (each must be evaluated exactly once, but in any order).
3. Implementation-defined: the implementation is free to choose the behaviour (possibly from a given set), but must document its choice. An example is the representation of signed integers.
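A small illustrative program (my own, not from the standard) that touches all three levels:

#include <stdio.h>

static int noisy(const char *tag) { printf("%s ", tag); return 1; }

int main(void) {
    /* Unspecified: each argument is evaluated exactly once, but whether
       "a" or "b" is printed first is up to the implementation. */
    printf("%d %d\n", noisy("a"), noisy("b"));

    /* Implementation-defined: right-shifting a negative signed value;
       the result depends on choices the implementation must document,
       such as its representation of signed integers. */
    printf("%d\n", -1 >> 1);

    /* Undefined: a null pointer dereference -- left commented out,
       since the standard imposes no requirements at all on it. */
    /* int *p = 0; printf("%d\n", *p); */
    return 0;
}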
If x is unused, the compiler is allowed to remove the assignment. If it does that optimization after it removes the "redundant" null pointer test, the optimizer has legally altered your program to remove the null pointer check you thought you had.
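Spelled out as a sketch (one sequence of transformations a conforming optimizer may perform, not something any particular compiler is required to do; struct some_struct and do_stuff are hypothetical stand-ins):

struct some_struct { int some_field; };
void do_stuff(struct some_struct *p) { (void)p; }  /* stub for the sketch */

/* As written: dereference first, check second. */
void f0(struct some_struct *p) {
    int x = p->some_field;        /* undefined behavior if p == NULL */
    if (p != NULL) { do_stuff(p); }
}

/* Step 1: the dereference justifies assuming p != NULL, so the
   "redundant" test is folded away. */
void f1(struct some_struct *p) {
    int x = p->some_field;
    do_stuff(p);
}

/* Step 2: x is never used, so the load is dead and is removed too.
   Neither the fault you hoped for nor the check you wrote survives. */
void f2(struct some_struct *p) {
    do_stuff(p);
}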
Why would you expect the NULL pointer check to do anything? The function, as given, is meaningless if `p` is NULL. The problem isn't that compilers are too aggressive in their optimization -- the problem is that people who are learning C don't usually learn what it means for the behavior of a program to be undefined.
> If x is unused, the compiler is allowed to remove the assignment
This and the Apple SSL bug make me think that optimising compilers should be far more explicit (i.e. emitting messages or even warnings) about what they're doing than they are now, because far too much of the optimisation process is hidden from the programmer. Not just unreachable code: messages like "result of computation x is never used", "if-condition assumed to be {false, true}", "while-loop condition always false - body removed", etc. would be extremely useful for detecting and fixing these problems.
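For reference, a paraphrased sketch of the shape of that SSL bug (helper names are made up; this is the pattern, not the actual Apple source) -- exactly the kind of thing an "unreachable code" message would have surfaced:

static int step_one(void)    { return 0; }
static int step_two(void)    { return 0; }
static int final_check(void) { return 1; }  /* the check that matters */

static int verify(void) {
    int err;
    if ((err = step_one()) != 0)
        goto fail;
    if ((err = step_two()) != 0)
        goto fail;
        goto fail;                    /* duplicated line: always taken */
    if ((err = final_check()) != 0)   /* silently unreachable */
        goto fail;
fail:
    return err;
}

int main(void) { return verify(); }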
There have been Linux kernel security bugs where a NULL dereference might not result in a crash, because a malicious user-space program had mapped some accessible memory at address zero. The kernel then went on to skip the optimized-out null check, use vtable pointers in this null page and eventually execute arbitrary code in ring 0.
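A hedged sketch of the user-space half of such an attack -- mapping the zero page so that a kernel NULL dereference reads attacker-controlled data instead of faulting (modern kernels normally forbid this via mmap_min_addr, so expect it to fail today):

#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS on glibc */
#include <sys/mman.h>
#include <string.h>

int main(void) {
    void *page = mmap((void *)0, 4096, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (page == MAP_FAILED)
        return 1;
    /* Fill the page with fake "vtable" entries of the attacker's
       choosing; a buggy kernel path that dereferences NULL (with its
       own check optimized away) would then consume these values. */
    memset(page, 0x41, 4096);
    /* ... trigger the vulnerable kernel code path here ... */
    return 0;
}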
> That would've already crashed at the p->some_field if p == NULL.
1. Typically yes, and that is exactly why the compiler can remove the check that follows it: the dereference is already undefined behavior when p is NULL, so the compiler may assume the optimistic case and drop the check.
2. But it is not always true that it would crash. p->some_field might happen to land on a valid, mapped memory location. That doesn't normally happen, because low memory addresses (0-1024, say) tend not to be accessible to userspace programs -- but I am not aware of any spec that guarantees it. An OS could in theory let your program map memory at address 4, in which case p->some_field would succeed when p is NULL and the field's offset is 4.
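To see why, note that with p == NULL the member access simply targets the field's offset within the struct (layout and type below are made up for illustration):

#include <stddef.h>
#include <stdio.h>

struct some_struct {
    int other_field;  /* offset 0 on typical ABIs */
    int some_field;   /* offset 4 on typical ABIs */
};

int main(void) {
    printf("some_field sits at offset %zu\n",
           offsetof(struct some_struct, some_field));
    /* So ((struct some_struct *)NULL)->some_field would load from that
       low address; whether it traps depends entirely on whether the OS
       has mapped anything there. */
    return 0;
}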