This is a variation of an unfixed 2004 GCC bug. Clang detects the issue when the right warnings are enabled. Those warnings are disabled currently. A developer is auditing and fixing all such instances found by Clang so that they can reenable those warnings.
It's not that GCC proves the uninitialized variable must be 0, it's just that CCP sets it to 0 and everything happens to work out in specific cases.
It does in fact prove that it is zero, because it is the meet of lattice values (undefined, 0).
You can actually solve this particular case by interpreting phi nodes differently than CCP does. CCP does not generally care about what blocks things occur in - it doesn't have to, all definitions dominate all uses, so it is safe to evaluate things in any order. The only thing you stand to lose is optimality because things only go one direction on the lattice (to ensure that it reaches a fixed point).
However, in this particular case, you can see that if you treat the phi nodes as their equivalent form (assignments occurring on the edge from the previous block), and then process the loop blocks in dominance order, it is obvious the use is uninitialized.
It is only when looking at phi nodes as an unordered black box (as CCP does) that you get this particular variant of the issue.
This is, of course, just a very cheap form of flow sensitivity.
You can also get the same effect by using SSI form in a lot of cases.
Most compilers do not bother doing real expensive flow sensitive analysis to try to analyze stuff like this, because almost everything you can do comes with its own fun source of false positives and negatives.
For this specific optimization, you want https://github.com/gcc-mirror/gcc/blob/master/gcc/tree-ssa-c...
Some compilers perform whole program optimisation, rather than separate compilation of components/modules. This takes longer, but can find some extra optimisations (e.g. if a module provides some functionality, but it's not used in the resulting program, it can be deleted as dead code). MLton does this for StandardML ( http://mlton.org ) and Stalin does this for Scheme ( https://en.wikipedia.org/wiki/Stalin_(Scheme_implementation) )
At the very extreme end is the idea of superoptimisation ( https://en.wikipedia.org/wiki/Superoptimization ). Rather than translating the given code in some way, like a compiler, a superoptimiser treats the given code like a specification or test suite. It performs a brute-force search through the space of all possible programs, looking for any that match the specification. This can find truly optimal code, but it's ridiculously slow. So far its only real-world uses are to find small "peephole" (find/replace) optimisations that can be added to real compilers like LLVM.
There's a related idea called supercompilation (which is often confused or conflated with superoptimisation) https://stackoverflow.com/questions/9067545/what-is-supercom...
A supercompiler can be thought of as running the given code at compile-time. If we know the values of all the functions and variables in a given piece of code, we can just run it to find out what the answer is. Interestingly, we can also "run" code involving values we don't know: we just pass those values around as opaque black-boxes. This is really useful for collapsing layers of indirection, resolving dynamically dispatched targets, baking-in configurable parameters, etc. The downside is that it can lead to bloated executables. Roughly speaking, supercompilation replaces a few long paths containing many conditionals, with many short paths containing fewer conditionals. It also specialises general-purpose functions into many single-use functions which have particular arguments baked-in.
It has to be assuming that, because it is the only path that doesn't lead to undefined behavior.
If I understood correctly, Linus had some colorful thoughts on that:
If you know something is undefined, you warn about it. You don't silently generate random code that doesn't match the source code just because some paper standard says you "can".
To cite the jargon file:
Also, it is common to use bracketing with unusual
characters to signify emphasis. The asterisk is most
common, as in “What the *hell*?” even though this
interferes with the common use of the asterisk suffix
as a footnote mark. The underscore is also common,
Edit: Ironically, HN is translating my asterisk-bolds into italic.
Not a kernel developer: Can you test kernel code with a sanitizer (if that sanitizer doesn't work in a compiled kernel) by compiling the code into a blob and using it from userspace code with mocks? (Or whatever, unless it's just a utility that doesn't need anything other than an interface.)
I'm not sure what this means. You should not write your code any differently regardless of whether you used a sanitizer to help gain confidence in your implementation.
You should not use sanitizers in production (at least not the current generation).
> Can you test Kernel code with a sanitizer
In fact you can! Both UBSan and ASan are in use to root out bugs in Linux.
(I'd never heard of the tool before.)
So after the if, the compiler sees that ret is either 0, or an undefined value.
It picks 0 as the undefined value since that's a good optimization, so ret is now always 0, and (in this case) there is no warning.
No it isn't; it costs an extra instruction to initialize that register or memory location to zero.
How C compilers optimize situations involving uninitialized variables is by leaving those variables alone. They do so with the standard's blessing, which grants them that uninitialized automatic variables are "indeterminately-valued" and that if the program uses an indeterminate value, its behavior is undefined. Correctly written programs that perform their own delayed initialization by later assignment benefit from the fact that the compiler didn't add wasteful code to initialize those objects, only to have that compiler-generated initial value be overwritten.
> ret is now always 0, and (in this case) there is no warning.
It's not correct for diagnostics to pertain to versions of the program that were logically altered by the compiler, rather than to the original program. It may be ISO-conforming, only because ISO C doesn't require those diagnostics in the first place, let alone requiring them to be reliable.
If you're talking about the "xor eax, eax" that was emitted, then absolutely not. GCC is doing the most efficient thing possible with eax here.
That xor is not inserted because GCC kindly initializes the variable for you to save you from your mistakes, but because it has to initialize eax before returning (look, there's a function call earlier in the code example).
And after deciding the return value to be 0 it knows "xor eax, eax" is always going to be faster than actually loading the real value of "ret". (In fact it's a zero-idiom, even a NOP instruction is slower than that!)
What I was really trying to talk about is the SSA semantics and not the actual emitted code.
Now in fairness I'm not familiar with GCC's IR, but I'm going to blithely assume what GCC does with uninitialized variables somewhat resembles LLVM's undef semantics.
When I say that picking 0 is a good optimization, I mean that at an SSA level, the compiler is going to have a Phi between Undef and the range [0, 0].
If by "leaving those variables alone" you mean that the compiler should have returned Undef as the result of that Phi, that would be a horrible miscompilation! And of course picking something else than 0 would just be silly.
>It's not correct for diagnostics to pertain to versions of the program that were logically altered by the compiler, rather than to the original program.
Agreed. As a user, I am not happy about this lack of diagnostics and I'm happy to call that a bug if you want. And furthermore, I don't think performing the optimizations above is fundamentally incompatible with good diagnostics. Probably just hard to implement for GCC.
Disclaimer: I am not actually a compiler engineer and this probably contains mistakes.
It seems that the most efficient thing with %eax is not to mention it in an instruction.
There may be some quirk/feature of modern Intel processors whereby clearing a register dis-entangles it from considerations of prior hazards. That is to say, when we execute 'xorl %eax, %eax', the processor knows that any prior value in `%eax` is no longer required by subsequent code. A new lifetime has begun for that register. Therefore, there is no need to stall the pipeline waiting for that prior value, if it so happens that it is not yet ready. Internally, the ISA-level register `%eax` can be assigned to a different physical register at that point, under the discipline of "register renaming".
In the absence of any such hardware considerations, the clearing of %eax is pure waste.
> it has to initialize eax before returning
Where is that requirement coming from? Not from ISO C, certainly. The behavior is undefined. It was the programmer's job to ensure that the return value is calculated by a well-defined expression, not that of the compiler. From that standpoint alone, it's perfectly conforming to emit code that just leaves the existing content in %eax and returns.
Not so much of a quirk as basic behaviour of any OoO processor due to register renaming. This is true on the vast majority of Intel (and AMD) CPUs (in fact probably all the currently sold ones, as even Atom has acquired a limited OoO engine and the Knights variants are no more).
It is so fundamental that, as described elsethread, zeroing a register is almost free (it still costs decode bandwidth and icache size).
Bottom line in the vast majority of cases, zeroing the registers is almost always a win.
edit: having said that, I doubt that GCC is trying to optimize the UB case; probably the optimizer just requires that the variable has a value at that point.
I think what you're missing is that UB is a property of the execution of the program, and not just of the code that was compiled.
If in practice the if branch is always taken, then there is NO undefined behavior (surprisingly, perhaps)!
That means GCC has to be prepared to handle that case, and has to set eax to 0 when the if branch is taken.
>Where is that requirement coming from? Not from ISO C, certainly. The behavior is undefined.
Not so! You don't know until the program is run.
Edit: (removed some speculation about assuming the branch is always taken, that was not relevant)
>There may be some quirk/feature of modern Intel processors that clearing a register will dis-entangle it from considerations of prior hazards. So that is to say, when we execute 'xorl %eax, %eax', the processor knows that any prior value in `%eax` is no longer required by subsequent code. A new lifetime has begun for that register.
That's absolutely right, by the way. I think this was introduced with Sandy Bridge, in 2011.
Look, GCC (what I have here: 7.3.0) is doing this even for the following trivial function:
xorl %eax, %eax
You're absolutely right that in this case, it's wasteful.
In fact, if you try Clang you will see that it only emits a ret.
Maybe GCC is doing you a courtesy (or maybe nobody taught it to completely omit an Undef return value, so it just picks zero!).
Either way, it's unconditionally UB, GCC can do whatever it likes.
>do we still know until run-time that this has UB?
Well, there is no branch here, is there? I'm clearly not going to disagree with you on this one =]
To be clear: UB is still a property of the execution of the program, but here it's not hard to statically determine that all executions of this program are Undefined Behavior.
I threw this in godbolt, so play around, e.g. initialize ret or invert the condition. Linus' version has 36 lines, most other things I tried end up with 40 or more.
Update: Added -Wall to godbolt
So GCC has to have some instruction somewhere that ensures eax is 0 at the end of the function in case the if was taken.
That's why it can't just leave a garbage value in eax. Sometimes it's not actually garbage.
And when it really is garbage? Well who cares, might as well return 0.
When shmem_reserve_inode is called, under the x86 C ABIs (all of them!), it will place its return value in eax.
So after the call, at this precise moment, "ret" is "eax", and we don't know its value.
Then we do "if (ret) return ret", after this line "ret" is still "eax", and this time we know for sure that its value is 0.
But eax is not bound to ret until the end of the function. EVERY function we call before returning is allowed to modify eax (whether that function call returns a result or not)!
And since we know that ret has the value 0, well we don't even need to keep the value in register eax, do we? We can just use eax for something else and not worry about storing "ret" anywhere, it's just a constant now.
>What I don't somehow see is the advantage of, or requirement for, preparing a zero value in ret that didn't come from shmem_reserve_inode
That's the subtlety. We don't prepare ret, we prepare eax.
Conceptually, at the C language level, at the tail of the function "ret" still contains 0.
But at the x86 level, eax DOES NOT necessarily contain 0.
(Because we call d_instantiate and other functions before returning, and they're allowed to modify eax).
So ret has the right value, but we still need to load the value of ret into eax before returning. And the most efficient way to do that? Just set eax to 0.
I genuinely hope that helps =]
Now I can see how the x86 C ABIs work; thank you for explaining. I did not know much about that before, but now I do, and indeed it makes sense. (On a different instruction set, something else might make sense; I don't really know.)
>If we do this later in the function, like just before returning, we can use %eax for other purposes in the meanwhile.
Okay, I think we completely agree here!
>that's not semantically the same as initializing ret.
Hmm, so is it that you're asking why we're initializing ret? Well, that sounds like as good an excuse as any for another incredibly long and boring wall of text =]
If I'm extra lucky I'll get called out as the amateur I am and learn something in the process (!)
So, here's the thing. Talking about the compiler initializing ret as a variable, separate from its storage in eax, is not really what the compiler is trying to do, as I understand it.
We really aren't "initializing it" so much as trying to guess its value in all possible executions (because that's just what compilers do these days).
The way the compiler works is that first it gets rid of the idea of variables that can be assigned multiple times, it switches to something called SSA form where every "variable" is initialized exactly once. Want to reassign a variable? Just declare one with a new name instead and use it going forward.
The funny part of SSA is that when there's a branch, after it joins back you end up having to define a variable that could have two possible values (branch taken, not taken).
That's not something you should normally be able to do with the SSA rules, so it's represented by a special Phi "value" in the compiler's intermediate representation.
A Phi basically just says "either we came from path A and we have value Va, or from B and it's Vb".
But what the compiler really wants is to forget about the branch and the Phi business, and just deduce a plain value for ret so it can move on with cold hard numbers in mind.
Since we have undefined behavior here, we can simplify Phi<Undef, [0, 0]> into just [0, 0]. And that's how the compiler "forgets" that ret was uninitialized in the first place. It just optimizes the undefinedness away while trying to guess values, if you will.
So long story short, we're not so much trying to initialize ret as trying to guess its value, and forget that there were multiple branches in the first place.
(And for the code path where it's deduced to be 0, well it's not really picking, it doesn't have a choice =] )
A compiler with fewer optimizations could choose to spill the "ret" variable and not pick a value at all, for example. It would just return whatever happens to be lying on the stack if you took the uninitialized path.
My point really is that there are two interesting paths here, one where it's undefined and one where it's 0 ... I guess my point is that 0 is more than just 'convenient' because it's required for one of those paths
The "actual" frequency of the outbursts is somewhat secondary to the perception of frequency. Every time Linus writes a profanity-laced rant, even if it's email number 83/175 for the day, that particular piece is preserved, archived, and passed around the internet for giggles, further reinforcing whatever pre-existing conception the person referencing the email is arguing for.