* Ada can do so using the Normalize_Scalars pragma (though by default it'll just read whatever garbage is at that location); however, the initialisation value is implementation-defined (it's recommended that the value be invalid, so that an access to an uninitialised value faults)
* Rust requires that all control paths initialise variables
Both Clang and GCC can warn about potentially uninitialised variables (-Wuninitialized in Clang, -Wmaybe-uninitialized in GCC; both are included in -Wall)
Initialization is not the same thing, though, for everything but Ada.
I'm not looking for a C-like language where variables must be defined / are implicitly initialized to a constant. Those have problems, among other things the cost of double-initialization.
I'm looking for a C-like language where the compiler is free to assume that an uninitialized variable is anything it wants - but it must be consistent about it. So, for example, "return x - x", where x is an integer, will always return zero.
I'm happy to say your question led me on a very interesting adventure :)
Suppose x is an uninitialized int*. Suppose we do

int *y = x; (*x)++; (*x)--;

Should y - x necessarily be 0, in the language you're looking for? Think carefully. If x had been initialized (and pointed to a valid memory location, say), would y - x necessarily be 0? Surprisingly, the answer is no! It's possible that x could point to itself. In that case, (*x)++; actually changes the value of x itself, so that it is no longer cancelled out by the following (*x)--;
I guess the moral is that what you're asking for is subtler than meets the eye
Interestingly, GCC compiles this:

int test(int *x) {
int *y = x;
(*x)++;
(*x)--;
return (y - x);
}
into this at -O3:
test(int*):
xor eax, eax
ret
Effectively, into:
int test(int *x) {
return 0;
}
Which seems to contradict your adventure. I think it's because x cannot legally point to itself - x is an int *, not a void *. And you can't do arithmetic on a void *, so it's not a problem there either.
Here, GCC optimizes it away. It's not required to do so. In fact, GCC can do whatever it wants in this case and still follow the C standard. It may as well
Yeah, you can't do x = &x. But since you didn't initialize x, its initial value can be whatever is in memory at that particular time (or something else, nasal demons etc.)
If the particular value of x at runtime (there are no type checks in C at runtime) happens to be &x, you can effectively get a weird result.
I think the real answer is that the DCE optimizer understands that y is assigned the same address as x, that x never changes (the arithmetic cancels out), that y never changes, and so the difference between y and x becomes 0.
There are a lot of other ways for that to fail. You could scribble all over the executing code, or your runtime, or part of the operating system....
I don't think anyone would expect a "sane C" to let you scribble all over completely random pointers without harm. But if the pointer doesn't hit any bytes that are already in use, then "sane C" should return 0.
It would be sane to ask that the literal expression "x - x" be optimised to 0, but I would never dream of asking that "y - x" be optimised to 0. Not unless the compiler can prove the values will be exactly the same in all circumstances and no, it's not allowed to just delete code that invokes undefined behaviour.
> I think he's also talking about my mythical C-like language that has somewhat sane rules for these things.
Oh, OK.
It's important to realize, though, that if things get defined down too strictly, compilers will be hard put to generate efficient code for more than one or two closely related families of processors. That matters in a language designed to be useful for programming rather heterogeneous embedded systems (that is, more 'embedded' than ARM, perhaps with odd word sizes or similar).
Which is why I'm not talking about removing UB altogether, just talking about the more egregious examples of things that cause nasty bugs but aren't particularly problematic to code-gen.
Not to mention that embedded systems end up using no optimization half the time anyway, because compilers go so overboard with performance that good luck trying to write safe code in them.
> Those have problems, among other things the cost of double-initialization.
Compilers optimise away significantly more complex things than initialising a variable twice.
> I'm looking for a c-like language where the compiler is free to assume that an uninitialized variable is anything it wants - but it must be consistent about it. So, for example, "return x - x", where x is an integer, will always return zero.
So you're looking for C? Because whatever garbage's in there will be the same garbage in both references to x. I have issues with your qualification of that as "sane" though.
In C, the compiler is perfectly allowed to compile "return x - x" with x uninitialized into "return 1". Accessing an uninitialized variable in C is undefined behavior. The compiler can do anything. Nasal demons.
> Compilers optimise away significantly more complex things than initialising a variable twice.
Right, but there are cases where being required to initialize a variable to, say, 0, is not equivalent to being required to initialize a variable to <some arbitrary value>.
For instance, "if (x == 0) y = someExpensiveFunction();" with x being uninitialized. With the "must be initialized to a specific constant" this sort of thing cannot be optimized, whereas with "must be consistent" this can be compiled down to nothing.
> In C, the compiler is perfectly allowed to compile "return x - x" with x uninitialized into "return 1". Accessing an uninitialized variable in C is undefined behavior. The compiler can do anything.
Sure. The peephole optimiser will pretty certainly run first and remove the whole expression though.
> For instance, "if (x == 0) y = someExpensiveFunction();" with x being uninitialized. With the "must be initialized to a specific constant" this sort of thing cannot be optimized, whereas with "must be consistent" this can be compiled down to nothing.
You've noted yourself in another comment that the arbitrary value may be 0. The compiler can make no assumption about what's inside x, so there's absolutely no way to optimise this away, whereas with a specified constant the compiler knows statically whether the test will or will not succeed.
> The peephole optimiser will pretty certainly run first
That's what I'm trying to avoid, though. The whole "pretty certainly" thing. The compiler will "pretty certainly" not optimize this. The compiler will "pretty certainly" optimize that.
And then compiler versions change and people get bitten. Time and time again.
You're misinterpreting what I'm trying to say w.r.t. uninitialized variables here, or perhaps I'm just being idiotic at explaining myself.
In what I'm describing, the compiler can assume the value of x to be anything it wishes. It's just that, once the value is fixed, it must be fixed.
Think of it this way: the compiler may choose the initial value of any variable it likes.
I'm proposing a language where a read from an uninitialized variable must be consistent with the uninitialized variable having been set to an arbitrary value (in range) before the first read from it. So compiling the static code {int x; return x;} into {return 0;} is perfectly allowed, as is compiling it into {return 42;}.
>The peephole optimiser will pretty certainly run first and remove the whole expression though.
Which is pretty dangerous.
RunCriticalCodeA();
if (x) {
y = 3;
RunCriticalCodeB();
} else {
y = 4;
RunCriticalCodeB();
}
OtherCriticalCodeC();
Fail to initialize x and suddenly all that code is replaced with a return. Even the part outside the if, even the part before the if. Compilers already exist that are this aggressive.
Yep. Optimization by C compilers has gotten to the point of "your worst enemy is the compiler". (My favorite: I had UB in my debug function, but inside an #ifdef DEBUG block for speed purposes. So I was getting the most absurd magic errors, which only happened when I was trying to debug.)
As I've said, I want reads from uninitialized variables to be at least consistent.
if (x) {
foo();
} else {
foo();
}
should always run foo(). Ditto:
foo();
if (x)
bar();
baz();
should always run baz() (assuming everything before it is "normal" - not going to get into jumps or system exit here), should always run foo() if it runs baz(), and should always run foo().