* Ada can do so using the Normalize_Scalars pragma (though by default it'll just read whatever garbage is at that location); however, the initialisation value is implementation-defined (it's recommended that the value be invalid, so that an access to an uninitialised value faults)
* Rust requires that all control paths initialise variables
Both Clang and GCC can warn about potentially uninitialised variables (-Wuninitialized in Clang, -Wmaybe-uninitialized in GCC; both are included in -Wall)
Initialization is not the same thing, though, for everything but Ada.
I'm not looking for a C-like language where variables must be defined / are implicitly initialized to a constant. Those have problems, among other things the cost of double-initialization.
I'm looking for a C-like language where the compiler is free to assume that an uninitialized variable is anything it wants - but it must be consistent about it. So, for example, "return x - x", where x is an integer, will always return zero.
I'm happy to say your question led me on a very interesting adventure :)
Suppose x is an uninitialized int*. Suppose we do

int *y = x; (*x)++; (*x)--;

Should y - x necessarily be 0, in the language you're looking for? Think carefully. If x had been initialized (and pointed to a valid memory location, say), would y - x necessarily be 0? Surprisingly, the answer is no! It's possible that x could point to itself. In that case, (*x)++; actually changes the value of x itself, so that it is no longer cancelled out by the following (*x)--;
I guess the moral is that what you're asking for is subtler than meets the eye
Interestingly, GCC compiles this:

int test(int *x) {
int *y = x;
(*x)++;
(*x)--;
return (y - x);
}
into this at -O3:
test(int*):
xor eax, eax
ret
Effectively, into:
int test(int *x) {
return 0;
}
Which seems to contradict your adventure. I think it's because x cannot legally point to itself - x is an int *, not a void *. And you can't do arithmetic on a void *, so it's not a problem there either.
Here, GCC optimizes it away. It's not required to do so. In fact, GCC can do whatever it wants in this case and still follow the C standard. It may as well
Yeah, you can't do x = &x. But since you didn't initialize x, its initial value can be whatever is in memory at that particular time (or something else, nasal demons etc.)
If the particular value of x at runtime (there are no type checks in C at runtime) happens to be &x, you can effectively get a weird result.
I think the real answer is that the DCE optimizer understands that y is assigned the same address as x, that x never changes (the arithmetic cancels out), that y never changes, and so the difference between y and x becomes 0.
There are a lot of other ways for that to fail. You could scribble all over the executing code, or your runtime, or part of the operating system....
I don't think anyone would expect a "sane C" to let you scribble all over completely random pointers without harm. But if the pointer doesn't hit any bytes that are already in use, then "sane C" should return 0.
It would be sane to ask that the literal expression "x - x" be optimised to 0, but I would never dream of asking that "y - x" be optimised to 0. Not unless the compiler can prove the values will be exactly the same in all circumstances and no, it's not allowed to just delete code that invokes undefined behaviour.
> I think he's also talking about my mythical C-like language that has somewhat sane rules for these things.
Oh, OK.
It's important to realize, though, that if things get defined down too strictly, compilers will be hard put to generate efficient code for more than one or two closely related families of processors. That matters in a language designed to be useful for programming rather heterogeneous embedded systems (that is, more 'embedded' than ARM, perhaps with odd word sizes or similar).
Which is why I'm not talking about removing UB altogether, just talking about the more egregious examples of things that cause nasty bugs but aren't particularly problematic to code-gen.
Not to mention that embedded systems end up using no optimization half the time anyway, because compilers go so overboard with performance that good luck trying to write safe code in them.
> Those have problems, among other things the cost of double-initialization.
Compilers optimise away significantly more complex things than initialising a variable twice.
> I'm looking for a c-like language where the compiler is free to assume that an uninitialized variable is anything it wants - but it must be consistent about it. So, for example, "return x - x", where x is an integer, will always return zero.
So you're looking for C? Because whatever garbage's in there will be the same garbage in both references to x. I have issues with your qualification of that as "sane" though.
In C, the compiler is perfectly allowed to compile "return x - x" with x uninitialized into "return 1". Accessing an uninitialized variable in C is undefined behavior. The compiler can do anything. Nasal demons.
> Compilers optimise away significantly more complex things than initialising a variable twice.
Right, but there are cases where being required to initialize a variable to, say, 0, is not equivalent to being required to initialize a variable to <some arbitrary value>.
For instance, "if (x == 0) y = someExpensiveFunction();" with x being uninitialized. With the "must be initialized to a specific constant" this sort of thing cannot be optimized, whereas with "must be consistent" this can be compiled down to nothing.
> In C, the compiler is perfectly allowed to compile "return x - x" with x uninitialized into "return 1". Accessing an uninitialized variable in C is undefined behavior. The compiler can do anything.
Sure. The peephole optimiser will pretty certainly run first and remove the whole expression though.
> For instance, "if (x == 0) y = someExpensiveFunction();" with x being uninitialized. With the "must be initialized to a specific constant" this sort of thing cannot be optimized, whereas with "must be consistent" this can be compiled down to nothing.
You've noted yourself in another comment that the arbitrary value may be 0. The compiler can make no assumption about what's inside x, so there's absolutely no way to optimise this away, whereas with a specified constant the compiler knows statically whether the test will or will not succeed.
> The peephole optimiser will pretty certainly run first
That's what I'm trying to avoid, though. The whole "pretty certainly" thing. The compiler will "pretty certainly" not optimize this. The compiler will "pretty certainly" optimize that.
And then compiler versions change and people get bitten. Time and time again.
You're misinterpreting what I'm trying to say w.r.t. uninitialized variables here, or perhaps I'm just being idiotic at explaining myself.
In what I'm describing, the compiler can assume the value of x to be anything it wishes. It's just that, once the value is fixed, it must be fixed.
Think of it this way: the compiler may choose the initial value of any variable it likes.
I'm proposing a language where a read from an uninitialized variable must be consistent with the uninitialized variable having been set to an arbitrary value (in range) before the first read from it. So compiling the static code {int x; return x;} into {return 0;} is perfectly allowed, as is compiling it into {return 42;}.
>The peephole optimiser will pretty certainly run first and remove the whole expression though.
Which is pretty dangerous.
RunCriticalCodeA();
if (x) {
y = 3;
RunCriticalCodeB();
} else {
y = 4;
RunCriticalCodeB();
}
OtherCriticalCodeC();
Fail to initialize x and suddenly all that code is replaced with a return. Even the part outside the if, even the part before the if. Compilers already exist that are this aggressive.
Yep. Optimization by C compilers has gotten to the point of "your worst enemy is the compiler". (My favorite: I had UB in my debug function, but inside an #ifdef DEBUG block for speed purposes. So I was getting the most absurd magic errors, which only happened when I was trying to debug.)
As I've said, I want reads from uninitialized variables to be at least consistent.
if (x) {
foo();
} else {
foo();
}
should always run foo(). Ditto:
foo();
if (x)
bar();
baz();
should always run baz() (assuming everything before it is "normal" - not going to get into jumps or system exit here), should always run foo() if it runs baz(), and should always run foo().