Hacker News
RFC: Automatic variable initialization (llvm.org)
34 points by ndesaulniers 10 days ago | 34 comments





It's a pity the flag -ftrivial-auto-var-init=zero is eventually going to go away (there is -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang). I understand their motive, but it would be great to use that flag and others like -fno-strict-aliasing etc. to make a "safe and friendly C" dialect. I.e. turning undefined behavior into "obvious" behavior when possible.

I thought your writing `-enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang` was just satire. Nope. It's a little humorous that it's a real flag. https://reviews.llvm.org/D54604?id=174471

Already in Lisp for decades: (let (x y z) (list x y z)) -> (nil nil nil).

However, nil has a distinct type. Thus:

  (let (x)  ;; not initialized
     ;; ...
     (when condition
       (setf x ...))
     (numeric-code x)) ;; still blows up if condition was false
In other words, this initialization doesn't create a big risk of hiding errors due to initialization with a correctly typed bogus value that could be inappropriate.

nil is the empty list. The risk of a bug where a local variable holds an empty list that should have been initialized to some non-empty list seems low.

nil is also Boolean false. The risk of a bug where a local Boolean variable was implicitly initialized to false, but should have been true, also seems low.


Even if projects choose not to adopt this (~3% performance impact might be too much?) I'd love to have this as a sort of "lightweight address/UB sanitizer" to check for uninitialized variables.

It actually accomplishes the opposite: hiding uninitialized values from tools such as Valgrind. If you're trying to find uninitialized variables, then you would specifically not enable this flag, so that the detection can work.

This is why I intentionally leave variables uninitialized rather than put bogus initialization values. Example:

    int x;
    if (cond) {
        x = foo();
    } else {
        x = bar();
    }
Some people, when they see my code, have the urge to modify it so that x is initialized to 0 before the branches. But this hides bugs. If, for example, I later made the if-else chain more complicated and one of the branches failed to initialize `x`, it's better to find out with Valgrind than to have a hard-to-find bug where an invalid value of 0 was used in an unexpected place.

This would be a flag to enable on release builds; a bug mitigation technique rather than a debugging technique.


> Some people, when they see my code, have the urge to modify it so that x is initialized to 0 before the branches. But this hides bugs.

Something like Java's final would be useful in such a scenario.

I write code which looks exactly like this, except I only do this for final variables - the compiler throws an error if I add a branch which doesn't explicitly set it.


This is hard to do in C/C++:

   int val;
   if (do_something(&val)) { ... }
Is val initialized or not? Note that passing in addresses of values is the standard way of getting around the fact that C/C++ does not support multiple return values.
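One way around the out-parameter ambiguity, at least in C++17 and later, is to return the value and its presence together. This is a hedged sketch, not anything from the RFC; `parse_port` is a made-up example function:

```cpp
#include <optional>

// Instead of `bool do_something(int* val)`, return std::optional so
// "did it produce a value?" is explicit at the type level and there is
// no half-initialized `val` to reason about.
std::optional<int> parse_port(int raw) {
    if (raw >= 0 && raw <= 65535) return raw;
    return std::nullopt;  // no value, and no uninitialized variable either
}
```

The caller is then forced to check before use (`if (auto p = parse_port(x)) { ... }`), which recovers most of the definite-assignment discipline being discussed.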

> the compiler throws an error if I add a branch which doesn't explicitly set it.

That's pretty useful.


It is! It's called definite assignment:

https://docs.oracle.com/javase/specs/jls/se10/html/jls-16.ht...

Java was my first language, so I took it for granted; I was alarmed when I found out that C and C++ don't have it. It's saved my bacon too many times to count.

FWIW, Rust does have this rule; this code:

    fn show(x: usize) {
        let message;
        if x % 2 == 0 {
            message = "even";
        }
        println!("{} is {}", x, message);
    }
is rejected with error E0381:

https://doc.rust-lang.org/stable/error-index.html#E0381


I am not up to date with latest C++ developments but I wonder if there is something you could build with template magic that would provide this in C++.
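You can't get true compile-time definite assignment out of templates alone, but a thin wrapper type can at least turn "read before write" into a deterministic runtime failure instead of undefined behavior. A minimal sketch, assuming a hypothetical `DefiniteInit` wrapper (not a standard or proposed facility):

```cpp
#include <cassert>
#include <optional>

// Hypothetical wrapper: tracks whether a value was ever assigned, and
// asserts on read-before-write instead of handing back garbage.
template <typename T>
class DefiniteInit {
    std::optional<T> value_;
public:
    DefiniteInit& operator=(T v) { value_ = v; return *this; }
    operator T() const {
        assert(value_.has_value() && "read of unassigned variable");
        return *value_;
    }
};

int classify(int x) {
    DefiniteInit<int> result;
    if (x % 2 == 0) result = 0;
    else            result = 1;
    return result;  // would assert here if a branch above forgot to assign
}
```

It's a runtime check, not the compile-time error Java's definite assignment gives you, and it costs a flag per variable; but it fails loudly at the exact read site.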

Unfortunately, most CPUs aren't Valgrind. You have to port your code to GNU/Linux user space to take advantage of Valgrind. (Evidently, Solaris and Darwin are supported now, as well as Android.)

By the way, Valgrind has an API with which variables that have been reliably cleared to zero can be marked as de facto uninitialized.

However, in this situation, that requires integrating use of the Valgrind API into the compiler, which would have to generate code at function entry points not only to clear the locals, but also to make the Valgrind API call that marks the memory.

But then this will of course yield false positives from Valgrind when programs rely semantically on the initialization for correctness and not just to have predictable behavior for a forgotten initialization.
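The client-request mechanism being described is `VALGRIND_MAKE_MEM_UNDEFINED` from `<valgrind/memcheck.h>`. A hedged sketch of how code (here done by hand, not by the compiler) could zero memory yet keep Memcheck's detection; the fallback macro is my own addition so the snippet builds without Valgrind's headers installed:

```cpp
#include <cstring>

// Use the real client request when the Valgrind headers are available;
// otherwise define it away so the code still compiles and runs.
#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#  endif
#endif
#ifndef VALGRIND_MAKE_MEM_UNDEFINED
#  define VALGRIND_MAKE_MEM_UNDEFINED(addr, len) ((void)0)
#endif

int demo() {
    int locals[4];
    std::memset(locals, 0, sizeof locals);               // predictable contents...
    VALGRIND_MAKE_MEM_UNDEFINED(locals, sizeof locals);  // ...but Memcheck treats them
                                                         // as uninitialized again
    // Under Valgrind, this read is reported as a use of uninitialized
    // memory, even though at run time it is a well-defined zero.
    return locals[0];
}
```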


> Evidently, Solaris and Darwin are supported now, as well as Android.

Not the latest version, though; support is still only offered for macOS Sierra.


Being able to run programs under Valgrind is kind of a luxury. Kernel / embedded developers can't really do that.

If you have faith your compiler will warn/error on use of uninitialized variables (and GCC/Clang seem to warn about it pretty reliably these days, but C is an old language and compilers certainly have not always warned about it), go ahead and continue that practice. In my experience, when comparing accidental zero initialization to accidental undefined behavior, I think usually the accidental zero is easier to understand and correct when your tools are (maybe!) GDB. If you have valgrind, that's a different story.


> Being able to run programs under Valgrind is kind of a luxury. Kernel / embedded developers can't really do that.

But they can run large parts of the code in tests, and those tests can be run under Valgrind.


Well, it's theoretically possible. And that would be nice. But it isn't the reality.

> It actually accomplishes the opposite; hiding uninitialized values from tools such as Valgrind.

My point is that this is a useful tool when I can't use Valgrind or any other sanitizer, either for performance reasons or because they don't exhibit the undefined behavior I was trying to track down. And I wouldn't initialize to zero: I'd use pattern initialization to 0xfa or whatever is the standard "lowest-probability byte that represents uninitialized memory but is still likely to fault or produce blatantly wrong results".


The debug builds in VC++ used to initialize memory with a certain pattern. That was pretty useful to see problems in the debugger.

I think in general a defined default value is a good thing. Maybe it shouldn't be 0 but something like INT_MIN, so you can clearly see when something is wrong.


0xAA, 0xA5, 0x5A, 0x55 are generally good values.

They tend to be low-probability values so if you get two of them together it's almost always a bug. They don't tend to collide with other things and are far enough away from overflow that they don't accidentally get masked by an off-by-one bug. They have interleaved 0's and 1's so that if you get a bit error, it sticks out. They're also obviously different enough that if they get copied, they kind of stick out.

I tend to "initialize" my data to these by default. They also tend to be my go-to for "deinitializing" data.

It's remarkable how often I catch bugs in my C code simply by scanning RAM for those values.
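The practice described above can be sketched in a few lines. This is my own illustration of the idea, not anyone's production code; `Packet`, `poison`, and `depoison` are made-up names:

```cpp
#include <cstdint>
#include <cstring>

struct Packet {
    std::uint32_t len;
    std::uint8_t  payload[60];
};

// Fill fresh storage with an interleaved-bit pattern so a forgotten
// initialization stands out when scanning RAM or sitting in a debugger.
void poison(void* p, std::size_t n)   { std::memset(p, 0xAA, n); }
// Use a distinct pattern for retired/freed storage so use-after-free
// reads look different from never-initialized reads.
void depoison(void* p, std::size_t n) { std::memset(p, 0x55, n); }
```

After `poison(&pkt, sizeof pkt)`, a field that was never assigned reads as the telltale 0xAAAAAAAA rather than whatever happened to be on the stack, and a pair of adjacent pattern bytes in a dump is almost certainly a bug.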


That approach is great for uninitialized pointers, but not so hot for numeric code.

It's still good to see that all uninitialized ints are something like 12345678 instead of random numbers. Not perfect but helpful.

True! trivial_object.ref_count is 0xDDDDDDE. Oops!

0xdead for int16's and 0xdeadbeef for word+ values? :-D

INT_MIN-like sentinels have a sort of collision with PCI MMIO read failures (timeouts, unsupported requests, etc.), which return all-bits-1 (0xffff for a 16-bit read).


I don't accept the analysis that "uninitialized is better than a bogus initial value because we have a tool which can treat it as a trap value".

Here is my main argument: once the tool is used and discovers the uninitialized variable, a fix will of course be applied: either initializing the variable up front, or assigning a value to it later in a conditionally executed path in order to prevent the uninitialized use on that path. Either way, once this fix is applied, the tool will stop detecting an uninitialized value in the scenario being tested. And the fix can be applied with a bogus value. In other words, our goal is to prevent the use of an uninitialized variable, and the only way to do that is to put a value in it before it is used, and that always carries the risk of a wrong value being put there.

Additional argument: we know that some initial value is wrong only because we have some test which reproduces the situation of that value being set up and used, and that test fails. If we don't have such a failing test, it doesn't help us to leave the variable uninitialized, if our goal is to establish a correct value for that situation for which that test is missing. The Valgrind tool will certainly inform us that the variable is uninitialized, but in reference to a different test. Our debugging effort will establish a correct value for that test, without any guarantee that the value is correct for the untested situation.

Correctness is not achieved until all execution paths access a stable value of the variable. We can't magically make the jump from "uninitialized" to "correct initial value" with zero risk of "bogus value". The tool which detects uses of uninitialized data is silenced once we supply a value for that data; it doesn't help with the issue of choosing the correct value.

So basically the "don't initialize" argument is implicitly based on the belief that there is no way to detect a bogus initial value, except for the end user of the software to discover a malfunction in the field. And that, magically, even though we can't reproduce the consequences of a bad initial value in development, it helps us to instead leave the variable uninitialized and run everything under Valgrind; we will then be alerted to the uninitialized variable, and not make a mistake in choosing a value for it, even though we still don't have a test case which distinguishes bogus from non-bogus value.

In summary, it is not a credible proposition that the initial coding (with no test case) carries a risk of choosing a wrong initial value, whereas post-mortem fixing of an uninitialized variable bug (still with no test case) does not carry a risk of choosing a bogus initial value.


Sure, but why the "trivial" prefix? Doesn't sound too reassuring then.

I'm assuming the "trivial" refers to "trivial" types.

Sure, but it's unclear what the benefit of leaving auto C++ objects uninitialized is.

IIRC, auto C++ objects aren't uninitialized, they're default-initialized, but said default-initialization may leave them with uninitialized fields. I would hope that this new automatic variable initialization would also apply to those fields of default-initialized auto C++ objects.

Built-in types like int are also "objects"; only C++ class objects can have constructors that initialize them.

Auto declarations don’t mean uninitialised; they get default initialisation, which for many types means “undefined”.
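The distinction in this subthread is between default-initialization and value-initialization. A minimal sketch (my own example, not from the RFC):

```cpp
struct Widget {
    int id;  // trivial member with no default member initializer
};

// Default-initialization: `Widget w;` leaves `id` indeterminate, so the
// caller must assign before any use.
Widget make_assigned() {
    Widget w;
    w.id = 7;
    return w;
}

// Value-initialization: the {} syntax zero-initializes trivial members,
// so `id` is a well-defined 0.
Widget make_value() {
    return Widget{};
}
```

This is why `Widget w;` can hold garbage while `Widget w{};` cannot, and why the proposed automatic initialization would ideally cover the indeterminate fields left behind by default-initialization as well.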

I should have been more clear. What's the benefit of leaving automatic storage C++ object memory uninitialized?

It's fast?

The proposal already accepts a performance tradeoff to initialize the memory of uninitialized "trivial" variables. I'm not sure why you would want to draw a line in the sand at trivial variables only.

What makes non-trivial uninitialized object memory higher penalty or lower risk?


Say I create a struct that's a thousand bytes large. Making sure that it is properly zeroed is a large performance penalty.

Putting a huge struct on the stack is going to be a bad time for multiple other reasons, with or without this initialization.


