
Uninitialized Reads: Understanding the proposed revisions to the C language - nkurz
http://queue.acm.org/detail.cfm?id=3041020
======
pjc50
> but some do it anyway—for example, to create entropy

Ouch! This is a really terrible idea and people should stop doing it in C.
It's a bad entropy source as, unless you're very specific about the hardware,
you have no idea what you're going to get. Maybe it _is_ all zeroes. Maybe
it's a page of a previously executed program.

If you're going to do this, do it in assembler so it's deliberately
unportable.

~~~
mikeash
For those who don't remember, the infamous Debian bug which resulted in weak
keys being generated on Debian systems for a couple of years stemmed from
this.

Using uninitialized memory wasn't an actual problem there, as it was mixed in
with a bunch of other entropy before being used, and those other sources were
sufficient even if the uninitialized memory had predictable contents. However,
tools like Valgrind complained about it, so somebody fixed it, and
inadvertently wiped out almost all of the other entropy sources going into it
as well.

~~~
nullc
The story is even more interesting. OpenSSL did that _and still does it
today_. As a result any program that uses the OpenSSL RNG is full of spurious
valgrind errors (exactly what you want in security critical code...).

It had (and still has) a -DPURIFY flag available at compile time that gets rid
of that stupidity, which is still not default. OpenSSL have not changed it, as
I understand, because of a concern that there might exist some embedded system
somewhere where that is the only useful source of randomness.

A Debian developer saw the errors all over the place and wrote a patch to fix
them. The patch, unfortunately, went too far and managed to knock out the
secure randomness source too. The patch was posted to the OpenSSL list and no
one responded "No don't!" or "Have you seen the already existing -DPURIFY?"...
and the rest is history.

Fedora, at least, adds -DPURIFY, but many other distributions do not.

~~~
mikeash
Oh wow, I didn't realize the option was still there, let alone on by default.
Madness.

The embedded system argument makes no sense to me (understanding that you're
just repeating it, not defending it). If there's no other source of randomness
then there's no good source of randomness, period, and you're screwed. Better
not to even try, then.

------
Animats
The C and C++ standards people are finally trying to deal with their safety
problems, decades too late. At least they're trying. Some of this may be an
attempt to catch up to Rust.

The "trap values" concept is interesting. They're thinking that some floating
point values are signaling NaNs, which will cause an exception if processed.

One way to approach this is to divide types into "fully mapped", where all bit
patterns map to a valid value, and "partially mapped", where they don't. All
the integer types, "char", "short", "int", "long", etc. are fully mapped on
modern machines. "bool" isn't. Floating point isn't. Pointers are not. Enums
are not. Structs are fully mapped if all their fields are fully mapped.

Initializing a fully mapped type with junk won't break the C/C++ machine model.
That's not true of partially mapped types. So this could be enforced in casts.
Casting to fully mapped types could be permitted as at present, but casting to
a partially mapped type should require something visibly unsafe, like C++'s
"reinterpret_cast". In C, turning the raw memory from "malloc" into a
partially mapped type should be a visible act. In C++, all partially mapped
fields should be required to be initialized in the constructor. It's OK at the
language level if fully mapped fields start as junk.
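
A minimal sketch of that distinction in C, assuming code that receives raw bytes from somewhere. The helper names `u32_from_bytes` and `bool_from_byte` are hypothetical, and treating only the bytes 0 and 1 as valid `bool` representations is an assumption about the platform:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Fully mapped: every 4-byte pattern is a valid uint32_t,
   so copying the raw bytes needs no validation. */
static uint32_t u32_from_bytes(const unsigned char raw[4]) {
    uint32_t v;
    memcpy(&v, raw, sizeof v);
    return v;
}

/* Partially mapped: this sketch assumes only the bytes 0 and 1 are
   valid bool representations, so anything else is rejected.
   Returns 0 on success and -1 when the byte is not a valid bool. */
static int bool_from_byte(unsigned char raw, bool *out) {
    assert(out != NULL);
    if (raw > 1) {
        return -1;
    }
    *out = (raw == 1);
    return 0;
}
```

The point of the second helper is that the conversion is a visible, checked step rather than a silent cast.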

Requiring that pointers be initialized would be good for safety. Failure to
initialize a pointer is usually a bug. Sometimes it's an entry point for an
exploit.

~~~
panic
IIRC signaling NaNs aren't trap representations: they're fully-defined values
which support operations like isnan().

~~~
brandmeyer
Correct. For more detail: the only mandated "signaling" for a signaling NaN is
that it sets the sticky Invalid bit in the floating-point status register (aka
Invalid exception). It is up to the host processor as to whether it is
possible to get a signal on changes to the floating-point exception bits. Some
CPUs don't have the ability at all.

------
pornel
D has got it right.

`int x` is initialized to 0 by default, but those who really care can write
`int x = void` to get an uninitialized value.

I think that feature could even be a backwards-compatible addition to C.

~~~
minipci1321
I don't have a strong feeling either way, but sometimes not giving a value to
a variable allows the compiler and static analyser to quite easily detect uses
of uninitialized variables. OTOH, assigning '0', a perfectly valid value, to a
variable initially, might help to conceal bugs when the variable is not
updated to the actual value before use.

We would really need a "not-a-value" for integral types in order to create
them initialized by default, and there is no such thing unfortunately. This is
different for pointers (except in very specific situations) and for floating-
point types, so these should be initialized.

~~~
nullc
The compiler detecting uninitialized use catches a lot of real bugs.

A compromise there would be to define it to be initialized to zero, but leave
uninitialized use still defined as an error in the program (just not one
optimizers can exploit).

Unfortunately, the impact on optimization is also a consideration.

~~~
braveo
In languages that allow you to do things like set the value of enumerations, I
always start with 1. The reason being that an uninitialized variable, bad
outside input, etc, is more likely to get caught as bad input rather than
whatever happens to be in slot 0.
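
A small sketch of that habit in C. The `color` enum, the `pixel` struct and the helper names are all hypothetical; the only idea taken from the comment is that the first valid member is 1, so a zero-filled value fails validation:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical enum whose first valid member is 1, so 0 is never valid. */
enum color { COLOR_RED = 1, COLOR_GREEN, COLOR_BLUE };

struct pixel {
    enum color color;
};

/* Returns nonzero only for the three named members. */
static int color_is_valid(enum color c) {
    return (int)c >= COLOR_RED && (int)c <= COLOR_BLUE;
}

/* Simulates a struct that was zero-filled and never given a real color. */
static struct pixel zero_filled_pixel(void) {
    struct pixel p;
    memset(&p, 0, sizeof p);
    return p;
}

/* Usage: the forgotten field is rejected, a real member is accepted. */
static void color_demo(void) {
    assert(!color_is_valid(zero_filled_pixel().color));
    assert(color_is_valid(COLOR_GREEN));
}
```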

------
CJefferson
I wish one of clang, or gcc, would gain an option to let me initialise any
stack variable to 0 if there is not an explicit assignment to it. I suspect
(and I'd really like to know) that with optimisation, the practical cost of
this would be 0, and a whole bunch of issues would be cleaned up.

For malloc, I can already choose fairly easily to 0 out memory, but here it
can get more expensive if one isn't careful.
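
For the heap case, one way to get zeroed memory is `calloc` from the standard library. This is only a sketch; the helper names `zeroed_ints` and `all_zero` are hypothetical:

```c
#include <assert.h>
#include <stdlib.h>

/* Allocates `count` ints with every byte set to zero.
   Returns NULL if the allocation fails; the caller frees the result. */
static int *zeroed_ints(size_t count) {
    return calloc(count, sizeof(int));
}

/* Returns nonzero if every element of the array is 0. */
static int all_zero(const int *p, size_t count) {
    assert(p != NULL);
    for (size_t i = 0; i < count; i++) {
        if (p[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```

The cost concern is that the zeroing is paid on every allocation, whether or not the caller overwrites the buffer straight away.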

~~~
CoolGuySteve
MSVC detects uninitialized stack variable usage as a run time check and throws
an exception.

I've always wondered why the compiler can't warn about this at compile time.
It _should_ know when uninitialized stack space is being used since it is,
after all, the compiler artificially designating what the stack is!

~~~
gsg
In general it's impossible. Consider this C program:

    
    
        void do_something(int *);
    
        int f() {
            int x;
            do_something(&x);
            return x;
        }
    

Is the return of x a use of uninitialized memory? Maybe. Relink the program and
the answer might change.
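
To make the two possible answers concrete, here is a single-file sketch that stands in for relinking by passing the callee as a function pointer. The names `f_with`, `writes_value` and `leaves_untouched` are hypothetical and not part of the program above:

```c
#include <assert.h>
#include <stddef.h>

typedef void (*int_callback)(int *);

/* One possible definition of do_something: it stores to *p. */
static void writes_value(int *p) {
    *p = 42;
}

/* Another possible definition: it never touches *p. */
static void leaves_untouched(int *p) {
    (void)p;
}

/* Same shape as f() above. The compiler sees only a pointer escaping,
   so it cannot tell which kind of callee it will get. */
static int f_with(int_callback do_something) {
    int x;
    assert(do_something != NULL);
    do_something(&x);
    return x; /* initialized only if the callee stored to x */
}
```

Calling `f_with(writes_value)` returns 42. Calling `f_with(leaves_untouched)` would return an indeterminate value, and nothing in `f_with` itself distinguishes the two cases.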

~~~
smhenderson
Isn't x already initialized to zero as it's static?

I googled it and this[0] seemed relevant:

 _1655 — if it has arithmetic type, it is initialized to (positive or
unsigned) zero;_

[0] [http://c0x.coding-guidelines.com/6.7.8.html](http://c0x.coding-guidelines.com/6.7.8.html)

~~~
gsg
What you want is just a few lines up:

    
    
        1652 If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.
    

(x is an 'auto' variable in the given program.)
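
A short sketch of the two storage durations side by side. The names `file_scope_counter`, `next_id` and `sum_to` are hypothetical:

```c
#include <assert.h>

/* Static storage duration: zero-initialized without an initializer. */
static int file_scope_counter;

/* `calls` also has static storage duration, so it starts at 0
   and keeps its value between calls. */
static int next_id(void) {
    static int calls;
    calls += 1;
    return calls;
}

/* `total` has automatic storage duration, so it must be given a value
   explicitly before it is read. */
static int sum_to(int n) {
    assert(n >= 0);
    int total = 0;
    for (int i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}
```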

~~~
smhenderson
Yeah, that's what I get for replying without making sure I'm correct! :-)

------
nullc
This is the paper on the survey they mentioned:
[http://www.cl.cam.ac.uk/research/security/ctsrd/pdfs/201606-...](http://www.cl.cam.ac.uk/research/security/ctsrd/pdfs/201606-pldi2016-clanguage.pdf)

I'd really like to find a full copy of the survey and the response stats.

------
ridiculous_fish
Fun fact: the fastest way to load an XMM register (that is, SSE) to all bits 1
is to compare that register to itself, essentially:

    
    
        int x;
        x = (x == x);
    

This is indeed what clang and gcc emit for e.g. _mm_set1_epi8(-1). All values
are equal to themselves - well, except NaNs!
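
The NaN exception is easy to check in plain C, assuming IEEE 754 floating point and no `-ffast-math` style flags. The helper name `equals_itself` is hypothetical:

```c
#include <assert.h>
#include <math.h>

/* Returns nonzero when v compares equal to itself. */
static int equals_itself(double v) {
    return v == v;
}

/* Usage: an ordinary value equals itself, a NaN does not. */
static void nan_demo(void) {
    double n = NAN;
    assert(equals_itself(1.5));
    assert(!equals_itself(n));
    assert(isnan(n));
}
```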

~~~
poikniok
Could you explain further? So x = 1 now; what exactly has all bits set to 1?

~~~
gsg
x86-64 vector compare instructions (mostly) produce masks, which are vector-
sized values containing 1 for each bit of the input for which the comparison
is true, and 0 where it is false. Since a vector analogue of x == x would be
true everywhere, it would result in 0b1111....

It's true that the given C code doesn't have those semantics, but you can see
what the GP was driving at if you squint a bit. Uh, and if you know how these
instructions work already.
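
A sketch of the mask semantics, assuming an x86 target with SSE2 for the vector path; other targets fall back to a scalar model of one 32-bit lane. The helper names `lane_cmpeq32` and `self_compare_is_all_ones` are hypothetical, while `_mm_cmpeq_epi32` is the SSE2 intrinsic from `<emmintrin.h>`:

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Scalar model of one 32-bit lane of a vector compare:
   all ones if the inputs are equal, all zeros otherwise. */
static uint32_t lane_cmpeq32(uint32_t a, uint32_t b) {
    return (a == b) ? UINT32_MAX : 0u;
}

/* Returns 1 if comparing a register with itself sets every bit. */
static int self_compare_is_all_ones(void) {
#if defined(__SSE2__)
    __m128i v = _mm_setzero_si128();      /* any contents would do */
    __m128i ones = _mm_cmpeq_epi32(v, v); /* each lane equals itself */
    unsigned char bytes[16];
    _mm_storeu_si128((__m128i *)bytes, ones);
    for (int i = 0; i < 16; i++) {
        if (bytes[i] != 0xFF) {
            return 0;
        }
    }
    return 1;
#else
    return lane_cmpeq32(0u, 0u) == UINT32_MAX;
#endif
}

/* Usage: equal lanes give a full mask, unequal lanes give zero. */
static void mask_demo(void) {
    assert(lane_cmpeq32(7u, 7u) == UINT32_MAX);
    assert(lane_cmpeq32(7u, 8u) == 0u);
    assert(self_compare_is_all_ones() == 1);
}
```

Unlike the scalar `x == x` in C, which yields the int 1, the vector compare yields a full-width mask, which is why comparing a register with itself produces all bits set.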

