
On C-optimizing compilers removing code that has undefined behavior - mpweiher
http://www.yodaiken.com/2017/06/16/the-c-standard-committee-effort-to-kill-c-continues/
======
svckr
The author quotes an article referring to using uninitialized memory for
additional entropy and that article cites [10].

Let me just say: DON'T do that, ever. Uninitialized memory is rarely random.
If you're lucky, it's quasi-constant because your own program always wrote
something else to it earlier. Worst case, an attacker manages to write
something to that memory before you read it, thereby controlling part of your
seed. The cited article itself concludes: "In short, just don’t use
uninitialized memory for more randomness."

Really, exploiting undefined behaviour in crypto code is a bad idea, and you
should make sure you compile with flags that at least catch the most common
cases, like the one mentioned in the post.
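
For reference, a minimal sketch of seeding from the operating system instead
of from uninitialized memory. It assumes a POSIX-like system that exposes
`/dev/urandom`; the function name `fill_seed` and its path parameter are
hypothetical, chosen only for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Fill buf with len bytes read from an entropy device such as /dev/urandom.
 * Returns 0 on success and -1 on any failure, so the caller never falls
 * back to whatever bytes happened to be in buf. (Hypothetical helper.) */
int fill_seed(const char *path, unsigned char *buf, size_t len)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    size_t got = fread(buf, 1, len, f);
    fclose(f);
    return got == len ? 0 : -1;
}
```

A caller would use `fill_seed("/dev/urandom", seed, sizeof seed)` and treat -1
as fatal. On the compiler side, GCC and Clang enable their uninitialized-use
warnings with `-Wall -Wextra`, and Clang's `-fsanitize=memory` reports reads of
uninitialized memory at run time; neither is guaranteed to catch every case.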

[10] [http://kqueue.org/blog/2012/06/25/more-randomness-or-less/](http://kqueue.org/blog/2012/06/25/more-randomness-or-less/)

~~~
pjmlp
Actually, some well-known researchers are openly against UB, but C compiler
writers don't seem to be willing to listen to them.

[https://blog.regehr.org/archives/1054](https://blog.regehr.org/archives/1054)

~~~
svckr
I can relate to that. I guess undefined behavior makes it somewhat harder to
write correct software. I've definitely read someone make the same point.

The problem is, you can't always know if (for example) you're reading from an
uninitialized variable, unless you run the program with the possibly infinite
number of valid inputs, thereby also solving P != NP.

In my opinion, undefined behavior is what we have, and we have had enough time
to learn how to deal with it. Wherever possible, compilers should detect UB,
so I can use compile flags to make UB an error. Changing the language to
"define" undefined behavior might turn it into unexpected behavior, and I
wouldn't like that. At least we've learned the lesson with newer languages.
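
A small sketch of why this is hard to see by inspection: whether a variable is
read uninitialized can depend on the input. The function `parse_level` is
hypothetical. The version below initializes at the declaration so every path
is defined; without that initializer, the unrecognized-input path would return
an indeterminate value.

```c
#include <string.h>

/* Map a textual log level to a number. The initializer on `level` is the
 * whole point: without it, any input other than the three known strings
 * would reach the return with `level` never written. (Hypothetical helper.) */
int parse_level(const char *s)
{
    int level = -1;   /* defined result for unrecognized input */

    if (strcmp(s, "debug") == 0)
        level = 0;
    else if (strcmp(s, "info") == 0)
        level = 1;
    else if (strcmp(s, "error") == 0)
        level = 2;

    return level;
}
```

GCC and Clang can promote their uninitialized-variable warning to an error
with `-Werror=uninitialized`, but that diagnostic is best-effort and does not
cover every case.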

~~~
pjmlp
> At least we've learned the lesson with newer languages.

The thing is, we already knew that from older languages as well. C's UB is a
consequence of the people involved in ANSI C89 not wanting to fully define the
language, like the other languages were doing.

I think the lesson is that "Performance trumps correctness" is the wrong path
to follow, especially since not everyone using C is a 10x coder who always
follows best practices.

------
andreyv
> The programmer has clearly attempted to set x[0]=0

Then why not just write it like that? This is like deliberately shooting
yourself in the foot and complaining that your shotgun works.
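
To spell that out, a sketch of two well-defined ways to end up with a zeroed
first element, neither of which reads the old contents. The helper names are
made up for the example.

```c
#include <stdlib.h>

/* Allocate n ints and set only the first one to zero by plain assignment.
 * The remaining elements stay uninitialized and must not be read. */
int *alloc_first_zero(size_t n)
{
    int *x = malloc(n * sizeof *x);
    if (x != NULL && n > 0)
        x[0] = 0;          /* a store, not a read-modify-write */
    return x;
}

/* Allocate n ints with every element zeroed. */
int *alloc_all_zero(size_t n)
{
    return calloc(n, sizeof(int));
}
```

Either way the compiler has nothing indeterminate to reason about, so there is
nothing for the optimizer to remove.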

~~~
sddfd
Exactly. A compiler cannot guess what a programmer intended, but has to stick
to the language semantics.

Also, the article goes on to argue that uninitialized memory is used to seed
random generators, which is generally undefined behavior. I don't think this
should be a valid use-case.

~~~
GedByrne
So go use rust and leave C for those who think it should.

~~~
JCzynski
I will communicate cryptographically with people using C, so no.

------
wfunction
Uhm, I'm pretty sure this page is wrong, and the compiler cannot optimize this
out. An "indeterminate" value is either an unspecified value or a trap
representation. Like they said, the value here cannot be a trap
representation, so it's just an unspecified value. An unspecified value is
merely an unknown value, NOT a dynamically-mutating value, and NOT a trap
representation. So XORing an unspecified value with itself must result in
zero, not undefined behavior. (UB would occur with a trap representation,
which this isn't.)

~~~
szemet
The first answer here says (with some quotations) that it depends on whether
the variable could have been declared as register or not.

[https://stackoverflow.com/questions/11962457/why-is-using-an...](https://stackoverflow.com/questions/11962457/why-is-using-an-uninitialized-variable-undefined-behavior)

Edit: I realized that this does not apply, as the original example is about an
array element...

------
bluejekyll
> Despite the best efforts of developers of rival programming languages

This is a flippant statement. It could be argued C++ falls into this category,
but others are so new that they just haven't had the time to grow into the
communities that would benefit from them.

> C’s advantages have preserved it as an indispensable systems and
> applications programming language.

But what about the disadvantages? I love C, I will always have fond memories
of using it to do mind-bending things, but given the choice between safe
languages and an unsafe one, picking the unsafe one at this point in time is a
mistake.

Where there is an option to use a safe language instead of C, that option
should always be chosen. It could potentially save lives depending on where
it's deployed; it will definitely save money in the long term with fewer bugs.

C is not the only low cost abstraction and peak performance language available
out there anymore.

~~~
pjmlp
> C is not the only low cost abstraction and peak performance language
> available out there anymore.

It never was, it just happened to become one thanks to UNIX's adoption by the
enterprise market.

Back in the 80's it was just yet another systems programming language fighting
for a spotlight, and any junior Assembly developer could write better code
than C compilers for 8 and 16 bit systems.

Oh, the pleasure of using Turbo Basic, Turbo Pascal and Modula-2 and not
having to deal with the unsafety issues of C.

------
panic
What CPUs in common use still have trap representations? Are there other
reasons why reading from an indeterminate value can't give an arbitrary but
well-defined result?

~~~
barrkel
x86, in the form of signalling NaNs, when the appropriate FPU flag is set.
Some languages feel it's better to fail a computation early and loudly than to
silently proceed with a cascade of non-signalling NaNs possibly getting as far
as the screen or database. Delphi, for one. C will see this if it's in a
library linked with code in such a language.
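
A rough sketch of that "cascade": a quiet NaN passes through ordinary
arithmetic without any error being reported. This assumes IEEE 754 doubles and
a build without options like `-ffast-math`; the function names are invented
for illustration.

```c
/* Produce a quiet NaN at run time. The volatile operand keeps the compiler
 * from folding the division away at compile time. */
double make_quiet_nan(void)
{
    volatile double zero = 0.0;
    return zero / zero;
}

/* NaN is the only value that compares unequal to itself. */
int is_nan(double v)
{
    return v != v;
}

/* A toy computation: a NaN input passes through every arithmetic step. */
double scale_and_offset(double v)
{
    return v * 2.0 + 1.0;
}
```

Here `scale_and_offset(make_quiet_nan())` is still a NaN, and nothing along the
way stops the program unless the caller checks with something like `is_nan`.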

~~~
panic
Aren't signaling NaNs defined to trigger a floating point exception, not
invoke undefined trap-representation behavior?

EDIT: I found this proposal
([http://www.cl.cam.ac.uk/~pes20/cerberus/n2091.html](http://www.cl.cam.ac.uk/~pes20/cerberus/n2091.html))
which has some discussion about existing trap representations (including
whether to consider signaling NaNs to be trap representations). The conclusion
seems to be that segmented pointers on the Motorola 68k are the only place
trap representations are truly necessary -- maybe it's time to just remove
them.

------
pjmlp
School of thought of programmers in the Algol and ML-like family of systems
programming languages, "Correctness trumps Performance".

School of thought of programmers in the C language family of systems
programming languages, "Performance trumps Correctness".

At least in the C++ camp, which has quite a few Algol family refugees, there
is some effort to remove some UB from the standard, but not all of it.

In summary, pick your side and don't be surprised by what you get in return.

------
mcherm
I am 100% in agreement with the C standard committee on this one. Treating all
use of uninitialized memory as undefined behavior is a reasonable limitation
that gives compiler writers a great deal of flexibility to improve
performance. If the only loss is the ability to use uninitialized memory as a
source of entropy for random number generators, then that is an incredibly low
price to pay for the increased performance.

~~~
GedByrne
Then why isn't the code being disallowed? Why is it being silently removed in
the middle of the night through an optimization back door?

~~~
pornel
It is disallowed by the standard.

It's not disallowed by compilers, because they're deficient. In this case I'd
guess the compiler front-end doesn't track which values in an array are
initialized, and by the time the optimizing back-end sees the UB it's too late
to emit an error.

------
dom0
You don't get to write C _and_ complain about UB breaking your code at the
same time.

~~~
kiriakasis
You do, in my opinion. I don't understand why UB is so fundamental to C; in
this case, wouldn't a compiler warning or error be better?

~~~
wfunction
UB is pretty damn fundamental to C. Not many people realize it though. It's
_the_ fundamental difference between C and C++ compared to Python or C# or
Java or Rust or whatever.

~~~
vyodaiken
Wrong. Indeterminate is fundamental to C. UB is a hack.

~~~
wfunction
Cool, you've convinced me now.

------
szemet
Undefined behaviour was always there to let compilers optimize as they like
(nowadays usually for speed or to ease porting to specific architectures) -
that is the essence of the C "portable assembly" mentality and it always was.
Am I wrong?

I totally agree that this is a bad mentality for software development in
general, but at least it is the "authentic C way", so I feel that part of the
criticism is ungrounded...

~~~
xorblurb
Nope. The age of the sufficiently "advanced" (actually: retarded?) compiler
was only theoretical during most of the lifetime of C, and UB were actually
defined mostly because of differences between processors, NOT compilers.
Compilers were faster thanks to their backend.

The standard even says: "undefined behavior: behavior, upon use of a
nonportable or erroneous program construct or of erroneous data, for which
this International Standard imposes no requirements"

However, today, you should act as if the nonportable part of this statement
doesn't exist anymore, and act as if ALL UB are absolutely forbidden and result
in the worst non-deterministic consequences, in all cases and regardless of
your actual target. Meaning that today on e.g. x86, if you use a mainstream
compiler, you ARE limited by some of the limitations of e.g. some obscure
outdated DSPs.

And this is likely impossible to check in a non-trivial program.

So just use another language. C for serious purposes is dead. (Some projects
that started in that language continue to be developed in it, but given the
security issues it creates, this will become unacceptable in the not too
distant future; so don't wait until it is too late to switch: lead that
movement.)

------
minipci1321
Maybe the committee is trying to kill the language. Or maybe it is trying to
drive out of the field the people who routinely abuse the language and try to
outsmart the compiler. Those people will now have to try harder, and maybe at
some point this additional time lost on debugging and validation will start
showing up in teams' stats. I say -- good thing.

------
jwilk
I get a database error... Here's an archived copy:

[https://archive.is/JmzfZ](https://archive.is/JmzfZ)

