
Defining the Undefinedness of C (2015) [pdf] - gbrown_
http://fsl.cs.illinois.edu/FSL/papers/2015/hathhorn-ellison-rosu-2015-pldi/hathhorn-ellison-rosu-2015-pldi-public.pdf
======
iainmerrick
Here's something I'd love to know about undefined behavior in C: is this
something specific to C, or is it something that _any_ similar language would
have to contend with?

It seems like problems crop up when you combine a fairly low-level language
with an emphasis on performance, a specification that explicitly calls out
implementation-defined and undefined semantics, and very highly optimizing
compilers.

None of the "safer C" languages seems to have gained much traction. Is that
because the problem really is intractable, or just because maintaining a
decent level of compatibility with C is too hard? The major selling point of a
safer C is being able to reuse lots of existing C code, after all.

Note that other languages probably have the same pitfalls, just latent for now
-- their specs are silent about undefined operations, or they don't have specs
at all, and their compilers don't optimize as aggressively as GCC or Clang.

~~~
kps
I think it was an unintended consequence of the wording chosen by the ANSI C89
committee. The reality of pre-ANSI C was that if you wrote ‘+’, the
expectation was that you'd get the target machine's ‘add’ instruction, no more
and no less. It might overflow, it might not; it might crash or hang your
machine on overflow or trap values — but that is all your problem, not the
language's or compiler's. I worked on a commercial compiler at the time, and
at the time, people took ‘undefined behaviour’ as acknowledgement that
sometimes the effect of generated code could be surprising to people
unfamiliar with the particular system, not as encouragement to screw the
programmer.

~~~
simias
That's a good explanation for invalid memory accesses and divisions by zero,
but I'm not aware of many architectures where addition overflow traps (it
_can_ trap on MIPS, but there's another instruction that simply wraps around).
That being said, I don't have an encyclopedic knowledge of instruction sets.

I could be wrong, though; after all, compilers back then were a lot less
clever than they are now, so maybe they couldn't really make use of this
information. I'd be curious to hear a first-hand account of why certain kinds
of UB exist in the first place.

~~~
tom_mellior
This is not first hand, but the commonly given reason for undefined signed
integer overflow is that it allows for mathematically correct reasoning: x < x
+ 1 is true for all x. Or rather, the compiler may assume that it is true and
optimize accordingly, since the programmer is trusted to avoid the undefined
overflowing case.

~~~
iainmerrick
"Mathematically correct" doesn't seem like the right way to think about it,
since the whole problem is that it isn't correct!

~~~
tom_mellior
Yes, but it wouldn't be correct the other way round either. Wraparound on
signed overflow is a well-defined operation on CPUs, but it is nonsensical. In
almost no computation will it give a result that is meaningful in the context
of what the program is trying to do. If you try to compute the average of two
positive numbers as (a + b) / 2 but get a negative result, it would be of no
use if this were "well-defined".

So from the point of view of a C compiler that exploits undefined signed
overflow, if your code could overflow, it was _probably_ already buggy, and
_you_ should have fixed it.

(One problem with this line of reasoning is that it kind of also applies to
unsigned numbers, but there C goes the other way.)

------
nickpsecurity
That team’s awesome. One of the few groups in formal methods using rewriting
logic (Maude) instead of things like Coq or Isabelle/HOL. They seem to move
faster on semantics as a result. They also built their own logic, called
matching logic, on top of it, which they claim is better than separation logic.

[http://www.kframework.org/index.php/Main_Page](http://www.kframework.org/index.php/Main_Page)

More interesting, their use of these tools allowed them to make their C
semantics executable in a style similar to GCC:

[https://github.com/kframework/c-semantics](https://github.com/kframework/c-semantics)

I want this group to do one for Rust and SPARK as a reference spec for
certifying compilers for those languages. It would also be useful for the
diverse-compilation concept for countering Karger-style compiler subversion.
also have a company that they use to fund and apply their work:

[https://runtimeverification.com/](https://runtimeverification.com/)

~~~
tom_mellior
> using rewriting logic (Maude) instead of things like Coq or Isabelle/HOL.
> They seem to move faster on semantics as a result.

This statement is close to nonsensical. What are you comparing it to? Where is
a comparable executable semantics project in Coq or Isabelle?

This is great work, and they are definitely using the right tool for the job.
But the field of "formal methods" is enormous, and you seem to be saying that
apples are better than oranges.

~~~
solidangle
Here is a Coq formalization of C11:
[http://robbertkrebbers.nl/research/ch2o/](http://robbertkrebbers.nl/research/ch2o/)
It covers only a large fragment of C11, but they formalized the operational,
axiomatic, and executable semantics and proved that these correspond to each
other.

~~~
tom_mellior
Yes, and there are other formalizations as well. But none of them (as far as I
am aware) formalize the _undefined_ parts in the way the featured article
does. I think comparing projects with different goals and then saying "the
differences must be due to the tools used" isn't solid reasoning.

------
fundabulousrIII
My good god. Do you actually believe that this scenario reflects a weakness in
C or a C compiler?

~~~
lotyrin
Yes?

In the grandparent's solution, I should be able to, e.g., call
collection.map, and the type system and/or hints from me should tell the
compiler whether there are side effects, whether runtime bounds checks are
needed, and whether this is something that can and should be turned into SIMD
or concurrent threads. OR I should be able to write a mildly portable
assembly-style algorithm using intrinsic functions that represent precisely
what I want the machine to do, and expect that precisely that is what will be
done (and deal with porting it to whatever platforms I support). In neither
case is there a concept of undefined behavior.

Modern C/C++ sits at a level of abstraction where I'm writing explicit
instructions for a hypothetical machine; the compiler treats the explicit
details of my implementation as implicit suggestions in order to map what I've
asked for onto what the hardware can actually do, and it's up to me to be
aware of the consequences.

It's not the worst problem of course, but it's not a non-problem.

~~~
fundabulousrIII
Your approach is emblematic of programming as a culture of production versus a
culture of understanding. I don't agree with any approach that characterizes
a language as good or bad based on its ability to read minds and make
decisions based on the unknown.

When you mandate safe behavior based on any expert rules-based or heuristic
system, you have to leave the unsafe option open, which is _just_ as bad.

