
What is C in practice? - 0x09
http://www.cl.cam.ac.uk/~pes20/cerberus/notes50-2015-05-24-survey-discussion.html
======
nkurz
_It remains unclear what behaviour compilers currently provide (or should
provide) for this._

It might be nice if future surveys explicitly asked a followup question
"Regardless of the standard or behavior of existing compilers, is there one of
these answers that is the 'obviously correct' manner in which compilers should
behave? Which one?"

If practically all users believe that the same answer is 'obviously correct',
compiler writers might want to take this into account when deciding which
behavior to implement.

    
    
      For MSVC, one respondent said:
    
      "I am aware of a significant divergence between the LLVM 
      community and MSVC here; in general LLVM uses "undefined 
      behaviour" to mean "we can miscompile the program and get 
      better benchmarks", whereas MSVC regards "undefined 
      behaviour" as "we might have a security vulnerability so 
      this is a compile error / build break". First, there is 
      reading an uninitialized variable (i.e. something which 
      does not necessarily have a memory location); that should 
      always be a compile error. Period. Second, there is reading 
      a partially initialised struct (i.e. reading some memory 
      whose contents are only partly defined). That should give a 
      compile error/warning or static analysis warning if 
      detectable. If not detectable it should give the actual 
      contents of the memory (be stable). I am strongly with the 
      MSVC folks on this one - if the compiler can tell at 
      compile time that anything is undefined then it should 
      error out. Security problems are a real problem for the 
      whole industry and should not be included deliberately by 
      compilers."
    

I'm much less familiar with MSVC than the alternatives, but this is a
refreshing approach. Yes, give me a mode that refuses to silently rewrite
undefined behavior. Is MSVC possibly able to take this approach because it
isn't trying to be compliant with modern C standards? Does it actually reduce
the ability to apply useful optimizations? Or is it just a difference in philosophy?

~~~
Someone
_" First, there is reading an uninitialized variable (i.e. something which
does not necessarily have a memory location); that should always be a compile
error. Period."_

You cannot implement that efficiently without rejecting valid programs.
Consider code like this:

    
    
      int x;
      if( f()) x = 1;
      if( g()) h(x);
    

For sufficiently complex functions f and g, there's no reasonable way to
decide at compile time whether x will be set whenever g() returns true. For
example, f() might always return false because Fermat's last theorem is true.

And the word 'reasonable' probably isn't even necessary in that statement: in
general, the question is undecidable.
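To make this concrete, here is a runnable sketch with a hypothetical f() that searches a bounded range for a counterexample to Fermat's Last Theorem with exponent 3, plus hypothetical g() and h(). Because no counterexample exists, x is always assigned before it is read, so the program is valid. A compiler could only establish that by proving the theorem, so a rule that rejects every possibly-uninitialized read would have to reject this program.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical f(): true when no a^3 + b^3 == c^3 exists with 1 <= a, b, c <= limit.
   Fermat's Last Theorem (n = 3) says this is always true; the compiler cannot know that. */
static bool f(unsigned long long limit)
{
    for (unsigned long long a = 1; a <= limit; a++)
        for (unsigned long long b = a; b <= limit; b++)
            for (unsigned long long c = 1; c <= limit; c++)
                if (a * a * a + b * b * b == c * c * c)
                    return false;
    return true;
}

/* Hypothetical g() and h(), standing in for the functions in the comment above. */
static bool g(void) { return true; }
static int h(int v) { return v * 2; }

int demo(unsigned long long limit)
{
    int x;                 /* deliberately left uninitialized, as in the original */
    if (f(limit)) x = 1;   /* always taken, but only a proof of FLT (n = 3) shows it */
    if (g()) return h(x);  /* well-defined only because x was assigned above */
    return 0;
}
```

With this sketch, demo(50) returns 2. A compiler may still emit a "may be used uninitialized" warning here, which is about the best it can reasonably do.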

~~~
nothrabannosir
Yes, this is exactly the reason for undefined behavior, and it is often
overlooked in the "well, the compiler should just throw an error instead of
invoking undefined behavior" debate: the compiler doesn't just "choose to
invoke" undefined behavior; it relies on it to do program analysis.

For example, your example could legally be rewritten to:

    
    
        int x;
        f();
        x = 1;
        if (g()) { h(x); }
    

The only difference arises if f() returns false and x is then read: the read
would see 1. But reading x in that case is undefined behavior, so the compiler
can ignore it. In fact, assuming x is not accessible outside this block:

    
    
        f();
        if (g()) { h(1); }
    

These optimizations happen all over the place. It's not the compiler invoking
or causing undefined behavior; it's the compiler assuming that undefined
behavior never happens.

EDIT: Note the further optimization that looms: if f() can be proven pure (no
side effects), then it can be removed. This makes little sense for a function
with no arguments (in which case it would just be a constant). If, however,
f(y, ...) is some expensive but pure function, it can just be removed
completely.

~~~
dllthomas
_" Note the further optimization that looms: if f() can be proven pure (no
side effects), then it can be removed. This makes little sense for a function
with no arguments (in which case it would just be a constant)."_

It could apply if f computes a value based on global state but doesn't change
anything, which is slightly weaker than "pure".
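GCC and Clang draw the same line with two function attributes (compiler extensions, not standard C): const for functions whose result depends only on their arguments, and pure for functions that may also read global memory but have no side effects. A minimal sketch, where the functions and the global scale are hypothetical:

```c
#include <assert.h>

/* Hypothetical global state read by scaled(). */
static int scale = 3;

/* const: the result depends only on the argument, so a call whose result is
   discarded may be deleted, and repeated calls with the same argument merged. */
__attribute__((const)) static int square(int v) { return v * v; }

/* pure: may read globals (here, scale) but changes nothing, so calls can still be
   merged or deleted as long as no store to global memory happens in between. */
__attribute__((pure)) static int scaled(int v) { return v * scale; }

int example(int v)
{
    scale = 3;
    (void)square(v);                /* result discarded: the call may be removed */
    int a = scaled(v) + scaled(v);  /* may be computed with a single call */
    scale = 4;                      /* a store to global memory... */
    int b = scaled(v);              /* ...so this call must be evaluated again */
    return a + b;
}
```

The attributes are promises from the programmer; if a function marked const actually reads globals, the compiler is entitled to produce surprising results.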

------
userbinator
There is a relevant article
[http://blog.regehr.org/archives/1180](http://blog.regehr.org/archives/1180)
and discussion
[https://news.ycombinator.com/item?id=8233484](https://news.ycombinator.com/item?id=8233484)
about what programmers really think C should be like (i.e. the "portable
assembler" it was designed to be originally); it incidentally also shows what
they think a sane machine architecture should be like.

------
timtadh
Question 2 is :

Is reading an uninitialised variable or struct member (with a current
mainstream compiler):

(This might either be due to a bug or be intentional, e.g. when copying a
partially initialised struct, or to output, hash, or set some bits of a value
that may have been partially initialised.)

a) undefined behaviour (meaning that the compiler is free to arbitrarily
miscompile the program, with or without a warning) : 128 (43%)

b) ( * ) going to make the result of any expression involving that value
unpredictable : 41 (13%)

c) ( * ) going to give an arbitrary and unstable value (maybe with a different
value if you read again) : 20 ( 6%)

d) ( * ) going to give an arbitrary but stable value (with the same value if
you read again) : 102 (34%)

e) don't know : 3 ( 1%)

f) I don't know what the question is asking : 2 ( 0%)

--------------------

I know of one data structure (a sparse set of integers from 1 to n) which relies
on this behavior:
[http://research.swtch.com/sparse](http://research.swtch.com/sparse). I
always thought it was a neat trick. However, from the article it seems that
compilers may NOT give stable values for uninitialized members, which may make
that data structure behave strangely or cause the program to be miscompiled.
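A minimal sketch of the sparse set from the linked article (the Briggs and Torczon representation). The article's trick is that sparse[] never needs initializing, because a membership test cross-checks it against dense[]. Since reading indeterminate values is exactly what the survey question is about, this sketch allocates sparse[] with calloc instead, which keeps O(1) insert, lookup, and clear but gives up the free initialization. The function and field names are illustrative, not taken from the article.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Sparse set of integers in [0, max): O(1) insert, member, and clear. */
struct sparse_set {
    unsigned *dense;   /* dense[0..n-1] holds the members in insertion order */
    unsigned *sparse;  /* sparse[v] is the index of v in dense, if v is a member */
    unsigned n;        /* number of members */
    unsigned max;      /* exclusive upper bound on values */
};

/* The original trick mallocs sparse[] and never initializes it; calloc is used
   here so no indeterminate value is ever read. Call ss_free even on failure. */
static bool ss_init(struct sparse_set *s, unsigned max)
{
    s->dense = malloc(max * sizeof *s->dense);
    s->sparse = calloc(max, sizeof *s->sparse);
    s->n = 0;
    s->max = max;
    return s->dense != NULL && s->sparse != NULL;
}

static bool ss_member(const struct sparse_set *s, unsigned v)
{
    assert(v < s->max);
    unsigned i = s->sparse[v];
    /* A stale sparse[v] is harmless: the cross-check against dense[] rejects it. */
    return i < s->n && s->dense[i] == v;
}

static void ss_insert(struct sparse_set *s, unsigned v)
{
    if (ss_member(s, v))
        return;
    s->dense[s->n] = v;
    s->sparse[v] = s->n;
    s->n++;
}

/* Clearing only resets the count; both arrays keep stale data. */
static void ss_clear(struct sparse_set *s) { s->n = 0; }

static void ss_free(struct sparse_set *s)
{
    free(s->dense);
    free(s->sparse);
}
```

After ss_clear, every entry of sparse[] is stale, which is the same situation the article's uninitialized version is always in; the i < n check is what makes both cases safe.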

~~~
JoshTriplett
That's a question for which multiple answers are correct: a, b, _and_ c.

~~~
ustolemyname
I disagree regarding b. I think you'll find the outcome of this expression
quite predictable:

int b; int c = b * 0;

~~~
JoshTriplett
Sure, if the expression doesn't depend on the value at all, it _probably_
won't have unpredictable results. (Though as with any kind of undefined
behavior, don't count on it, as compilers can be "clever" sometimes.)

------
ydcvjk
A lot of C programmers make valid assumptions based on their system
architecture and compiler. Is there any point trying to unify their obviously
different practices while at the same time ignoring the Standard? No.

~~~
MichaelCrawford
To me, the whole point of C is to enable one to take advantage of one's system
architecture. That's not really possible in higher-level languages.

~~~
pcwalton
Well, one of the interesting takeaways from the article is that C as specified
often does _not_ let you take advantage of your system architecture. The
specification has things like GPUs and segmented memory architectures in mind
when it forbids you from doing seemingly reasonable things like taking the
difference between the addresses of two separately-allocated objects, even
though chances are very good what you're trying to do works just fine on all
architectures you care about.
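To make the distinction concrete: subtracting two pointers into the same array is defined (the result is a ptrdiff_t counting elements), while subtracting pointers to two separately allocated objects is undefined behavior. A common workaround, sketched below under the assumption that the platform provides uintptr_t and has a flat address space, converts both pointers to uintptr_t first. That conversion is implementation-defined rather than undefined, which is roughly what "works on all architectures you care about" means in standard terms. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Defined: both pointers point into (or one past the end of) the same array. */
ptrdiff_t index_of(const int *base, const int *elem)
{
    return elem - base;
}

/* Writing (b - a) directly would be undefined for separately allocated objects.
   Converting to uintptr_t first is implementation-defined instead; on common
   flat-memory targets the result is the distance in bytes. */
intptr_t byte_distance(const void *a, const void *b)
{
    return (intptr_t)((uintptr_t)b - (uintptr_t)a);
}
```

On a segmented or otherwise unusual target the uintptr_t values need not be linear addresses, which is exactly the case the standard is leaving room for.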

~~~
AnimalMuppet
Yes and no. The standard specification defines a C that is very portable. It
is quite reasonable that you cannot _portably_ take advantage of your system
architecture, which means you cannot do it using C as defined by the standard.

But you still can write stuff that looks just like C, that a C compiler will
accept, and that lets you take advantage of your system architecture. You just
need to understand that a new version of your compiler can break all of your
nifty tricks.

~~~
ydcvjk
Finally someone that understands.

~~~
Gibbon1
Not really, because there are two kinds of C programmer out there.

1. Those that communicate with the outside world exclusively via library
calls to an operating system.

2. Those that do not.

People who live in world #1 usually don't get people living in world #2.
For example:

    
    
      // wait for write ready
      while((spi.S & SPI_S_SPTEF_MASK) == 0)
        ;
    
      spi.D;
      spi.D = data;
    

Guess what:

No, the bare read spi.D; is actually important. No, just because the program
never writes to spi.S does not mean you can delete the while loop. No, you
cannot reorder any of this and have it work.

~~~
ydcvjk
Sorry, I have no idea what you are trying to say. Maybe stick to the relevant
issue and be less abstract?

~~~
icedchai
His example is anything but abstract. Embedded programming is a world unto
itself. You sometimes need to read an address before you can write, etc.

~~~
dllthomas
I think the parent meant "implicit" rather than "abstract". The code is not
abstract; but the pragmatic intent of the comment is not made very clear. I
believe I follow but I'm not confident enough to risk confusing things by
guessing.

(I believe I follow the intended conversational import; I certainly follow the
code.)

~~~
Gibbon1
Sort of the point is that in pure languages only the symbolic operations are
important. The side effects (memory operations) are unimportant and beyond
the control of the programmer, so you can do all sorts of transformations on
the code without affecting the end result.

C is definitely impure. There is a large set of use cases where memory
layout is important, and where the order of operations and memory accesses is
important.

What worries me is the optimizer being allowed to make assumptions about
undefined behavior that may actually be important on certain targets.
Consider dereferencing a null pointer.

On Windows 7 or Linux, in user land, if you do that you'll get a segfault, and
if the default handler runs, your program dies.

On the ARM Cortex processor I've been slopping code for, reading a null pointer
returns the initial stack pointer. Writing generates a bus fault. Since I
actually trap that, an error message gets written into non-initialized memory
and then the processor gets forcibly reset.

On an AVR, reading a null pointer gives you the program's entry point. Writing
to that location typically does nothing. Though you can write to it if running
out of a special 'boot sector' and you've done some magic incantations.

So that's the problem I have with the idea that 'undefined means the optimizer
can make a hash of your program' instead of trusting the back end of the
compiler to do something target-appropriate.
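A small sketch of the kind of transformation being worried about here, using hypothetical functions. Because *p is evaluated before the check, an optimizer may conclude that p cannot be null and delete the check entirely. GCC's -fdelete-null-pointer-checks, on by default for most targets, permits this, and -fno-delete-null-pointer-checks turns it off; that matters on targets like the ones described above, where address 0 is readable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: the dereference comes before the null check. */
int read_or_default(const int *p, int fallback)
{
    int v = *p;      /* if p were NULL, this line alone is undefined behavior... */
    if (p == NULL)   /* ...so the optimizer may assume p != NULL and delete this test */
        return fallback;
    return v;
}

/* Checking before dereferencing keeps the test meaningful under any optimizer. */
int read_or_default_fixed(const int *p, int fallback)
{
    if (p == NULL)
        return fallback;
    return *p;
}
```

On a target where reading address 0 quietly returns data, the first version compiled with the check deleted returns whatever lives at address 0 instead of the fallback.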

~~~
AnimalMuppet
Yes, embedded is very side-effect-ful. When you have to make that kind of
thing work, you usually have to know both your hardware architecture and your
compiler.

    
    
      *(unsigned long*)0xEF010014 = 0x1C;
    

presumes memory mapped I/O, a 32-bit hardware register at 0xEF010014, and what
0x1C will mean to that register. It also presumes that the compiler will
generate a 32-bit access for an unsigned long.
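One way to remove the unsigned long assumption is the fixed-width uint32_t from stdint.h combined with volatile. The sketch below takes the register block's base address as a parameter (hypothetical helpers, not any vendor's API) so it can be exercised against ordinary memory. Whether each access becomes a single 32-bit bus transaction is still, strictly speaking, up to the compiler and target.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: write a 32-bit register at a word offset from base. */
static inline void reg_write32(volatile uint32_t *base, unsigned word_offset, uint32_t value)
{
    base[word_offset] = value;   /* volatile: the store is performed, not elided or merged */
}

/* Hypothetical helper: read a 32-bit register at a word offset from base. */
static inline uint32_t reg_read32(volatile uint32_t *base, unsigned word_offset)
{
    return base[word_offset];    /* volatile: every call performs a real load */
}

/* On hardware, the call might look like (hypothetically):
       reg_write32((volatile uint32_t *)0xEF010014u, 0, 0x1Cu);                  */
```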

Going back to Gibbon1's code sample, you have to know what your compiler is
going to do at what optimization level. You either have to declare spi.S to be
volatile, or you have to compile at -O0, or some such. And you may well need
to review the generated assembly code a few times in order to _really_
understand what your compiler is doing with your code.
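A sketch of the volatile approach applied to Gibbon1's loop. The register layout (the struct, its field widths, and the mask value) is hypothetical and stands in for whatever the vendor header defines; only the names spi, S, D, and SPI_S_SPTEF_MASK come from the original snippet. Declaring the fields volatile tells the compiler that every read and write is observable, so it may not delete the polling loop, drop the bare read of D, or reorder these accesses relative to each other.

```c
#include <assert.h>
#include <stdint.h>

#define SPI_S_SPTEF_MASK 0x20u   /* hypothetical bit for "transmit buffer empty" */

/* Hypothetical register layout; on hardware a vendor header or linker script
   would place this struct at the peripheral's address. */
struct spi_regs {
    volatile uint8_t S;   /* status register */
    volatile uint8_t D;   /* data register */
};

static void spi_send(struct spi_regs *spi, uint8_t data)
{
    /* Wait for write ready: S is re-read on every iteration because it is volatile. */
    while ((spi->S & SPI_S_SPTEF_MASK) == 0)
        ;
    (void)spi->D;   /* the bare read is kept; the original comment says it matters */
    spi->D = data;  /* performed after the read, in program order */
}
```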

If you're writing the Linux kernel and you want to be portable across hardware
architectures, it gets harder. You can't just be processor- or hardware-specific.
You probably need the volatile keyword. I don't know what optimization level the
kernel is compiled with, but I bet it's not aggressively optimized.
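Rather than declaring whole objects volatile, the Linux kernel has historically applied volatile per access through a macro: older kernels defined ACCESS_ONCE(x) as (*(volatile typeof(x) *)&(x)), since superseded by READ_ONCE() and WRITE_ONCE(). A minimal standalone sketch of the same idea, using the GCC/Clang __typeof__ extension; the flag and polling function are hypothetical.

```c
#include <assert.h>

/* Same shape as the kernel's old ACCESS_ONCE(); __typeof__ is a GCC/Clang extension. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical flag that an interrupt handler might set. */
static int stop_requested;

/* Each iteration performs a fresh load of stop_requested, so the compiler cannot
   hoist the read out of the loop. Returns how many polls happened before stopping. */
static int poll_until_stopped(int max_polls)
{
    int polls = 0;
    while (polls < max_polls && !ACCESS_ONCE(stop_requested))
        polls++;
    return polls;
}
```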

------
bshanks
As noted in the article, a summary of this document (about one-third the
length) from the same authors is available at
[http://www.cl.cam.ac.uk/~pes20/cerberus/notes51-2015-06-21-s...](http://www.cl.cam.ac.uk/~pes20/cerberus/notes51-2015-06-21-survey-short.html)

------
marcosdumay
It's scary to notice that all this is undefined behavior. Hell, glibc
network functions rely on undefined behavior!

------
aidenn0
Note that the C99 (I'm not up on C11 yet) standard specifically allows type-
punning through the use of unions (and disallows it in essentially all other
cases except when one type is a character type).
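A minimal sketch of union-based type punning as C99 and later permit it, assuming float is a 32-bit IEEE 754 type (true on mainstream platforms but not guaranteed by the standard). The memcpy version is the alternative that is also valid in C++, and compilers typically reduce it to a single register move. Function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumes float is IEEE 754 binary32; this at least checks the size. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

static uint32_t float_bits_union(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;   /* reading the member not last written: allowed in C99 and later */
}

static uint32_t float_bits_memcpy(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* copies the object representation byte for byte */
    return u;
}
```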

Also, there seems to be some confusion about storing and loading pointers,
when the standard speaks to this as well; roughly: a pointer which is
converted to a "large enough" integer type will point to the same object when
converted back. It is permissible for an implementation to not provide a large
enough integer type, but excepting that, the behavior is well defined.
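The round trip described above looks like this in code. The standard's guarantee is for a void * converted to uintptr_t and back, so the sketch goes through void * explicitly; uintptr_t itself is optional, which is the "not provide a large enough integer type" escape hatch. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* uintptr_t is optional; UINTPTR_MAX is defined exactly when the type exists. */
#ifdef UINTPTR_MAX
static int *round_trip(int *p)
{
    uintptr_t n = (uintptr_t)(void *)p;   /* pointer -> void * -> integer */
    return (int *)(void *)n;              /* integer -> void * -> pointer; compares equal to p */
}
#endif
```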

~~~
jcranmer
C89 prohibited union type-punning. C99 allowed it, but mistakenly left the
prohibition in the (non-normative) list of undefined behaviors. C11 did
rectify the list.

~~~
aidenn0
One of the TCs for C99 fixed the list I believe.

------
programmernews3
Compilers should detect as many false assumptions as they possibly can.

Is it possible to build a language that would reduce the number of false
assumptions?

------
lsiebert
Interestingly, git used bit fields in its code.

------
Kenji
This is why we need to implement everything in JavaScript

 _hides_

~~~
krylon
Apparently, to some people, irony is an undefined behavior...

(Just to be clear, I am referring to the people that downvoted you.)

~~~
Kenji
Good one. Too bad humour is not tolerated here :)

~~~
gjm11
As I've said before: the thing is, the HN crowd is a really tough audience --
they will only be amused by jokes if they're _actually funny_.

If your idea of humour is "make some pop-culture reference and it will
automatically be funny" or "be generically cynical about everything and it
will automatically be funny" then, yeah, you're going to have a difficult time
on HN.

Something like 1% or so of my HN comments are jokes. They usually do pretty
well for karma. (Slightly to my surprise, the most recent one to flop
completely was one that was also making a slightly serious point, and one that
I don't think many HN readers would disagree with. Ah well, can't win 'em
all.)

