
Proposal for a Friendly Dialect of C - 0x09
http://blog.regehr.org/archives/1180
======
Verdex
I think it's interesting that the famous tech companies are all developing
programming languages that are less managed than java/c#, but more predictable
(read: less undefined behavior) than c/c++. Facebook seems interested in D,
Apple has Swift, Microsoft has some sort of secret project they are working
on, Google has Go, and Mozilla has Rust. Even c++ seems to be attempting to
modernize with the new additions to its spec. And now we see a desire for c
itself to change. I wonder if our industry is at a turning point where managed
languages aren't quite cutting it, but no one is comfortable going back to the
'good old days'.

On a personal note, I like the idea of friendly C so much that I finally made
an HN account. One of my favorite things to do is to take things apart and
understand them. I was mortified when I learned the real meaning of undefined
behavior in c/c++. It seems like the only way to be sure you understand a C
program is to check the generated machine code. Even worse is that when I try
to talk to other developers about undefined behavior, I tend to get the
impression that they don't actually understand what undefined behavior means.
I can't think of a way to verify what they think it means without insulting
their intelligence, but hopefully the existence of something like friendly C
will make it an easier discussion to have.

~~~
72deluxe
I wonder if the managed languages "aren't cutting it" because they're not
fashionable and are seen as stinky old dinosaurs by the generation of
developers growing up now?

I mean, some people openly scoff at languages that aren't Python or
ECMAScript, and it appears to be very/extremely fashionable/obligatory on HN
to aggressively bash C++, even when it appears that many haven't written or
maintained anything in it, and casually dismiss it as "complicated" and daft
as it has no compulsory garbage collection. "Huh! It allows you to address
memory! HOW STUPID. Why would a language even exist anymore that lets you do
that?!"

When reading some of the concepts behind C++ and the reasons for decisions,
some of them make excellent sense. The 'newer' languages are simpler, primarily
because they haven't had decades of users demanding the features that C++ has
accumulated.

Where I work we use C and C++ for most things, but that's just the industry
we're in. It appears to have become the norm to teach Java at universities
now (judging from the people I have bumped into), so C++ and C are perhaps no
longer the starting points for development for most graduates.

~~~
LunaSea
I think it's also a question of using the tools that fit the problem.

If the fashionable choice is to use C++, for example, a lot more can go wrong
if the programmer isn't doing things right.

Now if you have the same situation with a higher-level language, the worst
that can happen is an infinite loop.

Most projects don't need the extra performance C++ would provide. So most of
the time it's a better choice to use a higher level, more predictable language
like Node.js or Go.

Since ECMAScript engines like V8 have been optimised a lot in recent years,
the performance difference is not that large either.

~~~
seren
This is not only about extra computational performance, but also about
predictability. I'd love to program a hard real-time system in Go instead of
C++, but given that it uses a GC, that is not going to happen. I have mostly
worked on real-time embedded systems, and I have only encountered C and C++,
mostly because there is no other sane choice available.

------
userbinator
I really like these suggestions since they can be summed up in one sentence:
they are what C programmers who write code with UB would already expect any
reasonably sane platform to do. I think it's definitely a very positive
change in attitude from the "undefined behaviour, therefore _anything_ can
happen" that resulted in compilers' optimisations becoming very surprising and
unpredictable.

 _Rather, we are trying to rescue the predictable little language that we all
know is hiding within the C standard._

Well said. I think the practice of UB-exploiting optimisation was completely
against the spirit of the language, and that the majority of optimisation
benefits happen in the compiler backend (instruction selection, register
allocation, etc.). At least as an Asm programmer, I can attest that IS/RA can
make a _huge_ difference in speed/size.

The other nice point about this friendly C dialect is that it still allows for
much optimisation, but with a significant difference: instead of basing it on
assumptions of UB defined by the standard, it can still be done based on
proof; e.g. code that can be proved to be unneeded can be eliminated, instead
of code that may invoke UB. I think this sort of optimisation is what most C
programmers intuitively agree with.
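
A classic concrete instance of the contrast (a sketch, not from the comment):
under today's rules a compiler may delete this overflow check outright, since
x + 1 < x can only hold if signed overflow (UB) occurred, whereas a
proof-based optimiser would have to keep it unless it could prove it dead.

    int will_wrap(int x)
    {
        /* gcc/clang at -O2 may fold this to "return 0;", reasoning that
           signed overflow never happens in a valid program */
        return x + 1 < x;
    }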

~~~
_delirium
_instead of basing it on assumptions of UB defined by the standard, it can
still be done based on proof; e.g. code that can be proved to be unneeded can
be eliminated, instead of code that may invoke UB. I think this sort of
optimisation is what most C programmers intuitively agree with._

The main motivation for relying on some of the UB assumptions is that
compilers _weren't_ able to prove things in a lot of cases where programmers
expected them to, because C (especially in the face of arbitrary pointers, and
esp. with incremental compilation) is quite hard to prove things about. So
compilers started inferring things about variables from what programmers do
with them. For example, if a pointer is used in a memcpy(), then the
programmer is signalling to us that they know this is a non-null pointer,
since you can't pass NULL as an address to memcpy(). So when compiled with
non-debugging, high-optimization flags, the compiler trusts the programmer,
and assumes this is a known-to-be-non-null pointer.
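
A minimal sketch of that inference (the function is hypothetical, not from
the comment): because passing NULL to memcpy() is undefined even for a size
of zero, the compiler may treat p as non-null afterwards.

    #include <string.h>
    
    void consume(char *p, char *buf)
    {
        memcpy(buf, p, 16);   /* p == NULL here would be UB */
        if (p == NULL)        /* so this test may be optimized away */
            return;
        /* ... use p ... */
    }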

It would be interesting to see some benchmarks digging into whether turning
off that kind of inference has significant performance impact on real-world
code. Without adding more source-level annotations to C (like being able to
flag pointers as non-NULL), it will reduce some optimization opportunities,
since the compiler won't be able to infer as many things, even things that are
definitely true. But it might not be enough cases to matter.

------
simias
I can fit the proposed changes into two categories:

* Changes that replace undefined behaviours with unspecified values. This makes it easier to catch certain types of coding errors at the cost of certain kinds of optimizations.

* Changes that remove undefined behaviours (wrapping arithmetic, memcpy/memmove, aliasing rules).

I'm comfortable with the first kind, although you can already achieve
something very similar with most compilers (as far as I know) by building with
optimizations disabled. Also, things like missing return values generate a
warning in any compiler worth using; if you ignore that kind of warning, you
have only yourself to blame.

The second kind bothers me more, because it makes otherwise invalid C code valid
in this dialect. I'm worried this makes things even more difficult to explain
to beginners (and not so beginners, I still have to check the aliasing rules
from time to time to make sure the code I'm writing is valid).

Even if you're very optimistic, this friendly C is not going to replace daddy
anytime soon. There'll be plenty of C code out there, plenty of C toolchains,
plenty of C environments where the definition of friendliness is having a dump
of the registers and stack on the UART in case of an error. Plenty of
environments where memcpy is actually memcpy, not memmove.

For that reason I'd be much more in favour of advocating the use of more
modern alternatives to C (and there are a bunch of those) rather than risking
blurring the lines some more about what is and isn't undefined behaviour in C.

~~~
pascal_cuoq
> There'll be plenty of C code out there

Precisely. The idea here is that any C program that does not specifically
require one's complement or sign-magnitude representation will work just as
well compiled as friendly C as it does compiled as C.

Also, just because testing has found no exploitable bugs in a C program
compiled with GCC 4.8 does not mean that GCC 4.9 will not introduce a new
optimization that wreaks havoc with undefined behavior that was there all
along. No such surprises with friendly C. It's the trusty C compiler that you
had in 1995, and it always will be.

"Security-critical programs would never rely on undefined behavior!", you
might say. Let us take the example of ntpd, one of the building blocks of the
Internet, considered critical enough to be part of two Google bounty programs.
Here is a list of currently harmless undefined behaviors in it that a compiler
could use as an excuse to produce vulnerable binary code tomorrow:
[http://bugs.ntp.org/buglist.cgi?emailreporter1=1&emailtype1=...](http://bugs.ntp.org/buglist.cgi?emailreporter1=1&emailtype1=substring&email1=trust-in-soft)

Friendly C is not a new language. It is C as most developers understand it,
and with bad surprises prohibited as much as possible.

Disclaimer: Julien, who reported these ntpd undefined behaviors, is my
colleague, and I am a co-author of the “friendly C” proposal.

~~~
simias
I get your point (and it's a fair one) but what happens when it goes the other
way? A bit of code written (and working) in friendly C finds its way into a
regular good old C codebase. Suddenly it contains a critical bug that may go
unnoticed.

Or some coder is used to friendly C, uses aliasing, memcpy and wrapping
arithmetic without thinking twice. Then he's asked to work on some project
building with a regular C toolchain. Gun is cocked and pointed at the foot.

Those problems only occur because friendly C is a dialect of C. You don't get
those drawbacks if you tell devs to use Rust or whatnot, because the difference
between Rust and C is obvious, unlike that between friendly C and C.

~~~
jjnoakes
There's an obvious solution to this: have friendly-c compilers accept some
declaration silently in a source file:

    
    
      !dialect friendly-c
    

Now, C code can compile as friendly-c code, but explicitly declared friendly-c
code will fail if passed to a normal C compiler.

~~~
jzwinck
Such code would also fail if passed to tools like lint, and would result in
warnings or worse when loaded in IDEs. The incredible C toolchain built over
the decades is one of C's biggest advantages; friendly-c cannot afford to lose
even 10% of it.

~~~
comex
So make it a #pragma.

~~~
ygra
Then it cannot cause a compile error in standard C.
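
For illustration (the pragma name is hypothetical): a conforming compiler
simply ignores pragmas it does not recognize (C11 6.10.6), so a marker like
this compiles silently as ordinary C rather than failing.

    #pragma dialect friendly_c   /* unknown pragma: ignored by standard C */
    
    int main(void) { return 0; }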

------
twoodfin
Can someone give a rationalization of why a "friendly" dialect of C should
return unspecified values from reading uninitialized storage? Is the idea that
all implementations will choose "0" for that unspecified value and allow
programmers to be lazy?

I'd much rather my "friendly" implementation immediately trap. Code built in
Visual C++'s debug mode is pretty reliable (and useful) in this regard.

EDIT: It occurs to me that this is probably a performance issue. Without
pretty amazing (non-computable in the general case?) static analysis, it would
be impossible to tell whether code initializes storage in all possible
executions, and using VM tricks to detect all uninitialized access at runtime
is likely prohibitively expensive for non-debug code.

~~~
zellyn
The scope of an unspecified value's effect is limited to that particular
value, whereas undefined behavior's effects are unlimited:
[http://en.wikipedia.org/wiki/Nasal_demons](http://en.wikipedia.org/wiki/Nasal_demons)

~~~
kazinator
However, in practice, continuing with an unspecified value can have
catastrophic consequences. Also, undefined behavior can have nice consequences,
like terminating with a diagnostic message, which is not a conforming way to
implement unspecified behavior.

------
cousin_it
Some time ago I came up with a simpler proposal: emit a warning if UB
exploitation makes a line of code unreachable. That refers to actual lines in
the source, not lines after macroexpansion and inlining. Most "gotcha"
examples with UB that I've seen so far contain unreachable lines in the
source, while most legitimate examples of UB-based optimization contain
unreachable lines only after macroexpansion and inlining.

Such a warning would be useful in any case, because in legitimate cases it
would tell the programmer that some lines can be safely deleted, which is
always good to know.

~~~
regehr
The linked articles by Chris Lattner explain why this is very difficult.

[http://blog.llvm.org/2011/05/what-every-c-programmer-should-...](http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_21.html)

~~~
cousin_it
Is it true that undesirable UB exploitation often happens after macroexpansion
and inlining, and doesn't make any actual source lines unreachable? Are there
any simple examples of that?

~~~
regehr
Does this example help?

[http://markshroyer.com/2012/06/c-both-true-and-false/](http://markshroyer.com/2012/06/c-both-true-and-false/)

~~~
cousin_it
Yeah. Thanks!

------
DSMan195276
The kernel uses -fno-strict-aliasing because they can't do everything they
need to do while adhering strictly to the standard; it has nothing to do with
it being too hard (the biggest issue probably being treating pointers as
integers and masking them, which is basically illegal in standard C).
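
A sketch of the kind of pointer-masking idiom meant here (assumptions: a
64-byte boundary, and an implementation where the uintptr_t round trip
behaves as expected, which ISO C does not strictly guarantee):

    #include <stdint.h>
    
    /* Round a pointer down to a 64-byte boundary by masking the low
       bits of its integer representation. */
    static void *align_down_64(void *p)
    {
        uintptr_t u = (uintptr_t)p;
        return (void *)(u & ~(uintptr_t)63);
    }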

IMO, this idea would make sense if it were targeted at regular software
development in C (and making it easier to not shoot yourself in the foot).
It's not as useful to the OS/hardware people, though, because they're already
not writing standards-compliant code nor using the standard library. There's
only so much it can really do in that case without making writing OS or
hardware code more annoying than it already is.

------
otikik
I haven't done C in at least a decade, so bear with me.

> 1. The value of a pointer to an object whose lifetime has ended remains the
> same as it was when the object was alive

> 8. A read from uninitialized storage returns an unspecified value.

Isn't that already the case in C?

> 3. Shift by negative or shift-past-bitwidth produces an unspecified result.

What is the meaning of "an unspecified result" there?

> 4. Reading from an invalid pointer either traps or produces an unspecified
> value.

What is the meaning of "traps" here? Is it the same as later on ("math- and
memory-related traps")?

~~~
rcthompson
If I'm understanding correctly, I think the difference between "undefined
behavior" and "returns an unspecified value" is consitency. If you had
something like:

    
    
        int i;  /* left uninitialized: reading it is undefined behavior */
        if (i > 0)
            COND1;
        if (i > 0)
            COND2;
    

then most people would expect that either COND1 and COND2 both execute, or
they both don't execute. But I believe that a compiler is theoretically free
to produce code that executes one but not the other since the value of i is
undefined. In other words, the compiler doesn't have to act as though i has
one specific value after that assignment. It can assume any value it likes
independently each time i is used. It can assume that i is positive at the
first check and then assume it's negative the next time, even though it might
be provably true that no code between those two checks can possibly change the
value of i. The change to "unspecified value" would mean that the compiler can
give i whatever value it wants to, but it has to be a single defined value,
and subsequent code must not behave as though i had multiple changing values.

But I'm not really a C programmer, so feel free to correct me if I'm
interpreting this wrong.

~~~
JoeAltmaier
Most responses are hyperbole. Undefined behavior likely doesn't include making
up new code; it's not certain where you will end up, but someplace in existing
code is a certainty.

~~~
mikeash
Even excluding obvious fantasy like nasal demons, and real but unusual cases
like starting nethack, there are completely mundane things that are reasonably
common in real compilers but that don't go "someplace in existing code", like
aborting on integer overflow, or crashing when you access a bad pointer.

~~~
JoeAltmaier
Agreed; or like doing nothing, or choosing whatever value the CPU produces
when overflowing.

------
revelation
Plenty of architectures do not trap at null-pointer dereferencing (they don't
have traps). Some (like AVR) are not _arcane_ , they are one of the best
excuses for still writing C nowadays.

~~~
zokier
Just because an improvement is not applicable to all situations does not mean
that it should not be pursued.

------
jhallenworld
I have strong feelings that the C standard (and, by extension, C compilers)
should directly support non-portable code. It means many behaviors are not
"undefined" - instead they are "machine dependent". Thus overflow is not
undefined - it is _defined_ to depend on the underlying architecture in a
specific way.

C is a more useful language if you can make machine specific code this way.

I'm surprised that some of the pointer math issues come up. Why would the
compiler assume that a pointer's value is invalid just because the referenced
object is out of scope? That's crazy.

Weird results from uninitialized variables can sometimes be OK. I would kind
of accept strange things to happen when an uninitialized bool (which is really
an int) is not exactly 1 or 0.

Perhaps a better way to deal with the memcpy issue is this: make memcpy() safe
(handles 0 size, allows overlapping regions), but create a fast_memcpy() for
speed.
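
A sketch of what that split could look like (the names follow the comment and
are hypothetical):

    #include <string.h>
    
    /* The safe default: tolerates size 0 and overlapping regions by
       deferring to memmove. */
    static inline void *safe_memcpy(void *dst, const void *src, size_t n)
    {
        if (n == 0)
            return dst;
        return memmove(dst, src, n);
    }
    
    /* The unchecked path, for callers who can guarantee non-overlap. */
    #define fast_memcpy memcpy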

~~~
dllthomas
The standard differentiates between "undefined" and "implementation defined".

~~~
to3m
...with one option for undefined behaviour being for it to behave "in some
documented manner characteristic of the platform" (C11 para 3.4.3.2). Which is
not the same thing as being implementation defined, of course, but is at least
a step above simply deciding your program is invalid and generating bogus code
because of it.

So compilers could do this stuff already. I have no idea why they don't. C
isn't really the language for people who want platform independence.

~~~
dllthomas
Well sure. They don't because you can get better performance if you are able
to eliminate branches that you can prove aren't taken.

------
Roboprog
Back in 1990, it didn't take long to figure out that (Borland) Turbo Pascal
was much less insane than C. Unfortunately, it only ran on MS-DOS & Windows,
whereas C was everywhere.

Employers demanded C programmers, so I became a C programmer. (now I'm a Java
programmer, for the same reason, and think it's also a compromised language in
many ways)

For anybody who is willing to run a few percent slower so that array bounds
get checked, there is now an open source FreePascal environment available, so
as not to be dependent on the scraps of Borland that Embarcadero is providing
at some cost. Of course, nobody is going to hire you to use Pascal. (or any
other freaky language that gets the job at hand done better than the current
mainstream Java and C# languages)

~~~
zamalek
Pascal, in general, was an amazing language. Simple to understand, compiled to
native code - honestly the only complaint you could now (with FreePascal) make
about the language is that it is a little "wordy."

Much the same as D - which was my initial thought when I read "friendly C." We
already have a friendly C.

~~~
peapicker
When I first learned C, after having used Turbo Pascal for years (back in the
80s), it was frustrating. When I finally understood it well a little later, I
felt like I had taken off the shackles of Pascal...

~~~
Roboprog
Granted, the original 1970-flavor Pascal on the CDC Cyber was very limited. I'm
curious, though, what is it that Turbo Pascal 5.0 and above would prevent you
from doing?

It's been a while, but if I really have to, I could cast a pointer to a long,
increment it by sizeof in a loop, and cast it back to a pointer in Pascal to
do pointer arithmetic in a tight loop.

I can use procedural types anywhere a function pointer in C would be used.

I can make complex constants (tables) just like a pre-initialized array of
structs in C.

There is an extension to return in the middle of a function/procedure rather
than toggling flags to pinball-style drop to the end of the routine.

I can set compiler pragmas to allow/force me to check for error codes after
every memory and I/O call (at the risk of stumbling onward and causing
secondary damage), rather than just failing with a line number.

I can nest functions inside of other functions, and reference the enclosing
variables, just like in GNU C (oh, wait, that's a non-standard GNU extension),
even if I can't return such a function as a "use it later", honest to God
closure.

What are the shackles? (I'm probably in for a face palm, oh, _that_ , moment
when I get the answer, but I'm drawing a blank at the moment)

~~~
zamalek
> What are the shackles?

I think it would be supporting libraries. Looking at RedMonk's statistics[1]
Pascal doesn't even feature (I was able to find ~600 on GitHub). By today's
standards that's a pretty big shackle, you'd have to write a lot of
functionality yourself (although that's ignoring that you can use C libraries
with a little effort).

I love the language, it just had no staying power. Maybe people just prefer
curly braces.

[1]: [http://redmonk.com/dberkholz/2014/05/02/github-language-tren...](http://redmonk.com/dberkholz/2014/05/02/github-language-trends-and-the-fragmenting-landscape/)

~~~
Roboprog
>> curly braces

and "wokkas"

I mean, how many XML interpreters do we really need? At least for Java, it
seems like there are too many "Enterprise!!!" libraries that seem to exist to
extend the lameness of the language by requiring you to write a DSL in XML.

I suppose for C there really are a lot of low level libraries that actually do
things, though, such as read/write sound and graphics files, or implement
useful abstract data types. Things that might seem "built in" in a 21st
century language/environment, but not in C or Pascal.

------
sjolsen
I don't really see what's to be accomplished by most of the points of this. A
program that invokes undefined behaviour isn't just invalid; it's almost
certainly _wrong_. Shifting common mistakes from undefined behaviour to
unspecified behaviour just makes such programs less likely to blow up
spectacularly. That doesn't make them correct; it makes it harder to notice
that they're incorrect.

Granted, not everything listed stops at unspecified behaviour. I'm not
convinced that that's a good thing, though. Even something like giving signed
integer overflow unsigned semantics is pretty effectively useless. Sure, you
can reasonably rely on two's complement representation, but that doesn't change
the fact that you can't represent the number six billion in a thirty-two bit
integer, and it doesn't make 32-bit code that happens to depend on the
arithmetic properties of the number six billion correct just because the
result of multiplying two billion by three is well-defined.
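
To make that concrete (an illustration, not from the comment): with 32-bit
wrapping semantics the product is well-defined but still wrong, because
6000000000 mod 2^32 = 1705032704.

    #include <stdint.h>
    
    uint32_t product = 3u * 2000000000u;   /* == 1705032704, not six billion */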

Then there's portability. Strict aliasing is a good example of this. Sure, you
can access an "int" aligned however you like on x86. It'll be slow, but it'll
work. On MIPS, though? Well, the compiler could generate code to automate the
scatter-gather process of accessing unaligned memory locations. This is C,
though. It's supposed to be close to the metal; it's supposed to force-- I
mean, let you do everything yourself, without hiding complexity from the
programmer. How far should the language semantics be stretched to compensate
for programmers' implicit, flawed mental model of the machine, and at what
point do we realize that we already have much better tools for that level of
abstraction?

~~~
aidenn0
Is this program "almost certainly wrong"?

    
    
        #include <stdint.h>
        
        uint32_t bytesToUnsigned(uint8_t bytes[4])
        {
          return bytes[0] |
                 bytes[1] << 8 |
                 bytes[2] << 16 |
                 bytes[3] << 24;
        }
    

The behavior is undefined on a system with 32 bit integers because of signed
arithmetic overflow (despite the fact that all the explicit types involved are
unsigned, a uint8_t gets promoted to a _signed_ integer before the left-shift
operation).

Right now it will work on every compiler I've tried, but it would be perfectly
valid (by the ANSI specification) for a compiler to assume that the result of
that function can never have the highest bit set. In friendly C, the result is
well defined.
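
For contrast, a standard-C fix (a sketch): casting the operands to uint32_t
before shifting keeps the arithmetic unsigned, so no signed overflow can occur
and the result is well-defined even without friendly C.

    uint32_t bytesToUnsigned(uint8_t bytes[4])
    {
      return (uint32_t)bytes[0]       |
             (uint32_t)bytes[1] << 8  |
             (uint32_t)bytes[2] << 16 |
             (uint32_t)bytes[3] << 24;
    }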

~~~
sjolsen
>Is this program "almost certainly wrong"

No, that one's a consequence of C's insane type system. The solution here
isn't to change the semantics of signed integer arithmetic. The solution is to
change integer promotion to use unsigned arithmetic like it should have done
in the first place.

~~~
rootbear
Not taking a position on this, and it's been a long while, but I seem to
remember that the discussions in X3J11 on the issue of integer promotions,
which mostly occurred before I joined, were long and heated.

------
Someone
_" Reading from an invalid pointer either traps or produces an unspecified
value."_

That still leaves room for obscure behavior:

    
    
      if( p[i] == 0) { foo();}
      if( p[i] != 0) { bar();}
    

Calling foo might change the memory p points at (p might point into the stack
or it might point to memory in which foo() temporarily allocates stuff, or the
runtime might choose to run parts of free() asynchronously in a separate
thread), so one might see cases where both foo and bar get called. And yes,
optimization passes in the compiler might or might not remove this problem.

Apart from truly performance-killing runtime checks, I do not see a way to fix
this issue. That is probably the reason it isn't in the list.

(Feel free to replace p[i] by a pointer dereference. I did not do that above
because I fear HN might set stuff in _italics_ instead of showing asterisks)

~~~
mseebach
Short of languages like Clojure and Haskell that are heavily opinionated on
immutability and side effects, I can't think of any language that meaningfully
protects against that kind of obscure behaviour. Indeed, it seems "else" was
meant for exactly this kind of code.

------
rwmj
I would also add something like "if I write a piece of code, the compiler
should compile it", perhaps with "or else tell me with a warning that it isn't
going to compile it".

------
Ono-Sendai
Not sure this is a good idea. Since a lot of behaviour becomes implementation-
defined, code written in this dialect will not be portable.

~~~
andrewchambers
It already is implementation defined by accident.

------
kazinator
> 1. The value of a pointer to an object whose lifetime has ended remains the
> same as it was when the object was alive.

This does not help anyone; making this behavior defined is stupid, because it
prevents debugging tools from identifying uses of these pointers as early as
possible. Existing C compilers do behave like this anyway: though any use of
the pointer (not merely a dereferencing use) is undefined behavior, in
practice copying the value around does work.

> 2. Signed integer overflow results in two’s complement wrapping behavior at
> the bitwidth of the promoted type.

This seems like a reasonable request since only museum machines do not use
two's complement. However, by making this programming error defined, you
interfere with the ability to diagnose it. C becomes friendly in the sense that
assembly language is friendly: things that are not necessarily correct have a
defined behavior. The problem is that then people write code which depends on
this. Then when they do want overflow trapping, they will have to deal with
reams of false positives.

The solution is to have a declarative mechanism in the language whereby you
can say "in this block of code, please trap overflows at run time (or even
compile time if possible); in this other block, give me two's comp wraparound
semantics".

> 3. Shift by negative or shift-past-bitwidth produces an unspecified result.

This is just word semantics. Undefined behavior, unspecified: it spells
nonportable. Unspecified behavior may seem better because it must not fail.
But, by the same token, it won't be diagnosed either.

A friendly C should remove all _gratuitous_ undefined behaviors, like
ambiguous evaluation orders, and diagnose as many of the remaining ones as
possible: especially those which are errors.

Not all undefined behaviors are errors. Undefined behavior is required so that
implementations can extend the language locally (in a conforming way).

One interpretation of ISO C is that calling a nonstandard function is
undefined behavior. The standard doesn't describe what happens, no diagnostic
is required, and the range of possibilities is very broad. If you put "extern
int foo()" into a program and call it, you may get a diagnostic like
"unresolved symbol foo". Or a run-time crash (because there is an external foo
in the platform, but it's actually a character string!) Or you may get the
expected behavior.

> 4. Reading from an invalid pointer either traps or produces an unspecified
> value. In particular, all but the most arcane hardware platforms can produce
> a trap when dereferencing a null pointer, and the compiler should preserve
> this behavior.

The claim here is false. Firstly, even common platforms like Linux do not
actually trap null pointers. They trap accesses to an unmapped page at address
zero. That page is often as small as 4096 bytes. So a null dereference like
ptr[i] or ptr->memb where the displacement goes beyond the page may not
actually be trapped.
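
An illustration of that point (the sizes are hypothetical): only the page at
address zero is unmapped, so a "null dereference" whose effective address
falls past it may quietly read mapped memory instead of trapping.

    struct big {
        char pad[1 << 20];   /* 1 MiB of padding */
        int  tail;
    };
    
    int read_tail(void)
    {
        struct big *p = 0;
        return p->tail;      /* effective address is ~1 MiB, not 0 */
    }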

Reading from invalid pointers already has the de facto behavior of reading an
unspecified value or else trapping. The standard makes it formally undefined,
though, and this only helps: it allows advanced debugging tools to diagnose
invalid pointers. We can run our program under Valgrind, for instance, while
the execution model of that program remains conforming to C. We cannot
valgrind the program if invalid pointers dereference to an unspecified value,
and programs depend on that; we then have reams of false positives and have to
deal with generating tedious suppressions.

> 5. Division-related overflows either produce an unspecified result or else
> a machine-specific trap occurs.

Same problem again, and this is already the actual behavior: possibilities
like "demons fly out of your nose" do not happen in practice.

The friendly thing is to diagnose this, always.

Carrying on with a garbage result is anything but friendly.

> It is permissible to compute out-of-bounds pointer values including
> performing pointer arithmetic on the null pointer.

Arithmetic on null works on numerous compilers already, which use it to
implement the offsetof macro.
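
For instance, the traditional offsetof definition (the pre-builtin form many
implementations shipped; my_offsetof is just a name for this sketch):

    #define my_offsetof(type, member) ((size_t)&((type *)0)->member)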

> memcpy() is implemented by memmove().

This is reasonable. The danger in memcpy not supporting overlapped copies is
not worth the microoptimization. Any program whose performance is tied to that
of memcpy is badly designed anyway. For instance if a TCP stack were to double
in performance due to using a faster memcpy, we would strongly suspect that it
does too much copying.

> The compiler is granted no additional optimization power when it is able to
> infer that a pointer is invalid.

That's not really how it works. The compiler assumes that your pointers are
valid and proceeds accordingly. For instance, aliasing rules tell it that an
"int *" pointer cannot be aimed at an object of type "double", so when that
pointer is used to write a value, objects of type double can be assumed to be
unaffected.

C compilers do not look for rule violations as an excuse to optimize more
deeply, they generally look for opportunities based on the rules having been
followed.
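
A small sketch of that assumption at work: because the int store is assumed
not to touch any double, the final load can be folded to the constant just
stored.

    double f(double *d, int *i)
    {
        *d = 1.0;
        *i = 2;        /* assumed not to alias *d */
        return *d;     /* may be compiled as "return 1.0;" */
    }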

> When a non-void function returns without returning a value, an unspecified
> result is returned to the caller.

This just brings us back to K&R C before there was an ANSI standard. If
functions can fall off the end without returning a value, and this is not
undefined, then again, the language implementation is robbed of the power to
diagnose it (while remaining conforming). Come on, C++ has fixed this problem,
just look at how it's done! For this kind of undefined behavior which is
erroneous, it is better to require diagnosis, rather than to sweep it under
the carpet by turning it into unspecified behavior. Again, silently carrying
on with an unspecified value is not friendly. Even if the behavior is not
classified as "undefined", the value is nonportable garbage.

It would be better to specify it as zero than leave it unspecified: falling out
of a function that returns a pointer causes it to return null, out of a
function that returns a number causes it to return 0 or 0.0 in that type, out
of a function that returns a struct, a zero-initialized struct, and so on.

Predictable and portable results are more friendly than nonportable,
unspecified, garbage results.

~~~
scott_s
I believe you have confused _undefined behavior_ and _implementation defined_
behavior. Undefined behavior means that the code is not legal, and if the
compiler encounters it, it is allowed to eliminate it and all consequent
code. (The linked posts and papers have lots of examples of this.)

Implementation defined behavior means that the code is legal, but the compiler
has freedom to decide what to do. It, however, is _not_ allowed to eliminate
it.
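
A two-line illustration of the distinction (a sketch): the first expression
must yield some value the implementation documents; the second imposes no
requirements at all.

    #include <limits.h>
    
    void example(void)
    {
        int a = -1 >> 1;    /* implementation-defined: some documented value */
        int b = INT_MAX;
        b = b + 1;          /* undefined behavior: signed overflow */
        (void)a; (void)b;
    }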

~~~
kazinator
After 20 years of comp.lang.c participation, it's unlikely that I'm confusing
UB and IB.

Undefined behavior doesn't state anything about legality; only that the ISO C
standard (whichever version applies) doesn't provide a requirement on what the
behavior should be.

Firstly, that doesn't mean there doesn't exist any requirement; implementations
are not written to ISO C requirements and none other. ISO C requirements are
only one ingredient.

Secondly, compilers which play games like what you describe are not being
earnestly implemented. If a compiler detects undefined behavior it should
either diagnose it or provide a documented extension. Any other response is
irresponsible. The unpredictable actual behaviors should arise only when the
situation was ignored. In fact it may be outright nonconforming.

The standard says:

 _" Possible undefined behavior ranges from ignoring the situation completely
with unpredictable results, to behaving during translation or program
execution in a documented manner characteristic of the environment (with or
without the issuance of a diagnostic message), to terminating a translation or
execution (with the issuance of a diagnostic message)."_

A possible interpretation of the above statement is that the unpredictable
results occur only if the undefined behavior is ignored (i.e. not detected).

If some weird optimizations are based on the presence of undefined behavior,
they are in essence extensions, and must be documented. This is because the
situation is being exploited rather than ignored, and the program isn't being
terminated with or without a diagnostic. That leaves "behaving in a documented
manner".

But optimizing based on explicitly detecting undefined behavior is not a
legitimate extension. It is simply insane, because the undefined behavior is
not actually being defined. There is no extension there, per se. Optimization
requires abstract semantics, but in this situation, there aren't any; the
implementation is taking C that has no (ISO standard) meaning, it is not
_giving_ it any meaning, and yet, it is trying to make the meaningless program
go faster. Doing all this without issuing a diagnostic is criminal.

I don't think the GCC people are really doing this; only people think that.
Rather, they are writing optimizations which assume that behavior is _not_
undefined, which is completely different. The potential there is to be
over-zealous: to forget that GCC is expected to be consistent from release to
release, preserving its set of documented extensions and even some of its
undocumented behaviors. Not every behavior in GCC that is not documented
is necessarily a fluke. Maybe it was intentional, but failed to be documented
properly.

Compiler developers must cooperate with their community of users. If 90% of
the users are relying on some undocumented feature, the compiler people must
be prepared to make compromises. Firstly, revert any change which breaks it,
and then, having learned about it, try to avoid breaking it. Secondly, explore
and discuss this behavior to see how reliable it really is (or under what
conditions). See whether it can be bullet-proofed and documented. Failing
that, see if it can be detected and diagnosed. If such a behavior can be
detected and diagnosed, then it can be staged through planned obsolescence:
for a few compiler releases, there is a diagnostic, but it keeps working.
Then it stops working, and the diagnostic can change to a different one, which
merely flags the error.

~~~
scott_s
_But optimizing based on explicitly detecting undefined behavior is not a
legitimate extension. It is simply insane, because the undefined behavior is
not actually being defined._

The authors of this proposal agree, and are trying to avoid such situations by
just eliminating undefined behavior. Their blog posts and academic papers
(linked from the blog post in this submission) have many examples where such
insanity has happened.

------
dschiptsov
Show us, please, how the dialect of C used for the development of Plan 9 is
unfriendly and not good enough.

~~~
astrange
It's been empirically demonstrated that C wasn't good enough for Rob Pike.

~~~
dschiptsov
Are you in love with Golang too?)

------
allegory
It is friendly, if you're not an idiot. Not becoming an idiot is the best
solution (practice, lots).

Edit: perhaps I made my point badly, but don't assume that you'll ever be good
enough to not be an idiot. Try to converge on it if possible, though. I'm
still an idiot with C and I've been doing it since 1997.

~~~
mikeash
How long does it take? I'm at about 20 years and still haven't made it.

~~~
allegory
Well, I see it more like idiocy decays as e^-x: it never converges on total
non-idiocy but you can get pretty close.

(been doing it for 17 years)

~~~
pascal_cuoq
For nearly all of the last 10 years, I have been working on the specific topic
of how to better predict what a C program can do (and in particular, correctly
predict that a C program will behave well). This is a shorter time than 17
years, but it is more focused than using C as a tool for actually achieving
something else, where you aren't thinking about the specifics of C most of the
time, hopefully.

Despite this, a fortnight ago I was caught off-guard by GCC 4.9's choice of
compiling the program in
[http://pastebin.com/raw.php?i=fRbGfQ6p](http://pastebin.com/raw.php?i=fRbGfQ6p)
to an executable that displays “p is non-null” followed by “p:(nil)”.

There has to be a way out of this. Seriously. Most C programmers neither asked
for nor deserve this.

~~~
DSMan195276
If you don't mind me taking a guess, is it because calling memcpy with NULL is
undefined behavior, thus GCC assumes p must be non-null and thus the if (p) is
redundant and x must equal 1?

~~~
pascal_cuoq
Exactly. Propagating the information that p was passed as argument to memcpy
and thus must be non-null is a new GCC 4.9 optimization (some discussion here:
[http://stackoverflow.com/questions/5243012/is-it-guaranteed-...](http://stackoverflow.com/questions/5243012/is-it-guaranteed-to-be-safe-to-perform-memcpy0-0-0) ).
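
A minimal sketch of the pattern (hypothetical; the linked pastebin is not
reproduced here):

    #include <stdio.h>
    #include <string.h>
    
    int main(void)
    {
        char dst[1];
        char *p = NULL;
        memcpy(dst, p, 0);      /* UB: NULL is not valid here, even for 0 bytes */
        if (p)                  /* GCC 4.9 may now assume p != NULL */
            printf("p is non-null\n");
        printf("p:%p\n", (void *)p);
        return 0;
    }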

