The Problem with Friendly C (regehr.org)
173 points by Mindless2112 on Dec 24, 2015 | 173 comments



Since, according to Chandler Carruth, the aim for Clang is not to do optimizations based on undefined behaviour without a corresponding check in ubsan[0], I don't see much traction for this well-defined/boring C effort.

You know, in my experience, things would be great if people actually turned on warnings. I write all my new code with -Weverything with a few noise categories turned off. Everybody should build code at this level from day 1.

[0] http://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html


> Everybody should build code at this level from day 1.

On the project I run, the CI will reject your code in review if it doesn't compile under two different compilers with all but a handful of warnings treated as errors, run and pass the tests, and pass various linters, including a check for undocumented types, variables, parameters and return values. IMNSHO, every project should be like this. Once it's set up, it's a breeze and doesn't impede development speed at all.


For new code and for actively maintained old code you are perfectly correct. But there is a huge amount of C code we still rely on that isn't getting enough attention and I don't want that getting broken either.


At work, we migrated a small codebase this year to VS2008. It was "painful". This week I put it through ASan and, with about half an hour of work, found five buffer overflows or similar, but the thing still spits out a hundred warnings or so when you compile, and that's with default settings. This project is about 1% of the code, by line count, that my team is responsible for.

At home, I turn on -Wall -Wextra for starters, enable about a dozen more for good measure, with -Werror as the pièce de résistance.


Also we're all waiting for the strict aliasing checker.


The problem is UBSan is still kind of a pain to use. Just turn all of these optimizations directly into warnings and the problem would be solved--if you want to optimize them, the warnings will let you do it yourself.


If ((uint32_t)x<<32)==x on one system, and ((uint32_t)x<<32)==0 on another, why is that a problem? The issue isn't that one system behaves differently from another. The issue is that compilers take it upon themselves to decide that because code would behave differently on these different systems - which is why the standard leaves it undefined - it can be considered to fall into some kind of black hole that allows the compiler to do whatever it likes.

But why can't it just produce something different on each system?

Allow me to call it as I see it: the modern interpretation of undefined behaviour is bullshit. What compilers do today should be the recourse of absolute last resort, and the sort of thing that makes its authors feel bad. But it seems to be treated as a matter of course.

I don't know what to say.

Mandatory reading: http://robertoconcerto.blogspot.co.uk/2010/10/strict-aliasin... ("Everyone is fired"); https://groups.google.com/forum/#!msg/boring-crypto/48qa1kWi... ("these people simply don't understand what most C programmers want"); http://blog.metaobject.com/2014/04/cc-osmartass.html ("please don't do this, you're not producing value")


Apparently a (uint32_t) shifted by 32 is license to become completely insane.

  #include <stdint.h>
  #include <stdio.h>

  int main(int argc, char **argv) {
    uint32_t x = (uint32_t)0x12345678<<(uint32_t)32;
    printf("x<<32 = %x\n", x);
  }

    (Almost all compilers make a warning about the shift size)
  gcc 4.9.2 on Debian x86:      0x0
  gcc 4.9.2 on Debian x86, -O3: 0x0
  clang 3.6 on Debian x86:      0x0
  clang 3.7 on Debian x86:      0x0
  gcc 4.9.2 on Raspbian ARM:    0x0
  tcc 0.9.27 on Raspbian ARM:   0x0                NO WARNING
  tcc 0.9.27 on Debian x86:     Segmentation fault NO WARNING
  clang on El Capitan:          0x5a788cb0
  clang on El Capitan, -O3:     0x54049be8


> Apparently a (uint32_t) shifted by 32 is license to become completely insane.

That's easy to explain. The optimizer replaced the instruction with an undef [1], which has no live range. So it printed the value of some random register, which happened to be 0, 0x5a788cb0, or 0x54049be8.

[1]: http://llvm.org/docs/LangRef.html#undefined-values


I am almost LOLing at the segmentation fault. Where the fuck does a segmentation fault come from? It's shifting a value. Utterly mystifying.


Looking at the generated instructions might give a clue, but I'm just as mystified. Given the size of the program, I doubt there's much of it to inspect anyway.


Maybe it's a hardware trap and not a regular null reference segmentation fault?


Well, shl doesn't trap on x86 for out of range values; it just masks the second operand by 0x1f [1]. But it could be some other kind of trap.

Complete guess in the dark: tcc misencoded an instruction causing an illegal instruction trap, maybe in some sort of misguided attempt to optimize the shift to a lea.

[1]: http://x86.renejeschke.de/html/file_module_x86_id_285.html
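
For what it's worth, the masking is easy to observe when the count isn't a compile-time constant. A minimal sketch (this is still undefined behaviour in C, so the output is just what a straightforward shl happens to do on x86):

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint32_t x = 0x12345678;
        volatile uint32_t n = 32;          /* volatile keeps the shift out of constant folding */
        printf("x << n = %#x\n", x << n);  /* typically prints 0x12345678 when a plain shl is emitted,
                                              because the hardware masks the count to its low 5 bits */
        return 0;
    }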


Wouldn't that normally produce SIGILL, not SIGSEGV?


I think the problem is that some of these optimizations aren't just compiler makers being greedy; they're actually a huge benefit. IMO what's missing is the ability to mark areas "unsafe" -- i.e., tell the compiler "it's ok to take advantage of certain optimizations here" while marking other areas "please don't goof with this" (i.e., security-critical code). You can kind of do this with pragmas, but not really.

Here's a good example from the llvm blog: ( http://blog.llvm.org/2011/05/what-every-c-programmer-should-... )

> Signed integer overflow: If arithmetic on an 'int' type (for example) overflows, the result is undefined. One example is that "INT_MAX+1" is not guaranteed to be INT_MIN. This behavior enables certain classes of optimizations that are important for some code. For example, knowing that INT_MAX+1 is undefined allows optimizing "X+1 > X" to "true". Knowing the multiplication "cannot" overflow (because doing so would be undefined) allows optimizing "X*2/2" to "X". While these may seem trivial, these sorts of things are commonly exposed by inlining and macro expansion. A more important optimization that this allows is for "<=" loops like this:

> for (i = 0; i <= N; ++i) { ... }

> In this loop, the compiler can assume that the loop will iterate exactly N+1 times if "i" is undefined on overflow, which allows a broad range of loop optimizations to kick in. On the other hand, if the variable is defined to wrap around on overflow, then the compiler must assume that the loop is possibly infinite (which happens if N is INT_MAX) - which then disables these important loop optimizations.
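
To make that concrete, here's a hypothetical pair of functions (not from the blog post) showing the trade-off: with a signed counter the compiler may assume the loop runs exactly n+1 times, because overflowing "i" would be undefined; with an unsigned counter the case n == UINT_MAX makes the loop infinite, which blocks some loop transformations.

    void scale_signed(float *a, int n) {
        for (int i = 0; i <= n; ++i)        /* trip count may be assumed to be n+1 */
            a[i] *= 2.0f;
    }

    void scale_unsigned(float *a, unsigned n) {
        for (unsigned i = 0; i <= n; ++i)   /* possibly infinite if n == UINT_MAX */
            a[i] *= 2.0f;
    }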


> For example, knowing that INT_MAX+1 is undefined allows optimizing "X+1 > X" to "true".

If a programmer writes "X+1 > X", chances are this is an overflow check. Doing this is perfectly defined by the standard if X is an unsigned integer, but not if it's signed. That's what doesn't make sense, since they could've made the unsigned case undefined as well.
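
For illustration (hypothetical helpers, assuming the usual 32-bit int), the same-looking comparison is a legitimate check for unsigned but not for signed:

    /* Defined: unsigned arithmetic wraps, so this is true exactly when x == UINT_MAX. */
    int add_would_wrap(unsigned x) {
        return x + 1 < x;
    }

    /* Signed overflow is undefined: the compiler is allowed to fold this to 1. */
    int looks_like_a_check(int x) {
        return x + 1 > x;
    }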

> which allows a broad range of loop optimizations to kick in

What sort of optimisations exactly, and just how significant are they? I don't believe C should be a language where the compiler does all sorts of high-level optimisation; it should be a straightforward "do what I say" type of language where you get almost exactly what you write, and the only optimisations should be at the level of things like instruction selection --- the optimisations that a programmer would not be able to do at the source level.


> I don't believe C should be a language where the compiler does all sorts of high-level optimisation; it should be a straightforward "do what I say" type of language where you get almost exactly what you write, and the only optimisations should be at the level of things like instruction selection --- the optimisations that a programmer would not be able to do at the source level.

In that case, you're asking for easily 3x-5x less performance in the software you use every day, if not more. You can get essentially this by turning on -O0 and compiling, say, Firefox. This is done in debugging, and it's a terrible browsing experience.

Compiler optimizations are really, really important. CSE, GVN, LICM, SROA, etc. aren't done for fun. They're done because modern software depends on them to be fast.

Writing low-level "post-optimized" code is at odds with good engineering practice. If you can't factor things out into functions and count on an inliner to inline them and then propagate away constants, SROA away intermediate structures, etc. then you're essentially telling programmers they can't factor code out into functions. If your compiler refuses to do the Hacker's Delight integer divide optimization (which, by the way, you need good constprop to make good use of a lot of the time), then you're telling programmers they have to get out their copy from their bookshelf and compute the magic number every time they want to integer divide fast (especially on ARM). This sort of thing puts a huge drain on developer productivity and maintainability.


What I want is for the semantics of the language to match the semantics of the physical machine I'm writing software for. If on x86 INT_MAX+1==INT_MIN, then that's what should happen on x86. If on ARM INT_MAX+1==INT_MAX, that's what should happen on ARM. No, I don't want portability. Portability means you're coding against the least common denominator. If I want it to be portable, I'll use a different compiler.


> If on x86 INT_MAX+1==INT_MIN [...]

There is no such thing as "+ on x86". "+" is an operator of the C language. Modern x86 has about two dozen instructions that can be used to perform additions, each with slightly different behaviour, and with different performance characteristics. Someone has to define how C's "+" maps to those instructions. That is what the C standard does. The way it does that is by specifying properties that any implementation of C's "+" needs to have, leaving the specific choice of instructions to the compiler writer.


There are integer additions of different size, but is there anything you could honestly call plain integer addition that has different behavior from wrapping?


What's your point? If you define a "plain integer addition" to be "wrapping binary two's-complement addition", then there isn't; otherwise, there is.

But the C standard doesn't say "+ maps to what you could honestly call plain integer addition" anyway, so it's kind of pointless? And if you were to define your own language, you obviously could define "+" to have architecture-specific semantics, without appealing to any "plain integer addition".


>What's your point? If you define a "plain integer addition" to be "wrapping binary two's-complement addition", then there isn't; otherwise, there is.

I'm not defining it that way. I used the word 'plain' because some instructions are specifically designed to be variants of normal instructions. It's in their name that they do things abnormally, so they should be discarded when talking about normal behavior.

>But the C standard doesn't say "+ maps to what you could honestly call plain integer addition" anyway, so it's kind of pointless?

You originally said: There is no such thing as "+ on x86". "+" is an operator of the C language.

So I was explaining what "+" means in the absence of C rules.

+ means addition in general conversation, that's basic math and English skills.

That it's integer addition is obvious from context, because we're adding 1 to INT_MAX.

And I already explained why I used the word 'plain'.

-

There are architectures with normal integer add instructions that do not have twos-complement behavior. I'm just not aware of any such instructions on x86. Can you name one?

If there are none, then I am comfortable asserting that "+" on x86 exists and is twos-complement.


> + means addition in general conversation, that's basic math and English skills.

> That it's integer addition is obvious from context, because we're adding 1 to INT_MAX.

Well ... yeah, "+" usually means addition, sure. But there are so many types of addition that that's not really specific enough for a language definition. And adding 1 to INT_MAX? Well, you could interpret "INT_MAX" to indicate C code, and thus "+" to be C's addition operator. In which case, it's not integer addition, at least not the kind that basic math and English skills would suggest: Adding two positive integers is always defined (and also never gives a value lower than either of the two operands), very much unlike "+" for "int"s in C.

> There are architectures with normal integer add instructions that do not have twos-complement behavior. I'm just not aware of any such instructions on x86. Can you name one?

Sure, PADDSW, for example. At least if I understand your "twos complement" requirement correctly to mean a specific wraparound behaviour. Or, if you want to count that, DAA and AAA.


>PADDSW

As I see it that's a variant of PADDW, so whatever PADDW does is what matters, and PADDW does twos complement.

>DAA and AAA

I thought about calling those out; I think it's stretching too much to refer to an archaic instruction that works on 'ascii' or such when we're figuring out the behavior of 'integers'. Even if it is one byte.


> As I see it that's a variant of PADDW, so whatever PADDW does is what matters, and PADDW does twos complement.

Nope, PADDW is a variant of PADDSW, so whatever PADDSW does is what matters, and PADDSW does saturating addition.

Or in other words: You are defining "normal" to mean "twos-complement with wrap-around" after all, which makes it rather unsurprising that every "normal" instruction according to your definition turns out to be doing twos-complement addition with wrap-around.

> I thought about calling those out, I think it's stretching too much to refer to an archaic instruction that works on 'ascii' or such when we're figuring the behavior of 'integers'. Even if it is one byte.

Well, packed BCD definitely is not ASCII. Nor twos-complement, nor ones-complement, nor IEEE754. And BCD quite definitely represents a range of the integers.


"ascii adjust after addition"

> Nope, PADDW is a variant of PADDSW, so whatever PADDSW does is what matters, and PADDSW does saturating addition.

That's like arguing that Red Sports Car is a variant of Red Sports Car with Stripes. The one that adds adjectives is the variant, the one without those adjectives is the base.

If PADDW did saturating arithmetic and PADDWW did wrapping, I would agree with you, but that's not how x86 works.


It is, however, what many DSPs do.


Many DSPs don't support byte addressing or arithmetic on 8-bit integers which a huge amount of non-DSP C code relies on. They're not exactly friendly to code not written specifically to run on the DSP. (Or C code in general, for that matter - the reason they have saturating addition and other oddball instructions is because it makes a lot of DSP code faster, but there's no C representation of any of those things.)


And now we get back to the initial quote: "What's your point? If you define a "plain integer addition" to be "wrapping binary two's-complement addition", then there isn't; otherwise, there is."


Okay, well since it's my term, I'll make it absolutely clear. The definition is not circular. On those DSPs, plain integer addition is saturating. For x86, so far all evidence points to it being twos-complement.

So the original original quote, "on x86 INT_MAX+1==INT_MIN", is true. As far as I can tell.


> You can get essentially this by turning on -O0 and compiling, say, Firefox. This is done in debugging, and it's a terrible browsing experience.

Which says more about Firefox than it does about -O0. I've run Seamonkey on 3x-5x slower machines than I use these days and it's been fine.


Gecko is what mostly benefits from the optimizations, and Gecko is the largest component shared between SeaMonkey and Firefox. I don't believe SeaMonkey at -O0 is an acceptable experience.


IMHO aggressive optimization at compile time is an example of premature optimization. Let the hardware have access to a straightforward representation. Once the run-time hot-spots are identified, the hardware (firmware, VM, whatever) can rewrite the binary code to execute faster. Excessive compiler optimization makes this difficult or impossible (too much information thrown away.) Compilers should be designed for fast compilation speed.


That would work if most applications spent all their time in a few hot spots. But, contrary to popular wisdom, that's usually not the case. Most applications have flat profiles (to steal a quote from DannyBee—but it matches my experience as well). They have flat profiles because people have spent a lot of time optimizing them. In this context—which is the norm—eliminating optimizations to save compile time and deferring optimization to a few "hot spots" has the effect of turning off optimization for the whole program.

It's common to write off compiler optimizations as unimportant, because they're invisible and people don't see them. They're also complex, which makes people predisposed to get rid of them in the name of "simplicity". But, for better or for worse, optimizing compilers are necessary complexity.

Optimizing compilers are not ubiquitous because compiler engineers just like to play with technology. They're ubiquitous because you need them.


I agree optimization is important. So important that it should be pushed down into the hardware. Binaries should look almost like source code. But that's just my vision for what it's worth.


Hardware already does insane amounts of optimization. The modern superscalar out-of-order processor basically does its own JIT from x86 into its own internal micro-ops, reordering instructions on the fly, etc. That's another 2-10x speed difference on modern computers.


Completely agree :) But even better to push C code straight down to the hardware and let it crunch on that! Let it allocate a few thousand registers, or spawn off an FPGA compiler to create a few new instructions. Crazy?


Hardware doesn't work like this. You might want to read Hennessy and Patterson, and the original RISC I paper.

http://www.amazon.com/Computer-Architecture-Fifth-Edition-Qu...

http://www.cecs.pdx.edu/~alaa/ece587/papers/patterson_isca_1...


RISC created a huge local minimum by speeding up C code to the exclusion of other languages. I predict that eventually future processors will hide more features from the higher software levels (such as number of registers, instruction types and formats) in order to improve efficiency at the machine level. I think we are seeing this trend with GPUs already. Current CPUs don't do this because they have to maintain binary compatibility with a huge installed base. We can compare notes in a decade or so :-)


Current processors already do that. You don't see the true number of registers or the true instruction set/format of any modern Intel processor. x86 instructions are translated into micro-ops, so x86 is really just a compatibility layer.

I do agree that current processors optimize for C/C++ (although of course there are niche systems like Azul which optimize for other languages). It would be nice to have processor extensions that allow us to get better GC performance, or better handling of immutable values. There's a chicken-and-egg problem getting there.


GPUs don't do any OoO processing like modern CPUs do. They also don't do any register renaming. They execute things really literally, up to the point where one has to manually insert delay slots for pipelined stuff if one really writes the raw asm (which the manufacturers tend to keep really hidden, in order to avoid the binary compatibility trap; see https://github.com/NervanaSystems/maxas as an example of a third-party assembler for the nvidia Maxwell arch)

On GPUs the binary compatibility issue is solved by having the driver compile the shader/compute kernel before it's used. As an example, nvidia uses PTX (see http://docs.nvidia.com/cuda/parallel-thread-execution/) as an intermediate language in CUDA, which is then compiled by the runtime into the actual ASM.

On modern CPUs, register renaming has already decoupled the physical registers from the instruction-set registers. As an example, a modern Haswell has over 100 registers per core.


> RISC created a huge local minimum by speeding up C code to the exclusion of other languages

Would you mind expanding on this?


In the future I believe you are going to see less emphasis on the aggressive speedup of C code for traditional CPUs. Instead you will see many more gadgets with simpler processors that run C code slower in the effort to save power. GPGPUs and algorithm specific hardware (e.g. video, crypto, network, DSP, neural nets) will fill out the rest of the chip. At some point GPUs will have enough raw power and GP features for it to be possible to run an instance of a late-80's operating system within the working set of a single GPU processing element (perhaps with virtual memory emulated as in jslinux.) At that point the need for a power hungry CPU and artificial CPU/GPU distinction will start to fade away completely. Along with Peak Oil we will have Peak CPU.

So, in general I am saying that the road to better performance will not be in aggressive compiler optimization, but rather in higher level design tools to manage totally new software/hardware abstractions. Binaries will be specified at a higher level and look more like source code. At this point my crystal ball becomes admittedly a bit fuzzy.


Your claim was that RISC "created a huge local minimum by speeding up C code to the exclusion of other languages" and it's not at all clear to me what RISC has to do with C, and why other languages are worse off for this.


It doesn't really seem like this solves the problem of optimizers introducing bugs and vulnerabilities.

Take the canonical optimizer-created security hole: the hardware optimizer replaces a constant-time compare (which doesn't leak timing information) with a variable-time compare (which does).

I don't think this solves the problem we're setting out to solve, i.e. the optimizer introducing bugs.


That's a good point; replacing a constant-time compare with a non-constant-time compare is a terrible bug to introduce.

A more amusing issue I saw was an optimization that looked for places where it could replace manual memory copying with a call to memcpy -- something that drove the guys writing libc nutzoid, because it was replacing the code inside memcpy itself with a call to memcpy. (On some platforms you can implement memcpy with special assembly instructions; on some you can't.)

Personally I care little about speed, since if I need more speed I can get that. And frankly if you tell me the resulting binary is 20% faster for some things, I just do not care. But I worry a lot about losing the ability to reason about side effects.


Maybe we are all better served by faster compilers that create straightforward binaries (and fewer bugs overall, both in the compiler and in application code). Optimization researchers could focus on source-to-source transformation tools with intelligent human-in-the-loop guidance. Or else they can work at the hardware/JIT level if they prefer.

Right now compiler writers are playing in a kind of local minimum (premature optimization as I said.) This may produce 3x-5x faster binary code today but also forces CPU manufacturers to retain backward compatibility causing them to also stay stuck in this local well. Eventually a new architecture is created (with 10x the registers etc.) and the cycle continues.


> What sort of optimisations exactly, and just how significant are they?

Well, if you could remove one instruction per loop on most computers (especially conditionals that could cause a branch mis-prediction), the effect of that in terms of performance is pretty staggering. Instead of thinking of it as an optimization, think of it as a check that doesn't need to be inserted into the code. If you can tell the compiler "I promise I won't overflow this variable", that's a lot less error checking it needs to insert on every iteration.

> I don't believe C should be a language where the compiler does all sorts of high-level optimisation; it should be a straightforward "do what I say" type of language where you get almost exactly what you write

Well, you could just compile with -O0 then. I don't necessarily agree with C on this, but it's pretty clear the community has made the decision to prioritize speed over safety. Nothing wrong with that, but if safety is your priority you should probably look at things other than C. Even if you were to define undefined behavior, it's a spectacularly dangerous language.


> but if safety is your priority you should probably look at things other than C

And those things are?

C is the low-level language we have. Now compiler writers have decided it'll work like a high-level language, except without any of the actual benefits.

We'll probably get usable C compilers that work (in fact, this has already started), and some stuff that's only good for old code.


Rust or Go are possible alternatives, or Java if it doesn't need to be system level.


Java for performance-critical or realtime things sounds like a fever dream of a nightmare.


The question was about safety, and Java is indeed safe.


> If a programmer writes "X+1 > X", chances are this is an overflow check.

If the programmer writes

    for (i = a; i < b; i++)
then it's not clear if you need to check 'i' for overflow. Assuming that 'i++' is greater than 'i' allows the compiler to, say, unroll the loop.


> Doing this is perfectly defined by the standard if X is an unsigned integer, but not if it's signed. That's what doesn't make sense, since they could've made the unsigned case undefined as well.

It does make sense if you understand the historical context: There was a time when some processors still used ones-complement representation for signed integers. That's why the highest performance choice of instructions for signed integer addition would give different results in the overflow case on different processors. Which is why the standard leaves the behaviour undefined, so that compiler writers can choose the instructions that make for the fastest signed addition on the respective target, while programmers can not rely on any specific behaviour.

None of that applies to unsigned integers, and the wraparound behaviour of unsigned integers is commonly used intentionally in many kinds of computation (such as in cryptography), so it is both useful and costs nothing in performance to define the semantics of unsigned overflow.
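
For instance, hash code like FNV-1a leans on exactly that defined wraparound (a hypothetical snippet, not from the thread):

    #include <stddef.h>
    #include <stdint.h>

    /* FNV-1a: the multiply is expected to wrap modulo 2^32, which is
       exactly the behaviour the standard guarantees for uint32_t. */
    uint32_t fnv1a_32(const unsigned char *data, size_t len) {
        uint32_t h = 2166136261u;
        for (size_t i = 0; i < len; i++) {
            h ^= data[i];
            h *= 16777619u;
        }
        return h;
    }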


As the article points out, weird looking comparisons like `x+1>x` tend to pop up when macros get expanded or functions get inlined.
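
A hypothetical sketch of how that happens: nothing at the call site looks like "x+1>x", but the expansion produces exactly that.

    #define SUCC(v) ((v) + 1)

    int can_increment(int x) {
        /* Expands to "((x) + 1) > x": harmless-looking at the call site,
           but the compiler may fold it to 1 because signed overflow is
           undefined. */
        return SUCC(x) > x;
    }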


If the standard writers had wanted to, they could have specified that a shift by an amount as wide or wider than the type had an "implementation-defined result", which is used elsewhere (for example, right-shifting a negative value gives an implementation-defined result).


Yes. And that would make sense.

They might not insist on exactly this, but they already leave this sort of possibility open by suggesting that undefined behaviour could result in the program behaving "in a documented manner characteristic of the environment".

If the standard writers' hands were tied by having to cater for every system they'd ever heard of, even the stupid ones where shifting by 32 (or whatever) results in rand() (or whatever) - implementers' hands are not! They have only one system to think about!

If that system happens to be one of the stupid ones - well, anybody working on it will have exactly the same problem.

But if it isn't one of the stupid ones - why do they insist that their users should behave as if it is?


What modern interpretation? As long as C has been standardized, undefined behaviour has meant nasal demons.

There's an argument to be had that some undefined behaviour should rather be unspecified or implementation-defined, but compilers making use of it for aggressive optimization? That's by design.

Quoting another article by John Regehr:

My view is that exploiting undefined behavior can sometimes be OK if a good debugging tool is available to find the undefined behavior.

Thus -fsanitize=undefined.


How about being build-system defined? As in, let me specify a "target behavior profile" as a switch to the compiler, which would be shorthand for a bunch of switches that define specific results for specific undefined behaviors: "-foverflow=checked" (add explicit checks) vs. "-foverflow=wrap" (do what most ISAs already do, but even on target ISAs that don't do that) vs. "-foverflow=clip" (do something weird that no target ISA does, like making short 65535 + 1 = 65535.)

I would expect that a single-pass compiler like gcc would look at the toolchain's target ISA, compare it to these switches, and then generate extra shim code to adapt any cases of target-arch-does-X into code-does-Y-on-X, while letting do-X-on-X just translate directly. Basically the same way floating-point code gets compiled: architectures that support it get it, architectures that don't get a shim.

Meanwhile, a two-stage compiler with an intermediate representation, like clang, should generate IR with granular types specifying the chosen behavioral semantics on each operation. The second step in such a toolchain becomes much more formalized: rather than taking code that can contain undefined behavior as input, and then "defining" it during translation in some random heuristic manner, you just take code that already specifies the behavior it wants in full, and then attempt to generate target-arch code that does whatever it says in the most efficient manner possible while not breaking any of the guarantees the IR asks for. In other words, your code-generation stage becomes a static recompiler, like this one[1] for the NES ISA.

And you can always specify a switch at code-gen time that will error out the compilation if you've done something that has caused shims to be inserted. Or even a switch to throw away the behavioral semantics the IR asks for and substitute your own, like happens in most performance-focused emulator software.

[1] http://andrewkelley.me/post/jamulator.html


GCC has an intermediate representation. It's just not exercised as a separate project (like LLVM). It's far from "single pass".


You're right; to be clear, what I was talking about above was whether the passes are decoupled at an abstract level, not so much whether the implementation uses multiple phases.

You can take LLVM "bitcode" as the product of clang, move it to another system with a different arch, and finish compiling the IR to a native binary on that system. This implies that the first step and second step don't share any state other than the IR itself—meaning that flags passed directly to clang in the first phase can't affect codegen, other than by influencing the generated IR; and that codegen has to treat the IR (and the environment) as its sole inputs.

Whereas with GCC, as far as I know, the RTL IR isn't arch-neutral, and the GENERIC/GIMPLE IRs aren't fully "baked" yet (i.e. they're compiler->compiler step IRs, not post-compilation pre-codegen IRs.) So there's no conceptual guarantee (even if there's an implementation-level one) that the compiler won't use compile-phase state to inform codegen.

In other words: it's guaranteed that you'd be able to implement the switches I'm talking about in clang+LLVM. You might (but won't necessarily) have to do some work to get GCC into the right shape to implement them.


LLVM IR isn't architecture-neutral. It encodes type layout / alignment, as well as ABI details. You can construct an LLVM target that is itself compilable on multiple architectures (see Portable Native Client), but it requires a nonstandard ABI.


As long as the option -foverflow=undefined is kept around, you have my vote. Now, it's just a simple matter of programming ;)


I think the equivalent for the current practice would basically be:

1. setting "-foverflow=promote" on the compiler's parsing-and-IR-generating stage, thus generating code that acts like all integers occupy infinitely-sized abstract machine registers, and expects this to get shimmed as checked promotion to pointers-to-structs from a bignum library on most architectures;

and then, separately,

2. setting an optimization flag like "-Onopromote" for the target-recompilation stage, that throws away all the bignum promotion shims that would have been inserted, and pretends that it's adhering to bignum-promotion semantics anyway (thus creating code that will be optimized the way C code currently is.)

That's quite a bit more complicated, but also a lot more explicit about what's going on: you have to specify (or pick a profile that specifies) the actual semantics you want, and then you have to explicitly opt into assuming those semantics rather than enforcing them. There'd probably be a similar pair of {semantic, assumption} flags for each currently-undefined behavior.


Undefined behaviour has meant undefined behaviour. That doesn't mean nasal demons. That means this:

    Possible undefined behavior ranges from ignoring the situation 
    completely with unpredictable results, to behaving during translation
    or program execution in a documented manner characteristic of the
    environment (with or without the issuance of a diagnostic message),
    to terminating a translation or execution (with the issuance of
    a diagnostic message).
Permit me to suggest that when there is a choice between making things behave in a documented manner characteristic of the environment, or something else - don't go for the something else.


The Standard imposes no requirements on undefined behaviour. The list you quoted is informative, to give us a basic idea of what to possibly expect, not a list of things we may rely upon.

Rejecting translation outright aside, undefined behaviour is the strongest language the standard uses for illegal constructs. I do not see it as undefined in the sense of lacking a common definition (e.g. is 0 a natural number, or not) but as something that is undefined the way 0/0 is undefined in maths.

In a way, the criticized behaviour of aggressively optimizing compilers is a consequence of ignoring the situation completely with unpredictable results. It's the same line of reasoning that makes restrict useful, just on steroids.


You'd think that'd make things clear, then: if you want a compiler to error out on trying to compile a literal 0/0, then you'd probably also want the same behavior of trying to compile a literal dereference of 0, or a literal (INT_MAX * 2), or whatever else.


clang does warn about both of those:

    $ cat foo.c

    #include <limits.h>
    int main() {
      int x = INT_MAX+1;
      *(int *)0;
    }

    $ clang foo.c
    foo.c:3:17: warning: overflow in expression; result is -2147483648 with type 'int' [-Winteger-overflow]
            int x = INT_MAX+1;
                           ^
    foo.c:4:2: warning: indirection of non-volatile null pointer will be deleted, not trap [-Wnull-dereference]
            *(int *)0;
            ^~~~~~~~~
    foo.c:4:2: note: consider using __builtin_trap() or qualifying pointer with 'volatile'
    3 warnings generated.


You missed the point of the analogy: It's not about how compilers treat the expression 0/0, but how mathematicians do.

If that expression turns up in your calculation, you did something you were not supposed to do and have no one else to blame.


No, rebutting that was exactly my point: programming is, in some ways, a friendlier version of working on mathematical proofs, in that the compiler will tell you when you "did something you were not supposed to do." Programmers expect to be able to hack away like monkeys on typewriters, and just run into a virtual wall whenever they misstep. Compilers will error out when a programmer explicitly types "0/0", and that's a good thing. It'd be a good thing in the other cases, too.


You are still missing the point. Yes, compilers obviously should not intentionally produce unnecessarily destructive behaviour. But that is not what is happening. What if the code contains "x/y"? Now, the compiler can try to prove that x and y can never be 0 at the same time. But what if that proof doesn't succeed? Maybe x is user input, so who knows what the user will enter at runtime? Now, the language spec could specify that the computation of 0/0 always results in a runtime error. But if it did that, there might be target architectures where the code that would need to be generated to check for that case would have a 300% performance overhead versus code that does unpredictable stuff when it gets fed 0 for both x and y, but in every other case performs a perfectly predictable division. That slowdown would be unacceptable for computation-intensive code where a 0/0 could never happen by construction, even though the compiler can't prove it. And that is why the standard simply says "it's undefined behaviour".
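
To make the cost concrete, a hypothetical sketch: a language that guaranteed a runtime error for 0/0 would effectively force every division whose operands the compiler can't analyze into something like this.

    #include <stdlib.h>

    int checked_div(int x, int y) {
        if (y == 0)       /* the per-division branch whose cost is at issue;
                             it guards x/0, including the 0/0 case above */
            abort();      /* stand-in for whatever "runtime error" a spec would mandate */
        return x / y;
    }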


> But if it did that, there might be target architectures where the code that would need to be generated to check for that case would have a 300% performance overhead

See my sibling reply down-thread. You don't want the compiler to automatically insert the check; but neither do you want the compiler to assume you want to not insert the check. You want the compiler to error out until you tell it whether or not you want a check. Or rather, whether you want checked-division semantics—which will be free on some architectures and costly on others—or unchecked-division semantics—which will be dangerous on all architectures, but technically impossible on some (presumably, it would be approximated by using checked division and then throwing away the error.)

Either way, you probably don't want the default "checked division that could error out on the architectures where it's free, and unchecked division on the rest"; you then have to code for both the case where your function can now trap/throw an exception, and the case where control continues through your function using the invalid calculated intermediate as input for further operations. You've probably coded to handle one or the other—but it's very unlikely you've coded to handle both. And why would you want to have to?

Deciding on your code's semantics should a development-time decision. The compiler should prevent you from making the mistake of leaving your code's semantics to the whims of the target architecture your (presumably portable) code happens to get compiled on, at some other time by some other person, unless you explicitly opt in to having the semantics of "whatever the target architecture wants to give me", on a case-by-case basis. The compiler should ensure that your code has the semantics you want everywhere—whether those be "always do X" or "do whatever the target arch's closest approximation to X is."

What we call "undefined behavior" in C is actually implemented by C compilers as a very specific set of assumptions about what semantics developers prefer—and those assumptions may or may not be right for most code, but they can't be right for 100% of code. Code is made of decisions[1]; the whole of the task of programming is to explicate every decision about a business-process that was previously left implicit, so that the same predictable and deterministic thing will happen everywhere, every time. So why should compilers allow programmers to leave their desired semantics implicit?

[1] http://siderea.livejournal.com/1241996.html


> You want the compiler to error out until you tell it whether or not you want a check. Or rather, whether you want checked-division semantics—which will be free on some architectures and costly on others—or unchecked-division semantics [...]

I think I disagree. I don't want that to be a feature of the compiler, but of the language, in which case there is no way to "error out". That is to say: there should be operators for one kind of semantics and operators for another kind (it's a bad idea to overload the same operator on the same operands with different semantics). And if the language doesn't have that feature, the compiler shouldn't error out--unless it can prove that your code will unavoidably run into undefined behaviour, in which case, simply not producing a binary at all is perfectly within the range of acceptable results, and probably a good idea.

> [...] you then have to code for both the case where your function can now trap/throw an exception, and the case where control continues through your function using the invalid calculated intermediate as input for further operations. You've probably coded to handle one or the other—but it's very unlikely you've coded to handle both. And why would you want to have to?

If you think in terms of handling the results of undefined behavior, you are kindof doing it backwards. The language specifies certain preconditions that you have to meet in order for some operation to be defined. It's your job as a programmer to make sure that those preconditions are met, in which case it doesn't matter what some compiler does when they are not met.

> The compiler should ensure that your code has the semantics you want everywhere—whether those be "always do X" or "do whatever the target arch's closest approximation to X is."

Well, yes and no. Giving you the same semantics certainly is a good idea, where possible. But to "do whatever the target arch's closest approximation to X is" is a useless idea, as you cannot write any reliable code on that basis. A language always has to specify certain properties that an operation needs to have, and every compiler needs to choose an implementation that fulfills those requirements, not just some vague "as close as possible"--that way, if you write code that only relies on those guaranteed properties, it will compile and run correctly on all targets. The question is only how many details are guaranteed properties and which parts are up to the implementation, aka "undefined behaviour", aka "stuff you must not rely on in your code". It's kind of impossible to really guarantee "the same semantics"--however, there are some guarantees that would be useful for a lot of code and that C, unfortunately, does not provide.

> What we call "undefined behavior" in C is actually implemented by C compilers as a very specific set of assumptions about what semantics developers prefer—and those assumptions may or may not be right for most code, but they can't be right for 100% of code.

That's actually not true. What we call "undefined behavior" in C is actually implemented by C compilers as "it doesn't matter what happens in this case". Sometimes, compiler writers will actually define some of the things that are undefined behavior in the standard, but more often than not, the compiler doesn't actually contain any explicit definition for what is supposed to happen in those cases. The compiler only worries about transforming your source code into machine code that does what the standard specifies where the standard does specify what is supposed to happen--where the standard doesn't specify anything, any resulting behavior is just a coincidental byproduct of how the compiler works, which can change with any new release, and even depending on optimization options.

> So why should compilers allow programmers to leave their desired semantics implicit?

They actually don't. If the programmer wants to express "add x to y (two signed integers) with wrap around" and writes in a C program "x+y", the programmer has simply made a mistake. It may be that in some cases, if you write "x+y", your compiler will behave with those intended semantics. But that is a coincidence. There is nothing implicit there, "x+y" in C simply means something different than the programmer intended, and just by coincidence sometimes produces the desired result. It's not any more correct than writing "exit(0);" when you want to print "hello world"--the only difference is that the compiler is free to produce the intended result even though it's not required to, which confuses many people. But just because the compiler sometimes produces the intended result does not mean that what you wrote actually means what you intended.


I agree with pretty much everything you wrote here. I think my main disagreement is that you can fix this at the language level. Languages are static entities; some are "living" in the sense that they get new major revisions (C++0x, Python 3, etc.) but on a day-to-day basis, you have to deal with the language you've got, and most languages don't have any way to "annotate in" these sort of semantics on a per-module basis. (C is one of the only places this has been given even bare thought, actually, with compiler #pragmas.)

IMHO the goal of a compiler is this: to take as input 1. code represented in some standard grammar, and 2. configuration represented in a compiler-specific format; and to combine these two things to generate a build artifact.

I assume that you want the build artifact to have the same deterministic behavioral semantics on all target platforms, regardless of whether your language allows you to encode those semantics. Insofar as compilers have the two inputs stated above, then each semantic property that can't be made explicit by the language grammar, must then fall to the compiler's configuration.

In other words, it's the compiler's job (through you) to fix the language's mistakes, such that vague code becomes non-vague binaries.

Imagine for a moment that, instead of global command-line switches passed directly to the compiler, the compiler's configuration existed in the form of a "hints file" alongside each source file—effectively, a proprietary system of markup/annotations for the language, but separated into its own file instead of being intermingled in the source like Java's @annotations.

In such a model, you'd have a 1:1 correspondence between input #1 and input #2: each file of C code would have an equivalent file of hints, telling the compiler everything it needs to know that isn't in the C code (and which, moreover, can't actually be expressed at all in C.)

With our current backwards batch-programming methodologies, you'd probably have to write the "hint" files yourself, or have your IDE generate them by twiddling knobs. My real expectation, though, is that your compiler would actually run through your source files interactively, asking you questions where it thinks things are vague and recording the answers in its hint files, only actually doing a compilation pass once all its questions have been answered. Your compiler would actually work with you like a copy-editor, taking your muddled prose and putting in little query-marks to show its confusion.

Which is all to say, we're never going to get a revision of the C language standard that will actually give "x + y" platform-independent overflow semantics. But we could definitely create compilers that would annotate each "x + y" in a C codebase with a particular overflow semantics, without doing anything to the code itself that breaks the C language grammar. This would let us remove the vast majority of what is currently "undefined behavior" from our code, without fighting a losing battle for more semantically-explicit languages.


> Anyway, that's all to say, IMHO the goal of a compiler is to take as input 1. code represented in some standard grammar, and 2. configuration represented in a compiler-specific format, and combine these to generate a build artifact.

Then, what you really are doing is defining a new language. And one with a horrible syntax at that. If the "code" alone does not actually describe what the program is supposed to do, that "configuration" becomes part of the program (if the binary's behaviour changes depending on whether you compile it with or without the "configuration", it is obviously part of the code), and having parts of your program kind of "out of band" is about the worst idea ever for auditing and maintenance, and for general readability. Also, given that that part of your new language is not part of the C standard, say, you cannot use any C compiler to compile it. If you are lucky, you find another compiler that implements another new language based on C with a different syntax for the "configuration" that provides the same semantics, then you can start porting from your own new language to that compiler's new language, and then maintain two implementations of that part of your code (erm, "configuration").

If you want to create a new language with better semantics, better invent one with sane, in-band, syntax. If you want to write C ... well, then write C. C is Turing complete, so you can express anything you need in it, and every (non-buggy) C compiler will understand it.

> My real expectation, though, is that your compiler would actually run through your source files interactively, asking you questions where it thinks things are vague and recording the answers in its hint files, only actually doing a compilation pass once all its questions have been answered.

I think that that can't work. There are just so many places where it's somewhere between extremely hard and impossible for a C compiler to prove that undefined behaviour cannot happen that you probably would need a dozen annotations for every line of your code. Plus, keeping track of how the recorded annotations map to the code after it has been modified is extremely fragile, so if you wanted to be sure, you'd have to answer all the questions for a source file again after any semantic change you make.


Well I submit that as a simple question of quality of implementation, implementers should try their hardest to go for the something that behaves "in a documented manner characteristic of the platform"! But instead, they seem to pick something else. And I claim that this is not helpful to their users, who would be far better served by the characteristic-of-the-platform option.


It's really friction between incompatible visions of C: Some want it to be a portable wrapper over assembler with predictable semantics, others want it to run as fast as possible.

Like it or not, the standard leaves the doorway open for the latter group to come up with increasingly aggressive optimization, only limited by their ingenuity.


> But why can't it just produce something different on each system?

It could. That is indeed how Rust, for example, defines it.

But in C "unspecified values" have a way of turning into undefined behavior really quickly.

Here's an actual bug we have in Rust [1]:

    let index = 1.04E+17 as u8;
    let array = vec![1, 2, 3, 4, 5];
    println!("{}", array[index as usize]); // segfault
Why does that segfault instead of emitting a safe panic? Because the IR looks roughly like this (in C notation):

    double tmp = 1.04e+17;
    uint8_t index = (uint8_t)tmp;
    vector *array = make_a_vector();
    if (index >= array->length)
        panic();
    printf("%d\n", array->ptr[index]);
The optimizer correctly determines that the cast from double to uint8_t results in an undefined value. In LLVM terminology, undefined values can arbitrarily change value over their live range, and this is allowed per the C specification. So LLVM is allowed to replace the conditional "index >= array->length" with "false", and this is indeed what it does, leading to this code:

    double tmp = 1.04e+17;
    uint8_t index = (uint8_t)tmp;
    vector *array = make_a_vector();
    printf("%d\n", array->ptr[index]);
And with that, LLVM optimizes out our bounds check, and safe code is able to perform arbitrary memory reads and writes. Note that all we needed was an unspecified value resulting from a floating point cast!

The fix is probably going to be to stop using LLVM's floating point conversion instructions and to instead use some "safe conversion" intrinsics that we'll define.

> What compilers do today should be the recourse of absolute last resort, and the sort of thing that makes its authors feel bad. But it seems to be treated as a matter of course. I don't know what to say.

You might think that I would think this way too after fighting bugs like the above, but I actually don't. I appreciate the performance that undefined behavior brings to C, and I think we'd have much slower programs without it. Undefined behavior resulting from strict aliasing is the reason why a for loop setting an array to zero can optimize to a 10x faster memset, for example. And heavily inlined series of functions, as well as generic libraries like the C++ STL, frequently require these optimizations to get reasonable performance. I think it's not an exaggeration to say that much of the software performance we take for granted—stuff you care about, like codecs, games, browser implementations—is the result of optimizations that are enabled by undefined behavior.

[1]: https://github.com/rust-lang/rust/issues/10184


For anyone wondering: double -> uintx_t is indeed undefined in C and C++ for values outside [0, UINTx_MAX], which is kind of a gotcha, since double -> long -> uint8_t is fine (assuming the integer part of the double fits in a long).
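
A hedged illustration with a hypothetical helper: going through a wide signed type first keeps the final narrowing well-defined, since unsigned narrowing reduces modulo 2^N, though the first cast is still undefined if the value doesn't fit in a long.

    #include <stdint.h>

    uint8_t narrow_to_u8(double d) {
        long wide = (long)d;   /* still undefined if the truncated value doesn't fit in long */
        return (uint8_t)wide;  /* always defined: the value is reduced modulo 256 */
    }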


Which they don't? Right?


The problem here is, as you point out, the conversion of double to unsigned byte.

More specifically, the problem is that processing this operation does not produce an error; it instead propagates the problem and produces invalid code. That's a situation where nobody wins.

I think we'd be better off if many undefined behaviours were instead implementation defined. If, instead, LLVM had converted the double to some integer and then chosen the lowest 8 bits from that integer, we'd have gotten some random value - but at least the comparison wouldn't have been ignored. The comparison should only be ignored if the compiler can prove that the value definitely lies within the range; if it isn't sure, it can't remove the comparison. The programmer wrote it for a reason; ignoring the intent of the programmer is a bug.

On strict aliasing, I'm against it without explicit opt-in over a delimited subset of source code. I understand that using & is going to harm the performance of my code; I think that's an acceptable tradeoff for more predictable behaviour. If it harms the performance of C++ vector iterators, for example, I don't care: strict aliasing rules are a worse cure than the disease of C++'s poorly thought out abstraction tools. Let a subset of the program follow Fortran rules if it's required.


> On strict aliasing, I'm against it without explicit opt-in over a delimited subset of source code. I understand that using & is going to harm the performance of my code; I think that's an acceptable tradeoff for more predictable behaviour.

I believe that you and others think that's an acceptable tradeoff. At the end of the day, though, most people want C compilers to produce the fastest code possible. Compiler authors are just responding to what users want. The example below is actually an excellent example of this.

> If it harms the performance of C++ vector iterators, for example, I don't care: strict aliasing rules are a worse cure than the disease of C++'s poorly thought out abstraction tools.

It's not just C++. Consider (from Chris Lattner's blog post):

    float *array;
    void zero_out() {
        for (int i = 0; i < 100; i++)
            array[i] = 0.0;
    }
People saw compilers missing the optimization to compile this to a memset, and filed bugs against them. As a matter of fact, though, that is an illegal optimization to make unless the compiler is able to take advantage of strict aliasing. That's because, absent strict aliasing, "array" could legally be a type-punned pointer back at "array" itself, so the stores could modify the pointer mid-loop, even though almost nobody would actually write that code.
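
Concretely, something like this hypothetical sketch -- nobody writes it, but without strict aliasing the compiler has to assume somebody might:

    float *array;

    void zero_out(void) {
        for (int i = 0; i < 100; i++)
            array[i] = 0.0;        /* if 'array' aliases itself, this store can
                                      change the pointer mid-loop, so the loop
                                      can't simply become a memset */
    }

    void make_it_pathological(void) {
        array = (float *)&array;   /* illegal under strict aliasing (and assumes
                                      float and float* are layout-compatible),
                                      but otherwise something the compiler would
                                      have to account for */
    }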

The reason why compilers implemented strict aliasing rules is that their users demanded that they fix "optimizer bugs" like the one above. And, honestly, I can't blame either the compiler developers or the users. Strict aliasing ends up being one of the things that's often important for performance. The problem, if anything, lies with the C language.


I'm not sure that's the best example. In that case, the knowledge that the 'array' global never has its address taken allows you to perform the optimization. You can also rewrite it by copying 'array' to a separate local that doesn't have its address taken, and ordinary SSA construction solves the problem.

I feel like a new systems language needs to come and make aliasing explicit in a way, so that it doesn't surprise programmers but still allows them to get the optimizations that they want. Tracking uniqueness like Rust is a good first step, but it doesn't work for unsafe code or anything with more subtle aliasing relationships.


> I feel like a new systems language needs to come and make aliasing explicit in a way, so that it doesn't surprise programmers but still allows them to get the optimizations that they want. Tracking uniqueness like Rust is a good first step, but it doesn't work for unsafe code or anything with more subtle aliasing relationships.

I think it could be added to Rust. :)

Uniqueness can be viewed as just a starting point. It's a conservative subset of more sophisticated systems like PALE. Since it's a subset, it could be expanded to the full system eventually.

I think the biggest problem is not implementing it or finding a way to slot it into the language design but rather making it easy enough to use to achieve widespread adoption. Rust already stretches the boundaries of programmers' willingness to learn new type systems/static analyses to control aliasing, and it has, as you point out, a very simple system. In fact, I think it's unproven that programmers will be willing to adopt such a system en masse at all†, since Rust is still niche. If and when Rust really takes off in widespread production, we'll have proof that programmers are willing to adopt aliasing information in the type system.

† It'll be interesting to see how the ISO Core C++ lifetime checker fares here. I'm not optimistic about how much use it will get due to complexity and effectively being a different language from C++, but I'd love to be proven wrong.


> I'm not sure that's the best example. In that case, the knowledge that the 'array' global never has its address taken allows you to perform the optimization.

Separate translation units. For this to work, you'd need to postpone all optimizations to the link stage. Even then you're not safe, because you may be producing a dynamic library and the program loading it can take the address of an object by dlsym().


You can specify aliasing explicitly with the restrict keyword. It's used in cases where you have two pointers of the same type that are otherwise allowed to alias each other. For pointers of different types it doesn't make much sense and shouldn't be required: there is just no valid reason, except trying to be a smart ass, for two pointers of different types to alias each other. For generic containers you can use void pointers.
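A minimal sketch of the same-type case (function and names are mine):

    /* 'restrict' promises the compiler that dst and src never overlap,
       so it doesn't have to re-load src after every store to dst and
       can vectorize the loop freely. */
    void scale(float *restrict dst, const float *restrict src, int n) {
        for (int i = 0; i < n; i++)
            dst[i] = 2.0f * src[i];
    }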


Wouldn't the compiler be capable of noticing that if array[0] is assigned zero while

- the array pointer points to the pointer itself, then the pointer is set to a NULL pointer (assuming an architecture where float 0 and NULL are represented in memory by zero), and dereferencing a null pointer is undefined behaviour

- array points to itself but misaligned (possible on x86), then there is a write outside the variable's bounds, which is undefined behavior

- array points to part of the code of zero_out or memset; however, function code is constant and cannot be changed, so this also causes undefined behavior

In each of these cases, there is undefined behavior that does not depend on strict aliasing.


It's a problem because that will lead to bugs. You write code on x86, it works for years, and then you run it on ARM and it misbehaves. If it weren't a problem, then the compiler doing whatever it wants also wouldn't be a problem.

IMO the best thing would be for the compiler to insert code that prints a warning and exits the program when it comes across undefined behaviour. (This is different from sanitizers, which check at every relevant place whether undefined behaviour will occur.)


The whole point of undefined behaviour is to allow for optimizations. Adding tons of runtime checks is the opposite of optimization. A lot of this is about being able to use a single instruction of the respective target architecture vs. adding another two, three instructions for the check, thus slowing things down to half the speed or less.


It's not really about optimisations, but more about being able to translate directly to fast machine instructions (which may behave differently on different machines). What I'm talking about is not adding checks, but what happens when the compiler knows that something has undefined behavior through e.g. constant propagation. This happens in the optimizer at the IR level. The undefined behaviour is there to make translation to machine instructions efficient: you don't need to do checks on pointers, and addition, shift, etc. can map to single instructions on different architectures even though those instructions behave slightly differently.


> What I'm talking about is not adding checks, but what happens when the compiler knows that something has undefined behavior through e.g. Constant propagation.

But ... either the compiler can prove at compile time that the program has undefined behaviour, in which case it should simply refuse to compile, or it can not, in which case it has to emit a runtime check to determine when undefined behaviour "is about to happen", which is potentially expensive?

> The undefined behaviour is there to make translations to machine instructions efficient - so that you don't need to do checks on pointers and that addition, shift, etc. on several instructions are supported although they behave slightly differently

... or in other words: for optimisation?

I mean, optimisation ultimately is nothing but the selection of the cheapest possible sequence of instructions to perform a given computation. Undefined behaviour helps with that, and not just in the last step of mapping your IR to instructions, but in every step before it: every optimisation step has to preserve the semantics of the code, and the fewer restrictions there are on the interpretation of a given piece of code, the more options every optimisation step has to transform the code without changing the semantics, and thus the more chances that one of them is cheaper than the others.


> But ... either the compiler can prove at compile time that the program has undefined behaviour, in which case it should simply refuse to compile, or it can not, in which case it has to emit a runtime check to determine when undefined behaviour "is about to happen", which is potentially expensive?

It's more fine-grained: in easy cases the front end (namely clang in llvm) can already see that something is always undefined behaviour, and it will emit a warning (for instance through constant propagation). This happens with `i = 32; x<<i` for instance.

The optimizer then does stuff like inlining functions, which lets it see even more places where there is sure to be undefined behaviour. This is the point I'm talking about. The compiler already knows that it will be undefined behaviour - no checks are needed. Performance is of no concern here, as the program is already in a buggy state and may just quit. This will happen when you have the function call f(32) in one function and in `f` there's a shift `x << param`. I believe in this case there will be no compiler warning and the compiler will just optimize out the shift. This is where I think compilers could behave more helpfully.

Where performance matters is the case where the compiler doesn't know the parameter. To be completely safe the compiler would need to check whether the shift amount is over 31 (`if shamt > 31 { exit(0) } else { x << shamt }`). Here there's a performance hit even when no undefined behaviour actually occurs.
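A sketch of both cases (function names are mine):

    #include <stdint.h>
    #include <stdlib.h>

    static uint32_t f(uint32_t x, unsigned shamt) {
        return x << shamt;          /* undefined if shamt >= 32 */
    }

    uint32_t known_bad(uint32_t x) {
        return f(x, 32);            /* after inlining, the compiler can see the UB,
                                       but typically just "optimizes" the shift away
                                       instead of warning */
    }

    uint32_t fully_checked(uint32_t x, unsigned shamt) {
        if (shamt > 31)             /* the runtime check a "safe" compiler would have
                                       to emit when the amount isn't known */
            exit(0);
        return x << shamt;
    }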


> The compiler already knows that it will be undefined behaviour - no checks are needed. Performance is of no concern here, as the program already is in a buggy state and may just quit.

You have it all backwards. If there is some place in the control flow of the program where the compiler can prove that the program is guaranteed to run into undefined behaviour, then either it has proved that the whole program unavoidably runs into undefined behaviour (in which case the compiler should simply refuse to compile), or there is a branching instruction preceding that part of the code that divides the control flow into the part the compiler has proved runs into undefined behaviour and the part where it has proved no such thing. In the latter case you already have a branching instruction (and probably associated condition computations) that functions as a runtime check, and that introduces overhead versus simply dropping the check/branch from the code.

If the program somehow determines at runtime whether it is in a defined or an undefined state, there has to be code to make that determination - code that the optimizer could avoid producing or could eliminate as far as language semantics are concerned, since it doesn't have to guarantee any specific behaviour in the undefined case. Simply having the undefined case execute the same code as the defined case avoids the check that would discern the two, and makes the defined case faster.


Probably because it would go in the same direction as the multiple-dialects proposal. If we can have bcc-x86 that returns x and bcc-arm that returns 0, then we'll likely end up with bcc-x86-fullshifting-nounaligned-nullcheckremoveok-otheroptions and 10 dialects that switch some of those behaviours. If you're designing a well-defined language, then why make it well-defined-per-architecture?


But the standard already permits this. I'm just suggesting that rather than have compiler writers use these liberties to allow their compiler to remove code or make it do something peculiar, they could just allow it to do the natural thing for the target platform.

(This is explicitly allowed by the standard. Consult its definition of undefined behaviour.)

Why not create a compiler that does what your users want? Why spend your time creating a stick to beat them with?


Because the overwhelming majority of users of C compilers are compiling someone else's code that isn't narrowly targeted to the platform+compiler in question. So they get no benefit at all from specifying undefined behavior, but they do get a benefit from the performance gained by assuming away undefined behavior as dead cases.


they do get a benefit from the performance gained by assuming away undefined behavior as dead cases

I think this is a highly suspicious assumption. Programmers don't write code for no reason; they have a mental model of the current state of parameters and variables, and establish invariants as they write loops and conditionals. If you're going to start throwing out chunks of code the programmer wrote, you need to be certain - not merely unsure, but certain - that the code is dead. Infectious undefined values that propagate through data flow and break control flow don't make for a sane programming environment.


The, to me, obvious thing to do with code with undefined behaviour is to emit an error and refuse to compile it. Programmers should never rely on undefined behaviour.

Then again, I think languages shouldn't have undefined behaviour and that programmers who use languages with undefined behaviour deserve what they get.


In the most interesting cases, it can't reasonably be detected at compile-time anyway.

See also here:

http://blog.llvm.org/2011/05/what-every-c-programmer-should-...


The problem, as Regehr (the author of this submission) has pointed out many times before, is more subtle than that. Because certain behavior is considered undefined, compilers are allowed to assume the code it is compiling is well defined, and optimize accordingly. That can cause simple bugs to have mysterious effects. For example:

    a->thing = 42;
    if (a == NULL) {
      return;
    }
Obviously that code is wrong; I should check for NULL before using a. But because I dereferenced a, the compiler can assume a must not be NULL, and just removes the null check as dead code. This scenario has caused problems in the kernel, and made bad bugs even worse.


But the compiler could just as well emit a warning "useless NULL comparison" instead of blithely assuming that the programmer intentionally wrote a bogus expression. That wouldn't handle every case of UB, but it would handle many.


A warning on removed dead code isn't helpful because dead code is legitimately removed all the time. No one would ever heed it.


If you ignored dead code produced by macros and limited the check to function-scope, I think you'd mostly eliminate false positives.


It's usually only at runtime rather than compile time that such situations arise.

At runtime we have the clang and gcc undefined behaviour sanitizers, but they aren't free in terms of performance.


I think the most important point of Friendly C is not to define the behaviour for what would otherwise be undefined, but to define a behaviour; from this point of view, it would be unnecessary to argue over the examples he mentions like memcpy() vs memmove() and integer shifting --- it only suffices that every implementation define the behaviour, and what that behaviour precisely is can differ between them. A lot differs already between platforms, and so all this basically means is to document those differences. This agrees with the spirit of the language as a "high level assembler", removes all the undefinedness, and would be entirely unsurprising to most programmers.

The situation gets worse when we start talking about a friendly semantics for races and OOB array accesses.

I don't have anything to say about races, but OOB array accesses should simply do what you'd expect the hardware to do: attempt to access memory at the location the array indexing equation gives. It may segfault, or it may access some other contents in memory. That's what any C programmer would probably expect.


You have to look deeper for the OOB array access question. For example, imagine I have code like this:

  if (a > 1)
    b++;
  array[b] = 0;
  c = a + 10;
Under your suggested semantics, can the compiler use the value loaded from `a` at the point of the if() statement to calculate `a + 10`? Or does it have to emit a reload of `a` after the array access, in case `array[b]` was an OOB access that overwrote `a`?


I'm not seeing the deeper question, whereas what 'userbinator' said still makes sense to me.

The compiler's implementation-defined behavior would simply be "if you tell me to write a value to an address, I'll create assembly that tries to write that value to that address". That's all. If this address is outside the allocated range of the array, there is no guarantee that it's safe to write this value, or that it won't break something else.

So in your example, no, the compiler is not required to reload 'a'. The compiler is allowed to presume that the array[x] syntax will have no effect on the value of a local variable, whether that variable is stored in a register or on the stack. The compiler is not guaranteeing safety, just best effort.


When you're saying that the compiler should be allowed to optimise under the assumption that OOB array accesses don't clobber other variables, this means that OOB array accesses can make very weird things happen indeed. For example:

  if (a > 1)
      b++;
  array[b] = 0;
  if (a > 1)
      foo(a);
We might see foo() called with argument 0 when that's apparently impossible - the OOB write has apparently "reached forward" (it can also "reach backward").

This is why it ends up just being "undefined behaviour" - to do better either you have to somehow exhaustively document all the kinds of weird things that can happen ("it writes to the memory" isn't enough, because of the way that can interact with the optimiser) or you have to unreasonably constrain the optimiser.


I'd say that it doesn't have to read 'a' again because these are separate variables and there's no requirement for 'a' and 'array' to be contiguous, or for that matter 'a' being in memory at all instead of just being in a register. An OOB access could overwrite the memory storing 'a', but I think this case of relying on the ordering and location of objects in memory is just not something any C code would ever have to do.


You're already allowing undefined behaviour then. Very confusing bugs could result - e.g. the value of 'a' might be different from what it looked like in a debugger, because the array access changed it in memory but the copy in the register hasn't changed.


How about treating it just like another thread accessing the values? So yes in the general case, no if you put in memory barriers or atomics.


Race conditions are generally treated as "you get undefined behavior" for much the same reasons.


I mean race conditions on actual hardware, not the way the standard abandons all hope yet again.

Stale reads don't require total chaos.


    I don't have anything to say about races, but OOB array
    accesses should simply do what you'd expect the hardware
    to do: attempt to access memory at the location the array
    indexing equation gives. It may segfault, or it may
    access some other contents in memory. That's what any C
    programmer would probably expect.
The problem is that defining OOB array access this way means throwing out pretty much every semantic guarantee from the rest of the language. There is really no friendly way to define writing an arbitrary value to an arbitrary location. By way of example, pretty much every (non-embedded) C implementation since 1972 supports recursion by writing the address of code to execute (the return address) to memory. If you have semantics allowing a programmer to write to that location, then you can start executing literally any piece of code you want to. On some platforms like x86 you can execute starting in the middle of an instruction.

For example, "return" would stop meaning "execution resumes at the callsite of this function" and instead mean "execution resumes at the callsite of this function or maybe somewhere else if someone wrote to arr[-5]". The article makes a big point about the controversy of giving memcpy() the semantics of memmove(); how would you like the semantics "memmove() unless a concurrent thread is running, in which case anything might be written to dest and src might be overwritten."


Tone: I do not mean this as sarcasm or merely chasing fashion, I'm quite serious. As both theory and practice are showing, you're never going to be able to get the consensus you want out of C. There's no "saving" C... not because that's somehow mathematically impossible, but simply because the project is too staggeringly large for us to even wrap our heads around. It would literally be easier to get people to start using another language...

... so, why not do that then? We have, for perhaps the first time in 40 years, a candidate for a systems language that can truly replace C, that has truly different semantics (i.e., not C++, which is still profoundly C with a lot of stuff bodged on the side). I'd suggest trying to use Rust, and working with that team to nail down whatever remaining issues may yet be undefined in Rust that may cause trouble in the future. Whatever remaining practical problems there may be (and my impression is that that isn't really a long list), work on resolving them.

Again, I am not being sarcastic or cynical; it is truly my estimate that it would be easier to get people behind that than to fix C at this point. Probably by a good two or more orders of magnitude. Obviously we're not going to rewrite all existing code. Obviously there's a lot of code still to be written that is so deeply embedded in C that there's no practical alternative to adding more C to it, even in a world of FFI support and such. But if the people who care about the idea of BoringC or Friendly C start getting behind Rust, getting their hands dirty with it, and doing what the Mozilla project is doing to find places where they can start slotting subsystems in cleanly to existing code bases, you may just be able to start creeping out of the mess we're in now.

And... who knows. If this becomes acceptable, then generally considered a good idea, then best practice, then perhaps even something you need to do unless you want people to think poorly of you... perhaps real change will prove to be less intractable than we thought. People consistently overestimate change in the short term but underestimate in the long term. It is, perhaps, not too much to hope for that huge swathes of our fundamental systems could be running on Rust in 20 years, instead of C.

So I'll reemphasize once again... yes, I know I'm proposing a staggeringly enormous change. The only thing that it has going for it is that I still think it's easier than the staggeringly-enormous-squared other choice.

Something's gonna happen. After all... what's the alternative? That C is still the foundational language of the entire computing world in 2035?

Seriously?


I'll take the bet that in 2035 it's still going to be C/C++ (or a C derivative like Boring C). Because rewriting all that code is an economic impossibility. There's a very long way to go before the rate of foundational Rust code written exceeds the rate of foundational C/C++ code written. And even if you manage to have 100% Rust and 0% C/C++ code being written, you still have a huge legacy to rewrite, which literally costs billions of dollars (ALL operating systems, ALL browsers, ALL interpreters (Python/PHP/Perl/...), ALL compilers (GCC, LLVM), ALL web servers, etc.)

Honestly I think your view of technological adoption is fairly naive -- I don't see much content here other than "everyone get behind Rust!".

IMO the more realistic approach is a systems approach: make it so that badly written C code doesn't completely hose your system. I like the application compartmentalization work (Chrome style, DJB style), and capability work like Capsicum. And LangSec work in making safe parsers. The trusted computing base has to be reduced. Not every line of C code should run in a trusted context!!! Principle of least privilege. We know (or should know) all this stuff.

That is many orders of magnitude less work / cost, and I think actually feasible. Coherent and secure systems architecture is more achievable than everybody writing perfect C code or everybody switching to another language.

Another thing the Rust community should be working on is easy and efficient IPC with C programs. So you can rewrite a secure core in Rust and communicate with legacy C/C++ running in an untrusted OS context.

And also fixing the mess that is Linux containers, so it isn't so difficult to secure them (i.e. see Docker's security issues)


'Honestly I think your view of technological adoption is fairly naive -- I don't see much content here other than "everyone get behind Rust!".'

I think you may not have seen much content because you weren't looking for it very hard; for one thing, you seem to have simply instantiated for me an instance of the fallacy I mentioned. People overestimate change in the short term, but underestimate it in the long term.

"IMO the more realistic approach is a systems approach: make it so that badly written C code doesn't completely hose your system."

We've been doing that for 40 years. Since it is already a sufficient problem that C can hose the process it is in, even perfect process isolation is not sufficient. Building all the protection code in C also has a track record of demonstrated failure. You're basically proposing we continue doing the exact same things that are also the reason we're having this conversation in the first place. If this plan worked, we would not even be here.

I'd also point out that if we're talking track records, capabilities have failed pretty comprehensively to date.

"We know (or should know) all this stuff."

And yet, here we are. Theory says this ought to work. Reality says it isn't. Reality wins.

"Another thing the Rust community should be working on is easy and efficient IPC with C programs. So you can rewrite a secure core in Rust and communicate with legacy C/C++ running in an untrusted OS context."

More evidence that you "didn't see much content" in my post because you weren't looking for it is that I already said that.

Finally, I'm arguing the first derivative and you're arguing the zeroth. I think it's perfectly feasible that much more of the code written in 2035 will be in Rust than in C, and people will look at you funny if you insist on writing a new subsystem in C. If you think that's unthinkable, well, that's the long-term underestimating fallacy I was talking about. Since, as I already mentioned in my post, Mozilla is already starting to do this, it's hardly unthinkable that by 2035 everybody will be.

With respect, you clearly read what you pre-judged and expected to read in my post, not what I actually said. If you found that to be "naive", well, I don't think that comes from me.


I shouldn't have been so harsh, because despite the fact that these techniques are already widely deployed [1], most people don't understand them. In fact I had a conversation with an ffmpeg developer on this forum and he didn't think securing ffmpeg this way was feasible. And I think another reply to my message mistook FFI for IPC, as you did. So the vocabulary hasn't even trickled down to most programmers.

Application compartmentalization is certainly difficult in practice, and should be made easier, but it is deployed and it's a lot easier than rewriting all code in a different language.

ffmpeg is a great example. It is hideously broken, with thousands of likely security bugs [2]. Now let's compare these two strategies:

    1) Rewrite EVERY video codec in Rust
    2) Reuse existing code, but run it in a completely
    untrusted process (no OS privileges), and only 
    communicate with IPC.
The claim is that #2 is much more achievable than #1. You could say that #1 takes orders of magnitude more work, which is true, but that doesn't even quite capture it -- it actually scales differently because for #2 you do constant work and get security for ALL video codecs. It's O(1) vs O(n).

Why not get secure foundations by 2025 instead of 2035? What I'm arguing for is a hybrid, heterogeneous architecture -- instead of rewriting 100% of code in Rust, rewrite 5% in Rust and isolate the remaining 95%. That gets you a better result with much less effort.

These techniques predate modern browsers -- they are used in qmail and djbdns. [3] [4]

I agree with the idea that most people underestimate change in the long term, but that is orthogonal to my point -- we can get secure systems faster with a different technique, in a shorter time frame.

Also, if you look at the code we're using today, it is perfectly reasonable to conclude that in 2035 we will be using a lot of the same code. People are still using Linux (1991), GNU tools (started in the 80's), Python (1990), Perl (1985), PHP (90's), etc. In 2035 it is undoubted that we will be using code from 2015 -- in fact it is likely we will be using code from 1985 in 2035. Those are the economics of software. Stable code is extraordinarily valuable and long lasting.

But the good news it that we can be running C code from 1985 in 2035 and still have secure systems!

I apologize for saying your post was naive, but I can tell from your original post and response that you haven't worked with security or operating systems very much. I have some impatience because the "rewrite everything" response reminds me of people who think they can rewrite an old codebase and do better, when in reality they would end up with something worse [5]. There is a lack of subtlety and lack of understanding implied when you want to start completely from scratch.

Browsers like Firefox and Chrome didn't even start from scratch. Firefox was derived from decades-old Netscape code, and Chrome reused WebKit, which was derived from Konqueror. Google has infinite money, but didn't rewrite Android from scratch -- it is based on Linux and hundreds of other open source projects. Code lasts A LONG TIME -- simply suggesting that everyone rewrite it is not helpful. It's an economic impossibility.

[1] Deployed by code you are using every day: (Chrome, Firefox, IE, Safari, etc.)

https://en.wikipedia.org/wiki/Process_isolation

http://www.chromium.org/developers/design-documents/multi-pr... https://developer.mozilla.org/en-US/Firefox/Multiprocess_Fir...

[2] https://googleonlinesecurity.blogspot.com/2014/01/ffmpeg-and...

[3] http://cr.yp.to/cv/activities-20050107.pdf -- see "extreme sandboxing".

[4] http://cr.yp.to/qmail/qmailsec-20071101.pdf

[5] http://www.joelonsoftware.com/articles/fog0000000069.html


I haven't used it yet, but apparently Rust's FFI with C (in both directions) is quite good:

https://doc.rust-lang.org/book/ffi.html


That doesn't do IPC though, right? When I say IPC, I mean where you have a Rust process and a C/C++ process running with different privileges. When people say FFI, they mean Rust code and C code running in the same process.

I thought Chromium had a library for this (for multiple C++ processes), but I guess it is sort of ad hoc now?

https://www.chromium.org/developers/design-documents/inter-p...

https://www.chromium.org/Home/chromium-security/education/se...



This seems like a great start, but it would be far more useful if there was a C/C++ side. I only see Rust examples, which makes me think it is for Rust <-> Rust communication.

Though it's great to see this, because just rewriting in Rust -- while fantastically expensive -- isn't necessarily enough. You need multiple security measures. The grandparent comment was mistaken about this.


In fact, we removed language features in order to gain a zero-cost interop.


Which ones?


Green threads.


Regehr came to a Bay Area Rust Meetup long ago; we're all generally fans of his work. :)


I love the ideas behind Rust. I don't think it can replace C for all use-cases.

C has the use-case of zero-dynamic-allocations. AFAIK, Rust is not very compatible with that mode of use.


As far as I know, C and Rust give you the same opportunities to work with stack-allocated values.


It's not just stack allocations, it's support for intrusive allocations, pointers not always conforming to simple ownership semantics, etc.


Huh? If you don't want any dynamic allocation (or any other features that require support from a runtime system), you can use `#![no_std]` and instead use `libcore` directly. This is how the standard library itself is implemented.


The problem, as I understand it, is that Rust's ownership semantics don't accommodate the kinds of programming needed when there are zero dynamic allocations.

Lots of intrusive nodes pointing freely at one another is an example. I've heard some Rust libraries approximate intrusive allocations almost perfectly, but not quite.


As long as you're willing to use unsafe, you can write things exactly as you can as in C. That's of course, not as satisfying as not needing to, but by encapsulating that in a library and providing a safe interface, downstream users still get the benefits.


Can a safe interface always be provided?


I can't say that I'm able to speak to every single possible program ever, but at _some_ level, it should end up safe, yeah.


Rust's ownership and borrowing rules are entirely a compile-time thing: they just determine what objects can be safely used, and when. There is absolutely no need to use dynamic allocation to benefit from Rust's ownership and borrowing checks. And none of the other core language facilities (primitive types, unsafe pointers, fixed-size arrays) require dynamic allocation either.

What does require dynamic allocation is using boxed objects and dynamically sized collections (such as vectors and tree maps), which are provided by `libstd`, but not by `libcore`. Think of using `libcore` as using C or C++ in freestanding mode.


As far as I know, nobody managed to create a comprehensive intrusive data structure library that behaves as optimally as in C, due to the ownership semantic restrictions.


So just bypass the ownership checks, with `unsafe`. It gives up on memory safety, but you still keep all the other benefits of a modern language: algebraic data types, pattern matching exhaustiveness checks, parametric polymorphism, etc. So, even when writing `unsafe` code, Rust is strictly superior to C and C++.

Also, I would contend that, for the vast majority of programs, the benefits of a simple, hierarchical, statically enforceable resource management scheme like RAII outweigh the downsides. Programs that need more sophisticated approaches are a very small minority.


Actually I think you are being rather timid in your proposal. C is the symptom: the cause is the underlying Von Neumann machine. The future is hardware - re-configurable hardware connected as needed for the present purpose.


I think this is a fundamental problem with C: both with the language itself and with the C programming culture of insisting on very low levels of abstraction.

Bugs and undefined behaviour are too easy for programmers to write when the level of abstraction is low: too much boilerplate to get wrong. But too low a level is also bad for an optimizer. It can't assume it understands the programmer's intention with the code.

C++ tries to fix some of this by creating a higher level language and library on top of C. Low level code is considered unsafe, when higher level abstractions can be used as replacements.

Some examples of replacing low level with high level. Raw (owning) pointers and manual memory management are replaced with RAII value semantics and occasionally smart pointers. Raw loops are replaced by container iteration and better yet by STL algorithms. Casts are replaced by templates. Threads and mutexes are replaced by tasks and async, etc.

Eventually the idea is to subset C++ so that we can rip the C out of C++.

https://github.com/isocpp/CppCoreGuidelines/blob/master/CppC...


I don't think C is bad for optimizers. What language consistently generates faster machine code than C?


Fortran, but Fortran is also quite low-level. It can do some optimizations that C can't because it disallows pointer aliasing.


restrict?


You have to choose whether or not you want the compiler to generate code that spends time doing things that the programmer didn't ask for.

If you write a library with a function which accepts variables x and y and computes x[y] (or x<<y), there's no way for the compiler to know in advance whether the user will pass in well-behaved values for x and y. If you insist on the behavior being defined in pathological cases, then the compiler must add some kind of branching logic in there to check what's passed in at runtime. In other words, spend time doing things the programmer didn't ask for.
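For instance, an always-defined indexing helper has to look something like this (a sketch; names are mine, and note it even needs a length parameter that the plain C interface might not have):

    #include <stddef.h>

    /* extra branch on every call, paid even by callers that always
       pass well-behaved values */
    int get_defined(const int *x, size_t len, size_t y) {
        if (y >= len)
            return 0;       /* some arbitrary "defined" result */
        return x[y];
    }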

Maybe when we make more progress with formal proof-generating languages, we can create a "friendly" C where the compiler refuses to compile the code until it's accompanied by formal proofs of UB-avoidance.


"Maybe when we make more progress with formal proof-generating languages, we can create a "friendly" C where the compiler refuses to compile the code until it's accompanied by formal proofs of UB-avoidance."

That's entirely feasible. I headed a team which did that for a dialect of Pascal over three decades ago.[1] It's since been done for Modula III, Java, Spec#, and Microsoft Windows drivers.

One reasonable thing to do is to have the compiler generate run-time assertions for every statement which has a precondition for defined behavior. All pointer dereferences get "assert(p != 0)". The shift problem in the original article gets "assert(n >= 0); assert(n < 32);". Now everything has run-time checks.

Then you try to optimize out the run-time checks, or at least hoist them out of loops. A simple prover can remove about 90% of the checks without much effort.
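A sketch of the hoisting step on a simple loop (my example, not from the paper):

    #include <assert.h>

    long sum_naive(const int *p, int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            assert(p != 0);         /* generated check, executed every iteration */
            sum += p[i];
        }
        return sum;
    }

    long sum_hoisted(const int *p, int n) {
        long sum = 0;
        if (n > 0)
            assert(p != 0);         /* same check, hoisted out of the loop */
        for (int i = 0; i < n; i++)
            sum += p[i];
        return sum;
    }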

[1] http://www.animats.com/papers/verifier/verifiermanual.pdf


Fun fact about large shifts being undefined.

With a 32 bit x, the expression x << b | x >> (32 - b) can be translated to a single "roll x b bits to the left" instruction. But that is only possible because shifts by 32 or more are treated as undefined (which is what the b == 0 case would hit).


Not true. In fact, modern x86 backends recognize the safe rotation idiom and also translate it to a single ROL instruction: http://goo.gl/NCMEVu http://goo.gl/9Ytsz2 http://goo.gl/vbzHNW
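For reference, the usual "safe" rotate idiom looks like this (a sketch; per the links above, modern backends still lower it to a single rotate instruction):

    #include <stdint.h>

    uint32_t rotl32(uint32_t x, unsigned b) {
        b &= 31;                                  /* no shift count ever reaches 32 */
        return (x << b) | (x >> ((32 - b) & 31)); /* well-defined even for b == 0 */
    }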


I must be missing something here, but can't you simply not invoke undefined behavior in your program? That way, you don't have to worry about what your compiler will do when it encounters undefined behavior.


That's possible.

It requires extreme care (assuring no memory leaks is comparably a walk in the park), and will make your code much less performant. But it's possible.


It's easier to expand than contract. When the same decision is faced multiple times, it will be made multiple ways, and it's very difficult to remove one of those options once it's in use. On the other hand, if you remove the decision by standardizing on one of the options, you can always allow the other option later.

Working in Perl, I run into this a lot. When 'there is more than one way to do it' is a driving principle, it's easy to be inconsistent in usage and design. There's a tradeoff here -- it's easier to write code, because you can pretty much write your solution however it pops into your head. However, it's harder to read and maintain, because you have to recognize many many different patterns of usage. When code has been around for awhile, it starts to get an eclectic mix of patterns blended together, which can be frustrating to run into when you're trying to fix that code.

How do other people deal with this? Part of it is language choice, I'm sure. Large companies tend to use languages with less stylistic freedom, which helps teams write code more consistently. But within a given language, how do you balance freedom and consistency, so that people can understand each others' code effectively without being overly burdened by restrictions about how they can write code?


Working in, say, Java doesn't really force consistency; it just pushes the inconsistency up a level. Programmers still get painfully "clever" and idiosyncratic.


Being an ignorant fool with an uninformed opinion, I would like to see a C compiler that is evaluated and critiqued not only based on the warnings and errors it generates but, more importantly, on the assembly it generates.

Namely, how compact and readable is the generated asm? When we read the asm, can we easily follow what the compiler has done and _why_?

As an ignorant fool, in my mind C is still a shorthand for writing assembly, to save old programmers from continuing to be or new programmers from becoming "assembly language programmers". Obviously many years have passed and "C" has become an institution and means much more to so many people. Apologies to those people. I am just a fool.

I see the _theoretical_ "C compiler" as nothing more than a code generator, spitting out assembly. Obviously the _practical_ C compiler is very different. Based on the way it's used, it seems inextricably linked with a "preprocessor" (glorified sed).

It is said that asm has a "one to one" relationship with machine code. Theoretically, we can look at asm and determine its machine code equivalent without any "clever" algorithms.

My humble, ignorant fool's opinion is that it would be better if C had a closer relationship to the asm the "C compiler" generates.

Maybe not "one to one" but at least "predictable, boring".

Forgive me for having opinions about overly complex things few people can comprehend (e.g., a "modern C compiler"). I am just an ignorant fool who likes code generators. Especially ones that output assembly.


Most C compilers have a flag to turn this mode on. It's usually called -O0 and is actually the default.


C doesn't turn into some magical obtuse assembly with optimizations enabled; a loop will still be visible in some way but might be vectorized, unrolled, eliminated or even merged, and you can still map it to the source. It's interesting to see what a compiler can do: ICC does some really fancy stuff, such as eliminating whole chunks of code that are meant to help most compilers vectorize.
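For example (a sketch; the exact output depends on compiler and flags):

    /* at -O2, gcc and clang typically turn this into a memset call or a
       vectorized loop, but it's still recognizably "the zero loop" in the asm */
    void zero(int *p, int n) {
        for (int i = 0; i < n; i++)
            p[i] = 0;
    }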


I agree it would be an interesting exercise to write a compiler optimized for assembly readability. But I don't think there is enough demand for such a thing for it to get written.

It also would be interesting if those who think there is enough demand start a crowdfunding campaign to prove that there is enough demand.


> But I don't think there is enough demand for such a thing so that it will get written.

Isn't that basically what we have academia for?


I don't think there is enough academic prestige for a compiler optimized for assembly readability either.


"_compact_ and readable"

For me, the former complements the latter. The less code I have to read, the better.


I applaud the author for reaching the daunting part of this project and reacting with humility appropriate to the difficulty. I hope they continue in this effort, because they sound like the right person for the job.

There are going to be a lot of trade offs in any friendly C spec. Appreciation for both sides of a trade off is valuable in making good decisions.


How exactly would performance degrade by defining a virtual machine for C programs to run in, where assurances are given that all of the standard behavior is fully defined (this operation either fails, or gives THIS result)? It sounds like it could be massively useful, at least for non-realtime applications.


pcwalton claims a factor of 3x to 5x elsewhere in the thread.

Honestly I think the biggest problem here is social. Every programmer thinks they're smarter than others. If you give them a button marked "remove all safety checks, increase performance by 1%", they'll press it. Those who wouldn't press it have probably already moved on from C.


Creating a C virtual machine might work for most user space applications, but what about operating systems and system libraries? Most operating systems and low-level libraries are implemented in C, at least partially, and implementing a C virtual machine that you then want to run kernel code on top of seems daunting. Performance-wise I would assume a C virtual machine would be on par with Java (JVM).

So I would say a C virtual machine wouldn't be a very fruitful endeavor. Most user space applications can be written in a memory-safe language, while the most entrenched C code (kernels and system libraries) probably can't, or at least shouldn't, be executed in a virtual machine.


Let's say you have a trapping division on the actual CPU you run on. And let's say the virtual machine specifies that division doesn't trap. Then the compiler has to insert checks for invalid inputs to prevent trapping from happening. Those checks may degrade performance.
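A sketch of what that could look like for integer division (names are mine; the fallback value is arbitrary):

    #include <stdint.h>

    int32_t div_nontrapping(int32_t a, int32_t b) {
        if (b == 0 || (a == INT32_MIN && b == -1))
            return 0;      /* some defined result instead of a hardware trap */
        return a / b;
    }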


The Android team idea didn't make any sense to me.


I didn't get that either. They're shipping a totally broken locale.h and should be the last people ever to get involved in C-language design.


I wonder what Ritchie would think of that.

Also, if friendly C is a problem, does that mean that friendly C++ is in the realm of the impossible?


Target a virtual architecture (LLVM IR). Run the original code in an emulator for the target system (QEMU); give warnings where the two differ, with a switch for the virtual architecture to mimic the behavior of whatever architecture you need.

You get friendly C, a new way to think about warnings, and your old code can still be compatible with the new virtual architecture.

Next up, world hunger.


You can't simply run code for all possible inputs.


No, but you can certainly exercise it well. It might serve as a degree-of-confidence measure. Combined with property verification (so, formal verification), you'd get a damn good idea they'd be close.


This solution relies on treating some undecidable problems as decidable.


How so? I don't see the connection to undecidability.


If you have undefined behaviour after an infinite loop that breaks on some condition, then that undefined behaviour will not necessarily have been triggered. Thus, no warning.


Regehr himself "disproved" Fermat's Last Theorem by abusing a non-terminating loop and over-aggressive compilers.

http://blog.regehr.org/archives/140


Yes, this is a great example of the consequences of undecidability. It was taught to me in my algorithms and proofs class.



