
One Word Broke C - quelsolaar
https://news.quelsolaar.com/2020/03/16/how-one-word-broke-c/
======
_kst_
The author claims that:

> In C89, undefined behavior is interpreted as, “The C standard doesn’t have
> requirements for the behavior, so you must define what the behavior is in
> your implementation, and there are a few permissible options”.

That's a serious misinterpretation. C89 says that "Permissible undefined
behavior" includes "ignoring the situation completely with unpredictable
results". There is absolutely no requirement to document what those
"unpredictable results" might be.

The standard joke is that one possible consequence of undefined behavior is
making demons fly out of your nose -- not because that's actually possible,
but because actually making demons fly out of your nose would not violate the
standard. That's equally true in C89 and all later versions of C. The change
in C99 from "Permissible" to "Possible" made no difference. The phrase
"imposes no requirements" has always meant exactly that. (And if you think the
change from "Permissible" to "Possible" is semantically significant, then it
would have been a recognition of what was already accepted.)

The idea of "nasal demons" goes back to 1992.
http://catb.org/jargon/html/N/nasal-demons.html

~~~
fulafel
As a metapoint, the fact that there is this confusing epistemological
minefield around the specified semantics is a bad thing in itself, separate
from the evils of UB.

~~~
paulddraper
You're not wrong but what are the alternatives?

(1) Have a vague spec that doesn't distinguish between defined and undefined
behavior.

(2) Define everything, at the cost of performance/hardware support.

Remember that C works on 50-year-old hardware, and current hardware.

~~~
fulafel
(2) is the usual standard in computing. It's not like there aren't any
performance or hw support compromises in current C.

------
nneonneo
My "favourite" recent compiler bug comes from Rust. Compile the following code
in Release mode:

    
    
        fn main() { (|| loop {})() }
    

This defines an anonymous function containing an infinite loop, and calls it.
Any sane compiler would make this just loop forever, right?

    
    
        $ cargo run --release test
        Illegal instruction (core dumped)
    

Yeah, that's right. In _Rust_, an ostensibly safe language, Clang's
overeager optimizer chooses to emit an _illegal instruction_ in lieu of an
infinite loop. (Try it here:
https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=4233cbe4bf524b1154b2e7b560d58610)

This particular bug actually comes from C++. In C++, threads may be assumed to
always make forward progress, i.e. side-effect-free infinite loops are UB.
However, this is _not_ true in C, and it's definitely not true in Rust, yet
Clang makes this assumption anyway.

It gets a lot worse; once Clang sees an infinite loop and decides it's UB, it
gets to make all sorts of thoroughly invalid and silly "optimizations"; see
this SO question for some examples:
https://stackoverflow.com/questions/59925618/how-do-i-make-an-infinite-empty-loop-that-wont-be-optimized-away
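
For what it's worth, C11 (6.8.5p6) only lets an implementation assume termination for a side-effect-free loop whose controlling expression is _not_ a constant expression, so a literal `while (1) {}` must loop forever in conforming C. A minimal sketch of the kind of loop the compiler _may_ assume terminates (the function and example are mine, not from the thread):

```c
#include <assert.h>

/* The controlling expression n != 1 is not a constant expression, and
   the body performs no I/O, volatile accesses, or atomics, so per C11
   6.8.5p6 the implementation may assume this loop terminates -- even
   though nobody has proven the Collatz conjecture. */
int collatz_steps(unsigned n) {
    int steps = 0;
    while (n != 1) {
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        steps++;
    }
    return steps;
}
```

A literal `while (1) {}`, by contrast, has a constant controlling expression and is exempt from that assumption.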

~~~
lmm
> This particular bug actually comes from C++. In C++, threads may be assumed
> to always make forward progress, i.e. side-effect-free infinite loops are
> UB. However, this is not true in C

That's very much disputed. The C standard requires programs' behaviour to be
as-if executed on the C abstract machine when they terminate. It's not at all
clear whether the standard imposes any requirements on programs that would not
terminate when executed on the C abstract machine.

~~~
pingyong
Is there even a case where the compiler can optimize a terminating loop better
when it is allowed to assume that it will terminate at some point?

~~~
palmtree3000
Yep. Here's an example I found once:
[https://godbolt.org/z/XQBcR9](https://godbolt.org/z/XQBcR9)

    
    
      pub fn mul1(mut a: i32, b:i32) -> i32 {
          let mut out = 0;
          while a != 0 {
              out += b;
              a-=1;
          }
          out
      }
    
      pub fn mul2(a: i32, b: i32) -> i32 {
          if a == 0 {
              0
          } else {
              b + mul2(a-1, b)
          }
      }
    

Both are optimized to imul. But that's not actually correct: neither of these
should terminate for negative a!

Incidentally, this is actually a Rust bug, caused by LLVM performing this
optimization despite it being illegal in Rust.

~~~
OskarS
Interesting! In C/C++, optimizing to imul would be correct, since if a were
negative the loop would eventually overflow it past INT_MIN, and signed
integer overflow is UB. Therefore ignoring that case is fine.
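
A C analogue of mul1 (a sketch I'm adding for illustration, not the commenter's code), where the transformation is legal: if `a` starts out negative, the repeated decrement would eventually take it past INT_MIN, which is signed overflow and hence UB, so the compiler may ignore that case and emit a single multiply.

```c
/* Naive multiplication by repeated addition. For a < 0 the loop never
   reaches a == 0 without decrementing past INT_MIN (UB), so a C
   compiler is entitled to compile this down to a single imul. */
int mul1(int a, int b) {
    int out = 0;
    while (a != 0) {
        out += b;
        a -= 1;
    }
    return out;
}
```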

Presumably the same is true for LLVM IR, which would mean that it's Rust's
responsibility to emit code that checks for that case since in Rust
under/overflow is defined. Very interesting compiler bug! I noticed that they
haven't fixed it for the latest version. Did you submit the bug report to
Rust?

~~~
aw1621107
> Presumably the same is true for LLVM IR

I think it depends on the flags the multiplication operation is emitted with;
namely, the nsw and nuw (no signed/unsigned wrap) would denote whether the
optimization LLVM does is correct. If rust emits those flags then that would
be a Rust bug; if Rust does not, then it might be an LLVM bug (I'm not sure
what LLVM's semantics are for regular imul without flags).

------
userbinator
This topic comes up semi-regularly here on HN but I think the underlying issue
is that standards do not exist in a void, yet a certain group of people seem
to think that they do --- the fact that a standard "imposes no requirements"
should really be taken to mean "think about what really makes sense", not "do
whatever the fuck you want". Hence the "but we still comply with the
standard!" defense from compiler authors, when faced with perplexing results
of UB, totally ignores the reality and practicality of what a compiler is for.
Note that the standard writers have even tried to give that hint, with the
"behaving during translation or program execution in a documented manner
characteristic of the environment" phrasing which is precisely what
programmers are usually expecting, but it seems few actually took the hint.

~~~
giomasce
Different people have different ideas on what "makes sense". The point of a
standard is to not have to use common sense, which is usually not common at
all. So there is no point in saying that when the standard does not say
anything you should use common sense. If the standard says nothing, you should
just avoid relying on that thing.

The compiler is not responsible for the quality of the source code. The
programmer is. The compiler is responsible for the quality of the machine
code, assuming that the source code is correct. It is good to have tools to
check the quality of source code, but they are a different thing from
compilers.

Personally, I am happy that the compiler optimizes the code for me when I
write correct C/C++ programs. If I make a mistake and inadvertently do UB, I
take the responsibility, without shifting it on the innocent compiler.

~~~
pjmlp
> The compiler is not responsible for the quality of the source code. The
> programmer is.

The last 40 years have proven how well it works in practice, especially if one
plugs some kind of networking into it.

If taking responsibility were held to the same liability level as in other
engineering disciplines, it would be interesting to see how long the myth that
only bad programmers write bad C would survive.

~~~
comex
I don't know how much this actually affects your argument, since you seem to
be making a more general statement, but for the record: it's extraordinarily
rare for vulnerabilities to be caused by compiler optimizations specifically.
I've heard of a few instances (BIND denial of service; Linux TUN bug becoming
exploitable instead of a denial of service; IIRC something with Native
Client), but they're interesting precisely because they're rare. So I'd say
that one shouldn't oppose compiler optimizations directly because of the risk
of security vulnerabilities, although of course the prevalence of
vulnerabilities can still be evidence that C programmers are prone to mistakes
in general.

------
tumult
This is shown as an example of illegal C:

    
    
        struct{
        char x
        char y;
        }a;
    
        memset(&a, 0, sizeof(char) * 2);
    

But the only thing illegal about it is that there's no semicolon after 'char
x'.

Writing to (theoretical) padding in this way is fine. Though, you might want
to check the manual to see if this could generate a trap representation on
your platform. (This won't, on any platform I'm aware of.)

Also, the sizeof operator always returns 1 for char and unsigned char (C99:
6.5.3.4), so there's no reason to do sizeof(char).
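
To make that concrete, here is the snippet with the missing semicolon restored, reworked into a compilable sketch (the struct tag and function name are mine):

```c
#include <string.h>

struct pair {
    char x;   /* semicolon restored; this was the only illegal part */
    char y;
};

/* Writing over the whole object, (theoretical) padding included, is
   fine. sizeof *a is equivalent to sizeof(char) * 2 here, since sizeof
   yields 1 for char by definition, but it stays correct if the struct
   ever changes. */
void zero_pair(struct pair *a) {
    memset(a, 0, sizeof *a);
}
```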

~~~
unwind
I didn't understand _why_ you would want to write that code. To me, a struct
is a (very very mild) abstraction, since there can be padding added for
alignment and if you don't need to know about each field's offset, then you
shouldn't care.

So just write

    
    
        memset(&a, 0, sizeof a);
    

and be done, that will zero any padding too and end up just doing the right
thing. It's also very clear to the compiler what you're after, and I wouldn't
be at all surprised if a compiler chose not to call memset() for this, and
just did the equivalent of

    
    
        a.x = 0;
        a.y = 0;
    

or perhaps, by knowing about padding, doing a properly-sized single write to
both fields at once.

~~~
tumult
_Why_ is beside the point. I'm not making a judgment about morality or
motivation. I said that one of the few concrete examples in the blog post was
factually wrong. It shouldn't be used as an example for the point the author
is trying to prove.

Also, for proving the point of the blog post, the example you showed instead
would have been wrong in the same way as the original example from the blog
post.

The point was about writing to the (theoretical) padding between fields. Your
example would still have written to this padding (if it existed) in the same
way. And if this padding did exist, it still wouldn't have been illegal, in
either example.

------
aw1621107
I'm not really convinced the author is correct in claiming that a one-word
change opened the floodgates to optimizations on undefined behavior. In
particular I think:

> Careful reading will reveal that the word “Permissible” has been exchanged
> to “Possible”. In my opinion this change has lead C to go in a very
> problematic direction.

is a red herring. In my opinion, the _actual_ problematic phrase is this:

> ignoring the situation completely with unpredictable results

which didn't change between C89 and C99.

It all comes down to what "ignoring the situation" should mean. Compiler
vendors appear to interpret this to mean "ignore situations that invoke
undefined behavior". Programmers who dislike optimizations based on undefined
behavior appear to interpret this to mean "ignore the violation that leads to
undefined behavior and treat it like conforming code". Who's right? It's
ambiguous.

> C compilers have taken the concept of undefined behavior even further by
> doing the mental acrobatics of thinking that “If undefined behavior happens,
> I can do want I want, So therefor I can assume that it will never happen”.

I think this description is a bit uncharitable. I think a better description
might be "If undefined behavior happens, there are no constraints on program
behavior. Thus, if I assume that undefined behavior never happens, optimize
the program based on that assumption, and that assumption is violated, the
resulting program behavior is still OK because the standard imposes no rules
on what the program must do when undefined behavior is invoked."

> Lets take a look at signed integer overflow:

Wasn't this useful for loop counters? I think this gist had more info:
https://gist.github.com/rygorous/e0f055bfb74e3d5f0af20690759de5a7

In addition, this bit:

> In C89, undefined behavior is interpreted as, “The C standard doesn’t have
> requirements for the behavior, so you must define what the behavior is in
> your implementation, and there are a few permissible options”.

is implementation-defined behavior, not undefined behavior. As the C89
standard notes, defining the implementation's behavior is only one of the
things an implementation _may_ do, not something it _must_ do.

~~~
a1369209993
> is implementation-defined behavior, not undefined behavior.

They're (confusingly) misusing the word "define" there. Implementation-defined
behaviour is required to be _documented_, but _every_ behaviour is something
that "you must define" in the sense that you have written down some (possibly
implicit/emergent) definition of it as part of the compiler's source code.
I.e., it's not "the standard requires you to define this", it's "you are not
capable of _not_ defining this, and the standard requires (C89) / doesn't
require (C99+) you to pick from the following options".

~~~
aw1621107
I think I see what you're trying to get at. It feels almost tautological, but
I think it makes sense.

------
brianpgordon
> The point of a compiler is not to try to show off that who ever implemented
> it knows more loop holes in the C standard, then the user, but to help the
> programmer write a program that does that the programmer wants.

The author makes it sound like the people working on optimizing compilers are
deliberately seeking out these weird corner cases and selecting some random
surprising behavior for them out of a hat, gleefully imagining how confusing
it will be for end users. That's not how it works. Optimizers can be
extraordinarily complex and need to maximize this ill-defined thing called
"performance" in a highly multi-dimensional solution space. They ping-pong
around inside this space constrained only by the specific requirements of the
standard, and it's not surprising that some of the techniques used would
produce some counter-intuitive results if the programmer is breaking the rules
and relying on undefined behavior. It's kind of like if you trained a neural
network to classify cat and dog pictures, and then you showed it a picture of
a fire truck and expected it to give you a useful result.

The idea of a new version of the C standard that defines some of the most
surprising undefined behavior is an interesting one though, and I'd be
interested to see how much that really impacts the ability of the optimizer to
do its work.

~~~
SAI_Peregrinus
I'd love it if the C standard just removed undefined behavior: replace each
explicit "the behavior is undefined" with "the behavior is
implementation-defined", and put in a blanket "Any behavior not specified by
this standard is implementation-defined". Keep the rest the same, just
document the footguns. Implementation-defined is exactly as powerful as
undefined; it just makes the compiler writer describe what will happen.

------
dependenttypes
You might find this relevant
[https://news.ycombinator.com/item?id=10772841](https://news.ycombinator.com/item?id=10772841)

What you want is not another language but a compiler that has a well defined
behaviour and is not willing to change it between versions when UB in the C
standard is triggered.

------
mjevans
Any instance of undefined behavior should result in a Warning, if not an
Error.

Similarly, 'dead code paths' should also result in a Warning, if not an Error.
(Possibly with a way of turning that off for specific functions.)

I also fully agree with the author's statement about what a good compiler
should be doing.

~~~
millstone
> Any instance of undefined behavior should result in a Warning, if not an
> Error

How would you implement this? For example, use-after-free is undefined, and
difficult to detect.

~~~
vnorilo
Warning when optimization passes take advantage of UB is a reasonable desire.
The compilers are getting increasingly good at this, but I believe it would be
far too noisy to blanket enable all of that.

~~~
mjevans
The point is to push the info to the programmer so that the code can be re-
written as only defined behavior. Yes this would eliminate the 'gains' of
abusing UB, but it would actually fix the errors of abusing UB.

~~~
vnorilo
I guess signed indvars would be a prime example. Compilers famously ignore
overflow of signed integers when computing loop trip counts.

Would it be better to only ever use unsigned integers for indvars and loop
limits? Maybe. But that would reach deep into the types in stdlib for example.

And then there's
https://software.intel.com/en-us/forums/intel-c-compiler/topic/698664

So, all in all, not simple. Not that that invalidates your point: I just
wanted to add some nuance.

------
jancsika
> If the compiler thinks that writing to NULL, is undefined, it can therefore
> assume that since you are writing to p, p can’t be NULL. If p can’t be NULL,
> the entire if statement can be removed and after optimization the code looks
> like this:

Is that correct?

I thought Linus' rant was about NULL checks that came _after_ a dereference
had already happened. In that case it at least makes sense that the compiler
would assume the NULL check would be superfluous.

But how could branching to spit out an error and exit _before_ the dereference
ever get optimized out? I don't see any undefined behavior in the author's
example upon which an optimization could trigger. (Or if it could trigger, it
ought to trigger on array bounds checking and many other situations where
removing the code would clearly cause bugs.)

~~~
quelsolaar
You get this bug because the compiler doesn't know that the function will not
return, and therefore assumes that the de-reference will happen even if the
value is NULL. Some compilers have a (non-standard) keyword to indicate that a
function will not return. Adding an "else" will "fix" the issue.
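
A sketch of the pattern being described, using C11's standard `_Noreturn` specifier in place of a compiler-specific keyword (the function names are mine):

```c
#include <stdio.h>
#include <stdlib.h>

/* Without _Noreturn, the compiler must assume die() can return, making
   the dereference below reachable even when p == NULL; since that would
   be UB, it may conclude p != NULL and delete the check. Declaring
   die() _Noreturn tells it the NULL path never reaches the dereference. */
_Noreturn void die(const char *msg) {
    fprintf(stderr, "%s\n", msg);
    exit(EXIT_FAILURE);
}

int deref(int *p) {
    if (p == NULL)
        die("null pointer");
    return *p;
}
```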

~~~
aw1621107
This is a corner of C I'm not familiar with, but does the standard say
anything about functions that never return? If it doesn't say that
implementations may assume functions return then that sounds more like a
compiler bug than taking advantage of UB.

~~~
oconnor663
This seems contrary to how (I believe) the compiler works with atomics. If I
call some opaque function foo(), the compiler has to assume that foo() could
perform sequentially consistent atomic operations, and it cannot move other
reads or writes across that function call. Why isn't it also required to
assume that a function could terminate the program or longjump out?

~~~
aw1621107
Yeah, I think you're right. I guess a more general statement would be that the
example optimization may not be valid because that function may contain
unknown side effects, including program termination.

------
bigcheesegs
Yet again someone ignoring the line:

> for which the Standard imposes no requirements.

C89 had no requirements; C99 clarified that the line that followed was
non-normative.

The entire premise of this article is flawed.

~~~
TwoBit
The point of the article is that the author wants compilers to tell users when
they do unexpected things. I don't see how your point is relevant to that.

~~~
clarry
And the problem is that users ask the compiler to optimize, and then they
complain because it optimizes.

See, optimization can do a lot of unexpected things, but it is very difficult
in general for the compiler to tell whether a certain application of (say)
range analysis to eliminate dead code in a bunch of inlined calls is going to
be unexpected. That's exactly the kind of optimization I want a compiler to
do; I can write small functions that are as generic as possible yet cover all
the edge cases, and then the compiler can find out which edge cases cannot
apply in this particular situation.

If the compiler told the user every time it did something, you'd never finish
reading the output. Might as well ask the compiler not to optimize at that
point.

------
dejj
Quite the converse. To me, “permissible” sounds as capricious as “possible”
does. I read the article on “mass amateurization” before, and am thinking
about how words must sound different to me as a non-native English speaker and
computer scientist. Maybe it would have been best to insert “No running with
scissors!” into that spec.

------
mpweiher
What a great idea!

https://blog.metaobject.com/2018/07/a-one-word-change-to-c-standard-to-make.html

The comment section also has a lot of the same justifications for the current
interpretations that I see here, with fairly thorough debunkings.

Changing that one word is probably not quite enough; at the very least you
would also need to not make the remaining text a note, or change the status of
notes back again.

Then there is the conflict between "imposes no requirements" and "here is the
range of permissible actions". These cannot both be true, and the current
dogma is to resolve the conflict by pretending the second part doesn't exist
(with, admittedly, some justification). But then why on earth is it in the
standard at all?

~~~
saagarjha
"Ranges from" does not necessarily enumerate the only possible choices:
https://english.stackexchange.com/questions/148428/using-three-examples-with-range-from

~~~
mpweiher
Never said it did...

?

~~~
saagarjha
Specifically, it would allow for more than just the three options you mention
there.

~~~
naniwaduni
It allows additional options _in between_ those listed.

I'm skeptical that such an ill-defined rationale-centric change would hold any
purchase with modern compiler-writers (it is certainly arguable what points
lie "between" qualitatively-described behaviors), but in general a range does
not usually include literally all possibilities.

------
imtringued
I often see people still choose C even in 2020 for some projects. The usual
arguments are that C has stuck with us for several decades and that it is now
rock solid. Articles like this make me think otherwise. Why do modern
compilers break old code and then claim that it was broken from the start?

~~~
aw1621107
> Why do modern compilers break old code and then claim that it was broken
> from the start?

Because strictly speaking that old code _is_ broken in that it violates the
standard. Older compilers either didn't use the more aggressive interpretation
of undefined behavior or didn't implement the optimizations that resulted in
bad runtime behavior.

------
einpoklum
The author is encouraged to look into undefined behavior sanitizers:

[https://duckduckgo.com/?q=ubsan+sanitizer&t=ffsb&ia=web](https://duckduckgo.com/?q=ubsan+sanitizer&t=ffsb&ia=web)

clang and gcc have those.

Other than that - @_kst_ is correct: it was the same in C89 and C99, and the
intention is simply to let compilers make the optimizations they like,
ignoring what happens in UB cases.

------
dirtydroog
It's C++, but UB allows the compiler to call a function the programmer never
called

https://kristerw.blogspot.com/2017/09/why-undefined-behavior-may-call-never.html

I'm not sure how any compiler writer can condone this behaviour.

~~~
aw1621107
That particular behavior (and some other examples of pathological optimizer
behavior) is probably an emergent property of how several optimizations
combine, as opposed to a specifically targeted optimization. I can imagine the
optimizer doing something like:

1. Determine possible function call targets

2. Eliminate invalid targets from call set

3. If only one target remains, replace indirect call with direct call

That might be useful for something like devirtualization, but also could lead
to the behavior exhibited in that blog post.
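
Applying those three steps to the shape of the example in the linked post (names hypothetical; the pointer is initialized in the test path below so the sketch itself stays well-defined):

```c
#include <stddef.h>

int called_flag = 0;

void target(void) { called_flag = 1; }

/* File-scope function pointer: in the linked post it starts out NULL
   and is only ever assigned inside a function that main never calls. */
void (*fp)(void) = NULL;

void set_fp(void) { fp = target; }

/* Calling through fp while it is still NULL is UB. target() is the only
   function ever stored into fp (step 1), NULL is not a valid target
   (step 2), so an optimizer may rewrite fp() as a direct call to
   target() (step 3) -- even on paths where set_fp() never ran. */
void invoke(void) { fp(); }
```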

Indeed, the second blog post [0] shows that with two possible call targets,
LLVM generates an indirect call, as you might expect.

[0]: https://kristerw.blogspot.com/2017/09/follow-up-on-why-undefined-behavior-may.html

------
lonelappde
Why is a completely wrong blog post based on a misunderstanding of a 30 year
old standard on the front page with 110 votes?

~~~
throwaway2048
because a lot of users here think that a compiler somehow detects undefined
behavior, then emits wacky code just to spite them.

------
OliverJones
rms was all over this "undefined" stuff. Early gcc versions actually tried to
launch the games Rogue or Nethack upon detecting #pragma directives.

Old IBM "theory of operation" books used the word "unpredictable" for this
kind of thing. "unpredictable" should frighten programmers. It's worse than
"random," because random results have a chance of being caught during unit or
system test.

------
scoutt
The article kind of lost me at the first example.

    
    
      struct{
        char x
        char y;
      }a;
    
      memset(&a, 0, sizeof(char) * 2);
    

What does the compiler know about _memset_? Why should it? It could be a
function called _heyLetsFillTheMemoryOverHere(&a, wrongSize);_ and it would
still be the programmer's fault.

In fact, it's a good thing that C doesn't care about _memset_, and doesn't
get in between me and my code. That's what I love about C. It minds its own
business and does what I tell it to do.

> If I use my compiler to compile a program on my machine, the compiler knows
> that I’m compiling it to the x64 instruction set

You can be cross-compiling. I do it every day. The C language doesn't need to
know about every little platform out there. If a compiler wants to go the
extra mile and _help_ about it, that's a compiler-thing and not a C-language-
thing.

If there is an MCU that uses 3-byte words, why should C have been aware of it
40 years in advance?

Don't blame the specifications, blame the compiler if you want.

Edit: removed a couple of personal opinions.

~~~
saagarjha
> What does the compiler know about memset? Why it should? It can be a
> function called heyLetsFillTheMemoryOverHere(&a, wrongSize); and still it's
> the programmer's fault.

Because memset is defined by the ISO C standard to have specific semantics,
which the compiler can leverage to make your code faster.

~~~
scoutt
> void *memset(void *s, int c, size_t n);

> The memset() function shall copy c (converted to an unsigned char) into each
> of the first n bytes of the object pointed to by s.

The standard just declares an interface, to be added to a C library (maybe?).
It's not the job of the compiler to check for the programmer's faults. The
compiler doesn't care, and it shouldn't.

~~~
gpderetta
The standard defines the behaviour, not just the signature. Memset has been
treated as a builtin since forever.

------
_martamoreno_
Really bad example... Zeroing members of a structure with memset this way is
actually something that should trigger a compiler error, because it makes no
sense. I have limited understanding of C these days, but I consider that a
good thing, because it allows me to look with fresh eyes at the crap I wrote
back in the day. And this is one of those things. It's nonsensical, similar to
the general Linux sentiment of naming your variables like random garbage. It's
brainfuck. The only thing I can say with some distance is that exploiting
undefined behavior should, in pretty much all cases, yield a compiler error.

~~~
saagarjha
> I have limited understanding of C these days, but I consider that a good
> thing

I wouldn't.

------
olliej
There are many people arguing specific semantics of the authors arguments, but
I believe the core problem is C and C++ both dramatically overusing
“undefined” vs “unspecified”.

The difference is huge. Signed integer overflow is (per spec) undefined
behaviour, so an _obvious_ bounds check is UB and can be removed. If it were
_unspecified_, the compiler would be _required_ to be at least
self-consistent. E.g. it couldn't do 2's complement in one place, but then
treat arithmetic as not being 2's complement elsewhere (in the overflow
checks). If the compiler emits code where MAX_INT+1 is MIN_INT, then the
compiler can't also pretend that that doesn't happen.

Undefined should be reserved solely for things that cannot have a specified
behavior (UaF, OoB memory, IO weirdness, etc).

~~~
quelsolaar
Some compilers even make it hard to test for overflow:

if(a + 1 < a) printf("Overflow error!"); else a++;

gets converted to:

a++;

because the compiler thinks that a + 1 can never be smaller than a, since it
doesn't have to consider signed overflow.
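
For reference, overflow checks can be written without ever invoking UB: compare against INT_MAX _before_ adding, or use `__builtin_add_overflow`, which GCC and Clang provide (a sketch; the builtin path is guarded, everything else is portable C):

```c
#include <limits.h>
#include <stdbool.h>

/* Defined-behavior pre-test: no addition happens, so nothing overflows
   and the compiler cannot optimize the check away. */
bool increment_would_overflow(int a) {
    return a == INT_MAX;
}

/* Returns true on overflow; otherwise stores a + 1 into *out. Uses the
   GCC/Clang builtin where available, falling back to the pre-test. */
bool checked_increment(int a, int *out) {
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_add_overflow(a, 1, out);
#else
    if (increment_would_overflow(a))
        return true;
    *out = a + 1;
    return false;
#endif
}
```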

~~~
CJefferson
While that looks stupid in isolation (and I agree it's annoyingly hard to
check for overflow, although gcc and clang have special builtins nowadays to
do it), it turns out there are important reasons for that optimisation.

In general, knowing that 'a+1' is '1 larger than a' allows for lots of
optimisations: when writing to an array in order we can vectorise, do things
in bigger chunks, all sorts of useful and important optimisations. If every
time those were used the compiler had to check for overflow, it would
seriously affect performance.

~~~
pjmlp
And that is how many CVE entries end up being created, because performance
above anything else is what matters.

~~~
saagarjha
Many CVE entries are created simply because the underlying software was
written in C.

~~~
pjmlp
Written under the premises of "performance trumps all".

This is why getting rid of the underlying software written in C should be a
concern, or at the very least, adopting hardware and development practices
that tame C. After all, UNIX/POSIX clones won't get replaced overnight.

Butchers that care for their hands also make use of protective gloves when
dealing with sharp knives.

~~~
naniwaduni
In practice, a lot of software is written in C because it depends on
interfaces that are defined in terms of their C APIs, without caring all that
much about performance.

~~~
pjmlp
Plenty of safer languages offer seamless C FFI, no need to write software in C
just because those interfaces are defined as C ones.

H2PAS was already a thing in MS-DOS days, just as possible example.

~~~
naniwaduni
Only feasible if you only use libraries which maintain ABI compatibility
scrupulously or if you just take a narrow view of portability.

There's a lot of code in the wild which maintains compatibility only at the C
source level using preprocessor macros.

FFIs are a nice toy for one-offs or integrating with vendored dependencies.
Most never get past that stage.

~~~
pjmlp
No C library is changing their ABI every couple of seconds, and plus many of
those tools understand C header files, quite feasible to fix broken bindings
every now and then.

~~~
naniwaduni
The problem isn't changes, it's accommodating multiple versions. Even figuring
out where to find headers is not necessarily easy if you're not the local C
compiler, for whom the tooling must only begrudgingly exist.

------
lmm
Where is the constituency that would want this "sensible C"? People who care
about reasonable behaviour have already moved on to better languages, a la
https://www.lesswrong.com/posts/ZQG9cwKbct2LtmL3p/evaporative-cooling-of-group-beliefs
; the people still using C are those who want
"maximum performance" and don't think undefined behaviour is a problem, and so
C compiler writers have (understandably) gone ever further in their
exploitation of undefined behaviour to get that "maximum performance".

I don't think this can be reversed, and I don't think we should necessarily
want to reverse it. We have better alternatives now. Let the C people do their
thing, and get on with your life in a language that ensures that programs,
even "incorrect" ones, have reasonable behaviour.

~~~
lone_haxx0r
What are these "better languages" you speak of? Because I've been looking for
a good C replacement but to no avail. C++, D, Nim, Go, Rust, Zig, Odin, etc;
I've looked into them but none convinced me.

~~~
lmm
What were your issues with those languages? Several of them sound like
reasonable options to me.

For a mature, general-purpose language my default suggestion is OCaml. If you
really absolutely can't make garbage collection work for your case (something
I've never seen happen to anyone who actually tried) then your choices are
more limited (and I'd probably favour Rust, despite its relative immaturity).

There are any number of good languages out there, and I could happily go into
the details of which I think offers the best combination of tradeoffs (Scala).
But the bigger picture is that a memory-safe language with the ML featureset
(in particular first-class functions, parametric polymorphism, type inference,
and sum types) should be the minimum baseline these days, and represents a
substantial step up from C (at the most basic and pervasive level, being able
to do error handling with a result type vastly improves your defect rate).
Within that category you have plenty of reasonable choices offering their own
particular selling points.

~~~
indymike
For me: libraries not available or are just wrappers on a C library anyway,
inflexible memory management, bad support for OS features, etc... A lot of it
is just maturity. C is ancient.

