4u - 5u is not undefined behavior in c; unsigned arithmetic overflow is defined to wrap around, in both directions
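
for example, this is fully defined c and a always ends up equal to UINT_MAX whatever the width; the concrete printed value 4294967295 just assumes the usual 32-bit unsigned int:

    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        unsigned a = 4u - 5u;                 /* defined: wraps around to UINT_MAX */
        printf("%u %d\n", a, a == UINT_MAX);  /* e.g. 4294967295 1 */
        return 0;
    }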

assigning -1 to an unsigned variable has never been undefined behavior either; the conversion is fully defined to reduce the value modulo 2^n, so you just get UINT_MAX (see §6.2.1.2 in iso c90 or §6.3.1.3 in c99 and c11). it's conversion to a signed type that can't represent the value that is merely implementation-defined, and decent compilers like gcc define that to do the obviously correct thing (see https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.h...), not produce mysterious program behavior. in c++20 even the signed case is no longer implementation-defined but fully defined behavior in the standard; in standard c it's still implementation-defined
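
concretely, here's the difference between the two cases (a sketch: the signed-char result is whatever the implementation defines; gcc documents it as reduction modulo 2^n, which gives 44 here assuming the usual 8-bit char):

    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        unsigned u = -1;      /* fully defined by the standard: u == UINT_MAX */
        signed char c = 300;  /* implementation-defined: 300 doesn't fit; gcc wraps it to 44 */
        printf("%u %d\n", u, c);
        return 0;
    }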

as a programmer who whines about the kind of ub exploitation at issue here, and who (as established above) has a much clearer idea than you do of what behavior is and is not defined, specifically what i am whining about is that it's harder to debug my code if doing things like dereferencing a null pointer doesn't reliably crash it, but instead triggers the silent removal of apparently unrelated logic from my program, or subtle semantics changes in it, in the name of optimization
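
to make that concrete, here's the shape of the thing (a made-up sketch, not code from any real project; struct packet and read_len are just illustrative names):

    #include <stddef.h>

    struct packet { int len; };

    int read_len(struct packet *p) {
        int len = p->len;    /* if p is null this dereference is ub...          */
        if (p == NULL)       /* ...so the compiler may assume p is non-null here */
            return -1;       /*    and silently delete this check                */
        return len;
    }

this is exactly the class of transformation that flags like gcc's -fno-delete-null-pointer-checks exist to turn off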

also it sucks to have existing security-sensitive code acquire new security holes, and i don't really care if that's a learning opportunity for the guy who wrote it 20 years ago

this didn't happen 20 or 30 years ago, and in those 20 or 30 years the impact of introducing fresh security holes into existing well-tested c code has greatly increased, while the usefulness of c compiler optimizations has greatly decreased




You're correct that 4u - 5u isn't undefined in C. But there's a good reason that Rust, for example, chooses not to make its built-in integer types (either signed or unsigned) wrap on overflow. Wrapping<u32> is very easy for a typical processor to implement, but in most cases overflow is a mistake, so you want it reported rather than silently getting the wrapping answer when the programmer apparently hasn't considered overflow at all.
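
For contrast, here's roughly what "report it" looks like if you opt into it in C - a sketch only: checked_add is a made-up name, __builtin_add_overflow is a gcc/clang extension rather than standard C, and the chosen operands only overflow assuming 32-bit unsigned int:

    #include <stdbool.h>
    #include <stdio.h>

    /* add two unsigned ints, reporting overflow instead of silently wrapping */
    static bool checked_add(unsigned a, unsigned b, unsigned *out) {
        return !__builtin_add_overflow(a, b, out);  /* gcc/clang builtin */
    }

    int main(void) {
        unsigned r;
        if (checked_add(4000000000u, 4000000000u, &r))  /* overflows 32-bit unsigned */
            printf("%u\n", r);
        else
            fprintf(stderr, "overflow detected\n");
        return 0;
    }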

If you give up the aggressive optimisations then C is even less competitive with modern languages, so you're basically arguing that it's acceptable to have code that's harder to write, more buggy and with worse performance just because that's how you'd have done it 30 years ago. At that point you're talking about a retro hobby.


what i'm arguing is that c compilers exist primarily to support the base of existing c code and to run it correctly and with adequate performance, not out of some unhinged notion of 'competitiveness' or to entice people into writing new programs in it

that is, i do think it's acceptable to have code that's harder to write and with worse performance, but not because of some hypothetical; rather, it's because that's how people did do it 30 years ago, and 20, and sometimes 10, and i want to run the code they wrote. this could be described as a 'retrocomputing hobby' except that that code is, for example, linux, cpython, gcc, r, libreoffice, gtk, glib, grub, bash, freetype, emacs, vim, libjpeg, libpng, postgres, mariadb, ssh, openssl, gnutls, apache, poppler, ghostscript, tex, gpg, zlib, memcached, redis, etc.

if you want to describe running vim and bash and cpython on linux as a retrocomputing hobby, i guess that's kind of defensible actually, but it comes across as wishful thinking when we don't have a working non-retro alternative yet

(moreover i have no idea why you think omitting these risky optimizations makes c code 'more buggy')

i agree that there are numerous deficiencies in c's semantics, and plausibly silent wraparound is one of them, though it seems like rust's alternative is that overflow crashes in debug builds and silently wraps like c in release builds


I think you need to reevaluate what happens to code once optimizations are turned on, especially if you can afford test-guided PGO combined with LTO. How the code actually executes with optimization has almost nothing in common with what programmers wrote years ago: 5-10 levels of functions are simply gone, completely restructured by the compiler. I don't see a middle ground if people want to keep the performance. Old C++ (and likely C) that failed to optimize is already gone at this point.

Whatever legacy code has to be "preserved" is likely to be specifically deoptimized to keep it in a frozen state.

PS I come from gamedev and I really wish compilers gave us control over what is going on there.


inlining and most code-movement optimizations are among the many optimizations that don't require the kind of nasty tricks with undefined behavior that i'm criticizing, and they've been common practice for decades; c++ has depended on method inlining to get acceptable performance since the 80s

it's kind of a pain if you're stepping through the code in gdb but that's acceptable

turning off optimization entirely is still too costly for most cases


That is an interesting point. I wonder if it would be feasible to go over every little bit of UB in the Standard and see if it could instead be well-defined in a way that would be compatible with most existing C code out in the wild. How much perf overhead would that actually add? And is there a way to quantify how many exploits it would have prevented?


this was dan bernstein's project with 'boring c' and john regehr's with 'friendly c'; i think the answer is that most of it can be, but definitely not all, and possibly on some platforms the result wouldn't be compatible with most existing c code

https://blog.regehr.org/archives/1287

basically it seems feasible but it's going to be a lot of effort

also of course any semantics change will introduce some new exploits, but doing that once seems preferable to doing it every year


To be clear, I'm thinking specifically from the perspective of making existing legacy code less broken, not about making the language "more friendly" - as you rightly point out, C really ought to be considered legacy by now (and in new languages, where we will also have this discussion, we can at least begin it from a clean slate). Thus there's no need to consider usability. I was also imagining a hard rule: if the only reason why something is UB is perf overhead, it must not be UB (this covers e.g. the memcpy/memmove debate brought up in that article). But, yeah, as the other example with shifts illustrates (see the sketch below), even such a tight-fisted approach can be problematic.
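
To make the shift example concrete (a sketch of my own, not the article's code; shl_defined is a made-up name, and yielding 0 for oversized counts is just one possible choice of defined semantics): x << n is undefined in C when n is at least the width of x, largely because CPUs disagree on what an oversized shift count does, so pinning down a single answer costs an extra check or mask on some targets:

    #include <stdint.h>
    #include <stdio.h>

    /* one possible "always defined" left shift: counts >= 32 just yield 0 */
    static uint32_t shl_defined(uint32_t x, unsigned n) {
        return n < 32 ? x << n : 0;  /* the extra check is the perf overhead being debated */
    }

    int main(void) {
        printf("%u %u\n", (unsigned)shl_defined(1, 4), (unsigned)shl_defined(1, 40));  /* 16 0 */
        return 0;
    }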

The real question is how much perf overhead this entails vs how much benefit from mitigating yet-undiscovered issues caused by UB in legacy code.


C already has bad performance due to null-terminated strings.


no



