4u - 5u is not undefined behavior in c; unsigned arithmetic overflow is defined to wrap around, in both directions
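
for example, this is fully defined c and a always ends up equal to UINT_MAX whatever the width; the concrete printed value 4294967295 just assumes the usual 32-bit unsigned int:

    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        unsigned a = 4u - 5u;                 /* defined: wraps around to UINT_MAX */
        printf("%u %d\n", a, a == UINT_MAX);  /* e.g. 4294967295 1 */
        return 0;
    }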

assigning -1 to an unsigned variable has never been undefined behavior either; the conversion is fully defined to reduce the value modulo 2^n, so you just get UINT_MAX (see §6.2.1.2 in iso c90 or §6.3.1.3 in c99 and c11). it's conversion to a signed type that can't represent the value that is merely implementation-defined, and decent compilers like gcc define that to do the obviously correct thing (see https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.h...), not produce mysterious program behavior. in c++20 even the signed case is no longer implementation-defined but fully defined behavior in the standard; in standard c it's still implementation-defined
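
concretely, here's the difference between the two cases (a sketch: the signed-char result is whatever the implementation defines; gcc documents it as reduction modulo 2^n, which gives 44 here assuming the usual 8-bit char):

    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        unsigned u = -1;      /* fully defined by the standard: u == UINT_MAX */
        signed char c = 300;  /* implementation-defined: 300 doesn't fit; gcc wraps it to 44 */
        printf("%u %d\n", u, c);
        return 0;
    }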

as a programmer who whines about the kind of ub exploitation at issue here, and who (as established above) has a much clearer idea than you do of what behavior is and is not defined, specifically what i am whining about is that it's harder to debug my code if doing things like dereferencing a null pointer doesn't reliably crash it, but instead triggers the silent removal of apparently unrelated logic from my program, or subtle semantics changes in it, in the name of optimization
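
to make that concrete, here's the shape of the thing (a made-up sketch, not code from any real project; struct packet and read_len are just illustrative names):

    #include <stddef.h>

    struct packet { int len; };

    int read_len(struct packet *p) {
        int len = p->len;    /* if p is null this dereference is ub...          */
        if (p == NULL)       /* ...so the compiler may assume p is non-null here */
            return -1;       /*    and silently delete this check                */
        return len;
    }

this is exactly the class of transformation that flags like gcc's -fno-delete-null-pointer-checks exist to turn off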

also it sucks to have existing security-sensitive code acquire new security holes, and i don't really care if that's a learning opportunity for the guy who wrote it 20 years ago

this didn't happen 20 or 30 years ago, and in those 20 or 30 years the impact of introducing fresh security holes into existing well-tested c code has greatly increased, while the usefulness of c compiler optimizations has greatly decreased




You're correct that 4u - 5u isn't undefined in C. But there's a good reason that Rust, for example, chooses not to make its built-in integer types (either signed or unsigned) wrap on overflow. Wrapping<u32> is very easy for a typical processor to implement, but in most cases overflow is a mistake, so you want it reported rather than silently getting the wrapping answer when the programmer apparently hasn't considered overflow at all.
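
For contrast, here's roughly what "report it" looks like if you opt into it in C - a sketch only: checked_add is a made-up name, __builtin_add_overflow is a gcc/clang extension rather than standard C, and the chosen operands only overflow assuming 32-bit unsigned int:

    #include <stdbool.h>
    #include <stdio.h>

    /* add two unsigned ints, reporting overflow instead of silently wrapping */
    static bool checked_add(unsigned a, unsigned b, unsigned *out) {
        return !__builtin_add_overflow(a, b, out);  /* gcc/clang builtin */
    }

    int main(void) {
        unsigned r;
        if (checked_add(4000000000u, 4000000000u, &r))  /* overflows 32-bit unsigned */
            printf("%u\n", r);
        else
            fprintf(stderr, "overflow detected\n");
        return 0;
    }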

If you give up the aggressive optimisations then C is even less competitive with modern languages, so you're basically arguing that it's acceptable to have code that's harder to write, more buggy and with worse performance just because that's how you'd have done it 30 years ago. At that point you're talking about a retro hobby.


what i'm arguing is that c compilers exist primarily to support the base of existing c code and to run it correctly and with adequate performance, not out of some unhinged notion of 'competitiveness' or to entice people into writing new programs in it

that is, i do think it's acceptable to have code that's harder to write and with worse performance, but not because of some hypothetical; rather, it's because that's how people did do it 30 years ago, and 20, and sometimes 10, and i want to run the code they wrote. this could be described as a 'retrocomputing hobby' except that that code is, for example, linux, cpython, gcc, r, libreoffice, gtk, glib, grub, bash, freetype, emacs, vim, libjpeg, libpng, postgres, mariadb, ssh, openssl, gnutls, apache, poppler, ghostscript, tex, gpg, zlib, memcached, redis, etc.

if you want to describe running vim and bash and cpython on linux as a retrocomputing hobby, i guess that's kind of defensible actually, but it comes across as wishful thinking when we don't have a working non-retro alternative yet

(moreover i have no idea why you think omitting these risky optimizations makes c code 'more buggy')

i agree that there are numerous deficiencies in c's semantics, and plausibly silent wraparound is one of them, though it seems like rust's alternative is that overflow crashes in debug builds and silently wraps like c in release builds


I think you need to reevaluate what happens to code once optimizations are turned on, especially if you can afford test-guided PGO combined with LTO. How the code actually executes with optimization has almost nothing in common with what programmers wrote years ago: 5-10 levels of functions are simply gone, completely restructured by the compiler. I don't see a middle ground if people want to keep the performance. Old C++ (and likely C) that failed to optimize is already gone at this point.

Whatever legacy code has to be "preserved" is likely to be specifically deoptimized to keep it in a frozen state.

PS I come from gamedev and I really wish compilers gave us control over what is going on there.


inlining and most code-movement optimizations are among the many optimizations that don't require the kind of nasty tricks with undefined behavior that i'm criticizing, and they've been common practice for decades; c++ has depended on method inlining to get acceptable performance since the 80s

it's kind of a pain if you're stepping through the code in gdb but that's acceptable

turning off optimization entirely is still too costly for most cases


That is an interesting point. I wonder if it would be feasible to go over every little bit of UB in the Standard and see if it could instead be well-defined in a way that would be compatible with most existing C code out in the wild. How much perf overhead would that actually add? And is there a way to quantify how many exploits it would have prevented?


this was dan bernstein's project with 'boring c' and john regehr's with 'friendly c'; i think the answer is that most of it can be, but definitely not all, and possibly on some platforms the result wouldn't be compatible with most existing c code

https://blog.regehr.org/archives/1287

basically it seems feasible but it's going to be a lot of effort

also of course any semantics change will introduce some new exploits, but doing that once seems preferable to doing it every year


To be clear, I'm thinking specifically from the perspective of making existing legacy code less broken, not about making the language "more friendly" - as you rightly point out, C really ought to be considered legacy by now (and in new languages, where we will also have this discussion, we can at least begin it from a clean slate). Thus there's no need to consider usability. I was also imagining a hard rule: if the only reason why something is UB is perf overhead, it must not be UB (this covers e.g. the memcpy/memmove debate brought up in that article). But, yeah, as the other example with shifts illustrates (see the sketch below), even such a tight-fisted approach can be problematic.
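
To make the shift example concrete (a sketch of my own, not the article's code; shl_defined is a made-up name, and yielding 0 for oversized counts is just one possible choice of defined semantics): x << n is undefined in C when n is at least the width of x, largely because CPUs disagree on what an oversized shift count does, so pinning down a single answer costs an extra check or mask on some targets:

    #include <stdint.h>
    #include <stdio.h>

    /* one possible "always defined" left shift: counts >= 32 just yield 0 */
    static uint32_t shl_defined(uint32_t x, unsigned n) {
        return n < 32 ? x << n : 0;  /* the extra check is the perf overhead being debated */
    }

    int main(void) {
        printf("%u %u\n", (unsigned)shl_defined(1, 4), (unsigned)shl_defined(1, 40));  /* 16 0 */
        return 0;
    }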

The real question is how much perf overhead this entails vs how much benefit from mitigating yet-undiscovered issues caused by UB in legacy code.


C already has bad performance due to null-terminated strings.


no



