
> huge existing stock of C code

I feel like the amount of effort that has been spent so far to make C safe and fix bugs due to C not being safe is greater than the effort that would have been required to rewrite all existing C code into memory safe languages.

But I think secretly C programmers don't want memory safety. Dealing with pointers and remembering to malloc and free are part of what makes them feel more skilled/elite than those other programmers who have garbage collection and bounds checking.




It's not that I don't want memory safety or that I feel superior - what I want is to write the fastest possible portable code. That's what C does, and nothing more.

Memory management, array bounds checking and a bunch of other 'safe' features have a price that I'm not willing to apply broadly and redundantly to all of my software.

I'm going for speed, that's why I'm using a Ferrari. Corollas are fast and safe - use those, and don't lobby for Ferrari to add safety to their cars at the expense of speed.

There are hundreds of languages. Use those. Write transpilers from C for the software that shouldn't have been written in C in the first place because it had to be safe. That would be a better use of your time.


C is not a fast language outside of microbenchmarks.

If you’re writing large systems, Java is the fastest language. Go benchmark Jetty vs Apache when serving non-trivial web apps. Java is actually amazingly fast, but it feels slow purely because it starts up slower; startup time is not an issue for long-running applications.

Heck just look at Apache Lucene the gold standard of full text search.


You must be trolling. This is just patently false. Please don't spread misinformation.

Your comment is confusing "fast enough" with "fastest". Java is fast enough for lots of applications and that's fine, particularly because large systems are usually I/O bound, but it makes no sense to conclude that it is therefore faster than C.

I've been writing Java code for the past 10 years. Do you know how people speed up Java applications? They write the code in C, compile it as a library and use JNI to invoke it.

I would recommend you brush up on your CS fundamentals.


But that's not what C does. You've been, at best, misled.

What C does is assume that you're willing to sacrifice correctness to make the compiler simpler, which is quite different from what you described.

In practice this has a negative consequence for performance as well as safety.


Having written a compiler for a subset of the C99 standard, I'm going to disagree here.

Array bounds are not checked on every array access, but not because doing so would make compilers too complex.

Correctness is being sacrificed mostly for speed, or for portability to future CPUs.

There are examples of language features that simplify compiler writing, however.

For example, type promotion from char to int is a feature that reduces the number of cases one would have to deal with when implementing the type system in a compiler, but it's there because it sacrifices neither performance nor portability.


Yet every other systems programming language has had no issue enabling bounds checks; their only failure was not having a free-beer OS come along for the ride.


I have to constantly fight against rustc and LLVM to convince them to eliminate bounds checks in hot loops when I'm writing high-performance Rust, and it's a cursed experience I hope nobody else has to endure.

Other replies in this thread mention similar problems writing Go. I don't know to what extent that applies; in my limited experience working on Go codebases I never saw such issues.


I have been writing code since 1986. In my experience most of those cases are an "I feel good" kind of thing, and have contributed zero to project delivery acceptance criteria.

When it does in fact cause an issue with project delivery acceptance testing, the issue is solved by using profiling tools and surgically disabling bounds checking, which most systems languages since the dawn of time have also supported.


Well, I would have no idea whether a bounds check was eliminated at all (and who would want to care?) if it didn't show up in profiling results.

Unfortunately, for what I do I had to do this a lot. I guess that's also why I'm not seeing it in Go; I never tried to write a query engine in Go.


Well, does something like CERN TDAQ/HLT count?

The algorithms, networking protocols and thread scheduling are much more relevant than the bounds checking done in the C++ data structures.

As for writing query engines with bounds checking languages, there are several examples.


> Array bounds are not being checked on every array access not because it would make compilers too complex.

That might be true, but you could still specify something slightly less exploit-heavy than 'undefined behaviour'. E.g. you could make out-of-bounds access into implementation-defined behaviour.


There is no way to predict what will happen if your program is accessing random memory at runtime, especially if it's a write access. To specify what would happen on a write to random memory would fill books that basically lay out most of the internals of the compiler and also the host OS.


You could at least define it not to travel backwards in time.

Undefined behaviour in C infects the whole execution, not just what comes afterwards.


Can you clarify what you mean? Is it defined to "travel backwards in time"? I suspect not.


Is the situation here that you're unaware of time travel UB optimisations in C and C++?

https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=63...


Thanks for digging out the link, so I don't have to!


No, I'm aware of examples like these; I just asked for clarification of what they mean by "define it to not travel backwards in time". To me this sounds nonsensical.

I'm not deep into compiler construction, but to me these examples seem just like a logical consequence of what UB is -- it's a (runtime) situation that the compiler is not required to take into consideration. It can opt to not emit code to treat these situations at all, etc -- effectively assuming they don't happen. The point is to allow the compiler to blindly dereference a pointer even when it can't prove that the pointer is valid. Or to allow it to implement arithmetic on a register of bigger size (assuming the computation doesn't overflow), etc.

Now, depending on how optimizers are written, the compiler may end up inadvertently detecting UB and optimizing out entire branches of code, just by virtue of how the optimizer works internally. You can bet that the compiler doesn't think much of e.g. what is earlier or later in time, when doing optimizing transformations.

Of course a "miscompilation" (of code that is buggy in the first place) is an unfortunate situation and a diagnostic would be better. Compilers should improve (and they probably do). Compilers should be friendly and give unsurprising results and good diagnostics as much as possible.

But to "define it to not travel backwards in time" right in the spec would probably be very hard and might negate the point of UB in the first place. It would require doing the work of compiler authors, who are the people responsible for figuring out how to make _their_ compiler solid and ergonomic while also offering the optimizations people want. This is already a hard task for the authors of a specific compiler, and probably not something that you can easily define in a language spec!

And for balance, I've never consciously had to deal with a miscompilation like this, and I write C and C++ in a professional capacity almost every day. Instead, most bugs I deal with are of the most trivial kind: you hit a segmentation fault, quickly navigate to the piece of code where there is still some initialization stuff missing, and fill it in. Or there is a logic bug that is entirely unrelated to UB; those are in fact, typically, more difficult to find and fix.

Note that I'm by no means an exceptional programmer (not that I think you think that of me); I simply want to solve a problem. And while developing I introduce bugs and even UB sometimes (even though it seems to be quite rare, if I can trust sanitizers). I'm actually sophisticated enough to develop in debug mode, with most optimizations turned off, and this might be one explanation for why I've never hit an annoying situation like this.

To me, these stories are fascinating, and I think they should be taken seriously. But their effect on online forums is mostly to heat up discussions.


This comes up now because SG21 (the Contracts Study Group) have a proposal for C++ 26. Proponents of this work would like to portray it as a crucial safety improvement - you can now write a pre-condition contract and, hypothetically, this could be enforced to deliver a meaningful safety improvement over just documenting the same requirement on a web page nobody reads.

But of course the proposed C++ 26 Contracts rely on C++ expressions. In C++ the expressions are themselves full of potential UB footguns, including signed overflow and illegal pointer de-reference. Thanks to time travel, this means adding the "safety" pre-condition may actually make your software much more dangerous not safer.

One proposed way to defuse this somewhat is to prohibit that time travel. Your contract expressions might still be UB but the idea is to promise by fiat that if so this doesn't actually time travel and destroy previously correct parts of the software.

I genuinely don't know what will happen there and can offer no predictions. In terms of what would be amusing as a spectator I hope either SG23 (Safety) explicitly says this is a terrible idea but WG21 ships it anyway or, equally funny, SG23 endorses the current unsafe nonsense as safe and then a subsequent committee has to establish a "Safety but really this time" Study Group to replace SG23 in a few years when it's thoroughly discredited.

> most bugs I deal with are of the most trivial kind, you hit a segmentation fault, quickly navigate to the piece of code where there is still some initialization stuff missing, and fill it in

Sure. C++ is such a bad language that most of your bug fixing is stuff which wouldn't even happen in a better language. Rust's std::mem::uninitialized::<T>() is ludicrously dangerous, so it's deprecated (as well as unsafe), and yet C++ not only does this, it's silently the default for the built-in types. Hilarious. My sense is that a correct fix for this won't land for C++ 26, although maybe Barry can get the stars to align and prove me wrong.


See, I don't mind UB on signed integer overflow for example. You make it sound like a terrible terrible thing. I know it's not defined (and there is a rationale for keeping it undefined even assuming 2's complement). So I don't rely on it.

Quite honestly, I don't recall signed overflow ever happening. It's probably happened at some point but I really don't recall. I'm not trying to make it happen because I don't have a use for it. It's not useful anyway to have a number wrap around from e.g. 2^31-1 to -2^31. It is useful however to wrap from UINT_MAX to 0 (modular arithmetic), and this is in fact defined.

Of course, if you write "if (x < x + 20)" and turn the optimizer to -O7, then the compiler will run the body unconditionally, even though assuming signed overflow the test should fail when x equals INT_MAX. Woah, I'm crushed. That condition is exactly what I needed to write.

> Sure. C++ is such a bad language that most of your bug fixing is stuff which wouldn't even happen in a better language. Rust's std::mem::uninitialized::<T>() is ludicrously dangerous, so it's deprecated (as well as unsafe), and yet C++ not only does this, it's silently the default for the built-in types. Hilarious. My sense is that a correct fix for this won't land for C++ 26, although maybe Barry can get the stars to align and prove me wrong.

I mean I could just write "#error Unimplemented" to get a compile time error but I'm not bothering. It seems what you describe as a terrible memory safety bug is simply my way of browsing to the next piece of code that I need to work on. Go figure...

Are you still developing C/C++ code? I get the impression you've given up on it and have jumped on the Rust train a hundred percent. At least there is a huge disconnect between the pictures you paint and my own development experience from daily practice.

But to make it clear again, I'm obviously not opposed to having the compiler issue an error whenever it's able to detect UB statically. In fact, this is how it should be.


> Of course, if you write "if (x < x + 20)" and turn the optimizer to -O7, then the compiler will run the body unconditionally

You seem very confident how the compiler will react to UB, I wouldn't be. You also seem unduly confident that you can spot such a footgun and wouldn't pull the trigger.

> It seems what you describe as a terrible memory safety bug is simply my way of browsing to the next piece of code that I need to work on. Go figure...

It's Undefined Behaviour, and you're just quietly confident that it'll be fine. Which it will until it isn't one day (and maybe that day was yesterday).

> I mean I could just write "#error Unimplemented" to get a compile time error but I'm not bothering.

A compile time error seems like a weird choice. Why write such an error only to immediately have to fix it? In Rust I'd write todo!() when I need to come back and actually provide a value or write some more code here later, that way it only blows up if this code actually executes.

> Are you still developing C/C++ code?

Not in anger for several years. I write Godbolt-sized samples to make a point sometimes.

> But to make it clear again, I'm obviously not opposed to having the compiler issue an error whenever it's able to detect UB statically. In fact, this is how it should be.

All the popular C and C++ compilers provide a great many flags you can set to get more of these diagnostics you're "obviously not opposed to". How many are you using today? How many did you try and then turn back off because of all the "false positive" diagnostics about things you knew were a bad idea but have preferred not to think about because hey, it seems like it works, right ?


> A compile time error seems like a weird choice. Why write such an error only to immediately have to fix it? In Rust I'd write todo!() when I need to come back and actually provide a value or write some more code here later, that way it only blows up if this code actually executes.

Well, that's exactly what I get too by doing nothing and noticing the segfault when running my debug build. Sure, I get it, it's UB and there could be "time travel" and whatnot. But in practice I seem to get my segfault, so that's just how I end up developing. If that didn't work, I could write my own todo() macro; nothing magical about it, right?

> All the popular C and C++ compilers provide a great many flags you can set to get more of these diagnostics you're "obviously not opposed to". How many are you using today? How many did you try and then turn back off because of all the "false positive" diagnostics about things you knew were a bad idea but have preferred not to think about because hey, it seems like it works, right ?

I compile with -Wall on Linux and -W4 on MSVC. If I'm not seeing bugs in the integration tests, there is for most domains very little economic incentive to set up various static analyzers etc, so I rarely do that. I run -fsanitize on some of my stuff from time to time just for kicks, but haven't gotten enough value out of it, which is why it's not a habit for me.

But since you mentioned it I went ahead and ran -fsanitize=undefined -fsanitize=address on a test program of the multi-threaded queue I'm working on, which is a bit performance-oriented -- on my older desktop computer it persists > 2M individual messages/sec (600MB/s) to a single disk, with to-memory message submission latencies of < 300nsecs for 99th percentile, < 2usecs for 99.9th percentile and < 30usecs for 99.99th percentile. The test program runs for ~6 seconds, submitting 16M messages (4GB of data), with 4 concurrent readers receiving the messages as soon as they come in. 178 fsync() calls were done by the enqueuer threads or the dedicated flusher threads. There are various internal buffers (a couple MB) and multiple internal message stores (1 optimized for fast submission / 1 for dense storage), and a couple low-contention mutexes but also some wait-free stuff.

-fsanitize didn't find a single instance of UB (I double-checked that detection does work in principle by introducing a signed-overflow bug, a null-pointer dereference, and an OOB memory dereference). It did find 3 leaks of 1 byte each, which seem to be false positives: all related to small structures (larger than 1 byte) that I allocated and freed correctly. That's all it reported.

I then went on to test using valgrind, which notably reported 0 leaked bytes, and otherwise only reported tons of spam exclusively related to printf-family calls. IIRC these are common false positives due to library mismatches or something like that. You can get rid of them, but I won't bother now.

This is the first time I've tried static and runtime analyzers on this project, other than -Wall. In other words, it seems that just by fixing bugs and adding code until it worked, I produced ~5K lines of C that perform quite well, with 0 bugs or UB uncovered in the good hour of work that I put in.


The sort of thing your parent is talking about is being presented to the committee as an example of things that need to be considered, so it appears to be a serious enough issue to at least seriously discuss.

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p32...


There appears to exist a guy with a formal background who is interested in submitting a paper about formal verification and static analysis and such. Impressive work, but I really don't know what to take home from the existence of this, or what argument it supports.

In my sibling post I merely want to illustrate that all these concerns have little bearing on my day-to-day work (which mostly doesn't need to be certified, and is not related to the defense industry or similar). Some of these I perceive as FUD; as said, I know that you can provoke these situations, but I've never personally encountered nasal demons in practice, and I feel quite productive and am not spending a lot of time on bugs, so why bother?


Gabriel Dos Reis is specifically an old colleague of Bjarne Stroustrup (they worked at the same University, Texas A&M) who is now working for Microsoft on C++ tooling and so on.

So, one way to understand these papers is that Microsoft (at least some parts of it) thinks unsafe Contracts are worse than no Contracts. Now, would that mean they just won't implement an unsafe Contracts feature shipped in a C++ 26 document? Maybe. Would these fixes get it over the line? Maybe.


I'm taking a closer look, but from the looks of it I'm not a fan of adding yet another sub-language with differing syntax and semantics. This leads to complexity; it's a path to madness.

Without being involved -- I have no intention of using any of these Contracts in whatever form. I will say though that I wouldn't care if there is UB in the contract language (just like there is in the normal language). I would prefer the variant with UB if it is simpler and more aligned with the language core. Removing the UB here is an academic exercise. Safety absolutists are uncompromising about the goal of correctness and provability; they are blind to the pragmatic issues created by that idealism. Contracts in either form could probably improve correctness by a lot, like 99% or whatever. So why should I care about the paper which could in theory bring the remaining 1%? It doesn't affect me pragmatically.

The flaw with either is that this is only in theory. In practice, I will never create enough formal contracts to significantly improve correctness, whatever the system. Why? The costs are just too damn high; the only way to achieve 100% correctness, when also considering pragmatic concerns, is to just not write any code.

My approach of just coming up with a simple design (not in code), trying to implement it in the most straightforward way, and fixing the code until it works, as described in my other comment, seems to have achieved something very close to correctness (maybe even 100%? Probably not).

Again, I'm not saying that UB is good or should be tolerated. I don't want it in my programs and if I find an instance of UB I'll try hard to get rid of it. However there is a reason why UB exists in C/C++ (as well as many other languages that may not have as much of it, but still have a lot of it even when not defined explicitly). And alternative approaches, trying to prevent UB mechanically, come with a cost that may not be worth it depending on what you're working on. I feel strongly like it isn't worth it for me. If you're building a fully verified or certified product, tradeoffs are likely different.

If we're citing big names, here is a well known person describing their view, which I find myself agreeing a lot with.

https://www.youtube.com/watch?v=EJRdXxS_jqo


Gabi is the Visual C compiler maintainer, not just tooling. The only sane person in the C++ ISO committee (_besides the sdcc maintainer, who has no power at all_).


The problem here is that people seldom get paid to rewrite existing C code into memory-safe languages, while once in a while someone gets annoyed enough to pay for a C-fixing effort for a little while.

Do you have suggestions on how to fix the incentive?


I don't think that's the root problem. I think C programmers don't believe C is a problem. New software is started in C every day. There's no excuse for that, and no financial incentive to do it.

If the engineers actually admitted that C is not a safe language for shipping software, then we could at the very least freeze the existing code and write everything _new_ in a safe language. But we don't. Engineers still go starting brand-new greenfield projects in C, which is just insane.


> then we could at the very least freeze the existing code and write everything _new_ in a safe language

Sure, if you are willing to help, here's my wishlist:

- I wish we could freeze libssh and write everything new in Rust.

- I wish we could freeze CPython and write everything new in Rust.

- ...

Can you do it for free? At work I'm busy maintaining old projects in C++ and writing new ones in Rust. Since I'm not getting paid to rewrite or maintain our dependencies full-time, I can't do the above. Oh, and I'm not paid to initiate an effort to freeze our old projects either.

If this sounds too harsh:

- I wish we could freeze ZMK [1] and write everything new in Rust (or Zig, though it's not memory safe, whatever).

That's about one of my hobbies and I always wanted to do it.

[1] https://github.com/zmkfirmware/zmk


Just because you don't understand it, doesn't mean it's insane. It just means that your view of the world is different from others.


> But I think secretly C programmers don't want memory safety

Having just come out of embedded firmware land: it's no secret, a few members of my team were pretty open about either not caring about or not wanting memory safety. But the added productivity that Nim gave us outweighed their complaints in the end.



