Being fair about memory safety and performance

pron · on Jan 22, 2022

I'm not quite sure who this article is aimed at, and who those "C++ apologists" are, but as someone who programs in C++ all day, doesn't like it at all, and yet won't advocate to switch to Rust, these kinds of arguments are unconvincing. I'm not advocating to switch to Rust not because I don't think it's better than C++. I am absolutely, 100% convinced that Rust is technically better than C++ in most possible ways — in some, significantly better — and worse in almost no way. The problem is that, overall, it's not better enough.

Reevaluating a low-level programming language is something that's done in a large organisation or project once every 15-25 years or so. Switching such a programming language incurs a high cost and a high risk, and is a long-term commitment. To make such a switch, the new language obviously has to be better, but that's not enough. It has to be a hell of a lot better (and, if not, at least return the investment with a profit quickly).

For some, Rust is better enough. For me, not nearly so. Even though it offers a fascinating and, I think, ingenious path to better safety, it shares some of C++'s greatest downside for me, which is that they are both extremely complex languages. Maybe Rust is simpler, but not enough. Rust also shares what I think is C++'s original misguided sin, which is the attempt to create a low level language, whose code appears high-level on the page by means of a lot of implicitness. I've become convinced that that's a very, very bad idea.

If there were no other ideas on the horizon or if Rust seemed like a surefire success, it might have been justified to make such a switch, but that's not the case. Rust's low adoption rate in professional settings is not reassuring to "PL-cautious" people like me, and a language like Zig shows that there are other approaches that appeal to me more, and while more revolutionary and ambitious than Rust in its departure from C++'s philosophy, I think it also has the potential to be better enough. Maybe it will make it, and maybe it inspires some other language that will, or maybe other ideas will turn up. Given the risk and commitment, to me it makes sense to wait. I don't like C++; I believe Rust is better. But that's not enough.

eklavya · on Jan 22, 2022

Every large project with years of code (legacy or well maintained) will be hardly ever re-written with complete success. It's not a property of programming language in my opinion. It's just economic sense. I doubt JDK will be rewritten in Rust/Zig, they did GraalVM in Java though.

Zig might be revolutionary but personally for me Rust is why I can dare write non gc code. I never thought I would ever be robotically precise with memory management on a large project in memory unsafe languages, that doesn't change with Zig. There are experts in C/C++, there will be experts in Zig and we will continue having memory safety CVEs because they don't make mistakes/they can use some other static analysis tool/they can test :)

I think people comfortable writing low level code under appreciate what Rust has done for bystanders or newcomers or a large team. It is definitely complex but choosing C/C++/Zig over it would make even less sense since lack of guardrails.

pron · on Jan 22, 2022

Even if you look just at sound language guarantees, Zig is closer to Rust in terms of memory safety than to C or C++. That it doesn't make all of Rust's static guarantees doesn't put it in the same bucket as languages that make none. Once you reduce memory safety issues to below half of their current rate, there are different paths to achieving a good overall correctness story.

pcwalton · on Jan 22, 2022

Zig is no way closer to Rust than C or C++ in this regard. Zig is in fact not appreciably safer than C or C++. All of the same classes of memory safety problems that are present in C or C++ are present in Zig.

I am extremely doubtful of the claim that Zig eliminates 50% or more of memory safety problems.

pron · on Jan 23, 2022

As (safe) Zig eliminates overflows just as (safe) Rust does, and, like Rust, has not unsafe casts, I don't see how "all of the same classes of memory safety problems that are present in C or C++ are present in Zig." Also, Zig guarantees — just as Rust does — that all pointers and means of creating them are known, and precisely so, so analysis tools could work on Zig better than they do on C or C++. In C/C++, at best they could check pointers at the time of dereferencing, while for Zig they could do so at the time of deallocation. You once said that couldn't work because Zig would have dangling pointers lying around (presumably in addition to the one used for deallocation) because C does, but it might as well be the case that C programs leave dangling pointers because tools cannot detect them, which has affected the programming style, not the other way around.

It's OK to doubt claims/hopes about Zig, just as I doubt the claim that Rust can achieve more correct/secure programs than Zig for less effort. Without empirical data, there's really no way to know. So in the meantime, all there is to go on is personal appeal. But you cannot support your claim that Zig is "just like C" with the assumption that it is.

I think that, at the end of the day, there is what we might call an "ideological" difference between us. While we both accept that both language features and best practices — code reviews, tests, tools — reduce bugs, we draw the line of where it's worth to sacrifice one for the other in different places. It might also be the case that Rust's complexity doesn't sacrifice anything for you, but it does for me, as I'm uncomfortable with complex languages, and I think Rust easily makes the top four most complex "production" languages in history (together with C++, Ada, and Scala). So while even for me C is on the wrong side of my line (just as Idris is on the wrong side of yours), I reject extrapolating from C to Zig, because Zig makes many more guarantees at the language level, so without pertinent data about Zig, the question cannot be settled. If I thought Zig was "like C," I wouldn't have found it promising and so intriguing, either.

Having said all that, I don't want it to sound as if I'm willing to bet on Zig right now. I'm far too risk-averse for that. But I wouldn't bet on Rust right now, either. What it brings to the table doesn't offset, for me, its (still-)high risk. All I'm saying is that the revolutionary Zig hints at a promise of a new low-level language that could bring more to the table and justify the risk.

pron · on Jan 23, 2022

P.S.

I guess we could summarise Rust's and Zig's core design hypotheses as follows: Even though both place the same emphasis on correctness, Rust doesn't compromise on memory safety (which, given what empirical data we do have, is an important component of correctness but certainly not equivalent to it), i.e. it adds all the sound language features needed to provide it, even at the cost of language complexity, while Zig doesn't compromise on language simplicity, i.e. it adds all the sound language features needed to provide memory safety up to the point they impact language complexity. I don't discount the possibility that there might be a language that could be safer than Zig yet less complex than Rust, or perhaps even as soundly-safe as Rust and as simple as Zig, but so far I haven't seen such a language.

Barring any empirical data, we cannot say which, if any, of those two approaches leads to better correctness (where by "better" I mean reaching the desired level of correctness needed for most low-level applications more cheaply), so we both lean on "ideology," where I prefer simplicity whereas you prefer sound guarantees — both of us in the name of correctness. I think we agree that both C and Idris are the wrong paths to correctness, but while we might reasonably disagree on the price we should pay for soundness, placing Zig's memory-safety in the same category as C's is just as exaggerated and misleading as placing Rust's soundness in the same category as Idris's.

By the way, I wouldn't at all be surprised if empirical research ends up finding no significant differences in correctness between the two, and, in fact, would guess it to be the most likely outcome given our inability to find significant bottom-line differences between "reasonable" same-generation languages so far.

littlestymaar · on Jan 27, 2022

> Even though both place the same emphasis on correctness

But they don't. At all. Rust treats correctness as paramount, not just memory safety (for instance, the existence of 6 different string types, or the PartialEq/Eq trait dichotomy are for correctness unrelated to memory safety). Zig doesn't.

Sure you can write correct programs in it, and that's what everybody wants, but the language doesn't make any efforts to make it easier than any others. Zig places as much emphasis on correctness as JavaScript[1], it's not C or C++ level of minefield, but when it comes to correctness the language won't help you.

Zig has cool (killer?) features like its seamless integration with existing C code, ease of cross compiling and a super cool metaprograming ability, there's no reason to oversell it on stuff it doesn't focus on: that's the best to disappoint people who'll try it.

In the same vein, talking about “safe Zig” vs safe Rust is misleading to the readers: all Zig is 100% unsafe by default, unless you compile it with ReleaseSafe or add a @setRuntimeSafety, and even if you opt-in to safety, the amount of safety is actually quite limited at the moment. There's a long time goal[2] to check for all kind of UB at runtime when safety checks are enabled, but it doesn't exist yet, and if you look at the afformentioned github issue, you'll see a bunch of “@andrewrk andrewrk removed this from the 0.x.0 milestone, added this to the 0.x+1.0 milestone”. At this point, the final vision of what “safe Zig” will look like isn't known yet! And unless Zig adopts a borrow checker or find an equivalent alternative (which would be super exciting, but is unlikely), it will incur costly runtime checks, making it undesirable in production as it will likely be slower than a regular managed language (it's not useless though, it will be like a better ASAN/UBSAN[3] that you can use during fuzzing, but pretty far from what Rust offers).

[1] and I say that as someone who spend a significant amount of time writing JavaScript for a living.

[2] https://github.com/ziglang/zig/issues/2301

[3] I say “better” because it would be strongly linked with the actual semantics of the languages (which can still change if the development of such tooling requires it) and not the retrofitted best-effort stuff you can have in C.

oconnor663 · on Jan 28, 2022

> Rust treats correctness as paramount, not just memory safety

My favorite example of this is that Mutex in Rust is a container type. I think it's very interesting to reflect on why other languages that could do this (basically everything but C and maybe Go) still don't do this. I think it has to do with how you can't stop pointers from escaping the critical section if you don't have a borrow checker.

pron · on Feb 2, 2022

> Rust treats correctness as paramount, not just memory safety... Zig doesn't.

I strongly disagree.

> but the language doesn't make any efforts to make it easier than any others.

Of course it does. Correctness is Zig's greatest emphasis, and so the language includes several important correctness features: explicitness (e.g. no overloads, no implicit calls), fast partial compilation, easy testing. Neither Rust nor Javascript have these important correctness features (Comparing Zig's explicit simplicity to JS is just as wrong as comparing its soundness to C. Zig is much sounder than C and much more explicit than JS. Zig is so revolutionary and different from everything else that we have little if anything to compare it to).

It's just that Zig's approach to correctness is different from Rust's. It relies less on soundness and more on simplicity. As someone who's been involved with formal methods for some years, I can tell you that even in that community, there are these two approaches, those who prefer soundness, and those who prefer ease, as two valid approaches to correctness. The soundness camp wants to eliminate certain bugs; the ease camp wants to find as many bugs as possible per unit of effort. I think all agree we need some combination of the two, but where the best sweetspot(s?) is (are?) is an open problem. We do know that there is a tradeoff between the two. To increase ease you must either remove soundness or offer soundness to eliminate a more restricted set of bugs. For example, sound static analysis covers fewer issues than full-blown "deep" sound proofs or model checking in exchange for greater ease, and concolic testing might uncover deep logic bugs, but at the expense of soundness.

> all Zig is 100% unsafe by default

That's incorrect.

> And unless Zig adopts a borrow checker or find an equivalent alternative...

You are defining the notion of correctness to coincide with what Rust does (i.e. sound memory safety). As Zig's correctness is handled completely differently (broad strokes: find less memory safety bugs, find more others), this makes your point tautological.

Both Rust and Zig have the same emphasis on correctness and the same dedication to features that support it — some help soundness, others ease — but their balance between the two, how they try to achieve correctness and by catching which bugs is different. We don't yet know which, if any, of these languages offer a better path to correctness. And if our assumption is that more soundness always equals more correctness, then neither of these languages are in the right direction, as both intentionally sacrifice a great deal of soundness — almost all of it, really — in the name of ease.

pyjarrett · on Jan 23, 2022

> Rust easily makes the top four most complex "production" languages in history (together with C++, Ada, and Scala)

Ada was "complicated" when it was released because it was being compared to C. Contrasting it against contemporary C++ or Rust, Ada 2012 is much simpler.

dralley · on Jan 23, 2022

The language by itself, perhaps not, I don't know.

But, the fact that it has a test framework built in, and the test allocator fails when it detects memory problems (like a built-in valgrind) definitely puts it far above C in terms of making it easy to write correct code.

pcwalton · on Jan 23, 2022

When I say that a language isn't meaningfully better than C in terms of memory safety, I mean C as programmed in 2022, which very much includes ASan.

AndyKelley · on Jan 23, 2022

You are misinformed. ASAN is a debugging tool. It is not a mechanism to provide safety to C applications.

dralley · on Jan 23, 2022

I think their reply is fair as a response to the capabilities of "zig test"

pcwalton · on Jan 23, 2022

We're talking about tests here.

I work with ASan for a living. I know what it is, thank you :)

hutzlibu · on Jan 22, 2022

It is the first time, I have heard of zig and I have to say, it indeed does sound nice. (But I haven't done low level programming in quite some time ..)

Can you maybe summarize a bit more, why you prefer it over rust?

The concept of rust, of beeing safe by default and only optimize critical parts sounds solid to me, but after a quick skimming, I have not seen such a feature for zig, too, possibly by design?

" No hidden control flow. No hidden memory allocations. No preprocessor, no macros."

dralley · on Jan 22, 2022

I'm not OP, but the typical answer is that the language is small enough to be understood by any engineer with a few weeks of training, and is pretty easy for anyone familiar with C to grasp the concepts of almost immediately.

This is something that you definitely can't do with C++ and probably can't do with Rust.

For some people it's more comforting to be able to understand the entirety of the language and focus on the complexities of the problem and the implementation than to have a slightly higher level and larger language taking over the details. It's the same reason people tend to like C over C++ or Go over other languages.

This also generally translates into a couple of technical benefits as well, such as much much faster compilers relative to more complicated languages like C++ and Rust.

Personally I still find Rust much more comfortable to write code in generally but I really appreciate the elegance of the Zig approach.

pron · on Jan 22, 2022

It's mostly a matter of personal taste and values. I am extremely averse to language complexity, I dislike implicitness in low-level languages, and I value fast turnaround times. Others may have different values. I think Rust and Zig might appeal to different people, so you should see if you like it for yourself (it takes 1-2 days to fully learn), but note that it is absolutely not production ready (certainly not for me). I just find its approach refreshing and even surprising, and more in the direction of what I would hope would become the future of low-level programming.

flykespice · on Jan 22, 2022

> No hidden control flow

Oh boy and I was just reading (https://news.ycombinator.com/item?id=30022022) about ISO C unsuitability for OS programming specially its implicit reordering of the control flow of the code, breaking parts that are sensitive to the machine code generated.

I wonder if that point was introduce as a response for this ISO C nuance, does this means the machine code generated by ziglang code will be explicitly what each statement does?

alexhutcheson · on Jan 22, 2022

> Rust is technically better than C++ in most possible ways — in some, significantly better — and worse in almost no way.

It's much worse in one way: Interoperation with existing C++ code. Sure, that's not a fair criteria - C++ is designed in a way that makes it almost impossible for other languages to use C++ libraries without a heavyweight wrapper like SWIG. However, even though it's not fair, it's still really important. If you have a project like LLVM with millions of lines of existing C++ code, adding expensive or complicated interop boundaries between different components within your system is not an acceptable price to pay.

pcwalton · on Jan 22, 2022

Have you tried cxx? C++ interoperability in Rust has come a long way.

alexhutcheson · on Jan 23, 2022

cxx is a huge improvement over manually writing bindings in C (with the inherent limitations imposed by C’s lack of expressiveness). At the same time, you still have to write a cxx::bridge to specify every boundary between Rust and C++ code, and the clients of on either side need to code against some generated code, not a simple .rs or .h file. It’s a huge improvement, but it’s still a lot more developer effort and boilerplate to use a C++ class from a Rust file than it would be to use that same C++ class from another C++ file.

cxx also can only handle a subset of the interfaces expressive in C++. If you haven’t written the interface with the specific goal of making it usable via cxx, it’s pretty likely you’ll have issues wrapping it with cxx.

flykespice · on Jan 22, 2022

Expecting a language be better "enough" is an unrealistic one, you will be waiting forever.

Of course no language will be perfect to a specific domain, there is no objective metric every individual has their own needs and no universal language can cater all them at once, there will be trade offs.

Rust is already better enough( and possibly will take decades before another language "suceeds" it), choose it or be on the perpetual expectation for a "perfect" language

AnimalMuppet · on Jan 23, 2022

> Expecting a language be better "enough" is an unrealistic one, you will be waiting forever.

Disagree. I think the "market" (the set of programmers) defines "enough". When a language is enough better (in some area, doesn't have to be all areas) you see widespread adoption.

C was enough better than PL/I, ALGOL, and assembly. Java was enough better than C++. (Why? Garbage collection, and the huge standard library.)

So far, Rust is not enough better than C++.

Now, I know this is kind of circular. I'm saying that a language is "enough" better if it wins in the market, and I'm saying that a language wins in the market if it's enough better. But I have some trust in programmers, that they are not just sheep. If a language is better than other languages in a way that matters to actual working programmers, a fair number of them will use it.

jbergens · on Jan 23, 2022

I neither a c++ programmer nor a rust one but I read things about programming. My take is that rust is near the breaking point where it becomes so commonly used that more teams and organizations will just start to use it.

The risk of switching for at least parts or a code base will seem low and then it probably looks better enough.

Of course some organizations and code bases are harder to change for some reason or reasons but many will probably start the conversion in the next 5 years. Unless rust just stops growing or something truly better comes along. I personally don't expect something much better soon and I think rust will continue to grow in usage.

pron · on Jan 23, 2022

> My take is that rust is near the breaking point where it becomes so commonly used that more teams and organizations will just start to use it.

You would have said the same thing about PHP or Ruby (or even that they're past their breaking point), and yet companies that are stuck with either one today aren't too happy about it. The problem is that these days, there are a lot of people who can switch languages without much risk. They switch to X, and tomorrow they switch to Y. While it's very good to have low-risk early adopters, having so many of them can really give a false impression about long-term risk. At some point, you need to see many "long-term committers" making a switch.

Some people ask, but how can you have many long-term committers if these risk-averse people also look around for others like them? The answer is that usually this isn't the main factor. If some technology indeed makes a big bottom-line impact, there's a competitive risk in not adopting it (because your competitors will and then beat you in the market), which is why big-benefit technologies spread quickly.

skyde · on Jan 22, 2022

do you think “C” share the same that same miss guided sin?

pron · on Jan 22, 2022

No, there is very little implicitness in C. However, C suffers from other serious problems that Rust and Zig do address (and C++, too, to a lesser degree). It is extremely unsafe (more so than C++), and has a lower abstraction ability even compared to those other low-level languages.

lordnacho · on Jan 22, 2022

Totally agree with this article, as a fellow HFT coder.

The problem in c++ is the surface where you might cause a memory problem is huge. Once it's there, it's a lot of work to test the hypotheses about where it is hiding. On top of that, these kinds of issues can escape your instrumentation in a way that other bugs tend not to. Add some debug lines, things get accessed differently -> Heisenbug. Mega pain in the ass to figure out, lots of time taking everything apart, sprinkling debug lines, running long tests to catch that one time it goes wrong in a million, and so on.

He's also right that the array access thing is not a huge thing, it can't possibly be what your decision turns on, and that most of the code doesn't have a tradeoff in performance, because it's in the config stage rather than the hot path.

Personally I've had a great time with Rust, it's far more productive than other typed languages I've used. On a business level, the issue with the type of bug mentioned above is it destroys your schedule. I've spent entire weeks looking at that kind of thing, when I was expecting to be moving on with other parts of my project. With my current Rust stuff, I'm doing what I expect to be doing: addressing some issue that will soon be fixed like adjusting some component to fit a new spec.

saagarjha · on Jan 22, 2022

This article has a core point which is good: “in Rust the default is safe, and you have to opt-in to unsafety, but in C++ the default is unsafe, and you have to opt-in to safety”. I think it’s easy to argue for Rust using this construction, because, well, that’s the entire point why Rust was created.

But, it really doesn’t take a very long post to talk about this. The remainder goes off the rails, talking about “C++ apologists” (hint: if you’re being “fair”, pick words that are unlikely to cause people to be preemptively upset. This is not one of those words) and their stupid opinions. And the author just trashes them as being complete idiots, but it’s obvious that the arguments come from inexperience or strawmen, which just makes the overall thing not particularly convincing. Saying that the various UB finding tools were useless because you tried using them and didn’t get good results is stupid. Being smug about “people who use modern C++ clearly can’t do HFT, which is the thing that you said you were using C++ to do” is also insipid, just because you spotted the use of a shared_ptr somewhere and read how it’s not zero-cost. Modern C++ has other things in it, you know, many of which are zero-cost and significantly (but not entirely) safer; picking one thing and misrepresenting it does not make for a good refutation.

Anyways, coming from someone who writes a lot of C++ and would also like a lot of code to be migrated to Rust for good reasons, it’s a good idea to approach the tradeoffs honestly and without disdain for those who aren’t convinced yet. The core argument I mentioned above and the closing part of the article does do this…but there’s a lot in the middle that doesn’t, and it drags down the usefulness of the post.

jorangreef · on Jan 22, 2022

> This article has a core point which is good: “in Rust the default is safe, and you have to opt-in to unsafety, but in C++ the default is unsafe, and you have to opt-in to safety”.

On the subject of safe defaults, just to correct that Rust does not in fact have as much default memory safety with regards to buffer bleeds (e.g. variants of OpenSSL's Heartbleed) as it could [1], because it has unchecked arithmetic (integer wraparound) as the default for performance, with checked arithmetic only as an opt-in for safety.

In other words, if an attacker can get some bounds merely to underflow (as opposed to overflow) then they can still read the sensitive memory of a Rust program, even without a UAF or buffer overflow.

Bleed vulnerabilities like these are also low-hanging fruit and significantly easier to exploit.

In other words, bounds checking only ensures you are within the buffer, but checked arithmetic is still needed to ensure that your index was correctly calculated in the first place.

I believe that Rust would be much safer against memory bleeds, if it had checked arithmetic enabled by default for safety, with an opt-out at the block scope level for performance, like Zig has.

[1] https://news.ycombinator.com/item?id=29991439

pcwalton · on Jan 22, 2022

Do you have examples of this actually causing security problems in Rust code at anywhere near the levels of memory safety problems in C/C++?

It strikes me as extremely dubious to tout Zig, which doesn't have memory safety at all, as somehow superior to Rust because Zig has this mitigation enabled by default (with a large performance cost) for an error class that Rust forestalls the vast majority of the negative consequences of, by dint of memory safety.

jorangreef · on Jan 23, 2022

> Do you have examples of this actually causing security problems in Rust

Do we need them though? I don't believe we need to go through a Heartbleed moment for Rust before we realize that checked arithmetic is just a good idea. We should be able to learn from the history of past exploits, even in other languages, not only in Rust.

I've also worked in the security industry. I've written static analysis software to detect zero-days. I've done professional bug bounty engagements. I know how to hack systems and I know from experience that checked arithmetic is important and something we should care more about as programmers.

Many programmers don't know about buffer bleeds and how dangerous are. There's a whole class of memory exploits that are much easier to pull off than a UAF. Security is about defense-in-depth. Why ship with an unsafe default?

> It strikes me as extremely dubious to tout Zig, which doesn't have memory safety at all

Comparing defaults with respect to checked arithmetic is something that programmers should be open to thinking and talking about. There's also no need to dismiss Zig's spatial memory safety as "no memory safety at all". Spatial memory safety is a pretty big win already. It rules out another class of exploits.

With respect to enabling checked arithmetic by default for safety, with an opt-out for performance at block scope level, Zig arguably has a safer default, something that I wish Rust would also adopt.

I don't agree that that's too "extremely dubious" to ask for?

> (with a large performance cost)

Not so. I'm working on a project that processes a million transactions a second.

We don't see any impact from this because the data plane is clearly delineated from the control plane. Checked arithmetic is enabled everywhere in fact because bounds checks are amortized across larger buffers. Compared to the cost of the cache misses to process the data, the costs of the arithmetic of the bounds check and the branch across the larger buffer are an order of magnitude less.

You can also opt-out at block scope level for hot loops. But we haven't needed to.

Again, projects should rather enable checked arithmetic by default for safety and then profile. Turning it off by default at the language layer with an opt-in for safety, in safe builds, just doesn't seem like the right default to me.

We can agree to disagree.

pcwalton · on Jan 23, 2022

The data clearly show a significant overhead for integer bounds checks. https://danluu.com/integer-overflow/

At scale this adds up to millions of dollars. It would be a deal breaker for switching to Rust for many organizations. That's why bounds checks in release mode are not the default.

jorangreef · on Jan 24, 2022

> At scale this adds up to millions of dollars.

Buffer bleeds like Heartbleed cost the industry hundreds of millions of dollars.

Also, to be fair, Dan's post is about checked arithmetic in hot loops, i.e. the data plane, which as I've said, "large organizations" would know to amortize by using large buffers, and by clearly delineating between data plane and control plane.

For example, why not simply disable checked arithmetic at block scope level for a hot loop? Disabling at program level by default, and then having to re-enable it everywhere that's not a hot loop, just seems like conflating data plane and control plane, and like a massive overly big hammer.

It's also dangerous, because unsafe defaults might be run by new programmers who don't understand the risks of unchecked arithmetic, and who think that Rust gives them 100% memory safety.

Also, do you think that a 5% penalty on control planes is cost-prohibitive? I don't.

Control planes are usually where "large organizations" have tons of assertions anyway, for example, AWS really like to run their control planes at constant max load, regardless of actual load, to avoid cascading failure. That costs them millions of dollars, but relative to the hundreds of millions of dollars that their data planes cost, it's worth it, because it saves outages and failures that could easily dwarf the 5% performance gains at the expense of safety.

Safety becomes much more critical at large scale in fact, more so than performance. Better to be correct first, and then fast. Than fast, but not correct.

saagarjha · on Jan 22, 2022

I’m generally in the favor of checked arithmetic by default, yes. But I will give Rust credit in that it makes this easy to enable and also provides access to the various other kinds of arithmetic readily.

adgjlsfhk1 · on Jan 22, 2022

I don't think integer overflow is an especially common source of buffer overflow. This isn't based on any hard data, but I'm pretty sure that the 2 main types of buffer overflow come from

1. not doing the bounds check. 2. not storing the the bounds with the array.

sundarurfriend · on Jan 22, 2022

The substance of the article is overshadowed by its unfortunate tone.

> But wait! The C++ apologists are still talking! What are they saying? How have they not been completely flummoxed?

is just one small sample. I came out of the article liking Rust a little bit less than when I went in (irrational, I know, but true).

The quote from The Big Lebowski comes to mind: you're not wrong, author, ...

patrick451 · on Jan 22, 2022

Yup. The rust evangelists have completely turned me off from rust.

pyjarrett · on Jan 22, 2022

> I learned that the issue was in framework code – code written by my boss’s boss. The code was untested, and written extremely poorly, and had rotted, so that it didn’t work at all.

One unaddressed issue in Rust is that this could easily happen with a crate, and how hard diagnosing it could be, especially due to implicit behavior via procedural and attribute macros. Also, just because your code is safe, there could still be an unsafe block at the end of any long safe call chain. I haven't been able to reconcile to myself how this isn't just an illusion of overall safety.

oconnor663 · on Jan 29, 2022

> I haven't been able to reconcile to myself how this isn't just an illusion of overall safety.

Experienced Rust writers try to be very clear about this, but it's a subtle point that's hard to fit into an elevator pitch for the language.

Safety in Rust is an encapsulation mechanism, and it's closely related to privacy. In fact, we can use privacy as a good metaphor. Suppose we have a private member variable x, in any language that supports such a thing. And let's say the design of our class is such that x should always be less than 10. Does the fact that x is private mean that we're guaranteed it will always follow that invariant? No of course not, because the public methods of our class could have bugs in them that screw up the value of x. However, we still get a useful guarantee here! We're guaranteed that x can only violate its invariants if our methods have a bug. We don't have to worry about what any specific caller is going to do, because privacy rules let us make guarantees about x solely based on our code.

Safety in Rust is similar. If Vec is buggy and unsound, then its "safe API" isn't providing much value. But if I manage to cause UB using Vec, that is a bug report for the Rust standard library, and they will fix it. Once the bug is fixed, then my safe calling code cannot cause UB using Vec, no matter how hard it tries.

These end up being very useful guarantees in practice. Many nontrivial programs can be written entirely in safe code, using only high-quality dependencies that get a lot of testing. Surprise soundness holes come up occasionally, but they're kind of like miscompliation bugs in that it's usually hard to trigger them.

adgjlsfhk1 · on Jan 22, 2022

the are 2 answers here. the first is that the crate is less likely than a c++ library to have this bug since stuff is safe by default. the second is that having a smaller set of unsafe stuff in your code base makes it easier to rule out your code as the problem.

cylon13 · on Jan 22, 2022

I don’t think it’s so much an illusion as it is just not quite as absolute as one might initially think. If you get a segfault in a rust program you know where to look.

pornel · on Jan 22, 2022

People complain about Rust using so many small crates, but it's actually its strength: those small focused single-purpose crates are easier to review, test, and fuzz.

You don't need to answer a question of "is my 1-million-line codebase safe?", but rather "does this 10-line function uphold Rust's invariants?". It may be a tricky question, but you can focus on it in isolation. The contract between crates is safe, so once you've proven the dependency upholds the contract, you can rely on all its usages being safe.

the8472 · on Jan 22, 2022

Since this is about array indexing, if you do slice.iter.for_each() instead of for-in, while-let or manual indexing then it already uses unchecked access under the hood today because internal iteration can use a counted loop and knows it doesn't need those bounds checks.

vlovich123 · on Jan 22, 2022

That’s discussed in the slices section of the article

snicker7 · on Jan 22, 2022

As for array index checking, Julia gets it right. It uses a pair of macros (@boundscheck and @inbounds), which empowers the user elide bounds checking. Moreover, it is extensible to beyond just arrays. I’ve used these macros in my own hand-rolled implementation of a ring buffer.

maxwell86 · on Jan 22, 2022

> To review, where do Rust and C++, these programming languages with their vastly different philosophies, Rust for the cautious, C++ for the fast and bold, stand? In the exact same place.

The author spent 1 page before this statement, and the whole article after it, explaining that this is not true, so the article is a big contradiction.

Rust and C++ are not "in the exact same place".

With Rust, you get bound checking by default. If, after profiling, you find that it is a performance problem somewhere, it allows you to elide it safely. In the programs I work on, 99% of the execution time of my program is spent in 1% of the code, and Rust optimizes for this situation. Instead of debugging segmentation faults due to performance optimizations that buy you nothing in 99% of the code, you can spend your time optimizing the 1% that actually makes a difference.

This is why Rust libraries are program are "so fast". Its not because of multi threading, or because rust programmers are geniuses, but rather because Rust buys these programmer time to actually optimize the code that matters, and in particular, do so without introducing new bugs.

mlindner · on Jan 22, 2022

Really good article that covers some poor arguments about C++.

fulafel · on Jan 22, 2022

There's a presumption here that checked access would cost some nr of nanoseconds per access, but this often isn't the case since predicted, not-taken branches tend to have 0 cycle latency in recent CPUs.

celeritascelery · on Jan 22, 2022

> predicted, not-taken branches tend to have 0 cycle latency in recent CPUs.

This is not the case. Due to instruction level parallelism, the throughput could be unaffected, but you will always have latency penalty. The CPU still needs to run the check (access the length and compare it to the index) and this adds latency. On top of that, it also increases code size, which can impact the instruction cache and binary size. It’s a small penalty, but it’s not 0.

fulafel · on Jan 22, 2022

Speculative execution enables continuing along the predicted branch without stopping. You do need to have the ~2 instructions to get the length test input on hand but that usually can be eaten by insn level parallelism without hurting the latency of the array operation.

errantmind · on Jan 23, 2022

Do you have any evidence of this claim? Perhaps a benchmark?

This doesn't align with any of my performance optimization experience.

fulafel · on Feb 1, 2022

I went looking, and seems I have to walk my claim back somewhat. Wide issue OoO processors hide a lot of the overhead but not all of it.

Jensson · on Jan 22, 2022

Depends, if the function is vectorizable then the cpu can do more elements at a time if it doesn't do the branch prediction work. It is true for non-vectorizable work.

fulafel · on Jan 22, 2022

In autovectorized loops, the generated code typically needs length checks (or static length proofs) to handle tails of vectors. But yes there are still cases where the cost can be measurable.

pizlonator · on Jan 22, 2022

I’m my view, Rust is a very uninspired kind of safe language. But a point on which I agree with Rust is that array accesses should be checked by default.

I think this article ignores some arguments for array bounds checks and it ignores the importance of what the default is:

- It doesn’t matter how fast or slow bounds checking is in theory. It only matters how fast it is in practice. In practice, the results are quite surprising. For example, years ago WebKit switched its Vector<> to checking bounds by default with no perf regression, though this did mean having to opt out a handful of the thousands of Vector<> instantiations. Maybe this isn’t true for everyone’s code, but the point is, you should try out bounds checking and see if it really costs you anything rather than worrying about hypothetical nanoseconds.

- If you spend X hours optimizing a program, it will on average get Y% faster. If you don’t have bounds checks in your program and your program has any kind of security story, then you will spend Z hours per year fixing security critical OOBs. I believe that if you switch to checking bounds then you will instead get Z hours/year of your life back. If you then spend those hours optimizing, then for most code, it’ll take less then a year to gain back whatever perf you lost to bounds checks by doing other kinds of optimizations. Hence, bounds checking is a kind of meta performance optimization because it lets you shift some resources away from security to optimization. Since the time you gain for optimization is a recurring win and the bounds checks are a one time cost, the bounds checks become perf-profitable over time.

- It really matters what the language does by default. C++ doesn’t check bounds by default. The most fundamental way of indexing arrays in C++ is via pointers and those don’t do any checks today. The most canonical way of accessing arrays in Rust is with a bounds check. So, I think Rust does encourage programmers to use bounds checking in a way that C++ doesn’t, and that was the right choice.

As a C++ apologist my main beef is: if bounds checks are so great then please give them to me in the language that a crapton of code is already written in rather than giving me a goofy new language with a different syntax and other shit I don’t want (like ownership and an anemic concurrency story).

rakingleaves · on Jan 22, 2022

Relatedly, this recent paper shows that many manually-removed bounds checks in Rust libraries can be re-introduced with no bottom-line perf regression, depending on the application https://dl.acm.org/doi/10.1145/3485480

robalni · on Jan 22, 2022

I don't know what people mean when they talk about "safe" or "unsafe" code. Doing something like `int a[5]; a[2] = 100;` in C is perfectly safe because there is no bug in that code. The only thing that might be unsafe about that is if you change the code because then you might create a bug. Changing code is always unsafe because you can always create bugs in any language, even Rust.

I don't think "safe" or "unsafe" can be a property of code; it can only be a property of something you do, like changing code. I think that something being "unsafe" means that there is a risk with doing it. Programming is always a risk, even if you write Rust code without using the "unsafe" keyword. You can even have arbitrary code execution bugs in Rust programs without using the "unsafe" keyword; think about bugs like SQL-injections.

All of this doesn't mean than I don't think the checks that the Rust compiler does help. They probably help many people to write less buggy code. I just think it makes no sense to call code "safe" or "unsafe".

adgjlsfhk1 · on Jan 22, 2022

This is only true if you never pass arrays to functions. Once you have a function that takes in an array and does indexing, it is possible to ask if it ever can access memory it wasn't supposed to without changing that function.

eklavya · on Jan 22, 2022

In my opinion it's pretty clear. Rust promises memory safety to me, so all code that is safe IS memory safe. I need to mark a block "unsafe" and have an understanding with the compiler that it can't ensure memory safety here and I am on my own. Makes perfect sense to me :)

robalni · on Jan 22, 2022

If we say that "safe" code is "memory-safe" code, now the question is: what is memory-safe code? If memory-safe code is code that doesn't access the memory in unintended ways, then who knows what is unintended? Only the programmer does; it's impossible to write a compiler that knows what the programmer intends.

Like if you only want to access the data inside the bounds of the array, that's one intention that the compiler will help you to check. If you never intend to access the first element in the array, that's an intention that the compiler doesn't help you to check.

So, there is no memory-safe code or language. I think the only way you could define memory-safe code is that memory-safe code can't contain any code that breaks some rules that the compiler checks for. The problem with that is that those rules could be just about anything, so that definition is pretty useless.

eklavya · on Jan 22, 2022

On the contrary, that definition is the whole point and useful if both parties (compiler and programmer) agree what they mean. It's definitely useless for philosophical musings about words and meanings and what not :D

robalni · on Jan 22, 2022

I totally agree; it's useful to have a language feature that enforces some rules that the programmer knows about. It's just useless to call it "safe". My point is just about the words people use when talking about these things :)

robalni · on Jan 22, 2022

To add some more explanation; there are layers of safety. The arrays in Rust or any other language are a layer on top of the memory pages that you get from the operating system. Just like that, you actually have bounds checks in C because the operating system has bounds checks on the memory pages that you use; the safety is just on a lower level.

Languages like Rust add a layer of safety on top of the operating system's layers. The problem is that even if you have safety on one layer, the next layer will always be unsafe, and as long as you have abstractions in your code, you will always have layers.

Let's say you build some kind of abstraction on top of Rust arrays. The compiler will do bounds checks on the arrays but your abstraction will have no checks unless you implement them. Let's say that some state of your abstraction is invalid; the compiler will not help you to check that.

Therefore you can't have a safe language, because even if one layer is perfectly safe, as soon as you add an abstraction layer, you have no safety checks on that layer. SQL injections are an example of that; even if SQL were a perfectly safe language, as soon as you add a layer on top of that (a function that builds SQL code by concatenating strings) you are back to no safety.

__s · on Jan 22, 2022

https://en.wikipedia.org/wiki/Memory_safety

Outside of your simple example C code, there exists C code which can only be memory safe if the compiler implements a heavy runtime: track pointer allocations, track where pointers source from, raise an error when the pointer is used in an undefined context. See how much work valgrind does to achieve a subset of this task

You could consider C code safe if you included a machine verifiable proof of memory safety with the code.. but that's ridiculously more effort than using Rust

In short, you're arguing semantics over the use of the word safe/unsafe when there's a clear definition Rust offers. You can argue that safe code still has bugs, but that's beside the point

robalni · on Jan 22, 2022

That wiki article seems to define memory-safe code as not containing an arbitrary list of bugs. This doesn't really make sense because even if you have code that doesn't contain those bugs, and even if you have a compiler that helps you to find those bugs, programming is always unsafe. You are not safe just because you don't use the "unsafe" keyword in Rust.

adgjlsfhk1 · on Jan 23, 2022

The list isn't arbitrary. Bugs that let you read and write different memory than you meant to are some of the easiest to exploit. If you look back at the type of bugs that make headlines, they're pretty much all memory safety issues or code injection (and the memory issues show up more often).

RedPanda250 · on Jan 22, 2022

> Most specifically, gcc and icc are much better compilers for those use cases – empirically – than is LLVM.

This is interesting. Where can I read more about this ?

SiebenHeaven · on Jan 22, 2022

Great article, I'm bookmarking it so I can point to it if and when required ;p

habibur · on Jan 22, 2022

Read only top half of the article.

Would like to add that at least in plain C, doing var[index] doesn't invoke any checked() or unchecked() access function call. It's rather compiled into assembly instructions to calculate the address where the data is expected and load it into memory in one or two lines.

saagarjha · on Jan 22, 2022

You’re taking C extremely literally, and compilers are not required to do this (for example, if the array is being iterated sequentially, multiple loads may be done at once). Similarly, in Rust a function call is not necessarily going to actually compile to a function call: it is expected that the access compiles down to something very close to what C would do. The syntax is an abstraction, rather than a normative designation to the compiler for how it should generate code.

masklinn · on Jan 22, 2022

Which is, incidentally, why you can also write `index[var]`. `[]` is really just a convenience for `*(var + index)`. In fact that's exactly how the standard defines it (or at least defined it as of C11, I've not checked more recent versions):

> A postfix expression followed by an expression in square brackets [] is a subscripted designation of an element of an array object. The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))).