So You Want to Rust the Linux Kernel? (paulmck.livejournal.com)
240 points by lukastyrychtr 54 days ago | hide | past | favorite | 324 comments



I don't want rust in the kernel. As a small time developer making a hardware product, the investment to understand the kernel and how to build drivers is already huge. I have no interest in learning rust, and it will make it harder for people with limited resources to use linux in their projects.


> I don't want rust in the kernel. As a small time developer making a hardware product, the investment to understand the kernel and how to build drivers is already huge.

There's a crucial part missing: the investment [...] to build (memory) unsafe drivers.

I find it very troubling when memory unsafe language programmers don't weigh memory safety at all in their views.

(Note that this doesn't necessarily refer to Rust; there are people who swear that one can build drivers in Golang too, so, assuming that is realistic, the same reasoning applies.)


Note that Rust's safety proposition is different from Go's in a way that's significant for Linux beyond the obvious.

(Safe) Rust is Data Race Free. You can mistakenly write a data race in Go, just as you could in C, and, even though under normal circumstances Go feels like it's memory safe, once you've got a data race all bets are off and you potentially no longer have memory safety. In (safe) Rust you can't write a data race, so, this can't happen.

If you write simple serial programs, you don't care, your programs never have data races because they have no concurrency and therefore can't possibly have concurrent memory writes. But obviously Linux is not a serial program.
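A minimal sketch of what this means in practice (the function name `shared_count` is mine, not from any real API): safe Rust will not compile a version where several threads write a plain integer without synchronization, so the `Mutex` below is mandatory, and that is precisely what rules out the data race.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Four threads increment a shared counter. Removing the Mutex and
// sharing a bare `i32` across threads would be rejected at compile
// time, which is how safe Rust guarantees data-race freedom.
fn shared_count() -> i32 {
    let counter = Arc::new(Mutex::new(0));
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..1_000 {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let result = *counter.lock().unwrap();
    result
}

fn main() {
    assert_eq!(shared_count(), 4_000);
}
```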


It's also important to note that for most systems, security-critical memory safety is a REALLY big deal, but safety from crashes due to data races is generally not a big deal. Rust makes trade-offs to accomplish this.

We've proven time and time again that we can't write safe C/C++ code from a security perspective. I don't think we've proven that crashing now and then to fix our data races is a bad outcome. There are always bugs to be fixed.


The consequence of a data race isn't (necessarily) limited to "crashing now and then".

The big problem is SC/DRF: sequential consistency for data-race-free programs. Most languages promise (at best) that your program is sequentially consistent if it is data race free. And it turns out we can't debug non-trivial programs unless they're sequentially consistent, because it hurts our brains too much.

Causality is something we're all used to in the real world. The man hits the ball with the bat, then the ball's trajectory changes, because it struck the bat. Sequentially consistent programs are the same. X was five, then we doubled it, now it's ten.

But without sequential consistency that's not how your program works at all. The ball goes flying off over the wall around the field, then the pitcher throws the ball, the ball lands in the catcher's glove, the umpire calls it, the batter hits the ball, and yet at the same time misses the ball... We doubled X, now it's five, but before we doubled it that was nine or sixteen? What?
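A small illustration of how Rust surfaces this: every atomic access must name a memory ordering. `Ordering::SeqCst` asks for sequentially consistent behaviour; the weaker orderings (Relaxed, Acquire/Release) that permit the counter-intuitive executions described above exist, but you must opt into them explicitly. The function name `seqcst_counter` is mine, for illustration only.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Two threads increment an atomic counter using SeqCst ordering, so
// every observer sees the operations in one globally agreed order.
fn seqcst_counter() -> usize {
    let n = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..2)
        .map(|_| {
            let n = Arc::clone(&n);
            thread::spawn(move || {
                for _ in 0..500 {
                    n.fetch_add(1, Ordering::SeqCst);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    n.load(Ordering::SeqCst)
}

fn main() {
    assert_eq!(seqcst_counter(), 1_000);
}
```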


> Its also important to note that for most systems security critical memory safety is a REALLY big deal, but safety from crashes due to data races is generally not a big deal.

It must be noted, though, that data races can lead to memory safety issues in languages which are otherwise free of them, Go being a prime example of that: the built-in hashmap is not thread-safe, and does exhibit memory corruption issues under unprotected concurrent writes and accesses (only concurrent unprotected reads are OK).

There is a “best effort detection” of concurrent misuse, but it’s just that.


read the kernel docs and follow best practices. the code is reviewed by thousands of developers, so chances are what makes it into the kernel tree is pretty good. i don’t need rust. thanks!


First, I've generalized to memory-safe languages, not necessarily Rust.

Second, this is exactly the overconfidence that I find troubling. Vulnerabilities caused by memory unsafety are real, pervasive, and inevitable.

A few resources you should look at:

- the official reports/post by Mozilla and Microsoft about the percentage of vulnerabilities caused by memory unsafety

- the mailing lists with the Linux kernel issues (I don't imply any severity; just the amount)

There are many summarized posts for the lazier, and I'm talking about high profile authors (or authors writing on behalf of high profile companies). Here's a semi-random one:

https://msrc-blog.microsoft.com/2019/07/18/we-need-a-safer-s...


not everyone using linux making a device cares about vulnerabilities. there is no root password on my device and it is not networked.


Anecdotes from kernel maintainers are that memory-unsafe code gets past reviewers all the time. There was even that ethically questionable study that tried, and succeeded, in getting vulnerabilities inserted into the kernel.


Wouldn't it make more sense to use a much more mature safety-centric language with a proper language spec like Ada?


Ada maybe isn't very well adapted to kernel code? From what I know of Ada, doesn't it get its memory safety from either a GC or from not freeing memory at all? That is -- ownership rules are pretty new to Ada/SPARK.

C-style syntax and community interest also favor Rust.

This spec argument has always seemed like a red-herring to me. Can you explain why the Ada spec would be a significant factor in this instance?


>Ada maybe isn't very well adapted to kernel code? From what I know of Ada, doesn't it get its memory safety from either a GC or from not freeing memory at all? That is -- ownership rules are pretty new to Ada/SPARK.

I think GC is optional with Ada, as far as I know the memory safety comes from raising exceptions (or refusing to compile) when it detects memory-unsafe operations (array bounds checking etc).
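Rust's run-time checks are in the same spirit as the Ada checks described above: an out-of-range index panics instead of silently touching adjacent memory, and the checked accessor `get` turns the failure into a value you must handle. A small std-only demonstration:

```rust
fn main() {
    // Silence the default panic message so the caught panic below
    // doesn't print a backtrace hint to stderr.
    std::panic::set_hook(Box::new(|_| {}));

    let xs = vec![10, 20, 30];

    // Checked access: returns an Option instead of corrupting memory.
    assert_eq!(xs.get(1), Some(&20));
    assert_eq!(xs.get(7), None);

    // Plain indexing is still bounds-checked: xs[7] panics at run time
    // rather than performing an out-of-bounds read.
    let caught = std::panic::catch_unwind(|| xs[7]).is_err();
    assert!(caught);
}
```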

>C-style syntax and community interest also favor Rust.

That's fair, C-like languages are instantly familiar to software people, and Rust seems to have a more hip image than old-man fuddy-duddy Ada.

>This spec argument has always seemed like a red-herring to me. Can you explain why the Ada spec would be a significant factor in this instance?

I'm not arguing for Ada over Rust (I'm not a software guy) so I don't mean it as a red herring, but wouldn't a suite of static verification tools and a formally verified compiler require a spec to be tested against?


Re: spec, not if the spec is implementation (and project's values) defined. I mean -- the Rust project has a ridiculous # of tests which define language behavior without having an ISO standard. Yes, implementation defined behavior in C is usually a place where C compiler engineers trade safety for speed. Yes, that's usually a bad trade. However, I'd look at this situation re: Rust vs. C in a different way though -- the Rust project's values are why defining a standard is less important.

I think a spec is often a red herring because a bunch of folks living in the slum of C, when asked if they would all like to move into Rust's nice 3 bedroom by the park, instead always seem to ask: Wouldn't it be better to form a committee about building us a cathedral? Ada might be better. Someone should try it, but until then I'll take Rust.


It's always fun to read people's drive-by opinions about the only decent language, IMO. Ada is highly compatible with C code. For dynamic allocations with "ownership" there is the concept of memory pools, which CAN use a GC if that's how you choose to implement the pool allocator.


Haha, I'm sorry if it seemed like I ever knew what I was talking about. I thought asking questions would make it clear that I don't.

I remain interested in the potential advantages of Ada compared to Rust for Linux kernel development, if you would care to point me in the right direction.


I remember writing Ada (long ago) and my take was: it's nice to work on existing Ada code, but I thought it was tedious to write new code.

there was also a lot of unchecked conversion under the hood.

(disclaimer: I haven't really paid attention to newer versions of Ada)


Honestly if Ada hasn't caught on in the intervening 30 years (despite a lengthy government issued monopoly) then it probably isn't a good choice. A more mature Rust would be nice, but it's gravy at this point.


(caveat: I've read a bit about Ada but don't have any deep familiarity with it)

Ada is safe compared to the other languages of its day, but I'm not sure it compares favorably with Rust. IIRC it does have more of a focus on safe arithmetic rather than memory safety.


The focus of Ada is not on safe arithmetic only, it's on functional safety at large: the code does what it is specified to do and nothing else.

Ada shines in its specification power, how developers can express what the code is supposed to do (strong typing, ranges, contracts, invariants, generics, etc.). And then you can either check your code at "compile time" with SPARK [1], that provides a mathematical proof that you code follows the specification. SPARK also proves that you don't have buffer overflows or division by zero for instance. Or you can have checks inserted in the run-time code which greatly improves the benefits of testing as every deviation from specifications will be detected, not only the ones you decided to check in your tests.

In terms of memory safety, Ada always had an edge on C/C++ because of the lower usage of pointers (see parameter modes [2]) and the emphasis on stack allocation. Now with the introduction of ownership in SPARK it's getting on par with Rust on that topic.

[1] https://learn.adacore.com/courses/intro-to-spark/chapters/05... [2] https://learn.adacore.com/courses/intro-to-ada/chapters/subp...


Interesting. I wasn't aware that safety in Ada had developed to this point. Maybe I'll have to take another look at the language!


No.


To be honest, I don't think Linux being the "One True Kernel" is really worthwhile, and the benefits of adding Rust to the kernel outweigh the downsides.

There are plenty of other kernels available for embedded work, and my understanding of Rust in the Linux kernel is that it's mostly drivers, which seems like it would be optional for your use case.

That said, I've not done much embedded, I may be misunderstanding something.


Rust is a language that forces you to do things in a way that makes code more robust at the cost of a little upfront thought and inconvenience.

The parent comment is complaining that Rust will make kernel dev too hard? Are they joking? It is already complicated.

I would rather see a more formally constructed language used for the kernel than risk adding more C-related bugs.


The OP might be OK with a kernel written entirely in Rust, but having a kernel with a mix of Rust and C adds complexity, both in having to master two languages to understand it well and in a more complex build system.


exactly


What's so bad in having to know two languages for developing drivers? Especially when knowing the second one will rid you of all memory safety bugs just because your code compiles.

And to call Rust's build system complex is laughable. Issuing `cargo build` could not be simpler (and it includes automatic downloading of dependencies, too).


> It is already complicated

And specifically, one of the ways in which it is complicated is that it's mostly done in a language that doesn't natively enforce memory-safe constructs, forcing developers to walk a best-practices and tooling tightrope to avoid writing code that breaks.


Why the heck is this comment grey?! At least reply and explain where they are wrong?


you know what they say - something about nature creating a better fool


You mean like not incorporating formal verification theories? Yeah: fools, all of 'em.


the kernel and linux is about many important issues, not about chasing the latest language fad


That's your misconception: you believe Rust is a fad. And there's also nothing "latest" about it; it's 10 years old.


it is already complicated. so i don’t want to learn another language. i want to become BETTER at using what is there, not have a language forced onto me. i want to make my tool set smaller, not learn more tools.

also, being able to use linux in an embedded system as a resource constrained developer is hugely important today. adding more tools means more dev cost.


> i want to make my tool set smaller, not learn more tools.

Ouch. Good luck with that.


Sounds like it's time to exit programming, buddy, because the world has some very bad news for you. Yeah, you'll have to keep learning tools. It's included in the big paycheck.


There is also so much about Rust that makes it nearly ideal for the prospect of writing a driver.

For example, how many bytes are in an `int`? What is the endianess of your 32bit `int`?

Rust puts up front a lot of the information crucial for cross platform development of hardware drivers.
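To make the point concrete: in Rust the width is part of the type name, and byte order is chosen explicitly at the point of use rather than being a property you have to remember about the target.

```rust
fn main() {
    // `u32` is exactly four bytes on every platform.
    assert_eq!(std::mem::size_of::<u32>(), 4);

    let x: u32 = 0xDEAD_BEEF;
    // Endianness is explicit per conversion, instead of implicit in
    // whatever the hardware happens to do:
    assert_eq!(x.to_le_bytes(), [0xEF, 0xBE, 0xAD, 0xDE]);
    assert_eq!(x.to_be_bytes(), [0xDE, 0xAD, 0xBE, 0xEF]);
}
```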

Rust has few undefined edges which is exactly where C gets into trouble. C, by design, has a bunch of undefined edges.

Honestly, the biggest downside to Rust in the kernel is that Rust is backed by the LLVM, which doesn't have as many supported targets as GCC does. (And, AFAIK, GCC is generally the preferred compiler for the kernel).


> Honestly, the biggest downside to Rust in the kernel is that Rust is backed by the LLVM, which doesn't have as many supported targets as GCC does. (And, AFAIK, GCC is generally the preferred compiler for the kernel).

Fortunately, there are now two competing approaches to making GCC a viable compiler for Rust[1], and it's moving really fast. I think the LLVM-only situation won't get in the way for more than a few more years.

[1]: https://lwn.net/SubscriberLink/871283/c437c1364397e70e/


> For example, how many bytes are in an `int`?

Rust and Linux both have the same name for, say, the 32-bit unsigned integer type: they both call it u32. This is of course a coincidence, and not an unlikely one, but it probably doesn't hurt for Linux people getting familiar with Rust.


I personally hope that linux kernel development won’t be constricted by the needs of hardware devs…


Isn't a big chunk of a kernel's job providing a standard abstraction for hardware access?

I'd assume naively that such a construct would, of necessity, involve the needs of hardware devs.


Absolutely!

I was emphatically not suggesting that the needs of hardware devs should be ignored, just that those needs should not constrict kernel development to the detriment of other concerns.


(empathetically*)


The problem is, it isn't "needs of hardware devs" vs "rust". The kernel exists to abstract over hardware. Almost everything in the kernel* is written by "hardware devs". So any benefit Rust brings will be FOR the benefit of hardware devs.

* - There is some cryptography stuff in the kernel (that probably doesn't need to be in there anyway) and that isn't written by hardware devs I suppose.


Rust in itself is easier to understand and reason about than the equivalent kernel C code in my experience.

The actual cognitive cost does not come from Rust itself, but from mixing the two memory models and the interop between them, which in my experience is probably a bad idea in general.

They should just boil the ocean and write the entire Linux kernel in Rust.


If everyone had your attitude, humanity would have never left the trees. There are all sorts of legitimate reasons to take it slow on adding Rust to the kernel and adopting Rust generally. The language is far from perfect.

But "I don't want to learn something new"? No, that's not a legitimate reason.

But Christ almighty, you don't get to block the progress of all humanity (and Linux is a humanity-scale project now) because you're too lazy to learn some new syntax. Memory safety is important. Memory safety bugs cause billions of dollars of damage every single year. The project of solving them is too important to get tangled up in the weeds of people who don't have the "resources" to RTFM.


> Many were increasingly of the opinion that they'd all made a big mistake in coming down from the trees in the first place. And some said that even the trees had been a bad move, and that no one should ever have left the oceans


If Rust would have just used a C-style syntax instead of this wacky quirky stuff that's needlessly hard to read, it would probably be much more widely adopted. I gave it a shot myself but I was too turned off by the friction of getting used to the syntax.


> If Rust would have just used a C-style syntax instead of this wacky quirky stuff that's needlessly hard to read

A lot of the "whacky, quirky" stuff is to avoid the clusterfuck that is C and C++ parsing.

People want IDEs. They also want their IDE to do smart things. Having to compile the universe to get the IDE to do something smart doesn't work very well.

Take a "simple" task like "Is this a variable name? Is that a type definition?". C and C++ have to practically do a full compile to figure that out (see: typedef and templates). Rust, on the other hand, just has to read a few tokens and it knows the answer definitively.

And, you may call this stuff "quirky", but practically all the languages designed in the last 10 years have converged to similar goals and syntaxes to Rust. Language designers know that people don't like "different from C" but they do it anyway.

Think about whether your arguments are stronger than those who are designing new programming languages that they want adopted.


Intellisense and the rest of Visual Studio work perfectly with C#, which is very much C style.


Wacky, quirky stuff?


    fn sum_odd_numbers(up_to: u32) -> u32 {
why fn? why switch the order around? fn and u32 and all that as if keystrokes are such a scarce resource, yet u32 f(u32 a) saves so much more than fn f(a: u32) -> u32.
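For reference, here is the quoted signature filled out; the body is my guess at the intent (sum of the odd numbers up to the bound), not anything from the thread.

```rust
// Sum the odd numbers in 1..=up_to, e.g. sum_odd_numbers(10) covers
// 1 + 3 + 5 + 7 + 9.
fn sum_odd_numbers(up_to: u32) -> u32 {
    (1..=up_to).filter(|n| n % 2 == 1).sum()
}

fn main() {
    assert_eq!(sum_odd_numbers(10), 25);
    assert_eq!(sum_odd_numbers(0), 0);
}
```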

Closures use pipes instead of any of the existing syntaxes we're used to.

Why couldn't they just use the familiar concept of classes instead of whatever is going on with their strange NIH traits? NIH describes the whole language, every little thing has to be different somehow.

C with Rust's safety guarantees, OOP style classes, and a few more bells and whistles could have taken over the world a lot faster, but instead we have something with a high friction to learning that will be adopted much more slowly by either programmers with a fresh start to whom every language is equally weird, or the relative minority of experienced programmers with the will and free time to push past the friction.


Rust didn't invent any of these syntaxes. Pascal was invented around the same time as C and had functions that start with "function" and types that use the "binding: type" notation. And lots of languages (Ada and ML come to mind) followed in Pascal's footsteps that predate Rust.

It has benefits. It's easier to parse for both humans and machines and it allows for easier type inference.

>Closures use pipes instead of any of the existing syntaxes we're used to.

Closures in Ruby use pipes. It's a common syntax.

>Why couldn't they just use the familiar concept of classes instead of whatever is going on with their strange NIH traits? NIH describes the whole language, every little thing has to be different somehow.

Needless to say, traits aren't NIH either, and there's good reasons for avoiding class-style polymorphism in a language like Rust.

Here's some reading to help explain. https://stevedonovan.github.io/rust-gentle-intro/object-orie...

It also happens to include this choice quote:

>I once attended a Java user group meeting where James Gosling (Java's inventor) was the featured speaker. During the memorable Q&A session, someone asked him: "If you could do Java over again, what would you change?" "I'd leave out classes," he replied. After the laughter died down, he explained that the real problem wasn't classes per se, but rather implementation inheritance (the extends relationship). Interface inheritance (the implements relationship) is preferable. You should avoid implementation inheritance whenever possible

And that's basically what Rust gives you
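A sketch of what "interface inheritance without implementation inheritance" looks like in Rust; the names here (`Shape`, `Rect`, `Circle`, `total_area`) are illustrative, not from any real API. The trait plays the role of `implements`, and there is no `extends` to misuse:

```rust
// The interface: any type can opt in by implementing the trait.
trait Shape {
    fn area(&self) -> f64;
}

struct Rect { w: f64, h: f64 }
struct Circle { r: f64 }

impl Shape for Rect {
    fn area(&self) -> f64 { self.w * self.h }
}

impl Shape for Circle {
    fn area(&self) -> f64 { std::f64::consts::PI * self.r * self.r }
}

// Dynamic dispatch over the trait, no base class required.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let shapes: Vec<Box<dyn Shape>> = vec![
        Box::new(Rect { w: 2.0, h: 3.0 }),
        Box::new(Circle { r: 1.0 }),
    ];
    let expected = 6.0 + std::f64::consts::PI;
    assert!((total_area(&shapes) - expected).abs() < 1e-9);
}
```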


And clever use of blanket implementations makes traits even nicer for this purpose.

For example, suppose you just invented the type Qux and you always know how turn any Baz into a Qux.

Rust's standard library provides four interesting conversion traits that may be applicable for different purposes, From, Into, TryFrom, and TryInto. Oh no, implementing all of them sounds like a lot of work and who knows which anybody would need?

Just implement From<Baz> for Qux

A blanket implementation of Into<U> for T when From<T> for U exists gets you Into; then a blanket implementation of TryFrom<T> for U when Into<U> for T exists gives you TryFrom; and finally a blanket implementation of TryInto<U> for T when TryFrom<T> for U exists gets you TryInto as well.

So you only wrote one implementation but anybody who needs any of these four conversions gets the one they needed.
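Concretely, with placeholder types standing in for Baz and Qux (the tuple structs below are mine, for illustration):

```rust
#[derive(Debug, PartialEq)]
struct Baz(i32);
#[derive(Debug, PartialEq)]
struct Qux(i32);

// The only implementation you write:
impl From<Baz> for Qux {
    fn from(b: Baz) -> Qux { Qux(b.0) }
}

fn main() {
    // From, written by us:
    assert_eq!(Qux::from(Baz(1)), Qux(1));
    // Into, via the standard library's blanket impl:
    let q: Qux = Baz(2).into();
    assert_eq!(q, Qux(2));
    // TryFrom/TryInto, also via blanket impls (the error is Infallible):
    use std::convert::{TryFrom, TryInto};
    assert_eq!(Qux::try_from(Baz(3)).unwrap(), Qux(3));
    let q: Qux = Baz(4).try_into().unwrap();
    assert_eq!(q, Qux(4));
}
```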

Another more obvious example is the blanket implementation of IntoIterator for any Iterator. This makes it easy when you want to be passed a bunch of Stuff you're going to iterate over, just give a trait bound on your parameter saying it has to be IntoIterator for Stuff. You needn't care if the user of your function passes you an array, a container type, or an iterator, since all of them are IntoIterator.

It would have been so easy to overlook this, and end up with a language where people find themselves collecting up iterators into containers over and over to make more iterators, or else constantly turning containers into iterators to call functions.
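The IntoIterator pattern described above, sketched out (`total` is a made-up example function): one trait bound accepts arrays, containers, and iterators alike.

```rust
// Accept "anything iterable over i32": arrays, Vecs, and iterators
// all satisfy the IntoIterator bound.
fn total<I>(items: I) -> i32
where
    I: IntoIterator<Item = i32>,
{
    items.into_iter().sum()
}

fn main() {
    assert_eq!(total(vec![1, 2, 3]), 6);           // a container
    assert_eq!(total([4, 5, 6]), 15);              // an array
    assert_eq!(total((1..=4).map(|x| x * 2)), 20); // an iterator
}
```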


That's just enable_if from C++, not some amazing new Rust invention.


First up let's be absolutely clear - there aren't many (any?) "amazing new Rust inventions". Rust isn't a proof of concept, it's the mass market product and so it just stole clever ideas wholesale from languages we don't care about, and in some cases haven't even heard of. The various papers saying C++ should do things that Rust did aren't because Rust invented those things but because it has popularized them and now people writing C++ papers want them.

It is true that SFINAE means you can write very general templates in C++ and just not worry about whether they compile with any particular parameters, which is superficially similar to a blanket implementation of a Rust trait. But, because Rust's traits have semantics not just syntax the actual effect is different.

And that makes the blanket implementations very comfortable in Rust, whereas the equivalent pile of enable_if templates in C++ would be trouble.

Remember, C++'s Concepts come with template language that is explicitly labelled as having unenforced semantic value. It's for documentation: a human reading them can see, for example, that you expect this parameter type to have full equivalence. However, the compiler only cares about syntax, and the syntax just says the type has an equals operator.

This is a great contrast with Rust, whose traits have semantics and so core::cmp::Eq and core::cmp::PartialEq are different traits and the compiler cares which one you asked for even though the syntax is the same.


> But, because Rust's traits have semantics not just syntax the actual effect is different.

Don't concepts give C++ the same semantic power? What am I missing? Concepts give C++ generic code exactly the same nominative constraints you're describing, yes?

Besides: even without concepts, you can use tag types and traits to similar effect. Not everything needs to be duck-typed, even in C++17!

One of my main problems with using Rust instead of C++ is losing metaprogramming power. I really like the template system.


> Don't concepts give C++ the same semantic power?

Not really. But, it's on purpose, and, in their context it makes sense. These aren't idiots, they know there are however many bajillion lines of C++ code and if Concepts doesn't work with that code it's useless.

In Rust, you have to explicitly write implementations of traits. If the author of stupid::Bicycle doesn't implement nasty::Smells and the author of nasty::Smells didn't implement it for stupid::Bicycle either, then a stupid::Bicycle just isn't nasty::Smells.

Even "marker traits" where the implementation is empty, are still explicit. If you have a type that implements Clone but you didn't say it is Copy, then it isn't Copy, even though maybe the type would qualify as Copy if you had asked. Rust will always move this type, because you didn't say copying it was OK.
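A tiny illustration of that explicitness (the `Token` type is mine): it could trivially be Copy, but because only Clone is derived, Rust moves it, and every duplicate requires a visible `.clone()` call.

```rust
// Clone but deliberately NOT Copy: duplication must be explicit.
#[derive(Clone, Debug, PartialEq)]
struct Token(i32);

fn consume(t: Token) -> i32 { t.0 }

fn main() {
    let t = Token(7);
    let copy = t.clone();      // explicit duplication
    assert_eq!(consume(t), 7); // `t` is moved into consume here...
    // ...so using `t` again would not compile; `copy` is still valid:
    assert_eq!(copy, Token(7));
}
```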

This is where the semantic strength comes from, and it was fine in Rust because Rust "always" did this. There isn't a bajillion lines of Rust code that doesn't know about traits, and so it's OK to require this.

In C++ they realistically couldn't do that. So, next best thing is to say that a Concept is modelled on syntax. You mentioned duck-typing, and that's what concepts are assumed to be for.

Now, of course you can use tag types, for a home grown Concept it actually might be viable, if you've only got two users and they both know you personally you can just tell them to use the tag type and they'll both add it. But for the C++ standard library Concepts that was not practical. So, that's not how Concepts is being taught or has been used in practice, unlike Rust's traits.


> Closures use pipes instead of any of the existing syntaxes we're used to.

Who is “we”? Pipes are one of the existing closure syntaxes I was used to pre-Rust. And, I mean, I’m probably not alone: lots of current and ex-Rubyists around.


I don't think rust is particularly weird. If you want, you can pretty much write in an imperative style, with objects, etc. It's just got some practical thinking thrown on ontop of that: i.e. make everything immutable by default, use tooling to ensure people aren't holding on to dangling pointers, etc.

Some of these things have tradeoffs, but I think once you get to grips with what the borrow checker is asking of you (doesn't take long), it's a pretty easy language - certainly simpler than C[0], far simpler than C++.

[0]: Non-buggy C, that is. I wrote a pretty basic program in C the other day (I'm not a C programmer) and I literally spent 80% of the time looking at output from -fsanitize=address to catch stupid off-by-one errors.


None of this stuff is NIH, it's just from a different lineage than C.


Tell me you have basically no understanding of the subject matter without telling me you have basically no understanding of the subject matter.


Interesting perspective. FWIW, a lot of modern languages now put the type on the right for a good reason: it's a lot easier to parse. Plus, for a reader, at least in Rust, wherever you see a `:` you know that a type is coming after. For the other stuff, I don't know, I got used to it pretty quickly. Not that much of a big deal in my opinion.


The "symbol: Type" notation is the "scientific" one for programming languages since "forever".

It's not that "modern" languages are doing something new. Everybody was doing it like that, besides the "C language family". They're the outlier, not the other way around.


Thanks for educating me. I am affected by recency bias


To answer the first paragraph (as I understand it, i'm no expert here):

Parsing! It is faster to parse and probably uses less memory too. This is not only important for compile time but also for editors with code suggestions.

C has a relatively long compile time. If you look at golang, which has a goal of fast compile times, you will see similar syntax.

PS.

Note that Ken is one of the fathers of both C and golang.


> Parsing speed

Parsing is ridiculously fast nowadays. Compilers spend far more time in phases after AST construction (like register allocation, SSA transforms, and various vectorization passes) than they do constructing their ASTs. Parsing speed is a nonsense argument these days. It's the same level of rigor as painting a car red to make it go faster.

> If you look at golang, which has a goal of a fast compile time, you will see similar syntax.

Rust critics: "Some of Rust's language choices are clearly inspired by Go and may not have been good ideas"

Rust people: "We didn't copy Go! We independently through a rigorous analysis arrived at our syntax!"

Also Rust people: "If you look at golang, which has a goal of a fast compile time, you will see similar syntax."

> Note that Ken is one of the fathers of both C and golang.

Creating one of these things makes Ken imperfect. Creating both of these things makes Ken a menace.


I have extensive experience in OOP. Classes are a bad way to achieve polymorphism, and their popularity is an indictment on our profession.


There are good reasons for it, and I'm certain you will gain a better understanding as you gain more experience as a developer.


> There are good reasons for it, and I'm certain you will gain a better understanding as you gain more experience as a developer.

The people pointing out problems with Rust have more experience as developers in their little fingers than Rust's internet fan club has in their whole bodies combined.

In a mature, intelligent discussion, if you have a point, you make it. On HN, if you have a point, you say "there are reasons; I'm not going to give them; also, you're a newbie".

It's comments like yours that make HN unbearable nowadays.


I agree that Rust differs needlessly from previous curly-brace languages. Rust could have achieved its core goals --- memory safety and robust concurrency --- as a dialect of C++ instead of as a whole new language with the latest fashion sense (*cough* Go *cough*) thrown in.

It's also a shame that your comment is being downvoted into oblivion. I remember when HN was for discussion, not for karma-enforced orthodoxy.


> could have achieved its core goals --- memory safety and robust concurrency --- as a dialect of C++ instead

The C++ Standard committee disagrees with you - and not for lack of trying, either. The C++ Core Guidelines are the best they could come up with, and these are not rigorously enforced, unlike the safe subset of Rust.


Are you talking about [1]? All that paper says is that you can't add a borrow checker to C++ as a library feature. You can add such a thing as a language feature.

I hate that paper. People misrepresent it all the time.

[1] https://docs.google.com/document/d/e/2PACX-1vSt2VB1zQAJ6JDMa...



What do you mean? The core guidelines weren't an attempt to add lifetimes to C++.


Several aspects of it pertain to lifetimes https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines...


Nevertheless, the core guideline were not an attempt to add lifetimes to C++ at a syntactic level. AFAIK, that's never been tried, and I think the effort would be a valuable complement to Rust.


> Given that DEC Alpha famously had difficulty with RCU, it is only reasonable to ask how Rust will do with it.

> Including some wild speculation about how Rust's ownership model might be generalized

I recommend starting with the latests post in the series and the conclusions(https://paulmck.livejournal.com/64209.html and https://paulmck.livejournal.com/65056.html).

From the snippets at the beginning, it looks like the author documented how they started learning about how to solve these problems in Rust, and it does not take too long until the comment section starts pointing the author at crossbeam-rs, GhostCell, etc. which only start appearing in the series in the last couple of sections.

Learning Rust by implementing a linked list is pretty hard, but "Learning Rust by implementing RCU" has to be, by far, the hardest way to learn Rust that I know of. Kudos to the author for pushing through this so quickly and thoroughly. The posts are all excellent.

I wish the author would go a little bit more into crossbeam-rs. One thing the author does not consider is that a safe Rust interface to RCU is essentially a "proof of correctness" for RCU, i.e., it proves that using RCU via that interface cannot introduce undefined behavior. (Or, more technically, the contracts on the correctness of the unsafe code within the safe wrapper provide a potentially incorrect "proof" of this.)

AFAIK no such correctness proof for the C APIs exists (there are a couple of linters that catch some issues, but there is no proof saying that if you stick to those warnings your code is UB-free).

The author seems to be assuming that the C APIs to RCU have been proved correct, which AFAIK is incorrect, and by ignoring the safe Rust APIs to RCU, the author is missing out on potentially finding issues on the correctness of the C APIs that Rust would discover.

Basically, the question "Can we expose Linux RCU APIs to Rust?" is definitely worth asking, but the question "Can Rust prove Linux RCU APIs correct?" is much more interesting. If it can't, then why not? The fact that we have such proofs for other RCU and hazard-pointer APIs in Rust shows that this is at least possible in general. And if Linux RCU can't be exposed in safe Rust, precisely understanding why would definitely be illuminating (Can Linux RCU be proven correct outside Rust, but not in Rust? Or can the Linux RCU APIs actually be proven incorrect by Rust?).
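To make the "safe API as proof obligation" idea concrete, here is a deliberately simplified, hypothetical sketch — not Linux's API, and with grace periods and reclamation omitted entirely (old versions are simply leaked) — of how a read-side reference can be confined by a borrow so that safe code cannot keep it past the critical section:

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

// Toy stand-in for an RCU-protected pointer. Real RCU needs grace
// periods and deferred reclamation; this sketch leaks old values
// instead, so it only illustrates the *lifetime* discipline.
struct Rcu<T> {
    ptr: AtomicPtr<T>,
}

impl<T> Rcu<T> {
    fn new(value: T) -> Self {
        Rcu { ptr: AtomicPtr::new(Box::into_raw(Box::new(value))) }
    }

    // The returned reference borrows `self`, so safe code cannot keep
    // it alive past the borrow, which models the read-side critical
    // section.
    fn read(&self) -> &T {
        // SAFETY: the pointer came from Box::into_raw and, in this
        // toy, is never freed while `self` is alive.
        unsafe { &*self.ptr.load(Ordering::Acquire) }
    }

    fn update(&self, value: T) {
        // Publish a new version; the old allocation is leaked here,
        // standing in for "freed after a grace period".
        self.ptr.store(Box::into_raw(Box::new(value)), Ordering::Release);
    }
}

fn main() {
    let cell = Rcu::new(1);
    assert_eq!(*cell.read(), 1);
    cell.update(2);
    assert_eq!(*cell.read(), 2);
}
```

A real wrapper would have to encode the "no sleeping inside the critical section" rule and the reclamation contract, which is exactly where the interesting proof obligations live.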


> that a safe Rust interface to RCU is essentially a "proof of correctness" for RCU, i.e., it proves that using RCU via that interface cannot introduce undefined behavior. (Or maybe more technically, the contracts on the correctness of unsafe code within the safe wrapper provide a potentially incorrect "proof" of this).

Adding an unsafe block around a call to C code doesn't prove anything. Rust does not provide proofs of the code you write in it, except if you do not use unsafe at all.

RCU is an API and a general technique. It can be implemented a ton of different ways (including, as the author noted, as a single instruction if Linux is built with pre-emption disabled). You must not conflate the soundness of an API with the soundness of its implementation. 'Can Rust prove Linux RCU APIs correct?' is nearly completely without meaning.

A similarly meaningless project would be to prove that the idea of malloc/free is correct. There's nothing that could even be incorrect or correct about it, and you don't need Rust wrappers to tell you that. Now prove glibc's malloc is correct; very different question.


> Adding an unsafe block around a call to C code doesn't prove anything.

I didn't say otherwise.

What I said is that writing unsafe code is writing a proof that the code is correct.

If the proof is incorrect, the behavior is undefined, and Rust makes no guarantees.

unsafe literally means "I've proven this code correct".

As mentioned, most uses of unsafe in the standard library are accompanied by a comment that documents the proof.
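For illustration, here is a hedged sketch of that std convention (the function and its names are invented for the example): the `# Safety` section states the contract, and the caller's `// SAFETY:` comment is the informal "proof" that the contract holds at this call site:

```rust
/// Returns the first element without a bounds check.
///
/// # Safety
/// `xs` must be non-empty.
unsafe fn first_unchecked(xs: &[i32]) -> i32 {
    // The caller has promised `xs.len() >= 1`, so the pointer is
    // valid, aligned, and points to an initialized i32.
    *xs.as_ptr()
}

fn main() {
    let xs = [10, 20, 30];
    // SAFETY: `xs` has three elements, so it is non-empty.
    let first = unsafe { first_unchecked(&xs) };
    assert_eq!(first, 10);
}
```

Nothing checks that the comment is true — which is exactly the dispute in this thread about whether it deserves the word "proof".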


`unsafe { ... }` means "I hope you are convinced by my arguments in the surrounding comments, or better still in my POPL21 paper, that this is correct". Other than that, not much. The keyword itself has essentially zero relationship with the existence of a formal proof. You are talking about proof in a very loose sense.

The narrower point I think you're making is that hopefully attempting to create a safe API around the RCU pattern makes someone think about RCU's correctness even harder. I'm not sure it will produce much, given how much attention RCU has already received in the past 20 years. Any innovation you get from that process will probably be concentrated in describing/encoding the ownership pattern in Rust terms, since that's non-obvious. But the epoch GC implemented in Crossbeam is pretty similar overall to RCU, so it has much to build on in that regard.


> `unsafe { ... }` means "I hope you are convinced by my arguments in the surrounding comments, or better still in my POPL21 paper, that this is correct".

I disagree here (and that's ok).

The unsafe keyword tells Rust "this code is sound", i.e., there exist no program execution for which this code will cause safe Rust to exhibit undefined behavior.

Rust will parse it, type check it, compile it, link it, etc. "as-if" it was sound.

And Rust only makes any guarantees about what your program does if the unsafe code was actually sound.

From the point of view of the compiler, the unsafe block itself is a user-provided proof that whatever that code does is sound.

If you write: unsafe { *ptr } that's a proof that dereferencing the pointer is sound, therefore the pointer is non-null, it points to an aligned allocation able to hold a type, the value at that memory location is a value of that type (i.e. the pointer is dereferenceable), etc.

The Rust compiler accepts your proof and generates code as if your proof is correct. In particular, it will compile code around that code under that assumption, it will eliminate null pointer checks after that code, etc.

If the proof that an unsafe block provides is incorrect, then the behavior is undefined, and Rust makes no guarantees.

I agree with you that given some unsafe code, actually proving it correct is not something that the compiler does, and that creating a tool that does this is hard. If it were easy, the compiler would already do it, and we wouldn't need unsafe at all.

Unsafe exists because there are things that the compiler can't prove correct, and this escape hatch allows the user to provide a proof that the compiler can integrate into how it compiles the program.

There is a whole repository of Iris proofs for unsafe blocks in the Rust standard library, for which we have formal proofs. But these are not integrated in any way with the Rust code one writes. They have been written by humans aiming to get a confirmation that the proof expressed by the unsafe blocks are indeed correct. Some were not, and had to be fixed, and pretty much every fix has generated a paper.


I think you are not using 'proof' in the way that most people would understand it. A proof is usually considered a series of steps someone can take to verify that something is true themselves (i.e. there is no need to trust that whoever produced the proof isn't malicious or mistaken about what they are trying to prove; if you can follow the proof you can see for yourself). Proving something to a compiler therefore is generally understood to mean giving the compiler something it can actually check is sound (even if it could not produce the proof on its own). This is effectively how type checking and borrow checking work (your program is a kind of proof the compiler verifies). 'unsafe' in this case is not a proof, it is more like a promise (not in the async sense) or assertion (not in the debug sense). The compiler trusts that you have not made a mistake; it does not have any way to verify that you haven't. You may have a proof to the person reading the code that there is no mistake (or you may not, either because your proof is incorrect or the person cannot understand the proof), but it is not a proof to the compiler.

(In fact, there was much bikeshedding at one point about renaming the 'unsafe' keyword to make this explicit: suggestions included 'assumed_safe', 'trustme', 'trusted', as well as less serious ones like 'yolo' and 'hold_my_beer_and_watch_this': https://github.com/rust-lang/rfcs/pull/117)


> I think you are not using 'proof' in the way that most people would understand it.

I have corrected many math exams of students over the years.

A "significant" amount of the "proofs" provided by the students in those exams were wrong in one way or another.

Many students continued the exams under the assumption that what they just incorrectly proved, was actually correct.

They had no way to verify whether their proofs were correct or not. No theorem provers were allowed in the exam.

Yet all of them called what they wrote "a proof".

---

This is pretty similar to how Rust works.

Without unsafe, Rust proves your programs sound for you.

Unsafe lets you supply your own soundness proofs for parts of the program that the compiler does not know how to prove (or disprove).

If the proofs are not correct, Rust makes no guarantees.

The claim that because "Rust does not prove these for you, they are not proofs" is unfair.

That's like claiming that any "proof" that has not been proved in a theorem prover does not deserve to be called "a proof".

I respect your way to see this. I just don't see it this way.


> I agree [...] actually proving it correct is not something that the compiler does

Yes. Unsafe blocks are bare assertions, no more. Everything you described about what happens in Rustc when you use it is better described as making a bare assertion to the compiler. Everything beyond that is socially constructed. Saying it's a proof sounds even more silly when you recall that Rust programmers treat it as quite the opposite -- radioactive markers of places where the program might cause UB.

You were trying to characterise it as a proof because you're claiming that "Can Rust prove Linux RCU APIs correct?" is an interesting question. But you are actually asking "Will Rust compile RCU code if we assert to it that using the RCU API is safe?", and the answer is unconditionally yes. It is not an interesting question at all. Your misuse of the word proof led you there, so I am suggesting you stop calling the unsafe keyword a proof.

In the broader scheme of things, yes, the way in which Rust encourages you to isolate unsafety is helpful, and so if you wanted something formally verified Rust is a good language for that. That doesn't really translate to RCU, because the hypothetical Rust API would not be an implementation of RCU. All the Iris proofs that have come out of the RustBelt project are bad examples here, because they were actual implementations of things, not APIs around black box implementations that change under your feet when you set kconfigs.

As a final thought, here is a supposedly safe-encapsulated Rust API for RCU, I would invite anyone to suggest how writing down this very simple API or similar + a few calls to extern "C" wrappers for the RCU macros could possibly help "prove Linux RCU APIs correct", whatever that means: https://docs.rs/rcu-clean/0.1.6/rcu_clean/struct.ArcRcu.html

(Little edit: if you're interested in what circumstances will lead Rust wrappers of C code to elucidate safety, look into rlua, a project to create a safe wrapper for Lua (https://github.com/amethyst/rlua). Arguably the main outcome is that we now know that the Lua API cannot be wrapped safely with zero cost, in no small part due to the use of longjmp. Some of RCU's APIs require you do not sleep or block in any way while using it, lest you wake up with the GC train having left the station. For this reason I imagine you won't be able to encode all the usage requirements in the Rust type system, let alone make an actual safe zero-cost API, let alone turn that into a proof of anything.)


What makes you think this kind of proof work hasn’t already been done for this core C code in the kernel?


I've looked at the literature every couple of years, and to the best of my knowledge, there is no paper proving that a subset of the Linux kernel RCU APIs is sound.

If you are aware of one, I am genuinely interested in this topic.


Note: If someone wants to learn Rust by implementing linked lists, then https://rust-unofficial.github.io/too-many-lists/ is a must-read :)


Arguably, that's just too hardcore.

You can implement doubly-linked lists in safe Rust by using Rc and Weak, Option, and RefCell (or Arc and Mutex for a safe concurrent doubly-linked list).

Learning about safe Rust is gentler and more useful for a beginner, than starting to learn Rust by learning about "unsafe code".

You have to know what safe Rust allows so that you can appropriately know the contract to which unsafe Rust must adhere.
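A minimal sketch of that safe approach (names are illustrative, not a full list API): the forward links are strong `Rc`s and the back links are `Weak`, so the two directions never form a reference cycle and nothing leaks:

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

type Link<T> = Rc<RefCell<Node<T>>>;

// `next` is a strong reference, `prev` a weak one, so the
// forward/backward links never form an Rc cycle.
struct Node<T> {
    value: T,
    next: Option<Link<T>>,
    prev: Option<Weak<RefCell<Node<T>>>>,
}

fn node<T>(value: T) -> Link<T> {
    Rc::new(RefCell::new(Node { value, next: None, prev: None }))
}

fn main() {
    let first = node(1);
    let second = node(2);

    // Link the two nodes in both directions.
    first.borrow_mut().next = Some(Rc::clone(&second));
    second.borrow_mut().prev = Some(Rc::downgrade(&first));

    // Walk forward, then back through the weak pointer.
    assert_eq!(first.borrow().next.as_ref().unwrap().borrow().value, 2);
    let back = second.borrow().prev.as_ref().unwrap().upgrade().unwrap();
    assert_eq!(back.borrow().value, 1);
}
```

Swapping `Rc`/`RefCell` for `Arc`/`Mutex` gives the concurrent variant with the same shape.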


> You can implement doubly-linked lists in safe Rust by using Rc and Weak, Option, and RefCell (or Arc and Mutex for a safe concurrent doubly-linked list).

You can, but not without introducing runtime overhead relative to C and C++, e.g., by forcing reference-counting where none would be necessary in C.

Rust's safety checks do in fact block some safe and zero-overhead abstractions familiar to people working in C and C++, and denying that isn't helpful. Stating that these techniques can be implemented in Rust with overhead is missing the point.

Arguing that the added overhead is "minimal" (which is a meaningless word, since the word "minimal" just means "I don't care about your scenario") is still missing the point: it's overhead that's not present in unsafe languages.

The reason people use languages like C++ and Rust is to get zero cost abstractions and explicit control over the machine. If you want good performance but don't care about precise control over the machine, write Java or C# or something high level like that. You'll be just as safe as you would be in Rust and more productive.

I'm really tired of this motte and bailey stuff. Rust proponents say Rust gives you fine grained machine control and safety with no performance compromises, and then when you point out that the language doesn't quite live up to that promise, Rust proponents start telling you that you didn't really need that performance anyway. This argumentative tic is annoying.


> You can, but not without introducing runtime overhead relative to C and C++, e.g., by forcing reference-counting where none would be necessary in C.

The runtime overhead is minimal. 96% of it comes from the pointer indirection in the linked list, and this you have either way.

Last time I benchmarked, the checks made the list 4% slower, which is acceptable for many, since doubly-linked lists are already very slow.

This 4% is buying you a correctness proof that your list API cannot introduce undefined behavior. If you write this doubly-linked list abstraction in safe Rust, and make a mistake, the resulting code will still not have undefined behavior.

C and C++, even paying this 4% performance cost, do not provide this guarantee, which is valuable to many.

Unsafe code that removes this 4% runtime overhead is an optimization.

The fact that you can remove this overhead by providing a safe wrapper over unsafe code is a selling point of Rust, but learning how to write such optimized code without learning first what it takes to make it correct IMO completely misses the point of Rust.

It is very easy to write broken safe Rust abstractions over broken unsafe code. At that point, you are in a worse place than C or C++, since you are paying many costs for introducing Rust in a project, but leaving the most valuable feature off the table (correctness, lack of segfaults, lack of data-races, etc.).


Reference counting also adds per-element space overhead.

Also, linked lists are often [1] used in C and C++ for intrusive chaining of nodes otherwise owned by other data structures, and the reference counting would either impose overhead on those other data structures or be pointless.

[1] In fact I would say this is very common, at least in C++, as linked lists make otherwise very poor data structures by themselves.


I know. So? How does that change anything?

A doubly-linked list is already space inefficient, it adds 2 pointers per element.

Using Rc + Weak adds two extra words per element to store the strong and weak refcounts. So for a 64-bit type, a Vec takes 1×N space, a plain list 3×N, and a ref-counted list 5×N.

Performance-wise, the extra space doesn't change anything, since the counts are allocated with the element in the same cache line and would have been read anyway (even if you don't access them, the hardware does).

If you are using a doubly-linked list, you really don't care that much about any of this, and the only thing that makes a real difference, is the guarantee that your list implementation is correct.


The additional refcount takes space that could be used for the payload that now either needs to spill into the next cacheline or needs to be compressed further.

You can still use double linked lists in high performance code, as long as the list traversal doesn't happen on the critical path (or at all).


> The additional refcount takes space that could be used for the payload that now either needs to spill into the next cacheline or needs to be compressed further.

It could, but since the cache lines are adjacent the prefetcher will pull subsequent cache lines anyway, and since the objects typically used in doubly-linked lists are large (much larger than 64 bits), this doesn't matter in any practical application that I am aware of. Our 4% figure is the worst case we found; usually it's 1% or less.

If you have a real-world benchmark that shows otherwise, can you please share it?

AFAIK there does not exist any hardware for which the performance model for:

- non-intrusive doubly-linked lists,

- typical object sizes used on these lists (>> 64-bit)

- typical operations done on these lists (splice, etc.)

would predict a performance difference, and in our case, having replaced unsafe lists with safe lists in many large applications, we were never able to measure a difference in application performance, only in the synthetic worst-case micro-benchmarks.

So I am truly interested to learn about your use case.


You're ignoring the case of a list being densely packed and embedded inside another data structure. If you bloat a data structure with a useless machine word, there are all sorts of ways for that machine word to hurt the resulting program, and you can't just handwave that away.

The Linux kernel is made by people who spend weeks to shave one bit off the size of a data structure. Do you really think they'd respond positively to your saying they don't need to do that?

You really need to stop assuming what kind of computers other people are programming for.

The larger point here is you don't get to decide whether other people's use cases are legitimate. Either Rust gives you complete control of the machine or it does not. It does not, not in safe mode, but Rust people keep doing this annoying motte and bailey thing about it.


> You're ignoring the case of a list being densely packed and embedded inside another data structure.

No, I am not. This conversation is exclusively about non-intrusive doubly-linked lists.

If you want to have a conversation about intrusive doubly-linked lists we can have it, but it is a very different data-structure.

> You really need to stop assuming what kind of computers other people are programming

Every week I touch x86, arm, ppc64, riscv and gpu assembly.

I write code for apps daily that run from 16-bit micro controllers to the largest super computers in the world, going through phones, desktops, cloud, etc.

I am not assuming what others are programming, but rather talking about what I program for, which include most hardware that anyone can buy today, and quite a bit of hardware that almost no one can even buy, as well as hardware that's not for sale.


This conversation started with folks talking about where to begin learning Rust. Beginning with unsafe{} implementations of a doubly-linked list to save 1-4% of the performance isn't where I would recommend anyone start.

If they do need to shave a single bit off, they can do that, that's a neat feature too.


If you are concerned about high performance/space overhead/cache awareness you wouldn't want to use a linked list in the first place, instead you'd use an arena.


An arena is not a replacement for linked lists. In some cases you can use it as a replacement, but in many cases you can't. So can the idiomatic Rust apologists please stop with this? I know you love the language and want more people to use it, but this sort of statement doesn't help your case.

"Performance is almost the same in real world cases"

"You don't really need that data structure anyway"

"You can learn how to optimize code later"

People come and want to use Rust for some fast code. They know what they want the code to do and want it to be as fast as the code they wrote before, and see how Rust would help them achieve that. The above statements are then wrong or patronizing or completely misses the point.


> The above statements are then wrong or patronizing or completely misses the point.

I think what's actually going on is that longtime Rust programmers get frustrated by the sheer volume of people who have never used Rust but dismiss it because they hypothesize that they will need to regularly write things like linked lists in Rust, just because those are the easiest data structures to write in C.

When in reality, the number of high-performance data structures in the standard library and crate ecosystem means it's extremely unlikely they'll even need to write any data structures in the first place. Programming in Rust and in C or C++ are very different experiences in numerous ways, and it is unproductive to expect otherwise.


But C++ programmers have ergonomic vectors in the standard library. There is no reason to implement data structures there unless you need them for performance reasons.

And correct use of linked lists is not uncommon at all. Binary trees are linked structures, for example: would you implement a binary tree as a vector pointing to vectors rather than as nodes containing pointers to other nodes?


Yes, I would avoid implementing my own binary tree using linked lists (unless it was just an educational exercise). But also yes, if I did decide that I had a production use case for creating my own binary tree implementation, I would be very likely to look for approaches that use contiguous rather than linked data structures.

I think this perspective you're espousing is "sometimes you just want to knock out a simple bespoke binary tree implementation", but I just don't think that use case makes sense. Either you want to use one from a good library or you have a really good reason to roll your own and it is worth the effort to do it well.


> I think this perspective you're espousing is "sometimes you just want to knock out a simple bespoke binary tree implementation", but I just don't think that use case makes sense. Either you want to use one from a good library or you have a really good reason to roll your own and it is worth the effort to do it well.

There is no library for generic tree data structures, you have to implement those yourself. There are libraries implementing a table interface using ordered binary trees but that is just one of many use cases.


Coincidentally I was just watching this talk from Bryan Cantrill: https://youtu.be/vvZA9n3e5pc. Around the 58 minute mark he shares an anecdote that is relevant to this thread. It boils down to him doing something where he would have written a simple bespoke balanced binary tree in C, but when writing a reimplementation in Rust used a "naive" library approach and was surprised to see a 35% speedup, which boiled down to the library using a btree instead of a balanced binary tree.

This doesn't invalidate your point about there being use cases for binary trees that may necessitate you to write your own, I take you at your word that you've often had strong use cases for this, but I do think it illustrates something I believe to be true, which is that people reach for linked data structures too often, when contiguous ones are a better default.


No, I probably would use the BTreeMap in std instead, assuming I want a search tree. Which is even better, because the primary reason you use binary trees in C is that they are easy to make intrusive. But if I want a priority queue instead there's also BinaryHeap. If I wanted to store data on the branch nodes I might use intrusive_collections. Or maybe if I was dealing with a more complex graph I'd pull in a library like petgraph. If all that fails I could still arena-allocate the nodes to get around the lifetime issues. If I really absolutely wanted to, which I don't see happening any time soon, personally. I'd need a more concrete example there.
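As a sketch of the arena idea mentioned above (illustrative only, using plain `Vec` indices rather than an arena crate): links, including the parent pointer, become indices into the arena, which sidesteps the borrow-checker entirely in safe code:

```rust
// Nodes live in a Vec; links (including the parent pointer) are
// indices into it, so no borrows, Rc, or unsafe are needed.
struct Node {
    value: i32,
    parent: Option<usize>,
    left: Option<usize>,
    right: Option<usize>,
}

struct Tree {
    nodes: Vec<Node>,
}

impl Tree {
    fn new() -> Self {
        Tree { nodes: Vec::new() }
    }

    // Push a node and return its index ("pointer").
    fn add(&mut self, value: i32, parent: Option<usize>) -> usize {
        let id = self.nodes.len();
        self.nodes.push(Node { value, parent, left: None, right: None });
        id
    }
}

fn main() {
    let mut tree = Tree::new();
    let root = tree.add(10, None);
    let l = tree.add(5, Some(root));
    let r = tree.add(15, Some(root));
    tree.nodes[root].left = Some(l);
    tree.nodes[root].right = Some(r);

    // Navigate child -> parent in O(1) via the stored index.
    let p = tree.nodes[l].parent.unwrap();
    assert_eq!(tree.nodes[p].value, 10);
    assert_eq!(tree.nodes[tree.nodes[root].left.unwrap()].value, 5);
    assert_eq!(tree.nodes[tree.nodes[root].right.unwrap()].value, 15);
}
```

The trade-off is that indices aren't checked for staleness the way lifetimes are; deleting nodes needs a free list or generational indices.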


I said binary tree, not "Map interface implemented using an ordered binary tree". They are not the same thing.


I am aware, as you might be able to gather by reading past the first sentence of my comment. Of course if you'd just said what you needed the binary tree for, I could have given you a better answer.

If you don't have a use case in mind, I'd suggest just trying to actually use Rust for something instead of spending time trying to construct problems where Rust would be bad at the solution.


You can't even explain how doing something as simple as a binary tree with a parent pointer is possible in Rust, so my assumption is that it isn't possible and I'd have to make workarounds. I can make workarounds just fine in C++, you don't have to teach me those, but I don't see why I'd learn a language where I'd be forced to make those workarounds rather than use the version that fits my problem best.

It might be possible that you can write nice versions etc.; if so I'd like to see them, but it seems everyone is hell-bent on just saying "You shouldn't do it!".


> You can't even explain how doing something as simple as a binary tree with a parent pointer is possible in Rust, so my assumption is that it isn't possible

A beginner learns this in the first week:

    struct Node<'a> {
        parent: Option<&'a Node<'a>>,
        left: Option<&'a Node<'a>>,
        right: Option<&'a Node<'a>>,
    }
Many people don't know how to read or write, but you don't hear them every day claiming that "therefore it must be impossible", yet here we are.

> but I don't see why I'd learn a language where I'd be forced to make those work arounds rather than use the version that fits my problem best.

> but it seems everyone is hell bent on just saying "You shouldn't do it!".

The claim was and still is "there are infinitely more effective - as in faster - ways to learn Rust than by starting with how to write doubly-linked lists".

This whole thread is you failing hard at third-grade logic and concluding:

- therefore it is impossible to write doubly-linked lists in Rust

- therefore doubly-linked lists in Rust are inefficient

- therefore it is impossible to learn Rust

- ...and many other things...

I've taught Rust to a 9-year-old, but I don't think I could have taught Rust to a 2-year-old, so arguably, I think you are right, in that probably _FOR YOU_ it is impossible to learn Rust: you wouldn't be able to grok how to write doubly-linked lists in Rust properly, you wouldn't be able to master Rust enough to write efficient code in it, etc.

That's ok, don't be frustrated by it, everybody is different.

Just try to stop projecting your limitations onto other people. Particularly in this forum, or when talking with Linux kernel developers, where most people are smart, experienced programmers who would learn Rust well in a day and master it in a week.


The question is fair.

Why would you want to use a non-intrusive doubly-linked list?

AFAIK there is only one answer: you have very big objects and you need O(1) splice.

If you don't have very big objects, or you don't need O(1), pretty much any other data-structure in existence is going to give you much much better performance than a doubly-linked list.

For the only use case for which non-intrusive doubly-linked lists are good at, however, there exists no hardware you can buy for which you can measure a performance difference between ref-counting and not ref-counting.

This has nothing to do with Rust. The same applies to C++, or Java, or Python, or even C. You can do ref counting on any language, and all these languages run pretty much everywhere.

The only thing that Rust gives you over Java, Python, or C here is the possibility to implement a doubly-linked list in safe Rust that will perform the same but that the compiler will prove for you to be memory and thread safe.

Sure, Rust also allows you to use unsafe code, and then prove that safe and create a safe wrapper. It even does this for you and provides it in std::collections::LinkedList. But what value does this add? It's more work, and it doesn't perform better, which is why a significant number of people have argued that adding std::collections::LinkedList was a mistake in one form or another.


> Why would you want to use a non-intrusive doubly-linked list?

Or you have a list of objects with many references to parts inside the list and you want to rearrange those parts from those references at O(1) speed.

Let's say I have pointers to two elements A and B in the list. I now want to say that B should come after A; I can do this easily in O(1) time with no overhead, and all references see this update instantly. Let's say I put integers in these, so the data is embedded in the list. You have many other places referring to elements inside the list, and you want those references to track the objects' positions.

How would you solve this using a Rust compatible data structure?

> The question is fair.

No it is not. Experienced people almost surely know their use cases better than you do.

> there exists no hardware you can buy for which you can measure a performance difference between ref-counting and not ref-counting.

Are you kidding me? Seriously, I don't see how you could believe this unless you never tried to use ref counting to replace other code and compare the performance difference.


> Let's say I have pointers to two elements A and B in the list. I now want to say that B should come after A; I can do this easily in O(1) time with no overhead, and all references see this update instantly.

You can do this with a vector of pointers as well, at a lower space cost (one pointer per element instead of two) and a lower runtime cost (one swap of two pointers instead of updating four).

Such a data structure also has other advantages, like higher cache efficiency, etc.
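A small illustration of that point (hypothetical example): with a `Vec<Box<T>>`, reordering swaps two machine words while the heap-allocated elements themselves stay put, so outstanding pointers to the elements remain valid:

```rust
fn main() {
    // Five heap-allocated elements; the Vec holds only pointers.
    let mut order: Vec<Box<u32>> = (0..5).map(Box::new).collect();

    // O(1): swap two pointers; the boxed values do not move.
    order.swap(1, 3);

    let values: Vec<u32> = order.iter().map(|b| **b).collect();
    assert_eq!(values, vec![0, 3, 2, 1, 4]);
}
```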


You can't move an element in a vector in O(1) time. Swapping isn't the same as moving, you'd have to move n elements to make room for the new one.


You mentioned changing the order of two elements in a list, such that one comes after the other.

With a vector of pointers to the elements (C++ std::vector<T*>) this can be easily done in O(1) as I explained above.

Now that I've proven you wrong, if you want to change the problem to something else, feel free to state your new problem, and I'll proceed to prove you wrong again.


> You mentioned changing the order of two elements in a list, such that one comes after the other.

Putting B after A is not swapping them; it is moving B so that it is right after A. You could technically interpret it as you did, but only a contrarian would do that. If you want more people to support Rust then you should stop being a contrarian. If you want me and others to assume that the Rust community is full of hard-to-work-with people then please continue, but that won't make people more likely to pick up Rust.

If instead of behaving like you do here people would just say "Sure thing, in order to get that performance in Rust you just do X and Y!" I bet people would be way more supportive of Rust. But if working in the language means that people like you will come and argue like this, then why not just write a C library, and then someone will write a Rust wrapper and people are happy? Whereas if they'd written it in unsafe Rust, people would come and complain like hell.


> You could technically interpret it as you did, but only a contrarian would do that.

Or you could have been clearer.

I didn't intend any animosity, but you clearly do.

> If you want more people to support rust

I don't care whether more people support Rust or not. I am just fighting the spectacular amount of disinformation in this thread by malicious actors that have never used Rust.

> Putting B after A is not swapping them, it is putting the element B so it is right after A.

Then you should have written that instead. Take the pointer in the vector right after A, and swap it with B. That way, A is now right before B, and B is now right after A.


> Take the pointer in the vector right after A, and swap it with B. That way, A is now right before B, and B is now right after A.

This is a joke, right? I don't see how it can't be.

But in case it isn't, in almost all problems like this it is assumed that you don't want to change the order of the other elements.

For example, if you wrote a function MoveToDirectlyAfter(a, b), and it did move elements other than a or b people would say that your function contains bugs or that it doesn't do what it says.


In a real life system what is the usual use case for this?

Let's say a user reorders documents with drag and drop. What's the low-level equivalent? Could you help me come up with an algorithm that depends on this?

If write speed is important, I'd just use an in-memory LevelDB-like thing (so a log structured merge tree).

If the data is not much, then I'd optimize for code simplicity.


Before I delve more, are you talking about `rotate` ?

https://en.cppreference.com/w/cpp/algorithm/rotate

Is that what you want?


The objects don't have to be big. They have to be expensive to copy. These are related but not always identical. Objects that refer to things in the real world (or in a complex model) can be either actually non-copyable or just hard to copy.


Linked lists are not good for cache locality, and that has nothing to do with the language. Avoiding them would be my advice even if you were doing it in C. Rust does make doubly linked lists in safe code without reference counting impossible, but I don't buy the argument that you'd use one for performance or lower memory overhead.


> Linked lists are not good for cache locality

That doesn't matter if a solution with good cache locality doesn't exist for your problem.


They are not, that's why you would use a flat layout for your primary iteration order. But for any secondary ordering/indexing you might need, if you also need fast insert/removal, chaining in a list can be an option.


> A doubly-linked list is already space inefficient, it adds 2 pointers per element.

...unless you use the XOR trick to store two pointers in one field?
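For reference, the XOR trick stores `prev ^ next` in a single field; traversal recovers each neighbor by XORing the previous node's address with the stored link. It requires raw pointers in any language (and defeats most debuggers and GCs), so in Rust it is necessarily an unsafe sketch. Everything below is illustrative, not a production list:

```rust
struct Node {
    value: i32,
    link: usize, // XOR of the addresses of the previous and next nodes
}

fn addr(p: *mut Node) -> usize {
    p as usize
}

// Build a three-node XOR list a <-> b <-> c and collect values by a forward walk.
fn build_and_collect() -> Vec<i32> {
    let a = Box::into_raw(Box::new(Node { value: 1, link: 0 }));
    let b = Box::into_raw(Box::new(Node { value: 2, link: 0 }));
    let c = Box::into_raw(Box::new(Node { value: 3, link: 0 }));
    unsafe {
        (*a).link = 0 ^ addr(b);       // prev = null, next = b
        (*b).link = addr(a) ^ addr(c); // prev = a,    next = c
        (*c).link = addr(b) ^ 0;       // prev = b,    next = null
    }

    // Forward traversal: next = prev_addr XOR cur.link
    let mut out = Vec::new();
    let mut prev: usize = 0;
    let mut cur = a;
    while !cur.is_null() {
        unsafe {
            out.push((*cur).value);
            let next = prev ^ (*cur).link;
            prev = addr(cur);
            cur = next as *mut Node;
        }
    }

    // Reclaim the nodes.
    unsafe {
        drop(Box::from_raw(a));
        drop(Box::from_raw(b));
        drop(Box::from_raw(c));
    }
    out
}

fn main() {
    assert_eq!(build_and_collect(), vec![1, 2, 3]);
}
```

The same walk run with `prev` seeded from the tail gives backward iteration, which is the whole point of the trick: two directions for the price of one pointer field.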


> learning how to write such optimized code without learning first what it takes to make it correct IMO completely misses the point of Rust.

This culture is why rust will never replace C++ and why C++ will never replace C. You can write the same computations in C++ as in C, and in Rust as in C++, but it isn't "idiomatic" so people are really afraid to do it because it will get harshly rejected by the community.


The premise that Rust does not intend to replace C and C++ is not true. Rust is designed to mesh with existing C and C++ code bases well, which is why many large projects support writing code in Rust (Chrome, Firefox, Linux, Windows, etc.).

Your claim that you need to learn all low level details first is also not true.

There are ~20 million programmers in the world, and about ~16 million of those are Javascript programmers. Many of them don't know and don't need to know the difference between the stack and the heap.

Many of them regularly optimize their Javascript hotspots by re-writing them in Rust and compiling it to webassembly. And almost all of them benefit from Javascript libraries that do this internally.

Rust is for many of them the ramp up into lower-level systems programming, and this is one of the reasons the Rust project has so many contributors. Rust enables Javascript programmers to actually hack on the Rust compiler, Firefox, etc.

This is something that C and C++ never achieved, and one of the main reasons for the Linux kernel to want to use Rust (they want to attract more junior developers to increase the developer base and make the project more accessible).

Your tone that the large majority of programmers in the world are somehow "doing programming wrong" by learning higher-level and safe languages first, and delving into low-level details as they need to, sounds very elitist and I personally find it disgusting.


> The premise that Rust does not intend to replace C and C++ is not true.

My point was that Rust won't replace C and C++ as long as the Rust community is as it is now. And from your comments it seems like Rust isn't intended to replace C or C++, but to act as a low-level language for people who don't know how to write low-level code, like Javascript programmers.

> Your tone that the large majority of programmers in the world are somehow "doing programming wrong" by learning higher-level and safe languages first, and delving into low-level details as they need to, sounds very elitist and I personally find it disgusting.

I didn't say that everyone has to learn rust by learning how to write the low level parts. I am saying that if someone wants to learn how to write low level parts of rust rather than the safe parts because they are used to writing performance critical bits of code in C or C++ then you shouldn't discourage them from doing that. Please don't put words in my mouth.


It isn't "don't know how to write low level code" in my view, it is more "don't think it is productive to reimplementat basic data structures over and over again". All the stuff you're saying you can't write in idiomatic rust is stuff that has already been written and battle tested in libraries, which are perfectly idiomatic to use. I don't need to write my own linked list and binary tree implementations over and over again. I did that in school, I can go check the code in the libraries I'm using to confirm it makes sense to me, and then just use the library and go about doing what I'm actually trying to do.


> All the stuff you're saying you can't write in idiomatic rust is stuff that has already been written and battle tested in libraries

It is fine for you to not be able to think of any scenarios where the battle-tested standard libraries wouldn't be sufficient. But as someone who wrote low-level libraries running in production at Google, that doesn't apply to me: for me, knowing what it would take to eke out performance in Rust compared to C++ is absolutely necessary, and given how hostile people are, my guess is that the Rust code wouldn't look pretty, so there is no reason for me to even try to pick up Rust.

And you can't say that I am not your intended user; if you chase away people like me from the Rust ecosystem then there is no way that Rust can replace C++ in any reasonable timeframe.

What I'd like to see is a lot of examples and such showing how to write very complex things using unsafe rust. Because without that there is no way rust can compete, there are already plenty of people who know how to write highly performant code using unsafe C++.


You are definitely (one of) the intended audiences of rust, just to dispense with any confusion there.

I don't at all think that there are not "any scenarios where the battle tested standard libraries wouldn't be sufficient". I spent a number of words on exactly that in another comment below. My point in that comment was that the use case I don't believe actually exists in production code is "I'm just gonna knock out a simple bespoke implementation of a binary tree". I'm not saying people don't do that, but they shouldn't. If what you need is simple, there is a good library for it. And if the library for it is not sufficient, then it is important enough to invest time into writing a good implementation.

Maybe here's a way to put it: simple bespoke implementations of linked data structures are easy in both c++ and rust using ref counting, slightly better but low-investment implementations are still pretty easy in c++ but are not really possible in rust, and really good implementations are possible in both languages if the investment is worthwhile for the use case. It is true that you would have to learn how to write such a thing in rust and that it would be non trivial to do so, but that is just part of learning a new programming language to an expert level.

I think one misconception in here is that the rust "wouldn't look so pretty". You could do a line for line port of your c++ code using raw pointers and declare it safe, and the two implementations would have similar aesthetics. If your c++ implementation were memory safe to begin with, your rust implementation will be as well. The same process of thinking through the invariants and implementing them correctly is necessary in both languages.

I think the pushback you get that you find off-putting is against the sometimes expressed idea that it is a big knock against rust that you can't do linked data structures with zero cost abstractions in safe rust. This is why people say either "yeah but you can do minimal cost implementations in safe rust and zero cost implementations in unsafe rust". The pushback is against the idea that this isn't a good enough solution. But it totally is, it's certainly no worse than the situation in c++ where the division between a safe and unsafe subset of the language does not exist. Unsafe rust is no different than generic c++ code.

To your last point, there's a great book / reference called The Rustonomicon that is all about writing unsafe rust code well.


A link to the aforementioned Rustonomicon: https://doc.rust-lang.org/nomicon/

(I seem to recall a comment that it hasn't yet been fully updated for recent versions.)


I just saw this and thought of you, "Learn Rust the Dangerous Way": http://cliffle.com/p/dangerust/

"LRtDW is a series of articles putting Rust features in context for low-level C programmers [...] the sort of people who work on firmware, game engines, OS kernels, and the like. Basically, people like me."

Also, it seems like the author probably also wrote some low level libraries at Google... :)


> If you want good performance but don't care about precise control over the machine, write Java or C# or something high level like that. You'll be just as safe

No, you aren't.

None of these languages catch data-races, so writing multi-threaded code is pretty much as hard and error prone as in C. Other higher-level languages like Python fix this by holding a global mutex, so that using multiple threads doesn't really buy you that much there since threads always execute sequentially to avoid data-races.

This is in contrast to Rust, which allows anybody, even people without experience in low-level programming to accelerate their applications using multiple threads, without introducing bugs.
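As a minimal illustration (the thread count and loop bound here are arbitrary): Rust's `Send`/`Sync` rules force shared mutable state across threads to be synchronized, e.g. with `Arc<Mutex<...>>`; the unsynchronized version simply doesn't compile.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Increment a shared counter from several threads. Dropping the Mutex and
// sharing a bare &mut u32 across the spawned closures is a compile error,
// which is how the data race is ruled out statically.
fn parallel_count(threads: usize, per_thread: u32) -> u32 {
    let counter = Arc::new(Mutex::new(0u32));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    *c.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    assert_eq!(parallel_count(4, 1000), 4000);
}
```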

Actually, Mozilla tried to multi-thread Firefox multiple times using C++, and failed, over and over again, because every attempt would introduce subtle bugs. Rust was created to address this issue.


Rust only catches integer overflows in debug builds; my C# programs catch them everywhere. The compiler setting is off by default, but it is available.
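For what it's worth, Rust can also enable overflow checks in release builds (`overflow-checks = true` in the Cargo profile), and the standard integer types offer explicit overflow handling that behaves identically in every build profile. A small sketch:

```rust
fn main() {
    // These explicit operations behave the same in debug and release,
    // independent of the overflow-checks compiler setting:
    assert_eq!(u8::MAX.checked_add(1), None);       // overflow detected as None
    assert_eq!(u8::MAX.wrapping_add(1), 0);         // explicit wrap-around
    assert_eq!(u8::MAX.saturating_add(1), u8::MAX); // clamp at the maximum
}
```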

In Rust, many libraries especially the standard one have lots of unsafe code, with a history of security bugs there [1]. In C#, the standard library is written in almost 100% safe code.

About threading, while it doesn't catch all data races automatically, C# implements many useful things at the VM level. The Monitor class, or memory model guarantees, are very hard to implement in languages that compile to native code.

[1] https://shnatsel.medium.com/how-rusts-standard-library-was-v...


> None of these languages catch data-races, so writing multi-threaded code is pretty much as hard and error prone as in C.

All real world languages have facilities for various kinds of safe concurrency, e.g. actors and other kinds of message passing. When it comes to concurrency, Rust isn't anything special. It's not even that good.

Rust has one killer feature: memory safety without GC. Except for this feature, Rust is mediocre. If you don't need the memory safety without GC, you can use almost anything else and be better off.


> Rust's safety checks do in fact block some safe and zero-overhead abstractions familiar to people working in C and C++

The issue of course is that while these abstractions are zero-overhead and can sometimes be used safely, they aren't compositional in general. That is, they impose requirements on outside code which aren't easily captured by Rust's type system. This is exactly what the 'unsafe' facility in Rust was made for. Note also that more recently-developed abstractions of the "Qcell" or "GhostCell" type can in fact implement linked lists safely, and once these are better understood a variety of them will likely be included in the Rust standard library.


Doesn't GhostCell preclude deletion, and effectively grow forever?


Fully agree.

I have heard claims that Rust would be performance-equivalent so many times and each time I bothered to check it turned out to be wrong. Especially in video game AI, indirection, pointers and linked lists are common. They are usually used in the performance-critical input-to-frame part of the game and there fitting things nicely into CPU cache lines is crucial for performance.

The end result is that I now have this vague feeling that Rust is a religious cult and I shouldn't take their claims at face value.


I've also heard your claim many times, and every time I checked, replacing a doubly-linked list with something else improved performance by orders of magnitude.

Particularly in video games.

---

> The end result is that I now have this vague feeling that Rust is a religious cult and I shouldn't take their claims at face value.

Also, you realize that Rust allows writing efficient doubly-linked lists, right? And that the Rust standard library provides one for you, right?

The whole argument of the OP is that people learning programming need to start doing so by learning how to write doubly linked lists because performance trumps all and therefore all children, elderly, and even university students that learn programming with javascript are lesser human beings and don't deserve our respect.


The linked list provided by the standard library is unfortunately not very useful. It doesn't provide O(1) operations on the middle of the list, making it more or less a worse VecDeque. Of course there are crates that provide better implementations but people really need to stop pointing people toward the one in std.


You can also align data to cache lines if you want to in Rust, without unsafe [1]. It is not C/C++ specific.
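A hedged sketch of the alignment point, assuming a 64-byte cache line (common on x86-64, but target-dependent): `repr(align)` raises both the alignment and the padded size of the type, all in safe code.

```rust
use std::mem::{align_of, size_of};

// Align the struct to an assumed 64-byte cache line; rustc pads the size
// up to the alignment as well.
#[repr(align(64))]
struct CacheAligned {
    value: u64,
}

fn main() {
    assert_eq!(align_of::<CacheAligned>(), 64);
    assert_eq!(size_of::<CacheAligned>(), 64); // padded out to the alignment
    let x = CacheAligned { value: 7 };
    assert_eq!(x.value, 7);
}
```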

For linked list, you can use an unsafe implementation if that is performance critical. Unsafe implementation can be wrapped in a safe API for others to use, and this is how the standard library is implemented.

There is also a new paper about structures with internal sharing such as graphs without runtime overhead that Rc/RefCell would cause [2].

[1]: https://doc.rust-lang.org/reference/type-layout.html#the-c-r...

[2]: http://plv.mpi-sws.org/rustbelt/ghostcell/


> I now have this vague feeling that Rust is a religious cult

I'd say that even if many of the technical claims about Rust are true, it still seems very much like a cult.


> I now have this vague feeling that Rust is a religious cult

I had this feeling from the very beginning just from the tone of its zealots here and elsewhere. My theory is that since the language is very hard to learn but doesn't provide a lot of tangible benefits (memory safety has existed in most mainstream languages for decades), the sunk cost fallacy hits very hard. Then the only way to get some benefit for the time spent is to recruit newcomers in a sort of pyramid scheme structured around experience in the language. This way the zealots become priests and are compensated in the form of social status. Of course, the whole scheme crumbles if nobody joins, which is why the community is so aggressive (RIIR, vocal advocating, etc.) with the non-believers.


People like rust for a lot of different reasons; that is, different people see different tangible benefits to it, and that has resulted in both a large and fast growing community of people who like it, and (predictably) backlash to that.

On the side of people coming from other memory safe languages, the benefit is much like Go, which has also become very widely used very quickly, so it seems worth taking seriously that the existing memory safe languages were maybe not satisfying people in some way. I think the popularity of both languages in this space can be boiled down to lowering the amount of abstraction without giving up memory safety, and simplifying the toolchain. Both Go and Rust do both of those things (in my view). I prefer rust because I don't like how the Go type system requires so much repetition of implementation due to lacking generics, but both languages make some amount of sense in this niche IMO.

Other people like rust because they came from C or C++ and got tired of some of their weaknesses but did not want to give up many of their strengths. Of course, whether rust requires giving up the strengths of these languages is debatable, and this is where most of the backlash comes from. But rust is one of very few languages (maybe also D and Zig?) that can plausibly make this claim at all, and I think it has become the most actively developed and has attracted the largest community of those, which does matter.

For me, it's a huge advantage to be able to plausibly use the same language for everything, from kernel drivers to giant applications. The only real competitor in that space is C++, and I think its cracks show more on both extremes, that is, I think rust is both a better C replacement at the very low end (see people like Linus Torvalds and Bryan Cantrill who are C diehards who dislike C++ but are interested or all in on rust) and is much better at the high level because of easy memory safety.

So I dunno, maybe there's a cult, but there are also very straightforward reasons why a lot of people are attracted to the language.


The lack of goto also bars it from the domain of very fast interpreters. Computed gotos are a common technique in C/C++ interpreters that isn't possible in Rust.


To be fair, the last time I checked, LLVM was doing an okay job of generating jump tables from big switches and threading the cases for best jump target prediction --- same thing you'd do with computed goto.
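For illustration, a sketch of what such a dispatch loop looks like in Rust. The three opcodes here are made up, and whether this particular match actually becomes a jump table (and how well the branches predict) depends entirely on the backend; the point is only that the idiomatic form is a match in a loop rather than computed goto.

```rust
// Hypothetical three-op bytecode interpreter; the match on the opcode is the
// kind of dense switch LLVM can lower to a jump table.
fn run(code: &[u8]) -> i64 {
    let mut acc: i64 = 0;
    let mut pc = 0;
    loop {
        match code[pc] {
            0 => acc += 1,   // INC: add one to the accumulator
            1 => acc *= 2,   // DBL: double the accumulator
            2 => return acc, // HALT: return the result
            _ => unreachable!("bad opcode"),
        }
        pc += 1;
    }
}

fn main() {
    // INC, INC, DBL, HALT => (0 + 1 + 1) * 2 = 4
    assert_eq!(run(&[0, 0, 1, 2]), 4);
}
```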

But yes, lack of goto and computed goto are deficiencies, not strengths.


C++ is better for performance because it’s old and tremendous effort has been put into optimizing it. Rust is a very young language. It benefits from the clang tool chain and gets some of that optimization for free, but there is a lot of room to make Rust faster.

Unlike GC’d or dynamic languages, there is nothing intrinsic about Rust that makes it a fundamentally harder language to optimize. It’s really like a modern provably safe C++.

It also depends on how you code. If you make heavy use of functional constructs and complex types your Rust code will probably not end up being optimized as well. For tight algorithms it’s good to write “thin” Rust. The exact same is true for C++. You would not want to use a lot of STL or functional stuff in a rendering pipeline core. High performance C++ looks like C.


> Rust is a very young language. It benefits from the clang tool chain and gets some of that optimization for free, but there is a lot of room to make Rust faster.

I feel that this sort of belief relies too heavily on a presumption that performance is somehow proportional to the age of a programming language, and in the process ignores the fact that a) it's already using a highly optimized toolchain that benefits from decades of research and development, and b) there is absolutely no suggestion that any potential performance gains are relevant or exclusive to the Rust's front-end.


There is always a long tail of optimizations and tweaks that must be done to get a language that last 5% of the way to parity with a language like C. Building on LLVM gets them 95% of the way.

BTW Rust has some features that make optimization a lot easier, like the ability to assume strict aliasing in more cases. It’s safety features also make it easier to write zero copy, copy on write, and other optimization patterns without making mistakes. This enables some bold performance optimizations at the code level that would require serious bravery in C.

Long term I expect that Rust will be faster in some cases.


Which optimizations is the rust compiler missing?

A lot of the optimizations required for fast code, like manual layout of custom data structures, are done by the programmer, not by the compiler. I don't doubt that rust can also express these optimizations given the similarly low-level control provided by the language, but what the OP and the parent are saying is that some of these are not idiomatic in rust and are harder or impossible to express in the safe subset of the language.

> You would not want to use a lot of STL or functional stuff in a rendering pipeline core. High performance C++ looks like C.

FWIW that's not at all my experience.


Agreed. My favorite example of C++ vs C performance is C++’s `std::sort` which is intrinsically faster than anything similar in C because C lacks the language features (templates) to support performant code reuse like that.


TBH, while I can think of many layout optimizations that Rust does and C does not, I can't think of any that C or C++ do that Rust does not.

In particular when it comes to unions which are heavily used in low-level code, Rust does optimizations that C and C++ can't even dream of (like all the niche optimizations).


which optimizations on unions does rust do?


In C++, you have `std::optional<T>`. That's a type that either contains a `T` or contains nothing.

In C++, sizeof(optional<T>) > sizeof(T) because the discriminant has to be stored somewhere.

This is true even if, e.g., you do something like `optional<T&>`. You know that T& is a non-null pointer, and you only have two variants, and one of the variants has no state, so you technically could encode this with 0x0 as the "no reference" variant and any non-zero value as the reference variant, and have `optional<T&>` be the same size as `T&`.

Rust does these layout optimizations of compressing the discriminant into gaps in the values of discriminated unions automatically.

So:

    enum Option<T> {
        Some(T),
        None
    }
for `Option<T&>` has the same size as `T&` in Rust, as opposed to C++.

In C, an example would be:

    struct DU {
        enum { A, B } discriminant;
        union {
            bool A;
            bool B;
        } payload;
    };
You could encode that into 3 bits (i.e. have sizeof(DU) == 1), but instead you'll have at least sizeof(DU) == 2, because you need one byte for the discriminant, and one byte for the payload.

It is very easy to create values with gaps in Rust, but C doesn't really support doing this.
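The niche optimization is directly observable with `std::mem::size_of`; a small sketch using only standard types:

```rust
use std::mem::size_of;

fn main() {
    // &u32 and Box<u8> can never be null, so None is encoded in the all-zero
    // bit pattern and the Option wrapper adds no space:
    assert_eq!(size_of::<Option<&u32>>(), size_of::<&u32>());
    assert_eq!(size_of::<Option<Box<u8>>>(), size_of::<Box<u8>>());
    // u64 has no spare bit patterns, so the discriminant needs extra space:
    assert!(size_of::<Option<u64>>() > size_of::<u64>());
}
```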

Another optimization is alignment-based field reordering. In C, if you write:

   struct S {
       uint8_t a;
       uint32_t b;
       uint8_t c;
   };
that ends up being 12 bytes long. In Rust, by default, that gets reordered as uint32_t, uint8_t, uint8_t, so it only ends up being 8 bytes.

If you want that instead to be laid out like in C, you can write:

    #[repr(C)]
    struct S {
       a: u8,
       b: u32,
       c: u8
     }
and then you get the same 12 bytes as in C. There are many supported `repr(...)` options supported for algebraic data types, e.g., you can use repr(u32) for the Rust enum above to store the discriminant in a u32, and get the same layout as DU in C.
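Both layouts can be checked with `std::mem::size_of`. A small sketch; the 8-byte figure is what current rustc produces on a 64-bit target, though `repr(Rust)` makes no formal layout guarantee:

```rust
use std::mem::size_of;

// Default (unspecified) layout: rustc is free to reorder fields,
// in practice packing this as u32, u8, u8 plus padding.
struct Reordered {
    a: u8,
    b: u32,
    c: u8,
}

// C-compatible layout: declaration order with padding, as in C.
#[repr(C)]
struct CLayout {
    a: u8,
    b: u32,
    c: u8,
}

fn main() {
    assert_eq!(size_of::<CLayout>(), 12);
    // 8 on current rustc; not formally guaranteed for repr(Rust):
    assert_eq!(size_of::<Reordered>(), 8);
}
```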


Right, c++ has no builtin discriminated union, and std::optional<T&> is actually not valid in C++ (unfortunately).

But discriminant-less custom optional classes with 'zero' type support via traits are easy to do (and I have done them many times). You do not have to rely on the compiler identifying a safe empty state, and you can define any application-specific one. For example, if for one specific (and common) use case strings are always non-null, optional<string, non_null_trait> has an obvious implementation.

So, yes, these sorts of optimizations are not done by the compiler, as it has no notion of discriminated types (one day, maybe...), but they can be done generically by the programmer. In fact, you could in principle optimize multiple layers of variant<optional<...>, ...> and collapse everything into one discriminant, as long as you do not provide reference access to the sub-variants; the required metaprogram is not going to be pretty, though.

One of the advantages of C++ is that it allows this sort of control.


> One of the advantages of C++ is that it allows this sort of control.

But I believe what the OP showed is that Rust also allows you to have that level of control. However, the default behavior for rust is to do the optimization that you have to go out of your way to manually implement in C++.

Rust isn't taking away control from what you can do in C++, instead, it's made the idiomatic approach one that is well optimized by the compiler.


Depending on the values in b, you could do bit-packing in C++ and munch it down to just 1 byte.


You can only do bitpacking if individual fields of a struct cannot be accessed via reference (or 'borrowed', in Rust terms). While C/C++ includes the 'register' keyword which forbids reference access, Rust does not.


There are compiler directives for packed structs in C++. struct __attribute__ ((packed))


Rust also has a packed attribute; this is something different. The re-ordering still includes padding, whereas packed removes the padding.


Rust uses LLVM so optimizations occur at the AST level. Therefore it should be able to take advantage of the same optimizations as Clang C++ which uses the same approach.


They happen at a lower level than an AST, but you're right that any front end can in theory generate the same IR and therefore get the same optimizations. The trick is if the language semantics allow for it; this bites both ways.


Would anybody use a doubly-linked list if they care about performance? Cache misses are where performance lies in modern systems and linked lists are maximally bad for this.

You can also say almost the same things you are saying here about C++. Type punning is almost always undefined behavior. There is no way to construct an array of variable size at an address that you specify without a pointer indirection. But with ASM you've got no such problems!


> Would anybody use a doubly-linked list if they care about performance?

That's probably the only reason to use them. If you don't care about performance there are simpler data-structures available.

If you need to implement an algorithm for which you need O(1) splice, then doubly-linked lists are a data-structure that give you that. If your objects are very big the cache misses might not matter that much, and neither would ref counting.

If you go one step higher, and can modify your object data types, and are careful with how you allocate your data, then intrusive doubly-linked lists can give you equivalent performance to a vector with better algorithmic complexity for many insertion / removal / splice operations, etc.

The stars do however need to align a lot for a non-intrusive doubly-linked list, like the one being discussed above, to be the best answer for whatever performance / algorithmic problem you are having.


> The stars do however need to align a lot for a non-intrusive doubly-linked list, like the one being discussed above, to be the best answer for whatever performance / algorithmic problem you are having.

That O(1) splice must also be accompanied by an iteration, which, in my experience, is really rare. If that iteration step wasn't already a part of the splice requirement then it can often be faster to do the splice via a memcpy.

My favorite algorithmic mistake was someone at my company using a binary search to maintain sort order on a linked list. IIRC, that turns the insertion search into something like an O(n log n) operation, whereas it's an O(log n) operation on an array.


> That O(1) splice must also be accompanied with an iteration.

I don't think it necessarily must, but it tends to be.

People tend to keep pointers to elements of the list all over the place, so if you have the right 3 pointers (the begin and end of the list you want to insert, and a position in another list), then you can do it in O(1).

If not, and you need to traverse the lists... as you mentioned there are other data-structures that might be much better.


With a circular list, you don't even need the extra pointers.


> I'm really tired of this motte and bailey stuff. Rust proponents say Rust gives you fine grained machine control and safety with no performance compromises, and then when you point out that the language doesn't quite live up to that promise, Rust proponents start telling you that you didn't really need that performance anyway.

Citation needed: which part of my comment says this?


I wonder if that's still relevant way to learn Rust as seems to written in 2018.


The changes since 2018 have been very minor


Yep, the core concepts did not change. It's still relevant.


RCU - Read-Copy-Update - A strategy for readers and writers, where the readers do not exclude writers (best as I can tell).


... without any sort of locking.


>AFAIK no such correctness proof for the C APIs exists

There's a whole proven correct kernel: http://sel4.systems/


SeL4 is not the Linux kernel and it doesn't use RCU.

So yes, there is a correctness proof for a completely different operating system that does not use the one thing we are talking about here.


I can't access the article at $JOB, but if I understand correctly, Rust would guarantee that usage of the RCU API is correct, but it would be silent on the API implementation itself, as it would be written with unsafe (as in crossbeam-rs). On the other hand, the author is probably saying that the kernel RCU C API itself has been proven correct, either manually or by machine checking (either directly on the C code or on a translation of it in a proof assistant); usages of the API might not be proven correct, though.

Also note that the author of the article is the actual inventor of RCU.


> Also note that the author of the article is the actual inventor of RCU.

I know, so?

> On the other hand the author is probably saying that the kernel RCU C API itself has been proven correct,

But this is not true right?

There is no formal proof, e.g., in Coq, that proves a useful subset of the RCU C API correct (as in, if you stick to this useful subset, you can't introduce UB).

OTOH, a safe Rust API could be proved correct to use at least, and an implementation of it that uses unsafe, like the one in crossbeam-rs, is typically very amenable to theorem provers (much more than C).

So what I am saying is that the author is ignoring the value of providing a safe Rust API for RCU, like crossbeam-rs.

If the Linux RCU APIs can't be mapped to it, then maybe they are incorrect to use, and this is why we can't prove them correct.

Doing the work of mapping them, and understanding any issues, could lead to changes to those APIs that might allow us to prove them correct.


A quick search shows a bunch of papers about RCU possibly containing proofs. I don't know if there is a coq proof, but there are other dedicated tools to model lock-free algorithms, or possibly the proofs were done by hand.

Of course that's still not proving the actual C code, the translation probably still need to checked by hand.


> an implementation of it that uses unsafe, like the one in crossbeam-rs, is typically very amenable to theorem provers (much more than C)

Just wondering, how much of the unsafe code in commonly used libraries (the standard library, tokio, etc.) is proved using theorem provers?


Don't have comprehensive enough data to know, but part of the standard library was proven via the Rustbelt project, and it found some places where there were bugs, as well as some places where things could safely be loosened up.


Most of the unsafe code in libcore is proved in Iris.

Some of the unsafe code in the standard library also.

Some parts of crossbeam have pencil&paper proofs documented.



There's previous discussion on reddit https://www.reddit.com/r/rust/comments/pzo1v9/so_you_want_to... with important points


Genuine question for Rust programmers:

Memory safety without GC is a big selling point of Rust. However, sometimes your program’s ownership model does not fit with what the Rust borrow-checker can verify. Not to worry, just use “unsafe”. But if “unsafe” is used, can we still claim that Rust code is safe?


First, there's a misconception to be cleared :)

Unsafe code is not the only solution to borrow checking problems, as Rust supports reference counting, which is an implementation of shared ownership.

Whether this defeats the point of Rust is subjective, but regardless, it's a solution to at least certain ownership problems.

A very typical area where you can observe this is bidirectional trees. They're ugly in Rust! :)

Now, memory unsafety is not a black and white matter. In my experience with the community, you'll be surprised at how much pragmatism there is. Nobody will scream sacrilege if one expresses the necessity to use memory unsafe code¹.

The connection to your question is that overall, the Rust memory safety model is not all-or-nothing; rather, it's a spectrum of safety above the conventional entirely-unsafe approach. With this in mind, I'd say that "98% Rust-safety" is well beyond the "100% unsafe" approach of traditional low-level languages.

As a matter of fact, I suppose that in the future, we'll see languages developing intermediate memory safety features (I think Zig was doing something like that).

[¹] The actix-web case is a very interesting one. I haven't looked at it, but my personal suspicion is that it made use of _wildly_ unsafe code, and that usage of unsafe code was not the problem per se, rather that unnecessary usage was.


With actix, the author was both misusing unsafe and completely unreceptive to PRs that resolved it with safe variants, claiming it was not dangerous in its current internal state and that the safe variant would shave off a few percent of performance and possibly lose it its spot as an ultra-fast web library.


> The actix-web case

IIRC it was mostly the initial reluctance of the author to merge PRs that rewrote unsafe to safe Rust. Some escalation, some emotions, and then everything moved forward in harmony and... with more safety.


> IIRC it was mostly the initial reluctance of the author to merge PRs that rewrote unsafe to safe Rust.

No, not really. Spinning the actix-web as the author's reluctance to accept PRs is a gross misrepresentation of the problem faced by the project.

The author even went to great strides to explain that his use of unsafe code was indeed safe.

For context, here's what the author had to say about that regrettable episode.

https://github.com/fafhrd91/actix-web-postmortem


> No, not really. Spinning the actix-web as the author's reluctance to accept PRs is a gross misrepresentation of the problem faced by the project.

No, not really. The author, a professional at Microsoft, was a hobbyist with rust who (by his confession) wished to explore the limits of performance using various techniques. People made a mountain out of the fact that his project rocketed up to the top of the TechEmpower benchmarks -- as if this obligated him to conform to their flash-mob wishes regarding sanitized code, whereas the project very explicitly made it clear that production use was at one's own risk.

What happened to him was a hit-job by a bunch of entitled bullies. No less than Steve Klabnik acknowledged this ("far, far, far over the line"[0]). The fact that ntex exists today[1] yet the world has somehow not exploded demonstrates how overblown the criticisms were.

[0] https://steveklabnik.com/writing/a-sad-day-for-rust

[1] https://github.com/ntex-rs/ntex


You can't boil what happened down into a black-or-white simplification. Because what happened is more complicated than that. Some people went over the line, absolutely, but that doesn't mean any criticism of the project was overblown or inappropriate. Lodging criticism against a project doesn't have to imply any sort of obligation on anyone.

There was some pretty bad misuse of unsafe in actix-web. And as I recall it, the author was resistant to fixing it, even when patches were provided. This is the author's right. It's their project. But it's anyone else's right to make this sort of thing visible, and to advocate against holding up actix-web (at the time) as a good example of what Rust programmers could deliver.

A lot of people can't or won't communicate effectively, and often blur these lines. Either assuming there is entitlement where there is none, or expressing entitlement when they didn't mean to. But there is a reasonable position underlying what happened, and unfortunately some took it too far.


As an outsider, as it was clear that the author wasn’t that interested in reviewing safety patches, wouldn’t a “safe-actix” fork maintained by someone else have made everyone happy?


> No, not really. The author, a professional at Microsoft, was a hobbyist with rust who (...)

It's very hard to believe that "a hobbyist" would, by mere chance, write the absolute best performing framework in the world.

https://www.techempower.com/benchmarks/

Your comment not only lacks credibility but also sounds extremely petty.


The problem is that actix-web (at the time, at least) was also quite benchmark-oriented -- it was doing a lot of things you don't necessarily want to do in production in order to get better benchmarks.

And that was especially true of the benchmark code itself.

TechEmpower's methodology, additionally, is not perfect. IIRC they support HTTP 1.0 only, which is not what most modern servers are focusing on, but actix-web did.


> The author even went to great strides to explain that his use of unsafe code was indeed safe.

Some places in the code were actually unsound (which means: a function containing unsafe was marked safe even though it didn't uphold the invariants of the unsafe block, meaning you could trigger UB from safe code: this is a big no-go in Rust).

That being said, the whole drama with people mobbing the author on every social media, calling him name and so on, is indeed regrettable.


> The author even went to great strides to explain that his use of unsafe code was indeed safe.

That's what any author of unsafe code is supposed to do right? Explain why the unsafe code is in fact safe (enough).


It is, yes. The parent is defending them, not trying to say that they were wrong.


Copy paste from https://this-week-in-rust.org/blog/2021/10/06/this-week-in-r...

There's a common trope among people unfamiliar with Rust where they assume that if you use unsafe at all, then it's just as unsafe as C and Rust provided no benefit. Comparing C's approach to safety vs Rust's is like comparing an open world assumption to a closed world assumption in formal logic systems. In C, you publish your API if it's possible to use correctly (open world). In Rust, you publish a safe API if it's impossible to use incorrectly (closed world). Rust's key innovation here is that it enables you to build a 'bridge' from the open world (unsafe) to a closed world (safe), a seemingly impossible feat that feels like somehow pairwise reducing an uncountable infinity with a countable infinity. Rust's decision to design an analogous closed-world assumption for safe code is extremely powerful, but it seems very hard for old-school C programmers to wrap their heads around it.


Yes. The idea is to isolate the use of unsafe and prove that it is safe.

Because compiler cannot prove that what you're doing is safe it is your responsibility to do so. Once you've _proven_ that your unsafe code is safe, then it cannot have propagating effects in the rest of the code.

A lot of people get this confused. They think unsafe means you can do whatever you want. In reality it means you'll play by the rules even if the compiler cannot verify them all. In truth it is not as hard as it sounds, we do it in C all the time.


Should have been called „trust_me“ instead of „unsafe“.


Yes, from the compiler point of view, that's the idea, and it is right there in the manual "Trust me, I know what I’m doing.".

But the keyword "unsafe" has been used in C# for quite a while, and probably in other languages too. So by the time Rust came out, it was already a well understood concept.

In the C# manual: "the execution engine works to ensure that unsafe code cannot be executed in an untrusted environment". Still that notion of trust.

One problem I see with "trust_me" or maybe "trusted" is that it gives the opposite message from a user point of view. Trusted code is code the compiler should trust but that you should not trust.


Trust me, I know what I'm doing is exactly what every C programmer thinks when writing code.


It could, and it would be accurate, but they're trying to implicitly teach the idea to "avoid these"; "trust_me" blocks lack that implication.

It is very easy, particularly coming from another language, to butt heads with the language's design and then try to solve it with "trust_me" blocks. Throwing everything in unsafe feels like you're doing it wrong, and correctly so.


My above comment was meant to be an illustration rather than a proposal. I think calling it "unsafe" is a good design choice as you explained.

The downside is: every time this discussion comes up, someone has to explain what "unsafe" actually means. Every single tech talk that touches on "unsafe" prefixes the talk with this explanation, too.

A more funny idea instead of "trust_me" would be "safe_because[...]" where "..." is a rationale about the code plus your social security number and biometrics data.

For any control freaks and regulators out there: If you take the above seriously and try to implement such a thing you will be burnt by the stake by cyber elves.


A less extreme variation would be "unsafe_because "X" " where X must be a string of length at least 50.

You want to do something that the compiler can't understand but you know how it works? Ok, write down _why_ it works so others can know that, too.

Would it be possible to write the above as a Rust macro, and mark invocations of the ordinary "unsafe" operator as a warning / linter alert?

PS: I actually wrapped most F# unsafe (= partial) functions in our codebase in such a way. `Option.get x`, which extracts a possibly-null value and crashes otherwise, is deprecated and replaced by `Option.getOrFail fmt args x`, where you say why you are sure the value can't be null and provide context variables to diagnose a possible failure.

e.g.

     let title =
       textLines
       |> List.headOrFail "We generated the line lists by subgrouping the original list %A, so we know they're non-empty" originalList


The style guide for most projects is to include a comment on the unsafe block's header. Clippy has a lint for unsafe fns[1], and with either changes to the Rust AST (to not discard comments) or to the acceptability of doc attributes on unsafe blocks (to allow the use a /// comment on unsafe blocks), that lint could also work for unsafe blocks.

[1]: https://rust-lang.github.io/rust-clippy/master/#missing_safe...


It’s never really going to be perfect e.g. your idea doesn’t really work with unsafe functions, which can only be called from unsafe contexts but also are an unsafe context.

`unsafe` in rust really means that some of the compiler guardrails are disabled:

* you can call unsafe (and extern) functions

* you can dereference raw pointers (this has significant consequences such as being able to create mutable references out of thin air)

* you can access or modify mutable statics

* you can access `union` contents


D still has some issues with the safety story, but they have three safety attributes. @system - unsafe code, may do pointer arithmetics, may call C code, @safe - safe code, not allowed to do those things, @trusted - unsafe code, verified by programmer, can be called by @safe.

The main issue is that it came late in D lifetime, so until situation improves, @system is the default, because there's too much std/legacy code which still assumes @system as default. Many new projects however start with @safe as default.


Code using unsafe might be safe, it just can't be proved to be safe by the compiler.

Just as code written in C might be safe, it just can't be proved to be safe by the compiler! Rust's advantage over C is that the machine-unprovability is confined to small, defined zones, where hopefully human programmers can, formally or informally, prove the code safe.


Generally, the idea is that you use a small amount of unsafe code to build a safe abstraction to do what you need, then use that abstraction in many places. So now you only have a few unsafe sections you have to verify for correctness, instead of the entire program like you'd need in C/C++.


A few things here:

1. Restricting unsafe to a limited number of places is a lot better than having it as the default everywhere.

2. The idea is that if you have to use unsafe, you (or some other module author) wrap it in an interface that is safe to use. Overall this may not be perfect (in the sense of "it's 100% verified by the compiler"), and bugs are occasionally found in supposedly safe abstractions over unsafe, but Rust moves this from "it's the norm" to "it's very, very rare".

3. Unsafe lifts many restrictions, but not all of them, so there is still some validation.

Keep in mind that GCs and large parts of the low-level code that is fundamental to languages that have a GC are written in C, and we are still finding safety bugs there.


Interestingly enough, Java has the same question. There is a sun.misc.Unsafe class (https://blogs.oracle.com/javamagazine/post/the-unsafe-class-...) which is used internally to provide the safe code you can use in normal Java code. This is needed since at some point you always have to go down to something a machine cannot verify (e.g., "this port will hold the result provided by an external machine, and the machine operator guarantees me it will always be a non-null address"), and the guideline is the same as in Rust: if you have to use this, you are responsible for upholding all invariants, and if you do so and thereby "guarantee" it (usually via tests, so it's not really guaranteed, but you get the idea), people using the safe abstractions can be sure that their code will work.

My point being: unsafe does not invalidate the safety of the overall code; it shows you the places where you have to be extra careful. On the other hand, if you yourself do not use unsafe and all the libraries you use are of high quality, you can be sure your code will be correct. Which is the best you can do in reality.


No, but it makes it explicit regarding what's unsafe, so that you'll more easily know where to put resources to make the unsafe code as bomb proof as possible. In C that's 100% of the code, in Rust you can keep it to a low percentage.


unsafe code blocks are required to uphold the invariants of the safe Rust code surrounding it. So yes, if you fail to uphold those invariants (and that's proven to be pretty tricky), your program is not safe.

My experience is that you hardly ever need unsafe Rust code. If your ownership model doesn't fit the borrow checker, there are of course other avenues like reference counts or even an actual garbage collector (which you can choose to apply only to values whose ownership situation is not aligned with the stack). Only when those options are unable to give you the performance you want will you need to reach for unsafe code.

Other scenarios where you'll need unsafe might include things like SIMD (although safe wrappers exist) or FFI, but hopefully we'll evolve the Rust ecosystem over time to concentrate the unsafe code in a few places by building abstractions, where they are thoroughly reviewed and well tested.


> My experience is that you hardly ever need unsafe Rust code.

I doubt that experience applies equally to kernel development.


They may apply better than one might think: the example SPI driver posted at https://lwn.net/Articles/863459/ only has two cases of unsafe.

Of course, the kernel:: -modules it uses undoubtedly have more, but I imagine the idea is to provide abstractions that allow actual driver code not to use unsafe.

And then there's stuff like file system code, that—I imagine—would require no more uses of unsafe than a database server written in Rust.


Because it's normal (indeed expected) to build safe abstractions on top of necessarily unsafe code, a lot of the kernel by volume shouldn't need to be unsafe at all.

For example suppose that we're back in the late 1990s and so for some reason dozens of different yet roughly equivalent 100Mbps fast Ethernet PCI cards are available, each offering the same functionality but with slightly different register layouts, interrupts and quirks. We are of course writing drivers for them in Rust.

Obviously at some fundamental level "configure PCI bus" isn't safe - if you get it wrong that'll be bad. But as the author of mediocre Ethernet driver #42 you aren't implementing "configure PCI bus" you're just calling a safe abstraction that worked for the other 41 drivers.

Handling userspace calls to twiddle the Ethernet interface is potentially unsafe too, but likewise that will have been factored out into common code and you're just talking to a safe abstraction.

Rust's type system and borrow checking makes it easier to write a safe abstraction for relatively low level ideas like "Only one device can own this MMIO register" or even "this can be 0x03 or 0x06 or even 0x18 but it can't be all zeroes, that's not a thing".

This sort of stuff is boring and in principle already not particularly unsafe in C today, this sort of abstraction is already done - but of course in C you've no practical way to get the promises Rust offers for safe code.


In respect of ownership models the borrow checker doesn't natively understand specifically, the Rust standard library provides a bunch of obvious things you would want, such as a Reference Counted type and the idea of Weak references. These are of course unsafe internally, but assuming you believe they are correctly implemented, it's safe to use them and so Rust code which uses Rc or Arc (an atomic reference counted type) is safe.

Now, beyond obvious things a beginner would want, you start to get into some pretty cool structures, and many of those are not provided by Rust's standard library, however, if you aren't excited about implementing them yourself others have likely done so. For example maybe you've read about "Hazard pointers" for very concurrent systems where you can't afford to take a lock when destroying shared resources but you want to only destroy resources that are no longer used. Rust doesn't provide Hazard pointers in its standard library, but several crates offer different takes on this.

Again, the implementations are unsafe internally, but if you trust that the programmer got them right, your use of their library is safe.


Using unsafe is like providing a "proof" that the code you write won't cause safe Rust to exhibit undefined behavior.

Writing unsafe requires a certain degree of skill, because one needs a deep understanding about what guarantees safe Rust provides, to ensure that your abstraction does not break them.

If you read the standard library docs, every unsafe block has a comment explaining / proving why it can't cause safe Rust code to exhibit undefined behavior.

Many of these blocks, particularly those in libcore, have been proved formally correct in proof assistants.

There is a theorem that proves that "if a safe Rust wrapper over unsafe code is proven correct, then extending safe Rust with that wrapper is still sound".

So you basically can infinitely extend the safe Rust subset of Rust with these abstractions.

> But if “unsafe” is used, can we still claim that Rust code is safe?

So the answer to this is "no, you can't claim that, you have to actually go and prove it".

Often, very often actually, the proofs are trivial.

For example, to index into a slice:

    fn index_slice<T: Copy>(slice: &[T], i: usize) -> T {
        // SAFETY: if the index is not in bounds, we panic
        assert!(i < slice.len());
        unsafe {
            *slice.as_ptr().add(i)
        }
    }
suffices.

Why? Because code that creates a slice with a ptr and a len field that do not point to a valid allocation with len valid elements already exhibits UB. That is, for any valid Rust program, we are guaranteed here that these two fields are "ok".

So the only thing we need to make sure of is that the index "i" is inbounds, and that's trivial to do with an assert that panics if this is not the case. That is, at runtime, no program for which the index is not inbounds will reach the ptr.add(i) method, so there is no way to offset this pointer out of bounds, much less dereference it.

Basically, because all safe Rust code can rely on all other safe Rust code upholding the rules, most of the proofs are really easy. The fact that you can prove safe abstractions over unsafe code in isolation from other abstractions is one of the main features of Rust.

---

For synchronization primitives, proving the absence of data-races is particularly hard, and requires a lot of expertise. The subset of Rust programmers that actually need to do this is very small. Most people just use those primitives that have already been proven today, of which there are many.


>Using unsafe is like providing a "proof" that the code you write won't cause safe Rust code to exhibit undefined behavior

It is more like providing an "axiom", as the compiler won't be able to check it and instead has to assume that it is true.

I wonder if there is scope to extend the language in the future to actually add machine checked proofs.

Normally full machine checking doesn't scale to large projects, but if one had to restrict to writing proofs only for the small subset of unsafe code and let the simpler type/lifetime system handle the rest of the code it might be manageable.


> It is more like providing an "axiom" as the compiler won't be able to check it and instead has to assume that it true.

I don't think one should see this as an axiom, although as you mention one could.

If your unsafe code allows safe Rust to exhibit undefined behavior, Rust makes no guarantees whatsoever about what your program will do.

The convention in the standard library is to provide an actual proof that this won't happen as a comment to each unsafe block.

Right now, there is no tooling to check these proofs, but there is a lot of ongoing research about how to do that, and the standard library already has some tooling to check things about this.

> Normally full machine checking doesn't scale to large projects, but if one had to restrict to writing proofs only for the small subset of unsafe code and let the simpler type/lifetime system handle the rest of the code it might be manageable.

That's the idea.

It is not as simple as just proving each unsafe block independently, but rather all unsafe blocks within a Rust module must be proven together.

From this point-of-view, it would make sense to only implement one unsafe abstraction per module, and keep these modules as small as possible, providing most of the implementation of the abstraction outside the module using only the safe API.

Turns out, this is already how Rust code is written, since this is not only useful for writing proofs (either via assistants or using pencil and paper), but also helps humans reason about the correctness of the unsafe code by keeping all the "fluff" away.


> Using unsafe is like providing a "proof" that the code you write won't cause safe Rust code to exhibit undefined behavior.

My take is that Rust is proposed based on the value proposition of its Safe Rust feature, but in the Linux kernel that feature has limited use.

Therefore, if Safe Rust is out of the table then what's the point of writing/rewriting parts of the Linux kernel in Rust? What's the value proposition?


> My take is that Rust is proposed based on the value proposition of its Safe Rust feature, but in the Linux kernel that feature has limited use.

Where does your take come from?

Everyone I've heard of proposing Rust for the Linux kernel does so almost exclusively by making the argument that Rust can create safe performant abstractions over unsafe code.

That is, instead of requiring all Linux kernels programmers to be deeply familiar with RCU, you can have a small group of RCU experts create a safe abstraction over RCU, that then all other Linux kernel programmers can just use, without having to understand very deeply how it works. If they make a mistake, their code won't compile, and the error message explains why and how to fix it.

This is a huge productivity boost for large complex projects, and a huge safety boost since these abstractions prevent a large set of security vulnerabilities, and these are the main reason I've seen people to use to introduce Rust into the Linux kernel.

The fact that Rust does this with minimal - often zero - runtime overhead is a requirement for the kernel. But even for abstractions for which this is not the case, it still allows people to only have to become an expert when they want to improve performance, and any use of unsafe immediately triggers a requirement for such code to be reviewed by the actual experts.

This is also a huge boon for these experts, since it substantially reduces the code they have to review. E.g. now they only have to review uses of the "unsafe RCU" APIs, instead of also having to review the uses of the safe RCU APIs, which is the case today.

And also a big boon typically for the project in general, since it is much easier to attract "newbies" if there is less stuff they can break. A newbie hacking on the Linux kernel 20 years ago had much, much less to learn to become proficient than a newbie today. Reducing what newbies have to learn is a surprisingly effective way of increasing the contributor base of a project.


> Where does your take come from?

From this very discussion, for starters.

> Everyone I've heard of proposing Rust for the Linux kernel does so almost exclusively by making the argument that Rust can create safe performant abstractions over unsafe code.

The whole point is that the same thing (creating safe abstractions over unsafe code) is what developers have been doing with C for close to half a century.

Therefore, there must be a value proposition regarding the use of Rust other than Safe Rust. I haven't heard one so far. In fact, I've read the exact opposite, such as compilers needing rewrites to accommodate that use case.

Thus, with Safe Rust out of the table, what exactly is Rust's value proposition?

> This is a huge productivity boost for large complex projects, and a huge safety boost since these abstractions prevent a large set of security vulnerabilities, and these are the main reason I've seen people to use to introduce Rust into the Linux kernel.

I feel those baseless assertions require some supporting evidence. Is there actually any tangible and concrete evidence that Rust, especially Unsafe Rust, improves productivity and safety? I'd love to read about it.


> The whole point is that the same thing (creating safe abstractions over unsafe code) is what developers have been doing with C for close to half a century.

They have been trying to do this, but it doesn't work because C's support for this is very poor. Not even C++ managed to do this.

Rust makes it easy to do this in a reliable and efficient way that is 100% guaranteed to work.

> I feel those baseless assertions require some supporting evidence. Is there actually any tangible and concrete evidence that Rust, specially Unsafe Rust, improves productivity and safety?

Yes, see: https://prev.rust-lang.org/en-US/whitepapers.html and https://www.rust-lang.org/production , and well https://dl.acm.org/action/doSearch?AllField=Rust (there are a bunch of peer-reviewed papers there that show this, serve yourself).


Unsafe in Rust means "you have to enforce the use contract manually, as opposed to the compiler doing it for you." So yes, if the contract is violated there, the entire program is contaminated.


The way I view it is that the unsafe keyword acts as a flag for auditors that they would need to thoroughly check the following block for memory errors.


Yes, we can claim so because there are other ways of proving your program to be safe. See: https://plv.mpi-sws.org/rustbelt/

By the way, several parts of the std (Mutex and Arc for example) are written in unsafe Rust. They have to be, by necessity, because they allow for aliasing and mutation.


unsafe Rust code is not safe, but the rest of the program is, and unsafe blocks provide very nice benefits:

- limit the size and scope of the unsafe code

- make the unsafe part stand out, "here be dragons", so you know when to be particularly alert

- act as a safe way to encapsulate unsafe code so that it can be used by safe code, and used without the lack of safety propagating outside of the unsafe zone


unsafe {} does not disable type checks or ownership rules. It (basically) only allows dereferencing a raw pointer. It's a way of telling the compiler "in this specific place, for this specific variable I manually verified correctness of dereferencing that pointer".

C/C++ is not just "everything is unsafe", it's also lacking many powerful features that exist even inside unsafe{} blocks: ergonomic enums with associated data, type-checked generics, ergonomic moves etc.


Ideally the unsafe parts are restricted to modules which have been proven safe to an appropriate degree of confidence. ('Unsafe' in Rust really means 'not guaranteed by the type system to be safe', so it's not a contradiction in terms to show that 'unsafe' code is safe.)

One might respond that you can do the same kind of modularization in C++. However, I think a Rust advocate would argue that it's much more difficult in practice to modularize unsafe code in C++ to the same degree, since unsafe idioms are deeply embedded in the language and library ecosystem.


They have a data structure that will do reference counting just like what most garbage collection systems use.

You see it used every so often.


Reference counting is slow; it causes too many extra memory requests. Also, when the program is multithreaded, access to cache lines recently modified by another CPU core is expensive, even more so than a cache miss.

For these reasons, most garbage collection systems are using mark and sweep, or something similar.


Reference counting man. When the programs ownership model doesn't adhere to Rust's borrow checker.


> However, sometimes your program’s ownership model does not fit with what the Rust borrow-checker can verify.

This tends to be a beginner issue for c/c++ programmers. 99% of the time you just need to rethink your design into something better.


> On the other hand, might the rumored sudden merge of the ksmdb driver (https://lwn.net/Articles/871098/) have been due to the implicit threat of its being rewritten in Rust?

That links to a message saying this:

> "ksmbd was then merged unbeknownst to me and my worries confirmed to be true as some out-of-bounds bugs (which are impossible in Rust) surfaced immediately."

Sad.


I would like a link to the bugs in question.

The only bugs I found have a long email chain with people surprised that the kernel module passed any tests at all, with extensive changes needed to fix logic issues in the path-conversion code. The devs even added an unload/reload cycle to the tests because the module was in a bad enough state that it wouldn't consistently run over several iterations. I wouldn't be surprised if they would simply turn to unsafe if they had to write it in Rust; errors are apparently to be worked around, not fixed. [1]

[1]https://lore.kernel.org/lkml/CAH2r5mvu5wTcgoR-EeXLcoZOvhEiMR...

edit: the last part of my comment may be a bit over the top, but I think the quality issues at the language level are just the tip of the iceberg for this module.


> I wouldn't be surprised if they would simply turn to unsafe if they had to write it in Rust, errors are apparently to be worked around not fixed.

They probably would. But it's easier to find usages of unsafe in Rust, through grep or the like, than it is to find the equivalent patterns in C, because Rust's syntax requires marking unsafe code explicitly. To use one of my own codebases as an example, I can look at the instances of the following regex patterns:

    * "\bfn\b" (function declarations), 138 matches.
    * "\bunsafe\s+fn\b" or "\bunsafe\s+extern\b" (unsafe function declarations), 23 total.
    * "\bunsafe\s*\{" (opening an unsafe block), 41 matches.
    * "\bunsafe\s+impl" (unsafe trait implementation), 0 matches.
We can also look at the total lines of code written (2269), and ask if these numbers feel reasonable for what it's doing. Obviously that will require familiarity with Rust, with what's being implemented, and with the expected amount of unsafe based on previous usage in the kernel, but it would at least give an overall feel. And, of course, if you want to examine any of them, you know exactly where they are.
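As an illustration of why those regexes are reliable, here's a small sketch: any operation that could violate memory safety must appear inside a lexically marked unsafe block, so grep can't miss it (`first_byte` is a hypothetical example, not from the codebase above):

```rust
// A safe wrapper around an unsafe operation: the only line a reviewer must
// audit for memory safety is the one inside the `unsafe` block.
fn first_byte(bytes: &[u8]) -> Option<u8> {
    if bytes.is_empty() {
        None
    } else {
        // SAFETY: we just checked that index 0 is in bounds.
        Some(unsafe { *bytes.get_unchecked(0) })
    }
}

fn main() {
    assert_eq!(first_byte(b"kernel"), Some(b'k'));
    assert_eq!(first_byte(b""), None);
}
```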


Wait wait, so somebody panicked that something might be rewritten in Rust and they rushed to merge C code which was then proven to have memory safety bugs?!

Is that what happened?


Rust in core Linux would probably be a horrible idea, given all the complexities of the language.

For drivers it could be fine. But so could many other languages. So why Rust?


> Rust in core Linux would probably be a horrible idea, given all the complexities of the language.

C is a very complex language as well. It's just that when you're not aware of all the complexities in Rust, the compiler will refuse to compile; in C, you'll get a security vulnerability or weird crashes at runtime. Yes, this is a generalization, but only slightly.

C is deceptively simple – its complexities are well hidden.


Rust is complex, and C's complexity is hidden - if it weren't hidden, C would be less complex.

But, does that in turn make C more complex? It definitely makes it less safe. But I don't know if it automatically makes it more complex.


> C's complexity is hidden - if it weren't hidden, C would be less complex.

If it weren't hidden, it would be a different language. Would that language be less complex? That depends on what that language would look like.

> But, does that in turn make C more complex?

More complex than what? I'm not saying that C is more or less complex than Rust – what I'm saying is that C is not simple – its considerable complexity isn't visible at first glance, but it's there.


> Yes, this is a generalization, but only slightly

Sounds like a total hand-wave to me. Can you provide examples of "well hidden" complexities?


Manipulating pointers as numbers. And addressing arrays out of bounds.


Manipulating pointers as numbers (using the ++ operator for instance) is actually really simple, but most people do a bad job of explaining it.
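For contrast, a sketch of how Rust treats those same two operations: safe indexing is bounds-checked, while C-style pointer arithmetic is still available but must be marked unsafe:

```rust
fn main() {
    let a = [10, 20, 30];

    // Safe indexing is bounds-checked: `get` returns None when out of
    // range, and `a[3]` would panic instead of reading past the array.
    assert_eq!(a.get(1), Some(&20));
    assert_eq!(a.get(3), None);

    // Pointer arithmetic in the C style still exists, but only behind
    // `unsafe`, where the programmer takes on the bounds obligation.
    let p = a.as_ptr();
    // SAFETY: offset 2 is within the bounds of `a`.
    let last = unsafe { *p.add(2) };
    assert_eq!(last, 30);
}
```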


You are severely missing the point. Of course it's simple. Many things are simple on their face. Until one rainy day you're slightly distracted and make a mistake.

Be honest with yourself: have you NEVER made a mistake that made you facepalm sometime in the future? Not once in your life?

People just refuse to believe they can make mistakes. That's why these discussions are impossible to win; they're about personal maturity, not programming prowess. On one side you have people confidently saying their famous last words: "it's simple, are you all dumb or something?" On the other you have those saying: "I know I can screw up even if I'm the best ever, so let me use a language that will catch some of my mistakes."

But yeah, the latter group can never convince the former. You all just repeat the same things every time and always miss the forest for a single tree. Very frustrating.


No, it's not a complex language. Come on.


The C11 standard is 701 pages. The "Language" section (ignoring the environment, definitions of terms, standard library, and appendices) is 145 pages. And it's hilariously underspecified even without ignoring all that; there's tons of implementation-defined behavior (each compiler picks and documents its own choice), unspecified behavior (the compiler may pick any behavior from a list, with no obligation to document it or even be consistent), and undefined behavior (no requirements at all).


Before anyone takes this post at its word, I encourage you to go look at the C11 standard. It's called "ISO/IEC 9899:201x", and those of you who have seen an ISO standard before will already know where I'm going with this!

> 145 pages

Yes, and (within that 145) they take a whole page to define the word "scope". Not even in the context of C!! Just the general language-agnostic term "Scope". That's hardly meaningful...


Which page within the 145 (chapter 6)? The section on scopes of identifiers (6.2.1, pgs 35-36) is entirely in the context of the C language.


Exactly that one. I'll let people read it to come to their own determination as to how "entirely in the context of the C language" it is.


Then why do seasoned veteran C programmers keep introducing memory-safety vulnerabilities all the time? Some of the projects considered by many to be idiomatic C keep introducing terrible bugs: SQLite, Redis, etc.


I don't think bugs are introduced because the language is very complex, but because the problem and its solution are complex. To prevent (some of) these, you may need a more complex language, like what Rust provides.


Would you like to address the arguments in my post instead of going "Na-ah, you're wrong, I'm right"?


I get your point, but I'm squeamish about how far you stretch the definition of the word "complex". C has these problems because it's simple, i.e. it provides a rather thin abstraction layer over machine language.

Rust's corresponding layer is much thicker due to its memory-safety requirements and historical origins.


> it provides a rather thin abstraction layer over machine language.

No, it doesn't, and it hasn't for quite some time – not since C compilers started exploiting undefined behavior for optimization opportunities. It used to be that if you wrote

    a + b
where a and b are int values, the compiler would emit an ADD instruction. This isn't necessarily true anymore: because compilers are allowed to assume that signed overflow never happens, writing correct overflow detection is difficult and non-intuitive. Even worse, code that worked correctly for 20 years might start exhibiting subtle bugs from one day to the next, just because you've upgraded your compiler from version 9.4.2 to 9.4.3.

Undefined behavior is one major reason why writing correct C code is complex and difficult in practice.


> where a and b are int values, the compiler would emit an ADD instruction in assembly. This isn't necessarily true anymore, because compilers are allowed to assume that signed overflow doesn't happen, which makes writing correct overflow-detection so difficult and non-intuitive.

It's the exact opposite: if you want your compiler to just emit an ADD no matter which platform you are on, then the language cannot define the semantics of "+" in the corner cases (hence UB). If the language instead defined overflow as two's-complement wraparound, then on a DSP with saturating arithmetic the compiler would have to emit additional instructions to detect saturation and wrap instead.
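Rust's integer API illustrates this trade-off by making each corner-case behavior an explicit, separately named operation rather than a property of "+". A sketch using the standard methods:

```rust
fn main() {
    let a = i32::MAX;

    // checked_add reports overflow as None instead of invoking UB.
    assert_eq!(a.checked_add(1), None);
    assert_eq!(5_i32.checked_add(1), Some(6));

    // wrapping_add requests two's-complement wraparound explicitly.
    assert_eq!(a.wrapping_add(1), i32::MIN);

    // saturating_add clamps at the type's bounds, matching what a
    // saturating DSP would do in hardware.
    assert_eq!(a.saturating_add(1), i32::MAX);
}
```

Plain `a + b` panics on overflow in debug builds, so the programmer chooses the corner-case semantics instead of inheriting whatever the platform does.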


This is HN.... every day there are articles about "$(this old existing software) written in $(language of the week)"

$(language of the week) changes, and people continue using the old software, written in C/C++/...


Except C++ is not welcome in the Linux kernel.

Torvalds said recently: “Probably next year, we’ll start seeing some first intrepid modules being written in Rust, and maybe being integrated in the mainline kernel.” [1]

The Rust community is a bit over-excited, but if you can look past that, there's substance in the hype.

[1] https://thenewstack.io/linus-torvalds-on-community-rust-and-...


> $(language of the week)

Rust is 10 years old. See, knee-jerk reactions like this are exactly why many people have trouble taking the anti-Rust arguers seriously. It's like you people are defending a religion or something. How about we stick to facts?

