Linus Torvalds on Rust support in kernel (lkml.org)
628 points by EvgeniyZh 9 months ago | 290 comments



The title is slightly misleading, and so is Linus' response here. This RFC never claimed to be in shape to be immediately mergeable into the mainline kernel as-is. Miguel (the author of the patch) has replied to this mail (and in other places in the thread) that the Rust alloc() is currently called only as a temporary measure to speed up development, and all panic() calls from allocation failures are just as temporary. This is all because the Rust code that hooks into the kernel memory allocation functions is not yet usable.

The main point of this RFC is that "the [in-kernel Rust] support is good enough that prototyping modules can start today." There's no point in making long arguments about alpha-quality design shortcuts on an alpha-quality prototype that are also explicitly mentioned by the patch authors to be of alpha quality.

See e.g. https://lkml.org/lkml/2021/4/14/1130 and https://lkml.org/lkml/2021/4/14/1023

EDIT: Thanks for the child comments; it seems that Linus is simply not aware of all the specifics and is asking for more information and/or decided to look at the code before reading the full mail thread.


I don't really see anything misleading in the post. Linus says that the comments/responses are from a position of ignorance and it seems like he's just seeking understanding.

If anything is misleading it's linking to random emails in lkml without context. :-P (Maybe that's what you were getting at).

Personally, I'm very excited about Rust in the kernel.


> Personally, I'm very excited about Rust in the kernel.

Why?


Speaking for myself and not the person you replied to

- It would make me 10x more likely to work on the kernel, just because I enjoy programming in rust more than I enjoy programming in C. (At least given my current employment, any kernel contributions would be on my own time)

- It would give me more faith in the security of other code people are contributing. Things like Binder in C strike me as pretty scary components of Android from a security perspective; I'm much more comfortable running the same thing written in Rust.

- I think it would generally reduce the number of kernel bugs in components written in it. Kernel bugs are rare, but very frustrating when you encounter them.

- I think it would increase the general productivity in kernel development. A better kernel helps everyone out.

- It acts as validation for rust as a language. Obviously the kernel people shouldn't care about this in the slightest and it's not an argument for including it. However if it is included for other reasons (see above), it does help me argue that "rust would be a good fit for x" in other situations.


> It would make me 10x more likely to work on the kernel

Seconding this. I don't think I would ever touch the Linux kernel with a twenty-foot pole in C. In Rust? Maybe I'd give it a try and contribute one day.

Knowing the exact set of constraints on my code, having them enforced, not having to hold the entire system in my head just to avoid a catastrophic memory error. These things make contributing to a project - especially one as incomprehensibly large and complex as the Linux kernel - orders of magnitude easier.


it's not that bad writing kernel code - just remember if you get anything wrong you'll hose your filesystem beyond repair ;-)


what about interacting with the existing kernel code base? would data coming back from the kernel into rust space need to be wrapped to provide safety guarantees? or would it be necessary to turn safety features off?

a bit confused about this.


So... it doesn't really impact this discussion other than "shouldn't be an issue". I'll try and give a summary of what's happening technically, but HN is frankly the wrong forum for a "how to use the C ffi in rust" tutorial. Also a big disclaimer that I don't know what precisely this project is doing, so I'm talking about C/Rust projects in general.

Rust speaks the C ffi really well. It can call external functions that follow the C abi the same way it can call native unsafe functions [1]. You can tell it to lay out a struct the same way that C does, etc.

Because calling the C abi requires unsafe code, it's common to provide wrappers around the C abi that are safe against misuse. I.e. that make it so that the only way to call C functions is the correct way. This is doing things like making it so the only way to get a `struct` is to call a (safe) `new` function that calls the (unsafe) C initializer internally, and exposing the C "methods" on that struct (that expect it to be initialized) as safe methods on the rust struct that internally call the unsafe C functions (and they can do so because they know the struct has been initialized). Obviously for any particular C api you have to look at what it requires to be called safely, and then figure out how to encode it in the type system, but that's usually surprisingly easy.
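To make the wrapping pattern concrete, here's a minimal sketch. It assumes a hypothetical C API (`RawCounter`, `counter_init`, and `counter_add` are made up); to keep it self-contained, those "C" functions are Rust stand-ins using the C ABI. `#[repr(C)]` gives the struct C's layout, the safe `new` is the only way to get a `Counter`, and so the safe `add` method can call the unsafe function knowing the struct is initialized.

```rust
use std::mem::MaybeUninit;

/// C-layout struct, as a (hypothetical) C header might declare it.
#[repr(C)]
pub struct RawCounter {
    value: u64,
    limit: u64,
}

// Stand-ins for the hypothetical C functions. In a real binding these would
// be declared in an `extern "C"` block and linked from the C side.

/// Safety: `c` must point to writable memory for one `RawCounter`.
unsafe extern "C" fn counter_init(c: *mut RawCounter, limit: u64) {
    c.write(RawCounter { value: 0, limit });
}

/// Safety: `c` must point to a `RawCounter` set up by `counter_init`.
/// Returns 0 on success, -1 if the limit would be exceeded (C-style status).
unsafe extern "C" fn counter_add(c: *mut RawCounter, n: u64) -> i32 {
    let c = &mut *c;
    match c.value.checked_add(n) {
        Some(v) if v <= c.limit => {
            c.value = v;
            0
        }
        _ => -1,
    }
}

#[derive(Debug, PartialEq)]
pub struct LimitExceeded;

/// Safe wrapper: owning a `Counter` proves the C struct was initialized.
pub struct Counter {
    raw: RawCounter,
}

impl Counter {
    /// The only way to obtain a `Counter`, so the C initializer always runs.
    pub fn new(limit: u64) -> Counter {
        let mut raw = MaybeUninit::<RawCounter>::uninit();
        // SAFETY: `raw` is valid, writable storage for one `RawCounter`, and
        // `counter_init` fully initializes it before `assume_init`.
        unsafe {
            counter_init(raw.as_mut_ptr(), limit);
            Counter { raw: raw.assume_init() }
        }
    }

    /// Safe method over the unsafe call; the status code becomes a `Result`.
    pub fn add(&mut self, n: u64) -> Result<(), LimitExceeded> {
        // SAFETY: `self.raw` was initialized in `new`.
        let rc = unsafe { counter_add(&mut self.raw, n) };
        if rc == 0 { Ok(()) } else { Err(LimitExceeded) }
    }

    pub fn value(&self) -> u64 {
        self.raw.value
    }
}
```

Usage: after `let mut c = Counter::new(10);`, `c.add(4)` returns `Ok(())`, and a following `c.add(7)` returns `Err(LimitExceeded)` without changing the value. Callers never write `unsafe` themselves.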

Calling rust from C doesn't really require any "unsafe" code (other than the fact that C is basically a giant unsafe block by nature), because the assertion that you're calling it correctly happens on the C side of things, not the rust side of things. Just like rust can call C abi functions, rust can make its functions follow the C abi by simply saying

   extern "C" fn foo()

instead of

   fn foo()

But many of the data structures you might pass from C to rust will need a wrapper to use "safely". E.g. if I pass a doubly linked list, it's going to need raw pointers more or less by nature (at least if rust wants to be able to mutate it), and someone is going to need to do a similar wrapping thing where they expose some functions that correctly work with the list, that internally use unsafe, but expose a safe api.
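A small sketch of the other direction, with made-up names (`add_u32`, `Point`, `point_sum`). A C-ABI function that takes plain values needs no `unsafe` at all. One that takes raw pointers from C can check for NULL, but must trust the caller for the rest; marking it `unsafe extern "C"` records that contract for Rust callers, while C sees the same ABI either way. A real export would normally also carry `#[no_mangle]` (written `#[unsafe(no_mangle)]` in the 2024 edition) so C can link to it by name; it's left off here so the sketch compiles identically under every edition.

```rust
/// C-compatible layout, so C code can pass a `struct point *`.
#[repr(C)]
pub struct Point {
    pub x: i32,
    pub y: i32,
}

/// Plain values in, plain value out: callable from C, and no `unsafe`
/// is needed anywhere in the body.
pub extern "C" fn add_u32(a: u32, b: u32) -> u32 {
    a.wrapping_add(b)
}

/// Pointers coming from C: check what can be checked (NULL), then rely on
/// the caller for the rest. Returns 0 on success, -1 on a NULL argument.
///
/// Safety: a non-null `p` must point to a valid `Point`, and a non-null
/// `out` must point to writable memory for one `i64`.
pub unsafe extern "C" fn point_sum(p: *const Point, out: *mut i64) -> i32 {
    if p.is_null() || out.is_null() {
        return -1;
    }
    let pt = &*p;
    // Widen before adding so the sum of two i32 values cannot overflow.
    *out = pt.x as i64 + pt.y as i64;
    0
}
```

The out-parameter plus status-code shape is just the usual C convention; a safe Rust wrapper on top would turn it back into a `Result`.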

[1] What unsafe means here is that the compiler doesn't know that the function is safe to call, so you have to tell it "I checked and how I'm using it is fine" by putting the call inside an unsafe block. This looks like the following. Note that you can also have unsafe native rust functions (e.g. if you want to index an array without checking the array bounds, that's an unsafe function implemented in rust).

    unsafe {
        c_function_here(arg1, arg2)
    }
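To make footnote [1] concrete: `slice::get_unchecked` is one of those unsafe functions implemented in rust. In this sketch (the helper `sum_prefix` is hypothetical) the bound is checked once up front, and the `unsafe` block carries a comment saying why the call is fine.

```rust
/// Sums the first `n` elements of `v`, or returns `None` if `n` is out of
/// range. The bound is checked once, so the loop can skip per-index checks.
pub fn sum_prefix(v: &[u64], n: usize) -> Option<u64> {
    if n > v.len() {
        return None;
    }
    let mut total = 0u64;
    for i in 0..n {
        // SAFETY: i < n <= v.len(), established by the check above.
        // wrapping_add wraps on overflow instead of panicking.
        total = total.wrapping_add(unsafe { *v.get_unchecked(i) });
    }
    Some(total)
}
```

If the `n > v.len()` check were removed, the code would still compile, which is exactly the point: inside `unsafe`, the compiler takes your word for it.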


thank you for this. this helps.

but let's say one is writing a filesystem in rust, so you're implementing most of the functions in "struct file_operations", and moreover you are passing "struct inode", "struct page" etc ... back and forth between c and rust. with such heavy-handed interaction, aren't we basically doing c in rust by necessity of the interface? by which i mean "unsafe" the way you defined it?

are there examples where you see a clear win?


You'll have to excuse a bit of unfamiliarity with linux internals here, I'm taking a guess, but I expect that filesystems are an example where you would see a clear win.

My assumption would be that a file system is calling the same methods on a few different objects repeatedly. E.g. "read me some bytes from this page" or "get the id of this inode". For each of these APIs you once write a small amount of unsafe code that encodes into the type system "and this is how you can call it safely", and then you repeatedly get to make use of that code with guarantees that you aren't making any mistakes that are too terrible (logic bugs still exist obviously, which on a file system could delete or corrupt files, but you aren't going to corrupt some random kernel memory by accident). That's a pretty big win in my mind.

Meanwhile file systems probably include a lot of non-ffi things I think rust is substantially better for too. Like handling a ton of different errors (oh no, the disk failed to give me bytes. Oh no, these bytes make no sense. etc) in the code's "happy"(ish) path. And like parsing data structures out of bytes (correctly). Tracking exclusive access to various resources. Implementing compression algorithms. Etc.
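A sketch of the parsing-and-error-handling point, for a made-up on-disk header (the magic number, 12-byte layout, and field names are hypothetical and don't correspond to any real filesystem). Each failure mode gets its own error variant, and `?` hands it back to the caller instead of crashing.

```rust
/// Errors the (hypothetical) header parser can report.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    TooShort { needed: usize, got: usize },
    BadMagic(u32),
    ZeroBlockSize,
}

/// Hypothetical 12-byte header: magic, block size, block count (all u32 LE).
#[derive(Debug, PartialEq)]
pub struct Header {
    pub block_size: u32,
    pub block_count: u32,
}

const MAGIC: u32 = 0x4653_5846; // made-up magic value

/// Reads a little-endian u32 at `off`, or reports how many bytes were missing.
fn read_u32_le(buf: &[u8], off: usize) -> Result<u32, ParseError> {
    let b = buf
        .get(off..off + 4)
        .ok_or(ParseError::TooShort { needed: off + 4, got: buf.len() })?;
    Ok(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

pub fn parse_header(buf: &[u8]) -> Result<Header, ParseError> {
    let magic = read_u32_le(buf, 0)?;
    if magic != MAGIC {
        return Err(ParseError::BadMagic(magic));
    }
    let block_size = read_u32_le(buf, 4)?;
    if block_size == 0 {
        return Err(ParseError::ZeroBlockSize);
    }
    let block_count = read_u32_le(buf, 8)?;
    Ok(Header { block_size, block_count })
}
```

With a 10-byte buffer, for example, the magic and block size parse but the block count doesn't, so the result is `Err(ParseError::TooShort { needed: 12, got: 10 })` rather than an out-of-bounds read.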

The case where you would see the sort of issue you're discussing is where all the code is doing basically unique ffi calls, so you don't get any reuse out of safe abstractions. I don't know of any great examples of this, maybe things like boot sequence code where you're running a lot of unique things exactly once to initialize the hardware?


thanks gpm for taking the time. let's see how it pans out. rust is definitely interesting.

now let me not impose on your kindness further and go learn a little rust.


Networking, especially wireless (as it's more complex and potentially more dangerous: the attacker doesn't even need a wire).

Google is developing a Bluetooth stack in Rust [1].

[1] https://blog.desdelinux.net/en/google-desarrolla-una-nueva-p...


hmmm ... good point.


That is what this patch series is about. It will require using unsafe code to some degree, yes.


>- I think it would increase the general productivity in kernel development. A better kernel helps everyone out.

Don't you think the massive increase in compilation time would negate any productivity gains and probably decrease productivity overall?


"Productivity" is notoriously hard to measure in software.

While slow compile times may slow you down, and hence reduce your productivity, if the compiler prevents hard to fix bugs, it still may increase your overall productivity. Consider things like https://hacks.mozilla.org/2021/04/eliminating-data-races-in-...

> Overall Rust appears to be fulfilling one of its original design goals: allowing us to write more concurrent code safely. Both WebRender and Stylo are very large and pervasively multi-threaded, but have had minimal threading issues. What issues we did find were mistakes in the implementations of low-level and explicitly unsafe multithreading abstractions — and those mistakes were simple to fix.

>

> This is in contrast to many of our C++ races, which often involved things being randomly accessed on different threads with unclear semantics, necessitating non-trivial refactorings of the code.

Maybe the C++ was faster to compile, but in the end, fixing these issues took more time. There were more of them, and they were harder to track down.

Nobody truly knows the answers to these questions in the general case yet, of course. My point is just that "slow compile == bad productivity" is not inherently true.

Faster compile times are, of course, always desired no matter what.


> "Productivity" is notoriously hard to measure in software.

We can look at history and results. In these terms, C (and perhaps C++ to some extent) is, I believe, the only productive programming language for making the low level parts of non-experimental kernels.

As far as it can be publicly measured, Rust so far has proven itself for application programming in a certain niche and not much more. It would be cool if it could prove itself in kernel space too -- we certainly need fewer system crashes caused by bad kernel-level code. Curiously though, it has been a very long time since I've last bumped into such a thing in Linux. This makes me suspect that Rust is trying to fix a problem here that's already been fixed in another way.

As for ease of coding, C seems like a massively easier language to learn than Rust. But I might be wrong there. Any data about that, I wonder?


Yes, that is true. But the only way to get that data is to do it. This work is one part of doing that. Someone has to be first :) (There are of course a ton of kernel-level things in Rust that don't pass the "non-experimental" bar for various people. As always, depends on exactly what you mean.)

> Any data about that, I wonder?

Possibly one of the only things harder to measure than productivity is ease of learning, haha! I had programmed in C for decades before Rust even existed. We do have a lot of people who say that they think Rust was easier to learn for them than C was. And of course many who believe the opposite. Not sure anything is conclusive in any direction. For example, it's quite possible that some people find C easier, and some people find Rust easier, and there will never be a clear winner.

Time will tell.


I've got some vague opinions about "easier to learn" that I'd like to hear some disagreement on, to help me work out my thoughts more. Please forgive me if I'm not very clear here.

I don't know if this is what you mean or not, but I've seen a lot of claims that a language is "easier to learn" that seem to be considering "learning a language" as a valuable topic on its own, separate from "learning to write and maintain correct nontrivial programs in the language", and that seems wrong to me.

There's a part of this idea that does seem valuable to me, in that at the beginning of your learning process, there are a lot of benefits from being able to quickly get to a point where you can successfully write small programs that do something. It helps your motivation. It helps you reach some amount of productivity faster. When a language is better at early onboarding, it's more-useful to people who have smaller needs and more-constrained use-cases. Python being so easy to learn to glue together some libraries makes it a fantastic, valuable tool for many people.

The part of this idea that I really disagree with is how it applies to non-trivial, non-beginner use-cases. There are topics and skills that languages vary in their coverage of, but that you still need to learn about and deal with anyway for many types of programs: memory management, resource handling, ownership and sharing, concurrency, nullability, error handling, composition, organization, abstraction, refactoring, testing, debugging, etc. A language directly addressing more or fewer of these topics doesn't necessarily change whether you need to learn them.

To me, the relevant question isn't "Which language is easier to learn in isolation?", but instead "Which language is easier to learn to implement safe, performant, efficient, reliable, concurrent code with?".

If you take a new engineer who has "learned C", how easy is it to train them to get their rate of memory safety errors, thread safety errors, missed error checking, etc. down to the same rate as you'd get from a new engineer who has "learned Rust"?

Without tooling support like you get from Rust, you instead need to learn safe idioms, learn strategies to minimize your exposure to errors, train yourself to always always check everything at all times, learn how to write tests to discover mistakes you've made, learn how to use a collection of third-party tools you can use to approximate some of the benefits of Rust's compile-time checking, and train yourself to always use it. That's not "learning C", but it's still required in order to implement something like Linux.

Rust's bet is that there are ways to reduce the overall complexity of everything involved in implementing high-reliability high-performance systems by moving some of that complexity into the language. If you don't think it's accomplishing that goal, that's fine, but make that case directly.

I agree that the C programming language is smaller and easier to learn in isolation. It's not so obvious to me that something like "C + Valgrind + ASan + TSan + UBSan + ..." is easier to learn than Rust.

On the other hand, for many classes of errors that C offers no help with, Rust's compiler will directly point out where you've made a mistake, why it's wrong, and often offers advice on how to fix it. When learning a new language, having that kind of tooling support universally available is extremely helpful.

To be clear, Rust doesn't handle everything, and there's still a lot of benefit you can get from dynamic analysis tools, fuzzing, etc. There are also levels of reliability and assurance that aren't currently feasible with Rust. Rust has a long way to go.

The point I think I'm trying to make is that Rust really raises the bar in a meaningful way. There's some nonsense and awkward bits in Rust, but a lot of what you need to learn to be effective with Rust are things that you'd need to learn anyway to be effective at this level with C, and I think it's easier to learn those with Rust's help, and it's significantly easier to build systems with a much lower rate of these problems by using Rust.

Sorry for the length, and lack of organization. This has been rattling around in my head for a while, and I wanted to get some thoughts out in writing.


This will also be a bit messy, as I am on my phone and don't have much time to write this down.

I have 10 years of writing Java on both small hobby projects and massive systems on my job. I dipped my toes into C/C++ a few times and I can write hobby project sized code that runs, but I can never really trust that code like I can the Java code that I write.

I had this issue especially when writing my first TeamSpeak plugin: I could never be certain if I had to free the data or if TeamSpeak would do it for me after my function was called. I had to look into the documentation, as TeamSpeak is closed source.

About one and a half years ago, I finally sat down and learned Rust. I had tried to learn it several times before and always ran out of motivation.

This time, I got to hobby project level proficiency within a month or two. It was really fun and I rewrote several Java Spring projects in Rust's Rocket.

As Rocket starts just about instantly, even with the long compile times, the Rust rewrite takes less time to compile than the Spring projects took to start up.

Coming to the point: I am already more confident about my Rust code than I ever was about C and would even go so far as to say that I trust it more than my Java code, despite the big difference in experience.

I believe that I am ready to jump into a big Rust codebase and become productive as soon as I have understood the business-logic of the project in question.

I doubt I will be able to say the same about C/C++ in the near future.


100% this. People forget that when using dynamic languages they are trading lower up-front cost for later cost: it's easier to write the code but harder to test. In trivial or exploratory coding the tradeoff can be good, but it is a tradeoff.

That being said, using rust can be really nice for exploratory coding. If you don't worry about edge cases (use unwrap()/panic!) and don't worry about memory efficiency (use clone()), it still produces fast, memory-efficient code.


No, because

- I don't think it will be massive.

- I especially don't think it will be that large for incremental builds, which is the main thing that matters. (But I'm not involved in this project, so I don't actually know how incremental the builds are...)

- My experience is that the vast majority of programming time is spent fixing mistakes, not compiling. Rust reduces the amount of time spent fixing mistakes a lot more than it increases time spent compiling.

- Rust moves many errors early in the compilation process (instead of when you try and test your code), which reduces iteration time instead of increasing it.

I'm not involved in this project, but I imagine it's at a point where you could get some numbers for the fixed overhead that adding rust adds to the compile times. I'd be interested in seeing those numbers.


Have you compiled the Linux kernel before? I doubt Rust will be the bottleneck, it's a massive project - a tiny fraction being in Rust will be a blip.


Pretty frequently, yes. It doesn't take that long on modern, moderately-powered devices. Even when I was using a mid-tier device from a decade ago, compilation time was still around half an hour, which was less than most Rust projects I've encountered take on modern and reasonably high-end hardware, despite the kernel doing much more and being much larger.


Yep, that sounds exactly right to me - about 30 minutes on older hardware. It's very odd to me that you have 30-minute rust build times; as someone who works on a rust project professionally, with 10KLOC, that isn't my experience at all. If Rust gets into the kernel I would expect it to account for <1% of the code, so even if it were 100x slower to compile, which it isn't, I don't see it having an impact.


I wrote 10kLOC last week. Most of the code didn't require much extra thinking, it was mostly a translation of an old TypeScript project to Zig. But measuring compile times with 10kLOC is really not a good argument.


Well, 15KLOC, and of course not including dependencies. But my point was that there will be a tiny, tiny amount of Rust in the kernel by comparison to the 10s of millions of lines of C code. Rust would have to compile radically slower than C, like hundreds or thousands of times slower, in order to be a limiting factor.


Man, 10KLOC is a very small project. Obviously it compiles quickly. Linux kernel is almost 30 million lines of code.


Yes... exactly. The Linux Kernel is 30 million lines of code - so how exactly will some minuscule-by-comparison amount of Rust code slow down compile times considerably?


If it's always going to be minuscule then why bother? If it has the potential to grow to something not minuscule, then it's important that it compiles quickly.


1. Even if we start today, and all new code is written in Rust, it will be years before Rust has an impact on compile time.

2. A relatively small portion of code can account for the majority of exploited vulnerabilities.

3. Nothing about Rust is fundamentally slow with regards to compile times. There's plenty of time to work on it, and we've already seen significant efforts make headway.

Compile times are a nonissue in any practical sense for this work.


I've been coding predominantly in rust since 2018 and have never encountered a build time > 10min.


Worked at a place where our C# build times for a large project, many millions of LOC, were 40 mins or so.


If you are talking speed to production-ready code then rust is really productive. The rust tooling picks up a lot of errors and leads you to spend more time fixing coding issues rather than compiling to test your code.


How much is 10 x 0? ;)


What makes you assume the probability of contributing is 0 though


> It would make me 10x more likely to work on the kernel, just because I enjoy programming in rust more than I enjoy programming in C

That sound was a thousand people nodding in unison


And it would be a million people if it were Python. That doesn't make it a good idea to encourage submissions to the kernel in Python. There is much more to kernel programming than the programming language. C might be inconvenient and lacking expressive power, but if people can't handle pointers and goto, what mess are they going to make with different address spaces and interrupts?


> C might be inconvenient and lacking expressive power, but if people can't handle pointers and goto, what mess are they going to make with different address spaces and interrupts?

I have terrible news for you. People (well, humans, and I don't see anybody else signing up to maintain Linux) cannot in fact correctly handle pointers and goto. That's why they keep making mistakes.

It's actually to be hoped that Rust can usefully extend the kind of constraints it uses today to prevent some of those problems to things like address spaces. It'd be great if, say, driver code which confuses a virtual address with a physical one just won't compile, rather than compiling and then mysteriously not working as expected or causing occasional BUG() reports.
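A minimal sketch of what that could look like with newtypes. `PhysAddr`, `VirtAddr`, `phys_to_virt`, `program_dma`, and the offset constant are all hypothetical, not the kernel's actual types or memory layout. Since the two address kinds are distinct types, passing a virtual address where a physical one is expected fails at compile time, and the wrappers add nothing at runtime beyond the `u64` they hold.

```rust
/// A physical address (hypothetical newtype, not a real kernel type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct PhysAddr(pub u64);

/// A kernel virtual address (hypothetical newtype).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct VirtAddr(pub u64);

/// Assumed linear-map offset, made up for illustration only.
const LINEAR_MAP_OFFSET: u64 = 0xffff_8000_0000_0000;

/// The one sanctioned conversion; returns `None` if the result would overflow.
pub fn phys_to_virt(p: PhysAddr) -> Option<VirtAddr> {
    p.0.checked_add(LINEAR_MAP_OFFSET).map(VirtAddr)
}

/// A hypothetical driver API that must be handed a physical address,
/// e.g. when programming a device's DMA engine. Here it just returns it.
pub fn program_dma(addr: PhysAddr) -> u64 {
    addr.0
}

// This would be rejected at compile time (mismatched types):
// program_dma(VirtAddr(0x1000));
```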


> but if people can't handle pointers and goto

Mitre shows that in fact people can't handle pointers and goto


He doesn't mention that it's an alpha feature


There are literally millions of things he didn’t say. He did however express a requirement for acceptance. That’s the message, if you want this in the kernel, it has to never call panic() at run time. Why? Because kernel crashes are unacceptable for the types of deployment Linux is used for.


I'm not sure he means never panic at runtime. I think there are good reasons to panic at runtime if you care about safety; a buffer overflow is a good reason, for instance.

But a memory allocation failure is clearly not a good reason.


It would be hard for him to be clearer:

> With the main point of Rust being safety, there is no way I will ever accept "panic dynamically" (whether due to out-of-memory or due to anything else - I also reacted to the "floating point use causes dynamic panics") as a feature in the Rust model.


WRT floating-point, worth noting for those who haven't read the patches:

- Kernels generally don't want to use floating-point, because saving and restoring the floating-point registers is fairly expensive.

- Without some pretty aggressive hacking, it's not possible to remove floating-point support from Rust.

- What you /can/ do (and what these patches do) is replace all the floating-point builtins with kernel panics.

- This obviously sucks versus actually removing the floating-point, but this is an RFC.


There is some work in progress to make it easier to tell Cargo to build `core`/`alloc`, which in turn would allow projects to maintain patches atop them, like disabling floating-point or OOM-panicking APIs entirely. I suspect that this effort will get a great deal of motivation from being a kernel requirement.


I think the gist of what he is saying is typical application programming patterns like crash on exception, gc, etc. are not a good fit for kernel programming. And I agree. Handle the request or return an error and let higher level code handle the error processing.
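For what "return an error" looks like with allocation: userspace Rust's standard library already has `Vec::try_reserve`, which reports failure as a `TryReserveError` instead of taking the infallible path that kills the program. A sketch with made-up helpers (`copy_request`, `reserve_huge`); in-kernel code doesn't use `std`, so this only shows the shape of the API, not what the kernel bindings provide.

```rust
use std::collections::TryReserveError;

/// Copies `data` into a new buffer, handing allocation failure back to the
/// caller as an error instead of crashing (hypothetical helper).
pub fn copy_request(data: &[u8]) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = Vec::new();
    buf.try_reserve(data.len())?; // Err on capacity overflow or allocator failure
    buf.extend_from_slice(data); // capacity is already reserved
    Ok(buf)
}

/// Asking for an impossible amount fails cleanly: `true` means we got an
/// error back rather than a crash.
pub fn reserve_huge() -> bool {
    let mut v: Vec<u8> = Vec::new();
    v.try_reserve(usize::MAX).is_err()
}
```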


An out-of-memory allocation panic is absolutely not acceptable in a kernel, and even less so when you're Linux (which does some sneaky memory overcommit things).


But it looks like Linus doesn't know the specifics; he is asking for more info while at the same time making it clear enough that if his concern cannot be addressed then there is no point working on the Rust integration until it is fixed.


Anyone thinking that Rust is ready to be used in the kernel, just read the post first. It is very short. Then read these comments to look for what you need answered.

For me, the main point is Can Rust be written to guarantee that no oom (or other hard fail 128-bit math) panic occurs that is not under control of the written code? I want the answer to be yes and also want to see how (which isn't far off from what Linus is asking).


> Can Rust be written to guarantee that no oom

Yes, easily. OOM is entirely a library-created concept in rust: just don't use the standard library (or the `alloc` subset of the standard library) and you don't have OOMs...

> (or other hard fail 128-bit math)

128 bit math, and floats, need to be avoided by just "not using them", the same as for floats in C code in the kernel...


What's the issue with floats? Everyone is casually talking about this issue like it's common knowledge. Searching online for floating point panics doesn't really bring up any information besides bug reports.

Is it something about floating point operations not actually being fixed-sized?


In a modern pre-emptive multitasking operating system, the kernel needs to be able to stop what your user task was doing, do its own thing for a while or even run a different task entirely - and then put everything back apparently as it was and allow your task to carry on. The CPU affords this capability by providing a way to bottle up all its internal state, store that somewhere, and then put it back later.

Operating system kernels don't generally need floating point math (some of them use it anyway, lots don't).

So if your CPU has a way to say "Bottle your state, but er, don't worry about the floating point stuff" and it's faster, or uses less memory, or both, which are common, all the kernels (like Linux) which do not use floating point know they aren't touching that anyway and needn't bottle it up. This is potentially an important performance win.

If you try to take this performance win, but then you actually do use floating point in the kernel, the world suddenly changes beneath the feet of user tasks. A program is adding up some floating point numbers, and then, huh, suddenly the total is now negative? Wait, now it's zero? Nope, negative again? What's happening! If the program was aware of being interrupted this makes sense, but the whole point of pre-emptive multi-tasking is not to need to custom design every program to be interrupted everywhere.

So Linux (mostly) never uses floating point.


To say what tialaramex said in a different way, the OS needs to save all the user space registers to the stack when it interrupts a task, so it can do its own thing. Floats are held in separate registers, so as an optimization it can skip saving those registers to the stack and just not use them (as long as it doesn't call any code that might use floats, and returns back to the same task afterwards). This turns out to be a substantial performance improvement, because there are lots of small syscalls that return quickly.

128 bit integers not working is a bit dumber, the compiler relies on some functions being implemented for them to work, and for whatever reason the people behind rust in linux just haven't implemented those functions (yet?). They use the normal (non floating point) registers though, so there is nothing stopping them from being implemented. https://github.com/Rust-for-Linux/linux/issues/11#issuecomme...
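As an illustration that 128-bit arithmetic can be built from ordinary integer registers: a 64-bit by 64-bit multiply with a full 128-bit result, using only 64-bit operations by splitting each operand into 32-bit halves. This is a generic schoolbook sketch (`mul_wide` is a made-up name), not the code of any particular compiler runtime library.

```rust
/// Multiplies two u64 values into a 128-bit result, returned as
/// (high, low) 64-bit halves, using only 64-bit integer operations.
pub fn mul_wide(a: u64, b: u64) -> (u64, u64) {
    const MASK: u64 = 0xffff_ffff;
    let (a_lo, a_hi) = (a & MASK, a >> 32);
    let (b_lo, b_hi) = (b & MASK, b >> 32);

    // Four 32x32 -> 64-bit partial products; none of these can overflow.
    let ll = a_lo * b_lo;
    let lh = a_lo * b_hi;
    let hl = a_hi * b_lo;
    let hh = a_hi * b_hi;

    // Middle 32-bit column: at most about 3 * 2^32, so it fits in a u64.
    let mid = (ll >> 32) + (lh & MASK) + (hl & MASK);
    let low = (mid << 32) | (ll & MASK);
    // High half collects the upper parts plus the carry out of `mid`.
    let high = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
    (high, low)
}
```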


I would like to know about this too. Did you find any interesting article?


You got two good answers, but there's also now some pressure to make these scenarios better (say, "compile error if you use an i128" rather than "just don't use an i128"), which is nice. I'm glad the kernel's needs are putting some pressure on Rust to improve.


Yes, don't use the std lib (or more specifically the alloc crate) and don't use types like floats and u128.


Is it possible to use Box (which is special cased in many places in the language and thus cannot be replaced, unlike for example Arc) without the alloc crate?


a replacement to `alloc` would define a `Box` structure and tag it as `#[lang = "box"]` but not define any crashy APIs. this would accomplish the goal, but is an annoying patch that `alloc` should work to obviate upstream


What I would love to see is Rust in the kernel, with a caveat. That caveat is something akin to Erlang's OTP supervision trees. Erlang has a similar philosophy to Rust in that if something erroneous happens the process fails. The difference is that Erlang OTP is designed to have supervisor processes of the worker processes. The only thing the supervisors do is monitor the processes under them and control when, if, and how those processes are restarted on failure or other conditions. The supervisors themselves have supervisors, right up to the root supervisor for each Erlang application.

Rust got the language and the tooling perfect, but Erlang got the services and service infrastructure perfect. The more I think about it the more I think I should shut up and put up. In other words, apply my knowledge of Erlang and OTP to create a gen_server in Rust as a jumping-off point for an OTP-like Rust framework, perhaps called OARS (Open Advanced Rust Services). This is definitely bigger than one person. If you'd like to join me on this journey then reply to this comment and I'll send you project details by the end of the weekend.
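For readers unfamiliar with OTP, here's a very small userspace sketch of the supervisor idea: the supervisor runs a worker on its own thread, notices when it panics (`JoinHandle::join` returns `Err`), and restarts it up to a limit. The name `supervise` and the "restart the same worker, with a max restart count" policy are simplifications assumed for illustration; real OTP supervisors do much more, and as the replies point out, this sort of thing lives above the kernel, not in it.

```rust
use std::sync::Arc;
use std::thread;

/// Runs `worker` on a fresh thread, restarting it after each panic, up to
/// `max_restarts` times. Returns the worker's result plus the number of
/// restarts it took, or `None` if every attempt crashed.
pub fn supervise<F>(max_restarts: usize, worker: F) -> Option<(u32, usize)>
where
    F: Fn() -> u32 + Send + Sync + 'static,
{
    let worker = Arc::new(worker);
    for restarts in 0..=max_restarts {
        let w = Arc::clone(&worker);
        // A panic in the worker thread unwinds that thread only and shows up
        // here as `Err` from `join`; the supervisor itself keeps running.
        match thread::spawn(move || (*w)()).join() {
            Ok(value) => return Some((value, restarts)),
            Err(_) => continue, // worker crashed: start a fresh one
        }
    }
    None
}
```

Note this relies on panics unwinding; with `panic = "abort"` (a common choice in kernel-like settings) there is nothing for the supervisor to catch.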


I don't see how one can run and hide from Linus' point. Exception/panic-based work is problematic in libraries, and has no place in kernels. Erlang's OTP supervision trees are, as others have pointed out, a runtime concern that app devs build on. Therefore it's an abstraction that sits above the kernel and is out of scope w.r.t. kernel work.


This is a fair and valid point. On reflection, what I am looking for is not something that would be in the kernel. It's more something that is an added layer on Rust, quite possibly with compiler support, to provide for OTP style services and supervisors.


Doing this well requires a heavy enough runtime that it would disqualify it for this kind of work. Not on a technical level, but on a social one. While there have been operating systems created with this sort of runtime, they're not as well known or successful as ones that haven't. I would imagine this email would be a flat "no" if this were a core part of Rust, sadly.


> a heavy enough runtime

Hey, asking out of pure ignorance: why do you think this? My naive point of view is that the Linux kernel already has enough of a runtime to support such a thing: a scheduler, kernel threads, and interrupts.

An actor system shouldn't be particularly heavy - you can probably implement an actor in just a couple of bytes.

Agreed that it couldn't be a core part of Rust though.

Maybe it's too off topic or whatever, just curious.


Erlang makes heavy use of green threads to do this kind of work. You spin up thousands or hundreds of thousands of these. Kernel threads are too heavy weight to do so. And making them lighter weight has tradeoffs too. Erlang makes use of a GC and immutability to make tasks more restart-able; maybe Rust's memory safety features would let you do this sorta kinda, but I don't think it's been really demonstrated fully yet.

Like sure, you could build out some of these features, but they only truly work really well if you extremely commit to the architecture, in my opinion. And the kernel isn't about to do that.


Oh yeah, I 100% do not believe this would be something that ships to the kernel :)


From what I remember, the Linux kernel doesn’t even use the C standard library. It’s pretty much straight-up C with no dependencies (the OS being low level). That makes kernel module programming difficult.


The Linux kernel is freestanding. But that's hardly the reason why kernel programming is difficult.


putting my marker on 2030 as implementing a tokio runtime in kernelspace


I don't really know much about erlang, but I think this may be along the lines of what you are thinking of: https://github.com/bastion-rs/bastion

(I also don't really think the linux kernel people would be interested...)


I don’t understand what you’re getting from this. Crash looping can still happen in OTP; the root supervisor can still die if the crash threshold is met in a small window. This would also be very heavyweight. IIUC the issue is not that errors occur, but that errors occur (and panic) and cannot be handled.


You may want to write a proper user story for this and submit it to https://blog.rust-lang.org/2021/04/14/async-vision-doc-shiny... (mentioned in the latest "This Week in Rust" development summary).


Why not write an entire OS in a message-passing VM based language with garbage collection? No crashes, ever!

In fact, you can run Erlang directly on raw metal, or write an OS in Erlang: http://www.erlang-factory.com/static/upload/media/1498583896...


Because borrow checking > GC. When you have a GC, you need several times more memory to run the same program with the same performance as without -- and usually, you have to say goodbye to any sort of determinism in execution time, as well.

What Rust brings to the table is guaranteed memory safety without GC, and all memory is released in strictly deterministic time.

So for an OS, it's much better to bring the good bits of Erlang to Rust.


That is a good idea, but one thing I would advise, having both seen several attempts made at this sort of thing and having made one myself [1]: try very hard to separate the accidental things Erlang brings to the idea from the fundamental things Erlang brings to the idea. Most attempts I've seen flounder pretty hard on this by porting the exact Erlang supervisor tree idea too directly and grinding against the rest of the language, rather than porting the core functionality in a way that integrates as natively as possible with the language in question.

For instance, one thing I found when I was writing my library that will probably apply to most other languages (probably including Rust) is that Erlang has a somewhat complicated setup step for running a gen_server, with an explicit setup call, a separate execution call, several bits and pieces for 'officially' communicating with a gen_server, etc. But a lot of these things are for dealing with the exact ways that Erlang interacts with processes, and you probably don't need most of them. Simply asking for a process that makes the subprocess "start" from scratch is probably enough, and letting that process use existing communication mechanisms already in the language rather than trying to directly port the Erlang stuff. Similarly, I found no value in trying to provide direct ports of all the different types of gen_server, which aren't so much about the supervision trees (even if that's where they seem to be located) as a set of standard APIs for working with those various things. They're superfluous in a language that already has other solutions for those problems.
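To illustrate "use the communication mechanisms already in the language" rather than porting gen_server wholesale, here is a hedged sketch of a gen_server-style counter built only on `std::sync::mpsc`. `Msg` and `start_counter` are hypothetical names, and the call/cast mapping is an analogy, not a port of Erlang's API.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Messages understood by a hypothetical counter "server".
enum Msg {
    Add(i64),         // fire-and-forget, the analogue of a cast
    Get(Sender<i64>), // carries a reply channel, the analogue of a call
    Stop,
}

/// Start the server; its state lives in a local variable owned by its thread.
fn start_counter() -> (Sender<Msg>, JoinHandle<i64>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        let mut total = 0;
        // The loop also ends on its own if every Sender is dropped.
        for msg in rx {
            match msg {
                Msg::Add(n) => total += n,
                Msg::Get(reply) => {
                    // Ignore the error if the caller stopped waiting.
                    let _ = reply.send(total);
                }
                Msg::Stop => break,
            }
        }
        total
    });
    (tx, handle)
}
```

There is no separate init/handle_call/handle_cast registration: the enum and the channel already cover what those callbacks provide in Erlang.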

In addition to keeping an eye out for features you don't need from Erlang, keep an eye out for features in the host language that may be useful; e.g., the most recent suture integrates with the Go ecosystem's ever-increasing use of context.Contexts as a way to manage termination, which hasn't got a clear Erlang equivalent. (Linking to processes has some overlapping functionality but isn't exactly the same, both offering some additional functionality contexts don't have as well as missing some functionality contexts do have.)

Erlang has a lot of good ideas that I'd love to see ported into more languages. But a lot of attempts to do so flounder on these issues, creating libraries so foreign to the host language that they have zero chance of uptake.

The other thing I'd point out is that even in Go, to say nothing of Rust, crashing is actually fairly uncommon by Erlang standards. Many things that crash in Erlang are statically prevented at compile time in Go, and Rust statically precludes even more of them. However, I have found OTP-esque supervision trees to be a very nice organizational structure for my code; I use suture in nearly every non-trivial Go program I write because it makes for a really nice modular approach to the question of "how do I start and stop persistent services?". I have seen it hold together runtime services that would otherwise be failing, the way it is supposed to, and that's nice, but the organizational structure is still probably the larger benefit.

(There is a deep reason Erlang does it the way it does: a lot of Erlang's type system, or lack thereof, is about communicating between nodes, so even if you program Erlang perfectly, if two nodes running different versions of code try to communicate with each other and the protocol has changed, you might get a pattern-match failure on the messages flowing between versions. The Erlang way of doing cross-machine communication, with this sort of automatic serialization at the language level, has not caught on, and all modern languages have a relatively distinct serialization step where this sort of error is better handled, as you try to deserialize the remote message into your internal data structure.)

Anyhow, the upshot is, you want to translate the functionality out of Erlang into other languages, not transliterate it.

[1]: https://github.com/thejerf/suture


I would love to see the same thing, I always told people around me that Erlang/OTP is more akin to an OS than a traditional programming language. That being said, the key feature to enable what Erlang supports is asynchronous termination, which as far as I know is not possible in regular Rust.


I think the current options for killing a generic task in Rust are either 1) make it an async task, which can be cancelled as long as it doesn't accidentally block a thread, or 2) make it a separate process, and have the OS kill it. Do either of those fit this use case?


You want a microkernel.


I don't know enough Rust and Erlang for helping with that, but that definitively sounds great. What would you use for IPC?


If it's based from OTP we should give it proper credit and call it OTP/OARS.


I love the idea of OARS and would like to help any way I can!


Aside from Linus' reaction, there are some really interesting pearls in that thread, for example:

Regarding code style[1]:

> The more you make it look like (Kernel) C, the easier it is for us C people to actually read. My eyes have been reading C for almost 30 years by now, they have a lexer built in the optical nerve; reading something that looks vaguely like C but is definitely not C is an utterly painful experience.

> You're asking to join us, not the other way around. I'm fine in a world without Rust.

CoC was already brought into battle[2]:

>> I could be mistaken but you seem angry. Perhaps it wouldn't be a bad idea to read your own code of conduct, I don't think you need a browser for that either.

> Welcome to LKML. CoC does not forbid human emotions just yet. Deal with it.

These ([3] [4]) messages have an interesting perspective about maintenance.

> I'm sure about one thing, the C bugs we have today will be fixable in 20 years. I'm not even sure the Rust code we'll merge today will still be compilable in 10 years nor will support the relevant architectures available by then, and probably this code will have to be rewritten in C to become maintained again.

Linus wants to see a real, working kernel driver instead of Android Binder[5]:

>Would there be some kind of real driver or something that people could use as a example of a real piece of code that actually does something meaningful?

[1] https://lkml.org/lkml/2021/4/16/118

[2] https://lkml.org/lkml/2021/4/16/143

[3] https://lkml.org/lkml/2021/4/16/181

[4] https://lkml.org/lkml/2021/4/16/283

[5] https://lkml.org/lkml/2021/4/14/1091


From one of those posts:

> I don't see how the two languages might coexist peacefully without rust toolchain being necessary for building any kernel useful in practice and anyone seriously involved in kernel development having to be proficient in both languages.

I can empathise with that. Just last week I butted heads with an issue in a Python package that requires Rust internally. The lib compiles fine on its own, but something gets screwed up when running in a virtualenv. I opened an issue on GitHub, and nobody has any idea how to get even a detailed log out of the Rust toolchain.

I'm sympathetic about Rust, I really am. But sprinkling it mindlessly everywhere is a big risk.


> The more you make it look like (Kernel) C, the easier it is for us C people to actually read. My eyes have been reading C for almost 30 years by now, they have a lexer built in the optical nerve; reading something that looks vaguely like C but is definitely not C is an utterly painful experience.

I think he makes a good point about the fact that it's certainly possible the Rust code written today won't still compile in 10 years, but writing Rust in C-style seems like a terrible approach. Write using the idioms of the language used.


> I think he makes a good point about the fact that it's certainly possible the Rust code written today won't still compile in 10 years

It’s possible but unlikely. The Editions feature [1] has been specifically designed to provide longevity.

[1]: https://doc.rust-lang.org/edition-guide/editions/index.html


Section 3.16 ("Platform and target support") of the linked document is so inadequate for Linux kernel related questions, such as "what compilation targets will be supported by this Rust Edition in 10 years", that there is nothing to quote to show it's inadequate. It doesn't even tell what compilation targets are supported right now.


The canonical link for platform support is here: https://doc.rust-lang.org/stable/rustc/platform-support.html


There is no commitment to keeping the platforms in tier 1, 2 or 3 for any length of time or to provide advance warning before platforms are degraded, not to mention that the tier 1 list is vastly insufficient (Arm64, x86, x86_64).

Rustc is clearly meant to compile Rust applications, and using it for the Linux kernel, which from the point of view of this document would be narrower than tier 1, is highly experimental.


Where does Linux document how long platforms are guaranteed support? My understanding is there's not either.

(And yes, we'd love to expand the tier list. It's a chicken and egg situation. That being said, I work mostly in tier 2 and 3 targets, and they just vary a lot. The ones I work in work as good as tier 1 targets do. The path to moving forward is bright for many of these.)

> Rustc is clearly meant to compile Rust applications

I don't see what about this implies that at all. As a core team member and someone writing an embedded OS at work, I can assure you that we don't think of Rust as purely an application language.


Maybe I'm unfamiliar with Rust, but if you list OSX, Linux and Windows on identical processors as different platforms they are platforms for applications, not for an OS kernel that runs on "bare metal": it implies that support includes relying on OS provided system calls to manage processes, implement the standard library, etc.


Sure, the standard library on those targets depends on OS services. But you don’t have to use the standard library. Many people do write userspace programs in Rust, just like they do in C. That doesn’t mean that it’s only useful or intended for userspace, just like C.

It takes more work to port an entire standard library than to not.


unless you plan to run `rustc` as a kernelspace application, anything in tier 2 (`core`, `alloc`, maybe `std` all compile and likely pass tests) is sufficient


Personal pet peeve: new people who come into a project without any context and use their own idiosyncratic code style. :-)


I agree, but if it helps creating more and better drivers, it might be a good thing considering the lifetime of the target hardware matches Rust lifetime.


>>> I could be mistaken but you seem angry. Perhaps it wouldn't be a bad idea to read your own code of conduct

A passive-aggressive "you mad bro?" followed by namechecking the coc, all in service of doubling down on antagonizing someone over their choice in workflow? Good grief.


“CoC” seems to suffer from the same problem that “bad cops” and “zero tolerance” politicians seem to suffer from, so that's why it's often brought up.

It seems to be a common thing that those that demand the strictest morality rules also seem to have the most aggression problems and often overstep their own rules, but typically have an excuse ready why in their case it's different.

Though perhaps it simply stands out more when it's someone who chants “code of conduct” or “zero tolerance”. As far as statistics in the Dutch parliament go, it's often pointed out that the parties in favor of lighter punishments and rehabilitation tend to have spotless records, whereas politicians from parties that advocate harsh punishments and zero tolerance quite often have criminal records themselves.


Not sure why this is down-voted. At the very least the people who invoke the CoC are among the most pushy and dominant ones in projects, even when they manage to cloak the dominance so as not to be perceived to be aggressive.

U.S. people are better at that game, since it is expected and rewarded in work life. This is also why they push for CoCs, which have nothing to do with manners or niceness, but are just another power tool.

I have never seen a genuinely nice person push for a CoC. Not once.


> Not sure why this is down-voted.

Your post just made me realize that those who support C.o.C.s are quite likely also more likely to cast votes.

I think it quite likely that those with a libertarian life philosophy are far less likely to cast votes on websites in general, especially to voice disagreement.

I concur that those who want enforced niceness seldom are nice themselves and tend to often have their reasons and excuses of why they are justified when they are not so nice.


Most bad things start with very good intentions.

The use of CoC might be one of them.


Bad intentions don't exist; the bad guy always sees himself as the good guy.

The zero tolerance politician with a domestic violence conviction probably believes that in his case it was different and justified, just as everyone with a domestic violence conviction does.


The person he responded to had all but told him to go to hell in the previous email. Seems like a reasonable response to an unreasonable email


Eh I'm normally against bludgeoning people with a CoC, but the guy he responded to was basically telling him to "go fuck yourself" masked by euphemisms. So I think this was just a passive-aggressive response in kind to a rude email.


The CoC is working as designed.


Oh I love this bit in [1]

> I've yet to see a program that renders HTML (including all the cruft often used in docs, which might include SVG graphics and whatnot) sanely in ASCII. Lynx does not qualify, it's output is atrocious crap.

> Yes, lynx lets you read HTML in ASCII, but at the cost of bleeding eyeballs and missing content.

> Nothing beats a sane ASCII document with possibly, where really needed some ASCII art.

> Sadly the whole kernel documentation project is moving away from that as well, which just means I'm back to working on an undocumented codebase. This rst crap they adopted is unreadable garbage.


Rust's strengths lie where Rust is written like Rust. If Linus wants C, he knows where to find it. Rust isn't for "C people" -- it's for their replacements.


Winning friends and influencing people


The point is extremely valid.

It has never been an issue in my use case of Rust, but the lack of an interface for when the system is out of memory is problematic.

Not only in kernel, but in all system-level software.

Still, it does happen, less and less frequently, and when it does I want to know and I want to be able to handle it.


The problems are that:

- most such interfaces do have a constant overhead that all programmers using the language have to pay.

- some major operating systems (like Linux, ehem) make these interfaces useless for all their user space apps by enabling overcommit by default, which makes it hard/impossible to write portable code that can handle OOM

Explicit OOM handling would mean that most users end up paying a relatively high upfront cost for something that in practice for them (e.g. if they are Linux programmers) delivers no value.

For example, on Linux with overcommit (the default), even if the system is out of memory, malloc won't return null. It returns a pointer that's not null, such that your if(ptr == nullptr) will act as if everything is "ok", but then, when you try to read/write that memory, that will trigger a hardware exception, that the kernel will catch, and then the kernel will tell the OOM-killer to "make space" by killing "some process", and maybe your app is just killed, or some other app that your app is working with is killed leading to a race condition, or... or....

When your app is killed by the OOM killer, the signal that your app gets is "irrecoverable", which means that your app will die, you can at best try to do some cleanup before it does, but it will die nevertheless.

So I find it extremely ironic for Linus to argue that practical programming languages are hard to use for environments that must handle OOM errors, when they are championing one of the major platforms that makes handling OOM useless by default.

I keep saying this, but the obvious fix is for the Linux kernel to use overcommit internally just like they expect user space to do. When the kernel then runs out of memory, it should then start killing drivers at random to make for some space. If they think that's such a great default behavior, they should commit to it. \s


The "but the OOM killer!" argument has been a disaster for Rust, and keeps derailing all design discussions.

• Linux is not the only OS in the world.

• Even on Linux you have containers/cgroups that can impose hard limits.

• Platforms without virtual memory are also an important target for Rust.

• On 32-bit platforms you can run out of address space before you run out of RAM.

• Regardless of what the OS does, the application may still want to impose its own internal limit (e.g. https://lib.rs/cap) to avoid being OOM-killed or swap death.
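As an illustration of the last bullet, here is a sketch of an application-imposed memory limit. It is not the API of the linked `cap` crate: `CappedAlloc` and its 64 MiB budget are assumptions made up for this example. Returning null from `GlobalAlloc::alloc` is how an allocator reports failure, and fallible APIs like `try_reserve_exact` then surface it as an `Err`.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::ptr;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical allocator that wraps `System` and refuses any allocation
/// that would push the live heap above `limit` bytes.
struct CappedAlloc {
    used: AtomicUsize,
    limit: usize,
}

unsafe impl GlobalAlloc for CappedAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let size = layout.size();
        // Atomically reserve `size` bytes of budget, or fail if over the cap.
        let reserved = self.used.fetch_update(Ordering::SeqCst, Ordering::SeqCst, |used| {
            used.checked_add(size).filter(|&total| total <= self.limit)
        });
        if reserved.is_err() {
            return ptr::null_mut(); // null means "allocation failed"
        }
        let p = unsafe { System.alloc(layout) };
        if p.is_null() {
            // The real allocator failed too: give the budget back.
            self.used.fetch_sub(size, Ordering::SeqCst);
        }
        p
    }

    unsafe fn dealloc(&self, p: *mut u8, layout: Layout) {
        unsafe { System.dealloc(p, layout) };
        self.used.fetch_sub(layout.size(), Ordering::SeqCst);
    }
}

#[global_allocator]
static GLOBAL: CappedAlloc = CappedAlloc {
    used: AtomicUsize::new(0),
    limit: 64 * 1024 * 1024, // 64 MiB, an arbitrary example budget
};
```

With this in place, `Vec::new().try_reserve_exact(1 << 30)` returns `Err` instead of asking the OS for a gigabyte, while infallible APIs such as `Vec::with_capacity` would still abort when the cap is hit.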

People keep telling me how Linux never runs out of memory, while I'm currently firefighting a torrent of coredumps caused by Rust's self-own on OOM that actually happens.

I generally love Rust, but its OOM handling is awful, and the "but the OOM killer" nonsense is to blame for stalling the absolutely critical fixes it urgently needs.


> but the OOM killer!" [..] disaster for Rust [...]

Only on HN discussions (the disaster part) ;=)

(Though it's a broken discussion: OOM handling on Linux is broken, but it is becoming increasingly "un-"broken, and, as you said, other platforms exist too.)

Anyway there is no disaster as:

Rust does support handling memory allocation failures gracefully; the allocator API defaults to this!!

It's just that various types defined in [lib]std (or, more precisely, [lib]alloc) call the (replaceable!) allocation error hook in their default methods.

This is for ergonomics: there is no sane/ergonomic way, for common non-embedded/non-kernel programming and other non-special-purpose use cases, to make literally every method that potentially allocates return a Result. And outside of such special-purpose cases you can always recover from panics (i.e. outside of special-purpose cases, panic=abort is an anti-pattern).

Still, even in std, methods like `try_reserve` exists and are used, e.g. by serde deserializers to make sure to gracefully fail if the user tries to allocate a 10GiB array (because compressed data formats ;-) ).
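A small sketch of that `try_reserve` pattern, assuming a hypothetical decoder whose length prefix comes from untrusted input (`alloc_payload` is an illustrative name, not serde's code). `Vec::with_capacity` would instead panic on capacity overflow and abort if the allocator failed.

```rust
use std::collections::TryReserveError;

/// Hypothetical decoder step: `declared_len` comes from untrusted input,
/// so the buffer is reserved fallibly before any data is copied into it.
fn alloc_payload(declared_len: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = Vec::new();
    // Err on capacity overflow, or when the allocator refuses the request.
    buf.try_reserve_exact(declared_len)?;
    Ok(buf)
}
```

Passing `usize::MAX` fails with a capacity-overflow error before the allocator is even called; a merely huge request that the allocator refuses also comes back as an `Err` the caller can turn into a decode error.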

BUT for the kernel all this doesn't matter as it is very unlikely it will use [lib]std or [lib]alloc. So they can default to just always implement their types in a way which doesn't panic on memory allocation.

And for panics in the other cases (which are basically guaranteed to be bugs), panic could be mapped to BUG() (the kernel's BUG() macro).

I'm more worried about the float/128-bit integer thing he mentioned, as I don't know anything about it. I assume it's not just a misunderstanding of overflow checks and other debug assertions, which the kernel will likely disable for non-debug builds?


> I'm more worried about the float/128-bit integer thing he mentioned, as I don't know anything about it. I assume it's not just a misunderstanding of overflow checks and other debug assertions, which the kernel will likely disable for non-debug builds?

I don't know about 128-bit integers, but the floating-point question has been discussed on Rust's subreddit[1]:

> Normally, the kernel leaves the floating-point state of the CPU in whatever state the userspace process left it in, so that you can do a quick system call without having to save and restore all that state. It's possible to use floating point in the kernel with a great deal of care, but you have to notify the kernel that you're doing so, so that it can save the userspace floating-point state.

> So, it'd be helpful if by default Rust didn't allow use of floating-point, and then you could opt in to allowing it for specific code.

And from Linus[2]

> In other words: it's still very much a special case, and if the question was "can I just use FP in the kernel" then the answer is still a resounding NO, since other architectures may not support it AT ALL.

[1]: https://www.reddit.com/r/rust/comments/mqxr1a/rfc_rust_suppo...

[2]: (from the same reddit thread) https://ipfs.io/ipfs/QmdA5WkDNALetBn4iFeSepHjdLGJdxPBwZyY47i...


Well, if that's all, this could be solved in Rust by having a "floating_point_usage" (or similar) lint that normally defaults to "allow" but defaults to "error" for kernel devs (which could be done through an option in the target spec, which, as a side note, is also what defines whether you have hard, soft or no FP support). (There are probably better solutions, but this one should be the easiest.)


There's actually a lint specifically intended for that already. `float_arithmetic`:

https://rust-lang.github.io/rust-clippy/master/#float_arithm...

# What it does

Checks for float arithmetic.

# Why is this bad

For some embedded systems or kernel development, it can be useful to rule out floating-point numbers.
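For reference, a sketch of how the lint can be scoped to a module. `driver` and `scale_percent` are hypothetical; plain `rustc` accepts and ignores `clippy::` lint attributes, so the deny only takes effect when the crate is checked with Clippy (e.g. `cargo clippy`).

```rust
// Deny float arithmetic in this module; enforced when linted with Clippy.
#[deny(clippy::float_arithmetic)]
mod driver {
    /// Integer-only scaling, the kind of code kernel drivers write instead of
    /// multiplying by 0.5 or 1.25.
    pub fn scale_percent(value: u32, percent: u32) -> u32 {
        // A u64 intermediate keeps value * percent from overflowing.
        ((value as u64 * percent as u64) / 100) as u32
    }

    // pub fn scale_f(v: f32) -> f32 { v * 1.5 } // `cargo clippy` would reject this
}
```

Integer division truncates, so `scale_percent(7, 50)` is 3; kernel code typically chooses its rounding explicitly in the same way.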


Thanks, nice, but for the kernel use case it would need to be part of rustc.


Simply install Clippy and call "clippy-driver" instead of "rustc", which will handle this. Clippy largely consists of lints that either haven't made it into rustc or are fairly edge-case-y.


Why? The Rust-for-Linux team already made Clippy a requirement for Rust kernel code. For niche configurations that are really project-specific like this, it makes total sense to use external tooling to ensure the guidelines are respected; it doesn't need to be the compiler enforcing them.


The overcommit is not the only argument.

The second argument is that complex programs that need to handle OOM properly will be full of complex untested error paths.

How should one handle an OOM error if one can't allocate some temporary node during an algorithm, and how does one recover without leaving a corrupted state? That's not a question I want to answer for almost every function call.

That's why the Rust standard library data structures don't handle OOM: to make them easier to use for the most common case. And for the cases where OOM handling is required, one can simply use data structures designed for it, which is what Linux will do (problem solved).

Also compare that to mutex poisoning, where every lock() can return an error, and which is now considered a mistake because in practice nobody wants to handle a poisoned-mutex error.
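For readers who haven't hit it, a sketch of what poisoning looks like with `std::sync::Mutex`: a thread panics while holding the guard, every later `lock()` returns `Err(PoisonError)`, and the usual "handling" is to take the guard back with `into_inner()` and carry on. `poisoned_lock_demo` is an illustrative name.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Poison a mutex, then report whether lock() saw it and what value survived.
fn poisoned_lock_demo() -> (bool, i32) {
    let m = Arc::new(Mutex::new(1));
    let m2 = Arc::clone(&m);
    // The spawned thread panics while its guard is alive, poisoning the mutex.
    let _ = thread::spawn(move || {
        let _guard = m2.lock().unwrap();
        panic!("boom");
    })
    .join();
    let was_poisoned = m.lock().is_err();
    // Typical recovery: ignore the poison flag and use the data anyway.
    let value = *m.lock().unwrap_or_else(|e| e.into_inner());
    (was_poisoned, value)
}
```

The `unwrap_or_else(|e| e.into_inner())` line is the boilerplate most code ends up writing, which is the point being made about error paths nobody really handles.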


I argue it's a rare-enough problem:

• Statistically, the likelihood of failure is proportional to the allocation size. You'll most often fail to allocate a very large Vec, and then only need to allocate a few bytes for a string that says "error!". I'm using https://lib.rs/fallible_collections and just a few strategically placed try_reserve calls have brought down my coredumps by 99%.

• Panic on OOM would unwind, and unwinding tends to free memory rather than allocate anything. The standard library should reserve/preallocate enough memory to be able to start unwinding.

• Rust is good at avoiding temporary allocations and relies on the stack a lot. You don't have to use fancy error-handling libraries that collect backtrace on every error. Error handling with enums doesn't touch the heap at all.

• Even if everything fails, and you get OOM during OOM handling, that's fair, and still better than abort every time.

Note that Rust currently doesn't handle OOM by panicking. It unconditionally aborts the whole process. Libstd could switch to panics without API changes, and that would make OOM handling possible via catch_unwind.
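To illustrate the point about enum-based errors: a hypothetical parser whose error type is a plain `Copy` enum, so producing and propagating an error with `?` allocates nothing. `ParseError` and `parse_u16` are made up for this sketch.

```rust
/// Hypothetical error type: plain data only (no String, no Box), so
/// creating and propagating it never touches the heap.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ParseError {
    Empty,
    BadDigit { position: usize },
    Overflow,
}

/// Parse a decimal u16; every failure path returns a stack-only value.
fn parse_u16(s: &str) -> Result<u16, ParseError> {
    if s.is_empty() {
        return Err(ParseError::Empty);
    }
    let mut acc: u16 = 0;
    for (position, b) in s.bytes().enumerate() {
        let digit = match b {
            b'0'..=b'9' => u16::from(b - b'0'),
            _ => return Err(ParseError::BadDigit { position }),
        };
        // Checked arithmetic turns overflow into an error instead of a panic.
        acc = acc
            .checked_mul(10)
            .and_then(|v| v.checked_add(digit))
            .ok_or(ParseError::Overflow)?;
    }
    Ok(acc)
}
```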


> Panic on OOM would unwind,

OR abort the process.

Panics are for fatal irrecoverable errors, not for recoverable errors.


As an example of the problem the API would have if it wanted to handle OOM: Box, String and Vec, the most basic types in the Rust alloc crate, couldn't implement Clone, because clone() can't fail and you need to allocate memory to clone these data structures. As a result, lots of generic algorithms that rely on their type being clonable wouldn't work, and most user types could no longer #[derive(Clone)].


So, if I get this right, this means that the "just don't use the alloc crate" argument given somewhere else here is not practical ?


No. People that care about catching OOM failures will simply have to use other crates that have a more complex API, but that's what they want

I was just saying it is a right choice in the alloc crate, which is meant for user space programs, to simply panic on case of OOM, because otherwise the API would be more complex for everyone.


No not at all.


Because ...? I mean, from the perspective of an outsider, reading that you "can't use strings" if there's a risk of allocation failure is a bit worrying.


Because you just don’t use the types in the alloc crate, or you use the (only partially implemented) APIs that return a Result<> instead of panicking on allocation failure.

For that kind of low-level work I’d probably do the former and implement my own data structures, which is what you do with no_std currently.

As others have said there’s nothing in Rust the language which assumes allocations always succeed. The standard library made the choice to panic on allocation failure, which was the right choice in order to make an ergonomic API that’s suitable for higher-level work. If you’re not using the standard library (which currently includes people on eg embedded targets) then you can handle allocation any way you like. And if you’re writing drivers for the kernel you’d assume you’re ok handling that.

GP was pointing out that the Clone trait doesn’t allow for returning a Result, so you can’t use that trait to clone anything that allocates. But that’s not really a big deal since you can just make your own TryClone that does.
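A sketch of what such a `TryClone` could look like. The trait is hypothetical (not part of std); it reserves capacity with the stable `try_reserve_exact`, so the container's own allocation is fallible. For `Vec<T>`, the element clones still go through the ordinary, infallible `Clone`.

```rust
use std::collections::TryReserveError;

/// Hypothetical fallible counterpart of `Clone`: allocation failure is
/// reported to the caller instead of aborting the process.
trait TryClone: Sized {
    fn try_clone(&self) -> Result<Self, TryReserveError>;
}

impl<T: Clone> TryClone for Vec<T> {
    fn try_clone(&self) -> Result<Self, TryReserveError> {
        let mut out = Vec::new();
        // Reserve up front; extend_from_slice then never needs to grow.
        out.try_reserve_exact(self.len())?;
        out.extend_from_slice(self);
        Ok(out)
    }
}

impl TryClone for String {
    fn try_clone(&self) -> Result<Self, TryReserveError> {
        let mut out = String::new();
        out.try_reserve_exact(self.len())?;
        out.push_str(self);
        Ok(out)
    }
}
```

Generic code that needs copies can then bound on `T: TryClone` and propagate the error with `?`.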


Ok, thanks for the clarification.


How's that a problem?

Provide `TryClone`, there, problem solved.


I agree, it doesn't seem like these types can actually implement Clone if allocation can fail.


I agree that the OOM killer is frankly irrelevant in this discussion (and a bad idea in the first place IMO, cue the "airplane company deciding which passenger to throw out the plane" argument).

But on the other hand in my experience with C and C++, languages which do allow explicit OOM handling, most applications either just crash when that happen (the `xmalloc` route) or attempt to recover but often really can't do much.

It goes even beyond programming languages. Very few environments deal nicely with near-OOM conditions. My Linux desktop becomes effectively unusable if I start aggressively swapping. And technically at this point as long as I have swap I'm not literally out of memory.

My general point is not that I don't think Rust would benefit from allowing the developer to explicitly handle OOM conditions, it's more that the vast majority of applications are not written to gracefully degrade in these conditions anyway, and if the only handling you do is effectively `alert("out of memory!"); exit(1);` it's really not worth bothering with it.

On the other hand for the minority of applications that want to do something more meaningful on OOM you probably want to take the time to design some ergonomic system that won't litter the code with boilerplate, i.e. you probably don't want every single Vec and String and basically every single facet of the standard API that can allocate behind the scenes to return a `Result<_>`.

I'm sure there's something to do, but I think the Rust devs are right not to rush it given that there are so many other features left to implement.


> if the only handling you do is effectively `alert("out of memory!"); exit(1);` it's really not worth bothering with it.

I'd argue that even this is worth it, for most software. At least then you get a clean exit and an obvious problem. The alternative is to not detect it and let the code walk off into undefined behaviors and potentially subtle bugs that can harm data and/or break security.
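In Rust terms, the `xmalloc` route looks roughly like this sketch: check the raw allocation at the call site and hand a null result to `std::alloc::handle_alloc_error`, which by default reports the failure and aborts. `xmalloc` here is an illustrative helper, not a std function.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

/// Illustrative `xmalloc`: allocate `n` bytes (at least 1) and fail fast at
/// the allocation site if the allocator returns null.
fn xmalloc(n: usize) -> (*mut u8, Layout) {
    // Layout::array only fails if the size would exceed isize::MAX.
    let layout = Layout::array::<u8>(n.max(1)).expect("allocation size overflow");
    // SAFETY: `layout` has a non-zero size, as `alloc` requires.
    let p = unsafe { alloc(layout) };
    if p.is_null() {
        // Never returns: by default std reports the failed size and aborts.
        handle_alloc_error(layout);
    }
    (p, layout)
}
```

The caller frees the buffer with `dealloc(p, layout)`; the failure is reported at the allocation site, not at some later null dereference.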


Yeah, explicit OOM exit beats random NULL deref any day. The former can be analyzed by an L2 tech out of a logfile. The latter requires spinning up GDB to figure out where NULL came from.


In my experience the problem with the latter isn't even that it requires a skilled analyst & a debugger. The reality is that, in a multi-threaded program, by the time your process has crashed on a null dereference the offending call stack is probably long gone.

The faster you fail the more readily apparent the root cause will be from your logs/core dump/etc. For this to work though reboots (of the application/environment) need to be inexpensive, and you also need supervision. That supervision can either be at the OS level, like Linux's systemd, Solaris' SMF, Apple's launchd, etc., or at the application level like Erlang/OTP.


Yes, that's another great reason to fail promptly.


To be clear, Rust will give you a clean exit in this situation (or at least I assume so; it would be a glaring issue if it didn't). On Linux, however, malloc usually can't fail, so you won't ever see that: the kernel notices the problem on access rather than on allocation, and then the OOM killer plays Russian roulette with your processes.


It would be really helpful if the OOM killer could be adjusted to kill non-overcommitting programs last (and overcommitting ones first). Then programs could be fixed/improved gradually.


word


> For example, on Linux with overcommit (the default), even if the system is out of memory, malloc won't return null. It returns a pointer that's not null, such that your if(ptr == nullptr) will act as if everything is "ok",

Not always. The default overcommit policy will reject certain allocations that are too big. Also, you can change the overcommit policy if you need it.
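For reference, the policy is the `vm.overcommit_memory` sysctl, exposed as `/proc/sys/vm/overcommit_memory`: 0 is the heuristic default (obviously excessive requests are refused), 1 never checks, and 2 is strict accounting against a commit limit derived from swap plus `vm.overcommit_ratio` of RAM. A minimal sketch that reads and decodes it; the helper names are made up for illustration:

```rust
use std::fs;

/// Decodes the value of vm.overcommit_memory (hypothetical helper).
fn describe_overcommit(raw: &str) -> Option<&'static str> {
    match raw.trim() {
        "0" => Some("heuristic: obvious overcommits are refused, the rest succeed"),
        "1" => Some("always: no overcommit check is done"),
        "2" => Some("strict: commit is limited to swap plus a share of RAM"),
        _ => None,
    }
}

/// Reads the current policy; returns None off Linux or if the file is unreadable.
fn current_overcommit_policy() -> Option<&'static str> {
    let raw = fs::read_to_string("/proc/sys/vm/overcommit_memory").ok()?;
    describe_overcommit(&raw)
}
```

Switching to strict accounting is `sysctl vm.overcommit_memory=2` as root; in that mode malloc really can return null.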

Also, an overcommit policy that doesn't reject allocations by default tends to be more useful than harmful. One example is forking processes: the memory space gets duplicated with copy-on-write, so if you didn't allow overcommit you could end up with processes that couldn't be forked due to that behavior. Not to mention that most programs don't use all the memory they have been assigned, so without overcommit you'd have a lot of problems with underused RAM.

> So I find it extremely ironic for Linus to argue that practical programming languages are hard to use for environments that must handle OOM errors, when they are championing one of the major platforms that makes handling OOM useless by default.

Kernel and user space programming are very, very different. A lot of developers use languages with garbage collection without issues, but that's not acceptable in kernel programming. The fact that the platform offers feature X to user space does not mean the platform itself has to be built with feature X. In this case it's very clear: even if they used overcommit in kernel code, Linux does not always overcommit memory, so you'd still need to handle the OOM case in the kernel properly.


So, basically you are saying "Rust is mostly used as an application programming language anyways, so let's not pester those users with the overhead of OOM handling"?

Fair enough. But this would mean that Rust is (factually) a systems language in the same sense that Go was initially declared to be a systems language.

If I look at the Rust community (I give it a try from time to time), I would totally agree with your point, that most users would be pestered by this overhead. I see mostly CLI tools, "web apps" and the kind.

Furthermore, I just can't shake off the feeling that the embedded (bare metal, 16 or 32 bit architectures) or (in-house) kernel crowd will always be 2nd class citizens.

> I keep saying this, but the obvious fix is for the Linux kernel to use overcommit internally just like they expect user space to do.

I'm not a kernel developer, but aren't you (maybe) asking for a bit too much? You're asking the Linux kernel devs to change the kernel in a significant way, just so you can use Rust to write drivers the way you write user space applications.


> So, basically you are saying "Rust is mostly used as an application programming language anyways, so let's not pester those users with the overhead of OOM handling"?

No. What I am saying is that Rust intended to be a systems programming language that could handle OOM properly everywhere, but that got a lot of opposition because some operating systems, mainly Linux, make it impossible for programs to handle OOM _at all_, so adding proper OOM support to Rust would mean that every Rust Linux programmer would be paying for a feature (in ergonomics, etc.) that they cannot use _at all_.

That's the irony.

Linux design has made it not worth it for new programming languages to be designed to handle OOM properly, because that's impossible to do in OSes like Linux.

People have been complaining about this for user space programs for the last 20 years, and the Linux kernel's stance was "that's a feature, not a bug".

But now that they are confronted with it, you see Linus writing stuff like "that's not acceptable".

That's a huge double standard IMO.


I don't think you can use "double standard" as a derogatory term when you're comparing the needs of kernelspace code and userspace code.

...plus, they're already planning to write their own `alloc` replacement, if for no other reason than that they need to support API features of the kernel allocator that are absent from the userspace allocator, like GFP flags:

https://github.com/Rust-for-Linux/linux/issues/2#issuecommen...


> I don't think you can use "double standard" as a derogatory term when you're comparing the needs of kernelspace code and userspace code.

You argue that the needs of kernel space and user space differ when handling OOM errors.

If that were true, one wouldn't have had to patch the standard library with the `try_...` methods to try to support OOM handling in user space as an afterthought.

That request came from Firefox, where a user can click on a link pointing to, e.g., a 1 TB image file, which would kill the browser if it couldn't handle OOM properly.

So yes, it is a double standard, because both user space and kernel space have millions of valid reasons for wanting to handle OOM errors, but the motto of Linux, and of Linus Torvalds in particular, is "do as I say, not as I do".

It is also extremely ironic that this Linux kernel policy, which makes it impossible for user space to handle OOM, now causes Linus to cry about it when the kernel wants to start using tools developed for user space.

They are directly responsible for this outcome.


Rust 1.0 was explicitly intended to be a Minimum Viable Product, containing only the work they felt they couldn't revise later without breaking their API stability promise.

As such, they started with the versions of allocating APIs that are most commonly desired... the ones equivalent to how array[index] will either get what you want or panic if you're out of bounds. The try_ allocation APIs, which are equivalent to .get on a container, came later.
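The split is easiest to see side by side. A sketch with hypothetical helper names; `try_reserve` was still unstable when this thread was written and was stabilized in Rust 1.57. Note that the infallible allocating methods such as `push` don't actually panic on allocation failure: they call `handle_alloc_error`, which aborts the process by default.

```rust
use std::collections::TryReserveError;

/// Panicking flavour: like `v[i]`, an out-of-bounds index panics.
fn third_or_panic(v: &[u32]) -> u32 {
    v[2]
}

/// Fallible flavour: like `v.get(i)`, the caller decides what "missing" means.
fn third(v: &[u32]) -> Option<u32> {
    v.get(2).copied()
}

/// Same split for allocation: `push` alone aborts on OOM, `try_reserve` reports it.
fn push_fallibly(v: &mut Vec<u32>, x: u32) -> Result<(), TryReserveError> {
    v.try_reserve(1)?;
    v.push(x); // cannot reallocate now: the capacity was reserved above
    Ok(())
}
```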

A web browser, while a good program to stress the capabilities of a language's design to make sure it can meet demand, is not a typical program.


No it's about "by default" not pestering users.

Rust has all the tools needed to gracefully handle memory allocation failure.

Yes, the tools are a bit limited w.r.t. usage in the standard library, where you can mainly use them to handle allocations that are "known to potentially fail big", but not every single small allocation.

But enter no_std and it's basically all your choice, which is also the default for embedded/bare-metal, and I count the kernel as embedded/bare-metal. (Panicking on OOM is the default in std because its types are tuned for the most common use cases, including web servers and user-space systems programming, where you can always panic on OOM and rely on an error kernel, a much more ergonomic pattern than explicitly returning a Result from everything that potentially allocates.)

Though, without question, the discussion around it is a mess.


You can't solve memory allocation problems by turning off overcommit and handling alloc errors in Rust. You don't get null, you get a panic. That's a problem for the kernel, even if you argue it's not a problem elsewhere. Start killing drivers is nuts: oh look, my screen went blank and three HDs stopped working. Damn, where is my swap?


> Start killing drivers is nuts,

I don't disagree, just saying that it's as nuts as killing random user-space programs on OOM.

"The web browser misbehaved, lets kill the pulseaudio daemon, and if that doesn't fix it, lets kill WiFi, and if that doesn't fix it, lets kill vim, and... "

FFS, if some app tried to allocate too much memory, just let the app know, or kill that app, but don't start randomly killing user-space processes.


While I agree that OOMKiller is pretty mad, it's also important to note that the shared nature of memory means that the app which will die when the system is out of memory is anyway random. Even without overcommit, you can get into the situation of "the web browser is occupying 99% of RAM to open YouTube, and now pulseaudio tried to allocate another 50 bytes that the system just doesn't have, so pulseaudio should handle this, and then the WiFi manager will be the next to need 100 bytes and be refused etc".

Even worse, Linux by design stalls to a complete crawl when memory almost runs out, even with 0 swap and a fast disk, as it will first start swapping out code pages before giving up. Which means that every time a process is context switched, its code ends up first needing to be read from disk, the worst possible kind of cache thrashing imaginable.


Somehow an LRU algorithm makes the most sense, if only Linux and humans had a common definition of what "recently used" means here. And if applications had safe-shutdown and restart-on-demand mechanisms. At a high level of abstraction, this is how a lot of VMs deal with things, such as tabs in your browser or open apps on your smartphone.


You are talking about userspace. Linus is talking about kernel.

You can change the Linux memory overcommit policy using sysctls if you need it.


Dying when you run out of memory is a bad policy for a userspace server. I've seen some hairy production incidents caused by that; basically, the solution is to piss off customers until they stop coming back.

The action on running out of memory should be to finish the currently running work and hold off new work until you have resources, not to die and then reprocess the thundering herd. When you run out of memory, the last thing you want to do is empty your hot caches.
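A sketch of that policy, assuming a hypothetical server that tracks request bodies against a fixed byte budget and also uses `try_reserve` (stable since Rust 1.57), so a refusal from the allocator itself becomes a rejection rather than an abort. The names and the budget are illustrative:

```rust
use std::collections::VecDeque;

/// Hypothetical request queue: when memory is tight, new work is turned away
/// while work that was already accepted keeps running.
struct Admission {
    queue: VecDeque<Vec<u8>>,
    bytes_in_flight: usize,
    limit: usize,
}

#[derive(Debug, PartialEq)]
enum Rejected {
    OverBudget,
    AllocFailed,
}

impl Admission {
    fn new(limit: usize) -> Self {
        Admission { queue: VecDeque::new(), bytes_in_flight: 0, limit }
    }

    /// Accepts a request only if both the budget and the allocator allow it.
    fn try_accept(&mut self, body: &[u8]) -> Result<(), Rejected> {
        if self.bytes_in_flight + body.len() > self.limit {
            return Err(Rejected::OverBudget);
        }
        let mut copy = Vec::new();
        copy.try_reserve_exact(body.len()).map_err(|_| Rejected::AllocFailed)?;
        self.queue.try_reserve(1).map_err(|_| Rejected::AllocFailed)?;
        copy.extend_from_slice(body);
        self.bytes_in_flight += body.len();
        self.queue.push_back(copy);
        Ok(())
    }

    /// Finishes the oldest accepted request and releases its share of the budget.
    fn finish_one(&mut self) -> Option<usize> {
        let done = self.queue.pop_front()?;
        self.bytes_in_flight -= done.len();
        Some(done.len())
    }
}
```

A rejected request can be answered with something like HTTP 503 and retried later, while accepted work drains through `finish_one` and frees budget for new arrivals.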


You’re out of memory. Every possible cache has already been freed.

Unless your app is faking its own caching, in which case it might be contributing to the problem.


It's quite possible that an app is holding onto data that can be flushed, at the expense of future database lookups, session negotiations, etc. Without a proper mechanism in the kernel it is hard to manage memory pressure.


If that were important, you would disable overcommit in Linux. If you didn't, it's probably because overcommit is a good trade-off that saves resources.


> but the lack of an interface

It's not lacking; the standard library just doesn't use it by default, because handling OOM is a conceptual mess (yes, that includes the OOM handling features C/C++ have).

But even in the standard library you have methods (some not stable) like `try_reserve`, which returns an error if allocation fails.
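For the "known to potentially fail big" case, e.g. a size taken from untrusted input, a minimal sketch (the function names are hypothetical; `try_reserve_exact` and `TryReserveError` are real std APIs, stable since Rust 1.57). One caveat from elsewhere in this thread applies: with Linux overcommit, a huge request may still "succeed" here and fail later on first touch; only requests the allocator actually refuses, or that exceed `isize::MAX` bytes, are guaranteed to come back as `Err`.

```rust
use std::collections::TryReserveError;

/// Hypothetical loader: `declared_len` comes from untrusted input (e.g. a header),
/// so the allocation is attempted fallibly instead of with `Vec::with_capacity`.
fn alloc_for_payload(declared_len: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = Vec::new();
    buf.try_reserve_exact(declared_len)?; // Err instead of abort if refused
    Ok(buf)
}

/// The caller picks the degradation strategy instead of the allocator.
fn describe(declared_len: usize) -> String {
    match alloc_for_payload(declared_len) {
        Ok(buf) => format!("reserved {} bytes", buf.capacity()),
        Err(_) => String::from("too large, refusing this payload"),
    }
}
```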

Anyway all allocation parts are part of the standard library (or the alloc library), i.e. they are not a core part of the language itself.

I don't think anyone ever planned to use Rust's standard library as-is in the kernel (it's just not designed for this, e.g. see panics) or to pull in external dependencies from cargo/crates.io without vendoring them.

So it's totally possible to run rust without alloc caused panics.

Now, besides that, there is the question of panics outside of allocations, e.g. when code realizes it ran into a violated invariant, i.e. it knows we have a bug which isn't explicitly handled.

And guess what, the kernel already has a handler for that: `BUG()`. So basically any panic will call `BUG()` (and we don't panic on memory allocations).

Now maybe some tweaks to the panic system are necessary to make sure this all works well (panic=BUG() would be like panic=abort, except that on abort it calls BUG()!).

Now, regarding the integer and float parts, there are two things. First, the kernel will likely be built without debug assertions such as overflow checks (which would call BUG()!), so many potential panics are gone. Second, there are some special cases around floats and 128-bit integers on platforms which don't support them, where calling BUG() is inappropriate, but I have to look into this, tbh.

So short:

- panic on mem-alloc failure is a (lib)std/alloc thing; kernel code would have used something else anyway

- panic in the kernel can be made into being basically BUG()

- issues around float and 128-bit integers should be fixable, at worst with a compiler flag. Embedded code is affected by them, too, so there is a good chance there is already a fix.


I agree Rust should have it, but it is VERY hard to do correctly.

I've worked on some systems which claimed they were dealing with it, but when I purposefully pushed it (by making a malloc which would occasionally fail), I quickly uncovered dozens of bugs, and the solution was to stop pretending we could sensibly handle low-memory situations. SQLite famously does handle this situation correctly, but it is a huge amount of work.


There are counter examples. For example, data processing as part of a measurement application: If allocating the data fails, the application should abort that, but keep running to allow further control and let the user reduce e.g. the sampling size.

I know of one application with a worldwide customer base that supports this.


Check out Zig. That language will work much better as a kernel/driver language because it is simple by design.


Yes, if allocation failure is not tested, it is not likely to work correctly. SQLite is also famous for its strict and complete test sets.


Having this feature is one of the USPs of Zig, isn't it?


Rust std::alloc has the same standard interface as everything else: return null on failure to allocate. It's just that the work to make the std collections use it is still in progress, and it's being retrofitted to collections that assume they can just abort or panic on alloc failure.


(Some of) Those collections live in std::alloc as well, which is the issue here.


It honestly isn't.

Out of memory means, your system is simply not designed for the task at hand. The kernel returning -ENOMEM only masks the fact that eventually Linux will have to OOM Panic if it can't OOM Kill. Hell, imagine the swapping and the I/O spike because your VFS cache has been or is currently being purged. I honestly think the best case is to just fail when a fundamental resource is simply not there.


> I honestly think the best case is to just fail when a fundamental resource is simply not there.

Indeed. I'm using Firefox in a VM. It's a pain, because invariably at some point Firefox uses enough memory that the VM starts to swap. And then the whole thing grinds to a halt, and I usually just end up with the VM equivalent of power-off.

Instead I'd be perfectly fine with the kernel telling Firefox "computer says no" when it tries to malloc, before the system runs out of memory, and then Firefox can do whatever it wants with that.

Yes I know there's some cgroups magic or whatever I can do, but man, why does it have to be so painful?


Firefox is a bit of an interesting case.

We have essentially 3 categories of allocations: (1) those that are large and/or the size is user-controlled, which are (mostly) handled; (2) most of those that happen within the JS engine, which are handled but we're constantly debating whether it's worth the cost; and (3) all the rest, which includes all other allocations outside the JS engine as well as the ones within the JS engine that are expected to be rare and are too hard to handle in any sensible way. For (3), we crash on OOM.

So if you do use cgroups or ulimit or whatever, Firefox may do something reasonable with OOMs. Or it may not, depending on what code sees the OOM. It's still an open question how often the JS engine handling an OOM is worthwhile (as in, it won't just continue to OOM until it finds something that will choose to crash.) OOM telemetry is a little iffy, so I don't trust statistics based on it.

The cost of (2) is not just code size and code complexity. It's also a larger vulnerability surface that is rarely exercised. Within the JS engine, we have ways to synthesize OOM events to at least get some level of testing. (We'll run a chunk of code repeatedly, OOMing on the 1st, 2nd, 3rd, ... allocation, and make sure we either handle it properly or do a controlled crash.)
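The "fail the 1st, 2nd, 3rd... allocation" technique described above can be sketched in user-space Rust with a hypothetical allocation wrapper passed explicitly to the code under test. This is an illustration of the idea, not the JS engine's actual tooling:

```rust
/// Hypothetical allocation-failure injector: fails the `fail_at`-th request (1-based).
struct FailingAlloc {
    count: usize,
    fail_at: Option<usize>,
}

#[derive(Debug, PartialEq)]
struct OutOfMemory;

impl FailingAlloc {
    fn new(fail_at: Option<usize>) -> Self {
        FailingAlloc { count: 0, fail_at }
    }

    /// Hands out a zeroed buffer, or Err if this is the request chosen to fail.
    fn alloc(&mut self, len: usize) -> Result<Vec<u8>, OutOfMemory> {
        self.count += 1;
        if Some(self.count) == self.fail_at {
            return Err(OutOfMemory);
        }
        Ok(vec![0; len])
    }
}

/// Code under test: makes three allocations and must propagate any failure.
fn build_table(a: &mut FailingAlloc) -> Result<Vec<Vec<u8>>, OutOfMemory> {
    let mut rows = Vec::new();
    for len in [16, 32, 64] {
        rows.push(a.alloc(len)?);
    }
    Ok(rows)
}

/// Reruns the code failing the 1st, 2nd, 3rd... request until a run completes,
/// and returns how many runs saw (and correctly reported) an injected failure.
fn exhaustive_oom_test() -> usize {
    let mut failures_seen = 0;
    for n in 1.. {
        let mut a = FailingAlloc::new(Some(n));
        match build_table(&mut a) {
            Err(OutOfMemory) => failures_seen += 1,
            Ok(rows) => {
                // n exceeded the number of allocations: the run finished normally.
                assert_eq!(rows.len(), 3);
                break;
            }
        }
    }
    failures_seen
}
```

Each run must either complete or report the injected failure; a panic, hang, or corrupted result in any run points at a bug in an error path.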


Personally I'd be fine with crashing a tab or five.

For example Slack uses several gigabytes of memory if I forget to reload the tab every hour or so. It has a DOM live-leak or something. Perfectly fine to let that thing crash and burn.

Letting my PC grind to a halt is far worse for me.

Now my point is, I don't want the kernel to try to fluff Firefox and try to limp it along. Sure for a few applications it's good that the kernel tries its utmost.

But in most cases, I don't want one application to dictate my PC's performance. Gone are the days where I use my PC for one thing at a time.

And if the kernel was more strict with applications, then hopefully sane OOM handling would force its way into more applications.


Zig (another language fit for low level programming, like C and Rust) begs to differ.

[1]: https://ziglang.org/learn/why_zig_rust_d_cpp/#no-hidden-allo...


What does this have to do with my comment? If you're out of memory, how can Zig know you can just continue on? What if your memory is held in tasks that are effectively deadlocked because a dependent task is incapable of allocating? There are many things that can be happening once memory is effectively maxed out. The more common outcome near the edge is higher I/O, and the system crawls.

I'm sure Zig is great, but I don't see from what you linked how that changes what I said.


I know Rust much better than Zig. I found that one difference between the two is Zig's default behavior on mem alloc: it can fail.

All code alloc'ing mem needs to deal with this potential failure: the dev can choose to "panic/die/exit/etc", or a different strategy can be chosen (display something in the UI, don't allocate the memory, etc).

Rust has this "nicely return a Result, instead of exceptions/panics/etc" approach in most cases, but not in the case of mem alloc. That's a big difference between Zig and Rust.

Now Rust has alternative methods for mem alloc that return a Result: but these are not the default and thus hard to discover. This is "unsafe behavior by default" in Rust. This is special, as Rust usually promotes the safe path.


I love how the section about Rust links to a GitHub issue that is more than five years old, when the fallible allocation story has evolved so much in the past year.


So how has it? I was not aware it had changed (also, I don't need the changes, as I use Rust for more high-level stuff)


There are new methods returning a Result instead of panicking, slowly being implemented for all allocating collections in the standard library.

see https://github.com/rust-lang/rust/issues/48043 and https://github.com/rust-lang/rust/pull/80310


Like what was mentioned in the linked thread, the "try_*" functions.

This is what Zig has as default behavior.


Is it really too much to ask, in 2021, that our computers be capable of saying "I'm sorry Dave, I can't do that", instead of "Halt and Catch Fire"?


> Out of memory means, your system is simply not designed for the task at hand.

Speaking as somebody who's never built such a system, wouldn't maintaining some opportunistic cache (e.g. for some space-time-tradeoff) be a valid use case of asking for more memory and gracefully degrading in case it's not available?

Error handling would simply consist of not expanding the cache size (if it happens during an allocation related to the cache) or freeing up some cache memory (if it happens for an essential allocation).
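Yes, that is a classic case. A minimal sketch, assuming a hypothetical cache with its own byte budget that also asks the allocator fallibly (`try_reserve_exact` and `HashMap::try_reserve`, stable since Rust 1.57); a refused insert simply means a later cache miss:

```rust
use std::collections::HashMap;

/// Hypothetical cache that only grows while its budget and the allocator allow.
struct OpportunisticCache {
    entries: HashMap<u64, Vec<u8>>,
    used: usize,
    budget: usize,
}

impl OpportunisticCache {
    fn new(budget: usize) -> Self {
        OpportunisticCache { entries: HashMap::new(), used: 0, budget }
    }

    /// Returns true if the value was cached, false if it was skipped.
    fn insert(&mut self, key: u64, value: &[u8]) -> bool {
        if self.used + value.len() > self.budget {
            return false; // out of budget: don't expand the cache
        }
        let mut copy = Vec::new();
        if copy.try_reserve_exact(value.len()).is_err() || self.entries.try_reserve(1).is_err() {
            return false; // the allocator said no: also fine, just don't cache
        }
        copy.extend_from_slice(value);
        if let Some(old) = self.entries.insert(key, copy) {
            self.used -= old.len();
        }
        self.used += value.len();
        true
    }

    /// Drops everything, e.g. after an essential allocation elsewhere failed.
    fn clear(&mut self) {
        self.entries = HashMap::new();
        self.used = 0;
    }

    fn get(&self, key: u64) -> Option<&[u8]> {
        self.entries.get(&key).map(|v| v.as_slice())
    }
}
```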


You can free opportunistic caches, undo some calculations (and optionally write a descriptor of them to disk), freeze some internal services to disk, not start some large task (and keep the small but time-critical ones running)... There are all sorts of things you could do on out of memory errors.

But in practice programs ask the users to decide those things, and just fail if the user's choice is invalid. It's very rare that a program makes those decisions by itself, and it's usually frowned upon, because no single program can assume that it owns the entire system.


I mean, it sounds like you're describing overcommit to me. Ask whatever large amount you want. Maybe even be like webkit and use overcommit for heap isolation. It works out great for the userspace case, until the limits are actually reached and you still have a failure problem, one probably harder to deal with than without overcommit.


> Out of memory means, your system is simply not designed for the task at hand.

You don't seem very familiar with the memory "management" behavior of Linux and the wonderful ecosystem it has engendered. For example, no matter how much physical memory you have, Chrome will just crash all the time if you turn off Linux's insane "lie about memory allocation succeeding" default.


I'm more than familiar with overcommit, and it has nothing to do with being out of memory. In fact, I'm explicitly talking about the kernel failing to allocate (-ENOMEM) in an alleged future driver.

It may interest you to know that WebKit takes this behavior to 11 and even uses overcommit to isolate heaps. That "runway" of memory between heaps is not used, and thus the +99GiB virtual memory size is bunk. Really nothing to do with being out of memory.


In the embedded Rust world it is common to explicitly define a custom global allocator and panic handler. The global allocator is even optional although you then won't be able to use heap allocated structs like String and Vec. Therefore the Rust compiler does not force you to use the built in allocator or panic handler and you are free to implement them how you choose.

See for alloc: https://docs.rust-embedded.org/book/collections/index.html See for panicking: https://docs.rust-embedded.org/book/start/panicking.html
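A user-space sketch of a custom global allocator, assuming `std` is available; in a `no_std` build the `GlobalAlloc` trait and `#[global_allocator]` attribute are the same, but `System` would be replaced by your own backing memory. The type name and the 1 GiB budget are made up. Returning null from `alloc` is how a `GlobalAlloc` reports failure: the `try_*` methods turn that into an `Err`, while infallible methods such as `push` call `handle_alloc_error`, which aborts by default.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical budget: never have more than 1 GiB handed out at once.
const LIMIT: usize = 1 << 30;

/// Wraps the system allocator and returns null once the budget would be exceeded.
struct BudgetAllocator {
    in_use: AtomicUsize,
}

unsafe impl GlobalAlloc for BudgetAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let size = layout.size();
        // Claim the bytes first, then roll back if that crossed the limit.
        let prev = self.in_use.fetch_add(size, Ordering::SeqCst);
        if prev.saturating_add(size) > LIMIT {
            self.in_use.fetch_sub(size, Ordering::SeqCst);
            return std::ptr::null_mut(); // null means "allocation failed"
        }
        let ptr = unsafe { System.alloc(layout) };
        if ptr.is_null() {
            // The real allocator refused as well: release the claim.
            self.in_use.fetch_sub(size, Ordering::SeqCst);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        self.in_use.fetch_sub(layout.size(), Ordering::SeqCst);
    }
}

#[global_allocator]
static GLOBAL: BudgetAllocator = BudgetAllocator {
    in_use: AtomicUsize::new(0),
};
```

With this allocator installed on a 64-bit target, `Vec::<u8>::new().try_reserve(2 << 30)` returns `Err`, whereas `Vec::<u8>::with_capacity(2 << 30)` would abort the process.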


As a C developer for embedded who has been considering Rust for a long time now, the panic thing is something that bothers me. I don't want to (and can't) panic. I want to get back false, null, or whatever.

Do I have to check if an external crate that I am using would panic? If so, how do I prevent the crate from panicking?

From my perspective, Rust is kind-of designed to support code like this (non real Rust code follows):

   my_struct.do_something()
            .get_this()
            .get_that()
            .as_ref()
            .as_paper_airplane()
            .unwrap()
What if some of those calls fail? How do I detect an error? Alright, it might panic, but what if I have to keep going forward, even in case of an error, in my embedded application? Should I split the different function calls and check for errors?

Kernel development intersects with embedded development in many points. I'm sure I am not the only one with these doubts.


Calls that fail return `Result<OkType, ErrorType>`. Here's something like what `unwrap()` does (actual implementation may have some nuances I didn't capture here):

    impl<OkType, ErrorType> Result<OkType, ErrorType> {
        fn unwrap(self) -> OkType {
            if let Ok(x) = self {
                return x;
            }
            panic!("not ok");
        }
    }
So, if you don't accept panic, don't call `unwrap()` unless you're very sure that the returned value couldn't fail in the way you're using the API. You could think of `unwrap()` in Rust as semantically similar to `assert(error == 0)` in C. Avoid or use in the same places you would avoid or use that assertion in C.

That means you must handle the Err case, often by just returning it to the caller (like C). The ? operator is syntax sugar for this: return if err, or give me the ok value if non-err.

    fn my_fn() -> Result<Y, Error> { // Y: whatever type my_y has
        // if this failed, returns some Err()
        let my_x = my_struct.do_something_that_can_fail()?;
        // ditto
        let my_y = my_x.something_else_that_can_fail()?;
        // no failures? Ok
        Ok(my_y)
    }
Of course, these can be chained with the exact same characteristics:

    fn my_fn() -> Result<Y, Error> { // Y: whatever type my_y has
        // Equivalent
        Ok(my_struct.do_something_that_can_fail()?
                    .something_else_that_can_fail()?)
    }


Thank you! Yes, the last 2 examples are much clearer to my eyes. And it's on the tutorial too:

    File::open("hello.txt")?.read_to_string(&mut s)?;
I should pay more attention :)

But I also deduce 2 things:

- That unwrap should not be used in the kernel, correct?

- That I should be very careful about the external libraries I use and look for the presence of panic! and/or unwrap, or use https://docs.rs/no-panic/0.1.13/no_panic/ as jononor commented here below. But I guess this might rule-out some useful libraries from a project.


No problem!

> That unwrap should not be used in the kernel, correct?

More or less. There could be situations where you know the error is impossible and it is reasonable to unwrap. But it's a good rule of thumb.

> That I should be very careful about the external libraries I use and look for the presence of panic! and/or unwrap, or use https://docs.rs/no-panic/0.1.13/no_panic/ as jononor commented here below. But I guess this might rule-out some useful libraries from a project.

Absolutely! Use of unsafe is another potential watch-point.


> - That unwrap should not be used in the kernel, correct?

Yes.

I guess the exception is if it will only panic on a kernel bug and you're ok with that (that's what the bug macro is for, no)? E.g. if I have `let x = Some(2); let y = x.unwrap();`. The second statement panics if x is None, but that would only happen in the event of a bug, so you might be ok with it.

This does happen sometimes in real code: `if some_list.len() > 2 { let x = some_list.pop().unwrap() }` will never panic, because pop() only returns None (think null) if the list is empty, but we just checked it has more than two elements. I'm not sure what the kernel's stance on this sort of code will be.
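A sketch of both styles, with hypothetical function names. The first is the pattern from the comment; the second gives the same result by returning `pop()`'s `Option` directly, so there is no panic path left for a reviewer (or a tool like the no-panic crate) to reason about:

```rust
/// The pattern from the comment: the length check guarantees `pop()` returns
/// Some, so this unwrap can't panic, but only a human can see that.
fn pop_checked_unwrap(list: &mut Vec<i32>) -> Option<i32> {
    if list.len() > 2 {
        Some(list.pop().unwrap())
    } else {
        None
    }
}

/// Panic-free version: the Option from `pop()` is passed through unchanged.
fn pop_if_more_than_two(list: &mut Vec<i32>) -> Option<i32> {
    if list.len() > 2 {
        list.pop()
    } else {
        None
    }
}
```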

> - That I should be very careful about the external libraries I use and look for the presence of panic! and/or unwrap,

In the context of the kernel, I imagine you just don't use external libraries at all. In a more general context, yes, you need to understand when any libraries you use might panic. The general convention, which honestly is not very well followed, is to document panics that result from usage errors, and to allow for panics caused by library bugs.


If do_something(), get_this(), get_that() and as_paper_airplane() all returned a Result then you could write your code as follows:

  my_struct.do_something()?
    .get_this()?
    .get_that()?
    .as_ref()
    .as_paper_airplane()?;

If the error types of the results differ, you can use map_err, or write a converter by implementing the From trait for the custom error struct you want to map to. The one thing you can't find out is whether a function you call will panic, either in its own body or in a function that it calls. When you write embedded libraries you are never supposed to panic, but there is nothing to enforce this, which is, admittedly, not ideal.
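A runnable sketch of both conversions, with a hypothetical `DriverError` type: the `From` impl lets `?` convert a `ParseIntError` automatically, and `map_err` does the same mapping explicitly at the call site.

```rust
use std::num::ParseIntError;

/// Hypothetical driver-level error type.
#[derive(Debug, PartialEq)]
enum DriverError {
    BadNumber(ParseIntError),
    OutOfRange(u32),
}

/// With this impl, `?` converts a ParseIntError into a DriverError.
impl From<ParseIntError> for DriverError {
    fn from(e: ParseIntError) -> Self {
        DriverError::BadNumber(e)
    }
}

/// Parses a register index from text; every failure is returned, none panics.
fn parse_register(text: &str) -> Result<u32, DriverError> {
    let n: u32 = text.trim().parse()?; // ParseIntError -> DriverError via From
    if n >= 32 {
        return Err(DriverError::OutOfRange(n));
    }
    Ok(n)
}

/// The same error conversion done explicitly with map_err.
fn parse_register_explicit(text: &str) -> Result<u32, DriverError> {
    text.trim().parse::<u32>().map_err(DriverError::BadNumber)
}
```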


It would only really panic if it ran into an unhandled condition that you explicitly routed to a panic or left unhandled. For example, pretend you have some external reading/data coming in, and you parse it expecting the field "name" to always have the value "scoutt":

    match data.name {
        "scoutt" => return true,
        _ => panic!(),
    }

In that case, because you "know" that the name field cannot be anything but your specific value, you don't really mind the catch-all arm, and Rust is satisfied. But then you get a runtime panic! because all of a sudden the value changed to "joe".

Long story short: as long as you handle and declare the proper failure conditions and paths ahead of time, at compile time, you really shouldn't come across a panic.
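The same check written without a panic, assuming `name` is a `String` and using hypothetical `Data`/`ParseError` types: the catch-all arm becomes an error value, so an unexpected "joe" is reported to the caller instead of taking the whole system down.

```rust
/// Hypothetical parsed record from an external source.
struct Data {
    name: String,
}

#[derive(Debug, PartialEq)]
enum ParseError {
    UnexpectedName(String),
}

/// The "can't happen" arm returns an error instead of calling panic!().
fn check_name(data: &Data) -> Result<(), ParseError> {
    match data.name.as_str() {
        "scoutt" => Ok(()),
        other => Err(ParseError::UnexpectedName(other.to_string())),
    }
}
```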


Maybe something like this? https://docs.rs/no-panic/0.1.13/no_panic/



It seems a serious shortcoming if you can't allocate String or Vec using a custom allocator? Presumably you can pass an arena parameter to create the custom allocator?

There's plenty of room between <unlimited heap> and <preallocated>, I don't really understand why, e.g., you couldn't have a pool for a particular usage of String, or fixed size objects?

Also, introspection to ask which pool an object is allocated on, how much space is available (units for fixed size, largest hole for variable size)?

Is it just a syntax choice or is there a rusty reason not to have a thread-local list of allocators?


It's just time; we have stabilized the interface for the single global allocator, but the more general allocation API is still being worked on.

You can do all of these things for your own data structures, but the ones in liballoc shipped without that support, because we had to get Rust 1.0 out the door. Now support is being retrofitted, and that's part of why doing so was okay before; the plan to do so in a reasonable way existed at stabilization time.


There's nothing special about String other than it's in the standard library. It doesn't even have special cased syntax (&str does have special syntax, it doesn't allocate).

String does need to know how to deallocate itself, so the compiler has to know what allocated it. It can be moved between threads, so a thread local allocator doesn't work.

If you want a String type which is allocated differently, or which stores a pointer to its deallocator, you can easily implement that as a new type by hand.
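For example, a minimal sketch of a hypothetical fixed-capacity string whose bytes live inline (on the stack, in a static, or inside another struct), so it never calls the allocator and reports "full" as an `Err` instead. Const generics, used here for the capacity, are stable since Rust 1.51:

```rust
/// Hypothetical fixed-capacity string: no heap, so no allocation failure.
struct FixedString<const N: usize> {
    buf: [u8; N],
    len: usize,
}

#[derive(Debug, PartialEq)]
struct CapacityError;

impl<const N: usize> FixedString<N> {
    fn new() -> Self {
        FixedString { buf: [0; N], len: 0 }
    }

    /// Appends `s`, or returns Err and leaves the string unchanged if it doesn't fit.
    fn try_push_str(&mut self, s: &str) -> Result<(), CapacityError> {
        let end = self.len.checked_add(s.len()).ok_or(CapacityError)?;
        if end > N {
            return Err(CapacityError);
        }
        self.buf[self.len..end].copy_from_slice(s.as_bytes());
        self.len = end;
        Ok(())
    }

    fn as_str(&self) -> &str {
        // Only whole &str values are ever copied in, so this is valid UTF-8;
        // unwrap_or keeps the method panic-free regardless.
        std::str::from_utf8(&self.buf[..self.len]).unwrap_or("")
    }
}
```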

(The standard library types might some day become generic over allocator, which would allow statically using a different allocator with the same api. The compiler would then force you to keep track of which allocator your string was allocated with).


This code is using that allocator, that's part of the issue. He wants there to be no panics, not to have the panics handled in a special way.


I see, so anything that can possibly fail should return a result instead of "maybe" panicking as a side effect? So no panics at all in the language? That may adversely affect the ergonomics of the language for user space applications. Perhaps it would be better to build a completely fallible version of the standard library and use linting to enforce no panics.


At the very least, no panics for allocations. Methods that return Result exist, the issue is that while they do, the panic-ing ones also exist. There are plans to address this, they just haven't been implemented yet.


Thanks, interesting to hear!


Is a deny(panics) lint rule feasible, now or in the short term? Something that would make a project fail to compile if anything used in it could panic?


I don't know enough about the specifics to really say. I do know there's interest.


I don't have a problem with the idea of panic, but the way it's used seems like a wart. In a language with a result type, why are there any cases where the system isn't too broken to return a result, but defaults to panicking anyway?

Thinking about it from the embedded side, my panic is going to tell the failsafe circuitry that the processor can't be trusted. That's too aggressive of an action for any case where we still believe that the processor can return control to a caller that could have a more-graceful error path.


This seems to be a non-issue? Torvalds has valid concerns, the patch submitter acknowledges those concerns and describes how they can and will be fixed before anything is merged.


We are lucky that this is a link to the mailing list, and you can read the answers. Tech "journalists" read all mails from Linus and try to create click-baity articles from them, completely ignoring the context.


Linux Declares War On Rust Community


Don’t forget to use the picture of Linus giving the middle finger along with that headline.


It's interesting tech stuff that's relevant to HN, it certainly caught my interest. There doesn't have to be a conflict for something to be a good HN submission.



Redox-OS has a similar situation that kernel code should never panic https://gitlab.redox-os.org/redox-os/redox/blob/master/CONTR... .

    No possible panics should ever exist in kernel space, because then the whole OS would just stop working.
Note that Redox-OS is written completely in Rust.


I see in that document the mention of a libredox as a libstd replacement, I'm guessing that means they have their own primitives with constructors that handle allocation failures?


Previous post about the RFC suggesting to add it is here [1]. This is Linus's reaction to that RFC. The top comment there also links to this email from Linus, and there's some interesting discussion there.

[1] https://news.ycombinator.com/item?id=26812047


This was already shared yesterday. See top comment.

https://news.ycombinator.com/item?id=26812047


I really miss old Linus - This email is way too long :). However, it is 100% on point.


> Also, I'm a bit worried about long-term survival of the new language-of-the-day-that-makes-you-look-cool-at-beer-events. I was once told perl would replace C everywhere. Does someone use it outside of checkpatch.pl anymore ? Then I was told that C was dead because PHP was appearing everywhere. I've even seen (slow) log processors written with it. Now PHP seems to only be a WAF-selling argument. Then Ruby was "safe" and would rule them all. Safe as its tab[-1] which crashed the interpreter. Anyone heard of it recently ? Then Python, whose 2.7 is still present on a lot of systems because the forced transition to 3 broke tons of code. Will there ever be a 4 after this sore experience ? Then JS, Rust, Go, Zig and I don't know what. What I'm noting is that such languages appear, serve a purpose well, have their moment of fame, last a decade and disappear except at a few enthousiasts. C has been there for 50 years and served as the basis of many newer languages so it's still well understood. I'm sure about one thing, the C bugs we have today will be fixable in 20 years. I'm not even sure the Rust code we'll merge today will still be compilable in 10 years nor will support the relevant architectures available by then, and probably this code will have to be rewritten in C to become maintained again.[1]

[1]: https://lkml.org/lkml/2021/4/16/283


Agreed, the kernel is responsible for managing all the low-level stuff that we mostly take for granted; it should not have an extra level of abstraction.

I have read many mails from Linus where he specifically rants about the need to avoid kernel panics, and he points out exactly what not to do. They are certainly worth reading to understand this mail.


Yeah, worth reading, even though I'm not a systems developer stricto sensu (I'm a Rust dev, BTW).

There are valuable pro/con standpoints on the Rust RFC for the kernel.

And honestly I have learned many things. Hours well spent :). Recommended whether you like Rust or not.


Linux should stick to C. If Rust is still around in 20 years, then it can be chosen as a mainstream kernel programming language for Linux. Google Fuchsia's Zircon kernel chose C++ over Rust because of stability concerns.

Best to have an independent cleanroom kernel implementation where Rust can prove itself.


I get a lot of value from many of Linus's comments. He can explain himself really well, and can even be an asshole at times, but I always feel he makes himself very understandable in a way that doesn't make me think he actually is one.

He seems to have learned a lot in how to respond though, because this one is pretty tame :)


From a later email in the thread:

    > There's a philosophical point to be discussed here which you're skating
    > right over!  Should rust-in-the-linux-kernel provide the same memory
    > allocation APIs as the rust-standard-library, or should it provide a Rusty
    > API to the standard-linux-memory-allocation APIs?

    Yeah, I think that the standard Rust API may simply not be acceptable
    inside the kernel, if it has similar behavior to the (completely
    broken) C++ "new" operator.
    
Having done C++-in-the-kernel work, this is precisely right. That work required not only abandoning the C++ memory model, but also implementing a separate standard library that complied with kernel-space limitations and requirements.
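One hedged sketch of the "Rusty API over the native allocator" option from the quoted question: wrap a C-style allocator that returns NULL on failure (the kmalloc convention) in an owning type whose constructor returns a `Result` and whose `Drop` frees the memory. Here `std::alloc::alloc_zeroed`/`dealloc` stand in for kmalloc/kfree so the example runs in userspace; `RawBuf` and `AllocError` are hypothetical names, not the Rust-for-Linux API:

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};
use std::ptr::NonNull;

/// Error returned when the underlying allocator cannot satisfy a request.
#[derive(Debug, PartialEq, Eq)]
pub struct AllocError;

/// Hypothetical owning byte buffer over a C-style allocator.
pub struct RawBuf {
    ptr: NonNull<u8>,
    layout: Layout,
}

impl RawBuf {
    /// Allocates `len` zeroed bytes, reporting failure instead of panicking.
    pub fn try_zeroed(len: usize) -> Result<Self, AllocError> {
        // `alloc_zeroed` must not be called with a zero-sized layout,
        // so this sketch simply rejects empty buffers.
        if len == 0 {
            return Err(AllocError);
        }
        // Oversized requests (beyond isize::MAX bytes) fail here.
        let layout = Layout::array::<u8>(len).map_err(|_| AllocError)?;
        // SAFETY: `layout` has a non-zero size.
        let raw = unsafe { alloc_zeroed(layout) };
        // The allocator signals failure with NULL, like kmalloc does.
        let ptr = NonNull::new(raw).ok_or(AllocError)?;
        Ok(RawBuf { ptr, layout })
    }

    pub fn as_slice(&self) -> &[u8] {
        // SAFETY: `ptr` points to `layout.size()` zeroed bytes that we own.
        unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.layout.size()) }
    }
}

impl Drop for RawBuf {
    fn drop(&mut self) {
        // SAFETY: `ptr` came from `alloc_zeroed` with exactly this `layout`.
        unsafe { dealloc(self.ptr.as_ptr(), self.layout) }
    }
}
```

The design choice is that the only way to obtain a `RawBuf` is through the fallible constructor, so callers are forced to handle the failure case.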


You're skipping over all the emails that explain that it's not a fundamental limitation; many of the fallible allocation APIs already exist and many others have been in planning for a long time.


I was referring to the C++ description as "precisely right".

Though note from those emails that Rust-in-the-kernel will also require a rework of a lot of stdlib stuff, especially if they want to use the kernel allocator; it'll just be a LOT neater and more idiomatically-Rusty than the stuff I had to deal with, and will be much more compatible with off-the-shelf Rust libraries.

And the developers of this Rust port repeatedly talk about removing panicking API calls from the library for kernel use, and adding extra non-panicking versions to the standard library. It's only that latter work that will likely make it into Rust upstream.


You can write bad code in any language.

You can invent BS about any language.

Linux kernel does not use glibc, why assume it would use the default generic application-level std lib for Rust, or for C++?

FUD does not enlighten.


Is this excitement in Linus-lang?


Hardly. He's reserving judgement.


More like cautious optimism on his part.


Coming from C, it's a striking missing feature of Rust. I still find it unnatural, and most of the code I write is Java, where the "what if" is replaced by allocating project time to GC tuning. Rust's "don't think about allocs", when everything you do is dominated by memory ownership, is weird. Rust eradicates a whole class of memory-related issues, except "not enough". Panic is not necessarily safe/secure; that process might be doing something important. Nofixilla take this approach too much, the assumption that no action is secure; what dies might be your burglar alarm. Fortunately Linus prioritises working. I feel like his input will help the Rust community.


We should pay more attention to this rust-on-linux project than this specific thread. It's a treasure trove. The first email of the thread is good introduction: https://lkml.org/lkml/2021/4/14/1023

The code: https://github.com/Rust-for-Linux/linux/tree/rust/rust/kerne...

Very cool!


Linus' comments are surprising to me.

First, I would have thought that Rust is far too much like C++ for him. :-)

Second, he seems remarkably calm about the very idea of a compiler inserting memory allocations in kernel code. He's only talking about panics. It's been a long time since I was involved in any kernel code, but invisible allocations would have set me (and any kernel programmers I knew) off worse than the worst rant anyone has ever seen from Linus. Any kind of invisible things (destructors, anyone?) would.


Rust doesn't have invisible allocations. The compiler doesn't even know what an allocation is.


"So if the Rust compiler causes hidden allocations that cannot be caught and returned as errors, then I seriously think that this whole approach needs to be entirely NAK'ed, and the Rust infrastructure - whether at the compiler level or in the kernel wrappers - needs more work."

So Linus is worried about nothing?


Yes, basically. The Rust standard library has data structures and functions which require memory allocation, and _will_ panic on allocation failure. But the allocating subset of the standard library is a separate crate (`alloc`), which is usually turned off in embedded or kernel-level code. The current prototype uses it to speed up development, but the long-term plan, as far as I understand, is to replace it or augment the current implementation to support fallible allocation (which is being worked on anyway).


> but invisible allocations would have set me (and any kernel programmers I knew) off worse than the worst rant anyone has ever seen from Linus

Reading tea leaves here, but I think he's calm about this because he's pretty sure it's not happening (Which it isn't)


> he seems remarkably calm about the very idea of a compiler inserting memory allocations in kernel code

Rust doesn't insert memory allocations anywhere. You can use Rust without a heap even in stack-only environments. Just like C.
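As an illustration of heap-free Rust, here is a hypothetical fixed-capacity text buffer built only on `core` and stored inline, so it can live on the stack (a sketch, not a kernel API). Formatting into it never allocates, and running out of space comes back as `fmt::Error` rather than a panic:

```rust
use core::fmt::{self, Write};

/// Hypothetical fixed-capacity text buffer; no heap involved.
pub struct StackBuf<const N: usize> {
    bytes: [u8; N],
    len: usize,
}

impl<const N: usize> StackBuf<N> {
    pub const fn new() -> Self {
        StackBuf { bytes: [0; N], len: 0 }
    }

    pub fn as_str(&self) -> &str {
        // Only whole `&str` values are ever copied in, so this is valid UTF-8.
        core::str::from_utf8(&self.bytes[..self.len]).unwrap_or("")
    }
}

impl<const N: usize> Write for StackBuf<N> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        let end = self.len.checked_add(s.len()).ok_or(fmt::Error)?;
        // Running out of space is reported as an error: no panic, no allocation.
        let dest = self.bytes.get_mut(self.len..end).ok_or(fmt::Error)?;
        dest.copy_from_slice(s.as_bytes());
        self.len = end;
        Ok(())
    }
}
```

Usage: `write!(buf, "pid={}", 42)` on a `StackBuf<16>` succeeds, while writing a string longer than the remaining space returns `Err(fmt::Error)`.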


They will probably solve the panic issue. However, it is strange to me that no one is explicitly mentioning the fundamental underlying tension here between C and Rust. Rust is a reaction to the lack of safety inherent in languages like C.

Rust is an opportunity to really evolve operating systems forward. That's why projects like Redox OS have more promise in the long term for me than dragging the Linux community to Rust.


This is the first time I've ever seen this usage of NAK. I'm used to the ACK/NAK jargon, but this one is new to me. I can figure this one out by context I think, but I find the new usage interesting. Can anyone shed light on how exactly he's using it here, and the reasons why you think that?


As a synonym of "rejected as invalid", for that is the information conveyed by a NAK response.


People think Rust is the savior, while the truth is you can still have dangling pointers by returning references from functions. Fornux C++ Superset is the way to go. Thanks.


I’m not au fait enough with Rust, memory management, or systems code to understand the issue here. Does anyone have a good explanation?


The short explanation is, this code wasn't in its final form yet, but good enough to ask for a high level review of the code, the review came back "hey this looks okay overall but I have some questions about <details>" and the reply was "great! <details> is the work we haven't done yet; we'll get on that."


Ah, I understand that bit! I don’t understand why panicking when you run out of memory is a thing, or why it’s especially bad in kernel code...


Ah okay.

Rust's panics are used as the "I cannot recover from this error" mechanism. Most applications don't really properly handle OOM, and so this kind of error falls into the "I want my current thread to just die, thanks" style of error handling, hence a panic.
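A small userland sketch of that "let the thread die" behaviour, assuming the default `panic = "unwind"` strategy (with `panic = "abort"` the whole process goes down instead): the panic ends only the spawned thread, and the parent sees it as an `Err` from `join`. `survived` is a hypothetical helper name:

```rust
use std::thread;

/// Runs `work` on its own thread and reports whether it finished normally.
/// With unwinding panics, a panic ends only that thread; the parent observes
/// it as an `Err` from `join` and can carry on.
fn survived<F: FnOnce() + Send + 'static>(work: F) -> bool {
    thread::spawn(work).join().is_ok()
}
```

Inside the kernel there is no such disposable thread to throw away, which is why the same strategy does not carry over.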

That's bad in the kernel because you don't want the kernel to die, you do want to handle it and do something.

The semi-ironic part here is that this behavior is the way it is largely because of how the Linux userland works, where it's often tough to even tell that your system is in a near-OOM or OOM state. That is why applications rarely handle it, even if they theoretically could.

Now, Rust itself has good support for these environments; I work on an embedded kernel, and we do no allocations at all. But what the kernel wants is where Rust is currently weakest in support; we have good support for "no allocations" and "panic on OOM or return Result on OOM" but not great support for "return Result on OOM and do not panic on OOM", and reasonably, the kernel would like the behavior they don't want to be impossible. That's the work that needs to be done.
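One way to make the unwanted behaviour impossible at the type level (a sketch under assumptions, not what Rust-for-Linux actually ships): a hypothetical `FallibleVec` keeps its `Vec` private and exposes only a `try_push` built on `try_reserve`, so callers cannot reach the panicking growth paths at all:

```rust
use std::collections::TryReserveError;

/// Hypothetical wrapper exposing only fallible growth.
pub struct FallibleVec<T> {
    // Private: `push`, `extend`, etc. are not reachable from outside.
    inner: Vec<T>,
}

impl<T> FallibleVec<T> {
    pub fn new() -> Self {
        FallibleVec { inner: Vec::new() }
    }

    /// Reserves room first; the following `push` then cannot reallocate,
    /// so allocation failure surfaces only as an `Err`.
    pub fn try_push(&mut self, value: T) -> Result<(), TryReserveError> {
        self.inner.try_reserve(1)?;
        self.inner.push(value);
        Ok(())
    }

    pub fn as_slice(&self) -> &[T] {
        &self.inner
    }
}
```

The trade-off is ergonomics: every insertion now has to handle a `Result`, which is exactly the cost the kernel is willing to pay.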


> we have good support for "no allocations" and "panic on OOM or return Result on OOM"

so if i understand it correctly, the ability is there by just "returning a result on oom" - linus is just asking for complete assurance that a panic will never happen? like say from a library?


It is not 100% clear to me if he's asking for no panic in any possible situation. (AFAIK, the BUG macro in the kernel does something very similar, so I would find it slightly surprising, though he may only want it to happen "explicitly" or something, we'll just have to see.)

It is clear to me that he is asking for no panics for OOM.


> It is clear to me that he is asking for no panics for OOM.

agreed. which is why i'm confused. maybe this is a non-issue that has exploded into an issue :)

thanks.


A Rust noob. But isn't allocation done with e.g. String::from("abc")? I wouldn't want to have an API there.


Are there discussions about Java/Kotlin kernel support through Graal Native? That would have a great value proposition.


I've got "Body for this message unavailable"


Try this archived version:

https://archive.is/VpcHT


Hmm, I would have thought that Rust produces standard ELF binaries that will run on Linux as-is.

Could someone ELI5 why the kernel needs changes to support Rust?


Because this is about allowing parts of the kernel itself to be written in rust.


This is for writing kernel modules, which ideally need to be integrated in the kernel build system.


Mostly this is about (as others have mentioned) writing kernel components, such as drivers, in Rust.

For context, in the post Linus essentially says: "If Rust cannot report an error without aborting when it runs out of memory, then we can't use it". I have never written any Rust code which does not abort when it runs out of memory, but I have never written for hardware (only with userland in mind) and my knowledge isn't that great anyway, so I'm not sure if it's possible or not.


Ah, I didn't realise it was about writing bits of the kernel itself in Rust, now it makes sense, thanks!


The ultimate memory safety PL is rejected because it cannot malloc without blowing up?


It has not yet been rejected. The review is vaguely positive. Like any pull request, there has been some feedback, which will now be addressed.

