Converting the Kernel to C++ (kernel.org)
79 points by artagnon on Jan 10, 2024 | 55 comments



> We do a lot of metaprogramming in the Linux kernel, implemented with some often truly hideous macro hacks.

If you do a lot of meta-programming… why not use a more principled approach like Lex & Yacc? Why not go all the way to design mini-languages and compile them to C? Or design C extensions and compile them to C? You can still have good debugging support with line markers. https://gcc.gnu.org/onlinedocs/cpp/Preprocessor-Output.html


You'd just be reinventing C++.


I sure hope not.


Please, anything but that.

I find it hard not to view stuff like this as an "if Rust gets to be in the kernel, then so should C++!" reaction.


Using a small subset (kernel-C++20) can simplify the existing C code quite a bit: you get constructors, destructors, class inheritance, etc. all for free. Currently the kernel is basically object-oriented C anyway.
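
A rough sketch of the contrast being described, assuming nothing about actual kernel code: every name here (`c_device`, `c_device_ops`, `Device`, `FillDevice`) is hypothetical. The first half is the hand-rolled "object-oriented C" pattern, the second half is the same shape in a restricted C++ subset that needs neither exceptions nor RTTI.

```cpp
#include <cassert>
#include <cstddef>

// C style: a hand-written table of function pointers plus an explicit
// "self" argument. All names are hypothetical.
struct c_device;
struct c_device_ops {
    std::size_t (*read)(c_device *self, char *buf, std::size_t len);
};
struct c_device {
    const c_device_ops *ops;
    char fill;
};

static std::size_t c_fill_read(c_device *self, char *buf, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) buf[i] = self->fill;
    return len;
}
static const c_device_ops c_fill_ops = { c_fill_read };

// C++ subset: the compiler generates the dispatch table, and the
// constructor guarantees the object is initialized before use.
class Device {
public:
    virtual std::size_t read(char *buf, std::size_t len) = 0;
protected:
    ~Device() = default;  // protected and non-virtual: no delete via base
};

class FillDevice final : public Device {
public:
    explicit FillDevice(char fill) : fill_(fill) {}
    std::size_t read(char *buf, std::size_t len) override {
        for (std::size_t i = 0; i < len; ++i) buf[i] = fill_;
        return len;
    }
private:
    char fill_;
};
```

Both versions dispatch through a pointer at run time; the difference is only in who writes and maintains the table.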


Are you referring to a specific subset of C++?


There is nothing predefined, but it's not hard to scope one either. Basically a freestanding C++20 will do most of it already; just disable exceptions, RTTI, etc.


I'm skeptical that this is better than starting to rewrite the kernel in Rust. It would be better if everyone could focus on rewriting things into Rust rather than having a choice of rewriting into C++ or Rust. Unlike this email, C can be converted into Rust piecemeal and integrate with the rest of the kernel. It's also in kernel developers' best interest to start learning Rust, and getting them to start sooner rather than later will be beneficial.


I'm skeptical because C++ is a slippery slope. Glad to see they are going for templates and concepts, and not constructors and destructors.

Yes, the thing C++ is supposedly good for, RAII, it actually got a little wrong:

1. Default construction / value initialization: Causes lots of initialization before assignment that is obviously unnecessary. Try allocating a buffer: `std::make_unique<char[]>` surprisingly memsets the buffer.

2. Constructors: No way to fail without exceptions. That buffer example again: Without exceptions, `std::make_unique<char[]>` will actually attempt to memset a nullptr on allocation failure … before your nullptr check (which, by the way, entitles the compiler to delete your nullptr check as well).

3. Move is a (burdensome) runtime state that mandates nullability.

4. Destructors: Can't take arguments, forcing objects to contain implicit references to each other.

Rust's affine type system fixes 1-3, but not 4, which only a linear type system could: https://en.wikipedia.org/wiki/Substructural_type_system#The_...
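
To make points 1 to 3 concrete, here is a minimal sketch using only the standard library; the helper names (`make_zeroed`, `try_make_uninit`, `moved_from_is_null`) are made up for illustration. It shows that `std::make_unique<char[]>` value-initializes, that a nothrow new-expression is one way to get an unzeroed buffer with a checkable failure, and that a moved-from `std::unique_ptr` is an observable null state. (C++20 also offers `std::make_unique_for_overwrite` for the unzeroed case.)

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>

// Point 1: std::make_unique<char[]>(n) value-initializes the array,
// so every byte is zeroed before the caller writes anything.
inline std::unique_ptr<char[]> make_zeroed(std::size_t n) {
    return std::make_unique<char[]>(n);
}

// Point 2: with a nothrow new-expression, allocation failure is reported
// as a null pointer the caller can check, and the bytes are left
// uninitialized (default-initialization does nothing for char).
inline std::unique_ptr<char[]> try_make_uninit(std::size_t n) {
    return std::unique_ptr<char[]>(new (std::nothrow) char[n]);
}

// Point 3: a moved-from unique_ptr still exists and is null, so "empty"
// is a runtime state every user of the type has to account for.
inline bool moved_from_is_null() {
    std::unique_ptr<char[]> a = make_zeroed(8);
    std::unique_ptr<char[]> b = std::move(a);
    return a == nullptr && b != nullptr;
}
```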


> 4. Destructors: Can't take arguments, forcing objects to contain implicit references to each other.

This issue has been mentioned back in 2018: https://lore.kernel.org/lkml/CAOMGZ=HwTjk9JbGNeWHd+Jr3rwOkFO...


> I'm skeptical that this is better than starting to rewrite the kernel in Rust.

I recently migrated a complex cross platform C99 embedded project to C++17. We got the build systems ported in a day and are able to chip away at it piece by piece.

We looked at Rust, but the main issue wasn't technical; it was the steep learning curve for the team. Rust is not as intuitive to C developers as C++ is.
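
One common way to chip away at a C codebase like this, sketched under assumptions (the function `ring_count` and helper `count_equal` are hypothetical, not from the project described): a file that has been moved to C++ keeps exporting its old entry point with `extern "C"`, so the files that are still C can link against it unchanged.

```cpp
#include <cassert>
#include <stddef.h>

// Declaration as it would appear in the shared header: extern "C" keeps the
// unmangled symbol name so the remaining C files can still call it.
extern "C" size_t ring_count(const unsigned char *buf, size_t len,
                             unsigned char needle);

// The implementation is now compiled as C++ and is free to use C++
// features internally, here a small template helper.
namespace {
template <typename T>
size_t count_equal(const T *first, size_t len, T value) {
    size_t n = 0;
    for (size_t i = 0; i < len; ++i)
        if (first[i] == value) ++n;
    return n;
}
}  // namespace

extern "C" size_t ring_count(const unsigned char *buf, size_t len,
                             unsigned char needle) {
    return count_equal(buf, len, needle);
}
```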


> Unlike this email, C can be converted into Rust piecemeal and integrate with the rest of the kernel.

Did you read the entire email? Here's a quote that seems to directly contradict you:

> converting C code to Rust isn't something that can be done piecemeal, whereas with some cleanups the existing C code can be compiled as C++.

As far as I can tell, the author is correct. What is the pattern for converting C to Rust piecemeal?


> What is the pattern for converting C to Rust piecemeal?

You can call Rust from C, and C from Rust. You can convert it function by function.


Won’t you end up with mostly _unsafe_ code this way?


It is true that the boundary will be unsafe, and that this is more unsafe code than a pure-Rust codebase would have. But in practice, even in kernel-level work, the amount of unsafe code has generally been shown to stay relatively small.


It really depends. If the interface that you're wrapping can have a nice, abstract API (like "parse this JPEG header into a struct" or something), then the hope is that you can figure out how to express that API in safe Rust, even though underneath it might be a big pile of C. The trouble is when there is no clean API boundary that can fit within safe Rust's rules, like when the underlying C code is "object soup" where everything has pointers to everything else, and it's all up to the programmer to know when it's safe to free things. In that case trying to wrap all that in Rust either gives you unsafe code everywhere, or maybe worse, "safe" APIs that are lying to you and are actually unsound.


> then the hope is that you can figure out how to express that API in safe Rust, even though underneath it might be a big pile of C.

This is actually not what you want, because as a rule Safe Rust enforces the use of Rust references which have stricter requirements than C++ references or C/C++/Rust raw pointers, and will otherwise introduce UB. (See the Rustonomicon for a detailed description of those requirements.) A "safe Rust" API is OK when all callers can be proven to satisfy these requirements, otherwise raw pointers are easier even though they must be accessed in an unsafe block.

There are also pitfalls when passing arguments by value to Safe Rust, e.g. a `bool` value MUST be 0 or 1, an `enum` MUST not have an invalid discriminant etc. Breaking any of these requirements when calling Safe Rust from C/C++ makes instant UB a very real possibility.


What you're talking about is what I was referring to as "safe APIs that are lying to you and are actually unsound". I've written more about safety and soundness here: https://jacko.io/safety_and_soundness.html


The issue is that the API is not lying to you; it's just as sound as any other piece of Safe Rust. Rather, the point is that UB can easily occur when Safe Rust is called from an unsafe 'context', such as may occur in ordinary C/C++ code; Safe Rust being sound only implies that the UB can in some sense be 'blamed' on the caller. Nonetheless as a practical matter, fixing the UB can require using "unsafe" functions that will be more general than their Safe counterparts.


I hope I am wrong, but this sounds as if the guy who wrote the email thought that C is a subset of C++.

No, it is not. First example: initialisation of structs.
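
A small illustration of the struct-initialisation point, assuming a C++20 compiler; the `point` struct is made up. C99 allows designators in any order and array designators, while C++20 only accepts designators for direct members in declaration order.

```cpp
#include <cassert>

struct point {
    int x;
    int y;
};

// Valid C99, ill-formed in C++20: designators out of declaration order.
//   struct point p = { .y = 2, .x = 1 };
// Valid C99, not part of C++20: array designators.
//   int table[4] = { [2] = 7 };

inline point make_point() {
    // This initializer is accepted by both C99 and C++20 because the
    // designators follow the declaration order of the members.
    struct point p = { .x = 1, .y = 2 };
    return p;
}
```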


>What is the pattern for converting C to Rust piecemeal?

You take a piece and either rewrite it yourself or use a transpiler that produces potentially unsafe Rust code which you then later would want to rewrite into safe Rust code.


A recent practical example of the former: the fish shell is being rewritten incrementally from C++ to Rust, and the work is almost finished: https://github.com/fish-shell/fish-shell/discussions/10123

An example of the latter: c2rust, which is a work in progress but is very impressive https://github.com/immunant/c2rust

It currently translates into unsafe Rust, but the strategy is to separate the "compile C to unsafe Rust" steps from the "compile unsafe Rust to safe Rust" steps. As I see it, this makes the overall task simpler, allows for more user freedom, and makes the latter potentially useful even for non-transpiled code. https://immunant.com/blog/2023/03/lifting/


Somewhat related: In 2020 gcc bumped the requirement for bootstrapping to be a C++11 compiler [0]. Would have been fun to see the kernel finally adopt C++14 as the author suggested.

I don't think that Linus will allow this, since he just commented that he will allow Rust in drivers and major subsystems [1].

I would have hoped to see more answers, or something in here from actual kernel developers.

0: https://github.com/gcc-mirror/gcc/commit/5329b59a2e13dabbe20...

1: https://youtu.be/YyRVOGxRKLg?si=_ad7wU51bDdDg6Ic&t=104



EDIT: I failed to notice the date stamp, you can ignore this.

Jeez. Linus writes like C++ killed his dog or something.

There's plenty of great C++ 'wins' over C that make just doing everyday programming tasks so much easier and simpler for no cost.

You don't need to write code so abstract and in the clouds that it becomes impossible to maintain - I certainly don't.

And frankly, you can shoot yourself in the foot just as easily with C, just in different ways.

To be clear: I'm not saying the kernel needs C++, I'm not qualified to write on that specifically. But I am saying C++ is not nearly as bad as he makes it out to be.


> makes it out to be.

made it out to be. That post is over 15 years old at this point, which goes to the very point of the posted article - C++ has (subjective opinion here) improved a lot in that time.

No idea what his current opinion on C++ is. Personally I'd rather there was no C++ or Rust in the Linux kernel. IMHO it would be significantly less of a cognitive burden to convert the entire kernel gradually to C++ than Rust.

But I wouldn't necessarily hold a man to opinions he held 15 years ago when the landscape has changed so much in that time. Rust was only born the previous year (2006) and not championed by Mozilla until 2009. Yet here we are and there is support in the official kernel for it anyway, whereas C++? Not so much.


Mea culpa, I did not notice the date stamp. C++ 15 years ago was certainly much worse than today.


Note the year. While I used C++ long before that and certainly didn't agree with him, the C++ Linus took issue with was a lot worse than the C++ of today.


No reply from Linus Torvalds yet! Hoping to find one was half the reason why I clicked on this.


I didn't notice one back in 2018, so I wouldn't hold my breath now.


But this time it's from H. Peter Anvin.


As with all good writing, it is hard for me to tell serious suggestions from satire.

However, hpa seems to be the crazy genius type, so maybe it's all for real.

The thing I'd like to know is, given that the kernel is written in "kernel C", a shared culture about how C should be used together with a hairy mountain of macros, why not make kernel-C a proper language?

It's probably just a number of extensions away. Together with some rules about code generation, it could make for a fairly neat dialect of C that would be much easier to use and understand. It would also pave the way for further experiments with compile time guarantees about soundness of isolated sections of the code.

Because, and we should be honest about this, going any C++-like route would impact the long term quality of submitted code. There should be no technical reason for this, but that does not make it something we should turn a blind eye to.


Does this have implications for the place of C in the programming world? I am seeing a trend in HPC and systems programming toward more structured languages. Even FORTRAN has objects now.

It gives me some doubts about whether I should put my time into learning C. A lot of the tech I care about uses C++ (I want to work in ANNs); true, the kernels are basically in C, but the code is C++.


The path to migrate to Zig is, or would be, the most straightforward, except that that language is not ready. But in design terms it's definitely got the things the OP is championing C++ for, and a better story for actually addressing longstanding systems programming issues instead of heaping stuff onto the C toolchain and therefore making everyone debug C build-time errors.


Seeing that the last time I tried to use Zig (completely inexperienced, trying to do completely basic stuff) I hit a compiler bug within two hours, I'd say that "not ready" is about right.

To be fair, the latest GitHub version had the bug already fixed, but it's still a sign the language is basically still in an alpha stage.


I think the standard library has some bugs too.


The reasoning behind switching to C++ today rather than Rust, Zig, or any other language is familiarity. C++, while it currently doesn't have memory safety, is extremely easy for any C programmer to ease into, especially given the limited subset that Anvin is proposing for kernel use. Meanwhile, Zig and Rust look absolutely awful.

As somebody who's been using C++ for eight years, I despise Rust and Zig (and Nim and...) syntax. Don't get me wrong, I like the concept of memory safety, but I think too many languages are going "We have to stand out!" and therefore are creating completely new syntaxes. That may work for you, but frankly, I'm happiest in a language that sticks close to C++ syntax (which is probably one reason I love D so much).

Also, C build errors have nothing to do with C++? What are you getting at there?


I'd be a little interested to hear what you like about C++'s syntax! As a non-C++ programmer, I mostly think of the language as a pile of mistakes that kinda had to happen for other languages to learn from them, very much including syntax (ex. types preceding declarations, necessitating `auto` and complicating parsing) - and I actually think the ugliest parts of Rust copy those mistakes (ex. using <> for generics, which are ambiguous with less-than greater-than when used with no whitespace, necessitating the turbofish).

(I've heard plenty of complaints about Rust syntax, so I'm less interested in that than in what C++ syntax gets right. Unless it's just a familiarity thing...)


It's pretty much familiarity. If I'd learned Rust back in the day, I'd probably not want to learn C++ for syntax reasons as well.

One thing I think Rust gets wrong is classes. I absolutely don't understand why you would require a separate `impl` block to provide methods. C++ has a reasonable syntax of providing methods in the class.


Well, they're not classes, they're interfaces, and so you can implement multiple interfaces for multiple types, and so you have to implement them separately from the type declaration. There's not too much of a way around that, I think.


This feels like a personal statement of familiarity rather than looking ahead to designing a language that has to be taught to the next generation of software engineers. AFAIK, the only syntax Rust introduces over C++ is the try operator (a.k.a. `?`) and match statements. Furthermore I can't imagine how someone could prefer

    auto&& my_closure = [&](int x, int y) { return x + y; };

to just

    let my_closure = |x: i32, y: i32| { x + y };

C++ is the one that tries hard to stand out.


> This feels like a personal statement

Because it is. And it is a statement I agree with 100%, as a C developer of 20+ years with a hardwired C parser in my brain: Rust, Zig and some (most?) newer C++ syntax look contorted at minimum (to my eyes).

I wish the only difference were just new operators, but the mere fact that the type has to come after the variable name is awful to me (likewise for return types in function declarations). One can tell me 100 reasons why Rust does it this way, and they'll probably all be true and right, and you can call me all sorts of things, but this kind of new syntax puts me off right away.


The difference is negligible in real syntax:

    auto my_closure = [](int x, int y) { return x + y; };

    let  my_closure = |x: i32, y: i32| { x + y };

and frankly, the option of an explicit capture-list with aliases and mixing moves and copies is a benefit:

    int sum = 0, diff = 0;
    
    auto adder = [&sum](int num) {
        sum += num;
    
        // compilation error, author didn't
        // mean to capture diff.
        diff -= num;
    };
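
The "aliases and mixing moves and copies" part could look roughly like this, assuming C++14 or later; `make_accumulator` is a hypothetical helper written for this comment.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Builds a closure whose capture list mixes a renamed reference, a move,
// and a copy, each spelled out explicitly.
inline auto make_accumulator(int &sum, std::unique_ptr<int> step, int scale) {
    return [&total = sum,            // alias: reference to sum, new name
            step = std::move(step),  // move: the closure owns the pointer
            scale                    // copy: independent of the caller
           ](int num) {
        total += (num + *step) * scale;
    };
}
```

Because the closure owns a `std::unique_ptr`, it is itself move-only, which the compiler enforces at every use site.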


> C++, while it currently doesn't have memory safety [...] Meanwhile, [...] Rust look absolutely awful

Memory safety is the primary reason people are advocating for Rust in the kernel! Given that goal, C++ looks absolutely awful.


Perhaps it is possible. Modern C++ is quite different from previous incarnations of C++, but when you are writing kernel code you are in a space where a lot of stuff is not available: you are running in the kernel context. The userland part is waiting for you to give control back, so your code must be fast, do the required job, and be predictable. We cannot have a line of code that will, sometimes, create a huge CPU or memory load because a lot of stuff is going to happen behind it. That's why C is so effective: the kernel context is not a place where you can have garbage collection or class machinery kicking in when you're supposed to give control back to userland as quickly as possible.


Kinda feel like he just wrote this to make the Rust folks seethe.


Should have been titled as Converting the Kernel to Rust. Heck even to Zig would be more appealing.


What’s the benefit? Rust actually is a nice language. C++ requires deciding which features to use? I don’t know modern C++ but it seems to have a kind of cult-like indoctrination where everything else is wrong and only each person's flavor is correct, leading to endless bike shedding.

C may have technical challenges but seemingly personal preferences are much more limited, just like the language. That helps large projects move forward.


"Kernel C" is the name given to the subset of C (read, Linux's flavor of C) that was formed by deciding which language features/compiler extensions they would/wouldn't use. Hardly a new or unsolvable problem.

Benefits of C++ over Rust:

- Far easier and nicer to integrate with an enormous existing C codebase

- C semantics are much more similar to C++ than to Rust, so no need to do huge pattern redesigns

- No need to suddenly fight a borrow checker and prove lifetimes for a code base that isn't aware of them at all

- Interfacing with existing C code will require unsafe interop, which greatly reduces the benefit of Rust's primary selling point

- Transitions to C++ code will, counterintuitively, almost certainly introduce fewer bugs than rewriting in Rust, because you can do it in a much more piecemeal manner, and only in a very disciplined way

Why should they rewrite it in Rust?


Rust gives you useful and important properties that C++ doesn't. For example, you can look at a piece of Rust code and, by following very simple rules, verify that there's no UB in that code. You can't do the same in C++.

Also, the interop story with Rust is equal to or better than C++'s in many areas at this point. Aside from tools like bindgen eliminating entire families of footguns, try doing something like registering a member function as a C callback, the way virtually every driver works. The Rust equivalents in the extremely immature kernel dev trees are a bit ugly, but already much simpler than the idiomatic C++ equivalents. For additional fun, try to do it safely with a custom allocator or closures in a modern version.
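
For readers who haven't done the C++ side of that exercise, a minimal sketch of the usual pattern is a static trampoline plus an opaque context pointer. Everything here (`irq_handler_t`, `irq_slot`, `register_irq`, `Counter`) is hypothetical and only stands in for a C-style registration API; nothing stops the caller from passing the wrong context pointer, which is the kind of footgun being discussed.

```cpp
#include <cassert>

// Hypothetical C-style API: a callback plus an opaque context pointer.
using irq_handler_t = void (*)(int irq, void *ctx);

struct irq_slot {
    irq_handler_t handler;
    void *ctx;
};

inline void register_irq(irq_slot &slot, irq_handler_t handler, void *ctx) {
    slot.handler = handler;
    slot.ctx = ctx;
}

class Counter {
public:
    void on_irq(int irq) { last_irq_ = irq; ++hits_; }
    int hits() const { return hits_; }
    int last_irq() const { return last_irq_; }

    // Trampoline: a plain function with the C-style signature that recovers
    // the object from the context pointer and forwards to the member.
    // The cast is unchecked; a wrong ctx is undefined behavior.
    static void trampoline(int irq, void *ctx) {
        static_cast<Counter *>(ctx)->on_irq(irq);
    }

private:
    int hits_ = 0;
    int last_irq_ = -1;
};
```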


>No need to suddenly fight a borrow checker

This isn't a bad thing. Failing to compile when there is a memory error is better than leaving it in to later manifest as a mysterious crash or be found by a fuzzer or an attacker. When programming without a borrow checker you have to be the borrow checker yourself. People being their own borrow checker does not scale.


> Rust actually is a nice language. C++ requires deciding which features to use?

With Rust you have to choose which features to use. Unlike C++, with its "you pay for what you use" model, Rust's features are arguably still broken, unstable, and under active development. See async Rust for example.

> I don’t know modern C++ but it seems to have a kind of cult-like indoctrination where everything else is wrong and only each person's flavor is correct, leading to endless bike shedding.

Indeed you don't know what you're talking about. Adopting a style guide is not the same thing as joining a cult.

C++ has been in active use for around 30 years, and modern C++ for over a decade. People use it just fine, and don't feel the need to evangelize. Other communities sadly still appear to feel a constant need to reassure and convince themselves they didn't make the wrong choice.


Should I use boost style? Qt/Eigen styling? Can I get a subset of the language and stdlib without allocations? Should I use multiple inheritance with I classes or template duck typing? What’s the cycle and storage cost of smart pointer wrappers? Atomics kill pipelines and caches. Yeah, the zero cost isn’t actually zero cost, sorry to say. Are closed over stack variables moved or referenced? If referenced, what happens if the closure outlives the call context?


Google style guide is good tbh.

You should use the right tool for the job.

std::unique_ptr is just a move-only ergonomics type over a bare pointer, it doesn't have extra state. std::shared_ptr is basically no different than Rust's Arc.

Closures specify capture semantics explicitly, so you will always know, and default to copy.

And yes, you can capture local references and leave. Don't do that, or return references to local variables, or lots of other fun stuff.
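
A short sketch of both claims, with hypothetical function names. The size check holds on the mainstream standard libraries with the default deleter but is not something the standard guarantees, so treat it as an assumption.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Typical, not guaranteed: unique_ptr with the default deleter is the size
// of a raw pointer.
static_assert(sizeof(std::unique_ptr<int>) == sizeof(int *),
              "assumes the default deleter takes no storage");

// Safe: the local is copied into the closure, so the closure can outlive
// the function that created it.
inline std::function<int()> counter_by_value() {
    int start = 41;
    return [start] { return start + 1; };
}

// Unsafe (left commented out): captures a reference to a local that is gone
// by the time the closure runs, so calling it is undefined behavior.
//   inline std::function<int()> counter_by_reference() {
//       int start = 41;
//       return [&start] { return start + 1; };
//   }
```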


> Should I use boost style? Qt/Eigen styling?

You use whatever you wish to use. You can also waste time debating whether you should name your variables in camelCase or snake_case. It doesn't matter, and no one will berate you for whatever choice you made.

It's your call. Do you think that having a choice is bad?



