An example that needs to be in the textbooks. A detailed explanation and a timeline along with the code snippets. It succinctly shows you the complexities involved. Kudos to Max for putting it all into the post.
> Blaming the Linux kernel (i.e. somebody else’s code) for data corruption must be the last resort.
^^^ I can only imagine the stress levels at this point.
10 years ago I found an even more outrageous bug in Windows 8.
I was working at MSFT back then and I was writing a tool that produced 10 GB of data in TSV format, which I wanted to stream into gzip so that later this file would be sent over the network. When the other side received the file they would gunzip it successfully, but inside there would be mostly correct TSV data with some chunks of random binary garbage. Turned out that the pipe operator was somehow causing this.
As a responsible citizen I tried to report it to the right team, and there I ran into problems. Apparently no one in Windows wants to deal with bugs. It was ridiculously hard to report this while being an employee; I can't imagine anyone being able to report similar bugs from outside. And even though I reported that bug, I saw no activity on it by the time I left the company.
However I just tried to quickly reproduce it on Windows 10 and it wouldn't reproduce. Maybe I forgot some details of that bug or maybe indeed they fixed this by now.
Worked there too at one point. It can be a struggle to find the right feature team. Once you do, if you can get it triaged, unless it’s high sev high priority it’s getting kicked to the next time period.
> Maybe I forgot some details of that bug or maybe indeed they fixed this by now.
There are lots of things which have been fixed in Windows 10, I'd go so far to say 1903 (19H1) is where things started to settle down, but even the latest versions are not perfect. When the Israeli/Palestinian conflict broke out in 2019, some of the US military computers started playing up for about a week, after the US vetoed something at the UN level regarding this conflict. So MS still has a long way to go to get things secure.
Another example of a vulnerability that is purposefully obfuscated in the commit log. It is an insane practice that needs to die. The Linux kernel maintainers have been doing this for decades and it's now a standard practice for upstream.
This gives attackers an advantage (they are incentivized to read commits and can easily see the vuln) and defenders a huge disadvantage. Now I have to rush to patch whereas attackers have had this entire time to build their POCs and exploit systems.
I've described how we (the kernel security team) handle this type of thing many times, and even summarized it in the past here:
http://www.kroah.com/log/blog/2018/02/05/linux-kernel-releas...
Scroll down to the section entitled "Security" for the details.
If you wish to disagree with how we handle all of this, wonderful, we will be glad to discuss it on the mailing lists. Just don't try to rehash all the same old arguments again, as that's not going to work at all.
Also, this was fixed in a public kernel last week, what prevented you from updating your kernel already? Did you need more time to test the last release?
Edit: It was fixed in a public release 12 days ago.
I don't know what you don't understand: EVERY single kernel release fixes a few vulnerabilities. If you lazily refuse to update because none of those say "hint: there is a vulnerability here", then you are taking the deliberate action of skipping some security fixes. Greg's announcements always say "all users must upgrade". If there was sometimes a different signal such as "all users must really really really upgrade", then for sure you would simply skip all other ones, as it already seems like you're waiting for a lot of noise before deciding to apply due fixes, and you would remain vulnerable to plenty of other vulns for much longer.
Here the goal was to make sure that all those who correctly do their job are fixed in time. And they were. Those who blatantly ignore fixes... there's nothing that can be done for them.
I obviously am very comfortable disagreeing with people who work on the kernel or adjacent software. Working in those areas does not at all make them correct, or even informed, especially with regards to security.
It's been working semi-well, and gives us a way to deal with longer embargo times (like months instead of weeks and days), but it does not integrate well into the linux-distro-like way of working just yet, which is an issue that hopefully will be resolved sometime in the future if the linux-distro members wish it to be.
> When doing kernel releases, the Linux kernel community almost never declares specific changes as “security fixes”. This is due to the basic problem of the difficulty in determining if a bugfix is a security fix or not at the time of creation. Also, many bugfixes are only determined to be security related after much time has passed, so to keep users from getting a false sense of security by not taking patches, the kernel community strongly recommends always taking all bugfixes that are released.
> Linus summarized the reasoning behind this behavior in an email to the Linux Kernel mailing list in 2008 ...
Since severity can be a moving target, it seems like there is no straightforward solution. With that said, by hiding the known ones, older distros don't have much of a hope in hell of getting all reported CVE fixes back-ported.
Why isn't there a public index mapping known CVE fixes to git commit IDs? This seems totally doable and would make the world a more secure place overall.
> older distros don't have much of a hope in hell of getting all reported CVE fixes back-ported
Older distros have always had a ton of privilege escalation bugs and I don’t think that’s ever gonna change. If you can’t keep everything updated, your machines have to be single-tenant.
What should they do instead? You have to rush to patch in any case. If the maintainers start to label commits with "security patch" the logical step is that it doesn't require immediate action when the label is not there. Never mind that the bug might actually be exploitable but undiscovered by white hats.
If you do not want to rush to patch more than you have to, use a LTS kernel and know that updates matter and should be applied asap regardless of the reason for the patch.
When someone submits a patch for a vulnerability, label the commit with that information.
> You have to rush to patch in any case.
The difference is how much of a head start attackers have. Attackers are incentivized to read commits for obfuscated vulns - asking defenders to do that is just adding one more thing to our plates.
That's a huge difference.
> the logical step is that it doesn't require immediate action when the label is not there.
So I can go about my patch cycle as normal.
> Never mind that the bug might actually be exploitable but undiscovered by white hats.
OK? So? First of all, it's usually really obvious when a bug might be exploitable, or at least it would be if we didn't have commits obfuscating the details. Second, I'm not suggesting that you only apply security labeled patches.
Don't know why your other comment got downvoted. Silently patching bugs has left many LTS kernels vulnerable to old bugs, because the fixes weren't tagged as security fixes. It also leads to other issues: https://grsecurity.net/the_life_of_a_bad_security_fix
I've read the post before, I've seen the talk, and frankly it's been addressed a number of times. It's the same silly nonsense that they've been touting for decades, i.e. "a bug is a bug".
They don’t need to label it security even, just an “upgrade now, upgrade soon, upgrade whenever”.
But they clearly don’t want to nor care about making that call (and even more clearly they basically expect everyone to run the latest kernel at all times, and if you run into a bug there, no doubt you’ll be told not to run the latest kernels).
I think you missed my point. Attackers will go through commits regardless of a "Security Patch" tag.
But going about your normal patch cycle for things not labelled "Security Patch" just means that if a patch should have been tagged but for some reason wasn't, you're in the same situation.
I do see the value in your approach, but it just does not change anything for applications where security is top priority.
Well Xen for instance includes a reference to the relevant security advisory; either "This is XSA-nnn" or "This is part of XSA-nnn".
> If the maintainers start to label commits with "security patch" the logical step is that it doesn't require immediate action when the label is not there. Never mind that the bug might actually be exploitable but undiscovered by white hats. If you do not want to rush to patch more than you have to, use a LTS kernel and know that updates matter and should be applied asap regardless of the reason for the patch.
So reading between the lines, there are two general approaches one might take:
1. Take the most recent release, and then only security fixes; perhaps only security fixes which are relevant to you.
2. Take all backported fixes, regardless of whether they're relevant to you.
Both Xen and Linux actually recommend #2: when we issue a security advisory, we recommend people build from the most recent stable tip. That's the combination of patches which has actually gotten the most testing; using something else introduces the risk that there are subtle dependencies between the patches that haven't been identified. Additionally, as you say, there's a risk that some bug has been fixed whose security implications have been missed.
Nonetheless, that approach has its downsides. Every time you change anything, you risk breaking something. In Linux in particular, many patches are chosen for backport by a neural network, without any human intervention whatsoever. Several times I've updated a point release of Linux to discover that some backport actually broke some other feature I was using.
In Xen's case, we give downstreams the information to make the decisions themselves: If companies feel the risk of additional churn is higher than the risk of missing potential fixes, we give them the tools to do so. Linux more or less forces you to take the first approach.
Then again, Linux's development velocity is way higher; from a practical perspective it may not be possible to catch the security angle of enough commits; so forcing downstreams to update may be the only reasonable solution.
Not OP, but please do try to influence this policy if you can:
1. The commit message [1] does not mention any security implication. This is reasonable, because the patch is usually released to the public earlier and it makes sense to do some obfuscation, to deter patch-gappers. But note that this approach is not a controversy-free one.
2. But there is also no security announcement in stable release notes or any similar stuff. I don't know how to provide evidence of "something simply does not exist".
3. Check the timeline in the blog post. The bug being fixed in a stable release (5.16.11 on 2022-02-23) marks the end of upstream's handling of this bug. Max then had to send the bug details to the linux-distros list to kick off (another, separate process) the distro maintainers' response. If what you are maintaining is not a distro, good luck.
#1 is intentional, for better or for worse. It’s certainly well-intentioned too, although the intentions may be based on wrong assumptions.
#2: upstream makes no general effort to identify security bugs as such. Obviously this one was known to be a security bug, but the general policy (see #1) is to avoid announcing it.
#3: In any embargo situation, if you’re not on the distribution list, you don’t get notified. This is unavoidable. oss-security nominally handles everyone else, but it’s very spotty.
Sometimes I wish there was a Linux kernel security advisory process, but this would need funding or a dedicated volunteer.
As far as I know, this doesn’t get information from upstream maintainers. For this to work well, I think we would want actual advisories generated around commit time, embargoed early notification, and a process for publication.
TBH the thing that annoyed me most in this story is the "Someone had to start the disclosure process on linux-distros again and if they didn't no one would know" part. There are certainly silent bug fixes where the author intentionally (or not) does not post to linux-distros or any other mailing lists even after the stable release. It would take an hour to dig up a good example, though. (Okay, maybe 10 minutes if I'm going to read Brad Spengler's rants)
I guess a Linux kernel security advisory process is needed to fix this, but yeah :(
This is about the commit that fixed the bug, not the commit that introduced the bug. The accusation is not that linux developers intentionally introduced a vulnerability. Instead it is that linux developers hid that a commit fixed a vulnerability. Linux does this to prevent people from learning that the vulnerability exists.
> Linux does this to prevent people from learning that the vulnerability exists
No, not at all, just to leave users time to deploy the fix before everyone jumps on exploits. This is important because every single backported patch is a candidate for an exploit already, and it's only a matter of time before any of them is exploited. That's why embargoes have to stay short. It takes some time to figure out whether a bug may have security impacts; once that is figured out, it takes much less time to develop an exploit.
By the way it could have really happened that the fix for data corruption would have been merged first, and only later the author figured there was a security impact. And the patch wouldn't have been any different. That's why leaving 1-2 weeks for the fix to flow via distros to users, and having the author post a complete article is by far the best solution for everyone.
Nobody is arguing that users having a 1-2 week patch window is a bad thing. However, this frankly seems incompatible with open-source projects. Silently patching issues does not work in practice; it frequently leads to missed fixes, misapplied patches and other incompatibility woes. The situation with backports and LTS releases showcases this well— the only truly well-supported kernel is latest. Everything else is a patchwork of best-effort fixes, not all of which may have been applied correctly. Brad Spengler of grsecurity fame talks frequently about this (primarily via Twitter): https://twitter.com/spendergrsec
Not really. As you say there's an extremely difficult balance between open source and not exposing everyone at once. You can't get a fix deployed everywhere without it being public first, or it ends up in a total unfixable mess. But if the fix is public and gives away too much info (the exploit procedure), then you put everyone in danger until the fix flows to users.
Thus the only solution is to have a public fix describing the bug but not necessarily all the details, while distros prepare their update, and everyone discloses the trouble at the same time. Those who need early notification MUST ABSOLUTELY BE on linux-distros. There's no other way around it. As soon as the patch is published, the risk is non-null and a race is started between those who look for candidate fixes and those who have to distribute fixes to end users.
This is not about silently patching or hiding bugs, quite the opposite, it's about making them public as quickly as possible so that the fix can be picked up, but without the unneeded elements that help vandals damage shared systems before these systems have a chance to be updated. Then it is useful that the reporter communicates about their finding, this often helps improve general security by documenting how certain classes of bugs turn to security issues (Max did an awesome job here, probably the best such bug report in the last few years). And distros need to publish more details as well in their advisories, so details are not "hidden", they're just delayed during the embargo. Those who are not notified AND who do not follow stable are simply irresponsible. But I don't think there are that many doing that nowadays, possibly just a few hero admins in small companies trying to impress their boss with their collection of carefully selected patches (that render their machine even more vulnerable and tend to make them vocal when such issues happen).
In addition it's important to keep in mind that some bugs are discovered as being exploitable long after being fixed. That's why one MUST ABSOLUTELY NOT rely on the commit message alone to decide whether they are vulnerable or not, since it's quite common not to know upfront. I remember a year or two ago someone from Google's security team reported a bug on haproxy that could cause a crash in the HPACK decoder. That was extremely embarrassing as it could allow anyone to remotely crash haproxy. We had to release the fix indicating that the bug was critical and that the risk of crashing when facing bad data was real, without explaining how to crash it (since like a kernel it's a component many people tend to forget to upgrade). Then after the fix was merged, I was still discussing with the reporter and asked "do you think it could further be abused for RCE?". He said "let me check". A week later he came back saying "good news, I succeeded". No way to get that info in the commit message even if we wanted to, since that was too late. Yet the issue was important.
Speaking of Brad, I personally think that grsec ought to be on linux-distros, but maybe they prefer not to appear as "tainted" by early notifications, or maybe they're having some fun finding other issues themselves. We even proposed Brad to be on the security list, because he has the skills to help a lot and improve the security there. He could have interesting writeups for some of the bugs, and it would probably change his perception of what happens there. Maybe one day he'll accept (still keeping hope :-)).
Can you say what you're hoping to do? LK devs tag security fixes with "[SECURITY]" and then what? You would merge individual [SECURITY] commits into your tree?
Currently the situation is that you can just follow the development/stable trees, right (e.g. [0])? Why would you only want the security patches (of which there look to be a lot just in the last couple weeks)? Are you looking to not apply a patch because LK devs haven't marked it as a security patch?
Assume I patch my Linux boxes once a month. I see a commit where an attacker has a trivial privesc. I read the commit, see if it's relevant to me, and potentially decide to do an out of cycle patch. As in, instead of updating next month I'll just update now.
Gotcha. Yeah it does seem like there's some space between the overpromising "I am a Linux Kernel Dev and I proclaim this patch is/is not a security patch" and the underpromising "I am a Linux Kernel Dev and have no knowledge of whether or not this is a security patch". It doesn't seem unreasonable to mark it somehow when you know.
On the other hand, just on that page I linked, there's... a lot of issues in there I would consider patching for security reasons. I don't know how reasonable it is, given the existing kernel development model, to tag this stuff in the commit. The LTS branches pull in from a lot of other branches, so like, which ones do you follow? When Vijayanand Jitta patches a UAF bug in their tree, it might be hanging out on the internet for a while for hackers to see before it ever gets into a kernel tree you might consider merging from.
I guess what I'm saying here is that it seems like a lot to ask that if I find a bug, I:
- don't discuss it publicly in any way
- perform independent research to determine whether there are security implications
- if there are, ask everyone else to keep the fix secret until it lands in the release trees with a [SECURITY] tag
- accept all the blame if I'm ever wrong, even once
That too is a lot of overhead and responsibility. So I'm sympathetic to their argument of "honestly, you should just assume these are all security vulns".
So maybe this is just a perspective thing? Like, there are a lot of commits, they can't all be security issues right? Well of course they can be! This is C after all.
Like in that list, there's dozens of things I think should probably have a SECURITY tag. Over 14 days, let's just call that 2 patches a day. I'm not patching twice a day; it's hard for me to imagine anyone would, or would want to devote mental bandwidth to getting that down to a manageable rate ("I don't run that ethernet card", etc.)
So for me, I actually kind of like the weekly batching? It feels pragmatic and a pretty good balancing of kernel dev/sysadmin needs. Can I envision a system that gave end-users more information? Yeah definitely, but not one that wouldn't ask LK devs to do a lot more work. Which I guess is a drawn out way of saying "feel free to write your own OS" or "consider OpenBSD" or "get involved in Rust in the kernel" or "try to move safer/microkernel designs forward" :).
I think some important context here is that the people who want commits obfuscated are never the ones making a decision about the security label. The people writing the commit already know it's a security issue.
> The people writing the commit already know it's a security issue.
For this special case, yes. But for the vast majority of bugs it's the opposite, and existing bugs get exploited later, because some people decide that certain patches are not security-related and do not apply the fixes.
Then please just consider that every single stable kernel contains 1 or 2 fixes for similar vulnerabilities that nobody took the effort to try to exploit. THIS is the reality.
Are you saying that you are able to read all incoming linux patches, and easily identify changes which fixes a security problem, so that you can come up with a POC by the time the security issue is announced?
If the patch was flagged as a security problem from the beginning, it would give advantage to attackers, since they would know that the particular patch is worth investigating, while the defenders would have to wait for the patch to be finalized and tested anyway.
> Are you saying that you are able to read all incoming linux patches, and easily identify changes which fixes a security problem, so that you can come up with a POC by the time the security issue is announced?
Their point is that a full-time attacker (and there's enough money in it to do it as a full-time job these days) can look for obfuscated commits and take the time to deobfuscate them, whereas a defender doesn't have that kind of time.
I agree, that is definitely possible. That said, it requires a lot of work, since there are a lot of incoming patches. I wonder how many people would have to review every proposed patch, how to select the subset of incoming patches for human review, and how much one would have to pay a team doing all this to get reasonable results and a return on investment.
My point was that if security patches are flagged as such from the start, it saves attackers a lot of time (and money), as they will no longer have to go through (almost) every patch and evaluate whether it could be fixing a security problem. This means that such a scenario will get a lot cheaper, while the defenders won't gain much from it, as one still needs to wait for the fix to be finalized and tested before deploying it in a production environment.
Security researchers already know that they're submitting a patch for a security flaw - there is 0 additional overhead.
> My point was that if security patches are flagged as such from the start, it saves attackers lot of time (and money), as they will no longer have to go through (almost) every patch and evaluate whether it could be fixing a security problem.
Not really.
1. They can just check to see who made the commit - if it's a security researcher, it's obviously a vuln patch
2. The commits are obfuscated in hilariously obvious ways if you know what to look for
3. It's not that hard to look at a commit, it's kinda what they're paid for
> while the defenders won't gain much from that,
When the vuln is found a race begins between attacker and defender. The difference is that attackers know they're in a race and defenders find out two weeks later.
Your 3 points above are true, but this is a perfect example where they didn't apply. A perfectly regular bug, found by someone affected by it who then started to wonder whether or not it could lead to more interesting effects. Also, the "attackers" you're talking about are more interested in the bugs that are not yet fixed, as those are more durable. The goal here is mostly to protect against vandals who do not have such skills but find it fun to sabotage systems. Multiply the max lifetime of critical bugs by the number that are found every year and you'll figure out the number of such permanent issues that affect every system and that some people are paid to look for and exploit. This is where their business is. These ones will at best try to sell their exploits when seeing the fix reach stable, as they know that within two weeks it won't work anymore, so better get a last opportunity to make money out of it.
You have it the wrong way around. Tagging the release as security allows nation-state level attackers with large budgets to investigate the fixes, while normal people have to wait for patches. This gives nation-state level attackers with large budgets a heads-up, making it worse for everyone else. Furthermore, nation-state level attackers with large budgets are more focused on offense than defense.
Attackers with the resources and patience to read and deeply analyze all the commits, over time... those guys were fairly likely to notice the bug back when it was introduced. Plain vs. obscure comments on the patch don't much matter to them. Low-resource and lower-skill attackers - "/* fix vuln. introduced in prior commit 123456789 */" could be quite useful to them.
What is your threat model / situation such that you care about attackers who reverse engineer patches, but are not in the small circle of people who would be informed beforehand?
To me, it seems like the average corporate security team is not going to worry about these kinds of attackers. Security for state secrets might, but they seem likely to be clued in early by Linux developers.
> What is your threat model / situation that you care about attackers who reverse engineer patches, but are not in the small circle of people who would be informed before hand.
Virtually every single Linux user. I think what you're missing is how commonplace and straightforward it is for attackers to review these commits and how uncommon it is for someone to be on the receiving end of an embargo.
Most exploits are n-days, meaning they're for vulnerabilities that already have a patch out for them. Knowing that there's a patch is universally critical for all defenders.
For context, my company will be posting about a kernel (then) 0day one of our security researchers discovered. You can read other Linux kernel exploitation work we've done here: https://www.graplsecurity.com/blog
By threat model I mean, who are you worried about attacking you.
I get that every linux user could be attacked. But why would someone with the relevant knowledge that could pull this off attack a given linux user? Why are you worried about it? (Not trying to be sarcastic, trying to get a sense of what threats you are worried about).
My point is that this is basically just how exploits work for Linux, so it's pretty universal unless your main concern is 0days. As for me personally, I run a company that uses Linux in production. We happen to explicitly do research into Linux kernel security (we'll be publishing tomorrow on a 0day we had reported) https://www.graplsecurity.com/blog
This is why stable branches are a thing. I don't know the branching scheme that the Linux kernel uses, but the idea is that for the oldest (most stable) branch, everything is a (sometimes backported) bugfix with security implications.
The offending commit was authored by Christoph Hellwig and possibly reviewed by Al Viro, who between them represent close to 100% of Linux filesystems and VFS knowledge. Point being, with this level of complexity you're just going to have to live with the fact that there will always be bugs.
VFS/Page Cache/FS layers represent incredible complexity and cross dependencies - but the good news is that the code is very mature by now and should not see changes like this too often.
Your post sounds like it's a bad thing, but "nicer" code is easier to maintain, i.e. there will be fewer bugs (and fewer vulnerabilities).
This bug is an exception to the rule - shit happens. But refactoring code to be "nicer" prevents more bugs than it causes.
Two patches were involved in making this bug happen, and minus the bug, I value both of them (and their authors).
"There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies."
I might have sounded harsh, but I think "shit happens" is not the way to look at this. I'm not claiming I'm a better developer, but I always try to shy away from making things look nicer.
Experience has taught me: deal with problems when they are problems. Dealing with could-be problems can be a deep, very deep rabbit hole.
The commit message gave me the feeling that we should just trust the author.
> I always try to shy away from making things look nicer
That's understandable, though from my experience, lots of old bugs can be found while refactoring code, even at the (small) risk of introducing new bugs.
As (almost) always, the expert's answer is: "It depends". How risky is the change, how big the consequences, how un-nice is the code before, how easy is it to test that the code still works afterwards, etc...
FWIW, I tend to err on the side of "do it", and I usually do it. But I have been in a situation where a customer asked for the risk level, I answered to the best of my knowledge (quite low but it's hard to be 100% sure), and they declined the change. The consequences of a bug would have been pretty horrible, too. Hundreds of thousands of (things) shipped with buggy software that is somewhat cumbersome to update.
While true, it's important to ensure there is adequate test coverage before trying to refactor, in case you miss something.
Also, try to avoid small commits / changes; churn in code should be avoided, especially in kernel code. IIRC the Linux project and a lot of open source projects do not accept 'refactoring' pull requests, among other things for this exact reason.
Agree, but even 100% test coverage can't catch this kind of bug. I don't know of any systematic testing method which would be able to catch it. Maybe something like valgrind which detects accesses to uninitialized memory, but then you'd still have to execute very special code paths (which is "more" than 100% coverage).
A lot of times, this is just shifting the problem to the future and making life harder.
We have a team like this -- their processes often fail, and their error reporting is lacking in important details. But they are not willing to improve the reporting / make errors nicer (=with relevant details); instead they have to manually dig into the logs to see what happened. They waste a lot of time because they "shy away from making things look nicer."
Caveat - I know this doesn't directly apply to the vulnerability at hand, but is a discussion of a tangential view.
> Experience has taught me: deal with problems when they are problems.
Experience has taught me that disparate ways of doing the same thing tend to have bugs in one or more of the implementations. Then trying to figure out if a specific bug exists other places requires digging into those other places.
Make it work. Make it good. Make it faster (as necessary) is the way my long-lived code tends to evolve.
Nonsense. It's just easy to blame refactoring when it breaks something. "You fool! Why did you change things? It was perfectly fine before.". Much harder to say "Why has this bug been here for 10 years? Why did nobody refactor the code?" even when it would have helped.
Not refactoring code also sacrifices long term issues in return for short term risk reduction. Look at all of the government systems stuck on COBOL. I guarantee there was someone in the 90s offering to rewrite it in Java, and someone else saying "no it's too risky!". Then your ancient system crashes in 2022 and nobody knows how it works let alone how to fix it.
Unless the code is very well covered by unit tests, any refactoring can introduce bugs. If the code is well established and no longer changing, there is no ease of maintenance to be gained. There is only downside to changing it.
If the code is causing more work to maintenance and new development, sure it may make sense to refactor it. Otherwise, like the human appendix, just leave it alone until it causes a problem.
I have the impression that most maintainers and project founders care about the project and the source, contrary to what often happens in industry, where other things are more important {sales, features, marketing, blingbling}.
One of the prevailing features of well-driven open-source projects is that you're encouraged to improve the code, i.e. make it better {readable, maintainable, faster, hard}. You're not encouraged to change it for the sake of change, i.e. to impress people.
I have the feeling it is the first case, because it reduced the number of lines and kept the source readable. Aside from that, I don't think good developers want to impress others.
Very good point. Often developers talk in cargo-cult terminology like "beautiful" or "nice" or "elegant" code, but there is no definition of what that even means or whether it empirically leads to better (or worse) outcomes. We know people like it more, but that doesn't mean we should be doing it. A true science would provide hypothesis, experiment, repeated evidence, rather than anecdotes.
(from the downvotes it seems like some people don't want software to be a science)
While a scientific approach would be nice, it is hard to do, and even harder to do correctly in a way applicable to the specific situation. And in the absence of that research, all we have is intuition and anecdotes.
And they work both ways -- there are anecdotes that making code beautiful leads to better outcomes, and there are anecdotes that having ugly code leads to better outcomes.
This means you cannot use the lack of scientific research to give weight to your personal opinions. After all, that argument works in either direction ("There is no evidence that leaving duplicate code in the tree leads to worse outcomes... A true science would provide...")
> Point being, with this level of complexity you're just going to have to live with the fact that there will always be bugs.
I'd like to add, for the less tenured developers around: "with the level of experience, you're just going to have to live with the fact that there'll always be bugs."
This will get worse over time as "more planned obsolescence than anything else" code is committed into the linux kernel.
Many parts of the linux kernel are "done", but you will always have some people who manage to commit stuff in order to force people to upgrade. This is very acute with "backdoor injectors for current and future CPUs", aka compilers: you should be able to compile the linux git tree with gcc 4.7.4 (the last C gcc, which has more than enough extensions to write a kernel), and if something needs to be done in linux code closely related to compiler support, it should be _removing_ stuff without breaking such compiler support, _NOT_ adding stuff which makes linux code compile only with a very recent gcc/clang.
For instance, in the network stack, tons of switch/case and initializer statements don't use constant expressions. Fixing this in the network stack was refused, I tried.
Lately, you can see some linux devs pouring in code using the toxic "_Generic" C11 keyword instead of using type-explicit code, and new _mandatory_ builtins have popped up (I detected them in 5.16 while upgrading from 5.13) which are available only in recent gcc/clang.
When you look at the pertinence of those changes, those are more "planned obsolescence 101" than anything else.
It is really disappointing.
This kind of argument is hypocritical: You want to use newer versions of the Linux kernel yourself (otherwise you could just stick to whatever builds with your toolchain!), but say that the Linux kernel must not use newer versions of things.
The GCC version requirement is 5.1 (which is 7 years old). Before that, it was 4.9, 4.8, 4.6 and 3.2. It has never been 4.7.
Use of newer versions of C than C89 which provides solutions to actual issues is perfectly fine. C11 was picked because it does not require an increase in minimum GCC version to use it, making your entire argument pointless.
The Linux kernel is already pretty lenient; many alternative kernels have their compiler in the tree and target only that.
>you should be able to compile the linux git tree with gcc 4.7.4 (the last C gcc, which has more than enough extensions to write a kernel)
By this logic why not write the entire kernel in assembly? Tools evolve and improve over time and it makes sense to migrate to better tools over time. We shouldn't have to live in the past because you refuse to update your compiler.
That's obviously not their logic at all. Trying to diminish this to "OP refuses to update compiler" is frankly disrespectful of them & their actual point.
To me, their logic is that their old tool works just fine so they shouldn't have to upgrade it. He essentially said that having a plan to upgrade to a newer version of the language or to a more up-to-date toolchain is planned obsolescence. He seems to want to be able to use his specific version of his compiler until the end of time. I don't quite get the justification for this perspective, as GCC is free software and it is simple to upgrade.
Thank you, that's a great reply to his comment. My first impression of his comment was that the kernel project shouldn't chase the latest-and-best compiler releases -- or similarly the most recent C language changes; rather, a boring-technology approach is sensible for such a foundational project as Linux. I see your point, though, that GCC is simple to upgrade. (If I were making the tech decision here, I'd want to ensure that newer GCC's didn't introduce features that I thought were too risky for my project, or at least that I could disable/restrict those features with flags.)
GCC 5.1 (released in 2015) is hardly latest-and-best, though: moving the version bar up only very slowly and with an eye to what distros are using as their compiler version is a pretty solid boring-technology approach, in my view.
It's not completely arbitrary. Notice that they said "the last C GCC". After that version, GCC started using C++ in the compiler itself. I can see why some people would see that as a complexity line that must not be crossed, as it makes bootstrapping harder.
What GCC is written in only matters if you intend to write your own compiler to compile it - which, as you have no compiler yet, would likely have to be written in assembly.
Otherwise you need to download a prebuilt compiler anyway, and whether that is C11 or C++11 is rather unimportant.
There was a never-shipped bug in Solaris back around... I want to say 2006? I don't remember exactly when, but there was a bug where block writes in a socketpair pipe could get swapped. I ended up writing a program that wrote entire blocks where each block was a repeated block counter; that way I could look for swapped blocks, and then also use that for the bug report. The application that was bitten hard by this bug was ssh.
Writing [repeated, if needed] monotonically increasing counters like this is a really good testing technique.
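A minimal sketch of that technique, assuming a 4 KiB block size and a fixed block count (both hypothetical): every word of block N carries the value N, so a swapped block shows up on the checker side as a wrong but internally consistent counter, while corruption shows up as garbage.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 4096
#define NWORDS (BLOCK_SIZE / sizeof(uint32_t))
#define NBLOCKS 100000u

int main(int argc, char **argv)
{
    uint32_t block[NWORDS];
    uint32_t seq = 0;

    if (argc > 1 && strcmp(argv[1], "write") == 0) {
        /* writer: fill block N with the value N, repeated */
        for (seq = 0; seq < NBLOCKS; seq++) {
            for (size_t i = 0; i < NWORDS; i++)
                block[i] = seq;
            if (fwrite(block, BLOCK_SIZE, 1, stdout) != 1)
                return 1;
        }
        return 0;
    }

    /* checker: flag any word that does not match the expected counter */
    while (fread(block, BLOCK_SIZE, 1, stdin) == 1) {
        for (size_t i = 0; i < NWORDS; i++)
            if (block[i] != seq)
                fprintf(stderr, "block %u, word %zu: got %u\n",
                        seq, i, block[i]);
        seq++;
    }
    return 0;
}
```

Usage would be something like `./counters write | ssh host '/tmp/counters check'` (binary name and paths hypothetical) to exercise an ssh pipe like the one above.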
Fix was already merged to Android, however, there are millions of devices that will never be updated. The nice question: can this be used for temp-rooting? Vulnerabilities can be a blessing sometimes...
> there are millions of devices that will never be updated
Luckily, almost all (if not all) of these millions of devices which will never be updated never received the vulnerable version in the first place. The bug was only introduced in 5.8, and due to how hardware vendors work, phones are still stuck in the 4.19 age (or, at best, 5.4, but no 5.10 besides the Pixel 6).
I maintain a ROM for primarily older devices, the big feature is automated kernel CVE patching.
My patcher was able to patch the 15 affected devices I support, and I'll have builds up in the next few days.
https://gitlab.com/divested-mobile/divestos-build/-/commit/5...
It has been less than a month since fixes emerged for kernels, and your PoC exploit has already been released to the public. Should you not have waited at least a bit longer (for example 2 months) before disclosing this vulnerability so that people/companies can keep up with patching? Don't they need more time to patch their servers and legacy systems etc. before this becomes yet another log4j exploitation fest? That is, if this really is the new dirty cow vuln.
I get responsible disclosure is important, but should we not give people some more opportunity to patch, which will always take some time?
It's the absolute opposite. It's insane that this commit wasn't flagged as a patch for a major vulnerability. Why am I finding out about this now? Why is it now my job to comb through commits looking for hidden patches?
It puts me, as a defender, at an insane disadvantage. Attackers have the time, incentives, and skills to look at commits for vulns. I don't. I don't get paid for every commit I look at, I don't get value out of it.
This backwards process pushed by Greg KH and others upstream needs to die ASAP.
Once the commit is in the kernel tree it's effectively public for those looking to exploit it. Combing recent commits for bug fixes for the platform you're targeting is exploitation 101.
The announcement only serves to let the rest of the public know about this and incentivize them to upgrade.
FWIW, if it in any way comes off like I'm blaming Max for this, I'm not. Anyone blaming Max for how vulnerabilities are disclosed is completely ignorant of the kernel reporting process.
Just wanted to note that your replies come off as quite confrontational/aggressive. I think you have valid points, and it's clear that this topic is important to you, but you're heating up the atmosphere of the thread more than necessary.
Why not three months? Why not six? I do not get it. How is this same conversation still happening? This was public the day the patch was sent to the list or pushed to a public git server. Do you think adversaries are sitting around for a POC? Or for you to decide to get around to patching?
I can't help but physically shake my head as I write this. I can't imagine actually asking people to play pretend security-through-obscurity because folks still can't be arsed to implement some sort of reasonable update strategy. I have enough experience in tiny and huge shops to say that it's a matter of prioritization, and it's just a blatant form of technical debt and poor foresight.
You never know if it was already being exploited, but one thing is sure: once the patch gets merged, it's a race and only a matter of time before an exploit is written. Two weeks is already long and may leave distro users exposed, which is why it's important that it doesn't stay too long in the fridge. Ideally we should have a "patch day" every week that distros would align on. That would allow users to adapt to this and get prepared to applying fixes everywhere without having to wonder about what fix addresses what, and more importantly it would remove the surprise effect. The distros process doesn't make this possible at the moment.
>Let me briefly introduce how our log server works: In the CM4all hosting environment, all web servers (running our custom open source HTTP server) send UDP multicast datagrams with metadata about each HTTP request. These are received by the log servers running Pond, our custom open source in-memory database. A nightly job splits all access logs of the previous day into one per hosted web site, each compressed with zlib.
>Via HTTP, all access logs of a month can be downloaded as a single .gz file. Using a trick (which involves Z_SYNC_FLUSH), we can just concatenate all gzipped daily log files without having to decompress and recompress them, which means this HTTP request consumes nearly no CPU. Memory bandwidth is saved by employing the splice() system call to feed data directly from the hard disk into the HTTP connection, without passing the kernel/userspace boundary (“zero-copy”).
>Windows users can’t handle .gz files, but everybody can extract ZIP files. A ZIP file is just a container for .gz files, so we could use the same method to generate ZIP files on-the-fly; all we needed to do was send a ZIP header first, then concatenate all .gz file contents as usual, followed by the central directory (another kind of header).
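For readers unfamiliar with splice(): the data never has to visit userspace, but splice() requires a pipe on one end, so the usual zero-copy pattern is file -> pipe -> socket. A rough sketch of that pattern (not the CM4all code; error and short-write handling trimmed):

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Feed `len` bytes from an open file into a connected socket without
 * copying them through userspace: page references move from the page
 * cache into the pipe, then from the pipe into the socket. */
static int send_file_zero_copy(int file_fd, int sock_fd, size_t len)
{
    int p[2];

    if (pipe(p) < 0)
        return -1;

    while (len > 0) {
        ssize_t n = splice(file_fd, NULL, p[1], NULL, len, SPLICE_F_MOVE);
        if (n <= 0)
            break;
        /* sketch: assumes the second splice drains everything it was given */
        if (splice(p[0], NULL, sock_fd, NULL, n,
                   SPLICE_F_MOVE | SPLICE_F_MORE) < 0)
            break;
        len -= n;
    }
    close(p[0]);
    close(p[1]);
    return len == 0 ? 0 : -1;
}
```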
Just want to say, these people are running a pretty impressive operation. Very thoroughly engineered system they have there.
This is f*cking scary. Such simple code, so dangerous, and it works. You can trivially add an extra root user via /etc/{passwd|shadow}. There are tons of options for how to p0wn a system.
Eh, it’s a limited subset of kernel versions (ones unlikely to be used in those devices), and requires local execution privileges and access to the file system. Linux in general has had numerous security issues (as has every other OS), often requiring far less access.
Does it need patching? Of course. It’s not a privilege escalation remote code execution issue though, and even if it was, it would be on a tiny fraction of running devices right now.
Those unsupported devices probably don't run Linux 5.8 or later, they are likely on older versions. It would be really useful to have this vuln on them though, it would help with getting root so you can get control of your own device and install your own choice of OS.
It wasn't available via `apt-get update && apt-get dist-upgrade` as of when I drafted that comment, but I confirm that 5.10.92-2 seems to be released now.
Others, note that the new archive name is 'stable-security'. You might need to update your pins if you upgraded from Buster and you're not seeing the update now. I put in a pull request to add it to the release notes.
< 5.8 not being affected is probably a saving grace for quite a few enterprises as I'd expect that LTS distributions may not have got that version included as yet.
I’m curious how git bisect was applied here. Wouldn’t you have to compile the whole kernel somehow and then run your test program using that kernel? Is that really what was done here?
Yes? This is faster and easier than you may think it to be. Building a reasonably small kernel only takes ~a minute. People usually have fully automated git-bisect-run scripts for build & test in qemu.
For me, at least, there's an important difference missing from the debate over the term "C/C++": compiling C code is always much faster than you would expect, but compiling C++ code is always much slower than you would expect...
And yet there's an ongoing effort to optimize the kernel compile time by rearranging all of the headers. On a modern machine with plenty of cores a kernel build is pretty quick, but they're talking about slicing 20% or more off the top.
Perhaps more importantly than being fast, it is scriptable. ("git bisect run" can take a shell command to run and interpret the exit code of, so you could script everything including the kernel recompiles and walk away for a few hours.)
GZIP (.gz) and PKZIP (.zip) are both containers for DEFLATE. GZIP is barely a container with minimal metadata, whereas PKZIP supports quite a bit of metadata. Although you can’t quite concatenate GZIP streams to get a PKZIP file, it’s pretty close—if I recall correctly, you just chop off the GZIP header.
> if I recall correctly, you just chop off the GZIP header.
...to get the raw DEFLATE stream, that is. You still need to attach any necessary metadata for PKZIP, which Max mentions. Their approach for converting between the two is pretty clever: it's so elegant and simple that it seems obvious, but I never would have thought of it. Very nifty, @max_k!
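To make the "same DEFLATE payload, different wrapper" point concrete: with zlib you pick the container at inflateInit2() time, so the bytes inside a .gz member and inside a ZIP entry are handled by the same machinery. A sketch, assuming zlib (for the raw case the gzip header and the 8-byte CRC-32/size trailer have already been stripped):

```c
#include <string.h>
#include <zlib.h>

/* windowBits selects the wrapper zlib expects around the DEFLATE data:
 *   -MAX_WBITS      -> raw DEFLATE (what sits inside a ZIP local entry)
 *   MAX_WBITS + 16  -> gzip header + trailer (a .gz member)
 */
static int init_inflater(z_stream *strm, int raw_deflate)
{
    memset(strm, 0, sizeof(*strm));
    return inflateInit2(strm, raw_deflate ? -MAX_WBITS : MAX_WBITS + 16);
}
```

The gzip trailer already carries the CRC-32 and the uncompressed size, which is presumably part of what makes the on-the-fly conversion described in the article so cheap.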
Once I fell victim to The Dirty Bong Vulnerability, when the cat knocked the bong over onto my Dell laptop's keyboard. Fortunately I had the extended warranty, and the nice repairwoman just smelled it, laughed at me, and cheerfully replaced the keyboard for free. No way Apple would have ever done that.
Nah, that's the most fun part. Once you have one kernel that works and one that doesn't, you can be pretty sure that you'll eventually find the cause of the bug. The part where I would have given up is the "trying to reproduce" part.
Depends entirely on what sort of hardware you have. IIRC, I usually spend around 5 minutes when compiling Linux on my desktop, so not instant but not horrible. The agonizing part would be to have to manually install, boot and test those kernels, or to create a setup involving virtual machines which does that automatically to use `git bisect run`.
>Memory bandwidth is saved by employing the splice() system call to feed data directly from the hard disk into the HTTP connection, without passing the kernel/userspace boundary (“zero-copy”).
What are the memory savings of this splicing approach as compared to streaming [through userspace]?
What does "streaming buffers" mean? splice() avoids copying data from kernel to userspace and back; it stays in the kernel, and often isn't even copied at all, only page references are passed around.
67% savings. If an application reads a 1MB file the normal way, the kernel creates 1MB of buffers in the file system cache to hold the data. Then it copies the data to another 1MB of buffers which are owned by the application. If the application then writes the data out to a network socket, the kernel has to allocate another 1MB buffer to hold the data while it is being sent.
If the application were processing the data in some way, then it would be worth it. Otherwise it is better to skip all of that work.
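For contrast, a sketch of the "normal way" described above, where the same bytes end up in the page cache, in a userspace buffer, and in the socket's send buffer (error handling trimmed; buffer size arbitrary):

```c
#include <unistd.h>

static int send_file_with_copies(int file_fd, int sock_fd)
{
    char buf[64 * 1024];
    ssize_t n;

    while ((n = read(file_fd, buf, sizeof(buf))) > 0) {  /* copy: kernel -> user */
        if (write(sock_fd, buf, n) != n)                 /* copy: user -> kernel */
            return -1;
    }
    return n < 0 ? -1 : 0;
}
```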
Since so many distros seem to lag a good ways behind on packages, and this vulnerability (in its most easily exploited form) was introduced in kernel 5.8, it would seem a fair amount of Linux installs wouldn't actually be vulnerable to this. Is that somewhat correct?
I don't have a basis for how long this might take. As the author mentions, "All bugs become shallow once they can be reproduced." But that point came only after spending probably the largest amount of time waiting for new incident reports to come in, then analyzing the reports (e.g. to determine that most incidents occurred on the last day of the month), and hours staring at application and kernel code. It's very impressive, but certainly the largest amount of time in the 10-month duration was not actually debugging. The "moment of extraordinary clarity" probably sprung out of years of experience.
Ah, I guess my thinking is that they didn't really focus on it. It was annoying but not high priority ... until they started to get an inkling of what was actually going on.
Agreed, about 99% of admins I know would not be able to identify this error, and most likely neither would most Hacker News readers. The last sentence of your post is very true.
I’ve worked with (and been) a dev for several decades, and I can count on one hand the number of folks who would have a chance of figuring this out, and on 2 fingers the number of folks who WOULD.
Of course, most never try to optimize or go so deep like this that they would ever need to, so there is that!
I assume you’d trigger a write some other way - if using this to mess with the shadow file, say, change your password at the same time to flush the file.
We all got the reference you were making; the problem is 'heart' 'bleed' is based around what could be considered the heart (rather than as is normally said, the brain) of a computer 'bleeding' data from one context to another.
In both cases the researchers chose sort of punny names that were also self descriptive and obvious once you read how to produce the exploit. 'Dirty Pipe' is literally the recipe for this exploit / corruption. Maybe your name seems funny to you for some reason that isn't obvious / shared.
> require all fields to be initialized any time an object is created
I'm not a fan of such a policy. That usually leads to people zero-initializing everything. For this bug, this would have been correct, but sometimes, there is no good "initial" value, and zero is just another random value like all the 2^32-1 others.
Worse, if you zero-initialize everything, valgrind will be unable to find accesses to uninitialized variables, which hides the bug and makes it harder to find. If I have no good initial value for something, I'd rather leave it uninitialized.
> I'm not a fan of such a policy. That usually leads to people zero-initializing everything. For this bug, this would have been correct, but sometimes, there is no good "initial" value, and zero is just another random value like all the 2^32-1 others.
So use a language that has an option type, we've only had them for what, 50 years now.
Mandatory explicit initialization, plus a feature to explicitly mark memory as having an undefined value, is a great way to approach this problem. You get the benefit in the majority of cases where you have a defined value you just forgot to set and the compiler errors until you set it, and for the "I know it's undefined, I don't have a value for it yet" case you have both mandatory explicit programmer acknowledgement and the opportunity for debug code to detect mistaken reads of this uninitialized memory.
But I think it would be troublesome to use such a hypothetical feature in C if it's only available in some compiler-specific dialect(s), because you need to coerce to any type, so it would be hard to hide behind a macro. What should it expand to on compilers without support? It would probably need lots of variants specific to scalar types, pointer types, etc., or lots of #if blocks, which would be unfortunate.
Actually, https://news.ycombinator.com/item?id=30588362 has convinced me this wouldn't necessarily solve the bug in question either, since it's a bug caused by (quite legitimately) re-using an existing value. Though it would be easy to implement a "free" operation by just writing `undefined`, so it would still help quite a bit, and more than suggestions like "just use an Optional/Maybe type".
GCC has recently introduced a mode (-ftrivial-auto-var-init) that will zero initialize all automatic variables by default while still treating them as UB for sanitize/warning purposes.
The issue is with dynamic memory allocation as that would be the responsibility of the allocator (and of course the kernel uses custom allocators).
Interesting compiler feature to work around (unknown) vulnerabilities similar to this one. However in this case, it wouldn't help; the initial allocation is with explicit zero-initialization, but this is a circular buffer, and the problem occurs when slots get reused (which is the basic idea of a circular buffer).
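To make that failure mode concrete, a sketch with hypothetical types (not the kernel's pipe code): zero-initializing the ring at allocation time doesn't help once a slot is recycled, because whatever the previous occupant left behind survives unless every field is overwritten.

```c
#include <stdlib.h>

struct slot {
    void        *page;
    unsigned int offset, len;
    unsigned int flags;           /* e.g. a "writes may be merged" flag */
};

struct ring {
    struct slot  slots[16];
    unsigned int head;
};

static struct ring *ring_new(void)
{
    return calloc(1, sizeof(struct ring));   /* every slot starts out zeroed */
}

static void ring_push(struct ring *r, void *page, unsigned int len)
{
    struct slot *s = &r->slots[r->head++ % 16];

    s->page   = page;
    s->offset = 0;
    s->len    = len;
    /* The bug class: s->flags is never reset, so once the ring wraps
     * around, a recycled slot silently inherits the old flags value. */
}
```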
Would this get caught by KMSAN (https://github.com/google/kmsan)? Maybe the circular buffer logic would need to get some calls to `__msan_allocated_memory` and/or `__sanitizer_dtor_callback` added to it? If this could be made to work then it would ensure that this bug stays fixed and doesn't regress.
Yes, but as you said, it works only after adding such annotations to various libraries. A circular buffer is just a special kind of memory allocator, and as such, when it allocates and deallocates memory, it needs to tell the sanitizer about it.
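A sketch of what such an annotation could look like, using the userspace MSan spelling of the hook mentioned above (the kernel-side KMSAN equivalent would use the kernel's own instrumentation hooks, which are named differently):

```c
#include <stddef.h>
#include <sanitizer/msan_interface.h>

/* Hypothetical hook, called by a ring buffer whenever a slot is handed
 * out again: the previous occupant's bytes are still physically there,
 * but any read before a fresh write will now be reported. */
static void slot_mark_uninitialized(void *slot, size_t size)
{
    __msan_allocated_memory(slot, size);
}
```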
What bothers me about the Linux code base is that there is so much code duplication; the pipe doesn't use a generic circular buffer implementation, but instead rolls its own. If you had the one true implementation, you'd add those annotations there, once, and all users would have it, and would benefit from KMSAN's deep insight.
Every time I hack Linux kernel code, I'm reminded how ugly plain C is, how it forces me to repeat myself (unless you enter macro hell, but Linux is already there). I wish the Linux kernel would agree on a subset of C++, which would allow making it much more robust and simpler.
They recently agreed to allow Rust code in certain tail ends of the code base; that's a good thing, but much more would be gained from allowing that subset of C++ everywhere. (Do both. I'm not arguing against Rust.)
Why can't things like option types be used? That solves the issue, as you'd have an `Option<FooType>` that is either `Some(foo)` or `None`, and the two cases could be dealt with separately.
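For illustration only (hypothetical `FooType`, not a kernel type), an option type forces the caller to deal with the "no value" case explicitly:

```rust
struct FooType {
    flags: u32,
}

fn use_slot(slot: Option<&FooType>) {
    match slot {
        Some(foo) => println!("flags = {:#x}", foo.flags),
        None => println!("nothing here yet"),
    }
}

fn main() {
    use_slot(None);
    use_slot(Some(&FooType { flags: 0 }));
}
```

Of course, as pointed out elsewhere in the thread, this doesn't by itself help when an existing `Some` value is being reused with stale fields.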
I love Rust, but would it have prevented this problem? IIUC there was no memory corruption at the language level here. This was really just a logic error.
Yes, it would have. Some code creates an instance of some struct, but doesn’t set the flags field to zero. It thus keeps whatever value happened to be in that spot in memory, an essentially random set of bits. Rust would force you to either explicitly name the flags field and give it a value, or use `..Default::default()` to initialize all remaining fields automatically. Anything else would be a compile–time error.
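A minimal sketch of that (field names made up, not the real pipe_buffer layout):

```rust
#[derive(Default)]
struct PipeBuf {
    offset: u32,
    len: u32,
    flags: u32,
}

fn main() {
    // error[E0063]: missing field `flags` in initializer of `PipeBuf`
    // let buf = PipeBuf { offset: 0, len: 4096 };

    // Either name every field explicitly...
    let a = PipeBuf { offset: 0, len: 4096, flags: 0 };

    // ...or fill in the remaining fields from Default.
    let b = PipeBuf { offset: 0, len: 4096, ..Default::default() };

    assert_eq!(a.flags, b.flags);
}
```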
The bug is that they are reusing (or, repurposing) an already-allocated-and-used buffer and forgot to reset flags. This is a logic bug, not a memory safety bug.
In fact, this might be a prime example of "using Rust does not magically eliminate your temporal bugs, because sometimes they are not about memory safety but about logic". Before this, my favorite such bug was a use-after-free in Redox OS's filesystem code.
Pro tip for random HN Rust evangelist: read the fucking code before posting your "sHoUlD HAVe uSED A BeTTER lANGUAGE" shit.
This is only partially fair. In Rust you would probably have assigned a new object into *buf here instead of overwriting the fields manually. It is good practice to do this (if the code is logically an object initialization, it should actually be an object initialization, not a bunch of field assignments), but it's clunky to do so in C because you can't use initializers in assignments.
The point is: You could have done this in Rust, but you wouldn't have been required to do so, so the exact same logic bug could have emerged. Maybe it would be more Rust-like to write the code like that, but it would have also been possible to write the code like that in C - and since we're talking about the kernel here, even if this code was written in Rust a developer might have written it in the more C-like way for performance reasons.
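To make that concrete, a hedged sketch with made-up types (not the actual kernel structures): field-by-field reuse compiles fine and silently keeps the stale flags, while whole-struct assignment cannot forget the field:

```rust
struct PipeBuf {
    offset: u32,
    len: u32,
    flags: u32, // e.g. a leftover "can merge"-style bit from the previous use
}

// C-style field-by-field reuse: compiles fine, silently keeps the old flags.
fn reuse_fieldwise(slot: &mut PipeBuf, offset: u32, len: u32) {
    slot.offset = offset;
    slot.len = len;
    // oops: slot.flags still holds whatever the previous user left there
}

// Whole-struct assignment: leaving out `flags` would be a compile error.
fn reuse_whole(slot: &mut PipeBuf, offset: u32, len: u32) {
    *slot = PipeBuf { offset, len, flags: 0 };
}

fn main() {
    let mut slot = PipeBuf { offset: 0, len: 0, flags: 0x10 }; // stale bit set
    reuse_fieldwise(&mut slot, 0, 4096);
    assert_eq!(slot.flags, 0x10); // the stale bit survived the "reinitialization"
    reuse_whole(&mut slot, 0, 4096);
    assert_eq!(slot.flags, 0);
}
```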
People writing C re-use allocated objects not because the alternative is clunky, but to improve performance. General-purpose allocators are almost always much slower than something that exploits a known pattern of allocations. I've no idea if Rust has a similar issue. I would think that most kernel code, whether C or Rust, would need to handle the "allocation fails" case and not depend on language constructs to do allocations, but that's just a guess.
I'm not saying you shouldn't reuse allocated objects. I'm talking about building a local object (no dynamic allocation) and assigning it to the pointer at once. This has the same runtime behavior (assuming -O1) as assigning the fields one by one.
You can assign a new object into *buf in C just fine, with `*buf = (struct YourType){.prop1 = a, .prop2 = b}`; it even zero-initializes unset fields! So C and Rust give you precisely the same abilities here.
edit: the "struct pipe_buffer" in question[1] has one field that even the updated code doesn't write - "private". Not sure what it's about, but it's there. Not writing unrelated fields like that is probably not much of an issue now, but it certainly can add up on low-power systems. You might also have situations where you'd want to write parts of an object in different parts of the code, for which you'd have to resort to writing fields manually.
Oh I was not aware of this syntax in C, thanks for bringing it up! I still think the pattern is more common and known in Rust but I might be wrong :)
Re: your other points, "reusing a pre-allocated struct from a buffer" is basically object initialization, which is different from other times you want to write fields. In general an object initialization construct should be used in those cases, this whole thread being an argument why. Out-of-band fields such as the "private" field are a pain I agree, but they can be separated from an inner struct (in which case the inner struct is the only field that gets assigned for initialization).
Taking a step back, the true solution is probably to have a proper constructor... And that can be done in any language, so I'll stand corrected.
The question of safety is often much more about coding standards than actual language features. For a kernel, you'd be violating Rust's safety features left, right, and center (or you'd have a slow kernel that just dies on OOM; choose one) and would have to come up with your own standards for preventing coding errors in those unsafe parts, which is what you already have with C.
To be clear, some base level of trivially safe code is certainly a nice-to-have. I just don't think the amount that helps for a kernel is that much, and the added boilerplate on more unsafe things might even obscure issues.
As far as I understand, the kernel uses C89 + GNU extensions, and there definitely are usages of that syntax in the kernel. (My searching showed it only being used for defining globals, which is odd, but I don't see anything preventing it from being used elsewhere if they wanted to.)
And there are recent plans to move the kernel to a newer C version anyway.
I agree with your sentiment. Only the most strict pure functional languages will prevent you from reusing objects.
You could argue that some languages distinguish raw memory from actual objects and even when reusing memory you would still go through an initialization phase (for example placement new in C++) that would take care of putting the object into a known default state.
> The bug is that they are reusing (or, repurposing) an already-allocated-and-used buffer and forgot to reset flags. This is a logic bug, not a memory safety bug.
This statement is incorrect. They are using an arena allocator, and there is no way for it to know if it is reusing one of the elements or using that element for the first time. To do this in Rust you would probably be using the MaybeUninit type: https://doc.rust-lang.org/std/mem/union.MaybeUninit.html
However, you are partly correct. In Rust, when using the MaybeUninit type, it is still possible to partially initialize an object and then return it as if it were fully initialized without hitting a compile error. https://doc.rust-lang.org/std/mem/union.MaybeUninit.html#ini...
If you do the whole struct at once, rather than one field at a time, then the compiler still has your back:
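Something along these lines (a sketch with a made-up stand-in struct, not the real pipe_buffer):

```rust
use std::mem::MaybeUninit;

// Made-up stand-in for the kernel's struct pipe_buffer.
struct PipeBuf {
    offset: u32,
    len: u32,
    flags: u32,
}

fn init_slot(slot: &mut MaybeUninit<PipeBuf>) -> &mut PipeBuf {
    // Writing the whole struct at once: omitting `flags` here is
    // error[E0063]: missing field `flags`, unlike initializing the
    // fields one at a time through a raw pointer.
    slot.write(PipeBuf { offset: 0, len: 0, flags: 0 })
}

fn main() {
    let mut slot = MaybeUninit::<PipeBuf>::uninit();
    let buf = init_slot(&mut slot);
    assert_eq!(buf.flags, 0);
}
```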
Ah, thanks for explaining. I misunderstood the root cause and didn't read the patch. Rust definitely would have helped here. Or even just enforcing modern C practices such as overwriting the whole struct, so that non-specified values would have been set to zero (although explicit is better than an implicit zero).
buf is a pointer into a pipe_buffer array; this whole function reeks, yes, but I don't think this is a simple "initialization would have fixed it" bug
buf = &pipe->bufs[i_head & p_mask];
I think it is a problem with merging and needing to reset the flags, but I didn't want to waste too much time trying to find the root issue, or figuring out what merging is and why it's done.
I tend to agree, but only partially. I write a lot of init functions myself for plenty of stuff, but I've faced plenty of issues due to the fact that these functions do not always initialize everything (and your patch above does exactly that, for a good reason), and that this is much less obvious to those using them. And if they initialize too much, they can equally be a pain to work with. It's not uncommon to require 2 or 3 different init functions depending on what you're doing, but the name should explicitly state the promise, and that's where it's difficult.
Then let's have those 2 or 3 documented init functions. That's not perfect, but still much better than spraying different (undocumented) copies of the init code everywhere, that have to be located and adjusted every time somebody refactors something.
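For example (hypothetical names, not existing kernel APIs), the promise can go right into the name and the doc comment:

```rust
struct Slot {
    offset: u32,
    len: u32,
    flags: u32,
}

impl Slot {
    /// Initializes *every* field; nothing survives from a previous use.
    fn init_full(offset: u32, len: u32) -> Self {
        Slot { offset, len, flags: 0 }
    }

    /// Only updates the extent; explicitly documented as keeping `flags`.
    fn init_extent_only(&mut self, offset: u32, len: u32) {
        self.offset = offset;
        self.len = len;
    }
}

fn main() {
    let mut s = Slot::init_full(0, 4096);
    s.flags = 0x10;
    s.init_extent_only(0, 2048);
    assert_eq!(s.flags, 0x10); // keeping flags is now an explicit, documented choice
}
```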
> Pro tip for language designers: require all fields to be initialized any time an object is created.
This proposal sounds great until you find out that this is a hard problem to solve reasonably well in the compiler and no matter what you do there will be valid programs that your compiler will reject.
No, I don’t think that this should be considered a valid program. I would never allow code that creates a partly-initialized struct into any codebase that I maintain.
I agree that it’s a hard problem, but I don’t think that language designers should use that as an excuse. I think that language designers should really double–down and require that every field be initialized, otherwise people will just forget. The code will be carefully written at first, but then someone new will come along and add a new field, and then the existing code is insufficient. With no errors from the compiler there is no way to ensure that the new programmer updates everything to accommodate the new field.
Rust does pretty well in this regard, but you can still use unsafe blocks to get a partly–initialized struct if you really want one. I like Rust, but I wish that it went all the way and didn’t allow it at all.
I wouldn't expect additional security from introducing an entirely new OS/kernel. Just unknown RCEs and other vulnerabilities waiting to be discovered.