“Fast Kernel Headers” Tree -v1: Eliminate the Linux Kernel's “Dependency Hell” (lwn.net)
549 points by nnx on Jan 3, 2022 | 273 comments

Ingo Molnar was always one of my favourite scary kernel devs. This kind of massive rewrite was always a key strength of OG devs, who were prepared to tackle maintenance tasks beyond the limits of mortal sanity. The result of such ongoing struggles is why the Linux kernel still works after decades of active development, despite radical changing requirements.

Says Rusty Russell, who's pretty goddam scary himself. :)

That's weird, because I don't see it that way.

I look at Ingo, Dave Miller, Richard Henderson and Andrew Tridgell who are ~ my generation, all both better coders than I will ever be and really nice people to work with.

Then I look at young coders like Olaoluwa Osuntokun and Bastien Teinturier who are also great to work with and who I can only keep up with because I have years of experience, and it completely keeps my massive ego in check!

His name is Rusty "Trivial" Russell. :)

A question that I've been mulling over: How do we create more people like this? Better documentation? A... "brief guide to the entire kernel" blog series? Aggressively pushing GSoC project mentoring?

The same way you build talent in any other organization - give people money, tell them this is their job, and put them in the same "room" as people who also have this job but have been doing it longer.

Or are you asking how we create more people who will do this in their free time?

I think the question is more how to find/create more people who would spend a good chunk of their productivity on what sounds like a massively boring rewrite that has marginal returns on investment (Moore’s law says that 50-80% gains will be made up in less than a year by hardware getting faster, and even if it has slowed down it isn’t completely out of the picture). That does take a special kind of person to see this as an ambitious project and not a boring one.

Betting on Moore's law to speed up your software hasn't held for more than a decade. Software keeps getting worse and hardware has largely plateaued in what you can expect to get in single-threaded performance. And besides, running a CPU for 2-5x as long to accomplish the same task is wasteful in both time and energy, both of which are scarce resources. Especially for something that is done as many times as rebuilding the Linux kernel.

Not all software is getting worse. : )

If you could stay 50% ahead of Moore's law in general, that would be worth tens of billions of dollars per year, increasing as Moore's law slows. Your returns don't become marginal until an iteration is so fast that it might as well be instant, and there's no benefit in doing more iterations per second. The kernel isn't in that range.

We would stay ahead of the curve on maintenance, if the kernel and other software was easier to live-patch and it was the default & very stable in pretty much all relevant distributions. No more clusters just for the sake of being able to update during business hours.

Also, as a Clojure dev, you cannot recompile the running kernel while using it at the same time using something akin to REPL driven development. That would be the other side of the live-patch productivity boost in so many areas.

If I am not mistaken, the CPU and other devices don't care, if everything is compiled ahead of time and a few functions tacked on later, as they are changed by e.g. patches, or if the kernel is completely recompiled. Obviously, I am simplifying a lot but it is a very serious question. We really should strive to build maintainable systems that don't have to reboot just to apply updates.

The Linux kernel already supports live-patching, but as I understand it, the patches themselves are hard to produce or can't be produced automatically, so to be able to produce them you need a team that understands the process, which limits the live patching to commercial distros that can afford a team for this.

I know, and that is exactly my point. It isn't like "send to REPL" in pretty much any half-decent LISP. And it hurts development comfort, speed and maintenance is way more burdensome than what it could be.

Here, you would need to involve a compiler of course and do more due diligence and then run ftrace or something to exchange the code. Also, you would probably in some cases need to make sure, the ordering of code is right as to not hurt performance. (See Emery Berger's talk https://www.youtube.com/watch?v=7g1Acy5eGbE which probably applies here as well.)

Of course, with great power comes great responsibility. You could add some optional checking facility, that would be able to roll back the changes to code if some conditions were met.

I think worrying about existing data structures in memory is a lot bigger part of the problem than putting new code into place.

Some of that problem space could probably be reduced by immutable/ CoW data structures modeled after Clojure/ Rich Hickey/ Phil Bagwell https://www.youtube.com/watch?v=K2NYwP90bNs

The rest could probably be solved by a very short "stop the world and switch a few pointers" routine.
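To make the "switch a few pointers" idea concrete, here is a minimal user-space sketch in C11. It is not how the kernel's live-patching machinery works; the names `handler_v1`, `handler_v2`, `call_handler` and `patch_handler` are made up for illustration, and it assumes every caller goes through a single atomic function pointer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical "patchable" function type, for illustration only. */
typedef int (*handler_fn)(int);

/* Old and new implementations of the same routine. */
int handler_v1(int x) { return x + 1; }
int handler_v2(int x) { return x + 2; }

/* All callers dispatch through this one atomic pointer. */
static _Atomic(handler_fn) current_handler = handler_v1;

/* Load the current implementation and call it. */
int call_handler(int x) {
    handler_fn fn = atomic_load(&current_handler);
    assert(fn != NULL);
    return fn(x);
}

/* Swap in a replacement in one atomic step; returns the previous
 * implementation so the change can be rolled back later. */
handler_fn patch_handler(handler_fn replacement) {
    return atomic_exchange(&current_handler, replacement);
}
```

Before the patch, `call_handler(1)` goes through `handler_v1`; after `patch_handler(handler_v2)` it goes through `handler_v2`, and the returned pointer can be passed back to `patch_handler` to undo the change. What this does not address is exactly the hard part mentioned above: callers still executing the old function, and data already in memory that was laid out for the old code.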

I think there's also an element of "ownership" to the kernel for many of the OG maintainers. It's their "baby". And as every parent knows, you would do a lot more for your babies than for anyone else.

It may well be that same sense of ownership creates an "inner circle" effect that is a barrier to entry for creating more such developers.

The inner circle effect is absolutely real. Some projects get rid of it with very loose commit-bit policies, but even getting to that point can be painful. There are software projects I'd like to contribute to and have (I think) reasonable ideas about, but no way in hell am I running the gauntlet of the communities involved.

I find it hard to see how any developer would describe it as "massively boring". At a basic level, every time a coder solves a problem, there's a dopamine hit. The greater the familiarity with the material, the easier it is to solve problems. Anyone that spends a sufficient amount of time in a niche will develop that familiarity.

Refactoring can be satisfying; but spending two years refactoring kernel headers for a gain in build performance but not in runtime performance, is objectively "massively boring".

I've attempted something similar myself - a refactoring of a very dirty Java project that was orders of magnitude smaller than the Linux kernel. I didn't have good refactoring tools, nor even a good SCCS; my month-long effort had to be backed-out. I didn't enjoy it - it was boring, but needed doing, because the accumulating cruft was impacting my ability to work on the codebase (and presumably everyone else's). The project stayed dirty, and my reputation suffered.

Credit to Ingo for his perseverance. That is massive commitment.

> but spending two years refactoring kernel headers for a gain in build performance but not in runtime performance, is objectively "massively boring".

It makes a big difference for the thousands of kernel developers [1]. There's also more than a million Linux users globally [2]; assuming only 1% might compile their own kernel, that still leaves tens of thousands of Linux users benefiting.

Not to forget about the side benefits of a refactor. There could be runtime improvements coming out of this too down the road.

[1] https://en.wikipedia.org/wiki/Linux_kernel [2] https://www.zdnet.com/article/how-many-linux-users-are-there...

Big turds often leave refactors almost equal to Joel's "V2 rewrites". Continuous, runnable general cleanups are often the first step (and yield surprising benefits in general), followed by still-runnable prep refactors that make the bigger leaps smaller (these temporarily increase complexity, so you really want them in just before embarking on the bigger stuff, so that they don't add complexity by themselves).

That said, in your case it felt a bit like an organizational failure also. As you could read in Ingo's mail, he had to start over after plenty of work put in; for him there wasn't any reputation damage because he seems to have had free rein to start this as he wished.

As a bit of an extreme refactorer myself, there are some of us who like adding features, but others who just like "kneading" the code.

People need to be taught to read code and not only write it. I feel like a lot of problems start with people seeing more than 100 lines, saying "fuck it", and then clamoring to remove technical debt by adopting something new or even writing an entirely new code base.

Reading code is fun and it's a skill that took me time to develop. People need to be told that's a skill you need and yes (shock) not emphasize creativity as much as we do. Creativity is great but it has enough encouragement in society already. We need people willing to read what already exists as well.

The problem is that most code is utter shit. If you just read shit code and do nothing, you have made a choice to preserve the status quo. At that point, you have become part of the problem.

Lots of developers (especially newer ones) seem to think code is "utter shit" when it really is written that way for a reason. At least I've seen this pattern many times, and used to do it myself decades ago.

Despite the implication otherwise, I have been a software engineer for 30 years. This is speaking as an active consultant in the embedded space. There is shit code everywhere, in nearly every single system. Some code may be "written that way for a reason", but it can simultaneously be utter shit. Working shit, but still shit.

Yeah I am a big believer in https://www.joelonsoftware.com/2000/04/06/things-you-should-...

If you do a from-scratch rewrite, you could stay in ignorance and might make those mistakes. But if you do a refactor, especially one broken up into many steps like this Linux one, it in fact demonstrates a better understanding of the existing code and any nuance than merely leaving it in place!

Not touching scary-looking code is just bad, and not a wise form of humility.

I think that refactoring is often used as a means to better understand code. This might be your own code that you wrote yesterday or it might be someone else's code from 3 years ago. In order to really understand a piece of code, some of us need to put their hands on it.

But I think throwing away the changes should be an option if there's no objective improvement. Very often those changes end up on the main branch but they only cause changes, not improvements.

The real outcome is in someone's head.

Refactoring isn’t rewriting for the hell of it. Maybe some do that, but the majority of the time, it’s to make things clearer, allow for better abstractions, etc. Letting cruft build up because you think rewriting is wrong is how you get technical debt.

Yeah I think the "fuck it" crowd just adds new code and doesn't touch existing code.

That or they advocate doing a from-scratch rewrite.

Was going to say this as it somewhat applies to me, or as my business partner says I’m a bit of a “sweeper”.

I seem to get satisfaction from things that a lot of others might find tedious/boring.

People like us though just need to watch ourselves that what we’re doing is actually useful as opposed to just scratching an itch. Which of course is not exclusive to us, as developers often find many other ways of scratching itches without actually adding value.

It is healthy to be self-aware about outcomes vs just indulging, but I am no longer so worried.

On my best projects, I've done the first 90% so that a great many others can collectively do the 2nd, 3rd, and 4th 90%s :). Without being "goal oriented" you can feel good about unlocking that part that would be too annoying for others, and they in turn can do the finishing work that wouldn't be fun for you.

Don't feel like you're not doing important work just because others are slotting in the keystones.

Kernel builds have been getting slower, not faster. You aren't going to make up the gains, because other things will slow you down again. Therefore, this improvement will be at least a linear improvement in kernel build times for the foreseeable future, if not possibly supra-linear depending on how things work out.

Some people like this kind of thing and just seem to be predisposed to wanting to do this kind of work.

Good team builders will find a variety of different employee talents and interests and let people do what they like and develop those skills.

I would love to do more work like this at work, but since I can’t guarantee the results up front, no one is interested and code rot sets in. “Minimum impact, equivalency-first rewrites” are the gold standard of optimization and DevOps badly underestimates their value.

Dennard scaling has been dead my entire career. Moore's law was never about speed.

This is compilation of thousands of files. It doesn't really depend on Dennard scaling.

You create a lot of unimpressive people by doing that too. You need someone to have passion for a project like this to do a huge rewrite. It's pretty clear that passion can't come from money.

On one hand, a general macro-economic problem we have is that the more we automate, the cheaper labor gets, and the less we want to automate. Stuff like public transit -> Uber is a clear productivity loss, even.

So we need to restructure our society to have an economy that promotes leisure, not output.

That will at least align the incentives right with a lot of more extreme engineering that just doesn't make economic sense today.

On a completely different level, yes, this is the sort of thing that does require passion. There is a lot of obscure stuff that is hard to plan or otherwise validate society. I'm a bit of a utopian I suppose in thinking if we had more leisure and guaranteed consumption, people would be able to do more passion projects, and we could give people hindsight recognition just instead like this.

Basic needs can be unconditional, renown can be the reward for "extra" work.

Finally, I hope as more monoliths are broken into libraries, we get more of this sort of stuff organically. Beyond the economics, Conway's law holds us back. (Even within the kernel!) We need to make sure people feel free and safe to really get down in other people's code, to develop the macro view, to see stuff like this.

This is the sort of "macro" refactor you can't get to with just local optimizations alone. We can and should break it into steps, but by no means does every commit have its own clear perf benefit.

Also, by god, if they had "generics" just think how much further this sort of thing could be taken!

Passion doesn't come from money, but many people strive, and succeed, to be paid for the thing they are passionate about.

In other words, in a perfect world, someone recognizes the value-add of your passion, and offers you money to do what you would do for free.

Evidence of this is all around us, and while most won't get this, some significant proportion of people pull it off.

This is a common misconception. Passion comes from experience and mastery, not vice versa. If someone doesn't already have the money they need to gain that, then clearly they need to be paid to do so.

and even if i have the passion, if i don't get paid for doing it i won't have the opportunity to do it. i keep hearing how companies like google don't reward maintenance work because it doesn't produce shiny new features and doesn't get noticed by the higher-ups.

This is obviously false, the world is full of people who are passionate about what they get paid to do.

The world is also full of people who are not passionate about what they get paid to do, and they do it because it is high-paying and offers security.

I would say the world is full of people who are passionate about getting paid, which makes best course of action to be not doing a thing while convincing others you do, which IMO is harmful.

Yeah, and for “professionals” it’s pretty rare to find someone that doesn’t at least like the field they chose, spent 4+ years of university on, and applied for jobs in. It’s vanishingly rare to find someone exceptionally talented in a field they hate because of the practice it takes to get to that level.

>It’s vanishingly rare to find someone exceptionally talented in a field they hate because of the practice it takes to get to that level.

So what do I win?

I mean, the prize is really bad :(.

The people who are the most capable are typically the ones who do it for fun, not money.

People who do stuff for fun usually have all their money problems already resolved.

Health problems -> Money problems -> Fun problems.

You get the privilege to work on "fun problems" once you don't have "money" or "health" problems (otherwise, the "fun" problems are the least of your "problems").

I don’t think that’s true. A lot of the people who have had the greatest impact on the world were never financially rewarded for it. Many aren’t even recognised for their work.

This thread seems to be going down the path of rediscovering Maslow's hierarchy of needs.

It's not that they are directly compensated for their work, it's that the money problem is solved, even if from an external source.

Linus Torvalds worked on Linux for a long time before being directly compensated for it, but he probably would never have done so had he needed to worry about whether he could afford his next meal and rent that month.

I got my start in software many years ago because I had passion. Nowadays the market is flooded with people who want to get paid.

I had free time, and an interest. Often the same characteristics of people who contribute to FOSS.

These people already exist. They do not need to be manufactured. I think many get turned away from the Linux community for non-technical reasons. For some, the community takes a bigger personal toll than the technical challenges. I personally gave up over that aspect and stopped contributing. I did not feel welcome or needed.

I have tried to contribute to the Linux kernel development in the past, but I found it to be a strong inner circle of seasoned devs. The fact that they were still using mailing lists instead of github/gitlab/etc to keep track of development was painful enough, but they were also hostile to asking for documentation on inner modules. I'm not sure how it will progress when they retire or die off.

I don't have anything to add to your overall point – I'm sure you're right. I just wanted to address two smaller ones:

> The fact that they were still using mailing lists

In my personal experience, going back some 30 years or so online with my own personal paid email account – which is still live and still works – I find that most people I've worked or interacted with who dislike using email for workflow or other important comms do not know how to use email effectively. Strange as it may seem, this now applies to the developers of most email clients and online email services.

Email and mailing lists are remarkably powerful and capable tools, if used properly. Most people do not use it or them properly. I have yet to see anything else by anyone that is an improvement in every way on email.

> I'm not sure how it will progress when they retire or die off.

TBPH, as a Linux user and professional, I rather hope it does not.

Linux is an amazingly useful tool, but OTOH it's now, in and of itself, a vast hairball of technical debt. The traditional UNIX model itself is.

It is long past time that we should have moved past it on to better things. There have been many attempts but none have ever achieved critical mass.

At the rate things are going it looks quite likely that our technological civilization will collapse due to out-of-control global warming and the ongoing and accelerating mass extinction event. As the parent of a 2YO I very much hope I am wrong.

But if I am and it doesn't, the world needs better tech, and a ½ century old UNIX clone already does not really cut it today.

If Linux and all the other monolithic Unixes die when their dev teams die, we'll have to move on to newer, smaller, simpler systems that a new generation of programmers can actually read from top to bottom, understand, and do useful work on.

If monolithic Unices become the mid-21st-century COBOL, just kept around for a few old systems and only critical bugs patched, that will be a big win for us all.

> Linux is an amazingly useful tool, but OTOH it's now, in and of itself, a vast hairball of technical debt. The traditional UNIX model itself is.

Linux, perhaps, but "the model"? I'm not so sure.

> It is long past time that we should have moved past it on to better things.

That's what they said about SQL RDBMSes too.

> There have been many attempts but none have ever achieved critical mass.

Maybe for good reason.

> the world needs better tech, and a ½ century old UNIX clone already does not really cut it today.

That's what they... Eh, I'm repeating myself.

> If Linux and all the other monolithic Unixes die when their dev teams die, we'll have to move on to newer, smaller, simpler systems that a new generation of programmers can actually read from top to bottom, understand, and do useful work on.

That's probably what they... No, not quite. But it may have been what Linus thought -- at least, judging from what he made.

> If monolithic Unices become the mid-21st-century COBOL, just kept around for a few old systems and only critical bugs patched, that will be a big win for us all.

I doubt that'll happen. I'll be happy if they become the mid-21st-century SQL -- which I suppose will also be still going strong.

I know it's an unpopular view. :-)

But the alternatives are there.

The fast-headers thing put me in mind of the way that Plan 9 changed the C language, not only formalising indentation and things, but to forbid nested #includes, which apparently vastly reduced compilation times.
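A toy model of why nesting matters, as a hedged sketch: the header names and the include graph below are invented for illustration, not taken from Plan 9 or the kernel. It counts how many headers the compiler ends up parsing when a .c file includes a single header whose own includes nest.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_HEADERS 5

/* Hypothetical headers, purely for illustration. */
enum { SCHED_H, MM_H, FS_H, TYPES_H, LIST_H };

/* includes[a][b] is true when header a directly #includes header b. */
static const bool includes[NUM_HEADERS][NUM_HEADERS] = {
    [SCHED_H] = { [MM_H] = true, [LIST_H] = true },
    [MM_H]    = { [FS_H] = true, [TYPES_H] = true },
    [FS_H]    = { [TYPES_H] = true, [LIST_H] = true },
};

/* Depth-first walk that marks every header reachable from h. */
static void visit(int h, bool seen[NUM_HEADERS]) {
    assert(h >= 0 && h < NUM_HEADERS);
    if (seen[h])
        return; /* already parsed once, like an include guard */
    seen[h] = true;
    for (int i = 0; i < NUM_HEADERS; i++)
        if (includes[h][i])
            visit(i, seen);
}

/* Number of distinct headers parsed when a .c file includes `root`
 * (the root header itself is counted). */
int headers_parsed(int root) {
    bool seen[NUM_HEADERS] = { false };
    int n = 0;
    visit(root, seen);
    for (int i = 0; i < NUM_HEADERS; i++)
        n += seen[i];
    return n;
}
```

With this made-up graph, `headers_parsed(SCHED_H)` is 5 even though the .c file wrote a single #include, while `headers_parsed(TYPES_H)` is 1. Under a no-nesting rule every row of the table would be empty, so each .c file would parse exactly the headers it lists itself.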

Inferno moved on to largely supplant C with Limbo, which is one of the ancestors of Go. I'd say Go, D and Rust all show that there's demand for a newer, better C and that C++ is not it.

Apple's xnu is widely held not to be a true microkernel because of its big in-kernel Unix server, but Minix 3 is... and for all that Minix 3 is not quite there yet, QNX shows this is possible and doable and performant.

It's 2022. We have boxes with terabytes of RAM now and hardware-assisted virtualisation even on £5 ARM boards. We don't need a total clean-sweep replacement; we can virtualise the old stuff and slot in underneath something smaller, simpler and cleaner that mere mortals can understand in less than a lifetime.

3D Xpoint memory is on the market now and makes non-volatile RAM doable, multiple orders of magnitude faster and longer-lasting than flash memory. Computers don't even need drives any more: we could just have a few terabytes of persistent memory, and no more shuffling stuff into memory. True single-level store is doable now. Who needs files when everything is in RAM forever?

ISTM that much of the computer industry has its eyes on the ground now, and is just shuffling round and round in the same old groove, rather than trying new stuff. In the middle of my working life there were dozens of exciting new OSes trying to do things differently. Now, we have 30 million lines of C instead, and need a billion-dollar industry to aim all those many eyes at all those many bugs.

As Molnár himself said: “Don't forget that Linux became only possible because 20 years of OS research was carefully studied, analyzed, discussed and thrown away."

Ingo wasn't working on Linux full time at the time of his peak. He was indeed a Jedi-master-level C programmer even before he started working on the Linux kernel seriously.

It's not about talent, it's about clout. Not just anyone could show up and do this.

In my experience, big and important rewrites only can happen after someone has been maintaining a code base a long time and they can spend many months or years mulling over simplifying architectural changes not just for change. Often for company software at least I was frustrated with the pace that they shuffle you around projects.

It's not the rewrite per se--it's squashing all the bugs that the rewrite introduces. That is why big C programs are so scary. Because it's unusually easy to introduce such bugs.

>> How do we create more people like this?

Very few companies or other management structures would ever sanction this kind of work. They might recognize it as important, but it's too big and open ended to "allocate resources to". So for this to happen you need to pay people for more of a general "make stuff better for us" role. Even that's hard because you never know what you're going to get.

A good start would be to stop actively discouraging this kind of development in the majority of work places.

This. Management needs to understand the value of "tinkerers" and "massagers". They're immensely valuable, but since they don't have a direct production output, managers rarely understand the value they provide.

Without them, quality degrades and you get a lot of busy work from working in inefficient systems.

I guess I have some of that in me. I have no shortage of things I'd do with a code base (test, fuzz, profile & improve performance, refactor, look for and fix bugs, audit for security issues, run various static & dynamic analyzers, and just generally try to improve architecture) to improve it. This kind of thing is a large part of what takes up time in my personal projects (and the very limited open source participation I've done). That's a ton of turd polishing but I think it pays in the end (and you see a lot of it going on if you follow projects with a reputation for good code, like OpenBSD or Linux).

I did this at work too early on in my career too but it was quickly made clear to me that if there's no ticket that the customer has opened (or at least approved), they're not paying for it and I can't spend time on it. Tickets I opened myself were 99% ignored or only brought up when a problem I had anticipated actually manifested in product (told ya.. now this 18-month-old ticket is suddenly relevant?). I quickly learned not to give a crap about the code base (hard to give a shit if you're not really allowed to give a shit?). I guess that also slashed my motivation (and productivity with it). It's pretty frustrating to work this way.

Creative freedom and autonomy are key. I'm sure there was no micromanager assigning Ingo Molnar the "make kernel builds faster (est. 80 hours of work)" ticket.

I can't speak for Linux specifically, but political capital is also an issue. You could have people willing and able to do such things, but that wouldn't be necessarily allowed to do so in practice if they haven't already established themselves for a while.

It's certainly part this, but also part experience. You get a feeling for who the people in charge of kernel subsystems are and how they like their changes after a while. That goes a long way towards acceptance; there's a lot more human interaction under the surface of kernel dev than people think.

Obviously this is an extreme case, but it scales. Something like 8 years ago I noticed something was horribly wrong with the ARM32 boot wrapper, submitted a fix, and my approach was mostly shot down by the maintainer. Last week I submitted a 34-patch RFC series to finally add WiFi support for 5 years' worth of Macs (which required quite a bit of new scaffolding in that driver, as well as fixes and new features) and it's gotten positive reviews so far, modulo nits. I wouldn't have been able to pull that off 8 years ago. And it's not like I spent those 8 years doing kernel dev, but I've sent in a few fixes and watched how other kernel developers work. I started on a big kernel project a year ago and all that sitting and watching has been very helpful in putting out stuff that people like.

I have been, and still am, struggling to get into kernel dev. There seems to be a lack of documentation, or of a place with clear steps on what to do in order to get active. I guess it gets easier if you work for a company where kernel dev is central, but otherwise it looks hard to find something interesting to do.

Hibernate and touchpad still need work.

Yes, amazing work is being done, but preach it! We still need more quality of life changes on the desktop.

There are plenty of people with the capability to do this wasting their life building adware.

So, I've been a person like this on a smaller scale (taking a fully written program, and basically "completely rewriting it" in a new language). The biggest thing I can say is that if you have enough of a "crazy-eyed mofo" that's just insane enough to take a big job like that, 3 things are of utmost importance:

1] remove their technical blockers. Often a person like this doesn't have a universal skillset — they're usually good, but they can't do everything. If there's a 5% of the job that's hard-blocking them, make sure it gets done so their tires don't get stuck in the mud.

2] remove their bureaucratic blockers. Make absolutely sure the project WILL proceed, even in a skunkworks capacity (this is the primary value of skunkworks — it's the ability to proceed with something you know will work, in spite of authorities expressly forbidding it because they think it won't work and don't want it to happen for what's usually a petty reason like "it's a waste of resources").

3] remove their emotional blockers. This is really the apex — THE primary value of a person like this is not technical skill, but rather, their work ethic. A person like this has a really profound, train-engine drive to just keep soldiering on. The danger here is the problem of the "weary crusader". This willpower is considerably above average ... but it's not infallible. It's not like a superhero that's always gonna come through no matter the odds. They can break down. They can lose heart.

One of the things that will constantly break them down is if people are naysaying them, and calling into question the value of their work. If they're doing a giant refactor, and people are angrily opposing it simply out of fear of change, it will break them down. As silly as it sounds, you want to coddle them — they might be unusually emotionally tough, but they are the absolute last person you want breaking down. Treat them like a "snowflake".

And yeah, I'm explicitly suggesting censorship. Don't permit negativity. Don't permit the usual cesspool OSS discussions where you have a bunch of people who contribute almost nothing bikeshedding a project to death and exercising a sort of "liberum veto" on any attempt to do new things. Just shut down those conversations — make it clear that only the people who do major heavy lifting (or really, who are discussing things "in good faith") have a voice. The whole OSS community has an unhealthy cultural fixation on "absolute free speech", and like ... I completely understand where that comes from, but it's absolutely toxic for motivation. I've observed over the years that a lot of the more aggressively successful groups at "creating things" basically just establish their own safe-spaces where people get emotionally reinforced instead of emotionally sabotaged. Occasionally their work will get exposed to the world and some toxicity from outside will leak in, but the bread-and-butter day to day experience of working on their projects has them surrounded by friendly people who believe in what they're trying to do, and encourage them to keep going.

Partly because we know we won't be there forever to keep working on it. If we feel like we're part of a group that "has our back", and will support what we built, and grow it into something, it's a million times easier to keep working on something, compared to a situation in which someone actively holds our work in spite and is itching for the first chance to tear it down the moment we turn our back, or move on, or die, etc. This is all "emotional extrapolation", but these are the sort of depressive/anti-depressive thought cycles that go through your head during a mega project, and too much of the negative side can just break people.

While this is all but the same as rewriting something in a different language I agree with what you're saying.

I think better docs is it, but from a high level systems view. I can dive into something (say usb gadgetfs) and get a decent view and start trying to solve my problem, but when I need to start learning about general kernel mannerisms I hit some walls that slow everything down.

Real talk, I don't think you can.

Frankly everything I know about software dev is geared to discourage exactly this sort of behavior.

I would not be happy to see a pull request touching the entirety of the code base for better compilation performance. Not because such a thing wouldn't pay dividends in developer performance or code readability, but because such a change is just inherently risky.

You might be able to train more people with whole program understanding (Definitely a skill that is lacking, IMO). But I doubt you could train someone to be able to "touch every file in the kernel and get it merged".

They have to be inspired by some field. You can only inspire someone if you are able to transfer the purpose of something. Performance, UX, maintainability, scalability, product.

Most high school teachers are really bad at this.

Also, you need to have gathered the skills to do what you wanna do. But you'll put in the effort yourself if you're inspired.

I think some people aren't as easily "created" like that. Perhaps it is comparable to an artist; you can teach a lot of people to paint, write or sculpt, but that doesn't mean they all become artists. It's some sort of combination of affinity, drive, self-starting and creativity.

You need a man and a woman plus a bit of time.


> How do we create more people like this?

That'd be a challenge. He is obviously a very talented person who grew up behind the Iron Curtain and went to university just after that collapsed. The resource constraints bred creativity, a certain kind of hacker mentality. The lead developer of Crysis: Warhead and Crysis 2 (and indeed most of the former Crytek Budapest team) is another extremely talented programmer emerging from the same time. Palma sub pondere crescit!

To be fair nobody who wasn't in the inner circle would contemplate doing this amount of work when the likely outcome would be getting shut down by a Linus rant.

"Don't forget that Linux became only possible because 20 years of OS research was carefully studied, analyzed, discussed and thrown away."

— Ingo Molnar

http://www.ussg.iu.edu/hypermail/linux/kernel/9906.0/0746.ht... to the Linux Kernel mailing list in 1999.

This is how I feel about the Go programming language.

So often you see people ranting about the Go team either being ignorant or scornful of PL research.

I always wonder if those same people are Agile Manifesto zealots at work.

Well, those critiques are not unfounded. The designers of Go are certainly not ignorant of the last few decades of PL research, but they did make a deliberate decision to ignore it and create (they admit!) an inferior language for skilled practitioners, in the interest of being maximally accessible to newer programmers who are only familiar with imperative languages with C-like syntax.

> "The key point here is our programmers are Googlers, they’re not researchers. They’re typically, fairly young, fresh out of school, probably learned Java, maybe learned C or C++, probably learned Python. They’re not capable of understanding a brilliant language but we want to use them to build good software. So, the language that we give them has to be easy for them to understand and easy to adopt. – Rob Pike"

> "It must be familiar, roughly C-like. Programmers working at Google are early in their careers and are most familiar with procedural languages, particularly from the C family. The need to get programmers productive quickly in a new language means that the language cannot be too radical. – Rob Pike"


If you're interested in a critique that goes more deeply into Go's actual design flaws, as opposed to its deliberate design tradeoffs, I enjoy https://fasterthanli.me/articles/i-want-off-mr-golangs-wild-....

Thanks for that.

What a great thread supporting the choices that Linus & the core team made at the time, given the hindsight we now have.

I'm not sure I would have agreed with Linus in '99 (and I was doing a PhD in the OS networking space then!).

Your bigger picture quote is great too. In academia it is easy to get yourself stuck in local minima, unable to see the better path beyond the hill of similar research surrounding you.

link broken?

It's cited from https://quotepark.com/quotes/1742017/history/ but probably rotted some time in the last 23 years.

Edit: here's a live mirror. http://lkml.iu.edu/hypermail/linux/kernel/9906.0/0746.html

Hey now, the full quote is a little cheekier:

    > platitudes hardly provide compelling evidence for ignoring 20 years
    > of networking/OS research and architecting unnecessary copies into the
    > core of an OS that wants to be taken seriously...

    dont forget that Linux became only possible because 20 years of OS research was carefully studied, analyzed, discussed and thrown away.

    -- mingo<endquote>
.. don't you go ignoring the fact he was talking about zero-copy in the kernel!

Kudos for getting this out. I can't imagine working on a series of patches for 2 years for build cleanups/speedups on a separate branch, having to periodically rebase against master and throwing away my work 3 times in order to figure out how to do it correctly. At any company I have worked for, I would have been fired for being unproductive/afraid of making changes.

This is one of the values of open source. Sometimes it's simply a labor of love. Since you're not "on the clock" to get the work done, you, sometimes, spend a little more time to improve the result, and the result reaches much farther than the for-profit company you work for.

Oh anyone who does stuff like this is almost definitely on the clock. It may have started out with people tinkering with an OS as a hobby, but modern Linux, the fairly reliable multi-platform behemoth kernel we use today, relies on huge companies like Microsoft, IBM, VMware, AWS, Google, etc. hiring the maintainers and paying them to maintain the kernel. It could never work if it was all evenings-and-weekends people. It’s just too much work.

And that’s fine! It’s incredible that a collaborative effort like Linux (protected by the GPL, which forces the issue) can pull in all these big companies’ resources, given that the companies would probably prefer to have their employees work on things that only benefit them and not their competitors.

I think the key difference is: being paid with a deadline or being paid with a purpose.

The first might produce quick, but unsustainable or less perfect results, the latter might take more time but will provide significant benefits (in this case for maintenance, time wasted/energy consumed by build pipelines on a global scale).

Ingo is employed by Red Hat, and his job is presumably to maintain the Linux kernel. He may well love his job, but he is also technically on the clock.

And this is the problem with many companies. They are more concerned about getting something done, even if it is broken or insecure, than doing something correctly. At least with open source, many people are passionate about it, and unemployed developers, students and others can contribute openly.

Why not work on it and release in smaller batches though?

Sometimes it is only in the last 10-30% of the changes that any significant benefit is realized, yet the preceding work is a necessary foundation. That makes it a very difficult sell to the rest of the organization (whether OSS contributors or a corporation). Until the last bit it looks like you're making a neverending stream of pointless changes.

Large-scale changes also suffer from "too many cooks in the kitchen" and bikeshedding that can make getting the changes accepted a real slog. It is difficult to get people to buy into your vision purely with words, either because they don't see the value in it or they reflexively disagree that the benefits are achievable (sometimes motivated by their own burnout or cynical attitude).

The solution is to work on it in semi-secret until you can get things to a point where the value is demonstrated. That bypasses endless bikeshedding and pointless feedback rounds because everyone can see the value of the end state (if you've chosen something worthwhile to work on).

It's also a common pattern with code review: make a 100-line change and you'll get at least 10 comments with endless bikeshedding, make a 100k-line change and people will say LGTM in no time.

It depends on whether the change is split up into nice units or not. In this case, despite being a 250,000 line change, it's split up into 2300 patches. That's much easier to review and look at than the equivalent change in 5 patches. And I've seen the latter, it doesn't make me want to merge anything, it makes me want to kill someone.

A 100k LGTM is the bad outcome for something important like the Linux Kernel, though, right?

Yes, but at the same time, I'm not sure how many people can thoroughly go through 2k commits and 100k diff for a single pull request. Most people, I assume, in that situation will just try to understand the main ideas behind the changes, scan through the areas that they're somewhat familiar with to see if there's any glaring issue, and trust that the highly-reputed maintainer who submitted the PR knew what he was doing.

Yeah, the review process for something like this is realistically that some maintainers look at the changes made to their specific subsystems, and if none of them spot problems you assume that the patchset as a whole is systemically correct. If any of them do spot problems that aren't entirely trivial then it gets more complicated.

How else would you divide the work up? We all need to go look at the commits on this branch that touch our favorite driver and make sure that they don’t break anything.

A 100k LoC LGTM in any project whatever is not just a bad outcome, it's proof positive that whatever review process you've got is rubber-stamping and not worth the effort.

The same applies at 100 LoC.

And at 10.

LGTM = “Let’s Go To the Moon”?

LGTM = Looks Good To Me

Lets Get This Merged

“Looks Good To Me”

As the RFC email clearly explains, they were experimenting, restarted multiple times and had to go quite deep into it until they got to a point where this was worth announcing as something that might be worthwhile to merge.

Yeah. I’ve done this in the past. The first few attempts are about learning where the pain points are and to get ideas about what a good way to structure the patch set might look like.

Edit: obviously not at this scale. 2300 patches at once in a fork is a lot.

Well, at this point I think it’s good work, but now that the change is identified as good and fairly concrete, it’s still likely a benefit to break it up into smaller, more reviewable chunks to take in over time. Maybe by major subsystem, so their expert maintainers can take closer looks at it.

Sure, hence the email here about "here's all I have, let's discuss if and how to best merge this"?

He says himself he didn't see major improvements before 1500 commits.

It's actually a little more than 1 year - late 2020 to present

Here's the [1] patch cover letter sent to Linus Torvalds (similar to a Pull Request I suppose)

[1] https://lore.kernel.org/lkml/YdIfz+LMewetSaEB@gmail.com/T/#u

> (similar to a Pull Request I suppose)

It's not "similar to a pull request", it is a pull request (or would be if this wasn't an RFC). While this one is a bit different than usual (he didn't post the full diffstat, for instance), that's the way pull requests were traditionally done, even before github/gitlab/etc existed. A traditional pull request is an email which besides the description says something like "please pull from git://... some-branch", and notice that this one has near the top "[...] which can be found here: git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git master". The maintainer pastes these two arguments (the URL and the branch) to a "git pull" command line, so in this case it would be "git pull git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git master".
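To make the "paste these two arguments" step concrete, here is a minimal sketch in Python. The email body is a shortened, hypothetical excerpt built from the sentence quoted above, and the helper names are made up for illustration; a real maintainer simply copies the URL and branch by hand.

```python
import re

# Hypothetical excerpt modeled on the line quoted above; not the full email.
EMAIL_BODY = """\
[...] which can be found here:

   git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git master
"""


def extract_pull_args(body):
    """Return (url, branch) for the first 'git://... branch' pair, or None."""
    match = re.search(r"(git://\S+)\s+(\S+)", body)
    return (match.group(1), match.group(2)) if match else None


def pull_command(body):
    """Build the argument list a maintainer would hand to git."""
    args = extract_pull_args(body)
    if args is None:
        raise ValueError("no git URL and branch found in the email body")
    url, branch = args
    return ["git", "pull", url, branch]


print(" ".join(pull_command(EMAIL_BODY)))
```

The sketch only builds the command line; it does not run git or touch the network.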

>similar to a pull request.

Someone should make a tool which parses mailing lists and presents it in a Github-like frontend

Why? Github and similar front-ends top out at a few dozen commits or comments on a merge request before they start whining that the change is too big.

Absolutely nothing about GitHub infuriates me more than this “feature”… “large diffs are not rendered by default”.

Why does GitHub ignore the most important thing to look at in a PR? It’s not the tiny changes that are of top importance, it’s the big ones. The number of times I’ve been burned by trying to ctrl+f for something I’m looking for in someone’s PR (trying to see “does this diff touch X” for instance) — and getting a false impression of the code I’m reviewing — just because GitHub silently elided showing a big diff somewhere in the middle of a PR… is too high to count.

The single most important part of GitHub is the social aspect (code reviews), and it’s the part it is objectively worst at. If I just wanted to host my code, I could make a simple server with SSH accounts and let people push to it. We use GitHub because it’s supposed to be a venue for discussing complex changes, and it sucks by default at discussing complex changes.

It cuts costs. Server time to render out a large diff is expensive.

I use a GitHub enterprise instance, and we have enough server power to spare. There's no option to say "we have the server capacity, please render large diffs."

They probably didn’t tell the engineers who designed and implemented it that it was a cost-cutting measure. But even if they did, every option or preference that you add to your software multiplies the number of cases that you have to test. In principle a boolean option doubles the number of tests you need to do, because you have to run all of your tests with it off and all of them again with it on. Of course in practice people usually just assume that there won’t be any unwanted interactions between most of these types of options, which is often true enough. It is quite common to limit the number of such options in order to control costs over the long term. It reduces development, QA, maintenance, installation, and support costs. On the other hand, it does annoy users.
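The doubling argument can be sketched in a few lines of Python. The option names below are hypothetical, picked only for illustration; nothing here reflects how GitHub actually configures or tests anything.

```python
from itertools import product


def configurations(option_names):
    """Yield every on/off combination of the given boolean options."""
    for values in product([False, True], repeat=len(option_names)):
        yield dict(zip(option_names, values))


# Hypothetical option names, for illustration only.
options = ["render_large_diffs", "dark_mode", "compact_view"]
configs = list(configurations(options))

print(len(configs))  # 2 ** 3 = 8 configurations to run the suite against
```

Each extra boolean doubles the list, which is why exhaustive testing gets skipped in practice and most options are assumed not to interact.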

Could be read-only. Mailing lists aren't very accessible.

It depends on your definition of accessible. This is the tool kernel devs use, and that patch is intended for them. It's mostly a matter of getting used to it.

Github's user interface was scary at first for me, much more than a plain-text mail. So was GitLab.

Some people I know have complained to me that I send plain-text mails and that nobody wants to read a wall of text. I think that's probably a user issue, and it can in turn help filter low-effort comments.

If talking about a11y, plaintext mail ought to be one of the most accessible formats: high-contrast, can be processed by multiple tools including screen readers. ASCII art through screen readers is probably an exception.


Now, about your specific question: there's patchwork[1], though I don't think that instance covers the general LKML.

[1] https://patchwork.kernel.org/

Aerc is built around git+email flow: https://aerc-mail.org/


SourceHut does this since it exclusively uses mailing lists.

There’s patchwork, although I can’t find this submission in it.

The submitted link should be changed to this, imo.

It's directly linked in the article anyways

Is it interesting that now, nearly eight hours later, there are still no replies on the list? Is every relevant developer really going to wait to reply until after they understand this patch series? That seems very effective...

Everyone that is capable of reviewing this effectively is likely on leave right now.

Almost all the core maintainers are employed only to work on the kernel 9-5 outside of extraordinary situations (say a critical privesc or RCE in the kernel).

This patch set will not be ignored but it likely also won't receive review until everyone is back behind keyboards in the coming days.

Kernel devs enjoy the holiday season too!

The guy busted his ass for a year+. Almost any knee jerk response feels trite without appropriate due diligence?

How can we respond if we are all busy building his branch before we can meaningfully respond in kind?

But his branch builds 70% faster now!

First reply came after 12 hours or so

Incredibly readable too!

    From: Ingo Molnar
    Subject: [PATCH 0000/2297] Eliminate the Linux kernel's "Dependency Hell"
    25,288 files changed, 178,024 insertions(+), 74,720 deletions(-)

Where did the additional 100kLOC come from?

I suspect it’s this: “Automated dependency addition to .h and .c files. This is about 790 commits, which are strictly limited to trivial addition of header dependencies. These patches make indirect dependencies explicit, and thus prepare the tree for aggressive removal of cross-dependencies from headers.”

Did a brief scan of the repo. Looks like a lot of it comes from adding new headers. This commit adds +#include <linux/wait_api.h> to 1627 files for instance.


>2298 emails

When will Linux move to a site like github / gitlab / something similar self hosted that supports people proposing changes without having to send every single person interested in the development of Linux thousands of emails?

I think having everyone see all the emails is part of the point. Even with self-hosted Gitea, someone has to _host_ it. And there isn't a trivial export button, and if the host goes rogue they can make it inaccessible and make everyone lose efficiency while they swap back to the old method.

And if you need email to authenticate on a website, why not just use email anyway?

But this is an interesting idea, and I'm on the look-out for ideas to replace email. I'll keep "LKML" in the back of my head as a use case.

>And there isn't a trivial export button

It does support making backups of repositories at least.

>and if the host goes rogue they can make it inaccessible and make everyone lose efficiency while they swap back to the old method.

Isn't this also a problem if the person managing the email list goes rogue? You have to trust someone to host the infrastructure.

>And if you need email to authenticate on a website, why not just use email anyway?

Because Email may not be the optimal user interface for handling issues, pull requests, code review, etc.

> Because Email may not be the optimal user interface for handling issues, pull requests, code review, etc.

Maybe that's your experience.

Others have used mailing lists for this purpose for decades and a lot of people prefer it.

For one, it's accessible. Every machine can do plaintext email. Every text editor can work with plaintext. It's simple. Integrate it into whatever shell/editor/scripting language you prefer.

As opposed to logging into a bespoke web interface, primarily designed to be friendly to novice users.

>and a lot of people prefer it.

If this is true, then why do so few projects use a mailing list nowadays instead of something like github?

>Every machine can do plaintext email.

Every machine can do web browsing too. More people have used a web browser on their computer than a dedicated email client (as opposed to a web app like gmail).

>As opposed to logging into a bespoke web interface

You have to log in to email too.

>primarily designed to be friendly to novice users.

What's wrong with that? Having good UX is a win. The UX for creating a new repo on github is a million times better than creating a mailing list (yes, I have set up a mailing list for a project I made, and no one ended up using it except for me. Meanwhile, I had much more success with getting people to join and discuss the project via Discord).

> If this is true, then why do so few projects use a mailing list nowadays instead of something like github?

Lots of projects use mailing lists. Lots of projects use web-based git hosting services. Lots of projects mix.

Why do people just use GitHub? Because it requires zero configuration and it's convenient when you're developing something yourself.

The Linux Kernel is by far the world's largest open source project. Consider that it might have different needs.

>> Every machine can do plaintext email.

> Every machine can do web browsing too.

You're missing the point. A web interface is more complex than plaintext emails. To integrate into your shell environment is a lot more complicated.

> More people have used a web browser on their computer than a dedicated email client (as opposed to a web app like gmail).

I don't understand your point.

>> As opposed to logging into a bespoke web interface

> You have to log in to email too.

The emphasis was on bespoke web interface, not logging in.

>>primarily designed to be friendly to novice users.

>What's wrong with that?

Speed and flexibility are sacrificed, i.e. it's more convenient for the people who don't use the system all the time (people filing bug reports) than for the people who actually use it all the time (maintainers).

> yes, I have set up a mailing list for a project I made, and no one ended up using it except for me

a tad different use-case to the development of the linux kernel it seems

> Meanwhile I had much more success with getting people to join and discuss the project via Discord

Whatever works for your project bro.

> Every machine can do web browsing too.

Tell that to most of my browsers which refuse to display any Gitlab content (they only ever show the side bar).

>> primarily designed to be friendly to novice users.

> What's wrong with that? Having good UX is a win.

What's "good UX for novice users" isn't necessarily good UX for more advanced ones.

> Isn't this also a problem if the person managing the email list goes rogue? You have to trust someone to host the infrastructure.

It depends on what kind of threat you are worried about. The beautiful thing is even if LKML was abused and ruined tomorrow, nothing is lost or damaged. The emails have already been sent, and moving to a new list is more of a nuisance at this point.

If you're worried about authenticating a patch series, committers can sign their commits a number of ways.

Lastly, a large chunk of Linux development happens off-list, with subsystem maintainers building branches for Linus or Greg to pull from.

I can't stop laughing at the idea of GitLab struggling to render 100k lines of changes.

Please actually attempt the ideas you are suggesting before suggesting them.

This set of changes (and other large ones like it) is partially why the kernel needs to use email instead of <insert bloated Ruby application here>.

GitLab is awful and slow at rendering large changes. Sometimes when folding and unfolding files, the scroll position will jump around (though I'm unsure if that's a browser extension). And when viewing standalone files, KDE's Invent GitLab server is slow and can take 5-10 seconds to render a few thousand lines and send it to the client as JSON with syntax highlighting, before the file contents show up.

> GitLab is awful and slow at rendering large changes

So’s github, anything of a non-trivial size it just won’t show by default and you need to re-request every time.

It starts struggling around the kloc scale, and the browser itself soon follows as the dom is not the neatest and you start getting tab memory above half a gig, which I assume also makes the JS start killing itself or something.

Good times.

GitLab can't render this changeset. GitLab has a limit on what it can render.

> why the kernel needs to use email instead of

This sort of implies that the Kernel would use something centralized if it worked well, which I don't believe to be true. Using email is very much an intentional choice and a desirable solution.

>This sort of implies that the Kernel would use something centralized if it worked well, which I don't believe to be true.

The mailing list that they use now is centralized. It works by everyone sending a message to a single email address. The owner of that address then sends the email out to all of the subscribers of the mailing list.

Having everyone who wants to upstream their patch directly to the kernel submit their work to a single website is not much different than them sending it to a single email.

But that isn't how it works right? You send your pull requests to the relevant maintainer, CC people that you think should be aware _and_ send it to the mailing list (probably also "only" via CC).

This here for example is sent to Linus directly and that part does _not_ depend on the mailing list. The mailing list is just a convenience feature for people that _want_ to be aware but may be looked over/forgotten by the submitter (which still happens, and people regularly say stuff like "also CC'ing additional relevant people").

Yes you could, at least in principle, also do that via GitHub/GitLab/whatever. But a) you wouldn't gain anything (the kernel already has all the tooling it needs and it's large enough to just create missing tools, heck git was literally written for it), b) you would introduce a massive single point of failure that either i) is _not_ under control of the kernel.org team or ii) needs a massive amount of resources to host (see the fun people poke at displaying 2k commits in GitLab). Neither seems like a good use of the kernel development resources. And if you want to contribute to the kernel... using a mailing list is an absolutely trivial problem.

Yeah which is why I put the word "partially" in front of that out of context quote.

It's not like you would render and display the diffs in 2k emails at the same time either.

There’s no reason you couldn’t. And part of the beauty of patch emails containing full diffs is that you can take them and use whatever tool you want to render them and not be beholden to some dumb web UI’s opinion of what’s “too big.”

The difference is in one case a centralized server is expending a lot of resources to show something to the user, and in the other case it all happens locally and has no effect on the kernel.org infrastructure.

I don't think the issue is with the centralized server, but the browser and the JS components they use. I prefer local anyhow.

Didn't git do exactly that in preparing those emails to be sent? I wonder how long it took for that command to run.

Probably very, very quickly. git-format-patch is very fast; all it has to do is diff the files involved in each commit, which is already heavily optimized. No syntax highlighting (your email program can do that if you want it), or line numbering (ditto).

61.65 seconds to create all 2,298 emails on my computer, with the repository on spinning rust. That’s about one fortieth of a second each on average.
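For anyone checking the per-email figure, a quick sketch of the arithmetic in Python, using only the two numbers quoted above (the function name is just for illustration):

```python
def seconds_per_email(total_seconds, email_count):
    """Average wall-clock time spent generating one patch email."""
    return total_seconds / email_count


total_seconds = 61.65  # reported run time
email_count = 2298     # cover letter plus 2297 patches

per_email = seconds_per_email(total_seconds, email_count)
print(f"{per_email:.4f} s per email")                      # 0.0268 s per email
print(f"{email_count / total_seconds:.1f} emails per second")  # 37.3 emails per second
```

So "one fortieth of a second" is a slight round-down; it works out closer to one thirty-seventh.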

Don't discount the heaviness of rendering the HTML on the client, too!

I'm sure it could handle you viewing each commit one at a time just how with email you would look at each commit one at a time.

I have actually come to appreciate email much more than I used to. There are so many bad implementations of email, that for years I was on the bandwagon of having something else replace email.

Then I came to realize that my problem was not with email itself, but the way I had been using it. All GitHub, GitLab, etc do is take a decentralized platform and add centralization. I love the idea of using email now instead of a centralized issue tracker. Mailing lists become the issue tracker and those are then published on a website for others to use that want some centralized features.

A mailing list is a centralized platform. Just as the internet / web is decentralized and github lives on the web. Email is decentralized and a mailing list lives on email.

Centralized in this instance means one provider controlling the platform. GitHub is owned and controlled by Microsoft and therefore has a decent amount of vendor lock-in. Email is not controlled by a single provider and if configured correctly, you can easily migrate email between various providers or host it yourself. Email uses open standards.

A mailing list is centralized, but much less so than GitHub and is secondary in this instance.

> 0 siblings, 0 replies; only message in thread

Ingo did not send 2298 patch emails, he sent only the 0000 one, which contains the location of his public git repo for all this. People can clone this, and can even push it to a github/gitlab/etc hosted repo if they like.

In fact, sending too many patches to the mailing list is frowned upon:


Just checking here, but you are aware that the person who specifically likes the current process for managing the linux kernel sources is also the person who designed and implemented git, right? As in, Linus wrote git because he wanted a tool for managing sources exactly the way he wanted to do it. Is it any surprise that he has no desire to move away from that workflow?

Yes, I am aware of that, but I'm also aware that this is an ancient workflow. Git was released before I even used a computer for the first time.

I think it's natural for workflows to change over time.

BDFLs gonna BDFL.

It's a fucking shame how many people nowadays can't grok the difference between git and GitHub.

Your question is fundamentally wrong; totally bass-ackwards. The correct question is:

When will all the zillion other projects move off of GitHub / GitLab / all the other usurper sites?

nobody is forcing you to read all of them

yeah just minimize the PR like I minimized this thread..

It just seems like a waste of resources to send all these emails to people who are not going to read them.

?? how?

it's plain-text email

the display ad on your local news website is probably wasting more resources

not for a very long time. The devs prefer email to github methods of communicating and patching.

You have to have an awfully good reason to add 100k lines of code to a project... And I don't think merely shaving 50% off the build time is enough.

Think how many lines are now duplicate (and therefore need to be updated in twice as many places, and bugs introduced when a copy is missed).

Think how much extra stuff someone needs to skim through looking for the relevant file or part of the file.

If the 100k lines were in their own subdirectory and added a major new feature, it would be worth it. But spread across the whole codebase, and introducing more duplication and chances for bugs I think outweighs a 'neater' header file system.

The 'dependency hell' is only an issue for people who only want to build part of the kernel anyway. If you're building the whole lot, you might as well just include everything and it'll work great.

> [This patch series] decouples much of the high level headers from others, uninlining of unnecessary functions, a decoupling of the type and API headers, automated dependency handling of header files, and a variety of other changes

That's a lot more than "shaving the build time."

As far as I understand it, the goal of this patchset isn't to improve the build time; that's just a nice consequence. The goal was to refactor the header-file hierarchy to make it more maintainable and less "brittle." Sometimes, increasing maintainability requires more code. (Almost always, if the current version is a terse mess of mixed concerns.)

Think of it this way: take an IOCCC entry, and de-obfuscate it. You're "increasing the size of the codebase." You might even be "duplicating" some things (e.g. magic constants that were forcefully squashed together because they happened to share a value, which are now separate constants per semantic meaning.) But doing this obviously increases the maintainability of the code.

I'd say cycle time is also important anyway: we spend a lot of time building the Linux kernel in various forms. The savings across the board in manhours waiting for compilation, or for git bisect, are not insubstantial.

I wonder what the memory savings for compilation look like? Because that's also potentially more workers in automated testing farms for the same cost.

I've never tried it myself, but presuming you build the kernel in modular rather than monolithic mode, wouldn't the incremental compiles during git-bisect et al already be pretty quick? You'd only be rebuilding the modules whose source files changed, and those (presumably) wouldn't be pulling in the entire header tree. (Or would they, with that being "the problem"?)

That is the problem.

> You have to have an awfully good reason to add 100k lines of code to a project... And I don't think merely shaving 50% off the build time is enough.

When you're talking a project of half a million lines, sure.

The Linux kernel has around 27.8 million lines of code. An increase of .35%

> Think how much extra stuff someone needs to skim through looking for the relevant file or part of the file.

Why add features at all? Code has a purpose. Sometimes bringing code into a static context is a net good. It was going to be generated at runtime anyway.

> If you're building the whole lot, you might as well just include everything and it'll work great.

That's not strictly true, but it's true for these features, which is a stated reasoning.
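For reference, the percentage above can be checked with a few lines of C. This is only a sketch of the arithmetic, taking the thread's round numbers (about 100k added lines, about 27.8 million total) as given; the function name is made up for illustration:

```c
#include <assert.h>

/* Percentage growth from adding `added` lines to a codebase that
 * currently has `total` lines. Hypothetical helper for illustration. */
double percent_growth(double added, double total)
{
    assert(total > 0.0);
    return 100.0 * added / total;
}
```

With those inputs, percent_growth(100000.0, 27800000.0) comes out at roughly 0.36, which the comment rounds down to .35%.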

Good golly, that 0.35% figure really puts the patch set into perspective

A library I'm working on is 7000 LOC which seems pretty sizable, but 0.35% of that is 25 LOC.

The Linux patch set is actually tiny

>The Linux kernel has around 27.8 million lines of code. An increase of .35%

This is horribly misleading; most of these lines of code are drivers, which this patchset doesn't even concern.

It's still a massive change that only a handful of developers will ever be able to review in entirety - a fact to which the size of the project is completely irrelevant - if anything, actually, it urges even more caution, given the implied complexity. Which I believe was (at least in part) parent comment's point - given the importance and ubiquity of the Linux kernel, this may be concerning.

That said, I am very confident in the structures put in place by the kernel devs, their competence and the necessity for such a change - but trivializing a 100k LoC patchset because the project it's intended to land in is even more colossally complex isn't how I'd choose my approach.

> This is horribly misleading; most of these lines of code are drivers, which this patchset doesn't even concern.

That's not true at all, a big part of those added lines are added includes in drivers.

E.g. this commit I picked at random adds 1500 lines, of which just a few percent are in the core kernel: https://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.gi...

> You have to have an awfully good reason to add 100k lines of code to a project... And I don't think merely shaving 50% off the build time is enough.

In Ingo's post, he points out that the main speedup is coming from the fact that the expansion after the C preprocessor step is a LOT smaller.

That's a lot of decoupling. As someone who had to go rattling over the USB gadget subsystem, I can tell you that running "grep" with "find" was the standard way to find some data structure buried in a file included 8 layers deep. Having those data structures in files actually mentioned by the file you're working on would be a huge cognitive load improvement as well as make tool assistance much more plausible.

Even if this particular patch doesn't land, it lights the path. What types of changes need to be made are now clear. How many changes are required before you see the improvement is now clear. With those, the changes required can be driven down into the maintainers and rolled out incrementally, if desired.
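A rough sketch of what "decoupling of the type and API headers" can look like, collapsed into one file so it compiles on its own. The file names in the comments (ring_types.h, ring_api.h, ring.c) and the struct are hypothetical, not the actual layout in Ingo's tree:

```c
#include <assert.h>
#include <stddef.h>

/* ---- would live in a hypothetical "ring_types.h" ----
 * Data layout only. Cheap to include from other headers that embed
 * a struct ring by value. */
struct ring {
    unsigned int head;      /* total items ever pushed */
    unsigned int tail;      /* total items ever popped */
    unsigned int capacity;  /* maximum items held at once */
};

/* ---- would live in a hypothetical "ring_api.h" ----
 * Declarations only. Pointers need just a forward declaration, so this
 * header would not have to include ring_types.h at all. */
struct ring;
unsigned int ring_used(const struct ring *r);
int ring_is_full(const struct ring *r);

/* ---- would live in "ring.c" ----
 * The implementation is the only place that needs both. */
unsigned int ring_used(const struct ring *r)
{
    assert(r != NULL);
    return r->head - r->tail;  /* unsigned arithmetic handles wraparound */
}

int ring_is_full(const struct ring *r)
{
    return ring_used(r) == r->capacity;
}
```

The point of the split is that a file which only passes a struct ring pointer around pays for a one-line forward declaration instead of the whole chain of headers behind the full definition, and the headers a .c file really uses are named in that file.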

> As someone who had to go rattling over the USB gadget subsystem, I can tell you that running "grep" with "find" was the standard way to find some data structure buried in a file included 8 layers deep.

It has always amazed me how finding something which originally seemed a trivial little thing usually meant going through a chain of #defines and typedefs across many header files. It's the same with GLibc, by the way. It's a bit like when you hike to a summit by following a crest path: you always think the next hump in sight is the right one, your destination, the promised land; and when you reach it, dammit, it wasn't, your goal is actually the next one. Or perhaps the next after the next. Or...

Yes, actually when I came to C and asked on stackexchange/unix&linux how I could find, without a web search, which libc header I had to include in order to use a given macro or type declaration, my question was shot down within the hour and I was told to either do a web search or grep, and good luck. This is how desperate we are: we can't handle newbies putting that truth in front of us.

> As someone who had to go rattling over the USB gadget subsystem, I can tell you that running "grep" with "find" was the standard way to find some data structure buried in a file included 8 layers deep.

As someone who tried poking around: Oh good; I assumed I was just missing something. Alternatively, oh no; I had assumed I was missing something and there was a more elegant tool out there.

https://elixir.bootlin.com/linux/latest/source can be far more helpful than grep for this sort of thing

> You have to have an awfully good reason to add 100k lines of code to a project

Yes, and this is a very, very good reason. As another poster said, you add 0.35% lines to make it compile almost twice as fast? And you're not happy about that?

> Think how many lines are now duplicate (and therefore need to be updated in twice ...

OK, how many? None! That's how many. Adding proper header dependencies to the .c module doesn't duplicate anything. Unless you think adding #include <stdio.h> in every module somehow creates unmaintainable duplication.

> Think how much extra stuff someone needs to skim through looking for the relevant file or part of the file.

OK. Hrm... I think a lot, lot less is how much. That's the whole point of a major cleanup like this. Proper decoupling. Headers that you use are obvious where they belong not being brought in with some action-at-a-distance accident.

Maybe the people developing the project have some insights into which build improvements they would benefit from.

Having to recompile the entire tree every time you change a seemingly unrelated header gets old fast.

You need to read the article. It explains all this.

> I got to over 500 commits in this tree and had to throw away my second attempt

But wait, why didn't you just break the task down into sub-tasks that could be estimated in four-hour increments in the first place? Every single "scrum master" I've ever labored under has insisted that every software development task can be broken down this way.

Clearly Ingo is just an inexperienced developer who struggles to do this simple task. Maybe he needs more training?

I hadn’t thought about it, but this is a great counter example to the (bad) assumptions of scrum.

Can't imagine the amount of whack-a-mole and backtracking involved in refactoring this many headers in a large project. I've done this for a small project and it's a migraine inducing type of work. Yet so valuable. Kudos!

I have a firmware project written in C that I maintain. Probably 200k lines of code. I feel like I'm constantly fretting how to keep the headers and dependencies from becoming a hot mess. At the same time not exploding the number of headers either.

He writes:

As to maintenance overhead: it was surprisingly low overhead to keep dependencies at a minimum across upstream kernel releases - there were typically just around ~5 dependency additions that need to be addressed. This makes me hopeful that an optimal 'fast' state of header dependencies can be maintained going forward - once the initial set of fixes are in of course.

It seems like this could regress quickly just by a developer adding a "bad" dependency to some core header which happened to #include a lot of code by accident. Don't we need tools to ensure this doesn't regress? Like something which warns "this git commit increases the amount of compiled code by 20%".
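The core of such a check is small. Below is a sketch, assuming the two sizes are measured elsewhere (for example by counting lines of preprocessor output from gcc -E before and after a commit). The function name and the threshold parameter are invented for illustration; this is not an existing kernel tool:

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 when `after` is larger than `before` by more than
 * `max_growth_pct` percent, 0 otherwise. Hypothetical helper; the sizes
 * could be preprocessed line counts for one compilation unit. */
int size_regressed(size_t before, size_t after, unsigned int max_growth_pct)
{
    assert(before > 0);
    if (after <= before)
        return 0;
    /* Integer form of (after - before) / before > pct / 100.
     * Assumes the sizes are small enough that the products do not
     * overflow size_t. */
    return (after - before) * 100 > (size_t)max_growth_pct * before;
}
```

A presubmit script could run this per compilation unit and print a warning for any file where it returns 1.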

Not sure, but isn't a large part of it keeping #include out of header files? That's easy to spot, but can be hard to untangle once it's done.

I’ve embarked down this same path with GCC 2.7.4 from a … while … back.

Ingo is on the right track.

I’ve always thought that this stuff should have been dealt with by GCC (in the form of pre-cached header files).

So sad that it is a thorny issue to this day (for large scale projects); so glad that this issue is now alive so it can be dealt with once and for all.

I envision some type of *.ho filetype to hold these quasi-compiled headers for the next build iteration.

"... dealt with once and for all."

Is at least one GCC maintainer looking at this? Otherwise it'll likely not be dealt with "for all", right?

On the kernel side, every new dependency has to be added with this problem in mind. As long as there's no automation to help with this, it'll remain a problem.

Do you mean .pch? I think it works quite well generally, but for the kernel it might be too fragile as it is not so widely used.

> For the 'reference' subsystem of the scheduler, I also improved build speed by consolidating .c files into roughly equal size build units. Instead of 20+ separate .o's, there's now just 4 .o's being built. Obviously this approach does not scale to the over 30,000 .c files in the kernel, but I wanted to demonstrate it because optimizing at that level brings the next level of build performance, and it might be feasible for a handful of other core kernel subsystems.

This is important, realizing that compilation units rarely collide on private namespace usage. Called 'compilation unit grouping', or 'CU grouping', I implemented this once at a customer's and reduced build time of a large C++ project by 2x.

Merging CU's should preferably be done by the build tool and not have to be written into the .c source files themselves.

Exactly that - the implementation of it was done transparently under scons's SConstruct logic.
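As a sketch of what the build tool ends up generating in that setup: a grouped compilation unit is just a source file made of #include lines for the member .c files. The helper below is hypothetical (it is not scons or kernel code) and only shows the generation step:

```c
#include <assert.h>
#include <stdio.h>

/* Writes one '#include "file"' line per source file into `out`.
 * Returns the number of bytes written (not counting the terminating
 * NUL), or -1 if the buffer is too small. Hypothetical helper. */
int write_unity_source(char *out, size_t out_size,
                       const char *const *files, size_t n_files)
{
    size_t used = 0;

    assert(out != NULL && out_size > 0);
    out[0] = '\0';
    for (size_t i = 0; i < n_files; i++) {
        int n = snprintf(out + used, out_size - used,
                         "#include \"%s\"\n", files[i]);
        /* snprintf reports the length it wanted; treat truncation as failure. */
        if (n < 0 || (size_t)n >= out_size - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

Compiling the generated file instead of the individual .c files parses the shared headers once per group. The caveat is the one mentioned above: file-local (static) names that were private to each .c file now share one namespace, so the occasional collision has to be renamed or the file kept out of the group.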

I remember doing this - one C file with about 100 lines from #include "file001.c" to #include "file299.c"

These were files from a library that used a large number of tiny files; we preferred to embed this in our project rather than link it in.

Massive build time reduction.

There is some technical name for this other than "compilation unit grouping" - I forget what

Unity builds, but also unified and jumbo builds.

If you use CMake it's just a matter of setting -DCMAKE_UNITY_BUILD=1

The hero we need but don't deserve

This, kids, is exactly why nobody maintains out-of-tree kernel drivers.

Meh. There are loads of reasons why maintaining out-of-tree drivers is a bad idea. But this patch set is unlikely to cause any problems.

>> This, kids, is exactly why nobody maintains out-of-tree kernel drivers.

After this it should be easier to maintain out of tree stuff since it eliminates "dependency hell".

The dependency hell is not the kind of dependency hell where it's difficult to work on the code because including something might break something else. The dependency hell described in this email is to do with headers including headers including headers, eventually including a large quantity of text (which now has to be parsed) most of which is unnecessary. The only problem this "dependency hell" causes for out-of-tree modules is that builds take a bit longer. The fixes to this dependency hell are not going to make out-of-tree modules work better, they're likely going to break them in a lot of ways due to the many implicitly included headers they happen to have.

Very curious to see the reaction to this. Such a massive restructuring of the trees, header dependencies, and some of the machinery- these facts with which so many people are so familiar. There are few "who moved my cheese" events like moving code around.

For the other humans whose brains work on the kernel- the abrupt shock of moving to a vaguely familiar but in the end quite different house. Yes, the old one was suboptimal in various ways but this new one...

Few people's brains work like Ingo's. And he has had 2 years to work and iterate through the mental model. Field day here for the Science, Technology, and Society crowd, watching how old dogs deal with new tricks.

What's the process like for approving and implementing a series of patches this huge in quantity? Seems like it would almost require stalling on other pending issues for a significant amount of time while this is worked in.

It can probably be merged piece by piece, so it won't be a problem.

The reason he wrote the full patch series out like this, is because he wanted to measure the speed gains. Without significant speed gains it would be difficult to convince the kernel developers to make a big uncomfortable change like this.

> The reason he wrote the full patch series out like this, is because he wanted to measure the speed gains.

More importantly, because a lot of the patches involve moving code between files, or making batch changes to a large number of files at once. Evaluating these sorts of changes for correctness is a lot easier when the changes are isolated.

If it does as advertised, probably should not affect too much. Specifically, moving implicit dependencies to explicit should not impact anyone using the dependency. Might trip up code that is adding a new implicit?

Making implicit deps explicit is only one part of the patchset.

Moving all that header code around between files will conflict with all other code merges that touch those same lines.

Fair. I'm assuming the patches to update includes cover a wider set than changing the headers themselves.

Though, really, I was just saying why it may not be as bad as it sounds. Can still be bad, of course.


> - As to maintenance overhead: it was surprisingly low overhead to keep dependencies at a minimum across upstream kernel releases - there were typically just around ~5 dependency additions that need to be addressed. This makes me hopeful that an optimal 'fast' state of header dependencies can be maintained going forward - once the initial set of fixes are in of course.

GCC needs something like that too. Building on a big multicore box turns out to not get that much parallelism, because of the dependency ordering.

Sounds like headers in general are not necessarily a good idea.

They provide flexibility but at a cost.

I enjoyed Borland Pascal compilation speeds and clean dependencies before switching to Linux. I miss it, together with Delphi.

C++ has been trying modules, and Gabriel Dos Reis from Microsoft has gotten a pretty decent standard and implementation done. It's likely it will be added to the C++23 standard. The biggest issue seems to be "What cross compiler format do we use for the compiled modules" (the equivalent of DCUs). The other problem is that macros have to still work, and code that uses them will not benefit as much from modules.

The gains in compilation speed (including linking) and dependency information tracking are phenomenal though. It would certainly bring C++ in line with languages like Java, Rust etc.

The CPPCon 2021 talk about this is enlightening.

If C++ gets modules, there is no reason why C can't use the same model - since the compilers will have that stuff anyway.

Modules are part of C++20

And it's telling that we're in 2022 and still the only compiler with full C++ module support according to cppreference [0] is MSVC.

[0] https://en.cppreference.com/w/cpp/compiler_support



The C compilation model is tuned for ancient hardware. On modern hardware, incremental compilation isn't all that useful, so the entire import/header model makes less sense.

Hopefully the Linux kernel will eventually get automated presubmits for IWYU and other things.

I find it fascinating to consider the potential impact on energy consumption, human time, hardware and hosting costs, etc. of just one patch to just one thing.

I was wondering the same thing. Something like the Linux kernel gets built and rebuilt constantly by build systems and users all over the world. Even marginal improvements in build cost become big when there's such a large multiplier.

Does the kernel currently use precompiled headers? Anyone know how much they ought to help with this sorta scenario? (Like, presumably their gains would be lessened by this patchset, but could they have gotten halfway there on performance on their own?)

> As to other approaches attempted: I also tried enabling the kernel to use pre-compiled headers (with GCC), which didn't bring a performance improvement beyond a 1-2% for repeat builds. Plus during development most of the precompiled headers are cache-cold in any case and need to be generated by the compiler again and again.

Precompiling C headers is almost never useful. It gives essentially no benefit (since compiling most C headers is fast) with more binaries to worry about. I only bother to do it for especially slow libraries in C++.

And Ingo probably had the greatest head start over the rest of us, since his new kernel gave him the fastest iterative rebuilds, probably across all permutations of kernel option settings.

Thanks! It's true that that was posted earlier, but it is a blog post reporting on this email, and the site guidelines call for original sources:

"Please submit the original source. If a post reports on something found on another site, submit the latter."

So I think we'll merge that thread into this one.

not exactly, this is the mailing list.

IMHO, much more interesting than the original article

Forgive my ignorance as I'm not a kernel dev.

It seems like a change this massive would have the possibility of introducing a bunch of security holes, would it not?

All changes can have screwups; but a lot of these are moving things around and fairly mechanical infrastructure changes; so it shouldn't be that dangerous.

It seems to be rejigging header files so largely not adding new functionality or modifying code, but giving the compiler a lot less work to do.

It is not a code change per-se as much as a change to how the files are built.

Think of it like changing the order around of how you cook a fine meal and putting a few tricks all senior chefs know into the recipe explicitly. No ingredients were added or removed, so poison doesn’t get accidentally added.

What's the process to get that type of change request merged? I suspect it's complex to find reviewers for such a broad change set.

I imagine all the individual owners, or Linus on a treadmill would be the way to go.

Disclaimer: I know nothing about the Linux kernel!

It sounds to me like this work is re-organizing and moving around existing code in the header files to reduce the amount of code the C compiler has to wade through during a Linux build.

So one way to verify the changes is to compare the binary files (or the resulting executable) built the old way vs the new way. In theory, they should be identical. (Or I could be misunderstanding the whole thing and you should stop reading now!)

I did something like this for a Prime minicomputer emulator project: https://github.com/prirun/p50em

It was initially coded for a PowerPC chip because that is big-endian like the minicomputer being emulated, and to be honest, I wasn't sure I could even do the emulation, let alone do it and be mentally swapping bytes all the time. After about 10 years of it working well, I spent a week or two reworking it to add byte-swapping for Intel CPUs.

The method I used was to first add macros to all memory and register references. This was done on the PowerPC build, and after each set of incremental changes, I'd verify that the executable created with macros was identical to the original executable without macros. If it wasn't, I goofed somewhere, so backed up and added the changes in smaller sections until I found the problem. I didn't have to do any testing of the resulting executable to verify thousands of code changes. Practically every line of code in the 10K LOC project was touched because of the new macros, but each macro should have been a no-op on the PowerPC build with no byte swapping.

Next I built it on an Intel CPU where the byte-swap macros were enabled and it basically worked, maybe with a tweak here or there.

As a further test, I ran the PowerPC emulator through a bunch of existing tests with tracing enabled and the clock rate fixed. This records the machine state after each instruction, and with a fixed clock, yields a reproducible instruction trace. Then I ran the same test with the Intel version and compared the instruction traces. If there were any differences, it was because I forgot to add a swap macro somewhere.

After a few months of running it 24x7 myself w/o problems, a customer did find a bug in the serial device I/O subsystem where I had forgotten to add a macro. I hadn't done any testing of this subsystem (terminal I/O on real serial RS232 devices like printers.)

If something similar is being done with these Linux changes, verifying them may not be as hard as it seems initially.
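To make the macro approach above concrete, here is a minimal sketch, not the actual p50em code. EMU_SWAP16 and load_be16 are made-up names, and host endianness is detected with the __BYTE_ORDER__ macros that GCC and Clang predefine (an assumption; other compilers may need a different test):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* On a little-endian host the macro swaps bytes; on a big-endian host it
 * expands to a plain cast, so the compiled code should not change.
 * Note that the swapping form evaluates its argument twice. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define EMU_SWAP16(x) \
    ((uint16_t)((((uint16_t)(x)) << 8) | (((uint16_t)(x)) >> 8)))
#else
#define EMU_SWAP16(x) ((uint16_t)(x))
#endif

/* Reads a 16-bit value stored big-endian in emulated memory and returns
 * it in host order. Every memory reference goes through the macro. */
uint16_t load_be16(const uint8_t *mem)
{
    uint16_t raw;

    assert(mem != NULL);
    memcpy(&raw, mem, sizeof raw);  /* raw bytes, in host interpretation */
    return EMU_SWAP16(raw);
}
```

Because the big-endian branch is a no-op, wrapping every access in the macro should leave the big-endian build byte-for-byte unchanged, which is what makes the "compare the executables" check possible. The same idea applies to header reshuffling only where no code is uninlined or otherwise changed.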

There were code changes, e.g. search for "per_task" in the cover letter (https://lore.kernel.org/lkml/YdIfz+LMewetSaEB@gmail.com/T/#u). Also a lot of things were uninlined.

What will likely happen is that each maintainer checks their subsystem, driver, etc., standard tests are run, and someone like Linus goes over the methodology.

Ingo has likely done a bunch of tests himself, of course.

As someone pointed out, it’s only like 0.35% of the kernel.

I think step 1 is “be Ingo or someone comparably legendary” :)
