How did this happen? Which commit caused it?

smashed · on Jan 13, 2023

Best explanation I found is here:

https://lore.kernel.org/stable/CAFsF8vL4CGFzWMb38_XviiEgxoKX...

A patch was backported to the 6.0 branch from the main branch, but they forgot a line of code, leading to buggy behavior.

bravetraveler · on Jan 13, 2023

I'll open by saying I'll forever be thankful for GKH... but this response kills me:

> As 6.0.y is now end-of-life, is there anything keeping you on that kernel tree?

Uh, several distributions. It wasn't EOL enough to prevent breaking it, so fix it.

Don't even technically need their input, Git and all.

I'll buy this EOL thing if they revert the change that caused this and stop releasing under 6.0. There were at least two more after this

Arnavion · on Jan 13, 2023

>Uh, several distributions.

That is the kernel bug tracker, not the distributions bug tracker.

>It wasn't EOL enough to prevent breaking it, so fix it.

It wasn't EOL at the time the patch was backported. It's EOL now.

>I'll buy this EOL thing if they revert the change that caused this and stop releasing under 6.0. There were at least two more after this

Not sure what "this" in "two more after this" is, but there have been no 6.0 releases since it was EOLed.

bravetraveler · on Jan 13, 2023

> That is the kernel bug tracker, not the distributions bug tracker.

The point is that the contributor's reasoning for being on the tree is irrelevant. Like you said, they just made it EOL.

Distributions are the continuous/constant answer as to why countless people will be. This isn't an ancient release, something from the grave.

Is the expectation, then, that distributions would have to patch out the regression - or take on a more major upgrade (6.1 / 6.2), likely breaking something else?

Neither of these are particularly tenable. I'm glad GKH was willing to accept further changes to make it correct, but reverting is also applicable.

Breaking something, calling it EOL, and not fixing it is closer to dead than end of life.

Arnavion · on Jan 13, 2023

>Is the expectation, then, that distributions would have to patch out the regression - or take on a more major upgrade (6.1 / 6.2), likely breaking something else?

Correct.

>Neither of these are particularly tenable.

Yes they are.

>Breaking something, calling it EOL, and not fixing it is closer to dead than end of life.

You're awfully confident about how things should work, even though you don't understand how they already work.

bravetraveler · on Jan 13, 2023

Distributions have their own release strategies to pick up what is now in the 6.1/6.2 trees. Some were cut at a bad time, so they're treating the now-EOL stable as longer term.

This is really just a pedantic criticism on the handling of 6.0, and the 'ignorance' (I hate the connotation of the term) of why people don't run latest.

I'm not asking them to bend over backwards, here.

Things are going more or less the way I want, 6.0 will get fixed [edit: upstream]. Please don't take this the wrong way.

vfclists · on Jan 13, 2023

This is soooo BASIC!!

Why aren't processes, either manual or automated, in place to check for things like this?

Aren't there some tests in place to check for such basic functionality errors?

Doesn't the kernel development process mandate such facilities? I'm sure the NSA, Unit 8200 and GCHQ have tests like this in place but don't share their findings.

Is it a matter of funding or leadership philosophy and priorities?

andrewf · on Jan 13, 2023

IIRC the Linux maintainers view themselves as providing a kernel for distros to bundle.

You can get a kernel from Red Hat that has been through Red Hat's release process. Red Hat has their own test suite/labs and will also pay attention to test results from elsewhere - including Fedora, their evergreen distro for putting new software into the wild ahead of its incorporation into Red Hat Enterprise Linux.

Substitute the distro of your choice.

touisteur · on Jan 13, 2023

Wondering whether there's a company out there that does kernel testing as a service. Give your kernel conf, some tunings of basic services, eventually your distro, and have an automatic testsuite run for your subset, cyclictests, syzkaller instance, have some of your stresstests app run. Might be useful in a world of firecracker/microvms with smaller kernel surfaces?

cdelsolar · on Jan 13, 2023

yeah right? doesn't the kernel have a test suite?

bonzini · on Jan 13, 2023

Every kernel subsystem has its own testsuite. Running all of them would require hundreds of different pieces of hardware, so it's not really possible for a single release manager to do so.

For Linus's releases this is easily solved by progressively slowing down the pace of development towards a release, so that cross-subsystem issues where maintainer A breaks maintainer B's subsystem become progressively less likely over the two months of the release cycle.

For stable releases this is much harder to do because of the short cycle. The stable branches in the end are a mostly automated collection of patches based on both maintainer input and the output of a machine learning model. The quality of stable branches is generally pretty good, or screwups such as this one would not make a headline; but that's more a result of discipline of mainline kernel development, rather than a virtue of the stable kernel release process.

vfclists · on Jan 13, 2023

> Running all of them would require hundreds of different pieces of hardware, so it's not really possible for a single release manager to do so.

The issue is this bug is not hardware related. It's a pure software issue.

Hardware bugs are an entirely different kettle of fish.

BTW is that bonzini of GNU Smalltalk fame?

bonzini · on Jan 13, 2023

Yes, I agree that _this_ issue could have been found. But the parent was talking more in general of "doesn't the kernel have a test suite", and both hardware-dependent (drivers, profiling, virtualization, etc.) and hardware-independent (filesystem, networking, etc.) aspects of the kernel are distributed across multiple testsuites.

The stable kernels pre-release queue is posted periodically to the mailing list and subsystem maintainers _could_ run it through their tests, but honestly I don't believe that many do. Personally I prefer to err on the other side; unless something was explicitly chosen for stable kernel inclusion and applies perfectly, I ask the stable kernel maintainers not to bother including the commit. This approach also has disadvantages of course, so they still run their machine learning thingy and I approve/reject each commit that the bot flags for inclusion.

> BTW is that bonzini of GNU Smalltalk fame?

Yes it's me. :) Did we meet?

vfclists · on Jan 14, 2023

> Yes it's me. :) Did we meet?

I'm a fan of Smalltalk and I used to follow your development of GNU Smalltalk.

What's happened to it? It seems to have fallen by the wayside.

bonzini · on Jan 15, 2023

I got a job and a family. :)

yjftsjthsd-h · on Jan 13, 2023

So that might suggest that it's actually better to just track the latest version, rather than worrying about backports?

CJefferson · on Jan 13, 2023

In my (limited) experience, the only reason to use backports is because you have closed source kernel modules which you can't update (that of course ends up covering most Android phones, and many SoCs).

touisteur · on Jan 13, 2023

There's also official support of vmm things like firecracker, which officially supports only 5.10 and maybe latest but don't send bugs?

ilyt · on Jan 13, 2023

Distros standardize on a version not because it is more stable but because tooling (which might include 3rd party modules for the kernel) can then rely on working with that version without a recompile.

If you don't have that constraint yeah, not much reason.

eklitzke · on Jan 13, 2023

The vast majority of the time fixes like this that are being backported are straightforward fixes for bugs (security or otherwise) that require very little manual conflict resolution, especially if the fix is just being backported one or two kernel releases. The developer can often just cherry-pick the commit into a few recent release branches and most of the time git will just automatically do the merge correctly, or if there is a manual merge conflict it's something really simple. In fact, if there's a complicated merge conflict often the change won't be backported at all unless the bug is actually serious enough to warrant X hours of someone's time to do it and get code review etc. Most of the time this process works correctly, but obviously there's room for error and mistakes can happen.
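As a rough illustration of where a cherry-picked backport can go wrong, here is a minimal Python sketch that compares the lines added by an upstream patch with the lines added by its backport and reports anything the backport dropped. This is not how the stable maintainers actually check backports; the function names and the sample diffs are hypothetical, and a real backport may legitimately differ from upstream, so a hit here is only a hint to look closer.

```python
from collections import Counter


def added_lines(diff_text: str) -> Counter:
    """Count the lines a unified diff adds, ignoring '+++ ' file headers."""
    added = Counter()
    for line in diff_text.splitlines():
        # '+++ b/path' is a file header, not an added line.
        if line.startswith("+") and not line.startswith("+++ "):
            added[line[1:].strip()] += 1
    added.pop("", None)  # blank added lines carry no signal
    return added


def missing_from_backport(upstream_diff: str, backport_diff: str) -> list[str]:
    """Return lines the upstream patch adds that the backport does not."""
    missing = added_lines(upstream_diff) - added_lines(backport_diff)
    return sorted(missing.elements())


if __name__ == "__main__":
    # Hypothetical diffs, made up for illustration only.
    upstream = (
        "--- a/example.c\n"
        "+++ b/example.c\n"
        "@@ -1,2 +1,4 @@\n"
        " int f(void) {\n"
        "+    int ret = 0;\n"
        "+    ret = check();\n"
        " }\n"
    )
    backport = (
        "--- a/example.c\n"
        "+++ b/example.c\n"
        "@@ -1,2 +1,3 @@\n"
        " int f(void) {\n"
        "+    int ret = 0;\n"
        " }\n"
    )
    for line in missing_from_backport(upstream, backport):
        print("missing in backport:", line)
```

The comparison ignores indentation and context lines on purpose, since those often change between branches; that also means it cannot tell an intentional adaptation from a forgotten line.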

There's a tradeoff between the risk of running an older kernel that has known bugs, upgrading to the latest new kernel which has bug fixes but may introduce new bugs, and getting backports for known bugs to your known working kernel. Most of the time the last option is reasonable but it definitely depends on your use case and what you're optimizing for.