
Kernel quality control, or the lack thereof - based2
https://lwn.net/Articles/774114/
======
exmadscientist
This is nothing new. I followed kernel development pretty closely about a
decade ago and I was honestly very disappointed in what I saw. It's all about
features, features, features, and there's almost no testing beyond "well, it
works on _my_ machine". It's kind of ironic that Linux has become the C++ of
kernels, given Linus's feelings about C++: a giant pile of features, many of
which nobody really understands, that can have some really weird corner case
interactions.

I switched to FreeBSD as a result of my time reading LKML. The BSDs seem, at
least to me, to be more designed rather than thrown together; but maybe I just
haven't spent enough time watching their sausage get made.

Hopefully now that a senior kernel developer (Chinner) is saying some of these
things publicly, things can get better. But a culture change of the magnitude
needed will not come quickly.

~~~
m45t3r
I don't think it is a surprise that Linux grows by adding features in an
organic way, instead of being carefully designed like the *BSDs.

However, saying that no testing happens in the Linux kernel is dishonest, to
say the least. There are automated test suites maintained by big corporations,
like LTP [1] or autotest [2]. Thousands of people run different versions of
unstable/mainline kernels with different configurations. Security researchers
run many tests, such as fuzzers, and report issues. Multiple open-source
projects run their tests on current versions of the Linux kernel (which in the
end also serves as a test of the kernel itself), etc.

Linux is basically the kind of project that is big enough and impactful enough
that it naturally gets testing for free from the community.

[1]: [https://github.com/linux-test-project/ltp](https://github.com/linux-test-project/ltp)

[2]: [https://github.com/autotest/autotest](https://github.com/autotest/autotest)

~~~
maltalex
> Linux is basically the kind of project that is big enough and impactful
> enough that naturally gets testing for free from the community.

That makes intuitive sense, but is it really true? Is Linux being _thoroughly_
tested by the community?

~~~
snazz
What’s crazy to me, after reading all this, is how wonderfully stable my
Linux (kernel-level)[0] experience has been. I’ve never used any non-ext2/3/4
filesystems, granted, so I haven’t used this code, but I find it hard to
believe that these findings are indicative of the code I have used on a
relatively run-of-the-mill amd64 machine. So maybe if you’re like me, using a
fairly standard distro with the official kernel on somewhat normal hardware,
you have the benefit of millions of others testing the same code.

[0]: I have had more than my fair share of user land problems, but I have come
to expect that on any platform.

~~~
chubot
Yeah I've also had good experiences with Linux reliability.

But that's because I intentionally stay on the "happy path" that's been tested
by millions of others. I avoid changing any kernel settings and purposely
choose bog-standard hardware (Dell).

When you're on the other side, you're not just maintaining the happy path.
You're maintaining every path! And I'm sure it is unbelievably complex and
frustrating to work with.

-----

Personally I would like software to move beyond "the happy path works" but
that seems beyond the state of the art.

I also think there is a big component of this:

 _Operant Conditioning by Software Bugs_

[https://blog.regehr.org/archives/861](https://blog.regehr.org/archives/861)

Over time you get trained not to do anything "weird" on your computer, because
you know that, say, opening too many programs at once can cause a lockup. Or
you learn not to move your mouse aggressively while other expensive operations
are running. (This may be in user space or the kernel, but either way you're
trained not to do it.)

There is another post that I can't find that is about "changing defaults". I
used to be one of those people who tried to configure my system, but I've
given up on that. The minute you have a custom configuration, you run into
bugs, with both open source and commercial software.

The kernel has _thousands_ of runtime and compile-time options, so I have no
doubt that there are _thousands upon thousands_ of bugs available for you to
experience if you change them in a way that nobody else does. :)

~~~
TeMPOraL
> _Or you don't want to aggressively move your mouse too much when doing
> other expensive operations. (This may be in user space or the kernel, but
> either way you're trained not to do it.)_

Operant conditioning by software bugs is totally a thing, but for this
particular example I was trained into exactly opposite behaviour. I _do_ move
my mouse a lot during very resource-intensive computations, because that lets
me gauge the load on my system (is there UI animation lag? is there cursor
movement lag?), and in extreme cases, it can tell me when it's time to do a
hard reboot. I've also learned through experience that screensavers, auto-
locking, and even auto-poweroff of the screen can all turn what was a long
computation into a forced reboot, so avoiding long inactivity periods is
important.

This conditioning comes from me growing up with Windows, but I hear people
brought up on Linux have their own reason: apparently it used to be the case
(maybe it still is?) that some computations relying on a PRNG would constantly
deplete the OS's entropy pool, so just moving your mouse around would make
those computations go faster.

------
michaelt
Does anyone know what kernel developers do if someone sends them a patch
adding support for hardware they don't have - a new wireless dongle, say?

Does the maintainer merge the code after just reading over the code and
checking it compiles, never having executed it?

My assumption has always been there's little testing in such a situation - but
I can't square that with the fact the kernel seems to work so well :)

~~~
cesarb
Having sent the kernel maintainers a patch adding support for hardware they
don't have (in my case, it was a network card): they merge the code after just
reading over the code and checking it compiles. Keep in mind, however, that
they usually have decades of experience in their particular area of the
kernel, so they often can tell at a glance when you're doing something wrong
or unusual.

------
buserror
From my experience dealing with upstreaming, it has become a sort of
priesthood: you are required to go through the proper mantras and speak the
proper language. And most of the priests no longer have any real idea of what
the real world is doing with the kernel.

I spent a few months early last year upstreaming a subsystem, something
that had been tested in the field, at customers, and I had to literally _gut_
it to fit the priesthood's way of doing things. Ultimately the result that
went into the kernel was technically /inferior/ to the original source, and
any rant about /that/ was pointedly ignored.

I'm not even going to mention the device tree binding, which has become as bad
as the heyday of XML, where the format took on a life of its own and requires
its own maintainers. It's completely bonkers.

I think that since a lot of maintainers became 'professionals', they no longer
use Linux. They just juggle patches all day and talk among themselves and their
clique. And as long as it suits the big tech companies that pay them, it
doesn't matter if it's actually /useful/ to anyone else.

------
notacoward
"We trusted people" is a classic denial of responsibility. The people shoving
in the new-feature changes helped create the problem. It's great that at least
one of them seems to have had a change of heart, but starting by blaming
(unspecified) others suggests that the change might not last long. Expect
reversion to form in a few months.

------
Mister_Snuggles
There was one bug for a while where IPSec would not handle TCP packets[0].
This was a big one since sending TCP packets over an IPSec tunnel is a
somewhat common scenario.

I had to keep an old kernel for quite a while before that one was fixed.

It seems that a lot of the quality control happens with the distributions, not
with the upstream software itself. I doubt that SLES would have seen this bug,
but because I was running Tumbleweed I have to expect breakage like that.

[0]
[http://lkml.iu.edu/hypermail/linux/kernel/1704.3/02043.html](http://lkml.iu.edu/hypermail/linux/kernel/1704.3/02043.html)

------
hannob
Not surprised to read this about XFS.

I did some fuzzing on filesystem tools a few years ago. xfsprogs was...
"interesting". Interesting in that it was seemingly impossible to find anyone
to report bugs to. They had a bug tracker, but it didn't work, submitting bugs
resulted in an error. I think there was also a mail address that bounced. I
think in the end my bug reports reached no one who could care about them.

~~~
int0x80
There is xfs.org and the XFS mailing list. You can just send an email there.
Maybe you were sending HTML emails or something that made the message bounce?

------
lazka
I've filed bugs for some of the mentioned copy_file_range() issues more than
two years ago:

- [https://bugzilla.kernel.org/show_bug.cgi?id=135461](https://bugzilla.kernel.org/show_bug.cgi?id=135461)
- [https://bugzilla.kernel.org/show_bug.cgi?id=135451](https://bugzilla.kernel.org/show_bug.cgi?id=135451)

No response...

~~~
craftyguy
I'm sure they would have welcomed patches.

~~~
jodrellblank
What does that have to do with anything?

~~~
craftyguy
Uh, that OP could attempt to fix something they complained about, rather than
beat the proverbial dead horse about it?

~~~
jodrellblank
I.e. if you can’t sing, then you can’t point out that the singer is out of
tune, therefore they aren’t out of tune, and everything’s fine?

Submitting a request which says “can you clarify if this filesystem behaviour
is expected?” then bringing it up two years later saying “this has been
unstable for a while” does not seem like “complaining” or “beating a dead
horse”.

------
shereadsthenews
Syzkaller finds bugs in the Linux kernel all day, every day. The fact that some
ioctls don't work right should surprise nobody.

