
How I Found a 20-Year-Old Linux Kernel Bug - cesarb
http://robert.ocallahan.org/2017/06/how-i-found-20-year-old-linux-kernel-bug.html
======
dom0
Linux ~4.7 or so fixed a bug in fadvise, specifically FADV_DONTNEED, that
incorrectly rounded page boundaries to the effect of making some calls less
effective. Found and fixed by a developer who wondered why his page cache was
filling up, even though the backup software he used made use of DONTNEED :)

The bug was in from day 1.
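
To make the rounding issue concrete, here is a rough sketch of how a byte range maps onto whole pages. It is a simplified model, not the kernel's fadvise code: the 4096-byte page size and the `fully_covered_pages` helper are assumptions for illustration. The start is rounded up and the end is rounded down, so pages only partly covered by the range are left alone, which is one way a DONTNEED call can drop fewer pages than the caller expected.

```python
PAGE_SIZE = 4096  # assumed page size for illustration


def fully_covered_pages(offset, length, page_size=PAGE_SIZE):
    """Return (first_page, end_page) of pages lying entirely inside the
    byte range [offset, offset + length). end_page is exclusive."""
    first_page = -(-offset // page_size)          # round the start up
    end_page = (offset + length) // page_size     # round the end down
    return (first_page, max(first_page, end_page))


# A range that starts mid-page only fully covers the pages after it.
aligned = fully_covered_pages(0, 2 * PAGE_SIZE)
unaligned = fully_covered_pages(100, 2 * PAGE_SIZE)
```

With these inputs the aligned call covers two whole pages while the unaligned one covers only one, even though both ranges are the same length.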

~~~
carussell
The thing I wonder about in instances like this is how many people ran into
the problem and thought, "Huh, there must be some quirky rationale here, and
it's just an idiosyncrasy I'll have to deal with", or, "Huh, that's definitely
wrong... oh well".

~~~
aisofteng
I can definitely say that I've been guilty of seeing a subtle but not
deal-breaking bug in third-party open source software and then failing to file a
report because there's just so much to do in a day that it's easy to forget.
It's on all of us to be good open source citizens.

~~~
ethbro
I think the counterpoint of this is being kind when people report bugs that
aren't bugs.

If users encounter a weird result, report it, and have someone call them an
idiot because they misunderstood a nuance of the system... they're probably
not going to take the time to report next time around.

(And I get it, there's that common issue that people consistently
misunderstand and you continually get reports about. But each one of those
users might also be a user who finds a real bug _next_ time.)

~~~
lomnakkus
I think your parenthetical probably _explains_ the dismissive attitude many
upstreams have, _but_ I would argue that if _so_ many people are
(consistently) misunderstanding your software then there's _probably_ a
UI/UX/education problem that needs fixin'...

(Not saying it's _easy_ , but it's a sign...)

~~~
alkonaut
Someone said, somewhere: User files bug that isn't a bug:

bad/rookie dev: omg dumb user

good dev: closes bug issue, files new issue to fix docs.

------
wyc
Great work!

Segfaults are horrible, but bugs like these are even more so. Crawling,
sneaking, living in your walls. Stealing precious CPU cycles and memory from
millions of machines at once--petabytes and petaflops when you add it all up.

Worse yet, they don't make a peep until you shine the holy light of
benchmarks, code reviews, automated testing, or the (un)lucky corner case on
them.

At least segfaults let you know something's definitely wrong. These bugs? Now
that's insidious.

------
glangdale
The 'rr' project gets a side mention, but if you're into debuggers, it's
_really_ worth a look - a very practical "record and replay" debugger that
allows you to move around in time in a debug session.
[http://rr-project.org/](http://rr-project.org/)

~~~
hashhar
rr is indispensable to me, especially as I started hacking around on Firefox and
other C++ codebases (lack of interactive debuggers).

~~~
alkonaut
Why are there no interactive debuggers if you develop firefox?

------
hultner
It's always nice to see fixes of problems found with improved testing. It
would be nice to see something like Haskell's QuickCheck rigorously applied to
the majority of the kernel functions/interfaces.
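
For readers who have not used QuickCheck, the idea can be sketched in a few lines: generate many random inputs, check that a stated property holds for each, and report the first counterexample. This is a hand-rolled approximation, not QuickCheck itself, and every name here (`check_property`, `clamp_length`, `clamp_length_buggy`) is made up for illustration. The function under test is a toy length clamp, not a real kernel interface.

```python
import random


def check_property(prop, gen, trials=1000, seed=0):
    """Run prop on random cases; return the first failing case or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        case = gen(rng)
        if not prop(case):
            return case
    return None


def clamp_length(offset, length, size):
    """Toy function: clamp a read so it never runs past the end."""
    return min(length, size - offset)


def clamp_length_buggy(offset, length, size):
    """Same idea with an off-by-one error in the bound."""
    return min(length, size - offset + 1)


def gen_case(rng):
    size = rng.randint(0, 100)
    return (rng.randint(0, size), rng.randint(0, 200), size)


def stays_in_bounds(clamp):
    def prop(case):
        offset, length, size = case
        result = clamp(offset, length, size)
        return 0 <= result <= length and offset + result <= size
    return prop


good = check_property(stays_in_bounds(clamp_length), gen_case)
bad = check_property(stays_in_bounds(clamp_length_buggy), gen_case)
```

Here `good` comes back as `None` because no counterexample exists, while `bad` holds a concrete `(offset, length, size)` triple that breaks the property.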

~~~
galapago
Is there a Haskell type of Linux syscalls? If it is the case, we can
automatically derive Haskell code to generate an arbitrary list of syscalls.

(disclaimer: I'm one of the QuickFuzz [0] developers and I'm very interested in
testing for this kind of bug)

[0]: [http://quickfuzz.org/](http://quickfuzz.org/)
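
As a rough illustration of deriving call sequences from a description of the interface, the sketch below uses a small hand-written table in place of a real type. It has nothing to do with how QuickFuzz works internally: the `SIGNATURES` table, the argument generators, and `random_calls` are all hypothetical, and the calls are only produced as data, never executed.

```python
import random

# Hypothetical description of a few call signatures: name -> argument kinds.
SIGNATURES = {
    "read": ("fd", "length"),
    "write": ("fd", "length"),
    "lseek": ("fd", "offset"),
    "close": ("fd",),
}

# One generator per argument kind, biased toward boundary-looking values.
ARG_GENERATORS = {
    "fd": lambda rng: rng.randint(-1, 8),
    "length": lambda rng: rng.choice([0, 1, 4095, 4096, 4097]),
    "offset": lambda rng: rng.randint(-4096, 1 << 20),
}


def random_calls(rng, count):
    """Build a list of (name, args) tuples that match SIGNATURES."""
    calls = []
    for _ in range(count):
        name = rng.choice(sorted(SIGNATURES))
        args = tuple(ARG_GENERATORS[kind](rng) for kind in SIGNATURES[name])
        calls.append((name, args))
    return calls


sequence = random_calls(random.Random(42), 5)
```

Seeding the generator makes a failing sequence reproducible, which matters once the calls are actually replayed against a system under test.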

~~~
pdimitar
Sorry for a bit of off-topic but can you point me at a good tutorial on what
must be done exactly so I can use your software to test mine?

~~~
galapago
If you already know Haskell, you can take a look at our article explaining how
it works:

[https://labdcc.fceia.unr.edu.ar/~amista/article.pdf](https://labdcc.fceia.unr.edu.ar/~amista/article.pdf)

Feel free to contact us by email if you need to.

------
nathan_f77
Terrifying to think that tiny, subtle bugs like this could exist in pacemakers
and planes. Especially if something like this only happens once every decade
or so:

> I guess once in a while it would fail if your allocator happens to land one
> at the end of a page.

~~~
wolfgang42
My understanding is that pacemakers and planes (and similarly high-reliability
systems) tend to statically assign everything and avoid allocators altogether,
for precisely this reason. It's much easier to prove that you don't have
memory problems if you simply assign every byte of RAM to a specific task and
then make sure it's always used for that task and only that task.
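
A toy model of that idea, for illustration only: one buffer is created up front, every task gets a fixed region of it, and the layout is validated once before anything runs. Real embedded systems do this in the linker or in C, not in Python, and the task names, sizes, and helpers below are all invented.

```python
RAM_SIZE = 1024  # hypothetical total memory, in bytes

# Hypothetical fixed layout: task -> (start, size). Nothing is allocated later.
LAYOUT = {
    "sensor_buffer": (0, 256),
    "pacing_state": (256, 64),
    "telemetry_log": (320, 512),
}


def validate_layout(layout, ram_size):
    """Return True when all regions fit in RAM and none overlap."""
    end_of_previous = 0
    for start, size in sorted(layout.values()):
        if size <= 0 or start < end_of_previous or start + size > ram_size:
            return False
        end_of_previous = start + size
    return True


RAM = bytearray(RAM_SIZE)  # the only buffer ever created


def region(task):
    """Return a view of the bytes reserved for one task."""
    start, size = LAYOUT[task]
    return memoryview(RAM)[start:start + size]


layout_ok = validate_layout(LAYOUT, RAM_SIZE)
region("pacing_state")[0] = 0x7F  # writes land only inside that task's bytes
```

Because the layout is fixed data, the "no overlap, nothing out of range" check can run once at build or start-up time instead of being a property of every allocation.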

------
wextsucks
Title should have been "How I found a bug in some deprecated 802.11g-era API
that is disabled by default"

I always wonder why the kernel does not just warn when this API is used or
yank it. Any software stuck with this API can barely know about 802.11n and is
probably wondering what 802.11ac or 802.11ax are. Only some old or broken
device drivers require this API.

Linux distros should just stop enabling this API.

~~~
aneutron
I actually use a 802.11g because work. So no thanks.

------
metalliqaz
I have not read anything about this bug other than the very short
description in the linked blog; however, this seems like a bug that would have
been flagged by a static analysis tool. I know they've been used on the kernel
(e.g. Coverity). Very surprised it survived until now.

~~~
benmmurphy
the syscall function takes a void* parameter then does a copy from user space
into the kernel using the wrong target type/sizeof. i think it works because
the incorrect type was a superset of the correct type. i don't think any
static analyser could catch this.

proposed patch here:

[https://bugzilla.kernel.org/attachment.cgi?id=256997&action=...](https://bugzilla.kernel.org/attachment.cgi?id=256997&action=diff)
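
A small ctypes sketch of why reading a superset type mostly "works". The two layouts are stand-ins chosen to be 32 and 40 bytes, matching the sizes mentioned elsewhere in this thread; they are not the kernel's real structure definitions, and `read_as` is a hypothetical helper. The point is only that the larger type shares its leading fields with the smaller one, so the over-read goes unnoticed unless the extra bytes are not there.

```python
import ctypes

NAME_LEN = 16  # assumed length of the shared leading name field


class SmallReq(ctypes.Structure):   # what the caller actually passes (32 bytes)
    _fields_ = [("name", ctypes.c_char * NAME_LEN),
                ("payload", ctypes.c_ubyte * 16)]


class LargeReq(ctypes.Structure):   # what the buggy reader asks for (40 bytes)
    _fields_ = [("name", ctypes.c_char * NAME_LEN),
                ("payload", ctypes.c_ubyte * 24)]


def read_as(struct_type, buffer):
    """Copy buffer into struct_type, or return None if it is too short."""
    try:
        return struct_type.from_buffer_copy(buffer)
    except ValueError:
        return None


exact = bytes(SmallReq(name=b"wlan0"))   # exactly 32 bytes available
padded = exact + bytes(8)                # 8 unrelated bytes happen to follow

works_by_luck = read_as(LargeReq, padded)   # over-read succeeds, name is intact
fails = read_as(LargeReq, exact)            # nothing follows, so the read fails
```

When other data happens to follow the structure, the larger read succeeds and the shared fields look correct, which is why the mistake can sit unnoticed for a long time.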

~~~
jjnoakes
A static analyzer with sufficient inter-procedural analysis (i.e. across the
user/kernel boundary) could certainly see the type of the allocated structure
in userspace, and flag the kernel read of a larger type.

------
rullelito
I've never done any kernel/low level stuff, so sorry for a probably stupid
question. Will this bug only happen when there is less than 40 bytes but more
than 31 bytes available? (i.e. less than 32 bytes will fail anyway)
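
Taking the sizes in the question at face value (a 32-byte structure that is read as 40 bytes), the window can be worked out with a toy model. This is arithmetic on assumed numbers, not a statement about the actual kernel code; `copy_ok` and `outcome` are made-up helpers that treat a copy as failing when it needs more bytes than remain mapped after the start of the structure.

```python
CORRECT_SIZE = 32  # assumed size of the structure the caller passes
BUGGY_SIZE = 40    # assumed size the buggy copy asks for


def copy_ok(bytes_available, copy_size):
    """Toy model: a copy succeeds only if enough mapped bytes remain."""
    return copy_size <= bytes_available


def outcome(bytes_available):
    correct = copy_ok(bytes_available, CORRECT_SIZE)
    buggy = copy_ok(bytes_available, BUGGY_SIZE)
    if not correct:
        return "fails either way"
    if not buggy:
        return "bug visible"
    return "works either way"


# Which amounts of remaining mapped memory expose the bug?
visible = [n for n in range(64) if outcome(n) == "bug visible"]
```

Under these assumptions `visible` is the range 32 through 39: with fewer than 32 bytes even a correct copy would fail, and with 40 or more the over-read silently succeeds.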

------
teddyknox
Love the modesty implicit in the short post.

------
aaronchall
Brilliant post. Horrific page UX - half of the text was blocked by a large
blue box. Work firewall may be responsible for something not loading, but I
don't see why I should have to delete nodes from the dom to read the text.

------
dsego
given enough eyeballs, all bugs are shallow

~~~
geofft
Sure, but it's not at all clear eyeballs are the most efficient way to find
bugs. They seem remarkably _inefficient_ compared to computers, which have
generally shown themselves to be good at monotonous mechanical work that
requires good attention to detail and no creativity.

In particular it seems to me like this could have been fixed with a better,
machine-readable description of the types/structures for each ioctl, plus a
static analysis tool that makes sure that the kernel does a copy_from_user on
exactly what the documented input types are and no more or less. There is
already a halfhearted attempt to encode type information in ioctls (the _IOR,
_IOW, etc. macros), so I think this is doable. I'm not sure how much work is
required to trace copy_from/to_user statically, but it certainly seems like it
would be far less work than 20 years of people using these syscalls.
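
As a sketch of what such a check could look like: the `_IOR`/`_IOW` family packs the argument size into the command number, so a tool can decode it and compare it with the number of bytes a handler copies. The bit layout below is the generic Linux one (8 bits number, 8 bits type, 14 bits size, 2 bits direction); some architectures use a different split, and not every ioctl follows this encoding at all. `copy_size_matches` and `EXAMPLE_CMD` are hypothetical.

```python
# Generic Linux ioctl command layout (assumed; some architectures differ).
IOC_NRBITS, IOC_TYPEBITS, IOC_SIZEBITS = 8, 8, 14
IOC_NRSHIFT = 0
IOC_TYPESHIFT = IOC_NRSHIFT + IOC_NRBITS
IOC_SIZESHIFT = IOC_TYPESHIFT + IOC_TYPEBITS
IOC_DIRSHIFT = IOC_SIZESHIFT + IOC_SIZEBITS
IOC_WRITE, IOC_READ = 1, 2


def ioc(direction, type_char, number, size):
    """Pack direction, type, number and argument size into a command."""
    return ((direction << IOC_DIRSHIFT) | (size << IOC_SIZESHIFT)
            | (ord(type_char) << IOC_TYPESHIFT) | (number << IOC_NRSHIFT))


def ioc_size(command):
    """Extract the argument size encoded in a command."""
    return (command >> IOC_SIZESHIFT) & ((1 << IOC_SIZEBITS) - 1)


def copy_size_matches(command, bytes_copied):
    """Hypothetical check: does a handler copy exactly the encoded size?"""
    return ioc_size(command) == bytes_copied


# Made-up command whose documented argument is 32 bytes long.
EXAMPLE_CMD = ioc(IOC_WRITE, "x", 1, 32)
flagged = not copy_size_matches(EXAMPLE_CMD, 40)  # a 40-byte copy is flagged
```

The hard part, as noted above, is tracing which copy belongs to which command in real code; the comparison itself is trivial once that mapping exists.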

As another example, I think "given enough eyeballs, all bugs are shallow"
would be a poor reason to eschew writing tests for your code.

~~~
db48x
The quoted soundbite is intended to be pithy, not 100% literal. It's ok if
some of the eyeballs are in fact implemented with automated static analysis.

------
WhiteSource1
And clearly no one noticed it in 20 years....

------
_e
did this guy get a bug bounty?

~~~
bonzini
It's not a security-sensitive issue.

~~~
jwilk
FWIW, not all bug bounty programs are about security.

------
squalor1a
Is there any way you can get paid a bounty?

