

Tracking down a kernel bug with git bisect - fauria
http://blog.oddbit.com/2014/07/21/tracking-down-a-kernel-bug-wit/

======
tlrobinson
The first time I used "git bisect run" I felt like I was cheating.
Unfortunately (or fortunately) I don't have many opportunities to use it.

~~~
chriscool
There is this LWN article about it (which I wrote):

[http://lwn.net/Articles/317154/](http://lwn.net/Articles/317154/)

------
tieTYT
One thing that seems difficult with bisect is this series of steps:

    
    
      1. "Hey... this bug didn't use to happen.  Where'd that get introduced?"
      2. OK, this unit test will tell me when the bug is fixed.
      3. Now let's automate bisect to tell me where this 
         test first failed even though I just wrote it.
    

How do you do that? I think as long as you DON'T commit your unit test, bisect
will carry it over to every commit it checks out. But you'll have to make sure
you commit all your other changes, or else they'll be carried around too:
conflict city.

Also, if you _do_ commit your unit test (a "bad" habit of mine) I have no idea
how I'm supposed to work with this. I end up copying the unit test by hand to
each commit it tries to test.

In summary, it seems like bisect was built with manual testing in mind. I know
you can automate running a shell script, but I don't write my automated tests
in shell script.

EDIT: Keep in mind, not all projects are interpreted. Some are compiled.

~~~
kazinator
I've successfully used "git bisect" in situations in which, as part of every
test, I had to "git stash pop" some local changes, then do the test, then
remember to "git stash save" them before continuing to bisect. (Alternatively,
"git apply" the stash, then "git reset --hard" to blow away the changes.)
That's how you can handle situations where test code has to be added to the
program to demonstrate the problem, and that test code doesn't exist in old
versions.
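
For anyone who would rather not do those steps by hand each round, here is a minimal sketch of the "apply the stash, test, reset" variant as a script you could hand to "git bisect run". It assumes the stashed test code only touches tracked files, and the test command (`make check` in the usage note) is a placeholder, not anything from the article. The `run` parameter only exists so the logic can be exercised without a real repository.

```python
import subprocess


def run_with_stash(test_cmd, run=subprocess.call):
    """Apply the stashed test code, run the test, then restore a clean tree.

    Returns an exit status for `git bisect run`:
    0 = good, 1 = bad, 125 = skip (the stash did not apply to this commit).
    """
    # "git stash apply" keeps the stash around, so it can be reused on the
    # next commit that bisect checks out.
    if run(["git", "stash", "apply"]) != 0:
        run(["git", "reset", "--hard"])  # drop any half-applied changes
        return 125
    try:
        status = run(test_cmd)
    finally:
        # Blow away the applied changes so bisect can check out the next commit.
        run(["git", "reset", "--hard"])
    return 0 if status == 0 else 1
```

To use it, the script would end with something like `sys.exit(run_with_stash(["make", "check"]))` and be started with `git bisect run python3 your_script.py`.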

~~~
tieTYT
Yeah, not a bad method. And I suppose if I've already committed my unit test, I
could use a "git reset --soft" to put the test in a stash. Next time I use
bisect, I'll try to do this. Thanks

------
amluto
A lot of the time spent mucking around with QEMU can be avoided by using a
tool like virtme:

[https://git.kernel.org/cgit/utils/kernel/virtme/virtme.git](https://git.kernel.org/cgit/utils/kernel/virtme/virtme.git)

------
CUViper
One tip for the module configuration: kbuild has "make localyesconfig" which
looks at your currently-loaded modules and converts them to CONFIG_FOO=y
builtins.

See also "make help" for more available targets.

------
Arnavion
It's a pity that while the bisect did point in the general direction of where
the bug was in 3.15, it didn't actually find the right bug (the one it found
had already been fixed and a different one introduced that led to the same
errant behavior).

With something as complex as the kernel, I also wonder about automatic bisect
being led astray by unrelated changes. The bisect script might fail for a
different reason and zone in on the wrong commit.

~~~
chriscool
The bisect pointed at the right patch series which is already very nice.

If the bisect script fails because of another problem, it is often possible to
refine the script, or run bisect again on a fixed up branch.

~~~
Arnavion
If I'm not mistaken, it didn't find the right patch series. The article says
the function changed in 3.15 and the underlying reason for his issue was
different and not part of that series.

~~~
seabee
When you update the kernel and your program stops working, there are two
possibilities:

1) it's a kernel bug; there is a regression in the kernel's documented
behaviour

2) it's a bug in your tool because it took advantage of undefined or
undocumented behaviour

The bisect found exactly the right patch series, but bisect can only answer
the question 'when did it stop working', not 'why'.

You can read about many instances of 2) on Raymond Chen's blog The Old New
Thing.

------
josephlord
Good article. It might be worth considering doing _git bisect ignore_ if a
build fails. At the moment I suspect a failed build would be treated as a git
bisect bad (I'm not sure; it might just halt), so it might give you the wrong
commit.

~~~
chriscool
You mean "git bisect skip". There is no "ignore" sub command.
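
With "git bisect run", the script signals a skip by exiting with status 125; exit 0 means good and any other status from 1 to 127 means bad. A rough sketch of a run script that skips commits that fail to build, where the build and test commands are placeholders and not the ones from the article:

```python
import subprocess

GOOD, BAD, SKIP = 0, 1, 125  # exit statuses understood by "git bisect run"


def classify(build_cmd, test_cmd, run=subprocess.call):
    """Return the bisect exit status for the currently checked-out commit."""
    # A commit that does not build cannot be tested, so ask bisect to skip it
    # instead of letting the build failure count as "bad".
    if run(build_cmd) != 0:
        return SKIP
    return GOOD if run(test_cmd) == 0 else BAD
```

The script would finish with something like `sys.exit(classify(["make"], ["./run-test"]))`, where both commands are hypothetical.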

~~~
josephlord
Yes, faulty memory.

------
incision
Another article on bisecting which I bookmarked last year, but with physical
hardware [1].

1: [http://rtg.in.ua/blog/kernel-bisect-pxe/](http://rtg.in.ua/blog/kernel-bisect-pxe/)

------
torrent-of-ions
> there were close to 15,000 commits, which seemed like a large space to
> search by hand.

Does the author not understand binary search? It would take about 14 tries to
find the failed commit.
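
The arithmetic, as a quick sketch that assumes a linear history where every commit can be built and tested (real bisects with merges or skipped commits can take a few more steps):

```python
import math


def bisect_steps(n_commits):
    """Number of halvings needed to narrow n_commits down to a single commit."""
    return math.ceil(math.log2(n_commits))


print(bisect_steps(15000))  # 14
```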

~~~
laichzeit0
I have no idea why people are downvoting you. Sigh.

A comment as to why you're wrong by the downvoter(s) would be infinitely more
helpful.

~~~
philh
I downvoted for tone, not because ve's wrong. I'm reasonably confident that
someone who can write this blog post understands binary search. I wouldn't
have downvoted a post that simply said, for example:

> Binary search over 15,000 commits would take about 14 tries to find the
> failed one.

although I don't think that's an important point to be making. (As erikb said,
that's still a lot of commits, and that was only the first run.)

------
snorkel
Interesting way to use git to find a bug by process of elimination.

------
djhoffma
This is part of why the Linux kernel needs a kernel debugger. I've worked with
other kernels where we can actually observe what is going on using tools
(DTrace on Illumos), and problems like this become something that can be
tackled directly, without having to go through all this nonsense.

~~~
CUViper
kgdb, kdb, systemtap, ktap, perf, ftrace, ...

~~~
stusmall
I've even seen a Visual Studio plugin to debug the Linux kernel. No joke.
Really looked to just be a layer over kgdb but still.

~~~
CUViper
I suppose that's sysprogs.com/VisualKernel/ ? I'm amazed that this exists, but
more power to them! :)

~~~
stusmall
That's the one. I'd be worried if there was more than one. Like you said, more
power to them. Pretty cool it exists and honestly wouldn't mind trying it
sometime.

------
callesgg
This was cool; I will definitely try to use the bisect command sometime.

------
mcardillo55
So... technically a kernel bug was fixed and revealed a bug in Docker.

