
Kernel debugging for newbies - alambert
http://www.alexlambert.com/2017/12/18/kernel-debugging-for-newbies.html
======
bcantrill
For those of us coming from non-Linux systems (and I'm speaking for
illumos/SmartOS here), the effort required to debug the kernel displayed here
is galling. That this is so difficult reflects Torvalds' historic disposition
against kernel debuggers[1]:

    
    
      I don't like debuggers. Never have, probably never will. I use gdb all the
      time, but I tend to use it not as a debugger, but as a disassembler on
      steroids that you can program.
      
      None of the arguments for a kernel debugger has touched me in the least.
      And trust me, over the years I've heard quite a lot of them. In the end,
      they tend to boil down to basically:
     
      - it would be so much easier to do development, and we'd be able to add
        new things faster.
    
      And quite frankly, I don't care. I don't think kernel development should
      be "easy". I do not condone single-stepping through code to find the bug.
      I do not think that extra visibility into the system is necessarily a good
      thing.
    

To be fair, this was a long time ago (17 years ago!), but the experience
relayed here leaves one believing that Torvalds' historic attitude has cast a
long shadow.

And if it needs to be said, Torvalds' arguments are themselves deeply
confused; he has conflated single-step in situ debugging (which does in fact
suffer from limited utility in the context of an OS kernel) with debugging
writ large. So as he rejected in situ debugging, he also implicitly rejected
postmortem debugging and dynamic instrumentation -- both of which have proved
absolutely essential for kernel development. (Indeed, it is likely that DTrace
alone would have allowed the author to debug their problem, as its design
center is exactly the kind of non-fatal failure described.)

The tutorial will certainly save others pain, but that such pain still exists
at all in Linux is deeply unfortunate, and a vivid example of Linux not
representing anything close to the state-of-the-art in systems development.

[1] [https://lwn.net/2000/0914/a/lt-debugger.php3](https://lwn.net/2000/0914/a/lt-debugger.php3)

~~~
alambert
I agree! For Windows, getting symbols was trivial: you just pointed your
debugger to the publicly-available symbol server[1]. You can see this attitude
continuing in [2]: a beginner is warned away from using a debugger to
understand the kernel behavior.

[1] [https://msdn.microsoft.com/en-us/library/windows/desktop/ee4...](https://msdn.microsoft.com/en-us/library/windows/desktop/ee416588\(v=vs.85\).aspx#using_the_microsoft_symbol_server)

[2] [https://lists.kernelnewbies.org/pipermail/kernelnewbies/2016...](https://lists.kernelnewbies.org/pipermail/kernelnewbies/2016-August/016692.html)

~~~
bcantrill
Wow, that second example is very telling. For whatever it's worth, before we
started work on DTrace (ca. 2001), we prioritized a project to add not just
symbol information but also debugging information to production kernels. This
project -- the Compact C Type Format (CTF)[1][2] -- has proved essential many
times over in the years since (and is a major reason why we can consider
kernel debugging a part of the core system functionality). It is clear that a
project like CTF is unlikely to even be understood let alone prioritized by
those displaying such a dismissive attitude towards debugging.
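
To make the value of type information concrete, here is a toy sketch (not CTF itself, and not how any real debugger is implemented). It assumes a hand-written type table, `TOY_PROC`, and a made-up structure layout; both are hypothetical and exist only for illustration. The point is that once member offsets and sizes ship with the kernel, a postmortem tool can turn raw bytes from a dump back into named fields.

```python
import struct

# Hypothetical type table standing in for what type data would record about
# a C struct: member name -> (byte offset, struct format code).
# Roughly: struct toy_proc { uint32_t pid; uint32_t ppid; uint64_t start; };
TOY_PROC = {
    "pid": (0, "<I"),
    "ppid": (4, "<I"),
    "start": (8, "<Q"),
}


def decode(raw, base, type_table):
    """Interpret the bytes at raw[base:] as the structure in type_table."""
    return {
        name: struct.unpack_from(fmt, raw, base + offset)[0]
        for name, (offset, fmt) in type_table.items()
    }


# Fake "dump": 4 bytes of padding followed by one toy_proc record.
dump = b"\x00" * 4 + struct.pack("<IIQ", 1234, 1, 99)
proc = decode(dump, 4, TOY_PROC)
```

Without the table, the same 16 bytes are just an opaque blob, which is roughly the situation when a production kernel ships without type data.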

[1] [http://illumos.org/man/4/ctf](http://illumos.org/man/4/ctf)

[2]
[http://www.smnd.sk/lovasko/paper.pdf](http://www.smnd.sk/lovasko/paper.pdf)

~~~
harry8
Brian, I love your work but seriously. You've misspelled "read" as "understood"
there and to write it off as a dismissive attitude toward debugging in toto is
not something you should lower yourself to pretend to believe. Deep breath.
DTrace is awesome, Oracle is not.

Yeah my kissing may not be up to scratch either... ;-)

~~~
bcantrill
The attitude towards debugging _is_ dismissive -- and I'm not the only one to
have drawn that conclusion. Indeed, it's the original author of this article
who came across that and drew that inference -- one that I (obviously) share.
To flip this around: do you think that the work outlined in the original
article is work that should be expected of anyone wishing to debug the kernel?

~~~
harry8
I think there is room for reasonable people to disagree reasonably about
_most_ issues. Flip that around?

Linux is a long way from being the buggiest OS kernel I've ever used, how
about you? Perhaps they've found and fixed some bugs? Perhaps they've
prevented some from being written? Perhaps their approach is something that
can be disagreed with, even strongly so, on the grounds of being less than
optimal without suggesting that it has zero merit and by extension its
proponents are somehow to be considered with derision? The inference that
anyone hacking any OS kernel is too stupid to understand a differing idea is
probably not necessary and unlikely to be justified in my humble opinion. You
may of course reasonably disagree and maybe one of us did understand something
the other did not on the point? Anyway this is now dull.

But don't be less opinionated, that would be the wrong response!

~~~
Dylan16807
> Perhaps their approach is something that can be disagreed with, even
> strongly so, on the grounds of being less than optimal without suggesting
> that it has zero merit

If a yes or no attitude results in fewer bugfixes, all else equal, then that
attitude does have exactly zero merit.

------
lowleveldesign
Debugging a syscall or start of the process was a great way for me to learn
the system internals. I have some experience with Windows debugging and, after
reading the article, I find that configuring kernel debugging in Windows
is quite easy. And I really like the live kernel debugging feature, where you
either use WinDbg (which requires the debug boot flag) or simply run livekd [1]
to analyze the running system data (for instance ALPC connections, handles, or
loaded drivers data). Is there anything similar available in Linux? I plan to
learn Linux internals and would love to use the kernel debugger next to
reading the source code and books.

Tangential, but if anyone is interested in Windows debugging (including
kernel debugging), have a look at the Inside Windows Debugging book by Tarik
Soulami [2].

[1] [https://docs.microsoft.com/en-us/sysinternals/downloads/live...](https://docs.microsoft.com/en-us/sysinternals/downloads/livekd)

[2] [https://www.amazon.com/Inside-Windows-Debugging-Developer-Re...](https://www.amazon.com/Inside-Windows-Debugging-Developer-Reference/dp/0735662789)

------
Timothycquinn
Great to see this and I hope I never have to do this on Linux.

A few years back, when I was finally getting my personal dev box off of
Windows, I took a very close look at FreeBSD or a derivative. I hit a wall
with poor touchpad support, which made the laptop difficult to use. I tried very
hard to debug their touchpad kernel library but had no luck getting the
results I needed. Does anybody have links on how to do the same kind of remote
kernel debugging on FreeBSD as above, but using a physical box rather than a
VM?

I would still like to give that can another good kick'n :)

------
cuckcuckspruce
Two machines seems like overkill in this case - why not debug using User Mode
Linux[1]? Debugging with two machines makes perfect sense if you are debugging
a hardware driver, but not here.

[1] [http://opensourceforu.com/2010/09/user-mode-linux-setup-and-...](http://opensourceforu.com/2010/09/user-mode-linux-setup-and-debug/)

~~~
AstralStorm
Because UML cannot interact with most hardware, and that is where bugs happen.
Even VMs do not provide real hardware or let you bring down the real machine.

Instead, you can debug a crash using a trace and a kexec dump. You can also use
the quite fast ftrace infrastructure, which is much better than plain old printf.

There are also kprobes, gcov and oprofile.
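
For anyone who has not used ftrace, here is a rough sketch of driving the function tracer through its tracefs control files. It assumes tracefs is mounted at `/sys/kernel/tracing` (older setups use `/sys/kernel/debug/tracing`), that you are root, and that the kernel was built with the function tracer. The helper names `ftrace_plan` and `apply_plan` are made up for this example, and `vfs_read` is only an illustrative target that may not be traceable on every kernel.

```python
from pathlib import Path

# Assumed mount point; adjust if tracefs lives elsewhere on your system.
TRACEFS = Path("/sys/kernel/tracing")


def ftrace_plan(function_name):
    """Return the (control file, value) writes that trace one kernel function."""
    return [
        ("tracing_on", "0"),                   # pause tracing while reconfiguring
        ("set_ftrace_filter", function_name),  # restrict tracing to one function
        ("current_tracer", "function"),        # select the function tracer
        ("tracing_on", "1"),                   # resume tracing
    ]


def apply_plan(plan, root=TRACEFS):
    """Write each value to its control file under root (needs root on real tracefs)."""
    for filename, value in plan:
        (Path(root) / filename).write_text(value + "\n")
```

On a real system you would call `apply_plan(ftrace_plan("vfs_read"))`, reproduce the problem, and then read the `trace` file in the same directory. Keeping the plan separate from the writes makes it easy to inspect what will be changed before touching a live kernel.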

------
chowyuncat
Network debugging over VMware to a 2.6 kernel (e.g. RHEL 6) will not work much
of the time; a virtual serial port must be used instead.

------
CalChris
The BLIT had a debugger which was named _joff_. Unix had _printf_.

