
OK, so here's a question. I've always wanted to spend some time reading through the Linux kernel. I really enjoyed operating systems when I studied them (and reading through the MINIX source), but I don't really know where or how to start with Linux.

I mean, I guess I can just dive in, and maybe that's the best approach, but is there a strategy anyone would recommend for reading the Linux source in terms of it making sense as a combined unit of code (as opposed to a collection of algorithms, if that makes sense)?

I recently did exactly what you describe. My first contribution to the Linux kernel actually shipped with this release. If I recall correctly it took me about 80 hours to go from no knowledge to first patch. Assuming no knowledge, it is a two step process.

(1) Learn basic OS concepts through xv6.

xv6 is a reimplementation of an early version of Unix, designed to be as simple as possible and accompanied by a whole book of commentary. Get the book and the source code listing printed and bound: http://pdos.csail.mit.edu/6.828/2012/xv6.html. Work through the book and exercises. If you need extra material to understand it, use the 6.828 lecture videos from 2011: http://pdos.csail.mit.edu/6.828/2011/schedule.html.

(2) Pick a part of OSes that you are interested in. Contribute to that part in Linux.

Figure out where that part is in the Linux kernel. Find a bug in the bug tracker and submit a patch. I found filesystems interesting, so I fixed a small bug in one of the filesystems. Use a cross-referencer; it will save you a lot of time: http://lxr.free-electrons.com/source/include/linux/cpu.h. Also feel free to subscribe to the Linux kernel subreddit: http://www.reddit.com/r/kernel. I've set up the sidebar with a lot of useful links.

The Linux kernel is large and complex. You need to equip yourself with a mental model of an OS through xv6 and then pick one small, specific part to attack in Linux. Be tactical! Otherwise you will be overwhelmed.


As an aside, I'm actually currently working on a tool that parses the Linux source code to find symbol definitions and then works its way back through the Git history to find the commit message for when the symbol was first defined. These commit messages usually contain really useful information about the original intent of the symbol and implementation details. Currently fighting with a few bugs in my C grammar, but should be able to work through those soon. Please feel free to email me at tsally@atomicpeace.com if you want to be pinged when the tool is released.

Thanks for posting this. It was very helpful to me.

As an aside, I'm actually currently working on a tool that parses the Linux source code to find symbol definitions and then works its way back through the Git history to find the commit message for when the symbol was first defined.

Are you just doing something similar to cscope to find the definition of a symbol, then running git blame on that line? Or are you actually checking earlier revisions as well, to see if the symbol was moved or changed types?

I'm finding the definition of a symbol using a parsing expression grammar for C. I haven't yet decided upon the particular algorithm for finding the first commit, but I plan on using libgit2 to work directly with the repository. In the simplest case, you just iterate through all versions of a file and find the earliest one with the string in question. Obviously care is required when walking the repository history.

Do you know about "git log -S symbol"? It will show the first commit that defined the symbol at the bottom of the output.

Thanks for the pointer. I know I'll find the commit using libgit2, but I hadn't yet gotten to thinking about the algorithm to do so. git log -S seems like a good starting point.

I have literally no interest in doing any of that, but that's a great content-filled post. Thanks for doing so!

The most thorough treatment is Bovet & Cesati (944 pages): http://www.amazon.com/Understanding-Linux-Kernel-Third-Editi...

A good "gentle introduction" book is the Love book (440 pages): http://www.amazon.com/Linux-Kernel-Development-Robert-Love/d...

Isn't Bovet & Cesati a bit out of date now? I used to have a copy, but it was published in 2005, which means it was probably written in 2004.

Also, it's not kernel-specific, but this book covers a lot of system programming concepts (expensive though):


This is going to be unsatisfying, but the only thing that works for motivating me to understand a piece of kernel code has been wanting to change it and having to learn exactly how it works to achieve that.

If you just want to learn more about how the kernel fits together, reading http://lwn.net/Kernel/LDD3/ (Linux Device Drivers, freely downloadable) is a fine start.

This is the way the Linux kernel was introduced to us in Uni, and I really think it is the best way. Set a goal like "write a simple device driver", or (my choice) "implement a new concurrency primitive", and work towards that. Trying to do something like adding P/V system calls will teach you a lot about how the kernel operates; enough to give you a good starting place to find something new about it to learn.
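To make "P/V" concrete: P (wait) blocks until a counter is positive and then decrements it; V (signal) increments it and wakes a waiter. The sketch below shows those semantics in user space with POSIX threads, as a warm-up before doing it in the kernel. It is an assumption-laden illustration, not kernel code: the names `struct pv_semaphore`, `pv_init`, `pv_P`, `pv_V`, and `pv_destroy` are hypothetical, and inside the kernel you would build on kernel primitives (Linux already has its own `struct semaphore` with `down()` and `up()`) rather than pthreads.

```c
#include <pthread.h>
#include <stddef.h>

/* User-space counting semaphore with Dijkstra's P and V operations. */
struct pv_semaphore {
    pthread_mutex_t lock;    /* protects count */
    pthread_cond_t nonzero;  /* signalled when count becomes > 0 */
    unsigned int count;
};

static void pv_init(struct pv_semaphore *s, unsigned int initial)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
    s->count = initial;
}

/* P: block until count > 0, then take one unit. */
static void pv_P(struct pv_semaphore *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->count == 0)  /* loop, since condition waits can wake spuriously */
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->count--;
    pthread_mutex_unlock(&s->lock);
}

/* V: return one unit and wake one waiter, if any. */
static void pv_V(struct pv_semaphore *s)
{
    pthread_mutex_lock(&s->lock);
    s->count++;
    pthread_cond_signal(&s->nonzero);
    pthread_mutex_unlock(&s->lock);
}

static void pv_destroy(struct pv_semaphore *s)
{
    pthread_cond_destroy(&s->nonzero);
    pthread_mutex_destroy(&s->lock);
}
```

A semaphore initialized to 2 lets two `pv_P` calls through without blocking; a third blocks until someone calls `pv_V`. Turning this into system calls means deciding where the semaphore lives, how user space names it, and which kernel wait mechanism replaces the condition variable, which is where most of the learning happens.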

I found Robert Love's Linux Kernel Development to be quite useful.


I have this book and think it's great. It covers Linux 2.6, which happens to be the kernel version used in CentOS 6.2, which is what I use at work. What are the major differences between 2.6 and the 3.x line? Specifically, what change caused the numbering scheme to go from 2 to 3?

The jump to 3 was made arbitrarily after 2.6.39. Here's the announcement last summer from Linus: https://lwn.net/Articles/452531/

Basically, Linus just felt that the minor release numbers were getting too big. He decided not to call that release 2.6.40 and called it 3.0 instead.

I got sent to Red Hat training as part of my company's project to move from Solaris/HP-UX to Linux. It was government software, and the project got put on hold....

The instructor was excellent. https://www.redhat.com/training/courses/rhd361/course-exam-o...

Well, that's kinda expensive....

One thing I found a little different was that the OS has its own libraries for everything (string.h, etc.), which makes sense if you think about it.
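The reason it makes sense: the kernel is not linked against a user-space C library, so it carries its own copies of routines like `strlen` and `memset` (Linux keeps generic versions in `lib/string.c`, and some architectures override them with optimized ones). A minimal sketch of what such freestanding routines look like, with hypothetical names `kstrlen` and `kmemset` chosen so they don't clash with libc when compiled in user space:

```c
#include <stddef.h>

/* Count bytes up to (not including) the terminating NUL, using no libc. */
static size_t kstrlen(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Fill n bytes at dst with the byte value c, one byte at a time.
 * Real kernels may use faster architecture-specific versions. */
static void *kmemset(void *dst, int c, size_t n)
{
    unsigned char *d = dst;
    while (n--)
        *d++ = (unsigned char)c;
    return dst;
}
```

Only `<stddef.h>` is needed here, for `size_t`; that header comes from the compiler rather than the C library, which is why freestanding code like a kernel can still use it.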

If you want to browse the source code online, there's a tool called LXR (Linux Cross Reference). It has a good search tool and linked headers: clicking on a function name shows you where that function is used. You can also install it yourself, and I think that's much faster. http://lxr.linux.no/linux+v3.6/ http://rhkernel.org/

There is also Linux Weekly News, which was a decent site back when I was still in the Linux porting world. Like many Linux sites, it seems to lack style but makes up for it in content. http://lwn.net/

Start off by reading 1) Linux Kernel Development, 2) Linux Device Drivers, and 3) The Linux Kernel Module Programming Guide.

Keep Understanding the Linux Kernel as your reference manual.

By now, you should be comfortable reading and understanding the kernel source; download the Linux kernel source and start browsing through the code.

Simply reading books won't get you anywhere. You need to play around with the kernel source in order to understand the Linux kernel's behavior and the different problems you may come across. Write simple kernel modules to get the hang of how you can interact with and modify the kernel.

Join an open-source project and start fixing bugs you're comfortable with, or just play around with your local Linux kernel source: make changes, build, deploy, and observe what happens.


If you have no prior knowledge of OS theory and fundamentals, you should start here first and read either of the following books: 1) Operating System Concepts by Silberschatz and Galvin, or 2) Modern Operating Systems by Tanenbaum.

For the programming side (system calls and so on), read Advanced Programming in the UNIX Environment by Richard Stevens.

You may want to start with the earliest Linux kernels as they will be considerably smaller and simpler. You'll also be able to fit the whole kernel in your head.

From that point you can look at yearly diffs to find subsystems of interest that have changed. (Are four rewrites of USB interesting?)

Yeah I agree. The first versions of Linux were 10k LOC or less. Should be pretty readable.

One thing I just did was build the first version of git (just sync backwards in the git/git repo and type "make"). I ran the commands manually, and looked at the data file formats, and at the source code, and it greatly improved my understanding of git. It's all still relevant.
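As a taste of the data formats involved: git stores each object with a short header of the form `<type> <size>` followed by a NUL byte, and the object ID is a SHA-1 over that header plus the content. So a 12-byte blob such as `hello world\n` is stored as `blob 12\0hello world\n`. The sketch below only builds the header; `git_object_header` is a hypothetical helper, it reflects the layout used by current git, and details of the very first version may differ.

```c
#include <stddef.h>
#include <stdio.h>

/* Write "<type> <size>" plus a terminating NUL into buf. Git counts the
 * NUL as part of the header, so the returned length includes it.
 * Returns 0 if buf is too small to hold the whole header. */
static size_t git_object_header(char *buf, size_t bufsize,
                                const char *type, size_t content_len)
{
    int n = snprintf(buf, bufsize, "%s %zu", type, content_len);
    if (n < 0 || (size_t)n + 1 > bufsize)
        return 0;  /* output was truncated (or snprintf failed) */
    return (size_t)n + 1;  /* snprintf already wrote the NUL at buf[n] */
}
```

You can cross-check the format against a real repository with `git cat-file -t <id>` and `git cat-file -s <id>`, which report an object's type and size, the two fields in this header.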

The first version of git was shockingly small, like 500 LOC of plain C code or so, but it does a surprising amount of the core work. I also gained a lot of respect for Linus' coding style.

People have said that Linux itself has too many hands in it -- e.g. the system call interface is a huge mess. So I wasn't sure if I would like Linus' code, but I definitely do after reading it.

I think he doesn't care about consistency when merging, because git's interface is a huge mess, just like the Linux syscall interface. But his code is consistent and good for sure.

In case somebody looked for the first version of git, here it is: http://goo.gl/mKYnR

There's no need to post shortened URLs to HN, and in fact this is discouraged. If you post a very long URL, HN will elide the text beyond some limit so it fits better in the page.

Don't read the Linux source code unless you have to. Read xv6's source code instead. http://www.google.com/search?q=xv6
