A Heavily-Commented Linux Kernel Source Code [pdf] (oldlinux.org)
739 points by turingbook 32 days ago | 40 comments

Of possible (mostly historical) interest at this point, a similar commentary on a fairly early version of the original Unix kernel, by an Australian cs prof named John Lions -- which was very widely circulated among CS students in the '80s, despite this being technically in violation of AT&T's copyright on the book.

It's available online here: http://www.lemis.com/grog/Documentation/Lions/index.php

Note that the code is written in a very archaic dialect of C, and for hardware that didn't support paging in any form (just swapping). Nevertheless, it was an important introduction for a lot of people at the time, not just to the basics of OS implementation details, but also, how to find your way around a nontrivial sized codebase.

"Lions' Commentary on UNIX 6th Edition, with Source Code by John Lions (1976) ...Despite its age, it is still considered an excellent commentary on simple but high quality code.

For many years, the Lions book was the only Unix kernel documentation available outside Bell Labs. Although the license of 6th Edition allowed classroom use of the source code, the license of 7th Edition specifically excluded such use, so the book spread through illegal copy machine reproductions (a kind of samizdat). It was commonly held to be the most copied book in computer science."[0]

I had a look a couple of years ago; it's a joy to read. It's about UNIX on the PDP-11. A 1977 version PDF is available here[1] - that page has a cover with an endorsement from Ken Thompson and a foreword by Dennis Ritchie, but that seems to be from a later edition. The end of Lions' preface is funny:

"The co-operation of the "nroff" program must also be mentioned. Without it, these notes could never have been produced in this form. However it has yielded some of its more enigmatic secrets so reluctantly that the author's gratitude is indeed mixed. Certainly "nroff" itself must provide a fertile field for future practitioners of the program documenter's art."

The Use of these notes section:

"These notes, which are intended to supplement the comments already present in the source code, are not essential for understanding the UNIX operating system. It is perfectly possible to proceed without them, and you should attempt to do so as long as you can.

The notes are a crutch, to aid you when the going becomes difficult. If you attempt to read each file or procedure on your own first, your initial progress is likely to be slower, but your ultimate progress much faster. Reading other people's programs is an art which should be learnt and practised because it is useful!"

The end of the Introduction:

"...on the whole you will find that the authors of UNIX, Ken Thompson and Dennis Ritchie, have created a program of great strength, integrity and effectiveness, which you should admire and seek to emulate."



There's also the great "The Design of the UNIX Operating System":


Which is probably less relevant today in terms of directly understanding the implementation. But an interesting and enlightening read. Things were much simpler and fundamental back in the 1980s. It's easier to understand that way. Then layer on top.

Slightly more recent: The Design and Implementation of the 4.4 BSD Operating System, which includes some of the Berkeley additions to the kernel, such as TCP/IP and sockets.

And then there's xv6 [1], a small Unix running on vx32 from MIT for teaching purposes, full of comments, and available as a booklet that is directly inspired by Lions' commentary on the 6th edition of Unix.

I actually agree, though: to really appreciate the classics you should start with (Maurice) Bach. :-P

[1] https://pdos.csail.mit.edu/6.828/2018/xv6.html

Correction: xv6 runs on QEMU, not vx32 (although Russ Cox co-authored both xv6 and vx32).

Hah, this is amazing! This reminds me of how I used to (and still do, sometimes) read third-party code.

For an OS class in college, we had to modify fork (and re-build the kernel) to track how many times a particular process had been forked (and probably some other statistics I'm forgetting at the moment).

I remember going through a very similar process for the first time - injecting white space below chunks of code, writing out my own comments, and then using that to figure out how to modify fork. Looking at the author's fork.c comments gave me a feeling of nostalgia.

The useful part of course is going through yourself and writing your own comments, but it can be really helpful to start with something like this (and then write your own version of the comments).

To give a sense of just how thorough this book is about providing all of the necessary background information and context: the chapter that actually matches the book title (Kernel Code) is chapter 8, and it starts on page 319.

Kernel version 0.12

1117 pages

11.1 MB

> The main goal of this book is to use a minimal amount of space or within a limited space to dissect the complete Linux kernel source code in order to obtain a full understanding of the basic functions and actual implementation of the operating system. To achieve a complete and profound understanding of the Linux kernel, a true understanding and introduction of the basic operating principles of the Linux operating system. This book's readership is positioned to know the general use of Linux systems or has a certain programming basis, but it lacks the basic knowledge to read the current new kernel code and is eager to understand the working principle and actual code of the UNIX operating system kernel as soon as possible. Realize the lovers.

1117 pages for millions of lines of code doesn't seem too bad, relatively speaking

Only ~20k lines. From the book's intro:

> The current Linux kernel source code amount is in the number of millions of lines, the 2.6.0 version of the kernel code line is about 5.92 million lines, and the 4.18.X version of the kernel code is extremely large, and it has exceeded 25 million lines! So it is almost impossible to fully annotate and elaborate on these kernels. The 0.12 version of the kernel does not exceed 20,000 lines of code, so it can be explained and commented clearly in a book

> The 0.12 version of the kernel does not exceed 20,000 lines of code

> The 4.18.X version of the kernel code is extremely large, and it has exceeded 25 million lines

Is most of this drivers, though? What's left once that's removed?

Glue code for the drivers :P

Ah my bad. I guess it's still relatively fair overall.

1 million lines of code in 1000 pages would come out to 500 lines of code per side of page. You'd need a big magnifying glass to get through that.
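As a rough check of that arithmetic, here is a small Python sketch. It assumes "1000 pages" means 1000 double-sided sheets, which is one reading of the comment rather than something stated in the thread. The function name `lines_per_side` is made up for illustration. The second figure uses the numbers quoted elsewhere in the thread (under 20,000 lines, 1117 pages) and is only an average, since much of the book is prose rather than code.

```python
def lines_per_side(total_lines, sheets, sides_per_sheet=2):
    """Average number of code lines that must fit on each printed side."""
    return total_lines / (sheets * sides_per_sheet)


# The hypothetical case from the comment: 1 million lines on 1000 sheets.
print(lines_per_side(1_000_000, 1000))  # 500.0

# The figures quoted from the book: at most 20,000 lines over 1117 pages,
# counting each page as a single side.
print(round(lines_per_side(20_000, 1117, sides_per_sheet=1), 1))  # 17.9
```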

The book covers an old version of Linux, which had fewer than 20,000 lines of code.

Chinese reader here. When I was in college about 11 or 12 years ago, a previous version of it was one of the textbooks for our Operating Systems course. Most of the assignments and homework involved adding or modifying modules in kernel 0.11.

The title of the book doesn't seem to do the content justice.

For sure. Looks like over 300 pages of background about how Linux works before it even gets to the source code!

Totally, this should be "a new manual to linux"

A new manual to old Linux.

Great work. In the preface the author states

> At present, people in China are already organizing human annotations to publish books similar to this article.

Maybe Chinese programmers will herald an increase in literate programming? Seems like a lot of effort could be saved in back-annotating by just starting the program as a literate one in the first place...

I think the prevalence of books of commented source code in China partly comes from the way that big Chinese companies interview people: they ask about a lot of implementation details (especially for DBs), even though in most cases that is useless in daily work (similar to Leetcode questions in US interviews). IMO, literate programming sounds more like software development in Japan, where engineers at big companies write a high-level specification, then the 1st outsourcing company models the class hierarchy, followed by the 2nd outsourcing company writing function declarations and comments, with everything eventually implemented by the 3rd outsourcing company.

Here's my favorite example of a literate program: http://www.pbr-book.org/ I don't see how such a thing could be constructed using that Japanese way. Though I'm not sure how any software could be constructed in that Japanese way. ;)

This is a really dumb observation, but my apartment is in the cover photo! (Vancouver BC, Canada)

Sorry, I got super excited!

Just went to Vancouver a few months ago and I immediately recognized the Aquabus and the background. I was right where the picture was taken, eating the market's beef jerky! Crazy small world.

True! I got so excited when I saw the cover photo.

Same. I got super excited too!

Many of my friends read the original Chinese version of it more than a decade ago. It's a dictionary-style book. Unfortunately, I never had the patience to read it.

As other commenters pointed out, this seems like an excellent piece! It's not often that a thousand-page book catches my attention and I then notice, after a while, that I've been reading the first few pages intensely and wanting more. And so far it has only been a few paragraphs about the people involved at the very beginning of Linux!

It seems heavy, but rewarding.

Super piece of work. A great way for people to see how Linux works up close. Thanks for sharing.

Unrelated: I am from Vancouver, BC and it is so coool to open a totally unrelated PDF and see a picture of your neighbourhood.

I just wanted to read a few pages and save the book for later, but I ended up reading for 10 hours straight, I could not stop. Amazing.

You can tell the author has put so much work into this, I'm really grateful that he has released this for everyone to read.

Can someone point me to the section of the book that explains exactly how the kernel manages sockets, binds ports, and keeps track of bound/allocated ports?

I don't think 0.12 (the Linux version covered in the book) had sockets.

Yep, it seems 0.98 was the first version to have experimental TCP/IP.

An easy way to do this is to figure out the related syscalls. In this case that would be open (the VFS side of sockets), accept, etc. Grep for those and you'll see the data structures and the code that uses them.
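For anyone without grep handy, the same search can be sketched in a few lines of Python. This is only a rough illustration: the function name `find_symbol_lines` and the example symbol names (`sys_socket`, `sys_bind`, `sys_accept`) are assumptions made for this sketch, and the actual entry-point names differ between kernel versions, so adjust the list to whatever the tree you are reading uses.

```python
import os
import re


def find_symbol_lines(root, symbols, extensions=(".c", ".h", ".s", ".S")):
    """Return (path, line_number, text) for each source line naming a symbol."""
    if not symbols:
        return []
    # Match any of the symbols as a whole word, like `grep -w`.
    pattern = re.compile(r"\b(" + "|".join(re.escape(s) for s in symbols) + r")\b")
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(extensions):
                continue  # skip anything that is not a source file
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if pattern.search(line):
                        hits.append((path, number, line.rstrip()))
    return hits
```

Calling `find_symbol_lines("path/to/kernel/source", ["sys_socket", "sys_bind", "sys_accept"])` returns the file, line number, and text of each match, which is a reasonable starting point for tracing the data structures involved.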

I'd really like to contribute to the kernel one day. Is this work worth reading for someone like me, or is it more for historical interest?

Appreciate the effort, but I'm sorry to say the writing is not great; I can barely understand some of the sentences.

I wish there were an epub version, so I could read it on an e-reader. PDFs aren't nice to read on mine.

If you can, try Calibre[0]. It can do the conversion and ebook management for you.

[0] https://calibre-ebook.com

This book looks amazing.
