I mean, I guess I can just dive in, and maybe that's the best approach, but is there a strategy anyone would recommend for reading the Linux source so that it makes sense as a combined unit of code (as opposed to a collection of algorithms, if that makes sense)?
(1) Learn basic OS concepts through xv6.
xv6 is a reimplementation of an early version of Unix, designed to be as simple as possible and accompanied by a whole book of commentary. Get the book and the source code listing printed and bound: http://pdos.csail.mit.edu/6.828/2012/xv6.html. Work through the book and exercises. Use the lecture videos from the 2011 run of 6.828 if you need extra material to understand it: http://pdos.csail.mit.edu/6.828/2011/schedule.html.
(2) Pick a part of OSes that you are interested in. Contribute to that part in Linux.
Figure out where that part lives in the Linux kernel. Find a bug in the bug tracker and submit a patch. I found filesystems interesting, so I fixed a small bug in one of the filesystems. Use a cross-reference; it will save you a lot of time: http://lxr.free-electrons.com/source/include/linux/cpu.h. Also feel free to subscribe to the Linux kernel subreddit: http://www.reddit.com/r/kernel. I've set up the sidebar with a lot of useful links.
The Linux kernel is large and complex. You need to equip yourself with a mental model of an OS through xv6 and then pick one small, specific part to attack in Linux. Be tactical! Otherwise you will be overwhelmed.
As an aside, I'm actually currently working on a tool that parses the Linux source code to find symbol definitions and then works its way back through the Git history to find the commit message for when the symbol was first defined. These commit messages usually contain really useful information about the original intent of the symbol and implementation details. Currently fighting with a few bugs in my C grammar, but should be able to work through those soon. Please feel free to email me at firstname.lastname@example.org if you want to be pinged when the tool is released.
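For the history-walking half, the building block is not far from git's built-in pickaxe search; here's a rough sketch of that shortcut (it just shells out to git, which is not the parser-based approach the real tool takes - take it as an approximation of the idea):

    /* first_def.c - rough sketch: find the oldest commit whose diff touches a
     * symbol, using git's pickaxe (-S) search.  Run it inside a kernel checkout.
     * Build: cc first_def.c -o first_def */
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        if (argc < 2) {
            fprintf(stderr, "usage: %s <symbol> [path]\n", argv[0]);
            return 1;
        }

        char cmd[512];
        /* --reverse lists matching commits oldest-first, so the first line of
         * output is (approximately) the commit that introduced the symbol. */
        snprintf(cmd, sizeof(cmd),
                 "git log --reverse --format='%%H %%s' -S'%s' -- %s",
                 argv[1], argc > 2 ? argv[2] : ".");

        FILE *p = popen(cmd, "r");
        if (!p) {
            perror("popen");
            return 1;
        }

        char line[1024];
        if (fgets(line, sizeof(line), p))
            printf("first commit mentioning %s:\n%s", argv[1], line);

        pclose(p);
        return 0;
    }

From the printed SHA you can `git show` the commit to read the full message with the original intent.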
Are you just doing something similar to cscope to find the definition of a symbol, then running git blame on that line? Or are you actually checking earlier revisions as well, to see if the symbol was moved or changed types?
A good "gentle introduction" book is the Love book (440 pages): http://www.amazon.com/Linux-Kernel-Development-Robert-Love/d...
Also, it's not kernel-specific, but this book covers a lot of system programming concepts (expensive though):
If you just want to learn more about how the kernel fits together, reading http://lwn.net/Kernel/LDD3/ (Linux Device Drivers, freely downloadable) is a fine start.
The instructor was excellent.
Well, that's kinda expensive...
One thing I found a little different was that the kernel has its own implementations of the standard libraries (string.h, etc.), which makes sense if you think about it.
If you want to browse the source code online, there's a site called LXR (Linux Cross Reference). It's got a good search tool and linked headers; clicking on a function name shows you where that function is used.
You can also install it yourself, and I think it's much faster that way.
There's Linux Weekly News too, which was a decent site back when I was still in the Linux porting world. Like many Linux sites, it seems to lack in style, but makes up for it in content.
Have Understanding the Linux Kernel as your reference manual.
By now, you should be comfortable reading and understanding kernel code; download the Linux kernel source and start browsing through it.
Simply reading books won't get you anywhere - you need to play around with the kernel source in order to understand the kernel's behavior and the different problems you may come across. Write simple kernel modules to get the hang of how you can interact with and modify the kernel.
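The classic hello-world module is enough to see the whole edit/build/load/unload cycle. A minimal sketch (it assumes your distro's kernel headers are installed so the build directory under /lib/modules exists):

    /* hello.c - a minimal module: load it, check dmesg, unload it */
    #include <linux/module.h>
    #include <linux/init.h>
    #include <linux/kernel.h>

    static int __init hello_init(void)
    {
            printk(KERN_INFO "hello: module loaded\n");
            return 0;
    }

    static void __exit hello_exit(void)
    {
            printk(KERN_INFO "hello: module unloaded\n");
    }

    module_init(hello_init);
    module_exit(hello_exit);
    MODULE_LICENSE("GPL");

    # Makefile (the command line must start with a tab)
    obj-m += hello.o
    all:
            make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

Then `make`, `sudo insmod hello.ko`, check `dmesg | tail`, and `sudo rmmod hello`. Once that works, start adding things: a /proc entry, a parameter, whatever part of the kernel you want to poke at.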
Join some open-source project and start fixing bugs you're comfortable with, or just play around with your local Linux kernel source: make changes, build, deploy, and observe what happens.
If you have no prior knowledge of OS theory and fundamentals, then you should start there first - read either of the following books:
1) Operating System Concepts by Silberschatz and Galvin, or 2) Modern Operating Systems by Tanenbaum.
For the programming side - system calls and so on - read Advanced Programming in the UNIX Environment by W. Richard Stevens.
From that point you can look at yearly diffs to find subsystems of interest that have changed. (Are four rewrites of USB interesting?)
One thing I just did was build the first version of git (just sync backwards in the git/git repo and type "make"). I ran the commands manually, and looked at the data file formats, and at the source code, and it greatly improved my understanding of git. It's all still relevant.
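If you want to poke at the data files yourself: a loose object is still just a zlib-deflated "<type> <size>\0<payload>" file on disk, same as in that first version. A rough sketch for dumping one (link with -lz; it only handles small objects, which is all a sketch needs):

    /* dump_obj.c - inflate a loose git object and print its header and payload.
     * Build: cc dump_obj.c -lz -o dump_obj
     * Usage: ./dump_obj .git/objects/xx/yyyy...   (path to a loose object) */
    #include <stdio.h>
    #include <string.h>
    #include <zlib.h>

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <loose-object-file>\n", argv[0]);
            return 1;
        }

        FILE *f = fopen(argv[1], "rb");
        if (!f) { perror("fopen"); return 1; }

        static unsigned char in[1 << 16], out[1 << 20];
        size_t n = fread(in, 1, sizeof(in), f);
        fclose(f);

        z_stream zs = {0};
        zs.next_in = in;
        zs.avail_in = n;
        zs.next_out = out;
        zs.avail_out = sizeof(out) - 1;
        if (inflateInit(&zs) != Z_OK) return 1;
        int ret = inflate(&zs, Z_FINISH);
        inflateEnd(&zs);
        if (ret != Z_STREAM_END) {
            fprintf(stderr, "inflate failed (object too big for this sketch?)\n");
            return 1;
        }
        out[zs.total_out] = '\0';

        /* Header is "<type> <size>" followed by a NUL, then the payload. */
        printf("header: %s\n", out);
        printf("payload:\n%s\n", out + strlen((char *)out) + 1);
        return 0;
    }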
The first version of git was shockingly small, like 500 LOC of plain C code or so, but it does a surprising amount of the core work. I also gained a lot of respect for Linus' coding style.
People have said that Linux itself has too many hands in it -- e.g. the system call interface is a huge mess. So I wasn't sure if I would like Linus' code, but I definitely do after reading it.
I think he doesn't care about consistency when merging, because git's interface is a huge mess, just like the Linux syscall interface. But his code is consistent and good for sure.
The main reasons that I want it on my desktop are:
* easy backups
* maybe it will become integrated with the package manager (I'm on Ubuntu) so that I can roll back package updates if they don't work properly
It is already integrated with the package manager on Ubuntu. I just upgraded my laptop to Quantal, and part of the upgrade process spotted that I was using btrfs and created a snapshot. (It was a complete waste of time, as I already have hourly, daily, weekly and monthly snapshots created automatically.) Doing regular package updates is very slow with btrfs; this is due to fsync() being called several times per file.
You'll also want Quantal for an updated btrfs-tools package; for example, it lets you do scrubbing. You also need it (and kernel 3.3+) to change RAID levels after filesystem creation. The Ubuntu 12.04 installer can create and install to btrfs, but it didn't let you configure things like RAID level, so you were stuck with whatever it did.
I've been running all my systems on btrfs for several months now on Ubuntu 12.04 (kernel 3.2), on both SSD and HDD, including RAID 0, RAID 1, dmcrypt/LUKS, in-place conversion from ext4, and who knows what else. (Across my server, workstation, laptop and HTPC.)
The only problem I have had is when filesystems fill up. I've never lost data, but it can be quite frustrating trying to find files to delete (they also need to be removed from snapshots), rebalancing, etc., to get things running smoothly again. The various ways of doing df are mostly a work of fiction.
My data is duplicated to offline backups, Dropbox, multiple systems and git/hg repositories so the failure of any system would be annoying but I'd never lose anything. You should make sure you are in that position first, independent of what filesystems are being used.
You mean that fsync()s are more expensive on btrfs, or that Ubuntu calls them more often when using btrfs for some reason?
I assume btrfs uses a log, right?
There is a log to help with fsyncs. http://en.wikipedia.org/wiki/Btrfs
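For reference, the pattern being discussed is roughly this: one fsync() per unpacked file, each of which forces a btrfs log-tree commit, so the cost scales with the number of files in the package. A toy sketch of the pattern, not dpkg's actual code:

    /* fsync_each.c - the write-then-fsync-per-file pattern a package manager
     * uses when unpacking; on 3.x-era btrfs each fsync() is noticeably more
     * expensive than on ext4, which is why installs crawl. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    static int write_file_durably(const char *path, const char *data, size_t len)
    {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) return -1;
        if (write(fd, data, len) != (ssize_t)len) { close(fd); return -1; }
        /* One fsync per unpacked file - this is the slow part on btrfs. */
        if (fsync(fd) < 0) { close(fd); return -1; }
        return close(fd);
    }

    int main(void)
    {
        char name[64];
        for (int i = 0; i < 100; i++) {   /* pretend these are package files */
            snprintf(name, sizeof(name), "file%03d.txt", i);
            if (write_file_durably(name, "payload\n", 8) < 0) {
                perror(name);
                return 1;
            }
        }
        return 0;
    }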
Note that btrfs will be just fine after an unexpected power outage - the filesystem will just contain some random combination of the old and new files affected by the installation.
Also, please note that btrfs snapshots != backups. They will not save you in case of device failure. Checksums also won't help you if there are kernel bugs.
To be honest, what do I need btrfs for? I don't (and likely wouldn't) use all of its advanced features anyway.
In all the benchmarks ext4 seems to be a reasonable choice in terms of speed and stability.
Plus, copy-on-write should be pretty cool (with cp --reflink, a copy becomes as fast as mv), though I could see it letting me slip into some suboptimal habits - copying tons of stuff and counting on CoW to avoid actual duplication.
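The plumbing behind that is a single clone ioctl rather than any data copying. A minimal sketch, assuming your kernel headers expose FICLONE in <linux/fs.h> (older btrfs-specific headers call the same ioctl BTRFS_IOC_CLONE):

    /* reflink.c - "copy" src to dst with a single CoW clone ioctl on btrfs.
     * Build: cc reflink.c -o reflink;  run: ./reflink src dst
     * No data is copied; the two files share blocks until one is modified. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/fs.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
            return 1;
        }

        int src = open(argv[1], O_RDONLY);
        int dst = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (src < 0 || dst < 0) { perror("open"); return 1; }

        /* The whole "copy" is one metadata operation, like cp --reflink. */
        if (ioctl(dst, FICLONE, src) < 0) {
            perror("FICLONE");  /* e.g. EOPNOTSUPP on ext4, EXDEV across filesystems */
            return 1;
        }

        close(src);
        close(dst);
        return 0;
    }

It fails cleanly on filesystems without reflink support, which is also why plain cp can't just do this everywhere by default.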
OTOH, I can just leave KMS off and have no 2D or 3D acceleration.
Adding bufferbloat pieces into the main release is also good news (although that probably won't hit the servers that really need it for many more months...)
Maybe a combination of these two approaches could be useful.