
Linux kernel exceeds 15 million lines of code - gaoprea
http://www.h-online.com/open/news/item/Linux-kernel-exceeds-15-million-lines-of-code-1409952.html
======
karterk
Can someone explain how such a HUGE codebase is maintained and how new people
get acquainted with it? I understand the distributed nature of development,
but still, this seems like LOTS of code, especially for someone new who wants
to start contributing.

~~~
exDM69
Seven easy steps to get acquainted with the Linux kernel:

Step 1: Build GNU binutils, gcc and gdb from source for the correct target
architecture

Step 2: Set up qemu or some other kind of virtualization, or get a gadget that
can run Linux, so that you can run and debug the kernel without repeated
rebooting. (In the early days, Linus actually used to reboot to test the
kernel, booting from floppy disks. I wouldn't have the patience.)

Step 3: Configure and build the Linux kernel from the git master branch

Step 4: Make an initrd root filesystem, e.g. uClibc + busybox.

Step 5: If applicable, install a bootloader. Boot to your new kernel.

Step 6: Hack on the source and/or the compile-time or runtime kernel
configuration

Step 7: Start over from the beginning but do something differently.

Steps 1-5 take about one Saturday afternoon. The faster your CPU, the less
waiting there will be.
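
For what it's worth, steps 3 to 5 boil down to a handful of commands. A rough
Python sketch that only assembles them (it runs nothing), assuming a native
x86_64 build with the default config. The function names and the file names
are placeholders; `initrd.img` stands for whatever you built in step 4:

```python
import os
import shlex


def kernel_build_commands(jobs=None):
    """Return the commands that configure and build a kernel tree (step 3)."""
    jobs = jobs or os.cpu_count() or 1
    return [
        ["make", "defconfig"],   # assumption: start from the default config
        ["make", f"-j{jobs}"],   # parallel build; more cores, less waiting
    ]


def qemu_boot_command(kernel="arch/x86/boot/bzImage", initrd="initrd.img"):
    """Return a qemu command line that boots the kernel with an initrd."""
    return [
        "qemu-system-x86_64",
        "-kernel", kernel,            # qemu loads the image directly
        "-initrd", initrd,            # placeholder name for the step 4 initrd
        "-append", "console=ttyS0",   # kernel command line: log to serial
        "-nographic",                 # serial console goes to this terminal
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them by hand.
    for cmd in kernel_build_commands() + [qemu_boot_command()]:
        print(shlex.join(cmd))
```

Because qemu loads the kernel image itself via `-kernel`, the bootloader part
of step 5 can be skipped while you are only testing in the emulator.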

You have to start somewhere, so pick an area of interest and work your way
around from there. Don't start at the beginning of the boot code; that's
arcane magic that's not very useful for learning.

There are several books written about the Linux kernel, so you might want to
pick one of those. Of course, printed material about the kernel gets old fast
but the basic foundations have been pretty solid.

~~~
4ad
> in the early days, Linus actually used to reboot to test the kernel, booting
> from floppy disks. I wouldn't have the patience.

Still required all the time if you develop drivers for physical hardware.

Still required at some point if you develop other kernel-mode components,
because some bugs are reproducible only on physical hardware: different
timings, cache coherency bugs (rare, but more common than you'd think), or
broken ACPI routing tables (_every_ motherboard in existence has a broken
ACPI table and the kernel has to deal with it).

~~~
exDM69
> Still required all the time if you develop drivers for physical hardware.

Thankfully, this is not quite true. Most Linux drivers are implemented as
kernel modules, which can be loaded and unloaded at runtime. No rebooting
necessary.

The guys who develop drivers that cannot be implemented as a kernel module
(like those for some motherboard chipsets) probably have another machine for
debugging or have some kind of dev boards to make their workflow more
efficient.

~~~
4ad
Thanks for telling me how to do my job, I'm a kernel developer :-).

A bug in kernel code means you panic the machine, so you _have_ to do a
reboot.

Whether you load the modules at runtime, on demand, or the driver is
statically compiled into the kernel has _nothing_ to do with it. The decision
of when to do linking does not influence development a bit.

Also, having multiple machines is _required_, not optional, because you use
the dev machine to connect with a kernel debugger through a serial/firewire
link to the test machine.

------
igniteflow
Non-code lines shouldn't be included as they don't get compiled; otherwise you
are just penalising the project for its documentation.

~~~
jpitz
Sorry, but I absolutely, completely disagree.

Why? Documentation has to be maintained.

The LOC metric isn't about the cost to the computer to execute the code. What
it IS about is the cost to the developers to create and maintain it.
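
To make the distinction concrete, here is a rough sketch of how one might
split C source into code, comment-only and blank lines. It is a simplified
classifier (the function name is made up, and comment markers inside string
literals are not handled), not the way the article's figure was produced:

```python
def classify_c_lines(source):
    """Count blank, comment-only and code lines in C source text."""
    counts = {"blank": 0, "comment": 0, "code": 0}
    in_block = False  # True while we are inside a /* ... */ comment
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            counts["blank"] += 1
            continue
        has_code = False
        i = 0
        while i < len(line):
            if in_block:
                end = line.find("*/", i)
                if end == -1:
                    break              # comment continues on the next line
                in_block = False
                i = end + 2
            elif line.startswith("//", i):
                break                  # rest of the line is a comment
            elif line.startswith("/*", i):
                in_block = True
                i += 2
            else:
                if not line[i].isspace():
                    has_code = True    # a real token outside any comment
                i += 1
        counts["code" if has_code else "comment"] += 1
    return counts
```

A line with both code and a trailing comment counts as code here, so the
"comment" bucket only holds lines that would vanish entirely if the
documentation were stripped.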

------
gizzlon
Would be interesting to see comparable numbers for other operating systems.

~~~
ch0wn
There are some numbers available on Wikipedia[0]. Comparing with Windows is
especially difficult though, because there is no differentiation between the
kernel and graphical systems or system tools.

[0] <http://en.wikipedia.org/wiki/Source_lines_of_code#Example>

~~~
4ad
I've worked on Windows. I never really did a thorough analysis, but I'd say
kernel mode components total around 3MLOC (out of ~50MLOC total). Definitely
smaller than Linux.

~~~
__alexs
Presumably there aren't all that many device drivers in there though? I'd
imagine those mostly live off in a WHQL repo or something in a different
department.

~~~
4ad
Drivers made by Microsoft are in the tree. WHQL'ed 3rd party drivers offered
by Windows Update are somewhere else, but there's no source for those so it's
hard to count lines of code.

On a typical computer more than half of the drivers loaded are made by
Microsoft; you can check easily with Sysinternals' Autoruns.

~~~
rplnt
And they are in the kernel? What are the dozens of .sys files in the drivers
folder, then?

~~~
Someone
There are at least three different definitions of "in the kernel":

- code that compiles to the single executable file that runs the OS (this can
be limited to a mechanism for booting the system, a thread scheduler, and a
basic memory allocator, but it may also contain some device drivers, file
systems, etc.)

- any code that compiles to anything that will run in kernel mode (many file
systems and device drivers)

- the code that someone (say, the people distributing what they call "the
Linux kernel") calls the kernel.

This count is about the third. I do not think it contains a kitchen sink, but
it is close :-)

------
ComputerGuru
So freaking what?

EDIT

It's not a valid metric. 15 million can be a lot or a little. It's all
relative. This is 15 million LoC of plain C code. This is code that includes
thousands of device-specific routines, but even if it didn't, what are you
comparing it to?

Let's say it is too long even compared to other C kernels w/ device drivers.
What price are you putting on well-written code? Do you know how many of those
lines are hints to the compiler? Those don't even get compiled.

------
monochromatic
Pretty arbitrary benchmark. Reminds me of this.

<http://xkcd.com/1000/>

------
helmut_hed
Has anyone out there done refactoring experiments? Perhaps as an academic
exercise? I don't know the kernel well enough to say for sure but I do recall
a fair bit of duplicated code in e.g. the drivers. It would be interesting to
see what could be done there.

------
jiggy2011
15 MLOC?

That seems like a lot; this must include a lot of things that can optionally
be compiled into the kernel?

Last time I browsed the kernel source tree it looked like a few 100KLOC to me.

------
Craiggybear
Like the article says, something like 75% of all of that is driver-specific or
related to filesystems. You can still compile a minute little kernel for
embedding that will be a fraction of the size of the full kernel. And there's
still quite a lot of old cruft in there that _could_ be left out.
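
A figure like that can be checked against a local checkout. A minimal sketch,
assuming only .c and .h files are counted and that the top-level directory
names (`drivers`, `fs`) match your tree; the function names are made up for
illustration:

```python
import os
from collections import Counter


def lines_per_top_dir(root, extensions=(".c", ".h")):
    """Count lines in matching files, grouped by top-level directory."""
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        # Files directly under root are grouped under "."
        top = "." if rel == "." else rel.split(os.sep)[0]
        for name in filenames:
            if name.endswith(extensions):
                with open(os.path.join(dirpath, name), "rb") as fh:
                    totals[top] += sum(1 for _ in fh)
    return totals


def share(totals, dirs):
    """Return the fraction of all counted lines found in the given dirs."""
    total = sum(totals.values())
    return sum(totals[d] for d in dirs) / total if total else 0.0
```

With a kernel tree checked out at a hypothetical path `linux`, calling
`share(lines_per_top_dir("linux"), ["drivers", "fs"])` gives the fraction to
compare against the 75% quoted above. It counts every line, comments and
blanks included.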

Even so, if true, it does seem to be getting quite bloaty. Still, it has to
cover a lot of ground nowadays in terms of hardware.

~~~
oakgrove
"75%...drivers...filesystems...it does seem to be getting quite bloaty"

Does the code that supports those drivers and filesystems slow the overall
system down in any way? If not, how can you call it "bloat"? Bloat in this
context generally means additional code that is out of proportion to its
functionality while also using a disproportionate amount of memory and
disk space. There is legitimate bloat in the Linux kernel, but suggesting that
it has anything to do with the variety of hardware support is to
misunderstand the meaning of the word. Oh yeah, YMMV.

~~~
Craiggybear
Look ... 95% could be pure Assembler. OK?

No, it's largely irrelevant nowadays. But there's a pile of cruft in there that
could go ...

I'm not critiquing ANYTHING about the kernel. BUT. There IS a HUGE pile of
crap _still in there_ and everyone knows it. Don't nitpick my innocent and
uncritical comment and try to turn it into a major critique of the kernel. It's
still WAAAAAAYYYYY superior to the shite that still exists in Windows (because
it simply MUST).

NO. Of COURSE it doesn't slow down the Linux experience but it might one day
become too large to be realistically distributed without a lot of trimming.

I've been a major user, developer and proponent of Linux for decades. I use
nothing else in my day to day life. Don't dare to presume to pull me up
because I dare to comment on a seriously discussed and current problem with
the current kernel. NAMELY: IT'S BECOMING A FAT BITCH AND EVERYONE KNOWS IT.

It's still and always will be my OS of choice, whether or not I knew how to
exclude certain chunks of irrelevance from my build.

