
Kernel 101 – Let’s write a Kernel - slashdotaccount
http://arjunsreedharan.org/post/82710718100/kernel-101-lets-write-a-kernel
======
akkartik
I got this running on qemu by cannibalizing a tiny bit of code from xv6
([http://pdos.csail.mit.edu/6.828/2012/xv6.html](http://pdos.csail.mit.edu/6.828/2012/xv6.html))
to replace the GRUB dependency. After cloning and building mkernel according
to its instructions, clone and build xv6:

    
    
      $ git clone git://pdos.csail.mit.edu/xv6/xv6.git
      $ cd xv6
      $ make
    

Now you should be able to run xv6 by itself:

    
    
      $ path-to-qemu/x86_64-softmmu/qemu-system-x86_64 -serial mon:stdio -hdb fs.img xv6.img -m 512
    

To run mkernel on qemu, we'll replace xv6's kernel with mkernel's:

    
    
      $ dd if=/dev/zero of=mkernel.img count=10000
      $ dd if=bootblock of=mkernel.img conv=notrunc
      $ dd if=../mkernel/kernel of=mkernel.img seek=1 conv=notrunc
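
For readers who want to see what those three dd commands do to the image, here is a minimal sketch in C. It assumes dd's default 512-byte block size; `build_image` and its parameters are hypothetical names for illustration, not part of xv6 or mkernel.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512u   /* dd's default block size */

/* Hypothetical helper: lay out a boot image in `img` the way the dd
 * commands do. Returns 0 on success, -1 if the pieces do not fit. */
int build_image(uint8_t *img, size_t img_sectors,
                const uint8_t *boot, size_t boot_len,
                const uint8_t *kernel, size_t kernel_len)
{
    if (img_sectors < 1)
        return -1;

    size_t img_len = img_sectors * SECTOR_SIZE;

    if (boot_len > SECTOR_SIZE || kernel_len > img_len - SECTOR_SIZE)
        return -1;

    memset(img, 0, img_len);                        /* dd if=/dev/zero count=N */
    memcpy(img, boot, boot_len);                    /* bootblock at sector 0   */
    memcpy(img + SECTOR_SIZE, kernel, kernel_len);  /* kernel at seek=1        */
    return 0;
}
```

The boot sector occupies the first 512 bytes and the kernel starts immediately after it, which is where the xv6 boot block expects to find it.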
    

Now you can boot up the _mkernel.img_ rather than _xv6.img_ :

    
    
      $ path-to-qemu/x86_64-softmmu/qemu-system-x86_64 -serial mon:stdio -hdb fs.img mkernel.img -m 512
    

(Based on xv6 at hash ff2783442ea2801a4bf6c76f198f36a6e985e7dd and mkernel at
hash 42fd4c83fe47933b3e0d1b54f761a323f8350904. Ping me if you have questions;
email in profile.)

~~~
exDM69
> I got this running on qemu by cannibalizing a tiny bit of code from xv6 to
> replace the GRUB dependency.

This thing doesn't depend on GRUB, per se. It requires a multiboot protocol
compliant bootloader, and QEMU and Bochs emulator have one built-in.

All you need to do is:

    
    
        qemu-system-i386 -kernel kernel
    

This was mentioned in the last lines of the OP, perhaps they were added after
you read it.

~~~
akkartik
Ah. Thanks! Yeah, I'd emailed the author with my comment.

------
grahamedgecombe
There's a slight problem in this tutorial in that it assumes ESP (the stack
pointer) will be defined by the boot loader to point to an appropriate
location for the stack. However, the Multiboot standard states that ESP is
undefined [1], and that the OS should set up its own stack as soon as it needs
one (here the CALL instruction uses the stack, and the compiled C code may
well too).

An easy way to solve this is to reserve some bytes in the .bss section of the
executable for the stack by adding a new section in the assembly file:

    
    
      [section .bss align=16]
        resb 8192
        stack_end:
    

Then before you make use of the stack (between `cli` and `call kmain` would be
appropriate in this case), you need to set the stack pointer:

    
    
      mov esp, stack_end
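
The same reservation can be sketched from the C side. This is only an illustration, assuming GCC or Clang for the `aligned` attribute; `kernel_stack` and `kernel_stack_top` are made-up names, and loading ESP itself still has to happen in assembly before any C code runs.

```c
#include <stdint.h>

#define KERNEL_STACK_SIZE 8192u

/* Zero-initialized static storage lands in .bss, like `resb 8192`. */
static uint8_t kernel_stack[KERNEL_STACK_SIZE] __attribute__((aligned(16)));

/* x86 stacks grow downward, so ESP must start at the END of the buffer,
 * which is what the `stack_end` label marks in the assembly version. */
uint8_t *kernel_stack_top(void)
{
    return kernel_stack + KERNEL_STACK_SIZE;
}
```

Because the buffer is 16-byte aligned and its size is a multiple of 16, the top of the stack is 16-byte aligned as well.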
    

[1]:
[https://www.gnu.org/software/grub/manual/multiboot/multiboot...](https://www.gnu.org/software/grub/manual/multiboot/multiboot.html#Machine-state)

------
exDM69
This seems like a good start but it's worth noting that it isn't guaranteed to
work correctly.

The problem is that the control is passed to C code with no stack space set
up. It works out of pure luck because the compiler has decided to keep all
variables in registers and changing your C compiler flags might make this fail
in interesting ways.

Another thing that is missing is clearing the .BSS section before passing
control to the C code. It's not used at the moment, though.

I am pointing these things out because in my own Ring-0 programming projects I
spent a lot of time debugging some failures related to stack space and an
uninitialized BSS section.

~~~
grahamedgecombe
> Another thing that is missing is clearing the .BSS section before passing
> control to the C code. It's not used at the moment, though.

The Multiboot standard says that the boot loader will clear the .bss section
for you - in section 3.1.3: "bss_end_addr Contains the physical address of the
end of the bss segment. The boot loader initializes this area to zero"

I don't know what GRUB does if you rely on the fact that it can parse ELF files
and don't specify the fields like bss_end_addr though. I'm fairly sure it
clears it in this case too, but I'm using Multiboot 2 for my OS so the
behaviour could be different.

~~~
exDM69
> The Multiboot standard says that the boot loader will clear the .bss section
> for you - in section 3.1.3: "bss_end_addr Contains the physical address of
> the end of the bss segment. The boot loader initializes this area to zero"

Ok, good to know.

It certainly doesn't do that unless you tell it to (using the address tag),
and neither this example nor my hobby kernel uses that.

So the BSS must be cleared or the bootloader told to do so.
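
If you do decide to clear it yourself, a minimal sketch in C could look like the following. The function name is hypothetical, and in a real kernel the two bounds would come from symbols defined in your linker script (names such as `__bss_start` and `__bss_end` are an assumption here, not something this tutorial defines).

```c
#include <stddef.h>
#include <stdint.h>

/* Zero every byte in [start, end). */
void clear_bss(uint8_t *start, uint8_t *end)
{
    /* A plain loop through a volatile pointer avoids depending on memset,
     * which a freestanding kernel may not provide yet. */
    for (volatile uint8_t *p = start; p < end; ++p)
        *p = 0;
}
```

This has to run before any C code that relies on zero-initialized globals, and it must not be called from a stack that itself lives inside the region being cleared.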

~~~
grahamedgecombe
I've just checked the GRUB source code and I think it will clear the .bss
section even if it's loading an ELF file.

grub-core/loader/multiboot_elfxx.c has a function named
grub_multiboot_load_elf32/64 which actually loads the segments of the ELF
file. A segment has two fields defining its size: filesz (which is the amount
of bytes to copy from the file) and memsz (which is its actual size once
loaded). If memsz is greater than filesz, it zeroes the trailing bytes:

    
    
      if (phdr(i)->p_filesz < phdr(i)->p_memsz)
        grub_memset ((grub_uint8_t *) source + phdr(i)->p_filesz, 0,
          phdr(i)->p_memsz - phdr(i)->p_filesz);
    

The .bss section is placed by the linker at the end of a segment and increases
memsz by the size of it (but not filesz, to avoid having to place lots of
pointless zeroes in the ELF file) - for example this is one of the segments
from my kernel's ELF file, which contains the .bss section at the end:

    
    
      LOAD off    0x0000000000020000 vaddr 0xffffffff8011f000 paddr 0x000000000011f000 align 2**12
           filesz 0x0000000000004be0 memsz 0x0000000000017678 flags rw-
    

Here you can see memsz is 0x17678 bytes and filesz is smaller at 0x4be0 bytes.
The difference between them is the size of the .bss section.

grub_multiboot_load_elf32/64 is called in the case when the address tag is not
present, so the .bss section will be cleared by GRUB in this case as well.
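
For the segment shown above, the zeroed tail is 0x17678 - 0x4be0 = 0x12a98 bytes (76,440 in decimal). As a rough model of what the loader does per segment, here is a sketch in C; `load_segment` is a hypothetical helper for illustration and is not GRUB's actual function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy the filesz bytes that exist in the file, then zero the remaining
 * memsz - filesz bytes (which is where .bss ends up).
 * Returns 0 on success, -1 for a malformed segment. */
int load_segment(uint8_t *dest, const uint8_t *src,
                 size_t filesz, size_t memsz)
{
    if (filesz > memsz)
        return -1;

    memcpy(dest, src, filesz);
    if (filesz < memsz)
        memset(dest + filesz, 0, memsz - filesz);
    return 0;
}
```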

------
bebop
This is a great resource for anyone who would like to take this further:
[http://wiki.osdev.org/Expanded_Main_Page](http://wiki.osdev.org/Expanded_Main_Page)

In particular, setting up interrupt handlers, paging, and getting a PIC set up
is pretty neat.

------
boulderdash
If anybody is doing this, let me share some words of advice based on
experience.

Please use a virtual machine instead of doing this on your primary machine.
You eliminate the risk of messing up your machine. Also, if you set up the VM
properly, you get a debugger.

~~~
aaren
Could you point to how to go about setting a suitable VM up on Debian? I could
do with not bricking my machine :)

~~~
zhemao
Use QEMU.

[http://qemu.weilnetz.de/qemu-doc.html](http://qemu.weilnetz.de/qemu-doc.html)

------
AdrianoKF
Looks great, will keep an eye on this!

I also really enjoyed James Molloy's OS kernel development tutorial at
[http://www.jamesmolloy.co.uk/tutorial_html/](http://www.jamesmolloy.co.uk/tutorial_html/),
which takes you from "Hello World" to some real toy OS kernel implementation.

~~~
grahamedgecombe
It's worth pointing out there are a few bugs in James Molloy's tutorial [1],
and some of the things he does in them aren't exactly best practices - for
example, a few I remember are:

\- Disabling interrupts and paging (which also has the side effect of flushing
the TLB) to copy memory around by physical address. This could be done without
disabling them by mapping all of physical memory into virtual memory instead
(possible in 64-bit mode, but in 32-bit there isn't enough room when your PC
has a similar amount of RAM to virtual memory space, in which case you could
map smaller parts of it as needed).

\- Moving the stack around to get around the fact that GRUB doesn't set ESP to
some well-defined value (instead of defining the stack yourself, which would
be much more robust) and then attempting to rewrite the base pointers to fix
it. For example, his code can't tell the difference between integers that just
happen to have a value in the range of the pointers and a pointer, and will
happily rewrite both. Also as ESP isn't defined by the Multiboot standard you
could be using any location at all as the stack (such as some memory address
that doesn't exist, or your kernel's code itself, or some memory-mapped area
for a piece of hardware, etc.) All of which will mean things go wrong. It's
better to just set ESP yourself before you enter C - see another of my
comments on this submission here [3].
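
To make the pointer-versus-integer problem concrete, here is a simplified model in C. It is not the tutorial's actual code; `relocate_words` and the addresses used below are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Naive stack relocation: every word whose value falls inside the old
 * stack's address range is assumed to be a pointer and is shifted by
 * delta. Returns how many words were rewritten. */
size_t relocate_words(uintptr_t *words, size_t n,
                      uintptr_t old_lo, uintptr_t old_hi, intptr_t delta)
{
    size_t rewritten = 0;
    for (size_t i = 0; i < n; ++i) {
        if (words[i] >= old_lo && words[i] < old_hi) {
            words[i] += (uintptr_t)delta;   /* might be a plain integer! */
            ++rewritten;
        }
    }
    return rewritten;
}
```

With an old stack at 0x9F000..0xA0000, a saved base pointer of 0x9F010 and an unrelated integer that happens to equal 0x9F123 are both shifted, so the integer is silently corrupted. Nothing in the word itself says which one it was.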

There's actually a newer and much better version of JamesM's tutorials on
GitHub, but I believe they aren't quite finished [2].

[1]:
[http://wiki.osdev.org/James_Molloy%27s_Known_Bugs](http://wiki.osdev.org/James_Molloy%27s_Known_Bugs)

[2]: [https://github.com/jmolloy/JMTK](https://github.com/jmolloy/JMTK)

[3]:
[https://news.ycombinator.com/item?id=7590753](https://news.ycombinator.com/item?id=7590753)

~~~
AdrianoKF
Oh, didn't know there was a follow-up to his original tutorial - I'll have a
look at your resources when I get back from work!

------
mbillie1
Very cool. I personally (as a developer without a CS background) find these
sorts of posts wonderfully interesting, even if this kernel, as pointed out in
this thread, lacks a lot of what a normal kernel does. I'd love to see one of
these for a compiler!

~~~
lmm
I highly recommend
[http://www.hokstad.com/compiler/](http://www.hokstad.com/compiler/) ; it
talks about writing a compiler in a way that makes sense to me - writing it
the way you write any other program, rather than throwing you straight in with
lexers and parsers and never justifying why we need to do things this way.

~~~
vidarh
Glad you like it... I'm hoping to push out the next part later this week, and
I've got a few more drafts queued up that "just" need some proofreading.

Working on getting it to the point where it can fully compile itself now, and
hope to get there over the next couple of months.

~~~
steveklabnik
I've been reading this series with interest for years. Thank you for writing
it.

------
dkarapetyan
I really like the new HN. Quality of articles is way up.

~~~
mbillie1
I've found myself spending a lot more time here as well. Was there any
code/policy change with the HN site? Or are new moderators jumping in and
helping out a lot? Either way, I agree that it is excellent.

~~~
bebop
About a week or two ago pg added moderation. I cannot find the thread right
now, but basically comments are not published until someone with high enough
karma (I think 1000 points) approves the comment.

~~~
mbillie1
Oh, I thought that was only for selected threads.

~~~
Widdershin
It is, bebop was mistaken. (Unless my comment does not show up, in which case
he is correct.)

------
ahelwer
Neat read so far! Not done yet, but I think I've found a small error in
kernel.c: the attribute byte of the characters in "my first kernel" should be
set to 0x02, not 0x07.

edit: I misread. 0x07 is intentional, 0x02 was mentioned as an alternative.
Good post!

~~~
selmnoo
uaygsfdbzf: your account is marked dead it seems, so most of us can't see what
you write.

Here's what he posted:

Not an error. 0x02 means green character on black background. 0x07 means light
grey character on black bg. It's explained with one, and coded with another.
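
For anyone curious how those attribute values are put together, here is a small sketch in C, assuming the standard VGA text-mode layout (foreground colour in the low nibble, background in the high nibble). The helper names are made up for illustration.

```c
#include <stdint.h>

enum { VGA_BLACK = 0x0, VGA_GREEN = 0x2, VGA_LIGHT_GREY = 0x7 };

/* Attribute byte: background in the high nibble, foreground in the low.
 * (Depending on the mode, the top background bit may mean "blink".) */
uint8_t vga_attr(uint8_t fg, uint8_t bg)
{
    return (uint8_t)(((bg & 0x0F) << 4) | (fg & 0x0F));
}

/* One screen cell is the character byte followed by the attribute byte;
 * read as a little-endian 16-bit word, the character is the low byte. */
uint16_t vga_cell(char c, uint8_t attr)
{
    return (uint16_t)((uint8_t)c | ((uint16_t)attr << 8));
}
```

So `vga_attr(VGA_LIGHT_GREY, VGA_BLACK)` gives 0x07 and `vga_attr(VGA_GREEN, VGA_BLACK)` gives 0x02, matching the two values discussed.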

~~~
_mhr_
Not dead for me, and I don't have showdead on.

------
tbrock
This was fantastic. My only question is, how does one gain knowledge of the
required x86 hardware specifics he mentions? I don't know where to begin
looking to uncover these sorts of things:

    
    
      - The x86 CPU begins execution at the physical address [0xFFFFFFF0]
      - The bootloader loads the kernel at the physical address [0x100000]
      - The BIOS copies the contents of the first sector to physical address [0x7c00]
    

Is there an x86 instruction manual or is this sort of thing passed down
through generations of engineers?

~~~
exDM69
> Is there an x86 instruction manual or is this sort of thing passed down
> through generations of engineers?

Yes, there are x86 instruction manuals, colloquially known as "intel
manuals".

That's about a dozen volumes of documentation, at a few thousand pages each.
It is quite a good example of well written and informative technical
documentation.

[http://www.intel.com/content/www/us/en/processors/architectu...](http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html?iid=tech_vt_tech+64-32_manuals)

------
aaren
Can anyone give me an idea how much different this would be for 64bit? Do I
just change the nasm directive to `bits 64`?

~~~
exDM69
> Can anyone give me an idea how much different this would be for 64bit? Do I
> just change the nasm directive to `bits 64`?

A lot different. You can see the boot code of my x86_64 hobby kernel project
[1].

The reason is that GRUB/Multiboot protocol is actually doing most of the
machine initialization, but it can only set up 32 bit mode. 64 bit mode is
missing partly because standardizing the Multiboot protocol is dragging
behind, partly because there's no one correct way to do this as you can't have
identity mapped memory in 64 bit mode (unlike 32 bit mode).

If you read the OSDev wiki, you can find examples of doing machine
initialization "from scratch", i.e. after the PC BIOS (or UEFI). This involves
arcane details about the x86 machine like setting up something called the A20
line (which was a hack that allows access to memory above 1 megabyte -
utilizing a spare pin from the keyboard controller!), etc. Dealing with this
shit is not time well spent. (UEFI is a lot easier in some ways, harder in
others.)

This means that the boot code of your kernel must set up long mode, create an
initial page table for virtual memory, etc. Here's my limited 64 bit boot code
that sets up one 2 megabyte page [2].
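
To give a feel for what that initial page table involves, here is a sketch in C of the arithmetic only. It is not the linked boot code (that is assembly); the helper names are hypothetical, and it assumes standard 4-level x86_64 paging with 512 entries per table.

```c
#include <stdint.h>

#define PTE_PRESENT  (1ull << 0)
#define PTE_WRITABLE (1ull << 1)
#define PTE_HUGE     (1ull << 7)  /* PS bit: a page-directory entry maps 2 MiB */

/* Build a page-directory entry that maps one 2 MiB page at `phys`.
 * The low 21 bits of the address are dropped (2 MiB alignment). */
uint64_t pde_2mib(uint64_t phys)
{
    return (phys & ~0x1FFFFFull) | PTE_PRESENT | PTE_WRITABLE | PTE_HUGE;
}

/* Index into each 512-entry table for a given virtual address. */
unsigned pml4_index(uint64_t va) { return (unsigned)((va >> 39) & 0x1FF); }
unsigned pdpt_index(uint64_t va) { return (unsigned)((va >> 30) & 0x1FF); }
unsigned pd_index(uint64_t va)   { return (unsigned)((va >> 21) & 0x1FF); }
```

The boot code fills one entry in each of the three tables (PML4, PDPT, page directory), loads the PML4's physical address into CR3, and only then enables long mode and paging.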

Going to 64 bit mode is not that much more code, but it will make kernel
development more painful. In particular, switching CPU modes messes up the GDB
debugger which must be patched to work at all. And there's a bit more work
involved in all the little things that come with it.

So for educational purposes it would make more sense to stick to 32 bit mode
than deal with the nitty gritty details of 64 bit long mode.

[1]
[https://github.com/rikusalminen/danjeros](https://github.com/rikusalminen/danjeros)
[2]
[https://github.com/rikusalminen/danjeros/blob/master/src/arc...](https://github.com/rikusalminen/danjeros/blob/master/src/arch/x86_64/boot/start.s#L57)

~~~
e12e
This makes me really sad. 32bit mode x86 assembly is such a mess compared to
amd64 -- I guess it's a good excuse to work with qemu and arm, if nothing else
;-)

~~~
TallGuyShort
On the topic of ARM, does anybody know of a similar example like this for ARM?
One that just shows you how to pass control to C and do some basic I/O?

~~~
TallGuyShort
Too late to edit my post, but I eventually found this:
[http://wiki.osdev.org/ARM_Integrator-CP_Bare_Bones](http://wiki.osdev.org/ARM_Integrator-CP_Bare_Bones).
I haven't tried it out yet as I'm still getting the cross-compiler toolchain
together (doesn't seem to be part of OpenSUSE 12, at least not an obvious
part), and it's not as well explained, but it's still the best I've found.

------
deathanatos
I kinda wish some of the FS utilities were better. Despite FUSE, mounting a
block device (like, disk image) as a non-root user is tough, so writing to an
image with actual partitions/FS on it is difficult, especially to script,
especially if you don't want to sudo in a build script. losetup on a disk
image doesn't (to my knowledge) detect partitions… for reasons unknown to me.

Bootsectors are similar. You can't just install grub to a disk image. (You
have to losetup it, at the very least, which implies root. Why can't I just
install to a file?)

You want a script/build system that allows you, at the very least, to:

1\. Code.

2\. Run build system/script.

3\. Fire up qemu or similar.

You can't be rebooting. Ideally, it'd be great to do this in userspace.

That said, if you're starting out, just do [boot sector] + [kernel] = tada
image until you need to do otherwise. (Really, do whatever works and is easy.)

That said, I've found a few somewhat helpful tools.

fuseloop[1] takes a file and offset/size, and exposes a single file. If you
have a partitioned disk image, then you can feed it the partition offset, and
it gives you back something you can format as an FS. (e.g., you can run ext2fs
on.)

Then there's fuse.ext2[2], which mounts an ext2 FS on FUSE, so non-root usable
again. Note that I'm linking to my fork of it, since the original didn't build
for me, but I didn't write it. (Which I fixed, and sent a pull request, but
never heard back.)

Finally — and sorry to peddle my own stuff again — I wrote a Python library
for dealing with the MBR.[3] I use it to figure out offsets and sizes in a
disk image.
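
The offset lookup itself is small. Here is a sketch in C rather than Python, assuming the classic MBR layout (four 16-byte entries starting at byte 446, signature 0x55 0xAA at bytes 510 and 511). The struct and function names are invented for illustration and are not the pymbr API.

```c
#include <stdint.h>

#define MBR_SIZE         512
#define MBR_TABLE_OFFSET 446
#define MBR_ENTRY_SIZE   16

struct mbr_partition {
    uint8_t  type;          /* partition type byte, e.g. 0x83 for Linux */
    uint32_t lba_start;     /* first sector of the partition            */
    uint32_t sector_count;  /* length in sectors                        */
};

/* MBR fields are little-endian regardless of the host. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 on a bad index or missing boot signature. */
int mbr_read_partition(const uint8_t mbr[MBR_SIZE], int index,
                       struct mbr_partition *out)
{
    if (index < 0 || index > 3)
        return -1;
    if (mbr[510] != 0x55 || mbr[511] != 0xAA)
        return -1;

    const uint8_t *e = mbr + MBR_TABLE_OFFSET + index * MBR_ENTRY_SIZE;
    out->type         = e[4];
    out->lba_start    = read_le32(e + 8);
    out->sector_count = read_le32(e + 12);
    return 0;
}
```

Multiplying `lba_start` and `sector_count` by the 512-byte sector size gives the byte offset and size to hand to a tool like fuseloop.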

I've had a bit of fun writing a boot loader, and I've managed to get it to
load up its stage 2 and switch to 32-bit pmode. Had a fun error where a
division instruction was throwing things into a triple fault; see [4] if you
want to see how a division instruction can fail _without_ dividing by zero
(which was the first thing I checked). The disk layout is currently:

    
    
      [boot sector] [stage 2] [kernel] [ partitions, FS, real data, etc. ]
    

stage2's size is hard coded into the first sector of stage 2 (into itself),
and the kernel's location and size will be similarly hard coded into it as
well. (When I get there. Disks in pmode are different, as you can't just have
the BIOS do all the work for you, sadly!) And by hard-coded, a build script
calculates and just re-writes a few bytes.
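
The byte-patching step can be sketched in C like this. The field width (a 16-bit little-endian sector count) and the helper names are assumptions for illustration; the actual build script and field offsets are not shown in this comment.

```c
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512u

/* Number of whole sectors needed to hold `bytes` (rounds up). */
uint16_t sectors_for(size_t bytes)
{
    return (uint16_t)((bytes + SECTOR_SIZE - 1) / SECTOR_SIZE);
}

/* Overwrite two bytes of the image with a little-endian 16-bit value. */
void patch_le16(uint8_t *image, size_t offset, uint16_t value)
{
    image[offset]     = (uint8_t)(value & 0xFF);
    image[offset + 1] = (uint8_t)(value >> 8);
}
```

The build would call something like `patch_le16(image, offset, sectors_for(stage2_len))` after linking, so the loader knows how many sectors to read without parsing anything at boot time.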

[1]:
[https://github.com/jmattsson/fuseloop](https://github.com/jmattsson/fuseloop)

[2]: [https://github.com/thanatos/fuse-ext2-fakeFS](https://github.com/thanatos/fuse-ext2-fakeFS)

[3]: [https://github.com/thanatos/pymbr](https://github.com/thanatos/pymbr)

[4]: [http://stackoverflow.com/questions/21212174/why-cant-i-step-...](http://stackoverflow.com/questions/21212174/why-cant-i-step-over-this-div-instruction)

~~~
grahamedgecombe
> losetup on a disk image doesn't (to my knowledge) detect partitions… for
> reasons unknown to me.

You can use the kpartx command for this. This site has a good overview:
[http://nfolamp.wordpress.com/2010/08/16/mounting-raw-image-f...](http://nfolamp.wordpress.com/2010/08/16/mounting-raw-image-files-and-kpartx/)

------
tpush
I have done the same, sort of, only eschewing BIOS for UEFI and thus getting
straight to Long Mode :-).

It's pretty basic, currently just boots up and prints the memory map.

It's written in C++11 with the aim to be as clear as possible:
[https://github.com/thasenpusch/simplix](https://github.com/thasenpusch/simplix)
Have a look :-).

------
jonalmeida
I just finished an Operating Systems final today and the entire class was
conceptual and learning fundamentals, while I craved to get my hands dirty and
actually try to make a simple OS.

I'll definitely be playing around with this. Thanks!

------
anarion
Why do kernel tutorials always say they need nasm? GCC already comes with an
assembler, so simply use it instead of installing other software. And if you
prefer Intel syntax, simply add ".intel_syntax noprefix".

------
zenbowman
Looks great, looking forward to diving into this.

------
acomjean
Why C?

I'm just curious if another language can be used. (c++, go, rust).

~~~
DSMan195276
C is really easy for this sort of thing. I don't know the specifics for using
C++, Go, Rust, etc. (Disclaimer, I'm a fan of C, and I know some C++ and Rust
but little Go). The kernel also doesn't get run-time help, so a lot of
features of C++, Rust, and Go would be gone right off the bat (Ex. You can't
be spawning threads in Rust or throwing exceptions in C++ without first
setting up your environment, which can be a non-trivial task). Using a subset
of any of those languages should work, provided the compiler can provide
straight ELF binaries that you could boot (I know g++ and rustc can do this,
Go is the only one I wonder about.).

If you get to that point, then you have the fact that the C ABI is fairly
standard, extremely easy to use, and well-known, so it makes it easy to
intermingle assembly and C. If you did it in C++, you'd definitely want to
extern "C" any symbol asm is going to call so it isn't mangled, Rust should be
the same, no idea on Go (You'd really also want to extern "C" any user-space
system calls, because the name mangling might get in the way.). For the last
point, you have to be able to use pointers and write to arbitrary memory
locations, which is easy for C and C++, possible in Rust, but I don't know
about Go.

IMO, the code displayed in this example is really hardly a kernel (It _is_ ,
but it doesn't really do anything). For a more complex kernel, you may get a
benefit from using a language other than C, but for something this simple C
lowers the complexity to get a working example with minimal issues going.

------
tsenkov
Awesome read, thanks.

------
kyberias
I'm not exactly Linus Torvalds but I'm pretty sure a program that prints one
line of text is not "a kernel". :)

~~~
TehCorwiz
I'm going to give OP the benefit of the doubt on this one. A couple things
worth noting, it successfully boots, does not cause a fault of any kind, and
is in a position to interact directly with the bare metal. The tools that we
use to interact with a *nix system are often just that, bits of code in user-
space. This is a kernel-space program. That it does not do any of the memory
management or device access yet doesn't mean it's not a kernel.

It might actually be interesting to take these techniques and apply them to
something else. Maybe a demo, ala the demoscene, that runs on the bare metal.
Maybe implement a game that doesn't have the overhead of an os.

This is less a finished product and more an inspiration. So often I think that
people miss that about things. Sometimes things aren't done, sometimes they're
just beginning.

~~~
Nav_Panel
> Maybe implement a game that doesn't have the overhead of an os.

I'm in CMU's operating systems class right now and that was one of our
projects (our second). The first was writing a stack-tracing debug library,
the third was a user-space thread library built on top of a particular kernel
spec, and then the fourth was to build a kernel basically from scratch, using
our thread library as a test program.

Here's a link to the spec for the game we built on the bare metal:
[https://www.cs.cmu.edu/~410/p1/proj1.html](https://www.cs.cmu.edu/~410/p1/proj1.html)

That fourth project, or "p3" as it's known around here, is known to be a
killer. It's tough to write a whole kernel in 6 weeks (+ 1 week of break),
especially when it's expected to have a full virtual memory system, kernel
tasks/threads for concurrent programming, program loading, interrupt handling,
etc, even with a partner.

~~~
TehCorwiz
Wow, that's pretty cool. Thanks for the link.

------
whydo
The first question is:

Why?

We already have a whole bunch of operating systems, many of them free.

One of the frequent problems with a lot of free/open source software folks is
that they lack direction. This type of thing where we do stuff just to do
stuff probably won't fly in one of the leading tech companies.

Why don't you figure out a real problem people have and look for ways to solve
that, instead of just doing random "interesting" stuff that wastes people's
valuable time?

~~~
kjs3
You must be a real joy at parties.

------
aortega
Great! except that to be called a kernel it's missing just a process manager,
memory manager, filesystem, process separation and hardware abstraction. Yeah
I'm _that_ guy, down vote me as you wish, the article is still wrong.

It's a way to load a ring-0 application into grub. Pretty cool, but not a
kernel.

~~~
nwg
It's a statically linked program, real memory, diskless system, kernel-mode-
only kernel without too much fluff in the video i/o abstraction.

~~~
aortega
Kernel means "core". Kernel-mode-only kernel is a contradiction. If you don't
have separate parts, you can't have a core.

~~~
nwg
So if you write core software for an architecture without privsep you must not
call it a kernel?

~~~
aortega
I'm not talking about privsep, just different modules. I'm gonna answer you
with another question:

Did MS-DOS have a kernel?

~~~
nwg
Yes. We all lose here today.

~~~
aortega
It depends. Being downvoted by people who believe a memcpy() is a kernel is
victory.

