
Writing a Tiny x86 Bootloader - joebergeron
http://joebergeron.io/posts/post_two.html
======
okreallywtf
Random note, but it seems one of the resources linked
([http://www.ctyme.com/intr/rb-0097.htm](http://www.ctyme.com/intr/rb-0097.htm))
can contain NSFW ads. It didn't get me in trouble, but it still makes me
nervous on a work machine (I probably shouldn't be reading HN either, for that
matter).

That link is on the word "here" in the text "See the spec here."

I usually use adblock but I used the disable-all-extensions extension to debug
something and forgot to turn them back on.

------
gravypod
It's sad that most of these "bootloader" guides don't take you into actually
boot-loading something. No searching a file system. No loading another binary.
No setting up 32-bit mode. Nothing, just printing text. That's all most of
these cover.

~~~
vectorEQ
He uses BIOS calls, and that's what you need to learn to be able to write a
bootloader. The rest is just reading up on some documentation about what you
are going to load. You can't fit a filesystem parser into 1 sector and have
anything else sensible in there... From the bootloader you load sectors from
disk, generally using the BIOS (on x86 at least), and you can understand how
to do it from this tutorial.

~~~
gravypod
> the rest is just reading up on some documentation about what you are going
> to load

That documentation is usually... lacking

> You can't fit a filesystem parser into 1 sector and have anything else
> sensible in there

You don't use a single sector. You use a two-stage bootloader that uses a
reserved FAT sector. In FAT12 (at least) you can specify how many reserved
sectors there are when formatting. A reserved sector comes directly after the
first sector. You can do a full FAT12 implementation in assembly in two
sectors.
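
To make the reserved-sector idea concrete, here is a rough Python sketch that
reads the reserved sector count out of a FAT12 boot sector and works out where
a second stage could live. The function name `fat12_layout` and the example
floppy geometry are made up for illustration; only the field offsets come from
the standard FAT BIOS Parameter Block layout.

```python
import struct

def fat12_layout(boot_sector: bytes) -> dict:
    """Work out where each region of a FAT12 volume starts, in sectors."""
    # BIOS Parameter Block fields at byte offset 11, all little-endian:
    # bytes/sector, sectors/cluster, reserved sectors, number of FATs,
    # root directory entries, total sectors, media byte, sectors per FAT.
    (bytes_per_sector, _spc, reserved, num_fats,
     root_entries, _total, _media, sectors_per_fat) = struct.unpack_from(
        "<HBHBHHBH", boot_sector, 11)

    fat_start = reserved  # the reserved count includes the boot sector itself
    root_dir_start = fat_start + num_fats * sectors_per_fat
    # Each directory entry is 32 bytes; round up to whole sectors.
    root_dir_sectors = (root_entries * 32 + bytes_per_sector - 1) // bytes_per_sector
    return {
        # Sectors after the boot sector that are free for a second stage.
        "stage2_sectors": list(range(1, reserved)),
        "fat_start": fat_start,
        "root_dir_start": root_dir_start,
        "data_start": root_dir_start + root_dir_sectors,
    }

# Hypothetical 1.44 MB floppy formatted with 2 reserved sectors instead of 1.
sector = bytearray(512)
struct.pack_into("<HBHBHHBH", sector, 11, 512, 1, 2, 2, 224, 2880, 0xF0, 9)
layout = fat12_layout(bytes(sector))
print(layout)
```

With two reserved sectors, sector 1 is available for the second stage and the
first FAT is pushed back to sector 2. A real first stage would do the same
arithmetic in assembly before calling the BIOS to read those sectors.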

> using BIOS generally (on x86 at least). and you can understand how to do it
> from this tutorial.

You're going to need to do a lot more reading. Even just looking up the
interrupts you need is a task. I've found nothing that does a good job
explaining the 1, 2, and 3s of how to get all the way to booting from a
formatted disk.

~~~
johnnycarcin
You sound pretty well versed in this area. Would you be interested in writing
something up? Don't take this as a dig; I know the value of free time and how
much that free time is lacking for most of us. I just thought I'd toss it out
there.

~~~
gravypod
The best I can do is load up a second stage. Searching for the COM file is
where I get lost. I've also been trying to do it in C for months and I haven't
been able to finish for various reasons.

------
josteink
Not saying he is wrong or anything, but he seems to have missed how modern
machines ship with UEFI, which tries to solve this problem at the firmware
level.

That said: if you're curious and want to learn, I have no objections to
digging into stuff, even "obsolete" stuff like BIOS boot :)

~~~
slim
What would be the equivalent of this in UEFI? I mean, is there a one-page
tutorial on booting with UEFI by writing a few lines of assembly?

~~~
josteink
I found this article previously on HN:
[https://news.ycombinator.com/item?id=12238498](https://news.ycombinator.com/item?id=12238498)

You may find it interesting.

Basically you won't bother writing a _bootloader_ with UEFI, since it already
provides that feature. Instead you'll get right at working on your OS.

------
mkup
Why is he using 32-bit esp/ebp registers in a 16-bit environment? There are
16-bit sp/bp registers. And there is a 0xFFFF limit on segment descriptors in
real mode anyway.

~~~
geocar
> Why is he using 32-bit esp/ebp registers in a 16-bit environment?

It might be tooling. A 32-bit assembler (like gas) will turn `mov %esp,%ebp`
into `89 e5` while `mov %sp,%bp` becomes `66 89 e5` -- the former being the
correct encoding when the code actually runs in 16-bit mode.

> And there is a 0xFFFF limit on segment descriptors in real mode anyway.

[http://wiki.osdev.org/Unreal_Mode](http://wiki.osdev.org/Unreal_Mode)

~~~
mkup
> A 32-bit assembler (like gas) will turn `mov %esp,%ebp` into `89 e5` while
> `mov %sp,%bp` becomes `66 89 e5` -- the former being correct when actually
> in 16-bit.

He is using nasm with the "bits 16" directive, so the 66 prefix will be
emitted for "mov ebp,esp". gas with a 32-bit target is totally unrelated to
this discussion.

> [http://wiki.osdev.org/Unreal_Mode](http://wiki.osdev.org/Unreal_Mode)

So what? MBR runs in 16-bit real mode.
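
For anyone trying to follow the prefix argument in this subthread, here is a
small Python sketch of the rule both posters are describing: the `66`
operand-size prefix is emitted whenever the register width differs from the
mode's default width. `encode_mov_reg_reg` and the `REG` table are invented
for this illustration and only cover the `89 /r` register-to-register form of
`mov` (the equivalent `8B /r` form is ignored), so treat it as a sketch, not
an assembler.

```python
# Register numbers as used in the ModRM byte.
REG = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}

def encode_mov_reg_reg(dst: str, src: str, mode_bits: int) -> bytes:
    """Encode `mov dst, src` (Intel operand order) using opcode 89 /r."""
    def parse(name: str):
        # "esp" -> (32, 4), "sp" -> (16, 4)
        if name.startswith("e"):
            return 32, REG[name[1:]]
        return 16, REG[name]

    dst_size, dst_num = parse(dst)
    src_size, src_num = parse(src)
    if dst_size != src_size:
        raise ValueError("operand sizes must match")

    # mod=11 (register direct), reg field = source, r/m field = destination.
    modrm = 0xC0 | (src_num << 3) | dst_num
    # The 66 prefix toggles the operand size away from the mode's default.
    prefix = b"\x66" if dst_size != mode_bits else b""
    return prefix + bytes([0x89, modrm])

for mode in (16, 32):
    for dst, src in (("bp", "sp"), ("ebp", "esp")):
        code = encode_mov_reg_reg(dst, src, mode)
        print(f"bits {mode}: mov {dst},{src} -> {code.hex(' ')}")
```

The same two bytes `89 e5` mean `mov bp,sp` in 16-bit code and `mov ebp,esp`
in 32-bit code, which is why the choice of assembler mode matters here.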

~~~
geocar
> He is using nasm with "bits 16" directive

So he is!

I had to download nasm to check, but that sounds useful.

------
TazeTSchnitzel
It's a cool example boot sector program, but it doesn't actually, uh, load
anything. Does it still count as a bootloader?

------
AndrewStephens
I wrote a more elaborate bootloader for a job about 15 years ago - it was a
fun challenge to squeeze a lot of functionality into such a small, limited
package.

At the time, bootloaders could add additional functionality by staying
resident and hooking various interrupts to provide services or work around
bugs in the BIOS. For instance, my bootloader would translate sector mappings
that would allow WinPE to boot from USB drives that would not otherwise work
due to (idiotic) limitations in some BIOSes.

This is a fun example but doesn't really go very far. Handling memory and
reading additional code from disk are other topics that are required for a
full bootloader, but this was a nice trip down memory lane.

------
pepijndevos
Does anyone know of any resources for doing stuff like this on ARM? I have a
board with u-boot, but I can't find any information about how to do even a
hello world type program.

~~~
pm215
32-bit ARM is a bit awkward because traditionally there is no equivalent of
the x86 BIOS APIs for doing things like printing output, and the code started
by a bootloader is expected to talk directly to the hardware (serial port
etc), which is different on every board. If your board uses UEFI then you can
use those APIs, or if your u-boot was built with CONFIG_API that might provide
something useful. Otherwise even 'hello world' will require talking to the
serial port and adjusting any tutorial code you find to the specifics of your
board.

------
vectorEQ
If you want to save some bytes in the bootloader, use a BIOS interrupt to
change the video mode, which clears the screen ^^ It takes very few
instructions to do it.

------
bogomipz
It's been forever since I've thought about the segmented memory model. Can
someone explain this math to me:

The OP states:

"Since our code resides at 0x7C00, the data segment may begin at 0x7C0"

0x7C00 in decimal is 31744 and 0x7C0 in decimal is 1984, and a segment is 16
bits. I understand that the CPU is hardwired to do a JMP to 0x7C00. But how
did they arrive at choosing 0x7C0 for the data segment?

~~~
david-given
0000:7c00 (segment 0, offset 0x7c00) and 07c0:0000 (segment 0x7c0, offset 0)
are at the same physical address of 0x00007c00, so the processor will see the
same bytes each way. The latter's more useful in 16-bit mode, though, as it
allows you to use offsets 0x0000 to 0xffff to access the 64kB of data starting
at physical address 0x00007c00.
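
The arithmetic behind that can be sketched in a few lines of Python. The
helper name `physical_address` is made up for this example, and the 20-bit
mask assumes 8086-style wraparound (A20 line disabled), which is an
assumption rather than something the article states.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode translation: segment shifted left 4 bits, plus the offset."""
    # Shifting left by 4 is the same as multiplying the segment by 16.
    return ((segment << 4) + offset) & 0xFFFFF

# Both spellings of the boot sector's load address name the same byte.
print(hex(physical_address(0x0000, 0x7C00)))
print(hex(physical_address(0x07C0, 0x0000)))

# With the segment set to 0x07C0, offsets 0x0000..0xFFFF cover this range.
first = physical_address(0x07C0, 0x0000)
last = physical_address(0x07C0, 0xFFFF)
print(hex(first), hex(last))
```

So 0x7C0 is simply 0x7C00 divided by 16: loading it into the data segment
register makes offset 0 line up with the first byte of the boot sector.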

So typically the first thing you do on entry to the boot sector is to do a far
jump to change code segments. But you don't have to; if you're immediately
switching to 32-bit mode it may not be worth it.

Here's my boot sector --- everyone's written one. Mine's intended to load and
run raw tiny mode executables on floppy.

[https://github.com/davidgiven/ack/blob/default/plat/pc86/boo...](https://github.com/davidgiven/ack/blob/default/plat/pc86/boot.s)

~~~
bogomipz
Thanks for the clarification and your link. Maybe I am reading this wrong
then?

"Since our code resides at 0x7C00, the data segment may begin at 0x7C0"

If those are both the same address how can the code segment and the data
segment occupy the same address?

Thanks.

~~~
david-given
Yup, they can. That way your code and data share the same address range, so
you need to make sure that the offsets within the address range don't overlap.

It's also possible to configure the system so that your code lives in one
segment, and your data and stack live in another. That way, you get 64kB for
code and another 64kB for data.

We have to put the stack in the same segment as the data because C requires
the stack to be addressable; if you're not working in C, you can put the stack
in a third segment and get a bit more space.

Of course, if you're willing to use pointers which contain the segment as well
as the offset, so 32 bits wide, you can use multiple segments, but that raises
all sorts of complexity that's not really worth thinking about these days.
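
As a rough illustration of one piece of that complexity, here is a Python
sketch of a 32-bit segment:offset pointer. The function names are invented
for the example. The in-memory layout (offset word first, then segment word)
is the one the x86 `lds`/`les` instructions read; the normalization step is
just one common convention for making two such pointers comparable.

```python
import struct

def pack_far_pointer(segment: int, offset: int) -> bytes:
    """Store a far pointer as 4 little-endian bytes: offset, then segment."""
    return struct.pack("<HH", offset, segment)

def unpack_far_pointer(raw: bytes) -> tuple:
    offset, segment = struct.unpack("<HH", raw)
    return segment, offset

def normalize(segment: int, offset: int) -> tuple:
    """Rewrite segment:offset so the offset is always in the range 0..15."""
    linear = (segment << 4) + offset
    # Masking the segment to 16 bits wraps addresses above 1 MB, 8086-style.
    return (linear >> 4) & 0xFFFF, linear & 0xF

a = (0x0000, 0x7C10)
b = (0x07C0, 0x0010)
print(a == b)                          # compared as raw pairs
print(normalize(*a) == normalize(*b))  # compared after normalizing
print(pack_far_pointer(0x07C0, 0x0123).hex(" "))
```

The two pointers look different as raw values yet refer to the same byte, so
every comparison or pointer difference has to normalize first. That is the
kind of bookkeeping a flat 16-bit offset within one segment avoids.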

~~~
bogomipz
Ah OK, so the segments have the same address, just different offsets from that
address. Is that correct?

You mention:

"We have to put the stack in the same segment as the data because C requires
the stack to be addressable"

Are symbols only "reachable" if they are within the base and limit of the same
memory segment then? If this was the case I would think that the code and
stack must be in the same segment? Why the data segment?

Cheers.

------
ejanus
I checked the blog but came away without anything new. I would like to write a
bootloader for microcontrollers like PIC and Atmel. Where should I start?

~~~
detaro
Existing implementations and/or the datasheet of the controller.

~~~
ejanus
Both would be great. I like picking up a new idea or concept from a
fundamental level. Please send me links or emails.

------
mysterydip
This looks like it would go together well with the article yesterday of the
little OS handbook, which used GRUB for its bootloader to focus on the OS
part.

------
ternaryoperator
Please use left justification. The center justification makes reading the
article much more difficult.

~~~
joebergeron
Done!

------
tr1ck5t3r
If you read this article, you'll understand just how easy it is to compromise
so many computers. What I have seen infecting my own systems works on
computers built in 2004 as well as new machines, so it's exploiting the design
and implementation of various international standards. The way it works is
that the BIOS loads the malware irrespective of the boot drive order specified
in the BIOS, and then it seems to rewrite the make and model of the hard
drives and CD/DVD devices, which you can only see by pausing the BIOS boot
process on old machines. New PCs with SATA are too quick to even pause the
part of the BIOS boot process which shows the hardware make and model has been
rewritten.

In Ubuntu, for example, whilst booting, Ubuntu starts loading more of the
malware from the metacache, which suggests the hard drive cache may have been
filled by the initial boot loader for the malware, again helping to hide the
malware from detection. When booting different OSes, it works with XP, Ubuntu,
Parted Magic, Kali, Tails and others. The OSes seem to actively hide the
malware if you use a hex editor to scan the drive or infected files, so over
time, some of that open source code has become compromised, and let's not
forget the Dirty Cow exploit has been around since 2007, potentially making it
possible to hack many different packages that make up the core Linux OS. It
also seems to use SNMP to hack into managed switches, so whether this is
getting into the Stuxnet/DuQu/DuQu2 territory remains to be seen, but I would
suggest it is. This then narrows it down to one country, because in maths it's
possible to calculate unknowns.

Remember, software always does what it's told, unlike humans, and herein lies
its weakness!

~~~
rasz_pl
> bios loads the malware irrespective of the boot drive order specified in the
> bios

> rewrite the make and model of the hard drives

ummm, no

