Writing a Tiny x86 Bootloader (joebergeron.io)
177 points by joebergeron 180 days ago | 45 comments



Random note, but one of the linked resources (http://www.ctyme.com/intr/rb-0097.htm) can show NSFW ads. It didn't get me in trouble, but it still makes me nervous on a work machine (I probably shouldn't be reading HN there either, for that matter).

The link is on the word "here" in the text "See the spec here."

I usually use adblock but I used the disable-all-extensions extension to debug something and forgot to turn them back on.


It's sad that most of these "bootloader" guides don't take you into actually boot-loading something. No searching a file system. No loading another binary. No setting up 32-bit mode. Nothing, just printing text. That's all most of these cover.


Fair enough, here is a google search term that seems to turn up some useful bootloader source code:

  "0x2098" "lgdt"
Please chime in with other links to great bootloader tutorials!


He uses BIOS calls, and that's what you need to learn to be able to write a bootloader; the rest is just reading up some documentation about what you are going to load. You can't fit a filesystem parser into 1 sector and have anything other sensible in there... From the bootloader you load sectors from disk, using BIOS generally (on x86 at least), and you can understand how to do it from this tutorial.
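
For readers wondering what "load sectors from disk using BIOS" involves: the classic read call (INT 13h, AH=02h) addresses the disk by cylinder, head and sector rather than by a flat sector number. Below is a minimal Python sketch of that arithmetic only. The function names are made up for illustration, and the default geometry assumes a 1.44 MB floppy (18 sectors per track, 2 heads).

```python
# Sketch: convert a flat sector number (LBA) into the CHS values that
# BIOS INT 13h, AH=02h expects. Function names are hypothetical.
# Default geometry is an assumption: 1.44 MB floppy, 18 sectors/track, 2 heads.

def lba_to_chs(lba, sectors_per_track=18, heads=2):
    cylinder, remainder = divmod(lba, sectors_per_track * heads)
    head, sector_index = divmod(remainder, sectors_per_track)
    return cylinder, head, sector_index + 1  # BIOS sector numbers are 1-based


def int13_chs_registers(cylinder, head, sector):
    # CH holds the low 8 bits of the cylinder, CL holds the sector in bits 0-5
    # and the top two cylinder bits in bits 6-7, DH holds the head.
    return {
        "ch": cylinder & 0xFF,
        "cl": (sector & 0x3F) | ((cylinder >> 2) & 0xC0),
        "dh": head,
    }


if __name__ == "__main__":
    for lba in (0, 1, 17, 18, 36):
        print(lba, lba_to_chs(lba))
```

In a real boot sector this division is done in 16-bit assembly before each `int 0x13`; the sketch is only meant to show what values end up in the registers.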


> the rest is just reading up some documentation about what you are going to load

That documentation is usually.... lacking

> You can't fit a filesystem parser into 1 sector and have anything other sensible in there

You don't use a single sector. You use a 2-stage bootloader that uses a reserved FAT sector. In FAT12 (at least) you can specify the number of reserved sectors when formatting. A reserved sector comes directly after the first sector. You can do a full FAT12 implementation in assembly in two sectors.

> using BIOS generally (on x86 at least), and you can understand how to do it from this tutorial.

You're going to need to do a lot more reading. Even just looking up the interrupts you need is a task. I've found nothing that does a good job explaining steps 1, 2, and 3 of how to get all the way to booting from a formatted disk.
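
As a rough illustration of the reserved-sector idea, here is a Python sketch that reads the BIOS Parameter Block from a FAT12/FAT16 boot sector and works out where each region starts. The field offsets are the standard ones (the BPB begins at byte 0x0B), but the function names and the example floppy values are assumptions for illustration. Note that the reserved-sector count includes the boot sector itself, so a count of 2 leaves one extra sector for a second stage.

```python
import struct

BPB_OFFSET = 0x0B          # BIOS Parameter Block starts here in a FAT boot sector
BPB_FORMAT = "<HBHBHHBH"   # little-endian fields, 13 bytes in total


def fat_layout(boot_sector):
    """Return the starting sector of each region (hypothetical helper)."""
    (bytes_per_sector, _sectors_per_cluster, reserved_sectors, num_fats,
     root_entries, _total_sectors, _media, sectors_per_fat) = struct.unpack_from(
        BPB_FORMAT, boot_sector, BPB_OFFSET)
    root_dir_start = reserved_sectors + num_fats * sectors_per_fat
    # Each root directory entry is 32 bytes; round up to whole sectors.
    root_dir_sectors = (root_entries * 32 + bytes_per_sector - 1) // bytes_per_sector
    return {
        "second_stage_sectors": reserved_sectors - 1,  # reserved count includes sector 0
        "fat_start": reserved_sectors,
        "root_dir_start": root_dir_start,
        "data_start": root_dir_start + root_dir_sectors,
    }


def example_boot_sector(reserved_sectors=2):
    """Build a blank sector with 1.44 MB floppy values (assumed for the demo)."""
    sector = bytearray(512)
    struct.pack_into(BPB_FORMAT, sector, BPB_OFFSET,
                     512, 1, reserved_sectors, 2, 224, 2880, 0xF0, 9)
    return bytes(sector)


if __name__ == "__main__":
    print(fat_layout(example_boot_sector()))
```

Finding a file then means scanning the 32-byte entries from `root_dir_start` for a matching 8.3 name and following its cluster chain through the FAT, which is the part that has to be squeezed into the second stage.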


You sound pretty well versed in this area, would you be interested in writing something up? Don't take this as a dig, I know the value of free time and how much that free time is lacking for most of us, I just thought I'd toss it out there.


The best I can do is load up a second stage. Searching for the COM file is where I get lost. I've also been trying to do it in C for months and I haven't been able to finish for various reasons.


> You're going to need to do a lot more reading. Even just looking up the interrupts you need is a task. I've found nothing that does a good job explaining steps 1, 2, and 3 of how to get all the way to booting from a formatted disk.

There is unlikely to be a single thorough body of documentation on this sort of thing. IMHO, the best resource would be the numerous open source bootloaders that are available.


There isn't a lot there to document. At this point it counts as a part of our civilization. It's even on Wikipedia:

* https://en.wikipedia.org/wiki/Master_boot_record

* https://en.wikipedia.org/wiki/BIOS_interrupt_call

There isn't much call for FAT12 these days...


Not saying he is wrong or anything, but he seems to have missed that modern machines ship with UEFI, which tries to solve this problem at the firmware level.

That said: if you're curious and want to learn, I have no objections to digging into stuff, even "obsolete" stuff like BIOS boot :)


Thanks for giving it a look! You're certainly right about UEFI. I figured it would be easier and informative enough in the long run to just write a simple MBR bootloader instead of an EFI-format one.

Either way, it was definitely fun.


It surely was fun, and I personally liked the way you detailed and commented the code, but that is not a bootloader, it is a bootload, i.e. something that is loaded at boot, executes some instructions and doesn't actually load anything. In the good ol' days there were tens of similar bootsectors (to be used on non-bootable floppies) that could display messages or draw ASCII art. If anyone wants to play with actual MBR bootloaders, a basic MBR is the one by Peter Anvin (and Andrea Mazzoleni) in makebootfat:

http://www.advancemame.it/doc-makebootfat

and for an example of how much stuff you can actually put inside some 440 bytes, the base reference is mbldr:

http://mbldr.sourceforge.net/


What would be the equivalent of this in UEFI? I mean is there a one page tutorial on booting with UEFI by writing a few lines of assembly?


I found this article previously on HN: https://news.ycombinator.com/item?id=12238498

You may find it interesting.

Basically you won't bother writing a bootloader with UEFI, since it already provides that feature. Instead you'll get right at working on your OS.


I actually looked this up after reading this comment. Best I could find after a quick google search was this: http://x86asm.net/articles/uefi-programming-first-steps/

which is certainly not an easy tutorial to follow. Seems like more robustness tends towards a greater barrier to entry.


Once you outgrow your boot sector space, writing a BIOS based bare metal project gets a lot harder as you need to start loading stuff from disk with more BIOS calls.

It's a lot easier and nicer to use either UEFI or a Multiboot-protocol bootloader (GRUB, or qemu/bochs built-in bootloader). That also means you will be starting in long mode or protected mode.

In order to work with bare metal efficiently, you need to get a debuggable and bootable ELF image up and running as soon as possible. Once you can attach a remote GDB debugger to Qemu, you get things done much faster than using the Qemu/Bochs monitor/debugger.

Anything more complex than a hello world example is easier with UEFI or multiboot than BIOS boot sectors. You will need a linker script and a build script but things will be much easier after the first steps.
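
To give a feel for how little the Multiboot route asks of you: a Multiboot 1 loader such as GRUB looks for a 12-byte header (magic, flags, checksum) near the start of the image, where the three 32-bit fields must sum to zero modulo 2^32. The Python sketch below only demonstrates that arithmetic; the function name is made up, and in a real project the header would normally be emitted from assembly and placed by the linker script.

```python
import struct

MULTIBOOT_MAGIC = 0x1BADB002  # Multiboot 1 header magic


def multiboot_header(flags=0):
    """Pack magic, flags and checksum as three little-endian 32-bit words.

    Hypothetical helper: shows the checksum rule only.
    """
    checksum = -(MULTIBOOT_MAGIC + flags) & 0xFFFFFFFF
    return struct.pack("<III", MULTIBOOT_MAGIC, flags, checksum)


if __name__ == "__main__":
    header = multiboot_header()
    magic, flags, checksum = struct.unpack("<III", header)
    print(hex(magic), hex(flags), hex(checksum))
    print((magic + flags + checksum) & 0xFFFFFFFF)  # 0 when the header is valid
```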

I made this pull request for someone else's bare metal project to add debuggable ELF images. It's a small but practical example:

https://github.com/Overv/MineAssemble/pull/5


Why he is using 32-bit esp/ebp registers in 16-bit environment? There are 16-bit sp/bp registers. And there is 0xFFFF limit on segment descriptors in real mode anyway.


I've since updated the article and code on Github to reflect this discussion. Thanks HN as always for teaching me something new.


> Why he is using 32-bit esp/ebp registers in 16-bit environment?

It might be tooling. A 32-bit assembler (like gas) will turn `mov %esp,%ebp` into `89 e5` while `mov %sp,%bp` becomes `66 89 e5` -- the former being correct when actually in 16-bit.

> And there is 0xFFFF limit on segment descriptors in real mode anyway.

http://wiki.osdev.org/Unreal_Mode


> A 32-bit assembler (like gas) will turn `mov %esp,%ebp` into `89 e5` while `mov %sp,%bp` becomes `66 89 e5` -- the former being correct when actually in 16-bit.

He is using nasm with "bits 16" directive, so 66 prefix will be emitted for "mov ebp,esp". gas with 32-bit target is totally unrelated to this discussion.

> http://wiki.osdev.org/Unreal_Mode

So what? MBR runs in 16-bit real mode.


> He is using nasm with "bits 16" directive

So he is!

I had to download nasm to check, but that sounds useful.


geocar, you're wrong, mkup is right. I've checked a few disassembled locations of his binary file (which is identical to the binary I can produce with nasm):

https://onlinedisassembler.com/odaweb/z1mMaYSk/0

    :0000001f    6683c402 add $0x2,%esp
    :00000025    6655     push %ebp
    :00000027    6689e5   mov %esp,%ebp

There's a 66h "operand-size override" prefix present in the binary which is not needed in 16-bit code. The proper instructions would be "push bp" etc.
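
The prefix rule behind this exchange can be sketched in a few lines of Python: the opcode bytes are the same for the 16-bit and 32-bit forms, and the 0x66 prefix is added only when the operand size differs from the default of the mode the code is assembled for. The `encode` helper is hypothetical and handles nothing beyond this one rule.

```python
OPERAND_SIZE_PREFIX = 0x66


def encode(body_hex, operand_bits, mode_bits):
    """Prepend 0x66 when the operand size differs from the mode default.

    Hypothetical helper: body_hex is the already-encoded opcode + ModRM bytes.
    """
    if operand_bits not in (16, 32) or mode_bits not in (16, 32):
        raise ValueError("only 16-bit and 32-bit sizes are modelled here")
    body = bytes.fromhex(body_hex)
    if operand_bits != mode_bits:
        return bytes([OPERAND_SIZE_PREFIX]) + body
    return body


if __name__ == "__main__":
    # 89 e5: "mov r/m, r" with ModRM 0xE5 (reg = SP/ESP, r/m = BP/EBP)
    print(encode("89e5", 16, 16).hex())  # mov bp, sp   under "bits 16"
    print(encode("89e5", 32, 16).hex())  # mov ebp, esp under "bits 16"
    print(encode("89e5", 32, 32).hex())  # mov ebp, esp under "bits 32"
```

This matches the listing above: with "bits 16", the 32-bit register forms pick up the extra 66h byte, while `mov bp, sp` and `push bp` would not.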


It's a cool example boot sector program, but it doesn't actually, uh, load anything. Does it still count as a bootloader?


I wrote a more elaborate bootloader for a job about 15 years ago - it was a fun challenge to squeeze a lot of functionality into such a small, limited package.

At the time, bootloaders could add additional functionality by staying resident and hooking various interrupts to provide services or work around bugs in the BIOS. For instance, my bootloader would translate sector mappings that would allow WinPE to boot from USB drives that would not otherwise work due to (idiotic) limitations in some BIOSes.

This is a fun example but doesn't really go very far. Handling memory and reading additional code from disk are other topics that are required for a full bootloader, but this was a nice trip down memory lane.


Does anyone know of any resources for doing stuff like this on ARM? I have a board with u-boot, but I can't find any information about how to do even a hello world type program.


32-bit ARM is a bit awkward because traditionally there is no equivalent of the x86 BIOS APIs for doing things like printing output, and the code started by a bootloader is expected to talk directly to the hardware (serial port etc), which is different on every board. If your board uses UEFI then you can use those APIs, or if your u-boot was built with CONFIG_API that might provide something useful. Otherwise even 'hello world' will require talking to the serial port and adjusting any tutorial code you find to the specifics of your board.


If you want to save some bytes in the bootloader, use a BIOS interrupt to change the video mode, which also clears the screen ^^ It takes very few instructions.
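
A sketch of how small that is, written as Python that assembles the bytes by hand: setting video mode 03h (80x25 text) through INT 10h, AH=00h clears the screen as a side effect and costs five bytes. The `build_boot_sector` name and the halt loop after the call are illustrative choices, not taken from the article.

```python
def build_boot_sector():
    """Return a 512-byte boot sector that clears the screen and halts.

    Hypothetical helper: the machine code below is hand-assembled 16-bit x86.
    """
    code = bytes([
        0xB8, 0x03, 0x00,  # mov ax, 0x0003  (AH=00h set video mode, AL=03h 80x25 text)
        0xCD, 0x10,        # int 0x10        (mode set also clears the screen)
        0xF4,              # hlt
        0xEB, 0xFD,        # jmp short -3    (back to the hlt)
    ])
    # Pad to 510 bytes and append the 0x55 0xAA boot signature.
    return code.ljust(510, b"\x00") + b"\x55\xAA"


if __name__ == "__main__":
    sector = build_boot_sector()
    print(len(sector), sector[:5].hex(), sector[-2:].hex())
```

Writing the returned bytes to a file should give an image that an emulator such as QEMU can boot as a raw disk.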


It's been forever since I've thought about the segmented memory model. Can someone explain this math to me:

The OP states:

"Since our code resides at 0x7C00, the data segment may begin at 0x7C0"

0x7C00 in decimal is 31744 and 0x7C0 in decimal is 1984 and a segment is 16 bits. I understand that the CPU is hardwired to do a JMP to 0x7C00. But how did they arrive at choosing 0x7C0 for the data segment?


0000:7c00 (segment 0, offset 0x7c00) and 07c0:0000 (segment 0x7c0, offset 0) are at the same physical address of 0x00007c00, so the processor will see the same bytes each way. The latter's more useful in 16-bit mode, though, as it allows you to use offsets 0x0000 to 0xffff to access the 64kB of data starting at physical address 0x00007c00.
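
The underlying arithmetic is that a real-mode physical address is the segment shifted left by four bits (multiplied by 16) plus the offset. A minimal Python sketch, with a made-up function name and ignoring the 1 MB wrap-around that depends on the A20 line:

```python
def physical_address(segment, offset):
    """Real-mode address translation: segment * 16 + offset."""
    return (segment << 4) + offset


if __name__ == "__main__":
    print(hex(physical_address(0x0000, 0x7C00)))  # segment 0, offset 0x7c00
    print(hex(physical_address(0x07C0, 0x0000)))  # segment 0x7c0, offset 0
    print(hex(physical_address(0x07C0, 0xFFFF)))  # last byte reachable from 0x7c0
```

So 0x7C0 is simply 0x7C00 divided by 16: the segment value that makes offset 0 line up with the first byte of the loaded boot sector.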

So typically the first thing you do on entry to the boot sector is to do a far jump to change code segments. But you don't have to; if you're immediately switching to 32-bit mode it may not be worth it.

Here's my boot sector --- everyone's written one. Mine's intended to load and run raw tiny mode executables on floppy.

https://github.com/davidgiven/ack/blob/default/plat/pc86/boo...


Thanks for the clarification and your link. Maybe I am reading this wrong then?

"Since our code resides at 0x7C00, the data segment may begin at 0x7C0"

If those are both the same address how can the code segment and the data segment occupy the same address?

Thanks.


Yup, they can. That way your code and data share the same address range, so you need to make sure that the offsets within the address range don't overlap.

It's also possible to configure the system so that your code lives in one segment, and your data and stack live in another. That way, you get 64kB for code and another 64kB for data.

We have to put the stack in the same segment as the data because C requires the stack to be addressable; if you're not working in C, you can put the stack in a third segment and get a bit more space.

Of course, if you're willing to use pointers which contain the segment as well as the offset, so 32 bits wide, you can use multiple segments, but that raises all sort of complexity that's not really worth thinking about these days.


Ah OK, so the segments have the same address, just different offsets from that address. Is that correct?

You mention:

"We have to put the stack in the same segment as the data because C requires the stack to be addressable"

Are symbols only "reachable" if they are within the base and limit of the same memory segment then? If this was the case I would think that the code and stack must be in the same segment. Why is it the data segment?

Cheers.


I checked the blog but came away without anything new. I would like to write a bootloader for microcontrollers like PIC and Atmel. Where should I start?


Existing implementations and/or the datasheet of the controller.


Both would be great. I like picking up new ideas and concepts from the fundamental level. Please send me links or emails.


This looks like it would go together well with the article yesterday of the little OS handbook, which used GRUB for its bootloader to focus on the OS part.


Please use left justification. The center justification makes reading the article much more difficult.


Done!


If you read this article, you'll understand just how easy it is to compromise so many computers. What I have seen infecting my own systems works on computers built in 2004 as well as on new machines, so it's exploiting the design and implementation of various international standards. The way it works is that the bios loads the malware irrespective of the boot drive order specified in the bios, and then it seems to rewrite the make and model of the hard drives and CD/DVD devices, which you can only see by pausing the bios boot process on old machines. New PCs with SATA are too quick to even pause the part of the bios boot process that shows the hardware make and model has been rewritten.

In Ubuntu, for example, whilst booting, Ubuntu starts loading more of the malware from the metacache, which suggests the hard drive cache may have been filled by the initial boot loader for the malware, again helping to hide the malware from detection. It works when booting different OS's: XP, Ubuntu, Parted Magic, Kali, Tails and others. The OS's seem to actively hide the malware if you use a hex editor to scan the drive or infected files, so over time, some of that open source code has become compromised, and let's not forget the Dirty Cow exploit has been around since 2007, potentially making it possible to hack many different packages that make up the core Linux OS. It also seems to use SNMP to hack into managed switches, so whether this is getting into Stuxnet/DuQu/DuQu2 territory remains to be seen, but I would suggest it is. This then narrows it down to one country, because in maths it's possible to calculate unknowns.

Remember, software always does what it's told, unlike humans, and herein lies its weakness!


>bios loads the malware irrespective of the boot drive order specified in the bios

>rewrite the make and model of the hard drives

ummm, no


[flagged]


If you're going to respond empathetically to someone who you think needs help, it's important to distinguish that from a personal attack. This comment doesn't do that; indeed it almost seems like you're invoking "psychiatric help" as ammunition in an argument, which would be a bannable offence here.


It's almost a moot point. Given the underlying pathology, any attempt at all to discredit this type of delusion* will necessarily be seen as a part of the conspiracy, and hence as an attack. No amount of empathy will help.

But I certainly see what you mean; I was probably too quick to use that stock phrase. My intent here wasn't to argue, because I know that's impossible; it was more to identify the pathology to others. Would my comment have been fine without the last line? (I'm assuming that you were the one to flag it.)

*: and I mean that in the psychiatric sense, not as an insult


I think you'd need both to drop the last line and reword the first line, because otherwise it's too easy to read as invoking a psychiatric category as snark.

There's probably a way of making this point that's fine, but it would require pre-emptively distinguishing your comment from the obvious misinterpretations. Which is tedious, but such is the way communication on a large forum needs to work, or else people just react and attack.


Except I have isolated the code and can reproduce it on demand, and have the photos & videos as evidence, so nice try.

However having just read this https://www.facebook.com/dragosr/posts/10151655183445588 I can see he has explained much of what I was witnessing on some systems as well.

Can't rule out a modern-day version of one of these https://en.wikipedia.org/wiki/Phoebus_cartel considering the Windows MSR partition I have copies of, which effectively stripes the Windows partition as well.

Some people are desperate to keep this quiet though, so perhaps suggesting what you have is your way to introduce doubt into an argument which is after all a valid debating technique.


With all due respect, everything you said sounds like some kind of conspiracy theory. If you have isolated this case, write a blog post on how to reproduce it and post said photos & videos, so people might at least take the time to read it. I'm sure with all the experts here on HN somebody would provide an explanation. There are a hell of a lot of processes happening at boot time on a modern system; if one does something you don't understand, it doesn't necessarily mean malware.

"The OS's seem to actively hide the malware if you use a hex editor to scan the drive or infected files, so over time, some of that open source code has become compromised"

This part, in particular, I find hard to believe. You can inspect an open source system and compile everything yourself. Did you try to reproduce this behavior on a minimal system (e.g. BeagleBone) or in QEMU?



