
How Does an Intel Processor Boot? - bytefire
https://binarydebt.wordpress.com/2018/10/06/how-does-an-x86-processor-boot/
======
burfog
This part is a bit of a mess:

"16-bit Real Mode with instruction pointer pointing to address 0xffff.fff0, the
reset vector. In this initial mode, the processor has first 12 address lines
asserted, so any address looks like 0xfffx.xxxx. This fact combined with how
addressing using segment selector (SS) register works, allows the CPU to
access instruction at reset vector address 0xffff.fff0"

SS is the Stack Segment register. CS is for code.

In real mode, the instruction pointer is only 16 bits wide and cannot hold a
value in excess of 0xffff.

Ignoring those issues, the explanation still doesn't match what I've seen
before in the documentation. If things have changed, when did that happen? The
old explanation:

The CS base is set to something like -16, that is with all but the lower 4
bits set. This covers all of physical address space, with any higher bits just
being ignored. The instruction pointer is set to 0. The result is execution
that starts 16 bytes below the first address that is beyond the end of the
physical address space. For example, with 44-bit physical addresses this would
be at 0x00000ffffffffff0.
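A rough Python sketch of the arithmetic in this older explanation. It models the "CS base = -16, IP = 0, higher bits ignored" description only; the function name is made up for illustration, and the Intel SDM text quoted further down describes a different split between base and IP.

```python
# Models the description above: CS base has every bit set except the low 4,
# IP is 0, and address bits beyond the physical address width are ignored.

def reset_address(phys_addr_bits: int) -> int:
    cs_base = -16                       # all bits set except the lower 4
    ip = 0
    mask = (1 << phys_addr_bits) - 1    # drop bits above the physical width
    return (cs_base + ip) & mask

for bits in (32, 36, 44):
    print(f"{bits}-bit physical: {reset_address(bits):#018x}")
```

With 44 address bits this prints 0x00000ffffffffff0, the example given above; with 32 bits it gives the familiar 0xfffffff0.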

~~~
bytefire
yes CS not SS. i should fix that.

how the CPU addresses 0xffff.fff0 is not exactly specified in the post.
actually the CS register is loaded with 0xf000, and normally this would yield
a segment base address of 0x000f.0000 (CS left-shifted by 4 bits). but on
a reset, as the post mentions, the 12 most significant address lines are
asserted, so the base address ends up being 0xffff.0000. these address lines
remain asserted until a long (far) jump is made, after which they are
deasserted and the normal CS base calculation resumes.

the instruction pointer contains 0xfff0 (-16 as a 16-bit value), so the resulting address is:

base address + IP = 0xffff.0000 + 0xfff0 = 0xffff.fff0

i am not sure if this is worth adding to the post but it is definitely useful.
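The two base computations in this reply can be checked with a few lines of Python. This is only a sketch of the arithmetic; the helper name `real_mode_base` is made up for illustration.

```python
def real_mode_base(selector: int) -> int:
    """Normal real mode: the base is the 16-bit selector shifted left by 4."""
    return (selector & 0xFFFF) << 4

CS_SELECTOR = 0xF000
IP = -16 & 0xFFFF                 # -16 as a 16-bit value, i.e. 0xFFF0
RESET_BASE = 0xFFFF0000           # hidden base loaded on hardware reset

normal = real_mode_base(CS_SELECTOR) + IP   # 0xF0000 + 0xFFF0
reset = RESET_BASE + IP                      # 0xFFFF0000 + 0xFFF0

print(hex(IP))       # 0xfff0
print(hex(normal))   # 0xffff0
print(hex(reset))    # 0xfffffff0
```

0xffff0 is where the same CS:IP pair would point under ordinary real-mode rules, just below the 1 MiB boundary; only the special reset base moves it to the top of the 4 GiB space.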

~~~
atq2119
I recall reading that it's not that those 12 bits are explicitly asserted, but
rather that the CS descriptor after reset is in an "unreal mode". After all,
x86 segment descriptors consist not just of their numeric value, but also of a
base address, segment size, and privilege information.

So at reset, CS is set to a descriptor whose numeric value is 0xf000 and whose
base address is 0xffff0000, or something to that effect. All the rest follows
naturally -- there's no special case logic that asserts lines of the address
bus until the first long jump, it's simply that the reset value of the CS
descriptor is rather magical, and that long jumps by their nature load a new
CS segment descriptor which isn't magical.
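A toy Python model of this descriptor-cache view. The class and method names are invented for illustration, and the limit and access-rights fields that real hardware also caches are omitted. The point is that address generation always uses the hidden base, and only a reload of CS recomputes it.

```python
from dataclasses import dataclass

@dataclass
class SegmentRegister:
    selector: int  # visible part
    base: int      # hidden part (descriptor cache), used for address generation

    def load_real_mode(self, selector: int) -> None:
        """Real-mode load of the register (e.g. by a far jump): recompute the base."""
        self.selector = selector & 0xFFFF
        self.base = self.selector << 4

def linear_address(cs: SegmentRegister, ip: int) -> int:
    # Address generation uses the hidden base, never the selector itself.
    return cs.base + (ip & 0xFFFF)

# Reset state as given in the Intel SDM text quoted below.
cs = SegmentRegister(selector=0xF000, base=0xFFFF0000)
print(hex(linear_address(cs, 0xFFF0)))   # 0xfffffff0

# A near jump changes only IP, so the special base is kept.
print(hex(linear_address(cs, 0x0100)))   # 0xffff0100

# A far jump reloads CS (example target F000:E05B); the base becomes F0000h.
cs.load_real_mode(0xF000)
print(hex(linear_address(cs, 0xE05B)))   # 0xfe05b
```

Note that the selector value is 0xF000 both before and after the far jump; only the hidden base changes.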

~~~
bytefire
this is interesting! i was not aware of how this logic is implemented, i.e.
how the initial state with the 12 most significant address bits set comes
about, but thanks for enlightening me.

~~~
hyperman1
I looked it up:

[https://software.intel.com/en-us/articles/intel-sdm#nine-
vol...](https://software.intel.com/en-us/articles/intel-sdm#nine-volume)

Get volume 3A and read section 9.1.4 on page 315. The text is quite readable:

    
    
      The address FFFFFFF0H is beyond the 1-MByte addressable range of the processor while in real-address mode. The
      processor is initialized to this starting address as follows. The CS register has two parts: the visible segment
      selector part and the hidden base address part. In real-address mode, the base address is normally formed by
      shifting the 16-bit segment selector value 4 bits to the left to produce a 20-bit base address. However, during a
      hardware reset, the segment selector in the CS register is loaded with F000H and the base address is loaded with
      FFFF0000H. The starting address is thus formed by adding the base address to the value in the EIP register (that
      is, FFFF0000 + FFF0H = FFFFFFF0H).
    

Any change to CS reverts this to normal real mode operation. So near jumps are
OK, far jumps or interrupts are not.

~~~
Stratoscope
Speaking of readability, here's a readable copy of that quote:

> The address FFFFFFF0H is beyond the 1-MByte addressable range of the
> processor while in real-address mode. The processor is initialized to this
> starting address as follows. The CS register has two parts: the visible
> segment selector part and the hidden base address part. In real-address
> mode, the base address is normally formed by shifting the 16-bit segment
> selector value 4 bits to the left to produce a 20-bit base address. However,
> during a hardware reset, the segment selector in the CS register is loaded
> with F000H and the base address is loaded with FFFF0000H. The starting
> address is thus formed by adding the base address to the value in the EIP
> register (that is, FFFF0000 + FFF0H = FFFFFFF0H).

(Don't use indentation to format a block quote, only use it for code
listings.)

------
garganzol
Hey, thanks for the great article on an intricate topic. It is nice to see the
resurgence of WordPress blog posts recently. Back in the day it was the
primary source of sacred knowledge for me and many other people around the
world.

------
hlandau
It's a bit light on details in the ME and pre-real mode stages. For example,
it omits to mention the 'Authenticated Code Module' which now executes before
the BIOS reset vector.

~~~
ajross
Lots of those details are vendor-specific. Really, the specified behavior
exists in the modern world not to enable "boot" (which, as you point out, is
mostly the job of firmware running on the ME and not the application CPUs),
but to clearly define the initialization sequence for SMP startup (i.e. how
the auxiliary CPUs start; yes, that's right, they start in real mode) and
some other minutiae like mode switching for legacy interaction with BIOS
calls.
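As a sketch of what "they start in real mode" means for the auxiliary CPUs: in Intel's MP startup sequence the startup IPI (SIPI) carries an 8-bit vector VV, and the receiving processor begins executing in real mode at physical address 000VV000h. The function name below is hypothetical; the arithmetic follows that rule.

```python
def sipi_entry(vector: int) -> tuple[int, int, int]:
    """Map an 8-bit startup-IPI vector VV to (CS selector, IP, physical address).

    The application processor begins in real mode at 000VV000h, i.e. VV00:0000.
    """
    if not 0 <= vector <= 0xFF:
        raise ValueError("the SIPI vector is an 8-bit value")
    cs_selector = vector << 8
    ip = 0
    physical = (cs_selector << 4) + ip   # ordinary real-mode base + IP
    return cs_selector, ip, physical

cs, ip, addr = sipi_entry(0x08)
print(f"{cs:04X}:{ip:04X} -> {addr:#x}")   # 0800:0000 -> 0x8000
```

Because VV is only 8 bits, the entry point must be a 4 KiB-aligned page below 1 MiB (0x00000 to 0xFF000), which is why an OS typically places a small real-mode trampoline there before waking the other CPUs.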

------
sebazzz
Doesn't this article confuse BIOS and UEFI? UEFI starts the bootloader in
32-bit mode, whereas the classic BIOS still jumps to the bootloader in
16-bit mode.

~~~
hyperman1
That's a hard question to answer. On the original PC, ROM BIOS (Basic I/O
System) was the name for both the (EEP)ROM-based firmware sitting at address
F0000 and the INT based interface it provided to the OS.

This included how the system handed off control to the OS. BIOS just loaded the
first disk sector and executed whatever it found there. MBR-based partition
tables were a DOS convention; the BIOS couldn't care less what the first disk
sector did once it was in control.

When UEFI, the new boot interface, was invented, we needed a name for the old
boot conventions. So we called them BIOS.

It's hard to claim the article confuses anything if the original word BIOS has
such a confused meaning to begin with. If you say BIOS=PC firmware except the
option ROMs, the article is correct.

~~~
JdeBP
The use of "BIOS" to name the PC, PC/AT, PC98, and suchlike firmwares long
pre-dates the existence of EFI, and was _not_ a reaction to it.

And the name is even more confusing than you paint it to be. The "BIOS" was
_also_ the bottom-half of MS/PC/DR-DOS, contained in IO.SYS in (pre version 6)
MS-DOS and in IBMBIO.COM in PC-DOS and (post version 3) DR-DOS.

~~~
hyperman1
To clarify: The name BIOS was used for the very first PC, but there was no
clear distinction made (or necessary) between firmware, boot protocol,
interface, ...

When the old boot protocol needed a name, it got BIOS. This situation arose as
reaction to the existence of the new UEFI boot protocol.

AFAIK, IO.SYS/IBMBIO.COM was the interface used by DOS to the hardware (some
kind of HAL). Microsoft seemed to think every PC-clone vendor would implement
its own firmware, and they would have to port IO.SYS for every platform.
Happily, after (I think) Compaq reverse engineered the IBM BIOS, this turned
out to be unnecessary: IO.SYS was only ever implemented for BIOS.

~~~
JdeBP
Repeating that this was a reaction to EFI does not make it the case. That
remains an ahistorical explanation.

And this despite the change of tack from naming the firmware to naming
the boot mechanism. The name used for the boot mechanism, also adopted long
before EFI existed, was a "boot record" or "boot sector", subdivided by type
into _volume_ boot records and _master_ boot records, terminology that goes
back to the 1980s. The boot protocol was not actually named "BIOS" at all.

IO.SYS/IBMBIO.COM was the Basic Input/Output System, nomenclature (alongside
the names of the other parts of the operating system: the BDOS, the command
processor, and the housekeeping utilities) that MS-DOS got from CP/M. It was
one of two things called that, the other being the machine firmware.

Neither was in any way influenced by something that did not exist until
decades later. And although there was confusion between the two, it was not
some generalized confusion about parts of the system in general. "BIOS" was
not a name for a boot sector/record, even though one of the things _contained
within_ a boot record was a BIOS Parameter Block, which people had to
regularly explain meant "the _other_ thing that is called a BIOS". And people
regularly distinguished in the 1980s between such things as "BIOS services"
and the "ROM BIOS".

~~~
hyperman1
Let's try and find out:

[https://www.pcjs.org/pubs/pc/reference/ibm/5150/techref/](https://www.pcjs.org/pubs/pc/reference/ibm/5150/techref/)

First external PDF, pg 169 first talks about the ROM resident Basic I/O System
(BIOS). From that point on, the text refers to this as BIOS, and says that a
complete listing is provided in appendix A, which contains nothing concerning
IO.SYS. So it seems IBM considered only the ROM-based part BIOS, without the
IO.SYS part on the diskette. Even ROM BIOS seems to be a misnomer according to
this document. But, as the BIOS sits in ROM and provides services, it is easy
to see people talking about ROM BIOS and BIOS services.

As the 5150 has no hard disk, we need the 5160 manual for the HD boot
protocol: pg 417 and 419 of
[https://www.pcjs.org/pubs/pc/reference/ibm/5160/techref/](https://www.pcjs.org/pubs/pc/reference/ibm/5160/techref/).
It reads the first sector from the HD, and checks if the last 2 bytes have a
specific value. If yes, it executes whatever it found there.
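A simplified Python sketch of that check, using the conventional PC values: signature bytes 55h and AAh at offsets 510 and 511, and load address 0000:7C00. Memory is simulated with a bytearray and the function names are made up for illustration. Note that nothing in the sector is interpreted beyond the signature.

```python
from typing import Optional

SECTOR_SIZE = 512
LOAD_ADDRESS = 0x7C00   # 0000:7C00, where the boot sector is conventionally loaded

def is_bootable(sector: bytes) -> bool:
    # Signature check: byte 510 must be 55h and byte 511 must be AAh.
    return len(sector) == SECTOR_SIZE and sector[510] == 0x55 and sector[511] == 0xAA

def load_boot_sector(sector: bytes, memory: bytearray) -> Optional[int]:
    """Copy a valid boot sector into memory and return the address to jump to."""
    if not is_bootable(sector):
        return None
    memory[LOAD_ADDRESS:LOAD_ADDRESS + SECTOR_SIZE] = sector
    return LOAD_ADDRESS

ram = bytearray(0x10000)                    # 64 KiB of simulated memory
good = bytes(510) + b"\x55\xAA"
blank = bytes(SECTOR_SIZE)
print(hex(load_boot_sector(good, ram)))     # 0x7c00
print(load_boot_sector(blank, ram))         # None
```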

This de facto defines the MBR as that first sector, without ever calling it
MBR. No partitions or volume boot records exist at the BIOS level; these were
purely DOS conventions. See

The BIOS Parameter Block was a DOS structure in specific file systems like
FAT16. It told DOS how to translate BIOS disk layout (e.g.
Cylinder/Head/Sector) to DOS disk layout. See
[https://en.wikipedia.org/wiki/BIOS_parameter_block](https://en.wikipedia.org/wiki/BIOS_parameter_block)
and note how details vary with DOS versions. Despite its name, the BIOS did
not know or care about this structure.
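To illustrate the kind of translation meant here, a small Python sketch that reads geometry fields from a BPB and converts between CHS and linear sector numbers. It assumes the DOS 3.x layout (bytes per sector at offset 0x0B, sectors per track at 0x18, number of heads at 0x1A, all little-endian words); as noted, details differ between DOS versions. The function names and the sample sector are hypothetical.

```python
import struct

def bpb_geometry(boot_sector: bytes) -> tuple[int, int, int]:
    """Read (bytes/sector, sectors/track, heads) from a DOS 3.x-style BPB."""
    (bytes_per_sector,) = struct.unpack_from("<H", boot_sector, 0x0B)
    sectors_per_track, heads = struct.unpack_from("<HH", boot_sector, 0x18)
    return bytes_per_sector, sectors_per_track, heads

def chs_to_lba(c: int, h: int, s: int, heads: int, spt: int) -> int:
    # Cylinders and heads count from 0, sectors count from 1.
    return (c * heads + h) * spt + (s - 1)

def lba_to_chs(lba: int, heads: int, spt: int) -> tuple[int, int, int]:
    c, rem = divmod(lba, heads * spt)
    h, s0 = divmod(rem, spt)
    return c, h, s0 + 1

# Hypothetical boot sector describing a 1.44 MB floppy: 512 B/sector, 18 spt, 2 heads.
sector = bytearray(512)
struct.pack_into("<H", sector, 0x0B, 512)
struct.pack_into("<HH", sector, 0x18, 18, 2)

bps, spt, heads = bpb_geometry(bytes(sector))
print(bps, spt, heads)                     # 512 18 2
print(chs_to_lba(1, 0, 1, heads, spt))     # 36
print(lba_to_chs(36, heads, spt))          # (1, 0, 1)
```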

UEFI is a new firmware standard for the PC, intended to replace the BIOS. At
this point, the BIOS was used mainly for booting the OS and communicating PC
parameters to the OS. Applications no longer talked to the BIOS directly;
that was a habit from the DOS era. Seeing UEFI and BIOS today as only a boot
mechanism is therefore not a 'change of tack'.

I found no references calling IO.SYS part of the BIOS. Maybe this was a CP/M
convention?

Of course, the word BIOS did not time travel to after the release of UEFI. But
if you need to describe the non-UEFI way with 1 word, what are you going to
do? Call it "that stuff the BIOS did when it wanted to load an OS" is a bit
long, so boot mode={UEFI,BIOS} seems reasonable to me. This nomenclature is
regularly used in the references provided by
[https://en.wikipedia.org/wiki/Unified_Extensible_Firmware_In...](https://en.wikipedia.org/wiki/Unified_Extensible_Firmware_Interface)

------
usr1106
It would be interesting to understand the fundamental difference that lets ARM
systems boot without a BIOS and a BIOS chip. Of course they also need to have
their DRAM configured, but when compiling U-Boot I don't recall any
complication like stackless code. The only catch is that your address space is
extremely small in the beginning.

~~~
zaarn
IIRC from my Embedded Linux course, ARM has a small amount of SRAM embedded in
the chip that is available during boot and is later disabled in favor of the
bigger DRAM chips once the controller is initialized.

U-Boot is responsible for this on some chips.

~~~
usr1106
Yes, ARM has this small amount of SRAM. Having a bit of memory available was
such a trivial concept that it didn't even occur to me that the much more
complicated x86 processor could lack it.

But I guess this might not be the only fundamental difference.

(I have thought about using the SRAM for some optimization later at system
runtime, because it should be incredibly fast. At least if you don't have to
care about power consumption that should be possible, or does the
specification require turning it off?)

~~~
zaarn
Well, as the article mentions, Intel uses the cache during boot as early
SRAM-like memory (often called cache-as-RAM); the difference seems to be
(probably for legacy reasons) that this is not the default.

Most of the boot process is just swinging from one legacy mode to the next
until you hit the modern parts.

------
bogomipz
I have a suggestion and a question. The article makes numerous references to
"memory initialization", but I didn't see anywhere where it's explained what it
means to "initialize" DRAM hardware. If I overlooked this, I apologize.

My understanding has always been that initializing DRAM consisted of two
things:

The BIOS had to enumerate how much physical memory the motherboard had
installed, and then test that the memory was working by writing a bit to
each location and reading it back.

Would this be accurate?

It would also be worth noting that many BIOSes allow you to hit the space bar
to skip memory initialization, presumably because it is somewhat time-consuming.
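The write-and-read-back test described in this comment can be sketched in Python against simulated memory. The class and function names are hypothetical, and the fault is injected artificially; a real firmware test runs on physical memory, not a Python object. Using complementary patterns (0x55/0xAA, 0x00/0xFF) means every bit is checked both as 0 and as 1.

```python
class StuckBitRAM:
    """Simulated RAM in which one (hypothetical) faulty byte has bit 3 stuck at 0."""

    def __init__(self, size: int, bad_addr: int):
        self.cells = bytearray(size)
        self.bad_addr = bad_addr

    def __len__(self) -> int:
        return len(self.cells)

    def __getitem__(self, addr: int) -> int:
        return self.cells[addr]

    def __setitem__(self, addr: int, value: int) -> None:
        if addr == self.bad_addr:
            value &= ~0x08          # the stuck bit never reads back as 1
        self.cells[addr] = value

def pattern_test(mem, patterns=(0x00, 0xFF, 0x55, 0xAA)) -> list[int]:
    """Write each pattern to every location, read it back, return failing addresses."""
    failures = set()
    for pattern in patterns:
        for addr in range(len(mem)):
            mem[addr] = pattern
        for addr in range(len(mem)):
            if mem[addr] != pattern:
                failures.add(addr)
    return sorted(failures)

ram = StuckBitRAM(4096, bad_addr=0x123)
print([hex(a) for a in pattern_test(ram)])   # ['0x123']
```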

~~~
bytefire
no, you didn't overlook it. the article doesn't discuss the actual mechanics of
DRAM init, so thank you for adding this info :) i know there is a process
called memory training whose aim is to arrive at the right parameters for that
DRAM. the way i see it, it is a sort of in-field calibration. boot firmware can
then store those parameters in the BIOS chip and, on the next reboot, just use
those parameters, because memory training is a time-consuming process.
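The store-and-reuse scheme described here can be sketched roughly as follows. This is a hypothetical Python model, not any particular firmware's implementation: a dict stands in for the flash chip, `train_memory` stands in for the slow training step, and a CRC guards against reusing a corrupted copy.

```python
import json
import zlib

def train_memory() -> dict:
    # Hypothetical stand-in for the slow training step; the parameter
    # names and values are made up for illustration.
    return {"read_delay": 42, "write_leveling": 7, "vref": 31}

def load_or_train(flash: dict) -> tuple[dict, bool]:
    """Return (parameters, trained_this_boot), reusing a saved copy if it is intact."""
    saved = flash.get("training_cache")
    if saved is not None:
        payload, crc = saved
        if zlib.crc32(payload) == crc:          # reject corrupted saves
            return json.loads(payload), False    # fast path: skip training
    params = train_memory()                       # slow path: full training
    payload = json.dumps(params, sort_keys=True).encode()
    flash["training_cache"] = (payload, zlib.crc32(payload))
    return params, True

flash = {}                                   # stands in for the BIOS flash chip
print(load_or_train(flash)[1])               # True  (first boot trains)
print(load_or_train(flash)[1])               # False (later boots reuse)
```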

------
asianpopupwork
If the memory initialization code is released as a binary blob, it is called
FSP (which is usually used with Coreboot). Why not just call it by its
original name, memory reference code (MRC)?

~~~
bytefire
you're right, MRC is a major part of FSP, but i think FSP does more work than
just initialise memory. it also performs some CPU and ICH init.

