
8086 Microcode Disassembled - matt_d
https://www.reenigne.org/blog/8086-microcode-disassembled/
======
rkagerer
Chris Gerlinsky did a talk last year on the process he uses to decap chips and
extract their ROM bits with a microscope:
[https://youtu.be/4YpSevQWCX8](https://youtu.be/4YpSevQWCX8)

My favorite line was when he described how one hint you've got the decoding
right might be stumbling upon a recognizable ASCII string, and said "sometimes
the only ASCII text you find is a copyright notice... keep putting those in,
that's great!"

------
derefr
One thing I've always wondered: in what ways does the design of microcode
instruction sets for CISC-ISA CPUs differ from the design of outward-presenting
RISC ISAs?

For example, does microcode tend to have instructions that "half complete" a
transfer-level operation, leaving some registers in an indeterminate state,
under the assumption (which is, in practice, a guarantee) that they'll always
have another ucode op executed after them that does "the rest of" the
operation and so puts things right?

Or, for another example, on CISC CPUs that have a small set of system-visible
registers, and use register renaming to map them to a larger register file
(e.g. x86_64), do the user-visible register names make it into the microcode;
or do the microcode ops function directly on register-file offsets?

To answer these questions, though, we'd probably need a _survey_ of microcode
for various CPUs, including modern ones. So I'm not holding my breath. Unless
an engineer from Intel or the like wants to jump in!

\------

I've also been curious whether there are any lessons in the design of
microcode ISAs that can be applied to the design of abstract-machine bytecode
ISAs.

Right now, most bytecode ISAs are semi-idealized RISC ISAs, with some load-
time specialization of bytecode into VM-specific ops; but rarely is there
recompilation of bytecode into VM-specific microcode. I'm curious why that is.

~~~
monocasa
There are two rough categories of microcode, with different answers for each.

This article is an example of vertical microcode: 15- to 64-bit-wide
instructions that look a lot like regular assembly, just what you'd expect
after the first stage of decoding, with some of the encoding wrinkles
ironed out. Source register always in the same place in the bit pattern, that
kind of stuff. This will normally look a lot like the host ISA (two-address
versus three-address, and so on). Might still have memory RMW ops if it's a
CISC ISA.

Then you have horizontal microcode, which is a wider word; I've seen 64 bits
to 256 bits. It's simply most of the control lines for the processor's uarch
state concatenated together. You'll sometimes have dozens of fields that
always mean the same thing, and when you look at the schematic you clearly say
to yourself "oh, these three bits are this mux, the next four bits are this
mux, this next line is an enable," etc.
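To make "fields that always mean the same thing" concrete, here is a minimal Python sketch of slicing a horizontal microcode word into fixed control fields. The field names, positions and widths in FIELDS are invented for illustration (they are not the 8086's or the 68k's layout); the point is only that every field is decoded the same way on every cycle, with no opcode-dependent interpretation.

```python
# Hypothetical layout for a toy horizontal microcode word (NOT a real chip's).
# Each field: name -> (lowest bit position, width in bits).
FIELDS = {
    "alu_mux": (0, 3),       # three bits: "this mux"
    "bus_src_mux": (3, 4),   # the next four bits: "that mux"
    "reg_write_en": (7, 1),  # one line: "an enable"
}


def decode_horizontal(word: int) -> dict[str, int]:
    """Slice a horizontal microcode word into its fixed control fields."""
    return {
        name: (word >> lo) & ((1 << width) - 1)
        for name, (lo, width) in FIELDS.items()
    }


# enable=1, bus_src_mux=0b0110, alu_mux=0b101
print(decode_horizontal(0b1_0110_101))
# {'alu_mux': 5, 'bus_src_mux': 6, 'reg_write_en': 1}
```

In real hardware there is no decode step at all: each field's bits are simply wired to the mux or enable they control.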

It's not uncommon to have both. The 68k's main microcode looks vertical, and
it then has horizontal microcode for its "nanocode" deeper in the core.

~~~
bogomipz
Could you explain the flow, i.e. how the vertical and horizontal microcode
interact? If a bit pattern is read into the instruction register, is that
pattern used as an index into a translation ROM? Is that translation the
horizontal microcode that drives everything, including further lookups and
translations to micro-ops, i.e. vertical microcode?

------
viler
Outstanding work - never fails to amaze me when people unearth little secrets
like that 4 decades after the fact. That MUL/IMUL/IDIV status bit hack is one
for the ages.

~~~
drfuchs
And he found it by “reading the code”!

------
ajenner
Author here if anyone has any questions.

~~~
derefr
> While most of the unused parts of the ROM (64 instructions) are filled with
> zeroes, there are a few parts which aren't. The following instructions
> appear right at the end of the ROM [...]

Given that they're right at the end — and seemingly intentionally written
there _after_ the rest of the unused space before them was zeroed — might
those bytes be a checksum of the ROM?

~~~
ajenner
I don't think there's anything on the chip that could compute a checksum of
the microcode ROM contents. It could be some kind of copyright message
perhaps, though I don't know how it's encoded and it's only 42 bits long so
there isn't much space for anything meaningful.

~~~
derefr
I would guess that it’s not a runtime-verified checksum, but rather a simple
embedded “sum complement” value, used for ROM-mastering-time integrity
verification.

A sum-complement value is a value computed _from_ some data, such that, when
the data is checksummed with the sum-complement value now embedded _into_ it,
the data will sum to zero. This approach to checksumming is useful, as any
potential verifier just has to throw the image-as-a-whole through the
checksumming algorithm, and ensure that the output is zero. It doesn’t need
one iota of knowledge about _what_ it’s verifying. It doesn’t even need an
extra machine-register to hold the expected checksum.
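A minimal sketch of the sum-complement idea, assuming the simplest possible scheme: an 8-bit additive checksum with the complement byte appended to the end of the image. Real formats differ (wider sums, the complement stored in a header field), and the function names here are hypothetical.

```python
def sum_complement(data: bytes) -> int:
    """Byte that makes the 8-bit sum of data plus that byte equal zero."""
    return (-sum(data)) % 256


def sums_to_zero(image: bytes) -> bool:
    """'Blind' check: needs no expected value, only the image itself."""
    return sum(image) % 256 == 0


payload = bytes([0x12, 0x34, 0x56])
image = payload + bytes([sum_complement(payload)])
print(image.hex(), sums_to_zero(image))  # 12345664 True
```

Note that sums_to_zero takes no second argument: that is the whole point of the scheme.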

These “blind” checksums allow ROM production hardware (programmers, copiers)
to both pre-verify the integrity of the input image, and to post-verify that
it has programmed the image onto a chip successfully. No special container
format for the ROM image is required, nor is the ROM image required to be
structured in any particular way (which is good, because ROMs are used for all
sorts of things, not just code.) The ROM image can be any opaque blob, just as
long as it sums to zero.

In fact, you don’t even need a ROM “image” at all. It’s possible to integrity-
verify a programmed ROM “against itself”; and thus, a hand-programmed ROM
(e.g. an EEPROM you programmed in your office) can be sent to the duplication
facility to serve as the reference from which mask-ROM masks will be
generated. The data on the EEPROM can be trusted, because it sums to zero. And
the mask ROMs themselves can be checked for flaws by seeing whether _they_ sum
to zero.

For smaller-scale ROM distribution, ROM-to-PROM bulk copiers are used. These
copiers can be made to both pre-verify the source, and to post-verify the
programmed copies. Using this approach to checksumming, the copier can avoid
having to verify the source “against” the destination, instead only needing to
verify the source once, and then verify the destinations against themselves.
This both speeds up verification and allows for the use of simpler
microcontrollers in these copiers, which reduces their design cost. (By quite
a lot, back in the 1970s, when all this was most relevant.)
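A sketch of that copier flow, reusing the same assumed 8-bit sum-to-zero check. gang_copy and flaky_burn are hypothetical stand-ins for a copier's firmware and its burning hardware; the simulated stuck bit on copy #2 shows a flawed copy being caught without ever comparing it against the source.

```python
from typing import Callable


def sums_to_zero(image: bytes) -> bool:
    """Self-verification: a good image sums to zero modulo 256."""
    return sum(image) % 256 == 0


def gang_copy(source: bytes, n_copies: int,
              burn: Callable[[bytes, int], bytes]) -> list[bool]:
    """Verify the source once, then verify each copy only against itself."""
    if not sums_to_zero(source):
        raise ValueError("source ROM fails its sum-to-zero check")
    return [sums_to_zero(burn(source, i)) for i in range(n_copies)]


def flaky_burn(image: bytes, index: int) -> bytes:
    """Simulated burner: copy #2 comes out with a stuck-high bit in cell 0."""
    copy = bytearray(image)
    if index == 2:
        copy[0] |= 0x80
    return bytes(copy)


source = bytes([0x12, 0x34, 0x56, 0x64])  # sums to 0x100, i.e. 0 mod 256
print(gang_copy(source, 4, flaky_burn))  # [True, True, False, True]
```

An 8-bit sum is a weak check, though: two errors that cancel out modulo 256 would still pass.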

You can see this approach to checksumming in practice in early-generation game
cartridge ROMs, which almost always have these embedded sum-complement values
(and so presumably were integrity-verified during mastering/duplication.)
These sum-complement value fields get referred to by emulators as “the
checksum” of the ROM image—but technically, they’re not; if you’re following
along, you’ll realize that “the checksum” of such ROM images is zero! :)

~~~
bogomipz
"In fact, you don’t even need a ROM “image” at all."

What exactly is a ROM image? Is it just the ROM contents encoded in some
defined file format? If so, what would a common format be?

~~~
derefr
I was being kind of loose with terminology; technically, a “ROM image” is an
image (i.e. a replica, like a disk image) _of_ a ROM chip.

ROM is random-access for reads—it’s “memory” in the same sense that RAM is
memory, wiring onto a device’s address bus and so becoming part of that
device’s physical memory layout.

So when people say that a game-cartridge backup device or the like captures a
“ROM image”, what they really mean is that it captures “a snapshot of what the
mapped region of the address space that the ROM chip _claims_ to map for — or
seems to be wired to — looks like.” Sometimes there’s metadata in the ROM
itself saying what region the ROM maps for. But since the ROM is just a
physical chip sitting on the bus, it can map or not map for any address
arbitrarily (as long as it has the correct address lines wired to discriminate
that address from other addresses.)
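A minimal sketch of what "having the correct address lines wired to discriminate that address" means, assuming a single power-of-two-sized ROM with full address decoding and a base address aligned to its size. chip_selected and bus_read are hypothetical names, and returning None is just a stand-in for an undriven ("open") bus.

```python
from typing import Optional


def chip_selected(addr: int, base: int, size: int) -> bool:
    """Full decoding: the address bits above the ROM's own range must equal
    its base (size must be a power of two, base a multiple of size)."""
    return (addr & ~(size - 1)) == base


def bus_read(addr: int, rom: bytes, base: int) -> Optional[int]:
    """Read one byte from a bus holding a single ROM; None models open bus."""
    if chip_selected(addr, base, len(rom)):
        return rom[addr - base]
    return None  # nothing drives the data lines for this address


rom = bytes([10, 20, 30, 40])  # a 4-byte "ROM"
print(bus_read(0x8002, rom, 0x8000))  # 30
print(bus_read(0x8004, rom, 0x8000))  # None
```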

This is what results in so-called “overdumps” — this is where a ROM chip
doesn’t actually respond to all the read requests that its mapping claims it
does, and thus, for some reads (usually the ones at the top end of the ROM’s
address space) you don’t get a response from the ROM, leaving the data bus
floating (“open bus”), giving you undefined data for those reads.

This is why I say that a ROM image is technically an image of the address
space a ROM occupies as discovered by requesting those addresses, and not an
image of the ROM’s contents per se: most ROM images are, in fact, overdumps.
It’s just that more modern systems have pull-up resistors on the data bus to
ensure that reads the ROM doesn’t deign to respond to read back as a fixed
value (all ones, i.e. 0xFF) instead of floating.
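When unanswered reads do come back as a constant fill byte, a common heuristic for spotting an overdump is to strip the trailing fill and round up to the next power of two, since ROM chips come in power-of-two sizes. A minimal sketch, assuming you know the fill value your bus produces; likely_rom_size is a hypothetical helper, and it can misjudge a ROM whose real contents happen to end in that same value.

```python
def likely_rom_size(dump: bytes, fill: int) -> int:
    """Guess the real ROM size in a dump: strip trailing fill bytes, then
    round up to a power of two. Returns 0 if the dump is all fill."""
    end = len(dump)
    while end > 0 and dump[end - 1] == fill:
        end -= 1
    if end == 0:
        return 0
    size = 1
    while size < end:
        size *= 2
    return size


dump = bytes(range(1, 7)) + b"\xff" * 10  # 6 real bytes, then pulled-up reads
print(likely_rom_size(dump, fill=0xFF))  # 8
```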

ROM copiers are really “ROM image” copiers — they work by programming the
destination ROM(s) with the data discovered by probing the source ROM’s
address space, as above. If the destination ROM is larger than the source ROM,
the destination ROM will record an overdump of the source ROM.

All that being said, when originally _programming_ an EEPROM, the ROM-
programming device doesn’t actually interface to your computer as writable
random-access memory. It interfaces as, essentially, a hybrid serial/block
device — i.e. a device where you can _either_ write (program) one byte to an
arbitrary address, _or_ write (program) a whole ROM-block (usually 64 bytes)
at a time. You can also _erase_ an entire block.
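A toy model of those three operations (program one byte, program one page, erase one block), purely to illustrate the access pattern. EepromModel is hypothetical, and so are its assumptions: erased cells read 0xFF, and the erase-block size equals the 64-byte page size. Real parts differ in both respects and in their timing rules.

```python
class EepromModel:
    """Toy EEPROM behind a programmer. Assumptions: erased cells read 0xFF,
    and the page size is also the erase-block size."""

    def __init__(self, size: int, page: int = 64) -> None:
        self.cells = bytearray(b"\xff" * size)
        self.page = page

    def program_byte(self, addr: int, value: int) -> None:
        """Program one byte at an arbitrary address."""
        self.cells[addr] = value

    def program_page(self, addr: int, data: bytes) -> None:
        """Program up to one page at a page-aligned address."""
        if addr % self.page or len(data) > self.page:
            raise ValueError("page writes must be aligned and fit in one page")
        self.cells[addr:addr + len(data)] = data

    def erase_block(self, addr: int) -> None:
        """Erase the whole block containing addr back to 0xFF."""
        start = addr - addr % self.page
        self.cells[start:start + self.page] = b"\xff" * self.page


chip = EepromModel(256)
chip.program_page(64, b"abc")
chip.program_byte(0, 0x12)
chip.erase_block(70)  # wipes bytes 64..127, leaving byte 0 alone
print(chip.cells[0], chip.cells[64:67])  # 18 bytearray(b'\xff\xff\xff')
```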

In other words, functionally, an EEPROM accessed through a programming device
acts very similarly to flash memory accessed through a flash controller.
(Flash memory is, in essence, an EEPROM technology that gives up byte-at-a-time
erasure in exchange for fast, block-at-a-time erasure.)

What that means, in practice, is that there’s no particular constraint on how
you first program the data into the EEPROM you’re going to be mastering PROMs
with. There’s no “ROM programmer file format”, any more than there’s a common
file format used to descriptively represent the instructions the various
mkfs(8) utils use to initialize filesystems onto a block device. Programming
EEPROMs is a _procedure_, not data per se.

That being said, if we wanted to represent the process of programming an
EEPROM _using_ modern file formats, a CUE sheet (or equivalent) would probably
be the best approach. A CUE sheet isn’t a description of the intended result,
but rather a sequence of instructions for an abstract “burner” to go through
to _produce_ a result. Unlike a ROM image, which just tells you what you got
when you tried to read from the addresses in an assumed-mapped memory region,
a CUE sheet tells you what some other device originally tried to _put_ at
those addresses, and so lets you figure out which reads are “true” answers
from the ROM, vs “open bus” answers, vs. de-facto responses from a pull-up
resistor. (It also lets you emulate the process of cell wear, and so figure
out which cells were intentionally “programmed to death”, allowing a faithful
representation of “indeterminate state” addresses, much like the Applesauce
image format[1] does for magnetic-flux media.)

[1]
[https://wiki.reactivemicro.com/Applesauce#Applesauce_Image_F...](https://wiki.reactivemicro.com/Applesauce#Applesauce_Image_File_Format_.28.WOZ_and_.A2R.29)

So, to be clear, there's no _defined_ file format for ROMs _generally_. You
know the size of the EEPROM chip sitting in the programmer; you have some data
you'd like to write (maybe in a file; maybe as a stream); as long as the size
of the data is no larger than the size of the chip, you can just dd(1) the data,
blockwise, onto the programmer block-device, and you'll get a programmed
EEPROM.
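A sketch of that "just dd(1) it" approach, with an io.BytesIO object standing in for the programmer's block device. dd_to_programmer is a hypothetical helper; the real device node, its block size, and whether it accepts short final blocks are assumptions you would have to check for your own hardware.

```python
import io
from typing import BinaryIO


def dd_to_programmer(data: bytes, device: BinaryIO, chip_size: int,
                     block_size: int = 64) -> int:
    """Write data to a programmer 'block device' block by block, like
    dd(1) with bs=block_size. Returns the number of blocks written."""
    if len(data) > chip_size:
        raise ValueError(f"{len(data)} bytes won't fit on a {chip_size}-byte chip")
    blocks = 0
    for offset in range(0, len(data), block_size):
        device.write(data[offset:offset + block_size])
        blocks += 1
    return blocks


fake_device = io.BytesIO()  # stand-in for the real programmer device node
print(dd_to_programmer(b"\x90" * 200, fake_device, chip_size=8192))  # 4
print(len(fake_device.getvalue()))  # 200
```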

But if you want to make this friendly to consumers — say, if the EEPROM is
your computer's BIOS ROM — then you take a ROM image you've constructed some
other way; wrap it in your own format with checksums et al; create a "flasher"
program that first verifies the integrity of the ROM image against the
checksum, and then dd(1)s it to the EEPROM programmer block-device. Usually
the file _extension_ OEMs decided on for these ROM-in-container files was
".bin". Doesn't mean anything; they were arbitrary formats, or sometimes not
formats at all, just raw ROM images.
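A minimal sketch of such a ROM-in-container format and the check a flasher would run before touching the chip. The layout (a 4-byte magic, a little-endian length, a CRC-32 of the image, then the raw image) and the names MAGIC, wrap and unwrap_and_verify are all hypothetical; as noted, each OEM did its own thing.

```python
import struct
import zlib

MAGIC = b"ROM1"                  # hypothetical; not any real OEM's format
HEADER = struct.Struct("<4sII")  # magic, image length, CRC-32 of the image


def wrap(image: bytes) -> bytes:
    """Build a 'ROM-in-container' file: header followed by the raw image."""
    return HEADER.pack(MAGIC, len(image), zlib.crc32(image)) + image


def unwrap_and_verify(blob: bytes) -> bytes:
    """What the flasher would do before writing anything to the chip."""
    if len(blob) < HEADER.size:
        raise ValueError("file too short to be a ROM container")
    magic, length, crc = HEADER.unpack_from(blob)
    image = blob[HEADER.size:HEADER.size + length]
    if magic != MAGIC or len(image) != length or zlib.crc32(image) != crc:
        raise ValueError("corrupt or foreign ROM container")
    return image


blob = wrap(b"\xea" + bytes(15))
print(unwrap_and_verify(blob) == b"\xea" + bytes(15))  # True
```

Only after unwrap_and_verify succeeds would the flasher hand the raw image to the programmer.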

~~~
bogomipz
Thanks for the wonderfully detailed reply. I have a follow-up question: does
the ROM designer, or any part of the ROM itself, ever have to know where in
memory it is mapped?

~~~
derefr
They almost certainly do. The set of system architectures that relied heavily
on memory-mapped ROM is almost exactly the same as the set of systems that
don't have any concept of virtual memory, and where achieving position
independence (i.e. indirecting through some kind of symbol table) would be a
huge waste of CPU cycle budget.

An interesting "exception that proves the rule" is the "option ROMs"
([https://en.wikipedia.org/wiki/Option_ROM](https://en.wikipedia.org/wiki/Option_ROM))
on modern PCI-e cards, e.g. GPUs, NVME controllers, etc. which provide
capabilities to the BIOS, like writing to the GPU's framebuffer.

These ROMs _aren't_ position-independent (i.e. they always get mapped to the
same physical memory region during BIOS bring-up) but their contents _are_
position-independent code. This is because they're not actually ROM that lives
on the CPU's address bus where the CPU could ever execute from it; but rather
these ROMs live on the MMIO bus, which in x86 at least, can only be interacted
with via specific IN/OUT instructions.

As such, even though BIOS option ROMs all wire to the same physical address†
_on the MMIO bus_, they get copied into RAM in order for the CPU to execute
them, and so the code in those ROM chips has to be position-independent
code.

† You might wonder, then, how the BIOS manages to read off a particular option
ROM, when multiple ROMs could be wired to the same MMIO address, and thereby
_all_ respond to the same latched MMIO in request, making a mess of the MMIO
data lines. My understanding of the spec, is that the BIOS just powers
PCI/PCI-e devices on and off one by one during early boot, such that only one
option ROM can be wired at a time; and does all its interaction with said ROM
while it's isolated like this. The ability to do this "early power-on" — that
maybe _only_ powers on the wired ROMs and nothing else — is an important part
of what it means for a PCI device to be "Plug-and-Play"!
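As an aside that ties back to the sum-complement discussion earlier in the thread: legacy x86 option ROMs start with the bytes 0x55 0xAA, store their length in 512-byte units in the third byte, and are expected to have all their bytes sum to zero modulo 256. A minimal validator for just those three rules (check_legacy_option_rom is a hypothetical name; it ignores the PCI data structure and everything else in the header):

```python
def check_legacy_option_rom(rom: bytes) -> int:
    """Check signature, length byte and checksum; return the ROM size in bytes."""
    if len(rom) < 3 or rom[0] != 0x55 or rom[1] != 0xAA:
        raise ValueError("missing 0x55 0xAA signature")
    size = rom[2] * 512  # length byte counts 512-byte blocks
    if size == 0 or size > len(rom):
        raise ValueError("length byte is zero or larger than the data")
    if sum(rom[:size]) % 256 != 0:
        raise ValueError("bytes do not sum to zero modulo 256")
    return size


rom = bytearray(512)
rom[0:4] = b"\x55\xaa\x01\xcb"  # signature, 1 x 512 bytes, placeholder RETF
rom[-1] = (-sum(rom)) % 256     # sum-complement byte in the last cell
print(check_legacy_option_rom(bytes(rom)))  # 512
```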

~~~
bogomipz
>"My understanding of the spec, is that the BIOS just powers PCI/PCI-e devices
on and off one by one during early boot, such that only one option ROM can be
wired at a time; and does all its interaction with said ROM while it's
isolated like this. The ability to do this "early power-on" — that maybe only
powers on the wired ROMs and nothing else — is an important part of what it
means for a PCI device to be "Plug-and-Play"!"

Interesting. Is this what actually makes the boot times for servers with a
handful of option ROMs so painfully slow, then?

------
dylan604
It seems like there have been a few disassembly write ups on the 8086 lately.
Are the tools getting to the point where this is possible, or just enough
people with enough serious interest in this? Coincidence? Am I seeing a
pattern that isn't really there?

~~~
ajenner
Probably not entirely a coincidence - Ken Shirriff is doing a series on the
8086 which may account for at least one of the other articles you've noticed.
My disassembly was only possible because of Ken's high-resolution photos of
the die with the metal layer removed - that's why it took me until now to do
it.

~~~
dylan604
So it's turtles all the way down? Someone makes a breakthrough that gets used
by someone else to make a different breakthrough, that kind of thing. This is
why science needs to be open. No one person/group can do it all. I just wish
that research didn't have to be done in secret to protect potential
patentability. Let the work be published and let the people responsible
receive whatever credit/recognition/awards are deserved.

kudos for your efforts!

------
bogomipz
I apologize if this is a naive question, but how do I make sense of
microcode_8086.txt in the zip file? Take just line 1, or 000. I understand
from the final column(s) that this concerns a MOV instruction and the ModR/M
byte; is that correct? How do I understand what everything to the left of that
means?

000 A CD F H J L OPQR U R -> tmpb 4 none WB,NX 0100010??.00 MOV rm<->r

Similarly how do I understand what each of the different files in the zip file
represents?

The author states: >"I used bitract to extract the bits from the two main
microcode ROMs, and also from the translation ROM which maps opcode bit
patterns onto positions within the main microcode ROM."

Is the translation ROM the translation.txt file, then? Is that the key to
understanding these files? If so, why aren't there more than the 38 or so
opcodes listed?

~~~
ajenner
The translation.txt file is the contents of the translation ROM which tells
the CPU where in the microcode to go for long jumps, calls and EA decoding.
The key.txt file has the details of all the mnemonics.

"000" is just a line number.

"A CD F H J L OPQR U" are the actual bits from the ROM.

"R -> tmpb" is a move operation (each microcode instruction can do a move as
well as something else) copying the value from "R" (a register described by
the word length bit and either the R field or the RM field of the modrm byte,
depending on the direction bit) to "tmpb" (an internal register not accessible
from the user-level ISA).

"4 none WB,NX" is a type 4 instruction (bookkeeping) that tells the CPU that
the next instruction is the last one in the microcode burst (NX) unless a
write back (WB) to memory is needed.

"0100010??.00" is the bit pattern by which this line of microcode is
addressed. This one means opcodes 0x88-0x8b.

"MOV rm<->r" is a comment added to say what this set of opcodes corresponds
to in x86 assembly, or what it does if it's a subroutine.
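To see why "0100010??" covers opcodes 0x88-0x8b, you can expand the "?" wildcards. A minimal sketch, assuming only the part before the dot matters here and that it is read as a 9-bit value whose low 8 bits are the opcode (an assumption consistent with the 0x88-0x8b reading above, not a description of the chip's own matching logic); matches and expand are hypothetical helpers.

```python
def matches(pattern: str, value: int) -> bool:
    """True if value, as len(pattern) bits MSB-first, fits a 0/1/? pattern."""
    width = len(pattern)
    bits = format(value, f"0{width}b")
    return len(bits) == width and all(p in ("?", b) for p, b in zip(pattern, bits))


def expand(pattern: str) -> list[int]:
    """Every value of the pattern's width that the pattern matches."""
    return [v for v in range(1 << len(pattern)) if matches(pattern, v)]


print([hex(v) for v in expand("0100010??")])  # ['0x88', '0x89', '0x8a', '0x8b']
```

The two wildcard bits are exactly the 8086's direction (d) and word-length (w) bits in the low end of the MOV opcode.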

~~~
bogomipz
Thank you this really helps.

One last question about the individual files. There seem to be 3 distinct file
groups:

0b.txt - 8t.txt

l0.txt - l3.txt

r0.txt-r3.txt

Do each of these represent different ROMs or different logical parts of the
two ROMs? Or am I reading too much into the naming convention?

By the way - brilliant work. This is really a fascinating read.

~~~
ajenner
The *b and *t files are the bottom and top halves of the 9 "chunks" of the
decoder above the main microcode ROM. The l* and r* files are the left and
right halves of the four horizontal slices of the main ROM. I split them up
that way because bitract needs the bits to be regularly spaced in both the
horizontal and vertical directions.

Thanks - glad you enjoyed it!

------
kevbin
The link to [https://www.righto.com/2020/06/a-look-at-die-of-8086-process...](https://www.righto.com/2020/06/a-look-at-die-of-8086-processor.html) is worth clicking.

------
surfsvammel
Awesome stuff. Really nostalgic. An 8086 with a yellow monochrome screen was
my first computer. It ran Police Quest I, I think.

~~~
ksaj
Have you used a green monochrome screen? I still remember the first time I got
one, because it was cheaper than those newfangled amber screens.

At first I thought it was a little stupid because of how slow the fade was
when the cursor blinked, and it wasn't nearly as sharp or vivid. But within
the first few hours of hacking around, I recognized how much easier on the
eyes it was without the flickery amber that wobbled when you clacked your
teeth together, and the weird random "snow" when refreshing the screen in a
text "animation."

If only fractals didn't take an hour or so to render back then, an animated
one at modern speeds would have been quite soothing to watch that way.

Fractint - I'm shocked I actually remember the name. Downloading it from a BBS
is how I got my _second_ computer virus! Exciting times. Nostalgic is right.

~~~
dm319
Have you come across cool-retro-term? It simulates the snow and wobble of
those screens and, I think, does a reasonable job. Not sure if it would work
with a graphical program, though.

~~~
ksaj
I remember seeing it, I think from HN when I was still a lurker. I bookmarked
it, but never got around to trying it.

Some things are best left to the fond memories. I got an sdf.org account a few
months ago, and that quickly demonstrated to me that nostalgia occurs when
things happened so long ago that you forgot about all the not-great things
like lag, terms that don't agree on what a backspace is, newsgroup spam, every
program having completely different idioms, etc. My newsfeed config randomly
disappeared after the third day, which was a very accurate representation of
"back in the day" that I had forgotten about until that moment.

------
pkphilip
Amazing work! Can't say I understood half of what you have written, but sure
is some top quality work!

------
procd
Amazing!

