
x86 Disassembly - infinity
http://en.wikibooks.org/wiki/X86_Disassembly
======
awhitworth
It's funny to see this link come up on HN, because I wrote this book (or much
of it) years ago as a college student. I was taking some courses on
microprocessor architectures and assembly code, and was in the middle of
reading Reversing by Eldad Eilam (which had just come out at the time). I had
a mantra back then that the best way to learn a subject was to try and teach
it, so I wrote this and several other wikibooks as sort of a study aide for my
classes. This also explains why the material appears to cover barely a
semester's worth of material and why my name ("Whiteknight") doesn't appear in
the edit history after graduation in 2008.

Despite the relatively thin and incomplete coverage of the material, I've
heard from several people over the years who appreciated the work as a nice
introduction to the topic and even once received a job offer because of it
(which didn't work out, for a variety of reasons). All things considered, if I
had to change anything it would be the title to make it a little more focused.
It's not really a book about disassembly so much as it is an introduction to
what high-level language features look like when translated into non-optimized
x86 assembly code. Find me a short, catchy title that accurately describes
that, and you win some kind of prize.

I doubt I'll ever get back to this book either. I haven't worked with this
material at all since school, and don't feel like I have up-to-date knowledge
of the subject. Unless somebody else wants to jump in and fill it out, it will
probably stay the way you see it now.

I'm glad to see that this book is still around and I'm glad that people are
benefiting from it in some small way. I know it doesn't cover nearly what
would be needed for a real book on the subject (I do still recommend Eilam's
Reversing for book-o-philes) but I think it should be a decent stepping stone
to pique interest and get people moving on towards more in-depth treatments.

------
tptacek
Write a disassembler at least once. It's much easier than you think it is
(even with X86) --- it's essentially a file format parser --- and very
illuminating.
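
He isn't exaggerating: at its core a disassembler is a byte-at-a-time table
lookup over the opcode map. A minimal (and deliberately incomplete) Python
sketch, covering only a handful of one-byte opcodes; a real decoder would also
need to handle prefixes, multi-byte opcodes, ModRM/SIB, displacements, and
immediates:

```python
# Register order matches the 3-bit register field encoding in x86.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# Table of one-byte opcodes: 0x50+r is push, 0x58+r is pop.
ONE_BYTE = {0x50 + r: f"push {name}" for r, name in enumerate(REGS)}
ONE_BYTE.update({0x58 + r: f"pop {name}" for r, name in enumerate(REGS)})
ONE_BYTE[0x90] = "nop"
ONE_BYTE[0xC3] = "ret"

def disasm(code):
    # Unknown bytes fall back to a raw data directive.
    return [ONE_BYTE.get(b, f"db 0x{b:02x}") for b in code]

print(disasm(bytes([0x55, 0x50, 0x58, 0xC3])))
# → ['push ebp', 'push eax', 'pop eax', 'ret']
```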

~~~
userbinator
It's also much easier to decode x86 instructions when you look at them in
octal instead of the hexadecimal that most tables use, since both the main
opcode map and ModRM/SIB are organised in a 2-3-3 layout:

[http://reocities.com/SiliconValley/heights/7052/opcode.txt](http://reocities.com/SiliconValley/heights/7052/opcode.txt)

The 8080/8085/Z80 instruction sets also look much better in octal:

[http://www.z80.info/decoding.htm](http://www.z80.info/decoding.htm)
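
As a concrete sketch of the 2-3-3 split: the octal digits of a ModRM byte map
directly onto its mod/reg/rm fields, something the hex representation hides.

```python
def modrm_fields(b):
    # ModRM is laid out as mod (2 bits), reg (3 bits), rm (3 bits) --
    # exactly one octal digit per field.
    return (b >> 6) & 0b11, (b >> 3) & 0b111, b & 0b111

# 0x44 reads as 0o104 in octal, and the digits 1, 0, 4 are literally
# mod=1 (disp8 follows), reg=0 (EAX), rm=4 (SIB byte follows).
assert oct(0x44) == "0o104"
assert modrm_fields(0x44) == (1, 0, 4)
```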

~~~
101914
This comment validates all the time I have "wasted" reading HN over the years.

It does not surprise me that something so simple could be so easily overlooked
(or, at least, "forgotten"). I wonder if I ever would have figured this out
from my own reading and experiments. Doubtful.

Great tip!

~~~
userbinator
I figured it out before/without exposure to that document, but I attribute it
to the fact that I started teaching myself at a time when octal was more
common amongst mini and micro-computers; most programmers these days barely
know any number base other than decimal, and of those who do, binary and
hexadecimal are likely far more familiar to them than octal. The official
Intel/AMD manuals make no reference to octal either, using only binary and
hex.

As an aside, ARM opcodes are (mostly) hex-structured with 4-bit fields, while
MIPS, POWER, and SPARC are not amenable to any standard number base except
binary (5- and 6-bit fields.)

------
xvilka
This book is missing the very actively developed reverse engineering framework
radare2 [1], which supports not only x86/x86_64 but also arm, mips, ppc,
avr, arc, 8051, the TI tms320c55x family, and even more!

[1] [http://rada.re/](http://rada.re/)

~~~
awhitworth
The title of the book is "X86 Disassembly". It was only ever intended to cover
x86. x86_64, mips, ppc, etc. are all expressly out of scope.

------
indutny
And here is a JS x86 disassembler:
[https://github.com/indutny/disasm](https://github.com/indutny/disasm). It is
incomplete, but shows how the basics work.

------
calibraxis
Why does `push eax` perform "much faster" than the following?

      sub esp, 4
      mov DWORD PTR SS:[esp], eax

A brief skim of Intel's "Software Developer’s Manual" (particularly ch. 6 on
stacks) didn't seem to turn up an answer.

While hitting the ALU just for `sub` might be an extra step, doesn't hitting
RAM make that a drop in the bucket? (`sub` may account for less than 1%?) Or
is there some caching going on, so RAM may be updated in the background?

(I'm not an assembly programmer; very ignorant of what's happening.)

~~~
userbinator
It's a lot smaller (1 byte vs 6), which means less space spent in the cache
and decoder, reducing cache misses and decode bandwidth. The x86 also has a
dedicated "stack engine" since the Pentium M (but, not surprisingly, absent in
NetBurst), which contains an adder and a copy of the stack pointer to handle
push/pop operations. This is faster than using the general-purpose ALUs and
memory read/write ports, and also frees those up for use by other non-stack
instructions. On the other hand, it means reading/writing the stack pointer
explicitly between implicit stack operations incurs a little extra latency to
get the values between the stack engine and "real" ESP register synchronised.

Memory reads/writes do take a few more cycles to complete, but since this is a
write, the CPU can continue on with other non-dependent instructions following
it. All the above information assumes a CPU based on P6 and its successors
(Core, Nehalem, Sandy Bridge, Ivy Bridge, Haswell, etc.); NetBurst and Atom
are very different.

Linus also has some interesting things to say about using the dedicated stack
instructions:
[http://yarchive.net/comp/linux/pop_instruction_speed.html](http://yarchive.net/comp/linux/pop_instruction_speed.html)

Somewhat amusingly, GCC was well known to generate the explicit sub/mov
instructions by default, while most other x86 C compilers I knew of, including
MSVC and ICC, would always use push.
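
The 1-byte-vs-6 claim above is easy to check against the published encodings
(assuming 32-bit mode):

```python
# push eax is a single opcode byte.
push_eax = bytes([0x50])

# The explicit form needs opcode + ModRM + imm8 for the subtract,
# then opcode + ModRM + SIB for the store through ESP.
sub_esp_4   = bytes([0x83, 0xEC, 0x04])  # sub esp, 4
mov_esp_eax = bytes([0x89, 0x04, 0x24])  # mov [esp], eax

assert len(push_eax) == 1
assert len(sub_esp_4) + len(mov_esp_eax) == 6
```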

~~~
calibraxis
Thanks to you and awhitworth! Very interesting stuff, kept me reading. (And
soon searching to understand some of the ideas. And thinking of Linus's puzzle
about `call` being faster than a `push` before a jump. Seems to be one of
those cases where higher-level abstractions can be optimized better than
lower-level ones. I suppose because lower-level ones are too general-purpose,
while higher-level ones are constrained.)

