
A fundamental introduction to x86 assembly programming - nkurz
https://www.nayuki.io/page/a-fundamental-introduction-to-x86-assembly-programming
======
pkrumins
If you want to learn x86 assembly, I recommend one of my favorite books,
Programming From The Ground Up:

[http://savannah.nongnu.org/projects/pgubook/](http://savannah.nongnu.org/projects/pgubook/)
(free pdf!)

This is a practical book and teaches assembly programming on Linux.

Author Jonathan Bartlett wrote this book because he was frustrated to no end
with the existing books. At the end of them he could still ask, "How does the
computer really work?" and not have a good answer. Jonathan's goal is to take
you from knowing nothing about programming to understanding how to think,
write, and learn like a programmer. You won't know everything, but you will
have a background for how everything fits together.

Fun story: I remember going through this book in 2004, a day before a job
interview, and I was asked precisely about how C functions get compiled to
assembly and how the stack and memory management work. I got that job.

~~~
userbinator
For "How does the computer really work?" questions, I recommend this book:

[http://www.charlespetzold.com/code/](http://www.charlespetzold.com/code/)

~~~
genop
Funny you mention that one. I have been re-reading that book the past few
weeks.

This book takes a bit of a different angle on explaining computers. I really
enjoy the history.

My only complaint would be that I think he has a Microsoft bias (but then I
guess I am biased myself).

And similarly, I have a minor annoyance with the OP's mention of Linux, and
only Linux, when he touches on calling conventions. This bias toward one
system, and ignorance of others, is typical of many websites and
documentation. To be fair, the OP mostly avoids it.

To be clear, great explanations of computers to me are ones that either:

    
    
       1) take great care to stay completely neutral and only discuss universally shared traits across systems,
    
       2) go to great lengths to try to be as comprehensive as possible, including many systems and all their commonalities and idiosyncrasies, or
    
       3) focus only on one system and go into great detail how it works.
    

The more the author strays from 1, 2 or 3, the less likely I am to read their
work.

Petzold pays ample attention to Morse code and similar _succinct_ ways of
communicating information. In my opinion this type of focus is the mark of a
skilled coder. When I look at the entries to the IOCCC, it is no surprise to me
that Morse code is (or at least was) a frequent focus of the entrants.

------
Keyframe
I was into asm back when Amiga was around and lost the will and knowledge over
time (mostly since programming is more of a hobby now, over the last couple of
years). But, I had a strong desire to get into asm again and I did it a bit
unconventionally. It worked, though.

Get the game TIS-100 and play it (that got me interested again). After that, I
compiled simple(r) C programs with gcc and looked at their .s output. Along
with that, grab a tool like OllyDbg, x64dbg or (if you can!) IDA Pro, open up
your favorite programs and modify them. Whenever you stumble upon an
instruction you don't know, look it up in the Intel manual and google for it to
see the idioms people use. This process has worked really well for me so far,
although it feels like I'm cracking software or something like it (it's fun
though). Along with that you can start writing asm blocks in your programming
language of choice and/or full asm with any of the assemblers (flat, yasm,
nasm, whatever).

The only things you need to know beforehand are the basics of C and
data/memory manipulation.

------
ericbb
I was surprised to read that x64 apparently doesn't allow pushing or popping
32-bit values. I have a language that uses 32 bits as the basic unit for all
values and I'm working toward x64 code generation. Should I just promote
values to 64 bits and waste half the stack? Should I use mov instructions
instead of push/pop? What solutions are other compiler-writers using?

~~~
rayiner
x86-64 is really oriented around integer values being 64 bits. For example,
32-bit operations zero-extend the result to write the full 64-bit integer
register. The ABI also assumes integral values are promoted to 64 bits and the
stack is 64-bit aligned on calls.

That said, as long as you keep RSP aligned you can do whatever you want.
Consider this code:

    
    
        extern void value(int* a, int* b, int* c);
    
        int main() {
          int a, b, c;
          value(&a, &b, &c);
          return a+b+c;
        }
    

This is how LLVM compiles it:

    
    
        subq	$24, %rsp
        leaq	20(%rsp), %rdi
        leaq	16(%rsp), %rsi
        leaq	12(%rsp), %rdx
        callq	value
        movl	16(%rsp), %eax
        addl	20(%rsp), %eax
        addl	12(%rsp), %eax
        addq	$24, %rsp
        retq
    

Note that the int values are allocated at 4-byte alignment, but rsp is aligned
to 8 bytes. If you add an additional variable, 'd' (passed as a fourth
argument), you'll see that the compiler still allocates 24 bytes of stack, and
stores the additional value at 8(%rsp) (which is unused in the code above).
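For anyone who wants to try the four-variable variant described above, here is a minimal sketch. The names `value4` and `sum4` are made up for illustration, and `value4` is defined in the same file only so the snippet builds on its own. To reproduce the stack layout, keep the callee in a separate file and only declare it `extern`, as in the original, so the compiler cannot inline it.

```c
#include <assert.h>

/* Stand-in definition (hypothetical) so the example links on its own.
   In the original experiment the callee is only declared extern, which
   forces a, b, c and d to live in memory on the stack. */
void value4(int *a, int *b, int *c, int *d)
{
    *a = 1;
    *b = 2;
    *c = 3;
    *d = 4;
}

/* Four 4-byte ints need 16 bytes of stack, which still fits in the
   24 bytes that were reserved for three ints. */
int sum4(void)
{
    int a, b, c, d;

    /* Holds on x86-64 System V; an assumption on other targets. */
    assert(sizeof(int) == 4);

    value4(&a, &b, &c, &d);
    return a + b + c + d;
}
```

Exact offsets will vary with compiler version and optimization level, so treat the 8(%rsp) slot as what one LLVM build happened to choose.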

~~~
qb45
Almost correct.

If you want to call C code conforming to the x86-64 SYSV ABI, RSP needs to be
aligned to 16 bytes when you execute the _call_. If the code you generate
never calls alien code, 8-byte alignment is enough.

Since 8 bytes are occupied by the return address pushed by the _call_ which
started your function, you need to decrease RSP by a further 8, 24, 40, 56,
72, ... bytes before calling code generated by others.

Reason: keeping the stack 16-byte aligned makes it easier to allocate aligned
16-byte stack variables, and this is useful because x86 has 16-byte registers
(SSE) which are most efficiently loaded from and stored to aligned addresses.

However, it isn't only performance that you lose by neglecting alignment. I
learned the hard way that some code generated by gcc crashes if you call it
with an unaligned stack.

That's why in this example LLVM allocates 24 bytes, even though 16 would be
enough for 3 ints.
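The 8, 24, 40, ... sequence can be computed mechanically. A minimal sketch follows. The helper name `frame_size` is hypothetical, and it assumes the function pushes no callee-saved registers, so the return address is the only thing on the stack at entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many bytes to subtract from RSP in a prologue
   so that RSP is 16-byte aligned at the next call instruction.
   Assumes RSP was 16-byte aligned just before the call that entered this
   function, so the pushed return address leaves it at 8 modulo 16. */
size_t frame_size(size_t locals_bytes)
{
    /* Smallest value >= locals_bytes that is congruent to 8 modulo 16. */
    size_t size = ((locals_bytes + 7) / 16) * 16 + 8;

    assert(size >= locals_bytes && size % 16 == 8);
    return size;
}
```

With 12 bytes of locals (three ints) this gives 24, matching the LLVM output above, and with no locals it gives 8. A real code generator would also have to account for any registers it pushes in the prologue.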

Another example (gcc):

    
    
      extern void bar();
    
      void foo() {
              bar();
      }
    
      0000000000000000 <foo>:
       0:   48 83 ec 08             sub    $0x8,%rsp
       4:   b8 00 00 00 00          mov    $0x0,%eax
       9:   e8 00 00 00 00          callq  e <foo+0xe>
       e:   48 83 c4 08             add    $0x8,%rsp
      12:   c3                      retq
    

To anyone writing x86-64 compilers, I recommend finding the x86-64 SYSV ABI
spec and reading it. Saves debugging time.

~~~
rayiner
Ah, good point. I forgot about the return address.

------
wmu
> The way the x87 FP stack works is a bit weird, and these days it’s better to
> do floating-point arithmetic using xmm registers

x87 allows calculations in extended precision, i.e. the word width is 80
bits. SSE is limited to double precision (64-bit words). The FPU also has some
advanced math instructions, like sin, cos, tan, exp, etc., and these
operations will be available in AVX512F.

~~~
wolf550e
Have fun debugging arithmetic differences resulting from spilling an
intermediate result to memory and rounding to a 64-bit value vs. not spilling
and keeping the intermediate value as 80 bits.
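A minimal sketch of the kind of discrepancy meant here. It uses overflow rather than rounding because overflow is easy to see. Both function names are hypothetical, and the second one only behaves differently where `long double` has a wider exponent range than `double` (true for gcc and clang on x86, not for every platform).

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Intermediate forced through a 64-bit double, as if it had been spilled
   to memory in double format. The volatile qualifier keeps the compiler
   from holding the value in a wider register. */
double scale_roundtrip_spilled(double x)
{
    assert(isfinite(x));
    volatile double t = x * 10.0;   /* can overflow to infinity */
    return t / 10.0;
}

/* Intermediate kept in long double, standing in for a value that stays
   in an 80-bit x87 register (or is spilled at full width). */
double scale_roundtrip_extended(double x)
{
    assert(isfinite(x));
    long double t = (long double)x * 10.0L;
    return (double)(t / 10.0L);
}
```

Called with 1e308, the first function returns infinity. The second returns 1e308 on targets where `LDBL_MAX_EXP` is greater than `DBL_MAX_EXP`, because the intermediate 1e309 is representable there.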

~~~
qb45
FSTP/FLD can store/load the full 80 bits if you need such precision. No
problem whatsoever.

~~~
wolf550e
I've never seen anyone store 10 byte IEEE-754 values in memory.

People store doubles, and people get upset when compiler optimizations (like
when to spill from registers to memory and whether to use one or two
instructions for multiply-and-add) change not just the performance but also
the result of computations.

~~~
qb45
> I've never seen anyone store 10 byte IEEE-754 values in memory.

But you can, if you are after precision as the OP apparently was.

I'm not sure what this c-word is doing in your post, I thought we were talking
assembly, but as far as c-things go, at least gcc and clang represent _long
double_ as 80b extended precision on x86.

So, for example, compiling this beauty (which is too large for x87 stack):

    
    
            long double a[64], b[64], c[64], d[64];
            // load a,b,c,d from somewhere
    
            long double x =
                    ((((((((a[0]+a[1])+(a[2]+a[3]))+((a[4]+a[5])+(a[6]+a[7])))
                    +(((a[8]+a[9])+(a[10]+a[11]))+((a[12]+a[13])+(a[14]+a[15]))))
                    +((((a[16]+a[17])+(a[18]+a[19]))+((a[20]+a[21])+(a[22]+a[23])))
                    +(((a[24]+a[25])+(a[26]+a[27]))+((a[28]+a[29])+(a[30]+a[31])))))
                    +(((((a[32]+a[33])+(a[34]+a[35]))+((a[36]+a[37])+(a[38]+a[39])))
                    +(((a[40]+a[41])+(a[42]+a[43]))+((a[44]+a[45])+(a[46]+a[47]))))
                    +((((a[48]+a[49])+(a[50]+a[51]))+((a[52]+a[53])+(a[54]+a[55])))
                    +(((a[56]+a[57])+(a[58]+a[59]))+((a[60]+a[61])+(a[62]+a[63]))))))
    
                    +((((((b[0]+b[1])+(b[2]+b[3]))+((b[4]+b[5])+(b[6]+b[7])))
                    +(((b[8]+b[9])+(b[10]+b[11]))+((b[12]+b[13])+(b[14]+b[15]))))
                    +((((b[16]+b[17])+(b[18]+b[19]))+((b[20]+b[21])+(b[22]+b[23])))
                    +(((b[24]+b[25])+(b[26]+b[27]))+((b[28]+b[29])+(b[30]+b[31])))))
                    +(((((b[32]+b[33])+(b[34]+b[35]))+((b[36]+b[37])+(b[38]+b[39])))
                    +(((b[40]+b[41])+(b[42]+b[43]))+((b[44]+b[45])+(b[46]+b[47]))))
                    +((((b[48]+b[49])+(b[50]+b[51]))+((b[52]+b[53])+(b[54]+b[55])))
                    +(((b[56]+b[57])+(b[58]+b[59]))+((b[60]+b[61])+(b[62]+b[63])))))))
    
                    +(((((((c[0]+c[1])+(c[2]+c[3]))+((c[4]+c[5])+(c[6]+c[7])))
                    +(((c[8]+c[9])+(c[10]+c[11]))+((c[12]+c[13])+(c[14]+c[15]))))
                    +((((c[16]+c[17])+(c[18]+c[19]))+((c[20]+c[21])+(c[22]+c[23])))
                    +(((c[24]+c[25])+(c[26]+c[27]))+((c[28]+c[29])+(c[30]+c[31])))))
                    +(((((c[32]+c[33])+(c[34]+c[35]))+((c[36]+c[37])+(c[38]+c[39])))
                    +(((c[40]+c[41])+(c[42]+c[43]))+((c[44]+c[45])+(c[46]+c[47]))))
                    +((((c[48]+c[49])+(c[50]+c[51]))+((c[52]+c[53])+(c[54]+c[55])))
                    +(((c[56]+c[57])+(c[58]+c[59]))+((c[60]+c[61])+(c[62]+c[63]))))))
    
                    +((((((d[0]+d[1])+(d[2]+d[3]))+((d[4]+d[5])+(d[6]+d[7])))
                    +(((d[8]+d[9])+(d[10]+d[11]))+((d[12]+d[13])+(d[14]+d[15]))))
                    +((((d[16]+d[17])+(d[18]+d[19]))+((d[20]+d[21])+(d[22]+d[23])))
                    +(((d[24]+d[25])+(d[26]+d[27]))+((d[28]+d[29])+(d[30]+d[31])))))
                    +(((((d[32]+d[33])+(d[34]+d[35]))+((d[36]+d[37])+(d[38]+d[39])))
                    +(((d[40]+d[41])+(d[42]+d[43]))+((d[44]+d[45])+(d[46]+d[47]))))
                    +((((d[48]+d[49])+(d[50]+d[51]))+((d[52]+d[53])+(d[54]+d[55])))
                    +(((d[56]+d[57])+(d[58]+d[59]))+((d[60]+d[61])+(d[62]+d[63]))))))))
                    ;
    

produces only 80b spills ( _fstpt_ in GNU syntax):

    
    
      $ objdump -d fpmonster |grep fst
      400457:       db 7c 1c 10             fstpt  0x10(%rsp,%rbx,1)
      400462:       db bc 1c 10 04 00 00    fstpt  0x410(%rsp,%rbx,1)
      400470:       db bc 1c 10 08 00 00    fstpt  0x810(%rsp,%rbx,1)
      40047e:       db bc 1c 10 0c 00 00    fstpt  0xc10(%rsp,%rbx,1)
      400b47:       db 7c 24 10             fstpt  0x10(%rsp)
      400d91:       db 3c 24                fstpt  (%rsp)
    

Clearly, extended precision _can_ be done right both in C and raw assembly.

> People store doubles, and people get upset

That's their fault :) and another story altogether. For reproducible low
precision, indeed SSE is the way to go.

------
znpy
A very nice book about assembly programming is "Assembly Language
Step-by-Step: Programming with Linux, 3rd edition"
([http://www.amazon.com/dp/0470497025](http://www.amazon.com/dp/0470497025)).

The nice thing about this book is that it first guides the reader to
understand how the machine works, and only then moves on to assembly
programming.

The sad thing about this book is that it targets 32-bit Intel-compatible
processors.

My guess is that the original author has grown old and is not interested in
producing a fourth edition of the book.

On this matter, I would like to ask: is it worth learning assembly for the
x86/32-bit instruction set, now that pretty much every computer is built on
the amd64 architecture?

~~~
Cheyana
I work in an IT dept which supports almost a dozen departments that all told
use about 30 or 40 apps, almost all of which are still 32-bit. The hardware is
recent and all 64-bit (as is our OS) but even the MS Office we use is 32-bit
because of interaction with other apps. We also have to default the browser to
the 32-bit IE executable rather than the 64-bit because of plugins (even MS
recommends this). Most vendors still aren't up to 64-bit yet because they
don't want to shut out the customers that are still years behind on upgrading.
I'm thinking 32-bit will still be around for another 10 years to be on the
safe side.

~~~
pwaring
When I worked at a university, we had several pieces of legacy 32-bit
software, mostly written in C, which were essential to some courses. It became
more and more difficult to run them as Linux distributions stopped shipping
32-bit libraries by default (I think Scientific Linux 7.1 caused a lot of
problems because of this).

------
VonGuard
Anytime I see anything named my-asm or mini.asm or anything like that, it
instantly yanks me back to college. We had this awesome teacher who had been
at DEC for decades and taught at night. He'd bring in chunks of core memory,
and tell us all about the old days in between course work. God I loved that
class.

~~~
rashkov
I'm reading Hackers by Steven Levy right now and DEC's PDP computer era as
described seems like a real golden age. I'd enjoy more history book
recommendations along this line if anyone knows of some.

~~~
jacalata
The Soul of a New Machine, about building a Data General minicomputer (the
Eclipse MV/8000).

------
dimdimdim
X86 and x86_64 Assembly and Shellcoding:

[http://www.pentesteracademy.com/course?id=3](http://www.pentesteracademy.com/course?id=3)
[http://www.pentesteracademy.com/course?id=7](http://www.pentesteracademy.com/course?id=7)

------
zatkin
I'm planning on writing a calculator in x86 assembly, so this will probably be
a good starting point for me. I want to do this to grasp a better
understanding of assembly. I currently am looking to develop on my MacBook
Pro. Does anyone have more resources and/or suggestions? Thank you.

~~~
userbinator
That is a little vague. What type of calculator exactly? A GUI one, or just an
expression parser/evaluator? For the latter, you might find this interesting:

[http://www.hugi.scene.org/compo/compoold.htm#compo4](http://www.hugi.scene.org/compo/compoold.htm#compo4)

------
nickpsecurity
Randall Hyde's work is interesting for beginners and users alike. Art of
Assembly teaches you assembly but he uses a high-level assembler to do it in
pieces. So, you can abstract away some things like in a HLL to ignore them
until you understand enough to use the raw ASM. Likewise, if you use HLA for
projects, you can do HLL stuff where understanding is more important than
performance/memory. Standard library for HLA is so large the HTML reference
about froze my browser haha.

[http://www.plantation-productions.com/Webster/](http://www.plantation-productions.com/Webster/)

------
someoneElse123
This is a nice fundamental introduction, but where would I go to see how to
actually run code?

~~~
lallysingh
My best introduction to any asm language is just my C compiler putting out asm
(gcc -S, I think). I can create small programs to do what I want, and see what
the compiler puts out.
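A minimal sketch of that workflow. The file name `squares.c` and the function are made up for illustration. The `-S` flag makes gcc stop after compilation and write assembly to a `.s` file, and `-masm=intel` selects Intel syntax on x86 targets.

```c
/* squares.c (hypothetical file name): a small function to inspect.
   Assumed invocations:
       gcc -S -O1 squares.c               writes AT&T-syntax assembly to squares.s
       gcc -S -O1 -masm=intel squares.c   same, in Intel syntax (x86 targets) */
#include <assert.h>

int sum_of_squares(int n)
{
    assert(n >= 0);

    int total = 0;
    for (int i = 1; i <= n; i++)
        total += i * i;   /* look for the multiply and the loop branch in the output */
    return total;
}
```

Comparing the output at `-O0` and `-O2` is a quick way to see which instructions come from the source and which come from the optimizer.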

~~~
Liblor
[https://gcc.godbolt.org](https://gcc.godbolt.org) is also helpful. It
colorizes the assembly output to match the corresponding C/C++ source.
Furthermore, you can easily switch between Intel and AT&T syntax.

------
eugenekolo2
Am I the only one who feels the "intro to x86" market is oversaturated?

~~~
tbirdz
Seconding this. Is there even any advanced x86_64 assembly language material
out there besides the AMD and Intel reference manuals?

~~~
wmu
Probably no. I think when you reach some level of understanding, you'll use
manuals. Then you know what you're looking for. :)

~~~
acdimalev
And once you start reading the docs on the more efficient but far less
consistent 64-bit calling convention, you may find yourself choosing words to
describe it other than the "improved" that this author opted for.

~~~
aktau
I take it you're a fan of the plan9 calling convention (e.g. all arguments
and return value(s) are on the stack)?

Curiously, it doesn't actually appear all that inefficient. Go uses it AFAIK.
I wonder whether anyone has studied it. I also wonder whether gccgo uses that
convention or defaults to SysV x64.

~~~
acdimalev
I didn't know it was a System V thing. Thank you for cluing me in!

I don't think of it in terms of like-vs-dislike. My observation is that it's a
difficult thing to get right without a compiler, and thus avoided for
introductory material.

As far as I am aware, storing values in a register requires fewer
instructions. However, I have never personally confirmed the performance
difference of this calling convention.

Handling of all calling conventions is one of the many things that I am
personally much happier leaving up to GCC and LLVM in practice.
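For a concrete picture of the register-versus-stack split being discussed, here is a small sketch. The function names are hypothetical. The register assignment in the comment is the System V AMD64 one; other conventions (Windows x64, for instance) use a different set.

```c
#include <assert.h>

/* Eight integer arguments. Under the System V AMD64 convention the first
   six arrive in rdi, rsi, rdx, rcx, r8 and r9, while a7 and a8 are passed
   on the stack. The result comes back in rax. */
long sum8(long a1, long a2, long a3, long a4,
          long a5, long a6, long a7, long a8)
{
    return a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8;
}

/* Hypothetical caller: compiling this with gcc -S shows six register
   loads plus two stack stores (or pushes) before the call, unless the
   optimizer inlines sum8. */
long call_sum8(void)
{
    long total = sum8(1, 2, 3, 4, 5, 6, 7, 8);

    assert(total == 36);
    return total;
}
```

Hand-written assembly has to reproduce that split exactly, which is part of why it is awkward to cover in introductory material.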

------
mike_hock
Isn't it disingenuous to call esp/ebp (or rsp/rbp) general purpose registers,
considering there are instructions that implicitly assume esp/rsp is the stack
pointer (push, pop, call, ret, ...)?

~~~
wolfgke
> Isn't it disingenuous to call esp/ebp (or rsp/rbp) general purpose registers

In the sense of instruction encoding they are. Additionally, the x86-64 calling
convention typically does not use rbp as a frame pointer (except when you use
something like alloca). Instead, functions typically allocate the stack space
they need at the beginning. So rbp is used as a general-purpose register by
most compilers on x86-64.

------
gravypod
Is there something like this for SPARC or other RISC architectures?

~~~
userbinator
Yes there are... I wouldn't recommend starting with SPARC Asm though, it's
nowhere near as fun as a CISC like x86, nor as easy as MIPS (which tends to be
the "boring" go-to architecture CS courses use.) ARM is more interesting than
MIPS and easier than x86.

~~~
gravypod
I've done a little bit of x86 and I have to say I'm not very impressed by some
of it. It seems like the lowest common denominator where nothing 'fun'
happened.

ARM has a few ecosystem problems that I'd rather not deal with. From what I
understand there is a lack of hardware discovery. ARM is more embedded than
'user' computer. Nothing is swappable.

~~~
userbinator
You might find these size-optimisation challenges more fun:

[http://www.hugi.scene.org/compo/compoold.htm](http://www.hugi.scene.org/compo/compoold.htm)

The 256B and below categories in the demoscene are also sources of interesting
x86 Asm programs:

[https://news.ycombinator.com/item?id=7960358](https://news.ycombinator.com/item?id=7960358)

------
nayuki
Hi, Nayuki here. I'm happy to take any comments, questions, and constructive
criticism on the article.

------
danjoc
Adding to my read-later list. Looks like a good intro. Does anyone know of a
similar resource for ARM assembly?

~~~
pwaring
I'm not aware of a good free introduction to ARM assembly, but you could check
out the reading lists from these courses:

[http://studentnet.cs.manchester.ac.uk/ugt/COMP15111/syllabus...](http://studentnet.cs.manchester.ac.uk/ugt/COMP15111/syllabus/)
[http://studentnet.cs.manchester.ac.uk/ugt/COMP22712/syllabus...](http://studentnet.cs.manchester.ac.uk/ugt/COMP22712/syllabus/)

Despite its age, ARM System-on-Chip Architecture (Steve Furber) is still a
good introduction to the processor and assembly (I'm re-reading it at the
moment). ARM Assembly Language - an Introduction (J.R. Gibson) is worth
reading if you want to learn ARM assembly.

The materials page for COMP22712 also has some interesting resources,
including the lab manuals and a small ARM assembler written in C (source is
also on GitHub:
[https://github.com/uomcs/aasm](https://github.com/uomcs/aasm)). Unfortunately
COMP15111 is now hosted on Blackboard so you can only get at the materials if
you're a current student enrolled on the course.

(I've been an undergrad, postgrad and staff in CS at Manchester, so I'm
familiar with the courses - other universities may have similar resources with
fewer access restrictions).

