I wrote a book to learn x86 programming and it was published in 1994 back before "real digital age", oriented to be the best didactical text possible (I had to learn the hard way, I wrote what I would have had to read 10 years earlier). Over 20 reprints, recommended in all Spanish-speaking universities (yeah, I know, if I had known then what I know now I'd have done it in English). It was discontinued 10 years ago or so, they asked me to revise it for the modern world, and it didn't really make sense: rewriting it for what assembly language is and is used for in today's world would be a ton of work because it's a qualitative difference, and if I added an extra chapter explaining 32-bit and 64-bit changes and letting the publisher stamp a "2010 edition" logo on the cover would just be scamming people, which I don't want to do. Here is the link to a scanned copy of the original: https://www.dropbox.com/s/sz6rinfhyc8sai6/Lenguaje%20Ensambl... . Could prove useful if you can speak Spanish.
Honestly: I learned Z80 assembly first, in the 80s, and then switched to x86 very easily. Learn whatever assembly language first, what's hard is learning about registers, flags, memory... and if you learn it will, then you can switch to another architecture quite easily.
On this front, I can highly recommend these two resources, preferably in this order for someone totally new to assembly:
NAND to Tetris, a course that will have you build an emulated general-purpose CPU from first principles even with no prior knowledge. You'll learn exactly about registers and memory by making them. I even recommend this to non-hardware people because the way they divide each layer of complexity is great practice even in software. https://www.coursera.org/learn/build-a-computer
Microcorruption, a series of incrementally difficult MSP430 (an easy-to-understand 16-bit instruction set) exploitation exercises in the browser: https://microcorruption.com
These are both geared towards being a gentle introduction to assembly and CPU architecture principles. They don't touch certain facts of X86/64 processors like pipelining or variable-length instructions, but IMO those are best left until you're comfortable with the basics.
The book version is also great, and suitable for self-study if you prefer to learn that way: https://www.nand2tetris.org/book
A hacker I closely follow tried to play Microcorruption during a Twitch broadcast, and, to my great disappointment, he was unable to due to lack of response.
It should probably at least be explicitly noted by anyone considering doing so: I'm sure that's still super illegal.
In any case, I should probably write here that I have since tested account creation with another email and received the activation link right away.
For those interested.
I submit that if x64 existed early-on, no one would have bothered with higher-level abstractions. x64 is actually -pleasant- to code in natively.
One thing I haven't seen people mention yet is actually using assembly language; just people pointing at documentation. To learn assembly, like anything else, you have to use it. I'd suggest writing a toy program to sum string-lengths, or do maths.
One of my own recent projects was to write a "compiler" to convert reverse-polish mathematical expressions to assembly language:
It's been a few years since I touched assembly, and even when I did I largely ignored the floating-point stuff, so this was a handy refresher.
Switching to 8086 was so awkward: it had built in div, instruction, the rep prefix but most importantly so many bits - 16. I routinely used ah/al and the like.
Learning Assembly is quite easy on its right own, ability to write optimal code and routinely being compilers is a whole another story. Writing the inner loops (like grep) is what's usually left to on modern processors.
IA-32e/AMD64 is basically an extension of IA-32/x86. What you're looking for is to understand the difference between real mode, 386 protected mode, PAE, paging and long mode because the registers, instructions, addressing and sizes of structures differ.
You can learn all of this by reading the Intel software manuals and digging into it yourself.
Tools you need (for non-Windows):
- assemblers: yasm, nasm, binutils (contains "as")
- IDA Pro
- Virtualization/emulation such as VMware Fusion/Workstation (because it supports being a gdbserver), VirtualBox or QEMU
- Intel's CPUID app
You also need good references like:
- the Intel manual set (a giant PDF or a collection of several) https://software.intel.com/en-us/articles/intel-sdm
- https://sandpile.org (fairly current)
- https://ref.x86asm.net (outdated by useful)
Another helpful exercise is writing a toy operating system in Rust or only assembly. https://osdev.org has many resources and guides.
An interesting semi-abandonware static disassembler that was really good is Sourcer 8.01 from V-Communications. I think it supports Pentium II/Pro/III at most, which would include real, protected mode and long mode DOS/DOS extended/Win16/Win32 EXEs and COMs IIRC. It did a lot of memory typing and clever analysis long before IDA existed, and still interesting for retro computing.
It's worth learning the Ghidra Scripting API because you can write ad-hoc scripts with Jython syntax to automate tasks/do custom analysis.
Thanks to the better RTTI Analyzer
I’ve never seen this name before. Where does it come from?
AMD call it AMD64. Intel call it Intel 64.
Why do we need even more names for the same thing beyond these two?!
x86_64, x86-64, x64, AMD64, amd64, Intel 64, EM64T, all mean the same thing!
When Intel was developing Itanium they named the new architecture IA-64 and retroactively named their 32-bit x86 line IA-32.
After AMD released "AMD64" Intel started copying the AMD 64-bit extensions. IA-32e was a short lived name and Intel started using "Intel 64" to refer to the 64-bit extension of x86 and continued using IA-64 for the Itanium VLIW CPU line.
After several years of denying its existence, Intel announced at the February 2004 IDF that the project was indeed underway. Intel's chairman at the time, Craig Barrett, admitted that this was one of their worst-kept secrets.
Intel's name for this instruction set has changed several times. The name used at the IDF was CT (presumably[original research?] for Clackamas Technology, another codename from an Oregon river); within weeks they began referring to it as IA-32e (for IA-32 extensions) and in March 2004 unveiled the "official" name EM64T (Extended Memory 64 Technology). In late 2006 Intel began instead using the name Intel 64 for its implementation, paralleling AMD's use of the name AMD64.
I think we should call it AMD64, since that's what the people who actually designed it wanted it to be called.
The "x86-64" name is the original one, and came from AMD themselves: https://web.archive.org/web/20000817071303/http://www.amd.co... (and "x86_64" is obviously an alias for where a hyphen is not an allowed character, like identifiers on many programming languages).
The "x64" name came from Microsoft, probably due to file name length limitations (this was before Windows XP unified the Windows 9x and Windows NT lines).
IIRC, the "AMD64" name came later, probably to distinguish it better from Intel's IA-64 (Itanium).
Intel didn't want the x86 to be 64bit.. they wanted the world to switch to Itanium for that. I figure a lot of the naming mess can be traced back to Intel's marketing.
--- Matt Pietrek's "Just Enough Assembly Language to Get By" - http://bytepointer.com/resources/pietrek_asm_pt1.htm
--- Hongjiu Lu "ELF: From The Programmer's Perspective" - http://beefchunk.com/documentation/sys-programming/binary_fo...
--- Computer Systems: A Programmer's Perspective (3rd Edition) by Bryant & O'Hallaron - https://www.amazon.com/Computer-Systems-Programmers-Perspect...
--- Modern X86 Assembly Language Programming by Daniel Kusswurm - https://www.amazon.com/Modern-X86-Assembly-Language-Programm...
--- Low-Level Programming by Igor Zhirkov - https://www.amazon.com/Low-Level-Programming-Assembly-Execut...
I haven’t finished the 3rd edition, but I made it about 3/4 of the way through the 2nd edition and loved it. I picked up the 3rd specifically for the x64 material.
But the early CPUs were actually quite simple! At the time "CISC" was a good thing because it provided straightforward mechanisms for expressing things (e.g. "push", "call", load a struct field with an offset...) that real world programmers needed to do all the time.
You rarely actually use 64 bit integers. So much of the processing is on strings which use 8 or 16 bits.
Furthermore, your Intel x64 processor still contains the features first introduced with the first IBM PC using the 8088 which in turn was conceptually compatible with the 8008 and 8080. When you try to understand a modern Intel or AMD CPU, you need to be aware of the legacy of 40+ years of backward compatibility. Once you understand the historical decisions, SSE, AVX architectures start making more sense.
Take a look at the complexities of 8080/Z80 or 6502 addressing modes, or the kind of tricks "big" 16 bit architectures like the PDP/11-70 were playing to stretch addressible memory. The 8086 was a breath of fresh air! Your code could be separate from your stack and from your heap, and all three could be a full 64k without any crazy bank switching or copying! And all you had to do was set up 4 segment selectors and then ignore them. And everything you were used to running did so in a clean, unconstrained environment.
Really, read that 8086 datasheet again (it's like 8 pages), it was great stuff at the time.
If you know how to write low-level C (i.e. with direct Win32/POSIX API calls), that's a good start. If you don't know what that means, you need to master this first. So much of assembly is built to support higher-level programming, things like segments, indirect references, system calls, etc., so it pays to know the WHY of all of it (e.g. do you know what a frame pointer is? why it's useful and can sometimes be omitted)? Learn this stuff first.
Once you have a good handle on C, you need to start learning how operating systems and memory management work, at least the "client side" of them accessible from userspace.
Then you might want to dip your toe into how CPUs are built, with pipelines, registers, caches, all of that.
If you've mastered those things, gcc -S will provide everything else you need (another comment suggested this).
I learned this stuff in a traditional university computer engineering program. It helped a lot. But for context, this question feels a little like, "can someone explain quantum physics"? It's a huge topic but only like 5% is the actual thing you're asking about, the other 95% is the conceptual machinery (calculus, probability, mechanics) it's built on. Mastering all the other stuff is actually the hard part. For all intents and purposes, assembly is just a notational convenience for expressing program structure in the only way a CPU can understand it. At least 90% of the difficulty is understanding how CPUs work, and how to think at sufficiently low level that you can express human-useful work at such a low level of abstraction.
It would also be nice to know why you want to know this. Writing a device driver is going to be different from writing an operating system, which will be different from writing tight numerical loops in assembly. And in any case, dollars to donuts you won't be able to beat a modern optimizing compiler performance-wise.
Hope this helps.
This sounds simple but will give you a good tour of a lot of different areas: BSD sockets, named pipes/unix domain sockets for cross-thread communication, signals, memory management, threading, basic scheduling, and parallel programming (locks/concurrency), to name a few.
see this HN thread (450+ points) discussing the book at the time: https://news.ycombinator.com/item?id=21640669 It is also constantly updated.
This books is to ASSEMBLY what Richard W. Stevens is to TCP/IP and UNIX programming. Also it IS a beginners books because he makes no assumptions about the readers previous experience. It is very thorough so it's probably the only book you'll ever need on Assembly :)
 explanation from the author:
> What is with two titles? The book was named “Reverse Engineering for Beginners” in 2014-2018, but I always suspected this makes readership too narrow. Infosec people know about “reverse engineering”, but I’ve rarely hear the “assembler” word from them. Likewise, the “reverse engineering” term is somewhat cryptic to a general audience of programmers, but they know about “assembler”. In July 2018, as an experiment, I’ve changed the title to “Assembly Language for Beginners” and posted the link to Hacker News website, and the book was received generally well. So let it be, the book now has two titles. However, I’ve changed the second title to “Understanding Assembly Language”, because someone had already written “Assembly Language for Beginners” book. Also, people say “for Beginners” sounds a bit sarcastic for a book of ~1000 pages.The two books differ only by title, filename (UAL-XX.pdf versus RE4B-XX.pdf), URL and a couple of the first pages
I've just started going through it, and it's definitely made some assumptions about prior knowledge; I doubt I would have been able to get through it if I didn't already have some assembly knowledge.
You mean "invaluable" here (cannot be given a value, as it's too important). "valueless" is the opposite (worthless)! https://en.wiktionary.org/wiki/valueless
The other useful step has been to write C functions and compile the code with a good compiler (good experiences with Intel, gcc and llvm) first at -O0 and later at -O2 or -O3 to understand how high level concepts can be translated to assembly.
This requires to have at least a basic understanding of registers and basic assembly concepts but at least has helped me apply the concepts to real code.
For me, a great way of learning the latter is to pick a task with some instant gratification (audio dsp, software texmapping or shaders?) and start writing some of that in inline assembly.
I suggest looking at your C compiler output, learning about SIMD instructions and using a pipeline simulator to see why and how it performs like it does. I used VTune and AMD Codeanalyst back in the day, don't know what's the current SoA.
I mean, you have different modes, and in 64-bit mode EAX becomes RAX for example.
Some exceptions that come to mind:
* The old floating point stuff is gone. EDIT: guess I am wrong on that, was thinking of what compilers typically generate for the architecture.
* The C abi now tends to focus on pass-by-register for the first few arguments.
* If you are working in kernel mode, some stuff is going to necessarily look different.
Do you mean x87? I’m not sure it’s true that it’s gone is it? The instructions are still documented? Do they not work?
It does take effort for an OS kernel to save and restore x87 state, so I could imagine somebody dropping support.
FXSAVE saves the x87 state with the SSE state, and XSAVE generalizes that to add several optional components, depending on what your processor support (as of right now, AVX state, MPX state, 3 for AVX-512, PT state, and SGX state, IIRC).
x86 makes it pretty easy on the kernel devs with XSAVE.
It actually can be used, if you write by hand, at least in Windows.
Try to reverse engineer something you don't have the debug symbols for. One of the big pain points was figuring out the calling convention i.e. what registers correspond to which function arguments on different platforms:
For reverse engineering in particular, this was a good resource:
The first one is somewhat written in "weird English" (Russian English?), but it is still readable. It really helped me with x64 assembly as a C programmer. I have used 2 and 3 as reference most of the time and the first was basically my "main x64 assembly book".
I would also recommend getting more proficient in C programming either by studying books such as "Expert C" and the "The C Programming Language" or reading "advanced" stuff somewhere on the Internet, e.g.: http://www.pvv.org/~oma/DeepC_slides_oct2011.pdf
The main takeaway in all this for me was learning about the call stack and the different calling conventions which gives you a clue on how recursion works under the hood.
Also when you are done learning about "practical computer architecture", i.e. assembly language programming, learn stuff about operating systems as well:
Fun fact: this is not really related to assembly programming, but functions such as setjmp() and longjmp() are used for implementing exception handling.
It starts with the very foundation and builds up, up to interfacing with C and implementing data structures, still in assembly.
Sadly, this book still references 32-bit ISA. My guess is that the author got old enough not to bother updating the book anymore (the first editions of the book were about assembly programming in DOS, then DOS and Linux, then Linux only).
Still a very valid book, as x86-64 is backwards-compatible with 32-bit x86. Also, as somebody else has written, once you understand the basics, you can mostly swap things like register names and/or look to the reference pages of your cpu/linker/assembler.
3rd edition has much info on modern x86.
I have since then ventured into Motorola 68000 assembly coding, by reading an old Amiga course in machine code from the late 80'ties. It's fascinating to learn from an old platform like the Amiga 500.
To keep myself motivated, I have written a lot of posts that chronicles my progress through the course.
I can also recommend this game, that is inspired by NAND to Tetris. Where you build the hardware in your browser :)
$ gcc foo.c -S
By far the best way to learn IMO, write C and see how it translates.
gcc -c -g foo.c
objdump -S foo.o
Stay with Intel syntax, nevermind about AT&T syntax, beyond learning on how to read it.
PC world is all about Intel syntax, AT&T is lower level on how instructions get expressed and Assemblers macro capabilities just suck compared with what TASM and MASM were already capable of on the MS-DOS days.
It looks like his x64 is just slides instead of videos which may be better, sitting through 16hrs of video when I was learning x86 was ... an experience, but a very useful one.
Anyway, whenever someone asks this question I don't hesitate to bring up his work and trainings I think they're really useful. I know this is for reverse engineering but if you can reverse engineer x64 you can certainly write it.
One of the best 32bit x86 assembly books out there.
I think it is one of the best books about fundamental computer programming overall. Even if you don't program in assembly afterwards, you will take away a much better understanding of how computers operate. Sure, it is still a beginner book and it doesn't go into much detail, but the parts it explains, it explains very well.
The only really annoying thing is the stack alignment for calls. I usually just make a macro to check rsp and align if needed. Other than that I think it is similar enough to x86 just with different registers.
It's useful to at least look at such a textbook to make sure you don't skip any of the easy, sort-of prerequisite material.
video link -> https://www.youtube.com/watch?v=vtWKlgEi9js&list=PLXYpRL4SQp...
linker -> http://www.masm32.com/
some way to understand assembly lang
Plus, you now have the ymm and possibly zmm stuff going on :)
- Programmer's PC Sourcebook
- PC Intern
- Undocumented PC
- Undocumented DOS
- Programmer's Guide to EGA, VGA ...
- Michael Abrash books on assembly including self-modifying code and graphics programming
- Turbo Assembler Quick Reference Guide
- Code Complete
Later, the MindShare series were really good.
None of them really helped with 386 protected mode programming. GDT, LDT, selectors and descriptors, paging, etc. For that you really needed the official docs. I printed the 300+ page programmer's guide in the school lab which they were not too happy about.