Introduction to x64 Assembly (intel.com)
176 points by AndreyKarpov on Apr 5, 2013 | 76 comments

For those of you that went through Z80, 6502, 68000, x86 macro assemblers like myself, is it just me or does the AT&T syntax just suck?

I recently had to convert some code from Intel syntax with NASM macros to GAS with AT&T syntax, and boy what a pain.

I agree that indexing looks way better in Intel ("[ebx+3]") than in AT&T syntax ("3(%ebx)").

However, AT&T's "mov X Y" for "move X to Y" feels better to me than Intel's "mov Y X" for "move Y from X" (who says that?). If they wanted to do things in reverse, they should have named the instruction differently, for example "load Y X" for "load Y from X" (as in "LDA #0xFF" from the 6502).

As for movl instead of mov, I like the 68K version (MOVE.L) better.

I always read

mov x, y

with the comma being a synonym for equals, like mov x = y, so it feels more natural to me.

But the biggest issues are the addressing modes, especially the more complex ones, and the macros, which seem very lightweight in features compared to the Intel world.

I love this comment. It shows how powerful the right metaphor can be in understanding something, which is something we obsess over at Dev Bootcamp when teaching students. It also shows how tiny affordances (http://en.wikipedia.org/wiki/Affordance) make us "think" specific thoughts. I mean, it's called "mov" so something must be moving, which means there must be a subject, object, and possibly an indirect object, right?

This is what Piaget meant when he talked about "schemata." (http://en.wikipedia.org/wiki/Jean_Piaget#Schemata) So, thank you for the new schema. :)

This particular one had never occurred to me and makes it way easier to internalize.

Sorry for gushing -- few things get me more excited than a new way to explain something. :)

I see this more as a) an example of how well humans can learn to ignore misdirection (the word 'move' hints at src, dest arguments) and b) the first small step towards C:

- read "mov X,Y" as "X = Y"

- adjust syntax so that it allows "X = Y" (what you see is what you think)

- similarly, replace obscure syntax for indexed memory access by such things as "X = Y[3]"

- getting annoyed with the seemingly random limitations of the language, add an expression parser that translates "X = 2 * Y + 3" into "X = 2 * Y" and "X = X + 3", each of which gets assembled into one instruction. For now, only allow expressions that get away with only using the result register for temporaries.

- use existing macro capabilities to build a library of control flow statements such as IF and WHILE.

- introduce standard way to call subroutines.

- introduce shorthand method for doing such calls: one for the call site that takes a couple of expressions as argument, and one for function entry that uses macros such as 'int' and 'char' to pop arguments from the call stack.

By that time, one almost has K&R C.
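The expression-parser step in that list can be sketched in a few lines. This is only an illustration, not anything the commenter wrote: the function name `lower` is made up, Python's `ast` module stands in for a hand-written parser, and the rule "only the result register may hold temporaries" is approximated by accepting left-nested chains only.

```python
import ast

# Operators this toy understands; anything else is rejected.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}

def lower(statement):
    """Split 'X = expr' into two-address steps that use only X as scratch."""
    node = ast.parse(statement).body[0]
    if not isinstance(node, ast.Assign) or len(node.targets) != 1:
        raise ValueError("expected a single assignment")
    if not isinstance(node.targets[0], ast.Name):
        raise ValueError("destination must be a plain name")
    dest = node.targets[0].id
    steps = []

    def leaf(n):
        # A leaf is a name or a constant; anything bigger needs a temporary.
        if isinstance(n, ast.Name):
            return n.id
        if isinstance(n, ast.Constant):
            return str(n.value)
        raise ValueError("operand would need an extra temporary")

    def emit(n):
        if isinstance(n, ast.BinOp) and type(n.op) in OPS:
            sym = OPS[type(n.op)]
            if isinstance(n.left, ast.BinOp):
                emit(n.left)              # partial result now lives in dest
                right = leaf(n.right)
                if right == dest:
                    raise ValueError("destination already overwritten")
                steps.append(f"{dest} = {dest} {sym} {right}")
            else:
                steps.append(f"{dest} = {leaf(n.left)} {sym} {leaf(n.right)}")
        else:
            steps.append(f"{dest} = {leaf(n)}")

    emit(node.value)
    return steps

print(lower("X = 2 * Y + 3"))
```

Something like "X = 2 * Y + 3 * Z" is rejected here, because the right-hand product would need a second temporary.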

I wrote my first compiler in M68000 assembler in a similar fashion to your steps. I started by allowing it to recognize M68000 assembler opcodes, and when found it'd just copy the line to output, otherwise it'd parse and compile the line. You could use register names and sizes directly in the statements.

So e.g.

    D0.W = 5
Would translate into:

    MOVE.W #5, D0
I didn't use macros though - handling basic argument passing is simple enough.

As soon as I had the basics of procedures/functions, simple expressions and argument passing in place I started rewriting the compiler using it.
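A rough sketch of that pass-through idea, in Python rather than 68000 assembler. The opcode set is a tiny made-up subset, `compile_line` is a hypothetical name, and only the "register = constant" form from the example above is handled.

```python
import re

# Illustrative subset of 68000 mnemonics; a real tool would know all of them.
KNOWN_OPCODES = {"MOVE", "ADD", "SUB", "JSR", "RTS"}

# Matches statements such as "D0.W = 5" (data registers only).
ASSIGN = re.compile(r"^(D[0-7])\.([BWL])\s*=\s*(-?\d+)$")

def compile_line(line):
    text = line.strip()
    if not text:
        return text
    mnemonic = text.split()[0].split(".")[0].upper()
    if mnemonic in KNOWN_OPCODES:
        return text                      # already assembler: copy it through
    match = ASSIGN.match(text)
    if match:
        reg, size, value = match.groups()
        return f"MOVE.{size} #{value}, {reg}"
    raise SyntaxError(f"cannot compile: {line!r}")

print(compile_line("D0.W = 5"))
```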

Back in the 90's TI had a processor, maybe a DSP, I am not sure about it, which used a form of light C as their assembler.

There wasn't an Assembly language available as we tend to know.

I don't remember the name of the processor, though.

"It shows how powerful the right metaphor can be in understanding something"

Just like how I was taught the difference between "<" and ">". Our teacher drew the rest of Pac-Man around the "<" or ">" so that we'd think of "<" or ">" like the mouth of Pac-Man. It wants to eat the larger number. But that was back around 1983 when Pac-Man was all the rage (when I was drawing Pac-Man eating ghosts on my folders).

My elementary school teachers used a similar metaphor with alligators. But I never could remember which number it was that got eaten, so I just figured the bigger side was for the bigger number. Since an equal sign doesn’t have a “bigger side”, it would imply that both sides are the same. Then I wondered why they didn’t just teach that. Then I liked school even less.

(Oddly enough, my childhood intuition about equals signs isn’t historically accurate—the equality is actually supposed to be represented by the equal length of the two strokes.)

The greedy alligator always wants the bigger prey.

It took me a long time to differentiate the symbols because, before using the alligator metaphor, my teacher tried this one: draw a line perpendicular to the bottom part of the symbol. If it turns into a 4 it's "smaller than"; otherwise, if it turns into a 7, it's "bigger than".

I remember seeing a "5 < 8" and thinking "ok, I know which number is smaller and which one is bigger, and I know what each symbol is called, but which one should I use?" For some reason it wasn't obvious to me that I should read the assertion from left to right. I just saw two numbers and I didn't know if I should indicate that one of them was bigger or that the other was smaller; both assertions were true. I remember having a list of exercises to do as homework and just circling the bigger numbers in frustration.

When I heard the alligator metaphor, however, I got it instantly.

I learned the same thing in school, but with an Alligator instead of Pac-Man =).

wow. what? what's all this stuff about alligators and pac-men? Isn't the side with two lines just bigger than the pointy side? how do these new entities help? I'm really confused by the small side being the alligator that wants to eat the big side (or did i get that wrong?), let alone the stuff about 4s and 7s!

No problem! I enjoyed reading your post. :)

Nice! And, I always read mov x, y as analogous to memcpy(x, y, size).

"Move" is called "load" on other architectures; indeed, the corresponding Z80 instruction is LD.

If x86 used LD instead of MOV, it would make more sense. LD AX,4 would be read as "load register AX with the value 4"; LD AX,BX would be "load AX with what's in BX"; LD AX,[SI] would be "load AX with what's in the memory location pointed to by SI" and so forth.

I agree that it does seem backwards initially to have OP DEST, SRC, but it does match the other ISAs that are still being used these days like ARM or PowerPC. I have found it really hard to go from working one day in PowerPC assembly to the next working in AT&T-format x86 because of the operand ordering. Plus all the other bits and pieces that make AT&T a pain, like the required sigils, LEA format, use of immediates with MOV, instruction size suffixes, etc.

Not only that, sometimes the instruction names between AT&T/Intel are totally different! http://blog.reverberate.org/2009/07/giving-up-on-at-style-as...

Since all of the official docs use Intel syntax, I think that using AT&T syntax is a mistake.

> For those of you that went through Z80, 6502, 68000, x86 macro assemblers like myself, it is just me or does the AT&T syntax just suck?

My assembly programming background is heavily Intel-syntax based: I did some x86 programming in the 16-bit days and also some Z80 for my TI calculators. So I found the AT&T syntax difficult to read (and write) at first.

But recently I was working on an operating system project and I needed some assembly code. I initially used NASM, then tried GAS/GCC inline asm with .intel_syntax, but I finally settled on AT&T syntax. It was easier to have just one asm syntax in my program (for inline asm in C code and in separate asm files). And it was easier to integrate in my build and toolchain, and no need to install another assembler for my target arch, just gcc+binutils.

So, in the end it was less pain to use AT&T syntax when it comes to tooling. But here's the thing: in less than a week I didn't mind the syntax any more.

So yeah, you might (as I did) think that the AT&T syntax sucks, but sometimes it's easier to use that than to reconfigure tools to use Intel syntax. But just get over it and use the default syntax of your tools. It's not such a big deal.

Once you get used to AT&T syntax, Intel's seems somewhat verbose. That said, deciphering complex addressing modes (especially for LEAs) in AT&T is a pain. So each syntax has its quirks...

I wouldn't call it more verbose.


    sub    $0x8,%rbx
    callq  *%rax
    mov    (%rbx),%rax

    sub    rbx, 0x8
    call   rax
    mov    rax, [rbx]
Intel can be written with:

    mov    rax, QWORD PTR [rbx]
But it's redundant and assemblers don't expect it. It's only necessary in a handful of places to avoid ambiguity, as opposed to the incessant size suffixes and $/% prefixes, which make AT&T feel more verbose to me. Definitely a matter of familiarity, though.
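For simple operands, the mapping between the two listings above is mechanical. Here is a toy sketch: the mnemonic table is deliberately tiny, `to_intel` and `operand` are made-up names, and scaled-index operands such as (%rax,%rbx,4) are not handled at all.

```python
import re

# AT&T spellings with a size suffix, mapped to the Intel mnemonic.
# A table is used because blindly stripping a trailing letter would
# turn "call" into "cal".
SUFFIXED = {"movq": "mov", "movl": "mov", "subq": "sub", "subl": "sub",
            "callq": "call"}

# disp(%base), where disp is optional: group 1 is disp, group 2 is base.
MEMORY = re.compile(r"^(-?(?:0x[0-9a-fA-F]+|\d+))?\(%(\w+)\)$")

def operand(att):
    att = att.strip()
    if att.startswith("*"):              # indirect call/jump marker
        att = att[1:]
    if att.startswith("$") or att.startswith("%"):
        return att[1:]                   # immediate or register: drop sigil
    match = MEMORY.match(att)
    if not match:
        raise ValueError(f"unsupported operand: {att!r}")
    disp, base = match.groups()
    if disp is None:
        return f"[{base}]"
    sign = "" if disp.startswith("-") else "+"
    return f"[{base}{sign}{disp}]"

def to_intel(line):
    parts = line.split(None, 1)
    mnemonic = SUFFIXED.get(parts[0], parts[0])
    if len(parts) == 1:
        return mnemonic
    ops = [operand(o) for o in parts[1].split(",")]
    ops.reverse()                        # AT&T: src, dest; Intel: dest, src
    return f"{mnemonic} {', '.join(ops)}"

for line in ["sub    $0x8,%rbx", "callq  *%rax", "mov    (%rbx),%rax"]:
    print(to_intel(line))
```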

Did you try '.intel_syntax'? I've had some success with it, but it clearly needs some code contributions.

Using AT&T syntax was a requirement.

You can convert it automatically.

How can I without going to the trouble of writing an assembler myself?

Anyway, the job is already done.

I can't stand Intel syntax exactly because it is "backwards" compared to M68k syntax.

The calling convention is a god send.

One thing for people learning IA-32(e) asm to watch out for is that there are two syntaxes: Intel and AT&T. Intel, unsurprisingly, uses Intel syntax. GCC, however, mostly uses AT&T syntax (for historical reasons, and that's all I'll say).

You can find a reference to the GAS syntax at http://en.wikibooks.org/wiki/X86_Assembly/GAS_Syntax. I wouldn't use it for a reference to any of the instructions (for that grab the linked Intel manuals in this article), but that should get you familiar with the differences if you decide you'd like to write GAS.

You do have the option to set GAS to use Intel syntax with a directive, but if you're modifying legacy code or just want to be consistent with a lot of other GAS code, you may want to use AT&T.

I learned by working through the software developers' manuals. Many years ago, they used to ship them as physical copies. Nowadays they just send CDs.

PDF copies: http://www.intel.com/content/www/us/en/processors/architectu...

When did they stop shipping physical copies? I ordered the new manuals for VT-x not 4 years ago and they were happy to ship me the full set of volumes for free.

I tried to order in January 2012 and they said back then that they had replaced the free physical books with free CDs. Now it says:

Note: We are no longer offering the Intel® 64 and IA-32 Architectures Software Developer’s Manuals on CD-ROM. Hardcopy versions of the manual are available for purchase via a print-on-demand fulfillment model through a third-party vendor, Lulu (please reference 1 and 2 below): http://www.lulu.com/spotlight/IntelSDM.

Apparently nowadays they refuse to send CDs and demand that you buy them through Lulu.

Hmm that must be recent. I have CDs from January 2012

That 3D "cross" graphic (Figure 1, "General Architecture") might be the worst illustration of a programmer's model for a CPU I've ever seen.

I'm sure it makes sense after you've read the accompanying paragraph a sufficient number of times (once didn't cut it, for me), but not so sure it helps in that understanding.

I guess it's a funny hint that the x86's architecture is complex, when a figure that's just trying to describe the register names is this complex. :)

I don't think that's the right graphic - a more reasonable graphic is in the PDF which the article is a copy of.

Something's going horribly wrong, I'm seeing a screenshot of code to read and parse CPUID.

The ARMv7 VFP/NEON register layout is just as much fun. Maybe more when you add in the constraints that some instructions need to operate on consecutive registers, and some of them are required to begin on an even-numbered register.

The pictures with 3 dimension axes make no sense at all. Speaking as somebody who made a reduced set assembly for x64. Did you find something really new or would you just like to confuse more?

I still don't get it. And I've done a bunch of Intel asm. Besides, there are 24 blue boxes and one yellow one. How does this relate to the "16 general purpose 64-bit registers"?

Intel keeps making up new names for their x86-64 architecture. Now 'x64': did they use that one before? Is IA-32e or EMT64 not better? Also, if you want to find out quickly what they mean by 'x64', why is the article tagged with 'ia64' (which means Itanium)?

- AMD64 -> Original x86 64bit extension from AMD.

- EM64T (EMT64 is a common typo) -> Intel's implementation of AMD64; Intel traded SSE3 to AMD for AMD64.

- IA-32e -> Same as EM64T, Intel used this name for a bit, mostly during development.

- INTEL64 -> Intel renamed EM64T to be more in line with AMD64's naming.

- x86-64 -> Overarching instruction set, AMD64 and INTEL64 are implementations.

- x64 -> Shorthand for x86-64.

x86-64 / x86_64 / x64 are just names for the overarching ISA; AMD64 / INTEL64 are implementations.

The AMD64 and INTEL64 implementations aren't actually identical; there are a few differences between them, which compilers generally deal with by producing binaries that can handle either implementation.

The x86-64 / x86_64 / x64 thing isn't Intel's or AMD's fault; it came about because OS vendors used different terms for the arch. Linux, for example, added support when AMD64 was the only thing around and it wasn't clear what Intel was going to do, since they were still pushing Itanium. So the Linux kernel used AMD64 as the arch name. Later, when Intel licensed AMD64, Linux wasn't about to completely rename the arch (and thousands of packages as a result), so now in Linux the AMD64 arch strangely supports both AMD64 and INTEL64 targets. On the other hand, Apple used x86-64 and x86_64 in the OS X kernel, and Sun / Oracle decided to use x64. Really, the OS vendors created this mess, in combination with awkward timing of support between AMD and Intel.

Related is the term x32, which is an architecture (from the point of view of the Linux kernel) for x86-64 code that runs in 64-bit mode but uses 32-bit pointers.

Why would you do this? As long as you never need more than 4 GB of memory, you can save a few bytes per pointer (32-bit pointers instead of 64-bit) while still getting the extra registers and operations of x86-64 code.

From memory, I seem to remember that the x64 name originally came from Microsoft. They certainly make good use of it, at least, so Sun/Oracle isn't alone :)

Just to fill in a (the?) missing piece in an otherwise well assembled puzzle ;)

I can vouch for the fact, though, that internally the Windows team refers to it as "amd64".

Well, I suppose it wouldn't be expedient for Intel to call it amd64, but of all the nicknames for that arch, I do hate x64 the most. Do people think x86 was a line of 86-bit processors that got slimmed down to make the x64 line?

First generations of consumer x64 processors were named Athlon 64, so you can read x64 as Athlon 64 compatible if you really want uniformity.

x86 also started as unofficial shorthand.

x64 has actually been used for quite a while both inside and outside Intel. It's simply a shorter version of x86-64, with all the other alternatives (IA-whatever and EMT<whatever>) never really catching outside Intel.

I frequently see recommendations that intrinsics be used instead of assembly. The theory is that they are more portable and just as efficient, which they certainly are on an instruction by instruction level.

But I'm not having any luck at writing entire loops with them. All the compilers I've tried (gcc, icc, clang) feel compelled to "optimize" the intrinsics for me, turning my well-tuned and port-conscious loop into a hash.

For a tight loop of all intrinsics, I can often get a 20% improvement if I use raw assembly in the same written order. Is there any good way to convince these compilers to "do what I said" without turning off all optimizations everywhere?

Put it in a separate translation unit and compile that with different flags.

But seriously, if you're really at the spot where you have real, debugged code that is better than compiler output, the compiler is providing no value and you might as well use the assembly you've written. The point to intrinsics is to expose hardware features that the C language doesn't, not to promise to do it better than the compiler.

> Put it in a separate translation unit and compile that with different flags.

I've tried it in separate units, and haven't been able to get things out in the order I have them in the source. ICC does the most rearranging (which in most cases other than this leads to faster code), and GCC generally does what you say. But both do things like turning my reloads from memory (to reduce port pressure) into register copies. I was wondering if there were flags I'm not finding to prevent this.

> If you're really at the spot where you have real, debugged code that is better than compiler output, the compiler is providing no value and you might as well use the assembly you've written.

Likely the best plan, but this is being used in an open source library where the rest of the work is done with intrinsics. It's a "when in Rome" thing rather than a technical concern.

Poor explanations such as this annoy me:

"By replacing the initial R with an E on the first eight registers, it is possible to access the lower 32 bits (EAX for RAX). Similarly, for RAX, RBX, RCX, and RDX, access to the lower 16 bits is possible by removing the initial R (AX for RAX), and the lower byte of the these by switching the X for L (AL for AX), and the higher byte of the low 16 bits using an H (AH for AX)"

It makes learning more difficult. Something like this would be way better:

"Specific parts of the registers can be accessed separately from the rest of the register, within strict limitations dictated by the format of instructions. These register-parts are given easy-to-remember names. The lower 32 bits of the first eight registers can be accessed as EAX, EBX, ECX, EDX, EBP, ESI, EDI and ESP. The lower 16 bits of registers EAX, EBX, ECX and EDX can also be accessed as AX, BX, CX and DX. Finally, both the lower and the second-lower bytes of registers EAX, EBX, ECX, EDX can be separately accessed as AH, AL, BH, BL, CH, CL, DH and DL, with AL/BL/CL/DL being the lowest-order byte and AH/BH/CH/DH being the other one."

Knowing what things are due to what part of the model, and what parts are conventions, is a big part of understanding, which is just building a good model of the architecture in your mind.
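A small sketch may make the register naming described above more concrete. It models only reads of RAX's named parts with plain bit masks; `sub_registers` is an illustrative name, not anything from the article.

```python
def sub_registers(rax):
    """Return the named sub-parts of a 64-bit RAX value."""
    rax &= 0xFFFFFFFFFFFFFFFF            # keep it to 64 bits
    return {
        "EAX": rax & 0xFFFFFFFF,         # low 32 bits
        "AX": rax & 0xFFFF,              # low 16 bits
        "AH": (rax >> 8) & 0xFF,         # second-lowest byte
        "AL": rax & 0xFF,                # lowest byte
    }

for name, value in sub_registers(0x1122334455667788).items():
    print(name, hex(value))
```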

It took me years to understand two's complement arithmetic, and the key missing point was that it is just a convention. A convention with great practical advantages, but a convention anyway.
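One way to see the "just a convention" point: the adder works on bit patterns, and signed versus unsigned is only a choice of how to read the result. A minimal sketch with 8-bit values (the helper names are made up for illustration):

```python
def as_unsigned(bits, width=8):
    """Read a bit pattern as an unsigned number."""
    return bits % (2 ** width)

def as_signed(bits, width=8):
    """Read the same bit pattern under the two's complement convention."""
    value = as_unsigned(bits, width)
    if value >= 2 ** (width - 1):        # top bit set: negative by convention
        return value - 2 ** width
    return value

def add(a, b, width=8):
    """Fixed-width addition; it does not know or care about signs."""
    return (a + b) % (2 ** width)

result = add(0xFE, 0x03)
print(hex(result), as_unsigned(0xFE), as_signed(0xFE))
```

Read as unsigned, 0xFE + 0x03 is 254 + 3 wrapping around to 1; read as signed, it is -2 + 3 = 1. The same adder gives the same bits either way.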

"Intel stores bytes "little endian," meaning lower significant bytes are stored in lower memory addresses."

I would like to meet the person that doesn't know what endianness is, but can still understand that explanation.

It's an okay refresher for those who forget which way is which.

I agree that prose was not the right way to communicate this; BE & LE are much easier to explain via example (1 -> 00:00:00:01 or 01:00:00:00).

Except that now the reader has to understand what you mean with your xx:xx:xx:xx syntax. Having examples is good, but then you should provide a small graphic with boxes representing bytes in memory and associated numbers indicating memory addresses. And once you do that, you've actually just provided a visual illustration of exactly what the quote says: least significant byte at lowest memory address vs. highest memory address.
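In lieu of a picture, here is a sketch that prints the "boxes with addresses" view for a 32-bit value. The base address 0x1000 is arbitrary and `memory_layout` is an illustrative helper.

```python
def memory_layout(value, byteorder, base_address=0x1000):
    """Return (address, byte) pairs for a 32-bit value stored in memory."""
    raw = value.to_bytes(4, byteorder)   # byteorder is "little" or "big"
    return [(base_address + i, b) for i, b in enumerate(raw)]

for order in ("little", "big"):
    cells = memory_layout(0x0A0B0C0D, order)
    print(order, [f"{addr:#x}: {byte:02x}" for addr, byte in cells])
```

With "little", the least significant byte 0x0D lands at the lowest address; with "big", the most significant byte 0x0A does.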

It's even worse trying to explain Gulliver's Travels (and the origin of Endianness) to engineers that do know it in the data-storage sense.

Speaking of which I think this is one of my fav papers to read from an historical computing pov:

ON HOLY WARS AND A PLEA FOR PEACE - Danny Cohen 1 April 1980 http://www.ietf.org/rfc/ien/ien137.txt

Please try. I've no idea what you're talking about.

Jonathan Swift's "Gulliver's Travels", a satire of 18th century England, included in it a long-running bloody war between two religious sects that started over the correct way to eat a soft-boiled egg: the big end, or the little end.

For those not familiar with the history of England, the tome of Western literature (and the genre of satire), or soft-boiled eggs, I've gotten some confused looks when trying to explain this origin. Plus, it doesn't really add anything at all to the understanding of endianness in the computer hardware sense.

x64 is so much nicer than x86 --- ip-relative addressing, more (mercifully, numbered) registers, and (best of all) a register-based calling convention.

Is there a good Windows assembler that doesn't get flagged by anti-virus software?

UPDATE: I just ran across JWAsm and haven't encountered any problems so far. http://sourceforge.net/projects/jwasm/

How about just writing CIL for the .NET CLR? It's fine if you just want to get used to assembly.

Intro at: http://msdn.microsoft.com/en-us/magazine/cc301368.aspx

Ecma spec at: http://www.ecma-international.org/publications/standards/Ecm...

FASM[1] is quite good, particularly its macro system, and the Windows version includes a minimal IDE. FASM and MenuetOS[2] are both written entirely in FASM.

[1]: http://flatassembler.net/

[2]: http://menuetos.net/

I haven't tried it; but how about Yasm?



Thanks. Looks like it requires Visual Studio in Windows. Was hoping to find a simple "ASM" -> "Windows EXE" tool. I might try WinASM again at home.

Actually, Yasm doesn't require Visual Studio. To create .exe files you'll need a linker, but it works fine with ld.exe from MinGW.

Going assembly straight to .exe sounds convenient for learning, but isn't very useful in practice, so not many assemblers implement it. The most common use case for assembly language these days is to write a few functions that get called from some higher level language, and that's done by having both the compiler(s) and assembler spit out object files that are linked together by the linker. I suspect WinASM is just hiding the steps, and it'd be easy enough to create a Makefile that did the same.

The closest thing to assembly->exe that I know of is the "flat" binary output from Yasm. It's similar to really old DOS .com executables. I'm not sure they'll run on modern Windows, though.

I'll have to look again, the JWasm needs a linker too. Sadly, I think it was the ld.exe from MinGW that I had problems with in the past :( .

Yea, I'm really just doing this for the fun of it, not any actual needed reason. If the "flat" version you're talking about is the old 16-bit .COM format, then yes, it will not work on modern 64-bit Windows. That was the first method I tried with an old version of MASM, as it was really easy to assemble directly to a program back in my Assembly class days.

Can you elaborate on what assemblers get flagged by what anti-virus software? I haven't heard that one before.

I think it's due to heuristic checking. I just got flagged with the WinASM mentioned in the article and McAfee on my work machine deleted the EXE automatically. I've had similar issues in the past, I think with NASM.

Assemblers are obviously only used for evil hacking purposes! =)

This is awesome, now I have something to do tonight. :-)

Why is there no R8H?

It's because of the encoding: http://cs.smith.edu/~thiebaut/ArtOfAssembly/CH04/CH04-3.html (Section 4.7).

Look at the MOV opcode. In 32-bit code, the opcode byte has one bit that specifies whether the operand size of the instruction is 8 bits or 32 bits (16 bit operands are selected using an operand size prefix). In either case, the MODRM byte has three bits to encode a register number (either the source or destination register depending on a different bit in the opcode byte). That gives you 8 registers to work with.

Now the 8086 was not regular. There were 4 GPRs (AX-DX), and 4 pointer registers (SP, BP, SI, DI). It was clear by context whether an instruction would refer to a pointer register or a GPR. But it still had 3 bits for the register number. So these 3 bits were used to treat the 4 16-bit GPRs as 8 8-bit registers.

Now, the 386 regularized the architecture to make SP, BP, SI, and DI mostly act as GPRs. But it still has only 3 bits to specify a register number. Intel chose to retain compatibility with the 286, so in 32-bit mode, when the operand size is 8 bits, you can access the low and high bytes of the low 16 bits of EAX-EDX (AL-DL and AH-DH), just like before. But when the operand size is 32 bits, you can access all 8 registers as GPRs.

AMD64 comes along and muddies things further. In AMD64, if the REX prefix byte exists, and the instruction has an operand size of 8 bits, then the 16 available register numbers (remember, the REX prefix carries an additional bit for each of the register numbers in MODRM and SIB bytes) address the lower byte of each of the 16 GPRs. However, if the REX prefix is absent, the same model is used as the 386, where the 8 available register numbers address the low and high bytes of RAX-RDX (AL-DL and AH-DH).

And that's why there is no R8H-R15H. Accessing R8-R15 requires the extra bits in the REX prefix, but even with the REX prefix you only have 16 register numbers to work with, and AMD chose to use them to make each GPR accessible as a byte register.
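The decoding rule described above fits in a lookup table. A sketch under some assumptions: the function and parameter names are made up, `rex_bit` stands for whichever REX extension bit applies to the field being decoded, and R8B-R15B are the names most assemblers use for what is called R8L-R15L elsewhere in this thread.

```python
# 8-bit register selected by a 3-bit field when no REX prefix is present.
LEGACY_BYTE_REGS = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]

# With any REX prefix, numbers 4-7 name the low bytes of RSP, RBP, RSI, RDI.
REX_LOW_BYTE_REGS = ["AL", "CL", "DL", "BL", "SPL", "BPL", "SIL", "DIL"]

def byte_register(reg_field, rex_present=False, rex_bit=0):
    """Name the 8-bit register selected by a 3-bit field plus optional REX bit."""
    if reg_field not in range(8) or rex_bit not in (0, 1):
        raise ValueError("reg_field is 3 bits, rex_bit is 1 bit")
    if not rex_present:
        if rex_bit:
            raise ValueError("the extension bit only exists with a REX prefix")
        return LEGACY_BYTE_REGS[reg_field]
    number = rex_bit * 8 + reg_field     # REX supplies the 4th bit
    if number >= 8:
        return f"R{number}B"
    return REX_LOW_BYTE_REGS[number]

print(byte_register(4), byte_register(4, rex_present=True))
```

Field value 4 means AH without REX and SPL with it, so there is no number left over for an R8H, or for AH once a REX prefix is present.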

There’s also no R9H ... R15H. These “registers”, if they existed, would be bits 8-15 of each of R8, R9, etc. Without talking to engineers at AMD, it’s probably impossible to say precisely why they weren’t provided, but there is generally little/no reason to ever want to use the existing *H registers in 64-bit code, so it’s not surprising that they were omitted.

Some combination of low/no demand and added encoding/decode/retirement complexity is an excellent reason to leave things out of an architecture.

There are also no RAH, RBH, .... You can only access AH by using the old 32-bit encoding.

This happens because in 32-bit x86, you only have access to AL, AH, BL, BH, CL, CH, DL, DH (no lower bytes of EDI, ESP, etc). These are 8 different possibilities, that are encoded by ModR/M's 3 Reg bits.

In amd64, they decided to regularize the instruction set: those 3 bits are now used to encode the lower byte of EAX--EDI (AL, CL, ..., DIL). To get R8L, ..., R15L you use the REX.R prefix, which acts as the 4th bit.

I have an older Intel PDF for EM64T. It makes for some great reading when you want a really deep understanding of how everything works.
