

Introduction to x64 Assembly - AndreyKarpov
http://software.intel.com/en-us/articles/introduction-to-x64-assembly?wapkw=%28video%20encode%20white%20paper%29

======
pjmlp
For those of you that went through Z80, 6502, 68000, x86 macro assemblers like
myself, it is just me or does the AT&T syntax just suck?

I recently had to convert some code from Intel syntax with NASM macros to GAS
with AT&T syntax, and boy what a pain.

~~~
Someone
I agree that indexing looks way better in Intel ("[ebx+3]") than in AT&T
syntax ("3(%ebx)").

However, AT&T's "mov X Y" for "move X to Y" feels better to me than
Intel's "mov Y X" for "move Y from X" (who says that?). If they wanted the
operands in reverse, they should have named the instruction differently, for
example "load Y X" for "load Y from X" (as in "LDA #$FF" on the 6502).

As for movl instead of mov, I like the 68K version (MOVE.L) better.
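A toy sketch of the difference being debated (my own illustration, not real assembler output): the same abstract instruction rendered in each syntax. Intel puts the destination first and writes memory operands as [base+disp]; AT&T puts the source first, prefixes registers with %, and writes memory operands as disp(%base).

```python
# Toy renderers for one instruction shape: "op dst, src" where src is
# either a register name or a (base, displacement) memory operand.
# This is a simplified model, not a full assembler grammar.

def render_intel(op, dst, src):
    # Intel syntax: destination first, [base+disp] memory references
    if isinstance(src, tuple):
        base, disp = src
        src = f"[{base}+{disp}]"
    return f"{op} {dst}, {src}"

def render_att(op, dst, src):
    # AT&T syntax: source first, % register prefixes, disp(%base) memory
    if isinstance(src, tuple):
        base, disp = src
        src = f"{disp}(%{base})"
    else:
        src = f"%{src}"
    return f"{op} {src}, %{dst}"

print(render_intel("mov", "eax", ("ebx", 3)))   # → mov eax, [ebx+3]
print(render_att("movl", "eax", ("ebx", 3)))    # → movl 3(%ebx), %eax
```

The operand-order disagreement in the thread is exactly the swap between the two return statements.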

~~~
pjmlp
I always read

mov x, y

with the comma being a synonym for equals, like mov x = y, hence it feels more
natural to me.

But the biggest issues are the addressing modes, especially the more complex
ones, and the macros seem very lightweight in features compared to the Intel
world.

~~~
jfarmer
I love this comment. It shows how powerful the right metaphor can be in
understanding something, which is something we obsess over at Dev Bootcamp
when teaching students. It also shows how tiny affordances
(<http://en.wikipedia.org/wiki/Affordance>) make us "think" specific thoughts.
I mean, it's called "mov" so something must be moving, which means there must
be a subject, object, and possibly an indirect object, right?

This is what Piaget meant when he talked about "schemata."
(<http://en.wikipedia.org/wiki/Jean_Piaget#Schemata>) So, thank you for the
new schema. :)

This particular one had never occurred to me and makes it way easier to
internalize.

Sorry for gushing -- few things get me more excited than a new way to explain
something. :)

~~~
Someone
I see this more as a) an example of how well humans can learn to ignore
misdirection (the word 'move' hints at src, dest arguments) and b) the first
small step towards C:

\- read "mov X,Y" as "X = Y"

\- adjust syntax so that it allows "X = Y" (what you see is what you think)

\- similarly, replace obscure syntax for indexed memory acces by such things
as "X = Y[3]"

\- getting annoyed with the seemingly random limitations of the language, add
an expression parser that translates "X = 2 * Y + 3" into "X = 2 * Y" and "X =
X + 3", each of which gets assembled into one instruction. For now, only allow
expressions that get away with only using the result register for temporaries.

\- use existing macro capabilities to build a library of control flow
statements such as IF and WHILE.

\- introduce standard way to call subroutines.

\- introduce shorthand method for doing such calls: one for the call site that
takes a couple of expressions as argument, and one for function entry that
uses macros such as 'int' and 'char' to pop arguments from the call stack.

By that time, one almost has K&R C.
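The expression-flattening step above can be sketched in a few lines. This is a hypothetical illustration (left-to-right only, no operator precedence, whitespace-separated tokens), showing how "X = 2 * Y + 3" splits into statements that each map to one instruction:

```python
# Flatten "dst = a op b op c ..." into one-operator assignments, using
# only the destination register for temporaries, as described above.
# Deliberately naive: evaluates strictly left to right, no precedence.

def flatten(stmt):
    """Split a multi-operator assignment into single-instruction steps."""
    dst, expr = [s.strip() for s in stmt.split("=")]
    tokens = expr.split()
    # the first operation writes the destination directly...
    out = [f"{dst} = {' '.join(tokens[:3])}"]
    # ...and each remaining operator folds into the destination register
    rest = tokens[3:]
    while rest:
        op, operand = rest[0], rest[1]
        out.append(f"{dst} = {dst} {op} {operand}")
        rest = rest[2:]
    return out

print(flatten("X = 2 * Y + 3"))
# → ['X = 2 * Y', 'X = X + 3']
```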

~~~
vidarh
I wrote my first compiler in M68000 assembler in a similar fashion to your
steps. I started by allowing it to recognize M68000 assembler opcodes, and
when found it'd just copy the line to output, otherwise it'd parse and compile
the line. You could use register names and sizes directly in the statements.

So e.g.

    D0.W = 5

Would translate into:

    MOVE.W #5, D0

I didn't use macros though - handling basic argument passing is simple enough.

As soon as I had the basics of procedures/functions, simple expressions and
argument passing in place I started rewriting the compiler using it.
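The pass-through scheme described here can be sketched as follows. This is my own illustration, not vidarh's actual compiler: lines starting with a recognized M68000 mnemonic are copied to the output verbatim, and anything else is compiled (only the "reg.size = immediate" form is handled; the mnemonic list is illustrative, not complete):

```python
# Line-at-a-time "compiler": recognized assembler opcodes pass through
# untouched; assignment statements compile to a MOVE instruction.

KNOWN_OPCODES = {"MOVE", "ADD", "SUB", "JSR", "RTS", "BRA"}

def compile_line(line):
    stripped = line.strip()
    # first token before any '.' size suffix, e.g. "MOVE.L" -> "MOVE"
    mnemonic = stripped.split(".")[0].split()[0].upper()
    if mnemonic in KNOWN_OPCODES:
        return stripped                      # raw assembler: copy to output
    dst, value = [s.strip() for s in stripped.split("=")]
    reg, size = dst.split(".")               # "D0.W" -> ("D0", "W")
    return f"MOVE.{size} #{value}, {reg}"    # compile the assignment

print(compile_line("D0.W = 5"))       # → MOVE.W #5, D0
print(compile_line("MOVE.L D0, D1"))  # → MOVE.L D0, D1 (passed through)
```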

------
Locke1689
The calling convention is a godsend.

One thing for people learning IA-32(e) asm to watch out for is that there are
two syntaxes: Intel and AT&T. Intel, unsurprisingly, uses Intel syntax. GCC,
however, mostly uses AT&T syntax (for historical reasons, and that's all I'll
say).

You can find a reference to the GAS syntax at
<http://en.wikibooks.org/wiki/X86_Assembly/GAS_Syntax>. I wouldn't use it for
a reference to any of the instructions (for that grab the linked Intel manuals
in this article), but that should get you familiar with the differences if you
decide you'd like to write GAS.

You do have the option to set GAS to use Intel syntax with a directive
(".intel_syntax noprefix"), but if you're modifying legacy code or just want
to be consistent with a lot of other GAS code, you may want to use AT&T.

------
niggler
I learned by working through the software developers' manuals. Many years ago,
they used to ship them as physical copies. Nowadays they just send CDs.

PDF copies:
[http://www.intel.com/content/www/us/en/processors/architectu...](http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html)

~~~
Locke1689
When did they stop shipping physical copies? I ordered the new manuals for
VT-x not 4 years ago and they were happy to ship me the full set of volumes
for free.

~~~
niggler
I tried to order in January 2012 and they said back then that they had
replaced the free physical books with free CDs. Now it says:

Note: We are no longer offering the Intel® 64 and IA-32 Architectures Software
Developer’s Manuals on CD-ROM. Hardcopy versions of the manual are available
for purchase via a print-on-demand fulfillment model through a third-party
vendor, Lulu (please reference 1 and 2 below):
<http://www.lulu.com/spotlight/IntelSDM>.

------
unwind
That 3D "cross" graphic (Figure 1, "General Architecture") might be the worst
illustration of a programmer's model for a CPU I've ever seen.

I'm sure it makes sense _after_ you've read the accompanying paragraph a
sufficient number of times (once didn't cut it, for me), but not so sure it
helps in that understanding.

I guess it's a funny hint that the x86 architecture is complex, when a
figure that's just trying to _describe the register names_ is this complex. :)

~~~
rraawwrr
I don't think that's the right graphic - a more reasonable one is in the
PDF which the article is a copy of.

~~~
bcoates
Something's going horribly wrong: I'm seeing a screenshot of code to read and
parse CPUID.

------
kzrdude
Intel keeps making up new names for their x86-64 architecture. Now it's 'x64'
- did they use that one before? Is IA-32e or EMT64 not better? Also, if you
want to find out quickly what they mean by 'x64', why is the article tagged
'ia64' (which means Itanium)?

~~~
mbell
- AMD64 -> the original x86 64-bit extension from AMD.

- EM64T (EMT64 is a common typo) -> Intel's implementation of AMD64; Intel
traded SSE3 to AMD for AMD64.

- IA-32e -> same as EM64T; Intel used this name for a bit, mostly during
development.

- INTEL64 -> Intel renamed EM64T to be more in line with AMD64's naming.

- x86-64 -> the overarching instruction set; AMD64 and INTEL64 are
implementations.

- x64 -> shorthand for x86-64.

x86-64 / x86_64 / x64 are just names for the overarching ISA; AMD64 / INTEL64
are implementations.

The AMD64 and INTEL64 implementations aren't actually identical, there are a
few differences between them which compilers generally deal with by producing
binaries that can handle either implementation.

The x86-64 / x86_64 / x64 thing isn't Intel or AMD's fault; it came about
because OS vendors used different terms for the arch. Linux, for example,
added support when AMD64 was the only thing around and it wasn't clear what
Intel was going to do - they were still pushing Itanium. So the Linux kernel
used AMD64 as the arch name. Later, when Intel licensed AMD64, Linux wasn't
about to completely rename the arch (and thousands of packages as a result),
so now in Linux the AMD64 arch strangely supports both AMD64 and INTEL64
targets. On the other hand, Apple used x86-64 and x86_64 in the OS X kernel,
and Sun / Oracle decided to use x64. Really, the OS vendors created this mess,
in combination with awkward timing of support between AMD and Intel.

~~~
maggit
From memory, the x64 name originally came from Microsoft. They certainly make
good use of it, at least, so Sun/Oracle isn't alone :)
alone :)

Just to fill in a (the?) missing piece in an otherwise well assembled puzzle
;)

~~~
asveikau
I can vouch for the fact, though, that internally the Windows team refers to
it as "amd64".

------
nkurz
I frequently see recommendations that intrinsics be used instead of assembly.
The theory is that they are more portable and just as efficient, which they
certainly are at the instruction-by-instruction level.

But I'm not having any luck writing entire loops with them. All the
compilers I've tried (gcc, icc, clang) feel compelled to "optimize" the
intrinsics for me, turning my well-tuned and port-conscious loop into a hash.

For a tight loop of all intrinsics, I can often get a 20% improvement if I use
raw assembly in the same written order. Is there any good way to convince
these compilers to "do what I said" without turning off all optimizations
everywhere?

~~~
ajross
Put it in a separate translation unit and compile that with different flags.

But seriously, if you're really at the spot where you have _real_, debugged
code that is better than compiler output, the compiler is providing no value
and you might as well use the assembly you've written. The point of intrinsics
is to expose hardware features that the C language doesn't, not to promise to
do it better than the compiler.

~~~
nkurz
_Put it in a separate translation unit and compile that with different flags._

I've tried it in separate units, and haven't been able to get things out in
the order I have them in the source. ICC does the most rearranging (which in
most cases other than this leads to faster code), and GCC generally does what
you say. But both do things like turning my reloads from memory (to reduce
port pressure) into register copies. I was wondering if there were flags I'm
not finding to prevent this.

 _If you're really at the spot where you have real, debugged code that is
better than compiler output, the compiler is providing no value and you might
as well use the assembly you've written._

Likely the best plan, but this is being used in an open source library where
the rest of the work is done with intrinsics. It's a "when in Rome" thing
rather than a technical concern.

------
jng
Poor explanations such as this annoy me:

"By replacing the initial R with an E on the first eight registers, it is
possible to access the lower 32 bits (EAX for RAX). Similarly, for RAX, RBX,
RCX, and RDX, access to the lower 16 bits is possible by removing the initial
R (AX for RAX), and the lower byte of these by switching the X for L (AL
for AX), and the higher byte of the low 16 bits using an H (AH for AX)"

It makes learning more difficult. Something like this would be way better:

"Specific parts of the registers can be accessed separately from the rest of
the register, within strict limitations dictated by the format of
instructions. These register-parts are given easy-to-remember names. The lower
32 bits of the first eight registers can be accessed as EAX, EBX, ECX, EDX,
EBP, ESI, EDI and ESP. The lower 16 bits of registers EAX, EBX, ECX and EDX
can also be accessed as AX, BX, CX and DX. Finally, both the lower and the
second-lower bytes of registers EAX, EBX, ECX, EDX can be separately accessed
as AH, AL, BH, BL, CH, CL, DH and DL, with AL/BL/CL/DL being the lowest-order
byte and AH/BH/CH/DH being the other one."
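The aliasing described here can be modeled directly: each name selects a bit range of the same underlying 64-bit register. This is an illustration of the naming only; hardware write semantics (e.g. 32-bit writes zeroing the upper half in x64) are not modeled:

```python
# One 64-bit value; each alias is just a slice of its bits.
RAX = 0x1122334455667788

EAX = RAX & 0xFFFFFFFF     # low 32 bits
AX  = RAX & 0xFFFF         # low 16 bits
AL  = RAX & 0xFF           # low byte of AX
AH  = (RAX >> 8) & 0xFF    # high byte of AX

print(hex(EAX), hex(AX), hex(AL), hex(AH))
# → 0x55667788 0x7788 0x88 0x77
```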

Knowing what things are due to what part of the model, and what parts are
conventions, is a big part of understanding, which is just building a good
model of the architecture in your mind.

It took me years to understand two's complement arithmetic, and the key
missing point was that it is just a convention. A convention with great
practical advantages, but a convention anyway.
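The "just a convention" point can be made concrete: the same 8-bit pattern means two different numbers depending only on how we choose to read it.

```python
# Two's complement as a convention: 0xFF is 255 or -1 depending on the
# reading, not on the bits themselves.
bits = 0xFF

unsigned = bits                                  # read as unsigned: 255
signed = bits - 0x100 if bits & 0x80 else bits   # read as two's complement: -1

print(unsigned, signed)   # → 255 -1

# The practical advantage: one adder serves both readings.
# (254 + 3) mod 256 == 1, and -2 + 3 == 1.
assert (0xFE + 3) & 0xFF == 1
```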

------
fyolnish
"Intel stores bytes "little endian," meaning lower significant bytes are
stored in lower memory addresses."

I would like to meet the person that doesn't know what endianness is, but can
still understand that explanation.
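For anyone in that position, here is the quoted sentence made concrete: little endian stores the least significant byte at the lowest address.

```python
import struct

# The 32-bit value 0x12345678 serialized in each byte order.
little = struct.pack("<I", 0x12345678)   # little-endian
big    = struct.pack(">I", 0x12345678)   # big-endian, for contrast

print(little.hex())   # → 78563412  (low byte 0x78 comes first in memory)
print(big.hex())      # → 12345678
```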

~~~
acallan
It's even worse trying to explain Gulliver's Travels (and the origin of
Endianness) to engineers that _do_ know it in the data-storage sense.

~~~
jackpirate
Please try. I've no idea what you're talking about.

~~~
acallan
Jonathan Swift's "Gulliver's Travels", a satire of 18th-century England,
included a long-running bloody war between two religious sects that
started over the correct way to eat a soft-boiled egg: from the big end, or
the little end.

For those not familiar with the history of England, the canon of Western
literature (and the genre of satire), or soft-boiled eggs, I've gotten some
confused looks when trying to explain this origin. Plus, it doesn't really add
_anything_ at all to the understanding of endianness in the computer-hardware
sense.

------
tptacek
x64 is so much nicer than x86 --- ip-relative addressing, more (mercifully,
numbered) registers, and (best of all) a register-based calling convention.

------
xhrpost
Is there a good Windows assembler that doesn't get flagged by anti-virus
software?

UPDATE: I just ran across JWasm and haven't encountered any problems so far.
<http://sourceforge.net/projects/jwasm/>

~~~
swatkat
I haven't tried it, but how about Yasm?

<http://yasm.tortall.net/>

<https://github.com/yasm/yasm/>

~~~
xhrpost
Thanks. It looks like it requires Visual Studio on Windows. I was hoping to
find a simple "ASM" -> "Windows EXE" tool. I might try WinASM again at home.

~~~
jlarocco
Actually, Yasm doesn't require Visual Studio. To create .exe files you'll need
a linker, but it works fine with ld.exe from MinGW.

Going assembly straight to .exe sounds convenient for learning, but isn't very
useful in practice, so not many assemblers implement it. The most common use
case for assembly language these days is to write a few functions that get
called from some higher level language, and that's done by having both the
compiler(s) and assembler spit out object files that are linked together by
the linker. I suspect WinASM is just hiding the steps, and it'd be easy enough
to create a Makefile that did the same.

The closest thing to assembly->exe that I know of is the "flat" binary output
from Yasm. It's similar to really old DOS .com executables. I'm not sure
they'll run on modern Windows, though.

~~~
xhrpost
I'll have to look again; JWasm needs a linker too. Sadly, I think it was
the ld.exe from MinGW that I had problems with in the past :(

Yeah, I'm really just doing this for the fun of it, not for any actual need.
If the "flat" version you're talking about is the old 16-bit .COM format,
then no, it won't work on modern 64-bit Windows. That was the first method I
tried, with an old version of MASM, as it was really easy to assemble
directly to a program back in my assembly class days.

------
klrr
This is awesome, now I have something to do tonight. :-)

------
nsxwolf
Why is there no R8H?

~~~
stephencanon
There's also no R9H ... R15H. These "registers", if they existed, would be
bits 8-15 of each of R8, R9, etc. Without talking to engineers at AMD, it's
probably impossible to say precisely why they weren't provided, but there is
generally little/no reason to ever want to use the existing *H registers in
64-bit code, so it's not surprising that they were omitted.

Some combination of low/no demand and added encode/decode/retirement
complexity is an excellent reason to leave things out of an architecture.

~~~
pbsd
There are also no RAH, RBH, etc. You can only access AH by using the old
32-bit encoding.

This happens because in 32-bit x86, you only have access to AL, AH, BL, BH,
CL, CH, DL, DH (no lower bytes of EDI, ESP, etc). These are 8 different
possibilities, that are encoded by ModR/M's 3 Reg bits.

In amd64, they decided to regularize the instruction set: those 3 bits are now
used to encode the lower byte of EAX--EDI (AL, CL, ..., DIL). To get R8L, ...,
R15L you use the REX.R prefix, which acts as the 4th bit.
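The encoding change described here can be sketched numerically. This is a simplified model of my own: the 3-bit ModR/M reg field plus the REX.R bit select among 16 byte registers (the names below assume a REX prefix is present, since without one, encodings 4-7 mean AH/CH/DH/BH instead of SPL/BPL/SIL/DIL; REX.B plays the same role for the r/m field):

```python
# Byte-register names selected by (REX.R << 3) | ModR/M.reg, assuming a
# REX prefix is present. Simplified illustration, not a full decoder.
REGS8 = ["AL", "CL", "DL", "BL", "SPL", "BPL", "SIL", "DIL",
         "R8B", "R9B", "R10B", "R11B", "R12B", "R13B", "R14B", "R15B"]

def reg8(modrm_reg, rex_r):
    """Combine the REX.R bit with the 3-bit ModR/M reg field."""
    return REGS8[(rex_r << 3) | modrm_reg]

print(reg8(0b000, 0))   # → AL
print(reg8(0b000, 1))   # → R8B
print(reg8(0b111, 1))   # → R15B
```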

------
snake_plissken
I have an older Intel PDF for EM64T. It makes for some great reading when
you want a really deep understanding of how everything works.

