I recently had to convert some code from Intel syntax with NASM macros to GAS with AT&T syntax, and boy what a pain.
However, AT&T's "mov X Y" for "move X to Y" feels better to me than Intel's "mov Y X" for "move Y from X" (who says that?). If they wanted to do things in reverse, they should have named the instruction differently, for example "load Y X" for "load Y from X" (as in "LDA #0xFF" from the 6502).
As for movl instead of mov, I like the 68K version (MOVE.L) better.
mov x, y
with the comma being a synonym for equals, like mov x = y, hence it feels more natural to me.
But the biggest issues are the address modes, especially the more complex ones, and the macros, which seem very lightweight in features compared to the Intel world.
This is what Piaget meant when he talked about "schemata." (http://en.wikipedia.org/wiki/Jean_Piaget#Schemata) So, thank you for the new schema. :)
This particular one had never occurred to me and makes it way easier to internalize.
Sorry for gushing -- few things get me more excited than a new way to explain something. :)
- read "mov X,Y" as "X = Y"
- adjust syntax so that it allows "X = Y" (what you see is what you think)
- similarly, replace obscure syntax for indexed memory access by such things as "X = Y"
- getting annoyed with the seemingly random limitations of the language, add an expression parser that translates "X = 2 * Y + 3" into "X = 2 * Y" and "X = X + 3", each of which gets assembled into one instruction. For now, only allow expressions that get away with only using the result register for temporaries.
- use existing macro capabilities to build a library of control flow statements such as IF and WHILE.
- introduce a standard way to call subroutines.
- introduce shorthand method for doing such calls: one for the call site that takes a couple of expressions as argument, and one for function entry that uses macros such as 'int' and 'char' to pop arguments from the call stack.
By that time, one almost has K&R C.
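To make the expression-parser step concrete, here is a minimal sketch in Python (not anyone's actual assembler). It assumes the restriction described above: only the result register may hold intermediate values, so every right-hand operand has to be a plain name or integer. The names `lower`, `leaf` and `op_symbol` are made up for illustration, and the sketch simply rejects anything that would need a second temporary instead of trying to reorder it.

```python
import ast

# Operators this sketch understands (an arbitrary, illustrative subset).
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}


def op_symbol(op):
    """Return the text of a supported operator, or refuse."""
    try:
        return OPS[type(op)]
    except KeyError:
        raise ValueError("unsupported operator") from None


def leaf(node):
    """Return the text of a simple operand (a name or an integer literal)."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return str(node.value)
    raise ValueError("operand would need a temporary besides the result register")


def lower(statement):
    """Split 'X = a op b op c ...' into statements with one operation each."""
    target, expr = (part.strip() for part in statement.split("=", 1))
    node = ast.parse(expr, mode="eval").body
    later = []
    # Peel operations off the right-hand end while the left side is still
    # an operation; each peeled step becomes 'X = X op operand'.
    while isinstance(node, ast.BinOp) and isinstance(node.left, ast.BinOp):
        operand = leaf(node.right)
        if operand == target:
            raise ValueError("target is overwritten before it is read")
        later.append(f"{target} = {target} {op_symbol(node.op)} {operand}")
        node = node.left
    if isinstance(node, ast.BinOp):
        first = f"{target} = {leaf(node.left)} {op_symbol(node.op)} {leaf(node.right)}"
    else:
        first = f"{target} = {leaf(node)}"
    return [first] + later[::-1]


for line in lower("X = 2 * Y + 3"):
    print(line)
```

Something like "X = 2 + Y * 3" is refused by this sketch because the multiplication sits on the right; a real implementation could reorder commutative operations, which is exactly the kind of feature creep the list above describes.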
D0.W = 5
MOVE.W #5, D0
As soon as I had the basics of procedures/functions, simple expressions and argument passing in place I started rewriting the compiler using it.
There wasn't an assembly language available, at least not as we tend to know it.
I don't remember the name of the processor, though.
Just like how I was taught the difference between "<" and ">". Our teacher drew the rest of Pac-Man around the "<" or ">" so that we'd think of "<" or ">" like the mouth of Pac-Man. It wants to eat the larger number. But that was back around 1983 when Pac-Man was all the rage (when I was drawing Pac-Man eating ghosts on my folders).
(Oddly enough, my childhood intuition about equals signs isn’t historically accurate—the equality is actually supposed to be represented by the equal length of the two strokes.)
It took me a long time to differentiate the symbols because, before using the alligator metaphor, my teacher tried this one: draw a line perpendicular to the bottom part of the symbol. If it turns into a 4 it's smaller than; otherwise, if it turns into a 7, then it's bigger than.
I remember seeing a "5 < 8" and thinking "ok, I know which number is smaller and which one is bigger, and I know what each symbol is called, but which one should I use?" For some reason it wasn't obvious to me that I should read the assertion from left to right. I just saw two numbers and I didn't know if I should indicate that one of them was bigger or that the other was smaller; both assertions were true. I remember having a list of exercises to do as homework and just circling the bigger numbers in frustration.
When I heard the alligator metaphor, however, I got it instantly.
If x86 used LD instead of MOV, it would make more sense. LD AX,4 would be read as "load register AX with the value 4"; LD AX,BX would be "load AX with what's in BX"; LD AX,[SI] would be "load AX with what's in the memory location pointed to by SI" and so forth.
Since all of the official docs use Intel syntax, I think that using AT&T syntax is a mistake.
My assembly programming background is heavily Intel syntax based: I've done some x86 programming in the 16-bit days and also some Z80 for my TI calculators. So I found the AT&T syntax difficult to read (and write) at first.
But recently I was working on an operating system project and I needed some assembly code. I initially used NASM, then tried GAS/GCC inline asm with .intel_syntax, but I finally settled on AT&T syntax. It was easier to have just one asm syntax in my program (for inline asm in C code and in separate asm files). And it was easier to integrate in my build and toolchain, and no need to install another assembler for my target arch, just gcc+binutils.
So, in the end it was less pain to use AT&T syntax when it comes to tooling. But here's the thing: in less than a week I didn't mind the syntax any more.
So yeah, you might (as I did) think that the AT&T syntax sucks, but sometimes it's easier to use that than to reconfigure tools to use Intel syntax. But just get over it and use the default syntax of your tools. It's not such a big deal.
sub rbx, 0x8
mov rax, [rbx]
mov rax, QWORD PTR [rbx]
Anyway, the job is already done.
One thing for people learning IA-32(e) asm to watch out for is that there are two syntaxes: Intel and AT&T. Intel, unsurprisingly, uses Intel syntax. GCC, however, mostly uses AT&T syntax (for historical reasons, and that's all I'll say).
You can find a reference to the GAS syntax at http://en.wikibooks.org/wiki/X86_Assembly/GAS_Syntax. I wouldn't use it for a reference to any of the instructions (for that grab the linked Intel manuals in this article), but that should get you familiar with the differences if you decide you'd like to write GAS.
You do have the option to set GAS to use Intel syntax with a directive, but if you're modifying legacy code or just want to be consistent with a lot of other GAS code, you may want to use AT&T.
PDF copies: http://www.intel.com/content/www/us/en/processors/architectu...
Note: We are no longer offering the Intel® 64 and IA-32 Architectures Software Developer’s Manuals on CD-ROM. Hardcopy versions of the manual are available for purchase via a print-on-demand fulfillment model through a third-party vendor, Lulu (please reference 1 and 2 below): http://www.lulu.com/spotlight/IntelSDM.
I'm sure it makes sense after you've read the accompanying paragraph a sufficient number of times (once didn't cut it, for me), but not so sure it helps in that understanding.
I guess it's a funny hint that the x86's architecture is complex, when a figure that's just trying to describe the register names is this complex. :)
- EM64T (EMT64 is a common typo) -> Intel's implementation of AMD64; Intel traded SSE3 to AMD for AMD64.
- IA-32e -> Same as EM64T, Intel used this name for a bit, mostly during development.
- INTEL64 -> Intel renamed EM64T to be more in line with AMD64's naming.
- x86-64 -> Overarching instruction set, AMD64 and INTEL64 are implementations.
- x64 -> Shorthand for x86-64.
x86-64 / x86_64 / x64 are just names for the overarching ISA; AMD64 / INTEL64 are implementations.
The AMD64 and INTEL64 implementations aren't actually identical; there are a few differences between them, which compilers generally deal with by producing binaries that can handle either implementation.
The x86-64 / x86_64 / x64 thing isn't Intel's or AMD's fault; it came about because OS vendors used different terms for the arch. Linux, for example, added support when AMD64 was the only thing around and it wasn't clear what Intel was going to do, since they were still pushing Itanium. So the Linux kernel used AMD64 as the arch name. Later, when Intel licensed AMD64, Linux wasn't about to completely rename the arch (and thousands of packages as a result), so now in Linux the AMD64 arch strangely supports both AMD64 and INTEL64 targets. On the other hand, Apple used x86-64 and x86_64 in the OS X kernel, and Sun / Oracle decided to use x64. Really, the OS vendors created this mess, in combination with the awkward timing of support between AMD and Intel.
Why would you do this? As long as you never need more than 4 GB of memory, you can save a few bytes per pointer (32-bit pointers instead of 64-bit) while still getting the extra registers and operations of x86-64 code.
Just to fill in a (the?) missing piece in an otherwise well assembled puzzle ;)
x86 also started as unofficial shorthand.
But I'm not having any luck at writing entire loops with them. All the compilers I've tried (gcc, icc, clang) feel compelled to "optimize" the intrinsics for me, turning my well-tuned and port-conscious loop into a hash.
For a tight loop of all intrinsics, I can often get a 20% improvement if I use raw assembly in the same written order.
Is there any good way to convince these compilers to "do what I said" without turning off all optimizations everywhere?
But seriously, if you're really at the spot where you have real, debugged code that is better than compiler output, the compiler is providing no value and you might as well use the assembly you've written. The point of intrinsics is to expose hardware features that the C language doesn't, not to promise to do it better than the compiler.
I've tried it in separate units, and haven't been able to get things out in the order I have them in the source. ICC does the most rearranging (which in most cases other than this leads to faster code), and GCC generally does what you say.
But both do things like turning my reloads from memory (to reduce port pressure) into register copies. I was wondering if there were flags I'm not finding to prevent this.
If you're really at the spot where you have real, debugged code that is better than compiler output, the compiler is providing no value and you might as well use the assembly you've written.
Likely the best plan, but this is being used in an open source library where the rest of the work is done with intrinsics. It's a "when in Rome" thing rather than a technical concern.
"By replacing the initial R with an E on the first eight registers, it is possible to access the lower 32 bits (EAX for RAX). Similarly, for RAX, RBX, RCX, and RDX, access to the lower 16 bits is possible by removing the initial R (AX for RAX), and the lower byte of these by switching the X for L (AL for AX), and the higher byte of the low 16 bits using an H (AH for AX)"
It makes learning more difficult. Something like this would be way better:
"Specific parts of the registers can be accessed separately from the rest of the register, within strict limitations dictated by the format of instructions. These register-parts are given easy-to-remember names. The lower 32 bits of the first eight registers can be accessed as EAX, EBX, ECX, EDX, EBP, ESI, EDI and ESP. The lower 16 bits of registers EAX, EBX, ECX and EDX can also be accessed as AX, BX, CX and DX. Finally, both the lowest and the second-lowest bytes of registers EAX, EBX, ECX, EDX can be separately accessed as AH, AL, BH, BL, CH, CL, DH and DL, with AL/BL/CL/DL being the lowest-order byte and AH/BH/CH/DH being the other one."
Knowing what things are due to what part of the model, and what parts are conventions, is a big part of understanding, which is just building a good model of the architecture in your mind.
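The register names are a good example of such a convention: RAX, EAX, AX, AH and AL are just differently sized windows onto the same 64 bits. A rough Python sketch of that model follows; `register_views` is a made-up helper, and it only models reading the parts, not what happens to the rest of the register when you write to one of them.

```python
def register_views(rax):
    """Return the named sub-parts of a 64-bit RAX value (read-only model)."""
    rax &= 0xFFFFFFFFFFFFFFFF  # keep the value within 64 bits
    return {
        "RAX": rax,               # all 64 bits
        "EAX": rax & 0xFFFFFFFF,  # lower 32 bits
        "AX": rax & 0xFFFF,       # lower 16 bits
        "AH": (rax >> 8) & 0xFF,  # second-lowest byte (bits 8-15)
        "AL": rax & 0xFF,         # lowest byte (bits 0-7)
    }


for name, value in register_views(0x1122334455667788).items():
    print(f"{name:>3} = {value:#x}")
```

With the example value above, EAX comes out as 0x55667788, AX as 0x7788, AH as 0x77 and AL as 0x88.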
It took me years to understand two's complement arithmetic, and the key missing point was that it is just a convention. A convention with great practical advantages, but a convention anyway.
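A small Python sketch of the "it's just a convention" point: the adder produces the same bits either way, and only the reading of those bits differs. The helper names `add_bits` and `as_signed` and the 8-bit width are choices made for this illustration.

```python
def add_bits(a, b, width=8):
    """Add two bit patterns the way a fixed-width adder does: wrap on overflow."""
    return (a + b) & ((1 << width) - 1)


def as_signed(bits, width=8):
    """Read a bit pattern using the two's complement convention."""
    bits &= (1 << width) - 1
    if bits & (1 << (width - 1)):  # top bit set: read as negative
        return bits - (1 << width)
    return bits


pattern = add_bits(0xFE, 0x03)
# Unsigned reading: 254 + 3 wraps around to 1.
# Signed reading:    -2 + 3 is 1.
print(hex(pattern), as_signed(0xFE), as_signed(pattern))
```

The hardware never needs to know which reading you have in mind, which is the practical advantage of the convention.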
I would like to meet the person that doesn't know what endianness is, but can still understand that explanation.
ON HOLY WARS AND A PLEA FOR PEACE - Danny Cohen 1 April 1980
I've gotten some confused looks when trying to explain this origin to people not familiar with the history of England, that tome of Western literature (and the genre of satire), or soft-boiled eggs. Plus, it doesn't really add anything at all to the understanding of endianness in the computer hardware sense.
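For the hardware sense, a short example may do more than the egg story. This Python sketch (the helper name `byte_layout` is made up) shows the order in which the bytes of one 32-bit value would sit in memory, lowest address first, under each convention. x86 is little-endian.

```python
def byte_layout(value, size=4):
    """Show the bytes of `value` in memory order, lowest address first."""
    return {
        "little": value.to_bytes(size, "little").hex(" "),  # least significant byte first
        "big": value.to_bytes(size, "big").hex(" "),        # most significant byte first
    }


layout = byte_layout(0x12345678)
print("little-endian:", layout["little"])  # 78 56 34 12
print("big-endian:   ", layout["big"])     # 12 34 56 78
```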
UPDATE: I just ran across JWAsm and haven't encountered any problems so far. http://sourceforge.net/projects/jwasm/
Ecma spec at:
Going assembly straight to .exe sounds convenient for learning, but isn't very useful in practice, so not many assemblers implement it. The most common use case for assembly language these days is to write a few functions that get called from some higher level language, and that's done by having both the compiler(s) and assembler spit out object files that are linked together by the linker. I suspect WinASM is just hiding the steps, and it'd be easy enough to create a Makefile that did the same.
The closest thing to assembly->exe that I know of is the "flat" binary output from Yasm. It's similar to really old DOS .com executables. I'm not sure they'll run on modern Windows, though.
Yeah, I'm really just doing this for the fun of it, not for any actual need. If the "flat" version you're talking about is the old 16-bit .COM format, then yes, it will not work on modern 64-bit Windows. That was the first method I tried with an old version of MASM, as it was really easy to assemble directly to a program back in my Assembly class days.
Look at the MOV opcode. In 32-bit code, the opcode byte has one bit that specifies whether the operand size of the instruction is 8 bits or 32 bits (16-bit operands are selected using an operand size prefix). In either case, the MODRM byte has three bits to encode a register number (either the source or destination register depending on a different bit in the opcode byte). That gives you 8 registers to work with.
Now the 8086 was not regular. There were 4 GPRs (AX-DX), and 4 pointer registers (SP, BP, SI, DI). It was clear by context whether an instruction would refer to a pointer register or a GPR. But it still had 3 bits for the register number. So these 3 bits were used to treat the 4 16-bit GPRs as 8 8-bit registers.
Now, the 386 regularized the architecture to make SP, BP, SI, and DI mostly act as GPRs. But it still has only 3 bits to specify a register number. Intel chose to retain compatibility with the 286, so in 32-bit mode, when the operand size is 8 bits, the register numbers still select the low and high bytes of AX-DX (AL, CL, DL, BL and AH, CH, DH, BH), just like before. But when the operand size is 32 bits, you can access all 8 registers as GPRs.
AMD64 comes along and muddies things further. In AMD64, if the REX prefix byte exists, and the instruction has an operand size of 8 bits, then the 16 available register numbers (remember, the REX prefix carries an additional bit for each of the register numbers in MODRM and SIB bytes) address the lower byte of each of the 16 GPRs. However, if the REX prefix is absent, the same model is used as on the 386, where the 8 available register numbers address the low and high bytes of the first four registers (AL, CL, DL, BL and AH, CH, DH, BH).
And that's why there is no R8H-R15H. Accessing R8-R15 requires the extra bits in the REX prefix, but even with the REX prefix you only have 16 register numbers to work with, and AMD chose to use them to make each GPR accessible as a byte register.
Some combination of low/no demand and added encoding/decode/retirement complexity is an excellent reason to leave things out of an architecture.
This happens because in 32-bit x86, you only have access to AL, AH, BL, BH, CL, CH, DL, DH (no lower bytes of EDI, ESP, etc). These are 8 different possibilities, that are encoded by ModR/M's 3 Reg bits.
In amd64, they decided to regularize the instruction set: when a REX prefix is present, those 3 bits encode the lower byte of RAX through RDI (AL, CL, DL, BL, SPL, BPL, SIL, DIL) instead. To get R8L, ..., R15L you set the REX.R bit (or REX.B when the register is in the r/m field), which acts as the 4th bit.
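Here is a rough Python sketch of how an 8-bit register number gets resolved, as described above. The function name `byte_register` and its parameters are made up for the example; the tables use the R8B-R15B spelling that most assemblers accept for what is called R8L-R15L above.

```python
# Without a REX prefix: the 386 model, low and high bytes of AX, CX, DX, BX.
LEGACY = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]

# With a REX prefix: the low byte of each of the 16 GPRs.
WITH_REX = ["AL", "CL", "DL", "BL", "SPL", "BPL", "SIL", "DIL"] + [
    f"R{n}B" for n in range(8, 16)
]


def byte_register(reg_bits, rex_present=False, rex_extension=0):
    """Name the 8-bit register selected by a 3-bit field plus an optional REX bit."""
    if not 0 <= reg_bits <= 7:
        raise ValueError("the ModR/M register field is only 3 bits wide")
    if rex_extension not in (0, 1):
        raise ValueError("the REX extension is a single bit")
    if not rex_present:
        return LEGACY[reg_bits]
    return WITH_REX[(rex_extension << 3) | reg_bits]


print(byte_register(4))                    # AH
print(byte_register(4, rex_present=True))  # SPL
print(byte_register(0, True, 1))           # R8B
```

Register number 4 is the interesting case: the same 3 bits mean AH without a REX prefix and SPL with one, which is why AH, CH, DH and BH cannot be used in an instruction that needs REX.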