It seems every few years someone learning Asm for the first time stumbles across the dichotomy, and writes an article about it --- in favour of the Intel side. This is one of the earlier ones I remember: http://x86asm.net/articles/what-i-dislike-about-gas/
Also, it's worth mentioning that what people are normally referring to when they say "Intel syntax", beyond the syntax in the official docs, is more like "TASM IDEAL" than MASM or Intel's little-known "official" assembler (named ASM386.EXE, seemingly rare and hard to find these days); in the former, a label is consistently an address constant and all memory operands are enclosed in [ ], while the latter distinguishes between labels and variables.
Wow, this is a spicy take but i'll go ahead and take the bait. why on earth should i be angry that parentheses are used to de-reference pointers? TFA doesn't ever explain that part, it's just implied that I should be outraged somehow.
The article’s objection to “parenthesized nonsense” is about the nonsense, not the parentheses. 3(%edi,%ebx,8) is hiding a multiplication, two additions, and a memory access, in an order that you wouldn’t guess from the syntax.
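To spell out what that operand hides, here is a minimal Python sketch of the effective-address calculation for AT&T's disp(base,index,scale) form. The function name `effective_address` and the register contents are made up for illustration; the only thing taken as given is the x86 rule base + index*scale + disp.

```python
def effective_address(disp, base, index, scale):
    """Address computed for the AT&T operand disp(base,index,scale)."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 only allows scale factors 1, 2, 4 and 8")
    # Wrap to 32 bits, as 32-bit addressing does.
    return (base + index * scale + disp) & 0xFFFFFFFF

# 3(%edi,%ebx,8) with hypothetical register contents
edi, ebx = 0x1000, 2
addr = effective_address(3, edi, ebx, 8)
print(hex(addr))  # 0x1013
```

Intel syntax writes the same operand as [edi + ebx*8 + 3], which puts the arithmetic on the page instead of leaving it to positional convention.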
I think this is the "the first language I learned does X, everything that does Y is wrong" reaction, when in reality there are lots of different ways to do something and none of them are really 'right' - everything is a set of compromises made by the person who wrote it, in the context of the place and time they wrote it.
For example: suppose I want to be able to unambiguously use the names of C globals in my assembler - I might insist that they all get extended with an initial '_' or I might insist that registers get a '%' prepended
Or: many of my potential customers use IBM 029 card punches, I can't use [] to indicate indirection because they can't punch those characters, so I'll use () instead
Or: people in the UK don't have a $ key, I'll use a # to indicate literals instead
These are historical issues I've certainly had to deal with when designing long lived things
yeah, that's why i'm so shocked at the OP's hatred for this type of notation. It's the standard notation on several other platforms so it shouldn't be too controversial that people want to use it on x86 too.
Intel/MASM/TASM syntax has stupidities in it too, like "DWORD PTR" and whatnot, which brings Cobol-level verbosity to machine coding. All you need is a one-letter suffix for the size of the datum being transferred.
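For comparison, AT&T writes `movl $1, (%eax)` where MASM-style Intel syntax writes `mov DWORD PTR [eax], 1`. A rough Python sketch of the correspondence between the one-letter integer suffixes and the Intel size keywords follows. The names `SIZES` and `describe` are hypothetical, and the split is deliberately naive: it assumes the last letter is always a size suffix, which is not true of every mnemonic (`call` ends in `l`, for instance).

```python
# Operand size in bytes for each AT&T one-letter integer suffix,
# paired with the Intel-style keyword that expresses the same size.
SIZES = {
    "b": (1, "BYTE PTR"),
    "w": (2, "WORD PTR"),
    "l": (4, "DWORD PTR"),
    "q": (8, "QWORD PTR"),
}

def describe(mnemonic):
    """Split a suffixed AT&T mnemonic such as 'movl' into (base, bytes, keyword)."""
    base, suffix = mnemonic[:-1], mnemonic[-1]
    size, keyword = SIZES[suffix]
    return base, size, keyword

print(describe("movl"))  # ('mov', 4, 'DWORD PTR')
```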
Back when I wrote a lot of x86 assembly in the late 90's/early 2000's, I used NASM (the Netwide Assembler). It was the most natural syntax. Very similar to MASM and TASM but without the unnecessary cruft (no "dword ptr", etc.) It was a pleasure to work with.
To be proficient in reading or writing assembly (versus higher level languages) means to deal with a stream of instructions.
Once you are fluent with that concept parsing individual instructions is a lookup. Operand ordering is just a small part of that.
It can be x86, arm, or tis-100.
Switching between x86-intel and x86-at&t is no different than switching between x86 and arm.
It makes sense if you consider that intel kept flip-flopping between LD (load; 4004, 8008) and MOV (move; 8080, 8085, 8086) semantics, but didn't change the syntax layout. 'LD EAX, EBX' makes perfect sense, or would have, but 8086 landed on a 'move' generation, and stuck there.
Lots of outrage, not much grounds. The syntax is different and, um, you might not get the results you want if you're expecting one and have to produce the other. But that's true in either direction, so... Intel syntax is broken! Makes just as much logical sense.
> but it also bears mentioning that assembly and English are different languages. In particular, the way objects and verbs interact is different
The mnemonic operation names are in English though. mov = move, add = add, jmp = jump etc. It's not like it's just hex codes, APL or K. So I think the English argument kind of makes sense.
> It’s somewhat telling that assembly has no equivalent to the word ‘to’, and few enough production rules to count on both hands.
Assembly was designed to be fairly concise. Blaming a 1970s assembly syntax for not having "to" instead of a comma is kind of silly. It used commas and will keep using them probably. If the syntax was more advanced, it wouldn't be assembly, it would be C or some other higher level language.
TFA> The association between this numeric form and its meaning under the ISA is completely arbitrary
P> It's not like it's just hex codes
The numeric form of machine code, appropriately in hex or octal, is not arBITrary at all, the bits mean things that help group and decode the instruction with minimal logic. In the first example in the article, it's easy to see the registers he's talking about, and it would be easy after that to decode the addressing modes, alu functions, etc.
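As an illustration of that point, here is a small Python sketch that splits an x86 ModRM byte into its bit fields. The byte pair 89 D8 is chosen for this example and is not taken from the article; `split_modrm` and `REG32` are hypothetical names.

```python
# 32-bit register names in x86 encoding order (0 through 7).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def split_modrm(byte):
    """Split a ModRM byte into its mod (2 bits), reg (3 bits) and rm (3 bits) fields."""
    return byte >> 6, (byte >> 3) & 7, byte & 7

# Bytes 89 D8: opcode 0x89 is MOV r/m32, r32 and 0xD8 is its ModRM byte.
mod, reg, rm = split_modrm(0xD8)
print(oct(0xD8))                   # 0o330, one octal digit per field
print(mod, REG32[reg], REG32[rm])  # 3 ebx eax
```

With mod = 3 both operands are registers, so 89 D8 copies ebx into eax: `mov eax, ebx` in Intel syntax, `movl %ebx, %eax` in AT&T. Written in octal, the fields can be read straight off the digits.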
But they are not listing the hex codes but the mnemonic commands which look strangely enough similar to English, wouldn't you agree? "mov", "jmp", "rdtsc" etc.
There's no absolute consistency across ISAs, or even within a single one. E.g.:
- 6502 uses "load"/"store" (LDA/STA, LDX/STX, LDY/STY), but also "transfer" (TXA, TAX, TYA, TAY, TSX, TXS).
- Z80 has "load" (LD, LDD, LDIR) but NO "store" (stores are LD src=reg,dest=ram I think)
- Some 4-bit Sharp CPUs use "t" meaning transfer as a JMP.
- Intel 4004 has mnemonics with Fetch, Read, Write, Load
- MIPS has both "load"/"store" (lb/sb,lbu,lh/sh,lhu,lw/sw,lui,la,li) and a couple "move" mnemonics (mfhi/mthi, mflo/mtlo).
- 68000 has MOVE (MOVE, MOVEA, MOVEM, MOVEQ), but then you do have that pesky LEA (Load Effective Address) instruction.
- Signetics 2650 is consistent, you have LODA/STRA, LODI, LODR/STRR, LODZ/STRZ for load/stores of registers to/from RAM and LDPL/STPL, LPSL/SPSL, LPSU/SPSU for load/stores of the PSW to/from RAM.
> There are only a few cases where you really need to manually include size information; off the top of my head, I can only think of: sign- or zero-extended moves out of memory, and operations involving a memory and immediate operand.
There are a whole bunch of instructions with no operands at all that nonetheless have sizes. The most useful that comes to mind is SYSRET. sysretl and sysretq are both useful but are not interchangeable.
There are a whole bunch of architectures where you have to specify instruction sizes - the intel prefix thing is really the odd one out - more importantly the PDP-11 (where the AT&T format came from) had ubiquitous 8 and 16-bit instructions
I just realized something: while very few people are writing Assembly code itself, there is a very sizable community of C/C++ programmers who need to be able to read the Assembly code, as part of the debugging process. Well, GCC disassembler defaults to AT&T syntax, as does Clang, I believe. This means that AT&T syntax is quite important, as it will be widely used in debugging.
I personally find cmp to be really annoying in AT&T syntax: the AT&T syntax cmp; jge is backwards, whereas the exact same sequence in Intel syntax makes sense. (jge is “jump if greater or equal”. But which values are compared in which order? [0]) AVX-512 in AT&T syntax is extra bizarre, too, if for no other reason than that it doesn’t match the documentation.
[0] For extra mental gymnastics, CMP with two register arguments can be encoded in two different ways: the way where the first operand could have been, but wasn’t, in memory, and the way where the second operand could have been, but wasn’t, in memory. Unlike SUB, CMP doesn’t really have a destination, but AT&T still reverses it for, ahem, consistency.
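To make the ordering concrete, here is a small Python model of when jge is taken after a cmp, as a sketch that treats the operands as plain signed integers. The function names are hypothetical; the operand positions follow each syntax as written in the source text.

```python
def jge_taken_intel(first, second):
    """Intel: `cmp first, second` then `jge` is taken when first >= second (signed)."""
    return first >= second

def jge_taken_att(first, second):
    """AT&T: `cmp first, second` then `jge` has the operands swapped,
    so the jump is taken when second >= first (signed)."""
    return second >= first

# The same two values, written in the same textual order, give opposite answers.
print(jge_taken_intel(5, 3))  # True
print(jge_taken_att(5, 3))    # False
```

So `cmp eax, ebx; jge` in Intel syntax reads left to right as "if eax >= ebx", while the equivalent `cmpl %ebx, %eax; jge` in AT&T has to be read right to left to get the same condition.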
LOL @ the operand ordering bit that spends paragraphs whinging about "appeal to authority" and then justifies the author's preference with "because mutation" XD