AT&T syntax is broken (2021) (outerproduct.net)
91 points by g42gregory on Nov 19, 2022 | 44 comments



Thanks! Macroexpanded:

AT&T Syntax versus Intel Syntax - https://news.ycombinator.com/item?id=33585154 - Nov 2022 (104 comments)


It seems every few years someone learning Asm for the first time stumbles across the dichotomy, and writes an article about it --- in favour of the Intel side. This is one of the earlier ones I remember: http://x86asm.net/articles/what-i-dislike-about-gas/

Also, it's worth mentioning that what people are normally referring to when they say "Intel syntax", beyond the syntax in the official docs, is more like "TASM IDEAL" than MASM or Intel's little-known "official" assembler (named ASM386.EXE, seemingly rare and hard to find these days). In the former, a label is consistently an address constant and all memory operands are enclosed in [ ], while the latter two distinguish between labels and variables.


Wow, this is a spicy take, but I'll go ahead and take the bait. Why on earth should I be angry that parentheses are used to dereference pointers? TFA doesn't ever explain that part; it's just implied that I should be outraged somehow.


The article’s objection to “parenthesized nonsense” is about the nonsense, not the parentheses. 3(%edi,%ebx,8) is hiding a multiplication, two additions, and a memory access, in an order that you wouldn’t guess from the syntax.
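
To make the hidden arithmetic concrete, here is a minimal Python sketch of what that operand computes. The function name and the register values are made up for illustration; only the formula disp + base + index*scale comes from the x86 addressing mode itself. Intel syntax spells the same operand as [edi + ebx*8 + 3].

```python
def effective_address(disp, base, index, scale):
    """Address named by the AT&T operand disp(base,index,scale)."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale must be 1, 2, 4 or 8")
    # One multiplication, then two additions, wrapped to 32 bits.
    return (base + index * scale + disp) & 0xFFFFFFFF

# Hypothetical register contents, chosen only for illustration.
edi, ebx = 0x1000, 2
print(hex(effective_address(3, edi, ebx, 8)))  # 0x1013
```

The memory access then happens at the resulting address, which is the fourth operation the operand hides.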


I think this is the "the first language I learned does X, so everything that does Y is wrong" reaction, when in reality there are lots of different ways to do something and none of them is really 'right'. Everything is a set of compromises made by the person who wrote it, in the context of the place and time they wrote it.

For example: suppose I want to be able to unambiguously use the names of C globals in my assembler - I might insist that they all get extended with an initial '_' or I might insist that registers get a '%' prepended

Or: many of my potential customers use IBM 029 card punches, I can't use [] to indicate indirection because they can't punch those characters, so I'll use () instead

Or: people in the UK don't have a $ key, I'll use a # to indicate literals instead

These are historical issues I've certainly had to deal with when designing long lived things


Motorola 68K assembly languages use parentheses that way, too.

  move.b #42, (a3)
or whatever.


yeah, that's why i'm so shocked at the OP's hatred for this type of notation. It's the standard notation on several other platforms so it shouldn't be too controversial that people want to use it on x86 too.


Intel/MASM/TASM syntax has stupidities in it too, like "DWORD PTR" and whatnot, which brings Cobol-level verbosity to machine coding. All you need is a one-letter suffix for the size of the datum being transferred.


AVX-1024 will probably have 20 letter mnemonics.


Should be 128 (ASCII) letters.


Back when I wrote a lot of x86 assembly in the late 90's/early 2000's, I used NASM (the Netwide Assembler). It was the most natural syntax. Very similar to MASM and TASM but without the unnecessary cruft (no "dword ptr", etc.) It was a pleasure to work with.


I feel like this is such a non-issue.

To be proficient in reading or writing assembly (versus higher level languages) means to deal with a stream of instructions. Once you are fluent with that concept parsing individual instructions is a lookup. Operand ordering is just a small part of that. It can be x86, arm, or tis-100.

Switching between x86-intel and x86-at&t is no different than switching between x86 and arm.


Coming from 6502, 68000, and similar families, I have always found it weird that

    mov eax, ebx
means moving ebx into eax.

When you move something, you move it to its destination, not from its destination.

    mov A, B
should mean "Move A into B".


It makes sense if you consider that intel kept flip-flopping between LD (load; 4004, 8008, 8085) and MOV (move; 8080, 8086) semantics, but didn't change the syntax layout. 'LD EAX, EBX' makes perfect sense, or would have, but 8086 landed on a 'move' generation, and stuck there.


If you interpret "mov eax, ebx" as "eax = ebx", it's not that weird.


Yes, that's how I read it mentally, but that was not my point.


Lots of outrage, not much grounds. The syntax is different and, um, you might not get the results you want if you're expecting one and have to produce the other. But that's true in either direction, so... Intel syntax is broken! Makes just as much logical sense.


> but it also bears mentioning that assembly and English are different languages. In particular, the way objects and verbs interact is different

The mnemonic operation names are in English though. mov = move, add = add, jmp = jump etc. It's not like it's just hex codes, APL or K. So I think the English argument kind of makes sense.

> It’s somewhat telling that assembly has no equivalent to the word ‘to’, and few enough production rules to count on both hands.

Assembly was designed to be fairly concise. Blaming a 1970s assembly syntax for not having "to" instead of a comma is kind of silly. It used commas and will keep using them probably. If the syntax was more advanced, it wouldn't be assembly, it would be C or some other higher level language.


and to take this in the other direction,

TFA> The association between this numeric form and its meaning under the ISA is completely arbitrary

P> It's not like it's just hex codes

The numeric form of machine code, appropriately in hex or octal, is not arBITrary at all, the bits mean things that help group and decode the instruction with minimal logic. In the first example in the article, it's easy to see the registers he's talking about, and it would be easy after that to decode the addressing modes, alu functions, etc.
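
As a rough illustration of why octal reads well here: an x86 ModRM byte is laid out as 2-3-3 bits, so its three octal digits are exactly the mod, reg and rm fields. The sketch below uses a byte pair I picked myself (89 D8, one encoding of `mov eax, ebx`), not the article's example, and the helper name is hypothetical.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def split_modrm(byte):
    """Split a ModRM byte into its 2-3-3 bit fields (mod, reg, rm)."""
    return (byte >> 6) & 0o3, (byte >> 3) & 0o7, byte & 0o7

# 89 /r is MOV r/m32, r32; with ModRM D8 it encodes `mov eax, ebx`.
modrm = 0xD8
mod, reg, rm = split_modrm(modrm)
print(oct(modrm))                   # 0o330: digits are mod=3, reg=3, rm=0
print(mod, REG32[reg], REG32[rm])   # 3 ebx eax
```

mod=3 means both operands are registers, reg names the source (ebx) and rm the destination (eax) for this opcode.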


But they are not listing the hex codes but the mnemonic commands which look strangely enough similar to English, wouldn't you agree? "mov", "jmp", "rdtsc" etc.


TFA shows a numeric code in octal, but says that numeric codes are arbitrary. Why should they list arbitrary codes?

They're not arbitrary, was my point.


I've always liked 6502 (and ARM) syntax. LDA #5 means load A with the literal number 5.


Intel syntax it is, then.

MOV EAX, 5

Vs

MOV $5, %EAX
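
The mechanical difference between those two lines can be sketched as a toy translator. This is only an illustration under heavy assumptions: it handles two-operand instructions on 32-bit registers and immediates, knows nothing about memory operands, and the function name is invented. Note that GNU as would normally also accept the `l` size suffix shown here.

```python
REGS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}

def intel_to_att(line):
    """Translate `op dst, src` (32-bit registers/immediates only) to AT&T."""
    mnemonic, operands = line.strip().lower().split(None, 1)
    dst, src = [o.strip() for o in operands.split(",")]

    def operand(o):
        # AT&T marks registers with % and immediates with $.
        return "%" + o if o in REGS else "$" + o

    # Source comes first in AT&T; the `l` suffix states the 32-bit size.
    return f"{mnemonic}l {operand(src)}, {operand(dst)}"

print(intel_to_att("MOV EAX, 5"))    # movl $5, %eax
print(intel_to_att("mov eax, ebx"))  # movl %ebx, %eax
```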


It always seemed weird to use MOV[E] as a command when the value isn't being removed from the source location. It should be C[O]PY.


No absolute consistency in ISAs, even among the same one: E.g.

- 6502 uses "load"/"store" (LDA/STA, LDX/STX, LDY/STY), but also "transfer" (TXA, TAX, TYA, TAY, TSX, TXS).

- Z80 has "load" (LD, LDD, LDIR) but NO "store" (stores are LD src=reg,dest=ram I think)

- Some 4-bit Sharp CPUs use "t" meaning transfer as a JMP.

- Intel 4004 has mnemonics with Fetch, Read, Write, Load

- MIPS has both "load"/"store" (lb/sb,lbu,lh/sh,lhu,lw/sw,lui,la,li) and a couple "move" mnemonics (mfhi/mthi, mflo/mtlo).

- 68000 has MOVE (MOVE, MOVEA, MOVEM, MOVEQ), but then you do have that pesky LEA (Load Effective Address) instruction.

- Signetics 2650 is consistent, you have LODA/STRA, LODI, LODR/STRR, LODZ/STRZ for load/stores of registers to/from RAM and LDPL/STPL, LPSL/SPSL, LPSU/SPSU for load/stores of the PSW to/from RAM.


> There are only a few cases where you really need to manually include size information; off the top of my head, I can only think of: sign- or zero-extended moves out of memory, and operations involving a memory and immediate operand.

There are a whole bunch of instructions with no operands at all that nonetheless have sizes. The most useful that comes to mind is SYSRET. sysretl and sysretq are both useful but are not interchangeable.


There are a whole bunch of architectures where you have to specify instruction sizes - the Intel prefix thing is really the odd one out - and, more importantly, the PDP-11 (where the AT&T format came from) had ubiquitous 8- and 16-bit instructions.


nop and enop and fnop! so many nops: https://www.felixcloutier.com/x86/nop


Funny you mention nop. The classic nop is 0x90, which decodes to XCHG EAX, EAX, which, on x86_32, does nothing.

But x86_64 made 32-bit operations zero the high 32 bits of the destination register, so, logically, 0x90 would clear the high bits of RAX. Oops.

As I’ve heard the story, this was noticed a bit on the late side, and an 11th hour change was made to special case 0x90 and make it still be a nop.
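
A small Python model of the problem, for illustration only: it is not an emulator, and the function names are invented. It applies the x86-64 rule that a 32-bit register write zeroes the upper half of the 64-bit register, which is what 0x90 would do to RAX if it were treated as an ordinary XCHG EAX, EAX.

```python
MASK32 = 0xFFFFFFFF

def write32(value):
    """Model a 32-bit register write in 64-bit mode: upper 32 bits become zero."""
    return value & MASK32

def xchg_eax_eax_naive(rax):
    """What 0x90 would do if it were an ordinary 32-bit XCHG EAX, EAX."""
    eax = rax & MASK32
    return write32(eax)

rax = 0x1122334455667788
print(hex(xchg_eax_eax_naive(rax)))  # 0x55667788, upper half lost
```

On real hardware 0x90 in 64-bit mode is defined as a true NOP, so RAX is left untouched.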


I just realized something: while very few people are writing assembly code itself, there is a very sizable community of C/C++ programmers who need to be able to read assembly code as part of the debugging process. Well, the GNU tools (GCC's assembly output, objdump, GDB) default to AT&T syntax, as does Clang, I believe. This means that AT&T syntax is quite important, as it will be widely used in debugging.


agree completely. and if you are using an intel cpu why the heck would you not use intel notation?


I use AMD notation personally.


That’s okay — AMD’s manual, like Intel’s manual, uses Intel syntax.

(I personally find cmp to be really annoying in AT&T syntax: the AT&T sequence cmp; jge is backwards, whereas the exact same sequence in Intel syntax makes sense. jge is “jump if greater or equal”, but which values are compared in which order? [0]) AVX-512 in AT&T syntax is extra bizarre, too, if for no other reason than that it doesn’t match the documentation.

[0] For extra mental gymnastics, CMP with two register arguments can be encoded in two different ways: the way where the first operand could have been, but wasn’t, in memory, and the way where the second operand could have been, but wasn’t, in memory. Unlike SUB, CMP doesn’t really have a destination, but AT&T still reverses it for, ahem, consistency.
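
The two encodings in the footnote can be sketched in a few lines of Python. The opcodes 39 /r (CMP r/m32, r32) and 3B /r (CMP r32, r/m32) are the documented ones; the helper names are mine and the sketch covers only the register-to-register case.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def modrm_reg_reg(reg, rm):
    """ModRM byte with mod=11 (register-direct) and the given reg / rm fields."""
    return 0xC0 | (REG32.index(reg) << 3) | REG32.index(rm)

def cmp_encodings(first, second):
    """Both encodings of Intel-syntax `cmp first, second` for 32-bit registers."""
    # 39 /r: CMP r/m32, r32 -> the first operand sits in the r/m field.
    form_a = bytes([0x39, modrm_reg_reg(reg=second, rm=first)])
    # 3B /r: CMP r32, r/m32 -> the second operand sits in the r/m field.
    form_b = bytes([0x3B, modrm_reg_reg(reg=first, rm=second)])
    return form_a, form_b

a, b = cmp_encodings("eax", "ebx")
print(a.hex(), b.hex())  # 39d8 3bc3
```

Both byte sequences compare eax against ebx and set the flags identically; which one an assembler emits is its own choice.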


x86 ≠ Intel. There are three companies that make x86 cpus


I'm mainly using amd64 nowadays; I don't know what that has to do with Intel.


I learned ASM on hp48 which has a very explicit syntax

A=A&C A; ?A=0 A; GOYES bit_is_not_set

Then MOV style syntax felt weird..


So then the question is, who has the most aesthetically pleasing assembly?


LOL @ the operand ordering bit that spends paragraphs whinging about "appeal to authority" and then justifies the author's preference with "because mutation" XD


What does “SIB” stand for?


Scale Index Base, the form of addressing mode that looks logical in Intel syntax but very obtuse in AT&T.

https://www.scs.stanford.edu/05au-cs240c/lab/i386/s17_02.htm

https://xem.github.io/minix86/manual/intel-x86-and-64-manual...
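
For a feel of how the SIB byte maps onto the two notations, here is a minimal Python sketch. The byte layout (2 bits of scale, 3 of index, 3 of base) is the documented one; the function names are hypothetical, and the special cases (index field 100 meaning "no index", base field 101 with mod=00 meaning "disp32, no base") are deliberately ignored.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_sib(byte):
    """Split a SIB byte into (scale, index register, base register)."""
    scale = 1 << (byte >> 6)          # 2-bit field stores log2 of the scale
    index = REG32[(byte >> 3) & 7]
    base = REG32[byte & 7]
    return scale, index, base

def render(byte, disp):
    """Show the same operand in Intel and AT&T notation."""
    scale, index, base = decode_sib(byte)
    intel = f"[{base} + {index}*{scale} + {disp}]"
    att = f"{disp}(%{base},%{index},{scale})"
    return intel, att

print(decode_sib(0xDF))  # (8, 'ebx', 'edi')
print(render(0xDF, 3))   # ('[edi + ebx*8 + 3]', '3(%edi,%ebx,8)')
```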


> Everybody except for GCC and the GNU toolchain (and its clones, Clang and TCC), who just have to be different.

So some of the world’s most popular compilers, got it.


If you took all the assembly language ever written, I'd wager a negligible amount of it was written for GNU as. So yes, they are the odd ones out.


Free beer has a great power and people go out of their way to keep getting free beer.

If GCC had been sold at Solaris C compiler prices it wouldn't be as popular.



