
Learning to Read X86 Assembly Language - adamnemecek
http://patshaughnessy.net/2016/11/26/learning-to-read-x86-assembly-language
======
adamnemecek
Also Matt Godbolt's gcc explorer is the bee's knees for understanding
assembly

[https://godbolt.org/](https://godbolt.org/)

I think that playing around with it for 2 hours will teach you more than most
classes on the topic. It really drives home why interactivity is such a big
deal in education.

You should also try writing a script for counting instructions in binaries.
It's pretty illuminating. Here are some sample statistics
[https://webcache.googleusercontent.com/search?q=cache:j0gebK...](https://webcache.googleusercontent.com/search?q=cache:j0gebKGwf5YJ:https://www.strchr.com/x86_machine_code_statistics+&cd=1&hl=en&ct=clnk&gl=us)
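
If you want to try that yourself, here's a minimal sketch of such a counting script in Python. It assumes disassembly in the default `objdump -d` output format (address, hex bytes, mnemonic per line); the function name and the exact line format are my assumptions, not something from the linked statistics page.

```python
import re
from collections import Counter

# Parse lines shaped like objdump -d output:
#   "  401000:\t55 48 89 e5\tpush   %rbp"
# i.e. an address, the instruction bytes in hex, then the mnemonic.
LINE_RE = re.compile(r"^\s+[0-9a-f]+:\s+(?:[0-9a-f]{2}\s)+\s*([a-z][a-z0-9.]*)")

def count_mnemonics(disasm: str) -> Counter:
    """Return a Counter mapping each mnemonic to its frequency."""
    counts = Counter()
    for line in disasm.splitlines():
        m = LINE_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

Feed it the stdout of `objdump -d some_binary` and look at `.most_common()` to get statistics like the ones on that page.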

~~~
makmanalp
This is amazing. After a few minutes I learned so much already - try with an
empty function that returns 0, then return an argument, then return an
argument +1, argument^2, etc.

As an aside, I was looking at the header code generated by gcc to handle the
initial function call. What's the convention on how we assign parameters to
registers? I'm trying stuff out, and the first parameter always seems to be
edi, then esi, eax. Except when I change the types, and it turns into xmm0 and
the like. And the result returned is eax too, except when it's a float, then
it's xmm0. How do things know where to look?

~~~
adamnemecek
> How do things know where to look?

The compiler generates code that uses the correct register. So the compiler
picks a register into which it will put the result and then generates code
after the calling location that gets the result from the correct register.

And yeah, there are quite a few surprises. E.g. I found out that gcc is smart
enough to perform tail call optimization
[https://godbolt.org/g/MZDmwP](https://godbolt.org/g/MZDmwP)

~~~
makmanalp
OK, so it's not a magical convention - it works it out bottom-up. First decide
on registers for each parameter when you generate the code for the function,
then based on that generate specific code for the instances where you call
that function. Cool, thank you.

Also that example, heh. I tried to go back through gcc versions to see if
there was a case where it didn't do TCO - nope. Also, I like how returning 0
is "xor eax, eax".

~~~
MichaelBurge
Suppose you're right and the registers are arbitrary. Then how would foreign
function calls work? If you're compiling Rust code that calls into a C
library, how does it know what registers to use?

So the choice of registers cannot be arbitrary, unless the compiler knows the
function is only used within an object file.

The registers are predetermined by a convention unless you use the 'static'
keyword to signal that the function is only used internally to a module, so
the compiler has complete freedom to choose registers.

~~~
userbinator
_Then how would foreign function calls work? If you're compiling Rust code
that calls into a C library, how does it know what registers to use?_

By using information kept with the function, or perhaps even encoded into the
function name itself (as already happens when distinguishing between different
calling conventions, or in the case of C++ name mangling)?

Coming from an Asm background, where there basically is no one "calling
convention", and programmers would document which registers (almost always
registers, rarely the stack --- and that can make for some great efficiency
gains) are for what, I've always wondered why that idea didn't seem to go far.

~~~
masklinn
> By using information kept with the function

How would you do that with dynamically linked code, inspect functions you're
calling at runtime before laying out your arguments?

> perhaps even encoded into the function name itself

That would mean name mangling in C and assembly.

> Coming from an Asm background, where there basically is no one "calling
> convention"

Right, because you can lay out memory however you want since you're at the
assembly level. Higher-level code (C up) can't do that, so instead you've got
standard calling conventions for inter-library calls (inside a compilation
unit, the compiler is free to use alternate calling conventions since it has
complete control over both sides of the callsite, that's also how it can
inline calls entirely).

> programmers would document which registers (almost always registers, rarely
> the stack --- and that can make for some great efficiency gains)

Some standard CCs (though not the old CDECL) also use registers, as far as
they can, depending on the arch. The SystemV AMD64 ABI uses 6 x86_64 registers for
integer/pointer arguments and 8 SSE registers for FP arguments, with the rest
on the stack.
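
That rule is also the answer to makmanalp's edi/esi/xmm0 observation above. A sketch of the assignment for simple scalar arguments (`assign_args` is a made-up helper name; real SysV classification of structs, unions, etc. is hairier than this):

```python
# System V AMD64 rule for simple scalar arguments: the first six
# integer/pointer args go in rdi, rsi, rdx, rcx, r8, r9 (edi, esi, ...
# are just their 32-bit views), the first eight floating-point args go
# in xmm0..xmm7, and everything beyond that spills to the stack.
# Return values come back in rax/eax or xmm0 the same way.
INT_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
SSE_REGS = ["xmm%d" % i for i in range(8)]

def assign_args(arg_kinds):
    """arg_kinds: list of 'int' or 'float'.
    Returns a register (or 'stack') for each argument, in order."""
    slots, next_int, next_sse = [], 0, 0
    for kind in arg_kinds:
        if kind == "int":
            slots.append(INT_REGS[next_int] if next_int < len(INT_REGS) else "stack")
            next_int += 1
        else:
            slots.append(SSE_REGS[next_sse] if next_sse < len(SSE_REGS) else "stack")
            next_sse += 1
    return slots
```

Note that the integer and SSE counters advance independently, which is why mixing types shuffles which register each argument lands in.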

------
userbinator
_What a train wreck! It’s hard to imagine a more confusing state of affairs._

I'd attribute that more to someone many, many years ago deciding they
would not follow the official Intel syntax (for what reason I do not know),
and somehow convincing the rest of the community to follow them. That's
actually one of the things that could make for a very interesting article: how
one processor family got two different and incompatible Asm syntaxes. The fact
that the mnemonics and syntax don't correspond to those found in the
manufacturer's datasheets and manuals just increases the barrier to
understanding. As far as I know, the same didn't happen to ARM, MIPS, SPARC,
and the others. Especially when the sense of the comparisons/conditional jumps
is reversed, and some of the more advanced addressing modes look less-than-
obvious, it's hard to imagine why anyone would adopt such a syntax:

[http://x86asm.net/articles/what-i-dislike-about-gas/](http://x86asm.net/articles/what-i-dislike-about-gas/)

Note that the GNU tools have an option to use Intel syntax too, so you can
avoid some of the confusion (in the DOS/Windows and embedded worlds, at least
until recently, Intel syntax is overwhelmingly the norm).

~~~
haberman
This. The AT&T syntax for x86 thing is a huge mistake. All the official docs
are Intel syntax. Intel syntax is easier to read and write. Half the gotchas
in this article are problems that don't exist in Intel syntax, like the
instruction suffixes. The instruction suffixes get even weirder when you get
to the sign extending instructions. I wrote an article about this here:
[http://blog.reverberate.org/2009/07/giving-up-on-at-style-as...](http://blog.reverberate.org/2009/07/giving-up-on-at-style-assembler-syntax.html)

~~~
Annatar
_The AT&T syntax for x86 thing is a huge mistake._

For someone who grew up on normal processors (MC68000 and UltraSPARC) AT&T
syntax is the best thing since sliced bread: it's perfectly logical to move
something to somewhere, instead of "move to somewhere something".

~~~
userbinator
I haven't done any 68K Asm and barely glanced at SPARC, but how does src, dst
interact with noncommutative operations like subtraction and comparison? E.g.
with x86 Intel syntax,

    
    
        cmp eax, 5     ; eax - 5
        jg morethan5   ; eax > 5 ? then jump.
        sub eax, ecx     ; eax = eax - ecx
    

This is one of the most confusing things about AT&T x86 --- the comparisons
and subtractions have their operands reversed, and you have to identify and
manually reverse them to understand the code correctly. With Intel syntax, the
operands to a subtraction appear in the usual arithmetic order. Or do those
processors' syntaxes keep the order but instead replace the subtrahend with
the result??

    
    
        sub A, B    ; B = A - B ??

~~~
Annatar
_This is one of the most confusing things about AT&T x86 --- the comparisons
and subtractions have their operands reversed,_

That is confusing as all hell to me: if I compare x to 5, and 5 to x, it's
still the same comparison, so what difference does it make?

Anyway, on Motorola 68000 it would look like so, assuming data was in data
register 0 (there are eight general purpose data registers, and eight general
purpose address registers):

    
    
      cmp.l #5, d0    ; d0 is unchanged by the comparison
      bgt MoreThanFive
      ;
      ; subtract the value of d1 from d0, and store the result
      ; in d0.
      ;
      sub.l d1, d0
    

however, we don't usually branch if greater or lower; we simply compare
whether a register is equal to some value:

    
    
      cmp.l #5, d0
      bne NotFive

~~~
vram22
>That is confusing as all hell to me: if I compare x to 5, and 5 to x, it's
still the same comparison, so what difference does it make?

I haven't done assembly code for a while, and was not an expert at it earlier,
so guessing, but:

it may be because of what flags in the flags register (if there is one
nowadays) get set - they could be different for the two versions of your
comparison.

~~~
Annatar
Yes, there is a status register, every processor must have one (or else the
processor couldn't function). Doesn't matter whether you compare 5 to a
register (or memory location, depending on the processor), or memory /
register to 5, the same bit(s) will still be set in the status register.

~~~
vram22
>Doesn't matter whether you compare 5 to a register (or memory location,
depending on the processor), or memory / register to 5, the same bit(s) will
still be set in the status register.

Are you sure? That was my whole point - that it may not be that way. As I
said, it's been a while, but it seems to me that the bits that get set in the
flags/status register, on comparing A to B, should be, in some sense at least,
the opposite (maybe not for all the bits) of what would get set on comparing B
to A; because I thought it would be done by subtracting A from B or B from A,
and then setting (some of) those flag bits based on which was greater or
equal. If that is so, comparing A to B will not have the same result in the
register as comparing B to A. And the reason why I think so is that there are
assembly instructions like JGE (Jump if Greater or Equal), JE (Jump if Equal),
JNE (Jump if Not Equal), etc. - the meaning of those instructions, and so the
resulting action (jump or not jump), would change based on looking at the
flags set on comparing A to B vs. B to A.
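
This hunch can be checked with a toy simulation: CMP computes dst - src (Intel operand order) and sets the flags from that result, so swapping the operands generally does produce different flag values, and the conditional jumps compensate by testing different flag combinations. A sketch of just ZF/SF/CF, deliberately ignoring OF/AF/PF, for unsigned 32-bit values:

```python
# Toy model of x86 CMP: compute dst - src and set ZF/SF/CF roughly the
# way the hardware would. OF, AF, and PF are omitted for brevity.
def cmp_flags(dst, src, bits=32):
    mask = (1 << bits) - 1
    diff = (dst - src) & mask
    zf = diff == 0                      # result was zero -> operands equal
    sf = bool(diff >> (bits - 1))       # sign bit of the truncated result
    cf = (dst & mask) < (src & mask)    # borrow out of the subtraction
    return zf, sf, cf
```

`cmp_flags(5, 7)` gives different SF/CF than `cmp_flags(7, 5)`, which is exactly why reversing the operands of a comparison also requires reversing the condition on the jump.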

~~~
Annatar
You're overthinking this way more than you need to.

On any given processor, in this context, there is one and only one way to
compare an immediate value with one in a register, so you don't have to worry
about whether you're comparing 5 to %eax, or %eax to 5: you can't subtract the
value in %eax from 5, because 5 is an immediate value, not a memory location.

    
    
      	.section	__TEXT,__text,regular,pure_instructions
      	.globl	_main
      	.align	4, 0x90
      _main:
      	movl	$17, %eax
      	cmpl	$5, %eax
      	jle	Exit
      	movl	$5, %eax
      Exit:	ret
    

now let's assemble and link that using the C compiler's front end, so we won't
have to worry about _init and _fini:

    
    
      > cc cmp.s -o cmp
      > ./cmp; echo $?
      5
    

note the cmpl $5, %eax instruction. Now watch what happens when I attempt to
compare %eax with 5:

    
    
      	.section	__TEXT,__text,regular,pure_instructions
      	.globl	_main
      	.align	4, 0x90
      _main:
      	movl	$17, %eax
      	cmpl	%eax, $5
      	jle	Exit
      	movl	$5, %eax
      Exit:	ret
    
      > cc cmp.s -o cmp
       cmp.s:6:13: error: invalid operand for instruction
       cmpl %eax, $5
                  ^~
    

it can't be done, because there is one and only one way to compare an
immediate value with one in a register. Intel or AT&T syntax -- dst, src or
src, dst -- the comparison is the same. Therefore, AT&T syntax is the best
thing since sliced bread, because it's left to right instead of right to left,
which is how we think in terms of taking something and moving it somewhere --
in the physical world, step 1. will be to take an object and step 2. will be
to move that object somewhere.

~~~
vram22
I think we might be talking about different things, or past each other. Let it
go.

------
tptacek
So, this is fantastic, but I want to make an appeal for the most important
thing to understand about any assembly language, even before you work out the
individual instructions:

Break your programs into _basic blocks_! Reverse engineers never read assembly
in a straight line. Instead, they read the _control flow graphs_ of
subroutines, which is the graph where nodes are runs of instructions ending in
jumps and branches, and the edges are the jump targets. I hope this doesn't
sound complicated, because it isn't: it's literally just what I wrote in this
paragraph. It takes about 15 minutes for most platforms to learn enough to
recover CFGs from subroutines by hand.

To get a decent understanding of what a chunk of assembly code is doing, all
you really need is:

* The code broken into subroutines (this is usually your starting point) and then CFGs (good disassemblers do this for you, but it's easy to do by hand as a first pass)

* The CALLs (CALLs don't end basic blocks!)

* The platform's calling convention (how are arguments passed and return values returned from subroutines)
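
The block-splitting step itself can be sketched in a few lines; the mnemonic set and the (label, mnemonic, operands) representation here are simplified assumptions standing in for a real disassembler:

```python
# A minimal sketch of recovering basic blocks from a linear instruction
# list. An instruction ends a block if it is a jump, branch, or return;
# a labeled instruction (a jump target) starts a new one. Note that
# "call" is deliberately NOT in the set: CALLs don't end basic blocks.
BLOCK_ENDERS = {"jmp", "je", "jne", "jg", "jle", "ret"}

def basic_blocks(instructions):
    """instructions: list of (label_or_None, mnemonic, operands) tuples.
    Returns a list of basic blocks, each a list of instruction indices."""
    leaders = {0}  # the first instruction always starts a block
    for i, (label, mnem, _ops) in enumerate(instructions):
        if label is not None:
            leaders.add(i)            # a jump target starts a block
        if mnem in BLOCK_ENDERS and i + 1 < len(instructions):
            leaders.add(i + 1)        # the fall-through after a branch, too
    blocks, current = [], []
    for i in range(len(instructions)):
        if i in leaders and current:
            blocks.append(current)
            current = []
        current.append(i)
    if current:
        blocks.append(current)
    return blocks
```

Adding the CFG edges (each block's jump target and fall-through successor) on top of this is a similarly small step.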

There are two tricks to reading large amounts of assembly:

1\. Most of the code does not matter, and you won't be much better off for
painstakingly grokking it.

2\. Virtually all the assembly you'll see is produced by compilers, and
compilers spit out assembly in _patterns_. Like the dude in The Matrix, after
an hour or so of reading the CFGs of programs from a particular compiler,
you'll stop having to read all the instructions and start seeing the "ifs" and
"whiles" and variable assignments.

------
ivanhoe
To understand assembly it really helps to know at least something about how
computers work at a low level. When I first tried learning it (a long time
ago, in high school) I had no idea how computers really work at such a low
level - how the CPU addresses registers and that kind of stuff - and while I
managed to learn the syntax, even write some asm code, it was all really
confusing to me. Only a few years later at university, after I had learned in
detail about the CPU architecture, registers, buses, DMA, etc., did it all
suddenly start to make perfect sense and become 100x clearer and easier. So if
you're interested in this, it will save you a lot of effort to invest some
time first in learning the computer architecture basics, and then go from
there to learn the assembly lang. Just my $0.02

~~~
theparanoid
It's helpful to realize x86 assembly is not what's executed by the machine;
machine code is. One assembly instruction, e.g. ADDL, is translated to several
different machine code instructions depending on the destination, source, and
addressing mode.

~~~
amw-zero
Can you point to a source for this? All x86 assemblers that I know of map one
assembly instruction to one machine instruction.

~~~
spc476
I'm looking at the Microsoft Macro Assembler 5.1 Reference manual (it was
nearby and easily accessible to me; yes, it's old - from the very late 80s or
early 90s - but it covers the 32-bit 80386, which is still valid).

Anyway, it shows three different encodings for the ADD instruction. The first:

    
    
        000000dw mod,reg,r/m
    

This adds register to register, or memory to register (either direction, the d
above) using either 8 or 16/32 bits (the w above [1]). The second form:

    
    
        100000sw mod,000,r/m
    

This adds an immediate value (8 or 16 bits, w again) to a register. The s bit
is used to sign extend the data (s=1; otherwise, 0-extend it) if required [2].
The final form:

    
    
        0000010w data
    

This adds an immediate value to the accumulator register (EAX, AX, AL) [1].
That's three different encodings for the "same" instruction. The MOV
instruction (and again, I'm only talking about the 80386 here) has 8 different
encodings, depending upon registers used.

[1] If the current code segment is designated as a 16-bit segment, then the w
means 16 bits, _unless_ a size override byte (an opcode prefix byte) is
present, in which case it means 32-bits. If the current code segment is
designated as a 32-bit segment, then the w means 32 bits, again _unless_ a
size override byte is present, in which case it means 16-bits.

[2] It seems to me that if w=1, then the s bit is extraneous and thus could be
used to encode other instructions. I'm not sure if that is the case but it's
common to use otherwise nonsensical instruction encoding to do something
useful.
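
The size difference between those forms can be made concrete with a small sketch. Assuming 32-bit mode, no prefixes, and a full 32-bit immediate, ADD into EAX gets the short accumulator form (05 imm32), while any other register needs opcode 81 plus a ModRM byte; `encode_add_imm32` is a made-up helper covering only this one case:

```python
import struct

# Two of the ADD encodings described above, for a 32-bit immediate:
#   add eax, imm32  ->  05 imm32            (accumulator short form)
#   add reg, imm32  ->  81 /0 imm32         (opcode + ModRM byte)
REG_NUM = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3}

def encode_add_imm32(reg, imm):
    """Encode `add reg, imm` with a 32-bit little-endian immediate."""
    if reg == "eax":
        return bytes([0x05]) + struct.pack("<i", imm)
    modrm = 0xC0 | REG_NUM[reg]  # mod=11 (register operand), reg field /0 = ADD
    return bytes([0x81, modrm]) + struct.pack("<i", imm)
```

So `add eax, 5` is one byte shorter than `add ecx, 5`, which is exactly the kind of special-casing that made the encoding compact when the accumulator was the busiest register.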

~~~
userbinator
_It seems to me that if w=1, then the s bit is extraneous and thus could be
used to encode other instructions. I'm not sure if that is the case but it's
common to use otherwise nonsensical instruction encoding to do something
useful._

Opcode 82h is an alias for 80h --- it presumably sign-extends the immediate
value into an internal temporary register, but the upper bits don't matter
anyway since it's an 8-bit add. Some interesting discussion on that here,
along with an example application:

[http://computer-programming-forum.com/46-asm/143edbd28ae1a09...](http://computer-programming-forum.com/46-asm/143edbd28ae1a091.htm)

------
qwertyuiop924
x86 is the worst ISA. If you want to play with assembler without feeling a
desire to stab yourself and end it all, I recommend ARM.

Or go learn Z80, x86's weird 8-bit cousin (it had a 16-bit version, but it
sold poorly), which had a greater emphasis on backwards compatibility (you can
run code from the original 8080 on a Z80, unchanged), and is nicer to work
with (because it wasn't extended in unanticipated directions far beyond its
original capabilities while keeping fetishistic backwards compatibility by
stacking hack on top of hack on top of hack. It also didn't have memory
segmentation, otherwise known as The Worst Thing.)

There are only two common reasons to learn Z80 assembler, though: to program
the Gameboy (which runs on a modified Z80 with all the cool instructions
removed), and to program a TI calculator, thus making all highschoolers in
your area immensely happy.

TI calculators are a comically overpriced scam, that have only survived
because of the College Board, but that's another story.

~~~
honkhonkpants
I never felt more like stabbing myself than when trying to cipher out exactly
which immediate values are possible on ARM, and which are not. x86 I happen to
enjoy. It is not "the worst ISA" by any means. It has wonderful code density,
which turns out to be very important. There's a reason that x86 won and
continues to win.

~~~
JoeAltmaier
When originally invented, the x86 instruction set _was_ efficient - the
most-used instructions had shorter byte code sequences. But eventually some
instructions got 'left behind' by the compilers. There are a whole host of
single-byte instructions that are never, ever used by a compiler - the
register exchange instructions for instance (xchg eax, ebx). Compilers just
schedule destination registers carefully, never need to swap them around.

Also, the whole set of exchange-register-with-itself instructions were
defined but never used. E.g. xchg ax,ax, which does nothing, in one byte. In
fact that one _was_ considered useful: it's used as the 'no-op' instruction
(0x90), right? But what about xchg bx,bx, xchg cx,cx and so on? Just wasted
single-byte opcodes, leaving actual common instructions to use longer
bytecode sequences.

So maybe an executable should begin with an opcode-decode-table that is loaded
with the code, that tells the hardware what byte sequences mean what
instructions. So each executable code can be essentially compressed, using
optimum coding for exactly the instructions that code uses most often. Just
thinking out loud.

~~~
qwertyuiop924
That would be cool, but I would not want to hand-code assembly on such a
platform. That makes memory segmentation look fun.

~~~
JoeAltmaier
The table could be created from your assembly automatically? I would never
want to code the hex directly in any case.

~~~
qwertyuiop924
That could work...

I thought it couldn't a second ago. I don't know why; it makes no sense to me
now.

------
satysin
A video I like to send friends who ask about how to understand x86 assembly
is
[https://www.youtube.com/watch?v=yOyaJXpAYZQ](https://www.youtube.com/watch?v=yOyaJXpAYZQ)

I think the video maker does a good job of mapping a simple C program to its
disassembly.

------
Annatar
_And, of course, modern compilers will usually produce faster, more optimized
code than you ever could, without making any mistakes._

This assertion has come up over and over again in the last 30 years. Every
time I've heard it asserted, it came from non-assembler programmers who always
wrote in a high-level language. I have yet to see evidence of optimizing
compilers generating code even remotely close in efficiency to what we would
code directly in assembler.

A coder would never write all that extra frame pointer setup code, nor would
they waste encoding space and clock cycles shuffling values from one register
to another. For example, a human might write the code from the article thus:

    
    
      Add42: addb $42, %al
             ret
    

and that's it. No frame pointer or stack setup, that's all unnecessary
overhead because compiler algorithms can't reliably make such contextual
decisions.

~~~
xxs
>>because compiler algorithms can't reliably make such contextual decisions

They can, actually. Compiler optimizations have come a long way; even Java's
JIT should be able to optimize that (OK, not using the AL register).

My personal story - I used to use exclusively assembler for 6502 and 8086 as
it actually ran fast enough. In the mid 90s I saw Delphi's code (and Delphi
was not known for its optimizations) but it was able to use the Pentium
instruction pairing which takes quite an effort to accomplish by hand.

While beating an old compiler was easy, that was the time when compilers
began making strides to rival humans.

Still, hand-written inner loops in assembly might yield some performance
(IIRC, grep still relies on them), but overall there is a very limited number
of settings where the difference would be significant enough to warrant the
effort (incl. correctness and [micro]benchmarks).

~~~
Annatar
_My personal story - I used to use exclusively assembler for 6502 and 8086 as
it actually ran fast enough. In the mid 90s I saw Delphi's code (and Delphi
was not known for its optimizations) but it was able to use the Pentium
instruction pairing which takes quite an effort to accomplish by hand._

But a human would almost never use some of those more complex instructions,
for a very simple reason: they eat too many clock cycles. When one is coding
in assembler, one usually targets two constraints:

1\. the least amount of clock cycles needed to pull off an operation;

2\. the least amount of bytes to encode the operation.

Where those two meet is where the best coders get unbelievable performance out
of the hardware. At least that's the case in the demo scene, although many
nowadays cheat by banging on the GPUs in CUDA or OpenGL.

------
Grom_PE
As AT&T syntax is still being used, at this point I'm willing to believe it's
there purposely to make x86 assembly hard and unpleasant to read and write.
Perhaps so people will want to stay away from it and, in a way, to reduce the
amount of code that is tied to the x86 platform. It spreads the idea that x86
assembly is terrible and ugly.

Intel syntax is much cleaner, in particular, Intel Ideal (as opposed to MASM),
and specifically, FASM (flat assembler). FASM makes it as clean as possible
and turns writing assembly into a joy.

Compare:

    
    
        movl %fs:-10(%ebp), %eax       ; AT&T
        mov eax, dword ptr fs:[ebp-10] ; MASM
        mov eax, [fs:ebp-10]           ; FASM

~~~
spc476
As I recall, the "dword ptr" stuff was only necessary if the instruction was
otherwise ambiguous. Using EAX means you are using a 32-bit destination. But
something like:

    
    
        mov fs:[ebp-10],5
    

is ambiguous. Is that an 8-bit constant? 16 bits? 32-bits?

~~~
Grom_PE
I believe MASM-style assemblers require "dword ptr" at all times and many
disassemblers keep outputting that in unambiguous situations.

Your example won't assemble because size can't be guessed, but this will:

    
    
        mov dword [fs:ebp-10], 5

------
gnuvince
Nice article! It takes a subject that is scary for many and does a great job
of explaining a little bit of it very clearly, using good visual aids. I look
forward to the next articles!

~~~
pat_shaughnessy
Thanks a lot. I'm looking forward to writing it, I think it will be
interesting.

------
Gruselbauer
> It turns out x86 assembly is much simpler than Hungarian

Well, what isn't? I have a knack for languages but that beautiful beast seems
like an almost impenetrable fortress of strangeness.

On the other hand, that encourages me to learn to read basic assembly.

------
bogomipz
I thought this sentence was curious:

"To write code that runs directly on your microprocessor you need to know how
memory segmentation works"

Although you can't completely ignore segments, in practice at least on Linux
the only segments in use are user code/data and kernel code/data segments.

Does anyone know why the author might suggest that understanding segmentation
is necessary to write Assembly code?

~~~
ktRolster
Probably because you need to deal with the MMU. You can't just write raw
assembly and expect it to work (ignoring the MMU), but the kernel takes care
of that for you.

~~~
bogomipz
I am not following - what is raw assembly? I can write a complete userland
program using only Assembly and it will run just fine. When do I need to deal
with the MMU exactly?

~~~
wolfgke
> When do I need to deal with the MMU exactly?

When writing programs for user mode (ring 3 on x86), you hardly need to care
(except to sometimes use some segment override prefixes (cf.
[https://news.ycombinator.com/item?id=13052076](https://news.ycombinator.com/item?id=13052076)),
which pedantically is "dealing with the MMU", since it works because of the
MMU; but in my opinion it is not necessary to understand the technical details
of why this works).

On the other hand, if you are an OS ("operating system", here I don't mean
"open source") developer, you probably better know the details of the MMU.

Concerning
[https://news.ycombinator.com/item?id=13052892](https://news.ycombinator.com/item?id=13052892):
I also consider the author's statement that one has to know how segmentation
works to be misleading. Knowledge of segmentation is absolutely necessary for
x86-16 (real mode), which many people tend to associate with assembly (because
there seem to be many more assembly tutorials available for DOS/x86-16 than
for x86-32 or even x86-64), but it is hardly relevant for people who just
write user mode code.

------
kitsuac
If you want to learn how to read X86, don't choose this AT&T style syntax as
it's a monstrosity.

------
Jugurtha
You might like this:

[https://news.ycombinator.com/item?id=9322723](https://news.ycombinator.com/item?id=9322723)

------
kruhft
Reading assembly language is about having the computer in your head, just like
regular programming. You read and execute the instructions just like any other
language, just the operations are that much smaller and less abstract. Each
instruction is a 'function call'; just like any low level language you
leverage abstraction to build up these operations into greater pieces of
execution knowledge using macros and functions in logical ways to get the
outcome.

It's not magic. The best way to learn assembly is to program in it. I learned
on the Gameboy by getting a job and programming 2 games for it. Fun as hell,
especially when the machine is small enough to really fit in your head and
clock cycles count at 4MHz.

------
matchagaucho
When I worked at Intel in the Server BIOS group, the development process
involved several iterations of developing macro abstractions until the x86 ASM
code became more readable and maintainable.

No one particularly enjoyed working in raw ASM 100% of the time.

------
RandomInteger4
This is the first time I actually read all the way through an article on
assembly. It was nice and concise. Granted, I'll probably forget this until
the next article (due to focusing on other studies), but thank you
nonetheless.

~~~
pat_shaughnessy
Thanks! Glad to hear you made it to the bottom :) I always worry that I tend
to write on and on for too long.

------
ndesaulniers
Nice illustrations. It was a good idea to get the prologue/epilogue out of the
way. Note that for optimized code, it's usually unnecessary. For more info:
[https://nickdesaulniers.github.io/blog/2014/04/18/lets-write...](https://nickdesaulniers.github.io/blog/2014/04/18/lets-write-some-x86-64/)

------
JoeAltmaier
The examples seem odd to me: the argument order is reversed from every Intel
disassembly (or assembler) I've ever used. "add edi, 43" is the normal way to
say "add 43 to edi". The destination register is normally first and the
source register second in the disassembly, right?

~~~
ycmbntrthrwaway
gdb uses AT&T syntax by default. [0]

[0]
[https://en.wikipedia.org/wiki/X86_assembly_language#Syntax](https://en.wikipedia.org/wiki/X86_assembly_language#Syntax)

~~~
sigjuice
This article uses lldb, and not gdb. Both default to AT&T syntax, I guess.

------
pselbert
Excellent article that has context around the how and why, which I
appreciated. Pat has a great "explorer" writing style.

I was going to post this last night but figured it must have been posted
already. I was wrong! Nice to see it on the front page.

~~~
pat_shaughnessy
Thank you! Knowing that people enjoy reading this sort of thing makes it
worthwhile.

------
RX14
If you're experimenting with asm in Crystal, you might want to use
--prelude=empty to remove the standard library to make the asm output cleaner.
You can then require lib_c and use that directly.

~~~
pat_shaughnessy
I tried that while researching the article, but found the call to "puts"
doesn't link without the standard library code. And without a "puts" or
similar call to produce output LLVM optimized the entire program away :)

I suppose this could work if, as you suggest, I manually called out to a lib_c
function like printf instead.

~~~
RX14
Yeah, puts is part of the standard library, and uses Crystal's evented IO
framework, fiber scheduler and libevent. This is what most of the extra code
in the asm output will be doing.

------
dman
Btw, if someone has a local copy of Intel's intrinsics tool from before they
converted it into a web version - could you please share it?

------
Vera527
What a great read. Thanks for sharing, OP.

------
iamahacker2
Excellent article.

