The faker's guide to reading x86 assembly language (timdbg.com)
200 points by todsacerdoti on Jan 4, 2023 | 45 comments



In my opinion, most articles, lessons, etc. that purport to teach assembly miss the forest for the trees. This is a good article for someone wanting to learn "what assembly _is_" - but I have yet to see an article targeted to programmers that would teach "how do I practically use this."

One can learn the syntax, the quirks, etc., and maybe even write a Hello World, but without real-world exposure to what compilers actually do to source code, people cannot derive meaningful signal or intuition. A very experienced programmer without exposure to assembly might be able to optimize a tight loop or spot a performance issue, but that's probably the extent of it.

Complexity increases as you introduce platform-specific calling conventions, APIs, etc.

I don't know a better way to acquire assembly intuition than by implementing a RISC-V CPU or staring at a lot of decompiled output.


I've heard nothing but good things about this course, which should do the trick:

https://www.nand2tetris.org

It's on my todo list at some point, maybe this year or next.


I completed the hardware part last month. It's definitely accessible and serves as a good entry project for interested students. However, it glosses over a lot of things (that even I'm aware of), so I decided to up the game and find other projects/lectures. The CPU project of the University of Tokyo seems interesting, but it is too tough for someone who has just completed nand2tetris. Some bridge project is needed.


I came across this on HN a month or two back, might be the kind of project you're looking for?

https://539kernel.com


Thanks! That's part of the picture. I'm looking for something similar to nand2tetris but level up one notch. 539kernel definitely can serve as part of the puzzle.


Covering how to practically use it is far too much for a single post (and I don't think I really claimed to teach assembly... I just want to give people the tools they need to start learning assembly). This was mainly to get people to not be scared and start reading asm, which is really the only way you can learn this stuff. In my opinion, writing a hello world in asm is not useful for most people. But learning enough to understand what part of a complex C++ expression caused a program to crash is much easier, and more widely useful. My plan for future posts is to talk about common compiler-generated code patterns to help people recognize what's causing a crash even if they don't understand every line of asm.


AsmBB, a forum written in assembly, proves that such usage is practical.


This was a nice read, but I was hoping for something one step above this. When I was writing an emulator, I spent so much time looking at 8080(ish) code that I began to see constructs for bigger concepts. Like, “oh you’re setting up and running a loop over some subroutine X times.” “You’re checking if a certain math expression equals a certain value before continuing.” Etc.

I imagine an experienced assembly reader sees these things. I wonder if there's a guide like this for them.
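For a flavor of what one of those constructs looks like on x86, here's roughly the kind of counted loop a compiler might emit; the register choices and the count are only illustrative:

    ; for (int i = 0; i < 10; i++) total += a[i];   (a in rdi)
    xor     eax, eax              ; total = 0
    xor     ecx, ecx              ; i = 0
    top:
    add     eax, [rdi + rcx*4]    ; total += a[i]
    inc     ecx                   ; i++
    cmp     ecx, 10               ; i < 10 ?
    jl      top                   ; loop back while it is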


An easy way to learn to see those patterns is through exploring compiler output:

https://godbolt.org/

Other than that, I like "Assembly Language Step By Step" by Jeff Duntemann, which is currently in its third edition and is Linux-only, as opposed to previous editions which were MS-DOS and Linux. He has example assembly code (Intel syntax for NASM, another reason I like his book) for download on his website:

http://www.duntemann.com/assembly.html

My only annoyance is that apparently, even the most recent edition from 2009 is still 32-bit only.

Anyway, beyond that, you'd be looking for information on using assembly language to perform some specific task, like writing vectorized numerical code.


I'd guess practical compiler books that go all the way down to assembly would also be helpful, especially for those with an inclination toward language implementation. So far I haven't read any, though, and I hope rui314's book comes out some day.


In my experience the most important parts of reading godbolt output are:

1. Data movements (mov)

2. Control flow (various jumps with corresponding tests, call and ret)

3. Calling convention. But you can often wing this and figure it out from context.
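As a tiny illustration of all three, here's a sketch of what a compiler might emit for a max() function under the System V convention (the exact output is an assumption; godbolt will show you the real thing):

    ; int max(int a, int b) { return a > b ? a : b; }
    max:                          ; calling convention: a in edi, b in esi, result in eax
    cmp     edi, esi              ; control flow setup: compare a and b
    mov     eax, esi              ; data movement: result = b
    cmovg   eax, edi              ; ...overwritten with a if a > b
    ret                           ; control flow: return to the caller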


One thing I like for following the control flow is radare2's visual graph modes.


Regarding the book, what would be the difference between 32-bit and 64-bit? Is it still relevant to read today?


Yes - it is one way to do it, and it will work and be reasonably efficient. Everyone[1] who knows x86-64 asm today learned 32-bit first. You still have 32-bit instructions, and 16-bit, and 8-bit. The instructions will translate. You get more registers. You need to update the function and syscall calling conventions. There's no reason there can't be "learn x64 asm" books, but there weren't many last I looked; maybe someone can link us up to show how wrong I am?

https://wiki.osdev.org/Calling_Conventions

Richard Blum [2] after Duntemann.

[1] At a reasonable level of approximation.

[2] https://www.wiley.com/en-us/Professional+Assembly+Language-p...


I would doubt that; people haven't been running 32-bit x86 code for years at this point.


How exactly do you get away with not using the 32-bit subset of AMD64 ASM?

"32-bit x86 code" just means "code that does not use any of the AMD64 ISA." (And at this point, probably SSE2+ and a handful of other extensions.)

GP is correct: anyone that "knows" AMD64 assembly inherently must know 32-bit ASM because one must use the original x86 registers, instructions, etc. for the vast majority of tasks.

People are still constantly running what was formerly known as "32-bit x86 code" in their 64-bit applications.


I read “learned 32-bit first” as “learned on i386 machines”, not “learned what the 32-bit instructions do on 64-bit computers”. There's a big difference between the two, because you write code quite differently for a machine that doesn't do 64-bit: on 64-bit you have many more registers, the calling convention is pretty different, etc.


From my very limited RE experience, one difference is the calling convention: 32-bit passes a lot of stuff on the stack, while 64-bit passes the first several arguments (six integer args on System V, IIRC) through registers and the rest on the stack. I could be wrong, but this is one glaring difference. Another one, of course, is the registers themselves: EAX vs RAX, for example.
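A rough sketch of the difference for a call like f(1, 2, 3), assuming cdecl on 32-bit and the System V convention on 64-bit Linux:

    ; 32-bit cdecl: arguments pushed on the stack, right to left
    push    3
    push    2
    push    1
    call    f
    add     esp, 12               ; caller cleans its arguments back off

    ; 64-bit System V: first six integer args go in rdi, rsi, rdx, rcx, r8, r9
    mov     edi, 1
    mov     esi, 2
    mov     edx, 3
    call    f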


> Intel syntax for NASM

Like the Good Lord intended.


I'm planning to write a future post on "common compiler generated code" to recognize those patterns. The intention of this post was to encourage folks who think asm is scary to give it a try because it's not as bad as they think. You really need to spend a bunch of time with godbolt or a debugger with a side-by-side source and asm view to really build an intuition for these things. I want to try to give folks the tools they need to start building that intuition, because you can read dozens of articles about asm and still have no idea what you're doing.



> But the assembly code will always tell you the truth.

Is this really still the case with modern caching branch-predicting microcode processors? [1]

From what I know (which is little), there is quite a way between assembly and what a processor will actually execute.

[1] just throwing around buzzwords


Sure, it gets executed using microcode constructed from the assembly. But there's a contract that what happens is precisely what the assembly said, with a little wiggle room for operation order and bus cycle width.

Perhaps it's more correct to say assembler is 'closer to the truth'.


Just yesterday we had a thread about a core dump that turned out to be a CPU bug (the stack pointer being incremented by 1024 in very specific but consistent circumstances), and the discussion had multiple people who had encountered kernel or CPU bugs that led to correct assembly producing incorrect results.

So assembly is evidently not always the truth. It is however the closest we can easily get to the truth (on consumer desktop/server grade processors).

https://news.ycombinator.com/item?id=34230823


Less abstract means less is obscured, more truthy.

Higher level languages will have numerous control structures to iterate, consider everything from do-until, do-while, for loops, all the way up to functional languages. That's great. It all gets translated down to machine language code eventually at some low level, either at compile or run time.

Most people iterate in assembly the "simplest" way which will vary by architecture but is generally not overly abstract.

Similar to how it's possible to write in an OO style (inheritance, polymorphism, etc.) in non-OO languages, but most people writing in non-OO languages do not. I think it would be possible to write a bash shell script that implements the concept of polymorphism, but most people would never do that. It's possible and funny to write thousands of lines of code to do "enterprise grade patterns hello world", but most shell scripters will "echo hello world" and call it good, which makes it more truthy, less abstract, less obscured.

True, it'll be impossible to infinitely extend and scale the bash "hello world" script, but most software problems are never infinitely extended and scaled anyway. The most powerful tool for a job is rarely the correct tool for the job.


> between assembly and what a processor will actually execute

Well, modern processors can execute out of order for efficiency reasons, but the end result must be exactly the same as if the instructions were executed serially.


It's the closest you'll get without going complete insaneo style. [0]

[0] https://www.youtube.com/watch?v=hQ40jNXMDlk


Yes, the simple opcodes are the most-used, and this was one of the ideas floating around that helped inspire RISC, but be careful how you figure out which opcodes are different. For example, mov in x86 is extremely polysemous: It can be move from register to register, load register with data from RAM, store register contents out to RAM, store a constant out to RAM, and even perform ALU operations and then use the result of those operations as a memory address for storing something out to RAM. On a cleaner (RISCier) ISA, all of those would be different opcodes or sequences of opcodes, assuming they're possible at all.
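A few of those faces of mov side by side, in Intel syntax (registers chosen arbitrarily):

    mov     eax, ebx                  ; register to register
    mov     eax, [rsi]                ; load from memory
    mov     [rdi], eax                ; store to memory
    mov     dword [rdi], 42           ; store a constant to memory
    mov     [rdi + rcx*4 + 8], eax    ; compute base + index*scale + offset, then store there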

Other than that, this piece looks good as far as it goes.


One thing worth pointing out is that, if you ignore the way the assembly is written and instead look at the binary encoding, then x86 is closer to a RISC idea. Essentially, the core of an x86 instruction is opcode + ModR/M byte, which encodes a register and register-or-memory operand. There's one opcode [1] that means "move from second operand to first operand" and a different opcode that means "move from first operand to second operand". Intel merely decided to call both of these instructions "MOV" rather than "LD" and "ST"--but it does basically have them. There's also another opcode that loads immediates into a register.

Furthermore, the ability to put a memory operand on, say, an ADD instruction is close in effect to having a compressed instruction encoding that encodes "LD to a temporary, unnamed register followed by ADD that register to the destination register" in fewer bytes than having both (also avoids clobbering a register, useful given the thin 8 registers 32-bit x86 has).
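To make that concrete, here are the byte encodings (for the register-to-register case an assembler normally just picks one of the two legal opcodes):

    mov     eax, [rbx]      ; 8B 03            opcode 8B: reg <- r/m (the "load" direction)
    mov     [rbx], eax      ; 89 03            opcode 89: r/m <- reg (the "store" direction)
    mov     eax, 1          ; B8 01 00 00 00   immediate into register

    ; memory operand folded into the ALU op...
    add     eax, [rbx]      ; 03 03
    ; ...versus an explicit load plus add, which costs more bytes and a scratch register:
    mov     ecx, [rbx]      ; 8B 0B
    add     eax, ecx        ; 01 C8 (or 03 C1; again two legal encodings)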

[1] Okay, several, to vary the operand size (8-bit, 16-bit, 32-bit, 64-bit).


One of my pet ideas is to design a new set of mnemonics and syntax for x64 assembly; the Intel ones are pretty crufty in many ways and imho have potential to be improved. Of course it would not win any popularity contests, but it could be an interesting design exercise.


> One thing worth pointing out is that, if you ignore the way the assembly is written and instead look at the binary encoding, then x86 is closer to a RISC idea. Essentially, the core of an x86 instruction is opcode + ModR/M byte, which encodes a register and register-or-memory operand.

You could say the same thing about the VAX, which was even more CISC than the x86 is, because of how its opcode encoding worked: Bytes for the opcode, bytes for addressing modes, bytes for register specifications or constant values or memory addresses. Ditto the PDP-10. Which is to say that's not a helpful way of making the RISC/CISC divide because it ignores everything that makes those kinds of processors different.

Somewhat down in this page, written by processor designer John Mashey, is a list of features which RISC processors tend to have that CISC processors don't:

https://userpages.umbc.edu/~vijay/mashey.on.risc.html

Above that list are two points which get to the heart of the RISC project:

> The RISC characteristics:

> a) Are aimed at more performance from current compiler technology (i.e., enough registers).

> OR

> b) Are aimed at fast pipelining

> - in a virtual-memory environment

> - with the ability to still survive exceptions

> - without inextricably increasing the number of gate delays (notice that I say gate delays, NOT just how many gates).

The point b is where RISC chips really pulled away from CISC in terms of architectural design, especially chips like the MIPS, which Mashey worked on: The MIPS had a number of points where it exposed the tricks it used to pipeline more aggressively, even at the expense of making compilers somewhat harder to write and/or human assembly-language programmers think a bit harder. However, the lack of complicated addressing modes (post-increment, scale-and-offset, etc.) and the lack of register-memory opcodes with ALU operations, and total lack of memory-memory operations, is still a very common feature of RISC design.

I also want to take on this:

> Furthermore, the ability to put a memory operand on, say, an ADD instruction is close in effect to having a compressed instruction encoding that encodes "LD to a temporary, unnamed register followed by ADD that register to the destination register" in fewer bytes than having both (also avoids clobbering a register, useful given the thin 8 registers 32-bit x86 has).

The difference between having one opcode which does both memory operations and ALU operations and not having those kinds of opcodes is faulting: If the CPU has to take a fault, does it have to back out a lot of ALU state such that opcodes appear to be atomic? CISC chips do, and they pay for it, whereas RISC chips are designed not to have to. This, again, makes pipelining easier. (And the page I linked to goes into this as well.)

CISC/RISC is points on a scale, but that doesn't mean it's helpful to "reinterpret" things to try and make CISC seem equivalent to RISC.



The target audience here was folks who have very little experience with asm, but you're completely right that a lot of complexity gets glossed over. That's not even to mention cases where an instruction behaves differently depending on the code segment attributes or privilege level it's running in.


I dabbled in reverse engineering a while ago (and will probably dive deeper into it seriously later) and realized it is not particularly difficult to recognize constructs in assembly code, but only for simple code such as examples. Once things go up one level, say code that uses a lot of the win32 API, the business gets a lot more confusing.

That was why I decided to drop the study temporarily. I wanted to figure out exactly the OS and the type of applications (e.g. DOS virus, or Windows malware) I'd like to reverse engineer for, gain some positive engineering experience and then come back.


That's true. It's far too broad to learn everything without having a specific goal in mind. Learning the parts of asm that are useful for writing asm code is very different than learning enough asm to understand what APIs are being called from malware.


Does anyone know why the AT&T syntax ever existed in the first place?


IIRC it's the syntax of an older assembler (I want to say DEC's) which the ur-Unix kept using (and extending) as it was ported to new architectures, rather than adopting whatever the platform's assembler syntax was.




I can't find it now but I could swear there was an hn post about that very thing just in the past few weeks.


Near the end of the text, one of the most complete resources:

https://www.intel.com/content/www/us/en/developer/articles/t...

Coding in assembler is very manual: things like stack manipulation, or dropping assembler instructions into Windows applications.

These are procedures you can pick out from the code, but they are very hard to discover if you are unaware of the heuristic.
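For example, the frame setup a compiler normally does for you is spelled out by hand; a minimal sketch (the 32 bytes of local space is an arbitrary choice):

    f:
    push    rbp             ; save the caller's frame pointer
    mov     rbp, rsp        ; establish this function's frame
    sub     rsp, 32         ; reserve space for locals
    ; ... function body ...
    mov     rsp, rbp        ; tear the frame back down
    pop     rbp
    ret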

Other resources:

https://www.agner.org/optimize/


"not so hard" .. until your compiler decide to vectorize your loop then good luck!


That's actually something I want to tackle in the next post I'm writing. I need good examples though of stuff like that, so I think I need to spend some time with godbolt...


>or calculating pi in roman numerals

     xxii
    -----
     vii
Hmmm... now what do I do? How about uppercase/lowercase?

     XXII
    ----- = III i iv ii viii ...
     VII


Love it.

Instead of upper/lower (Romans didn't have lowercase?), I would define a Latin word, say "dividum", and write it as

CCCXIV DIVIDUM C

Which for short would be:

CCCXIVDVDC

Where DVD is a nonsensical or at least redundant roman numeral (500 + 495 = 995 VM). For readability we now add a new symbol to unicode of a V and a D superimposed.

N.B. Latin nerds... I don't care what the proper term for "divided by" is.



