A Crash Course in X86 Assembly for Reverse Engineers (2013) [pdf] (sensepost.com)
217 points by ingve on Dec 3, 2016 | 31 comments

If you're interested in learning assembly language, I recommend checking out "Programming from the Ground Up" by Jonathan Bartlett [0]. It's a great introduction to ASM and how computers work at the low level. It's also used in universities such as Princeton to teach introductory CS classes [1]. Definitely worth checking out.

[0] http://savannah.spinellicreations.com//pgubook/ProgrammingGr...
[1] https://discuss.fogcreek.com/joelonsoftware5/default.asp?cmd...

This is great! Thanks for sharing. I have not seen this before.

Learning x86 directly can be a bit hard. There are simpler assembly languages to begin with (e.g. the 8085). There's an educational program called gnusim8085 that I found excellent for getting started with assembly: while the 8085 is similar to x86, it has fewer instructions and registers. It comes with a debugger, a tutorial, and some example programs. https://gnusim8085.github.io/screenshots

To learn reverse engineering look for "crackme" programs. These are programs that contain challenges for you to crack, and come with graded levels of difficulty. https://en.wikipedia.org/wiki/Crackme.

Now, one of the techniques that I've found most useful is patching with NOPs (the same opcode a NOP sled is made of: https://en.wikipedia.org/wiki/NOP_slide). It's easy to do as well: just replace the instruction's bytes with opcode 0x90 (NOP, or "no operation").
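A minimal sketch of NOP-ing out an instruction in a raw byte buffer, written in Python for illustration. The byte sequence is a hypothetical hand-assembled snippet (`cmp eax, 0` / `je +5` / `mov eax, 1`) and `nop_out` is a hypothetical helper; in practice you would make the same edit in a hex editor.

```python
NOP = 0x90  # one-byte x86 NOP opcode


def nop_out(code: bytearray, offset: int, length: int) -> None:
    """Overwrite `length` bytes at `offset` with NOPs, in place (hypothetical helper)."""
    if offset < 0 or length < 0 or offset + length > len(code):
        raise ValueError("patch range is outside the buffer")
    code[offset:offset + length] = bytes([NOP]) * length


# Hypothetical snippet:
#   83 F8 00        cmp eax, 0
#   74 05           je  +5        (skips the mov below when eax == 0)
#   B8 01 00 00 00  mov eax, 1
code = bytearray.fromhex("83 F8 00 74 05 B8 01 00 00 00")

# NOP out the whole 2-byte conditional jump, so the mov always executes.
nop_out(code, offset=3, length=2)
print(code.hex(" "))  # 83 f8 00 90 90 b8 01 00 00 00
```

Always replace a whole instruction: NOP-ing only part of one leaves stray bytes that decode as something else.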

The editor I use is ht (http://hte.sourceforge.net/screenshots.html), a clone of Hiew that is free and multiplatform; just make sure to switch to disassembly mode with F6. Another free tool that can be a good alternative to IDA Pro on Windows is x64dbg: http://x64dbg.com/

In this way you can get started for free.

Having said that, I don't know of many people who use the ht editor. Make sure you switch to disassembly mode (F6); you can edit bytes with F4 and save with F2.

You can also follow functions around quite quickly. It supports PE (Windows), ELF (*nix), and many other formats.

I wish this covered x86_64. I know the differences, but we need to move on from 32-bit.

While amd64 is better in that it gives compilers more registers to use, we rarely need to "move on". In fact, many programs would work better (more efficiently, faster) with 32-bit pointers, just as many programs would with 16-bit integers instead of 32-bit ones where possible. Realistically, hardly anyone codes for performance anymore, so there isn't much difference between x86 and amd64 (for math-heavy code, SSE1-4 and AVX1/2 don't care about amd64 vs. x86).

Anyway, learning amd64 when you already know x86 is easy, as they are mostly the same.

Well, for one the ASLR security feature is much weaker on 32-bit platforms.

I've been wondering whether it would be possible to write programs that keep most of their memory in the lower 4 GB, so that most pointers could be 32 bits.

Also, in such a program int would be a 32-bit value unless specifically declared to be larger. We could write programs that use less memory but still get the extra registers, using 64-bit pointers and values only where necessary.
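Without OS or compiler support, one portable way to get part of this saving is to replace pointers with 32-bit indices into a pool. A minimal sketch using Python's ctypes to show the layout difference; `PtrNode`, `IdxNode`, `build_pool`, `walk`, and the `NIL` sentinel are hypothetical names, and the printed sizes depend on the platform the interpreter was built for.

```python
import ctypes


class PtrNode(ctypes.Structure):
    """Linked-list node whose `next` is a native pointer (8 bytes on x86-64)."""


PtrNode._fields_ = [("value", ctypes.c_int32), ("next", ctypes.POINTER(PtrNode))]


class IdxNode(ctypes.Structure):
    """Same node, but `next` is a 32-bit index into a pool instead of a pointer."""
    _fields_ = [("value", ctypes.c_int32), ("next", ctypes.c_uint32)]


NIL = 0xFFFFFFFF  # hypothetical "null" index sentinel


def build_pool(values):
    """Lay the values out as a singly linked list inside one contiguous array."""
    pool = (IdxNode * len(values))()
    for i, v in enumerate(values):
        pool[i].value = v
        pool[i].next = i + 1 if i + 1 < len(values) else NIL
    head = 0 if values else NIL
    return pool, head


def walk(pool, head):
    """Follow the 32-bit 'pointers' from head until the NIL sentinel."""
    out, i = [], head
    while i != NIL:
        out.append(pool[i].value)
        i = pool[i].next
    return out


print(ctypes.sizeof(PtrNode), ctypes.sizeof(IdxNode))  # 16 8 on a typical 64-bit build
pool, head = build_pool([10, 20, 30])
print(walk(pool, head))  # [10, 20, 30]
```

The trade-off is that every access goes through the pool, and one pool holds at most 2^32 - 1 nodes since one index is reserved for `NIL`.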

One of the less well-known features of Linux is that you can do this! There's a thing called the "x32 ABI" (use the option -mx32 with gcc or clang; you'll need all your libraries compiled with it too), where:

* As far as the processor itself is concerned, the code runs in 64-bit mode, so you get the extra (and wider) registers from that.

* But pointers are still 32 bits, so you get the memory savings of 32-bit mode.

In principle, as long as you're using <4GiB of memory, it should be at least as fast as the best of 32-bit or 64-bit mode for any particular program. But I haven't heard of it being used much.
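A quick way to see which data model a process was built for is to check the sizes of pointers and `long`. A sketch using Python's ctypes; it reports the build of the Python interpreter itself, and since x32 and classic 32-bit x86 are both ILP32, sizes alone can't tell those two apart. `data_model` is a hypothetical helper name.

```python
import ctypes


def data_model() -> str:
    """Classify the C data model of the running interpreter's build."""
    ptr = ctypes.sizeof(ctypes.c_void_p)
    long_ = ctypes.sizeof(ctypes.c_long)
    int_ = ctypes.sizeof(ctypes.c_int)
    if ptr == 4 and long_ == 4 and int_ == 4:
        return "ILP32"  # 32-bit x86, and also the x32 ABI
    if ptr == 8 and long_ == 8:
        return "LP64"   # e.g. x86-64 Linux and macOS
    if ptr == 8 and long_ == 4:
        return "LLP64"  # 64-bit Windows
    return f"other (pointer={ptr}, long={long_}, int={int_})"


print(data_model())
```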


Like GP, I thought I had cleverly come up with this idea myself. So glad that people smarter than me have done it right!

The only thing missing, I guess, is 64bit 'himem' pointers.

Does something like this exist for AArch64?

Look up the x32 ABI for Linux. It's exactly that.

I guess this is theoretically possible (...maybe...), but it would definitely require cooperation from the underlying operating system and some of the hardware.

Additionally, processors have "modes". You tell the processor to go into 64-bit or 32-bit mode. I don't think quickly switching back and forth between those in real-time is a very good idea.

64-bit mode is kind of a strict superset of 32-bit mode, so the program can run in 64-bit mode. But, for example, if the program stores a pointer in memory, it's stored as a 32-bit number. Once it's loaded, it gets zero-extended to a 64-bit integer.
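Zero- versus sign-extension of a stored 32-bit pointer matters once addresses reach 2 GiB and above. A sketch using Python's struct module to mimic the 4-byte memory slot; the address value and function names are made up for illustration. On x86-64, a 32-bit load such as `mov eax, [mem]` zero-extends into the full `rax`, while `movsxd` sign-extends.

```python
import struct

MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def load_ptr32_zero_extended(mem: bytes, offset: int = 0) -> int:
    """Read 4 little-endian bytes as unsigned: the upper 32 bits of the result are zero."""
    (value,) = struct.unpack_from("<I", mem, offset)
    return value


def load_int32_sign_extended(mem: bytes, offset: int = 0) -> int:
    """Read 4 bytes as signed and return the 64-bit two's-complement bit pattern."""
    (value,) = struct.unpack_from("<i", mem, offset)
    return value & MASK64


slot = struct.pack("<I", 0x8000_1000)  # hypothetical pointer stored in a 4-byte slot
print(hex(load_ptr32_zero_extended(slot)))  # 0x80001000
print(hex(load_int32_sign_extended(slot)))  # 0xffffffff80001000
```

Sign-extending would turn that pointer into a kernel-half address, which is why 32-bit pointers in a 64-bit process have to be zero-extended.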

The main thing is having a different ABI (within the program), and using the linker/memory mapping to ensure that the stack, code, and one heap are in the lower 4 GB. Another heap can live in the 64-bit space, using 64-bit pointers.

edit: one could even use trickery based on pointer alignment (e.g. 4-byte alignment) to shift values on load or access, using the fact that the lowest bits of an aligned address are always zero. With 4-byte alignment, that allows 34-bit addresses, which cover 16 GB of memory.
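A minimal sketch of the shifted 32-bit "pointer" idea, assuming every object is 4-byte aligned and lives in one region starting at a hypothetical `base` address. Because the low 2 bits of each offset are always zero, storing `offset >> 2` in 32 bits reaches 2^34 bytes (16 GiB). The HotSpot JVM's compressed oops use the same trick with 8-byte alignment to reach 32 GiB. `compress` and `decompress` are hypothetical names.

```python
SHIFT = 2                       # 4-byte alignment: the low 2 bits of every offset are zero
ALIGN = 1 << SHIFT
MAX_REACH = (1 << 32) << SHIFT  # 2**34 bytes = 16 GiB


def compress(base: int, addr: int) -> int:
    """Turn a full address into a 32-bit value relative to base."""
    offset = addr - base
    if offset < 0 or offset >= MAX_REACH:
        raise ValueError("address outside the compressible region")
    if offset % ALIGN:
        raise ValueError("address is not 4-byte aligned")
    return offset >> SHIFT


def decompress(base: int, compressed: int) -> int:
    """Undo compress(): shift back left and add the base."""
    return base + (compressed << SHIFT)


base = 0x7F00_0000_0000      # hypothetical start of the region
addr = base + 0x3_0000_0010  # about 12 GiB in, beyond plain 32-bit reach
c = compress(base, addr)
print(hex(c), decompress(base, c) == addr)  # 0xc0000004 True
```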

So basically going back to segmented memory with near (32bit) and far (64bit) pointers?

Sometimes the instruction ends in a q for 64 bits (the AT&T-syntax operand-size suffix, e.g. movq).

The extra registers affect the way programs are written. You get a more RISC-like feel: this shows up most notably in the calling conventions, which pass all arguments in registers most of the time. You see more RISC-like three-register arithmetic because the compiler has more opportunities to use LEA, etc.
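A small model of both points, in Python for illustration. `assign_int_args` maps integer/pointer arguments to the System V AMD64 argument registers (rdi, rsi, rdx, rcx, r8, r9; the Microsoft x64 convention uses rcx, rdx, r8, r9), whereas 32-bit cdecl passes every argument on the stack. It deliberately ignores floating-point arguments, struct passing, shadow space, and stack alignment, so `stack[k]` is only an ordinal. `lea` mimics the address arithmetic of the LEA instruction, which compilers use as a non-destructive three-operand add-and-scale. Both function names are hypothetical.

```python
SYSV_INT_ARG_REGS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")
WIN64_INT_ARG_REGS = ("rcx", "rdx", "r8", "r9")
MASK64 = (1 << 64) - 1


def assign_int_args(n_args: int, regs=SYSV_INT_ARG_REGS) -> list:
    """Where each integer/pointer argument goes: a register, or the k-th stack slot."""
    return [regs[i] if i < len(regs) else f"stack[{i - len(regs)}]" for i in range(n_args)]


def lea(base: int, index: int = 0, scale: int = 1, disp: int = 0) -> int:
    """Compute base + index*scale + disp like LEA: no memory access, no flags."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("LEA scale must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & MASK64


print(assign_int_args(8))
# ['rdi', 'rsi', 'rdx', 'rcx', 'r8', 'r9', 'stack[0]', 'stack[1]']

# For `return a + b*4 + 8;` a compiler may emit something like
# `lea eax, [rdi + rsi*4 + 8]`: one instruction, with both inputs left untouched.
print(lea(base=10, index=3, scale=4, disp=8))  # 30
```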

Piggybacking on this comment: Agner Fog has some nice tables describing register usage and calling conventions in chapters 6 and 7 of http://www.agner.org/optimize/calling_conventions.pdf

It is a lot more than that: calling conventions, register names and usage, etc. I have a "cheat sheet" sitting on my desk that I wrote up the last time I needed to do some x86-64 reversing. Do not sit down with an x86-64 binary using this manual as your guide; you will not get very far.

Would you be willing to post this cheat sheet, or link to it if you got it from the internet?

If you want to learn the low level today, where do you start? Will x86 still be relevant in a few years? What about ARM, or is it about the same? What does computer architecture optimize for today, besides more cycles per second?

"Note that the bytes are saved in reverse order in the memory as Intel uses Little Endian representation. That means the most significant bit of every byte is the most left bit."

Haven't they got this backwards? Little Endian means the least significant bit is stored first [0].

[0] https://www.cs.umd.edu/class/sum2003/cmsc311/Notes/Data/endi...

Little endian means that the least significant byte is stored first. The bits within each byte are stored with the most significant first.
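You can check this yourself by packing a 32-bit value and looking at the resulting bytes. A sketch using Python's struct module; the value 0x12345678 is arbitrary. Bit order within a byte isn't observable this way, because x86 memory is byte-addressed.

```python
import struct
import sys

value = 0x12345678

little = struct.pack("<I", value)  # byte order used by x86
big = struct.pack(">I", value)     # "network" byte order

print(little.hex(" "))  # 78 56 34 12  (least significant byte at the lowest address)
print(big.hex(" "))     # 12 34 56 78
print(sys.byteorder)    # little on x86
```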

Who is this for? If you're a regular reverse engineer, won't you just pick up the Intel reference manual and flick through it? (It has a sorted instruction reference, which you'll probably find much more useful than anything categorised.) And if you're not, I'd have thought knowing an assembly language already a rather important prerequisite - seems like a very strange line of work to pick otherwise - and in that case you'd presumably do the same thing.

>won't you just pick up the Intel reference manual and flick through it?

The Intel reference manual is incredibly bloated and dry reading. Yeah, it has literally everything you would want to know. But good luck trying to understand all of it in a reasonable amount of time.

I learned x86 while studying buffer overflows in college. We used Hacking: The Art of Exploitation which walked us through most of the core concepts really well.


Does anyone know something similar for arm?

The ARM System Developer's Guide is one of the best ARM books I've ever read. Has nothing to do with reverse engineering, but a great ARM reference.

If you can reverse engineer x86, a reference for ARM assembly is all you'll need (you could get by with the official docs, but this book really is something special).

You'll weep tears of joy going from X86's nightmarish instruction set to the beauty of RISC ARM!


Great. Thank you.
