To learn reverse engineering, look for "crackme" programs. These are programs built as challenges for you to crack, and they come in graded levels of difficulty: https://en.wikipedia.org/wiki/Crackme
Now, one of the techniques that I've found most useful is the NOP sled (https://en.wikipedia.org/wiki/NOP_slide). It's easy to implement as well: just overwrite the bytes you want to disable with the x86 opcode 0x90 (NOP, or no operation).
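As a rough illustration of the byte replacement described above, here is a minimal Python sketch that overwrites a range of bytes with 0x90. The helper name `nop_out` and the ten-byte snippet are made up for the example; in a real binary you would first find the file offset of the instruction in a disassembler. The patch keeps the buffer the same length so that every other offset stays valid.

```python
NOP = 0x90  # single-byte x86 NOP opcode


def nop_out(code: bytes, offset: int, length: int) -> bytes:
    """Return a copy of `code` with `length` bytes at `offset` replaced by NOPs."""
    if offset < 0 or length < 0 or offset + length > len(code):
        raise ValueError("patch range falls outside the buffer")
    patched = bytearray(code)
    patched[offset:offset + length] = bytes([NOP]) * length
    return bytes(patched)


# Hypothetical snippet (not taken from any real program):
#   39 d8            cmp eax, ebx
#   75 05            jne +5        <- the check we want to disable
#   b8 01 00 00 00   mov eax, 1
#   c3               ret
original = bytes.fromhex("39d8" "7505" "b801000000" "c3")

patched = nop_out(original, offset=2, length=2)  # blank out the 2-byte jne
print(patched.hex())  # 39d89090b801000000c3
```

With the jump gone, execution always falls through to the `mov eax, 1`, which is the usual effect you are after when patching out a check in a crackme.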
The editor I use is a clone of Hiew called ht (http://hte.sourceforge.net/screenshots.html), which is free and multiplatform; just make sure to switch to disassemble mode with F6. Another free tool, and a good alternative to IDA Pro on Windows, is x64dbg: http://x64dbg.com/
In this way you can get started for free.
You can also follow functions around rather quickly. It supports PE (Windows), ELF (*nix) and many other formats.
Anyway, learning amd64 is easy if you already know x86, as they are mostly the same.
Also, such a program would have int as a 32-bit value unless specifically declared to be larger. We could write programs that use less memory but still use more registers, and use 64-bit pointers and values as necessary.
* As far as the processor itself is concerned, the code runs in 64-bit mode, so you get the extra (and wider) registers from that.
* But pointers are still 32 bits, so you get the memory savings of 32-bit mode.
In principle, as long as you're using <4GiB of memory, it should be at least as fast as the best of 32-bit or 64-bit mode for any particular program. But I haven't heard of it being used much.
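To put a rough number on the memory savings, here is a small sketch that compares the size of a hypothetical doubly linked list node (one 32-bit int plus two pointers) under the two pointer widths. The layouts are written out by hand with Python's `struct` module, and the 4 bytes of padding in the 64-bit case are an assumption based on the usual 8-byte pointer alignment; a real compiler decides the actual layout.

```python
import struct

# Hypothetical node: struct node { int value; node *prev; node *next; };
# "<" disables automatic padding, so any padding is spelled out explicitly.
NODE_32BIT_POINTERS = "<iII"    # int32 + two 4-byte pointers
NODE_64BIT_POINTERS = "<i4xQQ"  # int32 + 4 padding bytes + two 8-byte pointers

small = struct.calcsize(NODE_32BIT_POINTERS)
large = struct.calcsize(NODE_64BIT_POINTERS)

print(small)  # 12
print(large)  # 24
```

For pointer-heavy data structures like this one, halving the pointer width halves the node size, which also means more nodes fit in each cache line.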
Additionally, processors have "modes". You tell the processor to go into 64-bit or 32-bit mode. I don't think quickly switching back and forth between those in real-time is a very good idea.
The main thing is having a different ABI (within the program), and using the linker and memory mapping to ensure that the stack, code, and one heap are in the lower 4 GiB. Another heap can live in the 64-bit space, using 64-bit pointers.
edit: one could even use trickery related to the alignment of pointers (i.e. 32-bit alignment) to shift values on load or access, using the fact that the lower bits of an aligned address are always zero. With 4-byte alignment that frees two bits, which could allow 34-bit addresses that cover 16 GiB of memory.
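The shifting trick can be sketched in a few lines of Python. This is only a model of the arithmetic, assuming 4-byte alignment (so a shift of 2); the names `compress` and `decompress` are invented for the example, and in real code the shift would happen in the load and store sequences the compiler emits.

```python
ALIGN_SHIFT = 2                       # 4-byte alignment: low 2 bits are always zero
ALIGN_MASK = (1 << ALIGN_SHIFT) - 1


def compress(addr: int) -> int:
    """Pack an aligned address of up to 34 bits into a 32-bit value."""
    if addr < 0 or addr & ALIGN_MASK:
        raise ValueError("address must be non-negative and 4-byte aligned")
    packed = addr >> ALIGN_SHIFT
    if packed >> 32:
        raise ValueError("address does not fit in 34 bits")
    return packed


def decompress(packed: int) -> int:
    """Recover the full address from its 32-bit packed form."""
    return packed << ALIGN_SHIFT


highest = (1 << 34) - 4               # last 4-byte-aligned address below 16 GiB
assert compress(highest) == 0xFFFFFFFF
assert decompress(compress(highest)) == highest

print((1 << 34) // 2**30)  # 16 (GiB of addressable memory)
```

A larger alignment buys more range at the cost of more wasted space per object: each extra bit of shift doubles the reachable memory.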
Haven't they got this backwards? Little Endian means the least significant byte is stored first.
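A quick way to check the byte order for yourself is Python's `struct` module, which lets you pack the same 32-bit value both ways. The value 0x11223344 is just an arbitrary example chosen so each byte is easy to spot.

```python
import struct

value = 0x11223344

little = struct.pack("<I", value)  # little endian: least significant byte first
big = struct.pack(">I", value)     # big endian: most significant byte first

print(little.hex())  # 44332211
print(big.hex())     # 11223344
```

This is also why multi-byte constants look reversed when you read an x86 hex dump: the 0x44 byte sits at the lowest address.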
The Intel reference manual is incredibly bloated and dry reading. Yeah, it has literally everything you would want to know. But good luck trying to understand all of it in a reasonable amount of time.
I learned x86 while studying buffer overflows in college. We used Hacking: The Art of Exploitation, which walked us through most of the core concepts really well.
If you can reverse engineer x86, a reference for ARM assembly is all you'll need. (You could get by with the official docs, but this book really is something special.)
You'll weep tears of joy going from X86's nightmarish instruction set to the beauty of RISC ARM!