I too have written hundreds of thousands of lines of 680x0 assembler and still get a warm nostalgic glow when I see move.l, add.w, etc!
There are AArch32/A32 and Thumb-2/T32 (Thumb for older processors) as the two common 32-bit instruction sets, and AArch64/A64 as the 64-bit instruction set.
For example, AArch64/A64 does not support conditional execution (the encoding bits that are now available are used to encode 32 instead of 16 registers). On the other hand, Thumb cannot encode many instructions that AArch32/A32 can. Thumb-2 was specifically developed to add instructions that provide equivalent functionality. If you want the details, look at how the same UAL assembly is encoded quite differently into A32 or T32 instructions.
Also, for older ARM processors, support for Thumb was optional (though very common). On the other hand, modern 32-bit microcontroller cores of the ARMv8-M series (cf. http://www.arm.com/products/processors/instruction-set-archi...) only support T32.
TLDR: Which of these three common ARM instruction sets are you talking about in your question "How about ARM?"?
I'm cobbling together a Genesis/MegaDrive emulator in Rust, and also trying to wrap my head around the m68k at the same time.
It's tough to find good documentation about how the cpu works, and this is a fantastic reference.
Motorola's patents for the 68000 are also quite informative (see US patents 4,325,121, 4,296,469 and 4,307,445) if you're interested in the low level details of the implementation. The first of those actually contains a complete microcode/nanocode listing of a pre-release version of the CPU. For Genesis/Megadrive emulation generally, you should check out this thread on SpritesMind. It contains links to other forum topics with useful info about the hardware (68K included) discovered by the community.
I have copies of the two Motorola manuals, and they are comprehensive and quite useful. The OP is tremendously helpful as well because it's a bit higher level. Now I can flip between the manual for opcodes and the tutorial for assembler names and contextual understanding.
I'm excited to dig into this thread as well. There's a lot here that looks fascinating. Particularly the microcode exploration, the reference on the sound chip, and the interrupt handling. Not to mention timing information.
I haven't gotten much further in my experimentation than rudimentary execution of a few opcodes, and have mostly been floundering around trying to make the code and project feel ergonomic. I'm discovering (as I expected) that there is quite a lot of stuff you want to set up early on in a project like this, mainly debugging and other sundry tooling.
I'm away from my laptop, but I'll try to post the code somewhere tomorrow.
I mainly read 68K assembler when reading compiler output, but it was beautifully clear and understandable.
"The only proper way to understand 80x86 coding is to realize that ALL 80x86 OPCODES ARE CODED IN OCTAL. [...] For some reason absolutely everybody misses all of this, even the Intel people who wrote the reference on the 8086 (and even the 8080)."
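To make the octal claim concrete, here is a small Rust sketch of my own (not from the quoted text). It splits a byte into 2 + 3 + 3 bits, which is exactly how a ModRM byte is laid out (mod, reg, r/m). The names `octal_digits` and `REG32` are made up for the example.

```rust
// Sketch: reading x86 bytes as three octal digits (2 + 3 + 3 bits).
// `octal_digits` and `REG32` are hypothetical names for this example.

/// Split a byte into its octal digits: bits 7-6, bits 5-3, bits 2-0.
fn octal_digits(byte: u8) -> (u8, u8, u8) {
    (byte >> 6, (byte >> 3) & 7, byte & 7)
}

/// 32-bit register names in encoding order.
const REG32: [&str; 8] = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"];

fn main() {
    // 89 D8 is `mov eax, ebx` in 32-bit code (opcode 89 = MOV r/m32, r32).
    let modrm: u8 = 0xD8;
    let (mode, reg, rm) = octal_digits(modrm);
    // In octal the ModRM byte reads 330: mod = 3, reg = 3 (ebx), r/m = 0 (eax).
    println!("ModRM {:02X} hex = {:03o} octal", modrm, modrm);
    println!("mod={} reg={} rm={}", mode, REG32[reg as usize], REG32[rm as usize]);

    // One-byte PUSH r32 is 50+r in hex, which reads as 12r in octal.
    let push: u8 = 0x53;
    let (_, _, r) = octal_digits(push);
    println!("{:02X} hex = {:03o} octal -> push {}", push, push, REG32[r as usize]);
}
```

In hex, D8 tells you nothing at a glance; in octal, 330 reads off directly as register-direct mode, source ebx, destination eax.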
Outside of 64-bit mode you can't do byte operations on [e]si, [e]di, [e]bp or [e]sp (though for the last of those, there is little reason you would want to). In 64-bit mode, you can do that with a REX prefix, but then you can't access ah, bh, ch and dh. There's only one size bit in the main opcode byte, so they had to add a prefix byte when they added 32-bit support and another one when they extended to 64-bit. Unsigned multiply instructions have a fixed destination register/register pair. Numerous instructions have special shorter encodings you can use for certain registers.
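The byte-register quirk is easier to see as a lookup table. This is a rough Rust sketch, not a full decoder: `byte_reg` is a hypothetical helper, and `rex` is `None` when there is no REX prefix and `Some(ext)` when one is present, with `ext` being the relevant extension bit (REX.R or REX.B). I've used the r8b..r15b spelling for the extended registers; some assemblers write r8l..r15l instead.

```rust
// Sketch: which 8-bit register a 3-bit register field selects on x86.
// `byte_reg` is a hypothetical helper for illustration only.

/// `field` is the 3-bit register number from the instruction.
/// `rex` is None without a REX prefix, Some(extension bit) with one.
fn byte_reg(field: u8, rex: Option<bool>) -> &'static str {
    // Without REX, encodings 4-7 mean the high halves of ax, cx, dx, bx.
    const NO_REX: [&str; 8] = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"];
    // With any REX prefix, 4-7 become the low bytes of sp, bp, si, di.
    const REX: [&str; 16] = [
        "al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil",
        "r8b", "r9b", "r10b", "r11b", "r12b", "r13b", "r14b", "r15b",
    ];
    let field = (field & 7) as usize;
    match rex {
        None => NO_REX[field],
        Some(ext) => REX[field + if ext { 8 } else { 0 }],
    }
}

fn main() {
    for field in 4..8u8 {
        println!(
            "field {}: no REX -> {}, REX -> {}",
            field,
            byte_reg(field, None),
            byte_reg(field, Some(false))
        );
    }
}
```

The same four encodings mean ah/ch/dh/bh or spl/bpl/sil/dil depending only on whether a REX prefix is present, which is why you can't mix the two groups in one instruction.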
All of this makes a lot of sense in the context of the 8086. It needed to be easy to port assembly from the 8080 and memory was incredibly limited. As a product of its time it seems like a pretty good design, but it wasn't a forward-thinking design. The 68000 by contrast was a (mostly) 16-bit implementation of a 32-bit, forward-looking ISA. It's a shame x86 survived while 68K is mostly dead.
Only when the bit offsets of the fields are multiples of 3 (for example, a 3-bit field that starts at bit 1 does not line up with an octal digit).
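A tiny Rust illustration of the alignment point, with a made-up helper `octal_with_field` that places a 3-bit value at a given bit offset in an otherwise zero byte and prints the byte in octal:

```rust
// Sketch: a 3-bit field is only readable in octal when its offset is a
// multiple of 3. `octal_with_field` is a hypothetical helper.

/// Put a 3-bit `value` at bit offset `lsb` (0..=5) and format the byte in octal.
fn octal_with_field(value: u8, lsb: u32) -> String {
    format!("{:03o}", (value & 7) << lsb)
}

fn main() {
    // Field at bit 3: the value 5 shows up as its own octal digit.
    println!("offset 3: {}", octal_with_field(5, 3)); // 050
    // Field at bit 1: the same value is smeared across two octal digits.
    println!("offset 1: {}", octal_with_field(5, 1)); // 012
}
```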