1. Get a platform you can tinker with and recover quickly in case of disaster. A Raspberry Pi is a good example.
2. Learn how to find and read detailed register-level manuals of the underlying chip you're working on.
3. Learn what the programmer's memory model is.
4. Learn what peripherals are, and what they do. You may have to dig a bit into basic electronics.
5. Learn what user-space vs kernel-space is.
6. Learn what device trees are, and how to prevent the kernel from taking over a peripheral.
7. Based on the above, write a user-space driver for a simple peripheral (e.g. GPIO or UART) in C (don't try to learn assembly yet). You will not be able to use the peripheral's interrupts from user space.
8. Learn how to verify hardware functionality, and see your driver in action. (Warning: high dopamine levels have been reported at this point.)
9. Repeat steps 7 and 8 for all peripherals, and see how far you can go.
I'd recommend something cheaper and simpler, like the STM32F4 Discovery boards. You won't get Linux and will have to program the chip through its debug port (JTAG/SWD), but the documentation for the STM32F4 is not as overwhelming as Broadcom's.
For a related twist, the OP might check out Nordic Semiconductor's Cortex-M4 radio SoC via the nRF52840 DK. It has an onboard SEGGER J-Link programmer that can be broken out to program other boards, and it is easy to get going with SEGGER's free GCC-based IDE.
In reality, every vendor's peripherals, and more importantly its HAL and associated drivers and libraries, will be different. But if you've done it once, learning another vendor's way of life is a bit like learning a new programming language; it's just syntax.
Broadly, most peripherals (e.g. communication interfaces, ADCs, watchdogs) will be similar, but there are definitely places where the differences will be larger. I'd expect those to be concentrated in clock trees, interrupt architectures, power modes, etc.
So, short answer: I agree with the parent. Get a cheap board and learn some stuff; if you like it, go from there. Do NOT choose a Cortex-A, though, unless you want to really dig into embedded Linux. Systems at that level are way, way more complex, and if the goal is to learn "everything about this processor", that will be especially difficult.
Except for the Kinetis DMA. I never understood that one!
General ARM programming--yes.
General peripheral knowledge--yes.
An ARM is an ARM within certain limits, so your knowledge transfers. Sadly, every chip has a different way of programming the peripherals. However, every chip has roughly the same core peripherals (I2C, SPI, UART, etc.), and their electrical specifications and how you use them for protocols don't change.
Hardware is physical and can be quirky. Signals take real time to settle, and your CPU is probably much faster than the transition time. Hardware can and does have bugs, just like software. A bad ground pin can make your hardware do “impossible” things.
You need to have much more awareness of the context your code runs in the further down the stack you go.
OTOH, there is a danger I've seen frequently: embedded systems guys often don't seem able to make the conceptual leap to fully paged, SMP programming. Having sort of come up this way myself (via microcomputers without MMUs and with a single processor), it doesn't make any sense to me, but it seems common.
People who have been exposed to more common multi-threaded and virtualized environments seem to pick up those pieces more easily.