I think the issues are the same on arm64/M1, and the cache structures are probably similar too. Keep in mind, though, that modern CPUs are out-of-order and superscalar, and their caches are typically dual-ported, so more than one access can be serviced per cycle.
You might look at something like the Arm® Cortex®-A75 Software Optimization Guide for a good picture of the microarchitecture. Unfortunately, Apple doesn't publish that kind of information for the M1. The LLVM docs have a little information; the overall structure isn't that different from an A75, although the latencies are.
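
As a rough illustration of why out-of-order, superscalar execution and instruction latencies matter, here is a minimal sketch (not taken from any vendor guide; the function names `sum_single_chain` and `sum_four_chains` are made up for this example). Both functions add up the same array, but the first forms one long dependency chain, while the second keeps four independent chains that the core can overlap. How much faster the second is, if at all, depends on the specific core's floating-point add latency and on compiler flags, so measure on your own hardware.

```c
#include <stddef.h>

/* One accumulator: each add must wait for the previous add to finish,
   so throughput is bounded by the latency of a floating-point add. */
double sum_single_chain(const double *data, size_t n) {
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        acc += data[i];
    }
    return acc;
}

/* Four accumulators: the four chains do not depend on each other, so an
   out-of-order superscalar core can keep several adds in flight at once.
   Note that this changes the order of the additions, so for general
   floating-point input the result may differ in the last bits. */
double sum_four_chains(const double *data, size_t n) {
    double a0 = 0.0, a1 = 0.0, a2 = 0.0, a3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += data[i];
        a1 += data[i + 1];
        a2 += data[i + 2];
        a3 += data[i + 3];
    }
    /* Handle the leftover 0..3 elements. */
    for (; i < n; i++) {
        a0 += data[i];
    }
    return (a0 + a1) + (a2 + a3);
}
```

Timing both over a large array that fits in L1 (and compiling without `-ffast-math`, so the compiler does not reassociate the first loop itself) should show the effect of add latency fairly directly.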