Other things helping are forced >=16Kb page sizes, and massive L1 caches (M1 has 4X Zen 3's L1 data cache and 3X the L1 instruction cache; how much of that cache size is enabled by new process node and larger page sizes vs just lack of x86 decode I don't know).
L1 cache size is driven by the target clock frequency. Apple is not aiming for 5+GHz, whereas both Intel and AMD cores can turbo above 5GHz these days.