Mario 64 gets flak because some of its files were not compiled with optimizations on. There is a YouTuber, whose name I forget at the moment, who has been optimizing and improving the game and constantly talks about how this was actually for the better, given the console's small instruction cache and slow speed. Compiling with optimizations explodes code size (loop unrolling, inlining, etc.) to the point that it's a net loss once you factor in the increased load on the IC.
The IC is the instruction cache. Bloated code needs more slow loads into the IC, and even if the code runs faster per cache line, the 50x overhead of the added cache loads wipes out any -O3 advantage.
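To put rough numbers on that tradeoff, here is a toy cost model. Everything in it is an assumption for illustration: the function name `total_cycles`, the instruction counts, and the miss counts are made up, and the "50x" figure is read as "a cache miss costs about 50 cycles versus 1 for a hit". It is not measured on real hardware.

```python
def total_cycles(instructions, cache_misses, miss_penalty, cycles_per_instruction=1):
    """Toy model: cycles spent executing plus cycles stalled on instruction-cache misses."""
    return instructions * cycles_per_instruction + cache_misses * miss_penalty


# Hypothetical numbers, not measurements from an N64.
MISS_PENALTY = 50  # assumed reading of the "50x" overhead: cycles per cache miss

# Compact build: more instructions executed, but the code mostly stays in cache.
compact = total_cycles(instructions=1000, cache_misses=10, miss_penalty=MISS_PENALTY)

# Unrolled/inlined build: fewer instructions executed, but more misses
# because the larger code no longer fits in the cache.
unrolled = total_cycles(instructions=800, cache_misses=40, miss_penalty=MISS_PENALTY)

print(compact, unrolled)
```

With these made-up inputs the "optimized" build executes 20% fewer instructions but still ends up slower overall, because the extra misses dominate. The real numbers depend entirely on the code and the hardware.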