I'm always amazed by the strange assembly generated by modern compilers. LLVM will auto-vectorize a routine into this awesome NEON SIMD masterpiece... and then throw in a bunch of completely redundant stack loads/stores and refuse to use any registers except r0 and r1! 0_o
You can tell where the devs with Ph.Ds in compiler design are putting all of their effort :)
That's true. I guess my perspective has also changed in that I don't need every cycle to be used carefully any more - on that 7.16MHz Amiga, deleting those instructions mattered a lot more.. :)
I think that's also affecting where the optimization effort goes to a great extent - it's more likely to be invested on the type of code people are more likely to use in critical inner loops to be run in places that might saturate large numbers of cores...