> *The tricky part is there are a number of different valid march flags for x86 an...*

speleo_engr · on Feb 19, 2019

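Yes, you can see the broad opportunity, but various things can easily break vectorization across the x86 and ARM SIMD extensions. The first that comes to mind is using doubles instead of floats: a double is twice as wide, so each vector register holds half as many lanes, and 32-bit ARM NEON has no double-precision vector arithmetic at all. Another is an inner loop that is too wide, especially in algorithms that must be tuned to cache-line sizes. You can also run into accuracy and stability issues between different x86 instruction extensions (for example, a build with FMA rounds `a*b + c` once instead of twice, so results can differ in the last bits). Those are a real nightmare to debug.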

Something1234 · on Feb 20, 2019

Isn't the best flag `-march=native`? As far as I know, it enables every instruction-set extension your processor supports, so you get all the vectorization it can do. That only holds if you don't have to ship builds for other people's machines, though.
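One way to see what `-march=native` actually turned on is to check the compiler's predefined feature macros. This is a minimal sketch, assuming GCC or Clang on x86; the `EXT_*` flags and the function names are hypothetical. Build it once with default flags and once with `-march=native` and compare what `print_compiled_extensions()` reports (GCC can also list the resolved target with `gcc -march=native -Q --help=target`). Because the macros are fixed at compile time, the native build assumes those instructions exist and can die with an illegal-instruction error on an older CPU.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical bit flags for a few x86 SIMD extensions. */
enum {
    EXT_SSE4_2  = 1 << 0,
    EXT_AVX     = 1 << 1,
    EXT_AVX2    = 1 << 2,
    EXT_FMA     = 1 << 3,
    EXT_AVX512F = 1 << 4
};

/* Extensions this file was compiled to assume. GCC and Clang define these
   macros from the target flags (-march=native, -mavx2, ...), so the result
   is fixed at build time and says nothing about the CPU it later runs on. */
unsigned compiled_extensions(void)
{
    unsigned ext = 0;
#ifdef __SSE4_2__
    ext |= EXT_SSE4_2;
#endif
#ifdef __AVX__
    ext |= EXT_AVX;
#endif
#ifdef __AVX2__
    ext |= EXT_AVX2;
#endif
#ifdef __FMA__
    ext |= EXT_FMA;
#endif
#ifdef __AVX512F__
    ext |= EXT_AVX512F;
#endif
    /* In GCC and Clang, enabling AVX2 also enables AVX. */
    assert(!(ext & EXT_AVX2) || (ext & EXT_AVX));
    return ext;
}

/* Prints which of the listed extensions the build assumes. */
void print_compiled_extensions(void)
{
    static const struct { unsigned bit; const char *name; } table[] = {
        { EXT_SSE4_2, "SSE4.2" }, { EXT_AVX, "AVX" }, { EXT_AVX2, "AVX2" },
        { EXT_FMA, "FMA" },       { EXT_AVX512F, "AVX-512F" },
    };
    unsigned ext = compiled_extensions();
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        printf("%-9s %s\n", table[i].name, (ext & table[i].bit) ? "yes" : "no");
}
```

On a non-x86 target none of these macros are defined, so every line prints "no".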