> The tricky part is that there are a number of different valid `-march` flags for x86 and ARM that support SIMD.
Sure, but if you're writing software that works on AMD64, you can see pretty much every autovectorization opportunity that exists on other platforms. Most of the autovectorizer functionality in gcc and clang is platform-agnostic, AFAIK.
Yes, you can see the broad opportunity, but various things can easily break vectorization between extensions of x86 and ARM. The first that comes to mind is the use of doubles instead of floats. Another is an inner loop that is too wide, especially in algorithms that must be tuned for cache line sizes. You can also run into accuracy & stability issues between different x86 instruction extensions, and those are a real nightmare to debug.
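To make the accuracy point a bit more concrete, here is a rough sketch of one way results can drift. It is my own illustration, not compiler output: `sum_scalar` and `sum_4lane` are hypothetical names, and the second function only models by hand what a 4-lane vectorized reduction does. Float addition isn't associative, so splitting the sum into per-lane partials can change the low bits, which is also why gcc and clang generally won't vectorize a float reduction like this unless you allow reassociation (e.g. `-ffast-math`).

```c
#include <stddef.h>

/* Strict left-to-right sum: the order the C source actually specifies. */
float sum_scalar(const float *x, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += x[i];
    return acc;
}

/* Hand-written model (an assumption, for illustration only) of a 4-lane
   vectorized reduction: four independent partial sums that are combined
   at the end, followed by a scalar tail for the leftover elements. */
float sum_4lane(const float *x, size_t n) {
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        for (size_t k = 0; k < 4; ++k)
            lane[k] += x[i + k];
    float acc = (lane[0] + lane[1]) + (lane[2] + lane[3]);
    for (; i < n; ++i)
        acc += x[i];
    return acc;
}
```

With small, exactly representable inputs the two agree. With values of very different magnitude they may not, and a wider vector unit means a different number of lanes and so a different rounding pattern again. Summation order is only one source of drift, but it's an easy one to reproduce.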
Isn't the best flag `-march=native`? As far as I know, it enables every vector extension that your processor supports. That assumes you don't have to provide builds for other people's machines, of course.
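If you want to check what a given `-march` setting actually turned on, one option is to look at the predefined macros the compiler sets. A minimal sketch, assuming gcc or clang; `simd_level` is a made-up helper name, and the list of macros is just the handful I'd check first, not an exhaustive one:

```c
/* Reports the widest SIMD extension this translation unit was compiled
   for, based on macros gcc and clang predefine for the selected target.
   This reflects compile-time flags, not what the CPU running the binary
   supports. */
const char *simd_level(void) {
#if defined(__AVX512F__)
    return "AVX-512F";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__)
    return "SSE2";
#elif defined(__ARM_NEON)
    return "NEON";
#else
    return "none detected";
#endif
}
```

Print the result from `main` and compare a build with and without `-march=native`. On gcc, `gcc -march=native -Q --help=target` also shows what `native` resolves to on your machine, and `-fopt-info-vec` (gcc) or `-Rpass=loop-vectorize` (clang) will report which loops were actually vectorized.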