> I always cringe when developers compare compiler-optimized C to 'hand rolled' assembly. In the dawning age of SAT-based program synthesis and vectorization-aware superoptimizers, it's extremely anachronistic. It only shows a lack of awareness that in reality your job as a programmer is not to give instructions to a machine, but to provide enough help that a machine can create its own instructions better than you can.
I agree with your general point, but note that there are some unfortunate reasons why this can't be done in many cases.
A compiler can only make use of SIMD instructions if it knows that the target supports them.
Most packages and shipped binaries are "universal" x86-64 binaries, i.e., at compile time only the baseline instruction set can be assumed (for x86-64 that means SSE2 at most), so newer instructions such as AVX can't really be used effectively in many cases.
Of course, in certain cases, one can get around this issue.
For example, if one compiles from source, one can pass -march=native or similar flags.
Another example is game development, where one can freely assume that the target processor has AVX (or whatever the game's listed minimum requirements guarantee).
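To illustrate the compile-time route, here is a minimal sketch assuming GCC or Clang on x86: building with -mavx, or with -march=native on an AVX-capable machine, makes the compiler define the __AVX__ macro, so the code can select an implementation at build time. The function names are hypothetical. A default x86-64 build silently gets only the scalar loop.

```c
#include <stddef.h>

#if defined(__AVX__)
#include <immintrin.h>
#endif

/* dst[i] = a[i] + b[i]. The AVX body is compiled in only when the compiler
   was told the target has AVX (e.g. -mavx, or -march=native on an AVX
   machine); a default x86-64 build contains just the scalar loop. */
void add_arrays(float *dst, const float *a, const float *b, size_t n) {
    size_t i = 0;
#if defined(__AVX__)
    for (; i + 8 <= n; i += 8) {            /* 8 floats per 256-bit register */
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(dst + i, _mm256_add_ps(va, vb));
    }
#endif
    for (; i < n; i++)  /* remaining elements, or all of them without AVX */
        dst[i] = a[i] + b[i];
}

/* Reports which path this particular build took. */
const char *add_arrays_path(void) {
#if defined(__AVX__)
    return "avx";
#else
    return "scalar";
#endif
}
```

Compiling the same file with and without -march=native and printing add_arrays_path() shows the catch: the choice is baked into the binary, so the AVX build faults with an illegal instruction on a CPU without AVX.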
But in the general case, the common solution is to compile separate versions of the performance-critical code for the different target CPU levels and pick one at runtime based on CPU detection.
Video coding (e.g., FFmpeg) is one illustration of this approach, and I don't know of any reasonable portable alternative to it.
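A minimal sketch of that runtime-detection pattern, assuming GCC or Clang on x86: __builtin_cpu_supports and the target attribute are compiler builtins rather than standard C, and sum_floats, sum_scalar and sum_avx are hypothetical names. FFmpeg applies the same idea at much larger scale, with hand-written assembly behind function-pointer tables.

```c
#include <stddef.h>

/* Portable fallback that works on any CPU. */
static float sum_scalar(const float *xs, size_t n) {
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += xs[i];
    return total;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1

/* target("avx") compiles only this function for AVX, even though the rest
   of the file may be built for baseline x86-64. It must only be called
   after a runtime check confirms the CPU supports AVX. */
__attribute__((target("avx")))
static float sum_avx(const float *xs, size_t n) {
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(xs + i));
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float total = 0.0f;
    for (int k = 0; k < 8; k++)   /* horizontal sum of the 8 lanes */
        total += lanes[k];
    for (; i < n; i++)            /* leftover elements */
        total += xs[i];
    return total;
}
#endif

typedef float (*sum_fn)(const float *, size_t);

/* Choose an implementation based on the CPU the program is running on. */
static sum_fn resolve_sum(void) {
#if defined(HAVE_X86_DISPATCH)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))
        return sum_avx;
#endif
    return sum_scalar;
}

/* Public entry point: resolves once, then calls through the pointer.
   Assumes the first call is not made from several threads at once; real
   code would use pthread_once, a constructor, or an atomic pointer. */
float sum_floats(const float *xs, size_t n) {
    static sum_fn impl = NULL;
    if (impl == NULL)
        impl = resolve_sum();
    return impl(xs, n);
}
```

The binary itself can be built for plain x86-64; only sum_avx contains AVX instructions, and it is never reached on a CPU that lacks them.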
GCC has the target_clones function attribute, which lets you specify a list of architecture variants: GCC emits a version of the function compiled for each one and automatically generates the code that picks the right one at runtime. This does hinder portability outside of the GNU binutils/glibc world, since it relies on ifunc support, but is otherwise a pretty clean solution.
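A sketch of how that looks, assuming GCC on x86-64 Linux with glibc; the preprocessor guard and the add_arrays name are illustrative, not part of any library. The function body is ordinary C and the compiler does the vectorizing per clone.

```c
#include <stddef.h>
#include <stdio.h> /* on glibc this pulls in <features.h>, defining __GLIBC__ */

#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__) && defined(__GLIBC__)
/* GCC builds one clone per listed target plus a resolver; through an ifunc,
   the symbol is bound to the best clone for the running CPU. */
#define MULTIVERSIONED __attribute__((target_clones("avx2", "default")))
#else
/* Elsewhere, fall back to a single ordinary baseline build. */
#define MULTIVERSIONED
#endif

/* Plain C: with optimization enabled (e.g. -O3) GCC may auto-vectorize each
   clone for its own target, using 256-bit vectors in the "avx2" clone. */
MULTIVERSIONED
void add_arrays(float *restrict dst, const float *restrict a,
                const float *restrict b, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Callers just call add_arrays(dst, a, b, n); the dispatch is invisible to them, which is what makes this cleaner than maintaining the function-pointer plumbing by hand.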