> I always cringe when developers compare compiler-optimized C to 'hand rolled' ...

wtallis · on Dec 11, 2016

GCC has the target_clones function attribute that lets you specify a list of different architecture variants, and GCC will emit a version of the function compiled for each, and automatically generate the code to pick the right one at runtime. This does hinder portability outside of the GNU binutils/glibc world, but is otherwise a pretty clean solution.
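A minimal sketch of what that looks like. It assumes x86-64 Linux with glibc and GCC 6 or later, because the runtime dispatch relies on ifunc support. The function name `sum_array` and the particular target list are illustrative choices, not anything from the comment above:

```c
#include <assert.h>
#include <stddef.h>

/* GCC compiles one clone of this function per listed target and emits an
   ifunc resolver that selects a clone based on the CPU it runs on.
   "default" is the required fallback clone. (Hypothetical example
   function; assumes x86-64 + glibc + GCC >= 6.) */
__attribute__((target_clones("avx2", "sse4.2", "default")))
long sum_array(const int *xs, size_t n)
{
    assert(xs != NULL || n == 0);

    long total = 0;
    /* A plain loop that the compiler may auto-vectorize differently
       in each clone. */
    for (size_t i = 0; i < n; i++)
        total += xs[i];
    return total;
}
```

Callers just call `sum_array` as usual. The resolver runs once, when the symbol is bound, and subsequent calls go straight to the selected clone. On targets without ifunc (musl, macOS), GCC rejects the attribute, which is the portability cost mentioned above.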

gajjanag · on Dec 12, 2016

Thanks a lot for this pointer.

I just noticed there is another HN discussion about recent developments on this front in GCC: https://news.ycombinator.com/item?id=13153320.

I do wish there were a portable solution, though (at least across *nix platforms with Clang or GCC).