gcc "solves" this problem for its own inline functions by tagging them as static. But what about template functions? And what about functions defined by developers? I don't think gcc claims to solve those.
There's an easy way to do this for .NET programs: some Windows .NET programs (Paint.NET, etc.) include an "optimization" step at the end of their installers that runs ngen on their binaries. The ngen'd binary is free to use CPU-specific instructions.
I don't think this scales. I work on text search, and some of my algorithms use at most SSSE3, some use at most SSE4.2, and some use at most AVX2. Should I then ship 4 different versions for every platform I support? What are my instructions to users on the download page?
It's worse than what you have now, of course, but the situation is not new with AVX.
>What are my instructions to users on the download page?
If the differentiation is at the DLL level (SSE3 implementations in one DLL, SSE4 implementations in another, all providing a common interface), your application loads the matching one at runtime based on feature detection, and your users never notice.
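A minimal sketch of the "common interface, pick one at runtime" idea, assuming hypothetical names (`struct search_impl`, `select_impl`, the `FEAT_*` bits). In the DLL scheme each table entry would come from its own DLL (e.g. via LoadLibrary/GetProcAddress on Windows); here everything lives in one file, every entry points at the same scalar stand-in, and the feature mask is passed in rather than read from CPUID so the selection logic is easy to check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature bits; a real program would fill these from CPUID. */
enum { FEAT_SSSE3 = 1u << 0, FEAT_SSE42 = 1u << 1, FEAT_AVX2 = 1u << 2 };

/* The common interface every implementation (every DLL) would provide. */
struct search_impl {
    const char *name;
    unsigned required; /* feature bits this implementation needs */
    size_t (*count_byte)(const unsigned char *buf, size_t len, unsigned char b);
};

/* Scalar stand-in. In the DLL scheme, each variant would be compiled with
   its own target flags and shipped in its own DLL. */
static size_t count_byte_scalar(const unsigned char *buf, size_t len, unsigned char b) {
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        n += (buf[i] == b);
    return n;
}

/* Ordered from most to least demanding; the baseline requires nothing. */
static const struct search_impl impls[] = {
    { "avx2",   FEAT_AVX2,  count_byte_scalar },
    { "sse4.2", FEAT_SSE42, count_byte_scalar },
    { "ssse3",  FEAT_SSSE3, count_byte_scalar },
    { "base",   0,          count_byte_scalar },
};

/* Return the first implementation whose required features are all present. */
static const struct search_impl *select_impl(unsigned features) {
    for (size_t i = 0; i < sizeof impls / sizeof impls[0]; i++)
        if ((impls[i].required & features) == impls[i].required)
            return &impls[i];
    assert(0 && "the baseline entry always matches");
    return NULL;
}
```

The application would call `select_impl` once at startup and route every call through the returned pointer, so the user never has to pick a build.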
If the entire binary or program package needs to be differentiated (say, ripgrep, where there's only one statically linked binary to speak of), your Windows installer picks the right binary to install, and on Linux you have multiple packages plus a metadata package that depends on them and picks the one with the matching architecture. The former is definitely done, and I believe I've seen instances of the latter in my package manager's listings.
As DannyBee mentions in the sibling comment, this issue is compounded with AVX-512. You really need runtime detection, and doing that means including multiple versions of the same function in the same binary. It looks like you are at the mercy of your compiler for how well this works, but it seems like gcc and clang do this just fine?
Sure, I didn't say that it was impossible in general. The context of my comments was that MSVC doesn't appear to support automatic target-architecture-based dispatch and thus requires such workarounds.
For C, in fact, you get "whatever", and that's even legal.
For C++, the problem is that templates etc. are linkonce, and you can't choose which copy the linker is going to pick (i.e., the problem described in the article).
You can also get screwed by inlining, etc.
Function multi-versioning alone does not solve this well; you really need a way to make the ABI part of the template arguments, or something like that, to do it well.
There are other issues.
How do you think that compilers and linkers should handle the problems of non-inlined inline functions in the face of /arch:AVX? Saying that it should not even be a hard thing doesn't help much.
Literally the only two things it takes to do this are a conditional check based on the CPUID result and codegen support for generating multiple arch paths. If you have those two, function multi-versioning is very likely to work safely.
Edit: oh, you are the author. I hope you weren't offended. I meant none.
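The two ingredients named above (a CPUID-based check plus codegen for more than one arch path) can be sketched with GCC/Clang extensions: `__attribute__((target("avx2")))` lets one function be compiled for AVX2 while the rest of the file targets the baseline, and `__builtin_cpu_supports("avx2")` performs the runtime check. This assumes an x86 target and a GCC-compatible compiler; elsewhere the sketch falls back to the baseline path only. `sum_i32` and its helpers are hypothetical names. GCC's `target_clones` attribute can automate this pattern, but the manual version shows the mechanism.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline path: compiled for whatever the translation unit targets. */
static long long sum_i32_base(const int *v, size_t n) {
    long long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Same source, but the compiler may emit AVX2 instructions here
   (e.g. when auto-vectorizing), even if the rest of the file does not. */
__attribute__((target("avx2")))
static long long sum_i32_avx2(const int *v, size_t n) {
    long long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}
#endif

/* Dispatcher: take the AVX2 path only if the runtime check says the CPU
   supports it; otherwise use the baseline path. */
long long sum_i32(const int *v, size_t n) {
    assert(v != NULL || n == 0);
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_i32_avx2(v, n);
#endif
    return sum_i32_base(v, n);
}
```

Both paths compute the same result, so callers cannot tell which one ran; caching the check in a function pointer would avoid repeating it on every call.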
The C++ language says that the linker is free to choose any instantiation of an inline function because they are required to be identical. Failure to make them identical is an ODR violation, and the ODR implicitly says that the linker can grab whichever one it wants.
It's also not clear what sort of differences to report. Let's say that you have two translation units, one compiled with /O1 and the other with /O2. They would probably generate different versions of floor, but this is not an ODR violation. So, how is the linker supposed to detect when a difference in generated code is fatal and when it is benign?
The C++ language does not require reporting of ODR violations because such reporting is expensive and problematic. What sort of ODR reporting do the gcc and clang toolchains do?
With the inline semantics defined by C99 there are two solutions to this problem:
1. Use static inline. Any non-inlined calls will result in an instance of the function with internal linkage. Aside from wasting space with multiple copies of the same function, it is harmless. As mentioned, an optimising linker might even merge identical functions across compilation units.
2. Use inline without explicit extern. Non-inlined calls result in an undefined external reference to be resolved by the linker with a definition provided elsewhere. The inline definition is only used when the function is actually inlined and never provides a non-inline definition of the function, internal or external.
Neither of these approaches results in duplicate definitions with external linkage, so the described problem cannot arise.
It's disappointing that Microsoft seem to have chosen the one approach to inline functions that results in this sort of breakage.
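The two C99 approaches from the list above can be sketched in one file, assuming C99-or-later inline semantics (not gcc's `-fgnu89-inline` mode). `clamp_static` and `clamp_inline` are hypothetical names; in a real project the `inline` definition would sit in a header, and the `extern inline` declaration would appear in exactly one .c file.

```c
#include <assert.h>

/* Approach 1: static inline. Any TU that needs an out-of-line copy gets
   its own copy with internal linkage, so the linker never sees a clash. */
static inline int clamp_static(int v, int lo, int hi) {
    assert(lo <= hi);
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Approach 2: plain inline (normally in a header). Under C99 rules this is
   only an inline definition: on its own it never emits an external symbol. */
inline int clamp_inline(int v, int lo, int hi) {
    assert(lo <= hi);
    return v < lo ? lo : (v > hi ? hi : v);
}

/* In exactly one .c file: this declaration makes the definition above the
   single external definition that non-inlined calls from all TUs resolve to. */
extern inline int clamp_inline(int v, int lo, int hi);
```

Without the `extern inline` line, a non-inlined call to `clamp_inline` (for example at -O0, or through a function pointer) would leave an undefined reference for the linker, which is exactly the behaviour described in item 2.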
No, you are incorrect. The function in question was tagged as inline. Having multiple instances of it is legal. If a Unix linker gave an error on multiple copies of an inline function then that would be a serious bug that would prevent them from linking any non-trivial C++ program.
For inline functions it is only an ODR violation if the two instances are different in an ODR-relevant way. The language spec doesn't discuss compiler switches, but it is logically clear that /arch:AVX versus no /arch:AVX is an ODR violation, whereas /O1 versus /O2 is not. Therefore, even having two different instances of a non-inlined inline function is not necessarily illegal, and a conforming linker must accept it.
> Use inline without explicit extern.
I'm not sure why you think that this is a solution. An explicit 'extern' is not needed when defining an inline function. If it is not inlined then the compiler generates a copy that can be referenced. I have worked with gcc, clang, and VC++ and I have never had to explicitly mark an inline function as extern.
You are correct that static inline solves the problem, although I think that this is an ugly solution. And it is a standard library solution rather than a toolchain solution, so it doesn't automatically help developers who write their own inline functions.
The best suggestion I heard (on Twitter) was name mangling for non-inlined inline functions that adds an architecture suffix, making it impossible to accidentally call the wrong architecture's function. Much cleaner and more efficient than static inline.
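The suffix idea would have to be implemented by the compiler to cover inline functions automatically, but it can be emulated by hand for your own functions with the preprocessor. The sketch below is an illustration under that assumption, not an existing toolchain feature: `ARCH_NAME`, `ARCH_SUFFIX` and `my_floor` are hypothetical names, and it relies on the compiler predefining `__AVX__`/`__AVX2__` when AVX codegen is enabled (GCC/Clang `-mavx`/`-mavx2`, MSVC `/arch:AVX`/`/arch:AVX2`).

```c
#include <assert.h>

/* Pick a suffix from the target macros predefined for this translation unit. */
#if defined(__AVX2__)
#  define ARCH_SUFFIX _avx2
#elif defined(__AVX__)
#  define ARCH_SUFFIX _avx
#else
#  define ARCH_SUFFIX _base
#endif

/* Extra level of indirection so ARCH_SUFFIX is expanded before ## applies. */
#define ARCH_PASTE2(a, b) a##b
#define ARCH_PASTE(a, b) ARCH_PASTE2(a, b)
#define ARCH_NAME(name) ARCH_PASTE(name, ARCH_SUFFIX)

/* Defines my_floor_base, my_floor_avx or my_floor_avx2 depending on flags,
   so an AVX-compiled copy can never satisfy a call from a non-AVX TU. */
double ARCH_NAME(my_floor)(double x) {
    /* The cast below is only defined for values that fit in long long. */
    assert(x > -9.0e18 && x < 9.0e18);
    double t = (double)(long long)x; /* truncates toward zero */
    return (t > x) ? t - 1.0 : t;    /* step down for negative non-integers */
}
```

Callers write `ARCH_NAME(my_floor)(x)`, so a mismatch between translation units shows up as an unresolved symbol at link time instead of a silent substitution.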
It is also true that Unix linkers give an error if the final set of object files to link contain more than one external definition of the same symbol. The linker can't tell if functions were marked inline in the source code or not. A definition is a definition. The Microsoft object file format might record this information. If it does, it would appear to be a misguided decision that opens the door for the kind of breakage you're complaining about.