
VC++ /arch:AVX option – unsafe at any speed - jsnell
https://randomascii.wordpress.com/2016/12/05/vc-archavx-option-unsafe-at-any-speed/
======
nly
Meanwhile this problem has been solved in GCC for years

[https://gcc.gnu.org/wiki/FunctionMultiVersioning](https://gcc.gnu.org/wiki/FunctionMultiVersioning)

~~~
brucedawson
Isn't dynamic dispatch a separate problem? It's great that gcc solves dynamic
dispatch, but this article was about the problem of functions that are not
supposed to be dynamically dispatched, but are called from dynamically
dispatched functions - the noninlined inline function problem.

gcc "solves" this problem for its own inline functions by tagging them as
static. But what about template functions? And what about functions defined by
developers? I don't think gcc claims to solve those.

------
Arnavion
The reasons given for not doing it for Chrome are sound, but I agree with the
author that the solution for most software should be to ship separate AVX-
enabled and non-AVX-enabled binaries (whether an individual DLL or the entire
program).

There's an easy way to do this for .Net programs - some Windows .Net programs
(Paint.NET, etc) contain an "optimization" step at the end of their installers
that ngens their binaries. The ngen'd binary is free to use CPU-specific
instructions.

~~~
burntsushi
> but I agree with the author that the solution for most software should be to
> ship separate AVX-enabled and non-AVX-enabled binaries

I don't think this scales. I work on text search, and there are distinct
algorithms: some use at most SSSE3, some use at most SSE 4.2, and some use at
most AVX2. Should I then ship 4 different versions for every platform I
support? What are my instructions to users on the download page?

~~~
Arnavion
>Should I then ship 4 different versions for every platform I support?

It's worse than what you have now, of course, but the situation is not new
with AVX.

>What are my instructions to users on the download page?

If the differentiation is at the DLL level (SSE3 implementations in their own
DLL, SSE4 implementations in their own DLL, all providing a common interface),
your application would load the corresponding one at runtime based on feature
detection and your user never notices.

If the entire binary / program package needs to be differentiated (say ripgrep
where there's only one statically linked binary to speak of), your Windows
installer picks the right binary to install, and you have multiple Linux
packages and a metadata package that depends on them and picks the one with
the matching architecture. The former is definitely done, and I believe I've
seen instances of the latter in my package manager's listings.

~~~
burntsushi
But I don't think you actually need to do that. You can compile individual
functions with different target specific optimizations using (e.g.) LLVM's
target_feature attribute. Both gcc and clang expose this type of
functionality.

As DannyBee mentions in the sibling, this issue is compounded in AVX512. You
really need to support runtime detection, and doing that means including
multiple versions of the same function in the same binary. It looks like you
are at the mercy of your compiler for how well this works, but it seems like
gcc/clang do this just fine?
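
A minimal sketch of the per-function approach described above, assuming gcc or
clang on x86 with the `target` function attribute and the
`__builtin_cpu_supports` builtin. The names `sum_ints`, `sum_baseline`,
`sum_avx2`, and `choose_sum` are hypothetical and only illustrate the idea; on
other compilers or architectures the sketch falls back to the baseline version.
Note that the AVX2 version here does not call any shared inline functions,
which is the hazard the article is about.

```cpp
#include <cstddef>

// Assumption: the target attribute and CPU builtins are only used on
// gcc/clang when compiling for x86.
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_TARGET_ATTR 1
#else
#define HAVE_X86_TARGET_ATTR 0
#endif

// Baseline version: built with the translation unit's default flags.
static long sum_baseline(const int* data, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

#if HAVE_X86_TARGET_ATTR
// Same source, but the compiler is allowed to emit AVX2 instructions
// for this one function only.
__attribute__((target("avx2")))
static long sum_avx2(const int* data, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}
#endif

using SumFn = long (*)(const int*, std::size_t);

// Choose an implementation based on what the running CPU reports.
static SumFn choose_sum() {
#if HAVE_X86_TARGET_ATTR
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return sum_avx2;
#endif
    return sum_baseline;
}

// Public entry point: the choice is made once, on first call.
long sum_ints(const int* data, std::size_t n) {
    static const SumFn impl = choose_sum();
    return impl(data, n);
}
```

Callers only ever see `sum_ints`, so both code paths live in one binary and the
user never has to pick a download.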

~~~
Arnavion
>Both gcc and clang expose this type of functionality.

Sure, I didn't say that it was impossible in general. The context of my
comments was that MSVC doesn't appear to support automatic target-
architecture-based dispatch and thus requires such workarounds.

~~~
burntsushi
Ah! Gotcha. Understood. Sorry for the mixup!

------
alfalfasprout
CPU dispatch like this is pretty hard to get right. For now, I've only seen
Intel's ICC/ICPC get it right. Even then, if you have an AMD CPU you're out of
luck.

~~~
flamedoge
This... should not even be a hard thing.

~~~
brucedawson
Dynamic dispatch is not a hard thing. And in fact the article treats dynamic
dispatch as being too simple to be worthy of discussion. The article is about
the subtle problems that can happen after you have dynamically dispatched.

How do you think that compilers and linkers should handle the problems of non-
inlined inline functions in the face of /arch:AVX? Saying that it should not
even be a hard thing doesn't help much.

~~~
flamedoge
The way the author is using it to generate both AVX and non-AVX code is such a
hack that I find it hard to believe it even worked. It is literally a ticking
time bomb that depends on the compiler's inliner heuristics to trigger.
Compilers and linkers should be able to generate and optimize different code
paths -- it shouldn't take hacking and tinkering with tools to do something.
To make it sound worse, that something isn't even guaranteed to be safe, even
with a hacky build.

Literally the only two things it takes to do this are a conditional check
based on the CPUID result and CodeGen support for generating multiple Arch
paths. If you have those two, a multi-versioned function is very likely to
work safely.

Edit: oh, you are the author. I hope you weren't offended. I meant none.

------
DannyBee
Yeah, VC needs function multiversioning like everyone else. Sadly, it doesn't
help if you want to do something like have templated C++ functions.

------
ComputerGuru
This was a solved problem going back at least 10 years. At $oldjob, we would
simply compile an entire section of the project as a standalone DLL once with AVX
(it was SSEx something back then) and once without, then dynamically load the
entire module - not call a single function.

------
nitrogen
Is linker ordering actually undefined as the author suggests? I thought that
linkers had a well-defined process for deciding what symbols to take from
which object.

~~~
brucedawson
I'm the author but I'll chime in and say that yes, linker ordering is
undefined (or at least only partially defined) in general, and definitely in
this case.

The C++ language says that the linker is free to choose any instantiation of
an inline function because they are required to be identical. Failure to make
them identical is an ODR violation, and the ODR implicitly says that the
linker can grab whichever one it wants.

------
mansr
I'm surprised this doesn't cause a multiple definitions error. Is the msvc
linker really that broken?

~~~
brucedawson
The linker could check all .obj files for definitions of the function and see
if they match. However this would slow down linking, especially if the linker
peeks into .obj files that it otherwise wouldn't examine.

It's also not clear what sort of differences to report. Let's say that you
have two translation units, one compiled /O1 and the other compiled /O2. They
would probably generate different versions of floor, but this is not an ODR
violation. So, how is the linker supposed to detect when a difference in
generated code is fatal and when it is benign?

The C++ language does not require reporting of ODR violations because such
reporting is expensive and problematic. What sort of ODR reporting do the gcc
and clang toolchains do?

~~~
mansr
Unix linkers give an error if any symbol is defined in multiple object files,
different or not. After all the object files have been combined, remaining
undefined symbols are resolved by searching the specified libraries in command
line order. Each object from a library that provides a needed definition is
selected and its undefined symbols are added to the global list. Some linkers
will scan the libraries a second time in order to resolve interdependencies.
If at the end of this, the combined set of object files (from command line and
from libraries) contain more than one definition for any symbol, an error is
raised. The MSVC linker is apparently much more lax in this regard.

With the inline semantics defined by C99 there are two solutions to this
problem:

1\. Use static inline. Any non-inlined calls will result in an instance of the
function with internal linkage. Aside from wasting space with multiple copies
of the same function it is harmless. As mentioned, an optimising linker might
even merge identical functions across compilation units.

2\. Use inline without explicit extern. Non-inlined calls result in an
undefined external reference to be resolved by the linker with a definition
provided elsewhere. The inline definition is only used when the function is
actually inlined and never provides a non-inline definition of the function,
internal or external.

Neither of these approaches results in duplicate definitions with external
linkage, so the described problem cannot arise.

It's disappointing that Microsoft seem to have chosen the one approach to
inline functions that results in this sort of breakage.
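
A rough sketch of option 1, written in C++ rather than C99 (an assumption made
here for illustration; in C++ `static` likewise gives the function internal
linkage). The names `round_down` and `floor_of` are hypothetical. Every
translation unit that includes a definition like this gets its own private
copy, so a non-inlined instance compiled with different flags in another
translation unit can never be substituted by the linker.

```cpp
// Imagine this definition living in a shared header.
// 'static' gives it internal linkage: each translation unit that includes it
// emits its own private copy if the call is not inlined.
static inline int round_down(double x) {
    int i = static_cast<int>(x);   // truncates toward zero
    return (x < i) ? i - 1 : i;    // adjust for negative non-integers
}

// An ordinary external function that uses the helper.
int floor_of(double x) {
    return round_down(x);
}
```

The cost is the one already mentioned: one copy per translation unit unless
the linker folds identical functions.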

~~~
brucedawson
> The MSVC linker is apparently much more lax in this regard.

No, you are incorrect. The function in question was tagged as inline. Having
multiple instances of it is legal. If a Unix linker gave an error on multiple
copies of an inline function then that would be a serious bug that would
prevent them from linking any non-trivial C++ program.

For inline functions it is only an ODR violation if the two instances are
different in an ODR relevant way. The language spec doesn't discuss compiler
switches but it is logically clear that /arch:AVX versus not is an ODR
violation, whereas /O1 versus /O2 is not. So therefore, even having two
_different_ instances of a non-inlined inline function is not necessarily
illegal, and _must_ be accepted by a conforming linker.

> Use inline without explicit extern.

I'm not sure why you think that this is a solution. An explicit 'extern' is
not needed when defining an inline function. If it is not inlined then the
compiler generates a copy that can be referenced. I have worked with gcc,
clang, and VC++ and I have never had to explicitly mark an inline function as
extern.

You are correct that static inline solves the problem, although I think that
this is an ugly solution. And, it is a standard library solution, rather than
a toolchain solution, so it doesn't automatically help developers who write
their own inline functions.

The best suggestion I heard (on twitter) was name mangling (for non-inlined
inline functions) that added an architecture suffix, thus making accidentally
calling the wrong architecture function impossible. Much cleaner and more
efficient than static inline.
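
The architecture-suffix idea is a toolchain suggestion, not something
compilers do today, but it can be roughly emulated at the source level. The
sketch below is one such emulation, assuming the compiler defines `__AVX__`
when AVX code generation is enabled (gcc and clang do with `-mavx`, and VC++
does with /arch:AVX). The names `mathutil`, `ARCH_NS`, `floor_to_int`, and
`arch_tag` are hypothetical. Because the inline namespace is part of the
mangled name, an AVX translation unit and a non-AVX translation unit emit
differently named symbols, so the linker cannot merge them.

```cpp
// Pick a namespace name based on the code generation flags of this
// translation unit.
#if defined(__AVX__)
#define ARCH_NS arch_avx
#else
#define ARCH_NS arch_base
#endif

// Two-level stringification so ARCH_NS is expanded before being quoted.
#define ARCH_STR2(x) #x
#define ARCH_STR(x) ARCH_STR2(x)

namespace mathutil {
// 'inline namespace' lets callers write mathutil::floor_to_int while the
// symbol itself is mathutil::arch_avx::floor_to_int or
// mathutil::arch_base::floor_to_int.
inline namespace ARCH_NS {

inline int floor_to_int(double x) {
    int i = static_cast<int>(x);   // truncates toward zero
    return (x < i) ? i - 1 : i;    // adjust for negative non-integers
}

// Reports which variant this translation unit was compiled against.
inline const char* arch_tag() {
    return ARCH_STR(ARCH_NS);
}

}  // inline namespace ARCH_NS
}  // namespace mathutil
```

Unlike static inline, identical copies within the same architecture can still
be merged by the linker, which is the efficiency argument made above.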

~~~
mansr
What I said about inline is true of C99. C++ might be different though.

It is also true that Unix linkers give an error if the final set of object
files to link contain more than one external definition of the same symbol.
The linker can't tell if functions were marked inline in the source code or
not. A definition is a definition. The Microsoft object file format might
record this information. If it does, it would appear to be a misguided
decision that opens the door for the kind of breakage you're complaining
about.

