
Glibc sysdeps: dl_platform detection effectively performs “cripple AMD” - arthur2e5
https://sourceware.org/bugzilla/show_bug.cgi?id=24979
======
AnssiH
The point is to put Haswell-optimized code in the "haswell" subdirectory.
However, compiling code with -march=haswell (implying -mtune=haswell) does a
lot of other platform/manufacturer-specific optimizations as well (not just
usage of the various extensions), like favoring instructions X that perform
slightly better over other instructions Y, even if instructions X would be
significantly slower on AMD processors.

So it does not seem 100% certain, at least without testing, that using
"haswell" on AMD would be beneficial (it depends on whether the speed-up from
the extra extensions offsets a possible speed loss due to the other
Intel-specific optimizations). And Intel was probably not going to do such
testing.

But my guess is that using the "haswell" libraries on AMD will be beneficial,
and the check will probably be loosened accordingly in the future.

Note, though, that the feature checks currently performed do not seem
exhaustive, so I don't think one can simply remove the Intel check as-is.
Haswell has a lot more extensions than the ones currently checked.

~~~
iforgotpassword
I'm not sure it is even worth it to distinguish by CPU model. My hunch is that
the optimizations the compiler does for a specific model are vastly
overshadowed by the gains of having certain extensions available at all. So
glibc should only distinguish by AVX-512, SSE4.2, etc. Usually a CPU that has
a newer extension available also has all of the older ones; the question is
how many exceptions would arise, and what the speed penalty would be of
falling back to the next slower implementation.

~~~
jandrewrogers
The implementations of extensions on different CPU microarchitectures can vary
quite a bit from the same vendor. And performance of those extensions can vary
by an order of magnitude between CPU vendors, sometimes because vendors will
initially add CPU extensions as microcode if they did not design the extension
themselves. AMD has microcoded many Intel extensions in some of their
architectures for binary compatibility but without the performance benefits,
and many of their native implementations behave differently in material ways
for optimization purposes. The existence of a feature does not imply that you
should use it in highly optimized code (it may be slower than a software
version of the same thing), or that you should use the feature in the same
way to optimize code.

It is quite complicated. The AMD and Intel microarchitectures are different
enough in design that some low-level optimizations around many extensions
really don't translate well between them. The differences can be big enough
that you have to take them into account at the C++ level too, writing
target-specific code.

Individual CPU vendors are in the best position to provide software
optimization for specific implementations of their microarchitectures.
Unfortunately, AMD invests relatively little in this and relies on the open
source community to fill in these gaps whereas Intel is excellent at providing
first-party software optimization support for their CPUs.

~~~
temac
Although you probably overestimate the impact of the uarch differences
between Haswell and Zen a bit, there is some truth in theory to what you
said. In practice, Zen is quite close to Skylake in terms of general uarch
principles and main figures, and Zen 2 even more so (or even better). IIRC,
on Zen 1 AVX2 should yield no gain compared to AVX, though.

glibc is not Intel's project and is expected to have a _minimum_ amount of
neutrality in how it is maintained -- that _also_ includes how patches are
accepted and which modifications are requested before they are integrated.

The dispatching is probably there to implement things like memset & memcpy,
which are easy to benchmark, and it is probable that the haswell version will
at least be better than whatever is used right now on an AMD Zen (and _even
more_ probable for Zen 2). Optimizing further can come later, if anybody
wants to do it.

Therefore, I think this ticket is justified (but I also think it is not a
drama that it was not taken care of sooner).

------
krapht
Hm, is this really "crippling" AMD? Seems more like Intel submitted a
performance patch that is only enabled for Intel processors, but could be
extended to support AMD too.

There's a moral difference. It is wrong to intentionally degrade the
performance of your competitors. It is not wrong to not do something that
benefits others.

~~~
Traster
The point is that the correct way of doing this is to check for the feature,
not for the vendor. It's perfectly legitimate for Intel to submit code that
helps Intel CPUs, but glibc shouldn't be accepting code that unnecessarily
favours one CPU vendor. The correct version of this code would just check for
the feature instead, so that if AMD does support it, it just works.

~~~
bin0
It might just be the easiest way to do it. It's also possible that Intel
didn't have access to samples of enough AMD stuff to test, or the wrong
department had it, or they couldn't get approval to take the time it would
require. The GCC guys almost certainly lacked the hardware, time, and
inclination to do this instead. It's not malice, I don't think, so much as
practicality. I'm sure they'd accept an AMD patch enabling this where
available.

~~~
justinjlynn
> Intel didn't have access to samples of enough AMD stuff, or the wrong
> department had it, or they couldn't get approval to take the time it would
> require

I seriously doubt that.

~~~
big_chungus
You'd be surprised how bad communication can be in a big company the size of
Intel. I'm sure they have plenty of samples of AMD hardware, but I could
believe they're being used mostly by some CPU-testing division rather than
the open-source one.

~~~
justinjlynn
I wouldn't be surprised that communications were bad within Intel. I also
wouldn't be surprised if there was zero political will to make sure their
contribution worked properly on AMD processors. Combine the two and you get
whatever plausible deniability you want to use to deflect the argument, I
suppose.

------
cotillion
While this patch from @intel.com looks bad, I wonder what AMD has been doing
in the meantime? At least Intel appears to have someone looking at glibc
performance.

~~~
PedroBatista
Intel’s software division is bigger than the whole of AMD.

------
hunta2097
Could this be a legitimate unintended consequence of the pull request, or
some new dirty-pool tactic?

Either way, I agree with Mingye Wang's assessment: this kind of thing cannot
be allowed to get into the source tree.

Hopefully AMD will increase their Linux activities with their new bigger
market share and income.

------
panpanna
Wait, where in glibc is SIMD used anyway?

~~~
bonzini
All string functions: not surprising since recent processors have the crazy
PCMPxSTRy instructions that are basically a hardware implementation of strcmp,
strchr, memchr, strspn etc.

memcpy and memset use SSE on some generations, but these days are best inlined
by the compiler as "rep movsb/stosb".

~~~
stingraycharles
I'd better hope AVX-512 isn't used on simple string functions, as it causes
so much heat that Intel CPUs have to reduce their clock speed significantly,
affecting other running processes as well.

[https://www.tcm.phy.cam.ac.uk/~mjr/IT/clocks.html](https://www.tcm.phy.cam.ac.uk/~mjr/IT/clocks.html)

~~~
physicsguy
There’s nothing wrong with dropping the clock speed when the whole point of
SIMD instructions is that they execute on multiple data. As long as the
multiplicative speedup from doing 4x/8x/whatever operations in parallel
outweighs the clock speed drop, it’s fine

~~~
stingraycharles
My point is that the clock speed is also dropped for other processes. So it's
not a simple heuristic like "oh, if we have more data than X we can use
AVX-512 instead of AVX2", because you do not know what other processes are
doing, and they will be slowed down as well.

This type of heuristic only works if you are the sole user of the server,
e.g. on a dedicated database server.

------
dogma1138
While it might appear scammy, AMD lacks some of these instructions on its
pre-Ryzen CPUs, as the coverage of MOVBE and the bit-manipulation
instructions is quite inconsistent.

TBH the case for Intel isn’t that much different: they have post-Haswell
CPUs (mainly Atoms) that don’t support all of the instructions in this patch,
which would fail the master CPUID check and not receive the optimizations
either.

If someone wants to split it into checking 30 specific CPUID flags and then
figuring out which libraries can be linked, because many of them would
require some or all of these instructions, by all means do that...

------
gameswithgo
A relevant point here may be that with Zen 2, just released and very popular,
AMD supports AVX/AVX2 properly for the first time. Previously, AMD CPUs could
execute AVX instructions, but it wasn't actually any faster, so there was not
much point in detecting that properly anyway. Now there is.

~~~
adrian_b
No, already the first Zen in 2017 supported AVX2 perfectly. It just had half
the throughput of Haswell/Broadwell/Skylake. Nevertheless, using 256-bit AVX
instructions was still slightly faster in most cases than using 128-bit AVX
or SSE, because fewer instructions had to be fetched and decoded.

When Zen was first released in 2017, using -march=haswell in gcc produced
faster programs than using -march=znver1 or -march=bdverX.

Using AVX2 + FMA and all the other instructions introduced by Haswell &
Broadwell was always the right thing to do when compiling for AMD Zen. Nothing
has changed with Zen 2, except that the throughput of the AVX programs that
are not limited by the memory throughput has doubled now.

------
rgbrenner
Am I missing something? Intel submitted a patch in 2017 that enabled support
for a feature that was supported in their processors. 2 years later, AMD still
hasn't made processors that support one of those features (AVX512), and only
recently released processors that met the requirements for the other haswell
test... And somehow this is labeled "cripple AMD"?

It would be one thing if Intel had done this while AMD had processors that
this worked on... but they didn't. And code cannot tell the future. That's
why we have programmers. They're supposed to change the code when it needs to
be changed.

~~~
adrian_b
"only recently released processors that met the requirements for the other
haswell test"

This is false. Zen was introduced in Q1 2017, before that patch was written,
and it already implemented all the useful parts of the Skylake instruction
set (and also the SHA extension, which Skylake does not implement). The fact
that the CPU vendor should not be used when testing for CPU features was
already discussed a lot, many years ago, so one of the library maintainers
should have noticed this bug and removed the inappropriate test.

As for AMD not yet supporting AVX-512, that is far less annoying than the
fact that Intel does not yet support AVX-512 in any of their CPUs which are
actually competitive in their target market (Ice Lake is worse than Comet
Lake, Cascade Lake is worse than Epyc 2).

~~~
stochastic_monk
For me, skylake is useful because of AVX512{dq,bw}.

~~~
adrian_b
Skylake (i.e. Skylake, Kaby Lake and so on up to Comet Lake) does not support
any AVX-512 instructions.

Only Knights Landing, Knights Mill, Skylake Server, Cascade Lake, Cannon Lake
and Ice Lake support AVX-512.

What is called "Skylake Server" in the Intel documents, includes products sold
under various names, i.e. Xeon Scalable, Xeon W, Xeon D-2xxx and Skylake-X
(HEDT).

I assume that you were thinking about Skylake-X, but those processors are a
very small fraction of a percent compared with all the processors using the
Skylake microarchitecture sold from 2015 until now, so one should never use
"Skylake" to refer to "Skylake Server" a.k.a. "Skylake-X", because it is very
misleading.

~~~
stochastic_monk
I see, thank you. I’ve only used skylake as skx, which just reflects what
architectures I have access to.

------
kbumsik
Isn't it AMD's job, not GNU's?

> something that should not happen in any free software package.

Some people take "free software" pretty wrong. Free software doesn't mean
being fair to all hardware. It is only fair in accepting contributions from
people who want their machines to work with it. They could be individual
contributors, but they are mostly people from the hardware vendors.

That being said, Linux and free software have been supporting Intel very well
because Intel itself is one of the biggest contributors. It's time for AMD to
hire more software engineers to support better drivers and a better software
ecosystem.

~~~
simion314
The code is bad: it should detect features, not vendor names. The proper fix
should be CC-ed to the Intel developer so he can learn the good way of doing
it.

This kind of bad code makes sites ask me to run a modern browser just because
my modern browser is not in the dev whitelist, and this kind of bad code is
why Microsoft skipped "Windows 9": some developers were clever and, instead
of checking for features or checking for Win95 and Win98 separately, wrote a
shorter line of code that checked for "Win9".

In conclusion there are 2 possibilities:

1. the Intel dev made a mistake

2. it was intentional, to limit the improvements to Intel only

If you are arguing that 2 is fine, then you should also be fine with Google,
Facebook and Microsoft putting checks in their commits and limiting features
to only when the code runs in their OS, browser or site. This kind of
obviously user-hostile commit is bad and has no place in free software, and
IMO the Intel compiler should not intentionally cripple a competitor.

Also, this attitude would mean that each company should now have a developer
dedicated to checking competitors' contributions to make sure no
intentionally bad commits are sneaked in, because in your opinion it is fair
to do this intentionally.

~~~
asveikau
> then you are fine if google, Facebooks, Microsoft also put checks in their
> commits and limit features only if the code runs in their OS,browser,site

In my experience some FB features (notes, maybe video) refuse to work in
Firefox on *BSD. If I tell it to lie about my UA and say that it's Linux,
everything works.

