
Intel's “cripple AMD” function (2019) - arto
https://www.agner.org/forum/viewtopic.php?f=1&t=6
======
segfaultbuserr
> _Never rely on benchmark tests unless the benchmarking code is known to be
> open source and compiled without using any Intel tools._

A serious question - how many common benchmark packages are compiled by ICC or
use Intel MKL? I hope the number is limited, otherwise all the benchmarks
published by mainstream PC reviewers are potentially biased. If there's a
serious ICC-in-benchmark problem, then only Phoronix's Linux benchmarks are
trustworthy - the majority of benchmarks on Phoronix use free and open source
compilers and test suites, with known versions, build parameters and
optimization levels. Thanks to Michael Larabel for his service to the community.

~~~
wtallis
This concern really only applies to synthetic benchmarks (stuff like SPEC
CPU). If you're testing a commercially available application or game as
delivered to consumers, this issue does not invalidate the benchmark, it just
makes the software vendor a bit of an Intel stooge.

~~~
formerly_proven
The go-to Intel-crushing benchmark these days is Cinebench R20, which runs on
Intel's Embree raytracer.

~~~
segfaultbuserr
So Cinebench R20 is a suspect?! That's not good...

------
ncmncm
Money quote: "... on an AMD computer then you may set the environment variable
MKL_DEBUG_CPU_TYPE=5."

When run on an AMD, any program built with Intel's compiler should have the
environment variable set. I don't think there is any downside to leaving it on
all the time, unless you are measuring how badly Intel has tried to cripple
your AMD performance.
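
For programs you build yourself, the same setting can be applied from inside the process. This is a minimal sketch, assuming the MKL version in use still honours `MKL_DEBUG_CPU_TYPE` and only reads it after this call has run (both are assumptions). `enable_mkl_debug_cpu_type` is a hypothetical helper name, not part of MKL.

```c
#include <stdlib.h>   /* setenv (POSIX) */

/* Hypothetical helper: export MKL_DEBUG_CPU_TYPE=5 for this process and
 * any children it spawns. Returns 0 on success, -1 on failure. */
int enable_mkl_debug_cpu_type(void)
{
    /* overwrite = 0: keep whatever value the user already exported */
    return setenv("MKL_DEBUG_CPU_TYPE", "5", 0);
}
```

Call it at the very top of `main`, before the first MKL routine. Passing `0` as the last argument means a value set in the shell still wins.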

~~~
g42gregory
My understanding is that that flag is gone, as of a couple of months ago. Intel
“fixed” it.

~~~
rasz
Why not patch out the CPUID check as a post compilation step?

~~~
danieldk
That's definitely possible (it probably checks that the manufacturer ID is
_GenuineIntel_ ), but nobody wants to distribute patched MKL versions, because
it most likely violates the MKL license.

It may even be easier to replace the function altogether with _LD_PRELOAD_.
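
The vendor check guessed at above can be sketched as follows. This is an illustration of how a "GenuineIntel" gate could be written, not MKL's actual code. `cpu_vendor` and `is_genuine_intel` are hypothetical names, and the sketch assumes GCC or Clang on x86, where `<cpuid.h>` provides `__get_cpuid`.

```c
#include <string.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>    /* GCC/Clang helper header: __get_cpuid */
#endif

/* Write the CPUID leaf-0 vendor string (12 chars + NUL) into out.
 * Returns 1 on success, 0 if CPUID is unavailable on this target. */
int cpu_vendor(char out[13])
{
    out[0] = '\0';
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return 0;
    /* The vendor string is laid out in EBX, EDX, ECX, in that order. */
    memcpy(out + 0, &ebx, 4);
    memcpy(out + 4, &edx, 4);
    memcpy(out + 8, &ecx, 4);
    out[12] = '\0';
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical vendor gate of the kind the thread suspects. */
int is_genuine_intel(void)
{
    char vendor[13];
    return cpu_vendor(vendor) && strcmp(vendor, "GenuineIntel") == 0;
}
```

On an AMD CPU the vendor string reads `AuthenticAMD`, so a gate like this returns 0 regardless of which instruction set extensions the CPU supports.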

~~~
danieldk
Indeed, it works. A simple trace reveals that the function is called
_mkl_serv_intel_cpu_true()_.

Make a file with the following content:

    
    
        int mkl_serv_intel_cpu_true() {
          return 1;
        }
    

Compile

    
    
        gcc -shared -fPIC -o libfake.so fake.c
    

Run

    
    
        LD_PRELOAD=./libfake.so yourprogram
    

And it uses the optimized AVX codepaths.

Disclaimer: may not be legal in your country. I take no responsibility.

~~~
ashleyn
Wow. I wasn't quite expecting something as simple as "if CPU is not intel,
make everything worse."

~~~
colejohnson66
I’m sure their justification is that (1) they have no obligation to help AMD,
and (2) how could you guarantee AMD implements CPUID the same as Intel (as in:
what if AMD implements a feature bit differently?)

Of course, the second one makes no sense, as x86 programs run just as well on
AMD as on Intel with the same feature set (albeit at different speeds).
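
For contrast, here is what dispatching on the feature bit rather than the vendor string looks like. This is a minimal sketch, assuming GCC or Clang on x86, which provide `__builtin_cpu_supports` and the `target` function attribute. The function names are hypothetical.

```c
#include <stddef.h>

/* Baseline path: built with whatever the default target flags are. */
static double sum_baseline(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
/* Same loop, but the compiler is allowed to emit AVX2 instructions here. */
__attribute__((target("avx2")))
static double sum_avx2(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}
#endif

/* Pick the code path from the CPU's feature bits, ignoring the vendor. */
double sum_dispatch(const double *x, size_t n)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2(x, n);
#endif
    return sum_baseline(x, n);
}
```

Any CPU that reports AVX2 takes the AVX2 path, whether its vendor string says Intel or AMD.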

------
physicsguy
I work on CFD software. We're well aware of this in my work, but the reality
is that all our big corporate clients use Intel hardware. We already tell
people to set those environment variables in our documentation.

> Avoid the Intel compiler. There are other compilers with similar or better
> performance.

This is not really true IMO, but even as an aside, the Intel compiler has the
enormous advantage of being available cross-platform. So we can use it on
Linux and Windows, and it provides MPI cross-platform too. We upgrade fairly
regularly and that provides us with less work.

My own tests found that PGI compiler performance was worse than Intel for C++,
and that now appears to have been discontinued on Windows anyway with NVidia's
new HPC compiler suite replacing it. GNU can run everywhere, but performance
is around 2.5x worse on Linux for our application use case because it doesn't
perform many of the optimisations that Intel does. We use MSVC on Windows just
because everyone can have a license, and performance is much worse.

The other thing is that MKL is pretty stable and gets updated. If I use an
open source BLAS/LAPACK implementation - sure, it works, and it may even give
better performance! But it's not guaranteed to get updates beyond a couple of
years, and plenty of implementations are also only partial. We pay Intel a lot
of money for the lack of hassle, basically.

~~~
gnufx
So which optimizations does the Intel compiler do that GCC can't? I could
guess at the reason for a factor of two, but what does the detailed profiling
say with equivalent compiler flags? I can also say that GCC is a factor of two
better on SKX on a Fortran benchmark, and came out about the same over the
collection that benchmark comes from when profile-directed. The usual reason
for the Intel compiler appearing to win by much is incorrect-by-default maths
optimization allowing more vectorization.

I don't know about MKL stability, but reliability definitely isn't something I
associate with the Intel Fortran compiler (or MPI) in research computing
support.

~~~
physicsguy
I found that the common subexpression elimination performance was
significantly better than that in GCC, for one thing.
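
For readers unfamiliar with the term, common subexpression elimination means computing a repeated expression once and reusing the result. This is a hand-written illustration with hypothetical function names, not output from either compiler and not taken from the code discussed above.

```c
/* As written: the squared distance is spelled out twice. */
double pair_term_naive(double dx, double dy, double dz)
{
    return (dx * dx + dy * dy + dz * dz)
         / (1.0 + (dx * dx + dy * dy + dz * dz));
}

/* After CSE: the shared subexpression is computed once and reused. */
double pair_term_cse(double dx, double dy, double dz)
{
    const double r2 = dx * dx + dy * dy + dz * dz;
    return r2 / (1.0 + r2);
}
```

An optimizing compiler is expected to turn the first form into the second on its own. The claim in the comment above is about how reliably each compiler manages that on much larger expressions.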

------
gnufx
The mythology surrounding the Intel tools and libraries really ought to die.
It's bizarre seeing people deciding they must use MKL rather than the linear
algebra libraries which AMD has been working hard to optimize for their
hardware (and possibly other hardware incidentally). Similarly for compiler
code generation.

Free BLASs are pretty much on a par with MKL, at least for large dimension
level 3 in BLIS's case, even on Haswell. For small matrices MKL only became
fast after libxsmm showed the way. (I don't know about libxsmm on current AMD
hardware, but it's free software you can work on if necessary, like AMD have
done with BLIS.) OpenBLAS and BLIS are infinitely better performing than MKL
in general because they can run on all CPU architectures (and BLIS's plain C
gets about 75% of the hand-written DGEMM kernel's performance).
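
For reference, DGEMM is the level 3 BLAS routine that computes C = alpha*A*B + beta*C in double precision. This is a minimal unoptimized sketch of the operation, assuming row-major storage. It is not code from BLIS, OpenBLAS or MKL, and `dgemm_ref` is a hypothetical name.

```c
#include <stddef.h>

/* C (m x n) = alpha * A (m x k) * B (k x n) + beta * C, all row-major. */
void dgemm_ref(size_t m, size_t n, size_t k,
               double alpha, const double *a, const double *b,
               double beta, double *c)
{
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            double acc = 0.0;
            for (size_t p = 0; p < k; p++)
                acc += a[i * k + p] * b[p * n + j];
            /* beta == 0 means C is overwritten, so its old contents
             * are never read (they may be uninitialised). */
            c[i * n + j] = alpha * acc
                         + (beta == 0.0 ? 0.0 : beta * c[i * n + j]);
        }
    }
}
```

Tuned libraries compute the same result, but block the loops for cache and use SIMD kernels for the innermost part, which is where the performance differences discussed here come from.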

The differences between the implementations are comparable with the noise in
typical HPC jobs, even if performance was entirely dominated by, say, DGEMM
(and getting close to peak floating point intensity is atypical). On the other
hand, you can see a factor of several difference in MPI performance in some
cases.

------
smartmic
Related:
[https://news.ycombinator.com/item?id=21732902](https://news.ycombinator.com/item?id=21732902)

------
gnopgnip
Even if the compilers are biased, isn't it reflective of what users would
experience because most software is made with biased compilers?

~~~
throwaway5792
Not really. No one outside of specialized applications like HPC will use
Intel's compiler for their software. The general public seeing SPEC benchmark
figures between gcc AMD and icc Intel may be surprised when they find that their
Intel CPU doesn't perform as well as expected vs AMD when running generic code.

~~~
FartyMcFarter
5-10 years ago the Intel C compiler produced significantly faster code than
gcc (and clang was even worse back then), so there was a bigger reason to use
it back then.

~~~
nzmlPA
That was the story 10 years ago as well, yet even back then I never managed to
find an open source program where the Intel compiler produced faster code than
gcc.

gcc has produced faster code for at least 15 years. In fact, it is the
Intel compiler which has caught up in the most recent version.

~~~
physicsguy
For what sort of application? I ran benchmarks of my own scientific code for
doing particle-particle calculations and with -march=native I could get 2.5x
better performance with Intel vs GCC.

One thing I found that you do have to be careful with, though, is ensuring that
Intel uses IEEE floating point precision, because by default it's less
accurate than GCC. This causes issues in Eigen sometimes; we ran into an issue
recently after upgrading the compiler where suddenly the results changed, and
it was because someone had forgotten to set 'fp-model' to 'strict'.
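
Results can change with such a setting because floating point addition is not associative, and vectorizing a sum means reordering it. A minimal sketch with hypothetical function names, where the second function mimics by hand what a 2-lane SIMD reduction does:

```c
#include <stddef.h>

/* Strict left-to-right summation: the order the source code asks for. */
double sum_sequential(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Two independent accumulators, as a 2-lane vectorized loop would use. */
double sum_two_lanes(const double *x, size_t n)
{
    double lane0 = 0.0, lane1 = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        lane0 += x[i];
        lane1 += x[i + 1];
    }
    if (i < n)          /* leftover element when n is odd */
        lane0 += x[i];
    return lane0 + lane1;
}
```

With IEEE doubles and the input `{1e16, 1.0, -1e16, 1.0}`, the sequential version returns 1.0 and the two-lane version returns 2.0. A compiler may only make this transformation automatically when told that reordering is acceptable, which is what a relaxed floating point model or GCC's `-Ofast` does.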

~~~
bluecalm
If Intel is using floating point math shortcuts you can replicate it with
-Ofast when using gcc.

It goes without saying that you should use -O3 (or -O2 for some rare cases)
otherwise. I am mentioning it just in case because 2.5x slower sounds so
exotic to me that the first intuition is that you're omitting important
optimization flags when using GCC. GCC was faster than Intel on everything I
tried in the past.

------
Jonnax
Is there major software that uses the Intel compiler?

~~~
jarvist
Most high performance software on supercomputers uses the Intel C and Fortran
compilers, and much engineering and scientific software on workstations uses
the Intel Maths Kernel Library (MKL) for high performance linear algebra.

Now that AMD EPYC processors are powering a lot of next generation
supercomputer clusters, we're going to have to figure out some workarounds!

~~~
ip26
I think this:
[https://developer.amd.com/amd-aocl/amd-math-library-libm/](https://developer.amd.com/amd-aocl/amd-math-library-libm/)
is supposed to be the alternative to MKL for those applications.

~~~
jarvist
Thank you! I wasn't aware of this. But this is only a replacement for libm
(i.e. basic trig and exp functions), not the matrix-orientated BLAS, LAPACK and
SCALAPACK routines that scientific codes spend >90% of their time in.

~~~
ip26
I'm not personally familiar with those, but it seems like BLAS, SCALAPACK, &
others are also available:

[https://developer.amd.com/amd-aocl/](https://developer.amd.com/amd-aocl/)

------
hosteur
Wow. How is this not a lawsuit happening already?

~~~
ballenf
Toward the end of the article, the several lawsuits and FTC actions are
discussed. The end result of them is a disclaimer on the Intel compiler that
it's not optimized for non-Intel processors and that Intel can't artificially
hurt AMD performance (but it apparently has no obligation to support unique
AMD optimizations either).

~~~
wtallis
> (but it apparently has no obligation to support unique AMD optimizations
> either)

It's a bit worse than that. Intel has no obligation to support optimizations
that _aren't_ unique to AMD; they're allowed to disable SIMD extensions that
AMD processors declare support for, while at the same time using all of those
SIMD extensions on Intel CPUs. They just have to include the disclaimer that
their compiler and libraries may be doing this.

~~~
colejohnson66
Why is it _just_ the compiler maker’s job to report it may (read: will)
underperform on AMD and not also the program developer too? If I paid for
software that performed worse on AMD because it deliberately hobbled itself
(and was not informed), I’d want a refund.

It’s straight up anti-competitive, but consumers aren’t smart enough to
understand that it’s a problem; a consumer just sees biased benchmarks that
show Intel outperforming AMD, and then chooses Intel.

------
coronadisaster
All companies are evil if you let them...

~~~
phendrenad2
Business is war, they say.

~~~
rbecker
Except when discussing free trade.

