
Intel's "cripple AMD" function (2009) - luu
http://www.agner.org/optimize/blog/read.php?i=49
======
dchichkov
I've run into it in practice a few years back, while getting WRF (math
intensive atmospheric modeling software; I maintain a non-commercial
soaring prediction site for the bay area) to work on my AMD cluster. I had to
patch the executable compiled with the Intel compiler in order to make it run
unhindered on AMD CPUs. The patch just zapped the 'GenuineIntel' detection code
in the compiled executable... That 'post-linker' patch is available here:
[http://www.swallowtail.org/naughty-
intel.shtml](http://www.swallowtail.org/naughty-intel.shtml)

~~~
neurostimulant
Why not just recompile wrf from source? Dependency hell?

~~~
aylons
Probably it wouldn't work as well with another compiler. This level of
optimization often requires compiler-specific syntax.

Not to mention that Intel's optimization really is good.

~~~
nanidin
I was in a parallel programming class where the fastest correct assignments
got significant extra credit boosts. Compiling with the Intel compiler with
optimizations turned on (vs. gcc) was often enough to win that extra credit
by a comfortable margin.

~~~
eridius
You were graded based on compiled binaries that you provided, and not based on
the source? That sounds crazy to me.

~~~
nanidin
We provided build scripts and source. The extra credit was for fastest
execution time.

I assume you're paid based on final results, not based on source. Not so
crazy of a concept - whoever delivered the best results got rewarded for it.

~~~
eridius
Sure, but if merely switching compilers produced a faster binary, then I would
expect all programs to be compiled with the better-optimizing compiler. After
all, it doesn't take any particular expertise to adjust the value of CC.

~~~
dpe82
There's more to optimization than setting -O3. Learning how various compilers
behave and how their optimization features interact with your code are
valuable skills and may well have been within the scope of the course.
Certainly worthy of extra credit.

~~~
scott_karana
Sure, but why not mandate that everyone tunes the same compiler?...

~~~
mikeash
I'll flip the question around: _why_ mandate it?

The class clearly has a performance component, and so students were expected
to learn about optimization. Are they going to learn optimization better or
worse if you mandate a single compiler? If merely switching compilers is the
best path to performance, is that not a valuable lesson? If switching
compilers _and_ doing a bunch of extra work to make the code fast with the new
compiler is the best path to performance, have they not learned a great deal?

~~~
scott_karana
Some compilers aren't generally available. Hypothetically, what if ICC wasn't
available freely to educational users, but some of the students had side-jobs
where they used it?

You can always mandate a large set of compilers, make them all available, and
leave it up to the students to determine which is fastest. I think that
achieves both the competitive/educational goal _and_ the level-playing-field
goal.

~~~
mikeash
I would definitely ban using any compiler that wasn't generally available to
the class, or at least disqualify their output from winning the contest. I'd
take a generic approach where it's worded just like that, rather than trying
to come up with an official set of acceptable compilers, though.

~~~
scott_karana
Sounds fair enough. I suppose we're really on the same page after all. :)

~~~
mikeash
Sounds good! Just remember, if the Internet Police show up, this never
happened.

------
wmf
(2009)

Since then Intel settled the lawsuit by paying $10M and agreeing to add the
following disclaimer to their compilers: "Intel's compilers may or may not
optimize to the same degree for non-Intel microprocessors for optimizations
that are not unique to Intel microprocessors..."
[http://software.intel.com/en-us/articles/optimization-
notice...](http://software.intel.com/en-us/articles/optimization-notice#opt-
en) [http://www.anandtech.com/show/3839/intel-settles-with-the-
ft...](http://www.anandtech.com/show/3839/intel-settles-with-the-ftc)

~~~
lotyrin
Yep, and that's been interpreted such that - just in case - we add a link on
every page of software.intel.com to [http://software.intel.com/en-
us/articles/optimization-notice](http://software.intel.com/en-
us/articles/optimization-notice)

Whether or not the current page has anything to do with compilers.

Also, judging by the URL they made it an 'article' instead of a 'page'
again... I'll have to see if I can get someone to fix that.

~~~
shocks
I presume the notices are images so they can't be indexed by search
engines?...

~~~
lotyrin
Not sure, more likely somebody got a zip file full of images from the
lawyer(s) and decided to put those up exactly as provided.

------
sounds
Article is from 2009 but Agner's CPU optimizations manual is still very
useful.

[http://www.agner.org/optimize/optimizing_cpp.pdf](http://www.agner.org/optimize/optimizing_cpp.pdf)

Instructions on how to patch Intel's CPU detection routine to do your bidding
are in section 13.7, pp. 132-133.

The 2009 article also has this interesting tidbit: "It is possible to change
the CPUID of AMD processors by using the AMD virtualization instructions. I
hope that somebody will volunteer to make a program for this purpose. This
will make it easy for anybody to check if their benchmark is fair and to
improve the performance of software compiled with the Intel compiler on AMD
processors."

------
gcp
This is an old article. As far as I know, the settlement that was reached
was entirely laughable and most certainly doesn't remove the "cripple AMD"
function. Now Intel just has to notify customers that they may not get optimal
performance on other CPUs, and reimburse them the cost of the compiler if they
can demonstrate that they mistakenly bought the compiler thinking that
wouldn't happen, or something like that.

There is no new info in the linked article regarding the "new" FTC
investigation.

------
salient
Intel is one of the least ethical tech companies around. Have they even paid
their 1 billion euro fine to the EU Commission yet for trying to force OEMs to
not use AMD chips in their products?

[http://www.engadget.com/2009/05/13/intel-
fined-1-45-billion-...](http://www.engadget.com/2009/05/13/intel-
fined-1-45-billion-dollars/)

~~~
boyter
I wouldn't be surprised if they had. Off the top of my head they made over $7
billion doing this and can consider the fine as a cost of doing business.

~~~
sitkack
When companies do this, they should be fully audited and fined 300% profit,
split evenly between the harmed company and the government. If that puts them
out of business, so be it.

~~~
sounds
That would certainly discourage _getting caught_ violating the law.

It would also tend to kill off the older companies (weak law of large numbers:
if a company violates any of the laws that will kill it, and it exists long
enough, it eventually gets caught and killed).

It might even lead to some efforts at counter-legislation. For example,
companies might lobby to _broaden_ the "get killed" legislation, which would
result in lots of sympathy cases where companies were killed for "minor"
offenses. Eventually the whole "kill the company" idea would fall out of
favor.

[http://en.wikipedia.org/wiki/Three-
strikes_law](http://en.wikipedia.org/wiki/Three-strikes_law)

(Companies will tend to view a government audit as a death sentence, since it
would damage them so much even without a 300% fine.)

~~~
Peaker
I'd support fines that are proportional to general revenue or profit. A fine
must hurt.

Also, an audit and 300% fines would probably not kill companies.

------
TwoBit
How does the Intel compiler compare to others today? We tried using it for
game development years ago and it had too many problems to make it worthwhile
(e.g. pathological behavior with some C++ code).

~~~
moconnor
The Intel compiler is extremely good at finding and exploiting vectorization
(SSE/AVX) opportunities; using these instructions in hot loops is becoming key
to getting anywhere near peak performance out of modern CPUs.

Most people don't care enough about performance to notice, but recompiling
with Intel's compiler often shows a 5-15% difference on number crunching codes
and that's before spending time investigating the vectorization output and
fine-tuning.

On the other hand, if you really care about speed then someone with some
experience in performance tuning will typically be able to make your code run
4-8x faster, vastly outweighing any benefits from the compiler.

~~~
sounds
Just in case you skimmed moconnor's comment, it bears repeating:

Intel's compiler: 15% speedup

Hand-optimized code: 800% speedup

This gap in compiler tech is still a big deal today. Think about the early
mainframes and how the code was all written in machine code or assembler.
[http://www.pbm.com/~lindahl/mel.html](http://www.pbm.com/~lindahl/mel.html)

Compilers can still improve, a lot.

• Parallel code? _still_ hand-written, even though choosing the right
language/library can help. Note that choosing the language that makes
parallelism easy may cost you when you actually go for the max parallel
speedup

• GPU? hand-written. See: litecoin miners, and bitcoin miners before that.
They were written in OpenCL but hand-tuned for a specific architecture

• Cross-platform? Java and C should be portable, but ask any Android developer
how it really works

• And the one we're talking about here: number-crunching code? hand optimized!

I'm actually quite optimistic about the future of compilers. One of the
reasons HN is so fun to read is that it comes up often.

~~~
raverbashing
"Hand-optimized code: 800% speedup"

It _really_ depends.

Especially in how naively the "non-optimized" code was written.

I can see vectorization giving a 2x to 4x speedup (per core), but not much
more than that (and vectorization is what the Intel compiler does best)

But even GCC can vectorize better today than in the early days of 4.0

~~~
sounds
Sure, it depends. I've seen embarrassingly parallel (yeah, that's a real term)
code with speedups in the 20's.

My personal best was a 9x speedup, partly by using SSSE3 and partly by some
really good prefetching and non-temporal writes.

If you look at what I said in the very narrowest light, I agree that SSE2 all
by itself typically delivers a 2x speedup per core over non-SSE code.

------
agumonkey
It bothered me when I realized that mainstream reviews (the ones able to
influence the average mass-market buyer) might be using binaries heavily
biased in favor of Intel.

~~~
GhotiFish
What bothers me is that if a mainstream reviewer benchmarks ICC compiled
programs, well, that isn't unfair. Real world programs are compiled with that
compiler. AMD processors actually WILL under-perform on certain programs
because of this.

That leaves a bad taste in my mouth.

~~~
wmf
Maybe any program compiled with ICC should have the same disclaimer as ICC
itself so people know that the program is biased.

~~~
GhotiFish
well, personally I don't think that's enough. I'm amazed they got away with
the plea bargain they did.

------
jrockway
If I were AMD, I'd just start calling my processor GenuineIntel. (Or maybe
make it user programmable, and then absolve myself of any knowledge of what
users are setting it to.) When the judge asks why, I'd say because those are
the magic words to make certain binaries run faster, and I wanted to run a
viable processor business.

This is not an acceptable use of trademarks.

~~~
fzltrp
> This is not an acceptable use of trademarks.

But if you were Intel, would you have your engineers work on competitors'
products to make sure they are well supported by your line of tools? Before
answering, consider that the core implementation of AMD CPUs differs
significantly from Intel's: instruction timings are slightly different,
whether you look at them individually or in groups. It's not just a matter of
flipping a switch to get optimal performance, and that's just the tip of the
iceberg.

Now, from a business standpoint, I think it could make sense for them to make
their compiler produce fast code for any chip, but the legal implications of
having a competitor's product burn because of code produced with your compiler
might make you think twice before going down that road. Intel probably chose
the safe road for a reason. Also, note that the produced code isn't crippled
(as in, it doesn't make AMD CPUs execute endless loops, or produce wrong
results more often than Intel's), it just follows the safest path.

~~~
hvidgaard
Add another flag to the compiler to produce optimized code for any CPU, with
the warning that it's only been verified to work with Intel CPUs. With today's
CPUs I do not buy the "safest path" argument - perhaps I could accept "we only
enable it by default for implementations we have verified in-house", which
makes a lot of sense.

This sounds a lot more like Intel knows it makes the best compiler, and
knowingly puts non-Intel CPUs at a disadvantage to make it seem like Intel
has a faster CPU.

~~~
fzltrp
I guess they could do that, and trust that customers will always be reasonable
enough not to sue them in those situations they wanted to avoid. Besides,
it's not 2009 anymore: if they want to defend their architecture's position in
the market against ARM vendors, they should probably help AMD out as much as
possible (though I remember reading somewhere once that AMD considered
including ARM cores in their APUs - or maybe it was just a journalist's
speculation).

> This sounds a lot more like Intel [...]

Just a thought here: should Intel do things to avoid looking like a bad
competitor, or to give their customers the best product they can offer? We're
engineers; we should know not to fall for appearances, shouldn't we? I know, I
supported my own reasoning with the legal aspect of things, which sometimes is
not very reasonable in what it must handle. There goes my original point.

------
w1ntermute
Can you spoof AMD CPUs to return "GenuineIntel" instead of "AuthenticAMD"?

~~~
staticfish
Couldn't that also potentially break some code path that expects to be
running on a GenuineIntel processor? I'm not well versed in this.

~~~
sounds
Not likely. AMD processors are very carefully designed to correctly execute
code, even if it just assumes GenuineIntel and never even checks.

If code is dumb enough to try to use something low-level (let's use Bull
Mountain RDRAND as an example) without checking for that specific feature bit,
then it is obviously the code that is broken: it triggers an illegal-
instruction fault and gets killed. That's not the CPU's fault.

Intel and AMD CPU manuals both pound in the point, too. In the sections on
these advanced features they always insist that you check the feature bit
first.

~~~
mikeash
Your mention of RDRAND is a great point and made me think about just how many
differences there are between different models of CPUs from the same vendor. I
assume the differences between different Intel CPUs vastly outweigh the
differences between similar Intel and AMD CPUs.

------
aeonsky
I'm still not entirely sure why Intel is forced to do this. Is it only because
they advertise that the compiler optimizes equally well for any CPU? If not,
then I don't really see why anyone can force them to provide another,
AMD-friendly version.

~~~
freehunter
Intel is the market leader by a good margin, and in the past has been known to
use unfair tactics to keep other players out of the market. AMD has been in a
lot of lawsuits with Intel due to this.

In this case, it's not just that Intel isn't playing nice with AMD; it's that
their compiler emits code that deliberately takes poorly optimized paths at
runtime if you're not running on an Intel processor. That's not an accident,
that's done on purpose to make non-Intel processors seem worse. What you're
allowed to do while competing in a market changes when you're the dominant
player in that market.

~~~
aeonsky
I am usually not a free-market extremist, but if Intel makes an excellent
software product after years and millions of dollars in R&D, and makes it work
well only on certain platforms, more power to them.

~~~
wmf
The problem is that they penalized AMD without telling anyone. The market only
works if you know what you're buying.

~~~
mikeash
To make an obligatory car analogy, imagine if Ford opened up gas stations that
sold really good gas, but this gas was somehow made to run much less
efficiently in non-Ford cars. And further that they didn't tell anyone this,
and just left you to assume that if you filled up your Prius with Ford gas and
subsequently got 20MPG, the car was to blame.

~~~
fnimick
It's even worse than that - since companies are distributing binaries compiled
with icc, it's more like a Ford gas refinery distributing to normal gas
stations gas that secretly runs terribly in other cars. There is simply no way
for the consumer to know what they're getting.

~~~
sitkack
This knock-on effect is where the real harm is done. The fact that Intel is
checking not for feature flags, but for the mere existence of an Intel
processor, is actionable. They aren't following their own best practices for
accessing optional features of the chip.

------
happycube
At this point, does intel _need_ that function to make AMD's CPU cores look
bad?

~~~
wmf
You're talking about a company whose motto is "only the paranoid survive". Why
win when you can utterly dominate?

~~~
salient
I don't think that's been their motto since Otellini took over. Look where
they are now in the mobile market. Otellini put the profitability of their
Core chips above improving Atom in the first few years, even when it came to
netbook performance, which was already terrible. Combined with the fact that
they forced OEMs not to buy AMD alternatives during the same period, Otellini
just didn't think it necessary to improve Atom's performance much.

They only started caring about _power consumption_ when it was already obvious
to _everyone_ that ARM was going to pose a threat to them eventually. If
everyone sees something coming, that's by definition not "paranoia". To be
paranoid, you have to see and believe something _before_ others see it.

~~~
JohnBooty

      They only started caring about power consumption when 
      it was already obvious to everyone that ARM is going to
      pose a threat to them eventually.
    

I'm being a bit pedantic, but it seems to me they refocused on power
consumption beginning with the launch of the Pentium M (forerunner of the Core
and Core 2 lines) which was released in 2003 and was surely in development
several years before that.

Or do you think they were thinking ahead to ARM already in ~2001 or so? Maybe
they were... although I think they were thinking about targeting laptop sales
in general at that point, not ARM specifically.

~~~
agumonkey
That's true, the NetBurst syncope made them redesign toward efficiency, but
still, the rise of ubiquitous mobility forced another inflection in their TDP
curve. And they're still sweating over it, since the PC market is shrinking
and they need to get a foot in the smartphone/tablet market (see the Bay Trail
subsidy effort [http://liliputing.com/2014/01/bay-trail-tablets-cheap-
intels...](http://liliputing.com/2014/01/bay-trail-tablets-cheap-intels-
footing-bill.html))

------
bd_at_rivenhill
This all seems to indicate that the Intel compiler emits multiple,
CPU-dependent code paths in a given binary, which seems insane to me due to
the amount of extra memory that this would require. Am I missing something
here?

~~~
stephencanon
Extra code on disk doesn't cost anything (well, disk space, but that is "free"
for practical purposes). A compiler can arrange for all of the code for a
given architecture to appear consecutively in the binary, so that the pages
and cachelines containing implementations unused by the processor you are
running on are never loaded into memory and never take up space in the cache.

Also, in practice code is a tiny portion of the size of a typical application.
Far more space is consumed by resources like images and sounds.

~~~
gonzo
You mean "data".

------
raverbashing
And the question is: how about we stop paying Intel for an unfair product?

I know their compiler produces the fastest code, but maybe you can get good
(enough) results by using libraries and some manual optimization.

------
fest
Isn't this crippling a compile-time thing?

Is there something in the binary that selects the best-performing instructions
(as opposed to just executing the instructions that were compiled in) when
it's being executed on a specific CPU? If so, how exactly does it work?

~~~
neolefty
It's actually a runtime switch. A compiled x86 binary that uses extra-wide
number-crunching instructions (SSE etc) must also work on older processors
that don't have those instructions, so it will have two or more code paths.
The code paths all perform equivalent computations, but using different
instructions.

For example, if you are adding 4 pairs of 64-bit numbers, and there's a
special add-4-pairs-of-64-bit-numbers instruction, but it's specified as part
of SSE4 (I made that up, but it's the kind of thing that you would find), then
you can ask the CPU if it supports SSE4. If it _does_ , then you say great,
use this code path that requires SSE4, and we'll do the whole operation in
three instructions: load, add, store. Or something.

However, if the CPU says that it _doesn't_ support SSE4, then you'd better
have a backup plan. It doesn't have to run as fast, but it should compute the
same answer. If it's compiled C code (as opposed to hand-written assembler),
the compiler will have you covered. Instead of a single SSE4 instruction,
maybe it will take 4 regular 64-bit x86 add instructions instead.

(And if you've written it in assembler, then you probably provided the
compiler with a backup C implementation to use if SSE4 isn't supported.)

Intel's compiler is being unfair to AMD CPUs because -- even if they support
the instructions that you want -- it won't use them. It will unnecessarily
fall back to the plain old non-SSE x86 instructions.

~~~
fest
Thanks for your answer, it did not occur to me initially, but it makes a lot
of sense!

------
coldcode
When I worked for a game company we used the Intel compiler for a couple of
versions but it caused so many issues for people with AMD we switched back to
the MS compiler. In the end the performance difference wasn't enough to
matter.

~~~
fzltrp
That's interesting: could you elaborate on those problems? I was pretty much
supporting Intel on the grounds that their competitors' products were just
running a safe path, but your input might change my view entirely.

~~~
coldcode
It wouldn't be useful anymore, that was 3-4 years ago. At the time besides
having AMD issues we had floating point optimization issues which messed up
our physics. I doubt it's an issue today.

