
Intel's "cripple AMD" function - kierank
http://www.agner.org/optimize/blog/read.php?i=49
======
drewcrawford
I was an intern for AMD a few years ago (these are my views and not AMD's). I
was pretty skeptical about AMD's antitrust claims against Intel until I went
to work there. I'm as free-market as the day is long, but there's a whole
untold story of the evil things that go on in the back meeting rooms, even
outside of sales, where most of the public lawsuit claims are/were.

The thing to remember is that AMD is a small fraction of the size of Intel,
and they have to cover the same market segments. If they try to specialize
(say, servers, or notebooks), Intel will just sell that segment at a loss. AMD
has to cover everything with only a fraction of the people to stay
competitive, and it's really hard.

Even while I was there, we had what I suspect (but have no proof) were
incidents of people leaking product plans, roadmaps, etc. (but no IP) to
Intel. It's sad, really.

~~~
jimbokun
"Even while I was there, we had what I suspect (but have no proof) were
incidents of people leaking product plans, roadmaps, etc. (but no IP) to
Intel."

I can't imagine Steve Jobs allowing this to happen at Apple. They have
definitely caught people leaking things, and the consequences were swift and
unpleasant for the leaker. Why can't AMD catch these people? Is there
something preventing them from implementing the same kinds of measures to
catch leakers as Apple?

(Using Apple just as an example, of course. I'm sure there are other companies
who find leakers and make an example of them through the legal system.)

~~~
drewcrawford
There was one guy who was strung up as a leaker ten years ago. Don't remember
his name, but it was a big deal.

AMD's culture is just different than Apple's. For one, there are no "secret
teams" like iPhone, iTablet, etc. (well, at least none that I knew about). For
another, developers have real autonomy to make business decisions, something
that would never happen at Apple. For instance, I, a lowly intern, redeployed
software to the production line during an emergency. If something went wrong,
chips would actually stop rolling out of the factory. I would imagine normal
(non-senior) Apple engineers don't have that kind of autonomy.

The other major difference, which perhaps you caught above, is that AMD
actually manufactures their own stuff. So, not only are there US engineers,
but engineers overseas in the plants that AMD owns, engineers in Dresden, etc.
Not to say that foreign engineers are somehow bad, but it is a lot harder to
control leaks when you have engineers working literally 24/7 all around the
world. And at the scale that you're making your own stuff, there are just more
people than there are at Apple, and things are way harder to control. It's
like herding cats.

~~~
djcapelis
> For instance, I, a lowly intern, redeployed software to the
> production line during an emergency. If something went
> wrong, chips would actually stop rolling out of the factory.

While I'm all for letting engineers react to things progressively, that you
were in position where a screw up could have shut down a fab as an intern is
nothing short of terrifying to me.

~~~
drewcrawford
> That you were in position where a screw up could have shut down a fab as an
> intern is nothing short of terrifying to me.

To me, the converse is a lot scarier: what if I, feeling no personal
responsibility for yield, wasn't there after-hours looking for bugs in the
first place? Or what if I did find the critical bug, but had to wait weeks for
forms before it was pushed through? Or was blocked by office politics?

There's nothing more disheartening than having a fix for something serious
that you can't push through. I've worked at companies like that: taking away
the power to break something means taking away the power to make it better.

Not to say that I somehow dislike code reviews or generally fly by the seat of
my pants: the situation really was a genuine emergency. I can't talk
specifics, but the bug had already cost more than the damage I could have done
by breaking something.

------
herf
When using IPP, I had to rewrite the CPU detector, even for new Intel chips as
they came out. This code should be better: really, it should just benchmark
all the options and catch processor exceptions to pick a supported path.

Instead, the idea is to do a static dispatch for "known" chips, which is
really bad. When the Core 2 Duo came out, the version of IPP we used reverted
to basic MMX code instead of SSE2, about 2.5x slower. This is just bad code,
and it's bad on Intel chips, not just AMD.

Also there is the "optimized for benchmarking" piece. It's not always good to
use all your cores for one job, for instance, but a lot of these libraries
make the assumption that your CPU has nothing else to do.

------
rbanffy
Isn't this the textbook reason for using - and contributing to - open-source
compilers and libraries?

~~~
liuliu
And gcc is not that bad after all. I use OpenMP to parallelize my program on a
Core i7 860 CPU, which supports 8 threads. But with icc as the compiler, it
will only utilize 7 cores, and that does affect performance (about 10% slower
(wall time) than gcc, which uses all 8 cores). I suspect it has something to
do with the dynamically linked OpenMP runtime that icc uses.

~~~
jey
Really? I find Intel's compiler to outperform GCC on pretty much all of the
numerical work I do. I build with "-O3 -xHost" and make use of OpenMP.

Dynamic linking of the OpenMP library is almost certainly _not_ the cause of
the slowness you're observing. If you really want to force the Intel OpenMP
runtime to use all 8 cores:

    
    
      export OMP_DYNAMIC=false
      export OMP_NUM_THREADS=8
      export KMP_LIBRARY=throughput
    
      # "KMP_BLOCKTIME" is how long an idle worker thread
      # should enter a blocking wait for more work before
      # sleeping, in milliseconds. default value is 200ms
      export KMP_BLOCKTIME=1000
    
      # following are needed if you use Intel MKL
      export MKL_DYNAMIC=false
      export MKL_NUM_THREADS=8
    

For more info:
[http://software.intel.com/sites/products/documentation/hpc/c...](http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/cpp/lin/compiler_c/optaps/common/optaps_par_var.htm)

~~~
rbanffy
"I find Intel's compiler to outperform GCC on pretty much all of the numerical
work I do."

It's only a question of time. I suspect that if more companies decided to pool
resources around GCC (or any other free C compiler, like pcc or clang), they
would pretty much bury Intel's compiler.

Intel is a chip company. The only conceivable reason for them to want to
maintain a C compiler is to make a C compiler that's better than the
competition on Intel processors and that sucks as much as possible on
competing architectures.

Icc is not a compiler. It's a sales tool.

~~~
jey
I fully agree. I'm looking forward to LLVM becoming fully mature; it's a great
platform already, and just needs to be fleshed out with some more
optimizations/analyses/etc. And with the clang front-end, we can get rid of
the unmaintainable pile of crap that is GCC.

------
jrockway
Sounds like AMD should just start setting the vendor string to "GenuineIntel",
then. (This is something like the "like Mozilla" in every user-agent string:
if dumb software is going to do dumb tests, you have to fool the dumb tests to
get interoperability.)

~~~
rbanffy
Better yet: make it writable.

That way the OS could change it per process/thread/context and the code would
be happy.

~~~
wmf
VIA has a writable vendor string which they have used to reveal this type of
shenanigans in the past.

~~~
rbanffy
I always liked those guys. The über-486 Centaur built was a brilliant design
and out-of-the-box thinking from top to bottom.

Weren't the VIAs able to trounce Xeons in some crypto stuff?

~~~
wmf
_Weren't the VIAs able to trounce Xeons in some crypto stuff?_

Yes, since they had crypto instructions and nobody else did. Now I would
expect Westmere to be faster.

------
wmf
Has anyone tried the Sun Studio compilers? They're free and supposed to be as
good as Intel, but I've seen virtually no discussion of them.

~~~
daeken
For x86, they fall behind a bit. For x64, they're faster than ICC in general.
Definitely worth a look if you don't mind going to OSol.

~~~
jedbrown
My experience on x64 has been that Sun is usually competitive with GCC/ICC,
but not clearly better. Sun's C99 compiler does really atrocious things with
SSE intrinsics, which is strange since their C++ compiler handles intrinsics
almost as well as GCC/ICC. Note that they also work fine on Linux.

------
rythie
Not only should they fix it, they should open source the code, so AMD can
contribute.

Intel often makes noises about open source, so they should put their money
where their mouth is.

~~~
notauser
Compiler discussion to one side for a minute...

Intel does more than just make noises about open source. Their wifi and
graphics chipset support has been excellent over the years. Prior to the
recent changes at ATI, they were pretty much the only company doing that.

~~~
rythie
Agreed and it's good that they do that since I use their graphics and Wifi
drivers on my Ubuntu Laptop.

My point was that if they were truly committed to open source they would do
this too; they are, after all, a hardware company, and should compete by
making the best hardware.

------
sfg
I do not know much about processor benchmarking, but is it not a little weird
that the benchmarkers use software that is not independent of the hardware
they are testing? It seems like they are asking to be manipulated: why do they
do this?

~~~
wmf
They think the software is processor-independent but it's really Intel-biased;
that's the problem.

------
Andys
Sensational headline: this article is only about the Intel C Compiler, which,
as far as I can see, is only used for benchmarketing and research purposes.

~~~
praptak
"Only" benchmarketing? If any published benchmarks are affected by this
misfeature, it's pitchforks and torches for Intel.

~~~
Andys
Does it really come as news to anyone that if Intel wants to show their CPU in
best light, they'll use their own compiler?

Caveat emptor.

The bulk of the x86 world uses Microsoft C++ or GCC - end of story.

~~~
kelnos
I think the parent was talking more about third parties doing benchmarks. If
someone (not Intel) uses benchmarking software compiled with ICC, it might
report erroneously bad results on an AMD system.

~~~
praptak
Yup, that was my point. Also, 'caveat emptor' has limits - Intel should at
least state that their compiler produces suboptimal code for their
competitors' CPUs.

------
adame944
Bottom line: it's a business decision. Code generated by the Intel compiler
"works" on AMD chips, although it may not be optimal. For Intel to support the
optimal codepaths on AMD chips would require a substantial amount of research.
I don't think they're intentionally crippling AMD chips; just declining to
invest the effort to support them optimally.

~~~
DarkShikari
That isn't exactly how it works.

The proper way to do it:

    
    
        if( CPUIDbits & SSE1_CAPABLE ) {enable SSE1}
        if( CPUIDbits & SSE2_CAPABLE ) {enable SSE2}
        [etc]
    

The even better way to do it:

    
    
        if( CPUIDbits & SSE1_CAPABLE ) {enable SSE1}
        if( CPUIDbits & SSE2_CAPABLE ) {enable SSE2}
        if( CPU is Athlon 64 ) {disable some SSE2 functions}
        if( CPU is Pentium-M ) {disable all SSE2 functions}
        [etc]
    

Intel's way of doing it:

    
    
        if( CPU is Pentium 3 ) {enable SSE1}
        if( CPU is Pentium 4 ) {enable SSE1/SSE2}
        if( CPU is Core 2 ) {enable SSE1/SSE2/SSE3/SSSE3}
        [etc]
    

Practically all sane applications do things the first way; a couple do things
the second way. Anyone doing things the third way is just asking for trouble
both in terms of future compatibility and resilience to unexpected situations.
For example, some VMs disable certain instruction sets, which would result in
SIGILLs when using the last method.

~~~
kelnos
If you read the article, it looks like Intel's CPU-type dispatcher actually
does it the second way (sorta; it appears to only check Intel CPU family IDs),
but at the bottom of that list there's a big "if(CPU string is not
"GenuineIntel") { disable everything and use crappy fallback code path }".

------
NathanKP
This doesn't really make any sense. All you would need to do is compile the
code on an Intel machine to get fast speed, and then you can run it on an AMD
machine. It shouldn't really cause any problems as long as developers build on
genuine Intel machines. Of course that is irritating, but it shouldn't cause
any slowdown on other machines.

~~~
ShabbyDoo
I think the compiler generates code which checks processor type at runtime,
not compile time. If the compiled code is running on an AMD processor, the
"safe" version of the compiled code is chosen automagically.

~~~
NathanKP
Wouldn't that make the code twice as large?

~~~
ShabbyDoo
Perhaps, but size doesn't really affect runtime performance that much,
especially if most codepaths never execute -- there's no processor cache churn
from paths that never run.

I don't really know anything about this compiler, so I'm certainly
speculating. My assumption is that one writes some function foo() and the
compiler prepends a dispatcher in front which forks (code paths, not
processes) to one of N optimized but functionally equivalent codepaths based
on the actual processor upon which the code runs.

~~~
ars
Size does affect performance because of the cache.

If the forks are inline, and the cache works in blocks, then you are wasting
cache space for code that never runs.

But considering it's intel I'm sure they thought of that.

~~~
pmjordan
I suspect it patches a jumptable at initialisation time based on CPU type, and
all the code used by one type of CPU is bunched close together. The unused
code probably isn't even paged into physical RAM.

