
Correcting Intel's Deep Learning Benchmark Mistakes - Smerity
https://blogs.nvidia.com/blog/2016/08/16/correcting-some-mistakes/
======
ChuckMcM
Ok, it's officially the new new thing when corporate communications types are
sniping at each other :-)

Once we start seeing press releases of products that are not available yet
being shown to kill the existing competition we'll know that the hype train
has officially gone super-product-sonic (that is a hype wave travelling faster
than the product releases can support it).

~~~
wmf
I think both Knights Whatever and DGX-1 are already there. They seem to be in
"order now, receive sometime" mode.

~~~
sp332
Xeon Phi is in 23 systems on the Top500 supercomputer list, so it's not like
they're not shipping. The next version, Knights Landing, should be launching
soon, and hopefully will have better availability.

~~~
mtgx
Intel was comparing Knights Landing to Kepler, which was released a few years
back.

~~~
paulmd
Kepler was the default GPGPU product for many tasks until GP100 released - and
GP100 still has not hit general availability yet.

Essentially, anything that needed dynamic parallelism (launching kernels from
within kernels, i.e. tasks where you don't know where the
difficult/interesting needles are within a haystack), advanced/concurrent
scheduling capabilities, or FP64 is going to be much, much better off with
Kepler until users can get their hands on GP100 cards. Maxwell is good at
none of those things - it's really only good at deep learning/neural nets
specifically. _Which is not every task within the GPGPU space_.
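To make the dynamic-parallelism point concrete: the win is that the device decides at runtime where to launch more work, instead of the host pre-scheduling everything. Here's a rough host-side Python analogy (not CUDA - on Kepler+ the inner submissions would be child kernels launched from device code; the function names and data are made up for illustration):

```python
# Rough analogy for CUDA dynamic parallelism: a coarse pass over the
# "haystack" spawns fine-grained work only where it finds a "needle",
# rather than the host pre-scheduling work for every region.
from concurrent.futures import ThreadPoolExecutor

def fine_grained(chunk):
    # Expensive per-element work, run only where the coarse pass hit.
    return [x * x for x in chunk]

def coarse_scan(data, threshold, chunk_size=4):
    results = []
    with ThreadPoolExecutor() as pool:
        # Coarse pass: submit fine-grained work on demand.
        futures = [pool.submit(fine_grained, data[i:i + chunk_size])
                   for i in range(0, len(data), chunk_size)
                   if max(data[i:i + chunk_size]) > threshold]
        for f in futures:
            results.extend(f.result())
    return results

print(coarse_scan([1, 2, 9, 1, 0, 0, 1, 1, 8, 7, 2, 3], threshold=5))
# -> [1, 4, 81, 1, 64, 49, 4, 9]  (the middle chunk had no needle)
```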

~~~
llukas
Nvidia blog post is about Intel deep-learning benchmarks only.

------
modeless
> Titan uses four-year-old GPUs

... as does nearly every public cloud provider. I agree with most of the
article, but you can't fault Intel for benchmarking the hardware that cloud
providers are actually offering.

I'm not sure what exactly NVIDIA is doing with their Tesla product line but
whatever it is, it's really restricting the availability of recent GPU
hardware. Even Azure's GPU instances released this month are using the Kepler
architecture from 2012. It's fully two generations out of date now, and that's
sad.

~~~
pavanky
I mean, Intel is comparing their publicly unavailable product against NVIDIA's
publicly available product. Now NVIDIA is replying with benchmarks on their
(as yet) publicly unavailable product.

I think this blog post is fair game.

~~~
scottlegrand
Not even wrong. I have two PCs with 4 Titan X (Maxwell) GPUs and a third PC
with 4 Titan X (Pascal) GPUs. Both of these systems are available today (I
built them myself, total BOM about $7K), and both will destroy 4 Xeon Phi
servers at Deep Learning.

The benchmark Intel presented here is as disingenuous as their infamous white
paper from 2010:
[http://pcl.intel-research.net/publications/isca319-lee.pdf](http://pcl.intel-research.net/publications/isca319-lee.pdf)

In comparison, a single Knights Landing Xeon Phi will be ~$7K. I know where
I'd put my money. Caveat emptor.

But Xeon Phi and I go way back here. They've been trying to beat my AMBER GPU
code since 2013 or so. Many man-years later, I believe a Knights Corner is
now ~35% faster than 2 Xeon CPUs at 1M atoms or more (source:
[http://adsabs.harvard.edu/abs/2016CoPhC.201...95N](http://adsabs.harvard.edu/abs/2016CoPhC.201...95N))

Meanwhile, the CUDA code has continued to scale with the GPU roadmap and a
Titan XP is arguably 9-10x faster than 2 Xeon CPUs. No data is supplied at the
low-end for Xeon Phi and I think we can safely assume it's because performance
there sucks. (source:
[http://ambermd.org/gpus/benchmarks.htm](http://ambermd.org/gpus/benchmarks.htm))

Xeon Phi? IMO avoid, avoid, avoid until they start winning head-to-head
3rd-party benchmarking fights like Soumith Chintala's fantastic convnet
benchmark data:
[https://github.com/soumith/convnet-benchmarks](https://github.com/soumith/convnet-benchmarks)

~~~
gnufx
So, a specific MD code may or may not work well with KNL -- we don't have
data. KNL looks quite attractive for other chemistry, given all the vector
units, large amount of fast memory, and ability to run realistically-sized
examples without the network, or potentially the network-on-chip. We'll see
how it pans out.

~~~
scottlegrand
I prefer to look at it the other way: why don't you point out an existing and
important chemistry application where KNL bested its contemporary GPUs - say,
the best of Knights Corner versus the best of Kepler (K40 or K80). I'm also
open to Knights Landing versus GP100 (vaporware versus no longer vaporware
but hard to get).

I'm genuinely interested here because I can't find this anywhere. I don't
think it exists personally.

~~~
gnufx
KNL is not Knights Corner, and I have limited information on either. I'm
interested in data and, more to the point, insight -- not just single
benchmark numbers or specific programs, especially if they've had a lot of GPU
effort and no tuning for KNL. I don't expect KNL to be particularly good for
applications that aren't highly vectorizable, though the memory system may
help.

If I manage to access the KNL here, I'll probably run cp2k and gromacs, though
single node performance is of limited interest, and ELPA doesn't currently
have AVX512-specific support.

~~~
scottlegrand
Here's your answer for GROMACS (and it sucks)...

[http://www.prace-ri.eu/IMG/pdf/wp120.pdf](http://www.prace-ri.eu/IMG/pdf/wp120.pdf)

Even so, right now, little would please me more technologically than a
competitive Xeon Phi offering, but while KNL is better than KNC, my inside
info says it sucks too (it would have been a lot more interesting, just like
Altera's Stratix 10, if it had shipped before GP100 and GP102).

Right now, I have more confidence in AMD GPUs than I have in Xeon Phi. This
3rd-party benchmark is particularly interesting (and it doesn't look like
anyone at NVIDIA is paying any attention to it):

[https://techaltar.com/amd-rx-480-gpu-review/2/](https://techaltar.com/amd-rx-480-gpu-review/2/)

Sure, NVIDIA is still in the lead, but not with the ~10x margins they used to
have over AMD.

Finally, I figuratively feel like punching the next person who makes the BS
scaling argument over raw performance. GPUs scale too if they're coded
correctly. And cloud datacenters are the worst place for that given their
craptastic ~10 Gb/s interconnect subject to arbitrary network weather effects.
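To put a number on the interconnect complaint (model size and link speeds are illustrative assumptions, not measurements): shipping one set of gradients for a ~250 MB model over a 10 Gb/s link already costs hundreds of milliseconds per sync in the ideal case, before any "network weather":

```python
# Back-of-the-envelope: ideal time to ship one set of gradients between
# nodes. Model size and link speeds are illustrative assumptions.
def transfer_seconds(size_bytes, link_gbits):
    """Zero-overhead time to move size_bytes over a link_gbits link."""
    return size_bytes * 8 / (link_gbits * 1e9)

grads = 250e6  # ~250 MB of gradients (AlexNet-scale; an assumption)

for name, gbits in [("10 Gb/s Ethernet", 10), ("56 Gb/s InfiniBand", 56)]:
    print(f"{name}: {transfer_seconds(grads, gbits) * 1e3:.0f} ms per sync")
# 10 Gb/s Ethernet: 200 ms per sync
# 56 Gb/s InfiniBand: 36 ms per sync
```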

Or, butchering Seymour Cray: your life depends on winning a race - would you
bet it on one 1,350 HP Venom GT or on twenty 179 HP Scion FR-Ss? I mean,
collectively that's almost 3,600 HP, right? Except it's even worse, because
for GPUs vs. CPUs it's as if they priced the Scion FR-S like a Venom GT and
vice versa.

I wish you luck finding Xeon Phi winning anything but synthetic tests against
yesterday's news:

[https://www.xcelerit.com/computing-benchmarks/libor/intel-xeon-phi-vs-nvidia-tesla-gpu/](https://www.xcelerit.com/computing-benchmarks/libor/intel-xeon-phi-vs-nvidia-tesla-gpu/)

------
cynod
@iamleppert Agree, getting the details on the testing would be helpful. I
think NV was more pointing out that Intel was putting their latest against
NV's oldest. It'd be like testing an RX 480 against a GTX 660. What's the use
of that?

@modeless the new Azure instances have M60s, or you can purchase a 1080 or
new Titan X, which are both available (although stock has been tight).

[https://azure.microsoft.com/en-us/blog/azure-n-series-preview-availability/](https://azure.microsoft.com/en-us/blog/azure-n-series-preview-availability/)

~~~
modeless
Yeah, but Azure is also introducing new K80 instances (based on four-year-old
Kepler) because the M60 is worse in memory capacity and memory bandwidth, and
not much better in FLOPS. NVIDIA is just not putting their best hardware out
there for cloud providers to use.

Sure I can buy a Titan X for myself (already have), but I can't rent a
hundred, or even one, on EC2 or Azure or GCE. And I can't get a P100 yet at
all. I don't want to hear NVIDIA claiming unfair benchmarking and citing P100
numbers until P100s are actually available either to buy (and ship
immediately) or in the cloud.

------
iamleppert
Why don't they provide a link to their testing methodology? They need to back
up their claims (on both sides) with the actual configuration, all versions,
and sample datasets for people to independently verify.

A docker container that runs their performance suite would be ideal.

~~~
onalark
Except that Docker containers play terribly with virtualization solutions.
Still, some sort of configuration/infrastructure-as-code would go a long way.

~~~
mctx
What about NVIDIA-docker?

[https://github.com/NVIDIA/nvidia-docker](https://github.com/NVIDIA/nvidia-docker)

------
Gladdyu
`each with one or two sockets of less-capable processors, like Xeon Phi.`

Corporate smacktalk...

------
StreamBright
I do not usually believe any vendor-provided performance benchmark unless I
fabricated it myself. On a more serious note, benchmarking is pretty hard,
and people usually discount things that seem minor until they turn out to be
the bottleneck - like the interconnect in this case, for example. Another
problem with synthetic benchmarks is that you can always optimize for your
exact use case, and that usually yields pretty good improvements, comparable
to buying faster equipment. The ultimate question is which is more
cost-efficient: buying a faster CPU/GPU or hiring a performance expert.
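For what it's worth, the usual mitigations for noisy benchmarks are warmup runs plus several timed repeats, reporting the minimum (or median) rather than a single sample. A minimal sketch (the workload here is a made-up stand-in):

```python
# Minimal benchmarking harness: warm up first (caches, JIT, clock
# ramp-up), then time several repeats and report the minimum, which is
# the least noise-contaminated estimate of the true cost.
import time

def bench(fn, *args, warmup=3, repeats=5):
    for _ in range(warmup):
        fn(*args)
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - t0)
    return min(times)

def workload(n):
    # Stand-in for the real kernel under test.
    return sum(i * i for i in range(n))

print(f"best of 5: {bench(workload, 100_000) * 1e3:.2f} ms")
```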

------
ldargin
I'm a bit surprised that Nvidia mentioned nothing about performance per watt
in their reply.

~~~
NotQuantum
Or that they didn't mention cost per performance. They equated 4 Xeon Phi
servers to ONE DGX-1. The DGX-1 has a $140K price tag.

~~~
sp332
That was for performance scaling, not raw performance. Although 140k probably
includes a nice interconnect.
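Using only the figures quoted in this thread (~$7K per Xeon Phi, $140K for a DGX-1), the hardware-cost side of that comparison is easy to sanity-check (list prices only - this says nothing about delivered performance):

```python
# Hardware-cost comparison using prices quoted in this thread only;
# purely illustrative, not a perf-per-dollar measurement.
xeon_phi_node = 7_000   # "~$7K" per Knights Landing part, quoted upthread
dgx1 = 140_000          # DGX-1 price tag, quoted upthread

phi_cluster = 4 * xeon_phi_node
print(f"4x Xeon Phi: ${phi_cluster:,}  vs  DGX-1: ${dgx1:,}")
print(f"the DGX-1 costs {dgx1 / phi_cluster:.0f}x as much")
# -> the DGX-1 costs 5x as much
```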

------
nzjrs
I actually came away from the article thinking, heh Phi looks pretty good.

------
dmoy
Well that was a pretty concise explanation of mistakes. Benchmarking is indeed
difficult to get right.

~~~
mtgx
Especially when you deliberately compare against your competitor's
four-year-old product.

------
sidkashyap
They do not mention the version of Caffe used to test the Intel systems.
Intel claims its numbers are based on an optimized branch of Caffe, not the
public (BVLC) version.

------
NotQuantum
Shots fired. Wow, Nvidia is pissed that Intel is trying to edge into the
market with their Phi series.

