
Intel Declares War on GPUs at Disputed HPC, AI Border - jonbaer
https://www.nextplatform.com/2016/11/20/intel-declares-war-gpus-disputed-hpc-ai-border/
======
stonogo
Intel's roadmap is worthless, since they only seem to actually produce a given
Xeon Phi SKU for about two years. Nobody -- and I mean nobody -- is happy with
Intel's complete lack of availability of hardware replacements as the MIC
add-in cards fail. I can only hope the self-booting Knights Landing product is
... stabler.

I am aware of about fifteen different sites that are enacting significant
architecture changes as the 5110 supplies dry up and the 7120s are clearly
next.

It does not matter which product is technically superior when one of them is
literally unsupportable.

------
nl
So.. speaking as someone who does some deep learning (but quite a bit of ML
more broadly).

Intel are correct that the memory size on GPUs is increasingly becoming a
problem in some complex deep learning models. Having access to cheap(ish) DDR4
RAM using the same programming model would be a big win.

Interestingly, AMD is going after a related market (visualization) which is
also very RAM hungry by building GPUs with M.2 SSDs onboard[1]. AFAIK no one
has tried these for deep learning.

But within the deep learning community there's a fair bit of cynicism about
Intel's claims. Over the last few years they have frequently claimed parity
with GPUs... any day now. Yet they keep falling further behind in any useful
benchmark.

[1] [http://www.anandtech.com/show/10518/amd-announces-radeon-pro...](http://www.anandtech.com/show/10518/amd-announces-radeon-pro-ssg-fiji-with-m2-ssds-onboard)

~~~
conjectures
More memory would indeed be a big deal for ML applications.

I found this comment useful, as it's not clear to me whether 'Knights Landing'
will actually be comparable to GPUs in terms of pure grunt.

Then I wonder if Nvidia will respond with more memory on their cards. From my
totally uninformed perspective, that would seem easier than the transition
Intel are trying to go for.

~~~
nl
The programming model for GPUs isn't great - since they aren't general purpose
platforms you have to keep thinking about moving data around between the GPU
memory and the main memory.
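
As a rough illustration of that bookkeeping (a minimal sketch with made-up buffer
names, not anyone's production code), the host side of a CUDA program has to do
something like this around every kernel:

    // Explicit allocation and copies between main memory and GPU memory
    // (error checking omitted for brevity).
    #include <cuda_runtime.h>
    #include <cstddef>
    #include <vector>

    void process_on_gpu(std::vector<float>& data) {
        float* d_buf = nullptr;
        std::size_t bytes = data.size() * sizeof(float);

        cudaMalloc(&d_buf, bytes);                                      // allocate GPU memory
        cudaMemcpy(d_buf, data.data(), bytes, cudaMemcpyHostToDevice);  // host -> device

        // ... launch kernels that operate on d_buf ...

        cudaMemcpy(data.data(), d_buf, bytes, cudaMemcpyDeviceToHost);  // device -> host
        cudaFree(d_buf);
    }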

Intel could bypass that and end up with an easier programming model. But..
believe it when you see it.

Note that this is almost entirely about deep learning though. Non-DL machine
learning is mostly done on CPUs at the moment (with a few exceptions) and
generally performs adequately. More speed is always nice, but you don't hear a
lot of complaints about XGBoost's speed, or FastText, or Vowpal Wabbit, or the
various FTRL regression algorithms implemented on CPUs.

~~~
deepnotderp
Yes, but deep learning is clearly the future. I'd love to fully unroll my
RNNs.

------
dman
Intel seems to be taking a long time getting things to market in terms of SKUs
that the general developer can buy. This is critical for gaining mindshare for
the programming model. Both Knights Landing and the new chips from Altera have
failed to materialise into general availability in any SKU priced under $500. Part
of what has made Nvidia so successful here is that the CUDA ecosystem scales
up and down their entire product line.

~~~
zamalek
> the general developer can buy

Having a non-integrated GPU (i.e. NVIDIA or AMD) in a machine is not a stretch
of the imagination these days. This means that any curious dabbler can grab
Tensorflow and start learning ML. Over the past few months, I've been seeing
exactly this on HN: many developers with ML Show HNs. In my opinion, it is
precisely the general availability of ML that has caused this explosion.

As you pointed out, the above scenario falls away entirely under an expensive
SKU. I strongly believe that having to buy any specific SKU in order to
approach this stuff will spell doom for the tech. If it's not on every SKU
down to the Core 2, I really can't see it taking off.

~~~
petra
Why would an expensive SKU be a problem? As long as they offered it as a
service and the ROI were significantly better, it would be fine.

~~~
nl
Because library developers want to have things available without paying by the
hour.

Once you have the software, a service is fine. But the ecosystem of tools
around NVidia CUDA and CuDNN is there because you can buy a gaming card for a
few hundred dollars and hack up libraries you find useful.

------
vmarsy
Interesting article presenting a "war" between Intel CPUs and Nvidia GPUs.

I'm curious where AMD falls in this war. While it's true GPUs can provide
really nice theoretical TFLOPS, the cost of moving data in and out of the
GPU's memory is a well-known issue. This makes GPUs much less attractive for
real applications.

The reason I mentioned AMD is that I read some time ago about their
Heterogeneous System Architecture [1]. One of HSA's goals is to get rid of
this particular issue of moving the data. I wonder if this has been adopted in
any HPC cluster or anywhere else.

[1] [http://developer.amd.com/resources/heterogeneous-computing/w...](http://developer.amd.com/resources/heterogeneous-computing/what-is-heterogeneous-system-architecture-hsa/)

~~~
magila
Right now AMD is fighting for its life, it's in no position to take on Intel
and Nvidia in the HPC space. If their consumer Zen processors flop then AMD
will likely be heading for a buyout and/or bankruptcy. Only if both Zen and
Vega (their upcoming high-end GPU) are huge successes will they have the
resources to mount a serious effort to capture some of the HPC market.

~~~
rdtsc
> Right now AMD is fighting for its life,

I thought so too, but looking at its stock recently it seems to be doing much
better (especially in the last year). Not sure how to interpret that. I wonder
if it was mostly investors being happy they divested from that ARM microserver
company, or whether there is a genuine upswing and increased market share.

~~~
dave_sullivan
> looking at its stock recently it seems to be doing much better (especially in
> the last year). Not sure how to interpret that.

One interpretation: NVIDIA has been doing a very good job of A) pursuing deep
learning and B) talking that up on earnings calls. Even if they're not seeing
much money from it right now, investors are excited that they might in the
future.

This leads some investors and algos to think, "Hey, NVIDIA is doing well, and
AMD is nominally in the same business as NVIDIA, so let's buy AMD because it's
cheaper and therefore must represent a better value." This drives up the price
of AMD stock, which is itself already prone to more volatility because of
their lower market cap.

However, they are two fundamentally different companies. I hear a lot about
founder-led vs non-founder-led businesses, and AMD vs NVIDIA is a good example
of non-founder-led vs founder-led at the moment. Also, they've done a very
poor job with the ATI acquisition. The ATI acquisition is why they can even
compete with NVIDIA in the first place, but that train has basically left the
station at this point.

Intel is in a weird position because their market cap is 10x NVIDIA's, so a
market has to promise literally 10x more money to interest Intel than it would
to interest NVIDIA. Victims of their own success, innovator's dilemma, etc.
Personally, I think that will hinder them, because everything they want to try
won't be "big enough", while NVIDIA doesn't suffer from that problem and AMD
just has no idea what it's doing besides professional supply chain management.

Respect where it's due, Nvidia will win the hardware side of deep learning and
fast matrix multiplication for the foreseeable future.

~~~
dharma1
Right now the only thing I can see holding AMD back is their lack of vision to
drop a bit of cash on a CuDNN equivalent (and the ML framework support for
it), especially fast convolution kernels optimised for their hardware. They
are waiting for the open source community to do the work for them, but no one
is picking it up because it's a lot of work.

A couple of good AMD engineers should be able to knock out fast Winograd
kernels pretty quick.

~~~
ansible
Their best bet is to increase support for OpenCL rather than invent something
new. And then improve / create OpenCL support for TensorFlow and other popular
toolkits.

~~~
dharma1
Yes, that would be OpenCL/SPIR-V. But you still need to put in the work to
optimise convolution kernels for specific hardware.

TF has OpenCL support in the works, AFAIK outsourced to Codeplay - [https://www.codeplay.com/portal/tensorflow%E2%84%A2-for-open...](https://www.codeplay.com/portal/tensorflow%E2%84%A2-for-opencl%E2%84%A2-using-sycl%E2%84%A2)

Caffe has an OpenCL branch; it works on AMD but at about a quarter of the
speed of equivalent NVidia hardware using CuDNN 4/5, mostly due to unoptimised
kernels -
[https://github.com/BVLC/caffe/tree/opencl](https://github.com/BVLC/caffe/tree/opencl)

------
jakozaur
Intel doesn't have much natural room to expand. They completely missed mobile,
and Moore's Law is slowing down, so their edge is slowly eroding.

They've been talking about killing the GPU for many years, going at it from
both technical and not-so-technical angles. E.g. in 2011 Intel paid $1.5B to
NVIDIA, because Intel had killed NVIDIA's chipset business through not-so-fair
licensing practices.

Meanwhile NVIDIA invested heavily in its GPU computing ecosystem, CUDA. For
many of its early years it was a niche, but a quickly growing one, and then
they hit the jackpot with deep learning. I wouldn't be surprised if the
majority of self-driving cars end up with an NVIDIA chip in them.

~~~
miloofcroton
The only problem with Nvidia, if you look at the entire system, is that they
rely on Intel's CPUs for most of the computer systems that they are a part of.
Sure, they have their ARM-core CPUs that have proven to be decent, but I don't
see them blowing anyone out of the water with those. Furthermore, there is no
interchanging of GPUs, so the only people using Nvidia's CPUs are the ones
that buy into Nvidia's entire ecosystem, which is necessarily going to be
fewer people/groups.

AMD, on the other hand, while fighting Intel and Nvidia for the past decade
and coming up short, may be at the horizon of a breakthrough. Only AMD seems
ready to find the synergy between CPUs and GPUs. A Zen and Vega combination of
architectures could essentially be an Intel i7 top of the line + an Nvidia GTX
top of the line. Add some special sauce, and they could really do it.

Nothing that Intel has done in a long time has been impressive, but Nvidia is
on fire. AMD's CPUs may help them leap ahead instead of merely catching up
(which Vega may do anyway).

~~~
frozenport
AMD has been trying to find a synergy between CPUs and GPUs for over a decade.
Heck they were the ones who bought ATI! They've been unable to deliver because
"a synergy" between GPU and CPU doesn't mean very much.

In the best case, the architecture will be able to avoid memory copies; but,
although it's inconvenient, existing applications have been able to pipeline
host-to-device/device-to-host memory copies, effectively negating the problem.

Even on this limited promise, AMD hasn't been able to deliver.

~~~
geezerjay
> They've been unable to deliver because "a synergy" between GPU and CPU
> doesn't mean very much.

Contrary to what you may believe, that "synergy" does mean a lot, considering
that one of the world's fastest supercomputers runs on Opterons and NVIDIA
Tesla cards.

The thing is, high-performance computing is a niche market, and typical
consumer hardware needs aren't CPU-driven but GPU-driven. Computer games don't
take advantage of (or need) multi-core CPUs, so AMD's bet on the 6-core and
8-core AMD FX line didn't pay off as expected. Yet everyone needs a good GPU
to run games, and the "synergy" between what the CPU delivers and what the GPU
processes ends up being the bottleneck in consumer hardware.

~~~
frozenport
Except you haven't told me what synergy means. Also, the CPUs mentioned have
fused FP units, meaning their cores are more comparable to Intel's
hyperthreads. They often failed to perform at i5 levels, and any small
performance increase was offset by higher energy consumption.

------
llukas
"Second, with the above notes in mind, remember that Intel’s strength even
with its own accelerator, is that there is no funky programming model—it’s
CUDA versus good old X86."

Now, can someone please show us non-trivial use cases where recompiling "good
old X86" for the Knights Mill architecture yields satisfactory performance?

To get good performance you need to ensure your code makes good use of Knights
Mill vector instructions (AVX-512*).
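
To give a sense of what that means in practice, here is a minimal sketch using
the public AVX-512F intrinsics (the function and the loop are illustrative, not
from the article); each instruction processes 16 single-precision lanes:

    // Explicitly vectorized elementwise add; assumes n is a multiple of 16
    // (a scalar remainder loop would be needed otherwise).
    #include <immintrin.h>

    void add_arrays(const float* a, const float* b, float* c, int n) {
        for (int i = 0; i < n; i += 16) {
            __m512 va = _mm512_loadu_ps(a + i);
            __m512 vb = _mm512_loadu_ps(b + i);
            _mm512_storeu_ps(c + i, _mm512_add_ps(va, vb));
        }
    }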

~~~
tlb
Most likely you call BLAS routines for matrix multiply, which use the right
instructions. So any code using BLAS (which includes numpy) will run fast.

Using a GPU is much more complex, because it has a separate memory space and
you have to copy arrays back and forth. It's easy to lose all your performance
gains if you get the array allocation wrong.
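
To make that concrete, here is a minimal sketch of the kind of BLAS call that
dominates such workloads (standard CBLAS interface; whether it runs through MKL,
OpenBLAS, or something else is decided at link time, and the matrix names are
made up):

    // C = 1.0 * A * B + 0.0 * C, row-major, no transposes.
    // The BLAS implementation picks the right vector instructions for the CPU.
    #include <cblas.h>

    void matmul(const float* A, const float* B, float* C, int m, int n, int k) {
        cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    m, n, k,
                    1.0f, A, k,   // alpha, A, lda
                          B, n,   // B, ldb
                    0.0f, C, n);  // beta, C, ldc
    }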

~~~
llukas
If you use KNL as an accelerator, you get the same problems as with a GPU.

If you're using it in standalone mode, then keep in mind that every part of the
code that doesn't use AVX runs on slow Atom-like cores. You have to vectorize
the code yourself or find a library that has done it for you.

Neither of the above is close to "recompile good old X86 and run".

~~~
CmdrSprinkles
The argument is that you should be vectorizing your code anyway. You won't see
as much of a boost on a Skylake as you would on a KNL, but you are still going
to see a pretty good boost.

Obviously if your code is completely serial (and arguably bad), you have a lot
of work ahead of you. Whereas, if you have been taking advantage of AVX
already (or relying on libraries that do), you are basically just going to
need to recompile.
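
A rough sketch of that best case (an assumed setup, not a benchmark): a loop the
compiler can vectorize gets rebuilt for the wider units, and only the compile
flags change; the Intel-compiler flags in the comment are the usual ones but are
illustrative here.

    // Built with e.g. -xCORE-AVX2 for a regular Xeon or -xMIC-AVX512 for KNL,
    // the compiler maps this loop to whatever vector width the target has.
    void fma_loop(const float* a, const float* b, float* c, int n) {
        #pragma omp simd
        for (int i = 0; i < n; ++i) {
            c[i] += a[i] * b[i];
        }
    }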

Are you going to get peak on a KNL? God no. But if it runs well on a "normal"
processor, it is likely to run pretty well on the KNL. That is a MUCH better
foundation to start from than if you are trying to port a code to a GPU.

And those tweaks you make to better utilize that KNL? They will probably
improve normal performance as well.

It is obviously not as simple as the marketing led us to believe, but it never
is. GPUs are a load of toss without a lot of work (and often some creative
presentations). But the underlying principle of "Use the same code on both
types of processor" does actually hold true as of KNL.

As for performance numbers: Check the presentations from SC as they become
available. Multiple outlets have begun to publish, or at least present, these
numbers and their experiences using KNLs. I would personally argue that we
still have a long way to go, but there is actually light at the end of the
tunnel and it does look like there is an actual path now.

------
SEJeff
I'm pretty sure Nvidia is cleaning house against Intel when you compare
something like, say, a K80 vs a Phi. The number of people who can do amazing
and wonderful things (and the frameworks to support them) with an Nvidia GPU
vs a Phi is also dramatically different.

~~~
mattkrause
nVidia has really hustled to make that happen.

I expressed some vague interest in using CUDA for something and we were
offered a demo account on one of their boxes, invited to apply for an academic
hardware grant, and pointed at tons of training material (books, a MOOC,
conferences, and even some 1-2 day sessions), libraries, and other resources.

On the other hand, my institute apparently has some Phis. I discovered this
because there's a small queue with that name on our cluster. I don't think
I've ever been contacted by an Intel rep or offered training material--or even
marketing material--from Intel, let alone a Phi of my own to experiment with.

------
nullnilvoid
Nvidia's stock price has tripled from $30 to $93 in the last 12 months, while
Intel's share price stayed at $34, exactly the same price a year ago. Although
Intel is still a lot bigger, they want a piece of that pie too. That said, the
chance for them to win a significant market share is slim. They have failed a
few times at entering the standalone GPU market. Will this time be different?

~~~
moyta
Nvidia has taken over a newer segment, cars, and they have a stronghold in the
HPC market due to the sheer number of academic compute clusters that have
based their setups on Nvidia hardware.

Intel has forced Nvidia out of the mobile and low-end/midrange desktop space,
eliminating what used to be their bread and butter. So this pivot by Nvidia to
markets where they either control the whole stack with their custom ARM cores,
or where Intel can't lock them out by taking away PCIe lanes as on
laptops/desktops (AMD would get subbed in, because PCIe lanes are all that
matter), has definitely put them in a stronger position, justifying that
valuation.

------
formula1
The article mentions that to write for Intel's Xeon Phis, it's just x86. I
imagine this is mostly accurate, but is writing for this as simple as creating
another thread? That seems naive.

I looked into it: [0] it looks like you add some compiler directives before the
code you want to offload to the coprocessor. It doesn't look extreme, mostly
naming which variables need to be sent and to where. I've never worked with
CUDA, so I don't know whether it's a pain or not. Looking into it, it seems
that instead of plain x86 code it's CUDA-prefixed functions that get offloaded.
Granted, there are speed advantages to having direct access to memory, but I
don't think the API is the biggest selling point.
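
For reference, the offload directives described in that guide look roughly like
the sketch below (Intel's offload pragmas; the function, array names, and
clauses here are illustrative, not copied from the PDF):

    // Offload a loop to the Xeon Phi coprocessor; the in()/out() clauses name
    // which arrays get copied to the card and back.
    void scale(const float* a, float* b, int n) {
        #pragma offload target(mic) in(a : length(n)) out(b : length(n))
        {
            #pragma omp parallel for
            for (int i = 0; i < n; ++i) {
                b[i] = 2.0f * a[i];
            }
        }
    }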

I'm quite interested in seeing whether Nvidia thinks this is a threat and how
they will respond. New systems may stay with Nvidia just because the experience
and the failures have already been worked through there. Though if the speed
gains are even 25%, I would bet there would be a migration.

[0] [https://software.intel.com/sites/default/files/managed/ee/4e...](https://software.intel.com/sites/default/files/managed/ee/4e/intel-xeon-phi-coprocessor-quick-start-developers-guide.pdf)

~~~
m_mueller
Here is the main problem IMO: to get reasonable performance (i.e. comparable
to a GPU), you need to make use of vector instructions as well. And it's still
not easier to vectorize your x86 code than to just port it to CUDA. With CUDA
you get two parallelisms done in one programming paradigm: both vector and
multiprocessor parallelism just get mapped onto the kernel launch ("scatter")
configuration, and the kernel itself is written as scalar operations plus some
index calculations. In terms of readability / ease of use it's [naive x86] >
[CUDA] > [performance-portable parallelized + vectorized x86]. And if you don't
have that last version of x86, which almost no one has, then Xeon Phi will be
just as much porting work as a GPU, if not more (because Nvidia has a
considerable lead in tooling support, and has actually extended that lead since
Intel's Phi came out, because Intel is just not very agile).
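
A minimal sketch of what "scalar operations plus some index calculations" looks
like in CUDA (the classic SAXPY example, not code from Hybrid Fortran or the
article):

    // Each thread handles one element; the launch configuration supplies both
    // the vector-level and the multiprocessor-level parallelism.
    __global__ void saxpy(int n, float a, const float* x, float* y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // index calculation
        if (i < n) {
            y[i] = a * x[i] + y[i];                     // scalar body
        }
    }

    // Launched with one thread per element, e.g.:
    //   saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);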

~~~
gcp
_And it's still not easier to vectorize your x86 code than to just port it to
CUDA._

This, a million times! The SIMT programming model that GPUs (except Intel's)
support so well is far easier to program and parallelize for than the explicit
SIMD that x86 requires.

PS. Are you the person that wrote gogui?

~~~
m_mueller
Nope, but I'm writing this:

[https://github.com/muellermichel/Hybrid-Fortran](https://github.com/muellermichel/Hybrid-Fortran)

------
chunsj
Though I'm not an expert in this area: could Intel turn hardware (FPGAs) into a
software-like commodity by making the FPGA design open source or something
equivalent, unlike the other existing vendors? If Intel can do this, I think
Intel can regain their competitive edge in the long term.

------
wyldfire
> big plans to pull the mighty GPU down a few pegs in deep learning...will
> offer a 100X reduction in training time compared to GPUs...loosely laid down
> plans to eventually integrate...hesitant...on when

"Please don't get too far into the OpenCL/SPIR/etc software ecosystem, guys.
We will totally lap these GPUs Real Soon Now. And the best part is that it
will all be x86!"

I really wish Intel would produce a real GPU with fixed-function parts like
the competition (and GDDR5 or whatever's next!). Their most attractive quality
is their commitment to the open source world.

------
madenine
> 100X reduction in training time compared to GPUs

Sounds great, when will it..

>Intel loosely laid down plans to eventually integrate the unique Nervana
architecture with Xeons this week, but was hesitant to put a year stamp on
when that might happen.

So their theoretical eventual integration is hypothetically 100x faster than
current-gen GPUs?

If they can deliver, great news for deep learning - but this seems like a lot
of speculation.

