
Radeon Instinct – Optimized Machine and Deep Learning - hatsunearu
http://radeon.com/en-us/instinct/
======
slizard
What's particularly interesting here is that the Fiji card they propose is a
very different beast than any of the NVIDIA offerings.

The MI8 card's HBM gives it a big power and performance advantage (512 GB/s peak
bandwidth) even though it's on 28 nm. NVIDIA has nothing with even remotely
comparable bandwidth in this price/perf/TDP regime. None of the NVIDIA
GP10[24] Teslas have GDDR5X -- not too surprising given that it was rushed to
market, riddled with issues, and barely faster than GDDR5. Hence, the P4 has
only 192 GB/s peak BW; while the P40 does have 346 GB/s peak, it has a far higher
TDP and a different form factor, and is not intended for cramming into custom
servers.

[I don't work in the field, but] to the best of my knowledge inference is
often memory-bound (AFAIK GEMV-intensive, so low flops/byte), so the Fiji card
should be pretty good at inference. In such use-cases GP102 can't compete in
bandwidth. So the MI8, with 1.5x the FLOP rate, 2.5x the bandwidth, and likely ~2x
higher TDP (possibly configurable like the P4), offers an interesting
architectural balance which might very well be quite appealing for certain
memory-bound use-cases -- unless of course those same cases also need large
memory.
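
The memory-bound intuition can be sketched with a quick arithmetic-intensity estimate (a toy Python calculation; the fp32 element size and the matrix dimension are illustrative assumptions, not figures from the announcement):

```python
def gemv_intensity(n, bytes_per_elem=4):
    # y = A @ x : 2*n*n flops, but the whole n*n matrix must stream from memory
    flops = 2 * n * n
    bytes_moved = bytes_per_elem * (n * n + 2 * n)  # read A and x, write y
    return flops / bytes_moved

def gemm_intensity(n, bytes_per_elem=4):
    # C = A @ B : 2*n^3 flops over only 3*n^2 elements touched
    flops = 2 * n ** 3
    bytes_moved = bytes_per_elem * 3 * n * n
    return flops / bytes_moved

n = 4096
print(f"GEMV: {gemv_intensity(n):.2f} flops/byte")  # ~0.5 -> bandwidth-bound
print(f"GEMM: {gemm_intensity(n):.1f} flops/byte")  # ~683 -> compute-bound
```

At ~0.5 flops/byte, a GEMV-heavy inference workload saturates memory bandwidth long before it saturates the ALUs, which is why the 512 GB/s figure matters more than peak FLOPs here.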

Update: I should have looked closer at the benchmarks in the announcement. In
particular, the MIOpen benchmarks [1] show the MI8 clearly beating even the
TITAN X Pascal, which has higher BW than the P40; that indicates this card will
be pretty good for latency-sensitive inference as long as everything fits in 4 GB.

[1]
[http://images.anandtech.com/doci/10905/AMD%20Radeon%20Instin...](http://images.anandtech.com/doci/10905/AMD%20Radeon%20Instinct_Final%20for%20Distribution-page-021.jpg)

~~~
jsheard
Those MIOpen benchmarks are a bit dubious, since MIOpen is AMD's own deep
learning framework. It's unlikely that code written by AMD is optimal for the
Nvidia hardware.

To be realistic you need to compare AMD hardware running MIOpen to NV hardware
running a framework backed by cuDNN.

~~~
slizard
It's clearly indicated on the slide that those are Deepbench [1] GEMM and
GEMM-convolution numbers. Data for M40, TITAN Maxwell/Pascal and Intel KNL is
actually provided by Baidu in their Github repo.

[1] [https://github.com/baidu-research/DeepBench](https://github.com/baidu-research/DeepBench)

~~~
jsheard
Sorry, not sure how I overlooked that.

------
zitterbewegung
Does anyone use AMD for deep learning in science / industry? All the deep
learning libraries I have seen require CUDA, and NVIDIA is winning merely by
being the most popular API. Searching GitHub, it looks like the OpenCL projects
are mostly university assignments; see
[https://github.com/search?utf8=%E2%9C%93&q=opencl+deep+learn...](https://github.com/search?utf8=%E2%9C%93&q=opencl+deep+learning&type=Repositories&ref=searchresults)

~~~
peller
For what it's worth, AMD has been working on a CUDA compatibility solution for
at least a year now. Announcement[0] and Progress[1]

[0] [http://www.anandtech.com/show/9792/amd-sc15-boltzmann-initia...](http://www.anandtech.com/show/9792/amd-sc15-boltzmann-initiative-announced-c-and-cuda-compilers-for-amd-gpus)

[1] [http://www.anandtech.com/show/10831/amd-sc16-rocm-13-release...](http://www.anandtech.com/show/10831/amd-sc16-rocm-13-released-boltzmann-realized)

~~~
slizard
Additionally, here's an example of how Caffe was ported using HIP [1]. To be
honest, if the approach really does work, you might see a very quick increase
in the number of applications ported.

All in all, given how elegant HIP is and that HCC seems to make GPUs more
approachable than CUDA (and less silly than OpenACC), there is great
potential for AMD to gain some traction. My greatest concern is the quality
and robustness of their software stack, their overly optimistic view (at least
from the outside), and their relationship with the rest of the OSS world,
especially given the conflicts they seem to be running into with
upstream contributors [2].

[1]
[https://www.youtube.com/watch?v=I7AfQ730Zwc](https://www.youtube.com/watch?v=I7AfQ730Zwc)
[2]
[https://news.ycombinator.com/item?id=13136426](https://news.ycombinator.com/item?id=13136426)
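
To give a sense of why HIP porting can be so quick: CUDA runtime calls map almost 1:1 onto HIP equivalents, so much of a port is mechanical renaming (the real tool is AMD's hipify; this is just a toy Python sketch of the idea, with a tiny hand-picked subset of the mapping table):

```python
# Illustrative subset of the CUDA -> HIP rename table used by hipify-style
# source-to-source translation. The real tool covers the full runtime API.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

def hipify(source: str) -> str:
    # Naive textual rewrite; enough to show the flavor of the translation.
    for cuda_name, hip_name in CUDA_TO_HIP.items():
        source = source.replace(cuda_name, hip_name)
    return source

cuda_src = "cudaMalloc(&d_x, n); cudaMemcpy(d_x, h_x, n, cudaMemcpyHostToDevice);"
print(hipify(cuda_src))
# -> hipMalloc(&d_x, n); hipMemcpy(d_x, h_x, n, hipMemcpyHostToDevice);
```

Because the APIs are deliberately kept isomorphic, the hard part of a port is usually the hand-tuned kernels, not the host code.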

~~~
jsheard
The catch, which AMD are quick to gloss over, is that Caffe's performance on
Nvidia hardware largely comes from its use of Nvidia's proprietary cuDNN
kernels.

AMD's HIP port uses the "fallback" open-source CUDA kernels, which are
nowhere near as fast as the hand-optimized cuDNN code.

------
visionscaper
There seems to be a considerable effort underway to get TensorFlow working
with OpenCL [0]. Also see [1]. This coincides nicely with the
introduction of these AMD cards.

I'm looking forward to the day that Nvidia gets some competition in the GPUs-
for-deep-learning market. Being able to run some smaller deep learning
experiments on my MacBook Pro with its AMD discrete GPU is another benefit I'm
looking forward to ;)

[0]
[https://github.com/tensorflow/tensorflow/issues/22](https://github.com/tensorflow/tensorflow/issues/22)

[1] [https://github.com/benoitsteiner/tensorflow-opencl](https://github.com/benoitsteiner/tensorflow-opencl)

------
rsp1984
Interesting that NVDA is down almost 4% for the day [1] while AMD is up 3%
[2]. Is Wall Street realizing that NVidia is not alone in the ML Hardware
space?

[1]
[https://www.google.com/finance?q=NASDAQ:NVDA](https://www.google.com/finance?q=NASDAQ:NVDA)

[2]
[https://www.google.com/finance?q=NASDAQ%3AAMD](https://www.google.com/finance?q=NASDAQ%3AAMD)

~~~
Twirrim
It's way too early to be making guesses like that.

Micro-trends are mostly meaningless with stocks unless you're trying to do
high-frequency work. Stocks shift and change to small extents all the time
based on the quirks of all sorts of trading companies.
_Especially_ if there hasn't been any significant news about the company, and
this isn't significant news. It's interesting, but it isn't really threatening
Nvidia's dominance or profits at the moment. AMD needs to make a bigger name
for itself in the sector and start picking up some splashy customers before
most of the market will react.

Looking at the bigger picture, Nvidia stock is up 160% so far this year, but
has been fluctuating a bunch over the last month or so, and it's still well
within the scope of those fluctuations.

------
LeanderK
Ahh, what exciting times we live in. Just look at the example applications:

\- autonomous vehicles

\- autopilot drone

\- personal assistant

\- personal robots

\- ...

i know it's optimistic, but it's not science-fiction.

~~~
akerro
\- running out of natural resources

\- child slavery to build new iPhones in Africa and China

\- killing and burning wildlife to build new farms in Latin America

\- still not having any solution for problem of drinkable water in 2/3 of
World

What a time to be alive!

~~~
grzm
Yes, the world is not perfect by any stretch of the imagination. Lots of
things to be improved. If you could choose another time to live in, when would
it be?

~~~
akerro
Wow, that's a great question. I can't decide if it would be in 20 years or 40
years ago. Just to be part or witness of creating Internet as we know it or as
we will see it! When some of the above problems don't exist any more because
of technological advancement and some worsen because of the very same reason,
but we no longer see them, because no one is looking at the wild parts of the
world.

------
echelon
What's a good GPU / setup for someone doing deep learning at home? Does anyone
have recommendations?

~~~
dylanbfox
I've been curious about this too. AWS does have K80 instances available for
$0.90/hour which isn't too bad for playing around and as long as they keep
updating their infrastructure, you can play with the newest stuff versus
having to upgrade your own all the time.
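
A rough break-even estimate for renting vs. buying (the $0.90/hr figure is from above; the ~$700 card price is an illustrative assumption for a consumer Pascal card, not a quoted figure):

```python
# Back-of-the-envelope: how many rented GPU-hours equal the price of a card?
rental_per_hour = 0.90   # K80 spot-style rate mentioned above, $/hr
card_price = 700.0       # assumed consumer Pascal card price, $

breakeven_hours = card_price / rental_per_hour
print(f"~{breakeven_hours:.0f} hours of rental equals the card price")
```

So for occasional experimentation (a few hundred hours a year), renting looks reasonable; for continuous training, owning the hardware pays off within weeks.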

~~~
echelon
That doesn't sound that bad. I might investigate AWS before putting an
investment into my own dedicated hardware.

I'm just starting off, so my primary concern is speed of iteration and
learning. I want to train on and generate audio phonemes, so this will
undoubtedly take a lot of practice.

~~~
Florin_Andrei
If you also play video games, then just get one of the new Nvidia Pascal chips
and install it in your home PC. Dual-boot Windows / Linux. Which card to get?
Bigger is better but it also depends on your wallet.

------
kyledrake
One thing that would be interesting is if you could use cards like this for
rendering multiple instances of X, for the purpose of running things like
WebGL browser screenshotters.

I had to ship out a high-end gamer GPU with a dummy HDMI adapter for this
purpose recently. But it's obviously not very efficient. It would also be nice
to be able to run multiple screens in parallel, not just one per GPU.

I doubt there will ever be a product for my use case, but one can dream...

That said, these are cool. I think they're lower power than the Nvidia
equivalent, but I could be mistaken (I just recall the Tesla models being
power hungry.. enough to cause a real problem in a datacenter rack).

------
raj_m
I really don't think this will make a dent in CUDA's platform. CUDA has a well-
established ecosystem in deep learning, and compatible cards like Quadro,
coupled with a very mature platform, put it miles ahead of any alternative.

That said, I would love to be proven wrong. Healthy competition such as this
fosters much better results. Also CUDA is not without issues in certain
matters.

~~~
mrlatinos
Just speaking from a personal perspective, I took a parallel computing course
at my uni this past semester and CUDA was the main platform we worked on (and
I'm an undergraduate). Nvidia also has a great Udacity course they offer for
free. Unless AMD gets CUDA compatibility working soon, I really don't see how
they're going to catch up as far as adoption goes.

~~~
visarga
They either need CUDA compatibility or compatibility with all the major
frameworks. And they'd better be cheaper or faster than NVIDIA.

------
rwmj
Are the drivers for this open source? That would be a major improvement over
NVidia.

------
stuckagain
I can't even look at the press picture without remembering that is the exact
same metal card slot tab that I had on my IBM PC 35 years ago. They should
take a picture of the other end or something.

~~~
anoother
I wish modern chassis, and indeed, consumer cards, would come with these
support brackets.

------
jheriko
this website is terrible. it tries to sell me the product, but neglects to
mention what it does.

------
cordite
Other than ML applications, what can I write on this?

OpenCL? Something comparable to CUDA? What about utilizing Vulkan?

------
milesf
Am I the only one that saw this and immediately thought "mining
cryptocurrency"? :)

------
nickeleres
loving on the UI, very reflective of their product

------
ilaksh
But Keras and Tensorflow still only work on nVidia right?

~~~
mastazi
AMD is supporting development efforts to port Caffe, Tensorflow and Torch
[http://www.anandtech.com/show/10905/amd-announces-radeon-ins...](http://www.anandtech.com/show/10905/amd-announces-radeon-instinct-deep-learning-2017/3)

------
ipunchghosts
This really doesn't matter for deep learning. There is a large ecosystem built
around CUDA. Unless AMD becomes CUDA-compatible (they are working on it but
aren't there yet) and I can install Torch/TF and run it on my AMD GPU, I will
stick with NVIDIA.

I am all for choice, but AMD has a lot of catching up to do.

