
Deep Learning Hardware Limbo - TheAlchemist
http://timdettmers.com/2017/12/21/deep-learning-hardware-limbo/
======
paulsutter
We’re getting 90 TFLOPS on an AWS V100 for deep learning (using dlib, not
TensorFlow), so I can’t understand this comment:

> With TensorCores the Titan V has a new shiny deep learning feature, but at
> the same time, its cost/performance ratio is abysmal.

~~~
TimothyFitz
Can you help me understand: how can the V100 have 10x the TFLOPS of the P100,
but only get a 2.5x speed increase in training a neural net according to
Nvidia's docs? [https://devblogs.nvidia.com/parallelforall/inside-volta/](https://devblogs.nvidia.com/parallelforall/inside-volta/)

Do we need significant software changes to take advantage of the new power?
Are the TFLOPS somehow not directly comparable?

~~~
paulsutter
Most published numbers aren’t actually using the Tensor Cores, so the raw
TFLOPS figures aren't directly comparable. We’re using dlib (which is in C++
and gives us more direct control), but surely TensorFlow will eventually do
this too.
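
For illustration, here's a minimal TensorFlow 1.x sketch (an assumption on my
part about the setup, not our actual dlib pipeline) of the kind of op that
can engage the Tensor Cores: they only kick in for half-precision matrix
multiplies on CUDA 9 / cuDNN 7, so the framework has to issue FP16 ops
explicitly.

    import numpy as np
    import tensorflow as tf

    # A half-precision matmul: on a V100 with CUDA 9 / cuDNN 7, cuBLAS
    # can route this through the Tensor Cores.
    a = tf.placeholder(tf.float16, shape=[4096, 4096])
    b = tf.placeholder(tf.float16, shape=[4096, 4096])
    c = tf.matmul(a, b)

    with tf.Session() as sess:
        x = np.random.rand(4096, 4096).astype(np.float16)
        y = np.random.rand(4096, 4096).astype(np.float16)
        sess.run(c, feed_dict={a: x, b: y})

An equivalent float32 graph on the same card runs on the ordinary CUDA
cores, which is why most published benchmarks don't show the headline
speedup.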

------
mattnewton
Disclaimer: I am long Nvidia. But this doesn’t convince me that Nvidia is
really threatened at all; they can easily lower the cost of the Volta chips
and win. CUDA has so much more momentum than anything else that
cost/performance comparisons don’t make sense yet, and won’t for a while.

As an Nvidia customer, I hope this competition catches up sooner rather than
later, but I am an Nvidia investor because the competition seems incredibly
far behind and is trying to hit a very quickly moving target.

~~~
varelse
The new driver EULA clause banning the use of GeForce GPUs in "datacenters"
except for blockchain purposes has ended my positive impression of NVDA. To
that end, we have ceased using the term in favor of "GeForce Houses of Ill
Compute."

[https://wirelesswire.jp/2017/12/62708/](https://wirelesswire.jp/2017/12/62708/)

------
cs702
This is potentially really good news.

I would love to see real competition in this space.

I would love to see dedicated deep learning hardware (i.e., without all the
graphics cruft) commercially available, with good support in TensorFlow/Keras
and PyTorch. (Perhaps Google will consider selling TPU hardware?)

Even more, I would love to be able to mix hardware from different vendors and
use a framework like TensorFlow/Keras or PyTorch to manage heterogeneous deep
learning hardware from Nvidia, AMD, Intel, and maybe others (e.g., Google).
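
As a rough sketch of what I mean (the multi-vendor placement is purely
hypothetical; today only CPUs and Nvidia GPUs have real TensorFlow backends),
the device-string abstraction already exists:

    import tensorflow as tf

    # Explicit placement via device strings; the names are vendor-neutral
    # even though '/gpu:0' currently always means an Nvidia card.
    with tf.device('/cpu:0'):
        x = tf.random_normal([1024, 1024])

    with tf.device('/gpu:0'):
        y = tf.matmul(x, x)

    # allow_soft_placement falls back to CPU if no GPU is present;
    # log_device_placement prints where each op actually landed.
    config = tf.ConfigProto(allow_soft_placement=True,
                            log_device_placement=True)
    with tf.Session(config=config) as sess:
        sess.run(y)

A vendor would "just" need to ship a backend that registers its own device
type; the graph-level code wouldn't have to change.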

BTW, the moniker "GPU" no longer feels right to me. We should start calling
deep learning hardware something else, like "DLU" for deep learning unit, or
"AIU" for AI unit.

~~~
santaclaus
> without all the graphics cruft

What cruft are you referring to here? I doubt that removing the rasterization
logic from a GPU is going to make it significantly faster at matrix-vector
multiplies or whatever.

------
aub3bhat
This is ridiculous: the NNP has not even reached the market and he is already
betting the farm on it. The reason the Titan V does not yet significantly
outperform previous generations is that its Tensor Cores have not been
properly utilized; e.g., the current stable version of TF, 1.4, does not use
CUDA 9.0.
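
A quick sanity check of your own stack (a minimal sketch; these calls exist
in TF 1.x, but the output will vary by build):

    import tensorflow as tf

    # The stock TF 1.4 wheels are built against CUDA 8, which has no
    # Tensor Core path at all, so this is worth verifying first.
    print("TensorFlow:", tf.VERSION)
    print("Built with CUDA:", tf.test.is_built_with_cuda())
    print("GPU found:", tf.test.gpu_device_name() or "none")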

~~~
deepnotderp
> " NNP has not even reached the market"

And it never will...

Seriously, Nervana is dead; they're still on 28nm, and most of the team has
left (e.g., many have gone to Cerebras Systems).

