
Nvidia Announces PCIe Tesla V100 - endorphone
http://www.anandtech.com/show/11559/nvidia-formally-announces-pcie-tesla-v100-available-later-this-year?utm_source=twitter&utm_medium=social
======
modeless
Don't get too excited about the V100. The P100 was announced over a year ago and is
_still today_ not available from AWS, GCE, or Azure (despite various
announcements), although you can now, finally, order one at retail for the
low, low price of $12k+ [1] [2].

The upcoming Volta-based consumer GPUs are going to be your best value for
machine learning, not V100.

[1] [https://www.amazon.com/NVIDIA-Tesla-P100-computing-processor...](https://www.amazon.com/NVIDIA-Tesla-P100-computing-processor/dp/B06WV7HFWV/)

[2] [http://accessories.us.dell.com/sna/productdetail.aspx?c=us&l...](http://accessories.us.dell.com/sna/productdetail.aspx?c=us&l=en&s=bsd&cs=04&sku=489-BBCF&cid=298721&st=&VEN1=saalXTMnP%2C101952148149%2C901q5c14135%2Cc%2C%2C489-BBCF&VEN2=%2C&lid=5704670&dgc=ST&DGSeg=SO&acd=12309152537501410&VEN3=105703622544116288)

~~~
jamesfmilne
You can get the Quadro GP100, which is pretty much identical (I've got 2).

[http://images.nvidia.com/content/pdf/quadro/data-sheets/3020...](http://images.nvidia.com/content/pdf/quadro/data-sheets/302049-NV-DS-Quadro-Pascal-GP100-US-NV-27Feb17-HR.pdf)

Agreed re consumer GPUs for machine learning though.

~~~
modeless
Interesting, I hadn't heard of the GP100; it was only released in February. Still,
at $6000 it's as expensive as eight 1080 Tis.

~~~
tanderson92
Yeah, but double precision on the 1080 Tis is crippled, so eight 1080 Tis
would (in theory) deliver only about 50% of the FP64 performance of a single
Quadro GP100. There are good reasons to use the higher-end chips in some
cases (maybe not for ML).
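
A rough check, taking the ~0.33 TFLOPS FP64 figure for the 1080 Ti cited further down this thread and the ~5.2 TFLOPS FP64 number from the Quadro GP100 datasheet linked above (both approximate):

```latex
8 \times 0.33\,\text{TFLOPS} \approx 2.7\,\text{TFLOPS} \approx 0.5 \times 5.2\,\text{TFLOPS}
```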

------
visionscaper
The "NVIDIA Tesla Family Specification Comparison" table indicates 112 TFLOPS
"Tensor Performance (Deep Learning)" for Tesla V100 (PCIe).

Is that double precision?

The Nvidia 1080 Ti has a double precision performance of 332 GFLOPS [1]. If
the above number is for double precision, the Tesla V100 (PCIe) would be
about 337 times as fast (!!).

Does anyone have more insight into these numbers?

[1] [https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_proces...](https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units)

UPDATE: I should have read the article more carefully; the tensor figure refers to
a mix of FP16 (half precision) inputs and FP32 (single precision) accumulation. That
would likely mean a factor of ~10 in compute performance (specifically for deep
learning), not ~337.
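
Roughly, using the 332 GFLOPS FP64 figure above and assuming ~11.3 TFLOPS FP32 for the 1080 Ti (an approximate spec-sheet number, not from the article):

```latex
\frac{112\ \text{TFLOPS}}{0.332\ \text{TFLOPS (FP64)}} \approx 337,
\qquad
\frac{112\ \text{TFLOPS}}{11.3\ \text{TFLOPS (FP32)}} \approx 10
```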

~~~
wmf
"...matrix operations with FP16 inputs and FP32 accumulation..."
[http://images.nvidia.com/content/volta-architecture/pdf/Volt...](http://images.nvidia.com/content/volta-architecture/pdf/Volta-Architecture-Whitepaper-v1.0.pdf)
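
For concreteness, here's a minimal sketch (mine, not from the whitepaper) of that operation, D = A×B + C with FP16 inputs and FP32 accumulation, using CUDA 9's WMMA API. The kernel name and the single 16×16×16 tile are illustrative:

```cuda
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

// One warp computes D = A * B + C on a 16x16x16 tile: A and B are FP16,
// the accumulator is FP32, matching the "FP16 inputs and FP32 accumulation"
// description in the whitepaper. Requires sm_70 (Volta) and CUDA 9.
__global__ void tensor_core_tile(const half* a, const half* b, float* d) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag;

    wmma::fill_fragment(acc_frag, 0.0f);                 // C = 0 in this sketch
    wmma::load_matrix_sync(a_frag, a, 16);               // FP16 inputs
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);  // FP32 accumulation
    wmma::store_matrix_sync(d, acc_frag, 16, wmma::mem_row_major);
}
```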

~~~
visionscaper
Thanks, I've now read the article more carefully and updated my comment.

------
polskibus
I wonder when such briefs will start including information about cryptocurrency
mining performance on a regular basis.

~~~
colordrops
I was under the impression that ASICs were far better for mining. Have GPUs
caught up again?

~~~
dantillberg
GPUs are definitely dead for Bitcoin. You'll get something like 10^9 times more
efficient SHA-256 hashing from an ASIC than from a GPU.

But for many other coins, there are two big factors bringing GPUs back into
the game for mining:

- Lots of alt-coins are competing for popularity; many might get ASIC mining
implementations in the future, but so far there either hasn't been enough time or
enough money to be made to justify building them.

- There's been various research into reducing the potential efficiency gain of an
ASIC implementation over commodity hardware. This was done in response to the
concentrating effect of ASIC mining operations, which put a greater share of
global hashpower in a smaller set of hands. The main way Ethereum implements this
"ASIC resistance" is by exercising memory bandwidth, an area where GPUs are
already quite optimized, in the hashing algorithm (roughly the access pattern
sketched below).
[https://github.com/ethereum/wiki/wiki/Ethash](https://github.com/ethereum/wiki/wiki/Ethash)
goes into detail.
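
A hypothetical sketch of that memory-bound access pattern (not the real Ethash; the DAG layout, mixing function, and constants are all made up for illustration):

```cuda
#include <stdint.h>

// Every hash round does a data-dependent read from a multi-gigabyte dataset
// (the "DAG"), so throughput is limited by memory bandwidth rather than raw
// ALU count. dag, dag_words, mix_step and the constants are illustrative.
__device__ uint32_t mix_step(uint32_t a, uint32_t b) {
    return (a * 0x01000193u) ^ b;   // FNV-style mix, as a stand-in
}

__global__ void memory_hard_search(const uint32_t* dag, uint64_t dag_words,
                                   uint64_t start_nonce, uint32_t* out) {
    uint64_t tid   = blockIdx.x * blockDim.x + threadIdx.x;
    uint64_t nonce = start_nonce + tid;
    uint32_t mix   = (uint32_t)(nonce ^ (nonce >> 32));

    // 64 dependent lookups per nonce: each index depends on the previous
    // read, so an ASIC can't shortcut the loop without the same bandwidth.
    for (int i = 0; i < 64; ++i) {
        uint64_t idx = ((uint64_t)mix * 0x9E3779B9ull + i) % dag_words;
        mix = mix_step(mix, dag[idx]);
    }
    out[tid] = mix;
}
```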

------
lvspiff
I remember, back before the turn of the century, my friends and I joking about
how cool it would be to run Doom on a Cray. Who knew that less than 20 years
later we could be...

~~~
astrodust
DOOM wouldn't run very well on a Cray machine for multiple reasons, one of
which is that it doesn't have the right video hardware.

A Raspberry Pi has way more compute power than a first-generation Cray, and a
386-vintage machine has better single-core, non-vectorized performance.

------
redtuesday
With the incredible die size of Volta, and the cost associated with it, I
wonder if the rumors are true that AMD's next-gen GPU, codenamed Navi, is using
the same strategy as their Zen CPUs (Ryzen, Threadripper, Epyc): multiple
smaller dies connected together for better yields and lower cost.

Would something like that even be possible with GPUs? I know AMD's Exascale
paper mentions GPU chiplets, which sound like this.

------
sgt101
TensorFlow 1.2 has integrated new instructions from Intel; does anyone have
any figures for 1.2 showing CPU performance vs. GPUs?

------
petra
What else besides deep learning can the tensor cores do well?

~~~
thearn4
CFD and other high-fidelity multiphysics simulations can run efficiently on
GPUs. It largely comes down to being able to form a matrix and stream a bunch
of vectors through it to multiply. With Krylov subspace methods, this extends
to covering a large part of the problem space of linear algebra (a rough
sketch is below).
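
A minimal sketch of that pattern: keep one matrix A resident on the GPU and repeatedly stream vectors through it (y = A*x), as the inner loop of a Krylov method (CG, GMRES, ...) would. Sizes and names are illustrative; a real code would use cuBLAS/cuSPARSE rather than a hand-rolled kernel:

```cuda
// Dense matrix-vector product: one thread per row of A (n x n, row-major).
__global__ void matvec(const double* A, const double* x, double* y, int n) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n) {
        double acc = 0.0;
        for (int col = 0; col < n; ++col)
            acc += A[(size_t)row * n + col] * x[col];  // dot(row of A, x)
        y[row] = acc;
    }
}

// Host side (schematic): the same A is applied every iteration, so run time
// is dominated by streaming A and the vectors through device memory.
//   for (int k = 0; k < max_iters; ++k) {
//       matvec<<<(n + 255) / 256, 256>>>(d_A, d_p, d_Ap, n);
//       /* dot products and axpy updates build the Krylov subspace */
//   }
```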

I think the Tesla cards in particular are geared toward double precision, which
is not necessarily the best fit for deep learning applications.

~~~
tanderson92
High-fidelity multiphysics seems incompatible with FP16, unless you're talking
about some kind of mixed-precision method.

