
Nvidia Announces Tesla P40 and P4 - Neural Network Inference, Big and Small - yread
http://www.anandtech.com/show/10675/nvidia-announces-tesla-p40-tesla-p4
======
wyldfire
Wow, I've been out of the GPU game for a year or two now and it's clear how
the market has shifted (or how NVIDIA wants it to move). Back in the day we'd
keep asking NVIDIA and AMD for half precision support, looks like not only did
they do that by now but there's 8-bit integer support too! I was about to say
"well who the heck would be able to use 8-bit integers for much?" when I saw
in TFA: "offering an 8-bit vector dot product with 32-bit accumulate."

In case it's not clear from the title (which used to read "...47 INT TOPS"),
that's 47 [8-bit integer] tera-operations-per-second. Anandtech says it will
"... offer a major boost in inferencing performance, the kind of performance
boost in a single generation that we rarely see in the first place, and likely
won’t see again." No kidding!
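That "8-bit vector dot product with 32-bit accumulate" is what CUDA 8 exposes as the `__dp4a` intrinsic. A minimal Python sketch of its per-element semantics (the packing and lane math here are an emulation for illustration, not NVIDIA's implementation): each 32-bit operand carries four signed 8-bit lanes, the lanes are multiplied pairwise, and the four products are summed into a 32-bit accumulator.

```python
import numpy as np

def dp4a(a: int, b: int, c: int) -> int:
    """Emulate CUDA's __dp4a: a and b each pack four signed 8-bit lanes;
    multiply lane-wise and accumulate the four products into c (int32)."""
    # Force little-endian layout so lane order is byte order.
    a_lanes = np.frombuffer(np.array([a], dtype="<i4").tobytes(), dtype=np.int8)
    b_lanes = np.frombuffer(np.array([b], dtype="<i4").tobytes(), dtype=np.int8)
    return c + int(np.dot(a_lanes.astype(np.int32), b_lanes.astype(np.int32)))

# lanes [1, 2, 3, 4] . [5, 6, 7, 8] = 5 + 12 + 21 + 32 = 70
a = int.from_bytes(bytes([1, 2, 3, 4]), "little")
b = int.from_bytes(bytes([5, 6, 7, 8]), "little")
print(dp4a(a, b, 0))  # -> 70
```

The 32-bit accumulator is the key detail: summing many int8 products in int8 would overflow almost immediately, so the widened accumulate is what makes the instruction usable for real dot products.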

~~~
Eridrus
I've read a lot about fp16 being good enough for training, but the thing
people don't mention is that simply swapping fp32 for fp16 will make things
fail, because your deep learning framework doesn't implement everything you
use for fp16. And after you fix that, training will probably diverge, because
standard practices aren't designed for such a limited numeric range.

Which isn't to say that the Deep Learning stacks won't get there eventually,
but at the moment it's not as easy as flipping a switch.
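For a concrete sense of how limited that range is (a generic numpy sketch, not tied to any particular framework): float16's largest finite value is 65504, its subnormal floor is around 6e-8, and above 1024 the gap between representable values is already at least 1.0.

```python
import numpy as np

x = np.float16(300.0)
print(x * x)                                  # 300*300 = 90000 overflows to inf
print(np.float16(1e-8))                       # below the subnormal floor: 0.0
print(np.float16(1024.0) + np.float16(0.4))   # the 0.4 rounds away: 1024.0
```

Squared losses, tiny gradients, and long accumulations all hit exactly these edges, which is why naive fp32-to-fp16 swaps tend to diverge.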

~~~
aab0
For a 4x boost, I imagine Tensorflow and Torch will get support shortly after
the GPUs start shipping in real quantities.

~~~
Eridrus
It's a 2x boost for training since you can't use int8 for training and need
fp16.

INT8 is a 4x for inference, but most people aren't using GPUs for inference
atm.

~~~
scottlegrand
Some of us are...

https://blogs.aws.amazon.com/bigdata/post/TxGEL8IJ0CAXTK/Generating-Recommendations-at-Amazon-Scale-with-Apache-Spark-and-Amazon-DSSTNE

~~~
Eridrus
Fair enough; when I said most people I meant companies who are not
AmaGooFaceSoft and their international equivalents. Though I can't quite tell
if you're doing GPU batch predictions and storing them or doing them in
realtime with Spark Streaming.

Unrelated question though: any chance you'll do a blog post/paper about how
DSSTNE does automatic model parallelism and gets good sparse performance
compared to cuSPARSE etc.?

------
1024core
Just as a comparison: the P40, at 12 TFLOPS, would have made the Top 500 list
as recently as 2008:
https://www.top500.org/list/2008/06/?page=5

~~~
ipunchghosts
I wish there was a website where I could type in my device, have it give me
the FP32 FLOPS, and then tell me where that device would have placed on past
Top 500 lists.

~~~
frou_dh
IIRC that's the kind of ad-hoc data discovery and munging that Wolfram
Language was demoed as being good for. I thought Wolfram had an online IDE,
but I can't find one now.

------
LeifCarrotson
For some reason my brain skipped "Nvidia" and I assumed this was about a
budget/low-range version of the Tesla Model S. They already have the Tesla 60,
60D, 70, 70D, 75, 75D, 85, P85D, P90D, and now P100D. Why not add the P40?

Nvidia needs a "Ludicrous Mode" overclock setting for these cards. Push the
button for 2.5 seconds of super-high frame rate! With some cool down time
required.

~~~
hatsunearu
>Nvidia needs a "Ludicrous Mode" overclock setting for these cards. Push the
button for 2.5 seconds of super-high frame rate! With some cool down time
required.

This actually is a thing with GPU Boost 3.0. The card boosts up automatically
if the temperatures are low, though lately there were issues with the GPU
clock frequency oscillating because there wasn't enough hysteresis.

(This, by the way, means that GPUs are thermally constrained rather than
timing-constrained.)
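The clock-oscillation failure mode is a generic control problem, not anything GPU-specific. A toy sketch (made-up temperatures and thresholds, purely illustrative) of why a boost controller needs a hysteresis band:

```python
def boost_switches(temps, enter_below, exit_above):
    """Count clock-state changes for a boost controller that boosts when
    temp drops below enter_below and unboosts when it rises above exit_above.
    enter_below == exit_above means no hysteresis band."""
    boosted, switches = False, 0
    for t in temps:
        if not boosted and t < enter_below:
            boosted, switches = True, switches + 1
        elif boosted and t > exit_above:
            boosted, switches = False, switches + 1
    return switches

temps = [79.9, 80.1] * 10                 # sensor dithering around 80 C
print(boost_switches(temps, 80, 80))      # no band: flips on every sample -> 20
print(boost_switches(temps, 78, 83))      # 5 C band: the dither never triggers -> 0
```

With a single threshold, sensor noise straddling it toggles the clock on every sample; a band wide enough to contain the noise keeps the state put.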

------
agentgt
Are there any IaaS providers that offer access to GPU hardware? I have always
wanted to play around with this stuff (as I'm completely ignorant of GPU tech
these days), but I'm not interested in buying hardware.

EDIT... apparently when I last googled years ago this was not the case. I wish
I could delete this comment :(

~~~
Eridrus
Despite all the announcements, if you want it by the hour AWS is still your
only option.

------
noodles23
I don't know why, but to my knowledge no public cloud has offered even the
Maxwell GPUs.

As cool as these cards are, I really hope they become available on AWS soon.
The current AWS GPU instances are so weak we're contemplating buying a
physical desktop setup.

~~~
Eridrus
Second-hand cards are pretty cheap on eBay. Much cheaper than AWS in the
medium term.

------
p1esk
Why can't they release an FP16 card without FP64 cores?

Currently, the P100 wastes about half its area on dedicated FP64 cores, which
no one needs for deep learning.

------
ipunchghosts
FYI, the P40 looks similar in specs to the Titan X (Pascal), but the Titan X
has half the RAM (though its 12 GB is still quite a bit).

------
hatsunearu
The P40 uses GP102 like the Pascal Titan X, and has more CUDA cores than the
Titan X. Just putting that out there.

------
dharma1
Hope these will appear on public clouds soon

