
New P2 Instance Type for Amazon EC2 – Up to 16 GPUs - jeffbarr
https://aws.amazon.com/blogs/aws/new-p2-instance-type-for-amazon-ec2-up-to-16-gpus/
======
Smerity
$0.90 per K80 GPU per hour, while expensive, opens up so many opportunities -
especially when you can get a properly connected machine.

Just as an example of the change this entails for deep learning, the recent
"Exploring the Limits of Language Modeling"[1] paper from Google used 32 K40
GPUs. While the K40 / K80 are not the most recent generation of GPU, they're
still powerful beasts, and finding a large number of them set up well is a
challenge for most.

In only 2 hours, their model beat the previous state-of-the-art results. Their
new best result was achieved after three weeks of compute.

With two assumptions - that a K80 is approximately 2× a K40 and that you could
run the model with similar efficiency - that means you can beat the previous
state of the art for ~$28.80 and could replicate that paper's state-of-the-art
result for ~$7,257.60, all using a single P2 instance.
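
A quick sanity check of that arithmetic, as a Python sketch (assuming
on-demand pricing for a p2.16xlarge, with its 16 K80 GPUs standing in for the
paper's 32 K40s):

    # Cost estimates from the assumptions above.
    hourly = 16 * 0.90              # $14.40/hour for a p2.16xlarge

    beat_sota = 2 * hourly          # 2 hours to beat the previous SOTA
    replicate = 21 * 24 * hourly    # 3 weeks of compute for their best result

    print(f"beat previous SOTA: ${beat_sota:.2f}")    # $28.80
    print(f"replicate best run: ${replicate:.2f}")    # $7257.60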

While the latter number is extreme, the former isn't. It's expensive but still
opens up so many opportunities. Everything from hobbyists competing on Kaggle
competitions to that tiny division inside a big company that would never be
able to provision GPU access otherwise - and of course the startup in between.

* I'm not even going to try to compare the old Amazon GPU instances to the new ones, as they're not in the same ballpark. They have far less memory and don't support many of the features required for efficient use of modern deep learning frameworks.

[1]: [https://arxiv.org/abs/1602.02410](https://arxiv.org/abs/1602.02410)

~~~
lqdc13
Still seems much better to buy your own GTX 1080 for $700 - which is what
you'd spend in a month of playing with parameters on these instances.

And at the end of the month, you still have a modern GPU to play video games
on, etc.

Of course if you have money to burn, these are great.

~~~
egeozcan
By the way, I've owned a 1080 for two weeks now and I can't overstate how
powerful this thing is. Even if you are not into gaming, getting a 1080 is
still worth considering if you want to experiment with deep learning.

~~~
visarga
Get a GTX 1080 for your local box and use it to write and test the model,
then run the training in the cloud. Both are necessary; in fact, they are
complementary. If you want to run a random hyperparameter search (train 50
times), you benefit from many GPUs - a single 1080 doesn't scale to that.
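
A toy sketch of that random-search loop; train_model is a hypothetical
stand-in for launching one training run (locally or on a cloud GPU):

    import random

    def train_model(lr, dropout, hidden):
        # Stub: launch one training run with these hyperparameters and
        # return its validation loss.
        return random.random()

    best = None
    for trial in range(50):
        params = {
            "lr": 10 ** random.uniform(-5, -2),    # log-uniform learning rate
            "dropout": random.uniform(0.0, 0.5),
            "hidden": random.choice([256, 512, 1024]),
        }
        loss = train_model(**params)
        if best is None or loss < best[0]:
            best = (loss, params)

    print("best validation loss:", best[0], "with", best[1])

Each of the 50 trials is independent, which is exactly why this workload
spreads trivially across many GPUs while a single card runs them one by one.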

------
__mp
MeteoSwiss uses a 12-node cluster with 16 GPUs each to compute its weather
forecasts (Piz Escha/Kesch). The machine has been in operation for more than a
year; we were able to switch off the old machine (Piz Albis/Lema) last week.

The 1.1 km forecast runs on 144 GPUs; the 2.2 km probabilistic ensemble
forecast is computed on 168 GPUs (8 GPUs, or half a node, per ensemble
member). The 7 km EU forecast runs on 8 GPUs as well.

~~~
n00b101
What kinds of numerical algorithms are used for this? e.g. Monte Carlo or
Finite Element Analysis?

~~~
__mp
It's a simple finite-difference discretization on a rectangular grid. The
computations are horizontally explicit and vertically implicit (the data
dependencies are in the vertical direction); a toy sketch follows the link
below.

Here's the scientific description of the model: [http://www2.cosmo-model.org/content/model/documentation/core/cosmoDyncsNumcs.pdf](http://www2.cosmo-model.org/content/model/documentation/core/cosmoDyncsNumcs.pdf)
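
To illustrate what "horizontally explicit, vertically implicit" means in
practice, here is a toy Python/NumPy sketch for a 2-D diffusion problem (my
illustration, not COSMO's actual dynamical core; grid sizes and coefficients
are made up):

    import numpy as np
    from scipy.linalg import solve_banded

    nx, nz = 64, 40                      # horizontal x vertical grid points
    dt, dx, dz, kappa = 0.1, 1.0, 1.0, 0.5
    u = np.random.rand(nx, nz)           # some field on the rectangular grid

    # Explicit horizontal step (periodic in x): u += dt * kappa * d2u/dx2
    lap_x = (np.roll(u, 1, axis=0) - 2 * u + np.roll(u, -1, axis=0)) / dx**2
    u = u + dt * kappa * lap_x

    # Implicit vertical step: solve (I - dt*kappa*D2_z) u_new = u for every
    # column. This tridiagonal solve is where the vertical data dependency
    # lives; the nx column solves are independent and parallelize well.
    r = dt * kappa / dz**2
    ab = np.zeros((3, nz))               # banded matrix: super/main/sub diagonals
    ab[0, 1:] = -r
    ab[1, :] = 1 + 2 * r
    ab[1, 0] = ab[1, -1] = 1 + r         # no-flux boundary rows
    ab[2, :-1] = -r
    for i in range(nx):
        u[i, :] = solve_banded((1, 1), ab, u[i, :])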

------
paukiatwee
For anyone who is interested in ML/DL in the cloud:

Google Cloud Platform just released Cloud ML in beta with a different pricing
model; see [https://cloud.google.com/ml/](https://cloud.google.com/ml/)

Cloud ML costs $0.49/hour to $36.75/hour, compared to AWS's $0.90/hour to
$14.40/hour.

The huge difference between $36.75/hour (Google) and $14.40/hour (AWS) makes
me wonder what Cloud ML is running on; they mention GPUs (TPUs?) but not the
exact GPU model.

~~~
aub3bhat
There are two issues. First, Google offers a black-box service where your
TensorFlow model gets optimized, while AWS offers a generic instance which is
far more flexible and can be used with multiple frameworks. Second, AWS offers
spot pricing on GPU instances (though not on P2 instances yet), which can be a
significant discount depending on the type of workload.

I would say that, while I still have to evaluate Google Cloud ML, AWS offers a
far, far better experience and cost effectiveness.

~~~
andrioni
They already offer spot P2 instances, although I think the main spot pricing
history widget isn't up to date yet. The current minimum spot price seems to
be 0.09 USD/hour for a p2.xlarge, and there have already been steep price
fluctuations for the p2.16xlarge in the past hour.

~~~
aub3bhat
Oh wow, they do - $0.09 per hour, that's cool! Hopefully it will take another
week for others to transfer their models from g2 instances; that should keep
the price pressure down!

------
topbanana

      All of the instances are powered by an AWS-Specific version of Intel’s Broadwell processor, running at 2.7 GHz.
    

Does anyone have any more information about this? Are the chips fabricated
separately or is it microcode differences?

~~~
idunno246
[http://m.youtube.com/watch?v=JIQETrFC_SQ](http://m.youtube.com/watch?v=JIQETrFC_SQ)

I think this is the right talk, but there's one where he implies some
differences - for instance, that because they can guarantee exact temperature
ranges, they can clock the chips differently. My guess is the same fab, just
binned differently, or maybe different packaging.

~~~
ComodoHacker
Perhaps with overclocking unlocked via microcode.

------
mrb
Keep in mind the Nvidia K80 is a two-year-old Kepler GPU. Nvidia has launched
two newer microarchitectures since then: Maxwell and Pascal. I would expect to
see some P100 Pascal GPUs "soon" on AWS. Maybe in 6 months? (Maxwell's
severely handicapped double-precision performance reduces its utility for many
workloads.)

~~~
n00b101
> Nvidia K80 is a 2 years old Kepler GPU ... I would expect to see some P100
> Pascal GPU "soon" on AWS. Maybe 6 months?

Not likely.

The AWS CG1 instance type was introduced in 2010 (NVIDIA Tesla C2050 GPU). The
G2 instance type was introduced in 2013 (NVIDIA GRID K520 GPU). Now the P2
instance type has been introduced in 2016 (NVIDIA Tesla K80 GPU). At this
rate, we probably won't see another GPU fleet in AWS until 2019.

~~~
mrb
I never realized Amazon was so consistently late in deploying current-gen
GPUs. How lame. Sounds like an opportunity for a competitor to beat them.

~~~
modeless
It's not just Amazon. No large cloud provider offers recent GPUs. I believe
the blame probably lies with NVIDIA.

------
sysexit
Pretty classless how Jeff describes TensorFlow as an "Open Source library"
without attributing it to Google.

~~~
lsaferite
Does every mention of an Open Source library or project morally require
attribution to the original authors or something?

~~~
sysexit
> Does every mention of an Open Source library or project morally require
> attribution to the original authors or something?

No. But in my view, had this been written by an independent journalist, they
would have described it as "TensorFlow by Google". The project is almost
entirely driven by Google, so "by Google" is a more distinguishing
classification than "Open Source". We can only speculate, but the strong
impression is that Amazon doesn't want to give kudos to Google.

~~~
bdcravens
Not sure that Google would agree. Looking at Google's own words about
open-sourcing the project:

"We hope to build an active open source community that drives the future of
this library, both by providing feedback and by actively contributing to the
source code."

------
raverbashing
One thing I discovered recently is that for GPU machines, your initial limit
on AWS is 0 (meaning you have to ask support before you can start one for
yourself).

(This might be an issue with my account, though, having had only small bills
so far.)
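
For reference, a minimal boto3 sketch of what hitting that zero limit looks
like (the AMI ID is a placeholder; "InstanceLimitExceeded" is EC2's generic
instance-limit error code):

    import boto3
    from botocore.exceptions import ClientError

    ec2 = boto3.client("ec2", region_name="us-east-1")
    try:
        ec2.run_instances(
            ImageId="ami-xxxxxxxx",      # placeholder; use a real AMI ID
            InstanceType="p2.xlarge",
            MinCount=1,
            MaxCount=1,
        )
    except ClientError as e:
        # With a GPU limit of 0 this fails with "InstanceLimitExceeded";
        # the fix is a limit-increase request through AWS support.
        print("Launch failed:", e.response["Error"]["Code"])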

~~~
fletchowns
Wonder if it has anything to do with stolen AWS credentials being used for
cryptocurrency mining.

~~~
hueving
Doubtful. If you're using stolen credentials, you don't need to care about
efficiency; you can just use whatever instances are available to the account.

They're likely doing it as a way to control demand spikes, because they have a
much smaller pool of GPU instances to go around. It also prevents Joe Schmo
from wasting one to run a web server because he doesn't understand performance
and "just picked the best one".

------
phs318u
My first thought: "I wonder what the economics are like, re: cryptocurrency
mining?"

My second thought: "I wonder if Amazon use their 'idle' capacity to mine
cryptocurrency?"

With respect to my second thought, at their scale, and at the cost they'd be
paying for electricity, it could quite possibly be a good hedge.

~~~
hueving
As far as I understand it, the only way to profitably mine bitcoin is to have
super cheap power and ASICs specific to bitcoin mining. Essentially, GPUs
won't cut it because you're competing with Chinese miners using ASICs attached
almost directly to dirty, cheap electricity generation.

~~~
cloudjacker
You don't understand it.

Mine other cryptocurrencies that are profitable, or may become profitable. It
is VERY, as in EXTREMELY, profitable to mine large stakes in cryptocurrencies
that the market hasn't discovered yet.

Especially ones that aren't even listed on an exchange yet, where mining is
the only way to earn them.

The Darkcoin instamine (the coin has since been renamed to the politically
non-threatening DASH) was a notoriously profitable use of GPU clusters on AWS.

~~~
hueving
Don't claim I don't understand it when I was clearly talking about the
economics of bitcoin.

Also, if you are paying to mine currencies that don't have an exchange value
yet, you are just speculating. The mining isn't profitable at that point
because the currency has no value. It's just as likely that you chose to mine
a dud. It's exactly the same thing as opening a copper mine at a loss and
hoarding the concentrate hoping that it will gain value.

~~~
cloudjacker
> you are just speculating

you say that like it's a bad thing

it is a very profitable endeavor

------
chrisconley
This is great - we'll try to get our TensorFlow and Caffe AMI repo updated
soon: [https://github.com/RealScout/deep-learning-images](https://github.com/RealScout/deep-learning-images)

------
seanwilson
How do Amazon (or any other cloud provider) make sure they have enough of
these machines to cope with demand without buying too many?

~~~
tachion
You pretty much answered your own question. First, they have 'too many';
second, if they don't, they oversubscribe; third, after running your business
for a while, you can do some math on your stats and predict future demand with
accuracy that's 'good enough'.

~~~
raverbashing
I don't think you can oversubscribe GPUs.

But at the price they're charging, if demand exists, they can just call NVIDIA
and order another batch of cards.

------
ivan_ah
According to [1], the K80 GPUs have the following specs:

    Chips: 2× GK210
    Thread processors: 4992 (total)
    Base clock: 560 MHz
    Max boost clock: 875 MHz
    Memory size: 2× 12288 MB
    Memory clock: 5000 MHz (effective)
    Bus type: GDDR5
    Bus width: 2× 384-bit
    Bandwidth: 2× 240 GB/s
    Single precision: 5591–8736 GFLOPS (MAD or FMA)
    Double precision: 1864–2912 GFLOPS (FMA)
    CUDA compute capability: 3.7

Is that a good deal for $1/hour? (I'm not sure if a p2.xlarge instance
corresponds to a whole K80 or half of one; see the back-of-envelope numbers
below.)

How much would it cost to "train" ImageNet using such instances? Or perhaps
another standard DNN task for which the data is openly available?

______

[1]
[https://en.wikipedia.org/wiki/Nvidia_Tesla#cite_ref-19](https://en.wikipedia.org/wiki/Nvidia_Tesla#cite_ref-19)
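
For what it's worth, a p2.xlarge exposes one of the K80's two GK210 chips,
i.e. half a card. A rough compute-per-dollar estimate from the specs above:

    # Peak boost numbers from the spec list, halved for one GK210 chip.
    sp_gflops = 8736 / 2                 # single precision per chip
    dp_gflops = 2912 / 2                 # double precision per chip
    price = 0.90                         # p2.xlarge on-demand, $/hour

    print(f"{sp_gflops / price:,.0f} SP GFLOPS per dollar-hour")   # ~4,853
    print(f"{dp_gflops / price:,.0f} DP GFLOPS per dollar-hour")   # ~1,618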

------
ajaimk
Priced this config (or close enough) at [http://www.thinkmate.com/system/gpx-xt24-2460v3-8gpu](http://www.thinkmate.com/system/gpx-xt24-2460v3-8gpu)

It comes to just under $50,000 for the server, or roughly 4.5 months of
p2.16xlarge time at $14.40/hour.

~~~
visarga
That's months of continuous use. If there are periods when you don't use it,
you have to take the utilization percentage into account; it's pretty wasteful
not to run a $50K server at full capacity.
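
A rough sketch of how utilization moves the break-even point between the
~$50K server and a p2.16xlarge at $14.40/hour:

    # You pay AWS only for the hours you actually use; the purchased server
    # costs ~$50K up front regardless of how busy it is.
    server_cost = 50_000
    hourly = 14.40
    paid_hours = server_cost / hourly            # ~3,472 hours of rental

    for utilization in (1.00, 0.50, 0.25):
        months = paid_hours / (24 * 30 * utilization)
        print(f"{utilization:.0%} utilization: break-even at ~{months:.1f} months")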

~~~
smw
Also, that doesn't include electricity, cooling, or data center space, all of
which might be significant.

------
ravenstine
Sounds like a great way to build a custom render farm. My home computer has a
dirt-cheap GPU, but it works well enough for basic modeling & animation - it's
terrible for rendering, though. I've been thinking of using ECS to build a
cluster of renderers for Maya that I can spin up when needed and scale to the
appropriate size. I don't know for certain if it's cheaper than going with a
service, but it sounds like it is (render farm subscriptions cost hundreds),
and I would get complete control over the software being used. I'm glad to
hear that Amazon is doing this. Granted, I'm more of a hobbyist in this arena,
so maybe it wouldn't work for someone more serious about creating graphics.

------
spullara
It is interesting to compare this to NVIDIA's DGX-1 system. That server is
based on the new Tesla P100 and uses NVLink rather than PCIe (about 10x
faster). It boasts about 170 TFLOPS (half precision) vs. the p2.16xlarge's
~64 TFLOPS (single precision). If you ran a p2.16xlarge full time for a year,
it would cost about the same as buying a DGX-1. Presumably Amazon releases
their GPU instances on older hardware for cost savings.

[http://www.nvidia.com/object/deep-learning-system.html](http://www.nvidia.com/object/deep-learning-system.html)

------
cm2187
Stupid question: are GPUs safe to share between two tenants in a datacenter?
I've read that there are very few security mechanisms in GPUs - in particular,
that the memory is full of interesting garbage - so I would assume there's no
hardware-enforced separation between the VMs either.

~~~
moonbug
The GPUs aren't shared between VMs; each GPU is dedicated to the VM it is
allocated to. The GPU memory gets wiped on initialisation, just like host
memory, so there's no leakage between successive VMs allocated the same
hardware.

If you absolutely care about not being multi-tenanted, you can stipulate
exclusive hardware allocation when requesting the VM.

~~~
_pmf_
> The GPU memory gets wiped on initialisation

Well, an attempt is made. As we can see in the case of WebGL, there are a
multitude of interesting effects.

~~~
moonbug
Can't speak to what happens on a Windows box, but across AWS instances it's
definitely the case.

------
clishem
Pricing?

~~~
moonbug
[https://aws.amazon.com/ec2/instance-types/p2/](https://aws.amazon.com/ec2/instance-types/p2/)

From $0.90 to $14.40 per hour.

------
epberry
Yess - thank you, thank you, thank you. I was just signing up for the Azure N
Series preview, but we're good to go now :).

------
nik736
Does anyone know if video transcoding on GPUs (with FFmpeg) is viable
nowadays? If yes, what are the gains?

~~~
scrabble
This depends on if you're doing live transcoding or VOD transcoding.

I work with live transcode, and it can be beneficial to run on g2 instances.
Running c4.4xlarge I can transcode a good number of 1080@30 in with
1080/720/480/360/240@30 out. With a proper g2 instance I can transcode more
simultaneously.

So cost efficiency really depends on sustained traffic levels. I scale out
currently using haproxy and custom code that monitors my pool and scales
appropriately. But I monitor sustained traffic levels to know when it makes
financial sense to scale up.

If your main concern is transcode speed, CPU is likely sufficient -- I am
unable to transcode faster than real time with live transcode.
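
As a concrete (if simplified) example, a sketch of driving a GPU-accelerated
FFmpeg transcode from Python via NVIDIA's NVENC encoder; file names are
placeholders, and h264_nvenc availability depends on your FFmpeg build and
drivers:

    import subprocess

    # One 1080p -> 720p rung of a transcode ladder, encoding on the GPU.
    cmd = [
        "ffmpeg",
        "-i", "input_1080p.mp4",     # placeholder source
        "-vf", "scale=-2:720",       # downscale (done on the CPU here)
        "-c:v", "h264_nvenc",        # H.264 encode on the GPU, not libx264
        "-preset", "fast",
        "-b:v", "4M",                # illustrative bitrate
        "-c:a", "copy",              # pass audio through untouched
        "output_720p.mp4",
    ]
    subprocess.run(cmd, check=True)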

~~~
corobo
> I am unable to transcode faster than real time with live transcode

Well, yeah. Maybe quantum computing will change that one day!

What is it you're working on? Colour me intrigued.

~~~
grhmc
I'm not sure if you're kidding or not, but just in case: it is desirable to be
able to encode faster than real time (i.e., 1 hour of media takes less than 1
hour to transcode) so you have leftover processing cycles to do other work.

~~~
scrabble
This is true, but real-time encoding is essentially what you need for live
video, as opposed to VOD transcoding. You have other processes doing the other
necessary work during the transcode.

All in all, I aim for somewhere around 0.25s to complete the rest of the work
and deliver the segment to the CDN.

------
asendra
Damn, I would love to have an excuse to play around with this.

