
Introducing Preemptible GPUs - boulos
https://cloudplatform.googleblog.com/2018/01/introducing-preemptible-gpus-50-off.html
======
minimaxir
A month ago I ran benchmarks on CPUs vs. GPUs on Google Compute Engine, and
found that GPUs were now about as cost-effective as using preemptible
instances with a lot of CPUs: [http://minimaxir.com/2017/11/benchmark-
gpus/](http://minimaxir.com/2017/11/benchmark-gpus/)

That was _before_ preemptible GPUs: with the halved cost, the cost-
effectiveness of GPU instances now doubles, so they're a very good option for
hobbyist deep learning. (I did test the preemptible-GPU instances recently;
they work as you'd expect.)

~~~
Eridrus
I'm always disappointed in these comparisons since they never look at AWS spot
instances, which are the real competitors for hobbyists.

~~~
maksimum
I think the real competitors for hobbyists are physical GTX 1060s. Hobbyist
cloud-computing is a different question though.

~~~
cobookman
Doesn't Nvidia's new EULA make that difficult? Anything requiring more than a
handful of GPUs would be classified as a data center deployment, which is
against the EULA.

IANAL though so I might have interpreted this incorrectly.

[http://www.nvidia.com/content/DriverDownload-
March2009/licen...](http://www.nvidia.com/content/DriverDownload-
March2009/licence.php?lang=us&type=GeForce)

~~~
jdietrich
I doubt that any court would consider a half-rack in someone's closet to be a
"datacenter", nor would I expect Nvidia to enforce that EULA term against a
hobbyist.

~~~
cobookman
But if that hobbyist ends up creating a billion-dollar business, that's
leverage Nvidia has for a lawsuit.

It's a similar strategy to Adobe's: they won't sue a single user for pirating
Photoshop, but the second that user has a successful business... that's a
different story.

~~~
vlod
> ends up creating a billion dollar business, that's leverage Nvidia has for
> a lawsuit.

A great problem to have. Maybe first concentrate on creating a billion-dollar
business, and by that time you can afford to get some 'approved' cards... ;)

------
gputhrowaway
The comparison with AWS Spot instances is meaningless: with the recent shift
to per-second pricing, the AWS spot market has changed significantly. There is
very little price volatility, but it's also almost impossible to get a spot
instance with a GPU allocated; I almost always get a spot-capacity-not-available
message.

Google's preemptible model achieves a much fairer distribution of GPUs than
AWS's model. AWS ended up with a spot market because you were charged for the
entire hour up front, and if AWS evicted you, you got that entire hour for
free; this made pricing spot instances very complicated for both users and
AWS. With the shift to per-second billing this issue has largely been
eliminated, since you now only get the first 10 minutes free if you get
preempted within those 10 minutes.

tl;dr: GCP preemptible instances are superior to AWS along all possible
dimensions: price, flexibility, IO, etc.

~~~
boulos
Disclosure: I work on Google Cloud (and helped launch this).

While I appreciate the sentiment, there are certainly things some people
prefer about Spot. For example, you basically get up to 4 minutes notice
compared to the 30 seconds I chose (see the other thread). Similarly, until
this change we didn't offer preemptible VMs with GPUs attached.

You're right about the complexity of Spot. Preemptible (and Azure's copy, Low-
Priority VMs) is all about a fair, predictable price. Not _everyone_ hates
markets, but the number of companies and customers burned by the Spot market
gave us the conviction to push for a simpler, fixed price.

Again, thanks for the praise (but there are pros and cons!).

------
viridian
This is a great (read: inexpensive) resource for big-data modelling if someone
is willing to build for it. By choosing to be the customer of least concern
and designing software that can handle being bounced, you get a terrific deal,
assuming your work isn't particularly time sensitive, such as in a
bioinformatics research lab.

I wish they published the minimum time you will be granted cycles, though, as
that information seems like an important design constraint when choosing the
size of the data chunks you design for.

~~~
LeifCarrotson
> Compute Engine may shut them down after providing you a 30-second warning,
> and you can use them for a maximum of 24 hours.

That's a good start...looks like the model is "save your work often". I'm sure
you could run a few instances for a while and model the distribution of time
granted.

~~~
viridian
Right, my concern is more the event of them axing work 32 seconds in or
something, which could be a problem for some data analytics tools I've seen
that need a good 70 seconds or so to reach a steady state.

~~~
jkaplowitz
Quoting their docs: "Compute Engine doesn't bill for instances preempted in
the first 10 minutes," so that scenario is harmless in most application
designs where preemptible VMs would make sense anyway.

Disclosure: I used to work at Google on GCE, but not directly on preemptible
VMs or this billing policy. I don't currently work or speak for Google.

------
etaioinshrdlu
What I would really love (and use heavily!) is a sort of "AWS Lambda" for
GPUs.

I don't think anyone has such a product at this point.

~~~
mv4
what would you use it for?

~~~
etaioinshrdlu
Batch processing. Inference with NNs. Yes, even inference is much faster on a
GPU; perhaps a factor of 40.

Some jobs have time constraints and need to be completed as quickly as
possible.

And I have very bursty workloads.

I would love such a service.

~~~
mv4
Bursty workloads are the key here. In your opinion, what's the main challenge
with existing hourly options (like Amazon's Elastic GPUs, or similar offerings
from Google Cloud) - is it the management/provisioning? So, if someone made a
Lambda-like service, where one wouldn't need to worry about provisioning - you
think there'd be a market for that?

Btw, I found some tasks (and some algos) don't parallelize well (e.g. wouldn't
get any faster from employing 40 GPUs instead of 4), but it would definitely
make sense to be able to run multiple models (or training runs) in parallel.
Once your needs are well understood, it's almost always significantly cheaper
to build your own GPU cluster.

~~~
etaioinshrdlu
Yes, it is precisely the management and provisioning that is difficult.

No one does hourly billing anymore, thankfully. GCE and AWS do per second or
per minute...

What I do now is autoscale a group of GPU instances in AWS based on the size
of the jobs queue. I create as many instances as will reasonably help,
sometimes up to 100. Work is distributed to them. They turn off immediately
once there is no more work to do.
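
The scaling side boils down to something like this (a simplified sketch; the
queue URL, group name, and jobs-per-instance ratio are made-up placeholders):

    import boto3

    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/gpu-jobs"  # placeholder
    ASG_NAME = "gpu-workers"   # placeholder auto scaling group of GPU instances
    JOBS_PER_INSTANCE = 4      # rough guess at how many queued jobs justify one box
    MAX_INSTANCES = 100

    sqs = boto3.client("sqs")
    asg = boto3.client("autoscaling")

    def scale_to_queue_depth():
        # How many jobs are waiting in the queue?
        attrs = sqs.get_queue_attributes(
            QueueUrl=QUEUE_URL,
            AttributeNames=["ApproximateNumberOfMessages"],
        )
        backlog = int(attrs["Attributes"]["ApproximateNumberOfMessages"])

        # Only create as many instances as will reasonably help, capped at 100.
        desired = min(MAX_INSTANCES, -(-backlog // JOBS_PER_INSTANCE))  # ceil division

        asg.set_desired_capacity(
            AutoScalingGroupName=ASG_NAME,
            DesiredCapacity=desired,
            HonorCooldown=False,
        )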

The code runs in Docker containers, but I am forced to maintain the base Linux
system and NVIDIA drivers because no one provides a container or FaaS offering
for NVIDIA GPU computation...

I get the sense that this is a common problem nowadays. The way NVIDIA manages
software releases doesn't help anything. There's quite a bit of... churn. They
don't play nice with anybody else, hence Linus's famous words.

There's totally a market for containers as a service with passthrough access
to NVIDIA hardware. It had best not be more than 30% more expensive than a raw
instance, though, or it won't be very exciting.

------
argonaut
The 30 second shut-off notice is pretty bad for deep learning training, IMO.
You often will not be able to finish a mini-batch of training and save a model
to disk in 30 seconds. There are probably ways to interrupt processing of a
mini-batch, but it'll require some custom code (it probably will be hard to do
in Keras, for example).

~~~
boulos
Disclosure: I work on Google Cloud (and helped launch this).

Actually, both Keras and raw TF have checkpointing built in (as does Torch). I
believe it can be set up to do so every epoch, but unless you're talking about
many, many GBs of parameters, you can also stream lots of data to GCS in 30s.

~~~
argonaut
You're talking about support for checkpointing every epoch, but a 30 second
shutoff would require checkpointing every N mini-batches, which is not a
typical use for checkpointing.

~~~
boulos
Losing some work in exchange for a large discount is still a good trade
though. It's a non-goal to provide 100% efficiency :).

~~~
argonaut
My point is, you would have to write a nontrivial amount of code to get this
working. It would not work out of the box with Keras right now, for example.

~~~
boulos
I think you skipped over my first comment. This is built into Keras (and was
the second result on Google for 'keras checkpoint'):
[https://keras.io/callbacks/#example-model-
checkpoints](https://keras.io/callbacks/#example-model-checkpoints)

I don't consider that a nontrivial amount of code to write. Am I missing
something?

~~~
argonaut
I am well aware of Keras' checkpoint code. It only supports checkpointing
every epoch. So no, it does not work out of the box with Keras.
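
You'd have to write your own callback, something along these lines (a rough,
untested sketch; the save path and interval are placeholders):

    from keras.callbacks import Callback

    class BatchCheckpoint(Callback):
        # Saves the full model every `every` mini-batches instead of every epoch.
        def __init__(self, path, every=50):
            super(BatchCheckpoint, self).__init__()
            self.path = path        # e.g. a file on local SSD
            self.every = every
            self.batches_seen = 0

        def on_batch_end(self, batch, logs=None):
            self.batches_seen += 1
            if self.batches_seen % self.every == 0:
                self.model.save(self.path, overwrite=True)

    # model.fit(x, y, callbacks=[BatchCheckpoint("ckpt.h5", every=50)])

That's still not something ModelCheckpoint gives you for free, which was my
point.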

------
patall
Can anyone explain how you cope with the 30s shutdown notice? Do you save your
network every few iterations to your SSD, or is it feasible to save the whole
16 GB from the P100 within the 30s window? I mean, we can freeze a running
operating system (from RAM to disk); can we do the same with GPU memory?

~~~
carreau
See preemptible as something that should only run jobs that can be
re-scheduled. You don't save what you're doing; you close all that can be
closed and mark all tasks that have to be re-done. If you are 3 minutes into a
task that takes 5 minutes, stop it and put it back on your global queue; the
gain of running on preemptible instances will compensate for the cost of
rerunning a few tasks twice.
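
On GCE the worker can also watch the metadata server for the preemption flag
and, when it fires, push the in-flight task back onto the queue. Roughly (a
sketch; requeue_current_task is a placeholder for whatever your own queue
exposes):

    import time
    import requests

    # GCE exposes a 'preempted' flag on the local metadata server.
    PREEMPTED_URL = ("http://metadata.google.internal/computeMetadata/v1/"
                     "instance/preempted")

    def is_preempted():
        resp = requests.get(PREEMPTED_URL, headers={"Metadata-Flavor": "Google"})
        return resp.text.strip() == "TRUE"

    while True:
        if is_preempted():
            # ~30 seconds left: stop pulling new work and put the current
            # task back on the global queue.
            requeue_current_task()  # placeholder helper
            break
        time.sleep(5)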

------
Const-me
But isn’t it still way too expensive?

Performance-wise, that P100 is close to a GeForce 1080, and that GPU currently
retails for $550.

For the price of one month of Google’s preemptible IaaS you can buy the GPU
and use it for as long as you want, paying only a very small amount
($0.02-$0.1/hour depending on where you are) for electricity.

~~~
aeleos
I think it's more about the flexibility of having access to more resources
than you could ever need. If you buy a 1080 then you are limited by its
performance, and if you wanted to train two models at the same time, you are
even more limited. Whereas with something like this you could train 10 models
right when you need them trained, and you aren't stuck with 10 GPUs that you
bought just so you could do that.

~~~
Const-me
I understand the IaaS is way more flexible; e.g. in cases where you need lots
of GPUs but not for long, the cloud is a clear winner.

However, IMO the price is just not OK.

Compare with traditional servers. For $0.7/hour you can use e.g. a c4.4xlarge
Amazon instance. The hardware is much more expensive than $550; the RAM alone
is already around $400. You won’t be able to purchase an equally performing
server for one month of that IaaS rent; I think it’ll be more like 3-6 months
(which is IMO reasonable).
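
Back of the envelope, assuming roughly 730 hours of 24/7 use per month: 730 h
x $0.7/h is about $510/month, so 3-6 months of rent is roughly $1,500-$3,000,
which is much closer to what that class of hardware actually costs.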

~~~
tw04
That's because they can amortize the server over a longer time window than a
GPU.

~~~
candiodari
If that's true, then why are they still running the K80s?

~~~
tw04
They didn't add the K80s until February of 2017; did you really think they'd
literally throw out hardware after less than 12 months? "Accelerated" means an
18-36 month shelf life, not 10 months.

------
Dolores12
Did anyone calculate whether it's worth it for mining anything?

~~~
pgeorgi
If it is, that'll change in 2 weeks when the difficulty factors in that
everybody moved to preemptible GPU deals.

~~~
mv4
Unlike with deep-learning tasks, datacenter-grade (Tesla) GPUs don't have an
advantage in mining over much cheaper consumer GPUs. As an example, a cloud-
based server with dual Tesla P100s would get you 120-125 MH/s on ETH and would
cost $2,500 per month. Or, I can build a rig with consumer GPUs (1060/1070)
for under $2,000 (one-time).

Interestingly, you cannot run consumer GPUs in a datacenter, as Nvidia's
driver terms prohibit it.

------
yonran
I wish Google would just adopt spot prices like AWS EC2, where you can look at
the 3-month spot price history to estimate cost and to discover combinations
of availability zone and instance type with extra capacity. GCE’s preemptible
instances are “easy” in that there is no spot price to bid, but the downside
is the total absence of information, since there is no way to look up capacity
once your instance is shut down.

~~~
cobookman
This is a known problem. You should ping your account team for more
information on upcoming development.

Your account team can also advise you on which zones you should deploy
preemptibles in. Googlers can internally see current preemptible capacity and
preemption rates at a per-zone level. In certain zones you're more likely to
be preempted due to large customer demand.

~~~
swozey
AKA: not us-central1-c

------
jhallenworld
Well, we need a way to do this with FPGAs.

------
option_greek
How much money do companies spend on average per month on GPU computing?
(Looking for anecdotal figures.)

~~~
dmoy
That sounds like a number that would be fairly closely guarded by anyone who
could answer you :/

~~~
falsedan
You can just watch the spot prices for g2.2xlarge and see them go up to 4
times the on-demand price in some regions.

