
Introducing Preemptible VMs - boulos
http://googlecloudplatform.blogspot.com/2015/05/Introducing-Preemptible-VMs-a-new-class-of-compute-available-at-70-off-standard-pricing.html
======
jxf
This is a great announcement. Preemptible VMs are a great strategy if:

    
    
        * you don't care too much about when a workload finishes
        * it can be chunked into small units of interruptible work
        * the cost of resuming interrupted work is zero or small
    

Examples of workloads that might look like this:

    
    
        * geocoding large batches of addresses
        * map-reducing large event streams
        * content scraping/crawling
    

Note, too, that if you're using bdutil [0] you don't need to run _all_ of your
cluster as preemptible: some of the workload can be preemptible, and some not.
So you can guarantee a minimum processing throughput, but get extra throughput
very cheaply.
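
To make the three conditions listed earlier concrete (small chunks, interruptible, cheap to resume), here is a minimal sketch of a resumable chunk loop. The function name, the `handle_chunk` callback, and the JSON checkpoint file are all assumptions for illustration; on a real preemptible VM the checkpoint would need to live somewhere that survives the instance, such as a persistent disk or external storage.

```python
import json
import os

def process_in_chunks(items, chunk_size, checkpoint_path, handle_chunk):
    """Process items chunk by chunk, recording progress after each chunk."""
    # Resume from the last completed offset if a checkpoint exists.
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_offset"]

    for offset in range(start, len(items), chunk_size):
        handle_chunk(items[offset:offset + chunk_size])
        # Write to a temp file and rename, so a kill mid-write
        # cannot leave a corrupt checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"next_offset": offset + chunk_size}, f)
        os.replace(tmp_path, checkpoint_path)
```

If the process dies partway through, at most one chunk of work is repeated on the next run, which is why small chunks keep the cost of resuming low.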

I think that's a great way to do it, though I wish it were generalized for all
kinds of clusters, not just for Hadoop ones.

[0]
[https://github.com/GoogleCloudPlatform/bdutil](https://github.com/GoogleCloudPlatform/bdutil)

~~~
boulos
Yeah, I added it to bdutil since it was clearly useful and yet well understood
that it would do the right thing. I'd love to see more cluster and machine
management tools deal with "machines can come and go".

The guys on the Go team are using it for their buildbots
([http://build.golang.org](http://build.golang.org)), which were already
prepared for the box to die.

------
branola
Any developer who was burned by App Engine pricing in the past is likely to
take the lure of a "70% off" discount with a grain of salt because of the much
higher cost of incorrectly relying on Google's pricing pledges.

In case anyone forgets, the pricing changes on Google App Engine caused many
developers to abandon apps they had developed on the platform because of
hundreds of percent increases in pricing:

[https://groups.google.com/forum/#!topic/google-appengine/rSh...](https://groups.google.com/forum/#!topic/google-appengine/rSh8mgw2f3c)

~~~
csense
Off the top of my head, I can immediately think of no fewer than four examples
of useful services Google has discontinued or EOL'ed: Google Reader, Google
Code, Google Wave, and XMPP support for Google Voice.

So as well as potential future price hikes, you should be prepared for the
possibility that this service might not be around for the long haul. Therefore
you should skip using any Google-specific functionality, and instead implement
a design that allows you to easily migrate to another VM vendor.

~~~
Klathmon
I never got the "Google shuts everything down" mentality that has become so
prevalent.

I could easily beat those 4 with a multitude of examples from both Apple and
Microsoft. That doesn't mean that any of them are untrustworthy, just that
they evolve and continue to grow.

At least when Google shuts down a service, they give a good amount of "heads
up" to those using it, provide examples of trustworthy equivalent services
from competitors, and ALWAYS provide an export function if it makes sense to
have one.

~~~
superuser2
Apple and Microsoft may discontinue support/sales but the hardware/software
they sold you continues to function. That is inherently not true of "cloud"
services.

------
boulos
As we put in the docs, to try this out:

    
    
      gcloud compute instances create my-vm --preemptible --zone=us-central1-c
    

or just set the preemptible bool to true via the API (it's under scheduling).
When you want to test how your system behaves on preemption, you can just do:

    
    
      gcloud compute instances stop my-vm --zone=us-central1-c
    

which will give you the same 30 second timeout as when you're preempted. Most
OSes have a fairly standard set of things they do on shutdown that will at
least send all your running processes a signal (via kill), but if you need to
add your own you can inject it via the new shutdown script support
([https://cloud.google.com/compute/docs/shutdownscript](https://cloud.google.com/compute/docs/shutdownscript)).
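
As a rough illustration of reacting to that shutdown signal inside a worker process, here is a minimal sketch. It assumes the OS shutdown sequence delivers SIGTERM to the process (typical on Linux, but not stated above), and the names `ShutdownFlag`, `install`, and `run_until_preempted` are made up for this example.

```python
import signal

class ShutdownFlag:
    """Records that a termination signal has arrived."""
    def __init__(self):
        self.requested = False

    def handler(self, signum, frame):
        # Keep the handler tiny: just note the request and return.
        self.requested = True

def install(flag):
    # Assumption: the OS shutdown sequence delivers SIGTERM to this process.
    signal.signal(signal.SIGTERM, flag.handler)

def run_until_preempted(work_units, do_unit, flag):
    """Run units in order, stopping cleanly once shutdown is requested."""
    done = []
    for unit in work_units:
        if flag.requested:
            break  # leave the remaining units for a later run
        do_unit(unit)
        done.append(unit)
    return done
```

The flag is only checked between units, so each unit should take comfortably less than the 30 second window for the loop to exit before the VM goes away.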

We tried to cover this in the docs
([https://cloud.google.com/compute/docs/instances/preemptible](https://cloud.google.com/compute/docs/instances/preemptible));
any feedback on that would be welcome!

------
mark_l_watson
Nice! This ties in with the discussion yesterday on HN about data center
utilization. Good for saving money and the end result must be lower global
energy use if this also catches on with other providers (AWS already offers
something similar).

When I worked at Google in 2013 part of my job was running very large
calculations that were often pre-empted. I usually ran at the lowest priority
and was in effect using spare capacity. No hassles, assuming that getting runs
completed was not too time sensitive.

~~~
GreyTheory
Could you point me at the datacenter utilization article? I can't seem to find
it, and it sounds interesting

------
4k
I might be missing something important here. But according to their pricing
page[1], a preemptible VM with 30GB memory costs $86.40 a month.

Why would someone go for this when cheap dedicated host providers like Hetzner
etc. offer powerful dedicated servers with 64GB memory and multicore
server-grade CPUs? The comparison only gets worse taking into account that
Google's offering is preemptible and can shut down and come up as they wish.

[1]:
[https://cloud.google.com/compute/#pricing](https://cloud.google.com/compute/#pricing)
(0.12x24x30)

~~~
rwmj
It's like asking why do people use Serviced Offices which are usually 10x the
price of a monthly rented office. Answers: because they only need the office /
VM for a few hours. Because they can immediately get an office / VM of any
size they need. Because they can walk away from the office / VM when they
don't need it any longer. It's certainly not for everyone, but Regus seem to
be doing ok.

~~~
vidarh
And for those who truly only need it now and again, it's great.

But time and time again I see infrastructure where people pay for these
services for large numbers of instances that are running continuously, blindly
assuming that it's cheap because it's cloud. There's a bizarre level of
price-blindness amongst a certain subset of customers of Google Cloud and AWS
that I've never seen anywhere else.

------
vosper
@boulos, since it looks like you're involved with this: The link to the
product pricing [1] goes to a generic landing page for Google services, which
is a little confusing. I think you should link directly to the instance
pricing [2].

[1] [https://cloud.google.com/pricing/#compute-engine](https://cloud.google.com/pricing/#compute-engine)

[2]
[https://cloud.google.com/compute/pricing](https://cloud.google.com/compute/pricing)

~~~
boulos
Yeah, as part of our larger pricing announcement we're making sure people go
to /pricing so they get a bigger picture before diving straight to the table
;)

[Edit: Point taken though]

------
gnur
I wonder how big the chance is that the machine is turned off in the first
hour.

I use Google Compute for a personal project and the typical run time for a
VM is about 10 - 20 minutes. If the VM has a 90% chance to survive the first
hour, it could be worth the trouble to make my process more fault tolerant.
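
For a back-of-the-envelope feel for those numbers, here is a small sketch. It assumes preemption is memoryless (a constant hazard rate), which is purely a modelling assumption and not something Google has published, and it takes the 90% first-hour survival figure as a hypothetical input. It also assumes an interrupted run restarts from scratch.

```python
def survival_probability(p_survive_hour, job_minutes):
    """P(a job of the given length is not preempted), constant-hazard model."""
    return p_survive_hour ** (job_minutes / 60.0)

def expected_attempts(p_survive_hour, job_minutes):
    """Mean number of tries until one run finishes (geometric distribution)."""
    return 1.0 / survival_probability(p_survive_hour, job_minutes)

for minutes in (10, 15, 20):
    p = survival_probability(0.9, minutes)
    print(minutes, round(p, 4), round(expected_attempts(0.9, minutes), 4))
```

Under these assumptions a 10 to 20 minute job would survive far more often than not, so only a small fraction of runs would need a retry.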

~~~
boulos
We're not ready to put hard numbers on it. As our docs
([https://cloud.google.com/compute/docs/instances/preemptible](https://cloud.google.com/compute/docs/instances/preemptible))
say:

The probability that Compute Engine will terminate a preemptible instance for
a system event is generally low, but may vary from day to day and from zone to
zone depending on current conditions.

Give it a shot in us-central1-a and let us know how it goes!

[Edit to undo my quote text (it wrapped poorly)]

~~~
pacala
I couldn't figure this out from the docs: can one restart a preempted
preemptible instance, or does one need to start a different one? Would it be
possible to restart in non-preemptible mode, so the job completes but at a
higher price? We still want to complete our workloads, one way or another :)

~~~
boulos
Yes you can start a VM back up via gcloud:

    
    
       gcloud compute instances start instance-name
    

Note of course that if you got preempted because we needed the capacity back
for regular VMs (as opposed to say a maintenance event) you may not be able to
start a new Preemptible VM in that zone.
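
One way to get the "complete it one way or another" behaviour asked about above is to retry on preemptible capacity a few times and then fall back to a regular instance. The sketch below only shows the control flow: `run_job` is a hypothetical callback you would supply (it is not a gcloud or API call) that launches the job on a preemptible or regular VM and reports whether it finished.

```python
def complete_job(run_job, max_preemptible_attempts=3):
    """Try cheap capacity first, then pay full price to guarantee completion.

    run_job(preemptible) -> bool is a caller-supplied function that returns
    True when the job ran to completion and False when it was interrupted.
    """
    for attempt in range(1, max_preemptible_attempts + 1):
        if run_job(True):
            return "preemptible", attempt
    # Preemptible capacity kept disappearing: fall back to a regular VM.
    if run_job(False):
        return "regular", max_preemptible_attempts + 1
    raise RuntimeError("job failed even on a regular instance")
```

The returned tuple records which kind of capacity finished the job and on which attempt, which is handy for tracking how often the fallback is actually needed.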

------
burner21926
This is awesome. Thank you Google for another great addition to hassle-free
cloud computing.

------
ikeboy
How does their pricing compare to ec2 spot instances?

~~~
vosper
The fundamental difference seems to be that Google's prices are fixed, whereas
AWS uses a "market" model, which frequently sees crazy prices (well over the
on-demand price) especially in us-east-1.

I say "market" because no-one really knows how the spot market place actually
works. We've had machines run for weeks, and other times the prices fluctuate
in bizarre ways and we can't get our preferred instance types for hours or (in
the worst case) days. There's an interesting analysis here:
[http://santtu.iki.fi/2014/03/20/ec2-spot-market/](http://santtu.iki.fi/2014/03/20/ec2-spot-market/)

~~~
CoolGuySteve
We had the same problem with getting priced out of c3.8xlarge in Virginia. We
fixed it by changing our allocation algorithm to find alternate instance types
and zones. For example, instead of 1 c3.8xlarge, it might pick 2 c3.4xlarge
instances or a cc2.8xlarge. Seems to work so far.
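
A rough sketch of that kind of allocation logic might look like the following. It is not the actual algorithm described above: the `Offer` type, the zone names, and every price are invented for illustration, and it only considers using a single instance type at a time rather than mixing types.

```python
import math
from typing import NamedTuple

class Offer(NamedTuple):
    instance_type: str
    zone: str
    vcpus: int
    hourly_price: float  # current spot price, hypothetical numbers below

def cheapest_allocation(offers, required_vcpus, max_hourly_budget):
    """Return (offer, count, total_price) for the cheapest fit, or None."""
    best = None
    for offer in offers:
        count = math.ceil(required_vcpus / offer.vcpus)
        total = count * offer.hourly_price
        if total > max_hourly_budget:
            continue  # priced out of this type/zone right now
        if best is None or total < best[2]:
            best = (offer, count, total)
    return best

offers = [
    Offer("c3.8xlarge", "zone-a", 32, 2.00),   # spiked spot price
    Offer("c3.4xlarge", "zone-b", 16, 0.30),
    Offer("cc2.8xlarge", "zone-c", 32, 0.70),
]
choice = cheapest_allocation(offers, required_vcpus=32, max_hourly_budget=1.00)
print(choice[0].instance_type, choice[1])
```

With these made-up prices the spiked c3.8xlarge is skipped and two c3.4xlarge instances are chosen instead.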

I looked through the pricing table and played with the calculator. It seems
something equivalent to our needs would cost around a third more on Google,
though each CPU would have twice as much RAM. Not worth it for us.

~~~
boulos
For people coming from AWS, we currently don't have an instance shape that
lines up with the c3/c4 ratio (pushing you either to our n1-standards or
n1-highcpu-). Can you get by with _less_ memory?

Note: we're very aware of this pain point, and maybe you'll see something soon
;).

~~~
CoolGuySteve
We need maybe 1GB for each job, but we need some leeway to avoid the OOM
killer when a group of work units is larger. Swapping takes so long that it's
not cost effective.

But to be honest about the situation, the cost would have to be much lower to
make it worthwhile for me to rewrite our scheduler on Google's API.

------
SEJeff
And this is where a tool like mesos + aurora/marathon really really shines.
The same can be said for the likes of kubernetes.

~~~
deadbunny
Could certainly be interesting for cheaply expanding Mesos slaves to cope with
a workload spike.

------
h43k3r
Now I am waiting to see a similar thing in Azure.

Preemptible VMs will prove to be very useful in fault tolerant Distributed
Networks.

One more use case that I can think of is load testing on a large scale (in a
distributed way).

------
Tegran
The 1970s just called. They want their batch processing back. :-)

------
andybak
Any reason you couldn't use these for conventional web stuff? 30 seconds could
easily be long enough to bring up another instance and sync data if needed.

~~~
CydeWeys
By "conventional web stuff", do you mean serving a website or similar? Yeah, I
don't think it makes sense to host a website on a service that is preemptible.
You need your webserver available to respond to requests 24/7. It can't just
randomly go down. Pre-emptible instances are more suited for batch processing
and big data computation runs.

~~~
comex
If you have a website that can be distributed among arbitrarily many frontend
servers, it could make sense to put them on preemptible instances, with the
database on a regular instance. (In the unlikely case no preemptible instances
are available, you could always automatically switch the frontend servers to
regular ones.) However, I think this could only possibly be efficient pricing-
wise if your traffic was extremely bursty.

------
tw04
So... this is essentially first come, first served based on their description.
And they say the preemptibles come from a smaller pool of resources. Seems odd
to me they wouldn't have a number somewhere of how many are available and how
many are in use given that it's a finite resource.

Tough to build a business model around a resource you can't even determine the
availability of.

~~~
nightcracker
> Tough to build a business model around a resource you can't even determine
> the availability of.

That's exactly why it's cheap: you're trading price for reliability.

~~~
tw04
reliability != availability. I can build a business model around an
unreliable, but available resource. I can't build a business model around an
unavailable resource. They're two very different things, and the distinction
is important.

~~~
nl
This is basically for batch processing. That means that the lack of availability
isn't such an issue because this lets you trade speed-of-processing for cost.

Say you have a few hundred terabytes of images to process. You can prioritize
images by pushing them to the head of the queue, but you don't really care how
long the complete batch takes.

If you are happy to wait for your job to complete you pay less. Otherwise, pay
more and guarantee completion.
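
The "push it to the head of the queue" idea can be sketched with a small priority queue. This is just one possible shape, with a made-up `WorkQueue` class: urgent items jump ahead of normal ones, and items of equal priority come out in the order they were added.

```python
import heapq
import itertools

class WorkQueue:
    """Batch work queue where urgent items are served before normal ones."""
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order

    def push(self, item, urgent=False):
        priority = 0 if urgent else 1
        heapq.heappush(self._heap, (priority, next(self._counter), item))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)

queue = WorkQueue()
queue.push("image-a")
queue.push("image-b")
queue.push("image-c", urgent=True)
order = [queue.pop() for _ in range(len(queue))]
print(order)  # ['image-c', 'image-a', 'image-b']
```

Workers on preemptible VMs would simply keep popping from a queue like this, so the overall batch finishes whenever capacity allows while urgent items are still handled first.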

------
bitJericho
At a penny per core per instance hour the cost is about 7 dollars a month. I
can get a normal dedicated VM for cheaper... What's the benefit in this?

~~~
jpatokal
> What's the benefit in this?

You're not buying a dedicated instance for a month, you're renting it as long
as you need it. If you only need a core for an hour, your bill will be $0.01.
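
The arithmetic behind the figures quoted in this thread is easy to check. The sketch below uses the two hourly rates mentioned above ($0.01 per core and $0.12 for the 30GB shape) and a 30-day month; the $5 flat monthly price in the break-even example is a hypothetical number, not any provider's real price.

```python
from decimal import Decimal

HOURS_PER_MONTH = 24 * 30

def rental_cost(hourly_rate, hours):
    """Cost of renting for a number of hours at a fixed hourly rate."""
    return Decimal(hourly_rate) * hours

def break_even_hours(hourly_rate, flat_monthly_price):
    """Hours per month below which hourly rental beats a flat monthly fee."""
    return Decimal(flat_monthly_price) / Decimal(hourly_rate)

print(rental_cost("0.01", 1))                # 0.01
print(rental_cost("0.01", HOURS_PER_MONTH))  # 7.20
print(rental_cost("0.12", HOURS_PER_MONTH))  # 86.40
```

For example, `break_even_hours("0.01", "5")` comes out to 500 hours: against a hypothetical $5 per month dedicated VM, the penny-per-hour core is cheaper as long as you need it for fewer than 500 hours in the month.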

