Saving Hundreds of Hours with Google Compute Engine Per-Minute Billing (omerio.com)
138 points by izzym on Mar 16, 2016 | 75 comments

Google Cloud wins big on simplicity and ease of use. That can save a lot of developer time. Here is an article I am writing on "Ease of use comparing Google vs AWS": https://docs.google.com/document/d/1B5XUiQUClxdoFpl2Md5MZOv6...

Any feedback is welcome.

Citing from OP on ease of use with Google Cloud:

> Over the past couple of years, I’ve never struggled to grasp or understand any of the GCE offered services. The predefined machine types are also very clear, shared core, standard, high memory and high CPU, I know them all by heart now with their memory configurations and pricing to some extent. I clearly understand the difference between the IOPS for a standard disk and an SSD disk and how the chosen size impacts their performance. There is no information overload, disk pricing and other information is kept separate, it’s simple and easy to understand.

Now compare this with EC2 VMs, and it's overwhelming: current, previous, etc. generation VMs. Disk information and network configurations are all lumped together with VM configurations, plus paragraphs and paragraphs of different price configurations. For me, it was painful just trying to understand which VM type is suitable for my needs. My first encounter with SSD configurations and maximum provisioned IOPS for AWS RDS was one of pain. Instead of spending time working on my project, I found myself spending valuable time trying to select which IaaS offerings best fit my needs. Things like trying to figure out whether low-to-moderate or high network connectivity is best for my needs! No wonder I still hear many say they find cloud offerings confusing! I think that's no longer the case with GCP.

Very structured and lots of meaningful analogies. I didn't know that AWS had so many storage services! And you are right, the AWS console feels like a marketing dashboard; so far I've only ever clicked on 3 icons and never looked at the other 30 or so.

I think it's mostly historical baggage. As soon as GCE gets as old as AWS, the offering variations they have to keep around for historical reasons and whatnot will be as confusing as AWS is today.

Actually, we've made some explicit choices along the way to avoid this cruft.

For example, we haven't introduced different "generations" of machine types, and instead have stuck with "n1-standard-1" even across different architectures (we then document which underlying processor architectures are available on a zone-by-zone basis at https://cloud.google.com/compute/docs/zones#available for people that care).

Similarly, instead of introducing "Local SSD instances", we let you attach Local SSD partitions to any VM arbitrarily. And Preemptible VMs are just a boolean option on any VM.

So you don't see a machine-type matrix explosion on GCE, and that's on purpose.

Disclosure: I work on Compute Engine.

why do you think amazon did not do the exact same things, before they accumulated years of baggage?

There's a real trade-off. I didn't mean to dismiss it (apologies if it came across that way). For some workloads (mostly vectorization-friendly ones), a 2.3 GHz Haswell runs absolute circles around a 2.5 GHz Sandy Bridge. So if you care about newer architectures, now the API either needs to let you choose a processor type per zone (painful) or you end up taking it on as the provider (each zone has a single processor type). We also elect to maintain single-threaded performance across a variety of benchmarks, which is a double-edged sword.

So Amazon made a different choice: make the customer decide explicitly. We went with "let the customer decide if they want to". Most people internally at Google don't bother, and I'd say we've been proven correct in the marketplace as well; if you need to care, we're transparent (otherwise, who cares!).

I've been using GCE a lot since 2012 (early beta tester) and the web interface has only improved with time.

Same with AWS; I've been using that since 2008 or so. At that time, many of the services, or bits of functionality within certain services, were not accessible via the web interface at all, only via the APIs.

I see a constant flux of features, but the whole Google Cloud ecosystem is well thought out and coherent, as if ease of use is the primary concern. App Engine, after 8 years, is still as easy to use as the day it was released.

I agree, most of it is accumulation over the years

It's not just accumulation. IMHO, the features of Google Cloud are better thought out.

Two examples of things that rock:

- Being able to pop up an SSH shell right from your browser
- Google Cloud Shell: a free Linux shell in the sky with a bunch of dev tools pre-installed (including Docker)

Wow, I never realized that Google offers per-minute billing. We have a video decoding/encoding workload, and handling it on AWS has been a major pain point. Decoding/encoding 4K video requires quite a bit of beef, with a hefty minimum memory requirement on the order of 3+ GB for h264 encoding. Less if you use the ultrafast preset, but then you pay for larger files and larger egress.

So, we can spin up a beefy EC2 (c4.8xlarge or larger) for each video, and get them handled quickly, but we get charged for the full hour which is quite expensive. All the machines we would target for spot instances have had very rocky pricing lately, so that often doesn't save much. We could spin up less beefy instances, trying to encode/decode in ~1 hour so as to maximize economy, but then, obviously, each video takes much longer to process.

Lambda would be an okay solution, approximately double the price of equivalent EC2 solutions, while providing two very big wins: granularity and 5 minute encoding/decoding time regardless of video size/length. That second point is pretty insane; we could process an entire 4K movie in 5 minutes by unleashing a swarm of Lambdas on it. The problem is that Lambdas have a max memory of 1.5GB which isn't enough for 4K encoding (h264). Also they have limited disk space, so it's challenging to get frames and video in-out (and no FUSE module in the kernel :/). We're still experimenting, but even if we do get it to work it will probably be with the ultrafast preset, which is non-ideal.
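For what it's worth, the chunking part of that swarm idea is simple to sketch (hypothetical helper in Python; the real workers would shell out to an encoder such as ffmpeg, and keyframe alignment at chunk boundaries is the hard part):

```python
def plan_chunks(duration_s, chunk_s=30):
    """Split a video of duration_s seconds into (start, length) chunks,
    each small enough for one Lambda's 5-minute runtime limit."""
    chunks = []
    start = 0
    while start < duration_s:
        chunks.append((start, min(chunk_s, duration_s - start)))
        start += chunk_s
    return chunks

# Each chunk becomes one Lambda invocation; the encoded chunks get
# concatenated afterwards.
```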

Elastic Transcoder doesn't handle decoding/encoding, only transcoding, so it's a no-go. We considered using it in a larger pipeline, like having Lambdas sew frames up into a lossless codec and then handing that to Elastic Transcoder to be compressed. But ET won't encode 4:4:4 formats, and it incurs the cost of both a Lambda swarm and ET, so that's not great.

Per-minute billing would be ideal. Now I have to consider whether it's worth it to port everything over to Google. Oh how I wish these companies didn't use ludicrous egress pricing to enforce vendor lock-in.

What's even better, you can use preemptible instances in GCE which are a rough equivalent to Spot instances in Amazon, but have fixed pricing (no bidding nightmare). They can live up to 24 hours, but can be shut down earlier, so your processing pipeline should be able to handle it. But with up to 70% lower prices and the same per-minute billing it's a steal!

Keep in mind that GCE has 10 minute minimum, so if you run an instance for less than 10 minutes, you'll be charged for 10 minutes, but after that billing is per-minute.
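Concretely (a minimal Python sketch; the rate is a placeholder, real GCE prices vary by machine type):

```python
import math

def gce_billed_cost(runtime_minutes, price_per_minute):
    """GCE bills a 10-minute minimum, then rounds up to the next minute."""
    billed_minutes = max(10, math.ceil(runtime_minutes))
    return billed_minutes * price_per_minute

# A 4-minute job bills as 10 minutes; a 25.5-minute job bills as 26.
```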

+1. Preemptible instances are insanely cheap. If you can tolerate the possible loss of the machine (video processing seems like it would fit), they are a bargain.

+1. We run 90% of our backend processing on PVMs now, basically anything that is not time sensitive.

Why not set up a render farm that's always running and then send it job after job to keep the pipeline full?

You said that making it take an hour was ok, so I assume the processing time is somewhat flexible. If you don't have enough workload to always keep one machine busy, you could always wait until you did have enough workload and then spin up a one machine render farm, have it work for an hour and then shut it back down until you had enough work.

It's true that per minute billing would save you money, but assuming what you're doing is meant to scale up, once you get past enough work to keep a single machine busy, you've gained nothing.

However if you put in the work now to separate the jobs from the machines, you're on the path to much better future scalability, because you can size the render farm to the work instead of having the farm and the number of jobs directly linked.

> You said that making it take an hour was ok, so I assume the processing time is somewhat flexible.

My comment was unclear. It's a balancing act between the running time of a job, the time before a job starts, and cost. 1 and 2 are both important, but only as far as 3 allows.

> It's true that per minute billing would save you money, but assuming what you're doing is meant to scale up, once you get past enough work to keep a single machine busy, you've gained nothing.

Completely true, but at that point we're running a cloud within a cloud ... which seems silly. Per-minute billing would get us what we need, without the complexity of having to engineer a cloud within a cloud. Plus, it would allow scaling horizontally, which improves parameter #2 above, without impacting #3. In other words, a farm of huge machines will need a queue to keep them busy. Having a queue means there's some delay before a job starts. Whereas if we just spin up a per-minute billed machine immediately for each job, there is no delay. That requires less engineering, and handles any workload whether it be heavy or sporadic.
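To put rough numbers on that trade-off, here's a hedged sketch (Python; prices are placeholders) of per-hour billing with full-hour rounding versus per-minute billing with GCE's 10-minute minimum:

```python
import math

def hourly_billed(job_minutes, price_per_hour):
    # Per-hour billing, one machine per job: runtime rounds up to full hours.
    return math.ceil(job_minutes / 60) * price_per_hour

def per_minute_billed(job_minutes, price_per_hour, minimum_minutes=10):
    # Per-minute billing with a 10-minute minimum, as on GCE.
    billed = max(minimum_minutes, math.ceil(job_minutes))
    return billed * price_per_hour / 60

# A 20-minute job on a $0.60/hr machine: $0.60 hourly vs $0.20 per-minute.
```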

> Completely true, but at that point we're running a cloud within a cloud

You're doing that anyway, even if you run one job per machine; you just lose granularity (one of the big advantages of any "cloud").

> In other words, a farm of huge machines will need a queue to keep them busy. Having a queue means there's some delay before a job starts

But that's only true if you can't fill one machine. Once you're filling one machine, your queue processing time will be the same. Presumably you still have a queue; it's just that the queue drives the start of machines instead of processes on the machine.

> That requires less engineering

Right, like I said, you can do less engineering now but pay more later, in both money and maintenance costs, as your workloads scale up.

With 1.1 machines' worth of processing, either some things wait an hour or you're wasting a lot of time on the second machine. So shorter increments still save you as the workload grows, just less and less as the system keeps growing. (More so if demand is random.)

Little pro-tip: use Direct Connect and POOF egress goes down to nearly $.03 per GB. https://aws.amazon.com/directconnect/

Disclaimer: I work at GCP, and used to work at AWS, so I've actually seen this save customers $millions

The leased line provider will charge you for bandwidth too. But it's probably cheaper overall. And you get QoS guarantees.

Not if you use Direct Peering from Google :D https://cloud.google.com/interconnect/direct-peering

Disclaimer: I work at GCP, so I know you can only get a few TB/s this way, which might be a bottleneck. YMMV

Oh, to be constrained by "a few TB/s".... :)

Or even easier (come on Miles!), use CDN Interconnect via Fastly, CloudFlare, etc.: https://cloud.google.com/interconnect/cdn-interconnect

Disclosure: I work on Compute Engine.

Any blog posts you have links for describing how this architecture looks?


It's sorta like that.

Disclaimer: I work at GCP, so feedback on how you'd like to see a more detailed version might actually have an effect!

I'm curious about what things look like from a management point of view. It would be great if I could go from GCP to AWS directly and not through an intermediary.

For workloads like yours, per-minute billing is amazing. It lets you have a simpler architecture (machine failed? no problem, just re-enqueue the single video or chunk that it was handling) and is economically equivalent to running the work back-to-back. We want to encourage you to do this, because internally we would never expect someone to keep long-running machines up just to run mapreduces.

To entice you some more, custom machine types are also a huge differentiator here. The jump in cost and number of cores between fixed instance shapes is huge. When choosing between the c4.4xlarge and 8xlarge, you've got just over a 2x jump in price and performance. With custom machine types, you could nudge yourself upwards to say 18 vCPUs or 24 (and can even tune the type to your resolution).
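As a back-of-the-envelope sketch (Python; the per-resource rates below are made-up placeholders, not actual GCE prices), custom machine type pricing is roughly linear in vCPUs and memory, which is what makes the in-between shapes possible:

```python
# Placeholder rates for illustration only; real GCE custom machine type
# pricing is published per vCPU-hour and per GB-hour.
VCPU_PER_HOUR = 0.033
GB_PER_HOUR = 0.0045

def custom_machine_hourly(vcpus, memory_gb):
    """Hourly price of a custom shape, linear in each resource."""
    return vcpus * VCPU_PER_HOUR + memory_gb * GB_PER_HOUR

# Instead of jumping from 16 to 32 vCPUs, you can price an 18-vCPU
# shape directly and pay only for the increment.
```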

So take a look! We've got a free trial of $300 for 60 days.

Disclosure: I work on Compute Engine (and specifically, Preemptible VMs).

Please add a preemptible option for Container Engine! (GKE)

Lambda is not a good fit for long running processes. Also, the lambda instances that run your code only have a small amount of RAM (I believe 512MB is the default, but it is tunable). In short, I don't believe lambda would be a good fit for your 4K video encoding use case.

Lambda's limits are currently a max 5min run time and ~1.5GB.

What are the benefits of using cloud providers for video encoding, compared to a dedicated server provider, or even a localized server? Dedicated providers can offer 16-core/32-thread Xeon servers for a few hundred dollars a month.

If you need to spin up dozens or hundreds of machines at short notice, and then take them down, I could see that cloud use case. If it's a handful or just one, it seems the cloud only complicates the matter.

Your story is exactly the story of Atomic Fiction:


> and no FUSE module in the kernel

FUSE typically has a lot of latency from context switches, or so I understood. Is it possible to make a native driver?

Couldn't Amazon Snowball or Direct Connect significantly reduce the cost of getting your data off their storage?

I'm seeing a lot of folks here talking about how to-the-minute billing would save them so much money on Amazon, but I think you're not thinking about the problem correctly.

If you want to build for scalability, then it's best to separate the jobs from the machines. Having one job per machine isn't granular enough. It's best to have a farm of machines that can take jobs and process them, and then scale the farm if the jobs aren't processed fast enough. (Incidentally this is what Lambda does for you)

If you do that, then once you have enough jobs to fill a single machine, to-the-minute billing doesn't really save you much money, if any, and you're then on a path to much better future scalability.

I think it depends on the use case and service.

As a disclaimer, I work for Google on Cloud Dataproc, a managed Spark and Hadoop service. So I am passionate about and focused on Spark clusters from 3 CPUs/3 GB RAM to 5k+ CPUs and TBs of RAM.

If you are running Spark (or Hadoop, Pig, Hive, Kafka, etc.) jobs, then per-minute billing can save you quite a bit. Unless you can separate your jobs over time (and balance them) to keep n clusters saturated, you're probably paying for an idle Spark/Hadoop cluster to sit around. Moreover, in terms of Cloud Dataproc (only), that's why you can scale clusters up/down and use preemptibles and custom machine types. You can custom-shape your clusters and pay for exactly what you use (disclaimer: there is a 10-minute minimum, like most Google Cloud services).

As a practical example, if you have a 100 node Spark cluster and only use 25 minutes of it, you can stand to save considerably in a given year. Yes, you can possibly rebalance your work internally to saturate a cluster at the optimal n minutes but at that point, you're paying to do the engineering work for it. :)

... and with some use cases you just can't saturate the cluster 24/7. For instance, if you have to run ad-hoc computational jobs.

Not every use case revolves around customer-facing websites or processing streaming data...


All the more reason that 15 minutes * 100 nodes * x days can add up very quickly. :)

True but as soon as you have two jobs you can saturate that cluster for an hour.

My point is that only at the smallest scales does hourly vs. per-minute billing make a difference. Yes, it's really nice, but it doesn't make as much difference as everyone says.

I think you are correct, but there is a case where saturating the cluster is maybe not the ideal situation.

I think there's a latency trade-off there. So imagine you have a job where latency matters, like a user wants to run a BigQuery-style interactive query across 1 TB of data, and ideally it should complete within 1 minute. Let's say it takes 10 instance-hours (600 instance-minutes) to complete this query.

With per-minute billing you could launch 600 instances and complete it in a minute. You could also do the same with per-hour billing, but you would overpay by a factor of 60x; alternatively, you could launch 10 instances and wait an hour for the query to complete.
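The math in that example, sketched out (Python; simplified, ignoring the 10-minute minimum):

```python
import math

def query_cost(instance_minutes, instances, price_per_minute,
               per_minute_billing=True):
    """Billed minutes for a fixed amount of work spread over N instances."""
    runtime = instance_minutes / instances  # wall-clock minutes
    if per_minute_billing:
        billed = math.ceil(runtime)
    else:
        billed = math.ceil(runtime / 60) * 60  # round up to whole hours
    return instances * billed * price_per_minute

# 600 instance-minutes of work:
#   600 instances, per-minute billing: 600 billed minutes, ~1 min latency
#   600 instances, per-hour billing:   36000 billed minutes (the 60x overpay)
#   10 instances,  per-hour billing:   600 billed minutes, ~1 hour latency
```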

Assuming you have a bunch of these queries coming in, you could queue them up, but then the latency will suffer, because you'll have to delay starting a job until you get to that position in the queue.

If you wish to trade some wait time for some cluster efficiency, yes, you can just queue them up and then slowly scale the cluster up and down to keep 100% utilization.

However, it would be nice if you really could scale up and down at per-minute increments and let Google figure out how to get their cluster utilized efficiently and let them take advantage of their different customers with different workloads and preemptible instances.

Can you give me an example of a job where latency matters that much?

I did in the post. Interactive queries.

You're reallllly simplifying a hard problem. "Just pack the bins tighter, no big deal." This is what Omega/Borg/Mesos are all trying to do; it's not an easy problem.

Also, you're assuming you have enough work to keep all machines busy all the time....

Actually, what I'm saying is that bin packing is a hard problem so separate the work from the machines, make the bins smaller, and then bin packing becomes easier.

+ YARN :)

It really depends on how elastic your workload is. Imagine if Amazon had a 1 day/week/month minimum for VMs. You'd run into the same issues (you wouldn't respond to a spike in demand if it was only going to last 2 hours).

I think the point is: Smaller billing windows allow you to be more responsive to your load and be able to shut down faster and save more money.

You're absolutely right. What I'm trying to say is that beyond the smallest scale, one hour is so small it doesn't make a difference.

I think the problem I'm reading in the comments is that Lambdas have a 1.5 GB max memory, which is a constraint for some use cases.

I may have derailed my comment a bit by mentioning lambda. My point was more about separating work from machines, which Lambda helps with, but you can also do on your own.

Some people have spiky workloads.

The author makes a good point.

Many years ago I was crunching (doing NLP analysis) on parts of the Twitter firehose (or sometimes just the garden hose) on Amazon Elastic Map Reduce. EMR was a fantastic service that made getting work done much easier. However, I was always trying to game the system by having my runs end in just less than one hour, which was a nuisance. Per minute billing would have made my life easier.

Amazon should definitely be feeling the pressure from Google on this. Per-hour billing has always been a pain to manage for offline/batch workloads.

If you're interested in leveraging advanced cost-saving techniques on AWS compute and storage, we're currently building something in the cloud HPC space that does exactly this. Nothing public yet, but email in my profile for more. Our system automatically manages elastic fleets of instances and right-sizes instances to their workloads (including spot and lambdas), and also makes it easy to reduce TCO on storage. It's agnostic in terms of deployment (hosted vs. bring-your-own-VPC).

When it comes to Big Data, per-minute billing + rapid startup is significant. When you can have a 10,000-core Hadoop cluster in 45 seconds, you stop thinking about it in terms of clusters and start thinking about it in terms of jobs.

Before: Start cluster > submit many jobs to the cluster > manage resources

Now: Submit a job to Dataproc > Start a cluster for each job > shut down the cluster

And, of course, BigQuery gives you the equivalent of per-second billing and 0-second scaling to thousands of cores:


In addition, you can also use preemptibles and custom VM machine types (specify the exact ratio of CPU/RAM you want for master + workers) for your clusters. This gives even more control for cost vs resources.

Disclaimer - I work at Google, on Cloud Dataproc (and we are passionate about making the Spark/Hadoop ecosystem both super fast, but also extremely cost effective) :)

BigQuery is well and truly amazing: zero upfront setup or infrastructure. Your query and BigQuery, that is it.

This is a great analysis, but I'm missing some basic background. Do all of the other cloud providers charge per-hour? Do people typically design their applications around this pricing model, or ignore it and eat the cost?

AWS rounds up to the nearest hour. Azure is by the minute. Not sure about the other smaller providers.

Wait till you try per-100ms billing on AWS Lambda

Except Lambda is quite expensive. Compared to equivalent EC2 t2 instances, Lambda is 4.6x the cost of on-demand, or ~20x of spot instances. I really don't understand why anyone would use it for a high-volume web app. For low-volume sites you have the advantage of not paying for idle time, but the inflection point is somewhere around 13 minutes per hour.
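The inflection point falls out of the cost multiplier directly (sketch; the 4.6x figure is from the comparison above and would shift with instance type and memory size):

```python
LAMBDA_MULTIPLIER = 4.6  # Lambda at ~4.6x the cost of equivalent on-demand EC2

def breakeven_minutes_per_hour(multiplier=LAMBDA_MULTIPLIER):
    # If Lambda costs k times as much per busy minute, it only beats an
    # always-on instance while busy less than 60/k minutes per hour.
    return 60 / multiplier

# 60 / 4.6 is roughly 13 minutes of busy time per hour.
```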

Lambda is great, though, at handling services that see unpredictable spikes of traffic throughout the day, where it's difficult to provision quickly. It's also really good at reducing machine-management issues where your software doesn't utilize resources very well on a given machine. While it's a bandaid on writing well-designed scalable software, it does allow you in the interim to take somewhat inflexible software and scale parts of it independently of physical machines.

edit: To clarify, I mean that if you're running software that does something on a given set of physical machines and you can't get job throughput higher than some number n on a given machine, it can be prohibitively expensive to scale machines ahead of time or during load. With Lambda we have the option to run a ton of independent processes on theoretically independent machines to vastly increase throughput while we slog through improving our design to improve per-machine throughput on our legacy EC2 infrastructure. Lambda allows you to isolate your scale problem to 1 job per container at a time instead of looking at machines as capable of only n jobs at a time.

We don't know their memory usage but if they use 1.5GB then they'll get 70 hours/month free from Amazon when using lambda [1], and it gets better from there depending on their usage.

But seeing as most of their jobs last less than an hour, I wonder how well they'd have done by taking advantage of the spot market.

Does anyone know of any tools that take advantage of the spot market? I know that Spark does [2].

[1] https://aws.amazon.com/lambda/pricing/ [2] http://spark.apache.org/docs/latest/ec2-scripts.html

I have always been curious about how to use the spot market. If I were to get an EC2 instance on the spot market, how do I ensure that my task gets completed before being preempted? My batch jobs take anywhere between 10 and 20 minutes. Is there a minimum guaranteed time that I could buy? For my solution, I ended up using Google Cloud and its per-minute billing.

A lot of my infrastructure runs on the spot market. If you launch an individual machine, there's no guarantee that it won't be preempted (terminated) at any time. You do get a little time to handle graceful shutdown of your code before the machine goes away, so it's not like the plug was pulled out of the wall.
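A sketch of what that graceful shutdown looks like in practice (Python; AWS posts a termination notice to the instance metadata service roughly two minutes before reclaiming a spot instance, and the endpoint returns 404 until then):

```python
import urllib.request
import urllib.error

TERMINATION_URL = (
    "http://169.254.169.254/latest/meta-data/spot/termination-time"
)

def termination_scheduled(fetch):
    """fetch() returns the metadata body, or raises HTTPError on 404.

    Returns True once a termination time has been posted.
    """
    try:
        fetch()
        return True
    except urllib.error.HTTPError:
        return False

def metadata_fetch():
    # Only resolves from inside an EC2 instance.
    with urllib.request.urlopen(TERMINATION_URL, timeout=2) as resp:
        return resp.read().decode()

# In a worker loop: poll every few seconds; on True, stop taking new
# jobs, checkpoint or re-enqueue the current one, and exit cleanly.
```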

If you're using Spot Fleets (a new feature) AWS will take a spec of the compute capacity you want and find it for you, across different instance types and availability zones, which makes it more likely you'll get the capacity you want.

In practice spot machines are pretty stable. We run hundreds across several different instance types and they're often up for weeks at a time.

So unless your batch job absolutely must not be delayed, you might give spots a go. We used to run critical analytics EMR (Hadoop) jobs on spots and it saved us a fortune - probably hundreds of thousands of dollars. Very occasionally we would switch them to on-demand (automatically) if we couldn't get spots at the price we wanted.

Have you looked at Google's Preemptible VMs? The big difference is that it's a flat 70% discount rather than a volatile market, and Google has relatively few but versatile VM types, which makes it easier to architect infrastructure (ex. no need for a high-IO VM or a high-network VM; just attach a high-IO disk to a regular VM, and networking is world-class).

We're heavily integrated into the rest of the AWS system, it wouldn't be worth the engineering effort. Also, the spots don't make up a large enough portion of our overall AWS spend, if we wanted to save money there are other things we could do that would let us stay on AWS (which we like).

You can do block request for one hour or more of guaranteed run time, but you pay for that guaranteed minimum run time. Otherwise you have to build in checkpointing and recovery.


AWS launched Spot Blocks last year that do just this. It's not as cheap as regular spot instances, but you do get a guarantee that your instance will be around for a configurable period between one and six hours (you pay more as you approach the six-hour mark), after which time it will be automatically terminated.

I guess this is useful for longer-running batch jobs (longer than the 2 minute warning you get of termination on regular spots) of a relatively known duration.


Combined with per-minute billing and our lower on-demand pricing, regular GCE instances often end up being cheaper.

For example, a fixed duration m3.medium is currently $.037/hr for 1 hour and $.047/hr at 6 hours. GCE's equivalent n1-standard-1 is $.05/hr all the time (and automatically triggers sustained use discounts). If you ask for a 4-hour block and only use 3.2 hours you pay the same 16 cents as you would have on GCE. As I tell my friends, if you want a cheap, guaranteed VM just use GCE on-demand; if you're locked into AWS though, check out Spot!
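Checking those numbers (Python sketch using the 2016 rates quoted above; ignores the 10-minute minimum and sustained-use discounts):

```python
GCE_PER_HOUR = 0.05  # n1-standard-1 on-demand, as quoted above

def gce_cost(hours_used):
    # Per-minute billing: you pay only for what you actually use.
    return hours_used * GCE_PER_HOUR

def spot_block_cost(block_hours, rate_per_hour):
    # Fixed-duration blocks bill the whole block regardless of use.
    return block_hours * rate_per_hour

# 3.2 hours of actual use:
#   GCE:                        3.2 * 0.05 = $0.16
#   4-hour block at ~$0.04/hr:  4.0 * 0.04 = $0.16
```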

Disclaimer: I work on Compute Engine (and Preemptible VMs specifically), so I'm clearly biased.

Heroku charges per second of compute, with no minimum. Using AMQP with https://github.com/the-grid/guv, our average server stays up for just a couple of minutes, in direct proportion to current demand.

I would love to see how you automate the full cycle, not just spinning up new instances.


Can anyone spot a dateline on this article?

Well, the URL signifies that it was published today.

Archives -> March 2016(1) links to this article http://omerio.com/2016/03/
