Any feedback is welcome.
Quoting the OP on ease of use with Google Cloud:
> Over the past couple of years, I’ve never struggled to grasp or understand any of the GCE offered services. The predefined machine types are also very clear, shared core, standard, high memory and high CPU, I know them all by heart now with their memory configurations and pricing to some extent. I clearly understand the difference between the IOPS for a standard disk and an SSD disk and how the chosen size impacts their performance. There is no information overload, disk pricing and other information is kept separate, it’s simple and easy to understand.
> Now compare this with EC2 VMs: it's overwhelming, with current/previous/etc. generation VMs. Disk information and network configurations are all lumped together with VM configurations, paragraphs and paragraphs of different price configurations. For me, it was painful just trying to understand which VM type is suitable for my needs. My first encounter with SSD configurations and maximum provisioned IOPS for AWS RDS was one of pain. Instead of spending time working on my project, I found myself spending valuable time trying to select which IaaS offerings best fit my needs. Things like trying to figure out whether low, moderate, or high network connectivity is best for my needs! No wonder I still hear many say they find cloud offerings confusing. I think this is no longer the case with GCP.
For example, we haven't introduced different "generations" of machine types, and instead have stuck with "n1-standard-1" even across different architectures (we then document which underlying processor architectures are available on a zone-by-zone basis at https://cloud.google.com/compute/docs/zones#available for people that care).
Similarly, instead of introducing "Local SSD instances", we let you attach Local SSD partitions to any VM arbitrarily. And Preemptible VMs are just a boolean option on any VM.
So you don't see a machine-type matrix explosion on GCE, and that's on purpose.
Disclosure: I work on Compute Engine.
So they made a different choice: make the customer decide explicitly. We went with "let the customer decide if they want to". Most people internally at Google don't bother, and I'd say we've been proven correct in the marketplace as well; if you need to care, we're transparent (otherwise, who cares!).
Two examples of things that rock:
- Being able to pop up an ssh shell right from your browser
- Google Cloud Shell: a free Linux shell in the sky with a bunch of dev tools pre-installed (including Docker)
So, we can spin up a beefy EC2 instance (c4.8xlarge or larger) for each video and get them handled quickly, but we get charged for the full hour, which is quite expensive. All the machines we would target for spot instances have had very rocky pricing lately, so that often doesn't save much. We could spin up less beefy instances, trying to encode/decode in ~1 hour so as to maximize economy, but then, obviously, each video takes much longer to process.
Lambda would be an okay solution, approximately double the price of equivalent EC2 solutions, while providing two very big wins: granularity, and a 5-minute encoding/decoding time regardless of video size/length. That second point is pretty insane; we could process an entire 4K movie in 5 minutes by unleashing a swarm of Lambdas on it. The problem is that Lambdas have a max memory of 1.5 GB, which isn't enough for 4K encoding (h264). They also have limited disk space, so it's challenging to get frames and video in and out (and no FUSE module in the kernel :/). We're still experimenting, but even if we do get it to work, it will probably be with the ultrafast preset, which is non-ideal.
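To make the swarm idea concrete, here's a minimal sketch of the fan-out, assuming a hypothetical `encode_chunk` Lambda function and pre-split chunk objects in S3 (all names here are illustrative, not a real pipeline):

```python
# Fan-out sketch: one async Lambda invocation per video chunk.
# "encode_chunk" and the bucket/keys are hypothetical.
import json
import boto3

lam = boto3.client("lambda")

def fan_out(chunk_keys, bucket="my-video-chunks"):
    """Fire one async Lambda per chunk; each invocation must fit the
    5-minute execution limit and the 1.5 GB memory cap."""
    for key in chunk_keys:
        lam.invoke(
            FunctionName="encode_chunk",   # hypothetical function name
            InvocationType="Event",        # async: don't wait for the result
            Payload=json.dumps({"bucket": bucket, "key": key}),
        )
```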
Elastic Transcoder doesn't handle decoding/encoding, only transcoding, so it's a no-go. We thought about using it in a larger pipeline, like having Lambdas sew frames up into a lossless codec, then handing that to Elastic Transcoder to be compressed. But ET won't encode 4:4:4 formats, and it incurs the cost of both a Lambda swarm and ET, so that's not great.
Per-minute billing would be ideal. Now I have to consider whether it's worth it to port everything over to Google. Oh how I wish these companies didn't use ludicrous egress pricing to enforce vendor lock-in.
Keep in mind that GCE has a 10-minute minimum, so if you run an instance for less than 10 minutes you'll be charged for 10 minutes, but after that billing is per-minute.
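In back-of-the-envelope terms (the $0.05/hr rate here is just an example, not a quote):

```python
# Per-minute billing with a 10-minute minimum vs. per-hour billing.
import math

def gce_cost(minutes, hourly_rate=0.05):
    billed = max(minutes, 10)            # 10-minute minimum...
    return billed * (hourly_rate / 60)   # ...then billed per minute

def hourly_cost(minutes, hourly_rate=0.05):
    return math.ceil(minutes / 60) * hourly_rate  # rounds up to full hours

# A 12-minute job: 12 billed minutes vs. a full billed hour.
print(gce_cost(12), hourly_cost(12))   # ~0.01 vs 0.05
```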
You said that making it take an hour was OK, so I assume the processing time is somewhat flexible. If you don't have enough workload to always keep one machine busy, you could always wait until you did have enough workload and then spin up a one-machine render farm, have it work for an hour, and then shut it back down until you had enough work again.
It's true that per minute billing would save you money, but assuming what you're doing is meant to scale up, once you get past enough work to keep a single machine busy, you've gained nothing.
However if you put in the work now to separate the jobs from the machines, you're on the path to much better future scalability, because you can size the render farm to the work instead of having the farm and the number of jobs directly linked.
My comment was unclear. It's a balancing act between (1) the running time of a job, (2) the time before a job starts, and (3) cost. 1 and 2 are both important and appreciated parameters, but only if 3 allows.
> It's true that per minute billing would save you money, but assuming what you're doing is meant to scale up, once you get past enough work to keep a single machine busy, you've gained nothing.
Completely true, but at that point we're running a cloud within a cloud ... which seems silly. Per-minute billing would get us what we need, without the complexity of having to engineer a cloud within a cloud. Plus, it would allow scaling horizontally, which improves parameter #2 above, without impacting #3. In other words, a farm of huge machines will need a queue to keep them busy. Having a queue means there's some delay before a job starts. Whereas if we just spin up a per-minute billed machine immediately for each job, there is no delay. That requires less engineering, and handles any workload whether it be heavy or sporadic.
You're doing that anyway, even if you run one job per machine; you just lose granularity (one of the big advantages of any "cloud").
> In other words, a farm of huge machines will need a queue to keep them busy. Having a queue means there's some delay before a job starts
But that's only true if you can't fill one machine. Once you're filling one machine, your queue processing time will be the same. Presumably you still have a queue; it's just that the queue drives the start of machines instead of processes on the machine.
> That requires less engineering
Right, like I said: you can do less engineering now, but pay more later, in both money and maintenance costs, as your workloads scale up.
Disclaimer: I work at GCP, and used to work at AWS, so I've actually seen this save customers $millions
Disclaimer: I work at GCP, so I know you can only get a few TB/s this way, which might be a bottleneck. YMMV
It's sorta like that.
Disclaimer: I work at GCP, so feedback on how you'd like to see a more detailed version might actually have an effect!
To entice you some more, custom machine types are also a big differentiator here. The jump in cost and number of cores between fixed instance shapes is huge: choosing between the c4.4xlarge and the 8xlarge, you've got just over a 2x jump in price and performance. With custom machine types, you could nudge yourself upwards to, say, 18 or 24 vCPUs (and can even tune the type to your resolution).
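As a rough illustration of how that pricing works (the per-vCPU and per-GB rates below are made up; the real ones are on the GCE pricing page):

```python
# Illustrative only: fixed shapes roughly double in cores and price, while
# a custom machine type lets you land in between. Rates are assumptions.
PER_VCPU_HOUR = 0.033
PER_GB_HOUR = 0.0045

def custom_cost(vcpus, mem_gb):
    return vcpus * PER_VCPU_HOUR + mem_gb * PER_GB_HOUR

# Instead of jumping from 16 to 32 cores, price 18 or 24 vCPUs
# (memory here follows the standard 3.75 GB-per-vCPU ratio):
for vcpus in (16, 18, 24, 32):
    print(vcpus, round(custom_cost(vcpus, vcpus * 3.75), 3))
```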
So take a look! We've got a free trial of $300 for 60 days.
Disclosure: I work on Compute Engine (and specifically, Preemptible VMs).
If you need to spin up dozens or hundreds of machines at short notice, and then take them down, I could see that cloud use case. If it's a handful or just one, it seems the cloud only complicates the matter.
FUSE typically has a lot of latency from context switches, or so I understood it. Is it possible to make a native driver?
If you want to build for scalability, then it's best to separate the jobs from the machines. Having one job per machine isn't granular enough. It's best to have a farm of machines that can take jobs and process them, and then scale the farm if the jobs aren't processed fast enough. (Incidentally this is what Lambda does for you)
If you do that, then once you have enough jobs to fill a single machine, to-the-minute billing doesn't really save you much money, if any, and you're then on a path to much better future scalability.
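Concretely, "separating the jobs from the machines" can be as small as a worker pool pulling from a shared queue; here's a minimal sketch (a real deployment would use SQS, Pub/Sub, or similar rather than an in-process queue):

```python
# Workers pull from a shared queue, so the farm can be resized
# independently of how jobs arrive.
import queue
import threading

jobs = queue.Queue()

def process(job):
    print("encoding", job)   # stand-in for the real per-job work

def worker():
    while True:
        job = jobs.get()
        if job is None:      # sentinel: shut this worker down
            break
        process(job)
        jobs.task_done()

# Scale the farm by changing the worker count, not the submission path.
workers = [threading.Thread(target=worker) for _ in range(4)]
for w in workers:
    w.start()
for j in range(10):
    jobs.put(j)
jobs.join()                  # wait until every job is processed
for _ in workers:
    jobs.put(None)
for w in workers:
    w.join()
```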
As a disclaimer, I work for Google on Cloud Dataproc, a managed Spark and Hadoop service, so I'm passionate about and focused on Spark clusters, from 3 CPUs/3 GB of RAM up to 5k+ CPUs and TBs of RAM.
If you are running Spark (or Hadoop, Pig, Hive, Kafka, etc.) jobs, then per-minute billing can save you quite a bit. Unless you can separate your jobs over time (and balance them) to keep n clusters saturated, you're probably paying for an idle Spark/Hadoop cluster to sit around. Moreover, in terms of Cloud Dataproc (only), that's why you can scale clusters up/down and use preemptibles and custom machine types. You can custom-shape your clusters and pay for exactly what you use (disclaimer: there is a 10-minute minimum, as with most Google Cloud services).
As a practical example, if you have a 100-node Spark cluster and only use 25 minutes of it, you stand to save considerably over a given year. Yes, you can possibly rebalance your work internally to saturate a cluster at the optimal n minutes, but at that point you're paying to do the engineering work for it. :)
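Rough math on that example (the per-node rate is an assumption for illustration):

```python
# 100-node cluster, 25 minutes of actual use per run, daily runs.
# The $0.10/node/hr rate is assumed, not a real price.
NODES = 100
NODE_HOURLY = 0.10
USED_MINUTES = 25   # above the 10-minute minimum, so billed as-is

per_minute_bill = NODES * USED_MINUTES * (NODE_HOURLY / 60)  # ~$4.17/run
per_hour_bill = NODES * NODE_HOURLY                          # $10.00/run (rounded up)

print(round(per_minute_bill, 2), per_hour_bill)
print(round((per_hour_bill - per_minute_bill) * 365, 2))     # ~$2129 saved over a year
```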
Not every use case revolves around customer-facing websites or processing streaming data...
All the more reason that 15 minutes * 100 nodes * x days can add up very quickly. :)
My point is that only at the smallest scales does hourly vs. per-minute billing make a difference. Yes, it's really nice, but it doesn't make as much difference as everyone says.
I think there's a latency trade-off there. So imagine you have a job where latency matters, like a user wants to run a BigQuery-style interactive query across 1 TB of data, and ideally it should complete within 1 minute. Let's say it takes 10 instance-hours (600 instance-minutes) to complete this query.
With per-minute billing you could launch 600 instances and complete it in a minute. You could do the same with per-hour billing, but you would overpay by a factor of 60x; or you could launch 10 instances and wait an hour for the query to complete.
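In numbers (at an assumed $0.05/instance/hr):

```python
# The 600-instance-minute query from above.
HOURLY = 0.05

wide_per_minute = 600 * 1 * (HOURLY / 60)   # 600 instances x 1 min = $0.50, ~1 min latency
wide_per_hour = 600 * 1 * HOURLY            # same fleet, hourly billing = $30.00 (60x)
narrow_per_hour = 10 * 1 * HOURLY           # 10 instances x 1 hr = $0.50, ~1 hr latency

# GCE's 10-minute minimum would bill the wide case at 600 x 10 min = $5.00,
# still 6x cheaper than per-hour billing at the same latency.
print(wide_per_minute, wide_per_hour, narrow_per_hour)
```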
Assuming you have a bunch of these queries coming in, you could queue them up, but then the latency will suffer, because you'll have to delay starting a job until you get to that position in the queue.
If you wish to trade some wait time for some cluster efficiency, yes, you can just queue them up and then slowly scale the cluster up and down to keep 100% utilization.
However, it would be nice if you really could scale up and down at per-minute increments and let Google figure out how to get their cluster utilized efficiently and let them take advantage of their different customers with different workloads and preemptible instances.
Also, you're assuming you have enough work to keep all machines busy all the time....
I think the point is: Smaller billing windows allow you to be more responsive to your load and be able to shut down faster and save more money.
Many years ago I was crunching (doing NLP analysis on) parts of the Twitter firehose (or sometimes just the garden hose) on Amazon Elastic MapReduce. EMR was a fantastic service that made getting work done much easier. However, I was always trying to game the system by having my runs end in just under one hour, which was a nuisance. Per-minute billing would have made my life easier.
If you're interested in leveraging advanced cost-saving techniques on AWS compute and storage, we're currently building something in the cloud HPC space that does exactly this. Nothing public yet, but email in my profile for more.
Our system automatically manages elastic fleets of instances and right-sizes instances to their workloads (including spot and lambdas), and also makes it easy to reduce TCO on storage. It's agnostic in terms of deployment (hosted vs. bring-your-own-VPC).
Before: Start cluster > submit many jobs to the cluster > manage resources
Now: Submit a job to Dataproc > Start a cluster for each job > shut down the cluster
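For instance, the per-job pattern can be scripted in a few lines (a sketch, assuming the gcloud CLI is installed; the cluster name, worker count, and jar path are placeholders):

```python
# Per-job cluster pattern: create, submit, tear down.
import subprocess

def run(cmd):
    subprocess.check_call(cmd)

run(["gcloud", "dataproc", "clusters", "create", "job-42",
     "--num-workers", "10"])
run(["gcloud", "dataproc", "jobs", "submit", "spark",
     "--cluster", "job-42",
     "--jar", "gs://my-bucket/my-job.jar"])   # placeholder jar
run(["gcloud", "dataproc", "clusters", "delete", "job-42", "--quiet"])
```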
And, of course, BigQuery gives you the equivalent of per-second billing and 0-second scaling to thousands of cores.
Disclaimer - I work at Google, on Cloud Dataproc (and we are passionate about making the Spark/Hadoop ecosystem both super fast and extremely cost-effective) :)
edit: To clarify, I mean that if you're running software on a given set of physical machines and you can't get job throughput higher than some number n on a given machine, it can be prohibitively expensive to scale machines ahead of time or during load. With Lambda we have the option to run a ton of independent processes on theoretically independent machines to vastly increase throughput while we slog through improving our design to raise per-machine throughput on our legacy EC2 infrastructure. Lambda allows you to isolate your scale problem to one job per container at a time, instead of looking at machines as capable of only `n` jobs at a time.
But seeing as most of their work lasts less than an hour, I wonder how well they'd do by taking advantage of the spot market.
Does anyone know of any tools that take advantage of the spot market? I know that Spark does.
If you're using Spot Fleets (a new feature), AWS will take a spec of the compute capacity you want and find it for you, across different instance types and availability zones, which makes it more likely you'll get the capacity you want.
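A minimal Spot Fleet request looks something like this (the role ARN, AMI, and bid price are placeholders):

```python
# Ask for capacity across two instance types and let AWS pick
# what's available.
import boto3

ec2 = boto3.client("ec2")

resp = ec2.request_spot_fleet(
    SpotFleetRequestConfig={
        "IamFleetRole": "arn:aws:iam::123456789012:role/fleet-role",  # placeholder
        "TargetCapacity": 10,
        "SpotPrice": "0.05",    # placeholder bid
        "LaunchSpecifications": [
            {"ImageId": "ami-12345678", "InstanceType": "c4.4xlarge"},
            {"ImageId": "ami-12345678", "InstanceType": "c4.8xlarge"},
        ],
    },
)
print(resp["SpotFleetRequestId"])
```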
In practice spot machines are pretty stable. We run hundreds across several different instance types and they're often up for weeks at a time.
So unless your batch job absolutely must not be delayed, you might give spots a go. We used to run critical analytics EMR (Hadoop) jobs on spots and it saved us a fortune - probably hundreds of thousands of dollars. Very occasionally we would switch them to on-demand (automatically) if we couldn't get spots at the price we wanted.
I guess this is useful for longer-running batch jobs of a relatively known duration (longer than the 2-minute termination warning you get on regular spots).
For example, a fixed-duration m3.medium is currently $0.037/hr for 1 hour and $0.047/hr at 6 hours. GCE's equivalent n1-standard-1 is $0.05/hr all the time (and automatically triggers sustained-use discounts). If you ask for a 4-hour block and only use 3.2 hours, you pay the same 16 cents as you would have on GCE. As I tell my friends: if you want a cheap, guaranteed VM, just use GCE on-demand; if you're locked into AWS, though, check out Spot!
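Checking that math (the 4-hour block rate is interpolated between the 1- and 6-hour prices above):

```python
# Fixed-duration Spot block vs. GCE per-minute billing.
fixed_block = 4 * 0.04     # 4-hour block at ~$0.04/hr (interpolated) = $0.16
gce_usage = 3.2 * 0.05     # 3.2 hours actually used at $0.05/hr      = $0.16

print(round(fixed_block, 2), round(gce_usage, 2))  # 0.16 0.16
```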
Disclaimer: I work on Compute Engine (and Preemptible VMs specifically), so I'm clearly biased.