Going Multi-Cloud with AWS and GCP: Lessons Learned at Scale (metamarkets.com)
229 points by jbyers on Aug 21, 2017 | 54 comments

One of the biggest benefits of Google Cloud is networking. By default, GCE instances in a VPC can communicate with all other instances across zones and regions. This is a huge plus.

On AWS, going multi-region involves setting up VPN and NAT instances. Not rocket science, but wasted brain cycles.

Generally, with GCP, setting up clusters that span three regions should provide ample high availability, and most users don't need to deal with the multi-cloud headaches. KISS. You can even get pretty good latency between regions if you use North Carolina, South Carolina, and Iowa. Soon West Coast clusters will be possible between Oregon and Los Angeles (region coming soon).

This is one of those features you appreciate most when you don't have it, and it makes global apps incredibly easy. SoftLayer has a similar network with default region peering, but it's not as advanced.

Of course, anything can be set up using a custom VPN, but that's a lot more work and will never be as easy, reliable, automated, or cost-effective.

That being said, AWS is rolling out automatic VPC peering, running on their own private backbone between regions, so there should be functional parity soon, albeit with different price and performance than GCP.

Having used SoftLayer and experienced their API, support, and GUI, I would not consider that a redeeming feature.

It's a feature they have. It might not make up for other parts of their platform that don't fit what you need but we've found their servers to be efficient and their support is quick and helpful.

They're overshadowed now by the scale, efficiency, and managed services of the major clouds, but they can still be useful if you're running on their dedicated machines. Last I checked, Keen.IO runs on SoftLayer.

Digital Ocean is a far better alternative if you don't want to use the other, bigger cloud providers, quite frankly. AWS dedicated servers are also far less expensive, and you get more bang for the buck. m4.2xlarge, for example, is the closest hourly offering to SoftLayer's base dedicated server in memory and CPU cores, and it outperforms the hell out of it. See here:



One success story is not enough compared to thousands elsewhere.

I may be completely off here, but isn't this due to their underlying architecture decisions? That is, AWS from the start has kept all regions completely separate, so that problems in one region do not influence another. But GCP has had issues with failures across regions, IIRC.

Having software-defined networking that spans regions and having failure cascades across regions are two different things. Nothing prevents a vendor from presenting a single network to you while actually running distinct networks underneath.

Having distinct networks in different regions encourages you to architect your application in a fault tolerant way.

Or the contrary. In most cases there is something to synchronize between regions, like a replica of the data.

When interconnecting regions is difficult, that synchronization becomes somewhat harder to do, and it can easily end up as "meh, AZs are good enough."

It is also potentially due to Google owning the private fiber backbone that connects all regions, as well as their software-defined network that allows high-bandwidth, low-latency routing of packets across regions.

AWS also has a private backbone and offers (or will soon offer) VPC peering running on top of it.

I work for AWS.

Just as an FYI, you don't have to use a NAT instance; there are also NAT gateways, which I find easier to manage: http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-na...

Regardless, they both cost money to run. The cheapest NAT gateway[1] is currently $0.045/hr, which works out to roughly $33 per month. You're also charged $0.045 per gigabyte transferred as a "data processing charge," on top of standard AWS data transfer charges.

[1]: https://aws.amazon.com/vpc/pricing/
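To put the numbers above in concrete terms, here's a back-of-envelope monthly cost calculation using the quoted rates (rates change; verify against current AWS pricing):

```python
# Rough monthly cost of one AWS NAT gateway at the rates quoted above
# ($0.045/hr plus $0.045/GB processed). Illustrative only.
HOURLY_RATE = 0.045      # $/hour
PER_GB_RATE = 0.045      # $/GB "data processing charge"
HOURS_PER_MONTH = 730    # average month

def nat_gateway_monthly_cost(gb_processed):
    return HOURLY_RATE * HOURS_PER_MONTH + PER_GB_RATE * gb_processed

print(round(nat_gateway_monthly_cost(0), 2))     # idle gateway: 32.85
print(round(nat_gateway_monthly_cost(1000), 2))  # 1 TB processed: 77.85
```

Note this excludes the standard data transfer charges mentioned above, which are billed separately.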

AWS needs to release a more affordable and simpler feature for inter-region connectivity. Even Microsoft Azure has a VNet-to-VNet connectivity option in which traffic flows over the Azure backbone rather than the internet, and it doesn't cost much.

That VNet-to-VNet connectivity is unreliable once you start using it at scale.

We had issues as soon as we started launching instances (after connecting the VNets), and Azure support's response was to ask for the instance IDs so they could manually add them to the routing between VNets.

BGP routing was also impossible to do beyond their tutorial-level setup.

If any Google Cloud people are listening I wish you had an equivalent to AWS's Certificate Manager. Provisioning a TLS certificate which automatically renews for eternity (no out-of-band Let's Encrypt renewal process needed) and attaching it to a load balancer is so nice compared to Google Cloud's manual SslCertificate resource creation flow[1].

To a lesser extent, it's also nice registering domains within AWS and setting them to auto renew. Since Google Domains already exists, it would be neat to have this feature right inside Google Cloud.

[1]: https://cloud.google.com/compute/docs/load-balancing/http/ss...

We hear you. While I can't speak to future products and features, I can say we understand there is room to improve the SSL provisioning and lifecycle management story in our products, and we are making investments in that area.

It's in progress, star this issue to vote: https://issuetracker.google.com/issues/35900034

One thing that I liked with GCP is their recommendations for cost savings. I spun up a Compute Engine instance for a hobby project, and within minutes it recommended reducing the instance size and showed how much I could save. I don't think AWS offers something like that. Correct me if I'm wrong.

Even better are Google's managed services (PubSub / Dataflow / Datastore), which scale up and down based on usage (cloud-native products) and thus save money automatically, compared to their AWS equivalents (Kinesis / Kinesis Analytics / DynamoDB), which do not autoscale.

It does not work well; it gives late responses.

Really? Feel free to email me about that randhunt@amazon

AWS has Trusted Advisor, which will help you find cost savings in areas like:

* Idle Load Balancers

* Underutilization of EBS volumes

* Unassociated Elastic IP addresses

* Idle RDS instances

* R53 latency resource record sets

* etc...

Most of the Trusted Advisor checks are only available if you're on a Business or higher tier support plan. And those are now priced as a percentage of your monthly spend, which is not cheap.
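For context on "priced as a percentage of your monthly spend": here's a rough calculator for the Business support tier, using the tiered percentages and $100 minimum as published around this time (the exact tiers should be verified against current AWS pricing):

```python
def business_support_cost(monthly_spend):
    # AWS Business support tiers as published circa 2017 (verify current pricing):
    # 10% of the first $10K of usage, 7% up to $80K, 5% up to $250K, 3% beyond.
    tiers = [(10_000, 0.10), (80_000, 0.07), (250_000, 0.05), (float("inf"), 0.03)]
    cost, prev = 0.0, 0.0
    for ceiling, rate in tiers:
        band = min(monthly_spend, ceiling) - prev
        if band <= 0:
            break
        cost += band * rate
        prev = ceiling
    return max(cost, 100.0)  # $100/month minimum

print(round(business_support_cost(5_000), 2))   # 10% of $5K
print(round(business_support_cost(50_000), 2))  # $1K + 7% of the next $40K
```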

If you're running your business on any provider, wouldn't you want to make sure you had support?

Not when the support is useless and not required.

One extra point for tracking VM bills:

GCE bills are aggregated across instances. For a more detailed breakdown, you can apply labels to instances, and the bills exported to BigQuery will have the label information attached.

Alternatively, you can leverage GCE usage exports here:


These have per-instance, per-day, per-item usage data for GCE.
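Once the exported billing data is loaded, grouping cost by label is straightforward. A minimal sketch; the row shape and field names here are illustrative, not the exact export schema:

```python
from collections import defaultdict

# Hypothetical rows shaped like a GCE billing export loaded from BigQuery;
# instance names, labels, and costs are made up for illustration.
rows = [
    {"instance": "worker-1", "labels": {"role": "historical"}, "cost": 4.20},
    {"instance": "worker-2", "labels": {"role": "historical"}, "cost": 4.20},
    {"instance": "broker-1", "labels": {"role": "broker"}, "cost": 1.10},
]

# Aggregate cost per label value, bucketing unlabeled instances separately.
cost_by_role = defaultdict(float)
for row in rows:
    cost_by_role[row["labels"].get("role", "unlabeled")] += row["cost"]

print(dict(cost_by_role))
```

In practice you'd do the same aggregation with a GROUP BY in BigQuery itself rather than in client code; this just shows the shape of the breakdown that labels make possible.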

Disclosure: I work for Google Cloud but not on GCE.

When it comes to GCP:

- They have role-based support plans, which offer flat prices per subscribed user, a much better model. [1]

- Live migration for VMs means host maintenance and failures are a minor issue, even if all your apps are running on the same machine. It's pretty much magical, and when combined with persistent disks it effectively gives you a very reliable "machine" in the cloud. [2]

1. https://cloud.google.com/support/role-based/

2. https://cloud.google.com/compute/docs/instances/live-migrati...

>>> on AWS you have the option of getting dedicated machines which you can use to guarantee no two machines of yours run on the same underlying motherboard, or you can just use the largest instance type of its class (ex: r3.8xlarge) to probably have a whole motherboard to yourself.

Not at all. Major mistake here.

When you buy dedicated instances on AWS, you reserve an entire server for yourself. All the VMs you buy subsequently will go onto that same physical machine.

In effect, your VMs are on the same motherboard and will all die together if the hardware experiences a failure. It's the exact opposite of what you wanted to do!

I think two concepts are being conflated:

Dedicated Instances: https://aws.amazon.com/ec2/purchasing-options/dedicated-inst...


Dedicated Hosts: https://aws.amazon.com/ec2/dedicated-hosts/

At my current job, we're looking into DIs to reduce our SQL costs. With standard Spot/RIs, we're paying per-core for SQL Server. But with a DI, we're expecting to be able to license against the physical sockets instead.

> You can use Dedicated Hosts and Dedicated instances to launch Amazon EC2 instances on physical servers that are dedicated for your use. Dedicated Instances are Amazon EC2 instances that run in a VPC on hardware that's dedicated to a single customer. You can also use Dedicated Hosts to launch Amazon EC2 instances on physical servers that are dedicated for your use.

> Dedicated instances may share hardware with other instances from the same AWS account that are not Dedicated instances.

> An important difference between a Dedicated Host and a Dedicated instance is that a Dedicated Host gives you additional visibility and control over how instances are placed on a physical server, and you can consistently deploy your instances to the same physical server over time.

It looks like you can launch DIs on your DHs, or on any arbitrary host; but once you have a DI on an arbitrary host, only your VMs will run there, so it's a de facto affinity policy. And any instance you launch on your DH is automatically a DI.
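A sketch of how the two models look as EC2 RunInstances Placement parameters (boto3-style dicts; the host ID is a placeholder, and no API call is made here):

```python
def dedicated_instance_placement():
    # Dedicated Instance: single-tenant hardware, but AWS picks the host.
    return {"Tenancy": "dedicated"}

def dedicated_host_placement(host_id):
    # Dedicated Host: you target a specific physical server you've allocated,
    # which is what gives you the host-level visibility and affinity control.
    return {"Tenancy": "host", "HostId": host_id}

print(dedicated_instance_placement())
print(dedicated_host_placement("h-0123456789abcdef0"))  # placeholder host ID
```

These dicts would be passed as the `Placement` argument to `run_instances`; the difference between the two tenancy values is essentially the DI/DH distinction quoted above.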

Is there a benefit to running DIs without having a DH? It sounds like a DI gives you 90% of a DH. What the DH adds is a few hardware details (which might be essential for licensing), and, as GP suggested, it would let you choose affinity (or anti-affinity) manually.

As a result, Dedicated Hosts enable you to use your existing server-bound software licenses like Windows Server and address corporate compliance and regulatory requirements.

This is the first I'm hearing about DHs, and it sounds like that might be what we need, instead of the DIs we've been telling other teams about.

I'm not an expert on DI and DH, sorry. I can find answers for you though and post them back here. You can also email me: randhunt@amazon.com (actually anyone on these threads is welcome to email me any question they have) and I can try to get back to you there.

If you have HIPAA requirements, signing an agreement with Amazon will require you to host PII/PHI on a DH.

Wow. Thanks!

I'm not sure what "dedicated machines" means here. As far as I can tell from:


You can buy up to two of each type/location and schedule your VMs to run on different physical hosts?

Effectively, Dedicated Instances are an Affinity policy. You're looking for an Anti-Affinity policy, which isn't common. Here's an article about Affinity and Anti-Affinity on OpenStack: https://techglimpse.com/affinity-anti-affinity-policies-open...

If AWS were to go to a per-minute billing cycle, they would be instantly more price-competitive with Google's offering. Or, to put it the other way around, those leftover minutes form a significant chunk of AWS's profit margin.
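A quick illustration of those leftover minutes, comparing hour-rounded billing to per-minute billing with GCE's 10-minute minimum (the rate is a nominal placeholder):

```python
import math

def hourly_billed(minutes, rate_per_hour):
    # Classic AWS model: usage rounded up to whole instance-hours.
    return math.ceil(minutes / 60) * rate_per_hour

def per_minute_billed(minutes, rate_per_hour, minimum_minutes=10):
    # GCE model: per-minute billing with a 10-minute minimum.
    return max(minutes, minimum_minutes) * rate_per_hour / 60

# A 65-minute batch job at a nominal $0.10/hr rate:
print(hourly_billed(65, 0.10))                # billed as 2 full hours
print(round(per_minute_billed(65, 0.10), 4))  # billed for 65 minutes
```

For short-lived or bursty workloads, the rounded-up remainder of each hour is pure margin for the hourly biller, which is the point being made above.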

I don't think so. GCP's bill is usually about 50% of AWS's for the same application, even if you run it for full hours (from my personal experience and from several others as well: https://thehftguy.com/2016/11/18/google-cloud-is-50-cheaper-...). GCP has a lot more cost-saving features, like seamless scalability, custom shapes, sustained-use discounts, and so on. If your workloads span less than an hour, GCP can offer more than 50% savings.

I refuted some of the networking claims in that article previously (I work for AWS), especially the bizarre claim that you have to get a c4.4xlarge for 1 Gbps. The 220 Mbps network cap claim is just not true. Just run iperf3 from any AWS instance to a GCE instance and you can see greater than 220 Mbps.

Honestly, we all know the small instances have terrible CPU that doesn't let you use the advertised 1 Gbps anyway. Beyond that, even if AWS lets 1 Gbps of traffic flow for a while, you get throttled pretty quickly in my experience.

Author of the quoted article here.

The iperf runs refuted your refutation.

> Just run iperf3 on any aws instance to a GCE instance and you can see greater than 220mpbs.

For how long is the question. Historically, it's been considered common knowledge (might just be an urban legend) that AWS, even if you pay for more traffic, at some point just throttles you, the same way they do with I/O.

He said "more" price-competitive :). I think we're all saying the same thing.

Agreed, and I hope they do so!

Though there would still be other things like the lower on-demand rates, custom shapes, networking that scales with shape (rather than being coarsely grouped), being able to attach SSD / GPUs semi-arbitrarily, and so on. For those that care, not having to pay up front for the best price is also a huge deal. You see the same thing in GCS vs S3 as well: Glacier and S3-IA have a few rounding up gotchas that catch many people out.

All that said, I hope we all get to per-minute billing.

Disclosure: I work on Google Cloud (but haven't talked to the Metamarkets folks)

> As we investigated growth strategies outside of a single AZ, we realized a lot of the infrastructure changes we needed to make to accommodate multiple availability zones were the same changes we would need to make to accommodate multiple clouds.

Maybe the author means multiple regions? Multi-AZ is so easy; everything works. Multi-region is much harder.

Very nice writeup! A nice, detailed read that was easy to understand.

It seems to focus more on raw infrastructure (EC2 vs GCE) than on each company's PaaS offerings. Obviously AWS has the front-runner's lead here, but I'd be super curious about a comparison of RDS vs. Cloud Spanner, for instance. (Pun unintentional, but then realized, and left in.)

This should be RDS vs Cloud SQL

Did you mean to say AWS Aurora vs Cloud Spanner? I don't think you can compare RDS to Cloud Spanner: RDS is a managed service for most of the popular RDBMSs on the market (except Aurora), while Cloud Spanner is a Google-proprietary database running only on GCP.

Great thorough comparison and falls very into line with my experience. Definitely worth the read. Thanks!

Off topic: it's frustrating that these companies spend a lot of time and money learning about the complexities of their infrastructure, yet when you're interviewing at such companies, you're expected to have answers for everything and a complete cloud strategy.


Great post! How difficult is it to switch from an AWS EC2 instance to the GCP version?

Nice post! I will be using it as a reference.
