Hacker News
GCE vs. AWS in 2016: Why You Shouldn't Use Amazon (thehftguy.com)
149 points by mikecarlton 59 days ago | 169 comments

We moved back and forth between AWS and GCE (based on who gave us free credits). Once we ran out, we chose GCE and never regretted it.

GCE has many quirks, for instance the inconsistency between the API and the UI, and it lacks the richness of the services offered by AWS, but everything GCE does offer is just faster, more stable and much more consistent.

One of the biggest problems with AWS is that once you outgrow the assigned limits, it becomes hell to get more resources from them. We're running on average around 25k servers a day, the majority of them preemptible (spot). AWS requires that you request the exact type and location for your instances. GCE only asks for a region and then overall resources (e.g. number of CPUs).

Also the pricing is much less complicated. 1 core costs y, 32 cores cost 32*y.

> AWS requires that you request exact type and location for your instances

Are you running 25k without using Spot Fleets? Spot fleets let you specify value per instance type, and then total value you need. ("Value" could be CPU, memory, network or whatever) AWS will maintain the lowest cost spot instances that fulfill your requirements.

We couldn't even get them to approve 5k servers on AWS. We run everything in GCE.

That is hard to believe. Their case studies have organizations using thousands or tens of thousands of servers for scientific computing, etc.

I didn't blur anything so you can see for yourself http://imgur.com/14upQ5j

Case studies tend to be written by the marketing team, and will tell you what the marketing team wants you to believe. I'll take "hey, this is my hands-on experience" over "this is what we want you to think" every time.

I understand spin, but the claim is that AWS markets a service/product that they will not provide.

Those studies always involve close coordination between the PMs in the Cloud org and the client.

Your post has motivated me to blog some details about the difference between AWS EC2 Spot Instances and Google Cloud's Preemptible VMs:


(work at Google Cloud)

> We're running on average around 25k servers a day

Is 25k the number of servers, or is it the cost of the servers?

Number of servers

Wow, your bill must be gigantic.

Well, it's not small, but definitely not that crazy. We're running an immutable infrastructure and the majority of the servers are "cattle".

To be more specific, we redeploy servers from images instead of replacing just the code. The majority of servers (99.9%) are preemptible (meaning that they will be deleted every 24 hours) and thus we get a huge discount. Also, all our communication goes through pubsub, so the servers don't communicate directly with each other.

Thanks to all these optimizations, we run only what we need at any given time, which keeps the cost low. It's still much larger than anything I've ever expected, but I'm still trying to convince myself that this is the sign of success.

I have to wonder if their Lambda service might be worth looking into. Any reason you're using full on servers rather than some kind of a compute engine? I don't have the great fortune of getting to work with those kinds of numbers, so forgive my ignorance here, I'm really curious about your setup. :)

I'm not the original commenter, but I have analyzed Lambda for some tasks and found its costs prohibitive. Moreover, AWS Lambda has a 300 second timeout limit, so it is not feasible for heavy processing.

I played with Lambda recently. It's further along than Functions [0], but it has a lot of limitations. We like to have control over our infrastructure, or at least some kind of control.

[0] https://cloud.google.com/functions/

You should have moved to bare metal months/years ago. Making your servers preemptible is like putting a bandaid on a gaping bullet hole.

There are many reasons why bare metal doesn't make much sense for us, at least at the moment. Here are a couple:

- we're a very small team (13 people). We would never be able to manage this kind of infrastructure

- we're growing incredibly fast. Just 6 months ago we were running on ~300 servers. We grew so quickly that we are now working with GCE's infrastructure planners to plan our resource needs

- even though we're not suffering from massive bursts, we don't grow gradually but rather in massive jumps. On bare metal it would take us months to grow this quickly. This way we can grow in days.

- it is way more complicated to host in many geographic locations, especially in Asia. We need to be present in many parts of the world and with GCE/AWS/Azure it's very easy and convenient.

- we did try to work with some bare metal hosters (like OVH), but they are not able to deliver the servers on the schedule we need. They require way longer times and obviously much longer commitments

- the pricing is actually not that different once you run on preemptible (spot) instances. Bare metal would give us some performance boost, but the freedom is worth every penny.

@user5994461 can't reply to your comment for some reason, but you're correct. Ansible would never scale. We ditched it a long time ago. We're using packer to create images/docker and then distribute them automatically via kubernetes or just directly through the custom images.

As I've mentioned earlier, we're running an immutable infrastructure. That means once we need to change something we replace the whole server. Each server runs only one single service. That allows us to run smaller instances but in large quantities.

We actually did run on Softlayer. It was a nightmare. They consistently have some outage somewhere. We couldn't count on any instance to stay up. The performance was better, but you can't treat the infrastructure as cattle and that was a huge limitation for us.

The "reply" button sometimes goes away when the discussion is deep enough. Gotta click the comment to comment.

I would imagine that rolling a container out to thousands of hosts may take a while.

What kind of load and software do you run? In my experience, dramatically scaling out increases load and latency variance and causes all kinds of problems.

When was your experience with SoftLayer? Care to elaborate? I've got some interests in them for future projects. I'd rather hear about the issues now :D

You're still thinking about these servers as bare metal. We don't keep the servers running. We always create a server from scratch, or based on an image we prepared, or with a docker container that is automatically pulled from an internal repo once the OS is turned on.

This is the benefit of running on GCP. We don't have to trouble ourselves with the headache of scaling both with the images and containers thanks to the internal tools offered by GCP.

We quit Softlayer around February. We were running bare metal and they went down so often that we essentially ended up keeping one super large server for a very insignificant service. We never gave them too much of a chance, so I may be too harsh.

Pulling a 100 MB docker image from 1000 hosts would take an eternity.

I stopped thinking about bare metal a while ago. I'm thinking about the comparison with AWS and I see that your entire business is relying on capabilities only available on Google Cloud (which indirectly, is why I'm advising Google Cloud nowadays. It's easier for basic usage and it's unlocking a whole new set of extreme usage).

Things you couldn't have pulled off on AWS that easily: Create thousands of hosts, deploy images that fast, managed kubernetes, pubsub, worldwide AMI, multi region subnets.

Never tried it on AWS so can't comment on that. It works very well on GCP however. We send over 1M requests to their API to rescale the stack every single day (they had to up our limits because it was always timing out on us).

It's likely that a large percentage of your workload is static.

It's likely that bare metal is lower cost than spot - it is for me.

It's likely that your issue is allocation of capital - you'd rather put your money into burning dollar bills into AWS versus pre-paying for 18 months of servers with capital.

I'm happy your company is doing great.

I agree GCP is a great competitor to AWS.

I think you're being close minded to building out a 20-60 cabinet data center to offload some of your workload.

It's hard to do a DC with over 20 racks and pay more than 1/2 of AWS's prices.

You summed up nicely, up until the last point.

I understand why you would think so, because you don't know our setup. Just to give you an idea, we process 16PB of data every single month. This is ingress traffic. If we had to pay for this traffic going out (egress), we would end up paying over $1M just for the traffic itself. By keeping everything in a single spot it costs us literally 0.

That said, I've tried. I reached out to many hosters, like OVH and others. They just don't have the capacity we need. 20 servers will not make a change for us. We wanted to start with 500, but it would take them 9 months.

They are fools. I will build you 1k servers with 100g to AWS in 75 days.

Your $1M estimate is 3X too high at 2c/GB.
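For anyone who wants to check the arithmetic behind the two figures above, here is a minimal sketch. It assumes decimal units (1 PB = 1,000,000 GB) and a flat per-GB rate with no tiering; the function name is made up for illustration. Under those assumptions, 16 PB at 2c/GB comes to roughly $320k per month, which is about a third of $1M.

```python
# Rough egress-cost arithmetic for the figures quoted in this thread.
# Assumptions (not from the thread): decimal petabytes and a flat per-GB rate.

GB_PER_PB = 1_000_000  # decimal petabyte, assumed


def egress_cost_usd(petabytes: float, usd_per_gb: float) -> float:
    """Return the monthly transfer cost at a flat per-GB rate."""
    return petabytes * GB_PER_PB * usd_per_gb


if __name__ == "__main__":
    cost = egress_cost_usd(16, 0.02)  # 16 PB/month at 2 cents per GB
    print(f"16 PB at $0.02/GB: ${cost:,.0f} per month")
    print(f"Ratio of the $1M estimate to that figure: {1_000_000 / cost:.2f}")
```

Real egress pricing is usually tiered and negotiated at this volume, so treat the result as an order-of-magnitude check only.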

In two hours I'll save you $25k per month. In 6 months $300k/month.

Either you are growing and should be investing in cost efficiency or you run a staid lifestyle business. Which is fine, but if you aren't growing, get off expensive cloud.

Your logic is sound, but you don't have enough details about us to judge it properly. The gain would be much smaller than you think and we would lose a lot of freedom, which would slow us down.

I don't think it's worth continuing to argue. It's obvious you've done your research and built a real system that works. You don't have to keep defending yourself against people who claim you can save 50% but don't understand that running an entire system means more than just minimizing the hardware cost.

> I think you're being close minded to building out a 20-60 cabinet data-center to offload some of your workload.

I'd say it's time to forget about the infrastructure. Should optimize the software and rewrite in C++ :D

We already have a huge portion of the system in C/C++ ;)

Would you mind giving more details about what you run with those 25k servers?

I'd worry about the time it takes to reconfigure these servers with Ansible/Salt, or the time it takes to kill/rebuild them from a system image.

On a side note, the 4th competitor is IBM SoftLayer. It's different enough from the top 3 (Amazon, Google, Azure) to be a thing of its own.

> Founder of Pex [pex.com], a video analytics & rights management platform able to find and track online video content anywhere.

src: doh's HN profile.

I agree but that's years out. We are barely scratching the surface.

Once the hockey stick flattens out then you will want to find permanent homes, assuming you have a viable long term business model

I'll be honest, it really feels like you haven't done a good job of scaling up first (at least based on your comments about cost and how they aren't provisioning fast enough). When you say you tried OVH and it didn't work, are you deploying dual-processor hexacores with 64 GB+ of RAM? Because if not, you're probably not doing it right. What software stack are you running?

Get 1-2 more people and move off the cloud. Maybe not for everything but to service your base traffic. You can manage it if you spend a few days learning how, in the same way you learned AWS/GCE.

I may be missing something, why is bare metal better in this case (immutable/preemptible servers)?

Just the sheer number of instances the OP is using has put the usage FAR into the territory where the premiums spent on virtualized instance costs far outweigh any clever strategy one might use to make things cheaper in the cloud. It has nothing to do with what the infrastructure is, and everything to do with raw volume.

Spot instances are super cheap...about 80% less than regular instances.

Yea but it's still virtualized slow molasses at the end. And still more expensive. My point is that at 25k servers, even a 5% efficiency gain is significant, and I'm willing to bet the margins are still higher.

Awesome, would you mind sending me an email (in profile), I'd like to pick your brain a bit further.

We were sharing the same office for over a year [geekdom]. You can find my email in one of the discussion chains.

Lol, sorry I failed to put 2+2 together.

I can't imagine why you guys would need 25,000 servers. Your G bill at a minimum must be over 5 million a month. I could be wrong, but those numbers don't add up to me.

What instance type are you running?

We don't host on AWS, we host on GCE. Also you're way off with your estimation. We pay significantly less.

As far as instance types go, we have almost everything, ranging from many thousands on the micro side to hundreds on the largest side.

$200 a server was my estimate which is pretty conservative. I assumed that price included disk and bandwidth per server.

25,000 * $200 = 5 million a month.

Well, no. You're calculating with 100% utilization and with normal servers. Preemptible (or spot) instances are way cheaper. For instance, an 8-core, 7 GB RAM preemptible server on GCE costs around $45/month under full utilization. We may use it for a couple of hours a day and then terminate it and open a different one.

It truly differs minute by minute.
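To make the utilization point concrete, here is a small sketch that prorates the roughly $45/month full-utilization figure quoted above by hours actually used. The 730-hour month, the 30-day default and the function name are assumptions for illustration, not anything from GCE's pricing pages.

```python
# Prorate a full-month instance price by the hours actually used.
# Assumptions (not from the thread): a 730-hour average month and
# simple linear proration with no minimum charges or discounts.

HOURS_PER_MONTH = 730  # average month length, assumed


def partial_month_cost(full_month_usd: float, hours_per_day: float, days: int = 30) -> float:
    """Return the cost of running an instance only `hours_per_day` hours each day."""
    hourly_rate = full_month_usd / HOURS_PER_MONTH
    return hourly_rate * hours_per_day * days


if __name__ == "__main__":
    # The ~$45/month preemptible instance from the comment, used 2 hours a day.
    print(f"2 h/day: ${partial_month_cost(45, 2):.2f} per month")
    # The same instance left running around the clock, for comparison.
    print(f"24 h/day: ${partial_month_cost(45, 24):.2f} per month")
```

The gap between the two printed numbers is why multiplying a server count by a flat monthly price overstates the bill for short-lived instances.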

Based on which criteria do you preempt your servers? 25K sounds like a lot!

What do you mean? We use them for many sorts of tasks. We have only a couple of fixed servers that keep the state of the whole infrastructure, plus the databases that keep the data.

At 25k servers / day one would think pulling this into real hardware and doing capacity planning would be cost effective.

That's what I was thinking. Unless they're really bursty (so they only need the 25k servers for an hour at a time or whatever), I would think that 25k servers would be well beyond the number where it makes more economic sense to build out your own data center.

> Unless they're really bursty

Last I checked, AWS has hourly billing and Google Cloud has sub-hour billing.

If they are really bursty, they are in trouble with the AWS pricing model.
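A quick way to see why billing granularity matters for bursty work is to compare the billed minutes under each scheme. This is only a sketch: it assumes hourly billing rounds every run up to a full hour, and that sub-hour billing is per minute with a 10-minute minimum. Both rules and the function names are assumptions here, so check the current pricing pages before relying on them.

```python
import math

# Compare billed minutes under two billing granularities.
# Assumptions (not from the thread): hourly billing rounds each run up to a
# whole hour; per-minute billing rounds up to a whole minute with a minimum.


def billed_minutes_hourly(runtime_minutes: float) -> int:
    """Round usage up to whole hours."""
    return math.ceil(runtime_minutes / 60) * 60


def billed_minutes_per_minute(runtime_minutes: float, minimum: int = 10) -> int:
    """Round usage up to whole minutes, never billing less than `minimum`."""
    return max(minimum, math.ceil(runtime_minutes))


if __name__ == "__main__":
    for runtime in (5, 15, 61):
        hourly = billed_minutes_hourly(runtime)
        per_minute = billed_minutes_per_minute(runtime)
        print(f"ran {runtime:>2} min: hourly bills {hourly}, per-minute bills {per_minute}")
```

Under these assumptions a 15-minute job is billed four times as many minutes with hourly rounding, and the gap only closes for runs that are close to a whole number of hours.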

> the number where it makes more economic sense to build out your own data center.

I would think that this number doesn't exist... unless you have already built datacenters for yourselves and you have a major internal expertise in that.

I can tell you less than 1000 servers in AWS runs around 200-300k US / month

I am not sure what he is doing at 25k servers but you can see real world math puts this in a serious money range. Data centers can be built for less.

I wonder if the down voters on the hardware comments actually run services at scale.

n1-standard-1 * 25k instances = $180k per month with pre-emptibles

You didn't see that coming, did you? :D

That gives you 25k CPU + 83TB of memory. (Of course, that's just to give a figure, they probably don't use single core instances).
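The $180k figure can be turned back into a per-instance price as a sanity check. This sketch only divides the numbers quoted above; the 730-hour month and the function name are assumptions, and it says nothing about what the actual list price was.

```python
# Back out the per-instance price implied by a fleet-wide monthly total.
# Assumption (not from the thread): a 730-hour average month.

HOURS_PER_MONTH = 730  # average month length, assumed


def implied_unit_price(total_monthly_usd: float, instance_count: int) -> tuple[float, float]:
    """Return (USD per instance per month, USD per instance per hour)."""
    monthly = total_monthly_usd / instance_count
    return monthly, monthly / HOURS_PER_MONTH


if __name__ == "__main__":
    monthly, hourly = implied_unit_price(180_000, 25_000)
    print(f"per instance: ${monthly:.2f}/month, about ${hourly:.4f}/hour")
```

With the quoted numbers this works out to $7.20 per instance per month, or roughly one cent per instance-hour.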

Forget everything you know about AWS, it's overpriced. Google is half the cost of AWS on average (down to one quarter for special cases).

I'm not sure what you mean by building your own datacenters (really, buy some land and build from scratch?). Just having a bunch of dudes in a few locations worldwide is gonna be a multi-million dollar project. And I'm not even talking about building, power, cooling, servers, storage, network. That's a hell of an undertaking.

What kind of servers run at that price range? I've listed [0] some of the reasons why we are hosting in the cloud instead of our own datacenter.

[0] https://news.ycombinator.com/item?id=13260805

Various servers, i2 for Cassandra eats most of it. Application at various levels, multi region etc etc

We too are hockey sticking but not ready to go to metal .... yet.

Your reasons make sense. I would grow that team soon :)

To quote another article from that site: https://thehftguy.com/2016/11/18/google-cloud-is-50-cheaper-...

The i2 comparisons are under "local SSD and scaling up". Google has local 400 GB SSDs that can be attached to any instance. That's a lot more flexible and a hell of a lot cheaper when you have specific needs.

We have some large servers, but not too many of them. We're running 32 core nodes that are holding our Postgres, but the majority is smaller.

We are a data company :)


Actually that's super neat.

I grew to envy you as I read. It looks like interesting work.

I have serious trust issues with Google. Their history of discontinuing services, dismal support (even for a paid service), and neglect of bugs in SDKs/APIs - all three of which I have experienced first-hand - has left a long-term bitter taste. No doubt some individuals are fantastic, but the organisation as a whole gives me an enduring impression of being systematically arrogant and aloof.

AWS by contrast have demonstrated fanatically helpful support (even on business level, the cheapest), fixing issues within days of reporting, and a willingness to maintain obsolete/deprecated services (like SimpleDB) long after I'm sure they'd wish everyone had migrated away.

It seems to me that Google is highly siloed internally. The GCP team is very visible (they engage on HN and elsewhere), and GCP has excellent support. Very different from other parts of Google.

I have problems with Google as a whole, but I have as much faith in their cloud as I have in AWS or DigitalOcean (we currently use all three).

The Kubernetes team is also highly engaged on Github (they also occasionally show up on the official Slack channels). The Go community suffers from a depressing abundance of hostility, intransigence and arrogance, and some of Google's projects (e.g. Protobuf, the Go project itself) reflect this, but I was delighted to find that the Kubernetes people are not like this at all. It's a very friendly, quality-focused community. I think they have to be since Kubernetes is still emerging tech that's craving adoption. Same goes for GCP -- Google doesn't hide the fact that they're aggressively courting customers to migrate.

Not to be too cynical but none of this is reassuring. It's making it sound like the moment that GCP stops being the underdog they'll stop being as helpful and courteous :)

This thread of comments from a year and a half ago, on the article "How Amazon took control of the cloud", is really epic on these points.


Honestly. I would ignore any comment that is that old.

Running anything on AWS or Google 3 years ago. That must have been hell.

The attitudes and competencies of organizations generally don't change over the time scale of only a year and a half (which is really not long at all... I mean... I've been using AWS now for over eight years and had a product deployed on App Engine six years ago; the people I work with who really like GCP have been using it for three or four years, and our conversations about it and their experiences track that far back: a little over a year is nothing...), and when they do it is generally due to a massive disruption, such as a changing of the guard at the upper echelons of management. This is particularly true of companies that are dominated by personality effects, as are most of these massive tech companies. FWIW, the serious complaints I would lodge against AWS today are the same things I was complaining about back in 2009 on the Amazon EC2 forums. Sure, the actual technical abilities of products change over a year and a half, but that is neither the complaint of this comment thread here, nor of the one I linked. Let me put it another way: the key issues people have with Google are generally issues that happen over longer scales of working with them: things are great for a half a year or even two or three years, and then the rug gets pulled out from under you (whether or not you are paying for the service and whether or not you are generating lots of revenue; I mean: look at the fate of Google Wallet); if you seriously are throwing out experience from as recently as a year and a half ago and seriously believe that after three years knowledge would have become fundamentally out of date, your article today about "use Google, not Amazon" is clearly also worthless, as the real question users need to answer for themselves is "in three years will I have regretted my decision", and you believe articles and comments and experience (such as yours!) are not capable of guiding people, which is ludicrous.

Agree that the general criticism didn't change. (And the criticism from that link was on AppEngine, so that's certainly on point).

However 1-3 years is a big gap in terms of capabilities and tooling.

If I go back far enough, there was a time without NAT instances, without VPC, without a region in zone X where I have customers. Those are deal-breaking changes. Some things we use today to manage our servers didn't exist or didn't support a specific usage some months ago.

I'm certainly biased because I've only been using AWS since last year. I see the things we do which are marked as "added MM-YY" and it would have been hell to try to do that 3 years ago.

That's why I advise to take old criticism about capabilities with a grain of salt.

I'll add that the main issue seems to be with Google AppEngine of two years ago. For compute options Google Cloud has Compute Engine, Container Engine (Kubernetes), and even AppEngine Flex Environment these days.

(work at Google Cloud)

One reason why I still can't use GCE in 2016. No PostgreSQL support for CloudSQL.

I can find alternatives for other services, but I don't want to compromise on the choice of relational database.

Note: I understand there are third-party providers for PostgreSQL, but I'd rather have Google's.

Googlers on HN have commented before that they're working on it. No ETA, though.

With AWS now offering both plain-old PostgreSQL and souped-up-PostgreSQL-on-Aurora [1], whatever Google produces needs to be great in order to compete. However, I fear they'll initially come out with something that's on par with the current MySQL support in Cloud SQL, which is just a vanilla MySQL server behind a UI/API. (For example, Cloud SQL's read-replica stuff is reportedly just MySQL binlog replication.) Better than nothing, of course.

Cloud SQL also has some annoyances (such as not supporting private IPs and the need for the Cloud SQL Proxy [2]) that I hope they're working on.

[1] https://aws.amazon.com/blogs/aws/amazon-aurora-update-postgr...

[2] https://cloud.google.com/sql/docs/sql-proxy

Yeah, I heard this too.

TBH, I don't need Aurora-level performance (although it would be a nice option in the future), I'm just happy to have a vendor-managed PostgreSQL instance with granular price points.

C'mon Google! :)

This is on my short wishlist for GCE. It's such obvious feature parity that it's very strange they're not doing it.

I believe they are (as mentioned by other commenter), but it's taking too long :)

Totally agree, Google Container Engine (hosted kubernetes) plus managed Postgresql would be my dream setup.

Have you found any compelling hosted Postgres RDS alternatives?

I mean, there is ElephantSQL [1] and Aiven [2], but I'd rather have something from Google themselves since I consider relational database a core part of the architecture and infrastructure.

[1] - https://www.elephantsql.com/

[2] - https://aiven.io/

Two points here really hit home with me about AWS (not in comparison to GCE though since I've never tried it).

1) Reserved Instances: I think the pricing model for this has become very outdated since the beginning of AWS, and it is definitely becoming cumbersome (and therefore scary) to use.

2) ELB + Traffic Spikes: I have tried (unsuccessfully) to pre-scale an ELB to prepare it for the traffic it was about to receive. I tried to pre-scale for this project 3 different times, in coordination with support and without them. I could not do it. Very frustrating.

I think these are all signs of extreme growth, and a strange organization of engineering units inside of AWS. However, as OP described, we are much too heavily invested in AWS to consider an infrastructure shift at this point.

We also manually sync up with AWS support to "warm up" our ELBs... I guess they don't expose an option to avoid abuse, but it really does seem like an implementation caveat.

Can you talk more about ELB + Traffic Spikes?

Under high traffic, ELB will fail?

How do you pre-scale the ELB?

Many thanks.

God I hate any piece of writing that doesn't define its acronyms at least once. Google Compute Engine isn't popular enough that people should be expected to know immediately what it means.

Also, GCE is a service while AWS is a suite of services. The correct comparison should be Google Cloud Platform (GCP) vs AWS. Guess it's nitpicking.. but still.

I don't think this is nitpicking. This is probably the single biggest issue with Google right now: nobody knows what to call their services.

GCE - GCP - Google Cloud - Google Compute Engine - Google Cloud Platform ???

I can see on that blog's analytics that people are looking for various terms which result in different articles and ranking.

This needs to be unified by Google. Personally, I think I'm gonna call everything "Google Cloud".

It is unified, they have some of the simplest naming. GCP is Google Cloud Platform which is the entire ecosystem. GCE is Google Compute Engine which is the VM/IaaS offering within.

You can look at the official products page for all the names and descriptions: https://cloud.google.com/products/

Amazon also suffers from the bewildering-array-of-confusingly-named-services syndrome.

This entire blog post is rife with other small issues. AWS support is not a 10% minimum at all! Developer Support is 3%.

> AWS Premium Support is mandatory

Is Google Cloud support even acceptable? Google is known for poor or no support for most services.

Google's known for poor or no support on their free services but their paid services often have decent support. Personally, I've found GCP, GSuite, Project Fi, Google Store and Pixel support to all be pretty great but haven't found any support at all for Gmail, YouTube etc.

Google nearly wrecked a business that was making them bank and is now doing 20MM/year without them. Support wouldn't even talk to the founder: https://mixergy.com/interviews/the-penny-hoarder-with-kyle-t...

Hah, Adwords must be an exception then - support is total crap. You're lucky if you get a canned response that doesn't begin to answer your question..

Hmm sounds like you might have to increase your bids.

We already pay around $5 per click - I think that's more than enough! However, it's true our business is tiny compared to that of large corporations - but I don't think that excuses the abysmal 'support' I and others have received from Adwords.

It's a joke... you said that you always get a canned response, didn't you? That's the canned response I always get from them. Doesn't matter what the issue is, raising bids is always the solution.

YouTube Red support seems excellent (the one time I used it).

And my father asked support for help with his Google Home routers with a very specific question and support was absolutely phenomenal. No going through lame troubleshooting steps; it was clear we got connected with a well qualified network engineer immediately.

The actual people doing the tier 1 support at Google also got through the Google hiring process. 6 years and 250k on a CS masters from Stanford and they'll set you up helping people with home routers :) it's really not just a meme that Google massively underutilizes their talent.

Actually it's surprisingly good. We use them a lot and even when we were on the silver package (the cheapest) they were pretty quick (under 1 hour for P1 situations [production is impacted]).

Can you help me understand how much you define "a lot"? Is it about 2 tickets a month, or more?

With AWS we do have a direct line to a Technical Account Manager which we utilize for the rare P1 (Prod down) situations. That gets things moving quickly if it wasn't already moving quickly with Support.

Just checked, we're on two tickets a week. Sounds a lot but we push GCE to its limits.

On gold support and P1 there is usually an engineer assigned to the ticket. On many occasions you're talking to the person fixing your problem.

GCE is not perfect and neither is their support. But they do try, and even when things go sideways they take responsibility for it, which didn't happen much with AWS, at least in our case.

You have the same thing for gold and up on GCE. On the rare occasions I have to use it, I get every penny's worth.

Also, with P1 issues on GCE, most of the time an engineer assigned to the case calls me. I've even spoken directly to the head of Google Cloud Platform via Hangouts.

What company do you work for?

Thought I would chime in here since I didn't see anyone comment about gold. We have GCP gold support. They answer all questions and issues quickly, and there aren't any limits on ticket counts that I am aware of. During both of the outages that affected us in the past few months (a ~1.5 hour Google load balancer outage and a >3 hour BigQuery outage), support felt pretty bad, even though I guess there was nothing they could do. During the BigQuery one, the message was "outage will have an update by some time", and then there was no update until well past that time. Three hours later it was still "we will have an update by some time" with no more info, when both outages were past SLA. Overall support feels good, and they answer weird or general questions too, which is nice (although sometimes on every reply they say they are going to close the ticket).

I hear ya.

Check out the postmortem for the BigQuery Streaming API outage [0]. Relevant paragraph:

"Finally, we have received feedback that our communications during the outage left a lot to be desired. We agree with this feedback. While our engineering teams launched an all-hands-on-deck to resolve this issue within minutes of its detection, we did not adequately communicate both the level-of-effort and the steady progress of diagnosis, triage and restoration happening during the incident. We clearly erred in not communicating promptly, crisply and transparently to affected customers during this incident. We will be addressing our communications — for all Google Cloud systems, not just BigQuery — as part of a separate effort, which has already been launched."

(Work on Google Cloud and was on BigQuery team in the past)

[0] https://status.cloud.google.com/incident/bigquery/18022

Whenever I've needed to contact Google support for Gsuite, they've been excellent.

Same here

Support for things like the Nexus devices and anything you can buy on their store is very good.

Also, Google My Business support is also pretty good.

AWS premium support is the best support experience I have encountered so far and should absolutely factor into choosing a cloud provider. Reading frequent stories of Google support nightmares across all their services makes me think twice about using GCE. GCE must find a way to counter this.

Also, the AWS premium support fee is negotiable for some customers from what I have heard. They don't like to negotiate down, though!

Strange. I'd qualify them as the most useless support I've ever encountered.

In the year 2016, among maybe a hundred tickets, there was only ONCE where they could change something (an ELB issue).

And well, I'm not sure whether the fix was related to their changes or if it was just an intermittent error that happened once. So their involvement in the one time something changed has yet to be proven.

My biggest complaint about AWS is still EBS and having to guess at the right provisioned IOPS. Throw in confusing extras like EBS optimized instances and enhanced networking. GCE just abstracts away all these details.

This was referenced in the previous thread, but here is what amounts to a tl;dr:

> "Unfortunately, our infrastructure on AWS is working "

> "I learned recently that we are a profitable company, more so than I thought. Looking at the top 10 companies by revenue per employee, we’d be in the top 10."

I'm a bit interested in what their company's response would be to this article.

The author(s) have a high proportion of strongly pro-GCP articles; so many, and so emphatically worded, that I began to distrust the source.

Depends on who you ask in the company and how you word the question.

For starters, most of the people here don't need to know anything about AWS. They just fill in a [sort of] spreadsheet with goal-team-instancetype-count-zone and get fully provisioned servers up and running 5-15 minutes later. There are other lists for load balancers and security groups if they wanna do fancy stuff.

Looking forward to the day Google Cloud provides on-demand GPU-backed machines. Currently AWS is the only game in town for that, as far as I know.

Coming early 2017: https://cloud.google.com/gpu/

You should add that Google will allow attaching up to 8 GPUs to any type of instance.

That will put Google Cloud seriously ahead of the competition in terms of GPU computations.

Long term: Google has a better trajectory. IMO.

Note that AWS announced something similar at re:Invent 2016, but it ain't coming any time soon. Not sure about the status. Is it even real anymore?

Last I checked I only found per-month pricing. Are there finer-grained options?

VMs on Azure are billed per minute and I haven't found any indication of N-series being the exception.

What's your source on per-month pricing for N-series?

Apparently, it's coming soon to GCP.

Why is AWS premium support mandatory? I ask because I'm currently building out a SaaS offering on AWS and haven't yet hit any issues requiring support. Can I expect to start seeing issues as traffic scales up to a certain level?

Author of the article here.

It's not. It's utterly useless. After an entire year, where the support has never been of any help, my last action for the year 2016 was to call a meeting with everyone, subject line "we should cancel our support subscription with AWS".

It's clear that we (especially me) are way more qualified in all AWS offerings, from basics to special quirks, than they are. And they can't do anything that we can't do ourselves.

I think the support is useful when you first start out, they can answer a lot of general questions. You should subscribe for support the first year and see how it goes.

Note that some AWS managed services (e.g. RDS) can only be debugged by the support so you might be forced into support if you use these services. We don't.

I consider it mandatory because of the various random issues you will face, even without high levels of traffic. Also, when leveraging new features many times the documentation isn't as refined (e.g. the ALB when it first came out). Support is valuable in explaining how to really do something without wasting much time.

Having 24/7 access to one dedicated AWS employee (the technical account manager (TAM) you get as part of premium support) can be _really_ helpful. Such TAMs can get you direct access to the product teams inside of AWS in case you need it and can escalate problems quickly.

Still, it really depends on what you're doing at which scale with AWS. I also found business support quite good, if you don't need such dedicated resources.

Regarding seeing issues, it's simply a matter of probabilities: The more you run in AWS the more likely it is to get problems sooner or later with one or the other service.

My advice is to start with business support and see, based on your experience with it, whether you have requirements it doesn't meet. If so, you'll be in a better position to judge whether premium support offers enough value for the money.

Contrary to what the article says, my company does a lot of business and traffic via AWS and has never once needed support in the last two years.

We've been on AWS since 2009, and never needed support.

I've used AWS for a bit over 9 months now and it's quite terrible to be honest.

I don't need it for anything professional, and it's quite terrible for just some amateur hosting, plus the fees are immense if you somehow manage to get decent traffic.

Once my reserved instances run out I'll probably either check out GCE or DO, either seems to be a better option, though GCE seems to be more expensive.

Anyways, the console in AWS is a mess and I'm quite sure that I leaked my entire IAM settings to the internet because some switch somewhere isn't set right.

Since everything recommends setting up IAM users, you have to set up the permissions, a procedure which I enjoyed about as much as getting my fingernails slowly removed by a glowing red iron.

Calculating any sort of sustained cost is a pain in the backplane if the total doesn't exceed three digits a month.

And lastly, the login process is probably the biggest pain I've encountered across many, many providers. There are at least 4 login forms I've discovered, 2 of which I have to use, and one of those always asks for a captcha of such low quality that a brain-damaged AI running on my calculator could figure it out. Not to mention never knowing whether the 2FA setup was correct or blew up somewhere, because getting any feedback from the UI is plain impossible.

TL;DR Don't use AWS, anything else is better.

I mean, you're basically setting up a datacenter if you aren't using Beanstalk.

I see complaints all the time about how complicated the networking, IAM, etc. are. But dealing with VPCs is far simpler than having to buy and hop onto a bunch of F5s/Brocades etc. that tend to require their own network engineers on staff.

The issue always seems to be that someone tries to move a company onto AWS but lacks the experience in actually running infrastructure to that degree. If you know Change Management you can figure out CloudFormation templates; if you don't, you're likely completely lost and rolling out instances by hand. If you don't know any network engineering, you're likely going to have issues with load balancers and VPCs.

You can really see the experience difference in people when you work on multiple AWS infrastructures. And that's the huge benefit to it. You can practically roll out whatever sort of infrastructure your business requires.

If you're experienced enough you can basically do every single thing in AWS/GCP without having to hire ancillary staff (network engs, etc).

I personally use CloudFormation and create EVERYTHING with it. Load balancers, VPCs, instances, kubernetes, etc. But yeah, if you haven't touched CF before it can be like walking into a spider web of confusion - but believe me, I'd rather do that over again than take over someone's hardware that has no APIs, no dashboards, no centralization. I spent years in datacenters where everything was done manually. I wouldn't ever want to go back now that I've learned to treat infrastructure as code.
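To make "infrastructure as code" a bit more concrete, here is a minimal sketch of a CloudFormation template built in Python and serialized to JSON. The logical ID `ExampleVpc`, the description and the CIDR block are made-up placeholders for illustration, not the commenter's actual setup.

```python
import json


def build_template(cidr_block: str = "10.0.0.0/16") -> dict:
    """Return a minimal CloudFormation template that declares a single VPC."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative sketch: a single VPC",
        "Resources": {
            # "ExampleVpc" is a hypothetical logical ID chosen for this sketch.
            "ExampleVpc": {
                "Type": "AWS::EC2::VPC",
                "Properties": {"CidrBlock": cidr_block},
            }
        },
    }


if __name__ == "__main__":
    # Print the template as JSON so it can be saved to a file and reviewed.
    print(json.dumps(build_template(), indent=2))
```

The point is less the VPC itself than the workflow: the template lives in version control, gets reviewed like any other change, and the same file can recreate the environment later.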

But hey, if you want to run vSphere servers, manually configure instances and databases, and not do any automation via AWS, you can do that as well.

That's why it's incredible.

I totally agree, AWS is great if you are basically screwing in the racks yourself anyway (metaphorically speaking).

But for running a few hobbyist/amateur servers on it, it's absolutely horrifyingly complicated, especially when I'm not really a networking engineer, I like tinkering in the backend but not to that degree.

You could look into Beanstalk; it depends on what language your app is written in, though. But yeah, I really wouldn't suggest AWS for something simple that DO or Linode could easily handle.

GCP is also far more simplified than AWS if you haven't tried it. The way they do server auth via Google accounts (using gcloud) is pretty awesome and simple. You can make project-level SSH keys that are automatically placed onto every server and you can block specific servers from receiving them all through the UI.

I do greatly miss AWS's breadth of service offerings, though.

AWS did just introduce Lightsail, which is for people who just want a VPS. It's funny because I worked for a VPS hosting company back when EC2 first came out. At the time I thought it was the death knell for that type of web hosting, but now ten years later Amazon is going to try and grab that same market.

> If you're experienced enough you can basically do every single thing in AWS/GCP without having to hire ancillary staff (network engs, etc).

Disagree. You definitely need that staff.

Just because it's point and click UI (or script and execute terraform) instead of physical cables doesn't mean you don't need highly skilled network guys to design and configure it.

Short version: NEVER use AWS if you're not in a professional environment.

Try Digital Ocean. That should be easier for you.

Here's an article from the same blog to help you choose a cloud provider: https://thehftguy.com/2016/06/08/choosing-a-cloud-provider-a...

Thanks a lot!

Yeah, I totally understand what you mean. I felt the same when I first got on AWS.

What helped me immensely was taking a job where they had the whole stack on AWS, and I had the chance to learn by doing.

Secondly, I signed up for this class on Udemy and that taught me the details of setting up a legit infrastructure.


I agree it's challenging at first, but once you understand what's going on it's fucking awesome. You can configure every little thing and I have my system decently optimized to cost next to nothing.

DO is not for serious business; it's fun to use when you want to do a PoC at home. AWS is by far the most advanced platform when you need to do something backend related, and they have a large number of useful services.

Depends. We've been on DO for 3 years (we're slowly migrating to GCP for various reasons), and it's great if all you do is run your own VMs and manage everything yourself, and don't have a huge number of them.

Bandwidth, CPU and local disk performance, reliability -- all on par with AWS based on my experience. Of course, DO only has VMs -- they don't have things like EBS (though some data centers now have attachable storage), any of the add-on services like S3, or ability to tweak performance by IOPS etc.

DO is a bare-bones VM provider, and it is very good for what it is. It's not a toy.

It's not very good, it's average. The fact that the kernel version was set in the panel was a huge issue. You provide a VM and you can't upgrade the kernel, wtf?

To be fair, that's no longer the case. I personally had no issues with that.

> it's quite terrible for just some amateur hosting.

The developer experience when you just want a box on the internet to ssh into is pretty frustrating. DO is fantastic for this use case.

Evaluated GCP, and 2 main issues made it hard to consider moving:

1) quickly bumped into project limits just doing some tests, and the fact that you have to wait until billing cycle to reset the counter was quite jarring (I presume there's a way to increase)

2) Better tooling for S3 than Google Cloud Storage - non-technical members of our team need to work with files, and there are many nice third-party tools for S3.

1) AWS has similar limits: https://aws.amazon.com/ec2/faqs/#How_many_instances_can_I_ru... and it's pretty simple to fill in the form and request more.

2) GCS has an S3 compatible API so you can use your S3 tooling: https://cloud.google.com/storage/docs/interoperability
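As a rough illustration of point 2, here is a sketch of how an S3 client could be pointed at the GCS interoperability endpoint. It assumes the third-party `boto3` library and HMAC interoperability keys created in the GCS settings. The helper names `gcs_s3_client_kwargs` and `make_gcs_client` are hypothetical, and whether a particular S3 tool works this way would need checking against the interoperability docs linked above.

```python
# Request endpoint of the GCS XML API, which the interoperability docs
# describe as compatible with S3-style tooling.
GCS_XML_ENDPOINT = "https://storage.googleapis.com"


def gcs_s3_client_kwargs(access_key: str, secret_key: str) -> dict:
    """Build keyword arguments for boto3.client() that target GCS instead of S3."""
    return {
        "service_name": "s3",
        "endpoint_url": GCS_XML_ENDPOINT,
        "aws_access_key_id": access_key,      # GCS HMAC access key, not an AWS key
        "aws_secret_access_key": secret_key,  # GCS HMAC secret
    }


def make_gcs_client(access_key: str, secret_key: str):
    """Create the client. boto3 is imported lazily because it is third-party."""
    import boto3

    return boto3.client(**gcs_s3_client_kwargs(access_key, secret_key))
```

Not every S3 feature is guaranteed to map across, so it is worth trying the specific operations your tools rely on before committing to a migration.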

AWS's limits are in real-time however. I don't mind the limits, I just don't like the fact that I can't run a quick test without it impacting me for the billing cycle.

I'll check out the GCS XML compatibility with the current tooling we use; it looks promising.

One more comment on storage.

Google Cloud storage has the same API for all its tiers. So to go from Multi-Region to Nearline, you just change the bucket designation. One API, one service, one interface and set of tools.

(work on Google Cloud)

You can increase the quotas under IAM > Quotas. They're extremely low on new projects, but I've never waited more than a couple of hours to have them increased significantly. Like 75 to 500 CPUs, etc. One annoyance that they seriously need to sort out is that every time I request an increase they ask for a deposit or the name of a project I own that has $xyz spent on it already. I don't understand how/why they can't see the other 5 GCE projects attached to my organization. I've begun putting it in the quota request notes, because if I don't I'll wait a couple of hours for the "Please deposit $250 or send a project-name" and then a few more hours after my reply for them to push it through.

It's not a terrible system, AWS does the same thing, you just need to be aware of it when you begin new projects and deal with it before you get rolling.

I use gcloud compute copy-files all of the time and it's extremely simple. I haven't used AWS much this year so I'm not sure how far along aws-cli or third parties have come. I do generally prefer S3; it's far more full-featured than GCS right now. Its UI right now blows GCS out of the water. Your file revision numbers, etc. are directly accessible from the UI, things like that.

See my comment to sibling - I'm okay with limits, just wish I could tear down a project in real time to be able to add projects.

As for storage, as a developer I'm okay with command line tools and APIs, but there are some tools out there for non-developers where they can just drag and drop files without caring who the cloud provider is. However, it looks like it may be XML API compatible, so existing tools may still work.

1) Increase your limits.

The time I opened a new account on AWS to do some disaster recovery, I had to send Amazon ~30 tickets for increasing limits and wait for a week :D

My concern isn't with limits, it's that they're not real time.

I really like OVH's SoYouStart in terms of its pricing for CPU/memory intensive computations, their prices just destroy AWS/GCE: https://www.soyoustart.com/ca/en/essential-servers/

In terms of raw machine cost it generally goes owning a DC < owning machines + leasing racks < leasing dedicated < VPS < cloud.

But each level down you go is another level of expertise you have to employ someone for and a loss of flexibility.

The strength of the cloud has never been cost, it's always been flexibility, the ability to scale up 1000 servers in 5 minutes without any preparation or management.

Yea, but those 1000 servers are going to do the job of 100 dedicated servers. You still have to provision them, and there aren't too many use cases (there are some) where demand suddenly jumps 10x. The reason cloud soars is that in developed economies the cost of human resources is astronomically high. The savings come from having fewer people deal with stuff. Even a 100k/year bill is okay if that means that 1 person has to deal with it rather than 2 or 3. So if you have money then cloud makes all the sense in the world; if you don't have money and have a bit of free time, dedicated servers make a lot of sense until you have money.
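A back-of-the-envelope sketch of that staffing argument. Apart from the 100k/year cloud bill mentioned above, every figure here (the dedicated hosting bill, the fully loaded cost per engineer, the headcounts) is a made-up assumption chosen only to show the shape of the calculation.

```python
def yearly_total(infra_bill: float, engineers: int, cost_per_engineer: float) -> float:
    """Total yearly cost: hosting bill plus the people needed to run it."""
    return infra_bill + engineers * cost_per_engineer


# Hypothetical fully loaded cost of one engineer per year.
COST_PER_ENGINEER = 150_000

# Cloud: larger bill (the 100k/year from the comment), one person running it.
cloud = yearly_total(100_000, engineers=1, cost_per_engineer=COST_PER_ENGINEER)

# Dedicated: much smaller bill (assumed 20k/year), but three people running it.
dedicated = yearly_total(20_000, engineers=3, cost_per_engineer=COST_PER_ENGINEER)

print(f"cloud: {cloud:,.0f}  dedicated: {dedicated:,.0f}")
```

With these assumed numbers the cloud setup comes out cheaper overall despite a bill five times larger. Lower the cost per engineer far enough and the comparison flips, which is the commenter's point about having free time instead of money.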

You're crazy, son. AWS has this thing called "Resource Limits." I'll paypal you $20 if you show me a real screenshot that shows you have 1k more instances available in your resource limits than you're using - subject to hangout verification.

Really bad support. And if you get packetstormed, they just nullroute you and call it "DDoS protection".

Over the years, I've had plenty enough of ovh/kimsufi/hetzner/etc.

For dedicated, I favor online.net and their alternative brand scaleway. Else AWS. I haven't played with GCE yet.

Being fair, nullrouting you is a DDoS protection... for them.

Because of speed of light/latency issues, I won't move to GCP until they have an RDS-postgres equivalent

According to comments from googlers elsewhere on HN, they're working on it. It's pretty much the number one thing people are asking for.

Having used AWS in production for about 2 years, I can say that scaling in the cloud is hard to get right. Spikes of traffic and load are hard to scale with, especially if you're scaling down to just the right number of instances to manage the current load. We try to keep 50% capacity free for such spikes. Speaking with AWS support, we can fix issues with limits and scaling ELBs in hours, not weeks, so I haven't had the same experience as you in that respect. One thing I would suggest is that you have 2 AWS accounts, one for dev and one for prod. This way you should be able to see the limits that you're hitting before you get to production, and can raise tickets to get them amended.
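The "keep 50% capacity free" rule can be sketched as a small sizing calculation. This is only an illustration: the function name, the requests-per-second units and the example numbers are assumptions, not the commenter's actual scaling policy.

```python
import math


def instances_needed(current_load: float,
                     capacity_per_instance: float,
                     free_fraction: float = 0.5) -> int:
    """Number of instances so that `free_fraction` of total capacity stays idle.

    current_load and capacity_per_instance use the same unit (e.g. requests/s).
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    if not 0 <= free_fraction < 1:
        raise ValueError("free_fraction must be in [0, 1)")
    # Only this share of each instance is allowed to carry steady-state load.
    usable_per_instance = capacity_per_instance * (1 - free_fraction)
    # Never scale below one instance, even with no load.
    return max(1, math.ceil(current_load / usable_per_instance))


# Hypothetical numbers: 1200 req/s of load, 100 req/s per instance.
print(instances_needed(1200, 100))        # with 50% kept free
print(instances_needed(1200, 100, 0.0))   # with no headroom at all
```

Keeping half the fleet idle doubles the instance count compared with running flat out, which is the price paid for absorbing spikes without waiting for new instances to boot.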

GCE has its own issues. The raw price is different from the real/total cost. In a better world you shouldn't have to choose between just a few cloud providers.

How have you observed the total cost being different from the raw price? In what ways were you surprised?

If you are after the raw price you should sign up for bare metal and install/manage your own tools. When you sign up for a service such as AWS you pay mostly for the tools/ecosystem and management. I think AWS is still the undisputed leader in that regard, closely followed by Azure. GCP is something I would not recommend due to some personal experiences, namely the App Engine platform and early Cloud SDK releases.

You should use AWS if you _really_ need something like an infinitely scalable database such as DynamoDB or some other uber-scaling AWS service that you can't replace with some open-source software on a VPS provider. 95-100% of startups don't actually need an infinitely scalable database or whatever. So they shouldn't subject themselves to Amazon's horrible pricing and throttling.

GCE is a piece of shit when it comes to support and solving customers' issues. They didn't manage to FIX a billing issue for 8!!! months. Support is useless. Access rules are missing boundaries: you have to share everything with everyone. It's a nightmare for companies.

The cost comparison link is broken.


I'd be interested to know why the OP never considered Heroku

Heroku is hosted on AWS, and so you inherit the performance, but lose the ability to control it (no way to select local SSDs or enable IOPS, as far as I know, for example), while at the same time paying a large premium for the convenience of using their platform.

Thanks lobster_johnson, yeah I know Heroku is ephemeral so data is stored in a DB or somewhere else, often S3.

Just wondering what kind of work he's doing that took him from AWS straight to GCE without stopping to think about Heroku.

Maybe you've hit the nail on the head, he's writing to disk a lot for some reason?

A lot of the article's focus was on network performance, local SSD availability and I/O performance. I think it's safe to say his use case involves a lot of I/O. If "thehftguy" means HFT == high frequency trading, then that's not surprising.

At that scale Heroku would probably cost 5x their current bill.
