GCE has its quirks, for instance the inconsistency between the API and the UI, and it lacks the richness of services that AWS offers. But everything GCE does offer is faster, more stable, and much more consistent.
One of the biggest problems with AWS is that once you outgrow the assigned limits, it becomes hell to get more resources from them. We're running on average around 25k servers a day, the majority of them preemptible (spot). AWS requires that you request the exact type and location of your instances. GCE only asks for a region and then overall resources (e.g. number of CPUs).
Also the pricing is much less complicated: 1 core costs y, 32 cores cost 32*y.
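The linearity is easy to model. A tiny sketch (the per-core rate is a made-up placeholder, not an actual GCE price):

```python
# Hypothetical per-core hourly rate; actual GCE prices vary by
# machine type and region.
CORE_HOURLY = 0.035       # dollars per vCPU-hour (placeholder)
HOURS_PER_MONTH = 730

def monthly_cost(cores):
    # Linear pricing: n cores cost exactly n times one core.
    return cores * (CORE_HOURLY * HOURS_PER_MONTH)

print(monthly_cost(1), monthly_cost(32))  # the 32-core price is exactly 32x
```

No instance-size surcharges or per-family pricing tables to memorize.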
Are you running 25k without using Spot Fleets? Spot Fleets let you specify a value per instance type and then the total value you need ("value" can be CPU, memory, network, or whatever), and AWS will maintain the lowest-cost spot instances that fulfill your requirements.
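For reference, a capacity-weighted Spot Fleet request along those lines might be shaped like this (the field names follow the EC2 `RequestSpotFleet` API, but the instance types, weights, and role ARN are made-up examples):

```python
# Sketch of a Spot Fleet request config as you'd pass it to boto3's
# ec2.request_spot_fleet(SpotFleetRequestConfig=...). All concrete
# values below are hypothetical.
spot_fleet_config = {
    "IamFleetRole": "arn:aws:iam::123456789012:role/fleet-role",  # placeholder
    "TargetCapacity": 1000,            # total "value" needed, e.g. vCPUs
    "AllocationStrategy": "lowestPrice",
    "LaunchSpecifications": [
        # Weight each type by how much capacity it contributes.
        {"InstanceType": "c4.large",  "WeightedCapacity": 2},  # 2 vCPUs
        {"InstanceType": "c4.xlarge", "WeightedCapacity": 4},  # 4 vCPUs
    ],
}
```

With weights set to vCPU counts, the fleet fills 1000 vCPUs from whichever listed types are cheapest per unit of capacity at the moment.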
(work at Google Cloud)
Is 25k the number of servers, or the cost of the servers?
To be more specific, we redeploy servers from images instead of replacing just the code. The majority of servers (99.9%) are preemptible (meaning they will be deleted within 24 hours), and thus we get a huge discount. Also, all our communication goes through Pub/Sub, so the servers don't talk to each other directly.
Thanks to all these optimizations, we run only what we need at any given time, which keeps the cost low. It's still much larger than anything I ever expected, but I keep trying to convince myself that this is a sign of success.
- we're a very small team (13 people). We would never be able to manage this kind of infrastructure ourselves
- we're growing incredibly fast. Just 6 months ago we were running on ~300 servers. We grew so quickly that we are now working with GCE's infrastructure planners to plan our resource needs
- even though we're not suffering from massive bursts, we don't grow gradually but rather in massive jumps. On bare metal it would take us months to grow this quickly; this way we can grow in days
- it is way more complicated to host in many geographic locations, especially in Asia. We need to be present in many parts of the world, and with GCE/AWS/Azure it's very easy and convenient
- we did try to work with some bare-metal providers (like OVH), but they are not able to deliver servers on the schedule we need. They require much longer lead times and, obviously, much longer commitments
- the pricing is actually not that different once you run on preemptible (spot) instances. Bare metal would give us a performance boost, but the freedom is worth every penny
As I mentioned earlier, we're running an immutable infrastructure. That means once we need to change something, we replace the whole server. Each server runs only a single service, which allows us to run smaller instances in large quantities.
We actually did run on SoftLayer. It was a nightmare. They consistently have an outage somewhere. We couldn't count on any instance staying up. The performance was better, but you can't treat the infrastructure as cattle, and that was a huge limitation for us.
I would imagine that rolling a container out to thousands of hosts may take a while.
What kind of load and software do you run? In my experience, dramatically scaling out increases load and latency variance and causes all kinds of problems.
When was your experience with SoftLayer? Care to elaborate? I've got some interests in them for future projects. I'd rather hear about the issues now :D
This is the benefit of running on GCP. We don't have to trouble ourselves with the headache of scaling either images or containers, thanks to the internal tools GCP offers.
We quit SoftLayer around February. We were running bare metal, and they went down so often that we essentially ended up keeping one super-large server for a very insignificant service. We never gave them much of a chance, so I may be too harsh.
I stopped thinking about bare metal a while ago. Thinking about the comparison with AWS, I see that your entire business relies on capabilities only available on Google Cloud (which, indirectly, is why I'm advising Google Cloud nowadays: it's easier for basic usage, and it unlocks a whole new set of extreme usage).
Things you couldn't have pulled off on AWS that easily: creating thousands of hosts, deploying images that fast, managed Kubernetes, Pub/Sub, worldwide images (AMIs), multi-region subnets.
It's likely that bare metal is lower cost than spot - it is for me.
It's likely that your issue is allocation of capital: you'd rather burn dollar bills on AWS than tie up capital pre-paying for 18 months of servers.
I'm happy your company is doing great.
I agree GCP is a great competitor to AWS.
I think you're being closed-minded about building out a 20-60 cabinet data center to offload some of your workload.
With a DC of over 20 racks, it's hard to end up paying even half of AWS's prices.
I understand why you would think so, because you don't know our setup. Just to give you an idea: we process 16PB of data every single month, all of it ingress traffic. If we paid for that traffic going out (egress), we'd be paying over $1M for the traffic alone. By keeping everything in a single place, it costs us literally nothing.
That said, I've tried. I reached out to many providers, like OVH and others. They just don't have the capacity we need. 20 servers wouldn't make a difference for us; we wanted to start with 500, but it would have taken them 9 months.
Your $1M estimate is 3X too high at 2c/GB.
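The arithmetic, for anyone checking (the 2c/GB bulk rate is the parent's assumption, not a quoted price):

```python
# Back-of-the-envelope check of the egress claim:
# 16 PB/month at an assumed $0.02/GB bulk rate.
GB_PER_PB = 1000 ** 2        # decimal units, as cloud billing uses
traffic_gb = 16 * GB_PER_PB  # 16 PB/month
cost = traffic_gb * 0.02     # dollars/month
print(f"${cost:,.0f}/month") # $320,000/month, roughly a third of $1M
```

So at that rate the egress bill would be about $320k/month, which is where the "3X too high" figure comes from.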
In two hours I'll save you $25k per month. In 6 months $300k/month.
Either you are growing and should be investing in cost efficiency, or you run a staid lifestyle business. Which is fine, but if you aren't growing, get off the expensive cloud.
I'd say it's time to forget about the infrastructure. Should optimize the software and rewrite in C++ :D
I'd worry about the time it takes to reconfigure these servers with Ansible/Salt, or the time it takes to kill/rebuild them from a system image.
On a side note, the 4th competitor is IBM SoftLayer. It's different enough from the top 3 (Amazon, Google, Azure) to be a thing of its own.
src: doh's HN profile.
Get 1-2 more people and move off the cloud. Maybe not for everything but to service your base traffic. You can manage it if you spend a few days learning how, in the same way you learned AWS/GCE.
I can't imagine why you guys would need 25,000 servers. Your Google bill at a minimum must be over $5 million a month. I could be wrong, but those numbers don't add up to me.
What instance type are you running?
As far as instances go, we have almost everything, ranging from many thousands on the micro side to hundreds on the largest side.
25,000 * $200 = 5 million a month.
It truly differs minute by minute.
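That's also why a flat $200/server estimate breaks down: with a fleet skewed toward tiny preemptible instances, the blended average can be far lower. A sketch with entirely made-up counts and prices:

```python
# Hypothetical fleet mix; none of these counts or monthly prices are
# real figures, they just illustrate how a blended average works.
fleet = [
    # (instance count, monthly price per instance in dollars)
    (20000, 5),    # tiny preemptible instances
    (4500, 30),    # mid-size preemptible instances
    (500, 400),    # large instances
]
total = sum(count * price for count, price in fleet)
servers = sum(count for count, _ in fleet)
avg = total / servers
print(f"${total:,}/month across {servers:,} servers, ${avg:.2f} average")
```

Under these placeholder numbers the blended cost is well under $0.5M/month, an order of magnitude below the $5M estimate.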
Last I checked, AWS has hourly billing and Google Cloud has sub-hour billing.
If they are really bursty, they are in trouble with AWS's pricing model.
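A toy model of why, using the billing granularity as it stood at the time (AWS rounded up to the full hour; GCE billed per minute with a 10-minute minimum). The hourly rate is a placeholder:

```python
import math

RATE = 0.10  # dollars per instance-hour, hypothetical

def aws_cost(minutes):
    # Billed in whole hours, rounded up.
    return math.ceil(minutes / 60) * RATE

def gce_cost(minutes):
    # Billed per minute, with a 10-minute minimum.
    return max(minutes, 10) / 60 * RATE

# A burst of 1000 instances for 5 minutes, repeated 12 times an hour:
print(12 * 1000 * aws_cost(5))  # ~1200: each 5-min burst bills a full hour
print(12 * 1000 * gce_cost(5))  # ~200: only the 10-min minimum each time
```

Same workload, roughly a 6x difference purely from billing granularity.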
> the number where it makes more economic sense to build out your own data center.
I would think that this number doesn't exist... unless you have already built datacenters for yourselves and have major internal expertise in it.
I am not sure what he is doing at 25k servers, but you can see that real-world math puts this in serious-money territory. Data centers can be built for less.
I wonder if the down voters on the hardware comments actually run services at scale.
You didn't see that coming, did you? :D
That gives you 25k CPUs + 83TB of memory. (Of course, that's just to give a figure; they probably don't use single-core instances.)
Forget everything you know about AWS; it's overpriced. Google is half the cost of AWS on average (down to one quarter in special cases).
I'm not sure what you mean by building your own datacenters (really, buy some land and build from scratch?). Just having a bunch of people in a few locations worldwide is going to be a multi-million-dollar project. And I'm not even talking about the building, power, cooling, servers, storage, and network. That's one hell of an undertaking.
We too are hockey sticking but not ready to go to metal .... yet.
Your reasons make sense. I would grow that team soon :)
The i2 comparisons are under "local SSD and scaling up". Google has local 400 GB SSDs that can be attached to any instance. That's a lot more flexible and a hell of a lot cheaper when you have specific needs.
I grew to envy you as I read. It looks like interesting work.
AWS, by contrast, has demonstrated fanatically helpful support (even at the Business level, the cheapest), fixing issues within days of reporting, and a willingness to maintain obsolete/deprecated services (like SimpleDB) long after I'm sure they'd wish everyone had migrated away.
I have problems with Google as a whole, but I have as much faith in their cloud as I have in AWS or Digital Ocean (we currently use all three).
The Kubernetes team is also highly engaged on Github (they also occasionally show up on the official Slack channels). The Go community suffers from a depressing abundance of hostility, intransigence and arrogance, and some of Google's projects (e.g. Protobuf, the Go project itself) reflect this, but I was delighted to find that the Kubernetes people are not like this at all. It's a very friendly, quality-focused community. I think they have to be since Kubernetes is still emerging tech that's craving adoption. Same goes for GCP -- Google doesn't hide the fact that they're aggressively courting customers to migrate.
Running anything on AWS or Google 3 years ago? That must have been hell.
However 1-3 years is a big gap in terms of capabilities and tooling.
If I go back far enough, there was a time without NAT instances, without VPC, without a region in the locations where I have customers. Those are deal-breaking changes. Some things we use today to manage our servers didn't exist, or didn't support a specific usage, just months ago.
I'm certainly biased because I've only been using AWS since last year. I see the things we do which are marked as "added MM-YY" and it would have been hell to try to do that 3 years ago.
That's why I advise to take old criticism about capabilities with a grain of salt.
I can find alternatives for other services, but I don't want to compromise on the choice of relational database.
Note: I understand there are third-party providers for PostgreSQL, but I'd rather have Google's.
With AWS now offering both plain-old PostgreSQL and souped-up-PostgreSQL-on-Aurora , whatever Google produces needs to be great in order to compete. However, I fear they'll initially come out with something that's on par with the current MySQL support in Cloud SQL, which is just a vanilla MySQL server behind a UI/API. (For example, Cloud SQL's read-replica stuff is reportedly just MySQL binlog replication.) Better than nothing, of course.
Cloud SQL also has some annoyances (such as not supporting private IPs and the need for the Cloud SQL Proxy ) that I hope they're working on.
TBH, I don't need Aurora-level performance (although it would be a nice option in the future), I'm just happy to have a vendor-managed PostgreSQL instance with granular price points.
C'mon Google! :)
 - https://www.elephantsql.com/
 - https://aiven.io/
1) Reserved Instances: I think the pricing model for this has become very outdated since the beginning of AWS, and it is definitely becoming cumbersome (and therefore scary) to use.
2) ELB + Traffic Spikes: I have tried (unsuccessfully) to pre-scale an ELB to prepare it for the traffic it was about to receive. I tried to pre-scale for this project 3 different times, in coordination with support and without them. I could not do it. Very frustrating.
I think these are all signs of extreme growth, and a strange organization of engineering units inside AWS. However, as OP described, we are much too heavily invested in AWS to consider an infrastructure shift at this point.
Under high traffic, will ELB fail?
How do you pre-scale the ELB?
GCE - GCP - Google Cloud - Google Compute Engine - Google Cloud Platform ???
I can see on that blog's analytics that people are looking for various terms which result in different articles and ranking.
This needs to be unified by Google. Personally, I think I'm gonna call everything "Google Cloud".
You can look at the official products page for all the names and descriptions: https://cloud.google.com/products/
Is Google Cloud support even acceptable? Google is known for poor or no support for most services.
And my father asked support for help with his Google Home routers with a very specific question, and support was absolutely phenomenal. No going through lame troubleshooting steps; it was clear we were connected with a well-qualified network engineer immediately.
With AWS we do have a direct line to a Technical Account Manager which we utilize for the rare P1 (Prod down) situations. That gets things moving quickly if it wasn't already moving quickly with Support.
On gold support with a P1 there is usually an engineer assigned to the ticket. On many occasions you're talking to the person fixing your problem.
GCE is not perfect, and neither is their support. But they do try, and even when things go sideways they take responsibility for it, which didn't happen much with AWS, at least in our case.
Also, with P1 issues on GCE, most of the time the engineer assigned to the case calls me. I've even spoken directly to the head of Google Cloud Platform via Hangouts.
Check out the postmortem for the BigQuery Streaming API outage . Relevant paragraph:
"Finally, we have received feedback that our communications during the outage left a lot to be desired. We agree with this feedback. While our engineering teams launched an all-hands-on-deck to resolve this issue within minutes of its detection, we did not adequately communicate both the level-of-effort and the steady progress of diagnosis, triage and restoration happening during the incident. We clearly erred in not communicating promptly, crisply and transparently to affected customers during this incident. We will be addressing our communications — for all Google Cloud systems, not just BigQuery — as part of a separate effort, which has already been launched."
(Work on Google Cloud and was on BigQuery team in the past)
Also, Google My Business support is also pretty good.
Also, the AWS premium support fee is negotiable for some customers from what I have heard. They don't like to negotiate down, though!
In the year 2016, among maybe a hundred tickets, there was only ONE case where they could actually change something (an ELB issue).
And even then, I'm not sure whether the fix was related to their changes or just an intermittent error that happened once. So their contribution to the one time something got fixed has yet to be proven.
> "Unfortunately, our infrastructure on AWS is working "
> "I learned recently that we are a profitable company, more so than I thought. Looking at the top 10 companies by revenue per employee, we’d be in the top 10."
For starters, most of the people here don't need to know anything about AWS. They just fill in a [sort of] spreadsheet with a goal-team-instancetype-count-zone, and they get servers up and running, fully provisioned, 5-15 minutes later. There are other lists for load balancers and security groups if they want to do fancy stuff.
That will put Google Cloud seriously ahead of the competition in terms of GPU computations.
Long term: Google has a better trajectory. IMO.
Note that AWS announced something similar at re:Invent 2016, but it ain't coming any time soon. Not sure about the status; is it even real anymore?
What's your source on per-month pricing for N-series?
It's not. It's utterly useless. After an entire year during which the support has never been of any help, my last action of 2016 was to call a meeting with everyone, subject line "we should cancel our support subscription with AWS".
It's clear that we (especially me) are way more qualified in all AWS offerings, from basics to special quirks, than they are. And they can't do anything that we can't do ourselves.
I think the support is useful when you first start out; they can answer a lot of general questions. You should subscribe to support the first year and see how it goes.
Note that some AWS managed services (e.g. RDS) can only be debugged by the support so you might be forced into support if you use these services. We don't.
Still, it really depends on what you're doing at which scale with AWS. I also found business support quite good, if you don't need such dedicated resources.
Regarding seeing issues, it's simply a matter of probability: the more you run on AWS, the more likely you are to hit problems sooner or later with one service or another.
My advice is to start with business support and see, based on your experience with it, whether you have additional requirements it doesn't meet. If so, you'll be better placed to judge whether premium support offers enough value for the money.
I don't need it for anything professional, and it's quite terrible for just some amateur hosting, plus the immense fees if you somehow manage to get decent traffic together.
Once my reserved instances run out I'll probably either check out GCE or DO, either seems to be a better option, though GCE seems to be more expensive.
Anyway, the console in AWS is a mess, and I'm quite sure I leaked my entire IAM settings to the internet because some switch somewhere isn't set right.
Since everything recommends setting up IAM users, you'll have to set up the permissions, a procedure I enjoyed about as much as having my fingernails slowly removed with a glowing red iron.
Calculating any sort of sustained cost is a pain in the backplane if the total doesn't exceed three digits a month.
And lastly, the login process is probably the biggest pain I've encountered across many, many providers. There are at least 4 login forms I've discovered, 2 of which I have to use. One of those always asks for a captcha of such low quality that a brain-damaged AI running on my calculator could solve it. And that's not mentioning never knowing whether the 2FA setup was correct or blew up somewhere, because getting any feedback from the UI is plainly impossible.
TL;DR Don't use AWS, anything else is better.
I see complaints all the time about how complicated the networking, IAM, etc. are. But it's far simpler to deal with VPCs than to buy and hop onto a bunch of F5s/Brocades that tend to require their own network engineers on staff.
The issue always seems to be that someone tries to move a company onto AWS but lacks the experience of actually running infrastructure to that degree. If you know change management, you can figure out CloudFormation templates; if you don't, you're likely completely lost and rolling out instances by hand. If you don't know any network engineering, you're likely going to have issues with load balancers and VPCs.
You can really see the experience difference in people when you work on multiple AWS infrastructures. And that's the huge benefit to it. You can practically roll out whatever sort of infrastructure your business requires.
If you're experienced enough you can basically do every single thing in AWS/GCP without having to hire ancillary staff (network engs, etc).
I personally use CloudFormation and create EVERYTHING with it: load balancers, VPCs, instances, Kubernetes, etc. But yeah, if you haven't touched CF before, it can be like walking into a spider web of confusion. Believe me, though, I'd rather do that all over again than take over someone's hardware that has no APIs, no dashboards, no centralization. I spent years in datacenters where everything was done manually. I wouldn't ever want to go back now that I've learned to treat infrastructure as code.
But hey, if you want to run vSphere servers, manually configure instances and databases, and skip any automation via AWS, you can do that as well.
That's why it's incredible.
But for running a few hobbyist/amateur servers on it, it's absolutely horrifyingly complicated, especially when I'm not really a networking engineer, I like tinkering in the backend but not to that degree.
GCP is also far simpler than AWS, if you haven't tried it. The way they do server auth via Google accounts (using gcloud) is pretty awesome and simple. You can make project-level SSH keys that are automatically placed onto every server, and you can block specific servers from receiving them, all through the UI.
I do greatly miss AWS's breadth of service offerings, though.
Disagree. You definitely need that staff.
Just because it's a point-and-click UI (or script-and-execute Terraform) instead of physical cables doesn't mean you don't need highly skilled network people to design and configure it.
Try Digital Ocean. That should be easier for you.
Here's an article from the same blog to help you choose a cloud provider: https://thehftguy.com/2016/06/08/choosing-a-cloud-provider-a...
What helped me immensely was taking a job where they had the whole stack on AWS, and I had the chance to learn by doing.
Secondly, I signed up for this class on Udemy and that taught me the details of setting up a legit infrastructure.
I agree it's challenging at first, but once you understand what's going on it's fucking awesome. You can configure every little thing and I have my system decently optimized to cost next to nothing.
Bandwidth, CPU and local disk performance, reliability -- all on par with AWS based on my experience. Of course, DO only has VMs -- they don't have things like EBS (though some data centers now have attachable storage), any of the add-on services like S3, or ability to tweak performance by IOPS etc.
DO is a bare-bones VM provider, and it is very good for what it is. It's not a toy.
The developer experience when you just want a box on the internet to ssh into is pretty frustrating. DO is fantastic for this use case.
1) We quickly bumped into project limits just doing some tests, and the fact that you have to wait until the billing cycle to reset the counter was quite jarring (I presume there's a way to increase it).
2) Better tooling for S3 than for Google Cloud Storage: non-technical members of our team need to work with files, and there are many nice third-party tools for S3.
2) GCS has an S3 compatible API so you can use your S3 tooling: https://cloud.google.com/storage/docs/interoperability
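For instance, an s3cmd `.s3cfg` can be pointed at GCS using an HMAC key pair generated under the project's interoperability settings (the key values below are placeholders):

```ini
[default]
# HMAC credentials from the GCS interoperability settings (placeholders)
access_key = GOOG1EXAMPLEACCESSKEY
secret_key = example-secret-key
# Point the S3 tooling at Google's endpoint instead of AWS
host_base = storage.googleapis.com
host_bucket = %(bucket)s.storage.googleapis.com
```

The same endpoint swap works for other S3-compatible tools that let you override the host.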
I'll check out the GCS XML compatibility with the current tooling we use; it looks promising.
Google Cloud storage has the same API for all its tiers. So to go from Multi-Region to Nearline, you just change the bucket designation. One API, one service, one interface and set of tools.
(work on Google Cloud)
It's not a terrible system; AWS does the same thing. You just need to be aware of it when you begin new projects and deal with it before you get rolling.
I use gcloud compute copy-files all the time and it's extremely simple. I haven't used AWS much this year, so I'm not sure how far along aws-cli or third parties have come. I do generally prefer S3; it's far more featureful than GCS right now, and its UI blows GCS's out of the water. Your file revision numbers, etc. are directly accessible from the UI, things like that.
As for storage, as a developer I'm okay with command line tools and APIs, but there are some tools out there for non-developers where they can just drag and drop files without caring who the cloud provider is. However, it looks like it may be XML API compatible, so existing tools may still work.
The time I opened a new account on AWS to do some disaster recovery, I had to send Amazon ~30 tickets to increase limits and wait for a week :D
But each level down you go is another level of expertise you have to employ someone for, and a loss of flexibility.
The strength of the cloud has never been cost, it's always been flexibility, the ability to scale up 1000 servers in 5 minutes without any preparation or management.
Over the years, I've had plenty enough of ovh/kimsufi/hetzner/etc.
For dedicated, I favor online.net and their alternative brand scaleway. Else AWS. I haven't played with GCE yet.
Just wondering what kind of work he's doing that took him from AWS straight to GCE without stopping to think about Heroku.
Maybe you've hit the nail on the head: he's writing to disk a lot for some reason?