I've done pretty extensive work in all three major cloud providers. If you were to ask me which one I'd use for a net new project, it would be GCP -- no question. Nearly all of the services I've used have been great, with a feeling that they were purposefully engineered (BigQuery, GKE, GCE, Cloud Build, Cloud Run, Firebase, GCR, Dataflow, PubSub, Dataproc, Cloud SQL, the list goes on and on...). Not to mention almost every service has a Cloud API, which really goes a long way towards eliminating the firewall and helps you embrace the Zero Trust/BeyondCorp model. And BigQuery. I can't express enough how amazing BigQuery is. If you're not using GCP, it's worth going multi-cloud for BigQuery alone.
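To make the BigQuery point concrete, here's a minimal sketch using the Python client (google-cloud-bigquery) against one of the public sample datasets. It assumes application-default credentials are already configured, and is just an illustration, not anyone's production code:

    from google.cloud import bigquery

    client = bigquery.Client()  # project and credentials come from the environment

    query = """
        SELECT word, SUM(word_count) AS total
        FROM `bigquery-public-data.samples.shakespeare`
        GROUP BY word
        ORDER BY total DESC
        LIMIT 10
    """

    # No clusters to size or warm up -- you submit SQL and pay per bytes scanned.
    for row in client.query(query).result():
        print(row.word, row.total)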
But there is something to be said for AWS. Their SDKs are complete and predictable, their APIs are very fast and consistent, and AWS IAM, while having a steep learning curve, never leaves you guessing about what your principals have access to. For me, the real challenge with AWS has been introducing multiple AWS accounts. Governance just flat out sucks when you begin to scale past a handful of accounts (but it is getting better).
Azure, on the other hand, has terrible consistency issues between their APIs, their SDKs are awful, and it just feels like the entire product is an extension of the MCP System Administrator persona of old, where it's expected that someone's job will be sitting in front of a UI and clicking around to get things done (the whole blade thing with their portal has to be one of the worst user experiences I've ever seen). However, I do like their Logic Apps, and Azure Policy with auto-remediation (when it works as advertised -- ref API consistency and how long it takes for things to propagate through their system) has tons of potential. But they still have a ways to go before I'd consider it for my workloads.
I love the idea of GCP but I can't shake the fear that one colleague, using the company Gmail account, posts something somewhere on the internet that Google's morality-du-jour considers unacceptable, and the Google AI Killbot disables our entire account, GCP included, ruining the business, with nobody to call, nothing to do, except tweet and post on HN and hope someone at Google listens.
I don't mean this as flamebait hyperbole; this is truly the single thing that keeps me from moving our business to GCP because, like you say, that BigQuery thing tastes sweet.
How do you deal with this? Is there any sort of guarantee with GCP that I missed where they promise not to do this?
I've been using GCP since 2014 to run cocalc.com, and at some points in the past people have used it to launch attacks or mine bitcoin (we make that much more difficult now). Google did temporarily block or suspend our resources, but the experience was nothing like "nobody to call, nothing to do". Instead, Google contacts you immediately, and you message back and forth with real people who have the power to instantly fix things. In any case, in my experience the reality of being a GCP customer is not the same as the fear, uncertainty and doubt that you have.
It still appalls me how it's the norm these days for providers to first suspend service, and then ask questions (to the point that you describe it as a positive experience in an HN comment). But I think most big providers do that, ie it's not unique to GCP. And your experience is way better than my impression of Google (incl their paid services) so great to hear.
I am not a lawyer, nor your lawyer, however the terms you're looking for are the Acceptable Use Policies for both Google Workspace (née GSuite) [1] and GCP [2].
Both Workspace and GCP offer support (start at cloud.google.com/support). The included Workspace support ("Standard Support") includes phone support and a "Four hour SLO for P1 Support cases".
So if one of your employees did somehow get flagged for violating the Acceptable Use Policy, there is phone support included that would let you resolve this. You can pay more for higher levels of support with shorter response times, dedicated representatives, and so on.
Edit to add: if you're really concerned (and some folks are, I get that), I've seen some organizations make a separate domain for production. I don't love the ergonomics of switching accounts like that, but it's also not the worst thing I've seen people do.
Our service has depended on GCP/AppEngine for the last 8+ years. Our business depends on it and Google has proven to be a reliable partner. We pay ~$400/month for a support package and have always been able to get someone on the phone.
The only time we've ever really needed it was when one of our customer's (satellite) IPs was once flagged incorrectly as originating from Cuba and blocked because of sanctions. We reached an engineer via phone support and they were able to get the Google team responsible for their Geo IP database to correct the entry.
We've also had support engineers based locally call us to check in periodically, and I doubt we are in the top 10% of their customers by spend.
Definitely would recommend GCP without reservation.
We can easily get a Google person on the phone when we need to, so I wouldn't be terribly concerned about this scenario since we have a relationship, a contact route, and (possibly) some kind of contractual accountability.
Echoing this. GCP support is responsive and fairly effective. It’s a bit expensive, but that is a different matter.
This comes up on HN periodically, and I think folks have very mistaken assumptions about GCP based on Google’s reputation for poor or non-existent support on free consumer-facing services like gmail; GCP and Gsuite are very much serious enterprise services.
I can't find the links back, but there have been stories on HN about paid GCP accounts being blocked because of actions taken in connected GSuite accounts (or GMail? ie the mail was free but the GCP was paid? Can't recall) that Google's automation deemed malicious.
Surely the consumer stories are much more prevalent, but to my memory it's not only been that.
I’m sure someone has been taken down in this way on GCP, and if you’re on AWS you could get taken down like this too (see Parler). It’s really hard to quantify the risk here but my priors are that if you are a normal business that is not doing anything illegal then the risk is vanishingly small, and that GCP is not worse than AWS.
I think that building on a cloud platform (or other SaaS like an Oracle DB) and getting your license cost increased is a much bigger business risk.
Either way building your system to be easy to lift and shift to another provider (or on prem) has merit to hedge against these risks, but it also slows you down.
> Either way building your system to be easy to lift and shift to another provider (or on prem) has merit to hedge against these risks, but it also slows you down.
Not necessarily. Kubernetes is a great way to hedge against that. You can write fully cloud-native autoscaling apps that have minimal dependencies on the hosting environment.
I agree, and chose k8s for this reason, but it’s definitely more work to run your own message queue vs. using SQS etc, so I don’t think it solves all of the friction here.
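One common way to contain that friction is to put a thin seam between your code and the managed queue, so an SQS/PubSub/RabbitMQ swap only touches one module. A rough sketch (the class and method names here are hypothetical, not any particular project's API):

    from abc import ABC, abstractmethod


    class MessageQueue(ABC):
        """Thin interface between application code and the hosting provider."""

        @abstractmethod
        def publish(self, topic: str, payload: bytes) -> None: ...


    class InMemoryQueue(MessageQueue):
        """Stand-in used for tests and local development."""

        def __init__(self) -> None:
            self.messages = {}  # topic -> list of payloads

        def publish(self, topic: str, payload: bytes) -> None:
            self.messages.setdefault(topic, []).append(payload)


    # An SQS- or Pub/Sub-backed class would implement the same interface,
    # so application code never imports boto3 or google-cloud-pubsub directly.
    queue: MessageQueue = InMemoryQueue()
    queue.publish("orders", b'{"id": 42}')

It doesn't remove the operational differences, but it keeps the blast radius of a provider change small.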
Agreed, I've lost both a Gmail account and a YT account to Google's "acceptable bug rate" coupled with nearly fully automated support.
In both cases, the problem arose not from my behavior but from their merging, splitting and changing their product offerings. As dependent as I am on Google personally, I couldn't in good conscience tie a business to their whims if there were any reasonable alternative.
In terms of search and video as marketing channels, they've pretty much got a monopoly. In cloud services, there's more choice.
We don't use GCP a lot, but just a note: it's possible to log in to GCP using your own SSO rather than Gmail addresses. I guess that would limit the risk you are worried about.
You shouldn't share one single company account across the team. Everybody should have their own account, of course on the company domain, and use that one.
Advantages
1. You know who does what.
2. If Google bans an account the others keep working.
3. Permissions per employee, because not everyone needs to do everything.
4. If an account is compromised, you ban the account.
5. When an employee leaves, you ban the account instead of changing the password.
6. N people sharing a common password on an account nobody has particular responsibility for, ouch.
> You shouldn't share one single company account across the team
No one said anything about using a single account.
> Everybody should have their own account
This is moot if your company uses GSuite. The concern is that if a GSuite account is linked to a GCP account, suspicious activity on Google Apps can result in interruption to GCP (and vice versa).
I definitely use a separate google account for all of this. The support for it is pretty good and I don't see why one would ever use a personal account.
I would still be more worried about Google canning the service you're using.
The "shout at it on HN until it gets fixed" customer service is awful, but I don't think they're going to hurt their bottom line like that - and besides, if you have (for the sake of argument) someone spouting wrongthink on the company account you're really playing with fire in the first place.
There is no proof or example whatsoever that GCP is tied to other Google products in this way. I've never heard of or seen a GCP account being shut down because of a YouTube / Gmail / Google account, etc...
Azure UI, Flows, almost everything feels half-baked. They have this weird segue-like UX in some places where trying to go back to a previous screen feels like moving a big picture around; it annoys me so much. The best thing we found with Azure was Azure AD - it works as advertised, and the SSO integration was smooth.
I still prefer AWS services like S3 because they're predictable and I've yet to run into an issue which AWS support wasn't able to solve.
I completely second this.
We publish VM solutions in all three marketplaces and find GCP the best both as a partner and as a customer. The VMs spin up in seconds, cost is the lowest for most of the common services, and the web console is fast and does not have the clutter of AWS or Azure.
Ever since they got Thomas Kurian as the new CEO, there is more focus on marketing and partnering outside of the US & Europe. The recent deal with ARAMCO to set up a data center in KSA reflects this.
From a customer-base perspective, our experience is that GCP has a more developer- and startup-focused customer base, Azure has more of an enterprise crowd, whereas AWS is a good mix of everyone.
I can kinda agree on GCP with one exception: Dataflow. I have no idea what the future holds for it.
It is a managed Apache Beam service and is very useful for certain scenarios (like "hey, we have a million incoming PubSub messages that we need to transform into a dozen different branching streams of data"). It looks like even BigQuery actually transforms SQL statements into a bunch of Dataflow jobs.
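For anyone who hasn't used it, the branching scenario above looks roughly like this in the Beam Python SDK (a sketch only -- the topic names are placeholders and it assumes apache-beam[gcp] is installed):

    import json

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(streaming=True)  # runner/project would be set via flags

    with beam.Pipeline(options=options) as p:
        events = (
            p
            | "Read" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
            | "Parse" >> beam.Map(json.loads)
        )

        # One incoming stream fanned out into independent branches.
        clicks = events | "Clicks" >> beam.Filter(lambda e: e.get("type") == "click")
        errors = events | "Errors" >> beam.Filter(lambda e: e.get("type") == "error")

        (clicks
         | "EncodeClicks" >> beam.Map(lambda e: json.dumps(e).encode())
         | "WriteClicks" >> beam.io.WriteToPubSub(topic="projects/my-project/topics/clicks"))
        (errors
         | "EncodeErrors" >> beam.Map(lambda e: json.dumps(e).encode())
         | "WriteErrors" >> beam.io.WriteToPubSub(topic="projects/my-project/topics/errors"))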
But...
- Minor version updates to Google Dataflow SDK once every couple of months while deprecating most other minor versions? Check.
- No visible contributions to Apache Beam itself? Check. In 2021 I still don't know if I can use any Java versions beyond Java 8 to develop for and run in Dataflow. And Google is arguably one of the biggest users of Apache Beam, and definitely the user with the largest pile of money to throw at the problem.
- They've recently sent out a questionnaire about Dataflow to some of their customers that feels like a "hey, we're definitely considering deprecating this, we're gauging the potential impact"
Disclosure: I work on Google Cloud (and with the Dataflow folks on occasion).
Sorry if you're getting mixed messages. Dataflow is here to stay. Google, Spotify, Twitter, and many other large customers heavily depend on it. Twitter moved their entire ad revenue pipeline to it [1] last year.
A quick perusal of https://github.com/apache/beam/commits/master shows decent Googler activity. Can you highlight where you were looking for "no visible contributions"? (Maybe we do a bad job of being visible?)
Interesting comment, definitely want to hear more. I have concerns about Beam/Dataflow, but they seem different to yours.
The dataflow product seems to run older versions of Apache Beam just fine, so minor deprecations don’t seem like an issue in practice, but maybe I’m mistaken.
“No visible contributions to Apache BEAM itself”. I don’t think this is true, I’m a contributor and somewhat active on the developer mailing list, it seems the majority of the contributions these days come from google employees.
If the questionnaire you’re referring to was the paid Apache Beam survey, I participated and definitely didn’t get the impression that they were considering deprecating the service. It was much more focused on how they can improve docs, examples, and help developers use it.
Now, I think the project is too ambitious even for google. They don’t need to support Spark/Dataflow/Flink on three different languages (java/python/go) imo. I’m also frustrated with some of the bugs that slip through.
The fact that there is no back pressure support for a streaming framework is such a google thing to do: why worry about back pressure if you can just tell another team to increase their throughput for downstream sinks? /s
Dataflow does seem to be one of GCP’s most popular services (spotify and twitter are both users now) so I would guess it is here to stay in some form.
And GCP's Director of Outbound Product Management saying things like, "I’ve been thinking about the cool ways @GCPcloud reinvented public cloud... Sometimes you have to leave the past behind, and we haven’t hesitated to re:tire services and features. HIYOOOOO! We’re getting better though :)" doesn't really inspire any confidence either.
I mainly work with Azure, but have worked with AWS too, and not yet with GCP. I'm a big fan of Azure - the services on offer and the tooling.
Azure's UI is unusual, polarising even - you either love it or hate it, but I'm more on the "love it" side. When it was first released some years ago, it had some perf problems, but it got over those long ago. In use, I find it to be a really good UX - not sure I ever recall swearing at it because it was in my way :) I like that it has themes (e.g. dark mode), and for the most part, I also find it far more consistent than the AWS UI, which often feels like it's been cobbled together by several different teams. I also find the AWS UI feels pretty "clunky" and dated. And in terms of cost management, Azure is way more transparent and useful than AWS.
Regarding SDKs, not sure if you were really thinking of a single service in particular or more generally, but assuming the latter, I mostly disagree about Azure's SDKs. Some of the SDKs have had too much churn for my liking, and the docs don't always keep pace with those changes. In general though, I find them really good.
I'm not a huge fan of ARM templates for anything but the simplest deployments, but they get the job done. Bicep[0] shows MS are improving things, and there are a couple of nice OSS alternatives now, like Farmer[1].
I'm not a big fan of PowerShell in any form, but it's cross-platform, and I use it on occasion for Azure automation, and again it gets the job done without issues.
Azure CLI, I really like - it's OSS, cross-platform, and covers pretty much all services. Extensions/plugins mean that even new services are covered quickly. The syntax and commands are very consistent (there are a few exceptions, of course), and being able to output results in either JSON or CSV is great for parsing from the likes of Bash scripts. Also like the way you can filter and project output, without the need for something like jq.
Don't recall a single instance of something I could do in the UI but not via automation; aside from monitoring, cost reporting and quickly deploying throw-away stuff during dev, I don't feel compelled to use the UI.
My main gripe with GCP is that most projects don't have any examples or support for how to use them on GCP, compared to the ever-present AWS examples. That's always a bit of a bummer.
Working with multiple AWS accounts is definitely a pain, but it definitely should get better in the future given that it's a super important use case for internal services.
Internally we have best practices that dictate to split services into their own accounts, with one account per stage (i.e. beta, gamma, prod).
I'd also like to mention that CDK makes working with cross-account/cross-stack resources a lot easier.
GCP's potential for product abandonment and their terrible customer support are their primary weaknesses. And then there are the issues well described in [1].
I’ve used GCP and I quite like it. However I would not recommend betting big on it. GCP as an extension of google has the same problems, they are way too big to care about small companies. They change their minds every now and then without upfront notice and kill things or deprecate them. Support is hard to reach.
In any case, think for your own self what you care about. There are tradeoffs of each cloud providers. I’ve found AWS provided me with a human touch and cares about my problems. Never quite used Azure (their portal is too complex).
In my mind, Google's gonna be Google. They value automation and algorithms over human touch. It's in their DNA. It's what they hire for.
I can only agree - Azure is a mess in a lot of ways when you try to automate it as an IaaS.
They seem to have some sort of pattern issues in regards to UI and async API operations, and it bites you often enough to drive you crazy when automating.
I've had API calls fail because an Azure portal tab was open in my browser, obviously locking or otherwise interfering with API operations until I close what looks like a completely idle tab.
Also... dog slow provisioning compared to the "other two".
I also have significant experience in all 3 and I couldn’t disagree more. GCP support & documentation alone is a dramatic reason to avoid GCP. GCS CLI utilities are supposed to be S3 API-compatible, but they are not. GCP keyfile-based access is a horrid anti-pattern, but the rules for human IAM user vs service account vs impersonation are not uniform across all products (eg, if you need developers to have ad hoc non-console access to both GCE VMs and Dataproc clusters, you have to manage two very different approaches to identity-based access).
GCP’s region-level SLA are poor for most products and over a window of a few years, they don’t actually meet their region SLAs. GCP has all kinds of nasty legalese about “beta” features that aren’t supported by the SLAs, and if you use them, you forfeit your right to claim credits after SLA-violating outages. For GKE in particular, Google’s rules basically exclude every aspect of Kubernetes you need to actually use it in production, which is a blatant attempt to force users into Anthos.
In machine learning in particular, GCP has horrible offerings that are massively over-priced and/or are 100% hype-driven (TPUs are a good example, but also things like running Kubeflow or Feast).
Google Cloud Functions and Google Cloud Run have such severe limitations to resource sizing, especially memory, that they are irrelevant, whereas by comparison Fargate is excellent for ML workloads. There really is no equivalent in GCP, since Cloud Run can’t handle large Docker containers needing high RAM, so you’ll just be rerouted to GKE where because of the SLA legalese you can’t actually use any of the tools you want. And then on top of this, configuring any type of hybrid open internet / internal data center service with Cloud Functions or Cloud Run is miserable. You need a full Networking team just solely to manage Cloud Function or Cloud Run service access, it is absolutely nowhere close to self-service for normal backend teams.
GCP is a miserable, miserable choice for cloud vendor. It is typically chosen solely due to being cheap in the short term and allowing bulk deals on GSuite, Ads credits and other deal sweeteners. It’s so stupid to choose GCP for these short-term deals, because Google absolutely will lock you in and raise prices for their garbage tools and poor customer service.
For my money both Azure and AWS are still lightyears ahead of GCP and I would gladly pay a premium to use either just to avoid GCP.
I think you are probably confused. There are many beta features in Kubernetes and they are enabled by default. For a long time, very critical features could remain beta for years, such as all of Ingress and all of CronJob.
One of the big issues with the GKE SLA is that your organization must only consume Kubernetes from the Stable channel, but given the large amount of enabled-by-default, critical beta features, many (probably most) production deployments of Kubernetes rely on non-stable channels intentionally for the sake of critical beta features that have been de facto production features. In my company for example, we must run a slightly older version of Kubernetes and upgrades are very slow, so we are way behind the stable channel with no way to upgrade fast unless the stable channel supports all sorts of enabled-by-default beta features and older versions. So we could never run a hybrid cloud with GKE, it would violate the SLA restrictions from first principles. This has created a nasty, painful rift in my org between the on-prem Kubernetes and the cloud (essentially useless) GKE Kubernetes.
Beyond this there are other features that are critical and uncovered, like multi-region ingress for example. We operate some very very large data ingestion services for customers and we absolutely need a higher SLA uptime on it than what a single region offers in GCP. So we have to operate multi-region ingress, but all of the non-Anthos solutions are no longer supported by GCP, and void out the individual region SLAs. It’s madness.
On top of all this, Google does not actually publish clear lists of features that are or are not covered by the SLA. The way it’s worded relies solely on the Kubernetes alpha / beta / GA channels, but nothing actually ties Google legally to that. They can arbitrarily define the SLA terms to mean whatever they want it to mean at any time. While you likely can’t avoid a cloud provider with that freedom, at least you could expect them to actually publish and document it.
> I’ve also always been able to assign permissions to a user, group, or service account. When have you not been able to do so?
Please check my comment again. I mentioned specific examples (user-based, not service-account-based, workflows in Dataproc, for example) where it's not possible in GCP - the product itself disallows it. It's not an issue of me or you or anyone being able to create IAM policy or service accounts. It's that different products within GCP fundamentally disallow some auth workflows (like Dataproc cluster creation being disallowed for user-based auth workflows), which then forces you to manage multiple different auth-flow patterns even within the same user workflow (for example, user-based auth flows for GCE VMs but impersonating service accounts for the exact same steps for a Dataproc cluster), leading to much more overhead, more inscrutable errors, and more round trips through security approval. The issue is the poorness of the product design, not some general inability for a user to figure out a service account.
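Concretely, the service-account impersonation flow in google-auth looks like this (a sketch only; the service-account email is a placeholder, and it assumes google-auth plus google-cloud-storage are installed) -- the point is that for GCE you can stay on plain user credentials, while for Dataproc you're pushed into this second pattern:

    import google.auth
    from google.auth import impersonated_credentials
    from google.cloud import storage

    # Start from whatever the developer already has (gcloud user credentials,
    # application-default credentials, etc.).
    source_credentials, project = google.auth.default()

    target_credentials = impersonated_credentials.Credentials(
        source_credentials=source_credentials,
        target_principal="dataproc-runner@my-project.iam.gserviceaccount.com",
        target_scopes=["https://www.googleapis.com/auth/cloud-platform"],
        lifetime=3600,  # seconds; short-lived, no exported JSON keyfile
    )

    # Any client library can then act as the service account.
    client = storage.Client(project=project, credentials=target_credentials)
    for bucket in client.list_buckets():
        print(bucket.name)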
I was assuming you meant GKE beta features, not k8s beta, since the latter is ridiculous, can you point me to the bit of the SLA you’re referring to here?
Not a GCP user myself, but I've read a lot of stories about GCP deprecating products at a very fast rate (as Google does with the general public).
I don't know if I'd be okay with this? Once a project is done I wouldn't want to do any re-engineering that's not strictly needed (and outside general maintenance).
You're understandably thinking of Google's track record with many of its other products that were discontinued. However, most of those products were "free" to the user, and as such Google had no real obligation to its customers.
GCP is a very different kind of product. Customers pay for it directly, and often their business depends on it. It's covered by all sorts of contractual agreements, including service level agreements. It's backed by a great deal of physical hardware around the world that Google wouldn't otherwise need. Its revenue is growing fast, currently over $12 billion/year. That's revenue from customers paying it directly. In Q2 2020 it had 43% growth, even though Alphabet had its first quarterly revenue drop.
It's not the kind of thing they're going to dump on a whim, and if they did decide to exit that space, it would most likely be by letting another company acquire it, since it would be hugely expensive to just drop it.
Tell that to folks building on the paid Maps APIs.
I don't think AWS has ever really screwed anyone. My SimpleDB kept running long after I'd forgotten it even used SimpleDB! I can't even remember a price increase, much less a 10X gotcha one with no grandfathering! Ouch!
Google will kill your account, change pricing, etc. much more commonly than AWS. The nightmare of Google+ and being forced to jam a profile onto everything - they couldn't give two sh**s about user stability / happiness on some things if the command comes down to blow the house up, which it seems to periodically.
Google Maps API is still not a comparable kind of business. And Google+ is completely irrelevant.
> I don't think AWS has ever really screwed anyone.
The apples-to-apples comparison is not to AWS, but to Amazon. There are plenty of complaints against Amazon for ways they've screwed shoppers, book authors, publishers, etc.
If you want to compare Google Cloud to AWS, you won't have nearly as much to complain about.
It's perfectly reasonable to say you don't want to deal with or depend on Google because they've screwed you in the past. But the claim that there's a serious risk that GCP will suddenly be discontinued is just silly.
>But when we looked at performance on the 16-core benchmark, none of the winning machines ran Intel processors. In fact, the AWS custom-built Graviton2 Processor, which uses a 64-bit ARM architecture, edged out GCP and Azure’s winning machines, both of which ran AMD processors.
Google/GCP plays catch-up, and while GCP has been coming up to the level of AWS on services, AWS is already getting a new CPU platform, and Google doesn't have ARM, leaving GCP several years behind - this will become pretty clear in the coming years. With ARM beating x86 on performance and power, AWS can undercut GCP on price - typical AMZN/Bezos - or use the extra margin to expand and finance even more R&D for AWS. I also think that Apple with the M1 will become a huge cloud player, at least for various mobile apps, etc., i.e. encroaching on GCP's market, while not necessarily on the enterprise market of AWS or Azure. That way GCP will be squeezed from all sides.
>Its revenue is growing fast, currently over $12 billion/year.
The tough question here for GCP is whether that revenue is supporting the R&D to match the R&D of AWS and Azure (and probably Apple in the coming years), especially the investment required to get their own ARM. If I remember correctly, Google gave GCP till 2023 to become a leader comparable to AWS, or something like this. I think they won't reach the goal, and as a result Google will start counting money when it comes to GCP and will just drop GCP from the top priorities, and GCP will just linger, lagging behind more and more.
You're addressing a different question from what was being discussed. Even if GCP "lingers lagging behind more and more," that gives users plenty of opportunity to move, they won't suddenly find themselves without a provider.
Given Google's stated commitment to their cloud business (see e.g. https://www.cnbc.com/2020/10/29/google-wants-to-show-how-ser... ), even if their efforts fail it's likely to take many years before anything like that is remotely an issue for customers. "But we might need to migrate in 5-10 years" is not a very persuasive argument for most businesses, for good reasons. Note that GCP is already 13 years old.
I do advise people to keep an eye on their dependencies on a particular provider, since moving providers can be needed for all sorts of reasons, not just the death of a provider. Luckily there are all sorts of tools and approaches to doing this. I've been involved in migrations between all the major providers, and concern about the provider's future wasn't an issue in any of those cases.
>gives users plenty of opportunity to move, they won't suddenly find themselves without a provider.
in some sense what I describe is worse than a sudden death of a provider (i.e. with something like a one-year notice, which is pretty sudden on enterprise time scales), as the customers will be "slowly boiled like that frog", not feeling urgency to move at any given time while the platform falls behind and into more neglect.
>concern about the provider's future
it is a concern about the provider's ability and, most importantly, willingness to support (i.e. to invest in) the state of the art of the platform in the years to come. There are no such doubts about AWS or Azure. It is unfortunate for GCP that ARM popped up these last couple of years - Google has to decide right now (and that is already pretty late) whether they are going to invest a bunch of billions in having [competitive] ARM in GCP. It is not just a matter of getting an ARM license and printing the chips; it is whole-stack optimization to get those "40% faster at 20% cheaper" numbers (as claimed by AMZN - and with such improvements, even native platforms of large slow BigCos like ours will probably move to support ARM, while x86 becomes more like PowerPC, "supported too"). I think Google management wouldn't risk venturing into ARM, at least not at the scale needed.
The big problem for me is trust. I don’t care what the feature set or performance is; I don’t trust Google enough to bet a business on it. And I’m not even worried about Google being malicious; I’m worried about them being mercurial and changing/removing things I need without warning.
There’s a difference between consumer stuff and enterprise stuff with contracts. Grown up services like GSuite, AppEngine, etc have been alive and well for many years and aren’t going anywhere.
It makes sense to do risk assessment and avoidance where there is value. General emotional stuff isn’t productive.
Building your entire business on AWS Lambda, for example, is a risk you need to understand. In the past, I worked on a team that chose to put a critical business process on an IBM POWER/AIX platform... in our case we went through the options, identified risks/opportunities for standardizing a long-lived process on a sole-source platform, and made a decision. It was a decision that served us for about a decade before we moved on, so it was very successful.
Granted, you offered a nuanced reply for someone building something on these clouds. However, when you're dealing with enterprises that are directly competing with the main cloud providers in other verticals, they are not being lazy or annoying when they avoid using their competitor's cloud infrastructure; they are being careful, and not without reason.
Some of our clients have no problems using these cloud providers. Others wouldn't go through them because that would leak information.
If I'm not mistaken, Google used to buy datacenters in stealth mode, under different companies created for that purpose, to avoid getting on Microsoft's radar and keeping how successful search was from them.
> Google used to buy datacenters in stealth mode, under different companies created for that purpose, to avoid getting on Microsoft's radar and keeping how successful search was from them.
I believe, Google for a long time (and still does?) thought their infrastructure was the secret sauce, and that might have impeded them from competing with AWS in the early years (despite having all the pieces in place already)?
There are apprehensions like these all over the place to protect their user base.
This matters to us because we're building our machine learning platform[0] and we want flexibility as opposed to using their machine learning products, because each considers themselves "the only cloud". Therefore, we're compelled to do "multi-cloud"[sigh], because we want to be able to train models on X, deploy on Y, and have data on Z.
It's funny to watch, though, as if I recall correctly, there was an article on one of these companies where saying "multi-cloud" was blasphemous.
Eh, even corporate GSuite changes from time to time. Google products and services are less stable than AWS. I'm not talking about uptime, necessarily, but for all of AWS's half-baked ills you know that half-baked service is going to be around forever.
I don’t think it’s a tired argument: Google is much more likely to cut bait than Amazon.
If my cloud provider said, “hey we released a half baked service and we’re deprecating it” at least that would give a solid reason to fix some obvious technical debt. Otherwise you may just be band-aiding technical debt for years.
Anecdotally, I’m thinking of ElasticSearch Service around 2017. We were pushing almost a terabyte an hour into ESS.
We ended up tacking on SearchGuard, ElastAlert, some SSO proxy, and about 3-4 other products, when what we wanted was X-Pack.
It took a lot of toil before we convinced the org to permit a migration off of ESS.
You might accept it, but I wouldn’t. Telling product owners that their timelines have been pushed because our cloud provider is removing something we depend on again is not a conversation I want to have.
Large enterprises are complicated beasts, and they value stability a lot. Even removing a single feature might cause dozens of teams to drop everything in order to go and fix the mess that someone else made. Why risk it? Especially if the alternative is someone who will wait to release a feature until it’s more than half baked and support it for a decade or more?
I think purging some of the lower quality services from a catalog of over 175 (AWS) services would be a net positive, because orgs wouldn't come along and build on top of a service that may not be as extensible as you need it two quarters from now.
I disagree. Perhaps AWS should be more selective in what they choose to launch, but once a service is launched you should have a very compelling reason to deprecate it. This creates a positive feedback loop: AWS is safe -- in the sense that if you build on top of their services you know that they will continue to be around.
It makes the AWS dashboard a bit more cluttered, but if you use one of AWS’ half-baked services you know it will be around for as long as you want. Maybe you outgrow it; that’s fine. You can opt to move off, but AWS won’t force your hand.
There’s a ton of value in that stability. Your MySQL server running on EC2 still works today if you haven’t migrated to RDS. And if you migrate to RDS, you can be confident that it will be around until well after you’ve moved to your next job.
Ironically, this is related to Golang’s strong 1.X backwards compatibility guarantee. Knowing that what works today will work tomorrow has tremendous value. You don’t have to wake up and migrate everything from vendor to modules. You can build on ECS today and have confidence ECS will be around tomorrow.
It’s human trust. It’s a basic thing in all business dealings. Google has built this reputation themselves over the years by constantly sun setting new products. I mean there is a website showcasing it https://killedbygoogle.com. Google support is known for being notoriously bad and unhelpful. They are the first targeted tech giant going through new antitrust suits and who knows the outcome of them. Why would I trust a business I’d hope could be around for 10 to N years when AWS/Azure are around? Yes, the same could be said for them but the basic human element of trust is on their side.
Why do you trust AWS when Amazon kills failed non-AWS products left and right? I even remember Bezos bragging about this being a core part of the company culture. Why do you trust Azure when Microsoft's graveyard of non-Azure products has long since overflowed?
For me, it's because Amazon killing their failed phone, their failed Yahoo Answers clone, their failed search engine, or their failed paypal clone says nothing about their commitment to their phenomenally successful $45 billion / year business with high margins and 29% growth.
Google Cloud is a $14 billion / year business growing at 45% / year. It is a smaller business than AWS, sure. But not by an order of magnitude. It's basically 3.5 years behind Amazon on the growth curve. Were you afraid in 2017 that AWS would be killed due to being too small a business?
>"Why do you trust AWS when Amazon kills failed non-AWS products left and right? I even remember Bezos bragging about this being a core part of the company culture."
I'd be really interested in this quote and its context. At AWS, things are always iterated upon and tested. But there's an incredible emphasis on two-way door decisions. Basically, don't make big decisions that can't be reversed if things go poorly. And once you've released something customer-facing (especially something as important as a new AWS service), you're locked in for the foreseeable future.
There are still CloudHSM v1 HSMs running out there in the wild. CloudHSM v2 was released in 2017. And CloudHSM v1 hasn't been available since at least 2019.
There are still EC2 instances running in EC2 classic--that is, EC2 before VPC was introduced.
> "Failure comes part and parcel with invention," Bezos wrote in his 2013 letter to shareholders. "It's not optional. We ... believe in failing early and iterating until we get it right." Three years later, he added, "Amazon is the best place in the world to fail."
Or:
> “As a company grows, everything needs to scale, including the size of your failed experiments. If the size of your failures isn’t growing, you’re not going to be inventing at a size that can actually move the needle,”
Now, I'm sure that doesn't apply to AWS, and your culture is totally different. You're an enterprise product with real contracts, real commitments, and making real money. You're not the Groupon clone, online pharmacy or whatever that grocery service *Amazon killed today* was.
But if we're accepting that Amazon's general trigger-happiness when it comes to failed consumer products doesn't extend to AWS, why does Google Cloud not get the same treatment?
I beg to differ. We've run our business on AppEngine for 8+ years. We've never been forced to migrate, and when we've chosen to upgrade to newer runtimes like Java8, the transition was smooth.
Upgrading to Java11 will indeed be a big change, but Java8, with memcache etc, is still very much supported.
I led a team that wrote an app on App Engine using Python 2. When Google upgraded to Python 3 they completely rewrote entire libraries, like the one for Datastore, so we were forced to either rewrite most of our app or stay on a deprecated platform that stopped getting new updates.
Google provides a list of App Engine features they've removed. [1] Beyond that there is also a somewhat undocumented phase of working-but-forgotten. Classic App Engine features like the datastore, memcache, Users API, Python 2, Go 1.11 etc go under this category. These are things that still work, but get no updates. Instead you get constant e-mails and other notifications about how you should redesign your app to work with the 2nd generation App Engine system. Which means Firestore (in datastore mode) instead of datastore. Memorystore instead of memcache. Your own solution instead of the Users API etc.
Yes, this. Thank you for explaining it better than I did. I lived this with a real production Python 2 app, and my take-away: be very careful building against proprietary systems like classic App Engine and all of its services.
No, Amazon hasn't. If a service is poorly-priced and/or seldom used, they'll rebuild the service from scratch to offer better pricing and features. See Macie as an example.
Most customers do not plan to repeatedly break their contract and make no effort to fix it. Parler's refusal to follow a legal agreement is only a concern to a small handful of other sites.
You’re welcome to read the legal filings yourself so you can make less uninformed comments in the future. Here’s an example of the messages AWS contacted Parler about in November:
The entire filing makes it clear that they were given considerable time and refused to moderate content which violated the terms of service which they had agreed to follow:
Maybe you should actually read your own links. They don't say what you think they say. Not to mention the source is Amazon. Of course Amazon thinks Amazon is in the right.
In a span of seven weeks, on a platform with 8 million users, Amazon found 100 examples of objectionable content.
How many examples of objectionable content do you think I can find on Reddit, Twitter, or Facebook right now?
It's not a question of whether Parler had offensive content on its site. It's a question of Amazon holding Parler to terms that they don't hold anyone else to, including Parler rivals. Parler had moderation tools, were moderating content, and had a clear Terms of Service that outlined the type of content that was not allowed.
Let us travel back in time to 2009. It's three years after Twitter was founded. How robust and effective do you think the Twitter moderation was back at that time?
Even today, in 2021, Twitter is absolutely FULL of illegal content and hate speech. When do you think Amazon will eject Twitter from AWS? Should be any minute now. Any minute.
> How many examples of objectionable content do you think I can find on Reddit, Twitter, or Facebook right now?
This is why I suggested reading the filings: Parler is not unique in having user-hosted content, but, unlike Twitter and Facebook, it refused to accept responsibility for managing it – those services are far from perfect, but they don't try to pretend volunteer moderation is enough, either.
If you read the filing note that the 100 examples were a representative sample and additional examples were provided.
The key point to look at is this:
“On January 8 and 9, AWS also spoke with Parler executives about its content moderation policies, processes, and tools, and emphasized that Parler’s current approach failed to address Parler’s duty to promptly identify and remove content that threatened or encouraged violence. Id. In response, Parler outlined additional, reactive steps that would rely almost exclusively on ‘volunteers.’”
That’s after having months to develop a serious plan, and after an insurrection involving many users of their service. Beyond the obvious violations of their contract, at that point AWS would reasonably worry that not enforcing their ToS could invite legal claims that their continued non-enforcement constituted support. The direct costs of that and indirect risk to other contracts – imagine, for example, Congress prohibiting federal IT procurements from any company connected to the coup attempt — are much greater than Parler’s rounding-error portion of AWS’ customer base.
I work for a huge international bank. We've had some of our non-production infra on GCP and it's been a very bad experience for us; mainly reliability-wise. We're moving away from it entirely.
I put a production AAA game on it (with horrible constraints for a cloud provider, like highly stateful connections and a hard requirement to honour fsync in the underlying hardware) - and we had a truly great experience.
Can you go into more detail? Not saying you’re wrong of course, but your experience is massively contra to my own, and like I said, we have production on there with little to no issues.
Sarcasm aside, at $COMPANY we have AWS reps integrated into our Slack, and they work hand in hand with our engineers quite often. Even for things as low down as “why is this query so slow on AuroraDB?” They even hop into war rooms for big events in case we need immediate assistance during high visibility outages.
It’s hard to overstate how important this level of close support is for an enterprise that is literally betting the farm on a cloud provider. The idea of counting on Google’s historical level of support is an absolute non-starter for us.
The role you describe is a technical account manager, and Google has hired many. I'm a Google Cloud partner and every project I've worked on has had multiple TAMs doing exactly what you describe: working hand in hand with customers, connecting the customer to product engineers, to support, navigating Black Friday and Christmas, etc...
To me, Google has always felt like the Stack Overflow of enterprise companies. Our questions to Google have been met with "You're trying to do it this way, but what you really want to do is this". Or worse, they'll send us a link to a document which we've already read ourselves and then just go silent.
They've never tried to deeply understand our use cases or business at all. Hopefully they turn this around but IMO, Google is absolutely horrible at B2B, they're just not currently set up to do it correctly.
I had this experience too around 2015-2017, but I've noticed a big difference between 2018 and now. They've hired a large number of people whose job is primarily focused on understanding the customer's business and use case. Most of these people have empathy in my experience.
In the large enterprise migrations I've worked on the past 2 years, I haven't heard a Google employee or partner say, "you're doing it wrong." Sometimes there is a response along the lines of, "we don't recommend doing it that way for the following specific reasons. For your consideration, here are alternatives we recommend for your use case. We've filed your use case as a feature request with product."
Additionally, I've seen issues which block a migration to GCP get escalated and fixed quickly. Issues which in ~2015 might have garnered a, "you're doing it wrong" response.
I've noticed a real shift in attitude and execution. There is genuine empathy and an attempt to understand how enterprise customers run their workloads in the cloud, multi-cloud and hybrid-cloud.
A while back, before AWS was so popular, their support on the lowest-tier developer accounts was ridiculously good. I'd messed something up that was out of their scope to fix (they could have just said there was nothing wrong with X). Instead, some AWS engineer, with my permission, jumped in on the application side to fix things up and walk me through the configs, etc.
What!! So I paid $50 for one month for support, and got a full on support expert who'd bill (at least in my world) $250+/hr. Honestly, the support cost was maybe even LOWER, and I'd just signed up for support to submit my ticket.
Google does not care. Your GSuite calendar is not accessible on Google Home devices etc. etc. despite TONS of requests from paying customers. That same calendar integrates easily with Alexa from Amazon. Huh? Someone is paying attention, and it's not Google.
I do like project based permission / approach in GCP. I like a lot of other things there too. But I've had stuff running for years on AWS without issue - so the trust with AWS keeping things going mostly is there.
this exactly. Microsoft has known it for years: to be successful with enterprise accounts, people matter. Account managers, sales/support engineers, direct access to product teams (sometimes). I've never worked with Amazon/AWS but from what I've heard anecdotally, they are of the same mindset. (It would be consistent with Bezos' customer service mantra so perhaps unsurprising).
I know of only 2 colleagues who've looked at GCP. Both are medium-large financial institutions. Both said they had very little human-to-human contact with Google representatives. In one case they chose AWS instead; the other is still evaluating GCP.
It's interesting that other posts in the thread are complimentary about the quality of the GCP products. I can believe that; Google has built its fair share of impressive technology. But I see no evidence of an enterprise supplier I could trust: one that would be there to help out when the world went belly up. Given that, it doesn't matter how good their products are.
I'd actually like to see GCP being successful. As well as bringing some competition and diversity to the market, it would give Google a more honest and above-board revenue stream than ads. I don't see that happening though, at least not without a change in leadership.
Long-time GCP user here, from a big conservative enterprise. GCP is a wonderful technical platform, but if I ran an oil or coal or tobacco or weapons company, or any other business susceptible to future boycotts and political activism, I would only adopt GCP in a cloud-neutral way. Google employees have disproportionate power over what gets done, versus for example AWS where customers seem to have a lot of power.
I know it's easy for me to say, but should anyone really be building their business to be critically dependent on some other business, regardless of whether it's a cloud provider or some other class of service? Especially when the balance of power clearly favours one of the parties, this seems like a bad idea if it can be avoided. I don't know how hard it is to develop architectures which run on several cloud services. If it is possible, shouldn't this be standard practice?
> I don't know how hard it is to develop architectures which run on several cloud services. If it is possible, shouldn't this be standard practice?
I think the answer to this question depends on a multitude of factors. For example, nowadays you can use container technologies to create images of your software, which will run on any bit of infrastructure that supports them, be it AWS, GCP, Azure, or even your on-prem servers with something like Rancher or even Docker Swarm running on it. However, creating software like that needs some special thought put into it, this site covering some of those aspects: https://12factor.net/
And while this will make your code more reproducible and migrating between different clouds more easy (or even running on multiple clouds simultaneously), it'll do so at the expense of making development slower and a bit harder for you. For some, this will be worth it, while for others it'll be less so. There are definitely valid criticisms of container technologies and some of the aspects they don't handle all that well yet (Kubernetes being overcomplicated for some projects, whereas Docker Swarm isn't "trendy" or in active development, then there's managing storage and needing to work with either network file systems, or distributed file systems, or even using bind mounts even though they're considered bad practices, then there's routing and the complexity of service meshes etc.).
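The config part of that is the piece that matters most for portability: keep every provider-specific endpoint in the environment rather than in code. A tiny sketch of the idea (the variable names are made up for illustration):

    import os


    class Settings:
        """All provider-specific endpoints come from the environment."""

        def __init__(self) -> None:
            # Defaults here are only for local development.
            self.database_url = os.environ.get("DATABASE_URL", "postgresql://localhost/dev")
            self.queue_endpoint = os.environ.get("QUEUE_ENDPOINT", "localhost:5672")
            self.object_store_bucket = os.environ.get("OBJECT_STORE_BUCKET", "dev-bucket")


    # The orchestrator (Kubernetes, ECS, Cloud Run, ...) injects these values,
    # so moving providers means changing deployment config, not application code.
    settings = Settings()
    print(settings.database_url)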
Of course, there are also people who really don't want to think about infrastructure that much and just want their apps to run in a semi-managed manner, like Heroku does, or perhaps just want to use one of the cloud vendors' managed database or messaging system offerings. Not everyone has a large amount of resources to invest in engineering and running infrastructure.
Can't put my finger on why, but this report comes off as almost pure marketing and not very substantive (say compared to the Backblaze reports). Maybe it's because I had to give an email (mailinator) with no option to opt-out of marketing emails to read it. Maybe it's because it seemed to try to paint all three as winners.
We run CRDB on baremetal. I'd love to see how that stacks up - but I guess their managed offering is a major money maker.
It's a shame because there's clearly a lot of effort put into it and I love the work they're doing (and how they do it).
I will say that, as a non-cloud-believer, I'm much happier dealing with Google Cloud than AWS. It's straightforward and doesn't require nearly as much vendor-specific knowledge. The console is more user-friendly, and things are usually cheaper and faster (but still so much more expensive and slower than just using a dedicated host).
> Maybe it’s because it seemed to try to paint all three as winners.
Compared to the rest of the market, AWS + Azure + GCP are all winners. They’re all huge, all growing, and all outpacing the growth of traditional non-“cloud” hosting providers by a long shot. They’re also all greatly beating out any other cloud providers who aren’t them, e.g. IBM Cloud (née SoftLayer).
They’re essentially dividing up the hosting market together, like any good cabal.
Compared to the growth all three of the big cloud providers are experiencing, the relative growth margins they use to claim that one of them is “the biggest” are basically noise.
To put it another way: I’d much rather invest in all three of them, than in just one of them.
Take the telecom companies (AT&T, Telefonica, Tata, China Unicom, and on and on and on and on (these companies have hundreds of data centers each)). Then take the wholesalers (Equinix, Digital Realty, etc.) - some of whom count some cloud vendors as customers. Then take the tens of thousands of colocation and dedicated providers that own their own data centers (PhoenixNap, HE, OVH, Hetzner, SoftLayer, ...). Then take the VPS providers (Linode, DO, Vultr, ...). Then take the shared hosting (GoDaddy, ...). Then take the government agencies and companies that have their own private data centers (e.g., banks).
Cloud vendors are growing, but it's still a very small part of the market. What they're really good at is sales and marketing (and making much better margins.)
I wouldn't touch this thing purely based on the name - CockroachDB. Yes, it's unfair, but they're absolutely asking for it. I am going to be dealing with this all day; I don't want to develop some kind of Pavlovian conditioning with the name where every time I think about databases, I think about cockroaches and all the disgusting things they do.
GCP is also far easier to use than the others. Everything from the organization/project hierarchy, g-suite user IAM permissions, simple primitives that can be assembled to your specifications, and web-based console access to everything makes it much simpler to deal with.
I don't quite understand why, but much of the tech industry seems to be sleeping on Cloud Spanner. Google quietly completely revolutionized managed+consistent+available+scalable RDBMS and very few people seem to have caught on yet. Maybe it's too much of a threat to job security?
I agree spanner and cockroach are the future. Most people don't need a database that can scale that well, and spanner is too expensive to use unless you really truly need it. Also, google and cockroach have not done enough marketing. Look at all the marketing mongodb did - they actually managed to convince people to use a database that would regularly lose data.
You can get 3 nodes for about $27k a year, and that'll handle 30k read QPS and in the neighborhood of 1-2k write QPS, iirc. It's a fraction of the cost of even a single dedicated engineer. And you'd probably need several engineers to achieve the same perf with open-source alternatives and keep it stable/upright. There's a large class of businesses for which this choice is a no-brainer.
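Back-of-envelope on those figures (all numbers are the estimates above plus an assumed engineer cost, not official pricing):

    annual_cluster_cost = 27_000   # USD/year for 3 nodes, per the figures above
    sustained_read_qps = 30_000
    engineer_cost = 150_000        # assumed fully loaded annual cost, purely illustrative

    print(f"~${annual_cluster_cost / sustained_read_qps:.2f} per sustained read QPS per year")
    print(f"cluster ~= {annual_cluster_cost / engineer_cost:.0%} of one engineer")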
Is Cloud Spanner the only managed option? Why would you compare to a dedicated engineer?
AWS RDS or GCP Cloud SQL or Azure Managed SQL or IBM Compose or Aiven or any number of other vendors offer managed databases with more features, much higher performance, and far less cost. Even CRDB has its own cloud offering that's cheaper and more flexible than Spanner.
Your answer suggests that you don't fully understand what problems Spanner is solving and why it's worth paying for that. CRDB is improving but I don't think it's consensus production-grade quite yet.
You compared it to open-sources databases and having a full-time engineer (while overlooking managed solutions). If those problems were unsolvable by other systems then what exactly were you comparing?
Spanner isn't magic, it's just a proprietary distributed relational database that is strongly consistent (CP) and relies on Google's network and infrastructure to make up for availability as much as possible. CRDB solves the same problems. It was founded by ex-googlers who are familiar with Spanner and the product is production-grade enough to achieve a multi-billion dollar valuation with impressive customers. Plenty of "new-sql" relational datastores have been created that compete and win on both features and cost, because the reality is that the vast majority of companies do not have a scaling problem; certainly not one that can only be solved by Spanner.
It does seem however like you appreciate why Spanner and CRDB are important steps forward. The prices will continue to come down. It will become a de facto sensible solution in production for any reasonably well-capitalized business in the future.
It's already there, but you refuse to accept it for some reason. The comment you linked to directly says "...the experience was positive. The hassle free HA is a huge peace of mind..."
If you are aware of anyone who has lost data using MongoDB we would love to hear about it. There have been many examples of constructed scenarios where a MongoDB cluster can be demonstrated to have lost data, and in every case we have fixed those bugs. We take data loss very seriously. If you know of such a data loss instance please make contact.
Spanner is great for Google, and other companies at the scale of Google. For the other 99% of companies, it's way too expensive, with too much lock-in (even SQL-based DML is a new addition), and has too many limitations.
There are dozens of managed database vendors now along with self-service deployments like Kubernetes operators that can scale standard databases as far as you need.
It’s way too expensive. Plus I wouldn’t want to deal with the limitations and latency that its consistency model brings unless I had truly Google-scale data, which I don’t.
I have done projects on all except Azure (mostly due to a Microsoft aversion). I hate their special names for everything. Reminds me of Starbucks.
Here is my take: GCP's tools (PubSub, Cloud SQL) are better. However, they don't support email and their docs are not as up to date and helpful as AWS's.
I think the main reason to select the big three is a) security (network, instance, user management), b) you don't get fired for selecting the big guys, c) some specialized tools (SES, S3, CDNs, GitHub).
I always feel that the time you invest to learn all the details of AWS you could have invested into Ansible, Docker, WireGuard, iptables, ZFS and Linux, and deploy a much more cost-effective solution on Hetzner (which I prefer over UpCloud, DO, Vultr). But you need to know what you are doing. Many companies prefer to trust a vendor instead of their employees.
The main reason I'm wary of recommending GCP is the support horror stories that keep coming up. I'm using it at work now since our massive Google Ad spend protects us from that. It's got some really good technologies, although there are various rough edges.
One thing that really irks me is GCP requiring me to talk to sales people (not support, sales) to have a relatively small quota increase. Why would they make it harder for me to give them money?
As someone who has worked on large-scale deployments on both AWS and GCP, I would always prefer AWS over GCP. While GCP products are IMO superior to similar AWS offerings their support (even premium tier) is total garbage compared to AWS.
GCP wins hands down when it comes to cloud governance and network design.
I think the two biggest weaknesses are:
- IAM - some resources have awkward relationships with IAM; although the GSuite integration is nice
- CloudSQL (vs RDS) - for businesses that need relational data stores, but aren’t at the Cloud Spanner scale, RDS blows CloudSQL away in features
CockroachDB was founded by ex-Google employees and is partly funded by Google Ventures, so I'd take this report with a pinch of salt. IMO GCP is good for PoC/personal projects due to their liberal free tier quotas, but I don't know about going big. Anyone with large scale experience on GCP?
Report author here. We have no bias towards nor stake in any of the three cloud providers. We partnered with all three clouds to develop the testing methodology and benchmark set. Our bias is towards providing as much information as possible to our own customers as they select their clouds and machines.
Yeah, there needs to be a disclaimer in these reports about this kind of stuff. Not that I doubt the technical accuracy of the claims, but it's just good form to make these kinds of notices.
There are a number of issues with this report. The AWS networking section is particularly problematic and in need of extensive disclaimers or changes to the test methodology.
On the throughput side, all this test does is demonstrate the documented[1] throughput limit for a single TCP connection. 10 Gbps if the two instances are in the same placement group and 5 Gbps otherwise. The reason some of the network-optimized instances were "slower" than the non-optimized ones was because it was simply a random draw of whether both instances in the test were physically close to each other.
If they wanted to do a proper throughput test they would have used placement groups and multiple connections/flows. If they felt like the single flow test case was important they should have mentioned that AWS has a specific limitation around this. Personally I don't think a single flow test case is particularly realistic.
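And setting that up isn't much work either. A minimal boto3 sketch of launching a pair of instances inside a cluster placement group (region, AMI, and instance type here are hypothetical placeholders):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")  # hypothetical region

    # Cluster placement groups keep the instances physically close, which is
    # what unlocks the higher intra-group single-flow limit and the full
    # aggregate bandwidth.
    ec2.create_placement_group(GroupName="net-bench", Strategy="cluster")

    ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # hypothetical AMI
        InstanceType="c5n.4xlarge",
        MinCount=2,
        MaxCount=2,
        Placement={"GroupName": "net-bench"},
    )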
The fact that the obvious discrepancy between their results and the documented (multi-flow) limits didn't cause them to dig deeper is enough to make me very skeptical of the purpose of this paper.
The latency results are also basically a random spread. It is essentially distribution of all the different latencies you might randomly get between two instances if you don't use placement groups. It says absolutely nothing about the networking capabilities of different instances used in each test.
Adding 20px of padding on all the slides seems to be the worst idea they've had this year. Slides that you can't read are always better than slides you can read /s
In the authors' defense, these don't appear to be slides at all, but rather a PDF version of what could be a printed document. However, whether the document would look good in print is also highly debatable.
The network throughput is eye-opening given how close most of the other benchmarks are. GCP's lowest performer is >50% higher than AWS's top performer and more than double Azure's best.
How much of this is affected just by "cloud weather"? It seems like network latency and some of these other measures would be influenced by adjacent workloads that happen to be running in your region, zone, facility, rack, or machine.
They're definitely thin on explaining the sample sizes. They say 54 configurations over "nearly 1,000" runs, which suggests 17 tests (918 runs) or 18 tests (972 runs) per configuration.
They run 4 different benchmarks (CPU, network, I/O, TPC-C), suggesting an average of around 4.25 or 4.5 per benchmark per configuration. If instead they ran 16 per configuration, that would be a nice round 4 per benchmark per configuration, but total runs would drop to 864, somewhat less than "nearly 1,000".
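Quick sanity check of those figures (pure arithmetic, using nothing from the report beyond the headline numbers):

    # Reverse-engineering the per-benchmark sample size from "54 configurations"
    # and "nearly 1,000" total runs, assuming the 4 benchmarks listed above.
    configurations = 54
    benchmarks = 4  # CPU, network, I/O, TPC-C

    for tests_per_config in (16, 17, 18):
        total_runs = configurations * tests_per_config
        per_benchmark = tests_per_config / benchmarks
        print(f"{tests_per_config} tests/config -> {total_runs} runs, "
              f"{per_benchmark:.2f} per benchmark")
    # 16 tests/config -> 864 runs, 4.00 per benchmark
    # 17 tests/config -> 918 runs, 4.25 per benchmark
    # 18 tests/config -> 972 runs, 4.50 per benchmark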
Assuming my figures are sound, we're looking at 4 to 5 samples per combination. Without some information about the within-group variation, though, it's difficult to distinguish what variation was due to "weather" and what was due to the platform.
I do however think that the effect size of some results is enough to make them useful (eg, network throughput). But all of the close results (eg single-core difference between AWS and Azure) are not very reliable, in my view.
Not really. The issue is that these cloud platforms aren't just "give me virtual machines" anymore. If you're just looking for VMs, there are loads of alternatives. The problem is that people are looking for so many value add services, not just VMs.
At some point, you might say, "I'd really like to put things into some sort of queue-like system." With AWS, you have SQS, you have hosted Kafka. With GCP you have their pub-sub. I'm sure Azure has something similar. With all three of them, you can get Confluent to run Kafka for you.
At some point, you might say, "we need an analytical system to run reports off." With AWS you might be able to use Athena or their hosted Spark stuff. With Google Cloud, there's BigQuery. Azure has data warehousing stuff. Third parties will often have their systems available on those platforms.
At some point, maybe you want some computer vision, or ML on text, or Redis/Memcached cluster for caching, or Functions as a Service, or global load balancing, or container system, etc.
What Amazon realized is that they could provide more than just machines and something different from "you pay us to manage the boxes, but they're still your boxes" that places like Rackspace might do. They realized that they could create an ecosystem of value-add services that would become self-reinforcing. S3 wasn't about Amazon installing and managing MogileFS on a few boxes for you; it was about making it so that you didn't need to care about the boxes at all. Athena isn't about getting them to install Presto on some boxes for you; it's about not having to care about the boxes.
And it becomes self-reinforcing. As people use AWS for these value-add services, third parties want to build on AWS to get access to you. Which means that you want to build on AWS to get access to those third parties. Likewise, the more you use AWS's value-add services, the more you become dependent on them.
Many providers offer an S3 competitor. Few go much beyond that and S3 is less interesting if it's only about storing and serving files. S3 becomes so interesting on AWS because it can feed so many things. It can become a target for Kinesis or a source for Athena or a storage engine for Aurora.
Digital Ocean wants to go in this direction, but it isn't easy and their prices are rising to be similar. For example, DO offers managed databases. However, their pricing pretty much mirrors Google's. 8GB RAM high-availability is $200 on DO and $197.25 on Google. DO offers "4 vCPU" while Google only includes 2 vCPU, but GCP's vCPUs are dedicated while DO's are shared so it's probably a valid comparison. Even DO's compute VMs are similarly priced to Google's sustained-use pricing ($49.34 for Google's 2 vCPU 8GB N2D vs $60 for DO's 2 vCPU 8GB General Purpose Droplet). Now, DO's come with 4TB of transfer - which is an important distinction. However, when we look at their new App Platform, they've stopped including a lot of transfer for free and they're charging $0.10/GB for outbound transfer after the limit.
Oracle is looking to become a cloud competitor, but they're behind. They include 10TB of outbound transfer for free (which is nice) and they only charge $0.0085/GB after that. However, I think there's a decent amount of distrust around Oracle in general and their offerings are a lot more limited than AWS/GCP/Azure.
The problem is that it takes years and a lot of engineers and capital to build up the breadth of services that people have come to use. If you're a startup, do you want to be managing your database, backup plan, scaling plan, etc.? You'd likely rather pay AWS/GCP/Azure more money and concentrate on your product. Do you want to install and manage Kafka and its dependencies like Zookeeper? Do you want to run your own load balancers? Your own caching cluster? Your own ML system? Your own video encoding system?
So, it really depends on what you're looking for. The sales pitch for AWS/GCP/Azure is that you won't be blocked from "anything" that might be useful for your startup. If you go with DO and realize that you need a high-throughput queue system, do you start evaluating RabbitMQ and Kafka, figure out which your engineers feel comfortable maintaining, figure out how you'll back it up, maintain uptime, etc.? Or would you just rather Confluent give you a managed system? Or use AWS's SQS or GCP's PubSub?
If you're just looking for VMs, a lot of places are offering that. However, often times they look cheaper until you start looking for dedicated CPUs. If you're getting 8 vCPU, but they're hugely overselling the boxes, you aren't really getting 8 vCPUs.
Transfer is the one area where the big three cloud providers seem to really overcharge.
The problem is that once Amazon saw the margins they were getting from AWS, they poured money into it to keep adding value that others (without as much capital) wouldn't be able to keep up with. Microsoft and Google have the kind of capital to pour in to compete in the area, but a place like Hetzner, OVH, DigitalOcean, Rackspace, etc. generally don't have the kind of capital to hire the engineers to create and manage the myriad of services that Amazon started churning out. When AWS was just EC2 and S3, there were loads of options that were roughly equivalent. Once Amazon started pouring money into AWS, it just became really hard for competitors without Amazon's huge scale to keep up. Google and Microsoft could buy their way in and I'm guessing Oracle will keep investing in their platform. However, it's hard for a new company to get involved. DigitalOcean has raised almost half a billion and I think they're a great company, but their offering is very limited by comparison.
So, there aren't really competitors featuring the kind of breadth of services. Oracle is probably the closest and DigitalOcean has some nice offerings, but it takes a lot of time and money to really become a competitor - and an amount of money that is hard to raise given that you have three incredibly well-funded and incredibly competent companies looking to make sure you don't enter the market.
I've been asking this question for a long time, and I can't arrive at a solution in the middle that makes sense anymore.
If we pull the ripcord on AWS, we are going 100% on-prem with our own servers. Bare metal is just watered down EC2 as far as we are concerned. You still get left holding the bag on a plethora of management duties, so you might as well just take ownership of the whole stack at that point. There are actual benefits to keeping your own hardware in a datacenter that you can physically access. There are also a shitload of downsides that need to be reviewed. Once you accept and adjust to this fate, things can move a lot more smoothly than the cloud salespeople would like to admit.
Ultimately, you are probably stuck in the cloud until you can hit that point of being able to dedicate 2+ full-time engineers to the task of managing your infrastructure. Multiple hats for developers works for cloud, but not so much for on-prem. Having to drive to the datacenter should be something that only a few people in your organization need to worry about. You could consider outsourcing this specific aspect for a "pseudo-cloud" experience, but that is more complicated and starts to defeat the original purpose.
> Ultimately, you are probably stuck in the cloud until you can hit that point of being able to dedicate 2+ full-time engineers to the task of managing your infrastructure
This is the exact reason people stay on AWS. You need to be big and decently profitable to afford 2+ full time people managing the server hardware purchases, maintenance, updates - same with network.
These people need to be 24/7 on-call unless you can somehow make your system so fault-tolerant, that it can handle up to 16 hours of downtime without intervention (breaks the second the engineer clocks out, needs to hold its own until they clock back in).
Even the two people are a stretch when you need to be 24/7, that's at least 3 shifts and even then you're one flu away from being short-staffed again.
So now we're at a point where you need to make enough profit to pay 4 competent server engineers' salaries just to get out of AWS. Add to that the one-time costs of buying your own servers and setting them up, plus colocation costs.
> Ultimately, you are probably stuck in the cloud until you can hit that point of being able to dedicate 2+ full-time engineers to the task of managing your infrastructure.
Um, exactly?
People think about cloud as outsourcing the hardware when you are actually outsourcing the system administration.
So, once you've grown those system administrators in-house (you can't hire them anymore, because system administration as a career path is dead thanks to the cloud vendors) by virtue of them troubleshooting the "cloud" so much, it's time to pull some things back to on-prem.
The other time is when you have one particular characteristic (network, storage, etc.) that the cloud providers are killing you on and you will save a whopping amount of money by changing. However, by that time, you presumably are successful enough that you have generated those system administrators anyway.
Any recommendations for a DO-like experience that supports Windows? As much as I'd like to use DO, it's a show stopper for people who want the simplicity of DO but have a requirement on Windows (hello, MSVC).
Because there’s a lot of experienced folks in here that can give better info than a marketing website. Does it work well enough in comparison is another angle.
Google, like AWS and Azure, is 'only pay for what you use'. Can anyone tell me if there is a way to put limits in? Or to choose a 20/50/100 dollar per month plan?
In GCP you can configure budgets for projects (groups of resources). Budgets can issue alerts when reaching certain percentages and reconfigure or shutdown resources when exceeded.
But as the warning says, that might delete data in storage or other resources you may want to keep paying for. For that case, you can execute a program that shuts down everything you can get rid of, but the cost for storage and everything you forgot will continue to be billed.
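If you really want a hard cap, the pattern Google documents is a Cloud Function subscribed to the budget's Pub/Sub topic that detaches the billing account once spend crosses the threshold. A rough sketch (project ID is hypothetical, field names are as I recall them from the budget notification format, and note this stops everything billable in the project):

    import base64
    import json

    from googleapiclient import discovery

    PROJECT_ID = "my-project"  # hypothetical project


    def stop_billing(event, context):
        # Budget alerts arrive as Pub/Sub messages; the payload carries the
        # current spend and the configured budget amount.
        payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
        if payload["costAmount"] <= payload["budgetAmount"]:
            return  # still under budget, nothing to do

        billing = discovery.build("cloudbilling", "v1", cache_discovery=False)
        # Setting an empty billing account name disables billing for the
        # project, which shuts down all billable services (storage included).
        billing.projects().updateBillingInfo(
            name=f"projects/{PROJECT_ID}",
            body={"billingAccountName": ""},
        ).execute()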
Not quite. Please note that there are various things you pay for, let's take a look at several of those:
- provisioned resources (instances, including the ones under relational databases or other stuff like that)
- usage of on-demand resources like AWS Lambda and API Gateway
- infrastructure such as load balancers (kind of mix of provisioned and pay-per-use)
- persistent storage (which is all over the place in terms of payment methods)
While you could shut down the first three to save some money, would you like to remove your data permanently?
So far all mechanisms that are available are mostly reactive (billing alarms etc.) rather than proactive (service quotas exist, although they are meant to shield from poor design rather than a typo in Terraform). There is clearly an incentive for cloud providers to favor the former, but it's not an easy problem anyway (they might mark your accounts for "training" or something like that - I think this would be reasonable).
I've seen many companies use "terminator" scripts that shut down/delete things that aren't tagged as "keep" or something like that, though only in non-production accounts. The budget alerts from the cloud providers can be useful too (AWS recently released cost anomaly alerts).
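Those terminator scripts are usually tiny. Something like this boto3 sketch, run on a schedule in non-prod accounts (the region and the "keep" tag convention are assumptions, not any particular company's setup):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")  # hypothetical region


    def stop_untagged_instances():
        # Stop every running instance that doesn't carry the agreed-upon "keep" tag.
        paginator = ec2.get_paginator("describe_instances")
        pages = paginator.paginate(
            Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
        )
        for page in pages:
            for reservation in page["Reservations"]:
                for instance in reservation["Instances"]:
                    tags = {t["Key"] for t in instance.get("Tags", [])}
                    if "keep" not in tags:
                        ec2.stop_instances(InstanceIds=[instance["InstanceId"]])


    if __name__ == "__main__":
        stop_untagged_instances()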
We're trying to tackle this problem with https://github.com/infracost/infracost from another angle for people who use Terraform: show a cost estimate in pull requests so the user understands what costs money, and roughly how much it costs. I hope that helps clarify "only pay for what you use" without trawling through cloud pricing pages.
I'm a bit dubious about the networking results they present. I did some quite extensive network performance testing last winter on those three CSPs, and even if single-queue TCP+GSO performance can behave like this, I find the claim 'GCP is 3x faster than AWS' a bit bold. It's definitely possible to get 50G of TCP traffic in AWS, and a lot of things are in the balance (MTU, number of queues, drivers...) that make this claim a bit weird to me.
One of the engineers who helped run benchmarks and compile the report here. It’s worth noting that for the majority of the machines we benchmarked on AWS, their tested bandwidth met the published AWS expectations. You may have noticed that some of the “network optimized” machines fell short of the published expectations though, and there’s an explanation in the report about how we tried to validate our findings.
As you point out, there are a variety of variables that could be tuned to eke out better performance here, and they could bring the two clouds closer. Our claim, of course, only applies to the benchmark configuration we tested with. That being said, with the size of machine we were restricting our testing to (16 vCPUs), no AWS machine claimed to offer more than 25G of throughput.
All the 16 vCPU "n" instances (m5n, c5n, r5n, etc) are capable of hitting the 25 Gbps limit easily. In your report, all of the AWS results are limited to either 5Gbps or 10Gbps, but this is because of a very specific test condition.
From my understanding of the test scenario, you are using a single TCP connection to run the throughput test, and hitting AWS' documented[1] throughput limit for a single flow: 10 Gbps if the two instances are in the same placement group and 5 Gbps otherwise. The reason some of the network-optimized instances were "slower" than the non-optimized ones is most likely a random draw of whether both instances in the test were physically close to each other (basically whether or not they are accidentally in placement groups).
To show the true throughput you would need to use multiple connections/flows, 5-10 would probably suffice. If the single flow test case was important then maybe you should have mentioned that AWS has a specific limitation around this. Personally I don't think a single flow test case is particularly realistic for a throughput test. Either way, how it is presented is pretty misleading.
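Concretely, the multi-flow case is just a -P flag away. A minimal sketch (assumes iperf3 is already running in server mode on the peer, and the address is a hypothetical placeholder):

    import subprocess

    SERVER_IP = "10.0.0.42"  # hypothetical peer running `iperf3 -s`


    def run_throughput_test(parallel_flows: int = 8, seconds: int = 30) -> str:
        # A single flow hits the per-flow cap (5 or 10 Gbps on AWS); several
        # parallel flows are needed to approach the instance's aggregate limit.
        result = subprocess.run(
            ["iperf3", "-c", SERVER_IP,
             "-P", str(parallel_flows),
             "-t", str(seconds),
             "-J"],  # JSON report with per-stream and summed throughput
            capture_output=True, text=True, check=True,
        )
        return result.stdout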
Right, this also aligns with our testing. The number of queues available seems to also be playing a role. But you're right, RSS should spread traffic quite nicely when using 5-10 flows.
To be fair, network latency across all the providers is reasonably similar. The network throughput was not even close though, with GCP winning by a huge margin.
I'd say both statements are fair and not mutually exclusive.
One of the engineers who helped benchmark and compile the Cloud Report. You're right in noting that the statements aren't mutually exclusive. Our overall takeaway was informed by each Cloud's performance on the individual benchmarks as shown on page 4 of the report. The results were quite close though, as each of the Clouds had specific benchmarks they did excellent in.
I evaluated AWS and GCP for my startup and found GCP to be more expensive. The horror stories I've read about Google's lack of customer support put me off too.
I have substantial concerns running my core infra on Google products: deprecation, inhuman support, the allegations of anticompetitive behaviour in the states’ antitrust lawsuit.