Google Cloud Platform – The Good, Bad, and Ugly (deps.co)
409 points by dantiberian 9 months ago | 195 comments

One thing I didn't see mentioned in this article is Firebase. It feels like a hidden gem lurking within the overall GCP offering, and may be overlooked by devs who're not doing mobile-specific work. For me, Firebase was the gateway drug that got me into GCP. I successfully built and hosted the backends for a couple of iOS apps using Firebase. The best parts were cloud functions and built-in sync (including offline/occasionally connected sync scenarios). Firebase Cloud functions allowed me to convert an AWS Elastic Beanstalk project into a 100% serverless architecture, which basically runs itself with no ongoing maintenance or scaling required. It took a significant amount of work off my shoulders. Once I had a positive experience with Firebase, I started poking deeper into the Cloud Console. I realized I could use App Engine, Compute Engine and various other services, which all made perfect sense as "upgrades" to the core Firebase project. I've now migrated the backends of 2 existing, popular iOS apps from AWS to Firebase+GCP. Kudos to the Google team for making Firebase perfect for my needs, and the overall Cloud platform.

Ya, we love Firebase! We use Firebase functions heavily and I've been very pleased. That tied with Redux works so smoothly. It's a different way of thinking about things, but I find it so much easier to maintain and reason about than a conventional REST service.

Couple things I wish it had:

1. Ability to `keepalive` certain functions. Cold start is real. We have gotten around this via app engine cron that triggers certain functions.

2. Schedule them on cron. We have gotten around this using app engine.
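The keep-warm workaround described above amounts to something pinging each function on a schedule. A minimal sketch of what such a scheduled job might run, assuming the functions are reachable over HTTPS; the helper name `keep_warm` and the injectable `fetch` parameter are hypothetical and not part of any Firebase or App Engine API:

```python
import urllib.request
from typing import Callable, Dict, Iterable, Optional


def keep_warm(urls: Iterable[str],
              fetch: Optional[Callable[[str], int]] = None) -> Dict[str, object]:
    """Ping each function URL once and record the HTTP status (or the error)."""
    if fetch is None:
        def fetch(url: str) -> int:
            # A plain GET is assumed to be enough to keep an instance alive.
            with urllib.request.urlopen(url, timeout=10) as resp:
                return resp.status

    results: Dict[str, object] = {}
    for url in urls:
        try:
            results[url] = fetch(url)
        except Exception as exc:  # one failing ping must not stop the others
            results[url] = exc
    return results
```

A cron job would call `keep_warm([...])` every few minutes with the real function URLs. Whether that is frequent enough to avoid cold starts depends on the platform's idle timeout, which is not stated in this thread.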

If curious, this is what we are building off of Firebase (and React native) https://itunes.apple.com/us/app/bunch-group-video-chat-games...

> One thing I didn't see mentioned in this article is Firebase.

Can confirm, I'm also building my startup off of Firebase.

Super easy to get up and running with. I was talking to a friend at Amazon about how I create serverless HTTP endpoints and he was impressed that getting started with it was a 10 minute affair from "never done backend programming" to "REST endpoint setup." Another half an hour got me full end to end auth working from my app to my endpoints through to Firebase DB. FWIW I haven't investigated what it'd take to do the same within Amazon's ecosystem, but from what I've gathered I'd have to understand a few more concepts than the simple "npm run deploy" that is the heart of Firebase's serverless offering.

Firebase's serverless stuff is dead simple. Wish the emulator was better though, I can't do encrypted endpoints on localhost, which means 90% of my API can't be tested locally. Having to deploy to the actual service seriously adds to my code-deploy-debug loop!

> I can't do encrypted endpoints on localhost, which means 90% of my API can't be tested locally. Having to deploy to the actual service seriously adds to my code-deploy-debug loop!

Try using ngrok (https://ngrok.com/). It can forward public HTTPS to localhost HTTP.

I run a service that fixes this problem. https://serveo.net/ It uses SSH port forwarding to proxy public HTTPS traffic to your local server.

Thank you for the kind words. I'll make sure the firebase teams receive it :)

Have you migrated anything off of Firebase? Or still kept the “base” projects and upgraded around it? I’m interested specifically in how expensive Firebase is for popular products

I haven't migrated off of Firebase but this is something I've thought a lot about as I've selected Firebase for many of my client projects.

As far as I can tell there is no open-source alternative with feature parity that makes it easy to migrate away from Firebase. I'd be curious to hear about how people have done this.

This is actually what I'm afraid of - specifically vendor lock-in and being stuck with a stack that has high operating cost. From some of the comments I've seen, Firebase has high operating costs. But I'm still planning on testing this out myself with a small project soon.

Is Parse on node comparable at all?

If you like Firebase, I'd like to give a shout-out to Sanity [1]. (Disclosure: I work part-time on the backend, but I'm not an employee and I don't work on the product itself; but call this a shameless plug if you will!)

Sanity is a headless CMS, but the API itself is like a souped-up Firebase and can be used on its own without adopting the Sanity UI. It's got many of the same features -- web API, change watching via WebSockets -- but adds support for joins, fine-grained patches (CRDT-like but not actually), transaction history APIs, fine-grained document-level permissions, and many other neat things. We are particularly proud of the query language, GROQ [2], which is a very powerful superset of JSON that allows you to express transformation pipelines over structured data.

Sanity is hosted on Google Cloud, too, so performance when used inside GCP should be great.

[1] https://sanity.io

[2] https://www.sanity.io/docs/data-store/how-queries-work

Thanks for the pointer to sanity. After a quick read over the docs, it seems a really impressive piece of software.

Yes. Firebase is so awesome! It really is fun to work with and works so well with Angular. This guy has some awesome tutorials to get started.


Yep. Firebase + GCP(app engine) is awesome.

I've heard that while Firebase is good, it has pretty high pricing once you have enough users and has a big vendor lock-in problem.

Since you were already at Amazon, I assume you compared Firebase to AWS Lambda? Could you tell us why you ended up preferring Firebase?

I'm not him but to me the big thing that Firebase has and AWS Lambda doesn't is Firebase - a database.

Firebase Functions === AWS Lambda but it's only one of 17 components that is Firebase.

Also, Authentication component is nice (easy way to implement accounts/login with Google/Facebook/Twitter/email).

> I'm not him but to me the big thing that Firebase has and AWS Lambda doesn't is Firebase - a database.

Only if you are pleased using a "NoSQL" solution.

Firebase is very easy to get started with for mobile apps. You have a nice console to create a project, and as a developer you deal with just its SDK, one CLI and a simple API. For something similar to Firebase in AWS, you need to use Lambda, API Gateway, DynamoDB and Cognito, which are separate products with a steep learning curve. And then you need to know IAM security to get started.

AWS has a Mobile Hub product which tries to make it easier to create mobile apps by automating some of these AWS products for you. So now you have to learn the awsmobile CLI tool on top of the core awscli tool. And if you develop for the web, they have the AWS Amplify JavaScript library on top of the basic JS SDK, a higher abstraction meant to make things simpler (or worse). If you're new to AWS, you often struggle with which library to learn and use.

And if you use AWS Amplify and read the documents, it is as if the library is built for React. I'm not sure why AWS is so married to one web stack. So much emphasis and priority is given to React and React Native in this library. They will give the excuse that React is what their customers want. They now have a mature React/React Native Amplify library, giving the impression that to use AWS it's better to use React/React Native. So they spend resources on AWS Amplify, and even moved one of the JS SDKs (cognito-identity-js) under this umbrella, hoping the world will standardise on JavaScript and React Native for iOS, Android and web.

So what happens if you're developing a mobile app using Unity, Xamarin or Flutter? No luck, as they only have SDKs for iOS, Android and web. Compare this with Firebase, which has SDKs for iOS, Android, web, Xamarin, Unity, Flutter and C++.

AWS has an opportunity in AppSync to make a product as easy to get started with as Firebase. What is required is a native mobile SDK for iOS, Android, web, Flutter, Unity and Xamarin that is not specific to AppSync and GraphQL, but also adds authentication, analytics, etc. APIs. Something like the Firebase SDK API. AWS is still stronger for enterprise usage. And you can’t use Firebase in China due to its use of Google Play services. But Firebase is definitely a weapon used by Google to draw customers into other Google Cloud offerings.

Thanks a lot for the thorough response. That was very helpful!

Firebase is great, but in my limited experience, as its database grows, it becomes clunky to sort and query, especially since pretty much everything is done client side.

Try Cloud Firestore, it has a lot of improvements with querying over the RTDB. I can also confirm sorting/querying is handled server-side. (Disclaimer: I'm the PM for it and use it for my own side-projects)

Would be even better if there was a graphql frontend like AWS AppSync. Helps structure the data a lot more easily.

Firebase is great, but it's covered under a separate agreement with Google, and its terms aren't as friendly to corporate users. For example, they still aren't encrypting tenant data at rest. Nothing is COPPA compliant.

I'm pretty sure that's wrong.

Also: https://cloud.google.com/firestore/docs/server-side-encrypti...

"Each Cloud Firestore object's data and metadata is encrypted under the 256-bit Advanced Encryption Standard, and each encryption key is itself encrypted with a regularly rotated set of master keys."

So... yup.

Wow, sounds like my info is out of date! This was true when we last evaluated it. I really do appreciate how fast Google is on this stuff.

> encrypting tenant data at rest

I know that this is required by various security certifications - but is there a reasonable threat model that it actually protects against?

The only one I see is someone physically stealing the hard disks out of the servers, which is impossible if you are using a trustworthy cloud datacenter instead of a server in your bedroom.

> The only one I see is someone physically stealing the hard disks out of the servers, which is impossible if you are using a trustworthy cloud datacenter instead of a server in your bedroom.

If you are using a public cloud data center for private data with regulations around authorized access, there is basically 100% chance that people without access authorization have physical access to the servers and their disks in a manner where there is no direct knowledge of the data owner of what occurs, which makes the threat of “unauthorized person gains physical access to the hard drive and steals data” greater, not less, than “a server in your bedroom” (or, more relevantly for corporate use cases, in a corporate data center for which you control physical security.)

If you have customer-managed keys like parts of Google Cloud, it keeps even Google from reading your data on disk.

A lot has changed in the last year or so, might be worth looking again. Most of the backend services are under GCP's terms of service, products like Cloud Firestore all do encryption at rest, etc.

It sounds like my info is out of date, which makes me very happy considering how much I love Firebase.

Although the author mentions they haven't had experience with AppEngine, it's the reason why I love Google Cloud SO much over anything else.

If you're a startup running something on Elixir (or even Rails), AppEngine's experience is hard to beat.

Not many people know:

* You can run multiple microservices on AppEngine under one application.

* Each of these can have many versions serving different percentages of traffic.

* There is an on-the-fly image resizing service, which means you don't need to run complex imagemagick setups on your machine; instead you can simply call your image with parameters appended, e.g. /my.jpg?size=120x80, and it will be resized on the fly. And it's pretty damn fast, too.

* If you're also using Cloud SQL, you can directly let your app talk to it without whitelisting IPs or even using the proxy, just using sockets.

* You can lock your app under what Google calls IAP, a login layer that allows only authorized users to access your app. If you're building a prototype for clients, it's a no-brainer and saves you from adding custom auth. (https://cloud.google.com/iap/docs/app-engine-quickstart)

* The development experience is superb, and has gotten better in the last 2 years.

For the record, I tried AWS ElasticBeanstalk recently and it has a lot of bugs, especially with the new interface changes, so I kept coming back to AppEngine. Seriously, if you're doing a startup in 2018, there is no reason to not use AppEngine.

App Engine is really awesome! There's a few caveats I would add if you're using it for Rails or Elixir which requires the Flex environment.

* No integration with Standard APIs

* Slow build / deploy times

* Minimum pricing of $40 per machine / month

Overall, I feel like the direction App Engine is headed makes it purely better than other PaaS services but if you're running a low traffic website that isn't standard environment (Java 7/8, Python 2.7, Node 8, PHP 5.5, Go 1.6/7/8/9), Heroku is probably still more economical and better developer ux.

I hope they add Ruby and Elixir to the standard environment. I don't think any other PaaS would compare.

I’ve been using AppEngine for years and really like it. Had an interesting billing issue recently.

I enabled a task queue to run a longer running task in the background but I didn’t realize there’s a default to retry failed tasks indefinitely. This started slowly increasing my front end instance hours (and my bill). A few hundred bucks later I figured it out and specified a retry attempts limit.
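The failure mode here is an unbounded retry loop. The actual fix is the queue's own retry-limit setting, as the commenter says; the sketch below only illustrates the general idea of capping attempts with backoff. The function name and parameters are hypothetical, not App Engine's API:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry_limit(task: Callable[[], T],
                         max_attempts: int = 5,
                         base_delay: float = 1.0,
                         sleep: Callable[[float], None] = time.sleep) -> T:
    """Run task, retrying with exponential backoff, and give up after max_attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # stop here instead of retrying (and billing) forever
            # Wait 1x, 2x, 4x, ... base_delay between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

With a cap in place, a permanently failing task surfaces as an error after a known number of attempts rather than quietly consuming instance hours.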

I had a similar issue and contacted them for a refund. It worked out fine and they refunded the full amount. It might be worth giving it a shot!

Would love to see Google embrace Elixir. The Erlang runtime doesn't play very nicely with Kubernetes but gives so many benefits. If they can get it so that you can set up a VPS of linked Erlang nodes, it would be everything I need.

The generous free tier in App Engine is also very nice - lets you run a lot of personal / experimental stuff for free.

What happens if the free quotas are reached? Will the app terminate until the free quotas are available again, or will the user be billed?

I looked long and hard at App Engine, but I wasn't able to use it as I needed to be able to accept file uploads that were larger than the upload limit on the App Engine load balancer. Otherwise it likely would have been my choice. I'm looking at using it for some internal applications soon though.

If you're trying file uploads that directly sit within your application, AppEngine (sort of) discourages that, the right approach would be to let your app upload the files to a CDN (like google cloud storage, if you're on GCP) and manage access through ACLs.

Yeah, the issue (which I didn't explain well) is that Deps is a Maven repository and needs to speak the Maven protocol. That makes it difficult to do the ACL signing for direct uploads, especially if I want to make it clean for users configuring it.

You can have the client upload to a cloud storage bucket, then notify the server that the file is ready. Upon receiving the notification, the server can start a background task that reads the file from the bucket and uploads it to the Maven repository.

It's a few extra steps for something that should be simpler, but it's workable.

The client in this case is Maven, SBT, Leiningen, etc., so I don't think I can get that kind of control over how Maven would upload the files.

I'm curious, could you use a 307 redirect to redirect the user agent to a signed upload URL?

The only way to do that on App Engine is to have the client ask the server for a temporary signed URL to a writable cloud storage resource, and then have the client upload to that URL (i.e. the only way is by sidestepping App Engine...).
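To make the "temporary signed URL" idea concrete, here is a toy sketch of the mechanism: the server signs a path and an expiry time with a secret, and the storage side accepts the upload only if the signature checks out and the URL has not expired. This is not Cloud Storage's actual signing scheme; the secret, host name and query parameter names are all made up for illustration. As far as I know, the real google-cloud-storage Python client exposes this as `Blob.generate_signed_url`.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical server-side key, never sent to clients


def _signature(path: str, expires: int) -> str:
    message = f"PUT\n{path}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def make_upload_url(path: str, ttl_seconds: int = 300,
                    now: Optional[float] = None) -> str:
    """Server side: issue a URL that allows a PUT to `path` until it expires."""
    current = time.time() if now is None else now
    expires = int(current + ttl_seconds)
    query = urlencode({"expires": expires, "sig": _signature(path, expires)})
    return f"https://storage.example.invalid{path}?{query}"


def is_valid_upload(url: str, now: Optional[float] = None) -> bool:
    """Storage side: accept only an unexpired URL with a matching signature."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    return hmac.compare_digest(sig, _signature(parsed.path, expires))
```

The point for the Maven case is that the client has to request this URL first and then upload to it, which is exactly the step a stock Maven client cannot be told to perform.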

I haven't tried this myself, but could you have your client(s) get around this via multipart upload?

I don't think you mean multipart upload - that's a single request that contains multiple files. You're talking about some kind of resumable uploads.

I mean 2+ concurrent uploads, which I've used before on S3, but not on GCP. [1]

[1] https://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview....

Appengine gets very, very expensive as you scale and still has a ton of limitations that only bite you once you have some load.

Hey there! Seth from Google here. Thank you for writing up this article and providing this valuable feedback - we really appreciate it.

I’m personally taking this feedback and making sure it’s shared with the relevant teams (both positive and negative).

On the DevRel team at Google, we often write friction logs (my colleague just authored a post about friction logs in detail: https://devrel.net/developer-experience/an-introduction-to-f...) to help our product and engineering teams identify and fix any rough or unexpected edges in our offerings. This feedback will be valuable to our teams the same way a friction log is valuable.

Thank you again for taking the time to write this up!

A lot of what he wrote is on point.

For example:

> I have reported bugs/clarifications against AWS docs and got prompt feedback and even requests for clarification from AWS team members. This has never happened for my comments submitted against Google’s documentation.

I always submit "feedback" (for the last 2 years), but the lack of changes in the docs make them seem to disappear into a black hole, so I stopped doing so.

Would be great if the docs for GCP were on GitHub (like the K8s, and now AWS, documentation) so we can at least see that docs are being worked on and iterated, and share/validate common issues and suggest fixes.

Minor things like this also create a lot of developer confusion in large user orgs:

> The API libraries and tools are spread across several GitHub organisations including GoogleCloudPlatform, Google, and possibly others, which can make it a little difficult sometimes to track down the definition of something.

> It would be easier if I didn’t need to think about this API access

Everything in the "ugly" is spot on.

Thank you for the feedback. While I can't share specific details, we are always looking for ways to improve the documentation, bug reporting, and feedback process.

I also totally agree that these little things matter, even for small organizations. An incorrect or incomplete piece of documentation can cost someone hours or days, not to mention the emotional cost. Rest assured that we are working to make the experience better.

Had to use the cloud storage python API yesterday to retrieve the sizes of many blobs I have stored there.

First time using said API. Creating the client was straightforward enough...oh look, there is a section of the API docs on Blobs, cool - create a blob...and there's a size property.

And it has no data. Scan through ALL of the docs on the page...ahh, there's a "reload()" method, and sure enough, calling it retrieves the data.

Though...why doesn't constructing the object fetch the properties? And if it doesn't, why don't you just write some sentences somewhere telling people the general flow they should expect for how to interact with the API?

From my experience with all other GCloud docs, the answer is - because Google couldn't care less if their customers can figure out how to use their products. You write some minimal API docs on functions, MAYBE give one example of doing one specific activity, and then it should be OBVIOUS how to do the rest.

Having used both - would never, ever choose to use GCP over AWS if given the choice - and the main reason, aside from any technical differences - is that one company seems to care about what the experience of using their product is, and the other just doesn't.

I have to say I 100% agree about the documentation.

Having used both GCP and AWS, although the core GCP offerings are fantastic and far superior to AWS, the GCP documentation is a disaster.

It’s totally inexcusable, and really disappointing because it has a huge effect on the perception of the platform among folks that are AWS users and take a quick look at GCP.

I can’t see any reason why the documentation issue shouldn’t be solvable in 3 months with enough money. Take a wheelbarrow down to the bank, fill it up, and put out an open offer to everyone on the AWS documentation and developer relations teams to double their salary and stock.

The core of GCP is so great, that it’s such a shame the final polish seems to be impossible to accomplish.

Thanks for sharing your experience. These are definitely not the feelings we want users to experience. If you remember, could you share the documentation page(s) you were on? I'll make sure they are updated to reflect the proper fields and steps to take.

We do care deeply about our users' experiences with both GCP and the GCP ecosystem. Feel free to send me specific areas where you've experienced pain in the past and I'll get them addressed (email is the same as my handle - at google - dot com)

Thanks for the response. See my response to bduerst in this same thread - it was that page.

Also to be clear - I don't think this particular example I gave is the end of the world/egregious - I just used it as representative of the general pattern I've seen the docs follow, where details are explained, but the "obvious" bits of simple usage are not discussed.

Seth, thanks for taking the feedback directly here, but...

Shouldn't we be able to make these requests given the built in feedback tools in the documentation, and actually expect some result? It's great that you're willing to address this specific issue, but it's frustrating to basically see plain text validation that the feedback forms are not attended to, and developers should be emailing individual team members problematic documentation pages if we're in the cohort exposed to your contact information.

Can you point to which docs you were using? On GitHub it says that if it reports 'none' then you need to load the resource.


Was reading that page. What it specifically says is:

> The size of the blob or None if the blob’s resource has not been loaded from the server.

That's pretty vague imho, because there is no high-level overview or tutorial to tell you that you need to explicitly trigger the load from the server. Case in point: the function you need to use is "reload()", and its documentation is "Reload properties from Cloud Storage." To me, the "re" part makes it a bit of a misnomer, as it implies there would have been some sort of initial load in the first place (there is not).

Imagine if the interface for going to a web page in chrome was: 1. Type the URL in the location bar. 2. Hit "refresh."

To conclude that:

> Google couldn't care less if their customers can figure out how to use their products.

That feels like a pretty strong leap to make from your complaint about using the term "reload" when "load" is better.

I can understand you are providing an example, and perhaps you feel the documentation has many such examples. From personal experience reading extensive documentation, I think you are being slightly unfair in this instance.

If I had to guess, given the speed of development of Google Cloud, it will take some time to get to the level of mature documentation of AWS who has had years of head start.

> That feels like a pretty strong leap to make from your complaint about using the term "reload" when "load" is better.

I never made that leap - you're saying that. I'm saying that GCP docs are all severely lacking in "how to" guides.

API references are not so helpful without an explanation of how you're intended to use them. So, the problem isn't that the function name is "reload()" instead of "load()" - it's that nowhere does it tell you the pattern is to instantiate the object you're interested in (in this case, a Blob), and then you must make a call to retrieve the data from the server before the data you're interested in will show up.
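The pattern being described (construct a local handle, then make an explicit call before any server-side properties are available) can be shown with a small stand-in. `LazyBlob` and `FAKE_SERVER` below are hypothetical and only mimic the behaviour discussed in this thread; they are not the google-cloud-storage classes.

```python
from typing import Dict, Optional

# Stand-in for the remote service: object name -> size in bytes.
FAKE_SERVER: Dict[str, int] = {"logs/a.txt": 1024, "logs/b.txt": 2048}


class LazyBlob:
    """Hypothetical client-side handle to a remote object."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Constructing the handle makes no request, so nothing is known yet.
        self.size: Optional[int] = None

    def reload(self) -> None:
        """Fetch metadata from the (fake) server; only now is `size` populated."""
        self.size = FAKE_SERVER[self.name]


blob = LazyBlob("logs/a.txt")
print(blob.size)  # None: the handle exists, but no metadata has been fetched
blob.reload()
print(blob.size)  # 1024
```

In the real Python client, to the best of my knowledge, the equivalent steps are `bucket.blob(name)` followed by `blob.reload()`, while `bucket.get_blob(name)` fetches the metadata in one call.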

Additionally, my commentary was made after now using many different GCP services - this was just something recently in mind.

> I can understand you are providing an example, and perhaps you feel the documentation has many such examples. From personal experience reading extensive documentation, I think you are being slightly unfair in this instance.

I kind of agree with the other poster. Getting up and running with .NET on flexible instances has been challenging to say the least. The datastore documentation does not give a good overview or introduction to how to interact with this system.

Just to echo the OP -- the Cloud SQL proxy has been a pain point for me for years. It makes deploying to Kubernetes a PITA, since I have to use a different configuration vs. my dev setup. Whitelisting RDS networks inside the VPC is a feature that's existed in AWS for many, many years.

Also agree that Stackdriver is pretty underwhelming, though it does have the perk of providing zero-configuration log capture in GKE.

Lots of problems though -

* UI is super-slow, yet it drops old records when you scroll, so scrolling up and down by more than 100 lines causes a reload of the logs I was just looking at. (Note that e.g. Kibana will do infinite scrolling to load more log entries, but doesn't drop old ones when you scroll, which is much more usable).

* It's really hard to work with large log traces; I'd love a quick way to "show raw logs for this time-period/search" so that I can either copy into a text editor or use browser search to explore. An example use-case here is "select all logs with this request correlator", which could include thousands of logs from a request, then jump around that log stream to look for interesting events. Re-running the search is painfully slow for exploring results like this.

* I've also had problems with alerts; the algorithm used for uptime checks is pretty naive, it just checks if the endpoint was down for the entire threshold period, which doesn't catch a flapping endpoint (e.g. down for 1s, up for 1s). Grafana can be configured to detect that the average latency spiked over the period, even if it's not fully down. This has led to us not being alerted for a production outage, so I've moved our critical alerting off Stackdriver. (Again, the promise here is great; if you could get alerting working as well as Grafana, it would be super-powerful to be able to alert on any synthetic metric generated from any log stream in the cluster. It's just half-baked right now.)
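The difference between the two alerting rules in the last bullet can be shown in a few lines. This is a sketch of the general idea only, not Stackdriver's or Grafana's actual algorithm; the function names and the 30% threshold are assumptions chosen for illustration.

```python
from typing import Sequence


def down_for_entire_window(checks: Sequence[bool]) -> bool:
    """Naive rule: alert only if every check in the window failed."""
    return len(checks) > 0 and not any(checks)


def failure_ratio_exceeded(checks: Sequence[bool], threshold: float = 0.3) -> bool:
    """Ratio rule: alert if the share of failed checks exceeds the threshold."""
    if not checks:
        return False
    failures = sum(1 for ok in checks if not ok)
    return failures / len(checks) > threshold


# A flapping endpoint: down for one check, up for the next.
flapping = [False, True] * 5
print(down_for_entire_window(flapping))  # False: the outage goes unnoticed
print(failure_ratio_exceeded(flapping))  # True: 50% failures exceeds 30%
```

Any rule that aggregates over the window (failure ratio, average latency) will catch flapping, while an "all checks failed" rule cannot.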

All that said, I've been happy with GCP in general, in particular GKE has made running a k8s cluster a lot easier.

I'm trying to reproduce the Cloud SQL Proxy issue in Cloud Shell that both you and the OP have mentioned, but I've been unable to reproduce it (and thus file a bug). I'm sure I'm missing a step. If you're willing, I would love if you could drop me a quick note (sethvargo at google dot com) with the steps you're taking, commands you're running, and output you're seeing. This will help me escalate it to the right team.

I'm also going to share your feedback about Stackdriver with our product team as well. Thank you for that!

Just to clarify with some more detail (for the HN record), the issue with Cloud SQL is that while you can whitelist an individual internal IP, you can't whitelist an internal network (like 10/8). This means you need a sidecar service of some sort to add the whitelist rules when a new instance/pod comes up that should be able to access the DB.

In AWS for a simple app I'd just whitelist 10/8 and have the DB open to all internal instances (with TLS for added security as required).
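The gap between per-IP and per-network whitelisting is easy to see with Python's standard `ipaddress` module. The helper `is_allowed` and the example addresses are made up for illustration; this says nothing about how Cloud SQL or RDS evaluate rules internally.

```python
import ipaddress
from typing import Iterable


def is_allowed(client_ip: str, allowed_networks: Iterable[str]) -> bool:
    """Return True if client_ip falls inside any of the allowed CIDR ranges."""
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(net) for net in allowed_networks)


# Per-IP rules: a new pod at 10.4.7.21 is rejected until someone adds it.
print(is_allowed("10.4.7.21", ["10.4.7.20/32"]))  # False
# One network rule (10/8) covers every internal address.
print(is_allowed("10.4.7.21", ["10.0.0.0/8"]))    # True
```

With only /32 entries available, something has to add a rule each time an instance or pod comes up, which is the sidecar described above.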

I'll follow up by email too, happy to provide detailed feedback on this sort of thing.

Thanks for taking the time to engage with the community on this stuff!

Better yet. In AWS I can whitelist a security group and all ec2 instances with that security group would get access to RDS.

This would be handy with tags in gcp.

We have similar experience with StackDriver Logging. It's felt unfinished since day one. A few problems come to mind:

(1) Some time ago I needed to get a month of logs (filtered by an expression) in order to compare the logged data against a database. The Console does not have any such option. You can set up "exporting", but that only starts replication from the moment you set it up; no historical data. You can use the APIs or "gcloud beta logging read", but the performance is terrible. I got maybe 200-500KB/sec. After keeping the command running continuously for 3 days (!), I still only had 20% of the log data.

These days, we have exporting set up to pipe all the logs into BigQuery and GCS, just to get around the crappiness. That gives some query capabilities, as well as the ability to quickly grab the original data for processing with other tools.

(2) We're on GKE, and none of the container's labels end up in log entries. Surely this is pretty key stuff. There seems to be no way to customize the payloads that GKE's Fluentd sends to StackDriver.

(3) There's no way to tail logs. The gcloud command doesn't have it, and the Console doesn't either. It feels like such a missed opportunity.

I think Google would be better off ditching StackDriver Logging entirely and instead let the user set up rules to send logs to different services that actually work. The default could be BigQuery. I don't understand the point of having a separate, proprietary query language for logs when you have SQL.

I've looked briefly at the other StackDriver stuff (most of which is still on stackdriver.com, always causing a confusing redirect and often re-authentication), and it's been similarly underwhelming. We use Prometheus with Grafana, and StackDriver seems pointless in comparison.

The biggest problem I had with Cloud SQL Proxy, or Cloud SQL in general, is that PostgreSQL is limited to 100 connections. Since I'm a small customer hosting many low-traffic projects, I want to share my instance as much as possible. However, running pgbouncer for a lot of apps is painful, since Cloud SQL does not allow querying the auth table (which pgbouncer uses if you don't add every database username/password yourself).

I think a lot of problems would be solved if there were a way to set up Cloud SQL Proxy with transaction pooling, especially when using k8s and having a pooling namespace (multiple load balancers for HA) that all other apps connect to.

Also, Cloud SQL and IPv6 is no fun (i.e. a lot of things in GCP are IPv4 only, which is sad).

Thank you, this is very helpful because these little things are the stuff that you can’t find in any documentation but end up biting you in production.

Small nitpick, this tutorial https://codelabs.developers.google.com/codelabs/firebase-web... mentions firebase-cli 3.3.0, which was deprecated for firebase-tools (current version is 3.19.3). Might be worth updating the docs to reflect latest packages.

From personal experience, the documentation for GCP and especially the client libraries is one of its weakest points. This does not seem to be getting fixed soon, so it might need more than just accepting feedback to fix it.

For example, see this 3 year old thread on the same topic that is still pertinent - https://news.ycombinator.com/item?id=9497576

This was one heck of a rundown of different GCP options and problems. I haven't used as many services as the OP, but definitely most things that did overlap pretty much echoed my own experience.

Specifically, support and billing.

We've used Gold support for a while, and our experience wasn't great. Pretty much the same as the OP described with Silver. Perhaps response times were slightly faster I imagine, but I would be surprised if quality was any different.

Regarding billing, we've been going through some kafkaesque cycle there trying to set invoiced billing. I've filled a form on their invoiced billing page, a sales contact talked to me on the phone and I explained everything. She then sent an email asking me to fill another form, which I did, and then got contacted by another billing team who basically sent me to the original page. I explained the situation and they just didn't really seem to care or understand what's going on. They then tried to arrange a call, but missed two schedules that I've set, called the wrong number, wrong country code... Not sure where this phone call would lead. Absolutely mind boggling experience there. Teams don't talk to each other. They send the customer around running in circles. It's amazing that it's happening with one of the most sophisticated companies in the world in the 21st century.

Support seems pathologically bad. Obviously, the support people aren't well trained in all the various products and modes of failure, but they make no effort, take no ownership, and don't really escalate to engineering staff without a lot of hassle.

Had a problem with kube-api going down. I have alerting setup to detect such a thing. Opened a ticket, "I noticed outages for kube-api" gave them the specific times for the alerts, asked for an RFO and got back a response, "can you send me a screenshot of your monitoring software." Following which the support person set the case to "customer pending".

This kind of thing happens on 100% of the cases I open, usually multiple times from multiple support staff.

I am in a process of choosing between AWS and GC, and the reason I'm leaning towards AWS is that this is Amazon's primary business. They care about it and they know how to deal with customers. Google on the other hand... They could decide tomorrow that GAE is no longer something they want to deal with, and shut it down. Not likely, I agree, but their incentives and mine are not aligned.

I'm not with GCP but I'm fairly closely connected to its technical staff and user community and a heavy consumer and have been so for about 4 years.

It gives me a somewhat unique perspective where I'm happy to give them lip when they need it and they have at times and they also ask for my input fairly frequently. In my opinion they are all in on GCP and it's definitely going nowhere but up. They've been on massive hiring sprees and it's to the point that when I visited them at the Googleplex a few months ago a bunch of their campus buildings were now labeled as Google Cloud buildings.

Obviously logos can be removed and maybe it will say Android/Allo/whatever in 3 years but the team is incredible and they seem laser focused on growing the platform. You can go through their blog https://cloudplatform.googleblog.com/ and the improvements they're making and the pace at which they're making them is pretty great.

I've used them for ~4 years so I've definitely had frustrations, especially prior to them reorganizing their Support/Account Management structure about 6-12 mo ago (I do k8s so I interact with this side of things as infrequently as possible).

This is a huge money maker for them and I think they're doing a great job albeit they still need to focus on their soft skills (support, billing pains, etc).

I have migrated a few companies from AWS to GCP and they've all been happy. If the day ever comes to leave GCP I have no problem making that call; I'm not handcuffed to them. Hopefully they just continue to improve.

Nit: There's actually an entire campus that was built for Google Cloud, not just a few buildings.

Ha, proof I don't work there! :)

Thank you! This is interesting to know, and I'm sure the people there are really bright, capable and motivated to bring us the best solution possible. However this is not what concerns me. Willingly or not, they are part of a bigger ecosystem named Google, and the decisions from higher-ups inevitably influence the platform. The culture itself is probably engineer-centric and not customer-centric, which can be seen in differences in support responses.

I must stress, however, that these are just my thoughts based on information from the Internet, because I don't have extensive experience with either, so I might be wrong.

I've spent a good bit of time with both now. GCP has some real potential but AWS support is nothing short of fantastic. I leaned on them multiple times and had people actively working with me on the phone to get problems solved, following up with me, etc.

I was impressed.

Just got on the Silver plan with Google and only had one support ticket so far, but they got it resolved. Haven't really needed to push it yet, but we will see what happens.

I have a single dependency on Google for my business and I would never take on another one voluntarily. All the horror stories you've heard about Google support have been true for the 10+ years I've been dealing with the Google Maps team. Every year they find some new way to screw up their customer support.

And that's before you even get to your legitimate concern about Google's long-term commitment to any of their offerings.

Interesting idea about being Amazon's primary business. Would you happen to have pointers about AWS's size relative to Amazon's other business?

edit: found some myself: https://www.zdnet.com/article/all-of-amazons-2017-operating-...

I recall Google Cloud engineers trying to assuage concerns over their longevity “we’d give you s years notice before we’d shut down”.

Other commenters asked if they still had jobs after saying that.

I've been running a production project in Google App Engine for three years with six figure active users and can mostly agree with this. A few comments:

1. Stackdriver gets a bad rap. Maybe there are better solutions out there, but it hasn't been our weakest link. It's gotten very expensive, though.

2. Google recently deployed a new HTTP load balancer that is way better than the old one.

3. Instance usage on app engine is a little opaque and very hard to tune. A lot of it is trial and error.

4. Cloud SQL is garbage and the author is being generous if anything. Haven't had a chance to try Spanner, but it seems a lot better. App Engine + SQL has a lot of little hidden gotchas that are impossible to debug. I'd never recommend anyone use it in production.

Google Cloud has come a LONG way, and one of the biggest reasons we continue to invest in it is they keep improving based on feedback. Their product team is very engaged. Their IAM additions, improved load balancing, and additional availability zones have been huge for us.

Can you elaborate on what is wrong with Cloud SQL? I have next to no experience with GCP Cloud SQL, but use AWS RDS a lot.

GCP is 2 years behind in versions and the funky setup around connections is a hassle to deal with. The number of connections is limited and not configurable, and the default way of connecting is to use the Cloud SQL Proxy, which is a separate binary/service you have to run to connect.

Public IP access is available but that means going through the public internet, and also means you have to whitelist each individual IP accessing it, which is hard or impossible when trying to use ephemeral or private-ip-only instances.

Cloud SQL itself is fine...it's just hosted SQL. That said, there is a lot of google-bespoke code for interfacing various parts of their cloud with it. Cloud SQL Proxy has a lot of gotchas and is hard to debug. For example, the per-instance connection limit between app engine and cloudsql is 5, and it's impossible to know how you're bumping against that. Tuning is a frustrating process, and google support hasn't been too helpful.

The limit was never 5. It used to be 12 and now it's 60.
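Whatever the actual per-instance limit is, one way to make "bumping against it" visible is to enforce the cap in the application and fail loudly when it is reached. A minimal sketch, assuming a hypothetical `BoundedPool` class and using an in-memory SQLite connection as a stand-in for a real Cloud SQL connection:

```python
import queue
import sqlite3
from contextlib import contextmanager


class BoundedPool:
    """Hands out at most `limit` connections; extra callers wait, then fail."""

    def __init__(self, connect, limit):
        # All connections are opened eagerly so the cap is fixed up front.
        self._idle = queue.Queue(maxsize=limit)
        for _ in range(limit):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=1.0):
        try:
            conn = self._idle.get(timeout=timeout)
        except queue.Empty:
            # Surfacing this explicitly is the point: the caller learns
            # that the cap was hit instead of seeing an opaque hang.
            raise RuntimeError("connection cap reached; no idle connection") from None
        try:
            yield conn
        finally:
            self._idle.put(conn)


if __name__ == "__main__":
    # limit=2 is an arbitrary illustrative value, not the real quota.
    pool = BoundedPool(lambda: sqlite3.connect(":memory:"), limit=2)
    with pool.connection() as conn:
        print(conn.execute("SELECT 1").fetchone())
```

Setting `limit` a little below the platform's quota and logging the `RuntimeError` would at least show when the application itself is the one exhausting connections.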

> Sometimes when browsing documentation for a service or API you will find that the old way is deprecated, but the new way that they recommend you use is still in beta or alpha (!).


"Beta" is kind of an awful purgatory state on GCP. We all know that most of their beta stuff is actually really well-tested and stable (it's even mentioned in this article). They also know it themselves and seem to assume that people will be willing to start using their products when they are in beta. However, they have an ill-advised blanket policy that beta products are not covered by SLAs, which ends up meaning that no responsible decision-maker wants to build anything in production on top of anything tagged "beta". In many ways, you get forced to treat beta in exactly the same way as alpha even though you know that it's actually much more reliable. I strongly believe that Google and its users would be much better off if Google used beta to mean "SLA'd but with documented feature gaps and a shortened deprecation period" and kept stuff that they are not willing to SLA in alpha.

I totally agree, this is a serious problem. Because even if you understand that they mean “hey it’s Beta but it’s fine for production”, if you choose to use it and it breaks, all blame will point back to you for making an unwise choice.

There is a terrific explanation of the difference between Alpha, Beta, and Stable in the Istio project [0].

Istio was started by GCP, so while this isn’t officially Google’s definition I imagine it’s probably the same.

It basically says, Beta is mostly usable in production, but things might change in the future that require work from you to upgrade.

[0] https://istio.io/about/feature-stages/

Also familiar: monitoring frontend is tragic-comic with no units and inscrutable data. It’s cute that the author thinks the internal version must be a lot better!

On the other hand, the article doesn’t mention Stackdriver Profiler, which has an amazing UX. Definitely recommended if you are spending significant money on CPU at GCE.

Nice, I haven't used the Profiler yet, will check that out soon. What languages have you used it with? I'm looking to use it with the JVM profiler.

[SD APM PM here] Yep, we support Java / JVM, Node.js, and Go; Python is coming soon

What’s the long term plan here with StackDriver and OpenCensus?

Are you eventually going to support high cardinality events with the ability to aggregate and then break down (as opposed to say Prometheus which pre-aggregates, so you can’t break down to debug, and doesn’t support high cardinality).
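The distinction being asked about can be sketched in a few lines of Python. The event fields and function names below are hypothetical and are not part of any Stackdriver, OpenCensus, or Prometheus API; the point is only that counters aggregated ahead of time cannot be split by a dimension that was dropped, while raw events can.

```python
from collections import Counter

# Hypothetical raw request events; "customer" is the high-cardinality field.
EXAMPLE_EVENTS = [
    {"route": "/login", "status": 500, "customer": "acme"},
    {"route": "/login", "status": 500, "customer": "acme"},
    {"route": "/login", "status": 200, "customer": "globex"},
    {"route": "/search", "status": 500, "customer": "initech"},
]


def pre_aggregate(events, keys):
    """Keep only counts per chosen key tuple; all other fields are lost."""
    return Counter(tuple(e[k] for k in keys) for e in events)


def break_down(events, where, by):
    """Filter raw events, then count them by any field, chosen after the fact."""
    return Counter(
        e[by] for e in events
        if all(e[k] == v for k, v in where.items())
    )


if __name__ == "__main__":
    # Pre-aggregated view: shows that /login had two 500s, but not for whom.
    print(pre_aggregate(EXAMPLE_EVENTS, ("route", "status")))
    # Raw events: the same failures can still be split by customer.
    print(break_down(EXAMPLE_EVENTS, {"route": "/login", "status": 500}, "customer"))
```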

Would you be willing to add a contact method in your profile (twitter/email) for some offline questions?

I'm surprised that GKE is not included in the list of "Good"s. Managed kubernetes has been a game changer for us.

I've just updated the post to make it clearer that I'm only talking about services that I used. GKE definitely looks promising, but I haven't used it so I can't give an opinion on it.

I will add from my personal experience that GKE has come a long way, and it is probably the best managed K8s experience now. Still has some warts - a lot of the stuff you'd expect out of the box is still beta, or preview. And until this year, I wouldn't have used GKE for any significant installs, at least not without a really good support contract.

I've been jumping between AWS and Google Cloud ( with a bit of Digital Ocean sprinkled in ) for the last several years. I chose Google as our cloud platform when we founded our company last August. I could War and Peace a bunch of things but that article does a very nice job in the details. Instead, I'll give a one liner:

AWS is to Linux as Google Cloud is to FreeBSD. "Rock solid performance and everything is exactly where you think it should be"

Your analogy is confusing me. On GCP, none of the tools have all the features you expect?

My main bugbear with GC is the fact that their APIs seem to be written with no regard for usability/consistency/stability from a dev perspective.

For instance, I wanted to create a project programmatically a few months ago, so I checked out the docs. OK, v1 of the API has it... wait, there’s a v2, and it’s been completely removed. Where has it been moved to? Absolutely no clue. So what am I using now? A deprecated API?

well written.

Things I would add to the Good side.

- ability to commit to multi year cpu/ram without upfront costs

- they can live migrate your vm to new physical hardware, so no need to have random reboots of your instances

- https load balancers allow traffic from around the world to enter google's network at the point closest to the user, which reduces handshakes and reduces the chance that comcast-like outages affect users.

the above greatly decrease my stress.

> they can live migrate your vm to new physical hardware, so no need to have random reboots of your instances

Having supported VMWare at multiple locations, and having worked for a smaller cloud host, it shocks me that AWS don't support live migrations. VMWare can live-migrate VMs between datacenters in different geographic regions, while maintaining active sessions!

Technical debt. AWS had first mover advantage but now is saddled with the decisions they had to make to move fast at the time.

I guess that closeness to user got eliminated from the “good” list since Google’s FELB is serving 502 to this author’s customers.

They did eventually stop serving sporadic 502's, I've updated the post to make that a bit clearer.

OP: for the 502 problems give a look at this https://github.com/kubernetes/ingress-nginx/issues/1396 https://github.com/kubernetes/ingress-gce/issues/34

I ran into the same problem a while ago. It may solve your problem as well.

That’s good to hear. My intuition is random failure like that resulted from inability to find your instances, or a belief in their non-existence or ill health.

maybe. We haven't had any issues with it over last year. The comcast outage last week probably affected more customers.

The article mentions Terraform workspaces and their limitations (Not Google Cloud related at all). The way we get around this is to have separate variable files for each environment. The master file contains sensible defaults and then those get overridden by the environment specific files.

This is the only sane way to manage variables in terraform right now IMO. It does require that you specify a file when you run it, but that's somewhat of a bonus as it forces you to be explicit about the environment you are running against. You shouldn't really be running Terraform locally anyway so it's all handled by the CI server in most situations.
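The layering described here (defaults first, then an environment-specific file that overrides them) boils down to a simple precedence rule. A minimal Python sketch of that rule, with hypothetical variable names and values; it only illustrates the precedence and is not how Terraform itself loads variable files:

```python
def resolve_variables(defaults: dict, environment_overrides: dict) -> dict:
    """Return defaults with environment-specific values layered on top."""
    merged = dict(defaults)               # copy so the defaults stay untouched
    merged.update(environment_overrides)  # the environment file wins on conflicts
    return merged


if __name__ == "__main__":
    # Hypothetical contents of a master file and a production file.
    defaults = {"instance_count": 1, "region": "us-central1"}
    production = {"instance_count": 3}
    print(resolve_variables(defaults, production))
```

On the Terraform side the environment file is passed explicitly with `-var-file`, which is what makes the target environment visible in every CI invocation.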

Why should one not run Terraform locally?

Not the parent, but I share the same opinion.

You totally can run locally, provided you are a single developer in your project. Once you start collaborating, you need this to be in a CI server or similar. Make it in such a way that the code is guaranteed to be properly versioned, taking the latest from your 'production' branch, as you can run into a situation where you are running terraform based on local changes which are not in version control, by accident or on purpose. That means the state no longer conforms to the latest TF scripts, and this will come back to bite you. This also forces you to provide any settings as part of your CI job, not a one-off command in the CLI. It also gives you auditing, and many other benefits.
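The "properly versioned" guarantee can be approximated with a small guard that a CI job runs before applying anything. A minimal sketch in Python: the function name and the default branch name are hypothetical, and the two inputs are assumed to come from `git rev-parse --abbrev-ref HEAD` and `git status --porcelain`.

```python
def safe_to_apply(branch: str, porcelain_status: str,
                  production_branch: str = "production") -> bool:
    """Allow an apply only from a clean checkout of the production branch.

    `branch` is the current branch name; `porcelain_status` is the raw
    output of `git status --porcelain`, which is empty for a clean tree.
    """
    on_production = branch.strip() == production_branch
    clean_tree = porcelain_status.strip() == ""
    return on_production and clean_tree


if __name__ == "__main__":
    print(safe_to_apply("production\n", ""))             # clean production checkout
    print(safe_to_apply("production\n", " M main.tf\n"))  # uncommitted local change
```

This does not check that the checkout matches the remote branch; a CI server that always starts from a fresh clone covers that part.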

But the most important piece of advice I can give is: no matter what your particular circumstances are, what your project size is, your Terraform skill, or whatever other variable, always use a Terraform remote state, with whatever backend you are most comfortable with. No exceptions. Even if you are running locally, I don't care. Unless you are doing testing while developing the scripts (and I'd argue even then), there is no reason to use a local state, ever.

Agree 100% with all this.

The ugliest: no ipv6 anywhere except the load balancer.

Came here to check if this bothered others as much as me. Not disappointed. Considering how strongly other arms of google get Ipv6, this is one of the saddest failures they have.

I keep hearing how great Terraform is because it allows you to treat infrastructure as code in a cloud agnostic manner. But every time I see a Terraform script, it's tied tightly to AWS's infrastructure -- including Hashicorp's own examples.

Hey there - Seth from Google here. We have a dedicated team of Google engineers who contribute to Terraform as their full time job. We are constantly looking for ways to improve. We added a lot of examples to our Google provider documentation, and we are working on a project that will enable us to add support for new GCP features in Terraform faster.

I can’t speak for HashiCorp’s own examples (I mean, I could, I used to work there), but we have a lot of documentation and examples in the Google bits. If there’s a specific thing you think is missing, I’d be happy to help address it.

That wasn't meant as a critique on either GCP or Terraform. Given a choice between Google creating another bespoke framework for infrastructure as code like CloudFormation and using Terraform, I think choosing Terraform was a great choice. I am huge fan of Hashicorp's other offerings - Consul, Nomad, and Vault.

It was meant more as a critique of people choosing Terraform over CloudFormation for AWS to prevent "vendor lock in".

We use terraform to create our new projects, but with them not supporting kubernetes deployments it would be very difficult for us to continue to use them after standup. (I then script kubectl commands to create our deployments, etc.)

Their own github repo issue list said they wouldn't support deployments when it was in beta (hashicorp won't do anything that is in beta, another problem with google cloud and its long, stable betas). Kubernetes deployments have been out of beta for at least 6 months, and there has been no movement on the kubernetes provider.

When I was first looking into choosing between CloudFormation and TF, not knowing either, I was actually leaning toward TF because of my positive experience using Hashicorp's other products.

The two things that swayed me toward CF were reading that TF is usually slightly behind CF in adding support for new features when they are introduced by AWS, and my rule of always choosing the platform provider's preferred solution.

I should have included this earlier: https://github.com/terraform-providers/terraform-provider-ku...

Deployments have been in GA 7 months now, and NO feedback from hashicorp on this one...

:) This one actually personally bothers me too. I have a really hacky shell-out to kubectl because deployments aren't currently supported, so I definitely empathize here. That being said, Google doesn't currently maintain the Kubernetes provider for Terraform; it's entirely maintained by HashiCorp.

Terraform is great because it lets you treat infrastructure as code, but I don't think it makes any promises about cloud agnosticism. In fact it's the opposite, each provider is specific to the service that they are offering.

Rather than Terraform offering a lowest common denominator set of resource definitions, each provider can be designed to work in a way that most naturally maps to their offerings and API.

In theory you could write modules that provide abstractions over multiple cloud providers, but I don't know how good a result that would give.

In that case, if you're on AWS, then why not just use CloudFormation? CloudFormation is a lot more powerful on AWS than TF.

While CloudFormation is lovely, it has a couple of real downsides:

- very often something new is introduced on AWS without any CloudFormation support, so you will either need to write custom resources with lambda functions, or be prepared to wait a long time. From the few times I checked TF is actually faster to support new services / features.

- CloudFormation has extensive documentation, but very often you end up having to also read AWS API documentation and going through a dozen trial & error attempts before you get something working.

- occasionally CloudFormation can get stuck and leave you in a state you can not recover from. Luckily AWS support tends to be very responsive and can help you here, and it hasn't been happening as much as 1-2 years ago

- CloudFormation has very little support for reusing things: no macro support, very limited include support, no support for YAML aliases (unless you use "aws cloudformation package" as a workaround)

- CloudFormation changesets are nice, but do not work if you use sub-stacks (which you should use)

Just like Zope there is a bit of a Z-shaped learning curve: there is a pretty steep learning curve to start, after which a lot of things become easy. But when you get to more complex things suddenly everything becomes frustratingly difficult again. That may come with the territory; I have not used other tools such as TF so I can't tell if that is a problem-space specific thing.

For context, I’m both relatively new with CF and the CF “expert” at my company - in the land of the blind, the one eyed man is king.

> occasionally CloudFormation can get stuck and leave you in a state you can not recover from. Luckily AWS support tends to be very responsive and can help you here, and it hasn't been happening as much as 1-2 years ago

I had just the opposite experience with AWS support. CloudFormation was stuck because I had a syntax error in my Node based custom resource so of course it waited for a response from the lambda that it wasn’t getting and then when I tried to cancel the stack creation, it again called the same broken lambda to delete the resource and hilarity ensued. This was after I corrected the lambda and ran it successfully to create another stack.

I did a live chat with AWS support and all they did was quote the user documentation - after trying to explain that I am not trying to create a lambda resource and it’s not trying to delete an ENI.

I finally just gave up and waited 4-5 hours for the rollback to time out.

> Just like Zope there is a bit of a Z-shaped learning curve: there is a pretty steep learning curve to start, after which a lot of things become easy.

I’m very much at the second valley on the Z. I spent quite awhile getting the hang of CF just to make my code deployments easier (creating parameters, autoscaling groups, launch configurations, lambdas, etc.) but I haven’t had to do anything especially complicated.

CloudFormation is just for AWS Services. We use terraform for more than just AWS Services. HCL is also much nicer to work with than CF. I've also never found anything CF can do that TF can't. I have, however, found many things TF can do that CF can't.

I admittedly live in an AWS bubble when it comes to infrastructure and netops. I’m mostly a developer whose only expertise at modern netops and infrastructure is AWS. I could see where TF would be a better solution for infrastructure that spans providers.

As far as what CF can’t do with respect to AWS, most if not all of the missing pieces can be remedied with custom lambda backed resources and/or Python scripting with troposphere.

Because editing yaml or json sucks.

Linked from Hashicorp’s Getting Started page...


Which is neither json or yaml...

It's yet another semi-json format that doesn't have widespread support by any other parsers. Even Hashicorp recommends JSON if you want something that is machine readable.

I think it looks lovely :-) having used Puppet a bit in the past. Look how simple Terraform seems to be: `inline = ["sudo apt-get -y update", ... install nginx ...`. Hmm, apparently one can use things like Chef & Puppet together with Terraform, I eventually noticed.

the nice way to do this is to pass the chef role to the user data template in the autoscale group

vars { ROLE = "bastion" }

Then append this to the user_data.tpl file

chef-client --audit-mode enabled -R -E ${chef_environment} -r role[${ROLE}]

Allows you to use the same user_data.tpl file for multiple autoscale groups
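The reuse works because the role is just a template variable. As a rough illustration, Python's `string.Template` happens to use the same `${NAME}` placeholder syntax, so it can stand in for the template rendering here. This is only a sketch of the idea, not Terraform's template engine, and the role and environment values are hypothetical.

```python
from string import Template

# The command line from user_data.tpl above, with its two placeholders.
USER_DATA_TPL = Template(
    "chef-client --audit-mode enabled -R -E ${chef_environment} -r role[${ROLE}]"
)


def render_user_data(role: str, chef_environment: str) -> str:
    """Fill in the shared template for one autoscale group."""
    return USER_DATA_TPL.substitute(ROLE=role, chef_environment=chef_environment)


if __name__ == "__main__":
    # One template, several groups: only the variables differ.
    for role in ("bastion", "web"):
        print(render_user_data(role, "production"))
```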

Terraform doesn’t abstract away the details of which cloud provider you are using. That would be impractical if not impossible for anything beyond trivial use cases. But it does let you abstract away the logic in dealing with maintaining infrastructure that’s managed through APIs. And the provider support for each cloud makes a huge difference in how usable it is. The AWS support is generally great so I assume the GCP support here is as well.

my very ugly experience: For a mobile app, the cost of storage access bandwidth (Google Storage) for a 30 second video will cost you more than the profit you can make from ad impression (CPM/CPC) using Google Admob, making any video-based business unsustainable using Google's ecosystem.
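The comparison behind this can be written as a break-even calculation. The sketch below is only the arithmetic: the function names are hypothetical, 1 GB is treated as 1000 MB for simplicity, and no real Google Storage or Admob prices are assumed, so any numbers plugged in are the reader's own.

```python
def egress_cost_per_view(video_mb: float, egress_usd_per_gb: float) -> float:
    """Bandwidth cost in USD of serving one video once (1 GB taken as 1000 MB)."""
    return (video_mb / 1000.0) * egress_usd_per_gb


def ad_revenue_per_view(cpm_usd: float, impressions_per_view: float = 1.0) -> float:
    """Ad revenue in USD per video view; CPM is revenue per 1000 impressions."""
    return (cpm_usd / 1000.0) * impressions_per_view


def break_even_cpm(video_mb: float, egress_usd_per_gb: float,
                   impressions_per_view: float = 1.0) -> float:
    """CPM at which ad revenue exactly covers the egress cost of one view."""
    cost = egress_cost_per_view(video_mb, egress_usd_per_gb)
    return cost * 1000.0 / impressions_per_view


if __name__ == "__main__":
    # Purely illustrative inputs: a 10 MB clip at 0.10 USD per GB of egress.
    print(break_even_cpm(video_mb=10, egress_usd_per_gb=0.10))
```

Any effective CPM below the break-even value means each view loses money on bandwidth alone, before storage or API request charges.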

Note: If you work at Google, you might want to pass this message up the chain, thank you.

PS: i've played with all types of admob ads to try to make the system profitable, even the IN-YOUR-FACE interstitial ad that everyone hates (even Google themselves hate it, they are limiting its use)

PS #2: Google Admob has also removed the ability to create native ads.

This is to push you towards CDNs. The main data centers cannot scale appropriately to serve high bandwidth applications, and aren't meant to.

I am not sure why this was downvoted. Isn't it true, that cloud storage service is supposed to be used for cloud storage, and content delivery service is supposed to be used for content delivery?

Media content storage and delivery is an advertised use case [1]

> ...highest level of availability and performance is ideal for low-latency, high QPS content serving

And the cost using Google Cloud CDN isn't necessarily going to be a major improvement over Google Cloud Storage. Egress cost is similar depending on usage, plus you pay for cache fill and api calls.

[1] https://cloud.google.com/storage/use-cases/

Reading the comments here a persistent theme seems to be that the documentation is lacking. Something that might help is creating a "dog-fooding" team mostly filled with interns. This team will be responsible for creating samples with the public APIs. Why Interns? Because they will move on to better things around the time when their brains learn the patterns required to navigate the Google cloud API efficiently.

Yes! The documentation problem is NOT complex to solve. I would love to use GCP on more projects but the docs are embarrassing.

Are they trying to get engineers to write the docs rather than hiring professional technical writers?

Truly baffling.

[I'm a technical writer at Google working on Cloud documentation]

Thank you for the constructive comments regarding GCP and in particular documentation; we are working on addressing the points raised.

GCP has a dedicated tech writing team which is growing (we're hiring!) and the documentation is written by tech writers. We work with our colleagues in developer relations (developer advocates, developer programs engineers), UX and the software engineers who created the product.

All that said, we know our documentation is not perfect and are constantly working to improve. To that end if you see something that is broken/incorrect, please file a bug (the "SEND FEEDBACK" link in the top right of the documentation page is a bug submission form). We love to hear from users and I can assure you docs bugs do not get routed to /dev/null (we have a bug SLO, just like our SWE colleagues).

As a more general tech writing comment, documenting large, fast growing distributed systems such as public clouds is tricky. There's a lot more to documentation than just writing up instructions, such as thoughtful information architecture. It's a challenge that we, and I'm sure our counterparts at AWS et al., are grappling with.

As tech writers, we wince when we see errors in our work and feel pain when users are not able to enjoy the product/service as intended due to issues in documentation. The whole developer relations team (DAs, DPEs and tech writers) proactively try and catch these mistakes but of course, we miss some. It is nice to see users feel so passionately about documentation, so please let us know where we have made mistakes and we will fix them and try harder next time.

Isn’t Google using engineers for everything? They’re famous for piss poor support, for example, because they want engineers to build AIs to do support for them.

Both Azure and AWS have fantastic documentation. The bare-bones documentation and sparse tutorials of GCP are big reasons (other than market awareness) that larger enterprise customers are hesitant to adopt.

This is a very good evaluation. We were running on AWS for 3 years before moving to GCP and have had the same experience. To make up for some of the shortcomings we replaced some parts with the following:

- Monitoring with Datadog

- Container build and deployment with Cloud 66 Skycap and Habitus

- Support: well, I simply wish Google Support was better, or at least that they had a more proactive status page update policy.

I'm always surprised that comparisons like this with AWS miss the fundamental philosophical approach of the cloud offerings.

AWS is "infrastructure as a service". Hosts, network switches, load balancers - things that historically cost an arm and a leg, but they could virtualize. Add some elasticity and auto scaling, and you've got the foundation to build anything else on top.

The hyper-specificity of SQS vs Kinesis, for example, comes out of the same philosophy: Provide the infrastructure, and let customers figure out what to build on top of it.

Google, meanwhile, started with AppEngine - run your apps in the cloud without concern for the hardware. AWS's equivalent is Elastic Beanstalk - more than one layer of abstraction higher than the default offerings.

Google may have started with AppEngine, but that's not how it is today.

Google now has a line-up of products directly competing with equivalent AWS services: VMs (Compute Engine), queueing (Pub/sub), key/value storage (GCS, which even has S3 API compatibility), SQL databases (CloudSQL, Spanner), Redis, document storage (Google Datastore), distributed file system (Filestore, like EFS), big data store (BigQuery), etc.

I see lots of evaluations of GCP, or comparisons between GCP and AWS, recently. But I don't see as many for the Azure platform, despite Azure being 2nd in the cloud provider market and even closing in on AWS in market share. Does anybody have any insights on this?

My hunch on why it is considered to be the 2nd in the cloud provider market is that most Enterprise size companies are already in bed with Microsoft through a lot of other products, and there are a lot of incentives being thrown around to commit to using Azure. Also, there is the "Azure is not Amazon." or "Azure is not Google." Some Enterprise size companies simply won't do business with AWS because they are Amazon. That doesn't mean Azure is a good technical product, and this may be why you are seeing GCP vs AWS more often in technical comparisons.

For example, Azure was a consideration for our "not AWS" cloud provider until we tried to use custom Linux images with at rest encryption on the root volume. It's simply not supported and there was no viable work around. There was also no ETA for adding in support for this non-negotiable customer requirement. This is why we moved onto evaluating GCP. So far it has been pretty great and it is checking a lot of our requirements that Azure either fell short with or flat out didn't support.

I have a feeling that most of us working with Azure are employed by large corporations outside of Silicon Valley / FAANG. And we're poorly represented on Hacker News.

Microsoft's bread and butter was always the Fortune 500's and I have the impression that's where they're making most of their Azure sales.

I've used Azure a bit and it's been pretty painful for the most part. There are three, maybe four identity management systems (Azure native, MS Live, Hosted AD, others?). Working with the SDKs and doing programmatic authentication was painful.

I did a fair bit with AKS. It had a lot of shortcomings that would take a while to go into. They ended up suggesting I use ACS-engine.

There are some products that are neat, that I haven't really used at scale, but seem good. Most of the bigdata stuff is pretty legit based on my limited usage.

The customers of my employer are large enterprises. Azure definitely has penetration there. I only have a small sampling, but many of them end up leaving Azure for AWS.

None of them are really considering GCP.

Could you go into it? I’ve kept hearing good things about AKS, but my experience with their VM service left me skeptical of those claims.

Here's a list of things from January of this year:

  - when scaling nodes, didn't get the desired count
  - creating and deleting clusters takes a while (30 minutes)
  - cluster operations get stuck in a loop
  - disappearing node did not bring up a new node
  - external management setup was difficult (GKE assigns publicly reachable IP by default)
  - changing SSH user or key through azcli or GUI didn't work
  - default storage class wasn't set
  - admission control disallows deploying a registry in kube-system
  - nodes randomly go into notready state
  - no ability to add new node pools or modify pool without destroying cluster

This was all prior to GA.

I don’t have a ton of experience with Azure as a whole but have used their VM service for a bit.

Azure VMs are way behind compared to EC2 IMO. Burstable VMs (the cheap ones) aren’t available in all data centers or for all images (which they call SKUs), storage accounts are weird, provisioning times are slower, their cheapest instance types (Standard_A) are REALLY slow, documentation is confusing, extremely spread out and contradictory at times, account management via Azure AD is way more confusing than IAM at least at first, their auto scaling service (scale sets) is less flexible than AWS autoscaling, etc.

Also, Microsoft and HashiCorp are pretty much the only maintainers of the Packer and Terraform providers. The difference in commit history between the ARM provider and the AWS provider is stark. This means that when you encounter bugs or cryptic errors from the Azure API (which will happen), you’ll be waiting a bit for a fix, even if you cut the PR.

The other thing with Azure is that every large company uses it for Windows workloads because they get massive discounts and tons of credits.

You don’t shop Azure.

Microsoft is repeating what they did with NT — owning identity. They reel you in with Office 365 and when you need controls convert you to Azure AD. Once that happens, Azure is a no-brainer.

Also, every company with an EA has a contract vehicle for Azure... just add water!

Just a wild guess, but I think the people who use Azure use it for its compliance/security features, and people who need those may not be the most forthcoming about their applications. (Governments, Banks, etc)

I worked in a bank in Europe and the management did not consider any other option for cloud than Azure. Microsoft does a good job selling it to Enterprise Microsoft shops. Most of them want to continue their relationship with Microsoft who they consider cheaper and better than IBM/Oracle.

One other thing I believe is worth mentioning -- and to be honest I was a bit surprised with -- is the ecosystem around GCP vs AWS.

For example, we were looking for a solid redis hosting service in the EU, and latency is key obviously. There are virtually none with GCP, but quite a few with AWS. I'm pretty sure there are other similar examples with other services.

> we were looking for a solid redis hosting service in the EU, and latency is key obviously. There are virtually none with GCP

Google Cloud recently released a public beta of Memorystore Redis (https://cloud.google.com/memorystore/), which is hosted open source Redis on GCP. It is available in the europe-west1 region, so it might suit your needs.

Disclaimer: I am an engineer on the Memorystore Redis team.

I apologize for bringing up something that is offtopic here but I am really glad to have read your comment.

Last I read in the docs, Memorystore was not accessible through the AppEngine Standard Environment, which was rather disappointing to me.

I was wondering if just providing the App Engine Standard Environment with the Redis instance's address and credentials, along with a service account or IP whitelist for access, would suffice to make it usable there as well. If I'm not wrong, something similar is done in the Flexible Environment according to the docs.


Thanks so much!

Unfortunately that is not possible at this moment. Without going into details: Flex runs on the same cloud infrastructure as Memorystore, which is why it is possible to establish networking between them. This is not true for the regular App Engine.

Thanks so much for the reply, it saved me a few hours of trying it out at the very least! Of course, I hope there is support for Memorystore on App Engine eventually. Thank you.

Yes, I'm aware of it. BUT, it's beta, and it's not persisted. For our use case, we wanted a persisted redis store. There are lots of providers, RedisLabs, RedisGreen, Redistogo and probably a half dozen more, but none of them provide service to Google Cloud in EU.

Seconding the issues on the Load Balancer and 502s. We moved our app there earlier this year to take advantage of GKE and just sort of put up with random 502s (especially during deployments, which are supposed to be zero-downtime). We contacted Google Support but they were unable to help and said that yeah, some percentage of dropped requests is fine. I suppose it's something with our Rails app or configuration, but no one was able to help and I decided to stop wasting time trying to fix it.

They have more-or-less gone away now but several months later it leaves me with very little confidence in GLBs and quite scared to change anything around.

Hey, I work on public-facing support for GCP. Given the point that it's hard to see what mailing lists are available I wanted to point out https://cloud.google.com/support/docs/groups

It's definitely not as well-integrated as AWS' forums, but it's a starting point.

Thanks! I've just updated the post with that.

This is a great article!

I've used AWS extensively, and GCP moderately.

If you take the ThoughtWorks-style approach, I would classify AWS as "Adopt" and GCP as "Trial".

Google is investing very heavily in GCP, both internally and in sales and marketing. But, it still feels very much like patchwork. At least when you compare to AWS - which, while complex, is very consistent, very well documented, very stable, and performs very well. GCP is also, I believe, a combination of acquisitions, and this bleeds through with differently skinned UIs and URLs.

I'd say the big exception to this is GKE. If you're happy with a managed K8S, there is no comparison to GKE right now. GLB+GKE and you're pretty much set to handle anything with little ops work, and probably more affordable than AWS+Kops. Although, I'm not sure what the support story is like.

We’re currently having this problem where firebase suddenly stops working with the main wired internet network. I switch to another internet provider (mobile wireless) and firebase starts working immediately. This may be something to do with router settings but has everyone in office confused like hell.

Can't stress enough how great Compute is compared to EC2. A lot of the (enterprise) companies I work with are wary of going all-in on a particular cloud and prefer to hand-roll open source solutions instead of consuming cloudy services; the most they'll use is VMs, basic networking and storage (although GKE seems quite popular as well nowadays). GCP nails this use case IMO. It's fast to provision, reliable, rarely if ever fails during setup, and you don't need a separate start-up company to predict your costs. It's very much quality over quantity though, so whilst this is great and manages to capture a particular market, I do think they need to put out way more (perhaps lesser quality) stuff to stay competitive with AWS and, to a lesser extent, Azure.

honestly, stackdriver is the worst.

monitoring, as the author said, is absurd.

but their logging solution is even worse, especially when compared to solutions like kibana + elasticsearch. i wouldn't recommend stackdriver to anyone.

I've not used GCP outside of GCP training courses yet (the quality of these is another matter....), but I did come away thinking the console UX and 'Cloud Shell' thing was absolutely brilliant.

This post was interesting to me, as the training course presenters seemed to make out that StackDriver was the best thing since sliced bread, but I guess they have to, being Google partners...

Interesting take. In my job, the idea that cross-project connections would be rare, or that it’s good for all resources in a project to have access to all others, seems a little crazy. I agree it’s a far safer default than AWS’s, but it’s not _good_.

The biggest wall for me in setting up GCP is that the resources and accounts are all so tightly tied to personal Google identities. This is unworkable for me, but I don’t see any clear way around it. We’ve set up our billing account on a shared account, but it becomes hard to work with around personal Google accounts (even more so when they are corporate Google Apps accounts). The AWS IAM identity model makes much more sense to me, but perhaps I just need a new paradigm for thinking about it. Are there any good resources for relearning how to think about resource ownership in the GCP model for AWS heads?

I really like the fact that 1x micro VPS is included in their "Always free" allowance. A free *nix VPS in the cloud is a pretty awesome deal no matter how you look at it.

I'm currently hosting websites with it & despite being underpowered it seems to be hanging on thanks to cloudflare caching.

I had a hard time figuring out how to search and filter Stackdriver logs. I'd rather stream logs with EFK and search with Kibana; Stackdriver is very basic compared to that. Visualizing instance metrics is also a pain compared with AWS CloudWatch.
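
For anyone else fighting with the log viewer: as far as I understand it, an advanced filter is just a string of comparisons joined with AND. Here is a minimal sketch in Python that builds one. `build_log_filter` is a hypothetical helper (not part of any Google library), and the field names (`resource.type`, `severity`, `textPayload`) are my reading of the Stackdriver Logging filter syntax, so check them against the docs before relying on this.

```python
def build_log_filter(resource_type, min_severity=None, text=None):
    """Build an advanced-filter string for the Stackdriver log viewer.

    Hypothetical helper for illustration; it only assembles a string.
    """
    clauses = [f'resource.type="{resource_type}"']
    if min_severity:
        # Severity levels are ordered, so ">=" means "this level or worse".
        clauses.append(f"severity>={min_severity}")
    if text:
        # ":" is the "has" operator (substring match); escape quotes first.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'textPayload:"{escaped}"')
    return " AND ".join(clauses)


print(build_log_filter("gce_instance", min_severity="ERROR", text="timeout"))
```

The resulting string can be pasted into the log viewer's advanced filter box. It is still clunkier than Kibana, but it covers the common "errors from this kind of resource containing this text" case.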

One thing I would love to see is root domain CNAME support (or ALIAS, or ANAME, or whatever) so I can host a static website on a root domain. It seems like there's a workaround using CloudFlare, but it would be really nice if GCP itself supported this.

I'm not sure I understand what you mean, but does this help: https://cloud.google.com/storage/docs/hosting-static-website

Most of the AWS complexity issues that the author mentions are actually enterprise / high-scale features.

Sure, they are too complicated and unnecessary for small operations, but many large organizations wouldn't even consider a cloud infrastructure provider without those things (comprehensive permission and identity model, discernible region and inter-region model, breadth, and depth in computing types, highly auditable security model, etc...)

I get his points, but all the things he claims are great about GCP are only great for a one-man shop and not for an enterprise operation.

Easy-to-use permission control doesn't mean simple. For example, you can always use off-the-shelf IAM roles provided by GCP, but you can also mix-match permissions and create your own custom roles.
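
To make the mix-and-match point concrete, here is a minimal sketch in Python that merges the permissions of two roles into one custom role definition, using the fields a custom role definition has (title, description, stage, includedPermissions). The permission lists are abbreviated, illustrative subsets and not the full contents of the real predefined roles, and `build_custom_role` is a hypothetical helper, not part of any Google library.

```python
import json

# Abbreviated, illustrative permission subsets; the real predefined roles
# contain many more permissions than shown here.
PREDEFINED = {
    "roles/compute.viewer": ["compute.instances.get", "compute.instances.list"],
    "roles/storage.objectViewer": ["storage.objects.get", "storage.objects.list"],
}


def build_custom_role(title, description, source_roles, extra_permissions=()):
    """Merge permissions from several roles into one custom role definition."""
    permissions = set(extra_permissions)
    for role_name in source_roles:
        permissions.update(PREDEFINED[role_name])
    return {
        "title": title,
        "description": description,
        "stage": "GA",
        # Sorted so the output is stable and easy to diff.
        "includedPermissions": sorted(permissions),
    }


role = build_custom_role(
    "Read-only ops",
    "Can list VMs and read bucket objects",
    ["roles/compute.viewer", "roles/storage.objectViewer"],
)
print(json.dumps(role, indent=2))
```

A definition along these lines is what you would then feed to whatever tooling you use to create the role. The point is only that a custom role is a flat list of permissions, so composing one from existing roles is a set union.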

Network configuration can also get complicated once you dig into the detailed options.

Is any specific feature you're looking for missing in GCP?

> Another service that would be very handy is a hosted etcd/Zookeeper service for service discovery, consensus, leader election, and distributed cron jobs.

Look at GKE :)

Another Bad/Ugly: impossible to remove AppEngine data and application without removing the whole project, shame on you GCP :(

This is a good perspective from a smaller user. As said before, the core platform is technically fantastic with the fastest and easiest compute, storage, and networking around. Some downsides mentioned already have options in alpha/beta (like more updated UIs, products like memorystore, filestore).

App Engine - it would be good to have a button to delete an App Engine app instead of having to delete the whole project.

Having multiple accounts as best practice has nothing to do with IAM being complex, it’s about organization and security.

Would you run all your workloads in a single account? A bit messy if you have tons of resources and people collaborating.

And what if this single account gets compromised?

I think the GCP LB is way better than anything you can get on AWS.

It's also more limited in some ways. People like integrating with Akamai/Cloudflare for instance, in which case there is little point to GLBs.

GCP wins hands down when you need to ssh into your server via mobile on the go. It has a mobile app. I'm not aware of an AWS mobile app!

There is one, but it doesn’t do much

Ugly: no secure VPC access for Cloud Functions, which makes them basically worthless for connecting to legacy database systems hosted on GCP.

+1 Loving Firebase + GCP

article needs a date

Thanks, I've updated this.

Okay Google, I would like to report an outage in GKE.

Thanks for contacting Google AI support. We can't escalate this to a human because we don't have enough skilled support staff, but our AI can solve any problem and we use you as a training set. /s

(I know it's harsh, but I'm sure people here can relate to it.)

Slightly off topic, but since the post mentions StackDriver Monitoring in the ‘Ugly’ section, I just wanted to point out that you can get 100% free, high-quality infrastructure monitoring from DripStat and use it on all your gcloud VMs, thus avoiding that issue completely. https://www.dripstat.com/products/infra/

Is that your project? You seem to do a decent amount of promotion of it and Chronon Systems.

For what it's worth, I'd imagine most people would be hesitant to use a free service for something as critical as their monitoring. Especially since the first term in the terms of service says the service can terminate at any time, without notice.
