Why you should not use Google Cloud (medium.com/serverpunch)
895 points by samdung on June 30, 2018 | 357 comments

As someone who is currently struggling with Google Cloud's mediocre support, this is not surprising. We pay lots of money for support and have multiple points of contact but all tickets are routed through front-line support who have no context and completely isolate you from what's going on. For highly technical users the worst support is to get fed through the standard playbook ("have you tried turning it off and on again?") when you're dealing with an outage. Especially since the best case is your support person playing go-between with the many, siloed teams trying to troubleshoot an issue while they apparently try to pass the buck.

Not to mention the lack of visibility in changes - it seems like everything is constantly running at multiple versions that can change suddenly with no notice, and if that breaks your use case they don't really seem to know or care. It feels like there's miles of difference between the SRE book and how their cloud teams operate in practice.

I'd just like to take this opportunity to praise Vultr. I've been using them for years and their support has always been good, and contrary to every other growing company, has been getting better over time.

I had an issue with my servers 2 days ago and I got a reply to my ticket within 1 minute. Follow-up replies were also very fast.

The person I was talking to was a system administrator who understood what I was talking about and could actually solve problems on the spot. He is actually the same person who answered my support requests last year. I don't know if that's a happy accident or if they try to keep the same support staff answering for the same clients. He was answering my requests consistently for 2 days this time.

I am not a big budget customer. AWS and GCP wouldn't think anything of me.

Thank you Vultr for supporting your product properly. And thanks Eric. You are very helpful!

Google Cloud provides more than just VMs and containers. It has a bunch of services baked in, from a variety of databases such as Firebase (which has powerful built-in subscription and eventing systems) to fully baked-in auth (Google will even handle two-factor for you!) to assistance with certain types of machine learning.

Vultr looks like they provide more traditional services with a few extra niceties on top.

Within Google's infrastructure, I can deploy a new HTTPS REST endpoint with a .js file and 1 console command.

Could I set up an ecosystem on a Vultr VM to do the same? Sure, it isn't magic. But GCP's entire value prop is that they've already done a lot of that work for you, and as someone running a startup, I was able to go from "I have this idea" to "I have this REST endpoint" in a couple of days, without worrying about managing infrastructure.

That said, articles like this always worry me. I've never seen an article that says "Wow Google's support team really helped out!"

Using such proprietary features sounds like a great way to subject yourself to vendor lock-in and leave you vulnerable to your cloud provider's every whim. I understand that using ready-made features is alluring, but at what point are you too dependent on somebody else? All these cloud services remind me a bit of left-pad: how many external dependencies can you afford? Maybe I'm too suspicious and cynical, but then I read articles like these from time to time...

The difference, IMO, is that you're generally leveraging the cloud provider's platform in addition to using their hosting.

There are ways to make the hosting relatively agnostic, but choosing a pub/sub solution (for example) that operates at 'web scale' will have a distinct impact on your solutions and force you into their ecosystem to maximize value. Why bother with BigCorp's UltraResistant services if you're only going to use some small percentage of the capabilities?

I've made systems that abstract away the difference entirely, but I think the 'goldilocks zone' is abstracted core domain logic that will run on anything, and then going whole-hog on each individual environment. Accept that "cloud" is vendor lock-in, and mitigate that threat at the level of deployment (multi-cloud, multi-stack), rather than just the application.
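To make the 'goldilocks zone' idea concrete, here is a minimal sketch of core domain logic that depends only on a small interface, with the vendor-specific part pushed into an adapter. Everything named here (`MessageBus`, `InMemoryBus`, `place_order`, the `orders.placed` topic) is hypothetical and invented for illustration; a real deployment would add one adapter per cloud that wraps that vendor's own pub/sub client.

```python
from typing import List, Protocol, Tuple


class MessageBus(Protocol):
    """Hypothetical port: the only thing the core logic knows about messaging."""

    def publish(self, topic: str, payload: str) -> None:
        ...


class InMemoryBus:
    """Stand-in adapter for local runs and tests.

    A per-vendor adapter would expose the same publish() method and call
    that vendor's SDK internally (not shown, since it is vendor-specific).
    """

    def __init__(self) -> None:
        self.sent: List[Tuple[str, str]] = []

    def publish(self, topic: str, payload: str) -> None:
        self.sent.append((topic, payload))


def place_order(bus: MessageBus, order_id: str, amount_cents: int) -> bool:
    """Core domain logic: validates the order and emits an event.

    It has no import of, or reference to, any cloud provider.
    """
    if amount_cents <= 0:
        return False
    bus.publish("orders.placed", f"{order_id}:{amount_cents}")
    return True
```

Swapping providers then means writing another adapter and changing the wiring at deployment time, while `place_order` stays untouched.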

You're not alone. I worry the same about many things, but everyone just thinks I'm a negative nancy for discounting THIS AWESOME SERVICE with these awesome, 100% non evil people behind it!

I do use AWS, and have tried out GCP before. Just because I use Vultr doesn't mean I can't also use others.

I should have quoted more. :) I was indicating why you might have gotten some early downvotes: your comparison wasn't like for like.

Thanks for the info. You may be right about the downvote reason (though it's a pretty ridiculous reason), but I don't think that matters since they are in the same industry providing a similar service and there's no reason why GCP can't provide the same service as Vultr, especially since they charge a lot more for their instances than Vultr does.

Downvotes here on HN often don't make sense. You can be exactly 100% correct about something and still get downvoted to hell.

The best thing to do is to simply ignore them as asking about downvotes just invites more of them.

Well, I don't really care about the precious internet points disappearing. I'd much rather hear from someone what their reasoning is since I might actually learn something.

But it is telling that there have been at least 5 downvotes but no one is willing to comment as to why.

Edit: since I see you have a downvote (surprise, surprise) I'll clarify that it wasn't me.

Your new downvotes may be because HN guidelines say not to ask about or discuss downvotes.

What is this? Fightclub? "Rule #1 of fightclub: You never talk about fight club" :D

Seems like a recipe for breeding unfairness. "Don't talk about the system", :sigh:

It just generates a bunch of unnecessary comments in a thread. There's already like 8 in this chain for example.

Please don't break the site guidelines by going on about downvotes. That's a strict reduction in signal/noise ratio, which mars your otherwise fine comment. We're trying for the opposite here.

Downvotes can irritate, but there are at least two balms that don't involve adding noise to the threads. One is to remember that people sometimes simply misclick. The other is that unfairly downvoted comments mostly end up getting corrective upvotes from fair-minded community members (as happened with yours here).


Thanks for the info. I forgot about that guideline. Too bad I can't edit my comment.

I've re-opened it for editing for you and will happily delete my comment if you take that bit out.

Thanks. Done.

I'm happy for you to leave your comment. It might help someone else in future.


I’ve seen other companies walk away from Google Cloud for similar reasons. Automate everything to scale doesn’t work for the Fortune 500. They should absolutely own this market.

This is why AWS and Azure continue to gain market share in cloud, while Google remains relatively stagnant, despite (in many cases) superior technology.

Their sales staff is arrogant and has no idea how to sell into F500 type companies.

Source: 10+ meetings, with different clients, I attended where the Google sales pitch was basically "we are smarter than you, and you will succumb". The Borg approach. Someone needs to revamp the G sales and support approach if they want to grow in the cloud space.

Even for small businesses their sales is pretty bad. I once got a package in the mail from them with a URL containing a tracking code printed on it to contact them that was so obviously Google being Google and treating people as part of a funnel. There was no phone number to be found and nothing personalized.

The other funny thing is the package had a neoprene sleeve for a Chromebook. Eventually a sales person reached out via email assuming I owned a Chromebook and acted like I owed them a phone call because they gave me a neoprene sleeve I couldn’t use.

The entire package ended up going in the trash, which was an unfortunate waste of unrecyclable materials.

If you filled in a form at the link provided on one of the bits of paper in the box, they would have sent you a Chromebook for the sleeve. I've got one here gathering dust. My boss threw away the same package but I was curious and looked through it carefully.

Sounds like it functions as some kind of filter, whether intentional or not. ;)

"People who will look through every bit of advertising crap company x sends", vs those who don't.

Something, somewhere is probably making stats on that. ;)

I found it interesting that you wrote "Something, somewhere", and not "Someone, somewhere"

Yeah, that was on purpose. It's no longer obviously just humans potentially doing this. ;)

Yes, this is our experience as well, and the root cause of their many problems with GCP. Tech is nice but matters little if the account team just ignores us.

> "we are smarter than you, and you will succumb". The Borg approach.

Well, that seems to be the approach at Google. Starting with hiring

Not surprising they end up with a hivemind that can't see past their mistakes.

Reminds me of a thread I saw on the Google Inbox mobile app a while back. Brilliant app, but no 'unread message counter'. There was a huge number of people on the thread begging for that feature and going so far as to say that it was the one thing that prevented them from using the app. Their thinking was apparently that you should have filters for everything and it all should've fallen neatly into little boxes, but for people that have been using email 10 times longer than those developers have been out of college, that's not very practical. One G dev chimed in and said 'But that's not how I use email' and closed off the discussion.

That's interesting. I was of the understanding that everything at Google office tries to de-stress you/undistract you. I thought that would result in people being calmer/ more empathetic.

Arrogance is arrogance, whether stressed or not. Hiring arrogant people tends to create an environment non-conducive to empathy.

I have less experience with their sales/account managers but every time I got a super weird patronizing and even cultish vibe that really put me off.

Yes, that's exactly what I'm talking about. They are super arrogant and unwilling to discuss things at a practical level.

And I've seen it cause them to lose at least 10 potentially good sales.

They have advantages but they're so arrogant that it puts people off.

More than 10 times, people have told me they prefer Google's solution to Microsoft's or Amazon's but they're going with a competitor because they can't stand Google's arrogant attitude. It's close to laughable: Google is throwing money away just because they won't back off.


It blows my mind that GCloud, with arguably superior tech and performance compared to AWS/Azure, can't handle support. I have my own horror stories from 2 years ago, but still they haven't fixed it.

Google just doesn't seem to be able to focus on products that require service and customer support. Maybe they just don't care about it while they have an infinite revenue stream from search and advertising. Whatever it is, they should be humiliated.

I love the tech, and the ux details like in browser SSH (AWS hasn't improved UX EVER) but they can't get support right? Amazing.

> Google just doesn't seem to be able to focus on products that require service and customer support

That's literally any product that people pay for (instead of viewing ads).

Customer support isn't and never has been in their DNA. It's often rage-inducing how hard it is to contact a human at Google.

They seem to think they can engineer products that don't need humans behind them.

That's the meme, but my experience with the business support for G Suite does not match it at all: I can easily call the phone support, get a competent human quickly, and they are very helpful.

I think I read recently that they outsourced g suite support

Ironically, they let you know every time they can that you can contact your personal adwords sales person by phone.

The reason for this is obvious, but it's a good point. It's like someone whose personality becomes awful right after you marry them.

I'm actually going to take this back to my company as a principle: "Treat locked-in/subscribing customers as well as our salespeople treat prospects."

I didn't write that article, but last week I came to the same conclusion and began my migration from GCP to AWS. I admire Google's tech but Cloud Platform lacks fit and finish. It's not fully productized. It's not even fully documented. (Is it indolence or arrogance to publish a link to the source code as the only explanation of an important API?) I'm sorry, Google, you ignored me when I was crushing on you. Now I have Amazon.

I think they still are mainly focused on their ad business as the core of the company and cloud is something they 'do on the side'. For Microsoft, Azure is core business, it's the future of the company. If they fuck it up, they're dead. Google apparently doesn't see their cloud offering as their core business and therefore doesn't get the attention it needs.

In my limited experience, Google has worse support than facebook (when it comes to advertising agencies). They simply don't care because you are a tiny multimillion euros company and they are THE GOOGLE.

AWS has acquired Cloud9 for an in-browser desktop environment/SSH; it inherits your account credentials. You should check it out.

Yeah, Cloud9 is billed as an IDE, but it's really more useful as a terminal inside your cloud environment that happens to have a text editor. Workspaces has been great for a cloud-based development environment, and the new Linux Workspaces will be more useful than the Web-based "cloud IDEs".

I haven't tried workspaces. Will I get better performance from my chromebook than just using cloud9? Cloud9 is also incredibly cheap.

They are very different things: Workspaces runs a full desktop environment (Windows or Linux) on an EC2 instance, and enables you to remotely access it through client software. The client software uses Teradici PCoIP, rather than VNC or RDP, and Teradici is amazing: it is so fast that the desktop feels like it is running on your local computer.

This means that you can run whatever development tools you want on the EC2 instance, rather than the very limited code editor that Cloud9 provides. You can easily run a full copy of Visual Studio on a Workspace, and get the full resources of an EC2 instance with SSD drives.

If it can make you feel any better, the AWS support is the same.

AWS sends you emails 9 months in advance of needing to restart individual EC2 instances (with calm, helpful reminders all the way through). IME, they're also really good about pro-active customer outreach and meaningful product newsletters... Even for tiny installations (ie less than $10K yearly).

Anecdotally: I've been an MS gold partner in a bunch of different contexts for years. The experiences I had as 'small fish' techie with AWS were on par or better. YMMV, of course, but I'd be more comfortable putting my Enterprise in the hands of AWS support than MS's (despite MS being really good in that space).

It costs a pretty penny, but I’m very happy with AWS enterprise support. When we had a ticket that we didn’t escalate get a crappy answer, our TAM escalated on his own initiative to get us a better answer.

Are you on insiders? Only way to get anyone to care.

Let's see if the current logging thread will result in anything. :-)


It's a referral-only group where you get to play with and complain about their tech before everyone else does

How would one go about getting an invite to the secret club?

Two people either on the list already or working for Google have to endorse you.

Haha :D

Higher paying (premium) customers actually get premium support in GCP, e.g. SRE who can get paged on an outage.

Prices seem high though: https://cloud.google.com/support/?options=premium-support#op...

So piggybacking on this, I have a similar story to tell. We had a nice young startup, infra entirely built out on Google Cloud. Nicely, resiliently built, good solid stuff. Because of a keyword monitor picked up by their auto-moderation bot, our entire project was shut down immediately, and we weren't able to bring it up for several hours. Thank god we hadn't gone live yet, as we were then told by support that because of the grey area of our tech, they couldn't guarantee this wouldn't keep happening. And in fact they told us straight out that it would and we should move.

So maybe think about which hosting provider to go with. Don't get me wrong, I like their tech. But their moderation does need a more human element; to be frank, all their products do. Simply ceding control to algorithmic judgement just won't work in the short term, if ever at all.

I’m starting to favour buying physical rack space again and running everything 2005 style with a light weight ansible layer. As long as your workload is predictable, the lock in, unpredictability, navigation through the maze of billing, weird rules and what-the-fuckism you have to deal with on a daily basis is merely trading one vendor specific hell for another. Your knowledge isn’t transferable between cloud vendors either so I’d rather have a hell I'm totally in control of and of which the knowledge has some retention value and will move around vendors no problems. You can also span vendors then thus avoiding the whole all eggs in one basket problem.

Hybrid is what you are looking for. Have a rack or two for your core and rent everything else from multiple cloud vendors, integrated with whatever orchestration you are running on your own racks (K8s? DC/OS? Ansible?).

Or just two DCs in active/active.

Still works out cheaper for workloads than AWS does even factoring staff in at this point.

AWS always turns into cost and administrative chaos as well unless it is tightly controlled which in itself is costly and difficult the moment you have more than one actor. GCP probably the same but I have no experience with that. Very much more difficult to do this when you have physical constraints.

Two man startup, perhaps but I think the transition should go:

VPS (linode etc) for MVP, colo half rack, active/active racks two sites then scale out however your workload requires.

More importantly, there is a wealth of competent labor in the relatively stable area of maintaining physical servers (both on the hardware and software side). The modern cloud services move fast and break things, leading to a general shortage of resources and competent people. As a business, even if slightly more expensive initially, it makes more sense to start lower and work up to the cloud services as the need presents itself.

You can federate Kubernetes across your own rack and one or more public cloud providers.

You can but that’s another costly layer of complexity and distribution to worry about.

One of the failure modes I see a lot is failing to factor in latency in distributed systems. Mainly because most systems don’t benefit at all from distribution and do benefit from simplification.

The assumption on here is that a product is going to service GitHub or stackoverflow class loads at least, but literally most aren’t. Even high profile sites and web applications I have worked on tend to run on much smaller workloads than people expect. Latency optimisation by flattening distribution and consolidating has higher benefits than adopting fleet management in the mid term of a product.

Kubernetes is one of those things you pick when you need it not before you need it. And then only if you can afford to burn time and money on it with a guaranteed ROI.

Sure. The idea is that you get the benefits of public cloud and cost savings of BYO hardware for extra capacity at lower cost. Of course, you're now absorbing hardware maintenance costs as well. I haven't seen a cost breakdown really making a strong case one way or the other, but my company is doing it anyway.

Have you actually done this, or are you repeating stuff off the website? Because everyone I've talked with about kubernetes federation says it's really not ready for production use.

The approach we have taken is to create independent clusters with a common load balancer.

Basically, the LB decides which kubernetes cluster will serve your request and once you're in a k8s cluster, you stay there.

You don't have the control-plane that the federation provides and a bit of overhead managing clusters independently, but we have automated the majority of the process. On the other hand, debugging is way easier and we don't suffer from weird latencies between clusters (weird because sometimes a request will go to a different cluster without any apparent reason <-- I'm sure there's one, but none that you could see/expect, hence debugging).
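The "once you're in a cluster, you stay there" behaviour can be sketched as deterministic routing on a session key. This is only an illustration under assumptions: the comment above doesn't say how their load balancer actually decides, and the cluster names and the `pick_cluster` helper are hypothetical.

```python
import hashlib
from typing import Sequence

# Hypothetical cluster names; in practice these would be the independent
# Kubernetes clusters sitting behind the shared load balancer.
CLUSTERS = ["cluster-a", "cluster-b", "cluster-c"]


def pick_cluster(session_id: str, clusters: Sequence[str]) -> str:
    """Map a session to one cluster, always the same one for the same id.

    Uses a stable hash (SHA-256) rather than Python's built-in hash(),
    which is randomized per process and would break stickiness across
    load balancer instances or restarts.
    """
    if not clusters:
        raise ValueError("at least one cluster is required")
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(clusters)
    return clusters[index]
```

Note that plain modulo hashing reshuffles most sessions whenever the cluster list changes; a cookie set by the load balancer or consistent hashing would avoid that, at the cost of a little more machinery.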

My people's time is more important than your complex system.

Ha. It's in process. Not ready yet. I'll report back if we fail miserably.

Federation v1 is legacy now. The new architecture is called MultiCluster and designed to work on top of K8S rather than having a leader cluster: https://github.com/kubernetes/community/tree/master/sig-mult...

That's exactly what we are thinking too. We've looked HARD into AWS/GCP/Azure, but for all the reasons you mentioned we don't want to go that route. Owning the entire stack is so much cheaper, both money and time wise.

Have you looked at OCI bare metal shapes? [1] Oracle Cloud provides the server, and you control the stack end to end (including the hypervisor).

If you run into an issue, send me a note and I will get someone to reply to your issue.

1. https://cloud.oracle.com/compute/bare-metal/features

This needs more upvotes

I can tell a similar story with Amazon MWS, where even if we had access to "human support", it felt like talking to some bad ML, not understanding what we were saying. Ultimately that start up was disbanded, never violating any rule they had, but flagged because of a false positive, and we couldn't even prove we didn't violate anything because we didn't even go live yet. It felt Kafkaesque, punishing one of a myriad possible intents due to malfunctioning ML, with no recourse.

Maybe support just needed to satisfy their quota of kicked out companies for the month, who knows?

Lol. That's definitely a possibility.

Is it only me, or does it seem that if you are not a "famous" person who has a lot of public visibility and is able to create pressure through a tweet or blog post, you are lost: no number to call, no mail to write? Over the years I saw a lot of similar stories, YouTube or in general "Google accounts" blocked for no clear reason and no way to contact somebody to solve the issue... kinda scary...

> Because of a keyword monitor picked up by their auto-moderation bot

Can you elaborate on that? What do they monitor with the moderation bot?

The point is that anyone could fall into that category when laws change.

Imagine you're running a cosplay community, and all of a sudden all your content is being deleted because the SESTA/FOSTA bill gets passed in a country where your "cloud" happens to reside: https://hardware.slashdot.org/story/18/03/25/0614209/sex-wor...

"because of the grey area of our tech"

"told us straight out that it would and we should move"

Sounds shady. I bet this would make more sense if OP explained what his company actually does.

Exactly -- there is a lot the OP isn't telling us. Maybe Google was right to shut them down.

I'd rather my cloud provider err on the permissive side. Preemptively shutting suspicious things down without an external complaint seems a bit much…

Well, there are all kinds of grey area stuff. One fairly obvious example is various security services, which have a wide variety.

Not everything is outright "likely to get banned" (eg pron things). ;)

Yup, agree. OP was probably doing something against the terms. Care to provide details?

Well, "grey" can mean a lot of things when you are talking about the same company that moderates Youtube.

I know they specifically ban cryptocurrency mining on their free credit / tier. Even called out on their public product pages.

I assumed they could tell that via CPU usage, which they already monitor for quotas.

+1 - I'm curious as well.

This occasionally happens with gsuite users as well. Businesses lose access to all their email and documents.

Good times.

I've gotta wholeheartedly disagree. I've never encountered this on GCE. I run a DevOps consulting company and for standard EC2/machines I much prefer GCP. It's not even close. AWS for the most part has little or no user experience testing on its UIs and developer interfaces. AWS region-specific resources are a nightmare; billing on GCP with sustained use and custom machine types is vastly superior. Disks are much easier to grok: no provisioned IOPS, EBS optimized, or enhanced networking hoopla.

By chance are you located outside of the United States? These are not downtime issues, but anti-fraud and finance issues.

I've noticed that over the last few years it's become increasingly difficult to do things with US based services (especially banking) if you are outside of the US. And this goes double if you are a US citizen with no ties to the States other than citizenship. Americans as a general rule have never been terribly adept at anything international; banking, languages, or even base geography. We have offices in Cambodia and Laos and I have been told by more than one US based-service/company that Laos is not a real country. I suppose they think the .la domain stands for Los Angeles :) We are looking to set up an office in Hong Kong or Singapore and use that to deal with Western countries. But we're a small not-for-profit operation and HK and Singapore are EXPENSIVE.

What gray area are you in?

> the grey area of our tech


Blockchain for sure

because of the grey area of our tech

The nature of the tech in question seems important in this story.

I am really curious, what was the business?

Thanks for sharing, I thought maybe it was a one off - it helps to avoid similar issues and luckily there is plenty of cloud competition.

Was it porn or cryptos?

Sounds like "we were doing something sketchy, got caught, but somehow it isn't our fault".

It can happen to you too: you get hacked, hackers run arbitrary code in your account

If your cloud services account was hacked, you'd most likely be thanking Google or Amazon for stopping the services.

Stopped, yes. Deleting the project if the credit card account holder cannot be reached for a photo ID within 3 days might be an over-reaction, though.

I hope there is a possibility to put a backup contact person / credit card so organisations can deal with people going on vacation or being sick or whatever.

IMHO this should be nicely documented as any other technical material you get to learn about the cloud product when you create an account (e.g. important steps to ensure your account remains open even in case of important security breaches, yadda yadda it's possible we'll need a way to prove that you are you yadda yadda, this can happen when yadda yadda, be prepared, do yadda yadda)

I agree that it seems like an over-reaction. But on an account with intense usage, a single credit card on file, no backup, and a fraud warning it does seem very suspicious.

AFAIK, Google Cloud credit card payments are processed through Google Pay, which supports multiple credit cards, debit cards, bank accounts, etc.

Ideally, in this case the company shouldn't be using the CFO's credit card, but should have entered into a payments agreement with Google, receiving POs, invoices and so on, including a credit line.

Never set up a crucial service like you'd set up a consumer service.

yes that's a very good description of the best practices that sadly many companies are not really following.

In many situations the "right thing" must be explained, otherwise when people fail to get it they can argue that wasn't really the right thing after all (sure that's ultimately because they just want to deflect the blame from themselves; so don't let them! clearly explain the assumptions under which anti-fraud measures are operating so people cannot claim they didn't know)

> Nicely, resiliently built, good solid stuff.

Erm ... no, evidently not?

Projects in the context of GCP can encompass all the necessary infrastructure to build a highly available service using standard practices. There's no indication anywhere from GCP themselves that a project could be a domain of failure. If asked, I doubt they would consider it as such.

A prudent person might consider a cloud provider to be a domain of failure and choose a multi-cloud option, which would probably be the correct way to address this resiliency issue. However, that's not really an appropriate approach for an early stage startup, where availability is generally not that much of a concern.

In other words: It wasn't resiliently built stuff.

Is an exploding car safe because it is built by an early stage startup?

Just because you decide that implementing resiliency isn't a good business decision for some early stage startup, doesn't magically make the product resilient, it just isn't and that may be OK.

There are many options to choose from for implementing resiliency, it could be having multiple providers concurrently, it could be having a plan for restoring service with a different provider in case one provider fails, it could be by setting up a contract with a sufficiently solvent provider that they pay for your damages if they fail to implement the resiliency that you need, whatever. But if you fail to consider an obvious failure mode of a central component of your system in your planning, then you are obviously not building a resilient system.

Edit: One more thing:

> There's no indication anywhere from GCP themselves that a project could be a domain of failure. If asked, I doubt they would consider it as such.

Then you are asking wrong, which still is your failure if you are responsible for designing a resilient system.

If you ask them "Is a complete project expected to fail at once?", of course they will say "no".

That's why you ask them "Will you pay me 10 million bucks if my complete project goes offline with less than one month advance warning?", and you can be sure you will get the response to the problem that you are actually trying to solve.

> A prudent person might consider a cloud provider to be a domain of failure and choose a multi-cloud option, which would probably be the correct way to address this resiliency issue. However, that's not really an appropriate approach for an early stage startup, where availability is generally not that much of a concern.

If you replace "multi-cloud" with "multi-datacenter" (in the pre-cloud days), this premise is fairly unassailable. In those same days, applying it to "multi-ISP", it becomes more arguable.

Today, though, the incremental cost (money and cognitive) of the multi-cloud solution, even for an early startup, doesn't seem like it would be high enough to make the notion downright inappropriate to consider.

I'd even argue that if a cloud provider makes the lock-in so attractive or multi-cloud so difficult, that's a sign not to depend on those exclusive services.

choose a multi-cloud option

The economics don’t work out if you are trying to do this with just vanilla VMs across AWS, GCP and Azure and managing yourself. You either do it the old fashioned way renting rack space and putting your own kit in, or you make full use of the managed services at which point - by design - you are locked in.

This is very concerning but can happen on AWS as well. July 4th last year at about 4PM PST Amazon silently shut down our primary load balancer (ALB) due to some copyright complaint. This took out our main API and several dependent apps. We were able to get a tech support agent on the phone but he wasn't able to determine why this happened for several hours. Eventually we figured out that another department within Amazon was responsible for pulling down the ALB in an undetectable way. Ironically we are now in the process of moving from aws -> gcp.

My coworker is running a hosted affiliate tracking system on AWS as part of our company. He regularly has to deal with AWS wanting to pull our servers because of email spam -- not because we're sending spam emails, but because some affiliate link is in a spam email that resolves to our server, and Spamhaus complained to AWS.

Usually this can get handled after a few days of aggravating emails back and forth, we get our client to ban the affiliate in question, and move on with our days with no downtime. But a few weeks ago my coworker came in to find our server taken offline, because AWS emailed him about a spam complaint on a Friday night, and they hadn't gotten a response by Sunday. It'd been down for hours before he realized.

They'd just null-routed the IP of the server, so he updated IPs in DNS real quick, but he then spent half a day both resolving the complaint and getting someone at AWS to say it wouldn't happen again. They supposedly put a flag on his account requiring upper management approval to disable something again, but we'll see if that works when it comes up again.

You're going to have to go multi-cloud if you truly want to insulate yourselves from this sort of problem.

If and when you do, give serious consideration to how you handle DNS.

Fwiw, Ansible makes the multicloud thing pretty straightforward as long as you aren’t married to services that only work for a specific cloud provider.

For that, you should consider setting up multiple accounts to isolate those services from the portable ones.

Wouldn't that be Terraform (perfect for setting up cloud infrastructure) vs. Ansible (can do all, but more geared to provisioning servers you already have)?

Ansible uses Apache Libcloud to run just about anything you need on any cloud provider in terms of provisioning. Once provisioned, it will handle all of your various configuration and deployment on those.

Also plays really nicely with Terraform.

How does Ansible make it straightforward? As far as I know, it doesn't help with networking failover, load balancing, data consistency, or other aspects of distributed systems, and running one application across clouds is certainly a distributed systems problem, not a deployment problem.

Ansible helps deploy software, but deploying software is the smallest problem of going multi-cloud.

See reply to other comment.

I know what ansible is and can do. Your other comment is about how it can provision and deploy things. While true, it's unrelated to my point that that's the least of your problems in a multi-cloud world.

A lot of that depends on scale too. I was mostly talking about the ability to standardize configuration so that you could replicate your infrastructure on multiple providers. Essentially just making sure that you have a backup plan/redundancy in case something happens and you find yourself needing to spin things up elsewhere on short notice.

You're absolutely right that running them at the same time, data syncing, traffic flow, etc is much more complicated.

Also check out Mist.io. It's an open source multi-cloud management platform that abstracts the infrastructure layer to help you avoid the lock-in.

Disclosure: I'm one of the founders.

What's the difference between mist.io and Apache's libcloud?

Ansible is great at doing the things that Ansible does!

What are popular multi cloud solutions if you use AWS or GCP services that have proprietary APIs? Are there frameworks that paper over the API differences?

mist.io supports most public and private cloud platforms. Also, it's open source https://github.com/mistio/mist-ce

What's the difference between mist.io and Apache's libcloud?

Apache libcloud is a Python library that's used primarily to create, reboot & destroy machines in any supported cloud.

Mist.io is a cloud management platform that uses Apache libcloud under the hood. It provides a REST API & a Web UI that can be used for creating, rebooting & destroying machines, but also for tagging, monitoring, alerting, running scripts, orchestrating complex deployments, visualizing spending, configuring access policies, auditing & more.

What's the DNS solution here? Something like Cloudflare or Edge?

The correct DNS solution is to use multiple providers.

See: Route53 and Dyn outages in the past couple years.

They shut down just the load balancer?

Forgive my ignorance but that seems like a weird choice rather than cutting access to the servers or in some more formal ways for copyright...

Also kinda concerning that multiple departments can take enforcement type action and others not know it. That seems way disorganized / recipe for disaster.

Whoever reported the violation probably identified the public IP address of the ALB and notified Amazon.

Makes sense but you would think someone at AWS would handle it... more systematically.

> Ironically we are now in the process of moving from aws -> gcp.

Why not Azure? They have a solid platform and (at least for a MSFT partner) their support is top-notch.

I respectfully disagree. I have worked on two projects with Azure both with big accounts, one even so big that we had senior Azure people sitting in our teams. Both had the highest possible support contract.

Yet their support didn't ever solve a problem within their SLA's and sometimes critical level tickets were hanging for months.

Plus my impression is that whereas AWS (and possibly Google) clouds are built by engineers using best practices and logic, Azure products felt always very much marketing driven e.g. marketing gave engineering a list of features to launch and engineering did the minimum effort possible to have the corresponding box ticked. I absolutely hated working on Azure and now won't accept any contract on it.

Documentation is horrible or non-existent, things just don't work, have weird limitations or transient deployment errors, super weird architectural and implementation choices + you never escape the clunkiness of the MS legacy with for example AD.

We did have the same issues back in beta and were forced to build Chaos Monkey degrees of robustness into our platform. Was this experience of yours a while back? However, there are now a few people at work who even run VMs on it as their daily driver.

> Yet their support didn't ever solve a problem within their SLA's

What does this mean?

Service Level Agreements dictate the quality, availability, and responsibilities with the client. They put bounds on how long things will take to get answered, and sometimes fixed.

GP is saying that even though they had a contract to resolve issues within X hours/days the issues were not being solved within X hours/days.

Cynically: most SLAs with the 'Big Boys' tend to give guarantees about getting an answer, not a solution. "We are looking into the problem" may satisfy the terms of a contract, but they don't satisfy engineers in trouble.

I know what an SLA means, but I have never seen an SLA from Azure dictating a guaranteed resolution time. They only give an SLA for time until initial reply, as far as I know. I suspected the person I replied to had misunderstood what it is they have purchased. Maybe in some cases, if you pay them some obscene amount of money, you can purchase an SLA for time until resolution, but I don't think that's the case here.

Can't you end-to-end encrypt your data, so that Amazon can't run their copyright filters over them?

In this case the complaint was against some image(s) we were publicly hosting. We've taken steps to isolate our file hosting from the rest of the system in case this were to happen again. We only host images for fashion blog posts written by staff so I imagine other aws customers have had a much worse time in this regard.

So the copyright claim was legitimate?

No truly production and especially revenue critical dependency should go on the card. Have your lawyer/licensing person sign an agreement with them with an actual SLA and customer support. If it’s not worth your time you shouldn’t complain when you lose it.

That's a great point. These cloud hosting companies don't make this a natural evolution though, because there's no human to talk to, you start tiny and increase your usage over time. But every company depending on something and paying serious money should have a specific agreement. I wonder if this could still happen though, even if you have a separate contract.

> These cloud hosting companies don't make this a natural evolution though, because there's no human to talk to

This is not true at all. Once you start spending real money on GCP or AWS, they will reach out to you. You will probably sign a support contract and have an account manager at that point. Or you might go with enterprise support where you have dedicated technical assets within the company that can help with case escalation, architecture review, billing optimization, etc.

It makes sense that would happen. So they just didn't have the contact info for the people here? Maybe they just were spending a little, but their whole business still depended on it.

Once you hit a certain spend, Google contacts you and asks you to sign a thing.

There's a mismatch between how much you spend and how much business value is there. The spend for management systems of physical infrastructure like wind turbines is tiny relative to revenue compared to the typical pure software company, especially freemium or ad-driven stuff where revenue-to-compute ratio is very low. Calibrating for this wouldn't really be in Google's DNA.

Amen to that. Once you reach 1,000 USD monthly you can switch to a regular invoiced account (subject to verification) and you have a dedicated account manager.

Yeah I couldn't get my head around this bit:

> What if the card holder is on leave and is unreachable for three days? We would have lost everything — years of work — millions of dollars in lost revenue.

Indeed, presumably they were then also at the mercy of the credit card company cancelling or declining the card at the critical billing renewal moment.

I've yet to experience any subscription service that immediately shuts off access due to a declined card.

Certainly not; no reputable service provider will cut you off at the first credit card decline.

Is this possible for GCP? Definitely seems the way to go.

Yes they have invoice-based billing that you can apply for: https://cloud.google.com/billing/docs/how-to/invoiced-billin...


This is a standard risk with any attempt to remain anonymous with a supplier. The supplier, since they don't know you, and therefore can't trust you, will not offer much credit.

Cards get skimmed all the time. When a card gets skimmed, the issuer informs everyone who is making recurring purchases with that card "Hey, this card was skimmed, it's dead".

If someone has a recurring charge attached to that account, the recurring charge will go bad. If this is an appreciable number of cloud services which are billed by the second, this can happen very, very quickly and without you knowing. Remember, sometimes the issuer informs you that the card was skimmed, which you will receive after all the automated systems have been told.

So, the cloud provider gets the cancel, and terminates the card. It then looks around sees the recurring charge, takes a look at your servers racking up $$ they can't recoup and the system goes "we don't know this person, they buy stuff from us, but we haven't analysed their credit. Are they good for the debt? We've never given them credit before. Better cut them off until they get in touch."

If only they had signed an enterprise agreement and gotten credit terms. It could still be paid with a credit card, but the supplier would say "They're good for $X, let it ride and tell them they'll be cut off soon". They can even attach multiple methods of payment to the account, where, for example, a second card with a different bank is used as a backup. Having a single card is a single point of failure in the system!

In closing, imagine you're a cryptocoin miner who uses stolen cards to mine on cloud services. What does that look like to the cloud provider?

Yep, someone signs up for cloud services, starts racking up large bills and then the card is flagged as stolen.

It looks like they used a personal GCP account for their multi-million dollar business.

Would be interested to see what would've happened if they had used a business account.

While not a cloud platform, I had an experience along the same vein with Stripe.

We're a health-care startup, and this past Saturday I got an email saying that due to the nature of our business, we were prohibited from using their payment platform (Credit Card companies have different risk profiles and charge accordingly--see Patreon v Adult Content Creators).

Rather than pull the plug immediately, they offered us a 5-day wind down period, and provided information on a competitor that takes on high-risk services.

Fortunately, the classification of our business was incorrect (we do not offer treatment nor prescription/pharma services), and after contacting their support via Email & Twitter, we resolved the issue in less than 24 hours.

So major kudos to Stripe for protecting their platform, WHILE also trying to do the right thing for the customers who run afoul of the service agreement.

Please remember Google Cloud is a multi-tenant public cloud and in order to run a multi-tenant environment providers have to monitor usage, users, billing, and take precautionary measures at times when usage or account activity is sensed to be irregular. Some of this management is done automatically by systems preserving QoS and monitoring for fraud or abuse.

This seems like a billing issue. If they had offline billing and monthly invoicing (enterprise agreement) I do not believe this issue would have happened.

If you are running an enterprise business and do not have enterprise support and an enterprise relationship with the provider, you may be doing something wrong on your end. It sounds like the author of this post does not have an account team and hasn't taken the appropriate steps to establish an enterprise relationship with their provider. They are running a consumer account which is fine in many many cases, but may not be fine for a company that requires absolutely no service interruptions.

IMO, the time this issue was resolved by the automated process (20 mins) is not too bad for consumer cloud services. Most likely this issue could have been avoided if the customer had an enterprise relationship (offline billing/invoicing, support, TAM, account flagging, etc, etc) with Google Cloud.

A "consumer account"? I don't know what you're talking about. This is Google Cloud, not Spotify. I don't know a lot of "consumers" spending hundreds of dollars, thousands of dollars or more per month on Google Cloud. And paying bills by wire transfer instead of credit card doesn't change anything to the issue discussed here.

I'm a hobby programmer who runs a few small projects on GCP. My personal spending is smaller than OP's, but as mentioned elsewhere, once you hit a certain threshold, they will contact you to offer switching to a business account. Obviously they're not gonna force you to switch if you don't want to, but then don't complain about not getting business level support.

Hundreds or thousands is easily spent by 'consumers' on any public cloud. Think about it.

Paying bills offline via invoice establishes an enterprise agreement with cloud providers. It does in fact change everything with the issue discussed here. They wouldn't be taken offline due to an issue with the credit card payment.

In Germany, maybe even all of Europe, you need a tax ID which, at least for the type they require, you only get as a business, not as a consumer. I actually tried, because it's a relatively easy way to get a fancy, reliable network. (I kind of admire their global SDN, which can push within 5% of line rate with no meaningful added packet loss, apart from the minimal amount due to random cosmic rays and similar baseline effects.)

They actually relented on that. You can now register "Individual" accounts that don't need a tax ID: https://cloud.google.com/billing/docs/resources/vat-overview

Individual projects without a GCP organization association are probably treated as consumer accounts.

That is correct. CC billing and individual projects could be consumer level usage. Anyone can sign up for this level of account and use it for whatever purpose. Offline invoicing and G Suite / Organization setup / domain validation / enterprise support could be thought of as enterprise and would come with assurances such as billing payment stability.

What are you talking about? Are you representing Google in some way?

We have significant spend on CC with GCP and we’re not “consumers”. Our account manager has no issue with this. If they did we’d move somewhere else.

I can't speak to the specific incident. We've been running almost 400 servers (instances and k8s cluster nodes) for over a year on GCP and we've been quite happy with the performance and reliability, as well as the support response when we have needed it. I did want to address this comment...

> What if the card holder is on leave and is unreachable for three days? We would have lost everything — years of work — millions of dollars in lost revenue.

You should never be in this position. If this were to happen to us we would be able to create a new project with a different payment instrument, and provision it from the ground up with terraform, puppet and helm scripts. The only thing we would have to fix up manually are some DNS records and we could probably have everything back up in a few hours. Eventually when we have moved all of our services to k8s I would expect to be able to do this even on a different cloud provider if that were necessary.

Restarting the service from scratch is one thing, but what about all your data? Some of these services have hundreds of terabytes of data hanging off them, and if Google were to delete that because of some perceived violation of their terms then that is not something you can recover from in a couple of hours, if at all.

This is one of the reasons I always implore people to have a backup of their data with another provider or at least under a different account. That protects against all kinds of accidents but also against malice.

Backup is a thing. If your company is making millions of dollars off your business you should have a redundant backup of everything including (especially) your data.

Yes, we have backups. The problem with `data` is not the fact that I have backups; it's that at a certain scale I will have so much data that "moving providers" could take on the order of weeks.

If OP happened to me, sure yes I could have my entire infra on AWS/Azure/whatever else Terraform supports in an hour, maybe more to replace some of the tiny cloud specific features we use. But if it takes me a day just to move the data into Azure, that's an entire lost business day of productivity.
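
To put rough numbers on that, here is a back-of-the-envelope sketch. The 300 TB volume, the link speeds, and the 80% sustained-throughput figure are all assumptions picked for illustration, not anyone's actual setup:

```python
def transfer_days(terabytes: float, gbit_per_s: float, efficiency: float = 0.8) -> float:
    """Days needed to push `terabytes` of data over a link of `gbit_per_s`.

    `efficiency` is an assumed fraction of line rate that is actually sustained.
    """
    bits = terabytes * 1e12 * 8  # decimal terabytes -> bits
    seconds = bits / (gbit_per_s * 1e9 * efficiency)
    return seconds / 86_400  # seconds per day


if __name__ == "__main__":
    # Hypothetical data volume and link speeds.
    for link in (1, 10):
        print(f"300 TB over {link} Gbit/s: {transfer_days(300, link):.1f} days")
```

With those assumptions it works out to roughly five weeks at 1 Gbit/s and about three and a half days at 10 Gbit/s.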

If it takes weeks then you should choose a second provider where you can show up with your backup hard drives or whatever you use and plug them in. Moving data physically is an option.

"Never underestimate the bandwidth of a semi full of harddrives."

Note that I did not imply that restoring the service to a different project or provider would always be easy or fast (certainly in the case of very large data volumes it would be neither of those things). I was addressing the prospect of losing "years of work" as was stated in the OP. That sort of implies that most or all of what they did over that time is recorded only in the current state of the GCP project that was disabled, and that is a really terrifying position to be in.

People usually end up using backup or move infrastructure on short notice in catastrophic situations, which is presumably rare. Days worth of work to bring back your business in catastrophic downtime - doesn't seem like a bad thing at all to me. If anything, it sounds like a very well organized development flow with very optimistic time-frame.

This vastly simplifies the situation, especially when the cloud is involved. Having a backup, much less a replica of such data requires an enormous infrastructure cost, whether it's your own or someone else's infrastructure. The time to bring that data back to a live and stable state again also is quite costly. (note the stable part)

It's a simple truth that even if you are at the millions of dollars point, there is a data size at which you are basically all-in with whatever solution you've chosen, and having a secondary site even for a billion dollar company can be exceptionally difficult and cost prohibitive to move that sort of data around, again especially when you're heavily dependent on a specific service provider.

Yes, the blame in part lies with making the decision to rely on such a provider. At the same time, there are compelling arguments for using an existing infrastructure instead of working on the upkeep of your own for data and compute time at that scale. Redundancy is built into such infrastructures, and perhaps it should take a little more evidence for the provider to decide to kill access to everything without hard and reviewed evidence.

It might be too expensive for some people. But really there is no other solution other than full backup of everything. Relying on a single point of failure, even on an infrastructure with a stellar record, is just a dead man walking.

And then of course there is the important bit that from a regulatory perspective 'just a backup' may be enough to be able to make some statements about the past but it won't get you out of the situation where due to your systems being down you weren't ingesting real-time data during the gap. And for many purposes that makes your carefully made back-up not quite worthless but close to it.

So then you're going to have to look into realtime replication to a completely different infrastructure and if you ever lose either one then you're immediately on very thin ice.

It's like dealing with RAID5 on arrays with lots of very large hard drives.

About 6 years ago, I was involved in a project where data would increase by 100 GB per day and the database would also change significantly every day. I vaguely remember having some kind of cron bash script with mysqldump and rsync that kept a near-identical offsite backup of the data (we also had daily and monthly snapshots). We also had a near-identical staging setup of our production application, which we would use to restore our application from the near-realtime backup we had running. We had to test this setup every other month; it was an annoying thing to do at first, but we got exceedingly good at it over time. Thankfully we never had to use our backup, but we slept peacefully at night.
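
A scheme like the one described above might look roughly like this. To be clear, this is not the original script (that was bash under cron); it's a minimal Python sketch, and the database name, paths, and remote host are hypothetical. It assumes `mysqldump` and `rsync` are installed and that MySQL credentials come from something like `~/.my.cnf`:

```python
import shlex
import subprocess
from datetime import datetime, timezone


def build_commands(database: str, dump_dir: str, remote: str, now: datetime) -> list[list[str]]:
    """Return the dump command and the offsite copy command, in order."""
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    dump_file = f"{dump_dir}/{database}-{stamp}.sql"
    return [
        # --single-transaction gives a consistent snapshot for InnoDB tables.
        ["mysqldump", "--single-transaction", f"--result-file={dump_file}", database],
        # -a preserves file attributes, -z compresses data in transit.
        ["rsync", "-az", dump_file, remote],
    ]


def run_backup(database: str, dump_dir: str, remote: str) -> None:
    """Run the dump and then the offsite copy; raise if either step fails."""
    for cmd in build_commands(database, dump_dir, remote, datetime.now(timezone.utc)):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in build_commands(
        "shop", "/var/backups", "backup@offsite.example:/srv/dumps/", datetime.now(timezone.utc)
    ):
        print(shlex.join(cmd))
```

Something like `run_backup` would be what cron calls. The part worth copying from the comment is the regular restore test, not the script.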

Backup is a bit of an art in itself, everyone has a different type of backup requirement for their application, some solutions might not be even financially feasible. You might never end up using your backup ever at all, but all it needs is one very bad day. And if your data is important enough, you will need to do everything possible to avoid that possible bad day.

That's a good scheme. Note how things like GCP make it harder rather than easier to set something like that up, you'd almost have to stream your data in real time to two locations rather than to bring it in to GCP first and then to stream it back out to your backup location.

> Backup is a bit of an art in itself

Fully agreed on that, and what is also an art is to spot those nasty little single points of failure that can kill an otherwise viable business. Just thinking about contingency planning makes you look at a business with different eyes.

Yes, I'm aware of that. But you'd be surprised how many businesses are under the impression that using 'the cloud' obviates the needs for backups. Especially if their data is in the 100's of terabytes.

Non-technical owners making faulty assumptions is not the fault of "Cloud" providers. It's probably common (I faced it myself personally, in a non-cloud situation), but there is nothing the providers can do about unprepared users.

> but there is nothing the providers can do about unprepared users.

That's true, but they can do something to avoid making things worse, see the linked article.

While true, I was specifically referring to this part:

> What if the card holder is on leave and is unreachable for three days? We would have lost everything — years of work — millions of dollars in lost revenue.

The comment suggests they are using a personal GCP account instead of an enterprise account.

Millions of dollars worth of work + imply no backup + non-enterprise account (but expecting enterprise support) + not having multiple forms of payment available.

Combining all these together, it seems like all sorts of things are going wrong here.

I have never used GCP (or any of the big three cloud providers), so I don't know how they are in general, but in this specific case there seems to be faulty planning on the user end.

Agreed, that wasn't smart. But, in their defense, this is how these things start out, small enough to be useful, and by the time they get business critical nobody realizes the silly credit card is all that stands between them and unemployment.

Why does it matter whether you're making millions of dollars? If you have any information which you would like to not lose for any reason, back it up in as many formats and locations as is feasible.

Agreed. I mentioned the money angle because I felt the person I replied to implied that 100s of terabytes of data are too expensive to backup.

If you are making money or if the data is too important for you to lose, then you should have a backup; anything else is faulty planning.

> I felt the person I replied to implied that 100s of terabytes of data are too expensive to backup.

Well, you felt wrong. Of course you should back up those 100s of terabytes. In fact, that it is that much information is an excellent reason, on top of all the other ones, to back it up: re-creating it is going to be next to impossible.

It's just that the companies I look at - not all, but definitely some - seem to be under the impression that the cloud (or their cloud provider) can be trusted. Which is wrong for many reasons, not just this article.

I forget where I first saw this quoted, but it's relevant here: "There is no 'cloud', only someone else's computer". That's part of why I store very little data online, compared to most people (or the data I actually have/want). Anything I'm not okay with someone else having on their computer is backed up and stored on hard physical media. No cloud provider can be trusted - the moment the government wants in, they'll get in; and the moment it's considered more profitable for the provider to quietly snoop in your stored data, rest assured that they will.

Sorry, I stand corrected.

No problem, it's just that with 'This is one of the reasons I always implore people to have a backup of their data with another provider or at least under a different account.' that passage I thought I had the backup angle more than covered.

What bugs me about it is that there are some companies that give serious pushback because their cloud providers keep on hammering in to them how reliable their cloud is and that any back-up will surely be less reliable than their cloud solution and oh by the way we also have a backup feature that you can use.

They don't realize that even then they still have all their eggs in the one basket: their cloud account.

It's strange, but I completely missed the last part about backup in your comment. I have no idea how I missed it. Had I seen it, it would have made my comment redundant and I would never have replied at all.

I only saw that part of the comment much much later.

Well, it definitely wasn't added in a later edit, or at least, not that I'm aware of, though I do have a tendency to write my comments out in bits submitted piece-by-piece. Even so, I wouldn't worry about it, I tend to miss whole blocks of text with alarming regularity while reading through stacks of pdfs and when comparing notes with colleagues we always wonder if we've been reading the same documents (they have the same problem...). Reading in parallel is our way of trying to ensure we don't miss anything and unfortunately it is not a luxury.

Often the effects are more subtle, reading what you think something said rather than what it actually said, or missing a negation or some sub-clause that materially alters the meaning of a sentence.

Even in proofreading we find stuff that is so dead obvious it is embarrassing. On the whole visual input for data is rather unreliable, even when reading stuff you wrote yourself, which I find the most surprising bit of all.

Studying this is interesting, and to some extent important to us due to the nature of our business, missing critical info supplied by a party we are looking at could cause real problems so we have tried to build a process to minimize the incidence of such faults, even so I'm 100% sure that with every job we will always miss something, and I live in perpetual fear of that something being something important.

Huh? You can provision new servers, but you can't just easily move over all the data, can you?

Why not? You should have a backup strategy with a business-acceptable RPO/RTO.
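
For anyone unfamiliar with the terms: RPO is how much data you can afford to lose, RTO is how long you can afford to be down. A minimal sketch of checking a backup plan against them, with made-up numbers and hypothetical names (`BackupPlan`, `meets_targets`):

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class BackupPlan:
    backup_interval: timedelta   # time between successful backups
    restore_duration: timedelta  # measured time to restore and go live again


def meets_targets(plan: BackupPlan, rpo: timedelta, rto: timedelta) -> bool:
    """True if worst-case data loss and downtime both fit within the targets."""
    # Worst case, you lose everything written since the last backup.
    worst_case_data_loss = plan.backup_interval
    return worst_case_data_loss <= rpo and plan.restore_duration <= rto


if __name__ == "__main__":
    # Hypothetical plan: nightly backups, six hours to restore.
    nightly = BackupPlan(timedelta(hours=24), timedelta(hours=6))
    print(meets_targets(nightly, rpo=timedelta(hours=1), rto=timedelta(hours=8)))
    print(meets_targets(nightly, rpo=timedelta(hours=24), rto=timedelta(hours=8)))
```

The restore duration only means something if it comes from an actual restore test rather than a guess.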

Easily? That's in the eye of the person doing the work I guess. But we have backups and could restore our databases.

I wrote a similar comment; that's what best practices are for.

This fraud flag is caused by your credit card being found in a leaked list of card numbers somewhere.

They suspect you are a fraudster because you are using a stolen card.

Either sign a proper SLA agreement with Google (which gives you 30 days to pay their bills by any form, and therefore you get 30 days notice before they pull the plug), or have two forms of payment on file. Preferably, don't use your GCP credit card at dodgy online retailers too...

Or you know, Google could have emailed them, told them exactly that and waited for a response before pulling the plug on the servers.

While you make sense from Google's PoV, it doesn't from the customer's PoV. As Google is a big corp, it's IMHO better to side with the customer here, as next time it might be you who's getting screwed over by Google/other corp.

> Preferably, don't use your GCP credit card at dodgy online retailers too...

Or at gas stations, ATMs, or any other place where someone can install a skimmer.

> Either sign a proper SLA agreement with Google

How do you do that?

At least someone gets it.

I may be missing something, so help me out here... I get the impression that the author was not told the precise reason why the activity was suspicious. Wouldn't a precise error message, if not actually a human interface, have been helpful? Why the generic "suspicious activity" warning?

It seemed very Kafkaesque to me, getting tried and convicted without any mention of the crime or charge. I think the author is justified in his disapproval.

So, you think this is okay?

I can echo the sentiment here. A few times they have broken backward compatibility, causing a production outage without us even deploying anything new. For example, the BigQuery client library suddenly started breaking because they had rolled out some changes to the API contract the library was calling. When we reached out to support they took it very lightly, asking why we were even using "the ancient version of library". OK, fair enough, we upgraded the library to the recommended version, but alas! the Dataflow library started breaking due to this new upgrade. For the next few hours support just kept playing binary search for a version that was compatible with both BigQuery and Dataflow while production was down.

The worst part is that when we did post morterm and asked Google why the support resolution was so slow despite being "the privileged" customer, their answer was that the P1 SLA was only to respond within 15 minutes there is no SLA for resolution. Most of the "response" that were getting was that a new support guy has taken over in a new time zone which is the most useless information for us.

We are seriously thinking of moving to another cloud vendor.

In my experience, the support from the other clouds is equally useless if not worse.

AWS would never admit that anything is wrong from their side.

I wonder how prevalent this behavior is. Mozilla behaves the same way towards browser extensions, which our business depends on. They removed our extension multiple times, each time before asking us for something different, be it uncompressed source code, instructions for how to build it, a second privacy policy separate from our site's policy, and more. Each time we would have happily responded to a request promptly, but instead you find out when you’ve been shut down already.

Grace periods that respect your business should be a standard that all service providers hold themselves to.

It sounds to me like Mozilla identified your extension as potentially malicious and, prioritizing user safety, shut you down first.

As far as I know, Mozilla has no business relationship with extension developers, so I would actually be very concerned if their first action wasn't to cut you off.

I can confirm Mozilla handles this very poorly. I had the exact same experience with them. It was so bad that I actually just left the extension off their store and now focus on Chrome.

There is nothing dodgy about the extension. Mozilla was just being ridiculous.

What was the extension? Specifically.

A companion extension to my price comparison website.

That entire class of browser extensions is shady. Do you make money on referrals to shopping sites?

Not from the extension (not that it would be against Mozilla's ToS if it did). It has other nice features to make our users' lives better.

Thank you for judging my business without even knowing it.

Browser extensions that say they help with comparison shopping are a very common type of "Potentially Unwanted Application" (PUA - aka malware with a legal team). The infamous Superfish is an example of this type of thing, and there are many others.

I don't know anything about your business or the extension, I'm just pointing out that you're in a space that makes you suspicious by association.

Fair enough. But this has nothing to do with Mozilla's actions. It was as GP said. It includes things like their incompetence in dealing with a build process that creates transpiled/minified code. Even when I gave them all the source and the build instructions (npm run build) they still couldn't comprehend what was going on. Yes, I know it's strange since Mozilla makes a browser with a JavaScript engine.

Edit: I should add that after 2 weeks of back and forth emails the dude was finally able to build it then blamed me for not mentioning he needed to run "npm run build", even though I did mention it AND it's in package.json AND it's mentioned in the (very short and concise) readme.txt.

So after this exasperating experience he just took down the extension without warning and said it's because it contains Google Analytics.

I would have happily removed Google Analytics from the extension. The dude had my source for 2 weeks and could have told me about that at any time, but decided to tell me after 2 weeks of mucking around, after he had already removed the extension.

It was me that decided it was not worth the hassle to have the extension on their store. I just left it off.

Maybe link to the extension on the Chrome store, so people can see (if?) it's legit from their PoV? :)

I don't want anyone to misunderstand this as an advertisement. And it's an Australian website. If you're still keen shoot me a PM.

Nah, not that keen personally (I don't even use Chrome). I was just pointing out that it would have been useful to have the URL to reduce confusion. :)

The extension isn't the main event. The website is.

The extension is currently a proof of concept that I plan to revisit later.

I wonder if OP paid for Support? https://cloud.google.com/support/?options=premium-support#op...

And had they converted their project to monthly invoicing: https://cloud.google.com/billing/docs/how-to/invoiced-billin...

What difference does that make? There's no justification for intentionally shutting down a potentially critical service with no warning.

IIRC, a couple of things:

* When you have invoicing set up, the above shouldn't happen. You need to keep a payment method in good standing, but you have something like 10 days to pay your bill. -- They do a little bit more vetting (KYC) on the invoice path, and that effectively gets you out of Dodge.

* Without paying for premium support, there's effectively no support.

I think if someone didn't pay their bill on time, you might shut off their service too, wouldn't you?

> if someone didn't pay their bill on time

What does that have to do with anything? The account was not shut down for non-payment, it was shut down because of unspecified "suspicious activity."

But even in case of non-payment I would not shut down the account without any warning. Not if I wanted to keep my customers.

Most hosting providers give 72 hours to pay if a method fails.

Then they unplug the Ethernet cable and wait a week or two.

But as you said, this isn’t about non-payment.

You are correct sir

"Oh hey, it looks like $customer suddenly started a bunch of coinminers on their account at 10x their usual usage rate. Perfectly fine. Let them rack up a months billing in a weekend; why not?"

A hypothetical but not unheard of scenario in which immediate shutdown might be warranted.

It's a rough world and different providers have optimised for different threat models. AWS wants to keep customers hooked; GCP wants to prevent abuse, Digital Ocean wants to show it's as capable as anyone else.

If you can afford it, you build resilient multicloud infrastructure. If you can't yet do that, at the very least ensure that you have off-site backups of critical data. Cloud providers are not magic; they can fail in bizarre ways that are difficult to remedy. If you value your company you will ensure that your eggs are replicated to more than one basket and you will test your failover operations regularly. Having every deploy include failing over from one provider to another may or may not fit your comfort level, but it can be done.

> A hypothetical but not unheard of scenario in which immediate shutdown might be warranted.

Not without warning, no. It is possible that the customer intended to start a CPU-intensive process and fully intended to pay for it.

Send a warning first with a specific description of the "suspicious activity" and give the customer a chance to do something about it. Don't just pull the plug with no warning.
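A warn-first policy like the one argued for here can be sketched as a small decision function. Everything below is hypothetical: the 10x spike threshold is borrowed from the coinminer example upthread, the 72-hour grace period from the hosting-provider comment, and the names (`Account`, `next_action`) are made up for illustration. It does not describe how GCP or any other provider actually works.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional

# Assumed policy knobs, not taken from any real provider.
SPIKE_RATIO = 10.0                   # "10x their usual usage rate"
GRACE_PERIOD = timedelta(hours=72)   # time the customer gets to respond


class Action(Enum):
    NONE = "none"        # usage looks normal, or the customer vouched for it
    WARN = "warn"        # send a specific description of what looked suspicious
    WAIT = "wait"        # warning already sent, grace period still running
    SUSPEND = "suspend"  # no response within the grace period


@dataclass
class Account:
    baseline_hourly_spend: float
    warned_at: Optional[datetime] = None
    customer_confirmed: bool = False


def next_action(account: Account, current_hourly_spend: float, now: datetime) -> Action:
    """Decide the next step; the caller records warned_at after sending a warning."""
    if account.baseline_hourly_spend <= 0:
        raise ValueError("baseline_hourly_spend must be positive")
    spiking = current_hourly_spend >= SPIKE_RATIO * account.baseline_hourly_spend
    if not spiking or account.customer_confirmed:
        return Action.NONE
    if account.warned_at is None:
        return Action.WARN
    if now - account.warned_at >= GRACE_PERIOD:
        return Action.SUSPEND
    return Action.WAIT
```

The point of the sketch is only the ordering: suspension is reachable solely through a prior warning plus an expired grace period, and a customer who confirms the usage is intentional never reaches it.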

> Let them rack up a months billing in a weekend

Yes, there's nothing wrong with that. You have their credit card and can even authorize certain amounts ahead of time to make sure it can be charged.

This doesn't help if the spending is fraudulent, either because the CC is actually stolen or because it will be disputed or what have you.

"If you can afford it"

There's a degree of complexity that comes with multi-cloud that's ill-suited for most early stage companies. Especially in the age of "serverless" that has folks thinking they don't need people to worry about infrastructure.

My point is that the calculus has more to it than just money. The prudent response, of course, is to do as you described. Have a plan for your provider to go away.

Offsite backups and the necessary config management to bring up similar infra in another region/provider is likely sufficient for most.

> There's a degree of complexity that comes with multi-cloud that's ill-suited for most early stage companies. Especially in the age of "serverless" that has folks thinking they don't need people to worry about infrastructure.

Perhaps we'll start seeing a new crop of post-mortems from the "fail fast" type of startups failing due to cloud over-dependency issues. They're (presumably rare) edge cases, but easily fatal to an early enough startup.

> There's a degree of complexity that comes with multi-cloud that's ill-suited for most early stage companies. Especially in the age of "serverless" that has folks thinking they don't need people to worry about infrastructure.

I just heard a dozen founders sit up and think "Market Opportunity" in glowing letters.

CockroachDB has a strong offering.

But multi-cloud need not be complicated in implementation.

A few ansible scripts and some fancy footwork with static filesystem synchronization and you too can be moving services from place to place with a clear chain of data custody.

A few ansible scripts? Nah.

Everything I have runs in kubernetes. The only difficulty I have to deal with is figuring out how to deploy a kubernetes cluster in each provider.

From there, I write a single piece of orchestration that will drop my app stack in any cloud provider. I'm using a custom piece of software and event-driving automation to handle the creation and migration of services.

Migrating data across providers is hard as kubernetes doesn't have snapshots yet.

There are already a lot of startups in this space doing exactly the kind of thing that I just described. Most aim to provide a CD platform for k8s.

It's hard to get a fully multi-cloud response in a simple stack. I describe one option for a simple multi-cloud stack in this blog post: https://blog.fauna.com/survive-cloud-vendor-crashes-with-net...

For an early startup, though, I would think it's not necessary to be "fully" multi-cloud.

Rather, it would likely be enough to have a cloud-agnostic infrastructure with replication to a warm (or even mostly-cold to save on cost) standby at the alternate provider with a manual failover mechanism.

Most folks overestimate their need for availability and lack a willingness to accept risk. There are distinct benefits that come with avoiding "HA" setups. Namely simplicity and speed.

> Most folks overestimate their need for availability and lack a willingness to accept risk.

I disagree. More specifically, I think, instead, many [1] folks just don't make that assessment/estimate in the first place.

They just follow what they perceive to be industry best practices. In many ways, this is more about social proof than a cargo cult, even though the results can resemble the latter, such as elsewhere in this thread with a comment complaining they had a "resilient" setup in a single cloud that was shut down by the provider.

> There are distinct benefits that come with avoiding "HA" setups. Namely simplicity and speed.

Indeed, and, perhaps more importantly, being possible at all, given time ("speed") and money ("if you can afford it").

The same could be said of "scalability" setups, which can overlap in functionality (though I would argue that in cases of overlap the dual functionality makes the cost more likely to be worth it).

None of this is to say, though that "HA" is synonymous with "business continuity". It's much like the conceptual difference between RAID and backups, and even that's not always well understood.

[1] I won't go so far as to say "most" because that would be a made up statistic on my part

Agreed for the most part. Availability for very many is a binary operation. They either do none of it or all of it.

A clever man once said, "you own your availability".

An exercise in BC planning can really pay off. If infra is code, and it and the data are backed up reasonably well, then a good MTTR can obviate the need for a lot of HA complexity.
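To put rough numbers on the MTTR point, here is a minimal sketch using the standard steady-state formula availability = MTBF / (MTBF + MTTR). The inputs (one serious failure a year, a four-hour restore versus a three-day rebuild) are made-up assumptions for illustration, not measurements from anyone in this thread.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the long-run fraction of time the service is up."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be positive and mttr_hours non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(mtbf_hours: float, mttr_hours: float) -> float:
    """Expected hours of downtime per year for the given MTBF and MTTR."""
    return (1.0 - availability(mtbf_hours, mttr_hours)) * HOURS_PER_YEAR


# Hypothetical scenarios: one serious failure per year on average,
# differing only in how long recovery takes.
scenarios = {
    "restore from backups in 4 hours": 4.0,
    "rebuild by hand over 3 days": 72.0,
}
for label, mttr in scenarios.items():
    up = availability(HOURS_PER_YEAR, mttr)
    down = downtime_hours_per_year(HOURS_PER_YEAR, mttr)
    print(f"{label}: {up:.4%} available, about {down:.1f} hours down per year")
```

Under these assumed inputs the shorter restore time gives the higher availability figure without a second live stack, which is the sense in which a good MTTR can stand in for some HA complexity.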

> Availability for very many is a binary operation. They either do none of it or all of it.

I assume I'm missing some meaning here, particularly since the premise of much of the discussion in the thread is that there can be high availability at one layer, but it can be rendered irrelevant by a SPoF at another (especially when the "layer" is the provider of all of one's infrastructure).

Do you consider that a version of "none"? Or are you pointing out that, despite the middle ground under discussion, the "binary" approach is more common, if not more sensible?

The binary approach is that it either isn't considered or people opt in for all of it without consideration for what is actually needed. The Google SRE book goes into this at length. For each service, they define SLOs and make a considered decision about how to meet them.

Oh, so what you're saying is that they're not considering the notion that there may be a medium-availability (for lack of a better term) solution, which could be perfectly adequate/appropriate?

Yes, there is or they wouldn't turn it off. Companies aren't in the habit of trying not to take your money for services without a pretty damn good reason.

And if it was that critical it should have support and a SLA contract, and you know, backups.

Right. Because big companies never ever do anything unjustified. Particularly when they put automatic processes in place with no humans in the loop, because we all know that computers never make mistakes.

This is fatal. I have a small pilot project on Google Cloud. Considering putting up a much larger system. Not now.

The costs of Google may be comparable or lower than other services, but they don't seem to get that risk is a cost. Risk can be your biggest cost. And they've amplified that risk unnecessarily and shifted it to the customer. Fatal, as I said.

Making a decision purely based on some posts on HN and the original article isn’t a good idea either, as there is little data on how often this happens (and pulling the plug could happen with another IaaS). You need to weigh up your options for risk management based on how critical your project is and the amount of time/money you have to solve the issues.

You might never see this happen to your GCP account in its lifetime.

This is a hallmark of Google's lack of customer service. They used to use the same filtering algorithm on customer search feeds as on public ones. The system was a grey list of some sort, and the client was worth about 1m in ads a day to them. Nevertheless, once a month it would get blocked, sometimes for over a day before someone read the email complaint and fixed it. We had no phone, chat, or any other access to them. They have no clue how to run a business, nor do they care. Never partner with them.

There's quite a lot of people talking about how this is their own fault, that they should have expected it, that they should have been prepared. Victim blaming, some would say, even.

But even if you assign blame to the OP for not expecting this, it doesn't look good, because the lesson here is "you shouldn't use google and if you do, expect them to fuck you over, for no reason, at any time".

Exactly. The whole point of using AWS, Google Cloud, etc, is that you get to stop thinking about certain classes of problems. An infrastructure provider that is unreliable cancels most of the value of using them for infrastructure.

Worse, they can potentially more than cancel it out, if they merely remove the "worrying about hardware" (yes, and network and load balancers and everything else) aspects, which are, at least, well understood by some of us out on the market, and replace it with "worrying about the provider", where a failure scenario is not only more opaque but potentially catastrophic, since it's a single vendor with all the infrastructure.

It reminds me of AWS's opacity-as-antidote-to-worry with respect to hardware failures. If the underlying hardware fails, the EC2 instance on it just disappears (I've heard GCP handles this better, and AWS might now, as well). I like to point out that this doesn't differ much from the situation of running physical hardware (while ignoring the hardware monitoring), both from a "worry" burden perspective and from a "downtime from hardware failure" perspective.

Google just doesn't have the talent, skills, or knowledge for dealing with business customers. They don't have competition in adtech and so never learned, but that doesn't work with GCP. They have great technical features but don't realize that's not what matters to a customer who wants their business to run smoothly.

We've gone through several account teams of our own that seem to be eager to help only to turn into radio silence once we actually need something. We have already moved mission-critical services to AWS and Azure, with GCP only running K8S and VMs for better pricing and performance.

GCP has good leadership now but it's clearly taking longer than it should to improve unfortunately.

I generally agree with you, but there is one exception: Google Fi has amazing support. I am surprised GCP wouldn't have similar support, considering the obvious cost differences.

Google Fi is for consumers.

And? Businesses should get even better support.

>> Google just doesn't have the talent, skills, or knowledge for dealing with business customers.
