Why you should not use Google Cloud (medium.com/serverpunch)
895 points by samdung on June 30, 2018 | 357 comments



As someone who is currently struggling with Google Cloud's mediocre support, this is not surprising. We pay lots of money for support and have multiple points of contact, but all tickets are routed through front-line support, who have no context and completely isolate you from what's going on. For highly technical users, the worst kind of support is being fed the standard playbook ("have you tried turning it off and on again?") when you're dealing with an outage. Especially since the best case is your support person playing go-between with the many siloed teams trying to troubleshoot an issue while they apparently try to pass the buck.

Not to mention the lack of visibility into changes - it seems like everything is constantly running at multiple versions that can change suddenly with no notice, and if that breaks your use case they don't really seem to know or care. It feels like there's miles of difference between the SRE book and how their cloud teams operate in practice.


I'd just like to take this opportunity to praise Vultr. I've been using them for years and their support has always been good and, unlike every other growing company's, has been getting better over time.

I had an issue with my servers 2 days ago and I got a reply to my ticket within 1 minute. Follow-up replies were also very fast.

The person I was talking to was a system administrator who understood what I was talking about and could actually solve problems on the spot. He is actually the same person who answered my support requests last year. I don't know if that's a happy accident or if they try to keep the same support staff answering for the same clients. He was answering my requests consistently for 2 days this time.

I am not a big budget customer. AWS and GCP wouldn't think anything of me.

Thank you Vultr for supporting your product properly. And thanks Eric. You are very helpful!


Google Cloud provides more than just VMs and containers. It has a bunch of services baked in, from a variety of databases such as Firebase (which have powerful built-in subscription and eventing systems), to fully baked-in Auth (Google will even handle two-factor for you!), to assistance with certain types of machine learning.

Vultr looks like they provide more traditional services with a few extra niceties on top.

Within Google's infrastructure, I can deploy a new HTTPS REST endpoint with a .js file and 1 console command.

Could I set up an ecosystem on a Vultr VM to do the same? Sure, it isn't magic. But GCP's entire value prop is that they've already done a lot of that work for you, and as someone running a startup, I was able to go from "I have this idea" to "I have this REST endpoint" in a couple of days, without worrying about managing infrastructure.
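
For a sense of what that "one file and one console command" flow looks like, here's a minimal sketch assuming Google Cloud Functions; the commenter is describing a Node.js function, but this sketch uses the Python runtime instead, and the function name is made up.

    # main.py -- a complete HTTP endpoint for Cloud Functions
    def hello(request):
        # Cloud Functions passes the handler a Flask request object;
        # the return value becomes the HTTP response body.
        name = request.args.get("name", "world")
        return f"Hello, {name}!"

    # Deploy with a single console command (names illustrative):
    #   gcloud functions deploy hello --runtime python37 --trigger-http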

That said, articles like this always worry me. I've never seen an article that says "Wow Google's support team really helped out!"


Using such proprietary features sounds like a great way to subject yourself to vendor lock-in and leave yourself vulnerable to your cloud provider's every whim. I understand that using ready-made features is alluring, but at what point are you too dependent on somebody else? All these cloud services remind me a bit of left-pad: how many external dependencies can you afford? Maybe I'm too suspicious and cynical, but then I read articles like these from time to time...


The difference, IMO, is that you're generally leveraging the cloud provider's platform in addition to using their hosting.

There are ways to make the hosting relatively agnostic, but choosing a pub/sub solution (for example) that operates at 'web scale' will have a distinct impact on your solutions and force you into their ecosystem to maximize value. Why bother with BigCorp's UltraResistant services if you're only going to use some small percentage of the capabilities?

I've made systems that abstract away the difference entirely, but I think the 'goldilocks zone' is abstracted core domain logic that will run on anything, and then going whole-hog on each individual environment. Accept that "cloud" is vendor lockin, and mitigate that threat at the level of deployment (multi-cloud, multi-stack), rather than just the application.


You're not alone. I worry the same about many things, but everyone just thinks I'm a negative nancy for discounting THIS AWESOME SERVICE with these awesome, 100% non evil people behind it!


I do use AWS, and have tried out GCP before. Just because I use Vultr doesn't mean I can't also use others.


I should have quoted more. :) I was indicating why you might have gotten some early downvotes: your comparison wasn't like for like.


Thanks for the info. You may be right about the downvote reason (though it's a pretty ridiculous reason), but I don't think that matters since they are in the same industry providing a similar service and there's no reason why GCP can't provide the same service as Vultr, especially since they charge a lot more for their instances than Vultr does.


Downvotes here on HN often don't make sense. You can be exactly 100% correct about something and still get downvoted to hell.

The best thing to do is to simply ignore them as asking about downvotes just invites more of them.


Well, I don't really care about the precious internet points disappearing. I'd much rather hear from someone what their reasoning is since I might actually learn something.

But it is telling that there have been at least 5 downvotes but no one is willing to comment as to why.

Edit: since I see you have a downvote (surprise, surprise) I'll clarify that it wasn't me.


Your new downvotes may be because HN guidelines say not to ask about or discuss downvotes.


What is this? Fightclub? "Rule #1 of fightclub: You never talk about fight club" :D

Seems like a recipe for breeding unfairness. "Don't talk about the system", :sigh:


It just generates a bunch of unnecessary comments in a thread. There's already like 8 in this chain for example.


Please don't break the site guidelines by going on about downvotes. That's a strict reduction in signal/noise ratio, which mars your otherwise fine comment. We're trying for the opposite here.

Downvotes can irritate, but there are at least two balms that don't involve adding noise to the threads. One is to remember that people sometimes simply misclick. The other is that unfairly downvoted comments mostly end up getting corrective upvotes from fair-minded community members (as happened with yours here).

https://news.ycombinator.com/newsguidelines.html


Thanks for the info. I forgot about that guideline. Too bad I can't edit my comment.


I've re-opened it for editing for you and will happily delete my comment if you take that bit out.


Thanks. Done.

I'm happy for you to leave your comment. It might help someone else in future.


Thanks!


I’ve seen other companies walk away from Google Cloud for similar reasons. "Automate everything to scale" doesn't work for the Fortune 500. They should absolutely own this market.


This is why AWS and Azure continue to gain market share in cloud, while Google remains relatively stagnant, despite (in many cases) superior technology.

Their sales staff is arrogant and has no idea how to sell into F500 type companies.

Source: 10+ meetings, with different clients, I attended where the Google sales pitch was basically "we are smarter than you, and you will succumb". The Borg approach. Someone needs to revamp the G sales and support approach if they want to grow in the cloud space.


Even for small businesses their sales is pretty bad. I once got a package in the mail from them, with a URL containing a tracking code printed on it to contact them - so obviously Google being Google and treating people as part of a funnel. There was no phone number to be found and nothing personalized.

The other funny thing is the package had a neoprene sleeve for a Chromebook. Eventually a sales person reached out via email assuming I owned a Chromebook and acted like I owed them a phone call because they gave me a neoprene sleeve I couldn’t use.

The entire package ended up going in the trash, which was an unfortunate waste of unrecyclable materials.


If you'd filled in the form at the link provided on one of the bits of paper in the box, they would have sent you a Chromebook for the sleeve. I've got one here gathering dust. My boss threw away the same package, but I was curious and looked through it carefully.


Sounds like it functions as some kind of filter, whether intentional or not. ;)

"People who will look through every bit of advertising crap company x sends", vs those who don't.

Something, somewhere is probably making stats on that. ;)


I found it interesting that you wrote "Something, somewhere", and not "Someone, somewhere"


Yeah, that was on purpose. It's no longer obviously just humans potentially doing this. ;)


Yes, this is our experience as well, and the root cause of their many problems with GCP. Tech is nice but matters little if the account team just ignores us.


> "we are smarter than you, and you will succumb". The Borg approach.

Well, that seems to be the approach at Google. Starting with hiring

Not surprising they end up with a hivemind that can't see past their mistakes.


Reminds me of a thread I saw on the Google Inbox mobile app a while back. Brilliant app, but no 'unread message counter'. There was a huge number of people on the thread begging for that feature and going so far as to say that it was the one thing that prevented them from using the app. Their thinking was apparently that you should have filters for everything and it all should've fallen neatly into little boxes, but for people that have been using email 10 times longer than those developers have been out of college, that's not very practical. One G dev chimed in, said 'But that's not how I use email', and closed off the discussion.



That's interesting. I was of the understanding that everything at Google office tries to de-stress you/undistract you. I thought that would result in people being calmer/ more empathetic.


Arrogance is arrogance, whether stressed or not. Hiring arrogant people tends to create an environment non-conducive to empathy.


I have less experience with their sales/account managers but every time I got a super weird patronizing and even cultish vibe that really put me off.


Yes, that's exactly what I'm talking about: they are super arrogant and unwilling to discuss things at a practical level.

And I've seen it cause them to lose at least 10 potentially good sales.

They have advantages but they're so arrogant that it puts people off.

Ten or more times now, people have told me they prefer Google's solution to Microsoft's or Amazon's but are going with a competitor because they can't stand Google's arrogant attitude. It's almost laughable: they're throwing money away just because they won't back off.


Absolutely.

It blows my mind that GCloud, with arguably superior tech and performance compared to AWS/Azure, can't handle support. I have my own horror stories from 2 years ago, but still they haven't fixed it.

Google just doesn't seem to be able to focus on products that require service and customer support. Maybe they just don't care about it while they have an infinite revenue stream from search and advertising. Whatever it is, they should be humiliated.

I love the tech, and the UX details like in-browser SSH (AWS hasn't improved its UX EVER), but they can't get support right? Amazing.


> Google just doesn't seem to be able to focus on products that require service and customer support

That's literally any product that people pay for (instead of viewing ads).

Customer support isn't and never has been in their DNA. It's often rage-inducing how hard it is to contact a human at Google.

They seem to think they can engineer products that don't need humans behind them.


That's the meme, but my experience with the business support for G Suite doesn't match it at all: I can easily call the phone support, get a competent human quickly, and they are very helpful.


I think I read recently that they outsourced G Suite support.


Ironically, they let you know every time they can that you can contact your personal adwords sales person by phone.


The reason for this is obvious, but it's a good point. It's like someone whose personality becomes awful right after you marry them.

I'm actually going to take this back to my company as a principle: "Treat locked-in/subscribing customers as well as our salespeople treat prospects."


I didn't write that article, but last week I came to the same conclusion and began my migration from GCP to AWS. I admire Google's tech but Cloud Platform lacks fit and finish. It's not fully productized. It's not even fully documented. (Is it indolence or arrogance to publish a link to the source code as the only explanation of an important API?) I'm sorry, Google, you ignored me when I was crushing on you. Now I have Amazon.


I think they still are mainly focused on their ad business as the core of the company and cloud is something they 'do on the side'. For Microsoft, Azure is core business, it's the future of the company. If they fuck it up, they're dead. Google apparently doesn't see their cloud offering as their core business and therefore doesn't get the attention it needs.


In my limited experience, Google has worse support than Facebook (when it comes to advertising agencies). They simply don't care, because you are a tiny multimillion-euro company and they are THE GOOGLE.


AWS has acquired Cloud9 for an in-browser development environment / SSH; it inherits your account credentials. You should check it out.


Yeah, Cloud9 is billed as an IDE, but it's really more useful as a terminal inside your cloud environment that happens to have a text editor. Workspaces has been great for a cloud-based development environment, and the new Linux Workspaces will be more useful than the Web-based "cloud IDEs".


I haven't tried workspaces. Will I get better performance from my chromebook than just using cloud9? Cloud9 is also incredibly cheap.


They are very different things: Workspaces runs a full desktop environment (Windows or Linux) on an EC2 instance, and enables you to remotely access it through client software. The client software uses Teradici PCoIP, rather than VNC or RDP, and Teradici is amazing: it is so fast that the desktop feels like it is running on your local computer.

This means that you can run whatever development tools you want on the EC2 instance, rather than the very limited code editor that Cloud9 provides. You can easily run a full copy of Visual Studio on a Workspace, and get the full resources of an EC2 instance with SSD drives.


If it can make you feel any better, the AWS support is the same.


AWS sends you emails 9 months in advance of needing to restart individual EC2 instances (with calm, helpful reminders all the way through). IME, they're also really good about pro-active customer outreach and meaningful product newsletters... Even for tiny installations (ie less than $10K yearly).

Anecdotally: I've been an MS gold partner in a bunch of different contexts for years. The experiences I had as 'small fish' techie with AWS were on par or better. YMMV, of course, but I'd be more comfortable putting my Enterprise in the hands of AWS support than MS's (despite MS being really good in that space).


It costs a pretty penny, but I’m very happy with AWS enterprise support. When we had a ticket that we didn’t escalate get a crappy answer, our TAM escalated on his own initiative to get us a better answer.


Are you on insiders? Only way to get anyone to care.


Let's see if the current logging thread will result in anything. :-)


Insiders?


It's a referral-only group where you get to play with and complain about their tech before everyone else does


How would one go about getting an invite to the secret club?


Two people either on the list already or working for Google have to endorse you.


Haha :D


Higher paying (premium) customers actually get premium support in GCP, e.g. SRE who can get paged on an outage.

Prices seem high though: https://cloud.google.com/support/?options=premium-support#op...


So, piggybacking on this, I have a similar story to tell. We had a nice young startup, infra entirely built out on Google Cloud. Nicely, resiliently built, good solid stuff. Because of a keyword monitor picked up by their auto-moderation bot, our entire project was shut down immediately, and we weren't able to bring it up for several hours. Thank god we hadn't gone live yet, as we were then told by support that because of the grey area of our tech, they couldn't guarantee this wouldn't keep happening. And in fact told us straight out that it would and we should move.

So maybe think about which hosting provider to go with. Don't get me wrong, I like their tech, but their moderation does need a more human element; to be frank, all their products do. Simply ceding control to algorithmic judgement just won't work in the short term, if ever at all.


I'm starting to favour buying physical rack space again and running everything 2005-style with a lightweight Ansible layer. As long as your workload is predictable, the lock-in, unpredictability, navigation through the maze of billing, weird rules and what-the-fuckism you have to deal with on a daily basis is merely trading one vendor-specific hell for another. Your knowledge isn't transferable between cloud vendors either, so I'd rather have a hell I'm totally in control of, and of which the knowledge has some retention value and will move between vendors no problem. You can also span vendors then, thus avoiding the whole all-eggs-in-one-basket problem.


Hybrid is what you are looking for. Have a rack or two for your core and rent everything else from multiple cloud vendors, integrated with whatever orchestration you are running on your own racks (K8s? DC/OS? Ansible?).


Or just two DCs in active/active.

Still works out cheaper for workloads than AWS does even factoring staff in at this point.

AWS always turns into cost and administrative chaos as well unless it is tightly controlled which in itself is costly and difficult the moment you have more than one actor. GCP probably the same but I have no experience with that. Very much more difficult to do this when you have physical constraints.

For a two-man startup, perhaps, but I think the transition should go:

VPS (Linode etc.) for the MVP, then a colo half rack, then active/active racks at two sites, then scale out however your workload requires.


More importantly, there is a wealth of competent labor in the relatively stable area of maintaining physical servers (both on the hardware and software side). The modern cloud services move fast and break things, leading to a general shortage of resources and competent people. As a business, even if slightly more expensive initially, it makes more sense to start lower and work up to the cloud services as the need presents itself.


You can federate Kubernetes across your own rack and one or more public cloud providers.


You can but that’s another costly layer of complexity and distribution to worry about.

One of the failure modes I see a lot is failing to factor in latency in distributed systems. Mainly because most systems don’t benefit at all from distribution and do benefit from simplification.

The assumption on here is that a product is going to service GitHub or stackoverflow class loads at least, but literally most aren’t. Even high profile sites and web applications I have worked on tend to run on much smaller workloads than people expect. Latency optimisation by flattening distribution and consolidating has higher benefits than adopting fleet management in the mid term of a product.

Kubernetes is one of those things you pick when you need it not before you need it. And then only if you can afford to burn time and money on it with a guaranteed ROI.


Sure. The idea is that you get the benefits of public cloud and cost savings of BYO hardware for extra capacity at lower cost. Of course, you're now absorbing hardware maintenance costs as well. I haven't seen a cost breakdown really making a strong case one way or the other, but my company is doing it anyway.


Have you actually done this, or are you repeating stuff off the website? Because everyone I've talked with about kubernetes federation says it's really not ready for production use.


The approach we have taken is to create independent clusters with a common LoadBalancer.

Basically, the LB decides which kubernetes cluster will serve your request and once you're in a k8s cluster, you stay there.

You don't have the control-plane that the federation provides and a bit of overhead managing clusters independently, but we have automated the majority of the process. On the other hand, debugging is way easier and we don't suffer from weird latencies between clusters (weird because sometimes a request will go to a different cluster without any apparent reason <-- I'm sure there's one, but none that you could see/expect, hence debugging).

My people's time is more important than your complex system.
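
To make the "once you're in a k8s cluster, you stay there" routing above concrete, here is a rough sketch of the pinning decision; the cluster endpoints are made up, and a real front-end load balancer would add health checks and likely a signed cookie rather than a bare hash of the client address.

    import hashlib

    # Hypothetical per-region clusters sitting behind one entry point.
    CLUSTERS = [
        "https://k8s-us-east.example.com",
        "https://k8s-eu-west.example.com",
    ]

    def pick_cluster(client_ip: str) -> str:
        # Deterministically map a client to one cluster, so every request
        # from that client keeps landing in the same place.
        digest = hashlib.sha256(client_ip.encode()).digest()
        return CLUSTERS[digest[0] % len(CLUSTERS)]

    print(pick_cluster("203.0.113.7"))  # same cluster every time for this client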


Ha. It's in process. Not ready yet. I'll report back if we fail miserably.


Federation v1 is legacy now. The new architecture is called MultiCluster and designed to work on top of K8S rather than having a leader cluster: https://github.com/kubernetes/community/tree/master/sig-mult...


That's exactly what we are thinking too. We've looked HARD into AWS/GCP/Azure, but for all the reasons you mentioned we don't want to go that route. Owning the entire stack is so much cheaper, both money and time wise.


Have you looked at OCI bare metal shapes? [1] Oracle Cloud provides the server, and you control the stack end to end (including the hypervisor).

If you run into an issue, send me a note and I will get someone to reply to your issue.

1. https://cloud.oracle.com/compute/bare-metal/features


This needs more upvotes


I can tell a similar story with Amazon MWS, where even when we had access to "human support", it felt like talking to some bad ML that didn't understand what we were saying. Ultimately that startup was disbanded, never having violated any rule they had, but flagged because of a false positive, and we couldn't even prove we didn't violate anything because we hadn't even gone live yet. It felt Kafkaesque: being punished for one of a myriad of possible intents due to malfunctioning ML, with no recourse.

Maybe support just needed to satisfy their quota of kicked out companies for the month, who knows?


Lol. That's definitely a possibility.


Is it only me, or does it seem that if you are not a "famous" person who has a lot of public visibility and is able to create pressure through a tweet or blog post, you are lost: no number to call, no email to write. Over the years I have seen a lot of similar stories, YouTube or in general "Google accounts" blocked for no clear reason and no way to contact somebody to solve the issue... kinda scary...


> Because of a keyword monitor picked up by their auto-moderation bot

Can you elaborate on that? What do they monitor with the moderation bot?


The point is that anyone could fall into that category when laws change.

Imagine you're running a cosplay community, and all of a sudden all your content is being deleted because the SESTA/FOSTA bill gets passed in a country where your "cloud" happens to reside in: https://hardware.slashdot.org/story/18/03/25/0614209/sex-wor...


"because of the grey area of our tech"

"told us straight out that it would and we should move"

Sounds shady. I bet this would make more sense if OP explained what his company actually does.


Exactly -- there is a lot the OP isn't telling us. Maybe Google was right to shut them down.


I'd rather my cloud provider err on the permissive side. Preemptively shutting suspicious things down without an external complaint seems a bit much…


Well, there are all kinds of grey area stuff. One fairly obvious example is various security services, which have a wide variety.

Not everything is outright "likely to get banned" (eg pron things). ;)


Yup, agree. OP was probably doing something against the terms. Care to provide details?


Well, "grey" can mean a lot of things when you are talking about the same company that moderates Youtube.


I know they specifically ban cryptocurrency mining on their free credit / tier. Even called out on their public product pages.

I assumed they could tell that via CPU usage, which they already monitor for quotas.


+1 - I'm curious as well.


This occasionally happens with gsuite users as well. Businesses lose access to all their email and documents.

Good times.


I've gotta wholeheartedly disagree. I've never encountered this on GCE. I run a DevOps consulting company, and for standard EC2/machines I much prefer GCP. It's not even close. AWS for the most part has little or no user experience testing on UIs and developer interfaces. AWS region-specific resources are a nightmare, and billing on GCP with sustained use and custom machine types is vastly superior. Disks are much easier to grok: no provisioned IOPS, EBS-optimized, enhanced-networking hoopla.

By chance are you located out of the United States? These are not downtime issues, but anti-fraud prevention and finance issues.


I've noticed that over the last few years it's become increasingly difficult to do things with US based services (especially banking) if you are outside of the US. And this goes double if you are a US citizen with no ties to the States other than citizenship. Americans as a general rule have never been terribly adept at anything international: banking, languages, or even basic geography. We have offices in Cambodia and Laos and I have been told by more than one US-based service/company that Laos is not a real country. I suppose they think the .la domain stands for Los Angeles :) We are looking to set up an office in Hong Kong or Singapore and use that to deal with Western countries. But we're a small not-for-profit operation and HK and Singapore are EXPENSIVE.


What gray area are you in?


> the grey area of our tech

Cryptocurrency?


Blockchain for sure


because of the grey area of our tech

The nature of the tech in question seems important in this story.


I am really curious, what was the business?


Thanks for sharing, I thought maybe it was a one off - it helps to avoid similar issues and luckily there is plenty of cloud competition.


Was it porn or cryptos?


Sounds like "we were doing something sketchy, got caught, but somehow it isn't our fault".


It can happen to you too: you get hacked, hackers run arbitrary code in your account


If your cloud services account was hacked, you'd most likely be thanking Google or Amazon for stopping the services.


Stopped, yes, but deleting the project if the credit card account holder cannot be reached to provide photo ID within 3 days might be an over-reaction.

I hope there is a possibility to put a backup contact person / credit card so organisations can deal with people going on vacation or being sick or whatever.

IMHO this should be nicely documented as any other technical material you get to learn about the cloud product when you create an account (e.g. important steps to ensure your account remains open even in case of important security breaches, yadda yadda it's possible we'll need a way to prove that you are you yadda yadda, this can happen when yadda yadda, be prepared, do yadda yadda)


I agree that it seems like an over-reaction. But on an account with intense usage, a single credit card on file, no backup, and a fraud warning it does seem very suspicious.

AFAIK, Google Cloud credit card payments are processed through Google Pay, which supports multiple credit cards, debit cards, bank accounts, etc.

Ideally, in this case the company shouldn't be using the CFO's credit card, but should have entered into a payments agreement with Google, receiving POs, invoices and so on, including a credit line.

Never set up a crucial service like you'd set up a consumer service.


Yes, that's a very good description of the best practices that sadly many companies are not really following.

In many situations the "right thing" must be explained, otherwise when people fail to get it they can argue that wasn't really the right thing after all (sure that's ultimately because they just want to deflect the blame from themselves; so don't let them! clearly explain the assumptions under which anti-fraud measures are operating so people cannot claim they didn't know)


> Nicely, resiliently built, good solid stuff.

Erm ... no, evidently not?


Projects in the context of GCP can encompass all the necessary infrastructure to build a highly available service using standard practices. There's no indication anywhere from GCP themselves that a project could be a domain of failure. If asked, I doubt they would consider it as such.

A prudent person might consider a cloud provider to be a domain of failure and choose a multi-cloud option, which would probably be the correct way to address this resiliency issue. However, that's not really an appropriate approach for an early stage startup, where availability is generally not that much of a concern.


In other words: It wasn't resiliently built stuff.

Is an exploding car safe because it is built by an early stage startup?

Just because you decide that implementing resiliency isn't a good business decision for some early stage startup, doesn't magically make the product resilient, it just isn't and that may be OK.

There are many options to choose from for implementing resiliency, it could be having multiple providers concurrently, it could be having a plan for restoring service with a different provider in case one provider fails, it could be by setting up a contract with a sufficiently solvent provider that they pay for your damages if they fail to implement the resiliency that you need, whatever. But if you fail to consider an obvious failure mode of a central component of your system in your planning, then you are obviously not building a resilient system.

Edit: One more thing:

> There's no indication anywhere from GCP themselves that a project could be a domain of failure. If asked, I doubt they would consider it as such.

Then you are asking wrong, which still is your failure if you are responsible for designing a resilient system.

If you ask them "Is a complete project expected to fail at once?", of course they will say "no".

That's why you ask them "Will you pay me 10 million bucks if my complete project goes offline with less than one month advance warning?", and you can be sure you will get the response to the problem that you are actually trying to solve.


> A prudent person might consider a cloud provider to be a domain of failure and choose a multi-cloud option, which would probably be the correct way to address this resiliency issue. However, that's not really an appropriate approach for an early stage startup, where availability is generally not that much of a concern.

If you replace "multi-cloud" with "multi-datacenter" (in the pre-cloud days), this premise is fairly unassailable. In those same days, applying it to "multi-ISP", it becomes more arguable.

Today, though, the incremental cost (money and cognitive) of the multi-cloud solution, even for an early startup, doesn't seem like it would be high enough to make the notion downright inappropriate to consider.

I'd even argue that if a cloud provider makes the lock-in so attractive, or multi-cloud so difficult, that's a sign not to depend on those exclusive services.


choose a multi-cloud option

The economics don't work out if you are trying to do this with just vanilla VMs across AWS, GCP and Azure and managing it yourself. You either do it the old-fashioned way, renting rack space and putting your own kit in, or you make full use of the managed services, at which point - by design - you are locked in.


This is very concerning, but it can happen on AWS as well. July 4th last year at about 4 PM PST, Amazon silently shut down our primary load balancer (ALB) due to some copyright complaint. This took out our main API and several dependent apps. We were able to get a tech support agent on the phone, but he wasn't able to determine why this happened for several hours. Eventually we figured out that another department within Amazon was responsible for pulling down the ALB in an undetectable way. Ironically we are now in the process of moving from aws -> gcp.


My coworker is running a hosted affiliate tracking system on AWS as part of our company. He regularly has to deal with AWS wanting to pull our servers because of email spam -- not because we're sending spam emails, but because some affiliate link is in a spam email that resolves to our server, and Spamhaus complained to AWS.

Usually this can get handled after a few days of aggravating emails back and forth, we get our client to ban the affiliate in question, and move on with our days with no downtime. But a few weeks ago my coworker came in to find our server taken offline, because AWS emailed him about a spam complaint on a Friday night, and they hadn't gotten a response by Sunday. It'd been down for hours before he realized.

They'd just null-routed the IP of the server, so he updated the IPs in DNS real quick, but he then spent half a day both resolving the complaint and getting someone at AWS to say it wouldn't happen again. They supposedly put a flag on his account requiring upper-management approval to disable something again, but we'll see if that works when it comes up again.


You're going to have to go multi-cloud if you truly want to insulate yourselves from this sort of problem.

If and when you do, give serious consideration to how you handle DNS.


Fwiw, Ansible makes the multicloud thing pretty straightforward as long as you aren’t married to services that only work for a specific cloud provider.

For that, you should consider setting up multiple accounts to isolate those services from the portable ones.


Wouldn't that be Terraform (perfect for setting up cloud infrastructure) vs. Ansible (can do all, but more geared to provisioning servers you already have)?


Ansible uses Apache Libcloud to run just about anything you need on any cloud provider in terms of provisioning. Once provisioned, it will handle all of your various configuration and deployment on those.

Also plays really nicely with Terraform.


How does Ansible make it straightforward? As far as I know, it doesn't help with networking failover, load balancing, data consistency, or other aspects of distributed systems, and running one application across clouds is certainly a distributed-systems problem, not a deployment problem.

Ansible helps deploy software, but deploying software is the smallest problem of going multi-cloud.


See reply to other comment.


I know what ansible is and can do. Your other comment is about how it can provision and deploy things. While true, it's unrelated to my point that that's the least of your problems in a multi-cloud world.


A lot of that depends on scale too. I was mostly talking about the ability to standardize configuration so that you could replicate your infrastructure on multiple providers. Essentially just making sure that you have a backup plan/redundancy in case something happens and you find yourself needing to spin things up elsewhere on short notice.

You're absolutely right that running them at the same time, data syncing, traffic flow, etc is much more complicated.


Also check out Mist.io. It's an open source multi-cloud management platform that abstracts the infrastructure layer to help you avoid the lock-in.

Disclosure: I'm one of the founders.


What's the difference between mist.io and Apache's libcloud?


Ansible is great at doing the things that Ansible does!


What are popular multi cloud solutions if you use AWS or GCP services that have proprietary APIs? Are there frameworks that paper over the API differences?


mist.io supports most public and private cloud platforms. Also, it's open source https://github.com/mistio/mist-ce


What's the difference between mist.io and Apache's libcloud?


Apache libcloud is a Python library that's used primarily to create, reboot & destroy machines in any supported cloud.

Mist.io is a cloud management platform that uses Apache libcloud under the hood. It provides a REST API & a Web UI that can be used for creating, rebooting & destroying machines, but also for tagging, monitoring, alerting, running scripts, orchestrating complex deployments, visualizing spending, configuring access policies, auditing & more.
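
For anyone who hasn't used it, a minimal libcloud sketch of that create/reboot/destroy layer; the credentials and region here are placeholders, and the equivalent GCE or Azure driver is obtained the same way via get_driver.

    from libcloud.compute.types import Provider
    from libcloud.compute.providers import get_driver

    # Same code path for any supported cloud -- only the driver and credentials change.
    Ec2 = get_driver(Provider.EC2)
    driver = Ec2("ACCESS_KEY_ID", "SECRET_KEY", region="us-east-1")

    # List running machines, then reboot the first one.
    nodes = driver.list_nodes()
    if nodes:
        driver.reboot_node(nodes[0])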


What's the DNS solution here? Something like Cloudflare or Edge?


The correct DNS solution is to use multiple providers.

See: Route53 and Dyn outages in the past couple years.


They shut down just the load balancer?

Forgive my ignorance, but that seems like a weird choice rather than cutting access to the servers or acting in some more formal way for copyright...

Also kinda concerning that multiple departments can take enforcement-type action and others not know about it. That seems way disorganized / a recipe for disaster.


Whoever reported the violation probably identified the public IP address of the ALB and notified Amazon.


Makes sense but you would think someone at AWS would handle it... more systematically.


> Ironically we are now in the process of moving from aws -> gcp.

Why not Azure? They have a solid platform and (at least for a MSFT partner) their support is top-notch.


I respectfully disagree. I have worked on two projects with Azure, both with big accounts, one even so big that we had senior Azure people sitting in our teams. Both had the highest possible support contract.

Yet their support didn't ever solve a problem within their SLAs, and sometimes critical-level tickets were hanging for months.

Plus my impression is that whereas the AWS (and possibly Google) clouds are built by engineers using best practices and logic, Azure products always felt very much marketing-driven, e.g. marketing gave engineering a list of features to launch and engineering made the minimum effort possible to have the corresponding box ticked. I absolutely hated working on Azure and now won't accept any contract on it.

Documentation is horrible or non-existent, things just don't work or have weird limitations and transient deployment errors, there are super weird architectural and implementation choices, and you never escape the clunkiness of the MS legacy with, for example, AD.


We did have the same issues back in beta and were forced to build chaos-monkey degrees of robustness into our platform. Was this experience of yours a while back? However, there are now a few people at work who even run VMs on it as their daily driver.


> Yet their support didn't ever solve a problem within their SLAs

What does this mean?


Service Level Agreements dictate the quality, availability, and responsibilities with the client. They put bounds on how long things will take to get answered, and sometimes fixed.

GP is saying that even though they had a contract to resolve issues within X hours/days the issues were not being solved within X hours/days.

Cynically: most SLAs with the 'Big Boys' tend to give guarantees about getting an answer, not a solution. "We are looking into the problem" may satisfy the terms of a contract, but they don't satisfy engineers in trouble.


I know what an SLA means, but I have never seen an SLA from Azure dictating a guaranteed resolution time. They only give an SLA for time until initial reply, as far as I know. I suspect the person I replied to has misunderstood what it is they purchased. Maybe in some cases, if you pay them some obscene amount of money, you can purchase an SLA for time till resolution, but I don't think that's the case here.


Can't you end-to-end encrypt your data, so that Amazon can't run their copyright filters over them?


In this case the complaint was against some image(s) we were publicly hosting. We've taken steps to isolate our file hosting from the rest of the system in case this were to happen again. We only host images for fashion blog posts written by staff so I imagine other aws customers have had a much worse time in this regard.


So the copyright claim was legitimate?


No truly production-critical, and especially revenue-critical, dependency should go on the card. Have your lawyer/licensing person sign an agreement with them with an actual SLA and customer support. If it's not worth your time, you shouldn't complain when you lose it.


That's a great point. These cloud hosting companies don't make this a natural evolution though, because there's no human to talk to, you start tiny and increase your usage over time. But every company depending on something and paying serious money should have a specific agreement. I wonder if this could still happen though, even if you have a separate contract.


> These cloud hosting companies don't make this a natural evolution though, because there's no human to talk to

This is not true at all. Once you start spending real money on GCP or AWS, they will reach out to you. You will probably sign a support contract and have an account manager at that point. Or you might go with enterprise support where you have dedicated technical assets within the company that can help with case escalation, architecture review, billing optimization, etc.


It makes sense that would happen. So they just didn't have the contact info for the people here? Maybe they just were spending a little, but their whole business still depended on it.


Once you hit a certain spend, Google contacts you and asks you to sign a thing.


There's a mismatch between how much you spend and how much business value is there. The spend for management systems of physical infrastructure like wind turbines is tiny relative to revenue compared to the typical pure software company, especially freemium or ad-driven stuff where revenue-to-compute ratio is very low. Calibrating for this wouldn't really be in Google's DNA.


Amen to that. Once you reach 1,000 USD monthly you can switch to a regular invoiced account (subject to verification) and you get a dedicated account manager.


Yeah I couldn't get my head around this bit:

> What if the card holder is on leave and is unreachable for three days? We would have lost everything — years of work — millions of dollars in lost revenue.


Indeed, presumably they were then also at the mercy of the credit card company cancelling or declining the card at the critical billing renewal moment.


I've yet to experience any subscription service that immediately shuts off access due to a declined card.


Certainly not; no reputable service provider will cut you off at the first credit card decline.



Is this possible for GCP? Definitely seems the way to go.


Yes they have invoice-based billing that you can apply for: https://cloud.google.com/billing/docs/how-to/invoiced-billin...


+1!


This is a standard risk with any attempt to remain anonymous with a supplier. The supplier, since they don't know you, and therefore can't trust you, will not offer much credit.

Cards get skimmed all the time. When a card gets skimmed, the issuer informs everyone who is making recurring purchases with that card "Hey, this card was skimmed, it's dead".

If someone has a recurring charge attached to that account, the recurring charge will go bad. If this is an appreciable number of cloud services which are billed by the second, this can happen very, very quickly and without you knowing. Remember, sometimes the issuer's notice that the card was skimmed reaches you only after all the automated systems have been told.

So, the cloud provider gets the cancellation and terminates the card. It then looks around, sees the recurring charge, takes a look at your servers racking up $$ they can't recoup, and the system goes "we don't know this person, they buy stuff from us, but we haven't analysed their credit. Are they good for the debt? We've never given them credit before. Better cut them off until they get in touch."

If only they had signed an enterprise agreement and gotten credit terms. It could still be paid with a credit card, but the supplier would say "They're good for $X, let it ride and tell them they'll be cut off soon". They can even attach multiple methods of payment to the account, where, for example, a second card with a different bank is used as a backup. Having a single card is a single point of failure in the system!

In closing, imagine you're a cryptocoin miner who uses stolen cards to mine on cloud services. What does that look like to the cloud provider?

Yep, someone signs up for cloud services, starts racking up large bills and then the card is flagged as stolen.


It looks like they used a personal GCP account for their multi-million dollar business.

Would be interested to see what would have happened if they had used a business account.


While not a cloud platform, I had an experience along the same vein with Stripe.

We're a health-care startup, and this past Saturday I got an email saying that due to the nature of our business, we were prohibited from using their payment platform (credit card companies have different risk profiles and charge accordingly--see Patreon v Adult Content Creators).

Rather than pull the plug immediately, they offered us a 5-day wind down period, and provided information on a competitor that takes on high-risk services.

Fortunately, the classification of our business was incorrect (we do not offer treatment nor prescription/pharma services), and after contacting their support via email & Twitter, we resolved the issue in less than 24 hours.

So major kudos to Stripe for protecting their platform, WHILE also trying to do the right thing for the customers who stray from the service agreement.


Please remember Google Cloud is a multi-tenant public cloud, and in order to run a multi-tenant environment providers have to monitor usage, users, and billing, and take precautionary measures at times when usage or account activity is sensed to be irregular. Some of this management is done automatically by systems preserving QoS and monitoring for fraud or abuse.

This seems like a billing issue. If they had offline billing and monthly invoicing (enterprise agreement) I do not believe this issue would have happened.

If you are running an enterprise business and do not have enterprise support and an enterprise relationship with the provider, you may be doing something wrong on your end. It sounds like the author of this post does not have an account team and hasn't taken the appropriate steps to establish an enterprise relationship with their provider. They are running a consumer account, which is fine in many, many cases, but may not be fine for a company that requires absolutely no service interruptions.

IMO, the time this issue was resolved by the automated process (20 mins) is not too bad for consumer cloud services. Most likely this issue could have been avoided if the customer had an enterprise relationship (offline billing/invoicing, support, TAM, account flagging, etc, etc) with Google Cloud.


A "consumer account"? I don't know what you're talking about. This is Google Cloud, not Spotify. I don't know a lot of "consumers" spending hundreds of dollars, thousands of dollars or more per month on Google Cloud. And paying bills by wire transfer instead of credit card doesn't change anything to the issue discussed here.


I'm a hobby programmer who runs a few small projects on GCP. My personal spending is smaller than OPs, but as mentioned elsewhere, once you hit a certain threshold, they will contact you to offer switching to a business account. Obviously they're not gonna force you to switch if you don't want to, but then don't complain for not getting business level support.


Hundreds or thousands is easily spent by 'consumers' on any public cloud. Think about it.

Paying bills offline via invoice establishes an enterprise agreement with cloud providers. It does in fact change everything with the issue discussed here: they wouldn't have been taken offline due to an issue with the credit card payment.


In Germany, and maybe even all of Europe, you need a tax ID which, at least as far as the type they require is concerned, you only get as a business, not as a consumer. I actually tried, because it's a relatively easy way to get a fancy, reliable network (I kind of admire their global SDN, which can push within 5% of line rate with no meaningful added packet loss, apart from the minimal amount due to random cosmic rays and similar baseline effects).


They actually relented on that. You can now register "Individual" accounts that don't need a tax ID: https://cloud.google.com/billing/docs/resources/vat-overview


Individual projects without a GCP organization association are probably treated as consumer accounts.


That is correct. CC billing and individual projects could be consumer-level usage. Anyone can sign up for this level of account and use it for whatever purpose. Offline invoicing and G Suite / Organization setup / domain validation / enterprise support could be thought of as enterprise, and would come with assurances such as billing payment stability.


What are you talking about? Are you representing Google in some way?

We have significant spend on CC with GCP and we’re not “consumers”. Our account manager has no issue with this. If they did we’d move somewhere else.


I can't speak to the specific incident. We've been running almost 400 servers (instances and k8s cluster nodes) for over a year on GCP and we've been quite happy with the performance and reliability, as well as the support response when we have needed it. I did want to address this comment...

> What if the card holder is on leave and is unreachable for three days? We would have lost everything — years of work — millions of dollars in lost revenue.

You should never be in this position. If this were to happen to us we would be able to create a new project with a different payment instrument, and provision it from the ground up with terraform, puppet and helm scripts. The only thing we would have to fix up manually are some DNS records and we could probably have everything back up in a few hours. Eventually when we have moved all of our services to k8s I would expect to be able to do this even on a different cloud provider if that were necessary.


Restarting the service from scratch is one thing, but what about all your data? Some of these services have hundreds of terabytes of data hanging off them, and if Google were to delete that because of some perceived violation of their terms, then that is not something you can recover from in a couple of hours, if at all.

This is one of the reasons I always implore people to have a backup of their data with another provider or at least under a different account. That protects against all kinds of accidents but also against malice.


Backup is a thing. If your company is making millions of dollars off your business you should have a redundant backup of everything including (especially) your data.


Yes, we have backups. The problem with data is not whether I have backups; it's that at a certain scale I will have so much data that "moving providers" could take on the order of weeks.

If what happened to the OP happened to me, sure, I could have my entire infra on AWS/Azure/whatever else Terraform supports in an hour, maybe more to replace some of the tiny cloud-specific features we use. But if it takes me a day just to move the data into Azure, that's an entire lost business day of productivity.


If it takes weeks then you should choose a second provider where you can show up with your backup hard drives or whatever you use and plug them in. Moving data physically is an option.


"Never underestimate the bandwidth of a semi full of harddrives."



Note that I did not imply that restoring the service to a different project or provider would always be easy or fast (certainly in the case of very large data volumes it would be neither of those things). I was addressing the prospect of losing "years of work" as was stated in the OP. That sort of implies that most or all of what they did over that time is recorded only in the current state of the GCP project that was disabled, and that is a really terrifying position to be in.


People usually end up using backups or moving infrastructure on short notice in catastrophic situations, which is presumably rare. Days' worth of work to bring back your business after catastrophic downtime doesn't seem like a bad thing at all to me. If anything, it sounds like a very well organized development flow with a very optimistic time-frame.


This vastly oversimplifies the situation, especially when the cloud is involved. Having a backup, much less a replica, of such data requires an enormous infrastructure cost, whether it's your own or someone else's infrastructure. The time to bring that data back to a live and stable state again is also quite costly (note the stable part).

It's a simple truth that even if you are at the millions-of-dollars point, there is a data size at which you are basically all-in with whatever solution you've chosen, and even for a billion-dollar company it can be exceptionally difficult and cost-prohibitive to maintain a secondary site and move that sort of data around, again especially when you're heavily dependent on a specific service provider.

Yes, the blame in part lies with making the decision to rely on such a provider. At the same time, there are compelling arguments for using an existing infrastructure instead of working on the upkeep of your own for data and compute time at that scale. Redundancy is built into such infrastructures, and perhaps it should take hard, reviewed evidence before the provider decides to kill access to everything.


It might be too expensive for some people. But really there is no other solution other than full backup of everything. Relying on a single point of failure, even on an infrastructure with a stellar record, is just a dead man walking.


And then of course there is the important bit that from a regulatory perspective 'just a backup' may be enough to be able to make some statements about the past, but it won't get you out of the situation where, due to your systems being down, you weren't ingesting real-time data during the gap. And for many purposes that makes your carefully made back-up not quite worthless, but close to it.

So then you're going to have to look into realtime replication to a completely different infrastructure and if you ever lose either one then you're immediately on very thin ice.

It's like dealing with RAID5 on arrays with lots of very large hard drives.


About ~6 years ago, I was involved in a project where data would increase by 100gb per day and the database would also significantly change every day. I vaguely remember having some kind of cron bash script with mysqldump and rsync that would have a near identical offsite backup of data (also had daily, monthly snapshots). We also had a near identical staging setup of our original production application which we would use to restore our application from the near-realtime backup we had running. We had to test this setup every other month - it was an annoying thing to do at first. But we were exceedingly good at it over time. Thankfully we never had to use our backup, but we slept at night peacefully.
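
That kind of nightly job is simple enough to sketch. A rough Python equivalent of the cron'd mysqldump-plus-rsync setup described above (the original was a bash script; paths, hosts and credentials here are placeholders, with MySQL credentials assumed to come from ~/.my.cnf):

    import subprocess
    from datetime import date

    DUMP_FILE = f"/backups/db-{date.today().isoformat()}.sql.gz"
    OFFSITE = "backup@offsite.example.com:/srv/backups/"

    # 1. Dump all databases and compress the stream to disk.
    with open(DUMP_FILE, "wb") as out:
        dump = subprocess.Popen(
            ["mysqldump", "--single-transaction", "--all-databases"],
            stdout=subprocess.PIPE,
        )
        subprocess.run(["gzip", "-c"], stdin=dump.stdout, stdout=out, check=True)
        dump.wait()

    # 2. Ship the backup directory to the offsite host.
    subprocess.run(["rsync", "-az", "/backups/", OFFSITE], check=True)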

Backup is a bit of an art in itself, everyone has a different type of backup requirement for their application, some solutions might not be even financially feasible. You might never end up using your backup ever at all, but all it needs is one very bad day. And if your data is important enough, you will need to do everything possible to avoid that possible bad day.


That's a good scheme. Note how things like GCP make it harder rather than easier to set something like that up; you'd almost have to stream your data in real time to two locations rather than bring it into GCP first and then stream it back out to your backup location.

> Backup is a bit of an art in itself

Fully agreed on that, and what is also an art is to spot those nasty little single points of failure that can kill an otherwise viable business. Just thinking about contingency planning makes you look at a business with different eyes.


Yes, I'm aware of that. But you'd be surprised how many businesses are under the impression that using 'the cloud' obviates the needs for backups. Especially if their data is in the 100's of terabytes.


Non-technical owners making faulty assumptions is not the fault of "Cloud" providers. It's probably common (I faced it myself personally, in a non-cloud situation), but there is nothing the providers can do about unprepared users.


> but there is nothing the providers can do about unprepared users.

That's true, but they can do something to avoid making things worse, see the linked article.


While true. I was specifically referring to this part:

> What if the card holder is on leave and is unreachable for three days? We would have lost everything — years of work — millions of dollars in lost revenue.

The comment suggests they were using a personal GCP account instead of an enterprise account.

Millions of dollars worth of work + apparently no backup + a non-enterprise account (but expecting enterprise support) + not having multiple forms of payment available.

Combining all these together, it seems like all sorts of things are going wrong here.

I have never used GCP (or any of the big three cloud providers), so I don't know how they are in general, but in this specific case there seems to be faulty planning on the user end.


Agreed, that wasn't smart. But, in their defense, this is how these things start out: small enough to be useful, and by the time they become business critical nobody realizes the silly credit card is all that stands between them and unemployment.


Why does it matter whether you're making millions of dollars? If you have any information which you would like to not lose for any reason, back it up in as many formats and locations as is feasible.


Agreed. I mentioned the money angle because I felt the person I replied to implied that 100s of terabytes of data are too expensive to backup.

If you are making money, or if the data is too important for you to lose, then you should have a backup; anything else is faulty planning.


> I felt the person I replied to implied that 100s of terabytes of data are too expensive to backup.

Well, you felt wrong. Of course you should back up those 100s of terabytes; in fact, the sheer volume is an excellent reason, on top of all the others, to back it up, since re-creating it is going to be next to impossible.

It's just that the companies I look at - not all, but definitely some - seem to be under the impression that the cloud (or their cloud provider) can be trusted. Which is wrong for many reasons, not just this article.


I forget where I first saw this quoted, but it's relevant here: "There is no 'cloud', only someone else's computer". That's part of why I store very little data online, compared to most people (or the data I actually have/want). Anything I'm not okay with someone else having on their computer is backed up and stored on hard physical media. No cloud provider can be trusted - the moment the government wants in, they'll get in; and the moment it's considered more profitable for the provider to quietly snoop in your stored data, rest assured that they will.


Sorry, I stand corrected.


No problem, it's just that with 'This is one of the reasons I always implore people to have a backup of their data with another provider or at least under a different account.' I thought I had the backup angle more than covered.

What bugs me about it is that there are some companies that give serious pushback because their cloud providers keep hammering into them how reliable their cloud is, that any backup will surely be less reliable than their cloud solution, and oh, by the way, we also have a backup feature that you can use.

They don't realize that even then they still have all their eggs in the one basket: their cloud account.


It's strange, but I completely missed the last part about backup from your comment. I have no idea how I missed it. Had I seen it would make my comment redundant and I would have never replied at all.

I only saw that part of the comment much much later.


Well, it definitely wasn't added in a later edit, or at least, not that I'm aware of, though I do have a tendency to write my comments out in bits submitted piece-by-piece. Even so, I wouldn't worry about it. I tend to miss whole blocks of text with alarming regularity while reading through stacks of PDFs, and when comparing notes with colleagues we always wonder if we've been reading the same documents (they have the same problem...). Reading in parallel is our way of trying to ensure we don't miss anything, and unfortunately it is not a luxury but a necessity.

Often the effects are more subtle, reading what you think something said rather than what it actually said, or missing a negation or some sub-clause that materially alters the meaning of a sentence.

Even in proofreading we find stuff that is so dead obvious it is embarrassing. On the whole visual input for data is rather unreliable, even when reading stuff you wrote yourself, which I find the most surprising bit of all.

Studying this is interesting, and to some extent important to us due to the nature of our business: missing critical info supplied by a party we are looking at could cause real problems, so we have tried to build a process to minimize the incidence of such faults. Even so, I'm 100% sure that with every job we will always miss something, and I live in perpetual fear of that something being something important.


Huh? You can provision new servers, but you can't just easily move over all the data, can you?


Why not? You should have backup strategy with business-acceptable RPO/RTO.


Easily? That's in the eye of the person doing the work I guess. But we have backups and could restore our databases.


I wrote a similar comment, thats what the best practices are for


This fraud flag is caused by your credit card being found in a leaked list of card numbers somewhere.

They suspect you are a fraudster because you are using a stolen card.

Either sign a proper SLA agreement with Google (which gives you 30 days to pay their bills by any form, and therefore you get 30 days notice before they pull the plug), or have two forms of payment on file. Preferably, don't use your GCP credit card at dodgy online retailers too...


Or you know, Google could have emailed them, told them exactly that and waited for a response before pulling the plug on the servers.

While your reasoning makes sense from Google's PoV, it doesn't from the customer's PoV. As Google is a big corp, it's IMHO better to side with the customer here, as next time it might be you who's getting screwed over by Google or another corp.


> Preferably, don't use your GCP credit card at dodgy online retailers too...

Or at gas stations, ATMs, or any other place where someone can install a skimmer.


> Either sign a proper SLA agreement with Google

How do you do that?


At least someone gets it.


I may be missing something, so help me out here... I get the impression that the author was not told the precise reason why the activity was suspicious. Wouldn't a precise error message, if not an actual human interface, have been helpful? Why the generic "suspicious activity" warning?

It seemed very Kafkaesque to me, getting tried and convicted without any mention of the crime or charge. I think the author is justified in his disapproval.


So, you think this is okay?


I can echo the sentiment here. There have been a few times when they have broken backward compatibility, resulting in a production outage on our side without even a new deployment. For example, the BigQuery client library suddenly started breaking because they had rolled out changes to the API contract the library was calling. When we reached out to support they took it very lightly, asking why we were even using "the ancient version of the library". OK, fair enough, we upgraded the library to the recommended version, but alas! the Dataflow library started breaking due to this new upgrade. For the next few hours support just kept playing binary search for a version that was compatible with both BigQuery and Dataflow while production was down.

The worst part is that when we did the post mortem and asked Google why the support resolution was so slow despite our being "the privileged" customer, their answer was that the P1 SLA only committed them to respond within 15 minutes; there is no SLA for resolution. Most of the "responses" we were getting were that a new support person had taken over in a new time zone, which is the most useless information for us.

We are seriously thinking of moving to another cloud vendor.


In my experience, the support from the other clouds is equally useless if not worse.

AWS would never admit that anything is wrong from their side.


I wonder how prevalent this behavior is. Mozilla behaves the same way towards browser extensions, which our business depends on. They removed our extension multiple times, each time before asking for something different, be it uncompressed source code, instructions for how to build it, a second privacy policy separate from our site's policy, and more. Each time we would have happily responded to a request promptly, but instead you find out when you've already been shut down.

Grace periods that respect your business should be a standard that all service providers hold themselves to


It sounds to me like Mozilla identified your extension as potentially malicious and, prioritizing user safety, shut you down first.

As far as I know, Mozilla has no business relationship with extension developers, so I would actually be very concerned if their first action wasn't to cut you off.


I can confirm Mozilla handles this very poorly. I had the exact same experience with them. It was so bad that I actually just left the extension off their store and now focus on Chrome.

There is nothing dodgy about the extension. Mozilla was just being ridiculous.


What was the extension? Specifically.


A companion extension to my price comparison website.


That entire class of browser extensions is shady. Do you make money on referrals to shopping sites?


Not from the extension (not that it would be against Mozilla's ToS if it did). It has other nice features to make our users' lives better.

Thank you for judging my business without even knowing it.


Browser extensions that say they help with comparison shopping are a very common type of "Potentially Unwanted Application" (PUA - aka malware with a legal team). The infamous Superfish is an example of this type of thing, and there are many others.

I don't know anything about your business or the extension, I'm just pointing out that you're in a space that makes you suspicious by association.


Fair enough. But this has nothing to do with Mozilla's actions. It was as GP said. It includes things like their incompetence in dealing with a build process that creates transpiled/minified code. Even when I gave them all the source and the build instructions (npm run build) they still couldn't comprehend what was going on. Yes, I know it's strange since Mozilla makes a browser with a JavaScript engine.

Edit: I should add that after 2 weeks of back and forth emails the dude was finally able to build it then blamed me for not mentioning he needed to run "npm run build", even though I did mention it AND it's in package.json AND it's mentioned in the (very short and concise) readme.txt.

So after this exasperating experience he just took down the extension without warning and said it's because it contains Google Analytics.

I would have happily removed Google Analytics from the extension. The dude had my source for 2 weeks and could have told me about that at any time, but decided to tell me after 2 weeks of mucking around, after he had already removed the extension.

It was me that decided it was not worth the hassle to have the extension on their store. I just left it off.


Maybe link to the extension on the Chrome store, so people can see (if?) it's legit from their PoV? :)


I don't want anyone to misunderstand this as an advertisement. And it's an Australian website. If you're still keen shoot me a PM.


Nah, not that keen personally (I don't even use Chrome). I was just pointing out that it would have been useful to have the URL to reduce confusion. :)


The extension isn't the main event. The website is.

The extension is currently a proof of concept that I plan to revisit later.


I wonder if OP paid for Support? https://cloud.google.com/support/?options=premium-support#op...

And had they converted their project to monthly invoicing: https://cloud.google.com/billing/docs/how-to/invoiced-billin...


What difference does that make? There's no justification for intentionally shutting down a potentially critical service with no warning.


IIRC, A couple things:

* When you have invoicing setup, the above shouldn't happen. You need to keep a payment method in good standing, but you have something like 10 days to pay your bill. -- They do a little bit more vetting (KYC) on the invoice path, and that effectively gets you out of dodge.

* Without paying for premium support, there's effectively no support.

I think if someone didn't pay their bill on time, you might shut off their service too, wouldn't you?


> if someone didn't pay their bill on time

What does that have to do with anything? The account was not shut down for non-payment, it was shut down because of unspecified "suspicious activity."

But even in case of non-payment I would not shut down the account without any warning. Not if I wanted to keep my customers.


Most hosting providers give 72 hours to pay if a method fails.

Then they unplug the Ethernet cable and wait a week or two.

But as you said, this isn’t about non-payment.


You are correct sir


"Oh hey, it looks like $customer suddenly started a bunch of coinminers on their account at 10x their usual usage rate. Perfectly fine. Let them rack up a months billing in a weekend; why not?"

A hypothetical but not unheard of scenario in which immediate shutdown might be warranted.

It's a rough world, and different providers have optimised for different threat models. AWS wants to keep customers hooked, GCP wants to prevent abuse, and Digital Ocean wants to show it's as capable as anyone else.

If you can afford it, you build resilient multi-cloud infrastructure. If you can't yet do that, at the very least ensure that you have off-site backups of critical data. Cloud providers are not magic; they can fail in bizarre ways that are difficult to remedy. If you value your company you will ensure that your eggs are replicated to more than one basket and you will test your failover operations regularly. Having every deploy include failing over from one provider to another may or may not fit your comfort level, but it can be done.


> A hypothetical but not unheard of scenario in which immediate shutdown might be warranted.

Not without warning, no. It is possible that the customer intended to start a CPU-intensive process and fully intended to pay for it.

Send a warning first with a specific description of the "suspicious activity" and give the customer a chance to do something about it. Don't just pull the plug with no warning.


> Let them rack up a months billing in a weekend

Yes, there's nothing wrong with that. You have their credit card and can even authorize certain amounts ahead of time to make sure it can be charged.


This doesn't help if the spending is fraudulent, either because the CC is actually stolen or because it will be disputed or what have you.


"If you can afford it"

There's a degree of complexity that comes with multi-cloud that's ill-suited for most early stage companies. Especially in the age of "serverless" that has folks thinking they don't need people to worry about infrastructure.

My point is that the calculus has more to it than just money. The prudent response, of course, is to do as you described. Have a plan for your provider to go away.

Offsite backups and the necessary config management to bring up similar infra in another region/provider is likely sufficient for most.


> There's a degree of complexity that comes with multi-cloud that's ill-suited for most early stage companies. Especially in the age of "serverless" that has folks thinking they don't need people to worry about infrastructure.

Perhaps we'll start seeing a new crop of post-mortems from the "fail fast" type of startups failing due to cloud over-dependency issues. They're (presumably rare) edge cases, but easily fatal to an early enough startup.


> There's a degree of complexity that comes with multi-cloud that's ill-suited for most early stage companies. Especially in the age of "serverless" that has folks thinking they don't need people to worry about infrastructure.

I just heard a dozen founders sit up and think "Market Opportunity" in glowing letters.

CockroachDB has a strong offering.

But multi-cloud need not be complicated in implementation.

A few ansible scripts and some fancy footwork with static filesystem synchronization and you too can be moving services from place to place with a clear chain of data custody.


A few ansible scripts? Nah.

Everything I have runs in kubernetes. The only difficulty I have to deal with is figuring out how to deploy a kubernetes cluster in each provider.

From there, I write a single piece of orchestration that will drop my app stack in any cloud provider. I'm using a custom piece of software and event-driven automation to handle the creation and migration of services.

Migrating data across providers is hard as kubernetes doesn't have snapshots yet.

There are already a lot of startups in this space doing exactly the kind of thing that I just described. Most aim to provide a CD platform for k8s.


It's hard to get a fully multi-cloud response in a simple stack. I describe one option for a simple multi-cloud stack in this blog post: https://blog.fauna.com/survive-cloud-vendor-crashes-with-net...


For an early startup, though, I would think it's not necessary to be "fully" multi-cloud.

Rather, it would likely be enough to have a cloud-agnostic infrastructure with replication to a warm (or even mostly-cold to save on cost) standby at the alternate provider with a manual failover mechanism.


Most folks overestimate their need for availability and lack a willingness to accept risk. There are distinct benefits that come with avoiding "HA" setups. Namely simplicity and speed.


> Most folks overestimate their need for availability and lack a willingness to accept risk.

I disagree. More specifically, I think, instead, many [1] folks just don't make that assessment/estimate in the first place.

They just follow what they perceive to be industry best practices. In many ways, this is more about social proof than a cargo cult, even though the results can resemble the latter, such as elsewhere in this thread with a comment complaining they had a "resilient" setup in a single cloud that was shut down by the provider.

> There are distinct benefits that come with avoiding "HA" setups. Namely simplicity and speed.

Indeed, and, perhaps more importantly, being possible at all, given time ("speed") and money ("if you can afford it").

The same could be said of "scalability" setups, which can overlap in functionality (though I would argue that in cases of overlap the dual functionality makes the cost more likely to be worth it).

None of this is to say, though, that "HA" is synonymous with "business continuity". It's much like the conceptual difference between RAID and backups, and even that's not always well understood.

[1] I won't go so far as to say "most" because that would be a made up statistic on my part


Agreed for the most part. Availability for very many is a binary operation. They either do none of it or all of it.

A clever man once said, "you own your availability".

An exercise in BC planning can really pay off. If infra is code, and it and the data are backed up reasonably well, then a good MTTR can obviate the need for a lot of HA complexity.


> Availability for very many is a binary operation. They either do none of it or all of it.

I assume I'm missing some meaning here, particularly since the premise of much of the discussion in the thread is that there can be high availability at one layer, but it can be rendered irrelevant by a SPoF at another (especially when the "layer" is the provider of all of one's infrastructure).

Do you consider that a version of "none"? Or are you pointing out that, despite the middle ground under discussion, the "binary" approach is more common, if not more sensible?


The binary approach is that it either isn't considered or people opt in for all of it without consideration for what is actually needed. The Google SRE book goes into this at length. For each service, they define SLOs and make a considered decision about how to meet them.


Oh, so what you're saying is that they're not considering the notion that there may be a medium-availability (for lack of a better term) solution, which could be perfectly adequate/appropriate?


Yes, there is or they wouldn't turn it off. Companies aren't in the habit of trying not to take your money for services without a pretty damn good reason.

And if it was that critical it should have support and a SLA contract, and you know, backups.


Right. Because big companies never ever do anything unjustified. Particularly when they put automatic processes in place with no humans in the loop, because we all know that computers never make mistakes.


This is fatal. I have a small pilot project on Google Cloud. Considering putting up a much larger system. Not now.

The costs of Google may be comparable to or lower than other services, but they don't seem to get that risk is a cost. Risk can be your biggest cost. And they've amplified that risk unnecessarily and shifted it to the customer. Fatal, as I said.


Making a decision purely based upon some posts on HN and the original article isn't a good idea either, as there is little data on how often this happens (and pulling the plug could happen with another IaaS). You need to weigh up your options for risk management based upon how critical your project is and the amount of time/money you have to solve the issues.

You might never see this happen to your GCP account in its lifetime.


This is a hallmark of Google's lack of customer service. They used to run the same filtering algorithm on customer search feeds as on the public one. The system was a grey list of some sort, and the client was worth about $1M in ads a day to them. Nevertheless, once a month it would get blocked, sometimes for over a day before someone read the email complaint and fixed it. We had no phone, chat, or any other access to them. They have no clue how to run a business, nor do they care. Never partner with them.


There's quite a lot of people talking about how this is their own fault, that they should have expected it, that they should have been prepared. Victim blaming, some would say, even.

But even if you assign blame to the OP for not expecting this, it doesn't look good, because the lesson here is "you shouldn't use google and if you do, expect them to fuck you over, for no reason, at any time".


Exactly. The whole point of using AWS, Google Cloud, etc, is that you get to stop thinking about certain classes of problems. An infrastructure provider that is unreliable cancels most of the value of using them for infrastructure.


Worse, they can potentially more than cancel it out, if they merely remove the "worrying about hardware" (yes, and network and load balancers and everything else) aspects, which are, at least, well understood by some of us out on the market, and replace them with "worrying about the provider", where a failure scenario is not only more opaque but potentially catastrophic, since it's a single vendor with all the infrastructure.

It reminds me of AWS's opacity-as-antidote-to-worry with respect to hardware failures. If the underlying hardware fails, the EC2 instance on it just disappears (I've heard GCP handles this better, and AWS might now, as well). I like to point out that this doesn't differ much from the situation of running physical hardware (while ignoring the hardware monitoring), both from a "worry" burden perspective and from a "downtime from hardware failure" perspective.


Google just doesn't have the talent, skills, or knowledge for dealing with business customers. They don't have competition in adtech and so never learned, but that doesn't work with GCP. They have great technical features but don't realize that's not what matters to a customer who wants their business to run smoothly.

We've gone through several account teams of our own that seem to be eager to help only to turn into radio silence once we actually need something. We have already moved mission-critical services to AWS and Azure, with GCP only running K8S and VMs for better pricing and performance.

GCP has good leadership now but it's clearly taking longer than it should to improve unfortunately.


I generally agree with you, but there is one exception: Google Fi has amazing support. I am surprised GCP wouldn't have similar support, considering the obvious cost differences though.


Google Fi is for consumers.


And? Businesses should get even better support.


>> Google just doesn't have the talent, skills, or knowledge for dealing with business customers.


This is the problem with being excessively metrics-driven. They have a fraud problem, and there's some low-dollar customer that their algorithm determines has, say, a 20% chance of fraud. They know that 80% of the non-fraudulent people will just upload their ID or whatever immediately, and they shut down all the fraud right away. Their fraud metrics look great, and the 20% of customers that had a problem have low CLV, so who cares? It's not worth the CSR time to sort it out, and anyhow, the CSR could just get socially engineered. The problem is that the 20% are going to talk to other people about their nightmare experiences.

It may not be expensive for Google to lose the business, but it's very expensive for the customer. Google's offering is now non-competitive if you aren't doing things at enterprise scale. Of course many of Google's best clients will start out as small ones. The metrics won't capture the long-term reputational damage that's being done by these policies.


This exactly describes Google's policy on small fish: by neglecting their concerns, Google gets a lot of negative reputation from small startups who spread the word on forums like this, making its technical innovation largely irrelevant to its future success.


We seem to hear a lot of bad google customer support stories. I guess it really shouldn't be surprising. Amazon grew as a company that put customers first. Google is kind of known for not doing that. They shut down services all the time. They don't really put an emphasis on customer support.


I had the company Visa blocked temporarily for suspicious activity twice in the last 6 years and no one shut their service off, but I got a lot of warnings. Seems like a really shitty thing for Google to do.

Maybe for critical accounts you need to have a backup Visa on file with Google Cloud in case the first dies for security reasons.

A single Visa is a single point of failure in an otherwise redundant system.


I was thinking the same thing. For my consumer Google account (google play music, YouTube red, buying movies, app engine) I can add additional payment methods https://cloud.google.com/billing/docs/how-to/payment-methods

After reading this article, I am probably going to do this.


If you use cloud services, a crucial scenario in your disaster recovery planning is "what if a cloud provider suddenly cuts us off?". It's a single point of failure akin to "what if a DC gets demolished by a hurricane?" or "what if a sysadmin gets hit by a bus?". If you don't have a plan for those scenarios, you're playing with fire.

https://libcloud.apache.org/
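For that scenario, something like Apache Libcloud (linked above) can at least keep the provisioning layer provider-agnostic. A minimal sketch, with placeholder credentials (constructor arguments vary a bit between driver versions, so treat the details as illustrative):

    # The same list_nodes() call works against different providers, which keeps
    # an emergency migration playbook from being welded to one vendor's API.
    from libcloud.compute.providers import get_driver
    from libcloud.compute.types import Provider

    drivers = {
        "digitalocean": get_driver(Provider.DIGITAL_OCEAN)(
            "DO_API_TOKEN", api_version="v2"),
        "ec2": get_driver(Provider.EC2)(
            "AWS_KEY_ID", "AWS_SECRET", region="us-east-1"),
    }

    # One inventory loop, regardless of which provider is currently alive.
    for name, conn in drivers.items():
        for node in conn.list_nodes():
            print(name, node.name, node.state, node.public_ips)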


Sensible complaint/explanation for how this customer was treated: mission-critical systems getting shut down without prior notice.


I wonder what support option[0] they had from GCP.

https://cloud.google.com/support/?options=premium-support#op...


I’ve used AWS support many times and it’s actually really awesome. You can ask them basically anything and they have experts on everything. Really impressive. Yes you pay every month for it but it’s really good.


It sounds like this person didn’t pay for support.


AWS Billing support is Free. And it's also equally awesome.

Paid support tiers (as far as I know) are for deeper system level diagnostics.


I have a similar story. I submitted a ticket to increase my GPU quota. Then my account was suspended, because a CSR thought the account was committing fraud. At that moment, I had a valid payment method and had been using GCP for a couple of weeks. Only after I prepaid $200 and uploaded a bunch of documents, including credit card pictures and ID pictures, was my account restored.

You heard me right: I prepaid them so that my account could be restored.

I miss AWS sometimes...


Honestly, if you're not big enough to set up invoicing, an option to always prepay wouldn't be so bad, if it reduces the risk of interruptions.


This story corroborates something I have seen from Google, and why I refuse to pay them for a service ever again, much less a critical one: Google customer service is bad by design. I have never seen a company as arrogant and opaque as them.


This happened to me in a project too. Everything went down due to their bogus fraud detection. I had a Kubernetes cluster down for over a day. Very unfortunate as I loved GCP :(


Same here. Total nuke of the project with no warning even though we were an established paying customer, and there was no fraud involved.


This highlights one of the challenges Google has going into the cloud market - they don’t have a history of serving enterprise customers and the organizational structure and processes to do that well. I think one of the reasons Microsoft has gotten cloud market share so quickly with Azure (in addition to bundling licenses with other products!) is that they have the experience with enterprise customers and the organization to serve them well (regardless of how Azure compares to GCP as a product). Supporting enterprise customers is what they have always done - not so with Google (and Amazon had to learn that as well).


Terrifying.

I'm curious though, did you have multiple credit cards on file with your Google billing account? I'm under the impression that this is part of their intended strategy for avoiding service interruption, but I'd like to know if it actually works that way.

(I took this as a reminder to add a second card to my account)


GCP allows multiple payment methods.

You should always have at least two payment methods on file with them for anything important.

That way if one gets flagged for fraud, services won't be suspended.


This has happened to me. Google's billing system had a glitch, and all of a sudden, an old bill which was paid years ago became unpaid. Google immediately tore down everything in my account without notice due to non-payment.

If something like this ever happens in AWS, they email you, call you, give you a grace period, and generally, do their best to avoid affecting your infrastructure.

GCP is getting better, but it's not ready for anything other than science fair experiments.


That's pretty bad of Google. I just looked in detail at a company using Google Cloud exclusively for their infra and the application is somewhat similar to what these guys are doing. I'll pass the article on to them. Thanks for posting this.


> assets need to be monitored 24/7 to keep up/down with the needs of the power grid and the power purchase agreements made

If you're doing something this mission critical, you have to have an SLA with your cloud provider.


Does Google Cloud offer a service where they are bound by contract not to do this? If not, no business should use them.


Yes


Is it a lot more expensive?


Probably cheaper, as you can start negotiating discounts when you get to this size.


If maximum uptime is critical to the business, your infrastructure should be cross-provider.

I've been running three providers as peers (DO, Linode, Vultr) as a one-man shop for years, and I sleep better at night knowing that no one intern can fatfinger code that takes me offline.


At that point, wouldn't hosting it yourself on something like VMware vSphere be a simpler option? At least you would have a nice hardware abstraction and a consistent API to build your tooling on.


I hear you. For me, the abstraction is the Linux distro. Build scripts abstract out creating a clean, secure box before installing any custom software, so regardless of the provider, every machine is exactly the same.


All of those are VPS providers; it's harder when you use the vendors' other services.


Agreed! My take is that you don't have to, you shouldn't.


What kind of business do you run? Can you explain a bit about how your systems in different providers are connected?


Surely! Our business is mainly an API that B-to-B customers consume. The strategy is to create identical "pods" in different cities across different providers on identical distros. Between Vultr, Linode, and DO, that's 20+ cities you could place a pod in the States alone. Each pod has a proxy up front, a database slave, and a pair of app server and cache machines.

Ignoring tweaks for international customers, each proxy is in an A record round-robin with health checks via Route 53. US-based requests get forwarded to one of the pods, and the proxy either handles the request with local servers, or points to servers in another pod if it has servers that are down. If any pod has a power outage or goes down for any reason, Route 53 automatically pulls the entire pod out of the rotation. If an entire provider goes dark, all of the pods get pulled out of the rotation, but all the others keep running.
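To make the Route 53 part a bit more concrete, here's a rough boto3 sketch of how a pod's proxy can be registered with a health check and an equal-weight weighted record, which behaves roughly like a health-checked round robin. The zone ID, hostname, and IP below are placeholders, not our real setup:

    import boto3

    route53 = boto3.client("route53")

    def register_pod(zone_id, record_name, pod_id, proxy_ip):
        # Health check against the pod's front proxy.
        hc = route53.create_health_check(
            CallerReference=f"hc-{pod_id}",
            HealthCheckConfig={
                "IPAddress": proxy_ip,
                "Port": 443,
                "Type": "HTTPS",
                "ResourcePath": "/healthz",
                "RequestInterval": 30,
                "FailureThreshold": 3,
            },
        )
        # Equal-weight weighted records approximate a round robin; an
        # unhealthy pod is dropped from DNS answers automatically.
        route53.change_resource_record_sets(
            HostedZoneId=zone_id,
            ChangeBatch={
                "Changes": [{
                    "Action": "UPSERT",
                    "ResourceRecordSet": {
                        "Name": record_name,
                        "Type": "A",
                        "SetIdentifier": pod_id,
                        "Weight": 1,
                        "TTL": 60,
                        "HealthCheckId": hc["HealthCheck"]["Id"],
                        "ResourceRecords": [{"Value": proxy_ip}],
                    },
                }]
            },
        )

    register_pod("Z3EXAMPLE", "api.example.com.", "pod-dal-vultr", "203.0.113.10")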


This is very cool. Where can I learn more about stuff like this, and what are the prerequisites for learning something like this? I have a BSc in CS and understand OSes and programming languages pretty well.


Posts on http://highscalability.com can get a bit hardcore, but there's a lot of great knowledge there.


Thanks a lot for your help!


Your domain registrar can probably still hose you. Where's your DNS hosted?


Very true! I have a script that automatically commits any DNS changes to source control, so if Route 53 bites it I could quickly move somewhere else, but you're right, on the off chance my registrar decides to vanish, there'll be some panicking.


You really can't compare switching between generic VPS providers with switching between the big cloud providers that offer lots more useful services.


It is a very different model, but my take is that you shouldn't be building anything on one provider that you couldn't easily move over to another. Provider lock-in is scary.


Google's customer service is ridiculous. They once suddenly emailed me that my merchant earnings (accumulated over about three years) would be escheated to a national government unless I provided valid payment information within one month. However, for various reasons, it would take me more than a month to get valid payment information.

Then they really did escheat my earnings to some national government after 30 days. However, when I asked them for the escheatment ID so that I could contact that government to get my money back, they said they don't have the ID! They escheated my money without keeping any record, which is almost the same as throwing my money into the ocean.


I think this is another instance of GCP just being really, really terrible at interacting with customers. I'm biased a bit (in favor of GCP), I suppose in that I have a fair bit of infra in GCP and I really like it. I'll share a couple anecdotes.

Last year my team migrated most of our infrastructure out of AWS into GCP. We'd been running k8s with kops in AWS and really liked using GKE. We also developed a bizdev arrangement.

As I was scaling up my spend in GCP, I began purchasing committed use discounts. Roughly similar to reserved instances in AWS. I'd already made one purchase for around $5k a month and these are tied to a specific region and can't be moved. I went to purchase a second $5k block, and typo'd the request ending up with $5k worth of infra in us-central rather than us-west. The purchase doesn't go into effect until the following day and showed as "pending" in the console. No big deal, I thought, I'll just contact support and they'll fix it right away, I'm sure. I had this preconceived notion based on my experiences with AWS. I open a support request and about an hour later I get a response that basically tells me that once I've clicked the thing, there's no undoing it and have a nice day.

I've literally just erroneously spent $5k for infra in us-central that I can't use and their response was basically, "tough". $5k is a sufficiently large loss that I'd be inclined to burn a few hours of my legal team's time dealing with this issue, something I shared with the support person. After much hemming and hawing over the course of a few days, they eventually fixed the issue.

More recently, I've been dealing with an issue that is apparently called a "stockout".

Unlike AWS, GCP does not randomize their AZ names for each customer. This means that for any given region, the "A" availability zone is where most of the new infrastructure is going to land by default. Some time in May, we started seeing resource requests for persistent SSD failing in us-west1-a. The assumption was that it would clear up pretty quickly after an hour or two, but it persisted. After about a day of this, we opened up a support case asking what was going on and explaining the need for some kind of visibility or metrics for this kind of issue. The response we received was that this issue was "exceedingly rare", which was why there was no visibility, and that it would be rectified shortly, but we couldn't be given any specific timeline.

I followed up with my account rep, he escalated to a "customer engineer" who read the support engineers notes and elaborated how "rare" this event was and how unlikely it was to recur. Again, I contacted my account rep and explained my unhappiness with the qualitative responses from "engineers" and that I needed quite a bit more information on which to act. He was sympathetic throughout this whole process escalating inside the organization as needed and shared with me some fairly candid information that I couldn't get from anyone else.

Apparently, the issue is called "stockout" and us-west1-a had a long history of them. The issues had been getting progressively worse from the beginning of the year and at this time, this AZ was the most notorious of any in any region. Basically, the support engineer either patently lied or was just making stuff up. Also, I shared with my GCP rep how AWS load-balances across AZs. He promised to pass that along.

The moral of the story is that if you want to be in us-west, then maybe try us-west1-c. Also, GCP is a relatively young arm of a well-established company that has a terrible reputation of being able to communicate with consumers. They'll eventually figure it out, it will just take some time.


Am I alone in thinking critical infrastructure monitoring like this should be run on the metal in your own data center? Sure, offload some data for processing and reporting to cloud providers. But I'm slightly worried that electrical grid technology is using something it cannot control, and freaking Uptime Robot (I use it also) instead of proper IT in a controlled facility.


Usually there are a bunch of comments about power plants being connected to the internet. I doubt the connections from the control rooms or cloud back to the machines are read-only unless they have protocol-level filters to remove write commands from the WAN to plant networks.

Just the way it is, unless it is a nuclear plant, probably.


This is a rookie mistake: creating and hosting a mission-critical part of your business (not just your backend infrastructure, your entire business) with a vendor with which you have no real, vetted contract in place?

Total clown show.

Take this as a lesson in risk management and don't fuck it up again.


I lost my Gmail account (locked due to phone verification; I'd lost my SIM card on holiday). All my photos from the vacation and important emails are gone (I'd done a backup the day before). That was about 3 years ago and I'm still waiting for an explanation from Google: 20 emails sent, no response.


Google once shut down my employer's app store account for 3 days for some routine review. That didn't just mean we didn't get any money (even that could be argued); it meant we were simply off the store. Because someone needed to see a TPS report.


No doubt this is horrible practice and customer service on Google's side, but as to the hypothetical question - what if the card holder is unreachable, we lose years of work, etc. Well, if you are not professional enough to back up your system regularly, you should lose everything and it's your fault to begin with... For real, Google is not the only one who should step up their game... Just because it's in the cloud now doesn't mean you should ignore decades-old best practices... servers die, hard drives die, people with passwords die... handling stuff like that is part of your job...


I work here in Google Cloud Platform Support.

First, we sincerely apologize for the inconvenience caused by this issue. Protecting our customers and systems is top priority. This incident is a good example where we didn’t do a good job. We know we must do better. And to answer OP’s final message: GCP Team is listening.

Our team has been in touch with OP over what happened and will continue digging into the issue. We will be doing a full review in the coming days and make improvements not only to detection but to communications for when these incidents do occur.


I think it starts with ensuring that you have staff that actually review an action before it takes place. Relying on automation can be catastrophic for someone like OP.


Horrible. This reminds me of travels in third-world countries where sometimes the electricity just dropped out. Ha, I never would have thought that Google, of all companies, would occasionally be like this...


We are currently conducting a study on behalf of the European Commission in order to have a better understanding of the nature, scope and scale of contract-related problems encountered by Small and Medium Sized Enterprises in EU in using cloud computing services. The purpose is to identify the problems that SMEs in the EU are encountering in order to reflect on possible ways to address them. To assist the European Commission with this task you are kindly invited to contact: eu.study.cloudcomputing@ro.ey.com


This sounds interesting! I have founded an SME and I have experienced issues w/ cloud computing, mainly data storage and problems with the contracts from providers. I will write you an email! Thanks!


Whether AWS or Google Cloud, if you’re running a real business with real downtime costs, you need to pay for enterprise support. You’ll get much faster response times so that you can actually meet your SLA targets, and you’ll get a wealth of information and resources from live human beings that can help even into the design phases.

Feel free to budget that into your choices on where to host, but getting into any arrangement where you rely on whatever free tier of support is nonsensical once you’re making any kind of money.


Totally agree. There are so many missing redundancies in this project. The CTO / Dir. Eng should have their own card. They should have some contact with a Google account rep, and at least the basic support package.

We're not a big Google customer, a couple thousand a month, but when we migrated we instantly reached out to account reps and have regular quarterly check-ins.


True.

Even so, I feel like it's still reasonably likely that the robot would shut down a project for "suspicious activity".


This has nothing to do with enterprise support, which is terrible, and everything to do with the fact that Google has automated as much decision making as possible and is terrible at interacting with customers.

I'm not sure if it's the culture of secrecy or sheer cluelessness, but it's pretty bad.

I'm not trying to slam them either, I still really like their products and use them.


Regular Google support: two days after your stuff goes down, you get an email saying the algorithms decided you're fraudulent.

Enterprise Google support: one day after your stuff goes down, you get a phone call saying the algorithms decided you're fraudulent.


Paying for Enterprise Support simply so that they don’t fuck you over sounds a lot like hidden costs to me. The service should be enterprise-grade even if you don’t.


If you run an enterprise, pay for enterprise support. If you aren’t (r&d accounts, startups that aren’t monetized yet, etc.) then don’t pay for it. Flexibility is the model here, and it’s utilized by the big players, so being naive to the business model will burn you on many public clouds.


> If you run an enterprise, pay for enterprise support.

I do, but that wasn't my point.


I wouldn't trust Google with anything anymore. Their customer support (enterprise or not) has always sucked. They're always "right". You can only suffer the damage silently unless you're worth millions to them. People should come to realize that Google has not been your friend for years now.


The weird thing here is that inside google, support is great. Meaning that when people are building products for other google engineers, they go way above and beyond what is needed to help each other out. Somehow, they are not able to transition that to external customers.


What really stands out about this blog post and thread is that, as of the moment of writing this comment, there's no official answer from Google, or even an unofficial one from an employee. The feeling I get as a customer is that they just don't care.


I did make a post below actually a few hours back and will copy and paste it below for reference. Rest assured that we are working on this one and are in contact directly with the customer. We hope to have more of an official response soon.

from below: I work here in Google Cloud Platform Support. First, we sincerely apologize for the inconvenience caused by this issue. Protecting our customers and systems is top priority. This incident is a good example where we didn’t do a good job. We know we must do better. And to answer OP’s final message: GCP Team is listening.

Our team has been in touch with OP over what happened and will continue digging into the issue. We will be doing a full review in the coming days and make improvements not only to detection but to communications for when these incidents do occur.


My company was another victim if you need another non-fraud case to look into.


Sad to hear about such events. Can anyone comment on what number of events (%) would lead to a tipping point that moves people to avoid any SaaS? Has anyone experienced such tipping points that led to an overall change in trend, especially in other industries?


On top of "this sucks": such a "fully automated" response is a violation of the GDPR, which requires a "right to obtain human intervention on the part of the controller".

Stunning that Google cloud can get away with this.


The first requirement of any production system is redundancy. It sounds like, on top of the layers for CPU, storage, network and application, the new requirement is redundant cloud platform providers.


The writer, Twitter handle @serverpunch, not only doesn't identify themselves, they seem to have created a Twitter account just to shit on GCP.

What is the activity which got flagged as suspicious?


Wow, haven't heard of a story like this yet where a cloud provider shuts down all resources without warning...

Luckily, with AWS and Azure there is great competition to move to.


Does Google allow you to preauthorize your purchasing card? It seems like part of this is that they all of a sudden get suspicious about your billing information.


I guess Google treats their paying cloud subscribers like their revenue-generating YouTube content creators. Moderately surprised...


Google services are great - unless something goes wrong. It seems Googlers have never googled "support" :)


Wow if that’s possible on google cloud then I agree wholeheartedly.... don’t risk your business by using google cloud.


Larger companies on AWS will spread their infra across multiple accounts to deal with this risk.


I experienced exactly the same thing earlier this month. Due to "suspicious activity" on YouTube, Google suspended not only my YouTube account, but also Gmail, Google Chat, and ALL other services they provide, including their Cloud services.

They provide no explanation as to why the accounts are closed, and provide only a single form to appeal the situation.

In my case they refused to release the locks on their services so all information, all contacts, all files, all histories, all YouTube data, and everything else they store is now effectively lost with no means of getting the data back.

This was done without warning, and the account was locked due to "suspicious activity on YouTube".

Through experimentation with a new account I found that the "suspicious activity" they were referring to was criticism of the Trump administration's policy of kidnapping children from their parents.

Posting such criticism to the threads that follow stories by MSNBC and other news sources triggered Google to block YouTube and all other services they provide, and to do so without warning or any explanation.


Here's the harsh truth:

If your stuff relies on "cloud" you are on your own.


Sorry, but this happened because Google and AWS lack the vision to run a true enterprise business. I have seen this situation on Azure many times, and in the end they try to locate the commercial people behind the customer. It is a lack of enterprise vision.


Well, that's a real horror story and I hope Google's listening. I guess Amazon Web Services really is the way to go.


Does it really matter whether you get bitten by the dog or the cat?


False equivalence. Amazon built their products on putting customers first while Google is like a giant robot overlord without a human interface


And dogs are supposed to be our best friends.


Has anyone experienced anything like this on IaaS services like Heroku or Gigalixir?


Wow, that is insane. Hope this gets distributed widely to Google Cloud decision-makers.


I don't get it. Why use Google at all? There are tons of better things out there.


GCP's managed Kubernetes product is superior to EKS and AKS.

I personally prefer using their API and tools over AWS. They seem more sensible. Azure is horrible.

GCP offers inter-region networking and more sensible firewalls and routers.

Pricing is pretty comparable to AWS, slightly better for instances, slightly worse for storage.

Storage performance for persistent SSD seems a little better than EBS GP2.

Cheap local disks on any kind of instance.


Do you know if GCP's firewall has limits on the number of rules, and what those limits are?


I'm sure there's a limit, but I don't see anything that's published.

I, for one, wouldn't ever want to operate a firewall with enough rules where I'd have to be concerned about such a thing. Eek.


its pricing is pretty good, at least for VMs.


For managed services, there are better options.

For just VMs with the fastest CPU, storage, networking and the easiest billing compared to other major clouds, GCP wins.


I'd say that really depends on what you need. If you want a full platform with a lot of integrated services, there's really only GCP, AWS, Azure, Tencent, Alibaba and maybe something else I forgot. So I wouldn't say there are a ton of better options available. Sure, if you only need a bunch of VMs you have a lot of options.


I had an Alibaba Cloud experience similar to the OP's, but they gave me a 24-hour deadline.

"We have temporarily closed your Alibaba Cloud account due to suspicious activity. Please provide the following information within [24 hours] by email to compliance_support@aliyun.com in order to reopen your account: ... If you fail to provide this information your account will be permanently closed, and we may take other appropriate and lawful measurers. Best regards, Alibaba Cloud Customer Service Center "

I provided the documents in ~30 hours, because that's when I saw the email. There was no further communication from Alibaba. I assumed everything was OK, but two weeks later my account was terminated.


Triton cloud! Joyent has a wonderful service that allows setting up multi-cloud environments.


I think I have been linked to at least a few hundred Medium articles, and this is maybe the third one that was good.


Wow as a CTO this is a nightmare scenario. Is this common? I guess this means google.com does not use Google Cloud because I’m sure they have uptime targets. They cannot handle incidents like this and expect people to take them seriously as a cloud provider.


Read the SRE book and learn about the companies you are talking about…


I'm not the person you were responding to, but I'd be curious to hear more. Could you please explicitly say whatever it was that you were implying?


I think the reference is to this book, "Site Reliability Engineering" [1]. As you will see on the main page [2], it is part of Google's effort to describe "How Google Runs Production Systems", which is basically what the parent comment was asking about.

[1] https://landing.google.com/sre/book.html [2] https://landing.google.com/sre/


> We would have lost everything — years of work

Okay that sounds like a greater systemic problem.

You should have been able to deploy your git repository on another system pretty quickly, as well as have your own backups of your database.

The most time consuming thing should be setting up the environment variables.

Let me see, what else would be tricky: if you are using Google Analytics, that data might be gone, but your other metrics package should have had many snapshots of that data too.


From a Google shareholder’s perspective, this approach is unacceptable:

The only way to use their product safely is to engineer your entire business so that cloud providers are completely interchangeable.

Forcing the entire industry to pay the cost of transparently switching upfront completely commoditizes cloud providers, which means they’ll no longer be able to charge a sustainable markup for their offerings.

This is fiscally negligent. Upper management should be fired.

However, it's great for the rest of the industry: Google nukes a few random startups from orbit, some VCs take a bath, early and mid-range adopters bleed money engineering open source workarounds, and everyone else's cloud costs drop to the marginal costs of electricity and silicon.


It's not even a premature optimization to make completely interchangeable server code these days.

The shareholders should be proud that such naiveté towards vendor lock-in is still rampant.


Not OP, but according to his business case, being down for a few days could bankrupt the company. Re-deploying from git doesn't solve the use-case of your public cloud provider pulling the plug on your machines.


"Why you should not use Google Cloud" if you're a small business. A large business will have contacts, either on their own or through a consulting firm, that can call Google employees and get help. As a small company, you're at the mercy of what Google thinks is adequate support for the masses.



