Google Intrusion Detection Problems (fredtrotter.com)
374 points by K0nserv on Aug 22, 2016 | 136 comments



If Google is ever going to fix their cloud market share problem[1], they need to reverse the deeply ingrained perception that they don't support or maintain their services for the long term.

The majority of Hacker News pundits will all have the same gut reaction to these kinds of anecdotes: complete and total lack of surprise. And although not all of us are in a position to control the vast budgets of the enterprises that drive much of this market share, the opinions of the rank and file do have a vast impact, and I believe Google's unstable support and product commitment is one of the biggest things holding them back.

1. http://www.nytimes.com/2016/07/25/technology/google-races-to...


I just want to say that Google's supposed lack of support is just a perception, despite all the usual comments below.

Real example: we use Appengine, and the other day I spotted a bug in a request handler. I filed an issue on the Appengine issue tracker, and have been having a back and forth conversation with a very real (and very skilled) engineer about how to resolve it. This is outside our support contract, so anyone with a software issue has exactly the same access.

This doesn't cover situations such as the OP experienced, of course - but if they had a support contract, I think they'd have found the issue was resolved quickly and efficiently. We pay $150 a month for a silver support contract, and when we've needed help, we've got personal, prompt and effective support.

At the end of the day, you get what you pay for. If your organisation depends on any service, you need effective support, and you need to pay for it. That goes for Microsoft, Amazon, Salesforce or anyone else - good support costs a lot, and they all have to cover costs. Each company uses a different model, and some include more support costs in their base charges than others.

I happen to like Google's model where dipping your toe in the water costs very little, and when your service proves itself you can ramp up support as you need it. But it leaves Google open to criticism like the OP's, from people who let their small project grow big without contingency planning and then complain when something goes wrong.

Finally, the only reason I'm commenting here is that I was so impressed with the support from the engineer mentioned above that I said I would mention him next time Google's perceived lack of support came up on HN. Promise kept!


You are so right. I really want Google to succeed (the more competition the better), but I hear things like this and I'm just blown away. They are already fighting a reputation for abandoning projects when it suits them (yes, mainly consumer applications, but the perception persists) and doing this just makes it worse. C'mon Google ... hire some marketing people that know what they are doing and sort out your customer perception issues.


I don't understand why people are ever surprised about Google offering substandard support for really anything.

Time and time again Google products prove to make the happy path happier, but as soon as anything should go wrong, you're on your own with no explanation, and no support.

Additionally, and as an aside to the core issue here, I've found the recent Google Cloud UI updates to be a real pain. When I last used them, I did acknowledge that it was beta, but I seemed to lose an awful lot of oversight of an awful lot of things. It's put me seriously off it vs. AWS.


Well, if it's something like YouTube you can kind of "forgive" their awful customer service in exchange for the free hosting and search features. It's a heck of a lot less forgivable for a product you pay actual money to use.

The support page telling you to use a widget that doesn't exist is really going above and beyond in the realm of user-hostility.


FWIW, Dropbox also points you to buttons that don't exist.

When you delete an account, DB keeps charging you. There's a nice FAQ that explains to push a "review subscription" button and blah blah, but in reality that button doesn't exist.

The button isn't there.

Support will "fix" it for you one-off, but not change the button. Wonder if this dark pattern is worth 3-4% of revenue.


> When you delete an account, DB keeps charging you.

Maybe they expect you to cancel the card upon deleting an account.

It can be said with certainty that this cannot be an oversight. Not many look at their credit card statement anyways.


> It can be said with certainty that this cannot be an oversight.

It can be, especially when you're using something like Stripe. It only takes one failed call without retries (or a period of time where networking isn't working quite right) for an account to be orphaned.

Arguably, those sorts of calls should be durably queued and retried until success, but it often doesn't happen.

PM decides it's cheaper to have support field the one-off requests than to spend a few days making those changes.

Later on, another developer accidentally disables the call and there isn't sufficient test coverage for disabling billing on account removal.

It's a story that'll be repeated over and over.
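For what it's worth, the "durably queued and retried until success" idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a SQLite table stands in for the durable queue, `cancel_subscription` is a hypothetical placeholder for whatever billing call has to succeed (it is not a real Stripe function), and the table and function names are made up.

```python
import sqlite3
import time


def open_queue(path=":memory:"):
    """Open (or create) the durable job table. Use a file path in practice."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS cancel_jobs ("
        "account_id TEXT PRIMARY KEY, attempts INTEGER NOT NULL DEFAULT 0)"
    )
    db.commit()
    return db


def enqueue_cancellation(db, account_id):
    """Record the intent to cancel billing before calling the provider."""
    db.execute(
        "INSERT OR IGNORE INTO cancel_jobs (account_id) VALUES (?)", (account_id,)
    )
    db.commit()


def drain(db, cancel_subscription, max_passes=5, delay_seconds=0.0):
    """Retry every pending job; a job is deleted only after the call succeeds."""
    for _ in range(max_passes):
        pending = [row[0] for row in db.execute("SELECT account_id FROM cancel_jobs")]
        if not pending:
            return True
        for account_id in pending:
            try:
                cancel_subscription(account_id)  # hypothetical billing call
            except Exception:
                # Keep the job in the table and count the failed attempt.
                db.execute(
                    "UPDATE cancel_jobs SET attempts = attempts + 1 WHERE account_id = ?",
                    (account_id,),
                )
            else:
                db.execute("DELETE FROM cancel_jobs WHERE account_id = ?", (account_id,))
            db.commit()
        time.sleep(delay_seconds)
    return db.execute("SELECT COUNT(*) FROM cancel_jobs").fetchone()[0] == 0
```

The point of the design is that the job row outlives a failed call or a process restart, so an account can't be silently orphaned by one bad request.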


The easiest way to fix it is to click the fraud button on your cc account =P


Nice if it were that easy. But for business accounts you want to delete one (or more) licenses. Look at the form 2/3 of the way down and ask if this makes sense.

https://www.dropboxforum.com/hc/en-us/community/posts/201886...


There should be integration tests for documentation. Maybe some machine learning could be applied to that? Maybe I should set that up on some cloud service. Let me check the documentation for how to do that...


That's biting off way too much. You have to realize the scope of the actual problem: "making humans understand something they previously did not understand". If you can do that you might as well apply it to fixing education instead.


One of the things sandstorm.io does which makes a lot of sense, is that they regularly commit changes to their docs within the PRs that change the actual software. It's not integration tested, mind you, but the general practice means they do a pretty good job keeping their docs updated.


Maybe Google needs to have fewer robots involved and more human beings?


> It's put me seriously off it vs. AWS.

Can't really compare, but AWS isn't great either.

I've experienced instances when an instance (1) shut down, (2) rebooted on its own.

AWS support would point you to the SLA if you were to report the incident.


Even if they do, that's worlds better than saying "One of your servers looks like it's doing something bad, so we will shut down your entire Google/AWS account automatically. We won't tell you what server it is or what it's doing that we think is bad. There's no way to talk to a person at all. Your only hope is to either post something on social media and hope it gets noticed, or restart your business on a platform that actually knows what support is."


All servers can shut down on their own; servers in the cloud can fail like any other. A good way to develop a robust system is to do your own surprise shutdowns / reboots, since Amazon probably doesn't do it often enough.
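A rough Python sketch of what a self-inflicted surprise shutdown could look like. Everything here is an assumption for illustration: `terminate` is a hypothetical callback you would wire to your provider's API yourself, and the instance names are made up.

```python
import random


def pick_victim(instance_ids, protected=(), rng=random):
    """Choose one instance to shut down, skipping any marked as protected."""
    skip = set(protected)
    candidates = [i for i in instance_ids if i not in skip]
    if not candidates:
        return None
    return rng.choice(candidates)


def run_drill(instance_ids, terminate, protected=(), rng=random):
    """Terminate one randomly chosen instance and report which one it was."""
    victim = pick_victim(instance_ids, protected, rng)
    if victim is not None:
        terminate(victim)  # hypothetical hook, e.g. a wrapper around a cloud API
    return victim
```

Run something like `run_drill(["web-1", "web-2", "db-1"], terminate, protected=["db-1"])` on a schedule and see whether anyone notices.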


EC2 instances are meant to be ephemeral.


That doesn't sound factual at all. Got docs which support that? They send us emails every time they plan on doing intentional maintenance on our instances.


No, I'm pretty sure this has always been the case. Some early adopters lost data when it was on local volumes and not backed up, and everybody who paid attention learned from the public post mortems.

Separately, even hardware you own and operate is ephemeral. You never know when the disk, PSU, or cooling fan will kick the bucket and the host wedges one way or another.

If you want the illusion of permanence, VMware had a solution for you ... for a price you're not gonna like!


I wish it wasn't true. On all the occasions, I was told that there was "an issue with the underlying hardware".


Yes, this happens sometimes with AWS. We're heavy users and have had to respawn instances on several different occasions due to unexpected issues with the underlying hardware. Only once or twice has AWS told us they're moving us because of that, but we've noticed performance issues that, after troubleshooting with support, could only be described as a symptom of the hardware.

Everyone should have backups and a plan in place to recover when a node goes offline, but this isn't exclusive to AWS or cloud in general. A robust infrastructure will have failovers standing by and accept the possibility that any one machine could die at any moment. Shouldn't have any single points of failure, as colocated hardware kicks the bucket unexpectedly sometimes too.


It is worth noting that "the cloud" is synonymous with "someone else's computer"


> It is worth noting that "the cloud" is synonymous with "someone else's computer"

Well, no, it's not, otherwise "private cloud" would be incoherent.


Sure, but that's very different than them being intended as ephemeral.


I didn't suggest that those are ephemeral.


Yes, yes you did. Sargun posted: "EC2 instances are meant to be ephemeral" to which bpicolo replied: "That doesn't sound factual at all." You then replied with: "I wish it wasn't true."

_it_ in this case refers to the ephemeral state of EC2 instances. You literally did suggest they were ephemeral.


This does happen (I once wrote that it didn't ... turns out I just didn't wait long enough), but I've always found AWS support to be really helpful. Couple that with the abundance of documentation that they offer and all of the information that is available on the Internet and the situation isn't really comparable to Google's offering.


Yeah, Google tends to rely on machines too much for customer support for paid services. I understand the rationale, but it's just not good enough compared to real people handling customer support... yet


FWIW, anecdotally the Google Fi support was pretty good


My name is Terrance and I work on Google's Cloud Support Team. Our team mission is to Reduce Customer Anxiety. We take that very seriously, and Fred's experience obviously shows that we fell short this time. The process should have been an easy flow to follow, and it was not. We are reviewing this incident in detail to ensure that we make the process less error prone and quicker in the future.

Best, Terrance


Thanks for your help and for responding here, but I hope you guys at Google know that this seems like a major organization-level problem. There's a big perception that Google has no real customer support for anything, even if there's pretty serious amounts of money involved, fueled by a regular drumbeat of blog posts about people who had their entire livelihood shut down for unexplained reasons with no possible recourse. It's kind of like the "five whys" failure analysis. Yes, this particular automated workflow was broken, but that's not the real problem. The real problem is the lack of human support that can turn issues like this from a crisis to a minor hiccup.

It seems like you guys really need to get some call centers filled with people with some level of diagnostic ability and problem-solving authority. Yes, this is expensive and difficult to get built, but if you really want to Reduce Customer Anxiety, nothing can do that like the knowledge that there's a real person available 24/7 who can fix any problems on Google's side. Make it pay per incident or by subscription if needed to keep the jokers out, but anything is a whole lot better than nothing.


Even the phrase "Reduce Customer Anxiety" would increase my anxiety if I were a customer (and steels my resolve not to become one of their "acceptably-anxious customers").

"Reduce", not "remove" - it's stinks of implying there's an acceptable level of "customer anxiety" that their platform should generate, and someone's set some easily-measurable metrics and a bunch of people's KPIs - and those people are now all working out how to game the metrics instead of solve the problems. "Hey, my bonus depends on reducing the clicks on the "Request an appeal" button. What's the easiest way to meet that goal? I _could_ start handling customer service in a professional and sympathetic way - but that's not "The Google Way". I know, I'll just move the button to a different place this quarter, rename it next quarter, and remove it all together before the end of the year."

Sorry Google-people - I know you all don't think you're doing evil, but there's a hell of a reputation you've got to overcome on the "Google just doesn't do customer service" issue.


> "Hey, my bonus depends on reducing the clicks on the "Request an appeal" button..."

That sounds entirely too plausible as the real root cause of this problem.


I don't want to sound disrespectful, and very much appreciate your posting. But realize that the audience here has seen the OP's story play out with Google's products over and over again. This is clearly a systemic issue with how Google handles support, and claiming "fell short this time" is more than a little disingenuous.

(I also get that 'man bites dog' will get more traction than the inverse, but regardless - this kind of complaint is a very common one about Google's products, from consumers and businesses alike).


FWIW Google Cloud Support != Google support in general. My interactions with Cloud Support on the other side of the fence have led me to believe they are very committed and working very hard. Yes, you have to pay for it. But once you do, regardless of the tier of support, I've seen them work tirelessly and escalate tickets to engineering quickly (on the order of hours, not days) if they couldn't figure it out.

I'm looking at the postmortem now and without wishing to jump the gun and talk about things I can't, it looks like this is being taken very seriously and a number of improvements and bug fixes are going to result. In this instance, I think it's doing the Cloud Support people a disservice to call them "disingenuous".

Disclaimer: Used to work on Google Cloud, now on Google Open Source Programs Office.


> FWIW Google Cloud Support != Google support in general

> I think it's doing the Cloud Support people a disservice to call them "disingenuous"

The problem is that to us from the outside, they do look the same.

In fact, that email in the article, and the subsequent events, are in exactly the same style as the "ban hammer" Play Store publishers get. You really can't blame us (Google's actual paying customers) for equating one with the other.


I'm not entirely convinced that people who have a bad experience with Windows support would then assume Azure support was poor, similarly with Amazon and AWS... Why should Google [Product Area X] and Google Cloud be any different?


Windows and Amazon are consumer facing products. Azure, AWS, Play Store publishing and Google Cloud are all developer facing products.

(And no, I do not equate consumer Windows support with enterprise Windows support either, and as it turns out they are not the same level.)


Nobody cares how hard you or your coworkers work. Results matter, nothing else.

Given the story so far, you offer adwords style customer service (we don't give a damn about you) while attempting to catch up from position 3 behind aws and azure in the paas race. Complete with automated emails with outdated documentation and no human anywhere without a social media outcry. I'm sure stories like this are damaging (seriously, just use aws), and there will be some postmortem damage control, but unless I see a human support sla for situations like this, I can't see why anyone would risk their business on you.

Seriously, a "fix it or we shut you down" email for reasons you won't share, with a 2 business day response. Of course, you're issuing a 3 day warning, so if that fell on a Friday, you would be dark before anyone bothered to look at the support ticket (3 days expire Monday, 2 biz days Tuesday evening). Amazing.
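The deadline arithmetic above is easy to check in Python. This sketch assumes the warning goes out on a Friday (the specific date is only an example), counts business days as Monday to Friday, and ignores public holidays.

```python
from datetime import date, timedelta


def add_calendar_days(start, days):
    """Advance by plain calendar days, weekends included."""
    return start + timedelta(days=days)


def add_business_days(start, days):
    """Advance by whole business days, skipping Saturday and Sunday."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday is 0, Friday is 4
            remaining -= 1
    return current


warning_sent = date(2016, 8, 19)  # a Friday, chosen only as an example
shutdown_day = add_calendar_days(warning_sent, 3)
support_reply_by = add_business_days(warning_sent, 2)
print(shutdown_day.strftime("%A"), support_reply_by.strftime("%A"))
```

The shutdown lands on Monday while the support response window runs to Tuesday, so the project can go dark a day before support is even obliged to answer.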


As someone who had a similar experience[1] with Google Cloud, I would like you to understand that people place their businesses in Google Cloud. When you switch off a project, you switch off someone's business. For anyone else, even for a government agency, such action would never be taken hastily. Since downtime could be catastrophic for a company, you should err on its side. Technical prowess isn't all that matters for business.

That said, we still use Google Cloud for some part of our infrastructure and enjoy its technical side.

[1] The billing credit card for one of our projects expired weeks before any invoicing would happen. Instead of informing us —after all we were paying customers for many months before, there should be at least some good faith— all our projects were disabled many minutes (maybe 15 or 30) before we got a confusing email about the issue at hand and the total downtime was two days.


Love the "Reduce Customer Anxiety" phrase.

Reminds me of the emergency telephone signs on the Golden Gate Bridge. They once read "Emergency Telephone in case of Breakdown", so people could call a tow truck. Then they were changed to "Emergency Telephone for Psychological Counseling" and now connect to suicide prevention. I can see someone with a flat tire on the phone, "And how does having a flat tire make you feel?"

This mindset is totally inappropriate for a B2B service. The caller is probably not anxious. They're at work, doing their job, keeping something working. They need repairs, not grief counseling.


It's the new fad of marketing support as being an advocate for the customer. They use language and phrasing to imply they exist to make the customer happy/relaxed/productive which if it were true would be a great thing.

But it's just marketing; at the end of the day, support is still a large expense that some companies believe they don't have to bear. This seems to work for some companies better than others, likely due to different business models leading to different customer expectations and requirements.


Don't worry dear customer, the company "understands your concerns". Great! (But that doesn't help me get my problem solved.)


Google Cloud publishes a Root Cause analysis after technical incidents. I wonder if this non-technical incident is worthy of a root cause analysis. As a potential customer, I know I'd feel more comfortable seeing this organizational "bug" fixed.


It seems like there are a lot of points in Fred's story where even a minuscule amount of manual testing would have revealed the problem. For example, paying a tester to follow the instructions in the FAQ or to try to appropriately respond to the original email would have revealed how unclear the instructions were and that there is a missing UI control.


I really want to believe you, but every Google service has been like this for years. Like Play Store for example.

It will take way more to show you guys really are reviewing similar incidents and, more importantly, to actually act on your reviews and improve your processes.


This is absolute BS. I've had nothing but stellar support from the Play store and you can easily contact them via chat, email or even voice.


I just tried publishing my first app to the Play store and am in the middle of a horrible experience.

I wrote a little app to help myself learn birdsong. You copy some mp3s to a folder on your sdcard, open the folder in the app, and it will shuffle them and only show you the name of the file if you ask. Pretty simple, but I found it useful and thought it might be useful to other people. So I pay my $25, upload my app, and wake up the next day to an email saying my app had been suspended for "deceitful behaviour" and that if it happened again they might suspend my other Google services, for instance the gmail account I've had since 2004.

Of course the email doesn't contain any information on what the deceitful behavior was, so like a Kafka story, I'm stuck defending myself against charges that I'm never informed of. When I heard back from support, they say that my app opens to a list of mp3 apps to download. So either my account has been hacked and someone changed the app apk, or they are unable to tell the difference between a directory listing from the sdcard and a list of mp3 apps. If the latter, how my app can be deceitful when it matches one of the screenshots I attached is completely beyond me.

Once my email migration to fastmail is complete, I'll rename my app and try again, this time with a modified UI that makes you click a button before showing a directory listing. I'd like to get my $25 worth if nothing else.

This whole incident has left me with zero confidence in their customer service or technical competence.


It's this linkage with the rest of your accounts that can make doing business with them particularly dangerous. After the Google+ real names debacle started, which included people using their real names which Google didn't believe, and shutting down all their Google services, I stopped using every one of their services except for ephemeral zero stickiness ones like search (fortunately I'd discovered Fastmail some time previously, and they're beyond wonderful).

Buying a house built in 1910 and having to remodel and move to it financially prompted me to buy a smart phone, and Project Fi was clearly the best match, so I've now seriously reversed that ... but that also means I now won't even think about increasing my exposure to these sorts of risks by trying to write apps for the Play store, using any of their cloud offerings, etc. My main phone and its number are too valuable, too sticky to risk.


Just because you have not experienced it doesn't mean it is BS. There are countless horror stories out there about people getting their apps removed by an automated process, receiving templated emails with reasons so ambiguous that it is impossible to pinpoint the exact cause of the removal, and having no way to contact any real person to ask about it.


Please, please post an update on what y'all are going to change in the future to prevent incidents like this from happening again. It would be more reassuring for those of us who are considering buying into Google Cloud Platform.


This post sounds like you copy-pasted it from your department's list of prewritten generic reassuring statements. I doubt that is helping the perception of Google's support.


How does one reach someone at google in an emergency when you have no access* to any google services?

* locked out of your account


Write a good blog post and get it to the front page of a news aggregator


We became aware of Fred's problem before we knew it was here.

-Terrance


That doesn't really answer the issue, or the question. How are we supposed to get in touch when things go wrong? The blog post clearly showed that support was an afterthought and not properly QA'd.


Post to one of the product-related Google Groups like google-appengine@ or gce-discussion@googlegroups.com. We have support staff monitoring those lists seven days a week.


You know that Google has a serious problem with handling complaints and feedback when people take every opportunity to interact with an actual Google representative no matter what the topic is or if they are indeed interacting with a real Google employee.


Personally I really value comments like this and therefore feel the need to point out that the statement "Our team mission is to Reduce Customer Anxiety." seems like a bullet point for a status meeting and likely does not convey the desired effect of such a post: to have a somewhat personal response from a normally faceless company.


It actually gives me the opposite feeling than probably intended: "We'd rather pay someone to post BS responses on social news sites, than dedicate someone to fix the problem."

I'm not saying that's actually what is happening, but it feels like it to me.


If people wanted their anxiety reduced, they'd sign up for a guided meditation group.

People want their accounts reactivated.


So what's the support process when you screw someone but they can't hit the front page of a social media site?


Assuming that the appeal process is broken, or you're locked out of your account or something equally egregious: you post to one of the product-related Google Groups listed at the bottom of cloud.google.com/support. Support staff and engineers read those lists and will escalate issues.


HN is limiting my post rate. So I can't respond to all of you. If anyone would like to chat about reducing customer anxiety or feedback on what you would like to see on this process, you can send me an email and I will respond when I can.

Tscanausa @ Google


I think the response has been pretty clear and near-unanimous. Here's a bullet-point version:

1. It's absurd that an automated process accuses you of attempting intrusions against third parties without providing details of what has been detected. This leaves the "accused party" (your paying customer) without ANY idea what is wrong or how to fix it.

2. The automatic suspension of the entire project within 3 days is similarly out of proportion, especially without Google attempting actual human contact first (a phone call, for example).

3. The appeal-process is apparently literally broken. Even if this weren't the case, disabling a customer's entire project in an automated fashion requires WAY more ways of directly reaching Google to prevent this. Give a direct phone number in "emergencies" like this. We know it costs money and training. We expect you to accept this as a cost of doing business.

The above leads to the (at the moment rightful) impression that Google Cloud isn't ready for business, especially not small business. Turning off what could be someone's livelihood is a testament to the (possibly unintentional) arrogance that is widely perceived from Google.


So you're telling us that you're frustrated because an automated system prevents you from doing something you need to do, and there's nobody to appeal to...


You need a bigger team.


Please offer an option to host servers in Germany under German data protection laws like Amazon does.

There's very little information and/or reassurances.


I think tscanusa might be trolling HN


This is a perfect example of why I refuse to use Google Cloud. Google in the past has put me through similar automated loops with Adwords and Adsense. Even disabling an Adwords account for 2 weeks for reasons I still do not completely understand. After weeks of trying to get a hold of the right person and being pushed through a ton of different channels, it was finally re-instated with a generic apology and still no concrete reason as to why.

I have learned that it is way too risky relying on Google services for crucial business operations.


The Nth story about that guy who runs a business depending on Google services and suddenly starts telling everyone else they shouldn't run their business on Google services because Google services flagged them and they realized it was a bad idea to run a business based on Google services...


Yup. This is not even a remotely surprising story. If you're building your business on Google, you're building your business to be arbitrarily shut down. You should not operate any business critical service or store any business critical data on any Google product or service.

If anything, this is one of the more generous stories, because Google gave them a day or two of notice first. Usually you just find your Google account shut down one day.


Well, I wouldn't go that far. If you run your business on Google Services, you definitely need to have a contingency plan for what you're going to do the day that Google decides to arbitrarily cut you off, though.

But in some instances it might be financially worthwhile to take advantage of their services (particularly their free or very cheap ones) as long as you have a plan for standing up something else when that happens.

For instance, I'm pretty comfortable using Gmail, even though I know they might someday pull the plug on my account for no particular reason. I'm comfortable doing that because all my mail is backed-up elsewhere, and in a pinch I could restore everything or stand up a new mailserver in a day. It would be obnoxious, sure, but it's not exactly breaking new technical ground or anything. The years of free service I've gotten out of them make that risk manageable.

I could easily see a company taking that bargain for various other services. It becomes a problem when you simply rely on a 3rd-party's infrastructure as if it was your own.


This is the google M.O. - optimize all processes through cheap machine learning and heuristics and accept that some people will get screwed along the way.

The number of people that can be served at very little cost without any human involvement is apparently very lucrative. Google just considers customer service as an archaic relic that predates the invention of behavioral algorithms.


And it works quite well for them. Yes, people on HN and on the internet will complain...

Also it could be that customer support is just overrated.... The problem is that 95% of support calls / issues are anyway just people complaining about things which are not at all related to the actual Google service.

But, on the other hand, hackers who try Google Cloud and are bitten by the lack of Google support do have influence on the decision-making process in big corporations.


Yeah, how dare all those little people complain when they're inadvertently crushed by the glorious Google behemoth. Don't they realize how insignificant they are? Their entire lives are simply rounding errors at Internet Scale.

Sometimes the end result is less important than how you get there, and how you treat the ones who don't fit into your plans. It happens all the time with big companies, but Google even more than most. They KNOW that the potential for problems exists, but so long as it only affects 5%, who cares? Fuck you, got mine.

I think that the attitude conveyed by Google's lack of support is what inspires a lot of people to post rants online, rather than the frequency of incidents.


Once had Google kill a Spark job after it was running for about 12 hours because they thought we were doing something nefarious. It cost us thousands of dollars in wasted time and Google spend because their automated system made a mistake. They never attempted to fix it, refund us or even seem concerned.


And you compare that to the many stories you hear about Amazon waiving invoices when people inadvertently ran up multi thousand dollar AWS bills or when credentials were exposed. (Happened to an ex-colleague of mine years back - AWS creds committed to a public git repo - minutes later there's like 40 10xlarge or g2.8xlarge or whatever instances mining bitcoin. First thing he knew about it was Amazon ringing him up saying "this $10,000 spike in your typical use, that's not really you, right?" and shutting it all down for him and reversing the charge...)

Then you consider this when you decide whether to base the next big business decision on an AWS or Google Cloud platform…


Intrusion detection false alarms can be a problem if you run anything like a web crawler. I have one running at "sitetruth.com", which is a site rating system hosted on a leased server. (Not a cloud service, a leased rackmount server). One of the things it does is to find the home page of a site by trying "example.com" and "www.example.com", with and without HTTPS. Some sites will block access for 30-45 seconds if those requests are made too fast.

About once a year, there's a serious intrusion complaint, as the crawler, which obeys robots.txt, examines about 20 pages on a site in a few seconds. No more than three connections at once, but some sites are touchy. The server leasing company sends me a warning letter, I reply and call tech support, and there's no big problem.
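A small Python sketch of the home page probing described above, plus a per-site throttle for the "requests made too fast" problem. This is not the actual sitetruth.com code: the function and class names, the HTTPS-first ordering, and the idea of a fixed minimum gap between requests are all assumptions for illustration.

```python
import time


def home_page_candidates(domain):
    """List the four URL variants: with and without "www.", HTTPS and HTTP."""
    bare = domain[4:] if domain.startswith("www.") else domain
    hosts = [bare, "www." + bare]
    return [f"{scheme}://{host}/" for scheme in ("https", "http") for host in hosts]


class Throttle:
    """Enforce a minimum gap between successive requests to one site."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self.last = None  # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self.clock()
        if self.last is not None:
            remaining = self.min_interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now
```

Calling `throttle.wait()` before each request to a given site spaces the probes out; the cost is latency, which matters if a user is waiting on the result.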

I can recommend leasing servers from Codero as an alternative to dealing with the Borg of Mountain View. I've been a customer for five years, and nothing bad has happened. They now have "cloud services" too, but I haven't used them.


> the crawler, which obeys robots.txt, examines about 20 pages on a site in a few seconds.

That seems like something that could be slowed down slightly with little negative impact, and a big positive impact (no annoyed people, no interaction with support).
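As a rough illustration of "slowed down slightly", here is a minimal sketch of a per-site throttle that spaces out request starts. The names `candidate_home_urls` and `MinIntervalThrottle` and the 0.5 second interval are assumptions made up for this example, not anything from the crawler described above.

```python
import time


def candidate_home_urls(domain):
    """Return the four home-page variants: bare and www, with and without HTTPS."""
    hosts = [domain, "www." + domain]
    return [f"{scheme}://{host}/" for scheme in ("https", "http") for host in hosts]


class MinIntervalThrottle:
    """Enforce a minimum gap between request starts to a single site.

    The clock and sleep functions are injectable so the logic can be
    exercised without real waiting.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval  # hypothetical value, tune per site
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self):
        """Block until the next request may start; return the seconds waited."""
        now = self._clock()
        start = max(now, self._next_allowed)
        if start > now:
            self._sleep(start - now)
        self._next_allowed = start + self._min_interval
        return start - now
```

Calling `throttle.wait()` before each fetch of a URL from `candidate_home_urls(...)` keeps the probes at least `min_interval` apart, at the cost of a longer total rating time.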


Our user is waiting. We rate sites on demand, and the user is looking at a rotating "busy" icon in their browser search results during the rating process.


Posting this 4 days later, what happened next:

http://www.businessinsider.com/google-cloud-won-skeptic-afte...

Google then shocked him again, in a good way. Within four hours of tweeting, someone from Google had contacted him and had restored access to his project.

Trotter says that, it turns out, he and his team bear some responsibility. They had inadvertently set up a server wrong, exposing a hole, and a hacker was using his company's to conduct a "denial of service attack," which is when hackers overload another website or online service with so much traffic, it shuts down.


Lol

Sorry to be so harsh, but Google has always been this way.

Why do people keep putting essential stuff in someone else's sandbox? You're effectively adding a SPoF that you have no control over.


Is Amazon or Azure or Heroku really any different?

I've had similar problems in the past with Amazon shutting down services and providing little to no support to get it remediated. Even after paying for technical support, it was impossible to recover (in our case, it was a billing issue that was entirely Amazon's fault).

My general recommendations are:

- be sure you have fully scripted your deployment process
- be sure you are making remote backups of critical data (often times this is as easy as setting up simple replication of S3 to GCS or Azure's Cloud Storage)
- rely as little as possible on their "value added" features
- if you do rely on their value added features, be sure you still have someone on staff who can quickly replace it with something minimal (e.g. know how to install your own mysql)

At the very least, this will put you in a position where you aren't 100% locked out of your data, or have to completely rewrite your application in order to move to another provider.

You don't have to go crazy with some kind of hot failover, or active-active deployment. Just have daily snapshots of your data and mitigate the risk of your PaaS provider.

No matter who your hosting provider is, you should probably have this sketched out as part of a "disaster recovery" plan.
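The "daily snapshots" idea can be sketched in a few lines. This is only an illustration under stated assumptions: `take_snapshot` is a hypothetical helper, `backup_dir` stands in for storage at a second provider (for example a mounted bucket or a synced directory), and the retention count of 7 is arbitrary.

```python
import shutil
from datetime import date
from pathlib import Path


def take_snapshot(data_dir, backup_dir, today=None, keep=7):
    """Archive data_dir into backup_dir and keep only the newest `keep` snapshots.

    Assumes keep >= 1 and that backup_dir is not inside data_dir.
    """
    today = today or date.today()
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # make_archive appends ".tar.gz" to the base name and returns the full path.
    base = backup_dir / f"snapshot-{today.isoformat()}"
    archive = Path(shutil.make_archive(str(base), "gztar", root_dir=str(data_dir)))

    # ISO dates sort lexicographically, so sorted() puts the oldest first.
    snapshots = sorted(backup_dir.glob("snapshot-*.tar.gz"))
    for old in snapshots[:-keep]:
        old.unlink()
    return archive
```

Run once a day from cron or a scheduler, something like this covers the "not 100% locked out of your data" case; it is not a substitute for testing that the archives actually restore.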


> Is Amazon or Azure or Heroku really any different?

Nope, and that's why I don't use them. There are situations where they make sense, but people should be much more skeptical about them. I don't have stats, but IME I've yet to see a company that actually saves money by using them. I've seen one company that did save money the first year, but got too addicted to "just spinning another instance up" and stopped optimizing their code. The next year they were bleeding cash.

So, speaking from my experience: It's more expensive, less configurable, and unreliable. Why do people keep buying the cloud lie?


IMHO, I think this misses the larger motivator for moving to these services. I don't think you can approach the Cloud as just "a less expensive data center", because you're missing what I believe is the real motivator, "developer productivity". And I don't think this is just Cloud, it's DevOps, Microservices, CI/CD, Serverless, etc. I consult in these areas at large enterprises and have a front row seat to how a lot of these problems are approached. What's particularly interesting is the common thread of developers moving faster by automating human processes outside development. For example, if you have mature CI/CD, you've automated all your testing to the point you no longer need a QA team of manual testers. If you move to the Cloud, you no longer need a data center team. Etc.

Now it's obvious that companies just want to save money, but another reality is that those human processes just cannot compete on the service they offer to their customers, developers. You could have the best QA team in the world, but compared to a dev team armed with mature testing practices and CI, it's seconds vs. hours for developers to get the feedback they need in order to ship.

Infrastructure hardware is cheaper than ever these days in order to compete with the Cloud, while good developers are at an all time premium. When infrastructure is no longer a slow, scarce resource, it enables developers to spend more time perfecting their applications and experimenting with new approaches. So I suggest you not approach the Cloud as purely a cost saving measure for the infrastructure department, and really think about how transformative it can be in changing your IT processes.


Oh, absolutely. But you can also do the same thing with equipment you own. I run mostly VMs because it's easy to spin one up, test something, and kill it. My problem isn't virtualized infrastructure, it's letting go of control for no actual gain.


I worked on a project that costs about $5k/month to self-host. After moving it to AWS, it was about $2k/month. So $36k/year savings in hosting costs. In our case outgoing bandwidth at AWS was about 10x cheaper than what our ISP offered.

Having spent years helping run a physical hosting company, I can say that working with virtualized servers saves LOTS of time. I still find it amazing, and much, much cheaper in staff-hours, to reboot a cluster and have it come back up on 2x larger hardware when a project grows.


Right, I'm not saying hosted solutions are never the answer, they are just less often the answer than people seem to think. Too many people buy into the cloud-hype, and never look at the numbers until it's too late.

And that doesn't even touch the trust you have to have in your provider.

And I love virtual more than physical. But I set up my own virtual servers and run them myself.


> And I love virtual more than physical. But I set up my own virtual servers and run them myself.

Where do you keep the machines? In your bedroom?


Depends if you mean for work or home.

For work, we have a DC. For home, yes, I keep them in my office.


You're paying for a super ninja ultra special internet connection as well? Yikes. Interesting concept though.

So your strategy for scaling up is building a data center then?


> You're paying for a super ninja ultra special internet connection as well?

Not really. Just a normal home connection. If you design your services sanely, and don't have much traffic, it's not really an issue.

> So your strategy for scaling up is building a data center then?

Well, it depends. You really should do a cost-benefit analysis, because each product is different. But if it makes sense, then yes.


But your whole argument was that for some serious application you shouldn't rely on cloud services, wasn't it? The kind of things I was doing half a year ago you couldn't run on your home connection. So this whole "run important stuff on your home connection" is just a signal that you're not doing anything great.


> But your whole argument was that for some serious application you shouldn't rely on cloud services, wasn't it?

Well, with everything, there are pros and cons. My argument is that I think it's silly to trust someone else with your infrastructure. There can be benefits to doing so; however, you're giving up a lot as well. And in terms of cost (which is usually the main argument that I hear), I've personally never seen it work.

You're trusting someone with something pretty critical to your business, so you should think long and hard about which direction you want to go. If your cloud service gets arbitrarily shut down or has an outage, what can you do to fix it? Usually not much. That's essentially adding a SPoF, which is generally considered a big no-no, although it does happen.

> The kind of things I was doing half a year ago you couldn't run on your home connection.

I'd be curious as to what you're doing then.

> just a signal that you're not doing anything great.

I guess that depends on what you mean as great. I do a lot of useful stuff: email, web-apps, file-store, home-automation, etc.


That's funny, because one of my beefs with Amazon is that they charge a lot for bandwidth! Once you're in a co-lo, high quality bandwidth just isn't that expensive.


Because few companies can own their entire infrastructure? I'd rather restart my server in a different AZ and have Amazon deal with tracking down an intermittent networking issue than pay a senior network administrator to open a ticket with Cisco and try to replicate it in our lab.

Likewise, I really don't want to pay a power engineer to troubleshoot why our automatic transfer switch failed to transfer after a power failure and it took down our entire datacenter, I'd rather let Amazon deal with that while I failover to my backup servers in another AZ or region.

And my 100 Mbit ethernet handoff with a DS3 backup doesn't give me much clout with the telco when the network is down, but AWS's multiple OC-192s (?) give them a lot more leverage and more redundancy.

Owning your own infrastructure doesn't mean that you don't have to deal with a third party provider, but it does ensure that you're caught in the middle of it more often.


Nobody said it eliminates problems, but moving to the cloud doesn't either, it just hides them away. If that's what you want, that's OK, but at the same time, don't expect much sympathy when you make a blog post like this and bitch about the lack of transparency on a system you don't manage. You may want to give up control of your infrastructure to someone else, but my personal (not statistical, mind you) experience is that you end up with less control, for more cost.


I have my primary services spread across 3 separate datacenters (AZ's), and backup services across 3 separate AZ's in a region on the opposite side of the country. I pay far less than I would if I put my equipment in only 3 datacenters (2 primary + one secondary) since most of my backup infrastructure is powered off.

And I have far more control than I had with colocated equipment, if I find I need to scale up my services by 2X, 5X or 10X, all it takes is a little more money during the time I need the extra hardware (i.e. end of month processing, plus holiday shopping season), I don't need to order hardware a month or two in advance and pay for it all even when I don't need it.

What I used to pay for my network infrastructure alone (hardware + support + network engineers) pays for most of my entire AWS compute infrastructure.


> Is Amazon or Azure or Heroku really any different?

In my last work place we had paid AWS customer service, and they were outstanding.


Reaching a human being is a common problem with all Internet businesses. I've had similar frustrations with Google competitors. As an industry, we have to do much better with customer service. Our users are real human beings.


This is why when it comes to web hosting, I only do business with companies that have 24/7 phone support. It limits my options a lot. GoDaddy and HostGator both provide this, most others I've seen don't.

But before I got my first HostGator account, I called their tech support line at 2 AM, and got a guy named Chris with a strong Texan accent. And I've had escalated issues at odd hours where I've actually spoken directly to sysadmins there. Ticket support only hosting simply can't compete with that service, no matter the cost.


Dollars to donuts these guys got hacked. Still pretty shitty the appeal process was busted.


So, everyone's basically saying:

> Google sucks

> Don't rely on Google if you want your business to succeed

> This is old news! Happens all the time

> Google = SPoF

...you're all looking like trolls to me. Please, do tell how Spotify manages to service 60+ markets with some core pieces of infrastructure on Google's Cloud offerings.


Spotify will have a phone number of someone they can call.

I've had this issue with Google too. They'll shut down your account, your whole account, Google Apps email and all, if an automated system detects something dodgy. In my case an automated system at eBay accused my site of phishing, which it wasn't doing.

There is a paid support option, but when your account is shut down you are unable to access the required code.

I don't think you can appreciate how inaccessible Google is until they decide you're a bad actor.


By being Spotify-scale, thus forcing Google to care about them. See also: Snapchat.


Pretty sure if Google guts Spotify, Spotify can literally call Sundar Pichai himself on the phone to get it resolved. There's a scale issue there in terms of whether or not Google considers you worth bothering with.


I don't know that the comparison with Spotify is very fair - I would imagine they have a direct contact to employees.


Google doesn't typically have bad products. Google is amazing when things go right, it's only when they go wrong that Google really shows their true nature. I'd wager a guess that most of the time things just don't go wrong.

I also imagine that Spotify pays a lot of money and has a dedicated account manager.


Same way Pewdiepie gets excellent YouTube customer service: he has millions invested in Google properties. Google's great at helping their huge money-makers, it's literally everybody else who gets screwed by their awful robot systems.


I hate that Google doesn't give a damn about its users. To them, interacting with humans is a waste of time and money... They prefer to automate everything and manage servers, robots and programs instead.

The same thing happened to us in 2013. Luckily we had backup servers at Linode. It took them 2 weeks to solve the problem, which was a bug in their billing system.

Google gave us $8k credit and I gave them another try and convinced myself that GCE was very young in 2013.

Now I'm just worried! It's 2016 and our entire business depends on their platform and their support sucks! Even the $400/month version!!

Google, why can't you have support like DigitalOcean and Linode?


> Google, why can't you have support like DigitalOcean and Linode?

Because it's totally not in their DNA?

Because GCE and company are relatively low priority offerings? Would you agree these are higher:

Ad Words etc., which make the bulk of their money.

Search etc. which among other things has you seeing those ads.

Android and all their other efforts to keep themselves from losing their access to the users of the above.

For a while, Google+ was an anomalous offering they were pushing very hard (with this sort of brain damage with linked accounts that hit hard with their real names policy), but they might try such a stunt again.

Compared to the two service providers you mentioned, which only do one thing. Even AWS is, I gather, rather like that: there's AWS plus all the Amazon selling stuff, which is significantly separate. Some of the DNA is shared, but that's not all bad, such as the focus on customers.


Google is said to have a very high hiring bar, yet I've seen more than a few times now that their product lines are sub-par. I'm sure the explanation they give is that they are very big and can ignore the 1% corner cases, with the mentality that "we are so big we don't really care about smaller parts of the internet".

Also, the hypocrisy in using very logical questions to hire people but not sticking to similar logic in the product line is laughable.


It feels like they have an entire company of Mr. Spocks without a single Doctor McCoy to round things out.


Mr. Spock would have approached the issue of support more logically and would have thought things through more carefully.

It's more like they have an entire company of Sheldon Coopers, without a single Penny or Leonard Hofstadter to round things out.


Sheldon would have never agreed to release a system with flawed design.


Of course it's easy to pattern match this to previous stories of advertisers and publishers being banned from Google's ad products, people losing their email access on Gmail, etc, but this is an entirely different area of the company, and one where account managers are very much a thing. It strikes me as odd that one couldn't just email or phone their account manager to ask for clarity on a situation like this?


Google tries to get away with as little human supervision as possible. I doubt they have an actual account manager


They have an account manager, it's just a robot named Samantha West, programmed to insist she's a real person:

http://newsfeed.time.com/2013/12/10/meet-the-robot-telemarke...


It's part of Google culture, Google DNA. Even though the departments are different, the heritage will show.


The blog author was upset and confused that the email Google sent included an account number instead of account name. I don't think we're getting the whole story here.


I admit to having a gmail.com account

----

I really don't understand how anyone can use Google for production. If you don't pay for support to have a way to actually get a hold of Google...

... Someday you may be in a world of hurt and have absolutely no way to get ahold of anyone. You can post a story and hope and pray that Google might read it and have someone reach out and remedy the situation.


We got the 100k credit, which is nice. I noticed the actual technical side is excellent, vastly better than Azure and AWS (AWS especially; I find the whole UI and the way it works so clunky).

However, despite using our credit to buy gold support when it came to using it I couldn't work their tickets UI at all. Turns out you need to change products from your Google apps support to cloud platform by some tiny link which is not at all obvious.

I had to phone up and it was obvious people hit it all the time as the operator instantly knew what the issue was.

Also related: Azure by default didn't renew your free trial despite an active credit card being on file. It just shuts everything off with no obvious warning. What other service does that?

Honestly think these companies need to do some proper usability testing on their flows because there is so much clunky weirdness in cloud providers right now.


Not sure if he missed it on the support page, or simply is not willing to pay the added cost, but Google does offer a paid support option with both phone and email support options. Obviously the more you pay the quicker the response and more access you have to those options.


I suggest reading the article.

"Google offers support solutions where you can talk to a person if you have a problem. We view it as problematic that interrupting an “allergic reaction” as a “support issue”. However, we would be willing to purchase top-tier support in order to get this resolved quickly. But there does not appear to be an option to purchase access to a human to get this resolved. Apparently, we should have thought about that before our project was suspended."


Hey thanks, but I did read the article, and saw that part. That is why I said that he may have missed that part of the page. And just because he says he would be "willing to purchase top-tier support" doesn't mean that he doesn't have a limit on how much he would pay for said support, thus my saying that maybe it was too much for him.


My impression from the article (and the quoted bit) is that you can't pay for support once your project is suspended.


As far as I can tell, this is false (I have a suspended test project, and the support options still seem to be there).

I think he means "there is no 'pay to talk to a human'" option in the FAQ/etc for this issue.

That is, he wants to pay to talk to a human about just this issue, and thinks somehow that paying for the support option will not give him that (or is too expensive for that).


This makes me wonder, which do people dislike more: not having access to support in almost any capacity except "help desk tickets" which might not get answered for days, e.g. how Google handles non-free support tiers, or horribly long wait times on phones? My gut says that the former is worse, especially in this case, because you can just turn your phone on speaker and wait for the queue to finish up (or with some systems, have them call you back when the queue is to you).

I almost can't believe that Google doesn't have a support line for its account holders for Google Cloud, when you have companies like Paypal and United Airlines, which have many more users (paying or not) and have support lines.


The situation of having a project fully shut down without adequate information or process by which to correct it is clearly unacceptable. That said, I was involved in recently migrating to GCP and it has worked wonderfully for us thus far. We run a fairly standard java/MySQL web app, with paid silver support. Questions are answered promptly in our experience. We found compute engine VMs to be faster (and cheaper) than EC2, the developer console easier to use, and better quality documentation/APIs.


Just wait until you can buy one of their self driving cars.


I usually defend GCP as they have better technology and platform architecture but I definitely agree that they have major communication problems. While there are plenty of individuals available through social channels, it does seem to be very informal most of the time which does not inspire confidence as a business looking for a solid foundation.


This sounds like a nightmare. One that I can't imagine happening at Rackspace Cloud...


Bet you cash money the affected site is or was in a ddos botnet.


Google = rain man.



