"DigitalOcean Killed Our Company"

bcooks · on May 31, 2019

As DigitalOcean's CTO, I'm very sorry for this situation and how it was handled. The account is now fully restored and we are doing an investigation of the incident. We are planning to post a public postmortem to provide full transparency for our customers and the community.

This situation occurred due to false positives triggered by our internal fraud and abuse systems. While these situations are rare, they do happen, and we take every effort to get customers back online as quickly as possible. In this particular scenario, we were slow to respond and had missteps in handling the false positive. This led the user to be locked out for an extended period of time. We apologize for our mistake and will share more details in our public postmortem.

bcooks · on June 1, 2019

Thanks for the replies. Let me try to address a few of the things I have seen here. We haven't completed our investigation yet which will include details on the timeline, decisions made by our systems, our people, and our plans to address where we fell short. That said, I want to provide some information now rather than waiting for our full post-mortem analysis. A combination of factors, not just the usage patterns, led to the initial flag. We recognize and embrace our customers ability to spin up highly variable workloads, which would normally not lead to any issues. Clearly we messed up in this case.

Additionally, the steps taken in our response to the false positive did not follow our typical process. As part of our investigation, we are looking into our process and how we responded so we can improve upon this moving forward.

cprayingmantis · on June 1, 2019

With all due respect I think you’ve missed the point. The larger point from my perspective is that you denied your client the ability to move their data off your platform. This would be akin to someone breaking the terms of their lease and you confiscating all their belongings with the intent of burning them. You should provide some sort of grace period for users to move their data off your platform. For everyone else reading this this is should be a wake up call why you should never trust your data to a singular entity. Even if they have 99.9999% uptime you never know when they’ll decide to deny you access to your data.

lioeters · on June 1, 2019

Thank you for jumping in personally to clarify what happened.

As a business owner with much of our infrastructure depending on DigitalOcean, the incident is concerning. It affects the reputation of DO as well as its customers.

The demographics on Twitter and especially here on HN represents a sizable crowd with decision-making influence on DO's bottom line. I hope to see some effort being made to prevent situations like this in the future, and to regain the trust.

As a (so far) satisfied customer, it's great to hear that:

> A combination of factors, not just the usage patterns, led to the initial flag.

> We recognize and embrace our customers ability to spin up highly variable workloads, which would normally not lead to any issues.

> we are looking into our process and how we responded so we can improve upon this

someotherperson · on June 1, 2019

I’ll be awaiting the post-mortem and, depending on that and the procedures proposed to stop this from happening again, will hold off moving everything I have from DO.

The real “mess up” here was the bit where you blocked the account with no reason given and no further communication - other than the one-liner your intern wrote for the email.

I’m expecting you to sit down with your legal team and rewrite your TOS to be more customer-focused and less robotic.

bcooks · on June 3, 2019

I wanted to provide you all with an update on the postmortem I promised on Friday. Our analysis has been completed. We will be sharing the full document soon and will publish a link in this thread for those wanting to read it. We promised Raisup a first look and we have provided the draft document to them this afternoon. Because some information in the document could be considered sensitive we wanted to give Raisup a chance to review the document before sharing with the public.

marenkay · on June 3, 2019

As a long term customer here is a small suggestion to make this fail-safe: by default do trust your customers, and just ask them first instead of shooting them down first.

Considering you have been marketing yourself as the platform for developer oriented cloud, you should be aware that surge provisioning can and will always be happening.

apple4ever · on June 3, 2019

This is the right move. Sure, if an account has repeated violations after clear communications, then action must be taken.

But it doesn't make sense to shut it down before discussion!

godzillabrennus · on June 1, 2019

I have always been a happy DO customer. So glad to see you step into this lions den and reply to the community.

Looking forward to the write up!

9HZZRfNlpR · on June 1, 2019

What do you recommend your clients to do if that kind of mistake happens to them? Is Twitter-shaming the only way out?

I know people say some legal arguments why they close you down and won't say anything, but this is the worst scenario ever. I'd be better off excused at something I didn't do than just ooops we can't tell you anything, your account has been shut down.

xtracto · on June 1, 2019

This is important. I hate how it had became standard for companies to screw their customers unless they are online-shamed.

The response email even read like a giant polite FUCK YOU (we locked your account, no further action required by you)

You bet I will have further action!

And it is after the shaming that you get an "I am sorry for this situation". Which sounds more like saying "I'm sorry we got caught".

My frustration is not with DO specifically, as they do exactly what every other company does.

But, what of the other thousands of people that got screwed and did not put it on twitter?

It is the equivalent of when you are in a restaurant and get screwed: It is the loudest person that complains more the one that gets the reward, while all the others silently swallow the injustice.

qmarchi · on June 2, 2019

It's most likely due to the fact that the people who can act upon the process itself, not just follow the process inevitably see the issue and do truly want to help.

Getting your message into the right hands is what matters, not the platform it's on.

nathan-io · on May 31, 2019

Mistakes happen, and algorithms are sometimes a necessary part of scale/efficiency. Everyone understands that.

That said, what's highly troubling as a DO customer (and someone who is planning to deploy startup infrastructure of my own with DO) is:

1) The discrepancy between this customer's experience and clear assurances made on this very forum by high-level DO employees that:

a. warnings are ALWAYS issued before suspensions.

b. even in the event of a suspension, services remain accessible (though dashboard access and/or the ability to spin up NEW services may be impacted), ie. the affected customer could still retrieve data or SSH in to droplets.

2) The relatively trivial nature of the customer's offending usage (temporarily spinning up 10 droplets). What happens if, for example, a startup gets a press mention somewhere that leads to a massive traffic spike, necessitating a sudden and significant spin-up of new droplets (especially if this is done programmatically versus by hand in the dashboard)?

3) The apparent lack of consideration of the customer's history, or investigation into their usage. It seems the threshold for suspending services of longstanding customers who are verifiably engaging in commerce (taking a moment to look at their website and general online presence for indicators of legitimacy), should be SUBSTANTIALLY higher than for, say, an account who signed up a week ago. Context matters.

nathan-io · on June 1, 2019

I'm no longer able to edit the above comment, so to elaborate on #1:

Following is a comment[1] by Moisey Uretsky in another thread[2]:

> Depending on which items are flagged the account is put into a locked state, which means that access is limited. However, the droplets for that account and other services are not affected at all. The account is also notified about the action and a dialogue is opened, to determine what the situation is. There is no sudden loss of service. There is no loss of service without communication. If after multiple rounds of communication it is determined that the account is fraudulent, even then there is no loss of service that isn't communicated well in advance of the situation.

1. https://news.ycombinator.com/item?id=18296344

2. https://news.ycombinator.com/item?id=18294940

This is why I'm so confused by the case under discussion, because the customer appeared to have been completely locked out without warning.

rat9988 · on June 1, 2019

What he said in an other thread, and this thread, is press release, marketing. Don't trust what he says to save his business. You have absolutely no reason to.

nathan-io · on June 1, 2019

I prefer to give them the benefit of the doubt, though a clear explanation of why the above policy was not followed seems warranted. (It also doesn't appear to have been followed in several other instances reported by other former customers in various HN threads.)

If DO reserves the right to cut off services and access to your own data permanently and without warning (outside of a court order or confirmed illegal activity), that needs to be unequivocally stated, and the triggering factors should be made known. Otherwise, DO is not fit for production systems.

Additionally, it would be nice to see the creation of a transparent, high-level appeal process for customers affected by suspensions. Truly malicious customers wouldn't use it (what would they hope to successfully argue to an actual human reviewer?), but it would greatly benefit legitimate customers to have an outlet other than social media by which to "get something done" in the event of an inappropriate suspension followed by a breakdown in the standard review process.

tnolet · on May 31, 2019

Not sure why you’re being downvoted. Point 2 is very relevant. Scaling instances due to sudden peaks should be totally safe. Even when automated. Guess AWS is still lonely at the top.

perennate · on June 1, 2019

AWS has default instance limits too, though, which you need to open ticket to increase.

moondev · on June 1, 2019

Which is a much better policy than suspending the account.. 10 instances is nothing!

adventured · on June 1, 2019

It really is a trivial amount of resources to have it triggering such a reaction. It's almost like DigitalOcean doesn't like being in the cloud hosting business. One of the fundamental, desirable points to the shift to such cloud hosting services is that you can quickly spin up a bunch of resources when needed and then dump it.

rdiddly · on June 1, 2019

You've got an additional problem though, which is that this tells us you have two support channels: one that doesn't work (i.e. yours, the one you built), and one that does (Twitter-shaming). The first channel represents how you act when no one's watching; the second, how you act when they are. Most people prefer to deal with people for whom those two are the same.

dkersten · on June 1, 2019

As a DO user who was planning on ramping up usage in the coming weeks and months, this is what scares me and what is making me seriously reconsider.

xvector · on June 1, 2019

Do not use DO. The very fact that their default response to suspected spam is to cause prod downtime is so bizarre and unacceptable that it does not make any sense whatsoever for a business to rely on them.

dkersten · on June 1, 2019

Thanks, I’ll stick with AWS then.

sneak · on June 1, 2019

https://github.com/fog/fog/issues/2525

https://news.ycombinator.com/item?id=6983097

Running anything business or privacy critical on DO is madness.

nathan-io · on June 3, 2019

Indeed, this was bad. I assume they were trying to extend SSD lifetime by reducing writes.

It's fair to note that scrubbing is now the default behavior when a droplet is destroyed, so they did listen to the feedback.

https://ideas.digitalocean.com/ideas/DO-I-1947

sneak · on June 4, 2019

The SSD thing is a red herring.

You do not need to scrub or write anything to not provide user A’s data to user B in a multi-tenant environment. Sparse allocation can easily return nulls to a reader even while the underlying block storage still contains the old data.

They were just incompetent.

On top of all of that, when I pointed out that what they were doing was absolute amateur hour clownshoes, they oscillated between telling me it was a design decision working as intended (and that it was fine for me to publicize it), and that I was an irresponsible discloser by sharing a vulnerability.

Then they made a blog post lying about how they hadn’t leaked data when they had.

Nope.

buzzerbetrayed · on June 1, 2019

As someone who has been blown off by DO support, you hit the nail on the head.

mekane8 · on June 3, 2019

So well written. This is exactly what's so scary about this whole thing.

iamaelephant · on June 1, 2019

I think it says a lot that this CTO joker flew in, regurgitated the standard-issue "we will endeavor to do better" apology and left without answering any of the very legitimate follow-up questions. I would never deal with an organisation that behaves like these guys.

thanatos_dem · on June 1, 2019

Is there any response that would satisfy you?

pdimitar · on June 1, 2019

Yes.

"This will not happen again, ever".

People's livelihoods are at stake in DO's hosting. Canned responses and brutal account lockouts should have NEVER been on the table to begin with.

thanatos_dem · on June 1, 2019

That’d be unrealistic for any company to claim, and if any company I worked with did claim that I would run for the hills.

That’s akin to saying “we’ll never ship a bug”, or “we have an SLO of 100%”. That’s impossible for anyone to claim. Same goes for the response handling. There is clearly a lot of room for improvement there, but if you’re insisting on not getting canned response, that means a human needs to be involved at some point. Humans will at times be slow to respond. Humans will at times make mistakes. This is just an unavoidable reality.

I get that mob mentality is strong when shit hits the fan publicly, but have a bit of empathy and think about what reasonable solutions you may come up with if you were to be in their situation, rather than asking for a “magic bullet”.

I could see a good response here being an overhaul of their incident response policy, especially in terms of L1 support. Probably by beefing up the L2 staffing, and escalating issues more often and more quickly. L2 support is generally product engineers rather than dedicated support staff/contractors, so it’s more expensive to do for sure, but having engineers closer to the “front line” in responding to issues closes the loop better for integrating fixes into the product, and identifying erroneous behavior more quickly.

pdimitar · on June 1, 2019

Sure, me and a lot of others react rather strongly in these situations. I agree with that but you already seem to understand the reasons.

However, can you say with a straight face that the very generic message left here by DO's CTO instills confidence in you about how will they handle such situations in the future?

Techies hate lawyer/corporate weasel talk. Least that person could do was do their best to speak plainly without promising the sky and the moon.

thanatos_dem · on June 1, 2019

I would prefer a generic message and a promise for follow up once all the facts are known over a rushed response that may be incorrect.

I’m an engineering manager in an infrastructure team (not at all affiliated with Digital Ocean, tho full disclosure, I do have one droplet for my personal website). I know how postmortems generally work, and it’s messy enough to track down root cause even when it’s not some complex algorithm like fraud detection going off the rails.

I’d rather get slow information than misinformation, but I understand the frustration in not being able to see the inner working of how an incident is being handled.

pdimitar · on June 1, 2019

I applaud people like you. Seriously.

And I agree with your premise. However, my practice has shown that postmortems are watered-down evasive PR talk, many times.

If you look at this through the eyes of a potential startup CTO, wouldn't you be worried about the lack of transparency?

And finally, why is such an abrupt account lockdown even on the table, at all? You can't claim you are doing your best when it's very obvious that you are just leaving your customers at the mercy of very crude algorithms -- and those, let's be clear on that, could have been created without ever locking down an account without a human approval at the final step.

What I'm saying is that even at this early stage when we know almost nothing, it's evident that this CTO here is not being sincere. It seems DO just wants to maximally automate away support, to their customers' detriment.

Whatever the postmortem ends up being it still won't change the above.

Moru · on June 3, 2019

Our line so far has been to change provider of service if we start getting copy - paste answers from support. We always make sure we can get hold of a human on the phone even without a big uptime contract. This has so far lead us to small companies that are not overrun by free accounts used as spam or SEO accounts. That means they have no need for automatic shutdown of accounts and instead you get a phonecall if something goes wrong.

pdimitar · on June 3, 2019

This is how I would go about it as well. But I imagine that's a big expense for non-small companies, and not only through money but through the time of valuable professionals that could have spend the time improving the bottom line.

I too value less known providers. The human factor in support is priceless.

geezerjay · on June 3, 2019

> Is there any response that would satisfy you?

Do you believe that a PR response made in damage control mode that actually changes nothing is something that's satisfactory?

I mean, apparently this screwup was so damaging that it killed a company. What part of the PR statement addresses that precendent?

perlgeek · on June 1, 2019

How about "we will reimburse the company for any damages if we found it was our fault"?

isoprophlex · on June 1, 2019

It's been 7 hours, some follow up answers would be nice...

thanatos_dem · on June 1, 2019

7 hours, on a Friday night in the headquarters time zone. This issue is resolved and is clearly not wide spread, so does getting a response on Monday or Tuesday vs right now make any difference?

Companies are made of people. Let the people have a life. Their night is shitty enough as is after this, I guarantee you.

rat9988 · on June 1, 2019

The thing is, my business don't want to deal with people. It wants to deal with a business made of multiple people to guarantee service availability. If he cannot answer, surely someone else in DigitalOcean can?

skywhopper · on June 1, 2019

You are being unreasonable here. He promised a postmortem. I’d much rather wait a few days to get a clearly written, comprehensive analysis of the problems than to get an immediate stream of confusing and contradictory raw data.

If you have ever been involved in post facto analysis of a process breakdown like this you know how hard it is to get the full picture immediately. Rushing something out does no one any favors.

harshalizee · on May 31, 2019

Sure, but the email he received basically said "your account is locked. No other info. Thank You". That to me is a much scarier thing than anything else in the thread. How can anyone trust in your infrastructure if your standard protocol is literally just shutting down their entire operation without any form of review or communication?

jasonjayr · on June 1, 2019

We have a relatively large spend ($5k+) @ DO, for a unique client (most of our other clients can be served by our colocated facility), and I'm going to second this. Or with any other provider. They should always explain exactly which rule was broken. If the customer is legit + genuine, they will promptly fix the issue and won't be a further problem. Being vague makes it super troublesome to rely on any service that takes that tactic. (Like Google, for example) If they continue to re-offend, and find other ways to skirt the rules, that's when you move on to account termination.

throwaway34241 · on June 1, 2019

You can't, obviously. Even though I've used them before I really doubt I'll ever use DigitalOcean again. I can almost understand terminating customers (with notice) via automated heuristics for suspicious behavior, especially on the low end of the hosting market, but locking out a legitimate paying customer from backups with no notice or recourse is terrifying.

jdcro · on May 31, 2019

"In this particular scenario, we were slow to respond and had missteps in handling the false positive. This led the user to be locked out for an extended period of time."

This didn't seem like a case of being "too slow" - the customer in question went through your review process (which was slow, yes), and the only response he got was "We have decided not to reactivate your account, have a nice day".

That just seems like a lack of interest in supporting your customers that are falsely flagged.

bcooks · on June 4, 2019

Last week ended on a real low note for many of us at DO. We took a perfectly good customer and gave them an experience no one should have to go through (all while he was trying to leave on vacation no less). We can and must do better. To do better we need to learn from our mistakes. To that end, we also think sharing the information about this incident openly is the best way to help all our customers understand what happened and what we are doing to prevent it in the future.

Yesterday we completed our postmortem analysis of the incident involving Nicolas (@w3Nicolas) and his company Raisup (@raisupcom). With their permission we are sharing the full report on our blog here:

https://blog.digitalocean.com/an-update-on-last-weeks-custom...

exabrial · on June 1, 2019

No offense, as I'm sure this has been hard, but a screwup like this publicly demonstrates DO is not ready for prime time competition against AWS, Azure, GCP and the like.

I'd gladly do whatever it takes to KYC, send you my business license, tax returns, EIN, invoice billing, etc so you know there is someone behind my account.

We spend thousands of hours eliminating single points of failure. If an automated system can undermine that work, DO is not an option for us to host anymore.

amrx431 · on June 1, 2019

A year of Data backup lost. Do you realize how that alone may cause the clients to dump a company and do you realize that startups may never recover from fiasco like these? I understand that it was false positive triggered by internal systems. But how do you explain the delay in restoring the services and reflagging again within hours after the services were restored?

rat9988 · on June 1, 2019

At the end it was not even a matter of delay. It was more like, we locked your account, we don't want to hear you anymore until the end of times.

lapnitnelav · on May 31, 2019

Public post mortem? Brilliant.

Hope you can share what you learnt from this incident and hopefully you'll take a hard look at your processes.

I'd hate to be caught in the same issue, especially that we are already customers, and I'm not sure I'll have as much clout as Nicolas here to get your attention.

kbenson · on June 1, 2019

> and I'm not sure I'll have as much clout as Nicolas here to get your attention.

It's occurring to me now that while I've successfully ignored twitter for years, I should probably rectify that just so I have somewhere to type my hopes and prayers when this eventually happens to me, and hope for a miracle. It sure seems like the only place they're listened to.

sampo · on June 1, 2019

> I'm not sure I'll have as much clout as Nicolas here to get your attention.

Maybe keeping a twitter (and other social media) account with at least a certain number of follower should be considered a part of a company's security strategy? You'd also need to post something interesting periodically, to keep your follower, so that you have their attention when you need it.

mirimir · on June 1, 2019

Will the customer be compensated for business losses?

bufferoverflow · on June 1, 2019

I hope they sue and win. This bull###t needs to be fought.

mirimir · on June 1, 2019

IANAL, but DO's ToS is loaded with weasel words.[0] So if they can sue in some jurisdiction where the binding arbitration and liability limitations don't apply, maybe they could at least get a fair settlement.

0) https://www.digitalocean.com/legal/terms-of-service-agreemen...

system2 · on June 1, 2019

Are you dreaming?

_qjt0 · on June 1, 2019

It would probably be worth it to restore trust: Refund all the money they've taken from this company for the last year, and apply a credit to their account for 3x that amount, say.

mirimir · on June 1, 2019

Sadly enough, yes. I'm sure that it's covered in DO's ToS.

But the DO CTO did basically admit fault in a public forum.

chris_wot · on June 1, 2019

It's not the false positive that is the issue here. The issue is that a. it took way too long to get the business back up and running, and b. the second response gave no explanation and no recourse for the business to become operational again.

The very fact that this can happen from an automated script with no oversight should give every one of your customers pause as to whether they continue with your service.

Cakez0r · on June 1, 2019

I'd say the issue is that DO is shutting down servers for any reason at all (legal issues aside). If DO sells a product with a particular capacity, why should they intervene at all if a user is using all of the capacity they're paying for?

OBLIQUE_PILLAR · on June 2, 2019

Gcp bans mining. Most providers frown on running tor exit nodes.

MagicPropmaker · on June 1, 2019

So unless a person is popular enough to get enough people talking about it on twitter or hacker news, someone whose account is flagged by your bad script is going to lose his business.

That doesn't sound good to me.

jameshilliard · on June 1, 2019

Are you aware that Viasat has blacklisted a huge amount of Digitalocean /24 subnets? I can't access many of my servers when I'm on a satellite connection in addition to other websites hosted on Digitalocean. I've talked with the Viasat NOC and they told me they were blocking Digitalocean subnets due malware.

zhte415 · on June 1, 2019

This is probably worth it's own post, it would be very interesting to see more detail. I'm also probably certain that this is also not exclusive to DO.

jameshilliard · on June 1, 2019

Posted here: https://news.ycombinator.com/item?id=20067036

sneak · on June 1, 2019

DO is famously bad at dealing with abuse reports; in a lot of cases they simply do nothing.

I block their netblocks for a lot of things, too.

system2 · on June 1, 2019

Should we be concern about our 40+ droplets with DO now? We built our business on DO, we really can go bankrupt as well as our 30+ clients if anything like this happens to us. Please change your support system ASAP otherwise we will be switching to another platform. We are expecting a very serious response from you.

xvector · on June 1, 2019

Do not use DO. The very fact that their default automated response to spam is prod downtime is unacceptable.

It requires so many failures in understanding the service being provided across the company for this decision making process to have ever actualized that there is no reasonable expectation of safety or trust from DO at this point.

bpicolo · on June 1, 2019

Every cloud company has anti abuse systems that will limit your access to their APIs / take down your machines if abuse is suspected - for example if it looks like you're mining bitcoin. Your prod isn't any different from your staging for them

bufferoverflow · on June 1, 2019

Clearly you should be doing regular backups of everything, and not on DO. And make sure to test your backups. And make sure you have a fast migration plan into another cloud.

Ideally you should be cloud-agnostic, but that's quite hard to achieve.

scarface74 · on June 1, 2019

That’s all well and good. But how do you plan to reimburse your customer for this gross negligence? I have never heard of such incompetence or lack of communications from anyone on AWS’s business support plan. Why should anyone trust DO over AWS or Azure?

locusm · on June 1, 2019

Curious how you will compensate them?

ryanackley · on June 1, 2019

I'm genuinely curious. What type of fraud or abuse are you trying to prevent? Maybe cover that in the postmortem.

LeonM · on June 1, 2019

If your DO (or other cloud provider) credentials are compromised, it's usually a matter of seconds before someone fires up the largest possible number of instances to start crypto mining.

bcooks · on June 4, 2019

Yup. LeonM, you are correct. In this case that was the cryptocurrency mining detector that was triggered. More details in the postmortem.

bofadeez · on June 6, 2019

Unacceptable. I just instructed my team to begin a transition to AWS.

bofadeez · on June 6, 2019

You ruined your brand

justforyou · on June 2, 2019

Do you realize that by abusing this thread to make a single PR focused comment with no intention of participating in the conversation -- you've disrespected the community here and the few remaining DO customers within said community.

justforyou · on June 1, 2019

>> and we take every effort to get customers back online as quickly as possible. In this particular scenario, we were slow to respond and had missteps in handling the false positive.

You clearly don't make every effort, and did not -- so why waste the extra verbiage and switch from active to passive voice?

Based on your cliche response I have zero confidence that DO will do anything substantial to address the root causes of the issue.

mercer · on June 2, 2019

I've found DO's public posts to be particularly grating in the "we are listening to YOU, our customer. we take feedback extremely seriously" department.

thaumaturgy · on May 31, 2019

Some people on HN hate Linode because of their past security screwups (which is valid), but having used both DO and Linode quite a lot, the support on Linode is way, way, way better than DO's.

DO's tier 1 support is almost useless. I set up a new account with them recently for a droplet that needed to be well separated from the rest of my infrastructure, and ran into a confusing error message that was preventing it from getting set up. I sent out a support request, and a while later, over an hour I think, I got an equally unhelpful support response back.

Things got cleared up by taking it to Twitter, where their social media support folks have got a great ground game going, but I really don't want to have to rely on Twitter support for critical stuff.

DO seems to have gone with the "hire cheap overseas support that almost but doesn't quite understand English" strategy, whereas the tier 1 guys at Linode have on occasion demonstrated more Linux systems administration expertise than I've got.

znpy · on May 31, 2019

I have interviewed with DO and they tried diverting me towards a support position.

They told me that on a single day a support engineer was supposed to help/advice customers on pretty much whatever the customer was having issue with and also handle something between 80-120 tickets per day.

It's nice to see that DO is willing to help on pretty much anything they (read: their team) has knowledge about, but with 80-120 tickets per day I cannot expect to give meaningful help.

Needed EDIT: it seems to me that this comments is receiving more attention than it probably deserves, and I feel it's worth clarifying some things:

1. I decided not to move forward with the interview as I was not interested in that support position, so I have not verified that's the volume of tickets.

2. From their description of tickets, such tickets can be anything from "I cannot get apache2 to run" to "how can I get this linucs thing to run Outlook?" (/s) to "my whole company that runs on DO is stuck because you locked my account".

hackermailman · on May 31, 2019

I once worked for eBay a long time ago, and support consisted of 4 concurrent chats, offering pre-programmed macros often pointing to terribly written documentation the person had already read and was confused about. If you took the time to actually assist somebody you were chastised in a weekly review where they went over your chat support. The person doing mine told me I had the highest satisfaction record in the entire company, and a 'unique gift of clear and concise conversation, like you're actually talking to them face to face' then said I'd be fired next week because my coworkers were knocking off hundreds of tickets a day just using automated responses, leaving their customers fuming in anger with low satisfaction ratings, as people are very aware of being fed automated responses but the goal was not real support, it was just clearing the tickets by any means possible. I decided to try half and half, so if the support question was written by somebody who obviously would not understand the documentation (grandma trying to sell a car), I would help them but just provide shit support to everybody else in the form of macros like my coworkers. Of course this was unacceptable and I got canned the next week as promised. Was an interesting experience, I can imagine DO having an insane scope to their support requests like 'what is postgresql'.

Anyway imho you should have taken the support position and schemed your way into development internally. This was my plan at eBay before they fired me, though they shut down the branch here a few months later and moved to the Philippines anyway so I wouldn't have lasted long regardless.

zeta0134 · on June 1, 2019

I'm fortunate that my own company (Rackspace) at least has a level head about this sort of thing. My direct manager looks at my numbers (~60-80 interactions per month) and my colleagues (many hundreds of interactions per month) and correctly observes that we have different strengths, and that's the end of the discussion. I have a tendency to take my time and go deep on issues, and my coworkers will send me tickets that need that sort of investigative troubleshooting. My coworker meanwhile will rapidly run through the queue and look for simple tickets to knock out. He sweeps the quick-fix work away, but also knows his limits and will escalate the stuff he's not familiar with.

Let me stress here, this is not nearly as easy of a problem to solve as it appears to be on the surface. We're struggling as a company right now because after our recent merge, a lot of our good talent has left and we're having to rebuild a lot of our teams. Even so, I'm still happy with our general approach. Management understands that employees will often have wildly different problem solving approaches and matching metrics, and that's perfectly OK as long as folks aren't genuinely slacking off and we as a team are still getting our customers taken care of. I think that's important to keep in mind no matter how big or small your support floor gets.

Phlarp · on June 1, 2019

+1 for Rack support. A previous company I worked for was heavily invested in Rackspace infrastructure and while I often opined not getting the equivalent experience with AWS for the resume, I was regularly floored with the quality of their support. Whenever I had the pleasure of needing to open a ticket they solved my problems and usually taught me something new in the process. The linux guys were very clearly battle hardened admins.

I couldn't imagine getting that level of support from DO, let alone Amazon.

bluedino · on June 1, 2019

i have the opposite experience with Rackspace. The low end stuff (hosted exchange etc) is basically useless, people who are obviously on multiple chats, they let tickets sit for days...

Even when we had small handful of physical servers with them, they seemed inept. They actually lost our servers one time and couldn’t get someone out to reset power on our firewall.

Phlarp · on June 2, 2019

My experiences were all with their "dedicated" or "managed" cloud services. Although I did notice that their marketing seemed to shift in the last months I was working with them for that employer from "let us help you build things on Rackspace" to "let us help you move what you built on Rackspace to AWS"

zeta0134 · on June 2, 2019

Yes, the Public Cloud, which houses most of the smaller Managed Infrastructure accounts (minimal support) is one of the bigger ... I believe the polite word is "opportunities?" It's a very pretty UI on top of a somewhat fragile Open Stack deployment, which needs a significant amount of work to patch around noisy infrastructure problems. That turns into a support floor burden, and it shows in ticket latency. Critiques directed at that particular product suite are, frankly, quite valid. I think Rackspace tried to compete with AWS, realized very quickly that they do not have Amazon's ability to rapidly scale, and very nearly collapsed under their own weight.

That said, our FAWS team are a good bunch, and what AWS lacks in support they more than make up for in well engineered, stable infrastructure. Since Rackspace's whole focus is support, I think the pairing works well on paper and it should scale effectively, but we'll have to see how it plays out in practice.

Phlarp · on June 3, 2019

So is the business plan now to provide premium support for AWS customers?

zeta0134 · on June 3, 2019

Oh, absolutely:

https://www.rackspace.com/managed-aws

This is a big push, internally and externally. I don't know too much about the details (I don't work directly with that team) but it's been one of our bigger talking points for a while now.

ethbro · on June 1, 2019

Support should be looked at as a profit center, but almost everyone tries to run it like a cost center.

It's crazy that companies spend $$ on marketing and sales, then cheap out on a interaction with someone who is already interested in / using their product.

vinceguidry · on June 1, 2019

Running profit centers requires comparatively rarer leadership resources while running cost centers only requires easy-to-hire management resources. You don't want your best leaders whipping your support center into shape letting the company's competitive edge fritter away.

ethbro · on June 1, 2019

Alternatively, I'd ask if you want easy-to-hire management resources as your primary touchpoint with paying customers.

hitekker · on June 1, 2019

Crazy from the individual perspective; natural from the group's.

"We have a mark, lets suck it dry until we can throw it away and find new and better marks."

Sustainability tends not be a concern until after a group's leadership jumps ship and parasitizes other hosts.

tialaramex · on June 1, 2019

It remains weird to me that this even _can_ work as a business strategy. Customers know this isn't right, so they are only staying with a business that does this for so long as it is the absolutely cheapest/ only way to achieve what they want. That's super high risk, because if a competitor undercuts you, or an alternative appears, you are going to lose all those customers pretty much instantly.

Almost invariably the high ticket rates are also driven by bad product elsewhere. Money is being spent on customer "services" sending out useless cut-and-paste answers to tickets to make up for money not spent on UX and software engineering that would prevent many of those tickets being raised. Over time that's the same money, but now the customer is also unhappy. Go ask your marketing people what customer acquisition costs. That's what you're throwing away every time you make a customer angry enough to quit. Ouch.

jjoonathan · on June 1, 2019

"Abuse your customers and get away with it because they aren't good at punishing you" is the status quo in a depressing fraction of companies.

znpy · on June 1, 2019

> imho you should have taken the support position and schemed your way into development internally

> This was my plan at eBay before they fired me

I guess you answered yourself.

pfortuny · on June 1, 2019

You are very much like Bob Parr (Mr Incredible) at the Insurance company.

The reality is that his boss is true...

nurettin · on May 31, 2019

7 tireless hours of work (with lunch break) 15 minutes to Listen, understand and resolve an issue, assuming perfect knowledge, a lot of luck and normal human speed, that would still amount to less than 30 resolutions a day.

abacadaba · on May 31, 2019

with no data or experience to back this up, i'd expect this to be kind of a long tail, where some will take an hour, and the majority will take 30 sec

windowsworkstoo · on May 31, 2019

Yep, this is spot on - I used to work on a webhosting help desk and could bang out about 100 tickets a shift, because so many were small queries that required no depth work.

klank · on May 31, 2019

Old MSFT rule of thumbs was 2 bugs per day during bug crunch mode. Sounds crazy, but when you consider the number of "this text is wrong" and "that text box is too short" bugs that existed after a year of furious development, it wasn't too hard to achieve.

Gotta hit that ZBB!

jdsully · on May 31, 2019

Brought back memories. I think it might be a little Stockholm syndrome but there was just something about the pressure of getting a release out when you know it only happens once every few years. Bug triage definitely improved my persuasion technique.

Now its just "meh, we'll fix it in next months release".

bitexploder · on June 1, 2019

Agile bug fix workflow ;)

mooreds · on May 31, 2019

Per day? Or per hour?

jdsully · on June 1, 2019

Day. Of course some bugs would be: “this entire feature is done wrong and doesn’t work”.

shoes_for_thee · on May 31, 2019

having data and experience to back this up --

you are entirely correct.

znpy · on May 31, 2019

something's not clear: are you talking from direct experience ?

williamstein · on May 31, 2019

He's using mathematics to compute the number of tickets an employee can handle per day, given certain assumptions. Given the data from znpy above, we see that nurettin's assumption that the time per ticket is 15 minutes is inconsistent with DO's expectations; instead, the average time spent per ticket should be about 5 minutes.

Retra · on May 31, 2019

30 tasks x 0.25 hours per task is 7.5 hours. Looks simply like a calculation.

nurettin · on June 1, 2019

I have no prior experience working at DO tech support.

giggles_giggles · on May 31, 2019

This is appalling. I worked as a L1 ticket tech for an old LAMP host back in the day where probably half of the tickets required nothing more than a password reset or a IP removal from our firewall, very easy stuff, and was proud if I got over 60 responses out in a 8 hour shift. And that time was spent mostly just typing a response to the customer. I really expected higher standards out of DO.

Twirrim · on June 1, 2019

Indeed. If that's their average ticket-per-engineer rate, their support team is way undersized to handle the load.

toomuchtodo · on May 31, 2019

Yikes. Those metrics are entirely unreasonable for providing any support besides canned responses. Rough seas at DO.

Disclaimer: Based on providing support myself and coaching a support team at both a web hosting company and ISP I used to own years ago.

xfitm3 · on June 1, 2019

Yikes. I'd rather work manual labor vs hit 80-120/day ticket targets!

nikanj · on May 31, 2019

Linode is definitely in the minority here. Most companies, in tech and outside of it, seem to follow the DO model. Twitter provides decent service, and the official help channels provide canned responses and template emails.

I somewhat blame people in tech, actually. More than one company is creating products that "cut customer service costs via machine learning", which is code for "pick keywords from incoming tickets and autoreply with a template"

walrus01 · on May 31, 2019

ISP here: The margins in bulk hosting services are incredibly thin, and companies have resorted to automation tools. If somebody asked me to run backend infrastructure for something like DigitalOcean or Linode, I would run away screaming. It would literally be my own personal hell. I would rather run any other sort of ISP services on the planet than a bulkhosting service where anybody with a pulse and $10 to $20/month can sign up for a VPS.

I truly feel sorry for their first and second tier customer support people. I imagine the staff churn rate is incredible.

People who work for these sorts of low-end hosting companies inevitably quit and try to work for an ISP that has more clueful customers. When you have people paying $250/month to colocate a few 1RU servers, the level of clue of the customer and amount of hassle you will get from the customer is a great deal less than a $15/month VPS customer.

thaumaturgy · on May 31, 2019

This race to the bottom has reached a point that it's harming customers. It's okay to be more expensive than the competition if you provide a better service.

walrus01 · on May 31, 2019

Personal opinion, it's really important in the ISP/hosting world to identify what market categories are a race to the bottom, and if at all possible, refuse to participate in them.

I look at companies selling $5 to $15/month VPS services and try to figure out how many customers they need to be set up for monthly recurring services, in order to pay for reasonably reliable and redundant infrastructure, and the math just doesn't pencil out without:

a) massive oversubscription

b) near full automation of support, neglect of actual customer issues, callous indifference caused by overworked first tier support

Conversely, as a customer, you should be suspicious when some company is offering a ridiculous amount of RAM, disk space and "unlimited 1000 Mbps!" at a cheap price. You should expect that there will be basically no support, it might have two nines of uptime, you're responsible for doing all your own offsite backups, etc.

If you use such a service for anything that you would consider "production", you need to design your entire configuration of the OS and daemons/software on the VM with one thing in mind: The VM might disappear completely at any time, arbitrarily, and not come back, and any efforts to resolve a situation through customer support will be futile.

lelanthran · on May 31, 2019

"you're responsible for doing all your own offsite backups"

That's going to be true no matter which cloud provider you choose.

Their ToS almost certainly include terms which allow them to kick you off and refund any monies for any reason whatsoever.

Good luck if you bought into their entire ecosystem and can't move elsewhere on a whim.

Johnny555 · on June 1, 2019

That's going to be true no matter which cloud provider you choose

My company provides services to fortune 100 companies, and we host literally petabytes of data on their behalf in Amazon S3, but we don't have offsite backups. We (and they) rely on Amazon's durability promise.

We do offer the option of replicating their data to another cloud provider, but few customers use that service -- few companies want to pay over twice the cost of storage for a backup they should never need to use when the provider promises 99.999999999% durability.

zhte415 · on June 1, 2019

I don't know the data you're holding. If it is sensitive data, like customer anything, would it infact make sense not to have offsite backup?

Reasoning: Your contract with Amazon promises durability and I'm sure there's a service level agreement with penalty/liability clauses. By implementing a redundant backup, you're replicating something that you don't legally need to have, double-or-more due diligence on the offsite backup security/credentials, and in case of a failure of Amazon create a grey area with clients "Do you have the data, or do you not?"

In short, there could be a very good business reason not to do offsite backups.

michaelmrose · on June 1, 2019

Regardless of durability if you lose your customers data are you sure you will have customers paying you to keep you in business while you figure out liability?

zhte415 · on June 1, 2019

In this case, it was not losing data, but losing access to data. The data was eventually restored. Lose customers' data could also mean losing the backup:

"We're sorry, the tape that we didn't needed to keep has been lost/zero-dayed/secondary service provider has gone bankrupt/Billy's house that we left it at got robbed." These must be disclosed to a customer immediately.

Minimising attack/liability surface is not only a technical problem, but a business one too.

Spivak · on June 1, 2019

For AWS it doesn't make a lot of sense to protect against AWS itself losing data since you're paying them a premium for that. Backups in this model would be logically separated so a user/programmer error can't wipe out the only copy of your production dataset.

EugeneOZ · on June 1, 2019

It's just greed, not some wisdom. When data is lost - it's just lost. Maybe AWS will pay some compensation because of their promises, but money not always can solve problems of missing data.

Blackthorn · on June 1, 2019

Ten years ago, I had a pretty reasonable $10/mo account, that I eventually moved to a $20/mo account because I needed more resources to keep up with traffic.

I'd expect that, as more transistors have been packed blah blah blah, that such a $10/mo account would have gotten better, not worse, since then.

walrus01 · on June 1, 2019

Support and staffing costs are not subject to Moore's law.

cwyers · on June 1, 2019

Neither is bandwidth.

walrus01 · on June 1, 2019

bandwidth does drop in price, and increase in capacity, at a fairly rapid rate. Look at what an ISP might pay for a 10GbE IP transit circuit in 2008 vs what you can get a 100GbE circuit for today. But peoples' bandwidth needs and traffic also grow rapidly.

davidgerard · on June 1, 2019

> I'd expect that, as more transistors have been packed blah blah blah, that such a $10/mo account would have gotten better, not worse, since then.

This is what Linode do - they keep you on the same payment level, but raise what you get for it.

alexis_fr · on May 31, 2019

There is the same risk of being kicked off when using Amazon AWS. The rules are different, but there will be situations where you lose everything (imagine you become visible for a political reason and the landscape shifts a bit).

yks · on May 31, 2019

Unfortunately there is no guarantee that I receive any better support at the more expensive provider though

walrus01 · on May 31, 2019

At a certain price point, yes there is, if you're paying $800/month for hosting services to a mid sized regional ISP with presence at major IX points. That ISP cares about its reputation, and cares about the revenue it's getting from you.

I can tell you that as a person whose job title includes "network engineer", we have a number of customers who have critical server/VM functions similar to these people who had the DigitalOcean disaster. If something goes wrong, an actual live human being with at least a moderate degree of linux+neteng clue is going to take a look at their ticket, personally address it, and go through our escalation path if needed.

nikanj · on May 31, 2019

Having paid substantial amounts for various services over the years, paying hundreds of dollars per month doesn't automatically make you into a priority.

There seems to be a sweet spot for company size here. Too small companies can't support you even though they really want to. Large companies are busy chasing millions in big contracts, and don't really care about your $800 per month at all.

walrus01 · on May 31, 2019

Very good points. What I would recommend is to use a mid sized ISP in your local area where you can meet with people in person. At higher dollar figures there should be some sales person and network engineer you can meet in their local office, meet for coffee, discuss your requirements, and have something of a real business relationship with. You and your company should be personally known to them.

If you are just some semi anonymous faceless person ordering services off a credit card payment form on a website, all bets are off...

jsjohnst · on May 31, 2019

Imho, that point was reached in hosting over 15 years ago (which is why I sold the hosting company I had back then). We’ve seen some short lived upticks periodically since then, but they all end up going back to shit as they tried to scale.

bluedino · on June 1, 2019

I like prgmr.com’s motto, “expect about $5/month in support”

sn · on June 1, 2019

srn from prgmr.com here. Our tagline originally came from us being a low cost service, but I like to think of it as a customer support philosophy. One meaning is we want you to be able to fix the problem yourself by giving you instructions instead of logging into your system. Another is we try give you the benefit of the doubt that it could be our problem and not assume it's yours when there's an issue.

bluedino · on June 1, 2019

which goes in line with “we don’t assume you’re stupid”

fouc · on June 1, 2019

Linode was more of a bootstrapped business, it grew slowly and steadily. Digital Ocean was always built to grow fast from the beginning.

I think that the size of the company or how fast they grow is a good proxy for having poor customer support. What we should be doing is finding the slow growing businesses or the mid-tier (not too small, not too big) businesses to take our business to.

guhidalg · on May 31, 2019

There’s nothing wrong with that approach, the person raising the support ticket likely hasn’t read through all the documentation of the product they’re using.

thaumaturgy · on May 31, 2019

If implemented well, sure -- sometimes, maybe often, you can point a customer to a support document that directly answers their specific question and relieves some of the load on your staff. That's great.

But the execution matters a lot, and DO's is currently not great. IIRC, it takes clicking through a few screens of "are you sure your question isn't in our generic documentation? How about this page? No? This one then? Still no? You're really sure you need to talk to someone about this error? sigh Okay, fine then."

These systems should not be implemented as a barrier to reaching human support, but they often are.

Pxtl · on June 1, 2019

In all my experience with support, I have been referred to a document that helped me with my problem literally zero times, because if something went wrong the first thing I did was Google it and so I already saw the unhelpful document.

AYBABTME · on May 31, 2019

> DO seems to have gone with the "hire cheap overseas support that almost but doesn't quite understand English" strategy, whereas the tier 1 guys at Linode have on occasion demonstrated more Linux systems administration expertise than I've got.

That's simply not true. There's support engineers hired around the world, and depending on when your ticket is posted, someone awake at that time will answer. DO is super remote friendly and as a result, has employees (and support folks) everywhere on the planet. Not "cheap oversea support" at all. There's a lot of support folks in the US, Canada, Europe, and in India where they have a datacenter.

disclosure: ex-employee

bryant · on May 31, 2019

> That's simply not true. There's support engineers hired around the world, and depending on when your ticket is posted, someone awake at that time will answer. DO is super remote friendly and as a result, has employees (and support folks) everywhere on the planet. Not "cheap oversea support" at all. There's a lot of support folks in the US, Canada, Europe, and in India where they have a datacenter.

Going to guess from this tweet https://twitter.com/AntoineGrondin/status/113096281882239385... that you're currently staffed at DO. That's fine; I know two people who do great work on DO's security team. But it would be helpful if you could disclose this when you comment about your employer publicly so that readers don't have to dig up your keybase and then your twitter account to understand it.

AYBABTME · on May 31, 2019

I don't work at DO anymore, but you're right I should disclose that I used to do so. Will edit my original post.

bryant · on May 31, 2019

Much appreciated, thanks!

Aeolun · on June 1, 2019

But, isn’t hiring all over the world exactly because it is cheaper for the same kind of talent. I’m sure the company doesn’t do it out of the goodness of their heart.

Then of course, there is no guarantee these people speak and understand english perfectly.

AYBABTME · on June 1, 2019

I would say it's not. There's many advantages to hiring remote workers, they've been discussed at length elsewhere. One advantage is not having to pay office space, which indeed lowers cost. However DO has a nice office in Manhattan so really... they're not saving much money. And then on the term of compensation, for some reason DO pays it's remote employees really well. I don't know how this changed in recent years but people in NA and EU are all paid handsomely despite being remote. I don't know about other locales.

The reality is that top talent, even if remote, is competitive whether they are in NYC/SFO/SEA or not. And DO has some pretty talented people on staff.

And then, having people in all timezones is definitely an advantage for 24/7 support. I'd say it's not negligible, and not an after thought.

Now about english fluency, it's only that important to english native locations. And really, most of tech does not necessarily have english as a first language - I certainly don't. So I'd say that encountering support engineers with imperfect english shouldn't be a problem to anyone, and definitely not a sign of cheap labor. In fact, I'd say bitching about someone's english proficiency in tech is kind of counterproductive and I find it discriminatory.

Anyways. DO doesn't hire international employees to get cheap labor, that's a preposterous proposition. And with datacenters around the world and a large presence and customer base, it makes sense to have staff on board from many of these areas. And that staff might answer your tickets at night when they're on shift. Shouldn't they?

cannonedhamster · on June 1, 2019

I can concur from a company that is not DO we hire workers in locations around the glove specifically to have people awake in their normal time zones, not because it's cheaper because it's not always cheaper. There are many countries that have a large portion of very intelligent and multi lingual people, especially when it comes to English. Just because someone isn't a native English speaker, or speaks in different dialect doesn't mean that they are any less capable.

masonic · on June 1, 2019

  isn't a native English speaker, or speaks in different dialect doesn't mean that they are any less capable

It's not just about being less capable... Being less understandable can trump capability.

jen20 · on June 1, 2019

Not least England, which is literally full of actual native English speakers, and 8 hours ahead of the west coast of the USA.

thaumaturgy · on June 1, 2019

I probably should have left the word "overseas" out of my initial comment, it gave it a flavor that doesn't match my left-wing multicultural globalist ideals.

That said, I disagree wholeheartedly that it's okay for support staff to not be completely fluent in the language they're providing support in, regardless of the language.

There is functionally no difference between trying to interact with talented support staff who aren't fluent in your language, and trying to interact with illiterate support staff. The end results are identical.

There are people who are very talented and very fluent in more than one language. Those people tend to be more expensive. So, many companies forego hiring those workers and instead hire others who are cheaper and "about as good". My multiple experiences with DO support have suggested that that's what they're doing.

As other commenters are suggesting, it may just be instead that DO is expecting its support staff to meet metrics that are causing them to spend only a minute or two per ticket and send out scripted replies.

AYBABTME · on June 1, 2019

I know many engineers who are not that fluent in English whom you would never contemplate qualifying as illiterate; you would quickly see that (1) they're encumbered by English and (2) are obviously extremely proficient technically, and literate.

People who would make gratuitous grammatical mistakes but have read more classics than the average American college graduate. I can easily count many just thinking of it.

thaumaturgy · on June 1, 2019

You're arguing here against something I didn't say. You took one word from my statement -- "illiterate" -- and built a whole new argument around it which was never mine to begin with. I don't think you're doing it intentionally, I suspect it's just because you have a particular sensitivity on this subject. Either way I don't think I can say anything here that'll get a fair treatment from you.

username90 · on June 1, 2019

Not that person, but you said this:

> There is functionally no difference between trying to interact with talented support staff who aren't fluent in your language, and trying to interact with illiterate support staff.

That statement reeks of ignorance. It seems you have almost no experience with other languages than your own, or you would know that communicating while being non-fluent or with a non-fluent works just fine most of the time. Sometimes misunderstandings happens and it can be a bit slower but that is all.

thaumaturgy · on June 1, 2019

> Sometimes misunderstandings happens and it can be a bit slower but that is all.

So your position is that support that's a bit slower, with some misunderstandings, is exactly as good as fast support without misunderstandings, even in downtime-sensitive applications.

Well, okay then.

fernandotakai · on June 1, 2019

this statement is downright offensive and something i wouldn't expect to read here.

fluency is a high barrier to clear. it took me 5 years of speaking/reading/writing english daily to come even close to "fluency".

before that, i had a really good advanced english, but i wasn't fluent. and it didn't mean i was "illiterate".

ksec · on May 31, 2019

And Linode has always had faster CPU, I/O, Network. And lots of small things like that DO only catches up in the recent years, like pooled bandwidth.

Although these days I tend to go with UpCloud, it is very similar to Linode and DO, except you can do custom instances like 20x vCPU with 1GB Memory, spin it up for $0.23 an hour. Compared to standard plan on Linode and DO, 20 vCPU with 96GB Memory would be $0.72/Hr.

EugeneOZ · on June 1, 2019

I love Linode, their support is awesome, their CPUs on "Dedicated CPU Plans" are great (by benchmarks - similar to GCP n1 CPU), disks io bandwidth just amazing. But I had to leave them because their network is not so reliable. It was difficult decision and I tried to return few weeks ago (because I really love them), but again network in London was just not so great.

notacoward · on June 1, 2019

I'd love to see the benchmarks you used. I've done a bunch over the years, mostly I/O focused because that's the most common bottleneck for the kind of work I do. While Linode does pretty well, especially compared to the cloud giants, DO has pretty much always come out on top.

vbtechguy · on June 1, 2019

Really depends on underlying hardware of the Linode and DO VPS servers as they can vary greatly especially on DO depending on datacenter and region you end up in. Newer DO datacenters getting newer hardware so difference compared to older DO datacenters is huge - benchmarks of same DO droplet plan on different hardware https://community.centminmod.com/threads/digitalocean-us-15-...

rapind · on May 31, 2019

I've been on Linode for 8+ years now (moved there from Slicehost when Rackspace swallowed them up) and their service (not necessarily customer support) has significantly degraded. Not sure I blame them though. They've become far more popular since I started with them and are probably doing their best to grow... but I no longer recommend them as I used to.

Just my experience though.

theturtletalks · on May 31, 2019

That's how I feel about Scaleway. Scaling customer service is no easy task especially technical companies that require agents to have some understanding of the product.

metildaa · on June 1, 2019

I drop all traffic from Scaleway, they do not care about abuse coming from their network: https://badpackets.net/ongoing-large-scale-sip-attack-campai...

chillidoor · on June 1, 2019

Call me cynical but for those prices I don't expect amazing service or stability, so I don't run any production stuff there.

jolmg · on May 31, 2019

Could you give some concrete examples on how their service has degraded? I've been using their service for years for light stuff, and I haven't had any problems.

rapind · on June 1, 2019

So we host scores for a sport, as well as inputting these scores. Every year we have a few high traffic events (much higher than normal), and I scale up our servers to support it. However for the last two years there have been outages in their Newark data centre during both of these events. One time it was DNS, all other times it's been data centre wide.

I suspect as they've increased in popularity they've become a bigger target for DDOS attacks.

I've also noticed that in the past year there's been a lot of data centre outages... like every couple months. Hasn't been a deal breaker for us, since our traffic is generally fairly low outside of season, but the ones during seasons really hurt.

Also I'd like to add that they do give you the heads up when there are issues, which is a big plus in my book compared to some other hosts.

I really do think it's just growing pains, and I don't mean to disparage them. Just being honest that I wouldn't recommend them for high availability services. Since I consider them a low budget host that's probably unfair though. We've just outgrown them is all.

This is all anecdotal of course.

v4ult · on May 31, 2019

So who would you recommend?

EugeneOZ · on June 1, 2019

If you are looking for high availability - Google Cloud Platform network is the best. In some other things AWS is better, but GCP network quality is awesome.

rapind · on June 1, 2019

I don't really have a low budget alternative. I know that for our service we're evaluating both google and amazon cloud offerings, but only for our high availability services. I figure DO is in the same boat if not worse.

kudu · on June 1, 2019

I've been using Hetzner Cloud and have been quite happy with it.

nickpsecurity · on June 1, 2019

I'll throw in Prgmr.com. One of their owners, Alyn Post, is on Lobsters with us. They even donate hosting to the site. He's been a super-nice guy over the years. Given how cost-competitive market is, they mainly differentiate on straight-forward offerings with good service. So, I tell folks about them if concerned about good service or more ethical providers.

https://prgmr.com/xen/

Main, potential caveat is they're Xen-based hosting. That may or may not matter depending on what one wants to run. They support the major Linux's.

bluedino · on June 1, 2019

Luke used to post here quite a bit too

dannyw · on June 1, 2019

Linode support helped me diagnose some rare and obscure Linux OS error. They went above their call of duty and it’s why I’m still a customer.

Havoc · on June 1, 2019

>the support on Linode is way, way, way better than DO's.

Can't comment about support, but DO's tutorials on linux server random task 101 are fantastic.

I was surprised how much of my ubuntu server setup googling ended up on DO pages.

bluedino · on June 1, 2019

most of them are well written. It’s the community though, not DO

ficklepickle · on June 1, 2019

So true! Well written, to the point and kept up to date.

slig · on May 31, 2019

I've been with Linode since 2009, and only got bitten once when their entire Atlanta DC went down near Christmas.

I was eager to try the new DO managed Postgres service, but I guess I won't after this blunder.

millettjon · on May 31, 2019

+1 I have used both as well. Linode support has always been good and DO support has been a blocking barrier.

terppy · on May 31, 2019

In response to my first support request DO sent me instructions for closing my account. I followed up and they refunded me a random few dollars.

ausjke · on May 31, 2019

second this, Linode for 15+ years here, tried DO a few times(but never left Linode), now 100% back with Linode.

Linode even has an irc channel(you can use browser to access it), I rarely need support, but when I really need it, it is always fast, to the point, available.

brlewis · on May 31, 2019

Long-time Linode customer here. What were the security screwups? I tried a couple searches but didn't turn up anything.

thaumaturgy · on May 31, 2019

The two major ones were:

A Bitcoin theft via the Linode Manager interface in 2012: https://news.ycombinator.com/item?id=3654110

and a second Linode Manager compromise in 2016: https://news.ycombinator.com/item?id=10845170

In both cases, if you dig into the context a bit, the story turned out that Linode wasn't fully disclosing the breaches to their customers until they were forced to do so when the news about them reached a certain volume. They also may have been -- almost certainly were -- dishonest about the extent of the damage and how it may have impacted their other customers.

At the time, their Manager interface was a ColdFusion application, which tends to be a big pile of bad juju. They started writing a new one from scratch after, I think, the second compromise.

The really bad thing here is that they got soundly spanked for being less than truthful the first time, and then four years later -- when they'd had ample time to learn from that mistake -- they did it again.

So there's a nonzero chance at any given time that Linode's infrastructure has been compromised and they know it and have decided not to tell you about it.

That's what prompted me to start exploring DigitalOcean more. Unfortunately, I've found that there's a far greater chance that I'll experience actual trouble exacerbated by poor support than that I'll be impacted by an upstream breach, so about half my stuff still lives on in Linode.

Twirrim · on June 1, 2019

> So there's a nonzero chance at any given time that Linode's infrastructure has been compromised and they know it and have decided not to tell you about it.

This is entirely the roots of my distrust of them right now. Mistakes happen. Companies I trust demonstrate that they've learned from their mistakes. My tolerance for mistakes is pretty low when it comes to security related things, though. If something has gone wrong, let me know so I can take remedial steps. Their handling of both of those incidents suggested I can't trust them to tell me in sufficient time to protect myself.

ilikepi · on May 31, 2019

Here's a search that will help, look no further than the site you're already on:

https://hn.algolia.com/?query=linode%20security&sort=byPopul...

They've had several high-profile breaches over the years.

api · on June 1, 2019

Vultr is underrated too. I've had nothing but positive tech support experiences with them. Their weird branding turns people off but they do not seem fly by night. We have used them for various things for years with very few problems.

kijin · on June 1, 2019

I've dealt with many Vultr instances on behalf of my clients, and I've had nothing but negative experiences with them. Unstable performance even on top-tier plans. Internal network issues that support keeps trying to blame their customer for. Nowadays when I find that a new or prospective client has been using Vultr, the first thing I recommend is to move off of Vultr.

When there's an issue with Linode's platform, they discover it before I do and open a ticket to let me know they're working on it. When there's an issue with Vultr, the burden of proof seems to be on me to convince them that it's their problem not mine.

WrtCdEvrydy · on June 1, 2019

My only issue is the provisioning weirdness.

I've had interesting issues automating deployment to the point that my current build script provisions 9 VMs, benchmarks them and shuts down the worst performing 6. Some of their co-location is CPU stressed.

ajford · on June 1, 2019

My current employer uses Linode, and yeah, they have pretty good support. However, I've been using DO for the last 5 years, and haven't needed to reach out to support once. But I've had to contact Linode support about 5 times in the last year.

chaostheory · on June 1, 2019

Does DO have different levels of support that you can pay for like AWS? I like that system. You pay when you need it. You pay more if you need more support.

The difference in support DO vs Linode is probably due to DO being cheaper.

oefrha · on June 1, 2019

> The difference in support DO vs Linode is probably due to DO being cheaper.

What? Most DO and Linode plans have exact same specs and cost exactly the same, and IIRC it was DO matching Linode. Although DO didn't seem to enforce their egress budget while Linode does.

(I've been a customer of both for many years, and only dropped DO a few months ago.)

EDIT: Also, according to some benchmarks (IIRC), for the exact same specs Linode usually has an edge in performance.

yeukhon · on June 1, 2019

AWS support is fixed contract. There is basic, business and enterprise. What do you mean by “you pay when you need it”?

chaostheory · on June 4, 2019

I can cancel the fixed contract any time. e.g. when I need support I buy a month of basic or business.

randomsearch · on June 1, 2019

Having used both for years, I’d probably recommend DO. None of the big security issues, but also I’ve found more downtime with Linode, don’t know if they’re upgrading their infrastructure a lot for some reason.

nodesocket · on May 31, 2019

This headline is grossly misleading and very clickbaity "Killed our company". It's not exactly big business that you scale up to 10 droplets for short bursts, I am willing to bet their spend on DigitalOcean is less than $500 a month, yet the author is expecting enterprise support.

DigitalOcean should go the route of AWS and kill off free support completely and offer paid support plans. Something like $49 a month or 10% of the accounts monthly spend.

If you are a serious company with paying Fortune 500 customers, you need to act serious and pay up for premium support and stop expecting free.