
"DigitalOcean Killed Our Company" - sergiomattei
https://twitter.com/w3Nicolas/status/1134529316904153089
======
bcooks
As DigitalOcean's CTO, I'm very sorry for this situation and how it was
handled. The account is now fully restored and we are doing an investigation
of the incident. We are planning to post a public postmortem to provide full
transparency for our customers and the community.

This situation occurred due to false positives triggered by our internal fraud
and abuse systems. While these situations are rare, they do happen, and we
make every effort to get customers back online as quickly as possible. In this
particular scenario, we were slow to respond and had missteps in handling the
false positive. This led the user to be locked out for an extended period of
time. We apologize for our mistake and will share more details in our public
postmortem.

~~~
bcooks
Thanks for the replies. Let me try to address a few of the things I have seen
here. We haven't completed our investigation yet; it will include details on
the timeline, decisions made by our systems and our people, and our plans to
address where we fell short. That said, I want to provide some information now
rather than waiting for our full post-mortem analysis. A combination of
factors, not just the usage patterns, led to the initial flag. We recognize
and embrace our customers' ability to spin up highly variable workloads, which
would normally not lead to any issues. Clearly we messed up in this case.

Additionally, the steps taken in our response to the false positive did not
follow our typical process. As part of our investigation, we are looking into
our process and how we responded so we can improve upon this moving forward.

~~~
9HZZRfNlpR
What do you recommend your clients do if that kind of mistake happens to
them? Is Twitter-shaming the only way out?

I know people cite legal arguments for why they shut you down and won't say
anything, but this is the worst scenario ever. I'd rather be accused of
something I didn't do than get an "oops, we can't tell you anything, your
account has been shut down."

~~~
xtracto
This is important. I hate how it has become standard for companies to screw
their customers unless they are online-shamed.

The response email even read like a giant polite FUCK YOU (we locked your
account, no further action required by you)

You bet I will have further action!

And it is after the shaming that you get an "I am sorry for this situation".
Which sounds more like saying "I'm sorry we got caught".

My frustration is not with DO specifically, as they do exactly what every
other company does.

But, what of the other thousands of people that got screwed and did not put it
on twitter?

It is the equivalent of getting screwed at a restaurant: the loudest
complainer is the one who gets the reward, while all the others silently
swallow the injustice.

~~~
qmarchi
It's most likely because the people who can act upon the process itself, not
just follow it, inevitably see the issue and genuinely want to help.

Getting your message into the right hands is what matters, not the platform
it's on.

------
thaumaturgy
Some people on HN hate Linode because of their past security screwups (which
is valid), but having used both DO and Linode quite a lot, the support on
Linode is way, way, way better than DO's.

DO's tier 1 support is almost useless. I set up a new account with them
recently for a droplet that needed to be well separated from the rest of my
infrastructure, and ran into a confusing error message that was preventing it
from getting set up. I sent out a support request, and a while later, over an
hour I think, I got an equally unhelpful support response back.

Things got cleared up by taking it to Twitter, where their social media
support folks have got a great ground game going, but I really don't want to
have to rely on Twitter support for critical stuff.

DO seems to have gone with the "hire cheap overseas support that almost but
doesn't quite understand English" strategy, whereas the tier 1 guys at Linode
have on occasion demonstrated more Linux systems administration expertise than
I've got.

~~~
znpy
I interviewed with DO and they tried to divert me toward a support position.

They told me that in a single day a support engineer was supposed to
help/advise customers on pretty much whatever issue the customer was having,
and also handle somewhere between 80 and 120 tickets.

It's nice to see that DO is willing to help with pretty much anything they
(read: their team) have knowledge about, but at 80-120 tickets per day I can't
expect to give meaningful help.

Needed EDIT: it seems to me that this comment is receiving more attention
than it probably deserves, so I feel it's worth clarifying some things:

1. I decided not to move forward with the interview, as I was not interested
in that support position, so I have not verified that that's the actual ticket
volume.

2. From their description, tickets can be anything from "I cannot get
apache2 to run" to "how can I get this linucs thing to run Outlook?" (/s) to
"my whole company that runs on DO is stuck because you locked my account".

~~~
hackermailman
I once worked for eBay a long time ago, and support consisted of 4 concurrent
chats, offering pre-programmed macros that often pointed to terribly written
documentation the person had already read and was confused about. If you took
the time to actually assist somebody, you were chastised in a weekly review
where they went over your chat support.

The person doing my review told me I had the highest satisfaction record in
the entire company and a 'unique gift of clear and concise conversation, like
you're actually talking to them face to face', then said I'd be fired next
week because my coworkers were knocking off hundreds of tickets a day just
using automated responses, leaving their customers fuming with low
satisfaction ratings. People are very aware of being fed automated responses,
but the goal was not real support, it was just clearing the tickets by any
means possible.

I decided to try half and half: if the support question was written by
somebody who obviously would not understand the documentation (grandma trying
to sell a car), I would help them, and just provide shit support to everybody
else in the form of macros like my coworkers. Of course this was unacceptable
and I got canned the next week as promised. It was an interesting experience.
I can imagine DO having an insane scope to their support requests, like 'what
is postgresql'.

Anyway, imho you should have taken the support position and schemed your way
into development internally. This was my plan at eBay before they fired me,
though they shut down the branch here a few months later and moved to the
Philippines anyway, so I wouldn't have lasted long regardless.

~~~
zeta0134
I'm fortunate that my own company (Rackspace) at least has a level head about
this sort of thing. My direct manager looks at my numbers (~60-80 interactions
per month) and my colleagues (many hundreds of interactions per month) and
correctly observes that we have different strengths, and that's the end of the
discussion. I have a tendency to take my time and go deep on issues, and my
coworkers will send me tickets that need that sort of investigative
troubleshooting. My coworker meanwhile will rapidly run through the queue and
look for simple tickets to knock out. He sweeps the quick-fix work away, but
also knows his limits and will escalate the stuff he's not familiar with.

Let me stress here, this is _not nearly_ as easy a problem to solve as it
appears to be on the surface. We're struggling as a company right now because,
after our recent merger, a lot of our good talent has left and we're having to
rebuild a lot of our teams. Even so, I'm still happy with our general
approach. Management understands that employees will often have wildly
different problem solving approaches and matching metrics, and that's
perfectly OK as long as folks aren't genuinely slacking off and we as a team
are still getting our customers taken care of. I think that's important to
keep in mind no matter how big or small your support floor gets.

~~~
ethbro
Support should be looked at as a profit center, but almost everyone tries to
run it like a cost center.

It's crazy that companies spend $$ on marketing and sales, then cheap out on
an interaction with someone who is already interested in / using their
product.

~~~
vinceguidry
Running profit centers requires comparatively rare leadership resources,
while running cost centers only requires easy-to-hire management resources.
You don't want your best leaders whipping your support center into shape while
the company's competitive edge fritters away.

~~~
ethbro
Alternatively, I'd ask if you want easy-to-hire management resources as your
primary touchpoint with paying customers.

------
_bxg1
Looks like Moisey Uretsky personally intervened fairly quickly:
[https://twitter.com/moiseyuretsky/status/1134547532149854208](https://twitter.com/moiseyuretsky/status/1134547532149854208)

That said, any company, especially one working with Fortune 500's, should have
DB backups in at least two places. If they'd had the data, they could have
spun up their service on a different hosting provider relatively easily.

~~~
macspoofing
>That said, any company, especially one working with Fortune 500's, should
have DB backups in at least two places

Yes they should.

How many 2-man shops do you think follow all the proper backup and security
procedures?

~~~
_bxg1
It could literally be a cron job that dumps your DB to a desktop computer once
a week. Not exactly CIA-level stuff.
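
For illustration, a minimal sketch of that idea, assuming a Postgres database;
the hostname, database name, and paths are placeholders (run weekly from cron
on the desktop machine):

    import datetime
    import subprocess

    # Pull a compressed dump of the production database over SSH.
    # "db-host", "appdb", and the paths below are placeholders.
    stamp = datetime.date.today().isoformat()
    outfile = f"/home/me/backups/app-{stamp}.sql.gz"
    with open(outfile, "wb") as f:
        subprocess.run(
            ["ssh", "db-host", "pg_dump appdb | gzip"],
            stdout=f,
            check=True,  # fail loudly so a broken backup doesn't go unnoticed
        )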

~~~
stingraycharles
More realistically they would have done backups inside DO and would still be
locked out. Not many people actually do complete offsite backups to a
completely different hosting provider, getting locked out of your account is
usually just not a consideration. It’s unrealistic to expect this of a tiny
startup.

~~~
jdietrich
_> getting locked out of your account is usually just not a consideration_

How many horror stories need to reach the front page of HN before people stop
believing this? Getting locked out of your cloud provider is a very common
failure mode, with catastrophic effects if you haven't planned for it. To my
mind, it should be the first scenario in your disaster recovery plan.

Dumping everything to B2 is trivially easy, trivially cheap and gives you
substantial protection against total data loss. It also gives you a workable
plan for scenarios that might cause a major outage like "we got cut off
because of a billing snafu" or "the CTO lost his YubiKey".
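
As a sketch of how little code that takes (assuming boto3 against B2's
S3-compatible endpoint; the endpoint, bucket, and credentials are all
placeholders):

    import boto3

    # Push the latest dump to Backblaze B2 via its S3-compatible API.
    s3 = boto3.client(
        "s3",
        endpoint_url="https://s3.us-west-002.backblazeb2.com",  # placeholder region
        aws_access_key_id="B2_KEY_ID",       # placeholder
        aws_secret_access_key="B2_APP_KEY",  # placeholder
    )
    s3.upload_file("/home/me/backups/app-latest.sql.gz",
                   "my-offsite-backups", "app-latest.sql.gz")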

~~~
williamdclt
> How many horror stories need to reach the front page of HN before people
> stop believing this

Sounds like the opposite of survivorship bias. I don't believe it's at all
common (though it does happen), even less that "it should be the first
scenario in your disaster recovery plan".

~~~
kelnos
Even if the stories we hear of account lockouts aren't typical, the absolute
number of them that we see -- especially those (like this one) that appear to
be locked (and re-locked) by automated processes -- should be cause for
concern when setting up a new business on someone else's infrastructure.

------
BinaryArcher
This exact same thing happened to me last year. I accessed my account from
abroad and they perma-banned me.

Support was useless and even with evidence did not believe who I was.

I then somehow convinced them to give me temp access, which in my opinion is
even worse. They didn't believe me about who I was and then gave me temporary
access to an account. DO can't be trusted when their support team could so
easily be socially engineered.

~~~
system2
Okay, if this is real, I am concerned. We have 40+ droplets with many clients.
If anything like this happens, we will lose our entire operation, as well as
all of our clients' confidential ecommerce data.

------
janjanson
This is probably going to get buried in the replies, but I had a similar
experience with DigitalOcean about a year ago with my account getting
permanently locked with very little explanation and no way of getting it back.
It's still locked to this day. I was just a student using my Github student
package credit, but I was pretty appalled by the service from DO and vowed
never to buy from them.

Unfortunately, my ticket no longer appears under closed tickets. I was still
able to dig up my original ticket message and all the responses their support
made to me through my email though. Here they are:

[https://imgur.com/a/NJMRnyY](https://imgur.com/a/NJMRnyY)

Between the replies I asked about what I could do to verify my account. As you
can see, they didn't even give me a single chance to do so. They told me to
hang on twice then just permanently closed it up. I'm not sure how I even got
flagged. All I did was turn on a droplet and delete it. I checked the audit
logs, and there was nothing suspicious there either. It was just me logging in
and out.

I thought about making a big deal of it on Twitter, but I didn't bother
because I don't have any followers and it wasn't a huge loss to me either.
Maybe that's the only way?

~~~
sbarre
The emails from their "Trust and Safety" team are extremely tone-deaf...

"We've locked you out, no explanation" "Sorry for any inconvenience"

Seriously? That last line is like a slap in the face.

No one should talk to a customer like that in this situation, if only because
(a) if this is real abuse, you don't need to be "fake nice" and (b) if it's a
false-positive, you've just come across as extremely smug when you're in the
wrong.

------
shuzchen
If you're reading this and concerned about your own backup story, fret not! In
2019, secure off-premise backups are super easy to implement, even for a
1-person shop. Get something like Restic or Borg or any one of the enumerated
options here:
[https://github.com/restic/others](https://github.com/restic/others)

I've recently implemented backups with Restic; the static binary and plethora
of supported storage backends were extremely appealing. The easiest option
seems to be to just point it at an S3 bucket, but given that most people have
their infrastructure on AWS (off-premise means off-premise), having other
options supported out of the box is pretty handy.
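
To give a flavor, a minimal sketch wrapping the two relevant restic commands
in Python; the repository URL, password, and paths are placeholders, and the
S3 credentials are assumed to already be in the environment:

    import os
    import subprocess

    # restic reads its repository and password from the environment;
    # AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY are assumed to be set already.
    env = dict(
        os.environ,
        RESTIC_REPOSITORY="s3:s3.amazonaws.com/my-backup-bucket",  # placeholder
        RESTIC_PASSWORD="correct horse battery staple",            # placeholder
    )

    # Take a snapshot, then thin out old ones so the bucket doesn't grow forever.
    subprocess.run(["restic", "backup", "/srv/appdata"], env=env, check=True)
    subprocess.run(["restic", "forget", "--keep-weekly", "8", "--prune"],
                   env=env, check=True)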

~~~
type0
> The easiest seems to be to just point it at a S3 bucket, but given most
> people have infrastructure on AWS (off-premise means off-premise) having
> other options supported out of the box is pretty handy.

Sure, having other options is good: [https://min.io/](https://min.io/)

~~~
shuzchen
Minio is great! I use it! But spinning up and managing yet another service
when you're already a small shop adds more barriers to entry. Maybe find an
already s3-compatible store (like Wasabi) or find something cheap and easy to
spin up that's supported by the tool (like
[https://www.hetzner.com/storage/storage-box](https://www.hetzner.com/storage/storage-box))

------
keypusher
Given that the author was quite vague about the nature of this “pipeline” and
that their product is an “AI-powered Startup Selection engine”, I have a
suspicion they were probably crawling and scraping a whole bunch of pages for
new startups. It’s possible that this was totally legit and it just looked
like a ddos attack, or that it was something else entirely, but everyone here
seems to have taken him at his word that what they were doing was actually
above board.

~~~
voldemort1968
I agree that we're not getting the full story.

On the other hand, completely shutting down all their services without a quick
conversation...

~~~
dsl
Having been on the receiving end of terribly broken "pipelines" at startups
wanting to hammer away at my resources, the _right_ response is to terminate
first and discuss later.

I know of a company that explicitly had a "call us to discuss first" clause in
their contract with a smaller cloud provider. Everyone was on holiday and not
answering the phones while their hacked account was being used to spin up
dozens of boxes launching a DoS attack against a crypto scam site. Guess who
had to eat the bill on that one?

~~~
chris_wot
Surely there was a support contract option where there were on-call people?

------
duxup
It's interesting how many companies simply shut down service rather than,
say, give a warning and wait for a response (or at least start a clock).

Granted that would require people to communicate and use some form of reason.

Even DMCA for all its warts fires up a warning and has a response mechanism
(granted other issues there).

~~~
jerf
"It's interesting how many companies simply shut down service rather than say
give a warning and wait for a response (or at least start a clock)."

I'm sure many people have started their companies firmly convinced that
they'll give plenty of warnings and never automatically shut anything down.

The problem is, you rapidly discover that doesn't scale, not even on a human
level. You send your notice. 48 hours later, you've gotten no response. If you
act now, it isn't materially different, from your point of view, than if you
had simply acted right away.

Also, in a cloud environment, even Digital Ocean, as many people have learned
the hard way with leaked credentials, you can rack up charges faster than the
relevant humans can even conceivably be notified. As the hosting company, you
can't just let abusive or accidental usage go. You can refund their money, but
that's still resources of yours that went to something that failed to produce
revenue rather than something that did; you can't absorb that indefinitely.

I'm pretty sure you'll inevitably discover that you have no choice but to put
automation in.

~~~
kelp
This is exactly why AWS has relatively low default account limits, and you
have to open a support ticket to raise them. It's largely to prevent runaway
costs from surprising the customer.

~~~
busterarm
I accidentally left a 24xlarge instance running for a month without realizing
it, and they looked at the activity and were totally cool about zeroing out
the bill for that instance for the month. They basically gave us a $2,000
credit.

It probably helps that I said I would be careful not to do that again, and
that before filing the ticket I had already put in a CloudWatch alarm to
automatically power off the instance after a set period of idleness.
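
For reference, a minimal sketch of that kind of alarm using boto3; the region,
instance ID, and thresholds are placeholders:

    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

    # Stop the instance if average CPU sits at or below 2% for 24 hourly periods.
    cloudwatch.put_metric_alarm(
        AlarmName="stop-when-idle",
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
        Statistic="Average",
        Period=3600,
        EvaluationPeriods=24,
        Threshold=2.0,
        ComparisonOperator="LessThanOrEqualToThreshold",
        # Built-in action ARN that stops the instance when the alarm fires.
        AlarmActions=["arn:aws:automate:us-east-1:ec2:stop"],
    )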

~~~
duxup
The actual cost to Amazon is so low that it probably isn't worth insisting on
charging for the mistakes of customers who contact support.

~~~
dfrage
The goodwill generated by the stream of customer testimonials about this
process is priceless.

The proposition seems to go something like this: it's a new thing, mistakes
are statistically expected, you make an honest one and plead "oops!" and we
refund you, no doubt pointing you to resources on best practices and account
throttling. As long as the customer takes the lesson to heart, everyone wins.

------
StudentStuff
DigitalOcean is operating worse than a fly-by-night host (like AlphaRacks,
GreenValueHost, etc). The reasonable course of action would've been to email
the customer and throttle their API access to prevent load spikes, but DO
instead locked their entire account (not just the service that DO felt was
being abused).

A fly-by-night host will often suspend only the VM or database in question,
not other services on the account (I've been in that position before).

~~~
busterarm
Really? Big European VPS hosts like OVH just turn off your stuff until the
problem goes away.

Hardly fly-by-night.

~~~
figgis
Not in my experience they don't. I use OVH and nfoservers and I've had an
issue like this exactly once on both hosts.

On OVH one of my servers was hacked and running typical scripts that are run
once that happens (port checking, common admin credentials, brute force
attempts, etc)

They cut off all internet access to and from the server and sent me an alert
stating what was happening and that I needed to VNC into the server, resolve
the issue, and let them know how/why it happened and how I resolved the issue.
Once that was done they just removed all the blocks on the server and we all
went on our merry way.

Edit: To clarify the VNC console is on their site, not a remote connection.

~~~
busterarm
You said "not in my experience they don't" and then literally describe in
detail how they did exactly what I was saying they do.

~~~
rat9988
I guess he meant they don't lock your whole account but only stop a single
server.

~~~
figgis
I should have been clearer. This is exactly what I meant.

------
cptskippy
Let me see if I get this straight...

Developer has a Python script that takes 1 second per record to execute, and
he has 500,000 records to process, so he spins up 10 distinct VMs each running
the same Python script to parallelize the task.

The provider shuts him down and cites a section of the EULA that says "You
shall not take any action that imposes an unreasonable... load on our
infrastructure." Basically saying "Hey whatever you did, don't do that."

Developer gets his account restored and then proceeds to do exactly what he
did to get it locked out the first time around.

Also Developer has all his eggs in one basket.

Shitty customer/product service aside, someone explain to me why DigitalOcean
is at fault here?

~~~
aftbit
Why would DO be upset about spinning up 10 VMs then spinning them down again?
Isn't this exactly the point of cloud providers? This is what they bill me
for, right?

~~~
SteveNuts
Smaller VPS providers like Linode or DO oversubscribe like crazy. The last
time I used Linode, they would email us telling us we were using too much CPU
or memory, and that we'd need to move to a larger-tier VM.

~~~
Aaronn
I think you misunderstood those emails. They are just there to help you in
case you didn't realize some process was stuck or something; they specifically
say "This is not meant as a warning or a representation that you are misusing
your resources." You can also change the value that triggers those emails, or
disable them completely.

~~~
SteveNuts
No it was definitely a ticket from their support, telling us we were noisy
neighbors. They told us we needed to increase the size of our VMs.

------
rhacker
I am a software developer and yes software is eating the world. One of the
side effects of software eating the world is out-of-control software:

    * Autobans on Facebook
    * Cheated Instacart drivers
    * $10,000 stolen from thousands of bank accounts (and hopefully returned) on Etsy
    * Tesla cars literally killing people (it now feels like it's once a month, right?)

Now the software that runs software is running amok.

The interesting thing about software is that it runs very quickly and it acts
as a giant lever that affects the entire world.

You can think of it like a giant airport suddenly being installed in your
back yard, with planes just starting to take off, changing your $300k
investment into a $120k house overnight. That's how quickly software is
changing the real world.

I know there is at least one HN reader writing a book on it. But I would love
to see more books on how the internet and software are messing our world up.

~~~
karlmcguire
This reminds me of a recent talk by Jonathan Blow [1], where he talks about
how we've made very little progress in the field of software, and anything
that _appears_ to be progress is just software leveraging better hardware.

It's quite scary how low our standards have gotten.

[1]: [https://www.youtube.com/watch?v=pW-SOdj4Kkk](https://www.youtube.com/watch?v=pW-SOdj4Kkk)

------
Kpourdeilami
Not even surprised by the way DigitalOcean has handled this. They pulled
something similar on me back in 2014, at a previous company I worked at. They
essentially shut down my account and did not even let me get my backups out.

~~~
wvenable
It seems like there should be a middle ground between all-on and all-off. If
I'm a paying customer, I should be able to access my account in some capacity
even if some abuse-related issue closes off server access.

------
andrewstuart
Why is killing accounts part of the way they do business in any way, EVER?

That's what destroys company reputation.

I may be wrong but my understanding is that the gold standard - Amazon Web
Services - will only ever suspend your account until an issue is sorted out.

Whoever runs Digital Ocean needs to stand up and say very loud and clear to
this community that they will _never, ever_ delete accounts - if he doesn't
then he can live with the business destroying reputation of Digital Ocean
being "one of those account killing companies".

What company would ever host on "an account killer" - the risk is way too
high.

~~~
duskwuff
> Why is killing accounts part of the way they do business in any way, EVER?

Because fraud and abuse exist.

Sometimes customers really are doing malicious things which need to be stopped
immediately, and the only thing that will make that happen is disabling their
account. Trying to make accommodations for those users is a fast path to
getting sued, getting blacklisted by mail providers, and/or losing your
upstream connection entirely.

~~~
dcbadacd
Disabling != deleting.

------
docker_up
I thought Fortune 500 companies (at least the enterprise companies I dealt
with) had checklists for their SaaS vendors that required checking off things
like Disaster Recovery readiness, etc.

At the very least, this company learned a hard lesson about Disaster Recovery
best practices. Hopefully all the up-and-coming companies reading this story
learn it as well. Also, please remember that a backup that isn't tested IS NOT
A BACKUP! I've been in so many situations where backups were corrupted, so
part of disaster recovery is to test the backups and make sure you can really
recover.

There are a million different scenarios where their data could be lost
without it being Digital Ocean's fault. It's the company's responsibility to
have protected their customers from this.

I worked at a company with no real Disaster Recovery plan. I was told that "we
can get the servers up and running within 18 hrs if we had an outage", which
not only was absurdly slow but probably an underestimate. Only by the grace of
God did we not suffer a real outage but if we did, it was totally the VP's
fault for not addressing my concerns.

~~~
the8472
Such checklists exist, but not all services are of equal importance to the
customer, so they are not vetted in equal detail.

Often the supplier has to fill out the checklist themselves.

> do you have backups?

Manager: [X]

> are they offsite?

Dev: cloud provider docs say so

Manager: [X]

Customer: Great, you won the bid.

------
ApolloFortyNine
Not even giving the account a warning is honestly enough for me to not go
with DigitalOcean for even small projects in the future. Their prices are not
that good compared to fly-by-night VPSs if that's how they're going to act.

------
Hamuko
This reinforces my gut feeling that companies like DigitalOcean and Scaleway
are fine for hobby usage, but for any serious business operations, I'd go with
AWS.

~~~
ApolloFortyNine
Well, if anything, this case just made that abundantly clear. Locking a
customer's resources without warning, and not allowing access to their data,
is unacceptable, I'd argue even for hobby services. There are a lot of VPS
providers cheaper than DigitalOcean; I don't believe they have room to act
like this.

------
ziddoap
An expensive lesson learned in redundancy.

It's a shame to see things play out this way, but sometimes a lesson is taught
in a brutal fashion.

I imagine the author will be more careful in the future regarding off-site
backups, additional technical partners, contingency plans, etc.

~~~
Hamuko
There's a pretty hard cap on the level of redundancy you can do with a two-man
company, as I assume a two-man company does not bring in a lot of money.

~~~
ziddoap
The number of employees shouldn't be the deciding factor when you are a tech
company that apparently has Fortune 500 companies as customers.

I'm not talking 5 9's of redundancy. I'm talking grab a backup once a week or
something, anything, to help mitigate a scenario like this. According to the
thread, they lost ~1 year of data. That should be unthinkable for a company
serving customers, let alone Fortune 500 customers.

Disaster recovery planning is key for a technical company to succeed. It is
clear they never considered a scenario where their DO account would be
closed/compromised/down.

~~~
Hamuko
>It is clear they never considered a scenario where their DO account would be
closed/compromised/down.

I don't think the chance of DigitalOcean automatically freezing your account
to a point where only a co-founder can do something about it has been well
publicised.

~~~
ziddoap
In all practicality, DO freezing your account has the same effect as DO being
down (or closing, etc.), or your account being compromised and you being
locked out of it.

A contingency plan should ideally have been in place for a scenario where,
regardless of root cause, you have lost access to your DO account.

~~~
GordonS
> In all practicality, DO freezing your account has the same effect of DO
> being down (or closing, etc.)

Given their size, that is _extremely_ unlikely to happen without warning.

~~~
ziddoap
Sure, but the chance of them closing, combined with the chance of them
freezing your account (feasible, considering the topic here), the chance of
account compromise, and the chance they go down for extended maintenance... It
is inexcusable not to have a disaster recovery plan for the scenario where you
cannot access your DO account.

Imagine you are a customer of this company. Would you be rallying to their
defense, "backups aren't needed because the scenarios are unlikely", or would
you be angry that the company had zero contingency planning and lost all of
your data (or the data you rely upon)?

If you can honestly say, as a (hypothetical) customer of the company in the
thread, that you wouldn't care if a company you relied upon has no disaster
recovery planning, more power to you. I, however, like to make sure that the
companies I'm relying on have some sort of contingency that protects me as a
customer.

------
dahfizz
I think a real takeaway here is to avoid going all in on one hosting provider.
You should always have "off-site" backups for mission critical data.

Yes, it sucks that DO did this. But this is hardly the first time someone got
screwed over by some poor AI automated security. Backing up your data to
backblaze or AWS would be the cheapest insurance policy you could buy.

------
LinuxBender
I know this doesn't help you now, but you may want to consider distributing
your site or setting up DR sites across multiple VPS providers. If your
application supports it, you may even want to consider using a DNS provider
that can do health checks and fail over the site for you.

------
bitL
Heh, that reminds me of when a director of one unit of a "world's top 10
brand" I was working at on big data architecture told me that they always
knew when I ran anything on the cloud, as I brought it down within a few
seconds with my heavily multi-threaded processing scripts. I guess
DigitalOcean decided that instead of fixing their infrastructure they'd just
kill their own smart clients.

I find this enforced mediocrity pretty appalling. With barely functional
"anomaly detection" deep learning models with dubious decision making (I've
built some, so I'm familiar with the "landscape"), it's going to be a lot of
fun for anything deviating slightly from whatever vague norm, which can't be
explained or tested against.

------
perrygeo
> Every 2-3 months we had to execute a python script that takes 1s on all our
> data (500k rows), to make it faster we execute it in parallel on multiple
> droplets ~10 that we set up only for this pipeline and shut down once it’s
> done.

Wait, a whole distributed computing sub-system to make a 1s process faster?

> I got their final message right after arriving in Portugal.

Did he initiate the script from an IP address outside France?

Though that might explain why DO's fraud detection was triggered, it doesn't
excuse their actions. Send an email first, jeez.

~~~
dkempner
> Wait, a whole distributed computing sub-system to make a 1s process faster?

I think OP means 1s per row, so 138 hours.
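
The arithmetic, as a quick sanity check:

    rows = 500_000      # records to process
    secs_per_row = 1    # stated processing time per record
    droplets = 10       # VMs used for the run

    total_hours = rows * secs_per_row / 3600
    print(round(total_hours, 1))             # ~138.9 hours on one machine
    print(round(total_hours / droplets, 1))  # ~13.9 hours across 10 droplets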

------
blacksmith_tb
I have happily used DO for small things, like hosting a Ghost blog, or running
an Algo VPN, but I am a little surprised to see people doing bigger
infrastructure with them - not that they deserve to lose everything for making
that choice, but it seems like it would have been clearly riskier than
AWS/GC/Azure?

~~~
tracker1
I'm surprised that people put significantly more faith in GCP, AWS, Azure,
etc.; similar things have happened in lots of scenarios. Not to mention when
registrars have taken down sites.

This is a shame, and imho it sucks a lot. One of my biggest points of paranoia
is backups and scripting the restoration of a site on another provider should
the worst happen.

I'm getting ready to launch something barely more than a hobby and was
planning on DO, because the hosted Postgres and a small K8s cluster are
significantly cheaper there than the alternatives. Frankly, I don't want to go
from ~$100/month to over $200 with another provider for something that likely
won't lead anywhere.

Goosfrabah... goosfrabah...

------
coldtea
> _We lost everything, our servers, and more importantly 1 year of database
> backups. We now have to explain to our clients, Fortune 500 companies why we
> can’t restore their account._

I think what you have to explain is why there wasn't a contingency plan, with
your own servers, colocation, another cloud offering, etc...

~~~
PostPost
When AWS has gone down in the past, it's severely impacted massive tech
companies like Netflix and Spotify.

Why would there be an expectation that a 2-man shop have "another cloud
offering" as a contingency plan when some of the biggest and best tech
companies do not?

People use services like AWS or DO because they _are_ the contingency plan -
they have the size and scale that smaller companies cannot afford or
implement.

~~~
prophesi
The difference is that when AWS goes down, Netflix/Spotify still have backups
and could adapt infrastructure if the outage involved permanent data-loss.
You're talking about the people who built
[https://github.com/Netflix/chaosmonkey](https://github.com/Netflix/chaosmonkey)

I'd argue that it should be _easier_ for a 2-man company to adapt to cloud
service outages, as they likely don't have to keep up with nearly as many
backups or moving parts.

~~~
Dylan16807
So pretend they had offsite backups. That's a separate issue from an entire
contingency plan. The ability to adapt is not the same thing. This company
could certainly adapt to a new host if they had an extra backup.

~~~
prophesi
The ability to adapt is the definition of a contingency plan. It's
essentially, "If this person/service/database/customer/etc vanished off the
face of the earth, what do?"

~~~
Dylan16807
Okay, then that means they _did_ have a contingency plan, except for a single
rsync.

Which would mean you disagree strongly with coldtea?

~~~
prophesi
I'm really not sure what your argument is.

Their entire business was completely reliant on DO droplets. It doesn't take
much foresight to think, "hey, I should probably make a backup in case this
VPS goes down."

Nothing in this comment thread, or the OP twitter thread mentions anything
about the rest of this imaginary contingency plan of theirs.

~~~
Dylan16807
Coldtea said they need to explain their lack of contingency plan wrt "servers,
colocation, another cloud offering, etc...".

PostPost said they didn't, that even huge companies don't have contingency
plans.

I agree with PostPost, and I'm trying to figure out which one you agree with.

If you define being able to adapt as a contingency plan, well, I have
confidence that this company is _fully_ able to adapt! Their architecture is
small and pretty easy to move. The only problem is a lack of external backup,
which will be remedied very soon, and once that happens they could easily
shift to another service even if DO re-disabled their account.

So that would mean you agree with PostPost. But you don't seem to agree at
all.

I'm struggling to reconcile "The ability to adapt is the definition of a
contingency plan." and "this imaginary contingency plan of theirs". If you
demand a preexisting written plan then that means you're not accepting "the
ability to adapt" as a valid answer at all.

------
ginkgotree
Wow. Well, I will avoid DO at all costs for the rest of my career. There is a
reason enterprises trust AWS, GCP, and Azure, and pay for premium support (as
startups should, to the best of their financial abilities). He needs to sue or
negotiate to recover the material damages caused by this.

~~~
dkersten
While this incident certainly puts me off DO too, the GCP horror stories are
equally bad.

------
jlv2
Best Twitter comment (grammar errors and all): "And if your full business
relies on one tech partner (no offsite backups) your not doing your tech job
right."

~~~
beejiu
You'd be surprised how many companies are "all in" on AWS or Google Cloud,
including ALL backups.

~~~
mkhattab
I agree. Who are all these people that pull backups, which may be GBs or TBs
in size, for offline storage? How does that even work in practical terms in
disaster scenarios like this, where resolution times are expected within
hours, not days?

~~~
jjeaff
It doesn't. I would hazard a guess that there are zero medium to large
companies out there right now that could swap to a new cloud provider in a few
hours.

------
tgsovlerkhgsel
The timeline is interesting in itself: this story was posted 4 hours ago on
Twitter, 3 hours ago on HN.

The cofounder picked it up 3 hours ago. DO responded and apologized from the
official account 2 hours ago, claiming it was fixed, and has been actively
responding to people tweeting at them, doing damage control, for about an
hour, promising a public postmortem.

While it's sad that a social media escalation was needed (and it confirms that
getting attention on social media is the only effective way to resolve hard
issues like this), the response after that was quite fast. Let's see how well
and how quickly they deliver on the postmortem.

------
nhooyr
A few years ago I used to run a VPS on DO with a mail server, VPN and some
code I was writing. Once I was done with everything, I used their snapshots
feature to backup my VPS and shut it down.

Two years later I wanted to restore the VPS, but it turned out my snapshot had
become "outdated" and they stopped supporting the format for restoration...
Support was completely useless and wouldn't even let me download the snapshot;
they said at most they could mount it into a new VPS and I could recover the
data myself.

Very unprofessional.

~~~
coryrc
That seems reasonable. Booting a years-old, security-bug-ridden image seems
dangerous, so mounting it into a new VPS so you can copy out the important
files seems an entirely reasonable access method.

~~~
nhooyr
Not sure what you’re talking about? Why would the image be bug ridden?

------
wpietri
Does anybody know what they actually do?

I dug around to find a Wayback version that actually had text. I understand
the words on their own, but when put together I get nothing:
[https://web.archive.org/web/20181030015237/https://raisup.co...](https://web.archive.org/web/20181030015237/https://raisup.com/en)

~~~
ndiscussion
Typo on the front page; how you do one thing is how you do everything.

>powerfull

~~~
jjeaff
Ya ya, and sometimes a small typo just slips through.

------
machbio
From the twitter thread - seems like the problem was resolved.

~~~
nwsm
How's that? The last tweet from him I see is asking them for their data still.

[https://twitter.com/w3Nicolas/status/1134529379701334017](https://twitter.com/w3Nicolas/status/1134529379701334017)

~~~
ookblah
that last response from them is pretty damning. it's like what happens when
your customer success team turns off their brain and just applies blanket
rules to everything.

yikes, just when i thought their kubernetes and managed db's were looking
attractive...

~~~
zbobet2012
They are; just don't rely on _one_ service. You need to be cloud-agnostic,
and this is why.

~~~
scarejunba
Strong anti-recommendation for this advice. The cost in dev time plus keeping
things running is not worth it. This is a rare event. Just use a
better-established cloud service.

~~~
wpietri
That's generally good advice, but I think it's important to keep backups and
other disaster-recovery stuff somewhere else. Heck, just buy a NAS box and
sync things nightly.

------
grayed-down
We do a fair amount of business with DO to the tune of about 40+ droplets used
together and separately for various tasks.

While we could certainly survive the loss of these assets, the recovery would
be long and costly.

So I would certainly say that this story gives me a great deal of pause and
will take up some mental space this weekend as I think about future dealings
with DO.

------
drugme
_We lost everything, our servers, and more importantly 1 year of database
backups. We now have to explain to our clients, Fortune 500 companies why we
can’t restore their account._

And yet, the explanation is very simple:

Because you neglected basic principles and elected to put all of your backup
eggs in one basket.

------
elagost
It's always sad to see these stories because they always seem like a
preventable tragedy. DigitalOcean, Azure, and AWS seem like the go-to for
start-ups these days, instead of self-hosting your stuff at home or even in a
colocation space. Even though it's a "dirty word" these days, on-prem does
have its huge benefits.

Professional stuff is one thing, but that's not to mention anything personal -
anything I care about I won't put exclusively on someone else's computer. I
want to have absolute control over as much of my stack as I can. Seems really
scary that some company has control of your entire infrastructure and can ban-
hammer you without notice, permanently, at any time of day or night (or while
you're on vacation).

------
amelius
What other business shuts their customers out if they use too much of their
product?

~~~
detaro
Any that suspects fraud and doesn't want to be on the hook for the inevitable
chargeback.

~~~
taffer
Caps or quotas are a better way to achieve this IMO. Random shutdowns do not
make it sound as if DO is ready for production.

~~~
harryh
It's not just the volume of usage that can indicate fraud, but the pattern.
In addition, relying on quotas creates a system that is easily gamed by
perpetrators of fraud.

Caps or quotas are not sufficient to deal with this problem.

~~~
taffer
If a cloud service is supposed to be ready for production, then customers
should be safe to assume that they will not simply be shut down, especially
not without warning. Otherwise, the provider must make clear that the service
is only for hobby use and not for commercial use.

~~~
harryh
Every single major cloud service provider will shut you down without notice if
they detect obvious fraud.

~~~
taffer
What kind of fraud do you have in mind and do you know of a case in which, for
example, Azure switched off an enterprise customer due to unusual usage
patterns?

------
bsg75
The saying "2 is one, and 1 is none" applies to hosting providers as well.

------
pkrefta
The quality of DigitalOcean's communications in this case - especially the
email responses - looks like the worst examples from some pretty terrible
corporations. I thought DO was (is?) trying to be different from them.

------
vsl
I was evaluating DO for the past two months, but reading this, I'm staying at
Linode - their poor security incident handling in the past is a theoretical
concern; this is a more immediate one.

It's not this anecdote in itself, but that it corroborates my experience
during trial that I ignored and dismissed as support incompetence (which
should have been a warning sign in itself). After setting up the account,
adding a payment card, I wasn't able to enter our VAT ID as part of billing
details, with some nondescript error. So I asked support.

Two days(!) later, they responded by asking for _incorporation documents_,
which was frankly bizarre (and a first in ~10 years of running a business):
they're not exactly a bank with KYC requirements. When I responded with,
basically, "WTF?", and told them to check the billing data in VIES, they
eventually fixed it.

But what I got from it was a distinct impression that their default assumption
is that the customer is trying to defraud them, even when it makes no sense.
To this day, I have no idea what kind of fraud they could _possibly_ be
anticipating there (they allowed the card).

This story is on the same general subject, and so are others surfaced here and
on twitter in reaction: the customer is presumed scumbag.

------
unethical_ban
I see a lot of people saying "They should have had more than one provider" and
"They should have had better backups".

What I'm saying is: If you ARE going to put your business at the mercy of one
company from top to bottom, would it not be wise to try to get some kind of
account rep? Or have some sort of communication with the company as to the
nature of your operation?

And if that isn't an option, should you do business with them?

------
dmh2000
Question: so what do you do besides host it yourself? Do you set up a backup
on a different cloud service so that at least you can fail over without too
much downtime? You then have to pay for at least some of the resources even if
there is no traffic. Or back up locally, but have a process set up with the
other service so you can provision and get back up quickly? Or is there a
better solution?

~~~
ramraj07
At least have a plan (which you test out once) for how you will migrate to
another provider in case your current one screws you like this. All you then
need to do continuously is dump a backup copy of your data into the other
provider's storage.

------
reaperducer
Is it odd that they didn't keep backups elsewhere?

I don't ever back up to the same service that I use for production. Or is that
just me being paranoid?

~~~
detaro
> _Or is that just me being paranoid?_

Clearly not.

------
throwaway9870
Fortune 500 customers with no off-site DR backup? Hm, not really a good plan.

------
vbezhenar
It was reckless of them to trust a single company with backups worth that
much money. I don't trust Google and Amazon with backups of my wedding video
that nobody cares about, so I'm using my home NAS with RAID as a third
additional storage (along with Glacier and Coldline). And they just trusted
DigitalOcean with all their data and assets? Crazy. Don't put all your eggs in
one basket. Register your domain with one company, host DNS with another, host
your servers with a third, keep your backups in multiple places owned by
multiple companies, test your backups periodically, and have a migration plan
in case one company decides to ban you.

That's another reason I'm very skeptical of Amazon's and Google's proprietary
API offerings. Renting virtual servers, no problem; you can rent those servers
from thousands of hosts. But if you're using their proprietary APIs, you will
have to rewrite your software to migrate, and you will have a very tight time
frame.

------
kristopolous
This is why I always use many providers. I'm talking about $10/month or so
packages, so this is pretty cheap. They're also not all on the same card, in
case one gets locked.

I use 4 for my current company and have redundancy spread over them so that
if, say, AWS goes down or I get an account locked or whatever, nothing is
lost, things continue to be operational, slight degradation happens and that's
it.

This really isn't that hard to set up. Under a day or so and then just do
stuff in ssh config and the shell rc to act as helpers so you remember how to
do things.

It's super cheap, pretty easy, and robust.

It's awful what happened to this guy, but it's kinda like the person who backs
up to the same hard drive as the originals. Awful to lose stuff, it shouldn't
happen, but also don't do that.

------
emdowling
While DO is not without fault in this story, any Fortune 500 company would
have to be pretty stupid to work with this company after they admit to not
storing backups in other places. I feel for you, but a backup is not a backup
if it’s subject to the same weaknesses as the master copy.

~~~
unixhero
True. Any sane Fortune 500 should do its supplier due diligence before
signing over business-critical functionality to a third party.

------
MrStonedOne
I see no need for DO to shut down accounts and active, long-established nodes
for a fraud check. Disable the offending node, disable making new ones, and
limit editing of existing ones to off/on controls.

Regardless of this example of a false positive, locking whole accounts over
that is unwise.

------
bww
I can understand why a cloud platform would proactively disable workloads that
appear to be acting dangerously. At the same time it seems quite unreasonable
that there's no mechanism for a customer to get their data back when their
account is unilaterally shut down.

------
yardstick
Are there any best practices I should follow when using AWS, GCP, Azure, DO,
and others to avoid these sorts of situations?

I have heard, although I can't confirm, that using on-account billing rather
than a credit card makes them less likely to just disable your account over
billing issues. Things like that would be helpful to know. Should I be letting
them know more contact info about us, asking for an account manager, etc.?

Does it make a difference in how they react to you if you are spending low
thousands a month vs tens of thousands a month vs hundreds of thousands etc?
Or is everything always automated to death?

~~~
aembleton
Make backups to a different cloud provider, or onto your own infrastructure.

------
mountainofdeath
You know, there is usually an easy way to mitigate this. When you create an
account, be sure to supply business information, e.g. an EIN in the USA, and
see if you can do invoice billing. Once you are set up as a business, most
providers assume you know what you are doing.

These sorts of things tend to slip through the cracks. If you are operating in
a business capacity, make sure you treat everything like any other business
would.

Even as a two-man company, you can't afford the cost and potential liability
of not operating as a registered business. It also shows that this is a
serious business and not a hobby.

~~~
jaden
> _Once you get set-up as a business, most providers assume you know what you
> are doing._

Is there any evidence for this? For all we know his DO account was set up as a
business.

------
garlandcrow
The same thing happened to me 6 months ago as well; actually, I seem to read
about this online every month or so. No response from them for weeks; without
warning they locked and deleted all instances, data, backups, everything. It
wasn't until I posted on HN that I got a response. They said it was a mistake
and apologized, but... yea... whoops, we deleted everything and killed your
startup, want a free month of service? So no matter how slick the UI, I will
go out of my way to never use this trigger-happy company that is killing
people's startups on a regular basis.

------
m-p-3
They literally put all their eggs in the same basket by having the backups
and the production environment at the same provider.

Not saying this shouldn't have happened, but hopefully they have learned from
the experience.

------
elcritch
> After sending multiple emails and DM on Twitter they unlocked our account,
> we got 12h of downtime and got a nice

Wait, their entire business was effectively shut down and all they did was
send an email? Granted, DO's handling of a possible abuse situation is
shocking, but allowing your business to go down for 12 hours without trying to
call any and every actual human being at DO seems negligent on their part.
While us technical types love interfacing via digital means, some situations
benefit from actually talking to a live human.

~~~
megaremote
This is a 2-person startup, and one of them was en route to Portugal on
holiday.

------
neop1x
Maybe they should have their own physical server in a datacenter. In addition
to more flexibility, it would also be cheaper. Cloud providers try to convince
people that using them is easier, but in the end, if your cloud footprint
grows a lot, you are still going to need a team dedicated to managing it. They
try to convince you it is cheaper too, but I can easily get 32GB of RAM or far
more on a single node at a fraction of the cost of the comparable virtual
offering (if they even offer such big VMs).

~~~
stupidcar
It won't be easier, and it won't be cheaper. When you're a small team trying
to build a business, there are a lot of business functions where you won't
personally have the expertise or the time to do it yourself efficiently, and
your requirements won't be large enough to justify hiring somebody full-time.

In the case of these business functions, the standard and correct approach is
to outsource them to a 3rd-party service provider. You do this with
accountancy, legal representation, facilities, office management, recruitment,
etc. If you try to bring all these things in-house from the get-go you'll
never get around to building a product, and it's financially and logistically
sensible to do it with IT as well. This calculation may change over time as
your business grows, but if you can't comprehend that the correct strategy for
a fledgling business may not be the same as an established one, then you're
simply not suited to run a business in the first place.

Of course, in every case, you're taking a calculated risk by relying on a
service provider: They may go bust, they may be incompetent or malicious, they
might ramp up their prices. Your job as the manager of a company is to accept
and manage these risks as best you can. Risks cannot be eliminated, only
managed, and attempting to eliminate them is a fool's errand. If things do go
wrong,
you'll always have people lining up to tell you how you _could_ have avoided
this problem, usually by pointing to a decision that can only be justified
with the benefit of hindsight. You should ignore these people. The only
question is:
did you make the correct decision at the time, based on the facts to hand?

------
paulcarroty
I used DO ~2 years ago; the storage was very slow and felt like cheap
Seagates. So I just switched to AWS.

I also had trouble with the trial: activating a $5-10 coupon was nearly
impossible without a PayPal payment.

------
rocky1138
The old saying still stands: There is no such thing as the cloud, there is
only someone else's server.

Hopefully everything gets righted and these folks can start making solid,
multi-site backups.

------
jasonlotito
This is not the first time DO has done things like this. I would not trust DO
with anything critical, and have advised people in the past to not use DO.
Nothing here surprises me.

------
nine_k
Something that made me scratch my head:

* The company quickly creates and starts 10 VMs and triggers an automated lock-down, with a message mentioning a sudden spike of activity.

* The company gets the account unlocked.

* The company _again_ quickly creates and starts 10 VMs, and triggers the auto-lockdown again.

Note to self: when something damaging happens as a result of a seemingly
normal action, avoid doing that same seemingly normal action immediately
again, lest the damaging consequences hit again.

------
morpheuskafka
Who is good about not locking accounts or taking similar actions? Apple and
Google are both notorious for blocking things for no reason; are AWS or Azure
any better?

~~~
keypusher
They are all going to have some form of automated protection against malicious
activity, but I suspect AWS and Google’s algorithms are better than the
others. My experience with AWS and Google in general is that your treatment
varies with your support plan. With business or enterprise level, you have
dedicated resources within the company that are going to be aware of such
issues or can escalate and sort it out quickly. I understand not wanting to
shell out the base cost for enterprise if you are a small company on a budget,
but paying at least for business support is a good idea if you are actually
running a business. I have never actually heard of this happening to such a
customer though, so perhaps they have extra processes in place?

------
pushpop
Not excusing DO here but there’s also a lesson to be learned about backups:
make sure they’re kept off site in case the whole site itself becomes
compromised. Usually when we say this we mean in terms of a physical disaster
(eg fire) but in this case it’s also in case of a logical disaster (the
account getting shut down).

So the lesson is: don’t leave your backups with the same cloud provider that
hosts your database. You should pull local copies as well.
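
For illustration, a minimal sketch of that kind of nightly job in Python,
assuming a Postgres database and an S3-compatible bucket at a second provider
(the bucket name, endpoint, and paths are made up):

    # Dump the database and push the dump to an S3-compatible bucket
    # at a *different* provider than the one hosting the database.
    import datetime
    import subprocess
    import boto3

    stamp = datetime.date.today().isoformat()
    dump_path = f"/tmp/db-{stamp}.sql.gz"

    # pg_dump straight into gzip so the dump never sits uncompressed
    subprocess.run(f"pg_dump mydb | gzip > {dump_path}",
                   shell=True, check=True)

    s3 = boto3.client("s3",
                      endpoint_url="https://s3.other-provider.example.com")
    s3.upload_file(dump_path, "offsite-backups", f"mydb/db-{stamp}.sql.gz")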

------
charlus
This isn't reassuring to read at all. I'm trying to build, and still relying
heavily on DO, for what is quickly becoming thousands in profit. Frankly, I'm
very uninterested in having to resolve a hosting problem - mostly because
it's boring, but also because DO seems nice to use in every other way.

On the other hand - I'm becoming increasingly aware of the inevitable Twitter-
social-media-pile-on for any company ever.

------
mark_l_watson
I am sorry they had this problem. Stories like this about edge cases with
cloud deployments strengthen my conviction to always: 1) have a data backup
plan that uses a different cloud provider, unless you have so much data that
you can’t afford the extra bandwidth and storage costs; 2) if possible, don’t
rely on unique cloud platform services, and be ready to bring a system up
quickly on another service provider.

------
SQL2219
How about an emergency support button? Clicking it would cost you $1,000, but
would immediately put you in contact with a human in escalated support mode.

------
superasn
I think an important thing you can learn from this story is that you should
keep your backups on a different host (or hosts), or better yet have
replication enabled.

These days, most apps can generally be migrated to a new host in seconds as
long as you have the data source alive.

If they had access to their data, they probably would have been able to spin
up a similar EC2 instance in minutes and say goodbye to DO forever.
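
As a hypothetical sketch of how small the restore side can be, assuming the
dumps already sit in an S3-compatible bucket at another provider (all names
here are made up):

    # On a fresh box at another provider: fetch the newest offsite dump
    # and load it into a local database.
    import subprocess
    import boto3

    s3 = boto3.client("s3",
                      endpoint_url="https://s3.other-provider.example.com")
    objs = s3.list_objects_v2(Bucket="offsite-backups",
                              Prefix="mydb/")["Contents"]
    newest = max(objs, key=lambda o: o["LastModified"])["Key"]

    s3.download_file("offsite-backups", newest, "/tmp/restore.sql.gz")
    subprocess.run("gunzip -c /tmp/restore.sql.gz | psql mydb",
                   shell=True, check=True)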

------
luhn
It's easy to think your data is safe when your cloud provider advertises 11
9's of durability. But three replicas of your data doesn't protect against
things like this. Even in the cloud age, offsite backups are important:
Different provider, different region, different payment method.

Unfortunately it doesn't help that cloud egress bandwidth is criminally
expensive.

------
kissgyorgy
I really liked [https://hyper.sh](https://hyper.sh), who provided hosting for
running Docker containers. They handled this same case by having a limit on
every account by default. When I requested an increase to that limit, they
asked about my workload, I explained it briefly, they allowed the increase,
and all was good.

------
oedmarap
I agree with the other comments about not having backups in the same place,
and ensuring that you distribute your assets (domain, DNS, compute, backups,
etc.) across as many providers as possible.

One thing I will add is that, especially for a small shop or project, you
should assume from the get-go that by renting infrastructure from DO (or any
provider), user-hostile actions can and will be taken over any issue
regarding TOS violations, even ones you are unaware of.

This assumption helps to build redundancy in your mindset. Have a production
website or app in DO for example? Droplet backup, periodic snapshots, B2
server backup, S3 tar backups, containerize apps if possible, have equally
provisioned (smaller, idle VMs) infra on another DO account or another
provider if possible, and so on. I know this is overkill but paranoid
sysadmins/devops are always rewarded.
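
For the periodic-snapshot piece, a hedged sketch against DigitalOcean's
documented droplet-actions API endpoint (the token and droplet id here are
placeholders):

    # Trigger a droplet snapshot via the public API:
    # POST /v2/droplets/{id}/actions with type "snapshot".
    import datetime
    import os
    import requests

    droplet_id = 12345678  # placeholder
    resp = requests.post(
        f"https://api.digitalocean.com/v2/droplets/{droplet_id}/actions",
        headers={"Authorization": f"Bearer {os.environ['DO_TOKEN']}"},
        json={"type": "snapshot",
              "name": f"nightly-{datetime.date.today().isoformat()}"},
    )
    resp.raise_for_status()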

Just to add some context for DO specifically, they're a great provider in my
anecdotal experience and they are constantly rolling out services aimed at
medium to large scale workloads, such as managed databases and k8s.

That being said, it's entirely possible to transfer snapshots [0] to another
DigitalOcean user account or teams account. So at the very least, create an
entirely new DO account just for holding snapshots, outside of the native
droplet backups and third-party backups you're doing on an application level.

[0] [https://www.digitalocean.com/docs/images/snapshots/how-
to/ch...](https://www.digitalocean.com/docs/images/snapshots/how-to/change-
owners/#transfer-a-snapshot)

------
mettamage
Out of curiosity, does anyone have some resources on building a homegrown
DO/AWS-style server space?

Say I have a friend on every continent (including Antarctica for hypothetical
fun), and they are all willing to allocate some square meters to plunk down
some servers for whatever is needed.

How would I go about building this small(ish) infrastructure myself?

------
simonebrunozzi
If you run mission-critical apps for F500 companies, you HAVE to have a
backup/DR policy in place, using a different IT provider than your main one
(and/or a different infrastructure site).

DO in my experience is not the greatest company to work with; but at the same
time, your "incompetence" killed your company, not DO.

------
INTPenis
As someone who has spent the last 8 years in big enterprise, it amuses me
slightly that small businesses are discovering this now, because where I work
our customers often have contracts that make us accountable for every single
hour of downtime, requiring us to pay customers penalty fees if their service
level is affected.

------
a2tech
You cannot rely on one company. Let me repeat that. You. Can. Not. Rely. On.
One. Company. If you do, you’re negligent and you deserve what you get. You
need your backups to be good and to be with a second provider. This is NOT
rocket science. Treat your company seriously if you expect to sell services.

------
vincedotca
Cloud Contingency When The Ban Hammer Drops, or, restic for the masses...
don't be caught without an offsite backup!

[https://revenni.com/cloud-contingency-when-the-ban-hammer-
dr...](https://revenni.com/cloud-contingency-when-the-ban-hammer-drops/)

------
csomar
I think, going forward, a data backup is not going to be enough. You’ll need a
full devops plan B where you just redeploy to another hosting provider.

Even better: run your infrastructure across multiple hosting providers with
something like Consul. That way a DO failure might mean slower service, but
not a death sentence.

~~~
jacobsenscott
A corollary is keep your infrastructure simple, and know how it works, so you
can stand it up anywhere. Web server/db/memcached can run anywhere easily and
is enough for almost every company out there. Are you sure you need {cool new
service}? Are you really really sure? Your database can probably do it.

------
sidlls
DO didn't handle this well, but a company that wants to serve "Fortune 500"
customers ought to have a more mature process for handling an outage like
this. The fact that they didn't makes it hard to view them as a serious,
credible business.

------
kangnkodos
The article should be called "I killed my company by having all my backups in
one spot."

------
zupreme
My reply (pasted from original tweet)

This is why you need, at minimum, a nightly mirror. Ideally you stack on top
of that a load-balancing device or service to redirect traffic in the event of
an outage from your primary farm.

Never go on holiday again until you have backups and failover.

#idothisforaliving

------
chj
Over the years, I have moved most of my VPSes to Vultr, and now have only 1
droplet left in DO. This story again confirms my mistrust of them. I don't
like AWS because it's too complicated for my use, but now I think we'd better
have some backups.

------
scarecrow112
Jeez this is scary! This very much sounds like the domain blocking incident
[1] that Zoho faced sometime back.

[1] -
[https://news.ycombinator.com/item?id=18059792](https://news.ycombinator.com/item?id=18059792)

------
beagle3
Amazon recently paid $200M or so for CloudEndure - I don’t know if they
support DO, but they do let you easily move between and among cloud providers,
which is something every cloud-based company must plan for.

------
NoblePublius
Sounds like you killed your company by not keeping a local physical backup.

------
ksec
When will the wheel turn again, and we learn it is best to have at least one
copy of your data in your own hands? Not in your cloud VM or a server rented
somewhere, but on an HDD on your own premises.

------
xvector
Ran a Minecraft server in college via DigitalOcean. Was my first experience
with a VPS. They locked it down after looking at the console logs and deciding
it wasn’t “educational” enough for them.

~~~
corobo
Were you using credit from something education related?

------
ericax
We had a similar incident with DO 8 days ago. It didn't kill our company, but
we got hit hard.

Our business is Dynalist, an online outliner app. Many of our users store all
their notes on Dynalist, so uptime is really important.

Starting at 7 PM last Tuesday, we saw a slowdown in request handling. We filed a
ticket with DO 2 hours after that (we also posted our initial tweet to keep
our users informed:
[https://twitter.com/DynalistHQ/status/1131087411797270529](https://twitter.com/DynalistHQ/status/1131087411797270529)).

A few hours later, we started to experience full downtime. Still no reply from
DO. We filed another ticket with the prefix "[URGENT]". Still no reply.

We waited for 24 hours for their reply. We took turns taking naps because
we're only a 2-person team.

After 24 hours, we tweeted @ DO
([https://twitter.com/DynalistHQ/status/1131397013306847232](https://twitter.com/DynalistHQ/status/1131397013306847232)).
2 hours later we finally got a support person working on our ticket. We didn't
want to take it to social media, but there didn't seem to be any other way at
that point. DO doesn't have phone support, and us "bumping" our support ticket
didn't work either.

After 2 hours going back and forth on the support ticket and providing logs,
DO's support person identified the issue and offered to move us to a less
crowded server. They asked us what would be a good time to do a manual
migration if a live migration failed, and we replied immediately that whenever
was fine (we were experiencing downtime anyway).

We thought it was over, but we were so wrong.

They didn't reply for another 4 hours. That was 4 more hours of downtime.
Sometimes CPU steal dropped a bit and our server could catch up on some
requests, although it would still take 10 seconds for our users to open
Dynalist. But most of the time, our web app was totally inaccessible. Watching
the charts on our dashboard go up and down felt like some of the hardest hours
of my life... mainly because there was nothing we could do.

4 hours in, I realized we had to post another angry tweet to get a solution.
There was nothing else to do other than try to stay awake anyway. So I posted
another tweet:
[https://twitter.com/DynalistHQ/status/1131497962184564737](https://twitter.com/DynalistHQ/status/1131497962184564737)

This tweet didn't seem to work. Nothing happened in the next 3.5 hours and
things started to feel surreal. I didn't know how much longer the downtime was
going to last, and I didn't know what we were going to do about it.

At that time, it was 9:30 AM EDT and people were starting their day. We were
getting more and more emails and tweets asking what was going on and where
their notes were. A few customers were angry, but most were understanding and
supportive.

At 9:55 AM EDT, DO finally did the live migration a few minutes before the
time limit we gave them, which was 10 AM. That was the end of the incident;
CPU steal was down to < 1% and Dynalist was finally up again.

However, we couldn't trust DO any more. This weekend we're migrating to a
dedicated server provider which has phone and live chat support. DO is pretty
good for spinning up a $5 box quickly to test something, but we learned the
hard way we shouldn't rely on it.

Our postmortem post: [https://talk.dynalist.io/t/2019-05-22-dynalist-outage-
post-m...](https://talk.dynalist.io/t/2019-05-22-dynalist-outage-post-
mortem/4872)

------
juskrey
/writes down the note/ "never forget to replicate elementary backup
infrastructure on a second provider"

PS: Warheads have THREE copies of navigation systems

~~~
corobo
Calling my upcoming linode-digitalocean-vultr setup Project Warhead, thank you

------
proyb
I’m fortunate to use UpCloud, a provider whose regional/dedicated technical
support responds within minutes and resolves issues quite quickly.

------
Epopeehief54
This is why I never rely on just one provider. Brutal to have this happen to
your company. I can’t imagine the heart-sink feeling you had with this.

------
sergiotapia
DigitalOcean better respond, expeditiously (lol TI) or I'll never use them for
anything ever again. Extremely dangerous behavior.

------
EGreg
This is very sad, and why we all need to make the Web decentralized and own
our own data. Here, several Fortune 500 companies were relying on a two person
team, which itself was relying on a hosting provider with full discretion to
shut it all down. What could possibly go wrong?

[https://qbix.com/platform](https://qbix.com/platform) is one of many projects
working to tackle this. Tim Berners-Lee’s SOLID project and others are, also.

------
cavisne
Bookmarking this for next time there's a HN comment "why is Netflix using AWS,
they should just get some cheap VPS's from DO"

The rise of the public cloud is pretty fortunate for companies like DO: they
can make the same assumptions that legacy VPS companies did (most VMs sit
idle, so oversell them massively; most customers will create a single VM and
nothing else) while branding themselves as having the same strong
infrastructure as AWS.

~~~
freehunter
Has anyone ever actually said Netflix should switch from AWS to "some cheap
VPS's from DO"? AWS offers a _lot_ that DO doesn't. They're hardly even
comparable.

------
RootKitBeerCat
The way Digital Ocean handled this, I am willing to bet the OP never has
trouble with his DO relationship again.

------
55555
I don't have a Twitter account but maybe I'll get one. It seems like a cheat
code to getting real support in 2019.

------
warp_factor
I would love to hear what all the advocates of going "100% in the cloud"
think about this.

------
beardedman
Sad, sad day for DO. Will be moving my infrastructure that is with them ASAP.
Poorly, POORLY handled.

------
brucemoose
"DigitalOcean Caused an Outage for Our Company" seems like a more accurate
title.

------
adem666
I now think twice about DigitalOcean, which looked like a very nice company
to me before.

------
cybdestroyer
1st rule of business: don’t keep all your eggs in one basket. Is this
amateur hour?

------
chrshawkes
Should have gone with Linode

------
unixhero
Okay. Brb. Deleting my DO account and instances. Migrating to elsewhere.

------
girmad
Their terms disclaim it, but I'd sue for consequential damages anyway.

IANAL

------
jcauldron
How disappointing of DO. Clearly shows their hypocrisy and lack of care for
customers with private tickets versus a public forum. Will not be recommending
DO.

------
citizenpaul
"Welcome to Costco. I love you"

------
delzennejc
This is really scary. Many developers are using Digital Ocean.

------
hyperpape
To everyone justifying the lack of a backup strategy by saying they’re a two
man show:

As far as sympathy goes, you’re not wrong. But you’re also justifying every
pain in the ass procurement process you’ve ever dealt with. Your attitude is
why so many companies won’t go near a two man shop.

~~~
tarsinge
I would agree if they were reckless, but here they had a backup strategy;
it's just that the risk of a complete sudden shutdown by their provider was
not factored into it. This is a new risk that must be taken into account, and
I’m not sure procurement would have caught it. Also, the pain in procurement
is usually over irrelevant, arbitrary administrative things, not real risks
like this.

------
ehutch79
Soooo... an automated script flagged them. They got themselves unflagged, and
proceeded to do the same thing immediately, without any real confirmation that
they wouldn't be flagged again.

They've basically been flagged as abusing the system multiple times, and
they're surprised they had to kick up a storm to get themselves reactivated
again?

Not to mention, that process they need to run every couple of months, which
takes 1s but still needs to be parallelized over a bunch of VMs, is weird and
sounds like something that needs to be rearchitected at the very least.

~~~
harryh
My interpretation of that was that the batch job takes 1s per row or something
like that, rather than 1s total. It obviously wouldn't make sense to spin up
10 nodes to turn a 1s job into ten 0.1s jobs.

Just a guess.
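
If that reading is right, it's the ordinary embarrassingly parallel pattern;
a toy sketch (the ~1s of per-row work here is a made-up stand-in):

    # A job costing ~1s per row parallelizes trivially, whether across
    # local processes or across 10 VMs.
    import time
    from multiprocessing import Pool

    def process_row(row):
        time.sleep(1)  # stand-in for ~1s of real work per row
        return row * 2

    if __name__ == "__main__":
        rows = range(100)
        with Pool(10) as pool:  # ~100s of serial work in ~10s
            results = pool.map(process_row, rows)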

------
kwcts
If all your backups are in the same account of the same cloud provider then
you have no backups.

------
atonse
I’m certainly getting downvoted for this.

Not at all trying to blame the original posters and victims:

While they seem great for hobbyist and small business sites, there’s no way
I’d trust Fortune 500 client business to something like DigitalOcean. I just
don’t see the benefits over a more established operation like AWS, Azure or
GCP. Saving $50 here and there isn’t worth it.

~~~
sasasassy
Your "established operations" are more expensive and GCP is newer than DO.
Plus, getting the attention of someone to restore your account is probably
easier in DO than in a faceless giant company like Amazon, Microsoft or
Google.

~~~
busterarm
My AWS support tickets usually get answered in two minutes or less and I have
a dedicated rep who I can call whenever I want and we have regular check-ins
anyway.

You get what you pay for. We're even upgrading from this support plan to an
Enterprise account.

Startups can usually get enough in AWS credits that they probably could have
their entire first year of service _for free_.

------
tus87
Cool watching this in real time. Co-founder just intervened and got it back
up. Still....

~~~
Hamuko
It's a rather unfortunate customer service experience when you can't get help
from actual support and you can't get help from the official Twitter account.
You just need to pray that your Twitter thread gets enough attention for the
actual co-founder to notice it, so they can make the call that saves your
company.

Also, even the co-founder doesn't seem to know exactly why the service was
suspended, even though he clearly managed to arrange things.

~~~
rc_kas
Right? I hate that justice in these cases relies on the person tweeting and
then that tweet catching on and getting popular on Hacker News. So depressing.

------
robertAngst
Holy crap. Well that is a story worth sharing.

So... Never trust DigitalOcean for anything important.

------
creeble
Still waiting for response 22 hours in.

The response (and its timing) will determine whether we continue with DO as a
host for the (admittedly tiny) bit of infrastructure we host there.

And there had better be a reasonable response on HN if they value their HN-
reading customers; it is where we got the first recommendations for them years
ago.

~~~
detaro
? They have already responded, including _directly in this thread_.

~~~
creeble
>We haven't completed our investigation yet which will include details on the
timeline, decisions made by our systems, our people, and our plans to address
where we fell short.

I'm referring to _this_ response, i.e., the one where they explain why they
did what they did.

