That said, any company, especially one working with Fortune 500's, should have DB backups in at least two places. If they'd had the data, they could have spun up their service on a different hosting provider relatively easily.
DO has shown that their service is simply not suitable for some use cases: those that impose an "unreasonable" load on their infrastructure.
Even worse, they don't explicitly state what is considered "unreasonable". So, if your business is serious, you have to assume the worst-case scenario: DO can't be used for anything.
Conclusion: Digital Ocean is just for testing, playing around, not suitable for production.
> Conclusion: Digital Ocean is just for testing, playing around, not suitable for production.
I think that's always been the standard position most people take. DO, Linode, etc. are for personal side projects, hosting community websites, forums, and so on. They are not for running a real business on. Some people do, sure, but if hosting cost is really that big a portion of your total budget, you probably don't have a real business model yet anyway.
I am of the impression that people rent cloud services because they can expense the cost to someone else, because of an inability to plan long term, or because they need low latency.
That's the kind of response you only send when you're convinced the customer is actually nefarious and you don't care about losing them. I wonder if there is any missing backstory here or if it really is just a case of mistaken analysis.
Used to work at Linode, so let's flip this on its head:
When the majority of the abuse cases support dealt with were people angrily calling and asking about fraudulent charges on their cards for dozens of Lie-nodes, you consider putting caps in place to reduce the support burden and chargebacks.
At the time at Linode, if it was a known customer, we could easily and quickly raise that limit and life was good.
I've always wondered how Amazon dealt with fraud/abuse at their scale.
I don't think DO was wrong here to have a lock, but the post-lock procedure seemed to be the problem.
You can provide a helpful message with options for recourse without giving abusers "clues." These are not somehow mutually exclusive. By your logic it makes sense to punish a marginal element at the expense of the majority.
I think the major issue there is process and management related. The account should have been reviewed by someone with the authority to activate it, and it definitely shouldn't have been flagged a second time. But looks like DO thought the user was malicious, and issues raised by malicious users don't get much information. The response was horrible though.
Sure. Hopefully it results in a change of policy, or at least a public statement of some kind. Not everyone can depend on the cofounder coming in to save them from bad automation.
Agreed, but it's not like the original poster had a huge platform, he just posted about it on Twitter. I may despise Twitter for a bunch of different reasons, but I can't deny it's a great tool for raising issues to companies.
> Account should be re-activated - need to look deeper into the way this was handled. It shouldn't have taken this long to get the account back up, and also for it not be flagged a second time.
So... he doesn't address what is the scariest part to me, the message that just says "Nope, we've decided never to give your account back, it's gone, the end."
I think it's entirely reasonable for companies to have that option. "You are doing something malicious and against the rules, you have been permanently removed". In this case, that option was misused, but I don't think the existence of that possibility is inherently surprising.
Access to your data should never be denied. Ever. It was not DigitalOcean's data. If you are a hosting provider, you can't ever hold customer data hostage or deny them access to it in any way.
Again, I must disagree. If DO genuinely believed that you were doing something malicious and that data was harmful or evil for you to own (e.g. other people's SSN, etc) then they are in the "right" to deny access to it. DO should not be forced to aid bad actors.
And, regardless of what DO should or should not do, they can do whatever they want with their own hard drives. You should structure your business accordingly.
If DO believed that there was criminal activity (notice I am not using the word "malicious"), they should have reported it to the police, and in that case they might be justified in securing a copy of the data. Blocking access would be justified only in the most extreme cases (such as if the data could be harmful to others, e.g. pictures of minors).
If there is no police report, then they are trying to act as police themselves, which I think is unacceptable. It is not their data.
Your argument that they can do whatever they want with their hard drives is indeed something I will take care to remember — I definitely would not want to host anything with DO.
> If DO genuinely believed that you were doing something malicious and that data was harmful or evil for you to own (e.g. other people's SSN, etc) then they are in the "right" to deny access to it.
The observant will note the particular corner you're backing into here -- that a business might be justified in denying access to code/data being used in literally criminal behavior -- is notably distinct from the general and likely much more common case.
> they can do whatever they want with their own hard drives.
Sure. But to the extent they take that approach, Digital Ocean or any other service is publicly declaring that however affordable they may be for prototyping, they're unsuitable for reliable applications.
Businesses that can be relied on generally instead offer terms of service and processes that don't really allow them to act arbitrarily.
> ... a business might be justified in denying access to code/data being used in literally criminal behavior...
I agree. Look at the absolutism of the comment I am replying to. My whole point is that there might be some nuance to the situation.
> ...Digital Ocean or any other service is publicly declaring that however affordable they may be for prototyping, they're unsuitable for reliable applications.
Again, I agree. Considering how cheap AWS, Backblaze, and Google Drive are, it is completely ridiculous to depend on any one single hosting service to hold all your data forever and never err.
At no point did DO ever believe this. This happened purely and simply because of usage patterns changing. It was done automatically and a bot locked them out. They should not be locking out data based on an automated script.
You seem to be accusing the aggrieved party of being a bad actor, when that is not the case.
For some practical, if extreme, examples: if a customer were to host a phishing site, or a site hosting CP, it would be grossly irresponsible (and likely even illegal) for the hosting provider to retain the customer's data after account suspension and allow them to download it.
And do what in the meantime? The legal system acts slowly. In the age of social media outrage, would you allow the headline "Digital Ocean knew they were serving criminals, and they didn't stop them" if you were CEO?
It's easy to be outraged when these systems and procedures are used against the innocent. That does not mean we should stop using rational thought. If someone is using DO to cause harm, then DO should (be allowed to) stop the harmful actions.
> Your account has been temporarily locked pending the result of an ongoing investigation.
You lock down the image, and let law enforcement do their thing. If law enforcement clear them, you then give the customer access to their data, perhaps for a short time before you cut them off as they seem to be a risky customer to have.
You don't unilaterally make the decision, you offload your responsibility onto the legal process.
>would you allow the headline "Digital Ocean knew they were serving criminals, and they didn't stop them" if you were CEO?
Seems to work just fine for AWS, Google and Cloudflare. In fact, counter to your argument, Cloudflare got in massive shit when they did decide to play God.
The shutdown part of that option is reasonable to have, yes.
At the very least, they should also provide ALL, as in every last byte, of the data, schemas, code, setup etc. to the defenestrated customers. As in: "sorry, we cannot restart your account, but you can download a full backup of your system as of its last running configuration here: -location xyz-, and all previous backups are available here: -location pdq-".
Anything less is simply malicious destruction of a customer's property.
If you violate a lease and get evicted, they don't keep your furniture & equipment unless you abandon it.
That's probably a reason to use containerization / other technologies so that you can spin up your services in a couple minutes on a different cloud provider.
You don't need to use containers for that. All you have to do is set up a warm replica of the service with another provider. The failover doesn't even have to be automatic, but that is the minimum amount of redundancy any production SaaS should have.
A "warm replica" is going to cost money though, while containerization allows you to not have anything spun up until the moment you need it, and then have it ready to go minutes / an hour later.
That is patently false, unless you plan on starting from a clean slate in the new environment. Anyone who proposed such a solution as a business continuity practice to me would be immediately fired.
Containers solve the easy problem, which is how to make sure the dev environment matches the production environment. That is it.
Replicating TBs worth of data and making sure the replica is relatively up to date is the hard part. So are failover and failback. Basically everything but running the code/service/app, which is the part containers solve.
> Sure, a backup would have been a significant improvement, but still – a backup only protects against data loss and not against downtime.
Assuming you have data backup / recovery good to go, the downtime issue needs to be solved by getting your actual web application / logic up and running again. With something like docker-compose, you can do this on practically any provider with a couple of commands. Frontend, backend, load-balancer -- you name it, all in one command.
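As a rough sketch of what that redeploy might look like on a fresh VM at another provider (the repo URL and bucket name below are made up, and it assumes your compose file lives in version control and your backups live outside the old provider):

```bash
# Sketch: stand the stack up on a brand-new VM at a different provider.
# Assumes the repo (with docker-compose.yml) is hosted outside the old
# provider and that credentials for the off-site backup bucket are available.
sudo apt-get update
sudo apt-get install -y docker.io docker-compose awscli git

git clone https://github.com/example/our-app.git   # illustrative URL
cd our-app

# Pull the latest off-site database dump (bucket name is illustrative).
aws s3 cp s3://example-offsite-backups/db/latest.sql.gz ./latest.sql.gz

# Bring up frontend, backend and load balancer as defined in the compose file.
docker-compose up -d
```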
> Containers solve the easy problem, which is how to make sure the dev environment matches the production environment. That is it.
>That said, any company, especially one working with Fortune 500's, should have DB backups in at least two places.
They should have, at the very least, one DR site on a different provider in a different region that is replicated in real-time and ready to go live after an outage is confirmed by the IT Operations team (or automatically depending on what services are being run).
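To make the database side of that concrete, here's a minimal sketch for seeding a warm standby, assuming Postgres (the hostname, replication user, version, and paths are placeholders, and it assumes the primary's pg_hba.conf already allows replication connections from this host):

```bash
# Sketch: seed a streaming-replication standby on a VM at a second provider.
sudo systemctl stop postgresql

# The target data directory must be empty; path and version are placeholders.
sudo -u postgres rm -rf /var/lib/postgresql/14/main

# Copy the primary's data directory, write standby settings (-R), and stream
# WAL during the copy (-X stream) so nothing is missed.
sudo -u postgres pg_basebackup \
  -h primary.example.com -U replicator \
  -D /var/lib/postgresql/14/main \
  -R -X stream -P

sudo systemctl start postgresql   # comes up as a read-only standby, kept in sync
```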
I feel for these guys, but that's not "all the proper backup procedures". I'm part of a three-man shop and storing backups in another place is the second thing you do immediately after having backups in the first place. Never mind being locked out by the company - what happens if the data centre burns to the ground?
More realistically they would have done backups inside DO and would still be locked out. Not many people actually do complete offsite backups to a completely different hosting provider, getting locked out of your account is usually just not a consideration. It’s unrealistic to expect this of a tiny startup.
>getting locked out of your account is usually just not a consideration
How many horror stories need to reach the front page of HN before people stop believing this? Getting locked out of your cloud provider is a very common failure mode, with catastrophic effects if you haven't planned for it. To my mind, it should be the first scenario in your disaster recovery plan.
Dumping everything to B2 is trivially easy, trivially cheap and gives you substantial protection against total data loss. It also gives you a workable plan for scenarios that might cause a major outage like "we got cut off because of a billing snafu" or "the CTO lost his YubiKey".
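Something along these lines is the whole job (database, paths, and bucket names are made up, and it assumes a B2 remote has already been set up with `rclone config`, ideally using an application key scoped to that one bucket):

```bash
# Sketch: nightly dump shipped off-site to Backblaze B2 via rclone.
pg_dump -Fc mydb > /backups/mydb-$(date +%F).dump
rclone copy /backups/mydb-$(date +%F).dump b2:example-offsite-backups/db/
```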
> How many horror stories need to reach the front page of HN before people stop believing this
Sounds like the opposite of survivorship bias. I don't believe it's at all common (though it does happen), even less that "it should be the first scenario in your disaster recovery plan".
Even if the stories we hear of account lockouts aren't typical, the absolute number of them that we see -- especially those (like this one) that appear to be locked (and re-locked) by automated processes -- should be cause for concern when setting up a new business on someone else's infrastructure.
If you plan for the "all of our cloud infrastructure has failed simultaneously and irreparably" scenario, you get a whole bunch of other disaster scenarios bundled in for free.
Whether it's normally a consideration or not, there are no meaningful barriers in terms of cost or effort, so it's totally realistic to expect it of a tiny startup.
Every week there's another article on HN about a tiny business being squished in the gears of a giant, automated platform. In some cases like app stores this is unavoidable, but there are plenty of hosting providers to choose from. People need to learn that this is something that can happen to you in today's world, and take reasonable steps to prepare for it.
I don't know, it seems simple enough to me. I have a server on DO hosting some toy-level projects, and IIRC it took me 15-30 min to set up a daily Cron job to dump the DB, tar it, and send it to S3, with a minimum-privilege account created for the purpose, so that any hacker that got in couldn't corrupt the backups. I'm not a CLI or Linux automation whiz, others could probably do it faster.
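For reference, the whole thing is roughly this (database, bucket, and paths are placeholders; the minimum-privilege part means the IAM user behind these credentials only has s3:PutObject on that bucket, so an attacker on the box can add backups but can't read or delete the old ones):

```bash
#!/usr/bin/env bash
# backup.sh -- sketch of the daily dump-and-ship job described above.
set -euo pipefail

STAMP=$(date +%F)
DUMP=/tmp/mydb-$STAMP.sql.gz

pg_dump mydb | gzip > "$DUMP"
aws s3 cp "$DUMP" "s3://example-db-backups/daily/mydb-$STAMP.sql.gz"
rm -f "$DUMP"
```

Wired up with a crontab line like `15 3 * * * /usr/local/bin/backup.sh >> /var/log/db-backup.log 2>&1`.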
We don't know the structure of their DB and whether failover is important or not, so we don't know if the DB can be reliably pulled as a flat file backup and still have consistent data.
We also don't know how big the dataset is or how often it changes. Sometimes "backup over your home cable connection" just isn't practical.
Cron jobs can (and do) silently fail in all kinds of annoying and idiotic ways.
And as most of us are all too painfully aware, sometimes you make less-than-ideal decisions when faced with a long pipeline of customer bug reports and feature requests, vs. addressing the potential situation that could sink you but has like a 1 in 10,000 chance of happening any given day.
But yes, granted that as a quick stop-gap solution it's better than nothing.
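On the silent-failure point, one cheap mitigation is a dead-man's-switch ping at the end of the job, so you get alerted when the backup stops running rather than when you need it (the script name and ping URL below are placeholders for whatever job and healthcheck endpoint you actually use):

```bash
# Sketch: only ping the healthcheck endpoint if the backup succeeded; the
# monitoring service alerts when the expected daily ping fails to arrive.
if /usr/local/bin/backup.sh >> /var/log/db-backup.log 2>&1; then
  curl -fsS --retry 3 https://hc.example.com/ping/db-backup > /dev/null
fi
```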
> We also don't know how big the dataset is or how often it changes.
I'm going to take a stab at small and infrequently.
Every 2-3 months we had to execute a Python script that takes 1s on all our data (500k rows); to make it faster, we execute it in parallel on ~10 droplets that we set up only for this pipeline and shut down once it's done.
Yeah, probably. But we shouldn't be calling these guys out for not taking the "obvious and simple" solution when we aren't 100% certain that it would actually work. That happens too often on HN, and then sometimes the people involved pop in to explain why it's not so simple, and everyone goes "...oh." Seems like we should learn something from that. I've gone with "don't assume it's as simple as your ego would lead you to believe."
I suggested that solution because everyone is saying "they're only a two-man shop so they don't have the time and money to do things properly". Anyone has the time and money to do the above, and there's a 90% chance that it would save them in a situation like this.
Even if they lost some data, even if the backup silently failed and hadn't been running for two months, it's the difference between a large inconvenience and literally your whole business disappearing.
> "2-man teams generally don't prioritize backups" isn't an excuse for not prioritizing backups.
They had backups, but being arbitrarily cut-off from their hosting provider wasn't part of their threat model.
Isn't a big part of cloud marketing the idea that they're so good at redundancy, etc. that you don't need to attempt that stuff on your own? The idea that you have to spread your infrastructure across multiple cloud hosting providers, while smart, removes a lot of the appeal of using them at all. In any case, it's also probably too much infrastructure cost for a 2-man company.
> In any case, it's also probably too much infrastructure cost for a 2-man company.
Keeping your production and your backups with the same cloud provider is the equivalent of keeping your backup tapes right next to the computer they're backing up. You're exposing them both to strongly correlated risks; you've just changed those risks from "fire, water, theft" to "provider, incompetence, security breach".
So what is the purpose of the massive level of redundancy that you are already paying for when you store a file on S3? I don’t think it’s terribly common for even medium sized companies to have a multi tier1 cloud backup strategy.
Back in the day, we used to talk a lot about how RAID is not a backup strategy. The modern version of that is that S3 is not a backup strategy.
> So what is the purpose of the massive level of redundancy that you are already paying for when you store a file on S3?
You're paying to try and ensure you don't need to restore from backups. Our data lives in an RDS cluster (where we pay for read replicas to try and make sure we don't need to restore from backups) and in S3 (where we pay for durable storage to try and make sure we don't need to restore from backups), but none of that is a backup!
If you're not on the AWS cloud, S3 is a decent place to store your backups, of course. But storing your backups on S3 when you're already on AWS is, at best, negligent, while treating the durability of S3 as a form of backup is simply absurd.
> I don’t think it’s terribly common for even medium sized companies to have a multi tier1 cloud backup strategy.
The company I work for is on the AWS cloud, so we store our backups on B2 instead. It's no more work than storing them on S3, and it means we still have our data in the event that we, for whatever reason, lose access to the data we have in S3. Who the hell doesn't have offsite backups?
> Back in the day, we used to talk a lot about how RAID is not a backup strategy. The modern version of that is that S3 is not a backup strategy.
This is not remotely the same thing. A RAID offers no protection against logical corruption from an erroneous script or even something as simple as running a truncate on the wrong table. Having a backup of your database in a different storage medium on the same cloud provider protects from vastly more failure modes.
> Who the hell doesn't have offsite backups?
No one. But S3 is already storing your data in three different data centers even if you have a single bucket in one region, and you also have SQL log replication to another region. Multi-region is as easy as enabling replication but that is only available within a single cloud provider (I can't replicate RDS to Google Cloud SQL, only to another RDS region). I would guess that a lot of people use that rather than using a different cloud provider.
> This is not remotely the same thing. A RAID offers no protection against logical corruption from an erroneous script [...] But S3 is already storing your data in three different data centers
That sounds like...the same argument?
A RAID array stores your data on multiple physical drives in the machine, but offers no protection against logical corruption (where you store the same bad data on every drive), destruction of the machine, or loss of access to the machine.
S3 stores your data in multiple physical data centres in the region, but offers no protection against logical corruption, downtime of the entire region, or loss of access to the cloud.
You can't count replicas as providing durability against any threat that will apply equally to all the replicas.
Storing a file on two tier-1s would surely protect you from fire, water, and theft, no? Yet you will also be paying for all the extra copies Amazon and Google each make. I'm not disagreeing that this is the right strategy, just pointing out that the market offerings and trends don't support it.
> being arbitrarily cut-off from their hosting provider wasn't part of their threat model
Let's be fair: The threat model here is "lose access to our data".
This can happen in a number of ways, lost (or worse, leaked) password to the cloud provider, provider goes bankrupt, developer gets hacked, and a thousand other things.
Even if you trust your provider to have good uptime, there's really no excuse for not having any backups. Especially not if you're doing business with Fortune 500's.
Yeah I think this is what people are not getting. Redundant backups might mean "don't worry, in addition to backups on the instance, I have them going to a S3 bucket in region 1 and then also region 2 in case that region goes down," which of course doesn't protect from malicious activity from the provider. You certainly _should_ make sure you have backups locally available or in a secondary cloud provider but this is some hindsight.
As a startup, generally your secondary backup could literally be an external hard drive from best buy, or an infrequent access S3 bucket (or hell, even Glacier). No excuse, especially when "dealing with Fortune 500 companies".
Literally just push a postgres dump to S3 (or any other storage provider) once a night as a "just in case something stupid happens with my primary cloud provider". It'd take a couple hours tops to set up and cost next to nothing.
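And if even that cost bothers you, the dump can stream straight into a cheaper storage class (the database and bucket names below are placeholders):

```bash
# Sketch: nightly dump piped directly to S3 in an infrequent-access class;
# DEEP_ARCHIVE would be the Glacier-style cold option.
pg_dump mydb | gzip | \
  aws s3 cp - "s3://example-db-backups/nightly/mydb-$(date +%F).sql.gz" \
  --storage-class STANDARD_IA
```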
Most of the costs aren't from storage space, but compute power. We aren't talking about duplicating the whole infrastructure, just backing up the data. Disk space is dirt cheap.
Also, by "two places" I meant the live DB and one backup that's somewhere completely different. My wording may have been confusing.
They did have backups. That's why I assumed you meant double backups. If you do cold storage you should have 3 copies due to possible corruption. Sure, tape drives are cheap, but someone also has to run and check the backups.
You could say that it doubles the cost of the backups, but counting this way we start with one live copy plus one backup and add a second backup; that means only a 50% increase in total copies.