Why the “Digital Ocean killed my company” incident scares the hell out of me (checklyhq.com)
316 points by tnolet on June 6, 2019 | hide | past | favorite | 180 comments

I've had a DO mistake take down my site before, and when it was brought back up it had been reverted to several months prior. DO support was at a loss as to why this would have happened. I tried to restore from DO's backup service, but their backups had apparently stopped running several months prior as well. This was a major issue and could have easily been the death of my company, all because of a DO glitch.

But it wasn't, because every night I run a PG backup and copy it to AWS S3. I just had to download the backup from the other cloud vendor and restore it on my DO server.

Did DO fuck up? Yes. Did it cost me downtime? Yes. Was I mad? Yes, and I still am. But I still do business with DO because it costs ~half the price of a comparable EC2 instance, and writing a 10 line bash script to move my database backups to another cloud vendor isn't that hard. Storing that backup on S3 costs pennies per month.
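For anyone curious, a minimal version of such a script might look like this (the bucket and database names are placeholders; it assumes `pg_dump` and the `aws` CLI are installed and credentials configured):

```shell
#!/usr/bin/env bash
# Nightly Postgres dump, copied off-cloud to S3. Run from cron.
# DB_NAME and BUCKET are placeholder values -- adjust for your setup.
set -euo pipefail

DB_NAME="myapp"
BUCKET="s3://my-offsite-backups"
FILE="/tmp/${DB_NAME}-$(date +%F).sql.gz"

pg_dump "$DB_NAME" | gzip > "$FILE"              # dump and compress
aws s3 cp "$FILE" "$BUCKET/$(basename "$FILE")"  # copy to the other cloud
rm -f "$FILE"                                    # don't leave dumps lying around
```

An S3 lifecycle rule can expire old dumps so storage really does stay at pennies per month.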

I don't see "I've never run a business before" as a valid excuse, nor do I see "the cloud vendor is better equipped to handle backups" as a valid excuse. When it comes down to it, you are solely responsible to your customers. They're not going to care whose fault it was, because it was your fault.

Don't trust any of your vendors.

> But I still do business with DO because it costs ~half the price of a comparable EC2 instance

I'm curious why people who don't have massive scaling and variability issues choose DO or AWS for their hosting.

For 34€/month, Hetzner will rent you a physical server (i7-6700, 64GB RAM, 2x512GB SSD, 1Gbit/s networking). That's a monster of a machine and can run most sites out there. And if you need more oomph and better reliability, rent three.

I looked at this many times and still check the pricing regularly, and it just doesn't make sense for me to switch from these physical servers to virtualized offerings, not in any foreseeable future.

Because hypothetically your instance should be hardware agnostic. If the physical hardware dies, it should automatically migrate to another physical server at their data center without your intervention. It will only look like an unexpected restart from your perspective.

That's something worth paying for. It would be more comparable to two servers at Hetzner with rapid failover. But even that is more involved since you have to set up the logic of when to failover.

Auto-scaling is a benefit of "cloud" services. But it isn't the only selling point. Hardware abstraction is perhaps bigger.

That's wishful thinking.

If there's state in that virtual machine, it's probably either stored on the physical host or in a SAN. If it's in the physical host, it has to be fished out of that machine or restored from a backup. If it's a SAN, you can lose your virtual machine if the SAN goes down.

I've seen both happen.

Actually, a single machine with RAID is surprisingly stable. A provider like Hetzner can switch out a faulty disk in less than five minutes, or switch a faulty motherboard/power supply in less than half an hour.

Virtualization on top of this does not increase stability, in my experience.

Now, some cloud providers do have more sophisticated distributed systems that do not have single points of failure, and that's a completely different story.

Of course, the software itself is a source of correlated failures, so even there you should never rely on a single cloud vendor.

There's a poster here on this site who commented some months ago that he had for years rented three servers, each on a different continent, each from a different provider, and never had downtime. That's engineering.

I receive on average about one notification from DigitalOcean each month that there's a problem with the host for one of my VMs and there may be some downtime while they migrate it to a new host. Usually the downtime is about the same as a reboot, sometimes there's no downtime at all.

This works in part because both Xen and KVM hypervisors (and possibly others) support live migrations, so it's altogether false that virtualization does not increase stability. Both DigitalOcean and Linode use KVM behind the scenes.

So, at lower cost than rented hardware, customers get staff whose job it is to constantly monitor systems for hardware failures and deal with them proactively in a way that minimizes downtime.

>If there's state in that virtual machine, it's probably either stored on the physical host or in a SAN

I think the point is to write your application in a way that there's not state in the VM. My VMs are disposable and in fact the way I do deployments is to spin up a new VM in DO and then assign it the floating IP for production. If things go haywire I can easily swap back the IP address to the known good VM.
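Assuming DigitalOcean's `doctl` CLI, the swap itself is a one-liner in either direction (the IP and droplet IDs below are made up):

```shell
# Point the production floating IP at the freshly provisioned droplet.
doctl compute floating-ip-action assign 203.0.113.10 200000001

# Rollback: re-assign the same IP to the previous, known-good droplet.
doctl compute floating-ip-action assign 203.0.113.10 200000000
```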

Are you using managed databases then?

I am. Mostly for the backup and other DB administration that comes with it. Also use Spaces (DO's S3 equivalent) for any assets that aren't packaged with the code like user uploads.

Future wish: all VPSes have managed databases, object hosting and centralized logs as standard so as not to rely on local disk at all. There is a streaming backup protocol, which allows any database and queue vendors to stream to any location, and acts as a peer for fully synchronous replication. If your DO Postgres, Kafka and Mongo instances are killed, you have a Bakstream cluster in Linode which lost no data and allows you to restore in Vultr.

That's not wishful thinking. I did it from day one with my startup on GKE. You can blow up any of my servers, and a new one gets spun up and the Docker containers deployed while the glitch is load-balanced around. All for only a few hours of work.

GKE has given me the infrastructure of a 10 million dollar company, with me as the IT guy doing 1h or less a week of IT work.

I would not be able to do this on a $35 Hetzner machine.

GKE (and GCP in general) is a special beast when it comes to this. They support online migrations of instances and it's nearly seamless.

In fact, it's very commonly happening to instances, probably ones you have.

But if you're running on AWS with their EC2 instances for your application, then it's not more reliable than a single chunk of hardware in my experience (there are special cases, some hardware is just crappy)

Right. I'm using stock Ubuntu LTS with Docker and my software is in docker images or otherwise easily deployable. Changes to the host system configuration are documented, so re-creating a host from scratch is not a problem.

Things like automatic failover and high availability in general are cool, but your SLOs should be set according to your business model, e.g. they need to make business sense.

I do understand how people like VPSs at $5 or $10/month (I've used one like that myself for years), and I do understand why companies that need massive scalability use AWS. But I do not understand why people in the middle (which I think includes a significant fraction of this audience) assume that a provider like AWS is the best solution.

Anyway, those are just my thoughts. I've been running a business on Hetzner servers for years now. And this is not necessarily a recommendation to use Hetzner: there are many providers which will offer low-cost physical servers. I think in spite of the cloud hype, these make sense for many applications.

I agree with you. Many applications grow linearly or not at all. I personally run a couple hundred servers, but it's all VMware, Chef and clustered. Frankly, most just sit at a stupidly low load average, but I don't fuss over it because the spare cycles aren't wasted and I can have hosts and storage fail with impunity.

Many ways of doing it. Virtualization is basically all direct hardware access for the stuff that counts.

But I digress. I am guilty of over building things. In part for fun. In part hubris. In part I like sleeping at night

I have several servers with Hetzner, including a failover setup. In the last 7 years I needed to use it 2-3 times, for a very short time, maybe 2-3 hours. The worst accident was when two hard drives and the controller failed at once. It was still no problem to rebuild the array and get the old system running. So practically speaking, when you look at the outages Google or Microsoft have, for me the total uptime over the years is comparable.

Most $10MM ARR startups are running on two medium T-class AWS instances behind a load balancer at $35/mo, plus another small server for accessories (Redis, message queue, etc.) and then a SQL backend. That's something like $200/mo for infrastructure; you never need to maintain the hardware and you have the option to autoscale at any time to support $100MM ARR.

Plus you get access to backups and all the other tools and products that are constantly being released. Not having to hire an IT guy at $1200/mo to maintain your physical server and tend to its networking rules is a significant savings over a developer-managed AWS/GCP instance. Plus all the cloud interfaces are standardized, so you can just hire someone to come through and fix/upgrade your infrastructure.

Using a home-spun physical server means months if not years of undocumented tech debt for the next guy who comes along to have to maintain whatever kludges were installed long before he/she ever showed up. I just got done converting a bunch of physical servers over to the cloud and spent a year unwinding seven years of technical debt and now the dev/qa teams can actually spin up a new test environment in under 30 days (closer to 3 minutes).

You’re telling me that 10 million dollar “startup” companies can’t run 1 server without incurring “years” of tech debt? Sorry, that just sounds unbelievable.

Welcome to startup fallacies. You don't know what you don't know, and it's easier (but not cheaper) to follow a crowd.

They can, but let's be honest, people will be doing multiple roles in the very early stage, so your lead programmer will be configuring the server ... probably won't be using a configuration management solution, and whoever takes it over is going to have to comb through the various configuration files to work out what it's meant to be doing.

$200/mo for a $10MM ARR company? Can you add more info here. What sort of app/company are we talking about here? Genuinely curious.

For production, $200/mo. Dev/testing seem to always cost 3x+ what production servers cost.

One was an IT management/automation SAAS company (I think they were $450/mo), another was backoffice SAAS company (I think they were $1000/mo but they were also hosting for an independent dealer of their SAAS so it was two production systems, and 10 years of tech debt), third is a live video consultancy SAAS that outsourced their video to a third party api service. None of those are actually $200/mo but realistically in the ballpark of under $1000/mo.

When I converted the company (which included 40 engineers and probably 12 qa engineers) to the cloud I got my hands slapped for converting the QA/test datacenter (not production) and the cost went from $3200/mo on bare metal to $6500/mo but we were able to squeeze that down to $4200/mo running about 10 always-on test/pre-prod environments and 2-3 on-demand test environments.

This is all enterprise space stuff where the utilization is low but the value of the service is high. If you're doing facebook for dogs and trying to turn a profit on ad revenue from high utilization it's probably not as effective/realistic.

B2B generally has a lot more cost in support / handholding, and B2C has much more cost in servers. The extreme examples are a B2B app with 100 or less total users vs an imgur clone with millions of users

I'm a bit late to the party, but Hetzner actually deleted all my VMs and backups. Their abuse team didn't like my name, despite me also having my name in my e-mail and my credit card info.

I got an e-mail that my account was flagged, but they didn't even give me a chance to tell them they had my name correct.

I responded to their email within 30 minutes, but everything had already been erased.

Luckily I had some stuff in github, but I will never use them again.

In our case, because it was easy to get thousands of dollars of free credit in Google cloud (the same appeared to be true for AWS). This went up to tens of thousands for Series A.

We had zero serving costs and would have continued to do so for years.

It makes perfect sense for Google/Amazon to incentivise startups to design themselves for their cloud.

>because it was easy to get thousands of dollars of free credit in Google cloud (the same appeared to be true for AWS)

Can you share how?

Because DO has their $5 and $10 / month plan, and that's really enough for me.

Nice, use Docker containers in it:


AWS Lightsail charges $3.50/month for a VM with 512MB RAM.

Hetzner charges €3.00/month for a VM with 2GB RAM.

The CPU allocations on Lightsail are anemic compared to DO.

Are they? They come out fairly comparably here. I think Lightsail is a T3 instance "under the hood".


Notice the memory and file I/O, huge gulfs in performance. The focus of my comment though was about Amazon's restrictive leaky bucket CPU allocation scheme, you can burn through your allocation of high cpu use and then get throttled to 5% of peak use iirc for hours while this bucket refills. DO lets you use much more CPU comparatively.

True. For intermittent (less than 10% high CPU usage per period) use though, it's equivalent, and comes with other advantages.

That's exactly it for me too. The Hetzner plan costs about twice as much as my monthly DO bill. Not to mention I'm US based and all of my customers are US based so hosting a site in Germany or Finland doesn't make a whole lot of sense.

For about a year now you can also use their cloud product: https://www.hetzner.com/cloud, starting at €2.50.

There are lots of "managed" features still missing, but that's acceptable for a still-young product.

And here I am a Canadian and my client base is Canadian. While not a problem just yet it is likely that PIPEDA/PHIPA would make it complicated to serve data from outside the country.

I am not a lawyer, but the advice I have been given is murky. It's much easier to stick with DO, which, at my usage, costs about the same number of $CAD as Lightsail would in $USD.

>For 34€/month, Hetzner ...

Hetzner is only available in the EU, specifically Germany. And my bet is that most of these people want hosting within the US.

I googled for "Hetzner USA" and found some similar offerings:



I think most people go with DO not just because it's "affordable"; it has also been around long enough and grown large enough to be trusted. Both OVH and Hetzner are the same as DO in this regard.

The two listed aren't really well known.

Hetzner is also a company that really wants you to send them a scan of your government ID / passport. Meanwhile e.g. prgmr takes bitcoin.

My worst record from "hard disk died" to "getting access to the server again" is 12 hours. And no, that's not a sample of just one server.

I'd never again run anything business-critical on Hetzner metal. I'm not even running my IRC bouncer there, tbh. I'm really happy with their cloud stuff, though: higher uptimes than on most of their physical servers.

Oh, and remind me again of the times we regularly called them about outages because our monitoring was better than theirs and their customer support people hadn't yet heard of the network outage...

Maybe because they had dozens of failures and DDoS outages? That's why they have to keep their prices so low: reputation. Price is the only attractive thing about services like this.

I came here to say this.

Why people continue to think you can get everything on the cheap with services like this and not understand you're putting your business at risk by trying to cut corners on hosting and other services is beyond me.

I pay around $20-$25/month for Azure hosting for several e-commerce and mobile app clients I have. I told them up front you don't want to cut corners on this stuff, it will come back to haunt you later.

Sure, you'll save a few bucks year over year, but what happens in an outage? What happens when your clients can't order their stuff and you lose revenue for several days straight? Now the idea of saving a few hundred dollars a year evaporates as thousands of dollars are lost when your service provider takes a dump on you. Suddenly, it's not such a good deal anymore is it?

Because I can simply turn off everything and reduce my spend to storage alone. If I turn off that server I still pay for it

If you use a Lambda on AWS you get 1 million requests free each month, and after that it's fractions of a penny per invoke.

That's wonderful. Now you can use those requests to read from or write to SQS, do DynamoDB stuff and play in the AWS walled garden where things seem cheap at first but things add up.

You can then keep telling yourself you have a stable, managed infrastructure and you're cutting on operational expenses and able to move fast while the engineers in your organization are working harder to make things work with generic services with shortcomings, making your product actually work in less than optimal ways.

Managed services feel like they are helping with operational costs but they have a cost when you're building your product.

I worked on projects that would have ended up being simpler and cheaper to operate on physical dedicated servers, but instead they are running on "ASG's that autoscale and have zero downtime" with "ALBs that send traffic to any and all hosts when all origins report unhealthy".

Things in the managed world are far less than ideal because one size doesn't fit all.

The issue with lambda is that at high loads it's a lot more expensive than using a server. I looked into doing something with lambda that would take about ~10b invocations/day and it was so expensive compared to just standing up a microservice.

So I've never deployed a lambda before and I'm sure at low loads it's fine, but I would be afraid of taking a dependency on lambda/functions in an application architecture.
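As a rough sanity check of the parent's numbers, the request charge alone (using AWS's published $0.20 per million requests, and ignoring GB-second duration charges entirely) works out to:

```shell
# Back-of-the-envelope: Lambda request cost only, at ~10B invocations/day.
awk 'BEGIN {
  per_day = 10e9 / 1e6 * 0.20   # invocations / 1M * $0.20
  printf "$%d/day, $%d/month\n", per_day, per_day * 30
}'
# -> $2000/day, $60000/month
```

And that is before any compute time is billed, which at that volume would dwarf the request charge.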

For me, it’s because I have some latency specific loads that I can’t really easily do overseas sometimes.

I'm moving Azure, AWS and other VPS workloads to them right now. Very impressive service!

Because AWS et al. are considerably more than just instances?

There's a whole host of services that make dev and deployment a lot easier.

Though instance costs are higher ... that's not the issue; the issue is TCO (total cost of ownership), not raw server cost.

If your team can move at 2x the speed on AWS, well that's worth a lot.

Cloud services are very useful, so the article raises a very legit concern/paradox.

The article addresses this in the "Why not just do x?" section.

> - Why were you hosting on JUST ONE cloud provider?

> - Why didn't you have backups outside of JUST ONE cloud provider?

Telling them to just use two cloud hosting providers sounds easy on paper but when you're a cash-strapped startup it's a significant ask. Especially if you need to have the same architecture replicated across both, in order not to have extended downtime.

This is a much bigger issue than having external backups (which every startup should still do).

But you don’t need the same architecture replicated across two providers, as your parent post demonstrated. Run Postgres in DO and upload your backups to S3/Azure Storage. This is both cheap and simple to the level that any startup could manage it. If and when it makes sense, you can investigate multi provider replication, but it might just be YAGNI.

These are strange times I guess? Maybe I'm a weird one-off, but I have 3 VPS providers for my hobby sites, my own DNS spread across all three (one remains shadow backup) and offline rsnapshot backups that automation can't touch.

I can't imagine doing less for a business. In my opinion, the business model and investment plan should account for all of this for at least the first five years.

Then save it locally as well? There really isn't an excuse to having DO or any provider being your sole backup.

Why is everyone talking about backups and not the actual infrastructure?

because that was the biggest issue in the original article, they didn't have backups of their data, so the company was dead. If they had had backups, they could have recreated the infrastructure.

Okay, but say you do get shut down and you have your offsite backups, like most people. It's still a significant amount of work starting from scratch to rebuild and link all of your servers, DBs, services, etc. Plus all of the basic OS tweaking.

Losing your backups because you're all on one site is always dangerous. Getting completely kicked off your VPS systems with no warning or help is what's new and scary about this DO story.

They did have backups; the only problem was the backups were also on DO. Backing up a petabyte of data even on a Gbit connection is slow, and also costs more than pennies if you, for example, back it up on AWS. Linus Tech Tips has a video on YouTube explaining the problem.

The original post literally mentions "all our data (500k rows)" i.e. a data amount so small they could probably back it up to their cellphone every five minutes.

It's not really a "backup" if it's just on the same server/data center though. Moving data from one drive to another in the same machine, or from one machine to another in the same building, provides fault tolerance should the main drive/machine die, that's about it.

That's true whether the computer(s) is your own or someone else's.

It's all about threat model. And yes "our cloud provider accidentally deletes both our data and our backups" should be in the list.

No, it's not such a difficult task, and not even expensive. I don't think every project really needs it, though. But external DB backups? They are ridiculously cheap, and every project needs them.

Local backups? A few hard drives and a fiber connection can be cheap. Slow to restore, but not "killed my business" slow

It's because the price of your site's downtime is insignificant. DO (and many other "bulk VPS" services) is only suitable for projects like this. When one hour of downtime costs much more than a year of hosting, you will be ready to pay for high availability: duplication, replication, backup verification.

> I don't see "I've never run a business before" as a valid excuse, nor do I see "the cloud vendor is better equipped to handle backups" as a valid excuse.

It's not unreasonable to be annoyed when a company you are paying specifically to do the things you're not familiar with, catastrophically fails to do them correctly.

Agreed, but "I'm annoyed" doesn't help your customers. "Don't worry, I'm restoring from last night's backup" sounds a lot nicer.

Again, your customer doesn't care what your excuse is, just like the dead company's founder doesn't care what DigitalOcean's excuse was. "Our processes failed" translates directly into "I failed".

I remember getting the e-mail about their backups not having been working for months. I was shocked at how negligent that was. I run my own backups now.

(Also their backup offering is so much more cumbersome to use than borg backup that it wasn't a great loss to manage these myself).

Still amazed that they didn't notice no one's backups had been running for so long.
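For reference, a borg setup along those lines is only a few commands (the remote host, repo path, and source directory are placeholders):

```shell
# One-time: create an encrypted, deduplicating repository on a remote host.
borg init --encryption=repokey ssh://backup@example.com/./backups

# Nightly: snapshot the data directory into a date-stamped archive.
borg create --stats ssh://backup@example.com/./backups::{now:%Y-%m-%d} /var/lib/myapp

# Expire old archives, keeping 7 dailies and 4 weeklies.
borg prune --keep-daily=7 --keep-weekly=4 ssh://backup@example.com/./backups
```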

FWIW I do the same; I have backups of everything.

All that said in close to a decade (actually might be a decade) I've still yet to have any issue with linode (tempting the computer gods here) or their backups.

I remember when DO was first starting out and they used to knock their own stuff offline updating router tables.

Ever since I have a jaundiced view of their capabilities.


Linode backups start to fail when you have more than 3 million files on a drive.

When debugging this, they told me the reason, I did a snapshot that worked, and then they turned off backups because they were failing. They didn’t highlight that backups were now off.

Restores on the larger machines can take 5+ hours, and they will often report that the restore failed if you are restoring a drive that contains docker’s pipe files.

Ymmv, but I’m trying to get off linode.

Might I recommend prgmr.com instead? I use both DO and PRGMR (for resiliency).

> I don't see "I've never run a business before" as a valid excuse, nor do I see "the cloud vendor is better equipped to handle backups" as a valid excuse.

If you're paying someone specifically to make backups for you, you should be able to trust that they've taken every reasonable measure to ensure that backups are actually being made and preserved.

You would think so, but then again you might be wrong. Better to be safe than sorry, no? I expected DO to make the backups I was paying them for and they didn't. Luckily I was making my own backups at the same time. Turns out my backups worked and theirs didn't. If I had just trusted them, I'd be out of business just like the dead company in question.

I'm not sure what's so confusing about "don't trust your vendors" but I've had to make this exact same reply way too many times.

Don't trust your vendors!

> But it wasn't, because every night I run a PG backup and copy it to AWS S3

Which is probably much easier on DO than AWS. For whatever reason AWS seems to go out of their way to make it as hard as possible to back up your RDS data to a separate AWS account.

Great point. I always run a separate backup through bash scripts that sync with an S3 bucket, even though I also have DO backups enabled. You always want two separate offsite backups for any real-world production application.

If you don't mind me asking, what is the preferred method to backup to S3? Is it possible to scp or rsync a mysqldump to S3 or do you install aws tools on DO and run aws s3 cp as a scheduled job?

I use the awscli tools and aws s3 cp. I don't think it's possible to scp or rsync directly to S3, at least last I checked.
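To answer the mysqldump part concretely: you can skip the intermediate file entirely, since `aws s3 cp` accepts `-` to read from stdin. A cron entry along these lines works (database and bucket names are placeholders; note that `%` must be escaped as `\%` in crontabs):

```shell
# /etc/cron.d/db-backup -- nightly dump piped straight to S3 at 02:00.
# "backup" is the user the job runs as; mydb and the bucket are placeholders.
0 2 * * * backup mysqldump --single-transaction mydb | gzip | aws s3 cp - s3://my-offsite-backups/mydb-$(date +\%F).sql.gz
```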


Not OP but I just use shell scripts. You could scp or rsync if you mounted s3 as a drive to your host, but that is not recommended as far as I know

I recommend looking into restic and/or rclone

check out restic which does dedup backups to s3, very easy to use
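A minimal restic-to-S3 sketch, for the curious (all values are placeholders; restic reads its repository location and credentials from the environment):

```shell
# Placeholder credentials -- use real keys and a strong passphrase.
export AWS_ACCESS_KEY_ID="AKIAEXAMPLE"
export AWS_SECRET_ACCESS_KEY="example-secret"
export RESTIC_PASSWORD="choose-a-strong-passphrase"
export RESTIC_REPOSITORY="s3:s3.amazonaws.com/my-offsite-backups"

restic init                                           # one-time repo setup
restic backup /var/lib/myapp                          # encrypted, deduped snapshot
restic forget --keep-daily 7 --keep-weekly 4 --prune  # expire old snapshots
```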

Cloud or not, you have to trust your vendor to deliver the service you paid them for. That's the whole reason for doing business with them.

The reason for doing business with them is because they say they will deliver the service you paid them for. But if they don't, what then? You can switch to another vendor, but without a backup you're starting from scratch.

You should hope they deliver the service you've paid for, but you should never trust that they will. Always plan for failure. That's the entire point of this whole "DO killed my company" saga.

I trust UpCloud more than other VPS providers because their (regional) technical support has been very approachable and you know "who" they are. Besides, they promise a 100% uptime SLA for when your business is critical.


Do you not have a mirror copy of everything somewhere else? Or is that not possible in your case?

I only run one server, because like the dead company in question, I don't have the money or manpower to do a full high-availability solution. Just a database backup and my site's code in Gitlab, so DR failover is a matter of 'git clone' and 'pg_restore'.

It might take me a few hours to get back online and piece my docker containers back together but my customers can absorb a few hours of downtime as long as it's a very rare occasion. If they couldn't, I'd charge them enough to afford a real HA solution.

I am old enough to have lived in soviet times for a short while. One thing I remember from back then - in order to get things done, you had to know someone. Perhaps your aunt's friend worked in the politbureau or your mom's classmate was a friend of the director. Through connections like that you could get what you needed.

It was such a relief when times changed and regardless of who you were, you could start exchanging money for things and services.

This story and many others like it remind me of those times - it's back to who you know to get your account unlocked. I've never gotten a story on the front page of HN and probably have 5 twitter followers. What's my avenue of getting my data back?

This isn't just about DO either. Similar stories about google and other services are many to be found.

I'll toot my own theory and it is that nobody wants to pay for enough capable support staff, pay to keep the support staff at the ready often enough, pay support staff who want to stay in that role.

They want to automate that all away as much as possible.

The future is everyone who isn't somebody chatting about how they run their application on X... because they're mysteriously banned from Y and Z, and the next guy talks about how he was banned on X and is on Z now... simply for that reason.

I would support your theory. Support staff don't scale as well or as fast as the technology they're supporting does.

I have often wondered about that for Google. Sure, they have bucket loads of cash, but could they even feasibly stand up a large enough support staff to handle all their platforms? They have so many services, across dozens of languages, and serve hundreds of millions of customers in different timezones.

Also, why would they want to? They're saving untold amounts of money by pushing the problem onto the consumer, and if it works 99% of the time then it's probably good enough.

> Also, why would they want to?

One of these days, they'll fuck up big time for many customers and get sued. They'll survive, but it'll cost a lot of money.

Especially for Google (and other life-or-death services for many people), the solution seems kind of simple: charge for support. Google terminated your GMail-Account because you logged in from Turkey? Pay $50 to get somebody to listen to your story and work with you on proving your identity. Would you rather change your email on all your accounts or pay $50?

There is Google Apps which is a paid service but most people gravitate to the free stuff.

Paying for the product isn't the issue, the company in the DO debacle was happy to pay - support is the issue. I pay Amazon to send me stuff, but I'd honestly be happy to pay them $5 to have my emails to support read & replied to by a person that doesn't have to rely on a low tier auto translate.

I understand that support is expensive, and especially for ad financed or cheap products, having an agent look into something can quickly cost more than you'll ever make off that customer. If the customer pays for support, support becomes a product and the company doesn't have to treat support as a profit-killer (that is, automate it, make it annoying for the customer so he avoids it, and staff it with the cheapest available labor creating a high fluctuation because of bad working conditions).

> the company in the DO debacle was happy to pay

Best as I can tell they are an early-stage startup surviving off startup credits from DO. Also, $5 doesn't go far towards support costs. After a minute or two of troubleshooting they are already losing money.

The problem is, if Google is shutting down access for you, how will you use the support provided by Google Apps? It's a support chat you have to be logged in to use...

This happened to me. Rather than have an email address that only works in one country, I stopped using the email address.

It was a pain.

> Pay $50 to get somebody to listen to your story and work with you on proving your identity.

Personally, I would understand this as extortion.

Doesn't have to be $50, but a small fee might make sense to keep their support from getting clogged with stuff that could be googled. Some sort of "Rescue Me" emergency flare option.

It's billing for solving problems they have caused.

This is completely not OK. If the idea is paid support for random problems, that's fine; if it's higher prices that include support, that's also fine. But if any company caused me large damage and then asked for money to reevaluate their actions, I'd go to the police.

If they did it willfully, sure. But even then, I'd still prefer paying them $50 than losing access to basically everything.

>I'll toot my own theory and it is that nobody wants to pay for enough capable support staff, pay to keep the support staff at the ready often enough, pay support staff who want to stay in that role.

The crazy thing is that it's really not that expensive. AWS, for example, offers 24x7 support w/ <1 hour response time for production issues starting at $100 a month. At worst it's 10% of your bill.

It's very doable, but sadly even skilled management (in my experience) tends to avoid / flee support organizations due to their low prestige and low resources, a pattern seen across a lot of industries.

I fear it is something very doable, but I'm not sure there are many in leadership who can pull it off.

Hi treis - All DO customers have always received free, 24x7 Support. Of course, we're working on being as responsive as our customers deserve. A few months ago we also implemented a new paid Premier Support tier which features a live channel with 30-minute response times.

Here's some more information for anyone who's interested - https://www.digitalocean.com/support/#PremierSupport

Thanks, Zach, DigitalOcean Support

I've heard people observe how many support lines have transformed into these impeccably polite but completely unempowered people, which is more frustrating than being on hold or getting stuck in an automated menu loop: their protocol deflects any anger away from that person (which isn't fair in any circumstance) and leaves you helpless.

> nobody wants to pay for enough capable support staff, pay to keep the support staff at the ready often enough, pay support staff who want to stay in that role

It's not about money. I worked at a unicorn that paid very high wages for support staff AND allowed them to work remotely and asynchronously from anywhere in the world. If you lived in Southeast Asia or Eastern Europe you'd make more than a local doctor just answering emails.

We still simply couldn't hire enough halfway intelligent people fast enough to keep up with the user growth. For each support person we'd hire, there'd be 10,000 new customers joining the same week. "Automating that all away" was the only tractable way to respond to people at all in a reasonable time frame. Obviously the support quality was awful.

I think it is about money... but also skilled leadership who understand how a good support organization works, how to find people and retain them.

I suspect though that due to the general trend of looking at support as a "cost", most skilled leadership has moved on or just settled for poor support practices, etc.

Interestingly in my experience support teams that operate outside the home country of the company OFTEN have massive turnover issues, more than say domestic (wherever domestic is). There's a gap there that just never seems to fill in completely.

There is also something to be said for managing support in the sense that you don't have to talk to every 10,000 customers ;)

Hey terryf - Zach here from DO. If you ever encounter an issue that you don't feel is getting the proper attention feel free to reach out. We constantly engage with developers on social, which is primarily Twitter, and we don't look at follower count to determine who to reply to. If for whatever reason you don't get the help you need, you can always email me directly (first name at).

Thanks, Zach

Yeah, support by social media seems to be the trend now; those who are loudest and have the largest online followings get support by shouting at companies on Twitter, everyone else is just basically told to pound sand.

Then again, it's not just support for services. Even abuse reports seem to be taken a lot more seriously if it's some online 'influencer' at the receiving end rather than an average Joe.

This isn't a DigitalOcean option (I don't think anyways, I've never seen it), but on AWS/GCloud/Azure you can absolutely exchange money in order to get faster support SLAs, or even AWS solutions experts hanging out in your slack channel.

Of course, people who don't have money are forced to deal with greater hardships, yet that's always been a footnote to "regardless of who you were, you could start exchanging money".

Greetings, comrade colleague! I had the very same thought: however big Google (or DO or anything else) is, you still had to know someone just to be safe from the very service you use.

<completely my theory>

Personally, I blame the decoupling of the dollar from the gold standard and the distribution of newly minted dollars from the Federal Reserve into the well-connected via banks and corporations that are controlled by a small social segment that all attend the same schools.

Once money became a thing that doesn't cost anything but the changing of zeros, growth becomes a question not of how to produce something to get money, but of how to get both the connections and the pedigree needed to receive cheap dollars.

This has created money silos, where the US aristocracy will take hundreds of billions in losses to capture a market and then extract value in monopolistic ways.

You can see this with google, facebook, amazon, etc.

It wasn't always the best companies that won. It was the best companies that had access to the vast capital pools created out of thin air and who could promise to operate at the monopolistic scales the monied classes were aiming for from the beginning. That is why Ivy Leaguers (whether dropouts or not) were chosen as the princelings. They are people who have a lot committed into the system and wouldn't dare betray it: they can be counted on to take things to their logical extreme.

I think also pertinent is the locking out of the middle and lower classes from the growth phases of company creation (incentivizing angel investors and delaying IPOs, plus legally enforced discrimination against investors based on social class). Oh, and the pooling of legally stolen funds (pensions) into 'safe' stocks. Not to mention the legalization of bribery, which has further accelerated our current state of legislative capture.

</completely my theory>

>"This has created money silos, where the US aristocracy will take hundreds of billions in losses to capture a market and then extract value in monopolistic ways."

I was curious about the above sentence. Who is the aristocracy in the context of startups? Are you referring to the VCs? If so, don't the VCs not care about the "class" of a startup's founders, as long as they think there's money to be made from the company?

You are right, they don't care. If they did, someone would take their place. It's a system, not a conspiracy.

But an Ivy-graduated person is a class of person that is highly committed to the system; there's a massive effort a whole family has to make for someone to get there.

If success is based on capturing the most market share the quickest so you can corner the market and extract value via monopolistic methods, then why go with anyone who could be a risk?

Edit: Re who is the aristocracy? There is no exact line. Society works on a gradient of privilege, from those who have access to resources with the least effort, to those who have access to resources with the most effort. This always happens. But I believe it is aggravated when money doesn't cost anything to produce because risk is then more important than reward.

Arguably Raisup should have been better prepared for this eventuality. But I suspect that many of the people here talking about how their system is infallible made it that way because they had time and/or money to spare working on it, have a very portable system, or have had something like this happen before.

I feel like HN comments are often sanctimonious to the point of ignorance. A solo dev, scrambling to get his project to the point where there is even the tiniest chance it will succeed, is likely to have a complex network of hardcoded filepaths, hostnames, and other magic numbers, strings, and config files that would make it very difficult to make portable without significant time and effort.

Or maybe the guy was an idiot. But possibly entertain the thought that maybe his situation wasn't exactly comparable to yours, eh?

> Or maybe the guy was an idiot. But possibly entertain the thought that maybe his situation wasn't exactly comparable to yours, eh?

I have never encountered a business that does not require backups. Not having a DR site as early in the game as they were might be excusable, but even then suggesting they get one is sound advice. It's an additional expense but it has a cheaper implementation cost while your infrastructure is small. Having a good DR plan is a selling point as well, this is the first thing I ask SaaS providers when I am evaluating them.

What is a total failure is not having backups on another provider. Before anyone says it's just a 2-person startup... surely that makes it easier? I can't imagine their DBs are all that big. Even just a three line cronjob and a Backblaze B2 ($0.005/GB) account could have secured their business continuity in the event their DO account didn't come back up.

>But also DO Spaces for our object storage with ~500k media and our database backups.

Based on that comment (from @w3nicolas on Twitter) they actually could have used the same script they used for DO Spaces to back up their DBs to S3 (both use the S3 protocol). The 500k image files should be relatively small and cheap to back up too, appears they are just logo thumbnails for the tracked companies (500k image files, copy says they track 450k companies).
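To make the point concrete, the off-site copy really can be a tiny cron script. A sketch below, with the bucket, database, and endpoint names as placeholder assumptions (not RaisUp's actual setup); the `--endpoint-url` flag is what lets the same `aws` CLI talk to any S3-compatible vendor:

```shell
#!/usr/bin/env sh
# Nightly Postgres dump shipped to a *second* vendor's S3-compatible bucket.
# All names here (bucket, key prefix, env vars) are illustrative assumptions.
set -eu

KEY="db-backups/myapp-$(date -u +%Y-%m-%d).sql.gz"

if [ -n "${DATABASE_URL:-}" ]; then
  pg_dump "$DATABASE_URL" | gzip |
    aws s3 cp - "s3://offsite-backups/$KEY" \
      --endpoint-url "${S3_ENDPOINT:-https://s3.us-west-000.backblazeb2.com}"
else
  # No credentials configured: just show what the cron job would run.
  echo "would run: pg_dump ... | gzip | aws s3 cp - s3://offsite-backups/$KEY"
fi
```

Dropped into cron (`0 3 * * *`), that's the entire insurance policy against dying with your provider, and it works unchanged against DO Spaces, S3, or B2.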

If you are building infrastructure and you don't have IT Ops experience consult somebody who does because they will save you from yourself in scenarios like this. If you are a dev you could probably throw a rock and hit 5 friends with enough ops experience to tell you the importance of redundant backups.

I think this comment might read a bit more abrasively than I meant it, but I don't mean to kick the guy after he was down. If you are in a similar boat to RaisUp with your backups go fix it now. If you rely on any one vendor for business continuity that is a problem. This PSA is sponsored by salty sysadmins everywhere.

> I have never encountered a business that does not require backups.

Me neither. But unfortunately, most of the ones I've encountered either don't have truly full backups (stuff is missing), don't have an actually working backup process (some issue with backup media, process, etc.), or have no idea how to actually restore if something does happen.

Almost no one does drills to restore backups to some test system.

I can't talk about the other cases, but I can tell you about one small startup I joined long ago. The worst one. They did backups on a 3-disk RAID-5 server... with one disk broken and another showing SMART warnings. Their backup process also failed to back up anything with overly long path names, so in reality almost half of the data was missing! There was also some Unicode issue that lost files with names containing characters above ASCII 128.

My first days went to just actually ensuring their data is backed up...
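A restore drill doesn't have to be elaborate either. A minimal sketch (bucket, database, and table names are assumptions) that at least proves the latest dump exists, decompresses, and loads:

```shell
#!/usr/bin/env sh
# Minimal restore drill: fetch the newest dump, verify it decompresses,
# and load it into a throwaway scratch database. All names are assumptions.
set -eu

DUMP="myapp-$(date -u +%Y-%m-%d).sql.gz"

if [ "${RUN_DRILL:-0}" = "1" ]; then
  aws s3 cp "s3://offsite-backups/db-backups/$DUMP" "/tmp/$DUMP"
  gunzip -t "/tmp/$DUMP"                        # archive is at least readable
  gunzip -c "/tmp/$DUMP" | psql "$SCRATCH_DB"   # and actually restorable
  psql "$SCRATCH_DB" -tAc 'SELECT count(*) FROM companies;'  # sanity query
else
  echo "set RUN_DRILL=1 to restore $DUMP into a scratch database"
fi
```

The point isn't the script; it's running it on a schedule, so the first restore you ever attempt isn't during an outage.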

> I feel like HN comments are often sanctimonious to the point of ignorance. A solo dev, scrambling to get his project to the point where there is even the tiniest chance it will succeed, is likely to have a complex network of hardcoded filepaths, hostnames, and other magic numbers, strings, and config files that would make it very difficult to make portable without significant time and effort.

No, no, no. That's the "well, it's never happened before, so why should I have planned for it?" attitude. If you are in a solo business, it is the _most_ important time to pull out your SWOT chart and be honest about threats and weaknesses. A solo company is literally the worst time not to consider weaknesses, as you have the least amount of assets to cover calamity. His primary concern right now is to be able to reproduce his work on AWS or Azure. Period.

They've been in business for at least five years. They also decided to make a lot of noise about their Fortune 500 clients (multiple). If they're not well past "scrambling to succeed" then they failed on the business side as much as they did the technical side.

You don't have to actually have redundant providers and monthly-tested DR runs, etc. But you do need a plan. No matter how small.

If you are just a side project running locally, then the plan may be as simple as "If my laptop falls in a river, get a new one and clone the repo again." As you start having production systems, it may still be "clone the repo again", but this time to actual servers.

As you start having customers rely on you, your plan should get more robust. Maybe you don't have a hot DR site set up where you can just flip a switch. But you should know who your backup host would be, and have an account ready with them. You should know what steps would be needed to go from repo -> new provider.

None of this needs to be set up and tested ahead of time if you are just getting started. But if you have paying customers, you have to have thought about it. DR starts tiny and scales up, just like everything else.

> Do have some backups outside of your primary cloud provider. You'll sleep better.

This is really the takeaway from this whole incident. Even if the account doesn’t get banned, in most cases you’re just one accidental action away from your database and backups disappearing simultaneously. Or if you are using a blob store, there may not be any backup at all once the original is deleted (it’s like RAID, not backups).

And have a separate DNS provider.

No reason to.

If your cloud provider does your DNS and you switch to a different cloud provider...

...then just set your registrar to point to the new DNS provider. You've still got total control.

Changing nameservers can take up to 48 hours. Things usually settle down after the first few hours, but a few odd users will continue to be directed to the old nameservers well into the next day. This is exactly the kind of intermittent, hard-to-diagnose issue that you don't want to have to worry about in the middle of a crisis such as "Digital Ocean killed my company."

Changing A records, on the other hand, can take as little time as you want depending on the TTL value.
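The practical upshot is to lower the TTL well before any planned cutover, since resolvers keep serving the old answer until the previously advertised TTL expires. A rough sketch of the timing (the values are illustrative, not prescriptive):

```shell
#!/usr/bin/env sh
# If the A record's TTL is currently 3600s, resolvers may cache the old
# answer for up to an hour after you change it. So: lower the TTL first,
# wait out the *old* TTL, then repoint; stragglers then lag at most NEW_TTL.
set -eu

OLD_TTL=3600
NEW_TTL=60

echo "1. set the record's TTL to ${NEW_TTL}s"
echo "2. wait at least ${OLD_TTL}s for cached copies of the old TTL to expire"
echo "3. change the A record; cutover is now bounded by ${NEW_TTL}s"

# Optionally check what resolvers currently see (third column is remaining TTL):
command -v dig >/dev/null 2>&1 && dig +noall +answer example.com A || true
```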

That transition requires authorisation from the original provider. If they locked your account and don't pick up the phone, you won't get the move approved and won't be able to repoint your urls to a different cloud.

The GP's cloud provider is the registrar.

It's not common to use those two words as synonyms. Although it is common to use the registrar as a cloud provider, it may not be a good practice either (depends on the registrar).

No it doesn't. Name servers are controlled by the registrar not the DNS provider.

By provider I mean the registrar (which typically provides both services). Moving to another registrar requires an authorization code, and good luck getting that on a short notice if your provider doesn't talk to you.

That's wrong.

More about the record tables probably

Absolutely, your domain name might be your most important asset. It’s silly to put that in the hands of a party that you might have to fight. Especially as there are hundreds of alternatives.

Does that really help? If your DNS provider screws it up, you're in deep trouble regardless of whether they were the same guys hosting your VMs or not.

In the original article this references, DO shut down/locked his account.

By having an alternate provider you spread your risk. And in this scenario you could update your DNS from the locked DO account to AWS/Linode/GCP, whatever.

That's when you go to your registrar (you do have that separated off too, right?) and change your nameservers

Digital Ocean's statement on the events:


Lots of support related issues here.

As someone who worked in support for a long time, it isn't surprising to see this play out. The whole "Support and Security Operations leadership will create new workflows to allow abuse-related events to leverage the 24/7 structure of Support," while probably a good action to take, is one of those things you see in support time and again: rarely an appropriate response so much as a patch for "this one thing won't happen again".

Nobody cares to staff, fund, and support the support teams until something goes wrong, and then it is usually "new workflows" for support.

When I joined Digital Ocean, back when it was 2-3 months old, I hit a KVM bug in their stack with my network traffic that caused them to rate-limit my droplet to 1Kbit/s, because it kept taking the host machine down I guess(?), and then ultimately terminate it without even a ticket asking me to investigate. That was enough to pretty much never consider using Digital Ocean for anything “production” ever again. They didn’t terminate my account, and apologized with credit, but wow.

Another thought: That was also back when support responded in a reasonable time frame, and they actually spoke on IRC and everything too.

I've got a stable setup:

  Vultr ========================== Linode
  Los Angeles ==================== Dallas
  Mastercard 1234 ============= Visa 4567
  VMs with custom Linux (ROOT/ZFS Debian)
  Replicating snapshots every 5 minutes
  CloudFlare + HAProxy Load Balancing
Total cost: 6 VMs at $20 each = $120; CloudFlare $20

Edit: backups to B2 and Wasabi. ~$10 month
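For the curious, the 5-minute snapshot replication above can be done with plain `zfs send`/`zfs recv`. A sketch of one replication tick; the dataset and host names are assumptions, and the bookkeeping of which snapshot the peer last received is omitted (tools like sanoid/syncoid handle that in practice):

```shell
#!/usr/bin/env sh
# One replication tick: snapshot, then incrementally send the delta to the
# peer. "tank/app", "replica", and the snapshot naming are assumptions.
set -eu

DATASET="tank/app"
STAMP=$(date -u +%Y%m%d%H%M)
SNAP="${DATASET}@repl-${STAMP}"

if command -v zfs >/dev/null 2>&1; then
  zfs snapshot "$SNAP"
  # Incremental send against the last snapshot the peer already holds:
  zfs send -i "${DATASET}@repl-last" "$SNAP" | ssh replica zfs recv -F "$DATASET"
else
  echo "zfs not installed; would snapshot $SNAP and send the delta to the replica"
fi
```

Run from cron every 5 minutes, the incremental sends stay small because only blocks changed since the previous snapshot cross the wire.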

Where's your DNS?

And good work on the credit cards -- too many people forget that point of failure.

Add a local backup. Then again, I'm biased :D

Are you using a managed host for HAProxy? What if it goes down?

I have a failover setup that changes Cloudflare DNS when necessary.
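A failover like that can be a single authenticated PUT against the Cloudflare v4 API. A sketch below; the zone ID, record ID, hostname, and standby IP are all placeholders, not real values:

```shell
#!/usr/bin/env sh
# Repoint an A record at the standby provider via the Cloudflare v4 API.
# ZONE_ID, RECORD_ID, the hostname, and STANDBY_IP are placeholders.
set -eu

ZONE_ID="023e105f4ecef8ad9ca31a8372d0c353"
RECORD_ID="372e67954025e0ba6aaa6d586b9e0b59"
STANDBY_IP="203.0.113.10"
PAYLOAD=$(printf '{"type":"A","name":"app.example.com","content":"%s","ttl":60,"proxied":true}' "$STANDBY_IP")

if [ -n "${CF_API_TOKEN:-}" ]; then
  curl -fsS -X PUT \
    "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records/$RECORD_ID" \
    -H "Authorization: Bearer $CF_API_TOKEN" \
    -H "Content-Type: application/json" \
    --data "$PAYLOAD"
else
  echo "no CF_API_TOKEN set; would PUT: $PAYLOAD"
fi
```

With a 60s TTL (or Cloudflare proxying, where the edge IPs never change), a health-check script calling this moves traffic to the standby within about a minute.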

Pretty much the standard "the cloud is just someone else's computer" issue: if you don't have the ability to reproduce your work somewhere else and your provider decides to go away, you're going to have a Bad Time(TM).

Also, for as cool as Digital Ocean is, their primary focus is on low-use, shared cloud resources. From my experience, they over subscribe CPU resources so "noisy neighbors" was a problem sometimes. They do not provide or work with people as if they have production services, and they don't seem to like people who really want to use their system. I would never use them for production unless the work was ephemeral.

What scares me the most these days is that a lot of vendors have a policy of "we're shutting you down, but we won't tell you why or how you can fix things so that it doesn't happen again".

I have had friends who have had their Twitter or Youtube accounts shut down with no explanation from the companies involved. Just recently, MailChimp shut down a brand new account for my SaaS (we had only 4 subscribers added, and hadn't sent out any email campaigns yet!) with a refusal to give us an explanation why. We set up another account and did everything exactly the same way (including tediously replicating lots of email templates manually) without the same thing happening. Why was the first account shut down? We will never know, it seems.

I am all for using AI to detect dark patterns and raising a flag, but I think these companies need to bring a more human aspect into the after effects. Surely a legitimate operator will always email in with a "WTF?" whereas a nefarious actor will simply automatically move on to the next thing?

Hi cyberferret. Two quick tries to provide some clarity for you here:

1. On your point that vendors don't say why they took action: I think a lot of vendors, us included, worry about bad actors learning the _how_ of our algorithms, because they will then have intelligence on what to work around to avoid detection. It's a constant adversarial chess match: we adjust, they adjust.

2. As for the human element being needed for the after-effects, I totally agree. That said, I looked at our numbers, and while many of the nefarious actors simply move on to the next thing, a very large percentage of the "WTF" replies are actually from bad actors hoping that is enough to be re-enabled and to keep going for a while. Our goal is to bring more people into the after-effects and give them better information to make decisions, and to work with customers to avoid this kind of mess in the future.

I appreciate you giving an insight into the reasons behind your decisions Barry - I've learned a bit here myself, and can relate to the quandary that you must have to tiptoe around, in between being open and communicative, versus keeping your cards close to your chest. Thanks, and best of luck.

OP here. A dude in the devops Slack channel had a great comment on the multi-cloud thing. "People can't even get redundancy and failover working on one cloud, why bother with two."

If you are running a business then always have a good disaster recovery strategy. Split services between cloud providers and make sure your backups are good and offsite. You can do all this with open-source software if you don't want to pay for anything, too (and if you have the staff that can do it).

If your business can be "killed" by your provider losing your data, you have set yourself up to fail. I am not saying the provider is not to blame, but the burden should also be on the customer for not protecting themselves. What if DO burnt down?

We have yearly DR exercises, we have to bring up about 40 of our 120 production instances, it is a difficult task, but we know we can do it.

Never depend on your provider to save you; always make sure you have planned this out.

I really liked their blog response. I logged a support case about the incident, and a rep personally called me and spent 30m on video chat with me answering my questions. I don't think that would _ever_ happen from GCP.

> The initial account lock and resource power down resulted from an automated service that monitors for cryptocurrency mining activity (Droplet CPU loads and Droplet create behaviors).

So despite the fact that you can get the same or more resources at other providers who advertise their shared CPU resources for less than half that charged by DO I’m still buying a shared resource?

I think this is something that needs to be much more front and center. With AWS I can spin up as many “boxes” as I want on a moment’s notice. As far as anything on DO’s site seems to advertise I can do the same.

Turns out that’s not so. If they wanted to shut down an account because they suspected it was compromised then that’s one thing. Shutting it down simply because of suspected cryptocurrency activity... not so much. If I’m willing to pay for it then that’s what I’m willing to pay for and there should be no limitations.

Now that becomes a little more clear as you read...

> determine if automated action is warranted to minimize the impact of potential fraudulent high-cpu-loads on other customers.

So clearly we are not talking the same resources that were expected.

> So despite the fact that you can get the same or more resources at other providers who advertise their shared CPU resources for less than half that charged by DO I’m still buying a shared resource?

The account that got locked down triggered automatic checks that used payment history as a "these people are okay" check. They didn't have a payment history, they were running solely on credits, so got flagged. IOW, they hadn't paid for anything yet.

> With AWS I can spin up as many “boxes” as I want on a moment’s notice

You can, but they can also shut you down for "abuse". https://aws.amazon.com/premiumsupport/knowledge-center/aws-a...

You have 24 hours to respond. And why would you run a business and not be on at least their Business support plan?

That's a very good question for their CTO. Maybe we should ask him on Twitter.

I’m referring to AWS’s policy compared to DO’s.

If he were using AWS or Azure and he had the same set of issues, I wouldn’t blame him at all. But he is using a third-rate cloud provider because it was cheaper, and then he acts surprised that they don’t have the same level of competence and support as AWS or Azure.

Yeah I purposefully left out GCP. I wouldn’t trust their support anymore than DO.

> Why isn't everyone on bare metal?

Is this how we're referring to running on servers we have contractual control over these days? Part of managing risk is being sure your infrastructure will stay there, and if you don't have a contract, your risk is significantly higher.

Something tells me it scares DO too because I've been getting emails lately asking if I'm happy with their services.

It feels like the need for multi-cloud designs and infrastructure are becoming more sensible. If costs are best with one provider, by all means put 95% of your traffic there... But keeping 5% running smoothly on a second cloud, with a simple means to rebalance between them would be a huge win. It would mean the difference between an annoyance (argh, one of my cloud providers doesn't work) and a disaster (my sole cloud provider just ended my company). It would also save you the next time AWS or Google's cloud have an outage.

Are there any good resources for deploying multi cloud architectures?

For a business, especially a startup, the opportunity cost often makes multi-cloud infeasible. You'd rather spend that engineering time on delivering features or addressing tech debt. I'm of the opinion that multi-cloud doesn't make much sense if your monthly cloud spend is less than at least $500k.

Pick one of the strong public cloud providers such as AWS, Azure, or GCP and then be judicious about which PaaS offerings you use to avoid unnecessary vendor lock in unless it adds enough value to justify the lock in.

I’ve asked this in a previous thread but didn’t get much of an answer: what can a business do to proactively fend off issues with cloud providers shutting them down?

I’ve heard GCP (or was it AWS?) is less likely to shut you down if you bill on account vs using a credit card.

Are there any other things we can do to reduce problems?

Would asking for an account manager help? Would buying reserved instances help?

Is there a minimum spend that gets you more attention and priority? Eg 100/month vs 1k/month vs 10k/month etc?

Any other advice?

- Use billing methods which are less likely to be used by scammers (e.g. likely not credit card).

- Keep a steady balance.

- Have luck (sad but true).

Minimum spend and similar might help with customer service (if they can see your spend). But most problems start with automated services.

Don't switch cloud providers every half year because another one is slightly cheaper now (i.e. have longer-running, established accounts); that might help.

But most importantly, make sure that you don't get locked in by the cloud provider.

Even a permanent ban from the cloud provider should cause only some limited short-term (money) loss. This means:

- Do not ~use~ rely on the cloud provider's proprietary tech (sure, use their management console, etc., but don't integrate it into your business too tightly).

- Have control over your DNS names so that you can map them to different IP addresses if needed.

- Make sure you can migrate to a different cloud provider in a matter of hours/days/weeks (depending on what you use it for) at any time.

- Have backups outside of the cloud provider for all your stuff, and don't get trapped by doing backups on a different service of the same provider (e.g. Amazon) ;=)

In the end, how much this costs you depends a lot on what you do and how you do it. And yes, deciding not to make any continuous short-term investments and accepting the risk of losing everything is fine; you just should make that decision consciously.

Some good advice, thanks

I made this comment in an older thread but haven't gotten an answer on... the circumstances seem off to me:

The thing that has me scratching my head is how this chain of events unfolded.

I get that your fraud algorithm flagged it because of the lack of established payment. But how is this possible if, as the tweet put it, it meant "locking us out of all of our backups and work"? Surely an account history of any significance would have an established payment record. From their tweets they mention that they had 5 droplets and storage for a not-insignificant number of records (~500k), and that a script needs to run every 2-3 months to process some data, spinning up 10 droplets while it does. Seems like it would take 13 hours to process the data based on row count and per-record time.[0] I am struggling to see how they didn't have payment history. Can you elaborate?

In addition another thing I'd think would help assuage fears of a complete lockout is some process where you can request and download the db or a snapshot of the virtual machine.

[0] https://twitter.com/w3Nicolas/status/1134529322902007809

Hi weaksauce... sorry I must have missed your earlier question.

The account had been live for some time and in that sense had history but because of credits it didn't have payment history. As some others have commented lots of startups use credits to get their business going and depending on your usage they can last you for quite a while. Payment history indicates a willingness and capability to make payments.

Part of the issue here was what triggers the algorithm used when looking at remaining credits, payment history (none), workload deltas (the new spin ups), and effective run rate (think of that as the amount of money they would be charged for the workload they were spinning up). The bug in this case was both simple and super impactful. Raisup did nothing wrong, everything right in fact. We just blew it.

Thanks for the comment on request for download of backups or snapshot. That is a great idea, I guess we just never expected to actually go shoot a real customer and the fraudsters don't ask for their data.

I appreciate the response. A follow-up question to that would be: how did they get enough credits to be running for that long without any payment?

DO has a startup program called Hatch: https://www.digitalocean.com/hatch/

That is the starting place for many folks.

So a startup was getting their hosting entirely for free, their paying customers were Fortune 500 companies, but they didn't have the money to pay for off-site backups.

What the hell was their cost then?

There is a heck of a difference between "eh, it only costs $5-$10" and "the survival of my project relies on the site/service being under my control". If it is the latter, you go in with a plan B and a contingency plan C.

GCP, AWS, DO, Azure... they all bank on that sweet combination of huge clients and clients who do not really need them but, due to opportunity costs, can't be bothered to find viable, self-reliant alternatives.

I recently tried to sign up for a DO account to deploy a proof of concept app for my startup, and my account was immediately locked within minutes of opening it. I hadn't even spun up an instance yet. That combined with this means I'll be sticking with AWS for now, I suppose.

I want to love DO so much. I think they really do mean well and their products are great, but yeah, this kind of stuff scares me shitless for mission critical workloads.

Don’t use DigitalOcean for anything serious. My team and I frequently support DigitalOcean customers with issues like crashed applications and failed builds, and in many cases the challenge is how they (DigitalOcean) provide resource allocations for droplets. 1 CPU != 1 CPU, which is vital for a VM to perform at its best. AWS and GCP are better; I’d argue that most IaaS platforms are better than DigitalOcean. You get what you pay for. (I’m not a customer.)

The one thing that the author forgot or didn't realize to mention is:

Test moving your environment between service providers in production.

You should be able to failover from one provider to another, in a predictable way. You should actually do so every few months so that you're well versed in what needs to happen. If you practice this, you will be prepared in case the absolute worst happens.

My account was banned two years ago just for accessing it from abroad. This is real, don't use DO.

They finally gave me temporary access even though they didn't believe who I was. This was the icing on the cake, they gave me access to a cluster, they could've given it to anyone.

It's cheap but seriously not worth it. They also have many more outages than other larger providers.

Any stories of being screwed by AWS in the similar manner?

That is, excluding the stories about the same account being used to sell things on Amazon.com and run an AWS setup. We know those are risky.

I think Scaleway and UpCloud are the two biggest contenders to DO. (Not sure why Linode is not getting traction; maybe it's their security incidents.)

Before I moved to DO I used Linode. My day job is security consulting; I switched from Linode because of their repeated security incidents.

'before switching to AWS completely'

Why rely on a single party? You are putting all your eggs in a single basket again.

> most appeals are true bad actors

Is that really the case?

Was there a post mortem by DO?

This was linked above


It doesn't explain any detail behind why there was radio silence until social media support stepped in, which is what I'm very curious about, but it does have a timeline of events and an apology.

I'm a one-man startup and I couldn't disagree with this more.

If you want to "play" startup, feel free to put all your eggs in one basket. If you actually want to build something meaningful and long-lasting, and you have Fortune 500 customers, take a breath and invest time in an infrastructure that can't be destroyed by a single rogue algorithm.

After all of the horror stories we've heard over the years, how is that time investment (and ability to sleep well at night) not worth it?

> take a breath and invest time in an infrastructure that can't be destroyed by a single rogue algorithm

As in? You can have your own cage, but somebody will have a cable that connects to the hardware you own. If they pull that cable because a single rogue algorithm told them that you're most likely abusing their service, that's it.

I do not have all the details, so caveat lector, but apparently it was due to spinning up a dozen droplets and running a batch job.

If this is considered “abuse” Digital Ocean is not a platform you should be running a business on.

If you can’t afford to pay for reliable big boy infrastructure, you might need to rethink your business plan or how much you charge.

All this Digital Ocean talk... I spun up some droplets to play with a few months ago and liked what I saw and then "deactivated" them or turned them off.

What I did not expect to see a few days ago was the bill for the stopped droplets. Not just a little for the dormant disks I had, but seemingly full price for the entire droplets, as if they were all still "active". Nonetheless I deleted all my droplets, discarding whatever I had set up a few months ago, out of pure frustration. I never plan to return.

AWS may take me a little more time to setup and their interface may not be as pretty, but they will at least bill me fairly.

I just tried to turn off one of my droplets and had to click "okay" on a message that said "You will still be billed for Droplets that are turned off because your disk space, CPU, RAM, and IP address will still be reserved."

If you clicked okay without actually reading the message, that is entirely your own fault.

While I don't remember any such message, after a few months it certainly felt wrong, and I've felt the sting. It'll be something I take into consideration whenever I shop around again.
