
I've had a DO mistake take down my site before, and when it was brought back up it had been reverted to several months prior. DO support was at a loss as to why this would have happened. I tried to restore from DO's backup service, but their backups had apparently stopped running several months prior as well. This was a major issue and could have easily been the death of my company, all because of a DO glitch.

But it wasn't, because every night I run a PG backup and copy it to AWS S3. I just had to download the backup from the other cloud vendor and restore it on my DO server.

Did DO fuck up? Yes. Did it cost me downtime? Yes. Was I mad? Yes, and I still am. But I still do business with DO because it costs ~half the price of a comparable EC2 instance, and writing a 10-line bash script to move my database backups to another cloud vendor isn't that hard. Storing that backup on S3 costs pennies per month.
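For reference, that "10-line bash script" can be as small as the following sketch. The bucket name, database name, and cron schedule are placeholders; it assumes pg_dump can reach the database and awscli is already configured on the droplet:

```shell
#!/usr/bin/env bash
# Nightly Postgres dump shipped to a second vendor (S3).
# Cron entry, e.g.:  0 3 * * * /opt/backup/pg_to_s3.sh --run
set -euo pipefail

backup_name() {
  # Timestamped object key, one per day
  echo "mydb-$(date -u +%Y-%m-%d).dump.gz"
}

run_backup() {
  # Dump, compress, and stream straight to S3 -- no local temp file
  pg_dump --format=custom mydb | gzip |
    aws s3 cp - "s3://my-db-backups/$(backup_name)"
}

# Only runs when invoked with --run, so the script is safe to source
if [[ "${1:-}" == "--run" ]]; then
  run_backup
fi
```

Using `--format=custom` means the dump can later be restored selectively with pg_restore rather than replayed as raw SQL.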

I don't see "I've never run a business before" as a valid excuse, nor do I see "the cloud vendor is better equipped to handle backups" as a valid excuse. When it comes down to it, you are solely responsible to your customers. They're not going to care whose fault it was, because it was your fault.

Don't trust any of your vendors.



> But I still do business with DO because it costs ~half the price of a comparable EC2 instance

I'm curious why people who don't have massive scaling and variability issues choose DO or AWS for their hosting.

For 34€/month, Hetzner will rent you a physical server (i7-6700, 64GB RAM, 2x512GB SSD, 1Gbit/s networking). That's a monster of a machine and can run most sites out there. And if you need more oomph and better reliability, rent three.

I looked at this many times and still check the pricing regularly, and it just doesn't make sense for me to switch from these physical servers to virtualized offerings, not in any foreseeable future.


Because hypothetically your instance should be hardware-agnostic. If the physical hardware dies, it should automatically migrate to another physical server at their data center without your intervention. It will only look like an unexpected restart from your perspective.

That's something worth paying for. It would be more comparable to two servers at Hetzner with rapid failover. But even that is more involved, since you have to set up the logic of when to fail over.

Auto-scaling is a benefit of "cloud" services. But it isn't the only selling point. Hardware abstraction is perhaps bigger.


That's wishful thinking.

If there's state in that virtual machine, it's probably either stored on the physical host or in a SAN. If it's in the physical host, it has to be fished out of that machine or restored from a backup. If it's a SAN, you can lose your virtual machine if the SAN goes down.

I've seen both happen.

Actually, a single machine with RAID is surprisingly stable. A provider like Hetzner can switch out a faulty disk in less than five minutes, or switch a faulty motherboard/power supply in less than half an hour.

Virtualization on top of this does not increase stability, in my experience.

Now, some cloud providers do have more sophisticated distributed systems that do not have single points of failure, and that's a completely different story.

Of course, the software itself is a source of correlated failures, so even there you should never rely on a single cloud vendor.

There's a poster here on this site who commented some months ago that he had for years rented three servers, each on a different continent, each from a different provider, and never had downtime. That's engineering.


I receive on average about one notification from DigitalOcean each month that there's a problem with the host for one of my VMs and there may be some downtime while they migrate it to a new host. Usually the downtime is about the same as a reboot, sometimes there's no downtime at all.

This works in part because both Xen and KVM hypervisors (and possibly others) support live migrations, so it's altogether false that virtualization does not increase stability. Both DigitalOcean and Linode use KVM behind the scenes.

So, at lower cost than rented hardware, customers get staff whose job it is to constantly monitor systems for hardware failures and deal with them proactively in a way that minimizes downtime.


>If there's state in that virtual machine, it's probably either stored on the physical host or in a SAN

I think the point is to write your application so that there's no state in the VM. My VMs are disposable, and in fact the way I do deployments is to spin up a new VM in DO and then assign it the floating IP for production. If things go haywire I can easily swap the IP address back to the known-good VM.
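A swap like that can be scripted with DigitalOcean's doctl CLI. This is a hedged sketch; the droplet name, snapshot image slug, size, region, and floating IP below are all placeholders:

```shell
#!/usr/bin/env bash
# Blue-green deploy by reassigning a DO floating IP (sketch).
# Invoke with --run; safe to source/lint otherwise.
set -euo pipefail

FLOATING_IP="203.0.113.10"               # the production floating IP
NEW_DROPLET="web-$(date -u +%Y%m%d%H%M)" # fresh droplet per deploy

deploy() {
  # 1. Boot a fresh droplet from a pre-built snapshot of the app.
  doctl compute droplet create "$NEW_DROPLET" \
    --image my-app-snapshot --size s-1vcpu-1gb --region nyc1 --wait

  # 2. Look up its ID once it exists.
  local new_id
  new_id=$(doctl compute droplet list --format ID,Name --no-header |
    awk -v n="$NEW_DROPLET" '$2 == n {print $1}')

  # 3. Point production traffic at it. The old droplet stays running,
  #    so rolling back is just assigning the IP back to the old ID.
  doctl compute floating-ip-action assign "$FLOATING_IP" "$new_id"
}

if [[ "${1:-}" == "--run" ]]; then
  deploy
fi
```

Keeping the old droplet alive until the new one is verified is what makes the rollback cheap.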


Are you using managed databases then?


I am. Mostly for the backup and other DB administration that comes with it. Also use Spaces (DO's S3 equivalent) for any assets that aren't packaged with the code like user uploads.


Future wish: all VPSes come with managed databases, object storage, and centralized logs as standard, so as not to rely on local disk at all. There would be a streaming backup protocol that allows any database or queue vendor to stream to any location and act as a peer for fully synchronous replication. If your DO Postgres, Kafka, and Mongo instances are killed, you have a Bakstream cluster in Linode which lost no data and allows you to restore in Vultr.


That's not wishful thinking. I did it from day one at my startup with GKE. You can blow up any of my servers, and a new one gets spun up with the Docker containers deployed while the load balancer routes around the glitch. All for only a few hours of work.

GKE has given me the infrastructure of a 10 million dollar company, with me as the IT guy doing 1h or less a week of IT work.

I would not be able to do this on a $35 Hetzner machine.


GKE (and GCP in general) is a special beast when it comes to this. They support online migrations of instances and it's nearly seamless.

In fact, it happens to instances all the time, probably including ones you have.

But if you're running your application on AWS EC2 instances, then in my experience it's no more reliable than a single chunk of hardware (there are special cases; some hardware is just crappy).


Right. I'm using stock Ubuntu LTS with Docker and my software is in docker images or otherwise easily deployable. Changes to the host system configuration are documented, so re-creating a host from scratch is not a problem.

Things like automatic failover and high availability in general are cool, but your SLOs should be set according to your business model, e.g. they need to make business sense.

I do understand how people like VPSs at $5 or $10/month (I've used one like that myself for years), and I do understand why companies that need massive scalability use AWS. But I do not understand why people in the middle (which I think includes a significant fraction of this audience) assume that a provider like AWS is the best solution.

Anyway, those are just my thoughts. I've been running a business on Hetzner servers for years now. And this is not necessarily a recommendation to use Hetzner: there are many providers which will offer low-cost physical servers. I think in spite of the cloud hype, these make sense for many applications.


I agree with you. Many applications grow linearly or not at all. I personally run a couple hundred servers, but it's all VMware, Chef, and clustered. Frankly, most just sit at a stupidly low load average, but I don't fuss over it because the spare cycles aren't wasted and I can have hosts and storage fail with impunity.

Many ways of doing it. Virtualization is basically all direct hardware access for the stuff that counts.

But I digress. I am guilty of overbuilding things. In part for fun, in part hubris, in part because I like sleeping at night.


I have several servers with Hetzner, including a failover setup. In the last 7 years I needed to use it 2-3 times, each for a very short time, maybe 2-3 hours. The worst accident was when two hard drives and the controller failed at once. Still, it was no problem rebuilding the array and getting the old system running. So practically speaking, when you look at the outages Google or Microsoft have, for me the total uptime over the years is comparable.


Most $10MM ARR startups are running on two medium T class aws instances behind a load balancer at $35/mo plus another small server for accessories (redis, message queue, etc) and then a sql backend. That's something like $200/mo for infrastructure, you never need to maintain the hardware and you have the option to autoscale at any time to support $100MM ARR.

Plus you get access to backups and all the other tools and products that are constantly being released. Not having to hire an IT guy at $1200/mo to maintain your physical server and tend to its networking rules is a significant savings over a developer-managed aws/gcp instance. Plus all the cloud interfaces are standardized, so you can just hire someone to come through and fix/upgrade your infrastructure.

Using a home-spun physical server means months if not years of undocumented tech debt for the next guy who comes along to have to maintain whatever kludges were installed long before he/she ever showed up. I just got done converting a bunch of physical servers over to the cloud and spent a year unwinding seven years of technical debt and now the dev/qa teams can actually spin up a new test environment in under 30 days (closer to 3 minutes).


You’re telling me that 10 million dollar “startup” companies can’t run 1 server without incurring “years” of tech debt? Sorry, that just sounds unbelievable.


Welcome to startup fallacies. You don't know what you don't know, and it's easier (but not cheaper) to follow a crowd.


They can, but let's be honest: people will be doing multiple roles in the very early stage, so your lead programmer will be configuring the server, probably won't be using a configuration management solution, and whoever takes it over is going to have to comb through the various configuration files to work out what it's meant to be doing.


$200/mo for a $10MM ARR company. Can you add more info here? What sort of app/company are we talking about here? Genuinely curious.


For production, $200/mo. Dev/testing seem to always cost 3x+ what production servers cost.

One was an IT management/automation SAAS company (I think they were $450/mo), another was backoffice SAAS company (I think they were $1000/mo but they were also hosting for an independent dealer of their SAAS so it was two production systems, and 10 years of tech debt), third is a live video consultancy SAAS that outsourced their video to a third party api service. None of those are actually $200/mo but realistically in the ballpark of under $1000/mo.

When I converted the company (which included 40 engineers and probably 12 qa engineers) to the cloud I got my hands slapped for converting the QA/test datacenter (not production) and the cost went from $3200/mo on bare metal to $6500/mo but we were able to squeeze that down to $4200/mo running about 10 always-on test/pre-prod environments and 2-3 on-demand test environments.

This is all enterprise space stuff where the utilization is low but the value of the service is high. If you're doing facebook for dogs and trying to turn a profit on ad revenue from high utilization it's probably not as effective/realistic.


B2B generally has a lot more cost in support / handholding, and B2C has much more cost in servers. The extreme examples are a B2B app with 100 or less total users vs an imgur clone with millions of users


I'm a bit late to the party, but Hetzner actually deleted all my VMs and backups. Their abuse team didn't like my name, despite me also having my name in my e-mail and my credit card info.

I got an e-mail that my account was flagged, but they didn't even give me a chance to tell them they had my name correct.

I responded to their email within 30 minutes, but everything had already been erased.

Luckily I had some stuff in github, but I will never use them again.


In our case, because it was easy to get thousands of dollars of free credit in Google cloud (the same appeared to be true for AWS). This went up to tens of thousands for Series A.

We had zero serving costs and would have continued to do so for years.

It makes perfect sense for Google/Amazon to incentivise startups to design themselves for their cloud.


>because it was easy to get thousands of dollars of free credit in Google cloud (the same appeared to be true for AWS)

Can you share how?



Because DO has their $5 and $10 / month plan, and that's really enough for me.



Nice, use Docker containers in it:

https://www.youtube.com/watch?v=z525kfneC6E


AWS Lightsail charges $3.50/month for a VM with 512MB RAM.

Hetzner charges €3.00/month for a VM with 2GB RAM.


The CPU allocations on Lightsail are anemic compared to DO.


Are they? They come out fairly comparably here. I think Lightsail is a T3 instance "under the hood".

https://joshtronic.com/2019/06/03/vps-showdown-digitalocean-...


Notice the memory and file I/O: huge gulfs in performance. The focus of my comment, though, was Amazon's restrictive leaky-bucket CPU allocation scheme: you can burn through your allocation of high CPU use and then get throttled to 5% of peak (IIRC) for hours while the bucket refills. DO lets you use much more CPU comparatively.


True. For intermittent use (less than 10% high CPU usage per period), though, it's equivalent, and comes with other advantages.


That's exactly it for me too. The Hetzner plan costs about twice as much as my monthly DO bill. Not to mention I'm US based and all of my customers are US based so hosting a site in Germany or Finland doesn't make a whole lot of sense.


For about a year now you can use their cloud product: https://www.hetzner.com/cloud starting at €2.50.

There are lots of "managed" features still missing, but that's acceptable for a still-young product.


And here I am a Canadian and my client base is Canadian. While not a problem just yet it is likely that PIPEDA/PHIPA would make it complicated to serve data from outside the country.

I am not a lawyer, but the advice I have been given is murky. It's much easier to stick with DO, which, at my usage, costs about the same number of $CAD as Lightsail would in $USD.


>For 34€/month, Hetzner ...

Hetzner is only available in the EU, specifically Germany. And my bet is that most of these people want it within the US.


I googled for "Hetzner USA" and found some similar offerings:

https://www.wholesaleinternet.net/dedicated/

https://www.nocix.net/dedicated/


I think most people go with DO not just because it's "affordable"; it has also been around long enough, and grown large enough, to be trusted. Both OVH and Hetzner are the same as DO in this regard.

The two listed aren't really well known.


Hetzner is also a company that really wants you to send them a scan of your government ID / passport. Meanwhile e.g. prgmr takes bitcoin.


My worst case from "hard disk died" to "getting access to the server again" was 12 hours. And no, that wasn't 1 out of 1 server.

I'd never again run anything business-critical on Hetzner metal. I'm not even running my IRC bouncer there, tbh. With their cloud stuff I'm really happy, though: higher uptimes than on most physical servers there.

Oh, and remind me again of the times we regularly called them about outages because our monitoring was better than theirs and their customer support people hadn't yet learned of the network outage...


Maybe because they had dozens of failures and DDoS outages? That's why they have to keep their prices so low: reputation. Price is the only attractive thing in services like this.


I came here to say this.

Why people continue to think you can get everything on the cheap with services like this, and not understand that you're putting your business at risk by trying to cut corners on hosting and other services, is beyond me.

I pay around $20-$25/month for Azure hosting for several e-commerce and mobile app clients I have. I told them up front you don't want to cut corners on this stuff, it will come back to haunt you later.

Sure, you'll save a few bucks year over year, but what happens in an outage? What happens when your clients can't order their stuff and you lose revenue for several days straight? Now the idea of saving a few hundred dollars a year evaporates as thousands of dollars are lost when your service provider takes a dump on you. Suddenly, it's not such a good deal anymore is it?


Because I can simply turn off everything and reduce my spend to storage alone. If I turn off that server, I still pay for it.


If you use a Lambda on AWS you get 1 million requests free each month, and after that fractions of a penny per invoke.


That's wonderful. Now you can use those requests to read from or write to SQS, do DynamoDB stuff, and play in the AWS walled garden, where things seem cheap at first but quickly add up.

You can then keep telling yourself you have stable, managed infrastructure, that you're cutting operational expenses, and that you're able to move fast, while the engineers in your organization work harder to fit things into generic services with shortcomings, making your product actually work in less-than-optimal ways.

Managed services feel like they are helping with operational costs but they have a cost when you're building your product.

I worked on projects that would have ended up being simpler and cheaper to operate on physical dedicated servers, but instead they are running on "ASG's that autoscale and have zero downtime" with "ALBs that send traffic to any and all hosts when all origins report unhealthy".

Things in the managed world are far less than ideal because one size doesn't fit all.


The issue with lambda is that at high loads it's a lot more expensive than using a server. I looked into doing something with lambda that would take about ~10b invocations/day and it was so expensive compared to just standing up a microservice.

So I've never deployed a lambda before and I'm sure at low loads it's fine, but I would be afraid of taking a dependency on lambda/functions in an application architecture.


For me, it’s because I have some latency specific loads that I can’t really easily do overseas sometimes.


I'm moving Azure, AWS and other VPS workloads to them right now. Very impressive service!


Because AWS et al. are considerably more than just instances?

There's a whole host of services that make dev and deployment at lot easier.

Though instances cost more ... that's not the issue; the issue is 'TCO' (total cost of ownership), not 'server cost'.

If your team can move at 2x the speed on AWS, well that's worth a lot.

Cloud services are very useful, so the article raises a very legit concern/paradox.


The article addresses this in the "Why not just do x?" section.

> - Why were you only hosting on JUST ONE cloud provider?

> - Why didn't you have backups outside of JUST ONE cloud provider?

Telling them to just use two cloud hosting providers sounds easy on paper, but when you're a cash-strapped startup it's a significant ask. Especially if you need the same architecture replicated across both in order to avoid extended downtime.

This is a much bigger issue than having external backups (which every startup should still do).


But you don’t need the same architecture replicated across two providers, as your parent post demonstrated. Run Postgres in DO and upload your backups to S3/Azure Storage. This is both cheap and simple to the level that any startup could manage it. If and when it makes sense, you can investigate multi provider replication, but it might just be YAGNI.


These are strange times I guess? Maybe I'm a weird one-off, but I have 3 VPS providers for my hobby sites, my own DNS spread across all three (one remains shadow backup) and offline rsnapshot backups that automation can't touch.

I can't imagine doing less for a business. In my opinion, the business model and investment plan should account for all of this for at least the first five years.


Then save it locally as well? There really isn't an excuse for having DO or any provider be your sole backup.


Why is everyone talking about backups and not the actual infrastructure?


Because that was the biggest issue in the original article: they didn't have backups of their data, so the company was dead. If they had had backups, they could have recreated the infrastructure.


Okay, but say you do get shut down and you have your offsite backups, like most people. It's still a significant amount of work to rebuild and link all of your servers, DBs, services, etc. from scratch. Plus all of the basic OS tweaking.

Losing your backups because you're on one site is always dangerous. Getting completely kicked off your VPS systems with no warning or help is what's new and scary about this DO story.


They did have backups; the only problem was that the backups were also on DO. Backing up a petabyte of data even on a Gbit connection is slow, and it also costs more than pennies if you, for example, back it up to AWS. Linus Tech Tips has a video on YouTube explaining the problem.


The original post literally mentions "all our data (500k rows)" i.e. a data amount so small they could probably back it up to their cellphone every five minutes.


It's not really a "backup" if it's just on the same server/data center, though. Moving data from one drive to another in the same machine, or from one machine to another in the same building, provides fault tolerance should the main drive/machine die; that's about it.

That's true whether the computer(s) is your own or someone else's.


It's all about threat model. And yes "our cloud provider accidentally deletes both our data and our backups" should be in the list.


No, it's not such a difficult task, and not even expensive. I don't think every project really needs it, though. But external DB backups? They're ridiculously cheap, and every project needs them.


Local backups? A few hard drives and a fiber connection can be cheap. Slow to restore, but not "killed my business" slow.


It's because the price of your site's downtime is insignificant. DO (and many other "bulk VPS" services) is only suitable for projects like this. When 1 hour of downtime costs much more than a year of hosting, you will be ready to pay for high availability, duplication, replication, and backup verification.


> I don't see "I've never run a business before" as a valid excuse, nor do I see "the cloud vendor is better equipped to handle backups" as a valid excuse.

It's not unreasonable to be annoyed when a company you are paying specifically to do the things you're not familiar with, catastrophically fails to do them correctly.


Agreed, but "I'm annoyed" doesn't help your customers. "Don't worry, I'm restoring from last night's backup" sounds a lot nicer.

Again, your customer doesn't care what your excuse is, just like the dead company's founder doesn't care what DigitalOcean's excuse was. "Our processes failed" translates directly into "I failed".


I remember getting the e-mail about their backups not having been working for months. I was shocked at how negligent that was. I run my own backups now.

(Also their backup offering is so much more cumbersome to use than borg backup that it wasn't a great loss to manage these myself).

Still amazed that they didn't notice no one's backups had been running for so long.


FWIW, I do the same: I have backups of everything.

All that said, in close to a decade (it actually might be a decade) I've yet to have any issue with Linode (tempting the computer gods here) or their backups.

I remember when DO was first starting out and they used to knock their own stuff offline updating router tables.

Ever since I have a jaundiced view of their capabilities.


N=1

Linode backups start to fail when you have more than 3 million files on a drive.

When debugging this, they told me the reason; I did a snapshot that worked, and then they turned off backups because they were failing. They didn't highlight that backups were now off.

Restores on the larger machines can take 5+ hours, and they will often report that the restore failed if you are restoring a drive that contains docker’s pipe files.

Ymmv, but I’m trying to get off linode.


Might I recommend prgmr.com instead? I use both DO and PRGMR (for resiliency).


> I don't see "I've never run a business before" as a valid excuse, nor do I see "the cloud vendor is better equipped to handle backups" as a valid excuse.

If you're paying someone specifically to make backups for you, you should be able to trust that they've taken every reasonable measure to ensure that backups are actually being made and preserved.


You would think so, but then again you might be wrong. Better to be safe than sorry, no? I expected DO to make the backups I was paying them for and they didn't. Luckily I was making my own backups at the same time. Turns out my backups worked and theirs didn't. If I had just trusted them, I'd be out of business just like the dead company in question.

I'm not sure what's so confusing about "don't trust your vendors" but I've had to make this exact same reply way too many times.

Don't trust your vendors!


> But it wasn't, because every night I run a PG backup and copy it to AWS S3

Which is probably much easier on DO than on AWS. For whatever reason, AWS seems to go out of their way to make it as hard as possible to back up your RDS data to a separate AWS account.


Great point. I always run a separate backup through bash scripts that sync to an S3 bucket, even though I have DO backups enabled as well. You always want 2 separate offsite backups for any real-world production application.


If you don't mind me asking, what is the preferred method to back up to S3? Is it possible to scp or rsync a mysqldump to S3, or do you install the AWS tools on DO and run aws s3 cp as a scheduled job?


I use the awscli tools and aws s3 cp. I don't think it's possible to scp or rsync directly to S3, at least last I checked.

https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-in...


Not OP, but I just use shell scripts. You could scp or rsync if you mounted S3 as a drive on your host, but that is not recommended as far as I know.


I recommend looking into restic and/or rclone


Check out restic, which does deduplicated backups to S3; very easy to use.
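A minimal restic setup for the S3 case discussed above might look like the following sketch. The bucket, backed-up path, retention policy, and all credential values are placeholders, though the environment variable names are the ones restic actually reads:

```shell
#!/usr/bin/env bash
# Deduplicated, encrypted backups to S3 with restic (sketch).
# Invoke with --run; safe to source/lint otherwise.
set -euo pipefail

export AWS_ACCESS_KEY_ID="AKIA...placeholder"
export AWS_SECRET_ACCESS_KEY="placeholder"
export RESTIC_PASSWORD="long-random-passphrase"  # encrypts the repo

REPO="s3:s3.amazonaws.com/my-backup-bucket"

run_backup() {
  restic -r "$REPO" init || true  # one-time; errors if repo exists
  restic -r "$REPO" backup /var/lib/app           # incremental + dedup
  restic -r "$REPO" forget --keep-daily 7 --keep-weekly 4 --prune
  restic -r "$REPO" snapshots                     # verify what's stored
}

if [[ "${1:-}" == "--run" ]]; then
  run_backup
fi
```

Because restic deduplicates at the chunk level, repeated nightly runs only upload what changed.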


Cloud or not, you have to trust your vendor to deliver the service you paid them for. That's the whole reason for doing business with them.


The reason for doing business with them is because they say they will deliver the service you paid them for. But if they don't, what then? You can switch to another vendor, but without a backup you're starting from scratch.

You should hope they deliver the service you've paid for, but you should never trust that they will. Always plan for failure. That's the entire point of this whole "DO killed my company" saga.


I trust UpCloud more than other VPS providers because their (regional) technical support has been very approachable and you know "who" they are. Besides, they promise a 100% uptime SLA when your business is critical.

https://upcloud.com/blog/


Do you not have a mirror copy of everything somewhere else? Or is that not possible in your case?


I only run one server, because like the dead company in question, I don't have the money or manpower to do a full high-availability solution. Just a database backup and my site's code in Gitlab, so DR failover is a matter of 'git clone' and 'pg_restore'.

It might take me a few hours to get back online and piece my docker containers back together but my customers can absorb a few hours of downtime as long as it's a very rare occasion. If they couldn't, I'd charge them enough to afford a real HA solution.
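A cold-start restore along those lines can be sketched like this. It's a hedged outline, not a definitive runbook; the repo URL, bucket, and database names are placeholders:

```shell
#!/usr/bin/env bash
# Cold-start DR on a fresh box: code from git, data from the other
# vendor's bucket. Invoke with --run; safe to source/lint otherwise.
set -euo pipefail

restore() {
  # 1. Code comes back from the git remote.
  git clone git@gitlab.com:me/mysite.git /srv/mysite

  # 2. The latest database dump comes back from the *other* cloud.
  aws s3 cp s3://my-db-backups/mydb-latest.dump /tmp/mydb.dump

  # 3. Restore into a fresh Postgres and bring the containers up.
  createdb mydb
  pg_restore --no-owner --dbname=mydb /tmp/mydb.dump
  (cd /srv/mysite && docker compose up -d)
}

if [[ "${1:-}" == "--run" ]]; then
  restore
fi
```

The point is that every step reads from a vendor other than the one that failed, which is exactly what made the grandparent's DO recovery possible.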



