

Digital Ocean completely destroyed my droplet beyond recovery out of the blue - phaser

  From: DigitalOcean <support@support.digitalocean.com>
  Subject: [DigitalOcean Ticket ID:xxxxx] cant power on droplet
  Date: August 13, 2014 at 1:26:58 AM GMT-4
  To: <xxxxxx@xxxxxxx>
  Reply-To: <xxxxxxxx@support.digitalocean.com>

There has been a response to your ticket:

Hello,

You will be unable to power on this droplet as it has suffered unrepairable
damage.

We are writing to let you know that the hypervisor that hosts (redacted) has
suffered a catastrophic failure. Despite repeated attempts to replace failed
components and even to perform two full swaps of the system chassis, we were
unable to recover it.

Unfortunately, the failure also resulted in loss of all data on the
hypervisor. Droplets and data hosted on this hypervisor node are not
recoverable, despite all of our efforts.

Please let us know if you need help recovering from a recent snapshot or
backup. While we know that an account credit is only a small comfort when
confronted with data loss, we have gone ahead and credited your account for
one month of service.

If there is anything else we can do, please let us know. We will be standing
by to assist you in recovering from this as best we can, and we have marked
the IP that was associated with this Droplet as reserved to your account, so
your next Droplet in the SFO datacenter should reclaim it.

Regards,

(redacted)

(edit: line breaks)
======
Artemis2
It's the cloud. You have backups and you can spin up a replacement server in a
minute. If you don't have backups, you should probably rethink the way you use
these services. Hardware fails, inevitably.

------
wmf
So restore from backup. Hardware failures are inevitable.

~~~
mariust
I agree that hardware failures are inevitable, but a RAID system should
prevent this kind of loss, since all data is written to two or more disks at
the same time. A backup might be one or more days old. Imagine Facebook
saying: "Hey, 1 billion users, the tens of billions of images you uploaded in
the last 24h are gone, please upload them again."

~~~
TheLoneWolfling
RAID is _much_ less effective than it used to be.

Current BER (non-recoverable read error rate) is around 1 in 10^14 bits read.

If you lose a drive, the probability of you encountering an error when
rebuilding the RAID is rapidly approaching 1 as time goes on - BERs are not
decreasing nearly as fast as drive sizes are increasing.

(This assumes you are using a RAID where you can lose 1 drive and be fine.
RAIDs where you can lose 2+ drives are still fine for now.)

And this isn't taking into account catastrophic controller failure, damage due
to a faulty power supply / surge, etc.
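
The rebuild arithmetic above can be sketched roughly as follows. This is a
back-of-the-envelope model, not a measurement: it assumes bit errors are
independent, that the spec-sheet rate of 1 in 10^14 applies uniformly, and
that a rebuild reads every surviving drive in full. The function name and
parameters are made up for illustration.

```python
import math


def p_error_during_rebuild(surviving_drives: int,
                           drive_tb: float,
                           ber: float = 1e-14) -> float:
    """Chance of at least one unrecoverable read error during a rebuild.

    Assumes independent bit errors and a full read of every surviving drive.
    """
    # Total bits that must be read back (1 TB taken as 10^12 bytes).
    bits_read = surviving_drives * drive_tb * 1e12 * 8
    # 1 - (1 - ber)^bits_read, computed via log1p/expm1 to avoid rounding loss.
    return -math.expm1(bits_read * math.log1p(-ber))


if __name__ == "__main__":
    # Hypothetical 4-drive array that has lost one drive.
    for size_tb in (1, 2, 4, 8):
        p = p_error_during_rebuild(surviving_drives=3, drive_tb=size_tb)
        print(f"{size_tb} TB drives: {p:.2f}")
```

Under these assumptions, three surviving 1 TB drives give roughly a 0.21
chance of hitting an error, and three 4 TB drives give roughly 0.62. Real
drives often beat their spec-sheet rate, so treat these numbers as a
pessimistic illustration of the trend rather than a prediction.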

------
trampish
Did you have DO's backup service enabled when this happened? If so, I feel
like they should have provided a more streamlined customer experience
restoring the damaged/lost droplet from the latest Amazon Glacier instance.

------
mattkrea
When backups can be enabled at 20% of instance cost I'll deal with it. Nothing
beyond development work runs on only one instance due to the problems I've
seen even on Amazon.

------
ksec
That is why you should be on Linode if you are in production.

Note: Not saying they are perfect, but it is just much less likely to happen.

------
farawayea
They are using RAID 5 and probably cheap SSD storage. I'm not surprised. This
is what they give you for your money.

------
tuananh
Happened to me once. I wonder what the actual SSD failure rate at DO is. Is it
any higher than the industry-standard rate?

