
Happypenguin.org: Hard drive crash, a cautionary tale - roschdal
http://www.happypenguin.org/
======
albertzeyer
For people who might check this link a few days/months later, here is the text
as of Nov 8th:

 _Due to a hard drive failure which occurred at the same time as our backup
system failed, the site is currently down. We expect to have full data
recovery within the next few days. As of 9th Oct 2010 the data recovery is
still ongoing. We have been told to expect a result early next week, so
crossed fingers, we are hoping to have service resumed on or around the 13th
Oct.

As of 15th Oct 2010 the data recovery is still ongoing. Apparently the Western
Digital HDD that died has corrupted its own firmware, making it hard to extract
the data from the disc. As such, we are beginning the process of backup
recovery from our most recent tier 2 backups which are a few weeks out of
date, but waiting on the tier 1 recovery is getting ridiculous. We plan on
working over the weekend on this recovery, and hope to have more news for you
shortly.

As of 22nd Oct 2010 data recovery is STILL ongoing. The Western Digital Caviar
Green 2TB drive that failed has chemical degradation on the surface making the
data recovery much slower and harder. All going well, we will have the older
backups making sense soon, and will be back up to speed.

As of Nov 4th, we have been unable to bring the site back up yet, due to the
discovery that the older backups being brought online would end up with a
number of users being unable to access their games due to copy protection
issues. As such, rather than give ANY user a problem with their legally
purchased game, we will be keeping the site down for the next few days while
the final stages of data recovery are performed. All going well, we hope to be
back up towards the start of next week._

~~~
chronomex
_As of Nov 8th, we have a report from the data recovery company of a complete
recovery. We should be receiving the recovered data on Wednesday 10th and we
can then begin the process of finally getting things back to normal around
here.

As of Nov 12th, we have now received the data recovery files, but there has
been a fair amount of filesystem damage, and it will take us some time to
locate the files, rebuild the filesystem, and get everything back on its feet.
We are still hoping for being back online next week, all going well.

As of Nov 15th, we have begun database reconstruction. As there was
significant filesystem damage, this will take a while, as we need to manually
sanity check the database records, and there are a lot of them. We still hope
to have things up and running this week._

------
tygorius
The drive in question sounds like the 2TB bargain that New Egg regularly
offers deals on:
([http://www.newegg.com/Product/Product.aspx?Item=N82E16822136...](http://www.newegg.com/Product/Product.aspx?Item=N82E16822136514))

I've got three of these in a JBOD server, running a couple of months without
incident. They're cheap, quiet, and low-power. They're not high-performance,
however, and not ideally suited to RAID applications. Also, given the new
sector size you can get dreadful performance from them if you aren't careful
in setting your partition boundaries.
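
A minimal sketch of the partition boundary check, assuming "the new sector size" means 4096-byte physical sectors exposed as 512-byte logical sectors (the comment gives no numbers, so both sizes are assumptions, and the function names are hypothetical). A partition is aligned when its starting byte offset is a multiple of the physical sector size:

```python
# Assumed sizes: 512-byte logical sectors on top of 4096-byte physical sectors.
LOGICAL_SECTOR_BYTES = 512
PHYSICAL_SECTOR_BYTES = 4096


def is_aligned(start_sector, logical=LOGICAL_SECTOR_BYTES,
               physical=PHYSICAL_SECTOR_BYTES):
    """Return True if a partition starting at this logical sector begins
    exactly on a physical sector boundary."""
    return (start_sector * logical) % physical == 0


def next_aligned_sector(start_sector, logical=LOGICAL_SECTOR_BYTES,
                        physical=PHYSICAL_SECTOR_BYTES):
    """Round a logical start sector up to the next physical boundary."""
    sectors_per_physical = physical // logical
    # Ceiling division, then scale back to logical sectors.
    return -(-start_sector // sectors_per_physical) * sectors_per_physical


if __name__ == "__main__":
    for start in (63, 64, 2048):
        print(start, is_aligned(start), next_aligned_sector(start))
```

Under these assumptions the legacy start sector of 63 is misaligned, while 64 and 2048 both land on a boundary.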

Having suffered the consequences of ignoring data safety rules in the past, I
now make a practice of having three copies of important data in at least two
separate locations. I'm frankly astonished that there apparently wasn't a
differential backup process at the company. I mean, a month's down time, WTF?
These hard disks are pretty darn cheap backup media if you've got lots of data
(e.g. video files), but after this item I'm wondering if biodiversity should
be among my archive criteria.

That the drives in question had "chemical degradation on the surface making
the data recovery much slower and harder" is a puzzler to me. Perhaps there's
something about today's drive materials I'm unaware of, but my CYA radar went
off on that choice of words: someone screwed up during the recovery process
would be my first suspicion.

~~~
mike-cardwell
Diversity is a good idea. I have a TimeCapsule which automatically backs up my
Macbook once an hour, but I still put important files in my Dropbox folder so
I have a 3rd offsite backup of them too.

~~~
pmjordan
Is there anything remotely as convenient as Dropbox, but with sane, client-
side crypto? As far as I know, Dropbox encrypt the data, but on the server,
which, as far as I'm concerned, is about as good as storing it in plaintext.

Alternatively/even better: is there a remote mirroring filesystem and/or block
storage system which allows reasonable client-side crypto over a slow
asymmetric (ADSL) link? I have a local backup server, it would be great if I
could mirror that into a server in a datacentre somewhere. Has anyone tried
this kind of thing with lessfs, dm-crypt and drbd?

~~~
Robin_Message
I think this is what TarSnap <http://www.tarsnap.com/> is meant to be for.

~~~
pmjordan
As far as I can tell, Tarsnap is only for archives, it doesn't give you
random-access, mountable-as-a-filesystem style convenience. For backups,
Tarsnap is definitely the correct architecture, for syncing, not so much.

~~~
cperciva
Correct. Theoretically the Tarsnap client-server protocol could be used to
synthesize a mountable filesystem, but that would add hard latency
requirements -- one of the great advantages of Tarsnap's transactional
archive-creation is that it can tolerate high latency and even requests
failing with little impact.

------
viraptor
Also: DRM, a cautionary tale (unless I misunderstood what they mean by that -
I don't know what happypenguin.org actually offered):

"older backups being brought online would end up with a number of users being
unable to access their games due to copy protection issues"

~~~
Dramatize
They're a non-profit bringing happiness to penguins?

~~~
NathanKP
No, they offer a list of linux games, a forum for the discussion of linux
games, and a store for buying linux games.

------
foobar2k
Ma.gnolia anyone? <http://www.wired.com/epicenter/2009/01/magnolia-suffer/>

When our backups or replication fail I usually say we are at Ma.gnolia threat
level orange :)

~~~
narrator
Don't forget the big CouchSurfing
([http://techcrunch.com/2006/06/29/couchsurfing-deletes-itself...](http://techcrunch.com/2006/06/29/couchsurfing-deletes-itself-shuts-down/))
crash. They appear to have recovered from it though.

------
rarrrrrr
Offsite backup services created by HNers:

<http://www.tarsnap.com/>

<http://www.haystacksoftware.com/arq/>

<https://spideroak.com/>

(I co-founded SpiderOak in 2006.)

~~~
fluidcruft
The problem with tarsnap is that it's run by one (extremely competent) guy.
Unfortunately, if he dies/is disappeared/suffers a mental breakdown/etc your
backups may well be toast and unrecoverable.

------
bugsy
I have been places that have been through this same routine with backup after
backup being found to have failed even though they were all checked recently
through test restores.

Another thing that likes to happen is that even if you have a good backup,
it's likely due to psychological stress that the first two or three
restoration attempts will accidentally erase the backups due to someone typing
an incorrect command.

~~~
viraptor
So true. I lost ~3h of work because I somehow screwed up the hg repo holding
my MSc dissertation... on the hand-in day. Then in a panic, tried to restore
it - of course I made a mistake in rsync source and destination and overwrote
my backups before I could stop the sync (also ignored modification times -
yay).

Fortunately I had another copy on bitbucket. Not the latest version, so I
needed another day to finish all document formatting.... but I'm not sure how
this would end otherwise.
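
One way to guard against the swapped source and destination described above is to check the direction before anything is copied. This is only a sketch: `check_restore_direction` and its arguments are hypothetical names, and it assumes all backups live under a single known directory.

```python
from pathlib import Path


def _is_inside(path, root):
    """Return True if path is root itself or lies somewhere beneath it."""
    try:
        path.relative_to(root)
        return True
    except ValueError:
        return False


def check_restore_direction(source, destination, backup_root):
    """Refuse a restore whose arguments look swapped.

    A restore should read from inside backup_root and write somewhere
    outside it. Raises ValueError otherwise; returns the resolved paths.
    """
    src = Path(source).resolve()
    dst = Path(destination).resolve()
    root = Path(backup_root).resolve()
    if not _is_inside(src, root):
        raise ValueError(f"source {src} is not inside the backup root {root}")
    if _is_inside(dst, root):
        raise ValueError(f"destination {dst} would overwrite the backups in {root}")
    return src, dst
```

Running the copy itself with rsync's `--dry-run` option first is a complementary check, since it lists what would be transferred without changing anything.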

~~~
lnguyen
I've had 5 years of graduate research literally walk out the door when someone
decided to steal the workstation out of my office at the beginning of 2000.
Thankfully everything was being backed up remotely but it took over a month
before a successful restore from tape. I'm guessing it was the first time the
department's backup system was put to an actual "test" and during that month
it was questionable whether or not I'd get anything back.

Not the best thing to happen right before you're set to write your PhD
dissertation.

------
idoh
This keeps happening ... make sure to back up, and then test your backup. An
easy way to do this is to run your dev environment on a backup snapshot.
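
A more mechanical version of "test your backup" is to restore into a scratch directory and compare checksums against the original. A minimal sketch, assuming both trees are reachable as local directories (the function names are hypothetical):

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of one file, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in root.rglob("*")
        if p.is_file()
    }


def compare_trees(original, restored):
    """Return (missing, extra, changed) relative paths, each sorted.

    missing: in original but not in restored
    extra:   in restored but not in original
    changed: in both, but with different contents
    """
    a = tree_digests(original)
    b = tree_digests(restored)
    missing = sorted(set(a) - set(b))
    extra = sorted(set(b) - set(a))
    changed = sorted(k for k in set(a) & set(b) if a[k] != b[k])
    return missing, extra, changed
```

Three empty lists mean the restored tree matches the original byte for byte. This only compares file contents, not permissions or timestamps.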

~~~
iamjustlooking
Having a mirrored RAID array would have helped here as well, as long as you
don't mistake it for a backup solution.

~~~
eli
And given how cheap hard drives are (and how likely they are to eventually
fail) it seems like that would almost always be a good investment.

~~~
jrockway
Indeed. My /home directory is on a 1TB 3-way RAID 1. Because I am that
paranoid about disk failure.

Interestingly, all three disks failed, because Newegg does not know how to
ship them properly. Never buying disks from them again. The good news is that
only two failed at the same time, so there was no data loss. And the
replacement disks from the manufacturer, Samsung, have been chugging along for
months just fine.

RAID on your desktop is nice. RAID on your server is mandatory. The total cost
was $300 for one fucking terabyte. Just do it :)

~~~
X-Istence
I've purchased many hard drives from NewEgg, those same hard drives have been
through two moves now and they are still going along strong. Not a single bad
sector or anything along those lines.

~~~
jrockway
OEM or retail? Retail disks have better packaging, but I bought OEM disks.

Mine came in plastic egg cartons with a few packing peanuts. I'm surprised
they didn't shatter on their way across the country.

~~~
X-Istence
OEM, never retail since retail is always more expensive. Yep, same plastic egg
cartons with a ton of packaging peanuts. Received eight 1 TB drives.

------
motters
If anything, it's better to have an over-zealous backup policy.

------
jodrellblank
And the classic LeafyHost Saga
[http://arstechnica.com/civis/viewtopic.php?f=25&t=238085](http://arstechnica.com/civis/viewtopic.php?f=25&t=238085)
(you might want to look for a summary; it's a long thread)

At least HappyPenguin sent their disk for real recovery...

------
zppx
Whoa, I just visited this page 4 hours before I saw it here on HN front page,
I used to visit the page weekly back in 2003 and 2004 when I had much more
spare time. I hope everything goes well for them.

Anyway, where I work we use a controller with 4 attached SAS hard disks
working in RAID, and we also run backups with Bacula just to be safe.

------
noonespecial
There comes a point where "backup" does not mean another hard drive, it means
another server (preferably running as a replica, so all you have to do is
change a router entry to recover).

My bet is that Yoda would encourage you to know when this is.

------
synack
I read this, logged into the AWS console and grabbed a fresh snapshot of my
EBS volumes. Problem solved.

