
What To Do When A Hard Drive Fails - giu
http://server.dzone.com/news/what-do-when-hard-drive-fails
======
thaumaturgy
I don't think his reasoning behind freezing is correct; most of what I've
previously read suggests that freezing is only a useful trick for bearing
damage. If the hard drive is making a rattling noise while running, and then
stopping, it likely has a bad bearing, and freezing it for a while (or running
it cold) can coax the gimpy bearing into running a little longer. However,
you're probably not going to be able to get the entire drive image off of it,
so pick and choose what's really important to you.

In the event of an actual head crash, you're usually screwed. If the head
crash left a stripe all the way around the platter, the head likely won't seek
past it. (I don't know why; it should be able to, but it doesn't -- I tested
this recently on a Toshiba drive.)

Head crashes seem to have a tendency to occur close to the spindle, and if
that happens, the data on the drive is completely irretrievable.

Occasionally the armature will develop a mechanical fault and will collide
with the edge of the platters when the drive powers up. If this happens, you
can usually either "adjust" the armature slightly, or replace it with an
armature from a matching donor drive.

There are a few other tricks and things too, but those are the most common for
internal failures. Sometimes just replacing the logic board is all that's
needed -- you have to be able to diagnose what's wrong with the drive. Simply
sticking it in the freezer isn't likely to do a darned thing for you.

And I'm a little skeptical that this guy has ever even cracked a drive open.
They don't have a "bunch" of platters; they almost always have 2.

~~~
sophacles
They used to have 3, 4, and even 5 platters. It could just be that he hasn't
opened a drive for a decade or two.

~~~
baddox
That wasn't decades in the past. Western Digital's VelociRaptor 600GB drive
has 3 platters. Some of the new 4k sector drives have 3 platters. The first
1TB hard drives (obviously in the past) had 4 or 5 platters.

<http://www.xbitlabs.com/articles/storage/display/1tb-hdd-roundup-3_2.html#sect1>

~~~
sophacles
Thanks! I personally haven't opened an hdd in 15 years, so I couldn't speak to
newer drives. (and now I feel old :) )

------
youngian
Here's a better set of steps:

1\. Use ddrescue (<http://www.gnu.org/software/ddrescue/ddrescue.html>) to
copy a complete image of your hard drive onto another drive with lots of
space. Let it work for a long time and it will probably get all the data off.
If it doesn't you can try the freezer trick, but don't be too hopeful.

2\. Unplug the busted hard drive and set it aside. You can perform all the
subsequent recovery steps on the image you just copied, which is better than
stressing an already-dying drive.

3\. Mount the hard drive image
(<http://tcdb.grinnell.edu/wiki/pmwiki.php?n=Help.Ddrescue>).

4\. Use testdisk (<http://www.cgsecurity.org/wiki/TestDisk>) to try to
recreate the partition table.

4.5 Alternatively, you can use parted with the rescue command, but that works
better if you know exactly what your partition table was.

5\. If that fails, use photorec (<http://www.cgsecurity.org/wiki/PhotoRec>) to
pull off all the files it can find. It should be able to get most of your
documents, photos, etc, but it might not find everything, and they will be
missing file names and folder structure. Remind yourself that this is better
than nothing.
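
As a rough sketch of steps 1 through 5, the Python below only builds the command lines and prints them, so nothing touches a disk until you run them yourself. The device name `/dev/sdX`, the paths, and the two-pass split (a quick `-n` pass, then a `-d -r3` retry pass) are illustrative assumptions, not part of the steps above. The `mount` line assumes the image holds a single filesystem; a whole-disk image with a partition table would need an offset or a loop device with partition scanning instead.

```python
import shlex
from pathlib import Path


def ddrescue_passes(device: str, image: Path, mapfile: Path) -> list[list[str]]:
    """Return two ddrescue command lines that share one mapfile."""
    return [
        # Pass 1: grab everything that reads easily and skip the slow scraping phase.
        ["ddrescue", "-n", device, str(image), str(mapfile)],
        # Pass 2: revisit the bad areas using direct disc access and 3 retry passes.
        ["ddrescue", "-d", "-r3", device, str(image), str(mapfile)],
    ]


def image_recovery_commands(image: Path, mount_point: Path) -> dict[str, list[str]]:
    """Commands for steps 3 to 5; all of them work on the image, not the drive."""
    return {
        # Read-only loop mount, so the rescued image is never modified by accident.
        "mount": ["mount", "-o", "loop,ro", str(image), str(mount_point)],
        # Both tools accept an image file in place of a device.
        "testdisk": ["testdisk", str(image)],
        "photorec": ["photorec", str(image)],
    }


if __name__ == "__main__":
    image = Path("/mnt/spare/failing.img")      # hypothetical destination
    mapfile = Path("/mnt/spare/failing.map")    # lets ddrescue resume after a stop
    for cmd in ddrescue_passes("/dev/sdX", image, mapfile):
        print(shlex.join(cmd))
    for name, cmd in image_recovery_commands(image, Path("/mnt/recovery")).items():
        print(f"{name}: {shlex.join(cmd)}")
```

Keeping the same mapfile across both passes is what lets ddrescue resume and retry only the unread areas instead of starting over.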

~~~
youngian
Oh! I forgot to mention:

1.5 If the hard drive is so badly failed that you cannot even get ddrescue
running, try again tomorrow. And the next day. And the next. I cannot explain
why a hard drive would come back to life, but I have seen it happen a number
of times. I have seen a hard drive that was clicking and irrecoverable for two
solid weeks suddenly show up just fine one day in the device list. It mounted
and everything, and we promptly pulled all the data off. I'm not much of an
optimist, but when it comes to failed hard drives, I've seen miracles occur.

------
tbrownaw
Um, buy a new one and restore from your nightly backup? Or for important
things, maybe replace it and let your RAID mirror rebuild?

------
bluesmoon
I've been using this technique to recover data from bad drives for 13 years.
It's worked for me with drives from the following manufacturers: Quantum
(Fireball & BigFoot), Seagate, and Samsung.

It doesn't always work. When it does work, you'll get to read data off it just
once after the freeze. Sometimes a refreeze allows you to read data again, but
never more than that.

------
gamache
You post to HN about your impeccable backup practices.

------
binarymax
I had a bad experience doing this accidentally. I mistakenly left my laptop in
my car overnight in the dead of a Boston winter...and the next morning my HD,
which has moving parts, did not like it one bit and made creaking noises. I
realized what had happened and let it warm up for a while on my lap, then
restarted and it was fine. Perhaps mine was just way too cold and there is a
minimum temperature one could suggest?

------
ukdm
You sigh with relief remembering all your important files are stored in your
Dropbox folder

~~~
giu
This one counts only if you _embrace the cloud_ :) For me, a RAID-1 does a
good job, too.

~~~
cynicalkane
I'm surprised nobody has said this, but: RAID-1 is not a backup. You're
powerless against data damage, and also in trouble if the machine containing
the hard drives is damaged.
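
A toy model may make the distinction concrete. The `ToyMirror` class below is purely hypothetical (two dicts standing in for two drives), not any real RAID implementation: it survives losing one member, but a bad write lands on every surviving member, and only a separate point-in-time copy still holds the good data.

```python
import copy
from typing import Optional


class ToyMirror:
    """Hypothetical RAID-1 stand-in: every write goes to all live members."""

    def __init__(self) -> None:
        self.members: list[Optional[dict[str, bytes]]] = [{}, {}]

    def write(self, name: str, data: bytes) -> None:
        for disk in self.members:
            if disk is not None:
                disk[name] = data

    def fail_member(self, index: int) -> None:
        # Simulate a dead drive by dropping that member entirely.
        self.members[index] = None

    def read(self, name: str) -> bytes:
        for disk in self.members:
            if disk is not None and name in disk:
                return disk[name]
        raise FileNotFoundError(name)


mirror = ToyMirror()
mirror.write("thesis.txt", b"good draft")
backup = copy.deepcopy(mirror.members[0])  # separate point-in-time copy

# A drive failure is exactly what the mirror handles.
mirror.fail_member(0)
survived_drive_failure = mirror.read("thesis.txt") == b"good draft"

# A bad write (corruption, errant overwrite) is mirrored just as faithfully.
mirror.write("thesis.txt", b"corrupted")
mirror_after_bad_write = mirror.read("thesis.txt")

print(survived_drive_failure, mirror_after_bad_write, backup["thesis.txt"])
```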

~~~
Hoff
_in trouble if the machine containing the hard drives is damaged_

Not all RAID-1 has that particular limit. I work with both hardware- and
software-based RAID-1 implementations that work just fine across multiple
hosts, and that can operate across separations of hundreds of kilometers.

And while you're correct about RAID-1 not being a backup; that a volume
corruption, errant delete or sufficient degrees of user disgruntlement can
still nuke the data. That written, the software RAID-1 solution I use can be
used as a component of a backup procedure; you can pull disks out of the
RAIDset, clone or archive, and then merge the volumes back into the RAIDset
using a delta of the changes.

FWIW.

edit: added a missing "; that"

~~~
Periodic
Even when using RAID1 like that for cloning, it's the _cloning_ that is the
backup, not the RAID1. You're cloning data that just happens to contain RAID1
configuration data. You could also clone the partition as presented to the OS
or the logical files. It's all cloning, just at different levels.

~~~
Hoff
No. That's not the same.

The difference being you can get a consistent copy of the whole disk with a
very small window where the applications must be quiesced, where the cloning
process you've suggested rolls through the whole volume and (with an active
volume) the volume state tends to change as the copy process traverses the
volume.

As for an alternative and I/O-heavy approach that's possible with the RAID1
implementation, it's also feasible to roll out a member volume and roll in a
scratch disk; to replace the volumes using a more traditional scheme. Rolling
the member volume back into the RAID1 set after the cloning operation can use
the delta of the differences, rather than necessitating a whole-volume copy.
Replacing a disk necessitates a whole-disk copy.

If you're working with a fairly quiescent disk, then the difference between
these two approaches probably isn't significant. If the volumes are busy, then
having a complete and consistent copy of the whole volume with a very short
window when the applications have to flush and hold can be advantageous.
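
One way to picture the delta merge is a dirty-block set: while a member is split off, the set records which blocks changed on the active side, and rejoining copies only those. The sketch below is a toy under that assumption; `SplitMirror` and its methods are made-up names, and the actual product may track changes quite differently.

```python
class SplitMirror:
    """Toy two-member mirror that can split one member off and merge it back."""

    def __init__(self, num_blocks: int) -> None:
        self.primary = [b""] * num_blocks
        self.secondary = [b""] * num_blocks
        self.split = False
        self.dirty: set[int] = set()

    def write(self, block: int, data: bytes) -> None:
        self.primary[block] = data
        if self.split:
            # Member is away: remember the block instead of mirroring the write.
            self.dirty.add(block)
        else:
            self.secondary[block] = data

    def detach(self) -> list[bytes]:
        """Split off the secondary; it stays frozen at this instant."""
        self.split = True
        self.dirty.clear()
        return self.secondary

    def reattach(self) -> int:
        """Merge the member back, copying only blocks written since detach()."""
        copied = len(self.dirty)
        for block in self.dirty:
            self.secondary[block] = self.primary[block]
        self.dirty.clear()
        self.split = False
        return copied


mirror = SplitMirror(num_blocks=1000)
mirror.write(1, b"before")
frozen = mirror.detach()          # applications only pause around this call
archived = list(frozen)           # clone or archive the frozen member at leisure
mirror.write(1, b"after")         # the active volume keeps changing meanwhile
mirror.write(7, b"new")
print(mirror.reattach(), "of", len(mirror.primary), "blocks copied on merge")
```

In this model the quiesce window covers only `detach()`, while the slow archive step runs against a member that no longer changes.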

------
rbranson
Anyone want to comment on the scientific reason for this?

Plenty of commenters on HN and dzone are claiming that RAID-1 will provide
sufficient protection. Unfortunately, RAID-1 doesn't guard against data
corruption or accidental destruction of data. I highly recommend something
like Backblaze ($5/mo, unlimited storage, fairly hacker-friendly).

~~~
gamache
The scientific basis, according to the article, is that the platters shrink
ever so slightly in the extreme cold, and that this slight shrinking can be
the difference between a clean read and a head crash.

------
derobert
Next step would be, I'd think, to bring the humidity in your room back up to
sane, comfortable, healthy levels. Either that, or a working freezer.

A 70°F (~21°C) room will have a dew point far above the 0°F freezer, so your
drive should develop substantial frost from condensation. Unless your relative
humidity is around 5%. (Does a glass of ice water drip from condensation? Your
drive will be far colder.)
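
The humidity figure can be sanity-checked with the Magnus approximation for saturation vapor pressure. The sketch below uses one common coefficient set (17.62 and 243.12 °C, over liquid water), which is an assumption on my part; a formula over ice would give a somewhat lower humidity for the sub-freezing case.

```python
import math

# Magnus coefficients over liquid water (assumed); temperatures in deg C.
B = 17.62
C = 243.12


def f_to_c(temp_f: float) -> float:
    return (temp_f - 32.0) * 5.0 / 9.0


def saturation_vapor_pressure_hpa(temp_c: float) -> float:
    return 6.112 * math.exp(B * temp_c / (C + temp_c))


def dew_point_c(temp_c: float, rh_percent: float) -> float:
    """Dew point of air at temp_c with the given relative humidity."""
    gamma = math.log(rh_percent / 100.0) + B * temp_c / (C + temp_c)
    return C * gamma / (B - gamma)


def rh_for_dew_point(temp_c: float, dew_c: float) -> float:
    """Relative humidity (%) at temp_c whose dew point equals dew_c."""
    return 100.0 * saturation_vapor_pressure_hpa(dew_c) / saturation_vapor_pressure_hpa(temp_c)


room_c = f_to_c(70.0)
freezer_c = f_to_c(0.0)

# Dew point of a 70 F room at an ordinary 50% relative humidity.
print(f"dew point at 50% RH: {dew_point_c(room_c, 50.0):.1f} C")
# Humidity the room would need for a 0 F drive to stay dry.
print(f"RH needed to avoid condensation: {rh_for_dew_point(room_c, freezer_c):.1f} %")
```

Under this approximation the dry threshold comes out at a single-digit percentage, in line with the figure quoted above, while an ordinary 50% room has a dew point well above freezing.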

Subjecting data that you care about to potential thermal shock and
condensation, well, that doesn't sound like a good idea to me.

------
rradu
I've actually tried the freezer trick in the past, even though I knew the
problem was with a stuck spindle. I wasn't able to get any data off it, but
the drive did make a completely different noise when it was frozen, so the
cold did have some sort of effect.

A professional repair costs a minimum of $600, so if you know from the
beginning that your data is not worth that much, then just go ahead and try
it.

------
rbanffy
You regret not using something like ZFS's RAID-Z? You blame whoever convinced
you to have a single drive inside your machine?

~~~
warfangle
How does ZFS work with FUSE on Linux? From what I've been able to find, it's
not available in-kernel due to licensing issues. Is the performance penalty of
running it in userspace rather than in the kernel negligible, or a real
hindrance?

~~~
ryanpetrich
It depends on your workload. It is noticeably slower compared to the in-kernel
ZFS on FreeBSD. If you are just using it as a file server for media and
documents, it will do fine. If you are using it for something
high-performance, look elsewhere.

------
laut
In a server you replace the hard drive and let the RAID array rebuild itself.
You have RAID in your server, right?

Of course RAID is not a backup strategy. RAID 1 is great for exactly the
mentioned event: when a hard drive fails. If some of your data has been
corrupted or lost, you use your backups.

------
zokier
I wouldn't try this and risk damaging the drive further. If the data is
worth recovering, then send it over to professionals. Of course if you really
don't care about your data, then you could try this, but the question arises,
why?

------
MikeCapone
Thanks for the reminder, I just fired up Time Machine to do a backup.

------
briankb
"What To Do When A Hard Drive Fails:" Spinrite
<http://www.grc.com/sr/spinrite.htm>

------
wendroid
I smile to myself that all my data is replicated in a Venti store and
accessible by date. Bonus smiles when I tell you one of the redundancies is I
keep it in 500MB encrypted blocks on an insecure server.

