What To Do When A Hard Drive Fails (dzone.com)
34 points by giu on April 6, 2010 | 34 comments



I don't think his reasoning behind freezing is correct; most of what I've previously read suggests that freezing is only a useful trick for bearing damage. If the hard drive is making a rattling noise while running, and then stopping, it likely has a bad bearing, and freezing it for a while (or running it cold) can coax the gimpy bearing into running a little longer. However, you're probably not going to be able to get the entire drive image off of it, so pick and choose what's really important to you.

In the event of an actual head crash, you're usually screwed. If the head crash left a stripe all the way around the platter, the head likely won't seek past it. (I don't know why; it should be able to, but it doesn't -- I tested this recently on a Toshiba drive.)

Head crashes seem to have a tendency to occur close to the spindle, and if that happens, the data on the drive is completely irretrievable.

Occasionally the armature will develop a mechanical fault and will collide with the edge of the platters when the drive powers up. If this happens, you can usually either "adjust" the armature slightly, or replace it with an armature from a matching donor drive.

There are a few other tricks and things too, but those are the most common for internal failures. Sometimes just replacing the logic board is all that's needed -- you have to be able to diagnose what's wrong with the drive. Simply sticking it in the freezer isn't likely to do a darned thing for you.

And I'm a little skeptical that this guy has ever even cracked a drive open. They don't have a "bunch" of platters, they almost always have 2.


I had a laptop hard drive suffer a head crash on the lower platter, and it was able to seek past it. Further, the freezer trick seemed to work a little, but it may have been simply letting the drive rest for a while. I do know that I've never seen it take 12 full hours to reach temperature equilibrium in a hard drive. 3 or 4 usually does the trick just fine. I eventually got most of my important data off. This was before external hard drives were commonplace and frequent backups of laptops made much sense, though.


I have tried the freezer trick before with varying success.

On one occasion, my friend's laptop hard drive kicked the bucket and failed when I tried to use Spinrite.

However, a few weeks ago, a Seagate hard drive from my brother's MacBook failed. Throwing it in the freezer for a few hours did the trick and we were able to make a full copy of the data.


They used to have 3, 4, and even 5 platters. It could just be that he hasn't opened a drive for a decade or two.


That wasn't decades in the past. Western Digital's VelociRaptor 600GB drive has 3 platters. Some of the new 4k sector drives have 3 platters. The first 1TB hard drives (obviously in the past) had 4 or 5 platters.

http://www.xbitlabs.com/articles/storage/display/1tb-hdd-rou...


Thanks! I personally haven't opened an hdd in 15 years, so I couldn't speak to newer drives. (and now I feel old :) )


Oh, you're right. I forgot about that.


Here's a better set of steps:

1. Use ddrescue (http://www.gnu.org/software/ddrescue/ddrescue.html) to copy a complete image of your hard drive onto another drive with lots of space. Let it work for a long time and it will probably get all the data off. If it doesn't, you can try the freezer trick, but don't be too hopeful.

2. Unplug the busted hard drive and set it aside. You can perform all the subsequent recovery steps on the image you just copied, which is better than stressing an already-dying drive.

3. Mount the hard drive image (http://tcdb.grinnell.edu/wiki/pmwiki.php?n=Help.Ddrescue).

4. Use testdisk (http://www.cgsecurity.org/wiki/TestDisk) to try to recreate the partition table.

4.5 Alternatively, you can use parted with the rescue command, but that works better if you know exactly what your partition table was.

5. If that fails, use photorec (http://www.cgsecurity.org/wiki/PhotoRec) to pull off all the files it can find. It should be able to get most of your documents, photos, etc, but it might not find everything, and they will be missing file names and folder structure. Remind yourself that this is better than nothing.
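
For anyone who wants the steps above spelled out, here is a rough sketch that only builds and prints the command lines; it does not run anything. The device path, image path, mapfile name, and mount point are placeholders I made up, and the ddrescue options (-n for a fast first pass, -d and -r3 for a slower retry pass) are just one reasonable choice, not something the steps above prescribe.

```python
import shlex


def recovery_commands(device, image, mapfile, mount_point):
    """Return the argv list for each recovery step, in order."""
    return [
        # Step 1, first pass: -n (--no-scrape) grabs the easy areas quickly.
        ["ddrescue", "-n", device, image, mapfile],
        # Step 1, second pass: -d uses direct disc access and -r3 retries
        # bad sectors up to three times. The mapfile lets ddrescue resume.
        ["ddrescue", "-d", "-r3", device, image, mapfile],
        # Step 3: loop-mount the image read-only. This works as-is only if
        # the image holds a single filesystem; a whole-disk image with a
        # partition table needs an offset or a partition-aware loop device.
        ["mount", "-o", "loop,ro", image, mount_point],
        # Step 4: point testdisk at the image, not at the dying drive.
        ["testdisk", image],
        # Step 5: last resort, carve files out of the image.
        ["photorec", image],
    ]


# All four paths below are hypothetical placeholders.
commands = recovery_commands(
    "/dev/sdX", "/mnt/big/rescue.img", "/mnt/big/rescue.map", "/mnt/rescue"
)
for argv in commands:
    print(shlex.join(argv))
```

Everything after the two ddrescue passes touches only the image, which is the point of step 2.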


Oh! I forgot to mention:

1.5 If the hard drive is so badly failed that you cannot even get ddrescue running, try again tomorrow. And the next day. And the next. I cannot explain why a hard drive would come back to life, but I have seen it happen a number of times. I have seen a hard drive that was clicking and irrecoverable for two solid weeks suddenly show up just fine one day in the device list. It mounted and everything, and we promptly pulled all the data off. I'm not much of an optimist, but when it comes to failed hard drives, I've seen miracles occur.


ddrescue is a godsend. I actually had my backup drive bite the dust on me. All my live data was fine, but my backup drive ate it. ddrescue saved the day.

I agree with pretty much everyone here, though, in saying that "restore it from backup" is the right answer, and the fact that the parent-linked article doesn't mention it anywhere makes it worthy of ridicule. Fortunately, the second commenter on the article injects a bit of sanity into the situation.


Um, buy a new one and restore from your nightly backup? Or for important things, maybe replace it and let your RAID mirror rebuild?


I've been using this technique to recover data from bad drives for 13 years. It's worked for me with drives from the following manufacturers: Quantum (Fireball & BigFoot), Seagate, Samsung

It doesn't always work. When it does work, you'll get to read data off it just once after the freeze. Sometimes a refreeze allows you to read data again, but never more than that.


You post to HN about your impeccable backup practices


I had a bad experience doing this accidentally. I mistakenly left my laptop in my car overnight in the dead of a Boston winter...and the next morning my HD, which has moving parts, did not like it one bit and made creaking noises. I realized what had happened and let it warm up for a while on my lap, then restarted and it was fine. Perhaps mine was just way too cold and there is a minimum temperature one could suggest?


You sigh with relief remembering all your important files are stored in your Dropbox folder


This one counts only if you embrace the cloud :) For me, a RAID-1 does a good job, too.


I'm surprised nobody has said this, but: RAID-1 is not a backup. You're powerless against data damage, and also in trouble if the machine containing the hard drives is damaged.
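
A toy illustration of the point, not a model of any real RAID implementation: the Mirror class and the file name are made up, and the "backup" is simply a point-in-time copy kept somewhere else.

```python
import copy


class Mirror:
    """Toy RAID-1: every write and delete is applied to all live disks."""

    def __init__(self):
        self.disks = [{}, {}]

    def write(self, name, data):
        for disk in self.disks:
            if disk is not None:
                disk[name] = data

    def delete(self, name):
        for disk in self.disks:
            if disk is not None:
                disk.pop(name, None)

    def fail_disk(self, index):
        self.disks[index] = None

    def read(self, name):
        for disk in self.disks:
            if disk is not None and name in disk:
                return disk[name]
        return None


mirror = Mirror()
mirror.write("thesis.txt", "draft 3")

# Point-in-time copy stored off the machine (the actual backup).
backup = copy.deepcopy(mirror.disks[0])

# A single disk failure is exactly what the mirror protects against.
mirror.fail_disk(0)
after_disk_failure = mirror.read("thesis.txt")

# An errant delete is faithfully applied to every remaining disk.
mirror.delete("thesis.txt")
after_delete = mirror.read("thesis.txt")
from_backup = backup.get("thesis.txt")

print(after_disk_failure, after_delete, from_backup)
```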


> in trouble if the machine containing the hard drives is damaged

Not all RAID-1 has that particular limit. I work with both hardware- and software-based RAID-1 implementations that work just fine across multiple hosts, and that can operate across separations of hundreds of kilometers.

And you're correct about RAID-1 not being a backup; that a volume corruption, errant delete, or sufficient degree of user disgruntlement can still nuke the data. That written, the software RAID-1 solution I use can be used as a component of a backup procedure; you can pull disks out of the RAIDset, clone or archive, and then merge the volumes back into the RAIDset using a delta of the changes.

FWIW.

edit: added a missing "; that"


Even when using RAID1 like that for cloning, it's the cloning that is the backup, not the RAID1. You're cloning data that just happens to contain RAID1 configuration data. You could also clone the partition as presented to the OS or the logical files. It's all cloning, just at different levels.


No. That's not the same.

The difference being you can get a consistent copy of the whole disk with a very small window where the applications must be quiesced, where the cloning process you've suggested rolls through the whole volume and (with an active volume) the volume state tends to change as the copy process traverses the volume.

As for an alternative and I/O-heavy approach that's possible with the RAID1 implementation, it's also feasible to roll out a member volume and roll in a scratch disk; to replace the volumes using a more traditional scheme. Rolling the member volume back into the RAID1 set after the cloning operation can use the delta of the differences, rather than necessitating a whole-volume copy. Replacing a disk necessitates a whole-disk copy.

If you're working with a fairly quiescent disk, then the difference between these two approaches probably isn't significant. If the volumes are busy, then having a complete and consistent copy of the whole volume with a very short window when the applications have to flush and hold can be advantageous.


It's good that you point it out. Surely it's not a backup, but at least it's a better option than working with only a single hard disk. If you feel comfortable with distributing data in the cloud, cloud storage services like Dropbox are a very good option for backing up everything, as ukmd mentioned in the comment above.


Anyone want to comment on the scientific reason for this?

Plenty of commenters on HN and dzone are claiming that RAID-1 will provide sufficient protection. Unfortunately, RAID-1 doesn't guard against data corruption or accidental destruction of data. I highly recommend something like Backblaze ($5/mo, unlimited storage, fairly hacker-friendly).


The scientific basis, according to the article, is that the platters shrink ever so slightly in the extreme cold, and that this slight shrinking can be the difference between a clean read and a head crash.


Next step would be, I'd think, to bring the humidity in your room back up to sane, comfortable, healthy levels. Either that, or a working freezer.

A 70°F (~21°C) room will have a dew point far above the 0°F freezer, so your drive should develop substantial frost from condensation. Unless your relative humidity is around 5%. (Does a glass of ice water drip from condensation? Your drive will be far colder.)

Subjecting data that you care about to potential thermal shock and condensation, well, that doesn't sound like a good idea to me.
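
A back-of-the-envelope check of the humidity figure above, using the Magnus approximation for saturation vapor pressure. The coefficients are one commonly used set for vapor pressure over water; using the over-ice form for the freezer would shift the answer slightly, so treat the result as approximate.

```python
import math

# Magnus coefficients over liquid water (temperatures in deg C).
# Assumption: one common parameter set; others differ slightly.
A = 17.62
B = 243.12


def f_to_c(temp_f):
    return (temp_f - 32.0) * 5.0 / 9.0


def saturation_vapor_pressure_hpa(temp_c):
    return 6.112 * math.exp(A * temp_c / (B + temp_c))


def dew_point_c(temp_c, relative_humidity):
    """Dew point for air at temp_c with relative_humidity in (0, 1]."""
    gamma = math.log(relative_humidity) + A * temp_c / (B + temp_c)
    return B * gamma / (A - gamma)


room_c = f_to_c(70.0)
freezer_c = f_to_c(0.0)

# Room humidity at which the dew point equals the freezer temperature.
# Any more humid than this and a freezer-cold drive collects condensation.
rh_threshold = saturation_vapor_pressure_hpa(freezer_c) / saturation_vapor_pressure_hpa(room_c)

print(f"threshold relative humidity: {rh_threshold:.1%}")
print(f"dew point at 50% RH: {dew_point_c(room_c, 0.5):.1f} C")
```

With these coefficients the threshold comes out in the mid single digits of percent, far below what an ordinary room has.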


I've actually tried the freezer trick in the past, even though I knew the problem was with a stuck spindle. I wasn't able to get any data off it, but the drive did make a completely different noise when it was frozen, so the cold did have some sort of effect.

A professional repair costs a minimum of $600, so if you know from the beginning that your data is not worth that much, then just go ahead and try it.


You regret not using something like ZFS's RAID-Z? You blame whoever convinced you to have a single drive inside your machine?


How does ZFS work with FUSE on Linux? From what I've been able to find, it's not available in-kernel due to licensing issues. Is the performance penalty of running it in userspace rather than in the kernel negligible, or a real hindrance?


It depends on your workload. It is noticeably slower compared to the in-kernel ZFS on FreeBSD. If you are just using it as a file server for media and documents, it will do fine. If you are using it for something high-performance, look elsewhere.


I have no idea, but it should work.

Do you have a need to use Linux? I am planning to go OpenSolaris on my next media box.


In a server you replace the hard drive and let the RAID array rebuild itself. You have RAID in your server, right?

Of course RAID is not a backup strategy. RAID 1 is great for exactly the mentioned event: when a hard drive fails. If some of your data has been corrupted or lost, you use your backups.


I wouldn't try this and risk damaging the drive further. If the data is worth recovering, then send it over to professionals. Of course if you really don't care about your data, then you could try this, but the question arises, why?


Thanks for the reminder, I just fired up Time Machine to do a backup.


"What To Do When A Hard Drive Fails:" Spinrite http://www.grc.com/sr/spinrite.htm


I smile to myself that all my data is replicated in a Venti store and accessible by date. Bonus smiles when I tell you one of the redundancies is I keep it in 500Mb encrypted blocks on an insecure server.



