
I lost data to this kind of problem[1]. The Linux dm-raid handles this kind of failure extremely poorly (or did at the time), even when following all available tutorials. (When I reported my experience, one developer said I should have set a cronjob to recursively md5sum my / every week or so - not exactly user-friendly, and not mentioned in any of the tutorials.) When you attempt to rebuild a dm-raid array, even a raid6 one, expect to lose all your data.

Now I use ZFS (on FreeBSD), which handles this kind of error much more gracefully; if there's an isolated URE you might lose data in that particular file, but it won't destroy the whole array.
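For the curious, this is roughly what recovering from that looks like ("tank" is just a placeholder pool name):

    # Scrub the pool, then ask ZFS which files, if any, were affected.
    zpool scrub tank
    zpool status -v tank   # -v lists the files with permanent errors,
                           # so only those need restoring, not the whole pool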

[1] Yeah yeah, RAID is not a backup. I'm talking about data I didn't consider worth the cost of backing up, as a poor student at the time.

> I should have set a cronjob to recursively md5sum my / every week or so

If you use Debian, install the debsums program, which will do that for you for packaged (non-user) files and report any errors.

You should also install mdadm and set it to check the array every month.

And finally, install smartmontools and have it run a short self-test every day and a long one (i.e. a full disk read) every week.
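Roughly what that looks like on a stock Debian box - the schedules, device name and paths below are illustrative, not gospel:

    # /etc/cron.d/integrity-checks (illustrative)
    # Weekly debsums pass over installed packages; -s reports errors only
    0 4 * * 0  root  debsums -s
    # Monthly md array scrub (Debian's mdadm package ships a similar job
    # in /etc/cron.d/mdadm using the same checkarray helper)
    0 1 1 * *  root  /usr/share/mdadm/checkarray --all --idle --quiet

    # /etc/smartd.conf: short self-test daily at 02:00, long test Saturdays at 03:00
    /dev/sda -a -s (S/../.././02|L/../../6/03)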


Yeah, I now have nearly 120 TB of data under ZoL (ZFS on Linux, currently on master as of writing), replicated at a 5-minute interval between two datacenters...

Zero corruption, zero problems.
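For anyone curious, the basic building block for this kind of replication is an incremental snapshot send/receive every few minutes; a rough sketch ("tank/data" and "backup-dc" are placeholders, and tools like zrepl or syncoid automate the same idea):

    # One replication cycle: snapshot, then send the delta to the other DC
    PREV=$(zfs list -H -t snapshot -o name -s creation -d 1 tank/data | tail -1)
    NEW=tank/data@repl-$(date +%Y%m%d%H%M%S)
    zfs snapshot "$NEW"
    zfs send -i "$PREV" "$NEW" | ssh backup-dc zfs receive -F tank/data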


The main Linux raid impl (md) probably handles failures more robustly than dm-raid.
