
Don’t Be Afraid of RAID - louwrentius
https://louwrentius.com/dont-be-afraid-of-raid.html
======
Exmoor
The overall premise of this article is good and needs to be said, but the
author missed an opportunity by lumping traditional hardware RAID solutions
with software solutions like ZFS. The general consensus I've seen from people
spinning up large storage arrays in recent years is that the software
solutions are far superior for many of the reasons the author mentions.

~~~
louwrentius
I kept my article neutral, and I realise I should have stated that I consider
Linux software RAID and ZFS to be in the same boat in terms of perceptions
about RAID.

People forgo both Linux software RAID and ZFS because of scary stories.

ZFS is an interesting alternative to Linux software RAID, but that discussion
is a different topic altogether.

~~~
layoutIfNeeded
Scary stories of ZFS? I thought it was the pinnacle of file systems in terms
of reliability.

~~~
dehrmann
I think this has been fixed, but at one point, I used a log device with a one-
time encryption key, thinking I didn't care that much about the data if
there's a power loss. For the most part, ZFS didn't care either, but what it
did care about was not mounting a pool with a missing disk. I had to edit
metadata on the pool so it would mount with a different log device. Tools for
this didn't exist--I was reading specs and source code, learning how ZFS knows
which devices belong in a pool.

------
KaiserPro
VFX sysadmin here

> Setup alerting if you care about your data

Yes. And make sure that you have the correct signal-to-noise ratio. Otherwise
you won't act on it.

> scrub everything

Yes, this too.
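
For ZFS, at least, a scrub is a single command, and scheduling it is a
one-line cron entry; a minimal sketch, assuming a pool named `tank`
(placeholder):

```shell
# Start a scrub of the pool (pool name "tank" is a placeholder)
zpool scrub tank

# Check progress and any checksum errors found so far
zpool status tank

# A typical cron entry to scrub monthly, e.g. in /etc/cron.d/zfs-scrub:
# 0 2 1 * * root /sbin/zpool scrub tank
```

These commands need a real pool, so treat the above as a command/config
sketch rather than something to paste blindly.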

One thing that I would add is: avoid mdadm like the plague. It's tricky,
poorly documented, and requires a whole bunch of other crap tools to make it
work.

ZFS or proper hardware RAID; there is no in between.

On the subject of losing a RAID during rebuild, I've had that, but I can't be
sure that it was because of simultaneous disk failures. The RAID in question
was a 60-drive array with 8TB drives. There were 4 RAID 7 groups with 4 spare
drives.

After each rebuild, a new drive would fail. Over the space of two weeks we
changed about 6 drives. Because the RAID was under huge pressure, it didn't
have the time needed to rebuild (a 24-hour rebuild time jumps up to 96 hours).

In the end we migrated the data to another array.

~~~
derefr
> One thing that I would add is avoid mdadm like the plague. Its tricky,
> poorly documented and requires a whole bunch of other crap tools to make it
> work.

Do you feel that Linux MDRAID is a bad _technology_ , or just a bad UX? I.e.,
are MDRAID-based _appliances_ (like many NASes)—where all the sharp edges have
been smoothed away—still bad in some fundamental way?

~~~
KaiserPro
Oh no, it's a fine technology; it just has useless documentation, and to make
it really work you need LVM, which is just horrific to deal with.

I should say that I've worked professionally with ZFS, mdadm/pv/lvm,
gluster (ugh), lustre (ok), clustered XFS and GPFS.

Along with a bunch of appliances.

mdadm/pv/lvm is in about the bottom three, after gluster and ceph.

~~~
joana035
How come mdadm needs lvm?

Do you mean device mapper?

~~~
KaiserPro
I don't mind device mapper for what I used it for: multipath SCSI.

mdadm doesn't need lvm. But if you want snapshots, or to have flexible
partitions (anything like the flexibility of zfs) you need lvm.

Also for the longest time redhat used to force lvm on you by default. I assume
this was to help with "flexibility"

In practice, on virtual machines, or anything backed with decent storage, I'd
remove the partition table and just run on the block device directly.

------
dopylitty
Don’t overlook the advice regarding setting up alerting.

A long time ago I was managing several devices which ingested logs for a large
enterprise. One day one of the devices suddenly couldn’t handle the load and
sent an alert that it was dropping log messages.

After a lot of digging and headscratching it turned out the battery backup on
the device’s hardware RAID card had failed so (IIRC) it was operating in a
mode without the onboard memory cache and couldn’t handle the write load
anymore. We had to run to the data center and replace the entire 2RU device
because of this.

If we had alerting set up on the backup batteries at least we might have been
able to react quicker.
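
For Linux software RAID and plain disks, the basics only take a few lines; a
hedged sketch (the mail address and device name are placeholders). Hardware
RAID details like BBU health usually need the vendor's own CLI, which is
exactly the kind of thing that tends to go unwatched:

```shell
# /etc/mdadm/mdadm.conf: mail on degraded arrays and failed members
MAILADDR admin@example.com

# Run the mdadm monitor in daemon mode (most distros ship a service for this)
mdadm --monitor --scan --daemonise

# /etc/smartd.conf: mail on SMART health and attribute problems
/dev/sda -a -m admin@example.com
```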

~~~
phire
I've had worse.

Someone had set the machine up with a RAID 10 of SSDs and no monitoring.

At some point the RAID controller battery died, increasing the load on the
SSDs. Who knows when that happened, no monitoring.

Then I guess the SSDs failed one by one. I have no idea in which order, or how
long the gaps were between the failures. Like I said, it would have been nice
to have some kind of monitoring.

Eventually the 3rd SSD died, and the machine died, finally calling attention
to the problem.

To top it all off, it had recently been discovered that the backups were
incomplete, and everyone was still arguing over the best way to close the
backup hole.

So we were forced to pull the SSDs and send them off to professionals for
recovery. I believe we spent several thousand on express recovery.

So yeah. Don't forget monitoring/alerting.

~~~
funnybeam
As I find myself frequently repeating - redundancy without monitoring is not
redundancy

------
user5994461
I find the article super confusing. It goes on at length about how RAID is
fine and array failure is a myth, but doesn't mention RAID 5 until three
chapters down.

The only issue with RAID has only ever been with RAID 5 specifically. Don't
use RAID 5. Definitely not with large drives or more than 4 drives.

The author runs one disk check that immediately detects one disk is dying.
After replacing the disk without the whole array exploding, he concludes that
the risk of losing data is a myth??? WTF. It was really one inch away from
losing all the data. If anything it proves that having a single redundancy is
not enough to store critical data; there is always at least one disk about to
fail in a large multi-disk array.

~~~
jugg1es
I have also only ever had problems with RAID 5. Many years ago, we ran a
hardware RAID 5 and it was really very slow. I eventually bought some extra
drives and switched to RAID 0+1, with much better results and comparable
availability.

~~~
6nf
RAID 0+1, is that the same thing as RAID 10?

~~~
greenknight
Nope... RAID 10 or RAID 1+0 is 2 or more RAID 1's that are RAID 0'd together

RAID 01 is 2 or more RAID 0's that are RAID 1'd together
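
The distinction is easy to see if you build the nesting by hand; a hedged
mdadm sketch (device names are placeholders):

```shell
# RAID 1+0: two mirrors (md1, md2), then a stripe across them (md0)
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdd /dev/sde
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/md1 /dev/md2

# RAID 0+1 inverts the nesting: build two stripes first, then mirror them.
```

In practice you'd use `--level=10` and let md handle the layout, but the
nested form shows why losing both halves of one mirror is fatal.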

~~~
jugg1es
yea I had a mirror of striped drives. Gave you max throughput with some
protection against failure.

------
m0xte
RAID at home is dead IMHO unless you have a crazy huge amount of data. If you
have a desktop PC then chucking a PCI/NVMe bridge card in it and populating
that with Sabrent 1TiB NVMe M2 sticks works out cheaper, faster and more
reliable. Unless you need several TiB of contiguous file storage keep them as
separate volumes. Plus you don’t have to deal with separate hardware built by
the lowest bidder and quite frankly terrible NAS or RAID implementations.

You also still have to back up both solutions to offline storage. From SSD
this is a magnitude less painful at those data volumes.

~~~
Exmoor
Your definition of a "crazy" huge amount of data is pretty conservative. Your
proposed solution would top out at 4TB, while most people I know who are
spinning up their own RAID at home are looking to store multiple tens of TB.
For the cost of the solution you propose (4x 1TB @ $150/ea + $50 NVMe) you
could set up a RAID-type solution with 32TB of raw capacity (4x 8TB @ $140/ea
+ $90 for a PCIe card and cables).

~~~
loeg
> tens of GB

typo -- tens of TB?

~~~
Exmoor
Indeed, fixed. Thanks.

------
maallooc
RAID is for speed and availability. It's not backup.

I personally use home grade 2x 4TB SSD RAID 1 for my media and server grade 2x
1TB SSD RAID 1 for my documents. The reason I use RAID 1 is because if one
drive fails I should be able to use the other drive while I order a
replacement. I don't expect them to be my backup.

~~~
willis936
Tomato, tomahto. RAID uses redundant (i.e. backup) drives for availability.

~~~
seized
Very wrong. Entire RAID arrays can be lost to fire, water/flooding, power
surges, multiple drive failures, theft, etc. And if the controller dies you
may be up the creek.

Backup is separate. Ideally following 3-2-1.

[https://www.backblaze.com/blog/the-3-2-1-backup-strategy/](https://www.backblaze.com/blog/the-3-2-1-backup-strategy/)

~~~
willis936
Again, this is semantics. Redundancy is backup. That doesn’t mean it’s a good
idea to keep data in one physical box in one physical location. I also never
implied that. Please follow HN guidelines paragraph 3.

~~~
seized
Redundancy is not backup based on industry standard terms. Redundancy protects
against some hardware failures but not accidental deletes, filesystem
corruption, malware, or a whole host of other things.

Even filesystems with snapshots can be lost and snapshots are the only time
you can start blurring the lines between redundancy and backups in terms of
protecting against data loss via deletes etc. And they are still not proper
backups as you can still have a hardware failure significant enough to wipe
out all your data.

A backup is a separate copy of the data, on separate hardware, off-site, or
some combination, based on best practices and standard terminology.

No one uses "redundancy" to refer to separate copies of data especially when
discussing RAID in contexts like this discussion. So the only interpretation
of your comment is the weak one that I responded to.

------
crmrc114
The comments here are a treat to read. I have used hardware raid for years.
But I stick to only raid 1/0 or combinations of those two.

With hardware from 3ware/lsi/broadcom or whatever their name is this week I
have never had an issue or a rebuild failure. With software mdadm managed
arrays in Linux I stick to the same raid types and have never had any issues.

One of my home arrays alone is over 50tb. Systems at work with an unnamed
faang company are much larger. The cattle commonly have raid 10 arrays as part
of their recipe.

I was unaware people were scared of raid.

As I tell everyone though: RAID != backup, common mode failure can strike
anything.

------
ars
Suggestion to anyone setting up RAID on Linux: Use lvmraid, rather than mdraid
directly (lvmraid will use mdraid behind the scenes).

The flexibility it gives you later is significant: things like migrating to
new hard disks, adding disk space, and adding a cache layer (lvmcache). It's
quite worth it.
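
A minimal lvmraid sketch, assuming two spare disks (names are placeholders):

```shell
# Pool two disks into a volume group
pvcreate /dev/sdb /dev/sdc
vgcreate vg0 /dev/sdb /dev/sdc

# Create a RAID1 logical volume; LVM drives mdraid under the hood
lvcreate --type raid1 -m 1 -L 100G -n data vg0

# Watch the mirror sync
lvs -a -o name,segtype,sync_percent vg0
```

Later growth or migration then happens with the usual `lvextend`/`pvmove`
tooling rather than mdadm surgery.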

------
remote_phone
This article is surprisingly naive.

I’ve worked at a well known enterprise storage company. The biggest thing
drilled into us was that we couldn’t trust anything reported to us by the hard
drives. Hard drives can report back to the OS that the write succeeded even
when it didn’t. When something like this happens you’re in for a world of
hurt, especially over time. This isn’t a bad sector, it’s corrupt data.

~~~
louwrentius
Sure, at scale this kind of thing will happen. At scale, even small risks
become a certainty.

My article was more geared towards home / small SMB users.

Silent data corruption will happen. Will you hit that at home? Probably not.

But maybe people should stop buying Synologies, QNAPS and all that kind of
hardware, I don't know ;-)

~~~
remote_phone
“Probably not.”

How can you even begin to quantify this risk if you have no idea what the
problem is, or if you’ve never actually checked? It’s irresponsible to tell
people “nah, you won’t run into this problem” or “Eh, RAID5 works great!” when
you’ve never looked into the stats or supported people who end up getting all
their data lost.

This is a real problem and anyone not running two drive redundancy in 2020 on
data that they care about is going to lose their data eventually.

If they don’t care about the data, then sure it’s not a big deal. I myself
have 2 Synologies where I back up one to the other. The backup Synology is
raid 5 because I don’t care so much about losing data, but my main drive is
RAID 6.

~~~
magnawave
I think his point was about drives silently returning corrupt data, not that
you shouldn’t have redundancy. Absolutely you should have redundancy and
backups.

At the (tens of) thousands of drives scale (and where you treat drives as
cattle) having some extra checks and balances for what you write is an
excellent idea. Even better is doing that in a distributed way so multiple
machines make that same decision. Occasionally you will run into a drive where
the firmware has jumped the shark or some such. (Or CPUs or memory or bus
issue for that matter).

But generally speaking drives DO know when they are returning bad data and
will error before they will do that. The odds are about as good as other forms
of hardware errors that will eat your data.

------
secabeen
The other weird thing about the standard RAID conversation is the assumption
that every bit on your disk is equally valuable, and that losing a single bit
is as catastrophic as losing an entire drive. Yes, RAID rebuilds are hard on a
drive, and yes, you can get read failures during a rebuild that can cause data
loss during a RAID5 rebuild. However, of the multiple TB stored on my drives,
if I lose a few sectors to a read error on the rebuild, what are the chances
that I will even notice? That error could be in unused space, or a movie I
downloaded from iTunes, or any number of blocks that store necessary but non-
personal data.

I wish that there was more data distinguishing total drive failures from
single block errors and from everything in-between.

------
johnklos
Thank you for this. I'm tired of the FUD surrounding RAID. It seems that the
same anecdotes are passed around without any vetting, and it helps nobody.

~~~
jbritton
I had a RAID level 1 on a desktop Windows machine. My machine would frequently
crash and then the RAID would have to fix itself which made the machine
unusable for the entire day. You could barely move the mouse until the rebuild
was finished.

~~~
to11mtm
Sounds like something was wrong.

I don't know if it's anything like this still today, but ideally for RAID
setups to work well the drives should be as close in spec as possible
(ideally, the same drives). Mixing drives too far mismatched leaves it up to
the controller to keep drives coordinated.

When we did RAID 1 for a setup at work (lol budget constraints but we needed
something for storage) IT insisted that we at least had a proper hardware RAID
card. I can't remember whether the rebuilds were fast, or if they ran online
and just slowed us down for a bit, but we were never down restoring for more
than a couple hours.

The irony of the situation was that we actually were blowing ALL the hard
drives more frequently in that box. At first we thought it was just the power
supply (Oh, that day was terrible. PS blew up alongside one drive, and when we
replaced drive+PS the tech may have picked the wrong direction to restore :X)

But, as it turned out, the case was just too dang hot to run two 5400 RPM and
two 7200 RPM hard drives at once. (Yes, it had enough bays to fit it all.) Even
after beefing up the power supply we would have to replace a drive every 9-12
months, and usually they were in the same location on the case.

FWIW it was an i7-920 with 12GB ram, nice SI Raid Card, running Server 2008
Terminal server. (Just... don't ask.)

~~~
Izkata
> ideally for RAID setups to work well the drives should be as close in spec
> as possible (ideally, the same drives).

Everything I've read says this is a bad idea, because it's much more likely
they'd die at the same time...

~~~
funnybeam
You want the same model but make sure they are from different batches.

When buying SANs, the vendor will routinely mix the disks supplied to minimise
the chance of any manufacturing defects taking out multiple drives at once.

~~~
to11mtm
Yeah I should have been more specific there. For RAID you probably don't want
the exact same batch for all drives, accident waiting to happen.

Now CD/DVD-Burners on the other hand, if you ever want to run 4 of those at
once, at the shop we found your best bet was to get sequential serial numbers,
back then even a firmware rev could make a multi-burner setup squirrely.

But I suppose that's a different form of Array...

------
aSplash0fDerp
RAID 1 is dead!! Long live RAID 1!!

Cloud had a meteoric rise due to the paltry size of storage at the time (edit:
partly, compute too).

[https://en.wikipedia.org/wiki/History_of_hard_disk_drives](https://en.wikipedia.org/wiki/History_of_hard_disk_drives)

10TB+ NVMe with an external interface changes the formula for large data
transfers over slow pipes. It's faster to physically ship it, or, more likely,
too cost-prohibitive to use metered networks for distribution.

Long live RAID 1!!

------
013a
I'm something of a data hoarder. 20TB array of four disks.

I tell everyone who is interested in this "hobby"; Synology. Their products
are, bar none, the best in the industry. I'd estimate that their direct
competition will never catch up. They're so easy to use and maintain, I would
assert that if you can install a web browser and check your email, you could
run a Synology NAS. And the prices don't even have much of a surcharge over
DIY (maybe 30%? you could DIY a server for pretty cheap with hand-me-down
enterprise hardware, so hard to compare, but Synology's prices are very fair).

Let me outline a few things Synology does that makes your life so much easier:

\- All of the administration happens through a web portal. And it's not a
basic, geeky portal. It's literally a full desktop. They've implemented
windowing, minimizing/maximizing, desktop shortcuts, a start menu; it feels
like a literal desktop running in your web browser. Just YouTube some videos
of "Diskstation Manager".

\- But, it's also just Linux. You can SSH into it. It supports everything Linux
does.

\- They don't use ZFS or mdadm; they've implemented their own raid system they
call "synology hybrid raid" (SHR). It supports surprisingly smart heterogenous
drive size utilization while still maintaining N-Drive redundancy; with, for
example, a 2TB+4TB+6TB+8TB drive combination in SHR-1 (RAID-5 equivalent),
only 2TB of drive storage is totally unused (12TB storage + 6TB redundancy).
Traditional RAID-5 would leave 12TB unused [1]. They put btrfs on top of it,
so it's not totally custom.

\- When a drive goes bad, the physical unit starts beeping at you. No
configuration necessary. You can see which drive is dead via the LEDs on the
front, swap it out without turning the unit off, and a rebuild begins; you
don't even have to visit the amazing Admin UI if you don't want to. You can
also have it email you if you want, which is far easier than setting up an
email notification on a bare linux server.

\- It has an app store. A server app store! Want Plex? One click and you're
online. Want Minecraft? Done. Want backups? Their "hyper backup" software
supports a dozen targets including another Synology NAS, their own cloud
storage, Google Drive, S3, Azure, SFTP, RSync, a USB drive, Dropbox, bunch of
other stuff, it supports encrypted backups, it supports incremental backups.
Want dynamic dns? Built in, with multiple providers including Synology's own,
no cost. Wordpress? One click. MariaDB? One click. Email server? Done.
MediaWiki? LDAP? OAuth IDP? Easy. All of this is as simple as installing a web
browser. Actually, it's simpler.

\- Want to share external links with friends, like Google Drive? Synology
Drive. It has a document and spreadsheet editor, totally custom, and probably
70% as good as Google Drive (which is pretty impressive considering you've
never heard about it). If you have internet which isn't NAT'ed in a way that
blocks hosting a server, it'll work. Generate a public link. Send it. Done.

Synology is awesome. If you thought self-hosting a server was too hard, think
again. Anyone can do this. The unit sitting under my TV has an uptime of three
years (I do have a UPS that can power it for about an hour or two; this has
been used many times). I literally cannot express how little I've touched it
in that time. I maintain all of our company's production cloud workloads on
AWS, and I'd feel comfortable moving some things to a Synology unit, if you
can develop a better story around redundant networking and power (this is the
real challenge with self-hosting; not the server itself, not data redundancy,
but the pipes supplying it).

[1] [https://www.synology.com/en-us/support/RAID_calculator](https://www.synology.com/en-us/support/RAID_calculator)

~~~
matheusmoreira
> They don't use ZFS or mdadm; they've implemented their own raid system they
> call "synology hybrid raid" (SHR). It supports surprisingly smart
> heterogenous drive size utilization while still maintaining N-Drive
> redundancy

This is their best feature in my opinion. It allows users to just buy random
drives from time to time and add them to the NAS when they need to expand
storage capacity. Unraid also offers a similar feature albeit with a different
implementation.

Current open source solutions just don't compare. Expanding storage with Linux
device mapper involves failing and replacing each storage device:

[https://raid.wiki.kernel.org/index.php/Growing](https://raid.wiki.kernel.org/index.php/Growing)

Even btrfs wastes 6 TB when given its 2+4+6+8 TB example:

[https://carfax.org.uk/btrfs-usage/index.html](https://carfax.org.uk/btrfs-usage/index.html)

I wonder why existing systems can't support this easy expansion. It'd be a
huge help for home users who'd like to slowly expand their storage capacity.

~~~
zozbot234
> It supports surprisingly smart heterogenous drive size utilization while
> still maintaining N-Drive redundancy: with, for example, a 2TB+4TB+6TB+8TB
> drive combination

You can do this yourself with ease by creating 2TB partitions on all drives
and setting up RAID separately on each. Then you could use LVM to access them
as unified storage, or keep them separate for extra flexibility when
restoring.
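
A hedged sketch of that layout, with placeholder device names and one 2TB
slice shown (repeat the partitioning per drive, and one md array per slice):

```shell
# Carve a 2TB partition out of each drive
parted -s /dev/sdb mklabel gpt
parted -s /dev/sdb mkpart slice1 1MiB 2TB

# RAID the matching slices across the drives (one array per slice)
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1

# Glue the per-slice arrays together as unified storage with LVM
pvcreate /dev/md0
vgcreate storage /dev/md0
lvcreate -l 100%FREE -n data storage
```

Larger drives simply contribute more slices, which is roughly how the
heterogeneous-drive trick works.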

~~~
fomine3
It's theoretically possible, but I'm afraid of an operational failure with
such a complex array. So it's nice to have a simple user interface like
Synology's.

------
danShumway
General question for anyone who feels opinionated: should I be avoiding BTRFS
in software Raids?

My understanding is that ZFS forces you to be a lot less flexible with which
size drives you put into the array, which is what attracted me towards BTRFS
in the first place. Aside from Raid 5/6 configurations, I've never had someone
tell me to explicitly avoid BTRFS.

But whenever I see any articles about Raid, they're always using ZFS.

It's gotten noticeable enough to me that I'm left wondering if the general
consensus is that everyone should just be using ZFS, even if that's something
that I don't see spoken out loud very often.

~~~
louwrentius
Yes, RAID 5/6 in BTRFS is not ready/stable AFAIK.

~~~
danShumway
Is there a strong reason why I would want to use Raid 5/6 over Raid 10 anyway
though? I guess Raid 10 is a bit more expensive, but my (naive) understanding
was that everything else was almost pure upside.

~~~
louwrentius
RAID6 is safer: any two drives can fail. RAID 10 can sustain more drive
failures, but it can only tolerate certain combinations of drives failing. And
the one drive that really should not fail is taxed during a rebuild.

Do you need the performance or capacity? What are your needs? How many drives
can your chassis hold? How much space do you need?

And so on. Maybe RAID10 is the best option for you but it's not a straight
rule-of-thumb.

~~~
cyphar
> And the drive that really should not fail is taxed during a rebuild.

That is also true for RAID 5/6 (since all drives are stressed during a
resilver of a RAID 5/6).

> RAID6 is safer: any two drives can fail. RAID 10 can sustain more drive
> failures, but can tolerate only 'certain' drives to fail

You should consider two-way mirrors as having the same redundancy as RAID5,
but with faster resilver times (a byte-for-byte copy is faster than parity
calculations -- and you should want your pool to be in a degraded state for as
short a time as possible). You might survive more disk failures, but you
shouldn't count on it.

If you want RAID6-like redundancy, use 3-way mirrors. But that's obviously
more expensive than RAID6 (even though I would strongly argue that 3-way
mirrors are far safer than RAID6).

~~~
barrkel
To rebuild RAID10, only one disk needs to be read completely; to rebuild
RAID6, n-1 disks.

For RAID10 to fail, two disks need to fail, and they need to be a mirrored
pair, so it's a conditional probability. It's possible to lose up to n/2 disks
and for the array to stay up.

For RAID6 to fail, three disks need to fail, but once three disks fail, that's
it, you're out.

This all means that whether RAID6 is better than RAID10 is dependent on the
number of disks and the actual failure rate. The more disks you have in your
array, the more likely RAID10 is to be safer than RAID6.

RAID10 is much, much safer than RAID5. They are not similar in reliability.

Most of the time, RAID6 is safer than RAID10, but RAID10 gets you a lot better
performance.
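
The conditional probability for the two-disk case can be counted directly; a
small shell sketch, assuming an 8-disk array (the specific numbers are my
illustration, not from the thread):

```shell
# For RAID10 with n disks (n/2 mirrored pairs), two simultaneous failures
# are fatal only if they hit the same pair. RAID6 survives any two failures,
# so the comparison only starts to favor RAID10 at the third failure.
n=8
pairs=$((n / 2))                 # fatal two-disk combinations (one per mirror)
total=$((n * (n - 1) / 2))       # C(n,2) possible two-disk combinations
pct=$((100 * pairs / total))
echo "$pairs/$total two-disk failures are fatal to RAID10 (~$pct%)"
```

With 8 disks that works out to 4 fatal pairs out of 28 combinations, and the
fraction shrinks as the array grows, which is barrkel's point.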

~~~
cyphar
I didn't say that RAID5 and RAID10 are similar in terms of reliability (most
of my comment said that RAID10 has many upsides over RAID5 in terms of
reliability). I said that you should consider them to have the same level of
redundancy -- unless you like to play Russian roulette with your data. Yeah,
if you have more drives there are fewer bullets in the revolver, but I'd
prefer not to play that game in the first place. If you need a system that can
survive 2 independent disk failures, use 3-way mirrors.

------
ggm
Dell and the special/magic firmware.

MegaRaidCtl and 1001 --options none of them very clear.

"sorry, that drive is FOREIGN"

"sorry, I can't do that unless you drop to BIOS and do seventeen things in a
crude GUI nobody understands"

------
reaperhulk
Ubuntu 20.04's mdadm package doesn't appear to have a crontab entry for
scrubbing actually (while 18.04 does). It _does_ have e2scrub_all, but does
that serve the same purpose?

~~~
bc4m
No, e2scrub_all is a filesystem-level check which serves a different purpose.

mdadm scrubbing is in various states of broken across major distros. It used
to be that each distro had their own cron-based scrubbing scripts. At some
point mdadm introduced its own checks using systemd timers and there have been
bugs upstream and in packaging.

I only noticed because I'm preparing to upgrade an old home server and
carefully double checking everything. I'll probably end up grabbing one of the
old cron scripts and setting it up manually. Very disappointing.

Debian/Ubuntu:
[https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/1852747](https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/1852747)

RHEL/Centos:
[https://bugzilla.redhat.com/show_bug.cgi?id=1774354](https://bugzilla.redhat.com/show_bug.cgi?id=1774354)
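
Until the packaging settles down, the old cron approach is a one-liner against
sysfs; a sketch (md0 is a placeholder, and the old distro scripts iterated
over all arrays):

```shell
# /etc/cron.d/mdcheck -- monthly mdraid scrub, like the old distro scripts.
# Writing "check" kicks off a read-and-verify pass; progress shows in
# /proc/mdstat.
0 1 1 * * root echo check > /sys/block/md0/md/sync_action
```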

~~~
reaperhulk
Thanks for the context. Unfortunate that this silent failure is present in
20.04 with no fix so far.

------
3JPLW
> Especially for home users and small businesses, RAID arrays are still a
> reliable and efficient way of storing a lot of data in a single place.

Is this really true? These are the sorts of situations where you don’t have
dedicated IT staff (or dedicated know-how) to avoid the sorts of “user”
problems TFA assumes are at the root of many online problems.

I know I’ve always been terrified of rebuilding arrays after failures — and
it’s not just because of the remaining drives’ reliability. It’s because the
tools I’ve used were really hard to use.

~~~
louwrentius
I should write a follow-up article: you should test drive replacement /
simulate failure before you start using storage.

This makes you comfortable using it and doing things like drive replacements.

Products from Synology/QNAP or other brands have nice interfaces that will
make things really easy for you.

Replacing the failed drive in my array was one simple command:

mdadm --add /dev/md6 /dev/sde

Then the rebuild started.

~~~
cyphar
And similarly, in ZFS it'd be

    
    
      % zpool replace <pool> <bad-drive> <new-drive>
    

And you can also set up hot spares in both systems so that replacing the drive
happens automatically.

One thing I would suggest is that you use /dev/disk/by-id when configuring
mdadm or ZFS. Device renumbering over reboots under Linux isn't an issue for
either system (they look at the metadata on disk rather than the drive name in
/dev), but as an admin it helps to be able to type out the actual serial
number on the drive when you're doing reboots (especially if you have the
serial numbers written on the hotswap bays).
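
For example (the serial in the by-id name is a made-up placeholder):

```shell
# Stable names include the model and serial number of each drive
ls -l /dev/disk/by-id/

# Add a replacement by its by-id path instead of the bare /dev/sdX name
mdadm --add /dev/md6 /dev/disk/by-id/ata-EXAMPLE_MODEL_SERIAL1234
```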

------
RedShift1
The problem is this article has a sample size of 1. This story is basically
anecdotal. Come back to us once you've dealt with 100 RAID 5 setups and tell
us how that worked out.

------
fomine3
This post says that mdraid bitmap solves write-hole problem but is it really?
I thought that RAID journaling solves the problem.

[https://raid.wiki.kernel.org/index.php/Write-intent_bitmap](https://raid.wiki.kernel.org/index.php/Write-intent_bitmap)

[https://lwn.net/Articles/665299/](https://lwn.net/Articles/665299/)

------
jscipione
"I ran a scrub on my 8-disk RAID 5 array (based on 2 TB drives)" Replace with
RAID 1 of 2 12TB drives and most of the article won't apply to you.

~~~
dhdhhdd
What do you mean? (I'm in the process of setting up a 2 mirrored 12tb drives)

~~~
6nf
A mirror set of two 12TB drives will perform much better than an eight-disk
RAID 5 array.

~~~
bc4m
Yeah, but what's the cost? RAID5's striping with parity is a cost-saving
approach compared to mirroring. If you can afford to mirror all your data that
will always be the better option.

~~~
fomine3
An 8-bay NAS also costs money. 2x 12TB in a 2-bay NAS makes sense for speed
and reliability, and it's not too expensive.

------
cgijoe
Missed an opportunity on the title there, bud. Should be: "Don't Be Af-RAID"

~~~
nikanj
Not everything needs to be a pun.

------
StillBored
I'm a pretty big user of RAID5 these days, despite a decade where I
specialized in storage. There are plenty of reasons for doing
RAID6+hotspares+replication (and a ton of other fancy choices), but RAID5
nails 99% of the use/problem cases in my book when managed properly. Trust me
here, I've seen plenty of RAID5 failures... :)

RAID5 is one of the best trade-offs in price/perf/availability you can make.
Just about no one runs RAID on their desktop/laptop because the core storage
technology is considered reliable enough that catastrophic failures are rare.
Instead, good backup hygiene is practiced.

For most use cases RAID5 adds a couple of digits to your
reliability/availability. When properly cared for (scrubbed and SMART-
monitored), the most exciting thing that will happen is that you will drop a
drive every few ten-thousand drive-hours, and if you swap a replacement in
within a day or two, everything will be fine.

The cases that manage to take out a well-managed RAID5 array will likely take
out just about any other mechanism as well. Having RAID6, RAID10, ZFS or
whatever adds additional complexity and won't take care of firmware bugs on
the disks, a batch of disks that all blow their bearings within a couple of
hours of each other, something going horribly wrong with the SATA/SAS
controller/expander (think voltage spikes, chassis fires (well, I've seen just
about everything!)), viruses or bad filesystem bugs. The latter risk is
scarily high, a lot higher than most people think, and one of the reasons I'm
skeptical of ZFS/BTRFS/etc.

But of course, just like your laptop/desktop, you need a solid backup/recovery
plan, hopefully offsite, multiply redundant and with a reasonable recovery
speed. So I strongly encourage everyone to invest in and maximize your backup
story before you go beyond RAID5.

Now, there are certain kinds of super-critical data, and for those there are
fancier solutions (remote replication/etc), but those solutions overwhelmingly
aren't as robust as you imagine. It's the old "a plane with two engines has
twice as many engine failures" mantra. The more layers of fancy filesystem,
replication, deduplication, encryption, T10 checksumming, etc. you add, the
more places that can fail. Which is why you shouldn't even be thinking about
them until you have a multivendor offsite (including offline) backup story that
can meet your TOS numbers for recovery. It will take exactly 1 building fire to
wipe out all that fancy hardware/software, or at least take it offline for
days. The last thing you want is the dying breath of your local storage stack
telling the remote one "ok wipe everything" or "here is write for XXXXXXXX
sectors....." and trashing a big chunk of the remote replica.

Bottom line, RAID5 provides a fair amount of additional assurance against
generally unlikely single drive failures. Nothing you can do on a single
array/filesystem will save you from all those other bad things that can
happen, so backup, backup, backup!!!!

------
mrjin
I've read through the article, and it seems to me that he wrote it to show
he was lucky enough not to experience a multi-disk RAID failure. Personally I
have owned dozens of HDDs across almost all brands, from desktop/laptop HDDs to
enterprise and NAS HDDs, and almost half of them developed bad sectors within a
lifespan of a couple of months to a few years; some of them even failed in a
row within a few days. The only exception is my last batch of 5 Seagate
NAS drives, where there was not a single failure after around 4 years, and I
think I've just been lucky with those, as I did hear quite a few people
complaining about the same model.

Due to the nature of how HDDs work, the disk surfaces leave the factory with
more or less defects and are graded by defect count; even the enterprise-grade
ones are not free of defects. Those defects and adjacent areas are then masked
by the firmware so that the magnetic heads won't try to read/write from/to
them. But that won't prevent them from growing. The factories are very clean,
but dust in the air is simply impossible to avoid, and if a particle lands on
the disk, it's a time bomb. Also, as disk capacity grows, the tracks get
narrower and narrower. The disks spin at 5400 rpm to 15000 rpm, maybe even
faster, and the magnetic heads glide microns above the disk; if a head touches
the disk surface, it is almost sure to create a bad sector or even bad tracks.
And again, vibrations are impossible to avoid. Making it even worse, such
contact is almost guaranteed to generate debris that gets scattered across the
whole disk, and even onto other disks sealed in the same case. The bigger the
capacity, the narrower the tracks, and the less tolerant the disk is to such
contact. Given all that, I really don't see where he got the confidence that
the disks people bought were not going to have bad sectors during their
expected lifespan.

Also, the disks people use to build a RAID are most likely the same brand,
the same model and the same batch, so if one has started to develop bad
sectors, most likely the others are going to have them soon, or worse, may
already have them. Even if the other disks in the array do not have bad sectors
at the time of the failure, they are highly likely to develop them during the
rebuilding process. Why? The load. Take a 10TB drive as an example: my Seagate
10TB NAS drive can deliver ~180MB/s peak writing and ~80MB/s at the trough.
Let's assume it can deliver 100MB/s all the way; if the disk is half full, the
rebuild will take almost 14 hours to finish, and this load is highly likely to
push the other drives to their limit. RAID6 allows one more drive to fail. For
RAID5, if another drive fails during rebuilding, your data is gone. If you have
ever had to rebuild a 12-bay NAS, you know how frustrating it can be,
especially when there are secondary failures.
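The 14-hour figure checks out. As a quick sketch, using only the numbers quoted above (10TB drive, half full, 100MB/s sustained):

```python
# Rough rebuild-time estimate for the 10TB example above.
capacity_tb = 10
fill_fraction = 0.5                # disk half full
throughput_mb_s = 100              # assumed sustained rebuild speed

data_mb = capacity_tb * 1_000_000 * fill_fraction   # 10 TB = 10^7 MB (decimal)
rebuild_hours = data_mb / throughput_mb_s / 3600
print(f"rebuild takes ~{rebuild_hours:.1f} hours")  # ~13.9 hours
```

And that is the optimistic case: a full disk, or a rebuild throttled by ongoing user I/O, stretches the window (and the exposure to a second failure) considerably.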

Software RAID may somehow alleviate the problem a little bit, but it will also
cover up the issue. My experience has been that once there is a single bad
sector, there will be lots of them soon; if you don't back up the data and
replace the faulty drive on the spot, most likely it will soon be too late.

All of the above doesn't even consider the setup and quirks of NAS boxes or
RAID cards. Putting all those issues aside, what are the benefits of RAID for
home users? IMO, almost nil. Our home network devices are most likely Gigabit
ones with 1 Gbps ports, which can only deliver about 100MB/s; you can see that
a single drive can almost saturate that bandwidth, even with link aggregation.
And most people don't need the random IO boost or the extra-large volumes that
come with RAID.
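To put the Gigabit bottleneck in numbers (a small sketch; the ~180MB/s drive speed is the peak figure quoted earlier in the comment):

```python
# Gigabit Ethernet ceiling vs. a single modern HDD's sequential speed.
link_gbps = 1.0
link_mb_s = link_gbps * 1000 / 8   # 125 MB/s theoretical; ~110-115 in practice
hdd_peak_mb_s = 180                # peak sequential throughput from the comment

print(f"GbE ceiling: {link_mb_s:.0f} MB/s")
print(f"one drive already exceeds it: {hdd_peak_mb_s > link_mb_s}")  # True
```

So for a NAS reached over a single 1 Gbps link, striping more spindles together buys essentially no extra sequential throughput at the client.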

My suggestion for home users is to go single drive and stay away from RAID.

~~~
StillBored
Modern disks are like CDs: they have layers and layers of ECC (both spinning
and flash at this point) and a certain assumed BER.
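That "assumed BER" is what makes today's capacities interesting. A common consumer-drive spec is one unrecoverable read error per 10^14 bits read; a quick sketch of what that means for a full pass over a 10TB disk (the URE rate is an assumption, not a figure from this thread, and real drives often beat the spec):

```python
# Expected unrecoverable read errors when reading an entire 10TB disk,
# assuming the common consumer-drive spec of 1 URE per 1e14 bits read.
disk_bytes = 10 * 10**12           # 10 TB, decimal
bits_read = disk_bytes * 8         # 8e13 bits
ure_rate = 1e-14                   # errors per bit read (assumed spec value)

expected_ures = bits_read * ure_rate
print(f"expected UREs over a full read: {expected_ures:.1f}")  # 0.8
```

A regular scrub is essentially that full read done deliberately, so the errors surface (and get corrected or relocated) while the redundancy is still intact, instead of during a rebuild.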

What you describe is the usual case with an unscrubbed drive. As it slowly
bitrots, everything appears just fine, until the day you happen to read a file
on a part of the disk that hasn't been accessed in a couple of years. Then
suddenly the controller's ECC can't correct the error and you take a read
fault. Then you start listing the directory and discover that, yeah, there are
bad sectors everywhere.

So, sure, you can have a catastrophic failure, but the usual case with a well
scrubbed drive is a corrected/relocated error once in a while. As long as they
remain intermittent you're fine; if you suddenly get 100 errors on one scrub
and the next one gets another 100, you'd better replace that disk fast. The
other case is that you go from just fine to pretty much nothing being readable
in a single scrub cycle. You don't really have a choice in that case; the
controller will likely just kick the drive out.

