
The 'hidden' cost of using ZFS for your home NAS - walterbell
http://louwrentius.com/the-hidden-cost-of-using-zfs-for-your-home-nas.html
======
VanillaCafe
The 'hidden' cost of traditional RAID for your home NAS:

I've learned the hard way that the 'R' in traditional RAID truly does stand
only for "redundant" and not "reliable". Reliability in traditional RAID is
predicated on a complete, catastrophic failure of a drive, such that it is
either working wholly and completely or failing wholly and completely.

In a traditional RAID, for any failure mode in which a drive or its controller
starts to report bad data before total failure, the bad data is propagated
like a virus to the other drives. The corruption returned by a failing drive
is lovingly and redundantly replicated to the other drives in the RAID.

This is the advantage of ZFS (or BTRFS). Blocks of data are checksummed and
verified and corruption isolated and repaired. Yay for reliable data.
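
The mechanism is simple enough to sketch in a few lines of Python (illustrative only; ZFS actually uses fletcher4 or sha256 checksums stored in the parent block pointer, not this exact scheme):

```python
import hashlib

def checksum(block: bytes) -> str:
    # per-block checksum, stored separately from the data it covers
    return hashlib.sha256(block).hexdigest()

data = b"important bytes"
stored = checksum(data)            # written alongside the data

returned = b"importent bytes"      # a failing drive silently flips some bits

# on read, the mismatch is detected, so the bad copy is never replicated;
# a redundant copy (mirror or parity) is used to repair instead
assert checksum(returned) != stored
assert checksum(data) == stored
```

Traditional RAID has no such per-block verification, so it cannot tell which copy is the bad one.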

~~~
lbenes
I'm about to either buy a home NAS or build my own with an atom Mini ITX. I
plan to expand my array in the future. So is there any configuration that
gives me the best of both worlds, ie the expandability of traditional RAID
with checksums to prevent replicated errors?

~~~
Uberphallus
BTRFS allows you to add and remove drives, and change RAID levels, _on a
mounted filesystem_, while using differently sized drives efficiently. That
alone wins me over. Some people still say BTRFS isn't ready for production,
but I've had fewer problems with it than with ZFS (YMMV). Still, I don't care
much as I have multiple backups.
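
For concreteness, the online reshaping looks roughly like this (a sketch with hypothetical device and mount names, not a recipe):

```shell
# all of this runs against a mounted, live filesystem
btrfs device add /dev/sdd /mnt/data                             # grow the array
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/data   # change RAID level
btrfs device remove /dev/sdb /mnt/data                          # shrink the array
```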

~~~
xobs
I decided to try BTRFS on my NAS because it didn't require rebuilding a
kernel. The ability to add disks to an array and have it rebalance made it
very appealing.

Unfortunately, the three-drive filesystem lasted two weeks before it became
unmountable. The only thing that let me mount it was finally running btrfsck.
I was left with 57 unrecoverable errors, and lots of lost data.

I would not recommend running BTRFS in RAID5 or RAID6 just yet. Stick with
mirroring, if you want to use it, and rebalance to RAID5/6 later on when it's
more stable.

~~~
washadjeffmad
Which kernel, and did you file a bug report?

e: To anyone not up on btrfs, its features are closely tied to the kernel
version it's used with. For example, raid56 scrub and device replace, and
recovery and rebuild code were not available prior to kernel 3.19.

I also believe the only way to use 5/6 modes before they were stable was to
explicitly compile with them enabled. It wasn't just something you could
accidentally do.

~~~
xobs
It was 4.2.5-1-ARCH, and I didn't file a bug report, no.

I didn't have much data to submit. No kernel panics, no useful error messages,
nothing beyond it saying it wouldn't mount. One could read the tea leaves from
the filesystem as it sat, but such data spelunking could take a while on an
8TB partition, and I wanted to get the disks back into use.

I didn't notice the corruption until after I had unmounted it, so scrubbing it
wasn't an option.

------
laumars
That's not a hidden cost; that's what you'd normally expect with any
traditional RAID. However, like many modern solutions, ZFS does have a few
workarounds:

    
    
       MULTIPLE ARRAYS PER TANK
    

If you have space in your storage server's housing, then you can buy multiple
disks at a time as a separate array (e.g. a 3-disk raidz1) and add them to
your existing "tank" (i.e. storage pool).
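
A minimal sketch of what that looks like (hypothetical pool and device names):

```shell
# add a second 3-disk raidz1 vdev to the existing pool "tank"
zpool add tank raidz1 /dev/sdd /dev/sde /dev/sdf
zpool status tank    # the pool now stripes across both raidz1 vdevs
```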

    
    
       AUTOEXPANDING INTO LARGER DISKS
    

If you have _autoexpand=on_ set on your tank then you can upgrade the capacity
of each disk in a raidz, one disk at a time. The caveat here is that your
array doesn't increase in capacity until every disk has been upgraded, but it
does still allow you to start the upgrade process incrementally.
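
Sketched out (hypothetical names; each replacement triggers a full resilver):

```shell
zpool set autoexpand=on tank
zpool replace tank /dev/sda /dev/sdg   # wait for the resilver, then do the next disk
# the extra capacity appears only once every disk in the raidz has been replaced
```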

    
    
       ALTERNATIVE SOLUTIONS
    

However, if you need to upgrade your array one disk at a time, with the
capacity increasing with each new disk added, then ZFS is definitely the wrong
choice, as is any traditional RAID array. Instead I would recommend unRAID or
similar (I think Windows might have something built in, but I'm a UNIX guy so
couldn't comment there).

------
notpeter
> Other software RAID solutions like Linux MDADM lets you grow an existing
> RAID array with one disk at a time.

His issue isn't with ZFS, it's that most parity raid (raidz, raidz2, raid5,
raid6, etc) doesn't support safely rebalancing an array to a different number
of disks.

With mirrors, the things he describes aren't an issue, especially in a home
server. You can start with one disk, mirror it when you're ready; then add
additional vdevs of mirrored pairs extending your pool as necessary. Or
upgrade two disks to grow a vdev.

[http://jrs-s.net/2015/02/06/zfs-you-should-use-mirror-
vdevs-...](http://jrs-s.net/2015/02/06/zfs-you-should-use-mirror-vdevs-not-
raidz/)
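
The growth path described above, as a sketch (hypothetical device names):

```shell
zpool create tank /dev/sda               # start with one disk
zpool attach tank /dev/sda /dev/sdb      # turn it into a mirror when ready
zpool add tank mirror /dev/sdc /dev/sdd  # extend the pool with another mirrored pair
```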

~~~
__david__
Linux mdadm supports re-striping raid5 devices in place. I've done it, it's
fun. If your raid doesn't support that, it's deficient.
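
For anyone curious, the in-place restripe is roughly this (hypothetical names; take a backup first, reshapes take a long time):

```shell
# grow a 3-disk raid5 at /dev/md0 onto a fourth disk, in place
mdadm --add /dev/md0 /dev/sdd
mdadm --grow /dev/md0 --raid-devices=4
# once the reshape finishes, grow the filesystem, e.g. resize2fs /dev/md0
```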

~~~
jlgaddis
> _If your raid doesn't support that, it's deficient._

Nowadays, RAID5 is deficient.

~~~
__david__
That's another topic entirely. I'm just saying if you're going to support it,
support it correctly.

------
joosters
Five or six drives for a home NAS? This, IMO, is madness.

RAID-1 a couple of drives, and _take backups_. If the worst happens, you can
restore offline from your backups. If you seriously need 100% uptime at home,
in the face of multiple drive failures, you are doing something badly wrong.
Or, at least, doing something far beyond 'home use'...

~~~
__david__
Not in the slightest. I'd rather have some redundancy up front than try to
restore a bunch of junk from backups.

Backups are important, but not ever having to use them is even better.

Not only that, but having RAID or mirror sets gives you flexibility in ways
that you might not think about up front. For instance, I just replaced some
old 750GB disks with new 5TB disks. I added the new 5TBs to the mirror set
and let it bring all the data across automatically. When it was done, I
dropped the 750s out and resized the raid set to use the whole 5TB (I told it
it was already clean so it didn't have to sync up a bunch of unused space).
Then, finally, I resized the filesystem that was mounted on that mirror set.
This was all done live, with data actively being read and written to while all
this volume manipulation stuff was happening.

 _That_ is why you use RAID at home (and lvm, too).
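
A sketch of that live swap with mdadm and ext4 (hypothetical names; `mdadm --replace` needs a reasonably recent kernel and mdadm):

```shell
mdadm /dev/md0 --add /dev/sdc                       # new 5TB disk joins as a spare
mdadm /dev/md0 --replace /dev/sda --with /dev/sdc   # copy across, then drop the 750
mdadm --grow /dev/md0 --size=max                    # claim the full new capacity
resize2fs /dev/md0                                  # grow the mounted filesystem
```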

~~~
snowwrestler
RAID is not a replacement for backups; they do different things.

RAID preserves performance and/or uptime in the face of hardware failures. It
does not protect against data corruption or deletion. It will happily sync
those across all your disks.

Backups protect against any data loss: hardware, software, even intentional.
The key feature of a backup is that it is insulated from live data operations
(including RAID sync).

~~~
__david__
I never said it was a replacement. I wholeheartedly agree that backups are
indispensable, but when dealing with drive failure, _I'd_ rather just swap out
a drive and not have to deal with restoring from backups. I find backups are
good for fine-grained, laser-focused restores; trying to restore an entire
drive is either slow or error-prone, especially if one deals with hard links a
lot.

------
protomyth
The BSD now podcast
[http://www.bsdnow.tv/episodes/2015_01_06-zfs_in_the_trenches](http://www.bsdnow.tv/episodes/2015_01_06-zfs_in_the_trenches)
talks about this article and gives some solid recommendations on how to do
your setup. It helps that one of the hosts co-wrote a book on ZFS.

~~~
jlgaddis
And in the most recent episode, #123, this specific article was addressed
(starting at about 11:55):

[https://youtu.be/B_OEUfOmU8w?t=11m55s](https://youtu.be/B_OEUfOmU8w?t=11m55s)

~~~
laumars
You're just reiterating what the previous comment said.

------
KaiserPro
Sorry, but, no.

If you've gone to the effort of getting a massive raid like that for home use,
you need a backup.

If you don't have a backup, stop reading, you're an arse.

Run a full backup, and check that restoration works.

Blast away your NAS (it's ZFS, which relies on free space: the fuller it gets,
the slower it gets, unless you have SSDs or lots of write cache). It's
probably due an upgrade anyway (OS/ZFS).

Re-arrange your array; you're changing the way you use it, so you need to do
it.

Restore your data back over the top (with compression...)

As for "it costs money": well, that's why you chose ZFS, for redundancy....

~~~
ownagefool
It depends what you're doing really. I run a 5 disk ZFS cluster to host media
files. I'd rather not deal with failure very often, but when it does happen
and I lose a whole bunch of media, I can download it again.

The second I elect to host anything even remotely important on it, it'll be
backed up, but until then there's no need for me to pay to host that
elsewhere.

Professionals who think they don't need backups because they have redundancy
do need a slap though.

~~~
leni536
For that use case ZFS seems overkill.

~~~
ownagefool
Possibly, but what's the problem? It's pretty much a "just works" file system
where you can lose 1-2 disks, and where blocks are continuously checksummed
and scrubbed to ensure consistency.

What exactly would you rather be using?

------
i336_
In case anybody's still reading this, I have a couple of use-case questions.

I'm planning on setting up my first volume pool using 4 5TB Seagate ECs. I
can't get any more than four to begin with.

I was intending on using mirrored vdevs, as I'd heard that was basically the
easiest way to get started. This is entirely for personal use, and I don't
need "enterprise" read/write rates (the hardware I'll be starting with
certainly won't manage that), and it won't be the end of the world for (very
very rare) resilvers to take all night.

Are there any advantages to _not_ using mirroring, ie using one of the RAIDZ
variants? I've already accepted the idea of having to buy new disks two at a
time so I can spin up a new vdev with them.

~~~
laumars
The biggest advantage is storage space. You lose more potential storage space
with mirroring since it's 1:1, whereas raidz1 across 4 drives would give you
much more storage space, with the caveat that you can only lose 1 drive before
your entire pool is lost.
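
The arithmetic for four 5TB drives, ignoring formatting overhead (rough numbers):

```python
drives, size_tb = 4, 5

# two 2-way mirror vdevs: half the raw capacity goes to redundancy
mirror_usable = drives * size_tb // 2    # 10 TB usable

# one 4-disk raidz1 vdev: one drive's worth of capacity goes to parity
raidz1_usable = (drives - 1) * size_tb   # 15 TB usable
```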

~~~
i336_
Hmm. 15TB definitely sounds nicer than 10.... :D :D

I fully intend to buy disks with widely spaced manufacturing dates, so
theoretically this should be manageable.

Now I get why 8 disks is recommended... you can make a mirrored pair of 4-disk
raidz1s, you get >50% storage efficiency, and you can lose any two disks
before it dies.

------
arca_vorago
I know everyone is talking about btrfs and zfs, but I would just like to give
a shoutout to DragonFly BSD and HAMMER2, which are also doing great work in
this regard (among other things).

~~~
tw04
HAMMER2 has been in the works since 2012 and still isn't considered something
you should put any data on that you care about. No offense to Matthew, and I
applaud his hard work, but HAMMER2 isn't being mentioned because it isn't even
part of the conversation. At this point we can't even be sure it will ever
make it to a production ready state.

~~~
arca_vorago
You are correct about HAMMER2, but don't forget about HAMMER1.

------
mavhc
My experience with ZFS on my home NAS: started with 6 2TB greens, 1 boot, 5
raidz1, got 5 3TB greens in another raidz1. One of each failed, replaced.
Recently got 6 8TB SMR drives in a raidz2.

Never used ECC ram, too expensive. No problem with LSI raid cards, or SAS
backplane. Most important advice: People aren't kidding when they say your
zpool will slow down when it's almost full.

Online resizing of md arrays isn't reliable, my friend lost terabytes. ZFS is
10 years old, well tested.

------
throw7
The article says that adding a vdev of a raid-1 pair of disks would "break"
the raidz2 redundancy of the other vdevs. Why?

Is it best practice that all vdevs in a zpool use the same RAID level?

~~~
usefulcat
To answer your second question, I think the answer is generally yes, because
loss of _any_ vdev in a pool means the entire pool is lost.

For example, if you initially created the pool with a single raidz2 vdev,
probably the only way it makes sense to add a second raidz1 vdev is if you
later changed your mind about how important your data is.

------
cpncrunch
That's why I like Synology for a home NAS: you can just add drives, or replace
them with larger ones, any time you like, and the Synology box sorts it out.

------
mozumder
What are the cheapest drives we can use for ZFS NAS for occasional media
access?

Can we use the absolute cheapest Western Digital Green drives, with limited
lifespans? Does ZFS spin down drives when not in use?

Actually, how long do the cheapest drives last when spinning 24x7?

I ask because Best Buy has 5TB Western Digital drives for $110 today:
[http://www.bestbuy.com/site/wd-my-book-5tb-external-
usb-3-0-...](http://www.bestbuy.com/site/wd-my-book-5tb-external-usb-3-0-hard-
drive-black/4222407.p)

But these consumer drives never list reliability figures, unlike data center
drives.

~~~
zer01
Greens would work, but they're limited to 5400RPM, so your data transfer rate
will take a hit.

Reds are WD's "NAS" drives, with some advanced features, but the Seagate
ST2000DM001/ST3000DM001 have worked very well for me with multiple years of
almost constant data access.

Sidenote: the upgrade from 4x2TB to 4x3TB was extremely simple. I just failed
my drives over one at a time, so as long as you upgrade every disk you can
expand everything really easily with ZFS.

~~~
throwaway7767
The seagates worked well for me for a long time when they were directly
attached to SATA ports, until I got a chassis with a SAS backplane. Their
firmware had some bugs which would stall all transfers on the whole backplane
(not just the one disk) for about 30 seconds when there was a lot of activity.

I switched to WD REDs after that, haven't had any issues since.

~~~
zer01
Really? That's interesting (and shitty!). Thanks for the heads up!

------
stevejones
ITT: people who've never done an online resize

------
Spooky23
With the wide availability of cheap backup services like backblaze, this nas
building nonsense is a complete waste of time.

I have 3-4 direct attach usb and FireWire disks. One is a working disk, one
backup, the rest for various things.

If one of the working disks fails, I have a local backup. If they all fail,
call backblaze.

In all cases, I avoid wasting time and money setting up ersatz infrastructure
that in reality is less reliable and more expensive than the simple solution.

~~~
Amezarak
> With the wide availability of cheap backup services like backblaze, this nas
> building nonsense is a complete waste of time.

My home upload speed is 3mbps.

I make weekly image backups on top of my incremental file backups. The images
are ~800GB and 200GB. Additionally, I have a ~1TB media collection.

Uploading the images to Backblaze weekly is clearly unfeasible. Restoring from
old images in the event of a failure would be extremely time consuming.

Therefore, I have a 14 TB NAS that should last me a long, long while and also
serves as a Plex media server. To lose data I really care about requires that
four drives / two computers fail or my house burns down. I am willing to
accept that risk.

~~~
i336_
> I make weekly image backups on top of my increment file backups. The images
> are ~800GB and 200GB.

Yeow. This needs a good helping of ZFS snapshots!

~~~
Amezarak
The images are Windows Backup vhdx's, so I'm not sure that's possible. It
isn't really worth my time to investigate - it happens automatically on a
schedule, and over a gigabit LAN it goes fast enough that neither the computer
nor the NAS is noticeably impacted.

In an ideal world, Windows would just use ZFS, but hey, you can't have
everything. ;)

