I use ZoL for my home server (5x3TB RAIDZ) and couldn't be happier; I've never had a single issue. The features really spoiled me compared to other filesystems/volume managers. The only issue that irks me is that I can't resize a pool after it was created, can't add new disks, can't remove one (without risking parity).
Recently I tried running a simple BTRFS SSD mirror for just a single KVM virtual machine. I thought compression would be neat since they were just two cheap 120GB SSDs, and I wanted a spare if one of them gave up the ghost.
At first everything was great, but then I ran a btrfs scrub and it found ~4000 uncorrectable errors; my VM was broken beyond repair and had to be restored from backup.
The SSDs' SMART data was fine, there weren't any loose cables, everything worked great, no reported (ECC) memory errors... I have no proof, but it seems that BTRFS just decided to destroy itself.
I have since moved my SSDs to a ZoL mirror and (after running scrubs every two days for two weeks) had no further corruption, silent or otherwise. To me, this means that btrfs just isn't stable enough for production use - while ZoL is.
> The only issue that irks me is that I can't resize a pool after it was created, can't add new disks, can't remove one (without risking parity).
This is what eliminates it from home use for me.
It means the only way to expand the pool is to copy everything off (so you need enough spare storage to hold your entire pool), rebuild, then copy it all back. The alternative is to not expand and just add additional pools, but that quickly loses the benefits (not having to think about where there's free space when adding something, not having to look in multiple locations to find something).
Can anyone who has been using ZFS long-term at home comment? How do you add more space?
I have a pool of two raidz2s of four disks each, one set twice the size of the other; every so often I replace one set of disks with disks that are 4x the size (i.e. I started with 4x250gb + 4x500gb; after a few years I replaced the 250gb disks with 1tb disks; right now I have 4x2tb and 4x4tb - and I get to store half as much data as the total capacity, so 12tb at the moment). If a disk dies close to the time I was thinking of upgrading then I'll replace that disk with one of the "new" size (but I can't use the extra space until I do the rest of the replacement).
It works pretty well - by the time I'm buying disks that are 4x as large I don't mind throwing the old disks away. I've definitely avoided data loss in scenarios where I'd previously lost data under linux md (which lacks checksums and handles disks with isolated UREs very poorly).
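For anyone curious, that replacement cycle is roughly the following (a sketch assuming a pool named tank and hypothetical device names; repeat for each disk in the vdev and let every resilver finish before starting the next):
zpool replace tank <old small disk> <new large disk>
[...wait for the resilver to complete, then repeat for the remaining disks...]
zpool set autoexpand=on tank
The extra capacity only shows up once every disk in the vdev has been replaced with a larger one.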
Uh?
Let's say you have a simple two-disk mirror; you go like this:
zpool attach <poolname> <first existing small disk> <first larger new disk>
zpool attach <poolname> <second existing small disk> <second larger new disk>
[...wait for the resilvering of the new disks...]
You now have a 4-way mirror with 2 small and 2 large disks. Detach the small ones:
zpool detach <poolname> <first old disk>
zpool detach <poolname> <second old disk>
Now you have a pool made only of large disks. You just give the pool permission to expand and occupy all this new space:
zpool set autoexpand=on <poolname>
Done. Did it oodles of times on Solaris with SAN storage, but did it at home too, and in the weirdest ways (no SATA ports available? No problem: attach the new drive via USB3, then when finished take it out of the enclosure and install it internally, displacing the old drive), sometimes even rather unsafely (creating a pre-broken raidz with 4 good disks plus a sparse file, migrating data off a different pool, then decommissioning that old pool and using one of its disks to replace the sparse file).
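For the curious, the sparse-file trick is roughly this (a sketch with hypothetical device names; the pool runs with reduced redundancy until the placeholder is swapped for a real disk):
truncate -s 3T /tmp/placeholder.img
zpool create tank raidz sda sdb sdc sdd /tmp/placeholder.img
zpool offline tank /tmp/placeholder.img
[...migrate the data over, decommission the old pool, then...]
zpool replace tank /tmp/placeholder.img sde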
The commonly wished-for feature is not to entirely replace disks as you describe, but to expand it bit by bit. To be able to have e.g. a z2 setup spanning 4 disks of equal size and then add a 5th one later to get one disk's worth of extra capacity.
If you mean turning a 4-disk raidz2 into a 5-disk raidz2, indeed that's unfortunately not yet possible (unless you had built it with a sparse file as an extra device and run it in that degraded state - effectively you had a raidz1 rather than a z2, and the trick only works once).
If you want, though, you can add capacity by attaching a second raidz2 to an existing pool - this forces you to add 4 new drives instead of one, but it works:
zpool add <poolname> raidz2 <first new vdev> <second new vdev> <third new vdev> <fourth new vdev>
You now have a concat of 2 raidz2's, in a single pool.
Nowhere near as elegant as merging a new slice into an existing z2, I concur. Does the job though. Oh, and if you are insane you could probably even add a single device, resulting in a concat of a z2 + an unmirrored single device. I don't think it will stop you from doing that.
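(For what it's worth, zpool will normally grumble about the mismatched replication level and only go ahead if forced, roughly:
zpool add -f <poolname> <lone unmirrored disk>)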
It will result in very uneven utilization, which leads to poor performance. Not recommended. It is also extremely wasteful. The whole point of the sought-after feature is to minimize waste and cost. So no, I'd strongly disagree that it does the job.
I'm not trying to defend ZFS' lack of the ability to add a vdev to an existing RAIDZ/Z2. That would be extremely nice to have and has tons of legitimate uses, for many of which there is no acceptable substitute as you point out.
That said, even if that function existed, expecting it to magically rebalance the whole pool to take into account the existing data is rather unrealistic (it's basically a full rebuild in that case, the most terrifying operation you can perform on any pool/array already containing data).
While I agree with the performance aspects (your data still comes from either the first volume or the second, so you're only using the performance of half your spindles - unless you simultaneously access files in both halves), I wouldn't call it wasteful: you are still losing the same % of your total capacity. E.g.:
6x2TB RaidZ2 = 8TB usable (4TB lost to parity)
let's add a second Z2 6x6TB volume:
6x6TB RaidZ2 = 24TB usable (12 TB lost to parity)
Total: 32TB usable, 16TB lost to parity.
What if we had built it from scratch using 6x8TB drives? 6x8TB RaidZ2 = 32TB usable (16TB lost to parity) - exactly the same split.
> That said, even if that function existed, expecting it to magically rebalance the whole pool to take into account the existing data is rather unrealistic
That is exactly what is required (well, you re-balance the vdev, not the pool) and is what has been proposed and in the works for basically forever. Sun never prioritized this because it has few uses within the enterprise (but it is extremely valuable for home users).
Your example is quite misleading. Here is a much more common and realistic scenario:
6x2TB RaidZ2 = 8TB usable, 4TB lost to parity.
Now imagine that you want to expand the pool by 4TB. Being able to add disks to a raidz vdev would result in this:
8x2TB RaidZ2 = 12TB usable, 4TB lost to parity (ridiculously cheap, no further waste at all).
Current scenario:
6x2TB RaidZ2 + 4x2TB RaidZ2 = 12TB usable, 8TB lost to parity (expensive, lower performance, more power, more noise, and it requires more hard-drive slots, which is often a very important constraint for home setups).
The waste is outrageous, and its consequences shape every aspect of designing a home setup with ZFS (so as to avoid the above scenario) - vastly different from how you would design a system with, for instance, RAID6 and expansion in mind.
Expanding bit by bit is extremely fragile, arcanely complex and error-prone. It's not worth the financial savings, especially once the cost of time and effort is added on top.
It might be what you say, but in many cases it might also be a necessity. I built my first array with 10x1TB disks (starting with 3) for my home server when I was a student/just getting my first job and could only afford one disk every now and then. Linux's expandable mdraid worked like a charm multiple times. I probably wouldn't have bothered (and would have lost data or limited my hoarding) if I had had to save for a year just for a fileserver. I am not talking about precious data, but data doesn't have to be irreplaceable for you to want to protect it.
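For reference, growing an mdraid array looks roughly like this (a sketch assuming an existing array at /dev/md0 and a hypothetical new disk /dev/sdd):
mdadm --add /dev/md0 /dev/sdd
mdadm --grow /dev/md0 --raid-devices=4
[...wait for the reshape to finish, then grow the filesystem, e.g. resize2fs /dev/md0...]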
For example, you can start with 1 mirrored vdev, then add another mirrored vdev. You can upgrade vdevs separately, so you'll need to replace just 2 disks to grow your pool.
The only thing you have to keep in mind is that data is striped across vdevs only when written, so if you add another vdev, you won't get performance gains for data which was written just to one vdev.
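A minimal sketch of that growth path, assuming a pool named tank and hypothetical disk names:
zpool create tank mirror sda sdb
[...later, when more space is needed...]
zpool add tank mirror sdc sdd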
It's also important to keep in mind that a stripe of mirrors, aka RAID10, has a failure probability several orders of magnitude higher than RAIDZ2 (or higher parity levels) on most common disk setups.
You also have to factor in the speed of a resilver and account for the chance of a failure during this at-risk window.
A mirror can resilver much faster, and it doesn't significantly affect the vdev performance while doing so. A RAIDZ resilver after a disk replacement can take a significant amount of time and seriously degrade performance, as it thrashes every single disk in the vdev.
Allan Jude and Michael Lucas' books on ZFS have tables describing the tradeoffs of the different possible vdev layouts, and they are worth a read for anyone setting up ZFS storage.
The danger of losing data on a RAID10 resilver is much higher than people expect. A 6x8TB RAID10 has a 47.3% chance of encountering a URE during a rebuild, which means data loss.
A 6x8TB RAID6 has a 0.0002% chance of a URE.
(Assuming a URE rate of 10^-14 per bit read; in reality this rate is lower.)
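For reference, the arithmetic behind the 47.3% figure: rebuilding one side of an 8TB mirror means reading the entire surviving partner, i.e. 8TB ≈ 6.4x10^13 bits, so P(at least one URE) = 1 - (1 - 10^-14)^(6.4x10^13) ≈ 1 - e^-0.64 ≈ 47.3%.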
There is another way to grow a pool: replace each disk with a larger one, resilvering after each replacement.
But yeah, sizing your pool very generously when first building it is a good idea. I found that 6x 4TB drives in a raidz2 pool was within my budget, and it will take me a long time to fill 16TB.
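A sketch of that starting layout, with hypothetical device names (in practice you'd point at stable /dev/disk/by-id paths):
zpool create tank raidz2 sda sdb sdc sdd sde sdf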
What does the gp mean by saying there's no way to resize a pool? You can resize a pool by adding disks like you say, so what's the alternative? Resizing by only adding one disk? But it's RAID-Z, where will the parity go? Does anyone support this use case?
On BTRFS you simply add the new disk and use the new space, or rebalance the pool to spread data onto the new disk properly.
On SnapRAID, the next sync will fold the new disk into the parity drive's contents.
For low-cost home usage, it is much, much more cost-effective to buy single disks and start with small pools than to buy large pools up front or to replace entire pools.
Recovering from my current backup solution is expensive; the additional cost is not worth it.
Remaking the entire pool is also a hassle and incurs unnecessary downtime.
Additionally, not all data is backed up, so I would lose it. It's not important data - it's okay to lose it in a house fire, but not just for resizing a pool.
Lastly, this operation would likely take a long time - probably days. I'd rather just be able to ram in another disk and be done with it.
I had assumed you would have a second array as backup for the current pool ensuring zero data loss and easy backup. This would seem to be optimal. Remote backup is obviously a good thing to have too.
Btrfs supports it. The parity stays exactly where it is, but new files will use the new device. You can (and far too often have to, when btrfs decides there is no free space left) also perform a 'balance' operation, which recreates all the b-trees on the disks, optionally with different parity options.
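Roughly, assuming a filesystem mounted at /mnt/data and a hypothetical new device /dev/sdd:
btrfs device add /dev/sdd /mnt/data
btrfs balance start /mnt/data
[...or, to convert the data/metadata profiles while rebalancing...]
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/data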
Hmm, I don't understand how that's possible on RAID-Z? You can only have as much space per disk as the parity, no? I.e. you can't have three 500 GB disks for a total of 1 TB of space, replace one with a 1 TB disk, and get more space, can you?
For each chunk of on-disk data, the fs stores which devices it is stored on and the used parity configuration. You can take one such chunk of data, and clone it into a new chunk with identical contents but a different parity configuration in the free space of the devices that are part of the file system. (Just like you'd allocate new chunks for storing new files in the same parity configuration). Once that copy is created, all references to the old chunk are changed to point to the new chunk and so the old chunk is now free space. Repeat this process for all chunks in the file system, and the whole file system is converted to use a different parity configuration.
> Can anyone who has been using ZFS long-term at home comment? How do you add more space?
My solution is to use RAID1 exclusively. That means that I can keep attaching pairs of devices if I run out of disk space. I can never get them out again, however :-)
I have run ZFS at home since 2011 and migrated to larger drives one time: from 6x3TB to 6x4TB. The only downside is that the metaslabs are smaller than they would be on a natively created 6x4TB array.
Can't you just use ZFS without RAIDZ and still have your data protected from corruption/drive failures?
I think storage is hard and I never understood the advantage of RAID (at least for home usage). It really only looks like an inflexible option, with too much risk to me.
What's the benefit of RAIDZ over, say, just choosing to have X copies distributed over your disk(s)?
Anyone who wants real security relies on off-site backups, isn't that right? And aren't RAID(Z) arrays also slow to recover? (Serious questions, I'm a ZFS noob.)
RAIDZ: I don't know what stripe-set configuration is good for me and don't want to waste time comparing RAID controllers, or figuring out whether I even need one. Configuring the beast on the hardware and software side just seems too tedious. Why not just attach/detach another disk and let zfs expand/shrink my total disk space without losing consistency?
Startup idea: Someone clever should find a flexible storage solution that uses aufs, unionfs etc. to give you the flexibility we need.
On ZFS, a pool is resized by replacing all the drives in the pool. When autoexpand=on and the last drive is replaced, the pool will have expanded capacity. And drives are cheap.
When another RAID device is added to the pool instead, it is used as a stripe. Different types of RAID can be combined in this way, for example one can add a mirrored stripe to an already existing RAID-Z. Whether this is desired or not is a different discussion, since the point here is that both scenarios are possible.
Unlike other volume managers, ZFS expands on the disk boundary, rather than physical or logical elements; it's a larger boundary, but since no arcane or complex procedures or knowledge are required and drives are cheap, it's very practical, as well as elegant.
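A sketch of that mixed layout, assuming an existing RAID-Z pool named tank and hypothetical disks (zpool warns about the mismatched replication level and wants -f):
zpool add -f tank mirror sdx sdy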
> The only issue that irks me is that I can't resize a pool after it was created, can't add new disks, can't remove one (without risking parity).
Not sure why ZFS would not auto-resize for you. It's the reason I have been using it for several years as my home NAS on a Linux server. In my case I just replace all the disks in a RAIDZ with bigger ones and it automatically resizes up.
Which forces you to waste $/GB and $$ in general on multiple higher capacity disks as opposed to just buying a single new drive.
Not that my 10 TB RAIDZ is running out of space any time soon, but as soon as Btrfs gets its shit together I'm switching. At the rate of a few Blu-rays a year it'll fill up sooner or later.