
I built a ZFS-based NAS last year[1]; after 10 months of usage, I have nothing but positive things to say. One thing I didn't understand conceptually at first is that ZFS is standalone and not dependent on the host OS. This was not clear to me and it is not explained anywhere. You can wipe out the entire OS, move the drives to a new system, mount them on a new OS and run `zpool import` to import them. All the information about ZFS (RAID config, snapshots, etc.) is stored on the pool itself and not on the host OS, especially if you have aliases set up for individual drives in `/etc/zfs/vdev_id.conf`.
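
For anyone wondering, the migration itself is roughly this (pool name `tank` is just an example):

  # on the old system, if it still boots (optional but clean):
  sudo zpool export tank
  # on the new system, with the drives attached:
  sudo zpool import        # lists pools found on the attached disks
  sudo zpool import tank   # add -f if the pool was never cleanly exported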

There is no need for FreeNAS or any of that stuff; I don't use all those features. Just run like 3 commands to create a zpool, install a Samba service, and that's all I need from a NAS, plus a cron job to run a scrub every month.
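
Roughly, the whole setup is something like this (the disk aliases, dataset names and cron schedule here are only illustrative):

  # pool creation, using the aliases from /etc/zfs/vdev_id.conf
  sudo zpool create tank raidz2 d1 d2 d3 d4 d5 d6
  sudo zfs create -o compression=lz4 tank/share
  # file sharing
  sudo apt install samba   # then export /tank/share in smb.conf
  # monthly scrub, e.g. in root's crontab (zpool path may differ per distro)
  0 3 1 * * /usr/sbin/zpool scrub tank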

If the ZFS file system made a sound, it'd be a satisfying and reassuring "CLUNK!".

[1] https://neil.computer/notes/zfs-raidz2/




Is this where we're giving testimonials?

I've been a ZFS user for maybe 6 years. I had data loss this year.

It was quite interesting. Writes would fail on multiple disks at the same time; that's why it was data loss. A normal disk failure wouldn't look like that: failures should be spread out, and redundancy would help.

It turned out to be a bad power supply. I was able to predictably get it to corrupt writes with the old PSU. Then I replaced the PSU, and it no longer failed after that.

I wouldn't have guessed to suspect the PSU. It was a frustrating experience, but in the end ZFS did help me detect it. On ext4 or FFS I wouldn't even have been aware it happened, let alone been able to confirm a fix.


Jesus, the same happened to me, so frustrating! Everything was working OK, then a slow increase in write failures and power resets, and strange clicking; eventually all the disks would drop. Scary every time it happened.

I swapped every server part before the PSU. Updated Linux, downgraded, tried kernel options... At some point I thought btrfs was the problem, so I created an mdadm ext4 RAID but got the same problem.

It was a btrfs RAID 1: lots of fs errors, and files missing until restart, as some disks were down. But I didn't lose any data (take that, ZFS!) besides what was being transferred as the disks/array went down anyway.


Copy-on-write file systems such as ZFS should be resilient to power outages.

How do power fluctuations or cutoffs cause data loss in ZFS?

I frequently force-shutdown my laptop and PCs with ext4 when they hang, and have never lost data.


A bad PSU is not at all comparable to a hard reset. A bad PSU can cause individual writes to fail without the entire computer shutting down, and these failures can and will happen simultaneously across multiple disks since they are all connected to the same faulty PSU. If a filesystem wanted to guard against this kind of failure, I suppose it could theoretically stagger the individual disk write operations for a given RAID stripe so they don't happen simultaneously. (Implementation is left as an exercise to the reader.)


> and never lost data.

How do you know?

I've seen a transient error due to a misconfiguration of ALPM that XFS/ext4/etc probably would have never caught, but ZFS did.


Because a good part of it is checksummed with restic.


I don't mean to be critical of you, restic, or your backup strategies, but it doesn't seem like you know. ZFS is the "I really need to know" filesystem. I think it's pretty great and I use it where I can. But if it's not for you, it's not for you.

And FYI, a power supply failure does not manifest the same as a power outage.


As mentioned, it was not usually manifesting as a power outage. Random components would fail, presumably because the PSU was able to keep the system nominally "running" but not delivering the right power to components.

I also experienced some random reboots and other things that looked like bad memory... I suspected faulty RAM at some point. But swapping the PSU did the trick.


I just recently had a case where writes to an M.2 SSD on a newly built computer would frequently fail. Reads were OK.

After replacing the SSD twice and then rebuilding everything using a different motherboard, it turned out it was the PSU.

Really hard to diagnose problems like this.


> Writes would fail on multiple disks at the same time. That's why it was data loss.

This doesn’t sound like data loss but rather a fault preventing writing… it would be data loss if confirmed writes were lost.


Yep, designing around the write hole is hard, especially with non-enterprise equipment. Lots of firmware does unsafe things with cached data and will tell you data has hit disks that has not. The file system can't really do anything about this either, other than tell you after the fact that the data that should be there isn't (which ZFS is very good for).

You can disable write caches for safety, but note that this is very hard on performance.
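
For reference, the volatile write cache on individual drives can be toggled like this (device name is a placeholder, and the setting may not survive a power cycle on every drive):

  # SATA:
  sudo hdparm -W0 /dev/sdX
  # SAS/SCSI:
  sudo sdparm --set WCE=0 /dev/sdX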


I was a little imprecise with my words. It would lose recent writes seemingly at random, and reading them back would fail. It seemed that caches could mask this for a while.

POSIX systems are pretty lax with this sort of failure. write(2) and close(2) can succeed if you only wrote to cache; if the actual write failure occurs later, there is typically no way to let your process know (fsync(2) is the main exception, and even its error reporting has caveats).


What did that look like in terms of error messages etc? I'm guessing ZFS would try to write the checksum, which wouldn't work and then throw an error? I assume it never impacted data that already resided on disk?


zpool status showed an identical number of checksum failures across drives, and status -v would list certain files as corrupt. Reads on those files would return EIO. It was always recently written files.

A large file copy would predictably trigger it. Other times it was random.


TrueNAS ensures that different versions of different software fit together. So you update your OS, and your setup doesn’t break.

The common tasks are made easy: snapshot scheduling, snapshot replication, Samba shares with a few clicks, backup to the cloud, user management, one-click install of applications, SMART tests, etc. Doing all of that by hand can be a headache.


As GP said, all these features are nice, but if you don't need them, they're just overhead.

I'm in the same boat, all I have on my NAS is Samba, zrepl (ZFS snapshots and backups) and node exporter (monitoring agent for Prometheus - handles SMART, etc). It's running Arch, and I've never had my setup "break" in more than 10 years of using this distro (though this particular NAS is not that old).

I don't care for a DB, web server and god forbid random applications doing who knows what on my NAS. But if you want your NAS to do everything, it should be great. To each their own, I guess.

Also, TrueNAS is FreeBSD, which may or may not work on dinkier hardware. I'm specifically thinking of random watchdog problems on some older Realtek Gigabit NICs, where you had to use a specific patched driver which, at one point, didn't support the latest version.


Yeah, TrueNAS comes with a jails manager, which has nothing to do with NAS. It's incredibly feature-rich: https://static.ixsystems.co/uploads/2020/04/image-4.png

Plus, the setup then gets intricately coupled with the host OS. That's a big no for me. I want to treat the host OS as disposable, ephemeral and decoupled, for peace of mind. The host's only job is to serve files over the Samba protocol and run scrub cron jobs. That's it. It gets 2 vCPUs and 24 GB of RAM and stays put.


I guess there could be an argument for the NAS part to be one of the jails.

But then, the whole thing should be marketed more like "Proxmox with batteries included".


That's basically how SmartOS does it, as long as you're doing SMB or Samba. And boy, is it ephemeral: the entire OS is loaded into RAM from a USB stick (usually) on boot, and everything gets loaded from the zpool, as OC said, and if you need anything, you put it in a zone. Even VM hypervisor goes in a zone, because it's essentially free and gives you nice clean separation (and quotas, reservations, etc.).

Though NFS can only be shared from the global zone, so if you need NFS you don't get to play with zones.


> the whole thing should be marketed more like "Proxmox with batteries included".

It's not though. Even if you make all the "NAS software" part of the jails (or the half-baked Kubernetes in TrueNAS SCALE), it's not even close to Proxmox when it comes to managing VMs, containers/jails, firewall, networking, etc.


I use zfs-auto-snapshot for snapshots, and it works great. I think the only thing I would want is early warning of a disk failure; I'll probably automate that. TrueNAS feels extremely heavy, with too many features I don't need. There is a web server, a database, etc., which feels like overkill for a simple home NAS IMO, and then you have to manage the management system.


> I think the only thing I would want is early warning for a disk failure.

The only other thing, you may be forgetting, is an app you never knew you needed.[0] ;)

[0, an interactive, file-level Time Machine-like tool for ZFS]: https://github.com/kimono-koans/httm
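
If I remember its README right, basic usage is just pointing it at a file to list the snapshot versions of it:

  httm /tank/share/some_document.txt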


Hnggg.... this thing looks cool. A handful of .rs files in the /src folder. Sold.


A man/person of taste!


> I think the only thing I would want is early warning for a disk failure.

https://manpages.ubuntu.com/manpages/jammy/man8/zed.8.html
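
That's ZED (the ZFS Event Daemon); from memory, the email-notification knobs in /etc/zfs/zed.d/zed.rc look roughly like this (check your distro's copy, and make sure the zfs-zed service is running):

  ZED_EMAIL_ADDR="you@example.com"
  ZED_NOTIFY_INTERVAL_SECS=3600
  ZED_NOTIFY_VERBOSE=1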


>One thing I didn't understand conceptually at first is that ZFS is standalone and not dependent on the host OS.

This is true for every mature/production file system. You can do the same with mdadm, ext4, xfs, btrfs, etc. The only constraint is versions, in that it's a one-way street: you can't necessarily go from something new to something old, but the other way round is fine.
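
For example, moving an mdadm+ext4 set of disks to a new box is roughly this (device and mount-point names are examples):

  sudo mdadm --assemble --scan   # reads the array config from the on-disk superblocks
  sudo mount /dev/md0 /mnt/data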


ZFS stores mount points and even NFS shares in its metadata[1], so more than most others.
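
For example (dataset name and paths made up):

  sudo zfs set mountpoint=/srv/media tank/media
  sudo zfs set sharenfs=on tank/media
  zfs get mountpoint,sharenfs tank/media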

[1]: https://openzfs.github.io/openzfs-docs/man/8/zfs-set.8.html?...


My one complaint about ZFS is that I'd repeatedly googled for "what's the procedure if your motherboard dies and you need to migrate your disks to a new machine?", since that's super easy with single non-ZFS disks but I was worried how ZFS mirrored pools would handle it, especially since the setup was so fiddly and (compared to other filesystems I've used) highly non-standard (with good reason, I'm sure).

And yet, this thread right here has more and better info than my searches ever turned up, which were mostly reddit and stackoverflow posts and such that somehow managed never to answer the question or had bad answers.

The broader complaint is that I found that to be true for almost everything with ZFS. You can read the manual and eventually figure out which sequence of commands you need, but "I want to do this thing that has to be extremely common, what's the usual procedure, considering that ZFS operations are often multi-stage and things can go very badly if you mess it up?" is weirdly hard to find reliable, accurate, and complete info on with a search.

The result was that I was and am afraid to touch ZFS now that I have it working and dread having to track down info because it's always a pain, but I also don't really want to become a ZFS wizard by deeply-reading all the docs just so I can do some extremely basic things (mirror, expand pools with new drives, replace bad mirrored disks, move the disks to a new machine if this one breaks... that's about it beyond "create the fs and mount it") with it on one machine at home.

The initial setup reminded me of Git, in a bad way. "You want to do this thing that almost every single person using this needs to do? Run these eight commands, zero of which look like they do the thing you want, in exactly this order."

I'm happy with ZFS but dread needing to modify its config.


As someone who is a total ZFS fan, I think the `zfs` and `zpool` commands are some of the best CLI commands ever made. Just immaculate. So this comment was a head scratcher for me.

> I also don't really want to become a ZFS wizard

Admittedly, ZFS on Linux may require some additional work simply because it's not an upstream filesystem, but, once you're over that hump, ZFS feels like it lowers the mental burden of what to do with my filesystems?

I think the issue may be ZFS has some inherent new complexity that certain other filesystems don't have? But I'm not sure we can expect a paradigm shifting filesystem to work exactly like we've been used to, especially when it was originally developed on a different platform? It kinda sounds like you weren't used to a filesystem that does all these things? And may not have wanted any additional complexity?

And, I'd say, that happens to everyone? For example, I wanted to port an app I wrote for ZFS to btrfs[0]. At the time, it felt like such an unholy pain. With some distance, I see it was just a different way of doing things. Very few of the btrfs decisions I had intimate experience with do I now look back on and say, "That's just goofy!" It's more: that's not the choice I would have made, in light of ZFS, etc., but it's not an absurd choice?

> "what's the procedure if your motherboard dies and you need to migrate your disks to a new machine?"

If your setup is anything like mine, I'm pretty certain you can just boot the root pool? Linux will take care of the rest? The reason you may not find an answer is that the answer is pretty similar to other filesystems?

If you have problems, rescue via a live CD[1]. Rescuing a ZFS root pool that won't boot is no joke sysadmin work (redirect all the zpool mounts, mount --bind all the other junk, and create a chroot env, do more magic...). For people, perhaps like you, that don't want the hassle, maybe it is easier elsewhere? But -- good luck!
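
From memory, the rough shape of that rescue is (pool name and paths are examples; details vary by distro):

  # from the live environment
  sudo zpool import -R /mnt rpool      # -R keeps all mountpoints under /mnt
  sudo mount --bind /dev  /mnt/dev
  sudo mount --bind /proc /mnt/proc
  sudo mount --bind /sys  /mnt/sys
  sudo chroot /mnt /bin/bash           # fix bootloader/initramfs, then exit
  sudo zpool export rpool              # export before rebooting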

[0]: https://github.com/kimono-koans/httm [1]: https://openzfs.github.io/openzfs-docs/Getting%20Started/Ubu...


IDK. I'm an ex-longtime-Gentoo user and have been known to do some moderately-wizardy things with Linux, and git for that matter, and in the server space I have seen some shit, but I managed to accidentally erase my personal-file-server zfs disks a couple times while setting them up. I've since expanded the mirrored pool once and consider it a miracle I didn't wipe out the whole thing, edge of my seat the whole time.


> I'm an ex-longtime-Gentoo user

Here for this. Already delighted and amused. ;)

> edge of my seat the whole time.

In the future, you may want to try creating a sandbox for yourself to try things? I did all my testing of my app re: btrfs with zvols similarly:

  # two 1G zvols to stand in for disks
  sudo zfs create -V 1G rpool/test1
  sudo zfs create -V 1G rpool/test2
  # a throwaway mirrored pool on top of them
  sudo zpool create testpool mirror /dev/zvol/rpool/test1 /dev/zvol/rpool/test2
  # practice growing the mirror: swap in larger zvols one side at a time
  sudo zfs create -V 2G rpool/test3
  sudo zfs create -V 2G rpool/test4
  sudo zpool replace testpool test1 /dev/zvol/rpool/test3
  sudo zpool replace testpool test2 /dev/zvol/rpool/test4
  sudo zpool set autoexpand=on testpool
  ...


> > I'm an ex-longtime-Gentoo user

> Here for this. Already delighted and amused. ;)

Haha... yeah, I didn't intend that as a brag or badge of honor or anything—more like a badge of idiocy—but you don't play Human Install Script and a-package-upgrade-broke-my-whole-system troubleshooter for several years without learning how things fit together and getting pretty comfortable with system config a level or two below what a lot of Linux users ever dig into. Just meant I'm a little past "complete newbie" so that's not the trouble. :-)

> In the future, you may want to try creating a sandbox for yourself to try things? I did all my testing of my app re: btrfs with zvols similarly:

Really good advice, thanks. I was aware it had substantial capabilities to work in this manner, but using it this way hadn't occurred to me. Gotta get over being stuck in "filesystems operate on disks, or partitions on disks, that are recorded in such a way that any tools and filesystem, not just a particular one, can understand and work with" mode. I mean, I'm comfortable enough with files as virtual disks, but having a specific FS's tools, rather than a set of general tools, transparently manage those for me, too, seems... spooky and data-lossy. Which I know it isn't, but it makes the hair on my neck stand up anyway. Maybe my "lock-in" warning sensors are tuned too sensitive.

Now to figure out how to run those commands as a user that doesn't have the ability to destroy any of the real pools... ideally without having to make a whole VM for it, or set up ZFS on a second machine, and—initial search results suggest this may be a problem, for the specific case of wanting unprivileged users to run zfs-create without granting them too much access—on Linux, not FreeBSD :-/
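
One lead I did turn up: `zfs allow` can delegate a subset of operations on a single sandbox dataset, though last I checked unprivileged mounting is still restricted on Linux, which may be exactly the wall those search results describe. A rough sketch (names are examples):

  sudo zfs create rpool/sandbox
  sudo zfs allow someuser create,destroy,snapshot,mount rpool/sandbox
  zfs allow rpool/sandbox   # show current delegations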


Yeah I get your feeling. I had it similarly at first, dreading to make changes because I was afraid I'd mess up.

And to be fair, I think ZFS could be better in this regard. Some commands can put your pool into a very sub-optimal state, and ZFS doesn't warn about this when you enter those commands. Heck even the destroy pool command doesn't flinch if by chance nothing is mounted (which it may well be after recovery on a new system).

I found it helped to watch some of the videos from the OpenZFS conferences that explain the history of ZFS and how the architecture works, like the OpenZFS Basics one[1].

But I agree that the documentation[2] could have a lot more introductory material, to help those who aren't familiar with it.

That said, I echo the suggestion to try it out using file vdevs. For larger changes I do spin up a VM just to make sure. For example, it's possible to mess up replacing a disk by adding the new disk as a new single vdev rather than replacing the failing one, so if I feel unsure about it I take 15 minutes in a VM and write down the steps.

Again, this is something I feel they could improve. Adding a single-disk vdev to a mirrored or raid'ed pool should come with a warning requiring confirmation.

On the bright side, I've been running my pool since 2009, and have never lost data despite a few disk failures and countless unexpected power outages without a UPS. And I just run it on consumer hardware without ECC, because that's what I got. I've been up to 8 disks, am now down to 6, and will soon go down to 4 once the new disks arrive. Send/receive ensures the data is just as it ever was on the new configuration.
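
For the file-vdev route mentioned above, a throwaway pool is as quick as this (paths and names are made up):

  truncate -s 1G /tmp/vdev1.img /tmp/vdev2.img
  sudo zpool create scratch mirror /tmp/vdev1.img /tmp/vdev2.img
  # ...experiment...
  sudo zpool destroy scratch && rm /tmp/vdev*.img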

[1]: https://www.youtube.com/watch?v=MsY-BafQgj4

[2]: https://openzfs.github.io/openzfs-docs/


I think mdadm+LVM+LUKS+XFS would be a far worse experience for you. Currently, the ZFS CLI is one of the best in its class for features.


Right, what was not clear to me was that RAID configuration and snapshots are also part of the file system. Usually that's done through hardware cards or software RAID, where the configuration doesn't sit on the file system (?) (Intel VROC, vSAN, etc.); at least, that was my wrong conceptual model. I used to make multiple copies of the USB stick of FreeNAS 8 back in the day because I didn't want the USB drive to fail and leave me not knowing how to recover the zpool. Messing around with the file system directly cleared up everything.


That's more common for Linux storage. Whether you're using zfs, btrfs, or lvm, all the configuration required to read it is stored in the header somewhere rather than in a detached configuration.


It has to be stored out of band; otherwise you would have a chicken-and-egg problem: what's the shape of the array, when the info is in a file stored inside the array?


It's not stored in a file inside the array, of course, but there are two types of out-of-band: next to the data, or completely disconnected. For example, you only need to mount a single btrfs partition: its header already contains the information about the other copies, and it will mount the whole RAID setup as necessary. It doesn't matter if you move the drives somewhere else to a new system; mount one of them and it works.
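
Roughly (device names are examples; udev normally does the scan for you):

  sudo btrfs device scan
  sudo btrfs filesystem show       # lists every device belonging to each filesystem
  sudo mount /dev/sdb1 /mnt/data   # any member device mounts the whole array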

On the other hand, if you move drives from a hardware raid and put in new drives, some (all?) controllers will read the raid config from memory and offer to build the same raid on the new drives. That's completely-out-of-band. Depending on the controller, even changing the order the disks are plugged in can give you weird results.


>no need for FreeNAS or any of that stuff

I would encourage anyone who does not have a sysadmin background to use FreeNAS (now TrueNAS) or something similar.

There are security and maintenance things (like periodic scrubs of the pool) that these NAS operating systems set up by default.


ZoL on Ubuntu adds a cronjob for zfs scrub; shipped in xenial (16.04).


> standalone and not dependent on the host OS.

??

The OS needs to implement zfs, like any other filesystem/host combination. Any filesystem can be "moved" to another host simply by attaching the drives, assuming you have hardware level compat and filesystem level compat.

OK, the exception is when you have host-based, hardware-level encryption (i.e., key in a TPM or other security chip). In that case, ZFS or otherwise, you can't just move the drives.


Oh, it's better than that! If you're crazy enough, you can even multiboot multiple OSs on the same pool :) You do need to ensure that the pool features are sufficiently compatible between ZFS drivers (e.g. you can't create a fully-featured pool with OpenZFS 2.1.6 on Linux and import it on OpenIndiana, last I checked), but it does work, empirically ;)
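
Newer OpenZFS also has a `compatibility` pool property to help with exactly this; if memory serves it's something like the following, with the predefined feature sets living in /usr/share/zfs/compatibility.d (pool and device names are examples):

  sudo zpool create -o compatibility=openzfs-2.0-linux tank mirror sda sdb
  zpool get compatibility tank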


A friend of mine suggested this, and I didn't like the idea of putting the OS on there, especially since it is a virtual machine. There are a couple of things that made this setup nice though: PCIe bifurcation (4x4x4x4) lanes straight to the CPU, and PCIe passthrough in ESXi. From what I read, ZFS needs direct access to the underlying hardware. So I just treat the OS as totally disposable and not worth backing up. I can spin up a VM in less than 2 minutes if something were to happen to it.


SmartOS is literally disposable - it runs off a flash drive. Have you considered it for your use case?


My NAS is a zfs-root Gentoo system. I went with it because I couldn’t beat freenas and friends into doing what I wanted.



