
In Praise of ZFS on Linux's ZED 'ZFS Event Daemon' - zdw
https://utcc.utoronto.ca/~cks/space/blog/linux/ZFSZEDPraise
======
jclulow
Despite what the article says about illumos not having something analogous, we
do actually have something and have had it for more than a decade:
[https://illumos.org/man/1M/syseventadm](https://illumos.org/man/1M/syseventadm)

It allows programs to be run in response to sysevents, some of which are
generated by ZFS and some of which are generated by other parts of the system
(e.g., device hotplug).

~~~
otterley
Where are the events that ZFS emits documented?

~~~
spullara
Typing your question into Google reveals that they aren't properly documented:

[http://manpages.ubuntu.com/manpages/bionic/man5/zfs-events.5...](http://manpages.ubuntu.com/manpages/bionic/man5/zfs-events.5.html)

------
acd
Insightful article. ZFS is the best file system. You can know that you don't
have silent file corruption. You can run without RAID controllers. Snapshots
are super fast, don't make you wait, and take little extra space. You can use
an SSD cache with ZFS to accelerate reads from physical disks. Transparent file
compression. A good command line interface. And now, with ZED, you can take
useful actions, for example sending a notification to an alerting system like
Slack or a ticketing system when a disk fails, or starting a disk scrub/rebuild.
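As a rough illustration of the Slack idea, here is a minimal sketch of a zedlet
written in Python instead of the usual shell. It assumes ZED runs executables
from its zedlet directory and passes the event as environment variables such as
ZEVENT_SUBCLASS, ZEVENT_POOL and ZEVENT_TIME_STRING; check zed(8) for the exact
naming rules and variables on your version. The webhook URL is a placeholder,
and the set of reported events is an arbitrary choice.

```python
#!/usr/bin/env python3
"""Hypothetical zedlet: post scrub/resilver progress to a Slack webhook."""
import json
import os
import urllib.request

# Placeholder: a real Slack incoming-webhook URL would go here.
WEBHOOK_URL = "https://hooks.slack.com/services/REPLACE/ME"

# Assumed subclass names (the part of the event class after the last dot);
# compare them against zfs-events(5) for your ZFS version.
REPORTED = {"scrub_start", "scrub_finish", "resilver_start", "resilver_finish"}


def build_message(env):
    """Return a short text for interesting events, or None to stay quiet."""
    subclass = env.get("ZEVENT_SUBCLASS", "")
    if subclass not in REPORTED:
        return None
    pool = env.get("ZEVENT_POOL", "?")
    text = f"ZFS pool {pool}: {subclass.replace('_', ' ')}"
    when = env.get("ZEVENT_TIME_STRING")
    return f"{text} at {when}" if when else text


def slack_payload(text):
    """Encode text in the JSON shape Slack incoming webhooks accept."""
    return json.dumps({"text": text}).encode("utf-8")


def post(url, text):
    """Send the payload; raises on network or HTTP errors."""
    req = urllib.request.Request(
        url, data=slack_payload(text), headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        resp.read()


if __name__ == "__main__":
    message = build_message(os.environ)
    if message is not None:
        post(WEBHOOK_URL, message)
```

Installed as an executable with a name such as `all-slack.py` (hypothetical;
the stock `all-syslog.sh` suggests that an `all-` prefix matches every event),
it stays silent for everything except the four scrub/resilver events.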

If you are on Linux, I can highly recommend ZFS for local storage and Minio for
S3-like storage.

~~~
louwrentius
I would advise people to think about what they need.

ZFS is fine, but it is overkill for most home applications and it has a
pitfall related to extensibility.

[https://louwrentius.com/what-home-nas-builders-should-unders...](https://louwrentius.com/what-home-nas-builders-should-understand-about-silent-data-corruption.html)

[https://louwrentius.com/the-hidden-cost-of-using-zfs-for-you...](https://louwrentius.com/the-hidden-cost-of-using-zfs-for-your-home-nas.html)

So it really depends on your needs. Statements such as "ZFS is the best
filesystem" are meaningless.

P.S. SSD caching often has no tangible practical benefit for most
applications. More RAM is often the better investment, as it is with any
filesystem.

~~~
m0zg
> ZFS is fine, but it is overkill for most home applications

Yeah, not having silent data corruption is "overkill", sure. /s Why not use
ZFS? It takes 15 seconds to install, and its CLI is fairly intuitive. Works
fine. Costs $0. Why not, even for "home" applications?

I could see how it could be unsuitable for "enterprise" applications where
there are strict performance requirements, etc., but for home, I wish I could
use ZFS everywhere.

~~~
briffle
Or install BTRFS and be able to detect bitrot as well, while also being able
to grow your storage by adding more disks (even of different sizes).

~~~
tw04
RAID-Z expansion is being worked on:
[https://github.com/openzfs/zfs/pull/8853](https://github.com/openzfs/zfs/pull/8853)

Last I checked, BTRFS RAID5/6 was a dumpster fire and unusable in production.
Have they actually open sourced the ability to repair bitrot that BTRFS detects
when it runs on top of mdraid? If not, it's kind of irrelevant.

So... once again downvotes without a response. BTRFS RAID still isn't
recommended, and file healing isn't compatible with MDRAID, so I assume you
just don't like the fact that I pointed it out? The "I'm downvoting because you
pointed out a flaw in my logic" attitude on HN is disappointing.
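To make the bitrot argument concrete: a checksumming filesystem stores a
checksum for every block separately from the data, so when two mirror copies
disagree it can tell which one is intact and rewrite the other. A plain mirror
such as mdraid RAID1 has no per-block checksums, so a check can count
mismatches but cannot tell which copy is correct. The toy model below is only a
sketch of that idea; it uses SHA-256 and in-memory dicts as "disks", whereas
ZFS and Btrfs use their own on-disk formats and, by default, cheaper checksums
(fletcher4 and crc32c respectively).

```python
import hashlib


class ChecksummedMirror:
    """Toy two-way mirror that keeps a checksum per block, apart from the data."""

    def __init__(self):
        self.disks = [{}, {}]  # block_id -> bytes, one dict per "disk"
        self.checksums = {}    # block_id -> SHA-256 digest (the "metadata")

    def write(self, block_id, data):
        self.checksums[block_id] = hashlib.sha256(data).digest()
        for disk in self.disks:
            disk[block_id] = data

    def read(self, block_id):
        """Return the first copy matching its checksum, healing bad copies read before it."""
        expected = self.checksums[block_id]
        for i, disk in enumerate(self.disks):
            data = disk[block_id]
            if hashlib.sha256(data).digest() == expected:
                for bad in self.disks[:i]:
                    bad[block_id] = data  # self-heal: rewrite the corrupted copy
                return data
        raise OSError(f"block {block_id}: every copy failed its checksum")
```

Without `self.checksums`, `read()` would be holding two different byte strings
and have no basis for choosing between them.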

~~~
accelbred
If you're using RAID-Z on ZFS, your comparison isn't fair. Rather than using
RAID56 with btrfs, the equivalent would be to get 1- or 2-disk redundancy with
raid1 or raid1c3.

~~~
tw04
RAID-Z is the equivalent of RAID-5. RAID-Z2 is the equivalent of RAID-6.
RAID-Z3 would be the equivalent of RAID-7 (or whatever the standard is named
for 3-disk parity).

This speaks strictly to how it deals with data and parity; the
implementations are obviously different.

RAID-1 would be a mirror in ZFS parlance.

~~~
accelbred
BTRFS raid1 isn't mirroring drives, though: it means there are two copies of
each chunk, each on a different drive, across the whole set of 2+ drives.
BTRFS raid1c3 and raid1c4 keep 3 and 4 copies, respectively.
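The difference shows up in capacity arithmetic. The sketch below assumes every
Btrfs chunk must be replicated on `copies` distinct drives, and that a single
RAID-Z vdev treats every disk as the size of its smallest one. It ignores
metadata, chunk rounding and RAID-Z padding, so treat the results as rough
upper bounds rather than exact figures.

```python
def btrfs_raid1_usable(sizes, copies=2):
    """Approximate usable space for Btrfs raid1 (copies=2), raid1c3 (3) or raid1c4 (4).

    Usable space is the largest t with sum(min(s, t) for s in sizes) >= copies * t,
    which equals the minimum over j < copies of
    (sum of all drives except the j largest) / (copies - j).
    """
    ordered = sorted(sizes, reverse=True)
    return min(sum(ordered[j:]) / (copies - j) for j in range(copies))


def raidz_usable(sizes, parity=1):
    """Approximate usable space of one RAID-Z vdev (parity 1, 2 or 3)."""
    if len(sizes) <= parity:
        raise ValueError("need more disks than parity devices")
    return (len(sizes) - parity) * min(sizes)
```

For three disks of 1, 2 and 3 TB, `btrfs_raid1_usable([1, 2, 3])` gives 3 TB
while `raidz_usable([1, 2, 3])` gives 2 TB, with single-drive redundancy in
both cases. That is the mixed-size flexibility briffle mentions, bought at the
cost of keeping full copies instead of parity.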

------
StavrosK
Has anyone noticed _really_ slow deletes on ZFS? I have a home NAS with RAIDZ
and deleting a 1 GB file takes about 30 seconds. I asked on IRC repeatedly, but
the disks just aren't very busy during the delete (or at all), and nobody
managed to figure out why this is happening.

It's been that way for years throughout various ZFS versions, and it's driving
me crazy.

~~~
mkj
Try temporarily setting sync=disabled. If that speeds it up, maybe a fast SSD
log device (SLOG) would fix it.
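One way to check this suggestion with numbers rather than impressions is to
time the same delete before and after the change. This is a minimal Python
sketch; the dataset path in the usage line is hypothetical, and writing random
data (with dedup assumed off, since the same random chunk is reused) is meant
to keep ZFS compression from shrinking the test file.

```python
import os
import tempfile
import time


def time_delete(directory, size_bytes, chunk=1 << 20):
    """Write a file of size_bytes into directory, then time how long unlinking it takes."""
    # Random bytes so ZFS compression cannot shrink the file (assumes dedup is off).
    data = os.urandom(chunk)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            remaining = size_bytes
            while remaining > 0:
                n = min(remaining, chunk)
                f.write(data[:n])
                remaining -= n
            f.flush()
            os.fsync(f.fileno())  # data is on disk before the timer starts
        start = time.perf_counter()
        os.remove(path)
        return time.perf_counter() - start
    finally:
        # Clean up if anything above failed before the timed delete.
        if os.path.exists(path):
            os.remove(path)
```

For example, run `time_delete("/tank/scratch", 1 << 30)` on a hypothetical
`tank/scratch` dataset, then `zfs set sync=disabled tank/scratch`, run it
again, and restore the setting with `zfs inherit sync tank/scratch`. Since
sync=disabled can lose recently acknowledged writes on a crash, keep it to a
short test.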

~~~
StavrosK
Hmm, interesting, I'll try that, thanks!

------
ksec
QNAP will hopefully have a ZFS-based home NAS out soon.

Are there any simple-to-use tools that automatically compare the same files
from different sources and tell whether they differ? I have multiple copies of
my data, but not knowing which file is corrupted and working through it by hand
is a pain in the bottom.
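For two copies, `rsync --checksum --dry-run` between them will already list
files whose contents differ. With three or more copies you can go a step
further and let the copies vote, as in this minimal Python sketch. The function
names are made up for illustration, and it assumes all copies share the same
directory layout.

```python
import hashlib
from collections import Counter
from pathlib import Path


def file_digest(path, chunk=1 << 20):
    """SHA-256 of a file, read in chunks so large files don't fill RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def compare_trees(*roots):
    """Return {relative path: {root: digest}} for files that are missing or differ."""
    seen = {}
    for root in map(Path, roots):
        for path in root.rglob("*"):
            if path.is_file():
                rel = path.relative_to(root).as_posix()
                seen.setdefault(rel, {})[str(root)] = file_digest(path)
    return {
        rel: digests
        for rel, digests in seen.items()
        if len(digests) < len(roots) or len(set(digests.values())) > 1
    }


def suspects(digests):
    """Roots whose copy disagrees with a strict majority, or None if there is no majority."""
    majority, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(digests):
        return None  # e.g. two copies that differ: no way to tell which is right
    return sorted(root for root, d in digests.items() if d != majority)
```

`compare_trees("/mnt/backup1", "/mnt/backup2", "/mnt/nas")` (hypothetical
paths) returns only the problem files, and `suspects()` on each entry names the
out-voted copy.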

------
ncrmro
Just installed Ubuntu on ZFS with native ZFS encryption. Working real nice.

------
yjftsjthsd-h
That's _neat_, certainly, but I'm struggling to think of what to actually use
this for. The article mentions a couple of cases, of which "taking action if
devices fail" seems the most concrete; AFAIK that would probably make it
straightforward to replace a drive with a spare? Anybody want to share any
other concrete uses?

~~~
motakraxxer
Mail sysadmins about drive failures, for example.
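ZED already ships with email support (the ZED_EMAIL_ADDR setting in zed.rc),
so the stock configuration is the first thing to try for plain mail. If you
want custom wording or routing, a zedlet can build the message itself. This
Python sketch assumes the same ZEVENT_* variables the stock statechange zedlet
uses and a mail server on localhost; the recipient address is a placeholder,
and the list of states that count as bad is an arbitrary choice.

```python
#!/usr/bin/env python3
"""Hypothetical zedlet: email the admins when a vdev enters a bad state."""
import os
import smtplib
import socket
from email.message import EmailMessage

ADMINS = ["sysadmins@example.com"]  # placeholder recipient list
SMTP_HOST = "localhost"             # assumes a local MTA accepts mail
BAD_STATES = {"FAULTED", "DEGRADED", "REMOVED", "UNAVAIL"}  # arbitrary choice


def make_alert(env):
    """Build an EmailMessage for a bad vdev state change, else return None."""
    state = env.get("ZEVENT_VDEV_STATE_STR", "")
    if env.get("ZEVENT_SUBCLASS") != "statechange" or state not in BAD_STATES:
        return None
    pool = env.get("ZEVENT_POOL", "?")
    vdev = env.get("ZEVENT_VDEV_PATH", "unknown vdev")
    host = socket.gethostname()
    msg = EmailMessage()
    msg["Subject"] = f"ZFS: {vdev} in pool {pool} is {state}"
    msg["From"] = f"zed@{host}"
    msg["To"] = ", ".join(ADMINS)
    msg.set_content(
        f"Host: {host}\nPool: {pool}\nDevice: {vdev}\nNew state: {state}\n"
        "Run `zpool status` to see what happened.\n"
    )
    return msg


if __name__ == "__main__":
    alert = make_alert(os.environ)
    if alert is not None:
        with smtplib.SMTP(SMTP_HOST) as smtp:
            smtp.send_message(alert)
```

Keeping the message construction in `make_alert()` means it can be tested with
a plain dict, without waiting for a disk to fail.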

~~~
yjftsjthsd-h
... thank you, that was the example that made me realize I can/should go use
this at work because it will solve a problem we have:) And a good point in
general.

~~~
onei
Are mail alerts not already a thing on your hardware? I know HP(E) stuff does
it; not sure about Dell or other vendors.

~~~
CaliforniaKarl
I think you are referring to one of two HPE components: Either the iLO, or the
iLO combined with a Smart Array controller.

(Dell has equivalent products: For this discussion, iDRAC is the equivalent of
iLO, and PERC is the equivalent to Smart Array.)

With those products, the RAID controller (Smart Array or PERC) will be
connected to internal and/or external drives, will handle RAID in hardware
(ideally with a battery-backed write cache), and (through the iLO or iDRAC)
will generate alerts when a drive fails (or is close to failure).

In the context of ZFS, you don't have that. Your drive controller is either
on-motherboard or a PCIe card like (for example) the (Broadcom) LSI SAS
9300-8e. Those cards do have a RAID option (MegaRAID), but they are often used
without it.

The rest of the ZFS storage setup is pretty similar to the setup you are used
to: Internal drives will have a SAS expander (if needed) on the motherboard,
or will use a SAS expander card (for example, an HPE part #870549-B21 SAS
expander card). External drives will be in a JBOD that has one or two
expanders, and which is connected back to the server using SAS cables. One ZFS
difference is that if you have many JBODs, instead of daisy-chaining arrays,
you might choose to use a SAS switch (for example, an A54812-SW-01).

In all that I described for ZFS, I haven't mentioned how RAID is handled:
ZFS handles RAID in software. RAID-Z1 is equivalent to RAID 5, RAID-Z2 to
RAID 6, and RAID-Z3 to RAID 7. ZFS also supports RAID 1 (two-drive mirrors), as
well as a RAID 10 equivalent (striping across mirrored pairs of drives).

Since RAID is handled in software, and with the physical equipment I
described, it is left to the OS to handle almost all monitoring and alerting.
The one exception is that JBODs and rack-mount SAS switches (like the Astek)
often have an Ethernet connection for monitoring and (basic) hardware control.
But even that can often be handled within the OS, using SCSI Enclosure
Services (SES, where the enclosure/switch itself is a device the OS can see
and query).
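On Linux, the kernel's `ses` driver exposes SES enclosures through sysfs under
/sys/class/enclosure, with one directory per enclosure and one subdirectory per
slot or component, so OS-level monitoring can be a small script. The sketch
below assumes that layout and the kernel's status strings ("OK", "critical",
"not installed", and so on); treating "not installed" and "unsupported" as
healthy is an assumption. For richer data, `sg_ses` from sg3_utils queries the
enclosure directly.

```python
from pathlib import Path

# Status strings treated as healthy (assumption; the kernel also reports
# values such as "critical", "non-critical", "unrecoverable" and "unknown").
HEALTHY = {"OK", "not installed", "unsupported"}


def enclosure_faults(root="/sys/class/enclosure"):
    """Return (enclosure, component, status) for each component not in HEALTHY."""
    base = Path(root)
    problems = []
    if not base.is_dir():
        return problems  # no SES enclosures visible to the kernel
    for enclosure in sorted(base.iterdir()):
        if not enclosure.is_dir():
            continue
        # Components are real subdirectories; skip symlinks such as "device".
        components = [p for p in enclosure.iterdir() if p.is_dir() and not p.is_symlink()]
        for component in sorted(components):
            status_file = component / "status"
            if not status_file.is_file():
                continue  # e.g. the "power" directory
            try:
                status = status_file.read_text().strip()
            except OSError:
                status = "unreadable"
            if status not in HEALTHY:
                problems.append((enclosure.name, component.name, status))
    return problems


if __name__ == "__main__":
    for enclosure, slot, status in enclosure_faults():
        print(f"{enclosure} {slot}: {status}")
```

The `root` parameter exists so the function can be pointed at a fake directory
tree for testing.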

------
kim0
Any way to get commercial support for ZoL?

------
Mekantis
I'd love to switch to ZFS, but the RAM requirements are absurd. I don't have a
separate storage server, and I'm not really willing to sacrifice 10GB of RAM
(1GB/TB of storage, if I'm to believe what I find through Google) on my home
desktop just for it when the vast majority of my data could probably handle a
rotted bit or two.

I did briefly try ZFS on my laptop a year or so ago, and it permanently ate up
half of my RAM. Since RAM was already fairly limited there, that wasn't a
sacrifice I was willing to make either, given that I have plenty of backups
anyway.

~~~
vetinari
That RAM is needed only when you run deduplication (you have to store the
checksums of blocks that you deduplicate somewhere). If you don't, the RAM
requirements are similar to other filesystems.
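To put numbers on this: the dedup table (DDT) needs roughly one in-memory entry
per unique block, and about 320 bytes per entry is a commonly quoted rule of
thumb (an assumption here, not an exact figure). This back-of-the-envelope
sketch shows why dedup, rather than ZFS itself, is what drives large RAM
estimates.

```python
def ddt_ram_bytes(unique_bytes, avg_block_bytes=128 * 1024, bytes_per_entry=320):
    """Estimate the in-RAM size of a ZFS dedup table (DDT).

    Assumes one entry per unique block and a fixed per-entry cost; the
    320-byte default is a rule of thumb, and the real average block size
    depends on recordsize and on how the data was written.
    """
    unique_blocks = unique_bytes / avg_block_bytes
    return unique_blocks * bytes_per_entry


# 10 TB of unique data stored in 128 KiB records:
print(f"{ddt_ram_bytes(10 * 10**12) / 1e9:.1f} GB")  # prints "24.4 GB"
```

The estimate scales inversely with block size: at a 64 KiB average it doubles
to about 48.8 GB for the same 10 TB.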

On your home desktop, you don't have to run dedup. You will still get the
bitrot protection.

~~~
Mekantis
Why wouldn't I want deduplication though? "You can use ZFS just fine, just
turn off one of its most useful features." Really? And I'm being downvoted for
it. Thanks, guys. You realize my home desktop is doubling as my storage,
right? Which goes back to RAM requirements being an issue.

I hate the cargoculting on this fucking site.

~~~
vetinari
> Why wouldn't I want deduplication though?

On a desktop?

Chances are that you don't have many users saving the same, or slightly
modified, versions of a file on the same storage. For a single person, it
doesn't make much sense.
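Whether dedup would pay off is an empirical question. On an existing pool,
`zdb -S <pool>` simulates dedup and reports the ratio it would achieve. For
data not yet on ZFS, a crude stand-in is to hash fixed-size blocks and compare
total versus unique blocks. This sketch assumes a single fixed block size,
whereas ZFS dedups per record at each dataset's recordsize, so its ratio is
only an approximation.

```python
import hashlib
from pathlib import Path


def estimate_dedup_ratio(root, block_size=128 * 1024):
    """Hash every fixed-size block of every file under root; return total / unique blocks."""
    seen = set()
    total = 0
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.is_symlink():
            continue
        with open(path, "rb") as f:
            while True:
                block = f.read(block_size)
                if not block:
                    break
                total += 1
                seen.add(hashlib.sha256(block).digest())
    return total / len(seen) if seen else 1.0
```

A result close to 1.0 on a typical single-user home directory would support
the point above: the DDT's RAM cost buys almost nothing there.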

> "You can use ZFS just fine, just turn off one of its most useful features."

ZFS has many useful features. They come with a price, though, because there's
no free computation (see also the laws of thermodynamics). It is then a matter
of deciding which features you want or need, and are willing to pay the price
for.

You obviously are not willing to pay the price for dedup (lvmvdo asks for a
similar price, so it is not ZFS-specific), so why are you complaining that you
cannot use it? ZFS still has many more useful features.

You also have another option: add RAM to your desktop. It is cheap. Then you
will be able to use that one feature.

> I hate the cargoculting on this fucking site.

Sigh. I'm actually a btrfs fan; all my data are on a btrfs volume (at work, we
do use ZFS, though, so I do have the experience). But that doesn't mean I won't
point out something that the _other club_ does well.

