Hacker News
ZFS native encryption is currently broken for encrypted backups
17 points by amano-kenji 2 days ago | 19 comments
There are various open issues with ZFS native encryption. It has been especially buggy when raw encrypted ZFS snapshots are sent or received.

https://github.com/openzfs/zfs/issues/11679

https://github.com/openzfs/zfs/issues/15989

https://github.com/openzfs/zfs/issues/15924

https://github.com/openzfs/zfs/labels/Component%3A%20Encryption

https://www.reddit.com/r/zfs/comments/10n8fsn/does_openzfs_have_a_new_developer_for_the_native/

On https://github.com/openzfs/openzfs-docs/issues/494, commenters unanimously agree that ZFS native encryption is broken, especially when sending or receiving raw encrypted ZFS snapshots. They also blame the ZFS leadership for refusing to admit that native encryption is buggy, on the theory that acknowledging the bugs would hurt ZFS's reputation.

ZFS native encryption has been fine for local use on my machine, but I have never attempted to send raw encrypted snapshots because of the numerous warnings.

Thus, I want to offer alternatives to ZFS native encryption.

1. If your ZFS pool is not large, LUKS is going to be faster than ZFS native encryption. I don't know whether LUKS remains faster when a pool contains many disks. ZFS native encryption could in principle be as fast as LUKS or faster, but it is not today.

2. For incremental encrypted backups, I recommend restic. Restic can make incremental encrypted snapshots of ZFS snapshots, and you can delete any restic snapshot without losing data in the other snapshots. Restic 0.17 introduced the RESTIC_FEATURES=device-id-for-hardlinks feature flag, which makes backing up $ZFS-MOUNTPOINT/.zfs/snapshot/$SNAPSHOT-NAME efficient. Restic 0.18 will remove the device-id-for-hardlinks flag and handle .zfs/snapshot directories efficiently without any feature flag. To back up a ZFS dataset, you can take a ZFS snapshot named restic, back up .zfs/snapshot/restic as a new restic snapshot, and destroy the restic ZFS snapshot afterwards. This way, restic doesn't need to know about local sanoid ZFS snapshots, which are independent from restic snapshots.
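The snapshot-backup-destroy cycle above can be sketched as a small wrapper script. This is a hedged sketch, not restic's or sanoid's own tooling: the dataset name tank/data and mountpoint /tank/data are assumptions, and it presumes the restic repository and password are already configured via the usual RESTIC_REPOSITORY and RESTIC_PASSWORD environment variables.

```python
import subprocess

def snapshot_cmd(dataset, name="restic"):
    # zfs snapshot tank/data@restic -- a short-lived snapshot just for restic
    return ["zfs", "snapshot", f"{dataset}@{name}"]

def backup_cmd(mountpoint, name="restic"):
    # back up the frozen view exposed under .zfs/snapshot/<name>
    return ["restic", "backup", f"{mountpoint}/.zfs/snapshot/{name}"]

def destroy_cmd(dataset, name="restic"):
    # drop the temporary snapshot once restic has captured it
    return ["zfs", "destroy", f"{dataset}@{name}"]

def run_backup(dataset, mountpoint):
    """Take a ZFS snapshot, back it up with restic, then destroy it."""
    subprocess.run(snapshot_cmd(dataset), check=True)
    try:
        subprocess.run(backup_cmd(mountpoint), check=True)
    finally:
        # destroy the snapshot even if the backup failed
        subprocess.run(destroy_cmd(dataset), check=True)
```

You would call run_backup("tank/data", "/tank/data") from cron or a systemd timer; sanoid's own snapshots are never touched.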

Restic supports compression, encryption, and deduplication, so you can send incremental encrypted backups to untrusted machines. For remote backups, you can use Amazon cloud storage, https://rsync.net, https://zfs.rent, or other cloud storage services. I have no association with any of these services and don't recommend any in particular; do your own research if you want to pick a cloud storage provider.






Thanks for the heads up. I am using regular ZFS on LUKS already. Seconded restic. I researched many backup systems and also appreciate how nice restic is, especially with its compression support nowadays.

How about borg backup?

They are similar in features. Restic is a static binary, so you can copy it alongside your data and it should always work. Restic has a design document, so if there are problems in a repository, there is documented knowledge of how to remove and rebuild the index. Very nice CLI and output, too.

Both have limitations: they use locks, so you should monitor your backups so that you don't miss any. Any easy monitoring solution?
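One low-tech option is to record a timestamp after each successful run and alert when it gets too old; a dead-man's-switch service can then be pinged from the same hook. This is only a sketch of that idea, and the state-file location is an assumption, not anything restic or borg provide:

```python
import json
import time
from pathlib import Path

# Assumed location for the "last success" marker -- pick whatever suits you.
STATE = Path("/var/lib/backup/last_success.json")

def record_success(path=STATE, now=None):
    """Call this right after a backup run finishes without error."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"ts": now if now is not None else time.time()}))

def is_stale(last_ts, now, max_age_seconds):
    """True if the last successful backup is older than the allowed window."""
    return (now - last_ts) > max_age_seconds

def check(path=STATE, max_age_seconds=2 * 86400, now=None):
    """True means 'alert': no success recorded, or the last one is too old."""
    now = now if now is not None else time.time()
    if not path.exists():
        return True
    last = json.loads(path.read_text())["ts"]
    return is_stale(last, now, max_age_seconds)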


> On https://github.com/openzfs/openzfs-docs/issues/494 people unanimously agree that zfs native encryption is broken especially when sending or receiving raw encrypted zfs snapshots, and they blame the zfs leadership for refusing to admit that zfs native encryption is buggy because admitting that it is buggy is bad for the reputation of zfs.

More details on this can be found in a gist from the same author (keep in mind he's a well known zfs committer).

https://gist.github.com/rincebrain/622ee4991732774037ff44c67...


I ran into a short period where sending a raw native-encryption snapshot would occasionally fail, but that was the only error, and deleting the failed snapshot resolved it. Presumably it was the quota-related issue. That was fixed for me over a year ago, and I now have half a dozen volumes being snapshotted and sent hourly to three destinations with no errors.

This is on Debian with default kernels and ZoL versions.

I'd still prefer a bit more stability from native encryption.


Have you tried restoring from a backup? Given the numerous issues, I suspect restoring from a backup can still fail.

Yep, individual file checksums match in the destination filesystems, and they get scrubbed routinely to ensure overall durability. I've also been able to send incremental snapshots to a third system from either of the first two, interleaving where the snapshots arrive from, so I am pretty convinced that for my use case the raw snapshot transfer is sufficient. I was also able to change the wrapping passphrase on the origin system and propagate it to the destinations successfully (I can `loadkey` on the destinations with the new passphrase).
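The per-file checksum comparison described above can be sketched as follows. This is a generic hedged sketch (nothing ZFS-specific): hash every file under each mountpoint, then diff the two digest maps. You would run tree_digests on each machine and compare the results.

```python
import hashlib
from pathlib import Path

def tree_digests(root):
    """Map relative file path -> sha256 hex digest for every file under root."""
    root = Path(root)
    out = {}
    for p in sorted(root.rglob("*")):
        if p.is_file():
            out[str(p.relative_to(root))] = hashlib.sha256(p.read_bytes()).hexdigest()
    return out

def diff_digests(a, b):
    """Relative paths present on only one side, or whose contents differ."""
    only = set(a) ^ set(b)
    changed = {k for k in set(a) & set(b) if a[k] != b[k]}
    return sorted(only | changed)
```

An empty diff_digests result over the source and the restored destination is the property being claimed above.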

I've always felt a little nervous about encrypting backups. The point of the backup is something has gone wrong and you want to get the data back. You don't really want stray bit errors to cascade across vast swathes of the data. Encrypt in transit sure, but maybe not when laying down the bit pattern on the medium at the other end.

It Depends(tm).

Some encryption schemes will cascade bit errors (aka a flipped bit somewhere early on will fail everything and you’ll potentially lose access to the entire archive). Those also are terrible for random access. But usually extremely secure.

Most sane ones for backups use a variant of block level encryption where you’ll lose at most a block worth of data from a bit flip.

Which hopefully whatever archiving format you’re using can recover from, since long term storage already has that issue.

Way back, I had issues like this (unstable hardware) and piped tar through encryption (I forget what library - it might have been gpg haha) then rsbep2 to avoid problems. It recovered from corrupted blocks at least twice.
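The containment property of per-block encryption can be demonstrated with a toy cipher. This is deliberately NOT a real cipher (real tools use AES-based constructions); it only shows that when each block gets an independent keystream, a flipped ciphertext bit damages that one block and nothing else.

```python
import hashlib

BLOCK = 16  # toy block size in bytes

def keystream(key, index, length):
    # Derive an independent keystream per block index. Toy construction,
    # for illustration of error containment only -- not secure.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(
            key + index.to_bytes(8, "big") + counter.to_bytes(8, "big")
        ).digest()
        counter += 1
    return out[:length]

def xor(data, ks):
    return bytes(a ^ b for a, b in zip(data, ks))

def encrypt(key, plaintext):
    blocks = [plaintext[i:i + BLOCK] for i in range(0, len(plaintext), BLOCK)]
    return b"".join(xor(b, keystream(key, i, len(b))) for i, b in enumerate(blocks))

decrypt = encrypt  # XOR stream: decryption is the same operation

key = b"k"
pt = bytes(range(64))          # 4 blocks of 16 bytes
ct = bytearray(encrypt(key, pt))
ct[20] ^= 0x01                 # corrupt one bit inside block 1
out = decrypt(key, bytes(ct))
damaged = [i for i in range(4) if out[i * 16:(i + 1) * 16] != pt[i * 16:(i + 1) * 16]]
```

With a chained mode, or a cipher where the whole stream depends on earlier ciphertext, the same flip would corrupt everything from that point on.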


To avoid bitrot, the rule is simply to keep more replicas on different iron. The chance that ALL of them break is obviously low. The big issue is backing up keys, or relying on people who memorize passphrases, because for that there is no pure-tech solution.

Reed-Solomon encoding helps with both. But yes, it is not a 100% solution.

Restic can detect and fix bit rot in encrypted backups. Restic stores files as small chunks, so only the affected chunks are damaged by bit rot.

It can detect, but cannot fix. It doesn't have Reed-Solomon coding. Very few backup tools have it; Kopia is the only one that I know of.
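The simplest special case of the erasure-coding idea that Reed-Solomon generalizes is XOR parity: one parity chunk lets you rebuild any single lost chunk. A hedged toy sketch (real Reed-Solomon tolerates multiple erasures and is far more involved):

```python
def parity(chunks):
    """XOR parity over equal-length chunks."""
    out = bytearray(len(chunks[0]))
    for c in chunks:
        for i, byte in enumerate(c):
            out[i] ^= byte
    return bytes(out)

def recover(surviving, par):
    """Rebuild the one missing chunk from the survivors plus the parity chunk."""
    return parity(list(surviving) + [par])

chunks = [b"AAAA", b"BBBB", b"CCCC"]
p = parity(chunks)
rebuilt = recover([chunks[0], chunks[2]], p)  # chunk 1 lost, recovered from the rest
```

Reed-Solomon extends this from "any one lost chunk" to "any k lost chunks" at the cost of k parity chunks, which is why it is attractive for long-term archives.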

ZFS can fix errors too, with RAID or copies=2.



This is just repairing the index file, automating what used to be done manually: salvaging whatever data has not been lost (not restoring lost data).

Does restic not try to fetch the pristine data chunks from the source?

Technically, there are two copies of the data: one on the source, the other on the destination. Thus, restic can act like RAID1.

Restic has checksums for chunks. RAID1 with checksums can repair errors.
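The reason the checksum matters can be shown in a few lines. This is a hedged sketch of the principle, not restic's implementation: plain RAID1 can't tell which mismatching replica is correct, but a trusted content checksum can arbitrate.

```python
import hashlib

def repair(copy_a, copy_b, expected_sha256):
    """Return whichever replica matches the trusted checksum, or None.

    Plain RAID1 only knows the two copies disagree; a content checksum
    identifies the good one, which is what makes repair possible."""
    for copy in (copy_a, copy_b):
        if hashlib.sha256(copy).hexdigest() == expected_sha256:
            return copy
    return None

good = b"hello"
bad = b"hellp"  # one corrupted byte
digest = hashlib.sha256(good).hexdigest()
restored = repair(bad, good, digest)
```

If both replicas fail the checksum, nothing can be repaired, which is where parity or Reed-Solomon coding would be needed.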


I do encrypted ZFS raw send and receive on several machines, including a laptop. No issues since the feature was added.

Sending and receiving raw encrypted ZFS snapshots can work, but it fails for enough people that a warning is warranted.

I am not sure that is correct. The consensus seems to be that there are a number of related bugs pertaining to ZFS raw send and receive, triggered only under a set of very special circumstances. In fact, it's so rare that ZFS developers don't have enough reports and data to reproduce and fix it. Moreover, those bugs have not led to data loss (someone may correct me if there are confirmed data-loss reports among them).

Otherwise, software always has bugs; you can find them in the bug trackers. I use restic and Borg, and there are sometimes integrity errors; I have repositories in both with integrity errors in them.



