> Al Viro asked if there is a plan to allow mounting hand-crafted XFS or ext4
> filesystem images. That is an easy way for an attacker to run their own code
> in ring 0, he said. The filesystems are not written to expect that kind of
> (ab)use. When asked if it really was that easy to crash the kernel with a
> hand-crafted filesystem image, Viro said: "is water wet?"
This is why an rsync.net account that is enabled for zfs send/recv actually lives inside a VM, with the customer given their own zpool and their own root login.
It's really resource-intensive to do it this way, and there are other, much simpler and more scalable ways to provide the ability to zfs send into cloud storage ...
However, there is universal agreement among the ZFS coding community that allowing someone to 'zfs send' an arbitrary datastream (in this case, a snapshot) is tremendously dangerous.
In the best case, the malicious actor can crash the kernel and deny service. In the worst case, the malicious actor could destroy the underlying zpool.
Please consider attending the OpenZFS Developer Summit in November if you have any interest in this ...
OTOH, it's one of the things that I think made Linux succeed in the beginning. Everyone could upstream a half-arsed driver for something and it would get fixed while people used it and encountered bugs. Now that Linux is used professionally everywhere, that just isn't feasible anymore.
On another note, I remember that some time ago there was a talk about Linux file system fuzzing given at some conference and ext4 fared the best by far, which is why I'm still using that exclusively, although some of the features of btrfs would come in handy at times.
I’m completely amazed at how fast snapshots can be done, though.
Infinite memory and infinite storage are great abstractions: easy-to-use models that a lot of software relies on... but in this case it seems like ENOSPC handling should have been designed in from the start.
Maybe Canonical will solve that; we'll see how distributing ZFS together with the kernel works out.
it's been fine for some of us for, what, nearly 10 years now?
Otherwise, it is a massive PITA. I do have one machine with ZFS, and I made the mistake of placing the root into a subvolume. Won't do that again.
There's also FreeBSD, as well as the various platforms derived from the now-defunct OpenSolaris.
I've never had to compile ZFS myself (though I have, on occasion, chosen to because I've wanted to try features before they hit the repos).
And they are supported by any Linux distro out of the box.
That said, I did once run into an issue running ZFS on Arch Linux which caused data loss. That was a highly experimental setup, though, and it was before ZoL really took off (incidentally, I've also run Btrfs on Arch Linux and that also caused me data loss).
Hopefully I'm not jinxing things by saying this, but ZFS has saved me from excessive downtime on a number of occasions. It has even recovered from corrupted superblock failures (when a RAID controller was faulty and randomly dropping devices during heavy load).
A non-obvious but very straightforward way around such a wedged file system is to add a 3rd device. It can even be a USB stick; back then I was using small 4GiB sticks and it would work. That was enough to allocate a couple of metadata-only block groups on the stick, to write out the file system changes necessary to back out the second device. And once that completes, a brief filtered balance (e.g. btrfs balance start -dusage=10 is usually sufficient) frees enough space on the 1st device to back out the 3rd (the USB stick).
The non-obvious thing about any COW file system is that deletion always requires free space. There is no such thing as deleting a file with COW unless the fs can write that deletion change to all the affected trees into free space. Once the entire set of metadata changes is committed, then the data and metadata extents for those deleted files can be freed.
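To make the "deletion needs free space" point concrete, here is a toy model of my own (not real Btrfs code, and the block accounting is vastly simplified): deleting a file first has to write a new metadata tree into free space, and only after that commit can the old extents be released.

```python
class ToyCowFs:
    """Toy copy-on-write filesystem model (illustration only, not Btrfs)."""

    def __init__(self, total_blocks):
        self.total = total_blocks
        self.used = 0
        self.files = {}  # name -> blocks occupied by that file

    def free(self):
        return self.total - self.used

    def create(self, name, blocks):
        if self.free() < blocks:
            raise OSError("ENOSPC")
        self.files[name] = blocks
        self.used += blocks

    def delete(self, name, meta_blocks=1):
        # COW: the updated metadata tree must be written into *free* space
        # before anything is released -- a 100% full fs can't even delete.
        if self.free() < meta_blocks:
            raise OSError("ENOSPC: no room to write the deletion itself")
        self.used += meta_blocks                          # write new tree
        self.used -= self.files.pop(name) + meta_blocks   # commit: old extents freed


# fs = ToyCowFs(total_blocks=10)
# fs.create("big", 10)   # fs is now 100% full
# fs.delete("big")       # raises ENOSPC: nowhere to write the new tree
# fs.total += 1          # "add a 3rd device" = more raw space
# fs.delete("big")       # now succeeds
```

This is why adding even a tiny extra device un-wedges things: it only has to hold the transient metadata for the commit, not the data being freed.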
Anyway, a lot has changed even in one year in Btrfs, let alone the past five years. It's thousands of line changes per kernel release.
PS: I still struggle to understand why I'm getting downvotes. Please be so kind as to say why, so I can delete this comment if need be.
SSDs implement them internally, and some Android devices use the F2FS filesystem:
Lockfiles, O_SYNC, flush(), etc. all become unnecessary if we just assume that all data is at risk in case of improper poweroff. libeatmydata does this, and dramatically increases performance for some workloads.
Attempting consistent I/O (the kernel and hardware will try their best to thwart any attempt) may help in some cases, but ultimately there are hardly any guarantees when it comes to power loss: your data may be gone or corrupted no matter what you did.
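For what it's worth, the usual best-effort pattern is write-to-temp, fsync, atomic rename, then fsync the directory. A sketch (the helper name is mine, and even this can't beat a drive that lies about flushing its cache):

```python
import os
import tempfile

def durable_replace(path, data):
    """Best-effort atomic, durable replacement of a file's contents.

    Narrows the power-loss window; it cannot eliminate it if the
    hardware lies about having flushed its write cache.
    """
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)  # temp file on the same fs
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # push file data to stable storage
        os.rename(tmp, path)       # atomic replace on POSIX
        dfd = os.open(dirname, os.O_RDONLY)
        try:
            os.fsync(dfd)          # persist the directory entry too
        finally:
            os.close(dfd)
    except BaseException:
        try:
            os.unlink(tmp)         # clean up on failure
        except FileNotFoundError:
            pass
        raise
```

After a crash you see either the old contents or the new ones, never a torn mix; whether the new ones actually survived the power cut is up to the hardware.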
1) compile fs code into wasm
2) generate in-memory disk images
3) run fs code over in-memory disk image
4) use a neural net to search the fuzz space
At that point one could couple something like profile-guided optimization with adversarial input generation driven by branch coverage. This would automatically find patterns between on-disk data structures and the code that is executed when those structures change.
Zero-syscall, zero-VM-exit file system fuzzing, all in user space. One could easily get thousands of cores working on this problem in short order.
They would be expected to maintain it either way.
The biggest benefit would probably come if they hope to get Google to make it a standard part of Android, or hope for other manufacturers to start using it (and therefore to share the work of feature development and maintenance).
But also to make sure that it isn't kicked out of the kernel. Staging is not meant to be a permanent location for any kernel code.
At least I think that's what I did... Man, time flies.