
New tricks for XFS: support for subvolumes - diegocg
https://lwn.net/Articles/747633/
======
CyberShadow
I think the biggest missing feature for home/casual use in XFS right now is
shrinking. Currently, it's impossible to reduce an XFS filesystem (partition)
in size, so you have to commit to your disk layout once you've set it up.
Installing another OS side-by-side, growing a swap partition, experimenting
with or gradually migrating to another filesystem: none of these is currently
possible without adding physical storage or using loop devices.

The same applies to ZFS (it's not possible to shrink a ZFS pool), which is why
I'm currently using btrfs (with all its pain points) on my machines.

~~~
jbronn
From operations experience, the risks of data loss with btrfs far outweigh
volume shrinking capabilities on either XFS or ZFS.

~~~
CyberShadow
Fortunately, btrfs is really good at backups.

Well, so is ZFS, but in some situations, like choosing what filesystem to use
on a rented dedicated server, any choice that could lead to a situation
requiring physical hardware configuration changes is a non-starter.

ZFS licensing is an issue too, as it means that you can't just boot into any
ol' Linux live CD (or remotely boot into a rescue environment) to fix the
system or salvage the data on it.

~~~
laumars
> ZFS licensing is an issue too, as it means that you can't just boot into any
> ol' Linux live CD (or remotely boot into a rescue environment) to fix the
> system or salvage the data on it.

Sure you can. I've done just this with 3 different live CDs: ArchLinux,
FreeBSD and OpenSolaris. I'm fairly sure I've also used ZFS on the Ubuntu
Desktop live CD, but that was just for playing rather than rescuing a
degraded system.

~~~
vetinari
FreeBSD and OpenSolaris probably aren't very useful when trying to rescue a
Linux system, especially if you need to chroot and run things from there. (My
need so far was to rescue a non-booting system, because a zfs package upgrade
went wrong and didn't update spl too. Re-running dracut would be somewhat
problematic from those two systems.)

Ubuntu desktop live CD doesn't contain zfs, you have to install it from apt.

However, if you have a ZFS system, I see no problem with having an USB stick
with minimal installation of your distro of choice, together with ZFS support.
I'm glad I did have it since the ZFS install.

~~~
laumars
I think you're nitpicking a little, to be honest. None of those problems are
hard to work around:

 _> FreeBSD and OpenSolaris probably aren't very useful, when trying to rescue
a Linux system. Especially if you need to chroot and run things from there.
(My need so far was to rescue non-booting system, because the zfs package
upgrade went wrong and didn't update spl too. Re-running dracut would be a
somewhat problematic from these two systems)._

I do see your point, but it really depends on the problem, as not all
recoveries require chroot / package management access. I've rescued Solaris
(not OpenSolaris) with an OpenSuse live CD back when a cavalier op chmodded
/etc. I've rescued OpenSolaris with a FreeBSD CD back when a faulty RAID
controller borked the file system. As for ArchLinux ISOs, I've used them to
rescue more systems than I can count. But as you said, some problems do just
require booting an instance of the host OS via some means.

 _> Ubuntu desktop live CD doesn't contain zfs, you have to install it from
apt._

It took me all of about 10 minutes to bake the ZFS driver into the ISO. It's
not hard compared to the other technical challenges you've discussed. Though
if that's too much effort then I think you can also just apt it from the Live
CD and manually modprobe it into the running kernel.
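
For concreteness, a rough sketch of that apt-plus-modprobe route from a booted
Ubuntu live session. The package name (`zfsutils-linux`) and the pool name
(`tank`) are assumptions, and these commands need root and network access, so
they only make sense run on the live system itself:

```shell
# From a running Ubuntu live session with network access.
sudo apt update
sudo apt install -y zfsutils-linux   # builds/loads the ZFS module via DKMS
sudo modprobe zfs                    # insert the module into the running kernel

# Import the pool with an alternate root so mountpoints land under /mnt:
sudo zpool import -f -R /mnt tank    # "tank" is a placeholder pool name
sudo zfs mount -a                    # now chroot into /mnt and repair as usual
```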

 _> However, if you have a ZFS system, I see no problem with having an USB
stick with minimal installation of your distro of choice, together with ZFS
support. I'm glad I did have it since the ZFS install._

Indeed. My preferred method is having rescue disks available over PXE booting.
Before then I was forever hunting down my recovery disks or spare USB keys /
CD-Rs. Not to mention the pain involved if the system I was trying to recover
was my main workstation (ie the hardware I'd normally use to download and burn
CDs on).

~~~
vetinari
> None of those problems are hard to workaround:

Sure, there are very few problems that cannot be solved by throwing some time
and sweat on them. However, when I do need to solve something, I prefer to not
be sidetracked by sub-problems. Smooth sailing and all that.

It's much simpler to pull a USB key from the drawer, or PXE boot as you
mentioned, and get on with fixing the damaged system, than to start
downloading and preparing a live distro somewhere.

~~~
laumars
Again, you're overstating things. If it genuinely takes you more than a couple
of minutes to run apt and modprobe then I really think you shouldn't be
allowed anywhere near a degraded system to begin with. These aren't
"sub-problems" - they're the absolute basics of system administration.

~~~
vetinari
It's a bit more than a couple of minutes to download an installer, install it
somewhere (a livecd doesn't have a persistent /), install zfs there and only
then get on with whatever you were doing.

Compared to grabbing standard media you have somewhere, it will take at least
15 minutes extra.

Knowing the basics of system administration doesn't mean you should waste your
time, especially on something you can do without.

~~~
laumars
_> It a bit more than couple of minutes to download installer, install it
somewhere (livecd doesn't have persistent /), install zfs there and only then
go on doing whatever you were doing._

You don't need a persistent root. I'd already addressed that point. Just run
modprobe and you're done.

 _> Compared to grabbing standard media you have somewhere, it will take at
least 15 minutes extra._

Bullshit. I've done exactly what I described and it did not take me 15
minutes. Furthermore, all you're doing is pre-emptively pushing the work to
before your outage, which you could equally do with the ISO (if you really
wanted to compare apples with apples).

 _> Basics of system administration does not mean, that you are wasting your
time, especially on something you can be without._

The whole point of this tangent was about when one needs a live CD, not about
whether creating a live CD is worthwhile when you already have a USB key. That
new argument you've invented is stupid because the answer is quite clearly
"use the USB key if it's already in your drawer." But what happens if you have
a ZFS volume on a system and you don't already have recovery media (ie the
original question)? Well, in that case you can use any of the methods I
described. Or, of course, you can create a USB key too. But that will take
just as long as the methods I described anyway (you still have to download the
OS image and ZFS drivers and write them all to your storage medium, so all
you're really doing is swapping out one chunk of plastic for another chunk of
plastic).

~~~
vetinari
> You don't need a persistent root. I'd already addressed that point. Just run
> modprobe and you're done.

That assumes too much. For example, that you have a network connection while
booted from the live media. You may not have one; then you cannot run apt/yum
and you need persistent media that you prepared somewhere else. (Happened to
me).

> Bullshit.

Surely. Or you have extra speedy USB keys. Just installing a minimal distro on
USB takes the better part of that time.

> The whole point of this tangent was about when one needs an Live CD.

When you are doing something non-standard - and installing ZFS on Linux is
pretty non-standard - you know in advance that the normal live media won't
work. It's prudent to have something prepared for if/when an SHTF event
occurs.

Specifically with regard to filesystems: when you are installing with a root
filesystem the distro doesn't provide, you need to make such media anyway,
just to install in the first place. So instead of throwing it away, just label
it and put it into the drawer. (When you are not installing on a non-distro-fs
root, you don't need support for that fs in the live media at all; the
standard one will do for making the system boot.)

~~~
laumars
_> That assumes too much_

You've been assuming a crap load of stuff as well when it suits your argument.
Like having a pre-prepared USB key to begin with.

 _> For example, that you have a network connection while booted from the live
media. You may not have one; then you cannot run apt/yum and you need
persistent media that you prepared somewhere else. (Happened to me)._

Indeed. You might also not have a CD drive on the host (happened to me), or
any blank CD-Rs, or a CD burner on your workstation. Or the internet
connection might not work on your workstation either. But then most of those
arguments can be made against creating a USB key as well, so your point is
moot. In fact my latest workstation (MacBook Pro) only has USB-C, so I
couldn't use my USB keys when I went to install Linux on it.

My point is, if you're looking for ways to nitpick, there are plenty for your
examples as well. In fact there will be a thousand different exceptions for
any solution you could dream up. Thus is the nature of working in IT.

 _> Just installing minimal distro on USB takes a better chunk of that time._

Arguably yes, but that also takes longer, and your original point was about
getting stuff done as quickly as possible. So you're now contradicting
yourself.

 _> When you are doing something non-standard - and installing ZFS on Linux is
pretty nonstandard - you know in advance that the normal live media won't
work._

Except the whole point of this tangent is me demonstrating where it does work.

 _> It's prudent to have something prepared, if/when SHTF event occurs._

Now you're arguing a different point to the point I was discussing. I'm not
going to disagree with you there (since I've already discussed I run a PXE
server for situations like these) but that wasn't the topic we were
discussing.

I seriously think you're now just arguing for the sake of winning an
internet argument. I'm not going to argue with you that a CD is better than
USB, because it's pretty obvious that isn't the case. But that wasn't the
point I was discussing. So for the benefit of my own sanity can we please get
back on topic: you _can_ use live CDs to repair a degraded system running ZFS.
Sure, there will be occasions when you cannot; but that's the case when doing
anything in IT (and thus why us sysadmins get to command such a good wage).
But generally you can. And I literally have. Many times, in fact. So enough
with the dumb "death by a thousand paper cuts" and goalpost-moving arguments
please.

~~~
vetinari
> You've been assuming a crap load of stuff as well when it suits your
> argument. Like having a pre-prepared USB key to begin with.

You are still conveniently ignoring what I said: if you want to install system
with ZFS root, you have to make it. That's also the reason why I have it. I
just didn't throw it away after the installation.

> Except the whole point of this tangent is me demonstrating where it does
> work.

Yes, if everything is aligned right, it can work.

> I seriously just think you're now just arguing for the sake of winning an
> internet argument.

You are free to think whatever you want.

> you can use live CDs to repair a degraded system running ZFS.

Yes, under certain conditions. How they apply in your environment is up to you
to assess.

> Sure there will be occasions when you cannot; but that's the case when doing
> anything in IT (and thus why use sysadmins get to command such a good wage).
> But generally you can. And I literally have. Many times in fact. So enough
> with the dumb "death by a thousand paper cuts" and goal post moving
> arguments please.

It's not goalpost moving, it's what happens. Having a livecd that supports
your configuration is advantageous to not having it. Being able to download a
ready-made one is advantageous to having to make it. Etc.

So when I can choose between a FreeBSD or OpenSolaris ISO and a native system
that fully supports whatever I need (that was the original issue, remember?),
of course I will choose the latter; having the latter available is preferable.

~~~
laumars
_> You are still conveniently ignoring what I said: if you want to install
system with ZFS root, you have to make it. That's also the reason why I have
it. I just didn't throw it away after the installation._

I'm not ignoring it; I've repeatedly addressed it and pointed out how it's not
true (the Ubuntu Desktop example). Want a few more examples? When I installed
ArchLinux with a ZFS root I didn't use a custom ISO (read their ZFS wiki if
you don't believe me). I also didn't create a custom Ubuntu Server ISO when I
installed that with a ZFS root. Both were installed from CD - the vanilla CD
available on their respective websites.

Also, even if you did install from a USB key, what's to say you don't then
lose said key afterwards? I'm forever losing them.

The point is whichever argument you're going to make will be full of more
exceptions than you can count. So nitpicking one over the other, like you are,
is an utterly pointless exercise and a distraction from the original point I
was making.

 _> So when I can choose between freebsd or opensolaris iso and native system
that fully support whatever I need (that was the original issue, remember?)_

No, that wasn't the original issue. The original issue was whether there are
any live CDs that can be used to rescue a degraded ZFS system - which I've
demonstrated there are.

However, I do agree with you that running ZFS on Linux is a little pointless
when FreeBSD and the OpenSolaris forks are all solid platforms and have
unencumbered native ZFS support. Though installing a ZFS root on FreeBSD was
just as painful as doing so on ArchLinux (at least that was the case a few
versions ago - things might have improved since, but thankfully FreeBSD never
really needs rebuilds so I've not had to revisit that particular pain point).

------
qubex
I've been having some fun with a revived SGI Octane running IRIX and it
occurred to me, quite out-of-the-blue, that XFS development essentially ceased
before SSDs were ever even contemplated. For a few moments, I pondered the
apparent profundity of this realisation, and then I moved on, cursing the lack
of package management while trying to get something to work.

~~~
KaiserPro
Ah, you're right, up to a point. Around 2014 there was a great flurry of work
done which improved its metadata speed by more than 50x, turning it from a
great streaming filesystem into a good all-rounder.

~~~
pbh101
It was a bit before that, IIRC. There's an LWN article summarizing the changes
at [1]; they are mostly delaylog, if we are referring to the same thing. It
was default-enabled in 2011, on Linux 3.3.

... I’ve happened to spend much of the last three weeks learning about XFS.

[1] [https://lwn.net/Articles/476263/](https://lwn.net/Articles/476263/)

~~~
KaiserPro
Aha, good spot. My boss sent me the video of that presentation

------
amelius
By the way, I'm using XFS only because it allows for more than 64k hardlinks
per file. Strangely, this isn't possible with ext4 out of the box.

~~~
xelxebar
Just curious, but what use case do you have for such a voluminous quantity of
hardlinks?

~~~
amelius
Basically copying files without actually copying them. For example, when
making versioned backups, or when copying a large number of files from
production to various development environments.

Something I wish there was a better solution for. Perhaps there is a
filesystem out there that supports CoW in a way that fits this use case
(perhaps even XFS), but I haven't looked into it.

The disadvantage of using hardlinks is that you can't hardlink between users
(user is a property of the file, not of the link to the file), and there's
always the danger that a write takes place through one of the links. Imho,
that should really be solved at the filesystem level using a CoW scheme.
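
For what it's worth, recent XFS (formatted with `mkfs.xfs -m reflink=1`) can
do exactly this via reflinks. A sketch using coreutils `cp` - the filenames
are made up, and `--reflink=always` would fail outright on a filesystem
without reflink support:

```shell
# Create a throwaway source file to clone (names here are illustrative):
dd if=/dev/zero of=big.img bs=1M count=4 2>/dev/null

# Share extents instead of hardlinking. Each copy gets its own inode
# (its own owner and permissions), and a write through one copy triggers
# copy-on-write on the shared extents rather than clobbering the original.
cp --reflink=auto big.img clone.img   # "auto" falls back to a plain copy off XFS
cmp big.img clone.img                 # identical content either way
```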

~~~
josteink
> Basically copying files without actually copying them. For example, when
> making versioned backups, or when copying a large number of files from
> production to various development environments.

To me it sounds like you want simple snapshots and backups without the
redundancy at the storage level.

So why not use a filesystem which supports that natively, like ZFS or Btrfs?

~~~
mkj
Do they handle 64k snapshots?

~~~
lloeki
That's one backup every hour for 7+ years. I'm _really_ curious about the use
case.
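
(The back-of-the-envelope arithmetic behind that figure, taking 64k as 65536
snapshots at one per hour:)

```shell
# 65536 hourly snapshots divided by the hours in a year:
echo $(( 65536 / (24 * 365) ))   # prints 7, i.e. "7+ years"
```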

Anyway, theoretically[0]:

> As you might know btrfs treats subvolumes as filesystems and hence the
> number of snapshots is indeed limited: namely by the size of files.
> According to the btrfs wiki the maximum filesize that can be reached is 2^64
> byte == 16 EiB

But it seems that in practice you hit the mud at ~100 snapshots[1]; be sure to
read the reply to that mail though, as it will depend on the use case and it
might turn out to be fine way beyond that.

[0]: [https://unix.stackexchange.com/questions/140360/practical-limit-on-the-number-of-btrfs-snapshots#147666](https://unix.stackexchange.com/questions/140360/practical-limit-on-the-number-of-btrfs-snapshots#147666)

[1]: [https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg72385.html](https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg72385.html)

------
viraptor
> When a CoW filesystem writes to a block of data or metadata, it first makes
> a copy of it

Is this really a precise description? I was sure that (for data) the actual
copy only happens in the case of a block with multiple references. If there
were a single user of a block, I'd expect it to be modified in place in most
real-world filesystems. The summary on Wikipedia seems to confirm that.

Am I missing something, or is that just unfortunate wording?

What they describe, with the write of the data followed by a write of the
indexes as new elements, seems more like a log-structured filesystem.

~~~
aidenn0
CoW stands for "Copy on Write"; a system that updates blocks in place is not
CoW.

~~~
viraptor
There are lots of things which are called copy on write, where CoW really
means copy-on-deduplication-otherwise-update-in-place. Like the qcow2 disk
image format, or the Cow type in Rust.

------
diegocg
tl;dr XFS will support using filesystem images as if they were directories,
kind of like an "internal loop", which will allow (with the help of copy-on-
write data, which they support in recent versions) having
subvolumes/snapshots.

~~~
josteink
It might just be how you present it, but to me that sounds like using multiple
layers of hacks to implement use-cases other filesystems were carefully
designed to support in the first place, and that sounds extremely unreliable
and brittle.

I'll stick to ZFS :)

~~~
diegocg
Yeah, these subvolumes are going to have scalability issues compared with
cleanly designed subvolumes such as ZFS's. But I wouldn't describe them as a
hack - it's a rather interesting feature that no other filesystem has explored
before. I would describe it as "loop devices done well". I don't think
reliability will be a problem; to the upper layer these embedded filesystems
are in fact just files.

------
brian_herman
Nice, can't wait for the next version of XFS from Red Hat.

