
XFS: the filesystem of the future? - dmm
http://lwn.net/Articles/476263/
======
owenmarshall
It's somewhat amusing that the filesystem of the future was developed by SGI
in 1994 ;)

I've had little to complain about when I've used XFS. It seems like a very
good file system, and has handled the large datasets I've thrown at it without
any problems.

~~~
sespindola
As mentioned in the article's comments, when I used XFS about 8 years ago, I
found it exceptionally prone to data corruption.

It remains the only filesystem that I couldn't at least partially recover.

For a good mix of speed, reliability and huge data capacity on Linux, I stick
to ReiserFS, at least until btrfs becomes more stable.

~~~
owenmarshall
8 years ago would make you an _extremely_ early adopter of the Linux port. XFS
has seen significant development since then. I don't think experiences based
on eight-year-old software hold much water.

XFS has acquired a nasty stigma as a filesystem that eats your data. But it
seems that for every user who complains about data loss, another does not.

~~~
Terretta
> _But it seems that for every user that complains about data loss, another
> does not._

So, a coin toss then? You're not helping!

~~~
owenmarshall
No, we need a careful investigation of the filesystem.

Listening to my "works fine" comment is as useless as any other "didn't work
for me" comment.

What _might_ be helpful is a comment saying "I encountered the following
issues with the following configuration, reported the bug, and the maintainers
said ..." What would be even better would be for actual experts to audit the
code, look through the bug reports, and give their opinions.

But "works for me"/"broke for me" comments are, unfortunately, as useless as
most filesystem benchmarks. Indeed, any time filesystem discussions come up, a
stunning majority of the opinions are unhelpful. Unfortunately, I jumped right
in with one as well :(

------
yason
A general-purpose file system has only one primary feature: "Don't lose
data. Ever." If I pop the SATA cable off my drive while writing, the
filesystem should later remount in 100% uncorrupted condition with whatever
data it had time to write to the disk. I don't want to run fsck or use debugfs
to recover from a hairy state. Backups or RAID-1 take care of physical
failures of the disk.

Beyond avoiding data loss, anything else is _ultimately_ secondary. Speed is
nice but near-average filesystem performance is all right for an HDD. SSD gives
you more speed. Totally abysmal speed _might be a reason_ to switch
filesystems—though, I'd still take the abysmal speed if it saved me my data
and the faster filesystem wouldn't.

Due to this conservative approach, switching filesystems _is really hard_.
I've been using ReiserFS since 2000 or so because it hasn't failed me once.
I've had HDDs slowly going bad and lost some individual files before I cloned
the old disk to a new one. But I've never had to fsck, defrag, recover, or do
_anything_ to my ReiserFS partitions. Never. It's getting harder and harder to
switch. A conservative alternative would be ext3, but ext3 has lost me data on
another computer.

I have some interest in btrfs but I probably won't switch until I have to. XFS
would be very interesting, but just not worth the risk, because ReiserFS is
good enough.

~~~
cbsmith
It must seem very strange to you that most filesystems don't do data
journaling and that they have options which increase the risk of data
integrity problems.

------
kijin
> For I/O-heavy workloads with a lot of metadata changes - unpacking a tarball
> was given as an example - Dave said that ext4 could be 20-50 times faster
> than XFS.

A while ago, I read somewhere that XFS is good for a small number of large
files, whereas ReiserFS is good for a large number of small files. I don't
know whether that's still true, if it ever was; but perhaps extracting a
tarball with thousands of small files in it is not the best way to bring out
XFS's strengths?
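For a rough feel of why this kind of workload stresses a filesystem differently
from streaming I/O, a tarball extraction can be crudely mimicked by creating
many small files and timing it. This is only a sketch (not fs_mark itself, and
the file count and sizes are arbitrary):

```python
import os
import tempfile
import time

def small_file_benchmark(n_files=1000, size=1024):
    """Crudely mimic tarball extraction: create many small files,
    which stresses metadata (inode and directory) operations rather
    than raw data throughput."""
    directory = tempfile.mkdtemp()
    payload = b"x" * size
    start = time.time()
    for i in range(n_files):
        with open(os.path.join(directory, "f%d" % i), "wb") as f:
            f.write(payload)
    return time.time() - start

elapsed = small_file_benchmark()
print("created 1000 x 1KB files in %.3f s" % elapsed)
```

Note that without any fsync calls this mostly measures page-cache and journal
behavior, which is part of why small-file results vary so widely between
filesystems.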

By the way, when are we expecting the "experimental" label to be taken off of
btrfs?

~~~
noahdesu
> By the way, when are we expecting the "experimental" label to be taken off
> of btrfs?

I believe distributions are waiting on an fsck for btrfs.

~~~
sneak
Hahaha. That's pretty much /thread, innit?

~~~
eru
What are you trying to say?

------
yusufg
Is XFS available in stock RHEL 6.2, or does it require an additional purchase
of the Scalable File System Add-On?

[http://www.redhat.com/products/enterprise-linux-add-
ons/file...](http://www.redhat.com/products/enterprise-linux-add-ons/file-
systems/)

Red Hat doesn't list prices for its add-ons, and it's generally onerous if one
has to call sales to find the price of a filesystem.

Reminds me of Veritas VxFS

~~~
antoncohen
I think it's in base RHEL 6. Try 'yum list xfsprogs'; if it's there, you have
XFS.

[http://docs.redhat.com/docs/en-
US/Red_Hat_Enterprise_Linux/6...](http://docs.redhat.com/docs/en-
US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/xfsmain.html)

------
waffle_ss
I've had good experiences with XFS on my home file server. The only things to
keep in mind are that you cannot shrink the filesystem once it's created, and
that fsck doesn't run at boot (you can write an init script to run xfs_repair
if you like).

Also, my fragmentation often gets quite high - over 90% - but it doesn't seem
to really affect performance.

~~~
RexRollman
XFS filesystems can be defragmented. The tool is called xfs_fsr

<http://linux.die.net/man/8/xfs_fsr>.

~~~
waffle_ss
Yes, I have a cron job that does that.
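For reference, such a job might look like the following crontab fragment (the
mount point, schedule, and time limit are assumptions, not the poster's actual
setup):

```cron
# Run xfs_fsr for at most two hours, every Sunday at 3am,
# against an assumed XFS mount point at /data.
0 3 * * 0  /usr/sbin/xfs_fsr -t 7200 /data
```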

~~~
RexRollman
And yet you are getting 90% fragmentation? I wonder why that is happening to
you.

------
colanderman
I want to throw in another XFS data-eating anecdote. When I used it, I found
it had a tendency to zero out files which were open for writing during a
system crash.

I've been using JFS now for 6 years or so and have never had files go missing
on me. JFS routinely takes second place to XFS in various benchmarks but still
far outperforms ext3 and the like.

~~~
gcp
_When I used it, I found it had a tendency to zero out files which were open
for writing during a system crash._

This was pretty much a FAQ and very much by design. See for example:

<http://madduck.net/blog/2006.08.11:xfs-zeroes/>
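For what it's worth, the usual application-side defense against this class of
crash behavior, on any filesystem, is the write-to-temp, fsync, rename
pattern. A minimal sketch in Python (the function name and paths are
illustrative, not from the linked post):

```python
import os
import tempfile

def atomic_write(path, data):
    """Replace `path` with `data` so that a crash leaves either the
    old contents or the new contents, never a zero-length file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # force the data to disk before the rename
        os.replace(tmp, path)     # rename() is atomic on POSIX filesystems
    except BaseException:
        os.unlink(tmp)
        raise
```

The fsync before the rename is the step that avoids the zero-length-file
surprise: it guarantees the data blocks reach disk before the metadata making
the new file visible does.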

~~~
colanderman
Interesting. I don't buy the security argument though. If, after a crash, the
file size is larger than the actual extents, _why not just truncate the file
size to match the extents_?

~~~
gcp
Because there are no guarantees the extents actually point to the file data.

------
calloc
How does XFS stack up with ZFS when it comes to speed, reliability and
protecting my data?

------
jamesfmilne
At our company, we have been shipping XFS as the filesystem for our products
for over 8 years. We've shipped many petabytes of storage in that time, all of
which is being heavily hammered every day in feature film & TV post-
production.

We've found XFS to be a very robust filesystem, and any problems have
generally been traceable to modifications we've made to tools like xfs_fsr.

In general it's been considerably more frequent for people to end up in a
panic after making a mistake replacing a disk in a 3ware array and deleting a
RAID unit! Luckily our resident RAID ninja has been able to get the majority
of people out of that hole. :-)

------
tytso
All open source developers want users. Users mean more bug reports and often
more developers contributing to the project. In addition, the greater
relevance also means it's more likely they can get companies to pay them to do
what they do, conferences to invite them to talk about their projects, etc.
(It's for this reason that, when the IMHO over-aggressive enforcement of the
busybox license, forcing people who got sued to release code unrelated to
busybox, caused FUD among Linux embedded companies, the emergence of a
replacement toybox project that avoids these issues caused so much dismay
among the busybox development community.)

Given that Dave Chinner is one of the primary developers of XFS, it's not
surprising that he wants to promote XFS. And to be fair, XFS has not gotten as
much attention as it has perhaps deserved based on technical considerations
alone (as have other perfectly capable filesystems in their time, such as
JFS), and that's no doubt frustrated him. In addition, the work he has done to
improve XFS removes one of the significant performance bottlenecks often seen
by desktop users and developers, and he should be saluted for that.

That being said, it's also true that in many cases the file system is not the
bottleneck, and so other issues that aren't tightly focused on performance
also matter: the quality of the userspace tools (such as e2fsck and debugfs
for ext4, and their equivalents or lack thereof in other file systems),
familiarity to sysadmins, data recovery services, ease of upgrading existing
large production file systems, and so on.

In addition, it's dangerous to draw conclusions from a single microbenchmark
such as fs_mark alone. It's not common that you have workloads which create
thousands and thousands of small (< 64k) files in parallel across lots of CPU
cores at the same time, on the same file system. So using this benchmark
alone to say that file system X is more scalable than file system Y is just
not going to tell the whole story. Personally, I like to use microbenchmarks
as a tool for improving a file system, and not as an argument to try to get
people to switch from one file system to another. Unless someone's use case is
exactly mirrored by the microbenchmark, I personally find this approach to be
a little dishonest.

I will say that at the moment, many of the developers who have been working on
ext4 are employed by companies who are using ext4 as part of a cloud data
storage stack. This is why there have been changes such as no-journal mode
(which is great when you have consistency guarantees being provided by a
cluster file system above you, since it can provide the file even if an
entire server's power supply has exploded), and good performance when under
severe memory pressure (funny how most benchmarks are done when the only thing
running on the server is the benchmark, so there is no competition for the CPU
and for system memory --- XFS in particular has proven to be a memory hog,
and others have noted severe performance degradations, and in some cases
stability problems, under memory pressure; not a problem on a stand-alone file
server, but not so great if you are also trying to run VM's or other
applications using the file system on the same machine). Arguably some of
these improvements don't mean as much for desktop users, although I believe
some of the performance enhancements we've made have also trickled down to
help the desktop.

XFS, in contrast, has been focusing a lot of attention on the desktop use
case, and they've traditionally owned big streaming writes, HPC workloads
using huge servers, huge memory, and huge RAID arrays. It's good that XFS has
made these improvements, and I salute them for it. But to state that these two
workloads are the only ones which are important, and therefore they are the
file system of the future, may be overstating matters --- with all due respect
to Dave and his many years of experience working on XFS.

~~~
zanny
Reading the post, a lot of what he says is that XFS scales better than ext4
due to better algorithmic implementations of various "things".

Why not just get active with ext4 (more likely ext5) development and work to
introduce his performance improvements into the mainline file system already
in use?

~~~
tytso
His performance improvements are very specific to XFS; arguably they were
fixing a problem/fundamental design issue in the original journaling code in
XFS. Ext4's journaling code is very, very different from XFS. So a design
improvement that applies to XFS is not necessarily applicable to ext4. That
being said, there have been times when we've looked at what XFS has done with
some feature (such as delayed allocation writeback for example) and written
code which has been inspired by XFS's algorithms. I'm an engineer, I'll take
good ideas from wherever I can. (Unless I suspect there may be legal reasons
why I had better stay away from certain techniques :-/)

Also, the scalability issues that Dave has been talking about aren't ones that
matter for any of the workloads that the ext4 developers or their employers
care about. We don't have 32 CPU cores all writing small files and requiring
small block allocations to a single file system at the same time. So yes, with
that specific benchmark Dave was experimenting with (fs_mark), he has
identified scalability problems where ext4 has its own performance issues that
could be fixed with the appropriate developer attention. It's on my list to
look at and hopefully address, but I have higher
priority things that do benefit the workloads that my company is interested
in.

For example, I am currently working on making Async I/O truly Async even when
we need to read metadata blocks, which is a bug that all file systems suffer
from under Linux; AIO is not truly "A". This is something that has been known
for over a decade, but up until now, no one who was funding fs development,
for ext4 or XFS, has had a workload where this has mattered enough. When I do
fix this, it will be an area where ext4 will have an advantage over XFS. Will
I then trumpet this as the reason why they should switch from XFS to ext4? Of
course not. Not everyone has a need for true Async I/O. On the other hand, if
ext4 has such a feature, it may appeal to some application writers, probably
of various different storage servers or perhaps web servers, and if they start
using it, then there may be more workloads where true AIO is relevant. And
that in turn might inspire XFS developers to add a similar feature. This is
why I've always believed competition is a good thing, and why I've never
argued that fs developers should abandon one file system and go work on
another file system. (As Dave has done on the linux-ext4 list, but never mind
that.)

------
aidenn0
I'll have to give XFS a look again. We use Subversion at my work, and an "svn
up" took 4-8x longer on XFS than on reiser3.6; XFS was insanely faster for
just about everything else, though.

It definitely is not great for sudden power failure, but really, if you're not
using a laptop, buy a UPS; it's not that expensive.

