
XFS, ext and per-inode mutexes - Garbage
http://www.facebook.com/notes/mark-callaghan/xfs-ext-and-per-inode-mutexes/10150210901610933
======
alexgartrell
I'm starting at Facebook in about a week and a half and a big reason I chose
Facebook over Google as an infrastructure guy was (somewhat ironically) their
openness (in infrastructure/engineering practices).

I much prefer the quick Facebook note to the occasional well-edited white
paper.

~~~
RexRollman
Good luck to you!!

------
spudlyo
Mark didn't mention this in his article, but writes to the inode are only
serialized in jfs/ext(2|3|4) if you are using direct I/O (i.e., O_DIRECT), which
MySQL often uses for its ibdata files for performance reasons. This is why this
problem does not often show up in filesystem benchmarks.
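
For anyone who has not used direct I/O, here is a minimal Python sketch of
what an O_DIRECT write looks like on Linux. The helper names (`aligned_block`,
`direct_write`) and the 4096-byte block size are assumptions for illustration
and do not come from Mark's note; the real alignment requirement depends on
the device and filesystem, and some filesystems reject O_DIRECT entirely.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment; the real value depends on device and filesystem


def aligned_block(payload: bytes, block: int = BLOCK) -> mmap.mmap:
    """Return a page-aligned, zero-padded buffer holding payload.

    O_DIRECT generally requires the buffer address, the length and the file
    offset to be aligned. An anonymous mmap is page-aligned, which is why it
    is used here instead of a plain bytes object.
    """
    size = max(block, -(-len(payload) // block) * block)  # round up to a block
    buf = mmap.mmap(-1, size)
    buf[:len(payload)] = payload
    return buf


def direct_write(path: str, payload: bytes, offset: int = 0) -> int:
    """Write payload at an aligned offset, bypassing the page cache if possible."""
    # os.O_DIRECT only exists on platforms that support it, hence getattr.
    flags = os.O_WRONLY | os.O_CREAT | getattr(os, "O_DIRECT", 0)
    buf = aligned_block(payload)
    fd = os.open(path, flags, 0o644)
    try:
        return os.pwrite(fd, buf, offset)  # writes the whole padded block
    finally:
        os.close(fd)
        buf.close()
```

Several threads calling something like `direct_write` on the same file at
different offsets is the access pattern where the per-inode serialization
described above would matter.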

------
RexRollman
I like both XFS and JFS filesystems. I have used both on and off with Arch
Linux and I've never had a bit of problem with either. Some people don't fancy
them because they are in maintenance mode now but, frankly, I would rather use
a finished and maintained filesystem over something that is in constant
development (unless the new filesystem in development provides something truly
unique).

I think the one thing I miss from BeOS (my all-time favorite OS along with
Nextstep) was BeFS. It was nice to be able to create arbitrary file metadata
and then have it indexed and searchable in real time. Does any Linux FS have
this capability? I believe that XFS supports extended metadata but from what
I understand it is not something one can search on.

~~~
danudey
You can't inherently search on it (i.e., the filesystem doesn't manage its own
index) but it would be trivial (with something like inotify or its successors)
to write an app to monitor the filesystem for metadata changes and index them
itself. This is how Spotlight works on OS X, with HFS+'s arbitrary metadata -
the spotlight daemon watches for FS events and updates its index when a change
occurs.

I strongly suspect that this is actually what was happening in BeOS as well,
at least on some level. It would be surprising to me if the filesystem itself
was also what maintained the index; it seems like this would reduce write
throughput if it had to update the indexes all the time as well.
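
A rough sketch of the userspace index described above, in Python. It assumes
Linux extended attributes read via `os.listxattr` and `os.getxattr`; the
`AttrIndex` class and `read_xattrs` helper are hypothetical names made up for
this example. The watcher itself is left out: something like inotify (which
reports extended attribute changes as IN_ATTRIB events) would call `update`
whenever a file changes.

```python
import os
from collections import defaultdict


def read_xattrs(path):
    """Return {name: value} for a file's extended attributes (Linux only)."""
    return {name: os.getxattr(path, name) for name in os.listxattr(path)}


class AttrIndex:
    """In-memory index from (attribute name, value) to the paths carrying it."""

    def __init__(self, reader=read_xattrs):
        self._reader = reader              # injectable so it can be faked
        self._by_attr = defaultdict(set)   # (name, value) -> set of paths
        self._by_path = {}                 # path -> {name: value}

    def remove(self, path):
        """Forget everything known about path."""
        for item in self._by_path.pop(path, {}).items():
            self._by_attr[item].discard(path)

    def update(self, path):
        """Re-read one file; call this when the watcher reports a change."""
        self.remove(path)
        attrs = dict(self._reader(path))
        self._by_path[path] = attrs
        for item in attrs.items():
            self._by_attr[item].add(path)

    def find(self, name, value):
        """Return the sorted paths whose attribute name equals value."""
        return sorted(self._by_attr.get((name, value), ()))
```

Keeping the index outside the filesystem means a write only pays for the
index update later, when the watcher gets around to it, which is the
throughput trade-off mentioned above.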

------
asb
Does anybody know the ext4 behaviour?

~~~
rg3
Yes, ext4 uses inode mutexes by default like ext3. They're needed if you want
to use a journal, for example. Open
<http://www.kernel.org/doc/Documentation/filesystems/ext4.txt> and look for
dioread_lock for more information.
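
To see which behaviour a given mount is using, one option is to look at its
mount options. The sketch below parses text in the /proc/mounts format and
checks for `dioread_nolock`, the counterpart of `dioread_lock` in that ext4
document. The function names are made up for this example, and whether the
kernel lists the option explicitly may depend on the kernel version, so treat
a negative result with some caution.

```python
def mount_options(mounts_text, mountpoint):
    """Return the set of options for mountpoint, or None if it is not listed.

    mounts_text is expected in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mountpoint:
            found = set(fields[3].split(","))  # the last matching line wins
    return found


def uses_dioread_nolock(mounts_text, mountpoint):
    """True if the mount is listed with the dioread_nolock option."""
    opts = mount_options(mounts_text, mountpoint)
    return opts is not None and "dioread_nolock" in opts
```

On a live system the text would come from reading /proc/mounts, for example
`uses_dioread_nolock(open("/proc/mounts").read(), "/")`.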

------
uriel
I used XFS for years, until I got tired of it semi-randomly overwriting my
files with zeros and then found out that apparently this was 'by design'.
After that I stuck with ext3/4, which might not be as fast in some corner
cases but which are just as fast or faster with my usage patterns, and which
so far have been good at not corrupting any of my data.

~~~
moe
_semi-randomly overwriting my files with zeros and then found out that
apparently this was 'by design'_

Can you elaborate on this?

I'm using XFS in a few fairly large deployments and so far without any
problems. Should I worry?

~~~
jwatzman
Someone _please_ correct me if I'm wrong, but my understanding is that it's
something approximately like this:

In the event of an error and journal replay, the filesystem metadata can be
recovered, but not necessarily the data itself. ext3 tries to keep your data
intact, or at least as much of it as it can piece back together. XFS (and I
think JFS), if they don't know for sure that all the data is intact, just zero
the file. This is by design, on both sides.

So in normal usage everything is fine; it's just a difference in error
handling.
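
If the worry is file data that had not reached disk when the machine went
down, the usual application-side mitigation on any journaling filesystem is
to fsync before relying on the file. A minimal Python sketch of the common
write-to-temp, fsync, rename pattern follows; `durable_write` and the `.tmp`
suffix are hypothetical choices for this example, not something XFS or ext3
prescribes.

```python
import os


def durable_write(path, data):
    """Replace path with data so that a crash leaves the old or the new file."""
    tmp = path + ".tmp"  # assumed scratch name in the same directory
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:                      # os.write may write only part of it
            view = view[os.write(fd, view):]
        os.fsync(fd)                     # flush file data before the rename
    finally:
        os.close(fd)
    os.replace(tmp, path)                # atomic rename over the old file
    # Flush the directory entry as well, so the rename itself is on disk.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

This costs an fsync per update, so it only makes sense for files where losing
the contents would actually hurt.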

------
gojomo
Anyone know if btrfs would be more like XFS or ext3/4 in this regard?

