
The Evolution of Stupidity: File Systems  - darkduck
http://www.enterprisestorageforum.com/storage-management/the-evolution-of-stupidity-file-systems.html
======
perlgeek
The article talks about repeating mistakes, and descends into the world of
file systems, but I didn't quite understand what the problem is that is
repeated there.

Anybody care to explain?

(FWIW my impression is that there's lots of reinventing going on in the open
source FS development; everybody wants to reinvent the cool features from ZFS,
but with improved performance or a slightly different architecture, and they
all seem to be eager to learn from their own and other people's mistakes).

~~~
mmatants
I, for one, thought that he was going to argue against the overall
hierarchical FS metaphor. Personally, I wonder what things would be like if
the OS provided a key-value store (or some such NoSQL-ish API) as a core
service instead.

~~~
srdev
I suspect that you would still have some semblance of hierarchy built over such
a file system. Hierarchy is useful for organizing data, and I often find myself
using keys that encode a hierarchy in my NoSQL systems, e.g.:
"/foo/bar/0113" -> data
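
A minimal sketch of that idea, assuming a plain Python dict stands in for the
key-value store (all names here are illustrative): slash-delimited keys give
you a flat namespace, and a prefix scan recovers directory-style listings.

```python
# Minimal sketch: a flat key-value store (here just a dict) with
# slash-delimited keys that emulate a directory hierarchy.
store = {
    "/foo/bar/0113": b"data",
    "/foo/bar/0114": b"more data",
    "/foo/baz/0001": b"other data",
}

def listdir(prefix):
    """Emulate a directory listing with a prefix scan: return the
    immediate children (files or subdirectories) under `prefix`."""
    children = set()
    for key in store:
        if key.startswith(prefix + "/"):
            # strip the prefix, keep only the first path component
            children.add(key[len(prefix) + 1:].split("/")[0])
    return sorted(children)

print(listdir("/foo"))      # -> ['bar', 'baz']
print(listdir("/foo/bar"))  # -> ['0113', '0114']
```

A real system would want an ordered store (so prefix scans are range scans
rather than full iterations), but the key layout is the same trick.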

------
wccrawford
I'd hardly call people who design file systems stupid.

I don't expect them to be psychic, or know everything. I don't even expect
them to be the most knowledgeable person in their field. They're just human.

Calling them stupid because you saw something they didn't isn't just rude,
it's ridiculous.

------
0x12
I'm fine with some of the criticisms, but using XFS as the poster child of a
free file system done 'right' is a bit much. XFS has absolutely terrible
performance for lots of use cases (deletions, for instance).

What the author also fails to understand - apparently - is that the problems
plaguing the storage industry are perennial; they will never be fully resolved.
We will always yearn for more storage that is more reliable at a lower price
point, no matter how good our current technology is.

~~~
dexen
_> XFS has absolutely terrible performance for lots of use cases (deletions,
for instance)._

Depends /heavily/ on hardware. XFS evolved on high-end machines, and performs
awesomely when you have:

* EITHER large write cache (write cache size >> journal size), think `decent RAID controller',

* OR at least put the journal on a separate hard drive -- which is quite feasible on an average workstation.

Been there, done that, the difference is astonishing.

It boils down to specific media access patterns: XFS uses mixed
physical/logical journaling, and some operations cause a lot of the `physical'
(i.e., low-level) representation of directory and file metadata to be written
to the journal.

Having the hard drive's heads fly back and forth between different areas
(journal vs. metadata) is a sure recipe for abysmal performance. On the other
hand, if you have a large write cache to absorb the journal, or at least a
separate hard drive serving the journal, the directory and file metadata have
pretty good locality (thanks to a very smart allocator) and the other head
doesn't have to move much.

EDIT:

It's worth noting that XFS handles large (multi-gigabyte) files very well
compared to other filesystems. Both r/w access and creation/removal are fast,
on any hardware. This is XFS's original and primary use case: handling large
multimedia and scientific datasets.

Want to keep countless virtual machine images? XFS is the way to go,
especially thanks to its smart allocator, which reduces fragmentation compared
to competing filesystems.

tl;dr: XFS is optimized for handling large files. With the right setup --
possible on mid-range hardware -- it also handles numerous small files (say,
Linux-kernel-sized projects) very well.

~~~
0x12
Yes, large files are fine.

But contrary to your claim, with many (tens or hundreds of millions of) small
files, performance is terrible: orders of magnitude slower than vanilla ext3.

I don't consider the Linux kernel to be 'many' files.

~~~
notmyname
As we were developing openstack swift, we evaluated many different file
systems and settled on xfs because our testing showed it to be faster than
other file systems when storing many small files (where "many" is much more
than the number of files in the linux kernel).

Unfortunately, I can't find the test results that back up my claim.

~~~
0x12
Yes, storing files is fast. We ran similar tests to yours, and storing never
was the problem; deletion also wasn't a problem as long as the number of files
was in the low millions. The problems start when you delete files once you
have tens of millions of entries.

I helped a customer set up a CDN, and this was one of the most painful
mistakes I've ever had to correct. It took weeks to migrate all the data to
ext3 filesystems while the system was live. The whole point was to end up with
something scalable; the 'sweep' ran three months behind the writes, so after
90 days, with the filesystems filled to about 80% capacity, we found out that
deleting a single file would take an impossibly long time.

In the end we found a manageable workaround: selecting candidates for deletion
on a directory-by-directory basis, which improved performance to the point
that we could migrate the data. It still was a pretty scary operation.
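
A rough sketch of that kind of workaround in Python, with illustrative names
and thresholds (the original system's details aren't given in the thread):
instead of selecting deletion candidates across the whole tree, sweep one
directory at a time and cap the work done per directory.

```python
import os
import time

def sweep(root, max_age_days=90, per_dir_limit=1000):
    """Per-directory deletion sweep (sketch): walk one directory at a
    time and delete its expired files in a batch, rather than picking
    candidates across the whole tree. Thresholds are illustrative."""
    cutoff = time.time() - max_age_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        deleted = 0
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) < cutoff:
                    os.unlink(path)
                    deleted += 1
            except OSError:
                continue  # file vanished or is busy; skip it
            if deleted >= per_dir_limit:
                break     # cap work per directory to bound latency
```

Keeping each batch of deletions within a single directory also keeps the
metadata updates local, which is presumably why it helped in the XFS case.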

------
epo
A linkbait headline that makes the writer seem like the one who is stupid.

If anything he seems to be saying that old, known mistakes don't get fixed
because we don't learn from the past. That is not evolution; it is in fact the
exact opposite, and an example of the widespread misunderstanding of that word
that seems to be common in some parts of the world.

------
mother
It's not a very good article. It's rambling, almost incoherent.

------
thirdstation
I think he is confusing ignorance with stupidity. It's difficult to find good,
sensible information about file systems, and I'm only talking about selecting
a file system to use. If I wanted to develop one, where would I go to learn?

------
noahdesu
I think this article is attempting (and doing a bad job of it) to explain the
limitations of the current byte-stream-oriented interfaces and hierarchical
organization that have been around for 20 years. These two limitations show
their heads when 1) people want extreme speed (parallel I/O), and 2) file
systems begin to contain billions of files. People are storing more and more
structured data, and building enormous middleware layers on top of read/write
just to present richer interfaces to higher levels.
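
As a toy illustration of that middleware layering (the names and record format
here are made up for the example), this is a record-oriented interface built
purely on seek/read/write over a flat byte stream:

```python
import os
import struct
import tempfile

# Fixed-size record: (id: uint32, value: float64), 12 bytes each.
RECORD = struct.Struct("<Id")

class RecordFile:
    """Toy 'richer interface' layered over a plain byte-stream file:
    records addressed by index, via seek/read/write arithmetic."""
    def __init__(self, path):
        self.f = open(path, "w+b")

    def put(self, index, rec_id, value):
        self.f.seek(index * RECORD.size)
        self.f.write(RECORD.pack(rec_id, value))

    def get(self, index):
        self.f.seek(index * RECORD.size)
        return RECORD.unpack(self.f.read(RECORD.size))

path = os.path.join(tempfile.mkdtemp(), "records.bin")
rf = RecordFile(path)
rf.put(0, 7, 3.14)
rf.put(1, 8, 2.71)
print(rf.get(1))  # -> (8, 2.71)
```

Every real database, object store, and scientific I/O library repeats some
version of this layering, because the byte stream is all the file system
offers.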

------
radicalbyte
Young people only want to talk about the "presentation"? Appliance based
systems were far easier to "management" than provisioning file systems?

Rambling & incoherent this article is.

------
ori_b
Did he have a point?

------
shawndumas
print version: [http://www.enterprisestorageforum.com/print/storage-
manageme...](http://www.enterprisestorageforum.com/print/storage-
management/the-evolution-of-stupidity-file-systems.html)

