
Hierarchical File Systems are Dead - signa11
http://www.eecs.harvard.edu/~margo/papers/hotos09/
======
silentbicycle
So, yes, ontology tends to break down. Are we going to throw out all existing
filesystems, though? Heck no. Fortunately, it's not hard to layer search and
non-hierarchical indexing on top of existing filesystems.

I wrote a simple (Lua + SQLite) script that tags files, searches all known
paths by tag, etc. It's really not hard. It hasn't been hard for decades. Why
hasn't it caught on? Beats me.
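
For what it's worth, the tagging layer described above can be sketched in a few lines. This is not the original Lua script, just a minimal Python + SQLite illustration; the table layout and the function names (`open_index`, `tag_file`, `find_by_tags`) are assumptions made up for the example.

```python
import sqlite3

def open_index(db_path=":memory:"):
    """Open (or create) the tag index: one row per (path, tag) pair."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tags ("
        " path TEXT NOT NULL, tag TEXT NOT NULL,"
        " PRIMARY KEY (path, tag))"
    )
    return conn

def tag_file(conn, path, *tags):
    """Attach tags to a path; repeating an existing tag is a no-op."""
    conn.executemany(
        "INSERT OR IGNORE INTO tags (path, tag) VALUES (?, ?)",
        [(path, t) for t in tags],
    )
    conn.commit()

def find_by_tags(conn, *tags):
    """Return the sorted paths that carry every one of the given tags."""
    if not tags:
        return []
    placeholders = ",".join("?" for _ in tags)
    rows = conn.execute(
        f"SELECT path FROM tags WHERE tag IN ({placeholders})"
        " GROUP BY path HAVING COUNT(DISTINCT tag) = ?"
        " ORDER BY path",
        (*tags, len(set(tags))),
    )
    return [row[0] for row in rows]
```

Note that the index only stores paths, so it goes stale as soon as a file is moved or deleted behind its back.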

Using directories as contexts + symlinks to merge them works around many
issues, but the same ugly breakdown appears in e.g. statically-typed OOP class
hierarchies. ("Is a platypus a mammal or an amphibian?") Sometimes, _is_
isn't.

Basically, I see where they're coming from, but "normal filesystem, with tags"
probably makes more sense to most people than "fully non-hierarchical*
filesystem". And, hell, I keep _spelling_ hierarchical wrong.

~~~
yason
It is not hard, but it hasn't caught on because having userspace scripts or
daemons running tagging and indexing passes on the files in an existing file
system is nowhere near transparent. Even if you use inotify or something to
keep track of the changes in real time, the illusion of transparency _will_
leak sooner rather than later. And that's when your users' confidence and
trust will vanish into thin air.

Indexing has to be built into the filesystem to make the searches match the
state of the filesystem one-to-one. There's no other way.

I've seen this working live only in the Be Filesystem, or BFS. There you can
be sure that if you add an image somewhere, another view of the filesystem
will instantly contain the new file and removing it will again banish the
image from all other views. The file either exists or doesn't, and you will
never find that a search returned something that no longer exists.

~~~
hexley
Spotlight fulfills all those requirements.

------
thinkcomp
For anyone who doesn't know, Margo is the co-inventor of BerkeleyDB and sold
her company, Sleepycat Software, to Oracle. She's quite knowledgeable as an
academic, a programmer, and a businesswoman.

------
thecombjelly
I think hierarchies are one of the most common ways to organize data and most
systems in this world are organized hierarchically. Governments, socioeconomic
systems, taxonomies, literature, organized religion, and nature, are all often
organized hierarchically. Humans seem to think in hierarchies better than in
many other systems of organization. Sure, search is great, but even sites like
Newegg and Amazon prominently display hierarchies to organize data, and I
appreciate it. I don't think hierarchical file systems are dead at all. I think
they are a very good way of organizing most file system data, and that search
can be of utility even in a hierarchical file system.

~~~
baddox
Every computer system that uses tagging is a counterexample. We humans think
in tagging as often as we think in hierarchies, but I agree that we tend to
think of hierarchies as more well-defined and organized. This tendency can be
incorrect.

I think it boils down to this: for hierarchical data, use hierarchies. I
believe that files on a modern personal computer will almost certainly _not_
be hierarchical, so it's not ideal to store them hierarchically. Videos, for
example, may be movies, tv shows, screencasts, music videos, etc. Videos may
also be standard definition, 720p, 1080p, etc. It's difficult to say, for
example, which attribute should be the root directory for your videos, since
it's reasonable to want to browse by many different attributes. A hierarchy
does not apply to videos, but attributes could easily be represented by tags.

------
mdonahoe
What's a filesystem? -sent from my iPhone

~~~
tl
It's a scary monster living outside your walled garden.

------
est
Anybody remember WinFS? It was really promising technology back when it was
announced, but Microsoft failed to deliver, as with much of the Vista stuff.

<http://en.wikipedia.org/wiki/WinFS>

~~~
rbanffy
That's one impressive piece of vaporware... The first paragraph suggests it's
coming in Windows 8...

------
bayareaguy
Escaping file naming hierarchies was the original motivation of Namesys aka
ReiserFS[1].

1-
[http://reference.kfupm.edu.sa/content/n/a/the_naming_system_...](http://reference.kfupm.edu.sa/content/n/a/the_naming_system_venture_66797.pdf)

------
harscoat
I saw this awareness growing among enterprise search customers and prospects
from the early 2000s on. I was part of Vivisimo (a spin-off from CMU), which
created a "clustering engine". In the beginning we always had the
"Librarians", pro categorization & classification, dismissing the value of
clustering (or maybe they were justifying their jobs). Then more people in the
organizations (government or corp.) said yes, those 2 approaches (automatic
"on the fly" clustering vs. human-generated taxonomies) are complementary.
Finally, a couple of years ago, in the same week, I visited the Government of
Israel, which said "we just need clustering", e.g. on the left side:
<http://search1.gov.il/govilt?query=israel> and then Ferrari competition in
Maranello (yeah:) even if I don't like cars;) and there their CTO explained,
to my surprise [I was trying to tell him "we integrate well with your existing
taxonomies, ontologies..."], "forget about any hierarchy or classification
tree in our File Systems, we just search!". Reality/information is much more
diverse & evolving than any predefined categories.

~~~
bensummers
You can combine the two approaches, with "just search" which also uses human
"categorisation and classification". This helps an awful lot with relevancy
ranking.

In smaller data sets (eg not Google scale), where all the items are about
roughly the same thing, you need some human classification for your search to
work well.

~~~
harscoat
Yes, you are right. I was a big Friendfeed fan and proposed ideas which we
implemented: humans could modify the search index on the fly with ratings,
comments and tagging of the search results. E.g. people were able to search
documents based on other people's comments or ratings.

~~~
bensummers
I think the lesson from "web 2.0" apps is that if you make something easy to
use and useful, people will use it. So the rise of folksonomies is about
usability, not doing away with structure.

In my experience, if you can make entering more structured metadata just as
easy, people _will_ enter it, and you get a big return in the ability to use
the information you've collected.

------
bensummers
I've spent the last 3.5 years building a platform for "information
applications". The key observation which prompted this was that hierarchical
file systems didn't work well for organising information within an
organisation.

However, hierarchy itself is still incredibly valuable. People think in terms
of hierarchies - it's just that they think in terms of multiple hierarchies
and an item will almost always belong in more than one place in those
hierarchies.

If you allow users to describe items in the way which makes sense to them, and
then search and browse by any of the terms they've used, then you've
eliminated almost all the frustrations of a file system. In my experience of
working with people building complex information applications, you need:

    
    
      * deep hierarchy for classifying things
      * shallow hierarchy for noting relationships (eg "parent company")
      * multi-values for every single field
      * controlled values (in our case by linking to other items wherever possible)
    

Unfortunately, none of this stuff is done well by existing database systems.
Which was annoying, because I had to write an object store.
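
The four requirements listed above could be modelled roughly like this. It is only an illustrative Python sketch, not the object store mentioned here; `Item`, `ancestors`, `matches` and the `parent_of` mapping are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field

# eq=False keeps identity-based equality and hashing, so an Item can be
# used as a field value (a "controlled value" linking to another item).
@dataclass(eq=False)
class Item:
    name: str
    # Every field is multi-valued: field name -> list of values.
    fields: dict = field(default_factory=dict)

    def add(self, field_name, value):
        self.fields.setdefault(field_name, []).append(value)

def ancestors(term, parent_of):
    """Walk a deep classification hierarchy from term up to its root."""
    chain = []
    while term in parent_of:
        term = parent_of[term]
        chain.append(term)
    return chain

def matches(item, field_name, term, parent_of):
    """True if any value of the field is the term or sits below it."""
    return any(
        value == term or term in ancestors(value, parent_of)
        for value in item.fields.get(field_name, [])
    )
```

Here `parent_of` plays the role of the deep classification hierarchy, while a shallow relationship such as "parent company" is just another field whose value is an `Item`.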

------
pyre
Whenever I hear these proclamations, I always wonder what a 'source tree'
would look like in a non-hierarchical file system.

~~~
tonyarkles
Your source wouldn't need to be stored in a tree, necessarily. Files in your
source code could be tagged with the module name they belong to.

~~~
pyre
True, but what about when you have multiple copies of the same source tree?
(though I guess this could be less of an issue if everything was using a dvcs)

~~~
groby_b
Git e.g. solves that by addressing via content hash. Which is more or less
inevitable for non-hierarchical storage, since that's one of the few ways you
can disambiguate.
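
To make the idea concrete, here is a toy content-addressed store in Python. It is a sketch of the general technique only, not Git's actual object format (Git hashes a header plus the content); the `ContentStore` class and its method names are made up for illustration.

```python
import hashlib

class ContentStore:
    """Toy in-memory store keyed by the SHA-256 of each blob's bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        """Store a blob and return its content hash, which is its address."""
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data  # identical content maps to the same key
        return key

    def get(self, key: str) -> bytes:
        """Fetch a blob by its content hash."""
        return self._blobs[key]

    def __len__(self):
        return len(self._blobs)
```

Two copies of the same source file get the same address and are stored once, while an edited copy gets a different address, so no path is needed to tell them apart.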

------
kylecordes
(snark warning) I pointed out years ago, that hierarchical file systems would
eventually be unnecessary on Windows, since the clear direction from Microsoft
at that time was to store every file in the entire system in
\windows\system32, then manage it via an extremely deep and complex hierarchy
in the registry... (end snark)

Fortunately, Windows moved in a better direction since then, with user home
directories and most user data stored effectively underneath them, more
organization to the Windows directory, and so on.

------
TheNewAndy
Without having actually read the paper yet (still downloading), I always feel
like these sorts of things are nice, but you still want a hierarchy underneath
it all.

For example, I have a document which I want to put onto a USB stick for a
friend. I have no idea how their search metaphor lets me do this. I search for
the document, but now how do I "move" it? Does "moving" even make sense? Now
I'm not doubting that they do have an answer for this, but is the answer
something that will be easy to explain?

If you just start with hierarchy, and put a working search over the top of it,
then it has a nice model that is easy to explain. All files have an address
(like a URL), but if you don't know (or can't be bothered to type) the
address, then you use the search box (like google).

~~~
johnny22
hmm. first thing that popped in my mind: I'd say you don't "move" files. You
can only "give" them.

does that make sense?

~~~
silentbicycle
Hierarchical filesystems have an inherent sense of place, but non-HFS are
about _meaning_: "This has to do with these ideas." That doesn't translate
to storage, though - if it has to take up space, _where does it go?_

This is why I think non-HFS ideas work better as indexes _on top of_ existing
systems. We don't have the right metaphors yet.

~~~
terry
I'd think the opposite; if you were combining hfs and non-hfs at a visible
level, the way to do it would be to have a non-hierarchical storage system
underneath with superficial hierarchical (and other) views. This would give
you the flexibility to change your schema without breaking file paths while
still being able to organize and name things sanely.

As far as metaphors, I'd say think of calling someone on a landline vs. a cell
phone. With the cell, you don't have to specify where or who joe is
(/people/friends/joe/house|work|carphone|vacation|etc.) in order to reach him
since joe's 1 cellphone number rings wherever he happens to be.

Another example is folders on the iphone -- moving an app into a folder with
other apps doesn't actually move the app from /homescreen/app1 to
/homescreen/games/app1 on the iphone hard disk (as far as I know...), it just
changes the superficial hierarchical view of the data.

------
baddox
I want to know more about how their "thin POSIX layer" would actually provide
backwards compatibility. How would you get a directory listing? Look at every
single file's POSIX tag to see what its path is?

------
makecheck
It seems like you could fake tagging in any hierarchical file system by using
directories for tag names and _hard_ links to files (not symbolic links).

A hard link basically gives multiple paths to the same data, sharing
attributes like the owner and permissions. Using any linked path will change
the file. The exception is that "rm" deletes just one link (hence the system
call "unlink"), and the file is only lost when all its links are gone.
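
A minimal Python sketch of that trick, assuming a POSIX-style filesystem and that the file and the tag directories live on the same volume (hard links cannot cross filesystems). The helper name `tag_with_hard_link` and the `tag_root` layout are invented for the example.

```python
import os

def tag_with_hard_link(file_path, tag_root, tag):
    """Expose file_path under tag_root/<tag>/ via a hard link.

    Returns the path of the link. Both paths then name the same data.
    """
    tag_dir = os.path.join(tag_root, tag)
    os.makedirs(tag_dir, exist_ok=True)
    link_path = os.path.join(tag_dir, os.path.basename(file_path))
    if not os.path.exists(link_path):
        os.link(file_path, link_path)  # new directory entry, same data
    return link_path
```

One caveat of this naming scheme: two different files with the same base name would collide inside a tag directory, so a real tool would need a way to disambiguate them.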

------
michaelfeathers
I was thinking about this a while ago when I was writing my own personal
finance software. Rather than embedding any notion of hierarchy, I allow all
transactions (regardless of the source) to have an arbitrary number of tags.
Hierarchy is simply the conjunction of sets of tags. I can easily re-tag
transactions and rollback tagging schemes. Seems to work well for all of the
processing I want to do.
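
As a rough illustration of "hierarchy as a conjunction of tags" (a Python sketch, not the actual finance software; the `select` helper and the sample data layout are assumptions):

```python
def select(transactions, *required_tags):
    """Return the transactions that carry every required tag."""
    required = set(required_tags)
    return [t for t in transactions if required <= t["tags"]]

# Hypothetical sample data: each transaction has an amount and a tag set.
transactions = [
    {"amount": -42.10, "tags": {"food", "groceries"}},
    {"amount": -18.00, "tags": {"food", "restaurant", "travel"}},
    {"amount": -95.00, "tags": {"travel", "hotel"}},
]

# What would be the "folder" food/restaurant is just two tags combined.
food_restaurant = select(transactions, "food", "restaurant")
```

The same transaction also shows up under `select(transactions, "travel")`, which a single folder tree could not express without duplication.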

------
motters
I'm not a security expert, but surely there are security implications for a
non-hierarchical file system.

------
Palomides
one of the issues with non-hierarchical filesystems is that the idea of tagging
stuff seems like too much effort. for this to appeal to me, there'd have to be
some big shift in the GUIs for it to be easy to tag/group files

~~~
rbanffy
And that says nothing about non-GUI operations...

------
powera
I thought Harvard professors were above trolling in paper titles.

------
iwr
Do you have a real-life use scenario for this DB-FS paradigm?

------
zeynel1
Funny that their paper is organized as a Hierarchical File System:

    
    
        1 Introduction
        2 The Hierarchical Namespace Albatross
            2.1 Irrelevance
            2.2 Restrictiveness
            2.3 Performance Limiting
        3 hFAD: A New File System Architecture
            3.1 API
                3.1.1 Naming Interfaces
                3.1.2 Access Interfaces
            3.2 Index Stores
            3.3 OSD Layer
            3.4 Implementation
        4 Open Questions
        5 Conclusions

~~~
cdavid
But they are academics. Like most people familiar with computers and their
technical usage, I also tend to organize things in a hierarchical manner. But
I cannot ignore that most people around me do not: they just put everything on
the desktop and in one or two other folders.

~~~
baddox
I keep things fairly well organized on my file server, but I've definitely
encountered times where my folder hierarchy is insufficient. Wanting to get
away from hierarchies isn't just for academics and the disorganized. Here are
two examples:

I naturally have a Movies directory and a TV directory. Under each I have two
directories SD and HD. Sometimes I may want to simply browse all the HD
_video_ on my system.

Where do music videos (i.e. live concerts) go? They're certainly Music, but
it's fair to imagine wanting to see them listed when I'm browsing through
Movies and not listed when I'm browsing my mp3's. Add in tutorial videos for
music production software (I have a lot of these) and it's even more of a
guessing game.

~~~
jhrobert
It would be cool to navigate by file type sometimes.

ie: have some \sound\mp3\all "virtual directory" automatically populated with
all the mp3 files of my system.

Now that I think of it, it would be cool to be able to express such "virtual
directories" by way of declarative filters.

eg: mkdir ./bigfiles ./ -r 'file.size > 100mo, subdir by file.year/file.type,
file.type isnt tmp' -- ..., you get the picture
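
Something in that spirit can be approximated in userspace. The Python sketch below computes the contents of such a virtual directory without moving anything on disk; it reads `100mo` as a size threshold in bytes, takes the year from the file's modification time, and all function names (`virtual_dir`, `big_non_tmp`, `by_year_and_type`) are hypothetical.

```python
import os
from collections import defaultdict
from datetime import datetime

def virtual_dir(root, keep, subdir_of):
    """Group files under root into virtual subdirectories.

    keep(path, st) decides whether a file is listed; subdir_of(path, st)
    names the virtual subdirectory it appears in.
    """
    view = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            if keep(path, st):
                view[subdir_of(path, st)].append(path)
    return {subdir: sorted(paths) for subdir, paths in view.items()}

def file_type(path):
    """File type taken from the extension, e.g. 'mp3'."""
    return os.path.splitext(path)[1].lstrip(".").lower() or "none"

def big_non_tmp(min_bytes):
    """Filter: larger than min_bytes and not a .tmp file."""
    return lambda path, st: st.st_size > min_bytes and file_type(path) != "tmp"

def by_year_and_type(path, st):
    """Virtual subdirectory of the form '<year>/<type>'."""
    year = datetime.fromtimestamp(st.st_mtime).year
    return f"{year}/{file_type(path)}"
```

A call such as `virtual_dir(".", big_non_tmp(100 * 1024 * 1024), by_year_and_type)` would then stand in for the `mkdir` line above.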

~~~
groby_b
It's almost like "Smart Folders" in spotlight ;)

(Yes, I get what you're saying - you want to impose a _structure_ on that
smart folder via the metadata. It was just too tempting to make a stupid
comment ;)

