

Set Based Filesystem idea - dejan
http://www.aleveo.com/ideas/bsc-thesis-set-based-file-system

======
mjl
luckily, you can implement the idea on top of existing systems. e.g. fuse or
plan9/inferno's 9p/styx fs protocol.

"File indexing and searching for Plan 9", <http://lsub.org/ls/export/tags.pdf>
describes mostly text indexing, but the interface to searching is through a
file system (unfortunately not described in much detail at first glance).

only now do i remember that i wrote something related once: attrfs,
<http://www.ueber.net/code/r/attrfs>. that's for inferno, using the styx/9p
protocol. i never polished it, am not currently using it, and the interface
the file system provides is not ready for casual file browsing, though that's
mostly a detail (from an amount of code point of view). the things stored are
strings though, not files. but the illusion that it stores files is easy to
build.

~~~
silentbicycle
There's also libixp, a port of plan9's 9p to Unix.
(<http://libs.suckless.org/libixp>)

I've only ever used it with wmii, but I've been looking for an excuse to write
a Lua wrapper for it. :)

~~~
uriel
There are many 9P implementations (most of which run on *nix systems):
<http://9p.cat-v.org/implementations>

I think somebody was working on one in lua, but I don't know if they finished
it.

If you write a Lua implementation or a lua wrapper for an existing C
implementation let me know.

~~~
silentbicycle
Will do. (I'll probably wrap libixp.)

~~~
dejan
Please let us know if you shoot some code. It could be a good open source
project, according to the interest. I'd personally go for defining a community
whitepaper / wiki maybe for the concept and solutions.

What do you think? In the meantime we can build some prototypes on different
platforms.

The real reason we don't know whether, or how, this interaction model would
work in practice is the obvious absence of implementations.

~~~
silentbicycle
I'm looking at it now. I'm working on OpenBSD, and just updated the port to
libixp-0.5 - that'll get posted once I test it more.

A Lua _client_ library for libixp should be pretty straightforward, looking at
the source for ixpc. I might have that together this afternoon (particularly
if my fiancee continues napping ;) ).

As for a server library, I'm still thinking about it, but passing an init
function a table with callbacks for the various messages would be pretty
straightforward - similar to LuaExpat's design
(<http://www.keplerproject.org/luaexpat/manual.html>).
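
A minimal sketch of that callback-table design, in Python rather than Lua and
purely for illustration. The `Server` class, `dispatch` method and handler
names are hypothetical; only the message names (Tversion, Tread, Rerror) come
from 9P. This is not libixp's API and it speaks no wire protocol.

```python
# Hypothetical sketch: the server is initialised with a table of callbacks,
# one per message type. Nothing here is libixp's real interface.

class Server:
    def __init__(self, callbacks):
        # Copy the table so later changes by the caller don't affect us.
        self.callbacks = dict(callbacks)

    def dispatch(self, msg_type, **fields):
        """Route one decoded message to its callback, if any."""
        handler = self.callbacks.get(msg_type)
        if handler is None:
            # 9P reports failures with an Rerror reply.
            return ("Rerror", "unsupported message: " + msg_type)
        return handler(**fields)


def on_version(msize, version):
    # Echo the negotiated parameters back to the client.
    return ("Rversion", msize, version)


def on_read(fid, offset, count):
    # Toy handler: every fid reads from the same fixed buffer.
    data = b"hello from a toy server\n"
    return ("Rread", data[offset:offset + count])


server = Server({"Tversion": on_version, "Tread": on_read})
```

Messages without a registered callback fall through to an error reply, so a
server only has to supply the handlers it cares about.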

I'm silentbicycle on twitter, FWIW.

------
silentbicycle
To use more conventional terminology, that would be a relational filesystem.
Since _a filesystem is a kind of database_ , this is another instance of the
break between hierarchical and relational databases. Filesystems seem to have
largely favored hierarchical DBs (with exceptions, e.g.
<http://en.wikipedia.org/wiki/Pick_operating_system>), since following
pointers to files from a containing directory is more direct than selecting
from the whole filesystem. Filesystem extensions like symlinks are attempts to
selectively add features that would come free with a relational filesystem,
though.

Of course, a relational FS would still be broken up into namespaces (tables!),
much like conventional filesystems frequently occupy multiple partitions.
/sometable/tag+tag2+tag3 , for example, so searching for a file wouldn't
necessarily mean searching every drive. There's no reason that making lots of
small tables couldn't be cheap (they don't have to be disk partitions), and
that would help constrain searching quite a bit.
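
A rough sketch of how a path like /sometable/tag+tag2+tag3 might be resolved,
assuming an in-memory model where each table maps file names to tag sets. The
`TABLES` data, `parse_path` and `lookup` are hypothetical names for
illustration, not part of any existing filesystem.

```python
# Hypothetical in-memory model: each "table" (namespace) maps a file name
# to its set of tags. The sample data is made up.
TABLES = {
    "docs": {
        "thesis.pdf": {"university", "pdf", "2006"},
        "notes.txt": {"university", "text"},
        "invoice.pdf": {"pdf", "tax"},
    },
    "music": {
        "song.ogg": {"ogg", "2006"},
    },
}


def parse_path(path):
    """Split '/table/tag+tag2' into (table, frozenset of tags)."""
    table, _, tagspec = path.strip("/").partition("/")
    tags = frozenset(t for t in tagspec.split("+") if t)
    return table, tags


def lookup(path):
    """Return the sorted file names in one table that carry every tag."""
    table, tags = parse_path(path)
    files = TABLES.get(table, {})
    # Only the named table is scanned, never the whole store.
    return sorted(name for name, ftags in files.items() if tags <= ftags)
```

Here `lookup("/docs/university+pdf")` only ever looks inside the `docs` table,
which is the constraint on searching described above.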

Also, research into journaling filesystems has major parallels to database
transactions.

~~~
alextp
That's an interesting way of looking at file systems. I think among the pipe
dreams of reiser4 there was a plan to make the fs itself more relational-like
(but I also think this was relegated to a plugin and then promptly forgotten).

------
bayareaguy
I recall this was one of the original goals of "namesys" (the file system that
eventually became ReiserFS) but the paper describing it seems to be lost. The
only reference I found after a short search is a few pages in a PowerPoint
presentation[1] comparing it to Cyc. For the curious, the set "grouping"
construct is on slide 7.

1- <http://psy.st-andrews.ac.uk/foldiak/NamesysAndCyc1.ppt>

------
omouse
> _no one_ dares to reevaluate the underlying concepts that bound those
> layers. One of those underlying concepts is the hierarchical file system
> found on every personal computer of today as well mobile device.

I see someone hasn't read a single thing that Ted Nelson has written just yet.

~~~
dejan
The text is a bit too strong, and should be revised from the arrogance of 2006
:D Just copy-pasted it.

Indeed I haven't. I have to change that, thanks for the tip.

------
Hexstream
"Obviously, with sets, the location can be described in several ways, so it is
not a Unique Resource Locatior - only RL."

Wouldn't it be possible to implement the filesystem with tags and then specify
an ordering of the tags to map the graph into the usual hierarchy? The first
part of the URI would specify the mapping (maybe just a name for a particular
mapping), and the second would traverse the hierarchical view in the usual
way.

This way, legacy apps could still think in terms of a tree while exposing a
more powerful structure to new apps.
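
One way to picture this, as a sketch only: a named mapping is just an ordered
list of tags, and each file's legacy path is built from the tags it carries,
taken in that order. `hierarchical_view`, `FILES` and `MAPPINGS` are
hypothetical names and data.

```python
def hierarchical_view(files, ordering):
    """Project tagged files onto a tree using a fixed tag ordering.

    files: dict mapping file name -> set of tags
    ordering: list of tags; earlier tags become outer directories
    Returns a sorted list of legacy-style paths.
    """
    paths = []
    for name, tags in files.items():
        # Keep only the tags this file has, in the mapping's order.
        dirs = [t for t in ordering if t in tags]
        paths.append("/" + "/".join(dirs + [name]))
    return sorted(paths)


# Made-up sample data.
FILES = {
    "thesis.pdf": {"university", "documents"},
    "photo.jpg": {"holiday"},
    "cv.pdf": {"documents"},
}

# Two named mappings over the same tag store.
MAPPINGS = {
    "by-topic": ["university", "holiday", "documents"],
    "by-kind": ["documents", "university", "holiday"],
}
```

With the `by-topic` mapping the thesis appears under
/university/documents/, with `by-kind` under /documents/university/; the
underlying tags are the same in both views.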

~~~
silentbicycle
I think worrying too much about backward compatibility may be a bad idea. With
access to relational features, the filesystems would probably be organized
completely differently. Probably better to have a clean break.

~~~
dejan
I agree, but it seems natural using the same paths. No patches to the concept
would be needed.

------
vicaya
Actually, you can write a wrapper in your favorite language around your
favorite database and call it your favorite filesystem, which is really a CMS.

~~~
dejan
not really. CMS is essentially different than a fs, but I see your point.
While I won't go into explaining why it is, I have to say that some concepts
from both should be brought closer. A file should not be seen as a unit, but
as a set of tightly bound data. I wrote about this too recently, I'll post
soon for
opinions.

------
skwaddar
You want to make your way in the CS field? Simple. Calculate rough time of
amnesia (hell, 10 years is plenty, probably 10 months is plenty), go to the
dusty archives, dig out something fun, and go for it. It's worked for many
people, and it can work for you.

\-- Ron Minnich

------
chiara
I am not sure if the paths would be compatible.

~~~
silentbicycle
It's a set of tags, not a hierarchy of directories. They're unordered, so the
"paths" (which are really tag sets) would be equivalent.

------
malkia
That indeed might be a good system for storing user documents, but not in
general...

~~~
dejan
Why do you think so? I am trying to do a use case and it seems that the usage
would not differ that much. The sets are still seen as folders, and have
subfolders. The only thing is that the subfolder also owns the parent, but
that is not noticeable due to the unique direction.

What I want to say is that Documents -> University or University -> Documents
is the same, thus it makes sense. In real life the order doesn't matter.
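
A tiny sketch of that equivalence, assuming a path is nothing more than a
set of tags: both spellings reduce to the same order-independent key.
`canonical` is a hypothetical helper.

```python
def canonical(path):
    """Reduce a slash-separated tag path to an order-independent key."""
    # A set drops both ordering and duplicate tags.
    tags = {part for part in path.split("/") if part}
    return "/" + "/".join(sorted(tags))
```

Both `canonical("/Documents/University")` and
`canonical("/University/Documents")` give the same key, so either spelling
names the same set of files.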

Thus it opens a path for better HCI: computer, give me all documents from the
university (all university documents)..

I am curious: from what perspective do you see this failing?

~~~
m0th87
I've heard of this concept of 'set-based' filesystems (although with different
terminology), and I think it's brilliant. And not just for documents either.
Generally speaking, websites have moved away from storing data in a
hierarchical manner (e.g. old-school Internet directories), and on to flat,
tag-oriented categorization (e.g. delicious) because of the abundance of
information to categorize. Hierarchies do not scale and are difficult to
change, whereas free-form categorization does not suffer from this.

Personal systems have more files on them than ever before, and fitting them to
hierarchies that were invented decades ago can be tricky. These issues have
been abated for now with things like OS X's Spotlight and Win7's Libraries,
but my suspicion is that set-based filesystems would be far simpler than the
current paradigm.

A good example off the top of my head is resolving what command-line
executables are available. In most operating systems, this is done by setting
a PATH variable with a list of directories that contain the executables. While
this is trivial for the type of people on HN, it is far from it for your average
user. Instead, command-line executables could be given a specific tag, and the
command-line would be able to execute any application that has that tag.
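
A sketch of what tag-based command resolution could look like, assuming a
simple store that maps file names to tag sets. The tag name "executable",
the sample entries and both function names are hypothetical.

```python
# Hypothetical tag store: file name -> set of tags. The sample data is
# made up for illustration.
TAGGED_FILES = {
    "ls": {"executable", "coreutils"},
    "grep": {"executable", "text"},
    "notes.txt": {"text"},
}


def resolve_command(name, store=TAGGED_FILES, tag="executable"):
    """Return name if a file of that name carries the tag, else None."""
    tags = store.get(name)
    if tags is not None and tag in tags:
        return name
    return None


def available_commands(store=TAGGED_FILES, tag="executable"):
    """List every file carrying the tag, with no directory list involved."""
    return sorted(n for n, tags in store.items() if tag in tags)
```

There is no list of directories to maintain: tagging a file is the only step
needed to make it runnable from the command line.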

Now, whether set-based filesystems could replace hierarchical ones is a
different question. Such a filesystem would diverge so greatly from current
assumptions that people might be too averse to change. It wouldn't be the
first time; I think Plan 9 was light years ahead in so many respects, but it
never took off because it was such a radical departure.

~~~
dejan
That's what I'm thinking too. However, those are assumptions (which I share).
I am thinking maybe it would be interesting to see this simulated in the web
environment and get feedback on that - web file system.

I started playing with tags on Aleveo for that reason, maybe I should take it
further on a separate test project. It makes a big difference using it and
thinking how it would be used.

Also, I am not sure what would be best for executables, but consider how they
are currently seen: as having a "+x". Having an executable tag is not much
different. That also means tags are a bit "smarter" than the regular ones
seen on the web.

I am mostly thrilled by the ability to see the relations of files and folders
to others. Those are not folders anymore as "storage units" but "meanings".

Also, currently a file can exist only in one place. Having symlinks is cruft
in my opinion, compensating for the need for data to be present in multiple
contexts. A symlink is a separate file, a pointer different than the original.
Windows is excluded completely :) I think a file can be in several places
depending on the need. After all, this is not the physical world.

