
Tagsistant: a reasoning semantic filesystem for Linux and BSD - gnosis
http://www.tagsistant.net/component/content/article/10-what-is-tagsistant
======
colanderman
I wrote something like this using FUSE and Postgres a couple years ago. The
major problem with this approach is scalability: on a large, multi-user FS,
you can get TONS of tags to sift through in your root folder.

My solution, which I never got around to implementing, was to have
"hierarchical" tags. e.g. "recipes:cookies:chocolatechip" is a single tag for
all chocolate chip cookie recipes, all of which have the implied tags
"recipes" and "recipes:cookies" (but not "cookies" or "chocolatechip"). The
advantage is that your root directory will not be littered with the "cookies"
and "chocolatechip" tags when these collections are fairly irrelevant.

This idea can be extended to solve multi-user conflicts (all tags are prefixed
with the username) and to make sensible tags for dates -- it makes little
sense to tag a file with "2011", "May", and "22", since no-one cares to find
all files from the 22nd of every month; but the hierarchical tag "2011:May:22"
is perfect for this situation since these files will also be present under
"2011" and "2011:May".
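A minimal sketch of that prefix expansion, in Python. The function name `implied_tags` and the `:` separator default are illustrative assumptions, since the hierarchy was never actually implemented:

```python
def implied_tags(tag: str, sep: str = ":") -> list[str]:
    """Return the tag plus every ancestor prefix, shortest first.

    Only prefixes are implied, so "cookies" on its own never appears
    for "recipes:cookies:chocolatechip".
    """
    parts = tag.split(sep)
    return [sep.join(parts[:i]) for i in range(1, len(parts) + 1)]


print(implied_tags("recipes:cookies:chocolatechip"))
# ['recipes', 'recipes:cookies', 'recipes:cookies:chocolatechip']
print(implied_tags("2011:May:22"))
# ['2011', '2011:May', '2011:May:22']
```

The root directory would then list only the first element of each expansion, which is what keeps it small.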

~~~
tx0
Did you use an ontology or some kind of "manifest" to bootstrap hierarchical
relations between tags?

I'm thinking about splitting Tagsistant into a client and a server to provide
a multiuser environment, and probably some ontological foundations are
required to coherently organize tagging coming from different users.

~~~
colanderman
I never got around to implementing the hierarchy. My idea was to have a few
basic entries, say, "users", "dates", "MIME-types", etc. which could be
prepopulated.

------
janjan
I'd like to see something similar:

Lately I have thought about a file storage system like this which consists of
two parts: 1) some kind of database in which you can put binary files and
attach tags to them 2) a FUSE (?) driver which lets you create different
'views' on this database which then can be mounted as part of a normal
filesystem.

For example this would be nice for music and PDF collections. You could create
different 'views' and then go to one folder to see your music/PDFs sorted by
year and then to another folder to see them sorted by author/artist and then
to a third one which is sorted by type_of_music/artist/album.

This way you would get the best of both worlds: 1) a powerful database to
store and organize binary data and 2) downward compatibility since you can
just use the command line / bash to 'export' files to mp3 players and so on.
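To make the 'views' idea concrete, here is a rough Python sketch. Everything in it is hypothetical: the `FILES` records, the field names, and `build_view` just illustrate how one set of tagged files could be projected into several directory layouts, leaving the FUSE part out entirely.

```python
from collections import defaultdict

# Hypothetical records standing in for the tag database.
FILES = [
    {"name": "a.mp3", "artist": "artist_a", "album": "album_x",
     "year": "2009", "genre": "rock"},
    {"name": "b.mp3", "artist": "artist_a", "album": "album_y",
     "year": "2011", "genre": "rock"},
    {"name": "c.mp3", "artist": "artist_b", "album": "album_z",
     "year": "2011", "genre": "jazz"},
]


def build_view(files, template):
    """Map virtual directory paths to file names.

    `template` is a tuple of field names, one per directory level,
    e.g. ("genre", "artist", "album").
    """
    view = defaultdict(list)
    for record in files:
        path = "/".join(record[key] for key in template)
        view[path].append(record["name"])
    return dict(view)


by_year = build_view(FILES, ("year",))
by_genre = build_view(FILES, ("genre", "artist", "album"))
print(by_year)   # {'2009': ['a.mp3'], '2011': ['b.mp3', 'c.mp3']}
print(by_genre)
```

Each view is only a different grouping of the same records, so the files themselves never move.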

~~~
judofyr
Woah, this pretty much describes my "perfect" file organization tool which
I've been thinking about in the last months. I was thinking of combining it
with revision support (every version of your file stored) together with
Dropbox-like synchronization.

~~~
elcron
I think every version would be a bit much, especially on smaller drives. What
if it had logarithmic diffs, i.e. every change today, every hour for the past
week, before that every day for the past month, etc.? It would probably be
more disk efficient for people with smallish SSDs, but everybody's "perfect"
is different.
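A sketch of such a thinning policy in Python, under stated assumptions: the function name `thin_versions` is made up, the tier boundaries (1 day, 7 days, 30 days) are one reading of the description above, and keeping one version per month for anything older is my guess at what the "etc." would be.

```python
from datetime import datetime, timedelta


def thin_versions(timestamps, now):
    """Return the version timestamps to keep, oldest first.

    Tiers: every change in the last day, one per hour up to a week,
    one per day up to a month, one per month beyond that (assumed).
    Within a bucket the newest version wins.
    """
    keep = []
    seen_buckets = set()
    for ts in sorted(timestamps, reverse=True):  # newest first
        age = now - ts
        if age <= timedelta(days=1):
            keep.append(ts)  # keep every change
            continue
        if age <= timedelta(days=7):
            bucket = ("hour", ts.strftime("%Y-%m-%d %H"))
        elif age <= timedelta(days=30):
            bucket = ("day", ts.strftime("%Y-%m-%d"))
        else:
            bucket = ("month", ts.strftime("%Y-%m"))
        if bucket not in seen_buckets:
            seen_buckets.add(bucket)
            keep.append(ts)
    return sorted(keep)
```

Running this periodically and deleting whatever is not returned would bound disk use roughly logarithmically in the age of the file.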

------
sixtofour
This is a great idea. Early use would have to be limited, because the rest of
the world assumes directories. It would be great if the world ran on tags
though.

Bookmark tags is why I'm still using Firefox instead of Chrome. I recently
tried Chrome for a week, I really wanted to like it, but bookmark tags is what
brought me back to Firefox. I look at my browser as an information manager as
much as a reader, and multiple tags per bookmark is my killer feature.

Gmail's IMAP folders as tags, multiple tags per message, and their exposure of
IMAP to external clients like Thunderbird, is why I was finally able to
convince myself to use Gmail instead of my hosting provider's email. I confess
that I don't use multiple tags as much as I thought I would. But I _could_!

~~~
gnosis
_"This is a great idea. Early use would have to be limited, because the rest
of the world assumes directories."_

Tagsistant still uses directories. It's just that the directory names are
automatically also usable as tags.

------
simcop2387
I'd consider doing something like dropping the AND by default.

    i.e. /photos/london/2011

while still having the OR and NOT and any other operations you need. The idea
is that, at least due to training from hierarchical filesystems, the AND
would seem rather implied. As for the duplicate filenames, I'd add a prefix to
them based on the order they were added to the DB (e.g. first one gets no
prefix, second one gets a 2_ or whatever you figure out). That way you won't
have as many cases where a file's name changes the moment you add in a new
one.

~~~
tx0
Tagsistant 0.4 doesn't use AND any longer, but that brought the need for a
termination operator, which is '=' so far, but probably will be changed for
shell comfort.

The 0.2 query mpoint/rock/AND/seattle/ becomes mpoint/tags/rock/seattle/=/ in
0.4

And about the prefix, Tagsistant 0.4 indeed adds an NNN_ prefix to filenames so
that duplicate names can coexist.
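To illustrate the 0.4-style path with implied AND and the '=' terminator, here is a small Python sketch. It is not Tagsistant's actual parser: `parse_query` is a hypothetical helper that handles only the simple form shown above (no OR/NOT) and takes the path relative to the mount point.

```python
def parse_query(path: str):
    """Split 'tags/rock/seattle/=/song.mp3' into (tags, remainder).

    Every component between 'tags' and '=' is one tag, and the tags are
    ANDed together. `remainder` is whatever follows the terminator, or
    None if nothing follows or the query is not terminated yet.
    """
    parts = [p for p in path.split("/") if p]
    if not parts or parts[0] != "tags":
        raise ValueError("query must start with 'tags/'")
    if "=" not in parts:
        return parts[1:], None  # still building the query
    end = parts.index("=")
    tags = parts[1:end]
    remainder = "/".join(parts[end + 1:]) or None
    return tags, remainder


print(parse_query("tags/rock/seattle/=/"))
# (['rock', 'seattle'], None)
```

The terminator is what lets the parser tell a tag named `song.mp3` apart from a file named `song.mp3`.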

------
TeMPOraL
One drawback I can think of is that you lose the "permanent paths" to files
in some cases. If you start twiddling with relationships between tags, I can
imagine how many paths previously stored in software will get broken.

I can also imagine 'identity problems'. Not counting symlinks and hardlinks,
the full file path serves as its URI. How can I be sure if
/photos/europe/DSCN0001.JPG and /photos/london/DSCN0001.JPG are the same
files? What's the file URI here?

~~~
mnzaki
Another related ramification of doing away with hierarchy and unique
identifiers (tree paths) is not being able to have files with the same
filename.

Say you have: /photos/london/DSCN0001.JPG and /photos/berlin/DSCN0001.JPG And
the relationships: europe contains london, europe contains berlin

Now what does the 'path' /photos/europe/DSCN0001.JPG resolve to?
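The ambiguity is easy to reproduce in a toy model. The Python below is only an illustration: `CONTAINS`, `FILES`, `expand` and `resolve` are hypothetical names, and the matching rule (a path component matches a file if the file carries that tag or any tag it contains) is an assumption about how such reasoning would work.

```python
# "europe contains london, europe contains berlin"
CONTAINS = {"europe": {"london", "berlin"}}

# Two distinct files that happen to share a filename.
FILES = [
    ("DSCN0001.JPG", {"photos", "london"}),
    ("DSCN0001.JPG", {"photos", "berlin"}),
]


def expand(tag):
    """Return the tag plus every tag it transitively contains."""
    result, stack = set(), [tag]
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(CONTAINS.get(current, ()))
    return result


def resolve(path):
    """Return the indexes of all files a tag path could refer to."""
    *tags, name = path.strip("/").split("/")
    return [
        index
        for index, (file_name, file_tags) in enumerate(FILES)
        if file_name == name and all(expand(t) & file_tags for t in tags)
    ]


print(resolve("/photos/london/DSCN0001.JPG"))  # [0]
print(resolve("/photos/europe/DSCN0001.JPG"))  # [0, 1]
```

The second lookup returns two candidates, which is exactly the case a plain path cannot express.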

~~~
tx0
Tagsistant 0.2 does not allow storing two files with the same name, exactly
as you say. But Tagsistant 0.4 will! Well, at the minor cost of having a
small unique number prepended to each filename.

Tagsistant 0.4 has a broader vision (tagging of entire directories) but is
still under development. If you have suggestions or doubts, I'll be very happy
to discuss it.

~~~
spoondan
Can you provide meaningful prefixes for conflicting files? When you detect a
file name conflict, construct a distinguishing prefix for each conflicting
file from the difference in tags on the conflicting files. (If all the tags
are the same, then fallback to a synthetic prefix or overwrite the file or
error out or whatever.)

For example, let's say you have /photos/london/DSCN0001.JPG and
/photos/vienna/DSCN0001.JPG, where "london" and "vienna" are both included in
"europe". This could yield paths like /photos/europe/london:DSCN0001.JPG and
/photos/europe/vienna:DSCN0001.JPG.
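Roughly, in Python (a sketch only; `distinguishing_names` is a made-up helper and the `+` joiner for multiple differing tags is an arbitrary choice):

```python
def distinguishing_names(files):
    """Build display names for a listing of (name, tags) pairs.

    Names that are unique pass through unchanged. Conflicting names get
    a prefix made of the tags that are not shared by all conflicting
    files, falling back to a synthetic counter when a file has no such
    tags.
    """
    by_name = {}
    for name, tags in files:
        by_name.setdefault(name, []).append(tags)

    listing = []
    for name, tag_sets in by_name.items():
        if len(tag_sets) == 1:
            listing.append(name)
            continue
        common = set.intersection(*tag_sets)
        for counter, tags in enumerate(tag_sets, start=1):
            unique = sorted(tags - common)
            prefix = "+".join(unique) if unique else str(counter)
            listing.append(f"{prefix}:{name}")
    return listing


print(distinguishing_names([
    ("DSCN0001.JPG", {"photos", "london"}),
    ("DSCN0001.JPG", {"photos", "vienna"}),
]))
# ['london:DSCN0001.JPG', 'vienna:DSCN0001.JPG']
```

With three or more conflicting files two of them can still end up with the same prefix, so a real version would need a second pass to catch that.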

The big trouble here (and, if I understand, with what you're suggesting as
well) is that changing the name or tags of one file can alter the path to
another as a side effect. So if I started with just
/photos/vienna/DSCN0001.JPG, I might reference it as
/photos/europe/DSCN0001.JPG somewhere. But when I go back and add
/photos/london/DSCN0001.JPG, my reference to the photo of Vienna breaks
because its name is no longer unique. As TeMPOraL points out, this is a
general class of problems afflicting a system like this.

~~~
tx0
It does not work exactly this way.

When you create a file "DSCN0001.JPG", it receives a prefix, even if it's not
conflicting, becoming, let's say, "123_DSCN0001.JPG".

But both you and your software (say: a filemanager) are presuming the file is
named "DSCN0001.JPG", not "123_DSCN0001.JPG". To solve that, Tagsistant 0.4
provides an aliasing layer that maps the original name to the prefixed one.

It's still something under development, so both the idea and the
implementation can change. For example: how long should an alias exist? Just
after the first access? Up to an estimated expiration time?

I lean toward the latter solution. Since aliases are implemented as an SQL
table, adding an expiration column and a garbage-collecting thread should be
all that is needed.
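As a rough illustration of that design (not the actual Tagsistant schema), here is a Python sketch using the standard `sqlite3` module. The table name `aliases`, its columns, and the helper functions are all assumptions, and the garbage collector is a plain function rather than a thread.

```python
import sqlite3


def open_db():
    """Create an in-memory database with a hypothetical alias table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE aliases ("
        " alias TEXT PRIMARY KEY,"
        " real_name TEXT NOT NULL,"
        " expires_at REAL NOT NULL)"
    )
    return db


def add_alias(db, alias, real_name, now, ttl=3600.0):
    """Map the original name to the prefixed one until now + ttl."""
    db.execute(
        "INSERT OR REPLACE INTO aliases VALUES (?, ?, ?)",
        (alias, real_name, now + ttl),
    )


def lookup(db, alias, now):
    """Return the prefixed name, or None if missing or expired."""
    row = db.execute(
        "SELECT real_name FROM aliases WHERE alias = ? AND expires_at > ?",
        (alias, now),
    ).fetchone()
    return row[0] if row else None


def collect_garbage(db, now):
    """Delete expired aliases and return how many were removed."""
    cursor = db.execute("DELETE FROM aliases WHERE expires_at <= ?", (now,))
    return cursor.rowcount


db = open_db()
add_alias(db, "DSCN0001.JPG", "123_DSCN0001.JPG", now=0.0, ttl=10.0)
print(lookup(db, "DSCN0001.JPG", now=5.0))   # 123_DSCN0001.JPG
print(lookup(db, "DSCN0001.JPG", now=11.0))  # None
```

Because `alias` is the primary key here, a second file created with the same original name would take over the alias, which matches the "postponing the problem" caveat.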

Of course, using expiring aliases is just like postponing the problem. But, in
my opinion, Tagsistant is primarily a personal tool, nothing that automated
procedures or batches are supposed to rely on. I hope that, in this
perspective, the alias workaround is an acceptable compromise.

------
haliax
Why couldn't this co-exist with a hierarchical filesystem, maybe via a
special subdirectory as procfs does? (With homonym files getting some
distinguishing prefix, possibly based on their hierarchical paths.) It seems
somewhat like a more sophisticated version of Spotlight in that it can handle
logical relationships between tags.

------
p4bl0
I had exactly this idea last year but I never took the time to actually
implement it. Nice to see someone did!

------
mnzaki
I was excited about this at first, but I have come to think a filesystem that
does away with the tree structure would be rather difficult to deal with or
get used to. I will probably end up with loads of files under similar tags and
it would make files rather difficult to locate due to the sheer number of
them. Maybe the idea just needs a bit of refining?

~~~
masterzora
This is actually something I've been wanting for a long, long time. Obviously
I'm not planning on using this for / or anything system-related, but for my
media archives &c it's perfect. Taking movies for example, I've found most
people I know think an alphabetical approach is the way to go. This is
useless, however, if I don't know what I want to watch. With the tags I can
easily tag in genres, directors, actors, etc. and have my filesystem help
me choose a film.

There is one major thing that I've considered would be an awesome addition to a
system like this: automatic tagging from the metadata. It'd be awesome if my
movies automatically got tagged by, say, resolution and length, so I wouldn't
have to bother with such things.

~~~
mnzaki
Same here, I want something similar to manage my photo collection. The current
implementation actually can have plugins that act on specific file types, so
automatic tags are just a few lines of code away.
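Something along these lines, as a Python sketch. This is not Tagsistant's plugin API: the `PLUGINS` registry, the `plugin` decorator, and the metadata keys `height` and `minutes` are all hypothetical, and the metadata is assumed to have been extracted already by some other tool.

```python
PLUGINS = {}  # MIME type prefix -> list of tagger functions


def plugin(mime_prefix):
    """Register a tagger for every MIME type starting with mime_prefix."""
    def register(func):
        PLUGINS.setdefault(mime_prefix, []).append(func)
        return func
    return register


@plugin("video/")
def tag_video(metadata):
    """Derive resolution and length tags from already-extracted metadata."""
    tags = set()
    height = metadata.get("height")
    if height:
        tags.add(f"{height}p")
    minutes = metadata.get("minutes")
    if minutes is not None:
        tags.add("feature-length" if minutes >= 60 else "short")
    return tags


def autotag(mime_type, metadata):
    """Run every plugin whose prefix matches and merge the resulting tags."""
    tags = set()
    for prefix, taggers in PLUGINS.items():
        if mime_type.startswith(prefix):
            for tagger in taggers:
                tags |= tagger(metadata)
    return tags


print(sorted(autotag("video/mp4", {"height": 720, "minutes": 95})))
# ['720p', 'feature-length']
```

A photo plugin would register under `"image/"` in the same way and never run for video files.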

But then what advantage would this system have over a dedicated media manager
or the several semantic desktop projects? The only advantage I see is allowing
traditional tools (unix utilities, conventional file managers, even 'open
file' popups) to take advantage of the system without any extra effort on
their side. Though wouldn't dedicated tools, say a full blown semantic
desktop, allow for much better integration and usability?

~~~
masterzora
That really depends on the use case, I guess. For me, the use case is that
I've got a media server in my living room and, in addition to being hooked up
to my TV, it also has a Samba share set up so that anyone in my apartment can
watch/listen/look at/read/view anything anywhere in the apartment. In this
case, it certainly would be possible to have every computer set up with a
semantic desktop setup, but I'm still trying to keep my Eee minimal and
visitors wouldn't necessarily be set up with the semantic desktop tools.

Maybe this implies ground for a semantic server, but I'm inclined to think
that Samba (or your favourite mostly-transparent sharing protocol) on top of a
tag-based system is actually a good way to accomplish this particular use
case.

------
hmottestad
Yay...semantic technologies. I hope they make a way of utilizing existing
technologies from the semantic web. Like OWL :)
[https://secure.wikimedia.org/wikipedia/en/wiki/Web_Ontology_...](https://secure.wikimedia.org/wikipedia/en/wiki/Web_Ontology_Language)

~~~
currywurst
If you read his notes, he finds that OWL as it currently exists is too
cumbersome for his needs.

~~~
hmottestad
ahh...yes...[http://www.tagsistant.net/documents-about-tagsistant/semanti...](http://www.tagsistant.net/documents-about-tagsistant/semantic-reasoning/3-semantic-rules-inside-tagsistant)

I think OWL would fit the task quite well. Yes, right now it may seem like
overkill and will probably be somewhat slow. RDFS is an alternative though. Or
just using a subset of OWL, like OWL Lite.

------
tx0
Tagsistant is not supposed to replace POSIX filesystems and can't be used in
system directories like /bin or /etc.

Tagsistant is a personal tool to organize files (and directories, starting
with version 0.4).

It provides a plugin architecture to allow autotagging.

------
jorangreef
Well done with the "Europe" tag includes anything with the "London" tag idea.

~~~
gnosis
This idea is called "hierarchical tagging". Searching for that phrase in
Google should get you some links to other people who've discussed and
implemented it.

~~~
Someone
I think that was a joke. Over in London, 'Europe' is across the channel, on
the mainland.

------
CurtHagenlocher
Reminds me of WinFS (<http://en.wikipedia.org/wiki/WinFS>)

