My solution, which I never got around to implementing, was to have "hierarchical" tags. For example, "recipes:cookies:chocolatechip" is a single tag for all chocolate chip cookie recipes, all of which have the implied tags "recipes" and "recipes:cookies" (but not "cookies" or "chocolatechip"). The advantage is that your root directory will not be littered with the "cookies" and "chocolatechip" tags when these collections are fairly irrelevant on their own.
This idea can be extended to solve multi-user conflicts (all tags are prefixed with the username) and to make sensible tags for dates -- it makes little sense to tag a file with "2011", "May", and "22", since no-one cares to find all files from the 22nd of every month; but the hierarchical tag "2011:May:22" is perfect for this situation since these files will also be present under "2011" and "2011:May".
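A minimal sketch of how the implied tags could be derived, assuming ":" as the separator. The function name is made up for illustration, and nothing here is implemented in Tagsistant.

```python
def implied_tags(tag: str, sep: str = ":") -> list[str]:
    """Return every ancestor prefix of a hierarchical tag, then the tag itself."""
    parts = tag.split(sep)
    return [sep.join(parts[:i]) for i in range(1, len(parts) + 1)]


print(implied_tags("recipes:cookies:chocolatechip"))
# ['recipes', 'recipes:cookies', 'recipes:cookies:chocolatechip']
print(implied_tags("2011:May:22"))
# ['2011', '2011:May', '2011:May:22']
```

Note that "cookies" and "May" never appear on their own, which is the whole point of the scheme.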
I'm thinking about splitting Tagsistant into a client and a server to provide a multiuser environment, and probably some ontological foundations are required to coherently organize tagging coming from different users.
Lately I have thought about a file storage system like this, which consists of two parts: 1) some kind of database in which you can put binary files and attach tags to them, and 2) a FUSE (?) driver which lets you create different 'views' on this database, which can then be mounted as part of a normal filesystem.
For example this would be nice for music and PDF collections. You could create different 'views' and then go to one folder to see your music/PDFs sorted by year, then to another folder to see them sorted by author/artist, and then to a third one which is sorted by type_of_music/artist/album.
This way you would get the best of both worlds: 1) a powerful database to store and organize binary data and 2) backward compatibility, since you can just use the command line / bash to 'export' files to mp3 players and so on.
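To make the 'views' idea concrete, here is a rough sketch in which a view is simply an ordered list of metadata keys, and each record is placed in the virtual directory built from its values for those keys. The records, key names and function name are all hypothetical placeholders. A real implementation would sit behind a FUSE driver and a proper database.

```python
from collections import defaultdict

# Hypothetical records; a real store would also hold the binary data.
TRACKS = [
    {"file": "track1.mp3", "year": "1991", "artist": "artist_a", "album": "album_x", "genre": "rock"},
    {"file": "track2.mp3", "year": "1991", "artist": "artist_b", "album": "album_y", "genre": "jazz"},
    {"file": "track3.mp3", "year": "1994", "artist": "artist_a", "album": "album_z", "genre": "rock"},
]


def build_view(records, keys):
    """Map each virtual directory path to the files that would appear in it."""
    view = defaultdict(list)
    for rec in records:
        path = "/".join(rec[k] for k in keys)
        view[path].append(rec["file"])
    return dict(view)


by_year = build_view(TRACKS, ["year"])
by_genre = build_view(TRACKS, ["genre", "artist", "album"])
print(by_year)
print(by_genre)
```

The same three files show up in both views without being copied; only the directory layout differs.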
Bookmark tags are why I'm still using Firefox instead of Chrome. I recently tried Chrome for a week and really wanted to like it, but bookmark tags are what brought me back to Firefox. I look at my browser as an information manager as much as a reader, and multiple tags per bookmark is my killer feature.
Gmail's IMAP folders as tags, multiple tags per message, and its exposure of IMAP to external clients like Thunderbird are why I was finally able to convince myself to use Gmail instead of my hosting provider's email. I confess that I don't use multiple tags as much as I thought I would. But I could!
Tagsistant still uses directories. It's just that the directory names are automatically also usable as tags.
The 0.2 query mpoint/rock/AND/seattle/ becomes mpoint/tags/rock/seattle/=/ in 0.4
And about the prefix, Tagsistant 0.4 indeed uses a NNN_ prefix to filenames to allow for duplicated names to coexist.
I can also imagine 'identity problems'. Not counting symlinks and hardlinks, the full file path serves as its URI. How can I be sure if /photos/europe/DSCN0001.JPG and /photos/london/DSCN0001.JPG are the same files? What's the file URI here?
Twiddling with tags is no worse than twiddling with directories.
Few people consider it a fault of the design of ordinary filesystems that if you mess with the underlying filesystem layout, software that relied on that layout might break.
The same is really the case for any kind of dependency on a certain kind of organization of your data.
The fault for breaking software that's tightly coupled to a certain underlying organization or layout lies with the software itself (for not tolerating changes) and with the user for making the changes in the first place.
"I can also imagine 'identity problems'. Not counting symlinks and hardlinks, the full file path serves as its URI. How can I be sure if /photos/europe/DSCN0001.JPG and /photos/london/DSCN0001.JPG are the same files? What's the file URI here?"
But symlinks and hardlinks are the critical bit of filesystem functionality that makes ordinary filesystems subject to the very question. So why would you not consider them?
There are various solutions to this problem on ordinary filesystems: first, your tools (like "ls") could show you that a file or directory is symlinked (though you might have to traverse through the parent directories to find out whether there is a symlink). Second, you could also use stat to check the inode of the files in question to see if they're the same.
It should not be difficult to add similar functionality to a tag-based filesystem.
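For ordinary filesystems, the stat check mentioned above looks roughly like this in Python. The helper and demo names are made up for illustration. The standard library also ships os.path.samefile, which does the same comparison.

```python
import os
import tempfile


def same_file(path_a: str, path_b: str) -> bool:
    """True if both paths resolve to the same inode on the same device."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def demo():
    """Compare a file against a hard link to it and against a separate copy."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "DSCN0001.JPG")
        with open(original, "wb") as f:
            f.write(b"photo")
        link = os.path.join(d, "link.JPG")
        os.link(original, link)  # hard link: second name, same inode
        copy = os.path.join(d, "copy.JPG")
        with open(copy, "wb") as f:
            f.write(b"photo")  # same bytes, but a different file
        return same_file(original, link), same_file(original, copy)


print(demo())
```

A tag-based filesystem could expose a stable per-object identifier in the same spirit, so that two paths can be compared without relying on their names.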
Say you have: /photos/london/DSCN0001.JPG and /photos/berlin/DSCN0001.JPG
And the relationships: europe contains london, europe contains berlin
Now what does the 'path' /photos/europe/DSCN0001.JPG resolve to?
Tagsistant 0.4 has a broader vision (tagging of entire directories) but is still under development. If you have suggestions or doubts, I'll be very happy to discuss it.
For example, let's say you have /photos/london/DSCN0001.JPG and /photos/vienna/DSCN0001.JPG, where "london" and "vienna" are both included in "europe". This could yield paths like /photos/europe/london:DSCN0001.JPG and /photos/europe/vienna:DSCN0001.JPG.
The big trouble here (and, if I understand, with what you're suggesting as well) is that changing the name or tags of one file can alter the path to another as a side effect. So if I started with just /photos/vienna/DSCN0001.JPG, I might reference it as /photos/europe/DSCN0001.JPG somewhere. But when I go back and add /photos/london/DSCN0001.JPG, my reference to the photo of Vienna breaks because its name is no longer unique. As TeMPOral points out, this is a general class of problems afflicting a system like this.
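The side effect is easy to reproduce in a toy model. The sketch below assumes the naming scheme described above (prefix a name with its tag only when it collides); it is not how Tagsistant actually resolves paths, and the names are hypothetical.

```python
from collections import Counter

# Tag containment from the example: europe contains london and vienna.
CONTAINS = {"europe": {"london", "vienna"}}


def listing(tag, files):
    """Names visible under `tag`, given `files` as (tag, name) pairs.

    A name is shown bare when unique and as 'tag:name' when it collides.
    """
    members = CONTAINS.get(tag, set()) | {tag}
    hits = [(t, n) for t, n in files if t in members]
    counts = Counter(n for _, n in hits)
    return sorted(n if counts[n] == 1 else f"{t}:{n}" for t, n in hits)


files = [("vienna", "DSCN0001.JPG")]
print(listing("europe", files))
# ['DSCN0001.JPG']

files.append(("london", "DSCN0001.JPG"))
print(listing("europe", files))
# ['london:DSCN0001.JPG', 'vienna:DSCN0001.JPG']
```

After the second file is added, the bare name /photos/europe/DSCN0001.JPG no longer exists, so any stored reference to it breaks even though the Vienna photo itself was never touched.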
When you create a file "DSCN0001.JPG", it receives a prefix even if there is no conflict, becoming, let's say, "123_DSCN0001.JPG".
But both you and your software (say, a file manager) presume the file is named "DSCN0001.JPG", not "123_DSCN0001.JPG". To solve that, Tagsistant 0.4 provides an aliasing layer that maps the original name to the prefixed one.
It's still under development, so both the idea and the implementation can change. For example: how long should an alias exist? Just until the first access? Up to an estimated expiration time?
I lean toward the latter solution. Since aliases are implemented as an SQL table, adding an expiration column and a garbage-collecting thread should be all that is needed.
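A rough sketch of what an expiring alias table could look like, using SQLite from Python. The table name, column names and helper functions are assumptions made for illustration and are not Tagsistant's actual schema. The garbage collector is shown as a plain function that a background thread could call periodically.

```python
import sqlite3
import time


def open_db():
    """Create an in-memory database with a hypothetical alias table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE aliases ("
        " alias TEXT PRIMARY KEY,"
        " target TEXT NOT NULL,"
        " expires_at REAL NOT NULL)"
    )
    return db


def add_alias(db, alias, target, ttl, now=None):
    """Map `alias` to `target` for `ttl` seconds."""
    now = time.time() if now is None else now
    db.execute("INSERT OR REPLACE INTO aliases VALUES (?, ?, ?)", (alias, target, now + ttl))


def resolve(db, alias, now=None):
    """Return the prefixed name for a still-valid alias, or None."""
    now = time.time() if now is None else now
    row = db.execute(
        "SELECT target FROM aliases WHERE alias = ? AND expires_at > ?", (alias, now)
    ).fetchone()
    return row[0] if row else None


def collect_garbage(db, now=None):
    """Delete expired aliases and return how many were removed."""
    now = time.time() if now is None else now
    return db.execute("DELETE FROM aliases WHERE expires_at <= ?", (now,)).rowcount


db = open_db()
add_alias(db, "DSCN0001.JPG", "123_DSCN0001.JPG", ttl=60, now=1000.0)
print(resolve(db, "DSCN0001.JPG", now=1030.0))  # still valid
print(resolve(db, "DSCN0001.JPG", now=1100.0))  # expired
print(collect_garbage(db, now=1100.0))
```

Passing `now` explicitly keeps the example deterministic; in real use the functions fall back to the current time.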
Of course, using expiring aliases just postpones the problem. But, in my opinion, Tagsistant is primarily a personal tool, not something that automated procedures or batch jobs are supposed to rely on. I hope that, from this perspective, the alias workaround is an acceptable compromise.
But files are also accessible from the archive/ directory where nothing is supposed to change as a consequence of tagging.
Could that be a reasonable compromise?
There is one major thing that I think would be an awesome addition to a system like this: automatic tagging from metadata. It'd be great if my movies automatically got tagged by, say, resolution and length, so I wouldn't have to bother with such things.
But then what advantage would this system have over a dedicated media manager or the several semantic desktop projects? The only advantage I see is allowing traditional tools (unix utilities, conventional file managers, even 'open file' popups) to take advantage of the system without any extra effort on their side. Though wouldn't dedicated tools, say a full blown semantic desktop, allow for much better integration and usability?
Maybe this implies ground for a semantic server, but I'm inclined to think that Samba (or your favourite mostly-transparent sharing protocol) on top of a tag-based system is actually a good way to accomplish this particular use case.
That's a pretty big advantage.
Instead of having operators like AND be path components, I would have the path separator itself be an and operator, or use & and | to just make logical expressions. As long as you're throwing out the hierarchy, you may as well throw out the hierarchical syntax.
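As a sketch of what such a syntax could mean, the function below treats both "/" and "&" as AND and "|" as OR, with AND binding tighter and no parentheses. This grammar is an assumption for illustration only; it is not Tagsistant's query language.

```python
import re


def matches(expr: str, tags: set[str]) -> bool:
    """Evaluate a tag expression such as 'rock&seattle|jazz' against a file's tags.

    '|' separates alternatives; within an alternative, '/' and '&' both mean AND.
    Empty terms (for example from a trailing '/') are ignored.
    """
    return any(
        all(term in tags for term in re.split(r"[&/]", alternative) if term)
        for alternative in expr.split("|")
    )


print(matches("rock/seattle/", {"rock", "seattle", "1991"}))  # True
print(matches("rock&seattle", {"rock"}))                      # False
print(matches("rock&seattle|jazz", {"jazz"}))                 # True
```

With this reading, mpoint/rock/seattle/ needs no AND component at all, and the order of the path components stops mattering.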
I think OWL would fit the task quite well. Yes, right now it may seem like overkill and will probably be somewhat slow. RDFS is an alternative, though, as is a subset of OWL such as OWL Lite.
Tagsistant is a personal tool to organize files (and directories, starting with version 0.4).
It provides a plugin architecture to allow autotagging.
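To illustrate the general idea of autotagging plugins (not Tagsistant's actual plugin interface, which is not described here), a sketch might register small functions that each derive tags from a file path. The registry, decorator and plugin names below are hypothetical.

```python
import mimetypes
import os

# Hypothetical registry: each plugin takes a path and returns a set of tags.
PLUGINS = []


def plugin(func):
    """Register an autotagging function."""
    PLUGINS.append(func)
    return func


@plugin
def by_mime_type(path):
    """Tag with the top-level MIME type guessed from the name, e.g. 'image'."""
    mime, _ = mimetypes.guess_type(path)
    return {mime.split("/")[0]} if mime else set()


@plugin
def by_extension(path):
    """Tag with the lower-cased file extension, e.g. 'jpg'."""
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    return {ext} if ext else set()


def autotag(path):
    """Union of the tags proposed by every registered plugin."""
    tags = set()
    for func in PLUGINS:
        tags |= func(path)
    return tags


print(sorted(autotag("DSCN0001.JPG")))
```

A plugin that reads real metadata, such as video resolution or length, would follow the same shape but needs a media library, so it is left out of this sketch.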