
Tagsistant: A Semantic Filesystem for Linux - pcr910303
https://www.tagsistant.net/
======
jmiskovic
Very nice project. I'm delighted to see new ideas for organizing files and
personal data.

I feel that most people gave up on organizing data and just went with the
concept of searching. This is a shame, because searching wastes everyone's time
filtering out false positives, while a small effort of tagging new content goes
a long way toward enabling discoverability. Web sites that let you filter
content by desired and undesired tags give you an optimal way to recover
information.
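Filtering by desired and undesired tags boils down to two set tests per item:
every wanted tag must be present, and no unwanted tag may be. A minimal sketch,
assuming a hypothetical in-memory `library` that maps item names to tag sets
(not any particular site's or Tagsistant's data model):

```python
def filter_by_tags(items, wanted, unwanted):
    """Return names of items carrying every wanted tag and none of the unwanted ones.

    `items` is a hypothetical mapping of item name -> set of tags.
    """
    wanted, unwanted = set(wanted), set(unwanted)
    return sorted(
        name
        for name, tags in items.items()
        if wanted <= tags and not (unwanted & tags)  # superset check, empty intersection
    )


library = {
    "a-new-hope.mkv": {"starwars", "scifi", "movie"},
    "dune.epub": {"scifi", "book"},
    "holiday-special.mkv": {"starwars", "movie", "bad"},
}
print(filter_by_tags(library, wanted={"starwars", "movie"}, unwanted={"bad"}))
# ['a-new-hope.mkv']
```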

A very interesting feature in Tagsistant is tag relations. It enables tree
hierarchies for tags ("anything starwars-related is also scifi-related"). Kind
of ironic how they wanted to get away from the tree structure of files, and
then implemented a tree structure for tags. Perhaps a tagging system for
organizing your tags would be better? :)
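A relation like "starwars implies scifi" can be evaluated by expanding a file's
tags transitively before matching. This is only a sketch of the general idea,
not how Tagsistant itself stores or resolves its relations; the `implies`
mapping is a hypothetical in-memory model:

```python
from collections import deque


def expand_tags(tags, implies):
    """Close a set of tags under 'X implies Y' relations, following chains transitively.

    `implies` is a hypothetical mapping of tag -> set of implied tags.
    """
    result = set(tags)
    queue = deque(tags)
    while queue:
        tag = queue.popleft()
        for implied in implies.get(tag, ()):
            if implied not in result:  # skipping seen tags also guards against cycles
                result.add(implied)
                queue.append(implied)
    return result


implies = {"starwars": {"scifi"}, "scifi": {"fiction"}}
print(sorted(expand_tags({"starwars", "movie"}, implies)))
# ['fiction', 'movie', 'scifi', 'starwars']
```

Because `implies` is a general graph rather than a tree, one tag may imply
several others, which is closer to "tagging your tags" than to a strict
hierarchy.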

This meta-structure of data is fascinating. Is there a good resource that
systematizes the area, with best practices and implementation tips?

~~~
feanaro
Tree hierarchies for tags are starting to sound a lot like categories. It's
probably essential not to overdo it, because then you start to lose the power
of tags and gain the detriments of categories. That said, I do think that
introducing some limited relations between tags can be beneficial.

------
fao_
I was worried, because I'm currently working on a tagging system and they beat
me to the release, punchy website included.

Then I realised that they built it on top of FUSE and SQL, and breathed a sigh
of relief.

((EDIT: Perhaps this is a little harsh? I didn't mean to be harsh, just
precise, but perhaps I went a little over the top -- I apologize to the
authors if I did.))

I investigated the FUSE/db option a year or two ago, and personally I don't
see it as an interesting or compelling solution to the file<->tag problem.
Because users move and rename files, pathnames are potentially semantically
meaningless to the tag system. The contents of files change often and
arbitrarily (consider formats like MS Word's, which are literal memory dumps of
what Word is doing, not to mention encrypted files), so file hashes are
potentially semantically meaningless to the tag system as well.

In other words, basic file operations (reading/writing/renaming) will cause
this system to break your tags unless significant work goes into keeping the
file<->tag relation consistent. You can attempt to mitigate this problem with
systems that keep track of files (inotify, etc.), but that introduces a runtime
cost and has technical difficulties of its own. It's 'designed' (albeit
unintentionally) to break from the start, and the developer has to exert a
large amount of effort to keep the system from breaking. To me the effort
didn't seem worth it; the innate flaws weren't worth surmounting.
Unfortunately, to keep this from being a 'debbie downer' post, I'd have to talk
about the alternative approach, which I don't really have the space (or the
time, right now) to do here.
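The failure mode described above is easy to reproduce with a toy model: if
tags are remembered by pathname or by content hash, a single rename or edit
orphans them. This sketch uses hypothetical file names and plain dictionaries;
it is not how Tagsistant itself identifies files:

```python
import hashlib
import os
import tempfile


def sha256_of(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def lookups_after_rename_and_edit():
    with tempfile.TemporaryDirectory() as d:
        old = os.path.join(d, "report.docx")
        with open(old, "wb") as f:
            f.write(b"draft 1")

        # Two naive ways of remembering that this file is tagged "work".
        tags_by_path = {old: {"work"}}
        tags_by_hash = {sha256_of(old): {"work"}}

        new = os.path.join(d, "report-final.docx")
        os.rename(old, new)            # the rename invalidates the path key
        with open(new, "ab") as f:
            f.write(b", edited")       # the edit invalidates the hash key

        return tags_by_path.get(new), tags_by_hash.get(sha256_of(new))


print(lookups_after_rename_and_edit())  # (None, None)
```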

~~~
repsilat
It sounds like this is a pretty easy problem to solve if files aren't
identified by their names. If a file just _has_ a name (and a parent, and a
bunch of tags...) then tags are trivially stable when moves/renames/writes
happen. No need to hash anything; ids are a fine way to track identity, and
perfectly amenable to storing in a SQL database.
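A minimal sketch of that idea with SQLite: files get a surrogate id, and tags
reference the id rather than the name, so a move or rename is a one-row
update. The table and column names are hypothetical, and `parent` is simplified
to a path string rather than a directory id:

```python
import sqlite3


def tagged_after_move():
    """Tag a file, then move and rename it, and report which tags it still has."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE files (
            id     INTEGER PRIMARY KEY,
            name   TEXT NOT NULL,
            parent TEXT NOT NULL
        );
        CREATE TABLE file_tags (
            file_id INTEGER NOT NULL REFERENCES files(id),
            tag     TEXT NOT NULL,
            PRIMARY KEY (file_id, tag)
        );
        """
    )
    file_id = db.execute(
        "INSERT INTO files (name, parent) VALUES (?, ?)", ("report.docx", "/inbox")
    ).lastrowid
    db.execute("INSERT INTO file_tags (file_id, tag) VALUES (?, ?)", (file_id, "work"))

    # Move + rename touches only the files row; file_tags still references the same id.
    db.execute(
        "UPDATE files SET name = ?, parent = ? WHERE id = ?",
        ("final.docx", "/archive", file_id),
    )
    return db.execute(
        "SELECT f.parent, f.name, t.tag FROM files AS f "
        "JOIN file_tags AS t ON t.file_id = f.id"
    ).fetchall()


print(tagged_after_move())  # [('/archive', 'final.docx', 'work')]
```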

If you want to support hard links you can decide whether to associate tags
with files or inodes, depending on whether you want all linked files to have
the same set of tags.
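Associating tags with inodes can be sketched by keying on `(st_dev, st_ino)`
from `os.stat`: every hard link to a file shares that key, and a rename within
the same filesystem keeps it. This is an illustrative model with hypothetical
file names, not Tagsistant's implementation:

```python
import os
import tempfile


def inode_key(path):
    """Identify a file by (device, inode) rather than by its name."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def demo():
    tags = {}
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "photo.jpg")
        with open(original, "wb") as f:
            f.write(b"...")
        tags[inode_key(original)] = {"holiday"}

        link = os.path.join(d, "photo-link.jpg")
        os.link(original, link)           # hard link: new name, same inode
        moved = os.path.join(d, "renamed.jpg")
        os.rename(original, moved)        # rename within one filesystem keeps the inode

        return tags.get(inode_key(link)), tags.get(inode_key(moved))


print(demo())  # ({'holiday'}, {'holiday'})
```

The trade-off is that a copy, or a move to another filesystem, produces a new
inode, and inode numbers can be reused after a file is deleted, so stale
entries need cleaning up.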

~~~
fao_
> It sounds like this is a pretty easy problem to solve if files aren't
> identified by their names.

That would make things slightly better, but it's really not how things are
supposed to look from userspace. You still have the problem of tags not being
preserved across file copies, or across filesystem boundaries (the latter is an
almost universal problem in this space, I guess).

------
miohtama
This is an interesting concept. But why do this at the filesystem level? Web
and desktop applications like Google Photos and iTunes offer some cataloging
and search capabilities that should cover most use cases.

Would this later be connected to something like Spotlight on OSX?

~~~
yjftsjthsd-h
One advantage to a FS over a program is that it gets to be really
interoperable for free; pretty much everything can work on raw files, and the
unix ecosystem is very good at inter-operating with filesystems as an "API".

------
nerdponx
Interesting timing. I came across TMSU [0] the other day, which seems to have
similar objectives.

[0]: [https://tmsu.org/](https://tmsu.org/)

~~~
rasengan0
Yayy! Thank you for the reference. Tag my shit up seems more straightforward
to me after reading about TMSU and Tagsistant. And both use SQLite!

------
theon144
Would also like to point out TMSU [0] which also presents a filesystem-y
interface, alongside a quite powerful CLI.

[0]: [https://tmsu.org/](https://tmsu.org/)

~~~
Fnoord
Can it import from current file magic? ID3? CDDB? IMDB?

You see, it is a big effort to manually input all this data.

What is useful are reliable entries from public sources.

You also want a documented data format for future compatibility, so you can
migrate to the next format.

This furthermore gives the advantage that applications can use the metadata.
Although I suppose the file system abstraction achieves the same?

------
teddyh
Isn’t stuff like this what the ext4 “user_xattr” mount option is for?

~~~
tgbugs
Not quite. xattrs are not indexed, so if you want efficient retrieval by xattr
tags you have to build and maintain your own inverted index. xattrs are good if
you have a file you want to store metadata about, but not so good if you want
to use that metadata for discovery (at least in my experience; maybe I'm
missing something, but I've implemented a remote file syncing system using
xattrs and never came across a tool in the unix arsenal beyond find +
getfattr, which is painfully slow).
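A minimal sketch of such an inverted index, assuming tags are stored
comma-separated in a hypothetical `user.tags` attribute. `os.getxattr` is
Linux-only and needs a filesystem mounted with user xattr support; the index is
a one-off snapshot and would have to be rebuilt or kept in sync (e.g. via
inotify) as files change:

```python
import os
from collections import defaultdict

TAG_ATTR = "user.tags"  # hypothetical attribute name; tags stored comma-separated


def read_tags(path):
    """Read tags from an xattr; files without the attribute yield an empty set."""
    try:
        raw = os.getxattr(path, TAG_ATTR)
    except OSError:  # attribute missing, or xattrs unsupported on this filesystem
        return set()
    return {t.strip() for t in raw.decode().split(",") if t.strip()}


def build_index(file_tags):
    """Invert {path: {tags}} into {tag: {paths}} so lookups by tag avoid a full scan."""
    index = defaultdict(set)
    for path, tags in file_tags.items():
        for tag in tags:
            index[tag].add(path)
    return index


def index_tree(root):
    """Walk `root` once, reading each file's tags, and return the inverted index."""
    file_tags = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            file_tags[path] = read_tags(path)
    return build_index(file_tags)
```

After one `index_tree("/some/dir")` pass, `index["work"]` answers "which files
are tagged work?" with a dictionary lookup instead of another find + getfattr
scan.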

~~~
zmix
I made heavy use of OpenMeta (a community effort on OSX, before Apple
introduced something very similar), which placed tags in xattrs. It is great!
Retrieval never caused any issues or slowdowns.

------
jlrubin
This is really cool, I've been thinking about making something like this
myself for a while! Design seems to be relatively well thought out.

I might be a bit too paranoid about file loss to install it though...

~~~
mruts
Then just use it with symbolic links?

------
xixixao
If you find this interesting, check out Bear with its nested tags, which are
executed brilliantly.

------
zmix
Why not just make a macOS compatible system, based on xattrs?

------
amelius
I think tags are a dead end because few users will make the effort to tag
their files properly, at all times. Better to use the Google approach and let
ML/NLP do the job.

~~~
eitland
Users will not give their files proper names or put them into folders or tidy
their desks or do the laundry either.

That is, _some_ users won't do that, but by adapting everything to these users
we are making everything worse for everyone, including them.

~~~
amelius
The problem is that everyone is one of these users at some point. E.g. when
you're about to leave the office and receive an attachment. Or at 3am when
solving a tough problem.

Instead of requiring every file to be properly tagged, why not take the
Google-approach altogether?

~~~
eitland
Perfect example. However:

I'm not trying to argue against search; I'm arguing against _everyone having
to rely on search_ because UX designers decided to remove every other way of
finding files, since _someone_ might _someday_ forget to tag the file or
whatever the procedure is.

------
joshu
and of course there's the requisite tag cloud

