TMSU: a tool born out of frustration with the hierarchical nature of filesystems (tmsu.org)
237 points by goblin89 on May 9, 2016 | 128 comments

I'm actually not frustrated with the hierarchical nature of filesystems. I'm most frustrated with the state of filesystem search these days. I don't want to tag and curate my files. I want to search them.

I've strung together something using bleve full-text search and some OCR libs to scratch my particular itch, but it still doesn't quite get all the way there.

I agree.

I actually like the hierarchical organisation, and I don't like the usability trend of the last 10 years, particularly as driven by Microsoft's attempts to patch over their horrid structure with even worse workarounds.

The solution to teaching my mother where to find her documents is not hiding the place 15 levels deep and having 10 symlinks to it.

Fast, unobtrusive indexing and a good structure are all that is needed. Humans organise and memorise things in hierarchies, and while categories are another helpful abstraction, in my opinion they tend to be confusing if they do not point to a fixed position in a hierarchy.

You can also explore a hierarchy; not so with a cloud of things connected to a bunch of tags.

> Humans organise and memorise things in hierarchies,

Humans also track a lot of things based on spatial memory, which no OS since Mac Classic has even tried to make use of, which is a shame.

True, with the exception of the iOS SpringBoard. It's a great spatial-memory comeback and a testament to the ease of use of the model, even if it doesn't scale well once you get into the hundreds of items.

How did Mac Classic make use of that?

In Mac Classic each folder you opened created a new window which remembered all of its settings for size, position on screen, scroll position, icon size/layout, etc. This window is an explicit and exclusive representation of that particular folder and attempting to open the same folder again does not create a new window, it simply shifts the focus to the already-open window. The change in Mac OS X towards a browser for navigating the filesystem hierarchy caused an active debate over spatial[0] vs. navigation (or browser)[1] file managers.

[0] https://en.wikipedia.org/wiki/Spatial_file_manager

[1] https://en.wikipedia.org/wiki/File_manager#Navigational_file...
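To make the "one window per folder, state remembered" rule concrete, here is a toy model of a spatial file manager. It is a sketch, not how the Finder was implemented; `WindowState`, `SpatialFinder`, and their fields are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class WindowState:
    # Only a few of the remembered settings; the original also kept icon size/layout.
    x: int = 40
    y: int = 40
    width: int = 480
    height: int = 320
    scroll: int = 0


class SpatialFinder:
    """Toy spatial file manager: exactly one window per folder, state remembered."""

    def __init__(self):
        self.saved = {}     # folder path -> WindowState, kept even after the window closes
        self.open = set()   # folders whose window is currently on screen
        self.focused = None

    def open_folder(self, path):
        """Return (state, created). Re-opening an open folder only focuses its window."""
        if path in self.open:
            self.focused = path
            return self.saved[path], False
        state = self.saved.setdefault(path, WindowState())
        self.open.add(path)
        self.focused = path
        return state, True

    def move(self, path, x, y):
        state = self.saved[path]
        state.x, state.y = x, y

    def close(self, path):
        self.open.discard(path)  # the saved state is deliberately not discarded
```

The key design choice is that the folder path, not the window, owns the state: closing a window forgets nothing, so the folder reappears exactly where you left it.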

Ars did an amazing series of articles about the (IMO) missteps made in OS X, and presented a theoretical design that would satisfy both Mac Classic fans and fans of "browser-based" windowing systems.


Any software developer interested in designing usable software should read and digest this. Alas, nobody working at Apple did, so I'm no longer an Apple customer.

Thank you for this! I vaguely alluded to this (active debate) in my comment but I had forgotten the source. I'm going to read through this now! :)

Very interesting, thank you.

The short answer is: stuff stayed where you put it. Every icon, every window, opened up exactly where you left it last time you saw it.

So you had the option of remembering "Excel is the icon below the Special menu" and you could be confident that the Excel icon would always be located below the Special menu, exactly where you put it.

One of the reasons I switched from OS X to Windows is that Apple threw that brilliant design in the garbage. Shame.

I wouldn't say no OS has tried; the Windows 8 Start screen was designed to exploit spatial memory, for instance.

Project Xanadu had something like that. Of course, it is barely working.

You have to realize, though, that this software doesn't take away your hierarchical system; it just gives you another way to look at it. Recently I've been cleaning up my music library and I'm seeing some of the shortcomings of the current system. What do you do with a compilation album? Put its tracks in a folder for each artist, or keep them all together (which would separate them from other works by the same artist)? iTunes, or whatever media player you use, will probably list them in both locations depending on how you sort. I find this to be a useful feature, so why wouldn't you like the option to do that sort of thing from the command line?

That iTunes view is still a hierarchy, though. More accurately, it's still a DAG. The hierarchy is not the problem here, it's the rigidity of trying to put something in only one part of that hierarchy.

It's multiple, interwoven hierarchies, which you can look at by "seeing" a particular DAG (a subset of the actual graph).

You can see this twin representation conceptually as Artist->Album->Song and also as Compilation->Artist->Song, the latter being a small folder with symlinks to the actual song directly.

It's really just two different trees.
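The "two different trees" idea can be sketched with plain symlinks: one real file, and one link per view (Artist/Album/Song, plus Compilations/Compilation/Artist/Song when applicable). The track dictionaries and the `views`/`materialize` helpers are hypothetical; a real tool would read this metadata from the files' own tags.

```python
from pathlib import Path, PurePosixPath

# Hypothetical metadata; a real tool would read it from ID3/Vorbis tags.
tracks = [
    {"file": "summer.mp3", "artist": "Artist A", "album": "Album X", "compilation": None},
    {"file": "blue.mp3", "artist": "Artist B", "album": "Album Y", "compilation": "Jazz Hits"},
]


def views(track):
    """Every virtual path under which one physical file should appear."""
    paths = [PurePosixPath(track["artist"], track["album"], track["file"])]
    if track["compilation"]:
        # Second tree: Compilation -> Artist -> Song.
        paths.append(PurePosixPath("Compilations", track["compilation"],
                                   track["artist"], track["file"]))
    return paths


def materialize(view_root, library, track):
    """Create one symlink per view, all pointing at the single real file in `library`."""
    for rel in views(track):
        link = Path(view_root) / rel
        link.parent.mkdir(parents=True, exist_ok=True)
        if not link.is_symlink():
            link.symlink_to(Path(library) / track["file"])
```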

Spot on. With today's filesystems it should not be a problem to have links in different places in the hierarchy. You can keep "favoriteSong.mp3" in a directory named after the artist and use links to put it in an "ultimate super songs" directory as well. But I think people are not aware of these possibilities.

Tags are still a DAG, too. I don't see what you're trying to prove here.

Tags are unordered sets that can be viewed in a DAG format if desired. Tags themselves do not specify a DAG, though.

> You have to realize though that this software doesn't take away your hierarchical system, it just gives you another way to look at it.

I think that this is extremely important. You mention iTunes, which does its best to pretend that there's no hierarchical file system underlying it. There is, of course, but—ugh, one look at it will make you long for a totally flat file structure.

I think that it's important that any hierarchical-file-system 'killer' actually not replace (or render meaningless), but only supplement, the often-useful hierarchical structure; and it looks like this tool does that.

(In case that looks like an argument, let me say explicitly that I am agreeing with you.)

>Humans organise and memorise things in hierarchies, ...

Well, sure, if things do have a hierarchical relationship, but a hierarchy is just one form of network. It fails for concurrent dependency problems, e.g.

Tags are a subset of hierarchies + symlinks. If you have a one level hierarchy, you have the equivalent of tags.

I'd argue that tags provide a more graph-like structure, of which a hierarchy (tree)[1] is just a subset. Symlinks provide an "escape-hatch" to hierarchical systems allowing objects to appear to be in more than one directory.

[1] https://en.wikipedia.org/wiki/Tree_(graph_theory)

How so? Unless you can tag tags, you can't represent a tree with them. If you can tag tags, you get an arbitrary graph, with all the problems that represents.

I don't think I've seen tagging systems that let me tag tags out in the wild.

You can create a DAG from Tagged objects if you use Set operations. The set of things with tag A includes things from other sets. Those sets form the next level in the hierarchy.

The tagged objects define the relationships between differing tags which allows you to define that arbitrary graph. The graph can be viewed as an effectively infinite depth tree if you wish to.

The 'path' in the tree can be represented as a tag. It's a simplistic way to go, but you can get pretty far with it to emulate a folder structure using tags.
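The set-operation view described above can be sketched in a few lines: a "directory" is the set of selected tags, its contents are the files carrying all of them (intersection), and its subdirectories are the other tags that co-occur on those files. The `tagged` data and helper names are hypothetical, not TMSU's implementation.

```python
# Hypothetical tag assignments: file -> set of tags.
tagged = {
    "summer.mp3": {"music", "mp3", "jazz"},
    "party.ogg": {"music", "ogg", "party"},
    "letter.tex": {"doc", "letters"},
}


def files_with(tags):
    """Set intersection: files carrying every tag in `tags`."""
    wanted = set(tags)
    return {f for f, t in tagged.items() if wanted <= t}


def children(path):
    """Tags that further narrow the selection reached via `path` (a sequence of tags)."""
    selected = files_with(path)
    return sorted({tag for f in selected for tag in tagged[f]} - set(path))
```

Because `path` is treated as a set, `("music", "jazz")` and `("jazz", "music")` reach the same node, which is exactly why the result is a DAG rather than a tree. A folder path could be folded in the same way as one more tag (e.g. a hypothetical `"path:/home/me/letters"`).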

> Humans organise and memorise things in hierarchies

Big claim. Proof?

Big proofs. Conceptual categorization is a big part of how we humans comprehend and analyze the world around us, and it relies on a hierarchy of defining attributes, from abstract model to concrete specimen:




Maybe less abstract: if you try to remember where your phone is, do you follow a pattern like "which room", "which coat/pants/desk", then "which pocket/drawer"?

I actually want to believe you because it accords with my own intuition, but I don't trust my intuition on this!

Conceptual categorisation + language + ? is totally how we humans comprehend and analyze the world around us, granted. (The devil is in the details of course.) But I don't see how we can go from there to "it relies on a hierarchy of defining attributes"

I don't buy your example because the concepts in our heads seem to be organised in context-specific word bags, i.e. tags.

I think I would believe you if cognitive science discovered that tree-hierarchy navigation relied on built-in neuro-spatial machinery or something like that :)

I _want_ to believe you but I also want proper proof! :)

Edit: thanks for the heads up on prototype theory; can't believe I'd forgotten about it. Also, thanks for making me ponder stereotypes more deeply.

Glad to be of (some) help :)

I'm not a developmental psychologist btw, just have an interest in cognitive development. The proof you're looking for would be more the field of cognitive neuroscience.

There definitely are graph-like connections between concepts all over the brain; I didn't mean to suggest that our brains work only hierarchically, to the exclusion of all other relations. As an example: if our brains were strictly hierarchical, we would be unable to see the similarity in colour between a grey table and a grey hare. I just meant to explain my thinking on how abstract thought is tied to an attribute/property hierarchy, not to proclaim that I know how the brain works.

(edit: snipped on-topic content, best reserved for a separate post)

I wonder about this a lot.

File systems are arranged hierarchically. Fluke of development? Library catalogues are organised hierarchically. Lucky coincidence? Books themselves are laid out hierarchically. I see a pattern emerging, but is it a trick of the light?

Seems like we like to chunk and splay amorphous informational units into graph-like rooted structures. Maybe cuz of its flexibility?

For something we use and do all the time, language use and conceptualisation are deeply mysterious.

We have all sorts of associations -- our brain is fundamentally an association engine. One kind of association is simplification/abstraction. This perhaps proves to be one of the most useful kinds of associations, because it helps us make decisions quickly without sifting through massive amounts of data.

I'd say it's fundamental, not any kind of mystery.

If language-use and conceptualisation are not any kind of mystery I'm sure you won't mind explaining how language acquisition works? Also, why does natural language grammar appear to be mildly context-sensitive and not context-free nor fully context-sensitive? Also, given the topic: why does the brain stereotype, and what's up with cognitive prototyping? Also, how does the brain associate the objects of conception with words? Also, how much of language's basic machinery is hard-wired? Also, how are lexicons formed? Also, is thought simply language or something else? Also, are the different languages backed by different conceptual schema?

Btw, Frege, following Hume, has pointed out that abstraction is an equivalence relation: http://logic.uconn.edu/2015/01/21/reference-and-invariance-i... I wouldn't lump simplification and abstraction together, and characterising them both as associations feels wrong.

What I'm trying to say is that your time would be better spent in not correcting every instance of mild hyperbole on HN :)

And mine too. :)

Wow, talk about a waste of time. You jumped to a conclusion about what I meant in a casual comment (you were way off, by the way; I didn't and wouldn't ever make the silly claim that language acquisition is a simple concept), and proceeded to assault us with a diarrhea of lingo meant to, what... prove how much you know about a topic? It comes off like a first-year linguistics student trying to show off to his friends.

What I'm trying to say is that your time would be better spent in not correcting every instance of mild hyperbole

What was I correcting, exactly? I was giving a casual opinion. To call a casual, armchair hypothesis that differs from someone else's "correcting"... well let's just say you failed spectacularly at understanding human language patterns.

I see a pattern emerging, but is it a trick of the light?

I think it is fundamental. A hierarchy forms naturally from iterative steps of aggregation and differentiation: first, group all similar objects together; then, for each high-level group, look for differences within each group, and split the groups into smaller subgroups. Rinse and repeat until you've reached an acceptable number of items per group.
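The aggregate-then-differentiate loop above is essentially a recursive split. Here is a minimal sketch, assuming objects are dicts and that you supply the attributes to split on, most general first; `build_hierarchy` and `max_items` are hypothetical names.

```python
def build_hierarchy(items, keys, max_items=3):
    """Aggregate, then differentiate, until every group is small enough.

    items: list of dicts describing objects (hypothetical attributes).
    keys:  attributes to split on, most general first.
    Returns nested dicts keyed by attribute value; leaves are lists of items.
    A leaf can exceed max_items only when the keys run out.
    """
    if len(items) <= max_items or not keys:
        return items
    key, rest = keys[0], keys[1:]
    groups = {}
    for item in items:
        # Aggregation: everything sharing this attribute value lands in one group.
        groups.setdefault(item.get(key, "other"), []).append(item)
    # Differentiation: recurse into each group using the next, finer attribute.
    return {value: build_hierarchy(group, rest, max_items) for value, group in groups.items()}
```

With five files split on `["kind", "year"]`, the four photos get split again by year while the lone document stays a leaf.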

I read this as, humans with 8-24 years of western education forced on them tend to think this way.

I'm not a big fan of semantic search itself, nor do I enjoy meticulously tagging my files. But I would love to have different views of my files, based on different metadata.

I know that I'm not always searching for files along the same hierarchy: sometimes I want to search along a timeline, because I know that I edited two files around the same time. Or I remember where I was when I wrote something, and I'd like to search on geolocation. Sometimes I know that I've implemented a similar feature for a different project, and want to copy/review what I did then.

All these searches still have hierarchical components in them: in the "date" case, I would expect to browse to a generic "N weeks ago" directory, look at a few specific items to get my bearings, and start browsing deeper to narrow my date range, or move forward or backward in time. In the geolocation case, I'd want to pull up a map, click on a country, then a city or area, and look at the files I accessed while there. As for the project example, I probably have the project files lying around in a hierarchical directory already. But I'd want to access only the relevant feature file, not browse the entire project structure again.

For me, the problem with non-hierarchic interfaces isn't just the lack of metadata: it's a lack of tools for visualizing non-hierarchic matches. Every search I described above requires its own GUI...
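The "N weeks ago" view from the date example is easy to compute from modification times; a GUI would sit on top. This sketch uses mtime as the timestamp, which is an assumption (access time or a separate activity log may match "when I edited it" better), and `bucket_by_week`/`scan` are hypothetical helpers.

```python
import os
import time
from collections import defaultdict

WEEK = 7 * 24 * 60 * 60  # seconds


def bucket_by_week(entries, now=None):
    """Group (name, mtime) pairs into {weeks_ago: [names]}, most recent bucket first."""
    now = time.time() if now is None else now
    buckets = defaultdict(list)
    for name, mtime in entries:
        buckets[int((now - mtime) // WEEK)].append(name)
    return dict(sorted(buckets.items()))


def scan(directory):
    """(path, mtime) for the regular files directly inside `directory`."""
    return [(e.path, e.stat().st_mtime) for e in os.scandir(directory) if e.is_file()]
```

Something like `bucket_by_week(scan("."))` gives the top level of the timeline; narrowing to a day is the same idea with a smaller bucket size.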

I actually made a sibling post highlighting almost the same pain points as you, which makes me think we may be onto something ...

I don't really need semantic search. Full text search covers 99.9% of my needs.

Your videos, music and pictures must be rather boring, if you can index them with OCR... ;-)

This is probably best explained by the fact that I don't really store videos, and my music and images are all in the cloud.

There needs to be a middle ground between hierarchical organization and search. Hierarchical organization is painful because an object can only be in one place in the hierarchy (disregarding links, which are a hack), and search by itself (as implemented in e.g., Spotlight) is not optimal because spatial/temporal/conceptual clustering of documents is important for discovery of related materials.

I have more than 5 todo.txts scattered around my system. I have cloned my webpage's git repository countless times on my local drive. I have countless LaTeX documents named letter.tex.

Of course, if I search for letter.tex I get a hit, but what other documents did I create around the same date? Were there any associated file changes? These questions are hard to answer in a hierarchical filesystem.

While we're at it, we should separate the task of document naming and classification from document saving. Right now, if I create a document in say, Microsoft Word and I want to make sure it gets saved, I have to pick not only a discoverable filename but also the right place for it in the hierarchical tree. I am just trying to type up a quick letter and get it printed, not grapple with the philosophical question of "where does this file belong and what should I name it?"

Why don't you put everything in one folder and name it with what the document is?

Because I would like to find the file later if I need to.

Yeah. I just want my files in directories, save maybe music, where I might want to query by various tags. But beets already can do that. Or I suppose I could build an app to build a static index off of a master, but again, why would I do that when I can use beets?

All of the other files I work with on a day to day basis I don't really want to query by structure or tags. I just want to know where the %^&*@ they ARE! And a simple hierarchy is actually better than tags for that.

I hear you there! If you're looking for a tool that scratches that itch, I've spent the last year and a half working on a cloud-based search engine that spans all services and devices.

We inherit all the information from your existing file system structure, as well as the people files are shared with, and provide a powerful content-based search on top of that.

Check out our website if you'd like: https://www.meta.sc

If my files include a scanned-in PDF whose text isn't stored as text per se, will meta.sc index those? My homegrown solution does this for the most part, and that feature was the motivation for creating it.

That and the fact that it's available offline on my machine.

Stuff like this is cool, but it really points out just how poor the filesystem is for organizing certain types of data. The "Why does TMSU not detect file moves and renames?" question in the FAQ really highlights how this isn't helped along by the filesystem at all.

One comment mentioned BFS, which had some really cool stuff. There's an Ars Technica article that touches on some of it[0].

The secret to BFS, in my mind, is that applications use it. The Haiku Mail app, as noted in the article, used the filesystem as its email database by attaching its own attributes to messages. This is also an example used in the "Practical Filesystem Design with the Be Filesystem" book[1].

Unless the metadata becomes a first class citizen in the filesystem, any attempts to layer it on top will have problems. Either applications won't understand it or normal filesystem operations will cause the metadata database to become de-synced with the filesystem data.

[0] http://arstechnica.com/information-technology/2010/06/the-be...

[1] http://www.letterp.com/~dbg/practical-file-system-design.pdf...

> Unless the metadata becomes a first class citizen in the filesystem, any attempts to layer it on top will have problems.

Remember when it seemed like Mac OS might give us a modern era of rampant metadata (http://arstechnica.com/apple/2005/04/macosx-10-4/6)? Ah, those were the days.


MacOS does an OK job of helping the user find things with Spotlight, but it's not a full metadata system like BFS had.

Mail.app, for example, keeps each message in a separate file[0] (and probably has a cache or separate database of this to make displaying mailboxes quicker). This makes it easy for Spotlight to index, but all of the stuff that you'd think of as metadata is actually just regular data inside the .emlx file.

If Apple made a huge effort to start treating the metadata (assuming the infrastructure described in the Ars Technica article still exists) as a first class citizen and using it like BeOS did, maybe we can get there. This would be a drastic rethink though. It feels like files, in some ways, are becoming second-class citizens in the Mac world. Photos, for example, are managed in the Photos app - you do not go into the filesystem and organize your photos.

One big problem with filesystem metadata is how do you transfer it? The Ars article showed a sidecar file (._filename) being created when the file was copied to a non-HFS volume. Now the metadata is detached from the file and we're back to the same problem.

[0] http://mike.laiosa.org/2009/03/01/emlx.html

A great idea! I think tags are definitely a better way to organize most personal data than trees.

Also I like that they describe what data they actually change on your computer right on the homepage: "TMSU does not alter your files in any way: they remain unchanged on disk, or on the network, wherever you put them. TMSU maintains its own database and you simply gain an additional view, which you can mount, based upon the tags you set up."

Unfortunately building on a foundation of sand (meaning not TMSU's code, but Unix filesystems) has downsides:


" Why does TMSU not detect file moves and renames?

To detect file moves/renames would require a daemon process watching the file system for changes and support from the file system for these events. As some file systems cannot provide these events (e.g. remote file systems) a universal solution cannot be offered. Such a function may be added later for those file systems that do provide file move/modification events but adding support for this to TMSU is not a priority at this time.

The current solution is to periodically use the repair command which will detect moved/renamed files and also update fingerprints for modified files. (The limitation of this is that files that are both moved/renamed and modified cannot be detected.) "
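The repair approach quoted above can be sketched as a reconciliation between a stored `path -> fingerprint` table and what is on disk. This is not TMSU's code; SHA-256 as the fingerprint and the `repair` helper are assumptions. It reproduces the stated limitation: a file that is both moved and modified matches neither its old path nor its old fingerprint, so it can only be reported as missing.

```python
import hashlib
from pathlib import Path


def fingerprint(path):
    """SHA-256 of the file contents (one possible fingerprint; an assumption here)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def repair(db, root):
    """Reconcile `db` (path -> fingerprint) with the files under `root`.

    Updates `db` in place and returns (moved, modified, missing).
    """
    on_disk = {str(p): fingerprint(p) for p in Path(root).rglob("*") if p.is_file()}
    untracked = {fp: p for p, fp in on_disk.items() if p not in db}
    moved, modified, missing = [], [], []
    for path, fp in list(db.items()):
        if path in on_disk:
            if on_disk[path] != fp:
                db[path] = on_disk[path]     # same path, new content
                modified.append(path)
        elif fp in untracked:
            new_path = untracked.pop(fp)     # same content, new path
            del db[path]
            db[new_path] = fp
            moved.append((path, new_path))
        else:
            missing.append(path)             # moved AND modified: undetectable
    return moved, modified, missing
```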


ding ding ding! This is the monkey in the wrench as it were.

Tagging is a really useful idea, it is also a naming thing and as such either it lives in the naming infrastructure (aka dirents) or it rots over time. A simple example I used to use in the 'object naming' [1] days was, imagine that instead of house numbers on the street you wrote down last names. That works fine until somebody moves and now not only did you show up at the wrong house, you don't even have a chance of knowing what the correct house is. [2]

Microsoft's Longhorn project was way out there but took a swing at the actual problem: just make the file system an actual relational database. Then your home directory is simply 'select * from files where (owner = chuck);'. It really does solve the problem at a more fundamental level, using naming by attribute rather than mapping. I got to observe that effort from the outside (I was at NetApp at the time), but I believe it died due to really horrible performance issues.
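Here is a tiny illustration of "your home directory is a query", using SQLite from the Python standard library. The schema is hypothetical and much simpler than anything Longhorn/WinFS actually had; the point is only that a "directory" becomes a `SELECT`, so one file can appear under many of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT NOT NULL, owner TEXT NOT NULL);
    CREATE TABLE tags (file_id INTEGER NOT NULL REFERENCES files(id), tag TEXT NOT NULL);
    INSERT INTO files (id, name, owner) VALUES
        (1, 'letter.tex', 'chuck'), (2, 'summer.mp3', 'chuck'), (3, 'notes.txt', 'dana');
    INSERT INTO tags VALUES (1, 'work'), (2, 'jazz'), (3, 'work');
""")


def home_directory(owner):
    """A 'directory' that is just a query: every file with this owner."""
    rows = conn.execute("SELECT name FROM files WHERE owner = ? ORDER BY name", (owner,))
    return [name for (name,) in rows]


def tagged_with(tag):
    """Naming by attribute: the same file can show up under many queries."""
    rows = conn.execute(
        "SELECT f.name FROM files AS f JOIN tags AS t ON t.file_id = f.id "
        "WHERE t.tag = ? ORDER BY f.name",
        (tag,),
    )
    return [name for (name,) in rows]
```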

I find it pretty awesome that people can lose files. Back when a "big" hard drive was 100MB it really wasn't all that hard to just look through all the files on it, but when it's a couple or three terabytes, all bets are off!

[1] Object File systems were all the rage in the early 2000's, files themselves were object ids and the naming was a database that connected object ids to user recognizable names. -- https://en.wikipedia.org/wiki/Object_storage

[2] The typical solution is to add "tombstones" or redirects at the previous address. That then is a layer of additional metadata to maintain, and sometimes the file doesn't move, it just changes value. (Trivial example: you have a file 'my-favorite-song.mp3' which is tagged 'jazz mp3', then you discover techno and make something from Tiesto your favorite song; while the name and type are still valid, the tag 'jazz' is now invalid.)
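A tombstone scheme boils down to a redirect table that lookups follow until they hit a live address. This is a minimal sketch with hypothetical `move`/`resolve` helpers. Note what it does not fix: in the 'jazz' example nothing moved, so no tombstone exists and the stale tag is still served.

```python
def move(tombstones, old, new):
    """Record a move by leaving a tombstone at the old address."""
    tombstones[old] = new
    tombstones.pop(new, None)  # a live file now sits at `new`


def resolve(path, tombstones, max_hops=32):
    """Follow old-path -> new-path redirects; refuse loops and absurdly long chains."""
    hops = 0
    while path in tombstones:
        hops += 1
        if hops > max_hops:
            raise LookupError(f"redirect chain at {path!r} is too long or loops")
        path = tombstones[path]
    return path
```

Every move adds an entry that must be kept forever (or garbage-collected somehow), which is the extra metadata layer the footnote mentions.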

Hmm, seems like they could have gone the other way, thrown everything into a DB, and then written a FUSE plugin to access it all through traditional file system mechanics. That would have allowed for gating direct access such that moves and renames could be dealt with accordingly. Of course, there are other problems with that approach, but probably not as many as you might think (the file system is a database, so you're really just choosing a back-end that is less likely to be directly accessed).

> they could have gone the other way, thrown everything into a DB, and then written a FUSE plugin to access it all through traditional file system

This is the Camlistore strategy!

> Of course, there are other problems with that approach

Could you elaborate more on these? I've never worked with FUSE.

The other problems I was alluding to weren't really with FUSE, but one that does pertain to FUSE is speed, since FUSE imposes overhead through a daemon running in user space and the associated context switches. From looking into it again, this may have been mitigated to a greater or lesser degree by some FUSE performance enhancements in 2012.

Specifically, I was referring to the different off-the-shelf database systems which could be used. Each will have its own benefits and drawbacks for storing large chunks of data per record. Benefits might include (relatively) easy sharding or replication. Drawbacks might include not being space-efficient for removed files, being less resilient to corruption from crashes (or having corruption affect more than the files in use), or overly aggressive use of memory to function efficiently.

If a custom database was developed, you could tailor to your exact needs, but then you have much more work to do, and a period of immaturity.

Off the top of my head, if I were designing a general-purpose system for tagging files where people were expected to use it as a regular file system and some overhead from FUSE was acceptable, I think I would leverage the file system but in a different way. I would set up a specialized directory for the files themselves, store them by hash within it, have a BerkeleyDB database relate filenames to hashes and tags, and use FUSE for direct file access. But that's my 5-minute assessment, so I reserve the right to change it completely given someone pointing out the obvious problems. :)
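The storage half of that design (hash-named blobs plus an index of name to hash and tags) can be sketched without the FUSE layer. Here `sqlite3` stands in for BerkeleyDB because it ships with Python; the `TagStore` class, its layout, and its method names are all hypothetical.

```python
import hashlib
import sqlite3
from pathlib import Path


class TagStore:
    """Hash-named blobs in one directory plus an index of name -> hash and tags."""

    def __init__(self, root):
        self.blobs = Path(root, "blobs")
        self.blobs.mkdir(parents=True, exist_ok=True)
        self.db = sqlite3.connect(str(Path(root, "index.db")))
        self.db.executescript("""
            CREATE TABLE IF NOT EXISTS names (name TEXT PRIMARY KEY, hash TEXT NOT NULL);
            CREATE TABLE IF NOT EXISTS tags (name TEXT, tag TEXT, PRIMARY KEY (name, tag));
        """)

    def add(self, src, name, tags=()):
        data = Path(src).read_bytes()  # whole file in memory: fine for a sketch
        digest = hashlib.sha256(data).hexdigest()
        blob = self.blobs / digest
        if not blob.exists():
            blob.write_bytes(data)     # identical content is stored only once
        with self.db:
            self.db.execute("INSERT OR REPLACE INTO names VALUES (?, ?)", (name, digest))
            self.db.executemany("INSERT OR IGNORE INTO tags VALUES (?, ?)",
                                [(name, t) for t in tags])
        return digest

    def rename(self, old, new):
        # A rename only touches the index, so tags can never fall out of sync.
        with self.db:
            self.db.execute("UPDATE names SET name = ? WHERE name = ?", (new, old))
            self.db.execute("UPDATE tags SET name = ? WHERE name = ?", (new, old))

    def open(self, name):
        row = self.db.execute("SELECT hash FROM names WHERE name = ?", (name,)).fetchone()
        if row is None:
            raise KeyError(name)
        return (self.blobs / row[0]).open("rb")

    def names_tagged(self, tag):
        rows = self.db.execute("SELECT name FROM tags WHERE tag = ? ORDER BY name", (tag,))
        return [name for (name,) in rows]

    def close(self):
        self.db.close()
```

Because every rename goes through the store, the move/rename problem from the FAQ disappears; the price is that nothing may touch `blobs/` behind the store's back.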

Couldn't they just create a hardlink in a private, hidden directory that they control, and then symlink to that?

Then, it's OK if the original file gets renamed or moved, as long as it stays on the same FS. You still have your hardlink, and so your symlink still works.
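Here is a quick sketch of the hardlink-plus-symlink trick, assuming a POSIX system and that the private directory is on the same filesystem as the original (hard links cannot cross filesystems). The `pin`/`expose` helpers and the `.tags-store` directory name are hypothetical.

```python
import os
from pathlib import Path


def pin(original, private_dir, name):
    """Hard-link `original` into `private_dir`; the data now has two names."""
    private_dir = Path(private_dir)
    private_dir.mkdir(parents=True, exist_ok=True)
    anchor = private_dir / name
    os.link(original, anchor)  # fails with OSError across filesystems
    return anchor


def expose(anchor, view_path):
    """Create the user-facing symlink, pointing at the private hard link."""
    Path(view_path).parent.mkdir(parents=True, exist_ok=True)
    os.symlink(Path(anchor).resolve(), view_path)
```

This also shows the flip side raised in the reply: deleting the original no longer frees the data, because the anchor keeps it alive. `os.stat(anchor).st_nlink` dropping back to 1 is one way to notice that the user has deleted their copy.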

What if you really want to delete the file, though (passwords, customer data, incriminating evidence)? Then you have to remember to delete it from this system too!

I've been considering writing something similar myself, and my plan had been to hardlink the files by their hash into my blobstore. It won't fix the move/modification case, but it would solve the problem for simple moves. But I guess they're trying to deal with remote filesystems, too, and I was not targeting those.

This idea is rediscovered every few years in a new project. I tried this with StorageBox many years ago and even did some UX research in this context. It turns out many users don't like querying alone, as they feel they cannot search their data exhaustively and might "lose" some of it this way.

Also, for non-tech-savvy users, folders have the nice interaction pattern of question and response via menu selection. They see something, click, see something, click, without realizing that they are navigating a folder hierarchy.

Not convinced. Why wouldn't a 'tag browser' work just like that too? Even better, since what I'm clicking on are meaningful tags, instead of 'directory names'

From my minor tech support experience, users don't understand what they've saved a file as. Search is great but if you don't know what the needle is it gets hard.

I empathize with the OP's view that there's a usability gap with out of the box file systems/namespace (short of bash/sed/awk/perl wrangling, natch), so I applaud this project.

I have a couple of points/concerns after a quick read.

1) So 'tag' is the verb that updates the records, but 'tags' is a read?

Poor taxonomy, IMHO. If I'm using "tmsu" I know I'm working with tags, so I'd think natural switches would be the way to go (tmsu add, tmsu ls|list, etc.). Or at least don't make the 'create' verb and the 'read' verb differ only by a plural 's'.

2) Is there a way to list all existing tags, not just the ones already bound to a file but all those available in the database (with regex filtering, of course)?

That's what I'd need to pick from my 'tag palette' before actually tagging, so I could avoid accidentally creating synonyms that would later require a merge.
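Independent of TMSU's own commands, the "tag palette" being asked for is a regex filter over every known tag plus a near-duplicate check before a new tag is created. This is a sketch over a hypothetical in-memory tag set, using the standard library's `difflib` for fuzzy matching.

```python
import difflib
import re

# Hypothetical tag database, including tags not currently applied to any file.
all_tags = {"jazz", "music", "mp3", "party", "work", "reciepts", "receipts"}


def list_tags(pattern=None):
    """All known tags, optionally filtered by a regular expression."""
    rx = re.compile(pattern) if pattern else None
    return sorted(t for t in all_tags if rx is None or rx.search(t))


def similar_tags(candidate, cutoff=0.8):
    """Existing tags close to `candidate`, to catch typos/synonyms before tagging."""
    return difflib.get_close_matches(candidate, sorted(all_tags), n=3, cutoff=cutoff)
```

Running `similar_tags` before creating a tag would have flagged the misspelled "reciepts" duplicate before it needed a merge.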

The reason hierarchical structures are used in file systems is because they are a pretty intuitive and, most importantly, generic way of classifying and storing information. For the most part, just about any file you have on your computer can be stuffed into some sort of folder structure.

Specialized files like music, movies and images are a solved problem. iTunes and other software do a great job at organizing this information and making it easy to use. There's also software like Quicken for dealing with the other common mountain of data people have.

What might be useful is a piece of software that is capable of extracting and managing metadata automatically. Think of a tool like iTunes: you feed it a collection of files, and it uses some form of ML to extract the metadata, build a database, and derive logical ontologies for the data. The big problem with this kind of tool is finding a large, complex dataset that an individual has but that has not already been organized by a specialized piece of software. I doubt these exist in numbers significant enough to justify creating a project.

tl;dr: Directory structures are low-effort, generic, and discoverable ways of dealing with files that are not managed by other applications. It will be hard to improve on them without sacrificing one of those three attributes.

I don't think it's intuitive at all. We don't ask questions like "What parent folder did I put that document in?" We ask "Where is that document I printed yesterday? The one that I got via email."

With liberal use of tags, and the ability to browse them fluidly, we could ask those sort of questions.

> My Documents [Sort by Date]

At the top.

The way that one person manages their documents probably isn't going to be the same as another, but generally a person is consistent with all their files.

In this case, it's either a document that you have several types of, in which case you would have an existing folder structure, or it's a one-off that you dump in My Documents along with all your other one-offs. Even if you lose it, file systems all keep date-time information, so you can easily search for all the files last modified yesterday.
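As a rough illustration of the "modified yesterday" search, here is a sketch that walks a directory tree and compares each file's local-time modification date to a target day. The helper names are hypothetical, and it uses mtime only; it cannot tell you which file you printed.

```python
from datetime import date, datetime, timedelta
from pathlib import Path


def modified_on(root, day: date):
    """Files under `root` whose last-modified time falls on calendar date `day` (local time)."""
    hits = []
    for p in Path(root).rglob("*"):
        if p.is_file() and datetime.fromtimestamp(p.stat().st_mtime).date() == day:
            hits.append(p)
    return sorted(hits)


def modified_yesterday(root):
    return modified_on(root, date.today() - timedelta(days=1))
```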

The problem with liberal tagging is that it requires a bunch of up-front effort that you're never going to perform. In this case, if it were an email, you'd just use your email browser (specialized software!) to find the document again based on the things you remember about it (date, source, size, etc).

I don't remember it by date or size or maybe even source. That's just parroting what we do now. I remember that I printed it. That could easily be a tag. The tags need not be created by me; the tools could be promiscuously tagging persistent data all the time, with useful clues. I'd learn some clues, learn to use them.

That 'my documents' thing - you could easily create a 'view' on tags that yielded that result. Without relying on Microsoft or whomever to do it for you.

How is this different than something like this:

    mkdir -p ~/tags/{music,big-jazz,mp3}
    ln -s /path/to/summer.mp3 ~/tags/music/
    ln -s /path/to/summer.mp3 ~/tags/big-jazz/
    ln -s /path/to/summer.mp3 ~/tags/mp3/
From there, you can use all normal filesystem tools to interact with your 'tags'. You could extend this with a simple script that handles duplicate file names in the same tag by sticking a hash of the file before the extension. Having a separate database for this information seems unnecessary.
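The duplicate-name script suggested above could look roughly like this: links are named `stem.<short content hash>.suffix`, so two different files both called `summer.mp3` can share a tag directory. The helper names and the 8-character SHA-256 prefix are assumptions.

```python
import hashlib
from pathlib import Path


def tag_link_name(target, length=8):
    """summer.mp3 -> summer.<first `length` hex chars of SHA-256>.mp3"""
    target = Path(target)
    digest = hashlib.sha256(target.read_bytes()).hexdigest()[:length]
    return f"{target.stem}.{digest}{target.suffix}"


def tag(target, tags_root, *tags):
    """Symlink `target` into one directory per tag, like the mkdir/ln -s recipe."""
    target = Path(target).resolve()
    for t in tags:
        tag_dir = Path(tags_root, t)
        tag_dir.mkdir(parents=True, exist_ok=True)
        link = tag_dir / tag_link_name(target)
        if not link.is_symlink():
            link.symlink_to(target)
```

One caveat of hashing the contents rather than the path is that editing the file leaves the link name stale; it still works as a link, it just no longer matches the content hash.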

It's still hierarchical. Try to tag a file both as jazz and as party music with folders. You can't.

Edit: Misread. Yes it would work. Well, consider this a tool for doing it automatically, without filling your hd with bogus folders and links

This supplies you with only one relation between the tags: "And Then". Ex. 'Path' (and then) 'to' (and then) 'summer.mp3'. Or 'tags' (and then) 'big-jazz'.

What you want is to also have AND, OR and NOT available. How would you find 'music' (and) 'mp3' (and not) 'big-jazz'?

I'm not saying this would be simple, or as nice as the tool given, but it's not impossible.

First, do an 'ls' directory dump of the 3 folders. Then 'cut' out the softlink destination from each entry. You'll end up with 3 files listing link targets. From there, 'grep' the music and mp3 lists together to get the targets that are in both music and mp3. Then you can do an inverse 'grep' to remove targets that are in big-jazz.

If the system creates hard links instead of soft links, the same could be done via inode numbers.
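For hard links, the comparison key becomes the (device, inode) pair rather than the link target. A sketch, assuming a POSIX filesystem where os.stat reports meaningful st_ino values. The function names are hypothetical.

```python
import os


def tagged_inodes(tag_dir):
    """(st_dev, st_ino) of every regular file (hard link) in a tag directory."""
    keys = set()
    with os.scandir(tag_dir) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                st = entry.stat(follow_symlinks=False)
                keys.add((st.st_dev, st.st_ino))
    return keys


def query_inodes(root, all_of, none_of=()):
    """Inode keys present in every tag of all_of and in no tag of none_of."""
    sets = [tagged_inodes(os.path.join(root, t)) for t in all_of]
    result = set.intersection(*sets) if sets else set()
    for t in none_of:
        result -= tagged_inodes(os.path.join(root, t))
    return result
```

Including st_dev matters because inode numbers are only unique within a single filesystem (hard links cannot cross filesystems anyway).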

* 4 commands vs 1

* Doesn't account for files with the same name

* No helper commands, like merge, for when you make a typo in a tag name

Just a couple of reasons off the top of my head

Think of the poor inodes!

I feel like I must be missing something, but tags seem like just a band-aid to me. If you have enough files, you will end up with too many tags to manage (think of a tag directory filled with tag files, instead of a home directory filled with actual files), so you'll need hierarchical tags. Or you kludge it with "tag/subtag", which looks a lot like a directory. Gmail added nested tags, which seems to me like an admission that flat tags are not sufficient.

One problem is that regular people don't know how to organize a hierarchy. General -> specific works really well, but that requires the ability to generalize.

The only use I can see for tags is if you want files to be a member of more than one directory. Other than music, everything I have is generally created for a specific purpose, so tags are not particularly helpful.

So tag order is important?

i.e. `music mp3 folk` results in a different virtual file system than `folk music mp3`?

I often think of tags as an unordered set, rather than an ordered list.
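If the virtual directories are meant to behave as an unordered set, one way to get that is to collapse the path components into a frozenset before matching, so music/mp3/folk and folk/music/mp3 name the same view. This is a design sketch assuming relative view paths, not a description of how TMSU's mount actually resolves paths.

```python
from pathlib import PurePosixPath


def tags_from_view_path(view_path):
    """Each path component under the mount point is a tag; order is discarded."""
    return frozenset(PurePosixPath(view_path).parts)


def resolve_view(view_path, file_tags):
    """Sorted files whose tag set contains every tag named in the path."""
    wanted = tags_from_view_path(view_path)
    return sorted(f for f, tags in file_tags.items() if wanted <= tags)


file_tags = {
    "summer.mp3": {"music", "mp3", "folk"},
    "winter.mp3": {"music", "mp3"},
}
```

With this design, resolve_view("music/mp3/folk", file_tags) and resolve_view("folk/music/mp3", file_tags) return the same list, so the path order is only cosmetic.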

An ordered list smells a bit like a hierarchy to me...

This is actually kind of akin to BeFS, although BeFS had greater capabilities in some ways, being an actual filesystem. For the uninitiated, BeFS was the native filesystem of BeOS, and it supported metadata attributes with querying and indexing capabilities akin to a relational DB. Or at least, that's what Wikipedia says.

I just keep all my work files (documents, downloads, etc.) in my desktop folder (it's the "top-level" folder in Windows when you press Alt-Up, and I have it symlinked at /desktop/ too).

My default view is a detail view sorted by "date accessed" (descending) which is what I need 99% of the time. Especially handy when uploading random images, quick edits etc from the browser.

By the way, I highly recommend https://pathcopycopy.codeplex.com/ for anyone who uses a terminal and Windows Explorer side by side a lot.

Why not use file attributes and provide a nice interface for managing them (e.g. extend find to search them)? A parallel database is fragile: it can trivially get out of sync.
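A minimal sketch of tags kept in extended attributes, using Python's Linux-only os.getxattr and os.setxattr. Storing all tags comma-separated in a single attribute named user.tags is an assumption made for illustration. The user. prefix is the namespace unprivileged processes may write on Linux, and the filesystem must have user xattrs enabled.

```python
import errno
import os

TAG_ATTR = "user.tags"  # hypothetical attribute name


def get_tags(path):
    """Tags stored on the file itself; an empty set if the attribute is absent."""
    try:
        raw = os.getxattr(path, TAG_ATTR)
    except OSError as e:
        if e.errno == errno.ENODATA:  # attribute not set on this file
            return set()
        raise
    return {t for t in raw.decode("utf-8").split(",") if t}


def set_tags(path, tags):
    os.setxattr(path, TAG_ATTR, ",".join(sorted(tags)).encode("utf-8"))


def add_tags(path, *tags):
    set_tags(path, get_tags(path) | set(tags))


def remove_tags(path, *tags):
    set_tags(path, get_tags(path) - set(tags))
```

Because the tags live on the inode, they survive renames and moves within a filesystem. Whether they survive a copy depends on the tool: GNU cp needs --preserve=xattr and rsync needs -X.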

I thought the same thing, but in practical terms I guess you'd need to maintain a separate db for portability anyway.

It's been a generation since the hierarchical file system became obsolete. It's lame to have to hang every file on the ceremonial file tree like some Christmas ornament. Hardly any app wants data organized like that.

In fact, nearly every large app does something to avoid it. They create their own representations of a log, or a mail folder, or a document, or an image (and on and on) and manage the details themselves. Because 'file systems' are so lame and underpowered.

This tool begins to help. Creating flexible groupings (tags) resembles a relational database. That's a start. I'd like to replace the OS file system with something like that.

Instead of renaming files when you bring another copy onto your persistent storage, you could just add a version tag to them. Leave the names alone! I can tell my build system (or document store, or mail tool) what version I want to deal with e.g. tag='version' value='2.5'. No collisions any more. No requirement by the 'file system' to mash them into some file tree so they can still be found, but don't collide.
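A rough sketch of the version-tag idea as a tiny relational schema, using Python's built-in sqlite3. Two files keep the same name and are told apart only by a version=... tag. The table layout and the add_file/find helpers are assumptions for illustration, not TMSU's actual database schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE file (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE tag (
    file_id INTEGER NOT NULL REFERENCES file(id),
    name    TEXT NOT NULL,
    value   TEXT NOT NULL DEFAULT '',
    PRIMARY KEY (file_id, name, value)
);
"""


def add_file(db, name, **tags):
    """Insert a file row plus one tag row per name=value pair."""
    cur = db.execute("INSERT INTO file (name) VALUES (?)", (name,))
    db.executemany("INSERT INTO tag (file_id, name, value) VALUES (?, ?, ?)",
                   [(cur.lastrowid, k, str(v)) for k, v in tags.items()])
    return cur.lastrowid


def find(db, name, **tags):
    """Ids of files called `name` that carry every given tag=value pair."""
    sql = "SELECT id FROM file WHERE name = ?"
    params = [name]
    for k, v in tags.items():
        sql += " AND id IN (SELECT file_id FROM tag WHERE name = ? AND value = ?)"
        params += [k, str(v)]
    return [row[0] for row in db.execute(sql + " ORDER BY id", params)]


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
old = add_file(db, "libfoo.so", version="2.4")
new = add_file(db, "libfoo.so", version="2.5")
```

find(db, "libfoo.so", version="2.5") picks out one copy, while find(db, "libfoo.so") still sees both; nothing had to be renamed to avoid a collision.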

In fact this system can do everything the hierarchical file system can do, and more. Just add the tag 'parent directory name' and voila! You have a file tree (if you want).
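To make the "parent directory name" point concrete: if each object carries a parent tag, a path is just the chain of parents followed upward, and a directory listing is a query for everything with a given parent. A sketch, assuming for simplicity that every object has a unique name (a real store would key on ids). The helper names are hypothetical.

```python
def path_of(node, parent):
    """Join the chain of 'parent' tags from the root down to node."""
    parts, seen = [], set()
    while node is not None:
        if node in seen:
            raise ValueError(f"parent tags form a cycle at {node!r}")
        seen.add(node)
        parts.append(node)
        node = parent.get(node)
    return "/".join(reversed(parts))


def children(node, parent):
    """Everything whose parent tag is node, i.e. a directory listing."""
    return sorted(name for name, p in parent.items() if p == node)


parent = {"music": None, "jazz": "music", "summer.mp3": "jazz", "winter.mp3": "jazz"}
```

Here path_of("summer.mp3", parent) gives "music/jazz/summer.mp3". The cycle check is needed because, unlike a real directory tree, nothing in a free-form tag store prevents two objects from naming each other as parent.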

I seem to recall reading somewhere that one of the reasons Vista was a "bad" OS was that it originally had much higher, more optimistic ambitions to build a SQL-like file system (similar to this, I suppose). However, it ran into issues, and the decision to scrap it delayed the Vista project and reduced the scope of the "cool" things it was supposed to deliver.

The thing you're thinking of was indeed supposed to be in Vista, then post-Vista, and was then scrapped altogether: WinFS.


It was promised for Cairo (which became Win95), under the name Object File System.

Different name, exact same promises.

An interesting chart appears on the Wiki explaining how Microsoft keeps tilting at this windmill and has two decades of cancelled or scaled back products to show for it.

I suspect there is an inherent scaling problem with the approach. While it demos well and can be useful in a lab setting, the full scale version ends up being so complex that mere mortals aren't able to comprehend it and end up just losing their files constantly in the morass of schemas and structured accesses.

The talk on the wiki about being able to write an editor that can handle any file type smells of pure insanity to me. An application that can present a usable editing interface for any filetype would be so bloated that it's almost inconceivable to think about a single company writing it. Can you imagine the kind of application that has an interface for manipulating photos, writing reports, tabulating data, programming (all languages), doing CAD, editing textures, composing music, etc...?

I can imagine such a thing; it's called a programming language. A GUI application, you say? Well then you're just asking for a graphical programming language, and as they say, now you have two problems...

I think the more general-purpose a tool is, the more intelligence and/or general knowledge is needed to wield it:

"Here's a widget thromper. You push the big green button and it thromps the widgets."


"Here's a car. It will get you from virtually anywhere to anywhere else. Don't kill anyone."


"Here's a general purpose computer. Knock yourself out."

Ok, we need you to work on the music for this TV show. Your first job will of course be to build your composition suite from scratch so you can access our incredibly complicated database oriented file storage system.

If you're competent with your language, you don't always need to build a special purpose tool with it to do work - you can use it directly. You picked an example where a) libraries are lacking (high level audio manipulation) and b) good special purpose tools exist, but it's entirely possible to find tasks for which these things aren't true. The whole idea of shell scripts as a basic tool, or IPython/Jupyter as a reasonable interface to a variety of tasks, is based on this concept.

In my last "proper" job, I wrote a lot of code, and the vast majority was one-off; virtually none of it ended up in what you might call an "application".

UI nitpicking:


    $ tmsu tag summer.mp3 music big-jazz mp3
    $ tmsu tag --tags "music mp3" foo.mp3 bar.mp3
    $ tmsu tag spring.mp3 year=2003

Are confusing. Very non-Unix. They should be:

    $ tmsu tag music,mp3,year=2003 summer.mp3

I have to ask, how are they non-unix?

They're inconsistent. Sometimes the file name comes before something and sometimes after something. It's a lot easier to remember commands if their structure is uniform, and Unix commands usually have the general structure of "program [options] [files]"
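For illustration, a hypothetical tag command with the "program [options] [files]" shape, parsed with Python's argparse. The first positional argument is a comma-separated tag list in which name=value pairs carry values, and everything after it is a file. This is a sketch of the suggested interface, not TMSU's actual CLI.

```python
import argparse


def parse_tag_spec(spec):
    """'music,mp3,year=2003' -> [('music', None), ('mp3', None), ('year', '2003')]"""
    tags = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas
        name, sep, value = item.partition("=")
        tags.append((name, value if sep else None))
    return tags


def build_parser():
    parser = argparse.ArgumentParser(prog="tag")  # hypothetical tool name
    parser.add_argument("tags", type=parse_tag_spec,
                        help="comma-separated tags, e.g. music,mp3,year=2003")
    parser.add_argument("files", nargs="+", help="files to tag")
    return parser
```

One trade-off of this uniform shape is that a tag value can no longer contain a comma without some escaping rule, which the space-separated TMSU form does not need.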

This is an interesting idea, but it seems like an awful lot of work on my part to go through and organize and tag every file. While some of it could be automated (pulling ID3 tags out of MP3s), a lot of it seems to depend on me figuring out good names for everything I make.
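As an example of that automation, here is a sketch that turns an MP3's ID3v1 block (the fixed 128-byte trailer starting with TAG) into name=value tags. ID3v1 is the simplest case. Most modern files carry ID3v2 frames at the start of the file, which are better handled by a library such as mutagen. The read_id3v1 and id3_tags names are hypothetical.

```python
import os

# ID3v1 layout: "TAG" + title(30) + artist(30) + album(30) + year(4) + comment(30) + genre(1)
FIELDS = [("title", 3, 33), ("artist", 33, 63), ("album", 63, 93), ("year", 93, 97)]


def read_id3v1(path):
    """Non-empty ID3v1 text fields of a file, or {} if it has no ID3v1 tag."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() < 128:
            return {}
        f.seek(-128, os.SEEK_END)
        block = f.read(128)
    if block[:3] != b"TAG":
        return {}
    fields = {}
    for name, start, end in FIELDS:
        # fields are NUL- or space-padded ISO-8859-1 text
        value = block[start:end].split(b"\x00", 1)[0].decode("latin-1").strip()
        if value:
            fields[name] = value
    return fields


def id3_tags(path):
    """Turn ID3v1 fields into name=value tag strings."""
    return sorted(f"{k}={v}" for k, v in read_id3v1(path).items())
```

The output strings, such as "year=2003", match the name=value tag syntax shown in the TMSU examples, so they could be passed on to whatever tagging tool you use.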

The biggest problem is that I have to figure out which tags are going to be useful to me in the future and where to add them. This is relatively easy for music (but even there can explode in complexity depending on how granular you want to be), but more difficult for things like photos or papers.

IMHO, fully general tagging systems never work because the complexity explodes as the number of potential tags increases. You need to narrow the scope down to a specific domain so your tags can be limited to human scale.

Very cool attack on the problem.

I've been working on ideas for managing filesystems and tools for thought off and on for a while; it's starting to coalesce into a set of design ideas as well as prototypes.

What I won't do is set up a filesystem: I don't think that adds value to me; it mostly sounds technically complex and hard to figure out. And, I think that driving the whole business through manual tagging is a lost cause. Manual tagging can be _useful_, but actual attempts to derive semantic knowledge will be _more_ useful. I have many thousands of documents - manual tagging ain't gonna happen.

I need _semantic_ search and _semantic_ cross-referencing; something like a Xanadu or a (much better) wiki/hypertext system.

You can do the same thing with git-annex: https://git-annex.branchable.com/tips/metadata_driven_views/

This reminds me of BeOS's BFS, which let you do all this as part of the FS itself :)

I can't believe this, but I created something like this in an ad-hoc way. I have a large amount of music, and I needed to be able to transfer them to devices (my phone) based on some sort of tag - I need work-out music, I need driving music, etc. Genres are inappropriate for this. The easiest way to transfer this to the device is have a directory full of links that point to the right files / directories.

So, depending on the need, I will either have two directories (files and tags) or a number of file directories and a tag directory. File directories can have whatever they want, and tag directories have either only tag directories (like workout, driving, etc) or soft links.

Tagging a file / directory is easy - just link to it. Untagging is just as easy. The links don't take up much space, especially next to the music. When I'm transferring the files, I either use a script to make a directory with the links replaced with their file counterparts, or I transfer with something like rsync that can do that itself.
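The "replace the links with their file counterparts" step can be done with rsync's --copy-links (-L) option, or in a few lines of Python. shutil.copytree with symlinks=False copies whatever each link points to, including linked directories. The export_tag name and the paths are hypothetical.

```python
import shutil
from pathlib import Path


def export_tag(tag_dir, dest):
    """Copy a tag directory to dest, materializing every symlink.

    With symlinks=False, copytree copies the target file (or, recursively,
    the target directory) of each link instead of recreating the link.
    ignore_dangling_symlinks=True skips links whose target no longer exists.
    dest must not already exist.
    """
    return Path(shutil.copytree(tag_dir, dest, symlinks=False,
                                ignore_dangling_symlinks=True))
```

Something like export_tag("tags/workout", "/media/phone/Music/workout") would then produce a plain directory tree that any device can read.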

I'm amazed how similar this project is to my own solution. It's nice to have a dedicated script for the whole thing, but the solution itself is very simple, and easy to script with.

This isn't born out of frustration - hierarchical filesystems are perfectly adequate for most tasks. But they have not been trees for a long time - we have links, which let us make any graph we want out of those trees.

This is just PDM (product data management). I built a system that does all of this and more, and it integrates into our company's existing products for managing manufacturing data. While my version only watches the directories you tell it to, it also detects file movement and changes, via a service and by hashing files. I'd be more impressed if someone built this into a file system directly. Combine advanced tag-based indexing with ZFS and you've got something impressive.
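Move detection by content hashing can be sketched roughly as follows. Snapshot the watched tree as hash -> paths, and treat any content that disappeared from one path and appeared at another between snapshots as a move. This illustrates the general technique only. The commenter's product is not shown, the function names are hypothetical, and a real watcher would also need inotify-style events and handling for files that were edited as well as moved.

```python
import hashlib
import os


def content_hash(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(root):
    """Map content hash -> set of paths (relative to root) for every file."""
    snap = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            snap.setdefault(content_hash(full), set()).add(os.path.relpath(full, root))
    return snap


def detect_moves(before, after):
    """(old_path, new_path) pairs for content that vanished from one path and appeared at another."""
    moves = []
    for digest, old_paths in before.items():
        new_paths = after.get(digest, set())
        gone = sorted(old_paths - new_paths)
        appeared = sorted(new_paths - old_paths)
        moves.extend(zip(gone, appeared))
    return sorted(moves)
```

Hashing every file on each pass is expensive for large trees, so a practical version would typically re-hash only files whose size or mtime changed.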

An old solution to this problem was the Logic File System https://en.wikipedia.org/wiki/Logic_File_System (disclaimer: I am one of the authors).

I looked for a tagging system months ago, did fairly extensive research into existing solutions (rather than writing a FUSE layer myself) and TMSU was the leading result. I installed it and it's all ready to use.

Today, it's still all ready to use. I haven't touched it. I'm actually quite happy with the way my filesystem works, I just had this idea how great it would be to work with tag selections instead.

The only reason I might still use a tagging system is to tag some files I want to back up manually (if at all), like a 50GB disk image or some temporary big download, but in general I create one or two symlinks a year and I'm good. The hierarchy works fine.

> in general I create one or two symlinks a year and I'm good. The hierarchy works fine.

Same here. I have a few things that tags would be nice for, but they come up infrequently enough that current filesystems are fine. A couple of examples:

* Tagging the source [CD/iTunes/Amazon etc.] of my music: tags would be nice (and possibly doable with ID3 tags etc.), but "/music/source/artist/album" or "/music/artist/album [source]" works fine

* Multiple paths to the same file - I have various media (movies, music, books, TV shows etc.) related to a single series in one folder. Those should also be in my main music etc. folders. Again tagging with the series name to make a virtual "series" folder would be nice, but symlinks solve that and it happens infrequently enough that it isn't an issue.

Other than those edge cases, I'd say most of my data fits pretty well into a hierarchical structure.

Reminds me of ReiserFS's original goals.


Seems like the Tags feature that has been in Mac OS X since 10.9.

Was just going to post this. You can also access them from the CLI with the mdfind command, e.g. mdfind tag:jazz

I could see where a tool like this would be a huge help to legal and compliance departments.

There is a frequent need to produce documents for discovery purposes. However, you generally have to review the documents, or send them to outside counsel for review, before they are produced. This can take hundreds of man-hours and cost thousands of dollars.

Building something on top of TMSU could be a great solution for this task.

It seems to me it would be a better idea to organize the VFS around queries: a command-line command returns a query id, which represents a directory in the VFS (so the command is a bit like mkdir for the VFS). The directory might then be '/foo/tmsumountpoint/<queryid>/', containing symlinks to all files found by the query.

I'm sure FUSE can do this.
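A static approximation of that query-directory idea, without FUSE. A real FUSE implementation would compute listings on demand, whereas this sketch writes the symlinks once. Deriving the query id by hashing the query text, and the query_id/materialize names, are hypothetical choices for illustration.

```python
import hashlib
from pathlib import Path


def query_id(expression):
    """Hypothetical: derive a stable id from the query text."""
    return hashlib.sha1(expression.encode("utf-8")).hexdigest()[:12]


def materialize(mount, expression, paths):
    """Create mount/<query id>/ with one symlink per matching file."""
    qdir = Path(mount) / query_id(expression)
    qdir.mkdir(parents=True, exist_ok=True)
    for p in paths:
        target = Path(p).resolve()
        link = qdir / target.name
        if link.is_symlink() and link.resolve() == target:
            continue  # already present from an earlier run
        if link.is_symlink() or link.exists():
            # same file name from a different directory: add a path hash
            suffix = hashlib.sha1(str(target).encode("utf-8")).hexdigest()[:8]
            link = qdir / f"{target.stem}.{suffix}{target.suffix}"
        if not link.is_symlink():
            link.symlink_to(target)
    return qdir
```

Because the id is derived from the query text, running the same query twice reuses the same directory instead of creating a new one.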

This reminds me of something that would be useful in implementing Desktop Neo https://news.ycombinator.com/item?id=10932378

Wasn't Windows 7 supposed to be built on a revolutionary new filesystem based on an RDBMS? I was pretty excited about that because it would have natively enabled a lot of the features listed here. Unfortunately it was one of the features they cut when Win7 went over budget and over deadline, and they never brought it back for subsequent releases.

It seems to me most of this stuff can be done from the Unix command line, if you are adept enough... without this app.

You can have some poor man's tag management with a subdirectory per tag containing soft links, but it gets unwieldy pretty quickly.

It's not terribly portable, but you can use extended file attributes.

Here's an article showing how they work for Linux, OSX, and FreeBSD: http://www.lesbonscomptes.com/pages/extattrs.html

What existing tools can automatically generate a virtual filesystem based on tag metadata?

Tagging can be done with symlinks if you just want to stick to using the filesystem.

No, not a virtual filesystem; I meant querying your files by date or tag... find, grep and something to print the tag would do the job.

But where is the metadata? With OP's tool, it is in some db, so you don't have to tamper with your files or their name.

Well, with MP3 files the metadata is in the file; it's called an ID3 tag. For other files, it depends on the file.

Spotlight can do the same thing, no? https://en.wikipedia.org/wiki/Spotlight_%28software%29

Hey guys, you should look at diamond.io, we're building something that is a tool to help you solve this problem and the problem of organizing information in general. Check us out!

"It's backed by a powerful Artificial Intelligence."


You're tackling a laudable goal, but it would help to link to a page with more technical details.

Haha, I don't want to go too much off topic of this thread, but essentially we can analyze actual file contents, source and metadata to find patterns between file structures and associate them to user and "common" labels.

On top of that we try to get further accuracy by using user information to help us find the context. We call it a Personal System, a repo of knowledge and learned preferences heavily tailored around each individual user. We're just in beta now but definitely put your email down if you want to try it out eventually!

I put my email down to spy on you:)

My personal feeling is that (A) you're totally right that we need better _personal_ organization systems, but (B) the bottom layer (tagging, schemaing, relationships) should not involve fuzzy processes but should be totally understandable by the average user.

I know (B) is a weird opinion though so I look forward to seeing how far you can get with (A) plus machine learning & whatever other tricks you guys plan:)

Absolutely, I think (B) is quite a big concern for people. While it's nice to have a fuzzy AI match things _most_ of the time, it's absolutely critical the user can always go in and override and take control of the situation.

So for us, we very clearly distinguish "a user labelled this item" vs "we guessed it was this label".

I'll keep you posted!

For some reason, I knew this would be about tagging files before reading the article.

> TMSU does not alter your files in any way: they remain unchanged on disk, or on the network, wherever you put them. TMSU maintains its own database and you simply gain an additional view, which you can mount where you like, based upon the tags you set up.

I'm not so sure this is a great design decision. Now your tags are only in your sqlite file, and you'll have to work extra hard to get a copy of the relevant tags when you backup/copy etc.

I think storing tags in extended attributes[x] would be better, possibly with a separate utility that maintains an index. The index hopefully shouldn't be needed just for the tags, but it might help with a) exposing file-level tags (like ID3, Exif, file type (magic number), etc.), and b) allowing automatic organization based on full-text and other content-based indexing.

It appears that on a *nix system, the only major reason to stay away from extended attributes (apart from the size limit on tag data) is NFS. But Samba should (AFAIK) work fine with extended attributes.

As far as I can gather, GNOME Beagle is dead, and GNOME Tracker[t] has taken its place. But it's not crystal clear whether Tracker will index tags placed in files' extended attributes. If I understand correctly, Tracker's own tagging utility will only place/edit tags in the Tracker database/index. But the indexers will certainly honour file-level tags for some files.

I don't really use full Desktop environments, but some kind of system with inotify support, and a Xapian or similar back-end (like Tracker), does seem like a good idea. It would certainly be nice to see such a system implemented in Go, but I think an architecture along the lines of Tracker is probably worth keeping: A database daemon, an indexer and a set of query/view tools (I'm not a fan of the centralized tag database, though).

Another alternative to Tracker would be Recoll:


[t] https://github.com/GNOME/tracker

[x] http://www.lesbonscomptes.com/pages/extattrs.html

Btw, for editing/automating ID3 tags, I recommend "Ex Falso", the tag-editor for Quod Libet (which is an audio player): http://quodlibet.readthedocs.io/en/latest/

This is really complicated

I wish it would support tag hierarchies.

This just seems like a different problem.

Nice logo. D:

