TMSU: Command-line tool for applying tags and viewing virtual tagged filesystem (tmsu.org)
108 points by walterbell 85 days ago | 47 comments



I messed with TMSU back one of the previous times it was posted. It's very cool and works well but I just couldn't make myself go retroactively apply tags to terabytes of existing files.

It almost feels like a personal categorization version of the "AI Bitter Lesson": people keep thinking that doing a bunch of manual taxonomy work is going to help them find files faster, but eventually search catches up.


My version of that is, "metadata curation is a fun hobby but search is king."


Search and LLMs can make good use of accurately labelled data.


> people keep thinking that doing a bunch of manual taxonomy work is going to help them find files faster but eventually search catches up

This. I spent many days cataloging, tagging, deduping and organising my photo and data files, programs, bookmarks, etc.

And I've barely used any of those photos or data files since. The time invested totally wasn't well spent, and I should have just left everything called "DSC0000565.jpg".


> retroactively apply tags to terabytes of existing files

This has always felt like one of the primary issues with tag-based lookup over hierarchical. By the time you're knee deep with enough stuff that you realise tags would help, you've already accumulated too much to practically deal with.

That and figuring out what the tags should be upfront and hoping you don't realise you need additional or different tags later on.


CLI tools (find, grep, locate, tmsu) enable bulk changes to tags.


Only to the files you can already easily find with find, grep, plocate, etc.

That's because they are already tagged via path, or in the file. I'm just going to wait for multimodal LLM tagging solutions to catch up, rather than just try to hack it with current models/tech.


Something like this just for photos has been in the back of my mind forever - it would be really nice to have a virtual folder built from images where the exif data says you used X camera or took the photos on X date. This would be useful for editing applications that are not catalog based, just point them at the virtual folder and query the images you want to edit, and there they are.

Edit - Someone mentioned BeFS but deleted their comment. It seems like it might sort of be supported in modern Linux, possibly read-only though:

https://github.com/torvalds/linux/tree/master/fs/befs
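For what it's worth, the "virtual folder" half of this can be faked without FUSE by building a directory of symlinks. A minimal Python sketch, assuming the metadata (camera model, date, whatever) has already been pulled out by some EXIF reader; `build_virtual_folders` and the `metadata` mapping are made-up names for illustration, and the EXIF parsing itself is deliberately left out:

```python
import os
from pathlib import Path


def build_virtual_folders(metadata: dict, view_root: str) -> list:
    """Create view_root/<value>/<filename> symlinks, one per photo.

    `metadata` maps a photo path to an already-extracted value such as the
    camera model. Photos that share a basename would collide; a real tool
    would have to disambiguate them.
    """
    links = []
    for photo, value in sorted(metadata.items()):
        # Keep the value usable as a single directory name.
        folder = Path(view_root) / value.replace("/", "_")
        folder.mkdir(parents=True, exist_ok=True)
        link = folder / Path(photo).name
        if not link.is_symlink():
            os.symlink(os.path.abspath(photo), link)
        links.append(link)
    return links
```

Pointing a non-catalog editor at `view_root/<camera>` then behaves like a saved query, with the obvious catch that the links go stale until the script is rerun.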


I'm sure one could whip together a FUSE filesystem like this very quickly. Here's something similar from 12 years ago: http://pisarenko.net/blog/2013/06/02/introducing-photofs-fus...


For sure, just need the time and motivation :)


I'm the guilty party who deleted the comment re: BeFS. I thought my analysis of the project was a little biting and, aside from mentioning BeFS, I didn't think my comment was adding much.

I thought about photos and EXIF tags, too. Duplicating the data from the EXIF into another repository strikes me as a bad idea. That's why I was pining for BeFS.

(I have a lot of crazy ideas about filesystems (arguably more like digital asset management systems) and data ingestion and export. Ideas kind of like the failed WinFS. Nothing will ever come of it because I don't have the skills or the time, but sometimes in fever dreams I imagine this stuff.)


With regard to BeFS (or BFS), the native BeOS (and Haiku) filesystem:

The TMSU examples for mp3 files + VFS are similar to BeOS.

One of the BeOS advocates - Scot Hacker - created a bash script for ripping CDs into MP3s called RipEnc. It would query the CDDB to get the metadata - track names/artists etc, so the files would be renamed from TRACK1 to e.g. "Dead Milkmen - Punk Rock Girl" for the CD. It would then convert the CD tracks to MP3 files. The metadata would be added both in the MP3 ID3 fields, as well as to the extended attributes of the files in BFS, and it would organize the music in folders by Artist or Album or something.

You could then have a query - a virtual folder/directory that lists files based on extended attributes - all mp3 files by ARTIST foo, and from ALBUM bar, that would stay updated if the file metadata changed. I can't remember if this virtual directory was available at the command line - or if it was only available in Tracker (the native BeOS/Haiku file manager).

The problem with this, and it's not just a BFS problem, is that the metadata in the file and the metadata about the file can get out of sync, either when updating it or when transferring it to another system that doesn't support the extended attributes.


If you mostly want to query and can live without the VFS, dogsheep[1] is your friend. It's a general tool to import lots of different data types into a personal sqlite instance, and dogsheep-photos[2] both extracts image metadata and uploads all the pics to S3 if you'd like.

On my to-try list, there's also supertag[3], a tag-based filesystem that's mounted via FUSE.

[1] https://dogsheep.github.io/ [2] https://github.com/dogsheep/dogsheep-photos [3] https://amoffat.github.io/supertag/


Apple/Android devices could assist with offline image analysis and metadata generation, https://github.com/mazzzystar/Queryable


I've used exif-database for something similar but it doesn't build the folder, it just lets you query the sqlite database to find what you are looking for.

https://github.com/perk11/exif-database


Check out Lightroom.


I have 41k photos in my Lightroom catalog, I’ve checked it out.

That doesn’t work when I want to use Capture One (Lightroom does not apply Phase One calibration profiles, which makes it useless for them) or my own raw processor for Sinar digital backs.

Recommending the most common digital photo DAM/editor is not really a helpful comment either. The number of people who know what exif is and don’t know about Lightroom has to be…small.



I'll admit when I first saw this, I was put off by the idea of having a separate tool to do something that _should_ be baked into the filesystem. But honestly, this is pretty close to a very Unixy way of solving the problem. Have a separate (and importantly: optional) tool that does the job and does it well.

Additionally, Linux _does_ support tagging files right in the filesystem via the user.xdg.tags xattr. Although it looks like Dolphin is one of the few userspace tools that knows about it.
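A minimal Python sketch of reading and writing that attribute, assuming Linux (`os.getxattr` and `os.setxattr` are Linux-only), a filesystem that allows user xattrs, and assuming the value is a comma-separated list of tags. The helper names are made up for illustration:

```python
import os

XDG_TAGS = "user.xdg.tags"  # attribute name from the comment above


def decode_tags(raw: bytes) -> list:
    """Split the (assumed) comma-separated value into a clean tag list."""
    return [t.strip() for t in raw.decode("utf-8").split(",") if t.strip()]


def encode_tags(tags: list) -> bytes:
    """Join tags with commas, dropping duplicates but keeping order."""
    return ",".join(dict.fromkeys(tags)).encode("utf-8")


def read_tags(path: str) -> list:
    """Return the tags stored on path, or [] if the attribute can't be read."""
    try:
        return decode_tags(os.getxattr(path, XDG_TAGS))
    except OSError:
        # Covers both "no such attribute" and "xattrs unsupported here".
        return []


def add_tag(path: str, tag: str) -> None:
    """Append one tag to whatever is already stored on the file."""
    os.setxattr(path, XDG_TAGS, encode_tags(read_tags(path) + [tag]))
```

Whether a given file manager picks up tags written this way is something I'd test rather than assume.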


This reminds me a bit of the DESCRIBE command in 4DOS ca. 35 years ago [0]. It was only a single text entry per file, but supported by many tools [1], including file managers. There was a proposal to extend the format to XMP properties [2].

[0] https://archive.org/details/bitsavers_jpsoftware_65101374/pa...

[1] https://4dos.info/4tools.htm#02

[2] http://www.optimasc.com/products/fileid/4dos-descext.pdf


Yoo, this is such a great idea. I once saw a video of a YouTuber creating open source tag software, and it made me realize I'd had the same frustration and needed something like this once, so I installed that tag CLI tool.

But this seems even better. This is why I am on Hacker News.


See also:

- "Designing better file organization around tags not hierarchies" [1]

- `tag` - a macOS version of `tmsu` that uses the system tags (xattr-based if I recall) [2]

[1] https://www.nayuki.io/page/designing-better-file-organizatio...

[2] https://github.com/jdberry/tag


I also inadvertently implemented something similar as a zsh script [1] and as a simple rust CLI [2] a couple years ago.

[1] https://github.com/xdoardo/zshelf

[2] https://github.com/xdoardo/shelf


xattrs are great, and would be the obvious solution for tags/metadata on Linux too, if the syscall API didn't delete them at every opportunity; programs must be explicitly told not to do that.

https://wiki.archlinux.org/title/Extended_attributes

That being said, it'd be cool to see a port of that CLI to Linux using user.xdg.tags. You can avoid deleting them if you're careful.


I was wondering why this wasn't using xattrs.


Yeah, when I was looking for a Linux version I kept finding tag again and again lol


There are multiple projects that attempt a similar thing with SQLite, the most recent one being TagStudio. It seems like tag-based file organization is the better solution, but the mental cost/effort of upkeep is what keeps it from gaining any traction in the long run.

https://github.com/TagStudioDev/TagStudio/

https://www.youtube.com/watch?v=wTQeMkYRMcw
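For anyone wondering what "tags in SQLite" boils down to, here is a minimal Python sketch. The three-table schema and the function names are my own assumptions for illustration, not the actual schema of TMSU, TagStudio, or any other project mentioned here:

```python
import sqlite3

# Hypothetical schema: files, tags, and a many-to-many link table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS file (id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL);
CREATE TABLE IF NOT EXISTS tag  (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE IF NOT EXISTS file_tag (
    file_id INTEGER NOT NULL REFERENCES file(id),
    tag_id  INTEGER NOT NULL REFERENCES tag(id),
    PRIMARY KEY (file_id, tag_id)
);
"""


def open_store(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the tag database."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def tag_file(conn: sqlite3.Connection, path: str, *tags: str) -> None:
    """Attach tags to path, creating file and tag rows as needed."""
    conn.execute("INSERT OR IGNORE INTO file (path) VALUES (?)", (path,))
    for name in tags:
        conn.execute("INSERT OR IGNORE INTO tag (name) VALUES (?)", (name,))
        conn.execute(
            "INSERT OR IGNORE INTO file_tag (file_id, tag_id) "
            "SELECT f.id, t.id FROM file f, tag t "
            "WHERE f.path = ? AND t.name = ?",
            (path, name),
        )
    conn.commit()


def files_with_all(conn: sqlite3.Connection, *tags: str) -> list:
    """Return paths that carry every one of the given tags."""
    wanted = list(dict.fromkeys(tags))  # de-duplicate, keep order
    if not wanted:
        return []
    marks = ",".join("?" for _ in wanted)
    rows = conn.execute(
        "SELECT f.path FROM file f "
        "JOIN file_tag ft ON ft.file_id = f.id "
        "JOIN tag t ON t.id = ft.tag_id "
        f"WHERE t.name IN ({marks}) "
        "GROUP BY f.id HAVING COUNT(*) = ? "
        "ORDER BY f.path",
        (*wanted, len(wanted)),
    )
    return [row[0] for row in rows]
```

The storage is the easy part. Nothing in this sketch notices when a file is moved or renamed, which is exactly the upkeep cost.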


User notes and other file metadata have been supported since Linux kernel 2.6. See man xattr:

https://man7.org/linux/man-pages/man7/xattr.7.html

Since it is baked into the file system, it is pretty easy to create bash scripts to add keyword tags by parsing the directory tree (e.g. batch add tags to books, movies, videos, etc stored in hierarchical category directories).
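The same idea as a minimal Python sketch instead of bash, assuming every directory name between the collection root and the file should become one lowercase tag, and assuming the `user.xdg.tags` attribute mentioned elsewhere in this thread holds a comma-separated list. The function names are made up, and `os.setxattr` is Linux-only:

```python
import os
from pathlib import Path


def tags_from_path(path: str, root: str) -> list:
    """Turn each directory between root and the file into a lowercase tag."""
    parts = Path(path).relative_to(root).parent.parts
    return [p.lower().replace(" ", "-") for p in parts]


def plan_tags(root: str) -> dict:
    """Walk root and map every file to the tags implied by its location."""
    plan = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            plan[full] = tags_from_path(full, root)
    return plan


def apply_plan(plan: dict) -> None:
    """Write the planned tags as an xattr (overwrites any existing value)."""
    for path, tags in plan.items():
        if tags:
            os.setxattr(path, "user.xdg.tags", ",".join(tags).encode("utf-8"))
```

Splitting it into a plan step and an apply step makes it easy to print the plan and eyeball it before touching a few thousand files.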


Preservation of xattrs is app-dependent? https://news.ycombinator.com/item?id=42807500


Most apps have flags to be used to preserve xattrs. Others were written as though they never knew xattr existed and therefore fail to use the kernel flags to preserve them. But it is not that difficult to preserve them once you are using them.

I use rsync to back up my book, music, and video collections (which is where I use them) and the metadata is backed up with them - so if something ever happens I can always restore them. The xattr commands also have a backup and restore for just the xattr built in.
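The "you have to ask for it" behaviour shows up in Python's standard library too, which makes for a small illustration (this is not rsync, just an analogous case): `shutil.copy` copies data and permission bits only, while `shutil.copy2` also copies extended attributes on Linux where possible. A minimal sketch, with made-up helper names, assuming Linux since `os.listxattr` is Linux-only:

```python
import os
import shutil


def copy_keeping_xattrs(src: str, dst: str) -> str:
    """Copy data plus metadata; on Linux copy2 also copies xattrs where possible."""
    return shutil.copy2(src, dst)


def lost_xattrs(src: str, dst: str) -> list:
    """Names of extended attributes present on src but missing on dst."""
    return sorted(set(os.listxattr(src)) - set(os.listxattr(dst)))
```

Running `lost_xattrs` after a backup or restore is a cheap way to find out which step in a pipeline is the one dropping the tags.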


Could a custom Linux kernel force xattr preservation flags on even for xattr-unaware apps?


Assuming the application used kernel file system calls, I don't see why not.


I still maintain that all it would take to add tags to the unix filesystem is to start calling "files" "tags"

then use ln to add tags.


Wouldn't really help for the use cases that I can think up.

  $ ls -l # How to grep for files with tag "foo"?

  $ find . -tag foo # wished-for syntax; find has no -tag test


The hardest (most inefficient) part is finding out what tags a file has.

    find . -samefile "group/beatles/love me do"

    ./album/please please me/side 2/1
    ./vocals/paul mccartney/love me do
    ./vocals/john lennon/love me do
    ./year/1963/love me do
    ./group/beatles/love me do

You have to love ontology to go down this route. I do, and did this once... It is possible, but does not really provide any meaningful advantage.
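The reason it is inefficient: a hard link has no back-pointer, so listing a file's "tags" means walking the whole tree and comparing inode numbers, which is roughly what `find -samefile` does. A minimal Python sketch of that lookup, with made-up function names:

```python
import os


def same_file_paths(root: str, target: str) -> list:
    """List every path under root that is a hard link to target."""
    want = os.stat(target)
    matches = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            if (st.st_dev, st.st_ino) == (want.st_dev, want.st_ino):
                matches.append(path)
    return sorted(matches)


def tags_of(root: str, target: str) -> list:
    """Treat the directory holding each link as one tag of the file."""
    return [
        os.path.relpath(os.path.dirname(p), root)
        for p in same_file_paths(root, target)
    ]
```

Every lookup costs a full scan of the tree, whereas a database-backed tool can answer the same question from an index.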


Makes me think of tagspaces https://www.tagspaces.org/


I really wanted this to exist about 15 years ago, but at the time didn't have the skills to make it happen. Nowadays though I'd want something that can sync between Linux and Android at the very least. But I've gotten too comfortable with find and grep to bother.


If you are interested in one of the absolute worst ideas that worked flawlessly over the years: I wrote a 100-line python script that just appends a magic number and one or more tags as ascii at the end of an arbitrary file.
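Roughly the shape of it, as a minimal sketch rather than the actual script: the marker bytes, the comma-separated layout, and the function names below are assumptions made up for illustration.

```python
import os

MAGIC = b"\x00TAGS\x00"  # hypothetical marker, not any real format
TAIL = 4096              # only the last TAIL bytes are searched


def _find_trailer(f):
    """Return (offset of the marker, tags), or (None, []) if untagged."""
    size = f.seek(0, os.SEEK_END)
    start = max(0, size - TAIL)
    f.seek(start)
    tail = f.read()
    pos = tail.rfind(MAGIC)
    if pos < 0:
        return None, []
    payload = tail[pos + len(MAGIC):].decode("ascii")
    return start + pos, [t for t in payload.split(",") if t]


def read_trailer_tags(path: str) -> list:
    """Read the tags appended to the end of the file, if any."""
    with open(path, "rb") as f:
        return _find_trailer(f)[1]


def add_trailer_tags(path: str, *tags: str) -> None:
    """Merge new tags into the trailer, rewriting it in place."""
    with open(path, "r+b") as f:
        offset, old = _find_trailer(f)
        merged = list(dict.fromkeys(old + list(tags)))
        if offset is not None:
            f.truncate(offset)  # drop the old trailer first
        f.seek(0, os.SEEK_END)
        f.write(MAGIC + ",".join(merged).encode("ascii"))
```

It only works because most media and document parsers ignore trailing bytes. Anything that checksums or signs the whole file will notice the change.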


Nice. What were some files/apps that didn't complain about the unexpected suffix?


I mostly use it to tag media files (music, video, photos), pdfs, docs/spreadsheets. So far I haven't encountered any issues. But obviously if you tag a plain text file you are gonna see the tags at the end.


Nice idea. All these types of databases should indeed support both hierarchies and tags for any decent organization, but no GUI and no update tracking is, unfortunately, way too limiting.


The fact that some kind of tags or key/value storage as attributes on files has been missing until 2025 (and still is) seems so bizarre to me. Our file systems have hardly changed since the 1960s. We get filename, timestamp, filesize, and that's about it. Pathetic.

Imagine the opportunities if a folder structure could represent a "document" where each file represents a paragraph, image, or other chunk of that document. We would be able to do 'block-based editors' (like content management systems, or Jupyter Notebooks) without having to have some large XML file holding everything.

Even if we had a simple "ordinal" (ordered position) for files, that would open up endless opportunities for innovation in the 'block-editor' space, but sadly file system development has been frozen in place for decades.


See man xattr, available since kernel version 2.6.


I did research this a few years back because I needed it! But I missed it! Since 2009! Thank you so much for telling me!!

Or is this simply the Mandela Effect and xattrs didn't exist in the universe I've been living in, and I've jumped? haha.


I do this manually by appending `__t_tag`, where `t` is a category and `tag` the value.

E.g. `__o_car`, where `o` means object, or `__p_supercode`, where `p` = project, `__t_ml`, where `t` = topic, ml = machine learning, etc.

No dependencies, hardcoded into the files forever, and search is reasonably fast too (don't need it that often anyway).


That still looks pretty hierarchical. I think the point of tags vs something like categories is that a file can have multiple independent tags. Your system can do it with symlinks too of course.


Forgot to add that those can be concatenated, e.g. `__o_car__t_finances`.
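A minimal Python sketch of parsing that naming scheme, assuming the suffixes sit at the end of the filename before the extension and that a category is a single lowercase letter, as in the examples above. The function names are made up:

```python
import re
from pathlib import PurePath

# __<category>_<value>, repeated; a value runs until the next __x_ or the end.
TAG_RE = re.compile(r"__([a-z])_(.+?)(?=__[a-z]_|$)")


def parse_tags(filename: str) -> dict:
    """Map each category letter to the list of values found in the name."""
    stem = PurePath(filename).stem
    tags = {}
    for category, value in TAG_RE.findall(stem):
        tags.setdefault(category, []).append(value)
    return tags


def append_tag(filename: str, category: str, value: str) -> str:
    """Return the filename with one more __<category>_<value> suffix."""
    p = PurePath(filename)
    return str(p.with_name(f"{p.stem}__{category}_{value}{p.suffix}"))
```

For example, `parse_tags("invoice__o_car__t_finances.pdf")` gives `{"o": ["car"], "t": ["finances"]}`.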



