
I want to have both options because traditional folders and a tag based file system solve different problems.

Let's say I go on vacation with my dog and take pictures. When I'm home again I want to sort them, but then I have a problem: the pictures that show my dog belong in the '2019 vacation to Bavaria' collection _and_ in the 'Best pictures of my dog' collection.

I'd love to have some sort of universal file-database where I can store all my "final" images and then create collections by adding tags.




In macOS (for several years now) you store files in a standard folder hierarchy, and you can add tags to files. With or without tags, you can use Spotlight (cmd-space) to quickly find files.


Yes, I know that there are approaches which use the normal file system.

But what I want is different:

I want a universal "DB for binary files" where I can store binary data and all its metadata.

Then I can use this DB to build an app for picture galleries, music collections and tons of other things.

This DB should also support:

* Automatic checksumming, so that I can detect data corruption

* Some sort of version history, so that I can store multiple versions of a file

* Possibly built-in replication, so that I can see the same data (or parts of it) on all my devices
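A minimal sketch of the first two bullets, assuming a content-addressed layout where each blob is stored under its SHA-256 digest. `BlobStore`, `put`, `get` and the file names are hypothetical, the version history is kept in memory only, and replication is left out entirely.

```python
import hashlib
import os
import tempfile


class BlobStore:
    """Hypothetical content-addressed store: blobs are named by their SHA-256 digest."""

    def __init__(self, root):
        self.root = root
        self.history = {}  # logical name -> list of digests, oldest first (in memory only)

    def _blob_path(self, digest):
        return os.path.join(self.root, digest)

    def put(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        path = self._blob_path(digest)
        if not os.path.exists(path):  # identical content is stored only once
            with open(path, "wb") as f:
                f.write(data)
        self.history.setdefault(name, []).append(digest)
        return digest

    def get(self, name, version=-1):
        digest = self.history[name][version]
        with open(self._blob_path(digest), "rb") as f:
            data = f.read()
        # Re-hash on read: a mismatch means the bytes on disk were corrupted.
        if hashlib.sha256(data).hexdigest() != digest:
            raise IOError(f"checksum mismatch for {name!r} version {version}")
        return data

    def versions(self, name):
        return list(self.history.get(name, []))


root = tempfile.mkdtemp()
store = BlobStore(root)
store.put("fido.jpg", b"original pixels")
store.put("fido.jpg", b"cropped pixels")
print(store.get("fido.jpg"))             # latest version
print(store.get("fido.jpg", version=0))  # first version
```

Naming blobs by their hash gives checksumming and deduplication together: two versions with identical bytes share one blob. A real design would persist the history instead of keeping it in a dict.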


Is this common enough a use case that it ought to be a file system feature?

Plenty of applications do just this today by using the file system plus an index in SQLite. Is that method insufficient?
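A minimal sketch of that pattern with Python's built-in `sqlite3` module: the images stay in ordinary folders, and a small index maps paths to tags, so one photo can sit in both the vacation collection and the dog collection. The schema, helper names and paths are hypothetical.

```python
import sqlite3

# Hypothetical schema: files stay where they are on disk; SQLite only stores paths and tags.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL);
    CREATE TABLE tags (
        file_id INTEGER NOT NULL REFERENCES files(id),
        tag TEXT NOT NULL,
        PRIMARY KEY (file_id, tag)
    );
""")


def add_file(path, *tags):
    cur = conn.execute("INSERT INTO files (path) VALUES (?)", (path,))
    conn.executemany("INSERT INTO tags (file_id, tag) VALUES (?, ?)",
                     [(cur.lastrowid, t) for t in tags])


def files_tagged(tag):
    rows = conn.execute(
        "SELECT f.path FROM files f JOIN tags t ON t.file_id = f.id "
        "WHERE t.tag = ? ORDER BY f.path", (tag,))
    return [path for (path,) in rows]


add_file("/photos/img-001.jpg", "bavaria-2019")
add_file("/photos/img-002.jpg", "bavaria-2019", "best-of-dog")
add_file("/photos/img-003.jpg", "best-of-dog")

print(files_tagged("bavaria-2019"))  # ['/photos/img-001.jpg', '/photos/img-002.jpg']
print(files_tagged("best-of-dog"))   # ['/photos/img-002.jpg', '/photos/img-003.jpg']
```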


You asked: "Is this common enough a use case?"

Then you answered your own question: "Plenty of applications do just this today".

So yes, it is a common enough case that it could/should be built into the OS.


Well, the OS could certainly offer some support, like file change detection, but the main indexing is often too application-specific.

A photo album app wants face recognition. A music player wants BPM detection. Should those be done by the OS? I do not think so.


Sounds a lot like ZFS.


Exactly - ZFS offers everything on the bullet list.


I think the old BeOS supported what you are describing.


And so does Haiku, its open-source successor.


I think we'll start to see that happening as machine learning is used to tag the files. See Google Photos for example, where you can basically use search in place of any sorting.


I think most of the photo organizers offer this? I remember using digiKam back around 2006 or so, and it already had this feature.

I think tags have limited scope. They are great for photos. They are OK for music, but strictly in addition to the main hierarchy. You could use them for text docs, but folders plus full-text search are much better. And file-level tags are completely useless for code/programming.


If you have a rich set of tags, you can have more than one main hierarchy.

Artist sort, year sort, genre sort, etc. Genre is really hard though.

There isn't much reason to use tags as the only way of organizing things though, they work great as views.


Agreed: views, but not the main organizing principle.

For example, my music collection is big and diverse, and both "year sort" and "album sort" are kinda useless now, because there are actually multiple disjoint subsets. There is never any point in showing me audiobooks from 2010 together with regular music from 2010. I always want only one subset of it.

This is what I meant by "strictly in addition to the main hierarchy": let me keep my folders, and maybe, when I want to go deep enough, let me browse by tag. But even then it would not be the hashtag-like tags that the original page refers to.


You can achieve this in existing systems with a hardlink.


A hardlink is a (poor, partial) implementation of a tag in the file system. A tag will recover a collection of all the elements filed under the same label, and a link can't do that.


I think hard links could. To take the up-thread example, you have one folder called "Vacation2018", and another called "BestDogPics". The photo of your dog on vacation lives in both folders, but hardlinked together.
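For reference, the hardlink approach looks like this in Python on a POSIX file system (Linux, macOS); the directory and file names are illustrative.

```python
import os
import tempfile

root = tempfile.mkdtemp()
vacation = os.path.join(root, "Vacation2018")
best_dog = os.path.join(root, "BestDogPics")
os.makedirs(vacation)
os.makedirs(best_dog)

original = os.path.join(vacation, "fido-at-the-lake.jpg")
with open(original, "wb") as f:
    f.write(b"jpeg bytes")

# A second directory entry pointing at the same inode; no data is copied.
link = os.path.join(best_dog, "fido-at-the-lake.jpg")
os.link(original, link)

st_a, st_b = os.stat(original), os.stat(link)
print(st_a.st_ino == st_b.st_ino)  # True: one file, two names
print(st_a.st_nlink)               # 2
```

Note that `st_nlink` only says *how many* names the file has, not *where* they are: finding the other names means scanning the tree (e.g. GNU `find -samefile`), which is the gap the parent comment points out. Hardlinks also cannot span file systems.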


I think the key thing here is that files can indeed be categorized in different ways, and could - and should - exist simultaneously in different collections with different structures.

I therefore think it'd be worth separating the concept of a "filesystem" from a "folder system" or "index system". That is: keep the file storage itself flat (e.g. in a relational database table), then have different categorical "views" that could be relational and/or hierarchical pointing to these files. Naturally, those collections will have their own sets of metadata for that file.
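A toy model of this proposal, assuming an in-memory Python dict as the flat store: each file exists once, keyed by ID, and each view holds only references plus its own per-file metadata (here a hypothetical `rank` field). `FileRecord`, `View` and the second file ID are made-up names; caching and remote sources are left out.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    file_id: str
    created: str  # ISO date, used for chronological sorting


@dataclass
class View:
    path: tuple                                  # e.g. ("Pets", "Fido", "Photos")
    entries: dict = field(default_factory=dict)  # file_id -> view-specific metadata

    def add(self, record, **meta):
        self.entries[record.file_id] = meta


# Flat storage: every file exists exactly once, keyed by ID.
store = {
    "img-8675309": FileRecord("img-8675309", "2019-08-02"),
    "img-8675310": FileRecord("img-8675310", "2019-08-03"),
}

vacation = View(("Vacations", "Bavaria 2019", "Photos"))
fido = View(("Pets", "Fido", "Photos"))

for rec in store.values():
    vacation.add(rec)
fido.add(store["img-8675310"], rank=1)  # view-specific metadata: the "best" ranking
fido.add(store["img-8675309"], rank=2)

by_date = sorted(vacation.entries, key=lambda fid: store[fid].created)
by_rank = sorted(fido.entries, key=lambda fid: fido.entries[fid]["rank"])
print(by_date)  # ['img-8675309', 'img-8675310']
print(by_rank)  # ['img-8675310', 'img-8675309']
```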

So for example, you have a file named "img-8675309.png" in your camera's storage. The operating system presents a view of said camera storage, in the form of a flat list of files and some basic metadata like creation date (plus perhaps camera-specific metadata if, say, the camera driver is the thing generating the view). You could then open up views for your 2019 Bavaria vacation ("Vacations" → "Bavaria 2019" → "Photos") and your dog ("Pets" → "Fido" → "Photos"), and set a sorting field for each view (for the vacation, probably chronological; for your dog, however you define "best"). You would then drag and drop the camera file into those views (in the latter case, maybe even into the spot where you want it to rank), and the operating system would add references to that file automatically (almost certainly copying it into a local cache) in both of your open views and potentially in some system-maintained views (e.g. "Local Files" → "Photos").

One of the slick things here is that file access could be entirely transparent to how those files are stored. For example, those views will of course include your device's internal storage, but might also include external devices (like the camera in the above example) or even remote services (like, say, your social media account). If you accidentally delete your prized Fido photo on your local machine, the "Pets" → "Fido" → "Photos" view could still have a reference to the copy on the camera, or a copy in your social media posts, or a copy in the system backup that automatically ran last Sunday, and thus retrieve it and re-cache it locally (or prompt you to plug your camera or your external USB drive back in so it can check there).


I would like both as well. I can see value in tags, but thinking from a developer's perspective, I'm not sure how I would target a specific file in my code without a tree structure. Maybe I just haven't thought about this enough, but it seems necessary.
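One possible answer, sketched in Python under stated assumptions: code either keeps an opaque, stable ID returned when the file is created, or names a file by a set of tags that must match exactly one file. `create`, `open_by_id`, `open_by_tags` and the `app:`/`kind:` tag names are hypothetical.

```python
import uuid

# Hypothetical tag-based store: each file gets an opaque, stable ID at creation time.
files = {}  # file_id -> {"tags": set of str, "data": bytes}


def create(data, *tags):
    file_id = str(uuid.uuid4())
    files[file_id] = {"tags": set(tags), "data": data}
    return file_id


def open_by_id(file_id):
    return files[file_id]["data"]


def open_by_tags(*tags):
    # A tag query only works as an address if it matches exactly one file.
    wanted = set(tags)
    matches = [fid for fid, f in files.items() if wanted <= f["tags"]]
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} files match {sorted(wanted)}")
    return files[matches[0]]["data"]


cfg = create(b"debug = false", "app:myapp", "kind:config")
create(b"started", "app:myapp", "kind:log")

print(open_by_id(cfg))                           # b'debug = false'
print(open_by_tags("app:myapp", "kind:config"))  # b'debug = false'
```

A path is, in effect, an ordered list of labels that the file system guarantees to be unique; a tag query drops the ordering, so the uniqueness check has to move into the lookup itself.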



