Yet another hot take on “folders versus tags” (eleanorkonik.com)
143 points by not-now 5 months ago | 174 comments

I keep my personal notes in an app that I wrote which is backed by SQLite. It resembles a wiki.

First, I tried tags. This seemed like a good idea and was a lot of fun at first, but eventually I got _really_ tired of having to curate tags for all of my notes and boy there were a lot of them! It's not something you can just do once either, because every time you update a note you have to remember to change the list of tags as well. I wound up with many articles whose tags no longer (or never did) match the content.

To get myself out of the tag-curating business, I then tried the "folder" idea by separating areas of concern into namespaces. So, all of the Python-related notes go under the "Python" namespace. All of the database notes into "Databases," and so on. All that did was shift the problem space. Which namespace does the note on SQLAlchemy go in? If an article does not fit into an existing namespace, should I create one for it now, or put it in the root and hope I remember to move it if a second related note comes along? To put it succinctly, my notes do not fit into a DAG.

My third iteration dropped both tags and namespaces and implemented the FTS5 search built into SQLite. Two years later, I have no regrets. The only "curation" I have to worry about is giving each note an accurate title and breaking up large notes into smaller ones when it makes sense. Now when I need something I just search for it and it shows up in the results.
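For anyone who wants to try the same setup, here is a minimal sketch of FTS5-backed note search from Python. The table layout (`title`, `body`), the helper names and the sample notes are assumptions for illustration, not the schema of the app described above, and it needs an SQLite build with FTS5 enabled.

```python
import sqlite3

def build_index(notes):
    """Create an in-memory FTS5 table and load (title, body) pairs into it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE notes USING fts5(title, body)")
    conn.executemany("INSERT INTO notes (title, body) VALUES (?, ?)", notes)
    return conn

def search(conn, query):
    """Return the titles of matching notes, best match first."""
    rows = conn.execute(
        "SELECT title FROM notes WHERE notes MATCH ? ORDER BY rank", (query,)
    )
    return [title for (title,) in rows]

# Hypothetical sample notes.
conn = build_index([
    ("SQLAlchemy sessions", "Notes on the Python ORM and how it talks to databases."),
    ("PostgreSQL vacuum", "Administration notes for a database server."),
    ("Sourdough", "Flour, water, salt."),
])
print(search(conn, "python"))
```

With the default tokenizer the match is case-insensitive but not stemmed, so "database" and "databases" are different search terms.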

Google, for all of their other faults, got one thing extremely right when it came to web content and email: curation is a fine hobby, but for Really Getting Shit Done, you index the content well and then just search for it when you need it.

Search works brilliantly for your own notes because you know what you're looking for. You wrote the notes. It fails for problems that you can't describe well, though, especially if you don't know the technical jargon to search for. Directories and tags can work for that problem because you can easily see relationships like a hierarchy of folders, or what other tags a tagged item has. Tags and folders also act as a prompt to classify and organise things, which occasionally leads to discovering how things are connected.

"How to find stuff" is a space with many solutions to similar looking but subtly different problems.

> “You wrote the notes.”

HA! In my case I wrote my notes, but don’t remember what’s in ‘em. It’s endemic for learning.

folders/tags is a false dichotomy.

This is how I tell people to use Confluence. There’s no hope of ever having the general set of all articles organized in a way that makes sense so just try to make the article searchable. If it’s about an error, make sure the error message is in there. If it’s about configuration of a feature make sure the feature name and “configuration” is in the title. Also make sure to write a paragraph describing what the article is about and different times someone might need it.

It’s like internal white hat SEO

Making docs discoverable is good, and this works well for retrieving mostly static documentation and tickets. It gets complicated when the resulting items are used for troubleshooting or for new documentation and tickets.

When the results include, e.g., multiple tickets and documents referencing the same error or configuration, which do you use for troubleshooting? Which results are the authoritative / canonical / source of truth for documenting the latest configuration directives?

Versioning, used well on the input, helps. Organization and ontology are additional context for using search results well.

> So, all of the Python-related notes go under the "Python" namespace. All of the database notes into "Databases," and so on. All that did was shift the problem space. Which namespace does the note on SQLAlchemy go in?

You don't need to have a perfect hierarchy, you just need to have a hierarchy that you'll remember.

If you constrain yourself into trying to produce a Liskov free hierarchy you'll tie yourself in knots.

I'd probably put SQLAlchemy under Python because it is Python-specific. Databases should be kept for your notes on PostgreSQL administration (and you may have Python scripts in there, but they'll be there to serve the purpose of administering a concrete PostgreSQL install or something).

And you don't need to find all the database stuff and collect it together, you just need to know where you put something. So largely put it in the folder that makes the most immediate sense to you and don't overthink it. Later if you hate it, refactor it. But remember that you're dealing with your own personal hash table for retrieval and not trying to implement group_by. The latter issue is where you'd need an index over your notes that was searchable, but I don't find myself needing it.

> To put it succinctly, my notes do not fit into a DAG.

I think you mean "tree" or "hierarchy"? A DAG would be a superset of tagging, and could place SQLAlchemy under both Python and Databases, as long as you consider the edges oriented (i.e. there's a parent-child relationship).

Doesn't solve the tag maintenance issue, though.

Technically, if their notes can't be represented by a DAG, they also can't be represented by a tree either.

Right but the point is that the notes, as described, can actually be represented by a DAG. They just can’t be represented by a tree.
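To make the distinction concrete, here is a small sketch with a made-up parent map for the notes in question. The data and function names are hypothetical; the only point is that a node with two parents is still acyclic, but is no longer a tree.

```python
# child -> set of parents; hypothetical data based on the example in the thread
parents = {
    "Python": set(),
    "Databases": set(),
    "SQLAlchemy": {"Python", "Databases"},
    "PostgreSQL": {"Databases"},
}

def is_acyclic(parents):
    """True if following parent links never leads back to the starting node."""
    state = {}  # node -> "visiting" or "done"

    def visit(node):
        if state.get(node) == "done":
            return True
        if state.get(node) == "visiting":
            return False  # we came back to a node on the current path
        state[node] = "visiting"
        ok = all(visit(p) for p in parents.get(node, ()))
        state[node] = "done"
        return ok

    return all(visit(n) for n in parents)

def is_forest(parents):
    """True if the graph is acyclic and every node has at most one parent."""
    return is_acyclic(parents) and all(len(p) <= 1 for p in parents.values())

print(is_acyclic(parents), is_forest(parents))  # True False
```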

That's a nice summary of the problem and a solution I think I can agree with. One quibble is I think your notes could maybe fit into a DAG, but (as I think you meant) not a tree. In practice I don't think there's an easy way to do that - it would mean lots of sym or maybe hard linking and I don't know enough about filesystems to know if that's feasible.

But the one area that curation carries a big advantage over search is in browsing. You can do maybe some things with topic modeling and recommendations to allow a kind of browsing, but with searching it's really hard to know whether you have thoroughly covered a part of the space via searching, while with hierarchical curation that is easy. Filling in that last part with a good solution would make searching a no brainer over curation I think, but IMO I don't think most search solutions try to handle that right now.

> I wound up with many articles whose tags no longer (or never did) match the content.

I'm curious about your tagging strategy. Could you show some example of this issue?

I feel like my tags are very different since they're barely curated, but I can't imagine ending up with a "wrong" tag. I may miss some and have to add them later, but i can't remember ever removing or changing a tag. That's both in pinboard and in my scanned documents.

for me, folders are just a limited form of tagging. since i switched from mutt with its folders to sup which uses tags for email, i don't want to look back. first of all, my extensive list of mutt folders was trivially translated to equivalent tags, so i could continue as i was used to. but then, tagging allowed me to create additional tags as i needed them. i didn't tag everything, and sup also indexes all mails and also has search, so in fact i get the advantage of both tags and search (and i can save searches and treat them like virtual tags/folders as well).

well, effectively, tags too, are just a specialized form of searching.

the one thing i don't see is the problem with tags getting out of sync.

yes, it happens. tags do become obsolete, but that is rarely a problem. it just means that i get more results than i should otherwise. occasionally when i see way too many obsolete tags i go and clean up that particular category of tags. if it is just a few then i ignore them.

and if i look at a tag and i don't find what i need, then i search for it, and add the tag when i find it so that next time i'll find it faster, because with more than 2 million emails in my archive search alone is not good enough.

> for me, folders are just a limited form of tagging.

I understand that this is a very subjective thing... but folders offer an exclusion which isn't immediately available via tagging. Tagging is implicitly an OR operation.

Now, your search engine might offer ways to add "not X", but it's a tradeoff. If you have highly-compartmentalized bits of info you end up with the problem of "how much do I have to explicitly exclude from my search?".

It's complicated.

Personally, I think we're missing a level of organization somehow.

> Tagging is implicitly an OR operation.

Doesn't have to be. I can't think of many tag-based search systems that default to OR; usually the default is AND, and that's pretty sensible.

There are plenty of text-based search systems that default to OR, though (see also: pretty much every mainstream search engine), and it makes searching a fucking nightmare, so I definitely get the aversion.

not sure i follow. can you give an example how exclusion works? all i see is that without using symlinks i can have only one folder/tag per item. i can choose to do that with tags too if i want, but i don't see how using folders helps with that other than forcing me to do it.

If you can nest folders, then a thing is all of its parent folders in addition to the folder it is directly contained in. This is the AND relationship, which you don't get with most tagging systems.

To highlight exclusion, consider an example- folder Animals with subfolders Cats and Dogs. A thing can be either a cat or a dog, but not both.

but that's just tagging discipline. if i tag an animal with cat and dog, that's obviously wrong. but i don't need folders to help me avoid that mistake.

if an object has five tags, it is all those things. that looks like an AND relationship to me.

the rest is search. if your tagging system only does OR searches, then that's indeed a problem, but of the implementation, not of the concept.
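a rough sketch of that point, using plain sets. the item names and helper functions are made up for illustration; the tags on each item are the same, only the query changes between AND, OR and exclusion.

```python
# Hypothetical tagged items.
items = {
    "felix": {"animal", "cat"},
    "rex": {"animal", "dog"},
    "oak": {"plant", "tree"},
}

def search_all(items, tags):
    """AND search: items carrying every requested tag."""
    wanted = set(tags)
    return sorted(name for name, have in items.items() if wanted <= have)

def search_any(items, tags):
    """OR search: items carrying at least one requested tag."""
    wanted = set(tags)
    return sorted(name for name, have in items.items() if wanted & have)

def search_without(items, include, exclude):
    """AND search that also drops items carrying any excluded tag."""
    banned = set(exclude)
    return [n for n in search_all(items, include) if not (items[n] & banned)]

print(search_all(items, ["animal", "cat"]))        # ['felix']
print(search_any(items, ["cat", "tree"]))          # ['felix', 'oak']
print(search_without(items, ["animal"], ["dog"]))  # ['felix']
```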

and there is no reason both can't coexist.

one application that i use extensively is kphotoalbum. it allows me to organize photos by tags, categories, date, folders and other attributes all at the same time. i can navigate and search photos based on all of those attributes. usually i drill down a folder, and then pick tags from within that folder, or i pick tags at the top level and then work with the folder structure of the selected images. ironically, kphotoalbum does not do well on OR relationships or exclusion of specific attributes. i often use temporary tags to help me there, which kphotoalbum makes trivially easy to use

The final step GP is describing would correspond to notmuch[1] for mail. Have you looked into that?

[1]: https://www.notmuchmail.org

> Everything is a Tag. Users are just Tags with passwords.

I like tags which are structured as a tree, so that you can tag something as “Python” and it will also acquire the parent tag “Programming”. When you add a new tag for the first time you still need to eventually curate it by giving it other tags as parents if applicable. I suppose this is roughly equivalent to a directory structure where each item can be in multiple directories at the same time.
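A minimal sketch of how that inheritance could work, assuming a hand-curated map from each tag to its parent tags. The map and the `expand` helper are hypothetical, not taken from any particular app.

```python
# Hypothetical curated hierarchy: tag -> parent tags
tag_parents = {
    "Python": {"Programming"},
    "SQLAlchemy": {"Python", "Databases"},
    "Programming": set(),
    "Databases": set(),
}

def expand(tags, tag_parents):
    """Return the given tags plus every ancestor tag they imply."""
    seen = set()
    stack = list(tags)
    while stack:
        tag = stack.pop()
        if tag in seen:
            continue
        seen.add(tag)
        # unknown tags simply have no parents yet
        stack.extend(tag_parents.get(tag, ()))
    return seen

print(sorted(expand({"Python"}, tag_parents)))  # ['Programming', 'Python']
```

A note only needs its most specific tag; the parents are derived at query time, so re-parenting a tag later does not require touching any notes.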

That's a really valuable insight and well said, thanks for sharing.

I wonder if tags could work like spam detection. For each tag mark a few things as "tag" and "not tag" and let an AI figure out everything else.

If there's a tag you never look at, and so it's poorly trained, who cares? You never look at it. Commonly used tags will be better trained, because you will have marked more things as "yes tag" or "not tag".

How good is the search performance? I have used a couple of apps that provide full text search, but it is very slow when the number of notes is large.

I think the best solution is a mix of all three: a folder structure with tags, plus full text search when needed.

I feel like you must be implicitly tagging your notes.

Let's say you have notes about SQLite. If you don't put "SQLite" in the note, how will you find them?

select * from notes where upper(text) like '%SQLITE%' :P

My point is "sqlite" is effectively a tag.

Right. If you search for "database" instead of sqlite it will only show up if you happened to use that word in the notes. If you're searching notes from years ago you might not remember the exact keyword you need. You need a search engine with a powerful enough synonym/grammatical similarity capability.

exactly the same sold me on spotlight search on macos. Have to sign "for all of their other faults got one thing extremely right" as well, sigh.

Trees/hierarchies ("folders") are for organization, unconstrained graphs/networks ("tags") are for ontologies. Crossing these streams leads to a lot of trouble.

When flexible graphs/networks are abused for organizational purposes, you get circular dependencies, spaghetti code, and general dysfunction. Organizations (code or people) need to be easy to navigate.

When rigid hierarchies are abused for classification purposes, you run into "class House, class Boat, class HouseBoat extends ???" knots. Ontologies need to be flexible.

Modern filesystems necessarily use both: we have folders, and we have file types/tags.

You could hypothetically collapse the two though. You can have tags and could use those to automatically generate efficient organization structures.

For example, say you have the following datums with the following tags

  apple:     plant, Rosaceae, tree, fruit
  peach:     plant, Rosaceae, tree, fruit
  rose:      plant, Rosaceae, shrub, flower
  dandelion: plant, Asteraceae, herb, flower
  carrot:    plant, Apiaceae, herb, root
  pig:       animal, Suidae

You can then try to generate a folder structure that optimizes certain parameters, like keeping the average number of children not too far away from ~6 and/or minimizing the depth of the repo structure or whatever. Depending on how you wanna optimize it, you could end up with a number of structures such as:

  - plant
    - tree
      - apple
      - peach
    - herb
      - dandelion
      - carrot
    - rose
  - animal
    - pig

  - Rosaceae
    - apple
    - peach
    - rose
  - animal
    - pig
  - other
    - dandelion
    - carrot

You can use your imagination to think of better structures or optimization problems, but you get the basic point. We can have tags be our primary classification method and have the organizational structure be an outcome of that.
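One way this could be sketched is a greedy split: at each level, pull the items sharing the most common unused tag into a folder, recurse inside it, and leave small groups as plain leaves. The heuristic, the `max_leaf` parameter and the function name are my assumptions, not the optimization described above.

```python
from collections import Counter

items = {
    "apple": {"plant", "Rosaceae", "tree", "fruit"},
    "peach": {"plant", "Rosaceae", "tree", "fruit"},
    "rose": {"plant", "Rosaceae", "shrub", "flower"},
    "dandelion": {"plant", "Asteraceae", "herb", "flower"},
    "carrot": {"plant", "Apiaceae", "herb", "root"},
    "pig": {"animal", "Suidae"},
}

def build_folders(items, max_leaf=2, used=frozenset()):
    """Return a nested dict: folders map to sub-dicts, items map to None."""
    if len(items) <= max_leaf:
        return {name: None for name in sorted(items)}
    counts = Counter(t for tags in items.values() for t in tags - used)
    if not counts:  # nothing left to split on
        return {name: None for name in sorted(items)}
    # Most frequent tag wins; ties are broken alphabetically for determinism.
    tag = min(counts, key=lambda t: (-counts[t], t))
    inside = {n: tags for n, tags in items.items() if tag in tags}
    outside = {n: tags for n, tags in items.items() if tag not in tags}
    tree = {tag: build_folders(inside, max_leaf, used | {tag})}
    tree.update(build_folders(outside, max_leaf, used))
    return tree

tree = build_folders(items)
```

On the sample data this puts apple and peach under plant/Rosaceae/fruit, rose directly under plant/Rosaceae, dandelion and carrot directly under plant, and pig at the top level. A different scoring rule would give one of the other layouts shown above.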

I like this approach: building a simple folder interface on top of a more flexible tag system, which can always be accessed when necessary. I think many systems would benefit from it.

There are tradeoffs, though: a fair bit of overhead for each heavily nested item. And you would want some support for ensuring the integrity of the hierarchies (e.g. when someone accidentally removes "tree" from the tag set of "apple").

I implemented this for my email. Each email can have any number of tags. Each tag can have a directory hierarchy.

So for instance an email can be tagged fun/ski, social/facebook/marketplace, money/receipts.

If you open the social/facebook/marketplace “folder” you’ll see the email, but also if you open the money/receipts “folder”. You could also see it in the intersection of the social tag and the fun/ski tag, etc.
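Roughly, the lookup could work like the sketch below, where a tag "folder" matches any tag equal to it or nested beneath it. The email names and helper functions are invented for illustration and this is not the actual implementation.

```python
# Hypothetical emails with path-like tags.
emails = {
    "ski-trip-receipt": {"fun/ski", "social/facebook/marketplace", "money/receipts"},
    "bank-statement": {"money/statements"},
}

def in_folder(tags, folder):
    """True if any tag equals the folder or lives beneath it."""
    return any(t == folder or t.startswith(folder + "/") for t in tags)

def open_folders(emails, *folders):
    """Emails visible in the intersection of all the given tag folders."""
    return sorted(
        name for name, tags in emails.items()
        if all(in_folder(tags, f) for f in folders)
    )

print(open_folders(emails, "money/receipts"))     # ['ski-trip-receipt']
print(open_folders(emails, "social", "fun/ski"))  # ['ski-trip-receipt']
```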

maybe on top of your idea ensure there's a graph feature like at the top right of https://help.obsidian.md/Obsidian/Index

Here's a library for writing such classification systems in Emacs Lisp: https://github.com/alphapapa/taxy.el

Especially, see this example: https://github.com/alphapapa/taxy.el#sporty-understanding-co...

An interesting hybrid approach I've experimented with in the past is to use tags while ensuring that the tags themselves are purely hierarchical. (Is there a name for this scheme?)

I'm not sure of a name, but Tiddlywiki does this. You create hierarchies by creating an item with the tag as its name, which causes all items with said tag to become children of that item. This has a nice side effect of allowing items to exist in multiple locations (so no unique parent is enforced) while still requiring the graph to be acyclic.

It ends up working kind of like hard links for folders/files, but it is a lot easier to set up since child items are the ones which declare where they are located, not the parents/directories. I think another reason why hard links are more difficult to use than this particular system is that with Tiddlywiki, it is easy to see all the locations an item falls under at once as well as seeing all the items at a particular location. I feel like adding this reverse location information would be quite helpful and would be less of a change than implementing tags for existing filesystems.

Sir Bimlas' TW5-Locator tool is worth checking out: https://github.com/bimlas/tw5-locator.

Forever ago I built something similar for generating canonical urls for SEO from a tag system (and to concentrate "google juice" from linking), though hierarchies were not strictly enforced.

"expert/american" and "american/expert" couldn't have duplicate content so you take your tag system and overlay hierarchies.

Anything that wasn't manually set was auto hierarchied based on highest traffic volume.

I also prevented tags from showing up on the same tag chain, so any given keyword could only appear once. That prevented infinite recursion.

Moving away from hierarchy also made interesting permutations easier to generate.

Since any metadata can become a tag, price or price range ($100hr – $300hr) is easy to generate and "enhance" American/Experts/Between$100and$300anhour.

It worked really well, allowed us to manually enforce high traffic hierarchies/phrases while still auto-generating intelligent canonical links for the rest of the site.

Didn't google punish your site for having duplicate entries under different urls?

I might not have been clear enough I think. I can clarify further if the below is insufficient.

That was the point of labeling 1 version of any duplicated permutation "canonical", to prevent duplicate content penalties.


I'm having trouble visualising what you mean, without it becoming "just folders".

Can you elaborate?

I am not the author of the previous comment so my view could be different. But I have thought about it. With tags, you could have one note in multiple directories. Imagine I have found an interesting blog post on HN about SQLite in Python. It could be hard to decide whether it is more about Python or SQLite. Tags could look like this:

python, databases, SQL, SQLite, $date, FromHN, beautiful_site (I liked the look of this blog)

or in proposed system

/IT/Databases/SQL/Sqlite /IT/Python/libraries/SQLite /InterestingHNposts/2022/July /beautiful_site

The difference between "just folders" and the proposed system is that you could have one note in multiple folders, which gives you more flexibility in assigning notes to their topics. But it is still more structured than tags, which could easily turn into an impenetrable list of random words.

To be fair you could also achieve it with symlinks.

As a node in a tree hierarchy any folder can only have one parent folder. Tags of course allow nodes to have any number of parents (aka "associations").

The relationship between arbitrary nodes in a tree can be determined by tracing their common ancestry, but tags don't provide equivalent functionality, unless you strictly define how tags themselves relate to other tags. An obvious way to do so is to prescribe that every tag shall have exactly one parent (except for the root abstract "thing" tag).

In other words tags become folders, but any non-folder content of those folders can simultaneously live inside any number of folders. Similar to symlinks, but arguably less hacky, because there is no differentiation between "actual" location and "linked" location.

> Similar to symlinks, but arguably less hacky, because there is no differentiation between "actual" location and "linked" location.

In other words, similar to hardlinks

Right, yes. True. I should have compared with hardlinks instead.

Minor detail: I intended for different deletion semantics from hardlinks. Whereas hardlinks use reference counting for that (only the last deletion actually deletes); for my purposes, delete anywhere meant delete everywhere.
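To spell out the two deletion semantics, here is a toy in-memory model. The class and method names are hypothetical: `unlink` behaves like removing one hard link, while `delete` is the "delete anywhere means delete everywhere" behaviour.

```python
class TagStore:
    """Toy model: one note may be listed under many tag-folders."""

    def __init__(self):
        self.locations = {}  # note -> set of folders it appears in

    def add(self, note, folder):
        self.locations.setdefault(note, set()).add(folder)

    def unlink(self, note, folder):
        """Hard-link style: drop one location; the note goes when none remain."""
        self.locations[note].discard(folder)
        if not self.locations[note]:
            del self.locations[note]

    def delete(self, note):
        """Delete-everywhere style: the note vanishes from all folders at once."""
        self.locations.pop(note, None)

    def listing(self, folder):
        return sorted(n for n, f in self.locations.items() if folder in f)

store = TagStore()
store.add("note-a", "Python")
store.add("note-a", "Databases")
store.unlink("note-a", "Python")
print(store.listing("Databases"))  # ['note-a'], still reachable elsewhere
store.delete("note-a")
print(store.listing("Databases"))  # []
```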

Maybe something like "hierarchical taxonomy", or just simply class hierarchy? Though I'm not aware of such a term that's explicitly for tags if that's the point.

The challenge is knowing which one you are dealing with at any particular time.

I've always wanted to see a "folder view" of a purely tags-based system. At the root level, `ls` shows you a list of tags (and possibly "all files"). cd into a tag, and `ls` shows you a list of the remaining tags (and relevant files). Repeat as many times as you like.

Example session might look something like this:

    / $ ls -d
    foo bar baz
    / $ cd foo
    /foo $ ls -d
    bar baz
    /foo $ cd baz
    /foo/baz $ ls -d

Obviously, at the root level you're also including all files so a naked `ls` would return a lot of output. But this seems like a good way to bridge to the POSIX world; you could probably implement it on any modern OS with a FUSE filesystem.

I'm really trying not to overly promote Supertag[0] in this thread (disclaimer: I am the author) , but it does exactly what you describe.

0. https://amoffat.github.io/supertag/

In my opinion, any OpenSource application should not be bashed for promoting in threads like these. What will you win? probably more work as more people try your software and raise issues (feature requests or bugs). The fact that it is Libre means that any "self promotion" will not have any monetary gain, so it is fine with me.

Aaaaanyways... SuperTag looks AMAZING. I will give it a try right now and see how it works for me. Thanks a lot for implementing it :)

EDIT: It doesn't want to install for me on OSX :( Oh well, it sounded too good to be true haha.

      ~ brew install amoffat/rnd/supertag

    Running `brew update --preinstall`...
    ==> Tapping amoffat/rnd
    Cloning into '/usr/local/Homebrew/Library/Taps/amoffat/homebrew-rnd'...
    remote: Enumerating objects: 24, done.
    remote: Counting objects: 100% (24/24), done.
    remote: Compressing objects: 100% (17/17), done.
    remote: Total 24 (delta 4), reused 24 (delta 4), pack-reused 0
    Receiving objects: 100% (24/24), done.
    Resolving deltas: 100% (4/4), done.
    Error: Invalid formula: /usr/local/Homebrew/Library/Taps/amoffat/homebrew-rnd/supertag.rb
    supertag: Unsupported special dependency :osxfuse
    Error: Cannot tap amoffat/rnd: invalid syntax in tap!

I think the HN consensus is that self-promotion in the comments is OK as long as the conflict of interest is disclosed ("I'm the developer") and the comment is relevant to the conversation (no spam), which were both satisfied in this case.

Osxfuse is notoriously problematic software. Would you mind opening an issue on the supertag project please?

Wow, I did not expect that! Fantastic, I'll check this out when I get a few spare cycles.

EDIT: Couldn't wait. This is an exceptionally well put-together FAQ, explaining the design constraints: https://amoffat.github.io/supertag/faq.html

Does this also allow looking for e.g. files that are either in tag A or B or both? Intuitively I'd guess this is a limitation when you use a path to filter files, though I might be wrong.

Yeah the path parts are AND, with no ability to do OR, but you can do NOT. If you wanted to do your use-case, you'd need to look in 3 separate places... /A, /B, and /A/B (or /B/A)

This looks great! Am I understanding correctly that all tagged files need to have a unique filename to be addressable?

No, but that was a challenge to overcome. See #2 here[0]

0. https://amoffat.github.io/supertag/faq.html#why-are-my-files...

Very interesting

I think 'ls' should show you tags plus any untagged files. Thus no files "fall through the cracks" due to not being tagged.

So if your files (and their tags) are as follows:

  file1 ()
  file2 (t1)
  file3 (t2)
  file4 (t1,t2)

your session would look as follows:

   / $ ls
   t1 t2 file1
   / $ ls t1
   t2 file2
   / $ ls t2
   t1 file3
   / $ ls t1/t2

And 'ls' should have a flag to show all files regardless of additional tags:

   / $ ls -a
   file1 file2 file3 file4
   / $ ls -a t1
   file2 file4
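Here is a sketch of that listing rule as a plain function, using the four files above. The `ls` signature and the `show_all` flag (standing in for `-a`) are my own naming. Under this reading, `ls t1/t2` lists file4, which is my inference from the stated rule, not something shown in the session.

```python
files = {
    "file1": set(),
    "file2": {"t1"},
    "file3": {"t2"},
    "file4": {"t1", "t2"},
}

def ls(files, path=(), show_all=False):
    """List a tag 'directory'; path is the tuple of tags already entered."""
    here = set(path)
    # Files that carry every tag on the path.
    matching = {n: tags for n, tags in files.items() if here <= tags}
    if show_all:
        return sorted(matching)
    # Tags that could still narrow the selection further.
    remaining = sorted({t for tags in matching.values() for t in tags} - here)
    # Files with no tags beyond the path, so nothing falls through the cracks.
    exact = sorted(n for n, tags in matching.items() if tags == here)
    return remaining + exact

print(ls(files))                         # ['t1', 't2', 'file1']
print(ls(files, ("t1",)))                # ['t2', 'file2']
print(ls(files, ("t1",), show_all=True)) # ['file2', 'file4']
```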

With symlinks (or hard links for that matter) you can have folders work like tags in a real filesystem.

That said, binary classification (object A is a member of class B, a function with a boolean truth value) is the basic concept in classification. That is, any classification can be represented correctly with binary classification.

There really are a few things where you assign something a category from a finite set (like a chess game was won by "White", "Black" or was a draw) but frequently when people build ontologies based on the idea that "A is a member of one of this set of categories" they are going to screw it up.

> With symlinks (or hard links for that matter) you can have folders work like tags in a real filesystem.

The word can is doing a lot of heavy lifting there.

Tagging system (at least the ones I've seen): enter the tags as autocompleted tokens in a single text entry. Possibly the system can autosuggest a number of tokens, possibly not, depending on your use case.

Symlinks-as-Tagging-System: I want to save a file called "foo.txt" and start editing it immediately. How do I enter the tags?

Presumably the same way you would enter tags in a regular program that has tags, except the data structure would be stored as a disk file system. You don't necessarily have to manually run ln -s every time, the same way you don't have to run a SQL insert to update a Tumblr post's tags.

Yep, that's why when we built our PKM app[1], we decided to eschew folders not in favor of tags but in favor of nesting the notes (which we call cards) themselves. From there a card can be nested inside multiple other cards, which is an idea we stole from symlinks, which generally gives the benefit of a structured hierarchy while also giving the flexibility of tags. That being said there are also tags for those who like a flatter structure.

[1] https://supernotes.app

If only symlinks worked for every software / OS. On Windows at least, there are so many different ways to link things (including "subst") that it's hard to know which one to use. (I do love symlinks, but having spent most of my time on Windows, there are some edge cases I've hit, for example some third-party software not following the symlink and simply failing to open the file... and no, that was not the old .lnk style link.)

mklink, if anyone is curious. It can act oddly in a few cases, especially if you cross volume/network boundaries.

Is there some reason that a filesystem couldn't be written to permit multiple paths to the same file without the clunkiness of sym links?

The problem with multiple paths leading to the same file isn't with the file system, but with the file system users. When you're writing scripts or programs against file systems it is really convenient to assume that each filename is one and only one file, that the file has a unique name, and that the file system is strictly a tree.

File systems can technically break all of these; files can contain multiple files within themselves with many file systems, the ability to have multiple paths leading to the same file means that if you just have the file in hand you don't have a unique path, which also means that if you remove that name it doesn't mean you've removed the file. But most of the time, you can kinda get away with writing normal code and it'll do the right thing. But that last one really burns. It is really easy to write code assuming file systems are just trees, and in particular can't have infinite loops, and be wrong about that.
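As a sketch of what defending against that last case can look like, the walker below follows symlinks but remembers each directory by device and inode, so a link pointing back up the tree is visited only once. `walk_once` is a hypothetical helper, not a standard library function.

```python
import os

def walk_once(root):
    """Yield each directory reachable from root once, following symlinks
    but skipping any directory already seen (same device and inode)."""
    seen = set()
    stack = [root]
    while stack:
        path = stack.pop()
        st = os.stat(path)  # os.stat follows symlinks
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue  # a link led us back to a directory we already visited
        seen.add(key)
        yield path
        with os.scandir(path) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=True):
                    stack.append(entry.path)
```

A naive recursive walk that follows links would never terminate on a directory containing a symlink to one of its own ancestors.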

i make extensive use of hardlinks in my files (especially with photos), and i don't see the problem. if i remove a file, i don't usually want it gone completely, but i want it removed from this particular location. if it is hardlinked elsewhere, i usually still want to keep it there.

if i want to really remove it, i can scrub the contents without removing the hardlink, and if i want to delete after scrubbing i check the hardlink count and search for the remaining entries.

Doesn’t a lot of software break hardlinks on edit? I think non-database software usually completely replaces the file via overwrite or atomic rename.

Sometimes you might want that behavior, but other times you might not. But there’s no option in any app I’ve seen to choose how to handle hardlinks on edit/save.
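The effect is easy to reproduce. This sketch assumes a filesystem that supports hard links; the function and file names are made up. It saves a new version the way many editors do (write a temporary file, then rename it over the original) and checks what the second link sees afterwards.

```python
import os
import tempfile

def save_breaks_hardlink():
    """Return (linked_before, linked_after, content_seen_via_other_link)."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "photo.txt")
        alias = os.path.join(d, "alias.txt")
        with open(original, "w") as f:
            f.write("v1")
        os.link(original, alias)  # second name for the same inode
        linked_before = os.path.samefile(original, alias)

        # "Safe save": write a new file, then atomically rename it over the old name.
        tmp = original + ".tmp"
        with open(tmp, "w") as f:
            f.write("v2")
        os.replace(tmp, original)

        linked_after = os.path.samefile(original, alias)
        with open(alias) as f:
            alias_content = f.read()
        return linked_before, linked_after, alias_content

print(save_breaks_hardlink())  # (True, False, 'v1')
```

After the rename, the original name points at a new inode, so the other link keeps the old content. Rewriting the existing file in place would keep the link intact, at the cost of the save no longer being atomic.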

yes, i consider this a conceptual design flaw in unix. effectively i think a better way to do this would be if the move or copy operation onto an existing file would move/copy the content of that file onto the target inode. there could be a separate rename operation which keeps the current behavior

fortunately in my case this issue is not a problem. i use hardlinks mainly for pictures where i never want to replace the original anyways, so editing always gets me a copy which i don't need to link to multiple places.

Unix has supported multiple paths pointing to the same inode for a very long time. `man ln`.

(Windows) NTFS has hard links which are multiple pointers to the same file, but I imagine that the strict separation between FSes that made it possible in NT is also the reason why Unix, which has a single arbitrary hierarchy, doesn't have hard links.*

* Yes there are hard links, but the single hierarchy means that unless you memorised the specific FS of each folder, it's gonna be hard.

I have the answer: it's a graph-based filesystem! Just kidding, I don't really have an "answer"; your question just inspired me to Google for a graph-based file system.


That would be a hard link, though they can't cross filesystem boundaries.

That's a hard link. They have problems too.

Spatial metaphors are arguably the easiest shortcut to a design that makes sense to humans. Spatial reasoning is something even animals are capable of.

Categories are much more abstract, and while useful, less intuitive.

Which is why folders should remain, complement them with other means of navigation for sure, but hiding them away will only make things less intuitive.

In the end, whatever capabilities your design has are only as useful as their ability to be understood and used.

Indeed, consider how one navigates a "memory palace" going into a specific room (=folder) !

I don't consider folders spatial. At all. The only hard relation is hierarchical, and in a folder, there's no inherent order or spatial relationship. I find it a pain to navigate.

Actual spatial arrangement of data (and code) is something I've wanted to try for a long time now though. It's incredible how you can take a map that shows the entire planet, and in a few seconds zoom in to the building you're in. And in a few more seconds, zoom in to the house you lived in as a child in a different town. I'd like to try to build a similar map of personal data and code.

It is a bit like the original Myst, which was made of a bunch of "places" that you could walk between along pre-defined paths.

But that's also how we conceptualize space. One place is my bedroom, it connects to a space that's a corridor, which connects to a few spaces like the kitchen, the bathroom, the living room. In one of these spaces is a bookshelf, which contains a shelf for sci-fi, a few shelves worth of computer science and physics, then a few shelves of philosophy.

I don't know the Cartesian coordinates of the particular books, but I know if I want to get a book, I walk the path connecting the bedroom, to the corridor, to the living room, and then look in the bookshelf. Then if I want to say read Dune, I know it's in the bottom-most shelf where I keep the sci-fi.

That procedure conceptually translates to

  $ cd ..
  $ cd livingroom
  $ cd bookshelf
  $ cd bottom-shelf
  $ cat Dune

Yeah these hierarchies are something that really slow me down. Was it arch/aarch64/dts? Arrgh no, arch/aarch64/boot/dts? No, actually arch/arm64/boot/dts..

Similarly, I couldn't decide on a hierarchy for my apartment. What's above living room? If root is your entire house, does that mean you can skip into any room without going through another? I can't go to the bedroom at the other end of my apartment without going through corridor..ish thing and the living room. I can't place room above the other because none contain the other. So I guess root is the whole house then. What does cd .. in my bedroom do? Do I get out of the apartment, so that I can enter another room? That's just weird. But ok, I'll take it: I can magically teleport out of my room and "into the apartment" and just warp into another room. Convenient, could we also flatten the bookshelf and bottom shelf into my apartment so that I can warp straight to them without first thinking about "livingroom"? Hierarchy begone :)

And still there is no spatial relationship between items under livingroom. It's unorganized chaos. I have more than one bookshelf.. I can refer to the bookshelf on the left or bookshelf on the right but in a folder? It gets interesting when you try to draw the line between living room and kitchen since they're kinda one and the same but not really. They're spatially separated but it's hard to say where one ends and the other begins.

On a map, I can see continents, countries, cities, all at one glance. What shallow hierarchy there is, you can also see right through it; it's merely a guideline, not an obligation. You can see through multiple layers of hierarchy and your vision is not restricted to one branch. I can slide from Finland to Sweden near Tornio without having to first back up in some artificial hierarchy. I could do that even if Finland and Sweden were considered to be on different continents for some weird reason.

You do get into delineation problems when you investigate the conceptual notion of space, as well as overlapping concepts, but I don't really see this as a problem. The world as it is, and the world as we conceptualize it is not the same thing. Our conceptual view of the world does not play by the same rules as the things in themselves.

Anyway, given a Google Maps view of the entire world, completely unstructured, where's Waldo? You get exactly the same problem you had with arch/aarch64/dts, except now you don't even have a hierarchy to go on. He could be in Tokyo among 13.5 million people, he could be in Siberia, he could be on a boat on the ocean. Heck, he could be in a mine underground. He could be on the International Space Station.

This problem is pretty much solved by Adobe Lightroom, although not at filesystem level, purely within the application itself. You tag photos with keywords, which can be hierarchical.

Let's say you take a photo of a frog in Florida at night. You apply the keywords "frog, Florida, night".

Over time you photograph all kinds of other creatures as well as across several locations, so this list of individual keywords will grow. The cool thing is that you can now group the keywords themselves.

You create a new keyword "USA", and then drag "Florida" in it. You create a new keyword "Amphibians" and drag "Frogs" in it. And then create "Animals" and drag "Amphibians" in it.

The powerful thing is that you now have the power of flat keywords as well as hierarchical browsing across multiple hierarchies.
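A minimal sketch of how grouped keywords could behave, assuming a simple child-to-parent map. This is not how Lightroom stores keywords internally; the names `parent`, `expand`, and `find` are hypothetical, and the keyword is written as "Frogs" to match the grouping step above.

```python
# Hypothetical child -> parent keyword map, mirroring the drag-and-drop grouping.
parent = {"Florida": "USA", "Frogs": "Amphibians", "Amphibians": "Animals"}

# Each photo only carries the flat keywords that were applied by hand.
photos = {"frog_at_night.jpg": {"Frogs", "Florida", "Night"}}

def expand(keyword):
    """Return the keyword followed by all of its ancestors."""
    chain = [keyword]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain

def find(query):
    """Photos whose keywords, or any ancestor of them, match the query."""
    return sorted(
        name for name, keywords in photos.items()
        if any(query in expand(k) for k in keywords)
    )

print(expand("Frogs"))
print(find("Animals"))
print(find("USA"))
```

The photo was never tagged "Animals" or "USA", yet both queries find it, because the hierarchy lives on the keywords rather than on the files.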

Folders work perfectly in a pure hierarchical taxonomy. Many classifications defy this rigid of a structure, however. For example:

Widget 1: it is A and B but not C, so tag it with A and B.

Widget 2: it is A and B and C, so tag it A, B, and C.

Widget 3: it is B only, so tag it B.

That is pretty simple, but you couldn't represent that in a folder system without permutation folders, meaning you now have folder sprawl, making things harder to find.
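To make the widget example concrete, here is a rough sketch that models tags as plain sets and counts how many combination folders a folder-only scheme would need in the worst case. The helper names `with_tags` and `permutation_folders` are made up for illustration.

```python
from itertools import combinations

# One entry per widget; tags are just a set, so nothing is duplicated.
widgets = {
    "widget1": {"A", "B"},
    "widget2": {"A", "B", "C"},
    "widget3": {"B"},
}

def with_tags(items, *wanted):
    """Names of items that carry every tag in `wanted`."""
    return sorted(name for name, tags in items.items() if set(wanted) <= tags)

def permutation_folders(all_tags):
    """Every non-empty tag combination a folder tree would have to cover."""
    tags = sorted(all_tags)
    return [combo for r in range(1, len(tags) + 1) for combo in combinations(tags, r)]

print(with_tags(widgets, "A", "B"))
print(len(permutation_folders({"A", "B", "C"})))
```

With three tags that is already 7 possible folders, and the count doubles (plus one) with every additional tag, which is the folder sprawl described above.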

This is how servers and ec2s are for almost everyone. Billing codes, environments, teams, business units, etc. A folder taxonomy to replace ec2 tags would be a nightmare.

Folders have the weakness which you describe. But that weakness is also a strength - folders are a form of encapsulation which, for example, discourages or prevents processes in one folder from destroying data in another folder (accidentally or maliciously). With just tags, that is a bigger danger.

After the past 5 years or so of experiencing the supposed best "folderless" computing has to offer on iOS and ideologically similar platforms, I've gained huge respect for how beautiful the concept of the folder is.

I wrote Supertag[0] specifically to get the same kind of ergonomics with tags as you get with folders. Basically you can dynamically render sub-folders based on the tags that apply to your current selection.

Example: /A/B contains the intersection of tags A and B. If sub-folder C exists underneath /A/B, it's because one of the files in the intersection also has tag C.

0. https://amoffat.github.io/supertag/
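The intersection rule can be sketched in a few lines. This is only an illustration of the idea as described in the comment, not Supertag's actual implementation; `files` and `list_virtual_folder` are hypothetical names.

```python
# Hypothetical tagged files.
files = {
    "report.txt": {"A", "B"},
    "photo.jpg": {"A", "B", "C"},
    "notes.md": {"B"},
}

def list_virtual_folder(files, path):
    """Return (files, sub-folders) for a virtual path such as "/A/B".

    Files are those carrying every tag named in the path; sub-folders are
    the remaining tags found on at least one of those files.
    """
    selected = {part for part in path.split("/") if part}
    matches = {name: tags for name, tags in files.items() if selected <= tags}
    subfolders = set().union(*matches.values()) - selected
    return sorted(matches), sorted(subfolders)

print(list_virtual_folder(files, "/A/B"))
```

Here /A/B lists report.txt and photo.jpg, and a sub-folder C appears only because photo.jpg also carries tag C.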

Oh, this is neat. The user is constructing a predicate "A and B and C and ...", where each new "and" is added by clicking into a folder.

If you add OR with ctrl+click, and perhaps some sort of NOT, the user can now construct arbitrary predicates using only the tags and familiar folder point-and-click operations.
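As a rough sketch of what such predicates might look like once the clicks are translated into code, assuming tags are plain strings. The combinator names `has`, `all_of`, `any_of`, and `negate` are invented for this example.

```python
from typing import Callable, FrozenSet

Predicate = Callable[[FrozenSet[str]], bool]

def has(tag: str) -> Predicate:
    """Clicking into a folder: the file must carry this tag."""
    return lambda tags: tag in tags

def all_of(*preds: Predicate) -> Predicate:
    """Nested folders: every condition must hold (AND)."""
    return lambda tags: all(p(tags) for p in preds)

def any_of(*preds: Predicate) -> Predicate:
    """Ctrl+click on several folders: at least one condition holds (OR)."""
    return lambda tags: any(p(tags) for p in preds)

def negate(pred: Predicate) -> Predicate:
    """Some sort of NOT."""
    return lambda tags: not pred(tags)

# "A and (B or C) and not D"
query = all_of(has("A"), any_of(has("B"), has("C")), negate(has("D")))

files = {
    "one.txt": frozenset({"A", "B"}),
    "two.txt": frozenset({"A", "C", "D"}),
    "three.txt": frozenset({"B", "C"}),
}
selected = sorted(name for name, tags in files.items() if query(tags))
print(selected)
```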

Folders are simple, even if tags can imitate them, they can't replace their structure. Tags are a good fit for items that naturally fall into several categories, like music or books, but not for general file systems.

> Folders are simple, even if tags can imitate them, they can't replace their structure. Tags are a good fit for items that naturally fall into several categories, like music or books, but not for general file systems.

This. They're different tools for different jobs. Being a dogmatist and trying to win an us vs them competition on which is the ONE TRUE SOLUTION is stupid. That's like trying to argue which is a better tool, a screwdriver or a pair of pliers, and saying it's stupid to keep a screwdriver in a toolbox. Honestly, in a lot of cases both should be implemented and available.

Stupid reasons pliers are better than screwdrivers:

* There are so many different kinds of screwdrivers; it's too confusing to users.

* Pliers can be used to drive screws, so there's no need for a dedicated screw-driving tool. We've had a lot of success gripping the screw head with pliers and turning.

* Pliers are beautiful and modern, and lots of popular influencers are using pliers now. Screwdrivers are old tech and ugly.

Your toolbox analogy is great for another reason. I can “tag” my screwdrivers with multiple items: Philips, flat, Torx, Square, short, long, ratcheting… Each screwdriver can have multiple tags. But I store them all in the same drawer in my toolbox.

Yeah tags can imitate them on some level. But what about folder workflows like “I want to copy this whole project across 100+ files exactly so I can start changing it without losing anything”

With folders it’s just copy the base folder.

Now I know as devs some of us would suggest version control as the solution but keep in mind with folders my non-technical 70 year old dad can do this action and understand it completely because of the tactility a folder provides.

> if tags can imitate them, they can't replace their structure.

Technically, they can. A "folder" is just a tag whose name is the entire directory path up to root. All files with that same tag are in the same folder.
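A small sketch of that claim: treat each file's full parent path as its one tag, then group by tag. The sample paths and the name `folder_tags` are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical file paths.
paths = [
    "/home/me/notes/a.md",
    "/home/me/notes/b.md",
    "/home/me/code/main.py",
]

def folder_tags(paths):
    """Group file names by a single tag: the full directory path up to root."""
    by_tag = defaultdict(list)
    for p in paths:
        path = PurePosixPath(p)
        by_tag[str(path.parent)].append(path.name)
    return dict(by_tag)

print(folder_tags(paths))
```

All files sharing the tag "/home/me/notes" end up together, which is exactly what listing that folder would show. What the flat tag does not give you for free is the nesting between "/home/me" and "/home/me/notes".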

Assuming the tags cannot be linked in hierarchies/graphs.

And yet, folders are tags.

In the sense that a staircase can be used to seat guests. Folders are not like tags, tags are like flat folders. Folders are better at separating, tags are better at mixing.

just veeery shy ones.

Agree. They both have a place. I use Zotero to keep track of my references and also just interesting things I find. It uses folders and tags. Fantastic!

Could you describe your Zotero folder & tag strategy? I don’t know where to start with this and it’s a hot mess

I start with the folders. Create ones on broad categories like "Math", "Physics", "Politics", and each of those could have subfolders (folders are called collections in Zotero). An entry can be copied to multiple folders but it's copy-by-reference, so you see the same file and notes and so on in whichever folder you view it in. Tags are added as metadata and I use them for searching the library for things. In a given collection you can select a keyword/tag and it will highlight all items with that tag. I've been using Zotero now for years and many 1000s of entries. It's by far the best tool I've used for organizing my stuff.

Here you can see uses of folders and tags. It's from an academic context but you'll get the idea. https://youtu.be/efLOqgS4jzA

It should be possible to mimic tags with folders & clever scripting that moves files around according to their "tags".

Imagine a command like:

    tagmv file1 file2 directory1 directory2 -t tag1 tag2 tag3

where it moves the files/directories to something like

Or instead of using "#" could use some other special/rare character/unicode to indicate that it is a "tag-directory".

The hierarchy of the tag-directories could be ordered from the most common to least common tags. The most common tag would have its tag-directory under the top-level directory (something like ~/t/ to prevent confusion with the rest of the filesystem perhaps?).

The files & tag-directories could get re-organized every time the usage of tags changes, in order to keep the hierarchy from most common tags to least common.

A set of tools like tagmv, tagcd, tagls, etc could work with this tag-based structure.
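The ordering rule (most common tag first) can be sketched as a pure function that only computes a destination path and moves nothing. The exact layout the comment had in mind is not shown above, so the path scheme here, including the "#" prefix and the ~/t root, is purely an assumption for illustration, as are the names `tag_directory` and `usage`.

```python
from collections import Counter
from pathlib import PurePosixPath

def tag_directory(tags, usage, root="~/t", marker="#"):
    """Hypothetical destination directory for a file carrying `tags`.

    Tags are ordered from most to least used (ties broken alphabetically),
    and each one becomes a nested directory prefixed with `marker`.
    """
    ordered = sorted(set(tags), key=lambda t: (-usage[t], t))
    return PurePosixPath(root, *(marker + t for t in ordered))

# Hypothetical usage counts across the whole collection.
usage = Counter({"work": 40, "python": 12, "draft": 3})

print(tag_directory(["draft", "work", "python"], usage))
```

Because the order depends on global usage counts, a change in tag popularity changes the computed path, which is why the comment mentions re-organizing files whenever usage shifts.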


There's a talk about such https://karl-voit.at/managing-digital-photographs/, there's a video.

Karl seems to have a fancy TUI, I made me a cmdline helper https://codeberg.org/mro/Tagger.

I've seen that type of idea before, encoding tags within the file name. But as someone that lives in the terminal, I'm not a huge fan of cluttering up and making the file names so long.

Also this limits the number of tags that can be assigned to a file, with the limit varying between operating systems.

But if you don't use filenames you have to store this tagging information externally, which has to stay in sync with file locations and ideally not be reliant on one specific program. There's folders + symlinks but I imagine that might be complicated to manage as well.

I think it's possible to avoid using symlinks and just use plain folders to represent tags only. By using a special naming strategy on the folders themselves, it would be possible to have a set of scripts for "tagging" files and operating on them. Tagging would be equivalent to just moving the files into the relevant nested tag directories.

No duplicate files or symlinks necessary.. Can make scripts like tagcd and tagls to make it work.

I don't think that would scale. The nested directory structure could reach a depth equivalent to the number of tags to support all combinations. Meanwhile with symlinks you only need one folder per tag and it can be comfortably used with file browsers.
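For comparison, a minimal sketch of the symlink layout: one flat folder per tag, each holding links back to the real file. It assumes a POSIX-like system where creating symlinks is permitted; the directory names and `tag_with_symlinks` are hypothetical.

```python
import tempfile
from pathlib import Path

def tag_with_symlinks(file_path, tags, tag_root):
    """Create tag_root/<tag>/<filename> symlinks pointing at the real file."""
    links = []
    for tag in tags:
        tag_dir = tag_root / tag
        tag_dir.mkdir(parents=True, exist_ok=True)
        link = tag_dir / file_path.name
        if not link.is_symlink():
            link.symlink_to(file_path.resolve())
        links.append(link)
    return links

# Demo in a throwaway directory so nothing real is touched.
workdir = Path(tempfile.mkdtemp())
original = workdir / "files" / "notes.txt"
original.parent.mkdir()
original.write_text("some notes")

links = tag_with_symlinks(original, ["ancient-rome", "biology"], workdir / "tags")
print(sorted(p.name for p in (workdir / "tags").iterdir()))
```

The directory depth stays at one level no matter how many tags a file has; the cost is that links can dangle if the original file is moved.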

If there was a limit of 10 tags per file, then the max depth would be 10 folders.

With <20 tags per file you could just use the filename and not worry about more than filtering a bit using `find -iname`. With a filename limit of 255 bytes (Linux) that would still be ~12 characters per tag on average.

I'd prefer not to worry about such limits though, with some files that don't allow full-text search it can be useful to have more.
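The back-of-the-envelope numbers can be checked with a short sketch. It assumes one separator byte between tags and ignores the base name and extension, so the real budget is a little smaller; `average_tag_budget` and `iname` are hypothetical helpers, the latter only imitating the spirit of `find -iname`.

```python
import fnmatch

NAME_MAX = 255  # bytes, the Linux filename limit mentioned above

def average_tag_budget(tag_count, name_max=NAME_MAX, separator_bytes=1):
    """Average bytes available per tag after reserving the separators."""
    usable = name_max - separator_bytes * (tag_count - 1)
    return usable / tag_count

def iname(filenames, pattern):
    """Case-insensitive glob match on file names."""
    return [f for f in filenames if fnmatch.fnmatchcase(f.lower(), pattern.lower())]

print(average_tag_budget(20))
print(iname(["Trip_Paris_Business.md", "trip_cancun_vacation.md"], "*paris*"))
```

With 20 tags and 19 one-byte separators this works out to 11.8 bytes per tag, in line with the "~12 characters" estimate for ASCII tags.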

I think a casual relationship to tags and folders together is the way. I built a system that allows both [0]. And using them together has some advantages. It's important that nothing of type A could ever be of type B. This is useful managing client work, and other things like transactional data. Folders handle this scenario. OTOH, a lot of research material for one bucket might be related to another, for me that's usually programming related articles etc. I want to review all that stuff together -- a tag handles that.

In other words, folders create boundaries between information, and tags connect across those boundaries.

[0]. https://www.tatatap.com

I still want a filesystem that can do both.

I want to have regular folders, and then folders that I can issue a SQL style query to generate their contents.

Take multimedia. With a traditional file system you can only have one type of sort. Typically by type (audio, video, image) and then alphabetically. It would often be nice to have a folder that is formed by querying the metadata, say all the items released in the 1950's, or all the items that are a low quality copy.
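A query-backed folder could be approximated today with a metadata table and a saved WHERE clause. This is just a sketch using an in-memory SQLite database; the `media` schema, the sample rows, and `query_folder` are all made up for illustration, and no existing filesystem is claimed to work this way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE media (path TEXT, kind TEXT, year INTEGER, quality TEXT)")
conn.executemany(
    "INSERT INTO media VALUES (?, ?, ?, ?)",
    [
        ("audio/track-01.flac", "audio", 1957, "high"),   # hypothetical rows
        ("video/film-01.mkv", "video", 1958, "low"),
        ("audio/track-02.flac", "audio", 1969, "high"),
    ],
)

def query_folder(conn, where, params=()):
    """Contents of a 'virtual folder' defined by a WHERE clause.

    `where` is trusted text written by the folder's owner; values go
    through `params` so they are bound safely.
    """
    sql = "SELECT path FROM media WHERE " + where + " ORDER BY path"
    return [row[0] for row in conn.execute(sql, params)]

print(query_folder(conn, "year BETWEEN ? AND ?", (1950, 1959)))
print(query_folder(conn, "quality = ?", ("low",)))
```

The regular folders (audio/, video/) stay as they are; the two queries act as extra folders whose contents are recomputed whenever they are opened.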

MS tried it [0] and failed. It was not easy. This was in the days of Whistler, Blackcomb and Longhorn and always an interesting read if you've the time. [1]

[0] https://en.wikipedia.org/wiki/WinFS [1] https://hal2020.com/2013/03/10/winfs-integratedunified-stora...

Apple did it and succeeded? In MacOS you can use tags and/or folders to organize files, and there is an OS-wide search index (Spotlight).

WinFS was meant to be far more sophisticated than that. It's more like Salesforce for your file system.

why choose?

Current desktop operating systems have hierarchical filesystems and support flat lookup via indexing (e.g. MacOS's spotlight).

I strongly disagree with the article's assertion "take my word for it — it’s much easier to tell a computer “do this to every file in this folder” than to tell it “first, find everything with this tag, then put those into a string, then do a thing to every file in this string…”" because in most reasonable filesystem APIs that's exactly what you would do with folders: first, find everything matching this wildcard from this folder string, put these filenames in a list, and then do a thing to every file in this list... I don't see a fundamental difference; in both cases you're running an arbitrary query first to get a list of the file names or handles matching specific criteria.
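A toy sketch of that symmetry, using an in-memory catalog instead of a real filesystem. The catalog contents and the helpers `in_folder`, `with_tag`, and `do_thing` are hypothetical.

```python
import fnmatch

# Hypothetical catalog: path -> tags.
catalog = {
    "trips/paris.md": {"business"},
    "trips/cancun.md": {"vacation"},
    "notes/todo.md": {"business"},
}

def in_folder(catalog, pattern):
    """Query 1: every path matching a folder wildcard."""
    return sorted(p for p in catalog if fnmatch.fnmatchcase(p, pattern))

def with_tag(catalog, tag):
    """Query 2: every path carrying a tag."""
    return sorted(p for p, tags in catalog.items() if tag in tags)

def do_thing(paths):
    """The 'thing' done to every file; here it just upper-cases the name."""
    return [p.upper() for p in paths]

print(do_thing(in_folder(catalog, "trips/*")))
print(do_thing(with_tag(catalog, "business")))
```

Both calls have the same shape: run a query, get a list, loop over it. Only the query differs.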

It's easier in the sense that if you're a brand new baby programmer, it takes fewer easily-findable-via-Google commands. Sorry if that wasn't clear!

The article wasn't really written for the HackerNews audience, sorry I never expected to wind up here — it was written for, like, college kids majoring in archaeology.

My issue with tags is that I always forget to tag things. When you have folders, it's more obvious when something hasn't been organized.

Tags I've always thought of as conceptual, while folders are at the very least absolute.

Let's say I'm recording events for my future travels. trips/paris.md and trips/cancun.md are in my folder structure with them tagged as business and vacation respectively. Later, I can go back and add a "mistakes" tag to cancun.md, but really if I ever need to look up all my trips, I know they will be in trips, and it's an incontrovertible fact that Cancun was a trip.

There's room for both, but tagging historically came out of a need where search functions were poor. These days tagging is unnecessary work imo.

Shameless plug:

I wrote a Nemo extension [1] that lets you add columns for #tags @persons or $whatever you put in a filename. You can sort by these columns. For complex things, there's always `find`.

[1] https://github.com/dejj/nemo-addons/blob/main/nemo-python/ex...
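For a feel of what pulling such markers out of a filename involves, here is a small sketch. It is not the extension's actual code or parsing rules; the regular expression, the column names in `SIGILS`, and `parse_columns` are assumptions made for this example.

```python
import re
from collections import defaultdict

# Assumed mapping from marker character to column name.
SIGILS = {"#": "tags", "@": "persons", "$": "other"}

# A marker followed by word characters or hyphens.
TOKEN = re.compile(r"([#@$])([\w-]+)")

def parse_columns(filename):
    """Split the #tag, @person and $whatever markers of a filename into columns."""
    columns = defaultdict(list)
    for sigil, value in TOKEN.findall(filename):
        columns[SIGILS[sigil]].append(value)
    return dict(columns)

print(parse_columns("beach #holiday @anna $2019.jpg"))
```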

Check out Bonsai Browser [0] if you want to see what a web browser built on tagging rather than folders looks like (disclaimer: co-founder).

I think the main virtue of tagging systems is in the low friction to add info and multiple inclusion.

Tagging also has its downsides and I think we'll probably end up on some hybrid system in the long run.

[0] https://bonsaibrowser.com/

Seemed interesting, but why do you have to have an account to log in?

I was going to try it out, but.. I just hate not being able to try something before giving away my email.

Maybe we can add anon accounts at some point. The main reason right now is to make the feedback process easier since it starts an email chain that we can follow up on. All of the best improvements recently have started with these email chains and follow on conversations.

You'll get more and higher quality feedback from a larger volume of users. And right now you're self selecting for people who are willing to use their email to create an account.

The best way to improve a product is to have a lot of people use it

Instead of tagging (which takes effort) you could also use a good search algorithm.

  SELECT * FROM desktop_icons WHERE ...

I used to be one of those people who had to sort my desktop icons to find stuff, because there were so many icons they'd overlap. Today, ~/Desktop is completely empty. I've gotta say, folders are far superior to a rubbish bin, even when it isn't hard to search files by name & contents.

A good folder hierarchy is like a library: you know where it is even before you look for it.

Except when you forgot to put it in the right place because you just had to extract it and do that one quick thing...

I'd rather not rely on my own willingness to organize files so I'll take a search tool any day.

Nah, that's when you just have an "Unsorted" folder. Sort that once per week. Problem solved

This is how I treat my project (personal and work) task management. Throw things into the inbox as they come up unless I have the time to organize right then, later on sort into various contexts and tag them (to the extent I care to tag) when I have a chance. Once I've tagged it, it's clear of the inbox. Also how I used to use Gmail (back when I used the web interface, which I gave up on when it started crawling on my new and bleeding edge, at the time, desktop).

One needs both. With folders and trees of bookmarks, you can discover stuff you’d forgotten and wouldn’t search for (because you’d forgotten you knew it once :).

This becomes more important as you get older and begin to experience the pleasures of reading things again for the “first” time.

I have two such folders: "Downloads" and "Temp". Everything not classified stays there until it gets moved to a permanent folder, or erased.

I admit I sometimes search for files, but it's a filename search, not a content search. I don't want a background service indexing the files and contents of all my disks when I can use a regular search instead.

Probably depends on your use case, but my experience is that getting good results with text search often takes a lot more effort than tagging/classifying the documents when you get them would. It takes effort to craft a good query. Terminology used varies. "Oh, Sally uses a different term than I do, what was that?" If you tag or put the files into folders then you can standardize.

I worked as a patent examiner in the past and using only text search for that would be considered negligent there as it would miss a lot of documents. I ultimately used a combination of text, classification, citation, and AI similarity search techniques. Each has strengths and weaknesses so using all of them makes sense.

If (fuzzy) search is your primary use case, sure. But classification is also important for other use cases, like security ACLs, usage analytics, etc.

If you don't know the name of the thing you're looking for, can you search for it? I get that a lot of developers like the idea of falling back to search because most files they use can be searched by content, but for everyone who has to work with binary files it's not suitable.

Johnny Decimal is also just another abstraction layer, and in this case a terrible one.

I would still want it to support tagging so that I could ensure that those searches picked up specifically what I wanted and not more random results. I’ve had to try to find things on Confluence and Sharepoint and it takes a lot of time to try to get what you want out of the search results.

> Take my word for it — it’s much easier to tell a computer “do this to every file in this folder” than to tell it “first, find everything with this tag, then put those into a string, then do a thing to every file in this string…”

The solution here would be a shell (or command) that's able to "do this to every file with this tag", right?

In the GUI world, Haiku's Tracker is an example of how such queries might look (https://www.haiku-os.org/docs/userguide/en/queries.html); a command-line tool (perhaps with nicer syntax if possible) could readily do the same (and then perhaps do something with the results). Haiku's advantage here is that it uses attributes instead of tags, so it's a bit of a richer experience; also, most Haiku/BeOS software is already aware of attributes, so that also makes it easier to rely on those attributes being actually used.

I have settled on putting tags (bunch of words I associate with given subject separated by underscores) in names of files/directories and using FSearch (https://github.com/cboxdoerfer/fsearch) to search for the tags in names/paths.

It's simple, it's portable, it's good enough. You could in theory improve it by having multiple views of the same data (let's say you want to save some notes about "Naturalis Historia": should you put them under the "ancient Rome" directory or under "biology"?), for example by using hardlinks, but I don't know if there is a way to create a backup on another filesystem that will keep hardlinks as hardlinks (DAR seems promising http://dar.linux.free.fr/doc/Features.html but I have yet to try it).

I just accept that modern computing is complex, and requires a multitude of methods to organise and navigate.

For example, I use Digikam from KDE to manage my photos. It has a LOT of ways to file and retrieve photos. First are collections, and they contain folders that are a window to the filesystem. (I like that, because it means only maintaining one filing system.) Inside a folder view, you can also group photos by dragNdrop. You also have star ratings, tags, flags, locations and faces. You can search by dates, locations on a map, or show images that are similar. The list goes on. It is very flexible, so you can choose your workflow.

My personal hot take:

Folders are just hierarchical tags.

Tags are just non-hierarchical folders.

Both can be compared to programming classes, with subclassing representing folders (key negative: Can't inherit from multiple parents), and class composition representing tags (key negative: No hierarchy).

Folder names (including parent folders) could automatically be applied as tags. Tags could likewise be viewed as folders (tags with multiple "parents" would be something like symlinks), but would need a hierarchy to become nested.
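Applying folder names as tags automatically takes very little code. A minimal sketch, assuming relative POSIX-style paths; `folders_as_tags` is a hypothetical helper.

```python
from pathlib import PurePosixPath

def folders_as_tags(path):
    """Every folder on the way to the file becomes a tag, outermost first."""
    return list(PurePosixPath(path).parent.parts)

print(folders_as_tags("projects/python/sqlalchemy/notes.md"))
```

Going the other way is the harder half: a flat set of tags carries no order, so something extra (a declared hierarchy, or a usage-based ordering) is needed before the tags can be shown as nested folders.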

> Sometimes people will ask me what I do when I have something that could go into multiple places and the answer has always been pretty simple: if something could conceivably fit in two different folders I need to consolidate my folders.

How does that not lead to an end result where you have one folder "everything", and thus no organization at all? There's always edge cases.

I've got over 2k files in my Obsidian vault and haven't had a problem really.

I think it depends on how you categorize things? I'm never going to confuse my tax documents with my short fiction, or my character profiles for a fantasy character with my daily notes page.

You'd have to give me more info on what categories you have to come up with examples, but I find it completely unbelievable for that problem not to exist.

Hey, I know this writer! Her blog is awesome and she's super friendly and gives great advice on Obsidian and on productivity/PKM in general. This was a nice surprise!

what about 3d building like structures that you fly over or into. mainly for amusement parks, zoos, a combination of both, and oil speculation supercomputers

What can you do with folders that you cannot with tags?

It’s not about ability, but mindset.

Folders are tree structures where each file lives in only one place (in practice, hard links are uncommon, and symlinks are more common but obvious).

That abstraction is useful when navigating, looking for related files, etc.

Tags are more general. Yes, they could implement the features of folders, but the connotation of tags is that they are flat and unstructured.

Sometimes programs implement hierarchical tags, and that appeals to me the most. That way, you can navigate hierarchies of tags the same way you would navigate hierarchies of folders, but there's more than one hierarchy that can be navigated. And because you don't need to worry about encoding every piece of information into a single path, the hierarchies can be shallower and more convenient.


you can do that with tags tho.

They both support discoverability in different ways

Both. Folders with tag support

I miss the promise of WinFS

why not use semantic tagging?

Ex: use wikipedia page id as a tag.


>Error establishing a database connection

Whew I wasn't ready for that hot take.

Oh god I'm sorry I wasn't expecting this level of traffic, let me see if I can fix it.

Don't worry about it, it's the HN effect. I really recommend using a PaaS like Netlify (I'm not sponsored or affiliated). It will take a weight off your shoulders the next time you get a surge.

I think I managed to get Cloudflare set up, let's hope that works

I guess it was so hot, that the database melted.

I'm hosted on Digital Ocean, and the irony is, I'm literally mid-migration to Ghost from Wordpress — if this had happened a week later this wouldn't have happened.

I'm trying to figure out how to upgrade the droplet but being honest, my husband is the one who set all this up for me, so it's going to take me a minute to figure out how to add more bandwidth or whatever.

> I'm trying to figure out how to upgrade the droplet but being honest, my husband is the one who set all this up for me

Yeah, HN can be all "why didn't you just...?" when a site gets hit with traffic, but you know what? I've been in this industry for over 30 years, had a decent stint at Microsoft and other companies you've heard of, working on stuff you've probably used. And if my stupid blog somehow ended up on the front page of $POPULAR_SITE I wouldn't have the first clue how to increase bandwidth. Oh, 30 years of this shite means I know where to immediately start looking, but off the top of my head? phhhhht And it sure as hell would take me more than "a minute to figure out how to add more bandwidth or whatever". :-)

Point is, your page hit the HN lottery, no need for apology. I can bookmark it for later.

If I can figure out how to GET to my wordpress page for long enough, I'll set up a redirect and mirror it as an article on https://obsidianroundup.org/ — which is literally what I was working on this week, haha, the nice folks at Ghost's concierge already helped me do it for my history nerd stuff newsletter.

Someone reached out and said I can use Cloudflare to fix this, so I'm gonna go try that, doot doot.

This - I've worked fullstack on apps that do unfathomable numbers of connections per second. But for a personal blog the best thing I could muster is probably go to the cloudflare site with my wallet in hand and click around nervously until I figure out how to buy caching from them before it falls off the top page of hacker news.

It's the database that is being hit (probably multiple times) on every page request. Typically you would add a caching layer to WordPress so that each URL gets cached for N minutes and you don't need to do the expensive rendering each time.
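The idea behind such a caching layer, independent of WordPress or any particular plugin, can be sketched as a tiny time-to-live cache around the expensive render step. `PageCache`, `slow_render`, and the injectable clock are hypothetical names used so the example can run without waiting.

```python
import time

class PageCache:
    """Cache rendered pages per URL for `ttl_seconds`."""

    def __init__(self, render, ttl_seconds, clock=time.monotonic):
        self.render = render      # the expensive function (database + templating)
        self.ttl = ttl_seconds
        self.clock = clock        # injectable so the demo can fake time
        self.entries = {}         # url -> (time stored, body)

    def get(self, url):
        now = self.clock()
        hit = self.entries.get(url)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]         # still fresh: no rendering, no database
        body = self.render(url)
        self.entries[url] = (now, body)
        return body

calls = []

def slow_render(url):
    calls.append(url)             # stands in for the database queries
    return f"<html>{url}</html>"

fake_now = [0.0]
cache = PageCache(slow_render, ttl_seconds=300, clock=lambda: fake_now[0])

cache.get("/post")
cache.get("/post")                # served from the cache
fake_now[0] = 301.0
cache.get("/post")                # entry expired, rendered again
print(len(calls))
```

Three requests cause only two renders here; under a traffic spike, thousands of requests within the N-minute window would still cost a single render per URL.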

If you want something quick and easy, just sign up for a free account at Cloudflare and hook up their CDN. It's a useful thing to have even when you've switched to WordPress, too.

Ah, bless, this is exactly what I am doing right now and it is much less terrifying than I thought it would be.

Ironically, this has been on my todo list to learn — I want to mirror my Obsidian notes and that requires Cloudflare and before today I've been too nervous to muck around with it.

Or, uh, switched to ghost. Although ghost could likely handle it on its own

It's an old problem. You just need WP-Cache or WP-SuperCache or successor plugins. i.e. not your fault, this happens to everyone who runs WP.

if you're expecting more bursty traffic coming your way from reddit or HN, it might be best to deploy a static site out to something like Vercel, GitHub Pages, Cloudflare Pages, Netlify, et cetera. it's not really as easy as running from a WordPress instance but it'll better handle these sort of events.

DigitalOcean's App Platform supports static sites too.

running a wordpress is a noble endeavor but it isn't ideal for unexpected traffic hugs from a site like this.

I am _literally_ mid-migration to Ghost because of how slow Wordpress is :( :( :(

The Ghost team has been talking about moving more towards the JAMstack model (static sites, basically). I don't run it myself but it's something worth looking into depending on your traffic expectations in the future:

