First, I tried tags. This seemed like a good idea and was a lot of fun at first, but eventually I got _really_ tired of having to curate tags for all of my notes and boy there were a lot of them! It's not something you can just do once either, because every time you update a note you have to remember to change the list of tags as well. I wound up with many articles whose tags no longer (or never did) match the content.
To get myself out of the tag-curating business, I then tried the "folder" idea by separating areas of concerns into namespaces. So, all of the Python-related notes go under the "Python" namespace. All of the database notes into "Databases," and so on. All that did was shift the problem space. Which namespace does the note on SQLAlchemy go in? If an article does not fit into an existing namespace, should I create one for it now, or put it in the root and hope I remember to move it if a second related note comes along? To put it succinctly, my notes do not fit into a DAG.
My third iteration dropped both tags and namespaces and implemented the FTS5 search built into SQLite. Two years later, I have no regrets. The only "curation" I have to worry about is giving each note an accurate title and breaking up large notes into smaller ones when it makes sense. Now when I need something I just search for it and it shows up in the results.
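For anyone curious what that looks like in practice, here is a minimal sketch of FTS5-backed note search using Python's built-in `sqlite3` module. The `notes` table, its columns, and the sample rows are made up for illustration (the original setup isn't described), and it assumes your SQLite build has FTS5 compiled in, which most current Python distributions do.

```python
import sqlite3

# Hypothetical schema: one FTS5 virtual table holding note titles and bodies.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE notes USING fts5(title, body)")
conn.executemany(
    "INSERT INTO notes (title, body) VALUES (?, ?)",
    [
        ("SQLAlchemy sessions", "Notes on session scope when talking to PostgreSQL from Python."),
        ("PostgreSQL vacuum", "When autovacuum is not enough and a manual VACUUM is needed."),
        ("Sourdough starter", "Feed the starter twice a day."),
    ],
)


def search(query):
    """Return titles of matching notes, best match first (FTS5's built-in rank)."""
    rows = conn.execute(
        "SELECT title FROM notes WHERE notes MATCH ? ORDER BY rank",
        (query,),
    )
    return [title for (title,) in rows]


print(search("postgresql"))
```

Note that the SQLAlchemy note turns up for both "python" and "postgresql" without anyone having decided which namespace it belongs to.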
Google, for all of their other faults, got one thing extremely right when it came to web content and email: curation is a fine hobby, but for Really Getting Shit Done, you index the content well and then just search for it when you need it.
"How to find stuff" is a space with many solutions to similar looking but subtly different problems.
HA! In my case I wrote my notes, but don’t remember what’s in ‘em. It’s endemic for learning.
folders/tags is a false dichotomy.
It’s like internal white hat SEO
When results, e.g., include multiple tickets and documents referencing the same error or configuration, which do you use for troubleshooting? Which results are the authoritative / canonical / source of truth for documenting the latest configuration directives?
Versioning used well for the input helps. Organization and ontology is additional context for using search results well.
You don't need to have a perfect hierarchy, you just need to have a hierarchy that you'll remember.
If you constrain yourself into trying to produce a Liskov free hierarchy you'll tie yourself in knots.
I'd probably put SQLAlchemy under Python because it is python-specific. Databases should be kept for your notes on PostgreSQL administration (and you may have python scripts in there but they'll be to serve the purpose of administrating a concrete PostgreSQL install or something).
And you don't need to find all the database stuff and collect it together, you just need to know where you put something. So largely put it in the folder that makes the most immediate sense to you and don't overthink it. Later if you hate it, refactor it. But remember that you're dealing with your own personal hash table for retrieval and not trying to implement group_by. The latter issue is where you'd need an index over your notes that was searchable, but I don't find myself needing it.
I think you mean "tree" or "hierarchy"? A DAG would be a superset of tagging, and could place SQLAlchemy under both Python and Databases, as long as you consider the edges oriented (i.e. there's a parent-child relationship).
Doesn't solve the tag maintenance issue, though.
But the one area that curation carries a big advantage over search is in browsing. You can do maybe some things with topic modeling and recommendations to allow a kind of browsing, but with searching it's really hard to know whether you have thoroughly covered a part of the space via searching, while with hierarchical curation that is easy. Filling in that last part with a good solution would make searching a no brainer over curation I think, but IMO I don't think most search solutions try to handle that right now.
I'm curious about your tagging strategy. Could you show some example of this issue?
I feel like my tags are very different since they're barely curated, but I can't imagine ending up with a "wrong" tag. I may miss some and have to add them later, but i can't remember ever removing or changing a tag. That's both in pinboard and in my scanned documents.
well, effectively, tags too, are just a specialized form of searching.
the one thing i don't see is the problem with tags getting out of sync.
yes, it happens. tags do become obsolete, but that is rarely a problem. it just means that i get more results than i should otherwise. occasionally when i see way too many obsolete tags i go and clean up that particular category of tags. if it is just a few then i ignore them.
and if i look at a tag and i don't find what i need, then i search for it, and add the tag when i find it so that next time i'll find it faster, because with more than 2 million emails in my archive search alone is not good enough.
I understand that this is a very subjective thing... but folders offer an exclusion which isn't immediately available via tagging. Tagging is implicitly an OR operation.
Now, your search engine might offer ways to add "not X", but it's a tradeoff. If you have highly-compartmentalized bits of info you end up with the problem of "how much do I have to explicitly exclude from my search?".
Personally, I think we're missing a level of organization somehow.
Doesn't have to be. I can't think of many tag-based search systems that default to OR; usually the default is AND, and that's pretty sensible.
There are plenty of text-based search systems that default to OR, though (see also: pretty much every mainstream search engine), and it makes searching a fucking nightmare, so I definitely get the aversion.
To highlight exclusion, consider an example- folder Animals with subfolders Cats and Dogs. A thing can be either a cat or a dog, but not both.
if an object has five tags, it is all those things. that looks like an AND relationship to me.
the rest is search. if your tagging system only does OR searches, then that's indeed a problem, but of the implementation, not of the concept.
and there is no reason both can't coexist.
one application that i use extensively is kphotoalbum. it allows me to organize photos by tags, categories, date, folders and other attributes all at the same time. i can navigate and search photos based on all of those attributes. usually i drill down a folder, and then pick tags from within that folder, or i pick tags at the top level and then work with the folder structure of the selected images. ironically, kphotoalbum does not do well on OR relationships or exclusion of specific attributes. i often use temporary tags to help me there, which kphotoalbum makes trivially easy to use.
I wonder if tags could work like spam detection. For each tag mark a few things as "tag" and "not tag" and let an AI figure out everything else.
If there's a tag you never look at, and so it's poorly trained, who cares? You never look at it. Commonly used tags will be better trained, because you will have marked more things as "yes tag" or "not tag".
I think the best solution is a mix of 3, folder structure with tags, full text search when needed
Let's say you have notes about SQLite. If you don't put "SQLite" in the note, how will you find them?
When flexible graphs/networks are abused for organizational purposes, you get circular dependencies, spaghetti code, and general dysfunction. Organizations (code or people) need to be easy to navigate.
When rigid hierarchies are abused for classification purposes, you run into "class House, class Boat, class HouseBoat extends ???" knots. Ontologies need to be flexible.
Modern filesystems necessarily use both: we have folders, and we have file types/tags.
For example, say you have the following datums with the following tags
apple: plant, Rosaceae, tree, fruit
peach: plant, Rosaceae, tree, fruit
rose: plant, Rosaceae, shrub, flower
dandelion: plant, Asteraceae, herb, flower
carrot: plant, Apiaceae, herb, root
pig: animal, Suidae
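A minimal sketch of how AND, OR and NOT queries could run over exactly that data set, using plain Python sets. The `query` function and its parameter names are made up for illustration and don't come from any particular tagging tool.

```python
# The example data from the list above: each item maps to its set of tags.
ITEMS = {
    "apple": {"plant", "Rosaceae", "tree", "fruit"},
    "peach": {"plant", "Rosaceae", "tree", "fruit"},
    "rose": {"plant", "Rosaceae", "shrub", "flower"},
    "dandelion": {"plant", "Asteraceae", "herb", "flower"},
    "carrot": {"plant", "Apiaceae", "herb", "root"},
    "pig": {"animal", "Suidae"},
}


def query(all_of=(), any_of=(), none_of=()):
    """Return item names that carry every tag in all_of (AND), at least one
    tag in any_of (OR, skipped when empty), and no tag in none_of (NOT)."""
    results = []
    for name, tags in ITEMS.items():
        if not set(all_of) <= tags:
            continue
        if any_of and not (set(any_of) & tags):
            continue
        if set(none_of) & tags:
            continue
        results.append(name)
    return sorted(results)


print(query(all_of=["Rosaceae", "tree"]))
print(query(all_of=["plant"], none_of=["Rosaceae"]))
```

Whether the default is AND or OR is then just a question of which argument the UI fills in first.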
There are tradeoffs, though: a fair bit of overhead for each heavily nested item. And you would want some support for ensuring the integrity of the hierarchies (i.e. someone accidentally removes "tree" from the tag set of "apple").
So for instance an email can be tagged fun/ski, social/facebook/marketplace, money/receipts.
If you open the social/facebook/marketplace “folder” you’ll see the email, but also if you open the money/receipts “folder”. You could also see it in the intersection of the social tag and the fun/ski tag, etc.
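Roughly, the lookup could work like the sketch below. The message ids and the `open_folder` helper are hypothetical, and treating a slash-separated tag as "inside" each of its prefixes is an assumption about how such a system behaves, not a description of any specific mail client.

```python
# Hypothetical mailbox: message id -> set of slash-separated hierarchical tags.
MAIL = {
    "ski-pass-receipt": {"fun/ski", "social/facebook/marketplace", "money/receipts"},
    "lift-schedule": {"fun/ski"},
    "tax-invoice": {"money/receipts"},
}


def in_folder(tag, folder):
    """True if tag is the folder itself or nested anywhere beneath it."""
    return tag == folder or tag.startswith(folder + "/")


def open_folder(*folders):
    """Messages that carry at least one tag under every given folder.

    One argument behaves like opening a folder; several arguments give
    the intersection of those folders."""
    return sorted(
        msg
        for msg, tags in MAIL.items()
        if all(any(in_folder(t, f) for t in tags) for f in folders)
    )


print(open_folder("money/receipts"))
print(open_folder("social", "fun/ski"))
```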
Especially, see this example: https://github.com/alphapapa/taxy.el#sporty-understanding-co...
It ends up working kind of like hard links for folders/files, but it is a lot easier to setup since child items are the ones which declare where they are located, not the parents/directories. I think another reason why hard links are more difficult to use than this particular system is that with Tiddlywiki, it is easy to see all the locations an item falls under at once as well as seeing all the items at a particular location. I feel like adding this reverse location information would be quite helpful and would be less of a change than implementing tags for existing filesystems.
"expert/american" and "american/expert" couldn't have duplicate content so you take your tag system and overlay hierarchies.
Anything that wasn't manually set was auto hierarchied based on highest traffic volume.
I also prevented tags from showing up on the same tag chain, so any given keyword could only appear once. That prevented infinite recursion.
Moving away from hierarchy also made interesting permutations easier to generate.
Since any metadata can become a tag, price or price range ($100hr – $300hr) is easy to generate and "enhance" American/Experts/Between$100and$300anhour.
It worked really well, allowed us to manually enforce high traffic hierarchies/phrases while still auto-generating intelligent canonical links for the rest of the site.
That was the point of labeling 1 version of any duplicated permutation "canonical", to prevent duplicate content penalties.
Can you elaborate?
python, databases, SQL, SQLite, $date, FromHN, beautiful_site (I liked the look of this blog)
or in proposed system
the difference between "just folders" and the proposed system is that you could have one note in multiple folders. Which gives you more flexibility in assigning a note to its topics. But it is still more structural than tags which could easily turn into an impenetrable list of random words.
To be fair you could also achieve it with symlinks.
The relationship between arbitrary nodes in a tree can be determined by tracing their common ancestry, but tags don't provide equivalent functionality, unless you strictly define how tags themselves relate to other tags. An obvious way to do so is to prescribe that every tag shall have exactly one parent (except for the root abstract "thing" tag).
In other words tags become folders, but any non-folder content of those folders can simultaneously live inside any number of folders. Similar to symlinks, but arguably less hacky, because there is no differentiation between "actual" location and "linked" location.
In other words, similar to hardlinks
Minor detail: I intended for different deletion semantics from hardlinks. Whereas hardlinks use reference counting for that (only the last deletion actually deletes); for my purposes, delete anywhere meant delete everywhere.
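To make the difference concrete, here is a toy in-memory model contrasting the two deletion semantics. The `TagStore` class and its method names are invented for this sketch; it is not how any real filesystem or the commenter's tool is implemented.

```python
class TagStore:
    """Toy model: each item can appear in any number of folders at once."""

    def __init__(self):
        self.locations = {}  # item name -> set of folders it appears in

    def link(self, item, folder):
        self.locations.setdefault(item, set()).add(folder)

    def unlink_refcounted(self, item, folder):
        """Hard-link style: the item survives until its last location is removed."""
        folders = self.locations[item]
        folders.discard(folder)
        if not folders:
            del self.locations[item]

    def delete_everywhere(self, item, folder):
        """The variant described above: deleting from one location removes it from all."""
        if folder in self.locations.get(item, ()):
            del self.locations[item]

    def listing(self, folder):
        return sorted(i for i, fs in self.locations.items() if folder in fs)


store = TagStore()
store.link("report.txt", "work")
store.link("report.txt", "2023")
store.unlink_refcounted("report.txt", "work")
print(store.listing("2023"))  # still listed: one reference remains
store.delete_everywhere("report.txt", "2023")
print(store.listing("2023"))  # gone from every location
```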
Example session might look something like this:
/ $ ls -d
foo bar baz
/ $ cd foo
/foo $ ls -d
/foo $ cd baz
/foo/baz $ ls -d
Aaaaanyways... SuperTag looks AMAZING. I will give it a try right now and see how it works for me. Thanks a lot for implementing it :)
EDIT: It doesn't want to install for me on OSX :( Oh well, it sounded too good to be true haha.
~ brew install amoffat/rnd/supertag
Running `brew update --preinstall`...
==> Tapping amoffat/rnd
Cloning into '/usr/local/Homebrew/Library/Taps/amoffat/homebrew-rnd'...
remote: Enumerating objects: 24, done.
remote: Counting objects: 100% (24/24), done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 24 (delta 4), reused 24 (delta 4), pack-reused 0
Receiving objects: 100% (24/24), done.
Resolving deltas: 100% (4/4), done.
Error: Invalid formula: /usr/local/Homebrew/Library/Taps/amoffat/homebrew-rnd/supertag.rb
supertag: Unsupported special dependency :osxfuse
Error: Cannot tap amoffat/rnd: invalid syntax in tap!
EDIT: Couldn't wait. This is an exceptionally well put-together FAQ, explaining the design constraints: https://amoffat.github.io/supertag/faq.html
So if your files (and their tags) are as follows:
/ $ ls
t1 t2 file1
/ $ ls t1
/ $ ls t2
/ $ ls t1/t2
/ $ ls -a
file1 file2 file3 file4
/ $ ls -a t1
That said, binary classification (object A is a member of class B, a function with a boolean truth value) is the basic concept in classification. That is, any classification can be represented correctly with binary classification.
There really are a few things where you assign something a category from a finite set (like a chess game was won by "White", "Black" or was a draw) but frequently when people build ontologies based on the idea that "A is a member of one of this set of categories" they are going to screw it up.
The word can is doing a lot of heavy lifting there.
Tagging system (at least the ones I've seen): enter the tags as autocompleted tokens in a single text entry. Possibly the system can autosuggest a number of tokens, possibly not, depending on your use case.
Symlinks-as-Tagging-System: I want to save a file called "foo.txt" and start editing it immediately. How do I enter the tags?
File systems can technically break all of these; files can contain multiple files within themselves with many file systems, the ability to have multiple paths leading to the same file means that if you just have the file in hand you don't have a unique path, which also means that if you remove that name it doesn't mean you've removed the file. But most of the time, you can kinda get away with writing normal code and it'll do the right thing. But that last one really burns. It is really easy to write code assuming file systems are just trees, and in particular can't have infinite loops, and be wrong about that.
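As an illustration of that last point, here is one way to walk a directory tree that follows symlinks without getting stuck in a loop. It is only a sketch: `walk_once` is a made-up helper, and it relies on the (device, inode) pair from `os.stat` to identify a directory, which holds on typical POSIX filesystems but is an assumption elsewhere.

```python
import os


def walk_once(root):
    """Yield every directory reachable from root exactly once.

    Symlinks to directories are followed, but a directory that was already
    visited (same device and inode) is skipped, so cycles terminate."""
    seen = set()
    stack = [root]
    while stack:
        path = stack.pop()
        st = os.stat(path)  # follows symlinks, so a link resolves to its target
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue
        seen.add(key)
        yield path
        with os.scandir(path) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=True):
                    stack.append(entry.path)
```

A naive recursive walk that follows links would never return on a tree containing a symlink back to one of its own ancestors; this version visits each real directory once and stops.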
if i want to really remove it, i can scrub the contents without removing the hardlink, and if i want to delete after scrubbing i check the hardlink count and search for the remaining entries.
Sometimes you might want that behavior, but other times you might not. But there’s no option in any app I’ve seen to choose how to handle hardlinks on edit/save.
fortunately in my case this issue is not a problem. i use hardlinks mainly for pictures where i never want to replace the original anyways, so editing always gets me a copy which i don't need to link to multiple places.
* Yes there are hard links, but the single hierarchy means that unless you memorised the specific FS of each folder, it's gonna be hard.
Categories are much more abstract, and while useful, less intuitive.
Which is why folders should remain, complement them with other means of navigation for sure, but hiding them away will only make things less intuitive.
In the end, whatever capabilities your design has are only as useful as their ability to be understood and used.
Actual spatial arrangement of data (and code) is something I've wanted to try for a long time now though. It's incredible how you can take a map that shows the entire planet, and in a few seconds zoom in to the building you're in. And in a few more seconds, zoom in to the house you lived in as a child in a different town. I'd like to try to build a similar map of personal data and code.
But that's also how we conceptualize space. One place is my bedroom, it connects to a space that's a corridor, which connects to a few spaces like the kitchen, the bathroom, the living room. In one of these spaces is a bookshelf, which contains a shelf for sci-fi, a few shelves worth of computer science and physics, then a few shelves of philosophy.
I don't know the Cartesian coordinates of the particular books, but I know if I want to get a book, I walk the path connecting the bedroom, to the corridor, to the living room, and then look in the bookshelf. Then if I want to say read Dune, I know it's in the bottom-most shelf where I keep the sci-fi.
That procedure conceptually translates to
$ cd ..
$ cd livingroom
$ cd bookshelf
$ cd bottom-shelf
$ cat Dune
Similarly, I couldn't decide on a hierarchy for my apartment. What's above living room? If root is your entire house, does that mean you can skip into any room without going through another? I can't go to the bedroom at the other end of my apartment without going through corridor..ish thing and the living room. I can't place room above the other because none contain the other. So I guess root is the whole house then. What does cd .. in my bedroom do? Do I get out of the apartment, so that I can enter another room? That's just weird. But ok, I'll take it: I can magically teleport out of my room and "into the apartment" and just warp into another room. Convenient, could we also flatten the bookshelf and bottom shelf into my apartment so that I can warp straight to them without first thinking about "livingroom"? Hierarchy begone :)
And still there is no spatial relationship between items under livingroom. It's unorganized chaos. I have more than one bookshelf.. I can refer to the bookshelf on the left or bookshelf on the right but in a folder? It gets interesting when you try to draw the line between living room and kitchen since they're kinda one and the same but not really. They're spatially separated but it's hard to say where one ends and the other begins.
On a map, where I can see continents, countries, cities, all at one glance. What shallow hierarchy there is, you can also see right through it; it's merely a guideline, not an obligation. You can see through multiple layers of hierarchy and your vision is not restricted to one branch. I can slide from Finland to Sweden near Tornio without having to first back up in some artificial hierarchy. I could do that even if Finland and Sweden were considered to be on different continents for some weird reason.
Anyway, given a google maps view of the entire world, completely unstructured, Where's Waldo? You get exactly the same problem you had with arch/aarch64/dts, except now you don't even have a hierarchy to go on. He could be in Tokyo among 13.5 million people, he could be in Siberia, he could be on a boat on the ocean. Heck, he could be in a mine underground. He could be on the International Space Station.
Let's say you take a photo of a frog in Florida at night. You apply the keywords "frog, Florida, night".
Over time you photograph all kinds of other creatures as well as across several locations, so this list of individual keywords will grow. The cool thing is that you can now group the keywords themselves.
You create a new keyword "USA", and then drag "Florida" in it. You create a new keyword "Amphibians" and drag "Frogs" in it. And then create "Animals" and drag "Amphibians" in it.
The powerful thing is that you now have the power of flat keywords as well as hierarchical browsing across multiple hierarchies.
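A small sketch of how nested keywords can behave like that. The `PARENT` map mirrors the drag-and-drop grouping described above, the photo ids are made up, and the keyword spelling "frog" follows the one applied to the photo.

```python
# Keyword -> parent keyword, mirroring the grouping described above.
PARENT = {"Florida": "USA", "frog": "Amphibians", "Amphibians": "Animals"}

# Photos only ever get the flat, specific keywords.
PHOTOS = {
    "IMG_0001": {"frog", "Florida", "night"},
    "IMG_0002": {"night"},
}


def expand(keyword):
    """Yield the keyword followed by each of its ancestors, most specific first."""
    seen = set()
    while keyword is not None and keyword not in seen:
        seen.add(keyword)
        yield keyword
        keyword = PARENT.get(keyword)


def find(keyword):
    """Photos tagged with the keyword itself or with any keyword nested under it."""
    return sorted(
        photo
        for photo, keywords in PHOTOS.items()
        if any(keyword in expand(k) for k in keywords)
    )


print(find("Animals"))
print(find("night"))
```

The photo never had to be tagged "Animals" or "USA"; it picks those up from where its keywords sit in their respective hierarchies.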
Widget 1: it is A and B but not C, so tag it with A and B.
Widget 2: it is A and B and C, so tag it A, B, and C.
Widget 3: it is B only, so tag it B.
That is pretty simple, but you couldn't represent that in a folder system without permutation folders, meaning you now have folder sprawl, making things harder to find.
This is how servers and ec2s are for almost everyone. Billing codes, environments, teams, business units, etc. A folder taxonomy to replace ec2 tags would be a nightmare.
I've gained huge respect for how beautiful the concept of the folder is.
Example: /A/B contains the intersection of tags A and B. If sub-folder C exists underneath /A/B, it's because one of the files in the intersection also has tag C.
If you add OR with ctrl+click, and perhaps some sort of NOT, the user can now construct arbitrary predicates using only the tags and familiar folder point-and-click operations.
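The /A/B example can be sketched in a few lines. The file names and the `list_dir` helper are hypothetical, and this only covers the AND-by-path part, not the ctrl+click OR or NOT.

```python
# Hypothetical files and their tags.
FILES = {
    "file1": {"A", "B"},
    "file2": {"A", "B", "C"},
    "file3": {"B"},
}


def list_dir(path):
    """List a virtual directory such as "/A/B".

    Returns (files, subfolders): files carry every tag named in the path,
    and a subfolder exists for each further tag found on those files."""
    wanted = {part for part in path.split("/") if part}
    matches = {name: tags for name, tags in FILES.items() if wanted <= tags}
    subfolders = set().union(*matches.values()) - wanted
    return sorted(matches), sorted(subfolders)


print(list_dir("/A/B"))
```

Because the path is treated as a set, /A/B and /B/A list the same thing, and empty intersections never show up as folders.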
This. They're different tools for different jobs. Being a dogmatist and trying to win an us vs them competition on which is the ONE TRUE SOLUTION is stupid. That's like trying to argue which is a better tool, a screwdriver or a pair of pliers, and saying it's stupid to keep a screwdriver in a toolbox. Honestly, in a lot of cases both should be implemented and available.
Stupid reasons pliers are better than screwdrivers:
* There are so many different kinds of screwdrivers; it's too confusing to users.
* Pliers can be used to drive screws, so there's no need for a dedicated screw-driving tool. We've had a lot of success gripping the screw head with pliers and turning.
* Pliers are beautiful and modern, and lots of popular influencers are using pliers now. Screwdrivers are old tech and ugly.
With folders it’s just copy the base folder.
Now I know as devs some of us would suggest version control as the solution but keep in mind with folders my non-technical 70 year old dad can do this action and understand it completely because of the tactility a folder provides.
Technically, they can. A "folder" is just a tag whose name is the entire directory path up to root. All files with that same tag are in the same folder.
Imagine a command like:
tagmv file1 file2 directory1 directory2 -t tag1 tag2 tag3
The hierarchy of the tag-directories could be ordered from the most common to least common tags. The most common tag would have its tag-directory directly under the top level directory (something like ~/t/ to prevent confusion with the rest of the filesystem perhaps?).
The files & tag-directories could get re-organized every time the usage of tags changes, in order to keep the hierarchy from most common tags to least common.
A set of tools like tagmv, tagcd, tagls, etc could work with this tag-based structure.
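Here is a rough sketch of just the path-computing part of that idea. `tag_paths` is a made-up helper, the `~/t` root follows the suggestion above, and breaking ties between equally common tags alphabetically is my own assumption. It only computes where files would live; it doesn't move anything.

```python
from collections import Counter


def tag_paths(file_tags, root="~/t"):
    """Map each file to a tag-directory path, most common tag first.

    file_tags: dict of file name -> set of tags.
    Ties between equally common tags are broken alphabetically."""
    counts = Counter(tag for tags in file_tags.values() for tag in tags)
    paths = {}
    for name, tags in file_tags.items():
        ordered = sorted(tags, key=lambda t: (-counts[t], t))
        paths[name] = "/".join([root, *ordered, name])
    return paths


example = {
    "a.txt": {"work", "python"},
    "b.txt": {"work"},
    "c.txt": {"work", "python", "draft"},
}
for name, path in sorted(tag_paths(example).items()):
    print(name, "->", path)
```

Re-running this after tag usage shifts gives the new layout, which is the "re-organize every time" step; a real tool would then have to move the files and cope with everything that referenced the old paths.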
Karl seems to have a fancy TUI, I made me a cmdline helper https://codeberg.org/mro/Tagger.
But if you don't use filenames you have to store this tagging information externally, which has to stay in sync with file locations and ideally not be reliant on one specific program.
There's folders + symlinks but I imagine that might be complicated to manage as well.
No duplicate files or symlinks necessary.. Can make scripts like tagcd and tagls to make it work.
I'd prefer not to worry about such limits though, with some files that don't allow full-text search it can be useful to have more.
In other words, folders create boundaries between information, and tags connect across those boundaries.
I want to have regular folders, and then folders that I can issue a SQL style query to generate their contents.
Take multimedia. With a traditional file system you can only have one type of sort. Typically by type (audio, video, image) and then alphabetically. It would often be nice to have a folder that is formed by querying the metadata, say all the items released in the 1950's, or all the items that are a low quality copy.
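Something like the following sketch is what I picture for a query-backed folder. The `media` table, its columns, the sample rows and the `SAVED_FOLDERS` names are all invented for illustration; a real implementation would need the metadata to come from somewhere.

```python
import sqlite3

# Hypothetical metadata index for a media collection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE media (path TEXT, kind TEXT, year INTEGER, quality TEXT)")
conn.executemany(
    "INSERT INTO media VALUES (?, ?, ?, ?)",
    [
        ("audio/kind_of_blue.flac", "audio", 1959, "high"),
        ("video/vertigo.mkv", "video", 1958, "low"),
        ("video/alien.mkv", "video", 1979, "high"),
        ("image/scan_0042.jpg", "image", 1994, "low"),
    ],
)

# Each "query folder" is a name plus a predefined SQL predicate.
SAVED_FOLDERS = {
    "1950s": "year BETWEEN 1950 AND 1959",
    "low-quality": "quality = 'low'",
}


def list_folder(name):
    """Return the paths a query folder currently contains."""
    where = SAVED_FOLDERS[name]  # trusted, predefined predicates only
    rows = conn.execute(f"SELECT path FROM media WHERE {where} ORDER BY path")
    return [path for (path,) in rows]


print(list_folder("1950s"))
print(list_folder("low-quality"))
```

The files stay in their regular type-sorted folders; the query folders are just extra views whose contents are computed when opened.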
Current desktop operating systems have hierarchical filesystems and support flat lookup via indexing (e.g. MacOS's spotlight).
The article wasn't really written for the HackerNews audience, sorry I never expected to wind up here — it was written for, like, college kids majoring in archaeology.
Let's say I'm recording events for my future travels.
trips/paris.md and trips/cancun.md are in my folder structure with them tagged as business and vacation respectively. Later, I can go back and add a "mistakes" tag to cancun.md, but really if I ever need to look up all my trips, I know it will be in trips and it's incontrovertible fact that cancun was a trip.
There's room for both, but tagging historically came out of a need where search functions were poor. These days tagging is unnecessary work imo.
I wrote a Nemo extension that lets you add columns for #tags @persons or $whatever you put in a filename. You can sort by these columns. For complex things, there's always `find`.
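The parsing side of filename tags is small. This sketch is not the extension's code: the regex, the `parse_filename` name, and the assumption that a token is a sigil followed by word characters are all illustrative guesses at the convention.

```python
import re

# A token is one of the sigils #, @ or $ followed by word characters.
TOKEN = re.compile(r"([#@$])(\w+)")


def parse_filename(name):
    """Group the tokens embedded in a filename by their sigil."""
    fields = {"#": [], "@": [], "$": []}
    for sigil, value in TOKEN.findall(name):
        fields[sigil].append(value)
    return fields


print(parse_filename("beach #holiday #2019 @anna $draft.jpg"))
```

Since the tags live in the name itself, they survive copies, backups and moves between machines without any external database to keep in sync.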
I think the main virtue of tagging systems is in the low friction to add info and multiple inclusion.
Tagging also has its downsides and I think we'll probably end up on some hybrid system in the long run.
I was going to try it out, but.. I just hate not being able to try something before giving away my email.
The best way to improve a product is to have a lot of people use it
SELECT * FROM desktop_icons WHERE ...
I'd rather not rely on my own willingness to organize files so I'll take a search tool any day.
This becomes more important as you get older and begin to experience the pleasures of reading things again for the “first” time.
I admit I sometimes search for files, but it's a filename search, not a content search. I don't want a background service indexing the files and contents of all my disks when I can use a regular search instead.
I worked as a patent examiner in the past and using only text search for that would be considered negligent there as it would miss a lot of documents. I ultimately used a combination of text, classification, citation, and AI similarity search techniques. Each has strengths and weaknesses so using all of them makes sense.
Johnny Decimal is also just another abstraction layer, and in this case a terrible one.
The solution here would be a shell (or command) that's able to "do this to every file with this tag", right?
In the GUI world, Haiku's Tracker is an example of how such queries might look (https://www.haiku-os.org/docs/userguide/en/queries.html); a command-line tool (perhaps with nicer syntax if possible) could readily do the same (and then perhaps do something with the results). Haiku's advantage here is that it uses attributes instead of tags, so it's a bit of a richer experience; also, most Haiku/BeOS software is already aware of attributes, so that also makes it easier to rely on those attributes being actually used.
It's simple, it's portable, it's good enough. You could in theory improve it by having multiple views of the same data (let's say you want to save some notes about "Naturalis Historia" - should you put it under "ancient Rome" directory or under "biology"?) for example by using hardlinks but I don't know if there is a way to create a backup on another filesystem that will keep hardlinks as hardlinks (DAR seems promising http://dar.linux.free.fr/doc/Features.html but I have yet to try it).
For example, I use Digikam from KDE to manage my photos. It has a LOT of ways to file and retrieve photos. First are collections, and they contain folders that are a window to the filesystem. (I like that, because it means only maintaining one filing system.) Inside a folder view, you can also group photos by dragNdrop. You also have star ratings, tags, flags, locations and faces. You can search by dates, locations on a map, or show images that are similar. The list goes on. It is very flexible, so you can choose your workflow.
Folders are just hierarchical tags.
Tags are just non-hierarchical folders.
Both can be compared to programming classes, with subclassing representing folders (key negative: Can't inherit from multiple parents), and class composition representing tags (key negative: No hierarchy).
Folder names (including parent folders) could automatically be applied as tags. Tags could likewise be viewed as folders (tags with multiple "parents" would be something like symlinks), but would need a hierarchy to become nested.
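The folder-to-tag direction is nearly a one-liner. A minimal sketch, where `tags_from_path` is a made-up helper and paths are treated as POSIX-style strings:

```python
from pathlib import PurePosixPath


def tags_from_path(path):
    """Turn every directory component above a file into a tag."""
    return set(PurePosixPath(path).parent.parts) - {"/"}


print(tags_from_path("/notes/python/sqlalchemy.md"))
```

Going the other way is the hard part: a flat set like {"notes", "python"} doesn't say which tag nests inside which, so you need an ordering or hierarchy from somewhere before tags can be shown as nested folders.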
How does that not lead to an end result where you have one folder "everything", and thus no organization at all? There's always edge cases.
I think it depends on how you categorize things? I'm never going to confuse my tax documents with my short fiction, or my character profiles for a fantasy character with my daily notes page.
Folders are tree structures where each file lives in only one place (in practice, hard links are uncommon, and symlinks are more common but obvious).
That abstraction is useful when navigating, looking for related files, etc.
Tags are more general. Yes, they could implement the features of folders, but the connotation of tags is that they are flat and unstructured.
Ex: use wikipedia page id as a tag.
Whew I wasn't ready for that hot take.
I'm trying to figure out how to upgrade the droplet but being honest, my husband is the one who set all this up for me, so it's going to take me a minute to figure out how to add more bandwidth or whatever.
Yeah, HN can be all "why didn't you just...?" when a site gets hit with traffic, but you know what? I've been in this industry for over 30 years, had a decent stint at Microsoft and other companies you've heard of, working on stuff you've probably used. And if my stupid blog somehow ended up on the front page of $POPULAR_SITE I wouldn't have the first clue how to increase bandwidth. Oh, 30 years of this shite means I know where to immediately start looking, but off the top of my head? phhhhht And it sure as hell would take me more than "a minute to figure out how to add more bandwidth or whatever". :-)
Point is, your page hit the HN lottery, no need for apology. I can bookmark it for later.
Someone reached out and said I can use Cloudflare to fix this, so I'm gonna go try that, doot doot.
Ironically, this has been on my todo list to learn — I want to mirror my Obsidian notes and that requires Cloudflare and before today I've been too nervous to muck around with it.
DigitalOcean's App Platform supports static sites too.