
Building an effective tagging system can be much harder than people realize. I once worked on a tagging system for a collection of math problems. I thought I could code a simple tagging model, and let users tag their own math problems, and it would become much easier to find the problems you're most interested in.

Then I realized that tags like algebra 1, Algebra 1, Algebra I, Alg I, and all other variations should mean the same thing. So I started to develop a closed set of tags. That led to a fascinating rabbit hole about taxonomies that I don't even remember how to speak about clearly at this point.

That project is still a work in progress, and it's left me with immense respect for people who build well-structured systems that involve tagging.




Two impressive site-wide systems I've seen are the categories of Wikimedia Commons (multimedia) and the tags of Archive of Our Own (fanfiction). The Commons guideline[0] explains its system and the interesting ontological theory behind it well. Its scope is extremely broad, aiming to simultaneously include any possibly useful categorization scheme,[1] and overall it is a fairly freeform (ideally) directed acyclic graph. Variations are handled with redirects and disambiguation pages in the typical wiki manner, with the limitation that individual category uses must have the canonical name. AO3, in contrast, has a schema of sorts, and synonyms are made equivalent during resolution (its tags FAQ[2] is also an interesting read).
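To make the "directed acyclic graph plus redirects" idea concrete, here is a minimal Python sketch. It is not how MediaWiki implements categories; the category names, parent links, and redirect table are all made up for illustration, and the helper names `canonical` and `ancestors` are hypothetical.

```python
from collections import deque

# Hypothetical data: each category lists its parent categories. A category
# may have several parents, which makes this a DAG rather than a tree.
PARENTS = {
    "Blue eyes": ["Eyes by color", "Blue"],
    "Eyes by color": ["Eyes"],
    "Eyes": ["Human anatomy"],
    "Blue": ["Colors"],
    "Human anatomy": [],
    "Colors": [],
}

# Hypothetical redirects from variant names to canonical names.
REDIRECTS = {"Blue eye": "Blue eyes", "Eye": "Eyes"}


def canonical(name: str) -> str:
    """Follow redirects until a canonical category name is reached."""
    seen = set()
    while name in REDIRECTS and name not in seen:
        seen.add(name)  # guards against accidental redirect loops
        name = REDIRECTS[name]
    return name


def ancestors(name: str) -> set[str]:
    """Return every category reachable by following parent links."""
    found: set[str] = set()
    queue = deque(PARENTS.get(canonical(name), []))
    while queue:
        current = canonical(queue.popleft())
        if current in found:
            continue  # already reached through another parent
        found.add(current)
        queue.extend(PARENTS.get(current, []))
    return found
```

With this toy data, `ancestors("Blue eye")` first resolves the redirect to "Blue eyes" and then walks up both the color branch and the anatomy branch, which is the kind of multi-parent lookup a strict tree cannot express.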

I tried to write a more thorough comment but also struggled with being coherent. Thus, some ideas, only briefly:

- At an even higher level, the web itself and the overlapping userbases/communities ('intersectionality', without the discrimination--the original set-theory kind?) of individual sites can also be considered a way of organizing content

- Thus, analogously: Search engines replaced directories and webrings as algorithms did tags. The present SEO meta, though...

- Generalizing from Commons, all Wikimedia wikis (Wikipedia, etc.) have parallel category structures, only less developed due to the greater reliance on links. So do most wikis in general, though Wikimedia also unifies categorization and structured data with Wikidata. From there are knowledge graphs and databases in general, wrapping back around to Google trying to determine the Knowledge Graph item that each query refers to.

[0] https://commons.wikimedia.org/wiki/Commons:Categories

[1] All the typical keying on depicted people, things, times, and places, plus the ways that we categorize those. Niches from 'horizontal bicolor blue and white flags' to 'Luxembourgish pronunciation by gender', 'trams on route 709', 'ships with 6 funnels'. There's a tool (now called vCat) to visualize categories, some outputs here: https://commons.wikimedia.org/wiki/Category:Wikimedia_catgra...

[2] https://archiveofourown.org/faq/tags

Edit: specific examples


> tags of Archive of Our Own (fanfiction)

On a similar note, Danbooru-style image boards often have highly developed tagging systems, ranging from tags for specific characters or artists to tags for art styles, poses, or even specific features which happen to appear in the artwork (like "hat bow" or "blue eyes").


Just for fun, here are your examples applied to Commons (and a conjecture that tag systems naturally converge as they become more fine-grained):

https://commons.wikimedia.org/wiki/Cat:Wikipe-tan

(NSFW-ish[0]) https://commons.wikimedia.org/wiki/Cat:Drawings_by_User:Seed...

https://commons.wikimedia.org/wiki/Cat:Demoscene

https://commons.wikimedia.org/wiki/Cat:Paintings_of_couples,...

https://commons.wikimedia.org/wiki/Cat:Blue_eyes

https://commons.wikimedia.org/wiki/Cat:Bow_hats

There's also a tool to intersect or subtract categories hidden in the dropdown of the 'Good pictures' button at the top right.

[0] (NSFW-ish) https://en.wikipedia.org/wiki/Seedfeeder


> I tried to write a more thorough comment but also struggled with being coherent.

how fitting.

I guess it's always about neighbourhoods. In your street, in your pew, in your bookshelf, inside your brain, in your zettelkasten.


I just used synonyms and a tag hierarchy (nested sets).

Works pretty well.
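For readers who haven't met nested sets: each tag gets a (left, right) pair from a depth-first walk, and a tag's descendants are exactly the tags whose pair falls strictly inside its own. A rough Python sketch follows; the tag tree, the synonym table, and the function names are assumptions for illustration, not the commenter's actual schema.

```python
# Hypothetical tag tree (parent -> children) and synonym table.
TREE = {
    "math": ["algebra", "geometry"],
    "algebra": ["algebra 1", "algebra 2"],
}
SYNONYMS = {"alg 1": "algebra 1", "algebra i": "algebra 1"}


def build_nested_sets(tree: dict, root: str) -> dict:
    """Assign (left, right) bounds to every tag by depth-first numbering."""
    bounds = {}
    counter = 0

    def visit(tag: str) -> None:
        nonlocal counter
        counter += 1
        left = counter
        for child in tree.get(tag, []):
            visit(child)
        counter += 1
        bounds[tag] = (left, counter)

    visit(root)
    return bounds


def resolve(name: str) -> str:
    """Lower-case the name and map known synonyms to their canonical tag."""
    key = name.strip().lower()
    return SYNONYMS.get(key, key)


def descendants(bounds: dict, tag: str) -> set:
    """All tags whose bounds lie strictly inside the given tag's bounds."""
    left, right = bounds[resolve(tag)]
    return {t for t, (lo, hi) in bounds.items() if left < lo and hi < right}


BOUNDS = build_nested_sets(TREE, "math")
```

In a database the same check becomes a single range query on two integer columns, which is the usual reason to pick nested sets: reads are cheap, while inserting a tag means renumbering part of the tree.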


I ended up building out a hierarchy as well. But figuring out the structure of that hierarchy was not trivial at all. How does the name of a repeatable class (Algebra 1) fit with the name of a specific class (Algebra 1 Fall 2020 Section 2)? How does that relate to an area of math like algebra, geometry, or number theory? How does that relate to things like context (i.e., problems about Minecraft, Lego, physics, etc.)?

I developed a closed system of tags, and then gave people the ability to define aliases.
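A closed tag set with user-defined aliases could look roughly like the sketch below. This is only an illustration under assumed names (`TagVocabulary`, `add_alias`, `resolve` are hypothetical), not the code from that project: aliases may only point at tags that already exist, and unknown names resolve to nothing instead of silently creating a new tag.

```python
from typing import Optional


class TagVocabulary:
    """A fixed set of canonical tags plus aliases that point at them."""

    def __init__(self, canonical_tags):
        # Map a normalized key to the canonical spelling of each tag.
        self._canonical = {self._key(t): t for t in canonical_tags}
        self._aliases = {}

    @staticmethod
    def _key(name: str) -> str:
        # Case-fold and collapse whitespace so "algebra  1" matches "Algebra 1".
        return " ".join(name.casefold().split())

    def add_alias(self, alias: str, canonical: str) -> None:
        """Register an alias; the target must already be a canonical tag."""
        key = self._key(canonical)
        if key not in self._canonical:
            raise KeyError(f"unknown canonical tag: {canonical!r}")
        self._aliases[self._key(alias)] = self._canonical[key]

    def resolve(self, name: str) -> Optional[str]:
        """Return the canonical tag for a name, or None if it is unknown."""
        key = self._key(name)
        if key in self._canonical:
            return self._canonical[key]
        return self._aliases.get(key)


vocab = TagVocabulary(["Algebra 1", "Geometry"])
vocab.add_alias("Algebra I", "Algebra 1")
vocab.add_alias("Alg I", "Algebra 1")
```

Here `vocab.resolve("algebra 1")`, `vocab.resolve("Algebra I")`, and `vocab.resolve("ALG I")` all come back as "Algebra 1", while `vocab.resolve("Calculus")` returns None because the set is closed.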




