Idea HN: Tag Reduction / Unification
8 points by janitha on July 19, 2009 | 6 comments
Since many folks here are doing web startups that involve tagging (be it blog posts, photos, etc.), this might be relevant. The use for tags is to cluster groups of photos together. But across users, tags are inconsistent, somewhat defeating this purpose.

Let's take Flickr's tags as an example (most popular tags):

  photowalk2009, sp4449, pitchforkmusicfestival, worldwidephotowalk, skphoto, scottkelbyphotowalk, scottkelbysworldwidephotowalk, scottkelby, vihar, day199, scottkelbyworldwidephotowalk, scottkelbyssecondannualworldwidephotowalk, kelby, scottkelbyphotowalk2009, odori, photowalk, therebeastormabrewin, worldwide, riat, mefi10

Notice that "photowalk" appears in many different forms, where it should have been just one tag, "ScottKelbyPhotowalk". This is just an example, but there are a lot of instances where this happens with other tags.

Unifying the inconsistent tags can be done in several ways. One is automatically, using word similarity. Another is allowing users to suggest tags that mean the same thing (a moderator can then review the suggestions and approve the valid ones). A third way is picking a random tag and all objects tagged with it, and then examining the most common tags within that set (most likely the top N tags common to all of them will mean the same thing).
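
A rough sketch of that last approach, in Python. The data layout (a dict of object id to tag set), the function name and the sample photos are all made up for illustration; a real app would pull these from its own storage.

```python
from collections import Counter

def co_occurring_tags(tagged_objects, seed_tag, top_n=3):
    """Return the top_n tags that most often appear alongside seed_tag."""
    counts = Counter()
    for tags in tagged_objects.values():
        if seed_tag in tags:
            # Count every other tag on objects that carry the seed tag.
            counts.update(t for t in tags if t != seed_tag)
    return counts.most_common(top_n)

# Hypothetical data: object id -> set of tags
photos = {
    "p1": {"photowalk", "scottkelby", "worldwidephotowalk"},
    "p2": {"photowalk", "scottkelby", "photowalk2009"},
    "p3": {"photowalk", "scottkelby"},
    "p4": {"odori", "vihar"},
}

candidates = co_occurring_tags(photos, "photowalk", top_n=1)
```

The returned tags are only candidates for unification. Co-occurrence shows tags that travel together, which is not the same as meaning the same thing, so a moderator pass still seems wise.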

Just an idea to make your next app a bit more usable.




If apps allow multiple people to tag the same resource then you should be able to figure out synonyms statistically.
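
One possible way to do that, sketched in Python: compare the sets of resources each tag was applied to and flag pairs with high overlap (Jaccard similarity). The 0.6 threshold, the names and the sample data are assumptions for illustration, not anything an existing site is known to use.

```python
from itertools import combinations

def jaccard(a, b):
    """Overlap of two sets: size of intersection over size of union."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def likely_synonyms(tag_to_resources, threshold=0.6):
    """Return (tag1, tag2, score) for tag pairs applied to mostly the same resources."""
    pairs = []
    for t1, t2 in combinations(sorted(tag_to_resources), 2):
        score = jaccard(tag_to_resources[t1], tag_to_resources[t2])
        if score >= threshold:
            pairs.append((t1, t2, score))
    return pairs

# Hypothetical data: tag -> resources that any user tagged with it
tag_to_resources = {
    "photowalk": {"r1", "r2", "r3"},
    "worldwidephotowalk": {"r1", "r2", "r3", "r4"},
    "tokyo": {"r5"},
}

pairs = likely_synonyms(tag_to_resources)
```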

It's probably a mistake to show top tags at all.

Flickr allowing you to tag an entire upload set is definitely a big mistake. Browsing the tag "tokyo" and finding 20 pictures of the inside of a nondescript hotel room (even though the owner was in Tokyo or on his way there) is not really a great experience.

Overtagging/hypertagging is definitely a symptom that something is wrong.

Tags should mostly describe the object itself (the photo, in this case). They are also overloaded to describe a set of things (see the previous point) and a group or organization of people. This last use is very fragile.


I think a good (and easy) method would be to see if any of the tags they are trying to input can be matched at the beginning or end of any of the others, and then disallow the longer version (or suggest an alternative).

Using your example above, photowalk matches the beginning of "photowalk2009" and the end of "worldwidephotowalk", "scottkelbyphotowalk", "scottkelbysworldwidephotowalk", "scottkelbyworldwidephotowalk", and "scottkelbyssecondannualworldwidephotowalk". So none of those would be allowed (only the original "photowalk"), but "scottkelbyphotowalk2009" would be allowed, since "photowalk" doesn't match the beginning or end of the string.
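
Something like this, as a minimal Python sketch of the check. The function name is made up, and it only tests the beginning and end of the new tag, exactly as described above.

```python
def conflicting_tag(new_tag, existing_tags):
    """Return an existing tag that new_tag starts or ends with, else None."""
    for tag in existing_tags:
        if new_tag != tag and (new_tag.startswith(tag) or new_tag.endswith(tag)):
            return tag
    return None

existing = ["photowalk"]

rejected_prefix = conflicting_tag("photowalk2009", existing)
rejected_suffix = conflicting_tag("worldwidephotowalk", existing)
allowed = conflicting_tag("scottkelbyphotowalk2009", existing)
```

When the result is not None, the app could either reject the longer tag or suggest the shorter one in its place.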

Or, you could simply do what Stack Overflow does, and limit the number of tags you can enter (they have a limit of 5 per question). I think that would keep people from spamming it so much since they'll have to enter relevant tags, and not just a bunch of variations.


To add to what Stack Overflow is doing right: tag suggestions (pushing users early on to use already existing tags).


I designed the UI so that it pushes users to use existing tags. There are also moderation tools that allow an admin to merge one tag into another. The merge also records the lesser tag as a synonym of the greater tag so that you only have to do each merge once.
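
For anyone wanting to try the same idea, here is a guess at what such a merge could look like in Python. This is not the actual implementation; the class, its fields and the in-memory storage are all hypothetical.

```python
class TagStore:
    """In-memory sketch: tags map to item ids, merged tags become synonyms."""

    def __init__(self):
        self.synonyms = {}  # lesser tag -> greater tag
        self.items = {}     # canonical tag -> set of item ids

    def canonical(self, tag):
        # Follow synonym links until reaching a tag that was never merged away.
        while tag in self.synonyms:
            tag = self.synonyms[tag]
        return tag

    def add(self, item_id, tag):
        self.items.setdefault(self.canonical(tag), set()).add(item_id)

    def merge(self, lesser, greater):
        lesser, greater = self.canonical(lesser), self.canonical(greater)
        if lesser == greater:
            return  # already the same tag, nothing to do
        moved = self.items.pop(lesser, set())
        self.items.setdefault(greater, set()).update(moved)
        self.synonyms[lesser] = greater  # remembered, so the merge is done once

store = TagStore()
store.add("p1", "worldwidephotowalk")
store.add("p2", "photowalk")
store.merge("worldwidephotowalk", "photowalk")
store.add("p3", "worldwidephotowalk")  # lands on "photowalk" via the synonym
```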



There is a conflict between letting people use tags for whatever meaning is natural to them, and mildly enforcing a 'consensus' meaning.

I'm currently working on a system where starting a tag with a specific character implies 'consensus definition'. Each such tag has a corresponding wiki page where users can hammer out the common standards for its use/meaning.

For now, the special character is ':' -- the '#' of '#hashtags' presents problems mapping into neat URLs.

So for example, the tag 'photowalk' would mean whatever each user individually intends it to mean. There's no expectation of common meaning, even though in many cases a useful concordance may occur (as in any mixed personal/public tagging system). OTOH, ':photowalk' means exactly what the matching /wiki/:photowalk page documents it to mean. If you use it incorrectly, people and the system will nudge you to clean up your usage. (In particular, later users can essentially nullify your mistaken tag in a manner analogous to correcting a wiki edit, by applying a '-:photowalk' tag.)
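
To make the nullification rule concrete, here is a small Python sketch. The function names are invented, and it assumes tags are processed in the order they were applied and that a '-:' tag simply cancels the matching ':' tag applied earlier; the real system may well resolve this differently.

```python
def effective_consensus_tags(applied_tags):
    """Return the ':' tags still in effect after any '-:' nullifications."""
    active = []
    for tag in applied_tags:
        if tag.startswith("-:"):
            target = tag[1:]  # '-:photowalk' cancels ':photowalk'
            if target in active:
                active.remove(target)
        elif tag.startswith(":") and tag not in active:
            active.append(tag)
        # Plain tags like 'photowalk' carry no consensus meaning and are skipped.
    return active

def wiki_path(tag):
    """Path of the page documenting a consensus tag, e.g. /wiki/:photowalk."""
    return "/wiki/" + tag

result = effective_consensus_tags(
    ["photowalk", ":photowalk", ":tokyo", "-:photowalk"]
)
```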



