

Idea HN: Tag Reduction / Unification - dryicerx

Since many folks here are doing web startups that involve tagging (be it blog posts, photos, etc.), this might be relevant. The purpose of tags is to cluster related items (photos, say) together. But across users, tags are inconsistent, somewhat defeating this purpose.<p>Let's take Flickr's most popular tags as an example:<p><pre><code>  photowalk2009, sp4449, pitchforkmusicfestival,   worldwidephotowalk, skphoto, scottkelbyphotowalk,   scottkelbysworldwidephotowalk, scottkelby, vihar,  day199,  scottkelbyworldwidephotowalk, scottkelbyssecondannualworldwidephotowalk, kelby,  scottkelbyphotowalk2009, odori,  photowalk,  therebeastormabrewin, worldwide, riat, mefi10
</code></pre>
Notice that "photowalk" appears in many different forms, where it should have been a single tag, "ScottKelbyPhotowalk". This is just an example, but there are a lot of instances where this happens with other tags.<p>Unifying the inconsistent tags can be done in several ways: automatically, using word similarity, or by allowing users to suggest tags that mean the same thing (a moderator can then review and approve the suggestions). Another way is picking a random tag, gathering all objects tagged with it, and then examining the most common tags within that set (most likely the top N tags common to them will mean the same thing).<p>Just an idea to make your next app a bit more usable.
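The last approach (pick a tag, gather everything tagged with it, count the other tags in that set) could look roughly like the sketch below. The function name `related_tags` and the sample data are made up for illustration, and a real system would want a cutoff on how frequent a co-occurring tag must be before it counts as a likely duplicate.

```python
from collections import Counter


def related_tags(photos, seed_tag, top_n=3):
    """Return the top_n tags that most often appear alongside seed_tag.

    photos is an iterable of tag lists, one list per tagged object.
    """
    counts = Counter()
    for tags in photos:
        if seed_tag in tags:
            # Count each co-occurring tag once per object.
            counts.update(t for t in set(tags) if t != seed_tag)
    return [tag for tag, _ in counts.most_common(top_n)]


# Hypothetical sample data: each entry is the tag list of one photo.
photos = [
    ["photowalk", "scottkelby", "worldwidephotowalk"],
    ["photowalk", "scottkelby", "photowalk2009"],
    ["photowalk", "scottkelby", "worldwidephotowalk", "riat"],
    ["odori", "vihar"],
]

print(related_tags(photos, "photowalk", top_n=2))
```

The tags that come back are only candidates for unification; co-occurrence alone cannot tell a synonym from a tag that is merely related.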
======
joshu
If apps allow multiple people to tag the same resource then you should be able
to figure out synonyms statistically.
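
One way to read "statistically" here, as a rough sketch: treat each tag as the set of resources it has been applied to (by anyone) and flag pairs of tags whose sets overlap heavily. The Jaccard measure, the 0.5 threshold, and the `(user, resource, tag)` triple layout are all assumptions for the example, not something any particular site is known to use.

```python
from collections import defaultdict
from itertools import combinations


def synonym_candidates(taggings, min_jaccard=0.5):
    """Return (tag_a, tag_b, score) for tag pairs applied to mostly the same resources.

    taggings is an iterable of (user, resource, tag) triples.
    """
    resources_by_tag = defaultdict(set)
    for _user, resource, tag in taggings:
        resources_by_tag[tag].add(resource)

    pairs = []
    for a, b in combinations(sorted(resources_by_tag), 2):
        ra, rb = resources_by_tag[a], resources_by_tag[b]
        # Jaccard similarity: shared resources over all resources for either tag.
        score = len(ra & rb) / len(ra | rb)
        if score >= min_jaccard:
            pairs.append((a, b, score))
    return sorted(pairs, key=lambda p: -p[2])


# Hypothetical data: three users tagging three shared photos.
taggings = [
    ("ann", "p1", "nyc"), ("bob", "p1", "newyork"),
    ("ann", "p2", "nyc"), ("cat", "p2", "newyork"),
    ("bob", "p3", "newyork"), ("cat", "p3", "tokyo"),
]

for a, b, score in synonym_candidates(taggings):
    print(a, b, round(score, 2))
```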

It's probably a mistake to show top tags at all.

Flickr allowing you to tag an entire upload set is definitely a big mistake.
Browsing/discovering on the tag "tokyo" to find 20 pictures of the inside of a
nondescript hotel room (even though the owner was on his way to or in Tokyo)
is not really a great experience.

Overtagging/hypertagging is definitely a symptom that something is wrong.

Tags should mostly be describing the object itself (photo, in this case.) They
are also overloaded to describe a set of things (see previous) and a group
organization of people. This last is very fragile.

------
profquail
I think a good (and easy) method would be to see if any of the tags they are
trying to input can be matched at the beginning or end of any of the others,
and then disallow the longer version (or suggest an alternative).

Using your example above, photowalk matches the beginning of "photowalk2009"
and the end of "worldwidephotowalk", "scottkelbyphotowalk",
"scottkelbysworldwidephotowalk", "scottkelbyworldwidephotowalk", and
"scottkelbyssecondannualworldwidephotowalk". So none of those would be allowed
(only the original "photowalk"), but "scottkelbyphotowalk2009" would be
allowed, since "photowalk" doesn't match the beginning or end of the string.
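
A minimal sketch of that check, with `find_shorter_tag` as a made-up name. It only looks for an existing tag at the very start or end of the new one, exactly as described, so "scottkelbyphotowalk2009" slips through.

```python
def find_shorter_tag(new_tag, existing_tags):
    """Return an existing tag that starts or ends new_tag, or None if there is none."""
    for tag in existing_tags:
        # Skip empty tags and exact matches; only a strictly shorter tag counts.
        if tag and tag != new_tag and (
            new_tag.startswith(tag) or new_tag.endswith(tag)
        ):
            return tag
    return None


existing = {"photowalk"}
for candidate in ["photowalk2009", "worldwidephotowalk", "scottkelbyphotowalk2009"]:
    suggestion = find_shorter_tag(candidate, existing)
    if suggestion is None:
        print(candidate, "allowed")
    else:
        print(candidate, "rejected, suggest", suggestion)
```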

Or, you could simply do what Stack Overflow does, and limit the number of tags
you can enter (they have a limit of 5 per question). I think that would keep
people from spamming it so much since they'll have to enter relevant tags, and
not just a bunch of variations.

~~~
dryicerx
To add to the _Stack Overflow is doing it right_ point: there are also the tag
suggestions (pushing users early on to use already existing tags).

------
eli
I designed the UI so that it pushes users to use existing tags. There are also
moderation tools that allow an admin to merge one tag into another. The merge
also records the lesser tag as a synonym of the greater tag so that you only
have to do each merge once.
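
A rough sketch of how a merge that remembers synonyms might work. `TagStore` and its methods are hypothetical names for this example, not the actual implementation described above; the point is only that once a merge is recorded, later uses of the lesser tag land on the greater one automatically.

```python
class TagStore:
    """In-memory tag index with admin merges recorded as synonyms."""

    def __init__(self):
        self.synonyms = {}  # lesser tag -> greater (canonical) tag
        self.items = {}     # canonical tag -> set of item ids

    def canonical(self, tag):
        # Follow synonym links until a tag with no redirect is reached.
        while tag in self.synonyms:
            tag = self.synonyms[tag]
        return tag

    def add(self, item_id, tag):
        # New taggings are stored under the canonical tag.
        self.items.setdefault(self.canonical(tag), set()).add(item_id)

    def merge(self, lesser, greater):
        lesser, greater = self.canonical(lesser), self.canonical(greater)
        if lesser == greater:
            return  # already the same tag, nothing to do
        moved = self.items.pop(lesser, set())
        self.items.setdefault(greater, set()).update(moved)
        self.synonyms[lesser] = greater


store = TagStore()
store.add(1, "photowalk")
store.add(2, "scottkelbyphotowalk")
store.merge("scottkelbyphotowalk", "photowalk")
store.add(3, "scottkelbyphotowalk")  # lands on "photowalk" with no second merge
print(sorted(store.items["photowalk"]))
```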

------
al3x
I wrote about this in 2004: [http://al3x.net/2004/11/14/emergentcollaborative-catagorizat...](http://al3x.net/2004/11/14/emergentcollaborative-catagorization.html)

------
gojomo
There is a conflict between letting people use tags for whatever meaning is
natural to them, and mildly enforcing a 'consensus' meaning.

I'm currently working on a system where starting a tag with a specific
character implies 'consensus definition'. Each such tag has a corresponding
wiki page where users can hammer out the common standards for its use/meaning.

For now, the special character is ':' -- the '#' of '#hashtags' presents
problems mapping into neat URLs.

So for example, the tag 'photowalk' would mean whatever each user individually
intends it to mean. There's no expectation of common meaning, even though in
many cases a useful concordance may occur (as in any mixed personal/public
tagging system). OTOH, ':photowalk' means exactly what the matching
/wiki/:photowalk page documents it to mean. If you use it incorrectly, people
and the system will nudge you to clean up your usage. (In particular, later
users can essentially nullify your mistaken tag in a manner analogous to
correcting a wiki edit, by applying a '-:photowalk' tag.)
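
To make the mechanics concrete, here is a small sketch of how the tags on one item might be resolved. The function names are made up, and the rule that tags are processed in chronological order (so a later '-:photowalk' cancels an earlier ':photowalk') is an assumption about the details.

```python
def effective_consensus_tags(tag_events):
    """Return the consensus tags still in force on one item.

    tag_events is the list of tags applied to the item, oldest first.
    """
    active = set()
    for tag in tag_events:
        if tag.startswith("-:"):
            # "-:photowalk" nullifies an earlier ":photowalk".
            active.discard(tag[1:])
        elif tag.startswith(":"):
            active.add(tag)
        # Plain tags keep their personal meaning and are not tracked here.
    return active


def wiki_path(tag):
    """Path of the wiki page that defines a consensus tag."""
    return "/wiki/" + tag


events = [":photowalk", "photowalk", "-:photowalk", ":tokyo"]
print(effective_consensus_tags(events))
print(wiki_path(":photowalk"))
```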

