

Ask HN: How to improve dup management? - blahedo

Suppose there were a 'dup' button on every post (next to 'flag', maybe) that not only marks the post as duplicate content but also lets you link it to the specific post you think it duplicates.<p>This, you see, is the hard part.<p>Everything else you might imagine doing about duplicates can be algorithmic.  Some suggestions are below, but any good idea we have requires something like a graph of duplicate posts.  Some of them need only a tree-based (parent-only) annotation, but if you store it internally as an undirected graph you can orient and order each group of dups by various criteria (popularity, chronology, etc.) depending on the algorithm you want to play with.<p>Some things you could do with the graph:<p>- Distribute karma among the various posters.<p>- Merge discussions among the various posts.<p>- Include a "previously posted as" link on the discussion page.<p>- Display <i>all</i> the previous and subsequent dups (e.g. in an inset float-right box, or in a list above the discussion).<p>- "Lend" post karma from one dup to another in some fashion, so the popularity of one post (and its rank on the front page) affects another.<p>I'm not suggesting that marking a dup would automatically kill the dup (in fact, I'd suggest that that <i>not</i> happen), but it seems to me that the only way we can do anything intelligent with duplicates is if we have reliable information on which posts are duplicate content (and dups of what).
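To make the "graph of duplicate posts" idea concrete, here is a minimal sketch of one way it could be stored and queried. Everything in it is an assumption for illustration: the `DupGraph` class, the post ids, and the `time`/`points` fields are hypothetical and not anything HN actually has. It treats each dup mark as an undirected edge, collects the connected group a post belongs to, and then orders that group with a plain sort on whatever criterion you pick (not a true topological sort, which would need directed edges).

```python
from collections import defaultdict


class DupGraph:
    """Undirected graph of 'X is a dup of Y' marks (hypothetical structure)."""

    def __init__(self):
        self.edges = defaultdict(set)

    def mark_dup(self, post_id, other_id):
        # A mark is stored symmetrically, so direction can be chosen later.
        self.edges[post_id].add(other_id)
        self.edges[other_id].add(post_id)

    def group(self, post_id):
        """Return every post connected to post_id, including post_id itself."""
        seen, stack = {post_id}, [post_id]
        while stack:
            node = stack.pop()
            for nxt in self.edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def ordered_group(self, post_id, key):
        """Order a dup group by any criterion (chronology, popularity, ...)."""
        return sorted(self.group(post_id), key=key)


# Made-up sample posts: id -> submission time and points.
posts = {
    101: {"time": 1, "points": 3},
    102: {"time": 2, "points": 40},
    103: {"time": 3, "points": 5},
    200: {"time": 4, "points": 7},  # unrelated post, never marked
}

g = DupGraph()
g.mark_dup(102, 101)  # a user says 102 is a dup of 101
g.mark_dup(103, 102)  # another user says 103 is a dup of 102

by_time = g.ordered_group(103, key=lambda p: posts[p]["time"])
by_points = g.ordered_group(103, key=lambda p: -posts[p]["points"])
print(by_time)    # [101, 102, 103]
print(by_points)  # [102, 103, 101]
```

The chronological ordering is what a "previously posted as" link or a list of earlier and later dups would read from; the popularity ordering is what a karma or ranking rule might use.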
======
RiderOfGiraffes
What if something gets marked as a dup when it isn't? What if it's nearly a
dup - same story, different source - should that get marked? What if each
submission already has some discussion?

What if the submission with discussion is poor, and a better source is
submitted?

Interested - interesting.

~~~
wmf
I think the key is to allow many HN users to mark dupes and have a few trusted
moderators use their discretion to fix them.

(Of course, nobody will see this thread because /newest is clogged with
dupes.)
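A rough sketch of how "many users mark, a few moderators fix" might look in code. The `DupMarks` class, the user names, and the rule that a moderator can only reject a link are all assumptions made up for this example, not a description of any real moderation system.

```python
from collections import defaultdict


class DupMarks:
    """Collect dup marks from any user; let moderators reject bad ones."""

    def __init__(self, moderators):
        self.moderators = set(moderators)
        self.votes = defaultdict(set)  # (post, target) -> users who marked it
        self.rejected = set()          # links a moderator has thrown out

    def mark(self, user, post, target):
        # Using a set means a user marking the same pair twice counts once.
        self.votes[(post, target)].add(user)

    def reject(self, moderator, post, target):
        if moderator not in self.moderators:
            raise PermissionError(f"{moderator} is not a moderator")
        self.rejected.add((post, target))

    def active_links(self):
        """Links that have been marked and not rejected."""
        return sorted(pair for pair in self.votes if pair not in self.rejected)


marks = DupMarks(moderators={"mod1"})
marks.mark("alice", 102, 101)
marks.mark("bob", 102, 101)
marks.mark("carol", 300, 101)   # a mistaken mark
marks.reject("mod1", 300, 101)  # moderator uses discretion to remove it

links = marks.active_links()
print(links)  # [(102, 101)]
```

Keeping the per-link set of users around also leaves room for a threshold rule later (for example, only trusting links marked by several distinct users).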

~~~
blahedo
Yes, and even without moderation it's not the end of the world if a near-dup
is marked; that just puts it into the dup graph for that topic/article/source
and then the algorithms for karma and popularity can play with it.
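As one example of an algorithm that could "play with" a dup group, here is a small sketch of lending points between dups. The formula, the `share` parameter, and the sample numbers are assumptions chosen for illustration only; nothing in the thread fixes a particular rule.

```python
def effective_scores(points, group, share=0.5):
    """Each post keeps its own points and borrows `share` of its dups' points.

    `points` maps post id -> points; `group` is the set of ids in one dup group.
    """
    total = sum(points[p] for p in group)
    return {p: points[p] + share * (total - points[p]) for p in group}


# Made-up points for three posts in the same dup group.
points = {101: 3, 102: 40, 103: 5}
scores = effective_scores(points, {101, 102, 103})
print(scores[101], scores[102], scores[103])  # 25.5 44.0 26.5
```

With `share=0` the posts are ranked independently, and with `share=1` every post in the group gets the same combined score, so a wrongly marked near-dup only shifts rankings by as much as the chosen share allows.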

