

YC: There should be a way to prevent reposts - martianpenguin

Recently I've noticed a lot of submissions that are reposts of the same story. They should all be consolidated into one thread. Just my opinion.
======
aneesh
There could simply be a "report duplicate" link (where you specify which
article it's a repost of) next to a new post for the first hour after it's
posted. If enough people say it's a duplicate of a particular article, the new
post is taken down and the comment threads are merged.
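A minimal sketch of the vote-counting side of this proposal. The class name, the one-vote-per-user rule, and `MERGE_THRESHOLD = 5` (standing in for "enough people") are all hypothetical choices, not anything HN actually implements.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Optional

REPORT_WINDOW = timedelta(hours=1)  # the "first hour after it's posted"
MERGE_THRESHOLD = 5                 # hypothetical value for "enough people"

class DuplicateReports:
    """Collects "report duplicate" votes for one newly posted story."""

    def __init__(self, posted_at: datetime):
        self.posted_at = posted_at
        self.votes: Dict[str, int] = {}  # reporter -> id of the alleged original

    def report(self, user: str, original_id: int, now: datetime) -> bool:
        """Record a vote; return False if the reporting window has closed."""
        if now - self.posted_at > REPORT_WINDOW:
            return False
        self.votes[user] = original_id  # one vote per user; a later vote replaces an earlier one
        return True

    def merge_target(self) -> Optional[int]:
        """Return the original's id once enough reporters agree on it, else None."""
        if not self.votes:
            return None
        original_id, count = Counter(self.votes.values()).most_common(1)[0]
        return original_id if count >= MERGE_THRESHOLD else None
```

Requiring agreement on one *particular* original, rather than just counting reports, keeps a few mistaken reports pointing at different stories from triggering a merge.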

~~~
tim2
Instead of dropping or merging the article, put a list (or link to a list) of
"previous postings" somewhere very prominently on the page.

------
tlrobinson
There is. Duplicate URLs are not allowed.

As for different sources reporting on the same thing, that's a bit harder.

~~~
bootload
_"... There is. Duplicate URLs are not allowed. ..."_

There is a subtler variation on this: two different URLs resolving to the same
article. A site may publish:

\- <http://foo.com/date/bar/some-inane-tech-article>

\- <http://foo.com/date/some-inane-tech-article>

Both URLs are unique, but they point to the same document. A common example is
an article and its printer-friendly version.
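No amount of rewriting the URL string can tell that these two paths are the same article, but many sites (not all) declare a preferred address in the page head with `<link rel="canonical" href="...">` (RFC 6596), often including on print versions. A sketch of looking that up with the standard-library HTML parser, assuming the submission code has already fetched the page; the function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin

class _CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attribute values may be None.
        if tag != "link" or self.href is not None:
            return
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        if "canonical" in rels and a.get("href"):
            self.href = a["href"]

def canonical_url(page_url: str, html: str) -> str:
    """Return the page's declared canonical URL, or page_url if it declares none."""
    finder = _CanonicalFinder()
    finder.feed(html)
    # A relative href is resolved against the URL the page was fetched from.
    return urljoin(page_url, finder.href) if finder.href else page_url
```

Duplicate lookup would then key on `canonical_url(...)` instead of the submitted URL. Sites that omit the tag fall through unchanged, so this narrows the problem rather than solving it.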

~~~
bootload
_"... There is. Duplicate URLs are not allowed ... There is a subtler
variation on this. Two different urls resolving to the same article ..."_

Here is a live example I just spotted:

original ~ <http://www.paulgraham.com/ycombinator.html> ~ post ~
<http://news.ycombinator.com/item?id=133430>

dupe ~ <http://paulgraham.com/ycombinator.html> ~ post ~
<http://news.ycombinator.com/item?id=134775>
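This particular case (`www.` versus bare host) can be caught before any page is fetched by normalizing the URL string. A sketch using `urllib.parse`; treating `www.example.com` and `example.com` as the same site is an assumption (they can legally serve different content), and `normalize_url` is a hypothetical name.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def normalize_url(url: str) -> str:
    """Reduce trivially different spellings of a URL to one key for duplicate lookup."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # .hostname excludes the port and any user:pass@
    if host.startswith("www."):            # assumption: www.x and x are the same site
        host = host[4:]
    netloc = host
    if parts.port and parts.port != DEFAULT_PORTS.get(scheme):
        netloc = f"{host}:{parts.port}"    # keep only non-default ports
    path = parts.path or "/"
    return urlunsplit((scheme, netloc, path, parts.query, ""))  # drop the #fragment
```

With this, both submissions above map to `http://paulgraham.com/ycombinator.html`, so an exact-match duplicate check on the normalized key would have rejected the second one.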

------
kajecounterhack
Can something be written that strips a URL of arbitrary query variables?

<http://foo.com/bar?arbitrary-var=arbitrary-val>

to just

<http://foo.com/bar>

That would help, because a lot of people end up posting links with
?source=newsletter, &sessionid=asdf1234ilikepie, or similar.
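A sketch of a safer variant: strip only a denylist of parameters assumed never to change the page, rather than the whole query string, since some parameters do select content. The denylist contents (`source`, `sessionid`, and the common `utm_` prefix) are illustrative assumptions, not an established list.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical denylist: parameters assumed never to affect the page content.
TRACKING_PARAMS = {"source", "sessionid"}
TRACKING_PREFIXES = ("utm_",)

def strip_tracking_params(url: str) -> str:
    """Remove known tracking parameters, keeping all others in their original order."""
    parts = urlsplit(url)
    kept = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in TRACKING_PARAMS
        and not k.lower().startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Note that `urlencode` re-encodes the surviving parameters, so the result may differ cosmetically from the input (for example in how spaces are escaped) even when nothing was removed.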

~~~
natrius
Those variables are occasionally meaningful. Stripping them all off
indiscriminately would break some links.

~~~
marcus
So don't strip them indiscriminately. Have the HN site drop them one at a
time, compare the page it gets back with the page at the original link, and
keep only the parameters that actually change the resulting page.
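A minimal sketch of this drop-one-at-a-time idea. `fetch` is a hypothetical callable that returns a page body for a URL; comparing bodies with exact equality is a simplifying assumption, since real pages often embed timestamps, ads, or session tokens that differ on every load.

```python
from typing import Callable
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def minimal_query_url(url: str, fetch: Callable[[str], str]) -> str:
    """Drop query parameters one at a time, keeping only those whose removal changes the page."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    reference = fetch(url)  # the page exactly as submitted

    def with_params(ps):
        return urlunsplit(parts._replace(query=urlencode(ps)))

    kept = list(params)
    for p in params:
        trial = list(kept)
        trial.remove(p)  # remove one occurrence of this parameter
        if fetch(with_params(trial)) == reference:
            kept = trial  # removing p made no difference, so drop it for good
    return with_params(kept)
```

This is greedy and costs one extra fetch per parameter, so it would suit a one-off check at submission time rather than anything done on every page view.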

------
jmtulloss
I agree. I've also noticed that the same story as reported by two different
news sources will get posted, which is essentially the same thing.

Google does some pretty cool probability stuff (at least, I think that's how
they do it) to figure out which articles are the same for news.google.com.
Something like that would be really cool on news.yc.

I know, the source is open.... but I'm clearly busy ;)
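How Google News clusters stories isn't described here, so the following is not a claim about their method. It is a sketch of one standard textbook technique for near-duplicate text: break each article into overlapping word triples ("shingles") and measure the Jaccard overlap of the two sets. The shingle size `k = 3` and the `0.5` threshold are hypothetical tuning values.

```python
import re

def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word sequences ("shingles") in the text, case-folded."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union; 1.0 means identical sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def probably_same_story(text_a: str, text_b: str, threshold: float = 0.5) -> bool:
    # threshold is a hypothetical tuning value, not a known-good constant
    return jaccard(shingles(text_a), shingles(text_b)) >= threshold
```

Exact shingles work best on syndicated or lightly edited copies of the same wire story. Two outlets that report the same event in their own words share few word triples, so catching those would need something looser, such as comparing titles or named entities.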

~~~
aneesh
Yeah, I've wondered how exactly Google does that. Anyway, in the short
term, a manual system would probably be more accurate.

------
nreece
There should be a _better_ way to prevent reposts

