YC: There should be a way to prevent reposts
11 points by martianpenguin on March 11, 2008 | hide | past | favorite | 13 comments
Recently, I've noticed a lot of articles that are reposts of the same thing. They should all be consolidated to one thread. Just my opinion.


There could simply be a "report duplicate" link (where you specify which article it's a repost of) next to a new post for the first hour after it's posted. If enough people say it's a duplicate of a particular article, then it is taken down and the comment threads are merged.


Instead of dropping or merging the article, put a list (or link to a list) of "previous postings" somewhere very prominently on the page.


There is. Duplicate URLs are not allowed.

As for different sources reporting on the same thing, that's a bit harder.


"... There is. Duplicate URLs are not allowed. ..."

There is a subtler variation on this. Two different URLs resolving to the same article. A site may publish:

- http://foo.com/date/bar/some-inane-tech-article

- http://foo.com/date/some-inane-tech-article

Both URLs are unique but resolve to the same document. A quick example might be an article and the printer-friendly version of the same article.


"... There is. Duplicate URLs are not allowed ... There is a subtler variation on this. Two different urls resolving to the same article ..."

Here is a live example I just spotted:

original ~ http://www.paulgraham.com/ycombinator.html ~ post ~ http://news.ycombinator.com/item?id=133430

dupe ~ http://paulgraham.com/ycombinator.html ~ post ~ http://news.ycombinator.com/item?id=134775
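The www/non-www pair above could be caught by normalizing submitted URLs to a canonical form before checking for duplicates. A minimal sketch (not how HN actually does it; the normalization rules here are just illustrative assumptions):

```python
from urllib.parse import urlsplit, urlunsplit

def canonical_url(url):
    """Normalize a URL so trivially different forms compare equal:
    lowercase the host, drop a leading 'www.', and strip any
    trailing slash from the path."""
    scheme, netloc, path, query, _fragment = urlsplit(url)
    host = netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = path.rstrip("/") or "/"
    return urlunsplit((scheme, host, path, query, ""))

# The two submissions above reduce to the same key:
a = canonical_url("http://www.paulgraham.com/ycombinator.html")
b = canonical_url("http://paulgraham.com/ycombinator.html")
```

With both submissions mapping to the same canonical key, the duplicate check becomes a simple lookup.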



Can something be written that strips the url of arbitrary variables?

http://foo.com/bar?arbitrary-var=arbitrary-val

to just

http://foo.com/bar

that would help because a lot of people end up posting links with ?source=newsletter or &sessionid=asdf1234ilikepie or etc
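A blocklist version of this idea is straightforward to sketch: strip only parameters known to be tracking noise, and leave everything else alone. The parameter names below are assumptions for illustration, not an actual HN list:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical blocklist of parameters assumed never to change content.
TRACKING_PARAMS = {"source", "sessionid", "utm_source", "utm_medium", "ref"}

def strip_tracking(url):
    """Remove known tracking parameters, keeping any others intact."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(query) if k not in TRACKING_PARAMS]
    return urlunsplit((scheme, netloc, path, urlencode(kept), fragment))

cleaned = strip_tracking("http://foo.com/bar?source=newsletter&id=42")
```

This sidesteps the indiscriminate-stripping problem raised below, at the cost of maintaining the blocklist.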


Those variables are occasionally meaningful. Stripping them all off indiscriminately would break some links.


So don't strip them indiscriminately. Have the HN site re-fetch the submitted page with the query parameters dropped one by one, comparing each result against the original, and keep only the parameters that actually affect the resulting page.
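The probing approach described above can be sketched as follows. `fetch` is an assumed helper (e.g. a wrapper around an HTTP client) passed in so the logic is testable; a real version would also need to handle fetch failures and rate-limiting:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def significant_params(url, fetch):
    """Drop each query parameter in turn and re-fetch; keep only the
    parameters whose removal changes the page body."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    params = parse_qsl(query)
    baseline = fetch(url)
    kept = []
    for i, param in enumerate(params):
        others = params[:i] + params[i + 1:]
        trial = urlunsplit((scheme, netloc, path, urlencode(others), fragment))
        if fetch(trial) != baseline:
            kept.append(param)  # removing it changed the page, so it matters
    return urlunsplit((scheme, netloc, path, urlencode(kept), fragment))
```

For a link like `?sessionid=asdf&id=7` where only `id` affects the content, this reduces the URL to `?id=7`. The cost is one extra fetch per parameter per submission.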



I agree. I've also noticed that the same story as reported by two different news sources will get posted, which is essentially the same thing.

Google does some pretty cool probability stuff (at least, I think that's how they do it) to figure out what articles are the same for news.google.com. Something like that would be really cool on new.yc.
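Google's actual method isn't public, but one standard technique for clustering near-duplicate articles is shingling: compare documents by the overlap of their k-word phrase sets (Jaccard similarity). A minimal sketch, with the 0.5 threshold chosen arbitrarily for illustration:

```python
def shingles(text, k=3):
    """Set of k-word shingles (overlapping phrases) from a document."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Overlap between two shingle sets, from 0.0 to 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Two reports of the same story share most shingles; unrelated ones don't.
s1 = shingles("yc startup raises new funding round from investors today")
s2 = shingles("yc startup raises new funding round from investors this week")
same_story = jaccard(s1, s2) > 0.5
```

A production system would fetch and strip the article bodies first and use hashing tricks (e.g. MinHash) to avoid comparing every pair, but the similarity idea is the same.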

I know, the source is open.... but I'm clearly busy ;)


Yeah, I've wondered, how exactly does Google do that?? Anyway, in the short term, a manual system would probably be more accurate.


There should be a better way to prevent reposts




