YC: There should be a way to prevent reposts
11 points by martianpenguin on March 11, 2008 | hide | past | favorite | 13 comments
Recently, I've noticed a lot of articles that are reposts of the same thing. They should all be consolidated to one thread. Just my opinion.


There could simply be a "report duplicate" link (where you specify which article it's a repost of) next to a new post for the first hour after it's posted. If enough people say it's a duplicate of a particular article, then it is taken down and the comment threads are merged.


Instead of dropping or merging the article, put a list (or link to a list) of "previous postings" somewhere very prominently on the page.


There is. Duplicate URLs are not allowed.

As for different sources reporting on the same thing, that's a bit harder.


"... There is. Duplicate URLs are not allowed. ..."

There is a subtler variation on this. Two different URLs resolving to the same article. A site may publish:

- http://foo.com/date/bar/some-inane-tech-article

- http://foo.com/date/some-inane-tech-article

Both URLs are unique but resolve to the same document. A quick example might be an article and the printer-friendly version of the same article.


"... There is. Duplicate URLs are not allowed ... There is a subtler variation on this. Two different urls resolving to the same article ..."

Here is a live example I just spotted:

original ~ http://www.paulgraham.com/ycombinator.html ~ post ~ http://news.ycombinator.com/item?id=133430

dupe ~ http://paulgraham.com/ycombinator.html ~ post ~ http://news.ycombinator.com/item?id=134775
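The www/non-www pair above could be caught by normalizing submitted URLs to a canonical form before checking for duplicates. A minimal sketch (not how HN actually does it; the normalization rules here are just illustrative assumptions):

```python
from urllib.parse import urlsplit, urlunsplit

def canonical_url(url):
    """Normalize a URL so trivially different forms compare equal:
    lowercase the host, drop a leading 'www.', and strip any
    trailing slash from the path."""
    scheme, netloc, path, query, _fragment = urlsplit(url)
    host = netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = path.rstrip("/") or "/"
    return urlunsplit((scheme, host, path, query, ""))

# The two submissions above reduce to the same key:
a = canonical_url("http://www.paulgraham.com/ycombinator.html")
b = canonical_url("http://paulgraham.com/ycombinator.html")
```

With both submissions mapping to the same canonical key, the duplicate check becomes a simple lookup.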



Can something be written that strips the url of arbitrary variables?

http://foo.com/bar?arbitrary-var=arbitrary-val

to just

http://foo.com/bar

that would help because a lot of people end up posting links with ?source=newsletter or &sessionid=asdf1234ilikepie or etc
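A blocklist version of this idea is straightforward to sketch: strip only parameters known to be tracking noise, and leave everything else alone. The parameter names below are assumptions for illustration, not an actual HN list:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical blocklist of parameters assumed never to change content.
TRACKING_PARAMS = {"source", "sessionid", "utm_source", "utm_medium", "ref"}

def strip_tracking(url):
    """Remove known tracking parameters, keeping any others intact."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(query) if k not in TRACKING_PARAMS]
    return urlunsplit((scheme, netloc, path, urlencode(kept), fragment))

cleaned = strip_tracking("http://foo.com/bar?source=newsletter&id=42")
```

This sidesteps the indiscriminate-stripping problem raised below, at the cost of maintaining the blocklist.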


Those variables are occasionally meaningful. Stripping them all off indiscriminately would break some links.


So don't strip them indiscriminately. Have the HN site re-fetch the submitted page with the query parameters dropped one by one, comparing each result against the original, and keep only the parameters that actually affect the resulting page.
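The probing approach described above can be sketched as follows. `fetch` is an assumed helper (e.g. a wrapper around an HTTP client) passed in so the logic is testable; a real version would also need to handle fetch failures and rate-limiting:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def significant_params(url, fetch):
    """Drop each query parameter in turn and re-fetch; keep only the
    parameters whose removal changes the page body."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    params = parse_qsl(query)
    baseline = fetch(url)
    kept = []
    for i, param in enumerate(params):
        others = params[:i] + params[i + 1:]
        trial = urlunsplit((scheme, netloc, path, urlencode(others), fragment))
        if fetch(trial) != baseline:
            kept.append(param)  # removing it changed the page, so it matters
    return urlunsplit((scheme, netloc, path, urlencode(kept), fragment))
```

For a link like `?sessionid=asdf&id=7` where only `id` affects the content, this reduces the URL to `?id=7`. The cost is one extra fetch per parameter per submission.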



I agree. I've also noticed that the same story as reported by two different news sources will get posted, which is essentially the same thing.

Google does some pretty cool probability stuff (at least, I think that's how they do it) to figure out what articles are the same for news.google.com. Something like that would be really cool on new.yc.
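Google's actual method isn't public, but one standard technique for clustering near-duplicate articles is shingling: compare documents by the overlap of their k-word phrase sets (Jaccard similarity). A minimal sketch, with the 0.5 threshold chosen arbitrarily for illustration:

```python
def shingles(text, k=3):
    """Set of k-word shingles (overlapping phrases) from a document."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Overlap between two shingle sets, from 0.0 to 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Two reports of the same story share most shingles; unrelated ones don't.
s1 = shingles("yc startup raises new funding round from investors today")
s2 = shingles("yc startup raises new funding round from investors this week")
same_story = jaccard(s1, s2) > 0.5
```

A production system would fetch and strip the article bodies first and use hashing tricks (e.g. MinHash) to avoid comparing every pair, but the similarity idea is the same.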

I know, the source is open.... but I'm clearly busy ;)


Yeah, I've wondered, how exactly does Google do that?? Anyway, in the short term, a manual system would probably be more accurate.


There should be a better way to prevent reposts




