Ask HN: Anyone use Search HN before submitting articles to prevent duplicates?

mindcrime · on July 19, 2013

Nope. There is code in place to detect (some) dupes and turn the re-submit into an upvote for the original submission. For anything that makes it through the dupe detector, I'm satisfied to assume that it's probably worth a second discussion. Note that not all duplicates are a Bad Thing, and there are a few links that show up on HN about once a year or so anyway, and rightly so.
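
A rough sketch of how that resubmit-as-upvote behavior could work, assuming an exact-match lookup on the URL string. The `Board` and `Submission` names and their fields are hypothetical, made up for illustration; HN's actual code isn't shown anywhere in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class Submission:
    url: str
    submitter: str
    points: int = 1
    voters: set = field(default_factory=set)


class Board:
    """Hypothetical in-memory board; not HN's real implementation."""

    def __init__(self):
        self._by_url = {}  # exact URL string -> Submission

    def submit(self, url, user):
        existing = self._by_url.get(url)
        if existing is None:
            # First time this exact URL string is seen: new submission.
            sub = Submission(url=url, submitter=user, voters={user})
            self._by_url[url] = sub
            return sub
        # Exact duplicate: count the resubmit as an upvote instead.
        if user not in existing.voters:
            existing.voters.add(user)
            existing.points += 1
        return existing
```

With a lookup like this, a second submit of the same string just bumps the original's points, while any variation in the string (a trailing `?s=hn`, say) creates a fresh submission.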

dfc · on July 19, 2013

I don't mind the dupes when they are spaced out by months or more. The instances that bother me are the ones where a user regularly appends $x=y to a URL to get past the detector. For instance:

  http://www.restorethefourth.net/?s=hn  15.000 days ago thepumpkin1979  350+ points
  http://www.restorethefourth.net/       15.001 days ago iProject 2 points

This means that pumpkin submitted the bare URL, saw that someone else had already submitted it, and then appended s=hn and resubmitted the URL to get past the dupe detector for fun and karma.

pg needs to figure out how to embed ColinWright into the duplicate detection process; nothing gets by him.

stfu · on July 19, 2013

Personally, I don't mind duplicates. Quite frequently stories make it to the top 20 that (I believe) I had submitted previously (from another news outlet).

Hacker News is a lot about timing, and I really don't mind if some stories need a few attempts before they break through. This randomness is part of the fun of submitting stories to HN.

Tichy · on July 18, 2013

No, HN detects duplicates automatically. And if that mechanism fails, either the article just won't get upvoted, or it's an evergreen that is OK to repeat now and then.

dfc · on July 19, 2013

  http://example.com/story/
  http://example.com/story/
  http://example.com/story/?s=hn

HN will identify the second URL as a duplicate of the first URL. However, it will let the third URL through. In a perfect world, HN would check whether the submitted page provides a canonical URL and use that canonical URL for deduplication.
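
A minimal sketch of that canonical-URL idea, assuming the page's HTML has already been fetched and that it declares a `<link rel="canonical" href="...">` tag. The `dedup_key` helper is a hypothetical name for illustration, not anything HN exposes.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalLinkParser(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.canonical = attr["href"]


def dedup_key(submitted_url, page_html):
    """Hypothetical helper: prefer the page's canonical URL, else the submitted URL."""
    parser = CanonicalLinkParser()
    parser.feed(page_html)
    if parser.canonical is None:
        return submitted_url
    # Resolve a relative canonical href against the submitted URL.
    return urljoin(submitted_url, parser.canonical)
```

If the story page declares `http://example.com/story/` as canonical, then both the bare URL and the `?s=hn` variant map to the same key. Pages with no canonical tag fall back to the submitted URL, so this only helps when sites cooperate.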

I have seen an increasing number of people using the $x=yz trick to get past the duplicate detector lately. For instance:

  http://www.restorethefourth.net/?s=hn  15 days ago  thepumpkin1979  350+ points
  http://www.restorethefourth.net/       15 days ago  iProject        2 points

NB: This is merely an example. I do not mean to single thepumpkin1979 out. That is just the first username that I could recall that has multiple ?s=hn in their 30 most recent submissions.

stephengillie · on July 19, 2013

I've seen multiple threads where the original submission received 100+ upvotes and 100+ comments, and the repost also received 100+ upvotes and 100+ comments. Sometimes the repost comments went in an entirely different direction from the original post comments.

Tichy · on July 19, 2013

I suppose if the repost gets a lot of upvotes, it was a good thing that it was reposted?

twalling · on July 19, 2013

No, this is what computers are for. There's no reason a user should have to check if a link is a duplicate.

llamataboot · on July 18, 2013

Duck Duck Go with !hn appended

greenyoda · on July 18, 2013

I always check hnsearch before submitting an article. It only takes a few seconds to do, and I don't want to be that guy who submits a duplicate article.

kamakazizuru · on July 18, 2013

You won't be - HN is smart enough to consider your resubmission equivalent to an upvote. Don't waste those valuable seconds ;)

greenyoda · on July 19, 2013

Judging by the number of duplicate submissions that I flag on a daily basis, the duplicate-finding algorithm doesn't work very well, even on variants of the same URL (e.g., parameters after the "?", like "print=yes", aren't always ignored). And there are lots of things the algorithm won't find at all but that are obvious to a person:

- Frequently, the identical story is cross-posted to multiple sites.

- Someone posts the mobile version of an article (m.foo.com) in addition to the canonical URL (www.foo.com).

- Several very similar articles on the same topic: every breaking news story on the Zimmerman verdict or the Bolivian president's plane had pretty much the same facts in it, so if you've seen one, you've seen them all.

So I'll just keep on spending those valuable seconds for the good of the HN community.
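
For what it's worth, the mobile-vs-canonical host case is the one item on that list a machine could plausibly handle. A crude sketch, assuming that a leading `www.`, `m.`, or `mobile.` label never changes which article is meant (that prefix list is my assumption, and it will be wrong for some sites). Cross-posts and near-identical articles on different sites are out of reach for this kind of rewrite.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed list of "cosmetic" host prefixes; purely illustrative.
HOST_PREFIXES = ("www.", "m.", "mobile.")


def normalize_host(url):
    """Lowercase the host and drop one leading cosmetic prefix, if any."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    for prefix in HOST_PREFIXES:
        rest = host[len(prefix):]
        # Only strip when something domain-like remains (avoid "m.com" -> "com").
        if host.startswith(prefix) and "." in rest:
            host = rest
            break
    # Keep an explicit port; any user:password part is dropped in this sketch.
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

Under that assumption, `http://m.foo.com/article` and `http://www.foo.com/article` both normalize to the same string and could be compared directly.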

kamakazizuru · on July 19, 2013

True - I've seen the same story, with almost no different facts but from different news sources, on the home page multiple times (e.g. when Steve Jobs passed). My point was just about posting a single URL - the HN algorithm is typically decent enough. At the end of the day, if it's already been posted, it shouldn't end up getting upvoted much anyway. But in any case, if it makes you feel good, you should keep doing that :)

lucb1e · on July 19, 2013

Obligatory xkcd http://xkcd.com/1205/

ressaid1 · on July 18, 2013

I thought HN automatically de-duped articles.

donretag · on July 18, 2013

Almost. HN only does exact matching, so if the duplicate article varies even slightly, it would not be detected. Variations include query params at the end of a URL such as &utm_medium=...

An automatically detected duplicate counts as an upvote for the original article.
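
To make the exact-match gap concrete, here is a small sketch of the kind of normalization that would close it for tracking parameters. It assumes that any query parameter whose name starts with `utm_` is safe to drop; the `strip_tracking` name and the prefix list are hypothetical, and nothing here reflects what HN actually runs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking-parameter prefixes; extend at your own risk.
TRACKING_PREFIXES = ("utm_",)


def strip_tracking(url):
    """Return the URL without tracking query parameters or fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
    ]
    # The fragment is dropped as well, since it never reaches the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

Comparing `strip_tracking(a) == strip_tracking(b)` instead of `a == b` would treat a link with `&utm_medium=...` tacked on as the same article, while still keeping parameters that actually select content (such as an `id`).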

btilly · on July 19, 2013

It should be noted that moderators are well aware of this loophole, and upon repeated abuse will ban accounts. If you're just submitting what you see, don't worry. If you're submitting your articles over and over again, though, well, don't do that.

prawn · on July 19, 2013

Anyone else habitually clean off &utm_* query strings before sharing a link anywhere? Feels so dirty leaving them on.

hobs · on July 19, 2013

Do you mean removing referral links from legit links? Yep. No reason to give them even more information than they already get from the visit.

mindcrime · on July 19, 2013

I usually clean links of superfluous query parameters before sharing them.

kunai · on July 18, 2013

The problem with de-duping is that there's usually no chance of sustaining discussion on a duplicate that was submitted more than perhaps a week ago, which could be the reason why PG hasn't implemented fuzzy de-duping yet.

hkmurakami · on July 19, 2013

If you're karma motivated, you have no incentive to look for a duplicate. If you're not karma motivated, then submitting bare URLs is usually enough for dupe detection (IMO).

I'm admittedly somewhat karma motivated, but I think I'm pretty clean with respect to my submissions, since I dislike it when people try to game the submission system and I don't want to be one of them.

cpncrunch · on July 18, 2013

It would be nice if there was a search box on HN itself. Why should I have to remember some other site to search?

ElbertF · on July 18, 2013

There is. In the footer.

jjsz · on July 19, 2013

I guess I'm blind for not noticing it.

cpncrunch · on July 19, 2013

Me too :) I guess I just expect it to be up at the top, like every other website.