Ask HN: Anyone use Search HN before submitting articles to prevent duplicates? (hnsearch.com)
33 points by deadfall on July 18, 2013 | 26 comments



Nope. There is code in place to detect (some) dupes and turn the re-submit into an upvote for the original submission. For anything that makes it through the dupe detector, I'm satisfied to assume that it's probably worth a second discussion. Note that not all duplicates are a Bad Thing, and there are a few links that show up on HN about once a year or so anyway, and rightly so.


I don't mind the dupes when they are spaced out by months or more. The instances that bother me are when a user regularly appends ?x=y to a URL to get past the detector. For instance:

  http://www.restorethefourth.net/?s=hn  15.000 days ago thepumpkin1979  350+ points
  http://www.restorethefourth.net/       15.001 days ago iProject 2 points
This means that pumpkin submitted the bare URL, saw that someone else had already submitted it, and then appended ?s=hn and resubmitted the URL to get past the dupe detector for fun and karma.

pg needs to figure out how to embed ColinWright into the duplicate-detection process; nothing gets by him.


Personally, I don't mind duplicates. Quite frequently, stories make it into the top 20 that (I believe) I had submitted previously (from another news outlet).

Hacker News is a lot about timing, and I really don't mind if some stories need a few attempts before they break through. This randomness is part of the fun of submitting stories to HN.


No, HN detects duplicates automatically. And if that mechanism fails, either the article just won't get upvoted, or it's an evergreen that is OK to repeat now and then.


  http://example.com/story/
  http://example.com/story/
  http://example.com/story/?s=hn
HN will identify the second URL as a duplicate of the first. However, it will let the third URL through. In a perfect world, HN would check whether the submitted page declares a canonical URL and use that canonical URL for deduplication.
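A minimal Python sketch of the canonical-URL idea, assuming the page's HTML has already been fetched (fetching is left out). `CanonicalLinkParser` and `dedup_key` are hypothetical names, not anything HN actually runs; only the standard library's `html.parser` and `urllib.parse` are used.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalLinkParser(HTMLParser):
    """Hypothetical helper: records the href of the first <link rel="canonical">."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens, e.g. rel="canonical nofollow"
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr_map.get("href"):
            self.canonical = attr_map["href"]


def dedup_key(submitted_url, page_html):
    """Hypothetical: use the page's declared canonical URL, else the submitted URL."""
    parser = CanonicalLinkParser()
    parser.feed(page_html)
    if parser.canonical is None:
        return submitted_url
    # A relative canonical href is resolved against the submitted URL.
    return urljoin(submitted_url, parser.canonical)


page = '<html><head><link rel="canonical" href="http://example.com/story/"></head></html>'
print(dedup_key("http://example.com/story/?s=hn", page))
```

With that key, the ?s=hn variant collapses onto the bare URL, but only for sites that actually declare a canonical link; pages without one fall back to plain URL matching.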

Lately, I have seen an increasing number of people using the ?x=yz trick to get past the duplicate detector. For instance:

  http://www.restorethefourth.net/?s=hn  15 days ago  thepumpkin1979  350+ points
  http://www.restorethefourth.net/       15 days ago  iProject        2 points
NB: This is merely an example. I do not mean to single thepumpkin1979 out. That is just the first username I could recall that has multiple ?s=hn URLs among their 30 most recent submissions.


I've seen multiple threads where the original submission received 100+ upvotes and 100+ comments, and the repost also received 100+ upvotes and 100+ comments. Sometimes the repost comments went in an entirely different direction from the original post comments.


I suppose if the repost gets a lot of upvotes, it was a good thing that it was reposted?


No, this is what computers are for. There's no reason a user should have to check if a link is a duplicate.


DuckDuckGo with !hn appended.
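For anyone scripting this, a small sketch of building such a search URL, assuming DuckDuckGo's bang handling works as the comment describes (the bang is simply part of the query text). `ddg_hn_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def ddg_hn_search_url(query):
    """Hypothetical helper: a DuckDuckGo search URL with the !hn bang appended."""
    # urlencode percent-encodes the query, so "!" becomes %21 and spaces become "+".
    return "https://duckduckgo.com/?" + urlencode({"q": query + " !hn"})


print(ddg_hn_search_url("restore the fourth"))
```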


I always check hnsearch before submitting an article. It only takes a few seconds to do, and I don't want to be that guy who submits a duplicate article.


You won't be; HN is smart enough to treat your resubmission as an upvote. Don't waste those valuable seconds ;)


Judging by the number of duplicate submissions that I flag on a daily basis, the duplicate-finding algorithm doesn't work very well, even on variants of the same URL (e.g., parameters after the "?", like "print=yes", aren't always ignored). And there are lots of things the algorithm won't find at all that are obvious to a person:

- Frequently, the identical story is cross-posted to multiple sites.

- Someone posts the mobile version of an article (m.foo.com) in addition to the canonical URL (www.foo.com).

- Several very similar articles on the same topic: every breaking news story on the Zimmerman verdict or the Bolivian president's plane had pretty much the same facts in it, so if you've seen one, you've seen them all.
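Only the second case and the "print=yes" example are mechanical enough to automate. A rough Python sketch, assuming a hypothetical ignore list containing just `print` and assuming an `m.` host always mirrors `www.`; cross-posted copies and near-identical stories from different outlets would still need a person.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical: query parameters assumed not to change which article is served.
IGNORED_PARAMS = {"print"}


def normalize_for_dedup(url):
    """Hypothetical: map a few obvious URL variants of one article onto one key."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Assumption: the mobile host (m.foo.com) serves the same page as www.foo.com.
    if host.startswith("m."):
        host = "www." + host[2:]
    # Drop parameters on the ignore list; keep the rest in their original order.
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in IGNORED_PARAMS]
    return urlunsplit((parts.scheme.lower(), host, parts.path or "/", urlencode(kept), ""))


print(normalize_for_dedup("http://m.foo.com/article?print=yes"))
print(normalize_for_dedup("http://www.foo.com/article"))
```

Both calls produce the same key, so the two submissions would be caught as duplicates.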

So I'll just keep on spending those valuable seconds for the good of the HN community.


True. I've seen the same story, with almost no new facts but from different news sources, on the front page multiple times (e.g., when Steve Jobs passed away). My point was just about posting a single URL; the HN algorithm is typically decent enough there. At the end of the day, if it's already been posted, it shouldn't end up getting upvoted much anyway. But in any case, if it makes you feel good, you should keep doing that :)


Obligatory xkcd http://xkcd.com/1205/


I thought HN automatically de-duped articles.


Almost. HN only does exact matching, so if a duplicate's URL varies even slightly, it will not be detected. Variations include query parameters at the end of the URL, such as &utm_medium=...

An automatically detected duplicate counts as an upvote for the original submission.
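A toy Python model of that behaviour, assuming (as this comment says) that matching is on the exact URL string. It is not HN's actual code, and `SubmissionIndex` is a hypothetical name.

```python
class SubmissionIndex:
    """Hypothetical toy model: exact-match dedup where a resubmission becomes an upvote."""

    def __init__(self):
        self.points = {}  # exact submitted URL string -> points

    def submit(self, url):
        if url in self.points:
            # Exact duplicate: count it as an upvote on the original submission.
            self.points[url] += 1
            return "upvoted original"
        self.points[url] = 1
        return "new submission"


index = SubmissionIndex()
print(index.submit("http://example.com/story/"))       # new submission
print(index.submit("http://example.com/story/"))       # upvoted original
print(index.submit("http://example.com/story/?s=hn"))  # new submission: the loophole
```

Because the key is the raw string, any extra query parameter produces a new key, which is exactly the ?s=hn loophole discussed above.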


It should be noted that moderators are well aware of this loophole, and upon repeated abuse will ban accounts. If you're just submitting what you see, don't worry. If you're submitting your articles over and over again, though, well, don't do that.


Anyone else habitually clean off &utm_* query strings before sharing a link anywhere? Feels so dirty leaving them on.


Do you mean removing referral links from legit links? Yep. No reason to give them even more information than they already get from the visit.


I usually clean links of superfluous query parameters before sharing them.
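For the utm_* case specifically, a short Python sketch using only the standard library. `strip_tracking_params` is a hypothetical name, and treating every `utm_`-prefixed key as removable is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def strip_tracking_params(url):
    """Hypothetical: remove utm_* query parameters, keeping the rest in order."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not k.lower().startswith("utm_")]
    # An empty kept list yields an empty query, so no trailing "?" is emitted.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment))


print(strip_tracking_params("http://example.com/a?id=3&utm_source=hn&utm_medium=social"))
```

One caveat of this approach: the surviving parameters are re-encoded, so a bare `flag` becomes `flag=` and `%20` becomes `+`, which most servers treat the same but not all.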


The problem with de-duping is that there's usually no chance of sustaining a discussion on an original submission that is more than perhaps a week old, which could be why PG hasn't implemented fuzzy de-duping yet.
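One way to reconcile this with de-duping is to merge a resubmission into the original only while the original is still recent. A Python sketch, assuming a hypothetical 7-day window (echoing the "perhaps a week" guess) and a `history` dict of last-submission times; none of this is HN's actual logic.

```python
from datetime import datetime, timedelta

# Hypothetical cutoff, echoing the "perhaps a week" guess above.
DUPE_WINDOW = timedelta(days=7)


def is_recent_duplicate(url, submitted_at, history):
    """Hypothetical: history maps a (normalized) URL to when it was last submitted."""
    previous = history.get(url)
    # Only treat it as a dupe if the earlier thread is still young enough to discuss.
    return previous is not None and submitted_at - previous <= DUPE_WINDOW


history = {"http://example.com/story/": datetime(2013, 7, 1)}
print(is_recent_duplicate("http://example.com/story/", datetime(2013, 7, 5), history))
print(is_recent_duplicate("http://example.com/story/", datetime(2013, 7, 18), history))
```

Outside the window the repost is accepted as a new story, which fits the earlier point that an occasional repeat discussion is fine.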


If you're karma-motivated, you have no incentive to look for a duplicate. If you're not karma-motivated, then submitting bare URLs is usually enough for dupe detection (IMO).

I'm admittedly somewhat karma-motivated, but I think I'm pretty clean with respect to my submissions, since I dislike it when people try to game the submission system and I don't want to be one of them.


It would be nice if there was a search box on HN itself. Why should I have to remember some other site to search?


There is. In the footer.


I guess I'm blind for not noticing it.


Me too :) I guess I just expect it to be up at the top, like every other website.



