
Ask HN: Anyone use Search HN before submitting articles to prevent duplicates? - deadfall
https://www.hnsearch.com/
======
mindcrime
Nope. There is code in place to detect (some) dupes and turn the re-submit
into an upvote for the original submission. For anything that makes it through
the dupe detector, I'm satisfied to assume that it's probably worth a second
discussion. Note that not all duplicates are a Bad Thing, and there are a few
links that show up on HN about once a year or so anyway, and rightly so.

~~~
dfc
I don't mind the dupes when they are spaced out by months or more. The
instances that bother me are when a user appends $x=y to a URL to get
past the detector on a regular basis. For instance:

    
    
      http://www.restorethefourth.net/?s=hn  15.000 days ago thepumpkin1979  350+ points
      http://www.restorethefourth.net/       15.001 days ago iProject 2 points
    

This means that pumpkin submitted the bare URL, saw that someone else had
already submitted it, and then appended s=hn and resubmitted the URL to get
past the dupe detector for fun and karma.

pg needs to figure out how to embed ColinWright into the duplicate detection
process; nothing gets by him.

------
stfu
Personally, I don't mind duplicates. Quite frequently, stories make it to
the top 20 that (I believe) I had previously submitted (from another news
outlet).

Hacker News is a lot about timing, and I really don't mind if some stories
need a few attempts until they break through. This randomness is part of the
fun of submitting stories to HN.

------
Tichy
No, HN detects duplicates automatically. And if that mechanism fails, either
the article just won't get upvoted, or it's an evergreen that is OK to repeat
now and then.

~~~
stephengillie
I've seen multiple threads where the original submission received 100+ upvotes
and 100+ comments, and the repost also received 100+ upvotes and 100+
comments. Sometimes the repost comments went in an entirely different
direction from the original post comments.

~~~
Tichy
I suppose if the repost gets a lot of upvotes, it was a good thing that it was
reposted?

------
twalling
No, this is what computers are for. There's no reason a user should have to
check if a link is a duplicate.

------
llamataboot
Duck Duck Go with !hn appended

------
greenyoda
I always check hnsearch before submitting an article. It only takes a few
seconds to do, and I don't want to be that guy who submits a duplicate
article.

~~~
kamakazizuru
You won't be - HN is smart enough to consider your resubmission equivalent to
an upvote. Don't waste those valuable seconds ;)

~~~
greenyoda
Judging by the number of duplicate submissions that I flag on a daily basis,
the duplicate-finding algorithm doesn't work very well, even on variants of
the same URL (e.g., parameters after the "?", like "print=yes", aren't always
ignored). And there are lots of things the algorithm won't find at all that
are obvious to a person:

- Frequently, the _identical_ story is cross-posted to multiple sites.

- Someone posts the mobile version of an article (m.foo.com) in addition to
the canonical URL (www.foo.com).

- Several very similar articles on the same topic: every breaking news story
on the Zimmerman verdict or the Bolivian president's plane had pretty much the
same facts in it, so if you've seen one, you've seen them all.

So I'll just keep on spending those valuable seconds for the good of the HN
community.
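(The mobile-host case in particular seems mechanical enough to automate. A
minimal Python sketch of that kind of host normalization - HN's actual
matcher isn't public, and `foo.com` is just a stand-in domain:)

```python
from urllib.parse import urlparse, urlunparse

def canonical_host(url):
    """Fold common mobile/www host variants onto one form,
    so m.foo.com and www.foo.com compare equal."""
    parts = urlparse(url)
    host = parts.netloc.lower()
    # Check longer prefixes first so "mobile." isn't mangled by "m."
    for prefix in ("mobile.", "www.", "m."):
        if host.startswith(prefix):
            host = host[len(prefix):]
            break
    return urlunparse(parts._replace(netloc=host))

print(canonical_host("http://m.foo.com/article") ==
      canonical_host("http://www.foo.com/article"))  # prints True
```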

~~~
kamakazizuru
True - I've seen the same story, with almost no new facts but from
different news sources, on the home page multiple times - e.g., when Steve
Jobs passed away. My point was just about posting a single URL - the HN
algorithm is typically decent enough. At the end of the day, if it's already
been posted, it shouldn't end up getting upvoted much anyway - but in any
case, if it makes you feel good, you should keep doing that :)

------
ressaid1
I thought HN automatically de-duped articles.

~~~
donretag
Almost. HN only does exact matching, so if the duplicate article's URL varies
even slightly, it won't be detected. Variations include query params at the
end of a URL, such as &utm_medium=...

An automatically detected duplicate counts as an upvote for the original
article.
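(The &utm_medium case could be handled without fuzzy matching at all - just
drop known tracking parameters before comparing. A rough Python sketch; this
is hypothetical, not how HN actually does it:)

```python
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

def strip_tracking_params(url):
    """Drop utm_* tracking parameters so two submissions of the
    same article compare equal under exact matching."""
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query)
            if not k.startswith("utm_")]
    return urlunparse(parts._replace(query=urlencode(kept)))

# Two URLs that a pure exact matcher would treat as different:
print(strip_tracking_params("http://example.com/story?utm_medium=twitter")
      == "http://example.com/story")  # prints True
```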

~~~
prawn
Anyone else habitually clean off &utm_* query strings before sharing a link
anywhere? Feels so dirty leaving them on.

~~~
hobs
Do you mean removing referral links from legit links? Yep. No reason to give
them even more information than they already get from the visit.

------
kunai
The problem with de-duping is that there's usually no chance of sustaining
discussion on a duplicate whose original was submitted more than perhaps a
week ago, which could be the reason why PG hasn't implemented fuzzy de-duping
yet.

------
hkmurakami
If you're karma-motivated, you have no incentive to look for a duplicate. If
you're not karma-motivated, then submitting bare URLs is usually enough for
dupe detection (IMO).

I'm admittedly somewhat karma-motivated, but I think I'm pretty clean with
respect to my submissions, since I dislike when people try to game the
submission system and I don't want to be one of them.

------
cpncrunch
It would be nice if there were a search box on HN itself. Why should I have
to remember some other site to search?

~~~
ElbertF
There is. In the footer.

~~~
jjsz
I guess I'm blind for not noticing it.

~~~
cpncrunch
Me too :) I guess I just expect it to be up at the top, like every other
website.

