
Ask HN: Can HN prevent duplicates? - trendspotter
There has to be a solution for this kind of problem (duplicate posts like these two): check first https://news.ycombinator.com/item?id=6053531 and now check the other link https://news.ycombinator.com/item?id=6053424
======
pearjuice
THIS THREAD HAS BEEN MARKED AS DUPLICATE

[https://news.ycombinator.com/item?id=1012215](https://news.ycombinator.com/item?id=1012215)

~~~
trendspotter
Thanks. I found a great suggestion for this problem over there:

"Maybe borrow a feature from Stack Overflow [or Get Satisfaction or Desk.com
or Uservoice], which presents similar-sounding stories before posting? Some
client-side JS which pulls the page, heuristically picks some "interesting"
words, and runs them through search.yc?"

Here is a best-of-both-worlds solution to prevent duplicates:

Step 1: (Algorithm) Automatically check the title (and maybe other metadata,
like keywords) of a submission against similar existing content on HN.
Present these results to the user.

Step 2: (Human) Let the user's own judgment decide whether their submission
looks like a duplicate of another user's submission or not.
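The algorithmic half of the two steps above could be sketched as a simple title-similarity check. This is a hypothetical illustration using Python's standard-library difflib, not anything HN actually runs; the threshold value is an assumption:

```python
from difflib import SequenceMatcher

def similar_titles(new_title, existing_titles, threshold=0.6):
    """Step 1 (algorithm): rank existing titles by similarity to new_title.

    Returns candidates above the threshold, best match first. Step 2
    (human) is the caller's job: show these to the user and let them
    decide whether their submission is a duplicate.
    """
    new_title = new_title.lower()
    scored = []
    for title in existing_titles:
        ratio = SequenceMatcher(None, new_title, title.lower()).ratio()
        if ratio >= threshold:
            scored.append((ratio, title))
    return [title for _, title in sorted(scored, reverse=True)]

candidates = similar_titles(
    "Estimote creator talks about building an OS for the physical world",
    [
        "Estimote Creator Talks About Building An OS For The Physical World",
        "Ask HN: Can HN prevent duplicates?",
    ],
)
```

A real deployment would query the site's search index for candidate titles rather than comparing against an in-memory list, but the ranking logic would be the same shape.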

------
tokenadult
The duplicate link detector has a minimal ability to recognize duplicate
links. The Hacker News community relies partly on the honor system to avoid
excessive duplicate links. Some members point out duplicate submissions when
they appear anyway. I'm not sure that this is a high enough priority for
anyone on the HN programming team to devote a lot more effort to refining the
duplicate submission detector.

~~~
trendspotter
It can't be that hard to let a machine find out that these two links are
basically the same. I'm not talking about programming an artificial general
intelligence (AGI) solution here:

(a) [http://techcrunch.com/2013/07/16/estimote-creator-talks-abou...](http://techcrunch.com/2013/07/16/estimote-creator-talks-about-building-an-os-for-the-physical-world)

(b) [http://techcrunch.com/2013/07/16/estimote-creator-talks-abou...](http://techcrunch.com/2013/07/16/estimote-creator-talks-about-building-an-os-for-the-physical-world/)

~~~
chc
Nobody's saying it's impossible. The point is that news.arc is more or less
just a little project that Paul Graham works on in his free time, and the
amount of features that could be implemented vastly outstrips the supply of
Paul Graham's free time, so not every possible feature is going to be
implemented.

~~~
krapp
But why even implement a duplicate link checker without putting more effort
into parsing the URI string for obvious gotchas, like a trailing slash or
extraneous but meaningless query parameters?
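A minimal canonicalizer along those lines, covering the trailing slash and throwaway query parameters, might look like the sketch below. This is an illustration, not HN's actual code, and the choice of `utm_` as the "meaningless parameter" prefix is an assumption:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed: common analytics parameters that don't change the page content.
TRACKING_PREFIXES = ("utm_",)

def canonicalize(url):
    """Normalize a URL so trivially different duplicates compare equal."""
    scheme, netloc, path, query, _fragment = urlsplit(url)
    path = path.rstrip("/") or "/"  # (a) and (b) differ only here
    params = [(k, v) for k, v in parse_qsl(query)
              if not k.startswith(TRACKING_PREFIXES)]
    params.sort()  # parameter order doesn't change the target page
    return urlunsplit((scheme.lower(), netloc.lower(), path,
                       urlencode(params), ""))  # drop the fragment

a = canonicalize("http://techcrunch.com/2013/07/16/estimote-creator-talks-about-building-an-os-for-the-physical-world")
b = canonicalize("http://techcrunch.com/2013/07/16/estimote-creator-talks-about-building-an-os-for-the-physical-world/")
```

With this, the two TechCrunch links upthread canonicalize to the same string, so an exact-match duplicate check on the canonical form would catch them.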

~~~
smartwater
Coding checks such as those are pretty tedious. It's not hard, and that's why
it's not interesting.

~~~
krapp
Sure, but even with pet projects you can't expect everything to be fun. I'm
just suggesting that if pg considers duplicate links a problem worth solving
in his application, then the extra effort might be worth the tedium.

And if he doesn't, then he doesn't.

~~~
smartwater
The fact that this question keeps coming up and it doesn't get changed
indicates that it isn't a problem he cares about.

------
ScottWhigham
Just a word of warning for you: there is a flagging threshold, and if you
flag too many stories in too short a time period, you will lose your
flagging privileges permanently. It's an odd implementation, frankly, so be
aware. It often rears its ugly head whenever there is a popular story here
(most recently: NSA/Snowden) that generates lots of duplicate reports.

------
arh68
And what, do away entirely with reviving old topics? I thought it was rather
poignant to consider the old news article about Geocities' acquisition by
Yahoo when Tumblr was recently acquired. Who knows what links submitted
recently will still be interesting in several years? There is value in
allowing duplicates.

