

Richard Feynman and the Connection Machine (by W. Danny Hillis) - DaniFong
http://www.longnow.org/views/essays/articles/ArtFeynman.php#top

======
cmos
While this is an amazing article, it has already been posted here. I guess it
might be good enough for a repeat?

~~~
jcl
Several times over? I guess if we must, we must...

<http://news.ycombinator.com/item?id=191212>

<http://news.ycombinator.com/item?id=31834>

<http://news.ycombinator.com/item?id=185>

~~~
DaniFong
Wow. I'm quite embarrassed.

~~~
gruseom
_I'm quite embarrassed._

Nah. Arguably the HN algorithm could do slightly smarter matching on URLs.
It's interesting to look at the URLs in this case:

<http://www.longnow.org/views/essays/articles/ArtFeynman.php>

<http://www.longnow.org/views/essays/articles/ArtFeynman.php?dupe=with_honor>

<http://www.longnow.org/views/essays/articles/ArtFeynman.php#top>

<http://www.kurzweilai.net/articles/art0504.html?printable=1>

(Yours is the third. Note how the second invented a query string to
deliberately skirt the rules!)

Three of these URLs could be considered equivalent - drop the query string in
one and the anchor in the other. The last one, of course, is just totally
different.
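
The normalization described above (dropping the query string and the anchor before comparing) is easy to sketch; here's a minimal illustration in Python using the standard `urllib.parse` module (this is hypothetical, not HN's actual matching code):

```python
from urllib.parse import urlsplit, urlunsplit

def canonicalize(url: str) -> str:
    """Strip the query string and fragment so trivially different
    submissions of the same page compare equal."""
    scheme, netloc, path, _query, _fragment = urlsplit(url)
    return urlunsplit((scheme, netloc.lower(), path, "", ""))

urls = [
    "http://www.longnow.org/views/essays/articles/ArtFeynman.php",
    "http://www.longnow.org/views/essays/articles/ArtFeynman.php?dupe=with_honor",
    "http://www.longnow.org/views/essays/articles/ArtFeynman.php#top",
    "http://www.kurzweilai.net/articles/art0504.html?printable=1",
]
# The first three collapse to one canonical URL; the kurzweilai one stays distinct.
print({canonicalize(u) for u in urls})
```

Of course, as noted below, this over-merges pages whose query strings select genuinely different content.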

On the other hand, if you do this, you'd probably get some false matches,
because some web pages use query strings to represent different content. Plus
you couldn't do things like post neat Google searches directly.

So probably we should resign ourselves to an endless series of duplications,
whereupon the geeks pounce, digging up previous versions and posting them all
in ever-lengthening lists. Not so bad, really. :)

Edit: how about preemptive duplicate prediction? There's an obvious gap here:

<http://www.kurzweilai.net/articles/art0504.html>

... and one day, someone's gonna post it. (Actually, it gets redirected to
still another.) Or maybe they already did and we could make an even geekier
list. But how to check that without actually submitting it?

Edit 2: Do I win for minutiae of the day?

~~~
neilc
You could also try to do content-based duplicate detection: rather than
looking at the URL, try to determine whether a new article is substantively
similar to a previously-posted one. Getting that to work with a high degree of
accuracy would take some thought, however. Perhaps there's some work on this
in the information retrieval literature?
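
There is indeed standard work on this: one classic approach from the IR literature is w-shingling, where each document is reduced to a set of overlapping word n-grams and two documents are compared by Jaccard similarity of those sets. A minimal sketch (just the idea, not any site's actual implementation):

```python
def shingles(text: str, w: int = 5) -> set:
    """Return the set of w-word shingles of a document, lowercased."""
    words = text.lower().split()
    return {" ".join(words[i:i + w]) for i in range(max(1, len(words) - w + 1))}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: |A intersect B| / |A union B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Two submissions would be flagged as duplicates when their
# shingle-set similarity exceeds some threshold, e.g. 0.8.
```

At scale you would not compare full shingle sets pairwise; techniques like MinHash sketches make the comparison cheap, which is presumably the kind of thing the IR literature covers.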

~~~
DaniFong
I actually worked on this problem for Scribd. The solution isn't too
heavyweight, especially if you're matching large blocks of unaltered text.

