

Complete guide to mastering eshell, the emacs lisp shell - preek
http://www.masteringemacs.org/articles/2010/12/13/complete-guide-mastering-eshell/

======
DupDetector
Dup: <http://news.ycombinator.com/item?id=2002037>

~~~
preek
Thank you for the information. I have read that you check for duplicates with
a program. I also double-check with the ihackernews.com API before I submit.

Obviously this API is not fully trustworthy. Would you tell me how you do the
checking?

~~~
DupDetector
I pull enough of the referenced page to get the HTML title. If it's roughly the
same, and the domain is the same, then my script flags it up as a probable
duplicate. I'm looking at adding more checks, because some sites use the same
title for every page regardless of the content, but that's my first
approximation.
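
A minimal sketch of that first approximation, not the actual script. The names `normalize_title`, `domain_of` and `probable_duplicate` are hypothetical, and so is the 0.9 similarity threshold standing in for "roughly the same". It assumes the page titles have already been fetched.

```python
import re
from difflib import SequenceMatcher
from urllib.parse import urlparse


def normalize_title(title):
    # Lowercase, replace punctuation with spaces, collapse whitespace.
    cleaned = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(cleaned.split())


def domain_of(url):
    # Host part of the URL, lowercased, without a leading "www.".
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def probable_duplicate(url_a, title_a, url_b, title_b, threshold=0.9):
    # threshold is an assumed cut-off, not a value from the thread.
    if domain_of(url_a) != domain_of(url_b):
        return False
    a, b = normalize_title(title_a), normalize_title(title_b)
    if not a or not b:
        # A missing title gives no evidence either way.
        return False
    return SequenceMatcher(None, a, b).ratio() >= threshold
```

A check like this would still trip on sites that reuse one title for every page, which is the weakness mentioned above.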

~~~
preek
Thanks for the reply. But you left out how you check against HN itself.

You said your check goes like this:

1\. Parse <title> from submission page

2\. Check against description on HN

3\. If 1==2, then flag as duplicate.

I don't think your script flags everyone as a duplicate who uses the same title
as the original content producer ;)

~~~
DupDetector
No. For each item on HN I pull the referenced page, parse the HTML and extract
the text between "<title>" and "</title>". I compare those, as well as
comparing the domain names. I don't parse the title from the HN submission,
but from the page that has been referenced from the submission.
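
Roughly, the title extraction could look like the sketch below. It is an illustration under assumptions, not the real code: `fetch_prefix`, `extract_title` and the 8192-byte limit are hypothetical, and a regular expression stands in for a full HTML parse.

```python
import html
import re
from urllib.request import urlopen

# Text between "<title>" and "</title>", case-insensitive, may span lines.
TITLE_RE = re.compile(r"<title\b[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def fetch_prefix(url, max_bytes=8192, timeout=10):
    # Read only the start of the page; the title normally sits in <head>.
    with urlopen(url, timeout=timeout) as response:
        return response.read(max_bytes).decode("utf-8", errors="replace")


def extract_title(page_text):
    # Return the page title with entities decoded and whitespace collapsed,
    # or None when no complete <title> element is present.
    match = TITLE_RE.search(page_text)
    if match is None:
        return None
    return " ".join(html.unescape(match.group(1)).split())
```

The extracted titles of two referenced pages, together with their domain names, are then what gets compared.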

Is that clearer?

