

Information Retrieval papers you need to read - Anon84
http://www.scienceforseo.com/information-retrieval/10-papers-you-need-to-read/

======
mahmud
Anon84, you have been churning out some good links lately. What is it? Hack
mode? Beautiful, isn't it? :-)

[Sorry, had to make a private remark publicly, to someone I don't know, because
HN doesn't have messaging. Just had to acknowledge that Anon84, and others
like him, have been keeping the place more "hackish" and less
TechCrunch-fluff.]

~~~
Anon84
Thanks for the support.

I've been spending a lot more time lately reading up on this type of stuff for
a _huge_ scraping/data mining project. My background is mostly in Physics so
depending on what I happen to be working on at any given moment there's always
some catching up to do.

Anything particularly interesting gets posted here for others to enjoy and
comment. :)

~~~
mahmud
I'm doing the same right now. Massive scraping plus "gisting" or document
summarization. You're pretty much on the right track; half of those papers are
industry standards (my browser marked them as "visited" automatically :-)

~~~
GeneralMaximus
So which of those would be a good starting point for someone who has no idea
about IR?

~~~
mahmud
Start with LSA (latent semantic analysis, also known as latent semantic
indexing) if you want information gisting/summarization.

I'm not worried about IR because I use a good search-engine library:
Montezuma, and it's in Common Lisp. Its Java clone is called Lucene :-)

~~~
GeneralMaximus
A Google search gave me this: <http://www.cs.utk.edu/~lsi/>

Is that what you're talking about?

------
wheels
A couple other seminal favorites:

 _Authoritative Sources in a Hyperlinked Environment_

<http://www.cs.cornell.edu/home/kleinber/auth.pdf>

 _Probabilistic Latent Semantic Analysis_

<http://www.cs.brown.edu/~th/papers/Hofmann-UAI99.pdf>

Kleinberg, author of the first paper, is quite possibly my favorite computer
scientist. He's been on the leading edge of a lot of largely disjoint areas of
research on the information structure of the web, and he almost always
contributes something really smart. He tends to hit a new field, drop in some
big ideas, and move on, so a lot of the more concretely applicable stuff comes
in refinements of his work. Still, sometimes I feel like every time I cross
into a new problem domain he's already written three papers on it. Beyond
that, he writes in a very readable style, which is frustratingly rare for
top-notch computer scientists.

~~~
gfodor
The "rebel king" is also a phenomenal lecturer and a favorite among all
Cornell CS students.

------
ananthrk
All good links. Thanks for sharing.

Another list of books for IR:
<http://researchonsearch.blogspot.com/2005/12/information-retrieval-textbooks.html>

------
alexitosrv
Also, you guys would surely like this book too:
<http://www-csli.stanford.edu/~hinrich/information-retrieval-book.html>
<http://nlp.stanford.edu/IR-book/pdf/irbookprint.pdf>

It is a very readable and comprehensive introduction to the field. Besides, it
is recent and covers several current hot topics in depth.

Here is a review of the book:
<http://glinden.blogspot.com/2009/02/book-review-introduction-to-information.html>

