

How to Build a Search Engine - sajidu
http://nlp.stanford.edu/IR-book/html/htmledition/irbook.html
This is a textbook on information retrieval co-authored by the head of Yahoo Research and the authors of 'Foundations of Statistical NLP' (another great textbook).

Kind of like a more up-to-date edition of 'Managing Gigabytes', and it's just as good, if not better.
======
gtani
[http://www.acmqueue.com/modules.php?name=Content&pa=show...](http://www.acmqueue.com/modules.php?name=Content&pa=showpage&pid=143)

------
ewiethoff
Thanks! Some of it ([http://nlp.stanford.edu/IR-book/html/htmledition/near-duplic...](http://nlp.stanford.edu/IR-book/html/htmledition/near-duplicates-and-shingling-1.html)) looks useful for a project I'm on. And, no, we're not trying to index the Internet or do a Google Desktop search.
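For anyone else eyeing that section: the core idea is to represent each document as its set of word-level k-shingles (runs of k consecutive words) and to compare two documents by the Jaccard coefficient of those sets. Below is a minimal sketch, assuming whitespace tokenization, lowercasing, k = 4, and a hypothetical "near-duplicate" cutoff of 0.9; the function names are illustrative, not from the book. It compares full shingle sets exactly. The chapter goes on to approximate this with sketches for large collections.

```python
def shingles(text, k=4):
    """Return the set of word-level k-shingles (tuples of k consecutive words)."""
    words = text.lower().split()  # assumption: naive whitespace tokenization
    if len(words) < k:
        # Short documents: treat the whole word sequence as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B|; defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(doc1, doc2, k=4, threshold=0.9):
    """Hypothetical helper: flag a pair whose shingle sets overlap at or above threshold."""
    return jaccard(shingles(doc1, k), shingles(doc2, k)) >= threshold


if __name__ == "__main__":
    d1 = "the quick brown fox jumps over the lazy dog"
    d2 = "the quick brown fox jumps over the lazy cat"
    # 5 of the 7 distinct 4-shingles are shared, so the score is 5/7.
    print(jaccard(shingles(d1), shingles(d2)))
    print(near_duplicates(d1, d2))
```

Changing one word at the end still leaves most shingles shared, which is why the score stays high. A larger k makes the comparison stricter, because a single edit then breaks more shingles.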

------
gtani
Jurafsky and Martin, Speech and Language Processing

[http://books.google.com/books?id=fZmj5UNK8AQC&pg=PR24...](http://books.google.com/books?id=fZmj5UNK8AQC&pg=PR24&lpg=PP1&ots=LqRc00EIJJ&vq=44&dq=jurafsky+martin+%22speech+and+language+processing%22)

