
Ask HN: Best Text Mining Resources - big_data
I plan to take a deeper dive into text mining this year and am looking for suggestions on which resources are best. A friend suggested Text Mining by Weiss et al.: http://www.springer.com/computer/database+management+%26+information+retrieval/book/978-0-387-95433-2

What would you suggest?
======
helwr
Managing Gigabytes (Witten)

Information Retrieval (Manning)

Text Compression (Bell)

Natural Language Processing (Manning)

Natural Language Understanding (Allen)

Speech and Language Processing (Jurafsky)

The Text Mining Handbook (Sanger)

Statistical Machine Translation (Koehn)

Data-Intensive Text Processing with MapReduce (Lin)

Algorithms on Strings (Gusfield)

Jewels of Stringology (Crochemore)

Regular Expressions (Friedl); see also
<http://swtch.com/~rsc/regexp/regexp1.html> and automata theory (Hopcroft)

Practical Text Mining with Perl (Bilisoly)

Natural Language Processing with Python (Bird)

Computational Linguistics (Hausser)

Syntactic Structures (Chomsky)

Also check out these links:
<http://measuringmeasures.blogspot.com/2010/01/learning-about-statistical-learning.html>

<http://measuringmeasures.com/blog/2010/3/12/learning-about-machine-learning-2nd-ed.html>

<http://www.cs.technion.ac.il/~gabr/resources/resources.html>

------
dejv
If you want to learn more about machine learning over text, I'd recommend
these lectures: <http://videolectures.net/mlas06_pittsburgh/>

The first two lectures are a great introduction to the topic; the third is
also related, but not essential.

If you want to dive deeper into more advanced material, I'd recommend looking
at conditional random fields, which are more or less the state of the art in
this field right now.

Great tutorial: <http://www.cs.umass.edu/~mccallum/papers/crf-tutorial.pdf>
Wiki entry: <http://en.wikipedia.org/wiki/Conditional_random_field>

------
mindcrime
Tapping Into Unstructured Data: <http://www.amazon.com/Tapping-into-Unstructured-Data-Intelligence/dp/0132360292>

Mining the Talk: <http://www.amazon.com/Mining-Talk-Unlocking-Unstructured-Information/dp/0132339536/ref=sr_1_1?ie=UTF8&s=books&qid=1276405985&sr=1-1>

Text Mining Application Programming: <http://www.amazon.com/Text-Mining-Application-Programming/dp/1584504609/ref=sr_1_1?ie=UTF8&s=books&qid=1276406016&sr=1-1>

Introduction to Information Retrieval (freely available online):
<http://nlp.stanford.edu/IR-book/information-retrieval-book.html>

------
kunjaan
You could also ask in the Machine Learning subreddit:
<http://reddit.com/r/machinelearning>

------
vark
- Modern Information Retrieval by Baeza-Yates

- Data Mining book by Jiawei Han et al.

- Managing Gigabytes by Witten et al.

- Hypertext mining book by Chakrabarti

------
big_data
Great stuff here, sure to get me going in the right direction! Thank you all!

