

Natural Language Processing with Python - danso
http://www.nltk.org/book

======
gbaygon
Here is a good blog about NLTK: <http://streamhacker.com/>

The blogger is also the author of the book "Python Text Processing with NLTK
2.0 Cookbook"

------
sjaakkkkk
Great book! Don't wanna spam, but I made a project, www.whatrapperareyou.com,
by programming along the lines of Chapter 6 on Naive Bayes classifiers.

Chapter 6: <http://nltk.googlecode.com/svn/trunk/doc/book/ch06.html>
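
For anyone curious, here's a minimal sketch in that spirit (the toy feature
function and training data are my own, not the site's actual code): train
NLTK's NaiveBayesClassifier on bag-of-words features and classify a new
snippet.

    # Toy Naive Bayes sketch in the spirit of Chapter 6; the feature
    # function and the training data are made up for illustration.
    import nltk

    def features(text):
        # one boolean "contains(word)" feature per word
        return dict(("contains(%s)" % w, True) for w in text.lower().split())

    train = [
        (features("spit hot fire on the mic"), "rapper"),
        (features("drop the beat and flow"), "rapper"),
        (features("strum gentle acoustic chords"), "folk"),
        (features("sing softly by the campfire"), "folk"),
    ]

    classifier = nltk.NaiveBayesClassifier.train(train)
    print(classifier.classify(features("flow on the mic")))  # -> 'rapper'
    classifier.show_most_informative_features()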

------
danso
FYI, there's an offshoot page listing projects (ongoing and suggested) that
can be undertaken with the Natural Language Toolkit:
<http://ourproject.org/moin/projects/nltk/ProjectIdeas>

------
mark_l_watson
I usually use my own NLP code that I have written over 12+ years in Lisp,
Java, and Ruby. That said, I have used NLTK on a few projects (some personal,
some for a data mining customer) and the "everything included" (including
useful data sources) aspect of NLTK is a real time saver. I recommend it,
especially if you mostly work in Python.

------
gilesc
NLTK is great for _learning_ NLP, but Python is much too slow for scalable
deep NLP (by which I mean tagging and parsing, as opposed to TF-IDF, etc.).
Also, parallelization can become a problem because of the GIL. It's a real
shame they chose Python, actually, because otherwise it's a superbly
structured, documented, and maintained project.

~~~
Radim
Hmm, I think Python was an excellent choice; what other platform would you
suggest? IMO being "superbly structured, documented and maintained" is not a
magical property acquired by luck, but rather connected to the platform of
choice.

Btw, for performance, whenever pure Python is indeed "much too slow" (did you
profile it?), there's the option of C extension modules. The NumPy and SciPy
libraries are good examples: they're used in hardcore numerical computing,
aka the epitome of I-NEED-IT-TO-RUN-FAST!, but they're still Python.
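
To illustrate (a sketch with made-up numbers, nothing NLTK-specific): a
vectorized TF-IDF computation where the inner loops run in compiled NumPy
code rather than in the Python interpreter.

    # The Python code only orchestrates; the arithmetic runs in C.
    import numpy as np

    counts = np.random.randint(0, 5, size=(1000, 5000))  # docs x terms
    df = (counts > 0).sum(axis=0)                         # document frequency
    idf = np.log(float(counts.shape[0]) / df)             # inverse doc. freq.
    tfidf = counts * idf                                  # no Python-level loop
    print(tfidf.shape)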

And not to nitpick ;) but the GIL only affects multi-threading; other modes of
parallelization are reasonably straightforward, and some are even built-in
( _import multiprocessing_ ), as in the sketch below.
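
A minimal sketch of that process-based route (assumes the standard NLTK
tokenizer and tagger models have been installed via nltk.download()):

    # Tag sentences in parallel across processes; the GIL doesn't
    # matter here because each worker is a separate interpreter.
    from multiprocessing import Pool
    import nltk

    def tag(sentence):
        return nltk.pos_tag(nltk.word_tokenize(sentence))

    if __name__ == '__main__':
        sentences = ["The quick brown fox jumps over the lazy dog."] * 100
        pool = Pool(4)                     # four worker processes
        tagged = pool.map(tag, sentences)  # one sentence per task
        pool.close()
        pool.join()
        print(tagged[0])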

~~~
cf
Yes, that is how you make fast libraries in Python. But NLTK isn't written
using C extension modules; all of its NLP is done in pure Python. You could
rewrite what needs to be fast with C extensions, but then what's the point of
using NLTK in the first place?

NLTK was never intended to be a way to do production-grade natural language
processing. Its primary objective has been to teach users natural language
processing with clear, well-commented code and documentation. If that isn't
your situation, please use something else.

~~~
andreasvc
What's the point? That half of your code base has already been written for
you. Rewriting the performance-critical parts is a lot of work, and not
having to rewrite a corpus reader, tree transformations, or an evaluation
procedure is an advantage, aside from NLTK being an excellent prototyping
platform. With Cython you can seamlessly combine Python code such as NLTK's
with your own optimized code. This was indeed never the intention of NLTK,
but I have found the general idea of combining arbitrary Python code with
optimized Cython code to work very well; see the sketch below. The end result
is a much less bloated code base in comparison to something like Java or C++.
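
To make the "already written for you" point concrete, a small sketch using
NLTK's bundled treebank sample and its Tree transformations (assumes the
corpus has been installed via nltk.download('treebank'); only a hot inner
loop of your own would then move to Cython):

    # Corpus reader and tree transformation come for free; only the
    # performance-critical parts would need to be rewritten in Cython.
    from nltk.corpus import treebank

    tree = treebank.parsed_sents()[0]  # first parsed sentence as an nltk.Tree
    tree.chomsky_normal_form()         # in-place binarization
    print(tree.pos()[:5])              # first few (word, tag) pairs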

------
RBerenguel
It's an awesome book and project. I found out about it in Mining the Social
Web (another fantastic book).

------
Rotor
I'm glad to see this NLP book available online for free. Some great knowledge
in there.

------
Legend
Does anyone know if there is a distributed framework to run NLTK?

------
NnamdiJr
Good book for an intro to NLP. NLTK is a cool library, but when is it going
to become Python 3 compatible?

