

Natural language processing in Clojure, Go, Cython - gtani
http://samibadawi.blogspot.com/2010/10/natural-language-processing-in-clojure.html

======
danieldk
As a computational linguist, I could not really extract any useful
information. Sure, it lists the availability of NLP libraries for some
languages, something a cursory Google search would turn up.

What I would be interested in:

\- How mature are the libraries?

\- Are the libraries general enough? Often, a particular component is
developed with one task in mind, and requires a lot of hoop jumping to use for
other tasks (e.g. in classification).

\- How does performance compare for actual tasks? Python may be a slow
language, but selective use of C or Fortran (e.g. NumPy) makes it fast enough
for most work.

~~~
thibaut_barrere
As a total beginner willing to learn, I found it helped me pick up a few
keywords and library names to get started.

What would you recommend for complete beginners willing to learn? My first
focus is on extracting meaningful data from scraped HTML, if that
matters.

~~~
danieldk
I highly recommend these two books:

\- Speech and Language Processing, Jurafsky and Martin

\- Foundations of Statistical Natural Language Processing, Chris Manning and
Hinrich Schütze

Together they provide a very good overview of the field, from tokenization to
robust parsing.

I cannot really recommend one library. For the last two years, I have been
working on a parsing/generation system developed in our research group
(Alpino). We use very little external software (only for maximum entropy model
parameter estimation and building finite state automata). We use Prolog,
because it is ideal for writing unification grammars, and C/C++ for components
that Prolog is not well suited to (part-of-speech tagger for lexical
restriction, finite state automata, term compression, language models, etc.).

Among other things, this system is used to extract syntactic structure for
question answering (based on e.g. Wikipedia data and newspaper texts).

~~~
thibaut_barrere
Thanks, added to wish list!

------
dododo
Hal Daumé III, who also wrote "Yet Another Haskell Tutorial", is an NLP
researcher, and uses OCaml, Haskell and C successfully:
<http://www.umiacs.umd.edu/~hal/software.html>

Andrew McCallum, one of the originators of conditional random fields and also
an NLPer, has recently been using Scala: <http://code.google.com/p/factorie/>

