
Ask HN: How Can I Get into NLP (Natural Language Processing)? - aarohmankad
I've recently become quite intrigued by the concept, and want to learn more about it.

If you have any resources, I'd love to see them. They can be videos, articles, tutorials, courses, etc.

If there is any prerequisite knowledge required, which I assume there will be, I would also love a starting point. As for my background, I have experience in full stack web development, game development, and about a year's worth of academic computer science study.
======
gsingers
My co-authors and I wrote "Taming Text"
([https://www.manning.com/books/taming-
text](https://www.manning.com/books/taming-text)) specifically for programmers
(there is little math, mostly code) interested in getting started in NLP. The
examples are a bit dated at this point (2013 publication date), but still
applicable for someone getting started. Covers getting started, feature
extraction and preprocessing, search, clustering, classification, string
heuristics, Named Entity Recognition and finishes off w/ a simple Question
Answering system. Examples are in Java. It is not an academic treatise.

~~~
binarymax
Just wanted to say: thanks for writing this book, it's really good! I also
second your motion that it is a great place to begin, as it's a single
coherent source for a nice starting point.

------
erniedeferia
I have found these sources useful for learning and prototyping NLP:

[http://garysieling.com/blog/entity-recognition-with-scala-
an...](http://garysieling.com/blog/entity-recognition-with-scala-and-stanford-
nlp-named-entity-recognizer)

[http://tika.apache.org](http://tika.apache.org)

NLTK is always a good starting point:
[http://www.nltk.org](http://www.nltk.org)

I also wrote a 3-part article leveraging OpenNLP with Clojure:

[http://edeferia.blogspot.com/2015/03/from-natural-
language-t...](http://edeferia.blogspot.com/2015/03/from-natural-language-to-
calendar.html)

If you're interested in applying NLP without necessarily having a theoretical
background, wit.ai offers some really impressive features.

Coursera also offers a good course:

[https://www.coursera.org/learn/natural-language-
processing](https://www.coursera.org/learn/natural-language-processing)

~~~
garysieling
Wow, thanks for mentioning my blog! I got into this using "Natural Language
Processing with Python", which is basically an intro textbook for NLP that
uses NLTK.

I particularly like that they include example exercises in each chapter,
because it can be otherwise challenging to see how particular techniques are
useful.

[https://www.amazon.com/Natural-Language-Processing-Python-
An...](https://www.amazon.com/Natural-Language-Processing-Python-
Analyzing/dp/0596516495/ref=sr_1_1?ie=UTF8&qid=1478777763&sr=8-1&keywords=natural+language+processing+with+python)

~~~
avyfain
This one is also available online for free:
[http://www.nltk.org/book/](http://www.nltk.org/book/)

------
theCricketer
There is a great set of lectures by Dan Jurafsky and Chris Manning:
[https://www.youtube.com/watch?v=nfoudtpBV68&list=PL6397E4B26...](https://www.youtube.com/watch?v=nfoudtpBV68&list=PL6397E4B26D00A269)

It would be helpful to have some background in Machine Learning. For a good
introductory course with a mix of mathematical background, see
[https://see.stanford.edu/Course/CS229](https://see.stanford.edu/Course/CS229)

More modern NLP systems are backed by deep neural nets. Here's a course
on NLP using deep learning:
[https://www.youtube.com/playlist?list=PLIiVRB6G_w0i-uOoS6cDh...](https://www.youtube.com/playlist?list=PLIiVRB6G_w0i-uOoS6cDh_5nkUyxy_hxe)

~~~
danieldk
Jurafsky & Martin's book is pretty much the standard work on NLP. They are
currently working on a 3rd edition and the draft chapters are available from
the book's web page:

[https://web.stanford.edu/~jurafsky/slp3/](https://web.stanford.edu/~jurafsky/slp3/)

------
deepaksurti
For initial learning, I would second NLTK with:
[http://www.nltk.org](http://www.nltk.org)

You can also check out
[https://github.com/vseloved/cl-nlp](https://github.com/vseloved/cl-nlp). It is
an NLP toolkit in Common Lisp. Vsevolod, the project owner, is a great guy to
work with. I contributed some minor bug fixes, tests, and documentation more
than a year ago, hence the mention of Vsevolod.

You could also consider contributing to an open source NLP project and
building an application on top of it. Talking to the project owner about the
sample apps they would like to see might help: your app can go into the
project's gallery, and you get to level up your skills. Hope this helps.

~~~
blahi
Any recommendation that does not start with statistics and continue with
statistics for a while isn't serious.

~~~
deepaksurti
A better approach will be to recommend statistics resources to make this a
better or serious recommendation.

Could you be kind enough to do that? Otherwise, your evaluation of the
recommendation is not serious!

------
smcameron
You're probably looking for something a bit more sophisticated than what I'm
about to mention, but if you don't need anything too sophisticated (that is,
if you can significantly limit the domain of the speech you need to be able to
understand), you could do something like what I did for "the computer" on my
Star Trek-like space sim, Space Nerds In Space:
[http://hackaday.com/2016/06/08/talking-star-
trek/](http://hackaday.com/2016/06/08/talking-star-trek/)

I used pocketsphinx (trained with specially limited vocab) for speech to text,
my own homegrown Zork-esque parser for "understanding" the text and
generating responses, and pico2wav for text to speech for the responses.
That's described in a bit more detail here:
[https://scaryreasoner.wordpress.com/2016/05/14/speech-
recogn...](https://scaryreasoner.wordpress.com/2016/05/14/speech-recognition-
and-natural-language-processing-in-space-nerds-in-space/)
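
For readers wondering what a Zork-style parser for a restricted domain can look like, here is a minimal sketch in Python. It is not the actual Space Nerds In Space parser; the vocabulary, the filler-word list, and the `parse_command` function are all hypothetical, and the only idea shown is "drop filler words, then look for a known verb, a known noun, and an optional number".

```python
# Hypothetical vocabulary for a tiny starship-command domain.
# None of these words or names come from the real game.
VERBS = {"set": "set", "raise": "raise", "lower": "lower", "engage": "engage"}
NOUNS = {"shields": "shields", "warp": "warp drive", "throttle": "throttle"}
FILLER = {"the", "please", "computer", "to", "a", "an"}


def parse_command(text):
    """Return (verb, noun, number) for a recognized command, or None."""
    # Normalize: split on whitespace, strip basic punctuation, lowercase.
    words = [w.strip(".,!?").lower() for w in text.split()]
    # Ignore filler words that carry no meaning in this limited domain.
    words = [w for w in words if w and w not in FILLER]
    # Take the first known verb, the first known noun, and the first integer.
    verb = next((VERBS[w] for w in words if w in VERBS), None)
    noun = next((NOUNS[w] for w in words if w in NOUNS), None)
    number = next((int(w) for w in words if w.isdigit()), None)
    if verb is None or noun is None:
        return None  # Not understood; a real system would ask the user to repeat.
    return (verb, noun, number)


print(parse_command("Computer, set throttle to 50."))
```

Because the speech recognizer is limited to a small vocabulary, a lookup-based parser like this can get surprisingly far before any statistical NLP is needed.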

------
dksidana
[https://spacy.io/](https://spacy.io/) is one of the best libraries for NLP if
you are using Python.

~~~
gghootch
Highly recommend spaCy and the [http://explosion.ai](http://explosion.ai)
blog!

------
sandius
NLP is a huge topic, and the choice of materials pretty much depends on what
you'd like to focus on. In my experience nothing beats a good textbook,
especially if you do the exercises.

The classic NLP textbook is

* Jurafsky, Martin: "Speech and Language Processing" ([https://web.stanford.edu/~jurafsky/slp3/](https://web.stanford.edu/~jurafsky/slp3/)) -- already mentioned here: a very solid overview textbook to give you an idea about the field;

Should you be interested in statistical NLP (even if it probably isn't as sexy
as it used to be), the classic there is:

* Manning, Schütze: "Foundations of Statistical Natural Language Processing" ([http://nlp.stanford.edu/fsnlp/](http://nlp.stanford.edu/fsnlp/)).

------
lovelearning
My recommendations, based on online courses and YouTube playlists I've taken:

- Coursera's old NLP course by Michael Collins, Columbia Univ. It is more
about theory and concepts. It's now discontinued on Coursera, but the material
is available at academictorrents. [1]

- NLP with Python and NLTK videos by sentdex [2]. Mostly programming, but
with useful nuggets of concepts introduced here and there.

[1]:
[http://academictorrents.com/details/f99e7184fca947ee8f779016...](http://academictorrents.com/details/f99e7184fca947ee8f77901679e171fcadbf82e7)

[2]:
[https://www.youtube.com/playlist?list=PLQVvvaa0QuDf2JswnfiGk...](https://www.youtube.com/playlist?list=PLQVvvaa0QuDf2JswnfiGkliBInZnIC4HL)

------
mrborgen
I did a one-week ML stunt last year:
[https://medium.com/learning-new-stuff/machine-learning-in-a-...](https://medium.com/learning-new-stuff/machine-learning-in-a-week-a0da25d59850#.y8pe9o9qm)

I'd recommend starting with the Kaggle Bag of Words tutorial.
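
In case the term is new: a bag-of-words model turns each document into a vector of word counts and ignores word order. Below is a minimal plain-Python sketch of that idea. It is not the tutorial's code (which may well use library tooling instead), and the `tokenize` and `bag_of_words` helpers and the example sentences are made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(documents):
    """Return (vocabulary, vectors) with one count vector per document."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    # The vocabulary is every distinct token seen, in a fixed sorted order.
    vocabulary = sorted(set().union(*counts))
    # Each vector holds the count of every vocabulary word in that document.
    vectors = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, vectors


docs = ["The movie was great, great fun", "The movie was dull"]
vocabulary, vectors = bag_of_words(docs)
print(vocabulary)
for vector in vectors:
    print(vector)
```

These count vectors are what you would then feed to a classifier, for example to predict whether a review is positive or negative.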

------
languagehacker
Take a look at Stanford CoreNLP:
[http://stanfordnlp.github.io/CoreNLP/](http://stanfordnlp.github.io/CoreNLP/)

It's relatively fast (after model load time) and quite feature-rich.

~~~
charlieegan3
Also check out [http://corenlp.run](http://corenlp.run) for a hosted version of
the CoreNLP server.

------
andrewtbham
If you're interested in deep learning for nlp... I suggest at least some
familiarity with these papers. It sorta depends on what task you want to use
it for.

[https://github.com/andrewt3000/dl4nlp](https://github.com/andrewt3000/dl4nlp)

------
denzil_correa
Please read through the Handbook of NLP for a nice overview.

[https://karczmarczuk.users.greyc.fr/TEACH/TAL/Doc/Handbook%2...](https://karczmarczuk.users.greyc.fr/TEACH/TAL/Doc/Handbook%20Of%20Natural%20Language%20Processing,%20Second%20Edition%20Chapman%20&%20Hall%20Crc%20Machine%20Learning%20&%20Pattern%20Recognition%202010.pdf)

------
norswap
I have no particular expertise on the topic, but just in case you missed it,
there is this Quora question: [https://www.quora.com/How-do-I-learn-Natural-
Language-Proces...](https://www.quora.com/How-do-I-learn-Natural-Language-
Processing)

It points to NLTK as the framework of choice, and has links to a couple MOOCs
and tutorials.

------
sundarurfriend
My suggestion is, in addition to using the videos and courses for background
knowledge, to take up and work on a (non-homework) project, to truly explore
the area.

For example, Betty [1] is quite an interesting project with both real-life use and
practical NLP considerations, and is looking for new maintainers. (I'm not
affiliated, just interested in NLP myself and have been itching to get into
betty for some time.)

If you like thinking about game design, there's also the option of Interactive
Fiction [2]; the ones involving NLP are called parser-based fiction, I believe.
A recent FLOSS podcast episode with folks from the IF Tech Foundation was
pretty interesting and illuminating regarding this area.

[1] [https://github.com/pickhardt/betty](https://github.com/pickhardt/betty)
[2] [http://iftechfoundation.org/frequently-asked-
questions/](http://iftechfoundation.org/frequently-asked-questions/)

------
du_bing
Hi, some tools seem to work fine with English. Are there any good NLP tools
for Chinese? I'd appreciate any advice; thanks in advance.

~~~
accraze
Yeah, I would like to know about this too. Any sort of NLP tools for Chinese
or Japanese would be helpful.

------
probinso
Start by finding a linguist. You can find one at your local university.

Let the linguist design your first project. It should be something that they
don't know how to solve, but have wanted to know.

Don't worry about whether it is feasible. Go to local data meetups when you have
enough exposure to form your first questions.

------
carljohan
Jurafsky and Martin's "Speech and Language Processing" is a great book
covering a great many topics in NLP.

------
joelhooks
We've just started adding lessons on this topic on egghead.io [0]

[0] [https://egghead.io/lessons/node-js-break-up-language-
strings...](https://egghead.io/lessons/node-js-break-up-language-strings-into-
parts-using-natural)

------
garysieling
Do you want to use NLP in a project, or to dig into the state of the art?

The NLTK approach may be dated, but it is easier to approach as an engineer,
especially if this is a hobby. It will give you a good introduction to
problems in the space.

The math-heavy approaches may give better results long-term, but they are a
much bigger time commitment. They are probably more appropriate if you're
trying to find a job.

You can also do interesting things with a small dataset and the free plans of
APIs like Watson. E.g., I'm working on a search engine for standalone lectures
- [https://www.findlectures.com](https://www.findlectures.com).

------
elorant
I would suggest you start with "Introduction to Information Retrieval".
You can find a free version here:

[http://nlp.stanford.edu/IR-book/](http://nlp.stanford.edu/IR-book/)

------
dukakisxyz
Check out this curated list of resources dedicated to Natural Language
Processing on GitHub: [https://github.com/keonkim/awesome-
nlp](https://github.com/keonkim/awesome-nlp). Also this is a great blog for
understanding the business and high level aspects of the technology:
[https://lekta.ai/blog/](https://lekta.ai/blog/)

------
noahshpak
I got into NLP through Chris Callison-Burch's class at the University of
Pennsylvania ([http://mt-class.org/penn/](http://mt-class.org/penn/)). Great
meta resource for intro readings, background, and advanced methods.

This is the textbook for the course:
[http://www.statmt.org/book/](http://www.statmt.org/book/)

------
totalperspectiv
Has anyone read "Language Processing with Perl and Prolog", and do you have
thoughts on it? I'm looking for something that goes deep on theory but has
good code examples, and is preferably a book.

[https://www.amazon.com/gp/aw/d/364241463X/ref=dp_ob_neva_mob...](https://www.amazon.com/gp/aw/d/364241463X/ref=dp_ob_neva_mobile)

------
stass
Prolog and Natural-Language Analysis[1] is great from both theoretical and
practical standpoints.

[1] [http://www.mtome.com/Publications/PNLA/prolog-
digital.pdf](http://www.mtome.com/Publications/PNLA/prolog-digital.pdf)

------
JSeymourATL
Build up personal & professional contacts. Check out this group -- ACM Special
Interest Group on Artificial Intelligence >
[https://sigai.acm.org/index.html](https://sigai.acm.org/index.html)

------
shanwang
I'm going through the Stanford CS224d videos. I've only done 3 so far, and
they are very theory-focused, with lots of math equations. Does anyone know
other good materials on NLP using neural networks?

------
felix_thursday
Here's a pretty comprehensive overview of NLP videos, tutorials, courses,
books, etc. [http://blog.algorithmia.com/introduction-natural-language-
pr...](http://blog.algorithmia.com/introduction-natural-language-processing-
nlp/)

------
probinso
Start a project with someone. Write your own data scraper, and implement a
model.

------
kylebgorman
I would _not_ recommend NLTK (or its book) or Jurafsky & Martin, or Manning &
Schuetze. All are insanely dated. Watch some Coursera lectures, check out a
newer, non-academic, application-oriented text, or just build something.

------
lifeisstillgood
To the mods: vagabondjack's comment seems sensible, informative and well
thought out but seems to have been de-duped in error.

Any chance of raising it out of grey-text territory?

------
hiou
NLTK[0][1] (Natural Language Toolkit) was fantastic as an initial resource for
me. Because it's a self contained book and library, I found it to have a very
smooth learning curve. There is some introductory programming stuff that you
can very easily just skip in the beginning so don't let that turn you off
initially.

[0] [http://nltk.org](http://nltk.org) [1]
[http://nltk.org/book](http://nltk.org/book)

------
joesmo
Check out Stanford's NLP libraries. We've been using those in production for
years now. The documentation around it is not great, but the tools work well.

------
edblarney
Watch the videos made by Jurafsky (Stanford) as a starting point.

They are quick. This will give you an overview of classical NLP.

From there, you can dig more where you want.

