Ask HN: How Can I Get into NLP (Natural Language Processing)?
297 points by aarohmankad on Nov 10, 2016 | 43 comments
I've recently become quite intrigued by the concept, and want to learn more about it.

If you have any resources, I'd love to see them. They can be videos, articles, tutorials, courses, etc.

If there is any prerequisite knowledge required, which I assume there will be, I would also love a starting point. As for my background, I have experience in full stack web development, game development, and about a year's worth of academic computer science study.

My co-authors and I wrote "Taming Text" (https://www.manning.com/books/taming-text) specifically for programmers (there is little math, mostly code) interested in getting started in NLP. The examples are a bit dated at this point (2013 publication date), but still applicable for someone getting started. Covers getting started, feature extraction and preprocessing, search, clustering, classification, string heuristics, Named Entity Recognition and finishes off w/ a simple Question Answering system. Examples are in Java. It is not an academic treatise.

Just wanted to say: thanks for writing this book, it's really good! I also second your motion that it is a great place to begin, as it's a single coherent source for a nice starting point.

I have found these sources useful for learning and prototyping NLP:



NLTK is always a good starting point: http://www.nltk.org

I also wrote a 3-part article leveraging OpenNLP with Clojure:


If you're interested in applying NLP without necessarily having a theoretical background, wit.ai offers some really impressive features.

Coursera also offers a good course:


Wow, thanks for mentioning my blog! I got into this using "Natural Language Processing with Python", which is basically an intro textbook for NLP that uses NLTK.

I particularly like that they include example exercises in each chapter, because it can be otherwise challenging to see how particular techniques are useful.


This one is also available online for free: http://www.nltk.org/book/

There is a great set of lectures by Dan Jurafsky and Chris Manning: https://www.youtube.com/watch?v=nfoudtpBV68&list=PL6397E4B26...

It would be helpful to have some background in Machine Learning. For a good introductory course with a mix of mathematical background, see https://see.stanford.edu/Course/CS229

NLP in the more modern systems is backed by deep neural nets. Here's a course on NLP using deep learning: https://www.youtube.com/playlist?list=PLIiVRB6G_w0i-uOoS6cDh...

Jurafsky & Martin's book is pretty much the standard work on NLP. They are currently working on a 3rd edition and the draft chapters are available from the book's web page:


Any suggestions for a follow-on to 224D? Anything with larger systems would be interesting.

For initial learning, I would second NLTK with: http://www.nltk.org

You can also check out https://github.com/vseloved/cl-nlp. It is an NLP toolkit in Common Lisp. Vsevolod, the project owner, is a great guy to work with. I contributed some minor bug fixes, tests, and documentation more than a year ago, hence the mention of Vsevolod.

You could also contribute to an open source NLP project and build an application on top of it. Ask the project owner which sample apps they would like to see: those apps can go into the project gallery, and you get to level up your skills. Hope this helps.

Any recommendation that does not start with statistics and continue with statistics for a while isn't serious.

A better approach would be to recommend statistics resources, which would make this a serious recommendation.

Could you be kind enough to do that? Otherwise, your evaluation of the recommendation is not serious!

You're probably looking for something a bit more sophisticated than what I'm about to mention, but if you don't need anything too sophisticated (that is, if you can significantly limit the domain of the speech you need to be able to understand), you could do something like what I did for "the computer" on my Star Trek-like space sim Space Nerds In Space: http://hackaday.com/2016/06/08/talking-star-trek/

I used pocketsphinx (trained with a specially limited vocab) for speech to text, my own homegrown Zork-esque parser for "understanding" the text and generating responses, and pico2wave for text to speech for the responses. That's described in a bit more detail here: https://scaryreasoner.wordpress.com/2016/05/14/speech-recogn...
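To give a rough idea of what a Zork-style parser for a limited domain can look like, here is a minimal sketch in Python. It is not the parser from Space Nerds In Space: the vocabulary, the function names `parse_command` and `respond`, and the reply strings are all hypothetical, and the input is assumed to be text already produced by the speech-to-text step.

```python
import re

# Hypothetical vocabulary for illustration only.
# Verb synonyms map to one canonical verb.
VERBS = {"raise": "raise", "lower": "lower", "drop": "lower",
         "set": "set", "engage": "engage"}
NOUNS = {"shields", "warp", "course", "throttle"}
FILLER = {"the", "a", "an", "to", "please", "computer"}

def parse_command(text):
    """Return (verb, noun, args) for a recognized command, else None."""
    # Lowercase, drop punctuation, and discard filler words.
    words = [w for w in re.findall(r"[a-z0-9]+", text.lower())
             if w not in FILLER]
    # Expect a known verb followed by a known noun; the rest are arguments.
    if len(words) < 2 or words[0] not in VERBS or words[1] not in NOUNS:
        return None
    return VERBS[words[0]], words[1], words[2:]

def respond(text):
    """Build a text reply that could be handed to a text-to-speech tool."""
    parsed = parse_command(text)
    if parsed is None:
        return "Sorry, I did not understand that."
    verb, noun, args = parsed
    return f"Acknowledged: {verb} {noun} {' '.join(args)}".strip()

print(respond("Computer, set course to 270"))
print(respond("open the pod bay doors"))
```

Keeping the grammar to "verb noun arguments" is what makes this workable: anything outside the small vocabulary is rejected rather than guessed at.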

My recommendations, based on online courses and YouTube playlists I've taken:

- Coursera's old NLP course by Michael Collins, Columbia University. It leans more toward theory and concepts. It's discontinued on Coursera now, but the material is available at academictorrents. [1]

- NLP with Python and NLTK videos by sentdex [2]. Mostly programming, but with useful nuggets of concepts introduced here and there.

[1]: http://academictorrents.com/details/f99e7184fca947ee8f779016...

[2]: https://www.youtube.com/playlist?list=PLQVvvaa0QuDf2JswnfiGk...

https://spacy.io/ is one of the best libraries for NLP if you are using Python

Highly recommend spaCy and the http://explosion.ai blog!

I did a one-week ML stunt last year: https://medium.com/learning-new-stuff/machine-learning-in-a-...

I'd recommend starting with the Kaggle Bag of Words tutorial.
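For anyone who hasn't met the term, a bag-of-words model represents each document as a vector of word counts and ignores word order. Below is a minimal sketch using only the Python standard library. It is not the tutorial's code; the helper names `tokenize` and `bag_of_words` and the two example sentences are made up for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(docs):
    """Return (vocabulary, vectors): one count vector per document."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    # The vocabulary is every distinct word seen in any document, sorted
    # so that each vector position always refers to the same word.
    vocabulary = sorted(set().union(*counts))
    vectors = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, vectors

docs = ["The movie was great, great fun", "The movie was dull"]
vocabulary, vectors = bag_of_words(docs)
print(vocabulary)
for vector in vectors:
    print(vector)
```

Vectors like these can then be fed to an ordinary classifier, which is roughly how a bag-of-words approach to a task such as sentiment classification proceeds.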

NLP is a huge topic, and the choice of materials pretty much depends on what you'd like to focus on. In my experience nothing beats a good textbook, especially if you do the exercises.

The classic NLP textbook is

* Jurafsky, Martin: "Speech and Language Processing" (https://web.stanford.edu/~jurafsky/slp3/) -- already mentioned here: a very solid overview textbook to give you an idea about the field;

Should you be interested in statistical NLP (even if it probably isn't as sexy as it used to be), the classic there is:

* Manning, Schütze: "Foundations of Statistical Natural Language Processing" (http://nlp.stanford.edu/fsnlp/).

Take a look at Stanford CoreNLP: http://stanfordnlp.github.io/CoreNLP/

It's relatively fast (after model load time) and quite feature-rich.

Also check out http://corenlp.run for a hosted version of the CoreNLP server.

If you're interested in deep learning for NLP... I suggest at least some familiarity with these papers. It sorta depends on what task you want to use it for.


Please read through the Handbook of NLP for a nice overview.


My suggestion is, in addition to using the videos and courses for background knowledge, to take up and work on a (non-homework) project, to truly explore the area.

For example, Betty [1] is quite an interesting project with both real-life use and practical NLP considerations, and is looking for new maintainers. (I'm not affiliated, just interested in NLP myself and have been itching to get into betty for some time.)

If you like thinking about game design, there's also the option of Interactive Fiction [2]; the NLP-involving ones are called parser-based fiction, I believe. A recent FLOSS podcast episode with folks from the IF Tech Foundation was pretty interesting and illuminating regarding this area.

[1] https://github.com/pickhardt/betty

[2] http://iftechfoundation.org/frequently-asked-questions/

Hi, some tools seem to work fine with English, so is there any good NLP tool for Chinese? Hoping for some advice, thanks in advance.

Yeah, I would like to know about this too. Any sort of NLP tools for Chinese or Japanese would be helpful.

Start by finding a linguist. You can find one at your local university.

Let the linguist design your first project. It should be something that they don't know how to solve, but have wanted to know.

Don't worry about whether it is feasible. Go to local data meetups when you have enough exposure to form your first questions.

Jurafsky and Martin's "Speech and Language Processing" is a great book covering a great many topics in NLP.

We've just started adding lessons on this topic on egghead.io [0]

[0] https://egghead.io/lessons/node-js-break-up-language-strings...

Do you want to use NLP in a project, or to dig into the state of the art?

The NLTK approach may be dated, but it is easier to approach as an engineer, especially if this is a hobby. It will give you a good introduction to problems in the space.

The math-heavy approaches may give better results long-term but are a much longer time commitment; they are probably more appropriate if you're trying to find a job.

You can also do interesting things with a small dataset and the free plans of APIs like Watson. E.g., I'm working on a search engine for standalone lectures - https://www.findlectures.com.

Check out this curated list of resources dedicated to Natural Language Processing on GitHub: https://github.com/keonkim/awesome-nlp. Also this is a great blog for understanding the business and high level aspects of the technology: https://lekta.ai/blog/

I got into NLP through Chris Callison-Burch's class at the University of Pennsylvania (http://mt-class.org/penn/). Great meta resource for intro readings, background, and advanced methods.

This is the textbook for the course: http://www.statmt.org/book/

Has anyone read Language Processing in Perl and Prolog and have thoughts on it? I'm looking for something that goes deep on theory, but has good code examples, and is preferably a book.


I have no particular expertise on the topic, but just in case you missed it, there is this Quora question: https://www.quora.com/How-do-I-learn-Natural-Language-Proces...

It points to NLTK as the framework of choice, and has links to a couple MOOCs and tutorials.

Prolog and Natural-Language Analysis[1] is great from both theoretical and practical standpoints.

[1] http://www.mtome.com/Publications/PNLA/prolog-digital.pdf

Build up personal & professional contacts. Check out this group: ACM Special Interest Group on Artificial Intelligence (https://sigai.acm.org/index.html)

I would suggest you start with “Introduction to Information Retrieval”. You can find a free version here:


I'm going through the Stanford CS224D videos. I've only done 3 videos, and they are very theory focused, with lots of math equations. Anyone know other good materials on NLP using neural networks?

Here's a pretty comprehensive overview of NLP videos, tutorials, courses, books, etc. http://blog.algorithmia.com/introduction-natural-language-pr...

Start a project with someone. Write your own data scraper, and implement a model.

I would not recommend NLTK (or its book) or Jurafsky & Martin, or Manning & Schuetze. All are insanely dated. Watch some Coursera lectures, check out a newer, non-academic, application-oriented text, or just build something.

to the mods: vagabondjack's comment seems sensible, informative and well thought out but seems to have been de-duped in error.

Any chance of raising it out of grey-text territory?

NLTK[0][1] (Natural Language Toolkit) was fantastic as an initial resource for me. Because it's a self-contained book and library, I found it to have a very smooth learning curve. There is some introductory programming stuff that you can very easily just skip in the beginning, so don't let that turn you off initially.

[0] http://nltk.org

[1] http://nltk.org/book

Check out Stanford's NLP libraries. We've been using those in production for years now. The documentation around them is not great, but the tools work well.

Watch the videos made by Jurafsky (Stanford) as a starting point.

They are quick. This will give you an overview of classical NLP.

From there, you can dig more where you want.
