
Ask HN: How to start with NLP - arrmn
It seems that I'm going to start working on an NLP project, but I don't have much experience with it since I'm a software engineer. I've played around with CoreNLP and spaCy to get a feeling for NLP.

So what is the best way to tackle this topic? I don't want to do research in NLP; for now I just want to get to a level where I'm productive.

What are the sites that I should read regularly, the tools that I need to try, and good intro books?

Thanks in advance
======
gtani
Frankly, if you give more detail on what kind of NLP you mean, we can give
more pointers. For a comprehensive resource, there is the 3rd-edition draft of
the standard Jurafsky/Martin text. Pieces still missing (TBA): LSTMs, vector
embeddings (word2vec etc.), seq2seq/neural translation, etc.

[https://web.stanford.edu/~jurafsky/slp3/](https://web.stanford.edu/~jurafsky/slp3/)

(you can email questions anytime)

~~~
arrmn
Thanks for the link, I'll look through it. I've started watching the lectures
from Jurafsky and Manning that are available on YouTube.

Currently it's mostly about topic/category recognition in articles. The other
part is entities: how to do entity linking and how to judge each entity's
relevance to the article.

Maybe later on I'll have to look for synonyms (WordNet and word embeddings can
be useful for this), but this is something for the future.

~~~
gtani
Things like topic recognition, entity and reference linking/disambiguation,
and synonym/paraphrase detection are well-researched tasks with purpose-made
datasets for competitions like TREC. The most common approach is to choose a
word embedding (word2vec or GloVe are good starting points) and feed that into
a CNN or RNN variant:
[https://arxiv.org/abs/1702.01923](https://arxiv.org/abs/1702.01923)

[https://arxiv.org/abs/1611.09100](https://arxiv.org/abs/1611.09100)

[https://metamind.io/research/learning-when-to-skim-and-when-...](https://metamind.io/research/learning-when-to-skim-and-when-to-read)

[https://arxiv.org/abs/1611.06792](https://arxiv.org/abs/1611.06792)

[https://arxiv.org/abs/1611.03305](https://arxiv.org/abs/1611.03305)

If you search arXiv and the ACL Anthology, you should find good literature
surveys and historical summaries:
[http://aclweb.org/anthology/](http://aclweb.org/anthology/) Karpathy's Arxiv
Sanity Preserver is a big help. You can also put together a list of maybe a
few dozen Twitter accounts and hashtags to watch; that'll keep you pretty
current.
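
To make the "embedding into a classifier" idea concrete, here is a minimal sketch. It is not a CNN or RNN: it swaps in a much simpler stand-in (average the word vectors of an article, then pick the label whose averaged training vector is closest by cosine similarity). The 3-dimensional vectors, the labels and the training sentences are all invented for illustration; in a real project you would load pretrained word2vec or GloVe vectors and train a proper model.

```python
import math

# Toy 3-dimensional "embeddings"; the values are invented for illustration.
# In practice you would load pretrained word2vec or GloVe vectors instead.
EMBEDDINGS = {
    "goal":     [0.9, 0.1, 0.0],
    "match":    [0.8, 0.2, 0.1],
    "striker":  [0.9, 0.0, 0.1],
    "election": [0.1, 0.9, 0.0],
    "senate":   [0.0, 0.8, 0.2],
    "vote":     [0.1, 0.9, 0.1],
}


def embed(text):
    """Average the vectors of all known words; None if no word is known."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def train_centroids(labeled_texts):
    """Compute one averaged vector per label from (text, label) pairs."""
    sums, counts = {}, {}
    for text, label in labeled_texts:
        vec = embed(text)
        if vec is None:
            continue  # skip texts with no known words
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify(text, centroids):
    """Return the label with the most similar centroid, or None if unknown."""
    vec = embed(text)
    if vec is None:
        return None
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical labeled examples standing in for a real training set.
TRAIN = [
    ("the striker scored a goal", "sports"),
    ("a tense match", "sports"),
    ("the senate held a vote", "politics"),
    ("election results", "politics"),
]
CENTROIDS = train_centroids(TRAIN)
```

Calling `classify("late goal decides match", CENTROIDS)` picks the label whose centroid is nearest to the averaged article vector. Averaging throws away word order, which is exactly what the CNN/RNN variants in the papers above try to keep.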

------
abhikandoi2000
[http://blog.ycombinator.com/how-to-get-into-natural-language...](http://blog.ycombinator.com/how-to-get-into-natural-language-processing/)

~~~
arrmn
Must have missed that one, great resource as far as I can tell, thanks.

------
Eridrus
NLP is a pretty broad area. Do you want something like a chatbot? Do you want
to extract facts from the web? Do you want to understand reviews to see what
specifically people liked or disliked? Do you want to group documents based on
an unknown set of topics? How much data can you get hold of? Is it labeled?

IMO the field is quite diverse, so the best you can do is make yourself aware
of the various problems people have managed to have some success on and
roughly how, so that when you encounter that problem you know what it is
called and can dig into the literature.

Alternatively, if your goal is to build products, I would suggest trying out
the various NLP APIs that exist, which may be able to take care of the entire
problem for you. Not everything has an API, and they don't all make sense to
use when off-the-shelf components are available, but higher-level things
like LUIS or API.ai may be useful if you don't want to think about the
underpinnings too much.

~~~
arrmn
Basically we want to analyze articles: what are the topics and categories?
Then we need to look at the entities and link them to our knowledge base.

There are a lot of APIs that can solve our problem, but our customer doesn't
want to use external APIs; he wants to build it into his own project.
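
For the entity part, the simplest in-house baseline is probably a dictionary lookup against your own knowledge base, so here is a rough sketch of that. Everything in it is an assumption made for illustration: the knowledge base entries, the entity IDs, and the idea of using the raw mention count as a crude relevance score. It does no disambiguation at all, so treat it as a starting point, not as how entity linking is normally done.

```python
import re
from collections import Counter

# Hypothetical in-house knowledge base: entity ID -> known surface forms.
KNOWLEDGE_BASE = {
    "org:acme": ["Acme Corp", "Acme"],
    "person:jane_doe": ["Jane Doe"],
    "city:springfield": ["Springfield"],
}


def build_alias_index(kb):
    """Map each lowercased alias to its entity ID."""
    index = {}
    for entity_id, aliases in kb.items():
        for alias in aliases:
            index[alias.lower()] = entity_id
    return index


def link_entities(text, kb=KNOWLEDGE_BASE):
    """Return (entity_id, mention_count) pairs, most frequent first.

    The mention count is used here as a crude relevance score.
    """
    index = build_alias_index(kb)
    if not index:
        return []
    # Try longer aliases first so "Acme Corp" wins over the shorter "Acme".
    aliases = sorted(index, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(a) for a in aliases) + r")\b",
        re.IGNORECASE,
    )
    counts = Counter(index[m.group(1).lower()] for m in pattern.finditer(text))
    return counts.most_common()


ARTICLE = (
    "Acme Corp hired Jane Doe. Acme opened an office in Springfield, "
    "and Acme is growing."
)
LINKS = link_entities(ARTICLE)
```

`LINKS` lists the matched knowledge base IDs with their mention counts. A real system would replace the alias lookup with a trained NER model plus a disambiguation step, but a baseline like this is useful for checking how good your alias coverage is.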

------
sprobertson
Not sure if this is directly useful, but specifically for neural networks
applied to NLP-ish tasks in PyTorch (a Python machine learning framework):
[https://github.com/spro/practical-pytorch](https://github.com/spro/practical-pytorch)

~~~
sprobertson
Here are some NLP classes:

* Oxford Deep NLP Lectures [https://github.com/oxford-cs-deepnlp-2017/lectures](https://github.com/oxford-cs-deepnlp-2017/lectures)

* Stanford Natural Language Processing with Deep Learning [http://web.stanford.edu/class/cs224n/syllabus.html](http://web.stanford.edu/class/cs224n/syllabus.html)

* Georgia Tech Natural Language Understanding [https://github.com/jacobeisenstein/gt-nlp-class](https://github.com/jacobeisenstein/gt-nlp-class)

* Georgia Tech Deep Learning For NLP in PyTorch [https://github.com/rguthrie3/DeepLearningForNLPInPytorch](https://github.com/rguthrie3/DeepLearningForNLPInPytorch)

And some books:

* Natural Language Processing with Python [https://www.amazon.com/Natural-Language-Processing-Python-An...](https://www.amazon.com/Natural-Language-Processing-Python-Analyzing/dp/0596516495)

* Foundations of Statistical Natural Language Processing [https://www.amazon.com/Foundations-Statistical-Natural-Langu...](https://www.amazon.com/Foundations-Statistical-Natural-Language-Processing/dp/0262133601)

