Hacker News
Ask HN: How to start with NLP
18 points by arrmn on Mar 31, 2017 | 9 comments
It looks like I'm going to start working on an NLP project, but I don't have much experience with NLP since I'm a software engineer by background. I've played around with CoreNLP and spaCy to get a feel for it.

So what is the best way to tackle this topic? I don't want to do research in NLP; for now I just want to get to a level where I'm productive.

Which sites should I read regularly, which tools should I try, and what are good intro books?

Thanks in advance




Frankly, if you give more detail on what kind of NLP you're doing, we can give more pointers. For a comprehensive resource, there's the 3rd edition draft of Jurafsky and Martin's standard text (Speech and Language Processing). The missing chapters are still TBA: LSTMs, vector embeddings (word2vec etc.), seq2seq/neural translation, etc.

https://web.stanford.edu/~jurafsky/slp3/

(you can email questions anytime)


Thanks for the pointer, I'll look through it. I've already started watching the Jurafsky and Manning lectures that are available on YouTube.

Currently it's mostly about topic/category recognition in articles. The other part is entities: how to do entity linking and how to judge each entity's relevance to the article.
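As a rough illustration of the topic/category part, here is a minimal sketch of a nearest-centroid classifier over TF-IDF vectors, using only the Python standard library. The category names and training sentences are hypothetical toy data, and the crude regex tokenizer is an assumption; a real system would train on labeled articles and would more likely use an established library than this hand-rolled version.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep alphabetic runs only (a deliberately crude tokenizer).
    return re.findall(r"[a-z]+", text.lower())

def idf_weights(docs):
    # Smoothed inverse document frequency for every term seen in the corpus.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}

def tfidf(text, idf):
    # Term frequency times IDF; terms never seen in training are ignored.
    counts = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}

def cosine(a, b):
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def train(labeled):
    # labeled: list of (text, category). Returns the IDF table and one mean vector per category.
    idf = idf_weights([text for text, _ in labeled])
    sums, counts = {}, Counter()
    for text, label in labeled:
        sums.setdefault(label, Counter()).update(tfidf(text, idf))
        counts[label] += 1
    centroids = {label: {t: v / counts[label] for t, v in vec.items()} for label, vec in sums.items()}
    return idf, centroids

def classify(text, idf, centroids):
    # Pick the category whose centroid is most similar to the article's TF-IDF vector.
    vec = tfidf(text, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

# Hypothetical toy training data.
TRAINING = [
    ("the striker scored a late goal in the football match", "sports"),
    ("the team won the league after a penalty shootout", "sports"),
    ("the central bank raised interest rates again", "economy"),
    ("inflation and unemployment figures worried the markets", "economy"),
]
idf, centroids = train(TRAINING)
print(classify("a dramatic goal decided the match", idf, centroids))  # sports
```

Nearest-centroid is chosen here only because it fits in a few lines; with real labeled data, a linear classifier (or the embedding-based models discussed further down the thread) would usually do better.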

Maybe later on I'll have to look for synonyms (WordNet and word embeddings can be useful for this), but that's something for the future.
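For the word-embedding route to synonyms, the usual idea is to rank vocabulary words by cosine similarity to a query word. The sketch below assumes GloVe's plain-text format (one word per line followed by its vector components) and uses a tiny hypothetical in-memory file instead of a real download; real vectors have far more dimensions and a much larger vocabulary.

```python
import io
import math

def load_glove_text(lines):
    # GloVe plain-text format: the word, then its vector components, space-separated.
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def nearest(word, vectors, k=2):
    # Rank every other vocabulary word by cosine similarity to `word`.
    target = vectors[word]
    scored = [(cosine(target, vec), other) for other, vec in vectors.items() if other != word]
    return [other for _, other in sorted(scored, reverse=True)[:k]]

# Hypothetical 3-dimensional toy vectors standing in for a real GloVe file.
TOY_FILE = io.StringIO(
    "car 0.9 0.1 0.0\n"
    "automobile 0.85 0.15 0.05\n"
    "vehicle 0.7 0.3 0.1\n"
    "banana 0.0 0.2 0.95\n"
)
vectors = load_glove_text(TOY_FILE)
print(nearest("car", vectors))  # ['automobile', 'vehicle']
```

One caveat: embedding neighbours are words used in similar contexts, which includes related words and sometimes antonyms, not only true synonyms, so combining them with WordNet or a manual filter is common.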


Things like topic recognition, entity and reference linking/disambiguation, and synonym/paraphrase detection are well-researched tasks with purpose-made datasets for competitions like TREC. The most common approach is to choose a word embedding (word2vec or GloVe are good starting points) and feed it into a CNN or RNN variant: https://arxiv.org/abs/1702.01923

https://arxiv.org/abs/1611.09100

https://metamind.io/research/learning-when-to-skim-and-when-...

https://arxiv.org/abs/1611.06792

https://arxiv.org/abs/1611.03305
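To make "feed a word embedding into a CNN" concrete, here is a minimal sketch of the forward pass of a common single-layer CNN text classifier: embedding lookup, a 1D convolution over windows of consecutive words, ReLU, max-over-time pooling, then a linear layer and softmax. Everything here is an assumption for illustration: the vocabulary, the sizes, and the weights are random and untrained, so the output only shows the shapes involved. In practice you would build and train this in a framework such as PyTorch with pretrained word2vec/GloVe vectors.

```python
import math
import random

random.seed(0)

EMBED_DIM = 4      # toy size; word2vec/GloVe vectors are typically 50-300 dimensions
WINDOW = 2         # number of consecutive words each filter looks at
NUM_FILTERS = 3
NUM_CLASSES = 2

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# Hypothetical vocabulary with random stand-ins for pretrained embeddings.
vocab = ["the", "bank", "raised", "rates", "team", "won"]
embeddings = {w: [random.uniform(-1, 1) for _ in range(EMBED_DIM)] for w in vocab}
# Each filter spans WINDOW stacked embeddings, so it has WINDOW * EMBED_DIM weights.
filters = rand_matrix(NUM_FILTERS, WINDOW * EMBED_DIM)
filter_bias = [0.0] * NUM_FILTERS
out_weights = rand_matrix(NUM_CLASSES, NUM_FILTERS)
out_bias = [0.0] * NUM_CLASSES

def forward(tokens):
    # 1. Embedding lookup; unknown words map to a zero vector.
    seq = [embeddings.get(t, [0.0] * EMBED_DIM) for t in tokens]
    # Pad short inputs so at least one full window exists.
    while len(seq) < WINDOW:
        seq.append([0.0] * EMBED_DIM)
    # 2. Convolution: apply each filter to every window of WINDOW consecutive words, then ReLU.
    feature_maps = [[] for _ in range(NUM_FILTERS)]
    for start in range(len(seq) - WINDOW + 1):
        window = [x for vec in seq[start:start + WINDOW] for x in vec]
        for f in range(NUM_FILTERS):
            activation = sum(w * x for w, x in zip(filters[f], window)) + filter_bias[f]
            feature_maps[f].append(max(0.0, activation))
    # 3. Max-over-time pooling: one value per filter, independent of sentence length.
    pooled = [max(fm) for fm in feature_maps]
    # 4. Linear layer followed by a numerically stable softmax over classes.
    logits = [sum(w * p for w, p in zip(out_weights[c], pooled)) + out_bias[c]
              for c in range(NUM_CLASSES)]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = forward(["the", "bank", "raised", "rates"])
print(probs)  # two class probabilities that sum to 1; meaningless until the weights are trained
```

Max-over-time pooling is what lets the same network handle articles of any length; an RNN variant would replace steps 2 and 3 with a recurrent pass over the sequence.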

If you search arXiv and the ACL Anthology, you should find good literature surveys and historical summaries: http://aclweb.org/anthology/ . Karpathy's Arxiv Sanity Preserver is also a big help. You can put together a list of a few dozen Twitter accounts and hashtags to watch; that'll keep you pretty current.



I must have missed that one. It looks like a great resource as far as I can tell, thanks.


NLP is a pretty broad area. Do you want something like a chatbot? Do you want to extract facts from the web? Do you want to understand reviews to see what specifically people liked or disliked? Do you want to group documents based on an unknown set of topics? How much data can you get hold of? Is it labeled?

IMO the field is quite diverse, so the best you can do is make yourself aware of the various problems people have had some success on, and roughly how they solved them. That way, when you encounter one of those problems, you'll know what it is called and can dig into the literature.

Alternatively, if your goal is to build products, I would suggest trying out the various existing NLP APIs, which may be able to take care of the entire problem for you. Not everything has an API, and an API doesn't always make sense when off-the-shelf components are available, but higher-level services like LUIS or API.ai may let you avoid thinking about the underpinnings too much.


Basically we want to analyze articles to determine their topics and categories, and then we need to find the entities and link them to our knowledge base.

There are a lot of APIs that can solve our problem, but our customer doesn't want to use external APIs; he wants it built into his own project.
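Since external APIs are off the table, here is a minimal in-house sketch of linking mentions to a knowledge base: greedy longest-match lookup of known aliases, then disambiguation by counting how many of each candidate entity's keywords occur in the article (a simple context-overlap heuristic, not a trained linker). The knowledge base entries, their IDs, and their keyword sets are hypothetical; in a real pipeline an NER model such as spaCy's could supply the mention spans instead of the alias scan.

```python
import re

# Hypothetical in-house knowledge base: entity id -> name, aliases, and descriptive keywords.
KNOWLEDGE_BASE = {
    "KB:1": {"name": "Apple Inc.", "aliases": ["apple", "apple inc"],
             "keywords": {"iphone", "company", "technology", "shares", "cupertino"}},
    "KB:2": {"name": "apple (fruit)", "aliases": ["apple", "apples"],
             "keywords": {"fruit", "orchard", "juice", "tree", "harvest"}},
    "KB:3": {"name": "Tim Cook", "aliases": ["tim cook", "cook"],
             "keywords": {"ceo", "apple", "company"}},
}

def build_alias_index(kb):
    # Map each lowercase alias to every entity that can be referred to by it.
    index = {}
    for entity_id, entry in kb.items():
        for alias in entry["aliases"]:
            index.setdefault(alias, []).append(entity_id)
    return index

def link_entities(text, kb, max_alias_words=3):
    tokens = re.findall(r"[a-z]+", text.lower())
    context = set(tokens)
    index = build_alias_index(kb)
    links = []
    i = 0
    while i < len(tokens):
        # Greedy longest match: try the longest n-gram starting at position i first.
        for n in range(min(max_alias_words, len(tokens) - i), 0, -1):
            mention = " ".join(tokens[i:i + n])
            candidates = index.get(mention)
            if candidates:
                # Pick the candidate whose keywords overlap most with the article's words.
                best = max(candidates, key=lambda eid: len(kb[eid]["keywords"] & context))
                links.append((mention, best))
                i += n
                break
        else:
            # No alias starts at this token; move on.
            i += 1
    return links

article = "Tim Cook said Apple shares rose after the iPhone launch."
print(link_entities(article, KNOWLEDGE_BASE))  # [('tim cook', 'KB:3'), ('apple', 'KB:1')]
```

The same output also gives a crude starting point for entity relevance, for example counting how often each linked ID occurs in an article, though that is only a heuristic.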


Not sure if this is directly useful, but specifically for neural networks applied to NLP-ish tasks in PyTorch (Python machine learning framework): https://github.com/spro/practical-pytorch


Here are some NLP classes:

* Oxford Deep NLP Lectures https://github.com/oxford-cs-deepnlp-2017/lectures

* Stanford Natural Language Processing with Deep Learning http://web.stanford.edu/class/cs224n/syllabus.html

* Georgia Tech Natural Language Understanding https://github.com/jacobeisenstein/gt-nlp-class

* Georgia Tech Deep Learning For NLP in PyTorch https://github.com/rguthrie3/DeepLearningForNLPInPytorch

And some books:

* Natural Language Processing with Python https://www.amazon.com/Natural-Language-Processing-Python-An...

* Foundations of Statistical Natural Language Processing https://www.amazon.com/Foundations-Statistical-Natural-Langu...




