
Curated list of speech and natural language processing resources - sebg
https://github.com/edobashira/speech-language-processing
======
greatthanks
Those curated lists popping up all over the place seem to indicate a need for
pre-Google-style Altavista/Yahoo portals.

~~~
arafalov
Curation is always the next step after explosion of content. Yahoo was
curation of the whole internet. Then it got too hard. Now, we have enough
content in tiny sub-niches to need curation on that level. I definitely see
the need for curation of resources around the topic I am interested in (Apache
Solr).

Unfortunately, I haven't seen a good software platform that actually lets you
build a good curation site. The ones that exist want you to build the content
for them. I want one I can run/own/brand on my own. I suspect there might be
some in the library space though (haven't searched _very_ hard yet).

~~~
davidw
> Unfortunately, I haven't seen a good software platform that actually allows
> to build a good curation site

Emacs and HTML work fine, and have been a pretty good solution for the last 20
years.

~~~
arafalov
If your time is free - sure. I prefer to outsource markup consistency,
repetition of same content under different tags, and promoted items management
to software.

~~~
davidw
There are about a zillion things out there that can do that.

Including Emacs, which you could use with a bit of elisp :-)

~~~
arafalov
Been there, done that (elisp included), don't think that's quite what I had in
mind.

But thank you for persevering. :-)

------
nl
[https://github.com/facebook/MemNN](https://github.com/facebook/MemNN) should
be in the language modelling (or Deep Learning) part. I'll give them a pass
because it was only released a couple of days ago.

The original Word2Vec[1] is missing too. While Gensim and Glove are nice,
Word2Vec still outperforms them both in some circumstances.

Surely there is a good LSTM language modelling project somewhere too? I can't
think of one off the top of my head though. There's some code in Keras[2], but
maybe Karpathy's char-RNN would be better[3] because of the documentation.

[1] [https://code.google.com/p/word2vec/](https://code.google.com/p/word2vec/)

[2]
[https://github.com/fchollet/keras/blob/master/examples/lstm_...](https://github.com/fchollet/keras/blob/master/examples/lstm_text_generation.py)

[3] [https://github.com/karpathy/char-rnn](https://github.com/karpathy/char-rnn)

~~~
michael_h
LSTM --> right now, Torch 7 and Theano are receiving the bulk of the
attention.

~~~
hendler
Keras is based on Theano - an easy way to get started.

------
gbrits
Consider a speech-to-structured-search-app in a limited domain, like a
specialized Siri/Google Now. For example something like a real estate search
assistant with possible questions like: "what new 2 bedroom apartments have
become available in Capitol Hill, Seattle this week?"

Perhaps naively, it seems a big part of deducing the meaning could be done
with ordinary dictionary lookups of terms like 'bedroom', 'apartments',
"Capitol Hill", "Seattle" etc.

Is this indeed naive, or is this 'dictionary lookup'-technique part of the bag
of tricks used? If so, any good references to use this in combination with
other techniques described here?
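
To make the 'dictionary lookup' idea concrete, here is a minimal sketch in
Python. The gazetteer entries, slot names and the bedroom regex are all made up
for illustration; a real system would need a far larger dictionary plus
handling of synonyms and ambiguous terms.

```python
import re

# Hypothetical gazetteer: surface phrase -> (slot type, normalized value).
GAZETTEER = {
    "capitol hill": ("neighborhood", "Capitol Hill"),
    "seattle": ("city", "Seattle"),
    "apartment": ("property_type", "apartment"),
    "apartments": ("property_type", "apartment"),
    "this week": ("period", "this_week"),
    "new": ("listing_status", "new"),
}

# Matches "2 bedroom", "2-bedroom", "3 bedrooms", ...
BEDROOMS = re.compile(r"(\d+)[\s-]*bedrooms?")


def extract_slots(question):
    """Return the slots found by plain dictionary lookup on the question."""
    text = question.lower()
    slots = {}
    match = BEDROOMS.search(text)
    if match:
        slots["bedrooms"] = int(match.group(1))
    # Try longer phrases first so multi-word entries are checked before
    # shorter ones that could fill the same slot.
    for phrase in sorted(GAZETTEER, key=len, reverse=True):
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            slot, value = GAZETTEER[phrase]
            slots.setdefault(slot, value)
    return slots


question = ("what new 2 bedroom apartments have become available "
            "in Capitol Hill, Seattle this week?")
print(extract_slots(question))
```

The lookup only finds the pieces of the question; it says nothing yet about
what is being asked of them.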

Highly interested in this topic, but looking for a nice introduction to get
used to the terminology of the field.

~~~
amirouche
This is called Question Answering (QA). "bedroom" and "apartments" are
different kinds of entities from "Capitol Hill" and "Seattle". You could do as
you say, and try to understand the question from statistics over some of the
words that appear in it. This is a "bag of words" approach.

The general idea of NLP is not different from general computer science, i.e. 1)
narrow the problem 2) solve it 3) try to solve a bigger problem.

The tower of sentence structure in NLP is:

\- bag of words

\- part of speech + named entity tagging

\- dependency tagging/framing

\- semantic tagging

The idea is to create _templates_ for the most common questions. Then you
parse questions, recognizing named entities like "Capitol Hill" and "Seattle"
and common nouns like "apartment", and with those you can resolve the question.
It's not an ordinary _dictionary hash lookup_ since a given template has
several "keys". The value in the dictionary is the correct search method. It
reminds me of multiple dispatch that also supports dispatch by value.
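
A rough Python sketch of that lookup, under the simplifying assumption that a
template is identified only by the set of slot types it needs (dispatch by
value is left out). The handler functions and slot names are hypothetical.

```python
# Hypothetical search handlers; a real system would query a listings backend.
def search_listings(slots):
    return "listings search with {}".format(sorted(slots.items()))


def search_neighborhood_info(slots):
    return "neighborhood info for {}".format(slots["neighborhood"])


# Each template is keyed by several slot types at once, and the value is
# the search method to call.
TEMPLATES = [
    (frozenset({"property_type", "neighborhood", "city"}), search_listings),
    (frozenset({"neighborhood"}), search_neighborhood_info),
]


def dispatch(slots):
    """Call the most specific template whose required slots are all present."""
    present = set(slots)
    candidates = [(required, handler) for required, handler in TEMPLATES
                  if required <= present]
    if not candidates:
        return None  # unrecognized question: a place to ask for confirmation
    _, handler = max(candidates, key=lambda pair: len(pair[0]))
    return handler(slots)
```

Picking the template with the most matched slots is just one possible
tie-breaking rule.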

Also something to take into account is that in the "assistant" example you
give, the assistant can ask for confirmation. You don't explicitly state that
you are looking to "rent" something. So the system might not recognize the
question, but just guess that you talk about renting something because it's
the most popular search around Capitol Hill, Seattle. You can implement a
"suggest this question" feature that will feedback the "question dispatch"
algorithm to later recognize this question.

This is mostly a Dynamic Programming approach. Advanced NLP pipelines use
logic, probabilistic programming, graph theory or all of them ;)

The other big problems of NLP are:

\- summary generation

\- automatic translation

Important to note is that like other systems it must be goal driven. You can
start from the goal and go backward inferring the previous steps or do it from
the initial data and go forward. Again, it's very important to simplify.
Factorize by recognizing patterns. It's the main idea regarding the theory of
the mind.

Have a look at this SO question [1] where I try to fully explain an example QA.
The Coursera NLP course is a good start.

OpenCog doesn't deal solely with NLP but gives an example of what a modern
artificial cognitive assistant can be made of.

Beware that NLP is kind of loop-hole.

[1] [http://stackoverflow.com/questions/32432719/is-there-any-nlp...](http://stackoverflow.com/questions/32432719/is-there-any-nlp-tools-for-semantic-parsing-for-languages-other-than-english/32670572#32670572)

~~~
gbrits
Thanks for this. Looked at your SO answer, and feel what you call the 'narrow
search approach' is what I'm looking for.

Above you said: > The idea is to create templates for most common questions.

I assume here that a template would be an abstract phrase where things like
Named Entities (Seattle, Capitol Hill), Adjectives (2 bedroom), etc. are
removed and substituted by variables. Correct?

Could supervised learning then be used to map natural language questions to
templates? After all, there's only so many ways in which you can ask a
particular abstract question (i.e.: template) in a limited domain.

What I'm thinking then are the following steps:

\- 1. Source questions that cover the domain. (e.g.: Mechanical Turk)

\- 2. Manually come up with abstract templates that cover these questions.
(Although somehow I feel it must be possible to semi-automate this using
Wrapper Induction or something)

\- 3. Manually label a test set <question -> template>

\- 4. Have the system learn/classify the remaining questions and test for
accuracy (what classifiers would you use here?)

Flow of new question:

1\. if coverage in 2 was big enough, the system should be able to infer the
template.

2\. A template should be translatable to a bunch of queries (e.g.: GraphQL
format). Not the hard part I believe.

Out pops your answer in machine form. Bonus points to transform that answer
into a Natural Language answer using some generative grammar.

Of course the devil is in the details but from 10,000 feet does this look
solid? Suggestions/glaring omissions? Thanks again.
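
For what the classification in steps 3 and 4 could look like, here is a small
sketch using a bag-of-words naive Bayes classifier in plain Python. Naive Bayes
is only my stand-in for "some classifier", and the template names and training
questions are invented; with real data you would want far more examples per
template and a held-out test set.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+|\d+", text.lower())


class NaiveBayesTemplateClassifier:
    """Multinomial naive Bayes over word counts with add-one smoothing."""

    def fit(self, questions, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        self.vocab = set()
        for question, label in zip(questions, labels):
            tokens = tokenize(question)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, question):
        tokens = tokenize(question)
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # log P(label) + sum of log P(token | label)
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for token in tokens:
                score += math.log((self.word_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented labeled set: <question -> template>.
TRAINING = [
    ("what 2 bedroom apartments are available in Seattle", "list_available"),
    ("show me new houses for rent in Capitol Hill", "list_available"),
    ("which apartments became available this week", "list_available"),
    ("how much does a 2 bedroom apartment cost in Seattle", "average_price"),
    ("what is the average rent in Capitol Hill", "average_price"),
    ("how expensive are houses in Seattle", "average_price"),
]

classifier = NaiveBayesTemplateClassifier().fit(
    [question for question, _ in TRAINING],
    [label for _, label in TRAINING],
)
print(classifier.predict("show me new apartments available this week"))
```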

~~~
amirouche
1\. There is the Yahoo QA dataset that might be helpful. Also you can crawl
specific websites for such questions

2\. _semi-manually_ come up with templates (a grammar for the questions). You
have to analyse the dataset in an unsupervised way to find the common
patterns and sanitize the results.

3\. maybe step 2 is enough.

4\. Markov networks are useful in this context, but I could be wrong

> A template should be translatable to a bunch of queries (e.g.: GraphQL
> format). Not the hard part I believe.

Yes once you have the templates with typed variables (named entities,
adjectives, etc...) like you describe you can write the code to search for the
results. I doubt GraphQL is a good solution for that problem. You can't
translate the templates into a search on the fly. It's a mapping that you need
to build manually or automatically.

I think in your case SQL will be fine. Have a look at
[https://github.com/machinalis/quepy](https://github.com/machinalis/quepy)
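
As a rough illustration of a hand-built template-to-SQL mapping (this uses
Python's standard sqlite3 module, not quepy's API; the table, columns, template
names and rows are all invented):

```python
import sqlite3

# Hand-built mapping from template name to a parameterized SQL query.
# Table and column names are made up for this example.
TEMPLATE_SQL = {
    "list_available": (
        "SELECT address FROM listings "
        "WHERE bedrooms = :bedrooms AND neighborhood = :neighborhood "
        "AND city = :city ORDER BY address"
    ),
    "average_price": (
        "SELECT AVG(price) FROM listings "
        "WHERE neighborhood = :neighborhood AND city = :city"
    ),
}


def answer(conn, template, slots):
    """Run the query registered for `template`, binding slot values by name."""
    return conn.execute(TEMPLATE_SQL[template], slots).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE listings (address TEXT, bedrooms INTEGER, "
    "neighborhood TEXT, city TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO listings VALUES (?, ?, ?, ?, ?)",
    [
        ("12 Pine St", 2, "Capitol Hill", "Seattle", 2100.0),
        ("98 Olive Way", 2, "Capitol Hill", "Seattle", 2300.0),
        ("5 Main St", 3, "Ballard", "Seattle", 2900.0),
    ],
)

slots = {"bedrooms": 2, "neighborhood": "Capitol Hill", "city": "Seattle"}
print(answer(conn, "list_available", slots))
```

Binding the slot values as named parameters keeps user-supplied text out of
the SQL string itself.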

------
m_eiman
About the "Text-to-Speech" section there, I was really impressed with the
updated Swedish "Alva" voice in OSX El Capitan: it correctly pronounces
"tomten" in different ways in the first and second occurrence in this example:

say -v Alva "Tomten dricker julmust på tomten"

"Tomten" can mean either "Santa Claus" or "the yard"/"the plot" depending on
context, and apparently they're able to detect this properly.

~~~
motdiem
OS X makes progress with every release on this front. I typically test it with
a few tricky French sentences (think "les poules du couvent couvent") and it
seems to improve, but it's hard to say from the outside what gets better in
the model ("Mes fils ont cassé mes fils" still fails for instance, but seems
harder to detect to me)

~~~
nl
I think the OP was talking about Text-to-Speech, and you are (maybe?) talking
about speech recognition?

(The irony of this misunderstanding being kicked off by a comment about the
text-to-speech engine understanding the context of a word amuses me)

------
mohn
I'm glad to see the CMU pronouncing dictionary in there. It was instrumental
when I wrote a web app[1] to generate Spoonerisms[2] (my apologies for the UI
and the fact that I haven't yet removed the more obscure words, especially
obscure homophones, from my cmudict subset).
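
For anyone curious how a pronouncing dictionary helps here, a toy sketch in
Python (not the app's actual code). The four entries are hand-typed in the
ARPAbet style that cmudict uses, and the rule "swap everything before the
first vowel" is an assumption made for this example.

```python
# Hand-typed pronunciations in the ARPAbet style cmudict uses;
# a real app would load the full dictionary file instead.
PRONUNCIATIONS = {
    "jelly": ["JH", "EH1", "L", "IY0"],
    "beans": ["B", "IY1", "N", "Z"],
    "belly": ["B", "EH1", "L", "IY0"],
    "jeans": ["JH", "IY1", "N", "Z"],
}
WORD_BY_PHONES = {tuple(p): w for w, p in PRONUNCIATIONS.items()}


def is_vowel(phone):
    # In cmudict, vowel phonemes carry a trailing stress digit (0, 1 or 2).
    return phone[-1].isdigit()


def split_onset(phones):
    """Split a phoneme list into (leading consonants, rest)."""
    for i, phone in enumerate(phones):
        if is_vowel(phone):
            return phones[:i], phones[i:]
    return phones, []


def spoonerize(first, second):
    """Swap leading consonant sounds; None unless both results are known words."""
    onset1, rest1 = split_onset(PRONUNCIATIONS[first])
    onset2, rest2 = split_onset(PRONUNCIATIONS[second])
    new_first = WORD_BY_PHONES.get(tuple(onset2 + rest1))
    new_second = WORD_BY_PHONES.get(tuple(onset1 + rest2))
    if new_first is None or new_second is None:
        return None
    return new_first, new_second


print(spoonerize("jelly", "beans"))
```

Working on phonemes rather than letters is what lets the swap respect
pronunciation instead of spelling.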

The cmudict isn't under the text-to-speech subheading in this list, but I
think the folks at Carnegie Mellon may have considered text-to-speech
applications, like a talking GPS navigator, when they compiled the dictionary.
I recall the cmudict containing lots of US city names.

[1] [https://spoonerizer.appspot.com/](https://spoonerizer.appspot.com/)

[2]
[https://en.wikipedia.org/wiki/Spoonerism](https://en.wikipedia.org/wiki/Spoonerism)

------
ZanyProgrammer
Also missing TextBlob, which was featured on HN recently on the front page.

------
maresca
Could anyone point me to some sentiment analysis frameworks and/or update the
list to include some?

------
ZanyProgrammer
No NLTK love?

~~~
vkhuc
Why use NLTK when we can use spaCy instead?
[http://spacy.io/](http://spacy.io/)

~~~
vasco
English only.

