Rasa NLU: Open-source bot tool for natural language understanding

espadrine · on Dec 18, 2016

For NLP, they use either MITIE[0] or spaCy[1].

That said, from my experience, you can get surprisingly far with simple systems; for instance, queread[2] relies on graph learning and statistics.

[0]: https://github.com/mit-nlp/MITIE

[1]: https://spacy.io/

[2]: https://github.com/espadrine/queread#workings

IshKebab · on Dec 18, 2016

A while ago I looked for information on how Alexa, Wit.ai, Nuance Mix etc. do this intent classification and didn't find anything.

These guys have posted a nice blog post about their approach:

https://conversations.golastmile.com/do-it-yourself-nlp-for-...

They say they add up the word vectors in the sentence. But it seems to me that this would make the result independent of word order (i.e. "when does Tesco open?" and "Open Tesco when does" come out the same). I thought I had tested that and it didn't work, but actually I just tried saying "Tesco open does when?" to Alexa and it said "Sorry, I don't have the business hours for Tesco". Inconclusive, I'd say, but interesting anyway!
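To illustrate the order-independence concern, here is a minimal sketch. The three-dimensional vectors are made up for illustration and do not come from any real embedding model, and `sentence_vector` is a hypothetical helper, not the code from the blog post.

```python
# Toy word vectors, invented for illustration (not from a real embedding model).
TOY_VECTORS = {
    "when": [0.1, 0.3, -0.2],
    "does": [0.0, -0.1, 0.4],
    "tesco": [0.7, 0.2, 0.1],
    "open": [-0.3, 0.5, 0.2],
}


def sentence_vector(sentence):
    """Sum the word vectors of a sentence, component by component."""
    words = sentence.lower().replace("?", "").split()
    total = [0.0, 0.0, 0.0]
    for word in words:
        for i, value in enumerate(TOY_VECTORS[word]):
            total[i] += value
    return total


a = sentence_vector("when does Tesco open?")
b = sentence_vector("Open Tesco when does")

# Addition is commutative, so both word orders give the same vector
# (compared with a small tolerance to allow for floating-point rounding).
same = all(abs(x - y) < 1e-9 for x, y in zip(a, b))
print(same)
```

Any classifier fed only this summed vector sees the two sentences as the same input.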

bendyBus · on Dec 18, 2016

Yeah, you're quite right: intents are built with a bag-of-words model, which doesn't take order into account. Entity extraction does, though. If you find a case where word order is really important for getting intents right I'd love to know about it! We could find a way to make that work.

espadrine · on Dec 18, 2016

> If you find a case where word order is really important for getting intents right

This may be facetious of me, since it's still fairly uncommon, but here it is.

Set up the go game, go up game the set.

State the ban law, ban the state law.

Drive the car by the park, park the car by the drive.
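A quick check, as a sketch: each pair above reduces to the same bag of words. `bag_of_words` is a helper made up for this illustration, not part of any library.

```python
from collections import Counter


def bag_of_words(sentence):
    """Count lowercase words, ignoring order and edge punctuation."""
    return Counter(word.strip(".,?!").lower() for word in sentence.split())


pairs = [
    ("Set up the go game", "go up game the set"),
    ("State the ban law", "ban the state law"),
    ("Drive the car by the park", "park the car by the drive"),
]

for first, second in pairs:
    # Each pair has identical word counts, so a pure bag-of-words
    # model cannot tell the two sentences apart.
    print(bag_of_words(first) == bag_of_words(second))
```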

Rabidgremlin · on Dec 18, 2016

The bot I have been working on uses bag of words and n-grams to identify intents. The n-grams are useful when someone says something that may have multiple entities of the same type, for instance a journey start and end point: the model can use the "from" and "to" words to match better. They are also very useful when training on phrases that are very similar, such as FAQ questions, which often share the same words, so the order and one or two key words are super important for getting the right match.
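A minimal sketch of why n-grams help here, assuming plain whitespace tokenization. The two sentences and the `ngrams` helper are invented for illustration; this is not the bot's actual implementation.

```python
def ngrams(sentence, n):
    """Return the list of word n-grams in a sentence."""
    words = sentence.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


outbound = "train from london to paris"
inbound = "train from paris to london"

# Unigrams (bag of words): both sentences contain the same words.
print(sorted(ngrams(outbound, 1)) == sorted(ngrams(inbound, 1)))

# Bigrams keep local order, so ("from", "london") and ("to", "paris")
# appear only in the outbound sentence.
print(set(ngrams(outbound, 2)) - set(ngrams(inbound, 2)))
```

With bigram features, "from london" and "from paris" become distinct signals, which is what lets the start and end points be told apart.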

Rabidgremlin · on Dec 18, 2016

It's the "conversation" part that is really tricky... I have been working on a bot for a large corporation for the last few months, and we have been using Inkle's Ink narration/dialog engine for this. Works very well. They let me open source the framework: https://github.com/rabidgremlin/Mutters. It uses OpenNLP for intent identification and NER, and Ink for conversation state and "scripting".

Maarten88 · on Dec 18, 2016

This is interesting, I've been using LUIS for some time now and an open source alternative - especially one that is drop-in API compatible - is very welcome.

However, I can't find any information in the docs on how comparable the results are (e.g. does it have built-in date and time entity recognition like LUIS?). Most importantly: what languages does this support? All examples are in English only. Is it even language-aware, or can you train a model in any language? I'd be very interested if this were to support languages that LUIS does not have (like my language: Dutch).

tmbo · on Dec 18, 2016

Currently it supports English and German. In general we need a word embedding for each language. If that has been created by someone else, it's rather easy to integrate new languages.

bendyBus · on Dec 18, 2016

Currently there are no built-in entities like dates, times, locations etc. But we're really keen to set up a way for users to share models, and that would definitely include these things as well.

ragebol · on Dec 18, 2016

Nice, I've been looking for an offline solution to do this sort of thing to run on a robot for RoboCup@Home.

Perhaps http://sag.art.uniroma2.it/demo-software/huric/ might also provide some training data. It's annoying, though, that I can't just download that corpus but have to email some guy first.

niklasber · on Dec 18, 2016

Seems like it doesn't say anywhere which language(s) it supports? Guessing it's English only.

tyingq · on Dec 18, 2016

http://rasa-nlu.readthedocs.io/en/latest/config.html

"language : language of your app, can be en (English) or de (German)."

mark_l_watson · on Dec 18, 2016

Looks like an interesting project, based on scikit-learn and spaCy. The project provides some simple training files for the domain of asking about restaurants.

It would be useful to also have very large training data sets available.

nrp12 · on Dec 19, 2016

Cool stuff - was looking for open-source NLU alternatives to luis.ai. Thanks to the emulators, this fits right in.

Does anyone know why rasa chose mitie/spacy and not stanfordnlp?

bendyBus · on Dec 19, 2016

We could integrate with other backends, including NLTK & CoreNLP. The Stanford stuff is under GPL though, and we prefer to promote startup-friendly licenses.

qhoc · on Dec 18, 2016

How well does this scale? Let's say I have 500MB of JSON files from restaurant info and user reviews.

samcodes · on Dec 19, 2016

I think you would have to do some processing on those; I'm pretty sure the input format has sentences classified by intent.

very_goord · on Dec 18, 2016

Great stuff! Is there Docker support already?

bendyBus · on Dec 18, 2016

Yes! Docker Cloud isn't quite working yet but the Dockerfile should work :)

very_goord · on Dec 18, 2016

Awesome!! Many thanks, guys

Coldewey · on Dec 18, 2016

like the idea :-) good job!