

NatI: Multi-language voice control system for Ubuntu written in Python - rcorcs
https://github.com/rcorcs/NatI

======
sherjilozair
I've done something similar with Sphinx, which worked quite well too.

However, I feel the real challenge to productize such a technology would be to
make this system recognise utterances spoken casually, over loud music, over
friends talking in the background.

Also, how would the speech-to-text recognise the user's commands over sound
playing from its own speaker? Suppose I set up a text-to-speech system to read
a book to me. I want to be able to make it repeat a paragraph (because my
attention had drifted away). However, the microphone's input channel would
contain not only me shouting "stop", but also the output of the text-to-speech
system.

We humans have a way of factoring out our own voice when we are listening to
someone else. Is there a way to make machines do that?

~~~
rcorcs
It is doable with some analysis of the sound signal. Honda's ASIMO does that,
somehow. But I think it is still an active research topic.
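
For what it's worth, one textbook way to "factor out" the machine's own output
is acoustic echo cancellation with an adaptive filter. Below is a minimal
sketch using NLMS (normalized least mean squares). It is not what ASIMO or NatI
does, as far as I know. The function name, the synthetic signals and the
4-tap "room" echo path are all made up for illustration, and a real system
would also need double-talk detection so the filter stops adapting while the
user is speaking.

```python
import numpy as np

def nlms_echo_cancel(mic, ref, taps=64, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the speaker echo from the mic signal.

    mic: samples recorded by the microphone (echo + user's voice)
    ref: samples the system itself sent to the speaker
    """
    w = np.zeros(taps)        # running estimate of the speaker-to-mic echo path
    buf = np.zeros(taps)      # most recent reference samples, newest first
    out = np.zeros(len(mic))
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = ref[n]
        err = mic[n] - w @ buf                    # mic minus predicted echo
        w += mu * err * buf / (buf @ buf + eps)   # normalized LMS update
        out[n] = err
    return out

# Synthetic demo; every signal here is an illustrative assumption.
rng = np.random.default_rng(0)
ref = rng.standard_normal(8000)             # stand-in for the TTS output
room = np.array([0.0, 0.6, 0.3, -0.1])      # hypothetical echo path
echo = np.convolve(ref, room)[:len(ref)]
voice = np.zeros(len(ref))
voice[6000:6100] = 1.0                      # stand-in for the user saying "stop"
mic = echo + voice
cleaned = nlms_echo_cancel(mic, ref, taps=8)
```

Once the filter has converged, `cleaned` contains almost none of the echo, so
the burst standing in for the user's voice is what remains for the recogniser.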

~~~
sherjilozair
Any links to relevant material? I'm okay with delving into current research
topics.

~~~
rcorcs
These are very interesting papers from Google:
[http://research.google.com/pubs/SpeechProcessing.html](http://research.google.com/pubs/SpeechProcessing.html)

------
khc
One problem I have with things like this is non-customizable hot words. For
example, Google Now has this feature where it would always listen for "Ok
Google" as long as your device is plugged in, which is a great feature. The
problem is I have multiple devices, and saying "Ok Google" would wake all of
them up, which is never what I wanted.

~~~
melling
Sounds like you need to give each device a name. Jarvis!
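
A rough sketch of the idea, assuming the recogniser has already turned the
audio into text (real hot-word detectors work on the audio itself, so this is
only an illustration). The function name `strip_wake_word` and the "ok <name>"
phrasing are hypothetical, not part of Google Now or NatI:

```python
import re

def strip_wake_word(transcript, device_name):
    """Return the command if the transcript starts with "ok <device_name>".

    Returns None when the utterance is addressed to some other device.
    """
    pattern = r"^\s*ok[\s,]+" + re.escape(device_name) + r"\b[\s,.:!]*(.*)$"
    match = re.match(pattern, transcript, flags=re.IGNORECASE)
    if match is None:
        return None
    return match.group(1).strip()

# Each device is configured with its own name and ignores the others.
print(strip_wake_word("Ok Jarvis, play some music", "jarvis"))
print(strip_wake_word("Ok Google, play some music", "jarvis"))
```

With per-device names, only the device whose name follows "ok" acts on the
command; the rest see `None` and stay asleep.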

------
zzmp
I've been working with Julius a lot, another open-source speech recognition
engine.

How easy did you find it to work with NLTK?

Did you have a lot of NL experience before this project? Do you think
something like NLTK would be easy to pick up?

~~~
rcorcs
I've studied NLTK before from the book Natural Language Processing with
Python. I even use NLTK within NatI.

------
pavanky
If it is written in Python, is this really only relevant to Ubuntu?

~~~
rcorcs
I think it can be used on other operating systems with little (if any) change,
as long as you have all the required Python libraries. But I've only tested
it on Ubuntu.

------
coppolaemilio
It would be nice if we could all contribute a little bit to improve the
repository :) I've already started giving a little bit of my time to it.

------
Nux
Hm, something like this would be nice in conjunction with my XBMC htpc.

~~~
rcorcs
That would be great! I have some future plans for using it with some robots
([https://www.youtube.com/watch?v=X5nTHwl0K_w](https://www.youtube.com/watch?v=X5nTHwl0K_w))
and with smart-home technology, like "turn on/off the lights".

------
olefoo
I don't know about you all, but I don't know if I trust a project that checks
.pyc files into version control.

~~~
rcorcs
Actually, these files are not needed; you can delete them and just execute the
Python files.

~~~
olefoo
It shows a lack of care and craftsmanship.

If someone can't be arsed to borrow a .gitignore (and GitHub will give you a
fairly complete one for Python), what other corners will they cut?

------
Caligula
What ASR engine are you using?

~~~
sherjilozair
Google.

[https://github.com/rcorcs/NatI/blob/master/gapi.py#L26](https://github.com/rcorcs/NatI/blob/master/gapi.py#L26)

~~~
rcorcs
I have just updated to use Google Speech API v2.

