

Show HN: Developments on the Star Trek / Jarvis computer - tw334
https://github.com/thomasweng15/E.V.E.

======
nshm
There are so many toys like this out there. It's very simple to hardcode a
few simple commands in a Python script and present that as an advance in
artificial intelligence. Some examples:

Simon <https://sourceforge.net/projects/speech2text/>
Vedics <http://sourceforge.net/projects/vedics/>
Palaver <https://github.com/JamezQ/Palaver>
Voicekey <http://sourceforge.net/projects/voicekey/>

I could add 10 others. Unfortunately, all such projects are practically
unusable for a wide audience because they lack very important features and
provide only basic functions. The problem is that advanced development
requires an understanding of speech recognition internals, close cooperation
with engine developers, and rigorous user interface testing.

Here is the list of features one has to implement to provide even a basic
user experience:

1. Implement and test proper keyword spotting for voice activation. This
feature is missing in the Julius engine, and the workarounds people use do
not provide enough accuracy. (A minimal spotting sketch follows this list.)

2. Implement invisible speaker detection and online adaptation to deal with
environment issues. With adaptation, accuracy is extremely high for a small
set of speakers; you can easily dictate free-form speech.

3. Implement free-form speech. This part requires data collection and
natural language understanding work.

4. Implement a framework to detect microphone and noise issues. Microphone
problems like clipping are a major source of accuracy loss in speech
recognition, yet most engines do not check for a properly configured
microphone. (A rough clipping check follows the list.)

5. Provide an easy way to add new commands with no training and no knowledge
of phonetics. This requires a G2P (grapheme-to-phoneme) component that
converts unknown words into phonemes, like the one provided by CMUSphinx. (A
dictionary-update sketch follows the list.)

6. Implement dialog management and error correction, so that the user can
not just run queries but hold a conversation with the application.
Interestingly, that requires updating previous engine results and analyzing
n-best lists or lattices. (Illustrated in the last sketch below.)
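
For item 1, a minimal keyword-spotting loop using the pocketsphinx Python
bindings; the constructor arguments follow the older pocketsphinx-python
API (newer releases differ), and the keyphrase and threshold are
placeholder values:

    # lm=False switches the decoder from full language-model decoding
    # to keyphrase search; kws_threshold trades missed detections
    # against false alarms.
    from pocketsphinx import LiveSpeech

    speech = LiveSpeech(lm=False, keyphrase='hey computer',
                        kws_threshold=1e-20)
    for phrase in speech:
        print('keyword detected:', phrase.segments(detailed=True))
        # hand control to the full recognizer here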
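For item 4, a rough clipping check over one block of 16-bit PCM samples,
e.g. a buffer read from PyAudio. The 0.98 full-scale cutoff and 1% ratio
are arbitrary thresholds picked for illustration, not values from any
engine:

    import numpy as np

    def looks_clipped(pcm_block, ratio=0.01):
        # pcm_block: raw bytes of signed 16-bit mono audio.
        samples = np.frombuffer(pcm_block, dtype=np.int16)
        # Cast before abs() so -32768 does not overflow int16.
        near_rail = np.abs(samples.astype(np.int32)) >= int(0.98 * 32768)
        return near_rail.mean() > ratio   # warn the user if True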
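For item 5, a sketch of adding a new command word to a Sphinx-style
pronunciation dictionary. I'm using the g2p_en package as a stand-in G2P
here (CMUSphinx ships its own G2P tooling), and commands.dict is a made-up
path:

    from g2p_en import G2p

    g2p = G2p()

    def add_command(word, dict_path='commands.dict'):
        # Drop stress digits (AH0 -> AH): Sphinx dictionaries
        # typically use stressless phones.
        phones = ' '.join(p.rstrip('012') for p in g2p(word))
        with open(dict_path, 'a') as f:
            f.write('%s %s\n' % (word.lower(), phones))

    add_command('jarvis')   # e.g. writes: jarvis JH AA R V AH S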
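And for item 6, a toy illustration of why n-best lists matter for error
correction: re-score the engine's hypotheses against the words the current
dialog state expects. The n-best data here is made up, not a real engine
API:

    EXPECTED = {'play', 'pause', 'stop', 'music'}   # from dialog state

    def pick_hypothesis(nbest):
        # nbest: list of (text, acoustic_score) pairs, best-first.
        def rank(item):
            text, score = item
            in_context = len(set(text.split()) & EXPECTED)
            return (in_context, score)   # context first, then score
        return max(nbest, key=rank)[0]

    # The acoustically weaker hypothesis wins because it fits
    # the dialog context better.
    print(pick_hypothesis([('pay the music', -2100),
                           ('play the music', -2180)]))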

So I hope someone will start doing real work on speech interfaces in close
cooperation with engine developers. That would produce amazing things to
demonstrate.

------
hardwaresofton
I have something JUST like this that runs on CMUSphinx... and the feature
set is not as impressive yet -- I wonder if I should keep developing it.

~~~
tw334
Julius can be off the mark sometimes. Do you think CMUSphinx is a better
option as an always-on listener? I haven't tried it.

~~~
hardwaresofton
What I actually want to build is multi-backend. It's running off of
CMUSphinx right now (which was a ridiculous pain to build, but also taught
me a lot about the things that power it), but I want the audio backends to
be switchable by changing a few strings in a config file, and of course I
want to spread to more platforms.
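
A hypothetical sketch of that config-driven switch (the file name, section,
and class names are all made up):

    import configparser

    class SphinxBackend:
        def listen(self):
            print('listening via CMUSphinx...')

    class JuliusBackend:
        def listen(self):
            print('listening via Julius...')

    BACKENDS = {'cmusphinx': SphinxBackend, 'julius': JuliusBackend}

    cfg = configparser.ConfigParser()
    cfg.read('eve.cfg')          # e.g. [audio] section with engine = cmusphinx
    engine = BACKENDS[cfg.get('audio', 'engine')]()
    engine.listen()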

The output, initialization, and running of Julius and CMUSphinx are very,
very similar.

I do think CMUSphinx is pretty good as an always-on listener -- I've had
times where it would suddenly die after running for hours (which I tried to
debug), and times where it ran fine after I'd played hours of music
non-stop.

