

Experiment No 8 — Responding to Voice Commands - colmtuite
http://www.jordanm.co.uk/lab/respondtovoice

======
jared314
After watching Starfire[0], I came to the conclusion that computer voice
commands were another form of mystery meat navigation[1], with the added issue
of interrupting the people around me. I have used it on my phone a few times,
because the button surface area is small, but that's about it.

[0] <http://asktog.com/starfire/>

[1] <http://en.wikipedia.org/wiki/Mystery_meat_navigation>

------
kajecounterhack
I first started experimenting with voice commands when the Chrome speech input
field came out, because Google's endpoint for accepting audio clips became
exposed. I wrote a Python recorder that would start recording on an utterance,
then stop and curl the result to Google, take the transcription, do some
primitive NLP, route the request to Yelp/Google/Wikipedia, and return a
response to a web frontend.
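
The routing step can be sketched roughly like this. This is a minimal, hypothetical reconstruction of the "primitive NLP" the comment mentions; the keyword lists and function name are illustrative guesses, not the commenter's actual code, and the transcription/recording steps are omitted.

```python
# Hypothetical sketch: route a transcribed utterance to a backend
# by naive keyword matching. Keyword sets are made-up examples.

YELP_WORDS = {"restaurant", "food", "eat", "bar", "coffee"}
WIKI_WORDS = {"who", "what", "define", "history"}

def route_query(transcript: str) -> str:
    """Pick a backend ("yelp", "wikipedia", or "google") for a transcript."""
    words = set(transcript.lower().split())
    if words & YELP_WORDS:
        return "yelp"
    if words & WIKI_WORDS:
        return "wikipedia"
    return "google"  # fallback: general web search
```

Anything this crude fails quickly on real utterances, which is part of why the results were underwhelming.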

What did I learn? Even Google's voice transcriber, which is trained on an
immense set of n-grams and such, is not very good. There's a lot of advanced
signal processing plus n-gram heuristics involved in figuring out what people
are saying. Also, since it's very computationally demanding, Google hosts a
web API instead of shipping a local transcriber, so the internet itself
becomes a bottleneck to timely transcription. It's noticeably laggy.

We're not yet close to having computers capable of hearing with the same
effectiveness as humans. Hopefully it happens eventually though -- that would
unlock a world of interfaces. At the moment it's far too finicky to be
practical.

If you're curious and want to try this yourself, Apple's endpoint for Siri
voice commands is also available; someone has found it.

------
baq
As a non-native speaker, I can't for the life of me get any of the commands
right.

------
zobzu
Make the page doctor.

