

Wit.ai (YC W14) Wants To Be The Twilio For Natural Language - blandinw
http://techcrunch.com/2014/03/17/with-a-voice-interface-api-for-any-app-wit-ai-wants-to-be-the-twilio-for-natural-language/

======
ddod
Since the Wit.ai guys gave a shout out on my thread earlier today, I thought
I'd return the favor and show off my own implementation of a
voice>>text>>comprehension system that I used to make my personal site voice
interactive: [https://benwasser.com](https://benwasser.com)

I'm really glad work is being done to advance this area, and I'm hoping to see
more people playing around with it.

~~~
jw2013
Wow, ddod, your voice>>text is awesome. Did you implement it all on your own?

Some improvement to the text>>comprehension would be great. Right now it does
not understand many of my queries. Keep up the good work!

~~~
ddod
Voice to text is a multi-million if not billion dollar endeavor. I simply
implemented the Web Speech API, which ostensibly uses Google's (and possibly
Apple's) voice recognition system. I came up with the text comprehension bit,
which is limited by the input quality (what I get from the API) and the
training set. I've been adjusting and adding to the training set since I first
released this, but the matching has worked as well as I'd hoped.

My training set is specifically designed to be conversational interview and
personal questions, but I think a lot of the people who reach the site don't
grasp that.

Here are some examples of the input it gets and has no clue what to do with:

\- " um changes nice so when you change the things december " matched to: -1:
no match

\- " nexus 10 " matched to: 13: Huh?

\- " pictures " matched to: 14: Huh?

\- " call " matched to: 17: Cool

\- "videos " matched to: 16: Huh?

\- " change " matched to: 15: Huh?

And this is a set of input where people did get it:

\- " hi what's your name " matched to: 23: My name

\- "what's your name " matched to: 27: My name

\- "what do you do " matched to: 31: What I do

\- "why should we hire you " matched to: 38: Why you should hire me

\- "what's your favorite food " matched to: 35: Food

\- "wendy's see yourself in 5 years " matched to: 28: Goals
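
For readers curious what matching an utterance against a small training set can look like, here is a minimal sketch. The thread does not say how the site's matcher actually works, so this swaps in a plain word-overlap (Jaccard) score as an assumption; `TRAINING_SET`, `tokens`, `best_match`, and the 0.5 threshold are all hypothetical names and values chosen for illustration.

```python
import re

# Hypothetical training set: response label -> example phrasings.
# The real site's data and labels are not shown in the thread.
TRAINING_SET = {
    "My name": ["what's your name", "who are you"],
    "What I do": ["what do you do", "what is your job"],
    "Food": ["what's your favorite food"],
}


def tokens(text):
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def best_match(utterance, threshold=0.5):
    """Return the label whose example shares the most words with the
    utterance (Jaccard similarity), or None if nothing clears the threshold."""
    words = tokens(utterance)
    best_label, best_score = None, 0.0
    for label, examples in TRAINING_SET.items():
        for example in examples:
            example_words = tokens(example)
            union = words | example_words
            score = len(words & example_words) / len(union) if union else 0.0
            if score > best_score:
                best_label, best_score = label, score
    return best_label if best_score >= threshold else None
```

With a scheme like this, a greeting-padded question such as " hi what's your name " still lands on the name answer, while off-topic input such as " nexus 10 " falls below the threshold and gets no match.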

~~~
jw2013
> My training set is specifically designed to be conversational interview and
> personal questions, but I think a lot of the people who reach the site don't
> grasp that.

I did not either. As soon as I started asking general questions about you
(e.g. 'what's your email address'), the results got better. Now that I know
your answers are predefined around your personal info, I see why other
questions won't work well.

Perhaps many people, like me, only read the div starting with "Let's chat",
then got started immediately (because the red recording button caught my
attention right away) and totally ignored the div "I'll try to answer here"
where your intent is written.

------
supremum
For some reason my mind first just picked up Twilio & Natural Language and got
quite excited at the prospect of an additional layer on top of Twilio to run
NLP on SMS/phone call streams.

Like if you could just create smart NLP around SMS menus, you'd solve the
third world's SMS-as-a-helpdesk frustrations.

Or think of the premium subscription services you could charge for when people
can interact on the level of natural language instead of just replying with
simple commands.

"for the first time, the developers themselves do not have to be experts in
the field, or face the prospect of huge expense to bring in that technical
knowledge from elsewhere." I love that the building blocks for cool
experiences are becoming more polished and easier to fit together.

It's a good time to be alive, that's for sure!

------
MortenK
Can anybody with practical experience developing with Wit.ai comment on how
accurately and consistently it works? Is there any new and better software
behind this, compared to the current breed of frankly abysmal voice
recognition software (Siri, Nuance, etc.)?

~~~
iandanforth
It's pretty good! The standard caveats around having a quiet area with a
decent mic apply, but I get good results just chatting at my laptop.

However the _cool_ thing about Wit is that they are constantly updating their
suite of NL recognizers. The more you use the service, the better it gets, and
it does so without having to buy a new release of Dragon. :)

------
dangrossman
Their homepage ([https://wit.ai/](https://wit.ai/)) says "stream audio to the
API, get structured information in return", but the API docs say "send natural
language sentences (text) and get structured information (JSON) in return".

That's disappointing since the only problem I ran into with doing home
automation via a web application was the speech-to-text, not processing
commands once they were in text. A list of regular expressions works quite
well for that.

The HTML5 Speech Recognition API in Chrome kinda sucks. It does speech to text
well, but reliably keeping the API listening for speech at all has been
challenging. Even a bunch of code basically checking "has the
webkitSpeechRecognition object borked itself yet? recreate it and restart
listening" every two seconds doesn't work reliably.

I'd love a JavaScript API that can listen to the microphone, determine if
anything has been spoken (versus silence or background noise), and when
something that may be speech is detected, send it to another API endpoint that
converts it to text.
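
The "has anything been spoken" step can be sketched with a simple energy threshold. This is an illustration in Python rather than the browser JavaScript the comment asks for, and it assumes the audio arrives as fixed-size frames of 16-bit little-endian mono PCM. `frame_rms`, `speech_segments`, and the threshold values are hypothetical; a real detector would need tuning for background noise.

```python
import math
import struct


def frame_rms(frame_bytes):
    """Root-mean-square amplitude of one frame of 16-bit little-endian PCM."""
    count = len(frame_bytes) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, frame_bytes[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def speech_segments(frames, threshold=500.0, min_frames=3):
    """Return (start, end) frame-index ranges where the RMS stays at or above
    the threshold for at least min_frames consecutive frames."""
    frames = list(frames)
    segments = []
    start = None
    for i, frame in enumerate(frames):
        loud = frame_rms(frame) >= threshold
        if loud and start is None:
            start = i  # a loud run begins
        elif not loud and start is not None:
            if i - start >= min_frames:
                segments.append((start, i))
            start = None  # the run ended; short runs are dropped as noise
    if start is not None and len(frames) - start >= min_frames:
        segments.append((start, len(frames)))
    return segments
```

Only the frames inside a returned segment would then be sent on to a speech-to-text endpoint, so silence and brief clicks never leave the client.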

Edit: They do take audio input, woo :) Thanks for the correction.
[https://wit.ai/docs/api#toc_9](https://wit.ai/docs/api#toc_9)

~~~
mrmch
Their documentation has a pretty clear endpoint for sending raw audio:
[https://wit.ai/docs/api#toc_9](https://wit.ai/docs/api#toc_9)

~~~
dangrossman
Thanks, don't know how I missed the links at the top of the page. I only
checked each category on the left nav of the docs page.

------
endlessvoid94
I'm blown away by how much I can accomplish with wit.ai. I have my own
personal Jarvis / Siri system thanks to this service.

Can't recommend this service enough.

~~~
parm289
Is your system open source? Would love to take a look at how it works.

------
feralmoan
A lot of love here for wit.ai. I was thinking about transactional contexts
today (NLP conversations) and whether I could support that kind of workflow
with wit, and 'oh hey, there ya go, it's already in there as states!'. Great
work!

