Wit.ai (YC W14) Wants To Be The Twilio For Natural Language (techcrunch.com)
108 points by blandinw on Mar 17, 2014 | 17 comments

Since the Wit.ai guys gave a shout out on my thread earlier today, I thought I'd return the favor and show off my own implementation of a voice>>text>>comprehension system that I used to make my personal site voice interactive: https://benwasser.com

I'm really glad there's work and advancement being done in this arena and I'm hoping to see more people playing around with it.

Wow, ddod, your voice>>text is awesome. Did you implement it all on your own?

Some improvement to the text>>comprehension would be great. Right now it does not understand many of my queries. Keep up the good work!

Voice to text is a multi-million if not billion dollar endeavor. I simply implemented the Web Speech API, which ostensibly uses Google's (and possibly Apple's) voice recognition system. I came up with the text comprehension bit, which is limited by the input quality (what I get from the API) and the training set. I've been adjusting and adding to the training set since I first released this, but the matching has worked as well as I'd hoped.

My training set is specifically designed to be conversational interview and personal questions, but I think a lot of the people who reach the site don't grasp that.

Here are some examples of the input it gets and has no clue what to do with:

- " um changes nice so when you change the things december " matched to: -1: no match

- " nexus 10 " matched to: 13: Huh?

- " pictures " matched to: 14: Huh?

- " call " matched to: 17: Cool

- "videos " matched to: 16: Huh?

- " change " matched to: 15: Huh?

And this is a set of input where people did get it:

- " hi what's your name " matched to: 23: My name

- "what's your name " matched to: 27: My name

- "what do you do " matched to: 31: What I do

- "why should we hire you " matched to: 38: Why you should hire me

- "what's your favorite food " matched to: 35: Food

- "wendy's see yourself in 5 years " matched to: 28: Goals
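
For anyone curious how matching like this could work, here is a minimal sketch. It is not the matcher used on the site, which isn't shown in this thread. It assumes a tiny hypothetical training set (`TRAINING_SET`) and scores the recognized text against each example phrase by word overlap (Jaccard similarity), returning no match when nothing clears a threshold. The labels, phrases, and threshold are all made up for illustration.

```python
import re
from typing import Optional, Set

# Hypothetical training set: response label -> example phrasings.
TRAINING_SET = {
    "My name": ["what's your name", "who are you"],
    "What I do": ["what do you do", "what is your job"],
    "Food": ["what's your favorite food"],
}


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def match(utterance: str, threshold: float = 0.5) -> Optional[str]:
    """Return the best-overlapping label, or None if nothing is close enough."""
    words = tokenize(utterance)
    best_label, best_score = None, 0.0
    for label, examples in TRAINING_SET.items():
        for example in examples:
            example_words = tokenize(example)
            union = words | example_words
            if not union:
                continue
            # Jaccard similarity: shared words divided by all distinct words.
            score = len(words & example_words) / len(union)
            if score > best_score:
                best_label, best_score = label, score
    return best_label if best_score >= threshold else None


print(match(" hi what's your name "))
print(match(" nexus 10 "))
```

With this toy data, a greeting plus "what's your name" still lands on the name answer, while off-topic input like "nexus 10" shares no words with any example and falls through to no match.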

> My training set is specifically designed to be conversational interview and personal questions, but I think a lot of the people who reach the site don't grasp that.

I didn't either. As soon as I started asking general questions about you (e.g. 'what's your email address'), the results got better. Now that I know your answers are predefined around your personal info, I understand why other questions don't work well.

Perhaps many people, like me, only read the div starting with "Let's chat" and then got started immediately (because the red recording button caught my attention right away), totally ignoring the div "I'll try to answer here" where your intent is written.

If you're interested in conversational voice & intent recognition, check out voicebox.com

For some reason my mind first just picked up Twilio & Natural Language and got quite excited at the prospect of an additional layer on top of Twilio to run NLP on SMS/phone call streams.

Like if you could just create smart NLP around SMS menus, you'd solve the third world's sms-as-a-helpdesk frustrations.

Or think of the premium subscription services you could charge for when people can interact on the level of natural language instead of just replying with simple commands.

"for the first time, the developers themselves do not have to be experts in the field, or face the prospect of huge expense to bring in that technical knowledge from elsewhere." - I love that the building blocks for cool experiences are becoming more polished and easier to fit together.

It's a good time to be alive, that's for sure!

Can anybody with practical experience developing with Wit.ai comment on how accurately and consistently it works? Is there any new and better working software behind this, compared to the current breed of frankly abysmal voice recognition software (Siri, Nuance etc)?

It's pretty good! The standard caveats around having a quiet area with a decent mic apply, but I get good results just chatting at my laptop.

However the cool thing about Wit is that they are constantly updating their suite of NL recognizers. The more you use the service, the better it gets, and it does so without having to buy a new release of Dragon. :)

It's pretty good IMHO. For instance, when I say "Move to the next song" to control my music or "Go to the next room" for a robot I'm working on, both work as they should, even though it's easy to mix up the intents of these sentences.

Their homepage (https://wit.ai/) says "stream audio to the API, get structured information in return", but the API docs say "send natural language sentences (text) and get structured information (JSON) in return".

That's disappointing since the only problem I ran into with doing home automation via a web application was the speech-to-text, not processing commands once they were in text. A list of regular expressions works quite well for that.
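
To illustrate the "list of regular expressions" approach, here is a minimal sketch. It is not the poster's actual setup: the patterns, action names (`set_light`, `next_track`, `set_thermostat`), and the `parse_command` helper are all hypothetical. Each pattern is tried in order against the transcribed text, and named groups carry the parameters.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical command table: (compiled pattern, action name).
COMMANDS = [
    (re.compile(r"\bturn (?P<state>on|off) the (?P<device>[\w ]+?) lights?\b"), "set_light"),
    (re.compile(r"\b(?:go|move|skip) to the next (?:song|track)\b"), "next_track"),
    (re.compile(r"\bset the thermostat to (?P<degrees>\d+)\b"), "set_thermostat"),
]


def parse_command(text: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (action, parameters) for the first matching pattern, else None."""
    normalized = text.lower().strip()
    for pattern, action in COMMANDS:
        found = pattern.search(normalized)
        if found:
            # Named groups become the command's parameters.
            return action, found.groupdict()
    return None


print(parse_command("Turn on the living room lights"))
print(parse_command("Go to the next room"))
```

This works well for a fixed vocabulary, but every new phrasing needs a new pattern, which is the gap an intent service tries to fill.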

The HTML5 Speech Recognition API in Chrome kinda sucks. It does speech to text well, but reliably keeping the API listening for speech at all has been challenging. Even a bunch of code basically checking "has the webkitSpeechRecognition object borked itself yet? recreate it and restart listening" every two seconds doesn't work reliably.

I'd love a JavaScript API that can listen to the microphone, determine if anything has been spoken (versus silence or background noise), and when something that may be speech is detected, send it to another API endpoint that converts it to text.
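
The "has anything been spoken" part can be approximated with a simple energy gate. Below is a rough sketch of that logic in Python rather than JavaScript, not a real library. It assumes the audio has already been split into frames of float samples in the range -1 to 1, and the threshold, minimum length, and hangover values are arbitrary guesses that would need tuning for a real microphone. The segments it returns are the pieces you would forward to a speech-to-text endpoint.

```python
import math
from typing import List, Sequence, Tuple


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of audio samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def speech_segments(
    frames: Sequence[Sequence[float]],
    threshold: float = 0.1,
    min_frames: int = 3,
    hangover: int = 2,
) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges that look like speech; end is exclusive.

    A segment opens at the first loud frame, closes once more than `hangover`
    quiet frames follow in a row, and is kept only if it spans at least
    `min_frames` frames, so short clicks and bumps are dropped.
    """
    segments: List[Tuple[int, int]] = []
    start = None
    quiet = 0
    for i, frame in enumerate(frames):
        if rms(frame) >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet > hangover:
                end = i - quiet + 1  # index just past the last loud frame
                if end - start >= min_frames:
                    segments.append((start, end))
                start, quiet = None, 0
    if start is not None:
        # Audio ended while a segment was still open.
        end = len(frames) - quiet
        if end - start >= min_frames:
            segments.append((start, end))
    return segments
```

A fixed energy threshold confuses steady background noise with speech, so treat this as a starting point only.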

Edit: They do take audio input, woo :) Thanks for the correction. https://wit.ai/docs/api#toc_9

Their documentation has a pretty clear endpoint for sending raw audio: https://wit.ai/docs/api#toc_9

Thanks, don't know how I missed the links at the top of the page. I only checked each category on the left nav of the docs page.

The docs were outdated indeed, we fixed this, thanks!

They have JS, iOS, and Android SDKs. They all do speech to text, then NLP processing on the text.


I'm blown away by how much I can accomplish with wit.ai. I have my own personal jarvis / siri system thanks to this service.

Can't recommend this service enough.

Is your system open source? Would love to take a look at how it works.

A lot of love here for wit.ai. I was thinking about transactional contexts (NLP conversations) today and whether I could support that kind of workflow with Wit, and oh hey, there ya go, it's already in there as states! Great work!
