
Show HN: Siri-as-a-Service Speech API - ar7hur
https://wit.ai/blog/2014/02/12/speech-api
======
kleiba
Nice demo! But someone has to play devil's advocate, so please allow me: why
would I send commands about what I do in my house to your servers? Especially
since you apparently even store them there (see Inbox)?

Would you consider offering a version of your program that I can download and
run on my home server? That would be cool...

Finally, since the title is "Siri as a Service", where do you expect the
microphone to be in a Home Automation setting? Do you envision people using
their cell phones for that?

Thanks.

~~~
ar7hur
Yes we plan to release an offline lightweight runtime that allows you to run
Wit locally.

Many home automation systems will have a built-in microphone (or most probably
an array of microphones, which copes well with background noise).
But your smartphone might be useful in case you are in the garden for
instance!

~~~
13throwaway
I'm very happy with this answer.

------
danlash
We just did a hack-week project using wit and we were blown away at the
accuracy of it. Somehow it figured out 'show me the stories assigned to me'
meant our 'find-owned-stories' intent ... and that was without any training!
Maybe it was luck, but after using it for hours and hours it really is
remarkable.

------
endlessvoid94
Oh my. Cannot wait to give this a shot. I played with Wit a while back when
they launched, and was seriously impressed. This will make it much easier to
develop technology for their NLP.

One step closer to Jarvis.

------
imtu80
Here is a link to an open source project:
[http://cmusphinx.sourceforge.net/](http://cmusphinx.sourceforge.net/)

Looks like they are using it for their project
[http://cmusphinx.sourceforge.net/2013/09/processing-speech-r...](http://cmusphinx.sourceforge.net/2013/09/processing-speech-recognition-results-with-wit-ai/)

~~~
ar7hur
Wit and CMU Sphinx are friends with benefits :)

Wit leverages several speech engines, including Sphinx.

And Sphinx speech-to-text users can send text to Wit to do Natural Language
Understanding (turn text into actionable data).

~~~
ragebol
Does Wit also use the fragments of speech that get sent over the API as a
dataset for training Sphinx?

~~~
ar7hur
Yes!

------
forgotprevpass
This is the first startup in the last 6 months that has me genuinely excited.
Congrats!

I know this is considered bad-mannered, but what does the NLP behind the
scenes look like? I'm curious. :)

------
ragebol
I'm very interested in using this for robotics, especially RoboCup. In the
latter case, I can't use an internet connection; the robot must be fully
autonomous and independent.

Can I run an instance of a wit server myself? And perhaps update its speech
models regularly?

~~~
ar7hur
Yes you'll be able to run a lightweight Wit runtime locally very soon.
Learning will still happen on the server. The embedded client will upload its
usage data to feed training, and download updated models (for both speech and
natural language understanding).

~~~
ragebol
What does "lightweight" entail? Only that the learning is online? That would
still be awesome.

~~~
ar7hur
Lightweight = pure C + runtime only (as opposed to the training side, which
needs lots of CPU and memory resources)

------
balakk
Nice - does this work for unconstrained speech - as in would this work for
use-cases like transcription? What's the maximum audio length supported? Is
accent an issue? Sensitivity to background noise? A FAQ would be nice to have.

~~~
weixiyen
FAQ seconded. This looks like a potentially amazing product, especially love
the potential home automation use cases.

------
untog
Does anyone have any recommendations for the other side of this - translating
text into speech?

I know there are a million APIs for this, but most sound awful. I'd love a
service that sounds as good as Siri.

~~~
weirdcat
IVONA has some great voices:
[http://www.ivona.com/en/](http://www.ivona.com/en/)

Alas, all of their SaaS plans have the same price: "negotiable". I never felt
like negotiating, so I have no idea if they're affordable; unfortunately
custom pricing usually means it's not the case.

[http://www.ivona.com/en/saas/offer/](http://www.ivona.com/en/saas/offer/)

~~~
untog
Yes. This is what I absolutely hate - I will never 'ask for prices' because I
know it involves a harass-y phone call in my future. Such a shame people
aren't more open.

------
snippyhollow
Which speech recognition engine do you use? Kaldi? Sphinx? Good ol' HTK?
Homemade? With deep learning acoustic models?

~~~
ar7hur
We use several speech recognition engines in parallel, including customized
Sphinx. We don't use deep learning acoustic models yet, but that's in the
works.

~~~
x3c
Off topic but a question regarding text to intent object parsing:

Is there any date entity in the wit directory? How do I parse a date such as
17th Feb, feb 27 or 13/02?

Datetime only has contextual date selection i.e. today or tomorrow.

~~~
ar7hur
`wit/datetime` does parse both absolute dates like "17th Feb", "feb 27" or
"13/02", and relative dates like "tomorrow". If you find a date that is not
parsed that's a bug.
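
To make the distinction between absolute and relative dates concrete, here is a minimal Python sketch that handles only the forms quoted in this exchange. It is not how `wit/datetime` works internally; the function names, the day/month reading of "13/02", and the choice to assume the current year are all assumptions made for illustration.

```python
import re
from datetime import date, timedelta

# Three-letter month prefixes mapped to month numbers.
MONTHS = {name: i for i, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}


def _safe_date(year, month, day):
    """Build a date, returning None for impossible values like 31/02."""
    try:
        return date(year, month, day)
    except ValueError:
        return None


def parse_date(text, today):
    """Parse a few absolute and relative date forms; return None otherwise.

    Hypothetical helper: the year is assumed to be today's year.
    """
    s = text.strip().lower()

    # Relative dates.
    if s == "today":
        return today
    if s == "tomorrow":
        return today + timedelta(days=1)

    # Absolute, day first: "17th Feb", "3 march".
    m = re.fullmatch(r"(\d{1,2})(?:st|nd|rd|th)?\s+([a-z]{3})[a-z]*", s)
    if m and m.group(2) in MONTHS:
        return _safe_date(today.year, MONTHS[m.group(2)], int(m.group(1)))

    # Absolute, month first: "feb 27".
    m = re.fullmatch(r"([a-z]{3})[a-z]*\s+(\d{1,2})(?:st|nd|rd|th)?", s)
    if m and m.group(1) in MONTHS:
        return _safe_date(today.year, MONTHS[m.group(1)], int(m.group(2)))

    # Numeric: "13/02", read as day/month (an assumption).
    m = re.fullmatch(r"(\d{1,2})/(\d{1,2})", s)
    if m:
        return _safe_date(today.year, int(m.group(2)), int(m.group(1)))

    return None
```

Passing `today` explicitly keeps the relative forms testable, e.g. `parse_date("tomorrow", date(2014, 2, 12))`.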

------
mikeash
I love to see this service improve. I just wish I could come up with something
really neat to build with it!

~~~
sdegutis
Jarvis!

------
jahansafd
This is pretty cool!

Google's Web Speech API can also be used to build something similar.

Here is the Web Speech API Demonstration:
[https://www.google.com/intl/en/chrome/demos/speech.html](https://www.google.com/intl/en/chrome/demos/speech.html)

~~~
ar7hur
Google has achieved impressive accuracy for speech-to-text (especially for
open-domain large vocabulary speech recognition).

If you use Google Web Speech though, you receive text and you still have to do
NLP to "understand" the user intent. The other problem is, if Google does not
know about specific words (like your company or product name), you have no way
to customize the engine (no "Add to dictionary" entry point in the API!).
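
As a rough illustration of the step that remains after speech-to-text, here is a toy Python sketch that maps a transcript to an intent by keyword overlap. It is deliberately naive and is not Wit's method; the keyword table, the `lights-on` intent, the `guess_intent` name and the 0.5 threshold are all hypothetical (only `find-owned-stories` comes from this thread).

```python
import re

# Hypothetical intents, each described by a handful of keywords.
INTENT_KEYWORDS = {
    "find-owned-stories": {"stories", "assigned", "me"},
    "lights-on": {"turn", "on", "lights"},
}


def guess_intent(text, threshold=0.5):
    """Return the intent whose keywords best overlap the text, or None."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    best_intent, best_score = None, 0.0
    for intent, keywords in INTENT_KEYWORDS.items():
        # Fraction of the intent's keywords that appear in the text.
        score = len(words & keywords) / len(keywords)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent if best_score >= threshold else None
```

A call such as `guess_intent("show me the stories assigned to me")` picks `find-owned-stories`. Keyword matching like this breaks down quickly on paraphrases, which is the gap a trained NLU service is meant to fill.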

------
yeukhon
My friends and I did a similar thing back at HackMIT.
([http://hackmit.challengepost.com/submissions/18093-jarvis](http://hackmit.challengepost.com/submissions/18093-jarvis))
They did most of the work, so kudos to them mostly. If only we had had the
time to push our idea forward, today it would be us lol

The idea is exactly like Wit and it was supposed to integrate with all kinds
of web services out there, to be as close to Siri as possible, but do a lot
more than just opening a new app or visiting a weather forecast website.
Basically, your virtual assistant operates like IFTTT. In the end, I think
speech in home automation is the future gold mine.

------
swalsh
Click on, and play for a bit. This is actually a very impressive platform.

------
Mizza
This is cool, was just talking about doing something like this the other day
to create a "Natural Unix": "copy the file to the desktop folder, then run
the script"...

------
dnns
Really cool service. Had to try it out today and built a little prototype to
interact with maps using speech recognition. If someone is interested:
[https://github.com/dwilhelm89/SpeechMap](https://github.com/dwilhelm89/SpeechMap)

------
ragebol
Awesome, can't wait to try this out.

I guess I'll have to update my ROS wrappers at
[https://github.com/LoyVanBeek/wit_ros](https://github.com/LoyVanBeek/wit_ros)
as well.

------
duyhuynh
This looks great. What are the alternatives to wit.ai today? In general, what
is the best natural language processing API / library / service out of the
box today?

------
fuzzythinker
For more on previous discussion on HN:
[https://news.ycombinator.com/item?id=6373645](https://news.ycombinator.com/item?id=6373645)

------
rahij
Very cool! On a side note, Is the voice Drew Houston?

~~~
ar7hur
No -- almost. Try again!

~~~
rahij
Ah Paul Graham it is!

------
lcasela
Coolest thing I've seen on HN for a while.

------
Sir_Cmpwn
Why do you not support Firefox? WebRTC and WebAudio have hit Firefox stable.
Do feature detection, not UA detection.

~~~
blandinw
This:
[https://bugzilla.mozilla.org/show_bug.cgi?id=957724](https://bugzilla.mozilla.org/show_bug.cgi?id=957724)

~~~
Sir_Cmpwn
Are you involved with the service? This seems like a very silly bug to
completely drop Firefox support over.

------
jamesfranco
[http://www.maluuba.com](http://www.maluuba.com) Does the same as wit.ai I
guess

~~~
ar7hur
Maluuba is a nice Siri-like virtual agent app for Android, but Maluuba's API
only offers predefined intents (across 23 categories). Developers cannot
create intents for their own domain.

Also, to my knowledge they rely on Android's speech recognition and the API
accepts only text, not audio streams.

~~~
jamesfranco
I contacted them a while ago and they stated that they will let developers
create their own intents soon.

[http://www.ask-ziggy.com](http://www.ask-ziggy.com) is also similar

------
rayiner
Wait, what's Wit's (hard to Google...) connection with Siri and SRI?

------
lcasela
I just tested the html demo tutorial thing. This thing works amazingly.

------
BenjaminN
This is awesome. Definitely using this soon.

~~~
ar7hur
Thanks! You can sign up on [https://wit.ai](https://wit.ai)

~~~
nighthawk24
All cool, but who is behind this project?

------
smefi
Very interesting, I will check it!

------
mtreder
Too cool to be true :). Love it!

------
eric_khun
awesome idea!

