Show HN: Wit – Natural language for your app (wit.ai)
321 points by ar7hur on Sept 12, 2013 | 95 comments

It was so confusing figuring out what this service is supposed to do. Had to look up the documentation. In summary, from what I can gather:

1. It doesn’t do any speech recognition (speech -> text), so not sure why they put Siri in the title. It is also not clear how they can ‘hijack’ the text from Siri to do this analysis. The ASR engines they talk about (CMU, OpenEars) have pretty horrible accuracy (compared to Siri or google voice).

2. Looks like they do some form of text normalization/correction, again not clear how they do it.

3. The actual service they provide is a form of named entity recognition (confusingly named "intent", which clashes with the Android intent mechanism in their examples).

4. Also they let you define your own entities to match. You can train them using a drop-down menu. Not sure how you can train hundreds of examples using point and click.

How is this different from Alchemy (or many others)? Because this is open source(?) http://www.alchemyapi.com/products/features/entity-extractio...

Given that this service is for developers with an interest in NLP, it would have been good if they didn’t hide behind a snow-job title like “Siri as a service”.

> 1. It doesn’t do any speech recognition (speech -> text), so not sure why they put Siri in the title. It is also not clear how they can ‘hijack’ the text from Siri to do this analysis. The ASR engines they talk about (CMU, OpenEars) have pretty horrible accuracy (compared to Siri or google voice).

Currently most Wit users use Google or Nuance with great success. You can even use Android's offline speech rec.

That being said, CMU and OpenEars work well, as long as you provide them with good language models (which you can't do if you hack a quick project). Our plan is for Wit to automatically generate the right language models from your instance configuration.
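To illustrate why a domain-specific language model helps a recognizer like CMU Sphinx, here is a minimal sketch, assuming a toy add-one-smoothed bigram model trained on a few hypothetical example phrases from an app's configuration. It is neither Sphinx's model format nor Wit's generation method; `train_bigram_lm` and the phrases are illustrative. In-domain transcripts score higher, which is what lets a decoder prefer them over acoustically similar out-of-domain guesses.

```python
import math
from collections import Counter
from typing import Callable, List


def train_bigram_lm(sentences: List[str]) -> Callable[[str], float]:
    """Train a toy bigram model with add-one smoothing; return a log-prob scorer."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        contexts.update(tokens[:-1])                # every token that precedes another
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    # Possible "next" tokens: every seen word plus the end-of-sentence marker.
    vocab_size = len({w for s in sentences for w in s.lower().split()}) + 1

    def log_prob(sentence: str) -> float:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        total = 0.0
        for prev, cur in zip(tokens[:-1], tokens[1:]):
            # Unseen bigrams (including unknown words) get the smoothed floor.
            total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
        return total

    return log_prob


# Hypothetical phrases an app developer might have configured.
lm = train_bigram_lm(["turn on the lights", "turn off the lights", "set an alarm at seven"])
print(lm("turn on the lights") > lm("i hit an uncharted rock"))  # True
```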

> 2. Looks like they do some form of text normalization/correction, again not clear how they do it.
>
> 3. The actual service they provide is a form of named entity recognition (confusingly named "intent", which clashes with the Android intent mechanism in their examples).

We abstract the full NLP stack for the developer. How we do it is not really what matters to our developers, as long as it works :) Actually we use a combination of many different NLP and machine learning techniques.

> 4. Also they let you define your own entities to match. You can train them using a drop-down menu. Not sure how you can train hundreds of examples using point and click.

You don't need to train hundreds of examples. Plus, our users are not NLP/ML experts and they prefer a graphical UI. But it's true that it could still be more efficient; we have good features on the roadmap for that :)

> How is this different from Alchemy (or many others)? Because this is open source(?) http://www.alchemyapi.com/products/features/entity-extractio...

Alchemy is great as a set of NLP tools, some of them quite academic, but it's not designed from scratch to solve the problem we're trying to solve: enable the masses of developers to easily add a natural language interface to their app.

> How we do it is not really what matters to our developers, as long as it works :)

How you do it is most certainly what matters to developers, as soon as it doesn't work as expected :)

Fair enough.

Theory is when you know everything but nothing works.

Practice is everything works but no one knows why.

Here, theory and practice are combined: nothing works and no one knows why.


You could use SiriProxy (https://github.com/plamoni/SiriProxy) as input to this to handle the speech-to-text part.

I'm running this at home and it works great for adding custom actions to Siri

I was initially thinking that this is just a SaaS2SaaS wrapper around Google/Apple speech recognition. Hey, tbh, wouldn't that be a clever way to market? Hehe, that reminded me of SiriProxy.

But to give the authors their credit back, it's not what I guessed. It's much more a GUI around a complex toolset that would otherwise require you to dig deep into muddy water, bad docs, etc. So yes, it makes life easier, and it makes sense to use this in your app. I have not evaluated the quality of their service yet, but it's a startup; it's not going to stop improving (hopefully) :)

(Google/Apple are essentially powered by "Nuance", but with different qualities of training-data.)

More NLP/AI startups and more collaboration on HN, please!

2 cents: I hope people don't sell their first working prototype to the Google/Apple/Microsoft Empire, but try to get big through friendly startup collaboration and with the help of investors/angels.

Google doesn't use Nuance.

Oh, that's interesting. Do you have more information?

Maybe I'm out of line, but if you're planning on tearing into what's wrong with something, try to offer something positive as well. Your feedback was great and constructive, it's just not very nice.

I'll be honest, HN has a tendency (myself included) to have a first natural reaction of "how can I criticize this?" But just because something isn't faster than enterprise, or not-as-scalable, or not made in your framework of choice doesn't mean it's worthless. I think this project is amazing. Great job and I can't wait to see this mature.

Is there any way to view this page with the effects turned off? With all the text constantly appearing and disappearing, I haven't yet made it to the end of a sentence, and therefore can't form an opinion about it.

I think there was a picture of a robot on the screen for a few seconds, but that's all I remember.

Would disabling javascript do the trick?

Hi, fixed it! Thanks for that first feedback!

EDIT: All animations (except "What we do") should be disabled. Please, email me at willy@wit.ai if you still have issues.

Better, but it still seems to have lots of things happening on timers. So I still have things I'm trying to read disappearing out from under me.

I imagine as the developer you don't notice it. But as somebody trying to read a page, it's really jarring to have that happen. Enough so that I give up trying because I just want it to stop doing that to my eyes.

Any chance you could turn it off completely and just put some arrow icons on there?

Same here. I find "siri as a service" an interesting project, but not interesting enough to cope with a page that fades content in and out and makes my head spin.

Not involved with the project at all, just wondering what browser you're using? It seems pretty natural and all working fine for me on Chrome.

That looks really interesting.

You should make it clearer that you don't actually handle voice recognition. When I read: "Developers use Wit to easily build a voice interface for their app." I expect you to handle things from start to finish.

Also, let me try it! It's frustrating because the UI looks like you can experiment but it's only an animated demo (or am I missing something??) In particular the mic logo is used to record on Google and here it doesn't seem to do anything?

> You should make it clearer that you don't actually handle voice recognition.

You're right, we'll make it clearer on the landing page. A full out-of-the-box integration with some voice recognition engines (we love CMU Sphinx, open source) is on our roadmap.

> Also, let me try it!

We purposely didn't provide an "end-user" demo (something that would look like chatting with Siri) because we want to focus first on the developer experience: configuring Wit to understand their very own end users' intents. You can request an invite and try this in less than 5 minutes.

> You can request an invite and try this in less than 5 minutes.

Fair enough, but then you should make it clear: "Want to try it out? Request an invite and try this in less than 5 minutes!"

You usually have to wait several days when you apply for a beta like this.

I just accepted your invite :)

Requiring an invite seems like such a high barrier to entry if you actually want people to sign up. I usually skip any such thing, because instant gratification is great, and who knows when the invite will come in and whether I'll still care once it does.

Seeing this message, I bit the bullet and requested an invite anyway, and have seen no action in the couple of hours since... thus validating my initial reluctance.

I totally understand what you mean.

But as a bootstrapped startup, we have to make tradeoffs as our budget is limited. We have to accept invitations gradually today to keep our servers alive. We should be able to accept everybody within a few days at most. Sorry for the inconvenience.

I hadn't thought of the load. If it's there because you need to rate-limit users, then I completely understand. Would be ideal to take everyone instantly, but the world isn't always ideal.

The word "Siri" doesn't belong in the title or the article, unless a Trabant advertisement has the right to mention Mercedes-Benz in its promotional text. The project does a primitive kind of voice recognition, but it doesn't use Siri.

On this topic, I invite people to try out my non-prototype, non-project toy that uses Google's support for HTML5 speech recognition. It's pretty funny how wrong things go when you try to say something even a bit out of the ordinary:


If I say, "Now is the time for all good men to come to the aid of their country," an old teletype test sentence, the Google recognizer always nails it. If I say, "I hit an uncharted rock and my boat is being repaired," things go hilariously wrong, and every time differently.

I also found it going very well or very, very badly.

For instance: "In most cases, before beginning to listen, the browser will ask permission to monitor your microphone."

Came out as: "In Las Cruces, f listen, permission to monitor your microphone."

I think in the future, when computers are 100 times more intelligent than they are now, we'll laugh at these examples. But no one should doubt the difficulty of interpreting continuous speech without prior training for a given speaker. It's no wonder that speech interpretation on telephones tends to be limited to understanding a handful of possible responses: "Yes", "No", "Let me speak to a human!"
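The phone-menu point can be illustrated with a minimal sketch, assuming a hypothetical menu that accepts only three answers: with a vocabulary that small, even a garbled transcript can be snapped to the nearest allowed phrase. This uses plain string similarity from Python's standard `difflib`, not any real IVR system's or Wit's technique; `ALLOWED_RESPONSES` and `match_response` are illustrative names.

```python
import difflib
from typing import Optional

# The small, fixed set of answers a hypothetical phone menu understands.
ALLOWED_RESPONSES = ["yes", "no", "let me speak to a human"]


def match_response(transcript: str, cutoff: float = 0.6) -> Optional[str]:
    """Snap a (possibly garbled) transcript to the closest allowed response.

    Returns None when nothing is similar enough, so the caller can re-prompt.
    """
    normalized = transcript.lower().strip(" .!?")
    matches = difflib.get_close_matches(normalized, ALLOWED_RESPONSES, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(match_response("Let me speek to a human."))  # let me speak to a human
print(match_response("In Las Cruces"))             # None
```

Open-ended dictation has no such short list to fall back on, which is why it fails so much more visibly.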

I like the concept a lot. I'm going to have to read more about it. One thing that I'm unclear about is if this does voice->text, or if the developer does that and Wit handles translation of that into actions.

Just a heads up, but Get Started on the pricing page does nothing. It's a natural progression for me to go home page -> pricing -> "OK, looks good, let's get started."

Thanks for the feedback.

Wit takes the output of the voice recognition engine as input. It's quite robust to voice recognition errors. Most devs use Google's engine or the open source CMU Sphinx engine.
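The split described here (a speech engine produces text; Wit turns that text into an intent) can be sketched as two pluggable stages. This is a hypothetical outline, not Wit's API: `Interpretation`, `understand`, `fake_stt` and `toy_nlu` are illustrative names, and the keyword-based `toy_nlu` is only a stand-in, since Wit describes its own approach as machine learning rather than rules.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Interpretation:
    """Hypothetical structured result: what the user wants plus extracted entities."""
    intent: str
    entities: Dict[str, str] = field(default_factory=dict)
    confidence: float = 0.0


# Any speech engine (Google, Nuance, CMU Sphinx, OpenEars...) fits this slot.
SpeechToText = Callable[[bytes], str]
# The natural-language-understanding stage: transcript in, Interpretation out.
TextToIntent = Callable[[str], Interpretation]


def understand(audio: bytes, stt: SpeechToText, nlu: TextToIntent) -> Interpretation:
    """Run the two stages in sequence; the NLU stage only ever sees text."""
    return nlu(stt(audio))


def fake_stt(audio: bytes) -> str:
    # Stand-in recognizer: pretend the "audio" is already a UTF-8 transcript.
    return audio.decode("utf-8")


def toy_nlu(text: str) -> Interpretation:
    # Stand-in parser: recognizes "... alarm ... at <time>" and nothing else.
    words = text.lower().split()
    if "alarm" in words and "at" in words:
        idx = words.index("at")
        time = words[idx + 1] if idx + 1 < len(words) else ""
        return Interpretation("set_alarm", {"time": time}, 0.9)
    return Interpretation("unknown", {}, 0.1)


print(understand(b"Set an alarm at 7am", fake_stt, toy_nlu).intent)  # set_alarm
```

Swapping Google's recognizer for CMU Sphinx would only change the `stt` argument; the NLU stage is untouched.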

Fixed the Get Started link, thanks!

This is amazingly timely for me, I've been building my own version of jarvis using speakeasy-nlp (a node NLP library) and Chrome's builtin support for HTML5 webkitSpeechRecognition:

https://github.com/dpaola2/jarvis (work in progress)

I absolutely would love a better NLP api. Please let me in!

Jarvis-like systems are a great use case.

You should be able to sign up now.

You rock

Hey everyone. Wit guy here. We've been working on Wit the past few months and we think it's time to get your feedback. I'm happy to answer any questions you have.

Bringing Natural Language Understanding to the masses of developers is hard and we still have a lot of work ahead of us. Please don't hesitate to reach out to us!

Is Siri required? Is there an option for Android devices? I admit I didn't dive too deep into the website because I was looking at all the eye candy.

No, Siri is not required.

Here is a tutorial for quick Android integration: https://wit.ai/docs/android-tutorial

Nice concept. I just came back on here to let you know: I don't know what is happening on that page, but I left it open about 45 minutes ago and noticed my fan kicking in a lot. It was that page; it ended up taking 25% CPU. You didn't work on the iTunes software, did you??

Only messing, it was taking a lot of CPU though.

Haha, sorry about that.. I guess we are better at Clojure than Angular.js!

Working on a fix now.

The .ai extension is cute :-)

I don't know if people will remember it and be receptive to this touch but I like it.

Interesting! Nothing happened after I registered with my GitHub account using Opera, though. I also wonder how this compares to http://www.maluuba.com/ ?

> how this compares to http://www.maluuba.com

Wit is 100% open and flexible, you can create any intent you need for your app, you're not limited to a static set of domains/actions.

EDIT: @ragebol we are very interested in ROS and robotics, don't hesitate to get in touch with me arthur at wit dot ai. In the future we would like to provide an off-the-shelf human/robot communication module for developers.

For the robot I'm working on, we're using http://wiki.ros.org/wire to make probabilistic world models. It would be interesting if the API returned not just a confidence score for one possible interpretation but multiple interpretations with varying confidence scores, so you could determine the one that makes the most sense given your world model.
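One simple way to use such an n-best list, assuming the API returned (interpretation, confidence) pairs (which the thread does not confirm), is to weight each confidence by how plausible that interpretation is under the world model, then renormalise. The product rule below is a naive choice for illustration, not how WIRE or Wit actually combine evidence; `rerank` and the prior values are hypothetical.

```python
from typing import Dict, List, Tuple


def rerank(
    nbest: List[Tuple[str, float]],
    world_prior: Dict[str, float],
    default_prior: float = 0.01,
) -> List[Tuple[str, float]]:
    """Score each interpretation by NLU confidence times world-model plausibility."""
    scored = [(interp, conf * world_prior.get(interp, default_prior)) for interp, conf in nbest]
    total = sum(score for _, score in scored)
    if total == 0:
        # The world model rules everything out: fall back to the parser's ordering.
        return sorted(nbest, key=lambda pair: pair[1], reverse=True)
    normalised = [(interp, score / total) for interp, score in scored]
    return sorted(normalised, key=lambda pair: pair[1], reverse=True)


# The parser slightly prefers "can", but the world model has only seen a cup.
nbest = [("grab(can)", 0.6), ("grab(cup)", 0.4)]
prior = {"grab(can)": 0.05, "grab(cup)": 0.9}
print(rerank(nbest, prior)[0][0])  # grab(cup)
```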

Awesome! I made a wrapper around Maluuba for ROS before (https://github.com/yol/maluuba_ros), and maybe I'll get around to a Wit wrapper for ROS as well.

With Maluuba, we can't make a command like "Introduce yourself" or "Grab that can for me" because of the limited set of categories. Wit should be able to handle those as well, from the looks of it.

We (Maluuba) are actually working on improving nAPI and now have the ability to define your own domains and actions. It's not public yet, but we're planning on releasing it mid November.

Nice! I'll check back on Maluuba in November.

How "open" is Wit? Open source or open APIs allowing users to modify your algorithms/training data? I assume there would be a significant cost to being able to run our own instance of Wit on-premise, right?

I applied for alpha access using the github username "marks"

> How "open" is Wit?

Openness is one of our core values. We're inspired by companies like GitHub.

- Regarding data, we encourage users to share their data, making “public” Wit instances free of charge (à la GitHub). We’re also working on standard formats for NLP/ML data (models, sets, etc).

- Regarding open-source, we plan to release our algorithms and infrastructure piecemeal (à la Prismatic). We’ll announce our open source plans in the near future.

I made a simple wrapper at https://github.com/yol/wit_ros

Are there any test cases I can use, which utilize the full power of the API?

Regarding Opera, it has been tested and everything should work as expected!

Can you email me your GitHub username at willy@wit.ai to make sure we got your request? Thanks!

Your message isn't clear. AFAIK there is no official way to interact with Siri or Google voice rec.

It seems like WIT will take the text that has already been translated from a user's voice to text and make it easily accessible to my application but how does WIT access the text generated from a Siri request in the first place for example? Does WIT have some other way of getting at this data that has already been converted from voice to text by Siri or Google or some other speech-to-text engine?

> AFAIK there is no official way to interact with Siri or Google voice rec.

Actually there are ways. On Android devices, voice rec is available to devs (even offline if the user enabled it!). We have a simple tutorial about how to integrate on Android https://wit.ai/docs/android-tutorial

Right now on iOS you have two options (neither involves Siri, which Apple keeps closed):

1/ Do the voice rec server-side (Siri does that)

2/ Use OpenEars to do it client-side

Server-side, you have many voice rec options, including open source CMU Sphinx.

Providing a fully integrated solution with speech rec out of the box is on our roadmap.

You could read witai as "witaj" in Polish, which means "hello" in a slightly formal manner.

Sounds like they are trying to be this, http://www.youtube.com/watch?v=Ko-r4gpM3Rc

Except Stremor has a Query Language so you don't have to do anywhere near as much heavy lifting.

Hi Stremor! :)

Looks like you focus on search, summary, entity and sentiment extraction with a rule-based approach.

Wit's focus is to power human/machine interfaces, and our priority is to provide developers with a 100% configurable solution, with no prior assumption on their domain. And we don't believe in rules, we chose a machine learning approach.

No, PleaseAPI converts natural language to database-like structures/commands, with no need for a human to come up with every way you could say something, because the vocabulary is built in.

Unlike Wit, it also offers the option to use the APIs that are already integrated, or to Bring Your Own Backend, so that you can have a mix of info/responses from your own system or leverage what is already there.

Where can I sign up for that?

It will be on Mashape shortly, the Natural Language Part of Speech Tagger just went live Friday. Documentation takes time after we write code and is less fun to write. :-)

Hi guys, here you can find the full API Documentation: https://www.mashape.com/lxbrun/nlp-and-voice-interface-for-a...

This would be great for open source projects, but I feel like I would trip over a very large pile of patents if I tried to build a product around it. I don't have any relevant experience myself though, so it's just a feeling.

The pricing model doesn't scale realistically and would require a subscription service for users. An app with 1M+ installs could do 1M+ calls per day making this service $24k / month.
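The commenter's estimate implies a per-call rate. Assuming a 30-day month, and inferring the rate from the $24k figure rather than from Wit's published pricing, the arithmetic is:

```latex
\[
10^{6}\,\tfrac{\text{calls}}{\text{day}} \times 30\,\text{days}
  = 3\times10^{7}\,\tfrac{\text{calls}}{\text{month}},
\qquad
\frac{\text{USD}\,24{,}000}{3\times10^{7}\,\text{calls}}
  = \text{USD}\,0.0008 \text{ per call}
\]
```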

That's why at the bottom of the pricing page we encourage you to contact us if you have more than 1M calls per month.

Meanwhile you can decide to share your configuration data and get Wit for free (à la Github) :)

I'm interested in using it in combination with a robot (NAO). Could you provide a tutorial for it? Not sure whether ROS on NAO will be necessary or not.

This looks neat, I will definitely keep it in mind.

I would be wary of using the GitHub Octocat mascot, though. I believe the Octocat is protected under copyright.

Hi Tony,

After reading http://octodex.github.com/faq.html, we thought it was okay to put this image given that we advertise and reference GitHub a lot, we heavily integrate with it and we love Octocat!

Do you think we should remove it?


The landing page looks like a minefield of legal issues. Marketing everything explicitly with references to Siri is asking for a lawsuit from Apple.

:) I just wanted to say that it looks like a really cool project, so I would not want some troll to come and stomp it out.

How does this compare and contrast to http://www.ask-ziggy.com/ ?

We share the same vision that voice becomes the key human/machine interface, especially for the upcoming generation of wearable devices, home automation, etc.

I don't know if Ask Ziggy is 100% self-service for the developers. That's a key requirement for us.

At Ask Ziggy, we are also 100% self-service.

Is there a pricing page?

Not at this point. Feel free to sign up for the beta on our site and take it for a spin, I'll make sure to get you your credentials quickly. PS: by the way, big fan of historio.us...

Already did, thanks! It looks pretty nice, I'd like to implement it in a few of my apps for an easier UI. I'm glad you like historious, thanks!

Hah! My startup is doing a similar platform on a somewhat bigger scale. I realized I did pretty badly at the hackathon :( http://on.aol.com/video/jarvis-2-0-demo-at-hackathon-sf-2013...

I'd love to see the full video of your demo, it looks cool!

This looks really promising - can't wait to see where it is in a year with community additions.

This "fade-in as you scroll" thing is annoying. Get-rid-of-it-right-now kind of annoying.

Just removed it, thanks :)

How do you compare to Ask Ziggy? It seems you have created an interface similar to theirs

Actually I think the developer UX is quite different... but you should try both and form your own opinion.

What about languages other than English?

French + English available now. Which language would you like to get?

  - Spanish (Mexican, Castellano, others?)
  - Chinese (Mandarin, Cantonese)
  - Hindi
These would be logical next steps with some important commonalities: broad base of native speakers, high importance in the US market (maybe less so for Hindi), and very important dialect differences. Mandarin, Cantonese, Hindi, and Russian could also force the issue of non-Romanized character sets.

Dutch... But I guess the odds are low for that, because of the smaller user base. There are some companies focusing on care robotics emerging in the Netherlands, though, which could use a service like this.

Japanese, due to demands for translation and robotics.

As a Service? Why not go open source?

Because people like to make a living from it when they work on a product full time.

How would you make money with that? Honest question.

It's obvious to most people (regardless of whether you think that free software is an important principle) that the techniques described in that section will result in you making MUCH LESS money, at least in the current context of these people selling their API.

Ignoring any open vs. closed source arguments, this kind of thing would work much better as library than some third party service. There's no way I'd make an application that depended on a third party service over the internet for basic UI interaction.

You're absolutely right, that's why we plan to release a "run on your server / embed on your device" runtime.

Having it online for configuration has at least two advantages:

1/ Easier to start (1 minute and you're playing with it)

2/ Leverage existing, "live" data to build your configuration

In that case, I'll be on the lookout when it's released!

Great job on this. Are there any plans for Ruby or javascript tutorials, or am I being too optimistic?

Here is a node.js tutorial: https://wit.ai/docs/nodejs-tutorial

We'll release a Ruby tutorial soon!

Love you guys already. Thank you!

Looks great. Will this work with web apps too, or only mobile apps?

I wonder how easy it would be to get it working on the desktop, like Palaver does on linux.

It actually works with web apps as well.
