
Amazon Lex – Build Conversational Voice and Text Interfaces - appwiz
https://aws.amazon.com/blogs/aws/amazon-lex-build-conversational-voice-text-interfaces/
======
djyaz1200
Seems like these chat bots have A LOT of lock-in. So the question is which one
to pick? This one? IBM Watson? Others? Which horse are you betting on, and why?

Amazon making this is great because it's developer friendly and will surely
fall in price and improve in quality over time. They also have to make it work
well because it's a critical ordering channel for them going forward.

Seems like IBM has been focused on this for years, though. I'm worried they
will try to make this a profit center rather than continually dropping the
price.

Other companies working on this I should be aware of?

Thoughts?

~~~
olavgg
I'm working for a startup called Boost.ai where we have decided to build our
own deep learning model and create our own training data. We use several
models that predict what word comes next and handle misspellings and
dialects. We also have memory, so it can remember context when you ask a new
question, or predict that you want to start a new context.

If you want something kick-ass, I highly recommend building it yourself. The
bar set by IBM's Watson, Nuance's Nina, or IPSoft's Amelia isn't actually very
high. For non-English languages, anyone with some knowledge of NLP and deep
learning will easily surpass them.

~~~
choxi
Any pointers on where to get started? I've messed with Karpathy's RNN example
using Shakespeare and PG essays, but don't know where to go from there.

~~~
olavgg
Karpathy's example tries to predict the next character; there is a fork of it
that tries to predict the next word.

[https://github.com/larspars/word-rnn](https://github.com/larspars/word-rnn)
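To get a feel for what word-level prediction means, here's a toy next-word predictor that just counts bigrams. A word-level RNN does the same job with a learned model instead of raw counts:

```python
# Toy next-word prediction with bigram counts (no RNN, just frequencies).
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, which words follow it and how often."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    return followers.most_common(1)[0][0] if followers else None

model = train_bigrams("to be or not to be that is the question")
print(predict_next(model, "to"))  # prints "be"
```

Swapping the counting step for a recurrent model (as word-rnn does) is what lets the prediction condition on more than just the previous word.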

An RNN is a great approach if you want to play around with text generation and
deep learning. But for a chatbot, deep learning alone is not there yet. We
create our own intents based on the domain and predict the intent of each
question. We have also made it look smarter by creating an intent hierarchy,
where we make multiple predictions per question with the goal of drilling the
question down the intent tree. That way we know that the question is about a
bank card and can figure out whether you want a new one, want to block it,
increase credit, set limits, and so on.
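Roughly, the drill-down could be sketched like this (a toy tree, with keyword sets standing in for the per-node classifiers you'd actually train):

```python
# Toy sketch of hierarchical intent drill-down. Each level narrows the
# question, e.g. "card" -> {"block", "new", "limits"}.
import re

INTENT_TREE = {
    "card": {"block": {}, "new": {}, "limits": {}},
    "loan": {"apply": {}, "rates": {}},
}

KEYWORDS = {  # stand-in for a real classifier at each tree node
    "card": {"card"}, "block": {"block", "stolen"}, "new": {"new", "replace"},
    "limits": {"limit", "limits"}, "loan": {"loan"}, "apply": {"apply"},
    "rates": {"rate", "rates", "interest"},
}

def drill_down(question, tree=INTENT_TREE, path=()):
    """Walk down the intent tree as far as the question lets us."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    for intent, subtree in tree.items():
        if words & KEYWORDS[intent]:
            # matched at this level; keep drilling into the subtree
            return drill_down(question, subtree, path + (intent,))
    return path  # deepest intent path we could resolve

print(drill_down("I lost my card, please block it"))  # ('card', 'block')
```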

------
mtthwmtthw
Are there any open source projects like this? I would imagine it's a machine
learning model to match intent, and then a custom NER that extracts slots. I'm
sure the actual models are pretrained on lots of data, and the process is
probably a lot more complex. Given the abundance of vendors in this space, it
has to at least be a possibility.
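To make that concrete, here's a toy sketch of the intent + slot idea, with made-up intents and regexes standing in for both the classifier and the NER model:

```python
# Toy intent + slot parser: a "classifier" picks the intent, then
# "NER" extracts slots. Both are regex stand-ins for real models.
import re

INTENT_PATTERNS = {
    "BookFlight": re.compile(r"\b(fly|flight)\b"),
    "OrderPizza": re.compile(r"\b(pizza|order)\b"),
}
SLOT_PATTERNS = {
    "city": re.compile(r"\bto ([A-Z][a-z]+)\b"),
    "size": re.compile(r"\b(small|medium|large)\b"),
}

def parse(utterance):
    """Return (intent, slots) for an utterance, or (None, {}) if unmatched."""
    intent = next((name for name, pat in INTENT_PATTERNS.items()
                   if pat.search(utterance.lower())), None)
    slots = {slot: m.group(1)
             for slot, pat in SLOT_PATTERNS.items()
             if (m := pat.search(utterance))}
    return intent, slots

print(parse("Book me a flight to Boston"))  # ('BookFlight', {'city': 'Boston'})
```

A production system replaces the intent regexes with a trained classifier and the slot regexes with a sequence-labeling NER model, but the interface is the same.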

~~~
sprobertson
For very simple human language to intent parsing (no prompting or context
keeping), here's an RNN model in Torch that learns intents and slots based on a
bunch of template sentences:
[https://github.com/spro/intense](https://github.com/spro/intense)

------
zitterbewegung
Making the backend of Alexa and offering it to users is a killer feature. I
have toyed with the idea of making an app with a voice interface. I was able
to make an Alexa app, and looking at the preview pictures it looks similar.

~~~
mtrn
> Making the backend of Alexa and offering it to users is a killer feature.

Same scheme that started AWS.

------
hackcrafter
This is neat but I've been on the lookout for a service that takes _speech_
and responds with _speech_ that you can use in a domain-specific app.

Basically, Alexa as a service?

Is there something out there that does this?

For certain things that require hands-free usage, this would be a killer
feature.

e.g. a workout app that tells you your next rep and weight, and you can
respond with what weight/reps you did to add to your log!

~~~
ronack
Pairing Lex with Polly should get you there.

[https://aws.amazon.com/polly/](https://aws.amazon.com/polly/)
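A rough sketch of the pairing with boto3 (the bot name, alias, and voice are placeholders; `post_text` is the Lex runtime call and `synthesize_speech` the Polly one):

```python
# Sketch: text in -> Lex bot picks intent and replies -> Polly speaks the
# reply. "WorkoutBot"/"prod" are placeholder names, not a real bot.
def ask_and_speak(text, user_id, lex=None, polly=None):
    """Send `text` to a Lex bot, then synthesize the reply with Polly.
    Returns (reply_text, mp3_bytes)."""
    if lex is None or polly is None:
        import boto3  # real AWS clients if none are injected
        lex = lex or boto3.client("lex-runtime")
        polly = polly or boto3.client("polly")
    reply = lex.post_text(
        botName="WorkoutBot",   # placeholder bot name
        botAlias="prod",        # placeholder alias
        userId=user_id,
        inputText=text,
    )["message"]
    audio = polly.synthesize_speech(
        Text=reply,
        OutputFormat="mp3",
        VoiceId="Joanna",       # any available Polly voice
    )["AudioStream"].read()
    return reply, audio
```

For the fully hands-free case you'd still need speech-to-text on the input side, which (at this point) Amazon hasn't broken out as a separate API.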

~~~
spitfire
I wish they had a demo of Polly on the site. It's pretty bad to have page
after page of text about a voice synthesis service and not have "Click here
for demo" above the fold on the first page.

I too am very interested in a domain-specific application of Lex + domain
knowledge + Polly.

~~~
OJFord

> It's pretty bad to have page after page of text about a voice synthesis
> service and not have "Click here for demo" above the fold in the first page.

There's a table of demo clips in male/female voices for a few languages at the
bottom of the first page.

------
david927
My daughter, a 6th grader, wants to create a robot that can take voice input
and act on it. I think this could be a great place to start.

~~~
83457
In intro to comp sci we had an open project using an iRobot base (a Roomba
without the vacuum), so my group of some of the more experienced students
pieced together a robot dog using a laptop with an Xbox Kinect on top. You
could say commands and it would obey, such as sit (moves back and stops),
roll over (spins), bark (sound from the speakers), etc. It would also track a
primary user, and if you said "follow" it would stay a certain distance from
you and follow. This functionality was "easy" given the Kinect SDK's
capabilities, as it was just a matter of connecting events from the Kinect
code to commands sent to the robot. Given more time we could have done things
like hand movements or throwing motions with the Kinect's skeletal tracking
and vectors.

~~~
david927
That sounds great! You should post a video of it.

Personally, I was thinking of something cheaper, such as a Raspberry Pi with a
stepper motor.

~~~
83457
Unfortunately I didn't get video (as silly as that sounds), and the person who
did never sent it to me.

I even put my son's furry dog halloween costume on it :)

------
wyldfire
Are all these Amazon stories from re:Invent? Would it make sense to combine
them into a single index today?

~~~
ceejayoz
> Would it make sense to combine them into a single index today?

No. You'd have two thousand comments about totally disparate systems to filter
through.

~~~
wyldfire
Yeah, you're right, that would be confusing.

------
jlkjr2
Does anyone have experience with the Web Speech API?
[https://www.google.com/intl/en/chrome/demos/speech.html](https://www.google.com/intl/en/chrome/demos/speech.html)
What speech engines are behind these? Do they run in the browser? I want to
implement speech input commands for an SPA for 3D design. Thanks.

------
dominotw
> you know how simple, useful, and powerful the Alexa-powered interaction
> model can be.

This is so comical. Alexa is phenomenally bad at conversation; it is so bad
that there are almost no successful "apps" built around it, despite it having
an API and an app platform.

Alexa is decent at a single command/action model, nothing more than that.

~~~
chishaku
What are the best alternatives?

~~~
sgwealti
The "OK Google/Google Now" voice interactivity (or whatever the official name
is) on Android phones is much better than Amazon's Alexa/Echo. It actually
remembers context from sentence to sentence, so you can say "What is the
population of Chicago?" and it will answer. Then you can say, "How long would
it take me to drive there?" and it understands that you are still talking
about Chicago.

Every interaction with Alexa is command, response, and that really limits what
you can do with it.

~~~
jon-wood
There's nothing about Alexa's API that would prevent doing that; it already
has the concept of a conversation, and I think we'll see rapid improvement in
how people make use of that.

~~~
sgwealti
I agree. I'm just surprised that none of the base functions provided by Amazon
take advantage of it. Compared to Google it feels really basic and limited. I
say this as the owner of two Alexas who uses them a lot. I'm constantly
frustrated by the limitations, even though it works well for certain things.

------
ronack
It's great to see so much competition in this space. This initial offering
looks limited in its ability to handle more complex branching and context
building but hopefully it will evolve. I'm also a little surprised at the
pricing given that Google and Facebook offer comparable services for free.

~~~
sparky_
Ditto on the competition space. In my mind, letting Amazon and Google fight
out the epic war of machine learning / intelligence will only yield better
APIs for those of us wanting to leverage such technologies.

------
danso
Very nice! With all the great things they've announced today that apply to my
work (Lightsail, Rekognition, Athena), I'm crossing my fingers that their next
announcement is making speech recognition available as a separate API. And
Python 3 support for Lambda.

------
simonebrunozzi
Great presentation by Matt Wood. I've been working with him for years (until
2014) and I think he's a great presenter (among other things).

------
keyboardsamurai
What exactly is a "speech request" in Amazon Lex parlance, though? Their
pricing is missing a specific timeframe, isn't it?

------
sumsted
It's great to see the utterance/intent/slot model made available outside of
Alexa. It's been easy to work with so far in Alexa. And you see a similar
model from Nuance with Mix, though Mix seems to be stuck in beta.

Lex looks a lot like Alexa, though the setup flow is a bit different. It also
has prompts for each slot needed. That's nice.

------
Roritharr
Does this work with languages other than English?

