
Two Years of Voice-Based Assistants, Echo and Home - dgacmu
https://da-data.blogspot.com/2016/12/22-months-of-voice-based-assistants.html
======
Waterluvian
What I need is _natural_ natural language processing to become so good that it
just works. I don't respect voice as an input method because I feel
uncomfortable using it, since I have to pre-plan my question rather than
stream my consciousness like I do with my family and peers. I want to be able to
have this conversation:

"Computer, how much does a.. hmm what's it called.. uhh.. a loud keyboard,
what's that called?"

"A mechanical keyboard?"

"Yes. The ones from the.. uh I forget the company name."

"Corsair? Razer?"

"Razer! How much does their newest mechanical keyboard cost?"

"The newest keyboard from Razer is...<etc>"

~~~
dqv
>What I need is natural natural language processing to become so good that it
just works.

That's _supernatural_ language processing. You're asking the system to
recognize a thought you have in your mind that you can't effectively
communicate with words.

The loud keyboard - makes sense. But the company name? My first thought was
IBM or Cherry.

I think it can be improved, but I'm never going to expect voice recognition to
fill in the blanks for something I can't describe with my voice.

~~~
frankomonster
Filling in the blanks is something that humans do all the time with
conversation. Just the other day I was asking my girl what was the last name
of that woman named Susan from Britain's Got Talent. Conversation went like
this:

Me: Babe, do you remember the last name of Susan from Britain's Got Talent?

Her: Susan... kinda...

Me: I think it began with a "B"

Her: Boyle!

Maybe today's systems would understand me if I phrased the question in a
context they understand, but why bother taking the time? I can just talk to a
human.

It'll get interesting though as computers continue to improve.

~~~
harisenbon
I was curious, so I asked Siri your question: what is the last name of Susan
from Britain's Got Talent? She successfully answered Boyle.

------
skoocda
Children's voices are a surprisingly tough problem in speech recognition,
mostly because there isn't much labeled data with children's voices: ASR
follows the trend seen in other deep learning fields of working best for North
American adult males.

When children talk, their pitch is significantly different from any adult's
(male or female) and their enunciation is usually poor. Being robust to that range
requires a ton of data and a very deep neural network. It will definitely be
solved earlier on the cloud: don't expect super-adaptable speech recognition
to be available on your phone any time soon.

~~~
gok
There's also a legal/moral question of whether it's ok to capture the speech
of children to build better models for this kind of thing.

~~~
grzm
Interesting question, but I'm not sure why it would be any different from
monitoring the adults who bought and installed the system. It would be part of
the terms of service, I would think. The adults accept this on their behalf.
Would there be another interpretation?

~~~
gok
In the US, COPPA makes it very difficult to legally collect information from
children under 13 for commercial purposes.

~~~
grzm
Interesting. Do you know of specifics about COPPA (or COPA) that would apply
to this situation?

~~~
gok
COPA is a piece of dead legislation. COPPA has been in effect since 2000.

The big issue recently is that in 2011 the FTC proposed much stricter rules
about data collection [1]. Parental consent now requires identification checks
which are hard. Data retention is also a bit of a mess; guidelines now imply
that the data should be deleted as soon as possible.

Morally, I do feel like there's a bit of a question here. Is it ok to have a 6
year old donate her voice to improve your speech recognition product even
though she wouldn't directly see a benefit from it?

[1] [http://www.natlawreview.com/article/ftc-will-propose-broader...](http://www.natlawreview.com/article/ftc-will-propose-broader-children-s-online-privacy-safeguards)

~~~
icebraining
_Is it ok to have a 6 year old donate her voice to improve your speech
recognition product even though she wouldn't directly see a benefit from it?_

Why couldn't you provide some direct benefit? IIRC that was the point of
Google Voice: provide a free product in exchange for getting people to help
them improve voice recognition.

------
donw
I'm going to stick my neck out as a potential Luddite here, but outside of
playing music, and some general "answering questions", I don't see a use case
for things like the Echo or Dot.

Being able to ask for timers, or unit conversions while cooking, is probably
the biggest bang-for-the-buck that I get out of Siri.

But outside of that, there's nothing that a Dot does for me that warrants
having a microphone in my house that is 24/7 connected to Amazon.

Not having some sort of voice-print analysis is also a real concern. A friend
bought an Echo a little while back, and me being me, I couldn't resist the
urge to ask it to order 10 large bags of kitty litter... which it cheerfully
tried to do.

Maybe I'm just odd. What do other people use these things for?

~~~
agildehaus
Under your definition, your phone is a 24/7 microphone connected to either
Google or Apple, your PC a 24/7 microphone connected to Microsoft and
literally every other vendor that has a service running.

The privacy concern is valid, but it's also valid for literally every other
piece of electronics. At some point you have to trust that it's only recording
during the short period after you say "OK Google" and if it were doing
something more nefarious someone would figure that out and it'd be huge and
damaging news.

These are basically first-generation products. The idea is to improve upon
their problems, find where they are useful and where they are not, and step
closer to having a real virtual assistant.

~~~
visarga
TL;DR - automate privacy protection in order to serve the uninformed masses of
people leaking sensitive data online

We're disclosing more and more private information to assistants, but
especially to Facebook, Google and the respective phone company we're using. I
think it would be worth studying automatic detection of sensitive
information disclosure, in order to place better privacy guards.

I envision a system where the web browser or voice agent would immediately
know the sensitivity level of information we are about to divulge, and route
it through anonymous systems or block the leak before it happens.

A database containing our online identities, credit cards, passwords and text
run through a topic classifier would make good features for sensitive
disclosure protection. It would be the privacy equivalent of antivirus
software. Maybe we could have a smart (privacy protecting) web browser and
agent.
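
As a rough illustration of the "antivirus for privacy" idea, here is a minimal sketch of the rule-based part only. It swaps the topic classifier for two simple patterns (a Luhn-checked card number and an email address); the names `sensitivity`, `CARD_RE` and `EMAIL_RE` are made up for this example, and a real guard would need far more than this.

```python
import re
from typing import List

# Hypothetical patterns for this sketch; real detectors would be broader.
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def sensitivity(text: str) -> List[str]:
    """List the kinds of sensitive data found in text about to be sent."""
    findings = []
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            findings.append("card_number")
    if EMAIL_RE.search(text):
        findings.append("email")
    return findings
```

A browser extension or voice agent could call something like `sensitivity(outgoing_text)` before sending and block or reroute the request when the list is non-empty.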

In the future, people are going to have to convince their personal assistants
if they need to disclose any private info to third parties. It's not going to
be so easy to collect massive hoards of private data about people. It's just
the natural step for privacy, in a world where AI is already being used so
much to undermine it. Time to get some AI fighting for our side of the privacy
war.

------
tootie
I'm still ambivalent. I think this tech will end up as another dead end.
Playing music by speech is a novelty. I don't see anything really interesting
coming from this space.

~~~
wsh91
It's way more than playing music by speech. It's looking up stuff, controlling
your house, checking restaurant hours, all sorts of functions that would
otherwise be fulfilled by a smartphone with less input effort.

Give one of these devices a try; you might be surprised by how much you adapt
to them.

~~~
te_chris
This. Also, because it's a home device it can just sit there, being used when
you need and ignored until you need it again.

------
chris_st
Possibly the coolest thing about the Echo is developing your own skill. If
you're comfortable with JavaScript (or, I believe, Python or Java) you can
really quickly put together a skill, which in development mode only you can
use. Their example templates are really clear.

So a lot of this would be easy to set up, if you're willing to do a bit of
below-IFTTT type programming. And, since it uses AWS Lambda, your usage is
almost certainly going to stay in the free usage zone (you get a comparative
ton of compute time for free each month with Lambda).
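
To give a feel for how small such a skill can be, here is a minimal sketch of a Lambda handler in Python that works directly with the request and response JSON of a custom skill. The intent name `HelloIntent` and the helper `build_response` are made up for this example, and the spoken strings are placeholders; this is not one of Amazon's templates.

```python
def build_response(speech_text, end_session=True):
    """Wrap plain text in the JSON envelope a custom skill returns."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event, context):
    """Entry point that AWS Lambda calls with the skill request as `event`."""
    request = event.get("request", {})
    request_type = request.get("type")

    if request_type == "LaunchRequest":
        # The user opened the skill without asking for anything specific.
        return build_response("Welcome. Ask me to say hello.", end_session=False)

    if request_type == "IntentRequest":
        intent_name = request.get("intent", {}).get("name")
        if intent_name == "HelloIntent":  # hypothetical intent for this sketch
            return build_response("Hello from your private skill.")

    return build_response("Sorry, I did not understand that.")
```

You can exercise it locally by calling `lambda_handler({"request": {"type": "LaunchRequest"}}, None)` before wiring it up to a skill.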

I'm sure Google has, or will soon have, such a development environment for
their device as well.

~~~
dgacmu
I think it's Actions on Google (but I've never tried it):
[https://developers.google.com/actions/](https://developers.google.com/actions/)

It looks like there's a "user friendly" version that's api.ai, and then
there's an SDK, and the SDK seems to be node.js based. I'm not clear from
reading it if you can do a private version just for your device. I'll have to
kick the tires one of these days - thank you!

------
westmeal
Things like the Amazon Dot and Echo make me extremely uncomfortable. If I had
to create some sort of voice assistant it'd have to store voice data locally
instead of being on some server somewhere.

~~~
deegles
You can delete all of your voice recordings from Alexa using the companion
app.

------
mxstbr
I got a Home for Christmas, now I'm even more excited to try it!

The article doesn't really go into it, but it'd be interesting to know why
the author's family now only uses Home vs the Echo previously!

~~~
dgacmu
The Home did a better job of interpreting requests, mostly. We found that we
had to contort ourselves more to get the Echo to do some things. "Play the
Nutcracker" -- no go. Even though it was in our library. "Play Tchaikovsky's
Nutcracker Suite" would work, IIRC. The Home handled both. Informational
requests also worked a bit better on the Home. ("What is a..."). We weren't
using any extra skills that we'd miss, so it was a very easy transition.

I wish the Home had the equivalent of the Dot. I wish the Echo had the Home's
inference capability.

------
lowglow
If you're interested in building voice-based personalization models, we're
going to be tackling this problem. I'd be interested to chat with anyone with
experience in this area. Hit me up. :)

