
Living with and building for the Amazon Echo - edward
https://medium.com/@sicross/living-with-and-building-for-the-amazon-echo-525caea9f280#.fzbce5e7h
======
roel_v
Who wants to have a black box in their home (or bedroom!) that listens to
every sound, connects to the internet and over which you have no control? Sure
I believe Amazon when they say they only send clips of the actual commands,
but how long before these things get rooted left and right?

I built my own version of this, which I can customize as I want. I can get the
weather (my central home automation server pulls forecasts from
wunderground.com, current temp from my own sensors, the voice control unit
then pulls data from there), open/close curtains and turn lights on/off, and
it says the time. I have a small list of other 'dialogues' (as they're called
in my system) I'm going to add when I have some time, but I'm still figuring
out what functionality is worthwhile.

My software is based on pocketsphinx on raspi, so it's easy to put one in each
room (I have only one now myself). I'm using a mxl ac404 teleconf mic, which
works ok; you do have to speak up for it to pick up the commands. I'd love to
get an echo to see how much better it works. I have a primitive 'tts' system
that plays back prerecorded messages, and falls back to festival for unknown
words. I paid a voice actor a bit through fiverr to get the 100 or so words I
need. Sounds better than full synthetic tts systems, although I need to work
on improving the timing between words.
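
A minimal sketch of the "prerecorded clips, fall back to festival" idea
described above. This is not the commenter's actual code: the `clips`
directory, the one-file-per-word naming, and the function names are all
assumptions made for illustration, and it assumes the `aplay` and `festival`
command-line tools are installed.

```python
import subprocess
from pathlib import Path

CLIP_DIR = Path("clips")  # hypothetical directory of prerecorded <word>.wav files


def plan_playback(sentence, available_words):
    """Turn a sentence into steps: play a recorded clip, or synthesize the word."""
    steps = []
    for raw in sentence.lower().split():
        word = raw.strip(".,!?")
        if not word:
            continue
        kind = "clip" if word in available_words else "synth"
        steps.append((kind, word))
    return steps


def speak(sentence):
    """Play each step in order (assumes `aplay` and `festival` are on PATH)."""
    available = {p.stem for p in CLIP_DIR.glob("*.wav")}
    for kind, word in plan_playback(sentence, available):
        if kind == "clip":
            subprocess.run(["aplay", "-q", str(CLIP_DIR / f"{word}.wav")], check=True)
        else:
            # `festival --tts` reads the text to speak from standard input.
            subprocess.run(["festival", "--tts"], input=word.encode(), check=True)
```

Keeping the planning step separate from playback makes it easy to see which
words still need a recording, and it would be a natural place to insert pauses
when tuning the timing between words.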

This is only a few days' worth of work, too. It's not hard to make. I'm not
claiming my system works as well as an echo, but I prefer the control mine
gives me.

~~~
AndrewUnmuted
> Sure I believe Amazon when they say they only send clips of the actual
> commands

I worked on speech parsing software at Audible, where much of the data was
originally gathered to make Alexa and the Echo possible.

They are not telling the truth when they say that they only send clips of the
actual commands.

EDIT:

This is not to say that they constantly stream audio data, either. But they
send much more than just the voice extractions of the commands themselves.
They have to in order to build a profile of the users' voices, habits, etc.,
which aid in the quick processing of incoming speech data.

The service is not selective enough to only pick up on voice information
preceded by a correct utterance of "Alexa." Amazon is "customer-obsessed" -
and one of the product execs asked on a phone call:

> I am an Alexa user, and call out her name but stutter slightly. I still
> intended to say "Alexa," so why shouldn't she respond to me? That's a bad
> user experience. Customers want to be understood, not ignored.

This is paraphrased, but the consequences of this question were enormous. It
basically ensured that Alexa users would not ever have privacy again.

~~~
roel_v
Does the audible/amazon stt engine train its own language model and thus
'learn' the user's speech peculiarities? If so, how does it know what the user
really meant and thus should train towards?

~~~
AndrewUnmuted
I can't speak to the training of the models. I'm not a data science guy, I
specialize in signal processing and feature extraction.

What I can say, though, which may give good insight into your question is
this: The "Hushpuppy" project, which became known as the WhisperSync and
Immersion Reading technologies, was not self-trained or self-reinforced. QA
engineers in India were contracted to ensure synchronization between the book
and the audiobook. These technologies, though, were the building blocks upon
which Alexa was developed. The data collected from these operations became
essential knowledge for the engineering team that developed Alexa.

EDIT: As an aside, Amazon bases all of its services around that Amazon account
that almost everyone in the civilized world now has. Alexa is not the first,
and certainly not the last Amazon service that learns and improves based upon
the user's Amazon account.

~~~
roel_v
", I specialize in signal processing and feature extraction."

In this context, do I speculate in the right direction when I think about
isolating voice from background noise? Any tips to share that could help my
diy tinkering?

------
david_mitchell
This is something I would love to develop for but could never allow into my
house. I know I could do both but it feels kind of wrong to make
anything that might encourage other people to have one given the terrifying
privacy implications.

~~~
IanCal
You might want to have a look at jasper:
[http://jasperproject.github.io/](http://jasperproject.github.io/)

You can use an offline speech recognition engine in that.

~~~
StavrosK
How well does it work? I think recognition latency is pretty large.

~~~
CaptSpify
I use this. Yes, latency is kind of big, but it's tolerable. The big
difference for me is that you have to program every command. I've never used
the Echo, but my understanding is that it has a ton of pre-built commands that
you can use: set a timer, what is the weather, play somesong, etc.

For jasper (pocketsphinx) you have to manually program the action for all of
these. So it's a lot more setup. I still like it and use it all the time
though.
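
To illustrate what "manually program the action" for each command can look
like, here is a generic keyword-dispatch sketch. It is not Jasper's actual
module API; the `command` decorator, the handler names, and the patterns are
all hypothetical, and the handlers return text instead of driving real
devices.

```python
import re
from datetime import datetime

COMMANDS = []  # (compiled pattern, handler) pairs, checked in registration order


def command(pattern):
    """Register a handler for transcripts that match the given regex."""
    def register(func):
        COMMANDS.append((re.compile(pattern, re.IGNORECASE), func))
        return func
    return register


@command(r"\bwhat time\b")
def tell_time(text):
    return datetime.now().strftime("It is %H:%M")


@command(r"\b(pause|play)\b")
def control_music(text):
    action = "pause" if "pause" in text.lower() else "play"
    return f"music: {action}"


def dispatch(text):
    """Run the first handler whose pattern matches; return None if none do."""
    for pattern, handler in COMMANDS:
        if pattern.search(text):
            return handler(text)
    return None
```

Every new capability (timers, weather, a grocery list) means writing and
registering another handler like these, which is the extra setup cost
compared with a device that ships with its commands built in.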

~~~
mk4p
How do you find yourself using it?

~~~
CaptSpify
I do a few things: control music (pause/play, volume up/down, etc), change my
lights (dim, bright, color), read off the weather, ask what time it is.

I've got a few other ideas: control my roku, add milk to the grocery list,
read emails, etc.

Nothing life-changing, but fun stuff that makes small parts of my day easier.

------
bamie9l
I find something worryingly Orwellian about an always-on listener in your home
that monitors for keywords via a remote server.

~~~
pmarreck
The monitoring for keywords is done in the hardware or firmware. Nothing ever
gets sent to Amazon until "Alexa" or "Echo" is recognized.

Do you really think it would be easy to hide a persistent audio datastream to
a central server?

~~~
dantillberg
> The monitoring for keywords is done in the hardware or firmware. Nothing
> ever gets sent to Amazon until "Alexa" or "Echo" is recognized.

I think that what you _mean_ is that the monitoring is done in the hardware or
firmware using closed-source code that can and will be regularly updated
remotely and hopefully securely. And that Amazon told us that it would wait
until it thought it heard "Alexa" or "Echo" or anything that sounds sort of
like it, or whatever they decide to change the software on your particular
device to listen for in the future.

~~~
pmarreck
Now would be a good time for a third-party audit of the Amazon Echo firmware
updating process.

It would probably also be fodder for frontpage HN, in case anyone needed some
attention out there.

~~~
echo_throwaway
An audit would be wonderful. I worked on the Echo, and the updating process
was extensively tested and audited, both internally and by third parties.

~~~
pmarreck
That's great! (And I'm a fan of my [two] Echos!) Do you have a link to said
third-party auditing?

------
asg
I've wanted something like this for years, and I can easily see why this is
awesome. I considered the Echo seriously when it came out. However, even as an
Amazon Prime member, I'm hesitant to have Amazon in my living room. I don't
want to tie that deeply into a single ecosystem. Therefore, I am eagerly
waiting for my Mycroft[1]... One for every room! If you are willing to muck
about with hardware, the Jasper project does a lot of this already.

[1] [https://mycroft.ai/](https://mycroft.ai/)

------
veli_joza
My main reason for not getting one is that it's tied to services that are not
available in my country (Amazon Prime, Spotify...). It seems like the rest of
services ("Skills") are third-party citizens on this device. Although I can
appreciate the sleek design of Echo, I'd rather get an opensource variant
(Jasper and Mycroft are ones I heard of).

------
WhitneyLand
Ideally Echo would be able to control HomeKit devices. iPhone users now have
no Nest integration, no Echo. Starting to kill the idea of an open ecosystem.

------
ChikkaChiChi
I bit the privacy bullet and got an Echo for my home theater. It's nice, but
there is a mountain of unrecognized value because of Amazon's desire to
control everything.

As a for instance, it would be nice if keyword enablement could mute the
stereo. It'd be nice to Chromecast audio to it. It'd be nice if I could use it
to play things on a fire stick.

If it really wanted to be a device of the future, it would link to other
Echoes in the house and allow for intercom and localized audio tracking.

Some of these things will either come or never see the light of day because
Amazon hates interacting with other companies (see: Android).

I'm waiting for an open source alternative...

------
dharma1
I've been trying to find a usb array mic/software combo that works better for
far field sound capture than a single mic - to use with either OK Google on
old android phones, or Jasper running on RPi. Any recommendations?

~~~
StavrosK
I would also love having something like this. I found the Playstation Eye
camera for PS3 has an array of four mics, but there's no accompanying
software, so it's not as useful. I'd buy a hardware/software combo that did
far-field sound capture well.

Essentially, I want a better mic so I can run this:

[https://www.youtube.com/watch?v=8eiHO7uqccs](https://www.youtube.com/watch?v=8eiHO7uqccs)

~~~
dharma1
Yeah, same here, PS3 Eye but no drivers/software for array mics or
beamforming. I wanted to create something like in that video too. Looks
promising with OK Google - a little laggy/glitchy but the idea definitely has
legs.

I think in a couple of years this stuff will become ubiquitous

~~~
StavrosK
I hope so, and I certainly hope that we'll see hardware-only solutions with
open software so I know where my sound is going.

~~~
dharma1
Yeah, I'm pretty sure we'll have state-of-the-art open source deep learning
based speech recognition libraries this year that will run on local hardware
without significant latency.

Baidu open sourced their warp CTC a few weeks ago, give it another few months
before someone will release a trained English network for it

[https://github.com/baidu-research/warp-ctc](https://github.com/baidu-research/warp-ctc)

------
mhd
Judging from the questions listed in the article, I would assume that there's
a much smaller ROI for a European homebody (fewer services that can be
queried/integrated, usually worse language recognition, no need to constantly
check the weather, public transport).

------
te_chris
I've got one, I find it mostly useless as I can't set my location to London.
The speaker is fine - but I have a much better system in the same room with an
apple tv (v2) connected with a dac. I guess it's just me, but I also like
controlling what I'm listening to and having some agency over it, so just
mindlessly listening to playlists or automated radio isn't really my thing.

I think it is cool that it's programmable, but I'm not that impressed thus
far.

(I should add, the reason I have one is that it was a gift from AMZN after
attending an event of theirs last year, I didn't buy it)

------
silverlight
I've had the Echo a bit longer (6 months? We were in the first batch that went
out to Prime customers) and we love it as well. I also want to get one for the
whole house instead of just currently the basement. In addition to music and
trivia and timers, we also use ours to control our lighting (via the Hue stuff
from Philips) and we love that as well. Overall it's amazing.

------
deanCommie
I think the audio out isn't going to happen, as Amazon would prefer that 3rd
parties build their own Alexa integration into their devices:
[https://developer.amazon.com/public/solutions/alexa/alexa-vo...](https://developer.amazon.com/public/solutions/alexa/alexa-voice-service)

------
noja
Sounds great.

audio jack: I like that it doesn't have an audio jack. I already have a Fire
TV connected to my tv setup, it should play the audio there.

~~~
LastMuel
Honestly, do you leave your television on 24/7?

~~~
noja
In standby. The TV supports CEC so would turn on when asked.

~~~
Klathmon
Yeah, but in practice CEC has been iffy on most TVs I've owned, and I've never
seen a TV that can start up in less than 10 seconds.

Plus AFAIK CEC can't turn the TV back off, so after the first use it's just on
forever until you manually turn it off.

~~~
camhenlin
My Apple TV definitely turns my TV off via CEC, it works great

~~~
peatmoss
New AppleTV 4th gen has an IR blaster that'll turn your TV on and off even if
your TV doesn't support CEC. It seems to figure out the right IR codes
automatically.

------
chatmasta
Generally you're operating in one of two modes:

1) you don't believe anyone is listening, and even if they are, you don't care

2) you don't believe anyone is listening, but if they are, the consequences
are worth avoiding

If you find yourself in the #2 mode, don't buy an Amazon Echo.

------
peteretep
I note that Siri on my iPhone does much of this, and is also always listening.
Siri switches my lights on, tells me what time it is in London, makes calls,
and sets timers for me, and with the newer phones, doesn't require the phone
to be plugged in.

------
return0
Sounds like it would work better as a watch.

OTOH, having a platform that accepts questions (even in written form, i.e. an
NLP google) on which we could build would be great.

------
debuasca
2001 A Space Odyssey: Hal

