
The Real Story of How Amazon Built the Echo - nickles
http://www.bloomberg.com/features/2016-amazon-echo/
======
spdustin
I have the remote as an early adopter. It's fun to hide in another room and
say into the remote "Simon says, hey kids, Kraft no longer makes Mac and
cheese and now just makes carrots and ranch dressing."

A moment passes.

"Dad, Alexa just told us...!"

My geek dad game is strong.

~~~
PascalsMugger
It's amazing to think about kids growing up with a wonder box in their house
that tells them the answer to any question when they yell at it. It's like
straight out of the fables or science fiction of the past. I hope this results
in people generally having better bullshit detectors in the next generation,
and not some other unintended side effect like making them enthralled to said
wonder box.

~~~
shostack
What I've wondered is how much non-verbal context will be lost through such an
interface. Today, when I'm trying to verify the authenticity or quality of
something, it is a game that goes something like this:

1\. Google it and click least spammy looking link (gauged by the URL, although
that's no guarantee). I also see what sort of JS they try to load, if it is
absolutely bloated with ad tags, etc.

2\. Try to gauge from the design and other content of the site how genuine the
answer might be. Are there affiliate links for anything they recommend
peppered throughout? That's a signal it might not be unbiased info.
Unfortunately, this is where judging a book by its cover comes into play as
well. Some poorly designed sites buck the trend, but poor design is also
common among spammy sites.

3\. Find other sources to see if there is any consistent logic with said
answer/product review/etc.

4\. Make a decision hopefully.

So many of those pieces of the puzzle just aren't available when you limit
your input/output to a system like this. There are obvious tradeoffs here--I
just hope we don't lose the ability to make more informed decisions. Already I
hate things like Apple obfuscating the full URL in the URL bar.

------
amzn-336495
The story is that Amazon sacrifices engineers and others after every big
project, success or failure. They have developed a weird human sacrifice
culture. Fire phone? It was a failure implementing Bezos's wacky ideas.
Engineers had to be purged to atone for it. Fire TV? Huge success but we don't
want engineers getting any credit for it. Result: purge some engineers. Echo?
Another success that now had to float all those managers from both echo and
phone project. Better kick some more engineers off the life raft.

[http://www.businessinsider.com/amazon-hardware-division-layo...](http://www.businessinsider.com/amazon-hardware-division-layoffs-2015-8)

They have a real 'slaughter the goose that laid the golden egg because holy
shit! there is a huge expensive golden egg and it's mine all mine, die, die,
die...' culture. Bezos probably has an employee purge button built into his
desk.

~~~
pyb
This startling quote is in the article :

"Many of the people who helped create the Echo no longer work for Amazon..."

As you say, it sounds like success gets punished at Amazon.

~~~
runT1ME
Isn't it more likely that after shipping a 'named' consumer product, many of
those engineers rose in value a tremendous amount and rather than negotiate a
raise they took a huge salary bump working for Apple/Google?

~~~
pyb
In my experience, companies are not that great at recognizing and rewarding
achievement. So unfortunately, I'll stick to my cynical interpretation. Also,
we're talking about dozens or even hundreds of people moving on from Amazon
here...

~~~
runT1ME
>In my experience, companies are not that great at recognizing and rewarding
achievement.

Is it your opinion that the crazy stock grants happening right now at big
tech companies are not due to achievement and are instead just political or
random?

~~~
pyb
Google and Facebook offer largeish stock grants when you join the company,
no? Google is also known to offer some kind of bonuses on merit, but that
policy is seen as remarkable in tech.

I'd guess Amazon is more political and/or stingy than the above 2.

Anyway I was not referring to big tech specifically.

------
ccozan
I was looking for a fair comparison between Now, Siri, Cortana and Alexa and I
stumbled across this [0]. I find it interesting because it seems you would
need all of them to answer all your questions.

Do you also have some personal experience comparing it with Now, Siri or
Cortana? What went really badly or amazed you in terms of speech recognition
and speech comprehension?

[0] [http://www.cio.com/article/2953989/software-productivity/10-...](http://www.cio.com/article/2953989/software-productivity/10-questions-for-cortana-siri-amazon-echo-and-google-now.html)

~~~
gervase
I can't speak for Cortana, but I have personal experience with Siri, Now, and
Alexa. In terms of the actions that I do day-to-day (I would never ask for a
dirty joke, for example, but I ask for the weather daily), I would rank them
Now, Alexa, and, a distant third, Siri.

There are two axes that I care about - how many useful commands are available,
and how well they can understand me.

Google Now has a pretty wide range of behaviors, but it's the spooky-accurate
voice recognition that sets it clearly ahead. In the last year, Google Now has
only misunderstood me ONCE.

Alexa also has a very wide range of behaviors (especially with the Skills),
but I do occasionally run into things I feel like it should do that it doesn't
(yet). To be fair, I also use her the most, so it could be that I push her
further than the others. The voice recognition is quite good; I would say it
misunderstands me less than 5 times in 100.

Siri is by far the worst. Her abilities are limited, but worse, she has a
tendency to funnel commands that she can't do into things she can. For
example, asking for the hours of a restaurant will instead show you a Bing
search (!) for the restaurant's name, sans hours. This is not useful. It would
be better to just inform me that she can't do that yet, file that in a log at
Apple, and then add that behavior some time down the road.

She also mishears me 25-45% of the time, AND she seems to be getting worse
over time. I assume this is because they are trying to widen her audience
beyond a standard California accent, but it's incredibly frustrating for a
product to get worse, not better, over time. I now use her only for voice
dictation of texts on my iPhone, and use Google Now in app form exclusively
for more advanced queries.

~~~
kuschku
> Google Now has a pretty wide range of behaviors, but it's the spooky-
> accurate voice recognition that sets it clearly ahead. In the last year,
> Google Now has only misunderstood me ONCE.

> She also mishears me 25-45% of the time, AND she seems to be getting worse
> over time. I assume this is because they are trying to widen her audience
> beyond a standard California accent, but it's incredibly frustrating for a
> product to get worse, not better, over time. I now use her only for voice
> dictation of texts on my iPhone, and use Google Now in app form exclusively
> for more advanced queries.

And this is pretty much a huge issue for this market. Entry requires having
already spent several billions on voice tech.

You’d practically have to use all of YouTube as a training set. Which Google
did.

Maybe someday we can have laws that require all neural networks trained with
data from others to be in the public domain, even if the people providing
training data agreed to the data being used for training.

Then we could compete on other things than "understands voice".

~~~
dharma1
I don't think it will take billions. LSTM+CTC speech recognition
implementations are coming to OSS and I think we have large enough available
data sets for training without needing Google size resources. Audio books are
a good resource for training and it's not hard to amass hundreds or thousands
of hours of training material that way. I think we will have 95%+ accuracy ASR
as open source trained networks very soon.
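
For anyone unfamiliar with the CTC half of that: the network emits one label per audio frame, including a special "blank", and decoding merges consecutive repeats and then drops the blanks. A minimal sketch of greedy decoding in Python, assuming the per-frame labels have already been picked; the `_` blank symbol and the function name are made up for illustration and are not taken from any particular package:

```python
from itertools import groupby

BLANK = "_"  # hypothetical symbol standing in for the CTC blank label


def ctc_greedy_decode(frame_labels):
    """Collapse per-frame labels: merge consecutive repeats, then drop blanks."""
    # groupby yields one key per run of identical consecutive labels.
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != BLANK)


# The blank between the two runs of "l" is what keeps the double letter.
frames = list("__hh_e_ll_ll_oo__")
print(ctc_greedy_decode(frames))
```

A real recognizer would take the argmax (or a beam search) over the network's per-frame probabilities first; this only shows the collapsing rule.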

In the case of Amazon Echo/Dot they benefit greatly from the far field array
mic for isolating a clean speech signal in noisy/far field environments. It
would be nice to see generic USB array mics with Linux drivers.

~~~
beagle3
What's the OSS LSTM+CTC package that's coming along?

~~~
dharma1
[https://github.com/srvk/eesen](https://github.com/srvk/eesen)

I expect there will be something in Kaldi soon, too

------
JabavuAdams
> "Generally, the engineers and product managers at Lab126 quelled their own
> dissent before it reached Bezos, instead concentrating on giving the boss
> what they thought he wanted. “We spent so much time trying to anticipate
> what Jeff would do or say, and read into little words he would say in
> meetings,” said one former employee. “It would lead to so much additional
> work.”

This sounds like a recipe for disaster. Very worrying. How have they managed
to be so successful with such broken communications?

------
goshx
I am waiting for the day when someone will sneak those magic words into a
TV commercial to wake up Echo, Siri, etc. Imagine something like a DDoS caused
by a TV commercial during the Super Bowl.

I have the Echo and sometimes it scares the * out of me when something on my
TV sounds similar to "Alexa", like "a lexus".

~~~
rootbear
Yeah, they call the activation word a "wake word", but I couldn't help but
think of it as a "safe word". It needs to be something that you're only likely
to say when you _mean_ it. Which, to me, implies it should be customizable. I
should be able to set it to whatever makes sense for me.

This issue comes up in science fiction stories, often set on spacecraft, where
there is an AI that is spoken to. On Star Trek, they just said "Computer?" and
that got its attention. But that always seemed a little clunky to me. At
first I thought the ship AI should have the same name as the ship, but I
quickly realized that probably wouldn't work so well. Picard: "This is the USS
Enterprise!" Computer: "Yes?" On Babylon 5, a Minbari character just called
his ship, "ship", and in Alien, the ship was Mother, etc. I decided that if I
ever wrote an SF story with a shipboard AI (or HAD a ship with a shipboard AI
:) I would call it Maru-san.

------
sksixk
reads like a PR piece. does Amazon Echo really need a "the real story" piece?
it's barely been out...

~~~
projproj
Probably not a coincidence that the Echo is on sale today[0].

[0] www.amazon.com/Amazon-SK705DI-Echo/dp/B00X4WHP5E

------
amelius
> You can use Alexa to turn off the lights, ask it how much gas is left in
> your car, or order a pizza.

I'm getting a bit tired of these examples. Is there a comprehensive list of
things Alexa can do?

~~~
delecti
Such a list doesn't exist for the Echo any more than it does for your cell
phone. The Echo is integrated into IFTTT, so any of the actions here [1] are
available, and there is also an ever-expanding list of actions called
'Skills' [2] that you can add to your particular Echo's list of available
integrations (like installing a phone app).

That's all in addition to the basic things it can do out of the box, like set
timers and reminders, manage a todo list, play music and books (from an
admittedly limited set of sources), tell (mostly bad) jokes, sync with your
google calendar and respond to questions about upcoming events, tell you the
weather forecast, read off headlines, and a bunch more.

To a greater or lesser extent, it's like a living room installation of Siri.

[1] [https://ifttt.com/amazon_alexa](https://ifttt.com/amazon_alexa) [2]
[http://alexa.amazon.com/#skills](http://alexa.amazon.com/#skills) (might
require you to already have an Echo)

~~~
derefr
You could, in theory, list every verb my phone can natively perform. That list
is the set of view controllers in the phone's onboard OS apps, plus the set of
view controllers in every app in the app store (which is growing, but finite
and still easily enumerated.)

The only assumption you need to make is that actions on all server-delivered
forms are collapsed into a single "make a network request" verb; otherwise you
_do_ get the infinite expanse of actions that is the web.

------
grillvogel
Maybe I'm old but I still really don't understand the appeal of a device that
monitors and datamines all the communication in your home so you can buy
products more easily.

~~~
blacksmith_tb
Not sure just how seriously this is meant to be taken, but the Echo doesn't
monitor anything until you say "Alexa" (and recognition of that is handled in
the hardware, locally - if you're worried about devices with net access and
microphones, you'll need to keep your phone turned off at home, too). It's
fair to assume that once you are interacting with it Amazon mines at least
some data, however.

~~~
grillvogel
>but the Echo doesn't monitor anything until you say "Alexa"

then how does it know when you say Alexa?

~~~
dbbk
Because the code that understands 'Alexa' is hardcoded into the device. Once
it hears that word, that's when audio starts getting sent over the Internet to
Amazon's servers.
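
Roughly, the idea can be sketched like this in Python. This is not Amazon's code: the function names, the fixed `utterance_len` cutoff, and the choice not to forward the wake word itself are all assumptions made for illustration. The point is only that every frame is checked locally, and nothing is forwarded until the local check fires.

```python
def frames_to_upload(frames, is_wake_word, utterance_len=3):
    """Return only the frames that would leave the device.

    frames: iterable of audio chunks (plain strings in this sketch)
    is_wake_word: local detector, chunk -> bool (hypothetical)
    utterance_len: how many frames to forward after a wake word (assumed)
    """
    uploaded = []
    remaining = 0  # frames still to forward after the last wake word
    for frame in frames:
        if remaining > 0:
            uploaded.append(frame)
            remaining -= 1
        elif is_wake_word(frame):
            # Detection happens on-device; start forwarding from the next frame.
            remaining = utterance_len
        # Otherwise the frame is dropped locally and never sent anywhere.
    return uploaded


heard = ["tv noise", "alexa", "set", "a", "timer", "private chat"]
print(frames_to_upload(heard, lambda f: f == "alexa"))
```

A false trigger like "a lexus" would simply be the detector returning True on the wrong frame, which is why the quality of that local detector matters so much.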

------
brndn
When is Google going to build a Google Now for the living room to compete with
Echo? I want an Echo, but I think a Google product with access to the whole
Google ecosystem would be more useful to me.

We know Echo will never play nice with Google (that you can't buy a Chromecast
on Amazon is evidence). If Google built this product, the only feature they
probably couldn't match would be buying products on Amazon, which I don't know
if I would use anyway.

------
post_break
I really want an Echo, I just can't justify the cost. And the cheaper options
don't have the listening mode which nerfs the whole appeal.

~~~
manyxcxi
I got early access to pre-order (can't remember why) at $99 and almost didn't
do it. I'm SO glad I did. It gets used all day every day. I grab the last two
eggs and say "Alexa, add eggs to the grocery list", hands are full of mess
while cooking, "Alexa, set a timer for 5 minutes", etc. The fact that it works
with my home automation hub is also nice, when I'm across the room and want
the lights off, or want the garage door closed. I could live without it for
sure, but it just makes things easier. The voice recognition, especially from
far away is incredible. I don't have to yell, or say things exactly right
(like in my car).

Knowing what I know now, I'd pay full price- especially with the new
functionality and apps that get released every week it seems. I've even
written a few of my own.

The speaker itself is just okay, we play music on it all the time for the
kids, but there are way better Bluetooth speaker setups, so if that's your
primary goal you won't be that excited.

If you can afford it and have a routine of any sort throughout the day where
checking the weather, traffic, maintaining a grocery list (especially when you
might not be the person going), getting a news briefing or listening to a
podcast are part of it, then you'll find it useful immediately. If you don't,
to get your money's worth, you'd want to start a routine of sorts.

~~~
post_break
I have a gen 1 iPad mini taped to my fridge so I have voice control with Siri
and I don't think Alexa can do anything Siri can't (for me). I want one, I
just don't think I need it. I'd love to automate everything though.

------
nxzero
Bloomberg's 404 is...

[http://www.bloomberg.com/404.html](http://www.bloomberg.com/404.html)

------
jamespo
Maybe it's time to release it in other territories now

------
beagle3
I've played with the Echo, it is really nice, but ... much like Siri, Cortana
and Google Now, using it fully gives up any shred of remaining privacy you had
within your own home. I would gladly pay twice as much for an Echo-device that
doesn't route all my instructions through $BEHEMOTH servers.

------
dang
There was another long article about this a week ago with almost an identical
title:
[https://news.ycombinator.com/item?id=11416795](https://news.ycombinator.com/item?id=11416795).

------
anotheryou
ugh, please no gifs, I'm trying to read here!

Big quotes I already read in the text are similarly annoying...

------
zoidb
I wonder what conversation at bloomberg resulted in adding all of those
extremely distracting (and annoying) animated gifs..

~~~
daveguy
Holy crap that is annoying. Is there a way to pause all animated gifs in
chrome? Congratulations, Bloomberg, you captured the essence of a Geocities
site. This has to be a not so subtle graphic design joke.

~~~
neogodless
I opened up Chrome Developer tools and added img { display: none } which hides
everything but the top banner, which you can scroll past ;)

~~~
ajford
I have uBlock Origin, and it was as simple as right-click>Block Element and
selecting the suggestion of amazon_echo_animation_*.gif

The filter for the curious:

    
    
        ||www.bloomberg.com/features/2016-amazon-echo/amazon_echo_animation_*.gif
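
If you're wondering how a rule like that matches, here is a rough Python approximation. It assumes only two pieces of the filter syntax: the leading `||` anchors at the hostname (any scheme, any subdomain) and `*` is a wildcard. Real uBlock Origin filters support far more than this, and the example file name below is made up.

```python
import re


def filter_to_regex(rule):
    """Translate a simplified '||host/path*' block rule into a compiled regex."""
    if not rule.startswith("||"):
        raise ValueError("this sketch only handles rules starting with '||'")
    body = rule[2:]
    # Escape everything literally, then turn the escaped '*' back into a wildcard.
    pattern = re.escape(body).replace(r"\*", ".*")
    # '||' allows any scheme and any subdomain prefix before the hostname.
    return re.compile(r"^[a-z][a-z0-9+.-]*://([^/]+\.)?" + pattern)


rule = "||www.bloomberg.com/features/2016-amazon-echo/amazon_echo_animation_*.gif"
blocker = filter_to_regex(rule)

# Hypothetical URLs, only to show one that matches and one that does not.
base = "http://www.bloomberg.com/features/2016-amazon-echo/"
print(bool(blocker.match(base + "amazon_echo_animation_01.gif")))
print(bool(blocker.match(base + "header.png")))
```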

------
paulcole
"An e-commerce company wouldn’t seem like an obvious leader in [augmented
reality]."

Amazing that even today articles are still calling Amazon an e-commerce
company. I wonder if this is an image Amazon actively cultivates or if it's
just consistently lazy reporting?

~~~
scott_s
It's possible the author of the article knows that Amazon is now a tech
company as well, but is assuming that their audience does not yet realize
this. My impression is that people who don't work in tech don't know this; I
have explained many times to people outside of our world how Amazon is one of
the most important tech companies in the world, and that discussion always
starts with a puzzled look from them.

~~~
paulcole
Yeah, that's kind of my take as well. But I put that in the lazy reporting
category.

Seems like it would be worth the paragraph explaining the misconception and
helping people realize why the seemingly weird stuff Amazon kinda does makes
sense.

~~~
apaprocki
Don't feel bad -- we at Bloomberg always get classed as a "media" company when
the media properties came about a decade after the software (& hardware). It's
just easier to say because it fits better into public expectation. To most
people Amazon is e-commerce because nearly everyone buys things from them
online or via mobile apps.

~~~
narrowrail
If this sub-thread were not about being specific in one's characterization, I
would not point this out, but:

> online or via mobile apps

As far as I'm concerned 'mobile apps' operate 'online' and it isn't clear the
distinction you are making. Is it common to refer to 'online' as via a web
interface (ports 80/443)?

~~~
sokoloff
I think that's the distinction and I think it's a valid one. My experience on
Aliexpress, Amazon, and Ebay (to name the 3 I use the most) is that the web
and mobile interfaces and use cases are still quite distinct from each other.
On Aliexpress, I sometimes use the mobile to checkout for the explicit
discount, but never search from mobile. On Ebay, I respond to bidding alerts
on mobile, but don't initiate transactions/searches on mobile. On Amazon, I
basically never use their mobile app (and usually use google to search
Amazon's [web]site).

------
serge2k
> Generally, the engineers and product managers at Lab126 quelled their own
> dissent before it reached Bezos, instead concentrating on giving the boss
> what they thought he wanted. “We spent so much time trying to anticipate
> what Jeff would do or say, and read into little words he would say in
> meetings,” said one former employee. “It would lead to so much additional
> work.”

That sounds like a questionable at best policy.

