Amazon Echo Dot (amazon.com)
351 points by endtwist on Mar 3, 2016 | 400 comments

The transition from primarily visual UX towards an auditorial UX is really powerful.

Looking at screens to get key information distracts me from my surroundings and seems archaic.

My wife is a sound designer who has opened my eyes to the importance of sound, both in film and in the world. It's not that I was unaware of sounds, but I didn't realize how important they are to centering me in this world and in the made-up worlds of films and games. Try watching a scary movie with the sound turned off; it turns into a comedy.

I think it's unexplored territory that has huge potential to impact the way we interact with the real world, even more so than Glass or HoloLens.

When I listen to music as I walk down the street, I change: my mood, my posture, and the way I look at the world. The music augments the reality around me in a way that visual UX never can, because a visual UX is a lens between my eyes and the world.

The problem is that voice interfaces break down pretty quickly once you try to do anything complicated. The Echo has pretty solid voice recognition--far better than anything else I've ever used--but it's still hard to get it to do anything useful once you get beyond a pretty narrow script. (e.g. what's the weather forecast, play this artist, etc.)

I've found that the voice recognition on Android phones works well enough to be useful in a wide variety of circumstances. Navigating, getting directions, setting alarms, taking notes, sending text messages, sending emails, searching for things, and many more. When I was still using my Moto X I did the majority of every-day tasks with voice recognition.

The iPhone is catching up fast too...my wife's taken to sending emails via Siri (to avoid strain on her hands), and most of the time it gets things perfectly.

The biggest problem is privacy. One of the nice things about touchscreens is that you have a personal dialog with the device that can't be overheard by anyone nearby. That doesn't apply to voice recognition systems, and it can be pretty awkward to dictate an e-mail to a phone in a crowded place.

Being overheard isn't the only privacy concern. Most of these solutions offload the speech recognition and language parsing functions to corporate servers. I like texting with Siri but I'm not exactly keen on having Apple record everything. It also seems limiting in that I can't use voice commands without a network.

It would be nice for voice recognition to start being built into the devices themselves. I know training data is needed, but there's real convenience to be gained.

I think the processing requirements for handling on-device Siri would destroy battery life.

This actually doesn't seem to be the case. Take a look at Google Translate's offline voice recognition AND translation - it's really amazing, considering it's all happening on your device.

I forget where it was, but they published something about training a very small very fast neural network that could fit comfortably in the phone's memory. Tricky tricky. :D

Plus the only way to train these things at scale is to upload the recordings once you have some usage.

Worse for battery life than firing up the radio?

And devices that listen to you 100% of the time are yet another privacy concern... even if they don't send everything to a remote server.

If you have a human assistant who does that job, he also listens 100% of the time.

But he or she is less vulnerable to being automatically hacked by a three letter agency, foreign government, and/or hacker gathering data for identity theft.

The privacy concern _isn't_ necessarily about having something to hide. It's about the consistent hacking of major systems, and exposure of personal data.

And you don't think there are privacy concerns with that? It is a /very/ intimate relationship, and generally requires some ritualized/formalized interaction, and a very high degree of trust.

Just on the note of hand strain, without knowing anything about your wife's condition, a way that could help alleviate it is to critically analyse hand position/technique. As a pianist, I have been trained to have a very supple hand position when operating any device but I notice this isn't at all the case for many people I observe in their day to day activities.

Historically this probably wasn't much of an issue, but given that most people now spend hours at a desk on a keyboard, it's likely to become more of a problem. Think of it as akin to paying attention to your posture.

The use of Google Now from my bluetooth'd helmet has really improved my motorcycling experience.

Real easy to say: "Okay Google... navigate to California Academy of Sciences."

What's missing for me is spotify/app specific integration.

> What's missing for me is spotify/app specific integration.

For that to really happen in a robust way, I think Google needs to open up Custom Voice Actions.

[0] https://developers.google.com/voice-actions/custom-actions

"Ok Google.... Play <artist> on Spotify" works for me.

I agree discovery of these magic phrases needs work.

Yeah, there's some that can be done through system actions (which I think that is) and it sounds like custom actions have been implemented by selected partners, I just mean they need to open up custom actions to enable more general app-specific integration.

I thought this already worked.

"Okay Google... Play music" will start the Music app.

"Okay Google... Start Radio" will start the NPR app.

I can say "Open Spotify" and it will open the app. Then I have a button on the helmet that sends the Play command. But I can't do anything robust like playing a specific artist.

Perhaps if I used Google Music the integration would be built out.

On my phone "Play <artist>" uses Google Music. "Play <artist> on Spotify" makes it use Spotify.

On my Nexus 6p saying "OK Google play 'artist'" will open Spotify and start playing the top songs of that artist. This does not work to play specific playlists though.

Define work well? It doesn't work well if you're not connected to the Internet, if you speak quickly, if you interrupt it, it can only do limited follow up.

>The problem is that voice interfaces break down pretty quickly once you try to do anything complicated

I've done a fair bit of interface engineering for the web. Between that and using so much software over the course of my life, I'd say that this applies to GUIs just as much as voice interfaces.

Yes, but GUIs have two or three dimensions available (up/down, left/right, time) whereas voice just has the one (time). We humans can also full-duplex GUIs much more easily than voice-based interface. And GUIs at least can be hooked up to full-powered grammar-based interfaces whereas voice, somewhat ironically considering the nature of human communication, has more trouble with it.

(I'd suggest this is actually a combination of the still-non-trivial nature of NLP, combined with a lack of feedback, combined with the fact that giving instructions is quite hard. Humans overestimate human language's ability to communicate clear directions, as anyone who has done tech support over a phone understands.)

Just as the mouse input has evolved to include multitouch and 3d touch gestures, voice input can also evolve. The full range of tone, inflection, pitch, etc is available from the human voice.

I wonder if NLP research should have started as our ancestors did, with grunts and hoots and cries. Instead it's focused on recognizing full words and sentences while almost completely ignoring inflection.

Another dimension to add with vocal input is directional. If you have mics in all corners of a room, which direction you speak in can affect whether "turn off" operates your TV, your lights or your oven.

Very good points. I can't wait until devices can read the emotions or inflections in my voice. I can voice-to-text most of my short messages, but anything that requires punctuation or, god forbid, emoji still requires manual input. And I don't want to have to say "period" or "exclamation mark" to indicate my desired punctuation. If I say something unusually loudly, insert an exclamation mark. If I pause at the end of a sentence (Word has been able to recognize a grammatically complete sentence for decades) and don't say "um" or "uh", put a period. If my inflection goes up or there is a question word in the sentence, add a question mark.

There is a lot of room for improvement in voice processing, across several dimensions of the voice.

And copy and paste. People always seem to forget the power of it. It's the GUI equivalent of "Search for that on Google" or "Now, SSH to this IP I found digging through AWS." Copying and pasting text from application to application is the GUI's clunky Unix pipe. It's universal and deeply important.

Taking sections of the last response, or hell, even having every response essentially be wrapped up in some sort of object you can reference in your next query to the interface is what all of these lack.

Even Android's "Search this artist" doesn't quite get there. The lack of context between queries is what murders Siri for me. That, and her seemingly random selection of what goes to Google and what goes to Wolfram Alpha. Sometimes even the "wolfram" verb prepended to a query just doesn't go to Wolfram, no matter what.

I've often postulated that copy and paste is perhaps the biggest productivity enhancement in the history of computing.

I know some software maintainers who might disagree. But I like PopClip (https://pilotmoon.com/popclip/) as an enhancement on top of that one.

I second PopClip as a fantastic product, incredibly useful. Their DropShelf[0] tool is also useful, but not nearly as much as PopClip. But definitely worth the money.

0: https://pilotmoon.com/dropshelf/

I use KDE Connect to enable seamless copy and paste between my PC and my phones. It's the single best thing I've installed in the last year or two.

Sure, but the difference is that it's (almost) always obvious what actions are possible in a GUI. With voice interfaces you're back to trial-and-error.

There is still a fundamental problem with voice: it has to understand your words.

A text field, in contrast, doesn't need any intelligence, nor do buttons. This is particularly important, for instance, for people living in non-English-speaking countries but using English in specific contexts (work, gaming, minor hobbies, etc.). Switching languages in audio applications is generally a PITA. And even when you do switch languages every time, the engines still have huge performance gaps between languages.

Software has become extremely tolerant of multiple languages IMO. Voice recognition interfaces are not so mature yet in my experience.

I'm not so sure about that. Check this out. One of the toughest fights in one of the toughest games performed with only voice commands. https://www.youtube.com/watch?v=5m2a2dLdZ0M

Now, granted, this is a specific use case, but, you know... "explore the space" and all that. (more cowbell!)

> One of the toughest fights in one of the toughest games performed with only voice commands. https://www.youtube.com/watch?v=5m2a2dLdZ0M

After 111 failed attempts :)

Still, it's a hell of an achievement.

EDIT: to be fair, Ornstein & Smough is a very tough fight even with normal controls.

Also notice the voice recognition fails to recognise some words like "item" even though they are spoken clearly. Almost gets the guy killed at one point.

The "play some good 60s rock" example isn't a VUI breakdown, it's a functionality gap in the backend. One that will probably be fixed pretty quickly, given the way things are headed.

A VUI breakdown would be inability to understand accents, or non-responsiveness to commands. As a user input, Alexa is pretty well buttoned up.

Sounds like the Enterprise computer:

Geordi: Computer, subdued lighting.

(computer turns the lights off)

Geordi: No, that's... that's too much. I don't want it dark. I want it cozy.

Computer: Please state your request in precise candlepower.

(The scene: https://www.youtube.com/watch?v=OPZnR3Ue1n4)

There will certainly be some aspects of the computer training the human, too. Just using this as an example, I don't know how much candlepower I want, but computers don't get bored or annoyed by my requests. I could start with 1 candlepower and move up to 10 if it's not bright enough. 100 might be too bright, so now I know what range I'm looking at. Next time I could just say "computer, 12 candlepower lighting, please".

Computers train users on how to use the computer all the time. It's less ideal than having the computer know everything, but once you know what you can expect from a computer, it's easier to get a good result.

I think that cuts both ways. If the computer can be trained to understand the user's intent, that seems like a better solution than forcing the user to think a different way.

Which would you rather do? Be forced to state your lighting preferences in candlepower, or have the computer learn that when you say "subdued lighting", you mean "12"?

Very true, but this is one simple example. Look at what Wolfram Alpha tries to do for even more complicated examples. If I put in "if I am traveling at 60 miles per hour how many hours does it take to go one hundred miles" it gives me an answer of 6000 seconds (1.66 hours). Very intuitive, and it actually ruined my example because I did not expect the site to understand what I was saying.

But if I type in "how fast do I need to go to travel 100 miles in 6000 seconds", now it has no idea what I'm talking about and instead gives me a comparison of time from 6000 seconds to the half life of uranium-241.

Now, when I get that result, I don't usually just give up on trying to figure out the answer. Instead I try to figure out what the computer expects me to say. Through some trial and error, I can shorten the query to "100 miles in 6000 seconds" and boom, I get the answer of 60 miles per hour. Instead of natural language, I'm using the search engine like a calculator.
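The trial-and-error query above boils down to one unit conversion. As a sanity check (plain Python, nothing to do with Wolfram's internals), 100 miles in 6000 seconds really is exactly 60 mph:

```python
# Convert a distance/time pair into miles per hour.
# 1 hour = 3600 seconds, so mph = miles * 3600 / seconds.
def rate_mph(miles: float, seconds: float) -> float:
    return miles * 3600.0 / seconds

print(rate_mph(100, 6000))  # 60.0
```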

The computer has just taught me how to use it. Ideal? No, but we work within the reality we're given. 12 candlepower is dim for you but for someone with decreased vision, that might be completely dark. The computer doesn't know unless it's taught, and we know from looking at history that users would rather the computer train the user than the user having to train the computer.

You asked: "how fast do I need to go to travel 100 miles in 6000 seconds", which is equivalent to asking "at what rate do I need to go to travel {rate}?" It's a nonsense question; you already know the answer. You need to go 100 miles per 6000 seconds.

What you should have asked is: "100 miles per 6000 seconds to miles per hour", which it will happily convert from the rate you gave to the one you really wanted.

I guess what you're saying is it should be able to figure that out, but at some point the old phrase "garbage in, garbage out" surfaces. You never told it to convert the units.

Wolfram is, and has always been, much more inclined to understand you if you work out exactly what you are trying to calculate beforehand.

Some phrases exist as a "wow, 1 million people phrase this problem this way, let's throw that in." The fact it can take an easily dictated, albeit strictly phrased problem, and get you your answer is really what I love about it. Now if Siri would just stop sending stuff to Google. -_-

What if you could define the equivalent of Bash aliases via voice control? This would allow users to tailor their experience from the default (possibly complex/unintuitive) commands to their own personalized ones.

Example format: "Computer, define X as Y"

"Computer, define subdued lighting as set lighting to candle power twelve"

Then the VUI just adds a new entry to the voice commands where saying X results in Y.
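A minimal sketch of that alias table in Python (the "define X as Y" grammar and the command strings are hypothetical, and the wake word is omitted): a define-utterance stores a phrase-to-command mapping, and later utterances are expanded through it before normal parsing.

```python
# Hypothetical voice-alias layer: consulted before the normal
# command parser ever sees the utterance.
class VoiceAliases:
    def __init__(self):
        self.aliases = {}

    def handle(self, utterance: str) -> str:
        # "define X as Y" creates or overwrites an alias.
        if utterance.startswith("define ") and " as " in utterance:
            name, command = utterance[len("define "):].split(" as ", 1)
            self.aliases[name] = command
            return f"alias '{name}' saved"
        # Otherwise, expand a matching alias or pass through unchanged.
        return self.aliases.get(utterance, utterance)

vui = VoiceAliases()
vui.handle("define subdued lighting as set lighting to candlepower twelve")
print(vui.handle("subdued lighting"))  # set lighting to candlepower twelve
```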

So unrealistic. They'd use candelas.

You're thinking too much like an engineer :-) It's not a speech recognition breakdown but it's certainly a voice interface breakdown in the sense of I can't get the device to do what I want it to do. As a user, I don't care where in the pipeline my attempts to communicate a desired action break down. I just know that they do.

Exactly. We're used to dealing either with humans, who are intuitive and highly adaptive, or with technology, which we manipulate and have total control over (so long as the system displays its status, we can find our way). We're not used to systems that expect us to interact with them in natural language but have very specific criteria around what we ask for.

It still feels a lot like the old text-based RPGs, in that you spend most of your time trying to figure out how to phrase something to accomplish a basic need, while angrily thinking "it would have just been easier/faster to pick up my phone."

It's 2016. How are we still OK with the unreasonable constraints of technology that make us jump through a hoop like a trained poodle to get the treat?

The same can be said for GUIs as well. Remove the search engine concept and you are only left with playlists and song/artist names on such sites.

We don't have an audio search engine equivalent yet, but that day is not far off either.

That's the thing. It is a use case with voice commands that map to specific actions. In the case of music, I can give Echo the name of a specific artist or maybe a playlist. But it breaks down pretty quickly if I tell it to play "some good 60s rock."

Ok, that is pretty damn cool. I've played Dark Souls so I can appreciate how difficult that must have been. Very impressive.

Devil's advocate though: this seems more like a case of the guy being good enough at the game to win in spite of the voice controls rather than because of them. Compared to a regular controller/keyboard+mouse/whatever there's just no contest in terms of input speed and precision. Not all genres are a good fit for this either. I'd be really interested to see if anyone could make it work with, say, a competitive FPS game.

Never mind that in order to use a voice service, it requires you to speak at a rate slower than many can type, all while demanding that the people in the room hush up so it won't get confused. Repeat if there was a mistake.

Try Hound. It's faster than anything I've tried, and its context management is just impressive as hell. The Echo's lack of negative clauses is really, really frustrating.

I just can't stand talking to a computer. Never liked the idea of it. I loathe voice-controlled telephone menus. I can type faster than I can talk (if you include the inevitable revisions -- even without it's pretty close). I don't even like to leave messages on voicemail. I don't think voice interfaces are anything I will ever use if there's another option.

That holds true with pretty much all first-generation products of its type. The first "smart phones" couldn't do a whole lot of things. Over time, the Echo will improve and you'll be able to hold conversations with it.

My children are quite young. The world is going to be an amazingly interesting place when they are my age.

I can recall the first time I ever saw a computer and how primitive they now look.

Now we have little bots that listen to you and reply with info.

When my two-year-old is forty - we will have ghost in the shell.

It's crazy, beautiful, and scary to me that we grew up reading cyberpunk fiction and watching anime (well, not all of us did), and yet pretty much all of us are actually building that future.

There is a balance between dystopia and utopia though.

We are all working at the Great Game - and the future is going to be interesting, but we can never turn back. So hopefully we keep the balance and get it right.

My worry, at this truly nascent stage of the technology, is that we'll fuck it up by not fighting hard enough for privacy policy.

We need privacy policy that is thinking at least 50 years in advance.

Those who control the government apparatus are thinking in advance. I personally feel that the tech sector's vision is myopically focused on today's profits and not on the future, where it should be looking, with the exception of this most recent case between Apple and the FBI. At least Cook's comments were salient, forward-thinking, and truly for the greater good... Let's hope that invigorates the tech industry as a whole to think about where we are headed.

Speech recognition has improved dramatically over the past few years through using cloud back-ends. It's actually usable for many tasks.

However, we still seem to be pretty far from natural language interfaces that make sensible inferences about the actions you're requesting and perhaps join multiple data sources to answer your query. There have been a lot of advances--don't get me wrong. But it's a very hard problem that's been worked on for a very long time.

Just like you hold conversations with Siri, Cortana and Google Now?

Are they not first-gen?

Well, I mean, they aren't fixed artifacts like a piece of hardware. I'm pretty sure they have been updated a few times.

Is it better than Google's voice recognition? Siri is completely useless for me, but Google recognizes everything I say (I love my new iPhone 6s, but I wish I could say "Hey Siri" and have it actually work).

The other issue is that it becomes less useful when more than one person is active in the room. Small party? The interface no longer functions, because talking in the background interferes.

And if you do get beyond a narrow range, does the user spend a lot of time thinking about how to craft a question so that the machine can understand it?

How complicated is controlling a TV or a radio? And voice is much easier for a variety of tasks than remote controls.

I think the main problem with voice interfaces is that they're not discoverable. You need a good understanding of what the system can and cannot do, its current state, etc. before even speaking.

CLI has the same issue, but at least you can man-xxx, which I imagine works a lot better in text than it does in audio.

I think the goal is that the system gets to be good enough that nobody worries about discoverability any more.

I think Google is quickly getting there with their search interface. I'm always amazed at what a good job Google does when I ask it a question like "what's the name of the instrument powered by steam" and milliseconds later it's showing me info about calliopes.

I really liked how this was done in the movie 'Her.' There's something especially nice about only having your attention distracted audibly and not visually, especially in public.

I wonder if the smartphone age will go away as quickly as it came. I picture a world where we just have smart wearables like a watch which has a tiny visual interface, but a powerful audio one (speaker, earpiece, put watch up to ear, etc). It seems a lot less intrusive. I imagine as we get better with AI and voice recognition, it'll be as practical as a phone. What I'm able to do with Google Now on my watch is fairly impressive today. We already have the technology to understand things in context like "Navigate to Katz's deli" brings up Google Maps to the deli as opposed to a google search results page about navigating to a cat themed deli, which was the status quo not too long ago with voice search.

I imagine carrying around this big selfie/facebook machine, constantly charging it, whipping it out all the time, etc. will be pretty gauche if wearable-only solutions become competitive.

For many functional tasks, I can see an auditory UI being superior. But currently most people use their smartphone to skim content. I don't want the equivalent of listening to voicemail for everything.

Not to say that content can't shift for the medium, just as it always does. What would an audio Facebook sound like?

Well, I do that now sorta on my watch with its small screen. I scroll through notifications, but no, I don't get the full FB web or mobile experience. I'm not sure how many people actually want that; I often hear complaints about how phones and apps aren't simple anymore. I also believe that we really haven't figured out the best way to use these small screens. I'm surprised at how usable my watch is sometimes with its 320x320 screen at 1.8". For reference the original iphone was 3.5" at 320x480 resolution.

For teens and such I can see the big phone never going away, but for most adults, having an inconspicuous wearable just seems like a more refined experience. I imagine there's a logical progression here from desktop > traditional laptop > ultrabook laptop/convertible > tablet > mobile > wearable. You lose functionality with every step, but depending on the use case, it doesn't really matter. For people in my peer group, a wearable that could work without a phone would sell like hotcakes.

> The transition from primarily visual UX towards an auditorial UX is really powerful.

It's also less accessible. I'm sure auditory UI is useful in many cases, but it also seems to be more cumbersome in others. In any case, I hope that pervasive auditory UI doesn't become any sort of standard without an accompanying visual/physical interface.

> Try watching a scary movie with the sound turned off, it turns into a comedy

Allow me to be pedantic and say that it is being fully immersed in the context of the movie that really matters. You could probably achieve a similar suspenseful effect with silence plus subtitles, although I'm sure the experience isn't identical. Otherwise, the deaf could never enjoy scary movies, including me.

>It's also less accessible.

For whom? To the blind this would be a godsend. From a practical medical perspective, audio is superior because we have decades of experience with effective ear implants to help the hard of hearing and the deaf, but the visual equivalent still eludes us.

> To the blind this would be a godsend.

Actually, I'd imagine that a good old-fashioned tty is pretty good for a blind person: it's TUIs and GUIs that get progressively more painful.

Source: am blind without my glasses; can imagine preferring ed to emacs, vim, Atom, SublimeText if I had to use an audio interface.

> To the blind this would be a godsend

For sure. Different interfaces disadvantage different classes of people. There is no silver bullet; I'm trying to point out that an exclusively audio/voice-driven UI would not be desirable.

> we have decades of experience with effective ear implants

The problem is multi-faceted. Hearing loss, especially from a young age, often leads to difficulty speaking -- it is no use if a voice-driven system can't understand you in the first place.

And while cochlear implant technology has helped a lot of people, it is by no means a cure, and there are many, many others that don't benefit enough from assistive technology to achieve functional equivalence (which is the key phrase when talking about accessibility). I have a cochlear implant and haven't worn it in years, because it really doesn't help.

> It's also less accessible

Well, I think blind people would disagree with you.

> I hope that pervasive auditory UI doesn't become any sort of standard without an accompanying visual/physical interface.

Any speech interface could be trivially translated to a text interface, right?

> Well, I think blind people would disagree with you.

Answered downthread.

> Any speech interface could be trivially translated to a text interface, right?

Pretty much, which is why UIs should not be exclusively auditory, that is, delivered without an accompanying visual interface (text or otherwise). Ordering the Echo Dot verbally is a cute gimmick given its premise, but it would really suck if otherwise useful products and services were only usable through audio.

Hopefully the audio UI trend does not follow the obsession over touch screens: a rapidly adopted, de facto standard driven by tastemakers that leave little consideration for others that might prefer an actual keyboard or other physical affordances.

> Allow me to be pedantic and say it is that being fully immersed in the context of the movie that really matters.

I hope to not be a super pedantic ass for pointing out that the 'immersive' media in films is the audio, not the visual components.

> the 'immersive' media in films is the audio, not the visual components

That's a non-falsifiable opinion, really (even if it does apply to the majority of the population). I'm living proof you can enjoy movies without the audio.

It's the sum of our experience that colors our perception -- almost irrevocably in this case, since I imagine it would be difficult for the typical person to really be able to enjoy something in complete and utter silence.

> I'm living proof you can enjoy movies without the audio.

I am not looking to equate immersion with enjoyment, and by no means do I intend to disrespect the manner by which you enjoy a type of media. My apologies for coming off that way!

When I refer to 'immersive media' I am referring to the 360-degree omnidirectional dispersion pattern of sounds and our similarly omnidirectional hearing of those sounds. This is an 'immersive experience' as opposed to a 2-dimensional or stereoscopic experience, which is what we get with visual media. Television/film screens fire light directly at the eyes; even in IMAX situations the film is never experienced behind us. That isn't immersive, whereas, say, a VR headset can potentially offer this type of immersion. But since that technology is still in its infancy, I think it's too early to call it fully immersive like audio is.

> 'immersive media' I am referring to the 360-degree omnidirectional dispersion pattern

Then that is splitting hairs over a definition of immersion, and quite unrelated to how the word was used in my original comment. Had I instead said "fully engrossed," my point would still hold, and you would not have one.

I understand you were being "super pedantic," but if you're going to do that, then you should be super precise in the pedantry, otherwise you're arguing a strawman.

>> auditorial

Don't you mean oral or aural?

You would probably be interested in what we've been building over at https://www.narro.co.

It would be nice if it could extract forum discussions, like YC and Reddit. Sometimes I like to hear the text I am reading, it helps with concentration.

Yes, I'd like to see the ability to select text, right-click, and choose "read out loud".

I think all the browsers on OS X support that using the system text-to-speech (edit: Safari and Chrome, not Firefox)

I'm using Linux. It seems that Linux is falling behind in the area of speech input/output. I hope they will catch up.

Voice will become an important, if not the primary, interface to home/car audio/video.

"computer lights on" "dimmer"

no thank you.. i will use my hand

A device to change the channel on my TV? No thanks; I'll just use the dial on the TV.

let me pick up my phone, open the app for light control, dial in some setting, and hope the app doesn't crash.

TV remotes are awesome because they have physical buttons, and they're fairly dumb... almost no chance of issues.

And if you're on the couch watching a movie and the light switch is on the other side of the room? Or you want to switch on the porch light for guests. Or switch off outside lights?

i get my non-lazy ass up.

For the few times when I may need to walk a bit more around the house, it's a non-issue.

and what if you weren't so mobile?

They should announce Amazon echo for the deaf, which would just be a screen.

... with a couple of kinect type devices to monitor one's signs.

"If you have more than one Echo or Echo Dot, you can set a different wake word for each".

This is something I've been thinking about: it's becoming more problematic, but it's also an opportunity for real ubiquity. I have 3 separate devices nearby that are Google Now voice activated (the newer devices support this even if the screen is off), and they will sometimes trigger at the same time accidentally.

Since the processing is cloud based, and they know my identity, why don't the devices recognize this fact and cooperate. Instead of just 7 beam forming mics in the Echo, if you have two within hearing distance you could have the benefit of 14 and a unified response. Don't tie the request & response to a particular device; instead, think of it as a ubiquitous network that moves with you as you walk around the household. You should be able to continue your conversation from one room to the next seamlessly.

> Since the processing is cloud based, and they know my identity, why don't the devices recognize this fact and cooperate. Instead of just 7 beam forming mics in the Echo, if you have two within hearing distance you could have the benefit of 14 and a unified response.

The echo and noise reduction software that I'm aware of can't really do that in a reasonable fashion.

With current solutions, you've got one DSP that's receiving all the audio streams simultaneously, and they need to be exactly synchronized in time. Then, using basically pattern-matching, it figures out what direction the user's voice is coming from, and combines some/all of the audio streams together to eliminate environmental noise and make the speech as clear as possible.

To do this with separate devices, you'd want extremely precise time synchronization. Which is possible, but I wouldn't want to implement it.

The extra processing and synchronization would take longer, and delay input to the speech recognition engine. I don't think it would enhance the user experience.

Edit: spelling.
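In toy form, the delay-and-sum combination described above looks something like this (a pure-Python sketch with the delays given up front; real systems estimate the delays themselves, e.g. via cross-correlation, and need the sample clocks tightly synchronized -- which is exactly the hard part across separate devices):

```python
# Toy delay-and-sum beamformer. Two "mics" hear the same source, one
# delayed by a known number of samples; aligning and averaging the
# streams reinforces the speech while uncorrelated noise averages down.

def delay_and_sum(streams, delays):
    """Shift each stream by its delay (in samples), then average them."""
    aligned = [s[d:] for s, d in zip(streams, delays)]
    n = min(len(a) for a in aligned)
    return [sum(a[i] for a in aligned) / len(aligned) for i in range(n)]

signal = [0.0, 1.0, 0.5, -0.5, -1.0, 0.0]
mic_a = signal + [0.0, 0.0]   # hears the source immediately
mic_b = [0.0, 0.0] + signal   # same source, two samples later

combined = delay_and_sum([mic_a, mic_b], delays=[0, 2])
print(combined)  # recovers the original signal
```

With wrong delays the streams partially cancel instead of reinforcing, which is why the time synchronization matters so much.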

Just have the Echo that hears the person best be the one that responds. So simple, and easy to implement. I honestly don't understand why Amazon hasn't fixed this yet. It's so fucking obvious.

> So simple, and easy to implement.

Ah yes, the rallying cry of the person not doing the actual development work... In my experience, rarely is _anything_ "So simple, and easy to implement".

Agreed. Doing something sensible at a higher level than the actual audio recording would be easily possible.

> I don't think it would enhance the user experience.

Baidu trains the voice recognizer by adding all kinds of noise to the training data. I think it might be easier to do that than use multiple microphones. The neural net learns to do the difficult process of separation of useful data from noise.
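That style of augmentation is simple to sketch (illustrative only -- real pipelines like the one described mix recorded environmental noise, music, and chatter into clean utterances at controlled levels, not uniform random noise):

```python
import random

def augment_with_noise(clean, noise_scale=0.1, seed=0):
    """Return a noisy copy of a clean waveform by mixing in random noise.
    Training on many such noisy copies pushes the recognizer to learn to
    separate speech from background noise on its own."""
    rng = random.Random(seed)
    return [x + noise_scale * rng.uniform(-1.0, 1.0) for x in clean]

clean = [0.0, 0.5, 1.0, 0.5, 0.0]
# One clean utterance becomes many training examples, each with different noise.
training_batch = [augment_with_noise(clean, seed=i) for i in range(4)]
```

The appeal is that the "hardware" cost is zero: you spend compute at training time instead of adding microphones at inference time.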

I learned to not have the wake word be "Amazon" when I was watching online training for AWS. The Echo went nuts until I finally paused everything and changed the wake word back to "Alexa".

They really need to make it so that all of the Amazon Echos on the same network use a proximity algorithm to determine which one responds. Simply: The Echo that hears you best should be the one to respond.

I want to have an Echo in every room, and I don't want to have to remember all their different names!
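The selection step itself is easy to sketch; everything below (the device names, the loudness measure, the sample values) is hypothetical, and the genuinely hard part is getting comparable, synchronized level measurements out of each device:

```python
# Hypothetical "who heard me best" arbitration: each device reports the
# audio it captured for the wake word, and the service lets only the
# loudest one respond.

def rms(samples):
    """Root-mean-square level of an audio chunk."""
    return (sum(s * s for s in samples) / len(samples)) ** 0.5

def pick_responder(wake_word_audio):
    """wake_word_audio maps device name -> captured samples."""
    return max(wake_word_audio, key=lambda name: rms(wake_word_audio[name]))

wake_word_audio = {
    "kitchen":     [0.40, -0.50, 0.45, -0.40],    # speaker is nearby
    "living_room": [0.20, -0.15, 0.18, -0.20],
    "bedroom":     [0.05, -0.04, 0.06, -0.05],    # heard through a wall
}
print(pick_responder(wake_word_audio))  # kitchen
```

Even this crude version would beat having every device answer at once, though room acoustics can make "loudest" diverge from "closest".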

> I have 3 separate devices nearby that are Google Now voice activated (the newer devices support this even if the screen is off), and they will sometimes trigger at the same time accidentally.

> Since the processing is cloud based, and they know my identity,

Interesting, so everything said in that room gets processed and potentially sent to Google for indefinite storage? What a 1984-style luxury.

AFAIK, every one of these devices does nothing until a "wake word" is heard, and only then do they record+send.

Having all of the devices listening all the time would be a bandwidth and power nightmare, if not for the sender, for the receiver.

Correct, of course it can activate accidentally.

https://history.google.com/history/audio has a list of all audio recorded

What about accidental triggers of the wake word? What about planted "wake words" to record people discussing "inappropriate" things?

For the Echo, at least, it has to use your home network, so you could pretty easily run a packet capture to see if it's ever sending audio out when you don't want it to.

Harder for things with cellular data, though.

It's a cat-and-mouse game: What if it only sent the clandestine information when it picks up the "normal" word? The point is you don't control the device or its software.

> What a 1984-style luxury

Exactly why I think none of this is worth it (Echo, Google Now, Siri, smart TVs, etc.). Especially given the current applications of the third-party doctrine, you are giving up the right to privacy for everything that is said in your home.

I agree with this entirely. I've been waiting patiently for a way to add microphone distance to my Echo and this is perfect for that... except it doesn't work that way.

I am very much hoping they fix it in the future and add a software layer to combine/route commands with one single wake word.

It's also a bit annoying that the Android Wear version of Now doesn't work the same as the regular Android version. For example, the full-sized one seems much more flexible with wording, and supports listening in several languages at once, while Wear is limited to one language.

But it's limited to 3 wake words, which is weird. I'd rather say "Office, order socks" than try to remember that the one in my office goes by "Amazon".

When did I turn from the enthusiastic kid who dreamed of audio-controlled personal assistants like this to a cranky old man who doesn't want anything remotely spy-possible in his house?

I think when we were kids we didn't think that the personal assistant would have to communicate with the outside world via the internet in order to perform its function.

If all of "Alexa" was included in a disconnected local database I bet it would still be as appealing.

Rosie on the Jetsons didn't have to "phone home".

Almost. More specifically, we didn't think that the personal assistant would have to communicate with a corporation that wants our info to make (more) money. The government option doesn't sound any better, either.

I think they are rather creepy, because it's so obvious there is (or, could be) a hidden agenda.

While indeed creepy, I ordered the original Echo as soon as it was made available, but I'm probably a special case. I live by myself and barely even speak out loud at home.

If Amazon can somehow monetize my primary use of Echo as a glorified kitchen timer I will be impressed.

> I live by myself and barely even speak out loud at home.

It occurs to me that the background noise in your home actually reveals a whole lot about you:

- What you're listening to and when

- What you're watching and when

- What type of gentleman's material you enjoy and when

- When you leave home and get home

- When you wake up, when you go to bed

Some of these can be limited by the size of your house, but the trend in urban dwellings has been towards smaller so one unit could presumably capture every sound in your home.

And in the end, people pay money for some company to install a device that collects all this.

The tech insanity has really gone far...

Finally the Telescreen is here.

Your first three points are moot in my case because, as a testament to your mentioned small apartment size, I consume all my entertainment with headphones after some real passive aggressive comments from neighbors a few years back.

When I wake, sleep, leave, and come home could be monitored by Echo, but it's also already being monitored by other devices I own, and it's data I'm not particularly concerned about at the moment.

> I live by myself and barely even speak out loud at home.

Not sure how other people feel about talking out loud at home, but as someone who also lives alone (in a 250 sqft apartment) and always wears headphones, I can't really imagine talking out loud. Just seems weird for some reason. I never use Siri either.

Wonder if that's a living alone thing, or a small apartment thing, or ...?

$180 for a kitchen timer seems a bit steep.

I would pay $180 for a voice-controlled kitchen timer which did not need an Internet connection to function and had verifiably secure command log deletion.

I'm less than enthusiastic about a $180 kitchen timer that uploads everything I say to the cloud for analysis, even if I understand that the analysis is to some degree necessary to improve the voice recognition.

While I hear what you are saying (no pun intended), it's important to be clear that it is not uploading everything you say to the cloud. It's uploading what you say once it wakes up by detecting the wake word, which is done completely locally.

It was $99 (there was a special offer when it was first announced at the end of 2014).

$99 for a kitchen timer seems a bit steep.

Considering most smartphones already have this app on them - I'm going to agree.

FREE vs. $99? No contest there my friend

We all spend our money how we want, and cell phones most certainly aren't free, either.

Aside from that, I didn't purchase the Echo with the intent of it being primarily kitchen timer. It just so happens that after owning it for over a year my usage of it is mostly limited to that.

My usage is probably around 85% timers and alarms, 10% streaming music, 4% shopping lists, and 1% everything else.

I'd be interested to find out how much you still use it a year from now.

Do you think you've used it like you thought you would? Or did you have ideas about how you might use it that didn't pan out, or that the device didn't handle very well?

I ordered it originally purely on the "Oh, cool gadget!" factor, and I was willing to part with $99 for it.

I really didn't have a particular use case in mind at the start, but I was (and still am) impressed by the sound quality from such a small speaker. It's nice to be looking in the fridge and say "Alexa add X to my shopping list" or when my hands are covered with flour say "Alexa set a timer for 30 minutes" or whatever. And for those things it's worth the cost to me.

Most of the features that have rolled out just seem gimmicky, though. Take the news briefing: It either provides too little info to be useful, or it drones on and I get annoyed by the voice which, while it sounds natural compared to Microsoft Sam, still feels cold and artificial. In general I like having more control over my internet actions. I'll never use it to order a pizza or anything from Amazon because I don't know what happens if it misinterprets me or I make a mistake. And the third party apps are clunky ("Alexa, ask X to do Y").

To sum it up, aside from the very basic features I've used since day one it just feels like a toy.

Basically everything B2C today is a data play. Customers want everything to be cheap or free, so the only way to make money in B2C is to turn the customer into the product.

It's a deflationary race to the bottom. The bottom is a hell where everything watches you and sells absolutely everything about you to whomever can afford to buy the data.

Whenever I read these, I can't tell if the group is paranoid or prescient. But anyway I ordered one via my alexa. Amazon probably already knew I would.

> I think when we were kids we didn't think that the personal assistant would have to communicate with the outside world via the internet in order to perform its function.

Human personal assistants were connected to the outside world -- how else would they make appointments and reservations, book flights, find out what the weather would be, etc.? The whole point is to be connected to the outside world, automatic or no.

> Human personal assistants were connected to the outside world...

There's a difference between the "always on" communication these devices have and communication the user specifically requests.

When I want to make an airline reservation, I'm requesting the device to send the booking information to the airline. I'm not asking it to send a recording to the mothership of everything that happened in my home for the last 5 hours, which a human assistant would never do.

Hah, it'd be like hiring a personal assistant from a staffing agency who is constantly on the phone with the staffing agency parroting what you say.

That's also not what's happening with Echo. You'd literally have a few seconds of audio being sent to Amazon and then some text (the result of the ASR) being sent to the third party ticket search / reservation system.

Sure, but I also wouldn't let a human assistant live in my bedroom 24/7 listening to everything I say. I would also choose my words and topic differently when a human assistant is around.

You have to be able to trust that Echo isn't recording everything you say, unless you prefix it with "Alexa", and that this behavior will never change (say this is the behavior for the average user, but with a police warrant, they're able to tap your Echo).

I'm part of the group that thinks the tradeoff is worth it for the convenience, but I understand why many people would disagree.

This is exactly it for me. I'd buy an echo and a dot for every room if it didn't phone home.

I wonder what sort of memory-related tech it would take to pack nearly all of the internet into a small space, have it incrementally update (the internet!), and still fit in available memory.

Besides, any contact with the outside world would require communication, so you can't have an entirely standalone gadget.

Minus videos and images over a certain size... not all that much. And it would compress pretty well.

I wonder if the internet archive has a record of the size required minus images.

Couldn't the FBI legally get a court order to listen in on conversations in a room that has one of these? They already do that with car assistance services. [1]

[1] http://www.cnet.com/news/court-to-fbi-no-spying-on-in-car-co...

Echo (supposedly) doesn't start sending audio to Amazon until you trigger it with a "wake word", i.e. "Alexa".

Of course:

a) it's not open source so we can't be sure (aside from monitoring network traffic, which is probably encrypted)

b) if the FBI is successful in compelling Apple to develop a backdoor for the iPhone there's nothing stopping them from compelling Amazon to do the same with Echo.

c) better hope you don't say "Alexa" or something Echo mistakes for it.

The traffic is encrypted. But you could certainly watch the network traffic and see that there's no traffic if the Echo doesn't wake and the lights don't turn on. (Of course, you'd have to trust that it isn't time delayed for hours in some sort of intentionally-sneaky way.)

It would also be possible to take a look at the hardware design and determine the linkage between the "mic mute" button light being on and power going to the mics.

The customer can set the device to provide both audio and visual indication when it "wakes up" and begins streaming to the cloud. And, of course, the customer can also press the mic mute button to avoid accidental wake up.

Yes, the FBI could try the same approach with Amazon as they are trying with Apple. For all of our sake, let's hope that Apple wins.

> It would also be possible to take a look at the hardware design and determine the linkage between the "mic mute" button light being on and power going to the mics.

How would the mics listen for the wake word if they aren't always on?

There is a mic mute button that is able to turn off the mics, which then prevents the device from waking up, as it is not receiving audio signals to process and detect the wake word. When the button is activated (== the mics are off), there is a glowing red light illuminated inside the button.

My point was that you could check to see if the linkage between that red indicator light and the power going to the mics was in software or hardware.

This is analogous to the warning light that many laptops have for when the built-in webcam is on.

> b) if the FBI is successful in compelling Apple to develop a backdoor for the iPhone there's nothing stopping them from compelling Amazon to do the same with Echo.

No backdoor needed if the information is sent to Amazon. All that is needed is a court order for Amazon to hand it over.

Sure, but all you'd get are commands you give Alexa ("Alexa, turn off the lights", "Alexa, what's the weather today"), which I suppose could be interesting to law enforcement, but certainly not as interesting as the "full-take" of an always-on wiretap.

I'm suggesting in order for the FBI to use Echo (or any other internet connected device that has a microphone) as a wiretap, the FBI could try to compel the manufacturer to write, sign, and push an update that causes the device to transmit audio to the FBI at any point.

That would have seemed a little far fetched in the past, but the current FBI/Apple situation could set a precedent.

The answer is obviously 'yes'. If there is a way for Amazon to listen to conversations then a court can compel them to give the FBI access.

I'm not terribly worried about various ways companies expose me to govt surveillance that requires a court order.

I do worry about said court orders being rubber stamps, and about surveillance that DOESNT require a court order.

Otherwise we can make no technological advancement.

I love "smart" devices, but hate "devices that needlessly insist on connecting to the Internet".

One of the worst offenders is Dropcam. They have a super camera, easy to set up and use. Great picture quality. Would be an awesome baby monitor or "closed circuit TV replacement". But why the goddamn hell does it need to connect to the Internet? Why is the only option available to needlessly stream video out of my home network to the cloud, only so that I can then stream it back into my home network for viewing??? WTF? That's both a waste of outbound bandwidth and a waste of inbound bandwidth. I should be able to put it on my network, switch off the cable modem, and still be able to view video locally. How hard is that? I could do that with a webcam and a really long USB cable!

Their business model depends on some percentage of their customers using the subscription service.

My guess is: if they offered the version you describe, they'd need to make it much more expensive. Which many consumers would find odd: the one with fewer features would cost much more. Granted, those consumers wouldn't be looking at the big picture...but I find many consumers don't. Up front costs matter a lot to consumers.

As dumb as it sounds, it is probably easier that way. Sometimes in LANs it is easier to get data out than back in. For example, a lot of dorm networks don't support Chromecast devices because the Chromecast tries to multicast on the LAN for discovery, but dorms have networking policies that prevent this.

A webcam that sends the data out to the internet then back would avoid the discovery issue by using an external webserver as a rendezvous point.

I don't think people spend a lot of time thinking about their home networking. You could imagine most people just plug in their home routers and it is a crapshoot whether or not the router will support the necessary functionality, whereas a router will always enable communication to the outside world (or people would return it ASAP).

With that said, this seems like a straightforward technical problem that may have technical solutions.

Ease of setup for regular Jane/Joe, because they know shit all about router configuration. That's why devices just transfer everything over someone else's computer, a.k.a. "teh cloud".

If you don't care about recording video or video recognition features, the cheapo chinese cams on amazon actually perform pretty well. For $80 you can get 720p video with IR lights, speakers, microphone & it can move around. Usually it doesn't zoom like a dropcam can.

If you're willing to configure a NAS server somewhere, you can even record the video locally.

The video quality probably doesn't compare, but I've used an old iPhone with iPCamera (I'm sure Android equivalents exist) for this purpose; it simply hosts an MJPEG stream at a local IP address. It should be simple to start or stop recording the stream on any device that's connected.
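For the curious, an MJPEG stream is just concatenated JPEG images, which is why this kind of local setup stays simple. A sketch of splitting such a stream into frames (the camera address mentioned in the comment below is made up, purely illustrative):

```python
def split_mjpeg_frames(data):
    """Split raw MJPEG bytes into individual JPEG frames using the
    JPEG start-of-image (FFD8) and end-of-image (FFD9) markers."""
    frames = []
    start = data.find(b"\xff\xd8")
    while start != -1:
        end = data.find(b"\xff\xd9", start)
        if end == -1:
            break  # incomplete trailing frame
        frames.append(data[start:end + 2])
        start = data.find(b"\xff\xd8", end)
    return frames

# In practice `data` would come from the phone's local HTTP endpoint,
# e.g. urllib.request.urlopen("http://192.168.1.20:8080/video").
# Here we use a synthetic two-frame stream to show the parsing:
fake_stream = b"\xff\xd8AAAA\xff\xd9--\xff\xd8BB\xff\xd9"
frames = split_mjpeg_frames(fake_stream)
print(len(frames))  # 2
```

Since nothing ever leaves the LAN, this is the "webcam with a really long USB cable" model the grandparent comment asked for.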

Alexa probably uses forms of machine learning and also queries lots of services to find the answers you need. Also it learns from every user and gets better for every user this way. That would be really hard to do with an offline device.

Yes, that is exactly how it works.

If you, as a customer, want to, you can go to Amazon.com and delete all your voice history (or any single interaction).

This is probably a function of the amount of bad news you've read over the years about people getting exploited, taken advantage of, spied on, etc. When you're a kid it doesn't even really seem like a thing.

When you're a kid, you generally assume people around you are all wonderful.

... Then you gain life experience.

/75% jokingly

We'll be dead soon. Enjoy the little things.

Nice try NSA.

Yet I'm guessing you carry a smart phone in your pocket almost everywhere you go.

But phones don't have a microphone on them do they? :-)

Uhm, what?

Sorry, I thought the smiley face would've been enough to give away the sarcasm. For some reason the /s felt like it removed the infinitesimal amount of comedy from my post.

(should I tell him, guys?)

Don't forget the harried parent of a child with low impulse control.

These things would be a lot less "Big Brother" for me if I had a mic key in my pocket that would only turn the mic on when I squeezed it.

Hey buddy, want to bet that Amazon is using this massive collection of voice-to-text data to sell to other companies like Apple and Google?

Riight, because Siri doesn't generate enough voice data for apple.

The enthusiastic kid would probably get distracted and discouraged when X can't do "what I really want, like Iron Man." Meanwhile, the "cranky" old man has been mischaracterized, because "cranky" is often confused with wisdom and experience.

When you realized that the government was making an all-out assault on the most fundamental American rights, and that the civilian sector did absolutely nothing to assure your privacy and anonymity, out of sheer greed and the narrow-minded foolishness of undermining its own success.

I am sure you would not have a problem using these kinds of systems if it were assured that you could not be tracked or monitored because the devices and systems were secured in overlapping ways.

In hindsight it all sounds amazing, but even ignoring the spy possibilities, it gets old fast. I barely use Siri, even though I can do a lot of this with it. Since I got my first Siri-enabled device, I've used it mostly just for joking around, and my daughter asks for her hockey scores. That's the extent of it.

Because when we dreamed of this as kids, the thought of the corporations behind these technologies that harvest our data for their gain didn't come up.

Exactly when did UnconventionalButTotallyLogical = CrankyOldMan ?

The moment you clamored for MIT embedded Linux software and the "let's kill all the GPL it's bad for startups" meme came up.

So now this cool audio controlled personal assistant is just another gadget to buy more stuff from Amazon, instead of something you control.

Is this voice recognition stuff based on MIT-licensed open source speech recognition? I have a project that would benefit from good quality speech recognition.

Well, no, that's the other thing: "let's put everything in the cloud so nobody owns anything anymore!".

Voice recognition is done on some Amazon server. If it goes down or changes API in five years, it will render this thing a brick.

There is something delightfully ballsy about making this only available to users of Alexa Voice shopping:

"Echo Dot is available in limited quantities and exclusively for Prime members through Alexa Voice Shopping. To order your Echo Dot, use your Amazon Echo or Amazon Fire TV and just ask: "Alexa, order an Echo dot"

Also, this makes me sad. I'd kind of like to try this out, but I have no Alexa voice service currently (I don't think)

Even though I own an echo, I wanted to get in early. Here's a link until they remove it: http://www.amazon.com/gp/offer-listing/B00VKTZFB4/

That link still works for ordering :)

I don't imagine it will be like that forever. It's just a clever way to limit demand until they can ramp up manufacturing, or work out the bugs, or whatever their motivation is for keeping it in a limited release for now.

... and also a way to introduce the concept of shopping via Alexa (I would imagine one of AMZN's primary long term goals for the project)

Actually you can already shop via Echo/Alexa today. It's effectively limited to reorders and music for now.

I think it needs a base Amazon echo to work if I understand correctly.

No, it needs external speakers, unlike the original Echo. However, you only need an Echo to preorder a Dot, you don't need an Echo for a Dot to work.

FTA: Includes a built-in speaker so it can work on its own

Built in speaker is for alarms, not media, I think.

it does seem to exclude media.

> Built-in speaker for voice feedback when not connected to external speakers

> Includes a built-in speaker so it can work on its own as a smart alarm clock in the bedroom, an assistant in the kitchen, or anywhere you might want a voice-controlled computer

That's crazy, why do I want this without a speaker? The bluetooth speakers they recommend are all really expensive; a speaker + Echo Dot is more expensive than a regular Echo... why wouldn't I just get a second Echo?

You can plug it into a hifi system.

The speaker is for voice feedback only. Doesn't actually support music, news, audiobooks etc.

Do you have a source for that? I have an Echo in my living room, but I was thinking of picking up one of these for my bedroom. I don't really care about sound quality as I would just be using it for Philips Hue, weather, and news.

Sure, this was the link that was emailed to me from Amazon, which also included the following text:

> With its built-in speaker, you can place Dot in the bedroom and use it as a smart alarm clock that can also turn off your lights, or use Dot in the kitchen to easily set timers and add items to your shopping list using just your voice


See the technical details:

> Built-in speaker for voice feedback when not connected to external speakers

My Echo news is a mix of Text2Speech and audio, so I'm not sure that it would work for News.

Man, that would suck. The computing internals of the Echo are less impressive than a Raspberry Pi Zero. The Dot has Bluetooth and apparently WiFi to communicate with speakers and network devices. The real benefit of the Echo over other homemade voice command devices like Jasper (github.com/jasperproject) is the proprietary far-field microphone array.

> The real benefit of the echo over other homemade voice command devices like jasper(github.com/jasperproject) is the more proprietary far-field speaker array.

Um, and the insane underlying voice API?

No it doesn't

I guess it's time to order an Echo.

Somewhat related, but if I don't subscribe to any of the services listed, this is a pretty useless product for me. I don't listen to internet radio, I don't stream music, I don't order delivery, I don't use Uber, there are already 10 million ways to check the weather, and my life isn't busy enough to need a voice-activated calendar.

Is this the future of tech? Like do I need to have some kind of urban-go-getter lifestyle to find use in any of this? When can I get something useful, rather than "thing I already do, but in a new package"?

What would you find useful? You seem upset that a product was designed for a user that is not you, but that doesn't mean it doesn't have a use. Subscribing to music streaming services, ordering delivery, using Uber; these aren't incredibly uncommon things just because you don't use them. It is rare for new and exciting technology to just pop out of nowhere. Almost all new products are iterations of previous products in new and interesting packages, it's just up to you to decide if it's worth moving to.

Totally fair point! But would you buy an Echo Dot if you only used Uber and didn't use any of the other services? Or if you used 1 or 2 of the services? How many of these services do you need to use before the functionality of Echo becomes apparent?

I want to be a fly-on-the-wall when someone sets one of these up in their home. I can't picture it fitting in with my lifestyle, so I'm curious to see how others would actually use it. Or would it just gather dust and become a conversation piece?

I find it fantastically useful for social gatherings in my small apartment. While cooking we listen to music from the Echo, and have equal control over the music selection (vs "Who has the iPhone? Can you turn it up? Oh, it needs unlocked") and timers for cooking. It could be far more powerful with playlist creation.

After that, it's Uber, schedule, and weather on my way out the door. As I leave I ask it to turn off the lights.

So I use at least 5 of its features (and stream Pandora/NPR on it, so 7?), and find it useful. I don't think I would miss it, but I do find myself wishing for it a bit when I'm at a friend's house that doesn't have one.

we've had an echo for about a year now and we love it.

by 'we', I mean my busy family of four. it acts as everything from shopping lists to homework timers to streaming pandora/spotify to telling jokes -- and more. we easily talk to her (she is basically part of the family) a dozen times a day.

i can totally see how someone who doesn't have all this commotion and such would think it useless. for us tho, it's not useless. it's both fun and functional.

Personally, I won one of these in a hackathon and never thought I would use it at all. But I set it up anyway and found it actually very handy: a news report while I'm cooking breakfast, timers for things, playing music. I never use OK Google / Siri on my phone, because if I get my phone out and unlock it I might as well just open the timer app or Google the question at that point. But with the Echo, I can just talk while I'm doing something else and get information about different things.

Yes you can check the weather a million ways, but those usually require some kind of dedicated screen time, watching tv, loading up a website, checking an app on your phone. Whereas with the echo you just ask it while you are doing something else and it gives you the report.

Sometimes cool new technology just isn't for you.

My problem with Alexa is, I don't want to invest in a new ecosystem. I'm fine with Amazon being the hub that connects all of my services, but I don't want to use Amazon To-Do List, Amazon Prime Radio, Amazon Traffic, Amazon Sports, Amazon Calendar, Amazon Weather.

That being said, they announce partnerships with more and more services every month. Things are looking up.

It's not perfect, but it is linked now with Spotify, ESPN and other publishers, Google Calendar...

More importantly, they have done a good job (leagues better than the competing voice services) of opening their service to developers thru Alexa Skills, which has enabled hundreds of added features including things like ordering an Uber.

I just wish the Skills weren't behind that unnatural syntax.

E.g. "Alexa, ask Recipes how do I make an omelet?" instead of: "Alexa, how do I make an omelet?"

I imagine it's to prevent conflicts but I'd like the option to put some services in the default namespace as it were.

I use my Echo exclusively with non-Amazon services: Google Calendar, Spotify, Philips Hue.

So the product is quite open. That being said, the third-party experience could be smoother - it is a minor pain to have to specify "with Spotify" every time I want the Echo to play music.

Overall I'm happy with my purchase, though.

I'd be happy for more full-service ecosystems to choose from. The more that users get fragmented between these ecosystems, the more each ecosystem is incentivized to open up.

I like Echo because (after setup) I don't have to use my phone with it. And at this point, there's little reason not to be in Prime. $99/year is pretty affordable for most of America.

There is an open API, so this situation will improve over time.

Alexa and AWS Lambda are two of the things I'm most interested in these days (disparate, I know) but they're also things without open source equivalent. I'd love to see that change.

There is an api where you can build your own integrations (skills).

Yep. I'm already invested in Apple products, for better or worse. If Apple comes out with something like this for Siri I'll snatch it up. I actually kind of resent the fact that my shopping experience with Amazon has gotten worse in the past couple years while they push their original series, streaming services, and devices (many of which suck).

If it worked with Google Music, I would have bought it in a heartbeat.

Just ordered a Dot -- what is the Tap? They added that to the page, too, but no info. Is it just the next gen Echo?


Ahh -- the Tap is a portable device with wifi speaker.

(Probably wouldn't call an audio monitoring box the "Tap".)

Looks like a portable speaker with Alexa? http://www.amazon.com/Amazon-PW3840KL-Tap/dp/B00VXS8E8S

How much was it? I can't find pricing info anywhere.

What? The price of the Echo Dot ($89.99) is clearly at the top of the article.

Weird; when I click it (I'm on mobile), it takes me to a special part of the Amazon app, and there's no price, and it says it can only be ordered by voice by people with existing Amazon hardware that can do that.

Dot and Tap are two different products.

Dot - $90 Echo with an external speaker port.

Tap - $130 battery-powered WiFi/Bluetooth speaker with Echo built in, for portable use. (I'll get one if it works great in hotel rooms; otherwise I won't.)

Wow, what a coincidence. I just did a setup like this with Amazon Echo and Sonos, by "hacking" the Amazon Echo to do audio-out.

I wrote up a little post on it here: https://medium.com/@MathiasHansen/hacking-an-amazon-echo-and...

Obviously, actually having bluetooth speakers with the Echo Dot is a much better solution, but after using the Sonos setup for 3-4 weeks I must say that it works surprisingly well, and despite the audio hack the sound quality is excellent on my Play:1s.

Meh... the problem with Bluetooth speakers is that many of them don't handle the always-on use-case.

My soundbar would work well, but Alexa would get muzzled every time I turned on the TV to watch something. On the other hand, my portable bluetooth speaker will run out of battery if left on its charger.

The AUX connection is almost a better option, but then am I supposed to leave my amp turned on all the time? There's also the same problem where Alexa loses her voice when I switch the amp over to the Bluray player.

Be forewarned - if I am invited into your home for any reason, and I see an Alexa device, I will vocally add a large shopping list of nonsense to your Amazon cart :)

Just wait until my hot new pop single "Alexa, order more toys" becomes popular with the kids.

I prefer to set alarms for 3 am. I also have an Echo, so I'm waiting for one of my two victims to retaliate.

"Alexa, order 12 gallons of milk"

55 gallon drum of personal lubricant, please and thank you. Wait, make that 2 drums.

Serious question: is it feasible to implement a kind of loose voice 'fingerprint' to prevent this kind of thing? Will/could Alexa know who's talking to it?

I really want an 'Alexa, stop listening' command. There's a button on the top that mutes the mic and puts a red ring around it, but when I have people over, it's not a great environment for voice commands anyway.

'Everyone be quiet so I can shout across the room to change my music.'

A workaround would be to mute the device itself, and then use the remote (which has its own mic, and works well in noisy environments since you just hold it closer to your mouth).

but then I just have to carry a remote while I'm having a party.

It's not a bad idea. If you're hosting, it lets you change the music without interrupting your guests.

Yes, definitely. It adds complexity in that now you have another source of both Type I and Type II errors (failing to wake up, waking when it shouldn't). Voice ID itself is far from settled science to do well, so it would be a tradeoff.

Theoretically yes you can fingerprint voices. The questions are:

1) Can you do this on the Alexa servers efficiently

2) Do you want to? Seems like setting it up could be a hassle. Right now there is zero friction and it just works.

It's not necessary anyway, as you have to provide a PIN to actually order things.

Short answer: yes.

Long answer? i.e. is it possible out of the box or is it possible in principle, if anyone actually builds it?
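
In principle the matching step is straightforward: assuming some acoustic front end turns each utterance into a fixed-length feature vector (Alexa's internals aren't public, so this is purely illustrative), recognizing a household member could reduce to comparing against enrolled profiles with a similarity threshold. The threshold is where the Type I / Type II tradeoff mentioned above lives:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def is_known_speaker(embedding, enrolled, threshold=0.8):
    """Accept the command only if the voice matches some enrolled profile.

    A lower threshold means fewer rejections of the real owner but more
    false accepts of a prankster guest, and vice versa. The 0.8 value is
    an arbitrary placeholder, not a tuned number.
    """
    return any(cosine_similarity(embedding, ref) >= threshold
               for ref in enrolled)
```

The hard part isn't this comparison; it's producing embeddings robust to colds, distance from the mic, and background noise, which is why the earlier comment calls voice ID far from settled.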

Alexa, how do I do <terrorist-associated> activity. Oh and please order a mile of bubble wrap.

Incredibly, for an Amazon product, Alexa is terrible at buying things. You can only order things you've bought before, as far as I can tell, and even then, only some things, selected by a filter I don't understand.

Will this be linked together with my Echo? One thing I do quite often, since my Echo is in my kitchen, is use it to set a timer. I'd like to be able to go to my office upstairs and ask it how much time is left. Today, I don't think that's possible even with a second Echo.

I have two echoes now. Timers are separate, backend content is synced. You could use the Amazon dev kit to make a universal timer. (That is a good use case)
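
A sketch of what that universal-timer skill's backend might look like (hypothetical class and method names, not anything from the actual dev kit): the trick is simply that timers live server-side as absolute expiry timestamps, keyed by name, so a query from any device in the household reads the same state.

```python
import time

class SharedTimerStore:
    """Server-side timer state shared by every Echo in a household."""

    def __init__(self, clock=time.time):
        self._clock = clock   # injectable clock, handy for testing
        self._timers = {}     # timer name -> absolute expiry timestamp

    def start(self, name, seconds):
        """Start (or restart) a named countdown."""
        self._timers[name] = self._clock() + seconds

    def remaining(self, name):
        """Seconds left (0 if expired), or None if no such timer exists."""
        expiry = self._timers.get(name)
        if expiry is None:
            return None
        return max(expiry - self._clock(), 0)
```

The kitchen Echo's "set a pasta timer" intent would call `start`, and the upstairs device's "how much time is left" intent would call `remaining` against the same store.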

The Alexa iOS app has a good drop down to manage each device separately.

When I first saw the Dot I was very excited, thinking it was exactly this, but it appears it's just an Echo with a lower-quality speaker and a simple cable (and Bluetooth) you can connect to your own speakers. A bit of a bummer; synced timers and playing the same music through a couple of Echos in different rooms would have been a better use case for me, but perhaps I'm the weird one.

Amazon was the only Big Four company silent on the data privacy lawsuit with Apple. Why would I place one of their always-listening products in my living room?

Thanks for the update.

Only for US customers...


* A U.S. Amazon account

* A U.S. shipping address (50 United States and the District of Columbia only)

* An annual Amazon Prime membership or 30-day Amazon Prime free trial

* A payment method issued by a U.S. bank with a U.S. billing address in your 1-Click settings

* A device with access to the Alexa Voice Service (such as Amazon Echo)

I'm American. This makes sense; if anyone was going to order what seems like a range extender for a device that just brings you stuff you were too lazy to type, it would be Americans.

The Google Glass problem: the interface is me yelling publicly. So I'm not super sure that's going to be adopted well.

I use them in my home. Being able to ask it to set a cooking timer while my hands are full is pretty awesome.

Echo is one of those things where it became magically awesome by being somewhat more accurate than I'd expect. Also, Amazon is updating the service back ends, and it is now extensible.

It's you yelling in your home. It's fine, your dog won't judge you.
