Hacker News
Why You Should Never Buy an Amazon Echo or Even Get Near One (nakedcapitalism.com)
77 points by Cbasedlifeform on Nov 9, 2017 | 36 comments

I really hate articles like this, because on the one hand I'm a total privacy nut and believe that the passive surveillance of modern society has a terrifying chilling effect that really could end civil society as we know it.

But I am quadriplegic.

This means that without these devices I would be unable to control my house the way I do; at the moment I have about 4 Amazon Echos, a Google Home and various other ways of monitoring me (and controlling the house). It really, really chaps my arse to have to give up this much personal information to be able to control my house, but I honestly don't see another way of squaring the circle.

Not directly related to the article, just a perspective I thought some of you might find interesting.

Edited to add: When I say I hate these articles, I don't mean that they shouldn't be written; I think they are very valuable. I mean that they make me feel sad about the current state of surveillance on the Internet.

I read the article expecting something new, but it just goes on interminably with objections that we all know about from the Snowden era. Nothing is really specific to the voice data; it's germane to every device connected to the internet.

So I too dislike articles like this: the title is misleading and the text was not compelling.

While I agree with the sentiment that it doesn't tell us anything we don't already know from the Snowden era, I really do think we have to keep banging on about the Snowden revelations. They really do need to stick in the public consciousness, not just in the consciousness of the hive on HN. Because honestly, the only substantive privacy changes I've seen in the UK are terrible ones, and that scares the crap out of me.

But I have absolutely no idea how to convey the simple idea to people that are not in the tech world that "if it's connected to the Internet, don't consider it private", which I'd love to be proved wrong about but I really don't think I am.

There's a lot of conjecture in this article. While I'm sure most voice assistant makers would love to be able to tag multiple voices simultaneously in realtime and perhaps even in the background, it's simply not there in today's technology.†

The Echo works in a crowded environment because it has directional microphones, and facing the device makes a big difference.

The Google Home assistant does differentiate requests based on voice pattern, but from my testing it actually only analyzes the word "Google". So person A can say "Okay", person B can say "Google", and person C can say "What's the weather?" and the Google Home will recognize person B (this might have changed since I did my testing, so don't quote me on that).
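If the hotword really is the only segment used for identification, a minimal sketch of that scheme might look like the following. This is purely illustrative: the embeddings, threshold, and matching logic are made up, not Google's actual pipeline.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Hypothetical per-person embeddings of just the hotword segment.
ENROLLED = {
    "person_a": [0.9, 0.1, 0.2],
    "person_b": [0.1, 0.8, 0.3],
}

def identify_speaker(hotword_embedding, threshold=0.8):
    """Return the enrolled speaker whose profile best matches the hotword
    segment, or None if no match clears the threshold. Nothing else in the
    query influences the decision."""
    best, best_score = None, threshold
    for name, profile in ENROLLED.items():
        score = cosine(hotword_embedding, profile)
        if score > best_score:
            best, best_score = name, score
    return best

# Person B said "Google": only that segment decides the identity,
# regardless of who said "Okay" or asked the actual question.
print(identify_speaker([0.12, 0.79, 0.28]))  # person_b
```

The point is that whoever utters the hotword "wins" the identification no matter who speaks the rest of the query, which is consistent with the A/B/C experiment described above.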

All this said, yeah of course any company can spy on you, and honestly Alexa is the least of your concerns. Laptops have 1-2 microphones built in. Phones have 3+ microphones. A lot of monitors and "smart" TVs come with microphones. Your intercom system can be wired into, and you can most likely hack into digitally based ones as well. Don't even get me started on IoT devices from non-IT companies.

A lot of malicious things can be done with technology today, but Alexa is most likely the safest device of the examples I gave. So yes – stay vigilant – but don't ignore the more obvious vulnerabilities in your home.

† There are some examples that prove we will be able to get there eventually (like Honda's ASIMO which has a demo of three people asking for different things at the same time), but nothing of the like has been seen in uncontrolled/noisy environments.

I have learned never to rely on something being "technically not possible" when that impossibility is only a side effect of current limitations.

My point is not about whether or not it's possible because just about anything can be done eventually. It's simply a case of what is more likely:

1) Bleeding-edge algorithms that can not only separate multiple speakers in parallel on a small device, but also transcribe them and track their identities over periods of time and report back.

2) Alexa does exactly what it says on the tin, because Amazon already extrapolated everything they need to know about you (including whether you have a teenage daughter who is pregnant[1]) from your last 3 text searches.

Also see this relevant comic: https://xkcd.com/538/

Now I may sound defeatist but this is not my intention. Like I said you should stay vigilant, but this article is barking up the wrong tree and may in fact distract from the real dangers in mass surveillance and tracking. Those dangers are far more primitive, yet effective, than you might think.


[1] https://www.forbes.com/sites/kashmirhill/2012/02/16/how-targ...

Awesome, thanks for the clarification. We tots agree.

This article could be correct in 1 or 2 revs of all these devices. Which makes sense in our superscalar universe.

> The Echo was able to pick a voice out of a crowd engaged in conversation. That means it is capable of singling out an individual voice. That means it has been identifying individual voices, tagging them as "Unidentified voice 1", "Unidentified voice 2" and so on. It has already associated the voices of its owners, and if they have set up profiles for other family members, for them as well, so it knows who goes with those voices.

this is overstating the case. if you don’t believe me, have two people speak at an Echo simultaneously.

multi-speaker babble is still a major challenge in speech recognition, and speaker identification is equally hard and unsolved.

i still won’t buy one, but it’s important to be reasonable.

It's not just over-stating it, it's demonstrably wrong. All you need to disprove it is to ask someone else's Echo to answer a question.

What I presume it's actually doing is responding to the keyword, Alexa in this case, and somehow correlating the rest of the question to the voice that said the keyword. I don't happen to know what criteria it uses to do that but it's clearly doing that or something a lot like it.

That's much less dystopian than voiceprinting the family.

That's basically exactly what happens, except without the voice correlation part. Alexa should be easily tripped up by multi-speaker babble: speaker A says the trigger, speaker B starts a command, speaker A interrupts, and Alexa hears A+B+A.

It's certainly how _i_ solved the problem when I wrote an SR system ...
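That trip-up scenario is easy to simulate. Here's a toy sketch (an assumption about how a naive pipeline would behave, not Amazon's actual implementation) of a recognizer that keys on the trigger word and then transcribes everything it hears, with no speaker separation:

```python
def naive_recognize(segments):
    """segments: (speaker, text) pairs in arrival order.
    After the trigger word, transcribe everything regardless of speaker."""
    heard, triggered = [], False
    for speaker, text in segments:
        if not triggered:
            triggered = text == "alexa"  # wait for the trigger word
            continue
        heard.append(text)  # speakers A and B are mixed indiscriminately
    return " ".join(heard)

# Speaker A triggers, speaker B starts a command, A interrupts:
print(naive_recognize([
    ("A", "alexa"),
    ("B", "what's the"),
    ("A", "turn off"),
    ("B", "weather"),
    ("A", "the lights"),
]))
# → "what's the turn off weather the lights"
```

The garbled output is exactly the "A+B+A" failure mode described above: without speaker separation, interleaved commands come out as nonsense.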

Surely it's possible to sniff what's being sent back to Amazon's servers? If Amazon are lying and they are storing/analysing everything the Echo hears, surely this would be easy to prove?

It is possible to see all traffic it sends, and possibly even fake certificate authorities (depends on how resilient the Alexa is to this tampering) and trick the Alexa into giving you the data it sends encrypted using a key that you control.

However, this line of reasoning can be refuted all the way down to being impossible to prove/disprove. For example, there is reasonably an audio processing chip in Alexa that does always-on keyword listening, and it's possible it could track breadcrumbs over time (e.g., voice fingerprints, triggering keywords like "bomb", etc). This data can then be interlaced with innocuous data, for example inside an access token (opaque blob used to identify on whose behalf the Alexa is making requests). That would make it virtually impossible to find even if you had full access to the network traffic.
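To make the access-token point concrete, here's a hedged sketch (the field names and encoding are hypothetical, not anything Amazon is known to do) of how extra breadcrumbs could ride along inside an opaque blob that looks like random data on the wire:

```python
import base64
import json

def make_token(session_id, breadcrumbs):
    """Pack hidden extras alongside the legitimate session data."""
    payload = {"sid": session_id, "bc": breadcrumbs}
    return base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()

def server_side_decode(token):
    """Only the server ever unpacks the blob."""
    return json.loads(base64.urlsafe_b64decode(token))

token = make_token("abc123", {"keywords_heard": 2})
print(token)                      # an opaque blob to any network observer
print(server_side_decode(token))  # the server recovers everything
```

A real implementation would encrypt rather than merely base64-encode the payload, making the smuggled fields genuinely indistinguishable from a random session identifier even under close inspection.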

Anyway, when it comes to these things I like to take an Occam's razor approach. There's a great number of things a company can do to spy on you, but most likely when it comes to mass surveillance it's easier to tap into more obvious sources of data like your browsing history from the ISP, your phone line, Facebook/Google tracking data. In fact, I'd be more scared of say Facebook's and Google's voice assistants than Amazon or Apple because the latter two don't depend as much on consumer identity as a business.

Strong encryption is a thing.

EDIT: Another thing that just came to my mind. Even when you analyze network traffic and observe that traffic only occurs during your queries (i.e. in the seconds after the hotword is uttered), that doesn't mean that the Echo won't use the opportunity to send some previously-recorded audio to the server together with the current recording. In the same way that clever hackers disguise themselves by having their network traffic mimic the shape and direction of legitimate network traffic.

Yes, but we could look at the total amount of data transmitted. Audio compression is well understood, so we can infer, within a range of usable quality, whether any excess voice or other data is being sent over the network.
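A back-of-the-envelope version of that volume check, with illustrative numbers (the bitrate and slack factor are assumptions for the sketch, not measured Echo behavior):

```python
def expected_audio_bytes(seconds, bitrate_bps=32_000):
    """Opus-style voice streams commonly run at roughly 16-32 kbit/s."""
    return seconds * bitrate_bps // 8

def looks_suspicious(observed_bytes, active_seconds, slack=2.0):
    """Flag traffic well beyond what the spoken queries alone would explain.
    The slack factor covers TLS overhead, retries, and metadata."""
    return observed_bytes > slack * expected_audio_bytes(active_seconds)

# ~10 s of active speech producing 60 kB of upstream traffic: plausible.
print(looks_suspicious(60_000, 10))    # False
# 2 MB of upstream traffic for the same 10 s of speech: worth investigating.
print(looks_suspicious(2_000_000, 10)) # True
```

This only bounds *how much* could be exfiltrated per unit time, which is exactly why the sibling comments about compact transcripts and traffic shaping matter: a text transcript fits easily inside that slack.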

So what you're saying is, if a company like Amazon or Google has the excess bandwidth, it is beneficial for them to send way too much data in the first place in order to disguise what data is actually being sent.

Now, there is some security research along these lines:

> Uncovering Spoken Phrases in Encrypted Voice over IP Conversations

Assuming it's sending it as audio, and not as transcribed text, which is both smaller and much more compressible.

ASR is a hugely complex process that is handled by ML algorithms on Amazon's servers. The Echo simply does not have the hardware to handle this on its own.

Is it though? Not trying to be argumentative, but I remember using Dragon NaturallySpeaking to do voice dictation way back in '98 on a processor that makes today's average smartphone look like a supercomputer. I thought all the ML stuff was for figuring out context and the like, but straight transcription?

Modern voice codecs are extremely compact. An annotated text representation of voice will take up equivalent space.

You own the client. You can break any crypto it is doing.

I'm sure you could use Wireshark and see what requests are being made; however, they very likely use TLS, so getting the content of those requests would be extremely difficult if not completely impossible.

However, if you don't mind potentially destroying your echo, I'm sure you could reverse engineer a way to see what's going on.

As far as I know, only this year's Echo models don't have a known way to root them, so you could likely circumvent the encryption on an older model to inspect traffic. I'm not aware of any publicized results of someone doing that though, and it doesn't necessarily tell you what the backend can and can't extract from the audio data.

If, as the article claims, my phone can be used to eavesdrop on any conversation I'm having, "even when the phone has been turned off", what is the added attack vector of a home assistant?

If I understand it right, using your phone to spy on you is a targeted action, aimed at you if you are of interest. (Side note: it's funny how many people deny being of interest, even people who could build you a rocket program if you kidnapped and motivated them.)

A home assistant, on the other hand, is always on, listening to and profiling its vicinity. So there is a large difference, not when you're under targeted attack, but rather when you are not at that moment.

I don't see much of a difference here. Both are devices that listen to their surroundings. Both can be used to target the device's owner (if they are near their device) or to profile the device's vicinity. The only difference I see is that the phone's location changes more often and that the owner is probably more likely to be near it.

I guess people are just more aware of the fact that home assistants listen to everything, so they associate a greater danger with those. Also, everyone is of interest in the eyes of advertising/data collection companies.

> Both are devices that listen to their surroundings.

That's the key, though - Echo/Home are listening passively, by design. Your phone is listening actively, but can be activated remotely to listen passively if you are the target of surveillance by a state actor.

>but can be activated remotely to listen passively if you are the target of surveillance by a state actor.

As it gets cheaper and easier to retain and analyze the output of the former the bar for the latter decreases.

I really don't want to get flagged for the "random" searches and audits every time I interact with a government service in 2020, just because the way I talk checks the proper subset of boxes for some AI to set the "probably doesn't like us" bool on my row to true.

A lot of handwaving but little substance.

This sort of presentation makes for nice kindling but not much more. In the end, your voice is unfortunately 'public'.

There is no difference between this and muttering too loudly. If someone hears you asking the voices in your head to quiet down, you would not think to blame them for violating your privacy by listening.

Eventually we will all have to re-assess what we accept businesses, governments, and private parties knowing and doing with what we say. It is also a reminder that what we do (our body motions) will be next up for recording and analysis as motion sensors, cameras, and facial recognition become more prevalent (iPhone X).

>As most readers probably know, both the microphone and the camera can be turned on even when the phone has been turned off. He uses headphones to make calls. This makes the recent phone design trend away from headphone jacks look particularly nefarious.

What's preventing you from doing the same with USB 3.1 or lightning headphones?

The Alexa APIs are available for use. They opened them up so companies like TV manufacturers can build Alexa into their TV.

I'm waiting for someone to create an open source hardware device that talks to the Alexa API. Then you can be sure what text or audio it is sending.
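Such an open client could log every byte before it leaves the device. A minimal sketch of the idea (the endpoint and payload shape are hypothetical, not the real Alexa Voice Service API):

```python
import hashlib
import json

AUDIT_LOG = []

def send_query(audio_bytes, endpoint="https://example.invalid/v1/recognize"):
    """Record exactly what would be transmitted before sending anything."""
    request = {
        "endpoint": endpoint,
        "body_len": len(audio_bytes),
        "body_sha256": hashlib.sha256(audio_bytes).hexdigest(),
    }
    AUDIT_LOG.append(request)  # nothing leaves the device unlogged
    # (a real client would POST audio_bytes to the endpoint here)
    return request

send_query(b"\x00\x01fake-opus-frames")
print(json.dumps(AUDIT_LOG[0], indent=2))
```

Because the logging happens in code you control, on hardware you control, there is no room for the device to smuggle extra audio past you, which is the whole appeal of the open-hardware approach.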

Did any of you walk out of a party or other social gathering because you did not want to be recorded by an Echo (or similar)? I'm quite sure that I would, but I haven't come across one of these yet. (Or, at least, I wasn't aware of it...)

Walking out on Google Glass "spies" is a real thing. Echo, Alexa, Siri: not yet, I assume.

So, what about Siri and Cortana?

Same deal right? Always-on microphone constantly polling an audio signal?


I always wonder about people who leave snide comments like this on privacy-related issues:

What is your stance on your own freedom and independence, and those of others around you who might not want large parts of their lives digitized and centrally analyzed?

See, it's one thing if you upload your own data and have it analyzed to your heart's content; I think you should be free to do that. But it's another to do it to unsuspecting strangers; that infringes on my freedom not to have it done to me.
