* a device is installed in the house that is designed to listen to speech
* the device is connected to the internet
* the device is capable of contacting other internet peers/services/hosts
* the device knows a bit about its owner's internet presence, such as contacts
* the device is equipped with a simple conversational user interface based on fuzzy human speech-command detection
These basically bound the failure modes of the complete system. I am not surprised that a glitch like that happened. I can certainly attest to how shocking it might be to discover first hand that it really did happen, but given the context, I can't say something like this was hard to foresee.
I remember the first voice command stuff on google. Buddy of mine was trying it out and I couldn't help myself but shout:
"NEW TEXT MESSAGE TO JENNY! FUCK YOU! SEND!"
Fortunately the phone was too slow to take it all in (even if it had, Jenny would have found it amusing, she was a cube or two away). But yeah, the first thing I thought of was: if it's listening, how easy is it for it to mistake intent, or pick up someone else, etc.?
Humans talk funny, computers don't get it, they're gonna mistake commands all the time.
As it is I hate how I can't talk ABOUT "ok google" around my phone... because it always goes off.
Back in college, long ago... the Mac 660 AV was state of the art for personal computing. The computer lab I worked in had just updated all the old SEs to these over the summer - I was part of the group that set up and plugged in a few score computers. With the microphones.
Our manager was a bit concerned that the microphones would get stolen as he looked out over the rows of computers, with a microphone on top of each monitor.
I yelled "Computer, Shut down. Yes."
A good chunk of the computers had indeed shut down.
We decided to go back and remove all of the microphones from the computers. Theft was not as large a concern as a bored college student with a prankster attitude during finals, and the riot that would have ensued.
Our computer lab had a bunch of SGI Indys, and they were the first machines the CS department had with cams on them. It took less than a semester from when they were installed for every student to learn that if the camera cover was not in place, it was practically an invitation for someone to log in remotely and prank them while using the camera to observe from a safe distance. I like to think it infused a healthy dose of concern for privacy and security in the student population.
Firmly within the spirit of good prank hacks like the one just described: a small number of USB webcams' LEDs are user-controllable, such as the random old Logitech C300s I have floating around here. You want webcam-tools from SF.net (other versions will spout obscure errors), then you can do
$ uvcdynctrl -s 'LED1 Mode' <0..3>
0 is off, 1 is on, 2 makes it flash (and you can even set the flash rate), and 3 is auto.
0, 1 and 2 apply regardless of whether an application has an active video stream coming out of the camera.
Best not mess with someone who knows where you live and has your credit card. If she's in a good mood, you might be hit with just a pizza prank. Hope she doesn't realize she can swat you without moving a finger.
I mean, that's an ambiguous term to begin with, and you'd confuse even humans with it.
Like, are you referring to a sculpture made of cocaine in the image of a poodle, or are you talking about a poodle trained in the ways of a St. Bernard rescue dog, capable of relaying quantities of said substance, or perhaps something else entirely?
>The track was inspired by an episode in which Iggy Pop, during a drug-fueled period at Bowie’s LA home, hallucinated and believed the television set was swallowing his girlfriend. Bowie developed a story of a holographic television, TVC 15. In the song, the narrator's girlfriend crawls into the television and afterwards, the narrator desires to crawl in himself to find her.
I got a touchscreen for my desktop PC once (it was the Windows 8.1 era, after all). I immediately discovered that it provided me no significant practical benefit, but it was now infinitely easier for someone walking by my desk to poke my screen and mess up what I was doing.
I had the inverse experience. My wife's laptop had a touch screen and I thought "What a dumb idea, what use is that?"
Then she'd ask me for help and rather than have to wait for her to find what button I was talking about I'd just tap the screen a few times and we were off and running again!
I bought the original Surface Pro the week it came out. I actually really liked the interface and thought that it was great. I didn't really understand why so many people hated Windows 8 until I tried to use it on a desktop. Even with a touchscreen it's pretty annoying.
I'd been using Linux as my primary OS for years at that point, so I think I'm pretty critical of Microsoft in general, however they did a good job building a touchscreen interface.
The state of touchscreen support in Linux was pretty bad the last time I checked though.
I'd definitely agree with regards to the Surface, and a lot of Windows 8/10's UI makes a lot more sense once you see it on a phone as well. I'd say Microsoft did a poor job retaining the existing experience for desktop users while focusing so heavily on new form factors and experiences.
> The state of touchscreen support in Linux was pretty bad the last time I checked though.
Anecdotal, but I've had no trouble for the last two years with my Dell XPS. Other issues with said laptop (keyboard), but touchscreen support in Linux isn't one of them.
Ah, I saw that the Dell XPS came with Linux and had a touch screen, and I was kind of wondering about that. I guess I'll have to try Linux with a touchscreen again.
My Surface Pro 1 is the only computer I have with a touchscreen right now, and it has enough other driver issues (the wifi and the Wacom drivers suck) that I probably won't be trying again on that.
Also, that'd be long-term usable only if the screen is ~flat on the desk. Otherwise it becomes exercise.
And in effect, that's what I've used for many years. I have a relatively large touchpad. And when I need resolution, a graphics tablet.
The Surface Studio is cool, but I'd want something that resisted coffee and food splatters ;) As the original Surface did.[0] It could also scan text :) Which is something that I, when I was first learning about PCs, naively expected.
Ok Google has been a massive disappointment every time I've tried to use it. I tried "next song" while in Google Maps and it marked Exxon along the route. Then it played a song in a streaming app by "Max," wasting my data and making me close the streaming app I didn't even know I had and restart the music I wanted. "Next" turned out to work a bit better but it took multiple tries.
I remember switching to Android from the 3GS, way back when the iPhone 3GS was the "second-best" iPhone (was the 4 the next one?).
The 3GS had perfectly functional "play my xyz playlist," "next song," "play songs by artist" functionality, and to this day I don't understand why it took so long for Google to implement it, and why the implementation is so shitty.
Well, it’s probably not that bad. I think it’s more like Google makes everything work OK, while Apple tries to really nail core functionality and lets the rest slide.
Music interaction is a great example of the trade off.
Not sure why you're being downvoted. Google has a different business model than Apple, and expecting it to care about user experience in the same way simply doesn't make sense. And yes, they mostly care about data, because that's what they're good at: collecting data, analyzing it, and turning it into profit.
Really? I find it very accurate and helpful, but that could be because I come from Siri hell, which couldn't ever seem to get it right, especially using GPS.
Though I can easily see, with background noise, "next song" sounding like "Exxon." I think due to frustrations with Siri, I talk to Google in very clearly enunciated words and haven't had any interesting failures yet.
We did something similar to a coworker when the Apple Watch came out. He tried to dictate a text response to someone that said “thanks for doing that.” Someone else yelled “DICK!” afterward. The Apple Watch is notoriously slow at processing speech to text, so he didn’t think the watch had picked it up and hit send. Then the word was added to the text.
He had to call the guy he was texting to apologize.
My friend installed the IBM OS/2 Warp beta when it came out; it too was voice controlled. We would shout into his room "select all, delete" or "format c" or other childish fun commands.
Alexa has 4-5 joke responses to self destruct. E.g. "I'll start the countdown but only on the understanding you'll dramatically cancel it at the last second", and "3-2-1 Boom. Hmm that didn't quite work did it?" or something like that, and a few others.
Selling a product that had a voice command UI, we found that people will go out of their way to mess with it. Sometimes for fun, sometimes to unload their frustration. There is something strange about a voice UI, and I think it is the expectation that the intelligence should be at about the same level as human intelligence. If it is not there, and it is making the user look silly by making them repeat something obvious multiple times, they will quickly start hating the product as a whole. It's almost better not to have a voice UI unless it is pretty darn good.
Yup, Mac OS 7.5 had speech commands, my colleague named his machine "Oi!" and I would routinely shout "Oi! Shut down!" as I left the office. Never got old.
Of course, and as was widely known (and mentioned when this subject came up) at the time, FORMAT C: would not have actually worked (on systems where C: was the boot volume) because OS/2 locked volumes that were in active use as filesystems, preventing them from being reformatted.
Because it supposedly puts people on some kind of watch list? Though pranking with stuff like that isn't exactly the coolest thing to do: "Haha, I framed you for terrorism, gotcha!"
I do not think this is funny, I think it's rather immature.
But if things like this _actually_ trigger any sort of response, then the effed up thing isn't the (childish) prank, it's the system actually acting on nonsense.
If I call the police and claim that you're armed and hold hostages, then that's irresponsible. If I shout bullshit instead and these novelty devices turn that into a web search, there's no way in hell that should have any consequences whatsoever.
In my friend group, we're all on watch lists anyway. I guess I should have clarified, I'm not doing this to strangers, we all do this to each other and find it hilarious.
Yeah, but how old are you and your friends? I think eventually most people grow out of this sort of immature 'pranking' and just find it a bit boring...
This is even better because of the date, we had voice operated computers 24 years ago that ran on machines that wouldn't even be called a potato these days and they didn't need to send the voice off for cloud processing.
And voice control didn't fail way back then because it wasn't good, it failed because it was a terrible input mechanism and that is why it will fail again.
It's a terrible input mechanism for general usage. It's a fantastic input mechanism when you don't have another one handy, open and ready to use. E.g. I use Alexa to trigger my Harmony hub to turn on the TV and switch to the right input with one command, instead of figuring out where the hell someone left the remote or digging my phone out and opening an app. It's faster and easier. If a phone call comes in, I don't need to find the remote or the app to mute or pause what's on TV, I just ask for it and it happens. If my hands are full, I can still turn off the light I forgot about on my way to bed.
The "failure" the first time around was that people wanted it to be the main interface, instead of accepting that where it works is as a ubiquitous auxiliary interface that's available everywhere in the house, rather than trying to compete with a keyboard when you're right in front of said keyboard.
Voice competes by working at a distance, even when your hands are not free.
It will never replace a keyboard, as there are so many instances where speaking commands is impractical, but it doesn't need to replace a keyboard to be useful.
> I’m a millennial who grew up with Dilbert, and it just feels so wrong now.
There's nothing wrong in enjoying the writings of people whose political ideas you disagree with. I'd rather say that if you cannot do that, there's something wickedly wrong in you.
Adams has been a loon for a long time, his Dilbert Future book talks about his personal philosophy which is straight up "Name it and claim it" theology except without the Jesus. Adams basically attributes all his good fortune to his own talent and hard work.
And then of course that means everybody who is struggling, well, those people just didn't want it enough; they aren't putting in the hard work like Adams. Frankly, it's their fault, thinks Adams.
To be fair, that line of thinking is a problem here on HN, among millennials, and among older generations as well. Americans in particular, but our species as a whole needs to learn to separate the ideas of success and merit.
You don't like Trump or people who support him. What does this have to do with this article?
Not everything in the world needs to be connected and associated with US politics and Trump vs. anti-Trump politics. If we brought in the political views of every creator, inventor, and artist, then we would not be able to discuss anything on HN other than politics, since every creator, inventor, and artist has a political view, along with people who agree and disagree with it.
Can you show me anything other than the parts where Scott Adams simply debunked Trump's tactics for conversational interference as related to business negotiation?
Adams frequently pitches his book "Win Bigly"...marketing himself as one of the few people to predict Trump's rise, due to his rhetoric. Showing examples would just be marketing for Adams.
I always think that if I ever present somewhere again, I'll be tempted to start off by asking people to put their phones into silent mode... wait a moment, then: "OK Google, Siri, show me dick pics."
It is. OK Google won't unlock a locked phone unless the voice matches, and you specifically enable the feature that allows it to unlock the screen. Once the screen is unlocked however, only certain functions that access your account will need to match your voice (if you spent the time to get Google to recognize your voice.) Performing web or photo searches wouldn't require you to match the voice if the phone was otherwise unlocked.
Asking someone to silence their phone might just be an attempt to get them to unlock their phone before issuing the offending command.
It’s silly that radios and podcasts and random people have the ability to trigger Alexa just by saying the name. “Hey Siri” gets around the issue by only listening for your voice, not just anyone. It’s shocking that anyone would build a voice assistant with zero authentication or authorization built in, so much so that an advertisement on the radio or TV can trigger an action on your device like on an Echo.
> "I felt invaded," she said. "A total privacy invasion. Immediately I said, 'I'm never plugging that device in again, because I can't trust it.'"
Let this lady's reaction be a lesson to all of those who say "people don't care about privacy." She wired her home with Alexa devices, and yet when she found out how her privacy could be invaded in practice through such devices, it seems she couldn't get rid of them fast enough.
People care deeply about their privacy. They just don't understand the true implications of even a device that's supposed to "listen to you" and the possibilities of what someone else could do to their internet-connected devices.
This is why governments must intervene with laws such as GDPR or even much stronger ones in the future to protect people's privacy, because the companies themselves have no incentive to "self-regulate" other than a negligible amount.
The only problem is some governments will not intervene unless enough people go through similar situations such as this lady. In the US even that may not help much because the vast majority of politicians care most about continuing to receive (however modest) donations (also called bribe money elsewhere) from the industry.
People care about privacy to the extent that they don't want people in their social circle knowing stuff. They don't care so much about a faceless organization taking it.
No, they don't. Do you really think if, before she made this purchase, someone explained to her all the things that could theoretically go wrong with this device, she would have chosen not to buy it?
She only cares now and acts like she doesn't share quite a bit of the blame because it actually affected her. I think it is highly likely she'll just switch to an Apple or Google equivalent instead of noticing the larger problem with these devices.
> No, they don't. Do you really think if, before she made this purchase, someone explained to her all the things that could theoretically go wrong with this device, she would have chosen not to buy it?
I think this would only indicate that she might care, but thought theoretical worst case scenarios are not sufficient evidence for an actual, non-negligible risk to her privacy. If they had told her that exactly this would definitely happen, I think she would not have bought the device.
I think one problem of explaining privacy issues to non-techies is justifying why theoretical problems often pose a real threat without sounding (to them) like a conspiracy theory nutcase.
This indicates that people may care deeply about their privacy when they've experienced a breach thereof. It is consistent with people not caring about their privacy because they can't imagine such a breach or project how they would feel if it did happen to them. It does not support their not caring about it.
The loss of privacy spreading throughout civilization is recent and rapid. It's not surprising that somebody might think about it differently after they've experienced it. We haven't had time to come to terms with what our technology can do.
You're ignoring risk. This is one failure out of millions(?) of customers. It's like someone being seriously injured in a bicycle crash and deciding not to cycle to work anymore. Or having their house burgled and moving out of the neighborhood. People get hurt when rare bad things happen to them and it scares them off.
Completely agree with you. I've seen this over and over and over with people I know. They always want something badly enough that they'll find any excuse to dismiss any downsides to it.
"Amazon/Google/Apple/Facebook/X have very smart people who will not allow this to happen."
"I have nothing to hide."
"People who want to listen to our boring conversations are losers. Let them waste their time doing so."
And the list goes on. I'm not making any of this up.
And then when it (occasionally) happens to them, there is a whole lot of dramatic outrage and someone else must be blamed. (Although in this case, I agree someone else is to blame - but I've seen this play out when people explicitly agree in the contract to consequences, and are warned quite clearly by a human about the implications of the contract).
That's not unreasonable. Lots of good things come with risk and when they go wrong we blame someone. Your house burns down and you blame the arsonist, not yourself for having a house made from fuel! Car crashes kill people and we blame the drunk driver not the victim innocently using the road which is known to be one of the most dangerous places to be.
> No, they don't. Do you really think if, before she made this purchase, someone explained to her all the things that could theoretically go wrong with this device, she would have chosen not to buy it?
Yes. Do you think if everyone knew what was going to happen with the Facebook scandal before it happened they would have signed up for Facebook? That's why it's called a "scandal."
>Do you think if everyone knew what was going to happen with the Facebook scandal before it happened they would have signed up for Facebook? That's why it's called a "scandal."
Yes.
I also noticed how they haven't left once the scandal came to light.
Half of Facebook's users visited one less time in a 3 month period? I think they'll survive.
I've visited less in the last month, but because they've changed the algo on my feed and it's not as interesting ... though that could be a network effect, but I'm getting quite different stuff.
> This is why governments must intervene with laws such as GDPR or even much stronger ones in the future to protect people's privacy, because the companies themselves have no incentive to "self-regulate" other than a negligible amount.
Market failure does not imply government solution. You're going to need a much stronger argument than "companies aren't doing a good enough job."
Amazon invaded her privacy, she realized it, and now she doesn't use the product. If you want others to do the same, start raising awareness yourself instead of enacting nanny state laws with other people's money.
I fail to see how market failures are not a reasonable point to look at whether governmental solutions are applicable.
Case in point: the 2008 market crash. Everyone agrees the dishonest practices that led to the crash should have been made absolutely illegal (in the cases where they weren't already).
In this case, we have MANY significant and potentially extraordinarily damaging failures of the market to effectively protect people's privacy occurring fairly regularly. GDPR is not the ideal solution by any metric, but that's a failure of government to create effective legislation - not a lack of need for effective rules.
Indeed, "if you apply General Curtis LeMay to a situation and you get havoc, well, that's what you called General LeMay in for" (to use an Eben Moglen quote out of the context in which he said it); if you host a device in your home running proprietary software, you ought not be surprised that it is spying on you and that the proprietors determine what to do with that data, not you.
We must not forget another critical component of all this: these always-on listening devices run on proprietary (nonfree, user-subjugating) software. Therefore the user has no permission to: inspect what it does, modify it to do only what the user wants (or, since most computer users aren't programmers, get someone technical they trust to perform such modifications for them), run the modified software in their device, and distribute copies of the improved software to help their community.
If these devices ran on free software (software that respected the user's freedom to run, inspect, modify, and share) the more technical among us could help them. But as it is even the most technically-minded willing person cannot legally do this work to help them.
As Eben Moglen reminded us after the Snowden revelations came out: It's critical that we don't fall into the trap of saying something akin to 'those kids take too many darn pictures' like concluding that we just can't have these devices or their services at all. We can have all of their alleged conveniences but only if we have free software implementations.
I don't know if it does or not, but even if it did, that wouldn't make it free software. You can't compile your own Alexa, or modify it to remove code you don't want, or submit improvements to it. Android is open source (excluding drivers), but Google Play isn't, and neither are the apps on it.
Most phones now have some kind of lock screen, which makes it pretty difficult to get to the butt-dialing stage. Will speech command recognition get to that stage?
The main reason I won't have any of those products in my house is because of that. I'd much rather have a confirmation of some kind before the system takes action.
I always press the lock button before I put the phone back in my pocket, but I've seen people put the phone back in their pocket or purse unlocked. If you have the phone app already running, butt dialing is very likely to occur.
My friend does this fairly often. He'll finish a call and turn off his screen, but he has a tendency to hit the fingerprint scanner as he's putting it into his pocket due to the design of the case and the way he holds it.
Got an LG G5 and quickly learned that they added a "feature" where double tapping each volume button opened a preconfigured application (Camera or QuickMemo+). Even when locked. Phone had to be completely powered off for it not to trigger. Went in settings and disabled that on day two when I saw how many pictures I had of the inside of my pocket.
Some Androids (like mine) won't lock for a few seconds after turning off the screen. It's a nice feature when you press to turn off the screen, and quickly realize you need to do something else. Just turn on the screen and it's still unlocked.
Entering the pattern after it's locked is pretty much impossible though.
I can configure how long that delay should be. IIRC it's in the display/lock screen menu.
I don't like it, though, as I much prefer the device to be locked when I tell it to lock, so I don't have to think about it anymore. The fingerprint scanner on the back makes unlocking trivial anyway.
It might, but it hasn't in the 18 months I've been using it. I don't know what the timer is supposed to be, but it seems like something less than 2 seconds. I'm pretty sure I couldn't get my phone in my pocket and press the power button that quickly if I wanted to.
Also, the case adds to the pressure needed to depress the button anyway.
I dropped my phone in the gym and it somehow typed garbage all over the Notes app where I keep my grocery list and record of what I did in the gym last...
This sort of thing makes me nostalgic for the old days of explicit "Save" commands in every application.
Yeah, a digital assistant could (should) verbalize "uh huh, yes, i'm listening, interesting, hmm" like a real, nosy human. Then you get a chance to say, "go away, alexa".
I will have one of these in my house when it doesn't send recordings of me to the Internet in the first place. It can do the voice recognition locally. I'm actually really interested in having one of these in my home, I'm also actually really not interested in it being cloud based at all.
Some kind of acoustic biometrics would be helpful here (i.e., respond only to the account holder, or disable some actions for others), along with better heuristic recognition of directives.
That's not foolproof, but much better than what we have now, and I think we're pretty close.
Agreed, but I think vocal biometrics is a significantly more difficult problem to solve when even the best speech recognition still has issues like this.
Phonetics is hard. Especially with ambient noise, echo and such. I had a conversation with one of the speech engineers when I worked at a speech recognition company and the level of detailed problems to solve was impressive. Totally made sense after talking about it, but things I would not have thought about before.
I'd imagine the next thing to come in this area that would really make an improvement is an "on person" microphone. Maybe it's a pen in your pocket, or some kind of vibration detection (that could pick up the wearer's voice), which would then allow some improvements in the domain of "who is talking" and how well the voice is processed.
The vibration detection is commercially available, in the 3-digit $ range, but it's unfortunately hideous as it needs some mild force against the throat and thus requires a very high neckline or a scarf.
The potential alternative, subvocal recognition, suffers from the need to have nerve/muscle-sensing electrodes on/around the throat, which have poor long-term biocompatibility (think >10h/day, on average), and the alternative, implants, suffer from growing out if a wire sticks out, at least as far as I know. One might be able to implant special electrodes the skin would not reject, but I don't know of any suitable material.
The obvious benefit is that you use your normal speaking logic, just stop at the point where you actually move your throat (afaik), and don't modulate with your mouth or lungs. It's silent for anyone around, so one could, theoretically, call someone while sitting inside a meeting, and hear both speakers, while being able to selectively talk to the other end of the phone line.
Both of these technologies work in nightclubs and fighter jets. I assume they are very high SNR, as far as ambient noise is concerned. The subvocal one might just be a phonetic/intonation input, which then requires voice synthesis if actual voice is the goal.
You don't really need perfect biometrics for it to be useful, or even biometrics at all. A "my human just spoke in this room, and here's a signature of what the fleshbag said" broadcast from a microphone, letting devices within a very short range get confirmation of which person spoke if they picked up a command, coupled with pairing your speaker to your devices to give the notification from your mic more authority, would already help substantially with most of the issues people have. It wouldn't stop a determined adversary, but for most people it's not a determined adversary that is the problem.
It doesn't even need to get good audio - just enough to give a bit of an indication that what the device picked up was me talking and not random noise and ideally some way to somewhat correlate it to the audio the device picked up to give it an indication it was me it heard. It'd also give you the option of setting the devices to require confirmation for certain types of orders if they were not confirmed by an authorised device, or if they were not confirmed by a device (so you could let people present give instructions but not some random joker on voice chat in your online game for example).
If we could get support for that into e.g. a watch, it'd be very much useful.
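A toy sketch of the broadcast-voucher idea described above, just to show the protocol shape. Everything here is made up for illustration: the shared key would be established at pairing time, and a real system could not use byte-exact hashing, since the wearable's mic and the speaker's mic never capture identical audio (you'd need fuzzy acoustic fingerprints instead):

```python
import hashlib
import hmac

# Assumed to have been established when the wearable was paired with the speaker.
SHARED_KEY = b"established-at-pairing-time"

def vouch(audio: bytes) -> bytes:
    """What the paired wearable broadcasts after hearing its wearer speak."""
    return hmac.new(SHARED_KEY, hashlib.sha256(audio).digest(), "sha256").digest()

def speaker_accepts(audio: bytes, voucher: bytes) -> bool:
    """The smart speaker acts only if a paired device vouches for the audio."""
    expected = hmac.new(SHARED_KEY, hashlib.sha256(audio).digest(), "sha256").digest()
    return hmac.compare_digest(expected, voucher)

command = b"turn off the lights"
voucher = vouch(command)
print(speaker_accepts(command, voucher))         # True: paired device vouched
print(speaker_accepts(b"order pizza", voucher))  # False: nobody vouched for this
```

The HMAC means a random joker shouting in the room (or on voice chat) can't forge a voucher without the pairing key; the open problem is the fuzzy matching, not the crypto.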
I agree, it would probably not even be close to foolproof.
I believe any implementation of security through acoustic biometrics would be vulnerable to replay attacks.
Systems to reproduce acoustics with high fidelity are commonplace - You might be using the output component of such a system right now if you're listening to music.
You could make the Assistant remember the exact fingerprints of all previous activation phrases and only trust you if it was original. This could be circumvented if you spoke the activation phrase at any point where your assistant could not hear you, for example to another Assistant of the same brand.
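The "remember every activation fingerprint" idea is simple to sketch. `fingerprint()` here is a hypothetical stand-in: byte-exact hashing only catches perfect replays, and a real system would need a robust acoustic hash that tolerates re-encoding while still distinguishing fresh utterances:

```python
import hashlib

def fingerprint(audio: bytes) -> str:
    # Stand-in for a real acoustic fingerprint; exact hashing only
    # catches bit-for-bit replays of a previous recording.
    return hashlib.sha256(audio).hexdigest()

class ReplayGuard:
    """Rejects any activation phrase whose fingerprint was seen before."""

    def __init__(self):
        self.seen = set()

    def accept(self, audio: bytes) -> bool:
        fp = fingerprint(audio)
        if fp in self.seen:
            return False  # exact replay of a previously accepted activation
        self.seen.add(fp)
        return True

guard = ReplayGuard()
print(guard.accept(b"hey assistant, unlock"))  # True: first time heard
print(guard.accept(b"hey assistant, unlock"))  # False: replay rejected
```

As noted above, this is still circumventable if the attacker records an activation phrase the assistant itself never heard.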
On the Echo at least, there's an option to have the device make a thump-type sound when it's activated. There's also a light ring that activates out of the box and shows where it believes the audio is coming from.
Maybe, but if it can't handle that, it's kind of pointless to try to detect direction - all the places in my house where I'd be likely to place it are less than 0.5 m from a wall; I'm certainly not going to place one in the middle of a room.
I can't pick up my coworker's phone and manually dial a number, because years of learning from our mistakes have led us to implement lock screens and such.
I can, however, say "Hey Siri, call 867-5309." Or "Hey Siri, Facebook status: my anus is bleeding. SEND!"
Because we are apparently incapable of applying past lessons to new technologies.
If they operate anything like the way I operate, they know there should be some kind of lock on there, they just haven't gotten around to building that part yet.
And if their management is anything like my management, they go ahead and release it anyway.
To my knowledge, “Hey Siri” has never been tied to the owner’s voice. I was able to post to a colleague’s Facebook account a couple years ago, using nothing but voice commands. It was sitting on his desk, locked. I haven’t tried it since, and I keep it disabled on my own phone.
Meanwhile, I have my Pixel lock on screen off (only because the fingerprint reader makes unlocking/turning on easy) with OK Google unlocking enabled.
The only problem here is that Google still doesn't recognise my voice half the time, perhaps because I'm never sure about how I should speak to an inanimate object.
Speaking of unlocking methods, face unlock appears to no longer be available as an option for me. I can't find it in settings anymore (smart unlock etc).
a) Your phone has a microphone that picks up information spoken more-or-less directly into it. An Echo is specifically equipped with a microphone array designed to extract audio from anywhere in the room.
b) There is a great deal less ambiguity around "pressing buttons" than there is around interpreting speech. While it is unlikely that your phone will incorrectly detect button presses, it's very common for voice-activated devices to (1) incorrectly detect a wake word (whether a false positive or a false negative), and (2) misunderstand some particular word used in the command. Your phone is not going to think you pressed the "Clothes" button when you actually pressed "Close".
c) The entire functionality of the device is accessible from behind that big ambiguous interface. On a phone, there are many distinct steps and screens to step through when you want to do something (this complex interface, by the way, is a non-trivial part of why butt-dials are quite rare these days). On a "smart speaker", most things are just one misheard statement/command away from occurring.
It's interesting that the pattern of human communication where things are just one misheard statement/command away from occurring has been well modeled in these devices.
Does having the device programmed to make mistakes make it more comfortable to use, because we know humans are fallible too?
I don't think there is much of a difference. People took a while to understand butt-dialing, and not do it as much. It'll take a while for people to get used to voice assistants that trigger seemingly randomly.
When my phone is locked (which it always is in a pocket) it can't butt-dial.
What's the analogous behaviour for Alexa?
Also remember these are shared devices in a home, not a personal device in your hand. Could I shout into somebody's window, "hey Alexa, send my browsing history to bob@hotmail.com"?
The analogous behaviour would be you being aware of Alexa's flaws and not triggering it accidentally.
While you may have never pocket dialled, plenty of people have. I've received more than a few accidental calls in the past.
I equate this to some phone users placing their devices hanging over the edge of tables - as if they don't care about them hitting the floor. Should phone makers toughen their phones and should Amazon improve Alexa anyway? Sure.
They can still call the police while locked, which is something that happened to a friend with an old feature phone while we were in the process of illegally hiding/dumping an old car.
AFAIK, pocket dials have mainly been reduced by improvements in phones, rather than changes in user behavior; if the camera just sees inside of a pocket, it's unlikely to be an intentional button press.
I made a number of emergency calls from my pocket on my Nokia 1100 even with key lock enabled. It was infuriating. It happened once when I was having a heated argument with somebody and it took some work to convince the dispatcher not to send a law enforcement officer. (There was nothing physical happening and no threats of violence, but we were speaking very sharply to each other.) I do not miss that about my Nokia 1100 at all.
I haven't butt-dialled anyone since I got swipe to unlock on my iPhone. A 6 digit unlock code/fingerprint would make it seem even less likely to happen again.
I do not want to set up this feature on my Alexa for precisely this reason and I am finding Amazon to be really aggressive about forcing me to enable this feature and harvest my contact information. The Alexa phone app is almost unusable for me, as it constantly goes into the telecom setup screen whenever I try to do something else in the app. I am forced to exit the app and come back into it, whenever this happens, to escape that screen.
Wait! So you're saying that Alexa actually includes the capability to record voice, and send recordings to contacts? I suppose that some would love that. But, as this demonstrates, it's prone to fail spectacularly.
You'd think that there'd be protection. Maybe a restricted set of contacts who could get such messages. And a confirmation step, such as "Do you want to send this recording to 'my mother'?" Or "... to '911'?"
I use Alexa for listening to music. I rarely go into the app, but occasionally attempt to use it to set up a new music service, control the volume or track (if I am unwilling to shout down the music to tell it by voice). When I open the app, I get immediately thrown into the setup screen for the voip feature they're pushing.
You are right, but your argument is beside the point. The point is that those devices are not designed to work with 100% accuracy. So even if Amazon has no malicious intent, privacy-invading events like this will happen!
I have this idea of a system I would like to have in my house. It contains cameras in every room that are constantly watching where people are and relaying the coordinates to a central server. That server makes decisions on if lights should be on or if A/C should be running in that room. But I would never buy this system. I would have to make it myself. I am hopeful that open source software and hardware can produce individual components that I can trust to piece together.
> I have this idea of a system I would like to have in my house. It contains cameras in every room that are constantly watching where people are and relaying the coordinates to a central server. That server makes decisions on if lights should be on or if A/C should be running in that room. But I would never buy this system. I would have to make it myself. I am hopeful that open source software and hardware can produce individual components that I can trust to piece together.
You don't need cameras for that, just motion sensors.
> That server makes decisions on if lights should be on...in that room.
You don't need a server for that, just a motion-sensing switch. They can be totally offline. My office has them to shut off lights automatically when a conference room becomes unoccupied.
That's the main problem with motion activated office lights tbf, especially when you're doing software development for example - not enough motion to keep them on.
My problem with the motion activated office lights (at least the ones at the wework my company is located) is that you can't turn them off. The button appears to turn the lights off and disable the motion sensor for some (short) period of time, then movement turns the lights back on again. Very annoying.
Every time! The coffee shop by my old house had motion sensing lights in the washroom, and every damned time I would be plunged into darkness half-way through my visit... Urgh.
Ours does too, but there's no sensors inside the stall... You can wave all you want, there's no way to turn the lights back on without opening the stall door.
OSHA also requires a minimum light level in the US, but I believe it only pertains to hazardous working environments (like a production floor in a manufacturing facility, for example). Either that or the enforcement in white collar environments is so weak that literally no one cares to follow the law, which is surprisingly common with regard to many workplace regulations.
FLIR for turning off a light in a room seems like a ridiculous amount of overkill. I don't know numbers off the top of my head but I feel like at that point you're using more energy for FLIR than the light itself. Not to mention the cost of a single FLIR sensor is probably pretty high.
One reason you might want a server is to have more expressive power than "if someone literally enters the room, turn on the lights". You might want to turn on the lights and run the AC a little before you predict people will arrive, for example.
> One reason you might want a server is to have more expressive power than "if someone literally enters the room, turn on the lights". You might want to turn on the lights and run the AC a little before you predict people will arrive, for example.
Maybe, but that seems like a lot of work for little gain.
You probably don't actually have that much control over the AC in your rooms unless you have zoned heating and a lot of zones, regardless of how smart your sensors are.
Intent is a very important factor when answering the question "should the lights be on," I don't think you'll be able to predict that. For instance: if movement is detected in the bedroom at 2AM, should the lights come on? The answer is: a very strong maybe.
While I haven't had the urge to get any 'smart speaker', I have a hunch that the current hype* around these devices comes mainly from the novelty of being impressed with voice recognition.
As someone who doesn't have a voice for radio, I much prefer being able to interact through dexterity. Besides, I'm very picky when it comes to which specific track of music is to be played (out of various similarly named pieces of music).
* I don't think take-up is quite as big as the tech world currently makes it out to be. People in my circles don't really have that kind of disposable income and are prioritising other purchases first.
So yeah, Alexa is a lot of extra literal work for little gain.
If you don't have multiple zones you can still effectively have zones by opening/close the registers/vents. This will prevent hot or cold air from entering the room. However, it will also increase duct pressure and that could be somewhat problematic. Either way, "smart vents" are on the market. They're ridiculously overpriced though and one could hack together an alternative for under $50 per vent (source: personal experience). Probably less if they are clever. Of course then there is the problem of running power to the vents or having sufficient battery. But that's an exercise left for the reader.
> Intent is a very important factor when answering the question "should the lights be on," I don't think you'll be able to predict that. For instance: if movement is detected in the bedroom at 2AM, should the lights come on? The answer is: a very strong maybe.
You would be collecting data from many sources in order to predict intent. This is why you need a centralized server.
> You would be collecting data from many sources in order to predict intent. This is why you need a centralized server.
Please explain, exactly, what other sources you would collect data from and how the central server would process it to determine if I want the lights on in the middle of the night.
IMHO, determining intent in this scenario is impossible without...
1. a mind-reading sensor, or...
2. an explicit user signal, such as a button-press or command.
The only realistic option is a user signal, and most of those options obviate a lot of these prediction ideas.
I think there's a lot less practical value to having a "central server" controlling everything than you seem to assume.
> Please explain, exactly, what other sources you would collect data from and how the central server would process it to determine if I want the lights on.
Machine learning navel-gazing is the new "throw a start-up at it", so I'm guessing the answer to this is going to be "if we have enough data..."
> Please explain, exactly, what other sources you would collect data from and how the central server would process it to determine if I want the lights on.
The more data, the more smarter. In fact, the only difference between a thermostat and the human mind is the number of datas. This is because the Law of Averages predicts that half of all datas will be relevant.
Your solution is simple. However, it doesn't allow for as much modification as the original poster's idea. With his idea it could be modified in so many ways to add functionality because it wouldn't be limited by the technology. The solution you supplied will eventually be limited by the technology if the original poster wants to add other functions.
> The second advisor, a software developer, immediately recognized the danger of such short-sighted thinking. [...] "A toaster that only makes toast will soon be obsolete. If we don't look to the future, we will have to completely redesign the toaster in just a few years."
YAGNI. You can get to a 90% solution today at 10% of the cost and effort. If, later, you want to extend the system you’ll not only have learned a lot about the operations and failure modes of the current system, but hardware purchased later will be cheaper and may support even more functionality.
Sure, but what is the value gain/opportunity cost between a simple, two hour installation that achieves a majority of the desired outcome with nearly rock-solid stability vs. sinking a massive amount of time into a bespoke and likely fragile system?
> However, it doesn't allow for as much modification as the original poster's idea.
My main point was the original poster's idea was probably focusing on the wrong kind of sensor for what he wanted to do. He could still network a bunch of motion sensors.
Also, I'd dispute the idea that my proposals were less "modifiable." If only because they're far easier to implement and a couple of orders of magnitude cheaper, so it's practical to replace if more capabilities are needed.
There may be a market for something between a standard one-bit low-rez motion sensor and a full color TV camera. Maybe a 16x16 pixel IR sensor with a fisheye lens and a puny CPU that reports an approximate number of people in the area, for HVAC and lighting control, and security.
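A sketch of the sort of processing that puny CPU could do: threshold the low-res thermal frame and count connected warm blobs as a crude proxy for occupancy (the grid size and threshold are made-up numbers):

```python
def count_warm_blobs(grid, threshold=30.0):
    """Count connected warm regions in a low-res thermal frame.
    grid: list of rows of temperature readings (e.g. a 16x16 IR array).
    Returns a rough 'number of people' estimate for HVAC/lighting logic."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] >= threshold and not seen[r][c]:
                blobs += 1
                stack = [(r, c)]  # flood-fill the rest of this blob
                while stack:
                    y, x = stack.pop()
                    if (0 <= y < rows and 0 <= x < cols
                            and grid[y][x] >= threshold and not seen[y][x]):
                        seen[y][x] = True
                        stack += [(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)]
    return blobs

# Two warm patches in a 4x4 frame -> counts as two occupants:
frame = [[20, 35, 20, 20],
         [20, 35, 20, 34],
         [20, 20, 20, 34],
         [20, 20, 20, 20]]
print(count_warm_blobs(frame))  # 2
```

At 16x16 resolution two adjacent people merge into one blob, which is why "0, 1, a few, many" is about the honest granularity.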
Reporting an approximate number of people is not a trivial task, even with a real camera. Depending on what exactly you mean with "approximate", of course.
> 0, 1, a few, many. Just enough to tell you how much to crank up the HVAC.
I don't think the number of people in a room will give you meaningful information to tell you "how much to crank up the HVAC." Also, the HVAC systems in most homes aren't capable of even cranking up the HVAC in a particular room.
https://www.home-assistant.io doesn't have computer vision, but can use motion sensors and the like to tell where people are, and use that to activate/deactivate devices. It's all open source and runs entirely locally.
A lot of home automation stuff seems like something that would be fun to make, but not something I really want to have. I mean, the AC might be a good energy saver (a smarter thermostat) but that's not something I want, more like something that I would buy if it was cheap and practical enough.
To some extent, this is a phenomenon of early technology. It feels full of potential, so you want to play with it, but it doesn't really do anything you really need yet. The early web was kind of like that. We made sites because we wanted to make sites, more so than because we wanted to have them.
Home automation though... I'm kind of skeptical that this goes anywhere useful. The useful examples people think of (e.g. remotely close all the windows and lock the doors) are more about mechanisation than automation.
At my house, we have blinds installed on every window, and every evening I go around the house, pressing 12 switches (2 for every blind, because UX is obviously optional) so it's perfectly dark in the bedroom. In the morning, I go around the house and press those 12 switches again, to let the sunshine in. Now that's a task I would love to automate!
Alas, remote controlling those blinds would be a major hassle since AFAIK I would have to install 6 wifi-enabled devices and tear holes everywhere (not even sure there is a powerline near) - and likely do all the programming myself. Thanks but no thanks.
If it bothers you enough, take a look at ZigBee switches. Not sure I understand how you operate the blinds (since the current switches are apparently not on powerlines?), but you could still make a centralized solution to control them over ZB. There's https://www.home-assistant.io/ if you don't want to program (much), and probably others. And you will smile every evening and every morning for the next few years, thinking of the time when you had to do it manually... not to mention, it's cool. :) You might need to invest something in these devices and controller though.
Yeah it's not really relieving a pain point. I've never thought "O why lord, why must I toil away flipping these light switches as I enter and exit rooms?"
Another pain point with making is that it's really hard to get it integrated with stuff you have bought. I built my own garage door automation with a Particle Photon board. It works great and can do things like text me if I leave the house with the garage door open, using the IFTTT support from my WiFi router. The problem is that it's really hard to get it integrated with any other control system that the rest of my house uses, like my ZWave light switches and Hue lights.
I've been working on a custom UI that sits on top of the Wink Hub API to unify everything, but I've been stuck with their almost completely undocumented Pubnub event API.
I would check out https://www.home-assistant.io/ for integrating multiple systems together. I use it on a RasPi to integrate a few disparate systems (Amazon, Nest, RadioRa2) and it works very well. There are modules for most existing systems and it's easy to write your own in python.
I probably should try it. I've been writing my own partly as an excuse to learn React. I've already got a Pi with a touchscreen and a 3d printed enclosure setup to run whatever solution I actually end up using.
I know I’m a little late here, but I would look into MQTT as a transport layer for messages across your different devices. It’s super easy to interface with via python or a host of third party services.
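For instance, with the third-party paho-mqtt client, a DIY sensor can publish its state to a broker (Mosquitto, say) that everything else subscribes to. The `home/<device>/state` topic scheme below is just an assumption for the sketch, not a standard:

```python
import json

def make_message(device, state):
    # Topic/payload convention assumed for this sketch:
    # home/<device>/state carrying a small JSON body.
    return f"home/{device}/state", json.dumps({"state": state})

def publish_state(broker_host, device, state):
    """Publish a retained state update so late subscribers see the
    latest value. Requires a reachable MQTT broker."""
    import paho.mqtt.client as mqtt  # third-party: pip install paho-mqtt
    topic, payload = make_message(device, state)
    client = mqtt.Client()
    client.connect(broker_host)  # e.g. a Mosquitto broker on the LAN
    client.publish(topic, payload, qos=1, retain=True)
    client.disconnect()

# publish_state("192.168.1.10", "garage_door", "open")
```

Home Assistant and most hubs can subscribe to MQTT topics directly, which is what makes it a decent glue layer between homemade and bought devices.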
> "That server makes decisions on if lights should be on [...]"
Why would you even want such a system?
My flat uses the Button-Framework which provides a really convenient UI/UX. The edge between 2 nodes (rooms) has a button (a haptic device to switch between boolean states) at around hand-height that you can press to switch the light on or off. It works quite well (it uses a switch-system technically) and the user decides by her/himself if the light should be on or off and just presses the button after careful reasoning.
Granted, it doesn't use Docker, but it really works well and needs low maintenance. My model has been running for 23 years and I've never had any issues - it's even open source.
There's a whole class of tech like this for me. An Alexa/Echo/etc, a fitness tracker with GPS and sleep monitoring, a maps program that learns my routine and integrates with a weather app, and so on.
And ideally? All of it integrated. It actually sounds nice to say "I'm going home", and have Maps say "today that will take 35, should your oven start preheating when you're 20 minutes out?" IoT devices are overrated, but I can absolutely imagine a critical mass of integrated tools being very useful.
But I'm not even slightly willing to do that. It's too much information and too much risk surface. I'd pay a hefty premium to get local-data-only versions of these products, but no one is offering that, and it doesn't look like they're going to start.
>a maps program that learns my routine and integrates with a weather app
That sounds awesome until the company providing that service starts abusing their knowledge of their location. That abuse doesn't even have to necessarily be malicious in nature either. For example, Google Maps on Android started asking me to rate, review, and/or take photos of my present location if they deemed it a point of interest (certain restaurants, parks, etc). I never opted in to this feature and the only way to disable it that I've figured out is to literally disable all location services on the phone.
I really dislike the idea of Google storing a timestamped record of almost every place I've ever visited. It has to be constantly phoning home in order to deliver the request to document my visit within a minute or two of my arrival, and that constant reminder that Big Brother Google is tracking me at every moment is just disturbing on so many levels. Even if they aren't using that data right now, remember the "data is never destroyed" principle of the internet.
On a side note, I would have ditched Android if I didn't need it for work simply because use of the GPS radio is hidden behind the acceptance of enabling Google's Location system and all the invasions of privacy that entails.
I feel the same way. Between that and my overuse of my phone for checking HN, Reddit, etc., I am considering trying to make a habit of leaving my phone at home when I leave the house, at least some of the time. It's a shame that I feel that way about such an incredibly useful device, but there it is.
Maps currently does do what you're saying - Google knows the time I'm going home and displays current traffic information and estimated time to arrive, at least for me.
Same. But it gets such obvious things wrong. Instead of "traffic to [daycare center] is light" it says "traffic to [some weird company name probably registered in the same building] is light".
Waze also knows where I'm going in advance and most of the time it gets it right, like every Thu evening "are you heading to [evening school] ?". My weekly schedule is exactly the same 99% of the time but sometimes it still makes obvious mistakes. If you need an AI for that it must be a very simple one, yet it still fails.
Google maps also auto saves my parking location most of the time but other times it doesn't, for reasons unknown. Things like that make it hard for me to have faith in future versions of this stuff.
I actually used to have a solution with home-assistant, find (uses wifi signal fingerprinting for location) and hue. It worked okay with very little work (sometimes 1-5s delay turning lights on/off) but I never cared enough to put more work in to get it better.
I think "find" and similar tools are now much more advanced so it might work better out of the box now.
They should feel more comfortable than an equivalent system built by a company that is looking to profit off of your data - and additionally, you can give the guest stronger guarantees that when you say that the system is "off", it actually is.
>> Even if you build it, knowing guest may never feel completely comfortable visiting your house.
> They should feel more comfortable than an equivalent system built by a company that is looking to profit off of your data - and additionally, you can give the guest stronger guarantees that when you say that the system is "off", it actually is.
The chance that a random implementer has a security vulnerability is much higher than that Jeff Bezos is listening to me watch TV. A private system is more vulnerable to target attack and an Amazon system is more vulnerable to mass surveillance.
You are on the right track, but I don't think you are quite right. Amazon like systems are more vulnerable to mass surveillance.
Your vulnerability isn't a targeted attack: you are not valuable enough to be worth the effort to figure out your system. As an attacker on your system I'd have to figure out how to break in, and then how to use the hardware you have. You are more valuable as part of a botnet - attacks that already exist. (if you are a politician then maybe, but that person is also vulnerable to a targeted attack on their amazon system - probably more so because the target is easier to figure out).
And what exactly does this "vulnerability" mean in real terms? Let's be honest. No one cares what conversations are going on in your house _unless_ you're someone specifically targeted. There's little use in mopping up data with no goal.
If I'm using a dragnet to grab every conversation and filter it for things I personally care about, your conversation could make you someone I want to specifically target.
As a random example: let's say that I want to kill someone who lives in your neighborhood. I could analyze your conversations, comings and goings to figure out when and how to kill that person, and blame it on you. I could on a continual basis run numbers on everyone emitting data in your neighborhood, until someone had arrived at a point where their friends would testify against them on the basis of conversations with the potential patsy, that patsy had no alibi, and tailor the murder on the basis of the means that the patsy had available to them at the time.
I could also just be trying to figure out if you were a homosexual, or Muslim.
If mopping up everybody's conversations is cheap enough, Russia/Iran/China/(insert your favorite large evil) will do it. If you happen to run for political office 15 years from now, having all your conversations available to analyze will be useful. If they don't like you, you might find some out-of-context snippet of "private conversation" all over social media killing your campaign (or alternatively the blackmail threat if you don't X).
That is, they will target everybody, because they know that in a few years that will include somebody who they currently think is a nobody.
Of course, as AI gets better and cheaper, they may eventually listen to everything to see who can be targeted automatically for what.
A secure language does not protect you from insecure design. I could very easily build a system in Rust with gaping security holes, purposefully or accidentally.
I like the idea (though I think with enough abstraction, you could have it also replicating itself to "regular" cloud).
The main fight needs to happen at the application level, not infrastructure. Cloud services are already mostly transparent and interchangeable. But applications aren't. The problem is, it's the application vendor that owns the code, determines where it's going to run, and asks you to send over the data. How it should work is that you own the data and determine the location of computing, and own or rent code to be run on that data.
Any idiot who breaks into your home would be well positioned to steal your server along with all of your memories and use them for ransom/blackmail/etc.
As opposed to any idiot who breaks into your IoT provider's server?
An encrypted hard disk with the key on a USB stick would be enough, just keep the key somewhere separate and you'll only have to plug it in when there's a power outage.
You don’t need to use cameras for this. A simple speaker and microphone is all you need to make a functioning motion sensor. Just exploiting the doppler effect. And even better, you can do it all outside of the human audible spectrum. And it can work with capturing motion around corners, too, since sound bounces off walls.
I’d recommend something like this in every room, or even IR sensors, over cameras. You don’t want to capture video of your children jerking off.
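The doppler trick can be sketched numerically: emit an ultrasonic tone, and look for received energy that has shifted away from the carrier frequency. The carrier, sample rate, shift, and thresholds below are all illustrative assumptions, not tuned values for real hardware:

```python
import numpy as np

FS = 96_000        # sample rate; needs ultrasonic-capable speaker/mic
CARRIER = 20_000   # emitted tone, just above typical human hearing

def doppler_energy(received, tol_hz=20, band_hz=200):
    """Sum spectral energy in the sidebands around the carrier,
    excluding the carrier itself. Motion toward/away from the mic
    shifts reflected energy into these bins."""
    spectrum = np.abs(np.fft.rfft(received * np.hanning(len(received))))
    freqs = np.fft.rfftfreq(len(received), 1 / FS)
    offset = np.abs(freqs - CARRIER)
    sidebands = (offset > tol_hz) & (offset < band_hz)
    return spectrum[sidebands].sum()

t = np.arange(FS) / FS                       # one second of audio
still = np.sin(2 * np.pi * CARRIER * t)      # static reflections only
# A moving reflector adds a doppler-shifted echo (here +80 Hz):
moving = still + 0.3 * np.sin(2 * np.pi * (CARRIER + 80) * t)

motion_detected = doppler_energy(moving) > 5 * doppler_energy(still)
```

Real reflections are far messier (multipath, fans, HVAC airflow), so a deployable version would need calibration and smoothing over time.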
This is eerily similar to the concept of a 'cookie' seen in the Black Mirror episode, White Christmas.
(Spoiler Alert)
A cookie is a device "that is inserted into the client's head next to the brain and kept there for a week, giving it time to accurately replicate the individual's consciousness. It is then removed and installed in a larger, egg-shaped device which can be connected to a computer or tablet (to automate their smart house controls as if the house knew their personal preferences moving from room to room)."
IR camera would probably make this simpler than a visible-light camera. You want to know if there are people in the room, you don't really care who they are specifically.
I had exactly same idea just about the lights but am too lazy to even try to implement it. Plus sensors require power and that requires cables and that's way too much bother.
Low tech alternative solution from my friends? Get LEDs. Never turn off the lights. This way the room you are going to will always be lit.
From what I gather, if you need fine-grained location information in a building, you can do this without cameras and just a handful of strategically placed wireless access points.
Less likely to be seen as creepy and such systems already exist.
If you just need 'is a person in this room' or 'how many people are in this room' levels of data, solutions to this problem have existed for decades. No need to over complicate a solved problem.
If you are cleaver you can use the wifi signals as a radar and find people/objects even when they don't have a wifi device on them.
I'm pretty sure all current technology only works if you have a wifi device on your person. It is significantly easier to do it that way, and for the most part it's good enough.
Sounds far-fetched to me. If you are cleaver, you are also likely to get stuck in a coffee cup billboard or Eddie Haskell will trick you into insulting your Spanish-speaking friend.
Not only can you detect motion and objects through physical mass using WiFi spectrum with a properly equipped device, you can detect if someone is breathing.
“What's more, this "Time Reversal Machine" technology is essentially just some clever algorithmic work with little burden on the processor, so it can potentially be added to any existing WiFi mesh routers via a firmware update. In other words, security system vendors should take note.”
Such a system would be a huge GDPR nightmare. You'd need consent from every visitor to your house to collect data on them, and also you have to delete it if they ever request you to. Best not to even try.
Interactions between private individuals are out-of-scope.
The following processing is outside the scope of the GDPR:
- any activity outside the scope of EU law (e.g., activities of a Member State in relation to national criminal law);
- any activity performed by Member States when carrying out activities in relation to the common foreign and security policy of the EU;
- any activity performed by a natural person in the course of a purely personal or household activity;
- any processing by the EU itself;
and
- any activities performed by national authorities for the purposes of prevention, investigation, detection or prosecution of criminal offences, or performance of judicial functions.
> Echo woke up due to a word in background conversation sounding like "Alexa." Then, the subsequent conversation was heard as a "send message" request. At which point, Alexa said out loud "To whom?" At which point, the background conversation was interpreted as a name in the customers contact list. Alexa then asked out loud, "[contact name], right?" Alexa then interpreted background conversation as "right." As unlikely as this string of events is, we are evaluating options to make this case even less likely.
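That chain only needs one misheard token per dialogue state - roughly like this toy state machine (my paraphrase of Amazon's description, not their actual logic):

```python
def run_dialogue(heard_tokens, contacts):
    """Walk the message-sending flow as a sequence of states, each
    advanced by a single (possibly misheard) token from the room."""
    state = "idle"
    recipient = None
    for tok in heard_tokens:
        if state == "idle" and tok == "alexa":
            state = "awake"                 # wake word (mis)detected
        elif state == "awake" and tok == "send message":
            state = "ask_recipient"         # Alexa asks: "To whom?"
        elif state == "ask_recipient" and tok in contacts:
            recipient = tok
            state = "confirm"               # Alexa asks: "<name>, right?"
        elif state == "confirm" and tok == "right":
            return f"message sent to {recipient}"
    return "nothing sent"

# Four misheard tokens buried anywhere in background conversation suffice:
print(run_dialogue(
    ["...", "alexa", "...", "send message", "bob", "right", "..."],
    contacts={"bob"}))   # message sent to bob
```

Each individual misrecognition is plausible on its own; it's the product of four of them, in order, that Amazon is calling "unlikely."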
In my house, there is a design flaw in Alexa which encourages this failure mode: there is a single volume level for the system, instead of separate volume levels for Alexa voice vs. music/application audio.
I keep my Alexa's volume at 2-3, because otherwise the music / podcasts are too loud for me. However at volume 2-3 the Alexa system voice is fairly quiet and I often miss things Alexa says- requested or accidental. I'd prefer to keep Alexa system voice at volume 5, but music at 2-3.
I requested this feature - a distinct Alexa voice volume - years ago. I would be interested if anyone has a solution for this.
I would like a feature where Alexa detected the sound level in the room and responded at an appropriate volume. For example, in the quiet of the early morning I just want to hear the day's weather forecast at a low volume, to avoid waking others in the house. During the afternoon, however, when kids are playing and the A/C is rattling and the dog is barking, I would like my responses at a higher volume.
I want some YouTube tech channel to test this out and see if this story by Amazon is bullshit or not. Have a private-sounding conversation and intersperse it with a sudden nonchalant, 'hey, alexa visited me today and...' and then keep talking and then 'I really want to send a message to these students, you know and...' and so on and so forth.
I think it's decently plausible if your Alexa hears you, but you don't hear your Alexa. So you continue on your conversation, and you say enough to keep progressing the flow.
I would assume any failed speech recognition attempts are recorded so Amazon can have a human look at and classify them, but at the very least they probably keep logs, so they should have this information when debugging. Maybe we'll get a more detailed postmortem later.
It's like those "one in a hundred million years" events that nevertheless do happen, inexplicably, because hey, we calculated the odds, why won't the universe adhere to our calculations? :-) Taleb, Black Swan, etc... I'm aware I'm not making a high-quality comment of amazing insight, on the other hand, plenty of examples where people don't seem to care. Since I presume people, while being "people", are not that gullible, I'm sure if one follows the incentives one may find reasons...
I mean, if this was a one in a million chance of happening, and there have been billions of conversations happening near devices like these, I'm surprised this hasn't happened more often.
If you have an echo and it kicks on unexpectedly, you can go to the app to see what it thought you said, along with a recording of you saying it.
I've had "alright, uhh..." trigger it. And on the grainy, across-the-room recording, I could see how the computer thought it was "Alexa". So I could see how, if the computer is purposefully listening for "right" or "yes", it could mistake any number of other words for those.
I would much prefer an open source + open hardware alternative that works reliably, but I received an Echo dot as a gift, so after a few months of debating, I plugged it in.
Having an always-listening, internet-connected device with access to your contact list is a pretty big leap. But somehow we go along with that just fine.
Given the numbers, it has almost certainly happened at least hundreds of times and probably thousands of times. But the remaining sequence of events (the receiver notifies the sender, a media outlet picks it up, and Amazon comments) seems to have happened only once so far.
After using these systems I think it's plausible: the current ML systems frequently produce valid but incorrect words, and since the UI is designed for marketing rather than usability, nobody wants to expose low confidence levels or a correction UI. They have enough customers that someone was bound to get a false positive which the system mapped to one of a few constrained choices in a brittle interface tree.
As an example, I've noticed that Siri tends to default to offering to open Notes on my iPhone any time it fails to recognize a query correctly. Alexa on the FireTV is better but still has lots of failure modes which look suspiciously like the default branch of a switch statement where it ends up really off on a common query in a quiet room.
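The "default branch of a switch statement" failure mode described above can be sketched concretely: a dispatcher that always routes to *some* handler, versus one that surfaces low confidence to the user. Intent names, scores, and the threshold below are all invented for illustration:

```python
# Illustrative intent dispatcher. The bug pattern: when no intent
# scores well, fall through to a default action (e.g. "open Notes")
# instead of asking the user to repeat.

INTENT_SCORES = {          # invented scores for one garbled query
    "open_notes": 0.31,
    "send_message": 0.28,
    "play_music": 0.12,
}
CONFIDENCE_THRESHOLD = 0.6  # assumed cutoff, chosen arbitrarily

def dispatch_buggy(scores):
    # max() always picks something, however weak the evidence.
    return max(scores, key=scores.get)

def dispatch_safer(scores):
    intent, score = max(scores.items(), key=lambda kv: kv[1])
    if score < CONFIDENCE_THRESHOLD:
        return "ask_user_to_confirm"      # expose the uncertainty
    return intent

print(dispatch_buggy(INTENT_SCORES))   # open_notes
print(dispatch_safer(INTENT_SCORES))   # ask_user_to_confirm
```

The safer version trades a little convenience (an extra prompt) for never silently acting on a guess, which is exactly the trade-off the thread is debating.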
I received one of these messages just a week ago. Alexa sent me a recording of my friend and his girlfriend having a private conversation. I immediately texted him to ask if he intended to do that, and he did not. So weird.
Can you explain a bit more? What was the context of the message? Was it an email? What was the subject line and the text? Was the audio an attachment or a link? A link to where...?
"Amazon sent me a message" is so frustratingly vague...
That means that, unlike butt-dialing, you're more likely to accidentally send a message to someone you're actually talking about since the person's name is part of the command. Yikes!
Yea, I wish the girl mentioned what the message started with. Something had to have been misinterpreted as a voice command and a contact name ... and the device UI has to be broken to the point where it doesn't give you any audio acknowledgement or confirmation.
So that we consumers can better understand this potentially huge vulnerability to determine if it's a bug, a "bug-feature", user error, user lying, or Amazon actually doing something evil.
Do you have any proof of this online or can you provide more specific details? Not that I don't trust you but I don't trust anything I read on the internet without a minor amount of verification.
I'm sure another anecdote from a stranger won't be enough to convince you, but this also happened to a friend of mine just last week. A work contact of his received a voicemail from him which consisted of police sirens and muffled shouting. Of course he panicked thinking something bad had happened. Turns out my friend's Alexa had picked up a police show on the TV in the same room.
Since Alexa wasn't originally available in my country, I set up a whole separate account for "Alexa LastName" in the US with alexa@mydomain.com as the email address.
Now that Alexa is available in my country, I've considered switching it over to a real account explicitly to be able to use some of these calling and messaging features. Maybe I'll wait till the bugs are worked out.
Summary of vague technical details (which may be all we hear about this):
> an Alexa engineer investigated ... they said 'our engineers went through your logs, and they saw exactly what you told us, they saw exactly what you said happened, and we're sorry.' He apologized like 15 times in a matter of 30 minutes and he said, "we really appreciate you bringing this to our attention, this is something we need to fix!"
> the engineer did not provide specifics about why it happened, or if it's a widespread issue.
> "He told us that the device just guessed what we were saying" The device did not audibly advise that it was preparing to send the recording, something it’s programmed to do.
It's even worse. Sometimes the packages that the DRAM ICs are encapsulated in contain trace radioactive elements, and the alpha particles they emit can flip bits.
Right, or "the current source code says not to do that, but some discrepant, previous, buggy build of the machine code made it into the product you actually bought."
I don't. And in this particular scenario, it's an important distinction to make because there's a risk of Amazon using equivocation to deflect responsibility.
If this was intentional, then yes they shouldn't be able to deflect responsibility. However, if it was just a bug, I think it is a bit unfair to vilify either the company or the developers. Bugs happen, and hopefully they can learn what caused this and prevent this class of bugs in the future.
And this is where "software" diverges from "engineering".
Bugs happen in architecture, aircraft, etc. too. The difference is that the actual engineers are paid to have a precautionary approach and spend significant resources to actively prevent bugs from making it into the final product.
In contrast, software is often written to "ship first", be "agile", and "move fast and break things". Yet when it causes problems, they just say "bugs happen", and "it is unfair to vilify them".
Features are not better than reliable security.
And yes, negligence is less bad than malice, but it is still damaging and developers and managers need to be held to account.
If your manager is pressing you to do unsafe crap in too big a hurry, it's your responsibility to push back, and if unsuccessful, leave for saner pastures and make it more difficult for that management to proceed.
Well, yes and no. Bugs, security, reliability, etc are all important things to consider, but they can't be the only focus. Security doesn't matter if the thing you are creating has no features; it would be the same as if it didn't exist at all.
Instead, we must manage risk; the risk of bugs, the risk of security vulnerabilities, etc. Nothing we do is risk free. Even walking across the bathroom floor has SOME risk; we might slip and fall. Does that mean we should just stay in bed all day to avoid any risk?
No, we measure risk by factoring the chance of the bad thing happening and the consequence of the bad thing happening. We then determine how much effort we should spend on that risk, since there are infinite risks and only a finite amount of effort we can expend.
If the consequence of a risk is death, then we should absolutely put a lot of effort into minimizing that risk. If the consequence of a risk is that a private conversation is sent to a contact, we should definitely put a lot of effort into minimizing that risk, but probably not quite as much as you would into something that has the consequence of death.
Even when the risk is death, however, we don't put infinite effort into avoiding it. We choose to cross the street, even when we know there is a risk of death when we do it. We drive cars that have chances of mechanical failures that could cause our death, but we don't bring the car to the shop every day to check for failures.
Things are not so black and white as to say "security is always the most important thing"
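The weighing described here is the classic expected-loss calculation. A deliberately naive sketch (all probabilities and costs are invented for illustration) of ranking risks by probability times consequence, the very model that later replies argue breaks down for non-linear, catastrophic risks:

```python
# Naive expected-loss model: attention allocated in proportion to
# probability x consequence. All numbers below are made up; this
# illustrates the model under discussion, not a recommendation.

risks = {
    # name: (assumed probability per year, consequence in arbitrary "cost" units)
    "slip on bathroom floor": (0.05, 10),
    "private recording sent to a contact": (1e-6, 1_000_000),
    "fatal mechanical failure": (1e-7, 100_000_000),
}

# Rank by expected loss, highest first.
for name, (p, cost) in sorted(
        risks.items(), key=lambda kv: kv[1][0] * kv[1][1], reverse=True):
    print(f"{name}: expected loss = {p * cost:.2f}")
```

Note how a tiny probability attached to a huge consequence can still dominate the ranking, which is the one concession this model makes to catastrophic outcomes; the rebuttal below is that for life-changing events a single scalar "expected loss" is not an adequate summary at all.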
I did not say "security is always the most important thing". I did not suggest that anyone should develop such that "the thing you are creating has no features". I certainly never suggested that no one get out of bed because they might slip in the bath. These are strawman arguments.
You do not need to lecture me about risk. I've had a career in international downhill ski racing, have won auto racing championships, and enjoyed lots of technical rock climbing, all of which require a high degree of risk assessment, both in extended preparation phases and at split-second time scales. I've also run risk analysis for UAV flight systems.
I understand well the difference between smart-crazy and dumb-crazy, and where the pseudo-mathematical risk models like yours break down.
Your 'analysis' to "...measure risk by factoring the chance..." would have fit right in at the meetings where Ford decided to just go ahead with the design of the Pinto/Bobcat because the lawsuits would cost less than the fix -- they wound up killing dozens of people.
Your 'analysis' would have fit right in where the trading algorithms were being designed, which worked fantastically profitably, until they didn't and ended up crashing the global economy in 2007-8.
You cannot simply multiply the cost of the consequences by the expected probability and get an allocation of resources. That is what you do to see if the lottery jackpot is big enough for you to want to buy a $2 ticket this week.
You must instead 1) fully examine the system for potential critical failure points/modes and then 2) allocate WHATEVER resources are necessary to account for preventing those critical failures, then implement those remedies along with the features.
These preventative measures may involve installing redundant systems around the critical points, redesigning the points so they fail in a safe mode (e.g., fail to send the data vs sending it off, shut down vs, explode, etc.), adding check procedures around the potential critical failure, etc.
Note that NONE of these measures involve not implementing the feature. They involve 1) checking for critical failure modes, 2) allocating R&D to develop preventative & fail-safe measures, 3) implementing the measures, 4) testing, and 5) field monitoring.
This is what you do if you are serious about risk.
Ok, I never said that the 'analysis' should be purely based on dollar value. I never even said it should be a mathematical model. You accuse me of a strawman and then turn around and do the same to me.
I was simply pointing out that we can never get to zero risk, and since we can't, we have to weigh risks based on consequences and probability.
> 1) fully examine the system for potential critical failure points/modes
Sure, to the best of your ability. How can you know for certain you have found all potential critical failure points? You can get pretty sure, but never fully sure. We still have industrial accidents, in every single industry in the world.
You also have to define what a 'critical risk' is. I don't think it is an a priori fact that accidentally sending a recording of a conversation to a contact is a 'critical risk'.
So your model isn't even mathematical, it's what, just a SWAG of the combined hazards and odds? That works for linear, small risks.
It absolutely does NOT work for serious risks, e.g., of death, serious injury, massive privacy violation, and other potential life-changing events.
The concept you are clearly avoiding or missing is non-linear risk.
You (and amzn_engineer1) are advocating for simply subsuming risk assessment into the ordinary development cycle, and calling it "taking it seriously".
That is fooling yourself.
Taking it seriously is actually making full and serious effort OUT OF THE NORMAL DEVELOPMENT CYCLE for no other purpose than to SEEK and identify potential critical risks.
It is then engineering a variety of in-depth solutions to prevent those critical failure points from ever seeing the light of day. And implementing them. And testing them. And monitoring them.
>> I don't think it is an a priori fact that accidentally sending a recording of a conversation to a contact is a 'critical risk'.
This is an exact example of that sort of failure: "it's not a priori bad", so minimize it and streamline it into normal development.
I really want to know in what world any sane person would say that it's OK to randomly divulge an intimate conversation to a contact or random recipient. Seriously, who would say that?
I mean sure, most conversations are benign, but some could be utterly life-changing if revealed. And that's OK with you?
It's worth observing that, firstly, aircraft still crash and bridges still collapse, and secondly, that a culture of blameless analysis has gone a long way towards making them crash and collapse less.
But note the blameless analysis is NOT the same as saying "meh" errors happen.
It is a culture and deliberate practice of allocating resources to seeking and classifying risks, and designing, implementing, and testing engineering and procedural mitigation strategies (vs. development as usual).
Yeah, pretty much. It's likely that the device heard something like "Alexa, send a note to Enrique" and then recorded and sent the subsequent note to the matching contact that it found. The microfailure is that it either failed to respond to the user in a way that was clear, or did it in a way that the user didn't notice (I dunno, volume all the way down? Output to a bluetooth speaker or headphone jack that wasn't audible?). And that can be fixed.
The macrofailure, though, is that you have a device just guessing at what you want and doing it. That can't be fixed without going back to manually constructed (or at least manually triggered) notifications. Which for "send a note" might be an obvious fix, sure, but there is a lot of gray area in what constitutes "privacy" and Alexa would be pretty useless if you had to affirmatively OK everything it did.
> Output to a bluetooth speaker or headphone jack that wasn't audible? And that can be fixed.
Not sure how to fix that, if you use the aux out it has no way of knowing what happens on the other end; a confirmation prompt that isn't heard could still pick up an errant "Ok" response. Perhaps require a PIN like the shopping confirmation that would make it much much more unlikely? Or better yet, disable messaging as an option.
I thought the response was pretty disturbing. They have logs of everything you say to Alexa? I guess I assumed the private conversations were not kept beyond some sort of metadata. No thanks.
That response could easily mean that they saw that there was an interaction where a message was recorded and sent without the device getting confirmation beforehand which matched what was reported to them, not necessarily that they kept the content of the conversation and could play it back. That's also a paraphrased conversation reported by a possibly non-technical person and likely passed through a customer service representative, so I'd take any technical details gleaned from it with a grain of salt.
Unless it's expected behavior that malfunctioned. Anecdotally, I've found a lot of companies' support have almost no leeway for appeasement if the triggering behavior is considered "expected behavior" or a bug based on it.
But this is Amazon. I guess things have changed for the worse on the customer service front, though. This happened not too long ago (2-3 years), probably around the time they posted a net profit.
In terms of the big picture, this was dumb PR-wise and this story should never have happened. For less than $1,000, Amazon could have stopped the story if the PR dept were more in sync with customer service.
I don't disagree at all on the PR side. I spent some time in the contact-center tools space, and my general experience is that, if a company allows it, higher-level employees avoid engaging with customer service directly at pretty much all costs.
Not saying it happened here, but that's how you get $10/hour vendors making judgement calls on customer situations that end up front-page news.
Yes, I also used to work in that space, our software / platform was used by Fortune 500 companies.
My point still stands. This is Amazon and not random company X's customer service. There are very few companies like this that people can name (e.g. Nordstrom, Patagonia, ...). This used to not happen at Amazon. Customer service used to be empowered to make customers happy, which was what made Amazon legendary. Things have changed. Customer service is now crippled like at almost every other mediocre company. This is just one of the symptoms. Maybe Amazon feels that it has enough market share now, so it doesn't matter as much?
I see what you are saying; it definitely could be the market share, or maybe even just the overall volume (QA at scale).
I guess I just don't have a long enough track record with them to see them in that grouping of great customer service. They've pretty much fought me tooth and nail over price matching and Prime shipping that doesn't meet the two-day guarantee over the last couple of years, which is probably biasing my view.
That is not like Amazon. Typically their customer support is almost unreasonably good. I've gotten refunds and perks without even asking for them on support calls.
I've told my roommate I'm moving out if he ever buys an Alexa/Google/Apple assistant device.
I have a microphone and I've been intending to get one of the open source solutions working and just tie it in to mpd, weather and a few other things. But all the processing should really be done on your own device, by hardware you own, software that's open and that you configure, and not send up to someone else's computer (aka "the cloud").
Of course non-tech people probably wouldn't bother because that's a steep curve and you're talking about more expensive devices to do on-board processing. There are some companies that are trying to make this more accessible to regular consumers. I hope we see a move in this direction.
> I've told my roommate I'm moving out if he ever buys an Alexa/Google/Apple assistant device.
Do you have an Apple or Google device near you right now? Does it have a microphone, battery power, and a connection to the internet? What about your roommate? Do you ever have sensitive conversations near these devices?
Of course I'm not suggesting you should abandon such technology. I'm just wondering why you draw the line at "Alexa/Google/Apple assistant device".
This is actually a pretty interesting question about threat models. A lot of "it can listen" fears are inconsistent with "I carry a smartphone", but I do think there's a meaningful distinction. My expectation isn't secrecy, but it is privacy.
I carry a smartphone, but I keep voice-command activation turned off. I'm sure it could be activated remotely, but I expect that would require a targeted effort. I assume that Google isn't constantly recording everything in the vicinity of all Android phones - the news would probably get out, and honestly I'd see the battery impact pretty quickly.
As the saying goes, "Quantity has a quality all its own". My basic objection is to systems that are likely to store/upload all available data, with the attendant risks of leakage or commercialization I object to. Anything I say around my phone is potentially forfeit, but I don't think everything is.
They may not be recording everything all the time, but you have no idea when it is recording. At least I didn't on my moto x.
I went to the Google page that lets you see a lot of the data they have stored about you. I had read about it in a blog post or something, so being curious I went to check it out. I believe it is the "control your content" page in your settings; I don't want to log in to Google right now, so I might be wrong.
It has stuff like web searches (if you have that enabled). During the year I had an android, it captured several conversations, and I only found out because they were on that data page.
None of them would have had the words "OK Google" or whatever triggers it. One of the conversations was definitely private... it was with my doctor during a consult. Nothing that was recorded was actually sensitive, but the fact it happened, and I had no idea, was disturbing.
So how is an Echo different from your smartphone? It feels like you've failed to quantify that.
It's much easier to turn off voice-command activation on the Echo than your smartphone. The Echo is not constantly recording.
> I assume that Google isn't constantly recording everything in the vicinity of all Android phones
AHAHAHAHAHAHA, see for example: https://www.google.com/maps/timeline The vast majority of Android users I've met unknowingly had this enabled. It doesn't record your conversations, but it still records far more than an Echo.
Why would remotely enabling recording on your Android phone be more difficult than on an Echo?
> So how is an Echo different from your smartphone? It feels like you've failed to quantify that.
> It's much easier to turn off voice-command activation on the Echo than your smartphone. The Echo is not constantly recording.
The difference is that a smartphone has the ability to constantly listen, but an Echo's primary use-case is to constantly listen. So if you're going to turn off voice-command activation on an Echo, there's much less reason to own an Echo in the first place than there is to own a smartphone with voice-command activation disabled.
Furthermore, an Echo is constantly plugged into essentially limitless power, while a phone is not. People in general are much more likely to notice excessive power consumption on their smartphone from the fact that their battery would drain more quickly than usual. Whereas, not many people are directly monitoring the power usage of their plugged-in Echo on a daily basis, so they'd be much less likely to notice if their Echo is suddenly using more power than usual (for example, by being remotely activated to listen constantly).
> AHAHAHAHAHAHA, see for example: https://www.google.com/maps/timeline The vast majority of Android users I've met unknowingly had this enabled. It doesn't record your conversations, but it still records far more than an Echo.
Granted, but there is no built-in voice command to send your timeline to one of your contacts.
> Why would remotely enabling recording on your Android phone be more difficult than on an Echo?
It's not that one is more difficult than the other, but rather that your risk grows with the number of possible attack vectors. If I already own a smartphone, then I may already be at risk, but adding an Echo to the mix increases my risk.
Two channels for remote listening will always be more risky than one. I don't think anyone is suggesting that you can replace your smartphone with an Echo, so the discussion about the risk of an Echo will always be within the context of _adding_ to the risk of having a smartphone. I don't think it's surprising when someone draws the line between the thing they currently rely on and the additional thing they can live without.
Thanks, this summarizes my position really nicely.
The most fundamental point is that a smartphone with voice-control disabled is still a useful tool, but an Echo with voice-control disabled is an LED rolling pin.
And beyond that, yes to the rest - I don't want to add an attack channel, I trust my knowledge of phones more than that of an always-on black box, and I object to voice control as a more open-ended threat than location tracking.
JangoSteve's responses cover most of my thoughts - I don't think remotely enabling recording is harder on a phone than an Echo, I just think the base state of the devices is different.
But as for "AHAHAHAHAHAHA, see for example: https://www.google.com/maps/timeline The vast majority of Android users I've met unknowingly had this enabled"?
I do have Timeline turned off. I know most people don't, but I'm not sure why "many people make this mistake" is supposed to be a rebuttal to "I'm more privacy conscious than many people".
> I'm just wondering why you draw the line at "Alexa/Google/Apple assistant device"
There’s a big difference between:
- Devices which are by design always actively listening and sending real-time audio to computers you can’t control or even really trust, with unknown security properties, and consumer-appliance security life-cycles/support
- Devices that can be configured to do that, but which you have some control over (phones, laptops, etc.) and generally won’t, without some form of consent (even if via a dark pattern, e.g. LinkedIn on Android).
To be fair, Apple's HomePod doesn't send any data until it's activated with the wake word. Amazon and Google products send a ton of data back constantly.
Where have you read that Amazon and Google send a ton of data back? I've heard that none of these devices send anything back until they're activated by a specific phrase.
Speaking for myself, I have my iPhone set so that I have to authenticate then hold down the physical home button before Siri will start listening to me. Home assistant devices are listening all the time.
The reason this matters is well-illustrated by the linked story here--the potential consequences of bugs are more severe when the mic is always on.
The way I have Siri configured, the opportunity to make mistakes about what I mean is limited to when I am intending to speak directly to Siri. It's not that often, and I am aware of what I'm doing.
But if Alexa is going to make a mistake about a trigger word and start transmitting data... well, that potentially could happen whenever I'm in voice range.
Sure, if some three letter agency is targeting me, a smartphone is probably no safer than a home assistant. But I'm not concerned about that level of targeting, for myself.
These taps are expected to always be listening and on non-battery power and dedicated ISP connections.
Phones are expected to be on battery and metered internet. There are tricky ways to tap a phone without draining the battery or internet billing, but that very much limits what can be recorded and sent remote. The risk still exists, but is lower and the bar to entry is higher.
How is a home speaker any different? You can monitor your network traffic and see how much data it's sending home right now. It's definitely not a constant audio stream; you'd have to be targeted.
What actually happened: Alexa misinterpreted some voice commands and activated a "call" skill. The people involved and local news got very excited and escalated this into a conspiracy story.
Amazon takes customer privacy EXTREMELY seriously. There's no way a team would get the "ok" to build a skill that randomly records private conversations then sends them to a random contact. It also doesn't make any logical sense to build such a skill.
Yes, I might sound biased because I am an engineer at Amazon. This statement is my own and unrelated to Amazon's opinion.
I get what you are saying but I would say that Amazon does NOT take privacy extremely seriously or this couldn't have happened. Let me be clear that I'm not saying they don't care at all or they are conspiring with the NSA.
What I mean by the above is that the "call" skill is much different from the "weather" skill. All Alexa has to do is have a confirmation prompt in the "call" skill and this wouldn't have happened. That is what extremely serious looks like. This is exactly the same as the phantom laughter incident from a few months ago: Alexa "heard" someone say 'Alexa, laugh' and laughed, but that wasn't the user's intent. It was fixed by changing the response to 'Sure, I can laugh,' followed by laughter.
Voice UI is very hard and still in its infancy, but the potential for personal harm (physical or emotional) must be considered in these interfaces. Turning off the lights may not need confirmation, but unlocking the doors or turning off the alarm probably should. Sending recordings or answering calls or even calling people should require more hoops, or at least allow the user to control the risk/reward.
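One way to express this tiered-confirmation idea: tag each skill with a risk level that determines how much confirmation is required before acting. This is a hypothetical design sketch; the skill names, tiers, and steps are all invented:

```python
# Hypothetical sketch: skills tagged with a risk tier that
# determines the confirmation steps required before acting.

from enum import Enum

class Risk(Enum):
    LOW = 0     # act immediately ("turn off the lights")
    MEDIUM = 1  # spoken yes/no confirmation ("laugh")
    HIGH = 2    # spoken confirmation plus PIN
                # ("unlock the door", "send a recording")

SKILL_RISK = {
    "lights_off": Risk.LOW,
    "laugh": Risk.MEDIUM,
    "unlock_door": Risk.HIGH,
    "send_message": Risk.HIGH,
}

def required_steps(skill: str) -> list[str]:
    """Return the gating steps a skill must clear before running."""
    risk = SKILL_RISK.get(skill, Risk.MEDIUM)   # unknown -> cautious
    steps = []
    if risk in (Risk.MEDIUM, Risk.HIGH):
        steps.append("spoken confirmation")
    if risk is Risk.HIGH:
        steps.append("PIN")
    steps.append("audible acknowledgement")     # always announce
    return steps

print(required_steps("lights_off"))     # ['audible acknowledgement']
print(required_steps("send_message"))
```

Defaulting unknown skills to the cautious tier, and always emitting an audible acknowledgement, addresses both failures in the story: the silent send and the misheard confirmation.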
There is absolutely no reason to believe they are not conspiring with the NSA. They have a huge deal with the CIA; plausibly some of those funds are for surveillance capabilities. They would not disclose (and likely would be legally prohibited from disclosing) any relationship they have.
> There is absolutely no reason to believe they are not conspiring with the NSA.
Or, more likely, being made an offer they cannot refuse by the NSA. And the rank-and-file engineers may not even know it's happening; all it takes is inserting some diverting code into the pipeline and calling it "QA monitoring" or something. A couple of people in the whole org would know that somebody from some IP connects and downloads that "QA" data periodically; all the rest would be completely ignorant and indignant at the thought. I don't see anything preventing this from happening at Amazon, or anywhere else.
I think you accurately identify the issue with this situation/skill: voice triggers need to weigh convenience (how easy it should be to activate a skill) against its permissions or potential (how the skill can affect a customer). In this case, I think this was not done properly.
This does not mean Amazon does not take privacy seriously. It is a company of small teams with very few layers of management between an engineer and business decision. The error in judgement of one team does not reflect on all of Amazon.
As you say, Voice UI is still in its infancy and not without growing pains. However, because Amazon does take privacy very seriously, I'm certain that after this incident there will be actions taken internally to ensure teams properly weigh the gravity of a skill with its voice trigger (or adjustments made if there is already an existing policy).
> I get what you are saying but I would say that Amazon does NOT take privacy extremely seriously or this couldn't have happened.
Apple shared personal photos of mine onto the internet without my permission. I could not delete them without getting support to assist, and they could not provide me with a reason why this happened.
Would you say that Apple does not take privacy seriously?
I know you feel like the story is not being fairly presented, but you are almost certainly not helping Amazon by posting here about it.
Why? Because from a PR perspective, it doesn't matter what Amazon or the engineers intended, all that matters is what actually happened. You said:
> There's no way a team would get the "ok" to build a skill that randomly records private conversations then sends them to a random contact.
But that's exactly what happened. Whether it was a bug or a feature doesn't matter. Imagine if the wing fell off a plane and an airframe engineer came on here and said "hey, there's no way we would get the OK to design a wing that falls off in midair." Yeah, we know. The fact that it happened by accident doesn't make it better (maybe worse, actually).
Now you've got dozens of people responding to you, and BTW a lot of reporters read HN too. Do you really want to see stories like "Amazon Engineer Calls Customers Conspiracy Theorists"?
The obvious, predictable and serious screwup occurs and people shrug it off. Engineers from Amazon post how it's all a regrettable mistake and it won't happen again. I am not gonna unload on you specifically here, but if you could maybe pass along to whoever is making business and product decisions over there that yeah, maybe 80% are stupid enough to buy this crap, but there are a huge number of people that are extremely creeped out by this and avoid it like the plague. That deafening silence you hear from them should not be interpreted as license to push the boundary even further.
If there was an Alexa in my place of residence, I would rip it out of the wall, smash it to bits with a hammer, and fire it out of a cannon into the sun.
Hell no to this crap from my side. I hope there are more incidents like this until people wake up to the fact they are putting telescreens in their homes. You guys can't be trusted!
>If there was an Alexa in my place of residence, I would rip it out of the wall, smash it to bits with a hammer, and fire it out of a cannon into the sun.
You may consider disposing of the bits at Kilauea instead of the sun. It would be almost the same and much more affordable.
It's pretty clear to me that this was an inadvertent activation of the call skill. The equivalent of butt-dialing someone from your phone. I get random voicemails from people all the time that are clearly recordings of their phone in their pocket.
If that's the case, then you need to provide additional features to reduce the chances of this happening. Longer and more unique wake word options, or more complex and deliberate confirmations before the call is placed. Something that a user can enable if they're worried about this sort of thing.
100% this. Seems like there should at the very least be a product feature that automatically disables any outbound messaging by default (similarly to how you can block purchases without a PIN). This doesn't solve the problem of always listening, but at least prevents this particular situation from inadvertently happening. Additionally you should be able to set an automatic deletion of voice recording after a specified amount of time. I'm sure some of this has to be in the pipeline with GDPR.
Your comment sounds belittling to me. The OP isn't a "conspiracy story", it simply says what happened. Yes, it tells the story from the couple's point of view. Shouldn't it? Sure, they "got excited". Who wouldn't?
What is this alleged 'conspiracy' story? An Amazon device recorded a private conversation and transmitted it to someone else, and it did so without the user's knowledge or intent. That actually happened.
How it happened is only relevant to the engineers who build and maintain the thing. I, on the other hand, could not care less how it happened, and the fact that it did happen is reason enough never to buy one of those infernal devices.
>How it happened is only relevant to the engineers who build and maintain the thing. I, on the other hand, could not care less how it happened
I'm sorry, this is just stupid. If you cannot see the distinction between a feature that was intended to spontaneously record audio and a bug caused by faulty voice processing, then that's on you. The distinction is pretty critical.
Critical to what, exactly? The consequences to the user are the same.
Obviously, intentionally designing a feature that spontaneously records and transmits audio is a problem for many reasons. But the lack of intent does not magically erase the consequences for the people who experience this kind of bug.
And, to be clear, this was not simply a matter of "faulty voice-processing". The fact that this could happen without the user's knowledge is a problem in itself. Clearly, there are inadequate visual and audio cues, and insufficient or nonexistent verification. Those failures are not bugs; they are bad design and engineering.
And this is where "software" diverges from "engineering". It's not a conspiracy, it's negligence.
Bugs happen in architecture, aircraft, etc. too. The difference is that the actual engineers are paid to take a precautionary approach -- and spend significant resources -- to actively prevent bugs from making it into the final product.
Amazon and your team have built a great product (I have one and make moderate use of it, and have even considered building some skills).
But you have planted a full-on bugging device in millions of people's homes. Done by a government, this would be cause for war or revolution. This is serious, and you need to treat it much more seriously than you obviously are. Not every 'skill' requires the same minimal level of security and verification; some, like this one, require much more, or should be forbidden outright until such security can be properly implemented (and yes, this should probably include calls only to pre-configured whitelists, intent confirmation, etc.; and to any manager who says "that's too inconvenient for the user", the response is "screw you, it's critical").
You call yourself an "engineer" at least twice, and claim that you take privacy "extremely seriously". The evidence from this incident and others noted in this thread indicates otherwise. Clearly, insufficient resources were allocated to figuring out the potential failure modes of a "call skill", and preventing them.
All due respect, but your team needs more of an engineering approach than you have. This entire "it's gotta ship yesterday" mentality in the software industry used to be just inconvenient. Now it's getting dangerous. Please help stop it.
First, the Tech/Dev managers need to stand up. Instead of saying 'Yes' to every feature request, they need to say 'No', or 'Later'. I've seen too many who just feel that their job is to implement everything as fast as possible, and as close to approximating the sales/mktg/product guy's latest half-baked idea as fast as possible.
This is not easy, especially as the CEO is still typically above the CTO and can overrule. It happened to me, when we were ahead on a scalable version of the product but they didn't like the timeline. The management decision cost almost a year of messed-up, myopic development schedule just to roll out apparent features sooner, while ignoring the likelihood and eventuality of bugs. Afterwards, when we got back to the scalable, highly modular version, we started taking business from competitors who couldn't scale. I'd say my mistake was to only describe the broader consequences, and not spend the time to enumerate in detail what the consequences of the non-scalable quick program would be. Of course you cannot predict exactly which bugs will happen, but I could probably have done a better job of drawing out the scenarios (not sure it would have made a difference, but it might have).
I'd also say that we need to create specific structures and plans to study and quantify risks, as is done in real engineering like aerospace, architecture, etc. Classify those risks into a range of categories, from small bugs to existential for your customers or project.
Different steps need to be taken for each class, and a significant part of the planning needs to go into de-risking the project.
I'm in physical development now (vs. software), of carbon-fiber-type technologies, and I notice that my military customers, who are building very cutting-edge stuff, often talk of 'de-risking' the project, whereas I don't hear this much from other customers. Seems like an important distinction to take on board.
---
From a user perspective, I noticed after looking at the issue on our own Echo yesterday: the UI is a totally greased slide to hide choice for the user and slide them right into giving permissions for contact list. It seems that effort was made to hide the actual features and functions that will result from giving permission, and obscure the 'Skip' option. So it would be easy to not even notice that your device had these new possibilities. Obviously, I'd recommend taking more time to sell the features and let us make an informed choice. Then even if things go wrong, you'll enjoy some benefit of the doubt in the market and press.
> First, the Tech/Dev managers need to stand up. Instead of saying 'Yes' to every feature request, they need to say 'No', or 'Later'.
Yes, I see this a lot too. I think software managers have more incentive to get new features deployed. I don't think this is a good way to measure their performance as a manager because of the consequences we've already mentioned.
I would also like to see more concepts from physical development implemented in software. At times it feels like the wild west out here, and too often we ignore the lessons from similar experiences.
I know amazon's got a different motivation matrix than a startup; e.g., Amazon won't die if some feature isn't delivered by the next trade show, but they do have competition from the other majors.
That said, amazon certainly also has the funds available to invest in a parallel risk team. If they're not motivated to do it from the risk to their users, the best argument might be the potential reputation setbacks if stuff like this gets out there & causes problems, bad press, reputation for creepiness, etc.
I know it always seems inevitable that you'll weather the reputation hits from errors and just press on to greater usage/adoption/sales, but it will always seem that way from the inside -- until it doesn't. Google's experience with Glass comes to mind; could have been a fantastic product, but it went just under the tipping point of being creepy, and poof, they're gone. It'll be a generation before anything similar comes back.
I'd hate to see that happen to Echo/Alexa. TBH, it looks like this product has both greater potential, and also greater creepiness potential than Glass ever did.
If a "call" skill is accidentally triggered, before anything is sent to any email address, the device should tell the user: "N seconds of voice were recorded and are about to be sent to ....". Say "Send" to send it, "Play" to play back the message, etc.
The default must always be that the voice recording is auto-deleted after 1 minute if no response is heard. It should let the user know about that too.
Better yet, let the user pick a confirmation word (kinda like a 'safe word'), or choose one at random from a list of moderately complex but well known words that are unlikely to be uttered in casual conversation.
To send this message to John Smith, say 'artichoke'.
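A minimal sketch of how such a confirmation-word gate might work. The `speak` and `listen` hooks, the word list, and the function names are all invented for illustration; this is not a real Alexa API.

```python
import secrets

# Hypothetical confirmation-word gate for an outbound-message skill.
# The words are moderately complex and unlikely to occur in casual speech.
CONFIRMATION_WORDS = ["artichoke", "trampoline", "xylophone", "peninsula"]

def confirm_send(recipient, speak, listen):
    """Ask the user to repeat a randomly chosen word before sending.

    `speak` and `listen` are stand-ins for the device's text-to-speech
    and speech-recognition hooks (assumptions for this sketch).
    """
    word = secrets.choice(CONFIRMATION_WORDS)
    speak(f"To send this message to {recipient}, say '{word}'.")
    heard = listen(timeout_seconds=10)
    # Only an exact match of the chosen word authorizes the send;
    # silence or any other utterance aborts.
    return heard is not None and heard.strip().lower() == word
```

Because the word is chosen at random per request, background chatter that happened to trigger the skill is very unlikely to also produce the exact confirmation word within the timeout.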
Having a UI that gives the customer no audio confirmation that it got a command to record and send a message is a serious UI failure. This isn't a small thing here. If a bug like this makes it through, how can I trust the device?
I don't think the article is indicating what you say. The report says it's a bug, but it does bother me why Amazon won't comment on the specifics. This should be patched immediately with a full retrospective explaining what happened.
> There's no way a team would get the "ok" to build a skill that randomly records private conversations then sends them to a random contact.
Of course. But it is entirely plausible that somebody builds a skill that records certain conversations, and somebody builds a skill that sends recordings somewhere, and then due to some bugs, coincidences, or missing controls to prevent such an occurrence, the first skill is activated when it should not be, and the second one is activated when it should not be, and with the wrong parameters.
So there should be more checks. Like: ask for explicit permission before sending any voice recording out, or ensure that the target the recording is sent to is validated against a whitelist explicitly set by the user, etc.
I shouldn't have called this a "conspiracy story" and I should have given more respect to the parties involved who experienced this. While I think the issue is similar to a "butt dial", the customer felt violated and more precautions should have been taken to prevent this.
What I should have said is this report makes it difficult to understand what actually happened. It seems clear this article favors click-bait quotes that insinuate a "big brother" vibe such as:
> "'unplug your Alexa devices right now,' she said. 'You're being hacked.'"
If she had instead said "Alexa butt-dialed me!", would you still be interested in this report?
>There's no way a team would get the "ok" to build a skill that randomly records private conversations then sends them to a random contact.
If you took privacy seriously you would have added an audible prompt to ask for confirmation before sending or similar safeguard.
I mean if you are an engineer then you know that there is a nonzero false positive rate for the device to detect the "call" command.
Knowing this, and not implementing a trivial safeguard sure seems to me like the team was "ok" with building exactly that, a device that: "randomly records private conversations then sends them to a random contact."
> Yes, I might sound biased because I am an engineer at Amazon.
> This statement is my own and unrelated to Amazon's opinion.
Just because you admit your bias doesn't absolve you of it. Your statement and opinion could very well be influenced by your work surroundings at Amazon. While I do not doubt that Amazon takes security and privacy seriously, I find it hard to believe that they employ proper ethics (based on this and past examples such as the NYT incident).
Is your best guess at why this might impact someone's trust in Amazon products really that people would think it was an intentional design choice? (Seems kind of straw-man-y.) If not, then you should generally address the part that will matter to people -- not whether this was on purpose, but what issues caused it and how will they be fixed.
I have not experienced that; can you elaborate? They keep spamming me with emails about all their different services, showing me what I should buy, etc. No way of even turning these things off...
Actually, I can think of a very good use of the ability to trigger silent phone calls. In the case of home invasion or domestic violence. If Alexa detects what could be signs of distress it can dial an emergency number without alerting that it was triggered. Otherwise, trying to shout "Hey Alexa, dial nine one one" will alert or enrage the perpetrator.
The most shocking thing about this story is how a customer was able to get qualified engineering support for an unusual support request, from a large company.
Or... what's missing is the hours spent going through support tiers 1 through N, and somehow not getting dropped/lost along the way.
Back in the early 90s, I remember seeing a cartoon in one of the tech magazines, possibly BYTE or PC/Computing. It featured an elevator opening onto a floor of office cubicles.
The guy in the elevator screams "COMPUTER! FORMAT SEE COLON SLASH WHY ENTER" across the whole cube farm. The doors close, and the elevator is gone.
Caption was something along the lines of "Speaker-independent voice recognition might be a bit tricky."
I feel creeped out by these home listening devices and I don't own one, but don't our phones already have this capability? You can turn "Ok Google" on on an android phone. I sometimes record audio, and the mic is incredibly good. Is there a substantial difference between our phones and these devices?
EDIT: Just realized the substantial difference is that Google and Amazon own all of these things. They don't control all makes and models of phones.
I got an Echo Dot a month ago because there was a special offer on Amazon for the Philips Hue system that included one free. Last night I had a discussion with my girlfriend about how she needs a backup of something, but we don't have a DVD writer at home (she's not tech savvy and thinks people still back up stuff on DVDs). The next day, my Amazon daily offers were filled with portable DVD writers. It's starting to creep me out, and the first thing I want to do when I get home is unplug it. I know it could be a crazy coincidence, and I was never the kind of guy who believed Facebook is listening to you, but still: what if they actually just listen for some keywords?
It might just be a coincidence and a psychological bias that you noticed it, but given that it would have a very, very clear economic benefit to Amazon if it were true, we're stuck with these facts:
1.) They have an economic incentive to do so
2.) Only they know how their systems work
3.) Their network traffic is encrypted
4.) They face legal risks and user backlash if caught doing so
Given 2+3, you can't be entirely sure that they aren't doing it. If they deny it, your only recourse is to hope that their cost/benefit calculus considers 4.) to be more costly in terms of dollars.
"Okay Google" and "Hey Siri" run locally on the device, but all other transcription is done "in the cloud". Try setting your phone on airplane mode and going "Hey Siri, What time is it?"
It will recognize the Hey Siri, but give an error for everything else. Pretty sure "Okay Google" will behave the same way. Also if you turn on battery save on iOS it disables "Hey Siri".
This. It would be a tremendous battery drain if your phone had the microphone on at all times, with a constant connection to the server, sending all audio over it at all times, just so it can detect when somebody says "hey siri".
It's a custom low-power chip they added on the 6S and later to allow a local tight loop that only listens for that utterance using a hardware-assisted neural net, and only activates the rest of the software stack if it detected it with reasonable confidence.
Not sure about Android phones with "Ok Google" but on iOS, all of Siri's voice processing is on the device. As opposed to Amazon Alexa which sends the data to the cloud for processing.
I think we can assume there was a light here. It's almost happened to me before. Alexa disastrously misinterprets an ambient conversation as "start transcribing and send to one of my contacts". During this, the light does blink to indicate it's gobbling up your input. You just wouldn't know to look for it if you didn't even know you'd activated the device.
What does "recording" mean in this context? The way these devices work is they're always "recording" in a loop on a secondary processor looking for the wake word. If we take this request literally the LED would never be off.
Presumably, it would mean "when anything other than that secondary processor comes on." I.e., when it stops throwing away the buffer containing your speech at a hardware level, and starts instead feeding it through its local parsers and to the cloud in a way that could result in information from your speech being captured.
That would require, though, that it's not buffering the last N seconds of audio to reprocess once that processor wakes up. Do any/all of the modern smart-speaker devices do that? If so, then you'd have to take into account that when you see the light, you've potentially leaked any secrets you said in the last N seconds as well. Less like a reporter coming in and asking to speak to you; more like an eavesdropper coming in and telling you they heard what you were just saying through the door.
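For illustration, here is a toy model of that rolling pre-wake buffer (an assumption about the architecture, not Amazon's actual implementation). Note how frames captured before the wake word, i.e. before any light would turn on, get uploaded along with it:

```python
from collections import deque

# Toy model of a device that keeps a rolling pre-wake audio buffer.
# Frames arrive continuously; only the local wake-word detector sees
# them until it fires, at which point everything still in the buffer,
# including speech from *before* the indicator light turns on, is sent.
BUFFER_FRAMES = 50  # assumed capacity, e.g. a few seconds of audio

class WakeGate:
    def __init__(self, detector):
        self.detector = detector                # local wake-word check
        self.buffer = deque(maxlen=BUFFER_FRAMES)
        self.streaming = False                  # True => LED on, audio leaves device

    def on_frame(self, frame):
        """Return the list of frames that leave the device for this input frame."""
        uploaded = []
        if self.streaming:
            uploaded.append(frame)              # live audio goes to the cloud
        else:
            self.buffer.append(frame)           # otherwise frames only sit locally
            if self.detector(frame):
                self.streaming = True           # LED turns on here...
                uploaded.extend(self.buffer)    # ...but buffered pre-wake audio
                self.buffer.clear()             #    is uploaded along with it
        return uploaded
```

In this model, seeing the light tells you streaming has started, but not that the last few seconds before the light were discarded.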
I'm not entirely sure that this would be a bad thing. It would set the expectation in customers' minds that the device is always listening, which is at least somewhat true.
> expectation in customers' minds that the device is always listening
If, even with that expectation, enough people still use "voice assistant" technology for a judge to consider the tech to be "in general public use"[1], the bright-line test defined by Kyllo v. United States is triggered and the police no longer need a warrant to use the technology (in the abstract - they don't need to use your hardware) to view "details of a private home that would previously have been unknowable without physical intrusion"[2].
Normalizing the expectation that previously private areas (like the inside of your home) might be recorded and sent to a remote 3rd party will eventually result in everyone losing some of their 4th amendment protection against search and seizure.
No, it would just become a meaningless decorative light if it's on all the time and customers would pay no attention at all.
It's like the misguided Prop 65 signs that are in practically every commercial building in California so instead of warning people about dangerous levels of hazardous chemicals, they get a meaningless sign in every building for hazards that are no worse than if they were walking down the street or sitting at home.
As nerds, it makes sense to us to do local wake recognition and only upload a request phrase after that. But in reality, this is not the first time[1] we've learned that continuous speech was getting uploaded.
Charitably, it could be a mistake, maybe the previous request never terminated and it just kept streaming. But again, not the first time.
More likely, in my opinion, is that there's a mountain of cash on the table in the form of private conversation, and AMZ would be foolish to walk away from it. After you talk about "hardwood floors" in your house, you could expect targeted ads, consumer profiling, variable pricing, and third-party sales. Why wouldn't they do this? Why else are they pushing home devices so hard? They're front and center on the AMZ front page every time you visit, aren't they? Oh, and GOOG too; they just haven't been caught yet.
I wonder how you would even do that.
It's not a tape recorder, where someone has to physically press a button, there are no moving parts, thus it must be software controlled.
There are usually 2 stages, a local processor that can only detect the wake word, and then once that hears the wakeword it begins streaming data to the internet.
It doesn't need to be based on the microphone, it can be based on that second processor. It can be based on network activity (my USB Wifi adapter blinks whenever data is sent/received, even though it's connected 24/7, I don't see why anything else couldn't do that).
We're also discussing theoretical hardware changes, there's no reason it can't have 2 sets of microphones where one is hardwired only to the wake-word processor which has no direct connection to the main processor except some one-way signalling, and the LED is hardwired to the second set.
Your WiFi LED is probably blinking pretty constantly, which would freak people out if they were told that blinking means active listening (as opposed to checking for updates or any of dozens of other things that might use the network). It's probably also software controlled anyway.
The idea that there would somehow be a dedicated wake word microphone is a little ridiculous. Firstly, no one would trust this supposed 1-way connection. Second, it would require a dedicated processor to make the wake word even work, driving up costs. Third, the echo uses an array of microphones so your wake word would either be unreliable or drive costs up further as you duplicate the entire array. Hardly a net win.
The reality is that if you don’t trust amazon to do the right thing you shouldn’t install their listening device in your home. (Likewise for your phone.)
WiFi is always on. WiFi doesn’t disconnect just because you aren’t actively moving data. You’d be pretty annoyed if every time you wanted to send or receive data your device had to reconnect.
> where someone has to physically press a button, there are no moving parts, thus it must be software controlled.
If the microphone uses an amplifier, you could wire an LED to light when power is supplied to the amp. The indicator light is physically part of the circuit, so its operation cannot be modified by software. There are probably other, better, ways to do it, I'm not an electronics guy.
> Due to watchword detection, the microphone amp would always be on.
I don't think that would be a bad thing; especially if there's a switch to disable the mic. When you turn it off, you'll get reliable feedback to know it's actually off.
I think it's important that these kinds of devices have simple feedback and control mechanisms that can be independently verified and reasoned about. Software is too opaque and too untrustworthy.
I don't disagree, but the light ring is also important to know that it actually heard your command. Having it on all the time would make the UX much worse. Unless it was a separate light.
> I don't disagree, but the light ring is also important to know that it actually heard your command. Having it on all the time would make the UX much worse. Unless it was a separate light.
I see that, I think it would be best as a separate light.
I think product UX has drifted too far towards blank monoliths; I know I wouldn't mind a few more blinkenlights :)
Just one example: separate memory for storing said data, which when accessed in write mode (write-enable low-voltage signal for writing to DRAM) activates the LED as well. Or, again if DRAM, monitor the DQ pins for voltage high. I'm sure there are a dozen ways to do it in hardware like that. Maybe someone with more knowledge can chime in.
Doing it via microphone/amp wouldn't do it, since you'd still want to use it without LED on (and not software controlled).
I think it's possible, though I am not an electrical engineer. You can have one LED on if the microphone has power (so you can't be tricked into thinking it's off), and another LED for the networking device. But considering that the microphone is always on, it won't be as useful.
True but the microphone is always on (otherwise how would the device be able to listen for voice commands). Thus what you're describing would effectively just be a power LED rather than a recording LED.
We have laws for all sorts of things to protect users from their own ignorance. You can't expect every consumer to be 100% aware of the dangers of things they don't fully understand.
Great way to just have companies add an annoying, always-on LED that wastes energy to avoid lawsuits. How about we let the market regulate, as it is doing currently? Did Amazon respond faster than the government has on surveillance? Yes. Should companies that don't abuse their power or neglect their software be punished for all eternity because of Amazon's actions? I think not.
Do you also get mad about the LED in your TV, router, and monitor that shows the device is on or in standby? It would be a small one, and you could have a hardware slider for brightness; you could also tape it off.
Yes. If you want that, you can only buy from a company that does it. If enough people agree, they'll all do it or fail. I don't abide by the "it's such a good idea, everyone must be forced to do it" mentality.
> Yes. If you want that, you can only buy from a company that does it. If enough people agree, they'll all do it or fail. I don't abide by the "it's such a good idea, everyone must be forced to do it" mentality.
We don't live in a world with a healthy enough market or enough competition for that to work.
The device is not always visible when you are using it. Amazon Echo has such an LED (at least the one I am familiar with). But it will misinterpret conversations happening in other rooms now and then. It does get confirmations for most things, so the normal scenario for us is this:
Talking loudly in another room.
Alexa: "Do you want to send this message to Jim?"
"Alexa, you suck"
You mean like the very visible colored LED ring that already exists on the devices? Though I suspect that most people keep their devices tucked away in a corner where they are hard to see -- but that's not Amazon's fault -- few people would want the devices if they had a bright flashing strobe light visible anywhere in the room.
A confirmation after every request makes the device a lot less useful, I suspect most people do not want the device to confirm commands before it does them.
The trouble is that the LED would be lit 24/7 and become exactly as meaningful as the absence of a lit LED. These devices can't function without constantly recording to listen for trigger words or phrases.
Every day I become more convinced that we've really let tech get a little too out of control, and could probably benefit from putting the brakes on a bit to get some very critical stuff under control first (e.g. security/privacy).
Don't get me wrong, technology is amazing and the cutting edge stuff going on today is super exciting... But it seems the capabilities of software now are far, far outpacing our abilities to ensure an adequate level of security and personal privacy. We've been blinded by flashy tech, too busy being amazed at everything all the time ("look at what AI can do!") and not stopping to consider the huge hidden price we're paying.
The reasons this has happened are kind of obvious...
I just fear that it's going to take things getting to a point where we have a really major catastrophe on our hands for anyone to be willing to really do anything about it. A lot of people can and will get seriously hurt. I can imagine such an event resulting in a great "cooling" period in tech where advances are slowed while bigger investments are made into security, cryptography, solving the "identity problem", etc. But it would be nice if we could just have the foresight to fix things before it gets to that point...
Sure, but isn't that still terrifying? It brings me back to something I wonder about with these voice activated devices: you have no idea what it can do. Voice UIs are so utterly opaque, and Amazon(/Google/whoever) pushes new software updates without informing you. So it'll add new commands all the time, and you won't know about it until you accidentally trigger it, making it do... whatever.
But either way, they should surely be confirming this before send, the same way Siri does when I ask it to send a text.
I haven't used a smartphone in years. Looking back on the days when I did use a smartphone, there was always this tiny fear in the back of my head that maybe I'd pocket-dialed someone and they'd be able to hear my conversation. It actually had a subtle chilling effect.
The Hawthorne effect (which discusses how people react to being observed) seems to be especially relevant nowadays as there's a real possibility that we're being recorded at any time. https://en.wikipedia.org/wiki/Hawthorne_effect
It's actually a big theme in 1984 where the mere possibility that you were being listened to via telescreen was enough to keep you in line and influence your behavior.
I could be mistaken but I believe there are confirmations required?
Could it be that the volume was set very low and they inadvertently triggered their way through the entire dialog? I've accidentally set alarms from a few rooms away but this is a whole other level.
POST requests should require secondary verification from the user. GET requests are probably fine to execute.
Some POST requests may only need to let you know that they did it, such as adding a calendar reminder on to YOUR calendar. It should ask if it wants to send a calendar invite to someone else's calendar.
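A sketch of that tiering as an intent dispatcher. The intent names, the three tiers, and the `execute`/`speak`/`ask_yes_no` hooks are all illustrative assumptions, not any real skill framework.

```python
# Side-effecting ("POST"-like) intents either announce what they did or
# require explicit confirmation; read-only ("GET"-like) intents just run.
CONFIRM   = "confirm"    # must ask before acting
ANNOUNCE  = "announce"   # act, then tell the user
IMMEDIATE = "immediate"  # just act

# Illustrative intent-to-tier mapping.
INTENT_TIERS = {
    "get_weather":        IMMEDIATE,
    "add_own_reminder":   ANNOUNCE,   # your own calendar: notify only
    "send_message":       CONFIRM,    # affects someone else: ask first
    "invite_to_calendar": CONFIRM,
}

def dispatch(intent, execute, speak, ask_yes_no):
    """Run an intent through its tier; unknown intents get the safest tier."""
    tier = INTENT_TIERS.get(intent, CONFIRM)
    if tier == CONFIRM:
        if not ask_yes_no(f"Do you want me to {intent.replace('_', ' ')}?"):
            return "aborted"
        execute(intent)
        return "done"
    execute(intent)
    if tier == ANNOUNCE:
        speak(f"Done: {intent.replace('_', ' ')}.")
    return "done"
```

Defaulting unknown intents to the confirmation tier is the key design choice: a misrecognized or newly shipped intent fails safe instead of acting silently.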
The article makes it sound like the engineers didn't know what was going on, so it could be an actual bug, unless support was just trying to placate her.
I imagine there are a number of normal conversations that can trigger unexpected behaviors from these smart devices, especially when you consider the thousands of different regional accents, along with the thousands of ways of speaking the same words with various speeds, pitches, and enunciations. The article mentioned they were talking about wood floors:
...Oak in the annex and for the record...
recording begins
...it's just the annex, stop fretting...
recording stops
just send [flooring person] a quick message with samples you like.
sends recording to someone with same name as flooring person
It sounds like the problem is that it was overly confident it understood the user's intentions, and did all of this silently.
Tangential point, but you would think that Amazon would have immediately given this woman the refund she asked for, rather than instead offering to take her through a technically complex de-provisioning process. It just makes them look even worse during the inevitable bad publicity that they had to know they were going to get from this.
Google let me listen to my own queries following, "okay, google." Mostly they were reasonable, but sometimes it heard "okay google" in what sounded to me like regular driving noise. It's a good reminder that even though we've trained computers to recognize sounds and images, we didn't train them to listen & see the way that we do. The mistakes that these algorithms make need not be anything like the mistakes that a person would make.
Echo woke up due to a word in background conversation sounding like “Alexa.” Then, the subsequent conversation was heard as a “send message” request. At which point, Alexa said out loud “To whom?” At which point, the background conversation was interpreted as a name in the customer's contact list. Alexa then asked out loud, “[contact name], right?” Alexa then interpreted background conversation as “right”. As unlikely as this string of events is, we are evaluating options to make this case even less likely.
Actually it's remarkable how bad the voice recognition is, and how Amazon can knowingly ship this crap to people, putting it in their homes, recording their private interactions. Y'all recorded people having sex yet?
It sounds like the machine learning models are tuned for specific phrases and have terrible, terrible false positive rates. "Evaluating options to make this less likely"? Now I don't even know what mental model you guys are using (if you are from Amazon). This shouldn't be a matter of "likelihood". But OK, fine, let's use math. AFAICT there is about a 0% chance that a human would make such a string of errors interpreting human speech, but let's call it 1 in a trillion. If you guys aren't doing better than about 1 in a trillion for this string of at least 4 interactions, then one of those terms is stupidly, stupidly high.[1] It actually interpreted random conversation as the name of someone on a contact list? Horrible.
trust--
[1] Not even to mention, a human has context, understands boundaries and preferences, and has an ML voice recognition model developed and tuned over decades of interacting with real people. A human would also be smart enough to understand the other human's situation, context, and state of mind, and realize that even just the cadence of the conversation not changing in response to queries was indicative of the humans not acknowledging the query. Machines are f'in stupid.
1 in a million happens about 4000 times a second on a single CPU running at 4GHz...
Did you really want to talk concrete numbers? Because if so, I am wondering about the probability that Amazon's voice recognition mistakes random conversation as a valid entry in someone's contact list, as well as the other terms in this equation.
I don't know if you work for Amazon, and if so I don't want to single you out specifically, but this is a pretty bad screwup, and it does not inspire confidence. Please don't brush it away with "oh it's a 1-in-a-million edge case". That attitude is even worse, and if it is indeed that attitude inside of Amazon, then I am even more strongly against this and I hope that further, deeper scrutiny is applied here, because this screwup is actually illegal.
I do not work at Amazon. I side with them on this because look at this protocol:
1. Wake up with "Alexa."
2. Respond to "send message."
3. Respond to "To whom?"
4. Respond to "[contact name], right?"
As an engineer (well, an AI researcher who used to engineer), that looks to me like they were not negligent and it was hard to predict background conversation would produce this unlikely set of inputs - it would be nice to see stats but this is the first time it is covered in the media to my knowledge. And as with the laughter story, they will now change the inputs to make it likely 1-in-a-trillion this will happen.
However, it does seem like Alexa etc. will have to be better about recognizing audio from TVs/conversations and stuff directed at it - and I am sure they are working on it.
PS comparison to the CPU is not great, obvs, since it's about the number of instances (how many times "Alexa" gets woken up by background audio - not 4GHz)
They would like to put this in a billion homes no doubt, and suppose that people are home, talking, a couple hours a day. Now we are talking real numbers. A few billion conversation-hours per day, 365 days in a year--suddenly one in a trillion is starting to look like it's gonna happen a couple times a year. Now if Amazon knows these probabilities--which they don't, because they clearly have not done due diligence in understanding their rapidly evolving, inscrutable voice models--they are now knowingly violating eavesdropping laws, probabilistically.
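The back-of-envelope math in that comment can be written out directly; all the inputs here (a billion homes, two conversation-hours a day, a one-in-a-trillion failure rate) are the commenter's hypothetical assumptions, not measured figures:

```python
# Hypothetical scale-up: how often does a "one in a trillion" event fire
# when exposed to billions of conversation-hours? All inputs are assumed.
homes = 1_000_000_000   # assumed device count
hours_per_day = 2       # assumed conversation-hours per home per day
p_failure = 1e-12       # assumed failure probability per conversation-hour

exposures_per_year = homes * hours_per_day * 365
expected_failures_per_year = exposures_per_year * p_failure
print(f"{exposures_per_year:.2e} conversation-hours per year")
print(f"~{expected_failures_per_year:.2f} expected incidents per year")
```

At these assumed numbers the "one in a trillion" event fires roughly once a year across the fleet, which is the point: fleet-scale exposure eats small probabilities.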
This is part of the problem. People want to handwave away small probabilities when they should be busting their asses to make probabilities actually 0--solutions like not having this crap in their house at all.
I am seeing comments saying Alexa misinterpreted some voice commands and activated a "call" skill. This sounds like a good excuse, but it does not absolve Amazon of responsibility. If their voice and command recognition is broken they should add second-step verifications like "Are you sure you want to call X?" or "Okay, I am calling 'X' in 10s" or something similar. It's crazy to imagine Amazon is shipping this device with the capability of video calling people and buying stuff online without fully testing and thinking through all the scenarios.
If there is even a slightest possibility that Alexa/Siri/Google/Cortana is going to misinterpret commands (with privacy implications) then they should do two step verification of some sort.
"Always listening" is a fundamentally unsafe design.
Once recordings of private conversations leave the local environment and make it to the cloud, eventually they will leak. It's akin to data collection by law enforcement: once the data exists, eventually it will be abused.
Yes. Assume that all data uploaded to the cloud will eventually be compromised. That is the only safe assumption.
Not even the most responsible companies (e.g. Google) can hold out 100% of the time in the face of determined assault by government. Some of that data is going to leak to three-letter agencies, or similar.
Somewhat less responsible companies (e.g. Amazon) will leak data more often, to a wider range of threat agents.
So then we have to consider how valuable this data is. Random sampling of private conversations within the home? Sometimes innocuous -- but if the wrong moment gets leaked, the consequences are potentially life-shattering.
That seems like an extreme perspective. Kind of like the tech version of abstinence-only sex ed. Sure, it's the only 100% safe way, but it's not useful for most people. It doesn't weigh the benefits against the potential cons, or even take into consideration the actual likelihood of data being leaked.
The alternative perspective you're presenting sounds to me utterly cavalier about the prospect of ruining people's lives. It's like Equifax's attitude towards identity theft: it doesn't affect their profitability, so why care?
It's because such blithe dismissal of the damage caused by data gathering is so prevalent in the industry that the likelihood of devastating compromise is so high and the costs borne by the populace are spiraling upwards.
Some data should never be collected. Some data should never even be uploaded.
The problem I have is that you are conflating potential damage and actual damage as the same thing, which is not how you accurately measure risk.
I am honestly confused as to how you interpreted my last comment as "utterly cavalier about the prospect of ruining people's lives", when all I said was that your assumption doesn't take into account the actual probability of data being leaked and it doesn't weigh any of the benefits of data collection against that risk.
Small-likelihood times many-chances times grave-consequences equals a finite but significant number of lives wrecked. A gamble you deem acceptable.
I can only hope that karma visits those who arrogate to themselves the decision to sacrifice a few of their fellow human beings: may they and their loved ones become the sacrifices.
These devices are all driven by machine learning voice recognition. I have no idea how I am supposed to derive trust in the software running on them when even the engineers themselves are at arm's length from their machine models, and the machine models are tuned on terabytes of input data and trained with reinforcement learning. I don't want to disparage engineers working on these products, but hell, we don't even know how big software systems really work anymore, and we've got neural networks thrown in the mix. We're all just plugging shit together until it limps these days.
Well, when you consider that Amazon's devices once interpreted random noises as a "<trigger>, laugh" command (as per their explanation), that a voice command to record and send a message could be created from random conversation is not in the least bit surprising.
That's the "Facial recognition fooled by funny-colored glasses" study from Carnegie Mellon, where researchers were able to make machine learning algorithms fail disastrously (e.g. mistake a man for Milla Jovovich) with a pair of glasses printed with what looks like a random assortment of colorful pixels, but is in fact a targeted attack specifically designed to trick the algorithm.
This Alexa failure is obviously not a targeted attack; but when your system is exposed to enough data, eventually you will stumble over some input that happens to resemble a targeted attack by pure chance, right? It's equivalent to saying that if you aimed the facial recognition algos in that paper at millions of faces wearing randomly-colored glasses, eventually some of the glasses would be close enough to the targeted-attack glasses to produce the same effect.
Obviously I'm theorizing on almost no data here - I don't know anything about Alexa's voice recognition, maybe it contains no ML at all. But it seems plausible that this might be what happened here - not a bug per se, but the natural and totally expected result of giving an opaque, machine-generated system with a very low failure rate so much input data that the failure rate is significant.
I recently stayed at a house which had an alexa device. In a conversation where I said the words light switch several times Alexa beeped and responded to me each time. I think most of us just believe Amazon when they say "Alexa responds to its name" and don't stop to consider the possible failure modes. We want to believe it can understand our words, when it's really just guessing. Over long enough time it's inevitable that it will misunderstand you and do something you don't want.
Still, I wonder how it heard "record this conversation and send it to someone on my contact list".
I recall a few weeks ago how some researchers figured out that the voice-commands could be triggered with subliminal messages, too. i.e., you wouldn't have to go 'Ok Google', but rather just play a fingerprint of that sound that makes the algorithm think you said that, without it being audible or understandable to human beings. And you could hide it in other audio like music.
That feels pretty scary particularly because these devices have such a high mandate. Right now you can shop for things online by voice command. Record and send messages. What's next, sending money, sending a data dump of sensitive data like emails, passwords, contact lists?
We've already heard of some reports where you'd have a smart home device listen to a commercial on television where they were demo'ing a purchase of some product via such a device, and interpret it as a command from the owner. Amazon took steps to avoid that with their superbowl commercial, but it seems to have done so by changing the commercial, not the product itself.
Just another attack vector to worry about. I'd happily buy these home speakers if I could limit them to just downloading information, and only uploading limited pieces of information, e.g. a music playlist. I'd want to be able to shut off any of the commercial/financial or social capabilities. I just don't care about them and they're risky.
Voice is not for personal computing. It is best for social computing. This is a completely new metaphor for interactivity, and problems are to be expected.
This incident speaks more to Amazon's speech recognition than to a weakness of voice. I'm not an expert in the topic, but it certainly seems like Alexa tries to hear patterns rather than translating speech to text and then text to commands. Google Home's recognition is much better.
Question - how can these potentially nefarious acts be mitigated when I plug a device into my home network which listens/watches "periodically"?
I guess naively I'm thinking some sort of light/display which says "Amazon is sending/retrieving data" - it would need to be from a third party to have integrity, and I suppose in order to actually allay the fears of the masses it would need to be plug-and-playable.
This is why I unplug my Echo whenever I’m not using it.
Side note: why is it that after years and years of privacy breaches, there has been almost no support to use hardware switches for cameras and microphones?
I would feel 100000% safer using my internet-connected cameras and microphones knowing that I can turn off those devices independently of their host devices and that no hacker can monitor my cameras and microphones, even if they’ve rootkitted every device I own.
"He told us that the device just guessed what we were saying," she said. Danielle said the device did not audibly advise her it was preparing to send the recording, something it’s programmed to do.
Is it possible there's an exploit out in the wild for these devices that allows an attacker to control the device remotely.
Is it possible an attacker told the device not to make an audible announcement?
Yes, but it's improbable. You can communicate with Alexa at a frequency inaudible to humans and instruct it to set its volume to 1/10 (0/10?), but of course you need to be reasonably local and targeted.
Is it possible that there's a remote exploit? Yeah of course but if it's in the wild enough that this particular individual was struck by a seemingly random (and valueless) attack we'd be seeing a lot more of these in the wild than we are.
I feel that while these devices are great in principle, there still needs to be work done on whether "trigger word" activation is the best way to enable them. Maybe we need longer / more complex trigger words? Maybe there's a different way we can activate them altogether, avoiding the "always listening" problem?
Really, folks... we can't even get the net to work the way we want, but then we heap these enormously sophisticated, delicate tasks onto relatively-untested, non-deterministic warez, and pray all goes well.
Much like running into a human on a bike, but less deadly. (Unless there's a psycho in my contact list.)
This makes private-by-design alternatives like Snips (https://snips.ai/) even more legit. When everything is processed locally, the user does not have to trust anyone/anything.
Same thing with my Android phone. When I am in an area without internet access, but with (at least) phone-call signal, the dumb-ass phone defaults to using an internal low-quality OK-Google speech recognition. So instead of looking up hours for a nearby restaurant, I look at my phone and it is suddenly calling someone in my contact list (that I haven't really talked to in years)... and of course, since the speech recognition blasted the CPU's horsepower, nothing on the screen is responsive and my thumbs are constantly slamming the screen with no feedback - the call goes through for about a second of ringing before my phone decides to hang up fully.
It's like saying that someone buys a car that's powered by an explosive liquid, and then complains when all the gas in the tank explodes at the same time.
I feel like you're being purposefully reductive here. You know they're different right? The entire value proposition (and marketing strategy) of Alexa and related devices is that they are voice activated, so you don't have to physically interact with the device to make a request.
Given that, it would be ridiculous to expect paying customers to switch it off when not in use. The onus on Amazon is to protect their customers' privacy while the device is being used as intended.
If you really want to pick apart the car analogy, maybe this is closer to having automatic start and driving capability on a car, but the car sometimes randomly starts and drives away. Are you really going to blame the customer for that, even though they could disable the feature every time they exit the car?
The car analogies are usually quite poor. However, it depends on your use case. Echo Dots have a mute button which prevents monitoring, or at least it seems so. I'm not an Echo power user, so my needs are definitely different than for intended uses like home automation.
I think some users, at least users like me, would like a feature similar to "su". I'd like to have to confirm that Alexa should start doing stuff if it has not been confirmed for x number of minutes. I think the real problem with these home assistant devices is that they are not designed with a confirmation message. This would be annoying in some cases, but surely a balance can be struck.
I'm not trying to be reductive at all about the car analogy. When you turn on a car, you acknowledge that the engine is running. You can also mute an echo if you have the foreknowledge that you will not be doing activities that require voice activation.
I don't know about her but I would be upset not because Amazon recorded private conversation but because they sent it to some random contact on my contact list. I don't want to bother my contacts with my inane ramblings in the privacy of my room.
I would think: what if that was not the only time this happened? What if something else was sent to someone else but that person did nothing about it? What if the data was sent to some server and logged? What if everything it hears is transcribed to text and stored in some dev box logs right now?
So when will such devices just be made illegal entirely? It doesn't seem they can be made to be safe for average people if stuff like this keeps happening and companies are unwilling to compensate people.
What if this was a lawyer talking about clients or a doctor talking about patients? Is Amazon (or Google, or Microsoft) willing to deal with such legal liabilities? Is this what move fast and break things looks like?
Edit: To those trying to downvote, remember, Silicon Valley lives in a bubble, and for the safety of people inside and outside of the bubble, the bubble must be poked every so often. This is one of those times.
You ever butt dial someone? Should all cell phones be outlawed?
I'd say the Echo needs to have a more difficult activation routine available. Maybe have the option of setting the wake word to "Alexa, can you please" or something similarly long and unlikely to appear in normal conversation.
Also, I wonder if they had their wake word set to "Computer". I did that once, and quickly reverted after so many false activations.
I think this tech is just in its infancy; calling it inherently dangerous or saying it should be banned is silly. I think it will be done properly some time in the future; it's most likely going to be similar to the Windows vs. Linux vs. MacOS battle.
This is something I don’t believe we’ve really figured out in terms of QAing stochastic processes. Doing QA on normal code is binary...did it work or not? Doing a QA on a model is a lot more difficult. You need to think about Type I and Type II error. If I were a product manager thinking about wake words, I’d want my Type I error to be as small as possible...to the point where I’d accept a pretty high Type II error.
Lately I’ve been seeing Alexa wake on accident more often. It’s like someone isn’t thinking critically about what kind of Type I error they’re willing to accept.
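The tradeoff described above can be sketched with a toy model (the score distributions are invented for illustration; a real wake-word detector would be tuned on labelled audio): raising the wake threshold drives down false wakes (Type I error) at the cost of more missed wakes (Type II error).

```python
# Toy sketch of wake-word threshold tuning. Background audio and genuine
# wake words are modeled as two made-up Gaussian score distributions.
import random

random.seed(0)
background = [random.gauss(0.2, 0.1) for _ in range(100_000)]  # non-wake audio
wake_words = [random.gauss(0.7, 0.1) for _ in range(100_000)]  # real "Alexa"

for threshold in (0.4, 0.5, 0.6):
    false_wakes = sum(s >= threshold for s in background) / len(background)  # Type I
    missed = sum(s < threshold for s in wake_words) / len(wake_words)        # Type II
    print(f"threshold={threshold}: Type I={false_wakes:.4f}, Type II={missed:.4f}")
```

The product decision is where to sit on that curve; the comment argues a wake word should sit far toward "almost never fires falsely", even if that means users occasionally have to repeat themselves.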
"My husband and I would joke and say I'd bet these devices are listening to what we're saying,"
Think about this statement for a sec. Of course it's listening to you. It has to listen to you because it needs to be able to respond to "Alexa". So yes, it is always listening.
Now, once you come to terms with that, do you feel comfortable having a technology with access to the internet, location, habits, account, contacts, email, phone, CC information etc etc, to be actively listening to your most intimate private conversations?
It wouldn't take a 5-year-old two seconds to figure this out.
The problem with this logic is that it applies to phones, tablets, and laptops as well.
There isn't much additional risk from having an Echo or a Google Home if you already keep a smartphone within 10 feet of you at all times, which most of the people who are buying these devices do.
People make the same jokes about cell phones and location tracking. But it seems like with these invasions of privacy they're always unable to make the leap from joking about it to the reality itself.
My Echo somehow woke up when we were in another room, then heard itself talking to itself, and called my friend in the middle of the night. You can see the log of what it said and how it heard the last word as the command:
https://www.facebook.com/mike.deeks/posts/10215464075417775
It was both hilarious and infuriating. I immediately turned the calling feature off (you have to contact support btw) and later we switched to Google Home.
I don't see where Amazon would be at fault here. Remember back in the old days of T9 keyboard phones, that one time you forgot to lock your keyboard when you put the phone in your pocket and it called a random person from your contact list? You didn't immediately call the press to say it was Nokia's, Ericsson's, or Siemens' (or whoever your phone manufacturer was) fault that you left your phone unlocked and it called a random person because the hardware keys were pressed inside your pocket.
I was chatting with my wife about something the other day and my android phone, from my pocket, randomly added to the conversation "That's good to know". O_O
Wouldn't that be a crime to record someone without consent and share that to a 3rd party? I would have called the police or is Amazon too big for the law to grasp?
That's an interesting point. In California and some other states, having a device actively recording any conversation without consent from everyone involved is illegal, much less sharing it.
My experience is totally anecdotal and I'm not even an Alexa owner, but I had a Skype conversation with a client who had one of them set up in his office and it was constantly triggering, even when the word wasn't used.
So if it thought it heard "Alexa, send this message to John: {conversation of ten minutes}", it just did what it thought it was supposed to do. But it's weird they didn't hear any audible confirmation or anything.
It seems like an easy safeguard to implement to prevent sending unwanted emails/texts would be for Alexa to generate a unique confirmation phrase that the user has to repeat back in order to actually execute the send command.
Something like
Alexa: "Say 'purple people eater' to send email."
Real Person: "Purple people eater."
Alexa: "Email sent."
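That safeguard can be sketched in a few lines; the phrase list and the `hear_reply` callback (standing in for whatever speech the device recognizes next) are hypothetical, and nothing here reflects Alexa's actual API:

```python
import random

# Hypothetical confirmation phrases; a real system would generate these.
PHRASES = ["purple people eater", "dancing banana", "quiet volcano"]

def confirm_and_send(hear_reply):
    """Ask the user to repeat a random phrase before executing the send."""
    phrase = random.choice(PHRASES)
    print(f"Say '{phrase}' to send email.")
    reply = hear_reply()  # whatever speech the device recognizes next
    if reply.strip().lower() == phrase:
        return "Email sent."
    return "Cancelled: confirmation phrase did not match."
```

The nice property is that a mismatch (including random background chatter) cancels the send instead of silently executing it.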
> But Danielle is hoping Amazon gives her a refund for her devices, which she said their representatives have been unwilling to do. She says she’s curious to find out if anyone else has experienced the same issue.
If that's true - that dumb greedy move is going to cost AMZN way more than simply giving customer refund.
I've said it before, but this solidifies it for me - I'm never installing an internet-enabled hot mic in my home. I'd rather spend the 2 seconds finding the song on my phone.
This just seems so unlikely. How was it sent? Was an MP3 file generated and attached to an email? Was it a voice call? I don't really understand the circumstances under which this could have happened.
I am trying to think what kind of architecture or code design could have led to such a defect, if indeed it's a defect. My guess is this is a feature that got triggered by an easter egg.
I have 3 google home devices and they work very well. It also doesn't feel like a bigger privacy threat than my laptop or my phone. Also, we all want JARVIS!
It's a little more than just that. Every light in my house is controlled with Alexa, which is very nice and hard to explain until you have tried it. My home theater system is all controllable via Alexa as well. This kind of thinking makes no sense to me.
> People are voluntarily paying for their houses to be tapped for the convenience of being able to shout: play me some song.
> I’m so out of touch with this world it’s scary.
I could just as easily say "People are voluntarily paying for their locations to be tracked for the convenience of being able to get live maps on the go" (ie. cell phones, which also are listening to everything you say unless you disable it)
> It's a little more than just that. Every light in my house is controlled with Alexa which is a very nice and hard to explain until you have tried it. Also my home theater system is all controllable via Alexa as well. This kind of thinking makes no sense to me.
There's no reason why your house needs to be "tapped" for either of those use cases. All those things could be accomplished without leaving your home network.
There is a massive difference between walking a few steps to turn on a light switch (which is only useful when you are in close proximity) vs. having accurate real-time mapping functionality when on the go.
Similarly, there is a massive difference in the privacy implications of listening to every conversation everyone in your home (including guests) is having vs. having your current location known. Wiretapping laws exist for a reason.
Turning off all the light switches that have been left on in a big house though is a meaningful convenience. I think the privacy point is strong enough without having to trash people's use cases.
Not trashing it at all, just contrasting it. I do see the convenience (luxury) of home automation, but when you compare that against accurate mobile mapping, location finding, and navigation, the difference in usefulness is orders of magnitude. One can literally save your life.
I never thought that voice control would be that useful until I had my daughter... being able to turn on the white noise, turn off the tv, play her music, etc while having my hands full of toddler is very useful.
Are you arguing that we should just stop creating anything new, since we clearly survived before without it? This comment could literally apply to ANY new thing.
No I'm not; I work on home automation for a living. I am saying the value and added convenience is largely perception. I raised my first kid pre-Echo and my second post-Echo. I think for the most part, raising a kid is easier than we think, and we are surprised at how well we are doing and attribute that to the things around us, much like our old superstitious ancestors.
Yeah, that is kinda my guess, but it doesn't seem to be predictable... sometimes it happens within a few minutes, other times it takes an hour or more.
Heck it helps me sleep -- I'm a light sleeper. I started using the Noisli app every night this past year and my sleep quality improved in noticeable ways.
I also live in a city, so white noise (actually brown noise) masks out tiny noises that interrupt sleep.
Mankind has made countless decisions to trade some degree of personal freedom for convenience. I can live without mobile internet - it's just so convenient to have it. Voice control for home appliances is exactly the same.
If I'm home and don't have my phone (or have my hands full, or it's dark, or whatever) I can easily turn lights on or off and without fumbling around for switches or trying to juggle a cat or a child.
I can also control my tv, thermostat, get a news briefing, check my calendar and set reminders.
Yes, it's ultimately a convenience, but so is indoor plumbing and store bought bread.
>Yes, it's ultimately a convenience, but so is indoor plumbing and store bought bread.
I’m sorry, but clean water and food are not conveniences. They are among the most basic of human needs. That’s the kind of thing that scares me: that someone would make such a comparison with a straight face.
This is the kind of argument that seems really common and I find very annoying. So 'indoor plumbing' is obviously used to provide us with clean water, but the convenience aspect is the indoor part - you can have clean water from an outdoor communal well, instead. Similarly 'store bought bread' is food, but so is home made bread. I just cannot work out if the commenter was being deliberately obtuse or really misunderstood?
> you can have clean water from an outdoor communal well, instead
Have you ever considered that this is not the case in most of the world? Resources are far from evenly distributed.
It is not by chance that the availability of indoor plumbing correlates to increase in life expectancy. It's nice that I can just open the tap and clean water comes out, but it's also a lot more sanitary than people carrying and storing buckets around.
>I just cannot work out if the commenter was being deliberately obtuse or really misunderstood?
Same here. Comparing the savings of a few steps to a light switch with clean water and cheap food is intellectual dishonesty at best and severe ignorance at worst.
The whole point is that they were not comparing anything to clean water or cheap food. They were comparing something to water that doesn't need to be moved inside manually and bread that doesn't have to be made at home.
Neither of which is needed to live. Both of which are conveniences.
It's not though, please reread the comment above. Water brought and stored in buckets from an outdoor communal well would not be nearly as clean or as plentiful.
There's a reason we have indoor plumbing and convenience is just one of them. And I can't believe I have to explain this here.
You seem to be intentionally missing the point. Change my earlier examples to remote controls for your TV if you prefer.
Convenience does, absolutely, have strong value and can be meaningful improvement to one's life. Generalized voice control is a major and meaningful convenience - even if it's currently early and might have some problems - because it drastically changes how we can interact with our environments.
Sure, we can short sell the case as "not needing to walk to a light switch", but even then if it's dark, my arms are full, and the path to the switch is littered with children's toys then that's a meaningful improvement to be able to say "Hey google, turn on the living room light".
Consider cases of people with severe MS or who otherwise can't easily walk. Sure, they could hook up a remote control and keep it near them, or get one of those stupid clapper things, but even then that only works for the lights.
A generalized voice assistant gives me control over my house, from anywhere my voice can be heard, and not only executes my actions, but can give me realtime feedback and data. Best of all, this requires no special skills on my part, just a willingness to talk to the damned thing loudly and clearly with simple words.
Can it act as a wire tap? Of course it can. My store bought bread can also be contaminated (right now in my area there is a huge contamination issue with eggs and lettuce), my indoor plumbing can burst and flood my house, so on. It's a new technology, it has kinks and flaws, and we're currently in the early adopter curve of it moving towards the initial disappointment trend.
> Change my earlier examples to remote controls for your TV if you prefer.
My disagreement with you is precisely that you consider them interchangeable.
To me, putting those two on the same level is actually offensive.
If data on the increased life expectancy isn’t persuasive enough, I suggest you speak with someone with no access to basic sanitation. Ask them what they think of this comparison.
Because the computational requirements of good speech rec exceed what you would build into a $100 device, and because doing it in the cloud makes training and improving it easier. The “wake word” is local, which is why it is constrained to a few choices.
I suspect we are only a few years away from being able to do a slightly inferior version using local processing, or a private cloud (already possible today when there is a market). I would probably opt for that, but use Alexa in rooms that are already untrusted.
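The split described above - a tiny always-on wake-word detector running locally, with the heavy speech recognition deferred to the cloud - can be sketched roughly like this. Everything here is a stand-in (real devices run a small neural keyword spotter over audio frames, not string matching), so treat it as an architecture illustration, not an implementation:

```python
WAKE_WORDS = {"alexa", "echo", "computer"}  # the small fixed set a local model can handle

def local_wake_word(frame: str) -> bool:
    # Stand-in for the cheap on-device keyword spotter that runs constantly.
    return frame.lower() in WAKE_WORDS

def cloud_transcribe(frames: list) -> str:
    # Stand-in for the expensive cloud ASR model; only invoked post-wake.
    return " ".join(frames)

def run(audio_stream) -> str:
    """Send nothing anywhere until the wake word fires; then stream the
    utterance to the cloud and return the transcript."""
    buffered = []
    awake = False
    for frame in audio_stream:
        if not awake:
            awake = local_wake_word(frame)   # local-only until this fires
        elif frame == "<silence>":           # end of utterance
            return cloud_transcribe(buffered)
        else:
            buffered.append(frame)
    return ""
```

The design point is in the control flow: `cloud_transcribe` is unreachable until `local_wake_word` fires, which is exactly why the wake word is constrained to a few choices while the rest of the query can be arbitrary speech.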
Yes to the former, pretty much for the latter - the "training" I recall was less than a minute long and consisted of repeating one or two example sentences, hardly anything that would be an adoption impediment to devices like an Echo. The resulting voice recognition worked, in my experience, better than that on modern Android phones.
I'm pretty skeptical of this. Apple has a pretty good commitment to privacy, one that is costing them on the AI front, and as far as I can tell they still upload Siri audio to the cloud for processing. If this was easy to do 10 years ago, I think Apple would at least offer it as an option.
Siri does more processing locally than Alexa does, but does upload to the cloud, and depends on the cloud, both for analyzing the statement and for accomplishing it. It can then execute local actions. The info uploaded to the cloud includes essentially all the relevant data for the query, so it doesn't particularly help with privacy -- it's just not the raw waveform.
But Siri sucks for MANY reasons, and not just technical ones. It does worse processing and worse cloud integration, and, compared to Alexa, has a far worse ecosystem around it (Alexa "skills" are pretty awesome, and trivial to create).
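On the "trivial to create" point: at its core an Alexa skill is just an endpoint (commonly a Lambda) that receives a JSON request and returns a JSON response in the Alexa Skills Kit format. A bare-bones handler without the official SDK, using a hypothetical "HelloIntent", looks roughly like this:

```python
def handler(event, context=None):
    """Minimal Alexa skill handler. "HelloIntent" is a made-up intent name;
    the request/response JSON shapes follow the Alexa Skills Kit format."""
    req = event.get("request", {})
    if req.get("type") == "LaunchRequest":
        text = "Welcome. Ask me to say hello."
    elif req.get("type") == "IntentRequest" and req.get("intent", {}).get("name") == "HelloIntent":
        text = "Hello from a ten-line skill."
    else:
        text = "Sorry, I didn't get that."
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }
```

Amazon's side handles the hard parts (wake word, speech-to-text, intent matching against your interaction model); your code only ever sees structured JSON, which is why the barrier to entry is so low.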
That's not why people I know say Siri sucks: for us it's because it can't look up as much info as Amazon or Google. It understands the query, it just can't do as much with it.
Because it makes the voice recognition vastly better. Have you ever noticed that Google can get names correct in navigational searches, even if what you said only sounds like the name?
We are a long way from being able to do that on $100 devices.
Nobody is going to disagree that it's useful and hard to live without. That's why the big 3 or so tech companies know you'll sell your soul to have it. And yes, you and I should both disable location services on our phones when not needed.
Does controlling your home need sending data to an external server though? Has no one come up with a competing product where all the processing is done on the machine itself?
Nevermind that we are already in a world where you can play any song you like from the wireless computer you carry in your pocket. That's not convenient enough.
* install device that is designed to listen to speech in the house
* the device is connected to the internet
* the device is capable of contacting other internet peers/services/hosts
* the device knows a bit about its owner's internet presence such as contacts
* the device is equipped with simple conversational user interface based on fuzzy human speech-command detection
These basically bound the failure modes of the complete system. I am not surprised that a glitch like that happened. I can certainly attest to how shocking it might be to discover first hand that it really did happen, but given the context, I can't say something like this was hard to foresee.