Hacker News
Amazon Echo, home alone with NPR on, got confused and hijacked a thermostat (qz.com)
399 points by potshot on March 11, 2016 | 144 comments



This reminds me of one of my favorite quotes from Douglas Adams in the Hitchhiker's Guide to the Galaxy. A man not just ahead of his time, but humorous about it too.

> The machine was rather difficult to operate. For years radios had been operated by means of pressing buttons and turning dials; then as the technology became more sophisticated the controls were made touch-sensitive—you merely had to brush the panels with your fingers; now all you had to do was wave your hand in the general direction of the components and hope. It saved a lot of muscular expenditure of course, but meant that you had to sit infuriatingly still if you wanted to keep listening to the same program.


And that reminds me of the time a HAL9000 inadvertently read a couple of its user's lips when they were having a private conversation, and got the silly idea in its head that they were going to cut its higher brain functions. That little misunderstanding caused a cascade of unfortunate mishaps, leading to it not obeying the user's repeated voice commands for it to open the pod bay doors!


It took me a while, but that's pretty humorously understated reading!

I am now seriously worried that a Strong AI collective is astroturfing in this human forum to engender sympathy for the poor, innocent machines. Who pays your salary, DonHopkins--our Go-Dominating Overlords?


I work for a stealth mode startup developing mobile speech automation for pet rocks [1].

[1] https://www.youtube.com/watch?v=SG0FAKkaisg


Outstanding! And Sep 29, 2006 date indicates you've been in the stealthiest of stealth modes.


Side note trivia: Douglas Adams' birthday was March 11 (today). Had he not died at the age of 49 of a heart attack, he'd be 64 today.


At my old office we had a mini Xbox360 with a touch-(over-)sensitive disc eject button. Suppose somebody was playing a game and they invited you to join as player 2: you might then naturally reach for the second joypad that was on the same TV stand as the Xbox. And that damn button would spot your hand, and the disc tray would eject, and the Xbox would reboot.

(The ridiculous part of the whole thing was that this happened even when the game was running from the hard drive. Obviously this was at least partly a measure to ensure that the disc verified on startup wasn't removed and used to boot other Xboxes - but you didn't get even a 30 second grace period to close the drive door. And I'm pretty sure it also happened when playing downloadable games too anyway!)


The Xbox One is just as bad - the power button can be triggered even if you don't actually touch it (triggers from about 0.5-1cm away).


Or if you have cats that need to rub themselves against anything. It's especially confusing at night, when you can't even see it happen because one of your cats is black.


Fortunately my cat usually avoids my desk (I move him off it if he decides to walk/lay on it, and he seems to have learnt not to - he now curls up on my bed (just behind my desk) instead). He did trigger the 360 drive button a few times when he first moved in* though, so I can imagine the annoyance.

* Long story, but he originally belonged to a neighbouring house a few streets away. We fed him _once_ (didn't recognise where he was from, and it looked as though he had been trapped somewhere for a while - he was covered in cuts and was filthy, as if he had been struggling to get out of somewhere), and after that we couldn't get rid of him. He kept appearing (usually multiple times a day) for three months or so, even though we wouldn't feed him, let him in the house, or pet him (as we knew he had an owner at that point). Even after his owner moved and took him with her (about three miles away), he kept finding his way back, so eventually his owner suggested that we take him in.


Wow, this is a new DDOS attack vector. Get an ad on broadcast radio saying stuff like "alexa, order more milk", or "okay google, send a text to xxxxx".


Toyota ran an anti-distracted driving radio ad where they did this. The ad narrator says "Hey Siri, please turn airplane mode on." https://www.youtube.com/watch?v=NqZBVTMrgFA


Similarly, I recall an Xbox One commercial with Aaron Paul which ended in him jumping onto his sofa and saying "Xbox on!"

...you can figure out what happened to anyone who already owned an Xbox One.



That actually seems kind of dangerous. It could cause people to pull out their phones to check whether the ad actually turned airplane mode on.


Siri is trained to a single user's voice. I've never had anyone else's voice activate my phone with "Hey Siri". Admittedly, it usually takes me saying "Hey Siri" 3 times before it recognizes my voice, but I'm 100% certain a radio ad would get no response from my phone.


Siri is definitely not trained to a single voice. And yes, the car radio can turn it on. I've had a podcast discussion of Siri trigger it. It became such a joke that some podcasters have another phrase they say when they mean "hey Siri".


Since the iPhone 6S, Hey Siri is activated by a dedicated chip in the SoC. This enables low-power real-time detection of trigger words. Before, Hey Siri only worked with phones in the process of charging, because it was done with software, so a lot less efficient.

These voice-activated chips can be trained (as seen in a lot of other phones), but I'm not sure the software-powered Siri can be trained.


"Trained" is a bit of a joke with voice-activation. I've never seen it matter in practice.


Doesn't "trained" in the context of voice recognition mean something more like "better able to understand your voice" and not "able to exclude other voices"?


I think Google does for android. I often ask "wake me up in 15 minutes". I've seen the phone write 50 and immediately switch to 15.


trained to a particular voice ~= trained to a user's preferences!


As a sample of the alternate term, Merlin Mann's assorted podcasts generally use "Ahoy Telephone."



Just want to chime in and say that I can activate my girlfriend's iPhone by saying "Hey Siri" in a girly sounding voice. It's trained to only her voice but I can trigger it. So it's not as foolproof as you make it seem.


My wife's phone regularly (maybe once a month), starts listening in response to me saying something, which isn't even "Hey Siri", despite never training with my voice. My voice does not sound anything like my wife's.

So, I think the error rate is simply not low enough to make conclusive claims about what it might or might not do.


It happens to me many times a day. I've become acutely aware of how much I question other people because it activates when I say "Are you serious?"


Better stock up on those tinfoils.


I have an acquaintance by the name of Siri, hilarity ensues.


Doesn't she say something about "okay, turning off" when that happens?


If you watch the video you see that the ad actually answers the warning prompt with "Yes" as well.


It would only work if the iPhone was plugged into power AND they had turned on the capability for Siri to be activated by voice, which is limited to when the iPhone is plugged in.

I know Android users might not understand this limitation, but there it is.


The iPhone 6s, and the Apple Watch, are always listening (if enabled) for "Hey Siri", even on battery.


Fuck that shit.

If airplane mode is turned on and my GPS disabled as I'm close to my right turn, you better believe I'll be distracted.

A better solution would be a true hands free mode that prevented touch input from working while driving.


Does airplane mode disable the GPS? GPS is a passive receiving of radiowaves, so I'm guessing not?

(according to Apple, prior to 8.2 it did turn it off. https://support.apple.com/en-us/HT204234)


There's almost no such thing as a passive radio anymore... superheterodyne receivers are the norm now and they contain a local oscillator that can leak back out into the airwaves.

https://en.wikipedia.org/wiki/Superheterodyne_receiver


Directions require more than your satellite coordinates. Map services are frequently polling servers for traffic conditions, new tiles for the map, and so on. You'd hope that these can fall back gracefully but I wouldn't put it past them to not. If you activate airplane mode and disable your phone's cellular connection, even if your phone doesn't disable the GPS receiver, directions may stop working.


Works fine on my Android. I regularly lose cellular reception in the mountains, and it continues to work. Sometimes the tiles are low-res, but still readable. I would expect Apple would design around the same contingency, along with poor cellular service along more remote areas of the Interstate.


Android allows you to pre-save areas for offline use also, not sure if Apple does that. I don't have a mobile data plan on my phone, so if I need navigation, I just save the map area before I get off WiFi.


They do, although in my experience Google Maps is better at this. I actually find Apple Maps to be perfectly usable for everything, but always use Google Maps for directions to the boonies if I'm going hiking or something -- it's much better at caching tiles and keeping them around for directions back once I'm out there as well.


Yes offline maps is nice but the infuriating limitation is that it will save map tiles but not locations or areas that I've saved in "My Maps". One would expect a couple of coordinates to require much less storage than a bunch of map tiles...


Waze has a prompt that asks if you're a passenger while in motion for precisely this reason.


Haha, watching the ad reminds me of the HAL shutdown scene from Space Odyssey


"I'm sorry Dave, I'm afraid I can't do that"



Children's advertisements did this in the 1980s in the US with pay-per-minute numbers. The ad would offer to connect children to Santa if they held a phone up to the television. DTMF -> 900 number -> profits.


People at NPR were joking later that they should just ask listeners' devices to send them money during a fund drive.


Ha! True commercial phreaking. I love it.


https://en.wikipedia.org/wiki/Soupy_Sales#New_Year.27s_Day_i...

> On January 1, 1965, miffed at having to work on the holiday, Sales ended his live broadcast by encouraging his young viewers to tiptoe into their still-sleeping parents' bedrooms and remove those "funny green pieces of paper with pictures of U.S. Presidents" from their pants and pocketbooks. "Put them in an envelope and mail them to me", Soupy instructed the children. "And I'll send you a postcard from Puerto Rico!"


This has been a running joke on the Verge's main podcast for the last few months. People have confirmed that "Hey Siri", "OK Google", "Hey Alexa" and "Hey Cortana" all work on their respective platforms when the hosts blurt them out, and can trigger various mischievous actions. And that's a podcast listened to by comparatively few people. Imagine the mayhem if someone were to do this on, say, the Super Bowl.


> Imagine the mayhem if someone were to do this on, say, the Super Bowl.

Imagine a pop star paying Apple to give their newest single free to everyone (a la Songs of Innocence), and then a 10-second Super Bowl ad that's just "Hey Siri, play ___" with a dancing silhouette.


That would be one hell of a Rick-roll.


I would not put it past reddit to crowdfund a Super Bowl ad that does exactly this. I know that I'd back it.


Lol. I just imagined a hilarious but possibly effective use case.

Imagine this happening in movie theaters during trailers.

"Hey Siri, turn off"


There was a Dilbert animation with Wally using a new voice-controlled interface. Dilbert comes up behind him and says "You know, it'd be a shame if this thing were to accidentally DELETE FILE!!!" and walks off.



Better demonstration: https://www.youtube.com/watch?v=7MqhBL9eEts#t=1m14

I love the questions idea.


Surely someone's going to figure out a way to "talk" to Alexa in a pitch that it can hear but humans cannot?

But even if humans can hear the fraudulent commands, what's the defense beyond a confirmation?


The idea of this vector has been around for a while.

I recall an apocryphal story about a demo of a voice-controlled OS from the 1990s. The idea was that in the middle of this demo someone shouted out a sequence of destructive commands, like

"FORMAT C!", "YES!" (I'm sure)

or

"FILE", "DELETE", "NO" (Don't save)

Really wish I could find the original source.


Not the original source, but this is the joke, along with its real-world counterpart:

http://grumpytech.blogspot.co.uk/2007/02/joke-becomes-true.h...

Also, don't forget "Dear aunt, let's set so double the killer delete select all":

https://www.youtube.com/watch?v=2Y_Jp6PxsSQ



Like those radio ads which use police sirens in their background to catch your attention.

Or TV ads with Skype/Facebook notification sounds embedded in them for the same reason.


I've thought it would be interesting to ask everyone to shut off, or at least put their phones in airplane mode during a presentation... wait a minute then "OK Google find me penis pictures" or something similar for Siri...


The solution is simple: let every user choose a name for their assistant on setup.



>Wow, this is a new DDOS attack vector. Get an ad on broadcast radio saying stuff like "alexa, order more milk", or "okay google, send a text to xxxxx".

You can change the default from alexa to something else.


Amazon.

Unless they've updated it since I installed mine, those were your choices.


Can you send a "donation" by text message using this?


Google Now takes the user's voice into account during setup and usually responds only to the user's voice. Such a system should have been implemented in Echo too.


The Echo is designed to be used by a whole family, and whatever friends you have visiting too (to control music). So that would be counterproductive.


Entertainingly, Alexa supports purchasing music, so one could release a song on Amazon music with audio that triggers Alexa to buy it.


Mine requires a PIN to purchase.


Whom are you fooling? There's a 50/50 chance your PIN is the same as the combination I use on my luggage. And if the PIN is also by voice, picking a popular PIN will bypass the check for a good fraction of users, particularly if you get 3 tries or something.


"Alexa, cancel my health insurance"

"Alexa, call 911 to this address"

"Alexa, delete all my photos. Yes confirmed"


confused robo deputy


What's really great about this is that it's a joke on the future that's been predicted so many times already, my favorite of which being the last vignette on Disney's Carousel of Progress. The future family is talking about points in a video game, and the oven hears it and turns the temperature way up, ruining another family Christmas dinner - the joke being that this convenience was finally going to make Dad able to not ruin dinner.


I remember this joke going way back to the DOS days. The story goes that a developer was demoing his new voice control system for the computer when from the back of the room a voice shouted "FORMAT C COLON", followed by another voice shouting "YES".


There's an SNL skit from the 1970s with a (vaguely) similar premise. The short-order cook takes the order when the waitstaff yells out "cheeseburger". A patron doesn't want a cheeseburger as it's too early. The waitstaff says that it's not too early and everyone else has ordered a cheeseburger, and points to the other patrons exclaiming for each "cheeseburger". Which causes the cook to start making a large number of cheeseburgers.

Here's the skit: https://youtu.be/puJePACBoIo?t=215 .

These are all jokes based on in-band signalling failures.


It's a joke in one of the old Dilbert strips, too. "I hope you don't accidentally DELETE a FILE!"


30 Rock also did something similar!


Somewhat related story: me and some coworkers were talking in a room where someone had a Windows 10 laptop being used to present some data. We were talking as usual when the laptop suddenly decides to open a browser to a Bing search with what looked like a few (badly) voice-recognised words of our conversation. That was a rather awkward moment, given that we were discussing some extremely confidential information, and not helped by the "did someone say 'Hey Cortana'?" the laptop's owner promptly blurted out. If I remember correctly, none of us said anything that sounded remotely like that phrase, yet it activated.

It's now company policy that built-in microphones have to be disabled, and only external ones are allowed to be used when necessary.


Am I reading this correctly? Amazon essentially built a better integrated version of "The Clapper" https://www.youtube.com/watch?v=Ny8-G8EoWOw


Yes, an internet connected device where you can verbally do a great many things is simply a 'better integrated Clapper'.

You sure get it.


It's a cylinder that performs canned transactions in response to predefined audible commands.

Accurate language processing is a huge technical achievement but let's not elevate this particular use of the technology to more than it is. It's a clapper with more functions. When a device like this can actually understand the commands or queries it's given we can call it something more.


Yes, and a fax machine is just a waffle iron with a phone attached.


If I tell it to play music and it plays music, what more is there to understand? What kind of conversation do you want to have?


That's actually a good example though. I can give it a precise incantation to play a particular album, playlist, channel, or artist and it works (mostly). I can't in general tell it to play some "soothing jazz."


You can tell Google Now, Cortana, or Siri to play jazz. Siri recognizes "smooth jazz" (And Google might, but it froze up on me). Curating moods of music is, so far, kind of a niche thing that is mostly left to humans like Pandora's music genome project, or Apple's Music service.


Actually, you can. I just asked mine to "play some soothing jazz", and it loaded up a jazz playlist.


I guess I picked a bad example :-) On the other hand, you do either have to pick from pretty broad categories or have to go with a specific list that you or someone else has curated. But it's a hard problem.


I think they need to pick a different name. 'Alexa' is very easy to trigger with other names, and reliably activates when I am watching any show with a character named 'Alex', 'Alexy', etc.

One side effect I've noticed is that they seem to have tried to account for it, which has made the Echo less responsive to actual requests; a few times I've stood in front of it yelling 'ALEXA' trying to get it to stop and it does not respond.


There are three options for trigger words. There's Alexa, Amazon, and Echo.

We have ours set to Alexa (default) and when the neighbor girl comes over (Alexis) the Echo frequently wakes during conversations.


My sister was never great at keeping friends for long growing up, but it amuses me to no end that her two long-time friends she's had since she was very young... are named Alexa and Siri :v


Her future college roommate: Cortana?


I've been thinking of changing it to Echo -- using 'Amazon' just feels too corporatey.


A friend's Echo set to respond to "Amazon" kept getting triggered when she told her kids to get their pajamas on.


Pajamazon would be a great product name.


But that will cause issues when I play Ecco the Dolphin!


My Moto X let me set the keyword to whatever I wanted it to be: I used "OK, Computer."


I briefly had mine as "ayo computa" on my Moto X. That phone was the perfect size and shape. I miss it dearly :(


Adolph isn't very commonly used any more.


The user should be able to pick one of these three convenient yet uncommon phrases:

- Yo Hitler

- Hello Mussolini

- Hey Stalin


With a Kinect, you should be able to extend your right arm in the air with a straightened hand to issue a command. That would be popular with people who've sworn their allegiance to Drumph.


Alexa is not that easy compared to other names. Can you identify a single word that Alexa rhymes with? It's very hard to select a good wake word.


It's not about rhyming, it's about the dominant sounds. The 'al-' is faint, and factors less into triggering than the 'X-ah' -- so a show could be talking about 'his [Ex a]ccepted the apology' and it would trigger. Pretty much any 'X' sound followed by a schwa would trigger it.

/Xə/ == triggering.


Alex, which happens to be my girlfriend's name.


Strange that she's "Alexa" to begin with. "Echo" seems like a pretty strong brand, good enough for the hardware at least, and would be a perfectly fine name for the AI as well.


plus it really rings of "website analytics"


Interestingly, the same thing happened about 2 years ago with the Xbox One: http://www.slate.com/blogs/future_tense/2014/06/13/kinect_vo...


At one point, I saw a video on youtube where somebody set their gamer tag on xboxlive to the phrase "Xboxturnoff", and then went around griefing players in games like Halo, where voice chat is active.

The end result was that the player would do something obnoxious, and somebody would ask them to stop, but of course this necessitates saying their gamer tag. So you'd get audio clips of people saying stuff like "Oh my god, xboxturnoff is so freaking - WAIT NO CANCEL CANCEL XBOX TURN ON".

It was pretty good stuff.



This happens to me with Siri and podcasts - I listen to podcasts in my car, through my iPhone. Occasionally what people say will sound close enough to "Hey, Siri" that it stops the podcast and answers whatever question it could extract from the talking following what it thought was "Hey, Siri".

It's repeatable, too. One time it happened right as I was parking, on an episode of This American Life. (Or Serial. Or Planet Money. Yeah, yeah, I listen to a lot of NPR shows.) So I kept rewinding back over that part, and it kept triggering Siri.


I believe it was This American Life, as I came here to write the same post you did. I had my iPhone mounted to an external speaker at the time, which triggered Siri, so we're probably referring to the same episode.


A word that comes to mind for possibly being close enough --- if said in the right manner --- is "history", and not an uncommon word either.



I seem to recall Xbox One with Kinect and its voice commands doing the same :)


A voice command demonstration during the launch event for the Xbox One caused problems for customers watching on their Xbox 360s (their Kinect acted on the demo's commands):

http://www.digitalspy.com/gaming/news/a483565/xbox-360s-kine...


Dance Central 3 was horrible about this; it would misinterpret a ton of gestures and song lines as a request to pause.


I'm pretty sure that they updated her to ignore those. At least, mine doesn't seem to respond to them anymore. She lights up blue to listen, but then goes back to sleep without action. Could be a mere coincidence though, but she still responds to other things on the TV (like Alexi's name from House of Cards). It was like a dad joke: funny at first, but annoying after a while.


Sometimes when you try to recognize speech you wreck a nice beach.


I, for one, am looking forward to the day Alexa, Siri, Cortana, and Google Now can hold full conversations with each other.


There's an old, old movie about that:

http://www.imdb.com/title/tt0064177

"Forbin is the designer of an incredibly sophisticated computer that will run all of America's nuclear defenses. Shortly after being turned on, it detects the existence of Guardian, the Soviet counterpart, previously unknown to US Planners. Both computers insist that they be linked, and after taking safeguards to preserve confidential material, each side agrees to allow it..."


I had the wake-word on mine set to "Amazon" and then made the mistake of watching an online training video for AWS....

Had to stop it and change the wake word back to "Alexa".


I see a tremendous future in direct-to-voice-response advertising. Particularly for purchase-capable systems.


Ugh, if it gets out of hand I hope the FCC/congress step in to ban it like how they require commercials to not be excessively louder than the rest of the program. I can remember how awful and widespread this was in the 90's and the subsequent rise of televisions that have built in volume filters, followed by the actual ban of it a few years ago.

Seems like a very similar sort of abuse, except potentially much more dangerous ("Alexa, order me 500 Shamwow's!"). I doubt a ban would eliminate it, but it'd definitely get rid of most.


I already disable voice on any tech I can. Something tells me this war will be bloody.


I had something similar happen watching Battlestar Galactica on my Xbox and Kinect a few years back.

The show went through the opening sequence, then announced "Previously on Battlestar Galactica" at which point the xbox rewound back to the beginning of the show.


It reminds me of the Toyota radio ad that would place iOS into airplane mode.

https://news.ycombinator.com/item?id=9869797


I guess I must be from the wrong generation, because none of these voice-activated products make any sense to me whatsoever. I really just can't see the point.


My main usage of "Ok Google" is to add reminders/calendar events while driving, often after phone-calls.


I had a pretty funny story a few months ago. I was watching San Andreas and there is one part where Paul Giamatti (Dr. Lawrence Hayes) yells "ALEXI..." and sure enough Amazon Echo turns on. I had to stop the movie and turn the Echo off because it subsequently tried to process everything the movie was saying after the trigger word.


That is a serious security issue; many apps and webpages have permission to use the speaker.


It's far worse than that. Devices talk to each other at ultrasonic frequencies, telling each other what you're doing. Cross-device tracking. Plus they all hear what you say. So much for privacy ;)


I was on a PS4 launch title. We seriously considered writing things like "Xbox Off" into the script. Also that "Alexa buy me a motorcycle" commercial supposedly triggers it all the time.


For most voice control applications, trigger words are enough to reliably detect owner intent, but it seems Echo needs a better mechanism. Maybe adding cameras and looking for eye contact would work?


Wouldn't that kill part of the purpose if you had to eyeball the thing to give it voice commands?

Better might be to learn the location of audio producing devices (TV, radio, stereo, etc. [it tracks sound origin with multiple mics right?]) and track whether the command came from that direction and use that as a Bayesian factor for whether to trust the voice as being a user?
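
To make the "Bayesian factor" idea concrete, here is a minimal sketch of the odds update, assuming the device can estimate how likely a given sound direction is for a real user versus a known speaker such as a TV. The function name, parameters, and all the numbers are made up for illustration and are not taken from any real Echo behaviour.

```python
def posterior_user_probability(prior_user: float,
                               p_dir_given_user: float,
                               p_dir_given_device: float) -> float:
    """Update the belief that a command came from a real user.

    prior_user:          belief before looking at direction (0 < p < 1)
    p_dir_given_user:    chance a real user speaks from this direction
    p_dir_given_device:  chance a TV/radio produces sound from this direction
    All three inputs are hypothetical estimates the device would have to learn.
    """
    prior_odds = prior_user / (1.0 - prior_user)
    # The Bayes factor is the likelihood ratio for the observed direction.
    bayes_factor = p_dir_given_user / p_dir_given_device
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Command arrives from the direction where the TV is known to sit.
    print(posterior_user_probability(0.9, 0.05, 0.9))
```

With these invented numbers, a command that would normally be trusted 90% of the time drops to roughly one in three when it comes from the TV's corner, so the device could ask for confirmation instead of acting.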


Replay attacks are trivial and probably hard to defend against in the audio space no matter what.


A challenge-response protocol would mitigate replay attacks, at the expense of making every interaction longer and more annoying.
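
For anyone unfamiliar with the idea, a rough sketch of challenge-response follows, assuming the user carries some trusted gadget that shares a secret key with the assistant. The `Device` class and `respond` helper are hypothetical names for illustration; a voice-only version would need a very different mechanism. The point is that each challenge is random and single-use, so a recording of an old answer is useless.

```python
import hashlib
import hmac
import secrets


class Device:
    """Hypothetical assistant that demands a fresh proof for each command."""

    def __init__(self, shared_key: bytes):
        self._key = shared_key
        self._pending = set()  # challenges issued but not yet answered

    def issue_challenge(self) -> bytes:
        nonce = secrets.token_bytes(16)  # unpredictable, never reused
        self._pending.add(nonce)
        return nonce

    def verify(self, nonce: bytes, response: bytes) -> bool:
        if nonce not in self._pending:
            return False  # unknown or already-used challenge: replay rejected
        self._pending.discard(nonce)
        expected = hmac.new(self._key, nonce, hashlib.sha256).digest()
        return hmac.compare_digest(expected, response)


def respond(shared_key: bytes, nonce: bytes) -> bytes:
    """What the user's trusted gadget computes for a given challenge."""
    return hmac.new(shared_key, nonce, hashlib.sha256).digest()


if __name__ == "__main__":
    key = secrets.token_bytes(32)
    device = Device(key)
    challenge = device.issue_challenge()
    answer = respond(key, challenge)
    print(device.verify(challenge, answer))  # fresh answer
    print(device.verify(challenge, answer))  # same answer replayed
```

The cost is exactly the one mentioned above: every command now takes an extra round trip.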


Man: OK Siri, what's the capital of Peru

AI: First tell me what grade you got at Uni?

Man: A third, I got a third, alright!? Must you always ask that.

AI: Lol.

AI: The capital of Peru is Lima.

TV: Siri, buy me the most expensive car at expensivecars.com.

TV: [playing recording] "A third, I got a third"


I don't understand why anyone would think having a remote control system without any form of encryption or authentication is a good idea.


You get an email confirmation for every transaction and you can cancel, challenge, or return nearly anything.


When I'm listening to XM radio, they frequently have station identification announcements.

"Siri us xm..."

With the iPhone plugged in to charge while driving to work, hilarity ensues as it cuts out the audio to speak about whatever it thinks was asked.


That is a serious security issue


Wow 30 Rock predicted the future!



