Hacker News
Amazon to kill off local Alexa processing, all voice requests shipped to cloud (theregister.com)
457 points by johnshades 36 days ago | hide | past | favorite | 132 comments



Years ago after getting one I was messing around in settings on Amazon's Alexa website and noticed a log of commands/messages sent to Alexa. I reviewed them and was horrified to see "why does daddy always beat me". Best to let your daughter win at Uno in this age of always-on connectivity. Or just unplug it, which is what I did.


There's some important nuance here: All commands (after trigger/wake word) were sent to the cloud in the past anyway.

The option to do some on-device processing came on later devices and, as I understand it, wasn't even enabled by default. Furthermore, on-device processing would still send the parsed commands to the cloud.

The headline is vague, but it's misleading a lot of people into thinking that only now Amazon will start sending commands to the cloud. It's actually always been that way. I suspect the number of people who enabled on-device processing was very, very small.
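The split described in that comment — wake-word detection stays on-device, everything after the trigger goes upstream — can be sketched in a few lines of pure Python. This is an illustrative simulation, not Amazon's actual pipeline; words stand in for audio frames:

```python
def process_stream(words, wake_word="alexa"):
    """Simulate wake-word gating: nothing leaves the device until the
    wake word is heard; everything after it is 'sent to the cloud'."""
    uploaded = []
    awake = False
    for word in words:
        if awake:
            uploaded.append(word)       # post-trigger audio goes upstream
        elif word.lower() == wake_word:
            awake = True                # everything before this stays local
    return uploaded

process_stream(["private", "chatter", "Alexa", "set", "a", "timer"])
```

The point of the sketch is that "local processing" on most devices only ever meant this gate, not local understanding of the command itself.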


I'm shocked that not one single article I've found mentioned this incredibly obvious fact. This has ALWAYS been the case and only a few select models ever offered the option to turn it off. This change puts all devices on equal footing and behavior with the launch device.

I don't love Amazon, but I love ginned up outrage over tech the author never bothered to understand even less.


"I don't love Amazon, but I love ginned up outrage over tech the author never bothered to understand even less."

And 99% of Echo owners disagree with this. No one cares if a reporter mixed up "amazon is starting to spy on you tomorrow" vs "amazon has been spying on you since the first echo was launched". Only amazon would make an argument like "yeah but we've been doing this for years now and no one made a big deal about it..."


Right. But you have to admit this sounds a bit like labelling cereal "arsenic free". Really helpful to know, but the sudden announcement hides that it's always been that way and implies others aren't. In truth this is probably how they all work and always have, but such a headline wouldn't ding Amazon sales.


I don't have anything of substance to say, but I wanted to mention that I appreciate your subtle xkcd reference[0]!

[0] https://xkcd.com/641/


> I don't have anything of substance to say

Thanks for saying something anyways. I wish more people would follow your example.


I genuinely can't tell whether you're being sarcastic or whether you're saying that you appreciated the link to the XKCD. If the former, then I apologise.


Online news articles are not really a workable medium for real journalism or reporting


> ginned up

The Sauron's Eye of public sentiment can only pay attention to one thing at a time. If you justify today on the basis that it wasn't paying attention yesterday, you can rationalize anything.


> I'm shocked that not one single article I've found mentioned this incredibly obvious fact.

The search engines are crap. There was a story some years ago where Amazon employees from an Eastern European country were actually listening to your Alexa and sent the relevant commands back to the device.


The Register in particular is a pretty garbagey source


Some of us like the tabloid-style IT reporting. The more salient criticism for me is that it's way less entertaining than a few years ago.


I disagree.


While the headlines may be exaggerated, it's worth comparing https://www.amazon.com/gp/help/customer/display.html?nodeId=... - "You will be able to review and delete those transcripts in your Voice History" - with https://www.apple.com/newsroom/2025/01/our-longstanding-priv... - "Siri searches and requests are not associated with your Apple Account."

I'm not disappointed that this "non-event" is drawing attention to this comparison. Even if it's farfetched to dream that bringing privacy more to the forefront of the news zeitgeist will result in a shift of the status quo for our industry - heaven knows that if privacy stories don't get mindshare, the status quo could get far worse.


I knew someone that used to work on the Alexa team on the language side of things. She had an emotionally terrible few weeks at one stage, because she and her team had to brainstorm (working in conjunction with experts) on just about every possible way users might ask questions that indicate they're being abused, so that they could provide suitable responses. Glad to have worked on it, but it was heart wrenching in many regards.


[flagged]


It sounds like you’re subtly implying that GP was posting a copypasta?


I understood it to mean that dads are just really great at Uno


Any other posts you can link to? I searched but couldn't find anything.


Google: No results found for "why does daddy always beat me" alexa.

I was lucky enough to work on a couple core parts of Google Assistant, in audio, and UI.

I viscerally understand what it feels like to hear what comes across as obsessive negativity.

It's important to avoid classifying inputs, especially in ways that dismiss reasonable concerns, much less lying about having seen it before.


funny how?


funny like a clown?


I've noticed, since getting a new Mac, that on-device dictation is no longer possible: a modal pops up forcing you to hit agree in order to get dictation. Never clicking that.

The direction of major operating systems neutering themselves in favour of deep service integration does not fill one with hope


That's not true -- on-device dictation works just fine. You can verify for yourself by turning off WiFi/Internet and dictation works identically. As long as you have a modern enough Mac that supports local dictation, of course. (Older Macs were only cloud dictation, I remember.)

The popup on my MacBook says specifically:

> When you use Dictation, your device will indicate in Keyboard Settings if your audio and transcripts are processed on your device and not sent to Apple servers. Otherwise, the things you dictate are sent to and processed on the server, but will not be stored unless you opt in to Improve Siri and Dictation.


> your device will indicate in Keyboard Settings if your audio and transcripts are processed on your device and not sent to Apple servers

I have a reasonably recent (<2yo) MBP, which feels like it ought to be "modern enough", and in Keyboard Settings, it says "Dictation sends information like your voice input, contacts, and location to Apple to process your requests." It doesn't say anything about processing happening on my device. Yes, off-line dictation does work for me (with Wi-Fi turned off), but I'm curious under what conditions Keyboard Settings would say something about transcripts not being sent to Apple.


Exactly, so even if it does work offline, I cannot use it without agreeing to send voice data to Apple.


Not to mention that this is an absurd assurance. "Your device will indicate in Keyboard Settings?" So you're supposed to run and look at that for every voice command, to see if THIS one is being sent up?

What does this even mean?


> but will not be stored

Well, that’s impossible though. Of course it is stored as temporarily as that may be.

So the statement already has a logical failure. Possession for processing is still storage.

So the debate is then "how long do we store something until we have to call it 'storage'".

And further, they may delete your voice, but the log of the thing you asked for I think is unlikely to go away.

I like Apple, as at least they try to compete on privacy. But I think this wording is a bit weaselly.


RAM is not storage, in the conventional dichotomy between memory and storage.

And even if it were, the meaning is obvious that it won't be stored after processing.

There's no logical failure, it's not "weaselly". It's a completely straightforward legal statement. The intent is abundantly clear.


RAM is storage.

You were making best-case assumptions, which is a laughable concept when it comes to a Fortune 500 company.


For you. I am not getting the experience you describe which sounds similar to how I was expecting this to work as it did previously in Sonoma on an Intel machine.

The options are different for me in the settings app, and using dictation is for me right now, impossible without agreeing to a modal displaying an agreement to send audio data to Apple.


Aside from the what the modal actually says (mine doesn't say anything about "sending" info, and I have a 2023 M2 MBP) I don't really think it's fair to put Apple in the same category as Amazon when it comes to sending data to the cloud, because they actually have more of a financial incentive to keep your data private than selling it to the highest bidder.

I also highly doubt that they will somehow later down the line magically change their minds given the millions (maybe over a billion at this point) of dollars invested in things like private cloud compute (1) and challenging the U.K. government (2) in court over E2E cloud backups.

(1) https://security.apple.com/blog/private-cloud-compute/

(2) https://www.reuters.com/technology/court-hearing-reported-be...


The modal I have says exactly this:

--- Do you want to enable Dictation? When you dictate text, information like your voice input and contact names are sent to Apple to help your Mac recognise what you’re saying.

[Enable] Dictation Privacy (a policy) [Cancel] ---

I think it is entirely fair to put Apple in this category, because they have effectively disabled offline dictation for me unless I agree to send my voice data to Apple. I have used this offline dictation feature for years, by the way.


> a modal pops up forcing you to hit agree in order to get dictation

Are you sure this isn't the local model download prompt? The first time you use it, it does need to download some content to make that work.


--- Do you want to enable Dictation? When you dictate text, information like your voice input and contact names are sent to Apple to help your Mac recognise what you’re saying.

[Enable] Dictation Privacy (a policy) [Cancel] ---


Run whisper locally


I do, but it's nowhere near as convenient as double tapping Fn anywhere there is editable text and speaking. It also requires much more system resources.

What interface do you use for using local whisper?
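For concreteness, one minimal interface to local Whisper is the reference `whisper` CLI from openai-whisper, driven through a tiny wrapper. The model name and file paths below are examples, not recommendations:

```python
import subprocess

def build_whisper_cmd(audio_path, model="base.en", out_dir="."):
    """Assemble an openai-whisper CLI invocation for fully local
    transcription. Model and paths are illustrative choices."""
    return [
        "whisper", audio_path,
        "--model", model,            # small English model, CPU-friendly
        "--output_dir", out_dir,
        "--output_format", "txt",
    ]

# Actual run (requires openai-whisper installed):
# subprocess.run(build_whisper_cmd("memo.wav"), check=True)
```

This trades the convenience of a system-wide hotkey for knowing the audio never leaves the machine.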


It’s almost enough to make me try Linux on the desktop again.


I run Linux on my personal computers.

Word of warning: businesses, possibly including your employer, may consider you to be a robot, or a terrorist, or tell you to install Windows, or tell you to use your phone, or tell you that you have no right to privacy.

I recently had to go through a background check for a prospective employer. The background check website wouldn't even load. The support agent told me that it's a Firefox problem and told me that I needed to open the website on Chrome or Edge or on my phone, and that the website is working "perfectly". Alas, it worked fine with Firefox on Linux, as long as the user-agent reported that it's actually Chrome and Windows.

Yeah the website is indeed working "perfectly": perfectly enough to block employment of people who care about privacy.


I keep one Windows or Mac around always for work specific things, if Linux is unsupported I can switch. Heck you could get a free Windows dev VM from Microsoft (they rotate them out every 3 months).


> Heck you could get a free Windows dev VM from Microsoft

You gotta accept Microsoft's licenses and EULAs and crap like that. Plus it's still spyware. No thanks


Also the VM install images have been pulled off of their page since October...

https://developer.microsoft.com/en-us/windows/downloads/virt...

>Due to ongoing technical issues, as of October 23, 2024, downloads are temporarily unavailable.

They are morons.


This happened once before, they always wind up returning, hopefully that's the case.

Maybe someday ReactOS will be usable fully as a Windows VM OS.


It's not just about us, it's about less-technical people who might adopt libre operating systems if they're easy to use, but not if web services intentionally refuse to do business with them.


> but not if web services intentionally refuse to do business with them.

And in many ways, I'd be fine with that if they'd be up-front and honest about it.

Alas, Microsoft certainly is not. Cloudflare is not either. Many of these services just sit there and pretend like they're loading without showing any sort of error indication. Much like a tarpit but it's malicious on the business side and with little to no recourse on the real human side.


Microsoft may be malicious, but it's more likely it's incompetent.

Cloudflare is not malicious, but is between a rock and a hard place. By caring about privacy you are, indeed, looking more suspicious. In a perfectly anonymous world, reputation-based captchas couldn't work. It's OK if you think they shouldn't exist, but Cloudflare customers and most people like them.

Everyone else is not on some secret plan to destroy the Linux Desktop, they just don't test their websites on linux/firefox (because "nobody uses that"), which makes them unusable, which causes people to drift off linux/firefox.


I do wonder if Edge on Linux would actually work with some of these sites.


Do it. Quite a bit has improved in the last few years and I've entirely replaced Windows.


Ehhh but I just bought one of the M4 Mac minis. And it’s pretty nice.


Keep an eye on [Asahi Linux](https://asahilinux.org/), then. A cursory glance shows M4 support not being complete yet, but I assume it will be in time (and the missing stuff may or may not be a show stopper for you).


Is asahi expected to work on them soon, or is it still a few Ms behind?


I was just forced to migrate to MacOS in my new job and I can't really understand developers saying that Linux is not usable as a desktop machine. For me, it's the other way round.


MacOS is indeed pretty bonkers.

- Things that should be system settings are instead apps (Amphetamine, Rectangle).

- There's no way to move focus around directionally between windows with a keyboard.

- "Open file" windows give you nowhere to paste a path (tip: cmd + shift + g summons a path prompt).

- Full screen windows now must be managed like they've just become an entire workspace, and the underlying app may or may not support the un-full-screen button.

- The error messages don't give you enough info to actually act on them (apparently "Docker" will damage my computer, and I should uninstall it, but it won't give me a path to the offending file, so I don't know how to uninstall it; also, this warning returns if I close it, so it's just been hanging around for months).

My strategy for maintaining sanity is to do as much as possible through zellij (a terminal multiplexer), that way I can use the same muscle memory on Linux as well. As for the rest, I just try to ignore it.


I have found the same for Mac: it has all the downsides of Windows, and all the downsides of Linux, and almost no upsides of its own. Sure, the hardware is doing plenty of nice things hard to find elsewhere, but the OS is so god-damn hostile to the kinds of people who can appreciate the hardware that it kind of defeats the purpose.


Oh I wouldn't go so far as "all the downsides of Windows". So far Apple has not been targeting me with a phishing campaign designed to get me to use their browser of choice.

Apple is infuriating in rather different ways.


I would use Linux if it supported offline installers.

As is, I'll be sticking with heavily tweaked Windows to work with my several HDDs full of old software, and avoid the Linux headaches of repos disappearing, deciding between Snap/AppImage/DEB, and general incompatibility with office documents, industrial tooling and Adobe software.

I'll only use Linux where I'm paid to at work. Thanks to Linus Torvalds' terrible software distribution model, I've had to do black magic to work around Anon's deprecation of Debian/Raspbian Stretch on which our industrial network gateways run.


You can easily store (parts or complete copies of) the repos locally. It's really convenient.


I was on Pop!_OS for two years; now I've switched to an Arch derivative called EndeavourOS, which makes installing Arch a breeze. I did discover someone working on an Atomic version of Arch (where the core OS is frozen for a set period of time, to ensure total stability, and nothing can break during this window) called Arkane Linux, which I might try.

I have no intention of returning to Windows.


It’s always the year of the linux desktop!


What was the problem?

I feel that it just works.


I did.

And am glad.

I installed a copy of Windows 11 the other day for a new machine and it was INFURIATING.

In order to install without internet or an offline account, you MUST know a voodoo command and how to enter it. Used to be their dark pattern was at least on the screen, they’re out of their damn minds now.

Everyone has their breaking point with Microsoft, I hit mine and it’s been nothing but good for me.


There seems to be some tremendous confusion here. The vast majority of Alexa-family devices perform no local processing except for the activation word ("Alexa"). I didn't even realize that some of the more recent devices supported an opt-in for local processing.

This kind of makes sense, at least to me: local processing will always be limited. The entire premise of the original Echo devices was that all the magic happened in the cloud. It seems like not much has really changed?


Echo devices since 2021 include hardware NPU for local voice processing to text, https://news.ycombinator.com/item?id=43368008


Most Google devices do parallel local processing and cloud processing.

The local processing has less latency and works on unstable internet. It's perfect for tasks like 'set an alarm for 8am', even if offline.

The remote processing is good for better accuracy of complex words and queries.

The results are combined in the UI, making the whole thing feel less laggy (although IMO it still feels laggy to have to wait 1-2 seconds after asking a query to get results).
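The parallel local-plus-cloud pattern described above can be sketched with asyncio. Both recognizers here are stand-in stubs (a fast, rough local pass and a slower, more accurate remote one); the idea is to paint the local result immediately, then upgrade if the cloud answers in time:

```python
import asyncio

async def local_recognize(audio):
    await asyncio.sleep(0.01)   # fast, lower accuracy (stub)
    return ("local", "set alarm for 8am")

async def cloud_recognize(audio):
    await asyncio.sleep(0.05)   # slower, higher accuracy (stub)
    return ("cloud", "set an alarm for 8:00 AM")

async def recognize(audio, cloud_timeout=0.5):
    """Run both in parallel; keep the local result if the cloud
    is too slow or unreachable."""
    local_task = asyncio.create_task(local_recognize(audio))
    cloud_task = asyncio.create_task(cloud_recognize(audio))
    shown = await local_task              # low-latency first answer
    try:
        shown = await asyncio.wait_for(cloud_task, cloud_timeout)
    except (asyncio.TimeoutError, OSError):
        cloud_task.cancel()               # offline: keep the local result
    return shown

result = asyncio.run(recognize(b"..."))
```

The offline fallback drops out of this design for free, which is exactly why "set an alarm for 8am" keeps working with no internet.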


500 million Alexa-enabled devices were sold, hopefully some can be repurposed and kept out of landfills.

Updating Echo Dot V1 to newer kernel: https://andrerh.gitlab.io/echoroot/

Echo Dot V2 Android tinkering, https://github.com/echohacking/wiki/wiki/Echo-Dot-v2 & https://andygoetz.org/tags/dot/ & https://www.youtube.com/watch?v=H0IEMVDebzE


These devices never supported local processing anyways, right? So, no change there?


If older devices can be modified, voice audio could be sent to non-cloud Linux/Mac for Whisper transcription to text command.


Related: The Alexa feature "do not send voice recordings" you enabled no longer available (discuss.systems) | 929 points by luu 1 day ago | 664 comments | https://news.ycombinator.com/item?id=43385268


Pure anecdote, but it reminds me of the time I mentioned on a call that a particular piece of code was a time bomb waiting to explode, only to have Alexa wake up and start listening; I was standing nearby and noticed the light. I immediately disconnected it and never looked back.


Wow. Our phones all do the same thing, there's no doubt.


If there's no doubt then I assume there's widespread evidence for it?

Unless of course this is hyperbole, and there is in fact every reason to doubt this because it's based on conjecture and anecdote.


There's a way to download all of your Alexa requests. I recommend it to everyone. It was interesting and horrifying to get literally all of them, from day 1. I noticed how tired I sound in the mornings or evenings. I started understanding patterns of my thoughts and needs. The Alexa went to the bin quickly after that session of exploration and insight.


Link for those that are interested: https://www.amazon.com/hz/privacy-central/data-requests/prev...

Be advised it's not instant.


Mine is a whole list of "weather" and "set timer for 4 minutes".


Heh, you can do the same with your Google searches. Equally horrifying, I suppose.


Where does Google offer this?



> The Alexa went to the bin quickly after that session of exploration and insight.

Why? It sounds like it was really interesting and valuable to observe those patterns.


Presumably because it is a privacy hazard to have someone else storing that kind of data about you


Exactly.


Precisely because it _was_ so interesting and valuable to observe those patterns - for the corporation observing them.


Exactly.


Related – is anyone working on an open home assistant? Google, Apple, Amazon are all taking so long to bring latest advancements across to their products


Open Home Assistant has a voice module. I haven't personally tried it, though.

https://www.home-assistant.io/voice_control/


I did! With their own new hardware (https://www.home-assistant.io/voice-pe/). Sadly the microphones are way worse than e.g. in an Alexa speaker. Also the performance of the "voice pipelines" (STT, LLM, TTS) is a bit of a pain because they all run in sequence rather than using streaming.


Yeah, Home Assistant is going through some voice/AI hiccups at the moment. They're updating to LLMs and it's sorta half implemented.


There was Mycroft AI. Not sure what ever became of that.



I used to use Mycroft and I thought it functioned quite well. Seems to have ended up here: https://www.openvoiceos.org/about


Recent conversation about this:

https://news.ycombinator.com/item?id=43385268


I don't understand why Alexa/Siri etc don't just keep their hardcoded rules for things like "set an alarm" and only ship things to a cloud LLM if they don't match a rule.


Put your iPhone in airplane mode and disable WiFi - most of the basic stuff like 'set an alarm', 'start a timer', etc. will still work. This has been the case for several years - offline Siri was one of the big things they added in iOS 15.

[0]https://www.macworld.com/article/678307/how-to-use-siri-offl...


With the Alexa devices that have local processing hardware, that is exactly what happened (prior to this change, if you had the local only option set).

But now, in the age of LLMs, there are no "simple rules". An example. Say you live in San Francisco:

"Alexa, I'm thinking of going to New York, how many flights are there each day?"

This is a hard question, and one that will go to the cloud.

"Alexa, what's the weather"

This seems like an easy question. But with local only processing, you'd get the weather in San Francisco. But with the LLM, it will probably give you the weather in New York, which is most likely what you wanted if you asked these things just a few seconds apart.


Because that is harder and gives a less consistent experience.


It is more complexity than just "ship everything to an LLM and use tool calls", but the payoff - perfect behavior, along with offline support, for your most common inputs - is worth it I think.

I disagree about things being less consistent. Let's imagine a 100% LLM world - in this world, you use a bunch of training to try to get the LLM to match your hardcoded responses for common inputs. If you get your training really right, you get 100% accuracy for these inputs. In this world, no one is complaining about consistency! So why not just hardcode that behavior?
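The hardcode-common-intents, fall-back-to-the-LLM approach argued for above can be sketched directly. The rule table and the `cloud_llm` fallback tag are illustrative, not any vendor's actual implementation:

```python
import re

# Hardcoded rules for the most common utterances: exact, offline, free.
RULES = [
    (re.compile(r"^set (?:an? )?alarm for (?P<time>.+)$"), "set_alarm"),
    (re.compile(r"^set (?:a )?timer for (?P<time>.+)$"), "set_timer"),
    (re.compile(r"^(?:what's|what is) the weather\??$"), "get_weather"),
]

def handle(utterance):
    """Match common intents locally; ship anything else to a cloud LLM."""
    text = utterance.lower().strip()
    for pattern, intent in RULES:
        m = pattern.match(text)
        if m:
            return (intent, m.groupdict())        # handled on-device
    return ("cloud_llm", {"query": utterance})    # everything else: ship it

handle("Set an alarm for 8am")
```

The trade-off the thread debates is visible right in the table: every rule you add is perfectly predictable, but misses every phrasing variant it doesn't anticipate.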


The whole benefit of LLMs is that humans are not consistent enough. Or at least Apple, Amazon, Google and Microsoft all believe normies don't want to be consistent enough to speak the most common input the same way, allowing to use much simpler and efficient approaches to voice input - like the ones that worked off-line 15+ years ago on a regular PC.

LLMs are actually the only reason I'd consider processing voice in the cloud to be a good idea. Alas, knowing how the aforementioned companies designed their assistants in the past, I'm certain they'll find a way to degrade the experience and strip most of the benefits of having LLMs in the loop. After all, as past experience shows, you can't have an assistant letting you operate commercial products and services without speaking the brand names out loud. That's unthinkable.


Siri does this for some simpler requests.


I'd be interested to see the percentage of users who have the "Do Not Send Voice Recordings" feature enabled. My guess is 0.001% or less.


Since only a minority of users want any modicum of privacy, guess it makes sense to remove the option for everybody.

This also ignores the dark patterns by which big tech makes it annoying/obfuscated/unknown that better privacy options are available.


It isn't an option for older devices anyway. The hardware for on-device processing was only in some recent models.

I think it's likely that they looked at the numbers and realized they were spending a lot of money putting NPUs on devices and maintaining separate voice parsing models for a very small minority of users.


It's useful to control local Zigbee devices without depending on internet.

Echo Plus is a Zigbee hub with US-origin firmware.


Okay, with this level of integration into daily, private life (able to record background noise not related to the request, potentially or actually), consider this:

Is Alexa hearing a gunshot a request for assistance? It’s not a voice command, okay, but where does “oh that’s not our business” really end in vast data collection platforms such as this? Does Alexa have any duty to report voice requests about self-harm?


You would think we could actually do all voice processing locally now? The models that do voice, speech, and language processing aren't that big...an 8b model would be completely feasible for an affordable device, if not home server.


You can process the voice to text locally, but it's what you do with the text afterwards that is done in the cloud.

In the age of LLMs, even a simple request like "set a timer" is sent to the cloud so that it can be processed in the context of what you've said previously, what devices you own, what time of day it is, etc. etc.

And FWIW, you will get a better voice to text in the cloud because that model will know about your device names and other details. For example, if you say, "turn on the kitchen light", the cloud knows you have a light called "kitchen", so if you slur a bit it can still figure it out.


All of that could as easily be done locally. The Echo speaker is likely the hub for the IoT device controlling the kitchen light. None of the context you speak of requires "cloud".


> All of that could as easily be done locally

It cannot. Keep in mind that the Alexa devices are built to be as cheap as possible, so they have minimum amounts of RAM and CPU. The tiniest of models can barely fit on the device.

> The Echo speaker is likely the hub for the IoT device controlling the kitchen light.

Generally, the devices controlled by Alexa are reached via Wi-Fi through the device provider's own APIs. Very few "Works with Alexa" devices can be controlled locally. But yes, some of them can. However, the Alexa device doesn't know it is called "kitchen".

> None of the context you speak of requires "cloud".

I just gave you a simple example. Here is a better one that I used down below:

Say you live in San Francisco:

"Alexa, I'm thinking of going to New York, how many flights are there each day?"

This is a hard question, and one that will go to the cloud.

"Alexa, what's the weather"

This seems like an easy question. But with local only processing, you'd get the weather in San Francisco. But with the LLM, it will probably give you the weather in New York, which is most likely what you wanted if you asked these things just a few seconds apart.


Funny that Home Assistant seems to manage.

Every point you made is a choice that was engineered into the system to impose dependence.


You must have missed the very first thing I said. The Alexa has very weak hardware. The home assistant is basically a Mac mini. And the costs of each reflect that.


Far from being a Mac Mini, the reference Home Assistant hardware is spec-for-spec a 4 GB Orange Pi 3B: https://www.home-assistant.io/green/

A far cry from your claim of "it cannot be done". And the $40 retail sticker for that board doesn't seem to match up with your claim about pricing.

Even low-cost modern SoCs have NPUs in the double-digit TOPS, and memory densities being what they are, there's very little excuse not to run a special-purpose language processing model on device. A 128 Gbit (16 GB) memory module goes for as little as $0.04 ea on digikey: https://www.digikey.com/en/products/filter/memory/774?s=N4Ig...


How would it be aware of everything else I have and do online? Calendar, emails, YouTube history, current audiobook, etc?


I would be excited if Apple would add great GPT answering capabilities on my first-gen homepods, even if it meant having to send all queries to the cloud. I can unplug them if I need privacy.


I only have one Alexa and it won't bother me to remove it or replace it with another HomePod. I only use it to check the weather and occasionally to listen to random facts while I get ready in the morning. The fact that it has a digital clock is a nice bonus that HomePods don't yet have, though.


Yep, gonna unplug my Alexa tonight now. Maybe I'll try setting up openHAB for controlling my smart lights.


We should make AI generated conversations to overload their surveillance.


I imagine an Alexa customer would have a net loss financially from the model execution costs if they just sent garbage all day. Doesn’t even need to be clever.


Not if you run it locally and power it with solar?


Just play SpongeBob extra loud elsewhere in the house.


Waste of time. Find something else constructive to do.


How dare you say that to Amazon?


Huh. How subpoena-able. Great timing! Throw these things in the trash.



What if the internet is off?


I mean, it seems that local hardware can be a limiting factor on the quality of the LLMs you can deploy. Why would they shoot themselves in the foot?

Even Apple opened their AI system to OpenAI (and potentially other vendors in the future).

As long as there is explicit consent, it is fine. Nobody forces you to buy an Alexa product.


Because now that local processing is ever-more powerful and local storage dirt-cheap, it's time to move everything to "the cloud."

So ignorant, especially with music. Whoops, no Internet access on the plane? No music for you.


Considering what a blatant spyware it is, it is unbelievable that people pay for it (not that I would keep one at my home even if it was free).


Throw your wiretap in the garbage where it belongs.


It's better to stop the flow of Big Brother-like privacy violations and repurpose it to do something useful than add to e-waste.


Why? It's too useful to "belong" in the garbage.


It’s useful only to let big tech companies listen in to everything that happens in your home.


pikachu_surprised.jpg


This should be a godsend to any FOSS home assistants out there.


This Amazon step may make someone else very rich



