Years ago, after getting one, I was messing around in the settings on Amazon's Alexa website and noticed a log of commands/messages sent to Alexa. I reviewed them and was horrified to see "why does daddy always beat me". Best to let your daughter win at Uno in this age of always-on connectivity. Or just unplug it, which is what I did.
There's some important nuance here: All commands (after trigger/wake word) were sent to the cloud in the past anyway.
The option to do some on-device processing came only with later devices and, as I understand it, wasn't even enabled by default. Furthermore, on-device processing would still send the parsed commands to the cloud.
The headline is vague, and it's misleading a lot of people into thinking that only now will Amazon start sending commands to the cloud. It's actually always been that way. I suspect the number of people who enabled on-device processing was very, very small.
I'm shocked that not one single article I've found mentioned this incredibly obvious fact. This has ALWAYS been the case, and only a few select models ever offered the option to turn it off. This change just puts all devices on the same footing, with the same behavior as the launch devices.
I don't love Amazon, but I love ginned up outrage over tech the author never bothered to understand even less.
"I don't love Amazon, but I love ginned up outrage over tech the author never bothered to understand even less."
And 99% of Echo owners disagree with this. No one cares if a reporter mixed up "amazon is starting to spy on you tomorrow" vs "amazon has been spying on you since the first echo was launched". Only amazon would make an argument like "yeah but we've been doing this for years now and no one made a big deal about it..."
Right. But you have to admit this sounds a bit like labelling cereal "arsenic-free". Really helpful to know, but the sudden announcement hides that it's always been that way and implies that others aren't. In truth this is probably how they all work and always have, but such a headline wouldn't ding Amazon sales.
I genuinely can't tell whether you're being sarcastic or whether you're saying that you appreciated the link to the XKCD. If the former, then I apologise.
The Sauron's Eye of public sentiment can only pay attention to one thing at a time. If you justify today on the basis that it wasn't paying attention yesterday, you can rationalize anything.
> I'm shocked that not one single article I've found mentioned this incredibly obvious fact.
The search engines are crap. There was a story some years ago where Amazon employees in an Eastern European country were actually listening to your Alexa recordings and sending the relevant commands back to the device.
I'm not disappointed that this "non-event" is drawing attention to this comparison. Even if it's far-fetched to dream that bringing privacy more to the forefront of the news zeitgeist will result in a shift of the status quo for our industry, heaven knows that if privacy stories don't get mindshare, the status quo could get far worse.
I knew someone who used to work on the Alexa team, on the language side of things. She had an emotionally terrible few weeks at one stage, because she and her team had to brainstorm (working in conjunction with experts) just about every possible way users might ask questions indicating they're being abused, so that they could provide suitable responses. She was glad to have worked on it, but it was heart-wrenching in many regards.
I've noticed since getting a new Mac that on-device dictation is no longer possible: a modal pops up forcing you to hit Agree in order to get dictation at all. Never clicking that.
The direction of major operating systems neutering themselves in favour of deep service integration does not fill one with hope.
That's not true -- on-device dictation works just fine. You can verify for yourself by turning off WiFi/Internet and dictation works identically. As long as you have a modern enough Mac that supports local dictation, of course. (Older Macs were only cloud dictation, I remember.)
The popup on my MacBook says specifically:
> When you use Dictation, your device will indicate in Keyboard Settings if your audio and transcripts are processed on your device and not sent to Apple servers. Otherwise, the things you dictate are sent to and processed on the server, but will not be stored unless you opt in to Improve Siri and Dictation.
> your device will indicate in Keyboard Settings if your audio and transcripts are processed on your device and not sent to Apple servers
I have a reasonably recent (<2yo) MBP, which feels like it ought to be "modern enough", and in Keyboard Settings, it says "Dictation sends information like your voice input, contacts, and location to Apple to process your requests." It doesn't say anything about processing happening on my device. Yes, off-line dictation does work for me (with Wi-Fi turned off), but I'm curious under what conditions Keyboard Settings would say something about transcripts not being sent to Apple.
Not to mention that this is an absurd assurance. "Your device will indicate in Keyboard Settings?" So you're supposed to run and look at that for every voice command, to see if THIS one is being sent up?
For you. I am not getting the experience you describe, which sounds similar to how I expected this to work, as it did previously in Sonoma on an Intel machine.
The options in the Settings app are different for me, and right now using dictation is impossible without agreeing to a modal that asks me to consent to sending audio data to Apple.
Aside from what the modal actually says (mine doesn't say anything about "sending" info, and I have a 2023 M2 MBP), I don't really think it's fair to put Apple in the same category as Amazon when it comes to sending data to the cloud, because they actually have more of a financial incentive to keep your data private than to sell it to the highest bidder.
I also highly doubt that they will somehow later down the line magically change their minds given the millions (maybe over a billion at this point) of dollars invested in things like private cloud compute (1) and challenging the U.K. government (2) in court over E2E cloud backups.
---
Do you want to enable Dictation?
When you dictate text, information like your voice input and contact names are sent to Apple to help your Mac recognise what you’re saying.
[Enable]
Dictation Privacy (a policy)
[Cancel]
---
I think it is entirely fair to put Apple in this category, because they have effectively disabled offline dictation for me unless I agree to send my voice data to Apple. I have used this offline dictation feature for years, by the way.
I do, but it's nowhere near as convenient as double tapping Fn anywhere there is editable text and speaking. It also requires much more system resources.
What interface do you use for using local whisper?
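(Not answering for the parent, but for anyone curious: one common route is the open-source openai-whisper Python package, which runs entirely on your machine after the one-time model download. A minimal sketch; the file name is a placeholder.)

    import whisper

    model = whisper.load_model("base")          # downloaded once, then runs fully offline
    result = model.transcribe("dictation.wav")  # any local audio recording
    print(result["text"])

whisper.cpp is another popular option if you'd rather avoid a Python runtime.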
Word of warning: businesses, possibly including your employer, may consider you to be a robot, or a terrorist, or tell you to install Windows, or tell you to use your phone, or tell you that you have no right to privacy.
I recently had to go through a background check for a prospective employer. The background check website wouldn't even load. The support agent told me it was a Firefox problem, that I needed to open the website in Chrome or Edge or on my phone, and that the website was working "perfectly". As it turned out, it worked fine with Firefox on Linux, as long as the user agent reported that it was actually Chrome on Windows.
Yeah the website is indeed working "perfectly": perfectly enough to block employment of people who care about privacy.
I always keep one Windows or Mac machine around for work-specific things; if Linux is unsupported I can switch. Heck, you could get a free Windows dev VM from Microsoft (they rotate them out every 3 months).
It's not just about us, it's about less-technical people who might adopt libre operating systems if they're easy to use, but not if web services intentionally refuse to do business with them.
> but not if web services intentionally refuse to do business with them.
And in many ways, I'd be fine with that if they'd be up-front and honest about it.
Alas, Microsoft certainly is not. Cloudflare is not either. Many of these services just sit there and pretend like they're loading without showing any sort of error indication. Much like a tarpit but it's malicious on the business side and with little to no recourse on the real human side.
Microsoft may be malicious, but it's more likely just incompetent.
Cloudflare is not malicious, but it is between a rock and a hard place. By caring about privacy you are, indeed, looking more suspicious. In a perfectly anonymous world, reputation-based captchas couldn't work. It's OK if you think they shouldn't exist, but Cloudflare's customers, and most people, like them.
Everyone else is not on some secret plan to destroy the Linux desktop; they just don't test their websites on Linux/Firefox (because "nobody uses that"), which makes them unusable, which causes people to drift away from Linux/Firefox.
Keep an eye on [Asahi Linux](https://asahilinux.org/), then. A cursory glance shows Me support not being complete yet, but I assume it will be in time (and the missing stuff may or may not be a show stopper for you).
I was just forced to migrate to MacOS in my new job and I can't really understand developers saying that Linux is not usable as a desktop machine. For me, it's the other way round.
- Things that should be system settings are instead apps (amphetamine, rectangle).
- There's no way to move focus around directionally between windows with a keyboard.
- "Open file" windows give you nowhere to paste a path (tip: cmd + shift + g summons a path prompt).
- Full screen windows now must be managed like they've just become an entire workspace, and the underlying app may or may not support the un-full-screen button.
- The error messages don't give you enough info to actually act on them (apparently "Docker" will damage my computer and I should uninstall it, but it won't give me a path to the offending file, so I don't know how to uninstall it; also this warning returns if I close it, so it's just been hanging around for months).
My strategy for maintaining sanity is to do as much as possible through zellij (a terminal multiplexer), that way I can use the same muscle memory on Linux as well. As for the rest, I just try to ignore it.
I have found the same for the Mac: it has all the downsides of Windows, and all the downsides of Linux, and almost no upsides of its own. Sure, the hardware does plenty of nice things that are hard to find elsewhere, but the OS is so god-damn hostile to the kinds of people who can appreciate the hardware that it kind of defeats the purpose.
Oh I wouldn't go so far as "all the downsides of Windows". So far Apple has not been targeting me with a phishing campaign designed to get me to use their browser of choice.
I would use Linux if it supported offline installers.
As it is, I'll be sticking with heavily tweaked Windows to work with my several HDDs full of old software, and to avoid the Linux headaches of repos disappearing, deciding between Snap/AppImage/DEB, and general incompatibility with office documents, industrial tooling and Adobe software.
I'll only use Linux where I'm paid to at work. Thanks to Linus Torvalds' terrible software distribution model, I've had to do black magic to work around Anon's deprecation of Debian/Raspbian Stretch, on which our industrial network gateways run.
I was on POP for two years; now I've switched to an Arch derivative called EndeavourOS, which makes installing Arch a breeze. I did discover someone working on an atomic version of Arch (where the core OS is frozen for a set period of time, to ensure total stability, so nothing can break during that window) called Arkane Linux, which I might try.
I installed a copy of Windows 11 the other day for a new machine and it was INFURIATING.
In order to install without internet, or with an offline account, you MUST know a voodoo command and how to enter it. Their dark pattern at least used to be on the screen; they're out of their damn minds now.
Everyone has their breaking point with Microsoft, I hit mine and it’s been nothing but good for me.
There seems to be some tremendous confusion here. The vast majority of Alexa-family devices perform no local processing except for the activation word ("Alexa"). I didn't even realize that some of the more recent devices supported an opt-in for local processing.
This kind of makes sense, at least to me: local processing will always be limited. The entire premise of the original Echo devices was that all the magic happened in the cloud. It seems like not much has really changed?
Most Google devices do parallel local processing and cloud processing.
The local processing has less latency and works on unstable internet. It's perfect for tasks like 'set an alarm for 8am', even if offline.
The remote processing is good for better accuracy of complex words and queries.
The results are combined in the UI, making the whole thing feel less laggy (although IMO it still feels laggy to have to wait 1-2 seconds after asking a query to get results).
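A rough sketch of that race-local-against-cloud pattern, purely illustrative; local_recognize and cloud_recognize are hypothetical stand-ins, not any real Google API:

    import asyncio

    # Hypothetical recognizers: the local one is fast but rough,
    # the cloud one is slower but more accurate.
    async def local_recognize(audio: bytes) -> str:
        await asyncio.sleep(0.1)           # stand-in for on-device ASR latency
        return "set a alarm for 8 am"      # rough local transcript

    async def cloud_recognize(audio: bytes) -> str:
        await asyncio.sleep(1.5)           # stand-in for network round trip plus server ASR
        return "set an alarm for 8:00 AM"  # cleaner cloud transcript

    async def handle_query(audio: bytes) -> str:
        local = asyncio.create_task(local_recognize(audio))
        cloud = asyncio.create_task(cloud_recognize(audio))

        result = await local                     # act on the fast local parse first
        try:
            # upgrade to the cloud result if it arrives in time
            result = await asyncio.wait_for(cloud, timeout=2.0)
        except (asyncio.TimeoutError, OSError):  # offline or slow network: keep the local result
            cloud.cancel()
        return result

    print(asyncio.run(handle_query(b"")))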
Related: The Alexa feature "do not send voice recordings" you enabled no longer available (discuss.systems) | 929 points by luu 1 day ago | 664 comments | https://news.ycombinator.com/item?id=43385268
Pure anecdote, but it reminds me of the time I mentioned on a call that a particular piece of code was a time bomb waiting to explode, only to have Alexa wake up and listen; I was standing nearby and noticed the light. I immediately disconnected it and never looked back.
There's a way to download all of your Alexa requests. I recommend it to everyone. It was interesting and horrifying to get literally all of them, from day 1. I noticed how tired I sound in the mornings or evenings. I started understanding patterns of my thoughts and needs. The Alexa went to the bin quickly after that session of exploration and insight.
Related – is anyone working on an open home assistant? Google, Apple, Amazon are all taking so long to bring the latest advancements across to their products.
I did!
with their own new hardware (https://www.home-assistant.io/voice-pe/)
Sadly the microphones are way worse than, e.g., in an Alexa speaker.
Also, the performance of the "voice pipelines" (STT, LLM, TTS) is a bit of a pain, because they all run in sequence rather than, e.g., using streaming.
I don't understand why Alexa/Siri etc don't just keep their hardcoded rules for things like "set an alarm" and only ship things to a cloud LLM if they don't match a rule.
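That rule-first routing is easy to sketch; the rule table and cloud_llm() below are hypothetical placeholders, not how Alexa or Siri are actually implemented:

    import re

    def set_alarm(t): return f"Alarm set for {t}"        # stand-in local handlers
    def set_timer(d): return f"Timer set for {d}"
    def cloud_llm(q): return f"(would be sent to the cloud: {q!r})"

    # Hypothetical local rule table: pattern -> handler
    RULES = [
        (re.compile(r"set (an? )?alarm for (?P<time>.+)", re.I), lambda m: set_alarm(m["time"])),
        (re.compile(r"set (a )?timer for (?P<dur>.+)", re.I),    lambda m: set_timer(m["dur"])),
    ]

    def handle(utterance: str) -> str:
        for pattern, handler in RULES:
            m = pattern.match(utterance)
            if m:
                return handler(m)          # fast, offline, deterministic
        return cloud_llm(utterance)        # everything else falls back to the cloud model

    print(handle("set an alarm for 8am"))
    print(handle("how many flights go to New York each day?"))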
Put your iPhone in airplane mode and disable WiFi - most of the basic stuff like 'set an alarm', 'start a timer', etc. will still work. This has been the case for several years - offline Siri was one of the big things they added in iOS 15.
With the Alexa devices that have local processing hardware, that is exactly what happened (prior to this change, if you had the local only option set).
But now, in the age of LLMs, there are no "simple rules". An example. Say you live in San Francisco:
"Alexa, I'm thinking of going to New York, how many flights are there each day?"
This is a hard question, and one that will go to the cloud.
"Alexa, what's the weather"
This seems like an easy question. But with local only processing, you'd get the weather in San Francisco. But with the LLM, it will probably give you the weather in New York, which is most likely what you wanted if you asked these things just a few seconds apart.
It is more complexity than just "ship everything to an LLM and use tool calls", but the payoff - perfect behavior, along with offline support, for your most common inputs - is worth it I think.
I disagree about things being less consistent. Let's imagine a 100% LLM world - in this world, you use a bunch of training to try to get the LLM to match your hardcoded responses for common inputs. If you get your training really right, you get 100% accuracy for these inputs. In this world, no one is complaining about consistency! So why not just hardcode that behavior?
The whole benefit of LLMs is that humans are not consistent enough. Or at least Apple, Amazon, Google and Microsoft all believe normies don't want to be consistent enough to speak the most common inputs the same way, which would allow much simpler and more efficient approaches to voice input - like the ones that worked offline 15+ years ago on a regular PC.
LLMs are actually the only reason I'd consider processing voice in the cloud to be a good idea. Alas, knowing how the aforementioned companies designed their assistants in the past, I'm certain they'll find a way to degrade the experience and strip most of the benefits of having LLMs in the loop. After all, as past experience shows, you can't have an assistant letting you operate commercial products and services without speaking the brand names out loud. That's unthinkable.
It isn't an option for older devices anyway. The hardware for on-device processing was only in some recent models.
I think it's likely that they looked at the numbers and realized they were spending a lot of money putting NPUs on devices and maintaining separate voice parsing models for a very small minority of users.
Okay, with this level of integration into daily, private life (able to record background noise not related to the request… potentially or actually), consider this:
Is Alexa hearing a gunshot a request for assistance? It’s not a voice command, okay, but where does “oh that’s not our business” really end in vast data collection platforms such as this? Does Alexa have any duty to report voice requests about self-harm?
You would think we could actually do all voice processing locally now? The models that do voice, speech, and language processing aren't that big... an 8B model would be completely feasible for an affordable device, if not a home server.
You can process the voice to text locally, but it's what you do with the text afterwards that is done in the cloud.
In the age of LLMs, even a simple request like "set a timer" is sent to the cloud so that it can be processed in the context of what you've said previously, what devices you own, what time of day it is, etc. etc.
And FWIW, you will get a better voice to text in the cloud because that model will know about your device names and other details. For example, if you say, "turn on the kitchen light", the cloud knows you have a light called "kitchen", so if you slur a bit it can still figure it out.
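For what it's worth, that kind of device-name biasing can be approximated locally with plain fuzzy matching against the list of names the device already knows; a toy sketch with made-up names:

    import difflib

    DEVICE_NAMES = ["kitchen", "bedroom", "porch"]   # hypothetical local device registry

    def resolve_device(heard: str) -> str | None:
        # Map a slurred or mis-transcribed word to the closest known device name.
        matches = difflib.get_close_matches(heard.lower(), DEVICE_NAMES, n=1, cutoff=0.6)
        return matches[0] if matches else None

    print(resolve_device("kitchn"))   # -> "kitchen"
    print(resolve_device("garage"))   # -> None (unknown device)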
All of that could as easily be done locally. The Echo speaker is likely the hub for the IoT device controlling the kitchen light. None of the context you speak of requires "cloud".
It cannot. Keep in mind that the Alexa devices are built to be as cheap as possible, so they have minimum amounts of RAM and CPU. The tiniest of models can barely fit on the device.
> The Echo speaker is likely the hub for the IoT device controlling the kitchen light.
Generally, the devices controlled by Alexa are controlled over Wi-Fi through the device provider's own cloud APIs. Very few "Works with Alexa" devices can be controlled locally. But yes, some of them can. However, the Alexa device doesn't know it is called "kitchen".
> None of the context you speak of requires "cloud".
I just gave you a simple example. Here is a better one that I used down below:
Say you live in San Francisco:
"Alexa, I'm thinking of going to New York, how many flights are there each day?"
This is a hard question, and one that will go to the cloud.
"Alexa, what's the weather"
This seems like an easy question. But with local only processing, you'd get the weather in San Francisco. But with the LLM, it will probably give you the weather in New York, which is most likely what you wanted if you asked these things just a few seconds apart.
You must have missed the very first thing I said. The Alexa has very weak hardware. The home assistant is basically a Mac mini. And the costs of each reflect that.
A far cry from your claim of "it cannot be done". And the $40 retail sticker for that board doesn't seem to match up with your claim about pricing.
Even low-cost modern SoCs have NPUs with double-digit TOPS, and memory densities being what they are, there's very little excuse not to run a special-purpose language processing model on device. A 128 Gbit (8 GB) memory module goes for as little as $0.04 ea on digikey: https://www.digikey.com/en/products/filter/memory/774?s=N4Ig...
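(Rough arithmetic: an 8B-parameter model quantized to 4 bits is about 8 × 10⁹ × 0.5 bytes ≈ 4 GB of weights, so it would fit in a single 8 GB module; whether a low-cost NPU can serve it at acceptable latency is a separate question.)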
I would be excited if Apple would add great GPT answering capabilities on my first-gen homepods, even if it meant having to send all queries to the cloud. I can unplug them if I need privacy.
I only have one Alexa and it won't bother me to remove it or replace it with another HomePod. I only use it to check the weather and occasionally to listen to random facts while I get ready in the morning. The fact that it has a digital clock is a nice bonus that HomePods don't yet have, though.
I imagine Amazon would take a net financial loss on an Alexa customer, from the model execution costs, if that customer just sent garbage all day. It doesn't even need to be clever.