Is there any evidence of this? Or are there any groups monitoring the App Store binaries to tell if they're actually sending voice data over the wire?
Recording and interpreting speech would require a lot of CPU (if done on device) or network bandwidth (if uploaded to the cloud). Enough that it would be immediately obvious if apps were trying to do this.
That is, if they even could. iOS limits what apps can even do in the background and shows an icon when the microphone is in use in the background. Again, it would be obvious if apps were listening.
But let’s assume that somehow they managed to avoid all of these pitfalls and they were listening to conversations, performing speech to text, and uploading your conversations. This would require communication with their servers, which isn’t difficult to extract through basic reverse engineering. Many security researchers reverse engineer these communications on a regular basis to look for bugs, some of which can be worth six figures in these companies’ bounty programs. If there was an API for uploading your secret conversations, it would be the holy grail discovery for a security researcher. Someone would have found it.
The myth persists because coincidences will happen in high numbers at scale. If hundreds of millions of people are spending hours on social media each week, some number of them will see ads related to some conversation they had recently by pure random chance. Add in a general distrust for big tech companies right now and some subset of people will become convinced that their coincidences are evidence of a conspiracy.
We pay no attention to irrelevant ads; we're ad-blind to them. But the moment an ad is highly relevant to a recent conversation, it jumps out at us.
"See, this one ad is tied to this conversation I had yesterday: they must be listening", while in reality it's more like "this one ad, out of the hundreds I've seen in the last day, matches one of the dozens of conversations I had in the last day".
It's the reason I hadn't seen a Volvo for over 10 years until I mentioned that fact, then saw three that day.
So to your point, the reticular activating system (RAS) filters out/ignores 99 ads but will bring to your attention the one that is similar to something you've mentioned recently.
I don't know how valid it is though. But it could contribute to the whole thing I guess...
It was my understanding that typical phone voice audio is a few kilobits per second, not much more than a kilobyte per second. This would get positively lost in the noise of a modern smartphone's constant background connectivity, no?
(I largely agree with your other points, though my understanding of the original post was that it asked for specifics on who actually may be checking these things, rather than "somebody surely would've caught it")
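A back-of-envelope check supports the "lost in the noise" intuition. The codec rate and hours of speech below are assumptions (roughly what an Opus-like narrowband speech codec needs), not measurements:

```python
# Rough estimate of the data volume continuous voice upload would add.
# Assumptions (illustrative, not measured): a low-bitrate speech codec at
# ~6 kbit/s and 4 hours of actual captured speech per day.

CODEC_KBPS = 6              # kilobits per second, Opus-like narrowband speech
SPEECH_HOURS_PER_DAY = 4

bytes_per_second = CODEC_KBPS * 1000 / 8                        # 750 B/s while talking
mb_per_day = bytes_per_second * SPEECH_HOURS_PER_DAY * 3600 / 1e6

print(f"{bytes_per_second:.0f} B/s while talking")
print(f"{mb_per_day:.1f} MB/day uploaded")                      # ~10.8 MB/day
```

~11 MB/day would indeed disappear into total device traffic, which is why whole-device byte counts prove nothing either way; per-app inspection is what would catch it.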
If you’re only looking at total up/down traffic for the entire device, it might.
But we can inspect traffic per-app, including decoding the actual traffic (if you go through steps to circumvent certificate pinning).
Security researchers can, and do, look at the actual raw data coming out of apps. If voice was in there, it would be seen.
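A toy version of the kind of filter a researcher might run over decrypted per-app captures. The flow-record format, hostnames, and thresholds here are all invented for illustration; real tooling (e.g. a mitmproxy script) works on actual HTTP flows:

```python
# Toy heuristic over decrypted per-app flow records: flag uploads that look
# like they could carry audio (audio MIME types, or unusually large outbound
# payloads). Record format and thresholds are made up for this sketch.

AUDIO_TYPES = {"audio/ogg", "audio/opus", "audio/wav"}

def suspicious(flow: dict, min_upload_bytes: int = 200_000) -> bool:
    """Flag a flow whose content type or upload size suggests bulk audio."""
    if flow.get("content_type") in AUDIO_TYPES and flow["bytes_out"] > 10_000:
        return True
    return flow["bytes_out"] > min_upload_bytes

flows = [
    {"host": "graph.example.com",  "content_type": "application/json", "bytes_out": 4_200},
    {"host": "upload.example.com", "content_type": "audio/opus",       "bytes_out": 64_000},
]
flagged = [f["host"] for f in flows if suspicious(f)]
print(flagged)  # ['upload.example.com']
```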
The second point is: why would a company do that? If the NSA or whoever wants to snoop on you, they'll drop malware on your phone, subpoena your iCloud backups, etc., not try to coerce a company into recording people's conversations when said company has no desire to do so.
Edit: I was wrong, they were upfront with people about the purpose of recording (https://www.techrepublic.com/article/how-one-spanish-soccer-...)
> This functionality is not happening surreptitiously, as the app requests access to the microphone and geolocation service—it does not rely on a vulnerability to access these components without explicit permission
Users had to have the app open and grant the permission. They weren’t recording secretly in the background.
Here's a better article on this: https://www.techrepublic.com/article/how-one-spanish-soccer-...
From the article: "Despite this, users were not explicitly informed of the intended use of the microphone and geolocation permissions, which is central to the decision by AEPD to levy fines against LaLiga."
Sounds like they were spying while using microphone permissions, which is what the discussion is about.
However, your description of what it would need to work is woefully naïve. There were startup-funded apps that matched entire recorded song libraries to ambient sound (along with location) in the age of flip phones, to track copyright violations in clubs. They relied on matching compressed periodogram hashes against an error-tolerant database. The databases were large, but not unwieldy, and the computational load was reasonable even in the early 2000s. Crafting a per-person database that matches a large set of commercial words/names with very low bandwidth is not impossible. Machine learning has made it easier.
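A minimal sketch of the matching idea: quantize a signal's dominant frequency bins and hash them, so only a tiny hash needs to leave the device. Real systems use error-tolerant landmark hashes over spectrogram peaks; this toy version (naive DFT, pure stdlib) only shows the principle:

```python
import cmath
import hashlib
import math

def dft_magnitudes(samples):
    """Naive DFT magnitude spectrum; fine for a toy, use an FFT library for real audio."""
    n = len(samples)
    return [abs(sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))) for k in range(n // 2)]

def periodogram_hash(samples, top_k=1):
    """Hash the indices of the strongest frequency bins: a compact fingerprint."""
    mags = dft_magnitudes(samples)
    peaks = tuple(sorted(sorted(range(len(mags)), key=lambda k: -mags[k])[:top_k]))
    return hashlib.sha1(repr(peaks).encode()).hexdigest()[:12]

n = 64
tone_a = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]  # energy in bin 3
tone_b = [math.sin(2 * math.pi * 7 * t / n) for t in range(n)]  # energy in bin 7

print(periodogram_hash(tone_a) == periodogram_hash(tone_a))  # True: deterministic
print(periodogram_hash(tone_a) == periodogram_hash(tone_b))  # False: different peaks
```

The point being: the hash is a dozen characters, not an audio stream, which is why the bandwidth argument alone doesn't close the question.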
Edward Snowden famously removes the entire microphone array from his phones and only uses external mics: https://youtu.be/ucRWyGKBVzo?t=888
https://news.ycombinator.com/item?id=27932952 (Snowden on Privacy and Democracy)
https://twitter.com/snowden/status/1175422816403038208 (Snowden on smartphones)
Perhaps it’s content you read. Perhaps it’s another product you researched. Perhaps it’s something someone in your office or home looked for from the same IP. Perhaps another ad campaign triggered the discussion. Then you talk about it. Then you see an ad related to the original trigger, and you notice it (the Baader Meinhof bit).
And you have not actually answered the question. Which is, "have you actually VERIFIED that popular apps are not recording your conversations?"
A potentially better option would be if some independent non profit security firm audited major apps every once in a while, monitoring network traffic and such.
I think the difference here is that apps getting uploaded to the app store have static analysis run on them looking for unauthorized calls. Still doesn't rule it out, but adds an extra check to the process.
Does this include the newer Apple Axx-series chips with their increasing number of ML cores? Siri is supposed to run locally on device, and hardware is moving explicitly in this direction. If this were done locally, no actual voice data would need to be sent down the wire. Instead, a small compressed bit of text on keywords could be sent, either immediately or saved up until the user foregrounds the app. No tinfoil hat needed to see how your premise is pretty weak.
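To illustrate how small such a payload could be (the keywords and format here are invented):

```python
# If keyword spotting ran on-device, the exfiltrated payload could be tiny.
# Hypothetical example: three spotted keywords sent as a short string,
# easily mistaken for routine telemetry.
payload = "coffee,vacuum,holiday".encode("utf-8")
print(len(payload), "bytes")  # 21 bytes
```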
Which would then appear in the packet captures that can be performed on the traffic (after circumventing certificate pinning).
My premise wasn’t that each step was impossible. It was that many steps would have to take place without being detected anywhere along the chain.
Surreptitiously taking advantage of what you describe would be restricted to Apple, which is its own concern but a different topic than what we started with, which is 3rd-party apps.
I think the myth persists because Big Tech can learn more about you from browsing history and credit card purchases (from you and your friends and family) than most people realize.
Another, probably more common reason that people often don't think about: a participant in the conversation may have searched for the product, or be otherwise profiled for it, and you might get the ad because Big Ad knows you are related to them. For example, FB doesn't need to know the contents of your WhatsApp or FB Messenger messages to know how close you are to someone: how often you message, how many messages, at what times, etc.
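A toy version of that metadata-only closeness signal. The features and weights are arbitrary inventions for illustration; real systems are far more elaborate, but the key point stands: no message content is needed.

```python
# Estimate tie strength from message *metadata* alone, no content required.
# Features and weights below are invented for this sketch.

def tie_strength(msgs_per_week: float, reply_seconds: float, night_msgs_frac: float) -> float:
    """Higher = closer: frequent messages, fast replies, off-hours chats."""
    score = 0.0
    score += min(msgs_per_week / 50.0, 1.0) * 0.5        # volume, capped
    score += (1.0 / (1.0 + reply_seconds / 60.0)) * 0.3  # responsiveness
    score += night_msgs_frac * 0.2                       # off-hours proxy for intimacy
    return round(score, 3)

print(tie_strength(100, 30, 0.4))   # close friend: scores high
print(tie_strength(2, 7200, 0.0))   # acquaintance: scores low
```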
I'm skeptical regarding bandwidth.
Modern codecs can achieve spectacular compression; you don't necessarily need audio-CD-quality recordings, and Facebook has state-of-the-art hardware and software for AI/ML and a very well-trained workforce in the field.
If anything, one should regard Facebook as "capable of pulling this off".
For my own learning, I'm interested in this and what methods are used. Do you have any resources (blogs, videos, etc of these security researchers who are looking at these apps) you were thinking about in particular when you wrote this?
Or perhaps, they're just correlating the interests of people:
1. Person A reads about X online
2. Person A tells Person B about X
3. Person B sees ads about X, without ever having searched for it, concludes they're being spied on
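The steps above can be sketched as a trivial graph lookup (names, connections, and searches are hypothetical):

```python
# If the ad network knows A and B are connected (shared household IP,
# contact lists, co-location), A's browsing can seed B's ads with no
# microphone involved. All data here is invented.

connections = {"alice": {"bob"}, "bob": {"alice"}}
searched = {"alice": {"hiking boots"}}

def candidate_ads(user: str) -> set:
    """A user's own searches plus those of directly connected users."""
    ads = set(searched.get(user, set()))
    for friend in connections.get(user, set()):
        ads |= searched.get(friend, set())
    return ads

print(candidate_ads("bob"))  # {'hiking boots'}, though bob never searched for it
```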
I'm 99.9% sure WhatsApp does this, perhaps illegally or at least grey-area.
- there is no OS indicator
- it doesn’t require CPU time
- it doesn’t require much bandwidth
- it could happen only occasionally, and could have slipped past (automated) third-party security screening
I don’t know if they do it. But I assume they can and might have done this. Hence, no access to my photo library.
If the app does not explicitly say it does not, I'd assume it does. I don't know if facebook/instagram has said if they do or don't.
But I had always thought big tech would just correlate interests of people it knows spend time with each other. E.g. my father gets an e-bike, I get e-bike ads. It could of course be due to me scrolling over one e-bike ad less quickly once.
It's certainly more frequent than pure random chance would suggest, but that's because we say many things, and the things we say can be predicted (at least to some extent) by our browsing history, which is already constantly being churned through ML algorithms to decide which ads will be relevant to us. I'm also convinced location data is thrown in there too; after I recently started spending more time in a new area, I started getting ads for one of the adult braces companies that I've only ever seen there, a block away. In those cases there's a decent probability someone will mention it because it's new to them, only for the algorithms to also figure: "this is probably the kind of person who would be interested in this, and they're nearby now!" It happens frequently enough to almost everyone I know that the pure-randomness argument loses its credibility to me.
Of course, you discount the thousands of adverts you see which aren't about X.
Why are you and your friends talking about X? Because it's a demographically popular topic. And advertisers have noticed that $gender people of $age_range in $city are searching for X. So you get an advert. You're not as unique as you think!
Or, while hanging out, you're attached to the same WiFi spot. One of you does a search for X before talking about it. So the retargeting continues.
The key thing is - why are you talking about what you're talking about?
But yes, both iOS and Android have APIs for accessing the name of the connected wifi network. Access depends a bit on how old your OS is and what permissions your app has.
And even if you've locked down permissions for the facebook app, you might have a different app which includes an SDK that slurps up the wifi name, GPS, signal strength of nearby wifi networks, connected cell tower info, and nearby bluetooth devices. That info gets sold to a data broker, which facebook may buy to hit you with an ad for whatever your mate just googled on the other end of the couch.
The hotspot isn't sharing anything, but the apps are putting data together. Google says "This IP is looking this up, and the tracking matches one of your users" (or maybe they just firehose searches for relevant terms, I dunno). The Facebook app sees that, then sees that Dave is also using that IP for his Facebook app; put together with a list of Bob's and Dave's friends and IPs, it's a high probability that they are talking about it right now, so show Bob's nearby friends that ad.
No need, the internet is browsing you.
There is a Reply All episode from 2017 about it. The answer was no.
“Actively” listening all the time would simply consume too much CPU/bandwidth/battery.
What is your source on this? Every recognizable song is surely not stored on-device so something must be uploaded.
Which I feel is a huge security oversight. You should be able to choose from a list of activation keywords at least. I love being on video/voice chats with people that I know have a device and randomly say "Alexa, play Dancing Queen" or something equally obnoxious like "Alexa, turn off lights".
To make it more interesting, let's assume that 2021 algorithms can't do that. Will better low power algorithms be developed in the next 20 years? How is a democratic society supposed to work when would-be-snitches are ubiquitously deployed?
To make it even more interesting, as more and more of our communications move to the internet (oooOOOooo, covid, cower in fear), what confidence do you have that voice-to-text and voice fingerprinting do not already apply to all your phone/Zoom conversations? Do you run the phone/Zoom servers and monitor all potentially suspicious traffic? Have you heard of one Edward Snowden?
You are asked to do a few iterations of training so that the phone has a model of your voiceprint for "Hey Siri". That's why, when someone whose voice doesn't sound like yours says "Hey Siri", it doesn't trigger, while someone with a similar voice can sometimes trigger it. The model is super basic.
The microphone is always passively passing input data to the "Hey Siri" detector (and a buffer), which acts as a sort of internal switch. If it detects the trigger phrase, it then sends the audio data to the cloud to be transcribed, and acted upon accordingly.
It is not always sending voice data to the cloud. If it were, your battery life would be noticeably impacted - worse than being on a phone call.
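The gate described above can be sketched like this. The "detector" here is a stand-in string check (real wake-word detectors are tiny on-device neural nets), and the streaming of follow-up speech is omitted to keep the sketch short:

```python
from collections import deque

# Sketch of the on-device gate: audio chunks stream into a short ring
# buffer, and nothing leaves the device until the (stand-in) wake-word
# detector fires. Only then is the buffered snippet sent to the cloud.

BUFFER_CHUNKS = 5
buffer = deque(maxlen=BUFFER_CHUNKS)
uploaded = []  # stands in for "sent to the cloud"

def wake_word_detected(chunk: str) -> bool:
    """Stand-in for a tiny on-device model that only knows one phrase."""
    return "hey siri" in chunk.lower()

def on_audio_chunk(chunk: str) -> None:
    buffer.append(chunk)
    if wake_word_detected(chunk):
        # Only now does audio leave the device: the short buffered context.
        # (A real assistant would also stream the speech that follows.)
        uploaded.extend(buffer)
        buffer.clear()

for chunk in ["blah", "more blah", "Hey Siri", "what's the weather"]:
    on_audio_chunk(chunk)

print(uploaded)  # ['blah', 'more blah', 'Hey Siri']
```

Everything before the trigger either stays in the short buffer or is overwritten, which is consistent with the battery and bandwidth observations above.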
Myself, I believe any technical reason, if it actually exists, is a secondary factor; the primary reason is branding. They don't want people changing wake words, because as long as everyone uses the same ones, "OK Google" and "Siri" are terms/brands.
I also believe it is restricted mainly for branding and easy marketing.
The devices aren’t translating all speech to text all of the time and then checking it against the phrase “OK Google”. They’re looking for the trigger phrase.
But "listening, on everyone" and "all the time" are not necessary assumptions.
No, because this is a conspiracy theory. The burden of proof is on whoever makes the claim, not the other way around.