Ask HN: Is anyone monitoring popular apps to check if they're listening?
94 points by gitgud 30 days ago | 91 comments
Everyone's heard the conspiracy that Facebook and Instagram record audio from your phone's microphone and use it to recommend you advertising...

Is there any evidence of this? Or are there any groups monitoring the app store binaries to tell if they are actually sending voice data over the wire?

No, Facebook and Instagram are not monitoring conversations. Yes, many security researchers routinely examine traffic from these apps to see what’s being communicated with the servers.

Recording and interpreting speech would require a lot of CPU (if done on device) or network bandwidth (if uploaded to the cloud). Enough that it would be immediately obvious if apps were trying to do this.

That is, if they even could. iOS limits what apps can even do in the background and shows an icon when the microphone is in use in the background. Again, it would be obvious if apps were listening.

But let’s assume that somehow they managed to avoid all of these pitfalls and they were listening to conversations, performing speech to text, and uploading your conversations. This would require communication with their servers, which isn’t difficult to extract through basic reverse engineering. Many security researchers reverse engineer these communications on a regular basis to look for bugs, some of which can be worth six figures in these companies’ bounty programs. If there was an API for uploading your secret conversations, it would be the holy grail discovery for a security researcher. Someone would have found it.

The myth persists because coincidences will happen in high numbers at scale. If hundreds of millions of people are spending hours on social media each week, some number of them will see ads related to some conversation they had recently by pure random chance. Add in a general distrust for big tech companies right now and some subset of people will become convinced that their coincidences are evidence of a conspiracy.
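The "coincidences at scale" argument can be made concrete with a toy calculation. This is only a sketch: the number of ads and conversations per day, the per-pair match probability, and the user count are all made-up assumptions, not measurements.

```python
# Toy model: every (ad, conversation) pair "matches" independently with
# probability q. Every number below is an illustrative assumption.
ADS_PER_DAY = 100            # assumed ads seen per user per day
CONVERSATIONS_PER_DAY = 10   # assumed conversations per user per day
q = 1e-5                     # assumed chance that one ad fits one conversation
USERS = 100_000_000          # assumed user base ("hundreds of millions")


def p_at_least_one_match(ads: int, conversations: int, pair_prob: float) -> float:
    """Probability that at least one ad matches at least one conversation."""
    pairs = ads * conversations
    return 1.0 - (1.0 - pair_prob) ** pairs


p_day = p_at_least_one_match(ADS_PER_DAY, CONVERSATIONS_PER_DAY, q)
expected_people = USERS * p_day

print(f"chance of a 'spooky' match for one user on one day: {p_day:.2%}")
print(f"expected users with such a match per day: {expected_people:,.0f}")
```

With these inputs, a match is rare for any one person (about 1% per day) but still happens to roughly a million people every day. Changing q by an order of magnitude changes the totals accordingly, so the result says nothing about the real rate.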

It's not only a matter of "big numbers", but also of confirmation bias.

We never pay any attention to irrelevant ads - we're blind to them. But the moment we see an ad that is very relevant to a recent conversation, it inevitably stands out.

"See, this one ad is tied to this conversation I had yesterday: they must be listening", while in reality it's more like "See, this one ad out of 100s I've seen in the last day is tied to this one conversation out of 10s I had in the last day".

This is correct. We even have a brain circuit for this: the reticular activating system.

It's the reason I hadn't seen a Volvo for over 10 years until I mentioned that fact, then saw three that day.

So to your point, the RAS filters out/ignores 99 ads but will bring to your attention the one that is similar to something you've mentioned recently.

I also have one more theory which is that this effect can be caused by the network aspect of Facebook. Maybe I haven't been interested in a specific topic, but one of my friends was. And Facebook is serving me ads because they were relevant for my friends so they might be relevant to me. And the conversations where I heard about a topic are a consequence of that same thing -- one of my friends is interested in a certain topic.

I don't know how valid it is though. But it could contribute to the whole thing I guess...

"Recording and interpreting speech would require a lot of CPU (if done on device) or network bandwidth"

It was my understanding that typical phone voice audio is a few kilobits per second - not much more than a kilobyte per second. This would get positively lost in the noise of a modern smartphone's constant background connectivity, no?
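To put rough numbers on that, here is a small sketch converting a constant bitrate into daily volume. The bitrates in the loop are assumed examples rather than figures for any particular codec or app, and it models continuous streaming, which is the worst case.

```python
def bytes_per_day(bitrate_kbps: float, hours: float = 24.0) -> float:
    """Bytes produced by a constant-bitrate stream over the given number of hours."""
    bits_per_second = bitrate_kbps * 1000
    return bits_per_second / 8 * hours * 3600


# Assumed example bitrates, from very low-rate speech up to 64 kbit/s.
for kbps in (2.4, 8, 13, 64):
    mb = bytes_per_day(kbps) / 1e6
    print(f"{kbps:>5} kbit/s -> {kbps / 8:.2f} kB/s -> {mb:,.0f} MB per 24 h")
```

At 8 kbit/s (1 kB/s) a round-the-clock stream is about 86 MB per day. A scheme that only uploaded while someone was speaking would use a fraction of that.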

(largely agree with your other points, though my understanding of original post was question for specifics on who actually may be checking these things rather than "somebody surely would've caught it")

> This would get positively lost in the noise of modern smartphone constant background connectivity, no?

If you’re only looking at total up/down traffic for the entire device, it might.

But we can inspect traffic per-app, including decoding the actual traffic (if you go through steps to circumvent certificate pinning).

Security researchers can, and do, look at the actual raw data coming out of apps. If voice was in there, it would be seen.

The average person speaks 16k words per day. Android considers app background traffic excessive at 50MB / hour, which is about 1.2GB / day. 10% of that rate, to steer clear of raising red flags, is about 120MB / day, or roughly 7.5kB for each word you speak. It is rather trivial to hide a word steganographically in 7.5kB of background traffic.
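Taking that comment's inputs at face value, the arithmetic can be re-run in a few lines. The 16k words/day and 50MB/hour figures are the commenter's and are not verified here; megabytes are treated as 10^6 bytes.

```python
WORDS_PER_DAY = 16_000            # commenter's figure
ANDROID_LIMIT_MB_PER_HOUR = 50    # commenter's figure, not verified here
FRACTION_USED = 0.10              # stay at 10% of the limit

limit_mb_per_day = ANDROID_LIMIT_MB_PER_HOUR * 24
budget_mb_per_day = limit_mb_per_day * FRACTION_USED
budget_kb_per_word = budget_mb_per_day * 1000 / WORDS_PER_DAY

print(f"limit:  {limit_mb_per_day} MB/day")
print(f"budget: {budget_mb_per_day:.0f} MB/day")
print(f"budget per spoken word: {budget_kb_per_word:.1f} kB")
```

The per-word budget lands in the single-digit-kilobyte range, which is far more than a compressed word of text needs.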



You have to calculate the amount of data provided from all users in this case + storage + processing power to do all of that. It’s probably more than you would expect too.

The second point is why would a company do that? If the NSA or whomever wants to snoop on you, they’ll drop malware on your phone, subpoena your iCloud backups etc. not try to coerce a company to record people’s conversations when said company has no desire to do so.

Any insight into how this was uncovered? A sports league secretly used their Android app's microphone to uncover illegal broadcasts.


Edit: I was wrong, they were upfront with people about the purpose of recording (https://www.techrepublic.com/article/how-one-spanish-soccer-...)

Fourth paragraph of the article explains it:

> This functionality is not happening surreptitiously, as the app requests access to the microphone and geolocation service—it does not rely on a vulnerability to access these components without explicit permission

Users had to have the app open and grant the permission. They weren’t recording secretly in the background.

edit: ah yes, I was wrong here, they were disclosing

Here's a better article on this: https://www.techrepublic.com/article/how-one-spanish-soccer-...

there wasn't anything to "discover", the app literally was telling users that was what it was doing.

edit: I was wrong, they were pretty up front with users about the purpose of recording (https://www.techrepublic.com/article/how-one-spanish-soccer-...)

From the article: "Despite this, users were not explicitly informed of the intended use of the microphone and geolocation permissions, which is central to the decision by AEPD to levy fines against LaLiga."

Sounds like they were spying while using microphone permissions, which is what the discussion is about.

And they prompted people to opt-in (with somewhat leading language) into this system to search for unauthorized playback. So there was no need to "discover" that the app did that, because the app literally said it had a system to do so. There is a range between "secretly" (= would need research attention to discover it happens) and "done correctly legally".

Thanks for this! Found another article where this is better explained: https://www.techrepublic.com/article/how-one-spanish-soccer-...

I agree that the platforms don't currently allow them default access to microphones, but that's not how it used to be. I also agree that we would have likely heard about this (old) capability, due to the number of people who would have worked on it (unless they were supported by a TLA).

However, your description of what it would need to work is woefully naïve. There were startup funded apps that matched entire recorded song libraries to ambient sound (along with location) in the age of flip phones to track copyright violations in clubs. They relied on matching compressed periodogram hashes with an error tolerant database. The databases were large, but not unwieldy, and the computational load reasonable in the early 2000s. The ability to craft a database to a person to match a large set of commercial words/names with very low bandwidth is not impossible. Machine learning has made it easier.
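As a rough illustration of what an "error tolerant" lookup means, here is a toy sketch that matches a noisy fingerprint against a small database by Hamming distance. It is not the periodogram hashing described above: the fingerprints are made-up 32-bit values, the labels are hypothetical, and the step that turns audio into a fingerprint is left out entirely.

```python
from typing import Dict, Optional

# Hypothetical database: label -> 32-bit fingerprint (values are made up).
DATABASE: Dict[str, int] = {
    "track_a": 0b1011_0010_1110_0001_0101_1100_0011_1010,
    "track_b": 0b0100_1101_0001_1110_1010_0011_1100_0101,
}


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def best_match(query: int, max_distance: int = 4) -> Optional[str]:
    """Return the closest label if it is within max_distance bits, else None."""
    label, fp = min(DATABASE.items(), key=lambda item: hamming(query, item[1]))
    return label if hamming(query, fp) <= max_distance else None


noisy = DATABASE["track_a"] ^ 0b101  # flip two bits to simulate noise
print(best_match(noisy))
```

The linear scan is fine for a toy. A database of any real size would need an index built for approximate matching.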

Shazam was bought by Apple several years ago, but identifying music is a different beast from picking out the phrase Hey Siri/Google/Alexa.

Those are both additional proof cases, but I wasn't talking about either Shazam (which does a lot of processing remotely) or "wake words", although I'm familiar with the accuracy and compute power requirements for those. Multi-keyword detection for advertising with >50% FAR/FRR is much easier than either of them. A conversation is likely to include the word (or related ones) multiple times, from multiple parties, and the matcher can be tuned to the individual locally.

The advertisements I see online or in email spam happen to coincide with whatever I had just been discussing with my mates. This has been the case for a good part of three years. Maybe it's the Baader–Meinhof phenomenon, as someone else in this thread points out, but the ads I see, at the time I see them, make me quite uneasy, knowing what I know from being in tech.

Edward Snowden famously removes the entire microphone array from his phones and only uses external mics: https://youtu.be/ucRWyGKBVzo?t=888

See also: https://news.ycombinator.com/item?id=27932952 (Snowden on Privacy and Democracy)

https://twitter.com/snowden/status/1175422816403038208 (Snowden on smartphones)

The simplest explanation of this is that there’s an antecedent reason you’re discussing what you’re discussing, and that’s the reason that correlates to the ads you see, rather than your discussion itself.

Perhaps it’s content you read. Perhaps it’s another product you researched. Perhaps it’s something someone in your office or home looked for from the same IP. Perhaps another ad campaign triggered the discussion. Then you talk about it. Then you see an ad related to the original trigger, and you notice it (the Baader Meinhof bit).

IMO this is the real story. Tech companies know so much about you from your browsing history and your purchase history. But they also know this about your friends and family, which I think increases the creepiness factor, as they may serve you ads based on _your friends & family's_ activity.

You have started with a conclusion and filled in the blanks with your personal opinions.

And you have not actually answered the question. Which is, "have you actually VERIFIED that popular apps are not recording your conversations?"

Depends on what level you trust. For me, I'd generally trust OS indicators. So, if iOS doesn't show me a mic-on indicator when Facebook is not open, I'll consider that verification.

A potentially better option would be if some independent non profit security firm audited major apps every once in a while, monitoring network traffic and such.

"If there was an API for uploading your secret conversations, it would be the holy grail discovery for a security researcher. Someone would have found it."

"That 20 dollar bill lying on the ground in front of me clearly does not exist. Somebody would have picked it up already!"

Yeah the logic here is seriously lacking. Heartbleed shows that it's entirely possible for something to go unnoticed for years! Or the most recent Apple Bootrom exploit.

Or, y'know, the NSO group exploit.

I think the difference here is that apps getting uploaded to the app store have static analysis run on them looking for unauthorized calls. Still doesn't rule it out, but adds an extra check to the process.

>Recording and interpreting speech would require a lot of CPU (if done on device)

Does this include the newer Apple Axx series units with an increasing number of ML cores? Siri is supposed to run locally on device. Hardware is moving explicitly in this direction. If this is done locally, no actual voice data would need to be sent down the wire. Instead, a small compressed bit of text on keywords could be sent, whether that's immediately or saved up until the user foregrounds the app. No tinfoil hat needed to see how your premise is pretty weak.

> Instead, a small compressed bit of text on keywords could be sent, whether that's immediately or saved up until the user foregrounds the app.

Which would then appear in the packet captures that can be performed on the traffic (after circumventing certificate pinning).

My premise wasn’t that each step was impossible. It was that many steps would have to take place without being detected anywhere along the chain.

How do you distinguish the, assumed, encrypted transmission from a foreground app sending this data compared to just requesting an update for the app's viewport?

With a fully reverse-engineered app, this would be possible. But I believe the effort for this is quite high. And with obfuscation, even higher.

> increasing number of ML cores? Siri is supposed to run locally on device. ... If this is done locally, no actual voice data would need to be sent down the wire.

Surreptitiously taking advantage of what you describe would be restricted to Apple, which is its own concern but a different topic than what we started with, which is 3rd-party apps.

> The myth persists because coincidences will happen in high numbers at scale.

I think the myth persists because Big Tech can learn more about you from browsing history and credit card purchases (from you and your friends and family) than most people realize.

> If hundreds of millions of people are spending hours on social media each week, some number of them will see ads related to some conversation they had recently by pure random chance

Another, probably more common reason that people often don’t think about is that a participant in such a conversation may have searched for said product, or be otherwise profiled for it, and you might get the ad because Big Ad knows you are related to them. For example, FB doesn’t need to know the contents of your WhatsApp or FB Messenger messages to know how close you are to someone: how often you message, how many messages, when, etc.

> Recording and interpreting speech would require a lot of CPU (if done on device) or network bandwidth

I'm skeptical regarding bandwidth.

Modern codecs can achieve spectacular compression, you don't necessarily need to record audio-cd-level quality audio, and Facebook has state of the art hardware and software in terms of ai/ml and a very well trained workforce in the field.

If anything, one should regard Facebook as "capable of pulling this off".

That's assuming the audio gets uploaded. For the purposes of serving more relevant ads, simple hotword detection need only be good enough to distinguish eg Nike vs Adidas, and could be done locally. That changes the upload to mere bytes, which is way more easily obfuscated in any number of opaque IDs. fbclid= anyone?

The coincidences are amplified by the fact that apps generally do send location data. When you visit friends, you generally talk about stuff you've been looking at lately, so it's certainly plausible that linking profiles based on proximity and similarity of interests means there's a much higher chance that advertising interests spread from one user to another, merely by geography.

> Yes, many security researchers routinely examine traffic from these apps to see what’s being communicated with the servers.

For my own learning, I'm interested in this and what methods are used. Do you have any resources (blogs, videos, etc of these security researchers who are looking at these apps) you were thinking about in particular when you wrote this?

> some number of them will see ads related to some conversation they had recently by pure random chance.

Or perhaps, they're just correlating the interests of people:

1. Person A reads about X online

2. Person A tells Person B about X

3. Person B sees ads about X, without ever having searched for it, concludes they're being spied on

I'm 99.9% sure WhatsApp does this, perhaps illegally or at least grey-area.

Maybe WhatsApp knows that these 2 people are friends, so it tries to serve person B ads that worked for person A.

That was the point I was trying to make. Maybe I wasn't clear.

So, does Instagram upload metadata such as GPS info for your entire photo library if you give it access?

- there is no OS indicator

- it doesn’t require CPU time

- it doesn’t require much bandwidth

- it could happen only occasionally and could’ve slipped past (automated) third-party security screening

I don’t know if they do it. But I assume they can and might have done this. Hence, no access to my photo library.

It's reasonable to assume that any photo sharing app checks EXIF data (and uploads it), including geotags. For some people this is something they want, as some people want the app to be able to show that info or be able to search their library by date/location.

If the app does not explicitly say it does not, I'd assume it does. I don't know if facebook/instagram has said if they do or don't.

Confirmation bias, coincidences at scale, sure, certainly a part of it.

But I had always thought big tech would just correlate interests of people it knows spend time with each other. E.g. my father gets an e-bike, I get e-bike ads. It could of course be due to me scrolling over one e-bike ad less quickly once.

I've had at least 10 situations where Facebook or YouTube recommend me an ad or video based on a real-life conversation which took place around 1 minute before.

> The myth persists because coincidences will happen in high numbers at scale. If hundreds of millions of people are spending hours on social media each week, some number of them will see ads related to some conversation they had recently by pure random chance.

It's certainly more frequent than just by random chance, but by virtue of the fact that we say many things, and the things we will say can be predicted (at least to some extent) by our browsing history, which already is constantly being churned through however many ML algorithms to know what ads will be relevant to us. I'm also convinced location data is thrown in there too; after I recently started spending more time in a new area, I started getting ads for one of the adult braces companies that I've only ever seen there a block away. In those cases there's a decent probability someone will mention it because it's new to them, only for the algorithms to also figure: 'this is probably the kind of person that would be interested in this, and they're nearby now!' It happens frequently enough to almost everyone I know that the random argument loses its credibility to me.

If they are not actually listening, has someone mapped out how I get an ad related to a verbal conversation? What did they do to put the pieces together? I would imagine several different scenarios could produce an actual match, but I've wondered about this at least a couple of times (too close a coincidence, that is).

They don’t need to listen to audio, which is very cumbersome and error prone, if they have your location. If your location is available, so are all of your friends’ locations. The apps can easily see who’s in the same place at the same time and generate a “friend” network. Presumably you talk to your friends about your thoughts, and they talk to their friends. All it takes is a few people to start googling something new (or looking up products on Amazon, or searching for the same tags on Instagram, etc.) in this network for the algorithm to realize this might be something you’re ALL talking to each other about, and thus advertise straight to you, giving the perfect illusion that whatever you’re talking about is advertised back to you.

It's a mixture of things. The first is "Baader–Meinhof phenomenon" / frequency bias. Your friends talk about X, all of a sudden you notice adverts about X.

Of course, you discount the thousands of adverts you see which aren't about X.

Why are you and your friends talking about X? Because it's a demographically popular topic. And advertisers have noticed that $gender people of $age_range in $city are searching for X. So you get an advert. You're not as unique as you think!

Or, while hanging out, you're attached to the same WiFi spot. One of you does a search for X before talking about it. So the retargeting continues.

The key thing is - why are you talking about what you're talking about?

Can all apps access information on the wifi hotspot that a user is connected to? Seems like an information leak that should be closed by Apple/Google.

The server the app is communicating with can see the IP address of the user, which may be enough to uniquely identify multiple users connected on the same network.

But yes, both iOS[1] and Android[2] have APIs for accessing the name of the connected wifi network. Access depends a bit on how old your OS is and what permissions your app has.

And even if you've locked down permissions for the facebook app, you might have a different app which includes an SDK that slurps up the wifi name, GPS, signal strength of nearby wifi networks, connected cell tower info, and nearby bluetooth devices. That info gets sold to a data broker, which facebook may buy to hit you with an ad for whatever your mate just googled on the other end of the couch.

[1] https://developer.apple.com/forums/thread/679038

[2] https://stackoverflow.com/questions/21391395/get-ssid-when-w...

Bob searches for "new Ford car" on Google, Dave is sat next to him in the pub, he opens up his Facebook, and sees an ad for a new Ford car.

The hotspot isn't sharing anything, but the apps are putting data together. Google says "This IP is looking this up, and the tracking matches one of your users" (Or maybe they just firehose searches for relevant terms? I dunno.) The Facebook app sees that, then sees that Dave is also using that IP for his Facebook app. Put together with a list of Bob's and Dave's friends and IPs, it's a high probability that they are talking about it currently, so it shows Bob's friends nearby that ad.
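The shared-IP inference in the Bob and Dave scenario can be sketched as a simple grouping step. This is a guess at the general shape of such logic, not a description of any company's system: the log format, the time window, and the third user are all hypothetical, and the IPs are documentation-range addresses.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical server-side log rows: (user, public_ip, minute_of_day).
# "bob" and "dave" come from the example above; "erin" is made up.
LOG: List[Tuple[str, str, int]] = [
    ("bob", "203.0.113.7", 1201),
    ("dave", "203.0.113.7", 1203),
    ("erin", "198.51.100.9", 1202),
]


def co_located(log: List[Tuple[str, str, int]], window_min: int = 10) -> Set[Tuple[str, str]]:
    """Pairs of users seen on the same public IP within the same time bucket."""
    buckets: Dict[Tuple[str, int], Set[str]] = defaultdict(set)
    for user, ip, minute in log:
        buckets[(ip, minute // window_min)].add(user)
    pairs: Set[Tuple[str, str]] = set()
    for users in buckets.values():
        ordered = sorted(users)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                pairs.add((a, b))
    return pairs


print(co_located(LOG))
```

Fixed time buckets miss pairs that straddle a bucket boundary, and a shared public IP (an office, a carrier NAT) does not prove two people are together, so a real system would presumably combine this with other signals.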

Wondering the same thing. I'm getting targeted ads based on my verbal conversations, way too many for coincidence.

It's no longer as sensible as it used to be to browse the internet any more.

No need, the internet is browsing you.

> Is there any evidence of this?

There is a Reply All episode from 2017 about it. The answer was no.


How do Android and iOS know when you say "OK Google" or "Hey Siri" if they are not listening all the time? Or is it accepted that they are?

The listening for the initial keyword (e.g. “Hey Siri”) is done on hardware designed specifically for that purpose. That’s why you can’t change the keyword to whatever you want. Only when that keyword is heard does it actively listen to and process whatever follows.

“Actively” listening all the time would simply consume too much CPU/bandwidth/battery.

Don't Android phones have passive listening? Pixel phones have always-on song detection (Shazam-like) and usually display song info on the lockscreen within ~10 seconds of the song being played nearby.

That works somewhat similarly to the "Ok Google". Dedicated chip and super small on-device ML model. Nothing goes back to the cloud.

> Nothing goes back to the cloud

What is your source on this? Every recognizable song is surely not stored on-device so something must be uploaded.

It's not every song, the model is built from a list of N (100k?) most popular songs. If you pick something a bit obscure it won't work. The model is updated on occasion and pushed in bulk to pixel devices.

I've tried putting my phone in airplane mode and the feature still works. You can easily try that yourself.

That says the analysis is done locally. It does not say nothing goes to the cloud.

It is actually stored in the device. A new database is downloaded from time to time

How do you know nothing goes back to the cloud, like metadata?

>That’s why you can’t change the keyword to whatever you want.

Which I feel is a huge security oversight. You should be able to choose from a list of activation keywords at least. I love being on video/voice chats with people that I know have a device and randomly say "Alexa, play Dancing Queen" or something equally obnoxious like "Alexa, turn off lights".

What exactly prevents the 'hardware designed specifically for that purpose' from including in its purpose the translation of your voice to text, with voice fingerprinting for good measure? Have you run an electron microscope atom-layer slicing of the hardware to guarantee its complete set of functionalities?

Power consumption

The hardware runs an algorithm to detect the wakeword. What makes you confident that a different algorithm can't be run to translate voice to text? They will be in the same family of algorithms, just with different purposes.

To make it more interesting, let's assume that 2021 algorithms can't do that. Will better low power algorithms be developed in the next 20 years? How is a democratic society supposed to work when would-be-snitches are ubiquitously deployed?

To make it even more interesting, as more and more of our communications move to the internet (oooOOOooo, covid, cower in fear), what confidence do you have that voice2text and voice fingerprinting are not already applied to all your phone/Zoom conversations? Do you run the phone/Zoom servers and monitor all potentially suspicious traffic? Have you heard of one Edward Snowden?

When you train your phone to understand "Hey Siri" it requires an internet connection for the training so that it can verify that you are actually saying, "Hey Siri" and not "Yo, phone spirit". After the initial training, it no longer requires (or uses) the internet to detect the keyphrase.

You are asked to do a few iterations of training so that the phone has a model on your voiceprint for "Hey Siri". Which is why, when someone who does not have a similar voice to you says "Hey Siri" it does not trigger. But someone who does have a similar voice to you can sometimes trigger it. The model is super basic.

The microphone is always passively passing input data to the "Hey Siri" detector (and a buffer), which acts as a sort of internal switch. If it detects the trigger phrase, it then sends the audio data to the cloud to be transcribed, and acted upon accordingly.
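The "internal switch" behaviour described here can be sketched with a small ring buffer and a gate. This is a minimal illustration under stated assumptions, not Apple's implementation: strings stand in for audio frames, the detector is a hypothetical predicate, and the pre-roll and capture lengths are arbitrary.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List


def gated_stream(
    frames: Iterable[str],
    is_trigger: Callable[[str], bool],
    pre_roll: int = 2,
    capture: int = 3,
) -> List[str]:
    """Return only the frames that would be forwarded after a trigger.

    Frames are discarded as they fall out of a small ring buffer unless the
    (hypothetical) detector fires; then the buffered pre-roll plus the next
    `capture` frames are forwarded.
    """
    ring: Deque[str] = deque(maxlen=pre_roll)
    forwarded: List[str] = []
    remaining = 0
    for frame in frames:
        if remaining > 0:
            forwarded.append(frame)
            remaining -= 1
        elif is_trigger(frame):
            forwarded.extend(ring)
            forwarded.append(frame)
            ring.clear()
            remaining = capture
        else:
            ring.append(frame)
    return forwarded


audio = ["chat1", "chat2", "chat3", "hey siri", "what's", "the", "weather", "chat4"]
print(gated_stream(audio, lambda f: f == "hey siri"))
```

Everything outside the window around the trigger ("chat1" and "chat4" in this run) is dropped without being forwarded, which is the property the comment is describing.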

It is not always sending voice data to the cloud. If it were, your battery life would be noticeably impacted - worse than being on a phone call.

Good explanation, thanks. The scenario is 'translate on device, steganograph the output on background traffic'. There is a huge push for 'on device ML', driven by noble intentions of hiding raw data from Skynet. Alas, the end of that road is devices that compress the raw sensorial input and report, possibly stealthily, the digest to Skynet.

This explanation does not explain why neither Google nor Apple allow you to change the wake word.

Myself, I believe any technical reason, if it actually exists, is a secondary factor - the primary reason is branding. They don't want people changing wakewords, because as long as everyone uses the same, "Ok Google" and "Siri" are terms/brands.

To my understanding, there is no technical reason behind not being able to use custom wakewords besides them just not wanting you to.

I also believe it is restricted mainly for branding and easy marketing.

It’s less CPU intensive to monitor for a single trigger phrase (“OK Google”) after which the more general, CPU-intensive speech to text is performed (either on device or in the cloud).

The devices aren’t translating all speech to text all of the time and then checking it against the phrase “OK Google”. They’re looking for the trigger phrase.

They are listening all the time. Presumably 'just' for the wakeword. Better yet, Alexa listens to everything you do inside your house, and the Ring Network listens to everything you do outside your house. And both of them are connected to the power outlet, so the 'but battery depletion' excuse doesn't apply.

In this case, the microphones are in fact on and processed by some relatively cheap (in terms of power consumption) algorithm whose sole purpose is detecting that phrase. Only after it detects that phrase does it start processing speech in any significant way

I wonder if anyone is monitoring the apps you can get on pirate bay (windows, photoshop, etc.)

I’m also curious about this. Would be nice to have some kind of general independent service (or wiki?) similar to consumer reports for cybersecurity.

From the discussion here, it appears that no one is monitoring broadly. The big apps (fb, insta) potentially are being checked as they have bug bounties and warrant research time. But for less popular apps, or pirated apps, it sounds like no one is checking.

Good question. iPhones tell you when applications access the mic or camera using an indicator light along the top, and I haven’t noticed unusual hardware access patterns from any of the popular apps that I use. But like most other users, I’m not always paying attention, and I don’t keep track of every access.

I was at the Zurich airport while I had a connecting flight to Barcelona from Belgrade. My friend and I briefly spoke about Zlatibor (a city in Serbia), 30 seconds later I got an ad for a hotel in Zlatibor, in the middle of Zurich Airport. We never spoke about or searched for Zlatibor.

If there was any evidence I’m sure the EFF would be talking about this.


EFF cannot monitor everything, it just protects your rights.

Sure, but if someone was monitoring and they did have evidence then the EFF would most likely publicize the findings. You can answer the question “has anyone found evidence of popular apps monitoring audio data” by searching EFF archives. It’s not exactly the same thing, but it is something of interest.

If this did happen, some 12-year-old with WireShark would probably make national news. In other words, Facebook and Instagram are probably 3 steps ahead of you. Whether that should scare or console us, well, that's up to you.

I have learned one thing, regardless of assumptions about if they are listening... never name your smart speaker "computer", the number of false triggers is frustrating as heck.

Only Spock was allowed to do that.

Haven't checked but couldn't you make one that does and compare data usage volume?

But "listening, on everyone" and "all the time" are not necessary assumptions.

There is so much misinformation in these comments about so many topics. It’s weird. Like bots or a hired damage-control media agency meant to defuse certain topics anytime they come up.

> Is there any evidence of this?

No, because this is a conspiracy. Whoever claims something like that should prove it and not the other way around.

I think you mean a "false conspiracy theory", because if there actually were a conspiracy to monitor conversation through common apps, as other comments mentioned, it would have been found already.
