
Google admits listening to some smart speaker recordings - beastibash
https://techerati.com/news-hub/google-admits-listening-to-some-smart-speaker-recordings/
======
owlninja
Similar discussion:
[https://news.ycombinator.com/item?id=20402070](https://news.ycombinator.com/item?id=20402070)

------
carlosdp
I like to imagine sometimes how these kinds of "revelations" happen in tech
newsrooms.

Reporter: Tell it to me straight, do you listen to the recordings?

Google: Well yea, that's how we train the...

Reporter: WE GOT 'EM!

It's like the "Apple admits throttling CPU when battery starts dying" story
all over again. It wasn't a secret, you just didn't ask before.

~~~
danShumway
In some ways, this is missing the point. To people working in machine
learning, this isn't a revelation. To ordinary people, it is.

A common refrain that comes up in discussions about privacy is that ordinary
consumers don't care about stuff like Google Home. They don't care about
privacy, only weird tech people care about privacy.

However, the fact that articles like this get traction shows that a
substantial portion of ordinary people don't understand what privacy they're
giving up when they use Google Home. They didn't understand when they were
installing the devices that a human was going to be able to listen to their
recordings. And when they do understand that a human might be listening, that
creeps them out.

This implies two things:

a) if properly informed and educated, normal people probably would care about
privacy more. Part of the reason it's mostly tech people complaining about
Google Home and Alexa is that it's mostly tech people who understand what
these devices do.

b) consumers aren't being properly informed about the privacy implications of
devices like Google Home and Alexa, or else they wouldn't be surprised by any
of this. If this news story is getting traction, it means that Google did not
do a good enough job informing users about who had access to their data.

~~~
yongjik
As an ex-Googler, I'd like to say that "ordinary people" still won't
understand what privacy they're giving up. Before the news, they
underestimated it; now they overestimate it, but still without a clear
understanding of what's happening.

A group of researchers listening to a random sample of audio clips with no way
to identify actual speakers is very different from someone being able to look
up your name and address and pull down your conversation for leisure. (To be
fair, the latter is technically not impossible - it's just that such an act
will likely trigger half a dozen alarms, and the perpetrator will be fired
quickly. Unless it's the government secretly asking for your information - but
then, if the government is specifically looking for you, all bets are off
anyway.)

It's basically the same as Google search. If you type anything into Google's
search box, your search will be recorded and preserved forever so that
Google's engineers can analyze usage patterns. How else would they improve
their search algorithm?

Edit: I probably shouldn't have used "forever" - I don't know exactly how
long your search results will be preserved. If it helps, consider it replaced
by "long enough that someone can write a TechCrunch article that enrages
people".

~~~
reaperducer
_How else would they improve their search algorithm?_

There are lots of ways. Ways that other industries use to test and refine
products without violating privacy.

One example: Having a group of people who sign up to be part of your testing.
That way there's informed consent.

~~~
GrapeFriedNiggr
Well yeah, but it would be way more expensive and you would have orders of
magnitude less data. There are also issues with biased sampling.

Point being, the search wouldn't be as good.

~~~
dang
Trollish usernames aren't allowed on HN because they end up trolling each
thread you post to.

[https://hn.algolia.com/?query=by:dang%20trollish%20usernames...](https://hn.algolia.com/?query=by:dang%20trollish%20usernames&sort=byDate&dateRange=all&type=comment&storyText=false&prefix=false&page=0)

I've banned this account for now, but if you want to email hn@ycombinator.com
a new username, we can rename the account and unban it for you.

------
vezycash
If Google and its outsourced partners can listen to the recordings, three-letter
agencies can too. Even if they currently can't, they'll soon make a law to
grant them access to Google's and Amazon's recordings.

After this, a hijacking, explosion, or terrorist attack would occur. They
would argue that it could have been prevented if place X had a listening
device installed. And that's all it would take to push a law mandating the
installation of such devices in open spaces, eventually private ones.

UPDATE

On second thought, there's a faster way to the goal. Just make smartphones
listen all the time.

~~~
malvosenior
> _On second thought, there's a faster way to the goal. Just make smartphones
> listen all the time._

The good thing about trying this is people would definitely notice the
increased bandwidth usage of their phone streaming audio in real time, all the
time.
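
Back-of-the-envelope, with an assumed speech-quality bitrate of 24 kbps (my
number, not anything Google or Amazon has published), the volume would indeed
be hard to hide:

    # Rough volume of continuously streamed audio at an assumed 24 kbps
    kbps = 24                                  # assumed speech-quality bitrate
    bytes_per_day = kbps * 1000 / 8 * 86400    # bits/s -> bytes/s -> bytes/day
    print(bytes_per_day / 1e6)                 # ~259 MB/day, roughly 7.8 GB/month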

~~~
smattiso
Not necessarily; speech-to-text engines are pretty decent, so it's conceivable
a listening device could just send the transcript.
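
For scale, a day's transcript would be tiny next to the audio. A rough sketch,
with both inputs assumed rather than measured:

    words_per_day = 15_000        # assumed words spoken near the device per day
    bytes_per_word = 6            # ~5 characters plus a separator
    print(words_per_day * bytes_per_word / 1e3)  # ~90 KB/day vs ~259 MB of audio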

~~~
crooked-v
If the point is to train speech-to-text models, just sending the transcript
doesn't help.

------
dev_dull
Is this honestly surprising to any developer? How can you develop a product
like Siri/OK Google without being able to look closely at edge cases? And if you
have to record conversations to troubleshoot even extremely rare edge cases,
you still end up with a system that allows this eavesdropping.

~~~
stefan_
You recruit people for testing and obtain their informed consent?

How do people here think medicine or other research is developed?!

~~~
thfuran
I think medical software works on pretty much exactly the model the parent
comment describes, where software developers commonly have access to
(anonymized) patient data and, less commonly and less readily, to unanonymized
patient data.

~~~
davesmith1983
I worked for a year with patient medical data in the NHS in the UK. Typically
it is as simple as a script that replaces patient names after downloading a
live backup. You then delete the backup of the DB that has the real names.

That is pretty much it.
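
For illustration, a minimal sketch of that kind of scrub script, assuming a
SQLite backup copy; the patients table and its columns are invented here, not
any real NHS schema:

    import sqlite3

    def scrub_names(db_path):
        """Replace real names in the backup copy with stable placeholders."""
        conn = sqlite3.connect(db_path)
        rows = conn.execute("SELECT id FROM patients").fetchall()
        for (row_id,) in rows:
            # Rows stay distinguishable via their id, but no longer identify anyone.
            conn.execute(
                "UPDATE patients SET first_name = ?, last_name = ? WHERE id = ?",
                ("Patient%d" % row_id, "REDACTED", row_id),
            )
        conn.commit()
        conn.close()

    scrub_names("backup_copy.db")  # then delete the original backup with real names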

------
lota-putty
Tangent: I used to work for a telecom S/W company a decade ago, on SMS/MMS/USSD
products. We were privy to the SMS/MMS exchanged between customers.

PKI is a must these days. Then again, anything that connects to the internet
can now be monitored in some way or another, no?

------
Rapzid
Doesn't seem much different from the disclaimer given when calling into a
company that the conversation is being recorded for "QA and training"
purposes (more like performance and CYA, haha).

I wonder how sensationalized people will get with the headlines for click-bait
purposes. While "recordings" is technically correct, "interactions" may be a
slightly more precise word to use in the headline. I'm imagining a ton of
headlines designed to make people believe Google Home is making random
recordings of home conversations and letting people listen to them.

~~~
jdsully
I don’t recall any google home device informing me that conversations may be
recorded. Informing the user is the difference - not the actual recording.

~~~
davesmith1983
It is probably buried in their T&Cs.

~~~
jdsully
Since I only see them at friends' houses - I'm not a party to any of those
contracts.

~~~
davesmith1983
Yes, however it is their property and their device, so there is implicit
consent.

~~~
jdsully
I could consent to it using AI to answer. But how could I consent to it
recording me when I was never informed?

------
bork1
I'm struggling to figure out how they have a Security and Response team to
deal with the fallout of these issues without having enough
privacy/security/customer-focused developers/product folks to proactively
bring up these concerns. Google _seems_ like the type of company to do at
least a little bit of risk modeling before the release of software. If they
knew they were going to listen to recordings, how did this concern not get
brought up? If it was brought up, did folks just decide it wasn't important
enough to protect against?

~~~
jhayward
They have the security and response team activated because someone disclosed
that they do this, not to investigate the fact that they do it. They're there
to plug the leak.

~~~
azinman2
Except the privacy policy always warned about this. Everyone doing speech
recognition is doing this — you have to in order to get any kind of QA.

The caveat is that it should be both anonymized and only in response to the
wake-up command. It seems to be both, so I don't see the problem.

Actually I do — the editorializing of these headlines makes it seem nefarious
when it’s not.

~~~
gibba999
It is pretty nefarious. In traditional research and product development
protocols, you would have people opt into something like this, and optionally
pay them for it.

If Google gave out a hundred thousand Google Home units for free to test
subjects, with informed consent, there would be no big deal. It would cost
Google $2.5 million, and it'd probably be enough data.

If my web site policy discloses "I may randomly send a thug to your house to
shoot your children," and you come, visit, click through the license which
warned you, and then I shoot your family, that doesn't mean I'm not doing
something super-evil.

Google seems to be doing something super-evil here. Their response -- plugging
the leak -- seems equally evil. People have a right to know what's being done
with their data, and at least under European law, Google has a legal and
ethical obligation to disclose things like this in language people can
understand.

GDPR is rather well-written here. It looks like Google is breaking it, and
currently trying to shoot the whistle-blower.

Thank you whistle-blower!

~~~
mehrdadn
> If my web site policy discloses "I may randomly send a thug to your house to
> shoot your children," and you come, visit, click through the license which
> warned you, and then I shoot your family, that doesn't mean I'm not doing
> something super-evil.

You kinda had me until you lost me here. Analogies need to make sense. If you
have to go this far with your analogy then that says more about your own
argument than the other side's.

~~~
saagarjha
You're missing the point, which is that you can slip anything into a privacy
policy or other long agreement, no matter how outrageous it may be, and nobody
will read it. Putting anything there does not make it ethical or legally
binding.

~~~
UncleMeat
It also doesn't make it unethical. Putting privacy related issues in a privacy
policy makes sense to me.

~~~
gibba999
A privacy policy is definitely the right place for privacy issues. My point is
exactly as vharuc made above: Putting something there neither makes it ethical
nor unethical. A contract or license is not an excuse for bad behavior.

* If my privacy policy is a copy of HIPAA, that's an ethical privacy policy.

* If my privacy policy is like Google's here, it seems unethical without clear informed consent (which a disclaimer in a novel-length privacy policy doesn't provide).

* If your privacy policy says you'll collect incriminating information about me, and sell it to the highest bidder for use in blackmail, it's unethical even with attempts at informed consent.

------
danbruc
Only 0.2 %? 1 out of every 500? That seems like a lot to me, especially given
that there must be millions if not billions of interactions. How many of those
things are out there? And how many interactions does the average user perform?
And they keep them all? Forever? I could probably find the numbers myself or
at least estimate them, I just don't care enough. I would however be happy to
learn them if someone happens to know them.

~~~
danbruc
In response to a deleted comment that said the following: [2]

 _Google is not training one language model, but many of them (I'd estimate
~70 language models from the voice settings menu on my phone). So 0.2% in
total doesn't sound too unrealistic to me as this should be closer to 0.002%
per language._

There is a large variation in the number of speakers between different
languages. What would they want to do? Aim for the same number of training
points for each language? Then for a language with 20 times fewer speakers -
Thai compared to English - they would have to look at 4 % [1] of all
interactions in Thai. Add to this that the distribution across languages is
most likely very skewed, i.e. languages spoken in poorer regions of the world
have a lot fewer users than languages spoken in richer regions.

Or maybe they want more training points for more frequently used languages,
then, if they aim for a number of training points proportional to the number
of interactions, every interaction has a 0.2 % chance of being used as a
training sample regardless of the language. If you perform two interactions
per day - and I will happily admit that I have not the slightest clue whether
this is even on the right order of magnitude, I have never used any such
system - then you reach 500 interactions within one year, which means that
after one year of usage you have a reasonable chance that at least one of your
interactions has become a training point.
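
A quick check of that arithmetic, with the same assumed two interactions per
day:

    p = 0.002                # 0.2 % chance any single interaction is sampled
    n = 500                  # two per day reaches 500 interactions in under a year
    print(1 - (1 - p) ** n)  # ~0.63, so a reasonable chance at least one was used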

[1] Probably not actually true because, due to the large number of English
speakers, the percentage for English would most likely be less than 0.2 %, but
right now I can not be bothered to figure out the correct numbers.

[2] Meta question - would this generally be considered acceptable without
naming the user that made the comment? Or should deleted content stay deleted?

------
scarejunba
I have a Google Home in every room of my home. This only records after the
trigger word, right? And it isn’t tied to my account? Well, then, I’m fine
with it.

~~~
nocturnial
If you want raw data: out of the 1000 audio recordings the journalists could
get their hands on, 153 were recorded without the trigger word.

Google provided those audio recordings to language experts for transcription
without account information. The journalists managed to track down several
people using only the audio and confronted those people with the audio. They
confirmed it was indeed their voices.

~~~
scarejunba
Like for the context-keeping? I know it’ll record post-response without an
additional trigger phrase instance and that’s fine.

~~~
nocturnial
The article mentioned it was probably due to misinterpreting some words as the
trigger phrase. They formulated it as "any word that remotely sounds like
google could trigger it".

~~~
scarejunba
Oh, that’s unfortunate. I’d definitely want them to first have humans check the
trigger-phrase portion before they okay reviewing the remainder.

------
blueboo
Is the implication that we as a society would be horrified by a “Chinese room”
voice assistant — a human responding to voice commands and performing an
assistant’s tasks?

A service whereby this function is automated 100% of the time, and where
requests are very rarely transcribed for QC by trusted partners (a failure
point, apparently), seems... reasonable?

------
A4ET8a8uTh0
After the Nest hidden-microphone debacle I sent a message to 'my' senator (this
time I will send a postcard, I guess). I got a non-answer after a month. How
much money do you need to own a senator? I am not even joking. There has to be
a way to crowdsource this.

------
pikapikamtf
surely no one is surprised by this? why would you put a device in your home
that is constantly listening to you? crazy.

~~~
blueboo
Convenience outweighs infinitesimal downside

------
dvaun
The posted article did cite their sources; the original report was made by
vrtNWS [1]. I recommend reading the discussion which occurred yesterday for an
array of arguments [2].

One thing I personally believe — and find apparent about these discussions
revolving around data privacy — is that the general public (at least in the
USA) doesn't understand the capabilities of modern apps and other tech they
use. Sure, there's news coverage of tech giants and adtech firms
mining/utilizing user data for profit — the picture painted, though, seems to
only show a black box to most people who aren't involved with the tech
industry, and who don't have the background to really understand certain
concepts (e.g. networking, ML, etc.). There are definitely growing concerns
about personal privacy [3][4]; continuing forward, however, the availability of
user data and the growth of social media will probably rise and have an even
larger impact on society, as my generation (and younger ones) has already been
tied into using these platforms [5].

I'm hopeful that this can be addressed through changes to the education system.
I would love to see computer education added as a generally covered topic (i.e.
it would be a very clear example of how useful math is... it would help others
understand why they should learn beyond add/sub/div, etc.).

There is already a push to modernize schools and prepare students for the
adult world; chromebooks are very common in schools that receive grants and
funding for tech initiatives. Home econ, carpentry, and other courses used to
be common (maybe still common in some places?) in highschool. How about
computer education, specifically?

[1]: [https://outline.com/2PmPtH](https://outline.com/2PmPtH)
[2]: [https://news.ycombinator.com/item?id=20402070](https://news.ycombinator.com/item?id=20402070)
[3]: [https://www.pewresearch.org/fact-tank/2018/03/27/americans-complicated-feelings-about-social-media-in-an-era-of-privacy-concerns/](https://www.pewresearch.org/fact-tank/2018/03/27/americans-complicated-feelings-about-social-media-in-an-era-of-privacy-concerns/)
[4]: [https://www.pewresearch.org/fact-tank/2018/09/05/americans-are-changing-their-relationship-with-facebook/](https://www.pewresearch.org/fact-tank/2018/09/05/americans-are-changing-their-relationship-with-facebook/)
[5]: [https://www.pewresearch.org/fact-tank/2019/04/10/share-of-u-s-adults-using-social-media-including-facebook-is-mostly-unchanged-since-2018/](https://www.pewresearch.org/fact-tank/2019/04/10/share-of-u-s-adults-using-social-media-including-facebook-is-mostly-unchanged-since-2018/)

------
bprasanna
Should we be surprised?

