
Google employees are listening to Google Home conversations - bartkappenburg
https://translate.google.com/translate?sl=nl&tl=en&u=https%3A%2F%2Fnos.nl%2Fartikel%2F2292889-google-medewerkers-luisteren-nederlandse-gesprekken-mee.html
======
gringoDan
I think the responses to this can be broken down into a 2x2 matrix: level of
concern vs. understanding of technology.

1) Don't understand ML; not concerned - "I have nothing to hide."

2) Don't understand ML; concerned - "I bought this device and now people are
spying on me!"

3) Understand ML; not concerned - "Of course, Google needs to label its
training data."

4) Understand ML; concerned - "How can we train models/collect data in an
ethical way?"

To me, category 3 is the most dangerous. Tech workers have a responsibility
not just to understand the technologies they work with, but also to educate
themselves on the societal implications of those technologies. And as others
have pointed out, this extends beyond home speakers to any voice-enabled
device in general.

In conversations about this with engineers the response I've gotten is
essentially: "Just trust that we [Google/Amazon/etc.] handle the data
correctly." This is worrying.

~~~
cronix
I'm in the 5th category. 5) Understand ML; concerned - won't allow any of
these things in my house, period, because they will always use them for things
behind the scenes that they won't state. I don't care how well trained they
are, or how "ethical." Ethical... according to whom, and at what time period
in the future? Ethics change. The data they have on you won't. Look at all of
the politicians and other people getting in trouble for things they said 15
years ago, which were generally more acceptable at the time but we've
"progressed" since then. Who will be making decisions about you in the future
based on last year's data? Just don't give it to them.

~~~
outworlder
Before I go further: do you own a smartphone? If not, you can ignore the rest.

So, assuming you have a smartphone. That is a device which has:

- Permanent network connection. It doesn't matter what you do on your Wifi; it
has the cellular data network, which is controlled by an independent processor
with its own firmware

- Excellent noise-cancellation microphones. They may not be able to pinpoint
the sound source like an Echo would, but they are still pretty sensitive

- Accurate location updated via GPS. If you block GPS, it can still use the
cell phone towers as an approximation, plus any SSID beacons nearby

- Very powerful processor, capable of listening for 'interesting' data before
sending anything

- Similar functionality to an Echo / Google Home / Apple whatever. They'll be
listening for "hey siri" or "ok google"

And so on. Plus, anyone you interact with will have such a device in their
pockets.

Given that, why would an Echo be of any particular concern? At least you can
monitor its network activity easily. Not so with a smartphone, which is a
much higher threat. And yet most people sleep with them next to their beds.

~~~
inflatableDodo
>Before I go further: do you own a smartphone?

Someone really needs to make a decent modular smartphone with a detachable
radio.

~~~
meruru
The Librem 5 has a kill switch for the radio:
[https://puri.sm/products/librem-5/](https://puri.sm/products/librem-5/)

~~~
jakeogh
Does the radio have DMA?

~~~
jakeogh
nope! awesome.

------
SmirkingRevenge
The one thing about these stories that keep coming out about the home
assistants... they kind of create the impression that this is an issue
specific to home speakers, and that you can avoid it by simply not buying them.

That's misleading.

Any voice command you use to operate any internet connected tech gadget, from
phones to smart TV's, is potentially stored and flagged for human review.

You really have to avoid using voice commands at all, on _all_ of your
devices. Even that is probably insufficient. You probably have to go even
further and actively disable voice command features on all of your devices,
assuming they actually support such a setting. Otherwise there's still the
possibility of an accidental recording taking a journey through the clouds to
a stranger's ears.

~~~
arien
And not to be anywhere near anyone else's listening devices.

Isn't there a law in some US states that there needs to be consent before
recording someone? How does this fit in and who would be held responsible, the
owner of the listening device or the company behind it?

~~~
bastawhiz
You already consented by agreeing to the terms of service. If someone else is
talking to their smart device while you're talking, it's ostensibly their
responsibility. There are no reasonable laws that prevent you from being
overheard in the background of a recorded phone conversation.

~~~
ethbro
(IANAL, but) Not accurate in the US.

Most states are either one party or two party consent states. One party = you
can unilaterally record anything (not sure this includes things you're not
actively involved with, e.g. spying). Two party = you must have consent of
everyone in the recording.

By a plain reading of two party consent statutes, people are in violation if
their home speaker records a guest without obtaining consent.

I'm sure Google and Amazon's lawyers would try to weasel out of compliance via
claimed anonymization, but that's definitely not the spirit of the law.

Old, but thorough: [http://www.mwl-law.com/wp-content/uploads/2013/03/LAWS-ON-
RE...](http://www.mwl-law.com/wp-content/uploads/2013/03/LAWS-ON-RECORDING-
CONVERSATIONS-CHART.pdf)

You're also going to bump up against specific wording on whether a given
statute covers only telephone conversations or oral conversations in general,
as most of these are phone wiretap laws that may or may not have been worded
ambiguously.

Additionally, there are federal statutes that likely also bear on this.

~~~
testvox
Know that even in all-party consent states, continuing to talk after being
made aware that the conversation is being recorded implies consent. This is
why devices like Google Home are legal: they make a loud warning sound before
they begin recording. For example, in CA the law states that:

> (a) A person who, intentionally and without the consent of all parties to a
> confidential communication, uses an electronic amplifying or recording
> device to eavesdrop upon or record the confidential communication, whether
> the communication is carried on among the parties in the presence of one
> another or by means of a telegraph, telephone, or other device, except a
> radio, shall be punished...

> (b) For the purposes of this section, “person” means an individual, business
> association, partnership, corporation, limited liability company, or other
> legal entity, and an individual acting or purporting to act for or on behalf
> of any government or subdivision thereof, whether federal, state, or local,
> _but excludes an individual known by all parties to a confidential
> communication to be overhearing or recording the communication._

[https://leginfo.legislature.ca.gov/faces/codes_displaySectio...](https://leginfo.legislature.ca.gov/faces/codes_displaySection.xhtml?lawCode=PEN&sectionNum=632)

~~~
brlewis
> they make a loud warning sound before they begin recording

This contradicts my personal experience last week with a google-controlled
music player. Music was the only response to a voice command to play music,
and silence was the only response to a voice command to turn it off.

------
electrograv
So Google’s response is (paraphrased as fairly as I can while removing the
sugar-coating):

 _’Yes, we hire people to listen in to and transcribe some conversations from
the private homes of our customers (so as to improve our speech recognition
engines); but the recordings aren’t linked to personally identifiable
information.’_

Even assuming they have only the purest intentions here, I still don’t
understand how they can possibly guarantee that these recorded conversations
are not linked to personally identifiable information!

For example, what’s to stop me from saying _“Hey Google, I am <full legal name
/ ID> and my most embarrassing and private secret is <...>”_?

One might argue that they could detect this in the recognized text and omit
those samples, but presumably the whole purpose of hiring people to create
transcripts is because the existing speech-to-text engine isn’t perfect, and
they need more training data.

~~~
hammock
You paraphrased it in a different way and that might be why you're confused.

Google says "the excerpts are not linked to personally identifiable
information." To me that means the metadata is stripped, not that they strip
anything out of the audio.

~~~
electrograv
Thank you, good catch. I’ve edited my paraphrase to make it more accurate in
this way.

That said, it still sounds like Google is trying to convince us that the data
they capture (not just the metadata) is never linkable to personally
identifiable information, which if true would genuinely ease many privacy
concerns here.

As far as I know, just because data is not explicitly annotated with PII
doesn’t erase the legal (and ethical) responsibilities associated with
handling data that _contains_ PII.

So even if they worded their response so that its truthfulness is
legally/technically defensible, it’s still a bit of a ‘red herring’ at least
(I don’t think anyone is accusing Google of _explicitly_ associating these
audio recordings with user IDs).

~~~
d1zzy
But in order to tell whether it contains PII, it has to be listened to by a
human to transcribe it... It's like Schrödinger's audio assistant ;)

------
TheAdamist
"The man, who wants to remain anonymous, works for an international company
hired by Google. "

So not a Google employee at all, but probably a low-paid contractor who is in
possession of thousands of audio files. Your privacy matters, except when the
bottom line is involved.

~~~
numbsafari
What is doubly concerning here is that the contractor was in a position to
demonstrate how the system worked to the reporters. That would seem to
indicate they have access to that data in a non-secured environment.

I'm not familiar with EU law around these things, but I would imagine there is
some kind of whistleblower mechanism available, and a right for authorities to
audit/inspect such activities?

~~~
C1sc0cat
I would expect a telecoms employee doing similar work on quality etc. to be
quite securely vetted.

If I were doing now what I did back in the day for BT / Dialcom (I had root on
the UK's main ADMD), I would probably have to pass DV vetting (TS in US terms).

~~~
dmix
What about all the telecom APIs like Twilio? They have raw access to millions
of phone calls every day. I doubt they have ‘secure rooms’ for debugging.

------
inerte
Home, Siri, Alexa, M, they all do. I have friends who work in this field
transcribing the audio and measuring its accuracy. Sometimes it's multiple
layers of contractors: an employee hands the task to a contractor, another
contractor verifies the speech-to-text, and they're all managed by a
contractor.

Search for languages like Portuguese, Swedish, Chinese, etc. on LinkedIn and
you'll find the job posts:
[https://www.linkedin.com/jobs/search/?keywords=portuguese&lo...](https://www.linkedin.com/jobs/search/?keywords=portuguese&location=San%20Francisco%20Bay%20Area)

~~~
rwc
"They all do" ... my understanding is that this expressly does not happen with
HomePod conversations.

~~~
inerte
So I meant "they" in the sense of the companies, not necessarily the home
devices. Sorry about the confusion.

I know 100% that it happens with Siri. If Apple is excluding HomePod
conversations from Siri's dataset, that I don't know.

~~~
danieldk
Do you have a source for human annotation of Siri recordings? Do they use
subcontractors like Google?

~~~
inerte
First hand source, it was the first job several of my Brazilian friends (or
their spouses) got when they relocated to the Bay Area. They use companies
like Moravia or Welocalize. Take a look at some of the job posts from my link
above.

------
paganel
I grew up as a kid in a country ruled by the Securitate [1], one of the few
institutions that rivaled the East German Stasi when it came to spying on its
own citizens, and as such I'm very, very perplexed as to why anyone would
bring a listening device into their own house of their own volition. And those
people even pay for the privilege of having their home lives actively
monitored and listened to almost all the time; it's crazy.

[1]
[https://en.wikipedia.org/wiki/Securitate](https://en.wikipedia.org/wiki/Securitate)

~~~
viklove
Do you have a smartphone? Why would you bring that listening device (the
smartphone) into your own house of your own volition? Please explain, because
I am very perplexed.

~~~
clebio
This is such a specious argument, and yet it's repeated ad nauseam. Please
stop. For one, it doesn't make the listening and harvesting of data OK just
because it may already be happening. Also, it's just condescending. You don't
think the parent or other people of reasonable intelligence and valid concern
have thought about that? Then, it also just misses the point anyway -- no, I'm
_not_ OK with my cell phone harvesting anything and everything it can get
(the cadence of my walk, say). Yes, I like having access to maps on the go.
These concerns aren't mutually exclusive, and most of us choose not to live
like Stallman. We still live in a rich and complex ecosystem of law and
precedent, highly dependent on where we are on a given day, what nationality
we are, etc. None of that invalidates the quite reasonable expectation of
privacy (not to mention that solutions exist, such as opt-in, but cavalier
people working in tech choose instead the race to the bottom on privacy and
chasing cost-per-click).

~~~
viklove
Right, I'm not saying it makes it OK; I'm saying OP is either a hypocrite or
doesn't understand what the technology is capable of. But of course, if you
believe any device _capable_ of listening _is_ listening, then you have to
forfeit your smartphone along with your home automation device.

------
chance_state
"What Orwell failed to predict is that we'd buy the cameras ourselves, and
that our biggest fear would be that nobody was watching."

~~~
duxup
I think Frank Herbert hit it a bit more on the nose:

“Once men turned their thinking over to machines in the hope that this would
set them free. But that only permitted other men with machines to enslave
them.”

Frank captures the motivation; the fact that it is desirable is the scary
part.

Akin to the genetic engineering scene from Gattaca.

~~~
noir_lord
Aldous Huxley got there earlier with Brave New World as well.

Speculative fiction was ahead of the curve in that era.

~~~
jammygit
Anyone want to recommend a book? Let me join in by suggesting the Asimov robot
books, which I’ve just begun

~~~
Medicalidiot
Brave New World is by far the most accurate depiction to date of what the
modern world looks like. That's my strongest recommendation.

Fahrenheit 451 is also a solid read.

~~~
Loughla
Seconding Brave New World. It's startling how much that book seems like a map
to where we're headed.

------
duxup
This falls into the category of:

I bugged my house... NOW MY HOUSE IS BUGGED!

Not to dismiss the value of the news here, it is important for folks to know,
but the overall situation is both concerning, and amusing.

~~~
espeed
Your phone has more sensors on it than these devices, and most people have
their phones on them 24/7. Most phones have the same voice-controlled features
too. Why people are more concerned about narrowly scoped voice devices than
about their phones is a curious thing. Is it because it talks back to you,
which makes it more apparent?

~~~
duxup
It is just more recent.

Plenty of articles about phones listening too.

------
lovetocode
I own 4 Home Minis, 1 Home, and 2 Home Hubs. I honestly don't care, so long as
my data is used to improve the functionality and stability of my investment.
It is quite another thing if they are selling my conversations to third-party
vendors.

~~~
groovybits
Per article:

> The man, who wants to remain anonymous, works for an international company
> hired by Google. His job is to listen to the audio clips and to write out
> what he hears, so that Google can improve the speech assistant.

So,

> It is quite another thing if they are selling my conversations to third-
> party vendors.

That is exactly what Google is doing.

~~~
bagacrap
"Selling" is when you trade goods for money. Google is paying money for a
service. Pretty much the opposite of selling.

~~~
groovybits
The concern remains the same though: Making user data available to a non-
Google entity.

Do you care if Google makes money off of that trade? Really, they get value
one way or another, because the analysis is incorporated back into the
product.

------
RosanaAnaDana
I mean. Of course they are. Do you expect to be able to do any meaningful
level of training on data that hasn't been properly labeled? At some point, a
human has to go in and correct the software when the software gets it wrong.
If you want services that do what Google Home does, you have to have this.

Even with that, I'm sure the engineers are flagging voice requests that happen
more than once, or where someone has to manually change or correct what the
software thought the request was.

This is only creepy if you don't understand how the software works.

~~~
anbop
How about Google PAY MONEY to generate training data, or gather it from
informed people? (“Make $100 by recording your voice for Google’s machine
learning algorithms.”) Not “Arms full with an infant? This $50 device will
solve all your problems” and then shipping those recordings to third-party
contractors in unsecured facilities.

~~~
tbrownaw
Maybe if you get your training data from some _very different_ source than
your real data comes from, it won't be representative and won't work very
well?

~~~
la_barba
We test our medicines on a small group that is representative of the entire
world's population. We build soil models based on sampling a small region. We
don't test all of your blood to do a medical test. I don't know what you mean
by "real data", but representative sampling is how work gets done in every
single domain in the world. Google can do this.

~~~
RosanaAnaDana
Yeah no.

Representative sampling is how we formerly did this kind of work. It wasn't
particularly good or effective, but we didn't have the methods or compute to
go beyond that. No longer.

~~~
la_barba
You're free to have your own opinion, but anything specific beyond "it doesn't
work"? I work in pharma, and we use representative sampling every single day
in every single thing we do, and it works.

~~~
RosanaAnaDana
Representative sampling does 'work' in the sense that it may or may not
'prove' whatever it is you had a question about. But the issue is that you're
effectively building your assumptions about what is 'representative' into
your sample. It's (imo) the central issue in the reproducibility crisis: our
assumptions about the world, and how they impact the questions we ask about
it.

It was previously intractable to do a census rather than a sample, and maybe
for your purposes a sample is good enough or a census remains intractable. In
my field, this is how things were done for decades (and still largely is),
and even though (imo) it did a piss-poor job, it was good enough for some
purposes. A piss-poor job is _still_ better than knowing nothing. Maybe this
is good enough for your purposes.

There's a third way, however, which is to move beyond sampling and perform a
census. This is the difference I'm speaking of. We're at the point where we
don't have to sample, because we can measure. Effectively, this is what modern
data science _is_. We've always had the ability to sample and interpolate. It
doesn't work very well (imo:
[https://en.wikipedia.org/wiki/Replication_crisis](https://en.wikipedia.org/wiki/Replication_crisis))
and usually reflects back to us something about our assumptions in how we
sampled. But that's just it: we don't have to rely on a sample if we can take
a census.
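To make the sampling-vs-census point concrete, here is a deterministic toy illustration in Python (every number is invented purely for the example): a sampling frame that quietly under-covers a minority accent group gives a different answer than measuring the whole population, because the assumption about what is 'representative' is baked into the frame itself.

```python
# Invented accuracy scores for two accent groups:
# 90% majority (score 0.95) and 10% minority (score 0.70).

# Census: measure the whole population.
population = [0.95] * 9000 + [0.70] * 1000
census_mean = sum(population) / len(population)  # 0.925

# A "representative" sampling frame that under-covers the minority group
# (95%/5% instead of the true 90%/10%) -- the hidden assumption.
frame = [0.95] * 9500 + [0.70] * 500
frame_mean = sum(frame) / len(frame)  # 0.9375, biased toward the majority

print(f"census mean:       {census_mean:.4f}")
print(f"biased-frame mean: {frame_mean:.4f}")
```

Any sample drawn from the biased frame will, in expectation, land near 0.9375 rather than the census answer of 0.925, regardless of sample size.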

~~~
la_barba
>But the issue is that you effectively building in your assumptions about what
is 'representative' into your sample.

Even if I agree with your premise, Google is not going to build a custom voice
model for every individual anyway. There will be simplifications made. There
will be assumptions made, and they will end up with a representative model
anyway. So you're actually just bolstering my point. It makes a ton of sense
to record people in a known, controlled environment and tweak variables one by
one, such as the size of the room, the location of the microphone, introducing
varying amounts of background chatter, etc. This is how normal science happens
all the time, and it has worked for us so far. And we haven't even addressed
the ethics of spying on people in such a blatant manner. That is a whole other
conversation.

> It doesn't work very well (imo:
> [https://en.wikipedia.org/wiki/Replication_crisis](https://en.wikipedia.org/wiki/Replication_crisis))
> and usually is reflecting back to us something about our assumptions in how
> we sampled. But thats just it.

Modelling aggregate human behavior/psychology is not a proper science. The
same is true of macroeconomics and other such non-exact fields. Problems in
those fields do not apply across other fields.

------
gtirloni
From a computer science perspective, what should Google do to train its models
in a privacy conscious way?

~~~
Maximus9000
Pay beta testers who know what they're signing up for? Or train the software
with previously human-transcribed audio (TV, audiobooks, etc.).

~~~
libria
How about 2 Google Homes: The $100 Privacy version and the $30 Opt-in Testing
version. That way some people get their privacy and I get Chicken Little to
subsidize my cheaper IoT products. =)

~~~
vineyardmike
The problem with this is that "Chicken Little" is probably not an adventurous
person, but rather a poor person. This makes privacy another privilege that
can be bought. This devolves into another way to exploit poorer people.

------
JorgeGT
The biggest issue IMHO is how the average consumer has been deceived into the
belief that current AI is pure AI, when in reality a lot of humans are looking
at your pictures, listening to your recordings, crawling through your inbox
and analyzing your browsing/purchasing/streaming history, right now:
[https://imgs.xkcd.com/comics/trained_a_neural_net.png](https://imgs.xkcd.com/comics/trained_a_neural_net.png)

~~~
itronitron
Engineering Tip #2 -> if you pay someone else to do it, you can technically
say you did it in the cloud

------
rev12
I think a lot of people here are under the assumption that voice commands, on
_any_ device, have the potential to be human-reviewed. I am not sure whether
the general public shares that assumption.

That being said, my biggest concern is the fact that many of these devices
don't have a _hardware_ microphone kill switch. I feel better when I _know_ I
can control when a device is listening in. I've read reports that some Alexa
devices have them, but I don't own any, so I am unable to verify that.

I want all of my devices with microphones to have a hardware-based kill switch
for the mic; that's my phone, laptop, tablet, _everything_.

~~~
pessimizer
> I think a lot of people here are under the assumption that voice commands,
> on any device, have the potential to be human reviewed. I am not sure
> whether or not the general public has that same assumption.

That's because we, as an industry, have fooled them into thinking that AI is
real, and that people who don't know it's real are idiots. We don't think
students deserve a proper tech education, so unless they are professional
techies, they have to learn from marketing materials (designed to convince
them to buy things).

------
groovybits
Assuming $0.30 per audio clip and a base wage of $10/hr, that equates to 33.3
audio clips/hr = 266.4 audio clips/day being monitored by any one 'language
expert'.

However, Google does not specify how long a 'conversation' is. How many
sentences make up a conversation? When is the cutoff point?

Google also says '1 in 500' conversations are monitored. That means for any
one 'language expert', there are approx. 133,200 conversations/day that have a
chance of being monitored.

So basically, you have a 0.2% chance per day that your conversation is picked
up by any particular 'language expert'.
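A quick sanity check of the arithmetic above (the $0.30/clip rate, $10/hr wage, 8-hour workday, and 1-in-500 review rate are the assumptions from the thread and article; clips/hr is rounded to one decimal first, matching the figures above):

```python
# All inputs are assumptions carried over from the comment above.
pay_per_clip = 0.30      # dollars per transcribed clip (assumed)
hourly_wage = 10.00      # dollars per hour (assumed)
hours_per_day = 8        # assumed workday length
review_rate = 1 / 500    # Google's stated review rate

clips_per_hour = round(hourly_wage / pay_per_clip, 1)     # 33.3
clips_per_day = round(clips_per_hour * hours_per_day, 1)  # 266.4
# Conversations per day that one reviewer's clips are drawn from:
conversation_pool = round(clips_per_day / review_rate)    # 133200

print(clips_per_hour, clips_per_day, conversation_pool)
print(f"daily chance per expert: {review_rate:.1%}")
```

The 0.2% figure is just the 1-in-500 review rate restated per conversation per day.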

------
Ensorceled
The number of people in this thread who believe that this is OK because 1)
it's obviously the only way Google could train their voice system, and thus 2)
people clearly knew what they were getting into, is horrifying.

------
rchaud
It's no coincidence that companies like Amazon market their Echos as "stocking
stuffers" for the holiday season. I've wondered how Google Home and these
"smart home" devices were always able to be priced as low as they are. Goes to
show that paying for the product doesn't exempt you from still being part of
the product.

------
amacneil
Serious question: how do people think the ML models for Home, Alexa, Siri, etc
are trained, if not with human labeling?

~~~
arien
That doesn't mean you can use live data for tests. It should be off by
default, with the chance to opt in if you wish to contribute your data.

------
jankeymeulen
Link to the original source: [https://www.vrt.be/vrtnws/nl/2019/07/10/google-
luistert-mee/](https://www.vrt.be/vrtnws/nl/2019/07/10/google-luistert-mee/)

The submitted link cites this one...

------
rosszurowski
A bit tangential, but I tried sharing this link with a few friends on Facebook
Messenger, and noticed it's blocked because it "violates Community Standards"
[1]. Even shortened bit.ly links are blocked.

Anyone know why that would be the case? I'm trying not to assume malice (e.g.
maybe it got misflagged?) but it certainly feels like censorship, and is yet
another push for me to drop Messenger too.

[1]: [https://i.imgur.com/9n1Hyqb.png](https://i.imgur.com/9n1Hyqb.png)

~~~
css
Google Translate links have been blocked for a long time because
translate.google.com can be used as a proxy.

~~~
SquareWheel
And for the record, so can bit.ly links. URL shorteners or redirection
services are often blocked from being posted (eg. also on reddit).

------
bisRepetita
What I'm interested in is not just knowing that employees are sometimes
listening, and why.

I want to know what instructions both humans and computers are given if they
hear illegal actions, such as violence, illicit trade, etc.

If you are an employee and hear a rape scene or a blackmailing dialog, do you
have a duty to report, or to remain silent?

I also want to know how much access law enforcement has to this data, and
whether they can re-identify the info, with or without a warrant.

------
lonelappde
Some companies tell me on every phone call that I am being recorded. Some
don't (but maybe are recording anyway?).

Should voice assistants be required to announce that they are recording every
time they receive an activation? Until the user explicitly approves long-term?
Should that approval require daily or monthly renewal? Should the device
detect new voices and give them the same warning? Should these recommendations
become law?

------
air7
Anything you put online can potentially leak.

This is common knowledge for us tech oriented people but with time (and
stories such as these) it is (hopefully) becoming more and more general
knowledge.

Personally I'm ok with it: If I use a cloud based voice enabled device, I
accept that my voice might be heard by someone working there. Similarly, if
one sends nude pics as a "private" message in messenger, they shouldn't be
surprised if someone at FB sees them. Know the risk, evaluate the benefit, and
make an informed decision.

One caveat is the _explicitness_ of "put online": the above holds for things
put up deliberately, not for things uploaded accidentally or covertly. In
those cases the data is illegally obtained and is therefore, imo, not
important for this discussion. That's because I (naively) don't think big
companies would risk a large-scale, purposefully illegal enterprise (and if
they do, there should be major ramifications), and the 3LAs can spy on anyone
anyway if they want, and that's (currently) legal/above the law.

------
svat
For the record, a response here:
[https://www.blog.google/products/assistant/more-
information-...](https://www.blog.google/products/assistant/more-information-
about-our-processes-safeguard-speech-data/)

------
lexandstuff
One thing that Google Home does noticeably poorly with is disabled voices and
non-native speakers. Google Home would be really handy for a family member
with Down syndrome who can't read or write but can easily learn voice
commands, but Google Home doesn't currently understand him at all. My partner,
who has a Malaysian Chinese accent, also really struggles to use Google Home.

That said, I notice the quality of voice recognition is rapidly improving, and
I'm thankful for the work the Google engineering team is doing; voice
recognition and language translation have the potential to improve the lives
of so many people.

------
asperous
The current headline seems to imply more than what's actually happening.
What's happening is that when Google hears "ok google", the recording is saved
and may be manually reviewed for ML training purposes.

The problem is that detecting "ok google" is not reliable, and it sometimes
captures normal conversations.

I do think it's an ethics issue that people may not be told clearly, when
buying these products, that these recordings may leave their homes and can be
heard by people (though only up to a minute, and without any clear PII unless
someone says something identifiable).

------
m0zg
I don't get why anyone is surprised by this, or why the outrage is restricted
to Google Home. Google employees and contractors are listening to, viewing,
and reading stuff from time to time in order to provide labels for their deep
learning systems. Moreover, you consented to this when you created your
account, or in some cases just by using their services without an account.

You can't really have useful machine learning without lots of annotated data
quite yet, and Google prides itself on useful machine learning.

------
aflag
I've been toying with the idea that companies that use user data passively
(while the user is not interacting with the system in a way that would lead
them to expect that data usage) should have to notify the user about it. For
instance, if data you produced was randomly selected to be listened to by
someone, you should get a notification somewhere about it.

I think something along these lines would help increase user awareness.

------
gopher2
Of course they are! There are user researchers, people looking for spam and
abuse, people tagging data, among others, who are absolutely looking at your:
private messages, search queries, profiles viewed, Alexa conversations,
Tinder messages, bookmarks, call history, purchase history, geolocation
history... and on and on. I guess there are a lot of news stories left to
write on this topic.

------
lonelappde
Catch-22/Irony?

The main problem here is that Google hired someone like this person, who
leaked data they were trusted to protect. It's a concern for users (insider
risk at Google), and something we need laws to protect against. Surely if
Google Corporation is a poor steward of this data, the specific malicious
actor here, who violated both his employment policy and his users' trust, is
especially bad.

------
deftturtle
I unplugged my family’s Alexa, since they weren’t using it but left it on.
They haven’t used it at all for months and haven’t plugged it back in. It sits
on their shelf. I wonder if this is common, where families overestimate the
usefulness of voice assistants and then rarely use them after a couple months.

------
C1sc0cat
Well, of course you will have some QA / quality monitoring, just as you will
with telecoms, and I seem to recall that the Dutch go in for monitoring of
phones in a big way.

Also, that article reminds me of the Monty Python sketch "News for Wombats".

------
rickyc091
Now it's making me wonder: if the light on my Google Home Mini is on, does
that mean it's passively listening to me? I always do a hard reboot when the
indicator light stays active for a long period of time.

~~~
guyzero
That means you have a notification:
[https://support.google.com/googlenest/answer/7073219?hl=en](https://support.google.com/googlenest/answer/7073219?hl=en)

------
mirekrusin
"(...) honey, what's our card number? 123 (...)" "(...) honey, what was the
password? 123 (...)" "(...) honey, i got us 10 first bitcoins, we need to
printout mnemonic 123 (...)"

------
dmje
I'd love to read the piece, but the cookie popup covers 70% of the window on
mobile and I can't close it.

[https://postimg.cc/yJCgdWbr](https://postimg.cc/yJCgdWbr)

------
api
There are two big things right now on my list of things I will never use:
closed remotely operated home assistants with cameras or microphones that are
not physically switched, and Facebook Libra.

------
r00fus
My biggest frustration is going to a friend's house that is blanketed with
Echo/Google coverage and realizing that all my words could be used against me
at a later time.

How does one approach this situation?

~~~
DanHulton
I mean, it's roughly the same level of exposure you run into literally every
minute you're out in public anyway - all those phones around you have
microphones, and more and more, they're always listening in exactly the same
fashion. This is how always-on "Hey Siri" or "Okay Google" works.
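
A toy sketch of that always-on flow, simulated with text frames standing in
for audio (all names and the trigger phrase here are hypothetical): frames
sit in a small rolling on-device buffer, and nothing leaves the device until
a local wake-word detector fires.

```python
import collections

WAKE_WORD = "ok google"  # hypothetical trigger phrase

def assistant_loop(frames, buffer_seconds=2):
    """Toy model of an always-on assistant. Each frame stands in for one
    second of (already transcribed) audio. Frames sit in a small rolling
    on-device buffer; nothing is 'uploaded' until the local wake-word
    detector fires, after which audio streams out until silence."""
    ring = collections.deque(maxlen=buffer_seconds)  # on-device buffer
    uploaded = []        # what actually leaves the device
    listening = False
    for frame in frames:
        ring.append(frame)
        if listening:
            uploaded.append(frame)    # streamed to the cloud recognizer
            if frame == "<silence>":
                listening = False     # end of the query
        elif WAKE_WORD in frame:
            listening = True
            uploaded.extend(ring)     # buffered context goes along too

    return uploaded

frames = ["chatter", "more chatter", "ok google what time is it", "<silence>"]
print(assistant_loop(frames))
# ['more chatter', 'ok google what time is it', '<silence>']
```

Note that the pre-trigger frame "more chatter" leaves the device too, because
it was still in the rolling buffer when the wake word fired; that is exactly
the kind of incidental audio a human reviewer can end up hearing.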

Which isn't to say it isn't frustrating, just that your frustration at your
friend is misplaced: they're not exposing you to anything you aren't already
exposed to. If your fear is that these tech companies are going to misuse
casually overheard conversations from these devices, they're just as likely
to get them from all the phones you're near.

If your fear is that your friend is going to use this tech, however, maybe you
need better friends. There's nothing stopping an unscrupulous friend from
putting plain-old audio recorders through their house anyway.

~~~
thrwer3434312
Scary though, isn't it? With increasing moves by these companies to govern
human thought, I can imagine a time when it'll be extremely difficult to have
any independent thought of your own. It's already this way in academia and
the media, where anything outside the norm is considered 'blasphemy'.

------
solarkraft
What's next, Google collecting everyone's location data?

------
intopieces
Will someone walk me through the nightmare scenario implied by this
revelation?

Random contractor overhears me saying 1 minute’s worth of speech in my home.
Doesn’t know my name or where I live.

What happens?

------
tus88
I don't see the problem with sharing short anonymous fragments to help
resolve recognition problems. Full conversations, even anonymized, seem
problematic though.

------
anonygler
As an employee I can opt into having my Home recordings reviewed to improve
quality. I’ve never heard about this being done for regular users.

------
omnimkar69
Google employees listen to the recordings made with the smart Google Home
speakers and via the Google Assistant app on smartphones. Worldwide, for
example in Belgium and the Netherlands, people listen to those recordings to
make the search engine smarter. VRT NWS was able to listen to more than 1,000
recordings, which is why the employees must be listening to home
conversations.

------
omnimkar69
Google employees listen to the recordings made with the smart Google Home
speakers and via the Google Assistant app on smartphones; worldwide,
including in Belgium and the Netherlands, people listen to those recordings
to make the search engine smarter. VRT NWS was able to listen to more than
1,000 recordings; these are mostly pieces spoken consciously.

------
kerng
What is often most disturbing about these revelations is that these are
contractors who have that amount of access...

------
cj
I've long been curious how GDPR should be applied to Google Home, Alexa, etc.

These devices record and upload voice recordings to the cloud (with a
percentage apparently shared with Google contractors to process). The audio
clips can be recordings of conversations of people who have not consented to
being recorded. And I can't imagine it's possible for Amazon or Google to
reasonably comply with Art. 15 or 17 (right of access by data subject, and
right to erasure).

Is there something I'm overlooking? Or is this just a known risk that
Amazon/Google accept in providing the service?

~~~
noelsusman
[https://myaccount.google.com/data-and-
personalization](https://myaccount.google.com/data-and-personalization)

Click on Voice & Audio Activity, then Manage Activity. You can listen to and
delete every recording Google has of your voice. It's a little trippy to be
honest. They've had this since long before the GDPR.

------
Markoff
Next you'll tell me they listen to voice input data from the Assistant...

Just joking. I actually had access to that data when working on a Google
project: if you use Gboard, voice input, and Google Assistant, Google has
basically everything you input on your phone "for research purposes".

------
mbrumlow
I think this is the last straw for me. I have already tinkered with the idea
and built enough for a POC, but now I am going to have to replace all my
Google Home and Amazon stuff with home-built systems where I control both the
hardware (to an extent) and the software.

------
Hernanpm
How would you reproduce a bug otherwise?

------
a3n
I'm shocked. Shocked.

------
Hongwei
How big is this GDPR fine going to be?

------
_pmf_
Intersect certain church bell sounds with Google assistant buyers in a small
village and you have the individual.

EDIT: thinking about it, a database of temporal-geographical background sounds
would be nice to have

------
jacobsenscott
What's most interesting to me is that an army of contractors is required to
get even the low accuracy levels of voice recognition we have today. The "AI"
revolution is pure smoke and mirrors designed entirely to bilk investors out
of their dollars. There's been no improvement in "AI" in the last 50 years,
except that we have a lot more data to push through the same useless models.

~~~
jononor
100 bucks for anyone who uses a 50-year-old model/method and gets within
twice the word error rate of today's speech recognition systems.

Also, how do you explain the progress on ImageNet challenge over the last 10
years? Each method there uses the same dataset, yet error rates (top5) have
gone from 30% in 2011 to around 3% in 2016.

~~~
jacobsenscott
Faster computers, and more of them.

~~~
jononor
Not going to deny that faster GPUs have helped a lot, but that is not the
full picture either. DenseNet-121 has higher accuracy than VGG16, yet
requires only 1/15 the storage space and 1/5 the number of operations during
inference. I'm pretty sure training time is faster as well.
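
The size gap is easy to sanity-check from the architectures alone. A quick
back-of-the-envelope count of VGG16's weights (thirteen 3x3 convolutions plus
three fully connected layers), compared against DenseNet-121's published
parameter count of roughly 8M (taken here as a given rather than rederived):

```python
# VGG16 channel progression: 3x3 convs per Simonyan & Zisserman (2014),
# followed by FC-4096, FC-4096, FC-1000. Each layer also has a bias per
# output channel/unit.
conv_channels = [(3, 64), (64, 64),
                 (64, 128), (128, 128),
                 (128, 256), (256, 256), (256, 256),
                 (256, 512), (512, 512), (512, 512),
                 (512, 512), (512, 512), (512, 512)]

conv_params = sum(3 * 3 * cin * cout + cout for cin, cout in conv_channels)
fc_params = ((512 * 7 * 7) * 4096 + 4096   # flatten 7x7x512 -> FC-4096
             + 4096 * 4096 + 4096          # FC-4096 -> FC-4096
             + 4096 * 1000 + 1000)         # FC-4096 -> 1000 classes

vgg16_total = conv_params + fc_params
densenet121_total = 7_978_856  # published figure, not computed here

print(f"VGG16:        {vgg16_total:,} parameters")   # ~138M
print(f"DenseNet-121: {densenet121_total:,} parameters")
print(f"ratio: {vgg16_total / densenet121_total:.1f}x")
```

The ratio comes out a bit above the 1/15 quoted above (most of VGG16's bulk
is the first fully connected layer alone), which is the point: the accuracy
gains came from architectural ideas, not just bigger hardware.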

------
notTyler
I very much believe the only reason smartphones aren't listening to us and
recording us constantly is battery life. When something is plugged in, yeah,
I have no trouble believing it's recording everything and storing it
somewhere.

