
Reddit User confirms Siri/Cortana transactions are recorded - benbristow
http://www.reddit.com/r/technology/comments/2wzmmr/everything_youve_ever_said_to_siricortana_has/
======
ibejoeb
I was under the impression that this was common knowledge. Here's a 2-year-old
article about it in Wired:
[http://www.wired.com/2013/04/siri-two-years/](http://www.wired.com/2013/04/siri-two-years/)

\---

Once the voice recording is six months old, Apple “disassociates” your user
number from the clip, deleting the number from the voice file. But it keeps
these disassociated files for up to 18 more months for testing and product
improvement purposes.

“Apple may keep anonymized Siri data for up to two years,” Muller says. “If a
user turns Siri off, both identifiers are deleted immediately along with any
associated data.”

\---

~~~
RandomBK
If Apple "disassociates" the data, how can they delete the files when a user
turns Siri off? Doesn't that imply they still know which piece of data
belonged to a certain user?

~~~
manicdee
1) Apple deletes associated data when the user turns Siri off.

2) Apple dis-associates data after six months.

1 & 2 = data older than six months is not deleted, but is not directly
associated with you. So that Siri search for that topic only of interest to
communist homosexual terrorist Jews will not have your user ID but might still
be identifiable from voice analysis.

You can't escape McCarthyism that easily.
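
The two rules above can be sketched in a few lines of Python. This is only an
illustration of the policy as quoted from Wired, not Apple's actual system:
the names `Clip`, `age_clips` and `turn_siri_off` are hypothetical, and the
day counts are assumed approximations of "six months" and "two years".

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumed approximations of the durations quoted in the article.
DISASSOCIATE_AFTER = timedelta(days=182)  # roughly six months
DELETE_AFTER = timedelta(days=730)        # roughly two years


@dataclass
class Clip:
    recorded_on: date
    user_id: Optional[str]  # None once the clip has been disassociated


def age_clips(clips, today):
    """Time-based rules: strip the user ID at six months, drop at two years."""
    kept = []
    for clip in clips:
        age = today - clip.recorded_on
        if age >= DELETE_AFTER:
            continue  # past the retention window entirely
        if age >= DISASSOCIATE_AFTER:
            clip = Clip(clip.recorded_on, None)  # keep the audio, forget the owner
        kept.append(clip)
    return kept


def turn_siri_off(clips, user_id):
    """Delete clips still linked to the user; disassociated ones cannot be matched."""
    return [clip for clip in clips if clip.user_id != user_id]


today = date(2015, 2, 25)
clips = [
    Clip(date(2015, 1, 10), "user-42"),  # recent, still linked
    Clip(date(2014, 3, 1), "user-42"),   # older than six months
    Clip(date(2012, 12, 1), "user-42"),  # older than two years
]
remaining = turn_siri_off(age_clips(clips, today), "user-42")
```

Under these assumptions the recent clip is deleted when Siri is turned off, the
oldest one has already expired, and the one in between survives because nothing
links it to the user any more.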

------
s3r3nity
Only "recorded" in the sense that your Google/Bing/Yahoo searches are
"recorded," or your browsing history is "recorded". You think your Google Now
voice searches aren't?

I knew that my searches were probably tracked when I started using Cortana
(similar to any other search - unless you're using DDG,) and personally I'm ok
with it if it results in a great experience for me. And I'm amazed almost
daily at how solid Cortana (and Google voice search for that matter) is.

~~~
cortesoft
It is a BIT different to keep a voice recording, because you can recognize a
voice recording, unlike a written sentence.

~~~
falcolas
Not to mention the recording of your surroundings, and audio from before and
after the query.

Such a recording contains quite a bit more than your average search engine
query.

~~~
slg
And your search engine query can contain your IP address, geographic
information, unique browser fingerprint, and your account information if you
are logged in. If Google, Apple, Microsoft, or some malicious entity with
access to their data wants to identify you, it is likely trivially easy
whether through voice or text data.

~~~
falcolas
Oh, absolutely. The difference is that with voice, they also have the
potential to identify the people with you, what you're watching on TV, what
radio station is on, etc.

------
Irishsteve
You can check out the audio that Google stores at:
[https://history.google.com/history/audio](https://history.google.com/history/audio)

~~~
FLUX-YOU
Other useful things that were posted in that thread:

[https://www.google.com/settings/takeout](https://www.google.com/settings/takeout)

[https://www.google.com/settings/dashboard](https://www.google.com/settings/dashboard)

------
speechduh
Uhm, yes, duh. Isn't this common knowledge? How do people think speech
recognition systems are trained? What's more upsetting is that this person is
so uninformed about speech systems that they think this is weird, AND they
have access. The only people who should have access are the people who are
actually doing science.

Ahhh. Reading clarified it: this is someone who got hired to do transcription.

"I'm given an audio file (sound bite) and the corresponding text based
translation (how the phone translated the speech). My job is to listen to the
file, compare it to the text and provide feed back on how correctly the sound
bite was interpreted by the phone. If the text and speech are a perfect match,
I just move on. However, if the phone either translated something incorrectly
due to a heavy accent or loud background noise, I note that in my evaluation."

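The comparison described in that quote is done by ear, but the same idea is
commonly scored as word error rate: the number of word-level edits needed to
turn the phone's text into what was actually said, divided by the number of
words spoken. A minimal sketch, assuming the reviewer has typed out the correct
transcript; `word_error_rate` is a hypothetical helper, not something from the
Reddit post.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between two transcripts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


perfect = word_error_rate("call mom on her cell", "call mom on her cell")
one_wrong = word_error_rate("call mom on her cell", "call tom on her cell")
```

A perfect match scores zero, which corresponds to the "I just move on" case; a
misheard word raises the score in proportion to the length of the utterance.
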
------
chippy
I think the main thing from this claim is that the audio is shared with third
parties.

~~~
frik
Apple's Siri uses Nuance speech recognition technology; afaik Siri ran on
their servers at launch.

It seems they outsourced tasks to another company to improve the speech
recognition. The classification task that the Reddit user does can be used by
Nuance to improve their data.

Another Reddit user mentioned that after doing the job for a while you will
start recognizing speakers:
[http://www.reddit.com/r/technology/comments/2wzmmr/everythin...](http://www.reddit.com/r/technology/comments/2wzmmr/everything_youve_ever_said_to_siricortana_has/covn14e)

~~~
billyhoffman
There are hundreds of millions of iOS and Android devices out there. If
someone is starting to recognize voices, it supports a common sense reaction
that a subset of voice commands are being used for QA/QC purposes to improve
accuracy.

~~~
frik
That's definitely the case. Plus there are some heavy users while many users
try out Siri/GoogleNow/Cortana just a few times and never come back.

------
ebbv
A more accurate title would be "Reddit user claims Siri/Cortana transactions
are recorded." He doesn't supply any proof.

~~~
LLWM
If someone claimed the sky was blue, would you require them to provide proof
before believing them?

~~~
adricnet
Objection: argumentative.

And yes, if the claim didn't match with my observations I would expect that
they have or be prepared to provide proof in some reasonable form.
"Extraordinary claims .." and all that.

~~~
LLWM
The point is that the claim does match and is not extraordinary in any way.

------
bdcravens
Title says "Siri/Cortana", yet a large amount of the discussion revolves
around the saved history for Google Now:
[https://history.google.com/history/audio](https://history.google.com/history/audio)

------
krick
Well, yeah, it might be common knowledge _for some_, but many don't know or
don't care, and I enjoy every time it's "revealed" by some community over and
over again.

But seriously though, is there a way to do something with an Android phone (like
using third-party firmware, removing some programs, configuring firewall) to
be somewhat sure it's "safe" while still being usable?

