
Privacy Loss in Apple's Implementation of Differential Privacy on MacOS 10.12 - sohkamyung
https://arxiv.org/abs/1709.02753
======
frankmcsherry
While I like this work, the title may be misleading to people who think of
"privacy loss" as something distinct from "differential privacy".

What they show is that as you use Apple's implementation, the differential
privacy parameter grows (providing weaker guarantees as time passes). They
don't show that they can bypass the mechanism and its guarantees, just that
Apple has rigged the implementation to decay the guarantees as you continue to
use it (note: the decay stops if you stop using Apple stuff).

~~~
ramensea
This is a known weakness with differential privacy. Depending on how much
noise you inject into your data, your analytics either approach
meaninglessness or the privacy decays.

The hype machine surrounding Apple did not want to hear this and was just
caught up in the idea that "Apple could data mine you while maintaining your
privacy".

------
bugmen0t
Funny that almost everyone in this thread seems to "get" Differential Privacy
and thinks of it as a good tool. But when it was discussed for Mozilla Firefox
everybody was appalled and enraged. (Thread at
[https://news.ycombinator.com/item?id=15071492](https://news.ycombinator.com/item?id=15071492))

~~~
frankmcsherry
I think the big difference there is that Mozilla was proposing opt-out
differentially private data collection, whereas I believe Apple has
historically been opt-in (with the default being no data collection).

~~~
FlyingLawnmower
Do you believe an opt-in is necessary with a (properly implemented)
differentially private collection mechanism? Just curious about your take on
this.

~~~
frankmcsherry
I think that is the ethical thing to do, yes.

Differential privacy _does_ statistically disclose information about you, and
whether the quantitative bound used is up to your standards is a decision you
should make before it happens. Given that user-level understanding of DP is so
low, I don't think defaulting people into levels chosen by others is a good
idea. I 100% guarantee Mozilla doesn't have the background to make this choice
responsibly, and doesn't have anywhere near the DP expertise the RAPPOR team
has / had (who I also wouldn't trust to choose things for me).

Ideally, the use of differential privacy would make you more willing to opt in
(when you have a choice) rather than being a smokescreen for organizations
that simply want to harvest more data (which was Mozilla's stated motivation).

Edit: fwiw, there are some cool "recent" versions of differential privacy that
let the users control the amount of privacy loss on a user-by-user basis. So,
you could start at 0 and dial it up as you feel more comfortable with the
tech. This incentivizes organizations to be more transparent about what they
do, as it (in principle) increases turnout.

Edit2: For context, Apple's "default" values appear to be (from this paper)
epsilon = 16 * days. That means that for each day you are active, the posterior
odds someone can assign to any fact about you may increase by a factor of up to
exp(16) ~= 9 million. So, numbers matter and I am (i) glad Apple made it opt-
in, (ii) super disappointed they aren't at all transparent about how it works,
and (iii) thankful that the paper authors are doing this work.

~~~
korolova
And it seems that the "default" values will be increasing further in iOS 11 --
to 43 * days.
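To make those per-day figures concrete, here is a quick sketch of my own (not from the paper) of how the worst-case bound compounds, assuming the standard additive composition rule for pure differential privacy:

```python
import math

# Under pure differential privacy, an adversary's posterior odds about any
# fact can grow by at most exp(epsilon). Per-day epsilons compose additively
# across days, so the worst-case odds bound multiplies day over day.
def odds_bound(eps_per_day: float, days: int) -> float:
    """Worst-case multiplicative increase in posterior odds after `days`."""
    return math.exp(eps_per_day * days)

# Per-day values reported in the paper: 16 (macOS 10.12), 43 (iOS 11).
for eps in (16, 43):
    print(f"eps = {eps}/day: 1 day -> {odds_bound(eps, 1):.3g}, "
          f"7 days -> {odds_bound(eps, 7):.3g}")
```

After a single day at epsilon = 16 the bound is already about 9 million; a week of use pushes it past 10^48, which is why the cumulative numbers, not the per-query ones, are what matter.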

------
simonh
So now Apple's privacy system is only stupidly more secure than everyone
else's instead of absurdly more secure.

So 16 per day sounds like a lot more than 1 or 2 per day, but what do these
numbers mean? Presumably 16 per day is a theoretical maximum if you were to
generate every kind of privacy-related data every day. But is 16 really a lot?
How high would that have to cumulatively go in order to be useful for
extracting reliable info on an individual? Wouldn't the info collected on an
individual still have to be associated with them? Frankly I'm not really able
to determine any of that from the paper.

------
mirimir
Could someone please ELI5 how an "intimate" provider (such as Apple, Google or
Microsoft) can collect _any_ data on an ongoing basis without eventual loss of
privacy?

~~~
quantisan
Let’s say you wanted to count how many of your online friends were dogs, while
respecting the maxim that, on the Internet, nobody should know you’re a dog.
To do this, you could ask each friend to answer the question “Are you a dog?”
in the following way. Each friend should flip a coin in secret, and answer the
question truthfully if the coin came up heads; but, if the coin came up tails,
that friend should always say “Yes” regardless. Then you could get a good
estimate of the true count from the greater-than-half fraction of your friends
that answered “Yes”. However, you still wouldn’t know which of your friends
was a dog: each answer “Yes” would most likely be due to that friend’s coin
flip coming up tails.

Source: Google's RAPPOR project

I pointed to some open source repos in my blog post from 2015:
[https://www.quantisan.com/a-magical-promise-of-releasing-your-data-and-keeping-everyones-privacy](https://www.quantisan.com/a-magical-promise-of-releasing-your-data-and-keeping-everyones-privacy)
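The mechanism quoted above can be sketched in a few lines of Python (a toy simulation with made-up numbers, not RAPPOR itself). With a fair coin, a "Yes" arrives with probability 0.5 + 0.5·f, where f is the true fraction of dogs, so the count is recovered by inverting that relationship:

```python
import random

random.seed(0)  # fixed seed so the toy run is reproducible

def respond(is_dog: bool) -> bool:
    # Flip a fair coin in secret: tails -> always say "Yes",
    # heads -> answer truthfully.
    return True if random.random() < 0.5 else is_dog

# Hypothetical population: 1000 friends, 200 of whom are dogs.
friends = [i < 200 for i in range(1000)]
answers = [respond(d) for d in friends]

# P(Yes) = 0.5 + 0.5 * true_fraction, so invert to estimate the fraction:
yes_frac = sum(answers) / len(answers)
estimate = 2 * yes_frac - 1
print(f"estimated dog fraction: {estimate:.2f} (true: 0.20)")
```

The aggregate estimate lands near the true 20%, while any individual "Yes" is most likely just a tails flip.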

~~~
tkuichooseyou
In that case though, you would know that the "No" friends are definitely not
dogs, and the "Yes" friends are possibly dogs, so it seems like the dogs
would still not be completely anonymous. Wouldn't the dogs be better off not
partaking in the survey, rather than being narrowed down into a group of
possible dogs?

~~~
tempay
The implementation of this should give a random answer when not being
truthful.

If the coin comes up heads, you answer truthfully. If it comes up tails, you
flip the coin again and answer yes if the second flip is heads and no if it is
tails. You can then no longer know whether anybody is (or is not) a dog.

The probabilities can be adjusted to provide more or less privacy (while
making the data less or more useful). For example, if you only answer
truthfully 0.1% of the time it would be hard to know anything about anyone, at
the cost of knowing the total number of dogs less precisely.

