
The Algorithmic Foundations of Differential Privacy (2014) [pdf] - sonabinu
https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf
======
Cynddl
Differential privacy appears regularly on Hacker News, either as theoretical
articles or as projects that aim to implement it. Yet there is often a huge
gap between the two.

For example, Apple has touted its use of differential privacy, but
researchers [1] have shown that the privacy budget is reset every day and
that the parameters, buried inside the code, lack a proper derivation.
Similarly, Uber appears to use DP for internal analytics, but the proposed
model does not seem robust and does not produce accurate results [2]. One
should always carefully review claims associated with implementations of
differential privacy; the sketch after the links below illustrates why the
budget reset in particular matters.

[1] [https://arxiv.org/abs/1709.02753](https://arxiv.org/abs/1709.02753)

[2]
[https://github.com/frankmcsherry/blog/blob/master/posts/2018-02-25.md](https://github.com/frankmcsherry/blog/blob/master/posts/2018-02-25.md)
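
To make that concrete, here is a minimal sketch (my own illustration, not
Apple's actual mechanism) of the Laplace mechanism with a running privacy
budget. By sequential composition, k queries at ε each cost kε in total, so
resetting the counter daily means the real lifetime ε keeps growing even
though the reported budget stays small.

```python
import numpy as np

rng = np.random.default_rng()

class BudgetedLaplace:
    """Laplace mechanism that tracks a total epsilon budget."""

    def __init__(self, total_epsilon):
        self.remaining = total_epsilon

    def answer(self, true_value, sensitivity, epsilon):
        if epsilon > self.remaining:
            raise RuntimeError("privacy budget exhausted")
        self.remaining -= epsilon  # composition: per-query epsilons add up
        # Lap(sensitivity / epsilon) noise gives epsilon-DP for this query.
        return true_value + rng.laplace(scale=sensitivity / epsilon)

mech = BudgetedLaplace(total_epsilon=1.0)
noisy = mech.answer(true_value=42, sensitivity=1, epsilon=0.1)
```

Resetting `self.remaining` every day, as [1] reports Apple doing, defeats
the entire point of tracking it.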

~~~
singhrac
One problem I've found with differential privacy is that no one talks about
how to set ε. I've read this book, and it's quite well written and complete,
but, as the title says, it focuses on the algorithmic foundations.

This paper [1] is much better for practitioners: it gives concrete,
reasonable values for the privacy guarantee (e.g., (ε, δ) = (1.2, 1e-9)) and
builds on this great paper: [2]. Worth a read if you train neural networks
(a sketch of the core step from [2] follows the links below).

[1] [https://arxiv.org/pdf/1710.06963.pdf](https://arxiv.org/pdf/1710.06963.pdf)

[2] [https://arxiv.org/pdf/1607.00133.pdf](https://arxiv.org/pdf/1607.00133.pdf)
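
For concreteness, a rough numpy sketch of the core step from [2]: clip each
per-example gradient to L2 norm C, sum, add Gaussian noise with standard
deviation σC, and average over the lot. The gradients below are placeholders,
not a real model; the (ε, δ) values then come out of the moments accountant
given σ, the sampling rate, and the number of steps.

```python
import numpy as np

rng = np.random.default_rng()

def dp_sgd_step(per_example_grads, C=1.0, sigma=1.1):
    # Clip each example's gradient to L2 norm at most C.
    clipped = [g / max(1.0, np.linalg.norm(g) / C) for g in per_example_grads]
    # Add Gaussian noise scaled to the clipping norm, then average.
    noise = rng.normal(0.0, sigma * C, size=clipped[0].shape)
    return (np.sum(clipped, axis=0) + noise) / len(clipped)

# Placeholder per-example gradients for a 10-parameter model, lot size 32.
grads = [rng.normal(size=10) for _ in range(32)]
update = dp_sgd_step(grads)
```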

------
js8
I don't know anything about DP, so my question might be unrelated, but
perhaps someone can answer it. Almost 20 years ago, at university, I was
told of the following problem:

You have two identical databases (sets of n bits) that do not communicate.
You want to learn a single bit from the database. How many bits do you have
to retrieve from each database so that neither database learns which bit you
were looking for?

The simplest answer is n: retrieve all the bits. But we were also given a
better answer, square root of n: arrange the bits into a square, then ask
each database for the XOR of a random subset of columns, where the subset
sent to the first database includes the column you're looking for and the
subset sent to the second does not (sketched below).
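
A toy sketch of that square-root scheme, as I understand it (my own
reconstruction; parameter names are mine):

```python
import numpy as np

# Both servers hold the same n = k*k bits as a k x k matrix. To fetch bit
# (r, c), send a random column subset to one database and the same subset
# with column c flipped to the other; each returns, for every row, the XOR
# of its selected columns. XORing the two k-bit answers isolates column c,
# and entry r of that is the target bit. Traffic is O(sqrt(n)), and each
# database sees only a uniformly random subset, so neither learns (r, c).

rng = np.random.default_rng(0)
k = 8
db = rng.integers(0, 2, size=(k, k), dtype=np.uint8)  # same on both servers

def server_answer(mask):
    return (db @ mask) % 2  # per-row XOR of the selected columns

def retrieve(r, c):
    s1 = rng.integers(0, 2, size=k, dtype=np.uint8)
    s2 = s1.copy()
    s2[c] ^= 1  # flip the target column
    return server_answer(s1)[r] ^ server_answer(s2)[r]

assert all(retrieve(r, c) == db[r, c] for r in range(k) for c in range(k))
```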

And here is my question: we were also told that this can be done even
better, in cube root of n bits. But I never learned the answer, and I have
wondered ever since: was that claim correct? Does anyone know this problem
and the better solution?

~~~
frankmcsherry
What you are looking for is "Private Information Retrieval". For the cube root
result, check out:
[http://www.tau.ac.il/~bchor/PIR.pdf](http://www.tau.ac.il/~bchor/PIR.pdf)

------
anon1253
Some interesting concepts to consider: k-anonymity, ℓ-diversity, t-closeness.

From the paper
([https://www.liebertpub.com/doi/full/10.1089/bio.2014.0069](https://www.liebertpub.com/doi/full/10.1089/bio.2014.0069))
I wrote a while ago:

While the risk of re-identification (of a record or individual participant)
might be virtually non-existent with synthetic data, one could predict unknown
attributes of a known individual, given an ideal model of synthesis. In other
words, an attacker could find unknown attributes of some individual with a
certain probability by looking for the closest match in the synthetic data.
This is known as attribute disclosure.

There are several methods for quantifying attribute disclosure, most notably
t-closeness, which is defined as follows: an equivalence class is said to
have t-closeness if the distance between the distribution of a sensitive
attribute in this class and the distribution of the attribute in the whole
table is no more than a threshold t. A table is said to have t-closeness if
all equivalence classes have t-closeness.

In short: the distribution of a particular sensitive value should not be
further away than a distance t from the overall distribution.
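
A minimal sketch of that check, using total variation distance as the metric
(the original t-closeness paper uses Earth Mover's Distance; the `quasi_id`
and `sensitive` column selectors here are hypothetical):

```python
from collections import Counter

def distribution(values):
    counts = Counter(values)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

def has_t_closeness(table, quasi_id, sensitive, t):
    overall = distribution(sensitive(row) for row in table)
    classes = {}
    for row in table:  # group rows by their quasi-identifier
        classes.setdefault(quasi_id(row), []).append(sensitive(row))
    for members in classes.values():
        local = distribution(members)
        # Total variation distance between class and table distributions.
        dist = 0.5 * sum(abs(local.get(v, 0.0) - p) for v, p in overall.items())
        if dist > t:
            return False
    return True
```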

Using the t-closeness metric circumvents issues associated with k-anonymity
and ℓ-diversity. Briefly, k-anonymity states that a certain attribute class
should be present in at least k records, which introduces ambiguity into the
data set. However, if all records in an equivalence class share the same
sensitive value, that value can still be resolved simply by elimination. The
ℓ-diversity metric circumvents this problem by adding a further requirement:
in addition to the class being present in at least k records, these records
must have at least ℓ ‘well represented’ sensitive values. But if an attacker
knows the real-world distribution of values, attributes can still be
disclosed with a certain probability, simply by combining different data
sources (see the sketch below).
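
The corresponding checks, with the same hypothetical selectors (using
"distinct" ℓ-diversity, the simplest of the paper's variants):

```python
from collections import Counter

def has_k_anonymity(table, quasi_id, k):
    # Every quasi-identifier class must contain at least k records.
    classes = Counter(quasi_id(row) for row in table)
    return all(size >= k for size in classes.values())

def has_l_diversity(table, quasi_id, sensitive, l):
    # Distinct l-diversity: at least l different sensitive values per class.
    classes = {}
    for row in table:
        classes.setdefault(quasi_id(row), set()).add(sensitive(row))
    return all(len(values) >= l for values in classes.values())
```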

------
herodotus
Way back in 1978, DeMillo, Lipton, and Dobkin published a note in the IEEE
Transactions on Software Engineering (SE-4(1):73-75, February 1978) called
"Even databases that lie can be compromised". The basic idea was to give
slightly wrong answers to median-type queries in order to protect the values
of individuals. They showed that, even when the query system deliberately
lied, it was possible to compromise the database. I am surprised that this
note is not listed in the bibliography of the differential privacy article.

------
chrispeel
Peter Kairouz [1] recently gave a talk [2] describing mechanisms to get
around the limitations of DP described in other comments.

[1] [https://web.stanford.edu/~kairouzp/](https://web.stanford.edu/~kairouzp/)

[2] [https://youtu.be/6Uur2_TnwYE](https://youtu.be/6Uur2_TnwYE)

------
Asdfbla
I love the concept of differential privacy, but it seems hard to incentivize
the "data hoarders" to actually use it, even if you ignore the challenges of
building real-world differentially private systems. Google and Apple use it
for some things, but in general it doesn't seem like something the market
will adopt on its own.

It also doesn't help that differential privacy is perhaps too arcane and
subtle for the public to discuss and demand, unlike, say, encryption, which
people generally understand to mean hiding their data in some sense.

------
aj7
I’d just love to debate a fascist Justice Department over the meaning of “You
will not be affected...” as I am led off into detention. Real rights are often
negative rights. My home ownership has little to do with entitlement to
activities within. It has everything to do with keeping you out of my house if
that’s what I choose. Get it?

