
Enabling developers and organizations to use differential privacy - ramraj07
https://developers.googleblog.com/2019/09/enabling-developers-and-organizations.html
======
TedTed
Hi, I'm one of the authors of the scientific paper¹ linked in this blog post.
Incidentally, I wrote a series of blog posts explaining differential privacy
in layman's terms. The first post might be "not-technical-enough" for
HackerNews, but maybe the next ones in the series make up for it. Feedback
welcome =)

- [https://desfontain.es/privacy/differential-privacy-awesomene...](https://desfontain.es/privacy/differential-privacy-awesomeness.html)

- [https://desfontain.es/privacy/differential-privacy-in-more-d...](https://desfontain.es/privacy/differential-privacy-in-more-detail.html)

- [https://desfontain.es/privacy/differential-privacy-in-practi...](https://desfontain.es/privacy/differential-privacy-in-practice.html)

- [https://desfontain.es/privacy/almost-differential-privacy.ht...](https://desfontain.es/privacy/almost-differential-privacy.html)
(describes a core intuition behind the system described in our paper)

- [https://desfontain.es/privacy/local-global-differential-priv...](https://desfontain.es/privacy/local-global-differential-privacy.html)

I also think Section 2 of the paper should be readable by most folks with a
basic understanding of SQL and differential privacy.

¹ [https://arxiv.org/abs/1909.01917](https://arxiv.org/abs/1909.01917)

~~~
antpls
After reading all your links, I'm still not sure why or where Differential
Privacy is needed.

1) How could aggregated data (means, averages, min/max) be used by attackers?
Isn't aggregated data already private? For example, the Google Postgres
extension returns aggregated data, so why is DP required here?

2) In the case of sharing entire databases, if all the PII is removed, why
does it matter that we can match two records from two databases? Yes, we can
do correlation between 2 databases, but if PII were not gathered and stored at
all in any database, there would be no privacy issue in the first place.

~~~
TedTed
Good questions =)

1) Note that the "min/max" example trivially leaks individual information: for
example, releasing the max salary of employees of a company leaks the salary
of the CEO. More generally, there have been numerous attacks on privacy
notions based purely on aggregate data. One of my favorites is this one:
[https://blog.acolyer.org/2017/05/15/trajectory-recovery-from...](https://blog.acolyer.org/2017/05/15/trajectory-recovery-from-ash-user-privacy-is-not-preserved-in-aggregated-mobility-data/)

2) Typically, PII is not the only thing that can be used to reidentify
someone, and matching records from different databases can sometimes reveal
sensitive information about people. One example:
[https://www.cs.cornell.edu/~shmat/netflix-faq.html](https://www.cs.cornell.edu/~shmat/netflix-faq.html)
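
To make point 1 more concrete: the usual fix is to add calibrated noise to
the aggregate before releasing it. Here is a toy sketch of the Laplace
mechanism for a count query (my own illustration for this thread, not code
from the library in the post):

```python
import math
import random

def laplace_noise(scale):
    # Sample from Laplace(0, scale) via inverse-CDF sampling.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def dp_count(records, predicate, epsilon):
    """Differentially private count. A count has sensitivity 1: adding or
    removing one person changes the true answer by at most 1, so Laplace
    noise with scale 1/epsilon hides any single person's presence."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Releasing max(salaries) would leak the CEO's salary exactly; a noisy
# count of "salary > 1M" only reveals a fuzzy aggregate.
salaries = [45_000 + 500 * i for i in range(200)] + [5_000_000]
print(dp_count(salaries, lambda s: s > 1_000_000, epsilon=1.0))
```

The same idea generalizes to sums, means, and histograms, with the noise
scale growing with how much one person can move the statistic.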

~~~
antpls
I'm still not convinced, but I guess I'm lacking critical technical background
to grasp it.

1) The CEO example isn't really a good one to me; given the wealth
inequalities in the world, leaking the CEO's salary is almost desirable... I
tried to read the blog post and paper about mobile location data. At one
point they talk about aggregated data, but then the paper says: "This dataset
is collected by a major mobile network operator in China. It is a large-scale
dataset including 100,000 mobile users with the duration of one week, between
April 1st and 7th, 2016. It records the spatiotemporal information of mobile
subscribers when they access cellular network (i.e., making phone calls,
sending texts, or consuming data plan). _It also contains anonymous user
identification, accessed base stations and timestamp of each access._"
So... the data is not really "aggregated"? The dataset literally lists some
user IDs.

2) If I'm fired because my boss didn't like my movie history, then it can
probably be defended in court, depending on the country. I could also find
another boss who has a natural sense of ethics and who doesn't judge me for
what I watch.

Thank you for the links anyway. I will look at them again in a few days to
see if I missed something.

~~~
vimota
"The CEO example isn't really a good one to me, given the wealth inequalities
in the world, leaking CEO's salary is almost desirable... "

Your value judgement about a potential attack vector doesn't change the fact
that it is an attack vector.

------
the_duke
Since neither the post nor the repo explain it, some context if you have no
clue what this is about:
[https://en.wikipedia.org/wiki/Differential_privacy#%CE%B5-di...](https://en.wikipedia.org/wiki/Differential_privacy#%CE%B5-differential_privacy)

~~~
kuu
I mean, the text explains it:

 _Differentially-private data analysis is a principled approach that enables
organizations to learn from the majority of their data while simultaneously
ensuring that those results do not allow any individual's data to be
distinguished or re-identified._

~~~
TeMPOraL
This is literally: "$Method is a $buzzword that enables $DesirableThing". If
you already know that $DesirableThing is desirable, this sentence tells you
nothing about how $Method works.

~~~
chii
It may take too much technical expertise to understand how $Method works. For
example, most people don't understand asymmetric encryption, but still
"blindly" trust that it works because they are told to by a trusted authority.

------
QuadrupleA
I think this HN adtech paranoia is getting a little extreme lately, e.g. many
of the comments from "ocdtrekkie" here ("Wow, Googlers in full force today.",
"This is a market that needs to be shut down.")

I definitely share mixed feelings about these adtech companies, and tend to
think these personalized, targeted surveillance ads should die. But what about
just plain old contextual, relevant ads, e.g. ads for car parts on a hot rod
site?

There's a place for web advertising in general - it supports "free" sites
better than any other model invented thus far. I don't think it's gonna be
possible to burn the whole place down and go back to 1993.

Subscriptions work for large high-value offerings like Netflix or NY Times,
but average people with small sites are unlikely to be able to provide that
level of value. Other alternatives like micropayments always seem to fail,
because even if viewing an article costs 1-cent or 1/100 cent, it causes users
to hand-wring and watch their usage carefully, deliberate about their browsing
choices and eventually stop. It's a psychological thing - people loved it when
AOL switched from by-the-minute billing to unlimited, even though they
generally paid less under the old scheme. Suddenly they were free to just
browse without thinking about it, which is also what ad-sponsored content
enables on the web.

There are certainly people who just create for attention / altruism alone and
get no material reward - more power to them - but they'll have to keep their
day jobs.

In a way I'm gratified to see adblocking taking off, because the ad industry
constantly misbehaves and does every sleazy desperate thing it can - from
seizure-flashes and automatically-opening popup windows in the 90s, to
performance-killing HTML5 garbage ads, distracting animations and creepy
privacy-invading remarketing stuff today. Desperate, annoying junk that is
indeed killing the web (and people's trust in advertising).

I personally think a move back to classy, non-personalized ads could be the
way to go, ideally static images that don't track you and don't fry your CPU &
battery with 700 JavaScript libraries. That may be wishful thinking too. But
perhaps the ad industry can try a little harder to stem the tide of garbage
and win people's trust back a bit.

~~~
enraged_camel
>>But what about just plain old contextual, relevant ads, e.g. ads for car
parts on a hot rod site?

Those existed before Google/Facebook/et al were a thing. You don’t need
“adtech” for them.

Adtech by and large refers to the insanely complex tech infrastructure behind
tracking people across the Internet and using a variety of tricks and dark or
semi-dark patterns to try to get them to click ads, at the definite but not
immediately obvious expense of privacy.

~~~
QuadrupleA
Again I wonder whether all the tracking BS is necessary. Google, Facebook &
Twitter aggregated a huge audience by giving away high-quality products, e.g.
search, gmail, maps, etc.

They could make a ton of money selling ad space whether it's privacy-invading
or not (indeed, search ads did just that for many years by targeting the
search term, not the personal details of the searcher).

~~~
nexuist
I think the problem is that the market demands growth and while basic search
ads did yield insane profits, we are now approaching the end of the Internet
growth period where everyone who has expendable capital is also already on
these platforms. Tech companies have effectively run out of people to convert
into customers and thus they have to do these crazy tracking shenanigans to
try to squeeze as much out of every individual as possible.

------
mplanchard
Ironically, I cannot see the article on Firefox for iOS without disabling
tracking prevention.

~~~
s_Hogg
Really? I've got Privacy Badger working in my browser (chrome) and everything
is fine. Wonder what the specific difference is.

~~~
mosselman
You might have added some exceptions for Google in your Privacy Badger
installation or PB hasn't recognised Google as a tracker (yet).

Although I have to add that my computer's installation of Firefox (with all
tracking protections enabled) does not make this blog unreadable. My settings
in uBlock do, though (third-party requests are all blocked).

~~~
panpanna
This site loads JavaScript from a large number of sites (~6-8 domains, 20
sites in total).

This is not "tracking" per se; you can get the site working by accepting
JavaScript from the "right" sites (if you know which ones are not for
tracking).

------
ocdtrekkie
Fundamentally, Google's initiative on differential privacy is motivated by a
desire to not lose data-based ad targeting while trying to hinder the real
solution: Blocking data collection entirely and letting their business fail.

In a world where Google is now hurting content creators and site owners more
than it is helping them[1], I see no reason to help Google via differential
privacy when outright blocking tracking data is a viable solution.

[1] [https://sparktoro.com/blog/less-than-half-of-google-searches...](https://sparktoro.com/blog/less-than-half-of-google-searches-now-result-in-a-click/)

~~~
joshuamorton
If differential privacy solutions lead to things consumers want more, like
better products (think: more accurate search results without uniquely targeted
tracking), then is blocking all tracking viable?

~~~
ocdtrekkie
Can we not pretend your employer wants to use this for better search results
and be up front that it's about better ad revenue, which isn't beneficial or
desirable to consumers?

~~~
kalleboo
As an independent developer, I really want to know what features of my app
are actually being used (and what devices they're using it on, OS versions,
etc.). I try to talk to my users, but you only hear from the loud, annoying
ones, not the silent majority, and the conclusions are inevitably wrong.

I'd love a tool like differential privacy to gather statistics in a provably
anonymous way. Without a tool like that, only companies with shedloads of
money (like Google, Microsoft, Apple) can afford the market research (or the
amount of spaghetti to throw at the wall) to compete.

------
mailshanx
Something that has always bothered me when I read posts on Google's blogs:
why is it always authored by a PM? Why couldn't a senior engineer also write
an announcement post occasionally?

Yes, the names of the engineers on the team are present in the acknowledgement
section: but, this is a single line at the bottom of the post, whereas the
name of the PM and the fact that he is the author is featured prominently
below the title. This pattern is common across many product/OSS library
announcements.

Sure, one could argue that the PM has a holistic view of the product or
library being announced, and that developing this perspective is in fact their
job. But surely a sufficiently senior engineer can (and often does) have an
equally holistic, or perhaps even more insightful overview. At least
sometimes. Even if this were not the case, why not acknowledge everyone's
contributions at the same place in the article?

I think this is symptomatic of the ubiquitous class divide between the "suits"
and "nerds" in the corporate world.

~~~
summerlight
In my observation, it's mostly because engineers usually don't want to write
this kind of article, which needs to be reviewed by multiple stakeholders. If
they want to write it themselves, I believe it's possible, and there are some
instances of that as well.

------
pixelcort
In addition to Differential Privacy, Secure Multiparty Computation is another
way to maintain privacy, while allowing computation across multiple users.

[https://en.m.wikipedia.org/wiki/Secure_multi-party_computati...](https://en.m.wikipedia.org/wiki/Secure_multi-party_computation)

The benefit of this is that you can get an exact computation, whereas with
differential privacy the output is rougher.

The benefit of differential privacy is that it does not rely on the trust of a
majority of other users; you can theoretically verify that a certain percent
of the time your device sends out a wrong answer.
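
That local-model trick can be sketched with classic randomized response (a
toy, hypothetical version of what RAPPOR-style systems do, not the actual
protocol):

```python
import math
import random

def randomized_response(truth, epsilon):
    """Each device reports its true bit only with a known probability,
    so any single report is plausibly deniable, yet an aggregator can
    still estimate the population-wide rate."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1)
    return truth if random.random() < p_truth else not truth

def estimate_rate(reports, epsilon):
    # Invert the known flip probability to debias the raw average.
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    observed = sum(reports) / len(reports)
    return (observed + p - 1) / (2 * p - 1)

# 30% of users have some sensitive attribute; no individual report can
# be trusted, but the aggregate estimate comes out close to 30%.
random.seed(0)
reports = [randomized_response(random.random() < 0.3, epsilon=1.0)
           for _ in range(100_000)]
print(estimate_rate(reports, epsilon=1.0))
```

Because the flip probability is fixed in the client code, anyone can audit
that their own device really does lie the stated fraction of the time.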

~~~
perone
The only issue is that you won't be able to find reasonable libraries for it;
all of them are just PoCs without testing or stability.

~~~
aloknnikhil
Kyber has a pretty solid implementation of Shamir Sharing. But Shamir Sharing
itself has security concerns.

[https://godoc.org/gopkg.in/dedis/kyber.v2/share](https://godoc.org/gopkg.in/dedis/kyber.v2/share)

[https://github.com/dedis/kyber](https://github.com/dedis/kyber)
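
For anyone unfamiliar with what Shamir sharing actually does, here is a toy
sketch over a prime field (purely illustrative; Kyber's `share` package is a
real implementation with very different internals):

```python
import random

PRIME = 2**31 - 1  # a Mersenne prime; all arithmetic is mod PRIME

def make_shares(secret, threshold, n):
    # A random polynomial of degree threshold-1 whose constant term is
    # the secret; each share is one point on the polynomial. Fewer than
    # `threshold` points reveal nothing about the constant term.
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(threshold - 1)]
    def f(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    # Lagrange interpolation evaluated at x=0 recovers the constant term.
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        # pow(den, PRIME-2, PRIME) is the modular inverse (Fermat).
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret
```

Any 3 of 5 shares recover the secret; 2 shares look uniformly random.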

~~~
Ar-Curunir
Secret sharing is a tiny component of MPC; you need much more to compute
anything useful.

------
bayesian_horse
The problem I see with differential privacy is this: One part of the public
doesn't care about privacy enough to demand things like that, the other part
wouldn't trust the math and the implementation behind it.

I mean, I consider myself moderately knowledgeable about statistics, but even
I have problems understanding DP. Worse, scientists who are supposed to use
it will also have a harder time understanding DP compared to their usual
methods.

~~~
Spivak
I mean, it really has nothing to do with the math. It's the fact that you
have to send the real data to an untrusted 3rd party and rely on their word
that they will anonymize your data.

And if a situation arises where a manager at Google has to make the decision
to 'slightly' reduce the effectiveness of differential privacy because they
need a certain metric for a report do you really think they're going to make
the principled choice?

------
colton
There are a number of interesting technical challenges related to making
differential privacy work in production (e.g. implementing novel algorithms
for ML and statistical inference, proving privacy properties).

If you are interested in learning more, my company (LeapYear) is hiring
differential privacy researchers, as well as software engineers interested in
developing an enterprise machine learning platform.

Some background on our team: We recently raised our Series B and hired
VMware's first VP of Engineering, who scaled VMware from 15 to 750+
engineers. Almost all of our backend code is written in Haskell.

On the commercial side, we’ve signed several multi-million dollar contracts
with Fortune 100 customers in financial services, healthcare, & tech, and
deployed on sensitive data at petabyte scale.

Happy to answer questions and review applications submitted here:
[https://leapyear.ai/careers](https://leapyear.ai/careers)

~~~
rpglover64
Hi Colton!

Having worked at LeapYear for just over 3 years now, I can confirm that it's a
good place to work and is solving interesting technical problems.

I can also answer questions, if anyone is interested.

------
m3kw9
To me, whatever privacy Google talks of doing should be taken with a massive
_asterisk_.

~~~
homonculus1
It should just be wholly disregarded. These companies are utterly
untrustworthy and any belief that they suddenly care about your interests for
some reason is naïve.

------
xafke
I made a video about this a while back:
[https://youtu.be/gI0wk1CXlsQ](https://youtu.be/gI0wk1CXlsQ)

I think it's good that someone is putting in the work and open sourcing tools
to make differential privacy easier. But at the same time I'm wondering if
this is just a smokescreen put up by Google.

------
formalsystem
Differential privacy doesn't mean privacy via something like encryption. It
means that a company can query the dataset without exposing sensitive
information about a small population in the dataset.

You still have to trust the company hosting the dataset so distributed
solutions lend themselves more naturally to trust.

~~~
rpglover64
While this is true, there's some nuance.

First of all, there's a lot of recent (and not so recent) work in Local
Differential Privacy [1], which uses the "untrusted curator" model. Although
this software doesn't use it, the article mentions RAPPOR, which is a good
example.

Second of all, encryption protects your _data_, but not your _privacy_; that
is, assuming your data gets used in any way, you have no guarantees about
whether the result reveals anything you'd rather keep secret. Of course, if
you're talking about normal encryption, your data _can't_ be used, but then
you're not really sharing it at all, as much as storing it there (like
Dropbox). But once you start talking about things like homomorphic encryption
or secure multiparty computation, it's important to keep in mind that they are
complements to differential privacy, not replacements.

[1]:
[https://en.wikipedia.org/wiki/Local_differential_privacy](https://en.wikipedia.org/wiki/Local_differential_privacy)

------
joefkelley
I read the article, thought to myself, "let's see how HN finds a way to say
this is actually bad for privacy", clicked through to comments here, and was
not disappointed. The hivemind anti-Google kneejerking is quite out of
control.

~~~
spookthesunset
It really is getting out of control. Anything with "google" in the title gets
overrun with people with an axe to grind. It is very tiring.

------
ocdtrekkie
Wow, Googlers in full force today.

This isn't a market that needs to be powered. This is a market that needs to
be shut down. Targeted advertising is inherently harmful to society.

~~~
summerlight
Well, the problem here is that the market doesn't agree with you, which is
why it exists. I don't want to argue on the topic of justice (which is pretty
subjective), but your argument should give a rationale for the value of
following your solution and why it is more beneficial to society by some
objective measure, instead of being a groundless dogmatic assertion.

~~~
blub
Markets are amoral and should not be used to justify anything.

You're essentially arguing that something is beneficial to society because it
makes money (mostly for Google).

------
thecleaner
I expect Google to file a patent for this soon.

~~~
sveme
Apple has been talking about (and implementing) Differential Privacy for
years.

~~~
cromwellian
Chrome was shipping differential privacy (RAPPOR) before Apple.

~~~
thecleaner
I hope people are aware that Google has filed patents for Batch Normalization
and Dropout. Both patents are very broad.

~~~
cromwellian
True, and would be a concern if Google had a litigious history of being a
patent shakedown artist.

~~~
JohnFen
It's a concern regardless.

~~~
cromwellian
Agreed. My observation is that patent shakedowns are the last resort of a
market loser. Once your company starts losing vast amounts of marketshare to
competitors, the patent warchest comes out.

IBM and Microsoft were notorious for this, as was AT&T. Microsoft has reformed
recently.

~~~
ocdtrekkie
The problem, of course, is that Google isn't the market loser... _yet_. So
while Google doesn't have a history of patent shakedowns _yet_, if we look at
other tech companies that are past their prime, we can make a reasonable
guess that Google will eventually join the patent shakedown game.

------
vijaybritto
Is Google using this in their own products? They were just fined $170M in the
past few days!

~~~
londons_explore
I think you'll find legislators are less worried about something being
"mathematically proven to protect privacy robustly", and more worried about
just collecting money from someone publicly perceived as evil.

If they cared about actual privacy, they would go after no-name ad networks
and data mining companies.

~~~
simion314
And why not start with the source? Google and Facebook sell your data to the
ad networks; maybe Google and Facebook receive more complaints than some
unknown ad networks that we did not even know existed before GDPR forced
sites to disclose them.

~~~
aorav
Could you provide some source about Google selling your data to ad networks?
Because Google explicitly says that they don’t do that, and none of the fines
received by Google are for selling your data. I’m curious to read about this.

~~~
JohnFen
Personally, whether or not Google actually sells raw data isn't that important
to me. My objection is that they (and all other companies that do this)
_collect_ the data in the first place.

