Hacker News
Enabling developers and organizations to use differential privacy (googleblog.com)
261 points by ramraj07 12 days ago | 170 comments





Hi, I'm one of the authors of the scientific paper¹ linked in this blog post. Incidentally, I wrote a series of blog posts explaining differential privacy in layman's terms. The first post might be "not-technical-enough" for HackerNews, but maybe the next ones in the series make up for it. Feedback welcome =)

- https://desfontain.es/privacy/differential-privacy-awesomene...

- https://desfontain.es/privacy/differential-privacy-in-more-d...

- https://desfontain.es/privacy/differential-privacy-in-practi...

- https://desfontain.es/privacy/almost-differential-privacy.ht... (describes a core intuition behind the system described in our paper)

- https://desfontain.es/privacy/local-global-differential-priv...

I also think Section 2 of the paper should be readable by most folks with a basic understanding of SQL and differential privacy.

¹ https://arxiv.org/abs/1909.01917


Thank you. I'll try to digest these. I barely understand the math involved. So I'm pretty sure my "intuition" is wide of the mark. I'm very grateful for your efforts to explain and socialize your work.

When I studied cryptographic voting systems, my "aha" moment was realizing the magic sauce is creating hash collisions so that a secure one-way hash can be used to protect voter privacy.

Re-re-reading the differential privacy stuff, this para jumped out:

https://en.wikipedia.org/wiki/Differential_privacy#ε-differe...

"The intuition for the 2006 definition of ε-differential privacy is that a person's privacy cannot be compromised by a statistical release if their data are not in the database. Therefore with differential privacy, the goal is to give each individual roughly the same privacy that would result from having their data removed."
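For reference, the formal statement that this paragraph paraphrases is usually written as below. The notation is an assumption for illustration, not quoted from the Wikipedia page: M is the randomized mechanism, D and D' are two databases that differ in one person's data, and S is any set of possible outputs.

```latex
% M is \varepsilon-differentially private if, for all databases D and D'
% differing in one individual's data, and for all sets S of outputs:
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[\mathcal{M}(D') \in S]
```

The smaller ε is, the closer the two output distributions must be, so the release reveals less about whether any one person's data was included.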

Oh. The "differential" part means modeling the difference between data captured and not captured.

I think (hope) this means figuring how much to fuzz the capture data so that hash collisions will match real world fuzzing.

--

Again, I'll continue to try to grok this stuff. Real world (story book style) examples will be very helpful.

Until I do understand, I think it's crucial for crypto and privacy minded people to quantify the assumptions and context involved. When I was working on election integrity (and medical records & guarding patient privacy), all the discussions were just make believe. I did help author a govt report meant to help quantify the attack surface area for election administration. But I don't think it did much good, nor was it replicable (to new contexts).

So bravo. Please keep going.


After reading all your links, I'm still not sure why or where Differential Privacy is needed.

1) How could aggregated data (mean, average, min, max) be used by attackers? Isn't aggregated data already private? For example, the Google postgres extension returns aggregated data, why is DP required here?

2) In the case of sharing entire databases, if all the PII are removed, why does it matter that we can match two records from two databases? Yes we can do correlation between 2 databases, but if PII were not gathered and stored at all in any database, there would be no privacy issue in the first place.


Good questions =)

1) Note that the "min/max" example trivially leaks individual information: for example, releasing the max salary of employees of a company leaks the salary of the CEO. More generally, there have been numerous attacks on privacy notions purely based on aggregate data. One of my favorites is this one: https://blog.acolyer.org/2017/05/15/trajectory-recovery-from...

2) Typically, PII is not the only thing that can be used to reidentify someone, and matching records from different databases can sometimes infer sensitive information about people. One example: https://www.cs.cornell.edu/~shmat/netflix-faq.html
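To make point 1 concrete, here is a minimal sketch of how exact aggregates can leak one person's value. The names and salaries are invented for illustration, and the code is not taken from the linked article: it only shows the "max" leak and a simple differencing attack using two sums.

```python
# Hypothetical payroll data, invented for illustration only.
salaries = {"alice": 82_000, "bob": 95_000, "carol": 1_250_000}


def total(names):
    """Exact (non-private) aggregate: sum of salaries for the given people."""
    return sum(salaries[name] for name in names)


everyone = list(salaries)
without_carol = [name for name in everyone if name != "carol"]

# Releasing the exact max reveals the top earner's salary directly.
leaked_max = max(salaries.values())

# Differencing attack: two innocuous-looking sums whose difference
# is exactly one person's salary.
leaked_by_difference = total(everyone) - total(without_carol)

print(leaked_max, leaked_by_difference)
```

Neither query returns a row about an individual, yet both recover Carol's salary exactly. This is the kind of leak that adding calibrated noise to aggregates is meant to limit.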


If you haven't already...

Applying differential privacy to that Netflix case study would be a terrific exercise.


I'm still not convinced, but I guess I'm lacking critical technical background to grasp it.

1) The CEO example isn't really a good one to me, given the wealth inequalities in the world, leaking CEO's salary is almost desirable... I tried to read the blogpost and paper about mobile data location. At one point they talk about aggregated data, but then in the paper: "This dataset is collected by a major mobile network operator in China. It is a large-scale dataset including 100,000 mobile users with the duration of one week, between April 1st and 7th, 2016. It records the spatiotemporal information of mobile subscribers when they access cellular network (i.e., making phone calls, sending texts, or consuming data plan). It also contains anonymous user identification, accessed base stations and timestamp of each access.". So... the data is not really "aggregated"? The dataset literally lists some user IDs.

2) If I'm fired because my boss didn't like my movie history, then it can probably be defended in court, depending on the country. I could also find another boss who has a natural sense of ethics and who doesn't judge me for what I watch.

Thank you for the links anyway. I will look at them again in a few days to see if I missed something.


"The CEO example isn't really a good one to me, given the wealth inequalities in the world, leaking CEO's salary is almost desirable... "

Your value judgement about a potential attack vector doesn't disqualify that it is an attack vector.


After page 20 of your paper are pages of junk that I don't expect you mean to have there (at least in the PDF version).

Yeah, thanks, I messed up the upload to arXiv >< It'll be fixed tonight with the updated version.

Haven't read the paper yet, but have read the blog posts (which are awesome, BTW!).

I'm wondering if you have any thoughts on Frank McSherry's old blog post expressing his distrust for approximate-DP [1]. He seems to have different intuitions than your "almost DP" post expresses and makes criticisms that aren't quite addressed in your post.

[1]: https://github.com/frankmcsherry/blog/blob/master/posts/2017...


Based on a skim, it seems that this requires individuals (or whatever class of entity we don't want to leak info about) be expressly identified in each record of the queried dataset, e.g. with a uid column. So for a dataset that one wanted to use this method with that doesn't identify individuals (e.g. in the web log case, visitors who are not logged in, or simply not recording the logged-in user), one would have to heuristically assign a (potentially synthetic) identity for each record first. Is that right?

This was very insightful, thank you.

^ commenter is a security researcher who works for Google.

This whole area of research seems like it exists as a way to rationalize wide-scale data collection. Rather than focusing on the collective rights of all people being tracked, it focuses on the risk that any individual person faces from an attacker.


Apple uses differential privacy and evangelizes its use. DP is not either-or with consent.

Let's say your mapping app says 'Do you want to contribute traffic information during your drive to help provide a better navigation experience for everyone?' If you click "yes" and opt in, do you, or do you not, want this to use Differential Privacy?


Since neither the post nor the repo explain it, some context if you have no clue what this is about: https://en.wikipedia.org/wiki/Differential_privacy#%CE%B5-di...

In short, you make half of your users send you fake data. For individual users it adds privacy, because you can't know which bits of data are real and which are fake. But overall you can adjust your algorithms to filter out the noise, so you still get some useful signal in aggregate.

I mean, the text explains it:

Differentially-private data analysis is a principled approach that enables organizations to learn from the majority of their data while simultaneously ensuring that those results do not allow any individual's data to be distinguished or re-identified.


This is literally: "$Method is a $buzzword that enables $DesirableThing". If you already know that $DesirableThing is desirable, this sentence tells you nothing about how $Method works.

It may take too much technical expertise to know that $method works. For example, most people don't understand asymmetric encryption, but still "blindly" trust that it works because they are told by a trusted authority.

Probably the method itself is too technical. The way it was usually explained to me is that applying Differential Privacy tries to guarantee that users won't get more information from two different databases, one with PII and one without.

Well they don’t explain the mechanism well:

> The 2006 paper presents both a mathematical definition of differential privacy and a mechanism based on the addition of Laplace noise that satisfies the definition.
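For the curious, here is a minimal sketch of that idea for a counting query. It assumes the standard setup where adding or removing one person changes the count by at most 1 (sensitivity 1), so the noise is drawn from a Laplace distribution with scale 1/ε. The function names and the sample data are hypothetical, and this is not the API of the library in the announcement. Naive floating-point sampling like this is also not considered safe for production use.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two iid exponentials
    that each have mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Return a noisy count of the records matching `predicate`.

    A counting query has sensitivity 1 (one person changes the result by
    at most 1), so the Laplace scale is 1 / epsilon.
    """
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: ages of survey respondents.
ages = [23, 35, 41, 29, 62, 57, 33]
rng = random.Random(0)  # seeded only to make the sketch repeatable
noisy = private_count(ages, lambda age: age >= 40, epsilon=0.5, rng=rng)
print(noisy)  # the true count (3) plus Laplace noise
```

A smaller ε means a larger noise scale: more privacy, less accuracy. The noise has mean zero, so averages over many independent releases converge to the true count, which is why repeated queries have to be charged against a privacy budget.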


That text explains the goals, not what it actually is.

Imagine you want to run a survey to determine how many people have had an extramarital affair. There are obvious problems just asking people "Have you had an affair?" on a survey, especially a live one.

So instead, you ask the person taking the survey to secretly flip a coin and privately look at the result. Then, you word the question as "If the coin came up heads, OR if you've had an affair, check this box".

When you look at overall survey data, and you see 51% of people checked the box, then it's reasonable to infer that 2% of them had an affair. But even if I de-anonymized the survey and saw you checked the box, I don't really have good evidence that you had an affair.
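The arithmetic behind that inference can be sketched as follows, assuming a fair coin and honest respondents. A box gets checked with probability 0.5 + 0.5 × rate, so rate = 2 × (checked fraction) − 1. The function names and the simulated population are made up for illustration.

```python
import random


def respond(had_affair, rng):
    """One respondent: check the box if the secret coin is heads OR the
    true answer is yes."""
    heads = rng.random() < 0.5
    return heads or had_affair


def estimate_rate(checked_fraction):
    """Invert P(checked) = 0.5 + 0.5 * rate to recover the true rate."""
    return 2 * checked_fraction - 1


# The example from the comment: 51% checked implies a rate of about 2%.
print(estimate_rate(0.51))

# Simulated survey with a hypothetical true rate of 2%.
rng = random.Random(42)
truths = [i < 2_000 for i in range(100_000)]
answers = [respond(truth, rng) for truth in truths]
checked_fraction = sum(answers) / len(answers)
estimated = estimate_rate(checked_fraction)
print(checked_fraction, estimated)
```

With a true rate of 2%, someone who checked the box has only about a 0.02 / 0.51 ≈ 4% chance of actually having had an affair, which is the deniability described above. Note that in this one-sided variant an unchecked box still reveals "no affair" with certainty.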


Thanks. I found their technical paper to be approachable as well: https://arxiv.org/abs/1909.01917

I think this HN adtech paranoia is getting a little extreme lately, e.g. many of the comments from "ocdtrekkie" here ("Wow, Googlers in full force today.", "This is a market that needs to be shut down.")

I definitely share mixed feelings about these adtech companies, and tend to think these personalized, targeted surveillance ads should die. But what about just plain old contextual, relevant ads, e.g. ads for car parts on a hot rod site?

There's a place for web advertising in general - it supports "free" sites better than any other model invented thus far. I don't think it's gonna be possible to burn the whole place down and go back to 1993.

Subscriptions work for large high-value offerings like Netflix or NY Times, but average people with small sites are unlikely to be able to provide that level of value. Other alternatives like micropayments always seem to fail, because even if viewing an article costs 1-cent or 1/100 cent, it causes users to hand-wring and watch their usage carefully, deliberate about their browsing choices and eventually stop. It's a psychological thing - people loved it when AOL switched from by-the-minute billing to unlimited, even though they generally paid less under the old scheme. Suddenly they were free to just browse without thinking about it, which is also what ad-sponsored content enables on the web.

There are certainly people who just create for attention / altruism alone and get no material reward - more power to them - but they'll have to keep their day jobs.

In a way I'm gratified to see adblocking taking off, because the ad industry constantly misbehaves and does every sleazy desperate thing it can - from seizure-flashes and automatically-opening popup windows in the 90s, to performance-killing HTML5 garbage ads, distracting animations and creepy privacy-invading remarketing stuff today. Desperate, annoying junk that is indeed killing the web (and people's trust in advertising).

I personally think a move back to classy, non-personalized ads could be the way to go, ideally static images that don't track you and don't fry your CPU & battery with 700 JavaScript libraries. That may be wishful thinking too. But perhaps the ad industry can try a little harder to stem the tide of garbage and win people's trust back a bit.


>>But what about just plain old contextual, relevant ads, e.g. ads for car parts on a hot rod site?

Those existed before Google/Facebook/et al were a thing. You don’t need “adtech” for them.

Adtech by and large refers to the insanely complex tech infrastructure behind tracking people across the Internet and using a variety of tricks and dark-semi-dark patterns to try to get them to click ads, at the definite but not immediately obvious expense of privacy.


Again I wonder whether all the tracking BS is necessary. Google, Facebook & Twitter aggregated a huge audience by giving away high-quality products, e.g. search, gmail, maps, etc.

They could make a ton of money selling ad space whether it's privacy-invading or not (indeed, search ads did just that for many years by targeting the search term, not the personal details of the searcher).


I think the problem is that the market demands growth and while basic search ads did yield insane profits, we are now approaching the end of the Internet growth period where everyone who has expendable capital is also already on these platforms. Tech companies have effectively run out of people to convert into customers and thus they have to do these crazy tracking shenanigans to try to squeeze as much out of every individual as possible.

I feel that what "adtech" refers to varies from person to person. It's becoming one of those charged terms where whatever online advertising techniques the user doesn't like are "adtech", while the techniques they don't mind aren't.

If the infrastructure to select and distribute context-relevant ads is not "adtech", what is it?


See, other than the personal attack (thank you for that), we agree! Contextual, relevant ads work great! There's no reason for targeting and collecting data on users, except that it provides justification for these companies to exist. Just about anyone can provide contextual ads, you need Google or Facebook scale adtech to stalk 7 billion people on the planet.

Around the announcement of Chrome's "Privacy Sandbox" effort, Google lied about the effectiveness of targeted advertising (claiming it was 52% more effective, despite independent studies settling for somewhere around a 4% difference), because it can't survive in a world where we realize it isn't necessary for ad revenue.


Apologies if I made it personal, or misread comments as advocating dumping advertising altogether, sounds like we generally agree - I personally suspect Google, Facebook etc. would do fine without the privacy-invading targeting stuff - they still have hugely popular near-monopolies on which to display ads of whatever kind - Google search, Facebook pages, Twitter feeds etc.

Curious to look into the true effectiveness of this personalized stuff too, even my mom seems creeped out by it. Fingers crossed there's enough backlash and adblock-boycotting that they drop it.


I actually used to strongly crusade against ad blockers. I think advertising is an important avenue for funding free content online. Between privacy violations and the amount of malware and scams distributed via large ad platforms, I've leaned towards ad blocking now, but I'm not fundamentally against seeing ads with my content.

This was a recent study on the effectiveness of personalized ads: https://www.zdnet.com/article/new-research-shows-personalize... (direct link is: https://weis2019.econinfosec.org/wp-content/uploads/sites/6/... )


Ironically, I cannot see the article on Firefox for iOS without disabling tracking prevention.

Really? I've got Privacy Badger working in my browser (chrome) and everything is fine. Wonder what the specific difference is.

Maybe because the privacy policy is “differential”?

You might have added some exceptions for Google in your Privacy Badger installation or PB hasn't recognised Google as a tracker (yet).

Although I have to add that my computer's installation of Firefox (with all tracking protections enabled) does not make this blog unreadable. My settings in ublock do though (third party requests are all blocked).


This site loads JavaScript from a large number of sites (~6-8 domains, 20 sites in total).

This is not "tracking" per se, you can get the site working by accepting JavaScript from the "right" sites (if you know which are not for tracking).


Fundamentally, Google's initiative on differential privacy is motivated by a desire to not lose data-based ad targeting while trying to hinder the real solution: Blocking data collection entirely and letting their business fail.

In a world where Google is now hurting content creators and site owners more than it is helping them[1], I see no reason to help Google via differential privacy when outright blocking tracking data is a viable solution.

[1] https://sparktoro.com/blog/less-than-half-of-google-searches...


> In a world where Google is now hurting content creators and site owners more than it is helping them

I have a hard time believing that because so few sites opt out of Google indexing.


The "source" provided also seems completely irrelevant. Just because I don't click any link after a Google search does not mean that Google doesn't provide value. If anything, it's the opposite.

Google has been providing results right in search, meaning that I no longer need to click a link. I would say 80% of my searches are answered by the knowledge graph and smart answers.


It's very relevant to the point that it's hurting content creators, though, who want people to actually visit their site, and not just read Google's summaries.

People that ask for an answer using "ok google", "alexa", or "hey siri" were never going to visit their website in the first place. Users running an ad blocker (and who doesn't these days?) are even worse for these websites. Making a deal with google might be their best option.

Why doesn't Google pay a fee for showing the content of the website?

That is the whole point though. It's not providing value to the creators of the data you're getting from Google. If you get your answer from Knowledge Graph, Google scraped that data from a website to give it to you, but didn't compensate that website for it, because you never went to the website and gave them ad revenue. Google kept all the ad revenue, because you stayed on their site.

In the past, Google created value for website owners by both directing traffic towards them, and sharing ad revenue with them. Now it's doing neither.


Directing traffic to a website doesn't always deliver increased value to a website. If I can find a restaurant's hours or phone number without clicking through to the (very possibly terrible) restaurant website and decide to have dinner there, that's a win for both me and the restaurant.

It's true not all websites depend on ad revenue, however, there are other problems with Google's Knowledge Graph: Others can tamper with it. There are numerous stories about malicious edits to Google's Places data to redirect calls from legitimate businesses, or mark businesses that are open as permanently closed.

While a business can control the information on its website, it can't control the information on Google's, and often that is not to consumers' benefit.


> but didn't compensate that website for it

They generally do compensate web sites. For example, when I ask for today's weather, it comes from weather.com and if I ask for the exchange rate for Canadian dollars, it comes from Morningstar. Both of those companies have a deal with Google to provide that information.


Google has a well-documented history of anti-competitive behavior: https://en.wikipedia.org/wiki/European_Union_vs._Google They were also literally stealing money from people using their ad platform: https://news.ycombinator.com/item?id=17103280

The FTC found that Google illegally took content from competitors, but Google was able to lobby away any punitive action: https://www.theguardian.com/technology/2015/mar/20/google-il...

If they do compensate any web sites it's only because the sites are too big to steal from or because they were sued and lost.


The problem is you have no choice. Google is slowly killing you via starvation, but it's still feeding you crumbs at the moment. Nobody's answer is going to be to refuse the crumbs.

That's the danger of dealing in a commodity. With competition, the price of that commodity is going to approach $0.

> Nobody's answer is going to be to refuse the crumbs.

Except those that think about opportunity cost.

Google's instant answers is a case of them putting the user first and that's hardly ever the wrong decision.


It isn't putting the user first though: It's a decision to strip any semblance of accuracy, timeliness, or sourcing from a "fact", whilst simultaneously bleeding out the sources that collect such facts, ensuring such facts will get progressively less accurate and timely as instant answers take over.

Instant answers is a direct threat to organized knowledge on the Internet, and that can't, by definition, be putting the user first. When we look at my link in my original comment, we discover that instant answers gets Google a larger percentage of the ad money (all of it), and suddenly, we figure out who Google is putting first: Google.


> ensuring such facts will get progressively less accurate and timely as instant answers take over

I think your worry about this is overblown. If Google doesn't do a good job with these, then that's an opportunity for somebody to do a better job.

If you are stuck on an ad-supported model from the days of the dot-com boom, then I'm not sure you have a place in the current ecosystem where a lot of searches are started with a voice query and the user expects a short, definitive answer. The ever increasing number of searches that result in no clicks is surely influenced by the rise of Google Assistant, Amazon Echo, Siri, etc...


It's not overblown, it's a problem that's already occurring. Remember that Google doesn't "do a good job with these", it just steals data from those who do. And Google is killing its own sources: https://theoutline.com/post/1399/how-google-ate-celebritynet...

I don't see why you have no choice. Google doesn't stop you from using all the other mechanisms that exist to promote your site. In fact, I'm not sure I found any of the sites I regularly visit through Google.


I was very hostile to your take, but after thinking about it for a minute, this is an interesting point.

I don't think it holds up for commercial sites, but for news, hobbies, and blogs (which have been hit hardest by Google's vampire advertisement business) your statement is true for me!


Heh, then you're not the average user that types 'facebook' in chrome to pull up the website.

A coworker told me yesterday about watching another coworker try to get to a specific company website:

1. User opens the browser, which defaults to the correct website the user is looking for.

2. User goes to Google instead, and searches for the company name. Clicks on the link for the company.

3. User navigates through the company website for the exact same page that the browser was set to open to by default.


What other ways exist to run a successful ad campaign?

Off the top of my head, Facebook and Taboola have significant reach.

> Nobody's answer is going to be to refuse the crumbs.

Most people won't, perhaps, but I run a handful of reasonably popular websites, and I block Google's crawlers. I have no use for Google.


This announcement is about the release of an open source library. Using it doesn't help Google.

Wide acceptance of this practice legitimizes it for Google.

Differential privacy is a tool for doing many useful computations in ways that provide value to end users beyond advertising. Apple uses it too. It's like saying "wide acceptance of end to end encryption legitimizes it for Google". Do you want your phone to give you real time traffic jam information without giving away your position? Do you want to better recognize and classify photos with ML while keeping your photos private? That's differential privacy.

The rest, an implication of a desire for Google to fail even if the entire business model were based around privacy preserving techniques, makes the privacy saber rattling seem disingenuous and a means to a different end.


This is about engineering choices. No doubt a phone can report on traffic without the system as a whole storing which phone it was.

You could run ML locally.

If you want to know how long patients are waiting, you can just record wait time without recording details of the patient.


> This is about engineering choices. No doubt a phone can report on traffic without the system as a whole storing which phone it was.

This doesn't prevent fingerprinting. If you send me enough data, even if it is not associated with an ID, I can still use that data to deanonymize or track you. People should read the papers on DP instead of just randomly commenting "you could just do X". The security researchers studying the threat models aren't just doing things in an overly complicated way for no reason.

> ML locally

ML training depends heavily on the training data set. You may not have all of the training data, and the training data may involve other people's private data.

Also, your phone is not the best place to run training for a model that takes 8 hours of training time and gobs of memory. However, you can slice up the problem, and people's phones can work together -- Federated Learning, which goes hand in hand with differential privacy.

https://www.youtube.com/watch?v=89BGjQYA0uE


My phone can give me real time traffic jam information from traffic cameras placed along highways, which can gauge the motion of the traffic without tracking me or my phone at all. I actually don't want anyone for any reason to be applying machine learning to my photos.

Again, differential privacy is about finding excuses to justify the existence and protection of a company that is actively harmful to society. And we need to stop pretending your employer cares about it for any reason but protecting ad revenue.


Traffic cameras don't give anywhere near the detail that crowdsourcing from mobile phones does, and you're just trading off one form of tracking which actually can be secured via DP, with another -- government CCTV cameras -- which CAN identify you via license plate recognition, and indeed, often DO. (https://www.eff.org/pages/automated-license-plate-readers-al...)

Moreover, there are public safety reasons to use crowdsourcing. One very important example is early-warning earthquake detection, where seconds of warning are crucial. Millions of people are walking around with mobile seismometers, which, in aggregate, could detect an earthquake with a high degree of accuracy as soon as it starts and notify people within 1-5 seconds. Such a system can only filter out false positives by sampling a vast number of motion sensors over a large area, and it can only work if it is virtually free of false positives (https://myshake.berkeley.edu/). This technology could save many lives, but needs differential privacy to be secure.

Differential privacy does not exist because of Google, was not invented by Google, and is part of the academic research of the crypto/security community.

I can't speak about my company in aggregate, but I can speak about myself, and I and many of my peers do care about privacy, with cryptographically strong guarantees. I've been doing this for more than 20 years, shipped one of the first anonymizing web proxy servers in 1996, worked on early forms of crypto-currency, remailer and onion networks, long before they hit the public consciousness. The people interested in these things work on them because they believe in them and are passionate. Not everything people work on is done because of ads. Chrome didn't need to ship RAPPOR in 2014, there's no business model for it. Likely, it was driven by people who felt deeply about the need to do it.

The other hyperbolic claims are not worth addressing.


I appreciate your insight on the matter. I just want to point out that it's hard to take any claim you make about your desire to protect privacy seriously when you work for a company like Google.

Seriously though it was an insightful read.


Please don't harass other users because of where they work. All that does is disincentivize people to talk about what they work on, which is also what they know the most about. That's guaranteed to make HN worse overall, not better.

https://news.ycombinator.com/newsguidelines.html


I'd like to point out that it's hard to take an opinion seriously that propagates the views of a company to everything that is associated with it. It undermines the actual issues of the company because now every action, person, or thing related to that company is an issue.

It's throwing the baby out with the bath water and too limited of a view for my taste.


It's good rebuttals like this that make me enjoy HN.

Traffic cameras can read and track license plate numbers, how is that private?

Depends on what they're built out to do. Highway traffic monitoring cameras don't (and usually aren't placed for a good angle to see them), toll enforcement cameras obviously do. Speed warnings signs are generally not cameras, I think, but radar, and hence don't.

Also, only the government can generally turn a plate number into a person's identity reliably (I doubt Google knows my license plate, though I'd be interested to know if they did), and even then, a license plate is far less likely to always identify a given person's movements than their phone's GPS.


Widespread government owned CCTV cameras have been widely recognized as a privacy threat, and you only need to look at countries outside the US to see how they are being used for license plate AND face tracking.

Speed cameras in the US aren't just radar, they photograph you, and even mail you a photo of yourself when you're caught speeding. This has led to famous cases, like when a guy was caught speeding, and his home was mailed a speeding ticket showing him in the car with his mistress, which his wife opened.

Seems to me you're now hand waving away the long recognized danger of government surveillance, as in, actual danger, wherein government sanctions you based on surveillance, as opposed to theoretical dangers of someone showing you a Nike ad.

So while you're worrying about GDPR, this is happening: https://www.politico.eu/article/berlin-big-brother-state-sur...


Next year I can vote to remove the misogynist in charge of the United States, and hopefully replace him with someone better. (And even if I fail, he's gone in just another four years.) However, even Google's shareholders can't remove the misogynists in charge of Google due to the share class structure.

The government has problems but trends towards the people's benefit in the long run. Corporate greed trends towards enriching the lives of sick and disturbing rich men like Larry and Sergey.

Corporations are a far worse and far less regulated threat than government, with far fewer checks and balances in place to protect us. Throwing out the government bogeyman hasn't worked on me before and isn't going to work on me now.


[flagged]


> Maybe YOUR benefit, but you've been privileged to export misery and suffering to other people around the world.

This is exactly what I would say about someone cozily working at Google in Silicon Valley. I literally deal with the costs imposed on seniors and less technically literate users by Google's ad nightmare on a daily basis.

And before you get high and mighty about how bad the government is compared to Google, remember that Google is built upon and protected by that very same government, and that Google's NetPAC is actively contributing chunks of many Googler paychecks (perhaps even yours?) to fund the very worst members of that government: https://twitter.com/Pinboard/status/1164945275066056704


We've had many discussions with you before about overdoing your anti-Google passions on this site. It's clearly leading you to break the site guidelines, as in this comment, https://news.ycombinator.com/item?id=20888525, and https://news.ycombinator.com/item?id=20888324. Once again it has become a distraction and is turning this thread into a flamewar.

I've rate-limited your account again and we will ban you again if you keep using HN as a platform like this. I know your views are sincere, but the way you pursue them has obviously left intellectual curiosity behind a long time ago. Worse, inundating HN threads with whatever agenda you're prosecuting has a destructive effect on the discussions. This is not cool, and has happened more than once before (e.g. https://news.ycombinator.com/item?id=17216664). This is not a site for platforms, agendas, megaphones, campaigns, or harangues. It's for thoughtful conversation about curious things: https://news.ycombinator.com/newsguidelines.html.

Since the mechanics of the internet will now lead others to accuse me of pro-Google bias, I'll add that we've repeatedly banned accounts for pro-Google agenda-ness just the same way. Our concern is not Google or any other $Bigcorp, it's protecting HN.


And I and many others protest that, like we protested ALEC. I have friends at Microsoft who have lodged similar protests. I'm working on privacy preserving technology, that's my contribution. I'm trying to make things better.

Hand-waving away the US government's vastly more destructive tendencies and the demonstrated abuses of government surveillance doesn't strike me as helpful. On the long laundry list of bad things Trump is guilty of, misogyny sits below the current proxy war in Yemen which started under Obama but has been boosted by Trump and Kushner.

It's now up to almost 100k dead, 45k wounded, and an incredible 84,000 children dead from starvation.

But hey, we have checks and balances on this right, thanks to FOIA? Can we get Trump's tax returns yet?


This isn't all that reassuring because it's easy enough to record license plates and figure out who they belong to later. It's like the bad ways of releasing an "anonymized" dataset that differential privacy is supposed to fix.

But again, that's only a problem if you've recorded the plates in the first place. Cameras functioning for live-only view with no recording storage, cameras without a good angle of the front or rear of cars or without the adequate zoom level for readability, or radar-based solutions are all incapable of violating your privacy in a meaningful way.

All of these are based on trusting the owner of the cameras to not do bad things (they could start recording. How would you know?). Which is no better than with differential privacy (you're trusting them to apply these protections when they say they do).

How do you know the cameras aren't recording? How do you know the resolution isn't lowered before being made public? How do you know there aren't other cameras that aren't made public?

If you're going to use a threat model that assumes malice on a "trusted" organization, you should do so for all such organizations: whether they be google or a government. It's only if you assume google isn't trustworthy, but governments are, that your threat model makes sense.

If you don't make that particular set of assumptions (and even if you do in many cases!) differential privacy is a net win.


> they could start recording. How would you know?

FOIA. That falls into the whole part where government has checks and balances that Google does not.


So you trust the foia process? How many years do you have? How much money? Do you know the right questions to ask?

How are those FOIA requests for stingray usage and tracking information going? Still blocked, last I checked. Like I keep saying: these arguments only work if you assume that the usg is all sunshine and rainbows, which is a naive assumption to make.

If you're willing to call NetPAC bad, which you do, you've got to agree that it's only bad if the people taking money do so and support the $badthings you claim google does, by not outlawing them. But if they're corrupt enough to do that, why aren't they corrupt enough to torpedo a few FOIA requests?

To put it another way if google is so bad, and you can't even get the USG to police google, how can you expect them to police themselves?


The government doesn't "police themselves", the system of checks and balances ensures one branch polices another. For instance: The courts have routinely shut down law enforcement agencies' attempts to misuse camera recordings.

This was a recent story I read: https://cdt.org/blog/digital-is-different-pole-camera-ruling...

Meanwhile, Google hid Dragonfly from the internal team that was supposed to review its projects for ethics issues. And it was pressure from journalists, and then the US Congress that inevitably forced Google to shut it down.

Thankfully, despite NetPAC, it looks like enforcement for Google is finally coming. The hand of justice moves slow, unfortunately, but it eventually gets there.


> Meanwhile, Google hid Dragonfly from the internal team that was supposed to review its projects for ethics issues. And it was pressure from journalists

No. It was pressure from employees, employees who had ethical concerns. Employees who, I'll note, you consistently badmouth for continuing to work at Google. It's not like a journalist just magically inferred that Dragonfly was a thing one day and wrote an article. And Congress had almost nothing to do with it, lmao.

> The government doesn't "police themselves", the system of checks and balances ensures one branch polices another

That's true for things that aren't FOIA. But you mentioned FOIA. Which exists as a way for the executive to police themselves. The courts do exist as a check on the executive, (and local governments). So again, how are those stingray FOIA requests going? Note also that "law enforcement" is an FOIA exemption.

> This was a recent story I read: https://cdt.org/blog/digital-is-different-pole-camera-ruling....

This was a targeted piece of tracking that requires a warrant. Dragnet monitoring often does not require a warrant. In other words, setting up a single camera to monitor a single house needs a warrant. Setting up a hundred cameras to monitor the public doesn't, although getting data from them to target a specific person might, at least if it was being used to imprison them. When was the last time Google imprisoned anyone again?

It's also really, really amusing to see you claim that Google is subject to no public accountability. There are absolutely ways you could try to hold Google accountable, if you actually think that they've done something wrong. You just need to be able to convince a Judge. Same as for the executive branch of the government.


It is true some Googlers leaked the data that started this. But everyone who organized the Walkout has already left, many others have left for ethical reasons regarding Maven or the like, and the number of unethical actions Google has undertaken is significant. At this point, I would find very suspect the suggestion that anyone truly ethical would remain, unless they were actively attempting to sabotage and undermine the functional capabilities of the company.

But apart from that leak, it was Congress, not employees, that shut down Dragonfly. Google is happy to dismiss employee concerns, as they've done over and over again. But after Sundar was dragged into Congress and repeatedly questioned about helping China, shockingly, Dragonfly disappeared. It's very revisionist to suggest Google shut it down to placate their own employees, rather than avoid action from Congress.

Remember that after some 20% of Google's employees walked out with a list of demands, Google dismissed 6/7ths of those demands without any further comment on the matter... and the employees didn't do anything further to retaliate. Google is well aware that ignoring employee complaints works, but ignoring Congress does not.


FOIA applies to the federal government, not to the states and localities which run most of these cameras. States have their own variations, but are much less responsive.

People can barely get their stuff back from civil asset forfeiture. Local and state governments are often far more corrupt and less transparent.


Pretty much every level of government is subject to a FOIA equivalent, and in fact, they're generally probably more responsive, because "national security" isn't an excuse they can hide behind.

Meanwhile, I'd like to request Google's search algorithm, and any emails involving Google's discussion of searches for "mapquest". Any takers?

Which is to say, suggesting FOIA isn't all-inclusive is a pretty bad logical fallacy, when Google is subject to no public accountability whatsoever.


Local governments are notoriously corrupt, frequently destroy records or don't keep them at all, and take years to deal with requests for information. Are you unaware of the long history of civil asset forfeiture and the long, very long, time it takes for people to find out where their stuff even is, how much they even kept track of, and to get it back?

https://www.aclu.org/issues/criminal-law-reform/reforming-po...

"For people whose property has been seized through civil asset forfeiture, legally regaining such property is notoriously difficult and expensive, with costs sometimes exceeding the value of the property. "

Anyone with a long history on the internet, and in civil libertarian, privacy, and security circles, would be aware of the long history of abuses, including targeting internet activists. Local law firms in the 80s and 90s often pushed local governments to use CAF to seize computers from people suspected of piracy or hacking, often without evidence or conviction.

In a country with the highest per capita incarceration rate in the modern world, with a trail of dead and damaged bodies around the globe from military adventures, with systemic racism still in force today, and kids separated from parents and held in internment camps, some unable to even be reunited with parents and turned into orphans, I don't want to hear how easy it is to rein in our government because the evidence suggests restraint has been largely a failure.

The fact of the matter is, our government is not being checked and balanced, not by the legislatures, not by the fourth estate (news media), and not by the people. So on my hierarchy of needs: fighting climate change, war, poverty, gun violence, racism, disease, malnourishment, lack of access to healthcare and education, all of which are being held up by forces that have captured the government and by the apathy of the people and even the news media these days, your obsession that ad targeting is the top evil, and that the other things will be taken care of by oversight, rings hollow.

We are still living with the Patriot Act, the AUMF, National Security Letters, the Five Eyes agreements that enable problems like MUSCULAR, and that's been going on for 18 years now. Do I think FOIA requests will rein this in? Even Snowden didn't really rein it in much. And things like widespread automatic license plate readers and face detection being used by local governments, like with the voting machine changes, gerrymandering, and voter suppression, will largely be installed with only a whimper.



Do you have anything meaningful to contribute to this discussion or are you just here to shit on anything with "Google" in the title?

I thought until now it was Apple who was legitimizing Differential Privacy

Providing well-defined privacy guarantees is legitimate. It doesn't become bad practice just because Google advocates for it.

I'm gratified to find someone here telling it like it is (IMO).

FWIW, that's how I read this too: If Google can get people on the "Differential Privacy"† bandwagon it diffuses their own culpability.

(I think privacy is effectively dead, so from my POV this DP BS is just propping up its corpse.)

†In practice, "Differential Privacy" is a euphemism for "no privacy".


You can’t think of any legitimate (i.e. defensible) uses of such a library?

Not really: Either I'd rather my data not be used by other parties at all, or I'd provide my data with consent and knowledge of how it is being used. I see no inherent value in a system designed to allow my data to be used without my consent while claiming my privacy is still protected.

> Fundamentally, Google's initiative on differential privacy is motivated by a desire to not lose data-based ad targeting while trying to hinder the real solution: Blocking data collection entirely and letting their business fail.

We don't usually call something infeasible "a solution", especially when it means a fundamental structural change without giving a viable alternative to power a 300B market. So tell me, what is your proposal to give an approximately similar amount of efficiency to the market? If you don't have one, that's fine, but your criticism doesn't really resonate since at least Google tries to improve the status quo.


> at least Google tries to improve the status quo

I don't see Google trying to do that at all. I see them trying to ensure that the status quo will never change.


"...a 300B market."

Which market are you referring to?

Advertising? Search engines? Other?


Digital advertisement? If you count the overall value produced by online ads, it's probably even bigger.

I'm asking you. It's your rhetoric. What are you referring to?

Update: Bueller?


If differential privacy solutions lead to things consumers want more, like better products (think: more accurate search results without uniquely targeted tracking), then is blocking all tracking viable?

I don't understand -- why wouldn't it continue to be viable? Nothing about this proposal provides me with any reason to stop blocking tracking.

To rephrase: as a business, not as a consumer. If differential privacy leads to products consumers prefer over those with no tracking, a business with no tracking would be nonviable, or less viable than one that does some privacy-conscious, let's call it, tracking.

> If differential privacy leads to products consumers prefer over those with no tracking

Why would that happen? DP is a compromise. For people who are concerned about data collection, surely "none" is preferable to that compromise.


Yes. Millions are already doing it. Considering how much press privacy enhancements such as ad blocking get, and how basically nobody is discussing how their search result quality went down afterwards, it seems pretty clear what people have chosen.

Can we not pretend your employer wants to use this for better search results and be up front that it's about better ad revenue, which isn't beneficial or desirable to consumers?

As an independent developer I really want to know what features of my app are actually being used (and what devices they're using it on, OS versions, etc). I try to talk to my users but you only hear from the loud annoying ones, not the silent majority and the conclusions are inevitably wrong.

I'd love a tool like differential privacy to gather statistics in a provably anonymous way. Without a tool like that, only companies with shedloads of money (like Google, Microsoft, Apple) can afford the market research (or the amount of spaghetti to throw at the wall) to compete.


> which isn't beneficial or desirable to consumers?

Why? They're paying money through conversions, which is a pretty significant commitment and proof of its usefulness. You can argue that there's no real incrementality here, but from my observation I can tell you that at least half of ad revenue in general very likely has a causal relationship.


It's like the push to keep loot boxes legal by Microsoft and Nintendo. The EU is on the verge of outlawing their use because they encourage kids to gamble. MS and Nintendo hope that if loot boxes reveal their 0.00000001% odds they no longer resemble $5/spin slot machines.

Every company will try to lobby against new laws that threaten their business model.

Which is why lobbying needs to be tightly regulated and the relevant numbers made public. Democracy is about equal rights, not extra rights for the rich.

I feel the need to mention "Surprise Mechanics" and EA at this point also.

Yup. I've spent some time researching this and watching and contributing to a lot of conversations with technical folks.

We can't instrument websites and applications. Data, once collected, never dies and nobody can be sure of where it will end up. We have to stop. The model itself is broken. Now it's up to the public policy folks to figure a way to get us out of this mess. I am not optimistic in the short run.


Something that always bothered me when I read posts on Google's blogs: why is it that it's always authored by a PM? Why can't perhaps a senior engineer also write an announcement post occasionally?

Yes, the names of the engineers on the team are present in the acknowledgement section: but, this is a single line at the bottom of the post, whereas the name of the PM and the fact that he is the author is featured prominently below the title. This pattern is common across many product/OSS library announcements.

Sure, one could argue that the PM has a holistic view of the product or library being announced, and that developing this perspective is in fact their job. But surely a sufficiently senior engineer can (and often does) have an equally holistic, or perhaps even more insightful overview. At least sometimes. Even if this were not the case, why not acknowledge everyone's contributions at the same place in the article?

I think this is symptomatic of the ubiquitous class divide between the "suits" and "nerds" in the corporate world.


In my observation, it's mostly because engineers usually don't want to write these kinds of articles, which need to be reviewed by multiple stakeholders. If they want to write it themselves, I believe it's possible, and there are some instances as well.

"Always" seems like a stretch. I just selected a random month of the google.ai blog archive and 5 of 5 were written by non-PMs.

In addition to Differential Privacy, Secure Multiparty Computation is another way to maintain privacy, while allowing computation across multiple users.

https://en.m.wikipedia.org/wiki/Secure_multi-party_computati...

The benefit of this is that you can get an exact computation, whereas with differential privacy the output is rougher.

The benefit of differential privacy is that it does not rely on the trust of a majority of other users; you can theoretically verify that a certain percent of the time your device sends out a wrong answer.
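The "your device sends out a wrong answer a certain percent of the time" idea is usually called randomized response. Here is a minimal sketch of it, not taken from Google's library or RAPPOR: the function names and the 75% truth probability are illustrative assumptions. With a truth probability of 0.75, each report is roughly ln(3)-differentially private, and the aggregator can still debias the total.

```python
import random


def randomized_response(truth: bool, p_truth: float = 0.75, rng=random) -> bool:
    """Report the true bit with probability p_truth, otherwise the flipped bit.

    Hypothetical helper for illustration; p_truth = 0.75 is an arbitrary choice.
    """
    return truth if rng.random() < p_truth else not truth


def estimate_true_fraction(reports, p_truth: float = 0.75) -> float:
    """Debias the observed fraction of True reports.

    If f is the real fraction, E[observed] = p*f + (1 - p)*(1 - f),
    so f = (observed - (1 - p)) / (2p - 1).
    """
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth)) / (2 * p_truth - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    truths = [i < 3000 for i in range(10000)]  # 30% of users have the trait
    reports = [randomized_response(t, 0.75, rng) for t in truths]
    # The estimate is noisy but should land near 0.3.
    print(estimate_true_fraction(reports, 0.75))
```

No single report can be taken at face value, yet the population-level estimate stays usable, which is the trade-off described above.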


I think you are a bit confused. They are very different in what they guarantee.

The goal of MPC is to hide the inputs of the program. But it is okay for an adversary to make all sorts of inferences by looking at the outputs.

The goal of differential privacy is to limit the kind of inferences that an adversary can make about a particular user/input from the output itself.
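To illustrate "limiting inferences from the output", here is a rough sketch of the classic Laplace mechanism on a counting query. It is an assumption-laden toy, not the API of the library in the article: `dp_count` and `laplace_noise` are made-up names, and a real implementation also has to deal with floating-point subtleties and with users who contribute many rows.

```python
import random


def laplace_noise(scale: float, rng=random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_count(records, predicate, epsilon: float, rng=random) -> float:
    """Noisy count of records matching predicate.

    Adding or removing one record changes the true count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon is enough
    for epsilon-differential privacy on this single query.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1 / epsilon, rng)


if __name__ == "__main__":
    ages = [20, 35, 41, 67, 70]
    # The true answer is 3; each call returns 3 plus fresh noise.
    print(dp_count(ages, lambda a: a > 40, epsilon=1.0))
```

The output is deliberately fuzzy: whether any one person is in `ages` shifts the distribution of answers only by a factor of at most e^epsilon.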


MPC works only hand-in-hand with DP. MPC ensures that you don't have access to the "remote" dataset, but does nothing to mitigate the model "memorising" specific private information (if we are talking about advanced analytics, that is).

The only issue is that you won't be able to find reasonable libraries for it, all of them are just PoCs without testing or stability.

Kyber has a pretty solid implementation of Shamir Sharing. But Shamir Sharing itself has security concerns.

https://godoc.org/gopkg.in/dedis/kyber.v2/share

https://github.com/dedis/kyber


Secret sharing is a tiny component of MPC; you need much more to compute anything useful.
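For anyone wondering what that component looks like, here is a toy sketch of Shamir sharing over a prime field. This is not Kyber's API; the names `split_secret` and `recover_secret` and the choice of field are illustrative assumptions, and it skips everything a real protocol needs (authenticated shares, malicious parties, and so on).

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the field size is an illustrative choice


def split_secret(secret: int, threshold: int, num_shares: int):
    """Split secret into num_shares points; any threshold of them recover it."""
    # Random polynomial of degree threshold - 1 with the secret as constant term.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares) -> int:
    """Lagrange-interpolate the polynomial at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # Modular inverse via Fermat's little theorem (PRIME is prime).
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    shares = split_secret(12345, threshold=3, num_shares=5)
    print(recover_secret(shares[:3]))  # any 3 of the 5 shares work
```

Adding two parties' shares pointwise gives shares of the sum of their secrets, so addition comes almost for free. Multiplication, comparisons, and anything else useful need extra protocol machinery on top, which is the "much more" part.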


The problem I see with differential privacy is this: One part of the public doesn't care about privacy enough to demand things like that, the other part wouldn't trust the math and the implementation behind it.

I mean, I consider myself moderately knowledgeable about statistics, but even I have problems understanding DP. Worse, scientists who are supposed to use it will also have a harder time understanding DP than their usual methods.


I mean it's nothing to do really with the math. It's the fact that you have to send the real data to an untrusted 3rd party and rely on their word that they will anonymize your data.

And if a situation arises where a manager at Google has to make the decision to 'slightly' reduce the effectiveness of differential privacy because they need a certain metric for a report do you really think they're going to make the principled choice?


> the other part wouldn't trust the math and the implementation behind it.

I trust the math and method just fine, but I'm in the "won't trust the implementation" group. Ad companies like Google have demonstrated they can't be trusted too many times for me to think that they'll do DP in a way that goes against their business interest.


The same can be said for end-to-end encryption...

There are a number of interesting technical challenges related to making differential privacy work in production (e.g. implementing novel algorithms for ML and statistical inference, proving privacy properties).

If you are interested in learning more, my company (LeapYear) is hiring differential privacy researchers, as well as software engineers interested in developing an enterprise machine learning platform.

Some background on our team: We recently raised our Series B, and hired VMWare’s first VP of Engineering, who scaled VMWare from 15 to 750+ engineers. Almost all of our backend code is written in Haskell.

On the commercial side, we’ve signed several multi-million dollar contracts with Fortune 100 customers in financial services, healthcare, & tech, and deployed on sensitive data at petabyte scale.

Happy to answer questions and review applications submitted here: https://leapyear.ai/careers


Hi Colton!

Having worked at LeapYear for just over 3 years now, I can confirm that it's a good place to work and is solving interesting technical problems.

I can also answer questions, if anyone is interested.


To me, whatever privacy Google talks of doing is, and should be, taken with a massive asterisk.

It should just be wholly disregarded. These companies are utterly untrustworthy and any belief that they suddenly care about your interests for some reason is naïve.

I made a video about this a while back: https://youtu.be/gI0wk1CXlsQ

I think it's good that someone is putting in the work and open sourcing tools to make differential privacy easier. But at the same time I'm wondering if this is just a smokescreen put up by Google.


Differential privacy doesn't mean privacy via something like encryption. It means that a company can query the dataset without exposing sensitive information about a small population in the dataset.

You still have to trust the company hosting the dataset so distributed solutions lend themselves more naturally to trust.


While this is true, there's some nuance.

First of all, there's a lot of recent (and not so recent) work in Local Differential Privacy [1], which uses the "untrusted curator" model. Although this software doesn't use it, the article mentions RAPPOR, which is a good example.

Second of all, encryption protects your _data_, but not your _privacy_; that is, assuming your data gets used in any way, you have no guarantees about whether the result reveals anything you'd rather keep secret. Of course, if you're talking about normal encryption, your data _can't_ be used, but then you're not really sharing it at all, as much as storing it there (like Dropbox). But once you start talking about things like homomorphic encryption or secure multiparty computation, it's important to keep in mind that they are complements to differential privacy, not replacements.

[1]: https://en.wikipedia.org/wiki/Local_differential_privacy


I read the article, thought to myself, "let's see how HN finds a way to say this is actually bad for privacy", clicked through to comments here, and was not disappointed. The hivemind anti-Google kneejerking is quite out of control.

It really is getting out of control. Anything with "google" in the title gets overrun with people with an axe to grind. It is very tiring.

Wow, Googlers in full force today.

This isn't a market that needs to be powered. This is a market that needs to be shut down. Targeted advertising is inherently harmful to society.


You've crossed the line. Please stop. I replied further at https://news.ycombinator.com/item?id=20890092.

We detached this subthread from https://news.ycombinator.com/item?id=20888273.


Well, the problem here is that the market doesn't agree with you, so it exists. I don't want to argue on the topic of justice (which is pretty subjective), but your argument should give a rationale for the value of following your solution and why it is more beneficial to society, with objective measures instead of a groundless dogmatic assertion.

Markets are amoral and should not be used to justify anything.

You're essentially arguing that something is beneficial to society because it makes money (mostly for Google).


That's your opinion. Not everyone shares that opinion. Advertising has its place in society - as a revenue stream for publishers, as a way to build awareness of a product or service. Just because you personally don't like something, doesn't mean it is harmful to society.

This is not a matter of opinion and focusing on advertising is a red herring, because Google is not just doing advertising, it's using mass surveillance in order to target advertising: https://en.wikipedia.org/wiki/Surveillance_capitalism

Surveillance capitalism is harming privacy and the idea of a free society. It gives the people holding the collected information massive power over others.


No, it is absolutely a matter of opinion. This is your opinion. Doesn't mean that it is fact.

From the way you're brandishing the word opinion, you must be using it with its meaning of "a view about something, not necessarily based on fact or knowledge", in an attempt to discredit my statements.

That's not the case, my view is based on facts:

1. According to their own privacy policy, Google is collecting data to "provide personalized services, including content and ads".

2. This privacy policy applies to all their services, which are used by billions of people. According to the same policy, the data includes search terms, watched videos, content and ads which are seen and interacted with, speech and audio data, purchases, contacts, activities on 3rd party platforms which use Google services, Chrome history, etc. They also collect phone call info including caller and receiver, etc.

3. Google's data collection practices have been investigated by several organizations. Here are some quotes:

"As demonstrated throughout this chapter, the ways that Google has designed its Location History and Web & App Activity settings are problematic in light of European data protection requirements. In this report we have questioned the legal basis Google has for collecting and processing this location data. It is questionable whether users have given free, specific, informed, and unambiguous consent to the collection and use of location data through Location History. It can also be discussed whether the user can withdraw his or her consent, since there is no real option to turn off Location History, only to pause it." - Norwegian consumer council.

"The combination of privacy intrusive defaults and the use of dark patterns nudge users of Facebook and Google [...] toward the least privacy options to a degree that we consider unethical". - Norwegian consumer council

They were already fined by CNIL for GDPR violations and are under investigation in other countries.

The FTC has been corrupted by tech lobbying, but a long time ago they did take notice of Google: "FTC Charges Deceptive Privacy Practices in Googles Rollout of Its Buzz Social Network".


> This isn't a market that needs to be powered. This is a market that needs to be shut down. Targeted advertising is inherently harmful to society.

That's an opinion. That's the whole comment that the person you're replying to said was an opinion. And it is.

> ...in an attempt to discredit my statements.

Nope. He wasn't even talking to you or about your statements.

You tried to do the same thing to me that you're trying to do here, in another thread. You said that people "repeatedly" told me something when I actually just had made one comment in the thread at that point.

What's your deal?


In that thread about Windows 10 I actually mistook you for someone else that was saying something "repeatedly" when in fact you just said something once. Sorry... :)

In this thread eclipxe seems to be talking about my statements, since they replied to me saying "This is your opinion". Or maybe they also got confused and thought I was the person they were originally talking to.

In any case, I do think eclipxe is incorrect, because the statement which he said is an opinion is that "targeted advertising is harmful to society" and then they counter by saying that "advertising has its place in society". Targeted advertising is a totally different thing from mere advertising, and I've shown that one of the biggest companies doing targeted advertising has been repeatedly investigated and fined for their practices.

These fines are given for things which are harmful to society, QED.


Thanks for the explanation.

Good luck on your mission!


I expect Google to file a patent for this soon.

Apple has been talking about (and implementing) Differential Privacy for years.

(differential privacy originated from independent research by Cynthia Dwork & others)

I think he was joking, referring to a recent trend where Google gets questionably broad patents on security solutions.


Chrome was shipping differential privacy (RAPPOR) before Apple.

I hope people are aware that Google has filed patents for Batch Normalization and Dropout. Both methods are very broad.

True, and would be a concern if Google had a litigious history of being a patent shakedown artist.

It's a concern regardless.

Agreed. My observation is that patent shakedowns are the last resort of a market loser. Once your company starts losing vast amounts of marketshare to competitors, the patent warchest comes out.

IBM and Microsoft were notorious for this, as was AT&T. Microsoft has reformed recently.


The problem of course, is that Google isn't the market loser... yet. So while Google doesn't have a history of patent shakedowns yet, if we look at other tech companies that are past their prime, we can make a reasonable guess Google will eventually join the patent shakedown game.

Is google using this in their own products? They were just fined $170M in the past few days!

I think you'll find legislators are less worried about something being "mathematically proven to protect privacy robustly", and more worried about just collecting money from someone publicly perceived as evil.

If they cared about actual privacy, they would go after no-name ad networks and data mining companies.


Do you have information showing that the FTC's ruling was unjust?

As far as the companies they go after, I think going after large brand-name ad networks and data mining companies like Google is at least as important as going after the no-name ones.


And why not start with the source? Google and Facebook sell your data to the ad networks, maybe Google and Facebook receive more complaints against them than some unknown ad networks that we did not know even existed before GDPR forced the sites to disclose them.

Could you provide some source about Google selling your data to ad networks? Because Google explicitly says that they don’t do that and none of the fines received by Google is for selling your data. I’m curious to read about this.

Personally, whether or not Google actually sells raw data isn't that important to me. My objection is that they (and all other companies that do this) collect the data in the first place.


What you say was questioned recently by Brave, see HN play from 2-3 days ago.

(not selling per se, more like carelessly giving away)


First sentence of the 3rd paragraph:

> Today, we’re rolling out the open-source version of the differential privacy library that helps power some of Google’s core products.


Now, all they have to do is to find a way to prove to us that they actually run this open-source code in Google's products

Download Chrome's source code, it's been using DP for years (https://static.googleusercontent.com/media/research.google.c...)

The fine had nothing to do with this.


