Facebook points finger at Google and Twitter for data collection (techcrunch.com)
61 points by cfadvan 8 days ago | 66 comments





Contrary to most comments here, it is significant that other companies do it too. Facebook competes in the free market. If they scale back on data collection, that will hurt their offering to advertisers and cost them money that will go to Google and others. That lost revenue will harm their ability to retain talent and build new products, and ultimately cost them their user base. “We don’t track you” is not as compelling a feature as “free video chat across the globe.”

Facebook doesn’t operate in a vacuum. They can’t scale back data collection unless others do. Moreover, the United States doesn’t exist in a vacuum either. Scaling back all US companies will give a leg up to competitors in more lenient jurisdictions.


> Scaling back all US companies will give a leg up to competitors in more lenient jurisdictions

Laws can be written to only apply to American users. That would leave American companies free to compete on level terms in other countries.


Sort of. The issue is that companies typically bootstrap in their domestic market. If the laws are restrictive, they will kill companies before they even get to the point of international expansion.

So Facebook is going with the tu quoque fallacy.

Perhaps the others will be in trouble in future too, but sorry Facebook, it's your turn now.


"Look, others are doing it too!"

Which was an argument of my kids when they were five. To which I replied then and I reply now, "That doesn't make it right."

This is rapidly becoming political and rule #1 in politics is “politics doesn’t have to be fair.”

I personally don't really care about what information they collect. I do hugely care who they give that information to & what they do with it.

Haven't heard anything about Google wholesale handing all my data over to anybody who clicks a couple buttons to sign up to their developer program though.

I'm guessing Facebook has the technical expertise to allow 3rd parties to run aggregate (non identifying) queries on the consenting users' data on servers they control and only allowing certain aggregated data, as well as limiting the number and type of queries 3rd parties are running.

Guess it's easier to just let anybody have at it to prove how valuable their platform/data is.


Google keeps their data private because they've already secured their advertising/search territory. Disclosing profiles would only undermine their competitive advantage.

It seems like Facebook gave away that information in hopes of developing a more engaging walled garden with 3rd party help, and perhaps some naivety.

> I'm guessing Facebook has the technical expertise to allow 3rd parties to run aggregate (non identifying) queries on the consenting users' data on servers they control and only allowing certain aggregated data, as well as limiting the number and type of queries 3rd parties are running.

This is a very challenging problem - multiple "non-identifying" queries can actually identify individuals. Differential privacy is the best solution so far, however it's still challenging to guarantee privacy without lowering quality far below competitors that don't care about consumers.
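For anyone curious what that looks like in practice, here is a minimal sketch of the Laplace mechanism for counting queries, combined with a per-client privacy budget that caps how many queries a 3rd party can run. Everything here is hypothetical and for illustration only: the `PrivateCounter` class, the budget numbers, and the record format are assumptions, not anything Facebook or Google is known to run.

```python
import random


class PrivateCounter:
    """Hypothetical aggregate-query endpoint with a per-client privacy budget."""

    def __init__(self, records, total_epsilon=1.0, seed=None):
        self._records = list(records)
        self._remaining = total_epsilon  # budget left for this client
        self._rng = random.Random(seed)

    def _laplace(self, scale):
        # The difference of two i.i.d. exponentials with mean `scale`
        # follows a Laplace distribution with that scale.
        rate = 1.0 / scale
        return self._rng.expovariate(rate) - self._rng.expovariate(rate)

    def count(self, predicate, epsilon=0.1):
        """Return a noisy count of records matching `predicate`."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so repeated float subtraction does not
        # reject the last query that should still fit in the budget.
        if epsilon > self._remaining + 1e-12:
            raise PermissionError("privacy budget exhausted")
        self._remaining -= epsilon
        true_count = sum(1 for r in self._records if predicate(r))
        # A counting query has sensitivity 1, so the noise scale is 1/epsilon.
        return true_count + self._laplace(1.0 / epsilon)


# Usage: three queries fit in a budget of 0.3, the fourth is refused.
users = [{"age": a} for a in (19, 23, 35, 41, 52)]
api = PrivateCounter(users, total_epsilon=0.3, seed=42)
noisy = api.count(lambda u: u["age"] > 30, epsilon=0.1)
```

The budget is what addresses the "multiple non-identifying queries" problem: every answer spends some epsilon, and once it is gone no more answers come out. The cost is exactly the quality trade-off mentioned above, since a smaller epsilon means noisier counts.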


> Google keeps their data private because they've already secured their advertising/search territory.

That is one possible self-interested reason for them to keep the data private. They could also just care deeply about guarding user data they collect. This could be for either selfish or unselfish reasons -- it's both good business and moral. It's very hard to tell from the outside, because their actions would look largely the same either way.

Personally I think it's pretty naive to believe any company does anything for a single reason. There is a constellation of reasons, some more important than others.


> multiple "non-identifying" queries can actually identify individuals.

Zero challenges from me on this. I 100% agree. However, it is doable. It just takes effort.


> I personally don't really care about what information they collect

If the government said "let's make a database of every Jew in America," people would get rightfully riled up. Yet we've allowed a single entity to assemble a database orders of magnitude more detailed.


My opinions are influenced by (if memory serves) some Microsoft researcher who years back said something along the lines of how having all the data points is how computers can truly benefit an individual.

You can see the early fruition of this by looking at how useful Google Now/map planning etc are.

Computers could be our all knowing assistants.

That's on a very surface superficial level.

Imagine the kinds of benefits when this all knowing entity is applied to medical issues. Real substantive progress could be made.

However, today's mismanagement of data could scare people off all knowing entities for good & end up severely limiting real progress.

As with everything else, all that data is ripe for abuse if the necessary legislation & technical limiting/auditing regarding access/use are not first put into play.


There are tremendous benefits to centralizing these data. But that power cuts both ways. With the benefit of hindsight, Mark Zuckerberg and his acolytes created about the worst possible culture we could have picked, as a society, for entrusting these data to.

Worst possible? Pffft.

> If the government said "let's make a database of every Jew in America," people would get rightfully riled up.

This is an interesting case. You're completely correct, but it's basically a special privilege that Jews enjoy (and that has spread to a taboo on asking about religion on the census). If you're part of any other demographic group and you don't want to be counted, you get yelled at for not seeing the grand scientific project of the US census in the proper light.

But I don't really see that a database of every Jew in America is more inherently suggestive of abuse (if you're a Jew) than a database of every black in America is (if you're black).


> it's basically a special privilege that Jews enjoy

Any group with (a) a living memory of violent ostracisation and (b) some ability to conceal their group membership knows this.

Gay men and women, for example, would fight against a sexual orientation question on the census. (One wonders if Tim Cook, had he not been born a gay man in Mobile, Alabama, would be as sensitive to privacy issues.)

Side note: I used Jews in my example because we have a case study for them. In countries where databases of religious affiliation were kept, oftentimes for tax purposes, the Nazis took advantage.


Counterpoint: http://reason.com/blog/2018/04/06/i-dont-want-to-tell-the-fe...

> For the LGBT question, the exact opposite is happening: People who want a head count of gays and transgender people believe the data will then be valuable in influencing federal policies and spending on projects that benefit LGBT people—or, more accurately, to benefit certain LGBT organizations.

See also the effort by certain ethnic groups to split into a special MENA category on the census, despite having passed as white successfully for the entirety of history.

It's not about ability to conceal group membership.


Hopefully, this starts a meaningful conversation about what other people are also doing. I am not a fan of Facebook but this might get the ball rolling on broader legislation similar to GDPR in the US.

I really hope nothing remotely similar to GDPR is written into legislation in the US. I do not even know how I would get started writing a website that would adhere to GDPR requirements.

So, on the one hand, I really would like a GDPR equivalent law in the US.

OTOH, anyone who says they clearly understand the implications of GDPR for their site has either spent a lot of money on lawyers or is lying. Let alone someone who has implemented it. Privacy by design requires deletion of data after legitimate interests and/or consent have expired, probably (!!!) in 3rd party systems. How, precisely, do you implement that?

Can you shadow-delete accounts for some period of time to allow users to change their minds? If no, what UI do you put on a "delete my account" button that has absolutely no undo, even in the 24h regrets period?

Do people have GDPR privacy rights over eg comments on YC that may mention them by nym?

Given the GDPR covers EU residents (not just citizens), as an American can I buy a plane ticket to Dublin and start requesting full data dumps? What rules are those provided to me under, and how do you make software that can do that?


There are plain English guidelines available for the GDPR. In the UK they are published by the ICO, which is the government agency tasked with enforcing the law. I'm sure there are edge cases which aren't fully documented, but as long as you're not pushing the edges of the law and are trying to stay within the spirit you will be fine. Probably.

0. You require the third party you passed the data on to to delete the data when you tell them. The third parties should tell the person that they now have their data, where they got it from, how they will process it and how to get in touch with their data protection officer.

1. You can, but you must also allow someone to delete in full (assuming none of the many reasons to reject removal requests apply or you don't wish to exercise them).

2. This is murky, but probably not. There's a right of freedom of expression and information.

3. No, you have to be a resident, not a visitor. You'd have to see how Eire defines residency.


> OTOH, anyone who says they clearly understand the implications of GDPR for their site has either spent a lot of money on lawyers or is lying. Let alone someone who has implemented it.

http://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELE...

It's long but the language is far easier than American legalese. The implications depend on your site/service behaviors. An RSS reader is pretty trivial, interactive social media... less so.

> Privacy by design requires deletion of data after legitimate interests and/or consent have expired, probably (!!!) in 3rd party systems. How, precisely, do you implement that?

Privacy by design is a design philosophy, it might be a pain to refactor into an existing system but the design constraints aren't onerous.

If your "3rd party system" is something like AWS, just delete the data. If you're sending it off to some other service, they do need to be GDPR compliant (the law covers this situation).

re: legitimate interests, we partitioned our data. Access logs, for example: one stream gets anonymized for simple analytics, another gets dumped into in-depth weekly analytics jobs, and the final log stream outputs encrypted auto-expiring S3 files with strong access control for infosec purposes. When a user withdraws consent, we just stop logging new information. Truly anonymized data is OK, our in-depth analytics data is purged within 14 days, and InfoSec is a justifiable legitimate interest.
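A rough sketch of that kind of partitioning, for the first two streams only (the encrypted S3 infosec stream is left out). The `LogRouter` class and field names are made up for illustration, and it assumes the coarse stream keeps running after consent is withdrawn. Truncating the IP to a /24 (or /48 for IPv6) is one common approach; whether that counts as "truly anonymized" in the legal sense is an assumption here, not a settled fact.

```python
import ipaddress
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=14)  # purge window mentioned in the comment above


def truncate_ip(ip: str) -> str:
    """Zero the host part of an address: /24 for IPv4, /48 for IPv6."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


@dataclass
class LogRouter:
    """Hypothetical router that splits access logs into two streams."""
    anonymized: list = field(default_factory=list)  # simple analytics
    detailed: list = field(default_factory=list)    # in-depth analytics

    def record(self, user_id, ip, path, consented, now):
        # Coarse stream: no user id, truncated IP only.
        self.anonymized.append({"ip": truncate_ip(ip), "path": path})
        # Detailed stream: only while the user has given consent.
        if consented:
            self.detailed.append(
                {"user_id": user_id, "ip": ip, "path": path, "at": now}
            )

    def purge(self, now):
        # Drop detailed entries older than the retention window.
        cutoff = now - RETENTION
        self.detailed = [e for e in self.detailed if e["at"] >= cutoff]


# Usage: consent is withdrawn before the second request.
router = LogRouter()
t0 = datetime(2018, 4, 1, tzinfo=timezone.utc)
router.record("u1", "203.0.113.77", "/a", consented=True, now=t0)
router.record("u1", "203.0.113.77", "/b", consented=False, now=t0 + timedelta(days=10))
router.purge(now=t0 + timedelta(days=15))
```

After the purge nothing identifying is left in the detailed stream, and the anonymized stream only ever saw `203.0.113.0`.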

> Can you shadow-delete accounts for some period of time to allow users to change their minds?

Yes. GDPR does not require instant response. You should be transparent about what will be kept and for how long; a clearly communicated 24h shadow-delete is completely reasonable.
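A minimal sketch of a 24h shadow-delete, assuming an in-memory store. The `AccountStore` class and its method names are hypothetical; a real system would also have to propagate the hard delete to backups and 3rd party processors, which this does not attempt.

```python
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(hours=24)  # the regrets window discussed above


class AccountStore:
    """Hypothetical store with a shadow-delete grace period."""

    def __init__(self):
        self._accounts = {}  # user_id -> profile data
        self._pending = {}   # user_id -> time the hard delete becomes due

    def create(self, user_id, profile):
        self._accounts[user_id] = profile

    def request_deletion(self, user_id, now):
        """Hide the account immediately and schedule the hard delete."""
        if user_id not in self._accounts:
            raise KeyError(user_id)
        due = now + GRACE_PERIOD
        self._pending[user_id] = due
        return due  # show this to the user so the window is transparent

    def cancel_deletion(self, user_id, now):
        """Undo a pending deletion; returns False if it is too late."""
        due = self._pending.get(user_id)
        if due is None or now >= due:
            return False
        del self._pending[user_id]
        return True

    def is_visible(self, user_id):
        return user_id in self._accounts and user_id not in self._pending

    def purge_due(self, now):
        """Hard-delete every account whose grace period has expired."""
        due_ids = [uid for uid, due in self._pending.items() if now >= due]
        for uid in due_ids:
            del self._pending[uid]
            del self._accounts[uid]
        return due_ids


# Usage: one user changes their mind, the other does not.
store = AccountStore()
store.create("alice", {"email": "alice@example.com"})
store.create("bob", {"email": "bob@example.com"})
t0 = datetime(2018, 4, 20, tzinfo=timezone.utc)
store.request_deletion("alice", now=t0)
store.request_deletion("bob", now=t0)
store.cancel_deletion("alice", now=t0 + timedelta(hours=3))
purged = store.purge_due(now=t0 + timedelta(hours=25))
```

The point of returning the due time from `request_deletion` is the transparency part: the UI can tell the user exactly when the delete becomes permanent.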

> Do people have GDPR privacy rights over eg comments on YC that may mention them by nym?

This is a good question, I'm also curious about quotes. The recent Google case suggests both fall under GDPR.

> Given the GDPR covers EU residents (not just citizens), as an American can I buy a plane ticket to Dublin and start requesting full data dumps? What rules are those provided to me under, and how do you make software that can do that?

Assume everyone is covered by GDPR.


> It's long but the language is far easier than American legalese. The implications depend on your site/service behaviors. An RSS reader is pretty trivial, interactive social media... less so.

Except the GDPR is full of hand-wavy stuff. Who needs a DPO? What is "large scale" in that context? How exactly do you conduct a legitimate interest balancing test? Who is your lead regulator and under what criteria as an American company can you decide?

Also, people have a lot more 3rd party systems than most think. Think transactional mailers, marketing mailers, billing systems, payroll, zendesk, etc.

And even an RSS reader is scary. What if someone follows a series of blogs about HIV treatments, or internal trade union politics? If that means you could infer the person is poz or is a member of that trade union, you now have heightened scrutiny data in your possession.


> Think transactional mailers, marketing mailers, billing systems, payroll, zendesk, etc.

GDPR has explicit provisions for all of these legitimate interests (notifications, clients, employees, customers). Most of these services are aware of and planning for GDPR, I wouldn't want to work with any that aren't.

> And even an RSS reader is scary. What if someone follows a series of blogs about HIV treatments, or internal trade union politics? If that means you could infer the person is poz or is a member of that trade union, you now have heightened scrutiny data in your possession.

Right, and I like that! Attempting to derive sensitive information should require consent, transparency, right to rectification, and stringent data handling requirements. It sounds like overkill for an RSS reader, but why the heck does an RSS reader need to do that kind of profiling in the first place? Maybe that's the right level of scrutiny and prior applications were unwarranted?

On the other hand, there are no concerns with simply storing the followed blogs.

> Except the GDPR is full of hand-wavy stuff.

Can't win, legislation is either micromanaged or hand-wavy... it's worth noting that some of the hand-waving is actually business friendly.

I'm not saying these laws are perfect. There is definitely room for improvement, but this is still a consumer win over the pre-GDPR wild west.


3rd party: the fact remains that doing deletions, both as a consent withdrawal and as privacy by design, is extremely complex. Particularly when consent is withdrawn before an LI expires. You can hand wave it away as GDPR provides for this -- which isn't at all responsive to what I said -- but it's difficult to do nonetheless.

I never said the RSS reader is profiling. They don't have to be. Does the mere presence of the inescapable user data -- i.e. what feeds they monitor -- create heightened scrutiny, because someone else could draw inferences from that data, were it to be leaked? It well may. I would seriously consider blocking EU users until this is sorted out.

Worse, the RSS reader could offer suggested feeds and find themselves in possession of such data entirely accidentally, even if users were clearly asked whether they wanted to see suggested feeds or allow their data to be used to suggest feeds. They need not intend to derive sensitive data to end up possessing it.

Or suppose you have a site like YC, and someone puts "hi, I'm poz" in their description. Tada, sensitive data.

The GDPR should have defined when a DPO is required, what a LI balancing test is, etc. Alternatively, the orgs could have pretended to be competent and issued guidance before -- oh right, they haven't issued final guidance yet. I'm sure 6 weeks is plenty of time.


Then you have no business with users' data.

Can you tell me what I need to do to make a GDPR compliant website?

Is it the hacker news mindset that you should need a law degree to launch a website now?

GDPR is so bad that I would rather make a HIPAA compliant service than a GDPR compliant service.


The GDPR is long but trivial to read, probably high-school level at worst. The EU website has the full text here: http://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELE... (warning: MASSIVE page)

If that's still too hard, a summarized form can be found at https://www.gdpreu.org


Thank you for showing me the supposedly trivial guide to understanding GDPR. The only thing that website has shown me is that no globally competitive tech company will ever grow out of the EU for the next hundred years or so.

So? Perhaps one of the facets of the GDPR is the EU’s willingness to accept that fostering “globally competitive tech companies” may not be in the best interests of itself or its citizens.

If that's the case, then they are absolute morons.

Yes, building tech companies that people love to use and that provide high-paying jobs does not benefit the citizens of a country.

Meth labs also create products with mass appeal and briefly high-paying jobs. Considering how social media is eroding American political discourse, Europe may be better off in the long run even if GDPR is as bad as you imagine.

I agree, how we are tracked needs to be more transparent/upfront. Clearly these companies will make every possible effort to hide what’s going on behind the scenes so this stuff needs to be regulated. Companies like Google/Facebook are not upfront about what they capture and how they capture it because they know many people would not be comfortable with it. Unlike non-advert business models, the actual transaction that is happening is tucked away in a EULA. This to me is the sign that consumers, especially non techies, need protection.

Under current administration? No.

I'm not so sure about that. The current administration can be kind of unpredictable (to understate it a little...), and this seems like exactly the combination of fuck-the-elitists and Trumpian populism that could work. Silicon Valley, culturally speaking, fits pretty clearly into the subculture that Trump set himself up in opposition to, in a way that even finance doesn't (given that the current govt is an uneasy alliance between business conservatives and right wing populism, as mediated by Trump's personal whims).

I think it’s more that “fuck the elitists” is a tune that can be morphed into from nearly any other tune.

But how would it get through Congress? Trump is essentially incapable of pushing legislation that Congress doesn't already support...

Because Obama or Hillary would surely do it. Come on man...

Trump champions deregulation and fair trade. GDPR is additional regulation, and can be viewed as a protectionist move from the EU side. I can't really think of a real reason why Trump would like it.

I feel like whoever wrote this article missed the point completely.

For the internet to function, websites need your information. If you want to log into a website using Facebook login, Facebook needs to know what website you are logging into.

When you watch a YouTube video on someone else's website, in order for that data to be sent to you they need to know what your IP address is and they need to know what website you are viewing the video on.

This is how the internet works. You cannot access something from someone else's servers without them knowing what your IP address is.

Are we invading people's privacy when we log IP addresses when someone visits a website hosted on our servers now?


Agreed, I found Facebook’s explanation perfectly adequate and respectable. They are basically explaining cookies and embeds/iframes to a non-technical audience — no more, no less. It makes sense to give other examples like Google ads and Twitter buttons in this context.

> They are basically explaining cookies and embeds/iframes to a non-technical audience — no more, no less.

Right but the main question was whether they were getting data on people from data brokers, and they responded by answering a completely different question.


"If you want to log into a website using Facebook login, Facebook needs to know what website you are logging into."

Agreed. But what if you don't want to log into that website, or maybe that site has no login at all. If it has a Facebook "Like" button, that site still sends a ping back to Facebook, letting them know you were there, and feeding Facebook's algorithm about your interests. Same goes for Google Analytics, a Twitter share button, or the growing list of tracking scripts which get executed upon page load without even showing a visual indicator that they serve some purpose.

I'd guess that the vast majority of visitors have no idea that by visiting a 3rd party site, they're feeding that visit to the list of 3rd party trackers you find on many sites today.


If people were educated about how the internet works they would know that the second you load some piece of content from someone else's servers they have your information as well.

> If people were educated about how the internet works they would know that the second you load some piece of content from someone else's servers they have your information as well

People aren't. That's why we have laws like Lemon laws [1]--so everyone doesn't have to be a specialist.

[1] https://en.wikipedia.org/wiki/Lemon_law


Yes, clearly every website user should have an in-depth understanding of how DOM elements are generated in the browser. Seriously, how can one know how a button is produced on a page without inspecting the underlying source code, or looking at the network traffic in the developer console? Even then, this behavior can be obfuscated in the code, and in order to produce the page to generate the content to inspect, you've already generated the remote ping.

You don't need to do that detailed of an analysis. If you see a YouTube video in a website, YouTube knows you saw the video. That simple.

I'm trying to understand what you people who are reacting to this Facebook incident so emotionally actually want done.

Do you want logging IP addresses to be banned? Do you want cookies to be banned? Do you want embeddable HTML to be banned?

Because all I see is outrage without any real suggestions.


HIPAA has IP addresses listed as one of the unique identifiers you must protect in your dataset.

Sure, when you're explicitly dealing with patient health data... which is not remotely in Facebook's realm.

Are you sure?

"Facebook sent a doctor on a secret mission to ask hospitals to share patient data"

https://www.cnbc.com/2018/04/05/facebook-building-8-explored...

"Facebook held a special breakfast for drug marketers about recruiting people for clinical trials"

https://www.cnbc.com/2017/09/07/facebook-held-a-breakfast-to...

"How Facebook can ‘unblind’ a clinical trial"

https://www.outsourcing-pharma.com/Article/2014/06/09/Pfizer...


Yes I'm sure, your links don't bring up any new points. Facebook does not have my medical records.

Only when associated with medical records...

I'm the same person as the other account btw, not trying to hide that.


Oh I completely understand that. What I meant to really say is that at least in some realms we already do consider IP addresses as personal information that should be protected.

> If you want to log into a website using Facebook login

I don't.


Perhaps the internet must know, but must it remember?

Yes. If it can't remember, there won't be progress on anything.

Yes, absolutely. How do you think you prevent people from DDoSing, and how do you think we prevent spam, if not with logs?

Also I'm surprised nobody has mentioned how important this data is for A/B testing.


Perhaps "using customers as guinea pigs" is not the hill you want to die upon? A/B testing is not a good justification for data collection. It may, actually, be a good reason for condemning it.

Well Facebook, if all of the other companies jumped off a bridge, would you do it too?

Saying "But they do it too!" is no defense. Though it does make the argument for stronger regulation.


I really dislike the, "well, they do it too!" form of argument. It shows a tremendous lack of maturity.

spiderman-pointing-at-spiderman.jpg

I am sure Facebook looked at today's internet discourse and thought - If whatabout-ism works for every discussion out there, why not us? Hence, they asked - What about Google? What about Twitter? They just didn't go the full mile and ask - What about Amazon? What about those emails? What about "world peace"?

But, whatabout-ism misses an important point - we cannot get everything all at once. Everything happens one step at a time.


>That said, other tech companies have gotten off light. Whether it’s because Apple and Google aren’t CEO’d by their founders any more, or we’ve grown to see iOS and Android as such underlying platforms that they aren’t responsible for what third-party developers do, scrutiny has focused on Zuckerberg and Facebook.

Maybe Zuckerberg should do the same thing Larry Page did. Create a parent company for Facebook (maybe call it Library), of which Zuckerberg becomes the CEO. Then find someone else to be the CEO of Facebook. In addition, just like Alphabet and other "Bets", Library could have other "Books".

Then, instead of Zuckerberg CEO of Facebook, you have Zuckerberg CEO of Library.


I think this discounts the ways in which Facebook differs from Google and Apple as it relates to open, sprawling access to user data.

The Alphabet conglomeration didn't obfuscate similar issues, nor would Zuckerberg following suit. Facebook's issues are related to unfettered access to a user and their graph(s) and sensitive information. More importantly, they knew of the risk and did little in response.

Obviously Google and Apple have their own privacy issues to account for, but it's fundamentally different.



