Hacker News
Facebook Wanted Gizmodo to Kill Investigative Tool (gizmodo.com)
580 points by uptown 6 months ago | 167 comments

As background, the problematic part of Facebook’s policy is that, while end-users own all the content they produce, Facebook essentially considers all of its HTML to be its copyright. They have argued in court (and won [1]) that scraping essentially requires creating an in-memory copy of this HTML before the user-produced content can be extracted, and thus is a violation of their copyright.

In my opinion it’s a very unfortunate precedent that limits pretty much all forms of scraping. Copyright law should not be abused for this, yet here we are.

If the Knight Institute manages to get Facebook to amend its policy, that is a great step in the right direction, but I feel that copyright should not be abused to restrict scraping like this: the copyrightable content is not what is being scraped at all; it is thrown away immediately after the user-produced content is extracted.

This means there is no way for users to even legally extract their own content from Facebook this way. On top of that, web browsers make in-memory copies of Facebook’s HTML all the time; heck, they even specifically instruct browsers and proxies to cache a lot of their content.

[1] http://www.knowmad.law/single-post/scraping

> that scraping essentially requires creating an in-memory copy of this HTML before the user-produced content can be extracted, and thus is a violation of their copyright.

Isn't copyright a bound on distribution? How would this work unless the scraper distributed the HTML itself? If you use SAX processing to stream it, does that change the interpretation? (I'm not sure how to interpret the in-memory copy.)
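For what "streaming it" looks like mechanically, here is a minimal sketch using Python's standard `html.parser`. It is fed the page in small chunks and keeps only the text of elements marked as user posts. The class name `user-post` and the sample markup are hypothetical, not Facebook's real markup. Note that this doesn't obviously sidestep the legal question: each chunk still sits in RAM while it is parsed, and the parser buffers any tag that is cut off between chunks.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag; they must not affect nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}

class PostExtractor(HTMLParser):
    """Collects the text inside elements whose class list contains TARGET_CLASS.

    TARGET_CLASS is a hypothetical marker for user-authored content.
    """
    TARGET_CLASS = "user-post"

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside a target element
        self.current = []   # text fragments of the post being read
        self.posts = []     # finished posts

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1             # nested tag inside a post
        elif self.TARGET_CLASS in classes:
            self.depth = 1              # entering a post

    def handle_endtag(self, tag):
        if tag in VOID or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:             # left the post: keep its text only
            self.posts.append("".join(self.current).strip())
            self.current = []

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)

def extract_streaming(chunks):
    """Feed the document piece by piece, as it might arrive over the network."""
    parser = PostExtractor()
    for chunk in chunks:
        parser.feed(chunk)   # incomplete tags are buffered until the next feed
    parser.close()
    return parser.posts

page = ('<div class="user-post">Hello <b>world</b></div>'
        '<div class="ad">Buy now</div>'
        '<div class="user-post"><img src="a.png">Second post</div>')
chunks = [page[i:i + 7] for i in range(0, len(page), 7)]
posts = extract_streaming(chunks)
print(posts)
```

The surrounding markup (the `ad` div, the `<b>` and `<img>` tags) is discarded as it streams past. Only the user-authored text is kept.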

If they violate my copyright, can I sue them for royalties every time they render an image they never got a license for and distribute it to someone? I'd imagine their TOS/EULA precludes this by simply treating the act of uploading as granting them a license, but how enforceable is that? It neatly bypasses any consumer protections.

If Facebook is acting this way, maybe I've been wrong all this time. Maybe I shouldn't be paying for their service to avoid ads; maybe they should be paying me for a license to my content AND not showing ads.

> Isn't copyright a bound on distribution?

No, it covers any reproduction in a tangible medium, whether or not you distribute.

I believe a court has ruled that RAM copies count. I'm not finding the case now though.

I think the journalists would have a strong Fair Use argument...

You may be thinking of when Blizzard used this argument for going after Bot creators in World of Warcraft: https://arstechnica.com/gaming/2008/05/blizzard-attempt-to-k...

> No, it covers any reproduction in a tangible medium, whether or not you distribute.

What's the justification for this? How does this afford protection to the copyright owner if you don't distribute? Is it purely to protect against derivative works bypassing copyright?

Copyright is more or less what it says in the name - the right to legally control the making of copies. The ability to divide making a copy from distributing it is, to my understanding, mostly not relevant to copyright law. Distributing requires making copies, and copying involves distribution of sorts.

Ugh, it seems broken and not actually protecting rights, but rather forcing terms of service to apply in lawsuits.

I mean, we're talking about a system that was originally designed to apply to literal books printed with literal printing presses. It's going to be messy at best in this age.

This is a bit like selling pens, after copyrighting the chemical composition of the ink, and then insisting that anything an author writes with the pen belongs to the company.

How can this kind of reasoning survive a sane court of law?

I’m thinking of it like Facebook is a book (well, a journal that’s a compendium of everyone else’s writing). You’re only allowed to read from it while Facebook holds the book and turns the pages for you. If you pick up the book and turn the page yourself, you’ve taken possession of the book, which is theft since they didn’t give you permission.

I think that's absolutely how Facebook wants it to work (as do many other website operators), but it's the absolute opposite of how the process physically works. Also, I wouldn't equate copyright violation (especially non-commercial) with theft.

What I mean is that is how Facebook wants it to work.

It has nothing to do with sanity.

With rare exception, law in the US is "More Fair" the more money you have. So yes, reading the body of law from that perspective should provide significant clarity as to why the ruling was done that way.

>law in the US is "More Fair" the more money you have.

I've read enough history to know this isn't exclusive to the US; it's the same in every country in human history.

Only religion, politics, and birth-right seem to have any pull similar to money.

In school I learned that my country is a democracy, but I grew up to learn that it is practically a plutocracy. I count on crowdfunding platforms to help the majority confront the titans. I witness this battle all the time.

Agreed. Also, those are all categories that notably correlate strongly with money.


"Judge Fogel’s reasoned that MAI Systems Corp. v. Peak Computer, Inc. and Ticketmaster LLC v. RMG Techs. Inc. indicated that the scraping of a webpage inherently involves the copying of that webpage into a computer’s memory in order to extract the underlying information contained therein. Even though this "copying" is ephemeral and momentary, that it is enough to constitute a "copy" under § 106 of the Copyright Act and therefore infringement.[9] Since Facebook’s Terms of Service prohibit scraping (and thus, Facebook has not given any license to third parties or users to do so), the copying happens without permission."

Seems like an overwrought technicality that also ignores fair use. Isn’t copyright infringement about distribution, anyway?

I wonder if we can abuse copyright in a similar way to protect our privacy. E.g., have our address contain a verse from a poem we've written, so anyone sharing our address against our express permission is distributing our copyrighted content.

Unfortunately the real problem isn't so much the concept but rather the cost of hiring lawyers skilled enough to take on multi-million dollar companies.

An email address can be quite long, enough to pass off as a poem or other form of art which can be copyrighted.

Apple is doing the same with an encryption key embedded in Macs’ SMC chips. The key is something like “our hard work by these words guarded, please do not steal (c) Apple Computer Inc” and is used to decrypt system binaries.

While it provides no actual security, I guess it gives them a leg to stand on should anyone recreate a machine capable of running macOS (the OS will not start unless this key is present in the hardware).

Several machines are capable of running macOS. http://www.osx86project.org is a project focused on doing exactly this. However, these machines do not use SMC chips. They emulate the SMC.

Yes, by including the copyrighted phrase in the emulating kext, thus violating copyright law.

That phrase is not copyright infringement. Apple lost that case in a court of law.

Link to the case? If this hack actually doesn't work as far as the law is concerned, then I'm surprised it's still there.

This is not a relevant case at all. The OSK0 and OSK1 keys were never a secret, but they are more or less a literary work, so they are still copyrightable.

Apple argued it is a trade secret; the judge disagreed with them on that one.

So this poem here:

Your karma check for today: There once was a user that whined his existing OS was so blind, he'd do better to pirate an OS that ran great but found his hardware declined. Please don't steal Mac OS! Really, that's way uncool. (C) Apple Computer, Inc.

Is not a trade secret.

Perhaps they need to use a phrase that rhymes (see discussion about using poems elsewhere in this thread).

Some poems rhyme; some don't. It's a weird distinction to make, technically speaking, and not one I'm sure I'd hang my hat on.

They could have used "all your rounded corners are belong to us". They would have had a better outcome in court with that argument.

It's actually "ourhardworkbythesewordsguardedpleasedontsteal(c)AppleComputerInc": a 512-bit AES key.
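The "512-bit" figure is just arithmetic on the string: 64 ASCII characters at 8 bits each. The sketch below only checks that length claim. AES itself defines 128-, 192- and 256-bit keys, so this does not show how the 64 bytes are actually consumed. The split into two 32-byte halves is an assumption about how the string corresponds to the OSK0/OSK1 keys mentioned elsewhere in the thread.

```python
# The key string exactly as quoted in the comment above, as ASCII bytes.
KEY = b"ourhardworkbythesewordsguardedpleasedontsteal(c)AppleComputerInc"

key_bytes = len(KEY)
key_bits = key_bytes * 8
print(f"{key_bytes} bytes = {key_bits} bits")

# Assumption: the two 32-byte halves correspond to OSK0 and OSK1.
half = key_bytes // 2
osk0, osk1 = KEY[:half], KEY[half:]
print(osk0.decode("ascii"), osk1.decode("ascii"), sep="\n")
```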

The EU proposed granting copyright protection to individuals for their content, and HN was in an uproar about it because it would interfere with individuals' ability to copy other people's work freely.

Rather than user-submitted content, I was thinking more along the lines of copyrighting your contact details and other personal details that companies like to harvest and/or sell to anyone willing to pay.

> scraping essentially requires creating an in-memory copy of this HTML

Just navigating to Facebook requires creating an in-memory copy of the HTML...

Per stingraycharles (https://news.ycombinator.com/item?id=17709612 )'s comment, the problem is not so much the in-memory copy as the modification of that copy (although, of course, Javascript also modifies the in-memory copy of HTML … all these technical technicalities get in the way of the legal technicalities that Facebook wants to interpret as "we get to do what we want, and you don't get to do what we don't want").

This is not true at all.

First of all, the HiQ case was against LinkedIn.

Second of all, the ruling held that LinkedIn COULD NOT bar HiQ from scraping the public content.

LinkedIn appealed the ruling to the 9th Circuit, which heard the case this spring but has not ruled.

> They have argued in court (and won)

Do you have a reference for the case you're thinking of? Missed this one and really want to read it.

> heck, they even specifically instruct browsers and proxies to cache a lot of their content.

Solution: implement the scraper as a proxy that forwards the HTML, but also saves the user-authored content.

Sorry to pick on you, but this kind of "solution" is childish and misses the point. The problem is with the law, and the solution is to make the law better. Breaking the law in new ways will only end up wasting more time in court.

I always find antiscraping solutions cute. They are trivial to beat.

That would never hold up in Canadian court. Any sane court, really.

If your statement is correct, then a workaround would be to perform the scraping in a non-covered country such as Canada; once the data is extracted, the proprietary barriers are removed, returning ownership to the individual.

Seeing facebook run ads constantly makes me think they'd find a way to turn you against the courts, and turn the courts against you.

Money can do lots of PR and marketing work.

Would extracting the data via a browser extension be in violation? It doesn't require making an in-memory copy of the HTML, since it's just navigating the HTML structure that has already been rendered by your browser, in the same way that your eyes (or a screen reader) must do to read the page.

Apparently the way it was presented is that, because scraping by definition alters the content of the copy (i.e. it extracts parts of it), it is therefore a modification of the original work. This is mostly the problematic part.

I have thought about just using the DOM, and/or using Chrome itself as an intermediary via its DevTools, but I think that from a legal perspective this is all just a modification of the original work as well.

IANAL, of course.

I'm pretty sure that's not the way it was presented at all. Modifying original works is actually an allowance of copyright, not a restriction. It's called Fair Use, which states that "transformative" uses of copyrighted material, or "those that add something new, with a further purpose or different character, and do not substitute for the original use of the work," are "more likely to be considered fair." [1]

Extracting the user's data out of the page is what's _allowed_; since that is the part of the page that Facebook does not actually hold a copyright claim to, I'm pretty sure you could make the claim that extracting the user's own data out of the page into a structured format for the user's own use is transformative.

The part that wasn't allowed was literally making an ephemeral in-memory copy of the page as-is, which included Facebook's copyrighted HTML.

(I am also not a lawyer)

[1] https://www.copyright.gov/fair-use/more-info.html

Maybe this would work if Facebook shipped its own browser; as it stands, you can patch Firefox/Chromium to extract the data, and you are not making any more copies than the unpatched browser would.

I realized the futility of trying to keep my data private. I used to never upload my contacts to any site, because I valued my privacy and my contacts' privacy. Then I found things like my phone number being detected by Facebook, and I realized that it didn't matter what I did. Any one of the hundreds of people who had my phone number could upload my information without my consent, and thus nothing I did mattered. I was infuriated but now I just give up, because there's no way to control your own information.
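The mechanism described here is easy to model. If the uploaded address books are grouped by phone number, the owner of a number ends up with a profile without ever signing up. Every uploader contributes a name variant and a social link. The sketch below is a toy model with made-up uploads and 555 numbers. It illustrates the idea and is not Facebook's actual pipeline.

```python
from collections import defaultdict

# Hypothetical uploads: each user's address book as {phone_number: saved_name}.
uploads = {
    "alice": {"+15550001": "Sam (work)", "+15550002": "Mom"},
    "bob":   {"+15550001": "Sam Smith", "+15550003": "Dentist"},
    "carol": {"+15550001": "Sammy"},
}

def build_shadow_profiles(uploads):
    """Group uploaded contacts by phone number.

    The person behind a number never has to consent: each uploader who
    has the number adds a name variant and a "knows" relationship.
    """
    profiles = defaultdict(lambda: {"names": set(), "known_by": set()})
    for uploader, book in uploads.items():
        for number, saved_name in book.items():
            profiles[number]["names"].add(saved_name)
            profiles[number]["known_by"].add(uploader)
    return dict(profiles)

profiles = build_shadow_profiles(uploads)
print(sorted(profiles["+15550001"]["names"]))
print(sorted(profiles["+15550001"]["known_by"]))
```

No choice made by the owner of `+15550001` appears anywhere in the input. The profile is built entirely from other people's uploads.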

Even something as private as DNA is no longer under my own control. A sibling signed himself up for 23andme, and I realized that once he did that, I'm as good as being up there too. My entire family tree can now be identified because of that single person, which is scary as hell.

This is a good observation, but I don't think the conclusion is that you should give up. It means that successfully controlling the data through good personal hygiene is infeasible and we need to use legislative power to curb abuse. We cannot rely on private companies to just do the right thing. It is possible to maintain our privacy if we are willing to use force of law.

If you think you can legislate the world away from big data and data mining through legislation you are terribly naive. At most it will punish a few surface level players like FB while the massive industry continues to flourish at every single major consumer company with a modern marketing dept. Not to mention the many billion-dollar companies you've never heard of that trade in this data, all the consumer financial companies, the government itself (IRS and security agencies), etc.

This rabbit is not going back in the hat. We need better tools that respect users, behaviour change via a new common sense, and decentralization, not flashy legal theatrics that are out of date well before they're written into law.

> If you think you can legislate the world away from big data and data mining through legislation you are terribly naive. At most it will punish a few surface level players

If you think you can legislate the world of crime away through criminal law you are terribly naive. At most it will punish a few criminals here or there while the massive crime industry continues to flourish.

Just because something bad is hard to regulate doesn’t mean you just give up. Besides, saying decentralization is the solution is even more naive, because centralized systems are more convenient and people prefer convenience to privacy.

Sure, centralized services are popular, but there's nothing unrealistic about decentralized/federated ones, it's not a pipe dream like the blockchain craze. The main difference in end-user convenience is the difference between "@username" and "@username:server.net".

You think it's naive to change the law, but "behavior change" is a viable alternative?? The reason the OP is claiming we need legislation is precisely that there is no incentive to change behavior.

I'm not even an EU citizen and I've already benefited from GDPR.

That's like trying to solve global warming by telling everyone to be more efficient. Good luck with that.

That's not at all an adequate comparison to what I said.


The gov't already has monopolies on force and law. And we can use law to limit the gov't's collection of data as well. The power to vote may not feel like much, but it's quite a lot more power than most people have over private companies.

Everybody used to be in the phone book, a few people were unlisted and it seemed a bit strange but otherwise you could just look anybody up in a directory and call them.

I don't really understand the motivation trying to keep one's phone number or address secret.

I'm old enough to have lived with a phone number published and not have to worry about it. Then came the telemarketers. Still, long distance calls were expensive and such calls were relatively rare. Next, long distance calling became effectively free and telemarketing calls were becoming frequent. However, it was still only a dumb phone - so nothing to worry about. Then, smart phones came along, storing personal information and being potentially vulnerable to threats. They were also used for authentication. But hey, not to worry, since it's just like before: everybody used to be in the phone book ...

First they came for the Landlines. I was not a landline, so I did not speak up...

It's not the digits of the phone number or the name of your street that are sensitive. It's the way they connect with all other aspects of your life.

The white pages never told me who was friends with whom. Or what everybody's sexual preferences were. Or any other data except their name and phone number.

Once FB notices that you exist, it builds a profile on you that includes much more than these two pieces of info. OP was trying to avoid being "noticed" by FB in the first place, to prevent that profile from being assembled.

Similarly, it took effort to look up random numbers in phone books. Few if any were ever printed in a way that made it easy to go from number to name, only from name to number.

> Few if any were ever printed in a way that made it easy to go from number to name

Law enforcement, most libraries, and other phone companies always had access to public printed reverse-number phone books in the late 1980s and 1990s.

Point taken; I'm not familiar with any libraries large enough to have such items shelved.
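The paper-versus-digital difference in this subthread comes down to indexing. A printed directory is sorted by name, so finding the name behind a number means scanning every entry. Once the same data is digitized, building the reverse index takes one pass, and every later lookup is immediate. The directory below is made up for illustration.

```python
# Hypothetical forward directory, organized like a printed phone book: name -> number.
directory = {
    "Doe, Jane": "555-0101",
    "Roe, Richard": "555-0102",
    "Poe, Edgar": "555-0103",
}

def reverse_lookup_scan(directory, number):
    """What a number-to-name lookup meant on paper: check every entry."""
    for name, num in directory.items():
        if num == number:
            return name
    return None

# Digitized: one pass builds a number -> name index for all future lookups.
# (If two names shared a number, the later entry would win here.)
reverse_index = {num: name for name, num in directory.items()}

print(reverse_lookup_scan(directory, "555-0103"))
print(reverse_index["555-0102"])
```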

> I don't really understand the motivation trying to keep one's phone number or address secret.

Random people/companies calling you at home, harassment, generally people on the internet being less than civil especially if you are a woman.

Earlier, phone numbers were listed. Now, phone numbers and the frequency of interaction between them are recorded by Facebook. The situation is not like a simple yellow-pages directory. It's basically a yellow-pages directory that updates in real time: Bob called Alice 5 times and sent 10 messages.
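To see why interaction metadata is so much more revealing than a listing, consider that counting "who contacted whom, how often" takes only a few lines. From those counts you can rank someone's closest ties. The log below is hypothetical and mirrors the Bob/Alice example. It contains no message content at all.

```python
from collections import Counter

# Hypothetical metadata log: (sender, recipient, kind). No content, only who/what type.
log = ([("Bob", "Alice", "call")] * 5
       + [("Bob", "Alice", "message")] * 10
       + [("Bob", "Carol", "call")])

def interaction_counts(log):
    """Count interactions per (sender, recipient, kind)."""
    return Counter(log)

def strongest_ties(counts, person):
    """Rank a person's contacts by total interactions, most frequent first."""
    totals = Counter()
    for (src, dst, _kind), n in counts.items():
        if src == person:
            totals[dst] += n
    return totals.most_common()

counts = interaction_counts(log)
print(counts[("Bob", "Alice", "call")], counts[("Bob", "Alice", "message")])
print(strongest_ties(counts, "Bob"))
```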

Previously I would have agreed with you, but with more and more of our lives moving online, the repercussions of someone malicious having identifying information about you are grave. For instance, you could be the target of a 2FA hack based on a cell phone number spoof.

That's just more incentive for companies to stop doing insecure 2FA over mobile networks.

When uploading your contacts the secret isn't your or their phone number. It's the relationships. The phone book never listed all of your friends and family.

Look beyond the face value and see the implication

Not everyone who has a mobile or is online grew up in that time period so the perspective is foreign.

Your DNA is hardly private. You leave it everywhere you go. If someone wanted it they'd only have to follow you around for a short while.

And that kind of following around was what certain secret police forces did back in the day. And it was inefficient unless they already had a reason to spend manhours on you.

Now however the computer will do it, virtually for free, and then the hoard of data can be dug through for incriminating (or socially awkward) information.

I hold a copyright on my DNA.

No you don't, you are not the creator of it therefore have no claim to copyright. It is not an original work of authorship. According to the Supreme Court, you can have a patent on DNA, but not on naturally occurring DNA.

>you are not the creator of it therefore have no claim to copyright.

So his/her parents hold the copyright?

Overruled. I will charge 42M per minute for unauthorized copies of my DNA. My shadow ninjas will sneak into their vaults and take back every single strand of my DNA.

Which means what in context of this conversation?

That LinuxBender has a sense of humor

If anybody does (which they don't) it would probably be your biological parents, jointly.

> I was infuriated but now I just give up, because there's no way to control your own information.

Many feel this way (about many issues these days). Let me suggest that it's not at all hopeless:

1. Security is never about 100% protection; it's about making attacks more difficult. You can't protect your data perfectly, but you can make attacks harder. A simple solution for phones: use a VOIP number and forward it to, or connect to it via, a pre-paid phone (without giving your name). You can't stop people from sharing your number, but you can dissociate your number from your location and the other data on your phone.

2. Better yet, use Signal, Tor, etc. Use cash when you can.

3. The problem can be fixed via legal action. It's happened to a great degree in Europe. If everyone walking around saying 'it's hopeless' actually did something about it, it would change quickly. And people on HN are generally the ones who understand these issues and need to take the lead for friends, neighbors, and coworkers.

One big thing to keep in mind about Facebook et al, is that every page with a like/share/whatever button from a third party effectively allows said third party to build up a profile of who the person operating the computer is.

Even if you do not have a Facebook account, Facebook likely has a shadow profile on you, based on your browser fingerprint, etc., showing up on various sites.
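Here is a simplified model of how an embedded button enables this. Every page that includes a third-party widget makes the browser request it from the third party. That request carries the same identifier each time (a cookie here, though a fingerprint hash would serve the same role) plus a Referer header naming the embedding page. The sketch groups a hypothetical request log by identifier. Browsers may trim cross-site Referer headers to the origin, so only the host is kept.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical log as seen by a widget host: (tracking_id, referer_url).
widget_requests = [
    ("cookie-123", "https://news.example/politics/article-1"),
    ("cookie-123", "https://clinic.example/appointments"),
    ("cookie-456", "https://news.example/sports"),
    ("cookie-123", "https://shop.example/cart"),
]

def browsing_profiles(requests):
    """Reconstruct per-visitor browsing history from widget requests alone."""
    profiles = defaultdict(list)
    for tracking_id, referer in requests:
        profiles[tracking_id].append(urlsplit(referer).netloc)
    return dict(profiles)

browsing = browsing_profiles(widget_requests)
print(browsing["cookie-123"])
```

None of the visitors clicked anything. Loading the page was enough for their visits to show up in the log.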

Also, your typical adblocker is not configured to block these buttons by default.

> A sibling signed himself up for 23andme, and I realized that once he did that, I'm as good as being up there too

Should it ever become legal they’ll sell that info to insurance companies in a heartbeat. That’s the real risk.

This completely depends on where you live. I'm glad the EU introduced the GDPR, and as privacy becomes more and more of a hot-button issue, I trust them to keep bringing out the big guns.

For the time being GDPR is quite a joke. The guys that actually did planet-scale tracking never stopped and the little guys that did nothing wrong posted obnoxious notices on their sites. If I had to guess, it will take at least a decade to get things moving.

I once managed a Facebook account on behalf of someone else. Normally, you would expect this to confuse Facebook greatly. Instead the opposite happened: whenever I was logged in as them, Facebook recommended people that they knew.

This means you can find out who someone knows if (a) they’re not already a Facebook user, and (b) you create a profile for them. Whoopsie.

I kind of want to write a dystopian story about babies being bartered based on their social network rating, which of course is derived from their family history. We’re all reduced to numbers in the end.

hyper-reality: https://vimeo.com/166807261

> I kind of want to write a dystopian story about babies being bartered based on their social network rating

Would be an interesting "prequel" to the troubling but good "Nosedive" episode of Black Mirror. https://en.wikipedia.org/wiki/Nosedive

> I kind of want to write a dystopian story about babies being bartered based on their social network rating, which of course is derived from their family history.

I'd have to go look for it but I swear there's an episode of either Next Generation or Deep Space 9 that covers almost this exact topic to the letter. Anyone else know what I'm thinking of??

That's the one (lo and behold, it was Voyager after all)! Curious if that's where your mind was when you threw out the idea of writing that story of yours?

Star Trek was one of my biggest influences. I’ve been trying to think of a way to modernize the series without cannibalizing Roddenberry’s ideals. Or go the other way and do a Game of Thrones style Star Trek universe.

Also https://m.youtube.com/watch?v=Wkedd6A6_mU was one of the funniest trek videos I’ve seen.

Looks like CBS is bringing back Picard for a series - https://twitter.com/SirPatStew/status/1025840545216823296

> Or go the other way and do a Game of Thrones style Star Trek universe.

Focussing on the Klingon Empire (or the Mirror Universe Terran Empire) would let you do that without even having any tension with existing canon.

How did Facebook identify that person? By e-mail address?

Quoth Facebook, from the article:

“We don’t expose this information via our API and we don’t allow accessing or collecting data from Facebook using automated means"

Oh man what a world we live in. Facebook is free to leech your phone of every last bit of personal data... but heavens forbid an end user or journalist tries to learn something about Facebook.

> heavens forbid an end user or journalist tries to learn something about Facebook

And more specifically, the tool is for learning what Facebook knows (or thinks it knows) about you.

... and which they are supposed to provide upon request.

... to individual users. And individually, your data isn't worth much. It's the aggregate that surfaces the really interesting things.

> ... to individual users.

... Which is what this tool does.

And while I take your point about aggregate value, I personally have a great deal of interest in data about me.

Wouldn't it be interesting to a finance firm, insurance company or potential employer?

Yes. Depending on the size of the loan, a lender would pay good money for such data on a single user.

That being said, a less nefarious use could be: a lender has a standard mortgage rate of 4.25%, but if you give it a FB data dump, you can qualify for a 4.00% rate.

> That being said, a less nefarious use could be: a lender has a standard mortgage rate of 4.25%, but if you give it a FB data dump, you can qualify for a 4.00% rate.

Or, with your FB data, the bank realizes you have a higher risk of defaulting or dying before paying off the full mortgage, so it bumps the rate to 5%.
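To put numbers on the 4.00% / 4.25% / 5% scenarios, the sketch below uses the standard fixed-rate amortization formula. The $300,000 principal and 30-year term are hypothetical and were chosen only for illustration.

```python
def monthly_payment(principal, annual_rate, years):
    """Fixed-rate amortization: M = P * r / (1 - (1 + r) ** -n),
    where r is the monthly rate and n the number of monthly payments."""
    r = annual_rate / 12
    n = years * 12
    return principal * r / (1 - (1 + r) ** -n)

LOAN = 300_000  # hypothetical principal
TERM = 30       # hypothetical term in years

for rate in (0.0400, 0.0425, 0.0500):
    m = monthly_payment(LOAN, rate, TERM)
    print(f"{rate:.2%}: ${m:,.2f}/month, ${m * TERM * 12:,.0f} paid in total")
```

On a loan of this size, the quarter-point "discount" and the jump to 5% work out to tens of dollars and well over a hundred dollars a month, respectively. That is why the data could be worth so much to a lender.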

...depressing indeed.

At least some are trying to counter all this privacy invasion (EU, Apple, etc.) Let's hope they will continue doing so.

> At least some are trying to counter all this privacy invasion (EU, Apple, etc.) Let's hope they will continue doing so.

Not so sure. While they may be doing more than FB, obviously, both example parties you listed are more than happy to support privacy invasion when it furthers their ends. You don't need to lower your standards to find counterexamples, nor do you need to praise those that are only a little less bad. In my view the word "all" in your statement is verifiably false. We should be more nuanced and call out bad behavior when we see it, no matter the org, instead of handing out org-level labels.


> proprietary hardware, intentional slowdowns, and insane app fees

Android phones are also effectively proprietary (especially phones from manufacturers such as Samsung and Xiaomi) and seemingly get slower over time. But yes, the apps are cheaper/free thanks to sideloading and a good community.

(RMS-Mode On) What I'm referring to as Android, is in fact, Google/Android, or as I’ve recently taken to calling it, Google plus Android. Android is not (really) a usable operating system unto itself (for most people), but rather another component of a fully functioning Google system made useful by the Google Apps, Google Play Services and vital system components comprising a full OS as defined by the end user. (RMS-Mode Off)

If either way you're going to have a proprietary phone, you might as well pick the one that has less of an interest in data mining you.

Nobody’s saying that Apple is not self-interested. The claim is that it is in Apple’s self-interest not to violate its users’ privacy. In contrast to Google and Facebook, Apple’s business model does not require Apple to know as much as possible about its users. It does, however, require Apple to offer something that its competitors do not or cannot, in order to justify its higher prices. Privacy is one such thing.

In fairness it was a pretty old phone, without a secure enclave iirc.

> 'didnt help'

Why the scare quotes?

This line would have more impact if Facebook themselves observed any ethical or legal limitations on their own data scraping that they don't think they can get away with.

The problem is that it’s more difficult for people to get an army of lawyers to represent them than businesses. As such, it’s usually more difficult for people to defend their rights in cases such as these.

Facebook forbids them to learn something about Facebook. It was the phone's fault when they didn't limit what data Facebook pulled from it. Modern ones do just that.

Awww they don't like it when you mine their data. Poor lil' guys.

I mean, they're not really getting access to some secret Facebook data. They're getting access to the user's data, including their username/password.

Who is "they"? Gizmodo is not getting access to the username/password, nor do they have access to the user data. Quote from the article: "the information was stored on individual users’ computers; we weren’t collecting it centrally."

Since the stuff they're probing isn't included in the APIs, they gotta have some account info to provide insight into this behavior.

I agree with your sentiment, but if said journalist sold said data to a foreign government, how different would that be from what happened with Cambridge Analytica?

Yes, being able to access your data is nice, but you also have to balance it with people trying to trick people into leaking their data for nefarious reasons. Obviously, here, it's an open source project and we can see that it's (probably) secure, but then where do you draw the line?

First, it’s getting data about your data on your behalf; what you end up doing with your own data is kinda up to you. It’s yours. This is a great tool to shine a light on what FB knows about you.

Re: CA. FB was touting and selling this service to any and all interested parties. It’s only retroactively that FB and others are crying foul. I mean, sure, FB should not have sold that data or made it available to any interested party, but that’s their actual business model.

> FB was touting and selling this service to any and all interested parties.

Where do you get that from? My understanding is that the API used was closed in 2014, and existing applications were given until 2015 to stop using it. That's when the data was gathered, and that data was used later in 2016 by CA.

In this case, this is another API that is closed, and this application is trying to get around that. Again, I understand that this application has good intentions, and is hopefully secure, but they are still breaking the rules. If anything happens, then it's Facebook who is on the line.

Sure, if that happened that would be bad, but in this case only the user of the app could see their own PYMK data - the tool did not send data to the journalists.

Facebook closed up a lot of API endpoints after the CA incident became big (including public page post data, to my annoyance).

And this is one of those APIs that is closed. This application is intentionally going around it, and if any data ends up leaking, all the people in this thread demanding access to data will be the first to attack Facebook for not doing a better job at guarding user data.

It seems like they were able to get the data just fine, so they're not guarding user data now. I think the best way for facebook to protect this data is to not have it.

Before the CA incident became big actually. Most of the APIs used by Cambridge Analytica (in particular, the ones that let apps access data about a person's friends) were removed all the way back in 2014.

This tool only works with the user's personal authorization and direct consent.

If the journalist was selling data about themselves, which is what this tool collects, I wouldn't mind.

As a Facebook user I think I agree with Facebook's position here. It makes sense they block crawlers like this; especially if they ask users for their username/password.

> Facebook is free to leech your phone of every last bit of personal data... but heavens forbid an end user or journalist tries to learn something about Facebook.

I'd argue the risk isn't that a journalist is learning about Facebook, but rather that the tool has access to all of the user's information and their username/password.

Looks like you didn't read the article.

The tool was changed to redirect users to a Facebook sign-in page to log in, which is the proper way to sign in. They just found another reason to deny them access after.

It's also not collecting or centralizing data, this is just for the user that uses the tool to learn more about the tool, nothing is sent to Gizmodo.

Gotcha. After looking through the source on Github I'm more at ease. One always has to be skeptical about things that involve authenticating to important accounts.

Can you explain where in the article you read about a crawler, or how Gizmodo would be getting any of the information gained via the app? The app in question isn't a crawler, and the information is stored on the client side, never touching Gizmodo servers.

> Facebook disagreed and escalated the conversation to their head of policy for Facebook’s Platform, who said they didn’t want users entering their Facebook credentials anywhere that wasn’t an official Facebook site—because anything else is bad security hygiene and could open users up to phishing attacks. She said we needed to take our tool off Github within a week.

I mean, this is a fair point. It wouldn’t be the first time that a Github tool was forked to surreptitiously send information to a 3rd party.

So they updated the tool to let users log in to Facebook.com through the browser and just hijacked the session cookie to gain access to the pages.

Since this is a program which runs on the user's machine, and downloads standard Facebook pages over their standard HTTP interface, I don’t see how Terms of Service can differentiate between accessing Facebook services through this program versus Chrome.exe or Edge.exe.

What makes one thing a user agent and another thing not a user agent?
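To make the point concrete: on the wire, the main way a client identifies itself is the `User-Agent` header, and the client picks that string itself. The minimal Python sketch below uses the standard library's `urllib.request.Request` to build (but never send) two otherwise identical requests; the URL, cookie value and both user-agent strings are made-up placeholders, not what PYMK Inspector actually sends.

```python
from urllib.request import Request

# Placeholder values for illustration only; nothing is fetched in this sketch.
URL = "https://www.facebook.com/"
COOKIE = "session=EXAMPLE"  # hypothetical session cookie, same for both clients


def build_request(user_agent: str) -> Request:
    """Build (but do not send) a GET request that claims the given client identity."""
    return Request(URL, headers={"User-Agent": user_agent, "Cookie": COOKIE}, method="GET")


def differing_headers(a: Request, b: Request) -> set:
    """Return the header names whose values differ between two requests."""
    ha, hb = dict(a.header_items()), dict(b.header_items())
    return {k for k in ha.keys() | hb.keys() if ha.get(k) != hb.get(k)}


browser_like = build_request("Mozilla/5.0 (Windows NT 10.0) Chrome/68.0")  # hypothetical browser UA
script = build_request("PYMK-Inspector/0.1")  # hypothetical tool UA

# urllib normalizes header names, so "User-Agent" is stored as "User-agent".
print(differing_headers(browser_like, script))
```

Same URL, same method, same cookie; the only difference is a string the client chose to send, and a script could just as easily send the browser's string.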

I think the difference in intent and purpose between PYMK Inspector and Edge/Chrome is pretty obvious.

Yes, Edge/Chrome is for the drone users that are Facebook's bread and butter whilst PYMK Inspector users have some semblance of awareness of user privacy and the asymmetry of the situation.

Putting on my 'big business c*nt' hat I can totally understand why Facebook wouldn't want PYMK users.

I became deeply concerned when the article seemed to suggest that Gizmodo was more worried about losing their Facebook page, or being sued, than sticking to a high standard of journalistic integrity. By the sound of things Gizmodo was saved by a Cambridge Analytica deus ex machina.

I can't help but think that there will be ever more regulation because of it.

Y'all know that that Facebook page is part of the income stream that helps them keep the lights on for that journalism, right? And that getting sued is a fast way to no longer being able to do that journalism?

The problem it should raise for you is letting overwhelmingly omnipresent companies chill journalism in this way, not that the journalists worry about being chilled.

Not sure I agree, it's the job of the 4th estate to speak truth to power, that is why it is afforded special privileges and protections.

If you predicate your ability to speak truth to power on a business model that relies on the power structure you are criticizing you undermine your raison d'être.

Cool. So what's your solution when private businesses are busily making sure that those "special privileges and protections" don't mean anything to the people who have to hear you speaking truth to power?

I hear your criticism... I personally don't have a solution, as news media isn't my area of expertise. However, there must be a technical solution to the problem. I've seen one or two HN threads discussing a decentralised web. Perhaps that's indeed the solution?

Since it is not using the Facebook platform or API, what is Facebook's recourse? If it was using the API and there was a developer with a Facebook account they could "hold accountable" then they could do something, but there is no specific Facebook account associated with creating this software. Ban/censor the Gizmodo page on Facebook.com? That would be an interesting gambit that would surely backfire in the court of public opinion.

This is just like Facebook banning anything which tries to track unfriending, another thing you can't do through the platform API and that they have actively worked to oppose and suppress.

Let's take Facebook's argument to its logical conclusion which would suggest that they should go on the warpath to ban all password manager applications that store a user's Facebook username and password.

Facebook's recourse is to change their HTML specifically to break this tool. Which leads to a cat-and-mouse game.

Is there a good reason not to have a law that prohibits service providers from mandating the means by which you form the requests your machine sends to them?

I totally get limiting the volume and frequency of requests. That's fair. Bandwidth costs providers money, after all. But why should you really have any say over how I formulate my requests? Whether they come from your app, a personal script I wrote, or I hand-form them in Fiddler, what's the problem besides the service provider's control-mongering?

Am I missing something here?

The DMCA provides a mechanism for the government to define some activities that do not violate the DMCA--essentially, a whitelist of activities. It's done by the Librarian of Congress and the exceptions last 3 years. This is how jailbreaking became clearly legal.

It provides certainty for activities that, under the law, are arguably legal--but making the argument in court would be very expensive and time-consuming.

What if a similar process was instituted for the CFAA? This would take some of the reins out of the hands of self-interested parties like Facebook, and provide certainty for people who are operating in good faith in the public interest.

An example of this would be to say that it is legal for a person to use automated tools to observe their own authentic interactions with hosted software, and it is legal for other people to make such tools available to the public. That would cover Gizmodo's tool, I think.

Had Gizmodo not changed the login approach, could FB send a DMCA takedown request in good faith to GitHub? I'm asking for a...um...friend that is developing a client-side-only open source tool that allows them to automate tasks on websites they visit from their computer, some of which may require credentials.

Is the law settled with regards to distributing non-commercial tools to do something a site may not allow in their ToS, or is it similar to doing it yourself where it's fuzzy based on the reasonableness of the ToS and intent and a whole bunch of other things?

It is creepy how much data Facebook has on users. I created a fake Facebook account using an email account not associated with my name and using a made up fantasy name. I created it on a computer at a large hospital while logged into the computer with a coworker's credentials. Facebook immediately suggested all my family and friends upon creating the account. I still haven't figured out how they did it.

Did you ever log in to FB with your own account on that machine? If you did, they leave a cookie there; even if no cookie, perhaps they used canvas fingerprinting (It's frightening how many websites do -- there's an about:config setting in Firefox that would let you know and block that).

Did you have your phone with you, with the FB app? If you did, they likely have your exact location, and possibly also the exact location of the browser.

A false positive costs them essentially nothing, so they'll always offer you the 50 or so highest-scoring matches. It's possible that those were matched to you because you had just logged out from the same public IP 5 minutes before (even if it was another machine on the network), and there were no other sources of information about that fake account.
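A rough sketch of the "always show the top N" idea: if every candidate gets a score from whatever weak signals exist and the top N are shown no matter how low the scores are, a brand-new account with no other data still gets suggestions driven by something like a shared IP. The signal names and weights here are invented for illustration; nobody outside Facebook knows the real ones.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical signals and weights, invented for illustration only.
WEIGHTS = {"same_public_ip_recently": 3.0, "phone_nearby": 2.0, "mutual_contacts": 1.0}


@dataclass
class Candidate:
    name: str
    signals: dict = field(default_factory=dict)  # signal name -> strength


def score(c: Candidate) -> float:
    """Weighted sum of whatever signals exist for this candidate."""
    return sum(WEIGHTS.get(sig, 0.0) * val for sig, val in c.signals.items())


def suggest(candidates, k: int = 50):
    """Return the k highest-scoring candidates, however weak the evidence is."""
    return heapq.nlargest(k, candidates, key=score)


cands = [
    Candidate("sibling", {"same_public_ip_recently": 1.0, "phone_nearby": 1.0}),
    Candidate("coworker", {"same_public_ip_recently": 1.0}),
    Candidate("stranger"),
]
print([c.name for c in suggest(cands, k=2)])  # ['sibling', 'coworker']
```

With the default k=50 even the zero-score "stranger" is returned, which is the "false positives are free" behaviour described above.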

It is creepy. It should be illegal, but unfortunately, it's legal -- and the governments are happy about it because it's often illegal for THEM to collect all this data, but it's not illegal to let FB collect it and ask them for a copy.

I did have my phone with me. Did not log into Facebook at all with my real account at the computer before making the fake account so no cookie.

Maybe some of your friends had this email in their address book?

Nope, one time throw away email account. Never used it before, never used it again.

Even though the code does something possibly illegal (by violating copyright) when executed -- can't it still live on github as free speech? (like an anarchist textbook or 3d printing schematic of a gun or bitcoin core)? Can't it just be demonstrative and educational, and if you download it and run it you may be held liable but otherwise the code itself is fine and legal to simply exist?

Facebook knew they were dealing with a popular blog, yet it seems they decided to approach the issue in the most expensive and inflammatory way possible. Why not simply implement mechanisms for making the data more difficult to scrape and let the app die on its own?

Scorched-earth policies serve as a warning to others who might try the same thing.

So who here is cutting the cord? I quit about 12 months ago, the dissonance of using the service and knowing how bad it was got to be too much to handle. It isn’t the end of the world! It’s actually quite nice to miss all the junk that constitutes a majority of the networks content.

It may sound crazy to suggest techies could lead an exodus off Facebook, but look at Twitter: techies led the influx onto that service, so why not the other way around? When are we going to put our money where our mouths are, “vote with our feet” and leave Facebook (i.e. permanently delete our accounts)? Otherwise all this handwringing seems meaningless; they keep misbehaving because they know no one’s going to do anything.

Hi, ex-FB'er here.

> Facebook has nearly limitless access to all the phone numbers, email addresses, home addresses, and social media handles most people on Earth have ever used.

They're overlooking an obvious one, which is location. If you and a stranger use your FB apps from the same restaurant the same evening, the stranger will rank higher when you search for someone with their name. It would be reasonable for this to feed PYMK.

You have to feel a little sorry for them in the position they currently find themselves in. If they allow journalists this kind of access and power, what’s to prevent bad actors using it as well?

The very thing they are in trouble for is giving third parties (which would include journalists) unfettered access. It’s kind of disingenuous that Gizmodo didn’t even recognize their own cognitive dissonance in this situation.

> You have to feel a little sorry for them in the position they currently find themselves in. If they allow journalists this kind of access and power, what’s to prevent bad actors using it as well?

On the same lines, here's another question - what prevents Facebook from turning into a bad actor and using the power it has?

Is the data from PYMK subject to GDPR / available for export? As a user I should have a way to archive and keep track of PYMK, whether it be a third party open source tool, or via some other means?

Under GDPR it pretty clearly falls under "opinions we have about you", and ought to be requestable. I'm wondering when the first GDPR case against FB will substantively materialise, and it surely will. They're the obvious first target.

Might they also be scanning the wifi networks we all connect to and if they see us on the same network as someone else N amount of times, go ahead and make the suggestion?
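The kind of co-occurrence rule being speculated about here can be sketched in a few lines: count how often each pair of users shows up on the same network, and flag pairs seen together at least N times. Everything below (the session format, network IDs, user IDs and threshold) is a hypothetical illustration, not a description of anything Facebook is known to do.

```python
from collections import Counter
from itertools import combinations


def co_occurrences(sessions):
    """Count pairs of users seen together; sessions is an iterable of
    (network_id, set_of_user_ids) observations (hypothetical format)."""
    counts = Counter()
    for _network, users in sessions:
        for pair in combinations(sorted(users), 2):
            counts[pair] += 1
    return counts


def suggested_pairs(sessions, n):
    """Pairs of users observed on the same network at least n times."""
    return {pair for pair, c in co_occurrences(sessions).items() if c >= n}


sessions = [
    ("cafe-wifi", {"alice", "bob"}),
    ("office-wifi", {"alice", "bob", "carol"}),
    ("cafe-wifi", {"bob", "carol"}),
]
print(sorted(suggested_pairs(sessions, n=2)))  # [('alice', 'bob'), ('bob', 'carol')]
```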

It was unclear to me from the article what the outcome was. Did they keep the tool up and Facebook got bored and went away?

"Shortly thereafter, in March, Facebook’s world exploded, when it was revealed that Cambridge Analytica had gotten access to the profile information of millions of Facebook users, going through what was considered an “official route” in 2012. Facebook stopped bothering us about our PYMK Inspector, and the tool currently remains up."

Likely they still have an issue with it, but as the article implies, they are busy with other issues/data scandals to make a further issue of the "tool", at least for now.

I'm glad events as such surface to the public. While the merits of Facebook are undeniable, they are an evil company.

Pretty rich given that violating privacy is Fb's business model.

These large tech companies hate people who tell the truth and expose their evil intentions. Alex Jones got kicked for telling the truth. Richie Allen got blacklisted to speaking truth.

Wow. Fuck Facebook.

Google needs to be investigated as well. It violates people's privacy to the same degree as FB.

Like we've already asked, please stop posting generically about Google and Facebook. We're here to learn and this isn't getting us there.


Meta: Who in HN is pushing this story down? It currently has 3x the points as the "Facebook Field Guide to ML" story, currently at #1. They were published around the same time and this currently is at #6.
