It’s a very unfortunate precedent in my opinion, that limits pretty much all forms of scraping. Copyright law should not be abused for this, but yet here we are.
If the Knight Institute manages to get Facebook to amend their policy, that is a great step in the right direction, but I feel that copyright should not be abused to restrict scraping like this: the copyrightable content is not what is being scraped at all; it is immediately thrown away after the user-produced content is extracted.
This means there is no way for a user to even legally get their own content extracted from Facebook this way, and on top of that, web browsers make in-memory copies of Facebook’s HTML all the time; heck, they even specifically instruct browsers and proxies to cache a lot of their content.
Isn't copyright a bound on distribution? How would this work unless the HTML itself was distributed by the scraper? If you use SAX processing to stream it, does that change interpretation? (Not sure about how to interpret the in-memory copy.)
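To make the SAX question concrete, here is a minimal sketch of event-driven extraction using Python's standard `html.parser`. The page arrives in chunks, the parser fires events as it goes, and only text inside elements carrying a marker class is kept. The class name `user-post` and the helper `extract_streaming` are hypothetical, not Facebook's real markup or anyone's actual tool. Whether the small parse buffer counts as a "copy" is a legal question this code cannot answer.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class UserPostExtractor(HTMLParser):
    """Keep text inside elements marked class="user-post" (a hypothetical
    marker) and discard all other markup as it streams past."""

    def __init__(self):
        super().__init__()
        self.depth = 0     # > 0 while inside a user-post element
        self.buffer = []   # text fragments of the post currently being read
        self.posts = []    # completed posts

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # nested element inside a post
        elif "user-post" in (dict(attrs).get("class") or "").split():
            self.depth = 1   # entering a post

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:  # leaving the post: emit it and reset
            self.posts.append("".join(self.buffer).strip())
            self.buffer.clear()

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def extract_streaming(chunks):
    """Feed the page chunk by chunk; the whole document is never assembled."""
    parser = UserPostExtractor()
    for chunk in chunks:
        parser.feed(chunk)
    parser.close()
    return parser.posts
```

In practice `chunks` could be any iterable of decoded text pieces, such as the pieces read off a network response. The parser still holds incomplete tags in a small internal buffer between `feed` calls, so some bytes of markup do sit in RAM momentarily.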
If they violate my copyright, can I sue them for royalties every time they rendered the image they never got a license for and distributed it to someone? I'd imagine their TOS/EULA precludes this by just equating uploading to refusing my license, but how enforceable is that? It neatly bypasses any consumer protections.
If Facebook is acting this way, maybe I've been wrong all this time--maybe I shouldn't be paying for their service to avoid ads, maybe they should be paying me for the license to my content AND not showing ads.
No, it covers any reproduction in a tangible medium, whether or not you distribute.
I believe a court has ruled that RAM copies count. I'm not finding the case now though.
I think the journalists would have a strong Fair Use argument...
What's the justification of this? How does this afford protection to the copyright owner if you don't distribute? Is it purely to protect against derivative works bypassing copyright?
How can this kind of reasoning survive a sane court of law?
With rare exception, law in the US is "More Fair" the more money you have. So yes, reading the body of law from that perspective should provide significant clarity as to why the ruling was done that way.
I've read enough history to know this isn't exclusive to the US, but rather the same in every country in human history.
Only religion, politics, and birth-right seem to have any pull similar to money.
"Judge Fogel’s reasoned that MAI Systems Corp. v. Peak Computer, Inc. and Ticketmaster LLC v. RMG Techs. Inc. indicated that the scraping of a webpage inherently involves the copying of that webpage into a computer’s memory in order to extract the underlying information contained therein. Even though this "copying" is ephemeral and momentary, that it is enough to constitute a "copy" under § 106 of the Copyright Act and therefore infringement. Since Facebook’s Terms of Service prohibit scraping (and thus, Facebook has not given any license to third parties or users to do so), the copying happens without permission."
Unfortunately the real problem isn't so much the concept but rather the cost of hiring lawyers skilled enough to take on multi-million dollar companies.
Apple is doing the same with an encryption key embedded in Mac’s SMC chips. The key is something like “our hard work by these words guarded, please do not steal (c) Apple Computer Inc” and is used to decrypt system binaries.
While providing no actual security I guess it gives them a leg to stand on should anyone recreate a machine capable of running MacOS (the OS would not start without this key being in the hardware).
So this poem here:
Your karma check for today:
There once was was a user that whined
his existing OS was so blind,
he'd do better to pirate
an OS that ran great
but found his hardware declined.
Please don't steal Mac OS!
Really, that's way uncool.
(C) Apple Computer, Inc.
Is not a trade secret.
Just navigating to Facebook requires creating an in-memory copy of the HTML...
First of all, the HiQ case was against LinkedIn.
Second of all, the ruling held that LinkedIn COULD NOT bar HiQ from scraping the public content.
LinkedIn appealed the ruling to the 9th Circuit, which heard the case this spring but has not ruled.
Do you have a reference for the case you're thinking of? Missed this one and really want to read it.
Solution: implement the scraper as a proxy that forwards the HTML, but also saves the user-authored content.
Money can do lots of PR and marketing work.
I have thought about just using the DOM, and/or using Chrome itself as an intermediary using its dev toolkit but I think that from a legal perspective, this is all just a modification of the original works as well.
IANAL, of course.
Extracting the user's data out of the page is what's _allowed_; since that is the part of the page that Facebook does not actually hold a copyright claim to, I'm pretty sure you could make the claim that extracting the user's own data out of the page into a structured format for the user's own use is transformative.
The part that wasn't allowed was literally making an ephemeral in-memory copy of the page as-is, which included Facebook's copyrighted HTML.
(I am also not a lawyer)
Even something as private as DNA is no longer under my own control. A sibling signed himself up for 23andme, and I realized that once he did that, I'm as good as being up there too. My entire family tree can now be identified because of that single person, which is scary as hell.
This rabbit is not going back in the hat. We need better tools which respect users, behaviour change via a new common sense, and decentralization. Not flashy legal theatrics that are out of date well before they're written into law.
If you think you can legislate the world of crime away through criminal law you are terribly naive. At most it will punish a few criminals here or there while the massive crime industry continues to flourish.
Just because something bad is hard to regulate doesn’t mean you just give up. Besides, saying decentralization is the solution is even more naive, because centralized systems are more convenient and people prefer convenience to privacy.
I'm not even an EU citizen and I've already benefited from GDPR.
I don't really understand the motivation trying to keep one's phone number or address secret.
The white pages never told me who was friends with whom. Or what everybody's sexual preferences were. Or any other data except their name and phone number.
Once FB notices that you exist, they build a profile on you that includes much more than these two pieces of info. OP was trying to avoid being "noticed" by FB in the first place, to prevent the profile being assembled.
Law enforcement, most libraries and other phone companies always had access to public printed reverse-number phone books in the late 1980s and 1990s.
Random people/companies calling you at home, harassment, generally people on the internet being less than civil especially if you are a woman.
Now however the computer will do it, virtually for free, and then the hoard of data can be dug through for incriminating (or socially awkward) information.
So his/her parents hold the copyright?
Many feel this way (about many issues these days). Let me suggest that it's not at all hopeless:
1. Security is never about 100% protection; it's about making attacks more difficult. You can't protect your data perfectly, but you can make attacks more difficult. A simple solution for phones: Use a VOIP number and forward it to or connect to it via a pre-paid phone (without giving your name). You can't stop people from sharing your number, but you can disassociate your number from your location and other data on your phone.
2. Better yet, use Signal, Tor, etc. Use cash when you can.
3. The problem can be fixed via legal action. It's happened to a great degree in Europe. If everyone walking around saying 'it's hopeless' actually did something about it, it would change quickly. And people on HN are generally the ones who understand these issues and need to take the lead for friends, neighbors, and coworkers.
Even if you do not have a Facebook account, Facebook likely has a shadow profile on you based on browser fingerprint etc showing up on various sites.
Also, your typical adblocker is not configured to block these buttons by default.
Should it ever become legal they’ll sell that info to insurance companies in a heartbeat. That’s the real risk.
This means you can find out who someone knows if (a) they’re not already a Facebook user, and (b) you create a profile for them. Whoopsie.
I kind of want to write a dystopian story about babies being bartered based on their social network rating, which of course is derived from their family history. We’re all reduced to numbers in the end.
Would be an interesting "prequel" to the troubling but good "Nosedive" episode of Black Mirror.
I'd have to go look for it but I swear there's an episode of either Next Generation or Deep Space 9 that covers almost this exact topic to the letter. Anyone else know what I'm thinking of??
Also https://m.youtube.com/watch?v=Wkedd6A6_mU was one of the funniest trek videos I’ve seen.
Focussing on the Klingon Empire (or the Mirror Universe Terran Empire) would let you do that without even having any tension with existing canon.
“We don’t expose this information via our API and we don’t allow accessing or collecting data from Facebook using automated means"
Oh man what a world we live in. Facebook is free to leech your phone of every last bit of personal data... but heavens forbid an end user or journalist tries to learn something about Facebook.
And more specifically, the tool is for learning what Facebook knows (or thinks it knows) about you.
... Which is what this tool does.
And while I take your point about aggregate value, I personally have a great deal of interest in data about me.
That being said, a less nefarious use could be a lender that has a standard mortgage rate of 4.25%, but if you give them a FB data dump, you can qualify for a 4.00% rate.
Or, with your FB data, the bank realizes you have a higher risk of defaulting/dying before paying the full mortgage, so they bump the rate to 5%.
At least some are trying to counter all this privacy invasion (EU, Apple, etc.) Let's hope they will continue doing so.
Not so sure. While they may obviously be doing more than FB, both example parties you listed are more than happy to support privacy invasion when it furthers their ends. You don't need to lower your standards to find counterexamples, nor do you need to praise those that are only a little less bad. In my view the word "all" in your statement is verifiably false. We should be more nuanced and call out bad when we see it, no matter the org, instead of giving out org-level labels.
Android phones are also effectively proprietary (especially phones from manufacturers such as Samsung and Xiaomi) and seemingly get slower over time. But yes, the apps are cheaper/free thanks to sideloading and a good community.
(RMS-Mode On) What I'm referring to as Android, is in fact, Google/Android, or as I’ve recently taken to calling it, Google plus Android. Android is not (really) a usable operating system unto itself (for most people), but rather another component of a fully functioning Google system made useful by the Google Apps, Google Play Services and vital system components comprising a full OS as defined by the end user. (RMS-Mode Off)
If either way you're going to have a proprietary phone, you might as well pick the one that has less of an interest in data mining you.
Why the scare quotes?
Yes, being able to access your data is nice, but you also have to balance it with people trying to trick people into leaking their data for nefarious reasons. Obviously, here, it's an open source project and we can see that it's (probably) secure, but then where do you draw the line?
Re: CA. FB was touting and selling this service to any and all interested parties. It’s only retroactively that FB and others are crying foul. I mean sure, FB should not have sold that data or made it available to any interested party, but that’s their actual business model.
Where do you get that from? My understanding is that the API used was closed in 2014, and existing applications were given until 2015 to stop using it. That's when the data was gathered, and that data was used later in 2016 by CA.
In this case, this is another API that is closed, and this application is trying to get around that. Again, I understand that this application has good intentions, and is hopefully secure, but they are still breaking the rules. If anything happens, then it's Facebook who is on the line.
> Facebook is free to leech your phone of every last bit of personal data... but heavens forbid an end user or journalist tries to learn something about Facebook.
I'd argue the risk isn't that a journalist is learning about Facebook, but rather that the tool has access to all of the user's information and username/password.
The tool was changed to redirect users to a Facebook sign-in page to log in, which is the proper way to sign in. They just found another reason to deny them access after.
It's also not collecting or centralizing data, this is just for the user that uses the tool to learn more about the tool, nothing is sent to Gizmodo.
I mean, this is a fair point. It wouldn’t be the first time that a Github tool was forked to surreptitiously send information to a 3rd party.
So they updated the tool to let users log in to Facebook.com through the browser and just hijacked the session cookie to gain access to the pages.
Since this is a program which runs on the user's machine, and downloads standard Facebook pages over their standard HTTP interface, I don’t see how Terms of Service can differentiate between accessing Facebook services through this program versus Chrome.exe or Edge.exe.
What makes one thing a user agent and another thing not a user agent?
Putting on my 'big business c*nt' hat I can totally understand why Facebook wouldn't want PYMK users.
I can't help but think that there will be ever more regulation because of it.
The problem it should raise for you is letting overwhelmingly omnipresent companies chill journalism in this way, not that the journalists worry about being chilled.
If you predicate your ability to speak truth to power on a business model that relies on the power structure you are criticizing you undermine your raison d'être.
This is just like Facebook banning anything which tries to track unfriending, another thing you can't do through the platform API and that they have actively worked to oppose and suppress.
Let's take Facebook's argument to its logical conclusion which would suggest that they should go on the warpath to ban all password manager applications that store a user's Facebook username and password.
I totally get limiting the volume and frequency of requests. That's fair. Bandwidth costs providers money, after all. But why really should you have any say over how I formulate my requests? Whether they come from your app, a personal script I wrote, or I hand-form them in Fiddler, what's the problem besides the service provider's control-mongering?
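For what it's worth, limiting volume and frequency does not require knowing anything about the client. A minimal token-bucket sketch, assuming the provider keeps one bucket per account or IP: `TokenBucket`, `rate`, `capacity`, and `allow` are hypothetical names for illustration, not any real service's mechanism.

```python
import time


class TokenBucket:
    """Permit bursts of up to `capacity` requests, refilled at `rate` tokens
    per second. Nothing here inspects how the request was produced."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.clock = clock            # injectable so the limiter can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self, cost=1.0):
        """Return True and spend `cost` tokens if the request may proceed."""
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A request from the official app, a personal script, or Fiddler spends the same token, which is the point: the cost to the provider is bounded without any say over how the request was formed.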
Am I missing something here?
It provides certainty for activities that, under the law, are arguably legal--but making the argument in court would be very expensive and time-consuming.
What if a similar process was instituted for the CFAA? This would take some of the reins out of the hands of self-interested parties like Facebook, and provide certainty for people who are operating in good faith in the public interest.
An example of this would be to say that it is legal for a person to use automated tools to observe their own authentic interactions with hosted software, and it is legal for other people to make such tools available to the public. That would cover Gizmodo's tool, I think.
Is the law settled with regards to distributing non-commercial tools to do something a site may not allow in their ToS, or is it similar to doing it yourself where it's fuzzy based on the reasonableness of the ToS and intent and a whole bunch of other things?
Did you have your phone with you, with the FB app? If you did, they likely have your exact location, and possibly also the exact location of the browser.
A false positive costs them essentially nothing, so they'll always offer the 50 or so highest-scoring matches to you. It's possible that those were matched to you because you had just logged out from the same public IP 5 minutes before (even if it was another machine on the network), and there were no other sources of information about that fake account.
It is creepy. It should be illegal, but unfortunately, it's legal -- and the governments are happy about it because it's often illegal for THEM to collect all this data, but it's not illegal to let FB collect it and ask them for a copy.
It may sound crazy to suggest techies could lead an exodus off Facebook, but look at Twitter: techies led the influx onto that service, so why not the other way around? When are we going to put our money where our mouths are, “vote with our feet” and leave Facebook (i.e. permanently delete our accounts)? Otherwise all this handwringing seems meaningless: they keep misbehaving because they know no one’s going to do anything.
> Facebook has nearly limitless access to all the phone numbers, email addresses, home addresses, and social media handles most people on Earth have ever used.
They're overlooking an obvious one which is location. If you and a stranger use your FB apps from the same restaurant the same evening, the stranger will appear higher in your search results of someone with their name. It would be reasonable for this to feed PYMK.
The very thing they are in trouble for is giving third parties (which would include journalists) unfettered access. It’s kind of disingenuous that Gizmodo didn’t even recognize their own cognitive dissonance in this situation.
On the same lines, here's another question - what prevents Facebook from turning into a bad actor and using the power it has?
They likely still have an issue with it, but as the article implies, they are too busy with other issues and data scandals to make a further issue of the "tool", at least for now.