
How I got sued by Facebook - petewarden
http://petewarden.typepad.com/searchbrowser/2010/04/how-i-got-sued-by-facebook.html
======
dotBen
"my lawyer advised me that it had never been tested in court, and the legal
costs alone of being a test case would bankrupt me."

This is a very real problem with law surrounding emerging business practices,
esp here in the US.

Ultimately you can only pioneer whatever you can afford to defend in court.
Facebook, or any other BigCo for that matter, can assert that you can't do X
and it's up to you to fight it in court... Even if there is prior behavior
such as with this case. Clearly Google does the very same job and doesn't have
an agreement with Facebook to spider their site.

But if you can't afford to defend it and bring up the prior examples in a
court, then Facebook - or anyone else for that matter - can stop you.

~~~
viraptor
They probably have some clause in the terms and conditions about the automated
interaction with the page (e.g. point 3.2, but there might be more)... but I
hope it would be very hard to enforce. Especially when you can download the
public pages without accepting the terms and conditions document or even
seeing it.

------
kbrower
"Their contention was robots.txt had no legal force and they could sue anyone
for accessing their site even if they scrupulously obeyed the instructions it
contained. The only legal way to access any web site with a crawler was to
obtain prior written permission."

This is ridiculous. Can someone release the data so this can be tested in
court? EFF?

~~~
DrJokepu
There's nothing ridiculous about that. There's a clearly visible link on the
bottom of each Facebook page that links to the Terms & Conditions of accessing
Facebook pages. Some relevant parts:

"If you collect information from users, you will: obtain their consent, make
it clear you (and not Facebook) are the one collecting their information, and
post a privacy policy explaining what information you collect and how you will
use it."

"You will only use the data you receive for your application, and will only
use it in connection with Facebook."

"By "application" we mean any application or website (including Connect sites)
that uses or accesses Platform, as well as anything else that receives data."
- note that by this definition a Facebook crawler is an application

"You will have a privacy policy or otherwise make it clear to users what user
data you are going to use and how you will use, display, or share that data."

"You will not transfer the data you receive from us (or enable that data to be
transferred) without our prior consent."

"You will make it easy for users to remove or disconnect from your
application."

etc.

These things are easy to look up; it took me about five minutes to find them.
In fact, they have the right to limit the way even public-facing content on
their site is used. Things like copyright and data protection laws come to
mind. You might not agree with that, but it is absolutely within their rights
to do so.

The FB Terms & Conditions don't mention robots.txt at all.

~~~
axod
As far as I can see, they weren't creating a facebook application, so I'm not
sure why the Terms & conditions are relevant.

If you run a public website, you have to accept that people may crawl your
website. If you want to prevent it, don't let them crawl it. That's what
robots.txt is for.
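
For reference, a minimal sketch of how a polite crawler checks robots.txt
before fetching a page, using Python's standard urllib.robotparser. The
robots.txt content, the bot name "ExampleBot" and the example.com URLs are made
up for illustration; they are not Facebook's actual rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body (an assumption, not any real site's file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


public_ok = is_allowed(ROBOTS_TXT, "ExampleBot", "http://example.com/people/someone")
private_ok = is_allowed(ROBOTS_TXT, "ExampleBot", "http://example.com/private/page")
print(public_ok, private_ok)
```

This only shows the technical convention. Whether obeying it protects you
legally is exactly what is disputed in this thread.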

~~~
DrJokepu
As far as I understand, the problem was not crawling the website, it was the
way he used the data he gained by crawling.

~~~
echaozh
I think it's quite ridiculous.

If I buy a map, can I use it to be a guide, giving people directions and making
a living?

If I buy a story book, can I use it to read to children and collect money from
their parents?

Or, if I spend some time to index all the books in the library, can I sell the
index and make money? Or did I violate the rights of either the library or the
books' copyright owners?

I think crawling and making use of the crawled data is not offending the
copyright law because it's not actually a copy of the data. Rather, it's a
service to transform the data and help other people get to the original data
in an easier way.

Technically, they can crawl the data on the spot when receiving a request from
their clients. In order to speed up the process, they cache the data crawled
on earlier occasions. But those are technical details, which are
encapsulated. Otherwise, they could have stored URLs and offsets in the pages,
rather than the data itself. I don't think that breaks the law, or there would
be no way to refer to anything without breaking the law.
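
A rough sketch of the "URLs and offsets" idea, assuming a toy in-memory set of
pages: the index keeps only pointers into each page, and the text is read from
the source again whenever a pointer is resolved. The names Pointer,
build_index and resolve, and the example.com page, are hypothetical and only
illustrate the data structure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Pointer:
    """A reference into a page: where the text lives, not the text itself."""
    url: str
    start: int
    end: int


def build_index(pages: Dict[str, str], term: str) -> List[Pointer]:
    """Record the position of every occurrence of term in every page."""
    pointers = []
    for url, text in pages.items():
        pos = text.find(term)
        while pos != -1:
            pointers.append(Pointer(url, pos, pos + len(term)))
            pos = text.find(term, pos + 1)
    return pointers


def resolve(pointer: Pointer, fetch: Callable[[str], str]) -> str:
    """Fetch the page again and slice out the span the pointer refers to."""
    return fetch(pointer.url)[pointer.start:pointer.end]


# Toy stand-in for the live web (an assumption made for this example).
pages = {"http://example.com/a": "Alice lives in Boston. Bob lives in Boston too."}
index = build_index(pages, "Boston")
```

Here resolve(index[0], pages.__getitem__) goes back to the "live" page for the
text, so the index itself holds no copy of the content.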

~~~
DrJokepu
A couple of examples: if you rent a DVD from one of those DVD rental places,
normally you can't play it in public places like bars. If you buy a CD you
can't broadcast it on your radio station; you need a special licence for
that.

Another possible related issue is the question of derivative works (e.g. the
index of a library). There's a Wikipedia article about derivative works, it's
a bit too complex to summarize here:
<http://en.wikipedia.org/wiki/Derivative_work>

~~~
blasdel
Two extraordinarily poor examples!

Both films and music (plus songwriting too) have been explicitly enshrined by
US copyright law as having separate rights for home use and public
performance. Each has multiple spheres of statutory licensing organizations
with special exemptions from anti-trust law.

~~~
DrJokepu
I was referring to the copyright laws of the United Kingdom, which is the
country where I live (Copyright, Designs and Patents Act 1988). Here, public
performance infringement is not limited solely to films and music (Section
16.1c).

I'm not very familiar with US copyright legislation but I assumed that the
copyright laws there are similar without checking any sources - which was a
mistake apparently.

------
ramanujan
Facebook was founded by an outlaw scraper. Game recognize game, they're just
neutralizing a threat, don't worry about the legal window dressing.

Read about Zuck's wget magic here, and how he illegally scraped (guarantee it
was a TOS violation) all the Harvard online facebooks in 2003:
<http://www.scribd.com/mobile/documents/538697>

As Balzac said, behind every great fortune lies a great crime :)

~~~
mos1
He also wrote software that scraped and archived AIM status messages.
Apparently he has a long history of scraping content in various forms.

~~~
jacquesm
Not to mention the fact that he 'scraped' the idea of the concept of facebook
in the first place.

------
asimecs
Legal precedent: "Copiepresse (Belgian Newspaper Conglomerate) v. Google.
Filed in August, 2006.

Claims: to remove all the content indexed by Google's crawlers on the
newspaper's websites."

[http://www.infoniac.com/offbeat-news/google-list-of-class-
ac...](http://www.infoniac.com/offbeat-news/google-list-of-class-action-
lawsuits.html)

Google's response: "Of course, if publishers don’t want their websites to
appear in search results (most do) the robots.txt standard (something that
webmasters understand) enables them to prevent automatically the indexing of
their content. It's nearly universally accepted and honoured by all reputable
search engines."

[http://googleblog.blogspot.com/2006/09/about-google-news-
cas...](http://googleblog.blogspot.com/2006/09/about-google-news-case-in-
belgium.html)

Maybe you can get Google to file an amicus brief...

"Outcome: Google had to remove the plaintiff's newspaper content from its
database within 10 days or face fines of 1,000,000 Euro per day. Google had to
publish "in a visible and clear manner and without any commentary from her
part the entire intervening judgment on the home pages of google.be and of
news.google.be for a continuous period of 5 days within 10 days... under
penalty of a daily fine of 500,000 Euro per day of delay". Google had was
awarded the costs of the expenses of 941.63 Euro (summons) and 121.47 Euro
(costs of the proceedings)."

------
siculars
I've been beaten into oblivion before for speaking out about the way I see the
future shaping up re Big Corporations owning our information but I'll take my
chances and use this opportunity to speak out against Facebook again. In
truth, I did invoke Orwell...

I'm no Facebook hater, I actually use Facebook often, but when they pull these
kinds of stunts it really upsets me. I would imagine it would upset most
freedom lovers on this site as well. When Facebook decides to take a
smattering of our data and make it "Public" how then do they decide to control
that data after the fact? Since when did Facebook re-define the word "Public"?
It's like your cell phone provider and ISP redefining "unlimited". This stuff
has to stop. Getting back to Facebook, since when does merely browsing to a
URL(I) enter you into some sort of binding contract with the publisher?

~~~
imajes
Whilst i agree with you, the devil's advocacy in me says,

"what happens when Facebook gets sued for this data exposure?"

And there are two avenues that might happen - investors suing as to loss of
potential revenue, and user(s) suing over privacy violations.

Bear in mind, FB have already had class actions against them.. i don't think
setting aside another 10m+ (conservative) just to defend yet another nuisance
lawsuit is something they're going to be up for.

So, as Pete Warden described, they bullied him into shutting it all down with
threats that they could just as easily lose in court - based on the principle
that it's cheaper for them to sue Pete and lose cred amongst us than it would
be for them to face a class action from users who are led to believe that
their privacy has been violated by the data set via an ambulance-chasing
lawyer.

Facebook should have instead invited Pete in for a chat and a job, but instead
they took the full frontal lawyer bulldog approach. Sometimes that happens.
Hopefully next time it won't - and i've already poked my friends at FB to raise
the issue internally if possible -- and so should you.

------
breck
Wait, from reading the article it appears he didn't get sued by Facebook (they
just threatened to sue).

There's a difference.

~~~
viraptor
Yes... a difference between a lawsuit and a "legal extortion".

~~~
devinj
It's not extortion, they didn't settle for anything other than him deleting
all copies of the data that he had access to. Hyperbolic statements are
unnecessary.

~~~
loup-vaillant
They extorted the "right" to aggregate data from their site. Also note that
such right could imply monetary benefits, so this is indeed extortion.

------
wooster
[http://en.wikipedia.org/wiki/Feist_Publications_v._Rural_Tel...](http://en.wikipedia.org/wiki/Feist_Publications_v._Rural_Telephone_Service)

------
analyst74
How about crawling search engines? For example, instead of crawling directly
on facebook, crawl on a search engine with "site:facebook.com".

In that case, your data is gathered from the search engine and has nothing to
do with facebook any more. And I doubt the search engine will sue you for
using their service.

~~~
Vivtek
It's still Facebook's data (by which I mean "data that originated at
Facebook"), and I guarantee you that their lawyers will still threaten a
lawsuit. _What do they have to lose?_

~~~
robryan
You could then claim that the lawsuit is against the wrong person. Or would
they argue that the original scraper was in the right and the secondary person
was in the wrong?

~~~
Vivtek
Well, it's your use of the data, not so much the scraping itself. But way more
importantly, the search engine has money and you don't.

------
btipling
I'm surprised that shutting up about this wasn't part of the agreement, either
that or he's violating it. I've never heard of a settlement that didn't
involve keeping your mouth shut.

I'm glad he posted about it though, that legal issue regarding robots.txt is
good to know.

~~~
petewarden
They never asked me to keep quiet about it, and they've been willing to
comment about it too, eg

[http://www.newscientist.com/article/dn18721-data-sifted-
from...](http://www.newscientist.com/article/dn18721-data-sifted-from-
facebook-wiped-after-legal-threats.html)

I'd imagine they want to deter people from following in my footsteps, so
publicizing their actions serves as a warning.

~~~
jrockway
_I'd imagine they want to deter people from following in my footsteps, so
publicizing their actions serves as a warning._

If anything, I'm encouraged to start a publicly-available database based on
crawling Facebook. Don't associate your name with it, crawl from your
Ipredator account, and make sure your data is spread far and wide. Then they
can sue the Entire Internet like the music companies are having so much
success with.

Information wants to be free.

(Also, what's the worst that could happen? You get sued, and Facebook gets a
few thousand bucks from your savings account and your used mattress? OH NOES.)

~~~
pbiggar
> Information wants to be free.

I think the full quote is something like "Information wants to be free.
Information also wants to be expensive". As I recall, it was about the
dichotomy between the fact that you could copy bits for free, and that
information was very valuable.

~~~
jrockway
I thought it meant that Mr. Information was locked in a dungeon, yelling
"Hellooooo? Can anyone hear me!???" to any passerby inside the walled garden.
He wants to be free. He wants you to free him!

------
gcb
He is not publishing the data as is. He is publishing the work he did on the
data.

or am i wrong?

Now can't i publish my thesis because it contains information i got from
copyrighted books?

~~~
scott_s
He offered to distribute the data as-is to whoever was interested.

