

How I got sued by Facebook (2010) - helwr
http://petewarden.typepad.com/searchbrowser/2010/04/how-i-got-sued-by-facebook.html

======
antihero
"my lawyer advised me that it had never been tested in court, and the legal
costs alone of being a test case would bankrupt me"

What's to stop two smaller companies from staging a "court case" in which
they sue each other for small bucks to get the desired outcome (a ruling that
following robots.txt is a legal way to access a site with a crawler)? This
would then set a precedent that would benefit everyone.

~~~
ovi256
Now that you mention it, what's stopping two such companies from manipulating
the result of this landmark case?

What's stopping Facebook from setting up a puppet company to sue to obtain
their desired precedent?

This seems like a huge hole in the "let's let the courts decide the law"
system.

~~~
sigstoat
judges are not even remotely amused by attempts to manipulate them, and if
they discover it, i can't imagine things ending well for parties making such
an attempt.

------
randomwalker
Lawsuit nastiness aside, there's an interesting and important legal-technical
question that this exposes: how should websites specify acceptable uses of
crawled data and other fine-grained restrictions in a machine-readable form?

Motivated by this incident, I got together with Pete (the author/victim) to
write a piece on "The Need to Reboot Robots.txt" [1] but it went nowhere.

Any suggestions on how to give our proposal legs would be much appreciated.

[1] <http://33bits.org/2010/12/05/web-crawlers-privacy-reboot-robots-txt/>

~~~
blauwbilgorgel
You can set the SyndicationRight directive for OpenSearch.

"Contains a value that indicates the degree to which the search results
provided by this search engine can be queried, displayed, and redistributed."

The default is "open", meaning:

\- The search client may request search results.

\- The search client may display the search results to end users.

\- The search client may send the search results to other search clients.

[http://www.opensearch.org/Specifications/OpenSearch/1.1#The_...](http://www.opensearch.org/Specifications/OpenSearch/1.1#The_.22SyndicationRight.22_element)

That would give you more fine-grained control over what search agents do with
your data. I don't know how broad support for and adherence to the OpenSearch
spec is (IMDB uses it).
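
For illustration, here is a minimal sketch of how a client might read that
element before deciding what to do with results. The sample description
document, the example.com URL, and the `syndication_right` helper are all
made up for this example; only the namespace, the element name, and the
"open" default come from the OpenSearch 1.1 spec.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OpenSearch 1.1 specification.
OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"

# Hypothetical description document, not taken from any real site.
SAMPLE = """\
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Example</ShortName>
  <Description>Hypothetical search engine</Description>
  <Url type="text/html" template="http://example.com/?q={searchTerms}"/>
  <SyndicationRight>limited</SyndicationRight>
</OpenSearchDescription>
"""


def syndication_right(description_xml: str) -> str:
    """Return the declared SyndicationRight, lowercased.

    Falls back to "open", the spec's default, when the element is
    missing or empty.
    """
    root = ET.fromstring(description_xml)
    element = root.find(f"{{{OPENSEARCH_NS}}}SyndicationRight")
    if element is None or not (element.text or "").strip():
        return "open"
    return element.text.strip().lower()


if __name__ == "__main__":
    print(syndication_right(SAMPLE))
```

The spec lists four values (open, limited, private, closed); whether a given
crawler actually honors them is a separate question.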

------
dodo53
I wonder if he asked EFF if they were willing to defend the case. The thing is
it's probably never individually worth defending against these cases, but on a
society level there'd be so much gain if someone had set a legal precedent
for the validity of robots.txt.

~~~
bertil
They settled out of court:

> He was with the head of their security team, who I knew slightly because I'd
> reported several security holes to Facebook over the years. The attorney
> said that they were just about to sue me into oblivion, but in light of my
> previous good relationship with their security team, they'd give me one
> chance to stop the process. They asked and received a verbal assurance from
> me that I wouldn't publish the data, and sent me on a letter to sign
> confirming that.

The robots.txt part was just Facebook's lawyers grasping at a contract with
them; the actual issue was that the EULA allows Facebook to make basic
information available, but users do not expect such a database to be freely
available. Although I personally regret that the data team does so little to
help research, the legal consequences were not worth the prank. What actually
matters is usage, and Facebook clearly police the spirit of the platform
rather than the law, even beyond their contract partners (see RapLeaf).

------
RyanMcGreal
A case before the Supreme Court of Canada right now [1] touches on a similar
untested premise of the open web. At issue is whether a hyperlink constitutes
a _citation_ or a _republication_ of that page.

In this case, the plaintiff is accusing the defendant of defamation for
linking to web pages the plaintiff argues are defamatory. (Aside: compared to
the US, defamation law in Canada is weighted much more strongly toward the
plaintiff than the defendant.)

Lower courts have decided that simply linking to a defamatory web page does
not constitute defamation, unless the link is provided for the purpose of
endorsing the defamatory material, in which case it is the _endorsement_ of
the link that constitutes defamation, and not the link itself.

The problem in Canada, as in the US, is that governments have not kept up with
legislation governing the legality of various internet-specific activities,
like hyperlinking and so on. That has left the courts to try and decide
through precedent how to handle these conflicts.

[1] <http://www.scc-csc.gc.ca/case-dossier/cms-sgd/sum-som-eng.aspx?cas=33412>

------
RiderOfGiraffes
You might care to read the extensive discussion from when this was posted 11
months ago:

<http://news.ycombinator.com/item?id=1243159>

------
petewarden
I've added a post-script to this story, updating with developments over the
last year:
<http://petewarden.typepad.com/searchbrowser/2011/03/facebook-isnt-so-evil.html>
In particular, I know from my friends in the academic community that they're
quietly putting together processes for working with researchers. That's a big
step forward in my view; as long as they can safeguard privacy, there's a lot
of potential for world-improving research.

------
tba
Great article. Is this the same person that Palantir mentioned as a potential
source of Facebook information for social engineering attacks?

From the leaked HBGary emails:

"The Palantir employee noted that a researcher had used similar tools to
violate Facebook's acceptable use policy on data scraping, 'resulting in a
lawsuit when he crawled most of Facebook's social graph to build some
statistics. I'd be worried about doing the same. (I'd ask him for his Facebook
data—he's a fan of Palantir—but he's already deleted it.)'"

<http://arstechnica.com/tech-policy/news/2011/02/black-ops-how-hbgary-wrote-backdoors-and-rootkits-for-the-government.ars/4>

------
xd
I notice they have updated their robots.txt to only allow user agents they
have approved.

<http://www.facebook.com/apps/site_scraping_tos.php>
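
For anyone wondering what an allowlist-style robots.txt looks like from a
crawler's side, here is a rough sketch using Python's standard
`urllib.robotparser`. The rules below are a made-up example, not Facebook's
actual file, and `ExampleResearchBot` and the example.com URLs are
placeholder names.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical allowlist-style rules: one named crawler gets access
# (minus one path), everyone else is disallowed from the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /ajax/

User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the sample rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "ExampleResearchBot"):
        print(agent, is_allowed(agent, "http://example.com/people/someone"))
```

Note that this only tells a crawler what the file permits; it says nothing
about whether following it makes the crawl legal, which is the open question
in this thread.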

~~~
historical
I noticed that a while back when it had the effect of removing thefacebook.com
from the Wayback Machine. There used to be some fascinating reading in their
old TOS and Privacy Policy. Wish I'd kept a copy.

------
gommm
What would have happened if he had done it from a company based in the
Seychelles for example? Would that be a way to protect against Facebook
aggressively suing with no grounds?

~~~
ovi256
My guess is Facebook would still sue and obtain an injunction against the site
being distributed in the US, probably shutting down access in the US.

~~~
gommm
But how can they do this if the dns server and server are not hosted in the
US?

I've been wondering about this problem too recently when looking at some
frivolous patent lawsuits... The problem is that in quite a few cases, even if
the law is on the side of the startup, the cost of applying the law and
winning the lawsuit is too high...

~~~
nl
If your domain name is registered through a US company they may be able to
seize it.

Also, if you have any assets in the US they can be seized.

------
younata
[2010]

I thought it sounded familiar.

~~~
spxdcz
Also, the title is misleading: he got threatened with legal action, not sued.

------
il
So...anyone have a mirror of the data?

~~~
xd
google for "fbnames".

------
urza
Reminds me of how Facebook nearly sued suicidemachine.org [1] just because
they allowed people to commit online suicide from Facebook (unfriend everyone
and set a random password).

For me, Facebook is just another bigheaded company that is trying to turn
your social life into their product [2]. And that is not the place where I
want to hang out with friends online. (And I don't.)

[1]
[http://suicidemachine.org/download/Web_2.0_Suicide_Machine.p...](http://suicidemachine.org/download/Web_2.0_Suicide_Machine.pdf)

[2] <http://twitter.com/#!/librarythingtim/status/13226541303>

------
otterley
This was, in fact, tested (to a limited extent) in court about a decade ago.
See _eBay v. Bidder's Edge_, 100 F.Supp.2d 1058 (N.D. Cal. 2000).

Short story: Back in the days when there was actual competition in the online
auction market (anyone remember Yahoo! Auctions?), Bidder's Edge was crawling
eBay listings to index them for an auction search engine. (I worked for one of
their competitors.) eBay sued on a trespass theory, and was granted a
preliminary injunction because the judge held that eBay was likely to succeed
on the merits of the claim.

Unfortunately, the trespass claim was never fully litigated; Bidder's Edge
agreed to stop crawling after the PI was granted.

------
greendestiny
Someone convince me that what Facebook said here was wrong. I don't think
robots.txt gives you a license to do whatever you want with web content. If it
did, wouldn't robots.txt effectively put everything into the public domain?

~~~
Animus7
Google can mine the data and do whatever they want (and I don't doubt for a
second that they run analysis on it), but this guy can't?

Facebook wants to have their cake and eat it too. They want free Google
publicity but god forbid some dude starts downloading pages for research. It's
legal, but it's wrong.

~~~
greendestiny
I'm really only interested in the legal question. And I genuinely would like
to be convinced that the legal system would allow scraping like this. I just
don't see it.

------
jscore
Well, you never got sued, just threatened.

------
willlisten
As a founder of a new company and the son of a lawyer, lawsuits are certainly
something I think about. It seems all companies that become well known
eventually face lawsuits. While it sucks and you never want to face one, many
know it is a cost of doing business. You also find people who want to attack a
company because they see a big dollar sign in front of them. Plus, lawyers
might earn hundreds of millions, or dare I say billions, if they win a case
against a company like Facebook or Google.

~~~
PakG1
When the Twitter strategy docs got leaked a while back, there was a specific
section that dealt with potential lawsuits.

<http://techcrunch.com/2009/07/16/twitters-internal-strategy-laid-bare-to-be-the-pulse-of-the-planet/>
(see Defensive Strategy section)

 _Legal

\- We will be sued for patent infringement, repeatedly and often

\- Should we get a great patent attorney to proactively go after these patents
(We need to talk about this more, we are unsatisfied)_

------
PaulHoule
The world really could use better analytics tools for Facebook apps since the
ones that Facebook provides are a little sorry in my opinion.

------
aksbhat
Sorry, but I side with Facebook: a freely available public graph of millions
of users could have been used for re-identification attacks.

Frankly you should never share your friends list publicly.

~~~
fname
I'd bet money that a similar list is probably already floating around. Whether
it's freely available or not may be up for interpretation.

