

Facebook employee responds on robots.txt controversy - petewarden
http://petewarden.typepad.com/searchbrowser/2010/06/facebook-employee-responds-on-robotstxt-controversy.html

======
finiteloop
This is Bret Taylor, CTO of Facebook.

There are a couple of things I want to clarify. First, we genuinely support
data portability: we want users to be able to use their data in other
applications without restriction. Our new data policies, which we deployed at
f8, clearly reflect this (<http://developers.facebook.com/policy/>):

    
    
        "Users give you their basic account information when they connect with your application. For all other data, you must obtain explicit consent from the user who provided the data to us before using it for any purpose other than displaying it back to the user on your application."
    

Basically, users have complete control over their data, and as long as a user
gives an application explicit consent, Facebook doesn't get in the way of the
user using their data in your applications beyond basic protections, such as
prohibiting the sale of data to ad networks and other sleazy data collectors.

Crawling is a bit of a special case. We have a privacy control enabling users to
decide whether they want their profile page to show up in search engines. Many
of the other "crawlers" don't really meet user expectations. As Blake
mentioned in his response on Pete's blog post, some sleazy crawlers simply
aggregate user data en masse and then sell it, which we view as a threat to
user privacy.

Pete's post did bring up some real issues with the way we were handling
things. In particular, I think it was bad for us to stray from Internet
standards and conventions by having a robots.txt that was open and a separate
agreement with additional restrictions. This was just a lapse of judgment.

We are updating our robots.txt to explicitly allow the crawlers of search
engines that we currently allow to index Facebook content and disallow all
other crawlers. We will whitelist crawlers when legitimate companies that want
to crawl us (presumably search engines) contact us. For other purposes, we
really want people using our API because it has explicit controls around
privacy and has additional requirements that we feel are important
when a company is using users' data from Facebook (e.g., we require that you
have a privacy policy and offer users the ability to delete their data from
your service).

This robots.txt change should be deployed today. The change will make our
robots.txt abide by conventions and standards, which I think is the main
legitimate complaint in Pete's post.
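
As a rough illustration of the whitelist approach described above, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `is_allowed` helper, the crawler name `HypotheticalScraper`, and the `www.example.com` host are all assumptions made for the example; this is not Facebook's actual file or crawler list.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt only (an assumption, not Facebook's real file):
# one named search crawler is allowed everywhere, everyone else is blocked.
EXAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

def is_allowed(user_agent: str, path: str) -> bool:
    """Return True if the example robots.txt lets user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(EXAMPLE_ROBOTS_TXT.splitlines())
    # Hypothetical host; only the path matters for rule matching.
    return parser.can_fetch(user_agent, "http://www.example.com" + path)

if __name__ == "__main__":
    for agent in ("Googlebot", "HypotheticalScraper"):
        print(agent, is_allowed(agent, "/profile.php"))
```

A well-behaved crawler would run a check like this before each request. Note that robots.txt is purely advisory, so it only stops crawlers that choose to honor it.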

~~~
jancona
Bret wrote: "I think it was bad for us to stray from Internet standards and
conventions by having a robots.txt that was open and a separate agreement
with additional restrictions."

You don't have an "agreement" at all. An agreement requires that two parties
actually, um, agree. You've published a statement where you assert certain
rights and imply that you will sue anyone who accesses data on your site in a
way you don't like. You may get away with that, regardless of the legal merits
of your position, because you have more money for lawyers than most people
you're likely to sue. But don't try to dignify what you're doing by calling it
an "agreement". It's like an extortionist telling me that we have an agreement
that he won't break my windows if I pay him protection money.

~~~
jm3
If you feel so strongly about Facebook's "agreement", perhaps you shouldn't
use Facebook.

------
jdrock
In my personal opinion, the Facebook employee's response shows that he doesn't
understand the company's high-level business goals. Their developers may think
they are building an open web, but it's very clear the company is only
interested in a closed web they control.

Based on my interaction with website API developers, most of them honestly
believe that they are building an open web. But let's face it, the API is a
benefit to them and creates a critical dependency for the user of the API.
It's basically vendor lock-in.

~~~
natrius
As far as I know, any data that people post to Facebook without restricted
visibility can be pulled through the API. If a user has logged into your site
and given you access to their data, you can access restricted data as well.
You can also write to Facebook via the API.

I'm not sure what you want Facebook to change.

------
Aaronontheweb
Pete's right-on about this: "You've chosen to leave all that information out
in the open so you can benefit from the search traffic, and instead try to
change the established rules of the web so you can selectively sue anyone you
decide is a threat."

Speaking as someone who's working on leveraging the Facebook API in a
commercial product, this leaves me feeling like I'm opening myself to a lot of
legal exposure if Facebook subjectively decides that my service poses even a
minor threat to them. Given that I'm bootstrapping, there's no way I'd be able
to put up any sort of legal fight whatsoever against a company as
well-funded as Facebook.

~~~
sounddust
I think that Facebook has a right to protect their data, but leaving
robots.txt wide open and trying to enforce some random TOS that a developer
might not even know about is the wrong way to do it. I think a much better
method would be to ban all bots from robots.txt (and crawling) except the
major search engines and provide data to others through APIs or other means
that are controlled (and the TOS can apply to the API rather than random
crawling). Maybe not what a developer wants to hear, but the truth is that the
internet is infested with bots that do nefarious things, and I think it's
better for the privacy and security of their userbase to control the spread of
their data as much as possible.

~~~
keltex
They could also make the pages NOT indexable and then have a side agreement
with Google to give Google API access to index the pages they want.

~~~
duck
What? So you want to just make Google _more_ powerful? How about all the other
search engines, even the less known ones like Duck Duck Go?

------
gkoberger
"Facebook has always been a closed system where developers are expected to
live in a culture of asking permission before doing anything, and existing at
the whim of the company's management. [...] The web I love is an open world
where you are free to innovate as long as you stick to the mutually agreed
rules."

Facebook's policies aside, it's interesting to note that this criticism was
written in response to a letter from Blake Ross, a co-founder of Firefox.

------
curio
this is a really important issue.

this new model that facebook is trying to push isn't scalable. it favors the
big guys and it's bad for the open web.

you shouldn't have a TOS that contradicts your robots.txt. period.

------
jrussbowman
Sorry, as implied in my comment on the first story about this posted to Hacker
News, I'm siding with Facebook on this one.

Of course they tune their page so that search crawlers can best index the
information; so does everyone else on the web.

However, they also provide an API, with clearly defined terms of use, which
you may use to get information. Basically your complaint boils down to you
don't want to use the methods they've set up for you to access their data, and
you're complaining about it.

As for comments about fairness, the true spirit of the internet, what have
you... I don't think the true spirit of the internet has ever been that everyone
has to give everything away in every possible imaginable way. And, in the end, it is
Facebook's data, they make that clear before you ever start adding data to
their servers. So does just about everyone else who allows you to submit data
to their servers.

This shouldn't be confused with how difficult it is to remove data and
accounts from their system, which is a giant pain in the butt, or the fact
that they've made drastic changes to the public nature of the data after
keeping it private for so long. That's all just a giant mess.

~~~
natrius
If I request content without circumventing any access controls and you give
that content to me, I've done nothing wrong. I didn't accept any terms of
service to do that, and even if I did, what is the legal recourse for someone
breaking the terms of service? I always assumed it was termination of service,
but that's probably wrong.

Suing people because you're too lazy to only give your content to people who
you want to receive it is wrong.

~~~
jdrock
What will happen is that they will send a cease and desist letter to you (like
they did to Pete). At that point, you have basically been given notice and it
becomes much riskier to continue doing what you're doing.

That's not to say their C&D is valid or enforceable. It just enters into
another legal bin that's a bit more hairy.

Unfortunately, the legal precedent for all this is murky at best. Past cases
have been very specific to the details of those cases. No line has ever been
drawn about what is ok and what is not.

~~~
moxiemk1
> That's not to say their C&D is valid or enforceable. It just enters into
> another legal bin that's a bit more hairy.

Exactly. It will be interesting to see where things end up playing out in the
"leaving yourself wide open" department.

It's very much parallel with the "using open wifi points" debate.

------
ojbyrne
Facebook isn't the only site with that language. In fact I think you'll have
to look hard to find any major site that doesn't have it. Not that it doesn't
suck.

------
tjmaxal
this is essentially the same argument people use against the iOS platform.

But clearly from Apple's example it is possible to build a profitable product
on a closed system.

Admittedly it doesn't seem "fair" or true to the spirit of the internet, but
it shouldn't be any surprise that a company is going to do what is best for
that company forsaking all others.

------
sd273
I'm tired of hearing people complain about private companies' rules for using
their information/platforms. Facebook and Apple can do whatever they want on
their own platforms and if people don't like it they can leave/not use the
service. That's how a free market and free web works. People complained about
Microsoft's dominance years ago and now their ascendancy is ending due to this
idea called the "market"...

~~~
pyre
The issue that a lot of people have is that they would like to see change
happen in their lifetime. Microsoft is in a decline now, but that doesn't mean
that: 1) it will continue to decline or 2) it will completely fail within our
lifetimes. Once companies hit a certain level, they take a long time to
completely fail due to the accumulated money and power during their 'good
years.'

Change, in general, doesn't come about by people sitting around doing nothing
but saying, 'the market will decide.' If you take the time to become an
advocate for a certain 'side,' _you_ can affect the market through public
perception.

[aside: People are not always logical agents with access to full information
like most economic theories seem to assume. To me this comes off like a lot of
physics where things are assumed to be on a 'perfectly frictionless surface,'
'in a vacuum', etc.]

