

Tim Berners-Lee: Tell Facebook, Google you want your data back - soitgoes
http://news.cnet.com/8301-1023_3-57415764-93/tim-berners-lee-tell-facebook-google-you-want-your-data-back/

======
moxiemk1
When discussing data-portability of social networks, Facebook's data download
feature is sometimes brought up.

I periodically use this feature to see if it has improved at all since its
initial disappointing release. As of a few weeks ago, it has not.

A _true_ data download from Facebook would consist of: a machine-readable form
of every action I've taken on Facebook (likes, friend requests sent/received,
photos I added tags to, photos uploaded, status updates, messages
sent/received, comments made, etc.) along with timestamps and _at least_ URIs
pointing to the objects referenced (photos, people, etc.) if not a copy of my
view of those objects.
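
A minimal sketch of what one record in such an export might look like. The field names, the JSON Lines layout, and the example URIs are assumptions for illustration, not anything Facebook provides.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ActionRecord:
    """One user action in a hypothetical machine-readable export."""
    action: str                             # e.g. "like", "comment", "photo_upload"
    timestamp: str                          # ISO 8601 string, always in UTC
    object_uri: str                         # URI of the object the action refers to
    object_snapshot: Optional[dict] = None  # optional copy of my view of the object


def export_actions(records):
    """Serialize records as JSON Lines, oldest first.

    ISO 8601 timestamps in a single time zone sort correctly as plain strings.
    """
    ordered = sorted(records, key=lambda r: r.timestamp)
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in ordered)


if __name__ == "__main__":
    print(export_actions([
        ActionRecord("comment", "2012-04-18T10:00:00+00:00",
                     "https://example.com/photo/2",
                     {"caption": "a friend's photo"}),
        ActionRecord("like", "2011-01-01T00:00:00+00:00",
                     "https://example.com/status/1"),
    ]))
```

One record per line keeps the dump easy to grep and easy to load into other tools, which a static HTML page is not.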

(I understand why Facebook might claim they shouldn't give me, for example,
the dates that other people de-friended me, since that isn't accessible info.
However, I don't think that asking for copies of statuses I commented on and
can still see is unreasonable.)

What we have now is: A static HTML dump of your profile page, photo page, and
messages that is massively incomplete. Since the switch to timeline, fewer
actions I have taken in the past seem to qualify for inclusion on my page
("moxiemk1 commented on friend's photo" used to feature more often in my
profile than it does now). Since the revamped messages/chat integration, the
messages dump (which always eventually cut off at some point in the past) is
even smaller, and harder to read.

I would indeed like to have copies of the data I've created, and would like to
emphasize that Facebook's "effort" to provide them is complete BS.

~~~
grey-area
When you join facebook you're joining a _free to use_ data-silo, most of which
is not open to the internet - the site is predicated on hiding your data from
the world (and esp. google), and then selling it on to advertisers and other
businesses, sometimes anonymised, sometimes not. Beacon is a perfect example
of the sort of uses you can expect them to put your data to in future. All the
data from like buttons, your social graph etc, is invaluable to them, and
invaluable to advertisers and retailers.

The logical conclusion of that is they have absolutely no incentive to give
you your data back; in fact they view that as _their_ data, earned by offering
you the service of sharing stuff with your friends, without having to set up
your own website. That data is their crown jewels, so I am amazed that anyone
would expect them to give it up, or be surprised at their reluctance - this is
the very essence of Facebook, and they've done very well out of it.

That's not to say that you should never use Facebook, but just that if you do
use a free service like Facebook, you should expect to give up some of your
privacy and control over your data in return. If you don't want to do that, it
would be better to use another service (which doesn't rely on selling your
data as their business model).

~~~
Xuzz
Facebook does not sell user data.

~~~
grey-area
In the crude sense of collecting all your posts and sending them all to an
advertiser, of course not. In the sense of selling your interests, friends,
social position, age etc to advertisers as a datapoint, yes they do; that's
how they target ads and make money. They also tried to harvest purchasing
habits from other sites like Amazon (with Beacon), and give broad access to
developers, some of whom abuse the privilege and have been caught selling data
on. I'd expect that sort of activity to increase post-IPO. They're not alone
in this of course - gmail does the same, without the data lock-in.

~~~
Xuzz
No, they do not sell interests to advertisers. What they do is allow
advertisers to show their ads to people with those specific ages, interests,
and such. It's a subtle and important difference: with this method,
advertisers only know that their ads are being shown to _someone_ who matches
their criteria, not _who_. Advertisers are not able to correlate your identity
with ad targeting.
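
A toy sketch of that model, to make the difference concrete. The user records, the function names, and the shape of the report are all made up for illustration and say nothing about how Facebook's ad system is actually built.

```python
def matches(user, min_age, max_age, interest):
    """True if a user fits the advertiser's targeting criteria."""
    return min_age <= user["age"] <= max_age and interest in user["interests"]


def run_campaign(users, min_age, max_age, interest):
    """Show the ad to matching users and build the advertiser's report."""
    shown_to = [u["name"] for u in users if matches(u, min_age, max_age, interest)]
    # The list of names stays inside the platform.
    # Only an aggregate count goes back to the advertiser.
    return {"impressions": len(shown_to)}


if __name__ == "__main__":
    users = [
        {"name": "alice", "age": 25, "interests": {"cycling", "photography"}},
        {"name": "bob", "age": 41, "interests": {"cycling"}},
        {"name": "carol", "age": 30, "interests": {"cycling"}},
    ]
    print(run_campaign(users, 20, 35, "cycling"))
```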

------
NZ_Matt
Twitter is one of the worst culprits. Currently there is no way to search for
any tweet more than a few days old.

Google's realtime search used to provide the ability to search and retrieve
tweets from a specific date and time in the past, but twitter cancelled that
deal and haven't provided any decent replacement.

A publicly accessible archive has huge potential for research. A friend of
mine used Google's realtime search to pinpoint and keep a record of the first
tweets out of Christchurch following the Christchurch earthquake. Sadly there
is no way to do this now and twitter don't seem to care.

~~~
_delirium
They're going to start selling archives of the past two years of Tweets
through DataSift, though for DataSift's usual hefty prices (for a real-time
feed, they currently charge $0.10/tweet!):
<http://www.bbc.com/news/technology-17178022>

Presumably the target customer is researchers of the finance-sector variety
rather than researchers of the sociology or history-department variety...

~~~
sad_panda
$0.10/tweet? That is insane!

------
jmathai
This is precisely why The OpenPhoto Project exists. We're building a system
where users give applications valet keys to _their_ data.

Data ownership without portability is a moot point. Many services allow you to
download a blob of your content but that's of little to no value for most
users. What we really need (it's 2012 after all) is a more thought-out system
where the user actually owns and controls their data and gives applications
access to them.

This means multiple applications can leverage the same set of data and the
user doesn't have to continue using any of them. Basically, there's no single
point of failure in terms of data interoperability.

Currently, a user's Facebook content can be used by other services, but for
this to keep working the user must keep their Facebook account open.

We're solving this problem by letting the user grant OpenPhoto software access
to their photos. Most likely it's a bunch of photos that reside in _their_
dropbox account but could also be an S3 bucket or box.net account once those
become more consumer friendly.
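
A rough sketch of the valet key idea: the user's own storage hands an application a key that only opens part of the data and can be taken back at any time. The class and method names here are hypothetical and are not the OpenPhoto API.

```python
import secrets


class UserDataStore:
    """Hypothetical user-owned storage that issues scoped, revocable keys."""

    def __init__(self):
        self._files = {}   # path -> bytes
        self._grants = {}  # key -> (path prefix, can_write)

    def put(self, path, data):
        """Owner-side write; needs no key."""
        self._files[path] = data

    def grant(self, prefix, can_write=False):
        """Issue a key that only covers paths starting with prefix."""
        key = secrets.token_hex(16)
        self._grants[key] = (prefix, can_write)
        return key

    def revoke(self, key):
        """Take a key back; the data itself stays where it is."""
        self._grants.pop(key, None)

    def _check(self, key, path):
        if key not in self._grants:
            raise PermissionError("unknown or revoked key")
        prefix, can_write = self._grants[key]
        if not path.startswith(prefix):
            raise PermissionError("path outside granted scope")
        return can_write

    def read(self, key, path):
        self._check(key, path)
        return self._files[path]

    def write(self, key, path, data):
        if not self._check(key, path):
            raise PermissionError("key is read-only")
        self._files[path] = data


if __name__ == "__main__":
    store = UserDataStore()
    store.put("photos/beach.jpg", b"jpeg bytes")
    store.put("documents/tax.pdf", b"pdf bytes")
    photo_app_key = store.grant("photos/")
    print(store.read(photo_app_key, "photos/beach.jpg"))
    store.revoke(photo_app_key)  # stop using the app; the photos stay put
```

Revoking the key ends the application's access without touching the photos, which is the opposite of having to keep an account open so other services can keep reading from it.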

<http://theopenphotoproject.org> or <https://openphoto.me> if you want to sign
up for a hosted account.

~~~
dkarl
_where users give applications valet keys to their data_

I love the metaphor. I want Facebook and Google to have my data, because they
can't give me good service without it. Unfortunately, as a consumer, giving
Facebook or Google access to your data is akin to the Native Americans
welcoming the Europeans to their continent. You have no rights, and the
balance of power is against you. They own your data now, and you may use
whatever part of it they give you permission to use, in the manner they give
you permission to use it.

Your metaphor is beautiful, instantly understandable, and puts them in their
rightful place as service providers, not overlords.

~~~
jmathai
Not sure I completely understood your first paragraph. Are you saying that as
consumers if we give Google access to our data (even if it's limited - aka
valet) then they ultimately "take over" the ownership of it?

There's a lot here. If they do get access to your data they'd most likely
annotate it and provide you an enhanced experience on data that is owned by
them (their annotations on your data). But I think that's okay. We have to be
pragmatic here and say that in the example of photos if you give Google access
to your photos to display in your G+ account and they collect +'s and comments
perhaps it's okay that they "own" that content. As a user you still are better
off than today and maybe it's a progression to where more and more data is
portable and owned by you...but we have to start somewhere :)

~~~
dkarl
If you maintain a copy of the data and give them access to it, then you aren't
going to lose the data. You've lost control of it, since they can take it and
do whatever they want with it, but you'll always have your copy. However, very
few web apps work that way. The personal data that goes through Gmail is
overwhelmingly generated in Gmail, and Google retains the only copy unless you
back up all your data at home. Ditto for Facebook -- the only personal data
that people are likely to retain a copy of is photographs, and again, that
would only be for backup purposes, except for hobbyists who keep raw files or
higher-quality images at home. For most people, Facebook serves as a very
reliable repository for their personal snapshots.

We're completely at their mercy for all the data we create in their apps, and
therefore we're completely dependent on two things:

1) They want to retain customer trust.
2) They are concerned that they might lose that trust if they misbehave.

They're constantly testing the limits of 2) (intentionally or
unintentionally.) At some point 1) could cease to be true. They could become
so necessary, or so powerful, that people become cynical about the possibility
of defying them, and therefore start accepting abusive behavior.

Another scenario is that they could stop caring about customer trust not
because they're too powerful but because customers have already abandoned
them. Imagine that in 2022 Facebook is a has-been. It has been beaten in
social networking, or social networking sounds ridiculously quaint in 2022,
and Facebook is a failing company with no prospects for revival. Kind of like
SCO. And it happens to possess, as its only asset, a little bit of
intellectual "property" (a decade of personal information on a billion
people.) Perhaps, in its death throes, it will go the way of SCO, abandon all
moral restraint, and attempt to squeeze a windfall out of that "property" in
the most cynical way possible. Maybe they'll charge you a ridiculous fee to
retrieve your data in a useable form. If their user agreements make that
impossible, someone in the company might go rogue and sell all your private
data to a shady entrepreneur in a lawless country. They could sell the photos
to Joe Francis and individual account histories to blackmailers and gossips.
We know that won't happen today, because Facebook has the resources and the
motivation to make sure it doesn't happen, but if the company goes to hell, we
won't have those assurances anymore.

------
dkarl
_"There are no programs that I can run on my computer...."_

Major generation gap and sophistication gap here. This is why the issue of
data ownership took so long to get traction. For most people, data is
invisible and using it is magic. The only difference people noticed when they
lost control of their data was that they could suddenly access it everywhere!

The difference between owning your data and not owning it is like the
difference between living in a country with civil liberties and legal
protections versus living in a country without civil liberties and effective
legal protections. The difference is abstract and meaningless until it becomes
a matter of life and death, when you end up in a situation that you only
thought existed in paranoid fantasies.

The invisible change from owning your data to being owned by Facebook and
Google happened to correspond to a very noticeable improvement in convenience
for users. As in more dramatic examples from history, an authoritarian regime
has delivered in a way that its predecessor could not, and as a result, people
have embraced it. It doesn't mean that people accept its ideology. The way we
think about Facebook, the way anybody thinks about Facebook who engages in
these conversations about web industry economics, _and therefore how Facebook
thinks about itself,_ is much darker and more sinister than the aspect of
Facebook that users embrace. Users embrace the effect that Facebook has had on
their lives, which is overwhelmingly positive. The side we see is not
something they embrace; it is something they have barely begun to consider. I
have faith that they will react to it as we do.

We are not more sensitive to freedom than they are; we are merely face to face
with the problem because we like to imagine ourselves in Facebook's shoes :-)
We imagine what it's like to have that power; we are able to think in a
predatory, profit-oriented mindset; we understand the temptation. We don't
have to be evil to see the temptations that Facebook faces. And we understand
that a person can withstand temptation, but a corporation cannot. A
corporation cannot withstand the temptation to make money. Only as long as its
profits exceed the wildest demands, anyway. When profits flag, a corporation
will collapse to the moral lowest common denominator, because the people who
resist immoral profits will be replaced.

Like I said, we don't have to worry that the average Facebook user is okay
with the dire scenario of Facebook fully and amorally exploiting its power.
It's completely imaginary at the moment. Those who worry about it worry alone,
but only because we are uniquely positioned to imagine it. We can make other
people understand. Right now Facebook's major sin has been to appropriate a
dangerous and therefore immoral amount of power. They haven't abused it yet,
not the way they could. I think with the right kind of education, users will
revolt and demand rights and control before Facebook breaks down and really
abuses the terrifying power it has amassed.

Am I too optimistic?

------
thegooley
It's interesting that people seem to have suddenly started talking about data
ownership much more lately.

We have been working on a project called TheMux [1] that aims to create a data
platform (for lack of a better term) which helps you to pull in and archive
your data from various sources and apply some simple normalization. We're
specifically working on ways to keep both content (blog posts, photos, status
updates, etc) and datum (health info, workouts, communication data, etc) in a
form where you a) have full control of the raw data and b) can make select
data available to external apps which do things like presentation and
analysis.

Our goal is to create an open-source platform to address some of these
questions around data ownership, access and portability. Imagine the day when
you decide that you want to move your blog from Tumblebook to Posterpress and
you can do that by simply creating a new account on Posterpress and granting
it access to your existing data. Or you've been using JogKeeper but then you
find this great new service called SprintTracker that you want to try out -
and all you need to do is connect and it will have your years' worth of running
data.

And we think that something like this will also make it much simpler for SaaS
developers to compete not by customer lock-in but rather by providing superior
products and continually working to make the customer happy.

We're taking it slowly right now to build this platform which we will open-
source under a permissive license (as soon as it's a little more mature) by
first building a few consumer-oriented services on top of it. Number 1 on our
list is a blog-type website based on this MuxDB concept and that's what we're
working on at the moment.

If you're interested in giving us your thoughts (or help, or tell us we're
crazy, or whatever), my email is in my profile.

[1] <http://theMux.com>

~~~
leot
I've been advocating (to friends) the need/inevitability of this kind of
system.

One possible marketing phrase that's been thrown around is "data asymmetry",
as in "TheMux aims to rectify the massive data asymmetries that give Facebook
and Google orders of magnitude more power than their users."

Even better would be the development of a completely peer-to-peer version of
Facebook (which Diaspora isn't).

~~~
thegooley
Nice phrase, I'll have to keep that in mind - at least when talking to
hackers.

Re: replacement of Facebook, I don't think it's realistic for someone to
develop an actually decent federated replacement for a true social network.
The difficulty of doing some of the stuff that Facebook users like, but trying
to make it work when all of the nodes are on different networks with different
constraints, seems to be very high.

I'm not so much in the "Facebook shouldn't have my data" camp... more like the
"I want control of all my data TOO" camp.

Our concept for people writing apps that leverage TheMux doesn't require them
to actually load data live from their user's instance of TheMux (because
that's crazy, a single point of failure, and probably very inefficient) but
that there is simply the agreement to interchange data.

To steal your phrase - we want to make the data symmetric.

So that no matter where I create or update my data, the systems involved make
a best-effort to return that data to my MuxDB, and then my MuxDB will make a
best-effort to get that data out to the sites I've authorized and who need the
data to do what I've signed up for.

Obviously, we're still in the early stages, but it's been an exciting and
challenging journey so far.

------
amirmc
Original article:
<http://www.guardian.co.uk/technology/2012/apr/18/tim-berners-lee-google-facebook>

------
rmc
Those of you in the European Union can avail of EU law which says you're
legally entitled to get a copy of all the personal data companies hold on you.
People have made these requests to Facebook. Here's how:
<http://europe-v-facebook.org/EN/Get_your_Data_/get_your_data_.html>

~~~
klez
I can't find the original story, but when a guy asked for his data, some was
missing, and when he asked about it, Facebook answered that it was their own
trade secret and they could not give it to him.

------
aangjie
Honestly, i was happy when i first discovered google's data liberation
project: <http://www.dataliberation.org/>. Was upset with fb a few years ago.
But nowadays i have accepted that no private company is going to have a
financial incentive to give back my data. At best, there can be laws that are
made, but loopholes will be found. Am moving all the stuff i care about to a
vps hosted server. probably a couple more months and even email should be
moved. Yay..:-)

------
lifeformed
It would be cool if all your personal data was stored in some central
repository. You can designate certain chunks of that data as publicly
accessible, and you can grant permission to different sites for reading and
writing data to other sections. That way all sites can contribute to one data
set, kind of like a personal Wikipedia. Everybody wins.

It would have a set of standards that make it easy to request commonly used
pieces of information: name, primary-email, avatar, etc. You could even store
passwords in it, or more likely just sidestep the need for passwords
altogether, since it could essentially be a global login. You would never have
to fill out forms for registrations, credit card info, etc.

For sites that you don't want to give your personal information to, you just
give access to a secondary set of data instead, with values for your internet
nickname and persona.
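
A small sketch of how the read side of that might work. Everything here (the class, the `"*"` stand-in for public access, the profile names) is an assumption for illustration, and writing data back from sites is left out.

```python
class PersonalRepo:
    """Hypothetical central store of personal data, split into profiles."""

    PUBLIC = "*"  # pseudo-site standing for "anyone"

    def __init__(self):
        self._profiles = {}  # profile name -> {field: value}
        self._grants = {}    # site -> (profile name, set of readable fields)

    def set_field(self, profile, field, value):
        self._profiles.setdefault(profile, {})[field] = value

    def allow(self, site, profile, fields):
        """Let a site read the given fields from one profile."""
        self._grants[site] = (profile, set(fields))

    def request(self, site, field):
        """Return the value a site may see, or None if it has no access.

        Sites without their own grant fall back to the public grant, if any.
        """
        grant = self._grants.get(site) or self._grants.get(self.PUBLIC)
        if grant is None:
            return None
        profile, fields = grant
        if field not in fields:
            return None
        return self._profiles.get(profile, {}).get(field)


if __name__ == "__main__":
    repo = PersonalRepo()
    repo.set_field("real", "name", "Alice Example")
    repo.set_field("real", "primary-email", "alice@example.com")
    repo.set_field("persona", "name", "lifeform42")
    repo.allow("shop.example", "real", ["name", "primary-email"])
    repo.allow("forum.example", "persona", ["name"])
    repo.allow(PersonalRepo.PUBLIC, "persona", ["name"])
    print(repo.request("shop.example", "primary-email"))
    print(repo.request("forum.example", "name"))
```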

Of course, there'd be big security challenges, but I think that would be a
neat solution to issues with online identity and personal data.

------
enqk
I really have to wonder, why is there no "peer to peer" social network? It
seems ideally suited for this type of usage, and this would ensure that data
is only in each node rather than on some central server. Does something exist
like that already?

~~~
rmc
_why is there no "peer to peer" social network_

Probably the closest there is is the blog-o-sphere and/or the internet in
general? I can host my photos on a website, and email that to a friend?
Someone can subscribe to my RSS feed of my blog, etc.

------
Splines
> _He also told The Guardian that his habits on his computer indicate his
> health and places he's been._

Technology can definitely make a difference.

 _Last week you sat on the bus next to someone who has been diagnosed with
Avian flu - would you like me to schedule a doctor's appointment for you?_

------
hcarvalhoalves
Being able to download a blob != having your data available.

~~~
Bjartr
What does == having your data available?

~~~
jmathai
See my comment here, <http://news.ycombinator.com/item?id=3859864>

------
loverobots
I do not want my data as much as I want to be able to delete everything from
mm/dd/year to mm/dd/year. And I mean delete and truly forget. Tracking is the
much bigger issue, my tweets out of twitter are worthless. Pictures posted are
low res versions too.

