
Facebook's Download-Your-Data Tool Is Incomplete - Garbage
https://privacyinternational.org/long-read/3372/no-facebooks-not-telling-you-everything
======
mudkip
This is a topic that I'm intimately familiar with, thanks to a bizarre set of
circumstances (and a ton of reverse engineering). Story above, technical
details below:

Part 1:

A couple years ago, I noticed that the number of photos I was tagged in kept
going up and down, as a couple of people I knew would disable their accounts
occasionally, and re-enable them a couple weeks later.

I manually saved the images from them, but wanted a way to automatically
scrape any images I was tagged in, so I wouldn't need to do this by hand.

I got myself a Facebook Graph API key and created a sample app with full
account permissions, only to discover that Facebook won't let you export
photos you're tagged in (that you didn't take). The numbers the API reports
are wrong, and there's no indication that it's being purposely redacted.

As a result, I wrote a tool that crawls a profile given a set of authenticated
cookies, and essentially clicks the download link automatically on every
photo. This worked decently well for a couple years, and continues to work to
this day.
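
A minimal sketch of the "find the download link on every photo page" idea, not the tool described above. The markup it looks for (an `<a>` tag whose text is "Download"), the helper names, and the cookie names are all assumptions for illustration; the real pages will differ.

```python
from html.parser import HTMLParser
from urllib.request import Request


class DownloadLinkParser(HTMLParser):
    """Collect href values of <a> tags whose visible text is 'Download'."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if "".join(self._text).strip().lower() == "download":
                self.links.append(self._href)
            self._href = None


def find_download_links(page_html):
    """Return every 'Download' link target found in one page of HTML."""
    parser = DownloadLinkParser()
    parser.feed(page_html)
    parser.close()
    return parser.links


def authenticated_request(url, cookies):
    """Build (but do not send) a request carrying the session cookies."""
    header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return Request(url, headers={"Cookie": header})
```

The request object would be passed to `urllib.request.urlopen` for each photo page, and the returned HTML fed to `find_download_links`.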

Part 2:

I had some spare time on my hands in December 2019, and wanted to write a tool
to browse chat logs from across a variety of services (Facebook Chat,
Hangouts, SMS), such that you'd be able to click a name and see a
chronological discussion, regardless of what service it was on. I downloaded
the Facebook data dump, figuring that was the easiest way to get access to my
Messenger data.
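
A rough sketch of the unified chat view, assuming each service's export has already been parsed into time-sorted records and that contact names have been normalized across services (which is the hard part in practice). The `Message` fields are hypothetical, not any dump's real layout.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    timestamp: float   # seconds since the epoch
    service: str       # e.g. "facebook", "hangouts", "sms"
    contact: str       # normalized name of the other party
    text: str


def merge_timelines(*sources):
    """Merge per-service message lists, each already sorted by time."""
    return list(heapq.merge(*sources, key=lambda m: m.timestamp))


def by_contact(messages):
    """Group a merged timeline into one chronological thread per contact."""
    threads = defaultdict(list)
    for message in messages:
        threads[message.contact].append(message)
    return dict(threads)
```

Clicking a name then amounts to looking up `by_contact(...)[name]`, which interleaves messages from every service in order.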

The Messenger dump revealed a few things that surprised me:

* The character encoding is messed up, and requires decoding as Latin1, then re-encoding as UTF-8

* Some messages are straight up missing, despite being in the UI. The dump is supposed to include attachments (images are included), but is missing audio messages / voice snippets, presumably among others.

* If a user has deleted their Facebook account, the username will appear solely as 'Facebook User', so now you need to figure out who you were actually talking to. Some conversations were very obvious, but others involved wasting a ton of time on dumb techniques (like finding Adium logs of the same chat from an old computer).
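
For the encoding point in the list above, one common reading is that the dump contains UTF-8 bytes that were each stored as a separate Latin-1 character. Under that assumption, a repair in Python goes through Latin-1 bytes and back out as UTF-8. The function name is made up for this sketch.

```python
def fix_mojibake(text: str) -> str:
    """Undo UTF-8 text that was mis-decoded as Latin-1."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not Latin-1 representable, or not valid UTF-8 afterwards:
        # assume the string was already fine and leave it alone.
        return text
```

Under this assumption, `fix_mojibake("caf\u00c3\u00a9")` gives back `"caf\u00e9"`, and strings that are already correct pass through unchanged.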

To identify certain conversations, I started scrolling back through certain
Facebook posts (which I wrote), to figure out who had been at certain events
with me (to narrow things down). I read a bunch of comment threads that didn't
appear to make much sense to me, until I realized that anyone who deletes
their account also has their comments removed, so basically all old comment
threads are somewhat nonsensical if anyone in the conversation has since
deleted their account. For comparison, deleting a reddit account changes the
ownership of a comment/post to [deleted], which seems much more appropriate.

Presumably wall posts (including happy birthday messages) from people who have
since deleted their accounts are also removed, which is exceedingly shitty -
if someone sends you a greeting card and then dies several years later, it's
not like the post office comes to your house to take your cards back in the
middle of the night.

Part 3:

Because of this, I figured that the only way to mitigate future data loss on
Facebook is to consistently archive things. Since the 'download your data'
tool is basically useless, I started work on a tool that scrapes the site and
"decompiles" pages into raw directed graph DB rows, which can be re-rendered
into a new version of the site. It features a reasonably complete
implementation of Facebook's TAO
([https://www.facebook.com/notes/facebook-engineering/tao-the-...](https://www.facebook.com/notes/facebook-engineering/tao-the-power-of-the-graph/10151525983993920))
on top of
PostgreSQL, and works decently well - notably, it also maintains things like
proper links to profiles and stores all assets offline.
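
To give a feel for what a TAO-style store on a relational database can look like, here is a small sketch loosely modeled on TAO's objects-and-associations data model. It is not the schema of the tool described above: the table layout and function names are assumptions, and it uses the standard-library `sqlite3` module in place of PostgreSQL so it runs without a server.

```python
import json
import sqlite3


def open_store():
    """Create an in-memory store with one table of nodes and one of edges."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE objects (
            id    INTEGER PRIMARY KEY,
            otype TEXT NOT NULL,
            data  TEXT NOT NULL
        );
        CREATE TABLE assocs (
            id1   INTEGER NOT NULL,
            atype TEXT    NOT NULL,
            id2   INTEGER NOT NULL,
            time  INTEGER NOT NULL,
            data  TEXT    NOT NULL,
            PRIMARY KEY (id1, atype, id2)
        );
    """)
    return db


def obj_add(db, oid, otype, **data):
    """Store a typed node with a JSON payload."""
    db.execute("INSERT INTO objects VALUES (?, ?, ?)",
               (oid, otype, json.dumps(data)))


def assoc_add(db, id1, atype, id2, time, **data):
    """Store (or overwrite) a typed, timestamped directed edge."""
    db.execute("INSERT OR REPLACE INTO assocs VALUES (?, ?, ?, ?, ?)",
               (id1, atype, id2, time, json.dumps(data)))


def assoc_range(db, id1, atype, limit=10):
    """Newest-first ids reachable from id1 over edges of type atype."""
    rows = db.execute(
        "SELECT id2 FROM assocs WHERE id1 = ? AND atype = ? "
        "ORDER BY time DESC LIMIT ?", (id1, atype, limit))
    return [row[0] for row in rows]


def assoc_count(db, id1, atype):
    """Number of edges of type atype leaving id1."""
    row = db.execute(
        "SELECT COUNT(*) FROM assocs WHERE id1 = ? AND atype = ?",
        (id1, atype)).fetchone()
    return row[0]
```

A post with two comments becomes three rows in `objects` and two `"comment"` edges in `assocs`; re-rendering a page is then mostly a matter of walking `assoc_range` from the post.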

Writing a bug-compatible "decompiler"/"recompiler" taught me several things
about how the site works (or rather, doesn't). Here's a small list of errata
I've discovered along the way:

* Objects can have multiple FBIDs

* FBIDs can contain comments/reactions

* Since there may exist multiple FBIDs for a given object, it's quite common for multiple comment threads to exist for a given item, such that commenters on one don't see the responses on the other (and vice versa). Several of my friends have confirmed finding disjointed discussions on their posts after discovering this bug.

* Facebook has several types of deprecated reactions that they store in the DB, which cannot traditionally be viewed from the site anymore. Sucks to be you if you reacted to something that way.

* Certain objects can get lost in their UI, with no easy way to find them. Uploading a photo in a post will put it in your Timeline Photos album, but uploading a photo as a comment to someone else's post will basically make it impossible to find again.

* The number of reactions/comments on a given post is often wrong - this isn't the traditional bug due to eventual consistency, but rather is due to not adjusting the counts for items when a person deletes their account. To a certain degree, this will show you how many people that interacted on something have departed the site.
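
Two of the errata above lend themselves to a short sketch: treating several FBIDs as aliases of one object so that split comment threads can be stitched back together, and comparing a reported count against what is actually visible. This assumes you already know which FBIDs are aliases; the class and function names are hypothetical.

```python
from collections import defaultdict


class AliasSet:
    """Union-find over FBIDs known to refer to the same object."""

    def __init__(self):
        self._parent = {}

    def find(self, fbid):
        """Return the canonical FBID for fbid's alias group."""
        self._parent.setdefault(fbid, fbid)
        while self._parent[fbid] != fbid:
            # Path halving keeps later lookups short.
            self._parent[fbid] = self._parent[self._parent[fbid]]
            fbid = self._parent[fbid]
        return fbid

    def union(self, a, b):
        """Record that a and b name the same object (smallest id wins)."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self._parent[max(ra, rb)] = min(ra, rb)


def merge_threads(comments, aliases):
    """comments: iterable of (fbid, timestamp, author, text) tuples."""
    merged = defaultdict(list)
    for fbid, timestamp, author, text in comments:
        merged[aliases.find(fbid)].append((timestamp, author, text))
    return {fbid: sorted(thread) for fbid, thread in merged.items()}


def departed_estimate(reported_count, visible_count):
    """Interactions counted by the site but no longer visible."""
    return max(reported_count - visible_count, 0)
```

After `aliases.union(101, 202)`, comments attached to either ID end up in a single time-ordered thread under 101.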

~~~
wormseed
Please stop this. This is a huge violation of your friends' privacy. Just
because a friend has shared something with you on facebook doesn't mean they
expect it to be in someone's personal cambridge analytica forever.

~~~
mudkip
I'd be curious to hear from other people with similar viewpoints, but I
personally disagree. Cambridge Analytica was bad because it was a third party
receiving data from others, and those people had zero reason to think that
their data would ever be shared with a third party. In this case, all of the
data I've obtained is either public, or was explicitly shared with me by those
people in the first place. If you're arguing that I shouldn't be able to hold
data that someone else wants removed, then I'd ask at what point we should be
deleting your memories of events that someone else wants repressed.

To be clear, if I want to create the equivalent of a data-hoarder bunker, I
don't think there's anything wrong with that. I do, however, agree that
sharing things with people that couldn't see it themselves (expanding the
original audience) isn't something anyone should ever do. I'd also like to
point out that once someone removes something from Facebook (or any other
service, for that matter), the authenticity of copies of that information is
now debatable - I could scrape a bunch of stuff and show people, but you have
no way of proving whether or not that information is authentic. For all you
know, my script makes any posts%5 super racist.

~~~
wormseed
People post on facebook assuming the privacy model of facebook, where _they_
have _revocable_ control of who sees what they share. What do you think your
friends would say if you told them you're keeping a personal copy of their
photos even if they try to delete them?

~~~
mudkip
I've actually shown this project to several people with varying degrees of
technical knowledge. The technical people have generally already considered
the privacy implications of what they've shared on the internet, fully
expecting most things to become public thanks to crappy code/policies. They're
usually more interested in what a rewrite of TAO looks like, or how you
decompile rendered frontend code back into accessible JSON.

As for the less technical people, they tend to be significantly more annoyed
at Facebook when I tell them that their old comment threads / messages are
likely incoherent junk.

------
basch
This is a semantics argument. Facebook considers "your data" things you
uploaded to Facebook intentionally.

They do not consider "things people learned about you by watching you" or
"things people said about you" your data, they consider it their (the watchers
or speaker) data.

It's not all that different from how photography laws work in different
countries. If someone takes a picture of you, is it their picture, or your
picture?

EDIT: I'm in agreement that this doesn't match up with GDPR. Outside of Europe,
facebook can treat "Your Data" differently, and the button can function
differently in different places.

~~~
mcny
This is disingenuous.

If Alice wrote on Bob's wall, it should be a part of both Alice's and Bob's
takeout. Similarly, if I am tagged in a photo, it should be a part of my
takeout. Imagine if my email provider (Google Gmail) said you can only takeout
the emails in your sent folder.

In fact, I'd argue if Alice has made their contact information (email, phone
number, physical address) visible to Bob, it should be a part of Bob's
takeout. Including Alice's location history (provided Alice shared it with
Bob) would probably be pushing it a little, but only because that becomes
difficult to argue for people with whom a great many others share data. But
anything Bob is explicitly and manually tagged in, and that is visible to Bob
on Facebook, should definitely be a part of Bob's download.

Facebook can't have it both ways: you can't make it easy (by default) to share
and still use the data as a moat.

~~~
marcus_holmes
As basch said, that's not how copyright works in photography. If you're tagged
in a photo as a subject of that photo, it is _not_ your photo, and if you take
a copy of it then that's a breach of copyright. It belongs to the
photographer.

Our expectations don't match with actual law here.

~~~
p49k
We’re not talking about copyright, though - we’re talking about privacy rights,
and - even if we discard compelling fair use arguments - everyone who uses
Facebook gives them a license to distribute the photos they upload to the site
to others for various reasons. Copyright law doesn’t really come into play,
and of course much data involved here isn’t even copyrightable.

~~~
basch
It's an analogy. I'm not bringing up copyright law as something that impacts
the situation; it's an illustration of a similar concept.

Some countries give copyright priority when it comes to photography, others
let privacy law take precedence.

------
freeAgent
They have also made a very insidious change to their /ads/preferences page
within the past month or two:

1\. Entries on the list of companies that have uploaded your contact
information to Facebook cannot be blocked by simply clicking an 'x' on their
name. You must click for "View Controls" and then click on two additional
buttons to not allow them to target (or exclude) you.

2\. The list is in RANDOM order (as far as I, a user, can tell)

3\. There is no ability to distinguish between blocked and unblocked entries
from the main page. Users must click into each entry to check the status.

4\. New entries are not highlighted or indicated in any way. Because of this,
and combined with facts 2 & 3, users must now re-check each entry manually in
order to be sure they are catching any new entries that arise.

I get new entries on a near-weekly basis despite now (but not in the past)
using a dedicated Facebook-only email address and removing my phone number and
all other identifying information from the site. Facebook is maintaining a
shadow profile on me which includes data I scrubbed from my account and
profile over a year ago, and they are still matching advertisers to my profile
based on the scrubbed data. That scrubbed data DOES NOT appear in the
"download your information" tool, either.

~~~
radicaldreamer
Lots of dark patterns in these interfaces. The Google one has an incredible
amount of white space and drop downs to obscure content.

~~~
judge2020
Are you talking about adssettings.google.com? That site is pretty good, it
shows what it thinks you're interested in and turning them off is click ->
turn off; plus the topics you've already turned off aren't part of the list of
things it's currently targeting.

------
notRobot
SayIt's comment is very relevant so I'm sharing it here:

> _These companies are like online governments. They have a great deal of
> influence over very many persons, and can affect their lives substantially.
> But they are not democracies. They are totalitarian, with one person at the
> head of the company able to make many powerful decisions unilaterally, if he
> wishes. [...] Giving you some control over your data does not change the
> fact that these are essentially digital dictatorships._

From:
[https://www.schneier.com/blog/archives/2020/03/facebooks_dow...](https://www.schneier.com/blog/archives/2020/03/facebooks_downl.html#c6806851)

~~~
aldoushuxley001
granted, but could you imagine not having one person at the head of the
company? What a sh*tshow that'd be.

------
NikolaeVarius
Original link
[https://privacyinternational.org/long-read/3372/no-facebooks...](https://privacyinternational.org/long-read/3372/no-facebooks-not-telling-you-everything)

~~~
dang
Changed to that from
[https://www.schneier.com/blog/archives/2020/03/facebooks_dow...](https://www.schneier.com/blog/archives/2020/03/facebooks_downl.html).
Thanks!

------
0j
Does anyone know of tools that make it easier to browse and analyze the data
downloaded from Facebook or Google (offline)? For example visualizing the
location timeline, statistical analysis etc. This is something I have been
looking for for a while. Both as a tool to analyze my behavior, and to better
understand and communicate to others how much these companies know about us.

~~~
simonw
I've been building tools for this at
[https://github.com/dogsheep](https://github.com/dogsheep)

The unifying idea is to convert data dumps from these kinds of companies into
SQLite databases, then query and visualize them using
[https://github.com/simonw/datasette](https://github.com/simonw/datasette) and
[https://github.com/simonw/datasette-vega](https://github.com/simonw/datasette-vega)
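
As a rough illustration of the "data dump into SQLite" idea (not code from the dogsheep tools), here is a sketch using only the standard library. The record layout (`sender`, `sent_at`, `body`) is a made-up example; real dumps need their own mapping.

```python
import json
import sqlite3


def load_messages(db, dump_json):
    """Insert a JSON array of message records into a messages table."""
    db.execute("""CREATE TABLE IF NOT EXISTS messages (
        sender TEXT, sent_at INTEGER, body TEXT)""")
    records = json.loads(dump_json)
    # Named placeholders let each JSON object be passed through as-is.
    db.executemany(
        "INSERT INTO messages VALUES (:sender, :sent_at, :body)", records)
    db.commit()
    return len(records)
```

Once the rows are in SQLite, questions like "who messaged me most" become a single `GROUP BY` query, and the resulting file can be opened in any SQLite browser.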

------
jl2718
After downloading my data, I’ve begun to worry not only that they know too
much, but that what they ‘know’ about me is quite wrong.

~~~
notRobot
> but that what they ‘know’ about me is quite wrong.

Isn't that a good thing?

~~~
jschwartzi
When it's innocuous, such as Google thinking I'm interested in _women's_
fashion and showing me ads for women's clothing sites since I've indicated an
interest in fashionable clothing, it's not a bad thing (this is actually what
happens!). But imagine if Google's profile of me indicated an interest in
something dark like fascism or racism, and they sold that data on which later
resulted in me losing job opportunities or credit. That would be a really bad
thing, because their highly inaccurate algorithm had a negative impact on my
life.

The slippery slope is slippery because there are a lot of downward steps, not
because there's one big step. Companies already buy your credit score and
financials before deciding whether to hire you. Imagine if they bought your
entire Google profile and used that too? It's about as accurate as a
personality test and there's no way to dispute it.

It's worse than Social Credit Scores because Google doesn't even admit they
have these profiles of you, yet they sell the data as accurate.

~~~
joshuamorton
Do you believe that Google or Facebook sells personalized profiles of the form
"jschwartzi is interested in tacos"?

Because that isn't how these companies work.

~~~
notRobot
They might not right now, but they certainly could start doing so literally
any day.

Also they can be compelled to legally hand over such data by governments.

Or it could be leaked due to a security vulnerability or rogue employee.

------
fsflover
And they do not comply with GDPR:
[https://ruben.verborgh.org/facebook/#history](https://ruben.verborgh.org/facebook/#history)

~~~
Aachen
That is both hilariously written and quite frustrating to read. Thank you for
your perseverance and writing it up!

Is there an RSS feed or mailing list where updates are pushed to?

