
Ask HN: Has Facebook 'Download your information' become very slow? - nischalsamji
I requested a copy of my Facebook data to be downloaded on December 2nd. The status page still says my download is pending. Did anyone else face this issue?

I downloaded my information previously in a couple of instances and it was very fast then.
======
Waterluvian
I'd love to understand what is going on mechanically when these requests are
made. Like what's actually being done that takes my google data download
request days to fulfill?

My uninformed guess is that it's a service which orchestrates internal API
calls to all the other services and builds a tarball. And the reason it takes
forever is probably mostly just low priority queuing of all these various
requests.

~~~
londons_explore
I can provide insight here...

A typical archive might touch 50+ services. Each of those services has an API
to export data which is called. If any service is down, the whole thing is
delayed.
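
A minimal sketch of that "one service down delays everything" behaviour, assuming a simple retry-until-complete loop. The names `run_export`, `exporters`, and `export_photos` are made up for illustration and are not any real internal API:

```python
from typing import Callable, Dict


def run_export(exporters: Dict[str, Callable[[], bytes]],
               done: Dict[str, bytes]) -> str:
    """Try every service not yet exported; report 'ready' only when all succeeded."""
    for name, export in exporters.items():
        if name in done:
            continue  # already exported on an earlier pass
        try:
            done[name] = export()
        except ConnectionError:
            pass  # service is down; it will be retried on the next pass
    return "ready" if len(done) == len(exporters) else "pending"


# Hypothetical services: "photos" fails on its first call, then recovers.
attempts = {"photos": 0}


def export_photos() -> bytes:
    attempts["photos"] += 1
    if attempts["photos"] < 2:
        raise ConnectionError("photos service unavailable")
    return b"photos-data"


exporters = {"mail": lambda: b"mail-data", "photos": export_photos}
done: Dict[str, bytes] = {}
first = run_export(exporters, done)   # photos is down, archive stays pending
second = run_export(exporters, done)  # photos recovered, archive is ready
```

With 50+ services instead of two, the archive stays pending until the slowest or flakiest one finally answers.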

Internally, each service has to go retrieve all the data. _All the data_.
That's typically a very expensive operation - A datastore for a document
editor would perhaps be designed for an average user to store 100k documents,
but perhaps only access 10 per day. There's a good chance the data is sharded
per user, which means the work of retrieving _all the data_ is going to fall
on just one machine/storage server/application server/rendering
server/whatever. That server still has other users to service too, so we can't
hammer it flat out with your request.

Many types of data, when old, get archived on hard disk, since the chance of a
user accessing an email attachment from 2009 is very very low. When creating a
mail archive however, _all_ those old mail attachments need accessing, and
remember there's a good chance they're sharded by user, and therefore all on a
small set of disks.

Remember most of the applications were designed before data exporting was a
thing, so typically there is no API to read all data, and instead it must be
implemented as a 'list all objects then retrieve objects one/a few at a time'.
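
Roughly, that pattern looks like the sketch below. It assumes the service only offers a "list ids" call and a "fetch a few ids" call; `export_all`, `list_ids`, `fetch_batch`, and the in-memory `STORE` are hypothetical stand-ins, not a real datastore API:

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def export_all(
    list_ids: Callable[[], Iterable[str]],
    fetch_batch: Callable[[List[str]], Dict[str, bytes]],
    batch_size: int = 3,
) -> Iterator[Tuple[str, bytes]]:
    """Yield (object_id, data) for every object, fetching a few ids at a time."""
    batch: List[str] = []
    for object_id in list_ids():
        batch.append(object_id)
        if len(batch) == batch_size:
            yield from fetch_batch(batch).items()
            batch = []
    if batch:  # final partial batch
        yield from fetch_batch(batch).items()


# Hypothetical in-memory stand-in for one user's shard of a datastore.
STORE = {f"doc-{i}": f"contents {i}".encode() for i in range(7)}
calls: List[List[str]] = []


def list_ids() -> List[str]:
    return sorted(STORE)


def fetch_batch(ids: List[str]) -> Dict[str, bytes]:
    calls.append(list(ids))  # record each round trip to the store
    return {object_id: STORE[object_id] for object_id in ids}


exported = dict(export_all(list_ids, fetch_batch))
```

Every batch is a separate round trip to the same shard, which is why the export cost grows with the number of objects rather than with how much the user normally reads.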

If a disk seek on a 7200 rpm disk takes 10 milliseconds, and you have 1
million mail attachments to retrieve in random order, that's about 3 hours,
assuming no other load on that disk cluster.
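
The arithmetic can be checked with a few lines. This assumes exactly one seek per attachment and nothing else competing for the disk; the function name is made up for illustration:

```python
def random_read_hours(num_objects: int, seek_ms: float) -> float:
    """Total time in hours for num_objects random reads at one seek each."""
    total_seconds = num_objects * seek_ms / 1000.0
    return total_seconds / 3600.0


hours = random_read_hours(1_000_000, 10.0)
print(f"{hours:.2f} hours")  # 10,000 seconds, i.e. 2.78 hours
```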

~~~
Waterluvian
Thanks for sharing. A lot of concepts for me to parse and Google. I'm having a
fun evening of it.

------
lazzlazzlazz
I've been doing this regularly for years, and it's always taken a few days -
so I wouldn't describe it as "very fast", but I've never seen it take more
than a week.

~~~
nischalsamji
Yea... The last time I did it, it took me 4 hours. Now it's been pending for
12 days as of today.

~~~
elliekelly
I submitted a Facebook request on Nov. 18 and had the file on Nov. 19. I
submitted a WhatsApp request 8 days ago and still haven't received anything.
Interestingly, their service promises it within two (or maybe three?) business
days.

