

Twitter Starts Rolling Out Option To Download Your Twitter Archive In One File - iProject
http://techcrunch.com/2012/12/16/twitter-starts-rolling-out-option-to-download-your-twitter-archive-request-every-tweet-youve-ever-made-in-one-file/

======
jimray
I got mine this weekend. It's a solid implementation, a great 1.0 and
something Twitter and the engineers who worked on it should be proud of.

When you click the "get my tweets" button, it kicks off a process in Twitter
land somewhere. A few minutes later, you get an email with a link to a zip
file.

The file contains a full archive of every one of your public tweets, including
@replies you've made, but not DMs or replies to you or follower/following
information, etc. It's just _your_ public tweets. The tweets themselves are
stored as CSV _and_ JSON. Which is actually pretty cool because it means you
can build your own apps, or archive apps like Thinkup can ingest your tweets,
if you're so inclined.

As the article states, you can explore your archive via a web app that all
works client side in a browser, based on Bootstrap, natch. The app works quite
well. You can search your archive quickly and easily. It points to the
canonical URL of the tweet on twitter.com. There's some pretty basic
visualization of your tweet archive.

The JavaScript that runs the app is all minified, so it's kind of hard to
explore. The application is named "Grailbird" which I thought was kinda clever
( <http://en.wikipedia.org/wiki/Ivory-billed_Woodpecker> ).

On a personal note, it was pretty great (though often cringeworthy) to be able
to roll back through 6 years of tweets. I found the very first tweet that the
woman I'd end up marrying ever replied to. The tweets that led to friendships
and career changes.

It's a solid first step. Nice work, Twitter.

~~~
graue
Thanks for this, I was immediately curious what's included and how it works.

Would you be willing to share an example of the JSON structure for a tweet?

~~~
jimray
Here's my first ever tweet. There's a separate JS file for every month of
tweets. The JSON object itself is named according to the month.

    
    
      Grailbird.data.tweets_2006_12 =
      [{
        "source" : "web",
        "entities" : {
          "user_mentions" : [ ],
          "media" : [ ],
          "hashtags" : [ ],
          "urls" : [ ]
        },
        "geo" : {
        },
        "id_str" : "547413",
        "text" : "counting down the seconds until 5",
        "id" : 547413,
        "created_at" : "Sat Dec 02 00:57:17 +0000 2006",
        "user" : {
          "name" : "Jim Ray",
          "screen_name" : "jimray",
          "protected" : false,
          "id_str" : "35623",
          "profile_image_url_https" : "https://si0.twimg.com/profile_images/1234214846/avatar_normal.jpg",
          "id" : 35623,
          "verified" : false
        }
      } ]
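
For anyone who wants to read these files outside the browser, here is a minimal sketch in Python. It assumes each monthly file is a single `Grailbird.data.<name> = [...]` assignment, as in the sample above, so everything after the first `=` can be treated as plain JSON. The helper names `load_grailbird_month` and `parse_created_at` and the trimmed inline sample are illustrative, not part of the archive.

```python
import json
from datetime import datetime


def load_grailbird_month(js_text):
    """Parse one monthly archive file into a list of tweet dicts.

    Assumption: the file is `Grailbird.data.<name> = <JSON array>`,
    so the text after the first "=" is valid JSON.
    """
    _, _, payload = js_text.partition("=")
    return json.loads(payload)


def parse_created_at(value):
    # Format seen in the sample: "Sat Dec 02 00:57:17 +0000 2006"
    return datetime.strptime(value, "%a %b %d %H:%M:%S %z %Y")


# Trimmed copy of the sample tweet, used here instead of a real file.
sample = '''Grailbird.data.tweets_2006_12 =
[{
  "id_str" : "547413",
  "text" : "counting down the seconds until 5",
  "created_at" : "Sat Dec 02 00:57:17 +0000 2006"
}]'''

tweets = load_grailbird_month(sample)
first = tweets[0]
print(first["id_str"], parse_created_at(first["created_at"]).isoformat(), first["text"])
```

With a real archive you would read each monthly file from disk and pass its contents to `load_grailbird_month`; the file layout inside the zip is not shown here, so that part is left out.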
    

The CSV data is much more basic

    
    
      547413,2006-12-02 00:57:17 +0000,counting down the seconds until 5,
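
A rough sketch of reading that row with Python's `csv` module. The column meanings (ID, timestamp, text, then a trailing empty field) are guesses from the single sample row, and the dictionary keys are made up for illustration.

```python
import csv
import io
from datetime import datetime

# The sample row from above; a real archive would be read from the CSV file.
row_text = "547413,2006-12-02 00:57:17 +0000,counting down the seconds until 5,\n"

rows = []
for tweet_id, timestamp, text, *rest in csv.reader(io.StringIO(row_text)):
    rows.append({
        "id_str": tweet_id,  # keep the ID as a string
        "created_at": datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S %z"),
        "text": text,
        "extra": rest,  # whatever trailing columns the row has
    })

print(rows[0])
```

Using `csv.reader` rather than splitting on commas matters because tweet text can itself contain commas, which a CSV writer would normally quote.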

~~~
napoleond
Does anyone know the purpose of having both the "id" and "id_str" attributes?

~~~
russss
It's probably because modern tweet IDs are larger than a 32-bit integer.
Presumably some JSON parsers aren't too hot on parsing bigints, so they give
you the option of having a string instead.
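
One concrete way this can bite: parsers that store every number as an IEEE 754 double (as JavaScript does) represent integers exactly only up to 2^53, so a larger ID can be silently rounded. A small sketch using Python floats, which are the same double format; the ID value is made up for illustration.

```python
import json

big_id = 2**53 + 1  # made-up ID just past the exact-integer range of a double

as_double = float(big_id)        # what a double-based parser would store
print(int(as_double) == big_id)  # False: the low bit was rounded away

doc = json.dumps({"id": big_id, "id_str": str(big_id)})

# Imitate a parser that reads every integer as a double.
parsed = json.loads(doc, parse_int=float)
print(int(parsed["id"]) == big_id)      # False
print(int(parsed["id_str"]) == big_id)  # True
```

The string copy survives the round trip untouched, which is why reading `id_str` is the safer choice when the parser's number type is unknown.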

~~~
napoleond
Ahh, that makes sense. Thanks! (And thanks to andrewf as well!)

------
kylec
I've been using TweetNest (<http://pongsocket.com/tweetnest/>) for a while to
get an archive of my own tweets and will continue to do so, but it's nice that
they're providing an official way. TweetNest, like all other tweet
archivers, is limited by the 3200 (IIRC) tweet limit in the API. I wonder if
they'll be adding support to import from the backup, which will presumably
have everything.

------
state
When will there be an option to upload the archive to App.net?

~~~
ihuman
Why would you want to do that? Then the timeline of everyone that follows you
would be spammed with updates.

~~~
aes256
Ideally (not that this would ever happen) the third party service (e.g.
App.net) would be able to query Twitter as to the veracity of each user's
archive (verify an md5 hash or whatever kids use these days) then allow the
tweets to be backdated into their system.

~~~
FuzzyDunlop
I imagine that dealing with the t.co URLs, and mentions by/conversations with
Twitter users who aren't on App.net, or use a different name, would be a
problem too.

------
greghinch
> By the end of the year I’ve already promised this, so the engineers – when I
> promised it publicly they’re already mad at me so they can keep being mad at
> me.

Wow I would hate to work at Twitter.

~~~
tibbon
I'd be really curious to know more about Twitter's database setup. There's
clearly something that they've done for scale that makes this a difficult
thing. Otherwise, a button that runs SELECT * FROM tweets WHERE
user_id = 1; wouldn't take long to implement.

~~~
mmahemoff
I can only guess the format for archived tweets is very different from newer
ones, maybe just a compressed text file, which the engineers have to automate
extracting and parsing at scale. Doesn't seem that hard, but the format was
probably a hack job back in the day and there are probably a lot of
considerations, e.g. metadata like RTs and favorites.

Plus, the CEO's comments make it sound like it's been set up as a side
project. The kind of management style books like The Mythical Man-Month and
Peopleware warned us about.

~~~
aes256
Indeed, this would also explain why the search function only allows one to
view tweets going back seven days.

~~~
dylanvee
I interviewed with a Twitter search engineer and he told me that that's
because their in-memory search index is not big enough to hold all tweets ever
posted. On top of that, their efforts to scale up the index are counteracted
by the ever-increasing volume of tweets posted per day.

~~~
diego
And the reason it's not big enough is because they don't want to invest in it.
They found out search is not as monetizable as they thought it would be, and
at the same time they cannot take away the functionality. The compromise is to
keep what they have.

------
tlrobinson
I was pretty impressed with Facebook's data export option as well (click
"Download a copy" on <https://www.facebook.com/settings>).

------
davidu
I want my DM archive.

------
nibz
How far does it go back?

~~~
mscarborough
First sentence from the article:

"It looks like Twitter has started rolling out the option to let users
download all their tweets"

