Google Takeout: Download your data from Picasa, Contacts, Buzz and Profiles (google.com)
186 points by ch0wn on June 28, 2011 | 47 comments



There are some serious problems with their Picasa export (possibly others, I only tried Picasa). First, a good number of my photos simply didn't download - I only got 1328 of my 2283 photos. Second, I uploaded all my photos at their native sizes to Picasa, but the photos I got from Takeout were resized to around 1200x1600. However, my originals are kept by Google and I can download them individually through the Picasa interface, so they must be resizing them specifically for the download.

The first strikes me as a bug that will eventually be fixed, but the second seems to be an unfortunate design choice - Google can save money on bandwidth if they downsize your photos, and if you know that what you get when you export is downsized you're less likely to do it. I feel sorry, though, for people whose only copy of some photos is in Picasa.


Is the amount of bandwidth saved by Google in doing this really going to be all that cost efficient? I would chalk this up to a bug or poor design choice as well rather than something with the bottom-line in mind.


There are things I would _really_ like from google - where I fit in to their adsense categories, for example. (I'm not sure whether this data is anonymised or not)

I just downloaded this data, all the data they would give me. I got my email address and my name back. I also got a handful of contacts. I consider myself a reasonably heavy google[-owned projects] user.

This, for the most part, is useless information.

What I would like is exactly what they would give to the government (obviously, after confirming fifty times that it's actually me they're giving it to).

Regardless of all that, I think this should boost Google's image. Very smart to focus on social privacy, when that's Facebook's one downfall.



There are things I would _really_ like from google - where I fit in to their adsense categories, for example.

Absolutely. I'd love to be able to get all of my Analytics data, for example. Retrieving some subset of the data report by report is nowhere near the same thing.


Why not give me a nice big mbox file with all my emails in it as well? That's honestly the main thing that Google has that I care about preserving.


This is possible via the Google Apps Email Audit API[1], which is accessible only to paid Google Apps accounts. An export request submitted through the API usually takes several hours to complete and results in a series of large mbox files (1 GB chunks, I think) containing the entire mailbox, including trash (if requested). Some meta information, such as labels/folders, is not present, but the results can easily be concatenated into a single mbox file containing the entire contents of a Google Apps email account.

[1] http://code.google.com/googleapps/domain/audit/docs/1.0/audi...
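The concatenation step could look roughly like this. It is a minimal sketch using Python's standard `mailbox` module, not anything the Audit API provides; the function name `merge_mboxes` and the chunk file names are made up for illustration, and it assumes nothing else is writing to the output file at the same time.

```python
import mailbox


def merge_mboxes(chunk_paths, out_path):
    """Append every message from each chunk into one mbox; return the count."""
    merged = mailbox.mbox(str(out_path))  # created if it does not exist yet
    count = 0
    try:
        for path in chunk_paths:
            # create=False: fail loudly on a mistyped chunk path
            chunk = mailbox.mbox(str(path), create=False)
            for message in chunk:
                merged.add(message)
                count += 1
            chunk.close()
    finally:
        merged.close()  # flushes pending writes to disk
    return count
```

Calling `merge_mboxes(["export-1.mbox", "export-2.mbox"], "all.mbox")` would then give one file with the messages in chunk order.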


Thanks for that. Weird that it's a buried feature.

Seems like "easy self-directed backups" would be a nice selling point.


Shouldn't that be possible via IMAP?


Yes, it is possible


It should be easier than that.


POP? It's leaner.

Seriously, it's an absolutely perfect match for exporting everything, and it's easy to set up, and you get it in the format of your choice. Infinitely easier than having them support only formats they have time to write exporters for. It's also easily resumable, unlike the 8GB zip file I'd otherwise have to download.


You do realize that POP only supports your Inbox, right? The protocol has no concept of 'folders' or 'tags.'


I think if you switch it to "all mail", it downloads everything. Though you might be right.


POP might be a start, though for a proper backup tags need to be included somehow, maybe in a custom header field.


Afaik, not really. Mails in GMail are not represented in a hierarchical tree structure and thus don't directly match the IMAP protocol.


IMAP can be used to download the raw messages. Hierarchical structure (aka threading) comes from parsing the Message-ID, References and In-Reply-To headers contained in the messages.
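A rough sketch of that parsing, assuming the messages have already been parsed with Python's `email` package. The helper names `parent_id` and `build_threads` are illustrative, and the rule used here (take In-Reply-To, else the last entry of References) is a simplification of what real threading algorithms do.

```python
def parent_id(msg):
    """Return the Message-ID this message replies to, or None for a thread root."""
    in_reply_to = (msg.get("In-Reply-To") or "").split()
    if in_reply_to:
        return in_reply_to[0]
    references = (msg.get("References") or "").split()
    # By convention the last entry in References is the direct parent.
    return references[-1] if references else None


def build_threads(messages):
    """Map each parent Message-ID to the Message-IDs of its direct replies.

    Thread roots are collected under the key None.
    """
    children = {}
    for msg in messages:
        msg_id = (msg.get("Message-ID") or "").strip()
        children.setdefault(parent_id(msg), []).append(msg_id)
    return children
```

Walking the resulting dict from the `None` key downwards gives the reply tree for each conversation.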


I mean directories. GMail's IMAP interface maps each tag to a directory, but that is not exactly the same. Esp. when your extracting algorithm doesn't know about this, you end up with a lot of duplicates.


So only download the All Mail folder into a single mbox.
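Something like the following would do it, as a sketch with the standard `imaplib` and `mailbox` modules. The function name `dump_folder` is made up, and the folder name is an assumption: on English-language accounts it is usually `"[Gmail]/All Mail"` (quoted because of the space), but it may differ on yours.

```python
import imaplib
import mailbox


def dump_folder(conn: imaplib.IMAP4, folder: str, mbox_path: str) -> int:
    """Copy every message in `folder` into an mbox file; return the count."""
    status, _ = conn.select(folder, readonly=True)  # read-only: leave flags alone
    if status != "OK":
        raise RuntimeError(f"cannot select {folder}")
    _, data = conn.search(None, "ALL")
    box = mailbox.mbox(mbox_path)
    count = 0
    try:
        for num in data[0].split():
            _, parts = conn.fetch(num, "(RFC822)")
            box.add(parts[0][1])  # second item of the first part is the raw message
            count += 1
    finally:
        box.close()
    return count
```

You would pass it a logged-in `imaplib.IMAP4_SSL("imap.gmail.com")` connection. It fetches one message per round trip, so it is slow on a big mailbox, and it does not resume after a dropped connection.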


Then you miss the tagging information.


I'm confused. How does a single large mbox file of all emails in a GMail account map directly to GMail's structure? That's what the OP was asking for.


Unfortunately it is still pretty hard to get any of your Google Talk chat logs out. Most of the solutions are from years ago and no longer work. I ended up being able to download an sqlite database of my chat logs using Google Gears.


You can retrieve Google Talk chat logs with this API call: http://code.google.com/googleapps/appsscript/class_gmailapp....


Do you find them useful for reference or just feel safe having them?


I log everything because quite often it is useful to refer back to them.


It might defeat the benefits of using the web interface for you, but you could connect to Talk using Pidgin, or something similar.


I already do. I prefer Pidgin's UI anyways. (Also, the chats are logged by gmail anyways, so if I want to use Google's search for the logs over grep, I still have that option)


Releasing this at the same time as Google+ is a very smart PR move. :)


I would guess it's more everyone making launch deadlines before the end of the quarter!


I gave this a spin as well and was fairly happy with it, at least as far as it goes. Takeout has a fairly limited scope, but of course the DLF (greatest project name EVAR) has done more good than just this.


Why is this separate from the Data Liberation Front?

Another 20% project destined to be a quick PR hit and never actually become usable enough to fulfill its promise of giving the user control over his own data?


It doesn't seem to work for me. I tried to create an archive for all my data and it just shows 'Files: 0, Size: 0B' on everything.

Edit: Tried a second time and it works now.

But as it was said, the most important thing I miss is a way to extract the mails (in the way they are stored in GMail with all tags and other meta information).


What meta information does GMail have that you can't get from downloading your emails over POP or IMAP (other than tags)?


Tags are the most important thing for me. If this is not possible, any extracting is really not worth it for me.

To extract tags right now, you could search in every other directory (and assume that these are all available tags) for the same mail and get by that error-prone algorithm all tags of a message.

Maybe also missing is some meta information about the spam level and/or importance level. And meta information why Google thinks that some message is important to me (on the web interface, it sometimes shows a reason like 'because of recent conversation with this person' or so).

And maybe more. I would just like to get everything.


  > get by that error-prone algorithm all tags of a message.
Shouldn't be error-prone. All your emails should have a Message-Id: header. While not impossible, I've never heard of issues due to Message-Id collisions...

You could write something to sync your emails to a database, then when it encounters an email in a 'folder' it can just add it to the email as a tag. I don't know of anything that currently does this, but it's not like the technology (and information) isn't there.
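To give an idea of how small the core of that is, here is a sketch that skips the database and just builds the mapping in memory. It assumes you have already fetched each IMAP 'folder' into a list of parsed messages; `tags_by_message_id` is a made-up name.

```python
from collections import defaultdict


def tags_by_message_id(folders):
    """Invert {folder_name: [message, ...]} into {message_id: {folder_name, ...}}."""
    tags = defaultdict(set)
    for folder_name, messages in folders.items():
        for msg in messages:
            msg_id = (msg.get("Message-ID") or "").strip()
            if msg_id:  # skip messages without an ID rather than guess
                tags[msg_id].add(folder_name)
    return dict(tags)
```

A message that shows up in both "work" and "receipts" ends up once, with both names in its tag set, so the duplicates collapse on their own.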

  > And meta information why Google thinks that some
  > message is important to me (on the web interface,
  > it sometimes shows a reason like 'because of recent
  > conversation with this person' or so).
Of what use is this outside of Google, though? IIRC Google is the only one doing something like this. It's not like you could import that meta information into Outlook/Thunderbird/Mail.app.


Error-prone because it adds stupid complexity and many additional steps for just getting some simple information which Google probably has stored already along with the mail. With additional complexity, you always get further things in your algorithm which could go wrong. Such unnecessary complexity should always be avoided.


This is broken for me. I choose "Contacts and Circles", then "Create archive". It then fails. I try again, it fails a bit later. Again: it succeeds with no obvious way to download the file. I try again and get "Download quota exceeded". The file which I haven't downloaded isn't even a megabyte.


Great, now let me get my Reader stuff.

Edit: And put the buzz stuff in some sort of timeline. I guess this raises the question of an archival format for modern web "experiences"...


Buzz includes shared items/notes in Reader. You can always export (& import) your subscriptions: http://www.google.com/reader/subscriptions/export?hl=en .

Also, although there isn't any manifest or timeline file, each .html file for Buzz has a last modified date that corresponds to when it was created. In addition, in the file itself, it has a timestamp.

This seems more than reasonable. Sure an XML/JSON timeline might be nice, but it wouldn't be human-readable either.
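If you do want a timeline, the modification dates are enough to build one. A small sketch, assuming the archive was extracted with a tool that preserves modification times (not all do) and that every Buzz item is an .html file somewhere under the export directory; `timeline` is an illustrative name.

```python
from datetime import datetime, timezone
from pathlib import Path


def timeline(export_dir):
    """Return (timestamp, path) pairs for every .html file, oldest first."""
    entries = [
        (datetime.fromtimestamp(p.stat().st_mtime, tz=timezone.utc), p)
        for p in Path(export_dir).rglob("*.html")
    ]
    return sorted(entries, key=lambda entry: entry[0])
```

If the mtimes were lost on extraction, the timestamp inside each file would have to be parsed instead, which this does not attempt.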


Reader shared items only appear in buzz if you've "connected" reader to buzz. I don't have it connected because people I know hate seeing stuff in buzz they've seen elsewhere.

Plus, I've got hundreds of reader items I shared before buzz existed. I don't think google even keeps them though they imply that they do.

I just used "save web page, complete" in firefox on my buzz feed and got something better than what the takeout buzz download gave me.

The Cloud^TM - because all filesystems should be stochastic. Is my file there? Maybe!! We don't know, honestly. Your query timed out? I guess the system doesn't know either. Try again later!


Voting for mailbox takeout!


What use is an export that cannot be easily imported into another system? Google should provide connectors to other popular tools, e.g. export data from Picasa to Flickr or SmugMug, and similarly export mail to Hotmail, Yahoo Mail, etc.


I haven't looked at the exported format, but I assume it's readable. But surely other than that, import tools for other services should do their share of the work.


Why should they do that? Surely that would be something the other companies should do if they wish to encourage importing?


Anti-data-lockin FTW


+1


Besides not understanding the hate/low value towards the parent's Anti-Data-Lock-in post (on point with "Avoiding vendor lock-in for computer software": http://en.wikipedia.org/wiki/Vendor_lock-in), I am unsatisfied and frustrated that my high-five post gets downvotes. With a certain humility. It's like getting pushed over for waving your hands (or voting).

I passionately support freeing data because of my work and info I want to share freely being archived|trapped away as proprietary property/formats.

I hope a little phrase like Anti-Data-Lock-in can transcend copyright and privacy. Maybe a meme for not losing your data/yourself, by being locked out.



