Hacker News
Google turns on "Download Gmail Archive" feature (google.com)
430 points by thejerz on Feb 9, 2014 | 155 comments

I'm very impressed with the number of products and services supported so far. It has everything from bookmarks to location history. It even lets you choose the format for some products, and Drive in particular has some nice options. I'm glad to see Google opening this tool up.

Agreed. Say what you want about Google, but I can't think of many startups, however much hotter they might be, that provide an equivalent service.

Them allowing you to download your data isn't anything new.


For Gmail, it is new. Until recently, 'download' was only possible through IMAP and POP3 – and could therefore take days or even weeks due to throttling.

And Google's shitty, shitty, broken IMAP implementation.

Most of the dev work on offlineimap is to cope with GMail's weird ideas on what constitutes functional IMAP.


Ah, don't remind me.

At an old job, the company used Gmail for email, and most people read it in Thunderbird. This worked fine for most of them, but the sysadmins got so much email (mostly alerts) that Google's crappy IMAP implementation kept breaking: the clients would be constantly syncing, and folder operations such as moving messages would often take several minutes.

The cheap VPS I run my own email on vastly outperforms Gmail's IMAP.

Plot twist:

For the NSA too.

Seriously, please keep the NSA comments to NSA articles. I know the NSA issue is important, but yammering on about it in every article is the same as a Bible-bashing Christian raising the topic of Jesus in every single conversation.

And I am Christian, and I do think the NSA issue is important.

I had a moment of paranoia a few months back where I thought it was a concerted attempt to trivialize the issue, but then I realized it's just a way to get cheap laughs/upvotes.

Or it's a way to keep it on people's minds; that isn't mutually exclusive with having a few laughs. I guess I can see that more easily than others, coming from a nation with a long history of self-deprecating and dark humour, but it's by no means a unique phenomenon.

It's still an improvement; before, the NSA could, and we couldn't.

For large archives, 10GB in our case, we had issues exporting to gzip or bzip2: it would crash before the archive was completed. This was especially annoying because it took almost 24 hours from the start of archive creation to the crash. The help desk could not resolve the issue for us. We had originally chosen the gzip/bzip2 formats to avoid getting a series of files with zip (zip's maximum file size is 4GB).

At the end of the day, we succeeded by choosing the zip archive format. We did not receive a series of zip files, just one large uncompressed text file in mbox format.
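Because that export arrives as one large uncompressed mbox file, a quick sanity check is to count the messages in it. A minimal sketch using Python's standard `mailbox` module; the function name is hypothetical, and a count only confirms the file parses as mbox, not that the export is complete:

```python
import mailbox


def count_messages(mbox_path: str) -> int:
    """Count the messages in an existing mbox file, such as a Gmail export."""
    # create=False raises mailbox.NoSuchMailboxError instead of creating an empty file.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        # len() scans the file for "From " separator lines to build a table of
        # contents; it does not parse the message bodies.
        return len(box)
    finally:
        box.close()
```

For example, `count_messages("export.mbox")` (a hypothetical file name) can be compared against the message count shown in the Gmail web interface.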

Our last big data dump was done back in late December 2013, so I am not sure whether these issues with gzip/bzip2 and 10GB archives persist.

Easy prediction: they'll disable imap access within a year.

Why would they do that? It seems like disabling IMAP would drive away huge numbers of users and give their competition more advantage.

Just as they would never turn off Jabber support in Google Talk, right?

No, not the same. Some of my least technical friends use imap without knowing they're using imap. Ask them about Jabber and their response would be "What??" which is why it got axed.

Killing IMAP would cause the 60-year-olds I know to complain at the same time as we would. You can't have a complaint from that broad a spectrum. As for all the fuss over whatever that RSS thing was: I signed up, used it for 6 months and then forgot it existed.

In short: let's put to bed the idea that Google is arbitrary; they're not. If they look at the numbers and see people not really using a product, they take it out back and put a hollow point through its head. (This is also why I'm bullish about Google Plus... they would have killed it otherwise.)

That's ignorant, considering we have no idea how Google plans to make its money in the future or even now. The older services that were created to make people think Google was really cool just create problems for Google, because they were built on open protocols that don't conform to Google's new lock-in strategy. None of this is "spring cleaning" or "just running the numbers" kind of stuff. It's more like herding cattle based on some strategy that involves location data, driverless cars, robot/machine learning, all its users' data and god only knows what else, in order to do god knows what.

"Some of my least technical friends use imap without knowing they're using imap. Ask them about Jabber and their response would be "What??" which is why it got axed."

Ask them about IMAP and their response would be "What??". Your point doesn't support itself.

Sorry, no that wasn't clear now that I look at it.

I tried to type out another paragraph but it was equally bad. Here is another shot at what I'm getting at: they're not the same, because I'm willing to bet there are orders of magnitude more people using IMAP than there were technical people who used and cared about Jabber. Even as a technical person who used it before, I hadn't realized Google had removed it. And I'm in the small technical group, not the general public, but off the top of my head I know several non-technical people who use IMAP on a daily basis, and so do I.

At least for Google Apps for Business, IMAP is a must.

Which has driven away tons of users!

Jabber support, in the sense of any client being able to connect to google talk, is perfectly active, hell I've got it open on my desktop right now.

Federation, the ability for having your own jabber network contactable by google talk users and vice-versa, that was (partially) turned off, after a string of spam abuses.

IIRC, using XMPP is going through the legacy "Google Talk" infrastructure which they want to axe in favor of Google+ Hangouts.

They actually temporarily broke Jabber support and then put it back on when people freaked out. I have noticed though that when I have hangouts conversations they don't seem to make it to my Jabber client...

They are probably going to scrap Google Talk. Even now you can't participate in group chats on Hangouts over XMPP which is hugely annoying.

"disabling IMAP would drive away huge numbers of users" -- Why? Ask 10 random people on the street in any large American city if they've ever heard of Gmail, and then if they have ever heard of IMAP. Fair chance the numbers will be something like 8 and (maybe) 1 on average.

On the other hand, disabling IMAP will make sure that people have to look at the webapp, and tie into the Google ecosystem to read their mail. Looking forward five more years, you might even have to have a Google+ account to read your email.

Just because they don't know what the acronym is doesn't mean they don't use it. Ask if they read the mail sent to their gmail address through outlook or the mail app on their iphone.

Ask people if they use Apple Mail (on the iPhone too) and that number changes drastically.

Possibly there might be obvious technical hurdles I'm missing here, but why wouldn't Google just release a Gmail app for the iPhone?

They already did, but many people are satisfied with the Apple app and wouldn't appreciate being forced to switch to a different and not-meaningfully-better app.

You got it. For me, I like getting my personal and work emails pushed to my iPhone in the same place (the Mail app). Though I have the Gmail app I don't use it.

I only keep the Gmail app around for search, because IMAP search, as implemented by Gmail or by the iPhone's Mail.app, is not so good.

Because of iphone.

Wouldn't Google just release a "Gmail" app for the iPhone / Windows Phone?

There already is one.

Not for Windows Phone there isn't, and they probably won't release one.

Or they could forward all their Google email to another email provider that does offer IMAP.

I'd certainly leave immediately.

Why haven't you already?

I'm not him, but I'm in the same situation: I currently use Gmail almost exclusively through IMAP, and I'd jump ship instantaneously if they turned it off.

Why haven't I left already? Because it works fine and does everything I need.

Why would I leave? The way you ask, you seem to think it's obvious, but I have no idea.

Yeah if IMAP access goes on Gmail that would be it for me.

I'd probably bounce back to hosting my own IMAP and using some utility to download my Gmail to it with Push or whatever protocol (HTTP?) that Gmail uses for push.

It would be nice to have Sieve scripts again to filter my mail (Gmail's filtering leaves much to be desired).

Because it's a hassle and it's easier to keep using it so far, and the alternatives aren't as nice in other ways. It's going to be a bigger pain to do it once I'm forced to, but…

I haven't left because:

(1) I don't have a server where I can run my own mail infrastructure (I don't think my ISP would be happy with me running a mail server off my cable modem), and

(2) I don't trust Outlook.com not to silently discard mail from randomly selected senders (as I believe Hotmail was known to do). Gmail is reliable.

But otherwise, I'm ready. I've already stopped using the Gmail web interface (retaliation for killing Google Reader). And once I'm no longer tied to Gmail anymore, I can switch to a better IMAP client than Thunderbird.

Have you tried Fastmail? I switched to them a while ago and haven't looked back. Their web UI isn't as nice, but they have an honest business model with a much better privacy policy and migrating is pretty straightforward.

> Their web UI isn't as nice

I like their web UI better than Gmail's. It is cleaner than Gmail's and much faster most of the time. It's also worth noting that they now have a beta version of a calendar (with CalDAV support):


And they are apparently working on CardDAV.

I have a fastmail account, it is worse than gmail in these respects:

- the spam filter is not very effective. I have marked a certain kind of spammy newsletter as spam many times, but they still come through.

- No calendar or contacts you can sync to your phone.

Another anecdote here -- I also use them and like the service.

In all fairness, every mail server discards mail from some senders. Your junk folder would explode if everything got through.

Hopefully they reject this mail, not simply discard it.

Huge numbers of users who are not using the web interface/mobile app and are not seeing the ads. Those might as well not exist.

Google still gets to mine all the email they receive for data, and if they use the same Google account (or maybe IP address) for anything else, use it to target the ads there. It also brings network effects, since you don't have to convince the user to sign up for a new Google account for some other Google service if they already have one for gmail.

Sure, it has some benefits. But if it can push 60% of them to the Gmail app, it might be worth it to Google, even if it loses the other 40%.

And I seriously doubt it will lose 40% even. It's not like people have many decent options for webmail + mobile mail, especially if you exclude paying for it.

And what about paying customers for hosted domains?

What about them? They're paying for the web mail service, not IMAP specifically. And their "terms of use" probably already cover that Google can do whatever it likes to the service.

And alienate millions of iPhone users that use Gmail but for one reason or another don't know about or haven't downloaded the Gmail app? Lol

No need for knee-jerk "lols", it makes a lot of sense.

For one, it will force most of them to download the Gmail app.

Second, those iPhone users don't benefit Google -- Google needs to control the experience, the ads etc in order to make money off them.

So, you have users who have years of e-mail in GMail, use GMail for free, use their @gmail.com everywhere, and probably don't know how to migrate their e-mail to another provider.

Give them a choice: (1) go through a messy migration or (2) install another app. Most will probably do (2) without even thinking about it.

And then you have to factor in that worldwide, over 80% of the smartphone users already have a platform where GMail is the default mail app:


20% of a big pie is still pretty big.

I think they'd definitely do it. Google has decided to flip the switch and monetize as much as possible lately, having to use the Gmail app would be in line with that.

The benefit of gmail to Google is the intelligence gained for targeted marketing by scanning your email.

The worry that Google will disable IMAP misses this point. Google gains so much ancillary intelligence from scanning your email that if they lost market share, their overall ability to target unrelated ads (e.g. during search) would suffer.

They already turned off Exchange access, but they grandfathered in any device that already added it. Unfortunately, now there is no way to get pushed gmail on phones like the iPhone without using the native application.

Google Sync was only disabled for free accounts. It's still available for paid accounts.

As long as they keep POP3 (or maybe extend it), I'm happy.

I never really understood why anyone would want to use GMail but NOT use the web interface. I mean, it's cool that they offer it, but I don't recall it working especially well.

A lot of people have more than one email account (personal, work, school, etc.) and not all of them are Gmail accounts.

Currently, the easiest way to manage multiple email accounts at the same time is by using a standalone IMAP client. Five accounts in the sidebar that I can access with a single click, with Unified Inbox at the top!

Gmail's web interface only lets you access one email account at a time. You could have delegated accounts, but they open in a separate tab, and the whole concept of delegation only works for Gmail accounts anyway. Or you could forward everything to your Gmail account and call it a day, but some employers might have a problem with that, especially in light of Google's apparently comfy relationship with No Such Agency.

Personally, I use my Gmail address to subscribe to public newsgroups, but I wouldn't let anything private ever touch a Google server. With a standalone IMAP client, it's very easy to maintain this kind of separation without having to suffer any noticeable inconvenience.

I have multiple accounts, and what I've done is forward everything to my personal Gmail address and use Gmail's "Send mail as" feature to reply from those addresses (obviously it's still going through Gmail's servers, though).

You might want greater separation between your inboxes, though, so your mileage may vary.

Gmail can get mail from other accounts via pop or imap.

Among many other reasons: because the web interface is utterly incapable of sending and receiving unmangled patches, or more generally sending and receiving plain text without wrapping.

Also: because people might want security, such as GPG.

I don't think the parent post was arguing about that, I think he was arguing that if these are important features to you, why choose gmail in the first place?

Upsides: it's ubiquitous, well-provisioned, has very high uptime (there have been a few outages, but those tend to become national or international news), and, for now, good support for IMAPS or POPS access for standalone email clients (which I use on my phone and desktop).

Downsides: Snooping, NSA honeypot, Chinese government hacker honeypot, etc., etc.

Because you already have a Gmail account when you decide to care about these issues?

Maybe I did not choose it? Maybe my employer did?

I wouldn't disagree with anything but GPG; a dishearteningly small number of users use such features.

Moreover, if you regularly use GPG, the GMail web interface would be nearly unusable, no? So it seems you'd be much better off with nearly any other email provider.

Most GPG messages I get are signed, not encrypted, and I usually don't need to check the signature. So the web client is fine, but it's still good to have desktop client access.

Why are other email providers better than (Gmail - web interface)?

there are some browser extensions for that

Desktop e-mail clients have more features than the web interface. Also, you get to keep a copy of your e-mails, so if Google removes your account (it may happen) you don't lose anything.

> Also, you get to keep a copy of your e-mails, so if Google removes your account...

Or, more likely, if you're offline for any reason at all. All web tools disappear in a puff of smoke if there's a problem between you and your ISP. Or, more rarely, between your ISP and their peers.

It's worth mentioning Gmail Offline. I actually like that interface more than the regular one.


You can use gmail offline in the browser now. I do this most mornings on the subway. It's read/write and very similar to having a desktop app.

If you think desktop clients have more features than Gmail's web interface, you either haven't explored the features Gmail offers, or the set of features you care about is specialized.

That's a Universal Argument: if someone names a feature, you have two outs. Either Gmail has it, or it's "specialized".

Here's one feature my email client has: automatic spell-check, such that emails are blocked from being sent if they have any spelling errors (with an override, of course). Gmail has a manually triggered spell-check, and the browser has automatic spell-checking, but neither has a personal (jargon-customized) dictionary built up over decades, nor an automatic modal dialog if the message to be sent has a mistake.

I guess automatic spellcheck override dialogs are specialized, eh?

I prefer Thunderbird's search to Gmail. I also have 4 email accounts I use and having them all in one client (and being able to move mail between them) isn't something I would be willing to give up.

Privacy, security and lack of ads aren't "specialized features"

Non-tech people use Gmail because it's free, portable across ISPs [though they don't use the term "ISP"] and a gmail address is socially acceptable in a professional setting, unlike Hotmail, Yahoo, Mac, or AOL...hmm...a CompuServe address, now that would be nerdtastic.

Anyway, I can't see why a tech-savvy person wouldn't just have a domain and an email hosting service instead of Gmail and being locked into Google's whims. E.g., the ability to download could be a first step toward shuttering Gmail, since Gmail doesn't provide the core tracking data Google uses while creating specific privacy headaches; i.e., Gmail is problematic to monetize efficiently.

> I can't see why a tech savvy person wouldn't just have a domain and email hosting service instead of gmail

Maintaining your own email server to the same level of reliability (backups, etc), speed, and functionality as gmail would be a lot of work. Not every "tech savvy" person out there wants to invest the time and resources needed to duplicate what gmail gives them for free.

[Also gmail's web interface is very, very, good, and it's well-integrated with other google products, which are very popular.]

Why does Gmail not provide valuable tracking data? It has tons of very personal content, it has contacts, so I don't see what's missing if you're in the business of compiling personal profiles.

Biggest problem is when you have like 10 email accounts to check. A desktop client having them all in one place makes it MUCH more convenient.

I suppose my criticism applies to all web-based email services. I don't use the web interface because it's sucky, slow, bloated and consumes hundreds of megabytes just to show a couple of web pages. The searches are slow as hell for me: I expect results for complex queries (contains X, doesn't contain Y, within a specified date range, etc.) in under a second, which is what I get from native email clients that do indexing.

I don't understand people who use the web interface. I guess many people are used to the general suckiness of web-apps.

It actually works pretty well, at least in Thunderbird. I don't use gmail for my personal email, but my employer uses google apps. IMAP access lets me have both accounts in one interface.

I know plenty of people that use it via IMAP exclusively, myself included. OS X & iOS Mail.app works just fine, I enjoy the spam filtering and automatic filtering.

I have 6 Gmail accounts I use on a regular basis. I also like having archives of all my emails going back decades.

However, I use POP3 and SMTP with a desktop client, not IMAP.

Mobile devices?

no worries, Microsoft now has free IMAP on outlook. back to hotmail :-)

I ran it and it gave me a zip containing an html file called errors.html with a bunch of errors and no actual emails.

Same here, I just got 14 MBs of errors.html and nothing else.

Tried running it twice, all I get are emails informing me they can't do it ( http://i.imgur.com/zlFgQX4.png ).

This error might have come up because you have an open Gmail client session running (or another Google service), or simply that the new service is completely overloaded right now.

Same here. I got a 64MB errors.html file. No actual data.

Hopefully some human is checking the logs and will be fixing this.

I checked this with a new/small account and it gave me the emails all right.

+1 same issue.

I have 10.6GB of email. Wonder if that might have something to do with it.

Yes -- just to be clear, the submission you link to occurred before Google released the Gmail export feature.

I used the Gmail export feature weeks ago; this does not appear to be anything new.

It (gmail export) was slowly deployed to more and more accounts over the last few weeks

If I wanted to download my mbox file from Gmail and reupload it at a later date to Gmail (possibly another of my accounts), I could do that without a problem, right? Or will they just be a jumble of emails, out of order or something? I suppose at the least, they won't retain their labels.
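Whether the messages come back in order can at least be checked locally before re-uploading. A minimal sketch, assuming the export is a standard mbox file, that lists subjects sorted by each message's Date header, with undated or unparseable messages last; the function names are hypothetical, and it says nothing about whether Gmail keeps labels on re-import:

```python
import mailbox
from datetime import timezone
from email.utils import parsedate_to_datetime


def _date_key(msg) -> float:
    """Seconds since the epoch from the Date header; inf if missing or unparseable."""
    raw = msg.get("Date")
    if raw is None:
        return float("inf")
    try:
        dt = parsedate_to_datetime(str(raw))
    except (TypeError, ValueError, IndexError):
        return float("inf")  # older Pythons raise TypeError, newer raise ValueError
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # "-0000" dates parse as naive; treat as UTC
    return dt.timestamp()


def subjects_by_date(mbox_path: str) -> list:
    """Return message subjects in chronological order of their Date headers."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        # Loads every message into memory; fine for a spot check, not for huge exports.
        messages = sorted(box, key=_date_key)  # stable: ties keep file order
        return [str(m.get("Subject", "")) for m in messages]
    finally:
        box.close()
```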

I'm glad they've rolled this out widely, but unfortunately, when I attempted to use it, it simply sent me a 5MB error log and not a single bit of archived data.

If I'm already using gmvault, is there any advantage to downloading it directly?

Anybody here tried this at all?

I tried to get a Takeout archive with my emails several times; all I got was a 2.3MB errors.html saying "service cannot retrieve this item" over and over again.

Same here.

The new option is great but I am still very happy with a (payware) app called CloudPull. It can automatically backup all your Google Apps data and is not limited to Gmail but also covers Drive.


CloudPull is OS X only. I'm sure there are other great options for creating Gmail backups independent of Google.

Am I missing something new? Hasn't this been in place for a couple of months?

When they announced it, it did not support Gmail. (Or many of the other Google services)

Takeout was announced in 2011. Takeout for Gmail was announced in December 2013. Gmail has always supported POP and IMAP which are functionally equivalent to or superior to the MBOX archive format provided by Takeout, for most purposes.

When it was first announced, it wasn't available to everyone (I didn't have it). I think now it's open to everyone.

Yes, in fact, in the form of IMAP/POP3 access, where you could download everything to your computer. I don't see the reason for all the excitement.

I had to migrate around 100 accounts out of Gmail (free) in 2009, and it was a pain: their IMAP implementation was sub-par, so all the IMAP migration tools (imapcopy etc.) failed or halted somewhere during the process on each mailbox. In the end we had to configure the accounts on various computers and download them manually (probably two bad IMAP implementations, Gmail's and Outlook's, worked better together than the other tools did :)). Of course, POP was not acceptable because it lacks support for labels/folders.
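One common way to keep such a migration from starting over after each failure is to checkpoint progress per message. A minimal sketch, assuming hypothetical `fetch` and `store` callables that stand in for the real source and destination operations (with IMAP these would be roughly a FETCH from Gmail and an APPEND to the new server); the checkpoint file format is also an assumption:

```python
import os
from typing import Callable, Iterable


def resumable_copy(
    message_ids: Iterable[str],
    fetch: Callable[[str], bytes],   # hypothetical: read one message from the source
    store: Callable[[bytes], None],  # hypothetical: write one message to the destination
    checkpoint_path: str,
) -> int:
    """Copy messages one at a time, logging each finished ID; return how many were copied."""
    done = set()
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, encoding="utf-8") as f:
            done = {line.strip() for line in f if line.strip()}
    copied = 0
    with open(checkpoint_path, "a", encoding="utf-8") as log:
        for msg_id in message_ids:
            if msg_id in done:
                continue  # already copied by an earlier, interrupted run
            store(fetch(msg_id))
            log.write(msg_id + "\n")
            log.flush()  # persist progress before moving to the next message
            copied += 1
    return copied
```

Re-running after a crash skips everything already logged. The guarantee is at-least-once: if the process dies after `store` but before the ID is written, that one message is copied again on the next run.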

If this existed at that time it would have saved us a lot of hassle...

Shameless plug: http://thehorcrux.com/

I've tried to solve the exact same problem: backing up emails and migrating them easily.

It has; grabbed mine last week.

I've had 4 "Failed - Network error" messages in a row.

I have twice created an archive with my Hangouts data, and the archive contained only an errors.html file explaining that Hangouts.json failed with a "Service failed to retrieve this item" message. I left "feedback" saying so. I guess we'll see where this goes.

All right, finally got a nonempty archive. Looks like it's probably complete. Seems fairly legit. However, there is one thing that confuses me so far. Looks like a lot of links get a link_target attribute that is different from the actual link:

   "type" : "LINK",
   "text" : "http://en.wikipedia.org/wiki/American_Letter_Mail_Company",
   "link_data" : {
     "link_target" : "http://www.google.com/url?q=http%3A%2F%2F\
                      &sa=D&sntz=1&usg=[alphanumeric token scrubbed]",
     "display_url" : "http://en.wikipedia.org\

However, links to Youtube, or Google, or any subdomain of either, have a link_target attribute identical to their display_url. In-teresting. I wonder what that alphanumeric token is. It's like how Google search results also have, rather than normal links, links to google.com/url?[a ton of URL parameters]. I assume the latter is so they can gather data about what URLs are clicked in the search results, or possibly pasted elsewhere, and conceivably to discourage scraping. And as for this?

Perhaps it's a kludge for Google chat clients, which need to parse URLs (they do something special with Youtube links) and might be thus freed to do it stupidly. Perhaps Google wants to know what people do with their downloaded Hangout archives. Perhaps Google wants to know what people do with Hangout history in the browser, and they've changed the links in that archive, and then they just leave it that way in the exported format. --Turns out Hangout history in the browser has exactly those links... I'm guessing it's the last one. Well, at any rate, at least it's easy to ignore that field.
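Ignoring that field is easy, but if the real destination is wanted it can be recovered from the redirect. A minimal sketch, assuming (as in the fragment above) that the destination is carried in the percent-encoded `q` parameter of a `google.com/url` link; the function name is hypothetical, and any other link is returned unchanged:

```python
from urllib.parse import parse_qs, urlsplit


def unwrap_google_redirect(link_target: str) -> str:
    """Return the destination of a google.com/url?q=... redirect, else the link unchanged."""
    parts = urlsplit(link_target)
    host = parts.hostname or ""
    if (host == "google.com" or host.endswith(".google.com")) and parts.path == "/url":
        q = parse_qs(parts.query).get("q")
        if q:
            return q[0]  # parse_qs has already percent-decoded the value
    return link_target
```

Links that are not Google redirects, such as the YouTube and Google links whose link_target already matches display_url, pass through untouched.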

When you create the Takeout or when you try to download the archive?

I created it twice successfully last night. I thought maybe the servers were overloaded, so I waited until this morning to download the second one. I checked my Gmail, got the "your archive download is ready" email, clicked the link, logged into Google's site, and clicked Download Archive. Mine is several hundred megs in size; it gets between 60 and 150 megs downloaded and then, again this morning, quits with the error I described.

Maybe there's some bug relating to the size of my download but I'd think a few hundred megs shouldn't be an issue to download.

Perhaps you should choose a single category (like Gmail) of your Google content, download that, then go back and download another single category until you have it all.

It would probably work but it's only a few hundred megs I'm talking about. I logged onto my Linode and created a 588 meg file with this:

  dd if=/dev/zero of=testing bs=196k count=3k oflag=dsync

Then I downloaded it and it went fine, so the problem is on Google's side.
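For reference, the test file's size follows from dd's block arithmetic. A quick check, assuming GNU dd, which reads the lowercase `k` suffix as 1024:

```python
# bs=196k count=3k, with GNU dd's k = 1024
block_size = 196 * 1024   # bytes per block
block_count = 3 * 1024    # number of blocks
total_bytes = block_size * block_count

print(total_bytes)           # total bytes written
print(total_bytes / 2**20)   # same size in MiB
```

That is 616,562,688 bytes, exactly 588 MiB, which matches the "588 meg" figure.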

I tried downloading the first archive so many times that their service finally told me I'd downloaded it too many times, and then the second archive failed in roughly the same time frame. Their service is counting the failed downloads as successful, which is bad in itself.

I've been similarly unable to download my archive as well :(

I think this is Google's half hearted/half assed attempt at allaying the fears of those concerned about 'what Google might do next'. As a lot of you who have tried to export out have seen, this does not work satisfactorily or does not work at all! (See other's experiences in this thread).

I myself tried to export my Gmail contacts and found it does not work as expected, and does not work at all for groups. Nor does it export full data in the vCard, like contact photos. Why not? It's supported in the file format, so why not add it?

Seems like they're making the fixing of export bugs a low priority. So low that it's not even working at all for some. To me all this seems like deliberate negligence.

I'm getting "Service failed to retrieve this item."

For every item I tried to export.

I hope it will one day include my YouTube (favourites) - as I still get an empty result for archiving my YouTube profile. It requires Google+ to access that now, from what I've tried to research on it.

What can you do with the downloaded mail archive?

Would it be possible to load/import it into Thunderbird and get to view all the same folders as in Gmail now for example?
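Thunderbird typically stores each local folder as an mbox file, so one approach is to split the single export into one mbox per Gmail label. A minimal sketch, assuming each exported message carries an `X-Gmail-Labels` header with comma-separated label names (an assumption about Takeout's output, not something confirmed in this thread); the function name and output layout are hypothetical, and importing the files into Thunderbird is a separate step:

```python
import mailbox
import os
import re


def split_by_label(mbox_path: str, out_dir: str) -> dict:
    """Write one mbox per label into out_dir and return {label: message count}.

    Assumes an X-Gmail-Labels header with comma-separated labels; messages
    without it are filed under "Unlabeled". A message with several labels is
    copied into each label's file. Labels that themselves contain commas are
    not handled.
    """
    os.makedirs(out_dir, exist_ok=True)
    src = mailbox.mbox(mbox_path, create=False)
    outputs = {}  # filesystem-safe name -> open output mailbox
    counts = {}
    try:
        for msg in src:
            header = str(msg.get("X-Gmail-Labels", ""))
            labels = [name.strip() for name in header.split(",") if name.strip()] or ["Unlabeled"]
            for label in labels:
                safe = re.sub(r"[^\w.-]+", "_", label)  # e.g. "Work/2013" -> "Work_2013"
                if safe not in outputs:
                    outputs[safe] = mailbox.mbox(os.path.join(out_dir, safe + ".mbox"))
                outputs[safe].add(msg)
                counts[label] = counts.get(label, 0) + 1
    finally:
        src.close()
        for box in outputs.values():
            box.close()  # close() also flushes the added messages to disk
    return counts
```

Run it against an empty output directory: it appends to any existing files, so a second run would duplicate messages.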

I'll just ask the NSA for mine ;-)

Who do you think they created this feature for?

Is this new? I'm pretty sure it's been around in some form for a couple of years at least.

Gmail was announced several weeks ago (maybe December?) as coming to Takeout, and last time I checked I didn't have it enabled, but now I do. It's by far my biggest file, my last export of all the services I was able to download was ~100MB, but this is at 3.5GB and climbing.

I just requested an archive and I will be surprised if I get it any time soon. Gmail - 5GB, Drive - 3 GB, G+ Photos - 1.5 GB - this is going to be one huge zip file !!

Just as an FYI, Takeout's zip files are limited to 2GB (for compatibility reasons), so you'll actually get a couple of zip files. If you want just one file, you need to pick one of the other archive formats.
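If you do end up with several zip parts, they can be unpacked into one directory. A minimal sketch, assuming each part is a self-contained zip archive rather than a spanned multi-volume set; the `takeout-*.zip` pattern and the function name are hypothetical:

```python
import glob
import zipfile


def extract_parts(pattern: str, dest: str) -> list:
    """Extract every zip file matching pattern into dest; return all member names."""
    names = []
    for part in sorted(glob.glob(pattern)):  # sorted for a predictable order
        with zipfile.ZipFile(part) as zf:
            zf.extractall(dest)  # a later part overwrites a same-named file
            names.extend(zf.namelist())
    return names
```

For example, `extract_parts("takeout-*.zip", "takeout")` would merge every part's folders under one `takeout` directory.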

Perhaps preparing to turn off 'IMAP access' feature?

Not likely. You can only create a backup once every 24 hours which would be useless for importing into a mail client.

I think the idea here is that now they could turn off IMAP and not be accused of lock-in, whereas IMAP used to be the only way to liberate your email.

They won't turn off IMAP of course, because they would anger way too many mobile users.

It might be a coincidence, but since I started creating an archive I have not received any more email to that account. And it has already been running for some hours.

Is there any way to download using wget or curl? Google Chrome isn't cutting the mustard. After about 4 gigs, it gives me a fat error.

> After about 4 gigs, it gives me a fat error.

That sounds more likely to be an issue with your local storage (4GB being the limit for files on a FAT-formatted hard drive, IIRC). Though on the other hand, somebody still using FAT in this day and age seems a fairly unlikely explanation too :S
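To make the FAT theory concrete: FAT32 stores each file's size in a 32-bit field, so the largest possible file is 2**32 - 1 bytes, one byte short of 4 GiB, which lines up with a download dying at about 4 gigs. A minimal check, with a hypothetical helper name:

```python
# FAT32 records file sizes in a 32-bit field.
FAT32_MAX_FILE_SIZE = 2**32 - 1  # 4,294,967,295 bytes: one byte under 4 GiB


def fits_on_fat32(size_bytes: int) -> bool:
    """Return True if a file of size_bytes can be stored on a FAT32 volume."""
    return size_bytes <= FAT32_MAX_FILE_SIZE
```

For example, `fits_on_fat32(4 * 2**30)` is False, so a 4 GiB archive cannot be saved to such a drive.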

Well that's a FAT error then, not Chrome's fault.


They took their time but this does look like the right way to do it.

I'm surprised google are making it easier to escape.

1.4MB file. I don't think it worked lol

You should create backup every month...

Wasn't this there before?

Useful but you could do it with IMAP already.

Anyone else getting just a zip of error.html containing a whole bunch of "Service failed to retrieve this item." for all emails?

Or worse: system "got" 7 gigs, still in progress, refresh page, 7 gigs disappear.

now I'm getting this

    Sorry you have have reached the maximum number of archive
    creations allowed per day. You can try again tomorrow.
