
Gmvault: gmail backup. Liberate your emails - zoobert
http://gmvault.org/
======
pilif
If your tagline is "For geek and non-geek users." then you probably should not
be depending on users to open a shell and type commands there to access their
emails.

For this to be remotely useful for non-geeks, try adding a small IMAP server
so people can access the mails through their mail clients. Having to re-upload
all mail back to a (presumably new) gmail account will take ages and accessing
the mails locally is not really helpful these days with mime encoding and HTML
email.

On a related note: If you have a small box with a PTR record lying around
somewhere, you can use my setup described at
<http://pilif.github.com/2011/02/how-i-back-up-gmail/> to route all incoming
and outgoing mail through a mail server you own as it moves to and from gmail.

If you ever lose your google account or its data, all mails will be on your
small box in convenient maildir format, ready to be served over IMAP.

~~~
kijin
> _try adding a small IMAP server so people can access the mails through their
> mail clients. Having to re-upload all mail back to a (presumably new) gmail
> account will take ages and accessing the mails locally is not really helpful
> these days with mime encoding and HTML email._

One of the stated purposes of this tool is to "liberate your emails" by
letting you keep local copies of every e-mail you send and receive. If you
need to use a third-party IMAP server to keep a backup of your e-mails, how
does that liberate you? Now you're dependent on two online services instead of
one.

Reading locally stored e-mails would be a problem for non-geeks, though. But I
think the right way to solve that problem would be to offer a program to read
those e-mails, or even better, a plugin to a popular mail client (such as
Thunderbird) so that the e-mails can be accessed through the mail client.
Depending on the "cloud" just detracts from the stated purpose of this tool.

~~~
pilif
I wasn't talking about a third party IMAP server. I was talking about adding
an IMAP daemon right in the backup application itself.

That way, if your google account goes away (I guess we all agree that
administrative account lockout is way more likely than google actually losing
data) and you need an email RIGHT NOW, you would just configure any IMAP
client to use the daemon provided by the backup application (localhost:143).

Or it could bundle some web based mail client and then open a browser.

It's impractical to reupload the whole backup to gmail before being able to
access your mail.

~~~
kijin
Thanks for the clarification, it seems that I misunderstood your comment. A
local IMAP server would fit right into my suggestion of mail client
integration, without the headache of writing a plugin.

But now you've got a background process running on a privileged port, yikes!
(It's got to be a daemon that starts automatically, because "non-geeks" will
not understand why they have to start another program before accessing their
mail in Thunderbird.)

~~~
EwanG
Better option (IMNSHO): if someone wants to read the email, they don't REALLY
care whether it's as an email. So provide a reader application that parses the
downloaded file so that you can do a rudimentary search, find what you want,
and then copy/paste as necessary.
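
A minimal sketch of such a reader, in Python. It assumes the backup is a
directory tree of plain .eml files (as described elsewhere in this thread);
the function name `search_backup` and the directory layout are made up for
illustration and are not part of Gmvault. There is no error handling for
malformed messages.

```python
import email
from email import policy
from pathlib import Path


def search_backup(backup_dir, term):
    """Yield (path, subject) for each .eml file whose subject or text body contains term."""
    needle = term.lower()
    for path in sorted(Path(backup_dir).rglob("*.eml")):
        # Parse the raw message with the modern email API (policy.default).
        with open(path, "rb") as fh:
            msg = email.message_from_binary_file(fh, policy=policy.default)
        subject = str(msg["subject"] or "")
        # Prefer the plain-text part; fall back to HTML if that is all there is.
        part = msg.get_body(preferencelist=("plain", "html"))
        body = ""
        if part is not None and part.get_content_maintype() == "text":
            body = part.get_content()
        # Case-insensitive substring match, so partial words match too.
        if needle in subject.lower() or needle in body.lower():
            yield path, subject
```

Something like `list(search_backup("/path/to/backup", "aft"))` (the path is a
placeholder) would return every message mentioning "aft", including partial
matches inside longer words.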

------
digitalsushi
Just a few weeks ago I was looking for exactly this piece of software! What
brilliant timing.

But a quick question of concern - gmail is pretty smart and can tell when I am
just using it for bulk storage. If you send yourself 8 emails in a short
amount of time, each with a maximum-size attachment, gmail will politely
tell you to cut it out for a while.

Does this software run the same risk? I am wary of letting it just keep
barging in after connection errors, if there's a chance gmail will get upset
and ban me from my own account. My gmail is my lifeblood, and I want to back
it up away from the entity I most fear will take it away from me.

Thanks for making this!!

~~~
justincormack
It should be no different from what would happen if you connect a new imap
program that does a full sync of your mailbox, which many do.

------
utexaspunk
Sweet! I'm gonna keep my gmail backup on my google drive!

~~~
amouat
Err, isn't there a flaw in that plan?

Surely the back-up should protect you in case you lose access to your Google
account...

~~~
camiller
I think he was trying to be subtly ironic. Perhaps too subtle.

~~~
amouat
I wondered that after I wrote the reply.

However, I bet some people do use google drive to back-up other google
services. It's not ironic, just foolish IMO.

~~~
icebraining
You can always create different Google accounts for each service. My "main"
account (Reader, Youtube, etc) was never tied to my Gmail & Calendar account,
and the probability of losing both at the same time is rather small.

~~~
mcherm
Not as small as you may think. Do you really believe that Google can't connect
the dots (or the IP addresses) and figure out that both accounts are
associated with the same person?

~~~
skeletonjelly
They'd have to tread carefully. Would suck to have an entire university banned
from Google services ;)

~~~
mcherm
True... but I imagine they can also distinguish between frequently-shared IPs
and rarely-shared ones.

------
DanBlake
Great tool, awesome work. Gmail should really offer a link to back up your
mail though. I know you can do it with mail clients and this, but it's
annoying I can't just download one big zip file from gmail itself.

~~~
jemfinch
I imagine (and I'm not speaking for my employer here) that it has to do with
cost: if the easiest way to back up your email is by using IMAP or POP, it's
essentially an incremental backup mechanism. If the easiest way to back up
your email is by clicking a link to a .zip file containing all your email, it
would be a non-incremental backup system and its cost would be much higher,
because people would be far more likely to repeatedly download the same
emails.

~~~
DanBlake
Maybe there could be a few options? Download all mail, download by month,
download by year? Seems like they could solve that problem pretty easily and
would be very awesome.

~~~
jemfinch
Offering different zipfiles for months/years wouldn't solve the problem. "The
problem" is that when you have hundreds of millions of users, the easiest
mechanism needs to be an efficient mechanism. Adding links for months/years
would still leave the _easiest_ mechanism as the least efficient: I know that
I, personally, would continue to download the "all mail" zipfile just to
ensure that I didn't miss anything. I cannot imagine my mother or grandmother
doing anything different.

------
zoobert
Thank you all for the feedback and support. It's been great and proved that a
simple tool that is up to that task is needed by plenty of people.

If you would like to participate and help me develop Gmvault, contact me on
twitter at @zoobert or via email: guillaume((dot))aubert((at))gmail((dot))com

I will now work hard to finalize Gmvault and produce the first final release
(available within a month).

If you would like to report some issues or would like extra features: submit
them to <https://github.com/gaubert/gmvault/issues>

------
kozubik
No promises, but I think we will enable this on rsync.net storage arrays...

We already have s3cmd in our environment, so you can:

    ssh user@rsync.net s3cmd get s3://rsynctest/mscdex.exe

So if we put this into the environment, you could call it over SSH:

    ssh user@rsync.net gmvault sync foo.bar@gmail.com

... which is fantastic.

More like this.

~~~
zoobert
Awesome. Let me know if you need some support.

~~~
toomuchtodo
Any chance you could build S3 support into Gmvault? So you could cron it on a
linux box to connect to Google and push all the data into an S3 bucket? If
money is an issue, I'd be interested in footing the bill.

~~~
zoobert
This is something I have in mind: Cloud save. It will be added to the roadmap.
Please contact me and I will keep you in the loop. Regarding the money, I am
thinking of adding a donate button to support the project.

------
rcthompson
Small nitpick: the application name "Gmvault" is inconsistent with the tagline
"Liberate your emails", since one does not normally associate vaults with
liberation. Maybe something like "protect your email" would be more vault-
appropriate?

~~~
zoobert
;-) I like that. Thanks for helping me keep the theme right.

~~~
rcthompson
Also, don't go with exactly "Protect your email", since it sounds kind of
bland. Just something like that. Maybe "Safeguard your email". Find an
exciting synonym for "protect"!

~~~
minikomi
Or maybe "Stash your mail"

------
dools
I just use fetchmail to grab everything (except spam) from gmail anyway but
this:

"With the restore command Gmvault can recreate your gmail mailboxes in any
Gmail account. All attributes such as Gmail labels are preserved and
recreated. With restore, you will recover your Gmail account exactly as it
was."

I would use it just for that! When I wanted to unify my email/calendar etc.
under one account there didn't seem to be any way of transporting existing
gmail over. Gmail doesn't even have a way to do it if you decide to start
_paying_ them!!

------
mikemarotti
Granted this isn't working for me to begin with
(<http://i.imgur.com/yLzyy.png>), but will this tool work to archive Google
Apps email accounts? eg. Gmail accounts that do not end in gmail.com?

~~~
zoobert
At the beginning of the gmvault bash script there is a variable called
GMVAULT_HOME. If you replace the .. with the full path to the gmvault-v1.0-beta
dir, that should work. If there are more problems, could you please run it like
this: $>sh -x ./gmvault sync .... and send me the console output?

I will fix that in the next version. Thanks for the feedback

~~~
mikemarotti
Still not working for me, I will send you an email. Thank you for working on
this tool though - our company would gladly pay for an enterprise version of
this (especially if it was a bit more seamless and integrated with Google Apps
somehow).

------
carterschonwald
Hey: I'm trying it out, and it's not entirely clear if it's properly
incremental. To wit, if I halt it mid backup and start it up again, it seems
to start from scratch all over again.

Am I misinterpreting how it's working?

------
michaelmior
For anyone trying to get the Linux version to work, here's what worked for me.

1\. Download the tgz and unzip.

2\. Create and activate a new virtualenv in that directory.

3\. easy_install -U distribute (in my case, the version of distribute
installed by pip was too old for imapclient)

4\. pip install logbook gdata imapclient

5\. Replace bin/gmvault with the following

    
    
        #!/usr/bin/env python
        import sys
        import os
        
        sys.path.insert(0, os.path.abspath(os.path.join(sys.path[0], '../lib')))
        
        import gmv.gmv_cmd as runner
        runner.bootstrap_run()

~~~
zoobert
The release on Hacker News is a bit premature. I wanted to test the response
and was not expecting so much interest. I will fix the deployment issues on
Linux and Mac OS X. You can download the tgz and run it as Michael said.

You can also use easy_install or pip; it is on PyPI. Create a virtualenv, as
that is better, and run: pip install gmvault (or easy_install gmvault)

See <http://gmvault.org/install.html#py_install> for further info

------
samuel1604
What's the difference between this and a tool like offlineimap?

~~~
callahad
As an OfflineIMAP user, I see a few immediate differences:

1\. OfflineIMAP expects standard IMAP servers. GMVault has special casing to
handle Google's wonky IMAP support, along with features like labels and the
"All Mail" directory.

2\. OfflineIMAP is GPLv2 (or 3, at your option). GMVault is GPLv3.

3\. OfflineIMAP syncs to either a local Maildir or another IMAP server.
GMVault syncs to its own custom on-disk representation.

4\. OfflineIMAP uses a username/password to log in to servers. GMVault uses
XOAuth.

5\. OfflineIMAP is fully bi-directional by default; local deletion propagates
to the remote server. GMVault notes that "manually deleting emails or emails'
directories does not prevent Gmvault from working."

6\. GMVault can encrypt its own archives. OfflineIMAP cannot.

Basically, GMVault looks like a much less general-purpose tool, but in its
specialization, should allow for a much nicer experience for users that simply
want a backup for their GMail account.

I'm fond of the versatility that OfflineIMAP gives me (I can restore my mail
to any IMAP server, not just GMail; I can access the local Maildir with other
applications like Dovecot and Mutt; etc.), but excited about the possibilities
of GMVault for friends and family.

"Do you have your GMail backed up? No? Here, let me install GMVault for
you..."

~~~
emillon
I believe that OfflineIMAP now has gmail-specific code (type = Gmail) :

[http://docs.offlineimap.org/en/latest/MANUAL.html#sync-from-...](http://docs.offlineimap.org/en/latest/MANUAL.html#sync-from-gmail-to-another-imap-server)

~~~
zoobert
It only backs up specific target labels (or IMAP directories). If you ever
restore your Gmail account with OfflineIMAP, I think you will lose part of
this information. I would say the purposes of these tools are different.

------
afterburner
I used another Gmail backup tool once that made all my emails "read". Does
this do that too?

~~~
zoobert
Normally yes or as I said above it is a bug and you should report it there:
<http://gmvault.org/report_pb.html>

~~~
afterburner
Thanks for the reply. Your answer is confusing, but I see from your other
replies that GMVault is supposed to preserve read/unread status.

~~~
zoobert
I responded to the wrong comment. Normally it should preserve unread messages

------
eblume
Oh wow - this is really interesting. I just released a VERY similar program
called `gmail-safe` yesterday. I was going to do a Show HN sometime today.

If anyone is interested, here is the URI:
<http://eblume.github.com/gmail_safe/>

Note that this is my first Node app. It needs a lot of polish. It is published
under the MIT license, though, which some people here might enjoy.

------
zoobert
The tool is still in beta. Comments and feature requests are welcome.

~~~
orta
I love the idea, but I wasn't able to use it.
<http://cl.ly/2m170D012X370n1g2K0S>

OSX has some niceties for python apps, you could use py2app to make it an
actual binary rather than a folder or bundlebuilder.

~~~
gglanzani
Same here. Installing with pip (not in a virtualenv), and I get

    
    
        Cannot find the python executable to set env var PYTHON_BIN. 
        Please check where your python binary is.

~~~
ibejoeb
If python is in your path, just edit gmvault, delete the PYTHON_BIN stuff
starting at line 24 and add PYTHON_BIN="python"

That should work if python is on your path.

~~~
gglanzani
Thanks, works like a charm now.

------
PanMan
Nice tool! I think the default sync interface should show a progress bar (like
wget). Now it says: Processing 7890 emails, but then it prints a line for
every email. I don't need to see every email in a list, but I would like to
see how long it takes to complete. Update: I see it does print that every once
in a while. I would skip the other lines... :)

~~~
zoobert
Thanks for the feedback. I will think about how to present operation progress
(a progress bar with curses, and so on).

------
iloverobots
I think it is great that you are getting a lot of good feedback and
constructive criticism from the community, but I just wanted to say thank you!
I've been looking for a simple tool like this for a while, and Gmvault is
perfect. I'm sure that you will add polish and new features in the future, but
I'm glad that you released when you did.

------
espinchi
My first thought was "cool, I certainly have to use this", and switched to
something else.

I just came back and started to perform a backup. I think you should, too.
This is insurance.

For the author: the link from the "Learn More" is broken. It should probably
point to _install_ , not _documentation_.

~~~
zoobert
Thanks I fixed it. Sorry for the premature launch I think it was somehow a
mistake (too early) but I needed a boost to be sure that I was doing something
useful.

------
sneak
I think people familiar with command-line tools are already probably doing
what I do, which is to use mbsync to sync a remote gmail imap account to a
local maildir. It even easily supports multiple sync account/maildir pairs via
its somewhat confusing config file format.

<http://isync.sourceforge.net/mbsync.html>

------
ernestipark
If this fails/stops in the middle of syncing, will it restart at the point of
failure or start from the beginning again?

~~~
zoobert
Yes, these kinds of mechanisms are built in. If there is an error in the
middle of a sync, Gmvault will wait a few seconds and try to restart the
current operation. After 4 attempts it will exit with an error. Gmail's IMAP
servers can start to throttle the transfer and can also cut the connection.
Gmvault also has a restart mode (option --restart) to resume where it was.
Actually, it resumes up to 20 emails earlier, because Gmvault saves its
"position" (the id of the last email synced or restored successfully)
regularly.
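
The retry and checkpoint behaviour described above could look roughly like
the sketch below. This is not Gmvault's actual code: `fetch_one`,
`save_position` and the 5-second wait are assumptions, and checkpointing every
20 emails is only one reading of "up to 20 emails before". The limit of 4
attempts comes from the description above.

```python
import time


def sync_with_retry(ids, fetch_one, save_position, start=0,
                    max_attempts=4, wait_seconds=5, checkpoint_every=20,
                    sleep=time.sleep):
    """Process ids[start:], retrying failed items and saving the position regularly."""
    for index in range(start, len(ids)):
        for attempt in range(1, max_attempts + 1):
            try:
                fetch_one(ids[index])
                break  # this email is done, move on to the next one
            except ConnectionError:
                if attempt == max_attempts:
                    raise  # give up after the last allowed attempt
                sleep(wait_seconds)  # wait a little before retrying
        done = index + 1
        if done % checkpoint_every == 0:
            # Persist how far we got; a restart loses at most
            # checkpoint_every - 1 emails of progress.
            save_position(done)
    save_position(len(ids))  # final checkpoint once everything is processed
```

A restart would then pass the last saved position as `start`, so only the
emails processed since the previous checkpoint are handled again.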

------
epo
As it stands, this could not work with local mail clients because of the
custom storage format. Why is that, gmail meta data?

It would be very impressive if Gmvault functionality could be combined with
a local mail client; then you could get both a full (uploadable to gmail)
backup of the gmail-box and non-browser access.

------
chbrown
No resume functionality? Or am I just missing something? I've got ~3 GB of
emails in my Gmail, and Gmvault's ETA is 4 hours.

It already timed-out once (no discernable reason), and after restarting, went
right to email number 1 (forgetting about the 1800 emails it had already
gotten through).

~~~
zoobert
Thanks for the feedback. There is restart functionality: use the --restart
option (see gmvault sync -h for more info). Also note that when you restart
from scratch, if an email has already been downloaded and is identical, the
download is not performed. So a rerun that only scans the mailbox is faster
than one that has to download.

Regarding the timeout, please send me the error message you had, and if it is
a bug I will fix it. Note, though, that Gmail sometimes cuts the connection
for no apparent reason. There is also a retry/reconnect process (up to 4
times) for errors that are not fatal (fatal errors cannot be recovered from).
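
The skip-what-is-already-on-disk idea can be sketched as follows. The file
naming (`<id>.eml`) and the function name are assumptions for illustration;
how Gmvault really decides that a stored email is "identical" is not shown
here.

```python
from pathlib import Path


def ids_to_download(remote_ids, backup_dir):
    """Return the remote ids that have no matching <id>.eml file under backup_dir."""
    # Collect the ids already present on disk (file name without the extension).
    on_disk = {p.stem for p in Path(backup_dir).rglob("*.eml")}
    # Keep only the ids still missing locally, preserving the remote order.
    return [i for i in remote_ids if str(i) not in on_disk]
```

Listing ids and checking the local directory is cheap compared with fetching
message bodies, which is why a rerun over an existing backup finishes much
faster than the first one.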

~~~
davidradcliffe
Restart doesn't seem to be available.
<https://github.com/gaubert/gmvault/issues/4>

~~~
zoobert
yes it is there for the restore operation but not for sync. It will be
available in the next version (to be released in 2 weeks)

------
andrewpi
Does this use SSL to grab the emails?

~~~
callahad
By default, yes:
[https://github.com/gaubert/gmvault/blob/master/src/gmv/imap_...](https://github.com/gaubert/gmvault/blob/master/src/gmv/imap_utils.py#L194)

However, if I recall correctly, Python's built-in SSL support doesn't do
certificate verification, so you're still completely open to man in the middle
attacks.

~~~
SoftwareMaven
That's a good point, and a tool like this probably really should be verifying
the host.

(To the author) See [http://stackoverflow.com/questions/1087227/validate-ssl-cert...](http://stackoverflow.com/questions/1087227/validate-ssl-certificates-with-python)
for multiple options to accomplish this.
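
For illustration, here is a sketch of how certificate and host name
verification can be requested explicitly with the standard library in current
Python 3. It is not taken from Gmvault; `make_verifying_context` and `connect`
are hypothetical helper names, and `imap.gmail.com:993` is assumed as the
endpoint.

```python
import imaplib
import ssl


def make_verifying_context():
    """Return an SSL context that checks the certificate chain and the host name."""
    context = ssl.create_default_context()  # loads the system's trusted CAs
    # Both settings are already the defaults for create_default_context();
    # they are set explicitly here to make the intent obvious.
    context.check_hostname = True
    context.verify_mode = ssl.CERT_REQUIRED
    return context


def connect(host="imap.gmail.com", port=993):
    """Open an IMAP-over-SSL connection that fails if the certificate does not verify."""
    return imaplib.IMAP4_SSL(host, port, ssl_context=make_verifying_context())
```

Passing the context explicitly means a bad or mismatched certificate raises an
error at connection time instead of being silently accepted.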

------
Glowbox
It didn't work for me (TypeError: 'NoneType' object is not iterable), I opened
an issue on github.

~~~
mattjaynes
Here's a workaround from the GH issue's comments:

Solution: set 'Show in IMAP' in Settings->Labels->All Mail on gmail.com

[https://github.com/gaubert/gmvault/issues/2#issuecomment-555...](https://github.com/gaubert/gmvault/issues/2#issuecomment-5557075)

------
igurari
You might want to consider another name in light of the Google Vault product:
[http://www.google.com/enterprise/apps/business/products.html...](http://www.google.com/enterprise/apps/business/products.html?section=vault)

------
zyb09
Wow, very cool! This is exactly what I needed right now. I had bad dreams
about what would happen if my Google account disappeared all of a sudden,
but didn't do anything about it yet. Now I will be able to sleep better!

------
delinquentme
Has no one mentioned search? intra-gmail search is nothing short of an
atrocity compared to the google.com search... A search feature capturing
partial terms "Aft" in "Afternoon" would more than make this service viable.

------
wooptoo
I've been doing this for quite some time using getmail
[https://wiki.archlinux.org/index.php/Backup_Gmail_with_getma...](https://wiki.archlinux.org/index.php/Backup_Gmail_with_getmail)

~~~
zoobert
It is only one way: Gmail to disk. When you restore the emails to a Gmail
account you will lose part of the info (labels, ....).

------
villagefool
"After that Gmvault will automatically authenticate itself using the
credentials stored in $HOME/.gmvault (or %HOME%/.gmvault for Windows)." How
are the credentials stored offline?

~~~
hmottestad
Probably as clear text. You can delete the file if you want to.

~~~
zoobert
Nope, they are stored encrypted with a randomly generated key.

------
csabapalfi
How is this better than <http://code.google.com/p/got-your-back/> which does
almost exactly the same thing?

------
bvi
When restoring a gmail account (say I want to upload all the backed-up emails
to a new account), would the date information be preserved as well?

~~~
zoobert
Yes of course. Date, Gmail labels and IMAP flags (READ, UNREAD, ...) are
preserved. Otherwise it is a bug so please refer to
<http://gmvault.org/report_pb.html>

~~~
bvi
Another question: Does gmvault work behind a proxy?

~~~
zoobert
If the proxy lets you connect to the Gmail IMAP server, I don't see why it
wouldn't work.

------
bob87
What is the switch to encrypt the archives? gmvault -e? I haven't been able to
find it in the documentation.

------
Drbble
Why would a techie need this product? We know how to turn on POP3 in a mail
program.

------
LyleK
Can this store mail in mbox format? In what header do the gmail labels go?

~~~
callahad
The source is pretty easy to read. Here's how it's working:
[https://github.com/gaubert/gmvault/blob/master/src/gmv/gmvau...](https://github.com/gaubert/gmvault/blob/master/src/gmv/gmvault.py#L38)

(It stores each mail in two separate files: a .eml file with the message body,
and a .meta file with metadata.)

~~~
zoobert
I will take that as a compliment. Yes files are all stored individually as
.eml file. All the extra Gmail info (labels, ...) goes in .meta

------
pdk
Is there a way to make Thunderbird or Mac's Mail do the same thing?

~~~
camiller
Yes, but it won't keep the gmail labels etc. And I doubt it would do a restore
to gmail.

~~~
probably
Ah, so this is the value-added proposition.

------
waynemr
does this have any issues when you have 2-part authentication configured for
Gmail? Well, I guess I'll find out in a second and report back :)

------
ljosa
How does this compare to mbsync?

~~~
zoobert
mbsync is an IMAP sync tool. Gmvault is a dedicated Gmail tool, so it
preserves all Gmail-specific attributes like the labels, ... and allows you to
restore all your emails to a Gmail account. With mbsync you might be able to
upload your emails to a gmail account, but you would then lose lots of
information and your restored gmail account would end up being very different.

------
fogol
Is there any data transmitted to servers other than Google's?

~~~
zoobert
Nope, of course not. You can check the sources to be sure that Gmvault is only
talking to Gmail. You can also use XOAuth (token security mechanism) to allow
Gmvault to access your account. This is, by the way, the recommended way to
authenticate.

------
creativityhurts
OP: I understand I will collect down votes with this comment but, seriously,
you gave credit to everybody in the footer except to Twitter Bootstrap, which
you didn't even bother to personalize. Is that part of the brogrammer code,
guzzle down some redbulls, fire up bootstrap and don't even bother about it?

Invest a bit of time in the looks of your service/startup; if it looks like
the next service/startup, it's just another Bootstrap themed website that
people (I assume) are tired of.

~~~
zoobert
Uuuuh I think there is a scale issue. This is an open source tool that I am
willing to support. I do not intend to create a start-up here. Still you are
right and I will reference bootstrap.

~~~
creativityhurts
I understand that and it's pretty awesome. Have the frontend on Github and I
can help you tweak it out. Open source stuff should look awesome as well.

~~~
zoobert
It is available on the gh-pages branch of the Gmvault github repository
(<https://github.com/gaubert/gmvault/tree/gh-pages>). Pull it and submit your
suggestions. Many thanks.

------
jenius
Crappy design + twitter bootstrap default = tab closed. If you are serious
about releasing a product ffs make sure it's polished. If the attention to
detail I see in the splash page is similar to the attention to detail in the
actual product (which is what I assume automatically), my faith in it is close
to zero.

This is ESPECIALLY important when I'm authorizing you with access to my emails
-- and there is no way I would do that for someone that I assume is not
serious about their product.

~~~
swaits
You left out the part where you were swirling your wine in its glass and
adjusting your top hat while angling your nose sharply toward the sky.

HN is a place where lots of launches happen. A place where open mindedness is
a plus. Seriously.. if this is actually your attitude, you might miss some
cool things. Take a step away from the hive mindset every now and then.

~~~
jenius
There's no hive mindset here - I just looked at the page, noticed that it had
really poor and sloppy design, and assumed that it was a poor and sloppy
product. It's called 'judging a book by its cover' and is something that
everyone does automatically (regardless of the classic advice).

Craigslist aside, you'll be hard pressed to find a successful product that has
no attention paid to design whatsoever. Design is important, and it makes me
really sad to see it neglected, and so many people supporting not caring about
design here (as indicated by comments like this and downvotes)

~~~
icebraining
Design is important, but this project has a decent design, your prejudice
against Bootstrap -which I'm pretty sure most people don't share-
notwithstanding.

The typography is decent, the utility of the project is immediately understood
(without even having to scroll), the call to action is present and it's two
clicks from the main page to having the application downloading.

You're advocating for "prettiness," which is frankly the least important part
of design.

