

Ask HN: How to mine my inbox.  - jman1

I use Gmail and over the years have accumulated a lot of mail, some that I'd like to keep and some that I'd like to get rid of. The issue is that I have about 5000 emails, and going through them one by one is not an option. I would like to know if there are any tools that will allow me to mine my inbox, or if there is a better way to deal with the problem of having a big inbox. If the inbox keeps growing it just becomes "noise", and from that point on having or not having that data (email) makes no difference.
======
cstrat
You can use the filter tool to set up rules, one at a time, that apply labels
to emails from certain senders. Having labels makes searching and sorting much
easier.

~~~
jman1
Any suggestions on how I can get a list of the distinct email addresses I have
received emails from? Kind of like a SELECT DISTINCT on a DB table?

~~~
hakaaak
Exporting and importing contacts as csv:
[http://support.google.com/mail/bin/answer.py?hl=en&answe...](http://support.google.com/mail/bin/answer.py?hl=en&answer=77259)

You could then manipulate that file to remove duplicates. Either do it by hand
in a spreadsheet (sort the column, then fill an A2=A1 type formula down to flag
dupes), or copy the relevant column to a text file and run it through sort and
uniq in *nix:
<http://linux.about.com/library/cmd/blcmdl1_uniq.htm>
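If you'd rather skip the spreadsheet step, here is a rough Python sketch of the same "sort and uniq" idea applied directly to the exported CSV. The function name `distinct_emails` is made up for illustration, and the column header `E-mail 1 - Value` is an assumption about what the export contains; check the first line of your own file and pass the right name.

```python
import csv
import io


def distinct_emails(csv_text, column="E-mail 1 - Value"):
    """Return sorted, de-duplicated addresses from one column of a contacts CSV.

    `column` is an assumed header name; adjust it to match your export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = set()
    for row in reader:
        # Normalise case and whitespace so "Ann@x.com" and "ann@x.com " collapse.
        value = (row.get(column) or "").strip().lower()
        if value:
            seen.add(value)
    return sorted(seen)


if __name__ == "__main__":
    sample = (
        "Name,E-mail 1 - Value\n"
        "Ann,ann@example.com\n"
        "Ann B,ANN@example.com\n"
        "Bob,bob@example.com\n"
    )
    for address in distinct_emails(sample):
        print(address)
```

For a real file you would read it with `open(path, newline="", encoding="utf-8").read()` and pass the text in.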

Merging and mass merging Gmail contacts:
[http://support.google.com/mail/bin/answer.py?hl=en&answe...](http://support.google.com/mail/bin/answer.py?hl=en&answer=165334)

You could also use Thunderbird. A few t-bird plugins will let you do things
like remove dupes and sync with Gmail.

As for cleaning your inbox though, I would see this as an opportunity to write
a script in Ruby that uses IMAP to automate your scrape and purge:
[http://www.ruby-doc.org/stdlib-1.9.3/libdoc/net/imap/rdoc/Ne...](http://www.ruby-doc.org/stdlib-1.9.3/libdoc/net/imap/rdoc/Net/IMAP.html)
Or whatever other language you'd like to use. I just like Ruby, but since it
might involve a lot of text parsing, maybe Perl would be a good choice:
[http://search.cpan.org/~djkernen/Mail-IMAPClient/IMAPClient....](http://search.cpan.org/~djkernen/Mail-IMAPClient/IMAPClient.pod)
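
To make the "scrape" half concrete, here is a minimal sketch in Python (swapped in for Ruby or Perl, using the standard library's `imaplib`) that pulls only the From header of each message and counts senders, which is roughly the "select distinct" asked about above. The helper names are invented for this example. It assumes IMAP access is enabled on the account, that `imap.gmail.com` is the right host, and that you have credentials the server accepts; it opens the mailbox read-only and deletes nothing.

```python
import email
import imaplib
from collections import Counter
from email.utils import parseaddr


def sender_of(header_bytes):
    """Extract the lowercased address from a raw header block's From line."""
    msg = email.message_from_bytes(header_bytes)
    return parseaddr(str(msg.get("From", "")))[1].lower()


def count_senders(header_blocks):
    """Count messages per distinct sender address."""
    counts = Counter()
    for block in header_blocks:
        addr = sender_of(block)
        if addr:
            counts[addr] += 1
    return counts


def fetch_from_headers(user, password, host="imap.gmail.com", mailbox="INBOX"):
    """Yield the raw From header of every message in the mailbox.

    Host and mailbox defaults are assumptions; the mailbox is opened read-only.
    """
    conn = imaplib.IMAP4_SSL(host)
    try:
        conn.login(user, password)
        typ, data = conn.select(mailbox, readonly=True)
        if typ != "OK" or int(data[0]) == 0:
            return
        # BODY.PEEK fetches the header without marking messages as read.
        typ, parts = conn.fetch("1:*", "(BODY.PEEK[HEADER.FIELDS (FROM)])")
        for part in parts:
            if isinstance(part, tuple):
                yield part[1]
    finally:
        conn.logout()


if __name__ == "__main__":
    import getpass

    user = input("Address: ")
    headers = fetch_from_headers(user, getpass.getpass("Password: "))
    for address, n in count_senders(headers).most_common(20):
        print(n, address)
```

Sorting by count puts the noisiest senders first, which are usually the best candidates for a filter or a bulk delete. The purge step is left out on purpose; test anything destructive on a throwaway label first.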

And then GPLv3 your script, put it on GitHub, use a default GitHub template to
create a nice looking site for it, and post the link back to HN with the cool
doc saying how to use it.

