Elasticsearch For Beginners: Indexing your Gmail Inbox (github.com)
221 points by antoaravinth on Dec 18, 2017 | 20 comments

This is a great "get to know ES" use case for Elasticsearch! It frees you a bit from relying on Google's search while also teaching you how to use queries and aggregations. Awesome work. I'll give this a shot later.
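For anyone new to ES, here is a minimal sketch of the kind of query plus aggregation meant here, written as a Python dict in the Elasticsearch query DSL. The field names `subject`, `body` and `labels` are assumptions about how the mail was indexed, not the project's actual mapping.

```python
import json

# Hypothetical field names: "subject", "body" and "labels" are assumptions
# about how the messages were indexed, not the project's actual mapping.
def build_search(text, top_labels=10):
    """Full-text search plus a count of matching messages per label."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": ["subject^2", "body"],  # weight subject matches higher
            }
        },
        "aggs": {
            "by_label": {
                # terms aggregations need a keyword (not analyzed text) field
                "terms": {"field": "labels", "size": top_labels}
            }
        },
        "size": 20,  # number of hits to return alongside the aggregation
    }

if __name__ == "__main__":
    # Send this as the body of POST /<index>/_search
    print(json.dumps(build_search("invoice"), indent=2))
```

The hits answer "which messages match", while the `by_label` buckets answer "where do they live", which is the part Gmail's own search doesn't show you.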

Thanks for putting this together.

Maybe the Thunderbird guys could use it because the search remains the worst part of using Thunderbird after all these years.

Worth noting that if you like the idea of indexing your mail, there's also Notmuch. It's a dedicated email search and indexing tool and is very nice when paired with something like OfflineIMAP to sync messages.


Gmail users wanting to try notmuch probably want something like https://github.com/gauteh/gmailieer for seamless tag<->label integration.

Ohhh, this is exactly what I want, minus emacs.

Not sure if you're saying you want emacs or don't, but...

* You want emacs - https://notmuchmail.org/notmuch-emacs/

* You don't want emacs - https://github.com/pazz/alot (I use this one, it's very nice)

Also Mutt - https://notmuchmail.org/notmuch-mutt/

Thanks for the links!

What I really want is mutt's UI, but with a SQLite3/PostgreSQL backend, without mutt iterating over the whole mailbox when opening it, and with an async IMAP client that reconnects as needed.

Similar concept and similar speed, but notmuch is a little more actively developed. I also find the notmuch command-line interface a bit easier to use, and the various tools built on top of it better. Alot, the terminal UI I mentioned in the parent comment, is almost exactly what I want in a mail program.

> First, go here and download your Gmail mailbox…

> The downloaded archive is in the mbox format

So really, these instructions are for indexing any mbox-formatted mailbox.
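Since the input is just an mbox file, the indexing side can be sketched with nothing but Python's standard library. This is a minimal sketch, not the linked project's code: it reads any mbox with `mailbox.mbox`, keeps only text/plain parts, and emits newline-delimited JSON for Elasticsearch's `_bulk` endpoint. The index name `mail` and the document fields are hypothetical.

```python
import json
import mailbox
from email.header import decode_header, make_header
from email.utils import parsedate_to_datetime

INDEX = "mail"  # hypothetical index name

def header_text(msg, name):
    """Decode RFC 2047 encoded headers (=?UTF-8?...?=) into plain text."""
    value = msg[name]
    if value is None:
        return ""
    try:
        return str(make_header(decode_header(value)))
    except (LookupError, UnicodeError):  # unknown or broken charset
        return str(value)

def plain_body(msg):
    """Concatenate all text/plain parts; HTML parts and attachments are skipped."""
    parts = []
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        if part.get_content_disposition() == "attachment":
            continue
        payload = part.get_payload(decode=True)  # bytes, transfer-decoded
        if payload:
            charset = part.get_content_charset() or "utf-8"
            try:
                parts.append(payload.decode(charset, errors="replace"))
            except LookupError:  # charset name Python doesn't recognize
                parts.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(parts)

def to_document(msg):
    """Turn one message into a JSON-serializable document."""
    try:
        date = parsedate_to_datetime(msg["date"]).isoformat()
    except (TypeError, ValueError):
        date = None  # missing or unparseable Date header
    return {
        "from": header_text(msg, "from"),
        "to": header_text(msg, "to"),
        "subject": header_text(msg, "subject"),
        "date": date,
        "body": plain_body(msg),
    }

def mbox_to_bulk(path):
    """Yield NDJSON lines for the _bulk API: an action line, then the source."""
    for msg in mailbox.mbox(path):
        yield json.dumps({"index": {"_index": INDEX}})
        yield json.dumps(to_document(msg))
```

Each yielded line, followed by a newline (the bulk API also requires one after the last line), can be sent as the body of `POST /_bulk` with `Content-Type: application/x-ndjson`. Nothing here is Gmail-specific, which is the point being made.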

> So really, these instructions are for indexing any mbox-formatted mailbox.

It may not be what you expected, but this is wrong.

The instructions tell you how to download your Gmail emails to mbox format, so they are instructions on how to index your Gmail emails.

In a way, though the instructions specifically handle Gmail labels, which aren't present in other mbox files. Otherwise it's pretty general for any mbox email dump.
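The Gmail-specific part amounts to reading one extra header. A minimal sketch, assuming the Takeout export stores labels comma-separated in an `X-Gmail-Labels` header (check your own export); `gmail_labels` is a hypothetical helper name.

```python
from email import message_from_string

def gmail_labels(msg):
    """Return the Gmail labels of a message from a Takeout mbox.

    Assumes the labels are comma-separated in an X-Gmail-Labels header;
    messages without that header (e.g. from other providers) return [].
    """
    raw = msg.get("X-Gmail-Labels")  # header lookup is case-insensitive
    if raw is None:
        return []
    return [label.strip() for label in str(raw).split(",") if label.strip()]

if __name__ == "__main__":
    sample = message_from_string(
        "X-Gmail-Labels: Inbox,Important,Category Personal\n\nbody\n"
    )
    print(gmail_labels(sample))
```

Drop the result into a `labels` field on each document and the rest of the pipeline stays generic mbox handling.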

Also, any new emails you received since exporting your mailbox will not be available.

I mean, isn't this obvious if you're fetching a file representing your inbox?

Yes. I'm just adding to Cyberdog's comment that it's not immediately obvious from the title of the article that the instructions are for indexing a static file.

The title makes it sound like instructions for setting up an alternative API to Gmail search. I was thinking of something like Algolia.

What is that http://ohardt.us/download-gmail-mailbox URL where you're supposed to download your email? It looks fishy, though the hostname doesn't even resolve, so I'm not sure what's going on.

> Prerequisites: Set up Elasticsearch

More like: first, get yourself a host with at least 16 GB of RAM.

This is like using a jackhammer to nail in a pin, isn't it? What's the benefit other than faster search?

It's a teaching tool. The benefit is to show someone how to use ES for a real-world thing.

That said, I suspect that with tuning based on your search patterns and usage, you could get more accurate search results, because you control the indices, stop words, etc.
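To make the tuning point concrete: controlling the index means you choose the analyzer. A minimal sketch of an index-creation body, assuming Elasticsearch 7+ (typeless mappings); the analyzer name `mail_text`, the extra stop words and the field names are hypothetical choices for illustration.

```python
import json

# Hypothetical extra stop words for mail: reply/forward prefixes that show up
# in many subjects and add little to relevance ranking.
MAIL_STOPWORDS = ["re", "fw", "fwd"]

def index_body():
    """Index settings + mappings in the typeless Elasticsearch 7+ format."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "english_stop": {"type": "stop", "stopwords": "_english_"},
                    "mail_stop": {"type": "stop", "stopwords": MAIL_STOPWORDS},
                },
                "analyzer": {
                    "mail_text": {
                        "type": "custom",
                        "tokenizer": "standard",
                        # lowercase first: stop filters are case-sensitive by default
                        "filter": ["lowercase", "english_stop", "mail_stop"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "subject": {"type": "text", "analyzer": "mail_text"},
                "body": {"type": "text", "analyzer": "mail_text"},
                "labels": {"type": "keyword"},  # exact values for terms aggs
                "date": {"type": "date"},
            }
        },
    }

if __name__ == "__main__":
    # Send as the body of PUT /<index> before indexing any messages
    print(json.dumps(index_body(), indent=2))
```

Whether dropping "re"/"fwd" actually helps depends on how you search; that trade-off is exactly what you can't adjust in a hosted search box.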
