
Have there been any email clients/plugins that successfully used Bayesian filtering to sort non-spam email into different categories? For example, is there anything out there that would let Fred Wilson automatically tag all the emails he gets from entrepreneurs asking him to check out their sites and invest in their companies?

Well, there was this thing I wrote like 10 years ago called POPFile (http://getpopfile.org/) when I had the problems that he is talking about. I haven't looked back.

POPFile grew from POP only to lots of other protocols and many, many people still use it.

Wait, you're THAT jgrahamc?! Whoa, small world. I nearly invited you to speak in Japan about POPFile and OSS at the technology incubator that was my ex-ex-day job, but our budget fell through. As far as I know our not-quite-ready-for-primetime POPFile/Outlook integration is still protecting a few hundred desktops next to a rice field.

Whoa. Are you talking about QuickPOPFile?

I would have happily come to Japan and spoken, I've been there before and always loved the place.

QuickPOPFile was not quite us, but the general idea to make it useful for non-technical folks was similar. (We were going to aggregate filtering across an organization, on the assumption that most folks couldn't be bothered to train emails. Turns out this is easier to do if you are Google than if you want to run a parallel mail infrastructure alongside the one that actually delivers mail.)

POPFile is awesome. Definitely one of the most useful email apps I've ever used. Thanks for all your work behind it.

You are most welcome and I'll pass on your praise to the people who keep POPFile alive and growing today.

Yes, there is. Opera Mail, the mail client integrated into Opera, does this: "Learn from messages added to or removed from filter: allow Opera Mail to train itself into recognizing which messages belong in the filter, and which do not. This can act as a substitute for adding rules, or in addition to the rules. It learns from the messages you remove or add." http://www.opera.com/browser/tutorials/mail/sort/
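For anyone wondering what "learns from the messages you remove or add" might look like in code, here is a minimal sketch of a multi-bucket naive Bayes classifier with train and untrain operations. It is not Opera's or POPFile's actual implementation; the class name `BucketClassifier`, the crude tokenizer, and the add-one smoothing are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class BucketClassifier:
    """Hypothetical multinomial naive Bayes over named buckets."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # bucket -> word -> count
        self.message_counts = Counter()          # bucket -> messages trained

    def train(self, bucket, text):
        """Called when the user adds a message to a bucket."""
        self.word_counts[bucket].update(tokenize(text))
        self.message_counts[bucket] += 1

    def untrain(self, bucket, text):
        """Called when the user removes a message from a bucket."""
        self.word_counts[bucket].subtract(tokenize(text))
        self.word_counts[bucket] += Counter()  # drops zero and negative counts
        self.message_counts[bucket] = max(0, self.message_counts[bucket] - 1)

    def classify(self, text):
        """Return the most likely bucket, or None if nothing is trained."""
        buckets = [b for b, n in self.message_counts.items() if n > 0]
        if not buckets:
            return None
        total = sum(self.message_counts[b] for b in buckets)
        vocab = set()
        for b in buckets:
            vocab.update(self.word_counts[b])
        words = tokenize(text)
        scores = {}
        for b in buckets:
            counts = self.word_counts[b]
            # Add-one smoothing, with one extra slot reserved for unseen words.
            denom = sum(counts.values()) + len(vocab) + 1
            score = math.log(self.message_counts[b] / total)
            for word in words:
                score += math.log((counts[word] + 1) / denom)
            scores[b] = score
        return max(scores, key=scores.get)
```

Moving a message between buckets would then be an `untrain` on the old bucket followed by a `train` on the new one. A real client would also need to persist the counts and tokenize headers, not just body text.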

I love this feature.

It is also possible to use POPFile with any client and IMAP server. http://getpopfile.org/

We're doing this at OtherInbox.com with our Organizer solution. I saw another company doing this in the TC50 demo pit last year but I can't remember the name.

Most of our Organizer logic starts with heuristic matches on the From address and other headers in the message, but we're also testing out Bayesian filtering on the message bodies to identify other traits and train the system.
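A rough sketch of that two-stage idea (header heuristics first, a body classifier only when no rule fires) could look like the following. This is not OtherInbox's code; the function names, the rule format, and the example addresses are hypothetical, and the `fallback` argument stands in for whatever Bayesian classifier is being tested.

```python
from email import message_from_string
from email.utils import parseaddr


def plain_text_body(msg):
    """Return the text/plain content of a parsed message as one string."""
    if msg.is_multipart():
        parts = [p.get_payload() for p in msg.walk()
                 if p.get_content_type() == "text/plain"]
        return "\n".join(parts)
    return msg.get_payload()


def label_message(raw, rules, fallback):
    """Label a raw RFC 822 message.

    rules    -- list of (header name, substring, label); hypothetical format
    fallback -- callable taking the body text and returning a label
    """
    msg = message_from_string(raw)
    for header, needle, label in rules:
        value = msg.get(header)
        if value is None:
            continue
        if header.lower() == "from":
            # Match against the bare address, not the display name.
            value = parseaddr(value)[1]
        if needle.lower() in value.lower():
            return label
    # No header rule matched, so defer to the body classifier.
    return fallback(plain_text_body(msg))
```

The appeal of this ordering is that cheap, predictable rules handle the bulk senders, and the statistical classifier only has to deal with the messages the rules cannot place.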

Joshua Baer | @joshuabaer

Gmail has some features that work like that.

That's not true.

Gmail has good filters, but it does not have Bayesian learning (apart from the spam filtering).

Ah ok, so labeling is only based on the filter contents and not on previous emails that were labeled with a certain tag.

That's weird, they use the exact same tech for the spam, why not apply it to the 'good' mail as well?

Perhaps I should start selling POPFile + Gmail as a service? I wonder how much people would pay for me to have POPFile magically label their Gmail messages for them. Especially since Gmail allows IMAP access via OAuth.

I once had a computer running all the time to run POPFile and filter my Gmail messages through IMAP.

At that time POPFile+IMAP was not very stable and POPFile would crash often. It was also a pain to keep a computer running all the time at home just for this.

Then there was another problem. POPFile would run every two or five minutes, so when I opened Gmail I would often find messages in the inbox that had not been filtered yet.

If you make a web service that does this right, I will consider using it. Right now I'm using Opera Mail, which is very nice, but the training is local, so I don't have the same filters and training at home and at the office, which is sometimes rather painful.

I'm sure there is a successful business behind this.

Well, I probably wouldn't use POPFile itself for this anyway because I have a blindingly fast C version of the same classification techniques called polymail which I license to people. Would just have to graft on the IMAP part. Hmm.

Evening project?

Clearly, the not-yet-filtered bit is a nuisance. That's why you really want it to be built into your mail client.

The good thing is that Gmail's IMAP now supports OAuth, so people might be much less reluctant to give you access to their mailboxes.

Yes, the nuisance is there, but I'm sure there is a way to solve it. The good part is that it does not make the service unusable, so people can start using it despite the nuisance.

Release early and then iterate. ;-)

I guess Gmail did not release this feature because it might confuse some people. At first, when the filters are not well trained, there are many errors, so I guess many people would not understand why and would think that Gmail does not work well.

Maybe they should make it a 'labs' feature then?

Yes. That's it.
