The one that got me was spam filtering. Google does a pretty good job at that, a...

mike-cardwell · on Feb 4, 2012

It's really easy to install SpamAssassin and point your MTA at it. Even the default configuration is highly effective.

acdha · on Feb 5, 2012

It's still a service which you have to maintain and there's constant config & upgrading required since some spammers are smart enough to test their messages with it first.

I've run SA professionally for a modest number (low hundreds) of users. It works reasonably well, but it's a job, and I'm not paid to do it personally.

nupark2 · on Feb 5, 2012

SpamAssassin will auto-update its rules if properly configured. I haven't touched my SA install in years.

mike-cardwell · on Feb 5, 2012

nupark2 is correct. There's less maintenance involved in running SpamAssassin than there is in running Firefox.

davidw · on Feb 4, 2012

I did that, back in the day, but it just wasn't enough.

pyre · on Feb 4, 2012

One could always mooch off of Gmail's spam filtering... Use SpamAssassin in conjunction with a gmail account that is only used as an attempt to attract as much spam as possible. Then just download the spam folder over IMAP and use it to teach SpamAssassin...
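For anyone who wants to try the idea above, here is a minimal sketch in Python. It assumes the honeypot account exposes its spam folder over IMAP as `[Gmail]/Spam` (the name can differ by locale), that `sa-learn` is on the PATH, and that it accepts a single message on stdin. The function names `fetch_spam` and `teach_spam` are made up for illustration.

```python
import imaplib
import subprocess


def fetch_spam(host, user, password, folder="[Gmail]/Spam"):
    """Return the raw bytes of every message in the given IMAP folder.

    The folder name is an assumption; check what your account actually exposes.
    """
    messages = []
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        # Open read-only so the honeypot mailbox itself is left untouched.
        conn.select(f'"{folder}"', readonly=True)
        _status, data = conn.search(None, "ALL")
        for num in data[0].split():
            _status, parts = conn.fetch(num, "(RFC822)")
            messages.append(parts[0][1])
    return messages


def teach_spam(messages, run=subprocess.run):
    """Pipe each raw message to `sa-learn --spam`; return how many succeeded.

    `run` is injectable so the function can be exercised without SpamAssassin.
    """
    learned = 0
    for raw in messages:
        result = run(["sa-learn", "--spam"], input=raw)
        if result.returncode == 0:
            learned += 1
    return learned
```

Something like `teach_spam(fetch_spam("imap.gmail.com", user, password))` run from cron would be the whole pipeline. Nothing here addresses whether one account attracts a representative sample of spam.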

antics · on Feb 5, 2012

Respectfully, no.

The primary issue in spam filtering is not that we don't have training corpora, it's that spammers are very efficient at finding holes in your system.

One example is when the Hotmail team first enabled keyword filtering. When the spammers found out experimentally, they began injecting HTML comment tags into high-weighted words like "free", breaking the model. When the Hotmail team took steps to combat this problem, the amount of spam that employed this technique dropped from 5% to close to 0%[1] in a matter of days.

Spam detection is complicated and hard.

[1] Hulten, G., Penta, A., Seshadrinathan, G., and Mishra, M. Trends in spam products and methods.

gnosis · on Feb 5, 2012

"The primary issue in spam filtering is not that we don't have training corpora, it's that spammers are very efficient at finding holes in your system."

Finding holes on sites like gmail and hotmail is actually much easier than on private sites, because the spammers no doubt have accounts on gmail and hotmail and can test what gets through and what fails and tweak their algorithms until their spam gets through.

Spammers don't have that luxury on private servers, so they have to spam blindly. So in this respect private servers have an advantage over gmail and hotmail.

On the other hand (as has been pointed out many times in this thread), gmail and hotmail have the advantage of advanced spam detection algorithms and virtually instantaneous feedback from millions of users.

a3_nm · on Feb 5, 2012

> So in this respect private servers have an advantage over gmail and hotmail.

Besides, spammers care more about being able to circumvent Gmail's spam filtering than they care about circumventing your own system.

kragen · on Feb 5, 2012

That's a perfect example of how spam detection is harder for a centralized server operator like Hotmail or Gmail than for somebody who's running their own server: if I were to add a delete-HTML-comment-tag preprocessing phase to my own copy of SpamAssassin, spammers would have a difficult time detecting that, and so would not be able to adapt. (Because none of the dozen or so people who can get mail on the server are spammers.)
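A rough sketch of what such a preprocessing phase might look like, written as standalone Python rather than as an actual SpamAssassin plugin (those are written in Perl). The `contains_keyword` check is a deliberately naive stand-in for a keyword filter, not how SpamAssassin scores mail, and both function names are hypothetical.

```python
import re

# Matches an HTML comment, including ones that span several lines.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_html_comments(body: str) -> str:
    """Remove HTML comments so words split by them are rejoined."""
    return HTML_COMMENT.sub("", body)


def contains_keyword(body: str, keywords=("free",)) -> bool:
    """Naive keyword filter: True if any keyword appears in the body."""
    text = body.lower()
    return any(word in text for word in keywords)


obfuscated = "Get your fr<!-- x -->ee prize now"

# The raw body slips past the naive filter because "free" is split in two.
missed = contains_keyword(obfuscated)
# After preprocessing, the keyword is visible again.
caught = contains_keyword(strip_html_comments(obfuscated))
```

The point of doing this on a private server is that the step is invisible from outside: a spammer without an account on the box has no way to observe that the trick stopped working.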

mike-cardwell · on Feb 5, 2012

SpamAssassin takes care of this for you by automatically downloading new filtering rules. You host the spam filtering software, but the actual rules are out-sourced by default. It even includes configuration for multiple DNSBLs, DNSWLs and RHSBLs by default. You can add your own local rules if absolutely necessary too.

pyre · on Feb 5, 2012

I'm not saying it's perfect. Obviously having a network effect allows you to do things like flag messages that are exact duplicates over multiple sent-from and sent-to addresses.

I've seen this happen where 1 or 2 messages hit my inbox, but the next 20 or so are in my spam folder.
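To make the duplicate-flagging idea concrete, here is a toy sketch of how a large provider could do it. This is purely an illustration under my own assumptions: the `DuplicateTracker` class, the SHA-256 exact-match digest, and the threshold are all invented, and real systems presumably use fuzzier matching than an exact hash.

```python
import hashlib
from collections import defaultdict


class DuplicateTracker:
    """Flag a body once it has been seen from and to enough distinct addresses."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        # digest -> (set of senders, set of recipients)
        self._seen = defaultdict(lambda: (set(), set()))

    def observe(self, sender: str, recipient: str, body: str) -> bool:
        """Record one delivery; return True if the body now looks like bulk mail."""
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        senders, recipients = self._seen[digest]
        senders.add(sender)
        recipients.add(recipient)
        return (len(senders) >= self.threshold
                and len(recipients) >= self.threshold)
```

This also matches the behaviour described above: the first copies are delivered because the counts are still below the threshold, and later identical copies get flagged. A single private server rarely sees enough distinct senders and recipients for this to trigger.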

coopdog · on Feb 5, 2012

Gmail's spam filters are so good because they have a network effect going, though. I doubt one spam folder would achieve much; you need a much bigger sample of spam than one account could ever attract.

alanh · on Feb 7, 2012

Pretty sure you are making some unwarranted assumptions here – other commenters mention that SpamAssassin can download updates to itself regularly, so you aren’t starting from scratch with a completely untrained Bayesian algorithm.

alextingle · on Feb 5, 2012

I use SpamAssassin server side, and Thunderbird's own filter polishes off the rest. I see perhaps 1 spam mail per week (from the literally tens of thousands that hit my server).