I remember reading this at the time, installing one of the implementations that immediately popped up (SpamAssassin), and finally having my spam separated from my ham.
Edit: I had not considered until now the possibility that the "IKEA effect" might have made me overestimate the quality of filtering because of the effort that I put into training the classifier!
In the past year I have gotten >1,750 mails from recruiters, most of them from unique addresses. I don't want to mark them as "Spam" because this is a type of "Spam" that I want to keep so I can refer to it later. I'd also like to un-train "not-so-Spammy" messages so I can see the jobs I'd be interested in, but I'm afraid of false positives making these harder to retain.
Recently I started to give a different email to each service so that I can see if any get compromised, but so far I don’t think I’ve noticed anything like that.
I borrowed someone’s workaround where you create a filter that excludes emails matching a specific random UUID, which no email would match, with action “never send to spam”. Perhaps there’s now a more straightforward option.
If it weren't true, our spam folders would contain thousands of messages.
My spam folder receives no more than a dozen messages per month, and I know that there are tens of thousands of attempts to send me spam.
I’m pretty sure the recruiters’ emails peterwwillis mentioned would not be bad enough to be eliminated before reaching the spam folder.
You could use popfile, it supports moving mail to named folders, which isn't the same as just applying a label, but works. http://getpopfile.org
Steve Yegge's various blogs (https://sites.google.com/site/steveyegge2/blog-rants, https://steve-yegge.blogspot.com/, https://firstname.lastname@example.org)
https://waitbutwhy.com/ (bit cheesy but alright).
Anyone here working on anti-spam?
Where are things at today?
Today the situation has flipped. Most of the spam we get comes from authoritative servers (i.e. gmail, yahoo, etc.), making stuff like SPF/DKIM/etc. next to worthless from a spam perspective (it's still marginally useful against forgeries), while Bayesian (or, in general, trainable) filters are essentially the only thing that can differentiate it reliably.
With a modern setup, you can basically get next to zero spam and no false positives. In fact, honest email marketing (i.e. mailing lists you've actually subscribed to) is, in my experience, the only thing that throws these filters off.
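For anyone who hasn't seen a trainable filter up close, here is a rough Python sketch of the idea. It is a simplified naive Bayes scorer with add-one smoothing over the tokens present in a message, not the exact formula from the essay and not what SpamAssassin does; the class and method names are made up for illustration.

```python
import math
import re
from collections import Counter


class NaiveBayesFilter:
    """Toy trainable filter (hypothetical names, illustration only)."""

    def __init__(self):
        # Per-label count of messages that contained each token.
        self.counts = {"spam": Counter(), "ham": Counter()}
        # Per-label count of trained messages.
        self.docs = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text):
        # Each distinct token counts once per message.
        return set(re.findall(r"[a-z0-9$'-]+", text.lower()))

    def train(self, text, label):
        self.docs[label] += 1
        self.counts[label].update(self.tokens(text))

    def spam_probability(self, text):
        # Start from the smoothed prior odds, then add per-token evidence.
        log_odds = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for tok in self.tokens(text):
            p_spam = (self.counts["spam"][tok] + 1) / (self.docs["spam"] + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (self.docs["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp so exp() cannot overflow on very long messages.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))


f = NaiveBayesFilter()
f.train("cheap pills buy now", "spam")
f.train("buy cheap watches now", "spam")
f.train("meeting agenda for monday", "ham")
f.train("lunch on monday?", "ham")
print(f.spam_probability("buy cheap pills"))
print(f.spam_probability("monday meeting"))
```

The point is only that every message you tag shifts the per-token counts, which is why a personally trained filter can separate mail that looks identical to SPF/DKIM.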
For example, we use our own https://github.com/ronomon/mime to detect and reject email which has missing multi-parts (no terminating boundary delimiter). All of this has been spam so far, and we are yet to see a false positive. I don't think SpamAssassin has a rule for this (yet)?
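To make the missing-terminator check concrete, here is a minimal Python sketch using only the standard library. It is not the ronomon/mime API (that is a JavaScript library); the function name is hypothetical, and it only inspects the top-level multipart, not nested parts.

```python
import email
import re


def missing_closing_boundary(raw: bytes) -> bool:
    """Return True if a top-level multipart message never closes its boundary.

    Assumption: "missing multi-parts" is read here as "no `--boundary--` line".
    """
    msg = email.message_from_bytes(raw)
    if msg.get_content_maintype() != "multipart":
        return False  # nothing to check for single-part mail
    boundary = msg.get_boundary()
    if boundary is None:
        return True  # multipart with no boundary parameter is malformed too
    closing = re.compile(
        rb"^--" + re.escape(boundary.encode("ascii", "replace")) + rb"--[ \t]*\r?$",
        re.MULTILINE,
    )
    return closing.search(raw) is None


good = (
    b'Content-Type: multipart/mixed; boundary="b1"\r\n\r\n'
    b"--b1\r\nContent-Type: text/plain\r\n\r\nhi\r\n--b1--\r\n"
)
truncated = (
    b'Content-Type: multipart/mixed; boundary="b1"\r\n\r\n'
    b"--b1\r\nContent-Type: text/plain\r\n\r\nhi\r\n"
)
print(missing_closing_boundary(good), missing_closing_boundary(truncated))
```

Whether to reject outright or just add to a score is a policy choice; the comment above describes rejecting.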
Another example is illegal header characters, which are almost always spam, with a handful of false positives (usually machine-generated).
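A header check along these lines could look like the Python sketch below. What counts as "illegal" is my assumption: control characters other than TAB, DEL, and any raw 8-bit byte in the header section. Mail sent with internationalized headers (SMTPUTF8) legitimately contains 8-bit bytes, so this is better treated as a scoring signal than a hard reject. The function name is hypothetical.

```python
import re

# Assumed definition of "illegal": C0 controls except TAB/CR/LF, DEL, and bytes >= 0x80.
_ILLEGAL = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\xff]")


def illegal_header_lines(raw: bytes) -> list:
    """Return (line_number, line) pairs for header lines containing suspect bytes."""
    # The header section ends at the first blank line.
    head = re.split(rb"\r?\n\r?\n", raw, maxsplit=1)[0]
    hits = []
    for lineno, line in enumerate(head.splitlines(), start=1):
        if _ILLEGAL.search(line):
            hits.append((lineno, line))
    return hits


sample = b"From: a@example.com\r\nSubject: hel\x00lo\r\n\r\nbody\r\n"
print(illegal_header_lines(sample))
```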
Postfix may require a process callout; you might need to write a milter.
It seems that centralisation has been highly effective in spam-fighting. Google must have a huge corpus of spam and ham, and they also have the benefit of being able to spot patterns of incoming mail "live" across a very large number of accounts.
In recent years, though, I can no longer really recommend using any of the DNSBLs. I've encountered more cases where legitimate servers were blocked due to netblock vicinity or indeed previous ownership than actual spam issues.
Greylisting will still catch dynamic allocations almost as effectively, while you won't reject legitimate mail due to server and/or DNSBL issues.
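For readers unfamiliar with greylisting, the core of it fits in a few lines. The Python sketch below is an in-memory toy under my own assumptions (a 5-minute minimum delay, no expiry, no persistence, hypothetical class and method names); a real server would answer "defer" with an SMTP 4xx temporary failure and keep state on disk.

```python
import time
from typing import Dict, Optional, Tuple

Triplet = Tuple[str, str, str]


class Greylist:
    """Defer the first delivery attempt for each (client IP, sender, recipient)."""

    def __init__(self, min_delay: float = 300.0):
        self.min_delay = min_delay  # assumed value, seconds
        self._first_seen: Dict[Triplet, float] = {}

    def check(self, client_ip: str, sender: str, recipient: str,
              now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        key = (client_ip, sender.lower(), recipient.lower())
        first = self._first_seen.get(key)
        if first is None:
            # Never seen before: remember it and ask the client to retry later.
            self._first_seen[key] = now
            return "defer"
        if now - first < self.min_delay:
            return "defer"  # retried too quickly
        return "accept"  # a real MTA retried after the delay


g = Greylist()
print(g.check("192.0.2.1", "a@example.com", "b@example.org", now=0))
print(g.check("192.0.2.1", "a@example.com", "b@example.org", now=400))
```

Senders that never retry (typical of throwaway dynamic hosts) simply never get past the first "defer", which is why this needs no external list.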
In other words, if at least two DNSBL queries agree, then reject, or feed this information to the rest of the spam pipeline?
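The "at least two lists agree" idea could be sketched roughly as follows in Python. The zone names are placeholders, not real lists, and the function names are hypothetical. It assumes the usual DNSBL convention (reverse the IPv4 octets, append the zone, and treat an A record in 127.0.0.0/8 as "listed"); some lists also use special return codes for errors, which a real deployment would need to handle.

```python
import ipaddress
import socket
from typing import Callable, Iterable, Optional


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL lookup name, e.g. 192.0.2.1 -> 1.2.0.192.<zone>."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def resolve_a(name: str) -> Optional[str]:
    """Return one A record for name, or None if it does not resolve."""
    try:
        return socket.gethostbyname(name)
    except socket.gaierror:
        return None


def listed_count(ip: str, zones: Iterable[str],
                 resolve: Callable[[str], Optional[str]] = resolve_a) -> int:
    hits = 0
    for zone in zones:
        answer = resolve(dnsbl_query_name(ip, zone))
        if answer is not None and answer.startswith("127."):
            hits += 1
    return hits


def should_reject(ip: str, zones: Iterable[str], quorum: int = 2,
                  resolve: Callable[[str], Optional[str]] = resolve_a) -> bool:
    """Reject only when at least `quorum` lists agree."""
    return listed_count(ip, zones, resolve) >= quorum


# Placeholder zones and a fake resolver, so the example runs without network access.
zones = ["bl-a.example", "bl-b.example", "bl-c.example"]
fake = {"1.2.0.192.bl-a.example": "127.0.0.2", "1.2.0.192.bl-b.example": "127.0.0.4"}
print(should_reject("192.0.2.1", zones, resolve=fake.get))
```

Instead of rejecting, `listed_count` could just as well be passed on as one feature to the rest of the pipeline.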
I found this to be pretty much worthless if you already have greylisting, even for high-quality curated lists such as spamhaus SBL/XBL.
...it did not work out.
17 years later and spam is still annoying. I use Thunderbird and tag every spam message as spam.
Still, some spam messages get through.
And because there are false positives sometimes, I always look through my spam folder before I empty it.
I believe spammers actively test their messages against things like SpamAssassin to sneak them through.
With Gmail you hardly see any spam.
>Still, some spam messages get through.
Some? Back in the day it was in the hundreds every day if you were active online...
The ones that drive me nuts (and have led me to having to sign up to fastmail for my business emails, rather than sending from my own shared server) are the ones that are valid responses to emails sent from GMail in the first place. Excellent logic (or lack thereof), Google.
I don't actually believe so, but I have often wondered whether GMail has such an overly fastidious spam filter simply to encourage people into its fold...
That is not the content-based plan for spam that PG describes.
And it comes at a cost. Complete loss of control over your inbox.
I use Office 365 and Gmail. Both do a good job.