
A Plan for Spam (2002) - vinnyglennon
http://www.paulgraham.com/spam.html
======
lukego
Candidate for best blog post of all time? Quickly made a big impact on a real
problem in the computing world.

I remember reading this at the time, installing one of the implementations
that immediately popped up (SpamAssassin), and _finally_ having my spam
separated from my ham.

Edit: I had not considered until now the possibility that the "IKEA effect"
might have made me overestimate the quality of filtering because of the effort
that I put into training the classifier!

------
peterwwillis
Where's the Gmail option to let me turn a label into a unique Bayesian filter?

In the past year I have gotten >1,750 mails from recruiters, most of them
unique addresses. I don't want to mark them as "Spam" because this is a type
of "Spam" that I want to keep so I can refer to it later. I'd like to also un-
train "not-so-Spammy" messages so I can see the jobs I'd be interested in, but
I'm afraid of false positives making these harder to retain.
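
For what it's worth, a per-label filter of this kind is not much code. Below is
a minimal sketch of a two-class naive Bayes classifier with add-one smoothing,
in the spirit of (but not identical to) the token-probability approach in the
linked article. `LabelFilter`, `train`, `untrain`, and `score` are hypothetical
names for illustration, not anything Gmail exposes.

```python
import math
import re
from collections import Counter

# Assumption: a crude tokenizer; real filters also look at headers, URLs, etc.
TOKEN_RE = re.compile(r"[a-z0-9$'-]+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


class LabelFilter:
    """Hypothetical per-label classifier: is a message in the label or not?"""

    def __init__(self):
        self.counts = {True: Counter(), False: Counter()}   # token counts per class
        self.messages = {True: 0, False: 0}                 # messages per class

    def train(self, text, in_label):
        self.counts[in_label].update(tokenize(text))
        self.messages[in_label] += 1

    def untrain(self, text, in_label):
        """Undo an earlier train() call for the same text and class."""
        self.counts[in_label].subtract(tokenize(text))
        self.counts[in_label] = +self.counts[in_label]  # drop zero/negative counts
        self.messages[in_label] -= 1

    def score(self, text):
        """Return an estimate of P(in label | text) under naive Bayes."""
        vocab = len(set(self.counts[True]) | set(self.counts[False]))
        total_messages = self.messages[True] + self.messages[False]
        log_p = {}
        for cls in (True, False):
            total_tokens = sum(self.counts[cls].values())
            # Smoothed class prior.
            lp = math.log((self.messages[cls] + 1) / (total_messages + 2))
            for tok in tokenize(text):
                # Add-one smoothing; the extra +1 reserves mass for unseen tokens.
                lp += math.log((self.counts[cls][tok] + 1) / (total_tokens + vocab + 1))
            log_p[cls] = lp
        # Clamp the log-odds so math.exp cannot overflow on long messages.
        diff = max(-700.0, min(700.0, log_p[False] - log_p[True]))
        return 1.0 / (1.0 + math.exp(diff))
```

Messages scoring above some threshold (say 0.9, an arbitrary choice) would get
the label; a higher threshold trades missed recruiter mail for fewer false
positives, and `untrain` covers the "not-so-spammy" corrections.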

~~~
goblin89
Personally I disabled Gmail’s spam filter a long time ago[0]. (False positives,
plus I have some weak morbid curiosity as to what comes in spam.) Not getting
too much junk so far.

Recently I started to give a different email to each service so that I can see
if any get compromised, but so far I don’t think I’ve noticed anything like
that.

[0] I borrowed someone’s workaround where you create a filter that excludes
emails matching a specific random UUID, which no email would match, with
action “never send to spam”. Perhaps there’s now a more straightforward
option.

~~~
anticodon
There's no way to disable spam filtering in Gmail because most of the
filtering happens long before user filters are checked.

If that weren't true, spam folders would contain thousands of messages.

My spam folder receives no more than a dozen messages per month, and I know
that there are tens of thousands of attempts to send me spam.

~~~
goblin89
I agree that there may be earlier filtering stages, though in my experience
even if you send unauthenticated messages from a not-really-configured Postfix
on Ubuntu they would still be viewable in the recipient’s spam folder, or at
least that was the case a few years back.

I’m pretty sure the recruiters’ emails peterwwillis mentioned would not be so
bad as to be eliminated before reaching the spam folder.

------
dalbasal
This really was the golden age of blogging. That unpretentious style of
writing made me engage with so many ideas I never would have, reading
professional writing.

~~~
tw1010
This still exists, you just need to know where to look.

~~~
dalbasal
You can't just write a comment like that. Links please. ;)

~~~
earenndil
[https://danluu.com/](https://danluu.com/)

[https://www.stilldrinking.org/](https://www.stilldrinking.org/)

Steve Yegge's various blogs
([https://sites.google.com/site/steveyegge2/blog-rants](https://sites.google.com/site/steveyegge2/blog-rants),
[https://steve-yegge.blogspot.com/](https://steve-yegge.blogspot.com/),
[https://medium.com/@steve.yegge](https://medium.com/@steve.yegge))

[https://waitbutwhy.com/](https://waitbutwhy.com/) (bit cheesy but alright).

------
tw1010
When I first read this article it felt terribly old. An article from 10 years
ago? That's ancient! Now it's almost 20 years old and for some reason it feels
a lot fresher today than it did even back then. (Maybe history starts to
compact the older you get.)

------
tw1010
Funny thing is that even to this day, most AI companies employ algorithms no
more sophisticated than those in this almost 20-year-old article.

------
dvirsky
This post blew my mind at the time; it was my first exposure to ML techniques.

------
jorangreef
I remember reading pg's post; it was a classic.

Anyone here working on anti-spam?

Where are things at today?

~~~
thatsaguy
From my perspective (~500 employee mail server), greylisting had a much larger
impact at the time, thanks to the spambots/viruses attempting direct
connection to mail servers. Extremely effective, zero false positives, much
lighter on resources. I did use both, of course, so that I could keep a record
of how effective the systems were.
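
For anyone who hasn't run it: greylisting temporarily rejects mail from an
unfamiliar (client IP, sender, recipient) triplet and only accepts it when the
same triplet retries after a delay, which real mail servers do and many
spambots don't. A minimal sketch, assuming an in-memory table; the `Greylist`
class, its defaults, and the "defer"/"accept" strings are hypothetical, and
real implementations persist state and whitelist known-good senders:

```python
import time


class Greylist:
    """Hypothetical in-memory greylist keyed on (client IP, sender, recipient)."""

    def __init__(self, min_delay=300.0, expiry=86400.0):
        self.min_delay = min_delay   # seconds a new triplet must wait before retrying
        self.expiry = expiry         # seconds after which a triplet is forgotten
        self.first_seen = {}         # triplet -> timestamp of first attempt

    def check(self, client_ip, sender, recipient, now=None):
        """Return "defer" (temporary 4xx-style rejection) or "accept"."""
        now = time.time() if now is None else now
        key = (client_ip, sender.lower(), recipient.lower())
        seen = self.first_seen.get(key)
        if seen is None or now - seen > self.expiry:
            # Unknown or stale triplet: record it and ask the sender to retry.
            # Simplification: this also re-defers accepted senders after expiry.
            self.first_seen[key] = now
            return "defer"
        if now - seen < self.min_delay:
            # Retried too quickly to count as a well-behaved queueing server.
            return "defer"
        return "accept"
```

The cost is a one-time delivery delay for each new triplet, which is why it is
so light on resources: no message content is ever inspected.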

Today the situation has flipped. Most of the spam we get is coming from
authoritative servers (e.g. Gmail, Yahoo, etc.), making stuff like SPF/DKIM/etc.
next to worthless from a spam perspective (it's still marginally useful against
forgeries), while Bayesian (or in general, trainable) filters are essentially
the only thing that can differentiate it reliably.

With a modern setup, you can basically get next to zero spam and no false
positives. In fact, honest email marketing (i.e. mailing lists you've actually
subscribed to) is, in my experience, the only thing that throws these filters
off.

~~~
jorangreef
Thanks, one thing we also found is that spammers tend to be poor at RFC
standards, in a way that Gmail etc. will have no problem with, but which are
obviously broken.

For example, we use our own
[https://github.com/ronomon/mime](https://github.com/ronomon/mime) to detect
and reject email which has missing multi-parts (no terminating boundary
delimiter). All of this has been spam so far, and we are yet to see a false
positive. I don't think SpamAssassin has a rule for this (yet)?
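
To illustrate the idea only (this is not the ronomon/mime API, just a rough
Python sketch using the standard library): a multipart body is supposed to end
with a closing delimiter line, `--` + boundary + `--`. The function name
`has_closing_boundary` is made up, and the check only looks at the top-level
multipart, not nested ones.

```python
from email.parser import HeaderParser


def has_closing_boundary(raw_message: str) -> bool:
    """Return False if a top-level multipart message lacks its closing delimiter."""
    # HeaderParser parses only the headers and keeps the body as one string.
    msg = HeaderParser().parsestr(raw_message)
    if msg.get_content_maintype() != "multipart":
        return True  # not a multipart message, so the check does not apply
    boundary = msg.get_boundary()
    if boundary is None:
        return False  # multipart without a boundary parameter is broken too
    closing = "--" + boundary + "--"
    # Trailing whitespace after the delimiter is tolerated.
    return any(line.rstrip() == closing for line in msg.get_payload().splitlines())
```

A message for which this returns False would be a candidate for rejection
under the rule described above.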

Another example is illegal header characters, which are almost always spam,
with a handful of false positives (usually machine-generated).
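
A rough sketch of that kind of check, assuming "illegal" means anything outside
printable US-ASCII, space, and tab in the raw header block (my reading of
RFC 5322; our actual rule set may differ, and servers that accept
internationalized headers would need to allow UTF-8). The function name is
hypothetical.

```python
def illegal_header_bytes(raw_headers: bytes) -> list:
    """Return (line_number, byte_value) pairs for disallowed bytes in headers."""
    findings = []
    # Split on CRLF only, so that a bare CR or LF is itself reported.
    for lineno, line in enumerate(raw_headers.split(b"\r\n"), start=1):
        for byte in line:
            # Allowed: horizontal tab (0x09) and space through tilde (0x20-0x7E).
            if byte != 0x09 and not 0x20 <= byte <= 0x7E:
                findings.append((lineno, byte))
    return findings
```

An empty result means the header block passed; anything else is a signal to
score or reject, keeping in mind the occasional machine-generated false
positive.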

~~~
readingnews
That is an interesting approach. Care to let us know how you go from
[https://github.com/ronomon/mime](https://github.com/ronomon/mime) to some
kind of SMTP server plugin (like for postfix for example)?

~~~
jorangreef
Thanks, you might find Haraka to be easiest since it's already JavaScript.

Postfix may require a process callout; you might need to write a milter.

------
TicklishTiger
It sounds good but ...

...it did not work out.

17 years later and spam is still annoying. I use Thunderbird and tag every
spam message as spam.

Still, some spam messages get through.

And because there are false positives sometimes, I always look through my spam
folder before I empty it.

~~~
coldtea
> _17 years later and spam is still annoying. I use Thunderbird and tag every
> spam message as spam._

With Gmail you hardly see any spam.

> _Still, some spam messages get through._

Some? Back in the day it was in the hundreds every day if you were active
online...

~~~
detritus
Gmail also dumps a lot of non-spam emails into its spam folders.

The ones that drive me nuts (and have led me to having to sign up to Fastmail
for my business emails, rather than sending from my own shared server) are the
ones that are valid responses to emails sent from Gmail in the first place.
Excellent logic (or lack thereof), Google.

I don't actually believe so, but I have often wondered whether Gmail has such
an overly fastidious spam filter simply to encourage people into its fold...

