
How Spam Filtering Works: From SPF to DKIM to Blacklists - prostoalex
https://deliciousbrains.com/how-spam-filters-works/?__s=b7opgxc3bpupssi4tgau
======
gorkish
The author is still pretty far behind the curve with this info. Unfortunately
while these policies are great to have (well these days they are more or less
necessary), the simple fact is that they are so often misused or improperly
maintained they don't really stop that much; an SPF mismatch is simply treated
as one more input when scoring how likely it is that a message should be
blocked. A DKIM signature is almost completely useless in this system since
all it can do is prove a message was handled by a specific domain holder and
not modified. So if you want to modify it, just remove the signature. If you
want to send a forged message, don't sign it. There's nothing in DKIM itself
to tell a recipient "my messages must be signed."
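
A rough sketch of that kind of additive scoring, in Python. The signal names, weights, and threshold are all made up for illustration; real filters tune these per deployment and use many more inputs.

```python
# Hypothetical weights: each failed check only nudges the score upward.
WEIGHTS = {
    "spf_fail": 2.0,
    "dkim_missing": 0.5,
    "dkim_invalid": 1.5,
    "listed_on_dnsbl": 3.0,
}
THRESHOLD = 5.0  # assumed cut-off, not taken from any real filter


def spam_score(signals):
    """Sum the weights of every signal that fired for a message."""
    return sum(WEIGHTS[name] for name in signals)


def is_blocked(signals):
    """Block only when the combined score reaches the threshold."""
    return spam_score(signals) >= THRESHOLD
```

With these numbers an SPF failure alone scores 2.0 and the message still gets through, which is the point: no single check is decisive.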

The new hotness is a DMARC record, and it finally allows mail senders to
basically say "Here are my suggested handling policies for DKIM and SPF, and
please actually enforce them because they are properly maintained and tested."
Best of all it has a feedback loop, so you are able to get reports of how
receiving hosts are treating your messages and forensic reports of forgeries
which can help with detection of several types of phishing threats.
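
For anyone who hasn't seen one, a DMARC policy is a TXT record published under `_dmarc.<domain>`. Below is a minimal sketch that splits such a record into its tags. The record contents and the example.com addresses are placeholders, and the parser ignores validation entirely.

```python
def parse_dmarc(record):
    """Split a DMARC TXT record into a tag -> value dict (no validation)."""
    tags = {}
    for part in record.split(";"):
        part = part.strip()
        if not part:
            continue
        key, sep, value = part.partition("=")
        if sep:
            tags[key.strip().lower()] = value.strip()
    return tags


# Placeholder record: "p" is the requested handling policy, "rua" is where
# aggregate reports go, "ruf" is where forensic (failure) reports go.
record = (
    "v=DMARC1; p=reject; "
    "rua=mailto:dmarc-reports@example.com; "
    "ruf=mailto:dmarc-forensics@example.com; pct=100"
)
policy = parse_dmarc(record)
```

Here `policy["p"]` is the handling request and `policy["rua"]` / `policy["ruf"]` are the two halves of the feedback loop mentioned above.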

It's still a long way from perfect, but it's better. It doesn't help
deliverability though. For that you will want to stop emailing shit that
people don't want.

~~~
mehrdadn
The new hotness is ARC (arc-spec.org), which I understand came out of DMARC?
Not sure though, don't know too much about it.

For DMARC, it's not so awesome :\
[https://news.ycombinator.com/item?id=17900765](https://news.ycombinator.com/item?id=17900765)

~~~
massaman_yams
In short: ARC builds on top of DMARC to fix some cases that DMARC didn't
handle well. (And DMARC, in turn, builds on top of DKIM and SPF.)

DMARC is overall a very good thing, and has significantly cut back on the
effectiveness of forged email for spam/phishing, at the expense of mailing
list/forwarding headaches.

~~~
mehrdadn
Hopefully it's been good for other people... I haven't seen any benefit
personally. :\

On a related note: do you know if DMARC feedback leaks information to the
sender on whether or not you opened an email, or reported it as spam, or
similar? If so, do you know why this isn't considered a privacy or security
issue?

~~~
Ionlyprograminc
DMARC feedback is based only on how the MTA (Mail Transfer Agent) handles the
mail. I don't have much experience with DMARC reports, but senders typically
use them when they are first setting up DMARC. It is up to the receiving MTA to
send the inbox/spam result according to the RFC. I think the reports are
aggregated, so individual inboxes can't be identified. I don't think many
people actually send or follow up on DMARC reports, imo.

Source:
[https://tools.ietf.org/html/rfc7489#section-7.3](https://tools.ietf.org/html/rfc7489#section-7.3)
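
To make "aggregated" concrete, here is a small sketch that reads the per-row fields of an aggregate report. The sample XML is hand-written and heavily trimmed (real reports carry metadata, the published policy, and auth results too), and the sketch assumes the report has no XML namespace.

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed sample; IPs are documentation addresses.
SAMPLE = """
<feedback>
  <record>
    <row>
      <source_ip>192.0.2.10</source_ip>
      <count>12</count>
      <policy_evaluated>
        <disposition>none</disposition>
        <dkim>pass</dkim>
        <spf>pass</spf>
      </policy_evaluated>
    </row>
  </record>
  <record>
    <row>
      <source_ip>198.51.100.7</source_ip>
      <count>3</count>
      <policy_evaluated>
        <disposition>reject</disposition>
        <dkim>fail</dkim>
        <spf>fail</spf>
      </policy_evaluated>
    </row>
  </record>
</feedback>
"""


def summarize(xml_text):
    """Return one dict per <record>: sending IP, message count, outcome."""
    rows = []
    for record in ET.fromstring(xml_text).iter("record"):
        row = record.find("row")
        evaluated = row.find("policy_evaluated")
        rows.append({
            "source_ip": row.findtext("source_ip"),
            "count": int(row.findtext("count")),
            "disposition": evaluated.findtext("disposition"),
            "dkim": evaluated.findtext("dkim"),
            "spf": evaluated.findtext("spf"),
        })
    return rows


rows = summarize(SAMPLE)
```

Each row is a count of messages per sending IP and outcome; nothing in these fields identifies a recipient or whether a message was opened.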

------
znpy
I've been thinking for a while that there should be a movement to allow people
running mailservers at their home again.

"Again" as in get other actors in the field to allow/ease that: spamlist
managers should stop blocking residential addresses _by default_ and make it
easier to appeal. ISPs should make it possible and easy to get a reverse-ptr
with every fixed ipv4 allocation.

In this monitoring age I want to be able to handle my private conversations
(and my data) on my own iron (running somewhere in my house).

All these impediments to running your own mailserver and taking back control of
your own data really look like a push towards state-/corporate-backed
espionage of individuals.

It's not 1995 anymore; most of us have an internet connection well capable of
sustaining enough mailboxes for a whole family.

~~~
_jal
I'd absolutely be all for it - I've been running my mail server since the 90s,
and used to run it out of my closet.

There are two major, somewhat interlocking problems, though. The first: home
users with compromised machines currently make up the population of home "mail
servers" (spam malware). Selling this involves convincing mail
administrators that not simply blackholing all of what is currently a cesspool
is a good idea. That's not easy.

The second problem is that fixed-IPs, open well-known ports and DNS have
become a product differentiator for ISPs. Those are "business" features; dumb
ole' consumers don't "need" them.

I don't see any compelling argument that would convince either prong of that
trap - hell, I agree with you, but as a mail admin both professionally and
personally, the first one gives me hives, thinking about the years of work
it'll take to re-stabilize a functional antispam infrastructure to handle it.

And after that, you need to convince Comcast to change their revenue-
extraction plans and generally be less dickish.

If you just want a mail server now, your best bet is a cheap VM somewhere. It
can be really low-end and cheap.

~~~
megous
Another way I've thought of to achieve this is to use a VPS for a
static IP outside of residential ranges, with the home machine connected by
WireGuard VPN. WireGuard handles roaming easily, so dynamic IP is no problem,
IPv6 is no problem, and the mail will stay on your home machine. You'll also
bypass your country's/ISP tracking somewhat if you place your VPS outside of
the country.

~~~
lsh123
I've run a home mail server for a while. I used to have SMTP at home, but a few
years back I switched to an SMTP proxy/forwarder on AWS ($5-6/month) connected
via TLS SMTP on a custom port at home (which also serves IMAPS on a custom
port). Once a year I have to update DNS when my Comcast IP changes. Otherwise it
is a completely trouble-free setup.

------
mrmekon
In my experience, spam blacklists have become significantly less effective
over the last 10 years. I think the biggest e-mail providers stopped
contributing to them, so the user-reported lists are almost unused. The
honeypot lists lag behind the spammers by a few days, so plenty slip through.
They do trim out 85% of my incoming spam, but that last 15% is still a lot.
Back in ~2013 they cut out more like 99%.

Today, the single most effective thing you can do if you run your own mail
server is to completely block all gTLDs. Screw 'em, they are 99.9999% spam.

Plenty of spam has valid SPF and DKIM records. They are sent through legit
services, either through cracked credentials of real users or rotating through
new accounts.

It also doesn't seem like anybody cares about abuse@/spam@ reports anymore...

~~~
interfixus
> _completely block all gTLDs. Screw 'em, they are 99.9999% spam_

[citation urgently needed]

Anyway, in my own experience with many years of self-hosting mail - until
giving up and going Fastmail a couple of years ago - the real problems were in
sending. No matter what rigorous level of DKIM'ing and ip-hygiene and whatnot,
Google and Microsoft - Microsoft to a grotesque degree - would randomly ditch
incoming mails from my server. Would sometimes happen in the middle of a
conversation thread, and for the most part without warning. The kind of person
using Hotmail is typically not someone you can convince that the error lies on
his end.

~~~
mrmekon
> completely block all gTLDs. Screw 'em, they are 99.9999% spam

That's for _my_ e-mail. Your experience may differ. Perhaps you communicate
often with people on .loan domains.

E-mailing Microsoft accounts is just not an option. They have no process for
fixing incorrectly blocked IPs. I take the same approach with Microsoft as
with gTLDs...

Never had a problem with any other mail provider. Google has never blocked or
spam-holed me.

------
teekert
Been there, done that. And then Microsoft answers my complaints (after my
mails never showed up at my brother-in-law's outlook.com mailbox):

 _We have reviewed your IP (*.*.*.*) and determined that messages are being
filtered based on the recommendations of the SmartScreen® Filter.

Email filtering is based on many factors, but primarily it's due to mail
content and recipient interaction with that mail. Because of the proprietary
nature of SmartScreen® and because SmartScreen® Filter technology is always
adapting and learning more about what is and isn't unwanted mail, it is not
possible for us to offer specific advice about improving your mail content.
However, in general SmartScreen® Filter evaluates specific words or
characteristics from each e-mail message and weights them, based on their
likelihood to indicate that a message is unwanted or legitimate mail.

Unfortunately, after reviewing the information you provided and in compliance
with our mail policies, we are unable to offer immediate mitigation for your
deliverability issue. However, we have some specific recommendations for you
to consider that can help you to improve deliverability over time._

It's a struggle running your own mailserver.

~~~
megous
Yes, it's annoying because mailbox owners have zero control too. If your
brother-in-law complained to Outlook that some emails are missing from his
inbox/spam because Outlook decided to reject them outright, he would get the
same response. "We have smarter systems(r) than your judgement, sorry."
(Remember, you can't click "not spam" if the e-mail was not delivered at all.)

The problem is that the burden of not upsetting a mail filter somehow got
shifted to the sender when big services filter mail, which is ridiculous.
Customers should talk to their providers if they're not receiving mail,
because it's the providers that don't deliver it. But that corrective feedback
loop is either not there or is much longer, with more steps, than when the
sender gets a bounce.

------
jorangreef
Much more important than SPF and DKIM is Forward-confirmed reverse DNS.

You need SPF and DKIM, but before that you need FCrDNS.

Without FCrDNS, your server will look like a dynamic IP address.
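
A minimal sketch of what the FCrDNS check amounts to: look up the PTR name for the IP, resolve that name forward again, and confirm the original IP is among the results. The function names are my own, and the resolvers are injectable so the logic can be tried without touching the network.

```python
import socket


def default_reverse(ip):
    """PTR lookup: IP address -> hostname."""
    return socket.gethostbyaddr(ip)[0]


def default_forward(hostname):
    """A/AAAA lookup: hostname -> set of IP address strings."""
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def is_fcrdns(ip, reverse=default_reverse, forward=default_forward):
    """True if the IP's PTR name resolves back to the same IP."""
    try:
        hostname = reverse(ip)
        return ip in forward(hostname)
    except OSError:
        # No PTR record, or the PTR name does not resolve at all.
        return False
```

Calling `is_fcrdns("192.0.2.25")` with the defaults does live DNS lookups. The string comparison is naive, so IPv6 addresses written in different but equivalent forms would need normalizing first.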

It's also not enough to monitor your own dedicated IP address. You need to
monitor your entire /24 IP address neighborhood at your hosting company. You
might be sharing the same /24 IP address space as a hacked WordPress
installation and find that Outlook.com will block you just because of that.
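
As a sketch of what monitoring the neighborhood could look like: compute the /24 around your address and build the DNS names you would query against a blacklist. The zone `dnsbl.example.org` is a placeholder, not a real list, and the sketch only builds the names; it does not perform the lookups.

```python
import ipaddress


def neighbourhood(ip, prefix=24):
    """Return the network that contains ip at the given prefix length."""
    return ipaddress.ip_network(f"{ip}/{prefix}", strict=False)


def dnsbl_query_name(ip, zone):
    """Build the lookup name for an IPv4 address: reversed octets + zone."""
    reversed_octets = ".".join(reversed(str(ip).split(".")))
    return f"{reversed_octets}.{zone}"


net = neighbourhood("192.0.2.25")
# One query name per usable host address in the /24.
names = [dnsbl_query_name(host, "dnsbl.example.org") for host in net.hosts()]
```

By the usual DNSBL convention, a name that resolves (typically to a 127.0.0.x address) means the IP is listed, and NXDOMAIN means it is not. Check the terms of whichever list you use before querying a whole range.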

After that, you might run into issues with large SpamAssassin setups hosting
hundreds of thousands of mailboxes blocking your email because of arbitrary
rules such as too little text combined with a large image etc. Plain text
email will improve your deliverability for these setups.

~~~
pbhjpbhj
Yes, Microsoft blocked me sending email from my shared hosting to my Hotmail.
The email address was whitelisted, there was established email back and forth
with that domain, and no spam ... but another server at the same host had
previously been blacklisted in one blacklist.

Neither SPF nor DKIM helped; I don't own the mail server, nor do I want to pay
Microsoft's service provider to verify me.

In the end I had to bounce the emails from a third-party address. Which you'd
think shouldn't work -- certainly not when whitelisting doesn't -- but solved
the problem.

So to avoid being flagged as a spammer, do the spammiest thing possible: set
up a new, free mailbox and bounce your mail from there. SMH.

------
Sir_Cmpwn
If you're looking for an easy way to test your outgoing mail setup, I
recommend mail-tester.com. They give you an address to send an email to, then
analyse it and tell you how you can make it more deliverable.

~~~
walrus01
[https://mxtoolbox.com/](https://mxtoolbox.com/) can also give you a lot of
useful info related to your MX.

[https://mxtoolbox.com/diagnostic.aspx](https://mxtoolbox.com/diagnostic.aspx)

------
rurcliped
The great fallacy of spam filtering is that access control SHOULD be
probabilistic whenever it is easy to implement that. Let's look at the
physical world where implementation is harder. It's very unlikely that I'll
arrive home from work at 4:30 AM, AND that I'll be driving a rental car
instead of my own car, AND that I'll be wearing new shoes with a sole pattern
that my smart walkway hasn't seen before. So, ideally my home security system
would automatically call the police. Right?

~~~
megous
I don't think that's how it works. You train your spam filter both with spam
and ham and the sets need to have comparable sizes for it to work reliably.

So in your example, you'd train it on your patterns of behavior AND patterns
of burglars. So unless your behavior clearly looked like a burglar's, the
system would probably not flag you just for behaving differently.
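
A toy version of that two-sided training, as a word-count naive Bayes classifier with add-one smoothing. It assumes equal priors (comparable amounts of spam and ham), and the four training messages are invented for illustration.

```python
import math
from collections import Counter


def train(messages):
    """Count word occurrences across a list of messages."""
    counts = Counter()
    for text in messages:
        counts.update(text.lower().split())
    return counts


def log_likelihood(text, counts, vocab_size):
    """Log-probability of the text under one class, add-one smoothed."""
    total = sum(counts.values())
    return sum(
        math.log((counts[word] + 1) / (total + vocab_size))
        for word in text.lower().split()
    )


def looks_like_spam(text, spam_counts, ham_counts):
    """Compare both classes; equal priors are assumed."""
    vocab_size = len(set(spam_counts) | set(ham_counts))
    return (log_likelihood(text, spam_counts, vocab_size)
            > log_likelihood(text, ham_counts, vocab_size))


# Invented training data: the filter sees BOTH spam and ham.
spam_counts = train(["cheap loan offer now", "win cheap pills now"])
ham_counts = train(["meeting notes attached", "lunch at noon tomorrow"])
```

A message made only of unseen words adds nearly the same small amount to both sides, so being unfamiliar is not enough on its own to tip the decision toward spam.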

------
Ameo
I've spent many, many hours playing with my mailserver and adding all of these
extra features and security settings to try to keep Gmail and others from
denying mail from me.

It turns out that no matter what I do, I'm pretty much out of luck because I'm
sending from a non-standard (.link) domain which gets treated as spam by Gmail
as well as others. I get a perfect score on every email test site I've tried,
but I still go to spam (or worse) regularly which is a big problem if I ever
want to use my personal email for anything important.

For anyone looking to host their own email using a personal domain, I'd
suggest sticking to something like .com to avoid the hell that I've been
through with this.

------
lxchase
Would love to get feedback from the HN community. We're a decently sized
sender following what we think are best practices (sending to engaged, dkim,
etc.) however our gmail deliverability is rock bottom and it's been difficult
to improve. Every other provider is to benchmark or better. One issue may be
high hard bounce rates on our very first email sent, but we don't send emails
to bounced addresses at all afterwards. (Wouldn't a double opt-in result in the
same situation?) What are the most impactful levers to pull in your opinion?

~~~
techsupporter
> ...our gmail deliverability is rock bottom and it's been difficult to
> improve. Every other provider is to benchmark or better. ... What are the
> most impactful levers to pull in your opinion?

I'll be blunt, and maybe others will disagree with me, and say that I don't
think there are any impactful levers you can pull.

In my experience, Gmail frankly does not give a shit about receiving e-mail
from any senders except ones that fall in one of two categories: a) very large
so users will complain in large groups if receiving from them is impacted (put
your Outlooks and your Yahoos and your huge residential ISPs in this category)
or b) technically-savvy senders that aren't quite as large but Gmail employees
routinely interact with so they're otherwise "trustworthy" (Fastmail is an
example here).

I've ranted about this here, but I've had terrible success being smaller than a
multinational ISP and sending e-mail to Gmail users. Messages will silently
disappear, even though Gmail's SMTP server claims to accept the message, and
the postmaster tools are less-than-useful (that might be my one suggestion,
sign up for them on behalf of your domain and see if you get any reports; I
didn't, 'too small'), and no amount of tinkering or asking would help.

I admit I'm just a SMB org whose business isn't sending e-mail--it's "just" a
communications tool for us--so I had this option: I threw in the towel and
switched to Fastmail and have had no problems since.

~~~
Avamander
> That might be my one suggestion, sign up for them on behalf of your domain
> and see if you get any reports; I didn't, 'too small'.

I had the same issue. What I found, though, is that DMARC reports give at least
some insight into the process, but that's about all the information I've
managed to get from Google.

------
jonathanbull
Nice article. It's worth noting that it's not just the reputation of the
sender domain you should be looking at - the reputation of the links in the
_body_ also plays a big part. In fact over at
[https://emailoctopus.com](https://emailoctopus.com), this is the number one
issue we see for legitimate senders landing in spam.
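
If you want to audit this yourself, a first step is listing the domains your message body links to, so each can be checked against whatever reputation source you trust. This is a minimal sketch with a deliberately simple URL pattern of my own; it does no reputation lookup.

```python
import re
from urllib.parse import urlparse

# Simplified pattern: http(s) URLs up to whitespace, angle brackets or parens.
URL_PATTERN = re.compile(r"https?://[^\s<>()]+")


def link_domains(body):
    """Return the set of hostnames linked from a message body."""
    domains = set()
    for url in URL_PATTERN.findall(body):
        host = urlparse(url).hostname
        if host:
            domains.add(host.lower())
    return domains


body = "See https://example.com/offer or unsubscribe at http://tracker.example.net/u?id=1"
domains = link_domains(body)
```

Link shorteners and click-tracking redirects show up here under their own hostnames, which is often where a surprise comes from.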

