
The Cat-And-Mouse Story of Implementing Anti-Spam for Mail.ru - yarapavan
http://highscalability.com/blog/2016/8/30/the-cat-and-mouse-story-of-implementing-anti-spam-for-mailru.html
======
yarapavan
Tarantool, mentioned in the article, is an in-memory database and application
server with a 100%-compatible drop-in replacement for Lua 5.1.

Github page:
[https://github.com/tarantool/tarantool/](https://github.com/tarantool/tarantool/)

------
kazinator
One powerful rule that I use in my Exim setup is to drop any SMTP connection
that doesn't have matching forward and reverse DNS. This catches a large
majority of all spam, with hardly any false-positive downside. No credible
SMTP forwarding host has broken DNS (practically by definition). Lots of
spamming machines do. Of course, not all of them, but this is a game of
percentages and multiple stages.
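The rule above is usually called forward-confirmed reverse DNS (FCrDNS): look up the PTR name for the connecting IP, then resolve that name and check that the same IP comes back. Below is a minimal Python sketch, assuming IPv4 only (socket.gethostbyname_ex returns only IPv4 addresses) and treating any lookup failure as a mismatch. fcrdns_ok and its injectable resolver parameters are hypothetical names, not part of the Exim setup described here; Exim has its own ACL condition for this (verify = reverse_host_lookup), so you would not normally hand-roll it.

```python
import socket
from typing import Callable, List, Tuple

# (hostname, aliases, addresses), as returned by gethostbyaddr / gethostbyname_ex
HostEntry = Tuple[str, List[str], List[str]]


def fcrdns_ok(
    ip: str,
    reverse_lookup: Callable[[str], HostEntry] = socket.gethostbyaddr,
    forward_lookup: Callable[[str], HostEntry] = socket.gethostbyname_ex,
) -> bool:
    """True if some PTR name for `ip` resolves back to `ip` (IPv4 only)."""
    try:
        primary, aliases, _ = reverse_lookup(ip)  # PTR lookup
    except OSError:  # socket.herror / socket.gaierror / timeouts
        return False  # no reverse DNS at all counts as a failure
    for name in [primary, *aliases]:
        try:
            _, _, addresses = forward_lookup(name)  # A-record lookup
        except OSError:
            continue  # this name doesn't resolve; try the next one
        if ip in addresses:
            return True  # forward and reverse DNS agree
    return False
```

A production MTA would also distinguish temporary DNS failures (defer the message) from permanent ones (reject it); this sketch lumps them together.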

~~~
treve
You'll definitely lose email from Amazon SES then. What about checking SPF
records instead?

~~~
kazinator
My darned residential cable line has forward and reverse DNS that match.

If Amazon SES can't figure this out, it has no business sending SMTP e-mail.

I do apply the SPF check and reject the SMTP transaction if it comes up as a
confirmed negative: the sending domain has an SPF record with strict
semantics, and the IP doesn't match.
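As a rough illustration of "reject only on a confirmed negative", here is a Python sketch that evaluates a small subset of SPF (RFC 7208): only the ip4, ip6 and all mechanisms. The record string is assumed to be whatever a TXT lookup on the envelope sender (MAIL FROM) domain returned; fetching it is out of scope. Any other mechanism or modifier (include:, a, mx, redirect=, ...) yields "unknown", so the policy never rejects on a record it could not fully evaluate. spf_result and should_reject are hypothetical names.

```python
import ipaddress
from typing import Optional

# SPF qualifier prefix -> result when the mechanism matches (RFC 7208)
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def spf_result(record: Optional[str], client_ip: str) -> str:
    """Evaluate a small subset of SPF (ip4, ip6, all) for one client IP."""
    if record is None:
        return "none"  # the envelope sender domain publishes no SPF record
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # not an SPF record
    ip = ipaddress.ip_address(client_ip)
    for term in terms[1:]:
        qualifier = "+"  # default qualifier when none is written
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        mech, _, value = term.partition(":")
        mech = mech.lower()
        if mech == "all":
            return QUALIFIERS[qualifier]  # catch-all: "-all" means hard fail
        if mech in ("ip4", "ip6"):
            try:
                net = ipaddress.ip_network(value, strict=False)
            except ValueError:
                return "permerror"  # malformed address or prefix
            if net.version != (4 if mech == "ip4" else 6):
                return "permerror"  # e.g. an IPv6 network under ip4:
            if ip in net:  # False when the IP versions differ
                return QUALIFIERS[qualifier]
            continue
        return "unknown"  # include:, a, mx, redirect=, ... not handled here
    return "neutral"  # nothing matched and no "all" term


def should_reject(record: Optional[str], client_ip: str) -> bool:
    """Reject only on a confirmed negative, i.e. an SPF "fail"."""
    return spf_result(record, client_ip) == "fail"
```

Note that the check is keyed on the envelope sender's domain, not the From: header, and that a "~all" (softfail) record never triggers a rejection here.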

SPF speaks to a different problem (not the problem of "is this host a spammy
host?"), and generally has no teeth. Passing SPF only informs me that the IP
address sending to me is allowed to use the given _envelope_ sender domain. A
spammer can register a bunch of domains and set up SPF records for them, and
then use the domains as envelope addresses. (Of course, these envelope sender
addresses don't necessarily have anything to do with what is in the From:
header.)

You can't expect some sort of free pass just because you made an SPF record.
What the SPF record does is protect your domain from being used by other
people as the basis of a sender envelope address. This is of low value,
because most recipients never even see the envelope address.

SPF can be applied against From: too, but that's fraught with problems
because of forwarding via mailing lists, or people legitimately using
alternative sending identities.

------
kazinator
I use a home-grown web app called Tamarind for managing throwaway mail
aliases, in conjunction with Exim:

[http://www.kylheku.com/cgit/tamarind/tree/README](http://www.kylheku.com/cgit/tamarind/tree/README)

------
Robin_Message
The spam bit sounds cool, but the NoSQL DB sounds odd. E.g., that tree
implementation with batched rebalancing sounds downright scary, and I thought
AVL was considered the worst kind of balanced tree due to excessive overhead
(an int per node, versus red-black's one bit).
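To make the int-per-node overhead concrete, here is a minimal AVL insertion sketch in Python. It is not Tarantool's code and all names are hypothetical: each node stores its subtree height, and insertion restores the AVL invariant (child heights differ by at most 1) with single or double rotations on the way back up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: "Optional[Node]" = None
    right: "Optional[Node]" = None
    height: int = 1  # the per-node integer; red-black trees keep a colour bit


def h(n: Optional[Node]) -> int:
    return n.height if n else 0


def update(n: Node) -> None:
    n.height = 1 + max(h(n.left), h(n.right))


def rotate_right(y: Node) -> Node:
    x = y.left
    y.left, x.right = x.right, y  # x's right subtree moves under y
    update(y)
    update(x)
    return x


def rotate_left(x: Node) -> Node:
    y = x.right
    x.right, y.left = y.left, x  # y's left subtree moves under x
    update(x)
    update(y)
    return y


def rebalance(n: Node) -> Node:
    update(n)
    balance = h(n.left) - h(n.right)
    if balance > 1:  # left-heavy
        if h(n.left.left) < h(n.left.right):  # left-right case
            n.left = rotate_left(n.left)
        return rotate_right(n)
    if balance < -1:  # right-heavy
        if h(n.right.right) < h(n.right.left):  # right-left case
            n.right = rotate_right(n.right)
        return rotate_left(n)
    return n


def insert(root: Optional[Node], key: int) -> Node:
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    else:
        return root  # duplicate keys are ignored
    return rebalance(root)
```

Storing the full height is the simplest choice; implementations that store only the balance factor (-1, 0 or +1) need just two bits per node.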

~~~
bigbes
> Our investigation revealed that in case of frequent insertions and deletions
> Tarantool initiated a complex process of tree rebalancing (all our indexes
> were of TREE type).

That all applies to 1.5. The new version (1.6) uses a brand new bps-tree, not
the sg/AVL tree. It behaves better on all workloads.

AVLTree was a "temporary" hack. Our implementation works better for their
needs. BTW, AVL is not bad, but it's hard to implement a good one (believe me
:) ).

~~~
Robin_Message
Nice. Yeah, AVL is definitely okay but it makes me happy to hear something
better is in there :)

What sort of tree is this new bps-tree?

~~~
bigbes
bps-tree means B+*-tree: a unique combination of the B+ and B* trees. You can
read Wikipedia or the "The Ubiquitous B-Tree" paper:
[https://wwwold.cs.umd.edu/class/fall2002/cmsc818s/Readings/b...](https://wwwold.cs.umd.edu/class/fall2002/cmsc818s/Readings/b-tree.pdf).

You can read its code here:
[https://github.com/tarantool/tarantool/blob/1.6/src/lib/sala...](https://github.com/tarantool/tarantool/blob/1.6/src/lib/salad/bps_tree.h).
It's very thoroughly commented.
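For a feel of the B+ half of that combination, here is a toy B+ tree in Python: values live only in leaves, leaves are chained so a range scan just walks the list, and an overfull node splits in two. The B* half (shifting keys into a sibling before splitting, so nodes stay about two-thirds full) is deliberately left out. MAX_KEYS, BPlusTree and the helper names are hypothetical, and this is not a model of bps_tree.h.

```python
from bisect import bisect_left, bisect_right
from typing import Any, Iterator, List, Optional, Tuple

MAX_KEYS = 4  # hypothetical node capacity; real trees use a much larger fan-out


class Leaf:
    def __init__(self) -> None:
        self.keys: List[int] = []
        self.values: List[Any] = []
        self.next: Optional["Leaf"] = None  # sibling link used by range scans


class Internal:
    def __init__(self, keys: List[int], children: List[Any]) -> None:
        self.keys = keys  # child i holds keys in [keys[i-1], keys[i])
        self.children = children


def _insert(node: Any, key: int, value: Any) -> Optional[Tuple[int, Any]]:
    """Insert into a subtree; return (separator, new right node) on a split."""
    if isinstance(node, Leaf):
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            node.values[i] = value  # overwrite an existing key
            return None
        node.keys.insert(i, key)
        node.values.insert(i, value)
        if len(node.keys) <= MAX_KEYS:
            return None
        mid = len(node.keys) // 2
        right = Leaf()
        right.keys, node.keys = node.keys[mid:], node.keys[:mid]
        right.values, node.values = node.values[mid:], node.values[:mid]
        right.next, node.next = node.next, right  # keep the leaf chain intact
        return right.keys[0], right  # leaf separator is copied up (B+ style)
    i = bisect_right(node.keys, key)
    split = _insert(node.children[i], key, value)
    if split is None:
        return None
    sep, new_child = split
    node.keys.insert(i, sep)
    node.children.insert(i + 1, new_child)
    if len(node.keys) <= MAX_KEYS:
        return None
    mid = len(node.keys) // 2
    sep_up = node.keys[mid]  # internal separator is moved up, not copied
    right = Internal(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    return sep_up, right


class BPlusTree:
    def __init__(self) -> None:
        self.root: Any = Leaf()

    def insert(self, key: int, value: Any) -> None:
        split = _insert(self.root, key, value)
        if split is not None:  # root split: the tree grows by one level
            sep, right = split
            self.root = Internal([sep], [self.root, right])

    def _leaf_for(self, key: int) -> Leaf:
        node = self.root
        while isinstance(node, Internal):
            node = node.children[bisect_right(node.keys, key)]
        return node

    def get(self, key: int, default: Any = None) -> Any:
        leaf = self._leaf_for(key)
        i = bisect_left(leaf.keys, key)
        if i < len(leaf.keys) and leaf.keys[i] == key:
            return leaf.values[i]
        return default

    def range(self, lo: int, hi: int) -> Iterator[Tuple[int, Any]]:
        """Yield (key, value) for lo <= key < hi by walking the leaf chain."""
        leaf: Optional[Leaf] = self._leaf_for(lo)
        while leaf is not None:
            for k, v in zip(leaf.keys, leaf.values):
                if k >= hi:
                    return
                if k >= lo:
                    yield k, v
            leaf = leaf.next
```

With MAX_KEYS = 4 a node splits on its fifth key; a real implementation picks a much larger fan-out so that each node fills a cache line or disk block.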

