Removing user-controlled input indeed removes the incentive, but for reasons beyond my comprehension, our sign up form _still_ gets periodically blasted. Without additional countermeasures, our sending reputation would be at risk.
We apply a few strategies...
1. Require OAuth-based sign up for gmail.com, hotmail.com, and live.com addresses. No emails sent until after authentication.
2. Drop obvious spam. If your name contains "http://" or "Whatsapp", or your User-Agent is "python-requests", then you get a 400 (see the sketch after this list).
3. Manual approval for suspicious sign ups. High AbuseIPDB scores, statistically unusual names, etc. generate a help desk ticket that someone from our staff approves before an email is sent. Persistent bad actors are blocked by IP for 2 weeks.
4. Rate limiting by IP. This prevents bad actors from spamming our help desk.
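As a rough illustration of item 2 (not our actual implementation; the is_obvious_spam helper and its parameters are made up for this sketch), the check amounts to little more than a few substring tests:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper illustrating the checks from item 2: if the
 * submitted name or the User-Agent header matches one of the obvious
 * spam patterns, the caller responds with HTTP 400. */
static bool is_obvious_spam(const char *name, const char *user_agent)
{
    if (strstr(name, "http://") || strstr(name, "https://"))
        return true;                       /* URL smuggled into the name */
    if (strstr(name, "Whatsapp"))
        return true;                       /* common spam keyword */
    if (user_agent && strncmp(user_agent, "python-requests", 15) == 0)
        return true;                       /* scripted client */
    return false;
}
```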
Altogether, this seems to have largely solved the problem... for now.
So far all "my" spammers seem indiscriminate. I have the feeling that our form is just another URL on a very, very long list, and I see no evidence of adaptation.
If there was something of value (besides excellent software) behind the signup form maybe we would need different strategies.
A signup that looks successful but is silently blackholed is user-hostile. The majority of web form spammers don't even try to evade filtering - why bother, when there are so many web forms without any validation? All the signup spam I've ever seen in email was sent via forms that allow longish free text in the name field, containing either a URL, an email address, or a phone number. Reasonable validation of user-entered fields that end up in signup emails should stop 99% of such spam.
Ah, I got a phone call yesterday because a small company I ordered something from didn't know where to send the package.
I gave them an e-mail address that contained a "+". Apparently MondialRelay silently dropped their requests. They thought they had sent me multiple e-mails. I received none.
I'm so tired of these idiots. I've been slowly teaching various (non-tech) co-workers how to protect themselves, and every time we get to email signups, I have to add a caveat about this only sometimes working, if the company has intelligent programmers.
The way it goes is that an attacker gets hold of e.g. an Amazon account and starts ordering stuff to his address. In order to prevent the victim from becoming suspicious, the attacker buries the Amazon emails in an avalanche of spam emails. So a bot was submitting the victim's email address to my sign up form, and my app sent out a verification email to the victim, becoming part of the distraction.
Yes, I'm not too big a fan of Google. Plus (I don't really know how it works behind the scenes) the Cloudflare approach is a simple checkmark, so there's no annoying clicking of tiles with motorcycles in them...
Any ideas on how a simple proof-of-work could look like, also from the backend side?
A simple proof-of-work would use a hash function (SHA or something like BLAKE2, in JavaScript or WebAssembly, possibly multiple rounds), plus a server-provided random seed. Append data to the seed until the hash output matches a certain condition. Submit the data to the server for checking.
Something like this pseudo-C code:
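(A minimal sketch: hash_function is a placeholder for a real hash such as SHA-256 or BLAKE2 from a crypto library, and the buffer sizes are arbitrary.)

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIGEST_LEN 32

/* Placeholder for a real hash (SHA-256, BLAKE2, ...), possibly run for
 * multiple rounds; provided by a crypto library in practice. */
void hash_function(const uint8_t *data, size_t len, uint8_t digest[DIGEST_LEN]);

/* True when the leading n_bits of the digest are all zero
 * (n_bits being the difficulty). */
static int check_output(const uint8_t digest[DIGEST_LEN], unsigned n_bits)
{
    for (unsigned b = 0; b < n_bits; b++)
        if (digest[b / 8] & (0x80u >> (b % 8)))
            return 0;
    return 1;
}

/* Client side: the seed comes from the server; try counters until the
 * digest of (seed || i) satisfies the condition, then submit i. */
static uint64_t solve_challenge(const uint8_t *seed, size_t seed_len,
                                unsigned difficulty)
{
    uint8_t buf[256];                 /* assumes seed_len + 8 <= sizeof buf */
    uint8_t digest[DIGEST_LEN];
    uint64_t i = 0;

    memcpy(buf, seed, seed_len);
    for (;;) {
        memcpy(buf + seed_len, &i, sizeof i);
        hash_function(buf, seed_len + sizeof i, digest);
        if (check_output(digest, difficulty))
            return i;                 /* proof found, send i to the server */
        i++;
    }
}
```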
With check_output checking that the leading n bits are all zero, for instance, with n the difficulty.
On the server-side, no loop is needed: only one check with the server-provided data and client-provided "i" is enough.
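Under the same placeholder names as above, the server-side check could look like this:

```c
/* Server side: recompute the digest of (seed || i) once and verify the
 * difficulty condition; no search loop is needed here. */
static int verify_challenge(const uint8_t *seed, size_t seed_len,
                            uint64_t i, unsigned difficulty)
{
    uint8_t buf[256];                 /* same assumption on seed_len */
    uint8_t digest[DIGEST_LEN];

    memcpy(buf, seed, seed_len);
    memcpy(buf + seed_len, &i, sizeof i);
    hash_function(buf, seed_len + sizeof i, digest);
    return check_output(digest, difficulty);
}
```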
In practice, i is probably going to be much larger than 32 bits, and any cheap way of changing it should be enough; it doesn't have to be linearly increasing.
I imagine countless proof-of-work libraries exist.
The issue with that approach is that slower clients will take longer to solve the challenge, but it just needs to be prohibitively expensive (and slow) for an attacker to spam this, even if they have powerful machines.
Botnets can sidestep the issue a bit by distributing computing, but this should still slow them down.
You could also ask for the client's best find after x seconds if it's low-powered, and check that it is reasonable (though that can be gamed). The difficulty can also be increased temporarily if there is a surge of requests.
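A hedged sketch of that last idea; the thresholds are made up and would need tuning against real traffic:

```c
/* Map observed signup attempts per minute to a difficulty (leading
 * zero bits); purely illustrative numbers. */
static unsigned difficulty_for_load(unsigned requests_per_minute)
{
    if (requests_per_minute < 10)    return 16;   /* quiet */
    if (requests_per_minute < 100)   return 20;
    if (requests_per_minute < 1000)  return 22;
    return 24;                                    /* likely under attack */
}
```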
Maybe we need some kind of "Internet weather forecast" to adjust captcha difficulty across websites according to detected botnet activity?
It's true, I assumed there would be multiple targets, maybe that was misguided?
You probably can't use a single website to send those 20 messages (if it's well-designed with a cool down), as the recipient address is the same. So you need 20 different websites to send these.
What outgoing spam rate is acceptable, assuming you can't reach zero? 1 per 10 sec?
Wow. It’s unfortunate that so many programmers dedicate their time to breaking systems like this all for spam. It’s pathetic the lengths we have to go to protect even the most basic parts of systems like sign up forms.
I understand the "this is why we can't have nice things" sentiment.
On the other hand, it's a fascinating topic on a technical level, and I wouldn't compare it to other exploits of systems you may witness in real life, where people take advantage of each other.
In my app, users can create projects, and invite other users to their projects. The invite email contains the project's name. You can guess what happened next :-)
The spammer must have been quite determined -- for free accounts there is a limit of invited users, so their automation script had to send an invite, cancel the invite, send the next invite, cancel, and so on.
I reject posted data from open forms if there is any "http", "https", or "@" in the field.
These are all spam. The last valid case was when someone wanted to report a bug on a page of my site and wanted to include the URL of that page. But this case is so rare nowadays. People can't differentiate the URL of a page from the page itself.
I have to think that the recent trend of dimming or even hiding (Safari) the full URL of the page has contributed to people not understanding how URLs work.
I think the first time I read about this had to be... 1999 or so? Funny to think that form-to-email scripts are still getting abused like this 20+ years later.
Around 2000 I fixed more than a few FormMail.pl on a freelance basis. I suspect it's still doable to be a specialized consultant focusing on nothing but spam prevention.
> Never include any user-input text in welcome emails, or any other type of emails triggered by submitting publicly accessible forms, where the receiver’s email address is part of the submitted data.
Isn't it better to validate input data? If you don't, you can have bigger problems than sending spam, regardless of whether you use user input data in welcome emails or not.
It amazes me to see the kind of code at some startups. Such code would never pass manual testing at a software company. But likely it won't even reach the testing phase, because it won't pass code review.
I once fixed a Contact Us form that allowed the recipient email address to be overwritten, so anyone could put their spam in the body and send it to anyone via that form.
Input validation is important. PHP's mail() function has a parameter for "extra headers", which is often used to construct a From header on contact forms. This was exploited very frequently back in the early 2000s. I hope people have gotten more careful since then ...
The way it worked was that the "from" email address could contain CRLF and then a bunch of extra headers, even a message body, which would simply get injected.
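As an illustration of the fix (in C rather than PHP, and the helper name is invented): reject any user-supplied value destined for a mail header if it contains CR or LF.

```c
#include <stdbool.h>
#include <string.h>

/* True when a user-supplied value is safe to place in a mail header:
 * a CR or LF would let the sender inject extra headers or a body. */
static bool safe_for_mail_header(const char *value)
{
    return strpbrk(value, "\r\n") == NULL;
}
```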
The issue is also that most email clients will automatically convert URLs to links in HTML emails. So if a URL is put in e.g. your name field, it will still be clickable despite there being no <a> tags.
I'm instead having people try to use my newsletter to "DDOS" other people (subscribe someone to 1000 newsletters, of which one is mine, and see how they like it). Subscription confirmation emails don't help, because now instead you get 1000 confirmation emails. Or at least that's what I think is happening.
We only include the username, not the (optional) real name. If I had thought about it before now, I’d have said that sounds suboptimal. But now, I guess it was smart ;)
Not OP, but usernames usually have restrictions to certain characters and lengths. It's pretty common to not allow : / and ., which would stop it. Space and @ usually aren't allowed either.
So "Go to superspammysite.com for horny pictures" is still an allowed username, right? Or of you skip spaces "superspammysite.com" may be enough that some spammer wants to abuse it.
I think you can't restrict usernames enough to not allow any spam. The only right way is to really just have the email and nothing else in the first confirmation mail.
Then you have "Hello superspammysite.com, welcome to our website." Not enough space for some kind of scam-text like in TFA. And considering that this has not happened in the last 27 years, it does not actually seem that worthwhile for spammers.
There are low tech and effective ways to tell bots from real people but I don't want to share them in public with possible bot writers here, sorry. If they know they don't care because they are successful enough without complicating their code a little bit more. If they don't know, don't tell them.
> This is one of those articles where reading the title is enough. No need to read the rest
then why isn't the title on HN what the article says is all we need to know?