Another huge problem is that people are paid next to nothing in China and India to manually spam websites and break captchas. The number of human spammers keeps increasing. When I left last year, it was becoming a huge problem. Definitely the biggest headache for us in ~5 years.
In my experience, the best protection against web spam is still Akismet/Mollom/Defensio. And for the record, I know we didn't like it when people used other mechanisms to stop some spam before it got to us, because we didn't get to see the full corpus, which was invaluable to us in helping all our users fight spam.
Those are reserved for the big ones; for all the others it's mostly general-purpose bots that try every form they can find on the internet. Where speed is more important than accuracy, spammers won't use the "heavy" bots.
Yes, it's nice (I guess) when someone's name is a link to their personal website or they can post the URL of a relevant article in the comments, but it's not like commenting ceases to be valuable without those features.
Spam and link spam were already there before Google existed and the PageRank was invented. The index of the AltaVista search engine was huge and full of spam.
When the nofollow value for the rel attribute was introduced there were many claims that this would reduce the amount of link and comment spam. Critical remarks came often from people who were offering link building and SEO as a service.
The number one effective thing we have found is to not allow hyperlinks to be posted by users who are not trusted (not enough rep/points/score, whatever).
Overnight it basically stopped the spam wave. You're removing the one thing of value to them: a hyperlink. I'm a big fan of accessibility, and this works well with it. The only other technique we use is honeypot form fields, which do catch a fair few, but nowadays I suspect a lot of spam is paid human spam.
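A minimal server-side sketch of that gate, in Python. The threshold and the URL pattern are assumptions, not anything from the original setup; tune both to your own community:

```python
import re

# Hypothetical threshold; adjust to your site's reputation scale.
MIN_REP_FOR_LINKS = 10

# Deliberately loose: catches http(s) URLs and bare www. domains.
URL_RE = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)

def comment_allowed(text, user_reputation):
    """Reject comments containing hyperlinks from low-reputation users."""
    if URL_RE.search(text) and user_reputation < MIN_REP_FOR_LINKS:
        return False
    return True
```

Trusted users keep full linking ability, so the restriction only bites accounts that haven't earned any reputation yet, which is exactly the population spam accounts live in.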
Each comment has exactly one pair of transposed letters. There is no product being pitched, and no URL (we don't display or link to email addresses either). It's baffling.
Sounds like they're doing what's known as "Bayesian poisoning" (http://en.wikipedia.org/wiki/Bayesian_poisoning) ahead of time to open the door for later link spamming.
Also, I'm a fan of not allowing brand new accounts to post URLs in their comments. It's a no-brainer.
Pity it also targets normal users that simply want to post a hyperlink :(
Isn't the website field for a hyperlink? (or at least a domain that is converted to a hyperlink?).
If I leave a comment but refer back to a related page on my blog, is that automatically "spam" in your opinion?
I may not have reverse engineered it fully, but something like this will allow me to post images around the internet that actually create comments on your site by the IP of the visitor.
<img src="http://dendory.net/blog.php?id=5078058e&cn=Kudos&cp=..." />
Step 2: Google, Bing, and others index my site
Step 3: Every bot that crawls my site is now a spambot
Then disallow any form submissions server-side which contain a value for 'website'. Automated bots can't resist filling out that field.
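The server-side half of that honeypot is a one-line check. A sketch in Python, with the field named 'website' as above (everything else is illustrative):

```python
def is_honeypot_tripped(form_data):
    """Bots tend to fill every field they find. Humans never see the
    'website' input (it's hidden via CSS), so any value in it marks
    the submission as automated."""
    return bool(form_data.get("website", "").strip())
```

The field should be hidden with CSS rather than `type="hidden"`, since many bots are smart enough to skip inputs that are literally marked hidden.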
It works very well unless you're big enough to merit individual attention from a spammer. It's not rocket science - it just raises the bar a little above the level of effort that people who spam everything, everywhere are willing to put in.
That might change.
The plugin improves on the method described by randomly generating the value of the additional token parameter and keeping a list of all generated tokens. If the server receives a comment post request that does not contain one of the generated tokens, the comment can safely be treated as automated spam.
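A sketch of that token scheme in Python. Names and the in-memory storage are assumptions; a real plugin would persist tokens and expire old ones:

```python
import secrets

class TokenGate:
    """Issue a random token with each rendered form; accept a comment
    only if it carries a token we issued, and burn the token on use."""

    def __init__(self):
        self._issued = set()

    def issue(self):
        # Embed this value in the comment form as a hidden parameter.
        token = secrets.token_hex(16)
        self._issued.add(token)
        return token

    def redeem(self, token):
        if token in self._issued:
            self._issued.discard(token)  # one submission per form instance
            return True
        return False
```

Burning the token on use also blocks replays: a bot that scrapes one valid token can't reuse it for a second post.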
We're not allowing commenting on WP, but obviously have to allow people to post on the forum. The forum software offered a couple of (unofficial) anti-spam plugins, but they were not effective at all.
Decided to try re-captcha, but found that to be equally ineffective (hadn't read about just how broken re-captcha is until this incident).
So I spent 10 minutes writing a little script that checks for mouse movement and clears a pre-populated field. If the field isn't empty, bot it is.
Wasn't sure it'd work, but so far, so good. I know it's not ideal and will be a problem for people without js enabled, but the site and product are targeting a demographic in which that's likely to be a rare occurrence so the benefit > risk.
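The server-side complement of that trick might look like this in Python, assuming the client-side JS clears a pre-filled sentinel field on mouse movement (the field name and sentinel value here are made up):

```python
# Assumed pre-populated value; the page's JS clears it on mousemove.
SENTINEL = "not-a-bot-check"

def looks_automated(form_data):
    """If the sentinel field still holds its pre-populated value, no
    mouse movement was observed client-side, so treat the post as a bot's."""
    return form_data.get("sentinel_field", "") == SENTINEL
```

Listening for keydown/keyup as well as mousemove (as suggested below in the thread) would cover keyboard-only users before clearing the field.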
Nice idea. I tend not to use the mouse a whole lot once the 'reply' link has been clicked, have you had any complaints of legitimate posts being lost?
I'm wondering if adding a check for key down/up events would mitigate this potential issue since a spam bot is not likely to generate those either.
The forum requires registration (and verification) before posting, so once they're registered there aren't any restrictions. And one of the benefits of this check is that there aren't any "human verifications" visible to the user. In fact, I could probably do away with the email validation too.
For example, we built a website with a forum some years back and used phpBB. Within days massive amounts of explicit porn had been posted all over it and we had a client threatening to sue.
We tried everything we could to get rid of it: stopping images/hyperlinks from being posted, adding captchas, anti-spam plugins, and doing stuff like adding sneaky hidden form fields.
At one point we even deleted the signup form and required administrators to create accounts by hand on request for users, yet the bots still somehow managed to create their own accounts on the forum.
None of it worked for over a month at a time.
In the end I just built a super simple php forum by hand in a few hours with very rudimentary anti-spam since it was a small forum and we weren't using many phpBB features anyway.
Took over a year for the bots to come back and at that point switching the HTML around and changing the form field names seems to have kept them away thus far.
The "problem" with these kinds of tricks is that they work for small/medium websites, and only if they are not adopted as part of a big library that everyone uses.
They are not that hard to beat if you want to spam someone intentionally, or if they are implemented by a well-known plugin for WordPress/Joomla/etc.
A simple honeypot with some CSRF tokens would reduce spam. If you want to beat spam altogether, then invest some time in a captcha, but expect it to come at the user's expense.
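A minimal CSRF-token sketch in Python, binding the token to the visitor's session with an HMAC. Key handling and session IDs are simplified for illustration:

```python
import hashlib
import hmac
import secrets

# Per-deployment secret; in practice load this from config, not code.
SECRET_KEY = secrets.token_bytes(32)

def make_csrf_token(session_id):
    """Derive a token from the session so cross-site posts can't forge it."""
    return hmac.new(SECRET_KEY, session_id.encode(), hashlib.sha256).hexdigest()

def check_csrf_token(session_id, token):
    expected = make_csrf_token(session_id)
    # Constant-time comparison avoids leaking the token via timing.
    return hmac.compare_digest(expected, token)
```

This also defeats the hotlinked-image attack described earlier in the thread, since an `<img>` tag can't carry a valid session-bound token.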
Some of the bots simulate mouse movements, some of them even inject letters/words into textarea elements as if someone is typing. It's not that hard to make it look like someone is correcting typos.
Server-side, encrypt a token that identifies the unique form instance and contains a tick count, and set a hidden input's value to it. Now, ensure that each form instance cannot be submitted more than once AND that the delta between the current tick count and the form's tick count is greater than or equal to the amount of time a human would need to fill out the form.
You MUST ensure client-side error detection is superb (as you want to catch all errors prior to submitting), handle back-button usage properly (browser caching directives, HTTP status codes, etc.), and handle browsers which may auto-fill information for the user.
You would be surprised just how many bots come in and either use a cached form or submit it immediately. Assuming they are smart enough to bypass both of these checks, you have still dramatically reduced the number of times they can potentially spam you.
The tick count threshold needs to be set on a form-by-form basis, as each one likely has a different minimum.
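A rough Python sketch of that scheme, using an HMAC-signed timestamp rather than encryption (signing is enough to stop tampering here). The threshold, names, and in-memory replay set are all illustrative:

```python
import hashlib
import hmac
import secrets
import time

SECRET = secrets.token_bytes(32)
MIN_FILL_SECONDS = 5      # per-form minimum; tune per form
_used_forms = set()       # form instances already submitted once

def issue_form_token(form_id):
    """Embed the form id and issue time, signed so clients can't alter them."""
    payload = f"{form_id}:{int(time.time())}"
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}:{sig}"

def accept_submission(token, now=None):
    try:
        form_id, issued, sig = token.rsplit(":", 2)
    except ValueError:
        return False
    payload = f"{form_id}:{issued}"
    good = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(good, sig):
        return False  # tampered or forged token
    if form_id in _used_forms:
        return False  # each form instance submits only once
    if (now if now is not None else time.time()) - int(issued) < MIN_FILL_SECONDS:
        return False  # submitted faster than a human could fill the form
    _used_forms.add(form_id)
    return True
```

The `now` parameter exists only to make the delta testable; in production you would always use the current time.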
Interestingly enough, one of you, someone who saw the story here, decided to actually write such a bot and start spamming my blog post, but again it was pretty stupid and trivial to block. Still, it's pretty sad that someone would go to such lengths and actually try to send hundreds of spam posts just for the kick of it.
Also, a lot of people mentioned captchas, and yes, I guess I should have mentioned that, but the reason I never used one is that I wasn't getting any spam in the first place.
The per-message payoff for spam is horrendously low. Spammers only do it because they can post a huge number of messages. The big threats are necessarily automated, and that automation isn't going to bother with special cases for any site that isn't worth their while.
For the longest time, the anti-spam measure on my blog's comments was a field that literally said:
Type the word "elbow": _____
Somebody finally added this to their bot, so I modified it slightly, to:
Type the word "humour", but with American spelling: _____
For the curious, the problem I chose is a standard one you'll find if you search for "hashcash". The quick version is that the server generates some random data and gives it to the client. The client then searches for a salt that, when added to the data, produces a SHA-1 hash with a given number of leading zero bits. The number of leading zero bits required can be easily tuned, with each additional bit roughly doubling the amount of time it takes to find a solution. The client's solution can easily and quickly be verified by just combining the client's solution with the generated data and counting the number of leading zeroes in the SHA-1 hash.
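That description translates almost directly into code. A Python sketch of the hashcash-style proof of work as described (brute-force salt search against SHA-1; the difficulty value is just an example):

```python
import hashlib

def leading_zero_bits(digest):
    """Count leading zero bits of a byte string."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        for shift in range(7, -1, -1):
            if byte >> shift:
                break
            bits += 1
        return bits  # stop at the first nonzero byte
    return bits

def solve(challenge, difficulty_bits):
    """Client side: search for a salt whose SHA-1, combined with the
    server's challenge, has enough leading zero bits."""
    salt = 0
    while True:
        candidate = str(salt).encode()
        digest = hashlib.sha1(challenge + candidate).digest()
        if leading_zero_bits(digest) >= difficulty_bits:
            return candidate
        salt += 1

def verify(challenge, salt, difficulty_bits):
    """Server side: one hash, then count leading zeroes."""
    digest = hashlib.sha1(challenge + salt).digest()
    return leading_zero_bits(digest) >= difficulty_bits
```

As the comment notes, each extra difficulty bit roughly doubles the expected search time (the `while` loop's expected iteration count is 2^difficulty_bits), while verification stays a single hash.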
I occasionally get spam, still. From looking at the logs, I'm about 99.9% sure that these spam comments are being posted by actual human beings sitting at a browser. I have no idea how it could possibly be cost effective to do this, but the quantity is low enough that it's not a real problem.
My crazy hashcash solution has an additional benefit, which some might see as a liability. I only start the work when the user clicks on the comment form, in order not to burn up their battery unnecessarily if they don't plan to leave a comment. The user then has to wait until the proof of work is completed, typically 20-30 seconds, before they can post a comment. This strongly discourages short, off-the-cuff comments, which are almost invariably worthless anyway.
In short: spam prevention is easy if your site is small and you have the time to invest in a custom solution. Any custom solution will do: as long as it doesn't match the patterns spambots look for, it doesn't much matter what you do; anything unusual works.
Once your site gets big enough, you'll no doubt need more. But cutesy stuff like changing your form variable names won't save you then anyway. If you're at the level where the linked solution works, you're at a level where nearly anything custom-made will work.
No spam yet. But it's quite a small site; this is probably over only about 6 million hits.
For all I know it's just because it's a hand-coded site. Trying this on a WP site is on my todo list.
However, isn't that something of the past?
I've totally outsourced my blog comments to Disqus (there are other alternatives) and I'd like to say, I'm very happy with my decision. Some manual spam still leaks through, but it's so minuscule that I don't really fret over it any more.
>I've totally outsourced my blog comments to Disqus
That's all well and good until someone writes a bot designed to target Disqus users because of the size of its userbase.
Fighting these kinds of problems makes for interesting mental challenges, but a technical solution isn't necessarily the best one. Shouldn't the price of having space on my site to comment be that you do so from some kind of online identity of your own?
They are an open and distributed service that uses various signatures (ip, tar pits, etc) to block spammers and bots on your site.
I put it up on one of my sites and saw an immediate drop to almost nil. I went from 100+ spammy messages a day to less than 20 in the last 3 months.
What is working for you is that it's custom code.
Anything that doesn't match standard templates is helpful.
The downside, of course, is having to download the sites you want to spam, whereas traditionally spammers apparently just send POST requests.
Might this code be shutting out legitimate users? (Apart from the fact that if you have JS turned off you can't comment, that is.)
As other commenters have pointed out, however, this kind of defence only works against generic attacks, and defending against a targeted spam attack will always be difficult. But for the generic case, there will continue to be simple things you can do to thwart naive attacks. One that springs to mind is to introduce a scripted timing element. A spam bot won't wait a minute before submitting, but a user should at least have read the post they're commenting on.
For any spammer that has softer targets it makes little to no sense to bother.
For personal sites it really comes down to your preferences. Personally I would prefer that everyone was able to comment, however if it stops you having to wade through thousands of spam messages every day I can see the point of using it.
In this case though, from an accessibility point of view there are a few issues with the use of a link tag rather than the standard form 'input submit' or 'button'.
- It goes against user expectations of how the form functions
- User would not be able to submit the form while focus is on one of the inputs (although this could be remedied with more js)
- User would have to realise that this form does not have a standard submit button and realise that the link tag is the submit button (difficult for screen readers because there are no alt tags).
There is also an issue with usability for the few who don't have js enabled, as they will not be able to submit this form.