
How To Stop Forum Spam - latitude
http://swapped.tumblr.com/post/36085762303/how-to-stop-forum-spam
======
TomGullen
The main way to stop spam on your website is to be ahead of the herd. At the
moment the general herd of websites does very little, so a couple of honeypots
and IP bans are all you need most of the time.

We were swamped with spam recently when our PageRank hit 6 (I guess spammers
target sites with higher PR). For this we had to block users with under n
posts from posting URLs. That basically eradicated that wave of spam.
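A sketch of that kind of check (the post-count threshold and URL pattern here are illustrative, not what any particular forum uses):

```python
import re

# Matches http(s):// links and bare www. domains; illustrative, not exhaustive.
URL_RE = re.compile(r"https?://|www\.", re.IGNORECASE)

MIN_POSTS_FOR_LINKS = 10  # the "n" from above; pick whatever fits your forum

def may_contain_links(post_count):
    """New accounts haven't earned the right to post URLs yet."""
    return post_count >= MIN_POSTS_FOR_LINKS

def reject_post(body, post_count):
    """Return True if the post should be rejected."""
    return URL_RE.search(body) is not None and not may_contain_links(post_count)
```

The same check has to run on edits as well as new posts, which is exactly the hole the spammers found within 12 hours.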

It's amazing how smart and adaptive some of them are though. It only took them
12 hours to realise that if they posted something clean and then edited it,
they could edit in links. We fixed that one promptly and they haven't been
back since.

So for that reason your suggestion probably would only be temporary. I'd also
be concerned about a lot of false positives.

I also believe a lot of spam is human. If you have people willing to work for
a few dollars a day spamming websites that automated processes have difficulty
reaching, it could well be a cost-effective method for spammers to reach their
targets.

~~~
citricsquid
> We were swamped recently with spam when our page rank recently hit 6 (I
> guess spammers target sites with higher PR)

We have really good search engine rankings, and if you post a topic on our
forum it'll be in all the search engine results within minutes. The _funniest_
thing I've ever experienced was discovering that when a spammer submits a post
about a live stream for a football match 15 minutes before kick-off, we'll see
~3,000 people browsing the forum, all from Google because they searched "watch
barcelona vs real madrid", and our ranking is being abused by spam bots.

Prior to realising this happened I'd always viewed spam on forums as a sort of
"hope people click the links" _thing_, but after discovering just how much
traffic spammers like those in my example can drive... they must be making _a
lot_ of money. The forums are used as their quick way into the search results
for a term they know is about to blow up. I think it's pretty clever, and
obvious in hindsight.

<http://rmbls.net/post/21518498093/barcelona-vs-real-madrid>

~~~
TomGullen
We have that as well, and ranked very highly for some long tails all relating
to live sports events.

However, it is our opinion that the traffic these links generate is fake (and
we have some evidence for it as well). I think its aim is to trick Google into
thinking it's a good-quality result; alternatively, the links are often very
heavy in adverts, so fake traffic routed via Google, your website and finally
the spammer's website could be used to generate lots of fake ad clicks in a
semi-realistic way. That's our conspiracy theory anyway; I very rarely see
real-world examples of content on the web being ranked so well so quickly and
driving so much traffic in such a short space of time.

As a small, unscientific test, when a seemingly popular spam sports-event
thread was posted I deleted it and created a duplicate under my own account.
It didn't get any traffic at all. This in some way helps validate my suspicion
that the traffic being driven is not a result of Google's ranking process, but
is artificial.

I was going to write a blog post about it but never got round to it!

~~~
citricsquid
Hm, that sounds like a plausible theory. I've never investigated beyond
looking at the locations via Google Analytics, and they always seemed to be
European visitors (Spain was the majority), which I assumed meant the traffic
was legitimate because the football matches of interest match up. Next time it
happens I'll look up some of the IP addresses used and see if they match any
known spammers, although I suspect if they're doing some sort of click fraud
they'd be using botnets?

------
jiggy2011
The easiest way to do it in my experience is to write your own simple forum
software. This should only be a few hours' work in a framework and has the
added bonus of being more specialised to your needs and probably simpler to
use.

If it's a small forum you probably don't need any actual moderation features
(stuff you do need you can do through a scaffolding interface).

Bots seem to be attracted to phpBBs like flies to shit so not being
fingerprinted is a huge bonus.

Also helps to do weird stuff: like instead of having an <input type="submit" />
for form submission, use an IMG tag with an onclick that reads data from the
form fields in the most roundabout way possible.

Also make the URL of each thread change when there is a new post, so it's
harder for the bots to tell whether their posts are working. Better yet, don't
display a new post on the site itself for 30 seconds or so, but use JS to make
it appear to the submitting user.

Some of those tricks can of course hurt usability and accessibility, but I
managed to reduce spam to zero on one forum.
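The delayed-display trick is a small server-side filter; a sketch (field names and delay are illustrative):

```python
import time

PUBLISH_DELAY = 30  # seconds before a new post is shown to anyone but its author

def visible_posts(posts, viewer_id, now=None):
    """Hide posts younger than PUBLISH_DELAY from everyone except their
    author, so a bot POSTing and reloading can't tell if its spam landed.
    posts: dicts with 'author_id' and 'created_at' (unix timestamps)."""
    current = now if now is not None else time.time()
    return [
        p for p in posts
        if p["author_id"] == viewer_id
        or current - p["created_at"] >= PUBLISH_DELAY
    ]
```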

~~~
citricsquid
The sort of forums you're talking about won't be what spammers really care
about. If they can hit a small forum with no work needed they _may as well_
but the targets that they'll _work_ for are the high traffic forums that can't
sacrifice usability, SEO or accessibility in any meaningful way to stop spam.

If you want to stop spam on a small forum you don't need to break usability
because nobody is going to spend time looking at how you're doing things to
get around your spam prevention.

~~~
jiggy2011
I suppose it depends upon how sophisticated the spam bots you are dealing with
are and whether they have human assistance.

We originally used phpBB and applied various anti-spam plugins, including some
modifications made manually to the PHP code (such as honeypot fields hidden
with CSS), but the bots kept spamming it regardless.

I guess there is a sense of "we know this is a phpBB, therefore we must be
able to spam it, so keep trying" vs. "I don't know what this is", "can I POST
this form and see instant results?", "no?", "give up then".

If it had been a larger forum then I'm sure these tactics would not have been
effective regardless since we would have suddenly become worthy of having a
custom spam bot written just for us.

Those tactics are somewhat extreme, in many cases just giving form fields
weird names has good results. Of course there is the problem of "what is bad
for spam bots can be bad for screen readers".

------
buro9
99% of spammers can be stopped simply by IP and email bans.

I find this one of the best providers of up-to-date lists... oh, and they have
an API I use at registration time: <http://www.stopforumspam.com/>

The last 1%, I let my other users flag, and then I ban them and add their data
to the site above.

This isn't a big problem anymore; it just sounds like the author hasn't
integrated his forum with the site above.
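A registration-time check against their API might look like this (the api.stopforumspam.org endpoint and the `appears` response field are as their docs describe them at the time of writing; verify against the current API documentation before relying on this):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def is_listed(data):
    """Parse a StopForumSpam JSON response: a value is a known spam
    source when its 'appears' flag is 1."""
    return any(
        data.get(key, {}).get("appears", 0) == 1
        for key in ("ip", "email", "username")
    )

def check_registration(ip, email):
    """Query StopForumSpam at registration time; True means 'likely spammer'."""
    qs = urlencode({"ip": ip, "email": email, "json": ""})
    with urlopen("http://api.stopforumspam.org/api?" + qs) as resp:
        return is_listed(json.load(resp))
```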

~~~
latitude
The principal difference is that Stopforumspam is a blacklisting method, and
what I described is a whitelisting one.

Theirs is an effective approach, and it works great for dedicated forum sites.
But I think it's overkill for simpler setups, for two reasons. First, it
creates an obvious dependency on an external service. Second, if I want to
allow unregistered posting, it leaves me with only an IP address as a data
point, and I wouldn't bet that their by-IP detection is very accurate.

What I described is a simple, self-contained, virtually maintenance-free way
to detect humans trying to post. I don't argue it's superior to other methods,
but it is _simple_ and it helps simplify the experience for real visitors.
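One illustrative way to implement that kind of whitelisting (not necessarily the article's exact scheme; the secret, cookie format and minimum age here are placeholders): issue a signed timestamp cookie on the first page view, and only accept posts from visitors who have held a valid cookie for a little while of normal browsing.

```python
import hashlib
import hmac
import time

SECRET = b"change-me"  # server-side secret; placeholder value

def issue_token(now=None):
    """Set this in a cookie the first time a visitor views any page."""
    ts = str(int(now if now is not None else time.time()))
    sig = hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()
    return ts + "." + sig

def may_post(token, min_age=10, now=None):
    """Accept a post only from visitors who picked up a valid cookie
    at least min_age seconds ago; drive-by bots POSTing straight at
    the form have no cookie at all."""
    try:
        ts, sig = token.split(".")
    except ValueError:
        return False
    expected = hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False
    current = now if now is not None else time.time()
    return current - int(ts) >= min_age
```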

~~~
Jhsto
I've been developing this 'invisible captcha' thing for a while now, which is
basically meant to block brute-force attacks and spammers alike from posting
comments, registering accounts, etc. The visual difference to the end user is
- as said - invisible, which obviously should be a good thing. Now that I read
your comment I'm starting to wonder whether anyone would really use my
service, as their site's spam blocking would then be dependent on me. Are
there other people who see it this way?

~~~
latitude
FWIW I'm pretty sure that my views on external dependencies aren't very common
or popular. I think you'd have no problem finding users for your service.

~~~
moepstar
IMO this depends - I'm sure quite a few factor this into their decision, but
in the end it comes down to your reputation for being stable/online 99.XX% of
the time versus being flaky...

------
gm
This is not really about how to stop spam, as much as it is a bite-sized
statement that we should analyze behavior to tell bots apart from real users.

------
joshuahedlund
I've found the best solution is to use a variety of strategies. We recently
switched a forum from phpBB to MyBB in the hope that a slightly less popular
platform would get less spam... we had a spam user sign up and post within 60
seconds of finishing our conversion.

However, we set up a StopForumSpam plug-in, a time-based plug-in (if you
complete the registration page in < X seconds, you're probably a bot), and a
custom-questions plug-in, and we haven't seen any spam since. If they figure
out how to break through those walls, we'll try a few more... (On the other
hand, it's a small enough forum that I don't know if we're missing false
positives.)
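The time-based check is tiny; a sketch with an illustrative threshold (record server-side when the form was served, then compare at submission):

```python
import time

MIN_FORM_SECONDS = 5  # humans rarely finish a registration form this fast

def too_fast(form_issued_at, now=None):
    """Reject registrations completed in under MIN_FORM_SECONDS.
    form_issued_at: server-side timestamp recorded when the form was served."""
    current = now if now is not None else time.time()
    return current - form_issued_at < MIN_FORM_SECONDS
```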

Yes, it's a cat and mouse game, but it's one you can generally stay on top of
with minimal configuration once you understand what the spammers want and how
to use your platform. And I have to admit there's some thrill involved at
successfully thwarting their plans...

~~~
lazyjones
Switching to another "mainstream" forum software is probably in vain. We have
our own software with unusual URLs, navigation, registration (requiring
confirmation via e-mail) and still get a couple of spambots trying to post
their shortened URLs every day, as well as obviously human spammers who try to
participate in discussions for 4-5 posts, then post a few spam URLs.

~~~
coffeeaddicted
Also, I'm pretty certain that it's a mixture of bots and humans by now. I'm
using random questions currently, and whenever I change them the number of
bot registrations goes down for 1-2 days. Then it slowly climbs back up over
the week, until I think up new questions.

So I guess there's probably someone going over unknown questions regularly and
adding new answers.
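A rotating-questions setup can be sketched like this (the questions, answers and normalisation are illustrative):

```python
import random

# Rotate these whenever bot registrations start creeping back up.
QUESTIONS = {
    "What colour is a banana?": {"yellow"},
    "What is two plus three, in digits?": {"5"},
}

def pick_question():
    """Serve a random question with the registration form."""
    return random.choice(list(QUESTIONS))

def answer_ok(question, answer):
    """Case- and whitespace-insensitive check against the accepted answers."""
    return answer.strip().lower() in QUESTIONS.get(question, set())
```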

------
Metatron
Nice idea until spammers just make bots that idle on your site and follow a
few links before and after signing up.

~~~
huhtenberg
They said the same about greylisting [1] - that the spammers would just write
retry logic into their sending daemons. That was several years ago, back when
I enabled it on my mail server, and I have yet to see a single spam that
didn't come from Hotmail or Gmail.

[1] <http://en.wikipedia.org/wiki/Greylisting>
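For reference, the core of the greylisting idea fits in a few lines (the retry delay and return codes here are illustrative; real implementations also expire and whitelist triplets):

```python
import time

# Greylisting: temporarily reject mail from unknown (ip, sender, recipient)
# triplets; legitimate MTAs retry after a delay, most spam cannons don't.
GREYLIST_DELAY = 300  # seconds a triplet must wait before being accepted

seen = {}  # triplet -> time of first delivery attempt

def smtp_decision(ip, sender, recipient, now=None):
    current = now if now is not None else time.time()
    triplet = (ip, sender, recipient)
    first = seen.setdefault(triplet, current)
    if current - first >= GREYLIST_DELAY:
        return "250 OK"            # retried after the delay: accept
    return "451 try again later"   # temporary failure: ask the MTA to retry
```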

~~~
nivla
Sadly, greylisting is an absolute nightmare for server admins on the sending
end. We send out time-sensitive emails to our clients, and Yahoo's greylisting
can sometimes delay accepting the mail for up to an hour. This causes extreme
frustration for our customers, and some even leave as a result. It's even
worse because customers are never aware of greylisting, and even if they are,
Yahoo won't let them control it. Asking someone to change their main mail
provider doesn't end well either. This can also happen with the confirmation
emails sent when they first sign up to your site. Spam does indeed cost
businesses a lot, directly or indirectly.

------
gearoidoc
I had this problem recently in the custom blog/comment functionality I had
written for my company's app.

I've taken a few steps which have completely rid me of spam (for now):

\- IP blacklisting: an obvious one, but well worth the few minutes it takes to
set up; most of our spam was coming from China.

\- Link blocking: our comments don't really require links to work, as the
comments are generally short and to the point and the blog is not used by
terribly tech-savvy users.

\- Hidden checkbox: add a hidden checkbox to the form. If it comes through as
checked you know a human didn't submit the form.

Analysing a visitor's progression through the site is a neat idea though - if
spam becomes an issue again then I may use this approach (since I'm already
gathering this data for custom analytics).
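The hidden-checkbox trick above is a one-liner server-side (the field name is illustrative):

```python
def bot_submitted(form):
    """The 'newsletter' checkbox is hidden with CSS, so a human can never
    tick it; form-filling bots tend to tick every box they find.
    Browsers send "on" for a checked checkbox."""
    return form.get("newsletter") == "on"
```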

------
Thrall
Has bayesian filtering been tried on forum posts? (It can work well for
emails)

If the filter is unsure, the post can be referred to an administrator for
moderation (and will subsequently be added to either the spam or ham corpus,
training the filter for similar posts).

In the case of false negatives (spam gets through), a discreet "report spam"
button will allow a moderator to add it to the spam corpus (again training the
filter against similar occurrences).

It might even be possible to use the filter score to reduce "report spam"
abuse, i.e. if the filter is fairly certain it's ham, require a larger number
of users to report it as spam before bothering an admin with it.
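A minimal word-level naive Bayes scorer along these lines (a sketch, not production-ready; real filters would tokenise better and persist the corpora):

```python
import math
from collections import Counter

class NaiveBayes:
    """Tiny word-level naive Bayes classifier, trained on ham/spam posts."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.totals = {"spam": 0, "ham": 0}

    def train(self, label, text):
        words = text.lower().split()
        self.counts[label].update(words)
        self.totals[label] += len(words)

    def spam_probability(self, text):
        vocab = len(self.counts["spam"] | self.counts["ham"]) or 1
        log_score = {}
        for label in ("spam", "ham"):
            total = self.totals[label]
            # Laplace smoothing: unseen words don't zero the whole product.
            log_score[label] = sum(
                math.log((self.counts[label][w] + 1) / (total + vocab))
                for w in text.lower().split()
            )
        diff = log_score["spam"] - log_score["ham"]
        return 1 / (1 + math.exp(-diff))  # logistic of the log-odds
```

A score near 0.5 would queue the post for moderation; a low score could also raise the number of reports needed before an admin is bothered, as suggested above.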

------
sheraz
I've mentioned this before in previous posts:

      * http://news.ycombinator.com/item?id=4646710 (How to beat comment spam)

in short: <https://www.projecthoneypot.org/>

------
codebeaker
Doesn't this solution break as soon as people come directly to the forum from
a search engine? And asking bots to maintain a cookie jar is hardly
challenging.

------
systematical
I could just as easily craft my bot to appear more like a user. "Reading"
posts, checking PMs, etc. etc.

------
islon
I'll give it a week until the bots adapt.

~~~
latitude
See huhtenberg's comment above. It was my experience too - the spammers are
just too lazy/busy/cheap to chase fractions of a percent.

~~~
nospamforums
Maybe they're just efficient. Little work for maximum gain is more efficient
than lots of work for the same gain. (This is exactly what you said with busy
and chasing fractions of a percent.)

