

Reducing website spam - AshleysBrain
http://www.scirra.com/blog/61/reducing-website-spam

======
TeMPOraL
"Most of the time they are oblivious to it. Some of the time they feign
ignorance. The ones who are oblivious to it after a bit more questioning
appear to have hired 'SEO Experts' to help improve their website rankings.
These 'experts' then start up their various pieces of spam software and sit
back often charging the site owners a lot of money for that service."

Back when I was too lazy to install Akismet and managed comments on my blog
manually (via moderation), I used to see annoying SEO comments from time to
time - a fake profile with a genuine-sounding comment, backlinking to a
website catalog. When I was especially bored, I would go to such a catalog in
order to find the SEO company that was spamming me.

On one occasion I contacted a private photo studio, asking if they were using
SEO company XYZ, as links to them were all over the catalog. It turned out that
they had just hired an SEO company to 'position their website', without knowing
anything about how it's done. I explained the situation (sending screenshots),
and the studio immediately ditched the SEO company and hired another one. I
was amazed by this attitude and featured a special advert-like post on my blog
for them, and we've been in good contact ever since.

~~~
TomGullen
Good story! Being proactive like this is a great idea, and I think I will try
the same when we get some more. A lot of it comes from this sort of thing:
people who don't really understand what they are paying for, which is what I
tried to highlight a bit.

~~~
TeMPOraL
Thanks :).

Yes, you're right. I currently don't have any other idea how to deal with
those small, spammy SEO businesses than to raise awareness about the subject
and hit their customers directly. It's sad, because the customers here are
usually not to blame - like you said, they neither know nor understand what's
going on here. I was quite (pleasantly) surprised by the immediate reaction of
that particular person, who 'out of respect for my space in the Internet'
decided to take action that cost her money. I deeply respect such people.
The "vote with your dollars" mindset should be encouraged, IMO.

Anyway, here's the story - it's in Polish, but maybe HN folks from Poland will
find it interesting (and get to know about one thoughtful photographer) -
[http://temporal.pr0.pl/devblog/2011/09/19/spam-seo-i-fotogra...](http://temporal.pr0.pl/devblog/2011/09/19/spam-seo-i-fotografia-slubna/).

------
jackson71
A couple of ways I've handled the spam situation in the past:

1\. Base64-encode your form field names and decode them server-side prior to
processing, or...

2\. Create one-time use field names using md5 hashes of random numbers, map
them to their true fields and store them in a session. Then process against
those on the server side post-submit and clear the slate. (I used this method
more often than not.)

3\. Control the visibility of the honeypot field with CSS rather than
"type=hidden".

Using #1 or #2 in addition to #3, I've never had to use CAPTCHA or human
tests. A few paid spammers have come around from time to time, but since
automated software isn't sophisticated enough to pick apart which field is
which, it either doesn't even try or it throws whatever it can at the fields
and gets stuck in server-side validation.
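
A minimal sketch of method #2 above, assuming a plain dict stands in for the framework's session object. The field list and the function names `issue_field_names` and `resolve_submission` are made up for illustration and are not from the original post.

```python
import hashlib
import secrets

# Real field names the server expects; hypothetical for this sketch.
REAL_FIELDS = ("username", "email", "comment")


def issue_field_names(session: dict) -> dict:
    """Create one-time aliases for the real fields and remember them in the session.

    Returns a mapping of real name -> alias, for use when rendering the form.
    """
    aliases = {}
    for real in REAL_FIELDS:
        # md5 of a random number, as described above; used only as an opaque label.
        digest = hashlib.md5(str(secrets.randbits(64)).encode()).hexdigest()
        aliases[real] = "f" + digest
    # Store alias -> real so the POST handler can translate back.
    session["field_map"] = {alias: real for real, alias in aliases.items()}
    return aliases


def resolve_submission(session: dict, posted: dict):
    """Translate posted aliases back to real names, or return None on a mismatch.

    The mapping is popped from the session, so each set of names works only once.
    """
    field_map = session.pop("field_map", None)
    if field_map is None or set(posted) != set(field_map):
        return None
    return {field_map[alias]: value for alias, value in posted.items()}
```

A bot that posts the real names ("username", "email", ...) or replays an old form gets `None` back, which the caller can treat as a failed validation.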

~~~
TomGullen
Good points. With regard to CSS, I think it's important to specify exactly what
you mean though. I've seen this implemented dangerously, where they simply
position the form control off the side of the page (left:-4000px). This is
vulnerable to browsers auto-filling!

However hiding it completely with display:none should be safe and is what I
think you mean.

In our case, just hiding it works fine. We can upgrade this to a CSS solution
if we need to, though. The main point is that if you're doing _something_,
you're ahead of the herd enough for spammers to generally leave you alone.

------
typicalrunt
On the forum I work on (custom software; forked from JForum) we used to get
tens of thousands of spam posts a day. That stopped almost immediately with the
requirement of a verified email address.

We had tried CAPTCHA and honeypots but the spammers broke through them (we were
being actively targeted). Once we used email verification, it forced the
email providers (Gmail, Yahoo) to implement better verification
on their side and stop so many fake accounts from being created.

Spam is now a rounding error in my system (about 60 for every 56,000 daily
posts).

I should also add that I subscribe to the broken window theory, so we also
implemented Akismet to check all incoming posts for spamminess. We hide all
posts that Akismet marks as spam, and it cleans up the place enough to
(hopefully) ruin any spammer's SEO tactics. Once it becomes a futile effort to
post spam that's just going to get hidden, they seem to stop aggressively
targeting us.

~~~
TomGullen
Interesting, thanks! We don't want users to have to verify email addresses, as
we see this again as a barrier to entry that will put people off. However, I do
see the necessity of it in your case, as this doesn't always scale very well.

~~~
typicalrunt
We were worried about implementing this feature too, citing the same issue.
What we did was explain to the users why we were making the change, bless some
of the existing users who were known to be in good standing, and then provide
a tiered view of the forum for those accounts with an unverified email
address. The people in that group could post only text. However, once their
email address was verified, they were automatically put into another group and
could post like normal users again.

So while it's a barrier to entry, we attempted to minimize it as much as
possible.
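
The tiering described above could look roughly like this. It is only a sketch: the `Account` fields, the tier names, and the reading of "only text" as "no links or media" are assumptions, not details of the actual forum software.

```python
from dataclasses import dataclass


@dataclass
class Account:
    name: str
    email_verified: bool = False
    trusted: bool = False  # "blessed" existing member in good standing


def posting_tier(account: Account) -> str:
    """Trusted or verified accounts get the full tier; everyone else is text-only."""
    if account.trusted or account.email_verified:
        return "full"
    return "text_only"


def can_post(account: Account, has_links_or_media: bool) -> bool:
    """Text-only accounts may post as long as the post carries no links or media."""
    return posting_tier(account) == "full" or not has_links_or_media
```

Verifying the email address flips `email_verified`, and the account moves to the full tier on its next post without any manual step.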

------
eli
Important caveat for using honeypot hidden text fields: use a field name that
is gibberish.

If you name the honeypot field something like "address" or "website", you get
browser toolbars that will "helpfully" try to pre-fill the field for the user
even though it's hidden. And then you're flagging legit users as spammers. I
think an ideal system would simply require users who fail the honeypot field
to submit a captcha rather than lock them out altogether.
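
A small sketch of that fallback, assuming the form has been parsed into a dict. The field name `q7x_zk2` and the step names are placeholders invented for this example.

```python
# Gibberish name so that autofill tools have no reason to touch it (hypothetical).
HONEYPOT_FIELD = "q7x_zk2"


def next_step(posted: dict) -> str:
    """Decide what to do with a submission based on the honeypot field.

    A filled honeypot leads to a captcha rather than an outright rejection,
    so a legitimate user whose toolbar pre-filled the field can still get through.
    """
    if posted.get(HONEYPOT_FIELD, "").strip():
        return "captcha"
    return "accept"
```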

~~~
TomGullen
I've done some tests and no browser appears to autofill an HTML hidden field.
It's why we picked that method as opposed to CSS, as I'm not sure of the
consequences of the latter.

display:none on an input field would be OK, but there might be cases on some
sites where the field loads before the CSS file. This could cause auto-filling
to happen. It's a lot harder to test this sort of thing with CSS rules; it's a
lot easier just to use hidden fields.

I'd recommend using names such as "username" as honeypots, as the likelihood of
them being filled in is high.

~~~
eli
Did you test browser toolbars? I've actually been bitten by this. I believe
Google Toolbar for IE7 was one of the culprits.

------
RyanMcGreal
I managed to completely eliminate bot spam on a fairly popular site I
administer through a combination of a honeypot form field and a simple human-
testing question. This worked flawlessly for years, but a recent problem is
spam accounts that appear to be filled in by actual humans rather than bots.

~~~
TomGullen
We just don't like doing that, no matter how simple it is. A lot of our users
are from all over the world (probably 50% from non-English-speaking countries)
and it really makes it a lot harder for them to sign up. Also, anything that
acts as a barrier, no matter how weak, WILL lose customers and signups!

~~~
SageRaven
So use math, the universal language. I had a site request that I try to stop
spam on an evaluation/registration page. I added a randomized single-digit
addition problem to the form and the spam ceased. A single input field where
the answer is always 0 to 18 was all it took.

Of course, if someone took the time to target this specific site, that would
be easily thwarted.
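
For reference, a check like that can be sketched in a few lines. This assumes a dict-like session; the key `sum_answer` and both function names are invented for the example.

```python
import random


def new_challenge(session: dict, rng=random) -> str:
    """Pick two single digits, remember their sum, and return the question text."""
    a, b = rng.randint(0, 9), rng.randint(0, 9)
    session["sum_answer"] = a + b  # always between 0 and 18
    return f"What is {a} + {b}?"


def check_challenge(session: dict, submitted: str) -> bool:
    """Compare the submitted answer with the stored sum; each challenge is single-use."""
    expected = session.pop("sum_answer", None)
    try:
        return expected is not None and int(submitted.strip()) == expected
    except ValueError:
        # Non-numeric input simply fails the check.
        return False
```

Popping the stored answer means a failed or replayed submission needs a fresh question.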

It still puzzles me just how this custom-made form got infiltrated by spam to
begin with. Do people go around picking such forms and submitting form-specific
bot code to some vast pool sold to bot operators? Or are bots
far more intelligent than I give them credit for?

------
willvarfar
The complex captcha shown has been taken completely out of context; it's
at <http://random.irb.hr/signup.php> and it's completely appropriate and funny.

~~~
TomGullen
Fair point Will, I'll find a different image!

------
eli
If your site ranks highly enough for certain valuable keywords and the bad guys
start specifically targeting you, a lot of these countermeasures are useless.

I get comment spam on some sites that I'm 99% sure is being posted by a human
using a regular browser at an internet cafe in China.

~~~
TomGullen
If you're being targeted by humans, you're right, there's nothing you can really
do. If they are using automated systems to target you, the best thing to do is
start IP banning.

~~~
danneu
Ever since my forum got popular, I've had non-stop trouble with human
spammers. Straight up IP banning doesn't work. Too explicit and obvious.

A better solution is to let them register, but check all registration/post IP
addresses against the blacklist. If they match, unobtrusively move them into a
usergroup that just seems like your site isn't working that well.

Works wonders.
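
A rough sketch of that check using Python's standard `ipaddress` module. The blacklist entries (documentation-range addresses) and the group names are placeholders, not anything from the forum in question.

```python
import ipaddress

# Hypothetical blacklist; these are reserved documentation ranges.
BLACKLIST = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def is_blacklisted(ip: str) -> bool:
    """True if the address falls inside any blacklisted network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLACKLIST)


def assign_group(ip: str) -> str:
    """Let everyone register, but quietly route blacklisted IPs to a degraded group."""
    return "degraded" if is_blacklisted(ip) else "members"
```

What the degraded group actually experiences (slow pages, posts hidden from others, and so on) is left to the site.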

~~~
eneveu
Yep. There was an interesting article about this practice (hellban / slowban)
a while ago:

[http://www.codinghorror.com/blog/2011/06/suspension-ban-or-h...](http://www.codinghorror.com/blog/2011/06/suspension-ban-or-hellban.html)

<http://news.ycombinator.com/item?id=2619641>

------
Zak
I've been looking at a lot of spam lately as I'm developing services based on
text classification as a web service at classifyr.com, and spam is kind of the
obvious test. Initially, the site was getting 8200 spam attempts a day with a
captcha that was 99.9% effective. The classifier is 99.99% effective and has
cut down the number of spam attempts to almost nothing.

The ones that still get through on rare occasions copy legitimate content and
usually link to sites that aren't inherently spammy (like grey-market
pharmacies) for SEO purposes. I suspect these are actually posted by humans; a
post can classify as suspect rather than spam or ham and these always do when
they get through. When that happens, the user is asked a trivia question
related to the site's subject matter. I think it extremely unlikely a spam bot
would be able to answer correctly.
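
The three-way flow described above might be wired up like this. It is a sketch only: the probability thresholds, the function names, and the action labels are assumptions and say nothing about how classifyr.com actually works.

```python
def label(spam_probability: float, spam_at: float = 0.9, ham_below: float = 0.1) -> str:
    """Map a classifier score to spam / ham / suspect (thresholds are assumed)."""
    if spam_probability >= spam_at:
        return "spam"
    if spam_probability < ham_below:
        return "ham"
    return "suspect"


def handle_post(spam_probability: float, trivia_ok=None) -> str:
    """Decide what happens to a post.

    trivia_ok is None until the user has answered the trivia question,
    then True or False depending on whether the answer was correct.
    """
    verdict = label(spam_probability)
    if verdict == "spam":
        return "reject"
    if verdict == "ham":
        return "publish"
    if trivia_ok is None:
        return "ask_trivia"
    return "publish" if trivia_ok else "reject"
```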

------
brlewis
Really? Is type="hidden" sufficient for an effective honeypot? If so I'll
start doing this right away.

~~~
TomGullen
For us it seems to be, yes. It catches lots of them out. But there are other
ways of hiding the field, such as with CSS, which are probably safer.

------
shtylman
If you rename the username field with something nonstandard, then autocomplete
tools have a hard time handling it (think of email fields).

------
jeremydavid
I hope not to sound daft, but why do spammers do this? What is the benefit of
automatically creating these accounts?

Are they trying to find some sort of exploit in your code that lets them send
out emails?

~~~
rplnt
As outlined in another comment, marketing/seo is often the reason. That is,
bots will spam links. Either they want direct clicks or at least some SEO
bonus.

~~~
eli
The worst are the bots that are smart enough to figure out how to post spam on
your site, but not smart enough to see that it's all rel=nofollow.

~~~
danneu
I'm not convinced nofollow links are valueless.

~~~
danneu
Source: [http://www.socialseo.com/blog/an-experiment-nofollow-links-d...](http://www.socialseo.com/blog/an-experiment-nofollow-links-do-pass-value-and-rankings-in-google.html)

Google "nofollow seo value" to find other insights.

------
rkon
Wouldn't an obscurely named username field hurt accessibility? People using
screen readers probably wouldn't be able to register.

~~~
function_seven
That's a very fundamental conflict when implementing these types of anti-spam
measures. Unfortunately anything that is screen reader friendly is often spam-
bot friendly, (or, anything that is unfriendly to spam-bots will also be
unfriendly to screen-readers) as both entities are programs that attempt to
parse a web page.

Maybe include a link near the start of the form that says something like,
"Screen-readers, follow this link."? The link would be to a one-time use page
that is sane and accessible. I don't know...

