

Ask HN: 2 karma minimum to lock out spam bots - jauco

I just flagged two submissions that were blatant spam, both submitted by accounts created two seconds before the post.

Wouldn't a 2-point karma minimum for URL submission and voting fix a lot of spam posts? For legitimate users it's pretty easy to get one upvote on a comment, and as a nice bonus it would raise the chance that the user has a basic grasp of community values.

For spam bots, OTOH, it's impossible to submit links before they get one upvote, and it's impossible to upvote themselves before they get one upvote. I think that would effectively lock them out.
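
A minimal sketch of the proposed rule. The `User` record and the function names are hypothetical, since nothing here describes HN's actual code. It also assumes, as the post implies, that a new account starts at 1 karma, so a single upvote reaches the threshold.

```python
from dataclasses import dataclass

# Threshold proposed in the post. Assumes new accounts start at 1 karma,
# so one upvote on a comment is enough to reach it.
MIN_KARMA = 2


@dataclass
class User:
    """Hypothetical user record with only the field this rule needs."""
    name: str
    karma: int


def can_submit_url(user: User) -> bool:
    """URL submissions require at least MIN_KARMA."""
    return user.karma >= MIN_KARMA


def can_vote(user: User) -> bool:
    """Voting uses the same bar, so brand-new sockpuppets cannot vote each other up."""
    return user.karma >= MIN_KARMA
```

A freshly created bot account such as `User("bot", karma=1)` fails both checks until someone upvotes one of its comments.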
======
ComputerGuru
That would encourage them to try to generate spam comments in the slim hope
that they'd get upvoted... which would lead to a much worse spam situation.

For instance, they'd employ Markov chains to rehash comments from the same
thread, from older posts, or from the text of the article.

One thing I've learned: spammers don't give up. If you give them an easy way
to submit that's even easier for you to clean up, that's better than starting
an all-out war that'll just increase the spam content and make it more
difficult to filter out.

~~~
jasonkester
Very true.

The best you can do is allow the post to go thru and make it appear to the
spammer that they have succeeded. Let them view their post on the homepage and
go away with a nice warm feeling of satisfaction, secure in the knowledge that
they don't need to improve their algorithm at all for this site.

It's pretty straightforward to modify your display mechanism to include spam
posts if and only if they originate from the same IP address as the viewing
client.

I implemented this on Blogabond, and it had two positive effects: my spam
corpus is growing at a faster rate (thus making it more effective), and the
sophistication level of the average attack has dropped sharply.
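
The same-IP display rule described above might look roughly like this. `Post`, `is_spam`, and `visible_posts` are hypothetical names. The sketch assumes some classifier or moderator has already flagged the spam, and it is not Blogabond's code.

```python
from dataclasses import dataclass


@dataclass
class Post:
    title: str
    author_ip: str
    is_spam: bool  # set earlier by whatever flags spam (hypothetical)


def visible_posts(posts: list[Post], viewer_ip: str) -> list[Post]:
    """Return the posts a given viewer should see.

    Legitimate posts are shown to everyone. Posts flagged as spam are shown
    only to a viewer whose IP matches the one the post came from, so the
    spammer believes the submission went through.
    """
    return [p for p in posts if not p.is_spam or p.author_ip == viewer_ip]
```

Keying on IP address is crude: anyone behind the same NAT or proxy as the spammer would also see the post. Keying on the logged-in account instead would be one alternative.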

~~~
philh
The problem is detecting the spam posts in the first place. Your idea would be
incompatible with the proposed method, because you don't want to do the same
thing to legitimate new users.

~~~
jasonkester
Certainly, HN must have a Bayesian classifier looking at the content of its
submissions and deciding whether they are "Hacker Newsworthy". By now, the
site must have sizeable corpora of good articles and spam links to
use in such an assessment.

Since this site was built by Mr Bayesian himself, it never occurred to me that
it might have weak spam filtering.
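
For readers unfamiliar with the idea, here is a tiny two-class naive Bayes text classifier with add-one smoothing. It is a generic textbook variant. It is not the combining scheme from Paul Graham's "A Plan for Spam", nor anything HN is known to run. The class name, the labels, and the whitespace tokenizer are assumptions made for illustration.

```python
import math
from collections import Counter


class NaiveBayes:
    """Two-class multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "good": Counter()}
        self.doc_counts = Counter()

    @staticmethod
    def tokenize(text: str) -> list[str]:
        # Deliberately naive: lowercase and split on whitespace.
        return text.lower().split()

    def train(self, text: str, label: str) -> None:
        self.doc_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def log_score(self, text: str, label: str) -> float:
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["good"])
        total_words = sum(self.word_counts[label].values())
        total_docs = sum(self.doc_counts.values())
        # Log prior: fraction of training documents with this label.
        score = math.log(self.doc_counts[label] / total_docs)
        for word in self.tokenize(text):
            # Add-one smoothing so an unseen word does not zero out the score.
            count = self.word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        return score

    def is_spam(self, text: str) -> bool:
        return self.log_score(text, "spam") > self.log_score(text, "good")
```

Both classes need at least one training example before `is_spam` is called. Without one, the log prior is undefined.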

------
trickjarrett
I suggest we just add a reCAPTCHA for people posting links below a certain
karma threshold.

~~~
froo
Instead of a captcha, what about something like a puzzle to complete? That
way you could also avoid low-quality submissions from people not intelligent
enough to solve such puzzles, and it would be a delightful little game for
the rest of us who like to solve puzzles and problems.

~~~
bendotc
First, I don't think I would be delighted by an automated puzzle on HN.

Second, the idea that puzzle-solving ability is somehow closely related to
interesting/insightful writing is specious. People are smart in different
ways, and as a programmer myself, the kind of smart that I want most is the
kind I don't have, which is often not the problem-solving kind.

All that having been said, I'm not against a captcha. I just don't think it'd
be delightful, and I don't believe that it'd somehow raise the quality of posts
around here (beyond removing some spam).

------
DarkShikari
I saw an interesting method used by spambots on the Doom9 forum: they would
take old threads and repost them (with a link to their ad in their signature
or such).

Some variant of this method might work to circumvent such a policy; find a
similar thread, pick a comment from it, and post it, perhaps? It might only
work 20% of the time, but that's good enough to get some spam accounts with
URLs approved.

Of course this would all take effort, and might be enough to lock out a lot of
would-be spammers, or at least convince them to go somewhere else.

~~~
Tichy
Reminds me of what I see on my WordPress blog: lots of zero-content comments
like "this is a cool article". At first they didn't make sense (I blocked them
anyway), but then I remembered the WordPress settings for comments. By
default, it is set to "commenters must have been approved at least once". So I
guess if I let one of those innocent comments through, they would be
back with more serious stuff.

Spammers are crafty...
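
The WordPress behavior described above amounts to a rule like the one below. This is a hypothetical sketch keyed on a single author identifier, not WordPress's own code.

```python
def moderate(author: str, approved_authors: set[str]) -> str:
    """Decide what happens to a new comment under an 'approved once' rule.

    Authors with a previously approved comment are published automatically;
    everyone else goes to the moderation queue.
    """
    return "publish" if author in approved_authors else "hold"


def approve(author: str, approved_authors: set[str]) -> None:
    """A moderator approving one comment whitelists the author from then on."""
    approved_authors.add(author)
```

This shows why a bland first comment is worth the spammer's effort. Once a moderator approves it, everything else from that author is published without review.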

------
ashleyw
Good idea, though I'd argue 2 is too low. If these aren't "bots" but real
humans, it wouldn't be hard for them to adapt by commenting and then upvoting
each other.

10+ would be nicer; anyone truly interested in HN would understand why! :)

~~~
talison
It's a good point that some spam "bots" could be human. I was running a free
email site when we saw a lot of strange account creation. We had reCAPTCHA
enabled and knew it hadn't been cracked.

It turned out (based on IP address analysis) that the accounts were being
created by humans in the Philippines and then handed over to spammers in
Dubai. Ah globalization...

If you have an efficient spam vector, it's not unusual to see low-wage humans
manipulating the system to get around the captcha.

~~~
jasonkester
This is a lot bigger than you'd expect. Nearly all the spam that makes it into
the database on my sites is human-powered. It's maybe only 1% of the total
attack volume, but because simple checks knock out nearly all the noise, it
becomes the most significant fraction that you have to deal with.

~~~
ErrantX
Limit it to one account creation per IP per hour :)

Yeah, they can use a ton of proxies to get around that, but I bet it cuts
account creation _right_ down.

And it shouldn't affect 99.999999% of "real" users.
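
Here is a minimal in-memory sketch of the one-signup-per-IP-per-hour limit. `SignupLimiter` is a hypothetical name. A real deployment would need shared storage and some expiry of old entries, because this dictionary grows without bound.

```python
import time
from typing import Optional


class SignupLimiter:
    """Allow at most one account creation per IP address per window."""

    def __init__(self, window_seconds: float = 3600.0) -> None:
        self.window = window_seconds
        self.last_signup: dict[str, float] = {}  # IP -> time of last accepted signup

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        last = self.last_signup.get(ip)
        if last is not None and now - last < self.window:
            return False  # too soon after the previous signup: reject
        self.last_signup[ip] = now
        return True
```

A rejected attempt does not reset the clock, so an IP can sign up again one full window after its last accepted signup.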

~~~
jasonkester
Ah, but there's the rub. Extrapolating from my comment above, your number one
job is to make spammers feel successful when they fail. If you reject new
accounts like this, you'll force them to write those little proxies to get
around your system.

The better thing to do is to simply notice what they're doing and flip the
IsSpammer bit on all those new accounts (including the first one.) That way
you can correctly classify any content they may post from those accounts in
the future.
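
The flag-don't-reject alternative can be sketched as a batch job. It never rejects a signup, but it flags every account in a per-IP burst, including the first. The `Account` type and the function name are hypothetical. The one-hour window and the limit of one account per window are assumptions chosen to mirror the rate limit proposed earlier in the thread.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Account:
    name: str
    ip: str
    created_at: float  # seconds since some epoch
    is_spammer: bool = False


def flag_signup_bursts(accounts: list[Account], window: float = 3600.0, limit: int = 1) -> None:
    """Mark every account in an IP's signup burst as a spammer, including the first.

    If one IP creates more than `limit` accounts within `window` seconds,
    all accounts in that run are flagged. Nothing is rejected.
    """
    by_ip: dict[str, list[Account]] = defaultdict(list)
    for acct in accounts:
        by_ip[acct.ip].append(acct)
    for group in by_ip.values():
        group.sort(key=lambda a: a.created_at)
        for i, acct in enumerate(group):
            # Accounts from this IP created within `window` of this one.
            burst = [a for a in group[i:] if a.created_at - acct.created_at < window]
            if len(burst) > limit:
                for a in burst:
                    a.is_spammer = True
```

Because the accounts are only flagged, the spammer sees nothing unusual, and anything those accounts post later can be classified as spam.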

~~~
ErrantX
There's the age-old argument over which system to go for: Passive (my
suggestion) or Active (yours).

Probably both have merits but I am inclined to agree yours is the better way
:D

------
slater
How about we stick with the intellectual stuff around here and make 'em answer
math captchas. Using Roman numerals.

E.g., what's LV + IV? The answer has to be given in Roman numerals, too.
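
Checking an answer to the Roman numeral question takes only a few lines, which also shows how little effort a bot would need to answer it. The helper names are hypothetical. The sketch accepts only the canonical subtractive form (`IV`, not `IIII`). That is a design choice; the thread does not settle it.

```python
ROMAN = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(n: int) -> str:
    """Convert a positive integer to standard subtractive Roman notation."""
    parts = []
    for value, symbol in ROMAN:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def from_roman(s: str) -> int:
    """Parse a numeral; a smaller symbol before a larger one is subtracted."""
    values = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
    total = 0
    for i, ch in enumerate(s):
        v = values[ch]
        if i + 1 < len(s) and v < values[s[i + 1]]:
            total -= v
        else:
            total += v
    return total


def check_answer(a: str, b: str, answer: str) -> bool:
    """Accept only the canonical form of a + b, ignoring case and surrounding spaces."""
    return answer.strip().upper() == to_roman(from_roman(a) + from_roman(b))
```

For the example question, `check_answer("LV", "IV", "LIX")` returns `True`, since 55 + 4 = 59.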

~~~
dhimes
We'd have to agree on how to write II + II

We could ask for a basic derivative (of a polynomial, for example) or the
value of x in a simple algebraic equation such as x + 5 = 2x + 10.
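
To show concretely how easy such problems are for programs, here is a sketch that solves single-variable linear equations of the form above. The helper names are hypothetical. The parser only understands sums of terms like `2x`, `-x`, and integer or decimal constants, and it returns an exact `Fraction`.

```python
import re
from fractions import Fraction


def parse_side(side: str) -> tuple[Fraction, Fraction]:
    """Return (coefficient of x, constant) for one side, e.g. '2x+10' -> (2, 10)."""
    coef, const = Fraction(0), Fraction(0)
    # Split into signed terms: 'x+5' -> ['x', '+5'], '-3x-4' -> ['-3x', '-4'].
    for term in re.findall(r"[+-]?[^+-]+", side.replace(" ", "")):
        if term.endswith("x"):
            number = term[:-1]
            if number in ("", "+"):
                coef += 1
            elif number == "-":
                coef -= 1
            else:
                coef += Fraction(number)
        else:
            const += Fraction(term)
    return coef, const


def solve_linear(equation: str) -> Fraction:
    """Solve a single-variable linear equation such as 'x+5 = 2x + 10'."""
    left, right = equation.split("=")
    a, b = parse_side(left)
    c, d = parse_side(right)
    if a == c:
        raise ValueError("no unique solution")
    # a*x + b = c*x + d  =>  x = (d - b) / (a - c)
    return (d - b) / (a - c)
```

For x + 5 = 2x + 10 it computes x = (10 - 5) / (1 - 2) = -5.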

~~~
cperciva
Problems like that are far too easy for computer programs to solve. Clearly
the right solution here is to present new users with a set of Turing machines
and ask them which of the specified machines halt.

This would probably be very effective at limiting the growth rate of HN, too.

~~~
dhimes
I've never written a bot. I thought perhaps that the process of parsing
instructions (we could put more than one variable in the equation), then
writing code to solve, might be more effort than it's worth to them.
Especially if we're doing something that is somewhat unique, so their reward
would only be one page. However, phildawes is apparently implying that once
the bots have parsed the problem and know what to solve for, they could go to a
page backed by Mathematica and submit the problem to be solved,
thereby saving the time required to write problem-solving code. I didn't
realize they worked like that.

~~~
rythie
Agreed, computers are good at solving math problems and running Turing machines
(since, in effect, they are Turing machines). You would be better off with a Turing test
(<http://en.wikipedia.org/wiki/Turing_test>) that only a human can answer,
and if anyone does get a computer to solve it, you get them to start a company,
because they have solved a fundamental computing problem.

~~~
cperciva
Computers are good at simulating Turing machines. Computers aren't good at
determining whether Turing machines halt; in fact, it is impossible for a
computer to determine in the general case whether a Turing machine will halt.
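
The asymmetry described here shows up in a step-bounded simulator. Running a machine is easy, but with a finite step budget the only honest answers are "halted after n steps" or "unknown". The rule encoding and function name below are assumptions made for illustration. The example machine is the standard 2-state, 2-symbol busy beaver.

```python
from typing import Optional

# Transition table: (state, symbol read) -> (symbol to write, head move, next state).
# Moves are +1 (right) or -1 (left); blank cells read as 0.
Rules = dict[tuple[str, int], tuple[int, int, str]]


def run_bounded(rules: Rules, start: str, halt: str, max_steps: int) -> Optional[int]:
    """Simulate a one-tape machine on a blank tape for at most max_steps steps.

    Returns the number of steps taken if the halt state is reached, or None if
    the machine is still running when the budget is spent. None means
    "unknown", not "runs forever": no fixed budget can decide that in general.
    Assumes every reachable (state, symbol) pair has a rule.
    """
    tape: dict[int, int] = {}  # sparse tape
    head, state = 0, start
    for step in range(max_steps):
        if state == halt:
            return step
        write, move, state = rules[(state, tape.get(head, 0))]
        tape[head] = write
        head += move
    return max_steps if state == halt else None


# The 2-state, 2-symbol busy beaver: halts after 6 steps.
BUSY_BEAVER_2: Rules = {
    ("A", 0): (1, 1, "B"),
    ("A", 1): (1, -1, "B"),
    ("B", 0): (1, -1, "A"),
    ("B", 1): (1, 1, "H"),
}
```

`run_bounded(BUSY_BEAVER_2, "A", "H", 100)` returns 6. A machine that simply moves right forever returns `None` for every budget, and so would a machine that does halt but needs more steps than the budget allows.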

~~~
rythie
Fair enough, though it might have the effect of not letting humans in either
;-)

------
rythie
There is an article on slate.com about CAPTCHAs
<http://www.slate.com/id/2216837/> (submitted here:
<http://news.ycombinator.com/item?id=579051> )

------
seejay
Hope HN won't introduce anything similar to Digg's burying system, which
powerful users on that site use to their advantage.

