

Ask YC: How come spammers are not attacking YC News? - adityakothadiya

I run a niche social news site part-time, and it has a very small community. It's growing slowly, but I'm fine with that. What I'm worried about is how to keep spammers from submitting irrelevant stories to my site.<p>Adding a CAPTCHA is one option, but then I noticed that YC News doesn't have any CAPTCHA protection either. So how come spammers don't submit irrelevant, advertisement-style stories to YC News?<p>Does the YC News algorithm detect these kinds of links? Is there manual intervention? Or is the community just so good that nobody attacks it?<p>Either way, your input on how I can tackle this would be very helpful. Currently I manually delete all the irrelevant submissions (there are at least 5-10 a day).<p>-Aditya
======
pg
They are. We currently get about 30-40 spam submissions a day. Turn on
showdead in your profile and you'll see it all. The reason we don't get more
is that we're very aggressive about killing spams. Most spammers give up
eventually when they realize that submitting here generates near zero traffic.

~~~
adityakothadiya
Thanks PG for your advice.

BTW, when you say "aggressive about killing spams", you mean killing manually,
right?

~~~
pg
Some spam gets killed automatically. Some gets flagged either by filters or by
users (there is a flag button on stories after you get over a certain karma),
and killed manually by editors. It's very rare now for a spam not to at least
get flagged.

------
jasonkester
Pretty much every site with user generated content is overwhelmed with people
trying to post spam. For my site, I use a combination of JavaScript human
detection, Bayesian filtering, and aggressive human intervention (including
single click "spam this" links on every piece of content when logged in as an
Admin).

It's worth noting that since late 2007, a significant portion of comment spam
is human powered. CAPTCHA style bot filtering doesn't work against it, since
it's not bots doing the posting. Bayesian filtering and good moderation tools
are essential these days.
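
For readers who have not seen Bayesian filtering before, here is a rough
sketch of the idea in Python. It is not the filter any site in this thread
actually runs: the class name, the tokenizer, and the add-one smoothing are
all assumptions chosen to keep the example small.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Tiny multinomial naive Bayes classifier with add-one smoothing.

    Hypothetical illustration only. Train at least one "spam" and one
    "ham" example before calling is_spam().
    """

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one labelled example ("spam" or "ham")."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def _log_score(self, tokens, label):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        total_docs = sum(self.doc_counts.values())
        # Log prior: how common this label is among the training examples.
        score = math.log(self.doc_counts[label] / total_docs)
        for token in tokens:
            # Add-one smoothing keeps unseen words from zeroing the product.
            count = self.word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        return score

    def is_spam(self, text):
        """Return True if the spam score beats the ham score."""
        tokens = tokenize(text)
        return self._log_score(tokens, "spam") > self._log_score(tokens, "ham")
```

In practice the training calls would hang off the moderation tools, so every
click on a "spam this" link also feeds the filter another labelled example.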

~~~
tectonic
Sounds like a good business opportunity.

------
Dilpil
The sites most vulnerable to spam are ones that a) have a critical mass of
readership, especially dumb readership that will click on ridiculous spam
links, and b) are not run by people who are active contributors to the field
of spam filtering.

~~~
Mistone
I can't see how not being an active contributor to the field of spam filtering
makes your site vulnerable to spammers. Not being vigilant against spammers,
yes, but you don't need to be in the industry to combat this problem. The dumb
readership comment speaks for itself; let's get off the high horse, bro.

~~~
SwellJoe
I think the point was merely that pg has spent a lot of time thinking about
the problem of spam (it's one of the things he's famous for), he also wrote
the software that runs HN, and when those two facts combine you end up with
software that has many mechanisms for automatically preventing spam. It's just
that being involved in the fight against spam means his site probably makes
use of more cutting edge techniques than sites built by folks who have never
dealt with spam before. HN probably also has a much higher "editor to
submitter" ratio than most sites, and so a human that has privileges needed to
kill spam usually sees it long before it hits the front page.

------
crabl
They're all afraid of Paul Graham.

------
TomOfTTB
I think a lot of it also has to do with the community itself. A site of this
size would probably get 300-400 spam messages a day if it weren't for the fact
that its audience would see right through it. Tech people are so conscious of
spam that they ignore it out of principle, which makes spamming a tech site
pointless.

As for suggestions...

1\. Obviously CAPTCHA. It just makes sense.

2\. I find keyword blocking very effective. So, for example, if I were running
Hacker News I'd block any news item containing the word Viagra that was
submitted by a user under a certain feedback level (like no feedback at all).
With one caveat, which is to give them a way to manually verify it (say, an
e-mail sent to them that lets them verify they are an actual person and have
the item approved).

3\. Use e-mail spam block lists. Lists like SBL, CBL and XBL give IP addresses
that generate massive amounts of spam. Many of those same IP addresses
generate web spam.

4\. I've never been a fan of this particular method because I think it's
discriminatory to an extent I'm uncomfortable with, but many places have
special requirements for countries that are famous for spam generation
(Russia, China, etc.), like making users from those IPs jump through special
registration hoops.
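
To make suggestion 3 concrete, here is a minimal Python sketch of a block
list lookup. Lists of this kind are queried over DNS: you reverse the
octets of the IPv4 address, append the list's zone, and treat an answer in
127.0.0.x as "listed". The function names are made up for illustration, and
the default zone is an assumption; check the list operator's documentation
and usage terms before pointing real traffic at it.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone="zen.spamhaus.org", resolve=socket.gethostbyname):
    """Return True if the list answers with a 127.0.0.x listing code.

    `resolve` is injectable so the logic can be exercised without network
    access. The default zone is an assumption, not a recommendation.
    """
    try:
        answer = resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # The name did not resolve, so the address is not on the list.
        return False
    # Answers outside 127.0.0.x are treated as errors, not as listings.
    return answer.startswith("127.0.0.")
```

A submission handler could call `is_listed(request_ip)` and send listed
addresses to a moderation queue instead of rejecting them outright, since
shared and recycled IP addresses do end up on these lists.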

Hope it Helps!

~~~
kwamenum86
Part 2 sounds powerful but it would make the submission process less simple
and maybe less user friendly for new users.

~~~
TomOfTTB
Well you'd only do it on words that are almost certainly spam. Like Viagra or
male impotence or...well, you get the picture. It works on the theory of "this
word would almost never be used legitimately in a post so it's almost
certainly spam"

I use this on my mail server and with 200 users I've yet to ever get a false
positive.

~~~
janm
> I use this on my mail server and with 200 users I've yet to ever get a
> false positive.

How do you know? I don't see how you would measure that; if you can figure out
that it is a false positive, you have discovered a better filter. You might get
user complaints, but the absence of user complaints doesn't prove you have no
false positives. (Although the presence of user complaints could prove that you
do.)

Also: The assertion that everything to do with viagra is spam makes it very
difficult to have a discussion about viagra or spam. For example, this posting
would be rejected.

~~~
TomOfTTB
If you read my initial post I said specifically that it can't be just a flat
out block. What you do is stop it and send an e-mail to the person who posted
it asking them to verify they are an actual person.

That's both why it works even if you want to discuss viagra and how you can
tell if you are getting too many false positives.
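
The hold-and-verify flow described in this subthread could be sketched in
Python roughly as follows. The term list, the karma threshold, and every
name here are hypothetical, and sending the verification e-mail is left to
the caller.

```python
from dataclasses import dataclass

# Hypothetical values for illustration; tune both for your own site.
BLOCKED_TERMS = ("viagra", "male impotence")
MIN_TRUSTED_KARMA = 1


@dataclass
class HoldStats:
    """Counts used to estimate how often the filter holds real users."""

    held: int = 0
    verified: int = 0

    def false_positive_rate(self):
        """Share of held submissions whose authors later verified them."""
        return self.verified / self.held if self.held else 0.0


def triage(title, karma, stats):
    """Return "publish" or "hold" for a new submission."""
    text = title.lower()
    suspicious = any(term in text for term in BLOCKED_TERMS)
    if suspicious and karma < MIN_TRUSTED_KARMA:
        stats.held += 1
        return "hold"  # caller e-mails the author a verification link
    return "publish"


def confirm(stats):
    """Record that a held submission's author completed verification."""
    stats.verified += 1
    return "publish"
```

The verified share is only a lower bound on false positives: a legitimate
user who never clicks the verification link is not counted, which is the
gap janm points out above.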

------
ilamont
We have a problem with comment spam on our site (a news and prediction market
site, using Drupal). We introduced captchas, activated nofollow, all to no
avail -- there are some very persistent spammers who will still go through the
trouble of entering captchas just to have their stupid links show up at the
bottoms of comment threads. It's not a huge issue, but it's definitely an
irritant and added cost, in terms of the staff time required to clear it out.

