
Ask HN: What's with all the trivial link spam? - cperciva
I've noticed an increasing trend over the past month: New accounts are created and immediately used to submit a trivial link -- e.g., "Mozilla Firefox Start Page", "Orkut - home", "Google", "Gmail", "Hacker News | Submit", etc.

Can anyone come up with a plausible explanation for this?  It doesn't make sense as traditional spam, since the pages in question aren't selling or promoting anything; nor does it make sense as accidental bookmarklet clicks, since (I assume) the bookmarklet doesn't create an account immediately before submitting the page.

For now I'm just flagging such content-free links, but I'd love to understand what's going on here.
======
DanielBMarkham
On my blog I'm getting comment spam without any links as well. The usual stuff
like "Great work! Plan to come back often!" but there's no link to Joe's Sex
Shop or anything -- just a bogus email address and fake name.

I used to think it was just poor comments, but it only happens on certain
pages, which aren't at the top of my most popular list.

My theory is that the bots are working in waves -- first identify easy
targets, then slowly exploit them. I wouldn't be surprised if the botnet guys
are using Mechanical Turk-type operations just to work the CAPTCHAs and get
valid logins for later use.

It used to be you could make generalizations about the bot wars. Now, with
possibly millions of programmers out there with nothing to do but try to work
the system? The complexity and nuance of attacks are several orders of
magnitude greater than just five years ago. I'm not sure any kind of sweeping
statement catches what's going on, except: there are a lot of entities on the
web that are looking for doors -- any kind of doors. Once they find them, it
may be months or years before they are ever used.

~~~
briansmith
One of the ways Akismet works is shared-secret-key authentication using your
email address. Notice how sites collect your email address when you submit a
comment but they don't usually publish it. However, they _do_ send your email
address to Akismet. As a result, we can get a fairly good approximation of
whether a comment is spam or not simply by checking to see if the email
address was used for comments previously flagged as spam or ham. The more
non-spam comments--and especially the more ham comments--that the email
address has in the corpus, the less likely it is that the new message is spam.
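
A minimal sketch of the email-history heuristic described above. This is not
Akismet's actual algorithm or API; the class name, method names, and the
smoothing formula are all assumptions made for illustration.

```python
from collections import defaultdict


class EmailReputation:
    """Hypothetical per-email tally of past spam/ham verdicts."""

    def __init__(self):
        # Maps a lowercased email address to [spam_count, ham_count].
        self._counts = defaultdict(lambda: [0, 0])

    def record(self, email, is_spam):
        """Store one verdict (spam or ham) for a comment from this address."""
        index = 0 if is_spam else 1
        self._counts[email.lower()][index] += 1

    def spam_probability(self, email):
        """Estimate how likely the next comment from this address is spam.

        Uses add-one smoothing (an assumption), so an address with no
        history scores exactly 0.5.
        """
        spam, ham = self._counts.get(email.lower(), (0, 0))
        return (spam + 1) / (spam + ham + 2)
```

Under this kind of scoring, every bland compliment that gets accepted as ham
lowers the score for that address, which is the incentive the rest of this
comment describes.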

Now, imagine I am a spammer. I need to submit some automated comments that
don't look like spam so that my spammy comments will later get accepted. There
are a variety of ways I can do this. The easiest is to give the user a generic
compliment like "Wow, this was a great blog post! Thanks for publishing it."
The comment doesn't add to the discussion but the blog owner is often
reluctant to delete it because (a) it strokes his ego, (b) there is no way for
him to tell if it is automatically generated or not. In fact, if his spam
filter preemptively classified it as spam, he might even override the filter
and reclassify it as ham; this would be a huge win for the spammer-to-be.
Another way to do this is to automatically scrape a comment from another
discussion about the page like Reddit, Digg, Twitter, or FriendFeed and submit
it as a blog comment on the original page. A third way is to pay somebody a
(very) small amount of money to read the blog post and write a comment
(possibly using some fill-in-the-blanks response template system like those
used in call centers.)

It works the other way too. If I want your comments to start getting flagged
as spam then I should start submitting spam-like comments in your name. Then,
users will start classifying my forged spam comments as spam and the automatic
classifiers will start automatically classifying your valid comments as spam.

~~~
briansmith
I forgot to add the corollary. Much like computers will soon be able to solve
CAPTCHAs better than people, automated spamming systems will eventually be
able to generate comments that are as good or better than human-submitted
comments. In other words, we can expect comments that are generated by
spammers to eventually become valuable enough that we want to retain them
instead of filtering them away. That isn't the reality today but it could be
soon.

You might have a discussion in your blog's comments and then realize that
everybody in the conversation is a robot except you. And you will learn
something from those robots and/or lose an argument with them, when they get
really good.

Eventually, you will be able to post a rough draft of your blog post, and the
spam robots will start up a good conversation about it in the comments. Then,
based on their feedback, you can revise it into the final draft (if the robots
haven't already given you a link to a blog post that refuted your point or
made it better than you can).

As a final step, we might notice that not only are the best comments
written by spam robots, but the best blogs are spam blogs too. Right now,
people are blogging-for-adsense manually; in the future, Google will be able
to blog-for-adsense itself, cutting out the authors.

Today, if a site wants its user-generated content (comments in this case) to
retain its value, it needs to start filtering out the low-value content,
regardless of whether it was generated by a spammer or a well-intentioned
user. For example, if I add the comment "Great blog post!" to your blog, you
should probably delete it, even if you know I am not a spammer.
Unfortunately, if you do that, legitimate users who see their complimentary
comments get deleted might react negatively (e.g. stop commenting or complain
loudly, generating even worse comments). So, we can see that the spammers who
are generating low-value comments already have the ability to drive away your
valuable commentators if those commentators occasionally submit low-value
comments.

~~~
thepanister
This looks so scary.

 _we can expect comments that are generated by spammers to eventually become
valuable enough that we want to retain them instead of filtering them away_

What is the point? Why would spammers submit a valuable comment? Just as a
kind of door to enter through later, so that nobody will think it's
spam?

~~~
briansmith
When you submit a comment to a blog, you usually get to include a link to a
website along with the comment. Usually the commentator's name is linked to
that website in the resulting comment page. If my spam comment is good, a very
small number of people will click my username and go to the site I choose. The
more comments I get published, the more clicks I get on my link. If that is a
spam link, I get paid for every one of those clicks.

Or, a spammer might generate some good comments just to get the comment
classifier to mis-classify a blatantly spammy comment that will be submitted
later.

~~~
thepanister
You are more than a genius!

Do you have any publications or a blog? I am very interested in learning more
from you... you look like a scientist!

EDIT: But there is a point here that you did not really mention! You assumed
that spammers will get smarter, but spam filters will still be stupid? Don't
you think that as spammers build smarter bots, the spam-fighters will build
smarter spam filters?

~~~
briansmith
See, this was a very nice demonstration. How am I supposed to tell if
thepanister is a robot or not? Based on the content of this message it could
go either way.

~~~
thepanister
LOOOL But I edited it to look man-made. :)

Please re-read it again.

~~~
briansmith
I have been meaning to make a blog but all my computing time has been spent
building some software that I will release soon. Email me
(brian@briansmith.org) and I will send you a link when it is ready. Include
what kind of phone you have (e.g. Nokia 1100, Android) and I will send you a
free copy of the software if it works on your phone.

(This is an open offer to anybody: if you are the first person with your model
of phone to email me, I will give you a free license if/when it works on your
phone. Also notice how spammy this comment is.)

~~~
DanielBMarkham
Can we be sure you're not a robot?

That's kind of a joke, but kind of serious too -- the entire idea of CAPTCHA
is going to have to evolve in quite meaningful ways.

I liked your thesis, as speculative as it was. My general impression is that
we're talking decades here, not years. Predicting out that far is always
tricky. One can imagine receiving phone calls from friends -- the friends
being electronically generated voice impressions of our real friends that try
to sell us things. Once you start down this path of faking a person, you're
going to end up in some very weird places. For instance, I could create "fake
mes" that would interact on the web as well, creating blogs, commenting on
articles, documenting a presence -- all for the purpose of leaving a bad trail
for spambots to follow.

------
eli
It's not just testing the system. At least one popular blog anti-spam system
forces comments into moderation only on the _first_ comment from each person.
Subsequent ones go right through.
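
A minimal sketch of that first-comment rule, to show why one harmless comment
is enough to open the door. It does not reproduce any particular anti-spam
product; the class and method names are hypothetical.

```python
class FirstCommentModeration:
    """Hypothetical queue that holds only an author's first comment."""

    def __init__(self):
        self.approved_authors = set()
        self.pending = []    # (author, text) pairs waiting for a moderator
        self.published = []  # (author, text) pairs visible on the site

    def submit(self, author, text):
        """Publish at once if the author was approved before, else hold."""
        if author in self.approved_authors:
            self.published.append((author, text))
            return "published"
        self.pending.append((author, text))
        return "held"

    def approve(self, author):
        """Moderator approves an author; their held comments go live."""
        self.approved_authors.add(author)
        still_pending = []
        for pending_author, text in self.pending:
            if pending_author == author:
                self.published.append((pending_author, text))
            else:
                still_pending.append((pending_author, text))
        self.pending = still_pending
```

Once a moderator approves a held "Good site! Thanks!", every later `submit`
call from the same author returns "published" without review.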

I was seeing spam like this over a year ago across a network of sites all with
the same set of messages (e.g. "Good site! Thanks!")

~~~
CalmQuiet
And I guess moderators are reassured by the nice (though valueless) words...
and permit the comment.

If your theory is right it would suggest that moderators set a higher standard
(e.g., _actual content to the comment_ ) for what they permit?

------
pg
I've been wondering too. My current theory is that it's spammers testing
whether submission is unmoderated.

~~~
thepanister
Hello, I had some doubts that you might think of it like: "If I fight their
spam entries, then they will increase the flow". Am I right?

EDIT: Are you worried about the server? If you make a spam filter algorithm,
then this is more likely to increase the load on the server, which increases
the latency... and as a result, increases the headache?

~~~
alecst
Like DanielBMarkham said, the spammers are likely just looking for easy
targets. So my answer to you would be "probably not."

But if we assume that they will take HN at any cost, then their course of
action will match that of bacteria: attack, mutate, rinse, repeat. Eventually
it will be either them or us that survives. And it probably won't be us.

~~~
DanielBMarkham
More than anything else this is what will probably drive true artificial
intelligence: the need for the internet to survive as a valid information and
commercial platform. We're clearly setting up a prey-predator situation which
will continue for as far as the eye can see.

------
SwellJoe
My theory would be that it is a new spam bot, being tested out by its creator.
Just a theory, of course. It could also be a user with a bone to pick with HN;
their blog posts consistently got deaded by moderators, perhaps, and now
they're angrily trying to prove something. Maybe they're doing it with a spam
bot built for the purpose...this is Hacker News, so building such a spam bot
would be trivial for most of the audience here. A few lines of
Python/Perl/Ruby would do the trick.

~~~
ErrantX
I've noticed HN appear in a few "spammer lists" over the last month or so.

Usually the price to spam into here is very very high because it is, by
comparison, quite difficult.

I'm tempted to agree with you - it is a spammer going after the good money
testing a new bot.

~~~
pg
_it is, by comparison, quite difficult_

Merely difficult? Isn't it impossible? Has a spam ever survived here?

~~~
SwellJoe
Borderline stuff can last a few hours. I've occasionally seen linkjacked stuff
that is several hours old. The story being jacked is interesting, and so it
may even have upvotes. Obviously, once a moderator spots it, the link gets
changed to the original story...but it'd be hard to spot in a quick peruse of
the New page, so I can understand how such links could live so long.

Anyway, it depends on your definition of "survived" and whether there is value
for the spammer in their story living on the new page for minutes or hours.
Our company forums get an onslaught of spam about once per week, and though
the spam never lasts more than a few hours (far less these days, as I've added
a couple of moderators), it seems to be the same spammer doing it over and
over again, so they must consider it worth a shot. When your employees' time is
almost free (or the work is done by a bot), pretty much any result is a
worthwhile result.

~~~
pg
I should be more precise. What I really meant was, has a spam ever made it out
of the holding pen of the new page and onto the frontpage? The new page only
gets a fraction of the traffic of the frontpage.

I'm pretty sure the fact that spammers do something doesn't automatically mean
it's worthwhile. There are some spammers whose stuff has been autokilled for
months, but who keep submitting. They can't possibly be measuring the traffic
they're getting.

~~~
ErrantX
I think you're right.

The last request for spamming HN I came across was worth about $1,500 _and up_
for a front page spot. That's a lot for a single link.

~~~
pg
Hmm, maybe we have a business model here. I could easily enough make such
spams not appear to users with over some threshold of karma, which the
spammers presumably wouldn't have. Can you point me to the page with the
offer?

~~~
ErrantX
Not prepared to post it here (for obvious reasons)

I will email it though (you will need to register on a forum and make a few
posts I think :) been a while since I joined) if you wish. Which email goes
direct to you?

~~~
ErrantX
Well no response... if you are interested I have a draft email composed. I'll
send it to <your username> @ ycombinator.com later today if I don't hear
anything...

Though I suspect you were attempting to call a bluff ;) oops...

------
russell
How about banning links in comments until a small karma level, say 10, or a
few days registration? Another possibility might be deletion privileges for
questionable comments for hackers above a certain karma or registration time.
By questionable, I mean comments of a certain form from new users, rather than
being completely subjective.

~~~
colins_pride
Allowing users to:

    
    
        only comment without links up to 10 karma, 
        submit at 20 karma, 
        comment with links at 50 karma, etc. 
    

sounds like a pretty reasonable approach.
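
A minimal sketch of such a tiered permission check. The 20 and 50 thresholds
come from the list above; the action names and the `can` helper are
hypothetical, plain comments are assumed to be allowed from zero karma, and
whatever unlocks at 10 karma is left out because the list does not say.

```python
# Minimum karma needed for each action (action names are illustrative).
KARMA_REQUIRED = {
    "comment_without_links": 0,   # assumption: open to every new account
    "submit": 20,
    "comment_with_links": 50,
}


def can(karma, action):
    """Return True if a user with this karma may perform the action."""
    return karma >= KARMA_REQUIRED[action]


def allowed_actions(karma):
    """List every action available at the given karma, in sorted order."""
    return sorted(a for a in KARMA_REQUIRED if can(karma, a))
```

Keeping the thresholds in one table makes it easy to tune them if the limits
turn out to discourage new participants.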

Initially one proves themselves in discussion, then they can bring new ideas
to the table. By setting the limits at relatively low levels it doesn't
discourage new participants.

As a relatively new member of the community, I believe that this is important
because you don't want the community to become static any more than you want
to get overrun a la digg, reddit, etc.

A few days registration won't work, though, because the bad guys will just
start setting up accounts, keeping them dormant for the waiting period, and
then do their dastardly deeds. Besides, karma is a better measure than
seniority.

~~~
thepanister
EDIT: You should realize something, that spammers might create 10 accounts. 1
account that makes a comment, and the other 9 accounts would vote up for the
other account's comments, to increase the karma and pass the karma threshold.

This is similar to what I wrote here:
<http://news.ycombinator.com/item?id=506028> Users should have a history
record.

But it won't be an efficient solution, according to briansmith's approach.

Spammers could hire users to comment. Here is how I imagine it: 1- Spammers
crawl the submitted article.

2- Ask real users to read it and comment on it.

3- Copy what users said and submit it automatically here.

4- You will think it's NOT spam, and you will up-vote what the spammer
commented, and this will allow the spammer to submit content, so the problem
won't really be solved, only reduced.

~~~
CalmQuiet
Your suggestions seem reasonable... if it gets to the point of an all-out _War
On Spam-bots_. Unless things get that bad, it may be sufficient simply to make
it a little more difficult for the spammers (so that they go hit on other
sites). Meanwhile, I doubt that many spammers are going to go to trouble to
create the mutually-supportive accounts (and certainly not actually spending
money by hiring live users).

~~~
thepanister
About hiring live users... it will be a slave hiring, without paying money.

Here is how it works, just one of these 2 options:

1- Spammers automatically create a blog, crawl the submitted articles here to
their blog, and wait for a real user to comment on it, and then submit that
comment here.

Or: 2- Spammers would create a bot that crawls the submitted article, submit
that article to any public social news website, and wait for real users'
comments, and then automatically submit the real users' comments here, and you
will think it is NOT a bot/spam, but it is.

EDIT: This is something that can be implemented these days, not after a
decade!

If this happens, will you think that this is a bot or not?

------
mixmax
The answer might be quite boring.

When a new user comes to this site he will try out the functionality, click
around and see how things work. One of the things that he will probably want
to know is how to submit an article. So he goes ahead and tries it, using
whatever link he has handy.

I know this because I did it myself when I joined. I deleted it right away
though. I've seen the same thing on other sites, so it's not that unusual.

Edit: tuned down the wording a bit, since PG's reply indicates that this might
not be the sole reason.

~~~
pg
That doesn't explain the sudden sharp increase in these links. We're growing,
but the growth rate didn't suddenly increase 10x like the rate of these links
has.

~~~
mixmax
Maybe it's because the new users coming in are more casual in their use of the
site, now that it has grown to a certain size. I could imagine that users who
just stumble upon the site will be more willing to engage in this
semi-destructive behaviour than users who had the site recommended by you or
another early user.

Can you see whether the users that do this end up being good citizens? That
would shed some light on whether these are malicious accounts or just new
users trying things out.

------
jasonkester
If you think about how you'd build a script to spam Hacker News, it's easy to
see what's going on here.

    
    
      - Step one is getting your bot to reliably create accounts.
      - Step two is getting it to create accounts and post links.
      - Step three is feeding it a list of 5000 of your sites.
    

This bot appears to be at step two.

------
joe_the_user
Perhaps the spammers are testing to see if the spam "sticks". The sites where
the spam stays will be revisited. Perhaps the spammers think that an account
that posts links that don't come up on a blacklist will be considered "safer"
later when they can then spam seriously.

------
thepanister
I have already talked about this problem, but nobody really cares!

Even I provided a significant solution to solve the problem:
<http://news.ycombinator.com/item?id=506028>

Guess what? I got down-voted on my comment!

EDIT: If pg has no time for it, then why doesn't he allow us to code a
solution that he can review? And if he likes it, then he would use it!

~~~
mattmaroon
You sure are excitable!

~~~
thepanister
I am sorry, I don't understand what do you mean? Please forgive me, I am not a
native english speaker.

~~~
ksvs
He means the exclamation points. In English you don't use those except when
you really want to draw attention to a sentence. Otherwise you seem crazy! See
what I mean!

~~~
thepanister
WoW! I feel there is a culture shock here. Even I had to look up the word
"exclamation"; I really still have a too long road in learning english. :(

~~~
CalmQuiet
Hey... I give you _lots_ of credit for being open to learning... and for
continuing to try to contribute to this English-based forum. Live & learn:
that's okay.

~~~
thepanister
I don't want to surprise you that I did not really receive any English
education. I am self learner.

Your words made me feel really great.

