

What's the deal with these weird comments on my blog? - gavingmiller
http://www.thepursuitofquality.com/post/22/whats-the-deal-with-these-weird-comments-on-my-blog.html

======
pg
They also submit random posts here (e.g. links to the frontpage of Facebook or
Google) in order to test how quickly we kill them.

<http://news.ycombinator.com/item?id=1332147>

~~~
kgermino
I notice that both the accounts on that post were created right before that
post, presumably for the sole purpose of creating that post. It seems to me
that that's fairly common. Why not add a delay (1-2 days or so) from the time
an account is created until they can post? It seems to me that doing so would
likely make it too difficult to create a large number of spam posts.

I know that there are sometimes members who create pseudo-accounts to protect
their identity on certain posts but you may be able to work around that by
letting posters use aliases that would not link back to their main account on
the front end.

~~~
pg
That would restrict legit users too. But what I might do is require that some
posts be approved by moderators, and use the newness of the account as
one factor in deciding which posts need approval.
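
A minimal sketch of that kind of rule, in Python. This is not how HN decides anything; the signal names, weights, and threshold below are all made-up assumptions, chosen only to show account age acting as one factor among several rather than as a hard block.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    account_age_days: float       # time since the account was created
    account_karma: int            # hypothetical reputation score
    links_to_bare_homepage: bool  # e.g. a link to the front page of a big site


def needs_moderator_approval(sub: Submission, threshold: int = 10) -> bool:
    """Add up weak signals; a new account alone never reaches the threshold."""
    score = 0
    if sub.account_age_days < 2:   # newness of the account (assumed cutoff)
        score += 6
    if sub.account_karma < 5:      # no track record yet (assumed cutoff)
        score += 3
    if sub.links_to_bare_homepage:
        score += 5
    return score >= threshold
```

With these invented weights, a brand-new account submitting an ordinary link scores 9 and posts immediately, while the same account linking to a bare homepage scores 14 and waits for a moderator.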

Spam is not an insoluble problem though. Most of it gets autokilled.

~~~
kgermino
That may be OK in that it might help force users to spend more time getting a
feel for the site before they submit. It would also help level out the surge
of new members that I read sometimes happens when HN gets linked to. AFAIK
neither of these has become a serious issue so far, but I think it's at least
worth taking into account when you make your decision.

Edit: To me it comes down to balancing solving these issues (if they even
exist; you would know that better than I would) against missing out on whatever
content doesn't get posted because of the restriction.

~~~
derefr
No one registers until they have to—that is, until they have something to
submit. If you read HN for months, finally finding something worthy to submit,
set up an account, and then find _you can't_, I believe you'd be a bit
steamed.

Also, some people create one-time accounts to detach certain posts from their
well-known pseudonyms (whistleblowing, asking advice about projects under NDAs
without giving identifiable details, etc.). These accounts would be thus
restricted as well, which would discourage this type of contribution.

~~~
kgermino
Your first point is a good one, and looking back was presumably the basis for
pg's disagreement. And as long as there is no serious problem with posts by
brand new accounts that alone is likely a good enough reason to say I'm wrong.

Your second point could be fixed by allowing users to create pseudo-accounts
that don't visibly link back to their main accounts. Although, as I think about
it now, that would only work until lawyers got involved, so not at all on
serious issues.

On second thought I concede to your points.

------
Angostura
No, I don't get it.

In what way is it better for a spammer to post a scout message and then check
whether the scout message appears before posting the spam, than simply posting
a spam message and seeing if the spam appears?

~~~
jonknee
Posting 1 comment on 10,000 sites to see which are vulnerable lets you
concentrate the bulk of your efforts on the best targets. There's no URL in
the post, which may mean they are creating a list to hand-spam later on. It's a
lot harder for filters to catch handwritten text, and if a random comment gets
through, the likelihood that a handwritten one will is high.

So maybe out of 10,000 sites they find 1,000 that are good targets and then
have their third-world employees go to town working at 10x the rate. Not a bad
return.

It's sort of like burglars cruising through a neighborhood on multiple nights
to see who has the lights off.

~~~
proexploit
I doubt any have real people doing the submissions; it's automated. It would
make sense to find which ones are vulnerable, but they wouldn't need to test
with a fake spam post; they'd test with a real one and check their backlinks
(Yahoo Site Explorer, for example).

~~~
dagobart
I disagree. Those are real people who do that kind of work. When I was looking
for online work at sites like getafreelancer or rentacoder there usually were
tons of job offers regarding forum posts/blog comments. Rarely labeled as
forum/blog spam but often meaning that. Here's a typical (current) job offer
for forum posts: <http://is.gd/c3dIo> (This is not intended as advertising but
only to provide an impression of such a job offer and as evidence that such
job offers actually do exist and are common, depending on where you look.)

In this particular job, they offer $20 for 400 posts (or 2000 replies). Most
often, one condition of posting such stuff is that it is at least unique,
"original work" or would "pass copyscape".

So, there seem to be people who actually do this kind of job manually.

~~~
proexploit
Exactly, real people are hired to generate backlinks via blog comments. I
consider it spam, but since the goal is to make a relevant comment, that's
debatable. The example given in this post, however, is automated. You don't
hire people to write unique content and have them list keywords. That's the
type of spam I'm referring to as automated.

------
jacquesm
There is a very simple trick at work here. The sequence of words is unique to
the posting, the 'title' of the comment is the payload, the rest is a marker
of the place the comment was left.

30 days later you'll see your blog overrun by spambots if the comment does not
get removed: the page will be found again by searching for a fragment of the
sequence of words. If the fragment can't be found, the blog will be marked as
'live moderated' and the spammer will move on.
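
To make the mechanism concrete, here is a toy simulation in Python. It only works on in-memory strings; the function names, the vocabulary, and the fragment length are assumptions for illustration, not anything taken from a real spam tool.

```python
import random


def make_marker(vocabulary, length=8, seed=None):
    """Pick a random word sequence that is unlikely to occur anywhere else."""
    rng = random.Random(seed)
    return [rng.choice(vocabulary) for _ in range(length)]


def comment_survived(page_text, marker, fragment_len=4):
    """True if any run of fragment_len consecutive marker words is on the page."""
    for start in range(len(marker) - fragment_len + 1):
        fragment = " ".join(marker[start:start + fragment_len])
        if fragment in page_text:
            return True
    return False
```

From the blog owner's side the takeaway is the same as above: once the comment is deleted, `comment_survived` comes back false for that page and there is nothing left to find.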

So, the 'no broken windows' theory is tested here: if you don't 'fix your
windows', prepare for a lot more stones to be thrown.

Spammers like to know that the comments 'stick' so they use these as a way to
see where to concentrate their spam waves with a greater chance of survival of
the spammed content.

If it never shows up at all on your pages that is of course the best remedy.

------
derefr
This looked more to me like a regular spambot which filled out every field on
a comment form it could, and didn't bother to bail out when it couldn't find a
URL field. ("Natrol acai berry diet reviews" looks very much like something
that should be linked.)

------
PanMan
I had some that seemed to be less random words, but still auto posted, like
"Nice, insightful post!". I always guessed it had to do with spam filters:
previously approved posters will get auto approved on next posts, I assume (for
some filters).

~~~
proexploit
What you're talking about was most likely including a link to the site in
their name, such as in WordPress posts. That type of spammer has actually
chosen to write things like "Nice, insightful post!" because it's generic and
fits any blog and is more likely to be accepted (stroking the ego of website
owners).

------
RK
On a related note, doing some searches on twitter I have come across some
sites like this:

<http://www.articlenutrition.com/science/what-is-the-difference-between-applied-mathematics-and-physics/>

The text seems to be scraped from somewhere else, translated, then re-
translated. I assume that these sites are simply to attract ad views. I
remember seeing "comments" that clearly seemed to concern American
topics (along the lines of "here in Chicago I always rode the L when I was
growing up"). The text didn't seem like it could possibly have been written by
a native speaker, and all of the text on the site had the same idiosyncrasies.

Any ideas?

~~~
bolero32
That isn't a poor translation. The spambot is using a Markov chain to generate
plausible text and circumvent the possibility of a filter which also uses
Markov chains.
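
For anyone who hasn't seen one, a word-level Markov chain generator is only a few lines. This is a generic first-order sketch in Python, not the code any particular spambot uses; the function names are my own.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words seen directly after it."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, length, seed=None):
    """Walk the chain from start, picking a random observed follower each step."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:  # dead end: this word was never followed by anything
            break
        word = rng.choice(followers)
        output.append(word)
    return output
```

Every adjacent pair of words in the output occurred somewhere in the training text, which is why the result looks locally fluent but drifts off topic over a whole sentence, much like the "comments" described above.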

------
Gormo
I get emails like this all the time too.

I think it may be a macro-level attempt to defeat Bayesian spam filters. Put
lots of random words into a nonsensical message, and the recipient will mark
it as junk.

Keep doing it, and eventually the spam filters will start triggering false
positives on non-spam messages that contain the same words, arranged sensibly.

Users then need to sort through their spam folders to find the real messages,
and are then exposed to your Viagra ad.
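
Here is a rough Python sketch of the effect I mean. It is a tiny naive Bayes filter over word presence with add-one smoothing; the class name and the sample messages are invented, and real filters are considerably more elaborate.

```python
import math
from collections import Counter


class TinyBayesFilter:
    def __init__(self):
        # Per class: how many training messages contained each word.
        self.word_counts = {True: Counter(), False: Counter()}
        self.message_counts = {True: 0, False: 0}

    def train(self, message, is_spam):
        self.message_counts[is_spam] += 1
        self.word_counts[is_spam].update(set(message.lower().split()))

    def _word_prob(self, word, is_spam):
        # Add-one smoothing so unseen words never give a zero probability.
        seen = self.word_counts[is_spam][word]
        return (seen + 1) / (self.message_counts[is_spam] + 2)

    def spam_probability(self, message):
        # Smoothed prior odds, then one likelihood ratio per distinct word.
        log_odds = math.log(
            (self.message_counts[True] + 1) / (self.message_counts[False] + 1)
        )
        for word in set(message.lower().split()):
            log_odds += math.log(
                self._word_prob(word, True) / self._word_prob(word, False)
            )
        return 1 / (1 + math.exp(-log_odds))


f = TinyBayesFilter()
f.train("cheap pills buy now", True)
f.train("meeting agenda for tuesday", False)
f.train("lunch on tuesday", False)
legit = "agenda for the tuesday meeting"
before = f.spam_probability(legit)
for _ in range(5):
    # Nonsense mail built from ordinary words, which the user marks as junk.
    f.train("tuesday agenda meeting lunch", True)
after = f.spam_probability(legit)
```

In this toy setup `after` comes out higher than `before`: the ordinary words have picked up spam weight, so the sensible message drifts toward the junk folder.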

------
tokenadult
This has been going on for a long time. An earlier version of this phenomenon
was found on Web-based discussion forums. And email spamming is often based on
similar principles. (The commercial online services through which I was
introduced to online communication were moderated well enough to avoid this
problem. If you check the new page here on HN you can see similar attempts to
test the moderation here.)

------
nostromo
It happens on wikis all the time too. I run a little code-sharing site at
<http://www.codecodex.com/> and I would get lots of random ten digit codes
entered -- I assume for the same reason. A number of other wikis have noted
the same type of edits.

------
varjag
Poisoning statistical spam filters?

