

Battling the Internet Water Army: Detection of Hidden Paid Posters - yuhong
http://arxiv.org/abs/1111.4297

======
DevX101
OK, so I've learned from this paper that this sort of paid posting to
influence public opinion is carried out systematically by trained teams in
China. But my praise ends there.

I don't much trust their machine learning detection approach. But before we
get into that, let's describe what they did. These researchers collected about
21,000 comments from a news site, representing 552 users. The researchers then
manually selected which of these 552 users they thought were paid posters,
without any external confirmation. According to their methodology, they assumed
stupid or contradictory comments were by paid posters. The approach from here
was to do lots of fancy math on novel comments not screened by the
researchers, and classify whether these novel comments were by paid posters
or not.

If you haven't guessed it by now, the flaw here is that the researchers
decided which commenters _they thought_ were paid or not, based on the
stupidity of the comment. It's pretty easy to manually classify e-mail spam,
but I'd have a hard time identifying a paid public opinion shill based on a
comment. Furthermore, if the researchers are using the intelligence of the
comment as a marker, my experience with YouTube comments, r/politics, and a
few other
internet forums leads me to believe that there's no shortage of stupid,
contradictory ideas espoused by real unpaid people.
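
To make the circularity concrete, here is a minimal Python sketch on made-up data. It is not the paper's data, features, or classifier: the "stupidity score", the 0.7 cutoff, the 10% paid rate, and every function name are assumptions for illustration. It assumes the worst case, where being paid is unrelated to how stupid a comment looks, and shows that a classifier trained on the researchers' guesses can reproduce those guesses perfectly while saying almost nothing about who is actually paid.

```python
import random


def simulate_users(n, paid_rate=0.1, seed=0):
    """Hypothetical users as (stupidity_score, truly_paid) pairs.

    Assumption: the score is independent of whether the user is paid.
    """
    rng = random.Random(seed)
    return [(rng.random(), rng.random() < paid_rate) for _ in range(n)]


def heuristic_label(stupidity, cutoff=0.7):
    """Stand-in for the manual labeling: 'stupid enough' means 'paid'."""
    return stupidity > cutoff


def fit_threshold(scores, labels):
    """One-feature decision stump: pick the cutoff that best matches labels."""
    best_t, best_acc = 0.0, -1.0
    for t in sorted(scores):
        acc = sum((s >= t) == y for s, y in zip(scores, labels)) / len(scores)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def accuracy(preds, targets):
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)


users = simulate_users(552)  # same user count as the study, invented data
scores = [s for s, _ in users]
truth = [paid for _, paid in users]
labels = [heuristic_label(s) for s in scores]

threshold = fit_threshold(scores, labels)
preds = [s >= threshold for s in scores]

# How well the model matches the researchers' own guesses.
agreement_with_labels = accuracy(preds, labels)

# Of the users flagged as paid, how many really are (hidden ground truth).
flagged = [is_paid for p, is_paid in zip(preds, truth) if p]
precision_vs_truth = sum(flagged) / len(flagged) if flagged else 0.0

print("agreement with manual labels:", agreement_with_labels)
print("precision against true paid status:", round(precision_vs_truth, 3))
```

A high score on the first number only tells you the math learned the labeling rule. Without external confirmation of who was paid, the second number is the one that matters, and the paper has no way to measure it.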

~~~
6ren
_Automatic Detection of Stupid Posters_ would also be acceptable.

~~~
khafra
It's been done: <http://stupidfilter.org/wiki/>

------
Alex3917
"To the best of our knowledge, this paper is the first to study the social
phenomenon of paid posters."

A simple Google search shows there are all sorts of papers that have been
written on detecting astroturfing. This statement is an epic fail.

~~~
farnsworth
Well... maybe you don't get those results in China.

But seriously, you're right; there's even a similar one posted to arXiv:
<http://arxiv.org/abs/1011.3768>

~~~
skore
Sorry about following your OT here, but it's a very interesting thought: that
the Great Firewall might be a factor in overestimating the uniqueness of
your own work if prior art is censored from your view.

This reminds me of a video I saw a while back where they showed Chinese
students the iconic "man in front of a line of tanks" picture and they failed
to recognize it. I wonder whether irony will eat itself in the future - maybe
some "capitalist" country will decide to use tanks on its people and they will
try to stop it with a human shield. Cue "look at the failure of capitalism -
using tanks on their own people who bravely stand up to it!" comments from
China's statesmen.

Shielding your citizens from material that encourages dissenting thought is a
cute plan, but you may end up making them look like fools on the international
stage.

------
jarin
I'd love to see this applied to some of the "patriotic" Facebook pages (like
the "Being American" page).

------
mark_l_watson
In the USA, it is now law that bloggers have to mention any gifts of products,
books, licenses, etc. that might influence blog posts.

I think this is good!

I am an author, and I get comped a _lot_ of books, and some software products.
It just feels right to say something like, for example, "thanks to publisher Z
for sending me a review copy of book B" when talking about book B. If I get
comped something that I don't like, then I won't talk about it.

Paid posters on Reddit, etc., are more insidious because you just have to
guess if they might be paid by a company or government to push desired hype.

~~~
tingletech
not actually a "law" -- more of an F.T.C. regulation

"The Guides are administrative interpretations of the law intended to help
advertisers comply with the Federal Trade Commission Act; they are not binding
law themselves." -- <http://www.ftc.gov/opa/2009/10/endortest.shtm>

~~~
OstiaAntica
FTC regs are backed by truth-in-advertising law.

------
RyanMcGreal
Obligatory: <http://xkcd.com/810/>

------
VladRussian
From the geographical distribution of users vs. paid posters on p. 7, two
provinces, SICHUAN and especially SHANDONG, stand out for their lower ratio of
paid posters. Any idea why that is?

~~~
AntiGameZ
I guess their data sample is not strong enough. I don't think the two
provinces have any reason for a ratio so low.

~~~
est
or bad GeoIP db

------
Angostura
This is a fantastic study! The authors are top class in their field and I
think we should all take notice of what they say!!!

~~~
ImprovedSilence
I see what you did there...

------
ableal
I suspect quite a bit of astroturfing is going on in app store reviews. My
national iOS app store has a fairly small volume of ratings/reviews, usually
single or double digits, only hitting hundreds for a very few apps. I've
spotted two or three weird scattershots of desultory, ill-fitting, generic-
sounding reviews. It even looked like the effort needed was overestimated out
of ignorance ...

~~~
ceejayoz
Oh, that's a certainty. I've seen Amazon Mechanical Turk work items that have
you place a five star review and receive payment for having done so.

~~~
ableal
Hmm, that explains why they are "reviews" (with verifiable names) instead of
just ratings (untraceable externally).

Fortuitous interesting excursion on arXiv: top-200 heavy physics papers
readers, "2010 Institutional arXiv Usage Data" at
<http://arxiv.org/help/support/2010_usage>

