
Internet trolls: Proactive policing - ghosh
http://www.economist.com/news/business-and-finance/21650102-new-research-suggests-it-possible-identify-online-troublemakers-they-strike-proactive?fsrc=scn/tw/te/bl/ed/internetttrolls
======
diminoten
The work is here:
[http://arxiv.org/pdf/1504.00680v1.pdf](http://arxiv.org/pdf/1504.00680v1.pdf)

According to the data, FBUs (future banned users, aka trolls) are
significantly more readable in their writing than NBUs (never banned users).

That's certainly surprising.

Also of note is that FBUs tend to reply more than NBUs, which meshes with
some features of HN -- the inability to reply to a nested comment from
within the thread view (one has to click through to the comment itself to
reply -- though this mechanism has always been somewhat ambiguous and not
really a deterrent to impulsive replies; I don't even notice it anymore).

Also, I think HN deflates the score of posts with many comments, a strategy
supported by this line in the paper: "Prior work also identified post
frequency as a signal of a low quality discussion (Diakopoulos and Naaman
2011)."

An absolutely _fascinating_ paper. What are the consequences, from an IP
standpoint, of taking this paper and implementing features based on its
conclusions?

~~~
michaelkeenan
It's a great paper, and a great application of machine learning!

Though the FBUs had worse readability than NBUs. I think you're
misinterpreting the readability index -- higher means harder to read, not
more readable. From the paper:

"Users who get banned in the future (FBUs) (a) write less similarly to other
users in the same thread, (b) write posts that are harder to read (i.e., have
a higher readability index), and (c) express less positive emotion."
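
For concreteness, here's a minimal sketch of the metric, assuming the
index in question is the Automated Readability Index (higher score =
harder to read):

    # Sketch of the Automated Readability Index (ARI), assuming
    # that's the readability index the paper uses; higher scores
    # mean harder-to-read text.
    import re

    def automated_readability_index(text):
        words = re.findall(r"[A-Za-z0-9']+", text)
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        if not words or not sentences:
            return 0.0
        chars = sum(len(w) for w in words)
        return (4.71 * chars / len(words)
                + 0.5 * len(words) / len(sentences)
                - 21.43)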

~~~
diminoten
I saw that later on; I was specifically referencing chart 1(b).

------
bglazer
The paper itself [1] and a Wired article covering it [2] were received very
poorly on HN when they were originally posted.

[1]
[https://news.ycombinator.com/item?id=9399090](https://news.ycombinator.com/item?id=9399090)

[2]
[https://news.ycombinator.com/item?id=9398399](https://news.ycombinator.com/item?id=9398399)

I honestly didn't expect that. My impression is that people are generally
afraid this will be used as a way to auto-ban people just for disagreeing
with popular sentiment.

I think what people didn't understand was that this analysis is heavily based
on community and admin actions, primarily banned users and deleted posts.
While there is some analysis of the comments themselves, like readability and
word similarity, this is _not_ some SkyNet auto-ban bot.
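
(For a rough sense of the word-similarity feature -- the paper's exact
construction may differ -- a toy bag-of-words cosine similarity between
one post and the rest of the thread would look like this:)

    # Toy version of a "writes less similarly to the thread" feature:
    # bag-of-words cosine similarity between a post and the rest of
    # the thread; the paper's actual feature construction may differ.
    from collections import Counter
    from math import sqrt

    def cosine_similarity(post, rest_of_thread):
        va = Counter(post.lower().split())
        vb = Counter(rest_of_thread.lower().split())
        dot = sum(va[w] * vb[w] for w in va)
        norm = (sqrt(sum(c * c for c in va.values()))
                * sqrt(sum(c * c for c in vb.values())))
        return dot / norm if norm else 0.0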

They are predicting how likely a user is to be banned, based primarily _on
the community's response_ to that user.

My point is that if systems like this are used to flag or ban people for
disagreeing with the prevailing groupthink, then the problem is with the
community, not the algorithm.

Edit: typos and clarity

------
RcouF1uZ4gsC
I don't think this technique is going to be all that helpful, because you
cannot interpret specificity and sensitivity without taking prevalence
into account. Let us look at the data from the paper [1].

From the chart on page 3, we can see that the percentage of banned users
ranges from 1.7% to 3.3%. On the last page we have the phrase "Though
average classifier precision is relatively high (0.80), one in five users
identified as antisocial are nonetheless misclassified". Based on this, I
think we have a specificity of 80% and a sensitivity of 80%.

Let us look at 1000 hypothetical users, using 3% as the prevalence. Flagged
and Not Flagged represent the output of the algorithm, and Regular and
Banned represent the gold standard:

    
    
                  | Regular | Banned
      ------------+---------+-------
      Total       |     970 |     30
      Flagged     |     194 |     24
      Not Flagged |     776 |      6
    

In this example, the algorithm will flag 218 out of a thousand people as
trolls, and 194 of those 218 are actually not trolls. Doing any sort of
automated banning based on this algorithm would, I think, be unacceptable.
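
As a sanity check, here is a quick script that reproduces the table from
those assumptions:

    # Rebuild the table above from 3% prevalence, 80% sensitivity,
    # and 80% specificity on 1000 hypothetical users.
    def confusion_counts(n, prevalence, sensitivity, specificity):
        banned = round(n * prevalence)            # 30 actual trolls
        regular = n - banned                      # 970 regular users
        true_pos = round(banned * sensitivity)    # 24 trolls flagged
        true_neg = round(regular * specificity)   # 776 regular users cleared
        false_pos = regular - true_neg            # 194 regular users flagged
        flagged = true_pos + false_pos            # 218 flagged in total
        precision = true_pos / flagged            # only ~0.11 at this prevalence
        return flagged, false_pos, precision

    print(confusion_counts(1000, 0.03, 0.80, 0.80))  # (218, 194, 0.110...)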

I may have some of my stats or assumptions wrong, and if that is the case I
would love to be corrected.

[1]
[http://arxiv.org/pdf/1504.00680v1.pdf](http://arxiv.org/pdf/1504.00680v1.pdf)

------
mindslight
Trolling is the inevitable reaction to groupthink. Predictably, groupthink is
interested in ways of insulating itself from trolls.

~~~
throwaway7767
> Trolling is the inevitable reaction to groupthink. Predictably, groupthink
> is interested in ways of insulating itself from trolls.

The word 'trolling' seems to have taken on many definitions of late; you
are probably using a different one than the article does.

I understood the article to mean troll in the sense of a person purposefully
misdirecting discussion to make people angry (note that a troll in this
sense usually does not believe what they are saying; they're "trolling" for
reactions).

It sounds like you are using the more recent use of "trolling" to mean "to
disagree with someone".

~~~
DanBC
> It sounds like you are using the more recent use of "trolling" to mean "to
> disagree with someone".

That's not a definition of trolling. That's people in arguments mistakenly
calling their opponents trolls.

I don't think parent is saying "people who disagree are trolls", but rather
that some forms of groupthink encourage trolling behaviours. When your
Wikipedia edits get stomped on by Twinkle-using 13-year-olds on Vandal
Patrol who are gathering reverts so they can apply for adminship, some
people react by leaving; others react by trolling.

(As I understand it, WP has tried to address vandal patrolling, Twinkle
misuse, reverts-for-adminship, etc.)

~~~
cbd1984
Trolls are the ones trying to get people to kill themselves.

