Hacker News

They didn't just choose those criteria out of thin air. They compiled data and ran a logistic regression that was able to predict whether or not a given user would be banned from a web community based on the content of their posts. And it turned out that negative words, profanity, and spelling and grammar errors were all predictive features in determining that result.
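For readers unfamiliar with the method, here is a minimal sketch of what "run a logistic regression on post features to predict a ban" means. It is not the study's code: the three features (negative-word fraction, profanity rate, spelling/grammar error rate), the toy data, the learning rate, and the epoch count are all hypothetical. It uses plain-Python gradient descent to stay dependency-free; in practice you would likely use a library such as scikit-learn's `LogisticRegression`.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log-loss."""
    n_features = len(xs[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # derivative of the log-loss w.r.t. the logit
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that a user with features x gets banned."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


if __name__ == "__main__":
    # Hypothetical per-user features:
    # (negative-word fraction, profanity rate, spelling/grammar error rate)
    xs = [(0.6, 0.3, 0.4), (0.7, 0.5, 0.3), (0.5, 0.4, 0.5),
          (0.1, 0.0, 0.05), (0.2, 0.05, 0.1), (0.15, 0.0, 0.0)]
    ys = [1, 1, 1, 0, 0, 0]  # 1 = banned (toy labels, not real data)
    w, b = train_logistic(xs, ys)
    print("weights:", [round(v, 2) for v in w], "bias:", round(b, 2))
```

A positive learned weight means that feature pushes the predicted ban probability up; that is the sense in which a feature is "predictive", which says nothing yet about false positives or causation.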

Here's how science works: If you don't like the conclusions, you can question the methodology. You can conduct the experiment yourself to see if you can reproduce the results. You can run your own studies and see if you come up with different conclusions. But you don't get to say "Hmm, the data don't conform to my preconceived notions about how the world should work, so I reject their implications."

Not everyone agrees with the idea that every thought from every person should be heard at all times. Some people are just jerks. "Trolling", by definition, is about making provocative statements to incite reactions, as opposed to contributing to conversations. There are places where that kind of thing is accepted (4chan) and places where it is not (HN). Tools that help communities to encourage their preferred forms of communication are good things, and will be increasingly important in the future.

Having a minority opinion is often perceived as "provocative" by people who feel threatened by that opinion, no matter how well articulated. These days the feeling of being "threatened" is at an all-time high. (I say this having participated in discussions online for 30 years.)

The assumption that "trolls" are trying to "incite reactions, as opposed to contributing to conversations" is often repeated, but in practice, "troll" is a term that is often used against people whose viewpoint is minority, but who are legitimately arguing it.

Those with the majority viewpoint often find it convenient to dismiss (and attempt to discredit) those they disagree with by simply calling them trolls. And it's not uncommon for forums to be operated in such a way as to actually ban people with minority viewpoints.

> but in practice, "troll" is a term that is often used against people whose viewpoint is minority, but who are legitimately arguing it.

Of course, but the "troll" wants to have that conversation, and the rest of the community does not. That doesn't mean the conversation or opinions are unworthy or wrong, just that they are unwelcome in this context.

Good or bad doesn't matter, the "troll" is a disruptive influence that turns a comfortable place into an uncomfortable one.

Most people don't come to the Internet to have their beliefs challenged, no matter how wrong those beliefs might be.

Right, but you can find correlations between being black and committing crime - compiling data and running logistic regressions be damned.

These measures they came up with are heuristic proxy measures at the very best, and noise at the worst.

The troll hunting algorithm has to face the false positive problem [1], which the paper does not address.

My very legitimate content has been censored in various places (notably Facebook) because it tripped "anti-trolling and scam algorithms", but what I was trying to post was about Snowden, Manning, and the TPP leaks.

[1] http://understandinguncertainty.org/node/238
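The false positive problem comes down to base rates and Bayes' rule. The sketch below uses hypothetical numbers (1% of users are trolls, the classifier catches 90% of them, and it wrongly flags 5% of everyone else); none of these figures come from the paper or from [1].

```python
def flag_precision(prevalence: float, sensitivity: float,
                   false_positive_rate: float) -> float:
    """Bayes' rule: probability that a flagged user really is a troll."""
    true_flags = sensitivity * prevalence
    false_flags = false_positive_rate * (1.0 - prevalence)
    return true_flags / (true_flags + false_flags)


def expected_flag_counts(population: int, prevalence: float, sensitivity: float,
                         false_positive_rate: float) -> tuple[float, float]:
    """Expected (correctly flagged trolls, wrongly flagged non-trolls)."""
    trolls = population * prevalence
    others = population - trolls
    return sensitivity * trolls, false_positive_rate * others


if __name__ == "__main__":
    # Hypothetical rates, chosen only for illustration.
    p = flag_precision(prevalence=0.01, sensitivity=0.9, false_positive_rate=0.05)
    hits, misses = expected_flag_counts(10_000, 0.01, 0.9, 0.05)
    print(f"P(troll | flagged) = {p:.3f}")
    print(f"correct flags = {hits:.0f}, wrong flags = {misses:.0f}")
```

With these assumed rates, out of 10,000 users about 90 trolls are flagged alongside about 495 legitimate users, so only roughly 15% of flagged accounts are actual trolls. The rarer trolling is, the worse this ratio gets.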

I have. Profanity is a weak indicator.

Disproportionately fast engagement over a given short time period, low variance in word choice, and repeated use of n-grams of 4 or more words are all much more indicative than profanity.
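A minimal sketch of the three signals named above, assuming a user's posts are available as plain strings with Unix timestamps. The function names, the whitespace tokenization, and the window size are hypothetical choices for illustration; the comment doesn't say how these features were actually computed or what thresholds were used.

```python
from collections import Counter


def type_token_ratio(text: str) -> float:
    """Distinct words / total words; lower values mean less varied word choice."""
    words = text.lower().split()
    return len(set(words)) / len(words) if words else 0.0


def repeated_ngrams(posts: list[str], n: int = 4) -> dict:
    """Word n-grams (4 words by default) that occur more than once across a user's posts."""
    counts = Counter()
    for post in posts:
        words = post.lower().split()
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c > 1}


def max_posts_in_window(timestamps: list[float], window_seconds: float) -> int:
    """Largest number of posts made within any sliding window of the given length."""
    ts = sorted(timestamps)
    best = 0
    start = 0
    for end in range(len(ts)):
        # Shrink the window from the left until it spans at most window_seconds.
        while ts[end] - ts[start] > window_seconds:
            start += 1
        best = max(best, end - start + 1)
    return best


if __name__ == "__main__":
    posts = ["wake up people this is censorship",
             "wake up people this is propaganda"]
    print(repeated_ngrams(posts))
    print(type_token_ratio("the the the cat"))
    print(max_posts_in_window([0, 10, 20, 1000], window_seconds=60))
```

For the two example posts, the 4-grams "wake up people this" and "up people this is" each appear twice, which is the kind of sloganish repetition the comment describes.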

Trolls are argumentative and tend to resort to trite, sloganish language. They are no more rude than the average commenter (who is fairly impersonal and insincere).

It's about a general discordance with a generic community and what that looks like. There are two kinds: the salvageable and the hopeless. The hopeless kind is unwavering and defensive: irritable and divisive.

You can see where and when users post. Then there are the extreme outliers; those are usually bot spammers. The next group in, the first humans, are the trolls.
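One way to sketch the "outliers, then the next group in" idea, assuming you only look at how often each user posts (the "where" part is not modeled). It uses median/MAD-based modified z-scores so that a few extreme accounts can't inflate the spread and hide themselves. The cutoff of 3.5 is a common rule of thumb for modified z-scores; the bot cutoff of 10 and the tier names are hypothetical.

```python
import math
import statistics


def robust_z_scores(rates: dict[str, float]) -> dict[str, float]:
    """Modified z-scores based on the median and the median absolute deviation (MAD)."""
    values = list(rates.values())
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # Most users are identical: anyone who differs at all is infinitely far out.
        return {u: 0.0 if v == med else math.copysign(math.inf, v - med)
                for u, v in rates.items()}
    # 1.4826 rescales the MAD to match the standard deviation for normal data.
    return {u: (v - med) / (1.4826 * mad) for u, v in rates.items()}


def tier_users(rates: dict[str, float], bot_cutoff: float = 10.0,
               review_cutoff: float = 3.5) -> dict[str, str]:
    """Hypothetical tiers: extreme outliers, the next group in, and everyone else."""
    tiers = {}
    for user, z in robust_z_scores(rates).items():
        if z >= bot_cutoff:
            tiers[user] = "likely bot"
        elif z >= review_cutoff:
            tiers[user] = "review"
        else:
            tiers[user] = "typical"
    return tiers


if __name__ == "__main__":
    # Toy posts-per-day figures, not real data.
    rates = {"a": 2, "b": 3, "c": 2.5, "d": 3.5, "e": 2, "f": 12, "g": 400}
    print(tier_users(rates))
```

On the toy data, "g" lands in the extreme tier, "f" in the next group in, and the rest are typical. Whether that middle tier really corresponds to trolls is the commenter's claim, not something this code establishes.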

These are not hard things to compute - and profanity has nothing to do with it.

Alas, yet another thing I should have written up in LaTeX and sent off to a fancy journal...

I think high predictive accuracy is not the only important consideration. For science it may be enough, but not for real-world usage.

Because once you implement the classifier, you affect the world, and you have to take those effects into account. Two properties I think are desirable: 1. It should encourage good behaviour. By this I mean that if you adapt to get a better evaluation, you also become a better member of the forum. This relates to not being gameable. 2. It should give everyone a chance. For example, I could see how being poor could correlate with low-quality posts. But shutting out all poor people means you can lose valuable perspectives, so it's not an ideal solution. As long as your posts are good you should be welcome, even if you're a pleb.

In light of these, consider what filtering for poor spelling actually does. What do we know correlates with poor spelling? a) being a foreigner, b) being poor or uneducated, c) being underage. Those are the people you filter out. This goes against the second principle I mentioned, giving everyone a chance. It's a tradeoff, of course, and I could see how it's worth it sometimes.
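One way to make "those are the people you filter out" measurable is to compare, per group, how many non-trolls a rule wrongly flags. This is a minimal sketch under loud assumptions: the `User` fields, the group labels, the 5% spelling-error threshold, and the availability of ground-truth troll labels are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class User:
    group: str                  # hypothetical label, e.g. "native" / "non_native"
    spelling_error_rate: float  # fraction of words misspelled
    is_troll: bool              # assumed ground truth from moderators


def spelling_filter(user: User, threshold: float = 0.05) -> bool:
    """Hypothetical rule: flag anyone whose error rate exceeds the threshold."""
    return user.spelling_error_rate > threshold


def false_positive_rate_by_group(users, is_flagged) -> dict[str, float]:
    """Among users who are NOT trolls, the share of each group the rule flags anyway."""
    non_trolls = defaultdict(int)
    wrongly_flagged = defaultdict(int)
    for u in users:
        if u.is_troll:
            continue
        non_trolls[u.group] += 1
        if is_flagged(u):
            wrongly_flagged[u.group] += 1
    return {g: wrongly_flagged[g] / n for g, n in non_trolls.items()}


if __name__ == "__main__":
    # Toy users, not real data.
    users = [User("native", 0.01, False), User("native", 0.02, False),
             User("native", 0.08, True), User("non_native", 0.06, False),
             User("non_native", 0.07, False), User("non_native", 0.03, False),
             User("non_native", 0.09, True)]
    print(false_positive_rate_by_group(users, spelling_filter))
```

In the toy data the rule never wrongly flags a native speaker but wrongly flags two of three legitimate non-native speakers, which is the tradeoff described above expressed as a number.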

I think the parent was getting at the classic problem of using a scientific result to justify a policy. The problem is that no statistic (not even proof of a causal link) is enough to justify a policy. A policy is also about the question "what should be". How much do you value each of the conflicting goals: silencing trolls vs. hearing every opinion? The problem is when people think they can answer this kind of question with logistic regression, too.

> Some people are just jerks and "trolling", by definition, is about making provocative statements to incite reactions, as opposed to contributing to conversations.

I believe the core of paulhauggis' point was closer to the idea that trolling and behaviour that leads to being banned are not the same thing. Scope in studies is as important as anything else.
