
I think that incentive idea is great, and a smart move to build a community, particularly when you're trying to draw subject matter experts. I like how some of the Ask<X> reddits do it, by flagging people with verified advanced degrees. People think that news sites are afraid of conflicting opinions, but in my experience that's nonsense; it just has to be well thought out and not "DEATH TO <ISRAEL/ARABS/SUNNI/TURKS/AMERICA>", which is the vast majority of what comes in.

It still doesn't solve the problem that to _find_ those great comments, someone has to _read_ them all and keep them from getting buried.

I'll err on the side of caution with revealing employee counts, but in my experience many of the FP/Atlantic/Mother Jones/Weekly Standard/pick-your-midrange-site outlets are running on a single-digit to low-double-digit number of web production staff, many of whom are also trying to hit a writing, article-layout, or fact-checking quota. The suggestion that these magazines can either get those staffers to moderate tens of thousands of comments per day, or quadruple their web staff just to improve the comments, ignores the business reality.

User moderation in the normal HN/Reddit way doesn't work well on news sites: it's too easy to game or brigade, and news sites can't or won't add unpaid moderators as gatekeepers.

That's what's hard: creating comments is scalable, filtering them is not. Leaving them unfiltered doesn't work either.



Thanks for your reply!

>I think that incentive [is good], particularly when you're trying to draw subject matter experts.

You bring up an excellent point. One of the fundamental problems with comments, I think, is that they create a space in which ignorance and expertise are equally weighted. In fact, it's often worse than that, for reasons we all know: interesting issues are hard to distill into 300-or-so characters, and short, simple points are often more percussive.

Vetting credentials is a very good option IMHO for certain forums but not for others. Reddit's /r/askscience is an example of a forum in which it works well.

>It still doesn't solve the problem that to _find_ those great comments, someone has to _read_ them all and keep them from getting buried.

I wonder if this problem can't be solved by using machine learning to classify comments as high or low quality via grammatical and semantic analysis. This kind of first-pass filtering could, at the very least, help throw out the obvious trash and pre-select candidates for recognition.

Such a system can be tuned to minimize false alarms (a shitpost getting flagged as good), which I think are the most problematic kind of classification error. This is a nice problem space for ML because the increase in misses implied by a bias against false alarms doesn't degrade the service much: not having one's comment selected for re-publication is unexceptional. (Rough sketch below.)
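To make that concrete, here's a toy sketch of the kind of first-pass filter I have in mind: TF-IDF features plus logistic regression, with the decision threshold raised so that false alarms stay rare at the cost of more misses. The training examples and the 0.9 threshold are invented purely for illustration; a real system would need real labeled data and richer features.

    # Toy sketch: bias the classifier so very few bad comments get surfaced
    # as "good" (few false alarms), accepting more misses in exchange.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    train_comments = [
        "The article overlooks the 2012 policy change that drove these numbers.",
        "Great reporting; the sourcing on the budget figures is unusually solid.",
        "DEATH TO EVERYONE WHO DISAGREES WITH ME",
        "first lol",
        "This is fake news, wake up sheeple",
        "One nuance: the survey excluded rural districts, which skews the result.",
    ]
    labels = [1, 1, 0, 0, 0, 1]  # 1 = worth surfacing to editors, 0 = noise

    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    model.fit(train_comments, labels)

    def pre_select(comment, threshold=0.9):
        # Require high confidence before surfacing a comment for re-publication;
        # raising the threshold trades extra misses for fewer false alarms.
        return model.predict_proba([comment])[0][1] >= threshold

    print(pre_select("lol first"))  # almost certainly False at this threshold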

Thoughts?


Re: machine learning: I think there are two problems with that approach, one cultural and one technological.

The cultural issue is that many news orgs are still run by people for whom the idea that technology could accidentally censor a valid criticism or ban a decent voice is just too risky. I think this is changing, and many newsrooms today are a little more fluid than they were when I was deep in this problem four years ago.

The tech issue is a little bit of a cop-out on my part. An ML approach is super attractive to me as a techie. Google (YouTube), Facebook, NYT, WaPo, and tons of other billion-dollar orgs have this problem, and could make loads of money by being seen as better communities.

On the more guerrilla side, hundreds of subreddits have automoderators written by savvy, caring moderators.

They have terabytes of training data, already tagged, and world-class ML experts on staff. If it were a tractable problem with business value, why wouldn't they have fixed it? I'm guessing it's the sort of thing that looks doable on the surface, but you get buried in the details.

Again, a cop-out answer, so please go prove me wrong!!


>cultural issue

I understand, and I think that's probably the most difficult problem of the two. I'd just like to point out -- in the interest of discussion -- three things:

1. Pre-filtering for moderators is different from (and much safer than) auto-banning by a bot

2. It's valid both to filter out informed opinions that are poorly expressed and for a publisher to have a preferred "voice", i.e. a style of writing that it favors.

3. The argument can be made that machines are no more biased than human editors, and that in many cases the biases of the former are at least known. As a corollary to this point, there exist certain ML techniques (e.g. random forest classifiers) for which the decision process of an individual case can be retraced after the fact (see the sketch after this list).
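To illustrate that corollary, here's a toy sketch (the features and data are made up, and this isn't anyone's production setup) of retracing a random forest's decision for a single comment after the fact:

    # Toy sketch: train a small random forest on invented comment features,
    # then walk one tree's decision path for a single comment.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # invented features per comment: [length, all-caps ratio, profanity count, link count]
    X = np.array([
        [400, 0.02, 0, 1],
        [12,  0.90, 2, 0],
        [250, 0.05, 0, 0],
        [8,   0.70, 1, 0],
    ])
    y = np.array([1, 0, 1, 0])  # 1 = publishable, 0 = filtered

    clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

    # Which features mattered overall:
    print(dict(zip(["length", "caps_ratio", "profanity", "links"],
                   clf.feature_importances_.round(3))))

    # Retrace one tree's decision for one comment, split by split:
    sample = np.array([[10, 0.85, 1, 0]])
    tree = clf.estimators_[0].tree_
    node = 0
    while tree.children_left[node] != -1:  # -1 marks a leaf
        feat, thresh = tree.feature[node], tree.threshold[node]
        go_left = sample[0, feat] <= thresh
        print("node %d: feature %d <= %.2f? %s" % (node, feat, thresh, go_left))
        node = tree.children_left[node] if go_left else tree.children_right[node]
    print("leaf class counts:", tree.value[node])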

How do you think publishers would respond to these counter-points?

>technical problem

Counter-cop-out: someone has to be the first!

Somewhat-less-cop-outy-counter-cop-out: by your own admission, certain sites (e.g. Reddit) have high-quality automoderators.

I would argue that the problem is "approximately solved" and that this is sufficient for the purposes of moderating an internet news publisher. Again, I would make the signal-detection-theoretic point from my previous comment: I can selectively bias my automoderators toward reducing either false alarms or misses. Of course, this brings us back to the cultural problem you mentioned.

From this I conclude that the bottleneck is cultural, which brings me to a follow-up question: what do you think is driving the increased tolerance for accidentally censoring a "decent voice"? Is it the understanding that it doesn't matter so long as a critical mass of decent voices is promoted?


omginternets, we're starting to run into HN flame-war restrictions, and I'm at work, so apologies if responses come slowly.

> How do you think publishers would respond to these counter-points?

In my experience 1 and 2 are fine, but 3 is actually a _net negative_ to some of them. People who by and large have come up through 10+ years of paying dues in a 'The patrician editor is always right' culture _hate_ giving up control, even when it makes their jobs easier.

Editors I've seen have balked at things like Taboola and Outbrain, despite their being testably better than human recommendations and saving staffers work. It's a fair argument that picking which stories to promote is a core part of the editorial job, more so than comment moderation, but the same attitude carries over. Editors at one DC media org I didn't work for shot down A/B testing any new features in the first place, because there was an assumption that the tech staff would rig it!

I don't want to paint 'editors' with too broad a brush, but there's definitely a cultural reluctance at the top toward automated decision-making.

> What do you think is driving the increased tolerance towards accidentally censoring a "decent voice"? Is it the understanding that it doesn't matter so long as a critical mass of decent voices are promoted?

It doesn't matter to you and me. We think like HN'ers, where there are trillions of internet packets flowing around every day, and a few will get lost. They think like hometown newspaper editors parsing letters. When you take on the responsibility of being a gatekeeper, screwing it up is a big problem, every time.

I think the increased tolerance is coming from more exposure to the sheer volume (every week at FP, the website gets more visits than the number of people who have ever read the magazine in its 50 years of existence combined), and a bit of throwing up one's hands and saying "who knows".

Again, I'm speaking for a pretty specific niche of old-school newspaper and magazine people turned editors of major web properties, because that's where my friends work. Things are probably different at HuffPo or Gawker or internet-native places, but clearly not that different, because their communities are still toxic.

> I would argue that the problem is "approximately solved"

So I disagree here, but don't have evidence to back it up, other than years-old experience with Livefyre's bozo filter, which we didn't put enough work into tuning to give it a super fair shake.

Taking spam comments as mostly solved, I think there are 3 core groups of 'noise' internet comments:

1. People who don't have the 'does this add to the discussion' mindset, to use HN's words. cloudjacker's and michaelbuddy's comments below demonstrate this pretty well. I'd lump cheap-shot reddit jokes in here as well. They're not always poor writers, or even negative -- "Great article! love, grandma" -- which falls back into the ethics of filtering them. I suspect this is 80%+ solvable (see the rough sketch after this list).

2. The 'bored youth' and 'trolls' group. This is actually the worst group, I think, because these are the people I suspect make death threats and engage in doxxing and swatting. Filters will catch some of them, but they're persistent, and many are tech-savvy and reasonably well educated. They can sometimes be hard to tell from honest extremists. A commenter from group 1 who is personally affronted can fall into this group, at which point they become a massive time suck. Hard to solve, but verified accounts help here in the US case.

3. Sponsored astroturfing. Russia, Turkey, (pro/anti) Israel, China, Trump (presumably the DNC?) all have large paid networks of people criss-crossing the internet all day trying to make their support base look larger than it is. Especially in the US politics case, they often speak good English and are familiar with both sides' go-to logical fallacies. They'll learn your moderating style in a heartbeat and adapt. Unsolvable.
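For what it's worth, group 1 is the part I'd expect crude heuristics to catch before any fancy ML kicks in. A toy sketch, with thresholds pulled out of thin air purely to illustrate:

    # Toy heuristics for group 1: comments that aren't hostile, just don't add
    # to the discussion. Thresholds are invented for illustration only.
    import re

    def adds_to_discussion(comment):
        words = re.findall(r"[a-zA-Z']+", comment.lower())
        if len(words) < 15:                       # "Great article! love, grandma"
            return False
        if len(set(words)) / len(words) < 0.4:    # repetitive filler
            return False
        caps = sum(c.isupper() for c in comment) / max(len(comment), 1)
        if caps > 0.3:                            # SHOUTING
            return False
        return True

    print(adds_to_discussion("Great article! love, grandma"))   # False
    print(adds_to_discussion("The piece glosses over how the budget deal "
                             "changed incentives for local broadcasters, "
                             "which matters because station owners now have "
                             "a direct stake in the outcome."))  # True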

Anyway, if someone builds a good bozo filter, they're almost certainly a zillionaire. I hope it happens, but I suspect we'll just start looking back on website comment sections the way we look back on usenet, as a good idea that didn't scale very well, and find something better.


Taboola's and Outbrain's recommendations are so pathetically insulting, and the tracking so obvious, that I've both blocked their domains (at my router's DNS server) and specifically set "display:none;" on any CSS classes/IDs matching their names or substrings.

It's pathetic bottom-feeder crap.

Maybe if I fed the beast through tracking, I'd see higher-quality recommendations, but I won't, and I don't. They only serve to tell me just how precariously miserable the current state of advertising-, tracking-, and surveillance-supported media is. I'm hoping it will crash and burn, not because I want present media organisations to die, but because until they do, we don't seem to stand any chance of something better.

(What better, you ask? Information as a public good, supported by an income-indexed tax.)


I was referring specifically to their paid same-site recommendation engines. So you drop it into an article, and it recommends other articles from your site. In my experience it's decent to good, depending on what metadata you provide it.

I agree that the '10 weight loss secrets' junk promoted to third-party sites is bottom-scraping.


It tarnishes both brands: Taboola's and the hosting site's.

Sufficiently so that, for me, even the in-site referrals fail for technical reasons (they're blocked).


I really disagree. Yes, Taboola may be promoting literally ANY content, even spam, so yes, I blocked them. But Outbrain is currently operating as genuine content discovery; I haven't found any content that abuses me as a reader. Not yet. I know that they also have strict guidelines for their advertisers.


Reading the other reply thread with slowerest gave me another possible solution, too.

Perhaps the comment sections for journalistic pieces from organizations like Ars, NPR, NYT, local news, etc. could be more of a competition (like Slashdot). The top 300 comments get preserved: leave the thread open for a month with no comment limit and some light moderation, let the conversation go wild (I like Reddit's system for this), then delete all but the top 300 at the end.

Adjust "300" and "top" to fit your organization's needs, just make sure they're clearly defined. Would also help limit the scope for an ML-based solution, too. :)


News sites with a paid component could allow comments only from subscribers/donors. Having a gate that involves money will improve the conversation somewhat. I'd even go a step further and make comments invisible except to subscribers. People creating trial paid accounts could see the comments but not comment themselves; this latter step would prevent astroturfing from firms willing to pay $10 for a trial but not $100 for an annual subscription. (Toy sketch below.)
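A toy sketch of that gate (the tier names are mine, just for illustration): trial subscribers can read the comments, but only full subscribers can write.

    from dataclasses import dataclass

    @dataclass
    class User:
        tier: str  # "anonymous", "trial", or "full"

    def can_view_comments(user):
        return user.tier in ("trial", "full")

    def can_post_comment(user):
        return user.tier == "full"

    print(can_view_comments(User("trial")), can_post_comment(User("trial")))  # True False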

Moderators would still be needed, but their workload would be reduced. And there would be money available for them, since many would subscribe/donate just to be part of the community, which would make moderation less of a drain and more a part of the core profit-making.





