

Automatic content moderation using validates_text_content - aarongough
http://blog.thingsaaronmade.com/validatestextcontent

======
erik
I could see this causing problems on technical discussion sites, as it might
consider a block of C code a bad comment. Sometimes there are legitimate
reasons to post something non-English, like a blob of numbers or some ASCII
(or Unicode) art.

Slashdot has some filters like this on their comment system, but I think they
are mostly targeted at blocking ASCII art spam.

~~~
aarongough
I can see that being a problem too. In that case I think you would have to
change the validation code to look for those exceptional cases. In all
honesty, the validation as it stands is designed for content on more
conversational sites (as opposed to technical ones).

I'll be interested to see what people use the validation for, and there
shouldn't be any problem expanding the code to handle more edge cases...
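One way to look for such exceptional cases, sketched here under stated assumptions (this is not part of the published plugin): measure how much of a post is made up of non-letter characters and skip the prose rules when that share is high, as it tends to be for C code, blobs of numbers, and ASCII art. The method name `looks_like_code_or_art?` and the 20% cut-off are hypothetical and would need tuning against real content.

```ruby
# Hypothetical "exceptional case" detector; NOT part of validates_text_content.
# Flags text whose non-whitespace characters are largely non-letters
# (digits, brackets, operators, punctuation), as in code, number blobs,
# or ASCII art.

NON_LETTER_THRESHOLD = 0.2 # assumed cut-off; tune against real posts

def looks_like_code_or_art?(text)
  chars = text.gsub(/\s+/, "").chars
  return false if chars.empty?

  # [[:alpha:]] also matches non-ASCII letters, so accented prose is not penalised.
  non_letters = chars.count { |c| !c.match?(/[[:alpha:]]/) }
  non_letters.fdiv(chars.size) > NON_LETTER_THRESHOLD
end

puts looks_like_code_or_art?("int main(void) { return 0; }")                            # => true
puts looks_like_code_or_art?("This is an ordinary sentence, written in plain English.") # => false
```

A validator could run a check like this first and apply its capitalization and punctuation rules only when it returns false. A ratio heuristic is crude and easy to fool in either direction, so it is at best a stopgap for the edge cases mentioned above.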

~~~
eru
I guess going with a statistical tool will be easier in the medium to long
term than adding special cases.

~~~
aarongough
That's definitely a possibility. It's worth noting that the plugin was created
with a specific use-case in mind (a non-technical user generated content site)
and that outside of that role it's clearly not going to perform as well...

~~~
eru
Yes, for such a limited use case special rules are quite good.

Just don't fall into a trap.

------
petervandijck
So you create a barrier by rejecting "bad" comments instead of by requiring
signups, CAPTCHAs, etc.

I'm not really buying it. The problem in communities isn't really bad writing
or quick comments, it's usually trolls who will circumvent something like this
quite easily.

~~~
aarongough
I see what you mean about there still being a barrier; however, I would
describe it as a trailing barrier rather than a leading one. Most people will
never encounter it, and if they do, they have already invested at least a
little of their time in creating their content.

In my experience, once people are invested in the process they are more
likely to follow through and complete it, even if that means being rejected
several times (I get to see the progression because the production systems
email me every time content is rejected). Those who don't follow through are
the ones who were just creating throwaway content anyway, which is what I want
to discourage.

From what I've seen, trolls do make it through some of the time, but they
tend to be well-written, entertaining trolls. Those who spout short comments
full of racial slurs don't tend to understand why their comments are being
rejected, and they either give up or their comment morphs into something
marginally more useful.

In that case the voting and flagging mechanisms built into the posting system
come into play: the reasonable majority of users tend to be able to swarm out
the trolls, just like they do here on HN.

The pivotal piece is that the rejection mechanism tends to discourage people
who have no investment in what they're writing from posting, which means the
remaining users have a vested interest in keeping the trolls at bay...

If you are interested in seeing examples of content progression from a live
system just shoot me an email!

~~~
eru
Do you explain the rules to the users? What kind of feedback do you get when
your comment violates the rules?

~~~
aarongough
The rules are not explained in advance to the users, that would make the
system too easy to game, particularly if you were very specific.

If the content fails the validation this is the default message that is
returned:

 _Correct capitalization and punctuation are required. (ie: All sentences
should begin with a capital letter and end with a punctuation mark, etc.)_

Most people seem to understand and simply clean up their sentence structure a
bit. People who have written their entire comment in block caps tend to
simply give up.
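The default message above describes the core rule. A minimal sketch of a check along those lines in plain Ruby might look like the following; it is not the plugin's actual implementation, and the method names, the naive sentence splitting, and the 70% block-caps threshold are all assumptions for illustration.

```ruby
# Hypothetical sketch of the rule described by the default error message;
# NOT the validates_text_content source. Every sentence must start with a
# capital letter and end with ".", "!" or "?", and text written mostly in
# capitals ("block caps") is rejected outright.

CAPS_THRESHOLD = 0.7 # assumed share of upper-case letters that counts as block caps

# Naive splitter: a sentence is a run of text ending in . ! or ?, plus any
# trailing fragment that has no terminal punctuation.
def split_sentences(text)
  text.strip.scan(/[^.!?]+(?:[.!?]+|\z)/).map(&:strip).reject(&:empty?)
end

# True when more than CAPS_THRESHOLD of the ASCII letters are upper case.
def mostly_caps?(text)
  letters = text.scan(/[A-Za-z]/)
  return false if letters.empty?

  letters.count { |c| c.match?(/[A-Z]/) }.fdiv(letters.size) > CAPS_THRESHOLD
end

def valid_sentence_structure?(text)
  sentences = split_sentences(text)
  return false if sentences.empty? || mostly_caps?(text)

  sentences.all? { |s| s.match?(/\A[A-Z]/) && s.match?(/[.!?]\z/) }
end

puts valid_sentence_structure?("This is fine. So is this!")  # => true
puts valid_sentence_structure?("this starts in lower case.") # => false
puts valid_sentence_structure?("THIS IS ALL BLOCK CAPS!")    # => false
```

The sentence splitter is deliberately naive: abbreviations such as "e.g." or version numbers like "2.0" are treated as sentence breaks and cause false rejections, so a production validator would need more careful tokenization.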

------
aarongough
Before someone else says it: I know I should migrate to using Git, I just
haven't found the motivation yet! I will try to migrate in the next few months
though.

~~~
eru
Git is fine. The other distributed version control systems (Darcs, Mercurial,
Bazaar, etc.) are also usable.

~~~
aarongough
Yeah, it's just that the Ruby/Rails world is pretty tightly tied to Git. Most
gems these days are automatically created by GitHub/Gemcutter...

I'm actually perfectly happy with SVN, but in order to participate more in the
open-source side of the Rails world I'll need to move to Git.

------
pierrefar
Interesting. Might try to port it to PHP.

~~~
aarongough
Thanks! I don't expect it would be too hard; most of the logic can be carried
over pretty much as-is. Let me know how it goes: aaron@aarongough.com

