Hacker News
Why existing profanity detection libraries suck and how I built a better one (medium.com/victorczhou)
12 points by vzhou842 on Feb 5, 2019 | hide | past | favorite | 17 comments


Most of the words on the profanity list are in common use in polite company these days. If the goal is to identify hate speech and sexual speech, doesn't there need to be a semantic component? I wonder why a profanity filter is needed--what is the use case?


Hey, author here. I run a couple web games that allow in-game chat between players. One of the biggest complaints from players is that some people like to troll and spam post racial slurs / hate speech / threats. I wanted the profanity detection so I could just filter out the most egregiously profane messages and potentially kick/ban players who continue trying to send obviously profane messages.


Seems like a reasonable use of it as long as the sensitivity is set so you have a reasonably low false positive rate (or have a human in the loop).


that's the idea!


Do none of the winners of the Kaggle competition he linked release their code?


Only as anecdata: I remember a technical forum/discussion board whose word filter had issues, a few years ago, with the word Matsushita (Panasonic was fine ;)).


This is called the Scunthorpe Problem. https://en.m.wikipedia.org/wiki/Scunthorpe_problem
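The Matsushita case is a textbook instance: a naive substring filter flags the word because of an embedded four-letter string, while a whole-word match does not. A minimal Python sketch (the word list and function names are my own illustration, not from any library discussed in the thread):

```python
import re

BAD_WORDS = ["shit"]

def naive_filter(text):
    # Substring matching: flags any message containing a bad word anywhere,
    # including inside innocent words -- the Scunthorpe problem.
    lower = text.lower()
    return any(word in lower for word in BAD_WORDS)

def word_boundary_filter(text):
    # Match only whole words, so embedded substrings don't trigger it.
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, BAD_WORDS)) + r")\b",
        re.IGNORECASE,
    )
    return bool(pattern.search(text))

print(naive_filter("Matsushita"))          # True  -- false positive
print(word_boundary_filter("Matsushita"))  # False -- whole-word match avoids it
```

The trade-off cuts both ways: whole-word matching fixes Matsushita but misses deliberately run-together profanity, which is part of why this stays a hard problem.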


I have never meant to say the word “ducking” in my entire life...


Apropos the subjectivity of profanity: the word "slut" in Danish means "finished", so it's often found at the end of long bodies of text.


Same in Swedish, and probably also Norwegian; both have a pretty substantial online gaming presence.


Yup, valid points; that's definitely a weakness here.


There is no sociologically valid use case for a profanity detection library. This is a misuse of technology. Get over your shit.


Is there a "sociologically valid use case" for other software you've seen announced on HN?


Not even making services age-appropriate?


Firstly, it won't work: designing a profanity filter can only ever be an eternal game of cat and mouse - there will always be new words, or new ways of r3wr1t1ng the same ones. And that's before taking into account the problem of multiple languages.
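The "r3wr1t1ng" substitutions can be partially caught by normalizing common character swaps before filtering. A hypothetical sketch (the substitution table is my own, and it illustrates the cat-and-mouse point: it only covers swaps you already know about):

```python
# Map common leetspeak digit/symbol substitutions back to letters
# before running a word filter. Any substitution not in this table
# will slip straight through.
LEET_MAP = str.maketrans({
    "3": "e", "1": "i", "0": "o", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s",
})

def normalize(text):
    # Lowercase and undo known substitutions.
    return text.lower().translate(LEET_MAP)

print(normalize("r3wr1t1ng"))  # "rewriting"
```

A filter would then run its word matching on the normalized text rather than the raw message.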

Secondly, age-appropriate is a far more complex challenge than simply filtering out "bad" words. Anything can be reworded. Families differ in which words they consider acceptable at each age. If you really want to make an online service safe for young children, you probably shouldn't be matching them with unapproved strangers to talk to in any form.


Are you familiar with the maxim that people behave better if they believe they are being observed?

Would you agree that people would be less likely to “go off on one” if a draft message displayed a “this message could be construed as offensive” warning before they sent it?


But what defines "offensive"? If you used a "profane" word, I'd likely not even notice. But if you said something negative about me or my family, yes, that'd likely be offensive. Even that isn't foolproof: a friend saying something negative about me might not be offensive (especially if it's a truthful statement), but from a stranger it likely always would be. There's so much nuance here that I feel it's a losing battle.



