Most of the words on the profanity list are in common use in polite company these days. If the goal is to identify hate speech and sexual speech, doesn't there need to be a semantic component? I wonder why a profanity filter is needed--what is the use case?
Hey, author here. I run a couple web games that allow in-game chat between players. One of the biggest complaints from players is that some people like to troll and spam post racial slurs / hate speech / threats. I wanted the profanity detection so I could just filter out the most egregiously profane messages and potentially kick/ban players who continue trying to send obviously profane messages.
Only as anecdata: I remember a technical forum/discussion board whose word filter, a few years ago, had issues with the word Matsushita (Panasonic was fine ;)).
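For what it's worth, the Matsushita case is what you would expect from a filter that matches blocked words as plain substrings. A minimal sketch, assuming a hypothetical one-word blocklist (that forum's actual filter is unknown; `BLOCKLIST`, `naive_match`, and `whole_word_match` are made-up names), contrasting substring matching with whole-word matching:

```python
import re

# Hypothetical one-word blocklist, for illustration only.
BLOCKLIST = ["shit"]


def naive_match(message: str) -> bool:
    """Flag the message if a blocked word appears anywhere, even inside another word."""
    lowered = message.lower()
    return any(word in lowered for word in BLOCKLIST)


# \b anchors the match at word boundaries, so a blocked word embedded
# in a longer word is ignored.
WORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(word) for word in BLOCKLIST) + r")\b",
    re.IGNORECASE,
)


def whole_word_match(message: str) -> bool:
    """Flag the message only if a blocked word appears as a standalone word."""
    return WORD_PATTERN.search(message) is not None


if __name__ == "__main__":
    print(naive_match("Matsushita is the old name of Panasonic"))       # substring filter
    print(whole_word_match("Matsushita is the old name of Panasonic"))  # whole-word filter
```

Whole-word matching avoids this particular false positive, but it also stops catching blocked words that are run together with other text, so it trades one kind of error for another.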
Firstly, it won't work, because designing a profanity filter can only ever be an eternal game of cat and mouse: there are always going to be new words or ways of r3wr1t1ng the same words. And that's before taking into account the problem of multiple languages.
Secondly, age-appropriateness is a far more complex challenge than simply filtering out "bad" words. Anything can be reworded. Families differ in which words they consider acceptable at each age. If you really want to make an online service safe for young children, you probably shouldn't be matching them with unapproved strangers to talk to in any form.
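To make the cat-and-mouse point concrete, here is a rough sketch of the usual first countermeasure to r3wr1t1ng: mapping look-alike characters back to letters before checking the text. The substitution table `LEET_MAP` and the `normalize` function are assumptions for illustration, not taken from any particular library.

```python
# Hypothetical substitution table; real-world evasions are far more varied.
LEET_MAP = str.maketrans({
    "0": "o",
    "1": "i",
    "3": "e",
    "4": "a",
    "5": "s",
    "7": "t",
    "@": "a",
    "$": "s",
})


def normalize(message: str) -> str:
    """Lowercase the message and undo simple character substitutions."""
    return message.lower().translate(LEET_MAP)


if __name__ == "__main__":
    print(normalize("r3wr1t1ng"))   # the respelling is undone
    print(normalize("10 apples"))   # ordinary digits get mangled as well
```

The second call shows the cost: the same table that undoes "r3wr1t1ng" also turns "10 apples" into "io apples", and it does nothing about inserted spaces, deliberate misspellings, or other languages. Each fix invites the next workaround.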
Are you familiar with the maxim that people behave better if they believe they are being observed?
Would you agree that people would be less likely to “go off on one” if a draft message displayed a “this message could be construed as offensive” warning before they sent it?
But what defines “offensive”? If you used a “profane” word, I’d likely not even notice. But if you said something negative about me or my family, yes, that’d likely be offensive. Even that isn’t foolproof: a friend saying something negative about me might not be offensive (especially if it’s true), but coming from a stranger it likely always would be. There’s so much nuance here that I feel it’s a losing battle.