
A bot that tweets any time devs swear on GitHub - deepsy
https://thenextweb.com/apps/2017/10/09/twitter-microsoft-bot-swear-github/#.tnw_SIiKr9mX
======
uiri
Author here. I made this almost five years ago when I was in high school. The
Scunthorpe problem is real - "shit" is used in a lot of good compound swears.

The code isn't public because I was concerned about people taking it to make a
more popular version of the same thing. Not that it is difficult to glue
together the two APIs. It is also an embarrassing mess of around 150 lines of
Python.

One issue with linking to the commits or repo is naming & shaming. The other
is, as I mentioned, people trying to get on the bot intentionally.

It scans a GitHub API once a minute so as not to put noticeable strain on
their API. I think the firehose of constant commit messages has only gotten
worse.
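
A rough sketch of what a once-a-minute polling loop with de-duplication could look like. This is not the bot's code (which isn't public). `fetch_events`, `is_profane`, `handle`, and the `{"id", "message"}` event shape are all hypothetical placeholders, not part of any real GitHub or Twitter client.

```python
import time


def poll_once(fetch_events, seen_ids, is_profane):
    """Return profane messages from events not seen in earlier polls.

    fetch_events: hypothetical callable returning a list of dicts,
                  each with an "id" and a "message" key (assumed shape).
    seen_ids:     set of event ids already processed; updated in place.
    is_profane:   hypothetical callable, message -> bool.
    """
    matches = []
    for event in fetch_events():
        if event["id"] in seen_ids:
            continue  # already handled in an earlier poll
        seen_ids.add(event["id"])
        if is_profane(event["message"]):
            matches.append(event["message"])
    return matches


def run(fetch_events, is_profane, handle, interval_seconds=60, sleep=time.sleep):
    """Poll forever, pausing interval_seconds between requests."""
    seen_ids = set()  # grows without bound; a real bot would cap or expire it
    while True:
        for message in poll_once(fetch_events, seen_ids, is_profane):
            handle(message)
        sleep(interval_seconds)
```

Passing `sleep` in as a parameter keeps the loop testable without actually waiting a minute per iteration.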

~~~
tzs
I did the communications server for a small online gaming site once, and it
had a profanity filter on the chat channels. The approach we took mostly
avoided the Scunthorpe problem, and also caught when people tried to sneak
profanity past it by splitting it across words (e.g., knowing that
"motherfucker" would be filtered, someone might try "motherfu cker", or maybe
even "m0th3rfu ck3r").

Here is what we did, as far as I recall. This worked a heck of a lot better
than I expected it to.

1\. Make a copy of the text and replace all the common non-letters that look
somewhat like letters with those letters. E.g., 0 => O, 1 => L, 3 => E, 4 =>
A, 7 => T, @ => A.

2\. Do a spell check on each word. Set a flag on each character that comes
from a word that passed the spell check.

3\. Scan the sequence of characters, ignoring all punctuation and white space,
looking for sequences that match known profanity.

4\. For each sequence that matches, mark it as profanity if any of its
characters were not flagged in step #2.

As described above this probably would not scale well to high traffic because
of the spell check. That could be improved by changing the order. Look for
potential profanity first. Only if some is found would you then do the spell
check, and you would only need to spell check the words that span the
potential profanity.
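
A minimal Python sketch of the four steps above as I read them, not the original server code. The name `find_profanity` is made up for illustration, the "spell check" is assumed to be a plain set lookup, and the word lists are whatever the caller passes in.

```python
import re

# Step 1: common letter lookalikes, mapped to lowercase letters.
LOOKALIKES = str.maketrans(
    {"0": "o", "1": "l", "3": "e", "4": "a", "7": "t", "@": "a"}
)


def find_profanity(text, dictionary, profanity):
    """Return the profane terms found in text, following steps 1-4.

    dictionary: set of correctly spelled lowercase words (assumed stand-in
                for a real spell checker).
    profanity:  iterable of lowercase terms to look for.
    """
    normalized = text.lower().translate(LOOKALIKES)

    # Step 2: flag every letter that came from a correctly spelled word.
    letters = []
    flags = []
    for word in re.findall(r"[a-z]+", normalized):
        passed = word in dictionary
        for ch in word:
            letters.append(ch)
            flags.append(passed)

    # Step 3: scan the letter stream, with spaces and punctuation dropped.
    stream = "".join(letters)
    hits = []
    for bad in sorted(profanity):
        start = stream.find(bad)
        while start != -1:
            # Step 4: report the match only if some letter is unflagged.
            if not all(flags[start:start + len(bad)]):
                hits.append(bad)
            start = stream.find(bad, start + 1)
    return hits
```

With a toy dictionary containing "checkout" and a profanity list containing "heck", "The checkout page works" comes back clean because every matched letter belongs to a correctly spelled word, while "what the h3 ck" is caught. One limitation of this sketch: a match that spans two adjacent, correctly spelled words is let through.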

------
painbody
I fear this will be used to further arguments that programmers create a
hostile environment.

I know people in all occupations swear, but this puts the focus on it and
allows people to quantify it.

Just imagine the article:

"One report showed that the f-word was used over 5,000 times in a single
month. No other profession that's been measured has shown anywhere near this
level of profane, unwelcoming environment. How are stay-at-home parents
supposed to feel welcome in this community?"

~~~
taxpayer9
I'm sorry, in what way is the word "fuck" creating a hostile environment? It
depends on context. Compare these two commit messages:

"Get the fucking login button working"

"Fuck painbody, they don't know how to wire up a login button"

I write a lot of commit messages with "offensive" language, but it's not
directed at anybody and it's just to blow off steam. How are stay-at-home
parents supposed to feel welcome in this community? Release the death grip on
their pearls and realize that people swear and that there's no way a "bad
word", on its own, can hurt you. Unless someone is directing harsh language at
you, you have no reason to be offended by the word "fuck". Good god, it's just
a word.

~~~
SeanDav
> _"Good god, it's just a word"_

Extremely naive to think that just because it is a word, it does not matter.

A single word in the wrong place or at the wrong time, could destroy your
career, make you a social pariah or even get you killed. Words are powerful.

~~~
fuckword
And yet somehow I don't think "fuck" in a public commit message is that big of
a deal. We weren't talking about swear words destroying your career, making
you a social pariah, or getting you killed. We're talking about tightasses
getting a bee in their bonnets about "bad" words.

~~~
grzm
Yet you do care enough to create a new account just to post this comment,
complete with dedicated account name. If you truly don't think it's such a big
deal, why post in this way? Why not stand behind your comment with a non-
throwaway username?

~~~
tomc1985
"Good god, it's just a word"

Going to second this fella. Not a throwaway account. Anything to discredit the
prudes of this world and their pathetic, sensitive little war on "profanity".
Fuck forever!

~~~
SimbaOnSteroids
Do we want the kinds of kids that are sticklers for the rules really getting
involved in this field? ;)

------
hasbot
> one bored Microsoft programmer has built a Twitter bot

Hmm, I'm not sure I would want to be labeled, in public, as a "bored Microsoft
programmer." His manager is wondering "Did he write this on Microsoft time?"
"What else is he doing to alleviate his boredom?" "I give him plenty of work
to do so why is he bored?"

~~~
hooksfordays
Problem is, this was developed 5 years ago, well before the developer joined
Microsoft. I understand where you're coming from, but the article is
misleading.

------
strictnein
The Twitter account:

[https://twitter.com/gitlost](https://twitter.com/gitlost)

------
leejo
Related:
[http://www.commitlogsfromlastnight.com/](http://www.commitlogsfromlastnight.com/)

------
Erazal
I just swore in a commit and it did not appear. Makes me wonder how the
Twitter API works, how often it updates, etc. Off to discover new horizons!

~~~
citrusui
Can confirm, it doesn't always work. Most likely due to Twitter's 2400
tweet/day API limit.

------
mrighele
A bit more evil would be to do something similar every time someone mentions
"password" or "secret" in a commit message...

------
MrQuincle
Moby Dick is not a swearword :-)

~~~
amarraja
We had a small bespoke customer support system we wrote back in 2000 or so.
The system would allow an operator to type the message body, and would
automatically generate the mail text e.g. "Dear {customer name}, {body},
Regards {operator name}". Cutting edge stuff back then!

One day, we get a call that an operator couldn't send an email since the third
word was classed as "offensive", however the message body was fine. After
firing up the debugger, it became obvious... the customer's surname was
"Dick".

We never got to the bottom of how to solve it, so hacked something in. I
wonder how many times Mr. Dick has issues with false positives in profanity
filters.

~~~
atomwaffel
Probably often enough to be used to it, although it's still bad design and
needs to be fixed. (I'm not sure why a non-public-facing system needed a
profanity filter in the first place.)
[https://en.wikipedia.org/wiki/Scunthorpe_problem](https://en.wikipedia.org/wiki/Scunthorpe_problem)

I find it interesting that this Twitter bot seems to have the same problem in
reverse: it can't reliably filter out things that _aren't_ offensive.

Edit: Thinking about it, it's really still the same problem, i.e. false
positives when trying to automatically determine whether or not a given string
contains swearwords.
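
A small illustration of the two kinds of false positive mentioned in this subthread, assuming a one-word blocklist. The function names are made up for the example. Plain substring matching trips on innocent longer words, and whole-word matching fixes that but still flags the surname.

```python
import re

BLOCKLIST = ["dick"]  # toy blocklist for illustration only


def substring_match(text, blocklist=BLOCKLIST):
    """Flag any blocklisted term appearing anywhere, even inside a word."""
    lowered = text.lower()
    return [term for term in blocklist if term in lowered]


def whole_word_match(text, blocklist=BLOCKLIST):
    """Flag blocklisted terms only when they stand alone as a word."""
    lowered = text.lower()
    return [
        term
        for term in blocklist
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    ]
```

Here `substring_match("Charles Dickens")` reports a hit while `whole_word_match("Charles Dickens")` does not. Both still flag "Dear Mr. Dick," and "Moby Dick", which is why word boundaries alone do not settle the question; telling a name from an insult needs context the string does not carry.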

------
pvinis
Would be nice to have the repo where the commit happened too.

------
sAbakumoff
Is it implemented using GitHub webhooks?

