
Scunthorpe Problem - bambax
https://en.wikipedia.org/wiki/Scunthorpe_problem
======
rcarmo
A long while back, the first time someone decided to add content filtering to
the corporate proxy, every local news site became inaccessible on Fridays,
because in Portuguese “Sexta-feira” is abbreviated as “Sex.”

You’d be surprised how many variations of this I’ve come across over the
years.

------
YeGoblynQueenne
Surely, that is a solved problem now that Google has "advance[d] the state of
the art in natural language technologies and buil[t] systems that learn to
_understand_ and use language _in context_"? [1]

Or, I guess, one can just buy more NVIDIA GPUs, and simply solve the problem
using Torch, following their online guide on "_understanding_ Natural
Language with Deep Neural Networks Using Torch" [2].

In general, I hear that many language understanding tasks are finally
conclusively conquered, as shown by OpenAI with a combination of supervised
and unsupervised learning that has led to an ImageNet-class increase in the
state-of-the-art results on language understanding datasets [3].

Why, these days, we can even build an "AI System [that] Understands Music Like
Humans Do"! (NVIDIA, again; [4])

The inhabitants of Scunthorpe and all the Dicks, Pennys and Dikshits of the
world can finally rejoice! Machine learning is on the job!

______________

[1] [https://ai.google/research/teams/language/](https://ai.google/research/teams/language/)

[2] [https://devblogs.nvidia.com/understanding-natural-language-deep-neural-networks-using-torch/](https://devblogs.nvidia.com/understanding-natural-language-deep-neural-networks-using-torch/)

[3] [https://blog.openai.com/language-unsupervised/](https://blog.openai.com/language-unsupervised/)

[4] [https://news.developer.nvidia.com/ai-system-understands-music-like-humans-do/](https://news.developer.nvidia.com/ai-system-understands-music-like-humans-do/)

~~~
bambax
Can't tell if you're being sarcastic or not, but I very much doubt AI will
prevent this problem from happening again in the future.

~~~
betterunix2
Not to be too cliché, but a _sufficiently advanced_ AI would be able to solve
this problem, because it would have the ability to recognize context as humans
might. The more interesting question will be what such an AI would do with
words that native speakers of a language do not agree on, e.g., in English,
words like "crap" or "damn." I think the future problem will be AIs that are
either too conservative or too liberal when it comes to censoring "mild"
vulgarities (and then I will probably say, "a sufficiently advanced AI would
adjust the censorship level to the audience to avoid that problem," etc.).

~~~
Piskvorrr
Yeah, that's not really useful though: "if _magic_ , then problem solved" is
pretty much the definition of handwaving.

~~~
betterunix2
I am not sure it is "magic" so much as "still beyond the state of the art." I
would be surprised if we _never_ get to the point where AI can handle that
kind of nuance, especially given how much progress has been made just in my
lifetime. Maybe it will take longer than some expect, but it is not an
impossibly hard task.

~~~
LocalH
Any sufficiently advanced technology is indistinguishable from magic

~~~
Piskvorrr
Yup. I was certain that the GP was referring to Clarke's Third Law, especially
given the emphasis - thus the near-equivalence.

------
Agentlien
My wife and I noticed this very clearly when trying to use Dutch in the text
chat in Elder Scrolls Online. A lot of words were censored because they were
close to blacklisted words in English. The most frequent one was "kunt",
second person present tense of "can".

------
brianmcc
Classics like that, which "everyone knows about", are still new to many
thousands of engineers every year and definitely worth bringing out for an
airing every now and again!

Thoughts on that topic -> [https://mcconnellsoftware.github.io/supporting-the-ten-thousand/](https://mcconnellsoftware.github.io/supporting-the-ten-thousand/)

~~~
Freak_NL
It might help if every new software developer was at least forced to grok the
_five monkeys in a cage_ parable:

[https://hipmonkey.wordpress.com/2016/02/08/5-monkeys-a-banana-and-a-ladder/](https://hipmonkey.wordpress.com/2016/02/08/5-monkeys-a-banana-and-a-ladder/)

Afterwards gently point them towards the familiar examples of passwords
(“Passwords can't contain spaces/be longer than 20 characters/must …”), email
address validation (“I found this really complete regex that …”), names (“No,
surnames are always at least 3 characters long, and …”), and sex/gender
(“Haha, yeah I know gender isn't a binary option, but this is the _sex_ field,
it just records what's in the user's passport, and THAT is either male or
female!”¹).

1: Nope: [https://www.dutchnews.nl/news/2018/10/gender-x-first-gender-neutral-passport-in-nl/](https://www.dutchnews.nl/news/2018/10/gender-x-first-gender-neutral-passport-in-nl/)

~~~
kwhitefoot
> 1: Nope:

And not everyone has a passport or other standardised ID document.

------
Fej
Tom Scott did an excellent video on this.
[https://youtu.be/CcZdwX4noCE](https://youtu.be/CcZdwX4noCE)

------
s0fa37
I'm from Scunthorpe! Never thought I'd see the day it made it to Hacker
News...

To add something relevant to this post: I'm too young to remember this
"problem" with the AOL filter, and my parents' first ISP was NTL, which is
now Virgin Media. Of course, using the town's name with emphasis on the
aforementioned profanity was common amongst the youth.

~~~
roebk
Hello, fellow person from Scunthorpe! Whereabouts are you now? I assume you
had to move out of the town to stay in the tech sector.

~~~
s0fa37
I moved to Sheffield for university and then stayed for a couple of years
after for work. I've actually ended up in Leicester recently to live with my
girlfriend. My job is quite forgiving in terms of location as I'm on the road
quite often, or can work from home!

------
mcculley
When I see discussions of these filter/namespace problems I am reminded of
when a new coworker caused the administrators to reconsider their account
naming policy because his last name is “Root”.

~~~
canhascodez
I met a lovely Mrs. Null, who had many unfortunate stories concerning
automated forms.

------
petercooper
One of the most popular FIFA (as in the soccer video game) sites has the most
naive filtering, so terms like "assist" and "passing" appear as "...ist" and
"p...ing" which makes the latter look more like "pissing" :-D

------
leipert
That reminds me of the clothing brand "Lonsdale", which contains "NSDA". So if
you wear a jacket over it, you can hide the other letters. It was worn a lot in
the German neo-Nazi scene because Hitler's party was called the "NSDAP".

While it is an accident in the case of "Lonsdale", and the company actively
campaigned against this use in the early 2000s, someone then created a brand
named "Consdaple", which contains the full "NSDAP".

[https://de.wikipedia.org/wiki/Consdaple](https://de.wikipedia.org/wiki/Consdaple)

~~~
Steve44
I remember seeing someone on a night out who had removed the L, the S and the
D. Actually, I'd never noticed that those removed letters have a meaning too,
so it was possibly a much deeper message.

~~~
TeMPOraL
"Lonsdale" - LSD = "on ale". So many interpretations could be made.

------
JdeBP
From the mention in the re-posting at
[https://news.ycombinator.com/item?id=18466787](https://news.ycombinator.com/item?id=18466787)
no doubt. (-:

------
masonic
Overreaction to word fragments has become epidemic in closed captioning of
American TV shows.

I've seen this on more than a dozen networks: captions will be masked with Xs
if any substring matches any possibly offensive word or phrase, e.g.

XXXXtails for cocktails

XXXume for assume

XXXXX for spade (in the unmistakable context of a shovel, e.g. in "The African
Queen")

XXXX for coon (in the unmistakable context of a raccoon, e.g. in "True Grit")

Doo-XXX for Doo-Wop (music)

gimXXXX for gimmick

sXXXpets for snippets

Those are all _actual examples_ from MeTV, H&I, Movies!, GetTV, etc.
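The masking pattern above is consistent with a filter that matches list entries anywhere inside a word. A minimal sketch, assuming a plain substring blocklist (the two entries and both function names are hypothetical; the networks' real lists and code are unknown), contrasts that with whole-word matching:

```python
import re

# Hypothetical two-entry blocklist for illustration only.
BLOCKLIST = ["cock", "ass"]

def mask_substrings(text: str) -> str:
    """Naive filter: replace every occurrence of a listed string with Xs,
    even when it sits inside a longer, innocent word."""
    for word in BLOCKLIST:
        text = re.sub(re.escape(word), "X" * len(word), text, flags=re.IGNORECASE)
    return text

def mask_whole_words(text: str) -> str:
    """Less naive: only mask a listed string when it stands alone as a word
    (\\b marks a boundary between word and non-word characters)."""
    pattern = r"\b(" + "|".join(map(re.escape, BLOCKLIST)) + r")\b"
    return re.sub(pattern, lambda m: "X" * len(m.group(0)), text, flags=re.IGNORECASE)

print(mask_substrings("Cocktails, I assume?"))   # XXXXtails, I XXXume?
print(mask_whole_words("Cocktails, I assume?"))  # Cocktails, I assume?
```

Word boundaries would fix the cocktails and assume cases (and gimmick and snippets, given matching entries), but they cannot help with "spade" or "coon" used literally: those are whole words, and telling the meanings apart needs context.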

------
CM30
One of the best examples of this was with the Pokémon games, where the
creators had tried to block offensive words in nicknames without realising
that some of their own characters had names that triggered the filters by
default.

So for a while, you had a situation where players couldn't trade Cofagrigus,
Froslass or Marshtomp on the GTS (Global Trade System) because the default
species name set off the swear filter.

Hell, in one case, a Pokémon you received in an in-game trade couldn't then be
traded online, since the nickname the NPC had given it triggered the filters
and wasn't usable by players.

It also shows the folly of trying to do this with multiple languages at once,
since the filters block any name that's offensive in any language the games
are released in. So in some cases, you see Japanese players blocked from using
a name because some of its characters match a swear word that's apparently a
thing in France or whatnot.
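The comment doesn't describe how the games actually implement this. A minimal sketch, assuming one substring blocklist per release language (the language codes, the made-up placeholder tokens and `is_blocked` are all hypothetical), shows why checking against the union of every list rejects names that are harmless in the player's own language:

```python
# Hypothetical per-language blocklists. The entries are made-up placeholder
# tokens standing in for real words; no real game's lists are reproduced here.
BLOCKLISTS: dict[str, set[str]] = {
    "en": {"blorf"},
    "fr": {"snerk"},
    "ja": {"quux"},
}

def is_blocked(name: str, languages=None) -> bool:
    """Return True if any listed entry occurs as a substring of the name.

    With languages=None the check uses the union of every list, so a name is
    rejected if it is a problem in any language the game ships in."""
    langs = BLOCKLISTS.keys() if languages is None else languages
    lowered = name.lower()
    return any(entry in lowered for lang in langs for entry in BLOCKLISTS[lang])

print(is_blocked("Snerkling"))                # True: matches the "fr" list
print(is_blocked("Snerkling", ["en", "ja"]))  # False: only en/ja lists checked
```

Checking only the player's own locale avoids the false positives, but a traded Pokémon's nickname is shown to players in other locales too, which may be why a global union gets used despite the collateral damage.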

Makes you feel sorry for the folks at Game Freak having to name these
characters in the first place, or the translators trying to make them work in
other languages. Imagine trying to name 100 new characters per game in a way
that doesn't contain any unintentionally offensive words in any of the 9 or so
languages the games are released in...

------
raesene9
Back in my days working in Infosec at a bank, we had a naive filter on
Internet-bound mail looking for "bad words".

I had to review the blocked mails if there was a query about why it was
blocked.

One time I came across a mail and couldn't for the life of me work out the
reason, so I ended up writing a script to match it against the wordlist to
find the problem.

Turns out the troublesome phrase was "Don't be too hard on yourself".

We updated the block list after that...
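The original script isn't shown. A minimal sketch of such a diagnostic, assuming the filter did case-insensitive substring matching against a plain list of strings, reports every entry that hits along with its offset and surrounding text. The function name `find_matches` and the sample entries (including "hard on") are hypothetical; the comment doesn't say which entry actually tripped:

```python
import re

def find_matches(message: str, wordlist: list[str], context: int = 10) -> list[tuple[str, int, str]]:
    """Report every wordlist entry found in the message as (entry, offset, snippet),
    sorted by position, so a reviewer can see exactly what tripped the filter.

    Assumes lowercasing doesn't change the string's length (true for ASCII),
    so offsets in the lowered copy line up with the original message."""
    lowered = message.lower()
    hits = []
    for entry in wordlist:
        for m in re.finditer(re.escape(entry.lower()), lowered):
            start, end = m.start(), m.end()
            snippet = message[max(0, start - context): end + context]
            hits.append((entry, start, snippet))
    return sorted(hits, key=lambda h: h[1])

for entry, offset, snippet in find_matches("Don't be too hard on yourself", ["hard on", "damn"]):
    print(f"{entry!r} at {offset}: ...{snippet}...")
```

Printing the snippet rather than just the entry matters here: the hit only looks innocent once you see the words around it.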

------
api
When I was at University of Cincinnati they had a "recreational computing"
mailing list. One day some CS people decided to build a cocktail table video
game console using MAME or something like that. This rapidly became the MALE
GENITALIAtail table (yes, all caps) due to some kind of obscenity search and
replace on the list. The name stuck and there was a MALE GENITALIAtail table
in the student union for a while.

------
gaius
Amongst seafaring folks the cunt splice is often called the cut splice now,
despite that making no sense, to work around this problem.

------
masonic
Somewhat related:

[https://en.wikipedia.org/wiki/Controversies_about_the_word_"niggardly"](https://en.wikipedia.org/wiki/Controversies_about_the_word_"niggardly")

------
c3534l
I remember getting an autoban on a Twitch chat for using a word that happened
to start with the letters "paki." That word was "Pakistani." So apparently
just being from Pakistan is the part that's offensive.

~~~
lifthrasiir
Probably because "paki" itself had been a derogatory abbreviation of
"Pakistani"? [1] Yet another example of mindless word filter problems.

[1]
[https://en.wiktionary.org/wiki/Paki#Usage_notes](https://en.wiktionary.org/wiki/Paki#Usage_notes)

------
Theodores
On the Wikipedia page for Scunthorpe (not the problem) the etymology of the
name is given. Seems that we wouldn't be having the Scunthorpe Problem if the
people of Scunthorpe could be bothered to spell their town's name correctly:

The town appears in the Domesday Book (1086) as Escumesthorpe, which is Old
Norse for "Skuma's homestead", a site which is believed to be in the town
centre close to where the present-day Market Hill is located.

If they changed the 'c' for 'k' then they wouldn't get modded down in the AOLs
of this world and nobody outside of Lincolnshire, England would have any
reason to even know of the place.

~~~
pjc50
Next you'll be telling the residents of Westward Ho! to get rid of their
exclamation mark. And what of Six Mile Bottom in Cambridgeshire?

(I think London's Gropecunt lane _did_ get renamed by Victorian Bowdlerisers
though)

~~~
Piskvorrr
If reality doesn't fit the model, the easy solution is to mutilate reality,
obviously. /s

------
Piskvorrr
And of course, comments here are flagged...for profanity (i.e. naming the
Forbidden Towns), one would assume.

------
Digit-Al
buttbuttinate :-D

~~~
jpatokal
Clbuttic!
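Both jokes come from filters that silently rewrite text instead of just blocking it. A minimal sketch of that naive search-and-replace, assuming a single hypothetical replacement pair ("ass" -> "butt") and a hypothetical `naive_replace` helper, reproduces them:

```python
# Hypothetical replacement table; real filters used longer lists.
REPLACEMENTS = {"ass": "butt"}

def naive_replace(text: str) -> str:
    """Replace every occurrence of each listed substring, with no regard
    for word boundaries. str.replace is case-sensitive and scans left to
    right, replacing non-overlapping matches."""
    for bad, good in REPLACEMENTS.items():
        text = text.replace(bad, good)
    return text

print(naive_replace("classic"))      # clbuttic
print(naive_replace("assassinate"))  # buttbuttinate
```

Because the output is rewritten rather than rejected, nobody sees an error; the mangled word simply gets published, which is how the "MALE GENITALIAtail table" name stuck.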

