
Scunthorpe Sans – a self-censoring font - phoe-krk
https://vole.wtf/scunthorpe-sans/
======
IgorPartola
As a way to show off the capability of ligatures, more power to them. As an
effort to censor, why in the ever loving fuck nuggets would you ever actually
do that?

I have young kids and a large family. We all curse periodically. Instead of
telling the kids that these are bad words, we told them they are adult words.
As in, when you are old enough you can curse all you want, but as a kid you
can’t. We felt that this was a more correct way to put it, and instead of
making a huge deal out of them we instead taught them what is and what isn’t
appropriate. You’d think this would result in them using those potty words all
the time. Nope. I mean mostly no. Recently one of them asked one of the
family members to stop using the word fuck because it makes her uncomfortable.
The other one and I were playing Minecraft and I tried to explain the Nether
to her as hell and she was like “wait is that a word I can say?” which led to
us discussing when it’s ok and not ok to use that word. My girlfriend
explained what “boss bitch” means to the kids recently in terms of dog
mushing.

Some amusing stories: my oldest when she was in daycare made a pact with her
classmates to go into the bathroom and say curse words because the teachers
wouldn’t hear them in there. They even had someone on lookout when they did
this. Words used were “poop” and “doodoo”.

That same kid when she was 4-5 came up to mom and asked “I really want to say
a curse word”. Mom sent her to her room to say it no more than five times and
come back. She did and that was that.

The other kid accidentally busted out with a periodic “what is this bullshit?”
when she was much younger. She does not curse currently.

Words are words and censoring “bad” words is silly. You can’t censor meaning.
“F __k you” is just as powerful a statement as “fuck you”. Especially if it
censors words like “associating” or “Cummings” or “arsenal”. This is just
fodder for the unnecessary censorship memes.

~~~
jawns
Around middle school, I decided that I would stop using swear words, because
it seemed like getting myself in trouble by letting one slip out at an
inappropriate time was more of a bother than just avoiding them and coming up
with more inventive ways to express myself.

Refraining from curse words has been part of my character ever since, and just
about everybody who knows me knows that I don't swear. And you know what? I
have never missed those words. I have some of the most foul-mouthed friends
you can imagine, and the words don't shock or scandalize me. They're just not
a part of my vocabulary.

I have never, so far as I know, suffered a single negative consequence because
I don't curse. But the benefits have gone far beyond never getting detention
at school or grounded by my parents because of the use of salty language. For
instance, it's led me to stretch myself creatively when I really need to
express exasperation or any other strong emotion. And it's led me to notice
that some swear words, far from being innocuous, can be really debasing, and
to have empathy for those on the receiving end.

~~~
hiccuphippo
ESL here. At school one teacher told us using swear words only shows your lack
of knowledge of better words. I took that to heart and the very few times
I've used them, I really had no other option. However I've noticed swear words
come more easily to me in English and don't feel as harsh as when I hear them
in Spanish. I wonder if it's my lack of vocabulary.

~~~
kandaBear
There's actually been interesting research done about that idea, that people
who swear have a smaller vocabulary or are less fluent. It seems that the
opposite is actually true¹, people who swear are actually demonstrating a
broader fluency. Worth a read!

¹
[https://www.sciencedirect.com/science/article/abs/pii/S03880...](https://www.sciencedirect.com/science/article/abs/pii/S038800011400151X)

~~~
dylan604
I've never given credence to the "cursing == less articulate" either. It's
knowing the impact dropping a curse can have to get people's attention. Plus,
sometimes, people are just too dense to get how bothered you are, but drop a
colorful phrase, and they tend to get the point. Like the George Carlin bit,
take a word like "incredible" and it's a nice word. Stick fuck in the middle
of it to make "infuckingcredible" and it's an even better word.
Fanfuckingtastic > fantastic and is much more impactful than "most fantastic".
Having said all of that, I have never told someone FU. That's my line.

~~~
jsjohnst
> Fanfuckingtastic > fantastic and is much more impactful than "most
> fantastic"

I’ve always heard and used that “word” in the sarcastic sense, rather than as
meaning more/most fantastic. Using in context to illustrate:

Friend: I broke the part we need to finish the project.

Me: Fanfuckingtastic, guess I’ll get busy making another one.

~~~
dylan604
well, people have misused grammar for ages <ducks>

Again though, that's the great thing about "swear" words. Shit can be good,
shit can be bad, and shit can just be shit. Not being a grammarian, how many
words other than curse words can be a noun, verb and adjective all at the same
time? Buffalo comes to mind, but is just nonsensical when used.

~~~
jsjohnst
> Not being a grammarian, how many words other than curse words can be a noun,
> verb and adjective all at the same time?

Here’s just a few, there’s a bunch more:

\- back

\- best

\- better

\- bitter

\- broadside

\- clean

\- clear

\- close

\- cod

\- collect

\- counter

\- crisscross

\- damn

\- double

\- down

\- even

\- express

\- fair

\- fast

\- fine

\- firm

\- flush

\- forward

\- free

\- full

\- home

\- jolly

\- last

\- light

\- low

\- o.k.

\- okay

\- out

\- pat

\- plain

\- plumb

\- plump

\- pop

\- prompt

\- quiet

\- right

\- rough

\- round

\- second

\- short

\- solo

\- square

\- steady

\- still

\- tiptoe

\- true

\- upstage

\- well

\- wholesale

\- worst

\- wrong

\- zigzag

As an added bonus, the words above aren’t just valid as nouns, verbs, and
adjectives; they are adverbs too.

Source:
[https://onweb3.wordpress.com/2013/08/14/663/](https://onweb3.wordpress.com/2013/08/14/663/)

------
flurdy
I once worked for an online game aimed at children. With some limited
messaging and a forum, we needed to automatically filter swearing especially
when it started to reach millions of users.

The initial limited, English-only blacklists were useless. Kids are creative
and terrible. So we replaced them with a large failover process across several
external swear-word filtering services. Costly...

But kids are creative... They wrote single-letter messages that circumvented
any automated filtering. Or rearranged their "room" with decorations that spelt
out letters, or simply drew cock and balls, etc. This font might have reduced
the need for filtering. But kids are creative...

Thankfully it was not instant messaging. Direct near-instant messaging came in
later products... Car-crash every time. Never write a messaging app for kids.
There is always a mean-time-to-dickpick or worse.

~~~
jstanley
As soon as you start trying to automatically filter out messages you don't
want, you enter an arms race that you're not going to win.

Surely a better solution is to provide the players with a way to votekick
abusive users, and leave it at that?

~~~
softwarejosh
have you never played video games?

sure lets votekick someone from club penguin, im sure that wouldnt encourage
cyber bullying and make matters worse

~~~
jstanley
I've played a bit of [https://skribbl.io/](https://skribbl.io/) recently and
the votekick mechanism works very well for getting rid of people who just want
to draw dicks and write swear words.

------
IanCal
I love this.

I particularly like that arsenal is censored to a___nal.

~~~
Datenstrom
Call of Duty has censored all input with a regex like this for as long as I
can remember. I can't even name one of my classes "Assault" which isn't even
viewable publicly. You would think it would be easy to add a whitelist for
such a common word.

~~~
tzs
I had occasion to write a chat server for a small gaming service once. The way
I came up with to do the profanity filter ended up working a lot better than I
expected it to, and avoided the problem you describe.

It worked like this.

1\. Make a copy of the text to be filtered, modified as follows.

\-- change upper case letters to lower case

\-- change 0 to o, 1 to i, 3 to e, 5 to s, and vowels with various accent
marks to the corresponding unaccented lower case vowel

\-- discard anything else

For example, if the input was

    
    
      start the assault. let's f-u-c-k them
    

it would become

    
    
      starttheassaultletsfuckthem 
    

2\. Scan this for any substrings that are on the bad word list. In this case
it would find "ass" and "fuck" [1].

3\. If no bad words are found, the original string is returned and we are
done.

4\. For each word in the original string, look it up in the dictionary (I
believe I used /usr/dict/words). Note the ranges of character positions in the
original string of the words that are spelled correctly.

5\. For each bad word found in the filtered string, if all of its characters
came from positions in the original string that were part of correctly spelled
words, leave it alone. Otherwise, asterisk out its characters in the original
string.

In the above example, "ass" would be left alone because all of its characters
came from "assault" which is recognized as a correctly spelled word. None of
"fuck" came from correctly spelled words, so it would get zapped. The final
result would be

    
    
      start the assault. let's *-*-*-* them
    

[1] Well...actually not. I just checked my archives and found the chat server
bad word list. It did not include "ass".
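
The steps above can be sketched roughly as follows. This is only an illustration of the described approach, not the original chat server code: `BAD_WORDS`, `DICTIONARY`, `PUNCT`, `normalize` and `censor` are all hypothetical names, the tiny dictionary stands in for /usr/dict/words, and accent folding is approximated with Unicode decomposition (which folds every accented letter, not only vowels).

```python
import re
import unicodedata

# Hypothetical stand-ins for the real bad-word list and system dictionary.
BAD_WORDS = {"ass", "fuck"}
DICTIONARY = {"start", "the", "assault", "let's", "them"}
LEET = {"0": "o", "1": "i", "3": "e", "5": "s"}
PUNCT = ".,!?;:\"'()"


def normalize(text):
    """Step 1: return the filtered copy plus, for each kept character,
    its index in the original text."""
    chars, positions = [], []
    for i, ch in enumerate(text):
        ch = LEET.get(ch.lower(), ch.lower())
        # Decompose accented letters and keep only the base character.
        base = unicodedata.normalize("NFKD", ch)[0]
        if "a" <= base <= "z":  # discard anything else
            chars.append(base)
            positions.append(i)
    return "".join(chars), positions


def censor(text):
    filtered, positions = normalize(text)

    # Step 4: mark the original positions covered by correctly spelled words.
    safe = set()
    for m in re.finditer(r"\S+", text):
        if m.group().strip(PUNCT).lower() in DICTIONARY:
            safe.update(range(m.start(), m.end()))

    # Steps 2, 3 and 5: find bad substrings; asterisk them out unless every
    # character came from a correctly spelled word.
    out = list(text)
    for bad in BAD_WORDS:
        start = filtered.find(bad)
        while start != -1:
            origin = positions[start:start + len(bad)]
            if not all(p in safe for p in origin):
                for p in origin:
                    out[p] = "*"
            start = filtered.find(bad, start + 1)
    return "".join(out)


print(censor("start the assault. let's f-u-c-k them"))
# prints: start the assault. let's *-*-*-* them
```

Keeping the position map from step 1 is what lets the filter asterisk out only the offending characters while leaving the separators in place.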

~~~
woodrowbarlow
just... be sure you're at least aware of the scunthorpe problem that this font
is named after. do not implement an obscenity filter without being aware of
the inconveniences they pose.

my name is woody (like the tom hanks cowboy, but also, i've been told, like
the outdated euphemism for an erection) and i've been scunthorped out of
creating accounts on dozens of sites because of my name. it's aggravating at
the best of times, but doubly so when i want to be actually recognizable on
the service in question.

[https://en.wikipedia.org/wiki/Scunthorpe_problem](https://en.wikipedia.org/wiki/Scunthorpe_problem)

------
thepete2
There is a video [0] about this censoring-"problem" by Youtuber Tom Scott. He
talks about how it's impossible to implement with all edge-cases, easy to
circumvent and thus usually not a good idea. Fittingly, the video is filmed in
Peniston.

[0]
[https://www.youtube.com/watch?v=CcZdwX4noCE](https://www.youtube.com/watch?v=CcZdwX4noCE)

~~~
dubcanada
If you play any video game that is multiplayer online you will know it's
impossible.

All these large companies Riot, Blizzard, Valve, etc implement censoring
tools. People just swear over voice or add random characters to swear words or
swear in different languages.

It really doesn't matter what they do, people will still swear.

~~~
conradludgate
I think Riot did ok actually, at least when I used to play LoL you can disable
the filter. That means those who do swear can just use their normal vocabulary
and not have to invent ways around the filter, and those that haven't disabled
it get a still decent censored experience.

------
jonesjohnson
[http://www.sansbullshitsans.com/](http://www.sansbullshitsans.com/)

------
some1else
Another example of redaction using OpenType features:

[http://projectseen.com/](http://projectseen.com/)

> "Seen" is a typeface that is concerned with privacy and the interception of
> our communications by the NSA. It automatically strikes through spook words.

~~~
MauranKilom
Somewhat ironic that this site uses a self-signed certificate...

------
EE84M3i
It kind of sucks that in the source code of the page though they have to use
asterisks instead of the literal swears, presumably in case the browser that's
viewing the page doesn't load the font for some reason.

------
lisper
It's a cute idea, but it took me all of 5 seconds to find a way to do an end-
run around Scunthorpe Sans's fuckcked up censorship.

I've never really understood why people think that misspelling "fuck" as
"f*ck" suddenly makes it socially acceptable.

~~~
parenthesis
I'm not expressing an opinion on the practice, but in case a child who doesn't
know the word reads it?

I can remember as a child reading a story in a UK tabloid newspaper that
quoted (Neil Kinnock, I think) with some asterisks in some of the words, and
not fully understanding.

~~~
lisper
I don't think it would take very long for a child to learn that f_ck and f___
and "the F word" are all alternative spellings of "fuck".

Hm, HN elides asterisks in in-line text. Maybe they still work in code blocks?

    
    
      f*ck f***
    

Yep. <irony> Whee, what a fun game. </irony>

------
Doctor_Fegg
Never mind the ligature gag... the source font, Aileron, is lovely. Like a
modern version of Rail Alphabet, the Helvetica-derived British Rail corporate
font.

[http://dotcolon.net/font/aileron](http://dotcolon.net/font/aileron)

[https://en.m.wikipedia.org/wiki/Rail_Alphabet](https://en.m.wikipedia.org/wiki/Rail_Alphabet)

------
unfunco
Lightwater in Surrey is still censored.

~~~
m4r35n357
That one took me a few seconds . . . ;)

------
syntheticnature
I find myself wondering whether a similar technique could be used to create a
font that censors the vowels in God and related words for Jewish readers, due
to their taboos regarding vowels in the name of God.

------
glangdale
Mishit is censored too... I like the number of HN commenters who tried out
their favorite Scunthorpe variations.

As someone who designed an implementation of \b for an NFA-based regex matcher
([https://github.com/intel/hyperscan](https://github.com/intel/hyperscan)) I
always have an appreciation for the Scunthorpe problem; we did see quite a
number of scam/spam/bad-word detection regex patterns that tried to avoid it
with word boundary stuff.
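
As a rough illustration of the word-boundary approach, here is a minimal sketch using Python's standard `re` module (not Hyperscan). The word list and the `censor` helper are made up for the example.

```python
import re

BAD_WORDS = ["arse", "cunt"]  # hypothetical list, for illustration only
alternation = "|".join(map(re.escape, BAD_WORDS))

# Naive pattern: matches the bad words anywhere, even inside longer words.
naive = re.compile(alternation, re.IGNORECASE)
# Bounded pattern: \b requires a word boundary on both sides of the match.
bounded = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def censor(pattern, text):
    """Replace each match with the same number of asterisks."""
    return pattern.sub(lambda m: "*" * len(m.group()), text)


print(censor(naive, "Arsenal play Scunthorpe"))    # ****nal play S****horpe
print(censor(bounded, "Arsenal play Scunthorpe"))  # Arsenal play Scunthorpe
print(censor(bounded, "you arse"))                 # you ****
```

The trade-off is that the bounded pattern also stops matching bad words embedded in compounds or split up with separators, so it swaps false positives for false negatives.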

------
nixass
I'm Richard and cannot even type my own nickname

~~~
aeneasmackenzie
Yeah these efforts always fall flat. Buttbuttination is my favorite example.

~~~
lokedhs
Yes, that's a clbuttic.

[http://thedailywtf.com/articles/The-Clbuttic-Mistake-](http://thedailywtf.com/articles/The-Clbuttic-Mistake-)
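
For anyone who hasn't seen how this happens, a blind substring replacement is enough to reproduce it. A minimal sketch, where the `REPLACEMENTS` table and `naive_clean` are hypothetical names rather than anything taken from the article:

```python
# Hypothetical replacement table of the kind that causes the mistake.
REPLACEMENTS = {"ass": "butt"}


def naive_clean(text):
    """Blindly swap every occurrence, ignoring word boundaries."""
    for bad, polite in REPLACEMENTS.items():
        text = text.replace(bad, polite)
    return text


print(naive_clean("classic"))        # clbuttic
print(naive_clean("assassination"))  # buttbuttination
print(naive_clean("bluegrass"))      # bluegrbutt
```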

~~~
userbinator
That was 12 years ago, and sadly the Google link to searching for the word now
links to countless articles about the phenomenon, instead of actual examples
(I gave up after the 5th page.) Several years ago I could still find them, I
remember there being an entire mailing list that was censored in that way.

I did manage to find one example that was saved by the Web Archive:
[http://web.archive.org/web/20080725103641/http://www.bluegra...](http://web.archive.org/web/20080725103641/http://www.bluegrassworld.com/music/Is-the-song-Dueling-Banjos-considered-blue-grbutt.html)

------
lifeisstillgood
Many many years ago I wrote a sign-up form (an app now I suppose) for a UK
ISP, and we (I cannot remember who wanted the feature) wrote a simple
blacklist of words that users should not be allowed to put in their addresses,
and of course we whitelisted Scunthorpe (as well as several other rude
placenames in the UK post office address file).

Just weird how the same things come round and round.

~~~
fit2rule
I cut my teeth as a junior dev in the 80's writing soundex filters for the
same purpose, which was repurposed to do auto correction of street names after
I got it working on swear words... always amuses me to see so much effort put
into such things even still today (as I struggle with autocorrect while typing
one-handed due to a broken arm...)

~~~
lifeisstillgood
It is, I think, based around the need to understand the context to understand
if it is a swear word, part of "reasonable" conversation or unwanted trolling.

And we are still a long way from AI so a long way from a generic solution - as
such everyone must roll their own for their specific needs and trade offs

------
mathieuh
I used to work on the MOT (roadworthiness test) system for GB, we had a
blacklist and “screw” was in it.

One day we got some feedback saying “what group of virgins wrote this
software” because a tester was trying to type in “screw in tyre” as a fault.

First time I’d encountered the Scunthorpe problem in real life, had a good
laugh about it.

------
zxcvbn4038
This might actually be a good way to make people self-censor. When a word
suddenly blacks out they will wonder what they have triggered and maybe think
twice before sending it, because either they know they’ve crossed a line or
it’s dubious the software will accept their text as written. Just as
showing people their IP addresses has proven a significant deterrent for
attempting credit card fraud, masking out the undesired language might have
the same effect. The beauty of doing it in the font is that it's a mechanism nobody
expects; someone is going to waste a lot of time figuring out how it works. I
don’t see a GitHub project; I’m wondering if the list of words can be expanded
to include non-English languages.

------
xg15
Soo... the ligature trick is relatively old news - but I was somewhat
surprised by the font _not_ censoring "Scunthorpe". (And my suspicion is that
this ability was what they were actually showing off)

Any idea how they did this?

~~~
bntyhntr
No expert in ligatures but from a naive point of view I would assume it's
another ligature that takes priority

~~~
xg15
That would make sense. I wasn't aware overlapping ligatures were permitted. I
wonder how well-defined this is though. I.e. will the ligature for the longest
substring always "win", or is this implementation-defined?

------
gorgoiler
This is shamelessly off topic but what’s the best tool right now for using
open type features (like ligatures) for PDF typesetting from simple ascii
text?

I’m using asciidoc which has thankfully recently shipped hyphenation support,
but I don’t think it supports ligatures. They feel like another must have for
beautiful typesetting.

I’m wary of TeX because I find it so hard to get the stylised results I want
for our org (a school) but should I be less afraid? TeX based tools always
felt content first, which is laudable, but presentation fanciness is a
requirement when one works in pitching — in this case to school children.

------
m4r35n357
Hehe, who remembers Mary Whitehouse these days?

~~~
parenthesis
I do, and the Mary Whitehouse Experience.

~~~
Myrmornis
It’s funny that Pink Floyd were having a go at MW back on Animals — she did a
good job of getting under the skin of several decades of young people
apparently.

~~~
gnufx
That's history today, that is.

------
canofbars
Why do the inspector and copy/paste only show * under the censored areas?

~~~
londons_explore
Maybe so search engines don't blacklist the site for being adult content?

------
donatj
I had this exact idea after they squirt gunned the gun emoji. I am surprised
regimes haven’t pushed something like this to block words like Tiananmen.

------
Endlessly
Lol, like how when it loads you see the hidden text and then it disappears.

A better option might be to process the text, swapping out each letter with,
for example, X, or even better an ID-key, then render the font you made.

If you used the ID-key, you could even encrypt each key, tag them by type
(spoilers, swear words, personal info, etc) - though for obvious reasons that
would require server-side code.

~~~
londons_explore
You only see the hidden text because your browser renders it with a different
font first.

It's the font itself doing the censoring, not some script, which IMO removes a
good chunk of the 'neatness' of this solution.

------
retpirato
[https://www.glyphrstudio.com/online/](https://www.glyphrstudio.com/online/)
is a free online font editor that lets you do this. You can either create a
font from scratch or load an existing one (ttf, otf, or svg). It wouldn't
censor a word like "associating".

------
BearOso
Yoshitsuna. I see this false positive on other sites all the time. It needs to
exclude censored words that are part of others by checking for white space on
the sides. And that won’t work with ligatures at the very beginning of text.

Then you need the censorship to be voluntary, so people aren’t tempted to
circumvent it.

------
martin-adams
This is very clever. You could explore more fun with this technique, such as
putting squiggly lines under common misspellings.

------
arijun
This is very cute! But I wonder why the creator decided to special-case
Scunthorpe and not more common terms like peacock.

~~~
unklefolk
I live not too far from Scunthorpe and heard a story that Scunthorpe county
council rolled out a new spam filter on their own email servers and no one in
the council could send any emails that day as they all got blocked. This was a
few decades ago and I can’t find a source but there are references to the
origin here:

[https://en.m.wikipedia.org/wiki/Scunthorpe_problem](https://en.m.wikipedia.org/wiki/Scunthorpe_problem)

~~~
twic
Different, but Essex University still has the domain sx.ac.uk to be able to
work around daft profanity filters.

------
lihaciudaniel
Too bad the 18th century adaptation of the Spanish negro isn't censored

------
globular-toast
It's strange because the words written on the website are actually "f* * * ",
"c* * * " etc. which are "censored" just like the actual swear words.

------
smitty1e
"I think this is [flowerbed]ing awesome," said a Babylon Bee podcast
aficionado.

They use an excellent voice-over actor, Dave DeAndrea, to bleep out the errant
F-bomb.

~~~
selimthegrim
Wait, edging is edgy now? Or that's just sticking another f-word in there?

~~~
smitty1e
Latter.

------
Mindwipe
Presumably by someone from the UK given some of the references, but "bloody
crap" is okay. Curious.

------
imdsm
The word "parse" breaks it

------
Lio
It’s pretty good. I tried it out with c%^*ge and that’s a proper filthy!

------
racl101
It still allows blowjob, handjob, rimjob and ZJ.

------
mrlonglong
Not going to read this, it's just silly.

------
tdons
Instant clbuttic.

------
OliverJones
Somebody has extra time on their hands!

------
dkdbejwi383
RIP Shitterton

------
jpxw
“Fuⅽk” works (copy and paste in)

------
rasengan
What happens if you copy paste?

~~~
alpaca128
Font ligatures only affect how the text is displayed in that specific font.
The underlying text is always the same.

------
yuriko
Shitake mushrooms

~~~
StavrosK
Isn't that spelled "shiitake"?

~~~
thrwyoilarticle
It's spelled 椎茸. There may also be a collection of Latin letters that are
pronounced similarly.

~~~
StavrosK
> [https://www.stavros.io/posts/fastapi-with-
> django/](https://www.stavros.io/posts/fastapi-with-django/)

Commonly referred to as "transliteration", and the transliteration is spelled
"shiitake".

~~~
thrwyoilarticle
There are at least 3 transliteration systems for Japanese. Writing foreign
languages with 26 Latin letters usually loses some nuance and shows some bias
towards the pronunciation of the authors' native language.

~~~
Delk
It has also become a loan word in English, so there's at least a de facto
convention of how it is spelled with latin characters in English text. It's
not written in kanji when used in English.

~~~
thrwyoilarticle
A loan word with two different spellings!

------
DmitryOlkhovoi
this shit doesn't work for сука блять

------
djsjejjdi324
on the other hand “fûck” is not censored

------
celticninja
seems like bitch is acceptable language

~~~
catblast
Bitch has a common use/definition that is not vulgar.

~~~
saagarjha
Well, I think it’s become less common.

------
atrilumen
_Chimba._

------
buboard
Please put a comments section in the page !

------
shelsoloa
Doesn't censor "Karen"

0/10

------
lerpapoo
e for effor

------
Atlas48
clbuttic mistake ahoy!

------
dkaigorodov
One more font idea: Add words like "_rump" and "_orona" to the block-list

------
LeoNatan25
The author links to a Wikipedia article, which lists many common “censoring”
mistakes, to show they have overcome one of those, but all others I’ve tried
from the article are indeed incorrectly “censored”.

~~~
IanCal
No, they are saying they have explicitly dealt with one and only one special
case, which is also the name of the problem and the name of the font.

------
curuinor
It doesn't seem to censor the n-word. I would think it's pretty important to
censor the n-word

checked, failed to censor: the n-word, and every word in this comic (nsfw for offensive
words) [https://pbfcomics.com/comics/the-offenders/](https://pbfcomics.com/comics/the-offenders/)

~~~
superzadeh
Fag##t is also not censored. It is one of the core problems: Not enough
diverse people involved in those projects, and the outcome is showing that the
only thing "wrong" in calling someone "f###ing fag##t n#####r" is the first
word; the two other ones are "fine" or "depends on context" as others mentioned
in this thread. Good thoughts and intentions, but needs support on execution.

~~~
jpxw
I think you’re taking this a little bit too seriously

~~~
superzadeh
You might have never experienced the intersectionality of being both. In any
case, is it really yours to judge? Anyways, your comment is precisely the
point I'm making: most people working on these projects are not necessarily
aware of the underlying issues that people actually suffer from. They
underplay some, simply because they have not experienced it. Not complaining
though, this is better than not doing anything, but we need to setup better
ways to pull-in more diversity on these projects.

~~~
benchik
Just proves the point that you're taking things a bit too seriously.

