Scunthorpe Problem (wikipedia.org)
71 points by mjs on April 5, 2017 | 59 comments


Imagine being called John Graham-Cumming. Long, long ago Google didn't understand that "Cumming" was a name. Google myself, get served ads for adult web sites.

And Eudora's Mood Watch feature would flag every single email I sent as offensive.


Interestingly enough, there's also a doctor called Sean Cummings at a sexual health clinic that provides online prescriptions for Viagra.

Somehow delivery of order confirmation emails has never been an issue!

Edit: eeek I don't order from there, I just work with the website's host!


It's okay, we won't judge you.

Similar note, my surname is Fahey, which is Irish. For a few years after 9/11 my dad would get pulled aside for screening almost every time he flew alone. It was basically guaranteed if he was flying one-way.

Our guess is that we were getting lumped in with some sort of system that assigned higher risk to people with Arabic-sounding names. Every Arabic surname I'm aware of begins with Fa[1]. Plus, if you look at it phonetically, both Fahey and the Arabic names ending with i have an ee sound at the end.

I've always wondered if that was a coincidence. A computer might not even have been involved, since it never happened flying from Chicago, where most people recognize Fahey as Irish, it only happened flying to Chicago.

Of course my dad tends to talk to himself when he is thinking, so who knows, maybe he just looked really shifty.

[1] Wikipedia list of Arabic surnames as an example: https://en.wikipedia.org/wiki/Category:Arabic-language_surna...


There's an old joke about a girl named Megan E. Cummings, who successfully petitioned for a change to her university e-mail, which had been auto-generated according to the scheme `substr($LASTNAME, 0, 6).$FIRSTINIT.$MIDINIT`.


There's a sugar called "fucose", sometimes abbreviated to "fuc". The "kinase" type of enzymes, which transfer phosphate groups, are abbreviated to "K". You can figure out the rest. http://www.chm.bris.ac.uk/sillymolecules/sillymols.htm


I once worked with a person named Alison Funkhouser. She was a college intern, and her university's username policy was:

    first_name[0] + last_name[:6]

The resulting username was afunkho.

(she was also really great to work with, and she knows her stuff when it comes to robotics... if anyone happens to run across her resume anywhere, hire her)

There was also the athlete Kevin Youkilis, who happens to be Jewish. One website that lists pro athletes used this scheme to provide a unique URL for each athlete:

    last_name[:5] + first_name[:2]

That's right, his identifier was youkike. When someone pointed this out to the website owner, he changed it... to youklKe. Which isn't much better.

(I don't know Kevin Youkilis personally. I got this off TVTropes.)
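For anyone who wants to check their own user list, both truncation schemes above fit in a few lines of Python. This is only a sketch: the function names are made up for illustration, and the lowercasing is my assumption, since neither site's actual code is known.

```python
def intern_username(first_name: str, last_name: str) -> str:
    """First initial followed by the first six letters of the surname."""
    # Lowercasing is an assumption, not part of the quoted scheme.
    return (first_name[0] + last_name[:6]).lower()


def athlete_slug(first_name: str, last_name: str) -> str:
    """First five letters of the surname, then the first two of the first name."""
    # Lowercasing is an assumption, not part of the quoted scheme.
    return (last_name[:5] + first_name[:2]).lower()


print(intern_username("Alison", "Funkhouser"))  # afunkho
```

Running every generated identifier past a human or a word list before publishing it would have caught both cases.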


Nearly my first helpdesk call at a new job was for a Wayne Anker. Luckily we used family name + initial for ID and not the other way around.

I had a particular friend in childhood. What on earth Mr and Mrs Head were thinking about when they named their son Richard is beyond me. They already had a daughter called Rachel so should have got the hang of the naming process the second time around.


> I had a particular friend in childhood. What on earth Mr and Mrs Head were thinking about when they named their son Richard is beyond me.

I knew a Richard Bates in college (yes, the same guy from the Silk Road trial). He betrayed one of my best friends in a really deplorable way, so you can guess the nicknames we called him behind his back.


I worked with a guy called Steve Rascine, and his login name was "RACIST".


Then 'Racine', surely?


I have a friend named Ashley, and his emails always end up in my spam filters, because of AshleyMadison.com, and all their spam.


At least the wife will find that one plausible.


Actually, she was the one that pointed this out to me long ago. It was frustrating and eventually I emailed Google and got them to fix it.


Tom Scott did a video on this:

Why Web Filters Don't Work: Penistone and the Scunthorpe Problem - https://www.youtube.com/watch?v=CcZdwX4noCE

It's well done, like the rest of his content.


On a similar note, I wonder what web filters would make of the URLs for Pen Island stationers shop or the Mole Station creche websites.


Try keeping up with the journals as a chemist while on holiday behind an overeager webproxy. You are told that the subdiscipline of analytical chemistry is out of bounds.

But that's a feature. The voting public sees that you are trying hard and failing, that's somehow considered better than shaking your head at the intractable problem.


In my former life as a mathematician, I worked on analytic combinatorics. Mathematicians aren't quite as aggressive about abbreviating as chemists, so I never saw the abbreviation "anal. comb." in the wild, but I always expected to.


Philosophers have to contend with Aristotle's Posterior Analytics. The title was particularly striking on the old Dynix Automated Library System.

https://en.wikipedia.org/wiki/Dynix_(software)


That's like the British joke about which three football teams have swear words in their names: Arsenal, Scunthorpe, and MANCHESTER FUCKING UNITED. :-D


I've run into this problem myself when parsing recipes for food allergies. Doughnuts has the word nuts in it but doesn't always contain nuts as an ingredient.


Just had to update a conference web page because the sponsor logos had a css class of 'sponsor' which ublock and others were blocking.


This happened at Medium [0] because they hash paragraphs to a 4-digit hexadecimal string, and ad blockers would hide things like "#ad01", "#ad02", etc.

[0]: https://medium.engineering/the-unluckiest-paragraphs-751dd36...
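A rough sketch of how one might dodge this, assuming the ids really are the first four hex digits of a hash. This is not Medium's actual code or fix; `paragraph_id`, `RISKY_PREFIXES` and the retry-with-a-counter idea are all my own assumptions.

```python
import hashlib

# Hypothetical prefixes that cosmetic ad-blocker rules might target.
RISKY_PREFIXES = ("ad",)


def paragraph_id(text: str, max_attempts: int = 16) -> str:
    """Return a 4-hex-digit id, skipping ids that look like ad slots."""
    for attempt in range(max_attempts):
        # Mixing the attempt counter into the input gives a fresh hash per retry.
        digest = hashlib.sha256(f"{attempt}:{text}".encode("utf-8")).hexdigest()
        candidate = digest[:4]
        if not candidate.startswith(RISKY_PREFIXES):
            return candidate
    raise RuntimeError("could not find a safe id")
```

The result stays deterministic for a given paragraph, so ids remain stable across page loads.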


Same here for our Hackspace a year or so ago - our sponsor logos weren't showing; we had to change the CSS class to some arbitrary meaningless phrase.


Back in the late nineties, I attended the Norwegian University of Technology and Science.

Someone in the IT department figured it was an excellent idea to host all student accounts on the stud.ntnu.no subdomain.

We got a few odd bounces.


I was sure you were going for a NUTS reference with that first sentence. Alas, a missed opportunity for stud.nuts.edu. ;)


I had a professor once tell the class that Minix was created with the Free University Compiler Kit.

He realized his mistake a few seconds later when we all started laughing.


Matter of fact, nuts.edu is (or, at least, was - doing a whois on my phone is a pain) registered to a computer club at the NUST (which, incidentally, is the official abbreviation even though the direct translation would be NUTS...)


I'm currently attending, and we're still on username@stud.ntnu.no emails. I haven't had any experience with anything bouncing in my 5 years though.


Presumably it has been white-listed just about anywhere as enough confused, angry or bemused students complained via other channels.

Also, I remember how we initially didn't check the auto-generated user names (part of first and last name) for obscenities or other unfortunates.

One of the highlights I still remember 20+ years later is a poor sod who was pervo@stud for a few hours; luckily we spotted it before he started using the account.


I once built a license code scheme that occasionally generated obscenities in the middle of the code. Once we realized the problem, we had a fun few hours building a blacklist with every obscenity we could think of.
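Something like the following is roughly what that ends up looking like, as a minimal sketch. The alphabet, code length and the two placeholder blocklist entries are assumptions for illustration, not the original scheme.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits
# Placeholder entries; a real list would be much longer.
BLOCKLIST = {"BAD", "UGLY"}


def contains_blocked(code: str, blocklist=BLOCKLIST) -> bool:
    """True if any blocklisted term appears anywhere in the code."""
    return any(term in code for term in blocklist)


def generate_code(length: int = 12, blocklist=BLOCKLIST) -> str:
    """Draw random codes until one contains no blocklisted substring."""
    while True:
        code = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if not contains_blocked(code, blocklist):
            return code
```

Plain substring matching is the right call here, unlike for real text: a random code has no word boundaries, and throwing away a harmless candidate costs nothing.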


This is where I always chime in with a link to the List of Dirty, Nasty, or Otherwise Obscene Words on github.

https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and...

I originally found this list during a few frantic hours after I first created a password cracking homework for my security class. Naturally I'd used the Linux dictionary in /usr/dict/words as the source of the passwords. It only occurred to me much later to check what passwords my script had randomly chosen, and by then the hashes had been distributed to the students. Whoops!


> Added "shitblimp"

I'm already in love.


I built a similar system for marketing codes recently, but I wanted to encode a combination of the user ID and some of the details of the mailing so the relevant information could be determined without having to check a database. I just removed letters that could create certain four-letter forms of profanity, and then broke up the code into four-letter blocks using hyphens so I didn't have to worry about obscenities longer than that.

I don't doubt you could still find an obscenity if you were really looking, but I feel comfortable with that level of obscenity prevention. If someone honestly finds a way to be offended by whatever obscenity possibilities are left over, then they could probably find a reason to be offended by almost anything.
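A sketch of the general idea, under some loud assumptions: I'm guessing that dropping vowels is what "removing letters" amounts to, and the alphabet, the packing of the two ids into one number, and the names `encode` and `decode` are all invented for the example.

```python
# Vowel-free alphabet (assumption); Y and look-alikes such as 0/O and 1/I/L are dropped too.
ALPHABET = "23456789BCDFGHJKMNPQRSTVWXZ"
MAILING_LIMIT = 10_000  # assumed upper bound on mailing ids


def encode(user_id: int, mailing_id: int, width: int = 8) -> str:
    """Pack both ids into one number, write it in the alphabet, hyphenate in fours."""
    if user_id < 0 or not 0 <= mailing_id < MAILING_LIMIT:
        raise ValueError("ids out of range")
    value = user_id * MAILING_LIMIT + mailing_id
    digits = []
    while value:
        value, rem = divmod(value, len(ALPHABET))
        digits.append(ALPHABET[rem])
    # Left-pad with the alphabet's zero digit so short codes keep a fixed width.
    body = "".join(reversed(digits)).rjust(width, ALPHABET[0])
    return "-".join(body[i:i + 4] for i in range(0, len(body), 4))


def decode(code: str) -> tuple[int, int]:
    """Recover (user_id, mailing_id) without a database lookup."""
    value = 0
    for ch in code.replace("-", ""):
        value = value * len(ALPHABET) + ALPHABET.index(ch)
    return divmod(value, MAILING_LIMIT)
```

For example, `decode(encode(12345, 678))` gives back `(12345, 678)`.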


Came across an instance of this recently, I think on the FT's website... It took me a while to figure out what was going on with "smar * * * * ches".


okay, I give up. Is it two words?


Think tech.


twat


When I'm bored, googling for "buttbuttination" provides nearly unlimited entertainment.


TVTropes has a good list of amusing examples as well: http://tvtropes.org/pmwiki/pmwiki.php/Main/ScunthorpeProblem


Obligatory warning: probably the single most addictive website in existence. Don't go there if you have something else to do today.


Many amusing examples in the source page, but this one really stood out.

> It also blocked e-mails sent in Welsh because it did not recognize the language.

With my (very) limited exposure to Welsh, I kinda get that it would give spam filters fits.


This problem could be solved by defining a logical rule (most probably through a regular expression) that would only filter the bad word when present as a single word.
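As a minimal sketch of such a rule in Python (the blocklist entries are mild placeholders and `censor` is just an illustrative name):

```python
import re

# Placeholder blocklist; mild stand-ins for the real thing.
BLOCKED_WORDS = ["ass", "nuts"]

# \b anchors each alternative at word boundaries, so substrings are left alone.
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in BLOCKED_WORDS) + r")\b",
    re.IGNORECASE,
)


def censor(text: str) -> str:
    """Replace blocked words with asterisks only when they stand alone."""
    return PATTERN.sub(lambda m: "*" * len(m.group()), text)


print(censor("An assertion about mass"))  # An assertion about mass
print(censor("what an ass"))              # what an ***
```

The trade-off is that whole-word matching misses plurals, creative spacing and deliberate misspellings, so it swaps false positives for false negatives.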

I'm amazed how rarely this simple system is used. Instead you end up with monstrosities such as the power stars chat that mangles most words into an unreadable mess of ***.

Could be a fun game though. Guess the words!

***ertion

Weight and m***


When I worked for a company that made label printers we had a potential customer who wanted us to print labels with human readable and barcode fields with 4 random letters and 4 random digits but did not want the letters to spell any obscene words. We asked for a list of words to ban but they declined to provide such a list. We did not get the contract.


Clbuttic


Note that the problem of words being misunderstood when lacking context is not limited to computers. My father - a chemistry professor - was at a conference a few years ago about Free Radicals when he was approached by a member of the public who wanted to know if he could participate...


Not quite as funny, but I used to work for a company that had the word microwave in its name - we made radar components and such. We once were approached by someone who wanted us to repair their microwave oven. Our building had no obvious sign and didn't look like an appliance repair shop, so I don't know how they found us.


Maybe he thought it was a Flaming Lips show: https://www.youtube.com/watch?v=qwlC0QWxj88



From their FAQ:

> Q: Can I provide my own wood? A: In most cases we can handle your wood.

I'm finding it hard to decide how intentional this is...?


100% intentional in that case.


expertsexchange.com was (is?) the most (in)famous of these.


I thought it was a dead heat between Pen Island, Powergen Italia, and Therapist Finder?


There was also Mole Station Nursery.


There's a store near me called Kids Exchange with a badly kerned sign. It's difficult to parse as intended.


Does this happen to be in NC? (If so I know where it is)


They're a chain, or at least they're also up here in CT.


I'm not sure if it's still the case, but it used to not be possible to trade certain Pokémon over the global trade system with their default name due to a filter like this.

I believe Nosepass and Cofagrigus were two of the affected.


This is still happening on some subreddits, /r/latestagecapitalism for example.


see wikipedia article "Internet_Watch_Foundation_and_Wikipedia"



