Imagine being called John Graham-Cumming. Long, long ago Google didn't understand that "Cumming" was a name. I'd Google myself and get served ads for adult web sites.
And Eudora's Mood Watch feature would flag every single email I sent as offensive.
Similar note, my surname is Fahey, which is Irish. For a few years after 9/11 my dad would get pulled aside for screening almost every time he flew alone. It was basically guaranteed if he was flying one-way.
Our guess is that we were getting lumped in with some sort of system that assigned higher risk to people with Arabic-sounding names. Several Arabic surnames I'm aware of begin with Fa[1]. Plus, if you look at it phonetically, both Fahey and the Arabic names ending with i have an ee sound at the end.
I've always wondered if that was a coincidence. A computer might not even have been involved, since it never happened flying from Chicago, where most people recognize Fahey as Irish; it only happened flying to Chicago.
Of course my dad tends to talk to himself when he is thinking, so who knows, maybe he just looked really shifty.
There's an old joke about a girl named Megan E. Cummings, who successfully petitioned for a change to her university e-mail, which had been auto-generated according to the scheme `substr($LASTNAME, 0, 6).$FIRSTINIT.$MIDINIT`.
There's a sugar called "fucose", sometimes abbreviated to "fuc". The "kinase" type of enzymes, which transfer phosphate groups, are abbreviated to "K". You can figure out the rest. http://www.chm.bris.ac.uk/sillymolecules/sillymols.htm
I once worked with a person named Alison Funkhouser. She was a college intern, and her university's username policy was:
first_name[0] + last_name[:6]
The resulting username was afunkho.
(she was also really great to work with, and she knows her stuff when it comes to robotics... if anyone happens to run across her resume anywhere, hire her)
There was also the athlete Kevin Youkilis, who happens to be Jewish. One website that lists pro athletes used this scheme to provide a unique URL for each athlete:
last_name[:5] + first_name[:2]
That's right, his identifier was youkike. When someone pointed this out to the website owner, he changed it... to youklKe. Which isn't much better.
(I don't know Kevin Youkilis personally. I got this off TVTropes.)
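For anyone who wants to try their own name against the two slicing schemes quoted above, here is a minimal Python sketch. The function names and the lowercasing step are my assumptions; neither the university nor the athlete site is known to have worked exactly this way.

```python
def initial_plus_surname(first_name: str, last_name: str) -> str:
    """Hypothetical helper: first_name[0] + last_name[:6], lowercased."""
    return (first_name[0] + last_name[:6]).lower()


def surname_plus_first(first_name: str, last_name: str) -> str:
    """Hypothetical helper: last_name[:5] + first_name[:2], lowercased."""
    return (last_name[:5] + first_name[:2]).lower()


# Reproduces the intern's username from the comment above.
print(initial_plus_surname("Alison", "Funkhouser"))  # afunkho
```

Neither scheme looks at the result as a word, which is the whole problem: the output is only ever checked by the unlucky person who receives it.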
Nearly my first helpdesk call at a new job was for a Wayne Anker. Luckily we used family name + initial for ID and not the other way around.
I had a particular friend in childhood. What on earth Mr and Mrs Head were thinking about when they named their son Richard is beyond me. They already had a daughter called Rachel so should have got the hang of the naming process the second time around.
> I had a particular friend in childhood. What on earth Mr and Mrs Head were thinking about when they named their son Richard is beyond me.
I knew a Richard Bates in college (yes, the same guy from the Silk Road trial). He betrayed one of my best friends in a really deplorable way, so you can guess the nicknames we called him behind his back.
Try keeping up with the journals as a chemist while on holiday behind an overeager webproxy. You are told that the subdiscipline of analytical chemistry is out of bounds.
But that's a feature. The voting public sees that you are trying hard and failing; that's somehow considered better than shaking your head at the intractable problem.
In my former life as a mathematician, I worked on analytic combinatorics. Mathematicians aren't quite as aggressive about abbreviating as chemists, so I never saw the abbreviation "anal. comb." in the wild, but I always expected to.
I've run into this problem myself when parsing recipes for food allergies. "Doughnuts" has the word "nuts" in it but doesn't always contain nuts as an ingredient.
This happened at Medium [0] because they hash paragraphs to a 4-digit hexadecimal string, and ad blockers would hide things like "#ad01", "#ad02", etc.
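To put a rough number on how often a 4-digit hex anchor collides with an "ad"-prefix rule, here is a small sketch. The SHA-1 truncation is purely my assumption (I don't know what hash Medium uses), and `looks_like_ad` is a stand-in for a naive blocker rule, not any real filter list.

```python
import hashlib
from itertools import product

HEX_DIGITS = "0123456789abcdef"


def paragraph_anchor(text: str) -> str:
    """Hypothetical anchor: first four hex digits of a SHA-1 digest."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:4]


def looks_like_ad(anchor: str) -> bool:
    """Stand-in for a blocker rule that hides any id starting with 'ad'."""
    return anchor.startswith("ad")


# Enumerate every possible 4-digit hex anchor and count the unlucky ones.
all_anchors = ["".join(p) for p in product(HEX_DIGITS, repeat=4)]
blocked = [a for a in all_anchors if looks_like_ad(a)]
print(len(blocked), "of", len(all_anchors))  # 256 of 65536
```

So under these assumptions roughly one paragraph in 256 would vanish, which is rare enough to slip through testing and common enough to hit real articles.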
Same here for our Hackerspace a year or so ago - our sponsor logos weren't showing; we had to change the CSS class to some arbitrary meaningless phrase.
Matter of fact, nuts.edu is (or, at least, was - doing a whois on my phone is a pain) registered to a computer club at the NUST (which, incidentally, is the official abbreviation even though the direct translation would be NUTS...)
Presumably it has been white-listed just about anywhere as enough confused, angry or bemused students complained via other channels.
Also, I remember how we initially didn't check the auto-generated user names (part of first and last name) for obscenities or other unfortunates.
One of the highlights I still remember 20+ years later is a poor sod who was pervo@stud for a few hours; luckily we spotted it before he started using the account.
I once built a license code scheme that occasionally generated obscenities in the middle of the code. Once we realized the problem, we had a fun few hours building a blacklist with every obscenity we could think of.
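A blocklist check like the one described above might look roughly like this in Python. Everything here is illustrative: the alphabet, the code length, and the two mild placeholder entries are my assumptions, and a real list would be far longer.

```python
import secrets
import string

# Assumed code alphabet: uppercase letters and digits.
ALPHABET = string.ascii_uppercase + string.digits

# Mild placeholder entries standing in for the real blocklist.
BLOCKLIST = {"DARN", "HECK"}


def contains_blocked(code: str, blocklist=BLOCKLIST) -> bool:
    """Return True if any blocklisted word appears anywhere in the code."""
    upper = code.upper()
    return any(word in upper for word in blocklist)


def generate_code(length: int = 16) -> str:
    """Draw random codes until one passes the blocklist check."""
    while True:
        code = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if not contains_blocked(code):
            return code


print(generate_code())
```

Substring matching is the right call here, unlike for prose, because a license code has no word boundaries to respect. The catch is that the list is only as good as the imagination of whoever wrote it.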
I originally found this list during a few frantic hours after I first created a password cracking homework for my security class. Naturally I'd used the Linux dictionary in /usr/dict/words as the source of the passwords. It only occurred to me much later to check what passwords my script had randomly chosen, and by then the hashes had been distributed to the students. Whoops!
I built a similar system for marketing codes recently, but I wanted to encode a combination of the user ID and some of the details of the mailing so the relevant information could be determined without having to check a database. I just removed letters that could create certain four-letter forms of profanity, and then broke up the code into four-letter blocks using hyphens so I didn't have to worry about obscenities longer than that.
I don't doubt you could still find an obscenity if you were really looking, but I feel comfortable with that level of obscenity prevention. If someone honestly finds a way to be offended by whatever obscenity possibilities are left over, then they could probably find a reason to be offended by almost anything.
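Here is a rough sketch of that kind of scheme, assuming the removed letters are simply the vowels (plus Y and the 0/1 lookalikes). The alphabet, the `pack` layout, and the field sizes are all made up for illustration; the comment above doesn't say which letters were dropped or how the fields were combined.

```python
# Assumed alphabet: digits 2-9 and consonants only, so no ordinary word can form.
SAFE_ALPHABET = "23456789BCDFGHJKLMNPQRSTVWXZ"
BASE = len(SAFE_ALPHABET)


def encode(value: int, width: int = 8) -> str:
    """Encode a non-negative integer, padded to `width`, in hyphenated blocks of four."""
    if value < 0:
        raise ValueError("value must be non-negative")
    digits = []
    while value:
        value, rem = divmod(value, BASE)
        digits.append(SAFE_ALPHABET[rem])
    body = "".join(reversed(digits)).rjust(width, SAFE_ALPHABET[0])
    return "-".join(body[i:i + 4] for i in range(0, len(body), 4))


def decode(code: str) -> int:
    """Invert encode(): strip hyphens and read the symbols as base-28 digits."""
    value = 0
    for ch in code.replace("-", ""):
        value = value * BASE + SAFE_ALPHABET.index(ch)
    return value


def pack(user_id: int, mailing_id: int) -> int:
    """Hypothetical layout: mailing_id occupies the low three decimal digits."""
    if not 0 <= mailing_id < 1000:
        raise ValueError("mailing_id must be in 0..999")
    return user_id * 1000 + mailing_id


def unpack(value: int) -> tuple:
    """Invert pack(): returns (user_id, mailing_id)."""
    return divmod(value, 1000)


code = encode(pack(user_id=4242, mailing_id=17))
print(code, unpack(decode(code)))
```

The round trip through `decode` is what lets the mailing details be read back without a database lookup. The hyphens cap any accidental letter run at four characters.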
This problem could be solved by defining a logical rule (most probably through a regular expression) that would only filter the bad word when present as a single word.
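As a minimal sketch of that whole-word rule in Python, using `\b` word boundaries from the standard `re` module. The banned list holds one mild placeholder, and `censor` is a name I made up for illustration.

```python
import re

# Mild placeholder entry; a real list would be supplied by the site.
BANNED = ["hell"]

# \b anchors the match at word boundaries, so "hell" inside "Hello" or
# "Shell" is left alone.
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(word) for word in BANNED) + r")\b",
    re.IGNORECASE,
)


def censor(text: str) -> str:
    """Replace each banned whole word with asterisks of the same length."""
    return PATTERN.sub(lambda match: "*" * len(match.group()), text)


print(censor("Hello from Shell, what the hell"))
# Hello from Shell, what the ****
```

The trade-off is that plurals, misspellings and words run together get through, so this swaps false positives for false negatives rather than solving the problem outright.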
I'm amazed how rarely this simple system is used. Instead you end up with monstrosities such as the power stars chat that mangles most words into an unreadable mess.
When I worked for a company that made label printers, we had a potential customer who wanted us to print labels with human-readable and barcode fields containing 4 random letters and 4 random digits, but did not want the letters to spell any obscene words. We asked for a list of words to ban but they declined to provide such a list. We did not get the contract.
Note that the problem of words being misunderstood when lacking context is not limited to computers. My father - a chemistry professor - was at a conference a few years ago about Free Radicals when he was approached by a member of the public who wanted to know if he could participate...
Not quite as funny, but I used to work for a company that had the word microwave in its name - we made radar components and such. We once were approached by someone who wanted us to repair their microwave oven. Our building had no obvious sign and didn't look like an appliance repair shop, so I don't know how they found us.
I'm not sure if it's still the case, but it used to not be possible to trade certain Pokémon over the global trade system with their default name due to a filter like this.
I believe Nosepass and Cofagrigus were two of the affected.