WordSafety – Check a name for unwanted meanings in foreign languages (wordsafety.com)
431 points by randall on Aug 25, 2015 | 276 comments



Hey guys, I made this last week as a two-evening side project. Happy to see it posted here, thanks randall!

I know the word lists aren't complete. This was the best I could do given the time constraint, the fact that I don't actually speak 19 languages... And also, after two evenings of googling dirty words, I started feeling like I'm about to acquire Tourette's in some unknown language ;)

I'll update the database with the words submitted here and through the form on the site. Thanks!

--

Edit -- here's Google Analytics for this site after 1 hour on the HN frontpage:

http://wordsafety.com/img/analytics_2015-08-25_1736.jpg

This is a site that had essentially zero traffic before HN, so I figured this would be a potentially interesting glimpse into HN's audience.


I'm so proud of you <3 http://i.imgur.com/bMQK9vU.png


Indeed, your reply was the reason I made this, so thanks :)


I tested my own name "Lucas" and it matched "ås". The site said that "ås" means "donkey" in Swedish. But as a native Swede, I can tell you that is incorrect. The real meaning of the word "ås" is "esker", the ridge thing. The Swedish word for "donkey" is "åsna", so it's probably just a typo. :)

PS: If you need some help with the Swedish dictionary, I could possibly help you or get some friends to do it. :)


> I know the word lists aren't complete. This was the best I could do given the time constraint, the fact that I don't actually speak 19 languages... And also, after two evenings of googling dirty words, I started feeling like I'm about to acquire Tourette's in some unknown language ;)

Despite all that, I'm impressed by the level of completion for Bengali and Hindi. I tried a few variations of several common Bengali curses and all came up. Thanks for making this, and also for not limiting it to Romance and Germanic languages.

What sources did you use to compile the list of words?


For each language I used several random sources on the web and sort of cross-checked them for multiple spellings, etc.

This Hindi site was definitely my favorite:

http://www.hindilearner.com/hindi_bad_words.html

After all the insane insults that I'm too shy to repeat, there's an awesome deadpan note at the end: "Learn to avoid these Hindi Bad Words in your Hindi conversation." Thanks for teaching them first?!


These are brilliant:

Gaand main lassan: Garlic in ass

Is this an insult or an old medical remedy?

Lundoos: Born into this world from a dick

You really have a word for this?

Toto: Penis

There must've been some giggles in the '80s music scene.

Teri Jhanten Kaat kar tere mooh par laga kar unki french beard bana doonga: I will cut your pubic hair and stick them on your face and make a goatee on your face.

Ohkaay


Those are some wonderfully imaginative insults. I did like that near the end, after endless filth, they've put "Why are you boring me with this useless narrative?"


You may want to look at slang as well. I tried both "knob" and "bell end", and it said both were safe. They may be safe-ish in the US, but in Britain they are definitely slang words that could get quite a chuckle if you named your app that.


Or "root" in Australia.


Quick, somebody point me in the direction of some witty Australian UNIX jokes.


What did the sysadmin buy her wife for Valentine's Day? A rootkit.

(Just made that one up... Not genuine Australian. Also, I can't tell jokes.)


Well, your jokes certainly fit.


There's an area of San Francisco that styles itself as 'the tender knob'. Oh the mirth.


Can't edit my original comment anymore, so here's analytics update #2, with OS statistics as requested:

http://wordsafety.com/img/analytics_2015-08-25_2207_os.jpg

And browser:

http://wordsafety.com/img/analytics_2015-08-25_2207_browser....


I searched several Spanish words and nothing was found. I guess there's no support for Spanish yet?


One detail to consider about Spanish is that each country has its own local variation, so some words that are totally safe in some countries have a totally different, unwanted meaning in others. For example, "coger" means "pick" in Spain and "fuck" in Argentina.

(If you say it in Argentina with a Spanish accent, you will not get into trouble, but people will give you a subtle weird look and someone will explain the local meaning later.)


It's definitely very important to realize that Spanish-speaking countries, while nominally speaking the same language, are very different. Spain has an entire verb tense that is not in Mexican Spanish. An unconfirmed example I have heard is pico de gallo. In Mexican Spanish it means something not too dissimilar from salsa, yet in some parts of South America, pico means penis. I heard there was an issue with a Nintendo DS cooking game for children that assumed they were all the same and included a recipe for pico de gallo!

Edit: grammar/spelling


Every Spanish speaker can understand and speak Neutral Spanish: the variety taught in schools and universities and spoken in dubbed movies. In the same way, every English speaker can understand and speak Hollywood English.

In a way, the dialects of Spanish are even more regular than the dialects of English. For instance, the spelling is the same for everybody (regulated by the Real Academia Española) and almost everything is spelled phonetically, so the changes in pronunciation are quite regular. If you understand written Spanish, you can master any pronunciation just by learning a few regular rules.

There are differences in tenses and plenty of slang, but this can be sidestepped by speaking formally (where there is no variation in tenses and little difference in vocabulary).


>Spain has an entire verb tense that is not in Mexican Spanish.

Are you sure you mean tense? The verb has an additional form in each tense because of vosotros, but there are no additional tenses so far as I know.


It would probably be most accurate to say that it's an extra person.

Verbs can be inflected in various languages for gender, number, person [which can include degrees of formality, respect, or social distance], voice, mood, tense, aspect, ergativity [an alternative to voice], evidentiality [how the speaker knows that the thing happened], and other things I'm probably forgetting.


Yes, I know. As far as I know, Mexican Spanish has all the same tenses as European Spanish but lacks a second person familiar plural, and hence the associated form of the verb in each tense.


The first person informal plural (vos, vosotros) form of verbs isn't used outside Spain. Or maybe it's just not used in Latin America. Source: high school Spanish.


There are parts of South America where vosotros is used, but I think what the parent was getting at is that it is not technically considered a 'tense', but rather a 'person', as in e.g. 'third person'.


In Colombia, the use of 'vos' means the person is from the region where sugarcane is cultivated, or the cities around.

That means Cali, Buenaventura and some others.


Yes, that's why I said "because of vosotros" in my post.


I thought it was used in Argentina at least?


It's not just Spain either... Chile uses 'vosotros' as a sort-of-weird version of 'tu' (with some dropped sounds) e.g. 'como estas?' becomes 'como estai?' and sounding like 'como etai?'

Also: tire / tire shop: llanta / llantería (Mexico), neumático / (tienda de) vulcanización (Chile). Corn: maíz (Mexico), choclo (Chile). Car: coche (Mexico), auto (Chile). Etc., etc., etc.

(I'm not Latin American, I just like traveling there, so don't take the above as gospel :)


How is this different from English? Another poster mentioned "knob" being offensive in the UK but it's an everyday word here in Seattle. (It means "door handle".)

How is the "coger" example different?


The joys of slang. You could just as easily use the sentence "I need to fit a new doorknob" or "Twiddle that knob there to turn the volume up" in the UK and not get any funny looks. It just so happens that if you said "Can I twiddle your knobs" to a group of sound engineers they might take it the wrong way.


(Vulgarity alert for Spanish speakers!)

Indeed. "Coger" is not found, but more universal vulgarities such as "puta", "verga", "mierda" are found.


¿Donde coges el autobús? - ranges from practical to vulgar and confusing.


I once witnessed a Dominican telling a very recent Mexican immigrant to "cogelo suave". The look did indeed range from shocked to confused.


¿Donde coges la buseta?

Perfectly normal here, even more confusing in other places.


How do you say "pick" in Argentina? I've never learned a synonym for it.


"Recoge" and "escoge" should work (first one for grabbing things, second for choosing things).


Agarrar, tomar, levantar


I added the infamous case of the Mitsubishi PAJERO :P


And Nissan Moco!


what about the Chevy NOVA?


It's a neat idea; however, I think this sort of effort could be more effective if open-sourced. In fact, it's probably better that you open-source it before someone else comes along and does it. To clarify, I am not talking about the UX, only the lists themselves.

If you do go this route, my 2 cents are to keep it JSON-formatted and maybe add a severity flag to each word, since words like "git" aren't so bad if your product is targeting non-English audiences, while words like "fuck" are really bad irrespective of the audience you are targeting. Taken in conjunction with the population size of the language, this could generate a good score for word safety.
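A minimal sketch of that suggestion, assuming a hypothetical JSON schema (`word`, `language`, `meaning`, `severity`) and a naive score of severity times speaker count. Neither the schema nor the formula is what the site actually uses; they only illustrate how a severity flag and language population could be combined.

```python
import json
from dataclasses import dataclass

# Hypothetical word list in the proposed JSON shape (assumed schema).
WORDLIST_JSON = """
[
  {"word": "git", "language": "English", "meaning": "unpleasant person (British slang)", "severity": 1},
  {"word": "fuck", "language": "English", "meaning": "vulgar", "severity": 3}
]
"""

@dataclass
class Entry:
    word: str
    language: str
    meaning: str
    severity: int  # assumed scale: 1 = mild ... 3 = severe

def load_entries(text: str) -> list[Entry]:
    """Parse the JSON word list into Entry objects."""
    return [Entry(**item) for item in json.loads(text)]

def risk_score(name: str, entries: list[Entry], speakers: dict[str, float]) -> float:
    """Sum severity x speaker count for every listed word found inside the name.

    `speakers` maps language -> number of speakers in any consistent unit;
    languages missing from the map contribute nothing.
    """
    lowered = name.lower()
    return sum(
        e.severity * speakers.get(e.language, 0)
        for e in entries
        if e.word.lower() in lowered
    )
```

For example, `risk_score("GitHub", load_entries(WORDLIST_JSON), {"English": 335})` adds 1 x 335 for the embedded "git", using the 335 million native English speakers the site quotes elsewhere in this thread. A real scoring scheme would need to decide whether severity should scale linearly, which this sketch simply assumes.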


I will be happy to help with Arabic words.


Don't hesitate to submit on the site, it's very welcome! (I'm going to vet them manually when I have a moment.)


I've seen some words with wrong meanings, like "cipa", which means "Polish: penis" according to your site, but it actually means "vagina" (quite the opposite ;) ). Is it OK if I just submit better meanings?


Hi thanks! This is really great! It would make me more careful to choose a name, and yes, naming is hard :))

Btw, no Southeast Asian languages yet?

https://www.techinasia.com/dell-peju-leaks/


That's very cool. Interesting to see. I just thought that there would be a lot more returning visits. I know that I've used the site more than once, and now have it bookmarked in my "tools" folder. ;)


I added two common examples from the infosec and node communities: 'nonce' (which means pedophile in proper English) and 'gyp' (which is a pejorative for gypsy).


Bimbo (the bread) doesn't flag the bad English meaning.


I'd be interested to see the browser and OS stats.


Can you describe how you built your site, what technologies you are using, etc.? Thanks!


Dutch: http://www.taalkabaal.nl/scheldwoorden/indexa.php

Have fun. Some of these are spelled incorrectly, so run them through a spell-checker.

While I'm at it, let me translate some personal favourites. I realize they are quite long and unlikely candidates for the next hot SF start-up, but why keep knowledge away from the masses:

adderengebroedsels - offspring of vipers

argeologisch kontfossiel - archaeological ass fossil

bosuil - Strix aluco

duinbewoner - dune dweller

ebverzuiper - person who drowns during ebb (burnnn)

greppelheks - ditch witch

ingeblikte atlas - canned atlas (??)

janksnor - literally "crying mustache"

kamelenzuurvleesoog - literally "camels-Sauerbraten-eye" (???)

mountendeldarmbeklimmer -- literally "climber of Mt. rectum"

paashaasschaamhaarverzamelaar -- literally "collector of Easter Bunny pubes"

rioolpinguin -- sewer penguin

tepelbaviaan -- nipple baboon

I have submitted several short and useful Dutch words to grant myself license for this comment.


I want to start a razor startup now just so I can call it Janksnor.


Dollar Shave Club... but for Scandinavia.

Or a brutally fast Handlebars implementation.


Wow.

American-English swear words pale in comparison.


We're also one of the rare languages that swears not only with sexual organs, excrement and bodily fluids, but also like to throw diseases into the mix. Actually surprised me when I learned this is uncommon in most other languages, it seems so natural :-)

In particular "older" diseases that are less common in modern days, like tuberculosis (tering-), cholera (klere/kolere), typhoid (tyfus), plague (pest-) and pox (pokke-) are popular as general insults, adjective or interjection.

"Cancer" (kanker) is also used a lot, but it is almost universally considered to be in bad taste, whether among high school kids or in "polite company", because nearly everybody knows someone with cancer or has lost someone to it.

More modern diseases (Ebola/H5N1/Swineflu/SARS/etc) are being used as well, but mainly for comedic effect.


> bosuil - Strix aluco

Huh?


> kontfossiel

LMAO :')


"He had a computer that knew all the names of all the companies, and another one that checked if the made-up word meant "dickhead" or something in Chinese or Swedish."

-- William Gibson, "Mona Lisa Overdrive"

https://books.google.se/books?id=QojrNYyGMyEC&pg=PA118#v=one...


"bite" is safe according to this tool. Any French speaker does a double take when they see that word, though, especially in certain sentences.

For reference: https://askafrenchguy.wordpress.com/2012/02/07/petite-bites-...

Edit: Yes, I have submitted it to the database, it should get vetted eventually. A crowdsourced process for the vetting could be fun too!


And 7-Eleven in Sweden managed to make it even dirtier by using the headline "Bite sale!", which apparently means "Dirty dick!" in French. As if that wasn't bad enough, the ad was actually meant to sell sausages. To kids.

Reference: http://z.cdn-expressen.se/images/a3/9b/a39b426464844e3ba6ef3...


"Oh thank heaven". This is so hilariously unfortunate. The funny part is that I really don't notice "bite" in this way if it's surrounded by English words. The tumblr blog listed downthread does nothing for me for example.

But if the surrounding words exist in French too, my brain invariably gets tricked into switching to French.


The mature side of me sees a great tool that will help to avoid unintended public image issues.

The immature side of me sees a great tool for picking immature names for online games that won't be flagged.


This was the first word I tried too. As a Frenchman in the US, I can't help but laugh every time I see "bite" used on a product. It's the most common slang for "dick" in French; everyone knows it. Juvenile, yes, but still hilarious. Examples: http://bitesubite.tumblr.com/archive



"zizi" too.


"Kekette" as well


In Dutch pipi is dialect for urination, often used when talking/referring to children.


It's "peepee" in English, but you are likely to get a chuckle if you name your product "pipi". Back in 2006 Nintendo ran into this problem with the Wii ("wee" or "weewee" is also slang for urination, especially among children).


I always thought this was deliberate. You are telling me no one in Nintendo knew what Wii would mean to kids?


What it means to foreign kids. It's not surprising that a Japanese company wouldn't have that fine a grasp on English nuance.


I thought they chose the name to fit their inclusive, inviting target with the console. "Wii Play Sports" -- yes we do!


In Spanish it's exactly the same.

I wonder where the roots of that are?


In Polish it's "psipsi", which is clearly an onomatopoeia.


Ditto in English, for that matter, although for various reasons we spell "i" with two "e"s.


In German as well, though I guess that's not very surprising.


"Verge" is similarly hilarious for French speakers.


I remember a native French speaker was kind of disturbed by the sentence "où est ma chatte?", which is the sample French sentence Alice thinks of in Alice in Wonderland when she speculates that mice might speak French or Latin instead of English. (In the original story, it was disturbing to the mouse, too, though for a different reason!)

(Edit: this site's database recognizes "chatte" as a concern in French.)


I imagine "where is my pussy" would get a chuckle out of most anglophones today, too...


much less "bitenuker" which is highly offensive to those who speak either French or Dutch.


I'm Dutch and I know my offensive language. Never heard of that.


A Franco-Dutchman would pronounce it bete-neuker:

https://www.youtube.com/watch?v=Cqszqbd5cLg


It's a joke from 30 Rock.


It's not searching for 'sounds like', it doesn't think 'fuc' or 'fuch' are bad words.


So this seems to work for a very small subset of the words I typed. Also, it seems to only check against dictionary meaning and not cultural usage.

"Tatsu" means "to stand" in Japanese, but is culturally used for erection. This is just an example, I tried a bunch which I know and none were flagged.


"Tatsu" is the verb stem and would not be offensive alone; in fact, like most Japanese words, it has plenty of homonyms including 竜, "dragon".

https://en.wiktionary.org/wiki/%E3%81%9F%E3%81%A4#Japanese

Flagging that would be kind of like flagging "hard" in English. Could it potentially be offensive in the wrong context? Yes. Are there brands where that would be absolutely fine? Yes.


I think this is a case of 'the product is YOU!'

notice the little box 'enter word here'? :P


That box doesn't seem to permit adding cultural referents / context. If we added all normal words that can have another meaning, it'd flag almost everything.


I saw this and immediately thought of an old story where General Motors tried to sell the Chevy Nova in South America. It hardly sold at all in South America even though it was a hugely popular car in North America. The reason turned out to be that "No va" in Spanish translates to "won't go" so GM was basically trying to sell a sporty car with a misleading name.

Unfortunately this website wouldn't have helped GM sell the Nova since it's only looking for profanity, but I think that the concept is great and clearly needed. I hope you develop it further and get to make some money off of it. Great job!


For what it's worth, the "Nova" story is a myth: http://www.snopes.com/business/misxlate/nova.asp


Here is a whole list of cultural misunderstandings of car manufacturers http://www.oddee.com/item_93544.aspx


That was the first thing I attempted to search for and the output from the site is:

Spanish: did you know that the popular anecdote about the Chevrolet Nova is an urban legend? Google for "snopes nova".

It's a nice detail.


But I guess it sold like hot cakes in Brazil, didn't it?


One of the most recent corporate mishaps I've learned of is Microsoft Nokia calling their phone "Lumia", when in Spain "lumi" or "lumia" is an informal word to mean "prostitute".

Your app doesn't reflect that. I was going to say that you need to source slang dictionaries, but this one is in the Diccionario de la Real Academia:

http://buscon.rae.es/drae/srv/search?val=lumia

And even in the first random online bilingual dictionary Google threw up:

http://diccionario.sensagent.com/lumi/es-en/

So maybe do a bit of scraping/spidering of multilingual dictionaries, starting with your collected list of bad words?


I think Wiktionary is a great and underutilized resource. Fairly good coverage, free, and easily amendable. There is in fact an entry for lumia: https://en.wiktionary.org/wiki/lumia#Spanish

It's actually the basis for a site called The Alternative Dictionaries which features "colorful extracts": http://www.alternative-dictionaries.net/


I'd be really interested in knowing how they do the phonetic matching. Take, for example, the nonexistent English word "bocket", which sounds like Brazilian slang for blowjob ("boquete"), but only when spoken the way a Brazilian would say it.

I think this cross-pronouncing thing would actually be harder to tackle: It's more important to try to match the way users on their home locale would say the foreign term, than the way the foreign people would say it.

To illustrate what I mean, consider the word Skype, said in Portuguese, is pronounced as if it were spelt in English as "Shkuipy" (I mean [ʃkaj'pi]).


> I'd be really interested in knowing how do they do the phonetic matching.

Honestly, the code for that sucks. It just looks for specific variations of letter combinations.

I guess a more robust approach would try to build up a real phonetic representation of the word, then apply various languages' orthographic rules to that to check for matches.
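The site's actual letter-combination rules aren't published, so the sketch below assumes a small, hypothetical table of interchangeable spellings and reduces both the name and each listed word to one canonical form before substring matching. It is crude (for instance, it treats every "c" as hard), but it would catch pairs raised in this thread such as "baca"/"baka" and "fuc"/"fuck".

```python
# Hypothetical rewrite rules, applied in order; "ck" must be handled before "c".
RULES = [("ph", "f"), ("ck", "k"), ("c", "k"), ("q", "k"), ("y", "i")]

def canonical(s: str) -> str:
    """Collapse a few interchangeable spellings into one canonical form."""
    s = s.lower()
    for old, new in RULES:
        s = s.replace(old, new)
    return s

def variant_matches(name: str, bad_words: list[str]) -> list[str]:
    """Return the listed words whose canonical form occurs in the name's canonical form."""
    key = canonical(name)
    return [w for w in bad_words if canonical(w) in key]
```

A table like this grows one rule at a time, which is why a real phonetic representation, as the comment above suggests, scales better across languages.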


I'm no expert on phonetic matching, but a product I worked on many years ago used Soundex. (It's meant for English pronunciation, so you'd have to research other languages)

https://en.wikipedia.org/wiki/Soundex
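For reference, a sketch of classic American Soundex: keep the first letter, map consonants to digits, skip vowels, collapse adjacent identical codes (with "h" and "w" not separating them), and pad or truncate to four characters. As noted above, it is tuned for English names, so its usefulness for other languages is an open question.

```python
# Standard Soundex digit groups; vowels, y, h and w have no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(word: str) -> str:
    """Return the four-character American Soundex key for a word."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    prev = CODES.get(first, "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != prev:
            result += code
        if c not in "hw":
            # Vowels reset `prev` to "", so a repeated code after a vowel is kept;
            # h and w leave `prev` unchanged, so duplicates around them collapse.
            prev = code
    return (result + "000")[:4]
```

For example, `soundex("Baca")` and `soundex("Baka")` both give `"B200"`, so a Soundex key would have linked the two spellings mentioned later in the thread.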


Soundex is optimized for collapsing the spelling of names into a common key and isn't so hot for general words. Metaphone would be a more useful matching algorithm. It also preserves a legible spelling so that you can pass the result onto further fuzzy matching stages like an edit distance measure.

https://en.wikipedia.org/wiki/Metaphone
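Metaphone itself has too many rules to reproduce here. The sketch below shows only the later edit-distance stage the comment mentions, using plain Levenshtein distance. It assumes the keys being compared have already been normalized (by Metaphone or something simpler), and `max_distance=1` is an arbitrary, hypothetical threshold.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]

def fuzzy_hits(key: str, bad_keys: list[str], max_distance: int = 1) -> list[str]:
    """Return the listed keys within `max_distance` edits of the candidate key."""
    return [w for w in bad_keys if levenshtein(key, w) <= max_distance]
```

Applied to raw spellings, `fuzzy_hits("fuch", ["fuck", "hello"])` returns `["fuck"]`, which is the kind of near-miss reported elsewhere in the thread. Short words need a tighter threshold, though, or nearly everything matches.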


It's the same with the MR2: in French it sounds like "shit", so Toyota changed the name of the car to MR in France and Belgium.


Emm Are Two == Merde?


Nah, it actually sounds more like 'emmerdeur'

https://en.wiktionary.org/wiki/emmerder

A bit of trivia: there's a radio station called NRJ, which sounds like 'énergie' when you say it (in French).


It actually sounds more like "eh merde !", which would be "oh shit!".


Nobody, to my mind, has got this quite right yet, so I'll give it a bash:

MR2 = Ehm Air Duh ~~ Merde :) cuz the middle e is like the ai in air (near enough) and the last e is like the uh in duh (near enough) - Not a native speaker so caveat enunciator


It sounds like "Eh merdeux", which in French means "hey filthy" in a bad way, "un merdeux" is someone who's covered in shit.


My French isn't great, but it's "emm er deux", which sounds similar to "a merde".


Emm Err Deux


Another interesting example for Soundex matching would be Coca-Cola's failed energy drink "Full Throttle", which resembles the German "Volltrottel" (complete dumbass).


The lack of phonetic matching is a problem - the most (in?)famous example of this was the Chevy Nova, which phonetically sounds like 'doesn't go' in spanish.


Regarding the Chevy Nova, I always found that story very hard to believe. While it may seem plausible for someone who does not speak Spanish, someone who does would quickly note the fact that "no va" is pronounced /no'βa/ (stress on the second syllable) while Nova would be pronounced /'no.βa/ (stress on the first syllable).

These two sounds would not be perceived by a Spanish speaker as being the same.


A Spanish speaker would certainly notice that the two are similar and if they had the right sense of humor they'd intentionally mispronounce it.

EDIT: Though I'm apparently wrong and this is a myth: http://www.snopes.com/business/misxlate/nova.asp


Yes, I've heard that saying "Nova" is read as no va is like saying "notable" is read as "no table".

(of course, the classic Spanish screw-up is the Mitsubishi Pajero, which has no excuse whatsoever: that's unambiguously the Mitsubishi Wanker)


I always thought that someone in Mitsubishi named the Pajero knowing exactly what it meant in Spanish as a joke.


It doesn't do phonetic matching. A big flaw.

Baca returns nothing, Baka returns 'idiot/er in Japanese'.


Obligatory Arabic story of the last few years: the Pakistani ambassador who was rejected for service in the Gulf, because his name is Akbar Zib.

This is Arabic for biggest cock.

https://encrypted.google.com/search?hl=en&q=%D8%B3%D9%81%D9%...

No reputable source mentions this; only crap newspapers in different countries mention this. But every person I know in the Arab world knows this internet meme.

Almost as bad as naming your son after a Libyan dictator (now deceased, but alive at the time, of course). Perhaps I know one of those too. Talk about parenting jokes.


A former Finnish ambassador to Cairo had the first names "Aapo Esa". As classical/written Arabic has neither "p" nor "e", this turned into "Abu Isa", meaning "Father of Jesus"!

This wasn't actually quite as hilarious in Arabic as you'd think, since "Isa" is a fairly common first name and "Abu X" is a standard way to refer to a man with a son, but definitely good for a few chuckles back home...


There's a guy on Israeli TV named גיא פינס--pronounced "Guy Penis." How's that for a name?


The classic example of this is the Ford Pinto. That was a naming mistake. (Try it)


"expertsexchange" only matches the Chinese “cha”, and not "sex" or "sexchange".


Coincidentally cha1 ("1" here means the first tone in pinyin) is a sexual innuendo alluding to sexual intercourse. The character itself, which means "to insert", has nothing to do with sex in normal usage though. And most importantly, there is virtually zero resemblance between the pronunciation of "change" and that of "cha1".

> to have sex (lit. insert).

> Source: a long list of Chinese profanity on Wikipedia.[0]

[0] https://en.wikipedia.org/wiki/Mandarin_Chinese_profanity


That reminds me of the name "KIDSEXCHANGE". [Picture of the store](http://www.webdevelopersnotes.com/blog/blog-images/kidsexcha...).


Yeah, it seems like it either doesn't match english words (probably because it assumes you already speak English since it's an English-language web site), or it is only looking for specific types of matches (swear words, etc.). I tried merder.com, since I know that "merde" is french for "shit" and it sounds like "murder.com". It caught "merde", but didn't catch "murder". I then tried "murder.com" and it was like, "Yep! Looks good! Go for it!" (I may be paraphrasing.)


Tried the Spanish word "pajero" (usually translated to "wanker" or "tosser"). Mitsubishi named a car "Pajero" and they had to change the name to "Montero" in Spanish speaking countries.

Another unfortunate car name is the Suzuki Moco. This word doesn't appear in this app either. "Moco" means "snot" or "booger" in Spanish.


I have found it's important when emailing/texting in Spanish to put the tilde in 'año', when I normally wouldn't bother with an American keyboard.


US-based native Spanish speaker here. Very important indeed, especially when asking someone their age :) Whenever the ñ is not available (foreign keyboards, mobile, etc.), a decent substitute is "anyo", which is phonetically equivalent and also happens to be the Ladino variant of the word.


Agno is safer (same idea as in 'gnu'). "Anyo" is phonetically close to "anillo", which is a ring, but also a common source of jokes about little Frodo's sexuality.
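The same trap applies to a name checker that normalizes input. Naive accent stripping via Unicode NFD decomposition turns "año" (year) into "ano" (anus), exactly the substitution discussed above. A minimal sketch, assuming a hypothetical transliteration table that maps "ñ" to the digraph "ny" (the "anyo" suggestion; "gn" would work the same way) before any remaining accents are dropped:

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Drop combining marks: NFD turns "ñ" into "n" + U+0303, so "año" -> "ano"."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")

# Hypothetical table: spell ñ out as a digraph instead of letting it collapse to n.
TRANSLIT = str.maketrans({"ñ": "ny", "Ñ": "Ny"})

def fold_for_matching(text: str) -> str:
    """Compose first (so a decomposed ñ is caught), transliterate, then strip accents."""
    composed = unicodedata.normalize("NFC", text)
    return strip_accents(composed.translate(TRANSLIT))
```

With this ordering, `fold_for_matching("año")` gives `"anyo"` rather than `"ano"`, while other accents such as "é" are still folded away.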


I think Catalan people don't have the "ñ" letter in their alphabet and they use instead this combination "ny" for that particular sound.


Well, Catalan people have the ñ letter in their alphabet because they are Spaniards; it's just that some of them prefer to ignore this for political reasons and choose instead to make simple things complicated.

Both systems are a question of convenience, so neither is perfect. You can choose between the useful (and trendy in two or three Spanish communities) "ny" or the older "gn", charged with historical context and showing lots of connections with other Latin languages like French. Be aware also that "ny" easily leads to an "nll" sound that can be annoying, especially when it is placed next to an 'i'.

This is just a personal opinion. It's perfectly OK if you think differently, but if you are interested in mastering the language with the second-most native speakers in the world, I'd suggest avoiding the political experiments of modern Catalan and saving yourself a lot of future headaches with grammar and orthography. Replacing the 'ñ' with '~n' will also work in most cases.


And this, kids, is what a Spanish hater looks like.

If anyone is interested, Catalan[1] is a Latin language spoken in Catalonia and other areas of the North East of Spain, South of France and a city in Italy. It is also the official language of a country: Andorra.

It has about 10 million speakers[2], a bit more than Swedish, which is an official working language in the EU.

Its use was forbidden in Spain for 300 years[3] (until the death of the Spanish fascist dictator Francisco Franco), but even so it has managed to survive to this day.

And there's a top level domain, .cat[4], that is intended for websites that use Catalan language.

Also, we don't use Catalan to do political experiments or to annoy Spanish people. We use it because, for some of us, it is our native language!

[1]: https://en.wikipedia.org/wiki/Catalan_language
[2]: https://en.wikipedia.org/wiki/Languages_of_Europe#Number_of_...
[3]: https://en.wikipedia.org/wiki/Catalan_language#Spanish_state...
[4]: https://en.wikipedia.org/wiki/.cat


And this, kids, is one of the many problems with some Catalonian people. Persecutory delusion. "Spain hates us".


I was answering you, not all of Spain.

And I did because I think your comment calling the Catalan language and culture a "political experiment" is not only offensive to Catalans (and to anybody with a little common sense) but also shows a huge lack of understanding of Spanish culture and history.


Pretty cool. And the example I chose, funny.

I searched "foda" (f--- in Brazil) just to try it out, here's the result, which actually mentions the Amazon. http://i.imgur.com/CUpSgN6.png


You don't have to go very deep into the Amazon at all for that to be an insult. :-) What an amusing example.


There needs to be a "No results found" type result. Currently it's impossible to tell the difference between no result and an empty result.


Interesting, I recently found out that one of my usernames on a popular game means "shitman" in finnish or swedish, not through this site though (and it doesn't appear to bring anything up). Great idea though!

Also the word search field has an XSS, try entering "<script>alert(1)</script>". Not sure if it's a big deal but it's good to be safe.
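The usual fix is to HTML-escape the query before echoing it back. The thread doesn't say what the site is built with (someone asks elsewhere in the thread), so this is only a sketch in Python using the standard library's `html.escape`; the function name and heading markup are hypothetical, modeled on the "Results for" heading the site prints. Most templating engines with autoescaping enabled do this for you.

```python
import html

def render_results_heading(query: str) -> str:
    """Build the results heading with the user's query safely escaped.

    html.escape converts &, < and >, and with quote=True also " and ',
    so a query like <script>...</script> is displayed as text, not executed.
    """
    return f"<h2>Results for \u201c{html.escape(query, quote=True)}\u201d</h2>"
```

Escaping at output time keeps the stored query untouched, which matters if the same string is later used for matching.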


I believe you can make a good use of http://www.urbandictionary.com/ to update your database, because there are also lot of foreign words, which you miss at the moment. (pula - Romanian for dick, fasz - Hungarian for dick, piča - Czech/Slovak for cunt, etc.)


They have an unofficial API[1] that could be used.

[1] http://api.urbandictionary.com/v0/define?term=culo
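A sketch of querying that endpoint with only the standard library. The URL pattern comes from the link above. The response shape (a JSON object whose `"list"` key holds entries with a `"definition"` field) is an assumption about an unofficial API that may change or move, so the parsing is kept separate and defensive.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "http://api.urbandictionary.com/v0/define"  # from the link above; unofficial

def define_url(term: str) -> str:
    """Build the query URL; urlencode handles spaces and non-ASCII terms."""
    return f"{BASE}?{urlencode({'term': term})}"

def definitions(payload: dict) -> list[str]:
    """Extract definition strings, assuming a {"list": [{"definition": ...}]} shape."""
    return [entry.get("definition", "") for entry in payload.get("list", [])]

def lookup(term: str) -> list[str]:
    """Fetch and parse definitions for one term (network call)."""
    with urlopen(define_url(term), timeout=10) as resp:
        return definitions(json.load(resp))
```

Crowd-sourced definitions are noisy, so results fetched this way would still need the same manual vetting the site applies to submissions.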


Suggestion for improvement: numbers.

I was about to name a project "Plate88", and a couple of people independently pointed out a reference to the Nazi salute [1].

We then renamed it to Plate28, hopefully safer.

Anyway, maybe worth considering these use cases.

[1] https://en.wikipedia.org/wiki/88_(number)


When my son was about 5, he picked up on the fact that the number 88 sounds a lot like "idiot," so he got a lot of pleasure out of yelling "88" all the time. It's become part of our family folklore. I take a lot of pleasure associating white nationalism with the word idiot now.


Which brings us to the second part of naming things. I wouldn't tell someone in NASCAR country that 88 means a nazi salute unless you really want to offend.


Exactly -- we want to do a healthy/educational thing, imagine our surprise knowing that...


I guess I would say the bad meaning of 88 is not very relevant and certainly shouldn't stop a company operating in Asia from using it. Heck, a golf company using 88 would mean double snowmen rather than any dark meaning beyond a bad score.


It's great :-)

However, could I suggest there are two markets for this and you might be falling between two stools.

Firstly, there is the market we have here: looking for dirty words in different languages. I love the petites bites ad, and it would be great to have a crowdsourced "daily WTF" site of amusing failures.

But your site looks... serious, with maybe half an eye on charging marketing departments for access, which is almost impossible because no sane database can catch MR2.

But if the entertainment site catches on, you have a ready-made list of reliable, dirty-minded experts. Marketing departments could pay to put their ideas in front of them and get their private opinions and double entendres, confidentially ensuring they don't screw up. And given that the number of potential language-to-language screwups grows as n^2 while the number of experts is only n, you should be OK.

Anyway - it's a lovely idea and reminds me of Douglas Adams' "Go stick your head in a pig".


"Results for “Hello”

“hell”

English: heck

Direct match at start or end, potentially serious issue! 335 million native speakers, about 1.5 billion speakers in total."

Isn't that overdoing it a little?
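The report seems to treat a match at the very start or end of a name as more serious than one buried inside it. Here is a sketch of that classification using plain substring search and a tiny hypothetical word list, whose entries are taken from results quoted in this thread. The site's real data and scoring aren't described here, and `classify_matches` is an illustrative name.

```python
# Hypothetical word list; entries mirror results quoted in this thread.
WORDS = {"hell": "English: heck", "ho": "English: woman", "ayr": "Arabic: penis"}


def classify_matches(name: str, words: dict[str, str]) -> list[tuple[str, str, str]]:
    """Find every occurrence of each listed word and label its position."""
    text = name.lower()
    results = []
    for word, meaning in words.items():
        start = text.find(word)
        while start != -1:
            end = start + len(word)
            # Touching either edge of the name makes the word stand out more.
            position = "start or end" if start == 0 or end == len(text) else "inside"
            results.append((word, meaning, position))
            start = text.find(word, start + 1)
    return results


print(classify_matches("Hello", WORDS))
```

This flags "Hello" exactly as the site did, which is the over-eager behaviour being questioned. The trade-off is that a looser rule would also miss more real problems.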


False positives are far more useful than false negatives.


Excellent idea.

Given the number of people here who missed the "add a word" function, you might want to mention it up top with a negative result.

> No results found for this search. It looks likely that it's safe...

Say this in the hope of getting fuller coverage, because it looks like a useful tool.


I updated the "no results" text, thanks!


Yeah, also, a classic one we had here in Sweden:

Honda released a new small car model named "Honda Fitta", and one of the slogans was something like "small on the outside, large on the inside".

In Swedish, "Fitta" is a crude word for female genitalia.

The car was quickly renamed "Honda Jazz".

A link to a Swedish article, translated by Google:

https://translate.google.com/translate?sl=auto&tl=en&js=y&pr...


I was going to post something about this myself, but decided it was still unverified. The Auto Motor & Sport article was as far as the trail took me. It quotes the Dagens Nyheter, which in turn quotes an anonymous Japanese car magazine, whose article may or may not be true (if it exists at all).

This has the hallmarks of an urban legend: Japanese car manufacturers do have previous form with the Toyota MR2 and Mitsubishi Pajero, so the story is plausible; there's a moral here; and the story has not been followed all the way to its original source.


There's a company here in New Zealand called RaboDirect. Rabo means "tail" in Portuguese (and in Spanish maybe?). But it can also mean arse or ass. In most cases I'd think people were talking about arses rather than tails. This was not caught by wordsafety :/

Not sure if you are breaking up words. I remember a friend told me about some algorithm that works for that, and is used by German linguists... not exactly stemmers, but there was another thing that could be useful too.
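The algorithm the friend mentioned isn't named, so this is not it. Instead, here is a plain dictionary-based word-break sketch showing how a compound can be split into listed parts before each part is checked. The vocabulary is hypothetical, and `split_compound` and `min_part` are illustrative names.

```python
from functools import lru_cache
from typing import Optional


def split_compound(word: str, vocabulary: set[str], min_part: int = 3) -> Optional[list[str]]:
    """Split word into vocabulary entries (longest first), or return None."""
    text = word.lower()

    @lru_cache(maxsize=None)
    def best(i: int) -> Optional[tuple[str, ...]]:
        # A split of text[i:], or None if no combination of entries covers it.
        if i == len(text):
            return ()
        for j in range(len(text), i + min_part - 1, -1):
            part = text[i:j]
            if part in vocabulary:
                rest = best(j)
                if rest is not None:
                    return (part,) + rest
        return None

    parts = best(0)
    return list(parts) if parts is not None else None


VOCAB = {"mist", "haufen", "auto", "katze"}  # hypothetical German vocabulary
print(split_compound("Misthaufen", VOCAB))
```

Memoising `best` keeps this polynomial in the word length. `min_part` stops very short fragments from producing spurious splits.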

Anyway, thanks for sharing! Looking great.


It brought up nothing for "wog", so I've submitted it. I tried it because of the "wogrammer"/"wog" issue pointed out on Twitter yesterday. I haven't heard it in recent years, but "wog" was used much like the n-word when I was a kid in the UK and appears to still carry some of that meaning: https://en.wikipedia.org/wiki/Wog

"slope" is another, but I don't know if I'll submit it since no-one seemed to have heard of it at the time. Jeremy Clarkson got in trouble over it though - http://www.telegraph.co.uk/motoring/top-gear/10995483/Jeremy...

I absolutely love the idea of having a newsletter on the backend full of awful words submitted though - a rather "cute" idea.


I looked up "Bora". No results. The Volkswagen Bora was somewhat famous in Iceland because "Bora" means "Anus". You could drive the Volkswagen Anus! The vendor went out of its way to mispronounce the name in their TV ads as "Bóra", which would be like pronouncing "Anus" as "Aneece" or something like that.


Since we're on the subject, do the brand names "Dickies", "Dick's Sporting Goods" or even the character "Dick Tracy" seem like awkward name choices to US people/native English speakers?

As an ESL speaker, I found them kind of funny the first time I heard of them. I guess when one grows up with those names, they're not that appalling...


You're exactly right. Growing up with the words removes them from being recognizable as dirty. Note also, Dick is a name (or nickname for Richard). Language is weird sometimes...


This is great! There are certain puns/combos that would be really hard to catch though.

For example - the Ford Nova. In Spanish, it literally translates to "no va" "doesn't go". Terrible name for a car.

I'm unsure how to make this understand the context of the product, but that would be the next step.


I've always been pretty skeptical about Nova in particular. It's like an English speaker going to a restaurant named "Notable": we wouldn't expect to have to eat on the floor.


"Colgate" similarly passed. In Spanish, it's the imperative command "Hang yourself".



But "nova" in Portuguese means "new", so I guess Ford was on the safe side with that product name.


This is really a shame, I would hate to see unintentionally awful things in corporate media go away!


How do you account for spelling?

In Fijian, "caita" means "fuck" or "fuck it", but the word is pronounced "thaita". From a Fijian perspective, seeing both "caita" and "thaita" would bring the swear word to mind.
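One way to account for spelling is to store each listed word under every plausible spelling of it. Here is a minimal sketch using single-letter substitution rules. Only the Fijian c → th rule comes from the comment; `spelling_variants` and `FIJIAN_RULES` are hypothetical names, and each language would need its own rule table.

```python
from itertools import product

# Hypothetical rule table: only c -> th is taken from the comment above.
FIJIAN_RULES = {"c": ["th"]}


def spelling_variants(word: str, rules: dict[str, list[str]]) -> set[str]:
    """Every spelling reachable by swapping letters for their alternatives."""
    # For each character, keep the original plus any alternative spellings,
    # then take the Cartesian product to build each combined variant.
    options = [[ch] + rules.get(ch, []) for ch in word.lower()]
    return {"".join(choice) for choice in product(*options)}


print(sorted(spelling_variants("caita", FIJIAN_RULES)))
```

The number of variants doubles with every matching letter, so this only suits short words and small rule tables. Multi-letter rules would need a different approach.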



It would be really hard to capture this kind of double meaning that only applies to a certain product category... "Doesn't go" is certainly an unwanted association for a car, but it wouldn't matter for most products.

To make it more complicated, "nova" actually has the same astronomical meaning in Spanish as in English:

https://es.wikipedia.org/wiki/Nova

So it's fine in some contexts, bad in some very specific context. Someone smarter than me will probably crack problems like this with AI...


Did you read that link?

> This has since been debunked.

http://www.snopes.com/business/misxlate/nova.asp



It would be nice if this detected double-meaning phrases, although it might be hard to implement. In Spanish, many combinations of 'safe' words will generate very 'unsafe' meanings, and probably in many other languages too.


The Italian list needs work. It did not have 'minchia' in it, which is no longer even all that regional, AFAIK. It didn't have 'mona' either, although that could be forgiven, as it's dialect in the Veneto.


Italian is definitely lacking. It doesn't recognize STOCATSO or ESTIGRANQATSI, nor euphemisms such as patata, passera...


If only Miyazaki had had this when he named Laputa: Castle in the Sky.


FWIW, he got the name from Jonathan Swift:

https://en.wikipedia.org/wiki/Laputa


Who... apparently didn't speak Spanish.


>As "la puta" means "the whore" (see Spanish profanity), some Spanish editions of "Gulliver's Travels" use "Lapuntu", "Laput", "Lapuda" and "Lupata" as bowdlerisations. It is likely, given Swift's brand of satire, that he was aware of the Spanish meaning. (Gulliver, himself, claimed Spanish among the many languages in which he was fluent.)

https://en.wikipedia.org/wiki/Laputa


Don't trust it yet - the list is not complete. Some well-known Polish bad words are not there.


Phonetic matching doesn't work either. I tried qrwa, qurwa, kuhrwa. Nothing. It also finds hui (Russian) but not huj or chuj.


Yeah, I've had a fine time trying to remember most of the Polish swear words. But it will probably never properly account for all the variants of the word "pierdolić".


Just tried the word "crotte" which means "dung" in french. It returned "No results found for this search. It looks likely that it's safe." Phew, it's a good thing I speak french :)


Maybe it should include English words as well?

Beaver returns nothing

'caca' and 'pede' do return, so props for that

'bosta' matches but maybe Josta should?

'sharmuta' also matches (with a different spelling, but it's probably a variation)


Yes, it seems to be missing a fair bit of English slang, like "fanny", "root", "gigolo"...


False positive: "hat" was flagged for "idiot" in English. I can't find a dictionary with that definition, and it's not one that I'm familiar with as a native speaker.


It's a start. How about death- and failure-type words? "Muerta" doesn't elicit any warning. And 'Nova' is famous for its Spanish meaning (apocryphal?) - words like that might be hard to catch.


It's reasonably good. I tested it with Polish swear words, and it misses some no-nos, such as "alfons" (which means "pimp"); I actually submitted this :)

It reminded me of this major company https://en.wikipedia.org/wiki/Osram whose name happens to be "future tense conjugation of verb >to shit<" (a spot-on match, and the website correctly detects it) :)

It was founded in 1919, when the website didn't exist yet.


Found a hole on my first try. 'Foda' in Portuguese is 'fuck', yet the app deemed it to be safe. 'Foda-se' (fuck you) wasn't there either - submitted both.


I've been blessed to name my new startup "Phucker."


"phuck" works as well


> We respect your privacy — input is never logged or monitored.

Except it is logged and monitored by the US Government because this site uses unencrypted HTTP.

Otherwise, a commendable sentiment…


Anyone find any real-world matches? I was able to get positives by typing in foreign curses directly, but couldn't find any startups with foreign curses in their name.


Not a startup, but yesterday I was impressed with the font Skolar, which I mistakenly remembered as Skola, and apparently that's considered close enough to "Chola", which is Hindi for clitoris.

The "level of worriness" does go down (pun intended) when I type the font's correct name.


"airbnb" results in

  “ayr”
  Arabic: penis
The only one I could find, so just bad luck :)


Wikipedia was the closest I found, not really a start up though.



Doesn't catch "tineh", which is apparently a derogatory term for Indians http://www.urbandictionary.com/define.php?term=tineh, so probably wise to also check UD and Google.

I was going to have that as part of my company name until I discovered that (tineh is a transliteration of the word for "fire" in Irish).


'Results for “nova”: Spanish: did you know that the popular anecdote about the Chevrolet Nova is an urban legend? Google for "snopes nova".'

I liked that.


It doesn't list the word "Belgium". Even though according to Douglas Adams that's the most offensive word in the galaxy. :)


This would sell much better as baby name checker.


Where was this when these guys were picking their name?

http://pidora.ca/


155 million native speakers.


Aren't you the creator of Pixel Conduit? I used your tool for some VFX work and recently saw that you are creating software for web animations, so it's a surprise to see you come up with a tool like this. By the way, I am planning to create a slang database for Turkish and Turkic languages. I would like to share it with you as I develop it.


It seems that "fuk" is not recognised as sounding like "fuck". Might want to look at that phonetic matcher :)


It could help some people around here to know about this one: http://www.tripadvisor.fr/Restaurant_Review-g298113-d4906941...


I think this should also do a phoneme-based comparison. For example, the photo-sharing website Flickr is pronounced like this word (which is well known as slang): https://en.wiktionary.org/wiki/flikker


Do you know of any high-quality phonemic dictionaries?


There is such a thing for English, the CMU Pronouncing Dictionary.

Very, very useful, although specific to American English.

https://en.wikipedia.org/wiki/CMU_Pronouncing_Dictionary

I've succeeded in using it to solve puzzles that had to do with how words were pronounced, as opposed to how they were spelled.
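The dictionary is a plain-text file with one pronunciation per line. Each line is a word followed by ARPAbet phonemes, with stress digits on the vowels (e.g. `IH1`). Alternate pronunciations are marked like `WORD(1)`, and comment lines start with `;;;`. Here is a minimal parsing sketch under those format assumptions; check them against the copy you download. `parse_cmudict_line` is a hypothetical helper, and the sample line in the usage is hand-written, not quoted from the dictionary.

```python
from typing import Optional


def parse_cmudict_line(line: str) -> Optional[tuple[str, list[str]]]:
    """Return (word, phonemes without stress marks), or None for non-entries."""
    line = line.strip()
    if not line or line.startswith(";;;"):
        return None
    word, *phones = line.split()
    # "WORD(1)", "WORD(2)", ... are alternate pronunciations of the same word.
    word = word.split("(")[0].lower()
    # Drop the stress digits 0/1/2 so IH1 and IH0 compare as the same vowel.
    return word, [p.rstrip("012") for p in phones]


print(parse_cmudict_line("FLICKER  F L IH1 K ER0"))
```

Stripping stress is a deliberate coarsening. For "does this sound like a rude word" checks, ignoring stress gives more matches at the cost of some false positives.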


Or wix.com, which sounds like wichs in German (wank).


Excellent idea, but the lists need to be expanded. Didn't catch any of the Slavic words I threw its way.


Is Russian really supported? I tried both Cyrillic and Latin transliteration of some words and it reports it's all safe. For example, try any of the Russian synonyms for "shit". "говно" transliterates to "govno" and both check out as safe.
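One fix is to normalise Cyrillic input to Latin before lookup, so a list stored in either script can match. Here is a minimal sketch using one simplified romanization, chosen as an assumption: GOST, BGN/PCGN and informal "translit" all differ (e.g. "х" as kh, h or x). `transliterate` and `lookup_keys` are hypothetical names.

```python
# One simplified romanization of the Russian alphabet (an assumption).
RU_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "e",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def transliterate(text: str) -> str:
    # Characters outside the table (Latin letters, digits, ...) pass through.
    return "".join(RU_TO_LATIN.get(ch, ch) for ch in text.lower())


def lookup_keys(query: str) -> set[str]:
    # Check both the original spelling and its romanized form.
    return {query.lower(), transliterate(query)}


print(transliterate("говно"))
```

Because a single scheme gives "khuy" rather than "hui" or "huj", a real list would still need the common informal spellings added alongside.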


I submitted a bunch of Nepali swear words. People in Nepal will laugh out loud when English-speaking people talk about renting a Condo or generating a Rand number. Here are a few /swear/ words in Nepali (spoken by ~30M people):

Chick

Goo

Moose

Turi

Condo

Moot

Rand

Chalk

Lado

PuT

Fuse

If you are wondering what these words mean check this:

http://www.youswear.com/index.asp?language=Nepali


Turi and Lado aren't English words, are they (unless I'm very much mistaken)? All the others are, bar Rand, but that is a variable name beloved of coders!


Turi is a prefix of Turing (which means the act of pissing in Nepali), and Lado is common in Spanish, I guess.


Sounds like a good reason to not open a business in Nepal.


One of the funniest in Spain: facultad (faculty). An innocent word, with a very common and poisoned abbreviation: "Nos vamos a la fac" (We're going to the university). English speakers always hear something else. It's a perfect homophone of the so-called "f-word".


I added "cu" (Portuguese for "ass"), but I'm not able to get it to match. Guessing there's a size limit?

I wonder if it makes sense to add "ku" as well? When I was a kid, I'd always giggle at the American candy bar "Kudos."

This is really cool....

Kudos.


No results found for this search. There's a reasonable chance that it's safe...

But you can never be quite sure. There are over 6,000 languages spoken in the world. Somewhere deep in the Amazonian jungle, "anal" could be an insult that gets you killed.


Not only that (incomplete coverage of languages): words can also be phonetically similar to vulgarities without sharing the same spelling.


It wouldn't have saved Mitsubishi: https://en.wikipedia.org/wiki/Mitsubishi_Pajero

But I like the idea and will submit a few words


Missed a few of the ones I tried:

"zina" "eir" "khuee" "popa"


I use a similar method to find brandable names and domain names. I put a word, or a synonym of it, into Google Lang Tools and translate it into all the available languages. I've had really good success doing so.


So there's a restaurant near me called "Pho King". It's pho king delicious. Anyway, this tool doesn't know how 'pho' is pronounced in Vietnamese because it didn't catch it.


It literally fails on the first word I tried, the/a Dutch word for 'penis': http://imgur.com/Mw6Efx0


If you are still here, it would be useful if we could link to a word, e.g.: http://wordsafety.com?word=dimwit
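Here is a minimal sketch of the proposed link format. It assumes a `word` query parameter as suggested (the site may not support it). `word_from_url` and `link_for_word` are hypothetical names. Whatever reads the parameter should escape it before rendering it into the page.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def word_from_url(url: str) -> Optional[str]:
    """Read the hypothetical ?word= parameter, or None if it is absent."""
    values = parse_qs(urlsplit(url).query).get("word")
    return values[0] if values else None


def link_for_word(word: str, base: str = "http://wordsafety.com/") -> str:
    # urlencode escapes spaces and non-ASCII letters so links survive sharing.
    return base + "?" + urlencode({"word": word})


print(word_from_url("http://wordsafety.com?word=dimwit"))
```

`parse_qs` decodes `+` back to a space, so a link built for a multi-word name round-trips to the same search.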


This seems like a nice idea, but many of the comments here are pointing to omissions that the people commenting feel could be serious.

I guess I see six kinds of potential difficulties:

① There are so many languages out there, including languages with millions of speakers who might eventually come across your thing. Maybe that's not an issue for tangible products that will be marketed to specific territories.

② There are so many slang terms out there; each individual language might well have thousands of terms that have a rude, sexual, or excretory meaning, or that are used as a slur against some group. Also, some languages have expletives that don't correspond to expletives in other languages. https://en.wikipedia.org/wiki/Quebec_French_profanity

③ People have pointed out that phonetic matching is hard when you're dealing with different languages' orthographies and phonologies, and you can have the problem of "the source language's intended pronunciation sounds like an offensive word in the other language", "the other language's likely pronunciation of the written term sounds like an offensive word in the other language", or even other combinations among highly multilingual populations. "Sounds like" is sometimes challenging to automate in software, for example because epenthesis of a vowel may not be enough to remove the association. (But I think Levenshtein edit distance between phonemic transcriptions can kind of sort of work.)

Also, the "MR2" example someone gave shows that understanding how people will pronounce something in different languages is complicated: you have to know that the number two in French is "deux" /dø/.

④ People might also perceive something as an offensive reference that isn't even familiar to people elsewhere, like a reference to an upsetting person, place, or event. Reportedly some people in India have named people and businesses after Adolf Hitler just because he was famous, for example. I bet it's easy to do this cross-culturally in general.

⑤ As people point out with the Chevy Nova story, there might be a reason why a product name would become the target of ridicule in a particular language even if it's not offensive. That's true even if it didn't literally happen to the Chevy Nova.

⑥ It might even turn out that the space of offensive references is so dense that there is nothing that isn't a near-homophone of something pretty offensive in some language.

Anyway, I think this project is really neat; I'm just reminded from people's comments that natural language is hard! There's scope to keep expanding this site, and I think there are also existing "cultural consultant" businesses that try to deal with these problems through human review (I wonder how many of them have consultants on contract from many widely varying cultures, which seems especially useful in the Internet era).
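Point ③ suggests Levenshtein edit distance between phonemic transcriptions. Here is a minimal sketch of that idea over lists of phoneme symbols. The ARPAbet-style transcriptions in the tests are hand-written assumptions, not dictionary output, and `phoneme_distance` is a hypothetical name.

```python
def phoneme_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance where each edit inserts, deletes or swaps a phoneme."""
    prev = list(range(len(b) + 1))  # distances from an empty prefix of a
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # delete pa
                            curr[j - 1] + 1,      # insert pb
                            prev[j - 1] + cost))  # substitute pa -> pb
        prev = curr
    return prev[-1]


print(phoneme_distance(["F", "K"], ["F", "AH", "K"]))
```

As the comment warns about epenthesis, inserting a single vowel costs only 1, so a small threshold still catches the near-homophone. Where to set that threshold is a judgement call.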


A funny page that might be support for point ⑥ is

https://web.archive.org/web/20071112172929/http://members.ao...

If you can take an arbitrary word and find a coincidental homophonous synonym (or antonym) in some other language, you can probably find a coincidental homophonous expletive or sexual reference in some other language too!

(These coincidences are not cognates -- that's part of what "coincidence" means here.)


I'd have expected a small Easter egg for the word "bro" ;)


I searched for Tina, but it found nothing; in North Africa it means pu$$y.


What are we, five year olds?

Pussy.


You seem to have a good grip on Hindi words :) This is a good idea. Last I heard, Accenture spent a good amount of $$$ verifying the name in several languages before settling on it!



It would be great if the program checked for phonetically similar words as well; currently, it looks like it doesn't. Bhat (India), pronounced like 'butt', is not flagged.


It doesn't match some Balkan or Eastern European words.

Some of us don't appreciate tracking links, please get rid of tracking so we don't have to manually edit the urls.
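If the tracking takes the common form of `utm_*` query parameters, stripping it is a small URL rewrite. Which parameters the site's links actually carry is an assumption here. `strip_tracking` and its `prefixes` argument are hypothetical names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_tracking(url: str, prefixes: tuple[str, ...] = ("utm_",)) -> str:
    """Remove query parameters whose names start with any tracking prefix."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not k.startswith(prefixes)]
    # The path and fragment are kept; an emptied query drops the "?" entirely.
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_tracking("http://example.com/page?id=3&utm_source=hn&utm_medium=link"))
```

Non-tracking parameters keep their order, so links that depend on them (like `id` above) still work.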


I tried Tessa and it means "to pee." How come it is not on the list? I know it's not a swear word, but it's still a name with negative connotations.


There is a form to submit words that are missing.


Funny, it gets Siri, which has to be katakana-ized specially, but it misses both ketu and ketsu. What kind of database are you using? This is a great idea, by the way.


Results for “home” “ho” English: woman Direct match at start or end, potentially serious issue! 335 million native speakers, about 1.5 billion speakers in total.


Misses "matasano" -> "quack doctor".


The phonetic matching doesn't catch "Phuck"


It's missing rather a big one. "Mist" means something very different in German.

My point: Is this connected to dictionaries or is it all crowdsourced data?


Isn't 'mist' like 'darn' in English?


"Darn" is a minced oath. "Mist" simply means "crap" or "manure". A pile of manure is called "Misthaufen" (crap heap), and a pitchfork for handling manure is called "Mistforke" or "Mistgabel" (crap fork). It's a bit vulgar, but so is the subject matter.

But you can also use it as a general expression of discontent: "Mist!" is something you might exclaim if you just dropped something expensive and fragile. It's still vulgar but it's something you'd find far more appropriate around the young ones than the harsher "Scheiße!" (shit).

It's basically the little brother of "shit" in the same way as "dumb" is the little brother of "retarded".

You might legitimately hear someone curse at the "Mistkatze" (Katze = cat) that just peed on the bed, or the "Mistauto" (Auto = car) that refuses to start or simply shout "So ein Mist!" when they find out they've spent the past hour aligning the wallpaper upside down.

The equivalent of "damn" would be "verdammt" and there are minced oaths for that in German as well (although nowadays they're generally considered cute and not something you'd use only because you're mild-mannered or religious).


Man, don't put "dumb" anywhere near "retarded". I don't want another literal use of a word (retarded = delayed, dumb = mute) to disappear in the name of political correctness.

To quote from TGOTG: "This dumb tree, he is my friend." Groot is dumb because he is effectively mute. Drax was not commenting in any way on his intelligence.


http://www.dict.cc/german-english/Mist.html

Mist {m} [ugs.]: crap [vulg.]; rubbish [esp. Br.] [nonsense]; bullshit <BS> [vulg.]; tripe [coll.]; garbage [fig.]; bull [derived from bullshit] [Am.] [vulg.]


Ah good, it matches ‘perse’, which was a slight amusement when seeing the expensive clothing brand ‘James Perse’ while visiting the US :)


While you have this on the front page, why not start a repo where people can send pull requests, rather than submit one word at a time?


This doesn't work for Esperanto. Fiku vin!


I would have expected "Coq" to fail.


I would have expected Cox Communications to fail.


Only if it could be formally proved wrong...


FYI this doesn't check for words that sound like swear words. For example, it'll detect Fuck but not Fack.


With the input box I suspect they want the database to be crowdsourced. I wonder if they vet the input, if we all try hard enough might we get hackernews into the swear list?


I searched "gift" expecting "poison" from German, and was disappointed when nothing came up.


Missing Zune which means Penis in Quebec.


It didn't catch anything on "phuck," despite claiming that it checks phonetically.

Still a nice idea, though.
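A cheap way to catch respellings like "phuck", "fuk" and "fack" is to compare consonant skeletons: rewrite common spelling variants, drop vowels, and collapse doubled letters. This is a crude heuristic, not real phonetic matching. The rule list is a hypothetical, English-centric assumption, and `consonant_skeleton` is an illustrative name.

```python
import re

# Hypothetical, English-centric rewrite rules, applied in this order.
SPELLING_RULES = [("ph", "f"), ("ck", "k"), ("q", "k"), ("c", "k")]


def consonant_skeleton(word: str) -> str:
    """Reduce a word to a rough consonant outline for fuzzy comparison."""
    s = word.lower()
    for old, new in SPELLING_RULES:
        s = s.replace(old, new)
    s = re.sub(r"[aeiouy]", "", s)   # vowels vary the most between spellings
    s = re.sub(r"(.)\1+", r"\1", s)  # collapse doubled letters: "kk" -> "k"
    return s


print(consonant_skeleton("phuck"))
```

It is deliberately over-eager ("fake" also reduces to "fk"). Per the thread's point that false positives beat false negatives, a hit would be a prompt for review rather than a verdict.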


Doesn't detect any Greek curse words.

