
Regular expression to search for Gadaffi - marcog1
http://stackoverflow.com/q/5365283/89806
======
colanderman
Easy... (Qadaffi|Khadafy|Qadafi|...)... it's self-documented, maintainable,
and assuming your regexp engine is actually a regexp engine, will compile to
the same DFA that the obfuscated garbage in the SO solutions will compile to.

Writing compact regular expressions is like using short variable names to
speed up a program. It only helps if your compiler is brain-dead.
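
As a rough illustration of the explicit-alternation approach, here is a minimal Python sketch that builds the pattern from a plain list of spellings. The list `SPELLINGS` and the helper names `build_pattern` and `find_names` are assumptions made for this example, not taken from the Stack Overflow answers. Python's `re` module is a backtracking engine rather than a DFA compiler, so this only illustrates the maintainability half of the argument.

```python
import re

# Hypothetical, incomplete list of spellings; extend it as needed.
SPELLINGS = [
    "Qadaffi", "Khadafy", "Qadafi", "Gaddafi",
    "Gadaffi", "Kazzafi", "Qadhdhafi",
]

def build_pattern(spellings):
    """Compile a case-insensitive alternation of the given spellings."""
    # Escape each spelling so it is matched literally, then join with "|".
    alternation = "|".join(re.escape(s) for s in sorted(set(spellings)))
    # Word boundaries keep the pattern from matching inside longer words.
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)

PATTERN = build_pattern(SPELLINGS)

def find_names(text):
    """Return every listed spelling found in text, in order of appearance."""
    return PATTERN.findall(text)
```

Adding a spelling is then a one-line change to the list, and the pattern itself never has to be read by hand.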

~~~
true_religion
> assuming your regexp engine is actually a regexp engine

There is no guarantee that two regexes that match the same corpus will compile
to the same DFA, or even compile to a DFA at all. Do not confuse a regex with a
regular expression.

The answer isn't to turn against any of these variations of regex by deriding
them as "compact", but instead to test whether this is a bottleneck, and to try
out a non-self-documenting variation if it is.

~~~
colanderman
You are right, I should be more precise with my terminology. By "regexp" I
meant "regular expression". Regardless, neither term implies implementation as
a DFA; however, regular expressions, and the subset of regexps which are
equivalent to them, can and should be compiled to an optimized DFA.

------
yannickmahe
Isn't it strange that there is no official transcription for Arabic names?

Chinese, Japanese and Korean used to have a transcription by language (i.e. a
transcription for the English language, another one for French), but that
got standardized, and now there is only one transcription for each word
(except of course names that became really common: Peking in English, Pékin in
French, Beijing in the now official pinyin transcription). Is it because
Arabic is spread across so many countries without a centralised system
to "manage" the language?

~~~
bendmorris
Chinese has only one transcription for each word? I guess you've never been to
Taiwan. It only seems that way for Chinese because the PRC is so much bigger.
Taiwan uses a different romanization system.

I imagine the difficulty for Arabic would lie in the fact that there is no
major player like China in the Arabic world.

~~~
autarch
Actually, if you look at street signs in Taiwan you'll see that they use
multiple Romanization systems, often in the same city!

I think it was in Kaohsiung that I saw the same character Romanized two
different ways on two different streets.

I'm not sure, but I think they're using some combination of Pinyin, Wade-
Giles, and possibly something else.

Even worse, I think some of the Romanizations used in Taiwan are just made up
on the spot. My wife's name should be written "Hui Ling" in Pinyin, but for
some reason her English teacher in Taiwan (who was Taiwanese) told her to
write it as "Huey-Ling". So now we get phone calls asking for "Huey" all the
time.

------
edw519
(G310|K310|Q310)

<http://en.wikipedia.org/wiki/Soundex>

I wouldn't expect too many false positives.

~~~
bhousel
That's actually a pretty good rule. But I checked the list on Stack Overflow
and we might also need to include:

Kazzafi - K210
Qadhdhafi - Q331

K210 is a bit more problematic, since it matches names like 'KOSOFF'.
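
For anyone who wants to check these codes, here is a small Python sketch of Soundex. The function name `soundex` and the `hw_transparent` flag are assumptions made for this example. With the American rule (H and W do not break a run of equal codes) Qadhdhafi comes out as Q310; the Q331 quoted above is what you get from the simplified variant that treats H and W like vowels.

```python
# Consonant groups used by Soundex: digit 1 for BFPV, 2 for CGJKQSXZ, etc.
_CODES = {}
for digit, letters in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for letter in letters:
        _CODES[letter] = str(digit)

def soundex(name, hw_transparent=True):
    """Return the four-character Soundex code for name.

    hw_transparent=True follows the American rule: H and W are ignored
    and do not separate consonants that share a code.
    hw_transparent=False treats H and W like vowels (simplified variant).
    """
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _CODES.get(c, "")
        if code:
            if code != prev:       # skip repeats of the same code
                result += code
            prev = code
        elif c in "HW" and hw_transparent:
            continue               # H/W leave the current run intact
        else:
            prev = ""              # vowels (and Y) end the current run
    return (result + "000")[:4]    # pad with zeros, keep four characters

# The rule suggested above, expressed as a set of accepted codes.
TARGET_CODES = {"G310", "K310", "Q310"}

def looks_like_gaddafi(name):
    return soundex(name) in TARGET_CODES
```

Under this sketch, `soundex("Kazzafi")` and `soundex("Kosoff")` both give K210, which is the false-positive problem described above.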

~~~
m_myers
Me: "Kazzafi"

Google: Did you mean "Gaddafi"?

Hmm, I think I have a potential solution...

------
fedd
While the regexp would find Ghaddafi in all Latin-script languages, even ones
you don't understand, let me suggest the Russian spelling as well :)

Каддафи

By the way, it's always written this way, unlike the Osama bin Laden
situation, where some emigrant sources write his name as Осама бин Ладен
instead of the correct Усама бен Ладен.

------
marcog1
Also see
[http://upload.wikimedia.org/math/6/1/f/61f34aa25871e9546b6a1...](http://upload.wikimedia.org/math/6/1/f/61f34aa25871e9546b6a11243e1bed31.png)

------
pbhjpbhj
From the SO page: "I just thought that if the arabic transcription says
Qaḏḏāfī, the regex should check for Qaddafi too."

Would any of these searches find an article that solely uses a phonetic
spelling with diacritical marks, as opposed to a "standard" alpha-only
transliteration?

I guess it depends on the engine used, in which case the question is: do any
regex engines perform such automated conversion?
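
One way to sidestep the engine question is to strip the diacritics before matching. Here is a minimal Python sketch, assuming Unicode canonical decomposition is good enough for the transliterations involved. The helper names `strip_diacritics` and `search_folded` are made up for this example. Python's standard `re` module, at least, compares code points literally and does not do this folding for you.

```python
import re
import unicodedata

def strip_diacritics(text):
    """Remove combining marks, so that e.g. 'ā' becomes 'a'."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def search_folded(pattern, text):
    """Search text for pattern after stripping diacritics from the text."""
    return re.search(pattern, strip_diacritics(text))
```

With this, a plain `Qaddafi` pattern also finds the phonetic spelling Qaḏḏāfī, at the cost of normalizing every document before searching it.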

------
Mafana0
There are over 32 accepted ways of spelling Mu'ammar al-Qadhafi. Here's a
dance song to help you remember just a few of them:
[http://www.buzzfeed.com/danilic/gaddafi-the-duck-sauce-
remix...](http://www.buzzfeed.com/danilic/gaddafi-the-duck-sauce-remix-1o3k)

