

Homoglyph Substitution - jmhobbs
http://www.velvetcache.org/2014/08/29/homoglyph-substitution-for-urls

======
ggchappell
Worthy thoughts. But you should think of the Icelandic eth ("ð") as a
variation on a "d", not an "o". :-) Not that I know any Icelandic, but a
capital eth looks like this: "Ð".

More info:
[https://en.wikipedia.org/wiki/Eth](https://en.wikipedia.org/wiki/Eth)

EDIT: As for "クッキー": dealing with this is mostly straightforward, if you're
willing to allow _multiple ASCII characters_ for a single Unicode character.
Most of the katakana (which these are), along with the hiragana, have a
standard romanization. Mapped one character at a time, the first three here
are (if I'm not mistaken) "ku", "tu", and "ki", but the middle one is a small
"ッ", which doubles the following consonant rather than being read as "tu".
The dash-looking thing lengthens the last vowel, so this is "kukkii". Handling
that probably requires more intelligence than you're wanting to bake into this
thing, but I think a naive "kutuki-" wouldn't be bad.
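
A rough sketch of what I mean, in Python. The table and the function names are
made up for illustration and only cover this one word; a real table would need
every kana. It uses "tu" as above (Hepburn romanization would spell it "tsu"),
and it treats the small "ッ" as doubling the next consonant, which is how that
character normally works.

```python
# Hypothetical, deliberately tiny table: just enough for "クッキー".
KATAKANA = {"ク": "ku", "キ": "ki", "ツ": "tu", "ッ": "tu", "ー": "-"}


def naive_romanize(text):
    """One-to-many lookup per character, with no context at all."""
    return "".join(KATAKANA.get(ch, ch) for ch in text)


def romanize(text):
    """Same lookup, plus the two context rules this word needs."""
    out = []
    double_next = False
    for ch in text:
        if ch == "ッ":
            # Small tsu: double the consonant of the following syllable.
            double_next = True
            continue
        if ch == "ー":
            # Long-vowel mark: repeat the last vowel already emitted.
            if out:
                out.append(out[-1][-1])
            continue
        roman = KATAKANA.get(ch, ch)
        if double_next:
            roman = roman[0] + roman
            double_next = False
        out.append(roman)
    return "".join(out)


print(naive_romanize("クッキー"))  # kutuki-
print(romanize("クッキー"))        # kukkii
```

The naive version is a plain dictionary lookup, so it would drop straight into
a substitution table; the second one needs a little state, which is the extra
intelligence I was talking about.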

Korean characters similarly have standard romanizations (e.g., "원" = "won").
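
For Korean you don't even need a per-character table, since precomposed Hangul
syllables in Unicode can be split arithmetically into a leading consonant, a
vowel, and an optional final consonant. Here is a sketch, assuming something
close to Revised Romanization applied one syllable at a time. The function
name is hypothetical, the final-consonant spellings are simplified, and the
sound changes between adjacent syllables are ignored.

```python
# Letter tables in Unicode order; spellings loosely follow Revised Romanization.
LEADS = ["g", "kk", "n", "d", "tt", "r", "m", "b", "pp", "s",
         "ss", "", "j", "jj", "ch", "k", "t", "p", "h"]
VOWELS = ["a", "ae", "ya", "yae", "eo", "e", "yeo", "ye", "o", "wa", "wae",
          "oe", "yo", "u", "wo", "we", "wi", "yu", "eu", "ui", "i"]
# Simplified: each final consonant (or cluster) is reduced to one sound.
TAILS = ["", "k", "k", "k", "n", "n", "n", "t", "l", "k", "m", "l", "l", "l",
         "p", "l", "m", "p", "p", "t", "t", "ng", "t", "t", "k", "t", "p", "t"]

HANGUL_BASE = 0xAC00   # first precomposed syllable
HANGUL_LAST = 0xD7A3   # last precomposed syllable


def romanize_hangul(text):
    """Romanize precomposed Hangul syllables; pass everything else through."""
    out = []
    for ch in text:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            index = code - HANGUL_BASE
            lead, rest = divmod(index, len(VOWELS) * len(TAILS))
            vowel, tail = divmod(rest, len(TAILS))
            out.append(LEADS[lead] + VOWELS[vowel] + TAILS[tail])
        else:
            out.append(ch)
    return "".join(out)


print(romanize_hangul("원"))  # won
```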

Figuring out all of the above would take some work, but it could easily be
crowd-sourced, and it would only have to be done once.

The zillions of Chinese characters are the problem. These could have different
romanizations depending on which of the Chinese languages they're being used
for, and often multiple possible romanizations when used for Japanese (in
which case they are "kanji"). So there might be no good solution for these.

