
Phishing with Unicode Domains - tvvocold
https://www.xn--80ak6aa92e.com/
======
merricksb
Previous recent discussions:

[https://news.ycombinator.com/item?id=14130241](https://news.ycombinator.com/item?id=14130241)

[https://news.ycombinator.com/item?id=14119713](https://news.ycombinator.com/item?id=14119713)

------
stereo
Gosh that’s old. The original paper was from 2001, the Shmoo group wrote about
it in 2005 -
[https://blogs.oracle.com/yakshaving/entry/so_not_funny_shmoo...](https://blogs.oracle.com/yakshaving/entry/so_not_funny_shmoo_group)
\- and Joi Ito and I were able to register Veriѕign.com then, to highlight how
their greedy mismanagement of .com made this possible.

Three possible solutions, not mutually incompatible:

* Make the browsers catch it - Chrome just shows characters that look like apple.com here, and shouldn’t.

* Whitelist character sets that are allowed to be mixed, at the registry level - it should only be possible to mix Cyrillic-Latin homoglyphs with Cyrillic non-homoglyphs.

* Don’t allow IDNs on gTLDs - if you want “écriture”, get écriture.fr, not .com

Obviously, the registries have a conflict of interest here, and won’t let 2
and 3 happen on .com because it would cut into Verisign's revenue.
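
To make option 2 concrete, here is a minimal Python sketch of a mixed-script check that a registry (or a browser) could run on a single label. It is only an approximation: the standard library exposes no script property, so the hypothetical helper `scripts_in_label` guesses the script from the first word of each character's Unicode name. It is not any registry's actual policy.

```python
import unicodedata


def scripts_in_label(label):
    """Approximate the set of scripts used in one domain label.

    Heuristic (an assumption of this sketch): the first word of a
    character's Unicode name, e.g. 'LATIN' or 'CYRILLIC'.
    ASCII digits and hyphens are script-neutral and skipped.
    """
    scripts = set()
    for ch in label:
        if ch == "-" or ch.isdigit():
            continue
        scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(label):
    """True if the label appears to draw on more than one script."""
    return len(scripts_in_label(label)) > 1


if __name__ == "__main__":
    # Latin letters with one Cyrillic U+0455 in the middle
    print(is_mixed_script("Veri\u0455ign"))
    # The all-Cyrillic label behind xn--80ak6aa92e
    print(is_mixed_script("\u0430\u0440\u0440\u04cf\u0435"))
```

With this heuristic, a label like Veriѕign (Latin plus one Cyrillic letter) is flagged, while the all-Cyrillic label behind the linked demo is not, so a mixed-script rule on its own would not catch that one.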

See also
[https://en.wikipedia.org/wiki/IDN_homograph_attack](https://en.wikipedia.org/wiki/IDN_homograph_attack)

~~~
ko27
Update your Chrome

~~~
runeks
There are no updates to Chrome for iPad in the App Store, and it looks like
this: [http://i.imgur.com/JjR5evp.jpg](http://i.imgur.com/JjR5evp.jpg)

------
averagewall
Do unicode URLs actually provide any real value? Every web user must be
already used to typing Latin characters because so many major websites use
them. So nobody would be excluded by that. Whereas, any non-Latin character is
going to be nearly impossible for most of the world to enter.

A particularly terrible language is Chinese where most old people can't type
the characters even though they can type Latin letters. That's because you
have to deliberately invest time to sit down and learn an input method which
is a non-trivial endeavor that takes weeks of effort and old people just
aren't going to go back to school for that.

~~~
pilif
_> Do unicode URLs actually provide any real value?_

yes. Not everybody speaks english.

 _> Every web user must be already used to typing Latin characters because so
many major websites use them._

s/web user/existing web user/

Unicode domains are one more piece required for the net to be as inclusive as
possible.

~~~
lambdadmitry
On the other hand, unicode domains may lead to balkanization of the net. How
would you even type in something like борщ.рф (and before you ask, you can
easily translate its contents using Google Translate after entering the URL)?
And everyone, just everyone in Russia is already capable of typing in stuff in
ASCII. So the upside is small and diminishing (more people learn English over
time and that's a beautiful thing), and the downside is the reversal of the
unification effect that Internet had. I'm pretty sure it's not an obvious
choice.

I should also add that the general attitude of "not everybody speaks English
so we should adapt our tech to reduce the need for English" seems to imply a
privilege of already knowing English. It is true that not everyone speaks
English at this moment, but the right solution would be to teach everyone
English as it expands horizons immensely, not to balkanize the world.
Languages are _not_ equal and English is the single most useful one. One can
argue that e.g. Russian is just as good as English, but it's just not true.
The amount of information available in English is _immeasurably_ higher than
in any other national language, and one should have the privilege of knowing
English for some time (or being a native speaker) to forget the fact.

~~~
pilif
_> How would you even type in something like борщ.рф_

by clicking on the link on my search engine. Though of course, either that
page is in a language I can easily type (so why then use that URL?), or
otherwise, I would not have searched for that term to begin with, whether I
typed it in the URL bar or in the search field.

If I'm clicking a link on a page in a script I can read, then the point is
moot too.

 _> (more people learn English over time and that's a beautiful thing_

I don't know. Giving access to people who don't (yet) speak english to me is a
nobler goal than forcing people to learn english and the latin script.

If they want to learn english, that's fine. But forcing them to is being
exclusive.

Yes. There's a lot more content available in english and in the end, that's
what made me learn it (honestly - the sole reason I started to learn english
was to be able to play the talkie version of "Indiana Jones and the Fate of
Atlantis"), but this was my decision. I wasn't forced to.

------
Jonnax
Ouch. This is a good one.

Whilst it's easy to say "just always show punycode", people who use the web
in other languages lose a lot of functionality because of it.

I could imagine a solution: collect a list of homoglyphs, then when
the browser suspects an overlap it searches for similarly spelt sites
and warns the user that the site may be an imitation. Of
course it would then also convert the URL in the address bar to punycode.
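
A rough Python sketch of that idea, assuming a tiny hand-written homoglyph table and a made-up list of known sites (both hypothetical; Unicode publishes a much larger confusables data file that a real implementation would more likely start from):

```python
# Tiny hand-written homoglyph table: an illustrative assumption,
# nowhere near complete.
HOMOGLYPHS = {
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "\u0440": "p",  # Cyrillic small er
    "\u0441": "c",  # Cyrillic small es
    "\u0455": "s",  # Cyrillic small dze
    "\u04cf": "l",  # Cyrillic small palochka
}


def skeleton(label):
    """Fold every known homoglyph in the label to its Latin look-alike."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in label.lower())


def imitated_site(label, known_sites):
    """Return the known site this label seems to imitate, or None.

    A label that already is the known site is not reported.
    """
    folded = skeleton(label)
    if folded != label.lower() and folded in known_sites:
        return folded
    return None


if __name__ == "__main__":
    known = {"apple", "verisign"}  # hypothetical list of popular sites
    print(imitated_site("\u0430\u0440\u0440\u04cf\u0435", known))
    print(imitated_site("apple", known))
```

The browser would show the warning (and the punycode form) only when `imitated_site` returns a match, so ordinary non-Latin domains stay readable.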

What other ideas are there?

~~~
3131s
How about an icon next to the URL bar that displays and allows a user to
select their preferred Unicode block(s)? If a character in the URL falls
outside that block then the URL is highlighted in red or some warning is
displayed.
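
Something like the following Python sketch, where the user's preferred blocks are a plain list of names. The `BLOCKS` table holds only three blocks for illustration, and `chars_outside` is a hypothetical helper, not an existing browser API.

```python
# Code point ranges of a few Unicode blocks (only a small sample).
BLOCKS = {
    "Basic Latin": (0x0000, 0x007F),
    "Latin-1 Supplement": (0x0080, 0x00FF),
    "Cyrillic": (0x0400, 0x04FF),
}


def chars_outside(label, preferred):
    """Return the characters of label that fall outside the preferred blocks."""
    ranges = [BLOCKS[name] for name in preferred]
    return [
        ch for ch in label
        if not any(lo <= ord(ch) <= hi for lo, hi in ranges)
    ]


if __name__ == "__main__":
    label = "\u0430\u0440\u0440\u04cf\u0435"  # the label behind xn--80ak6aa92e
    offending = chars_outside(label, ["Basic Latin"])
    if offending:
        print("warn:", [f"U+{ord(ch):04X}" for ch in offending])
```

One limitation of this approach: a user who has selected Cyrillic as a preferred block would get no warning for this particular domain.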

------
herghost
Safari just displays it as "[https://www.xn--80ak6aa92e.com](https://www.xn--80ak6aa92e.com)", whereas Chrome (Version 57.0.2987.133 (64-bit)) displays it
as the author intended.
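
For anyone wondering what the two forms correspond to, here is a small Python sketch that decodes the punycode label with the standard library's `punycode` codec and lists the characters. The helper names are made up for this example, and the raw codec skips the validation a real IDNA implementation performs.

```python
import unicodedata

ACE_PREFIX = "xn--"  # prefix that marks a punycode-encoded label


def decode_label(label):
    """Return the Unicode form of one DNS label (unchanged if not punycode)."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


def decode_hostname(hostname):
    """Decode every label of a dotted hostname."""
    return ".".join(decode_label(part) for part in hostname.split("."))


if __name__ == "__main__":
    print(decode_hostname("www.xn--80ak6aa92e.com"))
    # List each character of the decoded label with its Unicode name.
    for ch in decode_label("xn--80ak6aa92e"):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}")
```

Every character in the decoded label is Cyrillic, which is why it renders like "apple" in many fonts without containing a single Latin letter.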

~~~
wila
Same with Pale Moon: it just displays the punycode instead of the Unicode
characters.

Firefox 52.0.2 here displays it as apple.com. So I updated Firefox (which failed
until I reran it), and it still displays it as apple.com (Firefox 53.0).

~~~
wila
OK, I read the linked article now and apparently you have to change a setting
in Firefox in order to see the URL in punycode as Mozilla decided it is "not a
bug".

E.g. open about:config and set "network.IDN_show_punycode" to true to avoid this
trap.

------
jwilk
[https://en.wikipedia.org/wiki/IDN_homograph_attack](https://en.wikipedia.org/wiki/IDN_homograph_attack)

------
mirages
Chrome 58, rolled out yesterday, fixes the issue.

~~~
pluma
Except the fix seems to be simply to show the punycode URL.

That's not a fix, that's a workaround.

EDIT: This led me to read up on how various browsers handle non-ASCII letters
which in turn helped me discover that apparently no browser supports the
German sharp-s ("ß") which gets auto-expanded to "ss" although domains
containing the sharp-s can be registered separately from "ss" domains --
effectively allowing people to register domains that can't be accessed in any
browser without explicitly using the unreadable punycode representation.
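
The same folding can be reproduced in Python, whose built-in `idna` codec follows the older IDNA 2003 rules. This is only a sketch of the mapping itself, using "straße" as an arbitrary example label. It says nothing about what any particular browser does internally.

```python
name = "stra\u00dfe"  # "strasse" written with a sharp-s

# Python's built-in "idna" codec implements IDNA 2003, which folds
# the sharp-s to "ss" before encoding, so the result is plain ASCII.
folded = name.encode("idna")

# Raw punycode keeps the sharp-s. Adding the "xn--" prefix gives the
# ASCII form of the separately registrable domain label.
kept = b"xn--" + name.encode("punycode")

print(folded)  # b'strasse'
print(kept)    # b'xn--strae-oqa'
```

The two results are different labels, which is how a sharp-s domain and its "ss" counterpart can end up registered separately.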

EDIT2: It seems the fix is more fine-tuned than just showing punycode for
everything. So it's still a workaround (punycode URLs are not fit for human
consumption so this still actively punishes confusing domains even if they're
not intentionally malicious) but it affects fewer domains than I initially
feared.

~~~
jerryszczerry
It was already fixed back when domain names had to be plain ASCII.

It was West-centric, yes, but it allowed for unique and legible ASCII
identifiers. And encouraged non-ASCII languages to create a unique (or,
mostly-unique) Latin representation of their scripts — which is, in general, a
good thing. It encouraged unification, using ASCII as the common ground.

Allowing for Unicode characters opened a new Pandora's box, creating a situation
that is unsolvable — either we keep the new names, making almost every string
of characters potentially ambiguous, or we return to the state where ASCII-
only names are the only ones usable.

Also, differentiating between ASCII and non-ASCII names doesn't solve the
problem. Imagine that the legitimate address is already in a non-ASCII
script.

~~~
matt4077
In what universe is ASCII "common ground"? And in what universe is a few
scammers here and there "pandora's box"?

Some people in this thread seem almost eager to throw out any attempt at
respecting cultures other than their own using the earliest convenient excuse.

~~~
Dylan16807
> In what universe is ASCII "common ground"?

Excluding EBCDIC, which has the same characters, can you name a major
character set that doesn't start with a carbon copy of ASCII? Shift JIS starts
with ASCII. Big5 starts with ASCII. Every code page starts with ASCII.
Unicode, of course, starts with ASCII. Look at just about any (physical)
keyboard for any language and it will support ASCII.

