
Browsers should not support Unicode in the address bar - Meai
I've now seen my first phishing site that has:

1. A URL identical to the real website's.
2. A valid SSL certificate. You only figure out that it's not the real site if you navigate into the certificate and check the domain name in the technical details. No casual user does that, and no user should be expected to.

This is serious: browsers should not support Unicode in the address bar. It's going to be impossible to detect fake URLs just by looking at them, since there is bound to be a Unicode character that looks like a regular ASCII character but isn't. The scammer can then substitute it into the URL and make anybody believe they are on the real site.

Here is a picture of the problem: https://imgur.com/a/LZFfN
======
Nadya
Mozilla is the only holdout on this issue AFAIK. Safari and Edge will show the
punycode and Chrome 58+ has fixed it as well.

The URL is not even visually identical, due to the dot underneath the letter
`d`. It seems to be this Unicode character [0]. Chrome patched this after the
apple.com [1] example of the problem. The apple.com "spoof" is now shown as
xn--80ak6aa92e.com in Chrome.

[0]
[http://www.fileformat.info/info/unicode/char/1e0d/index.htm](http://www.fileformat.info/info/unicode/char/1e0d/index.htm)

[1]
[https://www.theguardian.com/technology/2017/apr/19/phishing-...](https://www.theguardian.com/technology/2017/apr/19/phishing-url-trick-hackers)
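
To see what sits behind a punycode hostname like the one Chrome now shows, the
label can be decoded with Python's standard `punycode` codec. This is only a
minimal sketch, not how any browser does it: `decode_label` and
`describe_host` are hypothetical helper names, and the IDNA validation steps a
real implementation applies are skipped.

```python
import unicodedata

ACE_PREFIX = "xn--"  # marks a punycode-encoded DNS label


def decode_label(label: str) -> str:
    """Decode one DNS label; labels without the ACE prefix pass through."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


def describe_host(host: str) -> list:
    """Return (code point, Unicode name) for each non-ASCII character."""
    decoded = ".".join(decode_label(part) for part in host.split("."))
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in decoded
        if ord(ch) > 127
    ]


# Print the characters hiding behind the apple.com "spoof" hostname.
for code_point, name in describe_host("xn--80ak6aa92e.com"):
    print(code_point, name)
```

Printing the Unicode names makes the script of each character explicit, which
is the information the rendered glyphs hide.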

------
godot
I think the restriction should be limited to the hostname, but I agree with
you in principle. There are legitimate use cases for Unicode characters in the
rest of the URI. East Asian-language websites, for example, can get extra SEO
from having an article's title in the URL (as English articles do).

But indeed, there's no reason we need Unicode in the hostname or any domain
names.

------
olliej
Screw non-English speakers, right? English is not anywhere near the most common
language, and plenty of “Latin” languages require accents that are provided by
Unicode.

Compare él to el in Spanish: one means “he” (iirc), the other means “the”.
he-man.com and the-man.com are probably going to be different.

~~~
brokenmachine
What's the most common language? Chinese?

~~~
O_H_E
Yes, it is called "Mandarin". Yeah, I know, they have LOTS of people, around a
fifth of the world population.

------
twobyfour
That seems to me like a rather anglocentric view. How are people who speak
Arabic, Hindi, or Greek supposed to use the internet? And believe it or not,
not everyone's keyboard allows for easy input of ASCII characters.

Besides, it's too late. There are already non-Latin TLDs out there - in part
to support languages with non-Latin alphabets.

~~~
Meai
I'm the first guy to defend the need for multiple languages, but how do we then
make it safer? Maybe highlight which characters are Unicode in the URL string...

~~~
twobyfour
That's a good point. There's probably a way to highlight unexpected
characters. I wouldn't use "Unicode" as a criterion (that's an encoding, not a
character type that excludes ASCII). Perhaps the browser could define a set of
"expected" characters based on the user's locale and highlight anything
outside that set.
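
The locale idea could be sketched roughly like this. The `EXPECTED` table and
`highlight_unexpected` are made up for illustration, and the per-locale
character sets are placeholder assumptions, not real locale data.

```python
import string
import unicodedata

BASE = string.ascii_lowercase + string.digits + "-."

# Hypothetical per-locale "expected" sets; the accented letters are assumptions.
EXPECTED = {
    "en": set(BASE),
    "es": set(BASE + unicodedata.normalize("NFC", "áéíóúüñ")),
}


def highlight_unexpected(host: str, locale: str = "en") -> str:
    """Wrap every character outside the locale's expected set in brackets."""
    allowed = EXPECTED[locale]
    # Normalize so composed and decomposed accents compare the same way.
    host = unicodedata.normalize("NFC", host.lower())
    return "".join(ch if ch in allowed else f"[{ch}]" for ch in host)


# U+1E0D is the "d with dot below" mentioned earlier in the thread.
print(highlight_unexpected("\u1e0domain.example"))
# The same accented letter is expected in one locale and flagged in another.
print(highlight_unexpected("\u00e9l.example", "es"))
print(highlight_unexpected("\u00e9l.example", "en"))
```

A real browser would render the flagged characters in a different color rather
than brackets, but the set-membership check would be the same.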

------
bradknowles
Use the "punycode" extensions, until all browsers build this functionality in
by default.
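
What such an extension does amounts to forcing the ASCII-compatible form for
display. A rough sketch using Python's built-in `idna` codec (which implements
the older RFC 3490 rules, so results can differ from what current browsers
produce); `to_ace` and `display_host` are hypothetical names:

```python
def to_ace(host: str) -> str:
    """Return the ASCII-compatible ("xn--") form of a hostname."""
    return host.encode("idna").decode("ascii")


def display_host(host: str) -> str:
    """Show plain ASCII hosts unchanged and everything else as punycode."""
    return host if host.isascii() else to_ace(host)


print(display_host("example.com"))
print(display_host("b\u00fccher.example"))
```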

