One important aspect of this that I often see people forgetting is that this check is designed for working with ASCII, but the domain name at least can be non-ASCII. User interfaces should remember to support IDN in domain labels and convert it to punycode before validating, and if you store A-labels (which you probably do) then convert it back to IDN form when presenting it to users.
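A minimal sketch of that flow in Python. It assumes the ASCII check is something like the WHATWG HTML "valid e-mail address" regex; the names `ASCII_EMAIL`, `normalize_email` and `display_email` are hypothetical. The sketch splits on the last @ and converts the domain to A-labels with Python's built-in `idna` codec, which implements IDNA 2003 (RFC 3490) rather than IDNA 2008. Only then does it run the ASCII check. Going the other way is for display only.

```python
import re
from typing import Optional

# ASCII-only pattern modeled on the WHATWG HTML "valid e-mail address" regex.
ASCII_EMAIL = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"
)


def normalize_email(address: str) -> Optional[str]:
    """Return the address with an A-label (punycode) domain, or None if invalid."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return None
    try:
        # Built-in "idna" codec: IDNA 2003 (RFC 3490), not IDNA 2008.
        ascii_domain = domain.encode("idna").decode("ascii")
    except UnicodeError:
        return None  # empty or over-long label, prohibited character, etc.
    candidate = f"{local}@{ascii_domain}"
    # The local part is still checked as ASCII-only, like <input type=email>.
    return candidate if ASCII_EMAIL.fullmatch(candidate) else None


def display_email(stored: str) -> str:
    """Convert a stored A-label domain back to Unicode for showing to users.

    Assumes `stored` came from normalize_email, so the decode should not fail.
    """
    local, _, domain = stored.rpartition("@")
    return f"{local}@{domain.encode('ascii').decode('idna')}"
```

For example, `normalize_email("user@bücher.example")` gives `"user@xn--bcher-kva.example"`, which is what you would store. A non-ASCII local part is still rejected, matching the `<input type=email>` limitation.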

(Alas, <input type=email> still doesn’t support non-ASCII in the local part, which isn’t supported everywhere but is, I believe, fairly widely supported now. See https://github.com/whatwg/html/issues/4562 plus https://en.wikipedia.org/wiki/Email_address_internationaliza... for a little more background on what it is.)



I thought they'd rolled back most of this because it made phishing domains indistinguishable from the real ones?

I used an-emoji.my.domain for a while until Chrome changed it back to punycode.


I erred slightly in what I wrote there: I said IDN form for user display, but should have said IDNA form, which adds a set of restrictions to mitigate those hazards. That allows things like उदाहरण.example, while leaving emoji as xn--n3h.example since they’re not valid in IDNA. See also https://en.wikipedia.org/wiki/Internationalized_domain_name and https://en.wikipedia.org/wiki/Emoji_domain.
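A rough Python sketch of display logic along those lines. Python's built-in `idna` codec is IDNA 2003, so it will happily decode xn--n3h to ☃. This sketch therefore uses a hypothetical stand-in for IDNA 2008's disallowed set: any character in Unicode category "So" (Other Symbol, which covers most emoji and U+2603 SNOWMAN) keeps the label in A-label form. It is not the real IDNA 2008 table, and `display_domain` is an illustrative name.

```python
import unicodedata


def display_domain(a_label_domain: str) -> str:
    """Decode A-labels for display, but keep labels with symbol characters as punycode."""
    shown = []
    for label in a_label_domain.split("."):
        if not label.lower().startswith("xn--"):
            shown.append(label)  # plain ASCII label, nothing to decode
            continue
        try:
            # Built-in "idna" codec (IDNA 2003) still decodes symbols like U+2603.
            u_label = label.lower().encode("ascii").decode("idna")
        except UnicodeError:
            shown.append(label)  # malformed punycode: show it as-is
            continue
        # Hypothetical stand-in for IDNA 2008's DISALLOWED set:
        # any "So" (Other Symbol) character keeps the A-label form.
        if any(unicodedata.category(ch) == "So" for ch in u_label):
            shown.append(label)
        else:
            shown.append(u_label)
    return ".".join(shown)
```

With this, the A-label form of उदाहरण.example is shown in Devanagari, while xn--n3h.example stays as it is.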


Actually, I’ve got to add another note because my memory and experience were insufficient here: see also https://en.wikipedia.org/wiki/IDN_homograph_attack#Client-si... for descriptions of the further restrictions browsers apply, the most notable additional filter being a ban on mixing scripts within a label. This is something that would be nice to formalise in some way, though I don’t know of any venue suitable for the task.
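A crude Python sketch of a mixed-script filter, for illustration only. It guesses each letter's script from the first word of its Unicode name, which is a hypothetical heuristic and not the real Unicode Script property. For example, it treats Han, Hiragana and Katakana as different scripts, while real browser policies allow those to mix for Japanese. The names `coarse_script`, `mixes_scripts` and `display_label` are made up.

```python
import unicodedata
from typing import Optional


def coarse_script(ch: str) -> Optional[str]:
    """Hypothetical heuristic: first word of the Unicode name, for letters only."""
    if not unicodedata.category(ch).startswith("L"):
        return None  # digits, hyphens, combining marks: ignored
    return unicodedata.name(ch, "UNKNOWN").split()[0]  # e.g. "LATIN", "CYRILLIC"


def mixes_scripts(u_label: str) -> bool:
    """True if the label's letters appear to come from more than one script."""
    scripts = {s for s in map(coarse_script, u_label) if s is not None}
    return len(scripts) > 1


def display_label(label: str) -> str:
    """Show an A-label in Unicode only if its letters come from one script."""
    if not label.lower().startswith("xn--"):
        return label
    try:
        # Built-in "idna" codec (IDNA 2003) does the punycode decoding.
        u_label = label.lower().encode("ascii").decode("idna")
    except UnicodeError:
        return label
    return label if mixes_scripts(u_label) else u_label
```

A label like "pаypal" with a Cyrillic "а" (U+0430) mixes LATIN and CYRILLIC, so it stays in punycode. A single-script label such as उदाहरण is shown as-is.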



