
Domain hacks with unusual Unicode characters (2018) - robin_reala
https://shkspr.mobi/blog/2018/11/domain-hacks-with-unusual-unicode-characters/
======
ChrisSD
> It could also be used for evading URL filters.

If the filter is a whitelist then it can't. If it's a blacklist then there are
so many ways round filters already (e.g. redirect the link) so this won't help
much.

------
saagarjha
I don't want to rain on this person's parade, but I think this is a horrible
idea. This is great for phishing but a really bad choice for your actual
domain, because it's basically impossible to type out the URL (or even convey:
it's "foo.tel but the 'tel' is small for some reason?") and you're already
seeing a bunch of tools break on it. (To really drive the point home, the
final link in the article had a box in it on my computer.)

~~~
wizzwizz4
Except, it's equivalent to normal .tel. This URL points to the article above,
with all the unhacking done client-side:

[https://xn--69f31l4t57c0mag4b613h.xn--7uh4898msjaso/🆆🆃🅵/](https://🅂𝖍𝐤ₛᵖ𝒓.ⓜ𝕠𝒃𝓲/🆆🆃🅵/)

(that's https;//🅂𝖍𝐤ₛᵖ𝒓.ⓜ𝕠𝒃𝓲/🆆🆃🅵/)
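
A minimal sketch of the kind of client-side folding described here, assuming it can be approximated by NFKC compatibility normalization followed by lowercasing. Real browsers apply the fuller IDNA/UTS #46 mapping, so this is only an illustration; `fold_label` and `fold_host` are hypothetical helper names.

```python
import unicodedata


def fold_label(label: str) -> str:
    """Approximate the folding with NFKC + lowercase (an assumption, not UTS #46)."""
    return unicodedata.normalize("NFKC", label).lower()


def fold_host(host: str) -> str:
    """Fold each dot-separated label of a host name independently."""
    return ".".join(fold_label(label) for label in host.split("."))


# U+2121 TELEPHONE SIGN has the compatibility decomposition "TEL".
print(fold_label("\u2121"))
print(fold_host("foo.\u2121"))
```

Under that assumption, a host whose TLD is the single telephone-sign character folds to the ordinary `.tel` before any lookup happens.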

~~~
saagarjha
Cool, it has the exact same issue here as well. (It shows up as boxes.)

------
A_No_Name_Mouse
Could this be used in a XSS attack where there are very limited characters to
use after the src= tag, where it would resolve to a valid domain with more
characters?

~~~
lol768
Potentially, depending on how the character counting is done (if it counts
grapheme clusters instead of bytes I guess?)
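
A rough sketch of why the counting method matters, assuming the resolver folds the input with NFKC (an approximation of what browsers actually do). Python's standard library has no grapheme-cluster segmentation, so code points stand in for it here; for a single telephone-sign character the two counts agree. `measures` is a hypothetical helper.

```python
import unicodedata


def measures(text: str) -> dict:
    """Return the lengths a length filter might compare against."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "after_nfkc": len(unicodedata.normalize("NFKC", text)),
    }


# U+2121 TELEPHONE SIGN: one code point, three UTF-8 bytes, three letters once folded.
print(measures("\u2121"))
```

A filter that limits code points would see one character where the folded name has three; a filter that limits bytes would see no saving at all in this case.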

------
rakic
You might be interested in this — an example where the company logo is the
domain name, and vice versa:

[https://xn--bj8a.com](https://ꑮ.com) (ꑮ.com)

(Safari on macOS and iOS displays the symbol in the address bar.)

~~~
techslave
huh. that’s an attack other browsers have fixed by showing punycode. i wonder
whether safari has a blacklist for homoglyphs but renders everything else as
the intended glyph.

~~~
rakic
The whitelist for Safari on macOS is kept in a text file entitled
IDNScriptWhiteList.txt, located at
/System/Library/Frameworks/WebKit.framework/Versions/Current/Resources

ꑮ is from Yi script, which is obviously whitelisted.

------
techslave
> Here are the single characters which can be normalised down to a valid TLD

only TLDs or anywhere in the name? i suspect this is special treatment for
TLDs

------
failrate
A work buddy got an emoji domain (.ws) but couldn't find an email server that
could work with emoji domain email addresses.

~~~
service_bus
He was probably trying to type emojis into form fields instead of punycode.
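
For illustration, a small sketch of turning a Unicode host name into the `xn--` form that such form fields usually expect. It uses Python's built-in `punycode` codec per label and skips the normalization and validation steps that full IDNA processing performs; `to_ascii_label` and `to_ascii_host` are hypothetical helpers.

```python
def to_ascii_label(label: str) -> str:
    """Punycode-encode one label; plain ASCII labels pass through unchanged."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def to_ascii_host(host: str) -> str:
    """Encode each dot-separated label of a host name."""
    return ".".join(to_ascii_label(label) for label in host.split("."))


# U+A46E is the Yi syllable used in the xn--bj8a.com example elsewhere in this thread.
print(to_ascii_host("\ua46e.com"))
```

The same function can be applied to the domain part of an email address before pasting it into a form that only accepts ASCII.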

------
jujodi
I'll admit this is pretty neat. Unclear if I will ever use this information,
but I'll always know it now.

------
specialist
Why does Unicode have different code points for same letter rendered with
different type faces?

~~~
ilammy
They are math symbols, as it happens. Sometimes there is a semantic difference
between 𝒎, 𝐦, 𝕞, m which will be lost in a plaintext rendition of a formula.
Yeah, you'd be better off with TeX for this specific purpose, but that's a
valid point for having those font variations.

And it's not a generic modifier because those code points predate the expansion
of emoji and all the surrounding normalization of modifiers. Language tags are
considered a bad idea, for example, so it's not at all clear that it's
necessary for Unicode to convey semantics as well.

~~~
specialist
Thank you. I should have figured this out myself. I've been postponing going
down the Unicode rabbit hole.

Here are some relevant entries.

ⅿ 217F SMALL ROMAN NUMERAL ONE THOUSAND
[http://unicode.org/cldr/utility/character.jsp?a=217F](http://unicode.org/cldr/utility/character.jsp?a=217F)

[https://en.wikipedia.org/wiki/Mathematical_operators_and_sym...](https://en.wikipedia.org/wiki/Mathematical_operators_and_symbols_in_Unicode)

Whelp. That's a lot to ingest. I suppose if (when) I have to do Unicode for
real, I'll need to find the 'ascii' command line tool equivalents for Unicode.
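
As a starting point, a few lines of Python can stand in for such a tool. This is only a sketch built on the standard `unicodedata` module; `describe` is a hypothetical helper name.

```python
import unicodedata


def describe(text: str) -> list:
    """Return one 'U+XXXX NAME' line per code point in the text."""
    rows = []
    for ch in text:
        # Fall back to a placeholder for code points that have no name.
        name = unicodedata.name(ch, "<unnamed>")
        rows.append(f"U+{ord(ch):04X} {name}")
    return rows


# The roman numeral and the bold math letter look alike but are distinct code points.
for line in describe("\u217f\U0001d426m"):
    print(line)
```

Running it on lookalike letters makes the difference visible even when the font draws them identically.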

