
This is a horrendously horrible idea, for the same reason that Unicode domain names are a bad idea. Domain names are important because they provide a reasonable amount of trust. If I type http://apple.com, I'm 99.99999% certain that I've connected to Apple's website. This gets nasty with Unicode, because a person can spam your email account and get you to click on a URL which looks very similar to something like apple.com, but really points to a malicious site (thank you, Cyrillic characters).
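To make the Cyrillic look-alike concrete, here is a minimal Python sketch. The spoofed string is a made-up illustration (not a registered domain that I know of), and `describe` is a hypothetical helper name. The two strings render almost identically in many fonts but compare as different:

```python
import unicodedata

real = "apple.com"
# Hypothetical spoof: the first letter is CYRILLIC SMALL LETTER A (U+0430),
# written as an escape so the difference is visible in the source.
spoof = "\u0430pple.com"

def describe(domain):
    """Return (code point, Unicode name) for the first character of a domain."""
    ch = domain[0]
    return f"U+{ord(ch):04X}", unicodedata.name(ch)

print(real == spoof)    # False
print(describe(real))   # ('U+0061', 'LATIN SMALL LETTER A')
print(describe(spoof))  # ('U+0430', 'CYRILLIC SMALL LETTER A')
```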

Hash-based domain names would be even worse. You have no idea what site is lurking behind some big string of hex digits. You could argue that a person should just compare the hash to some known set of hashes, but that's (a) cumbersome and (b) unrealistic. If it's done by humans, it's error-prone (a malicious site could spoof the first few chars to point to their site), and if it's done by computers, what's the point? You've now effectively created a really shitty replacement for DNS.
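A rough sketch of why eyeballing the first few characters is weak, assuming names are plain SHA-256 hex digests (an assumption for illustration; the scheme under discussion may hash something else). The inputs `legit-site` and `evil-site-N` and the helper names are hypothetical:

```python
import hashlib
from itertools import count

def hex_name(data: bytes) -> str:
    """Stand-in for a hash-based name: the SHA-256 hex digest of some input."""
    return hashlib.sha256(data).hexdigest()

def spoof_prefix(target: str, prefix_len: int = 4):
    """Brute-force an input whose digest shares the target's first prefix_len hex digits."""
    wanted = target[:prefix_len]
    for attempt in count():
        candidate = hex_name(f"evil-site-{attempt}".encode())
        if candidate.startswith(wanted):
            return attempt, candidate

legit = hex_name(b"legit-site")
attempts, fake = spoof_prefix(legit)

# Same first four hex digits, completely different name overall.
print(legit[:12], fake[:12], attempts)
```

Matching 4 hex digits takes about 16^4 = 65,536 tries on average, which is well under a second on a laptop. Each extra digit multiplies the work by 16, so a human who checks only the first handful of characters is not checking much.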

That's pretty harsh, although you may be entirely correct. I'm not smart enough to know. Please remember that the author did this for fun and despite whatever flaws it may have, it is still interesting to think about.

Is it impossible to overcome the flaws you pointed out? Is there a way to abstract away the risk from the end user? Are there other applications for this where trust is not an issue? At risk of sounding like an idiot, could some sort of distributed proof-of-work/proof-of-stake protocol alleviate some of the trust problem?

> This is a horrendously horrible idea, for the same reason that Unicode domain names are a bad idea.

So, you suggest that non-English speakers should just "learn it" to use DNS?

Until we can figure out something that doesn't suck, yes. Look, I'm sympathetic with a huge chunk of the world's population having to deal with a (potentially) unfamiliar character set, but what can you do about it? There has to be some standard that everyone can agree to which doesn't compromise the integrity of the net.

This might at first sound like a xenophobic, anglocentric position, but it actually makes quite a bit of sense. There are a number of instances in which it is advantageous / critical to adopt a lingua franca -- that is, a single language that everyone in the world agrees to use for a particular purpose. In the case of domain names, Patrick is right: If we allow characters that appear to be one thing but are actually another, it opens up a whole bunch of evil possibilities.

Couldn't this also be mitigated by making all domains mandatorily delegated from a national ccTLD? Eliminate/redirect apple.com to apple.com.us, then make sure it conforms to the rules for US domains (Latin character set only, etc) whereas apple.com.ru comes under the rules of the Russian registrar?

Language and nation are not the same thing. Sadly, a policy like this would open up a can of worms.

Should the Cherokee syllabary be permitted in .us domains? If so, you are still open to homograph spoofing against the Latin character set (e.g. Ꭹ for y or Ꭲ for T). If not, isn't it a trifle rude to declare that writing system "unamerican"?

What about immigrant languages other than English? Why are Latin-charset language users more American than e.g. Hebrew ones like the long-standing and well-known Yiddish population of New York?

You could ban mixing of code ranges in domains. That might help, but how do you sensibly restrict a code range? Turkish is a Latin charset language, but contains a few extra characters that pose a homograph risk. How do you work out whether a domain is Turkish (and allowed to contain ı) or not Turkish? Also, what if you wanted to differentiate the website for your California-based Yiddish-named restaurant from a similarly named competitor in New York?
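A minimal sketch of a "no mixing of code ranges" check, and of why it does not cover the Turkish case. It approximates a character's script by the first word of its Unicode name, which is a crude stand-in for the real Unicode Script property; `scripts` and `is_single_script` are hypothetical helper names:

```python
import unicodedata

def scripts(label: str) -> set:
    """Approximate the scripts in a label from the first word of each letter's Unicode name."""
    found = set()
    for ch in label:
        if ch.isalpha():  # skip digits and hyphens
            found.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return found

def is_single_script(label: str) -> bool:
    """True if every letter in the label appears to come from one script."""
    return len(scripts(label)) <= 1

print(is_single_script("paypal"))        # True
print(is_single_script("p\u0430ypal"))   # False: Cyrillic а mixed into Latin
print(is_single_script("paypa\u0131"))   # True: dotless ı is still Latin
```

The last line is the problem described above: ı passes a single-script rule because it is a Latin letter, so catching it needs something finer than a per-script ban.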

Should the governments of Morocco and Algeria be empowered to blanket-refuse Tifinagh domain names? What about when Georgia was part of Russia?

I like this solution. However, ICANN has already made a profit on .us/.ru domains, when I really think that they should have come as 'free' suffixes to registrations.


Say I buy 'apple.com'. If I want to leave it at that, fine. However, I should also get 'apple.com.us' and 'apple.com.ru' so that I can handle these appropriately. It's not perfect, but it at least gives my users a chance to say "hey, I probably prefer (English|Russian), so please give me that page."

Of course, this is also a bit lazy and somewhat of a non-solution, as this only addresses the issue for English speakers. A Russian speaker using the .ru namespace is already willing to "play by the ASCII rules."

People going to 'apple.com' really expect to go to the webpage of the American electronics company. One could assume this from the TLD. Users sending a request to 'apple.рф' would be doing something somewhat strange (the user sends an English base label, followed by a Cyrillic TLD). This isn't that absurd though, as English company names become loanwords (at least in Russian -- see "xerox", or ask a Russian whether he owns a 'yabloko makkniga pro' or an 'apple macbook pro'). Should the presence of a non-ASCII TLD trigger a country-specific mode in browsers for the sake of security? How do we handle loanwords (spoiler to the above: Russians say 'apple' when referring to the brand, even though it shadows the actual Russian word for apple) with non-ASCII TLDs?

IMO different language versions of a site should not be based on TLD, but rather a subdomain: us.apple.com, ru.apple.com

_Which_ Latin charset? "ASCII == Latin charset" is somewhat, um... naïve (Yes, that's a part of a Latin charset, and it's even English).

I prefer an unfractioned internet

That's not fractioned. Are you being flippant?

There is some middle ground between opening up all of Unicode and just Latin characters.

It seems that generally a subset of sufficiently distinct characters should mostly suffice. I don't think e.g. Latin, Hebrew and Chinese have much visual overlap.

This is already a solved problem in every major browser. They will show mixed script domains as punycode instead of their unicode equivalent[1][2]. ICANN does not even allow all unicode characters[3]. There's so much FUD going around.

[1] http://www.chromium.org/developers/design-documents/idn-in-g...

[2] http://www.mozilla.org/en-US/about/governance/policies/secur...

[3] https://tools.ietf.org/html/rfc5892

Great links. I'm not sure I'd go so far as to say it's a "solved problem" when there are four different behaviours from four separate browsers, but I'm glad each browser is taking homograph attacks seriously. In the first link, I compared Chrome and Firefox and they do work differently.

> I'm sympathetic with a huge chunk of the world's population having to deal with a (potentially) unfamiliar character set, but what can you do about it?

Well, seeing as that "huge chunk" is a large majority, the solution isn't "learn English because I'm used to typing URLs _this_ way."

I understand that English is the current lingua franca, but it's aggressive to expect everyone to deal with it just to use the internet.

Why not use our country tlds? It might mean that ICANN has to actually do some work, but I think that the uppermost tld for countries should actually be reserved for suffixes.


    apple.us        # should have never been sold
    apple.com.us    # there, none of that scary unicode
    apple.com.ru    # same unicode problem abound (but at least it addresses the first issue)

Who says you have to learn English?


I hate to break it to you, but Unicode domain names are here and have been for a while. For instance, this English-language page has a couple of graphs about registrations in the .РФ domain: http://кто.рф

Note that you can't just type it in, because kto.pф is not the right letters and that last one is obviously not on your keyboard. But cut and paste should work fine.

What is there to learn? Learning to type English characters? We are not requiring them to learn English.

Using a phone number doesn't require you to "learn" math.

The alphabet isn't restricted to English. Also, see: punycode.
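For anyone who hasn't looked at punycode: it is the ASCII form that a Unicode label is converted to before it goes into DNS. A quick sketch using Python's built-in `idna` codec (which, as far as I know, implements the older IDNA 2003 rules, so treat it as an illustration rather than what registries or browsers run), applied to the кто.рф name mentioned elsewhere in this thread:

```python
name = "кто.рф"

# Each non-ASCII label becomes an ASCII label with the "xn--" prefix.
ascii_form = name.encode("idna")
print(ascii_form)

# Decoding the ASCII form gives back the original Unicode name.
round_trip = ascii_form.decode("idna")
print(round_trip == name)  # True
```

So the DNS itself only ever sees ASCII; the Unicode is a presentation layer on top.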

I think the popularity of url shorteners shows that people are willing to click on links where they have no idea what is lurking behind an unintelligible string.

As for the apple site, there are other (better) systems in place for supplying identity information than just the url.

Unicode domain names are no more dangerous than HTML emails, which can also be used to fool people into clicking links that look like one site but go to another.

Unicode or not, if you type in apple.com on your English keyboard, you will go to the website of Apple, Inc. (Unless your DNS cache has been poisoned.)

> Unicode domain names are no more dangerous than HTML emails, which can also be used to fool people into clicking links that look like one site but go to another.

Except in the case of a carefully selected unicode domain, the address bar will say 'www.paypal.com', not 'www.paypallolimstealingyourlogin.com'.
