
Han unification is the primary reason not to force UTF-8, especially in a language that has strong roots in Japan. (Sorry to be brief, I'm on a smartphone. Googling should provide sufficient answers.)

Han unification is one problem; another problem is that not all encodings can be round-tripped losslessly through Unicode. Shift-JIS, for example, has multiple separate characters that convert into the same character in Unicode, and therefore cannot be converted back into their original form reliably.
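
To make the round-trip problem concrete, here is a minimal sketch in Python. It assumes Python's built-in "cp932" codec (Microsoft's Shift-JIS variant), which as far as I know is where the duplicate mappings live; the stricter "shift_jis" codec should reject the second byte sequence outright. The helper name round_trip is just for illustration.

```python
# Two CP932 byte sequences for the square root sign: one from the
# JIS X 0208 symbol row, one from the NEC extension row (row 13).
JIS_SQRT = b"\x81\xe3"
NEC_SQRT = b"\x87\x95"


def round_trip(raw: bytes, encoding: str = "cp932") -> bytes:
    """Decode legacy bytes to a Unicode str, then encode them back."""
    return raw.decode(encoding).encode(encoding)


# Both byte sequences decode to the same single Unicode character.
decoded = {raw: raw.decode("cp932") for raw in (JIS_SQRT, NEC_SQRT)}

# Encoding has to pick one byte sequence for that character, so at most
# one of the two inputs can come back unchanged.
restored = {raw: round_trip(raw) for raw in (JIS_SQRT, NEC_SQRT)}

for raw in (JIS_SQRT, NEC_SQRT):
    code_point = f"U+{ord(decoded[raw]):04X}"
    print(raw.hex(), "->", code_point, "->", restored[raw].hex())
```

Whichever byte sequence the encoder prefers, the other one is silently rewritten, which is exactly what breaks byte-for-byte comparisons, checksums, and file names after a trip through Unicode.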

The Shift-JIS issue seems to be a fault in the design of Shift-JIS, resulting in even symbols like square root not having a canonical encoding. At what point do you just draw the line and tell developers that they need to deal with such things themselves? No one is taking away byte arrays. Fragmenting the userbase seems suboptimal.

Wow. That's pretty ugly, thanks for the info. But for people not wanting to use Unicode... Does that not mean they simply cannot use strings in Java, .NET, Windows (to some extent), etc.? It just seems sorta infeasible at this point to not use Unicode. And according to Wikipedia, Unicode now has a way to select which language variant of a unified character to use. So is unification not as big a problem if people use selectors?
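
For anyone wondering what those selectors look like in practice, here is a rough sketch in Python. It assumes the mechanism in question is ideographic variation sequences: a unified ideograph followed by a variation selector from the supplementary range (U+E0100 to U+E01EF). The character 葛 (U+845B) is a commonly cited example; whether a particular sequence is actually registered and rendered differently depends on the Ideographic Variation Database and the font, which this code does not check. The helper names are hypothetical.

```python
import unicodedata

BASE = "\u845b"      # 葛, a unified CJK ideograph
VS17 = "\U000E0100"  # first supplementary variation selector
VS18 = "\U000E0101"  # second supplementary variation selector

# A variation sequence is just the base character plus a selector.
variant_a = BASE + VS17
variant_b = BASE + VS18


def is_variation_selector(ch: str) -> bool:
    """True for the standard (U+FE00..U+FE0F) and supplementary
    (U+E0100..U+E01EF) variation selector ranges."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def strip_selectors(text: str) -> str:
    """Drop variation selectors, e.g. before a loose comparison."""
    return "".join(ch for ch in text if not is_variation_selector(ch))


print(unicodedata.name(VS17))
print(len(variant_a))          # two code points, one visible glyph
print(variant_a == variant_b)  # the raw strings differ
print(strip_selectors(variant_a) == strip_selectors(variant_b))
```

So the selector approach keeps the variant distinction in plain text, but only if every layer (input method, storage, comparison, font) preserves and understands the extra code point. Software that ignores selectors just sees the unified character.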

And what's the practical alternative? Keeping things in country-specific encodings?
