
> Java (and JavaScript) is outdated: if you were designing them today, their strings would not be UTF-16.

Except that UTF-16 makes a lot of sense on Windows, which won’t change anytime soon.

Since you always have to deal with noncharacters, initial vs. non-initial BOMs, isolated combining characters, etc., and you have to validate your inputs anyway (meaning you almost always need a failure path for unvalidated strings), I’m not sure (unpaired) surrogates constitute that much more of a complication.
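
For illustration, here is a rough sketch of what the surrogate part of that validation might look like. It assumes the input is already a sequence of 16-bit code units; the function name `find_unpaired_surrogates` is made up for this example and is not from any library.

```python
def find_unpaired_surrogates(units):
    """Return the indices of UTF-16 code units that are unpaired surrogates.

    `units` is assumed to be a sequence of integers in the range 0..0xFFFF.
    """
    bad = []
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High (lead) surrogate: must be followed by a low (trail) surrogate.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2  # well-formed pair, skip both units
                continue
            bad.append(i)
        elif 0xDC00 <= u <= 0xDFFF:
            # Low (trail) surrogate that was not consumed by a preceding lead.
            bad.append(i)
        i += 1
    return bad
```

An empty result means the sequence is well-formed UTF-16 as far as surrogates go; a non-empty one feeds the same failure path you would already need for the other checks.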




It probably won't change soon, but Microsoft (or at least some teams within it) has acknowledged the mistake of UTF-16, and has taken some steps toward UTF-8:

http://www.oilshell.org/blog/2023/06/surrogate-pair.html#fut...



