
You're demonstrably wrong.

Most complexity in Unicode derives from:

  - real complexity in human scripts
  - politics

neither of which Unicode could have avoided. Complexity in human scripts necessarily leads to complexity in Unicode. Not having Unicode at all would be far worse than any complexity Unicode has: you'd have to know the codeset/encoding of every string/file/whatever, and never lose track of it. Keeping politics out of Unicode is a pipe dream, and politics has unavoidably led to some duplication.

Confusability is NOT a problem created by Unicode, but by humans. Even just within the Latin character set there are confusables (like 1 and l, which often look identical -- so much so that early typewriters exploited the similarity to reduce part count, omitting a dedicated 1 key).
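To make the confusability point concrete, here's a minimal sketch using Python's standard unicodedata module: a Cyrillic 'а' renders identically to a Latin 'a' in most fonts, yet the two strings compare unequal.

```python
# Sketch: a Latin/Cyrillic confusable pair.
# U+0430 CYRILLIC SMALL LETTER A looks like U+0061 LATIN SMALL LETTER A,
# but they are distinct codepoints and compare unequal.
import unicodedata

latin_a = "a"          # U+0061
cyrillic_a = "\u0430"  # U+0430

print(latin_a == cyrillic_a)         # False
print(unicodedata.name(latin_a))     # LATIN SMALL LETTER A
print(unicodedata.name(cyrillic_a))  # CYRILLIC SMALL LETTER A
```

This is exactly the kind of lookalike that homograph attacks on domain names exploit, and it exists because the scripts themselves overlap visually, not because of any encoding decision.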

Nor were things like normalization avoidable, since combining codepoints predate Unicode: ASCII on teletypes was effectively a multibyte Latin character set where, for example, á could be written as a<BS>' (or '<BS>a), where <BS> is backspace. Many such compositions survive today -- for example, X11 Compose key sequences are based on those ASCII compositions (sadly, many Windows compositions aren't). The moment you have combining marks, you have an equivalence (normalization) problem.
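The equivalence problem above is easy to demonstrate with Python's standard unicodedata module: the precomposed á and the combining-mark spelling are byte-different but canonically equivalent, and normalization maps between them.

```python
# Sketch: the same "á" spelled two ways.
import unicodedata

precomposed = "\u00e1"   # U+00E1 LATIN SMALL LETTER A WITH ACUTE (NFC form)
combining = "a\u0301"    # 'a' + U+0301 COMBINING ACUTE ACCENT (NFD form)

print(precomposed == combining)  # False: different codepoints, same glyph
print(unicodedata.normalize("NFC", combining) == precomposed)   # True
print(unicodedata.normalize("NFD", precomposed) == combining)   # True
```

Any system that compares, hashes, or sorts strings has to pick a normalization form first, which is the complexity being discussed -- but it falls directly out of allowing combining marks at all.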

Emojis did add a bunch of complexity, but emojis are essentially a new script, one created by developers at Japanese mobile carriers. Eventually the Unicode Consortium was bound to standardize emojis for both technical and political reasons.
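Part of the complexity emojis added is that one visible emoji can be a sequence of several codepoints. A sketch of this, using the standard family emoji built from zero-width joiners (ZWJ):

```python
# Sketch: one rendered emoji, five codepoints.
# MAN + ZWJ + WOMAN + ZWJ + GIRL displays as a single family glyph
# on platforms that support the ZWJ sequence.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

print(len(family))  # 5 codepoints for what renders as one emoji
for ch in family:
    print(hex(ord(ch)))
```

So "how long is this string?" and "what is one character?" become genuinely hard questions -- but that mirrors how the emoji script itself was designed, not an arbitrary Consortium choice.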

Of course, some mistakes were made: UCS-2/BMP, UTF-16, CJK unification, and others. But the lion's share of Unicode's complexity stems not from those mistakes but from the natural causes above: the complexity of human scripts, and politics.
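The UCS-2 mistake is visible in any language today: the assumption that 16 bits per character would suffice broke once codepoints passed U+FFFF, and UTF-16 papers over it with surrogate pairs. A small sketch:

```python
# Sketch: a codepoint above the BMP needs two 16-bit code units in UTF-16.
ch = "\U0001F600"                  # U+1F600 GRINNING FACE, outside the BMP

encoded = ch.encode("utf-16-be")
print(encoded.hex())               # d83dde00: high surrogate + low surrogate
print(len(encoded))                # 4 bytes, not the 2 that UCS-2 assumed
```

Languages and APIs that baked in the 16-bit assumption (counting "characters" as 16-bit units) still carry that legacy, which is why surrogate handling shows up as Unicode complexity even though it's really a consequence of an early design bet.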


And the alternative would be what exactly?
