
> So, the question is rather: why not a design that doesn't need "normalization" and runes, code points, and all that...

Because language is messy. At some point you have to get into the raw philosophy of language, and then it's not just a technical problem but a political and an emotional one.

Take accents as one example. In English, a diaeresis is a rare but sometimes useful mark for breaking up what would otherwise read as a digraph: "coöperate" is pronounced with two separate O sounds, not the single "oo" of "chicken coop". The letter stays the same; it just carries bonus information. In German, an umlauted letter (ö versus o) is considered an entirely different letter, with a different pronunciation and a different place in alphabetical order, further complicated by conversions to digraphs in some situations, such as ö to oe.
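A small Python sketch (standard-library `unicodedata` only) of what this means for text processing, under the assumption that you're comparing strings in code. Unicode encodes the English diaeresis and the German umlaut as the same mark, and a word like "coöperate" can be stored either precomposed or as a base letter plus a combining mark. The sample strings are made up for illustration; the ö-to-oe check is there only to confirm that no normalization form performs that German spelling conversion.

```python
import unicodedata

# "coöperate" written two ways that render identically:
composed = "co\u00f6perate"      # ö as a single code point, U+00F6
decomposed = "coo\u0308perate"   # plain o + U+0308 COMBINING DIAERESIS

# One name for the mark, whatever a given language thinks it means.
print(unicodedata.name("\u00f6"))      # LATIN SMALL LETTER O WITH DIAERESIS

# A raw comparison sees different code point sequences...
print(composed == decomposed)          # False
print(len(composed), len(decomposed))  # 9 10

# ...until both sides are normalized to the same form.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(unicodedata.normalize("NFD", composed) == decomposed)  # True

# The German convention ö -> oe is a spelling rule, not a Unicode
# normalization: no form turns "schön" into "schoen".
word = "sch\u00f6n"
print([unicodedata.normalize(f, word) == "schoen"
       for f in ("NFC", "NFD", "NFKC", "NFKD")])  # [False, False, False, False]
```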

Which language is "right"? The one that treats the diaeresis as a mere modifier, or the one that treats an accented letter as distinct from the unaccented one? There isn't a right or wrong here; there are just different perspectives, different philosophies, huge histories of language evolution and divergence, and lots of people reusing similar-looking concepts for vastly different needs.

Similarly, ñ is a single letter in Spanish, but in another language the ~ may be a tone marker that matters to the word's pronunciation: a modifier on the letter rather than a letter of its own.
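Unicode sidesteps the "letter or modifier?" question by offering both views: a precomposed character, plus a canonical decomposition into base letter and combining mark. A sketch using Python's `unicodedata`, comparing Spanish ñ with Vietnamese ẽ (where the tilde marks the ngã tone); the two characters are just illustrative picks.

```python
import unicodedata

# Spanish ñ and Vietnamese ẽ, both stored precomposed.
for ch in ("\u00f1", "\u1ebd"):
    # NFD splits each into a base letter plus the same combining mark.
    base, mark = unicodedata.normalize("NFD", ch)
    print(f"{unicodedata.name(ch)} -> {base!r} + U+{ord(mark):04X} "
          f"{unicodedata.name(mark)} (category {unicodedata.category(mark)})")
# LATIN SMALL LETTER N WITH TILDE -> 'n' + U+0303 COMBINING TILDE (category Mn)
# LATIN SMALL LETTER E WITH TILDE -> 'e' + U+0303 COMBINING TILDE (category Mn)

# Nothing in this data records whether a language treats the result
# as one letter or as a letter plus a tone mark.
```

Category `Mn` means "nonspacing mark"; that is as far as the character database goes, so whether the result counts as one letter is left to language-specific rules such as collation.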

Then there are the overlaps, where different alphabets diverged from common origins. Are the letters that still look alike the same letters? [1]
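A sketch of how Unicode answers that for one familiar case (Python `unicodedata`; the choice of letters is just an illustration): Latin A, Greek Alpha and Cyrillic A are separate code points, and even compatibility normalization keeps them apart.

```python
import unicodedata

# Three capital letters that look alike in most fonts.
lookalikes = ["A", "\u0391", "\u0410"]

for ch in lookalikes:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+0041 LATIN CAPITAL LETTER A
# U+0391 GREEK CAPITAL LETTER ALPHA
# U+0410 CYRILLIC CAPITAL LETTER A

# NFKC, the most aggressive normalization form, still leaves three letters.
print(len({unicodedata.normalize("NFKC", ch) for ch in lookalikes}))  # 3
```

Catching such look-alikes is left to separate machinery, such as the confusables data in Unicode Technical Standard #39, rather than to normalization.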

Math is a language with a merged alphabet of Latin letters, Arabic numerals, Greek letters, shorthands derived from monastic manuscripts, etc. Is the modern Greek Pi still the same as the mathematical symbol Pi? Do they need different representations? Do you need to distinguish, say, in a modern Greek discussion of mathematics, between Pi as a letter of the alphabet and Pi as a mathematical symbol?
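For Pi specifically, Unicode's answer is "both": the mathematical alphabets get their own code points, linked back to the Greek letter only by a compatibility mapping, while the n-ary product sign has no link to capital Pi at all. A minimal Python sketch with `unicodedata`; the four characters are picked for illustration.

```python
import unicodedata

# Greek pi, math italic pi, Greek capital Pi, and the n-ary product sign.
for ch in ("\u03c0", "\U0001D70B", "\u03a0", "\u220f"):
    folded = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> U+{ord(folded):04X}")
# U+03C0 GREEK SMALL LETTER PI -> U+03C0
# U+1D70B MATHEMATICAL ITALIC SMALL PI -> U+03C0
# U+03A0 GREEK CAPITAL LETTER PI -> U+03A0
# U+220F N-ARY PRODUCT -> U+220F

# Canonical normalization (NFC) keeps the math italic pi distinct.
print(unicodedata.normalize("NFC", "\U0001D70B") == "\U0001D70B")  # True
```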

These are just the easy examples from the mostly EFIGS (English, French, Italian, German, Spanish) space that most of HN will be familiar with. Multiply those sorts of philosophical complications across all the world's written languages, the diversity of Asian scripts, the wonder of ancient scripts, and, yes, the modern joy of emoji. Even "normalization" is a hack: you don't care about the philosophical meaning of a symbol, you just need to know whether two sequences should count as the same. And even then Unicode defines several normalization forms (NFC, NFD, NFKC, NFKD), because not everyone agrees which things are alike either; that changes with the perspective of each language.
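To make the "different kinds of normalization" concrete: Unicode's four forms split along two axes, composed versus decomposed (C/D) and canonical versus compatibility (plain/K). A small Python sketch on three sample strings chosen for illustration:

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

# Illustrative inputs: a decomposed é, the "fi" ligature, and a circled digit.
samples = {
    "e + combining acute": "e\u0301",
    "fi ligature (U+FB01)": "\ufb01",
    "circled one (U+2460)": "\u2460",
}

for label, text in samples.items():
    # ascii() shows the code points instead of rendering the glyphs.
    results = "  ".join(f"{form}={ascii(unicodedata.normalize(form, text))}"
                        for form in FORMS)
    print(f"{label:22} {results}")
```

NFC and NFD only merge canonically equivalent sequences, like the two spellings of é. NFKC and NFKD additionally fold ligatures and decorated forms such as ① into plain characters, which is useful for search but throws away a distinction someone may have cared about.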

[1] An excellent Venn diagram: https://en.wikipedia.org/wiki/File:Venn_diagram_showing_Gree...

