Hacker News new | past | comments | ask | show | jobs | submit login

Sure, but removing those wouldn't make Unicode any simpler, they're just character sets. The GP is complaining about things like combining characters and diacritic modifiers, which make Unicode "ugly" but are necessary if you want to represent real languages used by billions of people.



I’m actually complaining about more “advanced” features like hiding text (see my comment above) or zalgoing it.

And of course endless variations of skin color and gender of three people in a pictogram of a family or something, which is purely a product of a specific subculture that doesn’t have anything in common with text/charset.

If unicode cared about characters, which happens to be an evolving but finite set, it would simply include them all, together with exactly two direction specifiers. Instead it created a language/format/tag system within itself to build characters most of which make zero sense to anyone in the world, except for grapheme linguists, if that job title even exists.

It will eventually overengineer itself into a set bigger than the set of all real characters, if not yet.

Practicality and implications of such system is clearly demonstrated by the $subj.


"Zalgoing" text is just piling up combining marks, but there are plenty of real-world languages that require more than one combining mark per character to be properly spelled. Vietnamese is a rather extreme example.


You're right, of course. The point was glibly making was that Unicode has a lot of stuff in it, and you're not necessarily stomping on someone's ability to communicate by removing part if it.

I'm also concerned by having to normalize representations that use combining character etc. but I will add that there are assumptions that you can break just by including weird charsets.

For example the space character in Ogham, U+1680 is considered whitespace, but may not be invisible, ultimately because of the mechanics of writing something that's like the branches coming off a tree though carved around a large stone. That might be annoying to think about when you're designing a login page.


I mean, we can just make the languages simpler? We can also remove all the hundred different ways to pronounce English sounds. All elementary students will thank you for it xD


You can make a language simpler but old books still exist. I guess if we burn all old books and disallow a means to print these old books again, people would be happy?


Reprint them with new spelling? We have 500 year old books that are unreadable. 99.99% of all books published will not be relevant to anyone that isn’t consuming them right at that moment anyway.

Lovers can read the lord of the rings in the ‘original’ spelling.


The point is that you still want the universal encoding to be able to represent such texts.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: