
> The main reason is probably that ISO-8859-9 / Windows-1254 already only had a single I and i so it would be impossible to know which Unicode character to convert them to.

The alternative codepoints I proposed could coexist with the existing ones. After all, Unicode already offers multiple ways of representing what humans consider to be the "same" character.

For example, 'İ' ≡ uc( lc 'İ' ) can be preserved by representing the result of lc 'İ' as "i + COMBINING DOT ABOVE", which I found to be rather clever[1].
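To make that round trip concrete, here is a minimal Python sketch. It assumes Python's `str.lower()` / `str.upper()` (the default, locale-independent Unicode case mappings) as stand-ins for the `lc` / `uc` above, and the helper name `round_trips_upper` is my own, purely for illustration.

```python
import unicodedata

def round_trips_upper(ch: str) -> bool:
    """Hypothetical helper: is uc(lc(ch)) canonically equivalent to ch?

    Both sides are compared after NFC normalization, which is what
    "equivalent" (rather than "identical") means here.
    """
    restored = ch.lower().upper()
    return unicodedata.normalize("NFC", restored) == unicodedata.normalize("NFC", ch)

capital_dotted_i = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE

# Default lowercasing yields two code points: 'i' followed by a combining dot.
lowered = capital_dotted_i.lower()
print([unicodedata.name(c) for c in lowered])

# Uppercasing again gives 'I' + combining dot, which NFC composes back to U+0130.
print(round_trips_upper(capital_dotted_i))
```

Note that the equality only holds up to canonical equivalence: without the normalization step, the raw result is the two-codepoint sequence 'I' + COMBINING DOT ABOVE rather than the single codepoint U+0130.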

However, I am not aware of a similar trick enabling me to preserve 'ı' ≡ lc( uc 'ı' ).
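The failing direction can be sketched the same way. Again this assumes Python's default `str.upper()` / `str.lower()` mappings in place of `uc` / `lc`; the variable names are only illustrative.

```python
import unicodedata

dotless_i = "\u0131"  # LATIN SMALL LETTER DOTLESS I

# Default uppercasing maps dotless i to plain ASCII 'I' ...
uppered = dotless_i.upper()

# ... and default lowercasing maps 'I' to plain ASCII 'i', so the
# information that we started from the dotless letter is gone.
restored = uppered.lower()

print(unicodedata.name(uppered), "->", unicodedata.name(restored))

# Normalization cannot repair this: 'i' and dotless i are not
# canonically equivalent.
survives = unicodedata.normalize("NFC", restored) == unicodedata.normalize("NFC", dotless_i)
print(survives)
```

Unlike the dotted capital, there is no combining sequence for the result to fall back on, so the round trip loses the distinction.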

For reference, I have been dealing with Turkish character issues since the 80s, when I was typesetting texts containing math, Turkish, and several European languages in WordStar. There was zero support for Turkish at the time, so I had to hack my own keyboard drivers. Then there was a period when every computer had its own convention.

A lot of people used my hacks for a long time without even realizing it. So, things have improved, but I just can't figure out why a couple of extra codepoints could not be reserved for logically distinct concepts. It would make nothing worse, and it would improve things in situations where one can take advantage of them.

[1]: https://www.nu42.com/2017/02/for-your-eyes-only.html



