Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

For what reason other than perhaps font rendering code do you need to index code points in a string? Everything you think you need code points for (character counting, truncation, etc) you actually need grapheme clusters.

I wish Python went with UTF-8 instead of their multi width internal representation.



The problem IMHO is more not regressing on lots of already-written code which assumes O(1) indexing, and less a problem of principle.


Most code works perfectly fine indexing bytes in a UTF-8 string. Anything that looks for stuff in the ASCII range such as parsers would not need to be changed.

Python 3 already required all code to be looked over because of the string literal change so it wouldn't be much different.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: