
I haven't thought about this deeply, but it seems to me that the evolution of Unicode has left text unparseable into extended grapheme clusters (which I guess are what we'd call "characters") in a forward-compatible way. If so, it seems like we need a new encoding that actually delimits them, just as UTF-8 delimits code points. Then the original sender would determine what counts as a grapheme; and if they don't know, who does?
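To make the UTF-8 comparison concrete, here is a minimal sketch in Python (the function name `split_utf8_code_points` is hypothetical, and it assumes well-formed input). It splits a UTF-8 byte string into code points using nothing but the lead byte of each sequence, with no Unicode tables needed. Grapheme cluster boundaries, by contrast, come from the segmentation rules in UAX #29 plus character property data, and both change between Unicode versions. The byte stream itself says nothing about how code points group into clusters.

```python
def split_utf8_code_points(data: bytes) -> list[bytes]:
    """Split UTF-8 bytes into one byte sequence per code point.

    Hypothetical helper for illustration. Only the lead byte is inspected to
    learn the sequence length; continuation bytes are not validated, so this
    assumes well-formed input.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            n = 1  # 0xxxxxxx: single-byte (ASCII) code point
        elif b >> 5 == 0b110:
            n = 2  # 110xxxxx: two-byte sequence
        elif b >> 4 == 0b1110:
            n = 3  # 1110xxxx: three-byte sequence
        elif b >> 3 == 0b11110:
            n = 4  # 11110xxx: four-byte sequence
        else:
            raise ValueError(f"unexpected byte 0x{b:02x} at offset {i}")
        out.append(data[i:i + n])
        i += n
    return out


# "e" + U+0301 COMBINING ACUTE ACCENT renders as one character (é),
# but UTF-8 only delimits its two code points.
print(split_utf8_code_points("e\u0301".encode("utf-8")))
```

The same holds for emoji joined with U+200D ZERO WIDTH JOINER: a man + ZWJ + woman sequence splits cleanly into three code points, but whether those form a single cluster is decided by segmentation rules, not by anything in the encoding.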



