
Except if it's a compound emoji, which takes multiple code points and therefore more than 4 bytes. So really the takeaway is that "character" is highly ambiguous nowadays and shouldn't be used anymore. Use one of these instead:

* UTF-n code unit
* Unicode code point
* Grapheme cluster
* Extended grapheme cluster

For the last one, you also need to specify which version of Unicode you're talking about, since every new compound emoji changes the definition of "extended grapheme cluster".
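To make the difference concrete, here is a minimal Python sketch that counts the same string at the first two levels. The sample string (a man + ZWJ + woman + ZWJ + girl "family" sequence) and the helper names are my own choices for illustration, not anything from the thread. Grapheme clusters are left out on purpose: as far as I know the Python standard library has no segmentation routine for them, so that step would need a third-party library that implements the Unicode segmentation rules.

```python
# Illustrative sample (an assumption, not from the comment):
# U+1F468 MAN, U+200D ZWJ, U+1F469 WOMAN, U+200D ZWJ, U+1F467 GIRL.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"


def utf8_code_units(s: str) -> int:
    """Number of UTF-8 code units (bytes) needed to encode s."""
    return len(s.encode("utf-8"))


def utf16_code_units(s: str) -> int:
    """Number of UTF-16 code units (2 bytes each) needed to encode s."""
    # "utf-16-le" avoids the byte order mark that plain "utf-16" prepends.
    return len(s.encode("utf-16-le")) // 2


def code_points(s: str) -> int:
    """Number of Unicode code points; Python's len() on str counts these."""
    return len(s)


if __name__ == "__main__":
    print("UTF-8 code units: ", utf8_code_units(family))   # 18
    print("UTF-16 code units:", utf16_code_units(family))  # 8
    print("Code points:      ", code_points(family))       # 5
```

A renderer with emoji support would typically show that string as one glyph, yet it is 5 code points, 8 UTF-16 code units, and 18 bytes in UTF-8, which is why "length in characters" has no single answer.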

