
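"10" is used as the prefix for every byte after the first (the continuation bytes). This gives UTF-8 its self-synchronization property: a decoder that somehow ends up in the middle of a sequence can tell it is not at a character boundary and skip forward to the next one. See the first table in this Wikipedia link: https://en.wikipedia.org/wiki/UTF-8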
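To make the resynchronization idea concrete, here is a minimal sketch in Python. The helper names `is_continuation` and `next_boundary` are made up for illustration, and the code assumes the input is otherwise well-formed UTF-8.

```python
def is_continuation(byte: int) -> bool:
    # Continuation bytes have the bit pattern 10xxxxxx.
    return (byte & 0b1100_0000) == 0b1000_0000


def next_boundary(data: bytes, pos: int) -> int:
    # Skip forward past continuation bytes until we reach the first
    # byte of a character or the end of the data.
    while pos < len(data) and is_continuation(data[pos]):
        pos += 1
    return pos


data = "aé€".encode("utf-8")   # 1-byte, 2-byte and 3-byte characters
start = next_boundary(data, 2)  # offset 2 is in the middle of "é"
print(data[start:].decode("utf-8"))
```

Starting from an offset inside "é", the scan lands on the first byte of "€", so the rest of the buffer decodes cleanly.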



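Additionally, the first byte tells you the length: 110xxxxx means the character is two bytes, 1110xxxx three bytes, and 11110xxx four bytes, i.e., number of bytes in the character = number of leading 1 bits. (Single-byte ASCII characters are 0xxxxxxx, with no leading 1.)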
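A rough sketch of reading the length off the first byte, in Python. The function name `sequence_length` is hypothetical, and it only inspects the prefix bits; it does not validate the rest of the sequence.

```python
def sequence_length(lead: int) -> int:
    # Return how many bytes the character occupies, judged only
    # by the prefix bits of its first byte.
    if (lead & 0b1000_0000) == 0b0000_0000:
        return 1  # 0xxxxxxx: single-byte (ASCII)
    if (lead & 0b1110_0000) == 0b1100_0000:
        return 2  # 110xxxxx
    if (lead & 0b1111_0000) == 0b1110_0000:
        return 3  # 1110xxxx
    if (lead & 0b1111_1000) == 0b1111_0000:
        return 4  # 11110xxx
    # 10xxxxxx is a continuation byte; 11111xxx is not used by UTF-8.
    raise ValueError(f"not a valid first byte: {lead:#04x}")


for ch in "aé€😀":
    encoded = ch.encode("utf-8")
    print(ch, len(encoded), sequence_length(encoded[0]))
```

For each sample character the length derived from the first byte should match the actual encoded length.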



