
> People’s names are all mapped in Unicode code points

Curious how the author recommends we avoid this one



Actually, there is a case of this in China where (previously) neither Unicode nor the Chinese standard GBK could encode Ma Cheng's (马𩧢) name. If you use Android, you might even see this in action, since Noto still does not include a glyph for this character.

The current solution, going forward, is to restrict which characters are acceptable in names.
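One way a system might catch the GBK problem before it bites is to try encoding each character of a name and collect the ones that fail. This is a minimal sketch, not anyone's actual validation code: `unrepresentable_chars` is a hypothetical helper, and it uses 𠮷 (U+20BB7) as a stand-in rare character, because it is a CJK Extension B character outside the Basic Multilingual Plane and therefore outside GBK.

```python
def unrepresentable_chars(name: str, encoding: str = "gbk") -> list[str]:
    """Return the characters of `name` that `encoding` cannot encode, in order."""
    bad = []
    for ch in name:
        try:
            ch.encode(encoding)  # raises UnicodeEncodeError if the codec lacks ch
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


print(unrepresentable_chars("马云"))                 # []
print(unrepresentable_chars("马\U00020BB7"))         # ['𠮷']
print(unrepresentable_chars("马\U00020BB7", "utf-8"))  # []
```

Whether a non-empty result should reject the name (the restriction approach) or trigger some fallback is a policy choice the sketch leaves open.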


Safari on iOS 18.1 doesn’t render the second character properly either. (Unless it’s a 口 :)


There are lots of Chinese people whose last names use rare characters or variants that are not mapped in Unicode. Taiwan, which retains traditional characters and hasn't forced people to standardize as much as mainland China and Japan, is particularly notorious for this. There's also the whole Han unification debacle, where similar but not always identical characters used in Chinese/Japanese/Korean have been mushed together.
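Han unification also shows up in normalization. Most ideographs in the CJK Compatibility Ideographs block (U+F900 to U+FAFF) exist for round-trip compatibility with legacy standards and have canonical decompositions to unified ideographs, so NFC or NFD normalization silently replaces them. A small sketch using only Python's standard `unicodedata` module; `nfc_changes` is a hypothetical helper name.

```python
import unicodedata


def nfc_changes(name: str) -> bool:
    """Return True if NFC normalization would alter the code points of `name`."""
    return unicodedata.normalize("NFC", name) != name


# U+F900 is a CJK compatibility ideograph that canonically decomposes to U+8C48.
compat = "\uF900"
unified = unicodedata.normalize("NFC", compat)
print(f"U+{ord(compat):04X} -> U+{ord(unified):04X}")  # U+F900 -> U+8C48
print(nfc_changes(compat))    # True
print(nfc_changes("\u8C48"))  # False
```

If a system normalizes names before storing or comparing them, it may be worth also keeping the string exactly as entered, so that a distinction the person cares about is not lost.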

Support for some Indic scripts also remains quite patchy: https://modelviewculture.com/pieces/i-can-text-you-a-pile-of...


I'm guessing the solution is to stop using Unicode strings as identifiers and assign a hashed integer account number instead. Treat the name's Unicode text as opaque data for human display only, like account pictures. I think many web systems aimed mainly at CJK users already do this.
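A minimal sketch of the opaque-identifier idea. `Account` and `AccountStore` are hypothetical names, and the sketch assumes a randomly drawn 63-bit integer rather than a hash of the name, since hashing the name would give the same ID to everyone who shares a name and would change whenever the name is corrected. The display name is stored exactly as entered and is never used as a lookup key.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    account_id: int    # opaque identifier; the only value used as a key
    display_name: str  # stored exactly as entered, never parsed or compared


class AccountStore:
    """Hypothetical in-memory store keyed by a random integer, not by name."""

    def __init__(self) -> None:
        self._by_id: dict[int, Account] = {}

    def create(self, display_name: str) -> Account:
        # Draw random 63-bit IDs until an unused one is found.
        while True:
            account_id = secrets.randbits(63)
            if account_id not in self._by_id:
                break
        account = Account(account_id, display_name)
        self._by_id[account_id] = account
        return account

    def get(self, account_id: int) -> Account:
        return self._by_id[account_id]


store = AccountStore()
a = store.create("马\U00020BB7")
b = store.create("马\U00020BB7")  # same name, different account
print(a.account_id != b.account_id)  # True
```

Because lookups go through `account_id`, a name containing a character that some encoding or font cannot handle only affects how the account is displayed, not whether it can be found.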



