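I'm sorry if I was unclear, but my point was that when you receive a string from the Windows API, you cannot assume it is valid UTF-16. Windows strings are just sequences of 16-bit code units and may contain unpaired surrogates. Converting such a string to UTF-8 is therefore potentially lossy, because a strict conversion must either fail or substitute a replacement character (U+FFFD). If you then convert it back from UTF-8 to UTF-16 and pass it to the WinAPI, you'll get unexpected results, such as a path that no longer names the same file. That is why I think converting back and forth all the time is risky.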
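A small Python illustration of the lossy round trip, with Python's codecs standing in for a UTF-8 based layer. The code units are a hypothetical WinAPI result ("a", an unpaired high surrogate, "b"), not output from a real API call, and the helper names are made up for the example:

```python
import struct

def units_to_bytes(units: list[int]) -> bytes:
    """Pack 16-bit code units as little-endian bytes, like a WCHAR buffer."""
    return struct.pack(f"<{len(units)}H", *units)

def bytes_to_units(data: bytes) -> list[int]:
    """Unpack little-endian bytes back into 16-bit code units."""
    return list(struct.unpack(f"<{len(data) // 2}H", data))

# Hypothetical WinAPI result: "a", an unpaired high surrogate, "b".
original = [0x0061, 0xD800, 0x0062]
raw = units_to_bytes(original)

# A strict UTF-16 decode rejects the unpaired surrogate outright...
try:
    raw.decode("utf-16-le")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# ...so to get UTF-8 at all, something must be substituted (here U+FFFD).
utf8 = raw.decode("utf-16-le", errors="replace").encode("utf-8")

# Converting back to UTF-16 for the next API call no longer matches.
back = bytes_to_units(utf8.decode("utf-8").encode("utf-16-le"))

print(strict_ok)         # False
print(back == original)  # False
```

The original surrogate is gone after the first conversion, so no amount of care on the way back can restore it.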

This is one reason the WTF-8[0] encoding was created: it is a UTF-8-like encoding that can also represent ill-formed UTF-16, such as unpaired surrogates, so converting to it and back is lossless.

[0] https://simonsapin.github.io/wtf-8/
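A minimal Python sketch of the idea behind WTF-8, written from the description at [0] rather than taken from any library. Well-formed surrogate pairs are combined and encoded as ordinary 4-byte UTF-8, while unpaired surrogates are encoded as 3-byte sequences that strict UTF-8 would reject. The function names are hypothetical, and the decoder does only minimal validation (it does not reject overlong forms, for example):

```python
import struct

def str_to_units(s: str) -> list[int]:
    """UTF-16 code units of a well-formed Python string."""
    data = s.encode("utf-16-le")
    return list(struct.unpack(f"<{len(data) // 2}H", data))

def _encode_code_point(cp: int) -> bytes:
    """Generalized UTF-8: like UTF-8, but surrogate code points are allowed."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])

def utf16_to_wtf8(units: list[int]) -> bytes:
    """Encode possibly ill-formed UTF-16 code units as WTF-8."""
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        nxt = units[i + 1] if i + 1 < len(units) else None
        if 0xD800 <= u <= 0xDBFF and nxt is not None and 0xDC00 <= nxt <= 0xDFFF:
            # Well-formed pair: combine into one supplementary code point.
            cp = 0x10000 + ((u - 0xD800) << 10) + (nxt - 0xDC00)
            i += 2
        else:
            # BMP character or unpaired surrogate: keep the value as-is.
            cp = u
            i += 1
        out += _encode_code_point(cp)
    return bytes(out)

def wtf8_to_utf16(data: bytes) -> list[int]:
    """Decode WTF-8 back to 16-bit code units (minimal validation only)."""
    units: list[int] = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            cp, n = b, 1
        elif b >> 5 == 0b110:
            cp, n = b & 0x1F, 2
        elif b >> 4 == 0b1110:
            cp, n = b & 0x0F, 3
        elif b >> 3 == 0b11110:
            cp, n = b & 0x07, 4
        else:
            raise ValueError(f"invalid lead byte at offset {i}")
        if i + n > len(data):
            raise ValueError("truncated sequence")
        for k in range(1, n):
            if (data[i + k] & 0xC0) != 0x80:
                raise ValueError(f"invalid continuation byte at offset {i + k}")
            cp = (cp << 6) | (data[i + k] & 0x3F)
        i += n
        if cp >= 0x10000:
            # Split supplementary code points back into a surrogate pair.
            cp -= 0x10000
            units += [0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)]
        else:
            units.append(cp)
    return units

# An ill-formed sequence survives the round trip unchanged.
ill_formed = [0x0061, 0xD800, 0x0062]
encoded = utf16_to_wtf8(ill_formed)
print(encoded)                               # b'a\xed\xa0\x80b'
print(wtf8_to_utf16(encoded) == ill_formed)  # True

# For well-formed UTF-16, the output is byte-for-byte ordinary UTF-8.
text = "h\u00e9llo \U0001F600"
print(utf16_to_wtf8(str_to_units(text)) == text.encode("utf-8"))  # True
```

Because the unpaired surrogate is kept as a value instead of being replaced, converting back yields exactly the original code units, which is the property you need before handing the string back to the WinAPI.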



