
Personally no, but you have to handle 0 correctly (well, mainly, consistently), otherwise you end up with an attack vector: one part of the application stops at the 0 (e.g. string equality), and a later step assumes the conclusion of the previous part but continues past the 0, reading bytes that should have been considered earlier (but weren't).


Exactly, and moving from ASCII to UTF-8 means you get to keep that consistency: 0x00 means 'End of string' in ASCII and it means 'End of string' in UTF-8. No change. Never a miscommunication. No possibility of old software getting confused on this issue. Any code which had its last buffer overrun flushed out in 1983 is still free of buffer overruns in 2013.

And, if you really need to represent codepoint 0 in strings, you can use Java's Modified UTF-8, where codepoint 0 is represented by the byte sequence 0xC0, 0x80. (This isn't valid UTF-8 because in straight UTF-8, every codepoint must be represented by its shortest possible representation.)

http://en.wikipedia.org/wiki/UTF-8#Modified_UTF-8


> it means 'End of string' in UTF-8

No it doesn't, unless you are saying that one should treat it like that. But null termination is as dangerous[1,2] with UTF-8 as it is with ASCII and should be avoided as much as possible anyway. Also, ASCII doesn't mandate that \0 is end-of-string; that's just a "convention" from C.

(Did you notice that my original comment actually included the exact modified UTF-8 link you provided?)

[1]: http://cwe.mitre.org/data/definitions/170.html

[2]: http://projects.webappsec.org/w/page/13246949/Null%20Byte%20...



