
Should UTF-16 be considered harmful? - DanielRibeiro
http://programmers.stackexchange.com/questions/102205/should-utf-16-be-considered-harmful
======
pixelcort
The problem's not so much UTF-8 vs UTF-16 as the fact that there is so much
software out there that can't handle Unicode code points outside the BMP.

For example: <https://github.com/rails/rails/issues/3727>

Another example is MySQL. Its utf8 charset stores at most 3 bytes per
character, so it won't work for anything outside the BMP; you've got to use
utf8mb4 instead. (OT, but won't this be a problem again if and when we get
Unicode characters that need 5 bytes in UTF-8?)
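
To make the BMP boundary concrete, here is a small Python sketch (the helper names `utf8_len` and `utf16_units` are made up for illustration) that counts how many UTF-8 bytes and UTF-16 code units a single character needs. Anything above U+FFFF takes 4 bytes in UTF-8 and a surrogate pair in UTF-16, which is where a 3-byte-per-character limit falls over.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes the character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def utf16_units(ch: str) -> int:
    """Number of 16-bit code units the character occupies in UTF-16."""
    # utf-16-le avoids the 2-byte BOM that plain "utf-16" would prepend.
    return len(ch.encode("utf-16-le")) // 2


samples = {
    "LATIN SMALL LETTER A (U+0061)": "\u0061",
    "EURO SIGN (U+20AC, inside the BMP)": "\u20ac",
    "GRINNING FACE (U+1F600, outside the BMP)": "\U0001F600",
}

for name, ch in samples.items():
    print(f"{name}: {utf8_len(ch)} UTF-8 bytes, {utf16_units(ch)} UTF-16 units")
```

Only the last sample needs a fourth UTF-8 byte and a second UTF-16 code unit.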

~~~
prodigal_erik
Didn't they declare that Unicode ends at U+10FFFF, which is about 200x all
characters we've allocated so far for all covered languages?
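
A quick way to check what the U+10FFFF cap means for UTF-8, sketched in Python (the constant name `MAX_CODE_POINT` is just for illustration): the highest code point still fits in 4 bytes, and the runtime refuses to build anything past it.

```python
import sys

MAX_CODE_POINT = 0x10FFFF  # upper bound of the Unicode code space

# CPython exposes the same limit.
print(hex(sys.maxunicode))

# The highest code point still encodes to 4 bytes in UTF-8.
encoded = chr(MAX_CODE_POINT).encode("utf-8")
print(len(encoded), encoded.hex())

# One past the limit is not a code point at all.
try:
    chr(MAX_CODE_POINT + 1)
    beyond_limit_ok = True
except ValueError:
    beyond_limit_ok = False
print(beyond_limit_ok)
```

So as long as that cap holds, a 5-byte UTF-8 sequence never comes up.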

------
ToastOpt
I think the most amusing case of UTF-16 considered harmful would possibly be
the choice, in WPF/XAML, to use UTF-16 code unit offsets to reference
positions in XML strings while also supporting UTF-8 XML encoding. It can
create the situation where, even if the input and output are both UTF-8,
you'll nonetheless need to transcode into and then back out of UTF-16 in
order to split the string at the right place.

Fun.
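
For what it's worth, the offset mismatch can be handled without materializing a UTF-16 copy, at the cost of walking the string. This is only a rough Python sketch of the idea, not anything WPF does; the function name `utf16_offset_to_utf8_offset` is hypothetical, and it assumes the text contains no lone surrogates.

```python
def utf16_offset_to_utf8_offset(text: str, utf16_offset: int) -> int:
    """Map an offset counted in UTF-16 code units to a UTF-8 byte offset.

    Raises ValueError if the offset is past the end of the text or falls
    in the middle of a surrogate pair.
    """
    units_seen = 0
    bytes_seen = 0
    for ch in text:
        if units_seen >= utf16_offset:
            break
        # Code points above U+FFFF take two UTF-16 code units.
        units_seen += 2 if ord(ch) > 0xFFFF else 1
        bytes_seen += len(ch.encode("utf-8"))
    if units_seen != utf16_offset:
        raise ValueError(f"offset {utf16_offset} is not on a character boundary")
    return bytes_seen


text = "a\U0001F600b"            # 'a', one non-BMP character, 'b'
data = text.encode("utf-8")      # 1 + 4 + 1 = 6 bytes

cut = utf16_offset_to_utf8_offset(text, 3)  # UTF-16 offset just before 'b'
print(cut, data[:cut].decode("utf-8"), data[cut:].decode("utf-8"))
```

Here UTF-16 offset 3 corresponds to UTF-8 byte offset 5, and offset 2 is rejected because it lands inside the surrogate pair.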

