Nor should you. Even a well-formed sequence of UTF-16 code units can be utter nonsense; there's approximately no level of abstraction between "sequence of fixed-width code units" and "run it through a full-blown font rendering stack" where it makes sense to assume your input is "well-formed".
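
To make that concrete, here's a rough Python sketch. The function name and the sample sequences are my own, purely for illustration. It only checks surrogate pairing, which is all "well-formed" means at the UTF-16 level, and it shows a sequence that passes the check while meaning nothing as text.

```python
def is_well_formed_utf16(units):
    """Return True if every surrogate in `units` belongs to a high+low pair.

    `units` is a sequence of 16-bit code unit values (ints). This is an
    illustrative helper, not a library API.
    """
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: must be immediately followed by a low surrogate.
            if i + 1 >= len(units) or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                return False
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            # Low surrogate with no high surrogate before it.
            return False
        else:
            i += 1
    return True


# Well-formed, yet meaningless as text: a combining acute accent with no base
# character, a zero-width joiner joining nothing, and the noncharacter U+FFFF.
nonsense = [0x0301, 0x200D, 0xFFFF]

# Ill-formed: 'A' followed by a lone high surrogate.
lone_surrogate = [0x0041, 0xD83D]

print(is_well_formed_utf16(nonsense))
print(is_well_formed_utf16(lone_surrogate))
```

Passing this check tells you nothing about whether the result is sensible text, so the later layers have to cope with garbage either way.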