In practice, most applications that require a chars(str) function can get away with returning the wrong result for things outside the BMP. With UTF-8, by contrast, you need to start caring as soon as you hit words like "café".
Even if you do require chars(str) for large strings outside the BMP, those characters were so rare before emojis that you could spend a single bit on a "contains any non-BMP?" flag and almost always do the work in O(1) time, as opposed to O(n) for UTF-8.
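To make the flag idea concrete, here's a minimal sketch (hypothetical names, not any real library): store UTF-16 code units plus one bit recording whether any surrogate is present. If the bit is clear, the code-unit count *is* the character count, so chars() is O(1); only strings that actually contain non-BMP characters pay the O(n) scan.

```python
# Hypothetical sketch of the "waste a single bit" idea. U16String and
# encode_utf16_units are illustrative names, not a real API.

def encode_utf16_units(s: str) -> list[int]:
    """Encode a Python str to a list of UTF-16 code units."""
    units = []
    for ch in s:
        cp = ord(ch)
        if cp > 0xFFFF:                      # non-BMP: needs a surrogate pair
            cp -= 0x10000
            units.append(0xD800 | (cp >> 10))    # high surrogate
            units.append(0xDC00 | (cp & 0x3FF))  # low surrogate
        else:
            units.append(cp)
    return units

class U16String:
    def __init__(self, s: str):
        self.units = encode_utf16_units(s)
        # the "single bit": does any code unit fall in the surrogate range?
        self.has_non_bmp = any(0xD800 <= u <= 0xDFFF for u in self.units)

    def chars(self) -> int:
        if not self.has_non_bmp:
            return len(self.units)           # O(1): one unit per character
        # slow path: count every unit that is not a trailing (low) surrogate
        return sum(1 for u in self.units if not (0xDC00 <= u <= 0xDFFF))

# "café" is all-BMP, so the O(1) fast path applies
assert U16String("café").chars() == 4
# an emoji forces the O(n) path but still gives the right answer
assert U16String("hi😀").chars() == 3
```

The same trick gives you O(1) charoffset(str, N) on the fast path, since character index and code-unit index coincide.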
Sorry, that just generates garbage when dealing with things outside the BMP. That can be a lot more common than you think, e.g. when dealing with Chinese characters in a context where Han unification is not welcome (e.g., in China).
Yes, you're right that it generates garbage, but that's beside the point.
The point is that a huge number of programmers, especially in the 90s and early 00s, would argue for UTF-16 on the basis of it being a fixed-width encoding in practice. Maybe they didn't know that it actually wasn't, or maybe they knew and didn't care because they never had to deal with anything outside the BMP.
The overlap between Windows programmers producing software for, e.g., the U.S. or European markets and those who would ever have encountered a non-BMP character used to be tiny, until emojis came along.
So yes, not in theory, but in practice you could get away with treating UTF-16 as a fixed-width encoding like UCS-2 for a huge number of applications, reaping the benefits of constant-time chars(str) and charoffset(str, N).
The garbage is super annoying. Please stop. Human scripts are O(N), too bad. You can build indices (indeed you must, for large documents), but you can't really avoid this being O(N).
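The index idea can be sketched like this (a toy illustration, with arbitrary names and checkpoint spacing): record the byte offset of every K-th character in a UTF-8 buffer once, up front, and character-to-byte lookup drops from an O(n) scan to an O(K) one.

```python
# Toy checkpoint index over a UTF-8 buffer. K, build_index, and
# char_to_byte are illustrative choices, not a real library API.

K = 64  # checkpoint spacing; an arbitrary choice for this sketch

def build_index(data: bytes) -> list[int]:
    """Byte offsets of characters 0, K, 2K, ... in UTF-8 `data`."""
    index, chars = [], 0
    for i, b in enumerate(data):
        if b & 0xC0 != 0x80:        # not a continuation byte: a char starts here
            if chars % K == 0:
                index.append(i)
            chars += 1
    return index

def char_to_byte(data: bytes, index: list[int], n: int) -> int:
    """Byte offset of the n-th character, scanning at most K characters."""
    i = index[n // K]               # jump to the nearest checkpoint
    for _ in range(n % K):          # then walk forward character by character
        i += 1
        while i < len(data) and data[i] & 0xC0 == 0x80:
            i += 1                  # skip continuation bytes
    return i

text = ("café " * 50) + "😀"
data = text.encode("utf-8")
idx = build_index(data)
n = len(text) - 1                   # Python strings index by code point
assert data[char_to_byte(data, idx, n):].decode("utf-8") == "😀"
```

Which is exactly the point: the index makes lookups cheap, but building it is still an O(N) pass over the text.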
And we're not even talking about normalization.
People get upset about these things and blame Unicode, but the problems are not with Unicode -- they are semantic problems with our scripts that Unicode deals with about as well as can be hoped for.
The only thing I'd remove from Unicode is precomposed characters and the associated normal forms, NFC and NFKC. But note that even that wouldn't remove the need for normalization.
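A quick illustration of why normalization is needed regardless of encoding: "é" can be one precomposed code point or an "e" followed by a combining accent, and the two forms compare unequal (and even disagree on chars()) until you normalize. This uses Python's standard unicodedata module.

```python
# Two spellings of "café" that render identically but differ at the
# code-point level; normalization maps between them.
import unicodedata

nfc = "caf\u00E9"         # precomposed: U+00E9 LATIN SMALL LETTER E WITH ACUTE
nfd = "cafe\u0301"        # decomposed: "e" + U+0301 COMBINING ACUTE ACCENT

assert nfc != nfd                        # naive comparison fails
assert len(nfc) == 4 and len(nfd) == 5   # even the character count disagrees
assert unicodedata.normalize("NFD", nfc) == nfd
assert unicodedata.normalize("NFC", nfd) == nfc
```

Note that dropping precompositions would eliminate the NFC form above, but you'd still have to normalize: combining marks can come in different orders, which is what canonical ordering in NFD sorts out.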