> This means that a plain string is defined as an array of 8-bit Unicode code units. All array operations can be used on strings, but they will work on a code unit level, and not a character level. At the same time, standard library algorithms will interpret strings as sequences of code points, and there is also an option to treat them as sequence of graphemes by explicit usage of std.uni.byGrapheme.
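To make the code-unit vs. code-point split concrete, here is a rough analogy rather than D itself: Rust's `str`, like D's `string`, is guaranteed to be UTF-8, so the same two views of one string can be compared. `.len()` counts bytes (code units), much like `.length` on a D `string`, while `.chars()` yields code points, much like D's range-based algorithms do. Rust's standard library has no grapheme segmentation (that needs an external crate), so the combining-accent case only shows that code points and user-perceived characters still differ, which is the gap `std.uni.byGrapheme` covers in D. The helper names are made up for illustration.

```rust
// Rough analogy only: Rust's `str`, like D's `string`, must hold UTF-8,
// so we can contrast the code-unit view with the code-point view.

/// Number of UTF-8 code units (bytes); compare `.length` on a D `string`.
fn code_units(s: &str) -> usize {
    s.len()
}

/// Number of Unicode code points; compare iterating a D string with a
/// range-based standard library algorithm.
fn code_points(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    let s = "naïve"; // 'ï' (U+00EF) takes two bytes in UTF-8
    println!("code units:  {}", code_units(s));
    println!("code points: {}", code_points(s));

    // Indexing the raw bytes works at the code-unit level: byte 2 is only
    // the first half of 'ï', not a character on its own.
    println!("byte 2: {:#04x}", s.as_bytes()[2]);

    // 'e' followed by a combining acute accent renders as one grapheme,
    // but it is two code points and three code units.
    let combined = "e\u{301}";
    println!("{} units, {} points", code_units(combined), code_points(combined));
}
```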
And perhaps my favorite part (https://tour.dlang.org/tour/en/gems/unicode):
> According to the spec, it is an error to store non-Unicode data in the D string types; expect your program to fail in different ways if your string is encoded improperly.
I should note that what I really like about this approach is the total lack of ambiguity. There is no question about what belongs in a string: if it's not UTF, you had better be using a byte or ubyte array, or you are doing it wrong by definition.
To be clear, declaring it an error doesn't directly solve anything (obviously it doesn't). What it does is remove any ambiguity about how things are expected to work. If the encoding of strings isn't well defined and you're writing a library, which encodings will your users expect you to accept? Or worse, which encodings does that minimally documented library function you're about to call accept?
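A small sketch of what that rule buys at a library boundary, again using Rust as a stand-in because its string types carry the same "must be UTF-8" contract (in D you would validate with something like `std.utf.validate` before treating bytes as a `string`). Raw input stays a byte slice until it passes `std::str::from_utf8`, and a function that takes `&str` never has to document or guess which encoding it accepts. `word_count` and `decode` are hypothetical helpers, not part of any library.

```rust
use std::str;

/// Hypothetical library function: because it takes `&str`, the only
/// encoding it can ever receive is UTF-8, so there is nothing to guess.
fn word_count(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Hypothetical boundary check: raw bytes stay bytes until they are
/// validated; invalid UTF-8 is rejected here instead of leaking into
/// string-typed code.
fn decode(raw: &[u8]) -> Result<&str, str::Utf8Error> {
    str::from_utf8(raw)
}

fn main() {
    let good: &[u8] = b"hello unicode world";
    let bad: &[u8] = &[0x66, 0x6f, 0xff, 0x6f]; // 0xff never appears in valid UTF-8

    for raw in [good, bad] {
        match decode(raw) {
            Ok(s) => println!("{} words", word_count(s)),
            Err(e) => println!("rejected: {e}"),
        }
    }
}
```

The design choice is the point: validation happens once, where bytes enter the program, and everything downstream can rely on the type alone.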