I disagree. I think it should be indexed by bytes. One reason is what the other comment explains about not being constant-time (which is a significant reason), although the other is that this restricts it to Unicode (which has its own problems) and to specific versions of Unicode, and can potentially cause problems when using a different version of Unicode. A separate library can be used to deal with code points and/or EGC if this is important for a specific application; these features should not be inherent to the string type.
In practice, that is tiring as hell, verbose, awkward, unintuitive, requiring types attached to a specific instance for characters to do numeric indexing anyway and a whole bunch of other unnecessary ceremony not required in other languages.
We don't care that it takes longer, we all know that, we still need to do a bunch of string operations anyway, and it's way worse with swift than to do an equivalent thing than it is than pretty much any other language.
In Swift (and in other programming languages) it does use Unicode, but I think it probably would be better not to be. But, even when there is a Unicode String type, I still think that it probably should not be grapheme clusters, in my opinion; I explained some of the reasons for this.