Okay, so, from my current perspective, everything in your comment is flat wrong. Assuming that it isn't, you and I must be talking about different things to have such divergent views.
Upthread was posted, "I want to iterate over UTF-8 codepoints directly. The initial test for a 1-byte sequence is just too simple with a branch and will be hit most of the time."
Compared to this, your proposals require iterating twice, in the context as I understand it: either (1) to convert to UCS-4 and (2) to process; or (1) to establish pointers to the starting UTF-8 code unit of each code point and (2) to process.
Additionally, if you convert to UCS-4/UTF-32, and most of the text is in ASCII, you will roughly quadruple the space required for this conversion. If you establish 32-bit pointers into the UTF-8 string for the starting code unit of each code point, you will also approximately quadruple the space required. Obviously, double that if you're using 64-bit pointers.
By contrast, iterating through a UTF-8 string directly has no such problems, at the expense of slightly more CPU time for each iteration. Such iteration can take the form of:
Instead of expanding all data to UTF-32 you can expand in chunks of some fixed amount, say, 1024 characters; so memory usage is capped, the UTF-8 unpacking can be unrolled/SIMD whatever, and the thing iterating over codepoints is trivially branchless (can even be its own SIMD). Annoying downside being potentially some wasted computation if the iteration is early-exited, but even then you could have something like doubling chunk size on each chunk, starting with 8 chars or something.
Upthread was posted, "I want to iterate over UTF-8 codepoints directly. The initial test for a 1-byte sequence is just too simple with a branch and will be hit most of the time."
Compared to this, your proposals require iterating twice, in the context as I understand it: either (1) to convert to UCS-4 and (2) to process; or (1) to establish pointers to the starting UTF-8 code unit of each code point and (2) to process.
Additionally, if you convert to UCS-4/UTF-32, and most of the text is in ASCII, you will roughly quadruple the space required for this conversion. If you establish 32-bit pointers into the UTF-8 string for the starting code unit of each code point, you will also approximately quadruple the space required. Obviously, double that if you're using 64-bit pointers.
By contrast, iterating through a UTF-8 string directly has no such problems, at the expense of slightly more CPU time for each iteration. Such iteration can take the form of:
Are you talking about something else? Perhaps I have misunderstood something somewhere.