Yes, and that's the reason why it's important to choose a tool that gets Unicode...

jandrese · on June 2, 2017

Even if the language handles it correctly, the lowly programmer is playing with white-hot fire. Unicode adds so many edge cases that it's basically a serrated knife in the middle of your code.

Will your regex match correctly if the text shifts direction halfway through? Will the case-insensitive search work in languages where case is ambiguous (where uc(lc(x)) != x)? Will you be able to match letters that are effectively identical but sit on different code points? Do you want to? Will it get confused by furigana?
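Two of these are easy to poke at yourself. A minimal sketch, using Python purely for illustration (nothing above names a language, and `upper_of_lower` and `nfc_equal` are made-up helper names): it shows a case round trip that does not come back to where it started, and look-alike strings that compare equal only after normalization, or not at all.

```python
import unicodedata


def upper_of_lower(s: str) -> str:
    """Hypothetical helper: the uc(lc(x)) round trip."""
    return s.lower().upper()


def nfc_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Case round trip: KELVIN SIGN (U+212A) lowercases to a plain "k",
# which uppercases to a plain "K" (U+004B), not back to U+212A.
kelvin = "\u212a"
print(upper_of_lower(kelvin) == kelvin)   # False

# German sharp s has no single-character uppercase in this mapping.
print(upper_of_lower("\u00df"))           # SS

# Same-looking letters, different code points: precomposed e-acute
# versus "e" followed by a combining acute accent.
composed = "\u00e9"
decomposed = "e\u0301"
print(composed == decomposed)             # False
print(nfc_equal(composed, decomposed))    # True

# Normalization does not merge look-alikes from different scripts:
# Latin "A" versus Cyrillic "А" (U+0410) stay distinct.
print(nfc_equal("A", "\u0410"))           # False

# For caseless comparison, casefold() is usually a better bet than lower().
print("Straße".casefold() == "STRASSE".casefold())  # True
```

Whether the Latin and Cyrillic "A" should match is exactly the "Do you want to?" question: normalization won't decide that for you.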

Unicode's goal of encoding every language in the world as-is means it encompasses every bizarre thing people have ever done to a written language. It is almost impossible to actually do anything with a block of Unicode code points except treat it like a big binary blob and hope that you never have to personally deal with the strings contained therein. Most everybody gets it wrong in one way or another.