> "\c[latin small letter e]\c[combining acute accent]" eq "\c[latin small letter e with acute]"
> "\c[dog face]".chars
PS: WTF? HN strips emoji :/ (and does it incorrectly for emoji sequences).
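For comparison, here is a rough Python sketch of the first REPL line. Python is only an illustration here; it is not what the thread was using. Perl 6 normalizes strings on its own. Python compares raw code point sequences, so the two spellings of "é" are equal only after explicit normalization, and `len()` counts code points rather than graphemes.

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT (decomposed spelling)
decomposed = "\N{LATIN SMALL LETTER E}\N{COMBINING ACUTE ACCENT}"
# U+00E9 LATIN SMALL LETTER E WITH ACUTE (precomposed spelling)
precomposed = "\N{LATIN SMALL LETTER E WITH ACUTE}"


def nfc(s: str) -> str:
    """Normalize to NFC (canonical composition)."""
    return unicodedata.normalize("NFC", s)


# Python compares code point sequences, so the raw strings differ.
print(decomposed == precomposed)  # False

# After normalizing both to NFC they compare equal.
print(nfc(decomposed) == nfc(precomposed))  # True

# len() counts code points, so the decomposed form has length 2, not 1.
print(len(decomposed), len(precomposed))  # 2 1
```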
But I think Perl6 is the only language that can do this magic:
> 'Déjà vu' ~~ /:ignorecase:ignoremark deja \s vu/
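Other languages have no built-in equivalent of `:ignorecase:ignoremark`, but you can approximate it. The Python sketch below decomposes the text (NFD), drops every code point in a Mark category, casefolds what is left, and then runs an ordinary regex. The helper names are hypothetical. The pattern is assumed to be written already folded (lowercase, no accents), and match offsets refer to the folded text, not the original.

```python
import re
import unicodedata


def fold(text: str) -> str:
    """Approximate :ignorecase:ignoremark by stripping marks, then casefolding."""
    # NFD splits precomposed letters such as "é" into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop code points whose general category is a Mark (Mn, Mc, Me).
    stripped = "".join(ch for ch in decomposed if not unicodedata.category(ch).startswith("M"))
    # Full Unicode case folding (handles e.g. "ß" -> "ss", not just ASCII).
    return stripped.casefold()


def match_ignoring_case_and_marks(pattern: str, text: str):
    """Search `text` for `pattern`; the pattern is assumed to be pre-folded."""
    return re.search(pattern, fold(text))


print(fold("Déjà vu"))  # deja vu
print(bool(match_ignoring_case_and_marks(r"deja\svu", "Déjà vu")))  # True
```

The pattern itself is deliberately left unfolded. Casefolding a regex would silently turn escapes like `\S` or `\W` into `\s` and `\w`.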
"Character" is a somewhat vague term, and Unicode prefers to use more specific terms like "code unit", "code point", "abstract character", etc.
In this case, I think you may be referring to grapheme clusters. They come closer to how "humans think about characters" than Unicode abstract characters do. Abstract characters are building blocks of the technical encoding standard, and in some cases they don't match a human notion of a graphical element of a writing system.
See also “Characters” and Grapheme Clusters in section 2.11 of https://www.unicode.org/versions/Unicode12.0.0/ch02.pdf, for example.
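To make the code point vs. grapheme cluster distinction concrete, here is a deliberately crude Python sketch. Python's standard library has no grapheme segmentation; the third-party `regex` module's `\X` is one real option. This helper only attaches each combining mark to the preceding base character. It is a hypothetical simplification, NOT the full UAX #29 algorithm: it ignores ZWJ emoji sequences, emoji modifiers, regional-indicator flags, Hangul jamo, and more.

```python
import unicodedata


def rough_grapheme_clusters(text: str) -> list[str]:
    """Crude approximation: glue each Mark-category code point onto the previous cluster.

    Not UAX #29: emoji ZWJ sequences, skin-tone modifiers, flags and
    Hangul jamo are all split incorrectly by this helper.
    """
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch  # extend the current cluster with the mark
        else:
            clusters.append(ch)  # a non-mark starts a new cluster
    return clusters


s = "e\u0301te\u0301"  # "été" spelled with combining acute accents
print(len(s))                           # 5 code points
print(len(rough_grapheme_clusters(s)))  # 3 clusters
```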
s = '１ ２'
collation-level => 1, Country => International, Language => None, primary => 1, secondary => 0, tertiary => 0, quaternary => 0
> '１２' coll '12'
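Python's standard library has no Unicode Collation Algorithm, so it cannot reproduce the primary-strength `coll` comparison above directly. The sketch below swaps in a different technique: NFKC compatibility normalization. It happens to map fullwidth digits to their ASCII counterparts. This is a substitute that covers this case, not an equivalent of collation strength levels.

```python
import unicodedata

fullwidth = "\uFF11\uFF12"  # FULLWIDTH DIGIT ONE, FULLWIDTH DIGIT TWO
ascii_digits = "12"

# Different code points, so a plain comparison fails.
print(fullwidth == ascii_digits)  # False

# NFKC folds compatibility variants (such as fullwidth forms) to plain forms.
print(unicodedata.normalize("NFKC", fullwidth) == ascii_digits)  # True

# int() accepts any Unicode decimal digits (category Nd), fullwidth included.
print(int(fullwidth))  # 12
```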
As to lelf's 1-character emoji: str.chars returns the number of characters (graphemes) in the string. It would only return 2 if it counted UTF-16 code units instead (which, as the documentation notes, is what currently happens on the JVM).
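The "2" comes from UTF-16. An emoji outside the Basic Multilingual Plane is a single code point, but UTF-16 needs a surrogate pair (two code units) to encode it. A small Python illustration of the three different counts for the same string:

```python
dog = "\N{DOG FACE}"  # a single code point outside the BMP

# Code points: what Python's len() counts.
print(len(dog))  # 1

# UTF-16 code units are 2 bytes each; "utf-16-le" avoids a byte order mark.
print(len(dog.encode("utf-16-le")) // 2)  # 2 (a surrogate pair)

# UTF-8 code units are bytes.
print(len(dog.encode("utf-8")))  # 4
```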
That's for strings. For identifiers (names, filenames, ...) there are many more rules to consider, and almost nobody supports Unicode identifiers safely.
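One piece of "safe" identifier handling is canonicalizing names before comparing or storing them, so that visually similar spellings can't register as distinct names. The helper below is a hypothetical minimal sketch. It applies NFKC, casefolds, and rejects anything Python's `str.isidentifier()` refuses. It does NOT implement the confusable and mixed-script checks of UTS #39 (Unicode Security Mechanisms) or the full UAX #31 identifier profile, and it only approximates the standard NFKC_Casefold mapping.

```python
import unicodedata


def normalize_identifier(name: str) -> str:
    """Hypothetical helper: canonical key for an identifier (NFKC + casefold).

    Simplified: no confusable/mixed-script detection (UTS #39).
    """
    folded = unicodedata.normalize("NFKC", name).casefold()
    if not folded.isidentifier():
        raise ValueError(f"not a valid identifier: {name!r}")
    return folded


# A plain name, one with the "fi" ligature (U+FB01), and a fullwidth spelling
# all collapse to the same key.
for raw in ["file", "\uFB01le", "\uFF26\uFF49\uFF4C\uFF45"]:
    print(repr(raw), "->", normalize_identifier(raw))
```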
There's also still no support for non-ASCII (multibyte) strings in the most basic utilities, like expand, wc, cut, head/tail, tr, fold/fmt, od, sed, awk and grep, nor in Go, the silver searcher, the Go platinum searcher, Rust's ripgrep, ... (see http://perl11.org/blog/foldcase.html).
I maintain the multibyte patches for coreutils, and have fixed this at least for my own projects.
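The linked post is about case folding. A minimal illustration of why byte-oriented or ASCII-only `tolower` matching falls short: full Unicode case folding maps "ß" to "ss", which simple lowercasing does not. The `grep_i` helper below is a hypothetical toy in the spirit of `grep -i`. It is not how any of the tools listed above work.

```python
import unicodedata


def fold_for_search(text: str) -> str:
    # NFC first so precomposed and decomposed accents compare equal,
    # then full Unicode case folding (e.g. "ß" -> "ss").
    return unicodedata.normalize("NFC", text).casefold()


def grep_i(needle: str, lines: list[str]) -> list[str]:
    """Toy case-insensitive substring filter, in the spirit of `grep -i`."""
    key = fold_for_search(needle)
    return [line for line in lines if key in fold_for_search(line)]


lines = ["STRASSE 5", "Straße 7", "Gasse 1"]
print("Straße".lower())         # straße (lower() keeps the ß)
print("Straße".casefold())      # strasse (casefold() expands it)
print(grep_i("straße", lines))  # ['STRASSE 5', 'Straße 7']
```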