I wonder if that "significant and rapidly growing chunk of the planet's programmer population" will find that an untested and partially implemented feature suffices for their needs in the real world.
Most English-speaking devs have no idea just how bad things are with current langs.
A quick experiment.
First, translate "Practical" into Thai: ปฏิบัติจริง (https://translate.google.com/#en/th/Practical)
Next, see Unicode Inspector: (http://apps.timwhitlock.info/unicode/inspect?s=ปฏิบัติจริง)
Note that ปฏิบัติจริง is 9 somethings or 11 something elses.
Finally, move the cursor along the "characters" in either Google Translate's output field (it's fiddly but works if you are patient) or the Unicode Inspector.
Note that there are SEVEN movements.
What sort of elements are these cursor movements moving over?
Graphemes.
There aren't 7 Code Points. Or 7 Code Units. Or 7 Bytes.
There are 7 Graphemes.
Graphemes are defined in the Unicode Consortium's glossary as "What a user thinks of as a character."
http://www.unicode.org/glossary/#grapheme
There really couldn't be a more fundamentally important aspect of text processing for half the planet than dealing with text at the grapheme level ("what a user thinks of as a character") BY DEFAULT, unless the dev wants to do tricky stuff.
But almost all current langs (including Python 3.x) get this stuff terribly wrong in all sorts of ways.
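To put numbers on the experiment above, here's a minimal Python 3 sketch using only the standard library. The variable names and the `approx_grapheme_count` helper are mine, not anything Python ships. The helper is a rough approximation (it just skips nonspacing and enclosing marks); it is not the full Unicode segmentation algorithm, but it's enough for this Thai word.

```python
import unicodedata

word = "ปฏิบัติจริง"

# What Python's len() counts: code points.
code_points = len(word)

# UTF-16 code units (2 bytes each) and UTF-8 bytes.
utf16_units = len(word.encode("utf-16-le")) // 2
utf8_bytes = len(word.encode("utf-8"))


def approx_grapheme_count(text):
    """Rough grapheme count: every code point that is not a nonspacing (Mn)
    or enclosing (Me) mark starts a new user-perceived character.
    An approximation only, not the full Unicode segmentation rules."""
    return sum(1 for ch in text if unicodedata.category(ch) not in ("Mn", "Me"))


graphemes = approx_grapheme_count(word)

print("code points:", code_points)
print("UTF-16 code units:", utf16_units)
print("UTF-8 bytes:", utf8_bytes)
print("graphemes (approx.):", graphemes)
```

The point is that the built-in `len()` answers a question the user never asked. Getting the number that matches the cursor movements takes extra work.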
Perl 6 takes graphemes seriously.
From Synopsis 15 (https://raw.githubusercontent.com/perl6/specs/master/S15-uni...):
* Perl 6 by default operates on graphemes
* Perl 6, by default, stores all strings given to it in NFG form, Normalization Form Grapheme. It's a Perl 6-invented character representation, designed to deal with un-precomposed graphemes properly.
* The Perl 6 C<Str> type, and more generally the C<Stringy> role, deals exclusively in NFG form.
* By default regexes operate on the grapheme (NFG) level, regardless of how the string itself is stored.
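For contrast with that last bullet, here's a hedged sketch of what the code point level looks like in Python's standard `re` module, where `.` matches one code point at a time. The "grapheme-ish" pattern below is a hand-rolled stand-in I built for this one word (a base character followed by any marks), an assumption for illustration and not a real grapheme matcher.

```python
import re
import unicodedata

word = "ปฏิบัติจริง"

# `.` in the stdlib `re` module matches a single code point.
per_code_point = re.findall(r".", word)

# Collect the nonspacing/enclosing marks that actually occur in this word
# and build a character class from them (illustrative only).
marks = sorted({ch for ch in word if unicodedata.category(ch) in ("Mn", "Me")})
mark_class = re.escape("".join(marks))

# One base code point followed by zero or more of those marks.
grapheme_ish = re.compile(f"[^{mark_class}][{mark_class}]*")
clusters = grapheme_ish.findall(word)

print(len(per_code_point), per_code_point)
print(len(clusters), clusters)

# A pattern written in terms of "7 characters" does not match the whole word.
print(re.fullmatch(r".{7}", word) is None)
```

So a regex that a Thai user would naturally write in terms of the 7 things they see has to be rewritten in terms of code points, which is the kind of default the Synopsis 15 bullets say Perl 6 avoids.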
So, as far as graphemes ("what a user thinks of as a character") are concerned, Perl 6 gets this basic aspect of text processing right when almost all others get it terribly wrong.
(There are a couple of other langs that seem to have made an effort to take graphemes seriously enough. For example, the String.length function in Elixir (related to Erlang) returns the count of graphemes by default. And a few like Clojure take graphemes seriously, just not seriously enough.)
So, a significant and rapidly growing chunk of the planet's programmer population is likely to end up interested in at least playing with P6 to see if it gets this bit right (and they'll find it does).