
Unicode, Perl 6, and You - kamaal
https://perl6advent.wordpress.com/2015/12/07/day-7-unicode-perl-6-and-you/
======
SwellJoe
Perl has had decent Unicode support longer than most similar languages (years
before Ruby and Python, for instance), but Perl 6 is just ridiculously good at
it, and I hope other languages follow suit. I'm unaware of any other language
that handles Unicode this well...am I missing any languages that do? I guess
JavaScript is coming along on this front and ES6 includes support for Unicode
regexps, which is progress, so maybe that's the closest mainstream language.

~~~
lelf
> _ES6 includes support for Unicode regexps_

Will it provide \X for example? (\X matches extended grapheme cluster.)

> _am I missing any languages that do?_

Swift is one notable example. It has built-in and simple enough grapheme
handling.

~~~
SwellJoe
I haven't looked at Swift, at all. I don't buy Apple products, so have no
familiarity with their ecosystem. But, now that it's been opened up, I'll have
a look at it, though it seems likely to remain predominantly a language for
Apple products for the foreseeable future (I think?), so not something I'd
find myself using in production any time soon. But, I guess we'll see how that
shakes out over time now that it is open.

Given the rate at which JavaScript is converging on a really nice set of
modern features, shedding warts, and gaining performance, I wonder if any
other language is as relevant long-term.

------
kbenson
> Don’t worry though, standard Perl 6 does not demand that you be able to type
> Unicode. If you can’t, there are so-called “Texas” variants:

I've always loved the "everything's bigger in Texas" joke implicit in the
"Texas" variant on some operators.

> If you’re interested in working within a particular normalization, there’s
> the self-explanatory types of NFC, NFD, NFKC, and NFKD.

That would probably be better with a "Well, it's self-explanatory at the point
you know you want to work in a particular normalization", since I only vaguely
know what those are, and I've been hearing about some of them for years. ;)
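
For anyone else in the same boat, here is a minimal sketch of what the four forms do. It uses Python's standard `unicodedata` module rather than Perl 6's NFC/NFD/NFKC/NFKD types, and the sample strings are my own picks, not from the post:

```python
import unicodedata

# "é" written as a base letter plus a combining acute accent (2 code points).
decomposed = "e\u0301"

# Canonical forms: NFC composes where possible, NFD decomposes.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", nfc)

# Compatibility forms (NFKC/NFKD) additionally fold "compatibility"
# characters, such as the ligature U+FB01, into plain equivalents.
ligature = "\ufb01"
nfkc = unicodedata.normalize("NFKC", ligature)
nfkd = unicodedata.normalize("NFKD", ligature)

print(len(nfc), len(nfd))  # 1 2
print(nfkc, nfkd)          # fi fi
# The canonical forms leave the ligature alone.
print(unicodedata.normalize("NFC", ligature) == ligature)  # True
```

Roughly: the "C" forms compose, the "D" forms decompose, and the "K" variants also replace compatibility characters.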

Great post though!

------
Grue3
>say "नि".codes; # returns 3

How is "नि" 3 codepoints? There are only two: न and ि . Could this be a bug?

~~~
kamaal
Not sure how to interpret this. Are you suggesting the Hindi नि should be two
code points because when translated to English it is 'Ni' (two letters)?

~~~
Grue3
I'm suggesting this string has two Unicode codepoints. I don't know anything
about the Hindi language or the Devanagari script.
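
A quick way to check the count outside Perl 6 is to list the code points by name. This is a sketch in Python, assuming the string in question is exactly U+0928 followed by U+093F (what gets pasted into a comment box may differ from what the post's source contains):

```python
import unicodedata

# Assumed contents of the string: the letter NA plus the vowel sign I.
s = "\u0928\u093f"

names = [unicodedata.name(ch) for ch in s]
for ch, name in zip(s, names):
    print(f"U+{ord(ch):04X} {name}")

print(len(s))  # 2 (Python's len counts code points)
```

Under that assumption the string has two code points, which a grapheme-aware language would still treat as a single character.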

------
wtetzner
I haven't finished reading through the whole post yet, but if Perl 6 works on
graphemes, are ligatures considered to be only one character?

~~~
lelf
Yes. Bear in mind, however, that ligatures are in Unicode only for backward
compatibility.

    > 'ﬄ'.chars
    1
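
The ligature is one character because it is a single code point, not because graphemes were merged. A small Python sketch of the same check (Python standard library, not Perl 6) that also shows compatibility normalization splitting it back into three letters:

```python
import unicodedata

lig = "\ufb04"  # the single code point LATIN SMALL LIGATURE FFL

print(unicodedata.name(lig))  # LATIN SMALL LIGATURE FFL
print(len(lig))               # 1

# NFKC replaces the compatibility ligature with its plain letters.
expanded = unicodedata.normalize("NFKC", lig)
print(expanded, len(expanded))  # ffl 3
```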

~~~
logicallee
that's sure to ruﬄe some feathers.

(see what I did there)

~~~
evmar
Try highlighting substrings of that text -- my browser doesn't even know it's
multiple characters. (A browser displaying ligatures is a separate matter, and
that already works. But that's because ligatures are a display issue there:
the source text still has the non-ligature characters in it.)

