
Unicode is Kind of Insane - benfrederickson
http://www.benfrederickson.com/unicode-insanity/
======
soraminazuki
One of the things I hate the most about Unicode is the CJK unified ideographs.
While the Unicode consortium are happy with mapping all those ridiculous
emojis in Unicode, they refuse to separate letters of distinct languages. Now,
people have a harder time configuring fonts so that Japanese texts don't look
Chinese.

~~~
balls2you
sounds like something designed without consulting the CJK users...

------
raiph
RNNs wouldn't fall for confusables... ;)

The venerable Perl 5 and its ecosystem have long aimed at Unicode compliance
with acceptable performance and are among the best toolkits available in 2015.

Perl 6 aims to outdo Perl 5 (and other langs) by making Unicode as simple as
it can be while retaining acceptable performance and correctness. A simple
example is that the "character" abstraction is an extended grapheme cluster
yet strings can remain compact if they can be (for good RAM performance) and
indexing into strings that aren't compact is an O(1) operation (unlike, say,
the quadratic slowdowns with Swift).
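
To make the "character" abstraction point concrete, here is a minimal sketch in Python (chosen only for illustration; it says nothing about how Perl 6 is implemented). Python strings index by code point rather than by extended grapheme cluster, so one user-perceived character can have length 2. The variable names are made up for this example.

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT: one user-perceived character.
decomposed = "e\u0301"
# The precomposed equivalent, U+00E9 LATIN SMALL LETTER E WITH ACUTE.
composed = "\u00e9"

# Python counts code points, not grapheme clusters.
decomposed_len = len(decomposed)
composed_len = len(composed)

# The two spellings compare unequal until both are normalized to the same form.
equal_raw = decomposed == composed
equal_nfc = unicodedata.normalize("NFC", decomposed) == composed

print(decomposed_len, composed_len, equal_raw, equal_nfc)  # 2 1 False True
```

Normalization only hides the problem when a precomposed form exists; a language that treats the grapheme cluster as its "character" avoids the mismatch for every combining sequence.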

Is there something that might entice you to make a few visits to the freenode
IRC channel #perl6 [1] to chat about making the long term Unicode roadmap for
Perl 6 be the best it can be?

[1]
[https://kiwiirc.com/client/irc.freenode.net/perl6](https://kiwiirc.com/client/irc.freenode.net/perl6)

------
monk_e_boy
Or actually, it's really kind of amazing.

~~~
benfrederickson
It's amazing and insane =). I was trying to highlight the complexities behind
something that many people naively think is simple. I probably should
have made that clearer, though, based on the reaction elsewhere.

------
nofollow
The Unicode Consortium addresses this:
[http://unicode.org/reports/tr39/#Confusable_Detection](http://unicode.org/reports/tr39/#Confusable_Detection)

Edit: a quick search found implementations of the described algorithms in C++
(ICU library) and Perl (Unicode::Security).
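
For a rough idea of what the TR39 confusable check does, here is a simplified Python sketch of the "skeleton" comparison: decompose, map each confusable character to a prototype, decompose again, then compare. The `CONFUSABLES` table is a tiny hand-picked stand-in for the real confusables.txt data file, and `skeleton` and `confusable` are hypothetical helper names, not the ICU or Unicode::Security API.

```python
import unicodedata

# Hypothetical, hand-picked subset of confusable mappings for illustration;
# the real data file (confusables.txt) is far larger.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}


def skeleton(text):
    """Simplified skeleton: NFD, map confusables to prototypes, NFD again."""
    decomposed = unicodedata.normalize("NFD", text)
    mapped = "".join(CONFUSABLES.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)


def confusable(a, b):
    """Two strings are treated as confusable if their skeletons match."""
    return skeleton(a) == skeleton(b)


# "paypal" spelled with Cyrillic "а" in place of both Latin "a" letters.
spoof = "p\u0430yp\u0430l"
print(spoof == "paypal")            # False
print(confusable(spoof, "paypal"))  # True
```

For anything real, one of the maintained implementations mentioned above is the safer choice, since this sketch covers only five characters.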

