I must say, when Perl is the "top flight" implementation for unicode i don't wan...

draegtun · on Dec 4, 2012

This is very thorough boilerplate code for dealing with all corner cases when using utf-8 data with Perl.

Instead of dismissing or dissing Tom Christiansen's excellent post, I would highly recommend reading his "The Good, the Bad, & the (mostly) Ugly" presentation from OSCON 2011 [1], where he compares Unicode handling across mainstream languages, and then seeing how this code (and Perl) shapes up in comparison.

In the meantime, pragmatic Perl programmers can cover most of that UTF-8 boilerplate with just:

  use 5.016;
  use warnings;
  use utf8::all;

Or, if you're like me and use perl5i [2], then it's just:

  use perl5i::2;

[1]: http://training.perl.com/OSCON2011/index.html

[2]: https://metacpan.org/module/perl5i

mpyne · on Dec 4, 2012

Perl 5 was released in October 1994, so it's impressive in its own right that a) there is boilerplate you can add at all to get good Unicode support, and b) you can extend the language to support it using just boilerplate.

As the other comment mentioned, improved Unicode defaults do get included in later Perl 5 releases, but you have to tell the interpreter that you're opting in (for example with use 5.016) before it can reduce the boilerplate for you.
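
To make the opt-in concrete, here is a minimal sketch that uses only core pragmas and the core Encode module. It is a rough approximation of a few things a module like utf8::all bundles, not a reproduction of that module or of the original boilerplate; the $word variable is just an illustrative name.

```perl
use v5.16;       # opt in: enables strict and the 5.16 feature bundle (say, unicode_strings, ...)
use warnings;
use utf8;        # string literals in this source file are UTF-8
use open qw( :encoding(UTF-8) :std );   # UTF-8 layers for STDIN/STDOUT/STDERR and newly opened handles
use Encode qw(decode);

# Command-line arguments arrive as raw bytes; decode them into character strings.
@ARGV = map { decode('UTF-8', $_) } @ARGV;

my $word = "naïve";
say length $word;   # counts characters, not bytes: 5
say uc $word;       # NAÏVE
```

Without the use utf8 line, the same literal would be treated as six bytes rather than five characters, which is the kind of default the boilerplate exists to change.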