
Unicode Standard, Version 12.0 - lelf
http://blog.unicode.org/2019/03/announcing-unicode-standard-version-120.html
======
billpg
Emojis are a plot to make English-speaking developers care about fixing their
code to work with Unicode.

~~~
Dylan16807
If only they hadn't tried to cut it down to 16 bits near the start, we could
have avoided a _lot_ of the partial support that emojis expose.

~~~
nikbackm
Maybe not all Asian scripts were planned to be included back then?

It seems strange they would miscount so grossly otherwise.

~~~
Dylan16807
It's all down to CJK. Originally they allocated 21k codepoints to CJK, and if
that had been accurate, 16 bits would pretty much have fit everything.

But we currently have 88k CJK characters assigned out of possibly more than
100k total.

I can't easily find anything about _how_ this went wrong and they got such a
small number.
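
The overshoot is easy to see from a language runtime's own Unicode tables. A
minimal Python sketch (the exact count depends on which Unicode version your
Python build ships with):

```python
import unicodedata

def count_assigned(start, end):
    """Count codepoints in [start, end] that have an assigned character name."""
    return sum(
        unicodedata.name(chr(cp), None) is not None
        for cp in range(start, end + 1)
    )

# The original CJK Unified Ideographs block (U+4E00..U+9FFF) alone already
# exceeds the ~21k codepoints originally budgeted for all of CJK -- and that
# is before counting Extensions A through F in other planes.
print(count_assigned(0x4E00, 0x9FFF))
```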

~~~
vorg
> we currently have 88k CJK characters assigned

If you also count the Unihan variations registered in Unicode's Ideographic
Variation Database by various Japanese outfits, encoded using the variation
selectors VS17 to VS256 (U+E0100..U+E01EF) after the codepoint they modify,
there's another 8k or 9k unique characters assigned.
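
At the string level a variation sequence is just the base ideograph followed
by a variation selector codepoint; a small Python illustration (which glyph
variant actually renders depends on the font):

```python
# U+845B (葛) followed by ideographic variation selector VS17 (U+E0100)
# selects a registered glyph variant of the same underlying character.
base = "\u845B"
variant = "\u845B\U000E0100"

print(len(base), len(variant))  # 1 codepoint vs 2 codepoints
print(base == variant)          # distinct strings, same base character
```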

------
lelf
Every top comment is about Emoji… Law of triviality in action ;)

      Also in Version 12.0, the following Unicode Standard Annexes have notable modifications ⟨…⟩
      UAX #14, Unicode Line Breaking Algorithm
      UAX #29, Unicode Text Segmentation
      UAX #31, Unicode Identifier and Pattern Syntax
      UAX #38, Unicode Han Database (Unihan)
      UAX #45, U-Source Ideographs
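
UAX #29 (segmentation) is the annex that bites most code in practice: `len`
in Python counts codepoints, not the user-perceived characters ("grapheme
clusters") that UAX #29 defines. A stdlib-only illustration:

```python
# A family emoji is one grapheme cluster built from several codepoints
# joined by U+200D (ZERO WIDTH JOINER).
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"  # 👨‍👩‍👧

print(len(family))  # 5 codepoints, but one user-perceived character
print([hex(ord(c)) for c in family])
```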

~~~
781
It's not triviality.

How many people do you really think care about Elymaic script? Or about
Nandinagari?

~~~
jameshart
Presumably some people. Assigning code points is the Unicode consortium’s job.
That’s what Unicode _does_. Nobody should be upset that they keep doing it.

But you ignore the substance of the parent’s post, which is about the new
elements of the Unicode _standard_ that are not confined to assigning code
points. There is substance there to be analyzed: material about how Unicode
should be used in defining identifier syntaxes, which is of high relevance to
an HN audience (e.g. when defining your new serverless framework, what
characters should you allow in function names? Unicode now has a better
answer for you than ‘[a-zA-Z_0-9-]’).
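
Python is one language that already follows UAX #31 for identifiers (via
PEP 3131), so its stdlib makes a quick sanity check of those rules:

```python
# str.isidentifier() implements Python's identifier rules, which are
# derived from UAX #31's ID_Start / ID_Continue character classes.
print("café_Σ".isidentifier())     # non-ASCII letters are allowed
print("変数1".isidentifier())      # digits may continue an identifier...
print("1variable".isidentifier())  # ...but may not start one
print("a-b".isidentifier())        # hyphen is not an identifier character
```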

There are also updates to Unicode security mechanisms and IDNA support.

But no, sure, let’s complain about emoji and obscure languages.

~~~
coldtea
> _Assigning code points is the Unicode consortium’s job. That’s what Unicode
> does. Nobody should be upset that they keep doing it._

Why not?

It can be perfectly logical for something to be "somebody's job" and for
others to still be upset that they keep doing it. E.g.

1) if the job wastes resources, is deemed silly, is done badly, etc.

2) if the job is detrimental to those others

------
edent
It is a shame that the majority of these won't be seen on common devices.
Google has stopped work on their Noto fonts initiative, and modern versions of
Android are stuck with pre-Unicode-10 support.

Apple has been pretty good at adding Emoji on iOS, but macOS seems to be left
behind.

As for Linux... Installing Unifont gives coverage, but most distros don't
seem to have a way to update base-level fonts.

I'd love to be proved wrong about any of this. But it seems that the majority
of systems don't care because their native scripts are already supported.

~~~
thomasfedb
Emoji handling on Linux is so upsetting. It should not be this tricky.

~~~
magicalhippo
I submit that it should: [http://baldi.me/blog/emoji-in-sql](http://baldi.me/blog/emoji-in-sql)

~~~
m6w6
Hilarious.

------
daoxid
Are the fancier scripts supported by Unicode used by real people in
production? By scholars? With special fonts? Or is it more a case of Unicode
just wanting to support everything, even though the target audience is
actually using something else?

Asking because I'm impressed by the aim of the whole Unicode project but
having no real experience with it beyond the basics.

~~~
wongarsu
Mostly scholars. But even if nobody at all were using them currently, the
explicit goal of Unicode is to support all scripts. Unicode is meant to make
all other text encodings obsolete so the world never has to think about text
encodings again (which has mostly worked so far). That goal can only be
reached and maintained if every script anyone might plausibly want to use is
contained in Unicode.

~~~
Freak_NL
More specifically, scripts and glyphs that have documented and valid use
cases. If you made up a script today, you would have to start using it first
(and gain acceptance of it in some community) before it would be eligible for
inclusion in the Unicode standard. A good example is the power symbol (⏻,
Unicode 9.0). The proposal for it neatly documented that it was in wide use
already — in manuals in particular.

Emoji are a slightly different beast though. Those seem to get included based
on projected use cases.

~~~
epse
They used to be included because the Japanese had them in their encoding
systems, but the situation now is far more fuzzy. Which is odd for a standard.

~~~
coldtea
> _but the situation now is far more fuzzy_

It's basically: "text/social comment/chat apps are big, let's add more BS
icons for our Facebook/Apple/Google/MS/etc chat apps"

------
rurban
FWIW, I've now updated safeclib to 12.0.0 final from the previous 12.0.0-d1,
and there were no changes in the case-folding tables. And the changes from
11.0 are minimal, just 6 new entries. So it's just a minimal libc-specific
update, thankfully.

For UTF-8-safe languages there would be 4 new scripts to add, but this affects
only Rust, Java and cperl. All the others are Unicode-unsafe.
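
Case-folding tables like the ones updated here are what a `casefold()` call
consults; a quick Python illustration of why folding is not the same as
lowercasing:

```python
# Case folding is more aggressive than lowercasing: it maps characters to a
# canonical form meant for caseless comparison, not for display.
print("Straße".lower())     # lowercasing keeps ß as-is
print("Straße".casefold())  # folding expands ß to 'ss'

# Two strings that lowercase differently can still fold to the same form.
print("MASSE".casefold() == "Maße".casefold())
```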

------
qwerty456127
I would be more amazed to finally see Tengwar and more conlang scripts added.

