
What's new in Unicode 9.0 - ingve
http://babelstone.blogspot.com/2016/01/whats-new-in-unicode-90.html
======
edent
We managed to get Power Symbols into Unicode
([http://unicodepowersymbol.com/](http://unicodepowersymbol.com/)) and it was
all sparked by a conversation on HN
[https://news.ycombinator.com/item?id=6828102](https://news.ycombinator.com/item?id=6828102)

~~~
Zikes
Does anyone know if there are plans to incorporate "powerline" symbols, a la
[https://github.com/powerline/powerline](https://github.com/powerline/powerline)?
Or are those already Unicode compliant, and simply missing from most font
libraries?

~~~
rspeer
Powerline symbols are private use codepoints.

I think it might be better this way. You need a font that supports them
anyway, and they're not intended to convey anything in the absence of such a
font, so it doesn't matter that they're private use.

If the symbols got official Unicode codepoints, switching to the official
codepoints would break all existing Powerline-compatible fonts. And if
something better than Powerline came along it would have to happen all over
again.

~~~
Zikes
I understand that it would be backwards-incompatible, however it would be nice
to make it official, considering it's been proven useful.

It wouldn't break existing Powerline fonts, however, because those private-use
codepoints would still work as long as they're not assigned to anything else.

~~~
rspeer
This is still exactly what the private use area is for: "I want a code that
tells _this particular font_ to show _this particular symbol_, and nobody
will need to know what this code meant 20 years from now".
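To make "private use" concrete, here is a minimal Python sketch. Unicode reserves three Private Use Areas whose meaning is left to private agreement (here, between Powerline and its patched fonts), and Python's standard `unicodedata` module reports those code points as category `Co`. The `is_private_use` helper is hypothetical, and treating U+E0B0 as the Powerline arrow glyph is an assumption for illustration. U+23FB is the POWER SYMBOL that went into Unicode 9.0.

```python
import unicodedata

# The three Private Use Areas defined by the Unicode Standard.
PRIVATE_USE_RANGES = [
    (0xE000, 0xF8FF),      # BMP Private Use Area
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
]


def is_private_use(ch: str) -> bool:
    """Hypothetical helper: True if the single character ch is private use."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PRIVATE_USE_RANGES)


# Assumption: U+E0B0 is the Powerline right-pointing arrow glyph.
powerline_arrow = "\ue0b0"
print(is_private_use(powerline_arrow))        # True
print(unicodedata.category(powerline_arrow))  # Co (Other, private use)

# U+23FB POWER SYMBOL is a regular, officially assigned code point.
print(is_private_use("\u23fb"))               # False
```

The PUA code point means nothing without a font that agrees on it, which is exactly the trade-off being discussed.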

------
andybak
Anyone scratching their head over the reasoning behind the ballooning number
of emoji might enjoy reading this post by the same author:
[http://babelstone.blogspot.co.uk/2015/04/whats-new-in-
unicod...](http://babelstone.blogspot.co.uk/2015/04/whats-new-in-
unicode-80.html)

~~~
versteegen

      The question is, now that there is a mechanism for defining skin tone
      colours for Unicode characters, will this be enough? Or will users
      demand a similar mechanism to specify hair colour and eye colour? And
      will users want to expand the concept of modifier characters to cover
      any colour and any Unicode character?
    

I find the thought absolutely horrifying.
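For context on the mechanism being quoted: the skin tone modifiers introduced in Unicode 8.0 are ordinary code points (U+1F3FB to U+1F3FF) placed after a base emoji, and it is up to the renderer to draw the pair as one glyph. A small Python sketch, where `with_skin_tone` is a hypothetical helper rather than any library API:

```python
import unicodedata

# The five Fitzpatrick skin tone modifiers occupy U+1F3FB..U+1F3FF.
SKIN_TONE_MODIFIERS = [chr(cp) for cp in range(0x1F3FB, 0x1F400)]


def with_skin_tone(base: str, tone_index: int) -> str:
    """Append one of the five skin tone modifiers (index 0-4) to a base emoji."""
    return base + SKIN_TONE_MODIFIERS[tone_index]


thumbs_up = "\U0001F44D"  # THUMBS UP SIGN
toned = with_skin_tone(thumbs_up, 3)

# A modifier sequence is still two separate code points in the text.
print(len(toned))  # 2
print([unicodedata.name(c) for c in toned])
```

Extending the idea to hair or eye colour would mean more modifier code points of this kind, multiplied across every base character that accepts them.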

~~~
matthewmacleod
Why?

~~~
TeMPOraL
Add modifiers for proportions and it'll turn Unicode plaintext into a general-
purpose drawing language.

~~~
gherkin0
> Add modifiers for proportions and it'll turn Unicode plaintext into a
> general-purpose drawing language.

Not even that: it's all special-purpose stuff with narrow use cases.

~~~
TeMPOraL
I think Unicode already has enough features to address any (x,y) point with
some precision (think Zalgo) - now add colors and scales, and soon someone
will start encoding pictures in plaintext (and I don't mean ASCII art)...
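Zalgo text works by piling combining marks onto a base character. Each mark is a separate code point that the renderer stacks on the preceding glyph. Below is a deterministic Python sketch. The `zalgo` helper and the particular marks chosen are illustrative, not a standard API.

```python
import unicodedata

# A few marks from the Combining Diacritical Marks block (U+0300..U+036F).
MARKS = ["\u0300", "\u0301", "\u0302", "\u0303", "\u0308"]


def zalgo(text: str, marks_per_char: int = 3) -> str:
    """Stack combining marks after every letter, Zalgo-style (no randomness)."""
    out = []
    for ch in text:
        out.append(ch)
        if ch.isalpha():
            # Cycle through the mark list so the output is reproducible.
            out.extend(MARKS[i % len(MARKS)] for i in range(marks_per_char))
    return "".join(out)


s = zalgo("hi")
print(len(s))  # 8: two letters plus three marks each

# Every added mark has a nonzero canonical combining class.
print(all(unicodedata.combining(c) > 0 for c in s if not c.isalpha()))
```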

------
chris_wot
I'm currently refactoring LibreOffice code around font handling, and I have to
say text is _complicated_. I've had to do a lot of reading about Windows, Unix
and OS X font and text handling, and to be honest I really think so far that
Apple has the cleanest platform for handling text.

I could be wrong about this, and I'm happy to be challenged (actually, I welcome
it with some reasoning because I'm still getting my head around all the
systems fully). Anyway, just an aside.

~~~
sandGorgon
I'd just like to ping a thanks for the effort you put in.

Libreoffice is pretty popular in India as a teaching tool.

Just wanted to draw your attention to two points. Libre is impossible to
pronounce in Asia, which is why people are not able to Google for you (how do
I tell my friend to Google for X?).

It would be great if you could take Google Noto fonts into account. There are
several vernacular language users in India who would love to switch over to
Linux.

Emoji! In fact, it's not a bad idea at all for LibreOffice to have an
independent font installer (like the way the Atom editor does plugins).

~~~
chris_wot
The work I do is tiny compared to the other developers :-) I'm really getting
more out of contributing to the LibreOffice code base than The Document
Foundation gets from me! But thank you, it's always great to hear that someone
appreciates our efforts.

It's interesting to hear that LibreOffice is being used in Indian education -
in my opinion Indic scripts are by far the most advanced and complex writing
systems, even more so than Arabic and East Asian scripts!
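To illustrate why Indic text needs a shaping engine: a single visible conjunct is often several code points joined by a virama, and the font plus shaper decide how to draw the cluster. A small Python sketch listing the code points of Devanagari क्ष (ksha). Whether it appears as one glyph depends on the font.

```python
import unicodedata

# Devanagari "ksha": KA + VIRAMA + SSA. With a capable font and shaping
# engine these three code points are usually drawn as a single conjunct.
ksha = "\u0915\u094d\u0937"

for ch in ksha:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

print(len(ksha))  # 3 code points
```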

In terms of fonts, we use Graphite, Pango and HarfBuzz in an attempt to
wrangle font handling. We've got some serious text layout issues (look for
the comments in the code around DXArray!) but the basic guts are there. It's a
complex issue and you can see how over 30 years there have been many, many
stabs at getting international fonts and text handling right. I think I have
so far counted 10 classes that _just_ handle fonts... And have recently
removed one (FontInfo) that didn't seem to actually do anything!

So refactoring such an old module in the project might take some time, and
also bear in mind I'm (sadly) monolingual, so mainly at the moment I'm
trying to streamline class hierarchies, get some saner class interfaces, merge
classes where appropriate, review the font mapping code and try to understand
how we deal with three very different approaches to font and text handling on
OS X, Windows and the rest of the Unix world...

In terms of Noto, I think it would be unwise of us to bundle a font installer.
It's not a bad idea, but LibreOffice UI code is currently tightly coupled to
Star's VCL which was at one point a cross platform visual component library
that could stand on its own: that has now gone by the wayside and the
LibreOffice team is currently trying to wrangle it into a more stable and
responsive framework. But it's quite hard to make another program out of it
(sadly).

Another reason is that the distributions are actually better off bundling the
fonts themselves via their package management systems, and for Windows and OS
X I really think I'd suggest embedding the fonts you want to use in the
documents themselves if that's the concern.

As for the name: unfortunately that won't change now. Blame that on Oracle,
who decided not to give the TDF the trademark and instead gave it to Apache,
where the code is currently bitrotting away :-(

------
acheron
"What are letters?"

"Kinda like mediaglyphs except they're all black, and they're tiny, they don't
move, they're old and boring and really hard to read."

-- The Diamond Age (Neal Stephenson)

------
allochthon
Those emojis, including the taco, will be there for generation after
generation of people to ponder and behold. The taco seems like a pretty
transient thing to put into something as important as unicode.

~~~
awalton
IMO the entire emoji-in-Unicode idea is ridiculous - it's a hack around the
inability to allow arbitrary markup across platforms. But we're stuck with
it because Apple said "we're doing this" after seeing Japanese phone makers do
it with arbitrary encodings last decade, and the entire rest of the industry
caved, completely without regard for how much worse it makes text rendering,
processing and layout engines (to which they say "you can just not support
it", and instead turn Twitter and the rest of the web into mojibake).

But that's just my opinion on the matter. I'm certain I'm wrong and they're
actually just the best thing since sliced bread, because apparently every woman
I've ever met loves the damned things.

~~~
0x0
I think it's actually pretty neat that they are regular Unicode characters, as
they can then be used in any text field, making for fun stuff like using them
in address book contact names or calendar event names or even the terminal.
That wouldn't work if markup was required.

~~~
josteink
I think that's exactly his point.

We have an entire industry where, although humans in the past could do anything
freeform (pen and paper), we are now limited to representing it as text only,
while our human need to add extra "stuff" is still there.

If we had an industry-wide "standard" text-representation (with associated UI-
widgets and controls) which had the ability to include more than just text,
this wouldn't be a problem and we wouldn't need to standardize each new
"symbol" we want to represent as "text" in our applications through Unicode.

Using Unicode for this certainly feels like inappropriate piggy-backing and a
giant hack.

~~~
0x0
Creating a "standard" text-representation UI widget and markup that works
universally sounds like a much larger engineering effort compared to adding
some code points to an existing standard, with a much larger room for errors
and differences in the implementations. Perfect is the enemy of... something
that (already) works :)

~~~
TeMPOraL
What you said is pretty much the textbook definition of short-sightedness :).
Perfect is the enemy of good, therefore in things that are actually important
(like critical infrastructure) we should not stop when we reach "good enough"
:). Or, worse is sometimes better, but usually it's just worse.

------
kevin_thibedeau
I've been dying for a modern pentathlon emoji. My life is complete.

------
juhq
The emojis here make me sad on multiple levels.

Most of all makes me sad because they seem to be prioritised on a higher level
than some of the oldest written languages on the planet.

~~~
masklinn
> Most of all makes me sad because they seem to be prioritised on a higher
> level than some of the oldest written languages on the planet.

1. is there any evidence for that claim or are you just making things up as
you go?

2. which scripts are specifically blocked because of emoji "being
prioritised"?

Keep in mind, Unicode 9 includes 74 new emoji and 7227 non-emoji codepoints
including 4 new scripts.

------
bitwize
I'd been bracing for the inclusion of fistbump, selfie, and "talk to the hand"
emoji for some time now. Now I can finally text in peace.

~~~
TeMPOraL
I wonder what the expected lifetime of SELFIE is, though. Will it even be a
thing in 20 years?

~~~
gherkin0
> Will it even be a thing in 20 years?

I'd expect emojis to go out of style before selfies. Selfies are just a quick,
easy, and ubiquitous way of getting a photo of yourself (which people have
wanted since the beginning of photography). They're just taking off now
because the technology that allows them (front-facing phone cameras) has
become widespread.

------
Animats
But you can't send the new emoji over basic SMS, because SMS uses a variant
of UTF-16 from the era when people thought 16 bits was big enough. (So do Java
and Windows, although there are hacks in both to get past 2 bytes.) The new
emoji are all up in the astral planes, beyond 2 bytes.

~~~
alblue
Emoji has always been beyond 2 bytes. The Unicode spec also includes
"surrogate pairs", which allow a higher-plane code point to be represented as
4 bytes.
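The surrogate-pair mechanism mentioned above fits in a few lines of Python. A code point above U+FFFF is offset by 0x10000, and its remaining 20 bits are split across a high surrogate (D800 to DBFF) and a low surrogate (DC00 to DFFF). The function name is illustrative. Its result is cross-checked against Python's built-in `utf-16-be` codec.

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need surrogates")
    offset = cp - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


# GRINNING FACE, U+1F600
high, low = to_surrogate_pair(0x1F600)
print(hex(high), hex(low))  # 0xd83d 0xde00

# Cross-check against Python's own UTF-16 encoder (big-endian, no BOM).
print("\U0001F600".encode("utf-16-be").hex())  # d83dde00
```

A BMP character such as U+263A needs no pair and encodes as a single 2-byte unit.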

~~~
Animats
There are a few 2-byte emoji:

0x2639 Frowning face

0x263a Smiling face

(Hacker News doesn't speak much Unicode; the Unicode symbols won't pass
through.)
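A quick Python check of the two BMP characters listed above. It uses the standard `unicodedata` module to print their official names and UTF-16 sizes. It also shows U+FE0F, the variation selector often appended to such characters to request emoji-style rather than text-style presentation.

```python
import unicodedata

for cp in (0x2639, 0x263A):
    ch = chr(cp)
    # Both sit in the BMP, so UTF-16 needs only one 2-byte unit each.
    print(f"U+{cp:04X} {unicodedata.name(ch)} "
          f"({len(ch.encode('utf-16-be'))} bytes in UTF-16)")

# U+FE0F asks for emoji presentation of the preceding character;
# it is invisible on its own.
print(unicodedata.name("\ufe0f"))
```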

~~~
vardump
So what happens if I put a standard smiley in a message? (Grinning face
Unicode: U+1F600, UTF-8: F0 9F 98 80)

Edit: I see, it disappears.

~~~
simoncion
Hmm. Does HN use MySQL with "utf8" encoding as backend storage? ;)

~~~
riffraff
For those not getting the joke: MySQL has a character set called "utf8" which is
not in fact UTF-8 and will (depending on settings) either truncate text when it
meets a 4-byte character, or raise an error.

It also supports real UTF-8 in more recent versions, calling it "utf8mb4".
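To make the 3-byte limit concrete, here is a minimal Python sketch. The helper names are hypothetical and not part of any MySQL client library. MySQL's legacy `utf8` stores at most three bytes per character. That covers the BMP but not astral-plane characters such as U+1F600, whose UTF-8 form is the four bytes F0 9F 98 80 mentioned above. Whether the server truncates or errors is a settings question this sketch does not model.

```python
def utf8_lengths(text: str) -> list[tuple[str, int]]:
    """Return each character's code point label and UTF-8 length in bytes."""
    return [(f"U+{ord(ch):04X}", len(ch.encode("utf-8"))) for ch in text]


def fits_legacy_mysql_utf8(text: str) -> bool:
    """Hypothetical check: legacy MySQL "utf8" allows at most 3 bytes per
    character, so anything outside the BMP (4 bytes in UTF-8) won't fit."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


grin = "\U0001F600"
print(grin.encode("utf-8").hex(" "))        # f0 9f 98 80
print(utf8_lengths("a\u263a" + grin))       # [('U+0061', 1), ('U+263A', 3), ('U+1F600', 4)]
print(fits_legacy_mysql_utf8("caf\u00e9"))  # True
print(fits_legacy_mysql_utf8(grin))         # False
```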

~~~
david-given
This made filing a bug about Thunderbird not sizing astral plane code points
correctly slightly more hilarious than it should have been (Mozilla's Bugzilla
instance runs on MySQL)...

------
donatj
I'm curious about the real value of supporting dead languages. The number of texts
in said languages is presumably no longer growing.

There's no harm in it, particularly in further-out planes; it's just that my
knee-jerk reaction is "why?".

~~~
awalton
I can't understand how you _don't_ see the value in supporting ancient and
dead languages with precise full text search, enabling lossless reproduction
and alternative representations through printing, web sites, etc. Why wouldn't
we want to try to keep our pre-digital records alive by immortalizing them
digitally?

------
supernintendo
I need that pancakes emoji.

~~~
BinaryIdiot
I love pancakes! If they're going to add any more foods then it should be
pancakes damn it :)

In all seriousness I'm very curious to see how far this will go. Do we really
want these graphical glyphs in Unicode or would they be more suited for
another encoding to separate them (but perhaps provide a way of
interoperability?). Honestly I'm out of my wheelhouse trying to think of
ideas of doing this better so I don't really know.

~~~
masklinn
> but perhaps provide a way of interoperability?

There's the rub. Unicode is a fine way to provide interoperability, has been
since at least Unicode 1.1 (which added — amongst many others — U+263A WHITE
SMILING FACE or U+25EE UP-POINTING TRIANGLE WITH RIGHT HALF BLACK) and
probably 1.0 (but I can't be arsed to look up 1.0's symbolic codepoints)

------
urda
Mmm avocado!

------
oliv__
More emoji, seriously? Is this what the future of tech standards looks like?
When is this validation-hungry-teen-catering-ego-fest going to stop...

I'm getting really sick of the direction mainstream tech (which is I guess
driven by mainstream culture, or lack thereof) is going these days.

~~~
falcolas
Pictograms have been with us for years. Hieroglyphs, Chinese/Japanese all have
their roots in drawings of real life items. They're easy to understand and can
frequently convey more meaning in a single glyph than a dozen conventional
words.

~~~
totony
I don't think unicode should be used to convey pictographs. I understand the
need to cover _writing systems_ such as hieroglyphs, but adding in random
pictographs (like emojis) and permutations (like skin colors) seems like too
much complexity for what is expected of unicode.

