
Unicode skin-tone modifiers - weinzierl
http://www.unicode.org/reports/tr51/#Diversity
======
bane
On the plus side, this is really great work. On the minus side I sometimes
think people working on unicode, a system intended to allow for all the
world's written languages to be represented by a single encoding, have lost
their minds sometimes.

I understand that emoji are used in-line with text, but they aren't text.

Once you start going down the path of encoding pictures into a text
representation, you're going to start missing things that people need in these
kinds of pictorial representations.

I'm sympathetic in a sense: I wrote a paper in college demonstrating why
emoji/emoticons are a natural extension of text (to a point), because they can
allow encoding emotion and intent, which natural language punctuation doesn't
really allow for. But at the same time, is it _really_ necessary to have half
a dozen fish, trains, several food items, an alien head, a full set of numbers
from 1-10, squares of different sizes, and other non-emotional iconography?

And then the argument is that this is just intended to encode Japanese carrier
emoji. Great, let's spend brain cycles building into unicode icons used by
phone carriers in one country.

I'm already bothered by the separate encodings I've seen from time to time for
the exact same script, but in slightly different font variations. That's
supposed to be handled by the font that's representing the script, not by the
encoding.

This really needs to be a separate encoding that's not unicode. I figure at
some point, unicode will simply turn into a generic image format with a bunch
of extra character encoding baggage that people will complain about.

~~~
masklinn
> On the plus side, this is really great work. On the minus side I sometimes
> think people working on unicode, a system intended to allow for all the
> world's written languages to be represented by a single encoding, have lost
> their minds sometimes.

Why? It's a useful and convenient pictographic addition to text, and it brings
attention to astral characters _and_ combining characters whose handling is
routinely screwed up by western developers (let alone anglo-centric ones) who
don't routinely deal with them and don't care to test for these cases. I mean
it took until _2010_ for MySQL to start not _destroying_ astral characters at
all (with the introduction of "utf8mb4" in 5.5(.3)). And oddly enough, one of
the big world-wide changes to text content between 2006 (MySQL 5.1) and 2010
was emoji spreading outside of japan starting around 2008. I'd like to say
it's a coincidence, but that's not really likely.
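
To make the astral-character point concrete, here is a minimal Python sketch. It is only an illustration, not MySQL code: `utf8_len` and `fits_in_three_bytes` are made-up helper names, and the three-byte check is a stand-in for a storage layer that caps UTF-8 characters at three bytes, as MySQL's pre-utf8mb4 "utf8" charset did.

```python
# Illustration only: why a UTF-8 variant capped at three bytes per character
# cannot store astral (supplementary-plane) characters.

def utf8_len(ch: str) -> int:
    """Number of bytes a single character takes in UTF-8."""
    return len(ch.encode("utf-8"))

def fits_in_three_bytes(text: str) -> bool:
    """True if every character in text needs at most three UTF-8 bytes."""
    return all(utf8_len(ch) <= 3 for ch in text)

bmp_text = "caf\u00e9"          # every code point is below U+10000
astral_text = "cat \U0001F431"  # U+1F431 CAT FACE is above U+FFFF

print(fits_in_three_bytes(bmp_text))
print(fits_in_three_bytes(astral_text))
print(utf8_len("\U0001F431"))
```

Anything above U+FFFF, which includes most emoji, fails that check.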

> And then the argument is that this is just intended to encode Japanese
> carrier emoji. Great, let's spend brain cycles building into unicode icons
> used by phone carriers in one country.

That's an argument nobody is making because emoji escaped japan back in 2008
when people outside japan started unlocking the emoji iOS keyboard and using
emoji elsewhere.

> I'm already bothered by the separate encodings I've seen from time to time
> for the exact same script, but in slightly different font variations. That's
> supposed to be handled by the font that's representing the script, not by
> the encoding.

That doesn't really work, because now you can't mix the two languages anymore.
This is actually a big issue with Han unification: mixing Chinese and Japanese
in the same text becomes a pain in the ass because they can't look right
without a bunch of font-based hackery, which expects the client to have all
the right fonts in the first place.

> This really needs to be a separate encoding that's not unicode.

And so you couldn't mix emoji and text, now that would be convenient and
absolutely wouldn't lead to proprietary emoji implementations in private
unicode fields at all.

Well it would actually because that's how emoji were first integrated in
unicode in the first place.

~~~
bane
You talk about this issue like text and images have never shared space next to
one another in the same gui control on a computer screen.

More importantly, your history is all wrong. Emoticons have been used in the
west since the telegraph, and kaomojis (emojis) showed up in the 80s and were
known outside of Japan not long after.

These images are simply an interpretation of common emojis into graphical
form, they literally map something like ^_^ to a smiley face.

Somewhere along the line, it was thought to be a good idea to let people just
choose from common emoticons and have it map to the character implementation
rather than doing the reverse (because remembering things is hard). On the
receiving end, the ^_^ is simply replaced with whatever image the chat app has
chosen to represent that.

Later, DoCoMo, KDDI and Softbank decided to further formalize the emoji codec
mapping as part of Shift-JIS (and ISO-2022-JP) and that's why we have
encodings for snail, minidisc and chicken leg, but not mosque, pork chop and
blue jeans. Here's the mapping table
[https://docs.google.com/viewer?url=http%3A%2F%2Fwww.unicode....](https://docs.google.com/viewer?url=http%3A%2F%2Fwww.unicode.org%2F~scherer%2Femoji4unicode%2Fsnapshot%2Femojidata.pdf)

What the logic was to do this is anybody's guess, but it was probably for
bandwidth efficiency.

It would be more appropriate for there to be an entirely separate
"expressions" standard for encoding various emoticon systems (there's several)
and providing a standard iconography. You can already mix and match fonts and
languages with images in most rich-text controls.

Unicode is not the correct place to do this. And amazingly, text-controls, as
we have already established, can support fonts _and_ images next to each
other. So extending a text control to support unicode, images _and_
expressions, would seem more logical than shoehorning shift-jis, which is
_not_ a human written language, into unicode.

Here's the original proposal that started this madness, so you can read the
rationale. There's lots of people to blame for this, but we can start with the
authors of this proposal.

[http://www.unicode.org/L2/L2009/09025r2-emoji.pdf](http://www.unicode.org/L2/L2009/09025r2-emoji.pdf)

------
drdaeman
I wonder what the reasoning was for fixing the color set to the Fitzpatrick
scale instead of using something generic that allows
non-natural/non-realistic/non-human tones[2]. Something like language tags[1]
but for color codes.

[1]
[https://en.wikipedia.org/wiki/Unicode_control_characters#Lan...](https://en.wikipedia.org/wiki/Unicode_control_characters#Language_tags)

[2] The Internet is primarily about cute kitties, not humans. And there is
still no means to mark whether U+1F431 (🐱) is black or, say, silver spotted
tabby.

~~~
kps
Well, I think we should get the humanoids done first.

    
    
        U+1F400 EMOJI MODIFIER SMURF
        U+1F401 EMOJI MODIFIER NAVI
        U+1F402 EMOJI MODIFIER GREY
        U+1F403 EMOJI MODIFIER GRAY
    

… the last two being particularly useful for 1F47D  EXTRATERRESTRIAL ALIEN;
and

    
    
        U+1F404 EMOJI MODIFIER REPTILIAN SHAPESHIFTER
    

whose rendering is implementation-dependent but will be visually identical to
one of the FITZPATRICK MODIFIER set if you know what's good for you.

But the real, important work will come with the modifier modifiers, e.g.

    
    
        U+1F500 EMOJI MODIFIER MODIFIER EMBARRASSED
        U+1F501 EMOJI MODIFIER MODIFIER SUNBURNT
        U+1F502 EMOJI MODIFIER MODIFIER CHOKING
        U+1F503 EMOJI MODIFIER MODIFIER DEAD
    

… to be used in conjunction with the EMOJI MODIFIER set. These will need to be
stackable if we are to be able to distinguish embarrassed dead white males
from regular arrogant dead white males.

Another commenter mentioned the requirement for facial hair,

    
    
        U+1F1330 EMOJI MODIFIER ANCHOR
        U+1F1331 EMOJI MODIFIER BALBO
        U+1F1332 EMOJI MODIFIER CHIN PUFF
        U+1F1333 EMOJI MODIFIER CHIN STRAP
        U+1F1334 EMOJI MODIFIER DUTCH
        U+1F1335 EMOJI MODIFIER FIVE O'CLOCK SHADOW
        U+1F1336 EMOJI MODIFIER GOATEE
        U+1F1337 EMOJI MODIFIER HACKER BEARD
        U+1F1338 EMOJI MODIFIER IMPERIAL
        U+1F1339 EMOJI MODIFIER MUTTON CHOPS
        U+1F133A EMOJI MODIFIER NECKBEARD
        U+1F133B EMOJI MODIFIER SOUL PATCH
    
        U+1F1430 EMOJI MODIFIER BOX CAR MOUSTACHE
        U+1F1430 EMOJI MODIFIER CHEVRON MOUSTACHE
        U+1F1430 EMOJI MODIFIER DALI MOUSTACHE
        U+1F1430 EMOJI MODIFIER HANDLEBAR MOUSTACHE
        U+1F1430 EMOJI MODIFIER PENCIL MOUSTACHE
        U+1F1430 EMOJI MODIFIER PYRAMIDAL MOUSTACHE
        U+1F1430 EMOJI MODIFIER TOOTHBRUSH MOUSTACHE
        U+1F1430 EMOJI MODIFIER WALRUS MOUSTACHE
    

which in turn requires the development of headgear modifiers, so that it's
possible to distinguish between U+1F468 MAN, U+1F3FB EMOJI MODIFIER
FITZPATRICK TYPE-1-2, U+1F1430 EMOJI MODIFIER TOOTHBRUSH MOUSTACHE (Hitler)
and U+1F468 MAN, U+1F3FB EMOJI MODIFIER FITZPATRICK TYPE-1-2, U+1F1430 EMOJI
MODIFIER TOOTHBRUSH MOUSTACHE, U+1F1849 EMOJI MODIFIER BOWLER HAT (Chaplin).

(Edit: I'm at 1338 internet points right now, so _somebody_ needs to downvote
me for this.)

~~~
krick
Also we could discuss ordering of these characters. Like why, for instance,
white males should be standing before black females after sorting? Seems like
inequality! Maybe they should be sorted randomly every time?

Same applies to beards, of course.

------
overgard
So, uh, what's the actual use of this? Think about it, normal emoji are pretty
neutral since their features are abstract and they have a skin color (bright
yellow) that doesn't exist anyway. If I send a smiley face with a skin tone
attached I'm suddenly making a political statement with a smiley face. That
just seems weird.

~~~
Synaesthesia
I believe Black people were not really represented. Yellow can be interpreted
as white or Asian but possibly not as African/black skin colour. For example,
the "Princess" emoji appears to be a blonde girl, so maybe I want my princess
emoji to look more like my girlfriend. This is prevalent all over the world:
here in South Africa, for example, if you look at children's books and toys
there is also a lack of representation of black skin colour.

~~~
overgard
Should emoji "represent" anyone at all though? I always considered them an
abstract representation of an emotional state, not a person. I guess the
advantage of the old school method (punctuation marks, like ":-)" ) is that
they're so abstract that they obviously represent an idea rather than a group
or a person.

~~~
colllectorof
That's my take on it as well. I guess it can make some sense to add skin color
to emoji that represent _people_. But a lot of them represent _emotions_ and
_mental states_ :

"I'm happy"

"I'm surprised"

"I'm listening"

Things that are universal to all humans. Will adding skin color to _those_
emoji really enable better communication, or just create issues?

------
sidarape
Am I the only one who finds the whole emoji thing a bit ridiculous?

~~~
weinzierl
Not ridiculous, but I think Unicode consortium's resources could be spent
better.

This is my opinion about including emoji in Unicode, not the skin color
modifiers.

I can only think of it as a case of bike-shedding. Han unification, Indic
scripts or the lack of differentiation between umlaut and diaeresis are
difficult and require experts to be solved satisfactorily. Emojis are easy and
anyone is able to contribute a little bit.

What is the scope of Unicode?

    
    
      Unicode covers all the characters for all the writing 
      systems of the world, modern and ancient. [1]
    

Sure, emojis are used for communication, but so are images, logos, emblems,
coats of arms. Will we have company logos next? Hopefully not.

[1]
[http://www.unicode.org/faq/basic_q.html](http://www.unicode.org/faq/basic_q.html)

~~~
masklinn
The thing is, emojis were being used (appeared in the 90s), and spreading (how
many people used various tricks and hacks to "unlock" emojis in older iOS
versions despite not being in Japan?), so the unicode consortium could either
standardise it, or see the number of proprietary extensions increase further
and less and less people give a flying fuck about their registry.

~~~
sidarape
You might have a point, but why would, for instance, iOS use its own
proprietary emojis if everybody who has Android cannot see them?

~~~
masklinn
Why not? But you certainly don't have to trust me on that: it's what actually
happened. Apple added support for Softbank (unicode-based) emoji when they
started working on japan, and non-japanese started using that even though it
only worked between iOS devices[0] (didn't even work on OSX IIRC). Hell iOS
has its own proprietary messaging system and people use that.

[0] it actually required "unlocking" the relevant keyboard by starting an
application which used those characters or something like that, shit was
crazy, it took several ios versions for apple to make the emoji keyboard
available to any and all OOTB

------
jameshart
Can't wait to bring up this edge case the next time someone asks me to reverse
a string in an interview.

~~~
peteretep
How's it different from any other combining character?

~~~
masklinn
It's not, jameshart just learned about something which has been an issue in
reversing unicode text since 1991.

So yay for unicode skin tones I say, just as emoji brought to light all the
broken handling of astral characters out there, skin tones might finally bring
some light to combining characters for all the anglo-centric developers out
there who couldn't be arsed to learn about the issue when it didn't affect
them.
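
For anyone who has not run into this, here is a rough Python sketch of the problem and one partial fix. `reverse_keeping_marks` is a hypothetical helper; it only keeps combining marks (nonzero canonical combining class) attached to their base character and is not a full grapheme cluster implementation per UAX #29.

```python
import unicodedata

def reverse_keeping_marks(text: str) -> str:
    """Reverse text while keeping combining marks after their base character."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            # Combining mark: glue it to the preceding base character.
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))

word = "noe\u0308l"   # "noël" written with a separate COMBINING DIAERESIS
naive = word[::-1]    # the diaeresis now follows "l" instead of "e"
careful = reverse_keeping_marks(word)

print([hex(ord(c)) for c in naive])
print([hex(ord(c)) for c in careful])
```

Emoji skin-tone modifiers have combining class 0, so this sketch would not keep them attached either; that takes proper grapheme segmentation.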

~~~
jameshart
Well thanks for assuming ignorance rather than amusement on my part.

I've always found the entire concept of reversing a string amusing enough, now
I get to add another example of why the entire problem is ill-defined. What is
the reverse of the 'white woman hearts black man' emoji? Is it a 'black man
hearts white woman' emoji?

The realistic string handling problem this will at least bring to light is
probably more likely to be an issue with string truncation than with
ill-formed interview questions. As you say, if this brings the risk of
splitting combining characters to the fore, maybe more software will handle it
in the future.

~~~
peteretep

        > What is the reverse of the 'white woman hearts black
        > man' emoji? Is it a 'black man hearts white woman' emoji
    

No, because when denormalized it consists of codepoints joined by U+200D ZERO
WIDTH JOINER, which suggests it's a single grapheme, which means you shouldn't
break it apart when reversing.
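
A rough sketch of what "don't break it apart" could mean in code, in Python. The sequence below is an assumed example (woman, skin tone, ZWJ, heart, variation selector, ZWJ, man, skin tone), and whether a given font renders it as a single glyph is a separate matter. `rough_clusters` and `is_extender` are hypothetical helpers, not the UAX #29 segmentation algorithm.

```python
ZWJ = "\u200d"    # ZERO WIDTH JOINER
VS16 = "\ufe0f"   # VARIATION SELECTOR-16 (emoji presentation)

def is_extender(ch: str) -> bool:
    """True for code points that should stay attached to what precedes them."""
    return ch in (ZWJ, VS16) or 0x1F3FB <= ord(ch) <= 0x1F3FF  # skin-tone modifiers

def rough_clusters(text: str) -> list:
    """Split text into units, keeping ZWJ sequences and modifiers together."""
    clusters = []
    for ch in text:
        if clusters and (is_extender(ch) or clusters[-1].endswith(ZWJ)):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

couple = (
    "\U0001F469\U0001F3FB" + ZWJ      # WOMAN + light skin tone
    + "\u2764" + VS16 + ZWJ           # HEAVY BLACK HEART, emoji style
    + "\U0001F468\U0001F3FF"          # MAN + dark skin tone
)
text = "a" + couple + "b"
reversed_text = "".join(reversed(rough_clusters(text)))

print(len(couple), len(rough_clusters(text)))
```

Reversing the units rather than the code points leaves the sequence intact, so the answer to "who hearts whom" does not change.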

------
weinzierl
This was news to me (the UTR is dated 2015-06-09) but I found that it had been
discussed on the web previously. A good overview is [1].

Another thing I learned is that there is a scale for skin tones, the
Fitzpatrick Scale [2].

[1] [http://blog.emojipedia.org/2015-the-year-of-emoji-diversity](http://blog.emojipedia.org/2015-the-year-of-emoji-diversity)

[2] [https://en.wikipedia.org/wiki/Fitzpatrick_scale](https://en.wikipedia.org/wiki/Fitzpatrick_scale)

------
makecheck
The basic problem with emojis is the use of color of _any_ kind, because a
text string alone cannot specify the background that would make the emoji look
good. (And besides, the look of each icon varies across platforms!)

This to me is why emojis in fonts are so weird. Normal glyphs can be given a
color (e.g. you have a red background so you make all your text white).
_Everything_ in Unicode can be reasonably changed to another text color,
except for emojis. Heck, even old dingbat icon-fonts look good in any color!
Emojis don't; they clash with practically everything that isn't a white
background.

For example, on my iPhone, I tried using an emoji such as "📞" (telephone
receiver) in an entry from my contacts list, and sure enough that is allowed.
The problem is, this string is used in many contexts: during an outgoing call,
names are shown in white on an arbitrary image background; in the contacts
list, names are black; and there are probably other renderings too. The plain
_text_ looks fine in every case because the OS simply changes all the glyphs
to the appropriate color; yet the "📞" icon looks terrible on the call screen
because its appearance is unchanged and it looks like a blob of goo that's
hard to distinguish from the background.

There are usability reasons to have black-and-white icon designs, too. Mac OS
X does this; such icons are less distracting and it's easier to see details at
small sizes.

Basically I think they should've taken a close look at what dingbat fonts
were doing, and just extended that concept further to have a host of new
black-and-white icons.

------
Kenji
I don't understand why unicode needs emojis. I am perfectly happy with ASCII
smileys :D

This introduces a lot of complexity with little payoff; not a good engineering
practice. Also, let us not forget what Bruce Schneier said: "Unicode is just
too complex to ever be secure." Maybe it would be a good idea to stop
spiraling the complexity out of control??

~~~
weinzierl
I agree with your point, just wanted to add that from the Unicode perspective
emoticons (=ASCII smiley) and emojis are not the same.

    
    
       Q: Are emoji the same thing as emoticons?
    
       A: Not exactly. Emoticons (from “emotion” plus “icon”) 
       are specifically intended to depict facial expression 
       or body posture as a way of conveying emotion or 
       attitude in e-mail and text messages. They originated 
       as ASCII character combinations such as :-) to indicate 
       a smile—and by extension, a joke—and :-( to indicate a 
       frown. In East Asia, a number of more elaborate 
       sequences have been developed, such as (")(-_-)(") 
       showing an upset face with hands raised. Over time, many 
       systems began replacing such sequences with images, and 
       also began providing ways to input emoticon images 
       directly, such as a menu or palette. The emoji sets used 
       by Japanese cell phone carriers contain a large number 
       of characters for emoticon images, along with many other 
       non-emoticon emoji. [1]
    

[1]
[http://www.unicode.org/faq/emoji_dingbats.html](http://www.unicode.org/faq/emoji_dingbats.html)

------
jgalt212
The more they add to, or mess with (depending on one's perspective) Unicode
the more it risks becoming an attack vector.

And if this becomes true, then we have to lock everything down, and we're back
to ASCII. And then we have to start all over again.

~~~
krick
Yep, and because ASCII is strictly not enough to write text in most human
languages (or, well, even in English, if you would like to use proper
punctuation) we have to use some other non-Unicode encoding. And because it
won't be enough to type in several languages at once and will cause all the
collation problems we still have, we are soon building our own Unicode.
Let's hope it will turn out better…

------
golergka
Because the Unicode standard is so simple and boring as it is already.

~~~
gutnor
As far as complexity goes, this is quite simple. Unlike a lot of accented
Latin characters, colour-modified emoji do not have evil twin brothers that
look the same but use a different series of code points.
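
A small Python sketch of the "evil twin" issue, using only the standard `unicodedata` module. The specific characters (é and a thumbs-up with a medium skin tone) are just examples I picked.

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT, two code points

# Same visible text, different code point sequences: the evil twins.
twins_differ = precomposed != decomposed
twins_match_after_nfc = unicodedata.normalize("NFC", decomposed) == precomposed

# An emoji plus skin-tone modifier has no precomposed alternative, so
# normalization leaves it alone.
thumbs = "\U0001F44D\U0001F3FD"  # THUMBS UP SIGN + FITZPATRICK TYPE-4 modifier
emoji_unchanged = all(
    unicodedata.normalize(form, thumbs) == thumbs
    for form in ("NFC", "NFD", "NFKC", "NFKD")
)

print(twins_differ, twins_match_after_nfc, emoji_unchanged)
```

So comparing accented text safely needs a normalization step first, while the modifier sequences have only one spelling.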

~~~
golergka
But you're talking about this feature, isolated.

You should be thinking about how this feature interacts with every other
feature implemented in Unicode. And how, for example, this feature will be
brought into an existing Unicode parser.

There's always a straw that breaks the camel's neck. Or one that just causes
the camel's quality to gradually deteriorate, leading to increasing bug rates
and camel maintainers' depression.

~~~
matthewmacleod
It doesn't interact with anything. It's an existing feature.

------
BudVVeezer
I think this is a good step in the right direction, but still falls short. For
instance, it's heavily gender binary, and has odd cultural mismatches such as
"man with turban" but no "woman with hijab." However, major kudos to the
Unicode consortium for their work on this -- it's a very challenging set of
subjects to get into!

------
krick
I see many people here (somewhat surprisingly) didn't know about that yet, so
here's the thread about how the whole thing started:
[https://news.ycombinator.com/item?id=8558022](https://news.ycombinator.com/item?id=8558022)

------
amelius
What is next? Unicode Afro-hair? :)

Thinking about it, I want a Unicode hacker beard :)

~~~
pavel_lishin
Please specify whether you would like an EMOJI MODIFIER NECKBEARD, an EMOJI
MODIFIER STALLMAN, or an EMOJI MODIFIER ROCKSTAR.

------
mschuster91
I'm missing fur color modifiers for dogs and kittens!

------
Grue3
All this bullshit, and we still can't have separate Chinese and Japanese
characters for the ones that were Han-unified.

------
tomp
Again, no one cares about gingers.

