Unicode skin-tone modifiers (unicode.org)
31 points by weinzierl on Aug 2, 2015 | 63 comments

On the plus side, this is really great work. On the minus side, I sometimes think people working on Unicode, a system intended to allow for all the world's written languages to be represented by a single encoding, have lost their minds.

I understand that emoji are used in-line with text, but they aren't text.

Once you start going down the path of encoding pictures into a text representation, you're going to start missing things that people need in these kinds of pictorial representations.

I'm sympathetic in a sense: I wrote a paper in college demonstrating why emoji/emoticons are a natural extension of text (to a point), because they allow encoding emotion and intent, which natural-language punctuation doesn't really allow for. But at the same time, is it really necessary to have half a dozen fish, trains, several food items, an alien head, a full set of numbers from 1-10, squares of different sizes, and other non-emotional iconography?

And then the argument is that this is just intended to encode Japanese carrier emoji. Great, let's spend brain cycles building into unicode icons used by phone carriers in one country.

I'm already bothered by the separate encodings I've seen from time to time for the exact same script, but in slightly different font variations. That's supposed to be handled by the font that's representing the script, not by the encoding.

This really needs to be a separate encoding that's not unicode. I figure at some point, unicode will simply turn into a generic image format with a bunch of extra character encoding baggage that people will complain about.

I'm far from an expert on Unicode, but I definitely have the feeling that they've had trouble sticking to their original mission. A few months ago we had a discussion here on HN about some people still not being able to write their names in their native script -- Burmese? maybe, I forget -- but now we have skin tone modifiers for emoji.

That said, I guess I agree, if you're going to have emoji at all, you have to at least think about this issue. My inclination would have been to make them all green from the beginning, so there's no question of ethnic favoritism, but I guess it didn't happen that way.

> On the plus side, this is really great work. On the minus side, I sometimes think people working on Unicode, a system intended to allow for all the world's written languages to be represented by a single encoding, have lost their minds.

Why? It's a useful and convenient pictographic addition to text, and it brings attention to astral characters and combining characters, whose handling is routinely screwed up by Western developers (let alone Anglo-centric ones) who don't routinely deal with them and don't care to test for these cases. I mean, it took until 2010 for MySQL to stop destroying astral characters (with the introduction of "utf8mb4" in 5.5(.3)). And oddly enough, one of the big worldwide changes to text content between 2006 (MySQL 5.1) and 2010 was emoji spreading outside of Japan starting around 2008. I'd like to say it's a coincidence, but that's not really likely.
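For anyone wondering what "astral" means in practice, here is a minimal Python sketch. The helper names `utf8_length` and `utf16_units` are made up for illustration. As far as I know, MySQL's older "utf8" charset stores at most three bytes per character, which would explain why four-byte characters needed "utf8mb4".

```python
# Characters outside the Basic Multilingual Plane ("astral" characters)
# have code points above U+FFFF.
cat_face = "\U0001F431"   # CAT FACE, an astral character
e_acute = "\u00e9"        # LATIN SMALL LETTER E WITH ACUTE, inside the BMP


def utf8_length(ch: str) -> int:
    """Number of bytes needed to encode the text in UTF-8."""
    return len(ch.encode("utf-8"))


def utf16_units(ch: str) -> int:
    """Number of 16-bit code units needed to encode the text in UTF-16."""
    return len(ch.encode("utf-16-le")) // 2


print(utf8_length(e_acute), utf16_units(e_acute))    # 2 1
print(utf8_length(cat_face), utf16_units(cat_face))  # 4 2 (a surrogate pair)
```

Any storage layer or string API that assumes "three bytes is enough" or "one UTF-16 unit per character" will mangle the second case.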

> And then the argument is that this is just intended to encode Japanese carrier emoji. Great, let's spend brain cycles building into unicode icons used by phone carriers in one country.

That's an argument nobody is making, because emoji escaped Japan back in 2008 when people outside Japan started unlocking the iOS emoji keyboard and using emoji elsewhere.

> I'm already bothered by the separate encodings I've seen from time to time for the exact same script, but in slightly different font variations. That's supposed to be handled by the font that's representing the script, not by the encoding.

That doesn't really work, because now you can't mix the two languages anymore. This is actually a big issue with Han unification: mixing Chinese and Japanese in the same text becomes a pain in the ass because they can't look right without a bunch of font-based hackery, which expects the client to have all the right fonts in the first place.

> This really needs to be a separate encoding that's not unicode.

And so you couldn't mix emoji and text. Now that would be convenient, and absolutely wouldn't lead to proprietary emoji implementations in Unicode private-use areas at all.

Well it would actually because that's how emoji were first integrated in unicode in the first place.

You talk about this issue like text and images have never shared space next to one another in the same gui control on a computer screen.

More importantly, your history is all wrong. Emoticons have been used in the West since the telegraph, and kaomojis (emojis) showed up in the 80s and became known outside of Japan not long after.

These images are simply an interpretation of common emojis into graphical form, they literally map something like ^_^ to a smiley face.

Somewhere along the line, it was thought to be a good idea to let people just choose from common emoticons and have it map to the character implementation rather than doing the reverse (because remembering things is hard). On the receiving end, the ^_^ is simply replaced with whatever image the chat app has chosen to represent that.

Later, DoCoMo, KDDI and Softbank decided to further formalize the emoji codec mapping as part of Shift-JIS (and ISO-2022-JP) and that's why we have encodings for snail, minidisc and chicken leg, but not mosque, pork chop and blue jeans. Here's the mapping table https://docs.google.com/viewer?url=http%3A%2F%2Fwww.unicode....

What the logic was to do this is anybody's guess, but it was probably for bandwidth efficiency.

It would be more appropriate for there to be an entirely separate "expressions" standard for encoding various emoticon systems (there's several) and providing a standard iconography. You can already mix and match fonts and languages with images in most rich-text controls.

Unicode is not the correct place to do this. And amazingly, text-controls, as we have already established, can support fonts and images next to each other. So extending a text control to support unicode, images and expressions, would seem more logical than shoehorning shift-jis, which is not a human written language, into unicode.

Here's the original proposal that started this madness, so you can read the rationale. There's lots of people to blame for this, but we can start with the authors of this proposal.


Do emojis always look the same or are there different sets? I often wonder if the emotion that I feel some icon perfectly shows, will be displayed the same way at the recipient. I know that fonts and icon sets are usually different per app. Is it just as unpredictable here?

Emoji are like letters, they have a basic meaning but the exact representation depends on the font. So there are "different sets" in the sense that U+1F46E "POLICE OFFICER" may not look the same across all systems, in the same way "A" will not always look the same unless you specify the font to use.
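A small Python sketch of the same point, using only the standard `unicodedata` module: the text itself carries a code point and a name, and nothing about how the glyph is drawn.

```python
import unicodedata

officer = "\U0001F46E"
letter = "A"

# The string stores only code points; the picture comes from whatever
# font the receiving system uses to render them.
for ch in (officer, letter):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

So two phones can agree on the code point and still show noticeably different pictures.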

I wonder why the decision was made to fix the color set to the Fitzpatrick scale rather than use something generic that would allow non-natural/non-realistic/non-human tones [2]. Something like language tags [1], but for color codes.

[1] https://en.wikipedia.org/wiki/Unicode_control_characters#Lan...

[2] The Internet is primarily about cute kitties, not humans. And there are still no means to mark whether U+1F431 (🐱) is black or, say, a silver spotted tabby.

Well, I think we should get the humanoids done first.

… the last two being particularly useful for 1F47D  EXTRATERRESTRIAL ALIEN; and

whose rendering is implementation-dependent but will be visually identical to one of the FITZPATRICK MODIFIER set if you know what's good for you.

But the real, important work will come with the modifier modifiers, e.g.

… to be used in conjunction with the EMOJI MODIFIER set. These will need to be stackable if we are to be able to distinguish embarrassed dead white males from regular arrogant dead white males.

Another commenter mentioned the requirement for facial hair,


which in turn requires the development of headgear modifiers, so that it's possible to distinguish between U+1F468 MAN, U+1F3FB EMOJI MODIFIER FITZPATRICK TYPE-1-2, U+1F1430 EMOJI MODIFIER TOOTHBRUSH MOUSTACHE (Hitler) and U+1F468 MAN, U+1F3FB EMOJI MODIFIER FITZPATRICK TYPE-1-2, U+1F1430 EMOJI MODIFIER TOOTHBRUSH MOUSTACHE, U+1F1849 EMOJI MODIFIER BOWLER HAT (Chaplin).

(Edit: I'm at 1338 internet points right now, so somebody needs to downvote me for this.)

Also we could discuss ordering of these characters. Like why, for instance, white males should be standing before black females after sorting? Seems like inequality! Maybe they should be sorted randomly every time?

Same applies to beards, of course.

So, uh, what's the actual use of this? Think about it, normal emoji are pretty neutral since their features are abstract and they have a skin color (bright yellow) that doesn't exist anyway. If I send a smiley face with a skin tone attached I'm suddenly making a political statement with a smiley face. That just seems weird.

I believe Black people were not really represented. Yellow can be interpreted as white or Asian but possibly not as African/black skin colour. For example, the "Princess" emoji appears to be a blonde girl, so maybe I want my princess emoji to look more like my girlfriend. This is prevalent all over the world; here in South Africa, for example, if you look at children's books and toys there is also a lack of representation of black skin colour.

Should emoji "represent" anyone at all though? I always considered them an abstract representation of an emotional state, not a person. I guess the advantage of the old school method (punctuation marks, like ":-)" ) is that they're so abstract that they obviously represent an idea rather than a group or a person.

That's my take on it as well. I guess it can make some sense to add skin color to emoji that represent people. But a lot of them represent emotions and mental states:

"I'm happy"

"I'm surprised"

"I'm listening"

Things that are universal to all humans. Will adding skin color to those emoji really enable better communication, or just create issues?

I hear your argument: we do need to abstract away from race. That said, with the black guy I can do, like, a cool brother emoji, just need the comb in the hair ... the white man with the moustache can represent my dad from the 70's. ha ha ;-)

Anyway there's that Microsoft grey emoji too, but it looks a bit bland.

I don't really get the idea either. Emojis are used to represent emotion, not skin color (that's why it's called emoji in the first place...). If people want them black, or want to replace them with cat faces, it should be up to them. What's the benefit of specifying explicitly that they should be displayed black?

"Political statement with every emoji" sounds about right.

Certain groups hate the fact that the Internet allows people to overlook the boundaries of race, nationality and gender. It robs those groups of fuel for manufacturing outrage and leveraging it for political power. They'd love to "fix" it.

You think I'm wrong? Okay, then tell me this. Why can't we just color emoji a neutral color (green, blue) and be done with it? Nope, instead we're adding this insanely complicated stuff to the Unicode definition.

There's no complexity to combining characters; it's something that's been there all along. What's the technical difference between e becoming é and [boy] becoming [black boy]? There isn't one, is the answer. Both U+1F3FB "EMOJI MODIFIER FITZPATRICK TYPE-1-2" and U+00B4 "ACUTE ACCENT" are non-reorderable modifier symbols. U+00B4 is somewhat more complex actually, because it's a diacritic with Break_Before linebreak and specific Arabic shaping.
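The property claims can be checked with Python's standard `unicodedata` module. This is just a sketch, and it assumes a Python build whose Unicode data includes U+1F3FB (Unicode 8.0 or later). One caveat: the code point that actually turns "e" into "é" is U+0301 COMBINING ACUTE ACCENT, while U+00B4 is the standalone spacing accent, so the sketch prints all three.

```python
import unicodedata

ACUTE_SPACING = "\u00b4"     # ACUTE ACCENT, the code point named above
ACUTE_COMBINING = "\u0301"   # COMBINING ACUTE ACCENT, attaches to a base letter
SKIN_TONE = "\U0001F3FB"     # EMOJI MODIFIER FITZPATRICK TYPE-1-2

for ch in (ACUTE_SPACING, ACUTE_COMBINING, SKIN_TONE):
    print(
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),    # general category, e.g. Sk or Mn
        unicodedata.combining(ch),   # canonical combining class (0 = not reordered)
    )

# "e" followed by the combining accent composes to the single code point é.
composed = unicodedata.normalize("NFC", "e" + ACUTE_COMBINING)
print(composed == "\u00e9")
```

Both U+00B4 and U+1F3FB come out as category Sk (modifier symbol) with combining class 0, which matches the "non-reorderable modifier symbols" description.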

> There's no complexity to combining characters

Except that all of it needs to be translated to code, it involves operations with color (or multiplying the number of certain icons by 5), it needs to have slightly different behavior (i.e. more code) on black-and-white screens, and so on. And just because there is already a high level of complexity involved in something does not mean you should gleefully add to it.

Combining characters and ligatures have existed since Unicode 1.0 in 1991. Skin colors aren't a different mechanism, and if they require more code because you've hardcoded either, I'm sorry to report you're a moron.

The only thing which may have required more code is multi-color font glyphs, and that's not something new with emoji skin colors; it was at best a new optional thing back when emoji were first introduced.

> it needs to have a slightly different behavior (i.e. more code) on black-and-white screens


Actually there is. This stuff has caused problems ever since it appeared. There have been known bugs and vulnerabilities caused by combining characters.

More than that, I don't think I can provide any references now, but I remember some vulnerabilities caused by improper implementation of emoji alone in applications supposed to support them.

So both always were somewhat of a problem, except combining characters are obviously more important and hence more justified.

Now we have both combined. What unexpected behavior will this lead to? None of us can know.

    There's no complexity to combining characters
Show me one fully compliant and correct implementation. Only one. I'm serious.

Not always. http://emojipedia.org/woman/ Apple made the woman always white, until the newest iOSes.

But I agree, who really cares about the race of their emoticons?

Am I the only one who finds the whole emoji thing a bit ridiculous?

Not ridiculous, but I think Unicode consortium's resources could be spent better.

This is my opinion about including emoji in Unicode, not the skin color modifiers.

I can only think of it as a case of bike-shedding. Han unification, Indic scripts or the lack of differentiation between umlaut and diaeresis are difficult and require experts to be solved satisfactorily. Emojis are easy and anyone is able to contribute a little bit.

What is the scope of Unicode?

  Unicode covers all the characters for all the writing 
  systems of the world, modern and ancient. [1]
Sure, emojis are used for communication, but so are images, logos, emblems, coats of arms. Will we have company logos next? Hopefully not.

[1] http://www.unicode.org/faq/basic_q.html

The thing is, emojis were being used (appeared in the 90s), and spreading (how many people used various tricks and hacks to "unlock" emojis in older iOS versions despite not being in Japan?), so the unicode consortium could either standardise it, or see the number of proprietary extensions increase further and less and less people give a flying fuck about their registry.

    so the unicode consortium could either standardise it, 
    or see the number of proprietary extensions increase 
    further and less and less people give a flying fuck 
    about their registry.
I'd say:

    so the unicode consortium could either standardise it, 
    or see the emoji fad disappear as quickly as it appeared.
Admittedly I don't know much about the situation in Japan or in other countries where emoji became popular in the 90s, but I'm curious.

In particular I have no clue how popular they really were (only youth phenomenon?) and how many people somehow depended on them. In English or German you can get along quite well without emoji. Maybe communication in other languages depends more heavily on emoji because emotions can't be expressed concisely? Or is there another reason why they became popular in Japan first.

You might have a point, but why would, for instance, iOS use its own proprietary emojis if everybody who has Android cannot see them?

Why not? But you certainly don't have to trust me on that: it's what actually happened. Apple added support for Softbank (Unicode-based) emoji when they started working on Japan, and non-Japanese users started using them even though they only worked between iOS devices[0] (didn't even work on OSX IIRC). Hell, iOS has its own proprietary messaging system and people use that.

[0] it actually required "unlocking" the relevant keyboard by starting an application which used those characters or something like that, shit was crazy, it took several ios versions for apple to make the emoji keyboard available to any and all OOTB

There is a certain irony to Unicode releasing an update that adds support for modern hieroglyphics. It's as if the human recorded word is staged to come full circle.

Is there a standardized or widespread character set with company logos or coats of arms?

Does Unicode only accept characters which are in an already standardized or widespread character set?

Company logos are widespread on the web and in icon fonts. 145 of the 585 glyphs in Font Awesome 4.0.0 are brand icons; that is about one quarter. Of the 2000 icons in the Ultimate Icons Pack, 354 are "social logos". Other fonts are similar.

I don't want to see them included in Unicode but I see more arguments for logos in Unicode than for emojis.

I believe countryballs would be a more popular demand ;)

(Damn. Now I want it. Would be a perfect addition to any political discussion. And don't forget about COUNTRYBALL MODIFIER ANSCHLUSS EYES.)

Yup, you're not alone.

In my personal opinion, that's more of rich text formatting (text with images), not extra characters anymore. That's debatable (and the debate's lost already, huh) but I still believe pictures (emojis) should've been a separate encoding, independent from Unicode, as there is a distinction between a grapheme and icon (yes, the distinction's very blurred by pictograms and ideograms - and I'm not sure I get all those things correctly, but still...)

Yeah exactly. Emoji might charitably be called an extension of punctuation, but some of the stuff that ends up in the Unicode spec seems pretty crazy.

I mean, there's a Moai head in this set.

I did at first, but talking to people younger than 20 changed my mind.

Saying "the whole emoji thing" is like saying "the whole texting thing" in 2003 or "the whole internet thing" in 1997. They're here to stay, calling them ridiculous marks you as dated. Whether you care is up to you. 😜

I'm not against emoticons, even if I don't like, for instance, all the big emoticons that Facebook uses; I like using a few like :P or ;). My point is that it should stay like it is currently, with combinations of ASCII symbols, instead of adding a Unicode character for each face, emotion, and thing that exists in the world.


Can't wait to bring up this edge case the next time someone asks me to reverse a string in an interview.

How's it different from any other combining character?

It's not, jameshart just learned about something which has been an issue in reversing unicode text since 1991.

So yay for unicode skin tones I say, just as emoji brought to light all the broken handling of astral characters out there, skin tones might finally bring some light to combining characters for all the anglo-centric developers out there who couldn't be arsed to learn about the issue when it didn't affect them.

Well thanks for assuming ignorance rather than amusement on my part.

I've always found the entire concept of reversing a string amusing enough, now I get to add another example of why the entire problem is ill-defined. What is the reverse of the 'white woman hearts black man' emoji? Is it a 'black man hearts white woman' emoji?

The realistic string handling problem this will at least bring to light is probably more likely to be an issue with string truncation than with ill-formed interview questions. As you say, if this brings the risk of splitting combining characters to the fore, maybe more software will handle it in the future.

    > What is the reverse of the 'white woman hearts black
    > man' emoji? Is it a 'black man hearts white woman' emoji
No, because when denormalized it consists of codepoints joined by U+200D ZERO WIDTH JOINER, which suggests it's a single grapheme, which means you shouldn't break it apart when reversing.
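To make the interview answer concrete, here is a rough Python sketch. It is deliberately simplified and is not the full grapheme cluster algorithm from the Unicode text segmentation rules; `is_extender`, `clusters` and `reverse_clusters` are hypothetical helpers that only know about combining marks, skin-tone modifiers, variation selectors and ZWJ.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER


def is_extender(ch: str) -> bool:
    """True if ch attaches to the preceding character (simplified rule)."""
    return (
        unicodedata.combining(ch) != 0        # combining marks such as U+0301
        or 0x1F3FB <= ord(ch) <= 0x1F3FF      # emoji skin-tone modifiers
        or 0xFE00 <= ord(ch) <= 0xFE0F        # variation selectors
        or ch == ZWJ
    )


def clusters(text: str) -> list:
    """Split text into approximate user-perceived characters."""
    out = []
    for ch in text:
        # Glue ch onto the previous cluster if it extends it, or if the
        # previous cluster ends in a joiner that is waiting for a partner.
        if out and (is_extender(ch) or out[-1].endswith(ZWJ)):
            out[-1] += ch
        else:
            out.append(ch)
    return out


def reverse_clusters(text: str) -> str:
    """Reverse the order of clusters while keeping each cluster intact."""
    return "".join(reversed(clusters(text)))


# woman + ZWJ + heavy black heart + VS16 + ZWJ + man
couple = "\U0001F469\u200d\u2764\ufe0f\u200d\U0001F468"
print(len(couple), len(clusters(couple)))    # 6 code points, 1 cluster
print(reverse_clusters("ab" + couple) == couple + "ba")
```

A plain `text[::-1]` would instead reverse the six code points and leave the joiners and the variation selector attached to the wrong neighbours.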

This was news to me (the UTR is dated 2015-06-09) but I found that it had been discussed on the web previously. A good overview is [1].

Another thing I learned is that there is a scale for skin tones, the Fitzpatrick Scale [2].

[1] http://blog.emojipedia.org/2015-the-year-of-emoji-diversity [2] https://en.wikipedia.org/wiki/Fitzpatrick_scale

The basic problem with emojis is the use of color of any kind, because a text string alone cannot specify the background that would make the emoji look good. (And besides, the look of each icon varies across platforms!)

This to me is why emojis in fonts are so weird. Normal glyphs can be given a color (e.g. you have a red background so you make all your text white). Everything in Unicode can be reasonably changed to another text color, except for emojis. Heck, even old dingbat icon-fonts look good in any color! Emojis don't; they clash with practically everything that isn't a white background.

For example, on my iPhone, I tried using an emoji such as "📞" (telephone receiver) in an entry from my contacts list, and sure enough that is allowed. The problem is, this string is used in many contexts: during an outgoing call, names are shown in white on an arbitrary image background; in the contacts list, names are black; and there are probably other renderings too. The plain text looks fine in every case because the OS simply changes all the glyphs to the appropriate color; yet the "📞" icon looks terrible on the call screen because its appearance is unchanged and it looks like a blob of goo that's hard to distinguish from the background.

There are usability reasons to have black-and-white icon designs, too. Mac OS X does this; such icons are less distracting and it's easier to see details at small sizes.

Basically I think they should've taken a close look at what dingbat fonts were doing, and just extended that concept further to have a host of new black-and-white icons.

I don't understand why unicode needs emojis. I am perfectly happy with ASCII smileys :D

This introduces a lot of complexity with little payoff; not a good engineering practice. Also, let us not forget what Bruce Schneier said: "Unicode is just too complex to ever be secure." Maybe it would be a good idea to stop spiraling the complexity out of control??

I agree with your point, just wanted to add that from the Unicode perspective emoticons (=ASCII smiley) and emojis are not the same.

   Q: Are emoji the same thing as emoticons?

   A: Not exactly. Emoticons (from “emotion” plus “icon”) 
   are specifically intended to depict facial expression 
   or body posture as a way of conveying emotion or 
   attitude in e-mail and text messages. They originated 
   as ASCII character combinations such as :-) to indicate 
   a smile—and by extension, a joke—and :-( to indicate a 
   frown. In East Asia, a number of more elaborate 
   sequences have been developed, such as (")(-_-)(") 
   showing an upset face with hands raised. Over time, many 
   systems began replacing such sequences with images, and 
   also began providing ways to input emoticon images 
   directly, such as a menu or palette. The emoji sets used 
   by Japanese cell phone carriers contain a large number 
   of characters for emoticon images, along with many other 
   non-emoticon emoji. [1]
[1] http://www.unicode.org/faq/emoji_dingbats.html

The more they add to, or mess with (depending on one's perspective) Unicode the more it risks becoming an attack vector.

And if this becomes true, then we have to lock everything down, and we're back to ASCII. And then we have to start all over again.

Yep, and because ASCII is strictly not enough to write text in most human languages (or, well, even in English, if you would like to use proper punctuation), we have to use some other non-Unicode encoding. And because that won't be enough to type in several languages at once, and will bring back all the collation problems we still have, we are soon building our own Unicode. Let's hope it will turn out better…

Because the Unicode standard is so simple and boring as it is already.

I know you are sarcastic, just to point out how complex Unicode has become:

The 7.0.0 Core Specification has exactly 1000 pages. This is only the prose of the core spec; the standard annexes and the character database are separate.

As far as complexity goes, this is quite simple. Unlike a lot of Latin accented characters, colour-modified emoji do not have evil twin brothers that look the same with a different series of code points.
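The "evil twin" problem is easy to show in Python with the standard `unicodedata` module. This is a minimal sketch; the variable names are only for illustration.

```python
import unicodedata

precomposed = "\u00e9"    # é as a single code point
decomposed = "e\u0301"    # e followed by COMBINING ACUTE ACCENT

# Same rendering, different code point sequences, so a naive comparison fails.
print(precomposed == decomposed)

# Normalizing both sides to the same form makes them comparable.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
print(unicodedata.normalize("NFD", precomposed) == decomposed)

# A boy emoji with a skin-tone modifier has no precomposed twin,
# so normalization leaves the sequence untouched.
boy_with_tone = "\U0001F466\U0001F3FB"
print(unicodedata.normalize("NFC", boy_with_tone) == boy_with_tone)
```

The accented letter needs normalization before comparison or lookup; the modified emoji only ever has the one spelling.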

But you're talking about this feature, isolated.

You should be thinking about how this feature interacts with every other feature implemented in Unicode. And how, for example, this feature will be brought into an existing Unicode parser.

There's always a straw that breaks the camel's neck. Or it just causes camel quality to gradually deteriorate, leading to increasing bug rates and camel maintainers' depression.

It doesn't interact with anything. It's an existing feature.

I'm not sure it's that simple.

To quote from another reply: "Can't wait to bring up this edge case the next time someone asks me to reverse a string in an interview."

That's a problem which has existed since Unicode 1.0. That you weren't aware of it before does not make your past code correct.

Hell, now, 24 years late, you finally know about that issue and with a bit of luck you may not fuck it up next time you're doing that. That sounds like a great result to me.

Thanks for sane commentary throughout this story.

It's a bit tiresome how people are so quick to tar orderly additions to technical things as "complexity".

I think this is a good step in the right direction, but still falls short. For instance, it's heavily gender binary, and has odd cultural mismatches such as "man with turban" but no "woman with hijab." However, major kudos to the Unicode consortium for their work on this -- it's a very challenging set of subjects to get into!

I see many people here (somewhat surprisingly) didn't know about that yet, so here's the thread about how the whole thing started: https://news.ycombinator.com/item?id=8558022

What is next? Unicode Afro-hair? :)

Thinking about it, I want a Unicode hacker beard :)


I'm missing fur color modifiers for dogs and kittens!

All this bullshit, and we still can't have separate Chinese and Japanese characters for the ones that were Han-unified.

Again, no one cares about gingers.
