>The scope of the Unicode Standard (and ISO/IEC 10646) does
not extend to encoding every symbol or sign that bears
meaning in the world.
>This list has been round and round and round on this -- regular as clockwork, about once a year, the topic comes up again. And I see no indication that the UTC or WG2 are any closer to concluding that bunches of icons should start being included in the character encoding standards simply on the basis of their being widespread and recognizable icons.
>Where is the defensible line between "Fast Forward" and
"Women's Restroom" or "Right Lane Merge Ahead" or
"Danger Crocodiles No Swimming"?
Now it looks like they add whatever somebody thinks of. I guess it's related to the liberation from the BMP.
Until Unicode has a half-star character, it won't even be able to encode the average newspaper.
Recent article on the Unicode/emoji debate:
In all seriousness, I'm not sure emoji really belong in text encoding. Even though it's more convenient, based on where they're most frequently used I don't think they need to be universal.
1) everybody uses them on their phones, they're in Unicode, consistent and compatible between devices and messaging programs. In the far flung future, researchers will be able to study their linguistic role in communication, confident in understanding what the characters were.
2) everybody uses them on their phones, they're proprietary fonts and codepoints (in the Unicode private use area if you're lucky, just random data if you're not), there's no consistency between phone models, manufacturers, or cell networks. Future researchers can pound sand.
We were at #2 pre-Unicode. It was a goddamn mess, especially in Japan. Lord knows why anyone would prefer it. There's no value in being a snob about what kinds of incredibly frequently used characters we think are Worthy of inclusion, imo.
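To make the difference between 1) and 2) concrete, here is a minimal Python sketch using the standard `unicodedata` module. The helper name `describe` and the private use code point U+E63E are arbitrary choices for illustration, not taken from any vendor's actual mapping.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return the standard name of a character, or flag it as private use."""
    # General category "Co" marks private use code points: the standard
    # assigns them no meaning, so interpretation is up to each vendor.
    if unicodedata.category(ch) == "Co":
        return f"U+{ord(ch):04X} <private use, meaning depends on the vendor>"
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"

print(describe("\u2603"))      # a standardized symbol
print(describe("\U0001F4A9"))  # a standardized emoji
print(describe("\uE63E"))      # an arbitrary private use code point
```

A standardized character carries a name anyone can look up later; a private use code point only tells you that somebody, somewhere, meant something by it.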
3) People who love colourful images will use stickers in Facebook Messenger, LINE, Viber, and soon iMessage. I'm sure WeChat has them too.
It's basically like 2), except we've moved from proprietary codepoints to proprietary protocols.
I don't mind characters like and or even good old ︎ (which has always been too tiny for its own good). These work in black and white, in different artistic styles, and they're a fairly limited set.
But now we're going down the road where we get new stuff like tacos and unicorns every year. And even though Unicode is an industry standard, the pictures need to look like Apple's bitmaps to avoid confusion, and the Unicode standard changes so often that you have to manually keep track of who can already see and whose computer/phone/browser/messenger software is too old.
> characters like and or even good old ︎ (which
Should have been:
> I don't mind characters like ((yellow smiling Emoji)) and ((thumbs up Emoji)) or even good old ︎((pre-Emoji Unicode smiley)) (which has always been too tiny for its own good). These work in black and white, in different artistic styles, and they're a fairly limited set.
> But now we're going down the road where we get new stuff like tacos and unicorns every year. And even though Unicode is an industry standard, the pictures need to look like Apple's bitmaps to avoid confusion, and the Unicode standard changes so often that you have to manually keep track of who can already see ((upside-down smiling Emoji)) and whose computer/phone/browser/messenger software is too old.
Text symbols (as opposed to emoji) have different rules. Basically, the symbol needs to be used in "running text" (i.e. normal text), like "containers with [recycling symbol] can be recycled" or "he bid 2[club]". Traffic signs for example are not normally used in the middle of text, so they aren't encoded in Unicode. To get the Bitcoin symbol encoded, I needed to show that it was used in text, not just as a standalone icon. The full rules for symbols in Unicode are at http://www.unicode.org/pending/symbol-guidelines.html
For the snowman in particular, it was added to Unicode because it was a symbol used in the character set for Japanese TV broadcasts, see http://www.unicode.org/L2/L2007/07391-n3341.pdf
TL;DR: Don't argue "Why does Unicode have a poop emoji but no symbol for X?" - the rules are totally different for emoji and symbols.
Edit: does HN strip out arbitrary Unicode characters now? I originally had Unicode characters in place of [recycling symbol] and [club], but they disappeared when I submitted.
The snowman, on the other hand, is a weather symbol for snow, I assume. It appears alongside other symbols for meteorological phenomena, so I imagine it was added around the same time and with similar reasoning: http://www.fileformat.info/info/unicode/block/miscellaneous_...
They are mostly in Unicode for use in SMS, but there are plenty of use cases in other forms of text.
Heck, I'd be unhappy. I love adding emoticons and emoji and fun things to my emails.
So why don't we pick a very good set: perhaps every letter in every language in common use for the past 200 years? Then, for the oddball symbols that someone wants to mix in text, there can be some kind of SVG-like convention. This allows publishing textual information without requiring that every device maker updates their device to support a 1-off symbol.
The main purpose of Unicode is to encode the information. How the information is turned into its visual counterpart is outside the scope of Unicode. For what it's worth, this could be done by linking Unicode code points to matching SVGs in a document. Wait, exactly that is already a W3C standard: https://www.w3.org/TR/SVG/fonts.html
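As a rough illustration of "linking code points to matching SVGs", here is a small Python sketch. It is not the W3C SVG fonts mechanism itself, just the general idea: the table `SVG_GLYPHS`, the function `render`, and the placeholder snowman shape are all hypothetical.

```python
from html import escape

# Hypothetical table mapping code points to inline SVG markup.
# The shape below is a placeholder, not a real glyph design.
SVG_GLYPHS = {
    0x2603: (
        '<svg viewBox="0 0 10 10">'
        '<circle cx="5" cy="7" r="3"/><circle cx="5" cy="3" r="2"/>'
        "</svg>"
    ),
}

def render(text: str) -> str:
    """Replace code points that have an SVG entry; HTML-escape everything else."""
    parts = []
    for ch in text:
        svg = SVG_GLYPHS.get(ord(ch))
        parts.append(svg if svg is not None else escape(ch))
    return "".join(parts)

print(render("snow: \u2603"))
```

The encoded text stays the same either way; only the rendering step decides what picture a code point turns into.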
Or, put another way:
'We have an unambiguous, cross-platform way to represent “PILE OF POO” (💩), while we’re still debating which of the 1.2 billion native Chinese speakers deserve to spell their own names correctly.'
> undue effort on every computer maker, ect, to keep up.
The effort to update the font files every few years? Unless you insist on supporting a new Unicode version the second it comes out, I don't see the big effort here? Of course there is effort for font makers, but this is quite centralised.