
How did the Unicode Consortium turn around? I remember that 10 years ago they were refusing to add standard media icons because:

>The scope of the Unicode Standard (and ISO/IEC 10646) does not extend to encoding every symbol or sign that bears meaning in the world.

>This list has been round and round and round on this -- regular as clockwork, about once a year, the topic comes up again. And I see no indication that the UTC or WG2 are any closer to concluding that bunches of icons should start being included in the character encoding standards simply on the basis of their being widespread and recognizable icons.

>Where is the defensible line between "Fast Forward" and "Women's Restroom" or "Right Lane Merge Ahead" or "Danger Crocodiles No Swimming"?

(http://www.unicode.org/mail-arch/unicode-ml/y2005-m08/0371.h...)

Now it looks like they add whatever somebody thinks of. I guess it's related to the liberation from the BMP.
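
For readers unfamiliar with the BMP remark: the Basic Multilingual Plane covers code points U+0000 to U+FFFF, and most emoji live above it. A minimal Python sketch (the helper names `in_bmp` and `utf16_units` are made up for illustration) shows the practical difference, namely that characters outside the BMP need two 16-bit code units in UTF-16:

```python
def in_bmp(ch: str) -> bool:
    """True if the character's code point lies in the Basic Multilingual Plane."""
    return ord(ch) <= 0xFFFF


def utf16_units(ch: str) -> int:
    """Number of 16-bit code units needed to encode ch in UTF-16."""
    return len(ch.encode("utf-16-le")) // 2


snowman = "\u2603"          # SNOWMAN, inside the BMP
pile_of_poo = "\U0001F4A9"  # PILE OF POO, outside the BMP

for ch in (snowman, pile_of_poo):
    print(f"U+{ord(ch):04X} in_bmp={in_bmp(ch)} utf16_units={utf16_units(ch)}")
```

The characters are written as escapes so the example survives software that strips non-ASCII input.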




>The scope of the Unicode Standard (and ISO/IEC 10646) does not extend to encoding every symbol or sign that bears meaning in the world.

Until Unicode has a half-star character, it won't even be able to encode the average newspaper.


Somebody should propose the half star (used in star ratings) to Unicode. Seriously.


Multiply your rating system by 2 and you won't need half stars :)


This makes subitizing much more difficult though.


This comment taught me a word. Separately, you are completely correct and this is extremely valid in the design of rating systems.


Only the star with the left side filled in. And an outline on the right.


I think something like an occlusion mask modifier (slice off this much from this side/corner) would be more useful.


And two thirds, and three fifths, …


good idea


What if my products are rated in smiley-faces?


This is the role of digits I believe.


Unicode is supposed to include symbols that appear in "running text", not standalone icons. So no on traffic signs for instance. (There are exceptions for historical reasons. And emoji are a totally separate story.)


> And emoji are a totally separate story.

Recent article on the Unicode/emoji debate:

https://www.buzzfeed.com/charliewarzel/inside-emojigeddon-th...


Unicode 9.0 adds 7500 characters, 72 of which are emoji, so I think the "Emojigeddon" is a bit exaggerated.


Sounds like 72 too many if you ask me.

In all seriousness, I'm not sure emoji really belong in text encoding. Even though it's more convenient, based on where they're most frequently used I don't think they need to be universal.


Your options are:

1) everybody uses them on their phones, they're in Unicode, consistent and compatible between devices and messaging programs. In the far-flung future, researchers will be able to study their linguistic role in communication, confident in understanding what the characters were.

2) everybody uses them on their phones, they're proprietary fonts and codepoints (in the Unicode private use area if you're lucky, just random data if you're not), there's no consistency between phone models, manufacturers, or cell networks. Future researchers can pound sand.

We were at #2 pre-Unicode. It was a goddamn mess, especially in Japan. Lord knows why anyone would prefer it. There's no value in being a snob about what kinds of incredibly frequently used characters we think are Worthy of inclusion, imo.
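
To make option 2 concrete, here is a small Python sketch that flags private-use code points in a string. The ranges are the three private use areas defined by Unicode; the function names and the sample code point U+E63E are only illustrative, not taken from any particular carrier's encoding.

```python
# The three Unicode private use ranges: one in the BMP, plus planes 15 and 16.
PRIVATE_USE_RANGES = [
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
]


def is_private_use(ch: str) -> bool:
    """True if ch has no standard meaning and depends on a private agreement."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PRIVATE_USE_RANGES)


def find_private_use(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for every private-use character in text."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text) if is_private_use(ch)]


# A message containing a hypothetical vendor-specific pictograph.
print(find_private_use("hi \uE63E!"))
```

A receiver can detect such characters, but without the sender's font or mapping table it cannot know what they were meant to depict.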


There's another option:

3) People who love colourful images will use stickers in Facebook Messenger, LINE, Viber, and soon iMessage. I'm sure WeChat has them too. It's basically like 2), except we've moved from proprietary codepoints to proprietary protocols.

I don't mind characters like and or even good old ︎ (which has always been too tiny for its own good). These work in black and white, in different artistic styles, and they're a fairly limited set.

But now we're going down the road where we get new stuff like tacos and unicorns every year. And even though Unicode is an industry standard, the pictures need to look like Apple's bitmaps to avoid confusion, and the Unicode standard changes so often that you have to manually keep track of who can already see and whose computer/phone/browser/messenger software is too old.


Interesting. Did you try to include some emoji in your comment? They did not get included:

> characters like and or even good old ︎ (which


Oops, thanks. Well, that explains why I've never seen Emoji on Hacker News. And I've missed the edit window, so I can't fix my post.

Should have been:

> I don't mind characters like ((yellow smiling Emoji)) and ((thumbs up Emoji)) or even good old ︎((pre-Emoji Unicode smiley)) (which has always been too tiny for its own good). These work in black and white, in different artistic styles, and they're a fairly limited set.

> But now we're going down the road where we get new stuff like tacos and unicorns every year. And even though Unicode is an industry standard, the pictures need to look like Apple's bitmaps to avoid confusion, and the Unicode standard changes so often that you have to manually keep track of who can already see ((upside-down smiling Emoji)) and whose computer/phone/browser/messenger software is too old.


The "universal" in Unicode means that it aspires to include all symbols used in any form of text; not that it should only include symbols that are used in all forms of text.


I kind of agree, but we've already committed at this point. No going back, really. So there's no harm in adding some more.


I have never read a book that had a snowman in the running text, so what's the story for emoji?


Emoji were added to Unicode for compatibility with various mobile phones, so they would have a standard encoding. That's how Unicode ended up with the poop emoji for example - they didn't sit around thinking "what we really need is...". Since people really, really want more emoji, Unicode is sort of stuck constantly adding more. If you want to propose new emoji, the rules are at http://www.unicode.org/emoji/selection.html

Text symbols (as opposed to emoji) have different rules. Basically, the symbol needs to be used in "running text" (i.e. normal text), like "containers with [recycling symbol] can be recycled" or "he bid 2[club]". Traffic signs for example are not normally used in the middle of text, so they aren't encoded in Unicode. To get the Bitcoin symbol encoded, I needed to show that it was used in text, not just as a standalone icon. The full rules for symbols in Unicode are at http://www.unicode.org/pending/symbol-guidelines.html

For the snowman in particular, it was added to Unicode because it was a symbol used in the character set for Japanese TV broadcasts, see http://www.unicode.org/L2/L2007/07391-n3341.pdf

TL;DR: Don't argue "Why does Unicode have a poop emoji but no symbol for X?" - the rules are totally different for emoji and symbols.

Edit: does HN strip out arbitrary Unicode characters now? I originally had Unicode characters in place of [recycling symbol] and [club], but they disappeared when I submitted.
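
Since the literal characters were stripped from the comment above, the symbols can be referred to by code point instead. A minimal Python sketch using the standard `unicodedata` module follows; the helper `describe` is made up for illustration, and picking U+267B as the recycling symbol is an assumption, since the original character is lost.

```python
import unicodedata


def describe(ch: str) -> str:
    """Format a character as 'U+XXXX NAME' using the Unicode character database."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


symbols = [
    "\u267B",      # a recycling symbol (assumed; the comment's original is unknown)
    "\u2663",      # club suit, as in "he bid 2[club]"
    "\u2603",      # the snowman discussed in this thread
    "\U0001F4A9",  # the poop emoji
]

for ch in symbols:
    print(describe(ch))
```

Text symbols and emoji look the same at this level: each is just a code point with a name, whatever rules got it admitted.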


IIRC HN might have a character whitelist to prevent overloads of combining or layout-altering characters, and not have to worry about the behavior of newly added characters. There were some comment threads a few years ago that were just stacks of hundreds of combining diacritics that would crash some rendering engines and create odd decorated text on others.
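
A rough sketch of one way to defuse such stacks, in Python. This is not HN's actual filter (which is not public in this thread); the function name `limit_combining` and the default limit of 2 are assumptions for illustration. It keeps at most a fixed number of consecutive nonspacing or enclosing marks and drops the rest.

```python
import unicodedata


def limit_combining(text: str, max_marks: int = 2) -> str:
    """Drop combining marks beyond max_marks in any consecutive run."""
    out = []
    run = 0  # length of the current run of combining marks
    for ch in text:
        # Mn = nonspacing mark, Me = enclosing mark
        if unicodedata.category(ch) in ("Mn", "Me"):
            run += 1
            if run > max_marks:
                continue  # skip the excess mark
        else:
            run = 0
        out.append(ch)
    return "".join(out)


stacked = "e" + "\u0301" * 5  # 'e' followed by five combining acute accents
cleaned = limit_combining(stacked)
print(len(stacked), len(cleaned))
```

A limit rather than a blanket ban matters here, because many scripts legitimately use one or two combining marks per base character.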


Tom Scott has a great video about the history of emoji. Basically, some companies in Far East countries were encoding these icons in various proprietary codes for their messaging systems. The ecosystem became widespread and consistent enough that the Unicode Consortium saw fit to include these emoji in the standard.

The snowman, on the other hand, is a weather symbol for snow, I assume. It appears alongside other symbols for meteorological phenomena, so I imagine it was added around the same time and with similar reasoning: http://www.fileformat.info/info/unicode/block/miscellaneous_...


Books aren't the only form of running text. Emoji are quite common in some forms of textual communication.


Emoji are widely used in text such as instant messages and forum posts.

They are mostly in Unicode for use in SMS, but there are plenty of use cases in other forms of text.


What about a text, or an article? Books aren't the only source of running text.


And what about email? I have to support POP3 (!); my customers would be seriously unhappy if they couldn't send an email with emoji.

Heck, I'd be unhappy. I love adding emoticons and emoji and fun things to my emails.


How are traffic signs not in "running text" in books about the rules of the road and such like?


Running text means INSIDE text (as in: "running along" with the other characters), not "used in a book as illustration".


It seems that every symbol imaginable will be used in running text eventually. At the very least, for purposes of discussing the symbol itself!


Yes, of course. What was I thinking?


I'm not sure about the 'running text' thing, but in my view traffic signs are not globally universal (yet), so you'd have to have regional variants, which is impractical.


Tons of the things in Unicode are not "globally universal".



Let’s start working on an "SVG over UTF" RFC, shall we?


Honestly, I think "SVG over UTF" makes a lot more sense. It's impossible to make a character set that supports every character known to man, because that just adds undue effort on every computer maker, etc., to keep up.

So why don't we pick a very good set: perhaps every letter in every language in common use for the past 200 years? Then, for the oddball symbols that someone wants to mix in text, there can be some kind of SVG-like convention. This allows publishing textual information without requiring that every device maker updates their device to support a 1-off symbol.


> This allows publishing textual information without requiring that every device maker updates their device to support a 1-off symbol.

The main purpose of Unicode is to encode the information. How the information is turned into its visual counterpart is outside the scope of Unicode. For what it's worth, this could be done by linking Unicode code points to matching SVGs in a document. Wait, exactly that is already a W3C standard: https://www.w3.org/TR/SVG/fonts.html


Because it's easier to throw in random icons than to actually accomplish the goal of "every letter in every language in common use for the past 200 years", or even "past 20 years".

Or, put another way:

'We have an unambiguous, cross-platform way to represent “PILE OF POO” (💩), while we’re still debating which of the 1.2 billion native Chinese speakers deserve to spell their own names correctly.'

https://modelviewculture.com/pieces/i-can-text-you-a-pile-of...


This is a link by the article's author that is intended to make it easier for us to add useful symbols: https://github.com/jloughry/Unicode I recommend you use it to add any glyphs that you feel are being neglected.


That article raises an interesting issue about a character in the author's name that is missing from Unicode. Unfortunately the article is (how to put this?) not constructive. The complex reasons that Unicode excluded the character are described in [1]. If the author addresses those issues, there's a much better chance to get the desired character into Unicode.

[1] http://www.unicode.org/L2/L2004/04252-khanda-ta-review.pdf


Correct me if I'm wrong, but isn't the Han Unification project more about unifying semantically distinct, but visually identical characters under the same codepoint (rather than grouping together similar-looking codepoints as the article suggests)? As far as I'm aware it's more along the lines of reusing the codepoint for 'a' when encoding both English and Spanish text. Am I mistaken in thinking this?


But if the shape is embedded in the text, font choice becomes meaningless.

> undue effort on every computer maker, etc., to keep up.

The effort to update the font files every few years? Unless you insist on supporting a new Unicode version the second it comes out, I don't see the big effort here. Of course there is effort for font makers, but this is quite centralised.


What about the oddest oddballs, whose "symbols" are animations (http://www.reactiongifs.com/r/tww.gif)? They are used a lot on reddit, sometimes even with sound.



