At the very least it is a pleasant unintended consequence. A decade ago, support for characters outside the BMP was hairy (certainly possible, though, in PDFs with LaTeX via XeLaTeX), and something you would only bother with if you dabbled in the obscure glyphs found there; now it is standard.
Seems strange they would miscount so grossly otherwise.
But we currently have 88k CJK characters assigned out of possibly more than 100k total.
I can't easily find anything about how this went wrong and they got such a small number.
If you also count the Unihan variants registered in Unicode's Ideographic Variation Database by various Japanese outfits (encoded using the variation selectors VS17 through VS256 placed after the code point they modify), there are another 8k or 9k unique characters assigned.
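For the curious, an Ideographic Variation Sequence is just the base ideograph followed by one of those variation selectors. A minimal Python sketch (the ideograph U+845B is picked purely for illustration; whether any given sequence is actually registered depends on the IVD):

```python
# Sketch: an Ideographic Variation Sequence (IVS) is a base CJK ideograph
# followed by a variation selector from U+E0100..U+E01EF (VS17..VS256).
base = "\u845b"            # an ideograph, U+845B
ivs = base + "\U000e0100"  # the same ideograph followed by VS17

print(len(base))  # 1 code point
print(len(ivs))   # 2 code points, but rendered as a single glyph
```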
Also in Version 12.0, the following Unicode Standard Annexes have notable modifications ⟨…⟩
UAX #14, Unicode Line Breaking Algorithm
UAX #29, Unicode Text Segmentation
UAX #31, Unicode Identifier and Pattern Syntax
UAX #38, Unicode Han Database (Unihan)
UAX #45, U-Source Ideographs
How many people do you really think care about Elymaic script? Or about Nandinagari?
In addition to what grandparent listed:
> UTS #10, Unicode Collation Algorithm—sorting Unicode text
> UTS #39, Unicode Security Mechanisms—reducing Unicode spoofing
> UTS #46, Unicode IDNA Compatibility Processing—compatible processing of non-ASCII URLs
I think plenty of people care about these changes.
But you ignore the substance of the parent's post, which is about the new elements of the Unicode standard that are not confined to assigning code points. There is substance there to be analyzed: material about how Unicode should be used in defining identifier syntaxes, which is highly relevant to an HN audience (e.g. in defining your new serverless framework, what characters should you allow in function names? Unicode now has a better answer for you than ‘[a-zA-Z_0-9-]’).
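As a concrete sketch of that (assuming Python, whose identifier rules are based on UAX #31's XID_Start/XID_Continue properties, with underscore additionally allowed as a starter):

```python
# Sketch: UAX #31-style identifier validation. Python's str.isidentifier()
# implements the language's identifier rules, which follow the
# XID_Start / XID_Continue properties from UAX #31.
def is_valid_function_name(name: str) -> bool:
    return name.isidentifier()

print(is_valid_function_name("café"))      # True: 'é' may continue an identifier
print(is_valid_function_name("_private"))  # True: underscore is allowed
print(is_valid_function_name("2fast"))     # False: a digit can't start one
```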
There are updates to Unicode security and IDNA support.
But no, sure, let’s complain about emoji and obscure languages.
Something being "somebody's job", and others being upset that that somebody keeps doing it, are not in tension; the upset can be perfectly logical. E.g.
1) if the job wastes resources, is deemed silly, is done badly, etc.
2) if the job is detrimental to those others
Apple has been pretty good about adding emoji on iOS, but macOS seems to be left behind.
As for Linux... Installing the Unifont gives coverage, but most distros don't seem to have a way to update base level fonts.
I'd love to be proved wrong about any of this. But it seems that the majority of systems don't care because their native scripts are already supported.
The latest macOS version gets emoji updates regularly; it stays in sync with the latest iOS.
Newer versions of distributions tend to come with updated versions of the fonts installed, so eventually support will increase.
I'd be surprised if there were more than a "between iOS/macOS major versions" discrepancy in their emoji.
:-( This may be my favorite Google project ever.
Asking because I'm impressed by the aim of the whole Unicode project but having no real experience with it beyond the basics.
For practical purposes there isn't "something else". We're well past the point where Unicode was adding things that worked fine on a specially modified edition of Microsoft Windows for the specific language (like Dungan, which needs extra characters not normally used in Cyrillic) or whatever, these are now often _really obscure_ writing systems where previously you'd only put them "on a computer" by uploading a picture of the writing. Now the computer can handle them as text because they're in Unicode.
For all the historical writing systems, and some of the minority systems that have very few users many of whom know another language that is more widely used and thus more useful to them in practice (imagine going on a forum to ask a question about maintaining the motor sledge you use, you know Russian and also Dungan - obviously you will ask in Russian, because that's a LOT more people who might answer) - in practice the new scripts in Unicode will only be used by academics to transcribe stuff. It still makes that easier, because they can use Unicode everywhere, not just in specialist tools that maybe another researcher built for the language they care about.
Emoji are a slightly different beast though. Those seem to get included based on projected use cases.
It's basically: "text/social comment/chat apps are big, let's add more BS icons for our Facebook/Apple/Google/MS/etc chat apps"
Technically speaking, Unicode is not an encoding, but otherwise your point is mostly correct.
A character set can be encoded in a variety of ways. For Unicode / ISO-10646, the encoding UTF-8 is the most popular, for a variety of reasons that I'm sure will one day be an exciting historical artefact for HN readers to remark upon.
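A quick sketch of the character-set-versus-encoding distinction, in Python: the same code point serializes to different bytes under different encoding forms.

```python
# Sketch: one code point from the Unicode character set, serialized
# under three different encoding forms.
snowman = "\N{SNOWMAN}"  # U+2603

print(snowman.encode("utf-8"))     # 3 bytes: e2 98 83
print(snowman.encode("utf-16-be")) # 2 bytes: 26 03
print(snowman.encode("utf-32-be")) # 4 bytes: 00 00 26 03
```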
I don't like the word "character", because it tends to cause idiots to build software that treats Unicode code points as the indivisible unit out of which strings are made, and that's no more true than it is for bytes. I prefer the nice fuzzy word "squiggle" when I mean the thing you as a human are perhaps imagining when you say "character", and nice technical terms like "pictogram", "grapheme", "glyph", "code point", "code unit", "symbol", and so on when I mean those specific technical things. But in the phrase "character set", that's what we ended up with, so be it.
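To make the "code points aren't indivisible" point concrete, here's a Python sketch (note the standard library has no grapheme-cluster segmentation; you'd need a third-party library or ICU for that):

```python
# Sketch: one on-screen squiggle can be two code points and three
# UTF-8 code units.
e_acute = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT

print(len(e_acute))                  # 2 code points
print(len(e_acute.encode("utf-8")))  # 3 bytes (UTF-8 code units)
print(len("\u00e9"))                 # 1: the precomposed 'é', same squiggle
```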
For UTF-8-safe languages there would be 4 new scripts to add, but this affects only Rust, Java, and cperl. All others are Unicode-unsafe.
Anyway, since you asked, the fringe diversity group getting the new custom emojis is disabled people (mainly hearing-impaired, vision-impaired, and wheelchair users). So, y'know, on the order of 10% of the population. Very fringe.
The rest is stuff like garlic and yo-yo. I mean think of yo-yoers what you will but I don't think they're a fringe diversity group :p.
In comparison, a few extra kb won't seem so bad.
Once they added global 3G/UMTS support, their networks got SMS support, but it's only used by phone-number verification services and 2FA; nobody actually sends them. Before that, their homegrown 2G networks launched messaging using e-mail.
(The reason for that is that the dot patterns are heavily overloaded and even language-specific; German digits are different from US ones, for example.)
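A sketch of why that overloading happens: Unicode's Braille Patterns block (U+2800..U+28FF) encodes only which dots are raised, as a bit mask over U+2800, and leaves the meaning entirely to the local braille convention. (The braille() helper below is hypothetical, just for illustration.)

```python
# Sketch: Braille Patterns encode dots 1..8 as a bit mask over U+2800.
# The same pattern means different things in different braille codes,
# which is the overloading mentioned above.
def braille(*dots: int) -> str:
    mask = sum(1 << (d - 1) for d in dots)  # dot n sets bit n-1
    return chr(0x2800 + mask)

print(braille(1))           # U+2801, 'a' in English Braille
print(braille(3, 4, 5, 6))  # U+283C, the number-sign prefix
```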