
A comment on Hacker News led to 4½ new Unicode characters (2016) - pcr910303
https://unicodepowersymbol.com/we-did-it-how-a-comment-on-hackernews-lead-to-4-%c2%bd-new-unicode-characters/
======
lokedhs
I posted a message on the Unicode mailing list, which eventually led to a
proposal to accept a large number of new characters that encode symbols used
in the old 8- and 16-bit micros.

My original question was specifically about the C64 character set, but we
managed to get several others covered as well, including several symbols from
the Atari ST character set.

The proposal was accepted, and the work continues to create a new proposal
covering the character sets of even more old computers.

[https://www.unicode.org/L2/L2019/19025-terminals-prop.pdf](https://www.unicode.org/L2/L2019/19025-terminals-prop.pdf)

I'm quite happy that my modest question led to some real progress.

~~~
jimnotgym
I'm disappointed to find that the Atari ST character set doesn't contain a
bomb symbol. Obviously two entirely different things have been confused by my
childhood mind.

~~~
lokedhs
The Atari ST did display bombs when an application crashed, but they were
never part of the character set. They are a graphic displayed by the trap
handler in the operating system. The number of bombs indicates the trap type,
so three bombs means trap handler 3 (address error).
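A minimal sketch of the bomb count described above, assuming (as the comment says) that TOS draws one bomb per Motorola 68000 exception vector number. The table covers only the standard vectors 2 to 11, and `describe_bombs` is a hypothetical helper, not part of any Atari API.

```python
# Standard Motorola 68000 exception vector numbers (vectors 2-11).
# Assumption: TOS shows one bomb per vector number, so N bombs = vector N.
M68K_VECTORS = {
    2: "bus error",
    3: "address error",
    4: "illegal instruction",
    5: "zero divide",
    6: "CHK instruction",
    7: "TRAPV instruction",
    8: "privilege violation",
    9: "trace",
    10: "line 1010 emulator",
    11: "line 1111 emulator",
}


def describe_bombs(count: int) -> str:
    """Hypothetical helper: map a bomb count to the 68000 exception it signals."""
    return M68K_VECTORS.get(count, f"exception vector {count}")


print(describe_bombs(3))  # address error
```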

The symbols that do exist but were not included in the proposal are the Atari
logo and the J.R. Dobbs picture. Both are copyrighted, which is why they are
not included.

~~~
ghostbrainalpha
This type of comment is why I come to Hacker News and waste so much time
here. Thank you.

~~~
lokedhs
Thank you for the appreciation. The Atari ST was a very important computer for
me when I was growing up, and I love talking about it.

------
3JPLW
This is a great story.

Next up: I'd love to see the sub/super proposal get some more attention and
effort.

[https://github.com/stevengj/subsuper-proposal](https://github.com/stevengj/subsuper-proposal)
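For context on that proposal: Unicode already has superscript digits, but they are scattered across two blocks, and coverage of letters is patchy. A small Python sketch (the helper name `superscript_digits` is made up for illustration) that maps ASCII digits to superscripts and shows that NFKC normalization folds them back into plain digits:

```python
import unicodedata

# Superscript digits live in two different blocks: 1-3 come from Latin-1
# (U+00B9, U+00B2, U+00B3), the rest from Superscripts and Subscripts (U+2070...).
SUPERSCRIPT_DIGITS = str.maketrans(
    "0123456789",
    "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079",
)


def superscript_digits(text: str) -> str:
    """Replace ASCII digits with their superscript code points; other characters pass through."""
    return text.translate(SUPERSCRIPT_DIGITS)


power = "x" + superscript_digits("10")
print(power)                                 # x¹⁰
print(unicodedata.normalize("NFKC", power))  # x10 (compatibility normalization undoes it)
```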

~~~
moultano
Having the combining character version of that would be fantastic. Eventually,
it would be amazing if all the details of math rendering could make it into
Unicode.

~~~
nixpulvis
> Eventually, it would be amazing if all the details of math rendering could
> make it into Unicode.

I'm not sure if I agree with you or not... Generally I'd say I do, but we're
going to have a hard time "finding the line". What counts as "math"?
Surely 1 + 1 is, as is ∇×𝐇, and we can start to do things like x⁰. However,
what about a graph with nodes and edges (just as an example)? Is that "math"?

One thing strikes me about _strings_ of _characters_... you can select and
copy/paste them (at least in my native alphabet of Latin) very reliably. This
property is not present with Unicode in general.

~~~
chris_wot
It’s not easy to select characters in Arabic and many Vedic languages.

------
ramshorns
I give this achievement 4½ stars.

[http://www.righto.com/2016/10/inspired-by-hn-comment-four-ha...](http://www.righto.com/2016/10/inspired-by-hn-comment-four-half-star.html)

------
gambiting
I love how this was apparently accepted into the standard in 2016, and yet
Chrome still displays an empty square instead. Unicode is such an unbelievable
mess when it comes to support, it's crazy. Windows displays it correctly for
me in Word, but when pasted into Teams it comes out as a semicolon instead.
Brilliant.

~~~
toyg
But it’s not a mess, there is a clear chain of responsibility: * consortium ->
font developers -> app developers. If anything is broken, bitch to the first
level (typically app devs) and wait for your grievance to percolate upwards as
necessary; if it doesn’t happen, you know who to blame.

* most modern operating systems have hopefully sorted out their issues a long time ago.

~~~
LunaSea
If you have to complain, it's broken. As a consumer I don't care what the
issue is and who the responsible party is.

Unicode is a clusterfuck exactly because the chain is too long and the
implementation errors are too easy to make and the world is rife with
incomplete implementations.

~~~
Wowfunhappy
Is it really so surprising that updating a universal standard takes time?

IMO, these processes _should_ happen slowly.

~~~
LunaSea
It has nothing to do with the update speed and everything to do with the
standard itself. Even if you froze Unicode now, you wouldn't encounter
complete, correct implementations in the wild more than 50% of the time in the
next 5 years.

------
edent
It has been 3 years, and Android & iOS still don't fully support Unicode 9.
Any ideas who I should bug about that?

Google's Noto font hasn't released anything for 2 years. No idea if Apple's
default font is open source.

------
nixpulvis
I love how iOS doesn't understand this URL, and still doesn't have these
characters. Too busy removing the Taiwanese flag, and implementing "Animojis".

~~~
phkahler
I still think unicode should not have added emojis. The big guys are adding
animated emojis now, and that's clearly out of scope for character sets. If
they continue down that path unicode will eventually become SVG with
animation. IMHO they should have stopped short of emojis.

~~~
Grue3
IMO Unicode shouldn't have ever added characters with color. Let color be the
property of the markup. The original emoji were all monochrome, and mainly
displayed on monochromatic flip-phone screens. The fact that there now exist
emoji that are otherwise identical but have different color is absolutely
idiotic.

~~~
icebraining
In Unicode, they're represented by a combination of emoji character + colour
character, not by new characters:
[http://www.unicode.org/reports/tr51/#Diversity](http://www.unicode.org/reports/tr51/#Diversity)

Seems a decent solution to me. Monochrome screens can just ignore the colour
character.
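To make the modifier mechanism concrete: the "colour character" here is an emoji modifier, one of five Fitzpatrick skin-tone code points (U+1F3FB to U+1F3FF) that follows the base emoji. The Python sketch below (the helper `strip_skin_tones` is hypothetical) shows the two-code-point sequence and how a renderer could ignore the modifier. It also shows that coloured hearts, by contrast, are separate code points. It assumes Python 3.7+, so that `unicodedata` knows U+1F9E1.

```python
import unicodedata

THUMBS_UP = "\U0001F44D"    # THUMBS UP SIGN
MEDIUM_TONE = "\U0001F3FD"  # EMOJI MODIFIER FITZPATRICK TYPE-4

# The five skin-tone modifiers are U+1F3FB..U+1F3FF.
SKIN_TONE_MODIFIERS = {chr(cp) for cp in range(0x1F3FB, 0x1F400)}


def strip_skin_tones(text: str) -> str:
    """Drop skin-tone modifiers, leaving the base emoji (what a monochrome display could do)."""
    return "".join(ch for ch in text if ch not in SKIN_TONE_MODIFIERS)


tinted = THUMBS_UP + MEDIUM_TONE
print([unicodedata.name(ch) for ch in tinted])
# ['THUMBS UP SIGN', 'EMOJI MODIFIER FITZPATRICK TYPE-4']
print(strip_skin_tones(tinted) == THUMBS_UP)  # True

# Coloured hearts work differently: each is its own code point, with no modifier to strip.
print(unicodedata.name("\U0001F9E1"), "/", unicodedata.name("\U0001F499"))  # ORANGE HEART / BLUE HEART
```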

~~~
zyx321
There are at least 7 variations of the heart symbol differing only in color,
such as 🧡, not including the heart (playing card), which also has multiple
variations.

Edit: apparently HN limits the number of emoji per post, I originally included
several.

[https://emojipedia.org/white-heart/](https://emojipedia.org/white-heart/)

~~~
bhaak
There are 27 variations of the capital letter A in Unicode (see the confusable
list
[https://unicode.org/cldr/utility/confusables.jsp?a=A&r=None](https://unicode.org/cldr/utility/confusables.jsp?a=A&r=None)).

Redundancy has never been a problem with Unicode. Once the decision was made
to add the symbols of the original encoding to the list of Unicode characters,
they had to add all the heart variations; otherwise Unicode couldn't have been
backward compatible.

Or alternatively add a color modifier? I'm not sure that would have been a
better solution.
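The confusables list mixes two different kinds of "A". Compatibility variants (fullwidth, mathematical bold) fold back to plain `A` under NFKC normalization, while lookalikes from other scripts (Cyrillic, Greek) are genuinely different letters and stay distinct. A short Python sketch using only the standard `unicodedata` module, with four sample code points chosen for illustration:

```python
import unicodedata

# A few of the many "A"-like code points.
CANDIDATES = ["\uFF21", "\U0001D400", "\u0410", "\u0391"]

for ch in CANDIDATES:
    folded = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: folds to 'A'? {folded == 'A'}")
# U+FF21 FULLWIDTH LATIN CAPITAL LETTER A: folds to 'A'? True
# U+1D400 MATHEMATICAL BOLD CAPITAL A: folds to 'A'? True
# U+0410 CYRILLIC CAPITAL LETTER A: folds to 'A'? False
# U+0391 GREEK CAPITAL LETTER ALPHA: folds to 'A'? False
```

Normalization only removes the first kind of redundancy; the second kind is what confusable-detection tables exist for.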

------
ggm
Power up stars for the first person to successfully argue for _removal_ of a
UNICODE character.

~~~
nixpulvis
Scanning through quickly, I found this one: ℻ ([https://unicode-table.com/en/213B/](https://unicode-table.com/en/213B/)).

Apparently it's semantically fac·sim·i·le. Which means (according to Google)
"an exact copy, especially of written or printed material."

My argument is something like, why not just write FAX. Or is the counter that
some fonts will specialize this character to something closer to the native
language? That seems unlikely, and instead people will probably learn that FAX
means "to make alike", from Latin. Or is it that we need to make it just a
little bit above the baseline to indicate that it's special. Surely "FAX"
isn't the only thing that should be allowed to be special, right? But then
that's a whole can of worms. Anyway, I'm rambling...

This was the best I could do in the limited time I had (I really should be
asleep by now).
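On "why not just write FAX": the Unicode Character Database already records ℻ as a compatibility character whose decomposition is the plain letters, so NFKC normalization turns it back into `FAX`. The same holds for ℡ (U+2121, TELEPHONE SIGN). A quick check with Python's standard `unicodedata` module:

```python
import unicodedata

for sign in ("\u213B", "\u2121"):
    print(unicodedata.name(sign), "->", unicodedata.normalize("NFKC", sign))
# FACSIMILE SIGN -> FAX
# TELEPHONE SIGN -> TEL
```

So search and comparison code that normalizes with NFKC already treats the special sign and the plain letters as equal; the code point mainly preserves the compact typographic form.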

~~~
kuschku
On business cards, in letters, etc you write your phone numbers.

Usually as

TEL 0123 45 67 89

FAX 0123 45 67 89-0

The "TEL" and "FAX" parts should be superscript small capitals, though. That's
where these special symbols come in.

Some fonts also replace those with icons for phone/fax.

They're actually still in use in Germany today.

~~~
cyxxon
Well, fax machines are still in use here (most notably in medical offices, it
seems), but superscript abbreviations? Never seen them in the last 40 years. I
mean, yes, I have seen the letters FAX or TEL in front of the number, but
never explicitly as a separate character (as far as I can tell).

This would also fit best with the experience that people crafting cards or
letterheads never really know all the intricacies of Word, or Unicode, or
whatever they use, and just "make it look good" - use tabstops instead of
tables, simply type FAX and mark it as superscript, etc...

~~~
kuschku
> This would also fit best with the experience that people crafting cards or
> letterheads never really know all the intricacies of Word, or Unicode, or
> whatever they use, and just "make it look good" - use tabstops instead of
> tables, simply type FAX and mark it as superscript, etc...

I mean, that’s widely known anyway. Look at most letterhead templates online,
pretty much all of them are broken and quite painful in the way they’re built.
Broken tables, tabstops, all combined painfully.

Often, proper tables would simplify a lot, and combined with columns one can
create amazing things. If one also adds elements that hide or show themselves
automatically (e.g. page count, an automated Internetmarke, or hiding it if
unused), one can create stuff that'd save hours of work every day.

------
Wowfunhappy
> There was some discussion around ⏾ as several “moon” characters already
> existed. None of them [...] convey the semantic meaning of “Sleep” – so ⏾
> was accepted.

I'm not convinced ⏾ conveys that meaning either, unless it's explicitly used
alongside the other new symbols. And if it is used alongside those symbols, a
couple of the existing moons could also work just fine.

(⏾ is appearing as tofu on HN for me, but I'm just going to roll with it.)

------
re
It's disappointing that OS-included font support for these has apparently
lagged. iOS 13.2 supports emoji from Unicode 12[1], but doesn't ship with a
font that includes these symbols. Anyone know if any OSes do support them?

[1]
[https://emojipedia.org/apple/ios-13.2/new/](https://emojipedia.org/apple/ios-13.2/new/)

~~~
cosmie
iOS 13.3[1] supports these symbols.

[1] The 13.3 Beta I'm currently running on my 6S Plus renders them fine.

~~~
re
By the way, if you're viewing them on the unicodepowersymbol.com website, they
show up because of a web font included on the page. So here they are on a site
without the font: ⏻⏼⭘⏽⏾

Also here:
[https://en.wikipedia.org/wiki/Power_symbol#Unicode](https://en.wikipedia.org/wiki/Power_symbol#Unicode)
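When debugging tofu like this, it helps to separate the character database from the fonts. Python's `unicodedata` (3.6+, which ships Unicode 9.0 data) knows all five code points, so a blank box means the installed fonts lack glyphs, not that the characters are unknown. A minimal sketch listing them:

```python
import unicodedata

# The five code points from the article: four added in Unicode 9.0,
# plus U+2B58, an existing character that was given the "power off" meaning.
POWER_SYMBOLS = ["\u23FB", "\u23FC", "\u2B58", "\u23FD", "\u23FE"]

for ch in POWER_SYMBOLS:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
# U+23FB  POWER SYMBOL
# U+23FC  POWER ON-OFF SYMBOL
# U+2B58  HEAVY CIRCLE
# U+23FD  POWER ON SYMBOL
# U+23FE  POWER SLEEP SYMBOL
```

Whether a particular font actually has glyphs for these is a separate question, which is what per-font support tables such as fileformat.info's answer.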

~~~
cosmie
Ah, that makes sense. I retract my statement – those symbols are still MIA on
13.3.

------
aembleton
Does anyone know how to add these to Ubuntu 19.10? When I search for Power in
Characters, it has the 'Power On' character but you can't see the symbol [1]

1. [https://imgur.com/a/4qMgPkv](https://imgur.com/a/4qMgPkv)

~~~
ygra
[http://www.fileformat.info/info/unicode/char/23fd/fontsuppor...](http://www.fileformat.info/info/unicode/char/23fd/fontsupport.htm)

------
yoloClin
Now if only we could get Hacker News to support emojis

Edit: This comment ended in man-shrugging, but Hacker News stripped the emoji.

~~~
athenot
They used to be supported until they were abused.

~~~
colejohnson66
Maybe only allow them once you have a certain amount of reputation/points, as
with downvoting and flagging? Then if it's abused, remove that privilege for
that person (IIUC, flag rights can be revoked).

------
Buttons840
> What other useful and/or important symbols are missing from Unicode?

Seems like a good time to ask this question again. Any new answers?

~~~
sjwright
The USB, Wi-Fi and Bluetooth symbols.

A spray can to represent spray paint, insect spray, or spray lubricant.

There's a power plug but no power socket.

There are no staples or staplers.

There are no appliances—no oven, microwave, toaster, mixer, washer, or vacuum
cleaner.

There's no traffic cone.

~~~
duskwuff
> The USB, Wi-Fi and Bluetooth symbols.

I believe those are all trademarked. Unicode tries to avoid those.

> A spray can to represent spray paint, insect spray, or spray lubricant.

Interesting idea! I wonder if there'd be any interest in getting wider
coverage of other bitmap paint tools: rectangular selection, lasso, paint
bucket...

> There's a power plug but no power socket.

That's problematic from a localization perspective, as electrical sockets vary
widely from country to country, and some of them may be difficult to recognize
as a socket at a small size. For example, some European countries use a socket
which is made of three circular pins in a straight line -- it'd just look like
an ellipsis.

Besides, there isn't a lot of symbolic meaning that's conveyed by a socket
that couldn't be expressed just as well with a plug.

> There are no staples or staplers.

Maybe. There isn't a lot of symbolic meaning to these either, though.

> There are no appliances—no oven, microwave, toaster, mixer, washer, or
> vacuum cleaner.

A lot of those will just look like white boxes at text size, and -- again --
they don't have a lot of symbolic meaning.

You might be able to make a case for an upright vacuum cleaner, though, since
that's visually distinctive and is associated with cleaning.

> There's no traffic cone.

Oh, I like that idea. It's got some symbolic meanings, too, like "warning" and
"under construction". There is already a construction sign (U+1F6A7), though.

~~~
Doxin
To be fair, a lot of existing unicode points don't really have a semantic
meaning either. Eggplant only got a semantic meaning after becoming a part of
unicode.

~~~
Symbiote
It means aubergine, doesn't it? At least originally. It's distinctive.

For cleaning, a broom is probably better than a vacuum cleaner.

A broom already exists 🧹

~~~
majewsky
"Aubergine" and "eggplant" refer to the same vegetable. The "semantic meaning"
is probably the NSFW one.

------
dang
Discussed at the time:
[https://news.ycombinator.com/item?id=11958682](https://news.ycombinator.com/item?id=11958682)

------
oska
Input into a decision like this might be one of the best approximations of
achieving immortality.

------
peterburkimsher
That post inspired me to submit some Unicode characters as well! I found 11
Hakka & Taiwanese characters when trying to scrape & parse the Bible as plain-
text, and wrote a blog post about my experience of the submission process.

[https://news.ycombinator.com/item?id=17968110](https://news.ycombinator.com/item?id=17968110)

------
kensai
Great work, but we are still missing a "tinfoil hat" symbol in Unicode (or as
Emoji). This is a major letdown! :D

------
ericfrederich
Even the existing repurposed one doesn't work on my up-to-date Linux Mint
installation.

[http://www.fileformat.info/info/unicode/char/2B58/browsertes...](http://www.fileformat.info/info/unicode/char/2B58/browsertest.htm)

------
a3n
The "advisor" to the effort is doing interesting work on nano-grids, timely
because of the PG&E shutoffs.

> We had the very generous help of Bruce Nordman, who was involved in the
> original IEEE 1621 standard.

[http://nordman.lbl.gov/](http://nordman.lbl.gov/)

------
kuharich
Previous discussion:
[https://news.ycombinator.com/item?id=11958682](https://news.ycombinator.com/item?id=11958682)

------
imhoguy
Is there a practical upper limit to Unicode's capacity? Something like IPv4's
limits, with all the reservations and local/multicast quirks?

~~~
kijin
Unicode has 17 planes. Each plane has 65,536 code points, so the total
capacity is 1,114,112 code points. In practice it's a bit less, thanks to
surrogates, private areas, and a bunch of "non-character" code points. That
still leaves close to a million code points.

Last time I checked, just over 13% of the available public space was
allocated. Most of the planes remain unused.
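The arithmetic behind "close to a million", as a back-of-the-envelope sketch. It assumes the standard fixed ranges (surrogates U+D800 to U+DFFF, the 66 noncharacters, and private use in the BMP plus planes 15 and 16) and counts what is left for public characters, not what is actually assigned.

```python
PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000  # 65,536

total = PLANES * CODE_POINTS_PER_PLANE  # 1,114,112

# Ranges that can never hold public characters:
surrogates = 0xDFFF - 0xD800 + 1  # 2,048, reserved for UTF-16
noncharacters = (0xFDEF - 0xFDD0 + 1) + 2 * PLANES  # 32 + last two of every plane = 66
# Private use: U+E000..U+F8FF, plus planes 15 and 16 minus their last two
# code points (those are already counted as noncharacters).
private_use = (0xF8FF - 0xE000 + 1) + 2 * (CODE_POINTS_PER_PLANE - 2)  # 137,468

available = total - surrogates - noncharacters - private_use
print(f"total={total:,} available={available:,}")  # total=1,114,112 available=974,530
```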

~~~
colejohnson66
Why _17_ and not a round number like 16? That would give a nice “round” one
mebicodepoints

~~~
kijin
BMP + 16 planes.

You can blame UTF-16 for this mess. Unicode was originally meant to be able to
encode two billion (2^31) characters. It bent over backwards to accommodate
the limits of the bastard child that is UTF-16.

[https://en.wikipedia.org/wiki/Plane_(Unicode)](https://en.wikipedia.org/wiki/Plane_\(Unicode\))
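The arithmetic behind "BMP + 16 planes": a UTF-16 surrogate pair carries 10 + 10 = 20 payload bits, i.e. 2^20 = 1,048,576 code points (the "round" number asked about), which is exactly 16 planes stacked on top of the BMP. U+10FFFF is therefore the last code point UTF-16 can reach. A minimal Python sketch of the split (`to_surrogates` is a hypothetical helper), cross-checked against Python's own UTF-16 encoder:

```python
def to_surrogates(cp: int) -> tuple:
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = cp - 0x10000            # 20 bits: enough for exactly 16 planes
    high = 0xD800 + (offset >> 10)   # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits -> low surrogate
    return high, low


for cp in (0x1F600, 0x10FFFF):
    high, low = to_surrogates(cp)
    # Python's encoder should produce the same two 16-bit units.
    assert chr(cp).encode("utf-16-be") == high.to_bytes(2, "big") + low.to_bytes(2, "big")
    print(f"U+{cp:X} -> {high:04X} {low:04X}")
# U+1F600 -> D83D DE00
# U+10FFFF -> DBFF DFFF
```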

~~~
colejohnson66
Maybe they did it because Windows was backwards and used UCS-2, and later,
UTF-16? If somehow, Windows managed to switch to UTF-8, I’m sure they
(Microsoft) would mess it up and keep the 4 byte limit (imposed by Unicode)
there even if it’s later removed (for backwards compatibility). What Microsoft
really needs to do, IMO, is rewrite the Windows API to use UTF-8 or UTF-32.
Make a `wwchar` type or something...
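On the "4 byte limit": UTF-8 as standardized in RFC 3629 is restricted to U+10FFFF, the same ceiling UTF-16 imposes, and that is why no valid UTF-8 sequence is longer than 4 bytes (the original design allowed longer sequences). A small Python sketch showing how the encoded length grows with the code point and that Python refuses anything past the ceiling:

```python
# UTF-8 length grows with the code point; U+10FFFF, the Unicode ceiling, needs 4 bytes.
SAMPLES = [0x41, 0xE9, 0x23FB, 0x1F600, 0x10FFFF]

for cp in SAMPLES:
    encoded = chr(cp).encode("utf-8")
    print(f"U+{cp:04X}: {len(encoded)} bytes")
# U+0041: 1 bytes, U+00E9: 2, U+23FB: 3, U+1F600: 4, U+10FFFF: 4

try:
    chr(0x110000)  # one past the last code point
except ValueError as err:
    print("rejected:", err)
```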

------
kingludite
I really like the smiling poop and everything but [personally] I've only ever
needed a feed icon⸮

------
blt
I have some good ideas for emojis, but I'm guessing that's even harder to get
through...

~~~
phkahler
Poop is in there, and the love hotel.

~~~
kalleboo
Those have a compatibility story though

------
joshdance
Very cool. Never really considered how new characters were added. Thanks for
sharing!

------
Havoc
Any idea why 4/5 don't display until I enable javascript?

------
unhammer
Wonderful :) Now can we please have this:?

[https://www.xefer.com/2008/03/interrocolon](https://www.xefer.com/2008/03/interrocolon)

~~~
colejohnson66
We already have the interrobang, no? So why not this?

------
deaps
_> > ⏻ To The People!_

That line at the end made me smile.

------
pdq
led, not lead.

~~~
dang
Fixed. Thanks! Those are surprisingly hard to spot sometimes.

~~~
pvg
Moderation software idea: globally replace lede/lead with led in any
discussion about lede vs lead. Would bury the led once and for all.

------
notatoad
Is there a reason this is being posted now, or does this need a (2016) in the
title?

~~~
CaliforniaKarl
It indeed was covered here at the time...

[https://news.ycombinator.com/item?id=11958682](https://news.ycombinator.com/item?id=11958682)

[https://news.ycombinator.com/item?id=11952765](https://news.ycombinator.com/item?id=11952765)

... so yes, as per convention it should have a (2016). The best way to get the
message to the mods is via email (using the Contact link at the bottom of the
page).

------
nsxwolf
What is a half character?

~~~
saagarjha
A character is a character. You can't say it's only a half.

~~~
unwind
The symbol was already present, but it was given new documented semantic
content ("meaning"). Previously it was just a circle, now it's also the symbol
for "power off".

~~~
saagarjha
[https://www.youtube.com/watch?v=kpk2tdsPh0A](https://www.youtube.com/watch?v=kpk2tdsPh0A)

------
stiray
I know a guy who fought hard with the Unicode Consortium to add two
characters meaning "beginning of blob" and "end of blob". Imagine how much
simpler coding would be without the need to escape anything. Unfortunately he
didn't succeed. They were too busy adding smileys.

~~~
bonoboTP
You'd have to escape those new symbols if they occur within the blob.

~~~
nixpulvis
Just add a `\\` /s

~~~
joosters
That's too lazy, we need unicode symbols for escaped start-blob and escaped
end-blob /s

~~~
majewsky
Not sure why /s. Also, you only need escaped-end-blob. start-blob can safely
be used inside a blob. So it's only 3 new codepoints, which is genuinely an
interesting proposal.

~~~
Dylan16807
Because that doesn't round trip. You need to be able to distinguish whether
the blob originally had "end-blob" or "escaped-end-blob". So now you need
another character for double-escaping, and so on, and so on.

To avoid that issue you're back to either adding a backslash, or doubling up
the character inside the blob... but if you're doing that you could have just
used " all along. No need for new characters!

If you really don't want to change the contents of the blob, and can't length-
prefix, then you could also use a new UUID as your delimiter each time you
embed a blob.
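A sketch of the doubling approach described above, using the ordinary `"` as the delimiter, the same convention CSV uses for quotes inside quoted fields. The helper names `frame` and `unframe` are hypothetical. Doubling makes the encoding reversible without any new code points. Note that consecutive blobs still need a separator, because `""` between two of them would read as an escaped quote.

```python
from typing import Tuple

QUOTE = '"'


def frame(blob: str) -> str:
    """Wrap blob in quotes, doubling any quote inside it (CSV-style escaping)."""
    return QUOTE + blob.replace(QUOTE, QUOTE * 2) + QUOTE


def unframe(data: str, start: int = 0) -> Tuple[str, int]:
    """Parse one framed blob at data[start]; return (blob, index after the closing quote)."""
    if data[start] != QUOTE:
        raise ValueError("expected opening quote")
    out = []
    i = start + 1
    while i < len(data):
        ch = data[i]
        if ch == QUOTE:
            if i + 1 < len(data) and data[i + 1] == QUOTE:
                out.append(QUOTE)  # doubled quote = literal quote inside the blob
                i += 2
                continue
            return "".join(out), i + 1  # lone quote = end of blob
        out.append(ch)
        i += 1
    raise ValueError("unterminated blob")


print(frame('say "hi"'))           # "say ""hi"""
print(unframe(frame('say "hi"')))  # ('say "hi"', 12)
```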

~~~
stiray
No. It depends only on developers. If they started embedding it into a
format, they would be breaking the rules. If not, it would work.

~~~
Dylan16807
If you're telling developers not to embed that character anywhere then you
only need _one_ end of blob character. So not 'no', I'm still right in saying
that having two end characters is a non-solution.

And you still didn't explain why the existing ASCII control codes don't solve
your problem. The suitable ones are also not supposed to appear inside text.
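A sketch of the "just don't embed it" contract using existing ASCII control codes. That STX (0x02) and ETX (0x03) are the "suitable ones" meant here is an assumption, and the helper names are hypothetical. Under this contract only the end marker matters for finding where a blob stops, so a start marker inside a blob is harmless. Any blob that does contain the end marker must be rejected, because there is no escape mechanism to fall back on.

```python
from typing import List

STX, ETX = "\x02", "\x03"  # ASCII "start of text" / "end of text"


def frame(blob: str) -> str:
    """Wrap a blob in STX/ETX; the contract is that the blob never contains ETX."""
    if ETX in blob:
        raise ValueError("blob contains ETX; this scheme has no escape")
    return STX + blob + ETX


def read_frames(stream: str) -> List[str]:
    """Split a stream of STX...ETX frames back into blobs."""
    blobs = []
    i = 0
    while i < len(stream):
        if stream[i] != STX:
            raise ValueError(f"expected STX at offset {i}")
        end = stream.index(ETX, i + 1)  # the first ETX always ends the frame
        blobs.append(stream[i + 1:end])
        i = end + 1
    return blobs


print(read_frames(frame("a") + frame("b\x02c")))  # ['a', 'b\x02c']
```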

