
Why isn't the external link symbol in Unicode? (2018) - networked
https://dafoster.net/articles/2018/11/24/why-isnt-the-external-link-symbol-in-unicode/
======
TeMPOraL
I find myself agreeing with the rejection. As much as I dislike the ridiculous
amount of emojis in Unicode, and their increasingly widespread use, they do
fit plain text communication, whereas an "external link" symbol does not. My
heuristic is something I'll call the _SMS sniff test_: would it make sense to
type that symbol into a text message? If it wouldn't, then it doesn't belong
in Unicode.

"External link" symbol tells you, "that last bit of differently-formatted text
is an active element in this application, leading to an outside resource".
It's not something that makes sense in plain text, because any external link
in a plain text message is both visible and obviously a link.

Elsewhere in the thread someone mentioned the play/pause/stop symbols from
cassette recorders/VCRs. But those have been used culturally as symbols
denoting starting, pausing and stopping for decades now, so they're an idea
communication tool that makes sense in plain text, and thus pass the SMS sniff
test.

(Note that I'm not sure if all symbols in Unicode actually pass the SMS sniff
test. I suppose the best option for those wanting "external link" symbol to be
included would be to resubmit it as the DOUBLE ARROW POINTING TOP RIGHT OUT OF
SQUARE symbol, or something like that.)

~~~
joatmon-snoo
This made me stop and think because I've never really wondered about what
belongs and doesn't belong in Unicode, and then I went looking at what strange
corners Unicode has and _wow_ the "Miscellaneous Technical" code block is such
a strange thing.

* It has APL symbols (checking the Wiki article on APL syntax and the APL standard, it seems that APL programs can be represented entirely in Unicode - which makes sense, but is still a little surprising).

* There's a benzene code point: ⌬

* "ERASE TO THE LEFT", better known as backspace, is a thing: ⌫

...and a little more bizarro:
[https://en.wikipedia.org/wiki/Miscellaneous_Technical](https://en.wikipedia.org/wiki/Miscellaneous_Technical)

~~~
DagAgren
The backspace symbol makes perfect sense. You may very well want to say "Press
⌫ to delete the selected object".

~~~
amelius
But you may also say "Next to the link you will see [EXTERNAL LINK] which
indicates that the link leads to another website."

~~~
Freak_NL
That reasoning would make every conceivable icon a valid target for inclusion.

To actually get the character included, you can't just reason that it might be
used in a certain way, you will have to demonstrate it with actual and
authentic use in the wild. The accepted proposal for the inclusion of the
power symbol is a nice example of how this works.

If you can find a bunch of printed manuals or books or websites that actually
use this icon in this manner you might be able to submit a successful
proposal.

~~~
amelius
The web is full of those symbols, so I'm sure there are books (or online
resources) explaining them. And if not, someone should do something about
that.

------
jstanley
> It is unclear that the entity in question is actually an element of plain
> text, given the inevitable connection to its function in linking to other
> documents, and thus its coexistence with markup for links. Furthermore, the
> existing widespread practice of representing this sign on web pages using
> images (often specified via CSS styles) would be unlikely to benefit from
> attempting to encode a character for this image.

I don't really have a horse in this race, but... isn't the existing widespread
practice of representing the sign using images attributable to the fact there
is not a character available? What else do they want people to do? The point
of adding the character is so that in the _future_ people can use text instead
of an image.

~~~
psychoslave
I think the point is that, unlike emojis, which some vendors integrated into
end products, forcing the Unicode committee to integrate them because of the
Unicode Consortium's goal of backward compatibility, this icon isn't enforced
by any vendor charset out there.

------
dan-robertson
I think an issue with the arguments here is that each side is making its
arguments based around quite different standards. Unicode have two standards
for inclusion of symbols, one for adding new symbols and one for merging in
characters from other character sets.

The rejection is based on the first standard for adding new characters.

The arguments against it seem to be based on taking the second standard as
precedent. But I think as far as the Unicode committees are concerned,
arguments based on the precedent of whatever random crap was grandfathered in
from preexisting character sets do not apply to characters that are not from
preexisting character sets. I think any argument that relies on “you already
allow all this other random crap” must also argue that this symbol exists in
some other character set which ought to be merged into Unicode, or that the
standard for new characters should be more precedent-based/different, but I
don’t see any such arguments other than some implied “common sense guess as to
how one expects Unicode to work”

~~~
Aeolun
So the smartest way to go about this is to create a whole new character set
for external link symbols? That seems like a really roundabout way to get this
thing accepted.

~~~
cryptonector
The word you're looking for is "script". And, no.

One has to make a better argument for the external link symbol getting a
codepoint assignment. TFA, for example, makes an argument based on emojis --
certainly that's strong enough to blunt the UTC's rejection rationale, but
perhaps not enough to win approval outright.

Addressing all the arguments used in the rejection is important, of course.
The fact that currently images are used is hardly dispositive: that's business
as usual for missing Unicode assignments!!

But there are probably stronger arguments for rejection than the ones the UTC
made, and it could make them the next time this comes up.

The best argument for rejection that I can think of has to do with layering.
An external link character isn't very useful without the _actual_ external
link, but the link belongs a layer up: not in the text, but in the markup.
Well, if the _link_ belongs a layer up, why not also the symbol?
Alternatively, more markup can move into Unicode. There has been and will
continue to be some pressure to move more semantics from markup to text, but
it's probably best to resist that pressure.

On the other hand, a solid argument for adoption may involve text rendering of
HTML. Think of lynx/elinks and other such browsers, which can't use images. An
external link character could prove useful in distinguishing the rendering of
linked text from, say, underlined non-linked text.

I'm surprised these arguments didn't come up. Or maybe they did -- I've not
gone down the rabbit hole on this one, and probably I won't.

------
rustybolt
I think that most people find this hard to digest. The committee approves a
ton of emojis with a dozen variations each, a gazillion characters that make
it possible to make some weird word soup that breaks sane layout, but
including one of the most commonly used symbols is somehow out of the
question.

~~~
DagAgren
Because all of those symbols are used as part of text, while the external link
symbol isn't. It is not part of the text itself. If you copied that text, you
would not really expect the link symbol to be copied along with it.

~~~
nabla9
> If you copied that text, you would not really expect the link symbol to be
> copied along with it.

Because it's not in Unicode it can't be copied. If it were a Unicode symbol, I
would expect it to be.

~~~
DagAgren
But it's not part of the text, it is a decoration. Even if it was in Unicode,
I would not expect it to be part of the copied text.

~~~
pas
Isn't this something like a stylistic editorial decision? Basically it could
be either way depending on what the author/publisher/editor wants.

On a lot of pages links are hidden as plain text and only show up if someone
hovers over them. (Great? Confusing? Bad UX? Sure, but still a choice.)

At the same time someone else might just use underlining, but no different
color. And someone might just want to use a symbol.

------
josh_fyi
The reason for rejection, for better or worse, is that this is a functional
element, like a button, rather than part of running text.

~~~
bluescrn
Like the play/pause symbols then, which are already in there?

And since when was a smiling poop an ‘element of plain text’?

~~~
VMG
not that I agree with the final decision, but I can imagine "Play"/"Pause" to
show up in a device manual and Smiling Poop as part of a chat message.

Neither applies to the "external link" symbol.

~~~
toxik
Why couldn’t the external link icon appear in technical documentation as it so
often does?

~~~
jolmg
Can you imagine it in paper documents or anywhere that is not hypertext?

~~~
martyvis
Yes, I've seen it used as a way to steer the reader to open their browser for
further information, with the URL written in plain text. Like a hyper footnote.

------
11235813213455
U+1F517 [https://emojipedia.org/link/](https://emojipedia.org/link/) might do
this job

Anyway I think we can avoid target="_blank" most of the time, you can have a
[https://developer.mozilla.org/en-US/docs/Web/API/WindowEvent...](https://developer.mozilla.org/en-US/docs/Web/API/WindowEventHandlers/onbeforeunload)
event listener if something needs to be saved before the page location changes

~~~
Izkata
I generally see that one used as "permalink/create share link to the current
page".

------
Symbiote
The closest I can think of is U+2b00 (north east white arrow) followed by
U+20de (combining enclosing square).

I suspect HN will eat the character: ⬀⃞ although they sometimes pass through
if there's enough other text in the comment.
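
For anyone who wants to check what that sequence is made of, here is a minimal sketch using Python's standard `unicodedata` module. The variable names are just for illustration, and whether the two code points are drawn as a single boxed arrow depends entirely on the font and renderer.

```python
import unicodedata

# U+2B00 followed by U+20DE, the combination suggested above.
arrow = "\u2b00"
enclosing_square = "\u20de"
approximation = arrow + enclosing_square

# The string is two code points; the second is an enclosing mark that
# attaches to the character before it.
for ch in approximation:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} "
          f"category={unicodedata.category(ch)}")
```

Because it is a base character plus a combining mark, copying it copies both code points, and software that strips combining marks will leave only the bare arrow.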

~~~
saagarjha
Hacker News lets some emoji and Unicode through, but I’m not sure how it’s
chosen. Here’s some I copied from Wikipedia’s page on emoji: ℹ ⌛🀄🈚

~~~
OJFord
Weird, I've never seen any before this thread, I thought they were all
stripped. Bug or feature, I wonder?

~~~
saagarjha
I’ll ask tomorrow if I can remember and get back to you.

------
qwerty456127
> Furthermore, the existing widespread practice of representing this sign on
> web pages using images (often specified via CSS styles) would be unlikely to
> benefit from attempting to encode a character for this image.

Obviously enough text-mode browsers would be likely to benefit.

~~~
etrabroline
This made my head spin. It's like saying people already using the JIS text
encoding would be unlikely to benefit from adding Japanese to Unicode.
Absolutely mind blowing.

~~~
DagAgren
No, it is saying that Unicode is used to encode what people think of as plain
text, and that UI symbols that are not part of the text content are outside
the scope of it.

------
irrational
> It is unclear that the entity in question is actually an element of plain
> text...

As the author pointed out, emojis seem to clearly violate this excuse. What
possible justification do they give for emojis? There is no way [I originally
inserted the poop emoji here, but it was stripped out. That kind of reinforces
my point.] is an element of plain text.

~~~
PeterisP
The pile of poop would not ever get accepted as a character in unicode on its
own. It's there because of including an existing character encoding set (IIRC
from Japanese featurephone messaging standards), which had a pile of poop
character and other emoji, so for purposes of compatibility all the characters
of that set must get Unicode mappings.

So there's a situation of dual standards - all the weird characters that were
included in any pre-Unicode text encodings for whatever arbitrary reasons are
in Unicode and are always going to be there; but all the _new_ weird
characters need appropriate justification for inclusion and are likely to be
denied.

------
cabalamat
> The UTC rejected the proposals to add “external link sign”, most recently in
> L2/12-169. It is unclear that the entity in question is actually an element
> of plain text

Nor is Pile of Poo, and that's in Unicode.

If external Link was added to Unicode, I expect it would be more used than
1000s of characters that are in it.

~~~
gnulinux
> Nor is Pile of Poo

I really really don't understand this point. We have evidence that people use
pile of poo in plaintext today in instant messaging all the time. Isn't this
enough evidence that pile of poo belongs to plaintext? I'm not trying to be
facetious; it's really puzzling to me. Every single comment in this thread is
about pile of poo, whereas it seems to me the worst possible example, since
it's such a widely used emoji.

~~~
recursive
It's only possible to use it in a text message because it's a unicode
character. If external link was a character, would people use it? I don't
know. Do you?

~~~
gnulinux
Maybe, maybe not, we don't know. What I know is that even my mom and grandma
use pile of poo on facebook. Will they use "external link" symbol? I would
guess not, but maybe.

~~~
recursive
> Will they use "external link" symbol? I would guess not.

In that way, it would be no different from 99% of unicode characters then.

~~~
gnulinux
Exactly, my point is pile of poo is such a bad example since recently it's
probably in the top 1% of most-used unicode symbols in plaintext. It's
similar to arguing something like ∆ or é does not belong to unicode.

------
yoz-y
I'd like the condition on the plain text to be relaxed. I would love to be
able to use Unicode for creating basic interfaces. I would love to have basic
interface elements such as a magnifying glass, "save" icons and external link
to be part of the standard. Maybe they don't strictly have a use today, but
one, people would find a use if they were there, and two, there are already
(granted, grandfathered) elements that were used exactly for this purpose back
in the era of plain text window interfaces.

~~~
squiggleblaz
There's the PUA. And I mean, it's not like it doesn't happen. See font awesome
for instance.

~~~
yoz-y
I use the awesome versions of fonts in terminal for power line and stuff. But
that's kind of the thing, it can't be ubiquitous and easily reused. If every
system had its interface font it would make a lot of stuff easier. An example
would be glyphs on buttons that are almost always the same.

------
archgoon
This would make copying links harder as you would need to avoid copying the
link symbol, unless the link symbol was part of the URL.

~~~
aendruk
A typical implementation would use a generated pseudo-element that doesn't get
copied.

~~~
scubbo
So, not a unicode character, then - a presentation element that is rendered
_on_ unicode.

~~~
aendruk
No, e.g.:

    
    
      /* Generated content: drawn after each link, not part of the text. */
      a::after {
        content: "↗";
      }

------
EGreg
For this stuff we have fontawesome and ttf generators from svg’s. Just grab
svg icons on thenounproject and make your own font!

I happen to disagree with the unicode decision because there should be a
section for basic commands so they can be rendered in different fonts. But whatever
— as I say don’t wait for standards if they don’t exist, make your own!

------
paulsutter
How many new characters are proposed each year? What’s the rejection rate?
Maybe it’s a healthy process to reject when uncertain and then reconsider
after popular appeals, especially if they are inundated with (mostly)
questionable applications. It’s premature to judge the process without more
context

------
joelanman
We removed it from GOV.UK in 2016:

[https://designnotes.blog.gov.uk/2016/11/28/removing-the-exte...](https://designnotes.blog.gov.uk/2016/11/28/removing-the-external-link-icon-from-gov-uk/)

------
akvadrako
After getting an emoji accepted I submitted a proposal for the external link
symbol in 2018, trying to address the committee's concerns from the several
earlier proposals.
[https://doubly.so/pub/External-Link-2018.pdf](https://doubly.so/pub/External-Link-2018.pdf)

It was summarily rejected with the nonsense statement (in full):

> Thank you for your submission. This was discussed during last week's UTC
> meeting. I was directed to let you know UTC feels that, as submitted, the
> proposal does not sufficiently demonstrate a plain text need for such a
> symbol. The context for usage is mark-up with links by default.

~~~
ajnin
At this point I think a submission of 8 generic "arrow exiting square from
top, top-right, etc." might have a better chance of being accepted as it holds
a general meaning of exiting something and is not specific to hypertext as
seems to be their objection.

------
ghusbands
> Its main rationale appears to be that the external link icon is not an
> element of plain text. I would agree that is the case. However I would like
> to point out that emoji and other similar useful symbols are not plain text
> either yet they have been accepted and continue to be accepted.

The author seems to be using a very limited definition of "plain text". Emoji
are clearly used in the same way as latin-character text in communication, so
they are effectively plain text.

I think a more effective argument may be that the external-link symbol should
be allowed for the same reasons that the power on/off/toggle and eject symbols
were allowed.

------
mci
One of the most peculiar tourist attractions in the Wieliczka salt mine near
Krakow is a set of signs with a shaft symbol that is not in Unicode. It looks
like a # with a · in the middle and appears in texts instead of the word "shaft".

------
ZiiS
It is simply factually incorrect to say that a commonly used symbol would not
benefit from being included in fonts. How can a committee working in this
field possibly fail to understand that?

~~~
etrabroline
It is not incorrect, and they do understand but they don't care. They have
esoteric philosophical justifications that exclude some commonly used symbols
while inventing never before seen ones. This is the problem with putting a
tiny number of otherwise powerless people in charge of things.

They _clap_ become _clap_ petty _clap_ tyrants.

~~~
andai
What is the significance of clap?

~~~
etrabroline
I'm mocking the overuse of the clapping hands emoji. Here is the first result
from Bing for the search "clap emoji overuse":

[https://www.reddit.com/r/justlegbeardthings/comments/6rl6mu/...](https://www.reddit.com/r/justlegbeardthings/comments/6rl6mu/whats_with_the_overuse_of_the_clap_emoji/)

Which is something no one asked for, but Unicode gave us anyway and now
everyone is annoyed by.

~~~
squiggleblaz
Maybe a few people on twitter? I've never seen it elsewhere and I barely use
twitter so saying "everyone" is an exaggeration. Also, if everyone were
annoyed by it, then it wouldn't be overused any more almost by definition.

------
totetsu
This kind of reminded me of something I was reading this morning:
[http://seancubitt.blogspot.com/2020/04/allonomy-autonomy-and...](http://seancubitt.blogspot.com/2020/04/allonomy-autonomy-and-after.html)
"a poem cannot 'contain in itself the reasons why it is so and not
otherwise' (Coleridge) since it must be written on top of the infrastructure
of a language and orthography that the poet rarely originates"

------
jakeogh
What's the code point for uppercase superscript Z?

j/k there isn't one.

[https://en.wikipedia.org/wiki/Unicode_subscripts_and_supersc...](https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts#Latin_and_Greek_tables)

No doubt those emoji are more important and historically common than
superscript uppercase C F Q S X Y and Z.

[https://github.com/jakeogh/unicodehaz](https://github.com/jakeogh/unicodehaz)

~~~
saagarjha
I’m actually fairly annoyed that a lot of obvious superscripts, subscripts,
and strikethroughs are missing.

~~~
clarry
I'm actually more annoyed that they are there in the first place. Unicode
can't seem to decide whether it's about just encoding text or also doing
presentation/formatting (and now piles of poo and colorful emoji, what next,
animations?), so now it's a bit of both, what a mess. It's making life hard
and breaking things for applications that assume text is just text, and the
presentational features are not enough to avoid having to implement your own
presentational features in a program that needs to do presentation..

Now you have applications where you can't find a string because it was written
in Unicode bold letters instead of the letters' normal ASCII counterparts. And
then you have applications that are confused about those bold letters because
they are not actually surrounded by bold markup.

The worst part is that you can't criticize unicode without attracting a crowd
of bullies who handwave about human languages being complicated (no, that does
not justify poor design) or say it has to be this way because ugh shift-jis
(or whatever nasty old encoding you can come up with) is not nice.

Well designed technology makes complex things simple. People defending poorly
designed technology blame the problem for being complex.

𝔽𝕦𝕔𝕜 𝕦𝕟𝕚𝕔𝕠𝕕𝕖!

(Why don't you try searching for "fuck" in Firefox?)
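
For what it's worth, one common workaround (not necessarily what any browser does) is to fold both the query and the text with NFKC compatibility normalization before comparing. A minimal sketch, with the hypothetical helper names `fold_for_search` and `contains`, assuming a lossy comparison is acceptable:

```python
import unicodedata

def fold_for_search(text: str) -> str:
    """Map compatibility characters to their plain forms, then case-fold."""
    return unicodedata.normalize("NFKC", text).casefold()

def contains(haystack: str, needle: str) -> bool:
    """Substring search that ignores compatibility styling and case."""
    return fold_for_search(needle) in fold_for_search(haystack)

# "unicode" spelled with mathematical double-struck small letters.
styled = "\U0001D566\U0001D55F\U0001D55A\U0001D554\U0001D560\U0001D555\U0001D556"

print("unicode" in styled)          # naive code point comparison
print(contains(styled, "unicode"))  # comparison after NFKC folding
```

The cost is that NFKC also erases distinctions that sometimes matter (superscripts, ligatures, full-width forms), so an application has to opt in to it for search rather than apply it to stored text.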

~~~
GekkePrutser
You're so right... Unicode is making things too complex, and intruding on
various formatting issues. Case in point: The skin tone modifiers. Or this
stuff:
[https://en.wikipedia.org/wiki/Zero-width_joiner](https://en.wikipedia.org/wiki/Zero-width_joiner)

There doesn't exist any parser that does it 100% correctly. And parsing it is
becoming so complex that it's causing bugs and vulnerabilities (it's not a
coincidence that so many remote exploits use some kind of unicode to trigger
it).
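
To make the complexity concrete, here is a small sketch of how many code points sit behind what is usually displayed as one symbol. The two sequences are illustrative picks of mine, and how they render depends on the font and platform.

```python
import unicodedata

# WOMAN + ZERO WIDTH JOINER + PERSONAL COMPUTER, usually drawn as one glyph.
technologist = "\U0001F469\u200D\U0001F4BB"
# THUMBS UP SIGN followed by a skin tone modifier.
thumbs_up = "\U0001F44D\U0001F3FD"

for label, seq in [("ZWJ sequence", technologist),
                   ("modifier sequence", thumbs_up)]:
    names = [unicodedata.name(ch, "<unnamed>") for ch in seq]
    print(f"{label}: {len(seq)} code points, "
          f"{len(seq.encode('utf-8'))} UTF-8 bytes: {names}")
```

Anything that counts, truncates or reverses text by code point will split these apart, which is why "length of a string" stops being a simple question.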

------
Finnucane
I think from the consortium's point of view it boils down to whether it makes
sense to consider the link symbol separately from the text that makes up the
link. They apparently decided it did not, but you could disagree. Of course,
it does not really matter if you disagree, because they won't change their
mind (unless you repropose it in a way that makes a different argument from
the original proposal, which their rejection is basically saying could be
considered).

------
c-smile
Agree with the rejection.

The same link can be either external or local depending on context, so it is
the UA/renderer's job to mark it properly.

CSS is quite adequate for that
([https://davidwalsh.name/external-links-css](https://davidwalsh.name/external-links-css)),
and the image (or whatever the author/UA decides to use) can go inline in the
CSS itself.
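
To illustrate the first point, here is a rough sketch of the check a renderer would have to make. The function name `is_external` and the "different hostname" rule are my assumptions; real sites may prefer to compare registrable domains or origins instead.

```python
from urllib.parse import urljoin, urlsplit

def is_external(href: str, page_url: str) -> bool:
    """Return True if href, resolved against page_url, points at another host."""
    target = urlsplit(urljoin(page_url, href))
    current = urlsplit(page_url)
    if target.scheme not in ("http", "https"):
        # mailto:, javascript: and similar are not treated as external here.
        return False
    return target.hostname != current.hostname

# The same href is external on one page and local on another.
href = "https://example.org/about"
print(is_external(href, "https://example.com/"))
print(is_external(href, "https://example.org/docs/"))
```

Since the answer depends on the page the link appears in, it cannot be baked into the text of the link itself.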

------
vagab0nd
Talk about timing, I was just looking for this in unicode for my website
yesterday. Didn't find anything that looked good so ended up going with Font
Awesome:
[https://fontawesome.com/icons/external-link-alt?style=solid](https://fontawesome.com/icons/external-link-alt?style=solid)

I think it's a missed opportunity for unicode.

~~~
chadlavi
try `\u2197`. You can see it used on external links here:
[https://chadlavi.github.io/clear/#/link#examples](https://chadlavi.github.io/clear/#/link#examples)
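
If you want to see what you are actually getting, a quick lookup with Python's standard `unicodedata` module prints the official character names. I've included U+1F517, which was suggested elsewhere in the thread, for comparison.

```python
import unicodedata

# U+2197 is the arrow suggested here; U+1F517 is the link emoji.
for ch in ("\u2197", "\U0001F517"):
    print(f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)}")
```

Neither name says "external link", so both are approximations by appearance rather than by meaning.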

------
webkike
Emojis are definitely plain text, as they have meaning when I write them on a
piece of paper

~~~
recursive
The external link symbol also has meaning when written on paper, which is
"external link".

------
savolai
I want a tool that scans English 🗣 text for names of unicode symbols.
Then I want the English language to move to that symbolic writing. You know,
because it is fun to see cultures evolve.

------
modzu
click h͟e͟r͟e to learn more about the decision

------
ape4
Glad they say "no" sometimes

~~~
GekkePrutser
Didn't stop them from making Unicode an overcomplicated clusterfuck of
combination codes though... Referring to things like this:
[https://en.wikipedia.org/wiki/Zero-width_joiner](https://en.wikipedia.org/wiki/Zero-width_joiner).
It's become almost impossible to build an accurate parser now.

------
sova
"unicode was intended only for print media" doesn't bleed with irony already?

------
crazypython

        a::after {
          content: url(my_external_link_symbol.gif);
        }

------
chadlavi
I usually just use `\u2197` or similar.

------
runxel
This article sums up my encounter with the Unicode committee pretty well. They
stopped making sane decisions long ago.

------
chrisseaton
Why do you need to know that a link is an external link?

~~~
pfranz
I see it often in intranets, wikis, and PDF documentation. It's an additional
context clue that you're leaving a closed website. The most egregious examples
give you a separate click-through screen when leaving the website. Government
web pages seem to do this the most.

It would be nice if we could all use a standard icon or some other constant UI
element--like a single underline for internal references and double underline
for external? I imagine unique colors would be too difficult to standardize.

~~~
chrisseaton
Yeah but why?

What do people do with the information that a link is external? Do people
think 'I'll follow this link - oh no wait a minute it's external I won't'?

~~~
pfranz
Specifically in those circumstances it's more important than normal; intranet,
government, and PDF documents.

Intranet: bespoke documentation to internal processes versus generic documents
used for reference.

Government: external references can be hijacked (asking for personal
information) or may not represent the government but still have relevant info.

PDF documents: jumping around a PDF document (from a table of contents) is
different than going to an external website. Especially if you don't have
Internet access at that moment.

I think all of this is significantly more important since browsers have been
hiding more URLs.

------
danielrpa
...and U+1F4A9 is!

------
chris_wot
God, they spend so much time on emoji which is much the same. They are
becoming a bit of a joke!

~~~
golergka
You don't expect to be able to click emoji, and they don't lose their meaning
when printed out on paper.

~~~
dandellion
If you print a web page the links are still there. The icon tells you "this
here was a link" and you can go to the computer and look it up. If the link
wasn't underlined and had a different colour but you printed in b&w it might
be the only way to know that was a link, so I would argue that the icon is
more useful in paper than on a live web, actually.

~~~
hunter2_
It's not for all links. It's for external links, meaning a different domain
than what's currently in your address bar.

