
Schrödinger's 😻 and outside-the-box naming - edwintorok
http://lwn.net/Articles/545741/
======
h2s
While disappointing, it's perhaps not surprising that the umlaut in
"Schrödinger's Cat" caused trouble. The fact that the apostrophe was ruled too
risky as well, however, is an indictment of software engineering as a
profession.

If people are scared to put basic punctuation marks in the names of things,
out of fear that badly-written software might break as a result, then that is
a sign of just how far we still have left to go.

~~~
pavlov
Unfortunately your comment is a sign of just how far we have left to go to get
rid of the notion that computers are for use by English speakers primarily,
and the rest of the world is an afterthought at best.

On 99% of the keyboards I've seen in my life, the middle row reads like this:
ASDFGHJKLÖÄ'

The apostrophe is considered significant enough to be on that row, but so are
Ö and Ä. It's reasonable for users to expect that software can accept the
letter that is right next to L on their keyboard, yet there remain software
engineers who assume that users won't be surprised that things break if they
dare to touch this key.

~~~
h2s
My point was that Unicode is newer than ASCII, and that we can't hope to deal
with Unicode (Ö) properly if we can't even cope with ASCII (') yet. Nothing to
do with anglocentrism at all. I agree that there are lots of annoying
computing problems for non-English speakers though.

~~~
pavlov
What I tried to say is that dealing with ASCII is meaningless; it's not even a
useful starting point.

For the majority of people, a string format that accepts 100% of ASCII but
0.1% of Unicode is just as useless as one that only accepts 95% of ASCII.
Therefore the goal should never be to get your ASCII coverage from 95% to
100%.

~~~
kbolino
There are two issues here: one is not accepting Unicode properly, and the
other is making incorrect assumptions about the content of strings. Both need
to be resolved, and all this cultural butthurt is not productive to solving
either of them.

Resolving the Unicode issues is undeniably a higher priority for speakers of
foreign languages, but there are still plenty of languages and libraries whose
support for Unicode ranges from nonexistent, to antiquated, to limited, to
just plain broken.

There's nothing application developers can do in the short term to fix those
problems, but by examining their own code and removing fallacious assumptions
they can better facilitate the proper handling of Unicode once it becomes
available in their underlying technologies. In the meantime, though, they may
only have ASCII, or ISO 8859-X, or KOI8-R, or Shift-JIS, etc. to test with.
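
To make "removing fallacious assumptions" concrete, here is a minimal Python sketch of writing and reading a shell-style `KEY=value` line without assuming the value is free of apostrophes, semicolons, or non-ASCII letters. The helper names (`os_release_line`, `parse_os_release_line`) and the `PRETTY_NAME` key are illustrative assumptions, not anything Fedora actually ships; only `shlex.quote` and `shlex.split` are real standard-library calls.

```python
import shlex


def os_release_line(key: str, value: str) -> str:
    """Render KEY=value so a POSIX shell reads value as one literal word."""
    return f"{key}={shlex.quote(value)}"


def parse_os_release_line(line: str) -> tuple[str, str]:
    """Parse one KEY=value line with shell quoting rules (no expansion)."""
    (word,) = shlex.split(line)  # exactly one shell word is expected
    key, _, value = word.partition("=")
    return key, value


# Hypothetical release names, taken from the jokes in this thread.
names = ["Schrödinger's Cat", "Fedora 20 ; rm -rf / #", '"DROP table *;"']

for name in names:
    line = os_release_line("PRETTY_NAME", name)
    # The round trip must give back exactly what went in.
    assert parse_os_release_line(line) == ("PRETTY_NAME", name)
    print(line)
```

The point is only that the quoting is done by a function that knows the shell's rules, instead of by string concatenation that hopes the name is "simple".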

------
jakub_g
_In fact, some participants in the mailing list discussion proposed adding
non-alphanumeric characters to future release names just to see what happens.
(...) Peter Robinson proposed the project go right for the goal and choose
"DROP table *;"._

We need courage like this! ;)

~~~
lmm
Who's really going to use os-release in an SQL database? Real courage would be
'Fedora 20 ; rm -rf / #'

~~~
deafbybeheading
Don't forget --no-preserve-root

------
creamyhorror
I've seen a bunch of companies start to use icon/emoticon characters in their
email subjects to get attention - Newegg, LinkedIn, etc. It certainly works on
me. I'm afraid soon every mail will have an attention-grabbing icon and we'll
just get inured to it.

I have to admit, if I launched an email campaign, I'd use them too...

~~~
Evbn
Very easy to spam detect on non-alphanumeric chars in subject.

------
Chris_Newton
There are two big lessons from this discussion, IMHO.

Firstly, broad character sets are all very well, but they have limited value
if you can’t rely on _everyone_ using your text being able to see them.
Something like the ö (o-umlaut) is a reasonable thing to expect in modern
Western fonts, but what about the emoticons I’m increasingly seeing in
e-mails, or more advanced mathematical operators, or the cat in the title of
this very article that many people commenting here can’t see in some contexts?

We need much better standardisation and prioritisation of _sets of related
glyphs_ that are, for example, permitted by the coding standards for a
software project or supported by a font file. ASCII is too small, all of
Unicode is too big, and picking glyphs for inclusion in fonts one at a time is
too fine-grained for this purpose.

Secondly, it is crazy that with literally a million code points available in
Unicode world, we don’t seem to have new control characters for “begin
literal” and “end literal” to mark a range of text that should be interpreted
verbatim regardless of context. Instead, we’re still using horrible hacks like
quoting and escaping in environments like command lines and source code, and
in text file formats like comma- or tab-separated values. These kinds of
techniques are, invariably, horribly error-prone and terrible for usability;
after all, in the case we’re discussing, it seems to be the apostrophe that is
causing more problems than the ö! I think the computing world would be a much
easier place for many, many people if there were one universal standard way of
saying, “This is plain text”.
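
For a sense of what the quoting and escaping involves in the comma-separated case, here is a minimal sketch using Python's standard `csv` module. The sample rows are made up for illustration; the behaviour shown (fields wrapped in double quotes, embedded double quotes doubled) is the module's default "excel" dialect, not a universal standard.

```python
import csv
import io

# Made-up rows: the second one contains an apostrophe, a comma,
# double quotes and a tab, all of which must survive unchanged.
rows = [
    ["name", "note"],
    ["Schrödinger's Cat", 'says "meow", then\tleaves'],
]

# Write the rows with the default dialect into an in-memory buffer.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
encoded = buffer.getvalue()

# Read the text back; the reader undoes the quoting.
decoded = list(csv.reader(io.StringIO(encoded)))

assert decoded == rows
print(encoded)
```

The round trip works, but only because both ends agree on the same escaping convention, which is exactly the fragility being described.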

~~~
Jabbles
Is it too much to expect that every device should be capable of rendering
every unicode point (latest version at release) in at least one font? With the
option to fall back to that font if it tries to render something in a font
that doesn't have that character?

Obviously there are issues with unicode phishing of domain names and other
cases where you might want to signal that a character is "strange", but surely
the memory and processing requirements for this are low enough now. It doesn't
have to be a good font!

~~~
Chris_Newton
_Is it too much to expect that every device should be capable of rendering
every unicode point (latest version at release) in at least one font?_

Yes, I think it probably is. Unicode is _vast_ , and the 100,000+ characters
specified in the latest standard include numerous obscure, specialised, or
downright gimmicky ones.

The effort required to create just one font that supports even a crude version
of each and every character is probably measured in human lifetimes. Imposing
that kind of burden as a barrier to entry for any new platform seems
unrealistic.

~~~
ygra
For one, all those glyphs don't need to be drawn by a single person or even
coexist in the same font (they cannot, anyway, given current font formats). Of
the 110182 characters Unicode 6.2 defines, Windows 8 (the only thing I can test
at the moment) includes glyphs for 102082 of them out of the box. The missing
ones fall mostly into either »rarely-used CJK ideograph« or »historical
script« (like Hieroglyphs).

Unicode serves many different needs and not all characters are necessary to
support in a general-purpose OS. There are fonts to cover the missing pieces
and professionals working in fields requiring those usually have them
installed.

There is also little benefit to provide a single font that encompasses all of
Unicode. Designers pick fonts for aesthetic reasons and every script has
different styles (although Latin, Greek and Cyrillic are fairly similar which
is why they usually are all included in every font). E.g. you have the main
distinctions into serif and sans-serif (for non-decorative body text). This
distinction never existed for scripts like Han, Hebrew, Arabic, various Indic
scripts, etc. So if you were to create only one font, what are your choices to
include for every script? Pan-Unicode fonts are mostly useful as fallback
fonts to ensure that you can see some rarely-used glyphs but for nearly all
practical purposes they cannot be used for anything else. It's also an
enormous effort beyond creating the glyphs because you'll have to include
kerning tables, define positions where combining characters appear, etc. Those
are often issues that make such pan-Unicode fonts unusable because yes, they
may contain plenty of glyphs but cannot be used reliably to render text that
goes beyond simple scripts (and diacritic placement can even be wrong with
just Latin).

~~~
Chris_Newton
Whether you try to supply a comprehensive set of characters in one font file
or many isn’t really the issue, though. You’ve still got to get all those
glyphs from somewhere, however they are grouped.

I’m just not sure I see a compelling argument that any new device entering the
market must be able to render advanced mathematical notations, animals, and
tarot cards. That’s a very high barrier to entry.

In due course, if there are freely available, good quality fonts that do the
job, then by all means include them, but we’re a long way from that situation
today. Even the most comprehensive efforts, things like Unifont, don’t cover
all of Unicode. Also, without wishing to belittle anyone’s efforts, some of
these projects are working on bitmap fonts, and it’s increasingly a vector
world. Perhaps they are still useful as a rendering of last resort, but I
suspect anyone working on a new platform or device has more pressing concerns.

------
xentronium
Why is unicode support so bad in 2013?

~~~
Millennium
It's basically Y2K without the sense of urgency. People assumed ASCII, or at
least single-byte supersets of ASCII, would pretty much be the norm, so they
never bothered with anything else. And since there are no apocalypse nutters
breathing down people's necks to fix it, it often gets deprioritized.

Plan 9 actually tried to do something about this: it assumed Unicode for
everything, and invented their own encoding for the process. The OS didn't
work out so well (regrettably), but we still use their encoding: it's called
UTF-8 now. Still a superset of ASCII, I guess, but at least we've gotten
beyond the single-byte assumption.
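
A minimal Python sketch of both properties, assuming the ö is the precomposed code point U+00F6 (the variable names are just for illustration): pure ASCII text encodes to the same bytes in UTF-8, while a non-ASCII letter takes more than one byte.

```python
ascii_text = "Schrodinger's Cat"
unicode_text = "Schr\u00f6dinger's Cat"  # precomposed o-umlaut, U+00F6

# ASCII-only text is byte-for-byte identical in ASCII and UTF-8.
assert ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# The umlaut becomes a two-byte sequence, so characters != bytes.
utf8_bytes = unicode_text.encode("utf-8")
assert "\u00f6".encode("utf-8") == b"\xc3\xb6"
assert len(unicode_text) == 17  # code points
assert len(utf8_bytes) == 18    # bytes

# Code that assumes one byte per character cuts the letter in half.
try:
    utf8_bytes[:5].decode("utf-8")
    truncated_ok = True
except UnicodeDecodeError:
    truncated_ok = False
assert truncated_ok is False
```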

------
gbhn
Next release name: "' && rm -rf /"

------
biofox
More interested by the title... I had no idea unicode emoticons existed.

Found a complete list here:

<http://www.alanwood.net/unicode/emoticons.html>

~~~
INTPenis
I don't know how, but the Homebrew project prints a pitcher of beer in my
iTerm2. Most likely Unicode, but it was not listed on the page you linked.

~~~
Zirro
🍺 Beer Mug - Unicode: U+1F37A (U+D83C U+DF7A), UTF-8: F0 9F 8D BA

You can find all of them in the Emoji section of Special Characters through
the Edit-menu, as described in another comment. They work in the classic
Terminal application as well, along with most other applications using
standard OS X APIs or ones that have specifically added support.
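
For anyone wondering where those numbers come from, here is a minimal Python sketch that derives the UTF-16 surrogate pair and the UTF-8 bytes from the code point. The variable names are illustrative; the arithmetic is the standard UTF-16 surrogate calculation.

```python
beer = "\U0001F37A"  # BEER MUG
code_point = ord(beer)

# UTF-16: subtract 0x10000, then split the remaining 20 bits in half.
offset = code_point - 0x10000
high_surrogate = 0xD800 + (offset >> 10)
low_surrogate = 0xDC00 + (offset & 0x3FF)

# UTF-8: let the codec do the work and format the bytes as hex.
utf8_hex = " ".join(f"{byte:02X}" for byte in beer.encode("utf-8"))

print(f"U+{code_point:04X}")
print(f"U+{high_surrogate:04X} U+{low_surrogate:04X}")
print(utf8_hex)

# Cross-check the manual surrogates against Python's own UTF-16 encoder.
assert beer.encode("utf-16-be") == bytes.fromhex("D83CDF7A")
```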

------
moron4hire
:( the cat glyph worked in Chrome on my phone but not on my desktop.

~~~
pavlov
Oh, it's supposed to be a cat glyph?

Chrome on OS X 10.8 shows me a square, and so I assumed that the joke is that
it represents Schrödinger's box -- you know, where he keeps the cat which may
or may not be dead.

~~~
evanb
My OS X 10.7.5 Chrome shows a cat glyph in the title in the tab bar, but a
square in the actual text. Quite weird.

~~~
randallsquared
Windows 8 Chrome does the same, here.

------
jonnyscholes
Hmm, is there a reference of who actually designed all these unicode chars
anywhere? I remember looking a while back to no avail - these more 'graphic'
ones resurfaced my interest. Did one poor sod do the whole lot [of Emojis]
or...?

~~~
ygra
The ones you are seeing come from a font you have installed. There should be a
copyright somewhere in there. For those in the Unicode code charts they come
from several people and prototype fonts, usually.

~~~
jonnyscholes
Ah, of course. Thanks :)

------
Jakob
For me in Chrome on Mac OS 10.8 instead of an emoticon it just shows a square.
Safari showed the cat.

The square looks like the qed symbol, I thought the article was about a proof
or a box. Both worked fine in my mind :)

~~~
omaranto
It sounds like something weird is going on, I would expect the following
behaviour: if you have a font with the cat glyph installed on your system all
browsers display the cat, if you don't have a font with the glyph then no
browser displays the cat.

Why does Safari show the cat while Chrome doesn't? I can think of two
explanations: (1) Mac OS doesn't actually have a designated place to put fonts
for all applications to find (I don't use Mac OS so I have no idea how it
works, but this sounds unlikely), (2) Mac OS does have a central location for
fonts but Chrome doesn't use them and just uses some fonts it comes bundled
with. Is either of those two explanations correct? If not, what is going on?

~~~
dragonwriter
> Why does Safari show the cat while Chrome doesn't? I can think of two
> explanations: (1) Mac OS doesn't actually have a designated place to put
> fonts for all applications to find (I don't use Mac OS so I have no idea how
> it works, but this sounds unlikely), (2) Mac OS does have a central location
> for fonts but Chrome doesn't use them and just uses some fonts it comes
> bundled with. Is either of those two explanations correct?

I suspect it's an OS-related issue, as the cat shows in Chrome on Kubuntu 12.10
for me.

------
unabridged
There is a reason the latin alphabet is the most used in the world, and why
7bit ascii is the standard for computing. Simplicity. They are highly
distinguishable characters consisting of mostly straight lines and simple
curves.

Diacritical marks (training wheels) and unicode break this simplicity, you
cannot program in a language with many characters that are hard/impossible to
distinguish. Or with crazy characters that can switch direction of text and
other nonsense. Unicode is a luxury to pretty things up for end users, not
something to do serious work in.

~~~
Dewie
>, and why 7bit ascii is the standard for computing.

The US.

> Simplicity.

Oh...

> Diacritical marks (training wheels) and unicode break this simplicity, you
> cannot program in a language with many characters that are hard/impossible
> to distinguish.

Many languages get by with their communication in spite of having things like
"spots over the o's" or whatever. I have no problem distinguishing them. Do
you have experience with reading such languages? Or are you simply blowing hot
air?

Ever looked at typewriter font for l and 1? Yeah, these things are not
historical accidents at all...

> Unicode is a luxury to pretty things up for end users, not something to do
> serious work in.

It's a luxury for end users... yes, non-English users should count their
blessings when they are able to use their whole alphabet. Part of this issue
was correctly spelling the name of an Austrian physicist, not someone trying
to write "cat" with some esoteric Unicode that looks like a cat.

