
Which character should represent the apostrophe? The Unicode committee is wrong - AndrewDucker
https://tedclancy.wordpress.com/2015/06/03/which-unicode-character-should-represent-the-english-apostrophe-and-why-the-unicode-committee-is-very-wrong/
======
PuffinBlue
Is this one of those things that everyone agrees (apart from the emotionally
invested people with the power to change it) is absolutely completely
infuriatingly obvious and should be changed immediately?

------
Raphmedia
Ever had a ” stealth its way into your code, well hidden amongst a bunch of "
? Perhaps a ’ in a bunch of ' ?

It's enough to drive you crazy.

~~~
Mandatum
Oh god. Never, ever copy-paste from Word documents when running SMS campaigns.
Unicode characters are rampant and take up 6x as much character data, so when
you're already pushing the limits with your message you end up going well over
the 155 char limit for a message.
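
A rough sketch of why one pasted curly apostrophe can blow a message budget. It assumes the commonly cited SMS limits: 160 characters per message in the 7-bit alphabet (153 per part when split), and 70 UTF-16 code units once any character forces UCS-2 (67 per part). The function name is hypothetical, and treating "pure ASCII" as "fits the 7-bit alphabet" is a simplification of the real GSM character table.

```python
import math


def estimate_sms_segments(text: str) -> int:
    """Estimate how many SMS segments a message needs.

    Assumption: pure-ASCII text is treated as 7-bit encodable. This only
    approximates the real GSM alphabet, which differs from ASCII.
    """
    if text.isascii():
        single_limit, multipart_limit = 160, 153  # 7-bit encoding
        units = len(text)
    else:
        single_limit, multipart_limit = 70, 67  # UCS-2 / UTF-16 encoding
        units = len(text.encode("utf-16-be")) // 2  # count 16-bit code units
    if units <= single_limit:
        return 1
    return math.ceil(units / multipart_limit)


plain = "It's " + "x" * 150       # 155 characters, straight apostrophe
curly = "It\u2019s " + "x" * 150  # same text with U+2019 pasted in

print(estimate_sms_segments(plain))  # 1
print(estimate_sms_segments(curly))  # 3
```

Under these assumptions the same 155 characters go from one segment to three as soon as a single U+2019 appears.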

------
jimrandomh
What? Apostrophe is U+0027. The distinctions created by these extra Unicode
characters will never be faithfully reflected in most text, because English
speakers only have one key to type them with and don't care.

~~~
Avitas
The author's point directly conflicts with your suggestion of using U+0027.
The author correctly states that the apostrophe when used inside a contracted
English word should be treated as a modifier letter. There is a Unicode
apostrophe character set aside and labelled as exactly this--U+02BC.

The author's point makes sense to me. I agree and my initial thought is that
the correct thing to do is recommend U+02BC be the apostrophe.

This is something for the future. It won't be faithfully reflected in text for
years or decades.
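
For anyone who wants to check how the three candidate characters are classified, here is a small sketch using Python's standard `unicodedata` module. The helper name `describe` is made up for illustration.

```python
import unicodedata

# The three characters discussed in this thread.
CANDIDATES = ["\u0027", "\u2019", "\u02bc"]


def describe(ch: str) -> tuple:
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


for ch in CANDIDATES:
    print(describe(ch))
# ('U+0027', 'APOSTROPHE', 'Po')
# ('U+2019', 'RIGHT SINGLE QUOTATION MARK', 'Pf')
# ('U+02BC', 'MODIFIER LETTER APOSTROPHE', 'Lm')
```

Only U+02BC has a letter category (`Lm`); the other two are punctuation (`Po`, `Pf`).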

------
userbinator
U+0027 has worked for me everywhere I've used it, and no one has complained
that I was using the wrong character.

On the other hand, I've had to track down a ton of mysterious bugs caused by
software "helpfully" converting characters out of the standard ASCII set,
unbeknownst to the user (and they look basically the same)... only a hexdump
shows the truth.
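
Short of a hexdump, a few lines of Python can point at the offending characters. This is a minimal sketch, with a hypothetical function name, that reports every character outside the ASCII range along with its position and Unicode name.

```python
import unicodedata


def find_non_ascii(source: str) -> list:
    """Return (line, column, code point, name) for every non-ASCII character."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                name = unicodedata.name(ch, "UNKNOWN")
                hits.append((line_no, col, f"U+{ord(ch):04X}", name))
    return hits


# The second line opens its string with U+201D instead of a straight quote.
snippet = 'print("ok")\nprint(\u201doops")\n'

for hit in find_non_ascii(snippet):
    print(hit)
# (2, 7, 'U+201D', 'RIGHT DOUBLE QUOTATION MARK')
```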

------
_kst_
Perl 5.22 adds a new kind of word boundary, spelled "\b{wb}" as opposed to
plain "\b", that recognizes that apostrophes can occur in the middle of words.
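
For comparison, Python's `re` module has no `\b{wb}`, so the sketch below is a hand-rolled approximation rather than an equivalent of the Perl feature. It shows how a plain `\w+` tokenizer treats each candidate apostrophe, and one way to allow apostrophes inside words. All names here are hypothetical.

```python
import re

# U+0027 and U+2019 are punctuation, so \w does not match them.
# U+02BC is a modifier letter, so \w already matches it.
APOSTROPHES = "'\u2019"

NAIVE = re.compile(r"\w+")
# A word may contain apostrophes, but only between word characters.
APOSTROPHE_AWARE = re.compile(rf"\w+(?:[{APOSTROPHES}]\w+)*")


def naive_words(text: str) -> list:
    """Tokenize with plain \\w+, which splits at punctuation apostrophes."""
    return NAIVE.findall(text)


def words(text: str) -> list:
    """Tokenize while keeping word-internal apostrophes attached."""
    return APOSTROPHE_AWARE.findall(text)


print(naive_words("don't stop"))       # ['don', 't', 'stop']
print(naive_words("don\u02bct stop"))  # ['donʼt', 'stop']
print(words("don't stop"))             # ["don't", 'stop']
print(words("'quoted' text"))          # ['quoted', 'text']
```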

------
wodenokoto
The author bases his entire argument on the assumption that "don't" is
_obviously_ a single word in English and that there is no doubt about that.

I for one was quite surprised about that. When doing word segmentation for NLP
it is common to split these contractions.

Plenty of people argue that it is a single word because there are spaces
around it or simply because it is a contraction. Others argue it is two words,
since the parts belong to different word classes.

------
ben0x539
I'm gonna say I'm rather happy that \w doesn't match apostrophes, aka string
literal delimiters in some environments.

~~~
dragonwriter
U+0027 (APOSTROPHE) is a string (or character) literal delimiter in some
environments, and it might be sane (though inconvenient, for typing) to have
an environment where U+2018 and U+2019 (LEFT and RIGHT SINGLE QUOTATION MARK)
were character/string delimiters, but there is no sane reason U+02BC
(MODIFIER LETTER APOSTROPHE) should ever be a delimiter of any kind.

The article argues that the Unicode Committee's recommendation of U+2019
(RIGHT SINGLE QUOTATION MARK) as the preferred character for the apostrophe in
English text is wrong (since, among other things, it breaks detecting matched
pairs of quotation marks), and that the preferred character for that use
should be U+02BC (MODIFIER LETTER APOSTROPHE).
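
The matched-pairs problem can be sketched in a few lines. This is a deliberately naive checker with a hypothetical name, not anything from the article: it treats every U+2018 as an opening quote and every U+2019 as a closing quote, which is exactly the assumption an apostrophe encoded as U+2019 breaks.

```python
OPEN_QUOTE = "\u2018"   # LEFT SINGLE QUOTATION MARK
CLOSE_QUOTE = "\u2019"  # RIGHT SINGLE QUOTATION MARK


def single_quotes_balanced(text: str) -> bool:
    """Return True if every U+2019 closes an earlier, still-open U+2018."""
    depth = 0
    for ch in text:
        if ch == OPEN_QUOTE:
            depth += 1
        elif ch == CLOSE_QUOTE:
            depth -= 1
            if depth < 0:  # a "close" with nothing open
                return False
    return depth == 0


# Apostrophe encoded as U+2019: the checker sees two closing quotes.
with_2019 = "\u2018Don\u2019t go,\u2019 she said."
# Apostrophe encoded as U+02BC: only the real quotation marks are counted.
with_02bc = "\u2018Don\u02bct go,\u2019 she said."

print(single_quotes_balanced(with_2019))  # False
print(single_quotes_balanced(with_02bc))  # True
```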

~~~
anjbe
There are other situations where pairs of quotation marks can be unmatched.
For example, in English it’s common to have an opening quote at the beginning
of each paragraph of a multi‐paragraph quote, but only close it on the final
paragraph.

------
brianberns
Since there's only one key on the keyboard that represents both single-quote
and apostrophe, software has to guess which one you mean from context. I don't
think there's anything Unicode alone can do to fix this underlying problem.

~~~
dragonwriter
> Since there's only one key on the keyboard that represents both single-quote
> and apostrophe, software has to guess which one you mean from context.

That's not at all true; while "guess from context" is one option, using
modifier keys is also an option; there's a lot of software that does this for
things which don't have distinct keys. I've certainly seen this used for
different widths of spaces and different widths of dashes, even though the
keyboard has only the space bar and the hyphen.

~~~
brianberns
Only a small fraction of users know how (and are willing) to use modifier keys
like that. I don't think it's practical to assume they'll change their
behavior in this case.

~~~
userbinator
sadly, even the use of the shift key seems beyond the ability of most users.
just witness all the posts in online forums that look like this.

Fortunately, it does seem to have changed over the years (although I'm not
sure to what extent things like auto-correct/auto-capitalisation contribute),
so perhaps in the far future when keyboard layouts evolve enough, everyone
will be using Unicode apostrophes and single quotes...

------
transfire
+1

