This piece makes a real effort to gloss over a very common English use of apostrophes. From the article:
> Consider any English word with an apostrophe, e.g. “don’t”. The word “don’t” is a single word. It is not the word “don” juxtaposed against the word “t”. The apostrophe is part of the word, which, in Unicode-speak, means it’s a modifier letter, not a punctuation mark, regardless of what colloquial English calls it.
> According to the Unicode character database, U+2019 is a punctuation mark (General Category = Pf), while U+02BC is a modifier letter (General Category = Lm). Since English apostrophes are part of the words they’re in, they are modifier letters, and hence should be represented by U+02BC, not U+2019.
> (It would be different if we were talking about French. In French, I think it makes more sense to consider «L’Homme» as two words, or «jusqu’ici» as two words. But that’s a conversation for another time. Right now I’m talking about English.)
OK, I've considered an English word: "man's". In the sentence "That man's pants are on fire!", this is usually considered a single word, the genitive case of "man" (personally, I'm not a huge fan of that approach, since the "genitive" 's attaches to phrases, not to words, but it is the mainstream position in linguistics).
In the sentence "That man's about to jump", on the other hand, the "word" "man's" is two words joined by an apostrophe, exactly as in French "l'homme". These clitics aren't exactly rare in English. The author shows some linguistic training in the comments to his piece, but never once mentions clitics, and fails to address them when another commenter brings them up.
IMHO the fact that all these new Unicode characters look very similar to existing ones, maybe even pixel-identical in particular fonts, is a source of extremely unpleasant surprises.
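To make the surprise concrete, here is a minimal Python sketch using only the standard library. The names `CANDIDATES`, `describe` and `words` are my own, and the `\w+` tokeniser is just a stand-in for whatever naive word splitting a real program might do. It looks up the General Category of the three apostrophe look-alikes and checks how each spelling of "don't" tokenises and compares.

```python
import re
import unicodedata

# Three characters that can all render as an "apostrophe".
CANDIDATES = {
    "U+0027": "\u0027",  # APOSTROPHE (plain ASCII)
    "U+2019": "\u2019",  # RIGHT SINGLE QUOTATION MARK
    "U+02BC": "\u02bc",  # MODIFIER LETTER APOSTROPHE
}

def describe(ch):
    """Return (general category, official name) from the Unicode database."""
    return unicodedata.category(ch), unicodedata.name(ch)

def words(text):
    """Naive tokeniser (an assumption, not anyone's real one): runs of word characters."""
    return re.findall(r"\w+", text)

for label, ch in CANDIDATES.items():
    spelled = f"don{ch}t"
    # Print the category/name, the tokens, and whether it equals the ASCII spelling.
    print(label, describe(ch), words(spelled), spelled == "don't")
```

On a stock CPython I would expect only the U+02BC spelling to come out as a single token, since it is the only one of the three classed as a letter, and only the U+0027 spelling to compare equal to a "don't" typed on an ordinary keyboard. Three strings that look the same on screen, three different behaviours.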
Depends what you mean by "word". "Man's" is not the headword you'll find in a dictionary, but it would be separated by a space in a sentence. I think this latter case is what matters here. "Man' s" doesn't make sense.
The mainstream view is that English "don't" is just a word, the negative form of "do". The apostrophe is a historical accident. The author of this piece gives one argument for this view in the comments, pointing out that "don't" can appear in contexts where "do not" is ungrammatical.
A "clitic" in linguistics refers to an item which is (1) a word in the sense of having its own dictionary entry (which I might call "at the lexical level"), but (2) not a word at the phonological level -- clitics depend for their pronunciation on the word(s) (usually just one word) next to them. So, for example, the 's of "the man's about to jump" is lexically a form of the verb is, but it's been reduced down to zero syllables. The indefinite article (a/an) is another English clitic, and you can observe its pronunciation changing according to the word that follows it in the sentence pair:
1. A cow trampled me.
2. An elephant trampled me.
Traditionally, the definite article ("the") is also clitic, with one pronunciation before consonants and a different pronunciation before vowels. The before-consonants pronunciation is in the process of becoming universal.
Languages differ in whether clitics are written together with the words they attach to phonologically or not. Ancient Greek clitics are traditionally separated with orthographic spaces (we know they're still clitics because they affect the placement of word accents). Latin clitics are written as part of the same word: "felis canisque" (="the cat and the dog", where -que means "and"). English uses both approaches.
SUMMARY, about the specific example you chose: "don't" and the "man's" of "the man's about to jump" are not in the same class, because "don't" is just a word with no internal structure, while that "man's" is two words which are realized in speech as a single syllable. That "man's" might be thought of as a "normal contraction", a term with no technical meaning that I know of, but linguistically it is a full word ("man") with a clitic ("'s") attached. However, clitics in general are not necessarily zero syllables long.
Use U+0027 for English apostrophes. ;p
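If you actually wanted to enforce that, a sketch of a normaliser might look like the following. The helper name `ascii_apostrophes` and the choice of which characters to fold are hypothetical, my own picks rather than any standard mapping.

```python
# Hypothetical mapping: which look-alikes to fold is my own choice, not a standard.
APOSTROPHE_LOOKALIKES = str.maketrans({
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u02bc": "'",  # MODIFIER LETTER APOSTROPHE
})

def ascii_apostrophes(text):
    """Replace apostrophe look-alikes with U+0027, leaving all other text alone."""
    return text.translate(APOSTROPHE_LOOKALIKES)

print(ascii_apostrophes("That man\u2019s about to jump"))
```

The catch is that this also flattens every genuine closing single quotation mark, because U+2019 does double duty as both quote and apostrophe, which is more or less the ambiguity the article was complaining about in the first place.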