
ASCII and Unicode quotation marks - anschwa
https://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html
======
jimmies
I hate the "" -> “” thing with a passion. I don't know how much productivity
that the world has lost with that “” shit.

It doesn't look that much better, and it always fucks with me at random times.
That shit is on the list of annoying problems that shouldn't exist in the
first place, along with the \nl\cr thing, and the txt saved as rtf thing, and
the UTF-8 encoding-character-at-the-beginning-of-the-file or whatever it is
called.

Someone complains your program gives them an error when they open a csv file
you sent them. You tested your program, it works. You go on the phone with
them for 30 minutes, try to figure out what the fuck was going on. There it
is, it was opened in a program that meddles with that "" and replaces it with
the “” shit.

Also, there has to be at least one time you're fucked by the "" -> “”
snobbiness when you go to a random Wordpress site and paste the command they
tell you to do to the command line and realize it doesn't work. You pull your
hair for a couple of minutes, and there is that sneaky ” thing. Wordpress does
that for anything it doesn't think is code (inb4 ”good programmers don't paste
commands from wordpress to GNU+bash“).
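
A minimal sketch of how one might spot the sneaky characters before running a pasted command. The function name `find_smart_quotes` and the sample command are made up for illustration, and the set of suspect characters is an assumption that is far from exhaustive:

```python
import unicodedata

# Typographic quote characters that a shell or CSV parser will not treat as quotes.
# This set is an illustrative assumption, not a complete list.
SUSPECT_QUOTES = "\u2018\u2019\u201a\u201c\u201d\u201e\u00ab\u00bb"


def find_smart_quotes(text):
    """Return (index, character, Unicode name) for each typographic quote in text."""
    return [
        (i, ch, unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in SUSPECT_QUOTES
    ]


# Hypothetical command as it might look after a blog engine "improved" it.
pasted = "grep \u201cfoo bar\u201d notes.txt"
for index, ch, name in find_smart_quotes(pasted):
    print(index, ch, name)
```

Reporting the offending positions rather than silently rewriting them keeps the decision with the person pasting the command.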

One of the first things that I do when I set up a new Mac computer is to turn
that damn "" -> “” ““““feature”””” off.

~~~
hunter2_
> at-the-beginning-of-the-file

That thing's the BOM.

~~~
jiggunjer
Which you don't really need with UTF-8, it only has a purpose for UTF-16+.
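
For anyone who has not met it: a small Python sketch of what the UTF-8 BOM looks like and how a reader can tolerate it. The sample CSV text is invented for illustration; `codecs.BOM_UTF8` and the `utf-8-sig` codec are part of the standard library:

```python
import codecs

text = "a,b\n1,2\n"  # invented sample data

# A file saved "with BOM" starts with the three bytes EF BB BF.
with_bom = codecs.BOM_UTF8 + text.encode("utf-8")
print(with_bom[:3].hex())

# Decoding as plain "utf-8" keeps the BOM as a U+FEFF character at the start,
# which is what tends to break naive CSV header matching.
plain = with_bom.decode("utf-8")

# Decoding as "utf-8-sig" drops the BOM if present and is harmless if absent.
stripped = with_bom.decode("utf-8-sig")
print(plain == text, stripped == text)
```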

~~~
adiabatty
It's useful to tell a text editor "This is UTF-8, not Windows-1252 or
ISO-8859-1 or whatever you might be used to".

~~~
mjevans
No, just do 8-bit clean, don't SCREW with the encoding if you weren't asked
to.

~~~
gnud
An editor can't "just do 8-bit clean", it has to display the characters. The
same bytes will sometimes be displayed differently in utf-8 and (e.g.)
ISO-8859-1.

I'm not sure if a BOM is a good way to handle it, but saying 'just do 8-bit
clean' doesn't work when you're displaying or printing the characters for
humans to understand.
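
To make the point concrete, here is a tiny sketch (the sample word is arbitrary) of the same bytes decoded under the two encodings:

```python
# The bytes an editor finds on disk; here they happen to be UTF-8 for "café".
data = "caf\u00e9".encode("utf-8")

as_utf8 = data.decode("utf-8")         # one character for the last two bytes
as_latin1 = data.decode("iso-8859-1")  # one character per byte

print(as_utf8)
print(as_latin1)
```

The bytes pass through untouched either way; it is only the choice of decoder at display time that differs.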

------
lisper
The fact that ASCII does not have balanced quotes is one of the great
catastrophes of computing. It makes everything more complicated than it needs
to be, from embedding code in strings to parsing CSV files, to regexps. For
example, if I want to embed a quoted string in another quoted string, I have
to escape the inner quotes like so:

"This is a string containing an embedded \"quoted\" string"

Then I have to think about whether or not the system I'm going to send that
string to is going to "helpfully" remove the backslashes, in which case I need
to write:

"This is a string containing an embedded \\\"quoted\\\" string"

God help you if you want to go two levels deep.

All this horrible complexity could have been avoided if we could just write:

«This is a string containing an «embedded» quoted string»

Alas.

~~~
eesmith
The complexity might be minimized, but not avoided. You would still need an
escape mechanism for something like «She said «The \» key on the server
doesn't work.»»

ASCII did add <>, [], and {}, any of which could have been used for quoted
strings, had the programming language designers chosen that option.

[https://en.wikipedia.org/wiki/String_literal#Paired_delimite...](https://en.wikipedia.org/wiki/String_literal#Paired_delimiters)
points out that PostScript and Tcl have a string literal which allows matched
quotes.

    
    
      PostScript: (The quick (brown fox))
      Tcl: {The quick {brown fox}}
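
A rough sketch of what reading such a literal could look like, with nesting plus a backslash escape for the unbalanced case. This is a hypothetical parser written for this thread (`parse_nested` and its parameters are invented names), not how PostScript or Tcl actually implement it:

```python
def parse_nested(source, open_q="\u00ab", close_q="\u00bb", escape="\\"):
    """Parse one balanced-quote literal at the start of source.

    Returns (contents, characters_consumed). Inner quote pairs are kept
    verbatim; an escaped character is taken literally and does not count
    towards nesting.
    """
    if not source.startswith(open_q):
        raise ValueError("literal must start with the opening quote")
    depth = 0
    out = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch == escape:
            if i + 1 >= len(source):
                raise ValueError("dangling escape")
            out.append(source[i + 1])  # keep the escaped character as data
            i += 2
            continue
        if ch == open_q:
            depth += 1
            if depth > 1:  # the outermost opener is a delimiter, not content
                out.append(ch)
        elif ch == close_q:
            depth -= 1
            if depth == 0:  # the matching outermost closer ends the literal
                return "".join(out), i + 1
            out.append(ch)
        else:
            out.append(ch)
        i += 1
    raise ValueError("unbalanced quotes")


nested = "\u00abThis is a string containing an \u00abembedded\u00bb quoted string\u00bb"
escaped = "\u00abShe said \u00abThe \\\u00bb key on the server doesn't work.\u00bb\u00bb"
print(parse_nested(nested)[0])
print(parse_nested(escaped)[0])
```

Balanced inner quotes need no escaping at all; the escape is only needed for a lone closing (or opening) quote inside the text.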

~~~
stormbrew
Ruby lets you use arbitrary delimiters for string literals with %q{} (where the
braces can be a bunch of things). I wish more languages would adopt this tbh.

~~~
ziotom78
C++11 has this feature too [1], e.g.:

    
    
        const char * str = R"*^*(This is a string containing an embedded "quoted" string)*^*";
    

[1]
[http://en.cppreference.com/w/cpp/language/string_literal](http://en.cppreference.com/w/cpp/language/string_literal)

------
peapicker
I'm pretty sure text like:

    
    
      ``quoted''
    

Is how you're supposed to write short quotes in the TeX/LaTeX typesetting
system.

[edit: My point being that the author seems to think this type of quoting
originated with X11... which is actually newer than TeX (X was first released
in 1984 and X11 in 1987), and that the prevalence of this type of quoting likely
originated with TeX when it was released in 1978... which isn't mentioned at
all in the article. In fact, since TeX/LaTeX is what all the CS, Physics, and
Math types were using for journal articles, it is likely the X11 font bitmap
glyphs were intentionally shaped like curly quotes to make editing your TeX
source files prettier.

At least, that's how I remember it...]

~~~
gbacon
Another giveaway of a TeX-savvy writer out of water is when you see --- for
em-dash, _i.e._ , ‘—’.

~~~
gbacon
Does HN markdown understand &mdash; or &lsquo;?

 _EDIT:_ Nope.

~~~
koolba
It doesn't need to. You can put the mdash directly in comments: —

As opposed to regular dash: -

~~~
PeterisP
Well, since my keyboard doesn't have an mdash key, if there's no support for
something like &mdash; or --- then I can't use mdashes.

~~~
mark-r
If I need a special character, I find a web page or document that has it and
use copy/paste.

~~~
jiggunjer
I have an A4 pinned next to me with all the windows Alt codes. Old-school
methods are fastest and I subconsciously learn them by heart over the years.

~~~
mark-r
Nice idea. I don't need them often enough to bother. Do you have a source for
the chart?

------
darkengine
MS PGothic, a very common font in Japan, still uses this type of quote.
"Quoting like this'' (double quote, then two single quotes) looks the most
natural in this font. "Using two double quotes" looks quite odd (see
screenshot) [1]

If you've ever seen an English-language page on a Japanese website that used
weird quotes, this is probably why.

[1] [https://i.imgur.com/zcuFZa1.png](https://i.imgur.com/zcuFZa1.png)

~~~
boondaburrah
Ah, the old dead giveaway "this game was translated from japanese and we CBA
to handle localisation properly" fonts.

------
ttepasse
The usage of an accent as syntax in markup and programming languages annoys
me to no end. And it is still being used to this day; the latest example is
template strings in JavaScript.

• It is semantically idiotic because it's an accent, not a character.

• It is visually annoying because you almost can't see the thing.

• It is bad for usability, because on non-US keyboards the accents are
implemented as dead keys. Yes, accent + space gives you the character but
that's really unintuitive for people who grew up expecting accents only over
letters.

~~~
evincarofautumn
Same, I’ve never cared for it. For these reasons I’ve decided to take a stand
and avoid using the grave accent for anything in a programming language I’m
working on. Same goes for the dollar sign, because it’s somewhat Americentric,
and as a currency character it doesn’t have any great semantic or mnemonic
value except for, well, currency units. I guess you could argue for $trings
(BASIC) or $calars (Perl) if you have $igils, but I don’t.

Sacrificing these bits of ASCII is fine by me, because the language is small
enough, and I also allow Unicode. For example, curved quotes are allowed and
can be nested or contain ASCII quotes without escaping:

    
    
        // Character literals
        ‘'’
        =
        '\''
    
        // Text literals
        “Some "text" with “curved quotes”.”
        =
        "Some \"text\" with “curved quotes”."
    

For the sake of usability, of course, everything in the core language &
standard library has an ASCII spelling, like in Perl6. I’d like for other
languages to adopt this view as well. If new languages allow proper Unicode
notation in some sensible places, then programming editors’ input methods will
catch up, e.g., automatically replacing “->” with “→” or “\theta” with “θ”
(like Emacs’ TeX input mode).

Also, does anyone know of a reference for keyboard layouts from around the
world that includes estimates of the number of people using them? I’ve tried
to keep things relatively easy to type on all the major layouts I know of, but
I don’t want to alienate anyone if I can help it.

~~~
int_19h
> Same goes for the dollar sign, because it’s somewhat Americentric, and as a
> currency character it doesn’t have any great semantic or mnemonic value
> except for, well, currency units.

By that metric, wouldn't & be too Anglo-centric, and # be too Euro-centric?
There are layouts out there on which neither is readily available.

~~~
evincarofautumn
I suppose so. It’s just one of the many small judgement calls you make when
designing a language, and definitely falls into the category of “design” more
than engineering or science. At some point I decided that grave and dollar
were out, while ampersand and octothorpe are in. And you can still define a
dollar-sign operator if you want, it’s just not in the core language or
standard library.

English is the lingua franca of programming, so it’s hard to avoid some
Anglicisms (like ampersand meaning “and”, dot instead of comma for decimals,
and English-language keywords) without going against strong precedents set by
other languages. If I really wanted to be pedantic, I might use /\ and \/ for
logical “and” and “or”—those spellings are the major reason that the backslash
even exists in ASCII.

------
garou
It's very odd for me to see the grave accent (`) as a quoting mark in bash and
other programming languages. I understand that the accent alone loses its
function in human language. But it is still uncomfortable to see an accent as
a delimiter for a string.

~~~
sp332
I'm not even sure why ASCII has a grave accent. There are no combining marks
so you could never write it over another letter.

Edit: I forgot HTAB was actually part of ASCII. Oh well!

~~~
electroly
On a teletype, ALL characters are combining marks because you can backspace
(another ASCII character derived directly from teletype codes) and type
another character overtop it.

~~~
reaperducer
Are you old enough to remember when you printed your code on a teletype
machine that blank spaces were represented by a "b" with a slash through it? I
hated that.

Even worse, I remember one shop where the teletypes didn't have question
marks, so people used capital P's instead.

------
mxfh
to add to the confusion:

′ PRIME (U+2032)

″ DOUBLE PRIME aka inch mark (U+2033)

have their own codepoints

[http://practicaltypography.com/foot-and-inch-marks.html](http://practicaltypography.com/foot-and-inch-marks.html)

which describes implications for typesetting coordinates and other things:

118° 19′ 43.5″

118° 19’ 43.5” wrong (curly quotes, although it renders identical in some
fonts)

118° 19' 43.5" right
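
If you generate coordinates in code, a small sketch like the following produces the prime-based form. The helper name `format_dms` is made up; the code points are the ones named above:

```python
import unicodedata

DEGREE = "\u00b0"
PRIME = "\u2032"         # minutes (or feet)
DOUBLE_PRIME = "\u2033"  # seconds (or inches)


def format_dms(degrees, minutes, seconds):
    """Format an angle as degrees, minutes, seconds using prime marks."""
    return f"{degrees}{DEGREE} {minutes}{PRIME} {seconds}{DOUBLE_PRIME}"


print(format_dms(118, 19, 43.5))
print(unicodedata.name(PRIME), "/", unicodedata.name(DOUBLE_PRIME))
```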

~~~
mark-r
Those should be added to the document, with a note that they are NOT quotes!

~~~
mxfh
I doubt this will ever be updated, since it's a reference for a very specific,
20 year old, code interpretation related proposal and not meant for
typesetting.

But for anything related to contemporary typesetting on the web I recommend
_Practical Typography_ and especially the _Type Composition_ chapter:

[http://practicaltypography.com/type-composition.html#links](http://practicaltypography.com/type-composition.html#links)

Including notes on quotes and apostrophes:

[http://practicaltypography.com/straight-and-curly-quotes.htm...](http://practicaltypography.com/straight-and-curly-quotes.html)

[http://practicaltypography.com/apostrophes.html](http://practicaltypography.com/apostrophes.html)

~~~
alanh
Hmm. I was going to link the classic
[https://alistapart.com/article/emen](https://alistapart.com/article/emen)
(and I guess I just did), but I noticed a disclaimer at the top pointing out
that it “is now obsolete.” And how! It proclaims that not enough text editors
support UTF-8 yet, which thankfully hasn’t been true in ages.

------
treve
It just occurred to me how much easier certain text operations (like syntax
highlighting, regular expressions and other parsers) would be if we
consistently used the right Unicode symbols for quotes and apostrophes.

~~~
mbrock
The only languages I know off the top of my head that use balanced delimiters
for strings are M4 and Perl 6.

Hey, imagine being able to nest strings without escaping! What a concept!

~~~
chaosfox
Perl does that as well, and you can even choose the delimiters you wanna use:

> _For the constructs except here-docs, single characters are used as starting
> and ending delimiters. If the starting delimiter is an opening punctuation
> (that is (, [, {, or < ), the ending delimiter is the corresponding closing
> punctuation (that is ), ], }, or >). If the starting delimiter is an
> unpaired character like / or a closing punctuation, the ending delimiter is
> the same as the starting delimiter. Therefore a / terminates a qq//
> construct, while a ] terminates both qq[] and qq]] constructs._

------
ratmice
FYI For a long time GNU coding standards prescribed using the grave accent,
but this changed some years ago now

[https://www.gnu.org/prep/standards/html_node/Quote-Character...](https://www.gnu.org/prep/standards/html_node/Quote-Characters.html#Quote-Characters)

~~~
josteink
From the link:

> Although GNU programs traditionally used 0x60 (‘`’) for opening and 0x27
> (‘'’) for closing quotes, nowadays quotes ‘`like this'’ are typically
> rendered asymmetrically, so quoting ‘"like this"’ or ‘'like this'’ typically
> looks better.

Is this link saying I can quit using `QUOTES' in my Emacs-documentation? That
style always struck me as odd :)

~~~
lottin
Curved single quotes (‘...’) are recommended now:

[https://www.gnu.org/software/emacs/manual/html_node/elisp/Do...](https://www.gnu.org/software/emacs/manual/html_node/elisp/Documentation-Tips.html)

------
13of40
CSB: Years ago I was working on a team that developed a scripting language and
we had this recurring problem where someone would write up a code sample in a
Word document and it would break if you cut and pasted it because all of the
single and double quotes would be Unicode. My boss was this tough guy who
tried to snap the whole team to a standard of strictly disabling that behavior
in all of our Office applications, but I piped up and said maybe we should
just make the language treat all of those characters like apostrophes and
quotes.... I think around version 5 they finally made an API for doing proper
anti-injection escaping because you pretty much needed a PhD to get it right
due to all of the variations introduced by the extended characters.

~~~
delinka
Or ... use a text editor?

~~~
13of40
You know a lot of PMs who write specs in notepad?

------
jiggunjer
What bothers me about Unicode isn't that apostrophe (U+0027) is overloaded by
having two semantic meanings ("apostrophe" or "single straight quote"), but
that they exacerbate the confusion by recommending to overload "right single
quote" (U+2019) to _also_ mean apostrophe.

We now have two characters for apostrophe and extra ambiguity for processing
correct right single quotes. Great job not breaking historical documents,
Unicode.

~~~
Loic
And now, imagine that your own name has an apostrophe in it. Like my family
name. I can tell you, I have crashed many databases, and in 90% of the cases
where people need to look my name up again in a database, they end up asking
for my address, because the clerk doing the data entry used a different
character each time and they cannot match my name. Even state level
authorities are bad, really bad, at it.
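
One way a lookup could sidestep this, sketched under assumptions: fold the usual apostrophe look-alikes to U+0027 before comparing names. The mapping and the helper name `name_key` are illustrative, and which characters belong in the table is a judgement call:

```python
# Characters commonly typed where an apostrophe was meant (illustrative, not exhaustive).
APOSTROPHE_LIKE = {
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u02bc": "'",  # MODIFIER LETTER APOSTROPHE
    "\u0060": "'",  # GRAVE ACCENT
    "\u00b4": "'",  # ACUTE ACCENT
}
_TABLE = str.maketrans(APOSTROPHE_LIKE)


def name_key(name):
    """Return a comparison key; the stored name itself should stay as entered."""
    return name.translate(_TABLE).casefold()


print(name_key("D\u2019Arcy") == name_key("D'arcy"))
```

Keeping the original spelling and matching on a derived key avoids rewriting anyone's name in the database.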

------
kazinator
> _Please do not use the ASCII grave accent (0x60) as a left quotation mark
> together with the ASCII apostrophe (0x27) as the corresponding right
> quotation mark (as in `quote ')._

Tell that to GCC:

    
    
      /usr/lib/gcc/i686-linux-gnu/4.6/../../../i386-linux-gnu/crt1.o: In function `_start':
      (.text+0x18): undefined reference to `main'
    

Looks good to me, by the way.

> _Where ``quoting like this '' comes from_

I did it for a while out of a habit acquired from working with TeX. In TeX, it
is the source code syntax for encoding quotes. Of course, it is lexically
analyzed and converted to proper typesetting.

> _If you can use only ASCII’s typewriter characters, then use the apostrophe
> character (0x27) as both the left and right quotation mark (as in 'quote')._

It looks like shit in any font in which the apostrophe is a little nine, which
is historically correct. What you want is a little "six" on one side and a
"nine" on the other, or at least some approximation thereof. Even if the
apostrophe is crappily rendered as a little vertical notch, it still pairs
with a backwards-slanted `.

(The representation of apostrophe as a little vertical notch, I suspect,
caters to literals in programming languages.)

> _If you can use Unicode characters ..._

then you should still stick to ASCII unless you have other good reasons to.
``Can'' is not the same thing as ``should'', let alone ``must''.

> _For example, 0x60 and 0x27 look under Windows NT 4.0 with the TrueType font
> Lucida Console (size 14) like this:_

The idea that people should change their behavior because of which font is
default on the Windows cmd.exe console is laughable.

~~~
Freak_NL
> then you should still stick to ASCII unless you have other good reasons to.

Why? Using non-ASCII Unicode characters acts like a nice canary for detecting
character encoding issues. Besides, why would I purposely limit my text to
ASCII? It doesn't even suffice for English, let alone almost any other
language I use — including my native language Dutch, German, and Japanese.

~~~
kazinator
All sorts of reasons. Diagnostic printf message in some embedded firmware. Do
you need to drag Unicode into it? Git log message. Ditto.

~~~
comex
> Diagnostic printf message in some embedded firmware. Do you need to drag
> Unicode into it?

Why not? The firmware itself would usually have no reason to care about the
details of a diagnostic message's encoding, whether that be ASCII or UTF-8 -
it can mostly just treat strings as bags of bytes. There might be some byte
values that are special (nul terminator, % for printf, etc.), but UTF-8 is a
superset of ASCII and represents extended characters using only bytes with the
highest bit set, so there will never be 'false positives' of the special byte
values. Other than that, the bytes can stay uninterpreted as they go over
whatever serial port or diagnostic protocol the device is using, until they
eventually show up on - most likely - some sort of terminal application on a
modern computer, which probably supports UTF-8 already. So in most cases it
should 'just work'.
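
A quick sketch of that property, using an invented diagnostic message: every byte belonging to a non-ASCII character has the high bit set, so a byte-oriented scan for `%` or NUL cannot trip over the middle of a multi-byte character.

```python
def multibyte_bytes_are_high(text):
    """True if every byte of every non-ASCII character is >= 0x80 in UTF-8."""
    return all(
        byte >= 0x80
        for ch in text
        if ord(ch) > 0x7F
        for byte in ch.encode("utf-8")
    )


# Invented diagnostic message containing a degree sign and curly quotes.
message = "temp = 21\u00b0C, status \u201cok\u201d, load 100%\n"
encoded = message.encode("utf-8")

print(multibyte_bytes_are_high(message))
# Counting the '%' byte in the raw bytes agrees with counting it in the text.
print(encoded.count(b"%") == message.count("%"))
```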

Of course, there are situations where it won't just work, such as if the
firmware needs to display the diagnostic message on a screen (by itself), but
from what I've seen those are the minority.

edit: As for Git, what's wrong with people writing log messages in their
language of choice? (Other than the social issue of it making it harder for
English speakers to use the codebase.)

------
sengork
Things become really fun when you're trying to figure out why that command
fails when you've copy/pasted it from another application window.

Often it's the quotes which have been silently (automatically) converted to a
visually similar (but functionally incompatible) character variant.

------
Tepix
It seems that half of the people in this company use the grave accent ` as
an apostrophe instead of ' or ’. Unfortunately it's the half that creates
presentations and talks to customers.

It looks terrible and to me it's a disgrace!

Example: it`s versus it's or it’s. (first one is wrong).

~~~
ygra
I've seen a café which had its name written in large, lit letters on the
façade and it included the following gem: Cafe`. Yes, the wrong accent, and
not even combined. Easy access to DTP tools (or even a word processor) for the
typographically uneducated masses ends up with quite painful results
sometimes.

------
mirimir
In another life, I analyzed enterprise data. Variation in quotation marks was
a common problem. I mean, is it "D'arcy" or "D’arcy"? Sometimes, I think,
people would mangle data in spreadsheets, with auto-correct on.

------
alanh
While I can’t expect many to follow suit, I myself often type educated quotes
and nice apostrophes. The macOS keyboard combinations (nearly-intuitive
combinations of Option-(Shift)-[ and -] for “”‘’) have long been committed to
muscle memory. And since nearly all (web) file formats seem to be UTF-8, the
days of manually typing &ldquo; and friends are long, long gone.

Benefits of typing and using typographer’s quotes directly in your
JS/JSON/HTML/source:

1\. No backslashes or other escape sequences needed!

2\. WYSIWYG

3\. Retina screens and gorgeous modern fonts mean that your sloppy quotes will
look extra bad if you just use ASCII quotes

------
tempodox
I would fain use the curly quotes if only Darwin's groff(1) wouldn't barf on
them. For the time being, man pages for one still need to quote like ``this''.

~~~
anjbe
In troff you can escape “ ” ‘ ’ as \(lq, \(rq, \(oq, \(cq respectively.

If you’re writing manpages, though, you should be using the -mdoc macros
([https://manpages.bsd.lv/mdoc.html](https://manpages.bsd.lv/mdoc.html)),
which have “Dq” and “Sq” macros that wrap the arguments in double and single
quotes respectively.

------
gumby
I find it interesting that the article includes a German keyboard that doesn't
include the proper ,,'' (or ,') quotation glyphs. However it does include
grave and acute accents as well as French primary quotations (<< and >>)
though not the secondary guillemets (quotation characters < and >) none of
which are used in German text.

And of course I used ascii analogues to type these into HN :-(

~~~
YSFEJ4SWJUVU6
>And of course I used ascii analogues to type these into HN :-(

But why, though? To the best of my knowledge, HN supports unicode quite well,
including the following quotes: »«›‹„“‚‘ (available with the help of AltGr and
sometimes shift from keys y, x, v, b when selecting the German keyboard layout
on my computer).

~~~
gumby
I'm using a travel laptop on a plane and it came with a US keyboard

------
rdtsc
> The Unix m4 macro processor is probably the only widely used tool that uses
> the `quote' combination as part of its input syntax; however, even that
> could be modified via changequote.

I remember staring for a long time at the file when I first saw an m4 macro.
My brain was telling me, surely this has got to be a typo, but then everything
worked as expected. Then I learned that's a proper way of quoting there.

------
timb07
It's a little bit off-topic since the article was primarily about quotation
marks and coding, but it would have been good if it mentioned that an ʻokina
(as found in "Hawaiʻi") is neither an apostrophe nor a left quotation mark.

[https://en.wikipedia.org/wiki/%CA%BBOkina](https://en.wikipedia.org/wiki/%CA%BBOkina)

------
dmitriid
It's worse for other languages. Russian quotation marks are « and ». Thanks to
early computers being predominantly from/designed in the US, they are now
hijacked by American quotes.

Same probably goes for French and other languages with their own sets of
quotation marks.

~~~
ansgri
«Russian» quotation marks are actually the « French » ones with different
spacing. There's another, less used set of quotes in Russian, so called
„German“ ones (used as inner quotes and in handwriting). English quotes are
widely accepted though.

~~~
contingencies
Modern Chinese usage includes all of 《》〈〉「」『』【】“” and probably others, roughly
in that order. Modern typographic convention is perhaps 《 _title_ 》「 _quote_ 」
but that's surely opinion and debatable. Hong Kong and Taiwan have their own
typesetting conventions, distinct from mainland China, and in the latter case
no doubt influenced by Japanese occupation and cultural inflow (manga, etc.).
Historically for most of Chinese history written language had no punctuation,
and sentence endings were merely inferred from context, which was historically
clearer 也. See
[https://en.wikipedia.org/wiki/Chinese_punctuation](https://en.wikipedia.org/wiki/Chinese_punctuation)
and
[https://en.wiktionary.org/wiki/%E4%B9%9F#Definitions](https://en.wiktionary.org/wiki/%E4%B9%9F#Definitions)
(definition #4)

------
jiggunjer
So why isn't there a straight single quotation, but there is a straight double
quotation? I get it probably arose from compatibility reasons, but nowadays
Unicode should be able to offer something?

P.S. Major coincidence: I was googling this very question yesterday.

------
exikyut
For reference, the BIOS text-mode font included with some IBM PCs (I've
observed this on NetVistas and ThinkPads myself, at least) renders ` as a
nice-looking opening quote, and ' looks like a nice closing quote.

------
audiodude
Honestly I've been seeing `quote' in bash and other CLIs for my entire career
and always thought they were just funny or strange, but carried no meaning.

------
jrochkind1
MRI ruby still does this in some error messages. I hate it. Always messing up
my copy-and-paste into `` markdown too.

------
crottypeter
Could do with a (2007) suffix.

