Punctuation Matters: How to use the en dash, em dash and hyphen (punctuationmatters.com)
596 points by MrVandemar 8 months ago | 340 comments

Oh this is easy.

m dash: --

n dash: -.

.. I take it few people find morse code puns funny anymore.

Seriously, what's the point of this pedantry? What does having 3 basically identical characters add to the language, other than pointless rules for insufferable pedants to power-trip over? We've all been using - just fine. On what basis does the person writing this article believe these rules matter, are important, or disambiguate language?

Call me a hopeless philistine, but I say down with the dash. One symbol is fine for word-compounding, numerical ranges, subtraction, and mid-word line breaks. No one needs an en dash to tell them pages 3-8 is not a compound word.

I strongly disagree.

By the same logic you might as well say: "why are we even kerning fonts, who cares if there are a few gaps when I write »irl«."

The fact that using different dashes does encode meaning in a subtle sense does have relevance for semantics -- but that's, imho, almost secondary to this argument, as it's not as grammatically relevant as commas and. periods, for example.

The primary importance of using the correct dashes is that it preserves a good flow for reading and is paramount to micro-typographic balance:

- A longer dash used to link words that belong together is visually perceived as an interruption, and the two words no longer feel like one.

- In reverse, a shorter dash when switching context -- or interjecting another idea within a sentence -- doesn't slow the pace of the text flow enough, and your brain will read/intonate it the same way as when linking words.

- And lastly, using either of them in a numerical range won't preserve optical balance: digits are wider than a hyphen but narrower than an em, so you get either too little visual separation compared with the spaces around the numbers, or too large an optical gap within an entity that belongs together.

That's the bare-bones set of dashes relevant to a balanced typographic appearance, not made-up pedantic complexity to annoy people. Otherwise we'd be talking about half and quarter em dashes and the like.

> you might as well say: why are we even kerning fonts [...] is paramount to micro-typographic balance [...] is visually perceived as an interruption [...] won't preserve optical balance

These are typesetters' concerns, not writers' concerns. They are all context-sensitive tweaks to what amounts to the same glyph.

If the rules for each have as well defined contexts as the article suggests, then it sounds like something more suited to ligatures and kerning.

Full glyph-replacement ligatures were not initially supported by all font formats, so perhaps the fact that these continue to exist as separate characters is more of a historical detail. It's something that could easily be added with new fonts, though.

Ligatures typically change the appearance of a character, they do not change the meaning. Merging the hyphen and the en dash into the same character and then deriving the correct one from context (the spaces around it) would be a whole new use of "ligatures".

From a software "separation of concerns" viewpoint, it feels wrong to me to have your font renderer infer meaning. A pre-processor that replaces hyphens with the correct dash – like Word does – feels saner to me.

> Ligatures typically change the appearance of a character, they do not change the meaning

Anyone can use a hyphen for all three purposes right now and people would understand the meaning, because the meaning is primarily derived from the context of surrounding glyphs. Only typesetters would complain that a subtly more appropriate glyph should be used for the purposes of refined optics and geometry etc.

Therefore an en dash and em dash ligature could only change the meaning IF the contexts of the use cases overlap, i.e. if there is a valid glyph-based context in which both an en dash and an em dash are valid... which I don't think there is, because that would be far too subtle.

Some fonts—like Inter—do this, but I see people complain that the font isn’t rendering exactly what they typed.

My favorite is that it will render 1920x1080, for example, as 1920×1080. I think the former looks terrible and unprofessional, especially when I see it in actual products rather than prose. So I really hope this catches on.
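A font like Inter does this at render time, so the stored text still contains a plain letter x, which is part of why people complain it isn't showing exactly what they typed. The sketch below does the substitution at the text level instead, assuming that any lowercase x sitting directly between two digits is a dimension sign; `times_sign` is a hypothetical helper, not anything Inter ships.

```python
import re

MULTIPLICATION_SIGN = "\u00d7"  # U+00D7 MULTIPLICATION SIGN

def times_sign(text: str) -> str:
    """Replace an 'x' flanked by digits with a multiplication sign.

    Assumption: 'digit x digit' always means a dimension, as in 1920x1080.
    """
    return re.sub(r"(?<=\d)x(?=\d)", MULTIPLICATION_SIGN, text)

print(times_sign("Output: 1920x1080"))  # Output: 1920×1080
print(times_sign("Mask: 0x1F"))         # Mask: 0×1F  (a hex literal, wrongly changed)
```

The second call shows the usual weakness of inferring meaning from context: a hexadecimal literal matches the same pattern. A font-level substitution has the same blind spot, but at least the underlying characters stay intact for copying and searching.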

I’ve gone my entire life without knowing the difference and survived just fine.

It may not be entirely irrelevant, but it’s very close to it. A bit like saying your tie has to be knotted a specific way to look respectable. Very fun for the in-group, but completely incomprehensible to those outside.

Like, I’m not opposed to having a few silly things to learn just to separate those who can be bothered to pay attention from those who cannot, but I’d be hard-pressed to say it’s actually relevant outside of that.

> I’ve gone my entire life without knowing the difference and survived just fine.

That is an extremely low bar. People lived their entire lives without X for any X.

Text doesn't become unreadable when the dashes are used incorrectly, but (for me) when they're used correctly, they do make the text easier to read and digest.

Thank you for the post. I still don't want to learn & spend mental energy on which of 3 different dashes to use, but now I do see why people would want to (and I think the reasoning is solid, even if I don't personally want to bother with it :) ).

You started by talking about kerning fonts, which is a great analogy.

Building on that - kerning is awesome because stuff looks better and I don't need to do anything for it to happen. Would it work to have my display system figure out which type of dash to use automatically?

Like, a dash inside a word should be short (under the assumption that you're linking the words together) and dashes with whitespace around them should be longer (under the assumption that you're switching context/injecting an idea into a sentence).
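The heuristic proposed above can be sketched as a small text pass. This is a minimal sketch under stated assumptions, not what any particular word processor or display system does: a hyphen with spaces on both sides is treated as a break in thought (em dash), a hyphen between two digits as a range (en dash), and every other hyphen is left alone. `smart_dashes` is a hypothetical name.

```python
import re

EN_DASH = "\u2013"
EM_DASH = "\u2014"

def smart_dashes(text: str) -> str:
    # " - " between two words: assume an interjection and use an em dash.
    text = re.sub(r"(?<=\S) - (?=\S)", f" {EM_DASH} ", text)
    # "3-8": assume a numerical range and use an en dash.
    text = re.sub(r"(?<=\d)-(?=\d)", EN_DASH, text)
    # Anything else, e.g. "well-known", stays a plain hyphen.
    return text

# Prints the range with an en dash and the aside with an em dash.
print(smart_dashes("See pages 3-8 of the well-known guide - it helps."))
```

The weak spots are the ones raised elsewhere in the thread: a subtraction written as "5 - 3" would get an em dash, and a designation like "1/4-20" would get an en dash even though it isn't a range.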

Your message is somehow undermined by your use of “--” in place of “—”.

No it isn't--double-hyphens are a great alternative to an em dash and are interpreted as such by many people and some software. GP's argument is for the grammatical functionality of differentiating dashes, not the specific symbols used.

That said, I don't use en dashes; if I want my numbers to line up, I use a fixed-width font.

That’s a reason not to use figure dashes, which aren’t the same thing as en-dashes.

> No it isn't; double-hyphens are a great alternative to an em dash and are interpreted as such by many people and some software. GP's argument is for the grammatical functionality of differentiating dashes, not the specific symbols used.

Use ;

Works just fine

> Use ;

Comma, not semicolon, is the usual alternative to em-dashes for setting off asides, semicolons set off independent clauses.

Not sure if a deliberate joke but those are two independent clauses so should've used a semicolon :-) (between "asides" and "semicolons")

Semicolons are often better just replaced by periods. I sometimes use them but I've had at least one editor who refused to use them for news-oriented copy.

I find it unlikely that the comment was typed on a typewriter, or sent over a teletype. Computers and phones make it easy to type em-dashes if you want to do that. No sub-par alternatives are needed.

Did you know that, by default, iOS turns a double dash into a single, longer dash?

This character ‘—’ looks like one long dash to me, even though I typed the dash button twice. What’s even crazier is that if I type four dashes ‘——’ it still looks like one even longer dash; even six ‘———’ is a solid line, and I can delete it by pressing the backspace button once.

I have no idea how to get my phone to display two short dashes side by side: ‘--’ Maybe I can fake it by putting an emoji in between, knowing that Hacker News will filter it out. Let’s see what happens.

Edit - ooh that totally works. I’d never really paid attention to how this feature worked before.

I did, hence my other comment in this thread — proper dashes are easy.

“--” can also be input on iOS by entering “- -” and then deleting the space. Or you can disable the “Smart punctuation” setting and type what you actually mean.

Why do we stop at hyphen, en and em dash? There are at least 30 different use cases; we should not reuse only 3 versions of some short line. Let's make 30 versions, one for each meaning. (cynicism)

I've never really cared about anything you're saying about how dashes should work for an imaginary group of people who are way into typography.

Then don't use them? As a reader, I certainly appreciate when people do. When writing documents or HTML I use them because it adds clarity. When typing on a web form, I'll usually use "--" because it's visually similar and much easier to type on a US keyboard. No one, pedants included, has ever tried to correct me on it.

I also use capitalization and punctuation when I type while many people do not. It'd be great if they did, since it makes reading easier and takes almost no additional effort, but I'm not going to let it ruin my day. The parent comment is about why the distinction in dashes matters and has virtually nothing to do with typography enthusiasm, but rather reader comprehension. If you don't want to integrate that information into your life, great, but that's not really a refutation. For my part, I found it interesting. Even though I use em-dashes I learned more about how they're helpful. If you don't want to use them, I'm almost positive no one is ever going to correct you.

> Then don't use them?

I don't, but dumb tools replace -- with an em dash, which breaks shit.

> If you don't want to use them, I'm almost positive no one is ever going to correct you.

I still have to look at it, and suffer the consequences of dumb editors replacing -- with em dashes when someone innocently just tries to write a command-line parameter.
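The command-line breakage is easy to reproduce. Below, `naive_smart_punct` stands in (hypothetically) for an editor that turns every `--` into an em dash, and `smart_punct_outside_code` is one possible mitigation, assuming code is marked with backticks as in Markdown: it leaves backtick spans untouched. Neither is any real editor's implementation.

```python
import re

EM_DASH = "\u2014"

def naive_smart_punct(text: str) -> str:
    # Every double hyphen becomes an em dash, including the one in "--all".
    return text.replace("--", EM_DASH)

def smart_punct_outside_code(text: str) -> str:
    # re.split with a capturing group keeps the `code` spans at odd indices.
    parts = re.split(r"(`[^`]*`)", text)
    return "".join(
        part if i % 2 else part.replace("--", EM_DASH)
        for i, part in enumerate(parts)
    )

msg = "Run `ls --all` -- it lists dotfiles too."
print(naive_smart_punct(msg))         # the flag is mangled into an em dash
print(smart_punct_outside_code(msg))  # the flag survives, the aside is converted
```

This only helps when the writer marks code explicitly; a bare command in running prose still gets mangled, which is why some people simply switch the feature off.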

It also looks like you’re drawing attention to something — the use of the double dashes — in making a deliberate choice to break from the norm. Whereas if you just follow the way most people use dashes - single dashes, not double - then it doesn’t really stand out, it just looks ‘normal.’ You’re used to seeing it styled that way. It feels different.

An example from the article:

Looks good: “Sometimes writing for money—rather than for art or pleasure—is really quite enjoyable.”

Unreadable: “Sometimes writing for money-rather than for art or pleasure-is really quite enjoyable.”

Yes, punctuation does matter. (In French the em-dash is almost inexistant; we use parenthesis instead usually.)

Just add spaces. Sorted.

“Sometimes writing for money - rather than for art or pleasure - is really quite enjoyable.”

A common substitute for the em dash is --, two hyphens with spaces around them.

Personally, I think two hyphens also looks better than just one, and it conveys that you really intended it to mean emdash rather than hyphen.

This is similar to how it's typeset in TeX as well: two for en, three for em.
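TeX's convention can be mimicked with two string replacements. The only subtlety is order: the three-hyphen sequence has to be handled before the two-hyphen one, otherwise `---` turns into an en dash followed by a stray hyphen. This is a rough approximation for illustration (TeX itself gets this from ligatures defined in the font), and `tex_dashes` is a hypothetical helper.

```python
EN_DASH = "\u2013"
EM_DASH = "\u2014"

def tex_dashes(text: str) -> str:
    # Longest sequence first: "---" -> em dash, then "--" -> en dash.
    return text.replace("---", EM_DASH).replace("--", EN_DASH)

def tex_dashes_wrong_order(text: str) -> str:
    # Replacing "--" first leaves "---" as an en dash plus a stray hyphen.
    return text.replace("--", EN_DASH).replace("---", EM_DASH)

print(tex_dashes("pages 3--8, money---rather than art"))
```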

I have used two hyphens, but I appreciated that text editors collapsed them into an (em-) dash.

Hyphens are simply for connected-words while dashes are -- for better or worse -- to make asides.

> Personally, I think two hyphens also looks better than just one

It's context-dependent. (Aside: you wouldn't write "context--dependent", which is the use case of the hyphen.)

Ostensibly the en dash is primarily used for ranges, although that's a case where I'm inconsistent. I won't typically write "A - Z" or the technically correct "A–Z", as I think in that case I tend to write "A-Z", using a simple hyphen. I certainly won't write "A -- Z".

The em dash is even wider—it's not typically mistaken for a hyphen.

Sometimes I write A->Z.

Em-dashes add a bit of a pause. And having them longer, taking up a bit more horizontal space, makes that more intuitive. They also break a sentence into parts. Having them easily distinguishable helps you navigate the text and reduces overhead, just as periods or paragraph breaks help you see the parts of a text, or syntax highlighting helps you see the lexemes in a program.

Using just one dash for everything will be readable in a text message or comment. But not in a (complicated) book, because there the benefit of these small things gets multiplied by the scale of the book.

> They also break a sentence into parts.

IMHO, this is my main criterion for deciding when to use em-dashes: is the text between them an aside of some kind? An alternative would be to use parentheses.

Personally, I do not find " - " (as the GP suggests) as strong a visual cue as "—". And on macOS, using different dashes is fairly straightforward:

* hyphen: the key next to zero, "-"

* en-dash: alt/option-"-": –

* em-dash: shift-alt/option-"-": —

Some apps (e.g. Mail) auto-convert double-"-" into an em-dash as well.

Good to see OSX people thought about this.

On Linux, you need to enable a Compose key (in the keyboard layout settings). After that, you get default sequences like Compose --. and Compose --- for en- and em-dashes.
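Whatever the input method (Option combinations on macOS, Compose sequences on Linux), the result is one of several distinct Unicode characters, which is also why tools that compare text character by character treat them as different. The snippet below prints the code point and official Unicode name of each dash-like character relevant here, including the figure dash mentioned above and the dedicated minus sign for subtraction, using Python's standard `unicodedata` module.

```python
import unicodedata

# Dash-like characters, written as escapes so they can't be confused visually.
DASHES = ["-", "\u2010", "\u2012", "\u2013", "\u2014", "\u2212"]

def describe(ch: str) -> str:
    return f"U+{ord(ch):04X}  {unicodedata.name(ch)}"

for ch in DASHES:
    print(describe(ch))
```

The ASCII key produces U+002D HYPHEN-MINUS, a compromise character that stands in for both the hyphen (U+2010) and the minus sign (U+2212); the en dash and em dash are U+2013 and U+2014.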

Since I am now a hyper-hyphen-partisan-pundit after reading that blog post - I'd like to comment on your hyper-hyphenated comment.

> “Sometimes writing for money - rather than for art or pleasure - is really quite enjoyable.”

To me this looks like a cryptic-case of the corrective comma.

“Sometimes writing for money, rather than for art or pleasure, is really quite enjoyable.”

Your first sentence should also use a comma rather than the wrong dash.

That's why I used the dash...

The way I was taught, you use the comma for a brief aside--em dashes are used for a larger diversion (and parentheses are for the most tenuous connections).

In other words, a reader should be able to skip the contents of parentheses with negligible impact on the context or meaning of the sentence. They should be able to skip the contents of em-dash-separated text without changing the meaning of the sentence. And text between commas should be considered integral to the sentence, while secondary to the primary gist.

What you're referencing is that commas are used to set off non-restrictive clauses, where the meaning of the sentence is clear without the additional clause; the non-restrictive clause just provides additional description of a word in the main sentence.

Such as:

Sometimes writing for money, rather than for art or pleasure, is really quite enjoyable.

Other than that, many people have come up with many writing styles. We mostly seem to be able to understand each other, so we are "all good".

“Sometimes writing for money - I have other aims besides art or pleasure - is really quite enjoyable.”

En space, em space, three-per-em space, four-per-em space, six-per-em space, figure space, punctuation space, thin space, hair space, ideographic space, or Ogham space?
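For what it's worth, every space in that list is a separate Unicode character in the "Zs" (space separator) category, and ordinary software treats them all as whitespace: Python's `str.split()` with no arguments breaks on each of them just as it does on the spacebar's U+0020. A quick check using the standard `unicodedata` module:

```python
import unicodedata

# The spaces named above, in order.
SPACES = [
    "\u2002",  # EN SPACE
    "\u2003",  # EM SPACE
    "\u2004",  # THREE-PER-EM SPACE
    "\u2005",  # FOUR-PER-EM SPACE
    "\u2006",  # SIX-PER-EM SPACE
    "\u2007",  # FIGURE SPACE
    "\u2008",  # PUNCTUATION SPACE
    "\u2009",  # THIN SPACE
    "\u200a",  # HAIR SPACE
    "\u3000",  # IDEOGRAPHIC SPACE
    "\u1680",  # OGHAM SPACE MARK
]

for sp in SPACES:
    words = f"one{sp}two".split()
    print(f"U+{ord(sp):04X} {unicodedata.name(sp):<20} "
          f"category={unicodedata.category(sp)} split={words}")
```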

I have written and read text for decades without knowing the difference between those, so whatever space one gets when pressing the spacebar seems to do the job just fine. And if in doubt, LaTeX etc. will handle the rest well enough if I care about sub-pixel precision of some margins.

This is what I do. I don’t see the problem here. I don’t see the need to adopt additional characters.

It's a tradition from hand-set paper print which is now largely obsolete.

Spaces can cause word wrap that can leave a dash at the end or beginning of a line, which is not beautiful. A spaceless em dash doesn't have the wrapping issues while retaining legibility. You could argue that that's a problem with word wrap algorithms, not punctuation, but that situation is not going to change any time soon.
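One common workaround for the wrapping problem is to glue a spaced dash to the preceding word with a no-break space (U+00A0), so a line can end after the dash but never start with it. The sketch below uses Python's `textwrap` as a stand-in for a real line breaker; `textwrap` does not treat U+00A0 as a break opportunity, so it honours the glue. `glue_dashes` is a hypothetical helper, and browsers and word processors each apply their own line-breaking rules.

```python
import re
import textwrap

NBSP = "\u00a0"

def glue_dashes(text: str) -> str:
    # Replace the space before a spaced en or em dash with a no-break space.
    return re.sub(r" ([\u2013\u2014]) ", NBSP + r"\1 ", text)

sentence = "writing for money \u2014 rather than art"
plain = textwrap.wrap(sentence, width=17)
glued = textwrap.wrap(glue_dashes(sentence), width=17)

print(plain)  # the second line starts with the dash
print(glued)  # the dash stays at the end of a line, next to "money"
```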

In German that’s the way it’s done: en-dash with spaces, em-dashes (basically) don’t exist.

I use em dashes with spaces in German (and English) all the time — I just like it better and don't care about arbitrary rules and traditions.

Yeah, an em dash with spaces looks better to me too. I find it harder to read if the em dash has no surrounding spaces: it looks too cramped, not separated enough.

I have never understood the classical rule of no spaces around em-dashes. If you’re going to use fancy dashes at all, an em-dash represents a clear pause, a break in thought — something more robust than a mere comma. Typesetting an em-dash sometimes literally touching the words on either side has the opposite effect, visually connecting those words rather than separating them, and unlike a lot of the typographical snobbery we sometimes engage in, that one is a well-known (at least to designers) effect of proximity. Personally I prefer a thin space rather than a full one in media where it’s possible, purely for cosmetic reasons, but I’d rather have a normal space than none.

I think that is not really true? There is the "Gedankenstrich" (the German "thought dash"), and one can see it in texts. Or do you mean that it is so rare that German almost does not use it? I think that depends on the writer.

Yes, and the Gedankenstrich is usually set as en-dash with spaces around, only rarely as em-dash. See https://de.wikipedia.org/wiki/Halbgeviertstrich#Gedankenstri...

Hum, a hyphen is still an entity of its own (it may be even a short, slanted dash in some fonts), then there's the en-dash for association (e.g. "ZDF – Zweites Deutsches Fernsehen"), and there's the "Gedankenstrich", which performs more like a separator. Three typographical entities to express three different concepts. (But there's a tendency of mixing the en-dash with spaces and the "Gedankenstrich", as the latter also comes with surrounding spaces, which may appear overly exaggerated in some fonts.)

Sure. As far as I’m aware, the Gedankenstrich is usually set as an en-dash with spaces in German, though [1].

1: https://de.wikipedia.org/wiki/Halbgeviertstrich#Gedankenstri...

However, it is the en-dash, properly, rather than the hyphen. I quite like that punctuation.

Now, anyone typing random texts to a friend or a few need not care, but I think people that write in a professional capacity to more than a few people should know and care.

If you’re going to do that, en dashes look nicer (as explained in the article):

“Sometimes writing for money – rather than for art or pleasure – is really quite enjoyable.”

Is this the yardstick from The Grid? If so, hope all is well :) (and if not, I also hope all is well)

Sorry that’s not me, but thanks for the well wishes and I hope all is well for you too!

Or commas (?). "Sometimes writing for money, rather than for art or pleasure, is really quite enjoyable."


> In French the em-dash is almost inexistant; we use parenthesis instead usually.

The only French-speaking place I've seen em-dashes used in daily life was Québec. For some (good) reason, the administration seems to have taken a lot of care to use correct typography. My voting district, for example, was Mercier–Hochelaga-Maisonneuve (the first dash being an en-dash, and the second one a hyphen), and I was always amazed at how all communication actually used these two different dashes.

I can't imagine this level of care in French or Belgian official communication.

(By way of explanation, the parent commenter's voting district covered the Mercier and the Hochelaga-Maisonneuve neighbourhoods.)

I disagree that the first example looks good. Both cases would be better with spaces, which kind of renders the em dash unnecessary.

Some places I write for use the em-dash with spaces and some without. I try to remember which is which but I often forget.

Looks better: “Sometimes writing for money — rather than for art or pleasure — is really quite enjoyable.”

Punctuation matters, but space -- the "zeroth punctuation mark" -- matters more!

The author does discuss spacing the dashes but is, given the overall point of the article, surprisingly noncommittal.

Others have mentioned using spaces with an en-dash or hyphen instead of an em-dash. Having used a typewriter -back in the day- I learned to produce text like this.

How I learned to write the "Unreadable" example: “Sometimes writing for money -rather than for art or pleasure- is really quite enjoyable.”

To the teacher I learned from, this was the standard way of punctuating on a typewriter.

The "unreadable" sample is very much readable. We can all read it. No one is tripping up trying to figure out what a "money-rather" is.

Not for me. It's readable, but my brain has to do more work. When I get to "money-rather" my brain trips up slightly, and then I'm confused until the next dash, then I go back and figure it out.

All possible and dealt with in under a second, but in the first example with the longer dash my brain recognises a parenthesis and I take a little "breath pause" before carrying on.

Also consider that eye movement is not always linear from left to right. But I agree: the brain has to do more work, and it is slightly confusing.

I can read it, but it definitely trips me up.

It’s not unreadable, just a tad more difficult. And as others have pointed out, there are other ways of making it easier again than using a specific character. But the real point is: the information conveyed in both examples did not change its meaning and will be understood by the reader/receiver in both cases. If it’s not, it matters. As long as it is, it’s pedantic.

Wouldn't the alternative rather be to use commas there, not a hyphen?

"Sometimes writing for money, rather than for art or pleasure, is really quite enjoyable."

In your head, do you read those differently?

Personally, I think this sentence would benefit from a comma before the ‘or’. And in that case we could probably benefit from a clearer way of setting aside the parenthetical.

“Sometimes writing for money, rather than for art, or pleasure, is really quite enjoyable.”

– this seems awkward to me. This version, though:

“Sometimes writing for money—rather than for art, or pleasure—is really quite enjoyable.”

Isn’t that more fluid?

> “Sometimes writing for money, rather than for art, or pleasure, is really quite enjoyable.”

I think that changes the meaning, since it’s now a list of 3 items with an Oxford comma, rather than two lists, with the first list having 1 item and the second list having 2 items. And I’m having a rough time even making sense of the revised meaning.

Expressed as pseudo-code, I read the original intent of that sentence as:

“money and not(art or pleasure) == enjoyable”

and that can be broken into

“((money and not art) and (money and not pleasure)) == enjoyable”
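The expansion follows from De Morgan's law: not (art or pleasure) is the same as (not art) and (not pleasure), so the two halves have to be joined with "and". Taking the commenter's reading of the sentence as given, an exhaustive truth table over the three booleans confirms the "and" version and shows exactly where an "or" version would diverge. The function names are only labels for this illustration.

```python
from itertools import product

def original(money, art, pleasure):
    return money and not (art or pleasure)

def expanded_and(money, art, pleasure):
    return (money and not art) and (money and not pleasure)

def expanded_or(money, art, pleasure):
    return (money and not art) or (money and not pleasure)

cases = list(product([False, True], repeat=3))

# De Morgan: the "and" expansion agrees with the original on all 8 rows.
assert all(original(*c) == expanded_and(*c) for c in cases)

# The "or" expansion says "enjoyable" when only one of art/pleasure is excluded.
mismatches = [c for c in cases if original(*c) != expanded_or(*c)]
print(mismatches)  # [(True, False, True), (True, True, False)]
```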

In Portugal, too, people just use parentheses (as you would when inserting an idea into a sentence), and it still reads fine.

> In French the em-dash is almost inexistant; we use parenthesis instead usually.

French uses em-dashes ("tiret cadratin") or en-dashes ("tiret semi-cadratin") for dialogue. Like so:

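– bonjour, dit-elle, comment allez-vous?

– bonsoir, répondit-on. Ça va ça vient, et vous?

– bien

(Roughly: “Hello,” she said, “how are you?” / “Good evening,” came the reply. “Up and down, and you?” / “Fine.”)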

I use em-dashes and parentheses somewhat differently but you can mostly substitute the latter for the former.

Right, the dash length seems more of an aesthetic choice, like a drop cap or something.

I just use commas or parens.

Having multiple options for how to offset parenthetical asides, far from being redundant (or even confusing), offers us—as writers and readers—more opportunities to express the tonal variations (or nuances) that we would – in spoken language – communicate through our voice and body language; moreover it lets us vary the visual, aesthetic quality of our prose – which is as much a part of the experience of reading as comprehension is.

Is this a written excerpt from a textbook or something? This single sentence makes more sense than many of the arguments I've seen previously.

Nah, it’s just a deliberate hodge-podge of a sentence where I threw in every subclause I could to try to illustrate the point.

Of course, we do use compound numbers in English.

A very common example is machine screw threads, e.g., 1/4-20. This is not a range of numbers spanning from 0.25 to 20.0, but rather a pair of numbers that define two metrics of a single thing (a nominal diameter of 1/4 inch and 20 threads per inch), which combine to uniquely identify the thread.

Perhaps context is sufficient, but adding this to your examples gives us at least three scenarios where the single symbol would mean very different things with pairs of numbers: compounding, subtraction, and numerical ranges. If we add on the clause separation duties of the dashes mentioned in the article, we have four uses where a single symbol sits between two numbers and means entirely different things.
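To make the screw-thread case concrete: in the inch-based Unified thread series, "1/4-20" means a nominal diameter of 1/4 inch with 20 threads per inch, so the hyphen separates two different quantities rather than marking a range. A minimal parser, assuming only the simple "fraction-TPI" form (numbered sizes like "#10-32" and class suffixes such as "-UNC-2A" are not handled), might look like this; `parse_inch_thread` is a hypothetical name.

```python
from fractions import Fraction

MM_PER_INCH = Fraction(254, 10)  # exactly 25.4 mm

def parse_inch_thread(designation: str) -> tuple[Fraction, int]:
    """Split e.g. "1/4-20" into (diameter in inches, threads per inch)."""
    diameter, tpi = designation.split("-")
    return Fraction(diameter), int(tpi)

diameter, tpi = parse_inch_thread("1/4-20")
pitch_mm = MM_PER_INCH / tpi  # distance between adjacent threads

print(diameter, tpi, float(pitch_mm))  # 1/4 20 1.27
```

Using `Fraction` keeps the arithmetic exact, so the 1.27 mm pitch comes out without floating-point noise.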

There's no shortage of mathematical notation and delimiting characters. E.g., you could write your machine screws as .25+20i. Obviously you raise e to the power of your screw and you get a rotation rate in the complex plane, and a width of screw in the complex plane as well.

Compounding and numerical ops are basically never confused. The machine screw is the only one of these where it's even plausible. Not that subtraction and range are ever ambiguous, but if they were, just use "#1 - #n" to denote "the numbers 1/n being used as labels for some range of options, not as numerical values".

All in all, we have plenty of characters. A minimal set of rules and a minimal set of characters, rich in predictable patterns, is what makes for a good language. The existence of a whole slew of specialized characters, all basically indistinguishable and frankly unheard of to most, has to work hard to justify its right to live on my keyboard. We have parentheses, commas, colons both full and partial, brackets square and curvy, braces, slashes forward and back... More than enough permutations and code space for anyone's expressive needs. Why anyone would opt for more byzantine characters with more rules on top is beyond my imagination.

Then certainly we should remove those superfluous brackets. Commas suffice for parenthetical asides. Sentences already imply grouping. I am a bit upset at your use of double quotes above. After all, we have the single quote, which consumes half as many valuable pixels and does just as good a job of indicating quotation. Colons of any level of completion merely separate clauses, a task more than thoroughly covered by commas and periods. Context is, of course, a great disambiguator, so I see no reason to use any statement terminator besides a period. What possible confusion could arise.

While we are at it, we have so many words. Perhaps we should simplify to one of the several published standards of simplified English. After all, the number of combinations of a thousand words in sentences of arbitrary length is enormous. Why anyone would opt for more byzantine words with more nuanced definitions and rich history of usage, tradition, and cultural value is beyond me.

We could go on with grammar (I mean really, what the hell is pluperfect), spelling ('c', for example, is useless on its own, its uses being filled alternately by k or s), fonts (wtf is a serif), capital and lowercase letters, and I am sure many other topics.

Why do we keep more words, punctuation, and other linguistic and typographical devices around than we need? A mix of inertia and legitimate uses and perceived value. It seems to me that many people seem to draw a line between what is acceptable and what is not based on whatever they are comfortable and familiar with by the time they reach the end of their schooling.

I know your examples are intentionally extreme to prove a point, but I'm biting anyway.

Parenthetical type grammar with an explicit start character and end character is pivotal for encoding information unambiguously. You can't replicate that with any system that uses the same character for the start and the end, because it would be ambiguous whether you are starting a nested context or ending the present one. Double, single, and even the rare triple quote allow for nested quotation. In principle, a clean open and close quotation mark would also solve this (no subtle pixel hunting).

You're right that we don't truly need four redundant variations on bracketing, but reducing it to just one is probably too few, as it would be representing too many possible things at once. How about one pair for a narrative context (aka a quote), one pair for linguistic recursion (like I'm doing right now), and one pair for collections of objects such as a list or a set. Colons probably could be skipped; everything beyond that is strawmanning me.

A certain small number of delimiters / particles / whatever are needed to have expressive completeness. You need to be able to build sequential lists, unordered lists, one-of-several possibility sets, and and/or/not-type relations. In other words, a natural language at the very least needs some sort of regex subsystem, but it need not be much more sophisticated than regex. I'm not a grammar denialist; in fact, quite the opposite. I want the information coded in simple grammar rules, not in ad hoc arbitrary tables that keep expanding.
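The point about start and end characters can be checked mechanically. With distinct open and close marks there is exactly one way to read the nesting; with a single symmetric mark, each delimiter could be either an opener or a closer, and several balanced readings survive (for 2k marks, the k-th Catalan number of them). The sketch below enumerates those readings by brute force; `readings` is a hypothetical helper, and it renders each reading with ordinary parentheses.

```python
from itertools import product

def is_balanced(roles):
    depth = 0
    for role in roles:
        depth += 1 if role == "open" else -1
        if depth < 0:          # a closer with nothing open
            return False
    return depth == 0

def readings(text, delim="|"):
    """Every way to read a symmetric delimiter as balanced open/close marks."""
    n = text.count(delim)
    results = []
    for roles in product(("open", "close"), repeat=n):
        if is_balanced(roles):
            it = iter(roles)
            results.append("".join(
                ("(" if next(it) == "open" else ")") if ch == delim else ch
                for ch in text))
    return results

print(readings("|a|b|c|"))  # ['(a(b)c)', '(a)b(c)']: nested or side by side?
```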

I say this as someone who had a 12th-grade vocabulary in 5th grade, and it's only gone up since: vocabulary is a waste of time.

Actually, I'm almost with you on 'c', but I'd rather throw out 'k' because it's one of the few letters that don't fit on a 7-segment display. Capital letters also don't add much information. Yes, actually, I'm fine with all of those going away. I couldn't tell you why the people who design wayfinding signage avoid serifs like the pox, yet other design fields refuse to read without them. With or without, text seems to read just fine. I really don't care too much either way. Letters would be better if they all worked more like EFHLT. Right now there are too many clashing elements: some are boxy, some are round, some have sharp diagonals. I'm not saying it has to be a 7-segment design, but it would certainly be pleasing if learning the alphabet, its ordering, and how to write it could all happen much faster by just noticing a few easy repeating patterns. Yes, actually, let's do language reform.

>It seems to me that many people seem to draw a line between what is acceptable and what is not based on whatever they are comfortable and familiar with by the time they reach the end of their schooling.

Well, I'll agree with you there. All too often, pointless pedantry comes down to "my school must be right, otherwise I am wrong". Love or hate my reasoning, at least you can't accuse me of doing that.

> Parenthetical type grammar with an explicit start character and end character is pivotal for encoding information unambiguously.

You argue against multiple types of dashes because context is sufficient, despite there being typographical ambiguity. But you insist that we must have typographically unambiguous bracket characters. I must admit that I am struggling in this conversation to determine when we can depend on context and when we need unambiguous markers. Perhaps I am just incapable of picking up on the subtle context that backs up this position of yours. (:

> everything beyond that is strawmanning me

In fact, you will find examples of real human languages that exhibit more extreme versions of the things I have suggested.


There are languages with simpler tense systems than what English has. Slavic languages, for example, tend not to have a pluperfect. So, the example of removing tenses is based in reality.

Hawaiian has an alphabet of just 13 letters. So, removing letters from the 26 in the English alphabet is based in reality.

The Dictionnaire de l'Académie française is being updated to its 9th edition and is expected to have ~60K words[0], whereas English dictionaries report an order of magnitude more[1] (even with the issues in the linked source, this is a large gap). Basic English[2] has a vocabulary of less than 1,000 words (if you desire a vast overhaul of the existing norms of typography, I hope that you are at least willing to entertain prior art in the area of overhauling the use of natural language as a valid example, even if you disagree with the intention or outcome). If you wanted me to go to extremes (which again, I did not in the post you replied to), I could have just suggested we use Toki Pona. Of course, if I did suggest such a conlang, you may have been correct that I was strawmanning you and going to extremes just for a point. Nevertheless, we can definitely conclude that there are, in fact, natural human languages with substantially fewer words than modern English, and there are definitely constructed and artificially restricted natural languages with enormously smaller vocabularies.

You need not agree that these examples constitute best practice, or that they represent desirable goals in the continued evolution of language and written communication. I hope, though, that you can recognize that none of these are strawmen, but based in reality, many in natural languages, and some in artificially constrained natural languages for specific purposes. If anything, I presented examples that do not represent the extremes of any position (I could easily have brought up languages with no written representation, for example). I merely selected additional examples that conform to a broad categorization of removing stuff from modern English.

I welcome further discussion on the topic, but I worry you might dismiss things I say you disagree with, as you have done once above by ascribing an intention of strawmanning you, and as you seem wont to do with typographical conventions you dislike. And if you want to eliminate the punctuation you dislike, what might you do to a person whose arguments you dismiss? (;

It seems though, that you just don’t like the various dashes, which is totally fine. Many other people and I find value in them. Still more probably just go along because, as I said, a big part of language norms comes from inertia. The point of language (other than perhaps some, but not all, artistic expression) is communication. Why abandon the norms that facilitate this communication? Is it better to stand on preference (or perhaps principle) and harm your attempts at communication or to yield to norms and be better understood (though perhaps annoyed)? I do not know that there is a correct answer to this question.

I do hope, though, that I have disabused you of the fanciful notions that I was cherry-picking ideas that are extreme just to prove a point and that I was strawmanning your argument. I have shown above numerous examples that back up each of my suggestions, grounded in the reality of natural human languages. Further, I have shown several examples that are truly extreme to show that my original suggestions were not “intentionally extreme to prove a point.”

[0] https://www.thoughtco.com/academie-francaise-1364522
[1] https://www.merriam-webster.com/help/faq-how-many-english-wo...
[2] https://simple.wikipedia.org/wiki/Basic_English

I don't care about multiple types of parentheses per se; I do care about there being a spanning set of grammatical constructs. I don't think period and comma alone would be enough. You need to have constructs for compressing and abstracting. "John/Paul/Ringo/George were in the Beatles." Notice how I just made 4 sentences for the price of one. I could have written "John was in the Beatles", "Paul was in the Beatles" ... all four statements fully unrolled. You need constructs which let you FOIL sentence structure just like in math class, presenting (option A, B and C) to (you, and everyone else). You also need a handful of "client-server type" interaction structures: header information, a thing to indicate if the content is a question, request, demand, greeting, etc. Grammar is not about encoding literal speech pausing, it's about encoding how to deserialize the linear sequence of words.

In theory you could just make "(" and ")" the universal sub-context-denoting symbols. You would just need an extra symbol to clarify which meaning a given parenthesis has. The three systems make sense: one for data-agnostic compression like a JSON object or foiling a math expression, one for relaying text itself as an object in the domain of discussion rather than as the thing being said (aka a "quotation"), and one for scopes that are part of the discussion per se (not quotation).

Context suffices when the parts of speech have no chance of being in the same slot: compound words and numbers... your machine screw example was pretty rare. I think the dashes are too specialized in meaning and too hard to tell apart to justify code points in the docs and buttons on my keyboard. If need be, distinguish the various flavors of hyphen with some rule about touching the letter or having two in a row. Our symbol set is reasonable: not as succinct as Hawaiian, not as bloated as Chinese. 13 chars fit in 4 bits; 26 chars fit in 5. With great strain you can maybe find a workable set of grammatical symbols without blowing past 32 chars, but you will probably end up needing a 6th bit. I'm against bloating the raw number of symbols and rules everyone has to rote-learn, not against dashes in particular. If it's already in frequent use, like all the paren styles, then fine, but let's not make anything worse than it has to be.

> Grammar is not about encoding literal speech pausing,

This is absolutely correct.

> it's about encoding how to deserialize the linear sequence of words

This is absolutely incorrect. Grammar is the collection of rules that prescribes the combination of words to make valid collections of the same in a language. Specifically, grammar is distinct from semantics, which is concerned with meaning. A nonsense statement may be grammatically correct.

Punctuation is the collection of non-character glyphs that are used to capture the nuances of spoken language into a written form.

Punctuation is orthogonal to grammar.

Put more briefly: spoken language has grammar and no punctuation; written language has the same grammar as the same spoken language and also punctuation.

Parenthetical asides are represented in spoken language with some combination of marker words, pauses, tone of voice, word choice, and perhaps other indicators I may have forgotten. The purpose of punctuation is to lend some of the nuance of spoken communication to the otherwise sparse written word.

The argument of the number of bits to encode glyphs is also orthogonal to the purpose or usefulness of language, writing, and communication. Computers are tools. A keyboard should justify the paucity of its glyphs, rather than the other way around. Once we get here, we are in the realm of pure opinion and preference, which I don't have much interest in pursuing.

> A nonsense statement may be grammatically correct.

I'm sure then you already know, colorless green ideas sleep furiously.

I see no contradiction in what you are calling incorrect. Whatever representation our brain uses for concepts and thoughts, sharing that object requires us to pack it into a linear sequence of words which can then be reliably unpacked by someone on the other side. The very nature of verbal communication forces the existence of serialization/deserialization rules. Those rules are what we call grammar. Grammar may be somewhat orthogonal to semantics; as you observe, it is possible to encode valid nonsense, but the grammar exists to encode semantics and is thus to some degree tied to it. The grammar rule of "subject verb object" doesn't only tell you how to check the validity of "colorless dreams sleep furiously", it tells you how to deserialize that sentence back into a hierarchy tree of constituents and their relations. It just so happens to unpack as an object of useless constituents and impossible relations.

Punctuation may be orthogonal to grammar in the general case, but in this particular language they are highly coincident. Virtually all punctuation marks are grammatical particles. It doesn't have to be like this. Some languages have "audible parenthesis" words. Others have words for marking the end of a sentence as a question. Calling punctuation marks non-characters seems a bit artificial. Let's just call them the non-audible characters, in analogy with non-printable characters.

The argument about bits was apparently lost in transmission. I assure you this isn't a preference and opinion thing. Information theory applies just as well to natural language encodings as it does to computer protocols. The basic principles of information entropy and optimal transmission encoding show up in every language: the least frequently used words are the longest. In an analysis of conversations across languages, researchers found the bit rate to be constant. Some spoken languages are seemingly very fast, but that's because the information density per word is lower. The brain's bit rate is a constant. Irrespective of whether we are using a computer or not, the size of an alphabet is measured in bits. The bits in the alphabet determine how much you can possibly say per character. On an extreme end, Chinese has over 5000 characters. That's around 13 bits of information per character, at the low, low cost of memorizing all of them. For comparison, ignoring capitalization and punctuation, English is a 5-bit alphabet, meaning the same amount of information fits into 3-letter words. The Hawaiian alphabet can cover 80% of those possibilities with just 3 letters, and the remainder with a 4th. Think about how powerful that is. Is memorizing 5000 arbitrary squiggles worth it to compress the width of words down by ~3 chars?
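A rough sketch of the alphabet arithmetic above, assuming every symbol is equally likely (real text is much more predictable, so the true entropy per letter is lower). The helper names are made up for illustration. log2(5000) is about 12.3, which rounds up to the 13 whole bits quoted; 26 letters need 5 whole bits and 13 letters need 4. Matching one Chinese character then takes 3 English letters or 4 Hawaiian ones.

```python
import math

def bits_per_symbol(alphabet_size: int) -> float:
    """Capacity of one symbol, assuming every symbol is equally likely."""
    return math.log2(alphabet_size)

def whole_bits(alphabet_size: int) -> int:
    """Smallest whole number of bits that can index every symbol."""
    return math.ceil(math.log2(alphabet_size))

def symbols_needed(target_bits: float, alphabet_size: int) -> int:
    """Shortest string length whose combinations carry at least target_bits."""
    return math.ceil(target_bits / bits_per_symbol(alphabet_size))

# Alphabet sizes as quoted in the comment above.
ALPHABETS = {"Chinese": 5000, "English": 26, "Hawaiian": 13}

hanzi = bits_per_symbol(ALPHABETS["Chinese"])
for name, size in ALPHABETS.items():
    print(f"{name:8} {size:5} symbols  {bits_per_symbol(size):5.2f} bits "
          f"(rounded up: {whole_bits(size)})  "
          f"{symbols_needed(hanzi, size)} symbol(s) per Chinese character")
```

The uniform-distribution assumption is what makes this an upper bound: it measures how much an alphabet *could* carry per symbol, not how much everyday text actually does.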

The number of bits in an alphabet also determines the minimum number of unique design elements needed to construct letters for it. 7-segment displays are a great example. As I said, our characters fit in 5 bits. That's the minimum. Now, when our letters came about, they didn't know about bits and they certainly weren't doing this on purpose, but almost every letter can be expressed on a 7-segment display. In other words, writing a letter only wastes two bits per character relative to saying it.
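The "two wasted bits" figure is just 7 on/off segments minus the 5 bits needed to tell 26 letters apart. A small illustrative sketch, assuming the conventional a-to-g segment labelling and covering only the six hexadecimal letters (A, b, C, d, E, F) that such displays routinely show; `mask` and `HEX_LETTERS` are hypothetical names.

```python
import math

# Segments in the usual a-g order: a top, b upper right, c lower right,
# d bottom, e lower left, f upper left, g middle. Bit i is set if segment i is lit.
SEGMENTS = "abcdefg"

def mask(lit: str) -> int:
    """Pack a set of lit segments into a 7-bit integer."""
    return sum(1 << SEGMENTS.index(seg) for seg in lit)

# The conventional hexadecimal letter glyphs.
HEX_LETTERS = {
    "A": mask("abcefg"),
    "b": mask("cdefg"),
    "C": mask("adef"),
    "d": mask("bcdeg"),
    "E": mask("adefg"),
    "F": mask("aefg"),
}

segment_bits = len(SEGMENTS)            # 7 independent on/off segments
letter_bits = math.ceil(math.log2(26))  # 5 bits to tell 26 letters apart
spare_bits = segment_bits - letter_bits

print(f"{segment_bits} segments - {letter_bits} bits = {spare_bits} spare bits")
for letter, m in HEX_LETTERS.items():
    print(letter, format(m, "07b"))
```

Letters such as M, W or X have no clean 7-segment form, which is presumably why the comment says "almost every letter".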

When you learn a new ligature in an Arabic script, you've doubled the number of letters you know. When you learn a new Chinese character, you've learned a new Chinese character. Language is a transmission medium. It's a tool. My takes here are no more preference and opinion than the allocation of the radio spectrum. There's an optimization tradeoff to be had between the limited character choices of Hawaiian and the extreme rote memorization of Chinese. Going from 13 to 26 characters does double the learning time, but the learning time at that stage was short anyway. Going from what we currently have to perhaps 60ish characters (6 bits) doubles it again. Maybe that's tolerable. The next step up is ~128 characters (7 bits). There may be things you can do quicker with a large set of symbols, but the ROI for learning all those symbols doesn't pay off. Around 5 to 6 bits is where most writing systems settle.

And that's why bloating the raw glyph table with letters and marks is the wrong solution.

But apparently only insufferable pedants care about clarity. That's why we should stop using those pointless number glyphs too and just write them out in unary using hyphens. -/------------------------- is just fine.

.... -.-- .--. .... . -. ... .- -. -.. .--. . .-. .. --- -.. .. ... .- .-.. .-.. .. -. . . -..
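For anyone not fluent in dots and dashes, a minimal International Morse decoder (letters only, no digits or punctuation; `decode` is a hypothetical helper) shows what the reply says. The original has no word gaps, so it decodes to HYPHENSANDPERIODISALLINEED, i.e. "hyphens and period is all I need". The same table also explains the joke at the top of the thread: "--" is M and "-." is N.

```python
# International Morse code for the letters A to Z.
MORSE = {
    ".-": "A", "-...": "B", "-.-.": "C", "-..": "D", ".": "E",
    "..-.": "F", "--.": "G", "....": "H", "..": "I", ".---": "J",
    "-.-": "K", ".-..": "L", "--": "M", "-.": "N", "---": "O",
    ".--.": "P", "--.-": "Q", ".-.": "R", "...": "S", "-": "T",
    "..-": "U", "...-": "V", ".--": "W", "-..-": "X", "-.--": "Y",
    "--..": "Z",
}

def decode(message: str) -> str:
    """Decode space-separated Morse letters; unknown codes become '?'."""
    return "".join(MORSE.get(code, "?") for code in message.split())

reply = (".... -.-- .--. .... . -. ... .- -. -.. .--. . .-. .. --- "
         "-.. .. ... .- .-.. .-.. .. -. . . -..")
print(decode(reply))
print(decode("-- -."))  # the "m dash" and "n dash" from the top of the thread
```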

Reminds me of this guy I met at a CTF. He decided that punctuation in general is unnecessary. What's the use of having so many different symbols if the only thing they denote is pauses between words?

so when he wrote something . he used only periods to denote pauses . no other punctuation symbols . no capital letters . some people were thinking that his periods stand for perl concatenation operators . i dont know if he is still doing this . i hope he stopped

Writing is a recorded symbolic convention for the benefit of the sufficiently educated reader.



There's a reason monks of old read aloud. It was about the only way to confirm the actual meaning of a text.



(Bass-ackwards hint: "Arabic" expresses poorly when vowels are excluded.)

actually i kinda love that . punctuation is semi arbitrary anyway . and this is actually much easier to read than the usual literary english full of semicolons and dashes . mimics speech much better too .

I think it mimics a certain kind of speech.

Some people do talk like that . All complete thoughts . Sequential.

Other people—and I very much count myself among them—have a less linear, more tree-like mode of expression; where the ideas, instead of building on what came before, are being laid out out of order – the ideas aren’t completed – and more complex punctuation is needed to establish the relationships between those thoughts.

It sounds like I’m saying the former is less sophisticated than the latter. I don’t think that’s true.

I think we should probably try to express our ideas in a way that doesn’t require out-of-sequence reasoning. Short, simple sentences. With clear meanings. Building on one another. Much easier to follow.

The tree-like mode of endless nested parentheticals and asides is just a rendering of an incomplete thought process.

Not better or more sophisticated. Just still in progress.

Probably scored a. sweet gig. writing lines. for. Captain. Kirk.

The article is not actually very pedantic - at one point, the author encourages us to break the rules - and I feel it has been offered in the sense of "printers have developed these variations on the basic dash, and if you choose to use them, it is probably best to use them in the same sense as printers themselves do."

In several significant computer typography systems, the notation for an en dash is a doubled hyphen (--), and for an em dash a tripled one (---). Notably LaTeX and Markdown (Pandoc flavoured: <https://pandoc.org/MANUAL.html>).
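A plain string-replacement sketch of that doubled/tripled-hyphen convention. This is only an illustration, not how Pandoc or LaTeX implement it: real tools also have to leave code alone, and in LaTeX the dashes typically come from font ligatures rather than text substitution. `smart_dashes` is a hypothetical name.

```python
EN_DASH = "\u2013"
EM_DASH = "\u2014"

def smart_dashes(text: str) -> str:
    """Turn '---' into an em dash and '--' into an en dash.

    Order matters: replacing '--' first would turn '---' into an en dash
    followed by a stray hyphen.
    """
    return text.replace("---", EM_DASH).replace("--", EN_DASH)

print(smart_dashes("pages 3--8 --- see above"))
```

A single hyphen passes through untouched, so compound words like "ever-changing" keep their plain hyphen.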

In LaTeX I’ve been using \textemdash instead. I don’t actually know why; it’s just that these sorts of longer names tend to handle some niche edge case better.

Em-dashes and parentheticals should be used sparingly, so it isn’t too annoying to do all the extra typing.

My preference is for spare markup where at all possible. Less typing, less mental overhead, clearer source text.

If it's necessary to be explicit for clarity and proper rendering, then sure. But otherwise, the less friction the better.

After years of procrastinating on learning LaTeX (the Lion Book turned out to be a clear, delightful, and highly useful reference), one of the pleasant surprises was that paragraphs are simply denoted by two carriage returns. After years of hand-coding HTML where matching <p> and </p> tags (among many others) was a constant occupational hazard, this was just ... pleasing.

Markdown has a similar philosophy, if a far more restricted set of capabilities. That set is, however, sufficient for a tremendous number of documents, and even where it's ultimately insufficient, it still remains a useful way to get started with writing.

Probably you know this, but you don't actually need to close every <p> with a </p>.

I've been through enough different HTML variants whilst not having to adhere strictly to standards that I'm moderately fuzzy what the current state of the standard is. I hear the current standard is also ... large.[1]

But even if it's not strictly necessary to balance <p> tags ... it is necessary to do so with many other HTML elements, and missing or mis-typed tags can utterly bork a page, particularly if there's any complexity to it.

(Hand-crafting tends to minimise that complexity, but it's still possible to get reasonably twisted.)

That said, checking one of my favourite HTML5 references, whose page source itself is a beautiful example of clean HTML ... I see that Mark Pilgrim in fact omits the close tags on his paragraphs:


And that said: LaTeX and Markdown also omit the need for the opening paragraph tag. So there's that.

Still, fair point ;-)



1. Drew DeVault has suggested as much: <https://drewdevault.com/2020/03/18/Reckless-limitless-scope....>

It reminds me of the strong feelings about Comic Sans.

The guy who created it said something like, “If you love Comic Sans you don’t know much about typography and should probably get a new hobby. And if you hate Comic Sans you don’t know much about typography and should probably get a new hobby.”

I feel the same about this. The average person has about a billion things to improve in their writing before the “correct” use of different dashes should become something they think about.

Depending on the audience, I think the article is justified and gives a good overview. Just think of scientific papers, where you sometimes spend a full year carefully laying out the words. Being concise there helps legibility and is definitely worth the effort.

... with all due respect to folks who choose the hard and extremely frustrating academic career path, the inefficiency is so absurd that it can truly only exist in these gigantic institution-sized machines. (And in similarly sized corporate money-makers.)

Most papers are fundamentally flawed, unfortunately, due to lacking sufficient information and data for replication, being underpowered (and not controlling for many factors).

It took decades to get to some minimally sensible standards (preregistration, conflict of interest declarations, awareness of the most common stats issues, power analysis), but we're still far from doing effective science.

Money is still handed out based on feels, hype, and name recognition (when it's not blinded) for laughably small projects, instead of focusing on establishing longer-term ones and/or improving the actual science output (i.e. data and hypothesis generation) of existing ones.

(Yes, of course, academia approximates this. Yes, yes. Everything's fine. We'll have a usable model of Alzheimer's any second now! Aaany second. Just let this new totally effective model of depression/obesity/learning/ME-CFS out of the door first.)

Without the pedantry, things can devolve to violence. Panda bears with machine guns. Horrible stuff.

- https://en.wikipedia.org/wiki/Eats,_Shoots_%26_Leaves#Title

> pointless rules for insufferable pedants to power trip over

The perfect topic for HN!

Arguably there's a place for both an em-dash and a hyphen. (For your example, a hyphen would be pretty normal style anyway.) But in a world where the double quote is a massively overloaded punctuation mark, we probably don't need an en-dash at least.

TL;DR: Using the right dashes is about the UX of text. If you don't care about the reader's UX, your points are sound.

I however--as a typographer--strongly disagree. Typography is about both beautiful typesetting and making sure that the information contained in the text is understood easily.

The former is obvious to me. It may not be to you but that doesn't make your reasoning right.

As an analogy, there are quite a few people among my friends & acquaintances who cook occasionally or rarely. They usually share the trait that they care more about eating than how something tastes. Bluntly spoken.

They commonly have one kind of oil in their kitchen (most often sunflower) and use it whenever the recipe demands "oil".

Usually recipes specify what oil to use. It may say olive oil or peanut oil or sesame oil. They won't have these oils and they don't care.

Even though the effect of using a different oil is profound on many levels (not even only taste). If you care, that is. Same with the dashes. Text looks and reads very different when those different dashes are used correctly.

Which leads to the information part. Why do we have these different dashes? They actually map to spoken language.

A hyphen is used to pull things together. A word can be hyphenated (and should be read as if the hyphen didn't exist), or two words can be pulled together (making the pause between them shorter): "ever-changing" is pronounced differently than "ever changing".

An en dash used between points in time or space conveys that. A distance. The spoken pause is usually longer.

And finally, an em dash, like a comma, conveys an even longer pause between the words it separates.

I must say, truth is an absolute defense, and I'm certainly one to both value calories down the gullet crudely and efficiently, and to not be terribly aware of what goes on in the font-fetishizing circles (no disrespect). But I do understand information coding and that obsession over a good design. For me, it's been subways and metros. I've been doing redesigns, obsessive recoloring, obsessively flipping between colors and shapes and other markers in an attempt to compress all that information down to the entropy limit. So I get it. I just don't get it with typography. It's all just letters to my eye. Once the physical squiggle has been recognized for the abstract symbol it represents, the symbol and not the squiggle is all I remember seeing. I honestly couldn't tell you the last font I ever looked at, let alone if it had serifs or [insert typography feature, no really, that's the full extent of what I know]. I can't say I'd ever noticed (or benefited from) a distinction between dash lengths. Any component of a letter under a certain length scale I mentally dismiss as likely printing dirt anyway. If it works for other people, well, great, and mad respect for it.

So I get it. Visual design language serves a purpose. An important purpose. It's not the artful navel-gazing outsiders think it is. Well, maybe some people are like that, but there really is objective purpose under it all. I'd even say I agree about rules for hyphens touching their neighbors or not. For compound words it should be a train-like-in-construction, whereas in a delimiter role, like a range of items, it should go Boston - DC.

I just can't see having a whole dedicated set of minutely different characters fit for this purpose. I dislike it for the same reason I dislike lego sets that have a particular piece in them which isn't used for anything else in any other set and never will be. It ruins the elegance of the system. It offloads a minor design problem onto somewhere it doesn't belong (namely the character set). I want to know everything while learning as little as possible. Which is why I strive for encodings that express as much as they can with as few elements as possible.

If it were me, I'd just have '.', '-' and '_' exist at mid, bottom and top heights and be done with it. Don't like my line length? Make it whatever length you want, either dotted, dashed or continuous. Solves every use case, extremely composable; every permutation that should logically be there, is there. .,;:' notice anything incomplete? LHTIFE notice what's missing? qbhrnujdp damn that's frustrating. KRBPF where's the rest of the set?

I agree. I’m usually a stickler for punctuation and spelling, but I can barely tell the difference between these three hyphens. And that is when they are right next to each other. If they were alone in a document? There is no way I could know which is being used. If they aren’t easily distinguishable, I don’t see the point in using three separate symbols.

> Seriously, what's the point of this pedantry.

My takeaway wasn't that the article was being pedantic, just that it was being informative.

What's the point of punctuation? The point is that ambiguity exists in human communication. Where accuracy and precision are important — for example in formal communication — different punctuation marks and rules help prevent misunderstandings.

When engaged in less formal communication, or when the stakes of miscommunication are lower, these rules seem (as you observe) unnecessary. I think that insisting on proper syntax, spelling, grammar, or whatever else in an online forum like HN would be silly. But, internet forums aren't the entire world, and it is conceivable to me that there may be places where people need to depend on the meaning of their message being conveyed reliably.

I know the correct usages but often avoid them: plain ASCII doesn't confuse the reader, while the fancier characters can break copy/paste usage. I can get by with an ASCII hyphen/dash and a double dash for an em-dash. I particularly dislike autocorrection of punctuation into more pleasing forms (e.g. smart quotes/apostrophes). This is one reason I tend to do outlining in GitHub issues more often than in Google Docs.

Of course I'm mostly writing about computer/software topics and don't write for publications or a non-technical audience.

The em dash is syntactic sugar for a brief digression in discourse.

...and when the discussion on whether to use [mnxyz+]-dashes has finally been sorted we can start on which font to use to render these dashes, whether they should be proportionally rendered, how to handle ligatures with dashes, to RGBA or not to RGBA dashes, hinted dashes versus unhinted dashes, the big difference between the visually identical dashes in language A versus language B, et ce.te.ra.

We have nice things that are free to use—I think we should use them.

There are different contextual requirements that can be served by specific typographic characters. I'd much rather see them used, than not.

I'd imagine most would feel the same‽

I_for⁃one−do–not-care—what--line-thing-ies–areーused&∴I༌wish‑goodluck to those·that·care

Agreed! Plus with my handwriting, who's ever going to be able to tell the difference?

Handwriting? We don't do that here.

agree. why many word when few do trick

The article misses the rather important piece of trivia about technology compromises that what it has been calling “hyphen” is actually U+002D HYPHEN-MINUS, rather than U+2010 HYPHEN. The situation there is a real mess: HYPHEN-MINUS is ugly in many fonts due to compromising between the ideal appearances of a hyphen and a minus sign, and HYPHEN is often missing from the font, leading to falling back to a hyphen from a different font rather than HYPHEN-MINUS from the same font (which is clearly more desirable, but technically unappealing).

A comment led to the follow-up https://www.punctuationmatters.com/the-difference-between-a-..., but it’s still very insufficient, only dealing with MINUS SIGN and assuming HYPHEN-MINUS was exclusively a hyphen. And appears to have suffered from the same replacement of lone HYPHEN-MINUS with EN DASH as this article.
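To check which character is actually in a piece of text, Python's standard `unicodedata` module reports each code point's official name and general category. The list below is just an illustrative selection of dash-like characters; note that the ASCII key on the keyboard is HYPHEN-MINUS, and that MINUS SIGN is classified as a math symbol rather than dash punctuation.

```python
import unicodedata

# Dash-like code points: the ASCII hyphen-minus, the dedicated hyphens and
# dashes from the General Punctuation block, and the mathematical minus sign.
DASHES = ["\u002d", "\u2010", "\u2011", "\u2012", "\u2013", "\u2014", "\u2212"]

for ch in DASHES:
    # General category 'Pd' is dash punctuation; 'Sm' is a math symbol.
    print(f"U+{ord(ch):04X}  {unicodedata.category(ch)}  {unicodedata.name(ch)}")
```

Running the same check over a copied snippet is a quick way to see whether a font fallback or an autocorrect step swapped one dash for another.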

I get why you wrote those words in all caps but it still feels like you’re yelling emphatically about nothing, and that coincidentally sums up how I feel about the rest of this topic.

That's so myopically HN... "I don't care about it, so it's probably not important and dumb anyway lol"

It’s also very likely to be hypocritical: how many topics on HN are tuned towards a very specific kind of focus/nerdom? And what’s the point of commenting “aha, good for me that I don’t care about this!”…?

I guess the difference here is that someone’s boss might complain that they should follow this article, since we all write stuff from time to time.

Agreed, it’s very HN. But it’s not just bad. Hackers are usually hard-wired to reduce entropy—we’re quick to point out when something is redundant or unnecessarily ambiguous. Formalities are also used for gatekeeping, which the HN Zeitgeist doesn’t like.

That said, personally I need my different dashes, commas and parentheses for my excessive wavering.

And to the one expressing that thought, you're right. To them, it's not important, and dumb.

You could argue, however, that they should refrain from posting, but they probably felt the need to share in case others felt the same way.

What about I don't care that you don't care?

Great, we're on the same page

Made it 54 years without ever hearing about mdash/ndash/hyphen distinction. I've just been using the hyphen character for everything. Must have been absent that day in grade school.

This guide and most guides like it tend to miss the most important and powerful use of the em-dash. They make it out like you can use it for anything, but really they are just missing the wonderful simplicity of the em-dash and how versatile that simplicity is. The em-dash raises and lowers the narrative voice. In fiction this provides a way to give insight into the narrator: an em-dash tells us we are switching from the story the narrator is telling us to the thoughts of the narrator, and a second em-dash or a period lowers the voice back down to the story the narrator is conveying. This is also the sense of dialog being introduced with em-dashes instead of being quoted: a new line starting with an em-dash lowers the narrative voice, and the narrator hands the story off to a character.

The simplified rules for the em-dash are pretty much intuited and prescribed versions of this, which gut the effectiveness of the em-dash. In general, an em-dash should be used to denote thoughts without having to restructure or delete what you just wrote to accommodate that thought.

Edit: I oversimplified. Consistency is what is important. Using an em-dash like a comma that isn't a comma leads to ambiguity when you also use commas. A writer who avoids semicolons and quotes all dialog can use an em-dash very differently than raising the voice, but they can also use a semicolon very differently than its standard accepted role. That is what these simple guides miss: the consistency of usage. They just list all of the various ways you could use any given mark, and people start using an em-dash to "fix" their long run-on sentence with all of its commas.

The closest thing we have to standard use allows for wonderfully complex sentences which can convey great meaning, but consistent and well defined use is most important.

comma - connects independent and dependent clauses

em-dash - raises and lowers the voice

semicolon - connects independent clauses in a more direct way than the paragraph

colon - elaborates an idea

parenthesis - an aside, stated instead of thought

period - end of thought

Question marks and exclamation points do not need to be at the end of a sentence; they can double as a comma, semicolon, or colon.

I seem to be missing a nuance of HN's line breaks and formatting.

Reasonable choices. And a good description of a specific use for the em dash. But I think it’s a poor mind that can only conceive of a single use for a punctuation mark.

We could also use em dashes to signal excitedly running from one thought to the next—as if we’re just riffing on an idea—too fast to be interrupted—wouldn’t that be amazing?

Or we can use the em dash to slow us down—to pause and reflect on what we just said.

Or in dialog:

“Perhaps we can use it to signal an unexpected inter—”


“No, an interruption.”

“Yes, that would make more sense.”

“Oh! I just thought of something—we could also use it to indicate stunned silence.”



It is easy to conceive of uses; having a consistent style which conveys what you want to almost any reader is another thing. If you had written all those examples without using the text to explain them, the reader would have to stop and think about what you are doing, and that is not a good thing.

Sure, but any given work could introduce such a use within the first few pages and the reader would be accustomed to it pretty quickly.

To quote myself, "consistent and well defined use is most important," and I have repeated this sentiment in most, if not every, post I have made in this thread. My point to the previous comment was that if you were not consistent it would not make sense: if his examples did not explain themselves, then they would leave the reader stopping to figure out what is going on at each punctuation mark. You can break your own conventions within a work, but those conventions need to be well established before you do so, and you need a good reason to do so. Breaking your own conventions on a whim, or because relying on the punctuation is easier than relying on the language, is a terrible idea.

I've never heard that perspective before about raising the voice, but I really like it.

What's even more interesting to me is that this contrasts with a parenthetical which I now realize lowers the voice when we read it aloud.

Did you discover that difference on your own or did you read it somewhere? Just curious.

I realized it on my own in an intuitive sense; my writing before I properly learned it shows this use, but eventually I read some things on punctuation and fixed my naive use of the em-dash and other punctuation marks. I think "raising the voice" might be the old-fashioned term, but I cannot remember the more current term, or even if there is one. I put some time last night into trying to find it, but search engines are nearly useless and return page after page of sites conflating voice and tone, or prescriptive punctuation guides which just list uses with no care about consistency in style.

I realized it on my own in an intuitive sense - my writing from before I properly learned it shows this use, but eventually I read some things on punctuation and fixed my naive use of the em-dash and other punctuation marks. I think "raising the voice" might be the old-fashioned term, but I cannot remember the more current term or even if there is one - I put some time last night into trying to find it, but search engines are nearly useless and return page after page of sites conflating voice and tone, or prescriptive punctuation guides which just list uses with no care about consistency in style.

This is how I read them.

My mental model is that an em-dash is a parenthesis that the author was too excited to slow down and make vertical.

Do you have a reference for this? Never heard that particular framing about the narrative voice before. You call it a versatile simplicity, but to me it sounds rather restrictive and specific, to be honest.

Search engines seem to really fail here; they are just giving me more guides like the one here, and I cannot get them to give me anything about narrative voice beyond conflations of narrative voice and tone. You can see this use in a great deal of literature which uses the em-dash to introduce dialog in place of quotes. I believe Beckett would apply, but it has been years since I have read him, so I cannot say for certain. Most of the authors known for their long, complex sentences follow the conventions I outlined in my edit, even if they do not use the em-dash for dialog.

>sounds rather restrictive and specific, to be honest.

Write a single sentence which clearly and concisely includes exposition, thought, aside, rhetorical question, self rebuttal and conclusion without following the "standard" I included in my edit. This is what allows writers like James, Joyce, Gass, Gaddis, Wallace, Pynchon, etc. to write their wonderfully long and complex sentences and by complex I am referring to meaning as much as structure, we can have great meaning with simple structures but we have to accept a certain amount of ambiguity with that. Sure, that challenge can be executed as a paragraph but then it ceases being a single thought, it is a collection of thoughts and that is a very different thing.

If you'll indulge me, I actually think your final paragraph could be copyedited to illustrate all of your suggested 'standard' rules — though in your own rendering you only used commas and periods.

> Write a single sentence, which clearly and concisely includes exposition, thought, aside, rhetorical question, self rebuttal and conclusion, without following the "standard" I included in my edit: This is what allows writers (like James, Joyce, Gass, Gaddis, Wallace, Pynchon, etc) to write their wonderfully long and complex sentences (and by complex I am referring to meaning as much as structure); we can have great meaning with simple structures, but we have to accept a certain amount of ambiguity with that—sure, that challenge can be executed as a paragraph, but then it ceases being a single thought; it is a collection of thoughts, and that is a very different thing.

I tried to stick to your 'standard', though you might disagree on some of my choices. I would say I found it a little constraining. Here's an alternative edit that doesn't follow your rules but – I find – creates a more fluid reading of your original words:

> Write a single sentence, which clearly and concisely includes: exposition; thought; aside; rhetorical question; self rebuttal; and conclusion – without following the "standard" I included in my edit. This is what allows writers like James, Joyce, Gass, Gaddis, Wallace, Pynchon, etc, to write their wonderfully long and complex sentences—and by complex I am referring to meaning, as much as structure. We can have great meaning with simple structures – but we have to accept a certain amount of ambiguity with that. Sure, that challenge can be executed as a paragraph; but then it ceases being a single thought—it is a collection of thoughts, and that is a very different thing.

All of which I hope goes to show that these choices are a matter of taste, not absolute rules.

That paragraph of mine, and most of my posts, could stand some editing; I am terrible at editing on a screen, and my casual use tends to be comma-heavy. I think your edits wonderfully highlight an issue I believe I brought up in one of my posts in this thread: trying to "fix" things by changing the punctuation rarely works, and restructuring the sentence(s) is almost always a better path.

>All of which I hope goes to show that these choices are a matter of taste, not absolute rules

Throughout this exchange I have avoided calling these rules and used "convention" instead, and that is what punctuation use is. Conventions are easy to break, but you should have a good reason to do so if you want something to be readable, and you must be consistent in your choices. I have tried to emphasize this throughout and have repeated it many times: consistency of use is what really matters. Punctuation marks are pretty much signposts for the reader, and as long as they remain consistent in their use and are well formed, most readers will have no issues figuring out any use.

Imagine if a town decided one day that it could save some money on removing a no-longer-needed stop sign by simply agreeing that it is not a stop sign; it is a small town, and they can just pass the word that the stop sign on Third is now a 30 mph sign. This works quite well and saves some money, so the town continues with this and starts changing the meaning of other signs. It is not difficult to see why this would be troublesome: eventually people will get confused, and no one from out of town will have a clue. Thankfully our governments are consistent with their signs, and a stop sign is a stop sign.

I looked through my books, and English is wonderfully ambivalent when it comes to punctuation outside of prescriptive grammars. The descriptive grammars largely (if not completely) ignore punctuation and focus on spoken language; even The Cambridge Encyclopedia of the English Language reduces punctuation "rules" to a single page, reduces hyphen/en/em-dash to a typographical convention, and does not say much more than that the dash is often used in informal writing to replace other punctuation marks. All we really have here is convention and consistency. Can you meet the challenge I outlined without following the conventions I laid out? It can be done, but it will be considerably more verbose than it would be following those conventions, which is not a bad thing. Authors like McCarthy, Krasznahorkai, Ellmann and Bernhard have all built their style around breaking those conventions (yes, two are translations when it comes to English, but they break the conventions in their own languages as well). Even Joyce breaks the convention, and he does it within single works, switching between adherence and breaking, but not many have pulled that off in the way he did.

It is a really complex thing and part of what makes English literature what it is. We have conventions which have evolved over time when it comes to punctuation and we have prescription, but we don't really have rules unless you are writing tech documents or journal submissions. It comes down to having a clear and consistent use more than anything else and using every punctuation mark for any accepted use based on whim is not clear or consistent.

I think I'd like to fork the language and write one with sane guidelines.

The problem is that the real world, human thoughts and other things that language needs to try to express are not "sane." So if we are to have a common basis for communication, the guidelines will tend to get "insane."

This is way too much pedantry and hyper-hyphen-focus. Honestly, I don't care about endashes or emdashes. I've never seen them in business or personal writing, and I probably never will. They add nothing to anyone's communications.

Perhaps typesetting still uses these, but that's okay. They can keep doing so, since these probably add aesthetic appeal to how flyers are designed.

I also noticed a pundit-battle brewing in the depths of the hyphen-m&ndash-soup.

The article:

  Let’s make that even more clear.
Yet, from another dash-hyphen pundit... [1]

  En and em dashes aren’t called that because they’re as wide as
  a lowercase “n” and a lowercase “m.” They’re called that
  because those are the specific typography jargon words that
  refer to the height of a physical piece of type (the “em,”
  also called the “mutton” to reduce confusion) and half that
  height (the “en,” also called the “nut”). An em dash was
  originally as wide as the font is tall.
[1] https://leffcommunications.com/2021/03/10/a-brief-history-of...

> I've never seen them in business or personal writing, and I probably never will.

En dash is all over the place in personal/business writing, even just in email, thanks to Word and Outlook autocorrecting a hyphen to an en dash whenever it's between two spaces (rightfully in my opinion). If you've never seen it then that surely says more about what you notice than the content of what you've read.

That doesn't necessarily contradict your point – if you never notice the distinction then what's the point? But it's different from how I read the implication of your post.

(Funnily enough, without thinking, I put an en dash in the paragraph above by holding down on hyphen in the Android keyboard, and only caught myself after I did it.)
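The Word/Outlook behaviour described above can be approximated in a few lines. This is a minimal Python sketch that assumes the rule is only "a hyphen with a space on each side becomes an en dash"; Microsoft's actual AutoFormat logic is not described in this thread and likely covers more cases. The function name is hypothetical.

```python
import re

EN_DASH = "\u2013"  # EN DASH

def autocorrect_spaced_hyphen(text: str) -> str:
    """Replace a hyphen-minus that has a space on both sides with an en dash.

    Hyphens inside compounds ("well-known") have no surrounding spaces,
    so they are left alone.
    """
    return re.sub(r"(?<= )-(?= )", EN_DASH, text)

print(autocorrect_spaced_hyphen("fine - really, it's well-known"))
```

The lookbehind/lookahead keep the spaces themselves untouched, so only the hyphen character is swapped.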

>If you've never seen it then that surely says more about what you notice than the content of what you've read.

I'll agree with this. It also brings up the point, if punctuation isn't seen - is it useful? Probably not to me - maybe yes to others.

You might not be able to pick out the bassline in many of your favorite songs but that doesn't mean you wouldn't miss it were it not there.


"Spelling, grammar, and punctuation are a kind of magic; their purpose is to be invisible. If the sleight of hands works, we will not notice a comma or a quotation mark but will translate each instantly into a pause or an awareness of voice [...] When the mechanics are incorrectly used, the trick is revealed and the magic fails; the reader's focus is shifted from the story to its surface."

- Janet Burroway, Writing Fiction: A Guide to Narrative Craft

Thank you. It seems you misread the discussion.

Following this thread, the discussion isn't about the existence or absence of punctuation. The discussion is about the case of three specific punctuation marks, which appear extremely similar if not identical. These punctuation marks are being discussed after reading an article about their differences, which are only apparent to those among us who find memorization more important than clarity.

In this exact context, the question is whether all three punctuation marks are needed when literally none of them is distinctive enough as punctuation from the other two. If you read the comment to which I had replied, you will see them also make that point.

FYI this is a pretty condescending response to come back to. From the site guidelines:

> Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith.

Moving on from that.

> In this exact context, the question is whether all three punctuation marks are needed when literally none of them is distinctive enough as punctuation from the other two. If you read the comment to which I had replied, you will see them also make that point.

Indeed I did read it, I disagree.

Going back to your original comment, I don't think it's reasonable that you've never seen an em-dash in "business or personal writing", but I would totally accept that you haven't noticed the punctuation in those contexts. This is partly my point: if these marks are used correctly, then it makes sense you've never spotted them.

I'm saying that the people who read literature containing en and em dashes would notice the difference were they not there. I'd echo what another commenter said: these marks wouldn't be missed until they're gone but we would definitely miss them.

I wasn't expressing condescension, only pointing out that you would have had a better understanding of what was being discussed if you had read the thread. You should work at following the exact site guideline that you quoted, instead of making assumptions about my day-to-day communications and commenting about those non-existent communications.

> Going back to your original comment I don't think it's reasonable that you've never seen an em-dash in "business or personal writing" but I would totally accept that you haven't noticed the punctuation in those contexts. This is partly my point, if these marks are used correctly then it makes sense you've never spotted them.

What I was saying is that I don't see them in my day-to-day activities, which I don't. You are making assumptions about what type of communications I am involved with daily, and the types of people I communicate with. I communicate with cryptographers and security professionals who all use monospaced text. I also communicate with C-level people who barely need to use punctuation other than a period. They have mastered brevity and communicate exceedingly well.

It would make far more sense if you wrote, "I don't think it's reasonable that chownie has never seen an em-dash in business or personal writing."

The comment you're replying to makes perfect sense in exactly the context you describe, so your reply seems quite bizarre. Maybe you didn't read it properly? (See how unhelpful that tone is?)

For the analogy about not explicitly hearing the bassline, but the music still being affected by it: maybe you interpreted it just a little too directly? The analogue to removing the bassline is not the total removal of all three of those punctuation marks. Instead, it's the removal of the distinction between them. I think it was pretty clear (and apt).

Quoting Erik Spiekermann: “Typography is like air. We only notice it when it's bad”

> if you never notice the distinction then what's the point?

Given there is usage of en dash in the wild as you mentioned, there's a possibility this may be a case of "you don't know what you got 'til it's gone."

For someone who can quote Shakespeare [1] in a comment at the right time, you “…doth protest too much, methinks.”

[1] https://news.ycombinator.com/item?id=35086851

Ah, I am cut to the quick. In truth, one must sometimes be cruel to be kind. To one such as I, neither the hyphen nor the dash are a dish fit for the gods. In tragic travesty, it's all Greek to me. All that glitters isn't gold! [1]

[1] a bunch of Shakespeare's sayings scraped together, after they were trampled in a mosh pit.

> I don't care about endashes or emdashes. I've never seen them in business or personal writing, and I probably never will.

There’s an en dash in the first line of text on apple.com right now. There are en dashes, em dashes, and hyphens in the most recent press release on that site, all used correctly.

> This is way too much pedantry and hyper-hyphen-focus. Honestly, I don't care about endashes or emdashes. I've never seen them in business or personal writing, and I probably never will. They add nothing to anyone's communications.

You have definitely seen them. All professional writing outlets, like the New York Times, use em-dashes, curly quotes, and other “typographic” characters that one is supposed to use in American English.

And newspapers in my own country follow the typographical rules, even though no one uses them in informal communication on HN or FB. (Well, some on HN do.)

Except I didn't write that I hadn't seen them. I wrote: "I've never seen them in business or personal writing".

We can discuss that I chose the word "seen", when I meant "noticed", but there is no doubt that I didn't write what you intimated. I have seen the dashes in formal writing and in newspapers.

A too-hurried reading is worse than not reading at all.

Oh. So “business” does not encompass “professional writing”. Good to know.

And also not “formal writing”.

And presumably no copy-pasted message from Word or whatever other app inserts “smart”-whatever automatically.

And also not any regular old business website. (Did you think newspapers were the only ones? Just because those were the examples?)

Even for personal writing: some people even take the time to insert bullet points, so “proper” punctuation is easy for them.

You’re a fine one to complain about pedantry. (I guess yours is a just-right level of (cover your ass) pedantry.)

I was pointing out that I don't see them in my day to day activities.

You've added exactly nothing to this discussion, just written a personal attack founded on misunderstanding and make-believe.

IIRC, both of these are more or less true:


> They’re called that because those are the specific typography jargon words that refer to the height of a physical piece of type (the “em,” also called the “mutton” to reduce confusion) and half that height (the “en,” also called the “nut”).

An em was traditionally the width of an uppercase M and an en half that (around the width of an uppercase N). Nowadays, this relationship doesn't necessarily hold: one em is equal to the font size (e.g., a 12 pt font has one em = 12 pt).
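As a worked example of the modern definition only (the historical width claims aren't modeled), here is a small Python sketch. The point-to-pixel helper is an addition not mentioned in the thread; it relies on CSS defining 1in = 72pt = 96px. Function names are hypothetical.

```python
def em(font_size_pt: float) -> float:
    """Modern digital type: one em equals the font size."""
    return font_size_pt

def en(font_size_pt: float) -> float:
    """An en is half an em."""
    return font_size_pt / 2

def pt_to_css_px(pt: float) -> float:
    """CSS fixes 1in = 72pt = 96px, so one point is 96/72 = 4/3 CSS pixels."""
    return pt * 96 / 72

size = 12  # a 12 pt font
print(em(size), en(size), pt_to_css_px(em(size)))
```

So in a 12 pt font an em dash box is nominally 12 pt (16 CSS px) wide and an en dash box 6 pt, though actual glyph widths are up to the font designer.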

obsessing about mundane details like that provides certain kinds of people with a mild feeling of control over their lives.

With what kind of telepathy did you uncover this fact?

introspection and observation.


I'm against ped:antic-pun-ctu,ation)) not against pretty, practical and productive punctuation.

Ironically on a punctuation blog, it looks like he has a punctuation typo in his title. In the headline, the semi-colon after "hyphen" should actually be a colon. So the corrected headline is "En dash, em dash and hyphen: what’s the difference?"

A colon is used in this context, when you're introducing the question that follows.

This article is wrong all over, from where I stand. Who uses semi-colon in titles???

> Some people prefer the way a “space-en-dash-space” looks.

I think this isn’t just a matter of personal preference, but it’s also largely a cultural thing – in German, for example, the “space-en-dash-space” form is common.

This is true for a lot of other punctuation as well. For instance, in Germany, we quote „like this“ instead of “like this”, whereas in Switzerland or France it’s common to quote using guillemets, as in «Hello there!». This style can also be found in German texts, though it’s less common than quotation marks, and it would typically be used »inversely«.

> in Germany, we quote „like this“ instead of “like this”

This is also the traditional style in Dutch; it's what I was taught at school. These days many just use "upper quotes". You can still find the traditional style in books and some newspapers, but others have switched over the years.

In traditional Ethiopian you would use ፡ as a word separator, and ። as a full stop. Over time, people have started to "just" use the space as a word separator. There are some Wikipedia pages that mix both styles; for example on [1] you can see ፡ being used for the first three paragraphs and then it switches to a space. I rather like being able to see the evolution of language/typography on a single page.

[1]: https://am.wikipedia.org/wiki/%E1%8A%A0%E1%88%9B%E1%88%AD%E1...
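Any tool that tokenizes such text has to accept both styles on the same page. A minimal Python sketch, assuming ፡ (U+1361 ETHIOPIC WORDSPACE) and ordinary whitespace should count as equivalent word separators, and that ። (U+1362 ETHIOPIC FULL STOP) also ends a word; the function name is hypothetical and no other Ethiopic punctuation is handled.

```python
import re

ETHIOPIC_WORDSPACE = "\u1361"  # the traditional word separator
ETHIOPIC_FULL_STOP = "\u1362"  # the traditional full stop

# One or more of: any whitespace, the wordspace, or the full stop.
_SEPARATORS = re.compile("[\\s" + ETHIOPIC_WORDSPACE + ETHIOPIC_FULL_STOP + "]+")

def split_words(text: str) -> list[str]:
    """Split text written in either the traditional or the modern style."""
    return [part for part in _SEPARATORS.split(text) if part]

mixed = "one" + ETHIOPIC_WORDSPACE + "two" + ETHIOPIC_FULL_STOP + " three four" + ETHIOPIC_FULL_STOP
print(split_words(mixed))
```

Empty pieces (e.g. after a trailing full stop) are filtered out, so both styles yield the same word list.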

Interesting. I also see a few periods and a lot of colons with a line over them.

What do they mean? Just curious.

Comma, question mark, stuff like that. There's an overview at https://en.wikipedia.org/wiki/Amharic#Punctuation

Thanks. I wasn't aware of this type of script either; I like it. Sort of a "missing link" (of course there is no historical relationship) between Kana in Japanese and Hangul in Korean.

Since you mention France, it's worth noting that there, double punctuation marks (?:!;) are preceded by a half-space (although in practice it's always a full space). Likewise, guillemets are surrounded by spaces (the space inside the guillemets might be a half-space; I'm not entirely sure). So it would be « Hello there ! »

Easy way to identify francophone writers. They always have a space in front of their colons, exclamation marks, etc.

Ahhhhh, thanks for that! I'm a German speaker, and I must admit I questioned the intellectual capacity of some people I conversed with due to that. In German there is even a slur for it: "Deppenleerzeichen" (fool's whitespace). Now that clears things up.

Just a convention. I used to snigger at the English language convention of capitalising the next sentence in a letter/email after the address - after all, you're still in the same sentence, so why capitalise it. But, it's a conventional thing, so now I do it myself.

That says more about you, frankly.

Eastern Europeans often drop articles because that’s (apparently) what they do in some Slavic languages. That’s a minor second/third-language quirk, not about an intellectual deficiency (lack of capacity).

Of course, some extra whitespace is even more harmless.

I think most people have these biases in one form or another. It's mostly a matter of your experiences I've found. Rural dialects especially trip me up. I don't know a lot of them personally, and most of the ones I see on TV are talking about.. rural stuff. Also, I think a lot of rural people who go to universities naturally end up toning down their dialects because they tend to be in the minority. So it's kind of rarer to see an academic with a thick southern accent. It will often be less pronounced. On the other hand Eastern European English accents have the opposite effect because most of the Eastern Europeans I've seen speak at length are chess grandmasters and physicists.

The only thing we can really do is try to notice these biases in ourselves and ignore them as best we can.

I never heard about the "Deppen Leerzeichen" in the context of punctuation, but always when German texts split up compound words with a space for no reason.

I think that's the central meaning, but it's used for space before punctuation as well (just a random reference https://www.lass-andere-schreiben.de/blog/kategorie/schreibe...)

That's normally called https://de.wikipedia.org/wiki/Plenken.

An unbreakable half-space, to be pedantic (though in this case the pedantry makes sense: you don't want your punctuation mark to end up on the next line)
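That unbreakable half-space can be applied mechanically. A minimal Python sketch, assuming U+202F NARROW NO-BREAK SPACE is the half-space wanted before ? ! ; and : (some French style guides use a wider no-break space before the colon; this sketch does not distinguish). The function name is hypothetical, and it is deliberately naive: it would also mangle URLs such as `https://`.

```python
import re

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE

def french_spacing(text: str) -> str:
    """Put one narrow no-break space before each run of ? ! ; :

    An ordinary space (or an existing NNBSP) already in front of the mark is
    replaced, so running the function twice gives the same result.
    """
    return re.sub(r"[ \u202f]?([?!;:]+)", NNBSP + r"\1", text)

print(repr(french_spacing("Hello there !")))
```

Treating a run like `?!` as a single unit avoids inserting a space between the two marks.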

And for extra fun, while the French word for space (espace) is masculine gender (un espace) for most of its meanings, in typography, it's feminine (une espace).

Using an en-dash like this – you see – is the usual British style.

The unspaced em-dashes—like this—are typically American.

I consider it a crime not to have any spaces between em-dashes and adjacent words. Traditionally, I guess, there were spaces of different sizes: hair-thin spaces were typeset before and after em-dashes --- that's what I do in LaTeX using (\,). But because different-sized spaces have never been a thing on the Web, let alone in plain text, people have preferred to not use any spaces, for some reason.

HN normalizes thin and hair spaces to normal spaces, so they can't be demonstrated here, but there is an example on Wikipedia: https://en.wikipedia.org/wiki/Whitespace_character#Hair_spac...

Oh dang the hair spaces look perfect.

I wouldn't call it a crime, but a convention. In Europe it's an n-dash with surrounding spaces; in the US it's an m-dash without spaces. For me, the former is nicer, but crimes are maybe a tad more serious.

I think Medium uses hair spaces. And of course there’s some automation, since all writers seem to get that thing.

There ends my trivia about that unusable site.

This is precisely what I do religiously in my latex: M-dashes are always {\,---\,}

Unspaced em & en dashes tend to stay glued to the surrounding words when there should instead be "word" wrapping at one end or the other of the dash. It is a crime against text aesthetics. We have met the criminals, and they is us - software types.

Not to mention, ems and ens are not ASCII and thus not strictly kosher.
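In plain text, the closest analogue to the LaTeX `{\,---\,}` habit is a hair space (U+200A) on each side of the em dash (U+2014). It is thinner than LaTeX's `\,` thin space, and it is an ordinary break opportunity rather than a no-break space, so renderers may wrap at it. A minimal Python sketch; the function name is hypothetical, and as noted above HN itself normalizes these spaces away.

```python
import re

EM_DASH = "\u2014"
HAIR_SPACE = "\u200a"

def hair_space_em_dashes(text: str) -> str:
    """Surround every em dash with exactly one hair space on each side.

    Any ordinary spaces or hair spaces already next to the dash are replaced,
    so the function is idempotent. Newlines are left alone.
    """
    return re.sub(r"[ \u200a]*\u2014[ \u200a]*", HAIR_SPACE + EM_DASH + HAIR_SPACE, text)

print(repr(hair_space_em_dashes("dashes\u2014like this \u2014 are")))
```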

And BTW, all of these can be found on the new AZERTY keyboard :


(BÉPO version also exists)

That looks well thought out. I use a QWERTY layout with similar reasoning applied to the Option/AltGr levels (but entirely different in specific placements) and I routinely type various dashes and quotes without conscious thought, any more than I consciously think about Shift-level punctuation.

In Spain the RAE (the equivalent of the Oxford Dictionary) recommends «this», but you will almost never find it except in professional printing. They are not on the keyboard, so everybody uses "this".

It's a shame when technology fails us in this way - I just mean that computers are created to be our tools, and if we want to easily write «this», we can make that happen. If only we had people with this mindset (computers are our tools) in the right places.

I'm sure that's a ton of fun for anyone trying to write a natural language parser. LMAO using the end brackets as start brackets and vice versa.

> trying to write a natural language parser.

I assume we're done doing that, that task is finished ;)

I get that you mean ChatGPT has solved the problem, but it feels as if it's solved the problem without answering the deep questions. We still don't really get how the brain does it, or the answer to any of the deep linguistic questions; instead we get two systems capable of language which no one understands. But at least it's useful! So maybe there are natural language parsers yet to be written, if for nothing else than to finally test our understanding of natural language parsing.

Actually we quote „like this“.

There are many other quote styles - my language uses „these signs” (which we call "ghilimele", similarly to the French "guillemets").

EDIT: Seems HN is eating up the right signs... You can see them on Wikipedia here, they essentially look like two small commas: https://ro.wikipedia.org/wiki/Ghilimele

Oh, you quoted correctly, but the display of the right quotes is messed up. They should go from upper left bottom to upper right top, but instead show as upper left top to upper right bottom.

Yeah, so we could conclude that punctuation is not just a cultural thing, but – to make matters worse – depends on the whims of the font maker as well.

No, both “ and ” characters exist, as well as ".

“Convex” or „concave“ usage varies by language. See https://en.wikipedia.org/wiki/Quotation_mark#Summary_table
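The per-language choices mentioned in this thread fit in a small lookup table. A minimal Python sketch: the table is hypothetical and covers only primary quotes for the languages discussed here; the French entry assumes narrow no-break spaces inside the guillemets, which the thread itself was unsure about. The linked summary table is the place to check real conventions.

```python
# Primary quotation marks per language, as described in this thread.
QUOTES = {
    "en": ("\u201c", "\u201d"),              # English: “…”
    "de": ("\u201e", "\u201c"),              # German: „…“
    "ro": ("\u201e", "\u201d"),              # Romanian: „…”
    "no": ("\u00ab", "\u00bb"),              # Norwegian: «…»
    "fr": ("\u00ab\u202f", "\u202f\u00bb"),  # French: « … » (assumed NNBSP inside)
}

def quote(text: str, lang: str) -> str:
    """Wrap text in the primary quotation marks for the given language code."""
    opening, closing = QUOTES[lang]
    return f"{opening}{text}{closing}"

for lang in QUOTES:
    print(lang, quote("Hello there", lang))
```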

To clarify, I was referring to the mere technical fact that if you type in a character like `“` (U+201C, “Left Double Quotation Mark”) using one font, it isn’t guaranteed to be rendered in the exact same style in a different font.

E.g., when I type a comment on HN and enter said `“` in the input text field, it uses my system’s default monospace font (Courier), which renders the character so that the stroke appears to go from bottom left (thick) to top right (thin). After I submit my comment, HN uses Verdana (the one from my system), which renders the very same character so that the stroke appears to go from the top left (thick) to the bottom right (thin). It’s the same Unicode character, but both fonts happen to render them differently according to how the font maker laid out and mapped the respective characters. (I can observe the same behaviour when I compare both fonts in my word processor, so it’s not HN-specific.)
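Since fonts differ only in glyph design, the dependable way to check which character you actually have is its code point and Unicode name, which no font can change. A small Python sketch using the standard `unicodedata` module; the `describe` helper is hypothetical, and the same check works for the dashes this article is about.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return the code point and official Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# Quotes from this comment, plus the dashes under discussion.
for ch in ["\u201c", "\u201d", '"', "-", "\u2013", "\u2014", "\u2212"]:
    print(describe(ch))
```

For example, `describe("\u201c")` gives `U+201C LEFT DOUBLE QUOTATION MARK` regardless of whether Courier or Verdana draws it.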

“” look like 66 99 in conventional serif text fonts, but have wide variation in sans-serif and decorative fonts where they often resemble ‶″ or ″‶ .

„‟ are more consistent in current computer fonts by virtue of their Unicode names strongly suggesting a particular appearance.


I do often wonder whether we should maintain traditional typography when moving to a digital age, because punctuation evolves as language does. If we’ve deemed it unnecessary to have separate symbols for each of the dashes and everyone uses language that way, then that’s fine. We can also ask this question about smart quotes: you’ll notice I’ve been using U+2019 as the apostrophe here, and I could “quote” like this. It’s a question of how much ambiguity it causes, how easy it is to input, and how subjectively aesthetically pleasing it is.

My personal opinion for hyphens is:

- Ambiguity: most can be cleared up with spaces, and for examples like 3-8 if it’s numbers we can tell it’s a range from context

- Ease of input: one character is a lot easier to decide between than 3 (or 4 if you include minus), and if there are rules for software to be able to input the correct character every time then the differences in characters become redundant

- Subjective aesthetics: I quite like the consistent compactness of the single hyphen
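To make the "rules for software" point concrete, here is one such rule as a minimal Python sketch: a hyphen directly between two digits becomes an en dash. The function name is hypothetical, and the rule is deliberately naive; it also rewrites things that are not ranges, such as ISO dates, which illustrates why fully automatic selection is harder than it looks.

```python
import re

EN_DASH = "\u2013"

def en_dash_ranges(text: str) -> str:
    """Turn a hyphen between two digits into an en dash ("pages 3-8").

    Caveat: this also changes non-ranges like "2023-03-10" or phone numbers.
    """
    return re.sub(r"(?<=\d)-(?=\d)", EN_DASH, text)

print(en_dash_ranges("pages 3-8"))
print(en_dash_ranges("2023-03-10"))  # misfire: a date, not a range
```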

And for quotes:

- Ambiguity: They show when quotes start and end which is quite nice and we can have nested quotes. But these are things that are not critical to meaning and simply make it easier

- Ease of input: Usually automated but can absolutely tear through code if pasted in the wrong place. If we deem these smart quotes useful enough then they can coexist with typewriter quotes peacefully if we do not run the quote formatting on code blocks (which is where code should be anyway)

- Subjective aesthetics: I do like the look of smart quotes but would be willing to use straight quotes
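When smart quotes do end up pasted into code, undoing them is a one-line translation. A minimal Python sketch using `str.maketrans` and `str.translate`; the helper name is hypothetical, and it only handles the four common curly quotes (not primes, guillemets, or „low-9“ marks).

```python
# Map typographic ("smart") quotes back to their typewriter equivalents.
STRAIGHTEN = str.maketrans({
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK (also the typographic apostrophe)
})

def straighten_quotes(code: str) -> str:
    """Replace curly quotes with straight ones so pasted code parses again."""
    return code.translate(STRAIGHTEN)

print(straighten_quotes("print(\u201chello\u201d)"))
```

Note the mapping is lossy by design: U+2019 used as an apostrophe in prose also becomes `'`, which is why it belongs only on code, as the comment above suggests.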

The pragmatic thing is to stay glued to the typewriter and then escape our nested strings with Unix toothpicks everywhere.

> Ambiguity: They show when quotes start and end which is quite nice and we can have nested quotes. But these are things that are not critical to meaning and simply make it easier

Typographic conventions go further than that.

In Norwegian it’s `«»` for one level of nesting. For nested quotes you are supposed to use something else. Maybe `‘’` (single quotes) for the second level and then `“”` (American English double quotes).

Maybe American English uses `“”` and then `‘’`.

In my opinion that’s not necessary. At least for text storage.

Part of my complaint about that is that although I think the different punctuation marks are great, using them is a pain because of keyboard layouts.

It's easy to find a hyphen (or something close enough) on your physical keyboard, but there's no em dash. OSes also make it a pain to automate even when they claim otherwise.

I go out of my way to use em dashes but do I think others would? No way. So is lack of use because of lack of utility or because of idiosyncrasies in keyboards?

Hyphens are great for some things but are too short to visually offset text.

The Mac layouts handle the dashes well in my opinion (quotes not so much). Option+‘-’ is ‘–’ (en dash), Option+Shift+‘-’ is ‘—’ (em dash). Option is equivalent to AltGr in the Windows PC world.

What about using tilde for numeric ranges?

"The global conflict spanning the years 1939~1945 is known as World War 2..."

Tilde is already used for approximation though.

The sentence as you wrote it could be misinterpreted as "the conflict spanning the years 1939 to ca. 1945...".

Had you used a dash/hyphen/minus/whatever nobody would be likely to misinterpret that as "the conflict spanning the years minus six..."

No, ≈ is used for approximation, ~ is just the most similar ASCII character, and it became ingrained by people used to using old computers. Just like * is not a multiplication sign, but × is.

In other words, tilde is used for approximation just like the asterisk is used for multiplication and “literally” is used figuratively. We can argue over those uses being correct or incorrect, but they are used like that.

Thus I agree that using tilde for numeric ranges would be confusing. Might as well just use a hyphen, which is easier to type and most people won’t notice the difference from the correct character (en-dash).

> but they are used like that.

Using that form of reasoning, it could be claimed that, say, “espresso” is pronounced “expresso”, because some people do pronounce it like that.

But that would be disingenuous, since “is pronounced” does not generally mean “is sometimes, by some people, pronounced”, but “is supposed to be pronounced” or “is properly pronounced”. The same goes for “tilde is used for approximation”; no, it isn’t. It would be different if scbrg had written “tilde is sometimes used for approximation”; that would have indicated the first meaning, and not the second.

> It would be different if scbrg had written “tilde is sometimes used for approximation”;

Oh, dear lord. I apologize for leaving out this very important word. I thought it was fairly clear that I didn't mean it was the only symbol used for approximation, pretty much like how, I don't know... nothing is the only thing used for anything.

Whatever phrase, symbol, word or tool in general you find, you can be fairly certain that there's something else that could be used instead.

In the really real world, people tend to use the symbols that are easy to type with their keyboards. Ironically, this is a bit like what TFA complains about; people always use the hyphen that's available with one keystroke when in fact they "should" (for some arbitrary value of "should") use a handful of different ones. And they use tilde for approximation, because nobody knows how to type a fucking ≈. You'll also note that they use " when they "should" have used “, ” or any of the umpteen other variants of quotation marks.

When it comes to ambiguity, which was what this subthread was about, how things are often used is actually quite important. Because, you know, it's what people actually write that you have to disambiguate, not what they should have written.

OK, fair enough. I was recently on the other side of that same argument here on HN: https://news.ycombinator.com/item?id=34940715

No, they were right—languages change and a single tilde (~) definitely means approximately: https://en.m.wikipedia.org/wiki/Tilde

Most people associate a double tilde with “approximately equal”.

The asterisk is an approximation of the dot, not a replacement for ×. They just mean the same thing for scalars.

Using × or ∙ for multiplication is, IIUC, a cultural differentiator – just like English uses . as a decimal separator, but many Europeans use , for the same purpose. But in Unicode, × is “MULTIPLICATION SIGN” and ∙ is “BULLET OPERATOR”, and * is more visually similar to × than ∙, so I assume that’s where it originates.
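For anyone who wants to check those Unicode names (and the tilde variants mentioned in this subthread), Python's standard `unicodedata` module reports each character's code point and official name. A minimal sketch; the symbol list is just the characters discussed here:

```python
import unicodedata

# Multiplication and approximation signs discussed in this subthread.
SYMBOLS = ["*", "×", "∙", "~", "≈", "⁓"]

def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for ch in SYMBOLS:
    print(ch, describe(ch))
```

This prints, for example, `× U+00D7 MULTIPLICATION SIGN` and `∙ U+2219 BULLET OPERATOR`, matching the names quoted above.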

At this point I actually handwrite an asterisk to denote multiplication. If I think about it I know it's "wrong", but I do it anyways.

¿But which tilde? I'm a fan of typographical abuse of the ⁓ swung dash myself.

Ohh yes, reducing the number of tasks the hyphen is used for helps as well.

It is easy to say this doesn’t matter, and personally, I couldn’t care less which is used. However, professionally, I have twice in the past two months had to deal with text that was edited by a line editor for my organisation, who strongly criticised our use of these punctuation marks.

And, after much cursing, and my team spending time changing the text, I reflected, and came to like those punctuation markers. Took me a long time, but I have been converted.

in 180 characters: https://twitter.com/swyx/status/1344127570753646593

A guide to the 3 dashes in English:

Hyphens (-) are compound-words.

En dashes (⌥ -) connect beginning–ending.

Em dashes (⌥⇧-) can replace parentheses and colons — use them more!

Also on Windows: http://wincompose.info/

Of the three major OSes, Linux has the most intuitive way of producing special characters with its compose keys

To my surprise, no spaces around mdash is the general recommendation.

Moving through text using Cmd + Left/Right arrow will jump over two words if there’s an em dash between them with no spaces. As a frequent em dash user that was very annoying, so I switched to adding spaces — to hell with the APA.

True; depends on the style guide, but most opt in that direction. The Chicago Manual of Style is very firmly in favor of it.

It's more a matter of continent than style (ie, the former can explain most of the variance).

The alternative is usually en-dash with spaces.

> ⌥⇧-

What is this?

macOS shortcuts. Option-Shift-minus

In Polish em dash is supposed to be surrounded with spaces. I got that rule ingrained in my subconscious so heavily that I feel very uneasy looking at em dashes without spaces even in English. Same way if someone didn’t put a space after a full stop. So I’ve decided to go the British way and use en dash surrounded with spaces. And, after doing that, em dash really feels way too long. :)
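Converting existing text to the spaced-en-dash style described above is a one-line substitution. A minimal Python sketch, assuming every em dash in the input should become a spaced en dash (in practice you would probably want to review each one):

```python
import re

EM_DASH = "\u2014"
EN_DASH = "\u2013"

def to_spaced_en_dash(text: str) -> str:
    """Rewrite em dashes, spaced or not, as en dashes with one space on each side."""
    return re.sub(rf"\s*{EM_DASH}\s*", f" {EN_DASH} ", text)

print(to_spaced_en_dash("Punctuation matters\u2014even dashes."))
```

Absorbing any surrounding whitespace in the pattern means already-spaced em dashes do not end up with double spaces.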

"Do the first two look the same to you? It’s because some devices display them inconsistently, when the characters sit all by themselves."

And also because this article uses an en dash in the table in place of a hyphen.

Interestingly, if I copy that first character in the table early enough in the page load, it's a hyphen. If I copy it later, it's an en dash. Considering that this article is from 2010, I assume there's some JS added in the last 12 or so years that's autoconverting it.

EDIT: Wayback confirms it's supposed to be a hyphen: https://web.archive.org/web/20120120121527/http://www.punctu...

It’s the server (probably WordPress’s fault), not JS. &#8211; is an en dash:

    $ curl -s https://www.punctuationmatters.com/en-dash-em-dash-hyphen/ | grep -A6 '<h3>What'
    <h3>What do they look like?</h3>
    <table style="height: 139px;" width="289">
    <td><strong> &#8211;</strong></td>
    <td><strong> hyphen </strong></td>
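The same check can be sketched in Python without `curl`: `html.unescape` turns `&#8211;` into the actual character, and `unicodedata` names it. The sample strings are assumptions for illustration: the first is the table cell shown above, while `&#45;` (ASCII hyphen-minus) and `&#8208;` (Unicode HYPHEN) are added for comparison.

```python
import html
import unicodedata

# Table cell from the page source above, plus two entities for comparison.
cells = ["<strong> &#8211;</strong>", "&#45;", "&#8208;"]

def dash_names(markup: str) -> list[str]:
    """Decode HTML entities and name every character in the 'Pd' (dash punctuation) category."""
    text = html.unescape(markup)
    return [unicodedata.name(ch) for ch in text
            if unicodedata.category(ch) == "Pd"]

for cell in cells:
    print(cell, "->", dash_names(cell))
```

Note that U+2212 MINUS SIGN is a math symbol (category `Sm`), so this filter deliberately ignores it.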

I guess they respected their own recommendations: "when you're trying to illustrate what a hyphen looks like" was not one of the recommended uses of a hyphen!

I should also note that this whole point seems at best a point for typography geeks. These are three almost identical marks that have very similar uses. I am completely convinced that no one has ever disambiguated a phrase by noticing that something is a hyphen and not an en-dash or vice-versa.

For a somewhat more advanced (and IMHO much more beautifully typeset) but still succinct overview of em dash (and some other dashes) in practical use, see https://twos.dev/dashes.html.

Suitable for those who are familiar with punctuation basics but may want a refresher, and AFAICT it gets some things more correct (e.g., the numbers in a range are generally separated by a figure dash, not an en dash).
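Because these characters are hard to tell apart on screen, it can help to list the dash family by code point. A small Python sketch; the character selection and the hypothetical `format_range` helper are illustrative, not taken from either guide, and which dash a given style guide wants for ranges varies:

```python
import unicodedata

# Dash-like characters that are easily confused on screen.
DASHES = {
    "hyphen-minus": "\u002d",
    "hyphen": "\u2010",
    "figure dash": "\u2012",
    "en dash": "\u2013",
    "em dash": "\u2014",
    "minus sign": "\u2212",
}

def format_range(start: int, end: int, dash: str = DASHES["en dash"]) -> str:
    """Join two numbers with the chosen dash, e.g. for page or year ranges."""
    return f"{start}{dash}{end}"

for label, ch in DASHES.items():
    print(f"{label:13} U+{ord(ch):04X} {unicodedata.name(ch)}")

print(format_range(1939, 1945))                         # en dash
print(format_range(1939, 1945, DASHES["figure dash"]))  # figure dash
```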

It ... somewhat ... saddens me that HN's parser doesn't distinguish these as Markdown-based comment systems do:

Hyphen: -

En dash: --

Em dash: ---
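The `--`/`---` convention described above (LaTeX and Pandoc's `smart` extension use the same one) can be approximated with two regular expressions. A minimal Python sketch, assuming plain text; real Markdown renderers also leave code spans and code blocks alone, which this does not:

```python
import re

EN_DASH = "\u2013"
EM_DASH = "\u2014"

def smarten_dashes(text: str) -> str:
    """Replace exactly three hyphens with an em dash and exactly two with an en dash.

    The em dash pass runs first; otherwise '---' would become an en dash
    followed by a stray hyphen. Runs of four or more hyphens are left untouched.
    """
    text = re.sub(r"(?<!-)---(?!-)", EM_DASH, text)
    text = re.sub(r"(?<!-)--(?!-)", EN_DASH, text)
    return text

print(smarten_dashes("pages 3--8, a pause --- like this, and a well-known rule"))
```

Single hyphens, as in "well-known", pass through unchanged.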

On usage --- I find the practice of using the em-dash without bounding spaces (typical of most modern style-guides) is visually distracting and more difficult to read than when spaces are provided around the punctuation (as I've done here, and my stylometric stalkers may file as a personal identification tell).

And finally:

- Hyphenated.

- Non-hyphenated.

There is no justice.

Although that allows “hyphenated-and-autological”, which is very useful under some circumstances and in frontend.

Can we at least get all the people making "no one" into one word ("noone", which drives me crazy) to hyphenate it?

Or does no-one care but me?

For a long time, I thought there was actually a word "noone" pronounced with an "oo" sound like "noon". You know, like no one says "whom" anymore but you still see it written.

I'm just going to leave this here for you https://en.wiktionary.org/wiki/firstable

I dislike that alot.

Literally noöne

Powergen Italia, anyone?
