m dash: --
n dash: -.
.. I take it few people find Morse code puns funny anymore.
Seriously, what's the point of this pedantry? What does having 3 basically identical characters add to the language other than a set of pointless rules for insufferable pedants to power-trip over? We've all been using - just fine. On what basis does the person writing this article believe these rules matter, are important, disambiguate language?
Call me a hopeless philistine, but I say down with the dash. One symbol is fine for word-compounding, numerical ranges, subtraction, and mid-word line breaks. No one needs an en dash to tell them pages 3-8 is not a compound word.
By the same logic you might as well say:
"why are we even kerning fonts, who cares if there's a few gaps when i write »irl«."
The fact that using different dashes does encode meaning in a subtle sense does have relevance for semantics -- but that's, imho, almost secondary to this argument, as it's not as grammatically relevant as commas and periods, for example.
The primary importance of using the correct dashes is that it preserves a good flow for reading and is paramount to micro-typographic balance:
- A longer dash to link words that belong together is visually perceived as an interruption and doesn't feel like those two words are one
- In reverse, a shorter dash when switching context -- or interjecting another idea within a sentence -- doesn't slow the pace of the text flow enough, and your brain will read/intonate it the same way as when linking words.
- And lastly, neither of them preserves optical balance when displaying a numerical range, as numbers are wider than a hyphen but narrower than an em space, which would result in either insufficient visual separation compared to the spaces following said numbers, or too much of an optical gap within an entity that belongs together.
That's the bare-bones set of dashes that are relevant for a balanced typographical appearance, not made-up pedantic complexity to annoy people. Otherwise we'd be talking about half and quarter em dashes and the like.
These are typesetters' concerns, not writers' concerns. They are all context-sensitive tweaks to what amounts to the same glyph.
If the rules for each have as well defined contexts as the article suggests, then it sounds like something more suited to ligatures and kerning.
Full glyph replacement ligatures were not something initially supported by all font formats, so perhaps the fact that they continue to exist as separate characters is more of a historical detail. It's something that could easily be added with new fonts, though.
From a software "separation of concerns" viewpoint, it feels wrong to me to have your font renderer infer meaning. A pre-processor that replaces hyphens with the correct dash – like Word does – feels saner to me.
Anyone can use a hyphen for all three purposes right now and people would understand the meaning, because the meaning is primarily derived from the context of surrounding glyphs. Only typesetters would complain that a subtly more appropriate glyph should be used for the purposes of refined optics and geometry etc.
Therefore an en dash and em dash ligature could only change the meaning IF the contexts of the two use cases overlapped, i.e. if there were a valid glyph-based context in which both the en dash and the em dash are valid... which I don't think there is, because that would be far too subtle.
My favorite is that it will render 1920x1080, for example, as 1920×1080. I think the former looks terrible and unprofessional, especially when I see it in actual products rather than prose. So I really hope this catches on.
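The same substitution could also be done as a plain-text pre-processing step rather than in the font. A minimal Python sketch, assuming an "x" between two whole numbers always means "by"; the function name is hypothetical, and requiring the left number not to start with 0 is an added guard so that hex literals such as 0x10 are left alone:

```python
import re

MULTIPLICATION_SIGN = "\u00d7"  # ×

# A whole number not starting with 0, an "x", then another whole number.
# The leading [1-9] is an assumption meant to skip hex literals like 0x10.
DIMENSIONS = re.compile(r"\b([1-9]\d*)x(\d+)\b")

def to_multiplication_sign(text: str) -> str:
    """Replace 'x' with the multiplication sign in tokens such as 1920x1080."""
    return DIMENSIONS.sub(
        lambda m: f"{m.group(1)}{MULTIPLICATION_SIGN}{m.group(2)}", text
    )

print(to_multiplication_sign("Output: 1920x1080, register 0x10"))
# prints: Output: 1920×1080, register 0x10
```

Chained dimensions like 3x3x3 are not matched by this pattern at all, which is one of the many edge cases a real implementation would have to decide on.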
It may not be entirely irrelevant, but it’s very close to it. A bit like saying your tie has to be knotted a specific way to look respectable. Very fun for the in-group, but completely incomprehensible to those outside.
Like, I’m not opposed to having a few silly things to learn just to separate those that can be bothered to pay attention from those that do not, but I’d be hard pressed to say it’s actually relevant outside of that.
That is an extremely low bar. People lived their entire lives without X for any X.
You started by talking about kerning fonts, which is a great analogy.
Building on that - kerning is awesome because stuff looks better and I don't need to do anything for it to happen. Would it work to have my display system figure out which type of dash to use automatically?
Like, a dash inside a word should be short (under the assumption that you're linking the words together) and dashes with whitespace around it should be longer (under the assumption that you're switching context/injecting an idea into a sentence).
That said, I don't use en dashes; if I want my numbers to line up, I use a fixed-width font.
Works just fine.
Comma, not semicolon, is the usual alternative to em-dashes for setting off asides; semicolons set off independent clauses.
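The heuristic suggested above (a dash inside a word stays short, a dash with whitespace around it becomes long) is simple enough to sketch. A minimal Python version, assuming plain-text input; the function and pattern names are hypothetical, and it implements only those two rules:

```python
import re

EM_DASH = "\u2014"  # —

# One or two hyphens with whitespace on both sides, e.g. "word - word" or
# "word -- word". Hyphens touching a letter or digit are not matched.
SPACED_DASH = re.compile(r"(?<=\s)-{1,2}(?=\s)")

def choose_dashes(text: str) -> str:
    """Leave word-internal hyphens alone; turn spaced ones into em dashes."""
    return SPACED_DASH.sub(EM_DASH, text)

print(choose_dashes("A well-known idea - or so I thought -- still holds."))
# prints: A well-known idea — or so I thought — still holds.
```

The spaces are kept as typed. The rule also misfires on spaced subtraction such as "10 - 3" and leaves ranges like "3-8" as plain hyphens, so it covers the two cases described and nothing more.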
This character ‘—’ looks like one long dash to me, even though I typed the dash button twice. What’s even crazier is that if I type four dashes ‘——’ it still looks like one even longer dash; even six ‘———’ is a solid line, and I can delete it by pressing the backspace button once.
I have no idea how to get my phone to display two short dashes side by side: ‘--’. Maybe I can fake it by putting an emoji in between, knowing that Hacker News will filter it out. Let’s see what happens.
Edit - ooh that totally works. I’d never really paid attention to how this feature worked before.
“--” can also be input on iOS by entering “- -” and then deleting the space. Or you can disable the “Smart punctuation” setting and type what you actually mean.
I also use capitalization and punctuation when I type while many people do not. It'd be great if they did, since it makes reading easier and takes almost no additional effort, but I'm not going to let it ruin my day. The parent comment is about why the distinction in dashes matters and has virtually nothing to do with typography enthusiasm, but rather reader comprehension. If you don't want to integrate that information into your life, great, but that's not really a refutation. For my part, I found it interesting. Even though I use em-dashes I learned more about how they're helpful. If you don't want to use them, I'm almost positive no one is ever going to correct you.
I don't, but dumb tools replace -- with an em-dash, which breaks shit.
> If you don't want to use them, I'm almost positive no one is ever going to correct you.
Still have to look at it and suffer the consequences of dumb editors replacing -- with em-dashes when someone innocently just tries to write a command-line parameter.
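One way an editor could avoid that, sketched in Python under the assumption that a double hyphen at the start of a word, followed directly by a letter or digit, is a command-line flag. The names are hypothetical, and real smart-punctuation features may use quite different rules:

```python
import re

EM_DASH = "\u2014"  # —
DOUBLE_HYPHEN = re.compile(r"--")

def _replace(match: re.Match) -> str:
    text, start, end = match.string, match.start(), match.end()
    starts_word = start == 0 or text[start - 1].isspace()
    followed_by_word = end < len(text) and text[end].isalnum()
    if starts_word and followed_by_word:
        return match.group(0)  # looks like --flag: keep it verbatim
    return EM_DASH

def smart_dashes(text: str) -> str:
    """Turn '--' into an em dash unless it looks like a command-line flag."""
    return DOUBLE_HYPHEN.sub(_replace, text)

print(smart_dashes("Run ls --all -- it lists dotfiles too."))
# prints: Run ls --all — it lists dotfiles too.
```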
Looks good: “Sometimes writing for money—rather than for art or pleasure—is really quite enjoyable.”
Unreadable: “Sometimes writing for money-rather than for art or pleasure-is really quite enjoyable.”
Yes, punctuation does matter.
(In French the em-dash is almost nonexistent; we usually use parentheses instead.)
“Sometimes writing for money - rather than for art or pleasure - is really quite enjoyable.”
Personally, I think two hyphens also look better than just one, and they convey that you really intended an em dash rather than a hyphen.
Hyphens are simply for connected-words while dashes are -- for better or worse -- to make asides.
It's context-dependent. (Aside: you wouldn't write "context--dependent", which is the use case of the hyphen.)
Ostensibly the en dash is primarily used for ranges, although that's a case where I'm inconsistent. I won't typically write "A - Z" or the technically correct "A–Z", as I think in that case I tend to write "A-Z", using a simple hyphen. I certainly won't write "A -- Z".
The em dash is even wider—it's not typically mistaken for a hyphen.
Using just one dash for everything will be readable in a text message or comment. But not in a (complicated) book, because there the benefit of these small things gets multiplied by the scale of the book.
IMHO, this is the main test for when I decide to use em-dashes: is the text between them an aside of some kind? An alternative would be to use parentheses.
Personally, I do not find " - ", as the GP suggests, as strong a visual cue as "—". And on macOS, typing the different dashes is fairly straightforward:
* hyphen: the key next to zero, "-"
* en-dash: alt/option-"-": –
* em-dash: shift-alt/option-"-": —
Some apps (e.g. Mail) auto-convert double-"-" into an em-dash as well.
On Linux, one needs to enable the Compose key (in the keyboard layout settings). After that, you get default sequences like Compose --. for an en-dash and Compose --- for an em-dash.
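Whichever input method is used, each dash is its own Unicode code point rather than a styled hyphen. A short Python sketch using the standard unicodedata module to check what a given character actually is; describe is a hypothetical helper name, and the list adds two further look-alikes, U+2010 HYPHEN and U+2212 MINUS SIGN, purely for comparison:

```python
import unicodedata

# Look-alike characters: hyphen-minus (the keyboard key), the dedicated
# hyphen, en dash, em dash, and the mathematical minus sign.
DASH_LIKE = ["-", "\u2010", "\u2013", "\u2014", "\u2212"]

def describe(ch: str) -> str:
    """Return the code point, the character itself, and its Unicode name."""
    return f"U+{ord(ch):04X}  {ch}  {unicodedata.name(ch)}"

for ch in DASH_LIKE:
    print(describe(ch))
```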
> “Sometimes writing for money - rather than for art or pleasure - is really quite enjoyable.”
To me this looks like a cryptic-case of the corrective comma.
“Sometimes writing for money, rather than for art or pleasure, is really quite enjoyable.”
In other words, a reader should be able to skip reading the contents of parentheses with negligible impact on the context or meaning of the sentence. They should be able to skip reading the contents of em-dash-separated text without changing the meaning of the sentence. And text between commas should be considered integral to the sentence, while secondary to the primary gist.
Sometimes writing for money, rather than for art or pleasure, is really quite enjoyable.
Other than that, many people have come up with many writing styles. We mostly seem to be able to understand each other, so we are "all good".
Now, anyone typing random texts to a friend or a few need not care, but I think people that write in a professional capacity to more than a few people should know and care.
“Sometimes writing for money – rather than for art or pleasure – is really quite enjoyable.”
The only French-speaking place I've seen em-dashes used in daily life was Québec. For some (good) reason, the administration there seems to take a lot of care to use correct typography. My voting district, for example, was Mercier–Hochelaga-Maisonneuve (the first dash being an en-dash and the second one a hyphen), and I was always amazed at how all communication actually used these two different dashes.
I can't imagine this level of care in French or Belgian official communication.
Punctuation matters, but space -- the "zeroth punctuation mark" -- matters more!
The author does discuss spacing the dashes but is, given the overall point of the article, surprisingly noncommittal.
How I learned to write the unreadable one: “Sometimes writing for money -rather than for art or pleasure- is really quite enjoyable.”
According to the teacher I learned it from, this was a standard way of punctuating on a typewriter.
All possible and dealt with in under a second, but in the first example with the longer dash my brain recognises a parenthesis and I take a little "breath pause" before carrying on.
"Sometimes writing for money, rather than for art or pleasure, is really quite enjoyable."
In your head, do you read those differently?
“Sometimes writing for money, rather than for art, or pleasure, is really quite enjoyable.”
– this seems awkward to me. This version, though:
“Sometimes writing for money—rather than for art, or pleasure—is really quite enjoyable.”
Isn’t that more fluid?
I think that changes the meaning, since it’s now a list of 3 items with an Oxford comma, rather than two lists, with the first list having 1 item, and the second list having 2 items. And I’m having a rough time even making sense of such revised meaning.
Expressed as pseudo-code, I read the original intent of that sentence as:
“money and not(art or pleasure) == enjoyable”
and that can be broken into
“((money and not art) and (money and not pleasure)) == enjoyable”
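A truth-table check of that expansion, written as a small Python sketch (the function names are hypothetical). By De Morgan's law, not (art or pleasure) equals (not art) and (not pleasure), so the two halves have to be joined with "and"; joining them with "or" gives a different expression:

```python
from itertools import product

def original(money, art, pleasure):
    return money and not (art or pleasure)

def with_and(money, art, pleasure):
    return (money and not art) and (money and not pleasure)

def with_or(money, art, pleasure):
    return (money and not art) or (money and not pleasure)

# All 8 combinations of (money, art, pleasure).
rows = list(product([False, True], repeat=3))

# The "and" expansion agrees with the original on every row.
assert all(original(*r) == with_and(*r) for r in rows)

# The "or" expansion disagrees whenever exactly one of art/pleasure is true.
mismatches = [r for r in rows if original(*r) != with_or(*r)]
print(mismatches)  # [(True, False, True), (True, True, False)]
```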
French uses em-dashes ("tiret cadratin") or en-dashes ("tiret semi-cadratin") for dialogue. Like so:
– Bonjour, dit-elle, comment allez-vous? (“Hello,” she said, “how are you?”)
– Bonsoir, répondit-on. Ça va ça vient, et vous? (“Good evening,” someone replied. “So-so, and you?”)
A very common example is machine screw threads, e.g., 1/4-20. This is not a range of numbers spanning from 0.25 to 20.0, but rather a pair of numbers that define two metrics of a single thing (diameter and threads per inch), which combine to uniquely identify the thread.
Perhaps context is sufficient, but adding this to your examples gives us at least three scenarios where the single symbol would mean very different things with pairs of numbers: compounding, subtraction, and numerical ranges. If we add on the clause separation duties of the dashes mentioned in the article, we have four uses where a single symbol sits between two numbers and means entirely different things.
Compounding and numerical ops are basically never confused. Machine screw is the only one of these where it's even plausible. Not that subtraction and range are ever ambiguous, but if they were, just use "#1 - #n" to denote "the numbers 1 through n being used as labels for some range of options, not as numerical values".
All in all, we have plenty of characters. A minimal set of rules, a minimal set of characters, rich in predictable patterns, is what makes for a good language. The existence of a whole slew of specialized characters, all basically indistinguishable and frankly unheard of to most, has to work hard to justify its right to live on my keyboard. We have parentheses, commas, colons both full and partial, brackets square and curvy, braces, slashes forward and back... More than enough permutations and code space for anyone's expressive needs. Why anyone would opt for more byzantine characters with more rules on top is beyond my imagination.
While we are at it, we have so many words. Perhaps we should simplify to one of the several published standards of simplified English. After all, the number of combinations of a thousand words in sentences of arbitrary length is enormous. Why anyone would opt for more byzantine words with more nuanced definitions and rich history of usage, tradition, and cultural value is beyond me.
We could go on with grammar (I mean really, what the hell is pluperfect), spelling ('c', for example, is useless on its own, its uses being filled alternately by k or s), fonts (wtf is a serif), capital and lowercase letters, and I am sure many other topics.
Why do we keep more words, punctuation, and other linguistic and typographical devices around than we need? A mix of inertia and legitimate uses and perceived value. It seems to me that many people seem to draw a line between what is acceptable and what is not based on whatever they are comfortable and familiar with by the time they reach the end of their schooling.
Parenthetical-type grammar with an explicit start character and end character is pivotal for encoding information unambiguously. You can't replicate that with any system that uses the same characters for the start and end, because it would be ambiguous whether you are starting a nested context or ending the present one. Double, single, and even the rare triple quote allow for nested quotation. In principle, a clean open and close quotation mark would also solve this (no subtle pixel hunting). You're right that we don't truly need four redundant variations on bracketing, but reducing it to just one is probably too few, as it would be representing too many possible things at once. How about one pair for a narrative context (aka a quote), one pair for linguistic recursion (like I'm doing right now), and one pair for collections of objects such as a list or a set? Colons probably could be skipped; everything beyond that is strawmanning me. A certain small number of delimiters / particles / whatever are needed to have expressive completeness. You need to be able to build sequential lists, unordered lists, one-of-several possibility sets, and / or / not type relations. In other words, a natural language at the very least needs some sort of regex subsystem, but it need not be much more sophisticated than regex. I'm not a grammar denialist; in fact, quite the opposite. I want the information encoded in simple grammar rules, not in ad hoc, continually expanding tables.
I say this as someone who had a 12th-grade vocabulary in 5th grade, and it's only gone up since: vocabulary is a waste of time.
Actually, I'm almost with you on 'c', but I'd rather throw out 'k' because it's one of the few that don't fit on a 7-segment display. Capital letters also don't add much information. Yes actually, I'm fine with all of those going away. I couldn't tell you why the people who design wayfinding signage avoid serifs like the pox, yet other design fields refuse to read without them. With or without seems to read just fine. I really don't care too much either way. Letters would be better if they all worked more like EFHLT. Right now, there are too many clashing elements. Some are boxy, some are round, some have sharp diagonals. I'm not saying it has to be a 7-segment design, but it would certainly be pleasing if learning the alphabet, its ordering, and how to write it could all happen much faster by just noticing a few easy repeating patterns. Yes actually, let's do language reform.
>It seems to me that many people seem to draw a line between what is acceptable and what is not based on whatever they are comfortable and familiar with by the time they reach the end of their schooling.
Well, I'll agree with you there. All too often pointless pedantry comes down to "my school must be right, otherwise I am wrong". Love or hate my reasoning, at least you can't accuse me of doing that.
You argue against multiple types of dashes because context is sufficient, despite there being typographical ambiguity. But you insist that we must have typographically unambiguous bracket characters. I must admit that I am struggling in this conversation to determine when we can depend on context and when we need unambiguous markers. Perhaps I am just incapable of picking up on the subtle context that backs up this position of yours. (:
> everything beyond that is strawmanning me
In fact, you will find examples of real human languages that exhibit more extreme versions of the things I have suggested.
FOREXAMPLELATINWASORIGINALLYWRITTENINASINGLECASEWITHNOSPACESBETWEENWORDS SENTENCESWERESEPARATEDBYASINGLESPACE OBVIOUSLYALLOFTHEPUNCTUATIONISUNNECESSARY SOALLARGUMENTSABOUTTYPOGRAPHYOTHERTHANTHATOFFONTSAREBASEDINREALITY
There are languages with simpler tense systems than what English has. Slavic languages, for example, tend not to have a pluperfect. So, the example of removing tenses is based in reality.
Hawaiian has an alphabet of just 13 letters. So, removing letters from the 26 in the English alphabet is based in reality.
The Dictionnaire de l'Académie française is being updated to its 9th edition and is expected to have ~60K words, whereas English dictionaries report an order of magnitude more (even with the issues in the linked source, this is a large gap). Basic English has a vocabulary of less than 1,000 words (if you desire a vast overhaul of the existing norms of typography, I hope that you are at least willing to entertain prior art in the area of overhauling the use of natural language as a valid example, even if you disagree with the intention or outcome). If you wanted me to go to extremes (which again, I did not in the post you replied to), I could have just suggested we use Toki Pona. Of course, if I did suggest such a conlang, you may have been correct that I was strawmanning you and going to extremes just for a point. Nevertheless, we can definitely conclude that there are, in fact, natural human languages with substantially fewer words than modern English, and there are definitely constructed and artificially restricted natural languages with enormously smaller vocabularies.
You need not agree that these examples constitute best practice, or that they represent desirable goals in the continued evolution of language and written communication. I hope, though, that you can recognize that none of these are strawmen, but based in reality, many in natural languages, and some in artificially constrained natural languages for specific purposes. If anything, I presented examples that do not represent the extremes of any position (I could easily have brought up languages with no written representation, for example). I merely selected additional examples that conform to a broad categorization of removing stuff from modern English.
I welcome further discussion on the topic, but I worry you might dismiss things I say you disagree with, as you have done once above by ascribing an intention of strawmanning you, and as you seem wont to do with typographical conventions you dislike. And if you want to eliminate the punctuation you dislike, what might you do to a person whose arguments you dismiss? (;
It seems though, that you just don’t like the various dashes, which is totally fine. Many other people and I find value in them. Still more probably just go along because, as I said, a big part of language norms comes from inertia. The point of language (other than perhaps some, but not all, artistic expression) is communication. Why abandon the norms that facilitate this communication? Is it better to stand on preference (or perhaps principle) and harm your attempts at communication or to yield to norms and be better understood (though perhaps annoyed)? I do not know that there is a correct answer to this question.
I do hope, though, that I have disabused you of the fanciful notions that I was cherry-picking ideas that are extreme just to prove a point and that I was strawmanning your argument. I have shown above numerous examples that back up each of my suggestions, grounded in the reality of natural human languages. Further, I have shown several examples that are truly extreme to show that my original suggestions were not “intentionally extreme to prove a point.”
In theory you could just make "(" and ")" the universal sub-context-denoting symbols. You would just need a different extra symbol to clarify what a given parenthesis means. The three systems make sense. One for data-agnostic compression like a JSON object / foiling a math expression, one for relaying text itself as an object in the domain of discussion rather than as the thing being said (aka a "quotation"), and one for scopes that are part of the discussion per se (not quotation).
Context suffices when the parts of speech have no chance of being in the same slot. Compound words and numbers: your machine screw example was pretty rare. I think the dashes are too specialized in meaning and too hard to tell apart to justify code points in the docs and buttons on my keyboard. If need be, distinguish the various flavors of hyphen with some rule about touching the letter or having two in a row. Our symbol set is reasonable. Not as succinct as Hawaiian, not so bloated as Chinese. 13 chars fit in 4 bits. 26 chars fit in 5. With great strain you can maybe find a workable set of grammatical symbols without blowing past 32 chars, but you will probably end up using a 6th bit. I'm against bloating the raw number of symbols and rules everyone has to rote-learn, not dashes in particular. If it's already in frequent use, like all the paren styles, then fine, but let's not make anything worse than it has to be.
This is absolutely correct.
> its about encoding how to deserialize the linear sequence of words
This is absolutely incorrect. Grammar is the collection of rules that prescribes the combination of words to make valid collections of the same in a language. Specifically, grammar is distinct from semantics, which is concerned with meaning. A nonsense statement may be grammatically correct.
Punctuation is the collection of non-character glyphs that are used to capture the nuances of spoken language into a written form.
Punctuation is orthogonal to grammar.
Put more briefly: spoken language has grammar and no punctuation; written language has the same grammar as the same spoken language and also punctuation.
Parenthetical asides are represented in spoken language with some combination of marker words, pauses, tone of voice, word choice, and perhaps other indicators I may have forgotten. The purpose of punctuation is to lend some of the nuance of spoken communication to the otherwise sparse written word.
The argument of the number of bits to encode glyphs is also orthogonal to the purpose or usefulness of language, writing, and communication. Computers are tools. A keyboard should justify the paucity of its glyphs, rather than the other way around. Once we get here, we are in the realm of pure opinion and preference, which I don't have much interest in pursuing.
I'm sure then you already know, colorless green ideas sleep furiously.
I see no contradiction in what you are calling incorrect. At some point, whatever representation our brain uses for concepts and thoughts, sharing that object requires us to pack it into a linear sequence of words which can then be reliably unpacked by the other side. The very nature of verbal communication forces the existence of serialization/deserialization rules. Those rules are what we call grammar. Grammar may be somewhat orthogonal to semantics, as you observe it is possible to encode valid nonsense, but the grammar exists to encode semantics and is thus to some degree tied to it. The grammar rule of "subject verb object" doesn't only tell you how to check the validity of "colorless dreams sleep furiously", it tells you how to deserialize that sentence back into a hierarchy tree of constituents and their relations. It just so happens to unpack as an object of useless constituents and impossible relations.
Punctuation may be orthogonal to grammar in the general case, but in this particular language they are highly coincident. Virtually all punctuation marks are grammatical particles. It doesn't have to be like this. Some languages have "audible parenthesis" words. Others have words for marking the end of a sentence as a question. Calling punctuation marks non-characters seems a bit artificial. Let's just call them the non-audible characters, in analogy with non-printable characters.
The argument about bits was apparently lost in transmission. I assure you this isn't a preference and opinion thing. Information theory applies just as well to natural language encodings as it does to computer protocols. The basic principles of information entropy and optimal transmission encoding show up in every language: the least frequently used words are the longest. In an analysis of conversations across languages, researchers found the bit rate to be constant. Some spoken languages are seemingly very fast, but that's because the information density per word is lower. The brain's bit rate is a constant. Irrespective of whether we are using a computer or not, the size of an alphabet is measured in bits. The bits in the alphabet determine how much you can possibly say per character. At the extreme end, Chinese has over 5000 characters. That's around 13 bits of information per character, at the low, low cost of memorizing all of them. For comparison, ignoring capitalization and punctuation, English is a 5-bit alphabet, meaning the same amount of information fits into 3-letter words. The Hawaiian alphabet can cover a bit under half of those possibilities with just 3 letters (13^3 = 2,197), and the remainder with a 4th. Think about how powerful that is. Is memorizing 5000 arbitrary squiggles worth it to compress the width of words down by ~3 chars?
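The arithmetic behind those figures, as a small Python sketch. It assumes every symbol is equally likely, which real text is not, so these are upper bounds per symbol rather than measured information rates; the function names are hypothetical:

```python
import math

def bits_per_symbol(alphabet_size: int) -> float:
    """Maximum information per symbol: log2 of the alphabet size."""
    return math.log2(alphabet_size)

def symbols_needed(alphabet_size: int, distinct_items: int) -> int:
    """Shortest word length giving every item its own fixed-length code."""
    length = 1
    while alphabet_size ** length < distinct_items:
        length += 1
    return length

for name, size in [("Hawaiian", 13), ("English", 26), ("Chinese (approx.)", 5000)]:
    print(f"{name}: {bits_per_symbol(size):.2f} bits, "
          f"rounded up to {math.ceil(bits_per_symbol(size))}")

print(symbols_needed(26, 5000))  # 3, since 26**3 = 17,576 >= 5,000
print(symbols_needed(13, 5000))  # 4, since 13**3 = 2,197 < 5,000
```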
The number of bits that are in an alphabet also determines the minimum number of unique design elements needed to construct letters for it. 7 segment displays are a great example. As I said, our characters fit on 5 bits. That's the minimum. Now when our letters came about, they didn't know about bits and they certainly weren't doing this on purpose, but almost every letter can be expressed on a 7 segment display. In other words, writing a letter only wastes two bits per character relative to saying it.
When you learn a new ligature in an Arabic script, you've doubled the number of letters you know. When you learn a new Chinese character, you've learned a new Chinese character. Language is a transmission medium. It's a tool. My takes here are no more preference and opinion than the allocation of the radio spectrum. There's an optimization tradeoff to be had between the limited character choices of Hawaiian and the extreme rote memorization of Chinese. Going from 13 to 26 characters does double the learning time, but the learning time at that stage was short anyway. Going from what we currently have to perhaps 60ish characters (6 bits) doubles it again. Maybe that's tolerable. The next step up is ~128 characters (7 bits). There may be things you can do quicker with a large set of symbols, but the ROI for learning all those symbols doesn't pay off. Around 5 to 6 bits is where most writing systems settle.
And that's why bloating the raw glyph table with letters and marks is the wrong solution.
so when he wrote something . he used only periods to denote pauses . no other punctuation symbols . no capital letters . some people were thinking that his periods stand for perl concatenation operators . i dont know if he is still doing this . i hope he stopped
There's a reason monks of old read aloud. It was about the only way to confirm the actual meaning of a text.
(Bass-ackwards hint: "Arabic" expresses poorly when vowels are excluded.)
Some people do talk like that . All complete thoughts . Sequential.
Other people—and I very much count myself among them—have a less linear, more tree-like mode of expression; where the ideas, instead of building on what came before, are being laid out out of order – the ideas aren’t completed – and more complex punctuation is needed to establish the relationships between those thoughts.
It sounds like I’m saying the former is less sophisticated than the latter. I don’t think that’s true.
I think we should probably try to express our ideas in a way that doesn’t require out-of-sequence reasoning. Short, simple sentences. With clear meanings. Building on one another. Much easier to follow.
The tree-like mode of endless nested parentheticals and asides is just a rendering of an incomplete thought process.
Not better or more sophisticated. Just still in progress.
Em-dashes and parentheticals should be used sparingly so it isn’t too annoying to do all the extra typing.
If it's necessary to be explicit for clarity and proper rendering, then sure. But otherwise, the less friction the better.
After years of procrastinating in learning LaTeX (the Lion Book turned out to be a clear, delightful, and highly useful reference), one of the pleasant surprises was that paragraphs are simply denoted by a blank line (two line breaks in a row). After years of hand-coding HTML where matching <p> and </p> tags (among many others) was a constant occupational hazard, this was just ... pleasing.
Markdown has a similar philosophy, if a far more restricted set of capabilities. That set is, however, sufficient for a tremendous number of documents, and even where it's ultimately insufficient, it remains a useful way to get started with writing.
But even if it's not strictly necessary to balance <p> tags ... it is necessary to do so with many other HTML elements, and missing or mis-typed tags can utterly bork a page, particularly if there's any complexity to it.
(Hand-crafting tends to minimise that complexity, but it's still possible to get reasonably twisted.)
That said, checking one of my favourite HTML5 references, whose page source itself is a beautiful example of clean HTML ... I see that Mark Pilgrim in fact omits the close tags on his paragraphs:
And that said: LaTeX and Markdown also omit the need for the opening paragraph tag. So there's that.
Still, fair point ;-)
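To make the blank-line convention concrete: a minimal Python sketch (the function name is hypothetical, and it ignores everything else Markdown or LaTeX does, such as headings, lists and inline markup) that splits text on blank lines and emits balanced paragraph tags, so the author never has to type them:

```python
import html
import re

def paragraphs_to_html(text: str) -> str:
    """Split on blank lines, as LaTeX and Markdown do, and wrap each block
    in an explicitly balanced <p>...</p> pair (special characters escaped)."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return "\n".join(
        f"<p>{html.escape(block.strip())}</p>" for block in blocks if block.strip()
    )

print(paragraphs_to_html("First paragraph,\nstill the first.\n\nSecond & last."))
```

Single line breaks stay inside the same paragraph, which mirrors how both LaTeX and Markdown treat them.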
1. Drew DeVault has suggested as much: <https://drewdevault.com/2020/03/18/Reckless-limitless-scope....>
The guy who created it said something like, “If you love Comic Sans you don’t know much about typography and should probably get a new hobby. And if you hate Comic Sans you don’t know much about typography and should probably get a new hobby.”
I feel the same about this. The average person has about a billion things to improve in their writing before the “correct” use of different dashes should become something they think about.
Most papers are fundamentally flawed, unfortunately, because they lack sufficient information and data for replication, are underpowered, and don't control for many factors.
It took decades to get to some minimally sensible standards (preregistration, conflict of interest declarations, awareness of the most common stats issues, power analysis), but we're still far from doing effective science.
Money is still handed out based on feelings, hype, and name recognition (when review isn't blinded) for laughably small projects, instead of focusing on establishing longer-term ones and/or improving the actual science output (i.e., data and hypothesis generation) of existing ones.
(Yes, of course, academia approximates this. Yes, yes. Everything's fine. We'll have a usable model of Alzheimer's any second now! Aaany second. Just let this new totally effective model of depression/obesity/learning/ME-CFS out of the door first.)
The perfect topic for HN!
I however--as a typographer--strongly disagree. Typography is about both beautiful typesetting and making sure that the information contained in the text is understood easily.
The former is obvious to me. It may not be to you but that doesn't make your reasoning right.
As an analogy, there are quite a few people among my friends & acquaintances who cook occasionally or rarely. They usually share the trait that they care more about eating than about how something tastes. Bluntly put.
They commonly have one kind of oil in their kitchen (most often sunflower) and they use it when the recipe demands "oil".
Usually recipes specify what oil to use. It may say olive oil or peanut oil or sesame oil. They won't have these oils and they don't care.
Yet the effect of using a different oil is profound on many levels, not just taste. If you care, that is. Same with the dashes: text looks and reads very differently when those different dashes are used correctly.
Which leads to the information part. Why do we have these different dashes? They actually map to spoken language.
A hyphen is used to pull things together. A word can be hyphenated (and should be read as if the hyphen didn't exist), or two words can be pulled together (making the pause between them shorter): "ever-changing" is pronounced differently than "ever changing".
An en dash used between points in time or space conveys exactly that: a distance. The spoken pause is usually longer.
And finally, an em dash, like a comma, conveys an even longer pause between the words it separates.
So I get it. Visual design language serves a purpose. An important purpose. It's not the artful navel-gazing outsiders think it is. Well, maybe some people are like that, but there really is an objective purpose under it all. I'd even say I agree about the rules for whether hyphens touch their neighbors. For compound words it should be a train-like-in-construction, whereas in a delimiter role, like a range of items, it should go Boston - DC.
I just can't see having a whole dedicated set of minutely different characters fit for this purpose. I dislike it for the same reason I dislike Lego sets that have a particular piece which isn't used for anything else in any other set and never will be. It ruins the elegance of the system. It offloads a minor design problem onto somewhere it doesn't belong (namely the character set). I want to know everything while learning as little as possible, which is why I strive for encodings that express as much as they can with as few elements as possible.
If it were me, I'd just have '.' , '-' and '_' exist at mid, bottom and top heights and be done with it. Don't like my line length? Make it whatever length you want, either dotted, dashed or continuous. Solves every use case, extremely composable, every permutation that should logically be there is there. .,;:' notice anything incomplete? LHTIFE notice what's missing? qbhrnujdp damn that's frustrating. KRBPF where's the rest of the set?
My takeaway wasn't that the article was being pedantic, just that it was being informative.
What's the point of punctuation? The point is that ambiguity exists in human communication. Where accuracy and precision are important — for example in formal communication — different punctuation marks and rules help prevent misunderstandings.
When engaged in less formal communication, or when the stakes of miscommunication are lower, these rules seem (as you observe) unnecessary. I think that insisting on proper syntax, spelling, grammar, or whatever else in an online forum like HN would be silly. But internet forums aren't the entire world, and it is conceivable to me that there may be places where people need to depend on the meaning of their message being conveyed reliably.
Of course I'm mostly writing about computer/software topics and don't write for publications or a non-technical audience.
There are different contextual requirements that can be served by specific typographic characters. I'd much rather see them used, than not.
I'd imagine most would feel the same‽
A comment led to the follow-up https://www.punctuationmatters.com/the-difference-between-a-..., but it’s still quite inadequate: it only deals with MINUS SIGN and assumes HYPHEN-MINUS is exclusively a hyphen. It also appears to have suffered from the same replacement of lone HYPHEN-MINUS with EN DASH as this article.
I guess the difference here is that someone’s boss might complain that they should follow this article, since we all write stuff from time to time.
That said, personally I need my different dashes, commas and parentheses for my excessive wavering.
You could argue, however, that they should refrain from posting, but they probably felt the need to share in case others felt the same way.
The simplified rules for the em-dash are pretty much intuited and prescribed versions of this, which gut the effectiveness of the em-dash. In general use, an em-dash should be used to denote thoughts without having to restructure or delete what you just wrote to accommodate that thought.
Edit: I oversimplified. Consistency is what matters: using an em-dash like a comma that isn't a comma leads to ambiguity when you also use commas. A writer who avoids semicolons and quotes all dialog can use an em-dash for something very different from raising the voice, but they can also use a semicolon very differently from its standard accepted role. That is what these simple guides miss: consistency of usage. They just list all of the various ways you could use any given mark, and people start using an em-dash to "fix" their long run-on sentences full of commas.
The closest thing we have to standard use allows for wonderfully complex sentences which can convey great meaning, but consistent and well-defined use is most important.
comma - connects independent and dependent clauses
em-dash - raises and lowers the voice
semicolon - connects independent clauses in a more direct way than the paragraph
colon - elaborates an idea
parenthesis - an aside, stated instead of thought
period - end of thought
Question marks and exclamation points do not need to be at the end of a sentence; they can double as a comma, semicolon, or colon.
I seem to be missing a nuance of HN's line breaks and formatting.
We could also use em dashes to signal excitedly running from one thought to the next—as if we’re just riffing on an idea—too fast to be interrupted—wouldn’t that be amazing?
Or we can use the em dash to slow us down—to pause and reflect on what we just said.
Or in dialog:
“Perhaps we can use it to signal an unexpected inter—”
“No, an interruption.”
“Yes, that would make more sense.”
“Oh! I just thought of something—we could also use it to indicate stunned silence.”
What's even more interesting to me is that this contrasts with a parenthetical, which I now realize lowers the voice when we read it aloud.
Did you discover that difference on your own or did you read it somewhere? Just curious.
My mental model is that an em-dash is a parenthesis that the author was too excited to slow down and make vertical.
>sounds rather restrictive and specific, to be honest.
Write a single sentence which clearly and concisely includes exposition, thought, aside, rhetorical question, self rebuttal and conclusion without following the "standard" I included in my edit. This is what allows writers like James, Joyce, Gass, Gaddis, Wallace, Pynchon, etc to write their wonderfully long and complex sentences and by complex I am referring to meaning as much as structure, we can have great meaning with simple structures but we have to accept a certain amount of ambiguity with that. Sure that challenge can be executed as a paragraph but then it ceases being a single thought, it is a collection of thoughts and that is a very different thing.
> Write a single sentence, which clearly and concisely includes exposition, thought, aside, rhetorical question, self rebuttal and conclusion, without following the "standard" I included in my edit: This is what allows writers (like James, Joyce, Gass, Gaddis, Wallace, Pynchon, etc) to write their wonderfully long and complex sentences (and by complex I am referring to meaning as much as structure); we can have great meaning with simple structures, but we have to accept a certain amount of ambiguity with that—sure, that challenge can be executed as a paragraph, but then it ceases being a single thought; it is a collection of thoughts, and that is a very different thing.
I tried to stick to your 'standard', though you might disagree on some of my choices. I would say I found it a little constraining. Here's an alternative edit that doesn't follow your rules but – I find – creates a more fluid reading of your original words:
> Write a single sentence, which clearly and concisely includes: exposition; thought; aside; rhetorical question; self rebuttal; and conclusion – without following the "standard" I included in my edit. This is what allows writers like James, Joyce, Gass, Gaddis, Wallace, Pynchon, etc, to write their wonderfully long and complex sentences—and by complex I am referring to meaning, as much as structure. We can have great meaning with simple structures – but we have to accept a certain amount of ambiguity with that. Sure, that challenge can be executed as a paragraph; but then it ceases being a single thought—it is a collection of thoughts, and that is a very different thing.
All of which I hope goes to show that these choices are a matter of taste, not absolute rules.
>All of which I hope goes to show that these choices are a matter of taste, not absolute rules
Throughout this exchange I have avoided calling these rules and used "convention" instead, because that is what punctuation use is. Conventions are easy to break, but you should have a good reason to do so if you want something to be readable, and you must be consistent in your choices. I have tried to emphasize this throughout and have repeated it many times: consistency of use is what really matters. Punctuation marks are pretty much signposts for the reader, and as long as they are used consistently and are well formed, most readers will have no trouble figuring out any use.
Imagine if a town decided one day that it could save some money on removing a no-longer-needed stop sign by simply agreeing that it is not a stop sign. It is a small town, so they can just pass the word that the stop sign on Third is now a 30 mph sign. This works quite well and saves some money, so the town continues with this and starts changing the meaning of other signs. It is not difficult to see why this would be troublesome: eventually people will get confused, and no one from out of town will have a clue. Thankfully our governments are consistent with their signs, and a stop sign is a stop sign.
It is a really complex thing and part of what makes English literature what it is. We have conventions which have evolved over time when it comes to punctuation and we have prescription, but we don't really have rules unless you are writing tech documents or journal submissions. It comes down to having a clear and consistent use more than anything else and using every punctuation mark for any accepted use based on whim is not clear or consistent.
Perhaps typesetting still uses these, but that's okay. They can keep doing so, since these probably add aesthetic appeal to how flyers are designed.
I also noticed a pundit-battle brewing in the depths of the hyphen-m&ndash-soup.
Let’s make that even more clear.
THE EN DASH IS ABOUT AS WIDE AS AN UPPERCASE N; THE EM DASH IS AS WIDE AS AN M.
En and em dashes aren’t called that because they’re as wide as a lowercase “n” and a lowercase “m.” They’re called that because those are the specific typography jargon words that refer to the height of a physical piece of type (the “em,” also called the “mutton” to reduce confusion) and half that height (the “en,” also called the “nut”). An em dash was originally as wide as the font is tall.
En dash is all over the place in personal/business writing, even just in email, thanks to Word and Outlook autocorrecting a hyphen to an en dash whenever it's between two spaces (rightfully in my opinion). If you've never seen it then that surely says more about what you notice than the content of what you've read.
That doesn't necessarily contradict your point – if you never notice the distinction then what's the point? But it's different from how I read the implication of your post.
(Funnily enough, without thinking, I put an en dash in the paragraph above by holding down on hyphen in the Android keyboard, and only caught myself after I did it.)
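The Word/Outlook behaviour described above (a hyphen typed between two spaces becomes an en dash) is simple enough to approximate in a few lines. What follows is a minimal Python sketch of that single assumed rule; it is not Microsoft's actual AutoCorrect logic, which covers more cases, and the function name `autocorrect_spaced_hyphen` is hypothetical.

```python
import re

EN_DASH = "\u2013"  # U+2013 EN DASH


def autocorrect_spaced_hyphen(text: str) -> str:
    """Hypothetical helper approximating one autocorrect rule.

    A HYPHEN-MINUS with a space immediately before and after it is replaced
    by an EN DASH. Hyphens inside words ("ever-changing") or between digits
    ("3-8") have no surrounding spaces, so they are left untouched.
    """
    # Lookbehind/lookahead check for the spaces without consuming them.
    return re.sub(r"(?<= )-(?= )", EN_DASH, text)


print(autocorrect_spaced_hyphen("pages 3-8 - an ever-changing range"))
# prints: pages 3-8 – an ever-changing range
```

The same rule also fires on a spaced minus in arithmetic such as `5 - 3`, which is one reason an automatic substitution cannot always pick the right character.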
I'll agree with this. It also brings up the point, if punctuation isn't seen - is it useful? Probably not to me - maybe yes to others.
"Spelling, grammar, and punctuation are a kind of magic; their purpose is to be invisible. If the sleight of hands works, we will not notice a comma or a quotation mark but will translate each instantly into a pause or an awareness of voice [...] When the mechanics are incorrectly used, the trick is revealed and the magic fails; the reader's focus is shifted from the story to its surface."
- Janet Burroway, Writing Fiction: A Guide to Narrative Craft
Following this thread, the discussion isn't about the existence or absence of punctuation. The discussion is about the case of three specific punctuation marks, which appear extremely similar if not identical. These punctuation marks are being discussed after reading an article about their differences, which are only apparent to those among us who find memorization more important than clarity.
In this exact context, the question is whether all three punctuation marks are needed when literally none of them is distinctive enough as punctuation from the other two. If you read the comment to which I had replied, you will see them also make that point.
> Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith.
Moving on from that.
> In this exact context, the question is whether all three punctuation marks are needed when literally none of them is distinctive enough as punctuation from the other two. If you read the comment to which I had replied, you will see them also make that point.
Indeed I did read it; I disagree.
Going back to your original comment I don't think it's reasonable that you've never seen an em-dash in "business or personal writing" but I would totally accept that you haven't noticed the punctuation in those contexts. This is partly my point, if these marks are used correctly then it makes sense you've never spotted them.
I'm saying that the people who read literature containing en and em dashes would notice the difference were they not there. I'd echo what another commenter said: these marks wouldn't be missed until they're gone but we would definitely miss them.
> Going back to your original comment I don't think it's reasonable that you've never seen an em-dash in "business or personal writing" but I would totally accept that you haven't noticed the punctuation in those contexts. This is partly my point, if these marks are used correctly then it makes sense you've never spotted them.
What I was saying is that I don't see them in my day-to-day activities, which I don't. You are making assumptions about what type of communications I am involved with daily, and the types of people I communicate with. I communicate with cryptographers and security professionals who all use monospaced text. I also communicate with C-level people who barely need to use punctuation other than a period. They have mastered brevity and communicate exceedingly well.
It would make far more sense if you wrote, "I don't think it's reasonable that chownie has never seen an em-dash in business or personal writing."
For the analogy about not explicitly hearing the bass line, but the music still being affected by it: maybe you interpreted it just a little too literally? The analogue to removing the bass line is not the total removal of all three of those punctuation marks. Instead, it's the removal of the distinction between them. I think it was pretty clear (and apt).
Given there is usage of en dash in the wild as you mentioned, there's a possibility this may be a case of "you don't know what you got 'til it's gone."
A bunch of Shakespeare's sayings scraped together, after they were trampled in a mosh pit.
There’s an en dash in the first line of text on apple.com right now. There are en dashes, em dashes, and hyphens in the most recent press release on that site, all used correctly.
You have definitely seen them. All professional writing outlets, like the New York Times, use em-dashes, curly quotes, and other “typographic” characters that one is supposed to use in American English.
And newspapers in my own country follow the typographical rules. Even though no one uses it in informal communication on HN or FB. (Well, some on HN do.)
We can discuss that I chose the word "seen", when I meant "noticed", but there is no doubt that I didn't write what you intimated. I have seen the dashes in formal writing and in newspapers.
A too-hurried reading is worse than not reading at all.
And also not “formal writing”.
And presumably no copy-pasted message from Word or whatever other app inserts “smart”-whatever automatically.
And also not any regular old business website. (Did you think newspapers were the only ones? Just because those were the examples?)
Even for personal writing: some people even take the time to insert bullet points, so “proper” punctuation is easy for them.
You’re a fine one to complain about pedantry. (I guess yours is a just-right level of (cover your ass) pedantry.)
You've added exactly nothing to this discussion, just written a personal attack founded on misunderstanding and make-believe.
> THE EN DASH IS ABOUT AS WIDE AS AN UPPERCASE N; THE EM DASH IS AS WIDE AS AN M.
> They’re called that because those are the specific typography jargon words that refer to the height of a physical piece of type (the “em,” also called the “mutton” to reduce confusion) and half that height (the “en,” also called the “nut”).
An em was traditionally the width of an uppercase M and an en half that (around the width of an uppercase N). Nowadays, this relationship doesn't necessarily hold: one em is equal to the font size (e.g., a 12 pt font has one em = 12 pt).
A colon is used in this context, when you're introducing the question that follows.
I think this isn’t just a matter of personal preference, but it’s also largely a cultural thing – in German, for example, the “space-en-dash-space” form is common.
This is true for a lot of other punctuation as well. For instance, in Germany, we quote „like this“ instead of “like this”. Whereas in Switzerland or France, it’s common to quote using Guillemets, as in «Hello there!». This style can also be found in German texts, though it’s less common than quotation marks, and it would typically be used »inversely«.
This is also the traditional style in Dutch; it's what I was taught at school. These days many just use "upper quotes". You can still find the traditional style in books and some newspapers, but others have switched over the years.
In traditional Ethiopian you would use ፡ as a word separator, and ። as a full stop. Over time, people have started to "just" use the space as a word separator. There's some Wikipedia pages that mix both styles; for example on  you can see ፡ being used for the first three paragraphs and then it switches to a space. I rather like being able to see the evolution of language/typography on a single page.
What do they mean? Just curious.
Eastern Europeans often drop articles because that’s (apparently) what they do in some Slavic languages. That’s a minor second/third-language quirk, not about an intellectual deficiency (lack of capacity).
Of course, some extra whitespace is even more harmless.
The only thing we can really do is try to notice these biases in ourselves and ignore them as best we can.
Unspaced em-dashes—like this—are typically American.
There ends my trivia about that unusable site.
Not to mention, ems and ens are not ASCII and thus not strictly kosher.
(BÉPO version also exists)
I assume we're done doing that, that task is finished ;)
EDIT: Seems HN is eating up the right signs... You can see them on Wikipedia here, they essentially look like two small commas: https://ro.wikipedia.org/wiki/Ghilimele
“Convex” or „concave“ usage varies by language. See https://en.wikipedia.org/wiki/Quotation_mark#Summary_table
E.g., when I type a comment on HN and enter said `“` in the input text field, it uses my system’s default monospace font (Courier), which renders the character so that the stroke appears to go from bottom left (thick) to top right (thin). After I submit my comment, HN uses Verdana (the one from my system), which renders the very same character so that the stroke appears to go from the top left (thick) to the bottom right (thin). It’s the same Unicode character, but both fonts happen to render them differently according to how the font maker laid out and mapped the respective characters. (I can observe the same behaviour when I compare both fonts in my word processor, so it’s not HN-specific.)
„‟ are more consistent in current computer fonts by virtue of their Unicode names strongly suggesting a particular appearance.
“ U+201C LEFT DOUBLE QUOTATION MARK
” U+201D RIGHT DOUBLE QUOTATION MARK
„ U+201E DOUBLE LOW-9 QUOTATION MARK
‟ U+201F DOUBLE HIGH-REVERSED-9 QUOTATION MARK
〝 U+301D REVERSED DOUBLE PRIME QUOTATION MARK
〞 U+301E DOUBLE PRIME QUOTATION MARK
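Those names come from the Unicode Character Database, so they can be checked rather than taken on trust. A small Python sketch using the standard `unicodedata` module (the helper name `describe` is just for illustration):

```python
import unicodedata


def describe(ch: str) -> str:
    """Format a character as '<char> U+XXXX <UNICODE NAME>'."""
    return f"{ch} U+{ord(ch):04X} {unicodedata.name(ch)}"


# The six quotation marks listed above, in the same order.
for ch in "\u201c\u201d\u201e\u201f\u301d\u301e":
    print(describe(ch))
```

Running it should reproduce the list above line for line. How each glyph is actually drawn is still up to the font, as the Courier vs. Verdana observation shows.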
My personal opinion for hyphens is:
- Ambiguity: most can be cleared up with spaces, and for examples like 3-8, if they're numbers, we can tell it’s a range from context
- Ease of input: one character is a lot easier to decide between than 3 (or 4 if you include minus), and if there are rules for software to be able to input the correct character every time then the differences in characters become redundant
- Subjective aesthetics: I quite like the consistent compactness of the single hyphen
And for quotes:
- Ambiguity: They show when quotes start and end which is quite nice and we can have nested quotes. But these are things that are not critical to meaning and simply make it easier
- Ease of input: Usually automated but can absolutely tear through code if pasted in the wrong place. If we deem these smart quotes useful enough then they can coexist with typewriter quotes peacefully if we do not run the quote formatting on code blocks (which is where code should be anyway)
- Subjective aesthetics: I do like the look of smart quotes but would be willing to use straight quotes
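The suggestion to leave code alone when converting to smart quotes can be sketched as follows. This is a simplified Python illustration, assuming inline code is delimited by single backticks and that straight double quotes simply alternate between opening and closing; real smart-quote engines use much more context (apostrophes, nesting, surrounding punctuation), and `smarten_double_quotes` is a hypothetical name.

```python
import re

# A capturing group makes re.split keep the code spans in its output,
# where they land at the odd indices of the resulting list.
CODE_SPAN = re.compile(r"(`[^`]*`)")


def smarten_double_quotes(text: str) -> str:
    """Replace straight double quotes with curly ones outside `code` spans."""
    pieces = []
    opening = True  # the next straight quote opens a quotation
    for i, part in enumerate(CODE_SPAN.split(text)):
        if i % 2 == 1:
            pieces.append(part)  # inside backticks: leave the code untouched
            continue
        for ch in part:
            if ch == '"':
                pieces.append("\u201c" if opening else "\u201d")
                opening = not opening
            else:
                pieces.append(ch)
    return "".join(pieces)


print(smarten_double_quotes('She said "hi" and ran `print("x")`'))
```

The prose quotes come out curly while `print("x")` keeps its straight quotes, which is the coexistence described above.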
> Ambiguity: They show when quotes start and end which is quite nice and we can have nested quotes. But these are things that are not critical to meaning and simply make it easier
Typographic conventions go further than that.
In Norwegian it’s `«»` for one level of nesting. For nested quotes you are supposed to use something else. Maybe `‘’` (single quotes) for the second level and then `“”` (American English double quotes).
Maybe American English uses `“”` and then `‘’`.
In my opinion that’s not necessary. At least for text storage.
It's easy to find a hyphen (or something close enough) on your physical keyboard, but there's no em dash. OSes also make it a pain to automate even when they claim otherwise.
I go out of my way to use em dashes but do I think others would? No way. So is lack of use because of lack of utility or because of idiosyncrasies in keyboards?
Hyphens are great for some things but are too short to visually offset text.
"The global conflict spanning the years 1939~1945 is known as World War 2..."
The sentence as you wrote it could be misinterpreted as "the conflict spanning the years 1939 to ca. 1945...".
Had you used a dash/hyphen/minus/whatever, nobody would be likely to misinterpret that as "the conflict spanning the years minus six..."
Thus I agree that using tilde for numeric ranges would be confusing. Might as well just use a hyphen, which is easier to type and most people won’t notice the difference from the correct character (en-dash).
Using that form of reasoning, it could be claimed that, say, “espresso” is pronounced “expresso”, because some people do pronounce it like that.
But that would be disingenuous, since “is pronounced” does not generally mean “is sometimes, by some people, pronounced”, but “is supposed to be pronounced” or “is properly pronounced”. The same goes for “tilde is used for approximation”; no, it isn’t. It would be different if scbrg had written “tilde is sometimes used for approximation”; that would have indicated a possible interpretation of the first meaning, and not the second.
Oh, dear lord. I apologize for leaving out this very important word. I thought it was fairly clear that I didn't mean it was the only symbol used for approximation, pretty much like how, I don't know... nothing is the only thing used for anything.
Whatever phrase, symbol, word or tool in general you find, you can be fairly certain that there's something else that could be used instead.
In the really real world, people tend to use the symbols that are easy to type with their keyboards. Ironically, this is a bit like what TFA complains about; people always use the hyphen that's available with one keystroke when in fact they "should" (for some arbitrary value of "should") use a handful of different ones. And they use tilde for approximation, because nobody knows how to type a fucking ≈. You'll also note that they use " when they "should" have used “, ” or any of the umpteen other variants of quotation marks.
When it comes to ambiguity, which was what this sub thread was about, how things are often used is actually quite important. Because, you know, it's what people actually write that you have to disambiguate, not what they should have written.
Most people associate a double tilde with “approximately equal”.
And, after much cursing, and my team spending time changing the text, I reflected, and came to like those punctuation markers. Took me a long time, but I have been converted.
A guide to the 3 dashes in English:
Hyphens (-) are compound-words.
En dashes (⌥ -) connect beginning–ending.
Em dashes (⌥⇧-) can replace parentheses and colons — use them more!
En dash (–): Compose - - .
Em dash (—): Compose - - -
What is this?
And also because this article uses an en dash in the table in place of a hyphen.
EDIT: Wayback confirms it's supposed to be a hyphen: https://web.archive.org/web/20120120121527/http://www.punctu...
$ curl -s https://www.punctuationmatters.com/en-dash-em-dash-hyphen/ | grep -A6 '<h3>What'
<h3>What do they look like?</h3>
<table style="height: 139px;" width="289">
<td><strong> hyphen </strong></td>
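grep shows the markup but not which dash character a cell actually contains, which is exactly what went wrong in that table. One way to check, sketched in Python with the standard `unicodedata` module; the helper `find_dashes` and the particular set of characters are illustrative choices covering the marks mentioned in this thread, not an exhaustive list of Unicode dashes.

```python
import unicodedata

# Dash-like characters that come up in this discussion.
DASHES = {
    "\u002d",  # HYPHEN-MINUS (the key on the keyboard)
    "\u2010",  # HYPHEN
    "\u2012",  # FIGURE DASH
    "\u2013",  # EN DASH
    "\u2014",  # EM DASH
    "\u2212",  # MINUS SIGN
}


def find_dashes(text: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) for every dash-like character in text."""
    return [(i, unicodedata.name(ch)) for i, ch in enumerate(text) if ch in DASHES]


print(find_dashes("co-op 3\u20138"))
# prints: [(2, 'HYPHEN-MINUS'), (7, 'EN DASH')]
```

Running the page text (rather than the raw HTML) through something like this would show whether the table's hyphen cell holds U+002D or U+2013.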
I should also note that this whole point seems at best a point for typography geeks. These are three almost identical marks that have very similar uses. I am completely convinced that no one has ever disambiguated a phrase by noticing that something is a hyphen and not an en-dash, or vice versa.
Suitable for those who are familiar with punctuation basics but may want a refresher, and AFAICT it gets some things right that this one doesn't (e.g., the numbers in a range are generally separated by a figure dash, not en dash).
En dash: --
Em dash: ---
On usage --- I find the practice of using the em-dash without bounding spaces (typical of most modern style-guides) is visually distracting and more difficult to read than when spaces are provided around the punctuation (as I've done here, and my stylometric stalkers may file as a personal identification tell).
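The `--` / `---` input convention is the one TeX and LaTeX use, where the font's ligatures turn runs of hyphens into an en or em dash. For plain text, a rough equivalent is a pair of string replacements in which order matters, as this Python sketch (with a hypothetical `tex_dashes` helper) illustrates:

```python
EN_DASH = "\u2013"  # U+2013 EN DASH
EM_DASH = "\u2014"  # U+2014 EM DASH


def tex_dashes(text: str) -> str:
    """Convert TeX-style '--' and '---' into Unicode en and em dashes.

    '---' has to be handled first: replacing '--' first would turn '---'
    into an en dash followed by a stray hyphen.
    """
    return text.replace("---", EM_DASH).replace("--", EN_DASH)


print(tex_dashes("pages 3--8 --- roughly"))
# prints: pages 3–8 — roughly
```

This only mimics TeX's text-mode output; it knows nothing about math mode, where TeX treats a single hyphen as a minus sign.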
There is no justice.
Or does no-one care but me?