
Markdown and/or markdown processors are known to change.

Since there's no single Markdown spec, determining just how a page will render, or what will break, is a bit of a crapshoot. And since Markdown treats nonparsable markup as ... plain text, you don't even get errors or other indicators of failure. You've got to view and validate the output manually or by some other means.

With formal tag-based markup languages (HTML, SGML, LaTeX, DocBook, etc.) you've at least got 1) an actual markup spec and 2) something that will or won't validate (though whether or not the processor actually gives a damn about that is another question, hello, HTML, I'm looking at your "The Web is an error condition": https://deirdre.net/programming-sucks-why-i-quit/)

I can't find the post at the moment, but someone recently wrote a cogent rant on the fact that a change in their hosting provider (GitHub via a static site generator IIRC) had swapped out markdown processors, with changed behaviours, rendering (literally) all their previously-authored content broken.

Which is indeed a pain.

I personally like Markdown, and find it hugely convenient. For major projects though, I suspect what I'll end up doing is starting in Markdown, and eventually switching to a more stable markup format, which probably means LaTeX (HTML has ... proved less robustly stable over the 25+ years I've worked with it).

Though for simple-to-modestly-complex documents, Markdown is generally satisfactory, stable, and close enough to unadorned ASCII that fixing what breaks is not a horribly complicated task.

Up to modest levels of scale, at least.






I appreciate your reply. Seems Markdown is more complex than I recognized and this just makes me want to avoid it more. If you do find the rant you mentioned, let me know.

> HTML has ... proved less robustly stable over the 25+ years I've worked with it

The first website I made in 2002 still views fine in a modern browser. I didn't do anything fancy, though. I would be interested in what has been unstable as it might give me ideas on what to avoid in HTML.

I don't find HTML to be that much harder than plain text or Markdown so I think I'll keep using it for smaller projects. LaTeX is worth considering as well, particularly given that I will have math on some of my webpages. One issue is that the stability of LaTeX depends strongly on which packages you use. I need to take a closer look at the health of every package I use. I think avoiding external dependencies is easier with HTML.


My sense is that Markdown is probably pretty safe for most uses, particularly if you control the processing. If not, then yes, it can bite. For me that means pandoc to generate endpoints such as HTML, PDF, etc. I'm fairly confident that most of that toolchain should continue to work (provided computers and electricity exist) for another 2-4 decades.
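(For reference, the basic pandoc invocations are one-liners; the output format is inferred from the extension, and the filenames here are just placeholders:)

    pandoc notes.md -o notes.html
    pandoc notes.md -o notes.pdf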

For certain more complex formatting, Markdown has limitations and features are more likely to change. But I've used Markdown to format novel-length works (from ASCII sources, for my own use) with very modest formatting needs (chapters, some italic or bold text, possibly blockquotes or lists), and it excels at that.

For HTML, it's a combination of factors:

- Previous features which have been dropped, most to thunderous applause. (<blink>, <marquee>, etc.)

- Previous conventions which have largely been superseded: table layouts most especially. CSS really has been ... in some respects ... a blessing.

- Nagging omissions. The fact that there's no HTML-native footnoting / endnoting convention ... bothers me. You can tool that into a page. But you can't simply do something like:

    <p>Lorem ipsum dolor sit amet.
        <note>Consectetur adipiscing elit</note> 
        Nulla malesuada, mauris ac tincidunt faucibus</p>
... and have the contents of <note> then appear by some mechanism in the rendered text. A numbered note, a typographical mark ( * † ‡ ...), a sidenote, a callout, a hovercard, say.

In Markdown you accomplish this by:

    Lorem ipsum dolor sit amet.[^consectetur] Nulla malesuada, mauris ac tincidunt faucibus

    [^consectetur]: Consectetur adipiscing elit.
Which, when generating HTML, produces a superscript reference and a numbered note. Or footnotes according to other conventions (e.g., LaTeX / PDF) for other document formats.
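The generated HTML typically ends up along these lines -- the exact markup and class names vary by processor, this is just the general shape:

    <p>Lorem ipsum dolor sit amet.<sup><a href="#fn1" id="ref1">1</a></sup>
       Nulla malesuada, mauris ac tincidunt faucibus</p>

    <ol class="footnotes">
      <li id="fn1">Consectetur adipiscing elit. <a href="#ref1">&#8617;</a></li>
    </ol>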

- Similarly, no native equation support.

Maybe I'm just overly fond of footnotes and equations....

But HTML and WWW originated, literally, from the world's leading particle physics laboratory. You'd think it might include such capabilities.

- Scripting and preprocessors. I remember server-side includes, there's PHP, and JS. Some browsers supported other languages -- I believe Tcl and Lua are among those that have been used. Interactivity and dependency on other moving parts reduces reliability.

The expression "complexity is the enemy of reliability" dates to an Economist article in 1958. It remains very, very true.

HTML is for me more fiddly than Markdown (though I've coded massive amounts of both by hand), so on balance, I prefer writing Markdown (it's become very nearly completely natural to me). OTOH, LaTeX isn't much more complex than HTML, and in many cases (simple paragraphs) far simpler, so if I had to make a switch, that's the direction I'd more likely go.


I agree with you entirely on the abandoning of conventions with HTML. I haven't paid much attention to multi-column layouts in CSS over the years but my impression is that it's gone from tables to CSS floats to whatever CSS does now that I'm not familiar with. Browsers are typically backwards compatible so this isn't that big of a deal to me. But I have no idea if what's regarded as the best practice today will be seen as primitive in 15 years.

> The fact that there's no HTML-native footnoting / endnoting convention ... bothers me.

I've seen people use the HTML5 <aside> element for sidenotes, styled with CSS. Some even make them responsive, folding neatly into the text as the viewport shrinks. I'm not sure if this is the intended use for <aside> but the result is reasonable and I intend to do the same. If you're set on footnotes, though, yes, I don't know a native implementation.
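Something like this, roughly -- the class name and breakpoint are invented, and the media query is what folds the note back into the flow on narrow screens:

    <p>Lorem ipsum dolor sit amet. Nulla malesuada, mauris ac
       tincidunt faucibus.</p>
    <aside class="sidenote">Consectetur adipiscing elit.</aside>

with CSS along the lines of:

    aside.sidenote { float: right; width: 12em; margin: 0 0 1em 1em;
                     font-size: 0.85em; }
    @media (max-width: 40em) {
      /* fold the sidenote back into the normal text flow */
      aside.sidenote { float: none; width: auto; margin: 1em 0; }
    }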

Equation support with MathML is okay in principle but not in practice. I'd like to have equations without external dependencies (MathJax's JS alone is like 750 kB!), but that's not possible until Chrome decides to catch up with Firefox and Safari on MathML. I've been thinking about just using MathML as-is (no external math renderer), and if Chrome users complain, I'll tell them to get a better browser. ;-) Maybe that'll help some Chrome users understand why they should test their websites in other browsers.
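For what it's worth, hand-writing simple MathML is verbose but workable. The quadratic formula, for instance, comes out something like:

    <math display="block">
      <mi>x</mi><mo>=</mo>
      <mfrac>
        <mrow>
          <mo>−</mo><mi>b</mi><mo>±</mo>
          <msqrt>
            <msup><mi>b</mi><mn>2</mn></msup>
            <mo>−</mo><mn>4</mn><mi>a</mi><mi>c</mi>
          </msqrt>
        </mrow>
        <mrow><mn>2</mn><mi>a</mi></mrow>
      </mfrac>
    </math>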


Semi-relatedly, I think even the linear form of UnicodeMath [1] is very readable, and it would be great if there were more support in the wild for building it up into nicer presentation forms in the browser (MathJax has had it on the backlog since at least 2015, for instance), as that seems to me a better "fallback" situation than raw MathML, given its readability when not built up.

[1] http://www.unicode.org/notes/tn28/UTN28-PlainTextMath-v3.pdf
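To give a sense of what I mean by readable: in the linear format the quadratic formula is just something like

    x = (−b ± √(b² − 4ac))/2a

which stays perfectly legible as plain text even if nothing ever builds it up.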

> I haven't paid much attention to multi-column layouts in CSS over the years but my impression is that it's gone from tables to CSS floats to whatever CSS does now that I'm not familiar with.

CSS Grid [2] is the happiest path today. It's a really happy path (I want these columns, this wide, done). CSS Flexbox [3] is a bit older and nearly as happy a path. Some really powerful things can be done by combining the two, especially in responsive design (a dense two-dimensional grid on large widescreen displays collapsing to a simple flexbox "one dimensional" flow, for example).
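That "I want these columns, this wide, done" bit is close to literal; a minimal sketch (selector, sizes, and breakpoint here are arbitrary):

    .page {
      display: grid;
      grid-template-columns: 14em 1fr 14em;  /* nav, main, aside */
      gap: 1em;
    }
    @media (max-width: 50em) {
      .page { grid-template-columns: 1fr; }  /* collapse to a single column */
    }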

Flexbox may be seen as primitive in a few years, but Grid finally seems to be exactly where things should have always been (and what people were trying to accomplish way back when with tables or, worse, framesets). Even then, Flexbox may mostly be seen as primitive in the sense of a "simple lego/duplo tool" compared to Grid's more precise/powerful/capable tools.

[2] https://caniuse.com/#feat=css-grid

[3] https://caniuse.com/#feat=flexbox


Thanks for mentioning UnicodeMath. That does seem like a better fallback solution than raw MathML. It appears there's a newer version of the document you linked to that was posted on HN, by the way: https://news.ycombinator.com/item?id=14687936

I'll also look more closely at CSS Grid.


Thanks for mentioning grid, as that's a tool I've not looked at myself.

CSS columns and Grid are not entirely substitutable, though they share some properties.

I see Columns as a way of flowing text within some bounding box, whilst Grid is preferred for arranging textual components on a page, more akin to paste-up in Aldus PageMaker (am I dating myself?), though on the rubber sheet of the HTML viewport rather than on fixed paper sizes.


CSS columns are actually ... mostly ... pretty useful:

https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Columns...

My preference is to use them with @media queries to create more or fewer columns within auxiliary elements (headers, footers, asides), usually to pretty good effect.
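The pattern is roughly this (element and breakpoints arbitrary):

    footer { column-count: 1; }
    @media (min-width: 40em) { footer { column-count: 2; column-gap: 2em; } }
    @media (min-width: 70em) { footer { column-count: 3; column-gap: 2em; } }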

Multi-column body text is largely an abomination.

For images, I'm still largely sticking to floats.

I've done some sidenote styling that I ... think I like. I don't remember how responsive this CodePen is or isn't though I've created some pretty responsive layouts based on it:

https://codepen.io/dredmorbius/full/OVmKZX

I consider equation support a lost cause.


I feel like the difference with Markdown is that it's not meant to be a hidden source format. It's meant to take an existing WYSIWYG styled-text format—the one people use when trying to style text in plaintext e-mail or IM systems—and to give it a secondary rendering semantics corresponding to what people conventionally think their ASCII-art styling "means."

If a Markdown parser breaks down, it's quite correct for it to just spit out the raw source document—because the raw document is already a readable document with clear (cultural/conventional) semantics. All a Markdown parser does is make a Markdown-styled text prettier; it was already a readable final document.


Whether or not it's intended to be a hidden source format, the fact remains that if it does not render reliably and repeatably, it's failing to do its job.

Markdown's job is to be a human-readable, lightweight, unobtrusive way of communicating to software how to structure and format a document.

It's one thing for a freshly-entered document to fail -- errors in markup occur and need to be corrected. It's another to change the behaviour and output of an unchanged text, which is what Markdown implementations have done.

(I've run into this myself on Ello where, For Mysterious and Diverse Reasons, posts and comments which I'd previously entered change their rendering even when I've not touched the content myself. This is compounded by an idiotic editor which literally won't leave plain ASCII text alone, and insists on inserting hidden characters or control codes. Among the reasons for my eventual disenchantment with what would otherwise be an excellent long-form text-publishing platform.)


> Markdown’s job is... communicating to software

No, that’s a misunderstanding. Markdown is, as I said, a formalization of existing practice. Nobody’s supposed to be “writing Markdown” (except computers that generate it.) You’re supposed to be writing plaintext styled text the same way you always have been in plaintext text inputs. Markdown is supposed to come along and pick up the pieces and turn them into rich text to the best of its ability. Where it fails, it leaves behind the original styled text, which retains the same communication semantics to other humans that the post-transformation rich text would.

The ideal Markdown parser isn’t a grammar/ruleset, but an ML system that understands, learns, and evolves over time with how humans use ASCII art to style plaintext. It’s an autoencoder between streams of ASCII-art text and the production of an AST. (In training such a system, it’d probably also learn—whether you’d like it to or not—to encode ASCII-art smilies as emoji; to encode entirely-parenthetical paragraphs as floating sidebars; to generate tables of contents; etc. These are all “in scope” for the concept of Markdown.)

In short: you aren’t supposed to learn Markdown; Markdown is supposed to learn you (the general “you”, i.e. humans who write in plaintext) and your way of expressing styles.

If there’s any required syntax in Markdown that a human unversed in Markdown wouldn’t understand at first glance as part of a plaintext e-mail, then Markdown as a project has failed. (This is partly why Markdown doesn’t cover every potential kind of formatting: some rich-text formatting tags just don’t have any ASCII-art-styled plaintext conventions that people will recognize, so Markdown cannot include them. That’s where Markdown expects you to just write HTML instead, because at that point you’ve left the domain of “things non-computer people reading will understand”, so you may as well use a powerful explicit formal language, rather than a conventional one.)


Interesting viewpoint, though not one that persuades me.

At least not today ;-)

Human expression is ultimately ambiguous. In creating some typographic output, you've got to ultimately resolve or remove that ambiguity. Preferably in some consistent fashion.

There's an inherent tension there. And either you live with the ambiguity or you resolve it. I lean toward the "disambiguate" side. Maybe that means using Markdown as a starting point and translating it ultimately to some less-ambiguous (but also less convenient) format, as I've noted.

But that means that the "authoritative source" (the Markdown manuscript) is not authoritative, at least as regards formatting guidelines. Whether this is actually a more accurate reflection of the status quo ante of earlier, print-based typographic practice -- in which an author submits a text and a typesetter translates it into a typographic rendering, making interpretations where necessary to resolve ambiguities or approximate the original intent -- I don't know.

Interesting from a philosophical intent/instantiation perspective though.



