
A Formal Spec for GitHub Flavored Markdown - samlambert
https://githubengineering.com/a-formal-spec-for-github-markdown/
======
jorams
It's odd how neither this post, nor the spec, nor GitHub's "Mastering
Markdown" help page[1], nor the more complete "Basic writing and formatting
syntax" page[2], mentions the fact that GitHub treats every newline as a hard
break.

CommonMark contains this little sentence to work around its specified
behavior, which is left untouched in the GFM spec:

> A renderer may also provide an option to render soft line breaks as hard
> line breaks.

I'd say whether or not it does this is a rather important thing to mention.
When I write Markdown documents for GitHub I have to change my editor
settings, only because of this.

[1]: [https://guides.github.com/features/mastering-
markdown/](https://guides.github.com/features/mastering-markdown/)

[2]: [https://help.github.com/articles/basic-writing-and-
formattin...](https://help.github.com/articles/basic-writing-and-formatting-
syntax/)

~~~
kivikakk
GFM itself leaves that line unchanged because we don't actually change that
option in our implementation — the reference implementation `cmark` (which we
built upon) supplies the "hardbreaks" option, described as follows:

> \--hardbreaks Treat newlines as hard line breaks

We turn this option on when rendering issues, issue comments and so on, but
leave it off when rendering blobs (such as README.md). Both are GFM, one just
uses this option to make it more conducive to communication.
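
A toy illustration of the difference, not cmark's actual code: the hypothetical function `render_paragraph` below renders a single paragraph and handles nothing but line breaks. With the option off, a newline stays a soft break, which ends up as ordinary whitespace in the HTML; with it on, every newline becomes `<br />`.

```python
import html


def render_paragraph(text: str, hardbreaks: bool = False) -> str:
    """Render one paragraph of plain text, handling only line breaks.

    Toy sketch: real renderers such as cmark also parse inline markup.
    """
    lines = [html.escape(line.strip()) for line in text.strip().split("\n")]
    # Soft break: a plain newline in the HTML, which browsers collapse to a space.
    # Hard break: an explicit <br /> element.
    separator = "<br />\n" if hardbreaks else "\n"
    return "<p>" + separator.join(lines) + "</p>"


source = "First line\nsecond line"
print(render_paragraph(source))                   # README-style rendering
print(render_paragraph(source, hardbreaks=True))  # comment-style rendering
```

The same source text therefore displays as one flowing line in a README but as two lines in an issue comment.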

~~~
tbrock
Good to know. Why is the setting different for issues than for blobs?

~~~
steveklabnik
This is covered in the post, no?

> There is a fundamental difference between these two kinds of content: the
> user comments are stored in our databases, which means their Markdown syntax
> can be normalized (e.g. by adding or removing whitespace, fixing the
> indentation, or inserting missing Markdown specifiers until they render
> properly). The Markdown documents stored in Git repositories, however,
> cannot be touched at all, as their contents are hashed as part of Git’s
> storage model.

~~~
tbrock
Sure, but I'm still confused as to why the display format setting has anything
to do with the on-disk format. To me they seem completely orthogonal.

~~~
kivikakk
In general people have come to expect that hitting return once in a comment
field on GitHub will produce a newline, having been the case for many years,
so we try to preserve that expectation. Hence not changing the option being
used when rendering comments.

Conversely, they don't expect the same from Markdown files stored in their
repository (e.g. I put each sentence in a paragraph on its own line for my
blog, for easier diffing and editing). Additionally, we couldn't normalise
these documents even if we wanted (to prevent everything breaking by being
over-vertically spaced). Hence leaving the option off in this
case!

------
erlend_sh
GitHub devs & Markdown enthusiasts at large, please consider contributing some
brainpower to these last remaining issues that are blocking the v1.0 release
of CommonMark:

[https://talk.commonmark.org/t/issues-we-must-resolve-
before-...](https://talk.commonmark.org/t/issues-we-must-resolve-
before-1-0-release-8-remaining/1287)

------
simplehuman
This is great! A couple of years back, there was a failed attempt at
standardizing this - [http://www.vfmd.org/](http://www.vfmd.org/) and
[http://www.vfmd.org/vfmd-spec/specification/](http://www.vfmd.org/vfmd-spec/specification/).
GitHub, given its popularity, will surely have more success.

~~~
zeveb
There was also Common Mark ([http://commonmark.org/](http://commonmark.org/)),
which failed IMHO mostly due to John Gruber taking offense at their first
choice of name, Common Markdown. Will formalising this as 'GitHub Flavored
Markdown' similarly cause offense?

~~~
gregmac
Did it 'fail'? It seems to have a decent amount of use. And it fixed a bunch
of problems with the original Markdown spec (or lack thereof).

I suppose since one of their goals was to get Github and StackExchange 'out of
the markdown business' but neither use CommonMark, and further, Github now has
put work into creating their own spec, they failed in that aspect.

~~~
steveklabnik
> Github now has put work into creating their own spec

A big aspect of the post is talking about how GFM is now a set of extensions
to CommonMark. This is a huge win, not a failure.

------
legulere
It's a specification based on the commonmark specification. Neither is a
formal spec. They are more of an informal specification with some edge-cases
listed (in contrast to the original markdown specification which has known
unspecified edgecases).

~~~
UK-AL
I really wish there was a concise formal spec for markdown, rather than a
multi-page essay. It makes it incredibly difficult for anyone trying to create
something to parse it. There is no mechanical way to go from spec -> parser.

I think it's quite difficult to do though.

~~~
adrianN
Is it really useful to write a formal spec for Github Markdown? The software
they use to parse and render it is open source. If you want to know how
_exactly_ something works, you can read the source.

~~~
legulere
Yes, if you also want to have more implementations, that can differ in e.g.
license or programming language.

For instance it is the reason why there's no reimplementation of TeX.

~~~
adrianN
There is pdflatex, luatex,...

Nobody stops you from translating the Ruby or whatever into your favorite
language.

------
rcarmo
I have to wonder why this isn't done in the form of a context-free grammar,
like Hitman[0] uses. Specs in English are still too vague for my liking.

[0]: [https://github.com/chameco/Hitman](https://github.com/chameco/Hitman)

~~~
daigoba66
Is it even possible to define a context-free grammar for markdown?

~~~
jws
It is not, and not just because of a few idiosyncrasies like C which require
context.

Fundamentally, markdown was specified as some pattern matching and English
description. The original specification was not done thinking of productions
and grammar rules, and you typically don't get there by accident.

See [http://roopc.net/posts/2014/markdown-
cfg/](http://roopc.net/posts/2014/markdown-cfg/) for a detailed exposition of
how the '*' character in markdown is sufficient to ruin any chance of a CFG.

I have sometimes pondered how to make a markdown-like language with a simple
production based grammar. I have not succeeded and would appreciate any
pointers. The criterion being that it has to have something like the minimal
intrusion into the prose of markdown.

~~~
nerdponx
Is there some kind of more general grammar that can encode the Markdown spec?

~~~
eslaught
There was some discussion of this on the CommonMark forums shortly after the
initial release of CommonMark:

[https://talk.commonmark.org/t/commonmark-formal-
grammar/46](https://talk.commonmark.org/t/commonmark-formal-grammar/46)

See in particular these comments by maradydd:

[https://talk.commonmark.org/t/commonmark-formal-
grammar/46/1...](https://talk.commonmark.org/t/commonmark-formal-
grammar/46/17)

[https://talk.commonmark.org/t/commonmark-formal-
grammar/46/2...](https://talk.commonmark.org/t/commonmark-formal-
grammar/46/22)

[https://talk.commonmark.org/t/commonmark-formal-
grammar/46/2...](https://talk.commonmark.org/t/commonmark-formal-
grammar/46/23)

I'm not sure they ever came to a conclusive answer (at least on this thread).

Edit: Here is JGM himself saying he doesn't know:
[https://talk.commonmark.org/t/commonmark-formal-
grammar/46/3...](https://talk.commonmark.org/t/commonmark-formal-
grammar/46/33)

~~~
Ericson2314
If GitHub is normalizing comments anyways, I wonder whether they could have
adopted a CFG.

Overall this is a step in the right direction but the whole saga is a perfect
microcosm of our understructure cranking out poorly-understood stuff which
comes back to bite us and cannot be tamed.

------
legulere
[https://github.github.com/gfm/#disallowed-raw-html-
extension...](https://github.github.com/gfm/#disallowed-raw-html-extension-)

Why this? This is not a working blacklist to prevent XSS (e.g. onload="...")
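
For context, a rough sketch of what that section describes, assuming the tag list given in the linked spec section: the leading `<` of each listed tag is replaced with `&lt;`, and nothing else is touched. The names `tagfilter` and `DISALLOWED` and the regex are made up for illustration; this is not cmark-gfm's implementation.

```python
import re

# Tag names listed in the GFM spec's "Disallowed Raw HTML" extension.
DISALLOWED = ("title", "textarea", "style", "xmp", "iframe",
              "noembed", "noframes", "script", "plaintext")

# Match a '<' only when it opens (or closes) one of the listed tags.
_TAG_RE = re.compile(
    r"<(?=/?(?:" + "|".join(DISALLOWED) + r")(?:[\s/>]|$))",
    re.IGNORECASE,
)


def tagfilter(html_text: str) -> str:
    """Replace the leading '<' of each disallowed tag with '&lt;'."""
    return _TAG_RE.sub("&lt;", html_text)


print(tagfilter("<xmp> and <XMP> are filtered, <em> is not"))
print(tagfilter('<img src="x" onload="alert(1)">'))  # passes through unchanged
```

Event-handler attributes such as `onload` are untouched by a filter of this shape, which is the gap being pointed out.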

~~~
scrollaway
Wow, TIL about the <plaintext> tag. Here I thought I knew most of the corner
cases of HTML.

~~~
dbbk
What's bonkers to me is that there is no closing tag, _everything_ after it is
no longer parsed as HTML.

~~~
ptx
If it disallowed certain words (i.e. treated them specially instead of just
reproducing the text as written) such as "</plaintext>" it wouldn't be plain
text.

------
forsaken
This is great -- lack of a standard that was actually used (unlike CommonMark)
was one of my main issues with Markdown
([http://ericholscher.com/blog/2016/mar/15/dont-use-
markdown-f...](http://ericholscher.com/blog/2016/mar/15/dont-use-markdown-for-
technical-docs/)) -- It's really great to see GitHub leading in this
department, and it gives me hope that one day we might actually have Markdown
that is portable between implementations.

~~~
SEJeff
Perhaps you missed Jeff Atwood (codinghorror)'s standardized markdown spec,
which is about as close as you'll get:

[http://commonmark.org/](http://commonmark.org/)

~~~
forsaken
Yea, I mention commonmark in the post, and referred to it in the "standard
that was used" part -- commonmark is great, but wasn't widely adopted.

Updated my original post to be more clear.

------
IgorPartola
At the risk of starting a mini-flame war, is RST a more cohesive format? If
one was to pick one of the two formats to start using for personal
documentation, which format should one choose?

~~~
mdaniel
I enjoyed this article [http://eli.thegreenplace.net/2017/restructuredtext-vs-
markdo...](http://eli.thegreenplace.net/2017/restructuredtext-vs-markdown-for-
technical-documentation/) and since the article didn't provide supporting
evidence for the Linux/OpenCV/LLVM assertion:
[https://www.kernel.org/doc/html/latest/doc-
guide/sphinx.html](https://www.kernel.org/doc/html/latest/doc-
guide/sphinx.html) and
[http://docs.opencv.org/2.4/doc/tutorials/introduction/how_to...](http://docs.opencv.org/2.4/doc/tutorials/introduction/how_to_write_a_tutorial/how_to_write_a_tutorial.html)
and [https://github.com/llvm-
mirror/llvm/blob/master/docs/index.r...](https://github.com/llvm-
mirror/llvm/blob/master/docs/index.rst) respectively

I do think rST is waaay more expressive, but I also recognize that in many of
the instances one would want to use markup in a chat or PR situation, the
expressiveness likely wouldn't be well received if the trade-off is verbosity.

This is something in life that I file away with competing regex standards: my
brain just has to switch languages based on the app in which I'm typing
(between markdown, pseudo-markdown (ahem, _Slack_), org-mode, rST, etc).

------
patrec
Now if only org-mode could define a sane, parsable format.

~~~
alphapapa
What do you mean?

------
jostmey
HTML used to serve as a simple way to format a document. Now HTML is too
complex for that purpose. Introduce markdown. In ten years, markdown will be
too complex for formatting documents.

I am a big fan of markdown, in case I didn't make that clear. I love it.

~~~
hawkice
It wasn't an issue of simplicity, it was about HTML being extremely hard to
sanitize so it isn't Turing complete. Having formatting (that can still line
up with your site's styles) while knowing you can't get script injection is
pretty useful.

------
dreamcompiler
This is great news! Does anybody have a recommendation for a Javascript parser
for Formal GFM? (I know there are a million JS MD parsers; I'm looking for a
good one that will let me serve GFM docs over HTTP and render them on the
browser.)

~~~
steveklabnik
[https://www.npmjs.com/package/marky-
markdown](https://www.npmjs.com/package/marky-markdown)

~~~
dreamcompiler
Looks good. Thanks.

------
charonn0
So _that's_ why my project wikis have suddenly stopped rendering markdown
properly. I've been trying to figure out WTF was going on since yesterday!

------
libeclipse
I'm really happy to see this. It's actually quite frustrating that although
markdown is so nice, it barely has a consistent standard. It's almost
impossible to use it cross-service.

Hopefully now that Github has standardised their own flavour of it (and quite
a nice flavour too), more people will start to use it.

Of course there is the obligatory XKCD:
[https://xkcd.com/927/](https://xkcd.com/927/)

~~~
zzleeper
I would argue that it is consistent now.

At the lowest level, you have commonmark. Then, you have extensions at the
top, such as GFM.

If Pandoc/Github/Reddit/SO/kramdown switch, that accounts for almost all
front- and back-end cases that I care about.

And given that the first four were actively involved with commonmark, I would
take as given that they will support commonmark or a superset of it.

~~~
kccqzy
Pandoc already supports different flavors of markdown (commonmark included)
and you can add/remove extensions to base flavors.

------
tomcam
web2py handled all of these issues and made its markup language extensible
with its markmin specification:
[http://www.web2py.com/init/static/markmin.html](http://www.web2py.com/init/static/markmin.html)

------
Siecje
What's wrong with [http://www.vfmd.org/](http://www.vfmd.org/) ???

------
strikedout
Why no ~~strike out~~ in spec?

~~~
kivikakk
Right here: [https://github.github.com/gfm/#strikethrough-
extension-](https://github.github.com/gfm/#strikethrough-extension-) :)
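
As a rough picture of what the extension does, here is a toy inline pass that wraps text between double tildes in `<del>`. The function name `render_strikethrough` and the regex are hypothetical, and the rule that the text may not start or end with whitespace is an assumption of this sketch; the spec section linked above is the authority on the real rules.

```python
import re

# Two tildes, then text that neither starts nor ends with whitespace
# (a simplifying assumption of this sketch), then two tildes.
_STRIKE_RE = re.compile(r"~~(?=\S)(.+?)(?<=\S)~~")


def render_strikethrough(text: str) -> str:
    """Toy inline pass: wrap ~~text~~ in <del> and leave everything else alone."""
    return _STRIKE_RE.sub(r"<del>\1</del>", text)


print(render_strikethrough("Why no ~~strike out~~ in spec?"))
```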

~~~
oneeyedpigeon
Is there no corresponding equivalent of INS? I would have thought something
like

    +new text+

could be used...

------
flippyhead
Jeesh, about time!

------
harmonyinfotech
Is it ok if I promote my domain here? I read the guidelines but it doesn't
mention anything regarding self promotion. Sorry if it ain't appropriate but
anyone looking for a relevant domain ( _markdown.in_ ) please get in touch or
any suggestion if it is better to develop it.

