Naming  
Closed (initial) development 
Lack of formal grammar 
Lack of tables 
Just use another format (e.g. asciidoc) 
Ambiguity is a feature 
Markdown is a popular language with an ambiguous, poorly specified spec and buggy default implementation (so buggy that the vast majority of Markdown users have probably never used it, if they even know it exists). Now, we have a much more well-specified English language spec that explicitly addresses the challenges of the grammar, with a test suite that does a much better job of covering the corner cases, and much better default implementations. These improvements are by no means perfect, but they are improvements.
Yes, we need a formal grammar. Yes, we need tables, and other features. Yes, the naming is unfortunate, though frankly I don't feel sorry for Gruber given that Atwood posted a public letter calling for change two years ago. Yes, these are all issues. But it is an achievement nevertheless, and let's celebrate that.
And then we can get back to work.
More generally, I think we have issues with entitlement here on HN. Frankly, we have no fundamental right to either the original Markdown or this new version. But when posts like this come up, we hack away at them as if we had a right to something better, as if the authors should be working to please us personally. We need to stop acting like we have a right to pass judgement on what others release as open source.
Some examples of ambiguities:
1. It does not specify precedence. For example, if a line like "~~~" (or "[ref]: /url") is followed by a setext underline, is that a header, or is that the start of a fenced code block (or ref definition)?
2. The spec says: "Code span backticks have higher precedence than any other inline constructs except HTML tags and autolinks". It gives as an example that "<a href="`">`" is an HTML tag. What happens with a different placement of backticks, like "<a `href=""`>" or even "`<a href="">`", is left unspecified.
3. What is the precedence or associativity of span-level constructs? For example, does "<asterisk>a[b<asterisk>](url)" result in "a[b" being emphasised or "b<asterisk>" being linked?
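To make (3) concrete, here is a minimal Python sketch. It is not taken from any Markdown implementation, and the two regexes are deliberately naive assumptions. It only shows that an emphasis rule and a link rule, each applied on its own, claim overlapping parts of the same input:

```python
import re

# Deliberately naive, hypothetical patterns; real Markdown rules are more involved.
EMPHASIS = re.compile(r"\*([^*]+)\*")
LINK = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")

def competing_readings(text):
    """Return what each rule would claim, and whether the two claims overlap."""
    em = EMPHASIS.search(text)
    link = LINK.search(text)
    if em is None or link is None:
        return None
    overlap = em.start() < link.end() and link.start() < em.end()
    return em.group(1), link.group(1), overlap

print(competing_readings("*a[b*](url)"))  # ('a[b', 'b*', True)
```

Both rules match, and neither match contains the other, so something outside the two rules has to say which one wins.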
Thing is, a specification-by-example like this would have to keep an ever-growing list of corner cases and give examples for each of them. To get completely unambiguous, the list needs to be very long, and when it gets very long, it becomes unwieldy to handle for an implementer of the spec.
Hence the need for a formal grammar, which is the shortest way of expressing something unambiguously. But it's not possible to write a CFG for Markdown because of Markdown's requirement that anything is valid input. So the next best thing is to define a parsing algorithm, like the HTML5 spec. (Shameless plug: vfmd (http://www.vfmd.org/) is one such Markdown spec which specifies an unambiguous way to parse Markdown, with tests and a reference implementation.)
So if "Standard Markdown" is NOT unambiguous and wouldn't be, then it's not a "standard", so calling it "Standard Markdown" is not quite proper.
Re (3): we have an asterisk which can open emphasis. So, to see if we have emphasis, the rules say to parse inlines sequentially until an asterisk that can close emphasis is reached. The first inline we come to is [b*](url), which is a link. There's no closing asterisk, so we don't have emphasis, but a literal asterisk followed by a link.
Re (1): I believe you are right that the case of a reference definition before a setext header line should be clarified. However, the other case seems clear enough. ~~~ starts a fenced code block, which ends with a closing string of tildes or the end of the enclosing container. The underline would be included in that code block either way.
Re (2): I believe the talk of precedence may be misleading here (I thought it would be useful heuristically). The basic principle of inline parsing is to go left to right, consuming inlines that match the specs. This resolves all of these cases. Perhaps the talk of precedence should be removed.
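For readers following along, the left-to-right principle can be sketched roughly as below. This is only an illustration under simplifying assumptions (a hypothetical `parse_inlines` that knows just `*emphasis*` and un-nested `[text](url)` links); it is not the stmd algorithm or its reference implementation:

```python
import re

# Hypothetical, simplified link pattern: [text](url) with no nesting.
LINK = re.compile(r"\[([^\]]*)\]\(([^)]*)\)")

def _add_text(nodes, chunk):
    """Append literal text, merging with a preceding text node."""
    if nodes and nodes[-1][0] == "text":
        nodes[-1] = ("text", nodes[-1][1] + chunk)
    else:
        nodes.append(("text", chunk))

def _parse(text, pos, inside_emphasis):
    """Consume inlines from pos; returns (nodes, next_pos, closed_by_asterisk)."""
    nodes = []
    while pos < len(text):
        link = LINK.match(text, pos)
        if link:
            # A complete link is consumed as one inline, asterisks and all.
            nodes.append(("link", link.group(1), link.group(2)))
            pos = link.end()
        elif text[pos] == "*":
            if inside_emphasis:
                return nodes, pos + 1, True   # this asterisk closes the emphasis
            inner, after, closed = _parse(text, pos + 1, True)
            if closed:
                nodes.append(("em", inner))
                pos = after
            else:
                _add_text(nodes, "*")         # no closer found: literal asterisk
                pos += 1
        else:
            _add_text(nodes, text[pos])
            pos += 1
    return nodes, pos, False

def parse_inlines(text):
    return _parse(text, 0, False)[0]

print(parse_inlines("*a[b*](url)"))  # [('text', '*a'), ('link', 'b*', 'url')]
```

With these assumptions the `*a[b*](url)` case from Re (3) comes out as a literal asterisk followed by a link, because the link is consumed before any closing asterisk is seen.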
I am no stranger to formal specifications. I wrote what I think was the first PEG grammar for markdown (peg-markdown, which came to be used as the basis for multimarkdown and several other implementations). PEG isn't a good fit, especially for block-level parsing. It almost works for inline-level parsing, but there are some constructs (like code spans) that can't be done in PEGs. It might be worth specifying inline parsing in a pseudo-PEG format to avoid worries like those you've expressed.
I understand what you have now is a provisional spec, but I have reason to believe that a specification based on declaring constructs and defining by examples is never going to get completely unambiguous. A lot of the ambiguity in parsing Markdown lies in the interplay between different syntax constructs. A spec like yours doesn't address them at all, so they remain as ambiguities. All examples of ambiguities I gave involve the interplay of different constructs (more on them below).
It's debatable whether translating your code to English is "simple" without talking about memory addresses, pointers and arrays. In any case, vfmd is _not_ such a translation (I'm not saying that you imply that it is). vfmd was first written as a spec, then tests written to match the spec, and then implemented, followed by more tests. (However, the spec did get fixes during testcase development and implementation.)
> But it seemed to us that there was value in giving a declarative specification of the syntax, one that was closer to the way a human reader or writer would think, as opposed to a computer.
I agree there is value in making an easy-to-read syntax description. However, making a readable specification for document-writers and making an unambiguous specification for parser-developers are opposing objectives. The document writer asks "What should I do to get a heading?", while a parser developer asks "How should I interpret a line starting with a hash?". Your spec is good if you target only document writers, but falls short as a spec for parser developers, because of the ambiguities.
In vfmd, I addressed this by creating two documents - one for document-writers and one for parser-developers - that are consistent with each other.
On the specific examples:
> Re (3) ... the rules say to parse inlines sequentially until an asterisk that can close emphasis is reached
Yes, but where does your spec say that an asterisk can not close emphasis if it's contained within a link? As it stands now, going by the rules in the emphasis part of the spec (section 6.4), it should be treated as emphasis, and going by the rules in the link part of the spec (section 4.7), it should be treated as a link. The spec is silent on corner cases where multiple constructs overlap: Does the leftmost construct always win? What happens if it's not a well-formed link? What if three syntax constructs interleave?
> Re (1): ... ~~~ starts a fenced code block, which ends with a closing string of tildes or the end of the enclosing container. The underline would be included in that code block either way.
Going by the setext headers section of your spec (section 4.3), I'm not at all sure why a "~~~" line followed by a "===" line is not a setext header. Yes, your implementation interprets it as a code block, but your spec is ambiguous on how this _should_ be interpreted.
> Re (2): ... The basic principle of inline parsing is to go left to right, consuming inlines that match the specs. This resolves all of these cases. ...
If the basic principle of inline parsing is to go left to right and if all inline constructs should be parsed like that, then "[not a `link](/foo`)" should be interpreted as a link (which is contrary to Example 240 in your spec). Clearly, code spans should have a higher priority, but that needs more than a couple of examples to define correctly.
This principle also looks contrary to your reply to (3) above, where you say "<asterisk>a[b<asterisk>](url)" is a link, not emphasis.
As noted above, the problem with a declarative spec for Markdown is the ambiguity (which is quite similar to the ambiguity in defining Markdown as a CFG, for example). As long as the spec is declarative, there will be multiple ways of interpreting an input (which, ironically, was the problem that parser-developers found with John Gruber's original Markdown syntax description too). Problems like this cannot be completely solved by providing examples because the combinations between the different constructs are too many to list as examples in a spec.
I only listed these items to illustrate the bigger problems in the design or style of the spec itself. Even if these individual items are addressed, there will always be more coming up, so I don't think it would make sense for me to keep finding and reporting ambiguities to your Discourse forum.
We considered writing the spec in the state machine vein, but I advocated for the declarative style. It may be worth rethinking that and rewriting it, essentially spelling out the parsing algorithm. As you suggest, a parallel document could be created for writers.
I'll need to study your spec further to see what the substantive differences are.
I'll be happy to open a post in talk.commonmark.org on the ambiguity problems caused by using a declarative style for the stmd spec. I'll do that once the forum is back (I can't seem to access it right now).
In parallel, I too will try to work out what the syntax differences are between stmd and vfmd. Meanwhile, please see: http://www.vfmd.org/differences/ (in case you haven't already).
Post on declarative style: http://talk.commonmark.org/t/571
Syntax diff: https://github.com/vfmd/vfmd-spec/wiki/commonmark-vs-vfmd
Honest question here: how do CFGs prevent you from parsing anything as valid input? E.g. AFAICT this CFG in BNF accepts anything as valid input (including no input!)
<s> ::= <x> EOF
<x> ::= CHAR <x> | ""
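As a sanity check, here is a small Python recognizer for that grammar, reading the `<x>` rule as also having an empty alternative (an assumption on my part; the recursion needs it to terminate). The function name `accepts` is made up for illustration:

```python
def accepts(text):
    """Recognizer for  <s> ::= <x> EOF  with  <x> ::= CHAR <x> | (empty)."""
    pos = 0
    # <x>: consume one CHAR at a time (iterative form of the right recursion);
    # the empty alternative is taken when no characters remain.
    while pos < len(text):
        pos += 1
    # EOF: succeed only if the whole input was consumed.
    return pos == len(text)

print(accepts(""), accepts("*a[b*](url)"))  # True True
```

It accepts every string, which is the point of the question: accepting anything is easy, it just says nothing about structure.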
Also, isn't there a compromise between HTML's crazy parsing strategy and a CFG? A formal grammar, even if not context-free.
Actually, I should have said it's not possible to write an _unambiguous_ CFG for Markdown.
Say we need to parse emphasis in span elements. "_a_" is em and "__a__" is strong, but "_a", "a_", "__a" and "a__" are normal text. If we write the rules for all these, we end up with a grammar that can generate the same string in many different ways. To determine whether an "_" is the syntax qualifier of an em or just part of normal text, we might have to look ahead an arbitrary number of characters, and potentially till the end of the input. This is why it's not possible to write a useful (or unambiguous) CFG for Markdown, and this is because of the requirement to not throw an error on any input.
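A rough Python sketch of that lookahead, assuming only single-underscore em is handled ("__" strong is not modelled, and `render_em` is a hypothetical name):

```python
def render_em(text):
    """Render _x_ as <em>x</em>; leave unmatched underscores as literal text."""
    out = []
    pos = 0
    while pos < len(text):
        if text[pos] == "_":
            # Unbounded lookahead: the closer may be anywhere up to end of input.
            close = text.find("_", pos + 1)
            if close > pos + 1:
                out.append("<em>" + text[pos + 1:close] + "</em>")
                pos = close + 1
                continue
        out.append(text[pos])
        pos += 1
    return "".join(out)

print(render_em("_a_"))  # <em>a</em>
print(render_em("_a"))   # _a
```

Whether the first "_" is markup is only known after `find` has scanned the rest of the string, which is the part a grammar rule cannot express locally.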
> Also, isn't there a compromise between HTML's crazy
> parsing strategy and a CFG?
PEGs have been written for Markdown, and they work because PEGs are inherently unambiguous: they use ordered choice with backtracking instead. But those PEGs don't handle nested blocks cleanly.
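To illustrate why a PEG never yields two parses, here is a toy sketch in Python. The combinators `lit`, `seq`, `choice` and `any_char` are hypothetical helpers written for this comment, not the API of any PEG library, and the emphasised text is hard-coded to "a" to keep it short:

```python
def lit(s):
    """Match the literal string s at pos; return the new position or None."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def seq(*parsers):
    """Match each parser in turn; fail as a whole if any one fails."""
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def choice(*parsers):
    """Ordered choice: try alternatives from the same pos; first success wins."""
    def parse(text, pos):
        for p in parsers:
            end = p(text, pos)
            if end is not None:
                return end
        return None
    return parse

def any_char(text, pos):
    return pos + 1 if pos < len(text) else None

strong = seq(lit("__"), lit("a"), lit("__"))
em = seq(lit("_"), lit("a"), lit("_"))
inline = choice(strong, em, any_char)

print(inline("__a__", 0), inline("_a_", 0), inline("_a", 0))  # 5 3 1
```

On "_a" the `em` alternative gets two characters in, fails, and `choice` backtracks to position 0 and falls through to `any_char`, so the underscore ends up as plain text.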
My own HTML5-ish Markdown spec (http://www.vfmd.org/vfmd-spec/specification/) is not as crazy as HTML5's, but admittedly, is not trivial to implement either.
> But those PEGs don't handle nested blocks cleanly.
What's the problem exactly?
More details on PEG for Markdown:
- from the author of the PEG grammar, who's also the spec-writer of "Common Markdown": http://talk.standardmarkdown.com/t/standard-markdown-formal-...
- from myself: http://www.vfmd.org/introduction/#prior-work
Also, I wrote an expanded explanation of essentially what I said about CFGs and Markdown: http://roopc.net/posts/2014/markdown-cfg/
> A specification-by-example like this would have to keep an ever-growing list of corner cases and give examples for each of them. To get completely unambiguous, the list needs to be very long, and when it gets very long, it becomes unwieldy to handle for an implementer of the spec.
Even more troubling, they skipped the chance for some basic innovations which will probably ultimately result in a Standard Markdown 2 spec. So, for example, they are defining Markdown as a mapping to HTML, rather than a mapping to an internal tree structure which can then be serialized to HTML. If you make that change in perspective, then you can have Markdown for other languages too: not just HTML but also literate code in an arbitrary language, for example.
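A minimal sketch of what I mean, in Python. The node shapes (`("para", [...])`, `("em", [...])`, `("text", str)`, `("code", str)`) and both serializer names are assumptions invented for this example, not anything the spec defines:

```python
import html

def to_html(node):
    """Serialize a (kind, payload) tree node to HTML."""
    kind, payload = node
    if kind == "text":
        return html.escape(payload)
    if kind == "code":
        return "<code>" + html.escape(payload) + "</code>"
    inner = "".join(to_html(child) for child in payload)
    tag = {"para": "p", "em": "em"}[kind]
    return f"<{tag}>{inner}</{tag}>"

def to_plain(node):
    """Serialize the same tree to plain text, dropping all markup."""
    kind, payload = node
    if kind in ("text", "code"):
        return payload
    return "".join(to_plain(child) for child in payload)

DOC = ("para", [("text", "a < b is "), ("em", [("text", "true")])])

print(to_html(DOC))   # <p>a &lt; b is <em>true</em></p>
print(to_plain(DOC))  # a < b is true
```

The parser only has to produce the tree once; each output format is then just another function over it.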
Another innovation which should probably work its way into Markdown as it becomes more of a file format is metadata. It's a little hard to remember, but acceptable metadata tagging was one of the killer features of MP3s, leading ultimately to their global rise. We don't have a good metadata expression for text files, and Markdown's embedded link references are, essentially, a sort of metadata already. Do this before it gets to the W3C so that we can start off a document with a simple
@author: Chris Drost
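Pulling such lines off the top of a document would be cheap. A hedged sketch in Python, assuming a purely hypothetical rule that the metadata block is a run of leading `@key: value` lines and ends at the first line that doesn't match:

```python
import re

# Hypothetical syntax: "@key: value" on a line of its own.
META_LINE = re.compile(r"@(\w+):\s*(.*)")

def split_metadata(source):
    """Return (metadata dict, remaining document text)."""
    meta = {}
    lines = source.split("\n")
    index = 0
    while index < len(lines):
        match = META_LINE.fullmatch(lines[index])
        if match is None:
            break  # first non-metadata line ends the block
        meta[match.group(1)] = match.group(2).strip()
        index += 1
    return meta, "\n".join(lines[index:])

meta, body = split_metadata("@author: Chris Drost\n\nHello *world*")
print(meta)  # {'author': 'Chris Drost'}
```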
 This isn't a huge change in the language but it's a huge change in perspective. The main decision needed to fix this is to say that the "embedded HTML blocks" should have a special sigil at the beginning which is not the < character of the first tag; those "raw" blocks are then held separately in the Markdown tree, and the serializer to HTML passes the raw blocks through without HTML escaping or embedding in another tag.
 Why not just use backticks? We could, of course. One problem here though is that there is no good way to distinguish those literate-code blocks which are commentary and those literate-code blocks which are code to be executed. If you don't fix that now, it will probably be fixed in SM2.
Criticism is valuable. If I release anything into the public, I expect criticism. Heck, I even look forward to being criticized because it means people care about what I did (even if I did it wrong!)
"Neither the name “Markdown” nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission."
It's a total breach of license, and on a fairly standard BSD too.
I'm seriously getting tired of copy-pasting this, and surprised that y'all are defending this whole thing. But I guess I shouldn't be.
Anyway, I better jet before I start trolling. In the end, I'm ok with making a better markdown, but completely opposed to breaking the law.
⚫ The spec isn't code. More to the point, it's not a derivative work of Markdown itself. The BSD license applies to the licensed work as a whole. What it doesn't apply to, specifically, is other works which refer to the original. That is: copyright is not trademark, and you cannot embed an effective trademark license within a copyright license.
⚫ Works (parsers, libraries, etc.) based on the Standard Markdown spec which do not incorporate any of Gruber's original Markdown code are themselves not derivative works of Gruber's original BSD-licensed work, and again, are not governed by the license.
⚫ There are a number of things copyright doesn't apply to: functional works (Sega v. Accolade), simple compilations of facts (Feist v. Rural Telephone), and APIs or standards (I'm aware of a few cases involving these, though I don't recall specifically if any are precedent). Specifically, however, a copyright license cannot cover facts or functional design. It only applies to a specific expression of an idea. Paraphrase that idea and you're scot-free.
So while you raise an interesting point, it's moot here.
Now we get into the question of whether an abstract API can be copyrighted -- we all assumed no, although the Oracle v. Google case (about Java) suggested an abstract API could be copyrighted. Most of HN thinks that would have fairly disastrous consequences if it holds as law, although I think it's probably going to be litigated more.
I frankly doubt Gruber was assuming that his abstract (poorly specified!) API for Markdown could be copyrighted. And almost everyone on HN seemed to think it would be disastrous if that were the case regarding Java; it would be somewhat hypocritical if the same people think it somehow ought to be the case regarding Markdown.
It's too bad because I actually really like this, but I think it's not going to take off because of all the courtroom drama.
That never hurt Perl... :-) Sorry, couldn't resist.
Your sentiments are well-taken.
No one "owns" Markdown, and quibbling over words is unproductive.
There is a concept: "a simple in-line markup language that is easy to write and maps clearly to HTML". One instance of that concept is a specific, buggy, ill-defined language called "Markdown".
"Standard Markdown" is another instance of the same concept. This is the way English works: adding modifiers to words that designate one instance of a concept to designate other instances of the same concept.
Getting to an EBNF grammar from the spec would be awesome, but the spec is a great place to start. All spec generation is tendentious, and it is common to have multiple people highly invested in the generation of different specs for the same nominal standard. Politics, as Aristotle tells us, is the master art. This is why.
So you think anyone would think it was totally cool if someone released things called 'Standard Excel' and 'Standard MacOS'? That wouldn't get by without criticism, why should this?
That is to say, you can't really spend a decade or two letting "Markdown" sort of mean whatever people want it to mean, and only then cry foul for the N+1 person to use the term. "Excel" and "MacOS" have not been genericized (is that a word?) to the same extent that "Markdown" has.
Legalities aside, this is a dick move that goes directly against the license.
As for your genericization argument, what do you think are the first 2 results in Google for markdown? And the Wikipedia description (3rd result) includes both the syntax and the 'tool'. Popular != Generic
Also, if the problem with "Standard Markdown" is the use of the term "Markdown", then the license is relevant. But that means that everyone else (Multi-markdown, Markdown Extra, Github-flavored Markdown, Python-flavored Markdown, Markdown Python, Markdown PHP, etc.) is equally guilty of a "dick move". I don't see anyone arguing that calling it "XXX Markdown" is a problem. It's calling it Standard Markdown that they have a problem with, and the license doesn't really address that without also addressing everything else.
I do get the reaction to some degree. I would stop short of calling it a "dick move", but maybe it's a bit tone-deaf admittedly. I just think that the horse has left the barn on taking offense that someone implemented something sort of like what you called "Markdown" and called their version some form of "Markdown".
Gruber is OK with "Github flavored Markdown" because the name makes it sound like a fork. "Standard Markdown" does not sound like a fork. See my earlier comment in this thread for a cite to Gruber verifying that this is his problem with the name.
The thing that makes this weird in that context is that I can't think offhand of any examples where the inventor of "X" has been opposed to the idea of doing anything about the mass of incompatible versions floating around. Usually, you have a language, there are a few implementations, and then everyone gets together to nail down what an "official" version should do. My gut is that at some point, if after a decade of pleading, the maintainer of an "official" version is not stepping up, the community is OK to go on without them, but clearly there are arguments on both sides there.
I don't think anyone would dispute that both Excel and MacOS are, in fact, trademarks established through bona fide use in commerce and which have not been genericized, and which, therefore, currently serve to clearly designate the origin of particular commercial products.
I don't think the same consensus exists with Markdown. In fact, I suspect if there were a general consensus that "Markdown" had some clear meaning, whether or not it was to designate the origin of a particular commercial product, the "Standard Markdown" project wouldn't exist.