If you think there are ambiguities, please comment at http://talk.standardmarkdo...

roop · on Sept 5, 2014

jgm, thanks for your comment. I have a lot of respect for your work in Markdown parsing (especially your PEG grammars and Babelmark2), which I find useful and valuable.

I understand what you have now is a provisional spec, but I have reason to believe that a specification based on declaring constructs and defining by examples is never going to get completely unambiguous. A lot of the ambiguity in parsing Markdown lies in the interplay between different syntax constructs. A spec like yours doesn't address them at all, so they remain as ambiguities. All examples of ambiguities I gave involve the interplay of different constructs (more on them below).

> The C and javascript implementations use a parsing algorithm that we could have simply translated into English and called a spec. (That's the sort of spec vfmd gives.)

It's debatable whether translating your code to English is "simple" without talking about memory addresses, pointers and arrays. In any case, vfmd is _not_ such a translation (I'm not saying that you imply that it is). vfmd was first written as a spec, then tests written to match the spec, and then implemented, followed by more tests. (However, the spec did get fixes during testcase development and implementation.)

> But it seemed to us that there was value in giving a declarative specification of the syntax, one that was closer to the way a human reader or writer would think, as opposed to a computer.

I agree there is value in making an easy-to-read syntax description. However, making a readable specification for document-writers and making an unambiguous specification for parser-developers are opposing objectives. The document writer asks "What should I do to get a heading?", while a parser developer asks "How should I interpret a line starting with a hash?". Your spec is good if you target only document writers, but falls short as a spec for parser developers, because of the ambiguities.

In vfmd, I addressed this by creating two documents - one for document-writers and one for parser-developers - that are consistent with each other.

On the specific examples:

> Re (3) ... the rules say to parse inlines sequentially until an asterisk that can close emphasis is reached

Yes, but where does your spec say that an asterisk can not close emphasis if it's contained within a link? As it stands now, going by the rules in the emphasis part of the spec (section 6.4), it should be treated as emphasis, and going by the rules in the link part of the spec (section 4.7), it should be treated as a link. The spec is silent on corner cases where multiple constructs overlap: Does the leftmost construct always win? What happens if it's not a well-formed link? What if three syntax constructs interleave?

> Re (1): ... ~~~ starts a fenced code block, which ends with a closing string of tildes or the end of the enclosing container. The underline would be included in that code block either way.

Going by the setext headers section of your spec (section 4.3), I'm not at all sure why a "~~~" line followed by a "===" line is not a setext header. Yes, your implementation interprets it as a code block, but your spec is ambiguous on how this _should_ be interpreted.

> Re (2): ... The basic principle of inline parsing is to go left to right, consuming inlines that match the specs. This resolves all of these cases. ...

If the basic principle of inline parsing is to go left to right and if all inline constructs should be parsed like that, then "[not a `link](/foo`)" should be interpreted as a link (which is contrary to Example 240 in your spec). Clearly, code spans should have a higher priority, but that needs more than a couple of examples to define correctly.

This principle is also looks contrary to your reply to (3) above, where you say "<asterisk>a[b<asterisk>](url)" is a link, not emphasis.

As noted above, the problem with a declarative spec for Markdown is the ambiguity (which is quite similar to the ambiguity in defining Markdown as a CFG, for example). As long as the spec is declarative, there will be multiple ways of interpreting an input (which, ironically, was the problem that parser-developers found with John Gruber's original Markdown syntax description too). Problems like this cannot be completely solved by providing examples because the combinations between the different constructs are too many to list as examples in a spec.

I only listed these items to illustrate the bigger problems in the design or style of the spec itself. Even if these individual items are addressed, there will always be more coming up, so I don't think it would make sense for me to keep finding and reporting ambiguities to your Discourse forum.

fiddlosopher · on Sept 5, 2014

Your comments (coming from someone who has actually tackled this surprisingly difficult task) are some of the most valuable we've received; having them on the Discourse forum would be great.

We considered writing the spec in the state machine vein, but I advocated for the declarative style. It may be worth rethinking that and rewriting it, essentially spelling out the parsing algorithm. As you suggest, a parallel document could be created for writers.

I'll need to study your spec further to see what the substantive differences are.

roop · on Sept 8, 2014

Thanks. Really happy to see that you're open to a complete rewrite of the stmd spec (to a possible algorithm-based style).

I'll be happy to open a post in talk.commonmark.org on the ambiguity problems caused by using a declarative style for the stmd spec. I'll do that once the forum is back (I can't seem to access it right now).

In parallel, I too will try to work out what the syntax differences are between stmd and vfmd. Meanwhile, please see: http://www.vfmd.org/differences/ (in case you haven't already).

roop · on Sept 12, 2014

Here they are:

Post on declarative style: http://talk.commonmark.org/t/571

Syntax diff: https://github.com/vfmd/vfmd-spec/wiki/commonmark-vs-vfmd

encoder · on Sept 11, 2014

I (for one) can't wait to see what the two of you can do together. I highly respect the work you've both done around Markdown, and I think you could easily accomplish your goals (of a consistent and sensible Markdown parsing rule-set) as a team.