
Is it even possible to define a context-free grammar for markdown?

It is not, and not just because of a few idiosyncrasies like the ones in C that require context to parse.

Fundamentally, markdown was specified as a set of pattern-matching rules plus an English description. The original specification was not written with productions and grammar rules in mind, and you typically don't end up with a grammar by accident.

See http://roopc.net/posts/2014/markdown-cfg/ for a detailed exposition of how the '*' character in markdown is sufficient to ruin any chance of a CFG.

I have sometimes pondered how to make a markdown-like language with a simple production-based grammar. I have not succeeded and would appreciate any pointers. The criterion is that it has to intrude on the prose as little as markdown does.
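
To make the tradeoff concrete, here is a minimal sketch of what a production-based inline syntax could look like. Everything in it is hypothetical (the three productions, the `parse_inline` name, the node tuples); it is not Markdown, not CommonMark, and not the argument from the linked article. The assumed grammar is `inline := (text | emph)*`, `emph := '*' text '*'`, `text := any run of characters other than '*'`.

```python
from typing import List, Tuple

# Hypothetical node shape: ("text", s) or ("emph", s).
Node = Tuple[str, str]


def parse_inline(src: str) -> List[Node]:
    """Parse a toy inline grammar (an assumption, not Markdown):

        inline := (text | emph)*
        emph   := '*' text '*'
        text   := one or more characters other than '*'
    """
    nodes: List[Node] = []
    i = 0
    while i < len(src):
        if src[i] == "*":
            # emph: the grammar demands a closing '*'.
            end = src.find("*", i + 1)
            if end == -1:
                raise ValueError(f"unclosed '*' at offset {i}")
            if end == i + 1:
                raise ValueError(f"empty emphasis at offset {i}")
            nodes.append(("emph", src[i + 1:end]))
            i = end + 1
        else:
            # text: consume up to the next '*' or the end of input.
            end = src.find("*", i)
            if end == -1:
                end = len(src)
            nodes.append(("text", src[i:end]))
            i = end
    return nodes


print(parse_inline("a *b* c"))
```

The catch is visible right away: `parse_inline("2 * 3")` raises `ValueError`, whereas Markdown would keep the lone `*` as a literal character. A strict grammar has to reject that input or require escaping, which is exactly the kind of intrusion into the prose I would like to avoid.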

Is there some kind of more general grammar that can encode the Markdown spec?

There was some discussion of this on the CommonMark forums shortly after the initial release of CommonMark:


See in particular these comments by maradydd:




I'm not sure they ever came to a conclusive answer (at least on this thread).

Edit: Here is JGM himself saying he doesn't know: https://talk.commonmark.org/t/commonmark-formal-grammar/46/3...

If GitHub is normalizing comments anyway, I wonder whether they could have adopted a CFG.

Overall this is a step in the right direction, but the whole saga is a perfect microcosm of our understructure cranking out poorly understood stuff that comes back to bite us and cannot be tamed.

Interesting. I wonder if there's a thesis in solving this problem.
