
I'm a little surprised at the level of negativity in most of the comments here. I understand our cultural need for criticism, but the critics have touched on almost everything (and note that the most insubstantial claims have bubbled up to the top of the comments):

    Naming [2] [5]
    Closed (initial) development [1]
    Lack of formal grammar [3]
    Lack of tables [4]
    Just use another format (e.g. asciidoc) [6]
    Ambiguity is a feature [7]
Can we please take a moment to appreciate the achievement here? As opposed to fulfilling our own sense of entitlement?

Markdown is a popular language with an ambiguous, poorly specified spec and a buggy default implementation (so buggy that the vast majority of Markdown users have probably never used it, if they even know it exists). Now we have a much better-specified English-language spec that explicitly addresses the challenges of the grammar, a test suite that does a much better job of covering the corner cases, and much better default implementations. These improvements are by no means perfect, but they are improvements.

Yes, we need a formal grammar. Yes, we need tables, and other features. Yes, the naming is unfortunate, though frankly I don't feel sorry for Gruber given that Atwood posted a public letter calling for change two years ago. Yes, these are all issues. But it is an achievement nevertheless, and let's celebrate that.

And then we can get back to work.

----

More generally, I think we have issues with entitlement here on HN. Frankly, we have no fundamental right to either the original Markdown or this new version. But when posts like this come up, we hack away at them as if we had a right to something better, as if the authors should be working to please us personally. We need to stop acting like we have a right to pass judgement on what others release as open source.

[1]: https://news.ycombinator.com/item?id=8265469

[2]: https://news.ycombinator.com/item?id=8266370

[3]: https://news.ycombinator.com/item?id=8264828

[4]: https://news.ycombinator.com/item?id=8265299

[5]: https://news.ycombinator.com/item?id=8266073

[6]: https://news.ycombinator.com/item?id=8264895

[7]: https://news.ycombinator.com/item?id=8266121




The problem is that there's no formal grammar and the spec of "Standard Markdown", while being more specific than John Gruber's, is still full of ambiguities.

Some examples of ambiguities:

1. It does not specify precedence. For example, if a line like "~~~" (or "[ref]: /url") is followed by a setext underline, is that a header, or is that the start of a fenced code block (or ref definition)?

2. The spec says: "Code span backticks have higher precedence than any other inline constructs except HTML tags and autolinks". It gives as an example that "<a href="`">`" is an HTML tag. What happens with different placements of backticks, like "<a `href=""`>" or even "`<a href="">`", is left unspecified.

3. What is the precedence or associativity of span-level constructs? For example, does "<asterisk>a[b<asterisk>](url)" result in "a[b" being emphasised or "b<asterisk>" being linked?

Thing is, a specification-by-example like this would have to keep an ever-growing list of corner cases and give examples for each of them. To get completely unambiguous, the list needs to be very long, and when it gets very long, it becomes unwieldy to handle for an implementer of the spec.

Hence the need for a formal grammar, which is the shortest way of expressing something unambiguously. But it's not possible to write a CFG for Markdown because of Markdown's requirement that anything is valid input. So the next best thing is to define a parsing algorithm, like the HTML5 spec. (Shameless plug: vfmd (http://www.vfmd.org/) is one such Markdown spec which specifies an unambiguous way to parse Markdown, with tests and a reference implementation.)

So if "Standard Markdown" is NOT unambiguous, and won't become so, then it's not a "standard", and calling it "Standard Markdown" is not quite proper.


If you think there are ambiguities, please comment at http://talk.standardmarkdown.com. This is meant to be a provisional spec, up for comment. There is undoubtedly room for improvement.

The C and javascript implementations use a parsing algorithm that we could have simply translated into English and called a spec. (That's the sort of spec vfmd gives.) But it seemed to us that there was value in giving a declarative specification of the syntax, one that was closer to the way a human reader or writer would think, as opposed to a computer.

Re (3): we have an asterisk which can open emphasis. So, to see if we have emphasis, the rules say to parse inlines sequentially until an asterisk that can close emphasis is reached. The first inline we come to is [b*](url), which is a link. There's no closing asterisk, so we don't have emphasis, but a literal asterisk followed by a link.
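A minimal sketch of that left-to-right rule, assuming a deliberately simplified link syntax ([text](url), no nesting, no titles) and single-asterisk emphasis only. `parse_inlines` and `parse` are hypothetical names for illustration, not functions from the stmd C or JavaScript implementations. The point it shows: a link is consumed as one unit, so an asterisk inside it never gets a chance to close the emphasis opened before it.

```python
import re

# Hypothetical, simplified link syntax: [text](url), no nesting, no titles.
LINK = re.compile(r"\[([^\]]*)\]\(([^)\s]*)\)")

def parse_inlines(s, i=0, closer=None):
    """Scan s left to right from index i, stopping early at `closer` if given.

    Returns (nodes, index where scanning stopped)."""
    nodes = []
    while i < len(s):
        if closer is not None and s[i] == closer:
            return nodes, i  # potential closing delimiter for the caller
        m = LINK.match(s, i)
        if m:
            # The whole link is consumed at once, so characters inside it
            # (including "*") can never close an outer emphasis.
            nodes.append(("link", m.group(1), m.group(2)))
            i = m.end()
        elif s[i] == "*":
            inner, j = parse_inlines(s, i + 1, closer="*")
            if j < len(s) and inner:
                nodes.append(("em", inner))  # found a closing asterisk
                i = j + 1
            else:
                nodes.append(("text", "*"))  # no closer: literal asterisk
                i += 1
        else:
            nodes.append(("text", s[i]))
            i += 1
    return nodes, i

def parse(s):
    return parse_inlines(s)[0]

print(parse("*a[b*](url)"))  # [('text', '*'), ('text', 'a'), ('link', 'b*', 'url')]
print(parse("*a*"))          # [('em', [('text', 'a')])]
```

Under these assumptions the output matches the reading described above: a literal asterisk, the text `a`, and a link whose text is `b*`.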

Re (1): I believe you are right that the case of a reference definition before a setext header line should be clarified. However, the other case seems clear enough: ~~~ starts a fenced code block, which ends with a closing string of tildes or the end of the enclosing container. The underline would be included in that code block either way.
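One way to make that ordering explicit is to test block starts in a fixed order, so a fence opener is recognized before the next line is ever considered as a setext underline. This is a minimal sketch with simplified regexes; `classify_first_block` is a hypothetical helper, not part of any implementation, and it ignores the reference-definition case entirely.

```python
import re

# Simplified, hypothetical patterns (a real spec has many more conditions).
FENCE_OPENER = re.compile(r" {0,3}(`{3,}|~{3,})")
SETEXT_UNDERLINE = re.compile(r" {0,3}(=+|-+)[ \t]*$")

def classify_first_block(lines):
    """Decide what the block starting at lines[0] is, testing rules in a fixed order."""
    first = lines[0]
    # Rule 1: a fence opener wins, whatever the next line looks like.
    if FENCE_OPENER.match(first):
        return "fenced_code"
    # Rule 2: a non-blank line followed by an underline is a setext heading.
    if len(lines) > 1 and first.strip() and SETEXT_UNDERLINE.match(lines[1]):
        return "setext_heading"
    return "paragraph"

print(classify_first_block(["~~~", "==="]))    # fenced_code
print(classify_first_block(["Title", "==="]))  # setext_heading
```

The order of the `if` tests is exactly the precedence that has to be written down somewhere, whether in prose or as an algorithm.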

Re (2): I believe the talk of precedence may be misleading here (I thought it would be useful heuristically). The basic principle of inline parsing is to go left to right, consuming inlines that match the specs. This resolves all of these cases. Perhaps the talk of precedence should be removed.

I am no stranger to formal specifications. I wrote what I think was the first PEG grammar for markdown (peg-markdown, which came to be used as the basis for multimarkdown and several other implementations). PEG isn't a good fit, especially for block-level parsing. It almost works for inline-level parsing, but there are some constructs (like code spans) that can't be done in PEGs. It might be worth specifying inline parsing in a pseudo-PEG format to avoid worries like those you've expressed.


jgm, thanks for your comment. I have a lot of respect for your work in Markdown parsing (especially your PEG grammars and Babelmark2), which I find useful and valuable.

I understand what you have now is a provisional spec, but I have reason to believe that a specification based on declaring constructs and defining by examples is never going to get completely unambiguous. A lot of the ambiguity in parsing Markdown lies in the interplay between different syntax constructs. A spec like yours doesn't address them at all, so they remain as ambiguities. All examples of ambiguities I gave involve the interplay of different constructs (more on them below).

> The C and javascript implementations use a parsing algorithm that we could have simply translated into English and called a spec. (That's the sort of spec vfmd gives.)

It's debatable whether translating your code to English is "simple" without talking about memory addresses, pointers and arrays. In any case, vfmd is _not_ such a translation (I'm not saying that you imply that it is). vfmd was first written as a spec, then tests written to match the spec, and then implemented, followed by more tests. (However, the spec did get fixes during testcase development and implementation.)

> But it seemed to us that there was value in giving a declarative specification of the syntax, one that was closer to the way a human reader or writer would think, as opposed to a computer.

I agree there is value in making an easy-to-read syntax description. However, making a readable specification for document-writers and making an unambiguous specification for parser-developers are opposing objectives. The document writer asks "What should I do to get a heading?", while a parser developer asks "How should I interpret a line starting with a hash?". Your spec is good if you target only document writers, but falls short as a spec for parser developers, because of the ambiguities.

In vfmd, I addressed this by creating two documents - one for document-writers and one for parser-developers - that are consistent with each other.

On the specific examples:

> Re (3) ... the rules say to parse inlines sequentially until an asterisk that can close emphasis is reached

Yes, but where does your spec say that an asterisk can not close emphasis if it's contained within a link? As it stands now, going by the rules in the emphasis part of the spec (section 6.4), it should be treated as emphasis, and going by the rules in the link part of the spec (section 4.7), it should be treated as a link. The spec is silent on corner cases where multiple constructs overlap: Does the leftmost construct always win? What happens if it's not a well-formed link? What if three syntax constructs interleave?

> Re (1): ... ~~~ starts a fenced code block, which ends with a closing string of tildes or the end of the enclosing container. The underline would be included in that code block either way.

Going by the setext headers section of your spec (section 4.3), I'm not at all sure why a "~~~" line followed by a "===" line is not a setext header. Yes, your implementation interprets it as a code block, but your spec is ambiguous on how this _should_ be interpreted.

> Re (2): ... The basic principle of inline parsing is to go left to right, consuming inlines that match the specs. This resolves all of these cases. ...

If the basic principle of inline parsing is to go left to right and if all inline constructs should be parsed like that, then "[not a `link](/foo`)" should be interpreted as a link (which is contrary to Example 240 in your spec). Clearly, code spans should have a higher priority, but that needs more than a couple of examples to define correctly.
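The disagreement can be seen in a small sketch: a naive left-to-right link matcher accepts "[not a `link](/foo`)" as a link, while a rule that gives code spans priority rejects it because the code span swallows the link's closing bracket. The link regex, the single-backtick code-span scan, and the "bracket inside a code span" test are all simplifying assumptions for illustration, not the rule stmd actually uses; only the first link candidate is examined, and `find_link` is a hypothetical name.

```python
import re

# Hypothetical, simplified link pattern: [text](destination), no nesting, no titles.
LINK = re.compile(r"\[([^\]]*)\]\(([^)\s]*)\)")

def code_spans(s):
    """Return (start, end) ranges of single-backtick code spans, end exclusive."""
    spans, i = [], 0
    while True:
        start = s.find("`", i)
        if start == -1:
            return spans
        end = s.find("`", start + 1)
        if end == -1:
            return spans  # unmatched backtick: no code span
        spans.append((start, end + 1))
        i = end + 1

def find_link(s, code_first):
    """Return (text, destination) of the first link candidate, or None."""
    m = LINK.search(s)
    if m is None:
        return None
    if code_first:
        bracket = m.end(1)  # index of the "]" that closes the link text
        if any(a <= bracket < b for a, b in code_spans(s)):
            return None  # a code span swallowed the "]", so there is no link
    return m.group(1), m.group(2)

print(find_link("[not a `link](/foo`)", code_first=False))  # ('not a `link', '/foo`')
print(find_link("[not a `link](/foo`)", code_first=True))   # None
```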

This principle also seems to contradict your reply to (3) above, where you say "<asterisk>a[b<asterisk>](url)" is a link, not emphasis.

As noted above, the problem with a declarative spec for Markdown is the ambiguity (which is quite similar to the ambiguity in defining Markdown as a CFG, for example). As long as the spec is declarative, there will be multiple ways of interpreting an input (which, ironically, was the problem that parser-developers found with John Gruber's original Markdown syntax description too). Problems like this cannot be completely solved by providing examples because the combinations between the different constructs are too many to list as examples in a spec.

I only listed these items to illustrate the bigger problems in the design or style of the spec itself. Even if these individual items are addressed, there will always be more coming up, so I don't think it would make sense for me to keep finding and reporting ambiguities to your Discourse forum.


Your comments (coming from someone who has actually tackled this surprisingly difficult task) are some of the most valuable we've received; having them on the Discourse forum would be great.

We considered writing the spec in the state machine vein, but I advocated for the declarative style. It may be worth rethinking that and rewriting it, essentially spelling out the parsing algorithm. As you suggest, a parallel document could be created for writers.

I'll need to study your spec further to see what the substantive differences are.


Thanks. Really happy to see that you're open to a complete rewrite of the stmd spec (to a possible algorithm-based style).

I'll be happy to open a post in talk.commonmark.org on the ambiguity problems caused by using a declarative style for the stmd spec. I'll do that once the forum is back (I can't seem to access it right now).

In parallel, I too will try to work out what the syntax differences are between stmd and vfmd. Meanwhile, please see: http://www.vfmd.org/differences/ (in case you haven't already).



I (for one) can't wait to see what the two of you can do together. I highly respect the work you've both done around Markdown, and I think you could easily accomplish your goals (of a consistent and sensible Markdown parsing rule-set) as a team.


> But it's not possible to write a CFG for Markdown because of Markdown's requirement that anything is valid input.

Honest question here: how do CFGs prevent you from parsing anything as valid input? E.g. AFAICT this CFG in BNF accepts anything as valid input (including no input!)

    <s> ::= <x> EOF
          | EOF

    <x> ::= CHAR <x>
          | CHAR
Isn't it just a problem of crafting a proper grammar which covers all cases? I can see why HTML needs other parsing strategies, but it should be easy for Markdown since everything that is not a non-paragraph is a paragraph (or a blank line, i.e. a paragraph separator).

Also, isn't there a compromise between HTML's crazy parsing strategy and a CFG? A formal grammar, even if not context-free.


True, we can write a CFG that can accept any input, but not one that can parse Markdown.

Actually, I should have said it's not possible to write an _unambiguous_ CFG for Markdown.

Say we need to parse emphasis in span elements. "_a_" is em and "__a__" is strong, but "_a", "a_", "__a" and "a__" are normal text. If we write the rules for all these, we end up with a grammar that can generate the same string in many different ways. To determine whether an "_" is the syntax qualifier of an em or just part of normal text, we might have to look ahead an arbitrary number of characters, and potentially till the end of the input. This is why it's not possible to write a useful (or unambiguous) CFG for Markdown, and this is because of the requirement to not throw an error on any input.
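To see the ambiguity concretely, here is a hypothetical toy grammar (not any real Markdown grammar; strong emphasis is left out) where any character can be literal text and "_" inline+ "_" is emphasis. A small dynamic-programming counter, `count_parses` (a name made up for this sketch), reports how many distinct parse trees each input has.

```python
from functools import lru_cache

# Toy grammar (hypothetical, for illustration only):
#   <doc>    ::= <inline>*
#   <inline> ::= CHAR | "_" <inline>+ "_"
# CHAR is any single character, including "_", so no input is ever rejected.

def count_parses(text: str) -> int:
    @lru_cache(maxsize=None)
    def seq(i: int, j: int, nonempty: bool) -> int:
        """Number of ways to parse text[i:j] as a sequence of inlines."""
        if i == j:
            return 0 if nonempty else 1
        # Choose the first inline text[i:k], then parse the remainder.
        return sum(item(i, k) * seq(k, j, False) for k in range(i + 1, j + 1))

    @lru_cache(maxsize=None)
    def item(i: int, j: int) -> int:
        """Number of ways to parse text[i:j] as exactly one inline."""
        ways = 1 if j - i == 1 else 0  # a single literal character
        if j - i >= 3 and text[i] == "_" and text[j - 1] == "_":
            ways += seq(i + 1, j - 1, True)  # emphasis around a non-empty body
        return ways

    return seq(0, len(text), False)

for sample in ["a", "_a", "_a_", "__a__"]:
    print(repr(sample), count_parses(sample))
```

It reports 1 parse for `a` and `_a`, 2 for `_a_`, and 6 for `__a__`; a parser built on such a grammar still needs some extra rule to pick one tree.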

> Also, isn't there a compromise between HTML's crazy parsing strategy and a CFG?

PEGs have been written for Markdown, and they work because PEGs are inherently unambiguous: instead of allowing several parses, they use ordered choice with backtracking. But those PEGs don't handle nested blocks cleanly.
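A PEG never returns more than one parse: ordered choice tries the first alternative and, if it fails, backtracks and tries the next. Here is a minimal recursive-descent sketch of the hypothetical rules `inline <- emph / .` and `emph <- "_" (!"_" inline)+ "_"`, with no packrat memoization and no nested emphasis. `peg_parse` and its helpers are names made up for this illustration, not taken from peg-markdown.

```python
# Hypothetical PEG rules, written as plain recursive descent:
#   inline <- emph / .
#   emph   <- "_" (!"_" inline)+ "_"

def parse_seq(s, i, inside_em):
    """Parse inlines from index i; inside emphasis, stop at the next "_"."""
    nodes = []
    while i < len(s):
        if inside_em and s[i] == "_":
            break  # the !"_" predicate fails; let parse_em try to close
        result = parse_em(s, i)  # first alternative: emph
        if result is not None:
            node, i = result
        else:
            node, i = ("text", s[i]), i + 1  # second alternative: any char
        nodes.append(node)
    return nodes, i

def parse_em(s, i):
    """Try emph at index i; return (node, next_index), or None to backtrack."""
    if i >= len(s) or s[i] != "_":
        return None
    children, j = parse_seq(s, i + 1, inside_em=True)
    if children and j < len(s) and s[j] == "_":
        return ("em", children), j + 1
    return None

def peg_parse(s):
    return parse_seq(s, 0, inside_em=False)[0]

print(peg_parse("_a_"))    # [('em', [('text', 'a')])]
print(peg_parse("__a__"))  # [('text', '_'), ('em', [('text', 'a')]), ('text', '_')]
```

Each input gets exactly one result; the ambiguity is resolved by the order of the alternatives rather than by anything in the input.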

My own HTML5-ish Markdown spec (http://www.vfmd.org/vfmd-spec/specification/) is not as crazy as HTML5's, but admittedly, is not trivial to implement either.


Interesting reply.

> But those PEGs don't handle nested blocks cleanly.

What's the problem exactly?


With PEGs, it's not possible to express nested content, so we have to collate, for example, the content of blockquotes and process it separately. As I understand it (and I might be wrong here), part of the problem is that a PEG includes tokenization, so it's not possible to handle blockquote levels separately in a tokeniser.

More details on PEG for Markdown:

- from the author of the PEG grammar, who's also the spec-writer of "Common Markdown": http://talk.standardmarkdown.com/t/standard-markdown-formal-...
- from myself: http://www.vfmd.org/introduction/#prior-work

Also, I wrote an expanded explanation of essentially what I said about CFGs and Markdown: http://roopc.net/posts/2014/markdown-cfg/


Well, they are working on it. Rome wasn't built in a day.


It wasn't, but this is not a good way to build it either because (from my previous post):

> A specification-by-example like this would have to keep an ever-growing list of corner cases and give examples for each of them. To get completely unambiguous, the list needs to be very long, and when it gets very long, it becomes unwieldy to handle for an implementer of the spec.


I mean, they've explicitly disclaimed that this is version 1.0, but they've also explicitly claimed that this is complete, which it doesn't seem to be. This spec is extremely repetitive (0-3 spaces is okay, 4 spaces is too much) and doesn't actually follow a "top-down" approach, which would resolve things like precedence. What they've released is something more like a test suite.

Even more troubling, they skipped the chance for some basic innovations which will probably ultimately result in a Standard Markdown 2 spec. So, for example, they are defining Markdown as a mapping to HTML, rather than a mapping to an internal tree structure which can then be serialized to HTML.[1] If you make that change in perspective, then you can have Markdown for other languages too: not just HTML but also literate code in an arbitrary language, for example.

Another innovation which should probably work its way into Markdown as it becomes more of a file format is metadata. It's a little hard to remember, but acceptable metadata tagging was one of the killer features of MP3s, leading ultimately to their global rise. We don't have a good metadata expression for text files, and Markdown's embedded link references are, essentially, a sort of metadata already. Do this before it gets to the W3C so that we can start off a document with a simple

    @author: Chris Drost 
because once the W3C gets a hold of it that's likely to become something more messy like:

    @author[http://www.w3.org/TR/markdown/2/] "Chris\u00a0Drost"
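A reader for the simple `@key: value` header proposed here could be very small. The syntax is only this comment's proposal, not part of any Markdown spec; `split_metadata`, the allowed key characters, and the rule that the header ends at the first non-matching line are all assumptions made for this sketch.

```python
import re

# Hypothetical header syntax: lines like "@author: Chris Drost" at the top.
META_LINE = re.compile(r"@([A-Za-z][\w-]*):\s*(.*?)\s*$")

def split_metadata(text):
    """Return (metadata dict, remaining Markdown body)."""
    meta = {}
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        match = META_LINE.match(lines[i])
        if match is None:
            break  # the header ends at the first line that isn't "@key: value"
        meta[match.group(1)] = match.group(2)
        i += 1
    return meta, "\n".join(lines[i:])

meta, body = split_metadata("@author: Chris Drost \n# Title\n\nBody text.")
print(meta)  # {'author': 'Chris Drost'}
print(body)
```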
Another nice change would be an implementation-dependent $extra sigil$ for inline text.[2] Some LaTeX math sites use Markdown with precisely this extension for inline equations; it might be nice to say "this is mapped to a $ node, but the meaning of that is dependent on the tree-reader; the HTML reader simply prepends and appends a literal dollar sign to the text without embedding it in a tag."

[1] This isn't a huge change in the language, but it's a huge change in perspective. The main decision needed to fix this is to say that "embedded HTML blocks" should have a special sigil at the beginning, which is not the < character of the first tag; those "raw" blocks are then held separately in the Markdown tree, and the serializer to HTML passes the raw blocks through without HTML escaping or embedding in another tag.

[2] Why not just use backticks? We could, of course. One problem here though is that there is no good way to distinguish those literate-code blocks which are commentary and those literate-code blocks which are code to be executed. If you don't fix that now, it will probably be fixed in SM2.
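The "map to a tree, then serialize" perspective (including the raw blocks of footnote [1] and the `$` sigil node) can be sketched with a few node types and two serializers. All class and function names here are hypothetical; this is one possible shape under those assumptions, not a design from any spec.

```python
from dataclasses import dataclass
from html import escape
from typing import List, Union

@dataclass
class Text:
    value: str

@dataclass
class Sigil:
    """Hypothetical $...$ node; each serializer decides what it means."""
    value: str

@dataclass
class Paragraph:
    children: List[Union[Text, Sigil]]

@dataclass
class Raw:
    """Hypothetical raw block, stored apart from ordinary Markdown content."""
    value: str

Block = Union[Paragraph, Raw]

def to_html(doc: List[Block]) -> str:
    out = []
    for block in doc:
        if isinstance(block, Raw):
            out.append(block.value)  # passed through: no escaping, no wrapping tag
            continue
        parts = []
        for child in block.children:
            if isinstance(child, Sigil):
                # The HTML serializer just restores the literal dollar signs.
                parts.append("$" + escape(child.value) + "$")
            else:
                parts.append(escape(child.value))
        out.append("<p>" + "".join(parts) + "</p>")
    return "\n".join(out)

def to_plain_text(doc: List[Block]) -> str:
    # A second serializer over the same tree; raw HTML is simply dropped here.
    return "\n".join(
        "".join(child.value for child in block.children)
        for block in doc
        if isinstance(block, Paragraph)
    )

doc = [Paragraph([Text("x < y, "), Sigil("e^{i\\pi}")]), Raw("<div>raw</div>")]
print(to_html(doc))
print(to_plain_text(doc))
```

Because HTML is just one serializer, a literate-programming or plain-text target only needs another function walking the same tree.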


They worked on it for two years before going public.


So you equate criticism with entitlement. Too bad.

Criticism is valuable. If I release anything into the public, I expect criticism. Heck, I even look forward to being criticized because it means people care about what I did (even if I did it wrong!)


http://daringfireball.net/projects/markdown/license

"Neither the name “Markdown” nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission."

It's a total breach of license, and on a fairly standard BSD too.

I'm seriously getting tired of copy-pasting this, and surprised that y'all are defending this whole thing, but I guess I shouldn't be.

Anyway, I better jet before I start trolling. In the end, I'm ok with making a better markdown, but completely opposed to breaking the law.


Interesting point, but it fails a number of crucial tests:

⚫ The spec isn't code. More to the point, it's not a derivative work of Markdown itself. The BSD license applies to the licensed work as a whole. What it doesn't apply to, specifically, is other works which refer to the original. That is: copyright is not trademark, and you cannot embed an effective trademark license within a copyright license.

⚫ Works (parsers, libraries, etc.) based on the Standard Markdown spec which do not incorporate any of Gruber's original Markdown code are themselves not derivative works of Gruber's original BSD-licensed work, and again, are not governed by the license.

⚫ There are a number of things copyright doesn't apply to: functional works (SEGA v. Accolade), simple compilations of facts (Feist v. Rural Telephone), and APIs or standards (I'm aware of a few cases involving these, though I don't recall specifically if any are precedent). Specifically, however, a copyright license cannot cover facts or functional design. It only applies to a specific expression of an idea. Paraphrase that idea and you're scot-free.

So while you raise an interesting point, it's moot here.


IANAL but that license term only applies to derivatives of the Markdown code.

Markdown is written in Perl. Standard Markdown has released implementations in C and Javascript. It seems unlikely that they are using Markdown code, therefore I don't think they are bound by anything in the Markdown license.


Yep, that license seems to apply to the code.

Now we get into the question of whether an abstract API can be copyrighted. We all assumed no, although the Oracle v. Google case (about Java) suggested an abstract API could be copyrighted. Most of HN thinks that would have fairly disastrous consequences if it holds as law, although I think it's probably going to be litigated more.

I frankly doubt Gruber was assuming that his abstract (poorly specified!) API for Markdown could be copyrighted. And almost everyone on HN seemed to think it would be disastrous if that were the case regarding Java; it would be somewhat hypocritical if the same people think it somehow ought to be the case regarding Markdown.


They are bound by the fact that it's about the name 'Markdown'; the code/spec is irrelevant, hence the words 'Neither the name "Markdown"...'. Discussions about code/spec/philosophy are irrelevant and would be deemed as such in court.

It's too bad because I actually really like this, but I think it's not going to take off because of all the courtroom drama.


You can't just write whatever you want in a license and bind people who aren't using/modifying/distributing your software.


So does any fork of Markdown that uses the word "Markdown" count as a violation of the BSD license until you get written permission from Gruber himself?


> Markdown is a popular language with an ambiguous, poorly specified spec and buggy default implementation...

That never hurt Perl... :-) Sorry, couldn't resist.

Your sentiments are well-taken.

No one "owns" Markdown, and quibbling over words is unproductive.

There is a concept: "a simple in-line markup language that is easy to write and maps clearly to HTML". One instance of that concept is a specific, buggy, ill-defined language called "Markdown".

"Standard Markdown" is another instance of the same concept. This is the way English works: adding modifiers to words that designate one instance of a concept to designate other instances of the same concept.

Getting to an EBNF grammar from the spec would be awesome, but the spec is a great place to start. All spec generation is tendentious, and it is common to have multiple people highly invested in the generation of different specs for the same nominal standard. Politics, as Aristotle tells us, is the master art. This is why.


"This is the way English works: adding modifiers to words that designate one instance of a concept to designate other instances of the same concept."

So you think anyone would think it was totally cool if someone released things called 'Standard Excel' and 'Standard MacOS'? That wouldn't get by without criticism, why should this?


I think you can draw an analogy to trademark law here. I know we're not talking about anyone registering trademarks and suing each other here, but when it's good, law tends to reflect common practice, and I think trademark law actually does that pretty well. I think most reasonable people look at the way trademark law treats issues like this as being more or less how they think the issue should be treated.

That is to say, you can't really spend a decade or two letting "Markdown" sort of mean whatever people want it to mean, and only then cry foul for the N+1 person to use the term. "Excel" and "MacOS" have not been genericized (is that a word?) to the same extent that "Markdown" has.


It's right in the darned license for markdown:

"Neither the name “Markdown” nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission."

Legalities aside, this is a dick move that goes directly against the license.

As for your genericization argument, what do you think are the first 2 results in Google for markdown? And the wikipedia description (3rd result) includes both the syntax and the 'tool'. Popular != Generic


Popularity has no relationship with being generic. The classic case, Kleenex, has the entire first page of Google hits referring to the product of the same name, but it's still a generic mark now. Even if popularity did matter, Google isn't necessarily the best way of measuring that, and if forced to guess, I'd say the percentage of markdown files processed by Gruber's perl implementation is very small.

Also, if the problem with "Standard Markdown" is the use of the term "Markdown", then the license is relevant. But that means that everyone else (Multi-markdown, Markdown Extra, Github-flavored Markdown, Python-flavored Markdown, Markdown Python, Markdown PHP, etc.) are all equally guilty of a "dick move". I don't see anyone arguing that calling it "XXX Markdown" is a problem. It's calling it Standard Markdown that they have a problem with, and the license doesn't really address that without also addressing everything else.

I do get the reaction to some degree. I would stop short of calling it a "dick move", but maybe it's a bit tone-deaf admittedly. I just think that the horse has left the barn on taking offense that someone implemented something sort of like what you called "Markdown" and called their version some form of "Markdown".


> Also, if the problem with "Standard Markdown" is the use of the term "Markdown", then the license is relevant. But that means that everyone else (Multi-markdown, Markdown Extra, Github-flavored Markdown, Python-flavored Markdown, Markdown Python, Markdown PHP, etc.) are all equally guilty of a "dick move".

Gruber is OK with "Github flavored Markdown" because the name makes it sound like a fork. "Standard Markdown" does not sound like a fork. See my earlier comment in this thread for a cite to Gruber verifying that this is his problem with the name.


Maybe we're just too immersed in our own terminology, but to me, "Standard X" (e.g., Standard ML) refers not necessarily to the notion that "our version owns the concept of X" but that "our version of X is standardized". Common Lisp took a slightly different term, but that really reflected their goal with building the language spec -- they weren't after any old standard, they specifically wanted to incorporate common features from the bunches of different Lisp implementations. Calling this "Common Markdown" doesn't quite capture the right meaning, I think.

The thing that makes this weird in that context is that I can't think offhand of any examples where the inventor of "X" has been opposed to the idea of doing anything about the mass of incompatible versions floating around. Usually, you have a language, there are a few implementations, and then everyone gets together to nail down what an "official" version should do. My gut is that at some point, if after a decade of pleading, the maintainer of an "official" version is not stepping up, the community is OK to go on without them, but clearly there are arguments on both sides there.


> So you think anyone would think it was totally cool if someone released things called 'Standard Excel' and 'Standard MacOS'?

I don't think anyone would dispute that both Excel and MacOS are, in fact, trademarks established through bona fide use in commerce and which have not been genericized, and which, therefore, currently serve to clearly designate the origin of particular commercial products.

I don't think the same consensus exists with Markdown. In fact, I suspect if there was a general consensus that "Markdown" had some clear meaning, whether or not it was to designate the origin of a particular commercial product, the "Standard Markdown" project wouldn't exist.


The important thing is that you managed to feel superior to both camps; that's what matters. Criticism is criticism: take it for its merit, not because it hurts people's feelings.


I can't help but think about http://xkcd.com/774/ when I read this comment


Well, you are using the same tactics to feel superior to them all.


Sorry, that accusation expires after one use per conversation.


At least someone understood the reference... :-)


Never underestimate HN's lack of humor ;)



