
Show HN: Fast and Extensible Parser for Markdown in PHP - erusev
http://parsedown.org/
======
ethomson
I don't need another Markdown Parser. I need another Markdown.

Markdown has outgrown its original spec, yet Gruber both clings on to it and
is unwilling to update it. Meanwhile, different websites and different parsers
proliferate, each adding new extensions with varying degrees of usefulness and
compatibility, all under the name "Markdown" or some variation.

I wish GitHub would drop the name "GitHub flavored Markdown", give it a clever
new name, a cleverly branded website and use their bully pulpit to cast off
Gruber's shackles and effect change.

~~~
NaNaN
Agreed. Most new Markdown parsers built for higher speed handle edge cases
differently, including the famous _marked_.

If you don't mind the frustrating syntax tree, you can try Strictdown (not
Markdown) and get some insights for making a better one. (I'm too lazy to
update it these days.)
[https://github.com/jakwings/strictdown](https://github.com/jakwings/strictdown)

~~~
rhythmvs
Strictdown looks thorough: well-done! Added to the inventory of Markdown
parsers and resources.¹

SkrivML² is another thoughtful take on lightweight markup, next generation.

¹ [https://github.com/rhythmus/markdown-resources/](https://github.com/rhythmus/markdown-resources/)
² [http://markup.skriv.org/](http://markup.skriv.org/)

~~~
NaNaN
Thank you! Wonderful collections!

------
simonw
It's important to remember that most markdown implementations (including his
one) cannot be used to provide a safe mechanism for authoring user generated
content without opening a site up to XSS vulnerabilities, since markdown
allows arbitrary HTML markup.
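
To make the concern concrete, here is a minimal Python sketch. It is a toy
converter written for illustration, not Parsedown's code or any real
library's API, and `render_inline` is a hypothetical name. Like the original
Markdown design, it only rewrites the syntax it recognises and passes
everything else, including raw HTML, straight through:

```python
import re


def render_inline(markdown: str) -> str:
    """Toy converter (hypothetical): turns *text* into <em>text</em>.

    Anything that is not recognised Markdown syntax, raw HTML included,
    is copied to the output unchanged.
    """
    return re.sub(r"\*([^*\n]+)\*", r"<em>\1</em>", markdown)


untrusted = "Hello *world* <script>alert(1)</script>"
html_out = render_inline(untrusted)

# The emphasis is converted, but the script tag survives intact.
print(html_out)
```

If `html_out` is echoed into a page as-is, the script runs in the reader's
browser, which is why user-generated Markdown needs escaping or sanitizing
somewhere in the pipeline.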

~~~
McGlockenshire
Easily solved by proper use of HTMLPurifier on the output.

~~~
phpnode
Thus negating any speed improvements in the markdown parser....

~~~
Navarr
Considering you have to run any markdown parser through a sanitizer, the speed
improvements still matter.

~~~
phpnode
In an ideal world, the markdown parser should be able to do it itself.
HTMLPurifier is very slow.

edit:

To whoever downvoted me, I'm sorry, was I wrong? The markdown parser has to
look at every input byte, and so does the HTML sanitizer. It is obviously
better to do the HTML sanitization at the parser level, so that both jobs are
combined into one pass.

Running HTMLPurifier on the output of the markdown parser is inefficient:
it's sanitizing known-good elements, not just the potentially bad ones, so
you're giving it more work to do.
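
A rough Python sketch of the single-pass idea, under heavy assumptions: it
handles only `*emphasis*`, it disables inline HTML entirely (which the
original Markdown allows), and `render_inline_escaped` is a hypothetical name
rather than any library's API. The parser escapes untrusted text while it
emits output, so the tags it generates itself never need a second look:

```python
import html
import re

EMPHASIS = re.compile(r"\*([^*\n]+)\*")


def render_inline_escaped(markdown: str) -> str:
    """Toy single-pass renderer (hypothetical).

    Walks the input once. Text outside and inside emphasis markers is
    HTML-escaped as it is emitted; only the <em> tags produced here are
    written out as real markup.
    """
    out = []
    pos = 0
    for match in EMPHASIS.finditer(markdown):
        # Escape the untrusted text that precedes this emphasis span.
        out.append(html.escape(markdown[pos:match.start()]))
        # Emit a trusted tag around escaped, untrusted content.
        out.append("<em>" + html.escape(match.group(1)) + "</em>")
        pos = match.end()
    # Escape whatever follows the last match.
    out.append(html.escape(markdown[pos:]))
    return "".join(out)


safe = render_inline_escaped("Hi *there* <script>x</script>")
```

A real implementation would need more than this, for example checking link
URL schemes, so treat it as an illustration of where the escaping happens
rather than a complete defence.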

------
brute
Different markdown editors seem to disagree on how to parse the following:
[https://gist.github.com/anonymous/810ae1f7d52bcfffa1ef](https://gist.github.com/anonymous/810ae1f7d52bcfffa1ef)

If the second empty line marks the end of the list block, the indented HTML
(a code block) should preserve its tags.

~~~
stevekemp
Yeah it looks like my site fails at this -
[http://markdownshare.com/view/96996ce5-63bc-45ca-af49-ba18cb...](http://markdownshare.com/view/96996ce5-63bc-45ca-af49-ba18cb53ace1)

------
Navarr
I remember seeing this on /r/PHP, and one of the top comments there was about
it using Regex instead of parsing it like a language.

However, I also recall that it's thanks to using regex that it works so
quickly. So I figured I'd get this argument out of the way before someone else
brought it up.

~~~
nkozyra
I think proper parsing with a lexer and tokens is better for a lot of things,
but it is sometimes overkill when the patterns are predictable and simple.

That said, has there ever really been an issue with _speed_ as it pertains to
markdown translation? I can't imagine it's an everyday, practical concern.

~~~
seer
As with most tech, if there is such a leap in speed (about 10 times), then a
lot of other applications become possible. You could remove a layer of caching
because it's not needed anymore, thus reducing your app's complexity. But
apart from that, imagine how many places use markdown. If people all move to a
10 times faster implementation, that's an incredible reduction in wasted CPU
cycles.

~~~
nkozyra
My point was not that we shouldn't work to produce even small efficiencies
(which, yes, cascade into larger aggregate ones).

It was more wondering whether speed in markdown parsing is such a concern that
this would merit a marquee 'selling' point.

------
nodesocket
We started with a server-side Markdown parser, but switched to a JavaScript
parser ([https://github.com/chjj/marked](https://github.com/chjj/marked)).
Really there is no reason to do this work on the server.

------
zaf
I've worked with several markdown implementations and parsedown is my current
choice due to my main constraint - speed. Great work and thanks for sharing.

~~~
debaserab2
If speed is your top priority you may also want to look at Sundown, which can
be installed as a PHP extension and is likely faster since it's just C.

[https://github.com/chobie/php-sundown](https://github.com/chobie/php-sundown)

------
alphadevx
Looks great, happy to see the Markdown Extra extension. With regard to
performance, I've always gotten around the slowness of the original Markdown
parser by making liberal use of caching, but warming the cache is still
painful for a CMS. Will look to migrate to this.
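
For readers unfamiliar with the caching workaround, a minimal Python sketch
follows. It assumes an in-memory dictionary and a caller-supplied `render`
function; `RenderCache` and its `misses` counter are hypothetical names, not
part of any Markdown library. Rendered HTML is stored under a hash of the
source text, so each distinct document is parsed only once:

```python
import hashlib
from typing import Callable, Dict


class RenderCache:
    """Caches rendered output keyed by a hash of the Markdown source."""

    def __init__(self, render: Callable[[str], str]) -> None:
        self._render = render
        self._store: Dict[str, str] = {}
        self.misses = 0  # how many times the slow renderer actually ran

    def get(self, source: str) -> str:
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if key not in self._store:
            # Cache miss: pay the parsing cost once and remember the result.
            self.misses += 1
            self._store[key] = self._render(source)
        return self._store[key]


# Stand-in renderer for the example; a real one would call a Markdown parser.
cache = RenderCache(lambda text: "<p>" + text + "</p>")
first = cache.get("hello")
second = cache.get("hello")
```

Warming such a cache means calling `get` for every stored document up front,
which is where the parser's raw speed still matters.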

------
phpnode
Parsedown is certainly very fast, but I wouldn't call it "extensible". CeBe's
markdown parser is nearly as fast, but focusses on being very easy to extend,
so it's trivial to add custom syntax elements, see
[https://github.com/cebe/markdown](https://github.com/cebe/markdown)

(CeBe's library is inspired by parsedown)

~~~
seer
> Parsedown is certainly very fast, but I wouldn't call it "extensible".

Parsedown is extensible, and it has already been extended. There's a
well-functioning extension of Parsedown that adds support for Markdown Extra.
It's called Parsedown Extra. It can be found at
[https://github.com/erusev/parsedown-extra](https://github.com/erusev/parsedown-extra)

~~~
lacksconfidence
It is possible to extend, but being extensible requires more. In this case,
ParsedownExtra looks to directly extend the Parsedown class. This is fine for
a single extension, but it discourages the use of multiple independent
extensions.
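
A small Python sketch of the alternative, registration instead of
subclassing. This is an illustration under assumptions, not how Parsedown or
cebe/markdown is actually structured; `TinyParser`, `register`, and the two
extension functions are hypothetical names. Each extension adds its own rule
to a shared parser, so independent extensions can be combined without
inheriting from one another:

```python
import re
from typing import Callable, List, Tuple


class TinyParser:
    """Toy parser whose inline rules are registered rather than inherited."""

    def __init__(self) -> None:
        self._rules: List[Tuple[re.Pattern, Callable]] = []

    def register(self, pattern: str, handler: Callable) -> None:
        # Rules run in registration order.
        self._rules.append((re.compile(pattern), handler))

    def text(self, source: str) -> str:
        for pattern, handler in self._rules:
            source = pattern.sub(handler, source)
        return source


def emphasis_extension(parser: TinyParser) -> None:
    parser.register(r"\*([^*\n]+)\*", lambda m: "<em>" + m.group(1) + "</em>")


def strikethrough_extension(parser: TinyParser) -> None:
    parser.register(r"~~([^~\n]+)~~", lambda m: "<del>" + m.group(1) + "</del>")


parser = TinyParser()
emphasis_extension(parser)
strikethrough_extension(parser)
result = parser.text("*a* and ~~b~~")
```

With single inheritance, combining two such extensions would require one of
them to subclass the other, or a third class that merges both by hand.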

------
tshadwell
Perhaps I am looking at this wrong, but I don't see why you would use a
Markdown parser written in PHP if you're looking for speed. Case in point:
Parsedown is fast because it makes heavy use of regular expressions, which
parse and run faster than equivalent code in the host language. It already
relies on a language other than PHP to essentially emulate parts of a
well-written lexer.

As debaserab2 says[1], if you are looking for speed, consider PHP extensions.

In my opinion, writing a system like this is a misappropriation of PHP, which
evolved from and works best as a hybrid templating/scripting language. It
becomes a powerful development platform when its extensive library of C
functions is used to do most heavy lifting.

[1]
[https://news.ycombinator.com/item?id=7784219](https://news.ycombinator.com/item?id=7784219)

~~~
NaNaN
If someone has done it well without compiling a C library, why don't you try
it (on a shared server, maybe)?

~~~
tshadwell
I don't understand what you mean. Could you also explain the negativity around
this comment? I didn't think it was a badly voiced opinion, and karma is not
meant to be used to show how much you agree or disagree with someone.

~~~
NaNaN
I had no right to downvote your reply (and still don't). ;) Please take it
easy. I just want to say that using a C library is not always preferred.

~~~
tshadwell
I agree, and I did not mean to make any sweeping statement in that regard. But
in the case of a standardised markup like markdown, there are already suites
of field-tested C libraries that provide much better speed than this library
would; for markdown content, that gives your users a better experience.

~~~
NaNaN
Agree, too. Well, plain text sucks. So I like to use smiley symbols now. ;-)

~~~
tshadwell
:¬)

