
Easy Parsing with Parser Combinators - lihaoyi
http://www.lihaoyi.com/post/EasyParsingwithParserCombinators.html
======
scott_s
Great write up, particularly building up towards parsing simple expressions.

One benefit of this approach over parser generators, hinted at but not stated
explicitly, is that all of the user's parsing code lives in the same place. An
ANTLR grammar that solved this problem would take about the same number of
lines, but that wouldn't be enough. You would still need to write the Java code
that takes actions on the grammar rules, and that code would live in a separate
place. Add in the complexity of setting up your build to deal with generated
source code, and you won't want to go the ANTLR (or other parser generator)
route unless you really have to.

I only compare against parser generators because the other solutions are, to
me, in different classes. Regular expressions are not arbitrarily flexible, so
they're not a full solution. And rolling your own recursive descent parser is
not using an existing library, but going the "hammer and tongs" route (as my
compilers professor used to say).
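
To make the "same place" point concrete, here is a minimal sketch in Python
(not FastParse, and not the article's code). The names `regex`, `seq`, `many`,
`fmap` and `parse_sum` are made up for illustration; the only point is that each
grammar rule and the action attached to it sit next to each other.

```python
import re

# A parser is a function: (text, pos) -> (value, new_pos), or None on failure.

def regex(pattern):
    compiled = re.compile(pattern)
    def parse(text, pos):
        m = compiled.match(text, pos)
        return (m.group(0), m.end()) if m else None
    return parse

def seq(*parsers):
    # Run parsers one after another; fail if any of them fails.
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse

def many(parser):
    # Zero or more repetitions. Assumes `parser` consumes input on success.
    def parse(text, pos):
        values = []
        while True:
            result = parser(text, pos)
            if result is None:
                return values, pos
            value, pos = result
            values.append(value)
    return parse

def fmap(parser, action):
    # Attach an action to a rule: transform the parsed value on success.
    def parse(text, pos):
        result = parser(text, pos)
        if result is None:
            return None
        value, new_pos = result
        return action(value), new_pos
    return parse

# Grammar and actions side by side: total := number ("+" number)*
number = fmap(regex(r"[0-9]+"), int)
plus_number = fmap(seq(regex(r"\+"), number), lambda pair: pair[1])
total = fmap(seq(number, many(plus_number)),
             lambda parts: parts[0] + sum(parts[1]))

def parse_sum(text):
    result = total(text, 0)
    if result is None or result[1] != len(text):
        raise ValueError("could not parse: " + repr(text))
    return result[0]
```

With a parser generator, the three `fmap` actions would typically live in
separate listener or visitor code rather than beside the rules.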

------
okket
For Swift fans, there is a talk on parser combinators (with an implementation)
by Yasuhiro Inami:

[https://realm.io/news/tryswift-yasuhiro-inami-parser-combinator/](https://realm.io/news/tryswift-yasuhiro-inami-parser-combinator/)

The last free episode of Swift Talks by objc.io was about parsing techniques
(highly recommended as an introduction to parsing), and it also had a small
section at the end covering parser combinators.

[https://talk.objc.io/episodes/S01E13-parsing-techniques](https://talk.objc.io/episodes/S01E13-parsing-techniques)

There are at least two parser combinator implementations in Swift on GitHub
(one is from the first talk above, the other from the objc.io book 'Functional
Swift'):

[https://github.com/tryswift/TryParsec](https://github.com/tryswift/TryParsec)

[https://github.com/kareman/FootlessParser](https://github.com/kareman/FootlessParser)

~~~
parenthephobia
Meanwhile, for Ruby fans...

[http://kschiess.github.io/parslet/](http://kschiess.github.io/parslet/) is a
parser combinator library for Ruby.

~~~
monocasa
Meanwhile, for Rust fans...

[https://github.com/Geal/nom](https://github.com/Geal/nom) is a parser
combinator library for Rust.

------
rbonvall
Hi lihaoyi, what advantages does your library have over
`scala.util.parsing.combinator` and parboiled? I have only used the former,
and while I loved it there were a couple of things that I felt could be
improved.

~~~
lihaoyi
There is a short write-up on the main doc site:
[http://www.lihaoyi.com/fastparse/#Comparisons](http://www.lihaoyi.com/fastparse/#Comparisons)

~~~
rbonvall
Thanks, that's exactly what I wanted to know.

------
bd82
Thanks for the article.

What about error recovery capabilities? I did not find any mention of them in
the docs.

Is it something you would want to add to FastParse? Do you think it is even
possible without a separate scanning phase?

------
davexunit
Does this parser combinator implementation handle left-recursive grammars?

~~~
lihaoyi
Nope; unfortunately you need to left-factor them. Other parser combinators may
use different algorithms, but this one is naive PEG/recursive descent, so you
need to help it out manually; otherwise it will just recurse infinitely.
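
As a rough illustration of the manual rewrite (a Python sketch, not FastParse
code; `parse_number` and `parse_expr` are hypothetical names): a rule such as
`expr := expr "-" number | number` calls itself before consuming any input, so
naive recursive descent never terminates on it. Rewriting it as
`expr := number ("-" number)*` and folding the results from the left keeps
subtraction left-associative.

```python
import re

NUMBER = re.compile(r"[0-9]+")

def parse_number(text, pos):
    # Returns (value, new_pos) on success, None on failure.
    m = NUMBER.match(text, pos)
    return (int(m.group(0)), m.end()) if m else None

def parse_expr(text, pos=0):
    # Rewritten rule: expr := number ("-" number)*
    first = parse_number(text, pos)
    if first is None:
        return None
    value, pos = first
    while text.startswith("-", pos):
        rhs = parse_number(text, pos + 1)
        if rhs is None:
            break  # PEG-style: a failed repetition leaves the "-" unconsumed
        operand, pos = rhs
        value = value - operand  # left fold: (10 - 3) - 2, not 10 - (3 - 2)
    return value, pos
```

For example, `parse_expr("10-3-2")` returns `(5, 6)`, which is the
left-associative result.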

~~~
davexunit
Yeah, it's much easier to write a parser combinator implementation that
doesn't support it. Thanks for answering.

For those interested, here is an article about a parser combinator
implementation in Racket that supports left-recursive grammars:

[https://epsil.github.io/gll/](https://epsil.github.io/gll/)

------
fsloth
Do parser combinators have any obvious downsides?

~~~
lihaoyi
Sure! I didn't mention them in the post because it was already getting a bit
long, but here are a few:

- Unlimited backtracking means it's really easy to accidentally end up with
`O(2^n)` parsing behavior. Most real-life languages you parse won't be like
that, but there's nothing in the framework that stops you from going
exponential. In this way, it's just like recursive descent.

- Performance isn't as good as code-gen or hand-written parsers; it's basically
a grammar interpreter rather than a grammar compiler. Depending on the grammar,
we're looking at 4-10x slower than the most optimized hand-written parser. For
example, in the same benchmark, the FastParse JSON parser runs at ~2 million
lines of JSON a second, vs ~8 million lines/second for "industry standard"
parsers on the same platform (JVM) like Jackson. Whether this is a problem for
you is an empirical question.

- They take some initialization time at runtime. A hand-rolled or code-gen
parser is ready almost immediately, whereas a parser-combinator parser takes a
bit of time instantiating `Parser[T]` objects and plumbing them around. This
is a one-time cost, since the parsers are immutable and can be reused, but
it's still a cost if you're trying to shave milliseconds off your startup
time.
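
A small Python sketch of how backtracking can go exponential. It is
hand-written recursive descent rather than FastParse, and the grammar is an
assumption picked for illustration: `expr := term "+" expr / term "-" expr /
term` with `term := "(" expr ")" / "a"`. On nested parentheses every
alternative re-parses the same `term` from scratch, so the number of `term`
calls roughly triples with each nesting level. `count_calls` is a hypothetical
helper that just counts them.

```python
def count_calls(depth):
    # Input like "(((a)))" with `depth` pairs of parentheses.
    text = "(" * depth + "a" + ")" * depth
    calls = 0

    def term(pos):
        # term := "a" / "(" expr ")"; returns the new position or None.
        nonlocal calls
        calls += 1
        if text.startswith("a", pos):
            return pos + 1
        if text.startswith("(", pos):
            end = expr(pos + 1)
            if end is not None and text.startswith(")", end):
                return end + 1
        return None

    def expr(pos):
        # Ordered choice: term "+" expr / term "-" expr / term.
        # Each alternative starts over at `pos`, re-parsing the same term.
        for op in ("+", "-"):
            end = term(pos)
            if end is not None and text.startswith(op, end):
                rest = expr(end + 1)
                if rest is not None:
                    return rest
        return term(pos)

    if expr(0) != len(text):
        raise ValueError("input did not parse")
    return calls
```

Here `count_calls(1)` is 12, `count_calls(2)` is 39 and `count_calls(3)` is
120. Factoring the shared prefix, e.g. `expr := term (("+" / "-") expr)?`,
parses each `term` only once.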

~~~
jhpriestley
It's worth mentioning that, except for the initialization time, these are not
inherent downsides of parser combinators. Any parsing strategy can be
implemented using combinators; for example, the "earley" package for Haskell
produces Earley parsers using combinators. Writing combinators for an
algorithm which needs to analyze the whole grammar does require explicit
fixpoints (`A -> aA` has to be encoded as something like `fix(lambda A. aA)`,
or in Haskell with `MonadFix` syntax), and unsafe type casts.
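
A rough Python sketch of what such an explicit fixpoint can look like. This is
not the Haskell package's API and not an Earley parser; it only shows the
encoding. All names (`fix`, `char`, `empty`, `seq`, `alt`) are hypothetical,
and an empty alternative is added to the rule (`A -> aA | ε`) as an assumption
so that it terminates on finite input.

```python
# A parser here is a function: (text, pos) -> new_pos, or None on failure.

def fix(make_parser):
    # Tie the recursive knot explicitly: `make_parser` receives a stand-in for
    # the rule being defined and returns the rule's actual parser.
    def recursive(text, pos):
        return parser(text, pos)
    parser = make_parser(recursive)
    return parser

def char(c):
    def parse(text, pos):
        return pos + 1 if text.startswith(c, pos) else None
    return parse

def empty(text, pos):
    # Matches the empty string at any position.
    return pos

def seq(first, second):
    def parse(text, pos):
        mid = first(text, pos)
        return None if mid is None else second(text, mid)
    return parse

def alt(first, second):
    # Ordered choice: try `first`, fall back to `second`.
    def parse(text, pos):
        end = first(text, pos)
        return end if end is not None else second(text, pos)
    return parse

# A -> aA | ε, written without referring to a global name `A` in its own body.
A = fix(lambda A: alt(seq(char("a"), A), empty))
```

Because the recursion goes through `fix` rather than through the host
language's own name binding, a library has a single place where it can observe
that the rule refers to itself.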

~~~
lihaoyi
Yes, you are right. These are limitations of the PEG/recursive-descent style
of parser combinators, of which FastParse is a member. If you use a set of
parser combinators with a different execution model (e.g.
[https://github.com/djspiewak/gll-combinators](https://github.com/djspiewak/gll-combinators)),
the API will be similar, but the performance and runtime characteristics will
be totally different.

------
fpoling
The article does not touch on the issue of error reporting. Can parser
combinators produce sensible messages without much effort?

~~~
lihaoyi
The main doc site has a section on debugging and error reporting:
[http://www.lihaoyi.com/fastparse/#DebuggingParsers](http://www.lihaoyi.com/fastparse/#DebuggingParsers)

------
DenisM
It would help to have a list of similar libraries for languages other than
Scala.

~~~
stirner
I'm not very deep into parsing stuff, but I believe attoparsec [1] is popular
in the Haskell world.

[1]
[https://hackage.haskell.org/package/attoparsec](https://hackage.haskell.org/package/attoparsec)

