
You could have invented Parser Combinators - lettergram
http://theorangeduck.com/page/you-could-have-invented-parser-combinators
======
mden
This is an aside, but I really liked the presentation of this article. It
starts from the right place, builds at the right speed, and relates to the
reader in a fun way. The format of statement followed by the same statement
but expressed in code was great. Thanks for writing/linking it.

------
Strilanc
I agree. You can only write so much object-to-data packing code followed by
the redundant data-to-object unpacking code before you realize there's an
exploitable pattern here.

(I had exactly this experience, in fact[0][1]. My name was way worse, though:
"Jars".)

0: [http://twistedoakstudios.com/blog/Post4708_optimizing-a-pars...](http://twistedoakstudios.com/blog/Post4708_optimizing-a-parser-combinator-into-a-memcpy)

1:
[https://github.com/Strilanc/Tinker/blob/master/Warcraft3/Pro...](https://github.com/Strilanc/Tinker/blob/master/Warcraft3/Protocol/Game%20Action%20Protocol.vb#L459)
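
As a rough illustration of the pattern described above (not the code from the linked posts), here is a minimal Python sketch in which one object carries both the packer and the matching unpacker, so the two directions are defined together. The names `Jar`, `fixed`, and `sequence` are hypothetical and chosen only for this example.

```python
import struct
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class Jar:
    """Pairs a packer with its matching unpacker (hypothetical name)."""
    pack: Callable[[Any], bytes]
    unpack: Callable[[bytes], Tuple[Any, bytes]]  # returns (value, remaining bytes)


def fixed(fmt: str) -> Jar:
    """Jar for one fixed-size field described by a struct format string."""
    size = struct.calcsize(fmt)
    return Jar(
        pack=lambda value: struct.pack(fmt, value),
        unpack=lambda data: (struct.unpack(fmt, data[:size])[0], data[size:]),
    )


def sequence(*jars: Jar) -> Jar:
    """Combine jars so that a tuple of values is packed field by field."""
    def pack(values):
        return b"".join(jar.pack(v) for jar, v in zip(jars, values))

    def unpack(data):
        out = []
        for jar in jars:
            value, data = jar.unpack(data)
            out.append(value)
        return tuple(out), data

    return Jar(pack, unpack)


u8 = fixed("<B")    # unsigned byte
u16 = fixed("<H")   # little-endian unsigned 16-bit integer
header = sequence(u8, u16)
```

With this, `header.pack((1, 515))` and `header.unpack(...)` are guaranteed to agree on the layout, because both are derived from the same description.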

------
pygy_
It happened to me while re-implementing LPeg [0] as a pure Lua library[1].

While targeting Lua, the original is actually written in C. It compiles
patterns to bytecode interpreted by a custom VM.

I was wary of writing an interpreter in a language that's already interpreted,
so I chose to compile patterns to Lua functions instead. It turns out that the
most straightforward way to do that is with closures and higher-order
functions, in a way that I later learned was known as parser combinators...

————

0\. [http://www.inf.puc-rio.br/~roberto/lpeg/](http://www.inf.puc-rio.br/~roberto/lpeg/)

1\. [https://github.com/pygy/LuLPeg](https://github.com/pygy/LuLPeg)
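
The "patterns compiled to closures" idea above can be sketched in a few lines. This is an illustration in Python rather than Lua, and it is not how LuLPeg is actually structured; the names `lit`, `seq`, and `choice` are assumptions made for the example.

```python
from typing import Callable, Optional, Tuple

# A parser takes (text, position) and returns (value, new position), or None on failure.
Parser = Callable[[str, int], Optional[Tuple[object, int]]]


def lit(s: str) -> Parser:
    """Match the literal string s at the current position."""
    def parse(text, pos):
        if text.startswith(s, pos):
            return s, pos + len(s)
        return None
    return parse


def seq(*parsers: Parser) -> Parser:
    """Run parsers one after another; fail if any of them fails."""
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse


def choice(*parsers: Parser) -> Parser:
    """Ordered choice: return the result of the first parser that succeeds."""
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse


# Each combinator call returns a closure; the "compiled pattern" is just a function.
greeting = seq(choice(lit("hello"), lit("hi")), lit(" "), lit("world"))
```

Calling `greeting("hi world", 0)` runs the nested closures directly, with no separate interpreter loop.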

------
tel
I recently wrote a little intro to parser combinators in Haskell. It may be of
interest to readers here:
[https://gist.github.com/tel/df3fa3df530f593646a0](https://gist.github.com/tel/df3fa3df530f593646a0)

In particular I exploit the structure of a parser combinator type as a monad
transformer.

------
rkowalick
The title seemed to be based on Timothy Chow's article "You Could Have
Invented Spectral Sequences", a subject of mathematics that is regarded as
unintuitive and advanced.
([http://www.ams.org/notices/200601/fea-chow.pdf](http://www.ams.org/notices/200601/fea-chow.pdf))

~~~
sprobertson
I was going to suggest it was based on Dan Piponi's slightly more relevant
"You Could Have Invented Monads (And Maybe You Already Have)" [1] but after a
quick dig found a comment seemingly by Dan Piponi [2] saying his article was
indeed inspired by "You Could Have Invented Spectral Sequences". So I guess
either way that's the original spark.

1: [http://blog.sigfpe.com/2006/08/you-could-have-invented-monad...](http://blog.sigfpe.com/2006/08/you-could-have-invented-monads-and.html)

2: [http://blog.ezyang.com/2012/02/anatomy-of-you-could-have-inv...](http://blog.ezyang.com/2012/02/anatomy-of-you-could-have-invented/comment-page-1/#comment-3465)

------
nraynaud
I like parser combinators in theory, but I never found a fast
implementation. Recently, in JavaScript, I resorted to parsing the easy common
stuff with a fast regex, and the general stuff with the parser combinator.

I had the same problem in Java.
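
A minimal sketch of that "regex fast path, combinator fallback" arrangement, written in Python for illustration. All names here (`with_fast_path`, `signed_int`, `many1`, `digit`) are hypothetical, and the grammar is deliberately tiny.

```python
import re
from typing import Callable, Optional, Tuple

# A parser takes (text, position) and returns (value, new position), or None on failure.
Parser = Callable[[str, int], Optional[Tuple[object, int]]]


def digit(text, pos):
    """Match a single ASCII digit."""
    if pos < len(text) and text[pos] in "0123456789":
        return text[pos], pos + 1
    return None


def many1(p: Parser) -> Parser:
    """Match p one or more times (p must consume input on success)."""
    def parse(text, pos):
        values = []
        while True:
            result = p(text, pos)
            if result is None:
                break
            value, pos = result
            values.append(value)
        return (values, pos) if values else None
    return parse


def signed_int(text, pos):
    """General (slower) path: an optionally negated integer built from combinators."""
    sign = 1
    if text.startswith("-", pos):
        sign, pos = -1, pos + 1
    result = many1(digit)(text, pos)
    if result is None:
        return None
    digits, end = result
    return sign * int("".join(digits)), end


def with_fast_path(pattern, convert, fallback: Parser) -> Parser:
    """Try a compiled regex first; fall back to the general parser if it misses."""
    def parse(text, pos):
        m = pattern.match(text, pos)
        if m is not None:
            return convert(m.group()), m.end()
        return fallback(text, pos)
    return parse


# The common case (plain unsigned integers) never touches the combinators.
number = with_fast_path(re.compile(r"[0-9]+"), int, signed_int)
```

The fast path is only safe when the regex accepts a subset of what the general parser would accept and produces the same value for it.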

~~~
pseudonom-
Attoparsec is pretty fast.
[http://www.serpentine.com/blog/2014/05/31/attoparsec/](http://www.serpentine.com/blog/2014/05/31/attoparsec/)

------
Animats
That approach is a search algorithm which spends a lot of time exploring dead
ends, to solve a problem for which there are much better solutions. For some
grammars, it looks like it will lead to combinatorial explosion. I did
something like this in SNOBOL a long time ago. Someone commented "That's the
slowest program I've ever seen that does something useful".

~~~
yxhuvud
Well, in practice, it is a variant of recursive descent. It has pretty bad
performance for certain grammars, but it is a perfectly good (and fast)
alternative if you are willing to spend time rewriting the grammar by hand to
fit.

For example, nowadays gcc uses recursive descent to parse C.

Also, not handling arbitrary CFGs well is extremely common; it is a
limitation that LL, LR, LALR, PEG and others all share. Is taking exponential
time to parse something obviously worse than not being able to parse it at
all? No, but if you need to handle ambiguity you should probably use a
different algorithm.
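
To make the "bad performance for certain grammars" point concrete, here is a small Python sketch of a hand-written backtracking recursive descent parser for a toy grammar chosen for this example (`expr := term '+' expr | term`, `term := 'x' | '(' expr ')'`). It counts how often `term` is called on nested parentheses. The grammar and the function name `count_term_calls` are assumptions for illustration only.

```python
def count_term_calls(depth: int) -> int:
    """Parse 'x' wrapped in `depth` pairs of parentheses and count calls to term()."""
    text = "(" * depth + "x" + ")" * depth
    calls = 0

    def term(pos):
        # term := 'x' | '(' expr ')'
        nonlocal calls
        calls += 1
        if text.startswith("x", pos):
            return pos + 1
        if text.startswith("(", pos):
            end = expr(pos + 1)
            if end is not None and text.startswith(")", end):
                return end + 1
        return None

    def expr(pos):
        # expr := term '+' expr | term   (ordered choice with backtracking)
        end = term(pos)
        if end is not None and text.startswith("+", end):
            rest = expr(end + 1)
            if rest is not None:
                return rest
        # The first alternative failed, so its work is discarded
        # and the same term is parsed a second time.
        return term(pos)

    assert expr(0) == len(text)
    return calls
```

Because every nesting level parses its contents twice, the call count roughly doubles with each extra pair of parentheses. Rewriting the rule as `expr := term ('+' expr)?`, so the shared `term` prefix is parsed once, removes the blow-up for this grammar; that is the kind of hand rewriting mentioned above.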

------
kabdib
Pretty neat.

One thing I've found that's nice about recursive descent parsers written by
hand is that you can provide useful and meaningful error messages. 'yacc'
would always just say "error" (and if you helped, the lexer might even supply
a line number).

A message like "Expected a number for operator +" is way better than "syntax
error line 109 around column 30".

~~~
Animats
That was bad design in "yacc". To get decent diagnostics out of a parser
generator of that type, error outputs need to indicate the last point at which
the parser was on a unique, successful path, and where the parser input stream
is at the error. The problem in the input is between those two points.

Between those points, the parser was working on several alternate possible
parsings, none of which could be completed. Give the names of those grammar
constructs to the user. So an error message should look like:

    
    
        struct point { float x, float y };
                              ^^^^^^^^^ Expected one of:
                              <structure-variable-name>
                              ;
                              }
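
A simplified sketch of how a message of that shape could be produced. Note the swap in technique: instead of tracking the last point where an LR-style parser was on a unique path, this Python example uses combinator-style parsers that record the furthest position at which any alternative failed, together with what was expected there. The toy grammar and all names (`Tracker`, `lit`, `regex`, `diagnose`) are hypothetical.

```python
import re


class Tracker:
    """Remembers the furthest failure position and what was expected there."""
    def __init__(self):
        self.furthest = -1
        self.expected = set()

    def fail(self, pos, label):
        if pos > self.furthest:
            self.furthest, self.expected = pos, {label}
        elif pos == self.furthest:
            self.expected.add(label)


def lit(s, label):
    def parse(text, pos, tracker):
        if text.startswith(s, pos):
            return pos + len(s)
        tracker.fail(pos, label)
        return None
    return parse


def regex(pattern, label):
    rx = re.compile(pattern)
    def parse(text, pos, tracker):
        m = rx.match(text, pos)
        if m:
            return m.end()
        tracker.fail(pos, label)
        return None
    return parse


def seq(*parsers):
    def parse(text, pos, tracker):
        for p in parsers:
            pos = p(text, pos, tracker)
            if pos is None:
                return None
        return pos
    return parse


def many(p):
    # Zero or more repetitions; p must consume input whenever it succeeds.
    def parse(text, pos, tracker):
        while True:
            nxt = p(text, pos, tracker)
            if nxt is None:
                return pos
            pos = nxt
    return parse


# Toy grammar: '{' ('float ' name ';')* '}'
member = seq(lit("float ", "'float'"), regex(r"[a-z]+", "<member-name>"), lit(";", "';'"))
block = seq(lit("{", "'{'"), many(member), lit("}", "'}'"))


def diagnose(text):
    """Return None on success, otherwise a two-line message with a caret."""
    tracker = Tracker()
    end = block(text, 0, tracker)
    if end == len(text):
        return None
    if end is not None:
        tracker.fail(end, "end of input")
    expected = ", ".join(sorted(tracker.expected))
    return f"{text}\n{' ' * tracker.furthest}^ expected one of: {expected}"
```

For the input `{float x;int y;}`, the caret lands under `int`, and the message lists both `'float'` and `'}'`, since either would have been acceptable at that point.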

------
windor
I have played with parser combinators in Scala for a while. The post makes it
interesting to read. I like it!

------
segmondy
For those that don't know, Prolog allows parsing in very nice ways and very
easily using Definite Clause Grammars.

    parse(Output) --> [Input], {member(Input, [ab, cd]), atom_codes(Input, Output)}.

    test(In) :- phrase(parse(L), [In]), writeln(L).

    ?- test(ab).
    [97,98]
    true .

    ?- test(cd).
    [99,100]
    true.

    ?- test(xy).
    false.

------
opensandwich
I recently had a go at writing a parser combinator in R for a laugh. It might
be interesting to someone here.
[https://github.com/chappers/Ramble](https://github.com/chappers/Ramble)

------
Clever321
From someone not at all involved in the field of computer science where this
might be applied... where might this be applied in industry? Very compelling
stuff, but I don't have a particular use case...

~~~
zaphar
I work in industry and I recently had to implement a special-purpose
templating language for our product. For various reasons, none of the
off-the-shelf ones quite did what we needed.

As a result I had to write a parser _and_ interpreter. The thing about a lot
of these computer science things is that you don't need them until you do. And
then when you do it's nice to know about how they work.

~~~
MarkL4
Can anyone recommend an article for interpreters that is exactly at the level
of the original link?

~~~
minikomi
Perhaps
[https://www.hashcollision.org/brainfudge/](https://www.hashcollision.org/brainfudge/)

------
eklavya
Thanks for this, I really enjoyed reading it :)

------
Jare
For some reason, this explanation makes me think about Behaviour Trees in AI
more than it does about parsing.

