Parsing is hard. I've worked at several companies where I was literally the only person who understood how to write a recursive descent parser, and similarly the only person who understood how to maintain or modify parsers of any sort.
The thing is, recursive descent, already fairly hard, still seems several steps easier to me than a construct like function derivatives (or parser combinators).
And as far as I can follow this paper, the process seems similar to recursive descent, except that instead of generating code, you generate a bunch of the insides of the code and apply it as an array or something.
And that's the thing. The view of the process is nice and regular for someone who likes to look at languages abstractly, but you have to be leveraging a lot of the laziness of functional languages to make this work.
So altogether, I tend to question whether these approaches are really worth the effort.
Having worked with both recursive descent and parser combinators, I can say that I strongly prefer parser combinators. Haskell has some libraries that are fast, easy to use, and have good documentation.
I've never managed to find a parser combinator library that I love outside of the Haskell family of languages. I'm not sure why that is.
It might have something to do with most parsing libraries there having Monad/Monoid and Alternative instances. Those two together give a truly excellent programmer experience for writing pragmatic, readable, and correct parsers.
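For what it's worth, here's a minimal sketch of what that buys you, using `ReadP` from base rather than any of the faster libraries alluded to above: do-notation (from the Monad instance) sequences parsers, and `<|>` (from Alternative) expresses choice.

```haskell
import Control.Applicative ((<|>))
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP

-- An integer with an optional leading minus sign.
integer :: ReadP Int
integer = do
  sign   <- (char '-' >> pure negate) <|> pure id
  digits <- munch1 isDigit
  pure (sign (read digits))

-- A bracketed, comma-separated list of integers, e.g. "[1,-2,3]".
intList :: ReadP [Int]
intList = between (char '[') (char ']') (integer `sepBy` char ',')

-- readP_to_S runs the parser and returns every (result, leftover) pair;
-- keep only the parses that consumed all of the input.
main :: IO ()
main = print [xs | (xs, "") <- readP_to_S intList "[1,-2,3]"]
```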
In my opinion Brzozowski’s derivatives of regular expressions are worth the effort. I think it's a historical fluke that the approach was overshadowed by other parsing techniques.
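The regular-expression core really is small; a rough sketch (with no simplification of intermediate expressions, so it can blow up on long inputs, but it shows the idea):

```haskell
-- Brzozowski derivatives for plain regular expressions.
data Re = Empty          -- the empty language
        | Eps            -- the language containing only ""
        | Chr Char
        | Alt Re Re
        | Seq Re Re
        | Star Re

-- Does the language of r contain the empty string?
nullable :: Re -> Bool
nullable Empty     = False
nullable Eps       = True
nullable (Chr _)   = False
nullable (Alt a b) = nullable a || nullable b
nullable (Seq a b) = nullable a && nullable b
nullable (Star _)  = True

-- deriv c r is the language of all w such that c:w is in the language of r.
deriv :: Char -> Re -> Re
deriv _ Empty     = Empty
deriv _ Eps       = Empty
deriv c (Chr d)   = if c == d then Eps else Empty
deriv c (Alt a b) = Alt (deriv c a) (deriv c b)
deriv c (Seq a b)
  | nullable a    = Alt (Seq (deriv c a) b) (deriv c b)
  | otherwise     = Seq (deriv c a) b
deriv c (Star r)  = Seq (deriv c r) (Star r)

-- Matching is repeated differentiation plus a final nullability check.
matches :: Re -> String -> Bool
matches r = nullable . foldl (flip deriv) r

-- matches (Seq (Star (Chr 'a')) (Chr 'b')) "aaab"  ==  True
```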
I don't think I'd call that a parser combinator; it's a regex DSL - just a more verbose way of writing out a regular expression, with a 1:1 translation between the DSL and a regex.
"A parser for a thing is a function from a string to a list of pairs of strings and things."
> Traditional parser combinators are just a recursive descent parser DSL, with a 1:1 translation between both representations.
Yes, exactly - a recursive descent _parser_ recognises input according to a grammar and produces the parse result for that grammar. Parser combinators take little parsers and make bigger parsers.
combine-regexp doesn't make parsers; it makes regular expression grammars that you then feed to some _other_ parser that actually does the parsing based on that grammar.
The library combines regular grammars that need an interpreter (the `match` method, which is bundled with the regexp object), whereas parser combinators work on code written in the same language (that is ultimately interpreted by the CPU).
Edit: the parser in this case is a `(Regexp.prototype.match, regex.source)` tuple, whereas in the case of a parser combinator lib in a language that compiles to machine code, it would be `(call instruction, instructions list)`.
A lot of parsing techniques have specialized contexts where they are useful. Hand-written recursive descent dominates for production languages because the whole parser can live in one file and error reporting and recovery are very important. Parser combinators are great for small embedded languages, parser generators are great when efficiency or convenience is desired, and so on.
This paper is hilarious, and Matt & co are geniuses. It’s 6 1/2 pages long and contains a number of gems:
> Since a context-free language is a recursive regular language, it is tempting to use the same code for computing the derivative. From the perspective of parsing, this has two chief drawbacks:
1. It doesn’t work.
2. It wouldn’t produce a parse forest even if it did.
I'm working on a type-inferencer for the Joy language (purely functional, stack-based, concatenative) and I wanted to be able to specify a kind of Kleene star type variable that can stand for zero or more of some other type. I realized I could use Brzozowski's derivative of the K-star at the right place in the unification algorithm. You "split the universe" into two: in one the K-star variable disappears, and in the other it "spawns" a new type variable of its, uh, "starred type". In both cases unification proceeds as normal. It seems to work fine.
The fact that you can splice the Kleene star part of the dRE algorithm into the unification algorithm suggests that you should be able to do the same with the other parts of it and be able to specify union types and intersection types. I'm kinda out of my league, I think, and a lot of the papers that seem relevant are behind paywalls.
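To make the "split the universe" step concrete, here is a very rough sketch of the idea as I read it (this is not the Joy inferencer: it drops type variables and substitutions entirely and only shows how a Kleene-star element forks the match into two worlds):

```haskell
data Ty = TInt | TBool | Star Ty
  deriving (Eq, Show)

-- Match two rows of types, returning the element pairings of every
-- world in which they line up.  A Kleene-star element on the left
-- forks the search: in one world it vanishes, in the other it spawns
-- one element of its starred type and stays around for the next round.
unifyRow :: [Ty] -> [Ty] -> [[(Ty, Ty)]]
unifyRow [] [] = [[]]
unifyRow (Star t : as) bs =
     unifyRow as bs                  -- world 1: the star matches nothing
  ++ unifyRow (t : Star t : as) bs   -- world 2: it spawns one more t
unifyRow (a : as) (b : bs)
  | a == b    = [ (a, b) : rest | rest <- unifyRow as bs ]
  | otherwise = []
unifyRow _ _ = []

-- unifyRow [Star TInt, TBool] [TInt, TInt, TBool]
--   == [[(TInt,TInt),(TInt,TInt),(TBool,TBool)]]
```

A real version would of course thread a substitution over proper type variables instead of comparing constructors for equality.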