Hacker News
Parsing with Derivatives: A Functional Pearl (2011) [pdf] (might.net)
84 points by fanf2 6 months ago | 16 comments



Parsing is hard. I've worked at several companies where I was literally the only person who understood how to write a recursive descent parser, and similarly the only person who understood how to maintain or modify parsers of any sort.

The thing is, recursive descent, already fairly hard, seems several steps easier to me than a construct like function-derivatives (or parser-combinators).

And as far as I can follow this paper, the process seems similar to recursive descent, except that instead of generating code, you generate a bunch of the insides of the code and apply it as an array or something.

And that's the thing. The view of the process is nice and regular for someone who likes to look at languages abstractly, but you have to be leveraging a lot of the laziness of functional languages to make this work.

So altogether, I tend to question whether these approaches are really worth the effort.


Having worked with both recursive descent and parser combinators, I can say that I strongly prefer parser combinators. Haskell has some libraries that are fast, easy to use, and have good documentation.

I've never managed to find a parser combinator library that I love outside of the Haskell family of languages. I'm not sure why that is.


It might have something to do with most parsing libraries having Monad/Monoid and Alternative instances. Those two together give a truly excellent programmer experience for writing pragmatic, readable, and correct parsers.


In my opinion Brzozowski’s derivatives of regular expressions are worth the effort. I think it's a historical fluke that it was overshadowed by other parsing techniques.

I have some crude Python code that (kind of) illustrates the brevity and elegance of the idea: https://gist.github.com/calroc/ad4321dc4c43d5f3e09d39283461b...


Parser combinators are basically just libraries for writing recursive descent parsers, aren't they?


Mostly, yes. They can be applied with a different formalism though.

See https://github.com/pygy/compose-regexp.js which uses a parser combinator API with regexps as backend.


I don't think I'd call that a parser combinator, it's a regex DSL - it's just a more verbose way of writing out a regular expression, with a 1:1 translation between the DSL and a regex.


A parser combinator is a function that takes one or more parsers as arguments and returns a parser with the same API as its result.

compose-regexp is a parser combinator lib backed by regexps.

Traditional parser combinators are just a recursive descent parser DSL, with a 1:1 translation between both representations.


"A parser for a thing is a function from a string to a list of pairs of strings and things."

> Traditional parser combinators are just a recursive descent parser DSL, with a 1:1 translation between both representations.

Yes, exactly - a recursive descent _parser_ recognises a grammar and produces the parse result for the grammar. Parser combinators take little parsers and make bigger parsers.

compose-regexp doesn't make parsers, it makes regular expression grammars, which you then feed to some _other_ parser that actually does the parsing based on that grammar.


The line is blurrier in my eyes.

The library combines regular grammars that need an interpreter (the `match` method, which is bundled with the regexp object), whereas parser combinators work on code written in the same language (that is ultimately interpreted by the CPU).

Edit: the parser in this case is a `(Regexp.prototype.match, regex.source)` tuple, whereas in the case of a parser combinator lib in a language that compiles to machine code, it would be `(call instruction, instructions list)`.
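The combinators-over-regexps idea can be sketched in a few lines of Python (a toy analogue of my own, not compose-regexp's actual API): the "parsers" are compiled regex objects, and the combinators splice their `.pattern` strings into bigger patterns.

```python
import re

def rx(pattern):
    """Lift a raw pattern into a compiled regex."""
    return re.compile(pattern)

def either(*parts):
    """Combinator: alternation of the given regexes."""
    return re.compile("(?:" + "|".join(p.pattern for p in parts) + ")")

def sequence(*parts):
    """Combinator: concatenation, each part wrapped to preserve grouping."""
    return re.compile("".join("(?:" + p.pattern + ")" for p in parts))

def many(part):
    """Combinator: zero or more repetitions."""
    return re.compile("(?:" + part.pattern + ")*")

digits = many(rx(r"[0-9]"))
number = sequence(digits, either(rx(r"\.[0-9]+"), rx(r"")))
print(bool(number.fullmatch("3.14")))  # True
```

Here the combinators never run anything; they only assemble grammar text, and the regex engine does all the actual parsing, which is the distinction the parent comment is drawing.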


A lot of parsing techniques have specialized contexts where they are useful. Hand-written recursive descent dominates for production languages because it is one file, and error reporting and recovery are very important. Parser combinators are great for small embedded languages, parser generators are great when efficiency or convenience is desired, and so on.


Delighted to see they called their proof of concept library `derp` for derivative parser :-)

http://www.ucombinator.org/projects/parsing/


This paper is hilarious, and Matt & co are geniuses. It’s 6 1/2 pages long and contains a number of gems:

    Since a context-free language is a recursive regular language, it is tempting to use the same code for computing the derivative. From the perspective of parsing, this has two chief drawbacks: 1. It doesn’t work. 2. It wouldn’t produce a parse forest even if it did.


You can't use indentation for block quoting on HN

> Since a context-free language is a recursive regular language, it is tempting to use the same code for computing the derivative. From the perspective of parsing, this has two chief drawbacks: 1. It doesn’t work. 2. It wouldn’t produce a parse forest even if it did.


I'm working on a type-inferencer for the Joy language (purely functional, stack-based, concatenative) and I wanted to be able to specify a kind of Kleene star type variable that can stand for zero or more of some other type. I realized I could use Brzozowski's derivative of the K-star at the right place in the unification algorithm. You "split the universe" into two: in one the K-star variable disappears, and in the other it "spawns" a new type variable of its, uh, "starred type". In both cases unification proceeds as normal. It seems to work fine.

The fact that you can splice the Kleene star part of the dRE algorithm into the unification algorithm suggests that you should be able to do the same with the other parts of it and be able to specify union types and intersection types. I'm kinda out of my league, I think, and a lot of the papers that seem relevant are behind paywalls.


Scala implementation from one of the authors here: https://github.com/djspiewak/parseback



