
Learning Parser Combinators with Rust - lelf
https://bodil.lol/parser-combinators/
======
indentit
This is a very well written and easy to follow article about how parser
combinators work and how to write one in Rust. I know a little Rust, and I
have worked with parser combinators before (for example, `superpower` written
in C#), so I admittedly got bored when the article moved on to whitespace
handling, but only because it explains so well the things I already
understand :)

Personally, these days I prefer to use a regex-based DSL (`.sublime-syntax`
files come to mind, though they output "scoped" tokens which, depending on
one's use case, must then be iterated over to build an AST etc.), because a
parser written in an easy-to-understand DSL is (for me, at least) generally
quicker to reason about and faster to write/tweak than code that uses a
parser combinator library - no need to recompile etc.

This has the additional advantage of being able to recover from errors
gracefully, and allows me to parse any file for which a language grammar for
syntax highlighting already exists.

Obviously there are some downsides to my approach: regex patterns can take
non-linear time to match, depending on the engine, backtracking, etc. But
lookaheads are useful for deciding which state to push next, which I find
harder to do in hand-written code.
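To illustrate the state-pushing mechanism mentioned above, here is a toy `.sublime-syntax` grammar sketched from memory (the language, scope names, and patterns are made up for illustration, not taken from any real grammar):

```yaml
%YAML 1.2
---
# Toy grammar: "main" pushes into a "string" state on a double quote.
name: Toy
file_extensions: [toy]
scope: source.toy
contexts:
  main:
    - match: '\b(if|else|while)\b'
      scope: keyword.control.toy
    - match: '"'
      scope: punctuation.definition.string.begin.toy
      push: string          # enter the "string" state
  string:
    - meta_scope: string.quoted.double.toy
    - match: '\\.'
      scope: constant.character.escape.toy
    - match: '"'
      scope: punctuation.definition.string.end.toy
      pop: true             # return to "main"
```

The `push`/`pop` pairs are where a lookahead-driven regex decides which state to enter next, which is the part I find tedious to replicate in hand-written code.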

So I'm interested to hear other people's opinions on this: how you solve
similar problems and build up your AST/structs with the minimum of effort
and maximum efficiency.

Specifically, the Rust `syntect` crate is able to parse files into tokens
using the aforementioned `sublime-syntax` grammars, and it is quick enough
for my purposes. EDIT: note that this crate is also used by `bat`, the
popular `cat` replacement.

~~~
rightbyte
I usually use re2c or yacc with lex. Return some object from the parser for
each token type and put it in a tree node.

With re2c the code is quite clean, with the regexes and logic inlined in
tagged comments.

These function-chaining approaches get out of hand for any non-trivial
problem you could have solved by just hand-writing some loops anyway.

How are you supposed to read code like this example, with all these
parentheses?

I wonder how the compiler optimizes these chains. You would want the char
literals to end up as immediate values in the machine code, and I am not
sure the compiler is able to produce that from this. I also wonder about
stack depth.

~~~
intertextuality
> How are you supposed to read code like this example, with all these
> parentheses?

Once you get to the end, the examples are quite readable. The initial
implementation was left in to demonstrate an iterative approach.
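For anyone skimming: the style the article builds up to boils down to something like the sketch below (a minimal reconstruction in the article's spirit, not its exact code). A parser is just a function from input to either the remaining input plus a value, or the position where it failed:

```rust
// A parser returns the unconsumed rest of the input and a value on
// success, or the input at the point of failure on error.
type ParseResult<'a, T> = Result<(&'a str, T), &'a str>;

// Match a fixed string at the start of the input.
fn literal<'a>(expected: &'static str) -> impl Fn(&'a str) -> ParseResult<'a, ()> {
    move |input| match input.strip_prefix(expected) {
        Some(rest) => Ok((rest, ())),
        None => Err(input),
    }
}

// Run two parsers in sequence, threading the leftover input through.
fn pair<'a, P1, P2, R1, R2>(p1: P1, p2: P2) -> impl Fn(&'a str) -> ParseResult<'a, (R1, R2)>
where
    P1: Fn(&'a str) -> ParseResult<'a, R1>,
    P2: Fn(&'a str) -> ParseResult<'a, R2>,
{
    move |input| {
        let (rest, r1) = p1(input)?;
        let (rest, r2) = p2(rest)?;
        Ok((rest, (r1, r2)))
    }
}

fn main() {
    // Chain two literal parsers to match the start of a tag.
    let open_tag = pair(literal("<"), literal("div"));
    assert_eq!(open_tag("<div>"), Ok((">", ((), ()))));
    assert_eq!(open_tag("<span>"), Err("span>"));
    println!("ok");
}
```

Once `pair`, `map`, and friends exist, the parentheses mostly disappear behind named combinators, which is what makes the final examples readable.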

If you look at Pom [0], which is an actual crate that uses this sort of
parser combinator, the code is quite readable also - see the JSON example
[1].

[0]: https://github.com/J-F-Liu/pom/blob/master/src/parser.rs

[1]: https://github.com/J-F-Liu/pom/blob/master/examples/json.rs

