
Show HN: Pest – Fast parser generator written in Rust - dragostis
https://github.com/pest-parser/pest
======
jorams
I'm a bit confused by the benchmark. I looked through the benchmark code and,
without _really_ understanding what I'm looking at, the result of the nom
parser looks much closer to the "pest (custom AST)" benchmark than the "pest"
benchmark. The description above the image, however, primarily compares the
"pest" result.

Am I misunderstanding something or is the comparison not fair?

~~~
dragostis
I've added a clarification. The point of the benchmark is not to compete with
other projects, it's merely there to put the parsing speed (and not
necessarily the processing that comes after it) in a representative window of
performance.

------
weberc2
This is a neat project! I might take a look at this if/when I get started on
my dream language. :p

One constructive criticism, which may be only an unpopular opinion, but I find
creative operator overloading unnecessarily hard to read. And I don't think my
opinion is entirely subjective since, by definition, you have to learn new
semantics whereas a good function name would make the meaning obvious. Maybe
this syntax would be familiar to people who are already familiar with formal
grammar notations?

~~~
dragostis
I agree with your point. I tried my best not to stray too far from some of the
syntax I've seen in other PEG projects, while also keeping it compatible with
Rust's standard macros. While pest doesn't have this constraint any longer with
the new beta, changing the grammar too much would have been a nuisance for
anyone who had grammars written in the older versions.

~~~
weberc2
Understood. Thanks for taking the time to explain!

------
DC-3
What type of parser does this produce?

~~~
dragostis
For now, it produces a recursive descent parser. Packrat parsing is an open
question since I'm afraid that adding a memoization layer all throughout the
parser will lead to a consistent general slowdown.
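
To make the trade-off above concrete, here is a minimal sketch of a hand-written recursive descent recognizer for nested parentheses, next to a packrat-style variant that caches results by start position. The grammar, the function names, and the `HashMap` memo table are all assumptions chosen for illustration; this is not pest's generated code or its API.

```rust
use std::collections::HashMap;

// Hypothetical grammar, for illustration only:
//   parens <- ("(" parens ")")*

/// Plain recursive descent: returns the position just past the longest
/// match of `parens` that starts at `pos`.
fn parens(input: &[u8], mut pos: usize) -> usize {
    loop {
        if input.get(pos) != Some(&b'(') {
            return pos;
        }
        let inner = parens(input, pos + 1);
        if input.get(inner) != Some(&b')') {
            // This group never closed, so give up on it and keep what matched so far.
            return pos;
        }
        pos = inner + 1;
    }
}

/// The same rule with a packrat-style memo table keyed by start position.
/// Every call pays for a lookup and an insert, even when the cached
/// result is never needed again.
fn parens_memo(input: &[u8], start: usize, memo: &mut HashMap<usize, usize>) -> usize {
    if let Some(&end) = memo.get(&start) {
        return end;
    }
    let mut pos = start;
    loop {
        if input.get(pos) != Some(&b'(') {
            break;
        }
        let inner = parens_memo(input, pos + 1, memo);
        if input.get(inner) != Some(&b')') {
            break;
        }
        pos = inner + 1;
    }
    memo.insert(start, pos);
    pos
}

/// True if the whole input is matched by the plain recognizer.
fn matches(input: &str) -> bool {
    parens(input.as_bytes(), 0) == input.len()
}

/// True if the whole input is matched by the memoized recognizer.
fn matches_memo(input: &str) -> bool {
    let mut memo = HashMap::new();
    parens_memo(input.as_bytes(), 0, &mut memo) == input.len()
}
```

Both functions accept the same inputs. For a grammar like this one, which never re-parses the same position, the memo table is pure overhead, which is the kind of general slowdown the comment is worried about.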

~~~
DC-3
Interesting. I'll try and find some time to read the source - am interested to
see how this is implemented.

~~~
dragostis
A good place to start would be in the manually written example. [1] My current
plan is to try and limit the use of memoization such that it still
guarantees linear parsing, but it doesn't memoize unless necessary.

[1]: https://github.com/pest-parser/pest/blob/master/pest/examples/parens.rs#L20

~~~
DC-3
I appreciate the link - I'm teaching myself about parsers right now and like
the rest of HN I like Rust programs ;)

------
OtterCoder
I remember playing with PEGjs back in the day, making a toy implementation of
JS in JS. It had terrible Unicode support though. To build a token out of
valid Unicode non-whitespace characters took pages and pages of range
definitions. Is that any better here?

~~~
runevault
Rust by default uses Unicode for strings. I would be surprised if this had
problems with them.

~~~
OtterCoder
The difficulty wasn't in the engine. JS itself is just as good with accepting
Unicode characters as Rust, both in code and strings. The problem was with the
PEG implementation. Without some sort of shorthand for Unicode families,
defining programming languages that are valid in multiple real-world languages
becomes a serious burden.
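
As a rough illustration of what such a shorthand buys you, the sketch below uses the Unicode-aware predicates on Rust's standard `char` type (`is_whitespace`, `is_alphabetic`, `is_alphanumeric`) in place of hand-written ranges. The helper names `take_token` and `take_identifier` are hypothetical, and this is plain standard-library Rust, not pest's grammar syntax; the thread does not say whether pest itself offers such classes.

```rust
/// Splits off the leading run of non-whitespace characters.
/// Returns (token, rest). `char::is_whitespace` follows the Unicode
/// White_Space property, so no range tables are needed.
fn take_token(input: &str) -> (&str, &str) {
    let end = input
        .char_indices()
        .find(|&(_, c)| c.is_whitespace())
        .map(|(i, _)| i)
        .unwrap_or(input.len());
    input.split_at(end)
}

/// Splits off an identifier: an alphabetic character or '_' followed by
/// any number of alphanumeric characters or '_'. Returns None if the
/// input does not start with an identifier.
fn take_identifier(input: &str) -> Option<(&str, &str)> {
    let mut chars = input.char_indices();
    match chars.next() {
        Some((_, c)) if c.is_alphabetic() || c == '_' => {}
        _ => return None,
    }
    // `chars` has already consumed the first character; scan for the
    // first character that cannot continue the identifier.
    let end = chars
        .find(|&(_, c)| !(c.is_alphanumeric() || c == '_'))
        .map(|(i, _)| i)
        .unwrap_or(input.len());
    Some(input.split_at(end))
}
```

With predicates like these, an identifier rule accepts `переменная` or `名前_1` without a single explicit code point range.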

~~~
yellowapple
This is one of the things Perl 6 does _really_ well.

------
turboladen
Been using this for a couple weeks now and dig it. The DSL feels familiar and
the speed is yummy. I'd been using LALRPOP for some months; I dig it too, but
just couldn't get the speed I was after from it.

------
hobofan
Shouldn't the title be "Pest - Fast, modern parser _in_ Rust"? The
current title made me think that this would be a Rust AST parser, similar to
syntex.

~~~
reificator
Exactly what I came into this thread expecting. Was curious why when it looks
like there's already great IDE support for Rust with existing tools.

~~~
dikaiosune
There's actually a parser for rust called syn, and it's very useful for
procedural macros, as the current API exposes a stream of tokens, not a full
AST.

~~~
steveklabnik
To elaborate a bit on that, exposing an AST directly means that you're tied to
that version of the AST; this was considered really hard for compatibility
over versions. Token streams are much easier to define an interface for, and
more flexible: you're not just stuck with whatever AST representation we'd
have given you.

For example: https://doc.rust-lang.org/book/first-edition/procedural-macros.html

