Hacker News new | past | comments | ask | show | jobs | submit login
Left-Recursive Peg Grammars (medium.com/gvanrossum_83706)
52 points by pauloxnet on Aug 26, 2019 | hide | past | favorite | 6 comments



Great blog. I haven't used PEG and have learned a lot from this post. However, handling left-recursive in PEG seems complex.

I've recently written a Java parser by hand (https://github.com/tanin47/javaparser.rs) and aimed to emulate a bottom-up parsing as much as possible.

The hand-written bottom-up parsing approach seems more intuitive because it enables us to construct`left` + `right` immediately before looking further.

For the same problem, the hand-written bottom-up parsing approach would look like below:

    def expr1(input):
      left, input = expr2(input)
      return expr1_tail(left, input)

    def expr1_tail(left, input):
      if input[0] == '+':
        right, input = expr2(s[1..]);

        new_expr = BinaryOp(left, input[0], right);

        return expr1_tail(new_expr, input);
      else:
        return left, input

    def term(s):
      if input[0] == '(':
        new_expr, input = expr1(input[1..])
        assert input[0] == ')'
        return new_expr, input[1..]
      else 
        # parsing numbers here
        return number, the_rest_of_input
(Please excuse my python... I'm a bit rusty)


You could even implement this with a loop instead of a recursive call, to avoid running out of stack.

Please note that memorization is a simple trick to solve a lot of parsing problems. I developed an intepretting parser, that basically does back-tracking, which is sped-up with memorization. Because the core interpretter is relatively small (fits in primary cache), it performs quite well. You simply give it a very natural looking grammar, with some annotations, and it builds you an abstract parse tree. I also implemented an unparser, which can be used to produce nicely formatted output for each abstract parse tree (just using the same grammar, with some annotations about where to insert line breaks and indentation). For more details see https://github.com/FransFaase/IParse


I got all the way to the end of the article without noticing who the author was, thinking "what a clearly written piece -- I should keep an eye out for this author in the future".

Then I realized it was an author I knew.


I’m not sure if taking the longest match is necessarily right. Imagine a (nonsensical) rule like:

  LeftRec <= LeftRec ‘+’ Term
           / “ok”
           / [a-z]*[0-9]
According to PEG, “ok3” should consume only “ok” here, and leave the 3.


Well, I had the same epiphany 5 years ago, and I implemented it in a C++ library here:

https://github.com/axilmar/parserlib

It's nice to be able to handle left-recursive grammars in a PEG parser. It feels liberating. Although I never used it to write my dream programming language :-).

Note that the same thing can be done via a stack of alternate parsers: when the recursive parser is on top of the stack, then the one below it shall be chosen.


For those who just want to read the article without Medium nonsense: https://outline.com/jWuKxc




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: