
Writing a recursive ascent parser by hand - Rusky
https://www.abubalay.com/blog/2018/04/08/recursive-ascent
======
dwheeler
As an _exercise_ in showing how LR works, this was an interesting article.
However, I'm unconvinced that this would be a good approach to implement in a
real system when the grammar might change in the future (and it pretty much
always does). When things change, I think it'd be difficult to keep it correct,
and there are lots of tools to help (such as bison and ANTLR). I've
implemented several recursive-descent parsers; their simplicity makes it
practical to do directly. But recursive-descent, if you're going to use it,
seems to cry out for tooling. Does anyone have a different experience?

~~~
tyingq
Haven't used it myself, but have heard many rave about Perl's
Parse::RecDescent

~~~
swiley
Recursive descent is very different (if I understand correctly.) Some
languages that can be parsed with a recursive ascent parser cannot be parsed
by a recursive descent parser.
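
The usual concrete illustration is left recursion (strictly a property of
grammars rather than languages): an LR/ascent parser can use a rule like
`expr := expr '-' NUM | NUM` exactly as written, while a naive recursive
descent procedure for it would call itself before consuming any input and
never terminate, so an LL parser has to rewrite the rule first. A minimal
sketch of that situation, with a made-up token representation:

    // Toy sketch: the left-recursive rule  expr := expr '-' NUM | NUM
    // is handled bottom-up by reducing "expr '-' NUM" each time a NUM
    // arrives, which is what this loop mimics; top-down code has to be
    // restructured (for example into exactly this kind of loop) to avoid
    // recursing on expr before reading a token.
    fn parse_diff(nums: &[i64]) -> i64 {
        let mut acc = nums[0];      // reduce  expr := NUM
        for &n in &nums[1..] {
            acc -= n;               // reduce  expr := expr '-' NUM
        }
        acc
    }

    fn main() {
        assert_eq!(parse_diff(&[10, 3, 2]), 5); // parsed as (10 - 3) - 2
    }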

~~~
tyingq
Right. Was responding to _"But recursive-descent, if you're going to use it,
seems to cry out for tooling"_

~~~
barrkel
As I mentioned in an uncle comment, recursive descent permits localised
hackery to switch parsing scheme or dynamically fix tricky context-specific
grammar constructs, which argues against tooling.

Tooling is still very useful for grammar validation though, and it's the way
to go for ad hoc parsing of a simple or new language.
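
A rough sketch of the kind of localised hackery meant here (a hypothetical
mini-language, not anyone's real parser): one rule speculatively parses an
ambiguous construct and backtracks at that single decision point, which is
easy to bolt onto hand-written descent but awkward to express in most
generated parsers.

    // When '<' follows a name, speculatively parse a generic argument list
    // and fall back to treating '<' as a comparison if that fails. The
    // backtracking stays confined to this one decision; the rest is plain
    // predictive descent.
    #[derive(Debug)]
    enum Expr {
        Name(String),
        GenericCall(String, Vec<String>),
        Less(Box<Expr>, Box<Expr>),
    }

    struct Parser<'a> { toks: Vec<&'a str>, pos: usize }

    impl<'a> Parser<'a> {
        fn new(src: &'a str) -> Self {
            Parser { toks: src.split_whitespace().collect(), pos: 0 }
        }
        fn peek(&self) -> Option<&'a str> { self.toks.get(self.pos).copied() }
        fn bump(&mut self) -> Option<&'a str> {
            let t = self.peek();
            self.pos += 1;
            t
        }

        fn expr(&mut self) -> Option<Expr> {
            let name = self.bump()?.to_string();
            if self.peek() == Some("<") {
                let saved = self.pos;
                if let Some(args) = self.try_generic_args() {
                    return Some(Expr::GenericCall(name, args));
                }
                self.pos = saved;   // localised backtracking
                self.bump();        // '<' is a comparison after all
                let rhs = self.expr()?;
                return Some(Expr::Less(Box::new(Expr::Name(name)), Box::new(rhs)));
            }
            Some(Expr::Name(name))
        }

        // Tries to match "< T , U > ( )"; returns None without reporting an
        // error so the caller can rewind.
        fn try_generic_args(&mut self) -> Option<Vec<String>> {
            if self.bump()? != "<" { return None; }
            let mut args = Vec::new();
            loop {
                args.push(self.bump()?.to_string());
                match self.bump()? {
                    "," => continue,
                    ">" => break,
                    _ => return None,
                }
            }
            if self.bump()? != "(" || self.bump()? != ")" { return None; }
            Some(args)
        }
    }

    fn main() {
        println!("{:?}", Parser::new("id < T > ( )").expr()); // GenericCall
        println!("{:?}", Parser::new("id < limit").expr());   // Less
    }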

------
Cieplak
This is a pretty cool JSON parser in C:

[https://github.com/vincenthz/libjson/blob/master/json.c](https://github.com/vincenthz/libjson/blob/master/json.c)

------
chubot
I'm pretty sure that Brendan Eich's Narcissus (JavaScript in JavaScript) uses
a "recursive ascent" parser, or what I would call a hand-written bottom-up
parser:

[https://github.com/mozilla/narcissus/blob/master/lib/parser....](https://github.com/mozilla/narcissus/blob/master/lib/parser.js)

I played with this code like 8 years ago and remember being pretty confused by
this style of parser. I was expecting recursive descent but it was something
else.

~~~
Rusky
That definitely looks like normal recursive descent; it has functions
`Script`, `Statements`, `Statement`, `FunctionDefinition`, `Expression`, etc.

~~~
chubot
Recursive ascent / bottom-up also has functions that follow the grammar, so
that doesn't mean anything.

It's a matter of whether the children are resolved before the parents, or the
parents are resolved before the children.

The pushTarget() calls all over make it look like bottom-up parsing. That
doesn't appear in recursive descent (top-down) parsers.

~~~
Rusky
> It's a matter of whether the children are resolved before the parents, or
> the parents are resolved before the children.

Yep, and Narcissus resolves the parents before the children by virtue of
`Script` calling `Statements`, `Statements` calling `Statement`, etc. before
ever actually seeing the bodies of their corresponding production rules.

A bottom-up parser doesn't work that way at all. Its states aren't determined
by what they're about to parse, but by what has already been parsed. The
`pushTarget` function in Narcissus is completely unrelated to bottom-up
parsing.
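
A minimal sketch of the difference, using the toy grammar `E := E '+' n | n`
(my own toy code, not the article's): each function below is one LR state, so
control only ever reaches it because of what has already been parsed, and a
reduction reports how many stack frames still have to be popped before the
state left on top performs the goto.

    #[derive(Clone, Copy)]
    enum Tok { Num(u64), Plus, Eof }

    struct Toks { toks: Vec<Tok>, pos: usize }
    impl Toks {
        fn bump(&mut self) -> Tok {
            let t = self.toks[self.pos];
            self.pos += 1;
            t
        }
    }

    // A reduction to E: its semantic value plus frames left to pop.
    struct Red { value: u64, pops: usize }
    enum Step { Accept(u64), Reduced(Red) }

    // State 0: nothing parsed yet; owns the goto on E.
    fn state0(t: &mut Toks) -> u64 {
        let mut red = match t.bump() {
            Tok::Num(n) => state1(n), // shift n
            _ => panic!("expected a number"),
        };
        loop {
            debug_assert_eq!(red.pops, 0);
            match state2(t, red.value) { // goto on E
                Step::Accept(v) => return v,
                Step::Reduced(r) => red = r,
            }
        }
    }

    // State 1: stack is [n]; reduce E := n (this return pops the only frame).
    fn state1(n: u64) -> Red {
        Red { value: n, pops: 0 }
    }

    // State 2: stack is [E]; accept on EOF, or shift '+' into state 3.
    fn state2(t: &mut Toks, e: u64) -> Step {
        match t.bump() {
            Tok::Eof => Step::Accept(e),
            Tok::Plus => {
                let mut red = state3(t, e);
                red.pops -= 1; // this frame is one of the three being popped
                Step::Reduced(red)
            }
            Tok::Num(_) => panic!("expected '+' or end of input"),
        }
    }

    // State 3: stack is [E, '+']; shift a number into state 4.
    fn state3(t: &mut Toks, e: u64) -> Red {
        match t.bump() {
            Tok::Num(n) => {
                let mut red = state4(e, n);
                red.pops -= 1;
                red
            }
            _ => panic!("expected a number after '+'"),
        }
    }

    // State 4: stack is [E, '+', n]; reduce E := E '+' n, popping 3 frames.
    fn state4(e: u64, n: u64) -> Red {
        Red { value: e + n, pops: 2 } // 2 more frames after this return
    }

    fn main() {
        let mut t = Toks {
            toks: vec![Tok::Num(1), Tok::Plus, Tok::Num(2),
                       Tok::Plus, Tok::Num(3), Tok::Eof],
            pos: 0,
        };
        assert_eq!(state0(&mut t), 6); // ((1 + 2) + 3)
    }

A recursive descent parser for the same grammar would instead enter an `expr`
function before it had seen a single token of the expression.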

~~~
chubot
Yeah, I see what you mean. There is some stuff that doesn't look like a
recursive descent parser to me, but that's probably because it uses some
JavaScript idioms and JavaScript isn't my main language.

~~~
BrendanEich
The code you mentioned, pushTarget in particular, is for semantic checking,
nothing to do with purely syntactic analysis AKA parsing. In particular,
pushTarget is for break to label and continue to label (the "Target" is a
labeled statement's exit or continue point).
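
Roughly the kind of bookkeeping involved, as a hypothetical sketch (made-up
names, not Narcissus's actual code): the parser keeps a stack of enclosing
labeled statements so that a `break label` or `continue label` can be
validated the moment it is seen, in the same pass as parsing.

    struct Checker {
        targets: Vec<String>, // labels of enclosing labeled statements
    }

    impl Checker {
        // Entering `label: stmt` pushes a target...
        fn push_target(&mut self, label: &str) {
            self.targets.push(label.to_string());
        }
        // ...and leaving it pops the target again.
        fn pop_target(&mut self) {
            let _ = self.targets.pop();
        }
        // `break label;` / `continue label;` must name an enclosing target.
        fn check_jump(&self, label: &str) -> Result<(), String> {
            if self.targets.iter().any(|t| t == label) {
                Ok(())
            } else {
                Err(format!("no enclosing statement labeled '{}'", label))
            }
        }
    }

    fn main() {
        let mut c = Checker { targets: Vec::new() };
        c.push_target("outer");                     // outer: while (...) { ... }
        assert!(c.check_jump("outer").is_ok());     // break outer;   is fine
        assert!(c.check_jump("missing").is_err());  // break missing; is an error
        c.pop_target();
    }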

~~~
chubot
OK interesting, thanks. Yes that definitely could be part of my confusion --
that the code is doing more than parsing; it's also doing some semantic
analysis in the same pass.

------
titzer
Early on in the WebAssembly project, when it was still a pre-order AST
encoding, I wrote an LR-style shift-reduce parser for the bytecode. It was a
fun exercise, until it came to debugging. It's fast but relatively hard to
maintain, even for a constrained, designed bytecode like WebAssembly. WASM is
now a stack machine, so parsing maintains an abstract stack. It's remarkably
close to the same algorithm, but a lot easier to think about because of WASM's
(now) explicit stack.
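
The abstract-stack idea looks roughly like this (toy opcodes and types, not
the real wasm binary format): the decoder tracks only the types that would be
on the operand stack, which is what makes it feel so close to the shift-reduce
formulation.

    #[derive(Debug, PartialEq)]
    enum Ty { I32, F32 }

    enum Op { ConstI32, ConstF32, AddI32 }

    // Walks the code once, maintaining an abstract operand stack of types;
    // no AST or tree is ever built.
    fn validate(code: &[Op]) -> Result<Vec<Ty>, String> {
        let mut stack: Vec<Ty> = Vec::new();
        for op in code {
            match op {
                Op::ConstI32 => stack.push(Ty::I32),
                Op::ConstF32 => stack.push(Ty::F32),
                Op::AddI32 => {
                    // add.i32 pops two i32 operands and pushes an i32 result.
                    let rhs = stack.pop().ok_or("stack underflow")?;
                    let lhs = stack.pop().ok_or("stack underflow")?;
                    if rhs != Ty::I32 || lhs != Ty::I32 {
                        return Err(format!("add.i32 applied to {:?}, {:?}", lhs, rhs));
                    }
                    stack.push(Ty::I32);
                }
            }
        }
        Ok(stack)
    }

    fn main() {
        let ok = [Op::ConstI32, Op::ConstI32, Op::AddI32];
        assert_eq!(validate(&ok), Ok(vec![Ty::I32]));
        let bad = [Op::ConstI32, Op::ConstF32, Op::AddI32];
        assert!(validate(&bad).is_err());
    }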

------
fjfaase
Many parsing algorithms were designed with the idea that the input cannot be
stored in memory as a whole and that you need to parse the file from start to
end. Nowadays, however, memory is usually not a problem, especially not for
parsing files that are edited by humans. Who is going to use a gigabyte-sized
source file? Keeping the whole source in memory allows for parsing algorithms
that can go backwards. I implemented a recursive descent algorithm with
backtracking which stores all parsed non-terminals at the location where they
were first parsed. This results in a simple and relatively fast parser. I used
it to implement an interpreting parser, which takes a grammar and an input
file and parses the input file according to the grammar, returning an abstract
parse tree. Actually, the parser takes an abstract parse tree of the grammar
to parse the input; the grammar itself is parsed by the interpreting parser
using a hard-coded abstract parse tree of the grammar of the grammar. See
[https://github.com/FransFaase/IParse](https://github.com/FransFaase/IParse)
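
A toy sketch of the general technique described above (backtracking recursive
descent that memoizes non-terminal results by input position, often called
packrat parsing); it only recognizes input and returns end offsets rather than
building a tree, and it is not IParse's actual code:

    use std::collections::HashMap;

    type Pos = usize;
    // (rule name, start position) -> end position, if that rule matched there.
    type Cache = HashMap<(&'static str, Pos), Option<Pos>>;

    fn term(src: &[u8], pos: Pos, lit: u8) -> Option<Pos> {
        (src.get(pos) == Some(&lit)).then(|| pos + 1)
    }

    fn digit(src: &[u8], pos: Pos, cache: &mut Cache) -> Option<Pos> {
        if let Some(&hit) = cache.get(&("digit", pos)) {
            return hit; // reuse the earlier result at this offset
        }
        let result = src.get(pos).filter(|c| c.is_ascii_digit()).map(|_| pos + 1);
        cache.insert(("digit", pos), result);
        result
    }

    // expr := digit '+' expr | digit   (right recursion, so descent is fine)
    fn expr(src: &[u8], pos: Pos, cache: &mut Cache) -> Option<Pos> {
        if let Some(&hit) = cache.get(&("expr", pos)) {
            return hit;
        }
        let result = expr_uncached(src, pos, cache);
        cache.insert(("expr", pos), result);
        result
    }

    fn expr_uncached(src: &[u8], pos: Pos, cache: &mut Cache) -> Option<Pos> {
        let p = digit(src, pos, cache)?;
        if let Some(p2) = term(src, p, b'+') {
            if let Some(p3) = expr(src, p2, cache) {
                return Some(p3); // alternative 1: digit '+' expr
            }
        }
        Some(p) // alternative 2: bare digit (the backtracking point)
    }

    fn main() {
        let mut cache = Cache::new();
        assert_eq!(expr(b"1+2+3", 0, &mut cache), Some(5)); // whole input matched
    }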

~~~
tom_mellior
> Who is going to use a gigabyte-sized source file?

I used to mess around with the ROSE C++ source-to-source transformation
framework ([http://rosecompiler.org/](http://rosecompiler.org/)). One of the
files in there (defining the AST structure for representing C++ programs) was
about 100,000 lines of autogenerated code. It wasn't heavily templated, I
think, but it did pull in a bunch of standard C++ headers. I could imagine the
whole preprocessed thing approaching a gigabyte.

~~~
fjfaase
Yes, parsing generated source code might be a problem, and yes, C++
preprocessing adds a lot as well. But even 100,000 lines of 80 characters
would still be (a little less than) 8 MB. I did not design my interpreting
parser with maximum performance as a goal, but to have a parser that is easy
to use and allows a lot of flexibility in the grammar specification. I feel
that it is a good parser for domain-specific languages.

~~~
tom_mellior
> even 100,000 lines of 80 characters would still be (a little less than) 8
> MB

Um, you're right of course. I _did_ do the same math in my head but somehow
got confused by two orders of magnitude.

------
pubby
Top-down (descent) parsing gets a nice speed boost when you realize creating
an AST isn't always necessary: the parser visits constructs in the same order
as a traversal of the AST would, so it can often act on them directly instead
of building the tree.

Bottom-up parsing, on the other hand, typically requires creating the AST
first, only to traverse it top-down in a manner resembling recursive descent.
The speed advantages of bottom-up parsing are often lost in the inefficiencies
of the AST.

Of course this doesn't always apply. It depends on the language.
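
A tiny sketch of the first point (hypothetical grammar, nothing from the
linked parsers): because recursive descent visits constructs in evaluation
order, this parser computes the value while parsing and never allocates an
AST.

    struct Parser<'a> { s: &'a [u8], i: usize }

    impl<'a> Parser<'a> {
        // expr := num (('+' | '-') num)*
        fn expr(&mut self) -> i64 {
            let mut v = self.num();
            loop {
                match self.s.get(self.i) {
                    Some(b'+') => { self.i += 1; v += self.num(); }
                    Some(b'-') => { self.i += 1; v -= self.num(); }
                    _ => return v,
                }
            }
        }
        // num := digit+
        fn num(&mut self) -> i64 {
            let mut v = 0;
            while let Some(c) = self.s.get(self.i).filter(|c| c.is_ascii_digit()) {
                v = v * 10 + i64::from(c - b'0');
                self.i += 1;
            }
            v
        }
    }

    fn main() {
        let src: &[u8] = b"12+30-4";
        let mut p = Parser { s: src, i: 0 };
        assert_eq!(p.expr(), 38); // evaluated on the fly, no tree built
    }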

------
master_yoda_1
At first I thought the parser was literally written by hand :)

