
Ask HN: How did you learn parsing? - vector_spaces
Most programming language implementation/compiler/interpreter books I know of (EOPL, SICP, PAIP, Dragon Book) seem to sort of gloss over parsing, if they cover it at all, or they use some hack like the homoiconicity of Lisp which isn't super generalizable. PAIP goes so far as to call parsing a "distraction"!!

Or in the case of the Dragon Book, they present it in a way that is very detailed but isn't super helpful for someone who's self-teaching (but might be great for someone in a compilers class with a professor to provide context and focus).

I can hack together some parsing code for the most part if I need it, but I'd like to learn how to do it "properly".

I'm a self-taught working programmer (i.e. no degree in CS), and would appreciate responses from other self-taught people who've learned to write LALR/CLR/Pratt/etc parsers for fun or profit.

What resources or projects have helped you here? Any suggestions for someone in my position?
======
smitchell2
I would suggest Recursive Descent for general parsing, and Precedence Climbing
for expressions.

Once you get the hang of Recursive Descent, parsing becomes mechanical. And
with Precedence Climbing for expressions, you just plug in the appropriate
operator priorities.
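To make the "just plug in the operator priorities" point concrete, here is a minimal precedence-climbing sketch in Python. It assumes a hypothetical pre-tokenized input (a list of strings) and handles only left-associative binary operators; a real parser would also recurse on parentheses and unary operators.

```python
# Minimal precedence-climbing sketch over a pre-tokenized expression.
# The operator priorities live in one table, as described above.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def parse_expr(tokens, min_prec=1):
    """Parse a token list like ["2", "+", "3", "*", "4"] into a nested tuple."""
    left = tokens.pop(0)  # an atom; a fuller parser would recurse on "(" here
    while tokens and tokens[0] in PRECEDENCE and PRECEDENCE[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        # Parse the right-hand side one level tighter, giving left associativity.
        right = parse_expr(tokens, PRECEDENCE[op] + 1)
        left = (op, left, right)
    return left

print(parse_expr(["2", "+", "3", "*", "4"]))  # ('+', '2', ('*', '3', '4'))
```

Adding a new binary operator is then a one-line change to the table, which is what makes this approach so mechanical once recursive descent handles the rest of the grammar.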

Some of the literature below refers to "parsing" as "syntax analysis".

See this introductory set of articles:

[https://www.semware.com/html/compiler.html](https://www.semware.com/html/compiler.html)

Then see this (small, 160 page) book from Wirth:

[http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf](http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf)

See the examples here, especially the C and Python ones (lexical analysis
examples are included, since the lexer must feed the parser):

[http://rosettacode.org/wiki/Compiler/lexical_analyzer](http://rosettacode.org/wiki/Compiler/lexical_analyzer)

[http://rosettacode.org/wiki/Compiler/syntax_analyzer](http://rosettacode.org/wiki/Compiler/syntax_analyzer)

------
kazinator
The "Dragon Book" glosses over parsing? Yagoddabekidding.

It teaches how to write LALR(1) parser generators, with detailed expositions
of the algorithms including step by step execution of shift-reduce parsing.
There is a chapter on implementing regular expressions. It has examples of the
use of the Lex and Yacc tools.

~~~
vector_spaces
My bad, I wasn't being very clear. What I meant with my second sentence was
that it's thorough to the point of being a bit overwhelming for someone who's
self-teaching. That said, I should probably revisit it with a more recent
edition.

Thanks for the response

------
tannhaeuser
The dragon book doesn't "gloss over" parsing at all, and is how I learned to
create practical parsers by hand (recursive-descent and simple bottom-up
parsers) or using lex/yacc. I'd also recommend learning Prolog for easy DCG
parsers and extensions (DCTG/attribute grammars).

------
PaulHoule
The hard way, over a long period of time. (Decades!)

I started out writing parsing code by hand, eventually worked out that parsing
email headers was easy with a finite state machine, got into parser generators
when I needed to, eventually absorbed a lot of the theory, always motivated by
projects that needed the work done.
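As a hedged illustration of why a finite state machine makes email headers easy: each line either starts a new "Name: value" field or, if it begins with whitespace, continues the previous one, so two implicit states suffice. This sketch is my own minimal version, not the commenter's code, and it ignores the many edge cases of real mail parsing.

```python
# Two-state sketch for RFC 822-style headers: "in a field" vs. "between fields".
# A line starting with whitespace is a continuation of the current field.
def parse_headers(text):
    headers, name, value = [], None, None
    for line in text.splitlines():
        if line[:1] in (" ", "\t") and name is not None:
            value += " " + line.strip()        # state: continuing the current field
        elif ":" in line:
            if name is not None:
                headers.append((name, value))  # state: closing field, starting a new one
            name, value = line.split(":", 1)
            value = value.strip()
    if name is not None:
        headers.append((name, value))
    return headers

print(parse_headers("Subject: hello\n world\nFrom: a@b.c"))
```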

If you find the subject hard to learn don't just blame yourself, blame the
subject. The theory and practice of parsing in 2019 is a lot less developed
than it really should be.

(1) Small changes in grammar should be small changes in code. For instance, if
you want to add an "unless" keyword to a language which is the opposite of
"if" that really should be a few lines of code that are physically separate
from the rest of the grammar.

(2) Grammars should be composable: you should be able to make a language that
is like Java, except you can write something like

sql(select count(*) from customers)

and be able to do it by referencing existing grammars and just a few lines of
code.

(3) Parsers should generate idiomatic data structures. If you are working in
Java you should get ordinary objects, maybe in Javascript or Python get JSON-
like structures. Small changes in the grammar (say operator precedence) should
involve small changes in the parse tree.

(4) When you make a parser you should get an "unparser" for free that will
convert the parse tree back to text.

(5) Compiler books will talk about how making error messages that make sense
is a hard problem and stop there. I've seen more than one system at this point
hit the wall (e.g. Drools) because users could not figure out how to decode
the error messages.
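Point (4) can be made concrete with a toy example, assuming the kind of nested-tuple expression trees a small hand-written parser might produce (the tree shape here is my own illustration, not anything from the comment):

```python
# Hedged sketch of an "unparser": a recursive walk that mirrors the grammar,
# turning a tree like ("+", "2", ("*", "3", "4")) back into text.
def unparse(node):
    if isinstance(node, tuple):
        op, left, right = node
        # Fully parenthesize so we don't need to reason about precedence.
        return "(" + unparse(left) + " " + op + " " + unparse(right) + ")"
    return node  # a leaf is already its own source text

print(unparse(("+", "2", ("*", "3", "4"))))  # (2 + (3 * 4))
```

Because the walk mirrors the grammar rule for rule, in principle the unparser could be generated from the same specification as the parser, which is exactly the "for free" property being asked for.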

I think Parsing Expression Grammars are a step in the right direction, but
I've yet to see one that ticks all the boxes above.

I find the old parsers for C written by the Bell Labs people interesting
because of the hacks involved. Namely, a simple mechanism for operator
precedence (try doing that for PEGs!) and a hackish level of dependence
between the early stage parser and the symbol table. Finally the "callback
hell" interface of yacc was a great idea at the time but is insane in modern
application languages.

------
nonsapreiche
For me it started with learning lex and yacc (and the red dragon book).

------
sethgecko
Check out [http://norvig.com/lispy2.html](http://norvig.com/lispy2.html)

