
Owl: Parser generator for visibly pushdown languages - ccmcarey
https://github.com/ianh/owl#
======
ccmcarey
I've been looking at parser generators recently in an effort to begin writing
my own simple interpreted language.

My end goal is a self hosted language.

My first attempt was writing a recursive descent parser by hand:
[https://gist.github.com/cmcarey/eee1571721141c356d4f61b453a6...](https://gist.github.com/cmcarey/eee1571721141c356d4f61b453a600f0)

This didn't work so well: the code was badly structured, and I only found out
my grammar was ambiguous after I'd written 600 lines of parser code.

Looking into alternative ways to write a parser, I've seen the usual
yacc/bison/ANTLR type tools and relevant discussions here on HN.

Owl looks fantastic and is probably exactly what I'm looking for. It has a
fantastic interactive page where you enter the grammar and code and it shows a
visual representation of the parse tree:
[https://ianh.github.io/owl/try/#example](https://ianh.github.io/owl/try/#example)

~~~
dbcurtis
The problem with yacc/bison-style LALR parser generators is that you end up
with an LALR parser. Which works great on syntactically correct code, but
trying to get a reasonable error message out of one is about as much fun as
repeatedly poking yourself in the eye with a sharp stick. Also, by the time
you add the empty productions that you want to help with code generation, the
grammar gets delicate and brittle.

OTOH, the parser generators will do a good job of making sure your basic
grammar hangs together. Here is what I have been doing lately:

1) Use ply (Python LALR parser generator by Dave Beazley) to create an accepter
(just parse, no code gen). That will help you chase out all the grammar
inconsistencies.

2) For production, write a hand-written recursive descent parser, and use
"precedence climbing" for expressions. This gives you a nice table-driven way
to handle expressions with a couple of mutually recursive functions, thus
eliminating the main PITA of recursive descent: those darn arithmetic
expressions.
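
For anyone who hasn't seen precedence climbing, here is a minimal sketch in
Python. The operator table, the tuple-based tree, and every name in it (OPS,
tokenize, Parser, parse) are made up for illustration and not taken from any
particular tool. The two mutually recursive functions are parse_atom and
parse_expr.

```python
import re

# Operator table: symbol -> (precedence, associativity). Illustrative only.
OPS = {
    "+": (1, "left"),
    "-": (1, "left"),
    "*": (2, "left"),
    "/": (2, "left"),
    "^": (3, "right"),
}

TOKEN = re.compile(r"\s*(\d+|[-+*/^()])")


def tokenize(text):
    """Split the input into integer, operator and parenthesis tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character near offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse_atom(self):
        """An atom is an integer or a parenthesized expression."""
        tok = self.next()
        if tok == "(":
            value = self.parse_expr(1)
            if self.next() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    def parse_expr(self, min_prec):
        """Parse operators whose precedence is at least min_prec."""
        left = self.parse_atom()
        while True:
            op = self.peek()
            if op not in OPS or OPS[op][0] < min_prec:
                return left
            prec, assoc = OPS[op]
            self.next()
            # Left-associative operators must not recurse into themselves.
            next_min = prec + 1 if assoc == "left" else prec
            right = self.parse_expr(next_min)
            left = (op, left, right)


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.parse_expr(1)
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

Adding an operator means adding one row to the table; the parsing functions
stay the same, which is the appeal over writing one function per precedence
level.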

~~~
jhpriestley
Could you explain in more detail what you mean about LALR error messages? I
haven't used the parser generators you mention, but I've implemented LALR
parsing before, and the error reporting situation doesn't seem drastically
different from recursive descent to me. You can report the location where the
parse failed, what was recognized before that point, and what tokens would
have been allowed to follow.

Is the problem that these specific tools (yacc/bison) don't report errors
well? Or is the issue you're talking about that LR grammars will fail later on
than LL grammars in general? Or the greater ambiguity (e.g. more possible
lookaheads due to the greater parsing power)?

~~~
nickpsecurity
I'll also add that you can use two parsers: one that's super fast on code
that's correctly written and, if that fails, one that makes error messages
easy to handle. sklogic said he used that strategy in his tools for program
analysis. The extra code is negligible. I don't know how much extra time or
maintenance burden it adds, but I figured it's marginal versus the cost of
handling errors in the first place.

~~~
lihaoyi
My FastParse
([https://www.lihaoyi.com/fastparse/](https://www.lihaoyi.com/fastparse/))
parser combinator library does this; by default it runs without error logging,
just giving you an error position, but if something fails you can ask it to
re-do the parse keeping track of additional metadata to give you a nice error
message with what tokens could have succeeded and a stack trace telling you
why it wants those tokens.

Tracing errors slows things down about 2x, which is why it isn't on by default,
on the assumption that most parses are successful.
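
To make the "parse fast, re-parse with tracing on failure" idea concrete, here
is a rough sketch in Python. This is not FastParse's API; the toy grammar
(bracketed integer lists like "[1,2,3]") and the names ParseFailure,
parse_list and parse_with_diagnostics are invented for the example.

```python
class ParseFailure(Exception):
    def __init__(self, pos, expected=None):
        super().__init__(f"parse failed at offset {pos}")
        self.pos = pos
        self.expected = expected  # None unless tracing was enabled


def parse_list(text, trace=False):
    """Parse "[1,2,3]" into a list of ints; record expectations if trace."""
    pos = 0
    furthest = 0
    expected = set()

    def note(label):
        # Bookkeeping that only the tracing pass pays for.
        nonlocal furthest
        if not trace:
            return
        if pos > furthest:
            furthest = pos
            expected.clear()
        if pos == furthest:
            expected.add(label)

    def accept(label, pred):
        nonlocal pos
        if pos < len(text) and pred(text[pos]):
            pos += 1
            return True
        note(label)
        return False

    def fail():
        raise ParseFailure(pos, sorted(expected) if trace else None)

    def integer():
        start = pos
        while accept("digit", str.isdigit):
            pass
        return int(text[start:pos]) if pos > start else None

    if not accept("[", lambda c: c == "["):
        fail()
    items = []
    first = integer()
    if first is not None:
        items.append(first)
        while accept(",", lambda c: c == ","):
            item = integer()
            if item is None:
                fail()
            items.append(item)
    if not accept("]", lambda c: c == "]"):
        fail()
    if pos != len(text):
        note("end of input")
        fail()
    return items


def parse_with_diagnostics(text):
    try:
        return parse_list(text)  # fast path, position-only errors
    except ParseFailure:
        pass
    # Slow path: same parser with tracing on, so the failure it raises
    # carries the set of things that could have matched.
    return parse_list(text, trace=True)
```

Successful inputs never touch the bookkeeping; only a failing input is parsed
twice.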

------
qwerty456127
Is there an example of such a language?

~~~
olooney
It appears that the most important restriction is on recursion:

> This is what guarantees the language is visibly pushdown: all recursion is
> explicitly delineated by special symbols.

> Plain recursion isn't allowed. Only two restricted kinds of recursion are
> available: guarded recursion and expression recursion.

So the Owl grammar itself qualifies:

[https://github.com/ianh/owl/blob/master/grammar.owl](https://github.com/ianh/owl/blob/master/grammar.owl)

JSON certainly qualifies:

[https://github.com/ianh/owl/blob/master/test/json.owl](https://github.com/ianh/owl/blob/master/test/json.owl)

I think the full JavaScript language would have qualified in the past, when
anonymous functions could only be the explicitly delimited "function() {}"...
but now with ES6 we have fat arrow functions (for example "(x,y) => x+y") so I
think modern JavaScript can no longer be parsed with Owl. That's because you
can write something like "f = x => y => z => x+y+z" which has no guard nor can
it be parsed with expression recursion.

I do not think you can do a C-like language because of function types. But you
can do a simpler, C-like language, for which they have a good example in the
test directory:

[https://github.com/ianh/owl/blob/master/test/something.owl](https://github.com/ianh/owl/blob/master/test/something.owl)
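
As a rough illustration of what "visibly pushdown" means, here is a toy
recognizer in Python. It is not how Owl works internally, and the names
(CALLS, RETURNS, is_well_nested) are made up. The point is only that each
input symbol alone decides the stack action: "call" symbols push, "return"
symbols pop, and everything else leaves the stack untouched. It is also
simplified in that it rejects unmatched brackets outright.

```python
# Call symbols mapped to the return symbol that must close them.
CALLS = {"(": ")", "[": "]", "{": "}"}
RETURNS = set(CALLS.values())


def is_well_nested(symbols):
    """Check nesting using a stack driven purely by the symbol class."""
    stack = []
    for sym in symbols:
        if sym in CALLS:
            # Call symbol: always push.
            stack.append(CALLS[sym])
        elif sym in RETURNS:
            # Return symbol: always pop, and it must match.
            if not stack or stack.pop() != sym:
                return False
        # Any other symbol is "internal" and never touches the stack.
    return not stack
```

For example, is_well_nested("{a: [1, (2)]}") is True and is_well_nested("(]")
is False.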

------
imoverclocked
Unfortunate naming. This is not to be confused with OWL or OWL2:

[https://en.wikipedia.org/wiki/Web_Ontology_Language](https://en.wikipedia.org/wiki/Web_Ontology_Language)

~~~
goldfeld
Yes, I'm trying to find resources on this lib, which I guess aren't many since
it's new. All the results on the first page for my query are about the
Ontology language, which is frustrating, so I agree that the naming was a bit
unfortunate.

