The popularity of PEG baffles me. I guess it ties in nicely with the wider and older pattern of rejecting good formal language and parsing theory, but PEG seems like an all around bad technology; like, is there even a single benefit to using PEG?
The difference between PEG and the usual grammars is that PEG introduces special meaning to the "choice" grammar operation so as to define away ambiguity. But defining away ambiguity actually just "hides" language constructions that will be ambiguous to users. Although PEG notation may be indistinguishable from those used to define CFGs, PEG sucks as a language specification tool as it is unclear what language a PEG defines (unlike with CFGs). From the perspective of implementation, PEG parsers backtrack, meaning either exponential run-time or additional linear memory requirements.
Also some Jeffrey Kegler quotes:
> With PEG, what you see in the extended BNF is not what you get. PEG parsing has been called "precise", apparently based on the idea that PEG parsing is in a certain sense unambiguous. In this case "precise" is taken as synonymous with "unique". That is, PEG parsing is precise in exactly the same sense that Jimmy Hoffa's body is at a precise location. There is (presumably) exactly one such place, but we are hard put to be any more specific about the matter.
and
> When you do not know the language your parser is parsing, you of course have the problem that your parser might not parse all the strings in your language. That can be dealt with by fixing the parser to accept the correct input, as you encounter problems.
> A second, more serious, problem is often forgotten. Your PEG parser might accept strings that are not in your language. At worst, this creates a security loophole. At best, it leaves with a choice: break compatiblity, or leave the problem unfixed.
and
> PEG is not unambiguous in any helpful sense of that word. BNF allows you to specify ambiguous grammars, and that feature is tied to its power and flexibility and often useful in itself. PEG will only deliver one of those parses. But without an easy way of knowing which parse, the underlying ambiguity is not addressed -- it is just ignored.
In a formal sense, there are some non-context-free languages that PEG can recognize ( A^n B^n C^n is the classic example ). It’s an open question whether there are any context-free languages PEG isn’t capable of recognizing.
Fundamentally, PEG is a formalization of the common practice of writing a recursive-descent parser: It defines languages based on an exemplar algorithm that parses them instead of the generative approach taken by BNF, which defines how you can enumerate the legal strings of the language. They’re both as mathematically rigorous as each other, but approach the problem space from opposite sides.
For BNF, ambiguity is a question of whether the generation process is reversible: An ambiguous grammar has multiple paths that produce the same string which means that there’s no way to recover the path used to generate that string.
PEG is “unambiguous” because it’s not based on a generative model at all, so this definition is nonsensical. With PEG, we know, by definition, how any given string will parse but it can be hard to enumerate the strings that will parse in a given way. If PEG has an interesting notion of ambiguity (which it probably does), that’s the place to look for it.
PEG for one thing offers the possibility of composibility.
For a language like Python or Java it shouldn't be much harder to add an "unless(X) {Y}" equivalent to "if(!X) {Y}" than it is in LISP if the compiler was structured in such a way that you could take "Java Compiler A" and specialize it to "Java Compiler B" such that one term was added to the grammar and one AST tree transformation that writes the term.
Exclusive of POM files, import statements and other packing material that should be less than 50 lines of code if we built compilers to be extensible.
The arangodb database has a kind of SELECT statement that returns a list of unordered dictionaries which is frustrating if you want to plot the result in a pandas Dataframe. I wrote something in Python using PEG that would parse an adb query and static analyze it and determine the intended field order when definite.
I am up for any compiler technology that makes it easy to do that even if it means I don't get compilation as fast as Go.
Another issue that matters in error handling. Drools is a great rules engine on paper, but as it is composed with an external Java compiler you can often get into cases where the error messages don't make sense at all and it isn't like the Jena rules engine which is simple enough that I could get quick results looking at it in the debugger. That's a complicated example but I think language users deserve better error handling and compiler authors need better tools to do that.
The popularity of PEG baffles me. I guess it ties in nicely with the wider and older pattern of rejecting good formal language and parsing theory, but PEG seems like an all around bad technology; like, is there even a single benefit to using PEG?
The difference between PEG and the usual grammars is that PEG introduces special meaning to the "choice" grammar operation so as to define away ambiguity. But defining away ambiguity actually just "hides" language constructions that will be ambiguous to users. Although PEG notation may be indistinguishable from those used to define CFGs, PEG sucks as a language specification tool as it is unclear what language a PEG defines (unlike with CFGs). From the perspective of implementation, PEG parsers backtrack, meaning either exponential run-time or additional linear memory requirements.
Also some Jeffrey Kegler quotes:
> With PEG, what you see in the extended BNF is not what you get. PEG parsing has been called "precise", apparently based on the idea that PEG parsing is in a certain sense unambiguous. In this case "precise" is taken as synonymous with "unique". That is, PEG parsing is precise in exactly the same sense that Jimmy Hoffa's body is at a precise location. There is (presumably) exactly one such place, but we are hard put to be any more specific about the matter.
and
> When you do not know the language your parser is parsing, you of course have the problem that your parser might not parse all the strings in your language. That can be dealt with by fixing the parser to accept the correct input, as you encounter problems.
> A second, more serious, problem is often forgotten. Your PEG parser might accept strings that are not in your language. At worst, this creates a security loophole. At best, it leaves with a choice: break compatiblity, or leave the problem unfixed. and
> PEG is not unambiguous in any helpful sense of that word. BNF allows you to specify ambiguous grammars, and that feature is tied to its power and flexibility and often useful in itself. PEG will only deliver one of those parses. But without an easy way of knowing which parse, the underlying ambiguity is not addressed -- it is just ignored.