
Classp: a “classier” way to parse - rey12rey
http://google-opensource.blogspot.com/2015/03/classp-classier-way-to-parse.html?m=0
======
munro
> Parser generators are a great help in writing parsers, but they all have one
> major drawback--they are based on grammars. So what's wrong with grammars?
> Well ... nothing as an expository tool. When you want to diagram a complex
> rule for deriving tree-like structures within a string of symbols, nothing
> beats a good old-fashioned grammar. But when you want to write a computer
> program to convert a string into the logical structure that it represents
> and then use that logical structure for further processing, then grammars
> are not ideal.

Not sure what they mean by "logical structure". But I've used Parsec to
convert a string into an AST, then did a second pass with Parsec to turn that
AST into VM representations of what the program is talking about. That worked
really well for me. :)
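For anyone curious what that two-pass shape looks like, here's a minimal
sketch. Parsec is real; the `Expr` AST and the "VM" instruction type are
invented for illustration, and the second pass is shown as a plain fold
rather than a Parsec pass:

```haskell
-- Pass 1: Parsec turns the string into an AST.
-- Pass 2: an ordinary fold lowers the AST to (hypothetical) stack-VM code.
import Text.Parsec
import Text.Parsec.String (Parser)

data Expr = Num Int | Add Expr Expr deriving (Show, Eq)

-- Pass 1: string -> AST
number :: Parser Expr
number = Num . read <$> many1 digit

expr :: Parser Expr
expr = number `chainl1` (Add <$ char '+')   -- left-associative '+'

data Instr = Push Int | AddTop deriving (Show, Eq)

-- Pass 2: AST -> stack-machine instructions
compile :: Expr -> [Instr]
compile (Num n)   = [Push n]
compile (Add a b) = compile a ++ compile b ++ [AddTop]

main :: IO ()
main = print (fmap compile (parse expr "" "1+2+3"))
```

(Parsec can run over the AST for the second pass too, via its custom-stream
support, but for straightforward lowering a fold is usually all you need.)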

~~~
tel
And if that's not a "logical" enough structure, you can skip the AST step if
you really like and parse directly into the final effect.

~~~
david-gudeman
By "logical structure" I mean the AST. I make this clear in the rest of the
document. The point is, with Classp you don't have to write a grammar to
create a parse tree and then convert to an AST; you write a specification that
produces the AST directly.
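To make the idea concrete (this is not actual Classp syntax, just a Haskell
sketch of the concept): the concrete syntax is stated once, alongside the AST
type, as a formatter, and the parser is its mechanical inverse. Here the
inverse is written by hand to show the correspondence; the point of a
Classp-style tool is that it derives it for you:

```haskell
-- Not Classp syntax: a hand-inverted sketch of "one syntax description,
-- attached to the AST, yields both the formatter and the parser."
import Data.Char (isDigit)
import Data.List (stripPrefix)

data Expr = Num Int | Plus Expr Expr deriving (Show, Eq)

-- The syntax pattern, written once as a formatter:
--   Num n    <->  decimal digits
--   Plus a b <->  "(" a " + " b ")"
format :: Expr -> String
format (Num n)    = show n
format (Plus a b) = "(" ++ format a ++ " + " ++ format b ++ ")"

-- The same pattern read in the other direction, inverted by hand:
parseExpr :: String -> Maybe (Expr, String)
parseExpr ('(':s) = do
  (a, s1)  <- parseExpr s
  s2       <- stripPrefix " + " s1
  (b, s3)  <- parseExpr s2
  (')':s4) <- Just s3
  pure (Plus a b, s4)
parseExpr s = case span isDigit s of
  ([], _)    -> Nothing
  (ds, rest) -> Just (Num (read ds), rest)
```

The property that matters is the roundtrip: `parseExpr (format e)` gives back
`Just (e, "")`, with no separate grammar, parse tree, or tree-to-AST pass.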

------
sklogic
Cannot see how it's shorter and more readable than a formal grammar + AST
production rules. Given that the class syntax is very bloated (unlike, say, ML
ADT constructors), there is no chance a language grammar will be compact and
comprehensible at a single glance.

Disclaimer: of course I'm biased, I'd oppose anything with the word "class" in
it.

~~~
JoeAltmaier
The cool part is, it's not only a specification but an implementation.

------
gaze
Is there any formalization of how to invert a formatter, or is this just
something they're figuring out in code as they go along?

~~~
jlarocco
I looked at the code and the manual a little bit, and it almost looks like
they're just writing an object oriented style wrapper syntax around Bison and
Flex. At a quick glance, I'm not sure they're really inverting a formatter as
much as they're adding a "syntax(...)" attribute that tells the tool how to
format the parse tree elements generated from that grammar rule.

~~~
agumonkey
Ha, spinning it as inversion... not very classy.

~~~
infogulch
Maybe that's why "classier" is in quotes in the title.

------
btown
I'll leave it to other commenters with more parser-writing experience to
critique this... but the general concept of "write code in the simplest-to-
reason-about direction, and have the system magically invert it into a much-
harder-to-reason-about algorithm" is one of my favorite patterns in computer
science.

For instance, there's probabilistic programming [1], where you write a
procedural program that says "if we were to fix a bunch of properties/states
of the natural world and provide them as input, here's the output/reading
you'd hypothetically get," and then from that, a system creates a function
that, given a large number of outputs/readings, provides a posterior
probability distribution over those hidden properties/states of the world.
It's still far from the level of performance you'd get from hand-rolled
machine learning algorithms that take advantage of all sorts of insights you
have into your data... but then again, in general programming, there was a
time when compiler optimizations were so naive that you had to hand-roll
assembly code for almost anything. I see probabilistic programming in a
similar level of infancy, and I'm excited to see where it goes, especially
with the backing of a large DARPA initiative [2] to do some really exciting
fundamental research.

But there's so much more to the "invertible" paradigm. SQL itself is exactly
along these lines - your WHERE clause is just a filter function that works on
a single-record level, but it's interpreted as highly optimized instructions
to iterate along data structures, to collect all the data that satisfies that
filter function. Facebook's Relay [3] is the beginning, I think, of a large
push towards declaratively specifying data dependencies for web applications,
and having a system that does all the heavy lifting of figuring out when to
fetch data, when to push it back to the server, how to handle requests on the
server, and how to handle mutations with optimistic client-side updates. All
you need to think about is the easy question of "what data might I need, and
if it were to exist at any given moment, how would I display it to the end
user?"
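The SQL point is easy to see in miniature. In Haskell terms (the `Person`
type and sample data are made up), the WHERE clause is just a per-record
predicate, and it's the engine's job to decide how to evaluate it over the
data (full scan, index lookup, etc.):

```haskell
-- The WHERE clause as a single-record filter function.
data Person = Person { name :: String, age :: Int } deriving (Show, Eq)

-- Morally: SELECT name FROM people WHERE age > 30
adultNames :: [Person] -> [String]
adultNames people = map name (filter (\p -> age p > 30) people)

main :: IO ()
main = print (adultNames [Person "Ada" 36, Person "Bob" 12])  -- prints ["Ada"]
```

You state what qualifies at the level of one row; the iteration strategy is
someone else's problem.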

Humans think generatively, whether it's about data or formatting or GUIs or
the physical world. We have all these ideas in our mind, and we can visualize
how we would expect them to appear. When our computer systems can let us focus
on those high-value thoughts, that's what the future looks like.

So let's cut the out-of-the-box parser thinkers some slack!

[1] [http://www.pl-enthusiast.net/2014/09/08/probabilistic-
progra...](http://www.pl-enthusiast.net/2014/09/08/probabilistic-programming/)

[2]
[http://www.darpa.mil/our_work/i2o/programs/probabilistic_pro...](http://www.darpa.mil/our_work/i2o/programs/probabilistic_programming_for_advanced_machine_learning_%28ppaml%29.aspx)

[3]
[https://gist.github.com/wincent/598fa75e22bdfa44cf47](https://gist.github.com/wincent/598fa75e22bdfa44cf47)

~~~
david-gudeman
An interesting take. I've always been something of a programming-language
junkie motivated by the feeling that there must be clearer, better ways to
express things. Classp was motivated in large part by my dissatisfaction with
having to write a recognizer and warp it into a parser and then turn the parse
tree into an AST. Why can't I just express directly how the AST is represented
in the target language?

------
atesti
I loved
[http://llvm.org/docs/tutorial/index.html](http://llvm.org/docs/tutorial/index.html),
where a parser is just built by hand. Looks surprisingly easy!

