

Language-python: A Python Parser Written in Haskell - rogercosseboom
http://www.haskell.org/pipermail/haskell-cafe/2009-March/058051.html

======
almost
This looks fun :) I can think of all kinds of fun uses for it already...

I wonder why they choose Alex/Happy instead of Parsec though. From my,
admittedly limited, viewpoint Parsec looks brilliant for parsers that can be
reused and combined in interesting ways. My experience with it is limited to
having done a piece of coursework in it for a compilers course (took me a few
hours to do the coursework for the entire course vs. a lot longer for people
using other tools).

Anyone who knows a little more about these things who can comment on the pros
and cons of Parsec that I may have missed? Is it just that performance isn't
as good or is there more to it?

~~~
a-priori
I'm currently in the process of writing a C-Minus compiler in Haskell using
Parsec for a compilers course. I'm working on the code generation phase now,
so I'm mostly finished dealing with Parsec.

Parsec is indeed very nice, and I highly recommend it for pretty much any kind
of parsing task. It certainly beats the pants off flex & bison. The biggest
drawback I've found is that it has no support for error recovery. Once an
error has been detected, Parsec prints it and aborts. It would be nice if you
could make it recover by skipping to the next language element and accumulate
the errors.

~~~
miloshh
Yeah, Parsec is great - I used it to parse some shading languages. I have
thought about the error recovery problem, and I think for some C-like
languages there might be a solution, although I never tried it. You could use
a "smart" lexer that divides the incoming stream into functions (based on
counting "{" and "}") and run Parsec separately on each function. I can't
think of anything much better...

~~~
jrockway
_You could use a "smart" lexer that divides the incoming stream into functions
(based on counting "{" and "}") and run Parsec separately on each function._

You can't really divide the stream without parsing it, due to things that
change the meaning of "{" and "}", like /* and ".

~~~
miloshh
Yeah, but in a way a lexer is a parser, just a simpler one (so you can write
it from scratch, without a library like Parsec). Most lexers can deal with
comments and string constants already, so you could extend it to do "{" and
"}" as well...

~~~
jrockway
You can still really only do this correctly if the input contains no errors.
If you try to recover from errors, you risk generating errors that aren't
actually errors (because the code is correct if you consider the erroneous
part you skipped). I'm sure you've worked with compilers that generate
multiple errors; sometimes you can trust all of the error messages, but most
of the time, you fix the first error and the rest go away.

Parsing is hard.

