
Writing Parsers Like it is 2017 [pdf] - JoshTriplett
http://spw17.langsec.org/papers/chifflier-parsing-in-2017.pdf
======
saywatnow
This should be titled more honestly: perhaps something like "Case studies in
rapidly replacing dangerous C parser code with Rust using nom".

The examples are interesting and well presented. But the first sections trying
to put a veneer of respectability on rust-all-the-things were a bit rough and
got my cynic sense tingling.

Yes, you believe Rust will produce better results: don't try to justify that
with facts you don't have ("Several languages were tested ..." is bullshit
unless you show some data; likewise the assertions about type-safety and no-GC
being essential properties). The data you _do_ have (implementations produced,
integrated and tested in a paper-like time frame) is valuable; unfortunately
it's cheapened/buried under this false veneer.

~~~
geal
We did in fact test multiple languages. I can even point you to various works
done at the ANSSI like
[https://github.com/ANSSI-FR/bootcode_parser](https://github.com/ANSSI-FR/bootcode_parser)
(Python) or
[https://github.com/ANSSI-FR/caradoc](https://github.com/ANSSI-FR/caradoc)
(OCaml). I tried Haskell for VLC and it was not really suited for it (GC
pauses in a synchronized media pipeline, and not meant to be called from C).
But this was not a paper about comparing parser implementations.

Type-safety and lack of garbage collection are essential properties, could you
tell me why you don't think that's the case?

Giving the reason for our language choice felt useful. Otherwise, it would
have really looked like Rust developers steamrolling into projects :)

------
Analemma_
Is it possible to have "parser generators" (not necessarily in the formal
sense of the term) that produce recursive descent parsers? Even if
mathematically they can't be perfect, could we have "good enough" ones? Nobody
uses parser generators because even though P.G.s "work" on the barest level of
ingesting source code and spitting out an AST, they don't do anything beyond
that. For example, trying to get helpful error messages (think: from old GCC
to Clang) from a P.G.'s output is close to impossible. So instead everyone
writes custom parsers, and that introduces these problems.

Can't we have smarter parser generators that do make debugging nice, but are
still formally verified?

~~~
tom_mellior
> trying to get helpful error messages

Menhir is a parser generator that tries to do that:
[http://gallium.inria.fr/~fpottier/slides/fpottier-2015-11-ou...](http://gallium.inria.fr/~fpottier/slides/fpottier-2015-11-oups.pdf)
(Don't get scared by the first slide; the title is in French, the rest is in
English.)

> formally verified

Menhir has been used for a formally verified parser for C:
[http://gallium.inria.fr/~xleroy/publi/validated-parser.pdf](http://gallium.inria.fr/~xleroy/publi/validated-parser.pdf)

The result is in CompCert:
[https://github.com/AbsInt/CompCert/tree/master/cparser](https://github.com/AbsInt/CompCert/tree/master/cparser)

------
kazinator
As long as you're writing parsers, it is forever 1969. Knuth invented all you
need to know just recently. Off you go and parse!

------
BigIQ
Nice overview of the pitfalls of C parsers, their hardening, a presentation of
Rust's advantages, parser combinators, the nom crate, its usage, the
application to VLC and an intrusion detector, the integration with those
complex existing C codebases, fuzzing and a few ideas to improve Rust for more
security.

And I am happy to see the ANSSI here :)

------
souenzzo
No mention of instaparse?
[https://github.com/Engelberg/instaparse](https://github.com/Engelberg/instaparse)

"Parse IS be something easy as regexp"

------
CodesInChaos
In my experience avoiding unlimited recursion/stackoverflows is one of the
most annoying parts of writing a parser, so I find it surprising that the
paper doesn't even mention that topic.
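
One common mitigation for the recursion problem is an explicit depth counter threaded through the recursive calls. The sketch below is only an illustration, not something taken from the paper or from nom: the toy grammar (nested lists of integers), the names `parse_value` and `ParseError`, and the `MAX_DEPTH` value are all assumptions made for the example.

```python
class ParseError(Exception):
    """Raised for malformed input or input nested too deeply."""


MAX_DEPTH = 64  # hypothetical limit; pick one that fits the format and the stack


def parse_value(text, pos=0, depth=0):
    """Parse nested lists of integers such as "[1,[2,3]]".

    Returns (value, next_pos). The depth argument grows by one per
    nesting level, so hostile input fails with ParseError instead of
    exhausting the call stack.
    """
    if depth > MAX_DEPTH:
        raise ParseError(f"nesting deeper than {MAX_DEPTH} at offset {pos}")
    if pos >= len(text):
        raise ParseError("unexpected end of input")

    if text[pos] == "[":
        items = []
        pos += 1
        if pos < len(text) and text[pos] == "]":
            return items, pos + 1  # empty list
        while True:
            item, pos = parse_value(text, pos, depth + 1)
            items.append(item)
            if pos < len(text) and text[pos] == ",":
                pos += 1
            elif pos < len(text) and text[pos] == "]":
                return items, pos + 1
            else:
                raise ParseError(f"expected ',' or ']' at offset {pos}")

    # Otherwise expect a run of ASCII digits.
    start = pos
    while pos < len(text) and text[pos] in "0123456789":
        pos += 1
    if start == pos:
        raise ParseError(f"expected digit or '[' at offset {pos}")
    return int(text[start:pos]), pos
```

With this shape, an input made of a thousand opening brackets is rejected with a `ParseError` once the counter passes `MAX_DEPTH`, well before the interpreter's own recursion limit is reached.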

~~~
geezerjay
> In my experience avoiding unlimited recursion/stackoverflows is one of the
> most annoying parts of writing a parser

This, x10.

My take is that this is perhaps the main reason why everyone tends to rely on
hand-written parsers instead of simply using code churned out by a parser
generator. Parser generators are developed with a single use case in mind:
pick up a grammar definition, perhaps with pre- and post-conditions, and
proceed to map it to some programming language. Yet this fails very basic
requirements for real-world parser applications.

------
JoelJacobson
Are there any Context-sensitive algorithms/parsers/generators?

The list at
[https://en.wikipedia.org/wiki/Context-sensitive_grammar](https://en.wikipedia.org/wiki/Context-sensitive_grammar)
only contains two links, where one of them, "LuZc", seems completely dead with
"lorem ipsum" under Downloads, and the other, "bnf2xml", seems to be misplaced
since BNF is not context-sensitive.

~~~
sfvisser
Monadic parser combinators are context-sensitive, because you can arbitrarily
branch your parser based on previously parsed content.
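
A minimal sketch of that idea, assuming a toy parser type (a function from `bytes` to `(value, rest)` or `None`) and hypothetical helpers named `byte`, `take` and `bind`: the second parser is constructed from the value the first one produced, which a fixed context-free grammar cannot express directly. It illustrates the principle only and is not the API of any particular library.

```python
from typing import Callable, Optional, Tuple

# A parser takes the remaining input and returns (value, rest), or None on failure.
Parser = Callable[[bytes], Optional[Tuple[object, bytes]]]


def byte() -> Parser:
    """Parse a single byte and return it as an int."""
    def run(data: bytes):
        if not data:
            return None
        return data[0], data[1:]
    return run


def take(n: int) -> Parser:
    """Parse exactly n bytes."""
    def run(data: bytes):
        if len(data) < n:
            return None
        return data[:n], data[n:]
    return run


def bind(parser: Parser, make_next: Callable[[object], Parser]) -> Parser:
    """Monadic bind: build the next parser from the previous result."""
    def run(data: bytes):
        result = parser(data)
        if result is None:
            return None
        value, rest = result
        return make_next(value)(rest)
    return run


# Length-prefixed field: the first byte decides how many bytes follow.
length_prefixed = bind(byte(), take)
```

Here `length_prefixed(b"\x03abcdef")` reads the length byte 3 and then consumes exactly three bytes, leaving `b"def"` unparsed.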

------
amelius
Parser combinators are less powerful than what parser generators can do, in
terms of expressiveness and efficiency. And we've known parser generators
since the 70s.

So I'm not sure what the point is of the title of the article.

~~~
geal
(one of the authors here): parser generators are generally good for one thing:
parsing programming languages. For more complex formats, where you have to
carry state around, or binary formats, they're extremely cumbersome to use. I
often meet people that tell me they want the parser generator to end all
parsers. But for most real world formats, you'll have to hack around the
generator's limitations. Theoretically, parser combinators are less
expressive, but they're just functions: you can write whatever you want inside
a subparser, and integrate it with the rest. That approach works well, since
we wrote a lot of complex parsers with nom.
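
To make the "they're just functions" point concrete, here is a rough sketch, not nom itself: `tag` and `sequence` are hypothetical combinators written for this example, and `varint` is an ordinary hand-written function (an unsigned LEB128 decoder with its own loop state) that plugs into them because it has the same `(data, pos) -> (value, new_pos)` shape.

```python
def tag(expected):
    """Combinator: match a fixed byte string."""
    def run(data, pos):
        end = pos + len(expected)
        if data[pos:end] != expected:
            raise ValueError(f"expected {expected!r} at offset {pos}")
        return expected, end
    return run


def varint(data, pos):
    """Hand-written subparser: unsigned LEB128 integer."""
    value = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        if shift >= 64:
            raise ValueError("varint too long")
        b = data[pos]
        pos += 1
        value |= (b & 0x7F) << shift  # low 7 bits carry the payload
        if (b & 0x80) == 0:           # high bit clear: last byte
            return value, pos
        shift += 7


def sequence(*parsers):
    """Combinator: run parsers one after another and collect their values."""
    def run(data, pos):
        values = []
        for parser in parsers:
            value, pos = parser(data, pos)
            values.append(value)
        return values, pos
    return run


# A made-up record layout: the marker "ID" followed by two varints.
record = sequence(tag(b"ID"), varint, varint)
```

Nothing distinguishes `varint` from the generated closures as far as `sequence` is concerned, which is the property the comment relies on.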

~~~
mattnewport
I thought most real-world compilers tend to be hand-written rather than using
a generator? I'm by no means an expert, but that's something I've heard and
know to be true for many real-world compilers.

~~~
dukoid
References:
[https://stackoverflow.com/questions/6319086/are-gcc-and-clan...](https://stackoverflow.com/questions/6319086/are-gcc-and-clang-parsers-really-handwritten)

~~~
amelius
They probably should have used this instead:
[http://scottmcpeak.com/elkhound/](http://scottmcpeak.com/elkhound/)

------
oldsj
Sorry for the low-effort comment, but Clever Cloud is an awesome name.

~~~
ldoguin
Thanks, it's a pretty cool product too. (yeah I work there)

