
Nom, a byte oriented, streaming, zero copy, parser combinators library in Rust [pdf] - steveklabnik
http://spw15.langsec.org/papers/couprie-nom.pdf
======
geal
Hi all, Nom's author here.

I wrote this paper in January 2015, so some things changed in the meantime:

\- the benchmarks have been optimized a bit, and someone contributed a cereal
parser that beats nom (
[https://github.com/Geal/nom_benchmarks](https://github.com/Geal/nom_benchmarks)
). It's alright, I just wanted to estimate where it stands

\- there is better error management: [https://github.com/Geal/nom/wiki/Error-
management](https://github.com/Geal/nom/wiki/Error-management)

\- the error management types give powerful debugging features:
[http://dev.unhandledexpression.com/slides/langsec-2015/img/c...](http://dev.unhandledexpression.com/slides/langsec-2015/img/colortest.png)
(test code:
[https://github.com/Geal/nom_colored_hexdump](https://github.com/Geal/nom_colored_hexdump)
)

\- It is now easy to embed in C (
[https://github.com/Geal/nom_in_c](https://github.com/Geal/nom_in_c) ). I plan
to make API compatible versions of some C libs

\- there are more example parsers now:
[https://github.com/Geal/nom/issues/14](https://github.com/Geal/nom/issues/14)
feel free to pick one up!

Here are the slides for the conference at the IEEE Langsec workshop:
[http://dev.unhandledexpression.com/slides/langsec-2015/](http://dev.unhandledexpression.com/slides/langsec-2015/)

Langsec is one of the most interesting approaches to follow in security, and
the ideas presented this year were amazing (I'm especially fond of the heap
exploitation algebra). I highly recommend watching the videos once they will
be available.

Writing parsers with nom is easy and fun, please give it a try ;)
[https://github.com/Geal/nom](https://github.com/Geal/nom)

~~~
dansimon
Great work! I've been working on a Rust parser combinator library myself (
[https://github.com/DanSimon/peruse](https://github.com/DanSimon/peruse) ),
though I've been focusing more on parsing syntax trees than binary formats,
and it's really cool to see the different approaches you've taken. I had been
planning on working on stream parsers next, but I may just defer to nom since
it basically looks like what I had wanted to do.

~~~
geal
I really like your approach with trait objects. This is something I would have
liked in nom when I started it, but macros proved more useful for my
experimentations (less types to rewrite when I change something).

For the streaming part, it is still a work in progress. I need a way to better
represent input enabled state machines, otherwise we'll end up with streaming
switch based state machines, and they are a security nigtmare.

------
nickpsecurity
Getting parsing done correctly and efficiently is a requirement that pops up
over and over. More work in the field is always good. Interesting paper,
although I haven't heard of the parsers. I usually hear about yacc, GOLD
parsing system, and so on. Like to see comparisons with most common ones for
probable use cases in terms of productivity, safety, and performance.

Far as next project, Leroy's people at INRIA did excellent work in verified,
LR parsers [1]. I'm not sure that anyone is building on it at the moment.
Implementing that in or integrating with Rust might make for one heck of a
parsing system. So long as correspondence was proven, it would be the safest
one in a systems language.

[1] [http://gallium.inria.fr/~fpottier/publis/jourdan-leroy-
potti...](http://gallium.inria.fr/~fpottier/publis/jourdan-leroy-pottier-
validating-parsers.pdf)

~~~
geal
The parser combinators are a common approach in functional languages.
Basically, instead of generating the whole parser from a grammar, you assemble
a lot of small functions, in other functions. The resulting code often
ressembles the grammar very closely, and all of the intermediate parsers are
very easy to test.

Proving with Coq the soundness of a parser compared to its grammar is a cool
approach! The biggest problem one has when writing parsers in "safe" systems
like parser generators or parser combinators, is the gaps between the input
language intended by the designer, the input language described by the grammar
and the input language described by the code. Anything that can reduce those
gaps is welcome.

I shoul try at some point to use Coq or some SMT based system to hunt
ambiguity in formats. That would make an interesting research ;)

~~~
nickpsecurity
That makes sense. The combinator approach might be verified using a form of
design by contract or VCG's if it's simple functions. I'll look into them when
I take the deep dive into functional programming.

------
alfiedotwtf
As OP says "benchmarks, they have to be considered with skepticism", but with
Nom implementing the "most naive recursive parsing", even the UN-optimised
benchmarks look impressive

~~~
flippant
[https://github.com/Geal/nom](https://github.com/Geal/nom)

[https://github.com/Geal/nom_benchmarks](https://github.com/Geal/nom_benchmarks)

------
eli_gottlieb
Ok, if Rust adds higher-kinded types, I'm giving it a try. This is _very_
interesting, being able to write almost Haskell-style code without laziness or
garbage-collection.

~~~
steveklabnik
We designed the standard library to accommodate them, but working on an actual
proposal was put off till post 1.0. It's a very highly desired feature.

------
stefanix
Nom is such good name for a parser. Will ver 2.0 be called NomNom?

~~~
geal
That's a fun numbering scheme, but it might be cumbersome in the long run,
right?

It's named that way because a nom parser takes a byte of data ;)

~~~
DanWaterworth

        Nom
        NomOm
        NomNom
        NomOmOm
        NomOmNom
        ...

------
buster
I am wondering how it compares to rust-peg[1]. I am currently using it and the
syntax is very nice. At first look it seems to be more verbose to use Nom.. Is
it much faster then rust-peg, though?

[1] [https://github.com/kevinmehall/rust-
peg](https://github.com/kevinmehall/rust-peg)

~~~
geal
I do not know if it is faster, I should add it to the benchmarks :)

The biggest difference here is the use of a syntax extension to parse the
grammar. This is something I would like to do at some point, because it can
make the code nicer. Right now, macros are good, because you can use nom
directly without Rust 1.0, no need for feature gates.

~~~
buster
That'd be nice.. i am currently contemplating to move or not. Nom does run on
Rust stable, right? Although i find rust-peg super easy to use, i don't like
that it's only working on nightly. A difference in performance would be
another reason to switch over.. or not.. :)

Anyway, thanks for this great project!

~~~
geal
Nom works on stable, but there are nice things you can do if you use the
nightly, like matching on error codes with slice or box patterns (
[https://github.com/Geal/nom/wiki/Error-management#with-
slice...](https://github.com/Geal/nom/wiki/Error-management#with-slice-
patterns) ) or use no_std to embed the parser as a C library (
[https://github.com/Geal/nom_in_c](https://github.com/Geal/nom_in_c) ).

~~~
steveklabnik
You shouldn't need no_std specifically to embed in C anymore.

~~~
geal
That's nice to know, thanks!

~~~
steveklabnik
Any time. Basically, one the runtime was removed[1], this became true.

1: Of course, all programming languages have runtimes, but once we got down to
C or C++ levels of runtime, that is.

------
amelius
What class of grammars does this support? How are ambiguities handled and
reported?

~~~
geal
Parser combinators are recursive descent parsers, so they do not manage
ambiguities at compile time. At runtime, the first parsing path that works
will win.

Nom can handle easily regular, context-free and context sensitive grammars (a
lot of binary formats are a bit context sensitive, so that was a requirement).
I suspect it could do recursively enumerable too.

------
Animats
How hard is it to translate something written for "pyparsing" to Nom? The
approach seems reasonably similar, but are there limitations in Nom that
aren't in "pyparsing"?

~~~
geal
I have never used pyparsing, but the approach seems similar. Right now, the
only real limitation is that I still have a few combinators to implement.
Otherwise, anything complex you might want can be done by writing a simple
function.

------
Apanatshka
Hmm. Someone beat me to it. Interesting though, very cool.

