
Hammer: Parser Combinators for C - ShaneWilton
https://github.com/UpstandingHackers/hammer
======
ShaneWilton
Hammer is a nifty parser combinator library from the people behind Language-
Theoretic Security [1].

Their goal is to promote the use of context-free formats and protocols, to
avoid falling into the trap of trying to parse recursively-enumerable
languages, a problem that reduces to the halting problem. The idea is that by
sticking to context-free grammars, protocols will be easier to parse, and the
bulk of memory corruption errors can be eliminated.

Meredith Patterson and Sergey Bratus gave an excellent talk on the subject at
28c3 [2], and they've put together a video series on using Hammer to build a
secure parser for JSON RPC [3].

[1] [http://langsec.org/](http://langsec.org/)

[2]
[https://www.youtube.com/watch?v=3kEfedtQVOY](https://www.youtube.com/watch?v=3kEfedtQVOY)

[3]
[https://github.com/sergeybratus/HammerPrimer](https://github.com/sergeybratus/HammerPrimer)

~~~
thedonald123
What do you lose by sticking to context-free grammars?

~~~
rumcajz
You lose the ability to prefix fields with their size, which in turn means you
can't preallocate buffers because you don't know the sizes in advance.
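
To make the trade-off concrete, here is a minimal sketch in plain C. Both wire
formats and both function names are hypothetical (they are not part of Hammer):
format A carries a one-byte length prefix, format B ends the field with a ';'
byte. With the prefix the reader knows the allocation size before touching the
payload; with the delimiter it has to scan first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical format A: one length byte, then that many payload bytes.
 * The size is known up front, so the buffer is sized before the payload
 * is read. Returns a malloc'd copy (caller frees) or NULL on error. */
uint8_t *read_prefixed(const uint8_t *in, size_t in_len, size_t *out_len)
{
    assert(in != NULL && out_len != NULL);
    if (in_len < 1)
        return NULL;
    size_t n = in[0];
    if (n > in_len - 1)              /* declared size exceeds the input */
        return NULL;
    uint8_t *buf = malloc(n ? n : 1);
    if (buf == NULL)
        return NULL;
    memcpy(buf, in + 1, n);
    *out_len = n;
    return buf;
}

/* Hypothetical format B: payload terminated by a ';' byte.
 * The size is unknown until the delimiter has been found by scanning. */
uint8_t *read_delimited(const uint8_t *in, size_t in_len, size_t *out_len)
{
    assert(in != NULL && out_len != NULL);
    const uint8_t *end = memchr(in, ';', in_len);
    if (end == NULL)                 /* no delimiter in the input */
        return NULL;
    size_t n = (size_t)(end - in);
    uint8_t *buf = malloc(n ? n : 1);
    if (buf == NULL)
        return NULL;
    memcpy(buf, in, n);
    *out_len = n;
    return buf;
}
```

Note that the prefixed reader still has to check the declared length against
the bytes actually available; trusting the prefix blindly is where the memory
corruption bugs tend to come from.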

~~~
nly
Meh, you're probably going to want to limit your buffer size anyway in any
highly concurrent system. Assuming you need to scan them anyway, delimited
fields can also be faster to scan. If a field can't contain 0x00, and is 0x00
terminated (you can ensure this), then you don't need to do as many bounds-
checks. If you know what you're doing, then Ragel's 'noend' directive enables
this, for instance.
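
A rough sketch of the saving, in plain C rather than Ragel (the function names
are made up for illustration): once the buffer is guaranteed to contain a 0x00
sentinel, the scan loop only has to test for the sentinel and can drop the
per-byte limit check.

```c
#include <assert.h>
#include <stddef.h>

/* Bounded scan: every iteration tests both the limit and the delimiter. */
size_t field_len_bounded(const unsigned char *p, size_t limit)
{
    size_t i = 0;
    while (i < limit && p[i] != 0x00)
        i++;
    return i;
}

/* "You can ensure this": overwrite the last byte of the buffer with 0x00
 * so that a sentinel scan is guaranteed to stop inside the buffer. */
void force_sentinel(unsigned char *buf, size_t cap)
{
    assert(buf != NULL && cap > 0);
    buf[cap - 1] = 0x00;
}

/* Sentinel scan: one test per byte. Only safe when the caller has
 * guaranteed that a 0x00 byte exists within the buffer. */
size_t field_len_sentinel(const unsigned char *p)
{
    const unsigned char *q = p;
    while (*q != 0x00)
        q++;
    return (size_t)(q - p);
}
```

The sentinel version is only as safe as the guarantee behind it, which is why
the "if you know what you're doing" caveat matters.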

------
nickpsecurity
Great work by the LANGSEC crowd. My recent interest in parsers has been in
those verified for correctness or security. So, they have security and a
certain amount of correctness. A great next step would be combining it with work on
formal verification of parsers or generators (see below). On top of that,
ensure the subset could generate SPARK and/or CompCert-compatible C code to
automate much of the rest of the problem.

TRX formally verified parser interpreter
[http://arxiv.org/pdf/1105.2576.pdf](http://arxiv.org/pdf/1105.2576.pdf)

Validating LR(1) Parsers
[http://gallium.inria.fr/~xleroy/publi/validated-parser.pdf](http://gallium.inria.fr/~xleroy/publi/validated-parser.pdf)

Verifying a parser for a C compiler
[http://gallium.inria.fr/~scherer/gagallium/verifying-a-parse...](http://gallium.inria.fr/~scherer/gagallium/verifying-a-parser-for-a-c-compiler/index.html)

~~~
maradydd
Thanks! Those are all excellent papers. I am actually looking into formally
verifying Hammer using Frama-C, but haven't gotten too far yet.

~~~
nickpsecurity
Smart move. It's a good default as far as the work vs. results tradeoff goes.
I'll try to keep your project in mind when I review my collection of
verification papers in case I see any little-known C verification methods that
might help.

------
buserror
I've been using Ragel for a few years, and really like it; but it does have
the problem of not handling bit-based structures very well (or even 8-bit-based
structures).

Also, the author closed the mailing list and packed up; presumably to work for
a company who 'bought him out'. Closing the mailing list was a notch rude...

Hammer does look pretty nice, I'm definitely going to have a poke at it!

~~~
nly
You can sort of use bitfields with ragel if you fashion a bit iterator in C++.
And why do you say you can't process binary / 8bit structures?

I'm still hopeful Ragel 7 will happen.

------
nly
Amusing name given the (coincidental?) existence of Nail [0][1]

[0] [PDF] [https://people.csail.mit.edu/nickolai/papers/bangert-nail-la...](https://people.csail.mit.edu/nickolai/papers/bangert-nail-langsec.pdf)

[1] [https://github.com/jbangert/nail](https://github.com/jbangert/nail)

~~~
jbangert
Before I moved down to MIT, I studied with Sergey, so the name is not at all
coincidental. Nail pursues a slightly different agenda. Instead of trying to
make people design their formats in a reasonable way (which is excellent in
the long term), Nail tries to allow people to get some (maybe most) of the
benefits of parser generators, but for arbitrary formats.

I also made some different design decisions than Hammer, giving it a slightly
different goal: code generation instead of runtime combinators, output
generation, no semantic actions...

~~~
CMCDragonkai
Would you consider using Nail not just to parse content formats like JSON, but
also for (dependent/semantic) type validation on the values of the data itself? Some of
this involves business logic as well. For example, let's say I encode some
data in JSON. Well I can parse JSON well, but then I need something inside
that JSON to be an integer. Ok, that's fairly easy. But then I need that
integer to be between 10 and 20. Ok, we have a filter now. But then, I need
that integer to be less than or equal to a value defined elsewhere, but only
if that value exists, because it's optional. And then, finally if a certain
business condition is not met (by contacting the database), the entire input
must be considered invalid and discarded. All of this is a form of data
validation, but it increasingly involves more and more complexity.
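
To show how the layers stack up, here is a sketch in plain C. It assumes the
JSON has already been parsed into a struct; the struct, the verdict names and
the callback are all hypothetical and have nothing to do with Nail's or
Hammer's actual APIs. Each check is more context-dependent than the last.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record, already extracted from parsed JSON. */
struct request {
    long value;      /* must lie in [10, 20] */
    bool has_limit;  /* the limit field is optional */
    long limit;      /* meaningful only when has_limit is true */
};

/* Hypothetical hook standing in for the database-backed business rule. */
typedef bool (*business_check)(const struct request *r, void *ctx);

enum verdict { VALID, BAD_RANGE, OVER_LIMIT, BUSINESS_REJECT };

/* Stub business rule used for illustration: rejects everything. */
bool always_reject(const struct request *r, void *ctx)
{
    (void)r;
    (void)ctx;
    return false;
}

/* Runs the checks in order of increasing context dependence:
 * a local range filter, a cross-field constraint on an optional value,
 * then an external condition that can invalidate the whole input. */
enum verdict validate(const struct request *r, business_check check, void *ctx)
{
    assert(r != NULL);
    if (r->value < 10 || r->value > 20)
        return BAD_RANGE;
    if (r->has_limit && r->value > r->limit)
        return OVER_LIMIT;
    if (check != NULL && !check(r, ctx))
        return BUSINESS_REJECT;
    return VALID;
}
```

Only the first check looks like something a grammar could express directly;
the other two need state from outside the field being validated.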

------
anon4
I think the biggest question I need answered is "why would I use this and not
ANTLR"?

~~~
adrusi
Isn't ANTLR only for Java, or at least primarily supported on the JVM platform?

~~~
donmcc
ANTLR is written in Java, but can generate parsers for Java and C++ (and some
other languages I believe).

