
Nail: A practical tool for parsing and generating data formats (2014) [pdf] - ingve
https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-bangert.pdf
======
_pmf_
I've always written non-generic code generators when dealing with specific
protocols that have lots of special-case packets and inter-dependencies
between different fields of a packet and the length and/or availability of
other fields.

With generic approaches, I've always run into some specific constraint that
could not be expressed declaratively, but I'll definitely look at this.
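To make the kind of dependency described above concrete, here is a minimal hand-written parser in Python for a made-up packet layout. The layout, `parse_packet`, `Packet`, and `HAS_CHECKSUM` are all hypothetical and are not taken from Nail or from any real protocol. The payload length comes from an earlier header field, and a trailing checksum exists only when a flag bit is set. These are the checks a hand-rolled parser has to thread through manually.

```python
import struct
from dataclasses import dataclass
from typing import Optional

# Hypothetical packet layout (not from any real protocol):
#   type:     u8
#   flags:    u8   (bit 0 set => a trailing checksum field is present)
#   length:   u16 big-endian, number of payload bytes that follow
#   payload:  `length` bytes
#   checksum: u32 big-endian, present only if flags & HAS_CHECKSUM
HAS_CHECKSUM = 0x01

@dataclass
class Packet:
    type: int
    flags: int
    payload: bytes
    checksum: Optional[int]

def parse_packet(buf: bytes) -> Packet:
    if len(buf) < 4:
        raise ValueError("truncated header")
    ptype, flags, length = struct.unpack_from(">BBH", buf, 0)
    offset = 4
    # The payload's size depends on a field parsed earlier.
    end = offset + length
    if len(buf) < end:
        raise ValueError("truncated payload")
    payload = buf[offset:end]
    offset = end
    checksum = None
    # Whether this field exists at all depends on another field's value.
    if flags & HAS_CHECKSUM:
        if len(buf) < offset + 4:
            raise ValueError("truncated checksum")
        (checksum,) = struct.unpack_from(">I", buf, offset)
        offset += 4
    if offset != len(buf):
        raise ValueError("trailing bytes")
    return Packet(ptype, flags, payload, checksum)
```

For example, `parse_packet(bytes([7, 1, 0, 2]) + b"hi" + bytes.fromhex("deadbeef"))` yields a packet with payload `b"hi"` and checksum `0xDEADBEEF`.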

------
oever
Calligra uses this approach to parse most of the binary Microsoft Office
files. The grammar for this is quite daunting. This approach has the advantage
that there is a clear separation between the content description and the parser
generator. The parser generator can be improved separately. In Calligra this
made it possible to work on speed and memory usage separately from the work of
figuring out the actual file format.
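A rough sketch of that separation, assuming a toy field-list schema. This is not Calligra's `mso.xml` format or Nail's grammar syntax, and `Field`, `PACKET_SCHEMA`, and `parse` are hypothetical names. The format lives in data, and a generic interpreter walks it. The interpreter could be optimized (or replaced by generated code) without touching the schema, and the schema could be refined without touching the interpreter.

```python
import struct
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Env = Dict[str, Any]

@dataclass(frozen=True)
class Field:
    name: str
    fmt: Optional[str] = None          # struct format for a fixed-size integer, e.g. ">H"
    length_from: Optional[str] = None  # earlier field whose value is this field's byte count
    when: Optional[Callable[[Env], bool]] = None  # field is present only if this holds

# Hypothetical schema: pure description, no parsing logic.
PACKET_SCHEMA: List[Field] = [
    Field("type", fmt=">B"),
    Field("flags", fmt=">B"),
    Field("length", fmt=">H"),
    Field("payload", length_from="length"),
    Field("checksum", fmt=">I", when=lambda env: bool(env["flags"] & 0x01)),
]

def parse(schema: List[Field], buf: bytes) -> Env:
    """Generic interpreter: walks any schema and knows nothing about packets."""
    env: Env = {}
    offset = 0
    for field in schema:
        if field.when is not None and not field.when(env):
            continue  # optional field is absent
        if field.fmt is not None:
            size = struct.calcsize(field.fmt)
            if offset + size > len(buf):
                raise ValueError(f"truncated field {field.name!r}")
            (env[field.name],) = struct.unpack_from(field.fmt, buf, offset)
        elif field.length_from is not None:
            size = env[field.length_from]
            if offset + size > len(buf):
                raise ValueError(f"truncated field {field.name!r}")
            env[field.name] = buf[offset:offset + size]
        else:
            raise ValueError(f"field {field.name!r} has no layout")
        offset += size
    if offset != len(buf):
        raise ValueError("trailing bytes")
    return env
```

Absent optional fields are simply missing from the returned dictionary.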

[https://quickgit.kde.org/?p=binschema.git&a=blob&f=src%2Fmso...](https://quickgit.kde.org/?p=binschema.git&a=blob&f=src%2Fmso.xml)

------
contingencies
The foremost issue with abstract code generation systems based on DSLs (domain
specific languages) often seems to be finding programmers (and project
managers) who are willing to invest the time learning the syntax and (often
obtuse/opaque) tooling. In this case, aside from the security benefits
(6.3/p13), the performance graphs at the end of this paper (NailDNS vs. Bind9;
6.4/p14) should be a great motivator.

That said, several well-known quotes affirm that this type of approach aligns
well with established wisdom:

 _The benefit of using [a formal specification language] is that it teaches
you to think rigorously, to think precisely, and the important point is the
precise thinking. So what you need to avoid at all costs is any language that's
all syntax and no semantics._ \- Leslie Lamport

 _One person's data is another person's program._ \- Guy L. Steele, Jr.

 _Representation: Fold knowledge into data so program logic can be stupid and
robust._ \- Eric S. Raymond, The Art of Unix Programming (2003)

 _Data Design > Code Design_

 _Get your data structures correct first, and the rest of the program will
write itself._ \- David Jones

 _For 80 percent of all data sets, 95 percent of the information can be seen
in a good graph._ \- William S. Cleveland, Bell Labs

 _Less than 10 percent of the code has to do with the ostensible purpose of
the system; the rest deals with input-output, data validation, data structure
maintenance, and other housekeeping._ \- Mary Shaw, Carnegie-Mellon University

 _Pike's 4th Rule: Fancy algorithms are buggier than simple ones, and they're
much harder to implement. Use simple algorithms as well as simple data
structures._ \- Rob Pike, Notes on Programming in C (1989)

 _Pike's 5th Rule: Data dominates. If you've chosen the right data structures
and organized things well, the algorithms will almost always be self-evident.
Data structures, not algorithms, are central to programming._ \- Rob Pike,
Notes on Programming in C (1989)

... from my fortune clone (_|grep data_) at
[https://github.com/globalcitizen/taoup](https://github.com/globalcitizen/taoup).
I spoke at _Code Generation 2010_ (Cambridge, UK) even though, somewhat
controversially for that environment, I do not support UML.

~~~
jgalt212
> The foremost issue with abstract code generation systems based on DSLs
> (domain specific languages) often seems to be finding programmers (and
> project managers) who are willing to invest the time learning the syntax and
> (often obtuse/opaque) tooling.

I totally hear you on this as far as tech staff goes. However, if you can
create a DSL that is writable and maintainable by non-tech staff, that's a
huge win.
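As a toy illustration of that idea (entirely hypothetical: the `name: type` syntax, `compile_spec`, `decode`, and `HEADER_SPEC` are invented here, not part of Nail), a fixed-size header could be described in a plain-text format that a non-programmer can edit, with a few lines of code compiling it into a `struct.Struct` layout and rejecting malformed lines with a readable error.

```python
import struct
from typing import Dict, List, Tuple

# Hypothetical mini-DSL: one "name: type" pair per line; '#' starts a comment.
TYPE_CODES = {"u8": "B", "u16": "H", "u32": "I"}

Spec = Tuple[List[str], struct.Struct]

def compile_spec(text: str) -> Spec:
    """Turn the plain-text spec into field names plus a big-endian struct layout."""
    names: List[str] = []
    codes: List[str] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank or comment-only line
        name, sep, typ = (part.strip() for part in line.partition(":"))
        if not name or not sep or typ not in TYPE_CODES:
            raise ValueError(f"line {lineno}: expected 'name: u8|u16|u32', got {raw!r}")
        names.append(name)
        codes.append(TYPE_CODES[typ])
    return names, struct.Struct(">" + "".join(codes))

def decode(spec: Spec, buf: bytes) -> Dict[str, int]:
    """Unpack exactly one header; struct raises struct.error on a size mismatch."""
    names, layout = spec
    return dict(zip(names, layout.unpack(buf)))

# The part non-technical staff would maintain.
HEADER_SPEC = """
# message header
type:   u8
flags:  u8
length: u16
"""
```

With this, `decode(compile_spec(HEADER_SPEC), bytes([7, 1, 0, 2]))` gives `{"type": 7, "flags": 1, "length": 2}`, while a typo such as `size: u64` fails at compile time with the offending line number.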

