

Dartmouth computer scientists expanding diff, grep Unix tools - abennett
http://www.itworld.com/software/231515/usenix-dartmouth-expanding-diff-grep-unix-tools

======
chriseidhof
My friend Eelco wrote his Master's thesis about typesafe diff (and patch) for
arbitrary data structures:

<http://eelco.lempsink.nl/thesis-eelco-lempsink.pdf>

It allows you to have meaningful diffs for any kind of data structure, not
just lists of strings. The code is also available as a Haskell package:
<http://hackage.haskell.org/package/gdiff>
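
To give a rough feel for what a diff over structured data (rather than lines of text) looks like, here is a minimal Python sketch. It is not the gdiff algorithm and it is not type-safe; the function name `tree_diff`, the path notation, and the sample data are all made up for illustration, and lists are compared position by position rather than with a proper edit script.

```python
def tree_diff(old, new, path=""):
    """Yield (path, kind, old_value, new_value) for each structural difference."""
    if isinstance(old, dict) and isinstance(new, dict):
        # Compare dictionaries key by key.
        for key in sorted(set(old) | set(new), key=str):
            child = f"{path}/{key}"
            if key not in new:
                yield (child, "removed", old[key], None)
            elif key not in old:
                yield (child, "added", None, new[key])
            else:
                yield from tree_diff(old[key], new[key], child)
    elif isinstance(old, list) and isinstance(new, list):
        # Naive positional comparison (an assumption made for brevity).
        for i in range(max(len(old), len(new))):
            child = f"{path}/{i}"
            if i >= len(new):
                yield (child, "removed", old[i], None)
            elif i >= len(old):
                yield (child, "added", None, new[i])
            else:
                yield from tree_diff(old[i], new[i], child)
    elif old != new:
        # Leaves, or nodes whose types differ, are reported as changed.
        yield (path or "/", "changed", old, new)


# Hypothetical sample data.
old = {"host": "r1", "ifaces": [{"name": "eth0", "mtu": 1500}]}
new = {"host": "r1", "ifaces": [{"name": "eth0", "mtu": 9000}], "ntp": "pool"}
changes = list(tree_diff(old, new))
for change in changes:
    print(change)
```

The point is only that each reported change carries a path into the structure, which a line-based diff cannot give you.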

------
winestock
From the article:

The new programs, called Context-Free Grep and Hierarchical Diff, will provide
the ability to parse blocks of data rather than single lines. For each new
type of data structure, a vendor would provide a pattern library identifying
the basic structure of the data, which the software would then use to "extract
the constructs of interest from the document," Weaver said.

So how does the above fit into the Unix Philosophy's dictum that "Everything
is just a stream of bytes (of text)?" Or does it?

~~~
zokier
New tools, new philosophies. I would imagine that the new grep would fit well
in the JSON-riddled world that is the web these days.

~~~
mhansen
Check out jsonpipe; it makes JSON greppable.

<https://github.com/dvxhouse/jsonpipe>
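
The general idea of making JSON greppable is to flatten the tree into one line per node, so that ordinary line-oriented tools can work on it. A minimal Python sketch of that idea follows; it is not jsonpipe's code, and the exact output format (path, tab, value) and the name `flatten` are assumptions made here for illustration.

```python
import json


def flatten(value, path=""):
    """Yield (path, json_text) pairs, one per node of the parsed document."""
    if isinstance(value, dict):
        yield (path or "/", "{}")
        for key, child in value.items():
            yield from flatten(child, f"{path}/{key}")
    elif isinstance(value, list):
        yield (path or "/", "[]")
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}/{index}")
    else:
        # Scalars are printed as JSON so strings keep their quotes.
        yield (path or "/", json.dumps(value))


# Hypothetical sample document.
doc = json.loads('{"name": "r1", "ports": [22, 443]}')
lines = [f"{p}\t{v}" for p, v in flatten(doc)]
print("\n".join(lines))
```

Once every value sits on its own line with its full path, a plain `grep ports` finds it along with the context a line-based tool would otherwise lose.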

------
numeromancer
Behold Coccinelle, a related tool, for doing semantic patches in C:

<http://coccinelle.lip6.fr/>

The natural inverse to hierarchical diff should be hierarchical patch, which
would be (more-or-less) a generalization of what coccinelle does.

------
bazzargh
Interesting that they'd go for context-free grammars rather than PEGs. A few
years ago someone created a PEG-matching Lua library, similarly aimed at
replacing regexps: <http://www.inf.puc-rio.br/~roberto/docs/peg.pdf>. There is
also Ward Cunningham's exploratory parser:
<https://github.com/AboutUs/exploratory-parsing>

Is there a link to the paper somewhere?

~~~
zeugma
Here: <http://www.cs.dartmouth.edu/reports/TR2011-705.pdf> It is sad we have
to dig for it; it should be the first link in the article.

~~~
bazzargh
Thanks.

Having read the paper, it seems it's more specialized to Cisco IOS config
files: they're building a library of patterns you can match against, but the
paper doesn't explain how you can add your own. And it does seem to be CFG
rather than PEG - there's a separate token library.

Ward's work looks more usably generic than this; it's not cited, so I presume
they weren't aware of it.

------
jlarocco
This sounds like a neat idea, but I'm not sure how useful it'll be.

Relying on vendors to supply machine readable data explaining their file
syntax seems... optimistic. Maybe "It's JSON" will be enough?

Also, I'm curious how the patterns will be specified. I can imagine it quickly
becoming complicated.

I'm also not sure the problem is enough of a problem for people to learn the
new pattern matching syntax. The occasions where I would have needed this have
been inconvenient, but not enough that I bothered to look if tools existed. I
usually just crank out a quick Python script to do it.
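
For what it's worth, the kind of throwaway script meant here might look like the sketch below: grep that returns whole blocks instead of single lines. It assumes blocks are an unindented header followed by indented lines, which is a guess at a typical config layout and not the pattern language from the article; the names `blocks` and `grep_blocks` and the sample config are made up.

```python
import re


def blocks(lines):
    """Group lines into blocks: an unindented header plus its indented body."""
    current = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if line[0].isspace() and current:
            current.append(line)  # indented line belongs to the open block
        else:
            if current:
                yield current
            current = [line]  # unindented line starts a new block
    if current:
        yield current


def grep_blocks(text, pattern):
    """Return every block containing at least one line that matches pattern."""
    rx = re.compile(pattern)
    return [b for b in blocks(text.splitlines()) if any(rx.search(l) for l in b)]


# Hypothetical sample config.
config = """\
interface Ethernet0
 ip address 10.0.0.1 255.255.255.0
 shutdown
interface Ethernet1
 ip address 10.0.1.1 255.255.255.0
hostname r1
"""
matches = grep_blocks(config, r"^\s*shutdown$")
for block in matches:
    print("\n".join(block))
```

This works until the format nests more than one level or stops using indentation, which is presumably where a real grammar starts to pay off.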

------
obtu
By _longitudinal diff_, are they referring to repeated diffs à la
`git log -p`? Then it isn't really different from repeating a diff algorithm on
adjacent snapshots. Better tools to diff tree-like structures are still good
news.
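
To make the "repeat a diff on adjacent snapshots" reading concrete, here is a minimal sketch using Python's standard `difflib`. Whether this matches what the paper means by longitudinal diff is my assumption; the function name `longitudinal_diff` and the snapshot data are made up, and it uses a plain line diff rather than anything hierarchical.

```python
import difflib


def longitudinal_diff(snapshots):
    """Diff each (name, text) snapshot against its successor, oldest first."""
    out = []
    for (old_name, old), (new_name, new) in zip(snapshots, snapshots[1:]):
        delta = difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile=old_name,
            tofile=new_name,
            lineterm="",
        )
        out.append(list(delta))
    return out


# Hypothetical history of one config file.
snapshots = [
    ("v1", "hostname r1\nmtu 1500"),
    ("v2", "hostname r1\nmtu 9000"),
    ("v3", "hostname r2\nmtu 9000"),
]
history = longitudinal_diff(snapshots)
for delta in history:
    print("\n".join(delta))
```

Swapping the line diff for a tree diff would change what each step reports, but not the loop over adjacent pairs.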

Specifically for conffiles, there's already Augeas (<http://augeas.net/>), a
good, practical tool to map many Unix/Linux formats to trees of text.

(I'm referring to the paper:
<http://www.cs.dartmouth.edu/reports/TR2011-705.pdf> )

------
iopuy
I'm sorry but this just doesn't seem of enough importance to warrant the
article. A student and professor's poster presentation? Many more
cool/interesting things have been done on GitHub.

From the article:

"...Gabriel Weaver, a Dartmouth graduate student who, along with Dartmouth
computer science professor Sean Smith, is creating the variants of grep and
diff. Weaver presented the new utilities at a poster session at the Usenix
Large Installation System Administration (LISA) conference, being held this
week in Boston."

