
Graphtage: A New Semantic Diffing Tool - mdelias
https://blog.trailofbits.com/2020/08/28/graphtage/
======
nerdponx
Hideous screenshots aside, Graphtage itself looks very useful. Can it generate
Git-compatible diffs for use as a Git difftool?

Also-- for "standalone" tools like this that are written in Python, I highly
recommend Pipx for installing them:
[https://pipxproject.github.io/pipx/](https://pipxproject.github.io/pipx/). It
installs each tool into a separate self-contained virtual environment and
symlinks the executable itself to a "bin" directory, which prevents tools with
different dependencies from conflicting.

------
ivan_ah
This is very interesting and a much needed tool. I have been searching for a
tool like this for a long time. There are so many tree-like structures that
I'm sure there will be interesting use cases...

I was recently working on a similar tool[1] but specific to the domain of
"content trees" that consist of content nodes organized into a hierarchical
structure. In my case each tree node has a persistent `content_id` associated
with the underlying content file and independent of its position within the
tree, which allows me to detect "move" operations[2] (a node with the same
`content_id` appearing in a different place in the tree).
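
As a rough illustration of the `content_id` idea (not treediffer's actual
code), a move can be detected by recording each node's parent in both trees
and reporting the nodes whose parent changed. The dict layout with `children`
and the helper names `parent_index` and `find_moves` are assumptions made for
this sketch; only `content_id` comes from the description above.

```python
def parent_index(node, parent_id=None, index=None):
    """Map every node's content_id to the content_id of its parent."""
    if index is None:
        index = {}
    index[node["content_id"]] = parent_id
    for child in node.get("children", []):
        parent_index(child, node["content_id"], index)
    return index


def find_moves(old_tree, new_tree):
    """Return {content_id: (old_parent_id, new_parent_id)} for moved nodes."""
    old_parents = parent_index(old_tree)
    new_parents = parent_index(new_tree)
    # Only nodes present in both versions can have moved.
    common = old_parents.keys() & new_parents.keys()
    return {
        cid: (old_parents[cid], new_parents[cid])
        for cid in common
        if old_parents[cid] != new_parents[cid]
    }


# Hypothetical trees: the "video" node (c3) moves from "math" to "science".
old_tree = {"content_id": "c0", "children": [
    {"content_id": "c1", "children": [{"content_id": "c3"}]},
    {"content_id": "c2"},
]}
new_tree = {"content_id": "c0", "children": [
    {"content_id": "c1"},
    {"content_id": "c2", "children": [{"content_id": "c3"}]},
]}

print(find_moves(old_tree, new_tree))  # {'c3': ('c1', 'c2')}
```

Comparing parents rather than full paths means that moving a folder reports
only the folder itself, not every node beneath it.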

The use case is for educational content: Kolibri channels[3] are these huge
trees that consist of thousands of nodes and it's difficult to know what has
changed when we create new versions of the channels. I tried all kinds of
general-purpose diffing tools and failed miserably so I started working on
treediffer. It's almost done; I hope to finish it later this fall, and will
look at graphtage to see how it works.

[1]
[https://github.com/learningequality/treediffer](https://github.com/learningequality/treediffer)
[2]
[https://treediffer.readthedocs.io/en/latest/diff_formats.html#example-of-node-moved](https://treediffer.readthedocs.io/en/latest/diff_formats.html#example-of-node-moved)
[3]
[https://kolibri-demo.learningequality.org/en/learn/#/topics](https://kolibri-demo.learningequality.org/en/learn/#/topics)

------
lewisjoe
Has anybody gone through React's HTML diffing algorithm? If this one's good,
we could write a JS version and use it for HTML diffing in browsers.

~~~
brunoqc
Graphtage could be compiled to wasm and used in a browser.

------
hinkley
I was staring at a diff today and longing for better semantic diffing.

I’d changed a shell script, with a chain of commands. I added a second call to
the same command with different args and the diff was just... bad.

    
    
        something && fizz foo && another
    
        something && fizz bar && fizz foo && another
    

It decided that “bar && fizz” was my edit, and I just stared at it (it was
already a tough day). Even if they had just weighted punctuation characters
differently, it would have gotten the right answer, as it would with adding
new functions or array entries, which it always gets wrong too.

Sort it out please.
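
A minimal sketch of the token-level idea, assuming it is acceptable to treat
`&&` as the only separator: split each command chain into whole commands and
diff the resulting lists with Python's `difflib`, rather than diffing
characters. The helper name `token_diff` is hypothetical, and this is not how
Graphtage or any particular diff tool works.

```python
import difflib


def token_diff(old, new, sep="&&"):
    """Diff two command chains command-by-command instead of char-by-char."""
    old_tokens = [t.strip() for t in old.split(sep)]
    new_tokens = [t.strip() for t in new.split(sep)]
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Record the kind of change and the commands involved on each side.
            changes.append((tag, old_tokens[i1:i2], new_tokens[j1:j2]))
    return changes


old = "something && fizz foo && another"
new = "something && fizz bar && fizz foo && another"
print(token_diff(old, new))  # [('insert', [], ['fizz bar'])]
```

Because each command is compared as a unit, the edit is reported as the
insertion of `fizz bar` rather than of `bar && fizz`.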

------
tingletech
interesting "This tool was partially developed with funding from the Defense
Advanced Research Projects Agency (DARPA) on the SafeDocs project."

I like the idea that it can do semantic diffs across different formats.

------
setpatchaddress
I would recommend deleting the screenshots, though. I looked at them and
thought "so what? that's been done many times before" until I read the text
more carefully.

~~~
hinkley
I would recommend reshooting the screenshots. Navy blue on a jet black
background? Removing new lines in the initial example but not in the diffs?
Fixing those would get the point across better.

Also, turn the saturation down. That’s the greenest green and the reddest red
next to the darkest blue. My eyes.

~~~
throwaway_pdp09
I can't see a problem - there's no pic. I guess they need JS to show images.

Back on point, I see so much of this grey-on-grey type thing; just a little
common sense would suggest it's very poor practice, but it keeps happening.

~~~
hinkley
Exhibit A:

[https://i1.wp.com/blog.trailofbits.com/wp-content/uploads/2020/08/example.png](https://i1.wp.com/blog.trailofbits.com/wp-content/uploads/2020/08/example.png)

~~~
throwaway_pdp09
First thought was you'd given me a nethack screenshot by accident, but thanks!
Interesting project.

------
sendbits
Super cool, having worked on related problems independently (tree-based file
compression & arbitrary graph-based file compares) _and_ currently being in
search of a better way to compare web scrapes over time.

kudos for putting the two concepts together / will give it a go

------
anotheryou
I want one that can also find non-perfectly matching moved lines :)

looks cool already though, got to try it some time.

~~~
idubrov
At my previous job I built a tool that was capable of doing that (we were
merging XMLs with form definitions). The main idea was an interactive mode.

Initially, the tool would merge based on a series of heuristics, and then the
user would manually adjust the "matching" nodes (the user could say "actually,
this A on the left and B on the right are the same, it's just that it was
heavily modified").
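
The workflow described here can be sketched as follows, assuming nodes are
represented by their serialized text and that similarity is a plain `difflib`
ratio (the original tool's heuristics are not known). The names
`match_nodes`, `overrides`, and `threshold` are hypothetical. User-supplied
pairs are pinned first, and only the remaining nodes are matched
heuristically.

```python
import difflib


def similarity(a, b):
    """Text similarity in [0, 1]; a stand-in for real matching heuristics."""
    return difflib.SequenceMatcher(a=a, b=b, autojunk=False).ratio()


def match_nodes(left, right, overrides=None, threshold=0.6):
    """Pair nodes from `left` with nodes from `right`.

    `overrides` holds pairs the user has declared to be "the same node";
    they are kept as-is. Remaining nodes are paired greedily, best score
    first, as long as the score reaches `threshold`.
    """
    pairs = dict(overrides or {})
    unmatched_left = [n for n in left if n not in pairs]
    unmatched_right = [n for n in right if n not in pairs.values()]
    candidates = sorted(
        ((similarity(l, r), l, r) for l in unmatched_left for r in unmatched_right),
        reverse=True,
    )
    used_right = set()
    for score, l, r in candidates:
        if score < threshold:
            break  # everything after this is too dissimilar
        if l in pairs or r in used_right:
            continue  # one side is already matched
        pairs[l] = r
        used_right.add(r)
    return pairs


# Made-up form definitions for illustration.
left = ['<field name="email"/>', '<field name="phone"/>',
        '<section title="Billing"/>']
right = ['<field name="email" required="true"/>',
         '<group label="Invoices and payments"/>', '<field name="phone"/>']

# The user says the heavily modified section and group are the same node.
manual = {'<section title="Billing"/>': '<group label="Invoices and payments"/>'}
result = match_nodes(left, right, overrides=manual)
```

In an interactive tool, `overrides` would grow as the user corrects matches,
and the heuristic pass would be rerun on whatever is still unpinned.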

~~~
hinkley
It seems like if the editor produced hints this would work better, but your
target audience also shrinks.

