"In the second step, we conducted a manual comparison between two diff outputs produced by Myers and Histogram algorithms from all files in the sample. The first two authors of this paper were involved to independently annotate the diff outputs that makes the result is expected to be more reliable. ... The comparison results between two authors from 377 files were subsequently computed to find the kappa agreement.Footnote16 We obtained 70.82%, which is categorized into ‘substantial agreement’ (Viera and Garrett 2005). This means, the statistic result of our manual study is acceptable."
Even though I'm inclined to agree with the example given in the paper and a lot of work clearly went into the qualitative evaluation, this feels like a very weak way to perform a qualitative analysis. Specifically:
- this is a sample size of two: the academic authors who chose to write a paper together about the quality of different diffing algorithms, i.e., a very small and skewed sample.
- there is no mention of any blinding in the labeling process, so preconceptions about the quality of the different diffing algorithms may have influenced the qualitative grading -- or they may not have! We simply don't know.
- there does not seem to be a clear account of how the representative sample was chosen, or of what factors were considered in determining a representative sample of changes, so that reviewers and other researchers could make different choices in the future and draw informed comparisons with this work.
To sum up: in my admittedly not at all authoritative opinion this portion of the paper cannot conclude more than something like, "further study is warranted on this topic, with a far better controlled and far larger sample size, and clearer explications of the methodological choices".
Regardless of that, it was an interesting read and not something previously on my radar as worth experimenting with at all! Kudos to the authors for drawing attention to it and for the other more quantitative aspects of the paper (which I examined less and charitably assume are top notch).
To add my own subjective experiment: I just compared Myers vs. histogram on my latest commit.
- myers presented a function I'd 90% gutted as the same function, edited (the rest was moved within the file, so no LCS algorithm could find it), much like word-diff often does. I thought this was clever.
- histogram presented it as one function deleted, and a completely new function added. This was cleaner.
But I'm not even sure which is more usable. Might even vary with the specific task, e.g. function evolution vs function readability. Difficult area!
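For anyone who wants to repeat the experiment: git accepts the algorithm per invocation, so you can diff the same commit both ways and compare the patches. A self-contained sandbox (the path and file names are made up; on a change this small the two outputs usually agree -- the interesting divergence shows up on larger refactors like the gutted function above):

```shell
# toy repo with one small edit
rm -rf /tmp/diffdemo && git init -q /tmp/diffdemo && cd /tmp/diffdemo
git config user.email you@example.com
git config user.name you
printf 'a\nb\nc\n' > f.txt
git add f.txt && git commit -qm init
printf 'a\nx\nc\n' > f.txt
git commit -aqm edit

# same change, rendered by each algorithm
git diff HEAD~1 HEAD --diff-algorithm=myers     > myers.patch
git diff HEAD~1 HEAD --diff-algorithm=histogram > histogram.patch
diff myers.patch histogram.patch   # shows where the renderings disagree
```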
I use the linux command line `diff` to implement a poor man's database.
An online file store has files added daily that I need to download but not duplicate, so I grab the list off the site, save it locally, and run `diff local_downloaded_files current_list_on_site | grep '>' | tr -d '>'`.
The '>' symbol marks files I haven't downloaded yet; the greater-than is the operation I need to perform to make my local_downloaded_files file match the list on the site.
So I download all the files that are indicated by the diff, add their names to local_downloaded_files file and voila, up to date downloads.
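The same set difference can also be computed with `comm(1)`, which avoids parsing diff markers entirely. A minimal sketch, with sample data standing in for the real lists (the file names are placeholders):

```shell
# sample data in place of the real lists
printf 'fileA\nfileB\n' > local_downloaded_files
printf 'fileA\nfileB\nfileC\n' > current_list_on_site

# comm requires sorted input
sort local_downloaded_files > local.sorted
sort current_list_on_site  > site.sorted

# -13 suppresses lines unique to file 1 and lines common to both,
# leaving only lines unique to the site: files not yet downloaded
comm -13 local.sorted site.sorted
# prints: fileC
```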
What would it look like to store files natively as insert/delete sequences instead? So instead of filesystems and diffs on top, we could have DIFFsystems and files on top. Kind of like a WAL. Files would be checkpoints in the WAL for efficiency, and diffs would be 100% accurate between two checkpoints. Probably takes a hell of a lot more space & CPU though..
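A toy sketch of that idea, assuming checkpoints are full snapshots and each "commit" appends a character-level edit script to a log (the class and all names here are invented for illustration, not any real system):

```python
import difflib

class DiffLog:
    """Toy 'DIFFsystem': one full checkpoint plus a log of edit scripts."""

    def __init__(self, checkpoint):
        self.checkpoint = checkpoint   # full snapshot, kept for efficiency
        self.log = []                  # one edit script per committed version

    def commit(self, new):
        """Diff the current content against `new` and append the edit script."""
        old = self.materialise()
        ops = []
        matcher = difflib.SequenceMatcher(a=old, b=new)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag != "equal":
                ops.append((i1, i2, new[j1:j2]))   # replace old[i1:i2]
        self.log.append(ops)

    def materialise(self):
        """Rebuild the current content by replaying every edit script."""
        text = self.checkpoint
        for ops in self.log:
            # apply edits right-to-left so earlier offsets stay valid
            for i1, i2, repl in sorted(ops, reverse=True):
                text = text[:i1] + repl + text[i2:]
        return text

log = DiffLog("hello world\n")
log.commit("hello brave world\n")
log.commit("goodbye brave world\n")
print(log.materialise())   # goodbye brave world
```

Reading the latest version replays the whole log, which is exactly why real WALs add periodic checkpoints -- and, as the comment suspects, why this costs CPU that plain files don't.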
Programmer Sally: "So, what are you going to do today Bob?"
Programmer Bob: "I'm not happy with the file baz.clj residing in my/ns. So I'm going to go to line 96 and change 2 to 42. I've been thinking about deleting line 124. If I have time, I'm also going to insert some text I've been working on at line 64."
Programmer Sally: (what's wrong with Bob?)
codeq ('co-deck') is a little application that imports your Git repositories into a Datomic database, then performs language-aware analysis on them, extending the Git model down from the file to the code quantum (codeq) level, and up across repos. By doing so, codeq allows you to:
- Track change at the program unit level (e.g. function and method definitions)
- Query your programs and libraries declaratively, with the same cognitive units and names you use while programming
- Query across repos
But, even more subtly: some changes also aren't truly insert/delete edits to strings. E.g. suppose the old state was the string "x=1" and the new state is the string "x=2". If your system tracks string edits, maybe it encodes this change as "delete char at index 2, append char '2' after index 1". But perhaps the user intent behind this change is actually <increment the value stored in x by 1>. So reasoning about the patch at the string level misses the point completely.
This difference doesn't really matter if you are just diffing changes from a single user, but it could matter an awful lot when trying to merge overlapping changes from multiple users! E.g. perhaps user A replaced "x=1" with "x=2" and user B also replaced "x=1" with "x=2". Depending on what the intent of both users is, perhaps the correct merged result is "x=2" or "x=3" or something entirely different!
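A tiny sketch of that gap (everything below is invented for illustration): a classic three-way text merge sees two identical patches and takes one, while a merge that knew both users meant "increment x" would compose the intents instead.

```python
base, ours, theirs = "x=1", "x=2", "x=2"   # both users edited x=1 -> x=2

def text_merge(base, ours, theirs):
    """Classic three-way rule at the string level."""
    if ours == theirs:
        return ours          # identical edits: take either one
    if ours == base:
        return theirs        # only they changed it
    if theirs == base:
        return ours          # only we changed it
    raise ValueError("conflict")

def intent_merge(base, intents):
    """If the *intents* are known, they can compose instead of collapse."""
    value = int(base.split("=")[1])
    for intent in intents:
        if intent == "increment":
            value += 1
    return f"x={value}"

print(text_merge(base, ours, theirs))                  # x=2
print(intent_merge(base, ["increment", "increment"]))  # x=3
```

Which answer is "correct" depends entirely on information the string-level patch no longer carries.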
Interestingly, what you want breaks an invariant I would expect traditional merge/diff to have:
`diff(X, Y) = 0 => merge(X, Y) = X = Y`
So what you're proposing would be a different pair of ops altogether, say `sdiff` and `smerge`.
Agree that understanding user intent is not really feasible, unless there is a way to capture the user's intent in some clear, machine-computable way.
But, less ambitiously, take the example of merging code in a popular programming language X: the language has some well-defined grammar, and maybe there's even an implementation of X's grammar in a language server we can call. It shouldn't be a blue-sky AI R&D project to build a custom merge tool that does a three-way merge of code in language X using the extra information of X's grammar, or feedback from a language server. We can use that extra information to reject proposed merges that don't even correspond to anything valid in X's grammar -- just prune the search space down to merges that make some kind of basic sense from an X-language perspective. Often there will be more than one possible merge that is also valid in an X-language context; we can show all of these valid merges to the user and ask them to pick the one they intended. But we can reject the other 99% of possible merges that are valid text-level merges yet break obvious, mechanically testable invariants of the X programming language or its grammar.
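As a proof of concept of the "prune grammatically invalid merges" idea (the candidate strings below are invented, and Python's stdlib `ast` module stands in for a real grammar checker or language server):

```python
import ast

def parses_as_python(source):
    """Cheap mechanical invariant: does the candidate merge even parse?"""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

# Two imagined resolutions of the same textual conflict:
candidates = [
    "def f(x):\n    return x + 1\n",      # valid Python
    "def f(x):\n    return x + 1\n)\n",   # stray ')' from a bad text merge
]

valid = [c for c in candidates if parses_as_python(c)]
print(len(valid))   # 1 -- only the grammatical merge survives to be shown
```

A real tool would apply much stronger checks (name resolution, type checks via a language server), but even this parse-only filter discards a large class of nonsense merges.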
One extension point that git offers for this is the "merge driver": you can define an external script/application that git will call whenever a merge conflict needs to be resolved for some particular file (matched by path or pattern).
Here's an older blog post describing a custom git merge driver for merging data files in a game engine: http://bitsquid.blogspot.com/2010/06/avoiding-content-locks-... In that gamedev context, it was less important to produce the "correct" merge result than to produce some result with a valid file format. Less technical users could then fix up bad merge-tool decisions in an editor with a UI, instead of resolving the conflicts at the level of the raw serialisation format itself (which could corrupt the data file and make it impossible to load into the editor).
In other situations where there is a large cost to automatically producing the wrong merge result, it would be a better tradeoff to "halt the line" if there is ambiguity about how a merge should be resolved, and escalate to a human to decide what to do.
Further reading:
- How to wire a custom merge driver into git: https://git-scm.com/docs/gitattributes#_defining_a_custom_me...
- What values you can pass to a merge driver on the command line: https://github.com/git/git/blob/f1d4a28250629ae469fc5dd59ab8...
- A simple example of the plumbing to wire in a merge driver, with a trivial dumb driver script: https://github.com/Praqma/git-merge-driver
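For completeness, the wiring looks roughly like this (the driver name "gamedata" and the script path are hypothetical; `%O`, `%A`, and `%B` are git's placeholders for the ancestor, current, and other versions of the file):

```
# .gitattributes -- route matching files to the driver
*.level merge=gamedata

# .git/config (or set via `git config`) -- define the driver
[merge "gamedata"]
    name = game data three-way merge
    driver = /usr/local/bin/gamedata-merge %O %A %B
```

The driver writes its result over the `%A` file and signals an unresolvable conflict via a non-zero exit code.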
"How different are different 'How different are different diff algorithms in Git' articles?
I'd much rather encounter an error and solve it myself. Linus got it right: keep the diffing very simple and obvious, then defer to a human if there is ambiguity.
This is the current state of things, though. Heuristic diff algorithms generate nonsensical patches all the time. They may be technically correct but completely obscure the meaning of the diff.
The simplicity of the diff algorithm does not help me understand a bad diff. And these diff algorithms are already not that simple. A machine learning approach could still guarantee technical correctness of the diff, but give patches that make sense more often.
So basically with diff we had 1 problem, with ML-diff we'll have two.
There seems to be a serious bias against ML in systems programming that is unwarranted. There are plenty of problems using bad heuristics today that could be improved with ML in a way that doesn't reduce reliability. Scheduling, compiler optimization, stuff like that.
I am on my phone, so I cannot check, but my two assumptions would be: we might use the default so that people see a match to what they would get when diffing on their own computers.
And if it is configurable, it's probably just at the instance level, so you would need to run GitLab yourself rather than use the dot-com instance.
Again just assumptions, will double check tomorrow.
`git config --global diff.algorithm histogram`
Yea, a fairly accurate TL;DR.