Challenges of Resolving Merge Conflicts: A Mining and Survey Study [pdf] (uni-saarland.de)
40 points by matt_d 11 days ago | 7 comments

I'm still surprised merge conflicts show up so often. At their core, diff and diff3 haven't changed since the 1980s. They are complicated, heuristic-based algorithms that don't take into account the rich metadata and semantics we can understand today.

diff3 is not

- idempotent

- semantic

- stable (formally, stability means that there exists a constant such that for small enough changes there is a guaranteed small merge)

[1] https://matt-rickard.com/diff3-shortcomings/

[2] https://matt-rickard.com/diff-the-magic-behind-version-contr...

I was struggling to see how you could apply a test for "idempotence" or "stability" to diff3. Turns out in the paper they redefined what diff3 does in order to test it.

I think what tools like GitHub and GitLab need is a feature for PRs where, near the pull/merge button, there's a view of the other open PRs that the current one would create conflicts for once merged.

The reason this would be useful is that sometimes you really do need a merge to be large, and while a large and important PR is being reviewed, someone may merge another large but less important PR that delays the first one. For example, if I were about to merge a big PR and saw that it would conflict with something higher priority, I'd know to hold off without having to ask around or keep track of what everyone else is working on.
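To make the idea concrete, here is a rough Python sketch of the check such a view might run. Everything in it is an assumption for illustration: the PR names, the data model (each PR as a map from file path to the line ranges it touches on the base branch), and the heuristic itself. Overlapping line ranges only approximate what a real merge would flag, so treat it as a toy, not as how GitHub or GitLab would implement it.

```python
from itertools import combinations

# Hypothetical data model: each PR maps a file path to a list of inclusive
# (start, end) line ranges it modifies relative to the shared base branch.

def ranges_overlap(a, b):
    """True if two inclusive (start, end) line ranges share at least one line."""
    return a[0] <= b[1] and b[0] <= a[1]

def likely_conflicts(prs):
    """Return (pr_a, pr_b, path) triples where two PRs touch overlapping lines.

    This is only a heuristic: a real merge can conflict on adjacent lines or
    merge cleanly despite an overlap of identical changes.
    """
    hits = []
    for name_a, name_b in combinations(sorted(prs), 2):
        files_a, files_b = prs[name_a], prs[name_b]
        for path in sorted(files_a.keys() & files_b.keys()):
            if any(ranges_overlap(ra, rb)
                   for ra in files_a[path] for rb in files_b[path]):
                hits.append((name_a, name_b, path))
    return hits

# Made-up example data.
open_prs = {
    "pr-101": {"src/app.py": [(10, 40)], "README.md": [(1, 3)]},
    "pr-102": {"src/app.py": [(35, 50)]},
    "pr-103": {"src/app.py": [(60, 70)], "README.md": [(10, 12)]},
}

for pr_a, pr_b, path in likely_conflicts(open_prs):
    print(f"{pr_a} and {pr_b} both touch overlapping lines in {path}")
```

A real tool would more likely ask the VCS to trial-merge each pair of branches rather than compare line ranges, but the pairwise structure of the check would be the same.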

After reviewing a string of gigantic PRs, I tried to use mathematical reasoning to convince my team to keep PRs small. Glad to see some actual scientific evidence behind it. Not specifically related to merge conflicts, though.


" main results, (i) we found that committing small chunks makes merge conflict resolution faster when leaving other independent variables untouched, (ii) we found evidence that merge scenario characteristics (e.g., the number of lines of code or chunks changed in the merge scenario) are stronger correlated with our dependent variable than merge conflict characteristics (e.g., the number of lines of code or chunks in conflict), (iii) we devise a taxonomy of four types of challenges in merge conflict resolution, and (iv) we observed that the inherent dependencies among conflicting and non-conflicting code is one of the main factors influencing the merge conflict resolution time."

I was thinking about merge conflicts in the context of Rust recently. In a way, a borrow-checker-like rule for files would prevent merge conflicts, because two people could never hold write access to the same file at the same time. But this would require a central authority to give you permission to edit a file for a certain duration, during which nobody else can edit it.
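A minimal sketch of what that central authority could look like, in Python. The class name `LeaseTable` and its methods are made up for illustration, and it assumes a single in-memory table with an injectable clock. Leases expire on their own, so a forgotten lock does not block others forever.

```python
import time

class LeaseTable:
    """Toy central authority that hands out time-limited edit leases per file."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so expiry can be simulated
        self._leases = {}            # path -> (owner, expires_at)

    def acquire(self, path, owner, duration):
        """Grant `owner` exclusive edit rights on `path` for `duration` seconds.

        Fails only if someone else holds a lease that has not yet expired.
        Re-acquiring your own lease renews it.
        """
        now = self._clock()
        holder = self._leases.get(path)
        if holder is not None and holder[0] != owner and holder[1] > now:
            return False
        self._leases[path] = (owner, now + duration)
        return True

    def release(self, path, owner):
        """Give the lease back early; only the current holder may do so."""
        holder = self._leases.get(path)
        if holder is not None and holder[0] == owner:
            del self._leases[path]
            return True
        return False

# Usage with a fake clock so the expiry is visible without sleeping.
fake_now = [0.0]
table = LeaseTable(clock=lambda: fake_now[0])
print(table.acquire("src/main.rs", "alice", 10))  # alice gets the file
print(table.acquire("src/main.rs", "bob", 10))    # bob is refused while alice holds it
fake_now[0] = 11.0                                # alice's lease has run out
print(table.acquire("src/main.rs", "bob", 10))    # bob can now take over
```

The expiry is the design choice that matters: it trades the guarantee of exclusivity for never needing a human to break a stale lock, at the cost of the original holder's work possibly going stale.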

More software engineering research is good; I feel like we are lacking in that area. However, this study only covered public code, and analysis of private (company) code is not mentioned in the directions for future work. Would there be a way to extract information about conflicts and conflict resolution without giving away important parts of your code? I know that deobfuscation and reverse engineering are whole fields, so I think companies wouldn't take the risk. But if there is a way to extract only the git information, maybe that would be possible.
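One rough way to picture "only the git information": a script that reads a file left in a conflicted state and reports nothing but the size of each conflict chunk. The Python sketch below assumes Git's standard conflict markers (including the optional `|||||||` base section); the function name `conflict_stats` is made up, and whether size metadata alone would be enough for a study like this one is an open question.

```python
def conflict_stats(text):
    """Summarize conflict chunks in text containing Git-style conflict markers.

    Returns a list of (ours_lines, theirs_lines) tuples, one per chunk.
    No line content is kept, only counts.
    """
    stats = []
    state = None          # None, "ours", "base", or "theirs"
    ours = theirs = 0
    for line in text.splitlines():
        if line.startswith("<<<<<<< "):
            state, ours, theirs = "ours", 0, 0
        elif line.startswith("|||||||") and state == "ours":
            state = "base"            # diff3-style base section, not counted
        elif line.startswith("=======") and state in ("ours", "base"):
            state = "theirs"
        elif line.startswith(">>>>>>> ") and state == "theirs":
            stats.append((ours, theirs))
            state = None
        elif state == "ours":
            ours += 1
        elif state == "theirs":
            theirs += 1
    return stats

# Made-up conflicted file.
sample = "\n".join([
    "def f():",
    "<<<<<<< HEAD",
    "    x = 1",
    "    return x",
    "=======",
    "    return 2",
    ">>>>>>> feature",
    "print(f())",
])

for i, (ours, theirs) in enumerate(conflict_stats(sample), start=1):
    print(f"chunk {i}: {ours} line(s) on our side, {theirs} on theirs")
```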

I've used a version control system that has this feature (Perforce, specifically).

In principle, it's good. In practice, I think we found it to be more trouble than it's worth... Engineers go on break / vacation / get hit by a bus too often with the locks held, and then there has to be a policy process to break the lock while that user still has outstanding changes that are now stale (in an entire environment that didn't reconcile "stale" code well because it was designed to prevent the need to merge code).

At our scale, this required us to have someone act as a full-time "repo guru," which was a brutally thankless job.
