ConE: A concurrent edit detection tool for large scale software development (arxiv.org)
21 points by pramodbiligiri 25 days ago | 9 comments

I'm curious what devs are doing about this right now. Sometimes I'll pull and BAM, gross merge conflicts. Is there a git command I could run first? A workflow to adopt? I feel like some new tool from an academic paper will never make it into my toolbox.

Co-author here. The paper reports on a system that is being run on 100s of repos at Microsoft.

The best way to deal with the problem is still tried and true modularization.

Also it helps to merge/deploy frequently.

If you have multiple people editing the same code, it is bound to cause some problems. The best way is to not have multiple people editing the same code, separating them either in time or by APIs (these can be separate packages, or modules of the same application).

If by APIs or module boundaries, then this smells like Conway's Law: the organization of code mirrors the organization of people. https://en.wikipedia.org/wiki/Conway%27s_law

It is true and it actually aids modularization.

Teams have a natural tendency to produce modules separate from other teams' so that they can manage their development process more effectively.

The same applies within a team, with individual members working on their own modules. Here the modules can also reflect how the task was delegated, because the way you split the work between team members may end up defining the boundaries between modules.

But it does not necessarily have to be this way. Design decisions should be made consciously and selecting how to break up your application into modules is an important design decision.

Yes, just `git fetch` first, then you can take a look at what the merge would look like before actually applying it. `git pull` is approximately a synonym for `git fetch && git rebase origin/master` (if origin/master is your upstream branch)
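A minimal sketch of that fetch-then-inspect workflow, run against throwaway repositories in a temp directory so it is safe to try. The repo names (`seed`, `upstream.git`, `me`, `colleague`) and the `commit` helper are made up for the demo, and it assumes Git 2.28 or newer for `git init -b`. The trial merge uses `--no-commit --no-ff` followed by `git merge --abort`, so the branch ends up exactly where it started.

```sh
#!/bin/sh
# Sketch: inspect what a pull would bring in before changing your branch.
# Everything happens in a throwaway temp dir; assumes git >= 2.28 (init -b).
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# commit <repo> <file> <content> <message>: hypothetical demo helper
commit() {
  printf '%s\n' "$3" > "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "$4"
}

work=$(mktemp -d)
cd "$work"
git init -q -b master seed
commit seed app.txt "v1" "initial"
git clone -q --bare seed upstream.git
git clone -q upstream.git me
git clone -q upstream.git colleague

# A colleague pushes while I commit something else locally.
commit colleague app.txt "v2" "colleague: update app"
git -C colleague push -q origin master
commit me notes.txt "mine" "me: add notes"

cd me
before=$(git rev-parse HEAD)
git fetch -q origin                    # moves origin/master, not my branch
git log --oneline HEAD..origin/master  # commits I don't have yet
git diff --stat HEAD...origin/master   # their changes since the merge base

# Trial merge: build the result without committing it, report, then back out.
if git merge --no-commit --no-ff origin/master >/dev/null 2>&1; then
  echo "merge would be clean"
else
  echo "merge would conflict in:"
  git diff --name-only --diff-filter=U
fi
git merge --abort
after=$(git rev-parse HEAD)
incoming=$(git rev-list --count HEAD..origin/master)
```

If the report looks fine, a real `git merge origin/master` (or a rebase) can follow; nothing was changed in the meantime.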

If you don't like the results of merging, you might like rebasing better, as it linearizes the new history, so really you end up only re-applying your changes (the ones you presumably know better) on top of the new version of their changed code.
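A sketch of the rebase variant under the same kind of throwaway setup (hypothetical repo names and `commit` helper, Git 2.28+ assumed): the two local commits are replayed on top of the fetched upstream tip, and the resulting history contains no merge commits.

```sh
#!/bin/sh
# Sketch: fetch, then replay my local commits on top of upstream (no merge commit).
# Throwaway repos in a temp dir; assumes git >= 2.28 (init -b).
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# commit <repo> <file> <content> <message>: hypothetical demo helper
commit() {
  printf '%s\n' "$3" > "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "$4"
}

work=$(mktemp -d)
cd "$work"
git init -q -b master seed
commit seed app.txt "v1" "initial"
git clone -q --bare seed upstream.git
git clone -q upstream.git me
git clone -q upstream.git colleague

commit colleague app.txt "v2" "colleague: update app"
git -C colleague push -q origin master
commit me notes.txt "draft" "me: add notes"
commit me notes.txt "final" "me: finish notes"

cd me
git fetch -q origin
git rebase -q origin/master   # re-apply my 2 commits on top of the new tip
git log --oneline --graph     # a straight line: their commit, then mine

merges=$(git rev-list --merges --count HEAD)
ahead=$(git rev-list --count origin/master..HEAD)
if git merge-base --is-ancestor origin/master HEAD; then on_top=yes; else on_top=no; fi
```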

Rebasing gets a bad rap. Rebasing is what SVN and traditional SCMs would call a merge, and merge is really the weird crazy complex operation.

If you decide to bail halfway through, run `git rebase --abort`.

Or, for a particularly hairy merge, you could use a temporary branch that you can always abandon.
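Both escape hatches in one hedged sketch, again on throwaway repos with made-up names: a rebase that stops on a conflict is undone with `git rebase --abort`, and a hairy merge is attempted on a scratch branch (here called `tmp-merge`, an arbitrary name) that is simply deleted afterwards. The scratch branch has one advantage over `git merge --abort`: you can throw the result away even after committing your resolution.

```sh
#!/bin/sh
# Sketch: two ways to back out of an integration you don't like.
# Throwaway repos in a temp dir; assumes git >= 2.28 (init -b).
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# commit <repo> <file> <content> <message>: hypothetical demo helper
commit() {
  printf '%s\n' "$3" > "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "$4"
}

work=$(mktemp -d)
cd "$work"
git init -q -b master seed
commit seed app.txt "v1" "initial"
git clone -q --bare seed upstream.git
git clone -q upstream.git me
git clone -q upstream.git colleague

# Both sides change the same line, so any integration conflicts.
commit colleague app.txt "v2" "colleague: update app"
git -C colleague push -q origin master
commit me app.txt "mine" "me: update app"

cd me
start=$(git rev-parse HEAD)
git fetch -q origin

# 1) A rebase stops halfway on the conflict: bail out.
if ! git rebase -q origin/master >/dev/null 2>&1; then
  git rebase --abort          # back exactly where we started
fi
after_abort=$(git rev-parse HEAD)

# 2) Attempt the hairy merge on a scratch branch instead of on master.
git checkout -q -b tmp-merge
if ! git merge -q --no-edit origin/master >/dev/null 2>&1; then
  printf '%s\n' "v2 + mine" > app.txt   # a hand-made resolution
  git add app.txt
  git commit -q --no-edit
fi
# Unhappy even after committing? Drop the branch; master never moved.
git checkout -q master
git branch -q -D tmp-merge
final=$(git rev-parse HEAD)
```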

Funnily enough, older SCMs already dealt with this by having commands like `lock` to place an advisory lock on a file.

The best workflow to adopt to make merge conflicts less painful is this: if you have an area of the codebase with a lot of merge conflicts, break it up into more files so you are less likely to be making simultaneous changes. Frequent super hairy merge conflicts are an architecture code smell. You should only have serious merge conflicts when you really are working on the same logical units, and in that case you should communicate.
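A small sketch of why splitting helps, with one caveat: Git already merges edits to distant parts of the same file automatically, so the win comes when two people's changes would otherwise land on the same lines. Below, two features that share one line of `app.txt` conflict, while the same work split into `parse.txt` and `render.txt` merges cleanly. All file and branch names are hypothetical, and Git 2.28+ is assumed.

```sh
#!/bin/sh
# Sketch: same-line edits conflict; edits to separate files merge cleanly.
# Throwaway repo in a temp dir; assumes git >= 2.28 (init -b).
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# commit <repo> <file> <content> <message>: hypothetical demo helper
commit() {
  printf '%s\n' "$3" > "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "$4"
}

work=$(mktemp -d)
git init -q -b master "$work/repo"
cd "$work/repo"

# Before the split: both features live on one line of one file.
commit . app.txt "features = parse-v1 render-v1" "monolith"
git checkout -q -b alice
commit . app.txt "features = parse-v2 render-v1" "alice: parser"
git checkout -q -b bob master
commit . app.txt "features = parse-v1 render-v2" "bob: renderer"
git checkout -q alice
if git merge -q --no-edit bob >/dev/null 2>&1; then mono=clean; else mono=conflict; git merge --abort; fi

# After the split: one file per feature, each person edits their own file.
git checkout -q master
git rm -q app.txt
printf 'parse-v1\n' > parse.txt
printf 'render-v1\n' > render.txt
git add parse.txt render.txt
git commit -q -m "split into one file per feature"
git checkout -q -b alice2
commit . parse.txt "parse-v2" "alice: parser"
git checkout -q -b bob2 master
commit . render.txt "render-v2" "bob: renderer"
git checkout -q alice2
if git merge -q --no-edit bob2 >/dev/null 2>&1; then split=clean; else split=conflict; git merge --abort; fi

echo "monolith: $mono, split: $split"
```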

In a sense I'm advocating that you put into practice Conway's Law, which says that software tends to take on an architecture that reflects the org chart. If I and 10 others are constantly editing the same thing, maybe we need to figure out whether we can divide up the code's responsibilities so that we aren't always editing the same parts, or simplify the truly atomic, inseparable parts so that the merge conflicts end up looking trivial (e.g., a configuration-type line that lists a bunch of strategy objects to link in, with the actual code of those strategy objects spread across different files).

Or, alternately, after thinking it through maybe you find it's just irreducible and you have a "9 women can't make a baby in 1 month" scenario. In that case maybe you should communicate more so that everyone isn't working on it at once. If all 10 of those people have useful, essential input, do some up-front design / pseudocode and come together in a brief meeting or an asynchronous design doc review to get consensus on what to do before putting many hours into the low-level details, and then have just 1 or 2 people actually implement it.

At the end of the day, serious frequent merge conflicts are just a code representation of org chart level issues, so they are solved with communication.

> `git pull` is approximately a synonym for `git fetch && git rebase origin/master`

`git pull` is precisely `git fetch && git merge FETCH_HEAD`.
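A sketch that checks the claim on throwaway repos (hypothetical names, Git 2.28+ assumed). One caveat: recent Git releases may refuse a bare `git pull` on diverged branches until you say whether to merge or rebase, so the sketch passes `--no-rebase` explicitly. Both clones should end up with the same tree and a merge commit whose second parent is the colleague's commit.

```sh
#!/bin/sh
# Sketch: "git pull" (merge mode) vs. "git fetch" + "git merge FETCH_HEAD".
# Throwaway repos in a temp dir; assumes git >= 2.28 (init -b).
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# commit <repo> <file> <content> <message>: hypothetical demo helper
commit() {
  printf '%s\n' "$3" > "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "$4"
}

work=$(mktemp -d)
cd "$work"
git init -q -b master seed
commit seed app.txt "v1" "initial"
git clone -q --bare seed upstream.git
for r in colleague me1 me2; do git clone -q upstream.git "$r"; done

commit colleague app.txt "v2" "colleague: update app"
git -C colleague push -q origin master
# The same local work in both of "my" clones, so each has diverged.
commit me1 notes.txt "mine" "me: add notes"
commit me2 notes.txt "mine" "me: add notes"

# Clone 1: pull, told explicitly to merge.
git -C me1 pull -q --no-rebase --no-edit origin master
# Clone 2: the spelled-out equivalent.
git -C me2 fetch -q origin master
git -C me2 merge -q --no-edit FETCH_HEAD

tree1=$(git -C me1 rev-parse 'HEAD^{tree}')
tree2=$(git -C me2 rev-parse 'HEAD^{tree}')
p1=$(git -C me1 rev-parse 'HEAD^2')   # second parent = what was merged in
p2=$(git -C me2 rev-parse 'HEAD^2')
```

The commit hashes themselves differ (timestamps, merge messages), which is why the sketch compares trees and parents rather than commit IDs.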

Since rebases rewrite history, to get one you either add `--rebase` to the pull or configure pull to rebase (the `pull.rebase` setting; don't do this unless you really know that you need it).
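A sketch of the one-off form on throwaway repos (hypothetical names, Git 2.28+ assumed). `git pull --rebase` replays the local commit on top of what was fetched instead of creating a merge commit; the persistent `pull.rebase` setting appears only as a comment, given the caution above.

```sh
#!/bin/sh
# Sketch: "git pull --rebase" keeps history linear.
# Throwaway repos in a temp dir; assumes git >= 2.28 (init -b).
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# commit <repo> <file> <content> <message>: hypothetical demo helper
commit() {
  printf '%s\n' "$3" > "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "$4"
}

work=$(mktemp -d)
cd "$work"
git init -q -b master seed
commit seed app.txt "v1" "initial"
git clone -q --bare seed upstream.git
git clone -q upstream.git me
git clone -q upstream.git colleague

commit colleague app.txt "v2" "colleague: update app"
git -C colleague push -q origin master
commit me notes.txt "mine" "me: add notes"

cd me
# One-off: rebase instead of merge for this pull only.
git pull -q --rebase origin master
# Persistent per-repo alternative (see the caution above):
#   git config pull.rebase true

merges=$(git rev-list --merges --count HEAD)
parent=$(git rev-parse 'HEAD~1')
theirs=$(git -C ../colleague rev-parse HEAD)
subject=$(git log -1 --format=%s)
```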

Heh, sorry, I forgot that always fetch/rebasing instead of fetch/merging is a personal "quirk" of mine. It's much better IMHO because when I finally merge my work from a long-running feature branch, my colleagues see a single merge at that time and don't have their history polluted with all the little merges that happened whenever I pulled in others' changes ("phantom merges"), which they never actually got to see at the time. Especially for long-running feature branches, you can end up with an extremely illegible spaghetti history by constantly merging with `git pull`. Like I said, rebase should really be the safe default and merge the scary weird one. The dogma that "rebases alter history (OH NO!) but merges are SO SAFE" is just misleading when you look at what the operations actually do. After all, rebase works exactly like merge worked in legacy SCMs like SVN, and merge does something completely novel.

Therefore, merging should be reserved for when the feature branch is joined back into the mainline (thus preserving history that is already shared by the whole team), never for when newer mainline changes are joined back into the feature branch. There is no need to preserve "history" that nobody but me ever had; it's not even history, it's more like metahistory, the history of when my history was updated, and the actual linear history is still preserved. Doing it this way avoids the nonlinear, illegible spaghetti. It also keeps you from ending up with intermediate commits that don't build, which make bisecting painful.
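A sketch of that workflow in a single throwaway repo with hypothetical branch and file names (Git 2.28+ assumed): the feature branch catches up with `master` via `git rebase`, so it never gains merge commits, and it is joined back exactly once with `git merge --no-ff`, so mainline gets one visible merge.

```sh
#!/bin/sh
# Sketch: rebase the feature onto mainline while working, merge it back once.
# Throwaway repo in a temp dir; assumes git >= 2.28 (init -b).
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# commit <repo> <file> <content> <message>: hypothetical demo helper
commit() {
  printf '%s\n' "$3" > "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "$4"
}

work=$(mktemp -d)
git init -q -b master "$work/repo"
cd "$work/repo"
commit . app.txt "v1" "initial"

git checkout -q -b feature
commit . feature.txt "step 1" "feature: step 1"
git checkout -q master
commit . app.txt "v2" "mainline: update app"

# Catch up with mainline by replaying the feature commits, not by merging.
git checkout -q feature
git rebase -q master
commit . feature.txt "step 2" "feature: step 2"
git checkout -q master
commit . other.txt "x" "mainline: another change"
git checkout -q feature
git rebase -q master

# Feature done: join it back into mainline exactly once, as a visible merge.
git checkout -q master
git merge -q --no-ff --no-edit feature
git log --oneline --graph

master_merges=$(git rev-list --merges --count master)
feature_merges=$(git rev-list --merges --count feature)
mainline_len=$(git rev-list --first-parent --count master)
```

`--no-ff` forces the merge commit even though a fast-forward would be possible after the rebase; without it the feature commits would land on `master` with no marker of where the feature began and ended.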

This seems a little off topic, but I promise it's on topic: I learned to use git this way the hard way, when working on a team that had exactly this problem of lots of developers working on the same files with lots of feature branches.

PS: I read TFA after writing this comment (ha) and it sounds like this tool is a great way to detect areas of code that could most use the kinds of refactors I just discussed. Just following the pain would work too though, albeit with a slightly larger lag.
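For "following the pain", a crude sketch: rank files by how many commits touched them. This is only a churn proxy and does not detect concurrent edits the way the tool in the paper does; `hotspots` is a made-up helper name, and the demo history below is synthetic.

```sh
#!/bin/sh
# Sketch: find churn hotspots by counting commits per file.
# Throwaway repo in a temp dir; assumes git >= 2.28 (init -b).
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q -b master "$work/repo"
cd "$work/repo"

# Synthetic history: every commit touches hot.txt, each cold file only once.
for i in 1 2 3; do
  echo "edit $i" >> hot.txt
  echo "edit $i" > "cold$i.txt"
  git add -A
  git commit -q -m "change $i"
done

# hotspots [git-log options...]: files ranked by number of commits touching them
hotspots() {
  git log --format= --name-only "$@" | sed '/^$/d' | sort | uniq -c | sort -rn
}

hotspots
top=$(hotspots | head -n 1 | awk '{print $2}')
```

On a real repository you could narrow the window, e.g. `hotspots --since="3 months ago"`. The `awk` extraction of the top entry assumes paths without spaces.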
