
Git wishlist: aggregate changes across non-contiguous commits (2019) - luu
https://blog.plover.com/prog/git-wishlist.html
======
aeneasmackenzie
1. implement the patch category that Pijul uses

2. compose the patches

3. either you have a nice diff to show the user, or an explanation of why
that is not possible, like "Can't show you the composition of A, X, Q because
X depends on C"

Unfortunately they keep rewriting pijul from scratch.

~~~
dilap
I am excited about the current rewrite though since (if it lives up to the
hype) it'll bring performance good enough to use Pijul for non-tiny projects.

Not sure if Pijul will be the software to do it, but someday we'll all be
using patch-based VCSs and look back at graph-based models like git the same
way we look at non-DVCS today.

------
Groxx
`git log` does this. Filter by --author or --grep to find the task ID or
whatever.

If you want to produce a sum-total set of changes... it's a dubious/imprecise
desire, as intervening changes may affect the result. But you can still do a
potentially-good-enough job by cherry picking commits you're interested in
from a common ancestor, and diffing the end result against the ancestor you
started from. If the cherry-picks don't apply cleanly, there's no
straightforward diff or answer, and then it's of course much more complex.
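
A minimal sketch of that cherry-pick-and-diff approach, run against a throwaway repository so it is safe to try. The file names, the tags (`base`, `A`, `B`, `C`) and the `commit` helper are all made up for illustration; `A` and `C` stand in for the commits you care about and `B` for an intervening one.

```shell
#!/bin/sh
set -eu

# Throwaway repository with history: base, A, B, C.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"

commit() {  # commit <file> <content> <message>: hypothetical helper
    printf '%s\n' "$2" > "$1"
    git add "$1"
    git commit -q -m "$3"
}

commit base.txt base "base";        git tag base
commit a.txt alpha "A: task-1";     git tag A
commit b.txt beta "B: unrelated";   git tag B
commit c.txt gamma "C: task-1";     git tag C

# Start from the common ancestor on a detached HEAD, so no branch is created.
git checkout -q --detach base
# Replay only the commits of interest, oldest first.
git cherry-pick A C >/dev/null
# The aggregate change is the diff from the ancestor to the replayed result.
changed=$(git diff --name-only base HEAD)
echo "$changed"
```

If a cherry-pick stops on a conflict, that is the "no straightforward answer" case; `git cherry-pick --abort` backs out of it.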

\---

edit: or I just googled around a little, and if all you want is the sum of all
changes in arbitrary commits that you can find with log: `git log -p |
diffstat`

if you add -> remove -> restore a line, it'll count that as 2 additions and 1
removal.
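
A small sketch of that double counting on a scratch repository. To avoid depending on `diffstat` being installed, it swaps in `git log --numstat` summed with `awk`, which is a substitution on my part and not the exact pipeline above; the file name and commit messages are invented.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"

printf 'one\n' > f.txt
git add f.txt
git commit -q -m "base"
git tag base

# add -> remove -> restore the same line, one commit each
printf 'one\ntwo\n' > f.txt; git commit -q -am "add line"
printf 'one\n'      > f.txt; git commit -q -am "remove line"
printf 'one\ntwo\n' > f.txt; git commit -q -am "restore line"

# Summing per-commit patches counts every intermediate step.
summed=$(git log --pretty=format: --numstat base..HEAD |
    awk 'NF == 3 { a += $1; d += $2 } END { print a, d }')
# A single diff from start to end only sees the net change.
net=$(git diff --numstat base HEAD |
    awk '{ a += $1; d += $2 } END { print a, d }')

echo "summed per commit (added deleted): $summed"
echo "net diff (added deleted):          $net"
```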

------
phyzome
That list is missing a level zero: "It already does, you just haven't found it
yet." (Not applicable here, mind you.)

~~~
Pxtl
Yes, I recently discovered "git cherry" in exactly that kind of situation.

In general I'm gradually sinking into "the emperor has no clothes" when it
comes to git, at least on the subject of squashing vs painstakingly
manipulating history vs accepting filthy history... But I also admit the
amount of _stuff_ you can do with git is so endless that I'm always surprised
to find it has a good solution for a problem I never even considered tackling
before it came up.
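
For anyone who hasn't met it: `git cherry <upstream> <head>` lists the commits on `<head>` that are not reachable from `<upstream>`, marking with `-` those whose patch already exists upstream under a different hash and with `+` those that don't. A rough sketch on a scratch repository; the branch names `topic` and `target` and the file names are invented.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"

printf 'base\n' > base.txt
git add base.txt
git commit -q -m "base"
git tag base

# Two commits on a topic branch.
git checkout -q -b topic
printf 'one\n' > one.txt; git add one.txt; git commit -q -m "topic: one"
printf 'two\n' > two.txt; git add two.txt; git commit -q -m "topic: two"

# The target branch moves on and picks up only the first topic commit,
# which lands there with a new hash.
git checkout -q -b target base
printf 'other\n' > other.txt; git add other.txt; git commit -q -m "target: other"
git cherry-pick topic~1 >/dev/null

git cherry -v target topic
applied=$(git cherry target topic | grep -c '^-')
pending=$(git cherry target topic | grep -c '^+')
echo "already in target: $applied, still missing: $pending"
```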

~~~
jatone
git is really simple; it only has like 6 atomic operations. that's what makes it
so beautiful. a git rebase to reorder the commits and drop the uninteresting
ones would produce the result wanted here, as would a few other approaches.

~~~
Pxtl
I'm aware of the interactive rebase and the simple rebase, but the amount of
stuff we throw at students and juniors, asking them to clean their history in
such a hoary old UI, seems like a lot to ask.

Plus, I've become painfully aware of what happens when you encourage squashing
and rebasing but developers don't carefully delete the source branches. The
housekeeping of old branches is nightmarish since they're all _ahead_ of the
target, because rebasing means the commits aren't the commits.

And it means allowing force-push on our git server.

~~~
TheCraiggers
git rebase is akin to being a wizard but forbidden to use magic. I _know_ it's
there. I _know_ it could solve a ton of my problems. But I can't use it
because it would break everything from the feature-branch promotion model to
my teammate's repos. It's basically useless to me, despite all the power it
grants me.

~~~
dharmab
We use rebase heavily and it makes our lives so much easier. The main rule is
that we never rebase anything in the main repository, only in our forks. And
everyone accepts that the forks are unstable.

~~~
lmm
What do you actually get out of it though? Everyone talks about "clean
history" but how much do you actually use that history? (except to bisect -
and bisect works better without rebasing, IME)

You're definitely missing out on being able to pull each other's branches -
that's a huge help for avoiding conflicts ahead of time (if you know your
colleague is halfway through a change in a given code area, start from their
branch rather than master) or for unblocking development while a quickfix is
making its way to master. I could believe that tradeoff could be worthwhile,
but I've never found that rebasing offers any real advantages in practice.

~~~
phyzome
I use it all the time! I do really messy stuff locally, and then I rebase to
organize it into commits that tell a story, which makes it easier for
reviewers (and archaeologists, such as me in 5 months) to understand.

Most common pattern: Commit 1 does a preliminary refactoring, commit 2 does
the actual feature change.

If you squash them together, the changes are harder to understand. But in the
editing process, I often end up doing additional refactoring before I'm done
with the feature work, so I use rebase -i to reorder and squash the changes
appropriately. (I also run tests at both commits to make sure that my story is
true, heh.)

(This way, you can even do things like later revert the feature change without
undoing the refactoring work.)

~~~
lmm
> I do really messy stuff locally, and then I rebase to organize it into
> commits that tell a story, which makes it easier for reviewers (and
> archaeologists, such as me in 5 months) to understand.

So people review commit-by-commit? What review tool are you using for that?
Github and Bitbucket very much nudge you towards reviewing the PR's diff as a
whole, IME.

And do you find that you actually look at the history in 5 months' time?
People worry about this a lot but I find it's pretty rare to actually need
more history than what's in the diffs.

> But in the editing process, I often end up doing additional refactoring
> before I'm done with the feature work, so I use rebase -i to reorder and
> squash the changes appropriately. (I also run tests at both commits to make
> sure that my story is true, heh.)

I do that but without the reordering - it just seems like a lot of overhead
(and lots of potential for conflicts since the refactoring will almost
necessarily be in the same code area as the feature work) plus running the
build/tests again etc.

> (This way, you can even do things like later revert the feature change
> without undoing the refactoring work.)

You can do that either way, though I guess if all the refactoring commits were
before the feature commits then that means fewer conflicts (though you still
have to worry about subsequent refactorings in the same area). Is that
something that happens often?

~~~
dharmab
> Github and Bitbucket very much nudge you towards reviewing the PR's diff as
> a whole, IME.

That's if you're using the merge or squash options. There's an enforceable
rebase option in GitHub which preserves the original commits.

> And do you find that you actually look at the history in 5 months' time?

I often go back 2-3 years to do archaeology on code written by people who are
no longer in the org.

~~~
lmm
> That's if you're using the merge or squash options. There's an enforceable
> rebase option in GitHub which preserves the original commits.

A rebase by definition doesn't preserve the original commits (whereas a merge
does). But I'm talking about what the workflow for reviewing the PR looks
like, before you approve it for merge (in any form) - that's the usual time
for review, no?

------
zwegner
As the article notes, this is hard to do cleanly, depending on the exact
semantics of what you want. And even with a full spec in hand, I don't think
git's data model makes it particularly easy to compute this information.

One method for doing this manually is to do an interactive rebase and re-order
all the relevant commits to the end. You basically create a new branch that
has all the other commits first, and then you can diff between that point and
the branch head. With the example commits from the post, you'd order them A B
D F G C E H; then git diff G H would show just the changes from C E H.

This is a bit annoying in that it needs to use the working tree to apply all
the commits in sequence, and you have to manually resolve any conflicts that
arise, which gets more difficult the larger and more-intertwined the commits
get. But maybe it's better than nothing...
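
A rough sketch of that reordering on a scratch repository, with a shortened history (B C D E, where C and E are the relevant commits). The `GIT_SEQUENCE_EDITOR` one-liner is only a stand-in for editing the todo list by hand, and it assumes the relevant commits can be recognised by the made-up marker `task-1` in their subject lines; the `commit` helper and file names are invented too.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"

commit() {  # commit <file> <message>: hypothetical helper, one new file per commit
    printf '%s\n' "$2" > "$1"
    git add "$1"
    git commit -q -m "$2"
}

commit start.txt "start"; git tag start
commit b.txt "B: other"
commit c.txt "C: task-1"
commit d.txt "D: other"
commit e.txt "E: task-1"

# Stand-in for hand-editing the todo list: keep every line that does not
# mention task-1 in place, then append the task-1 picks at the end.
GIT_SEQUENCE_EDITOR='f() { grep -v "task-1" "$1" > "$1.new"; grep "task-1" "$1" >> "$1.new"; mv "$1.new" "$1"; }; f' \
    git rebase -i start >/dev/null 2>&1

# History is now B D C E, so the last two commits are the task-1 ones.
order=$(git log --reverse --format=%s start..HEAD | cut -c1 | tr -d '\n')
changed=$(git diff --name-only HEAD~2 HEAD)
echo "new order: $order"
echo "$changed"
```

Here every commit touches its own file, so nothing conflicts; with intertwined commits the rebase would stop and ask for manual resolution, as described above.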

~~~
chrismorgan
git-revise is what you want here. It’s rebase, but without touching the
working tree. This makes it _vastly_ faster, avoids changing file mtimes
(which is a _very_ useful property if you have a build system that operates on
mtimes, such as make, or something that is watching for file system changes),
and if you do need to resolve conflicts it’s easier to confirm that you
haven’t changed anything in the final result, because that’ll produce warnings
and a dirty index.

[https://github.com/mystor/git-revise](https://github.com/mystor/git-revise)

Since learning about this, 90% of my rebase usage has switched to revise, and
I’m much happier with it all. I also probably revise commits twice as often as
I used to because it’s so much faster and has no file system modification
side-effects.

When there are conflicts, it only gets you a two-way merge, so
_occasionally_ I reach for rebase to help with handling non-trivial conflicts
from reorderings.

~~~
ninkendo
I’ve been wanting something like this for a long time, thanks!

I like to do a lot of rebasing and splicing together/apart various commits,
and I hate that rebase touches the working tree, it means I can’t have my IDE
open during the process or it starts throwing error messages about invalid
project files (due to the conflict markers showing up.) I’ll definitely have
to give this a look.

~~~
chrismorgan
Yes, this is exactly what you want. Even if it’s bothersome setting it up
(which Python things can be, depending on how you are willing to install it;
especially on Windows), you will find it worthwhile.

------
_ZeD_
umm... but why can't you just create a new branch, cherry-pick all the
relevant commits, then diff the result with the "base"?

~~~
gjm11
I'd suggest: Because _creating a new branch_ just for a one-off look at some
diffs seems kinda heavyweight, and because it requires several separate
operations in order to do a single thing.

~~~
_ZeD_
branches in git are not really heavyweight; look, for example, at mercurial
ones and why the pypy team prefers them [0]

moreover I think this approach could be at least "scriptable", if not
aliasable
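
On the "scriptable" point, a rough sketch: a shell function that does the cherry-pick-and-diff in a throwaway `git worktree`, so no branch is created and the main checkout is left alone. `aggregate_diff`, the `commit` helper and the demo repository are made-up names, and it assumes a git recent enough to have `git worktree remove`.

```shell
#!/bin/sh
set -eu

# aggregate_diff <base> <commit>...
# Hypothetical helper: replays the given commits on top of <base> in a
# temporary detached worktree, prints the combined diff, then cleans up.
aggregate_diff() {
    base=$1
    shift
    scratch=$(mktemp -d)
    git worktree add --detach "$scratch/wt" "$base" >/dev/null 2>&1
    status=0
    (
        cd "$scratch/wt" &&
        git cherry-pick "$@" >/dev/null &&
        git diff "$base" HEAD
    ) || status=$?
    # Clean up even if a cherry-pick failed part-way through.
    git worktree remove --force "$scratch/wt"
    rm -rf "$scratch"
    return "$status"
}

# Demo on a scratch repository: base, A, B, C, where A and C are of interest.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"

commit() {  # commit <file> <content> <message>: hypothetical helper
    printf '%s\n' "$2" > "$1"
    git add "$1"
    git commit -q -m "$3"
}

commit base.txt base "base";        git tag base
commit a.txt alpha "A: task-1";     git tag A
commit b.txt beta "B: unrelated";   git tag B
commit c.txt gamma "C: task-1";     git tag C

patch=$(aggregate_diff base A C)
echo "$patch"
```

A non-zero return means one of the cherry-picks did not apply cleanly, in which case there is no combined diff to show.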

.. [0]: [https://doc.pypy.org/en/latest/faq.html#why-doesn-t-pypy-use...](https://doc.pypy.org/en/latest/faq.html#why-doesn-t-pypy-use-git-and-move-to-github)

------
wbharding
My company GitClear has invested some years into this very wish. We came up
with "commit groups", described here
[https://www.gitclear.com/review_code_faster](https://www.gitclear.com/review_code_faster)

During the development phase, our service has been priced at $30 per
user/month, which is too expensive for a diff viewing tool, but we're going to
start offering a free version in the next couple of weeks that will offer
access to commit groups.

Here are some other experiments we've been working on to try to reduce
cognitive load spent reviewing commits:
[https://www.gitclear.com/blog/diff_viewer_updates](https://www.gitclear.com/blog/diff_viewer_updates)

I believe a lot of what makes reviewing code so time-intensive is that
traditional diff viewers aren't recognizing moved or find/replace code, so
devs still have to waste a lot of cognitive attention parsing what are
effectively no-ops.

------
VeejayRampay
you could use interactive rebase for this

~~~
cowsandmilk
Yeah, that was my thought. Sure, I can create artificial scenarios where it
wouldn’t work, but I’m certain it would work 90% of the time.

~~~
jatone
yup. and the cases it wouldn't would need developer intervention anyways.

------
jfkebwjsbx
If your commits are clean and properly ordered, you won't need this.

And if they aren't, then you cannot rely on the diff because semantics may
have changed in-between...

~~~
jonhohle
If your commits are clean and improperly ordered, this may be useful.

A perforce implementation I used in the past supported checking into a
development branch and reviewing the set of CLNs that hadn’t been merged to
mainline. I don’t necessarily agree with the workflow, but it was nice to have
code-complete but not-yet-ready-to-ship things in everyone’s working branch.

~~~
jfkebwjsbx
If they are improperly ordered, then the first thing should be reordering them
with an interactive rebase and then you are back at my previous post, though.

