Hacker News
The second-order-diff Git trick (moertel.com)
180 points by malloc47 1734 days ago | 41 comments

It looks like what you're really looking for is a local branch that you never push to the remote, combined with 'git diff master...' and 'git rebase'.

Create a local branch, apply the sledgehammer, and start reviewing changes. Use 'git diff master...' to review them. (This is short for 'git diff master...HEAD', which in turn means 'compare the current branch's HEAD with the commit on master from which you branched'.) Use 'git add -p' to stage the changes you like, and commit them.

Iterate until you're happy with the result. You will have ended up with a few commits on your local branch. Use 'git rebase -i master' to squash all of them into a single one. Finally, check out the master branch and merge your local branch in, preferably with --ff.
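To see what the three-dot form buys you, here is a small sketch in a throwaway repository. The file names and the branch name `review` are made up, and the author/committer environment variables only keep the commits independent of your global Git config.

```shell
#!/bin/sh
# Sketch: `git diff master...` vs. a plain two-dot diff on a local review branch.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # don't depend on init.defaultBranch
echo "hello" > a.txt
git add a.txt
git commit -q -m "base"

# Local, never-pushed branch where the "sledgehammer" edits a.txt.
git checkout -q -b review
echo "hello, world" > a.txt
git commit -q -am "sledgehammer result"

# Meanwhile master moves on independently.
git checkout -q master
echo "unrelated" > b.txt
git add b.txt
git commit -q -m "unrelated work on master"
git checkout -q review

# Three dots: from the merge base of master and HEAD to HEAD,
# so master's later commit (b.txt) stays out of the review.
three_dot=$(git diff --name-only master...)
# Two dots compare the two tips directly, so b.txt shows up too.
two_dot=$(git diff --name-only master..HEAD)
echo "master...     : $three_dot"
echo "master..HEAD  : $two_dot"
```

Because the three-dot form starts from the merge base, commits that land on master after you branch don't clutter the review.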

In this case, however, I don't want to use something like `git add -p` to pick the sledgehammer's good effects from the bad. That there are bad effects means the original sledgehammer was wrong. I want to fix the sledgehammer.

Yes, I could accept the good effects and then create a new sledgehammer to attempt to fix the bad effects (without affecting any good effects or any original code that just happens to look like a bad effect), but it's easier and more reliable to just roll back all of the effects, fix the original sledgehammer (tweak the regex), and reapply it to the original clean slate.
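A minimal sketch of that loop, assuming a POSIX shell, a `sed` that supports `-E`, and a hypothetical one-line post and link-wrapping regex standing in for the real posts and sledgehammer: apply the first sledgehammer, review its diff, stash the result, apply the tweaked sledgehammer to the clean slate, and diff against the stash.

```shell
#!/bin/sh
# Sketch of the second-order diff in a throwaway repository.
# post.md, sledge_v1 and sledge_v2 are hypothetical stand-ins.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
printf '%s\n' 'See http://example.com.' 'Plain text here.' > post.md
git add post.md
git commit -q -m "clean slate"

# v1 wraps URLs in <...> but also swallows a sentence-ending period.
sledge_v1() { sed -E 's#(http://[^ ]+)#<\1>#' post.md > post.tmp && mv post.tmp post.md; }
# v2 is the tweaked regex: the captured URL may not end in a period.
sledge_v2() { sed -E 's#(http://[^ ]*[^ .])#<\1>#' post.md > post.tmp && mv post.tmp post.md; }

sledge_v1
git diff              # first-order diff: review every effect of v1 once
git stash -q          # save v1's result as a commit, restore the clean slate
sledge_v2             # reapply the fixed sledgehammer from scratch
git diff stash@{0}    # second-order diff: only where v2 differs from v1
```

The last command compares the working tree with the stash commit, so it shows just the line where the two sledgehammers disagree, not the untouched one.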

Why not just commit after the first step? You can then use the normal diff, and afterwards just drop the temporary commit before making the final one.

> Why not just commit after the first step?

Good question. In answer, there are two reasons:

First, a stash is a commit, just one that doesn't get in your way and that you don't have to keep track of yourself because it's automatically pointed to by a known reference called "stash".

Second, by using "git stash" to create this commit instead of "git commit", you save yourself the (small) burden of moving the commit out of your way and resetting the working tree back to the clean slate you started from – and upon which you want to try a new, slightly adjusted sledgehammer from scratch. That's exactly what "git stash" does in one step:

"Use git stash when you want to record the current state of the working directory and the index, but want to go back to a clean working directory." [1]

[1] http://www.kernel.org/pub/software/scm/git/docs/git-stash.ht...
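Both points can be checked directly. The sketch below uses a throwaway repository with a made-up `notes.txt`, stashes a change, and inspects the result.

```shell
#!/bin/sh
# Sketch: a stash is an ordinary commit, reachable from the ref "stash".
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo original > notes.txt
git add notes.txt
git commit -q -m "clean slate"

echo "sledgehammer output" > notes.txt   # hypothetical bulk edit
git stash -q

git rev-parse --symbolic-full-name stash   # refs/stash
git cat-file -t stash                      # commit
git show stash:notes.txt                   # sledgehammer output (the saved result)
cat notes.txt                              # original (the clean slate is back)
git status --porcelain                     # no output: nothing is modified
```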

If you do that, you're just one accidental step away from uploading your bad changes to the origin. The stash way seems a bit safer and more natural to me.

or, do it on a branch, then merge it into the production branch.

I think 'git add -p .' would have worked better for the specific example he gave, though (step through and stage change by change).

The reason I don't use `git add -p` to incrementally approve the "good" changes is that the sledgehammer is not incremental: it expects to be applied to the original clean slate. Using git stash will get me back to that clean slate. Using `git add -p` will not, for both the approved and unapproved changes will remain in the working tree, where the sledgehammer expects neither of them to be.

Getting back to the clean slate is pretty trivial whatever you do, which is one of the nice things about git. I'd probably prefer to have a main (sub) branch that I add to incrementally, apply various sledgehammers to the "clean slate" and rebase the results of the sledgehammer liberally.

As I said in my other comment, that will only work for idempotent things: if you'd like to change, say, ' to ", the workflow in the article works, but git add -p won't.

I don't follow you. Any time I'm inspecting the diff to see if I made the change I want, staging the changes I want progressively and then discarding the rest has worked well.

the important part missing in my example is the 'vice versa'. the canonical example is:

> sledge() { tr ab ba < "$1" | sponge "$1"; }

Then you might undo a change you had already staged with -p.

more convincing might be (using gnu sed):

> sledge() { sed -i 's/identifier/long_identifier/' "$1"; }

A second iteration will then produce long_long_identifier.
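The rename example can be reproduced in a throwaway repository. This sketch writes through a temporary file instead of `sed -i`, since BSD and GNU sed disagree on that flag's syntax; the file name is made up. It also shows that going back to the clean slate (here via `git stash`) makes the next run behave like the first.

```shell
#!/bin/sh
# Sketch: a non-idempotent sledgehammer, and why restarting from the clean slate matters.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo identifier > code.txt
git add code.txt
git commit -q -m "clean slate"

# Same rename as above, via a temp file to sidestep `sed -i` portability issues.
sledge() { sed 's/identifier/long_identifier/' "$1" > "$1.tmp" && mv "$1.tmp" "$1"; }

sledge code.txt; cat code.txt   # long_identifier
sledge code.txt; cat code.txt   # long_long_identifier: applying twice compounds
git stash -q                    # back to the committed "identifier"
sledge code.txt; cat code.txt   # long_identifier again
```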

git stash has other cool features and use cases (for example, when you want to pull from upstream and you have conflicting changes.) http://git-scm.com/book/ch6-3.html is a nice summary and the manpage also gives some sample workflows.

I actually saw this as a strike against git usability -- this should happen transparently, IMO. Why should I have to run `git stash save; git pull --rebase; git stash pop` for things to apply cleanly? Just do it automatically and fail if applying the stash fails.

Because then git would be doing magic stuff you don't (necessarily) understand, and people who like that aren't git's target audience. Those are all separate pieces of functionality that shouldn't be stuck together by default.

The same argument could be applied against git pull, which is really the concatenation of fetch and merge (or rebase). I agree with the previous poster that there should be better defaults available to non-tweakers.

Git pull always seemed weird to me too. I don't use it.

I'd call that "happening opaquely"

You could just create an alias for that.
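For what it's worth, Git 2.9 and later bundle this sequence behind a flag: `git pull --rebase --autostash` stashes local changes, rebases, and re-applies the stash. The sketch below builds a throwaway upstream and clone to show it; all repository and file names are made up. An alias along the lines of `git config --global alias.up 'pull --rebase --autostash'` would cover the alias suggestion.

```shell
#!/bin/sh
# Sketch: stash save; pull --rebase; stash pop as a single command.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

base=$(mktemp -d)
cd "$base"
git init -q upstream
cd upstream
echo one > shared.txt
echo draft > local.txt
git add shared.txt local.txt
git commit -q -m "initial"
cd ..
git clone -q upstream work

# Upstream moves on...
cd upstream
echo two >> shared.txt
git commit -q -am "upstream change"

# ...while the clone has a local commit plus an uncommitted edit.
cd ../work
echo note > notes.txt
git add notes.txt
git commit -q -m "local commit"
echo "local edit" >> local.txt

git pull -q --rebase --autostash
git log --format=%s      # local commit, upstream change, initial
git status --short       # " M local.txt": the uncommitted edit survived
```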

Of course the problem here is that the GP can't actually be sure that every link he wanted to fix in the final step was fixed; just that the ones that _were_ fixed were fixed right!

GP here. You're right that the final diff, in this case the second-order diff, cannot by itself prove that my final adjustment fixed all of the broken sentence-end links. But I wasn't merely going on that evidence.

The whole point of using a second-order diff was to allow me to reliably carry forth the knowledge gained by my exhaustive review of the prior diff. That exhaustive review told me that there were a dozen broken sentence-end links. And that's how many showed up as fixed in the final, second-order diff: one dozen.

So the prior and final evidence, together, allowed me to be confident that the adjustment worked as intended.
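If you want that count to be mechanical, `git diff --numstat` against the stash reports added and deleted lines per file. A sketch, assuming a hypothetical file with three broken sentence-end links (not the real posts) and an expected count taken from the earlier exhaustive review. Note that it counts changed lines, so two fixed links on one line would count once.

```shell
#!/bin/sh
# Sketch: compare the size of the second-order diff with the count from the review.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
printf '%s\n' 'See http://a.example.' 'Try http://b.example.' \
  'Read http://c.example.' 'No link here.' > post.md
git add post.md
git commit -q -m "clean slate"

# Hypothetical sledgehammer: apply a sed -E script to post.md.
sledge() { sed -E "$1" post.md > post.tmp && mv post.tmp post.md; }

sledge 's#(http://[^ ]+)#<\1>#'       # v1: also swallows the trailing period
git stash -q                          # keep v1's result, restore the clean slate
sledge 's#(http://[^ ]*[^ .])#<\1>#'  # v2: the URL may not end in a period

expected=3   # broken links found when reviewing v1's diff (made up here)
# --numstat prints "added<TAB>deleted<TAB>path" per file; sum the added column.
changed=$(git diff --numstat stash@{0} | awk '{ n += $1 } END { print n + 0 }')
if [ "$changed" -eq "$expected" ]; then
  echo "second-order diff touches exactly $expected lines"
else
  echo "expected $expected changed lines, got $changed" >&2
fi
```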

Very true, and a good point. I thought it was an interesting little gotcha about the whole technique, though: sometimes you will actually need to go ahead and look at the whole diff to be 100% sure.

Indeed. Whenever you drop the sledgehammer, you have the obligation to exhaustively review its effects at least once to be sure there wasn't collateral damage. The beauty of the second-order diff is that, once you do an exhaustive review, you need not do another one just to adjust the sledgehammer.

I used a similar technique for a less frivolous task. I had a script dumping data in text format and wanted to refactor it. I committed its output and could then refactor brutally, step by step, running the script at each step and reverting whenever the dumped data showed any diffs.
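That technique might look like this sketch: commit the script's output as a baseline, regenerate it after each refactoring step, and let `git diff --quiet` (which exits non-zero when there are differences) decide whether to keep or revert the step. The script `dump.sh` and its output are hypothetical.

```shell
#!/bin/sh
# Sketch: use committed output as a regression oracle while refactoring.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q

# Hypothetical data-dumping script, version 1.
cat > dump.sh <<'EOF'
#!/bin/sh
for n in 1 2 3; do echo "item $n: $((n * n))"; done
EOF
sh dump.sh > dump.txt
git add dump.sh dump.txt
git commit -q -m "baseline output"

# One refactoring step: different code, intended to produce the same output.
cat > dump.sh <<'EOF'
#!/bin/sh
square() { echo $(($1 * $1)); }
for n in 1 2 3; do echo "item $n: $(square "$n")"; done
EOF
sh dump.sh > dump.txt

if git diff --quiet -- dump.txt; then
  echo "output unchanged; keep this step"
  outcome=kept
else
  echo "output changed; reverting this step"
  git checkout -- dump.sh dump.txt
  outcome=reverted
fi
```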

When I have an idempotent sledgehammer (i.e., I can apply it twice and get the same result), I usually just run

   git add -p

on everything that is OK. `git diff` only shows the differences against the index, so this works as well.
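A non-interactive sketch of that workflow, with `git add a.txt` standing in for approving hunks via `git add -p` and a made-up "teh" to "the" fix as the sledgehammer. Because the corrected sledgehammer is idempotent, rerunning it over the already-approved file leaves that file alone, so `git diff` (working tree vs. index) shows only the newly produced changes.

```shell
#!/bin/sh
# Sketch: idempotent sledgehammer + staging approved changes.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "teh cat" > a.txt
printf 'visit tehran\nteh end\n' > b.txt
git add a.txt b.txt
git commit -q -m "clean slate"

# Hypothetical sledgehammer: run a sed script over both files.
sledge() { for f in a.txt b.txt; do sed "$1" "$f" > "$f.tmp" && mv "$f.tmp" "$f"; done; }

sledge 's/teh/the/g'            # v1: fixes "teh", but also mangles "tehran"
git add a.txt                   # approve a.txt (stand-in for `git add -p`)
git checkout -- b.txt           # reject b.txt's changes for now
sledge 's/teh /the /g'          # v2: leaves approved a.txt as it is
git diff --name-only            # b.txt: only changes not yet staged
git diff --cached --name-only   # a.txt: what was already approved
```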

This is a great tip, thanks! I never thought of using git diff to test things out, and I definitely didn't know you could diff against a stash.

That was news to me as well. I've always done the following:

    $ diff -u <(git stash show -p) <(git diff)

Never again. By the way, `git diff --no-index` also works like a generic diff utility on any two files.

(Wow, that --no-index behavior seems to be the default outside repositories now. I learned something: plain git diff works there now.)
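A quick check of `--no-index` on two made-up files in a scratch directory. Like diff(1), it exits with 1 when the files differ and 0 when they match, so it can stand in for `diff` in scripts too.

```shell
#!/bin/sh
# Sketch: git diff as a generic two-file diff.
set -eu
dir=$(mktemp -d)
cd "$dir"                       # a scratch directory, not a Git repository
printf 'one\ntwo\n' > old.txt
printf 'one\n2\n' > new.txt

# --no-index is spelled out so this also works if the scratch
# directory happens to sit inside a repository.
if git diff --no-index old.txt new.txt; then rc=0; else rc=$?; fi
echo "exit status: $rc"         # 1, because the files differ
```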

I find this alias really useful for accomplishing similar goals:

alias gqc="git commit -m 'quick commit'"

The command is usually preceded by "git add ." (or alias "ga."). Making a commit ends up being more reliable for me than stash. Also, the commit stays with the branch, making it easy to switch to master branch, make or check an important change, then go back to what I was working on in the develop branch. Additionally, it makes rebasing a work in progress easier. Just gqc && git pull --rebase.

When it comes time to push, I can just check the log for all the "quick commit" commits. If there's just one, then I make an amend commit. If there's more than one, I rebase interactively.

I suppose I could add a hook to make sure I never accidentally push a quick commit, but it hasn't been an issue yet (over the past year or so I've only made the mistake once).
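Such a hook could look roughly like this sketch, modeled on the sample `pre-push` hook Git puts in new repositories' `.git/hooks` (which looks for "WIP" commits). Matching the exact subject "quick commit" and all repository names are assumptions; the demo pushes to a local bare repository.

```shell
#!/bin/sh
# Sketch: a pre-push hook that refuses to push "quick commit" commits.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

base=$(mktemp -d)
cd "$base"
git init -q --bare remote.git
git clone -q remote.git work 2>/dev/null
cd work

cat > .git/hooks/pre-push <<'EOF'
#!/bin/sh
# All-zero object name, as used by Git for "no such ref".
zero=$(git hash-object --stdin </dev/null | tr '[0-9a-f]' '0')
while read -r local_ref local_oid remote_ref remote_oid; do
  [ "$local_oid" = "$zero" ] && continue      # branch deletion: nothing to check
  if [ "$remote_oid" = "$zero" ]; then
    range=$local_oid                          # new branch: check all its commits
  else
    range=$remote_oid..$local_oid             # existing branch: only new commits
  fi
  if [ -n "$(git rev-list -n 1 --grep='^quick commit$' "$range")" ]; then
    echo >&2 "pre-push: 'quick commit' found in $local_ref; squash or amend it first"
    exit 1
  fi
done
exit 0
EOF
chmod +x .git/hooks/pre-push

echo wip > file.txt
git add file.txt
git commit -q -m "quick commit"
if git push -q origin HEAD 2>/dev/null; then first=pushed; else first=rejected; fi

git commit -q --amend -m "Add file.txt"
if git push -q origin HEAD 2>/dev/null; then second=pushed; else second=rejected; fi
echo "quick commit: $first; amended: $second"
```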

Am i missing something, or is this nothing to do with Git? You could do this with any source control tool, or indeed with simple copies of the files. I've used exactly this approach without Git for years.

> Am i missing something, or is this nothing to do with Git?

Of course it's something to do with git, it describes a workflow in git. Yes, you can do this with other VC tools or even without, but this is how you do it with git and, I'd argue, it's better than other ways, certainly more convenient than without a version control tool at all, IMHO.

A bit offtopic, but I think that you could have let the default `pandocCompiler` in Hakyll handle your old posts written in Textile format.

I actually tried that at first, but the Typo-flavored Textile in my old posts wasn't reliably interpreted by Pandoc. (That's also why I had to clean up the posts after I used Pandoc to convert them into Markdown.) Since I had manual edits to do in any case, I figured I might as well do them after I converted my posts to Markdown, since it seems to be Pandoc's best-supported markup language.

Nice tip! My sledgehammer is a git-sub script (http://minhajuddin.com/2011/12/13/script-to-do-a-global-sear...) :)

This is very useful, as the article describes in detail.

I've found it useful to be able to do diffs of diffs in my work, so I'm planning to add that ability to my toolset. Combined with live editing, I think it's going to be quite neat.

I tend to 'git commit' and then 'git commit --amend' until I'm happy, then push to others.

I find it easier anyway. But the stash approach is interesting, thanks for sharing!

That's good advice for a lot of situations, but it depends on whether your "sledgehammer" assumes a certain starting condition.

In the author's example, if he got it wrong and it replaced a load of non-links with links, then running it again on the output of the first run isn't going to do any good. So he's suggesting replacing "git reset" with "git stash": both reset the repo to the way it was pre-sledgehammer, but "git stash" also keeps the previous results around for comparison.

Oh I see, because he's using git itself to find the files that need work on them? I had honestly missed that part of the logic.

That'll teach me to 'scan read' and then comment!

Nice one, indeed!

This is the equivalent in Perforce of shelving your change, then redoing your work and diffing against your shelf.
