Git: Using Advanced Rebase Features for a Clean Repository (mtyurt.net)
114 points by mtyurt on Aug 9, 2017 | 84 comments

Editing git history makes sense in several cases:

1. Projects like the Linux kernel, which frequently use 'git bisect' to perform a binary search on the history of the project (to find when a bug was introduced), or where the patch series tells a story to code reviewers.

2. Open source projects, where some contributors have terrible git habits that the maintainers don't want to merge.

Editing git history makes less sense in other circumstances:

1. When dealing with less sophisticated git users who don't understand git's data model. About 90% of git nightmares begin when a novice git user tries to rebase something.

2. When it adds a whole layer of unnecessary process for zero payoff. If your developers all work in short-lived branches with clean histories, then just go ahead and merge normally without forcing everybody to jump through a bunch of hoops to beautify a history that nobody ever looks at anyway. Git was designed to handle branches.

I always twitch a bit when I see a 10-page blog post describing a "git workflow", with all sorts of complicated branching rules and heavy use of rebasing. That can make sense in certain specialized situations, but it shouldn't be considered a "best practice" that everybody needs to emulate.

> I always twitch a bit when I see a 10-page blog post describing a "git workflow", with all sorts of complicated branching rules and heavy use of rebasing. That can make sense in certain specialized situations, but it shouldn't be considered a "best practice" that everybody needs to emulate.

I tend to justify this based on a simple observation: writing clean code that is easy for others to understand is also hard. If you could write a document that teaches others how to write clean code, what would it contain? How long would it be? How complex would it be? I'm betting you've built up a repertoire of wisdom on writing easy to read code, and I'm betting that document would be quite long, especially if you furnished it with examples.

I think it's the same for commit history. If you're rebasing, then you're probably trying to shoot for a clean history that other humans can understand. Maintaining a clean history can be just as hard (and just as rewarding) as maintaining clean code.

How is it rewarding? No one ever sees the history, honestly. Writing good code results in working code that people build on. Maintaining clean, linear git histories does... what exactly?

Uh, I read commit history all the time. (My initial comment assumes you believe the idea that a clean history is useful, because that's the context of ekidd's comment. So I really don't feel like debating that particular point, if that's what you're getting at. It's been re-litigated hundreds of times.)

Your git history is very important for working out what the rest of your team did, when they did it, and why. Having a clean and linear history, preferably with informative commit messages and maybe even references to bugs or features, lets you understand what was going on faster and with fewer mistakes and false starts.

Why would you want to understand what was going on? Maybe for fixing a bug, maybe for working out why a seemingly useless change was necessary, maybe to understand the context of a code review building on previously merged branches, or maybe just because you need to catch up on what happened while you were gone.

If that's not important to you, then don't bother, but it can be hard to understand what you're missing until you've tried both ways.

No one ever sees the history?

Really? To me the need for checking history is extremely common. Many projects have multiple delivery, maintenance and other branches, and patches need to be applied here and there. Bugs are found and the affected versions need to be identified.

Not maintaining a clean history would cause severe pain.

How do you enforce a commit policy in your projects?

Do you automate it?

Or do you provide feedback by mail, IM, ... to the author when a commit mistake is detected?

Review-before-merge, maybe?

Obviously, for any old arbitrary project, there are going to be limits to what you can ask your contributors to conform to before they just leave, but we're discussing "Best Practices"[1] here.

I think these articles are useful because they spread the knowledge of "How To Be A Good Contributor"... for free.

[1] Ugh. Not a term I like, but really this is almost always(!) the best way to do it.

We have a GitHub integration that runs on every pull request and push. It triggers an AWS Lambda function that disables the big green "Merge" button if the branch could not also be fast-forwarded onto its parent.
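
A minimal sketch of the kind of check such an integration might run (this is not the poster's actual Lambda code). A branch can be fast-forwarded onto its parent exactly when the parent's tip is an ancestor of the branch tip, which `git merge-base --is-ancestor` tests. The throwaway repository, the branch names and the `can_ff` helper are all made up for illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # fix the branch name for the demo
git config user.email demo@example.com
git config user.name Demo

echo base > file.txt && git add file.txt && git commit -q -m "base"
git checkout -q -b feature
echo feature >> file.txt && git commit -q -am "feature work"

# can_ff PARENT BRANCH (hypothetical helper): succeeds when PARENT's tip
# is an ancestor of BRANCH's tip, i.e. a fast-forward is possible.
can_ff() { git merge-base --is-ancestor "$1" "$2"; }

if can_ff master feature; then before=yes; else before=no; fi

# master moves on, so feature would now need a rebase (or a real merge).
git checkout -q master
echo other > other.txt && git add other.txt && git commit -q -m "master moves on"
if can_ff master feature; then after=yes; else after=no; fi

echo "fast-forwardable before: $before, after: $after"
```

Once master has a commit that feature lacks, the check fails until feature is rebased onto master.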

> No one ever sees the history, honestly.

Then you wouldn't see value in rewriting. Unless you're not reading the history because it's a pain to read because it's messy.

I use `git blame` relatively frequently to understand the rationale and history behind the code I'm working on, or to track down bugs. Of course, this also requires writing decent commit messages, something that I encourage developers to do so that those maintaining their code in the future have some direction. Commit messages complement software documentation; they are not a substitute for it.

I bisect looking for bugs, where a clean history with small commits _that actually build_ is essential to quickly finding a bug. That can even be automated if you can create a script that can run a test suite or some other test to check for success.
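
For anyone who hasn't seen the automated version, here is a rough sketch using `git bisect run` in a throwaway repository. The "bug" is just a file named BUG that appears in commit 7, and the `sh -c '! test -e BUG'` command stands in for a real test suite; both are invented for the demo.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name Demo

# Ten small commits; commit 7 introduces the "bug" (a file named BUG).
for i in 1 2 3 4 5 6 7 8 9 10; do
  echo "$i" > counter.txt
  if [ "$i" -eq 7 ]; then touch BUG; fi
  git add -A
  git commit -q -m "commit $i"
done

good=$(git rev-list --max-parents=0 HEAD)   # root commit, known good
git bisect start HEAD "$good" >/dev/null    # HEAD is known bad

# The command must exit 0 for a good tree and non-zero (except 125) for a bad one.
git bisect run sh -c '! test -e BUG' >/dev/null

first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad commit: $first_bad"
```

This only works because every commit in the range can be checked out and tested; a commit that cannot build would have to be reported with exit code 125 (skip).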

This becomes increasingly important the more code and complexity you're dealing with. There are other cases where I look at history, but those are significant examples.

> 1. Projects like the Linux kernel, which frequently use 'git bisect' to perform a binary search on the history of the project (to find when a bug was introduced), or where the patch series tells a story to code reviewers.

This seems like an instance of adapting development practices to bad tooling, whereas we could be fixing the tool itself. Shouldn't there simply be a "tree aware" git-bisect that can intelligently handle branches and merges? That would resolve this sticking point.

> 2. Open source projects, where some contributors have terrible git habits that the maintainers don't want to merge.

Then tell them to go away and come back with a cleaned up PR.

> Shouldn't there simply be a "tree aware" git-bisect that can intelligently handle branches and merges?

Afaik, git-bisect is aware of branches and merges.

But there are two drawbacks to not rebasing/squashing: 1) the number of extra commits. Although bisecting is O(log N), it still adds up if it takes a long time to run the tests. 2) if there are any commits that don't build or have to be skipped (git bisect skip) because of other reasons, the number of steps to bisect increases.

It's a good idea to ensure that any commit that goes to master branch will pass all tests or at least compile. Failure to do so will make git-bisect a lot less useful.

If true, the whole claim that nonlinear histories break git-bisect is bunk.

Non-linear histories work just fine with git-bisect. What breaks git-bisect are histories where not every commit works; for instance, if you have one commit that breaks something, and a second commit that fixes it, git-bisect won't work well.

It will work just fine, you just might have to call git bisect skip a time or two or run it more than once.

That's not "fine"; bisect takes long enough when you expect it to work consistently. And that assumes the code fails at build time, rather than mysteriously at runtime.

That also breaks the ability to do automated bisects.

I don't think it's at all unreasonable, at a minimum, to expect every individual commit to build and to pass tests.

Do people actually look at the history a lot? After a pull request has been merged I rarely look at the history and I am really not interested in it. This is for a small team with around 5 people. In larger teams is it more important to see the history?

I work as lead dev on a 5 year old web app I took over from a previous dev and his team of subcontractors about a year ago. It's very helpful to see into the past when there's no way to just ask the previous dev.

Even if you don't look at the history directly, commit messages are associated with the lines they edit. This provides great granular documentation: https://vincenttunru.com/assets/img/Commits-are-documentatio...

(From my blog post about why history matters: https://vincenttunru.com/Spend-effort-on-your-Git-commits/ )

I work in a university setting where we have students working on our various projects. While some stay with us for a number of years, some are with us for a semester or two.

Even with full-time staff, in the almost 6 years I've been here the rest of the dev and design staff completely changed (and grew).

In my opinion, if you're not using history/blame, your code is relatively new, or you're the only one touching the code. Or you have regular code reviews (pair programming, whatever).

Seven years at my last job was just full-time staff, but we had the same issues, once a version control system was actually implemented.

Resolving conflicts through a merge or a rebase is a different style (all-in-one vs. piecemeal), and I find both have their uses in different contexts. Sometimes a big merge with tons of conflicts is too daunting, and a patch-by-patch conflict resolution is more palatable. Other times, you're doing busy work on tons of patches that would be far more efficient to do just once in a big merge.

But, again, this depends on the situation, the code, the history, the patches, etc. It's just a useful tool to keep in mind come merge time.

You can do both. Squash your working branch to a single commit, then rebase. That way you still get a clean view of just the changes introduced in the branch, and don't have to resolve merge conflicts every step of the rebase.
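
One way to do the squash-then-rebase described above, sketched in a scratch repository. The soft reset back to the merge base is just one of several ways to squash (an interactive rebase would also work), and the branch and file names are invented for the demo.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email demo@example.com
git config user.name Demo

echo base > base.txt && git add base.txt && git commit -q -m "base"

# A working branch with three small commits.
git checkout -q -b feature
for i in 1 2 3; do
  echo "step $i" >> feature.txt
  git add feature.txt
  git commit -q -m "wip $i"
done

# Meanwhile master gets new work.
git checkout -q master
echo more > master.txt && git add master.txt && git commit -q -m "master work"
git checkout -q feature

# Squash: move the branch back to the fork point, keeping all changes staged,
# then record them as a single commit.
git reset -q --soft "$(git merge-base master feature)"
git commit -q -m "Add feature"

# Rebase the single commit: conflicts, if any, are resolved only once.
git rebase -q master

count=$(git rev-list --count master..feature)
echo "commits on feature that are not on master: $count"
```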

is there a site that shows an animation for each git command?

Here's one for the most basic ones, except staging: https://agripongit.vincenttunru.com/

(Disclaimer: I made that :)

Not animations, but git-scm.com is excellent

I think a lot of people don't bother with rebasing because they either don't know how to do it, or they are scared of the idea of a version control system not explicitly saying what they've done to accomplish the latest version of their code.

Once you get even a small understanding of it, there are plenty of places you can use it. For example, in the past month I've used it to:

1. Rewrite the history on a coding test I was given for an interview. The recruiter was oddly specific about how long to spend on it, and when it had to be in by, even though they said no one would look at it for another week. I worked on it throughout the extra week, and modified the history to make it look like I had done it in the allotted time-slot.

2. Rewrite history in feature branches where it doesn't accurately reflect what I had to do to get the code working. I work in an agency environment and (whether I like it or not) requirements change, sometimes mid-way through delivering something. If I've written something that won't ever see the light of day, or if I've done something that won't help the next person to read that code I'll rebase where needed.

3. Saving time for pull requests. I tend to commit early and often, and when I'm in "the zone" on a fairly chunky bit of work that can mean quite a few commits! When it comes to peer review, some people like to do it by commit, rather than the finished output, so to help these guys out I squash commits where possible, since we use our pull requests to illustrate the problem we're solving anyway. I think an untarnished history is important, but sometimes a PR audit trail is more useful than what you'd get from pure commits.

Just a note on your nr1: I don't think rebase back-dates the CommitDate, does it? There are two dates on any commit: AuthorDate and CommitDate. Try git log --format=fuller. E.g.:

    Author:     ...
    AuthorDate: Wed Aug 2 11:02:03 2017 +0100
    Commit:     ...
    CommitDate: Wed Aug 9 12:08:52 2017 +0100

This is a commit I originally created last week, but rebased onto my current branch just now. To change the CommitDate, you have to go filter-branch, afaik. But maybe I'm ignorant of a germane rebase feature?
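
A small sketch that reproduces the two-date effect in a scratch repository. It uses `git commit --amend` rather than a rebase to keep it short, on the assumption that any rewrite behaves the same way here: the author date is kept and the committer date is renewed. The dates are pinned through the `GIT_AUTHOR_DATE` and `GIT_COMMITTER_DATE` environment variables so the output is predictable.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name Demo

echo one > file.txt && git add file.txt

# Original commit: both dates pinned to last week.
GIT_AUTHOR_DATE="2017-08-02 11:02:03 +0100" \
GIT_COMMITTER_DATE="2017-08-02 11:02:03 +0100" \
  git commit -q -m "original"

# Rewrite the commit a week later; only the committer date is supplied.
GIT_COMMITTER_DATE="2017-08-09 12:08:52 +0100" \
  git commit -q --amend --no-edit

author_date=$(git log -1 --format=%ad --date=short)
commit_date=$(git log -1 --format=%cd --date=short)
echo "AuthorDate: $author_date  CommitDate: $commit_date"
```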

You're correct, it doesn't change the author date.

To do no. 1, perform your rebase, and then change the dates with filter-branch. I have a couple of helper functions for resetting to either commit or author dates [1]; use at your own risk! :)

[1] https://gist.github.com/dm/aad8e34a5ee6b542a0bc788375b548ed

Yep, it took me a while to figure out why my commit dates didn't match the dates within GitHub. Luckily, it's fairly straightforward to figure out once you know what the issue is!

> 3. Saving time for pull requests. I tend to commit early and often, and when I'm in "the zone" on a fairly chunky bit of work that can mean quite a few commits! When it comes to peer review, some people like to do it by commit, rather than the finished output, so to help these guys out I squash commits where possible, since we use our pull requests to illustrate the problem we're solving anyway. I think an untarnished history is important, but sometimes a PR audit trail is more useful than what you'd get from pure commits.

I think this is the most important one. Your commits are documentation for people reading your code, so whatever helps them is good.

Another note on #1 (which I've also done in the past). Don't your pre-rebase commits remain in the reflog?
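
For what it's worth, a quick sketch in a scratch repository suggests they do, at least locally: after a rebase the old commit object still exists and still shows up in HEAD's reflog (the reflog is local, so it is not pushed along with the branch). The commit names A, B and C are invented for the demo.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email demo@example.com
git config user.name Demo

echo a > a.txt && git add a.txt && git commit -q -m "A"

git checkout -q -b feature
echo c > c.txt && git add c.txt && git commit -q -m "C"
old=$(git rev-parse HEAD)          # the pre-rebase commit

git checkout -q master
echo b > b.txt && git add b.txt && git commit -q -m "B"

git checkout -q feature
git rebase -q master               # creates a new commit with a new hash
new=$(git rev-parse HEAD)

type=$(git cat-file -t "$old")     # the old object is still in the object store
if git log -g --format=%H | grep -q "$old"; then in_reflog=yes; else in_reflog=no; fi

echo "old != new: $([ "$old" != "$new" ] && echo yes || echo no)"
echo "old object type: $type, still in reflog: $in_reflog"
```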

I personally almost never rebase and/or squash, since I think that the information that gets lost (like: when was it started, what were mistakes along the lines) might be useful for future understanding of how the project work evolved over time, and what adjustments should/could be made to the development process.

But I understand that a screen like the one shown in the article is not immensely useful. If it comes down to how to present git history information to users, maybe an additional approach would be to come up with a better presentation layer (one that, for example, could hide merge commits, squash branches, etc.).

Say we work together. Why would you want to see my 5 solutions to a problem in `master`, of which 4 actually never really ran in production? I do commit often, things that I try and later throw away are in the history. Once rebased and squashed, only final solution is in `master` history.

> 4 actually never really ran in production?

But the history would show that it was tried (and perhaps the commit message can have a short note on why it wasn't selected).

Without the git history, you'd have to rely on human memory to know this, or on separately maintained documentation (which, let's face it, is never going to get updated).

Or store it in another branch - that's the approach we had been using in one company I worked at.

Ah, yes, this was implied :) only working code (as checked via CI) is allowed to show up in master and other long living branches.

This is much easier when I actually need to trawl through history to find things.

I think there are two perfectly acceptable schools of thought.

One says that the history should be preserved exactly, because it's important that we keep a record of exactly what happened.

The second says that it's OK to rewrite history a little if that makes it more understandable.

I think you would put yourself in the first, and that's OK. I would put myself in the second because at the end of the day I value understanding over precision in git histories. I'm of the opinion that no one cares about the commit I made to correct a missing semicolon.

The usual rule of thumb is not to modify (e.g. rebase) published commits. So it's perfectly fine to adjust local commits before they go into the centralized repo: from the point of view of an external observer, the history is never destructively modified. This rule can be extended to topic branches or user-owned branches, with the caveat that others should not base their work upon the topic branch.

Also remember git push --force-with-lease!
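
A sketch of why `--force-with-lease` is the safer force push, using a local bare repository as the "remote" and two made-up users, Alice and Bob. Alice's lease is her remote-tracking ref `origin/topic`; because Bob pushed after she last fetched, her forced push is refused instead of silently discarding his commit.

```shell
set -e
work=$(mktemp -d) && cd "$work"
git init -q --bare remote.git

# Alice creates a commit and publishes it as the shared "topic" branch.
git init -q alice && cd alice
git config user.email alice@example.com && git config user.name Alice
git remote add origin "$work/remote.git"
echo one > file.txt && git add file.txt && git commit -q -m "first"
git push -q origin HEAD:refs/heads/topic
git fetch -q origin                 # origin/topic now records what Alice has seen
cd "$work"

# Bob clones the branch and pushes a new commit to it.
git clone -q -b topic "$work/remote.git" bob && cd bob
git config user.email bob@example.com && git config user.name Bob
echo two >> file.txt && git commit -q -am "bob's work"
git push -q origin topic
cd "$work"

# Alice rewrites her commit, unaware of Bob's push, and tries to force it.
cd alice
git commit -q --amend -m "first (reworded)"
if git push -q --force-with-lease origin HEAD:refs/heads/topic 2>/dev/null; then
  result=pushed
else
  result=rejected
fi

remote_tip=$(git --git-dir="$work/remote.git" log -1 --format=%s topic)
echo "force-with-lease: $result; remote tip is still: $remote_tip"
```

A plain `git push --force` in the same situation would have overwritten Bob's commit.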

And `git commit --amend`

I missed a semi-colon or a file? I probably noticed right after I made the commit (and before I pushed it to the remote branch).

I am in the first because I don't think you should change history. But I think there is a need for tools that can show history like it would look after rebasing and squashing.

Maybe we need another layer of abstraction, maybe recorded as merge commits, that would present the summarized history.

I really don't need to see all your false starts at a solution to the problem. I don't mean that in a patronizing way, it's just that it is truly irrelevant when I'm digging through commit history.

Question: Do you have a problem with rebase-before-push?

Regardless, there's a happy medium here: Just commit whenever you feel like it, then rebase, squash or whatever once you're reasonably(!) confident in your approach... and push. Obviously you don't want to force-push, so at the point that you're "reasonably confident" you're committed. If you need to retrace your steps, you just do that through normal "git revert" or whatever.

The point of the above being: This way of working preserves the useful parts of history and you still avoid the absurd noise of seeing everybody's "I changed a thing" commits and non-compiling-commits as well.

The thing is, I commit very frequently, and a lot of the commits don't really make much sense by themselves. Often with stuff that's just plain wrong or confusing in between.

e.g. remove console.log

what do you need that in the git history for?

“Fix lint” is my favorite one.

My unfavorite is WIP and "update submodule(s)". In fact I contemplated making a git hook to reject such drivel on merge.

I wish there was some happy medium, like that I can merge a branch and it gets squashed into a single commit, but then if needed I can 'expand' that commit to see its contents, even after the other branch is deleted.

That’s a rebased branch. You can see just the changes introduced by a branch:

  MERGE_COMMIT=$(git rev-parse ":/Merge .*$branchname")

It never occurred to me that people prefer to squash so they can avoid the crummy UI of git diff.

I vastly prefer rebasing and squashing. Just yesterday I had to review some history, and reading the blames was incredibly frustrating.

> Clean up style

What was this code committed with? What other changes came along with it?

Git, for me, has 2 functions. When I'm developing, it's to save my work in an incremental way that I can reverse. It's to make notes about what I'm doing. After I'm done, it serves as a way to lay blame on me for what I changed and to track those changes in Github. "Who wrote this? What was it committed with?"

I want my PR to be encapsulated in 1 commit after it's been merged. At that point, there is no reason to read the internals of what I changed.

This is the type of workflow you get when using something like gerrit.

Because gerrit forces every change to be a single commit, you have to rebase, which ensures that the history is nice and linear.

I know there are some people who prefer the "git-flow" style history graph (and I was one of those people), but there are a lot of advantages to a clean history.

And, yes gerrit can be painful at first, but is a lot better with something like git-review[0], or repo[1]

0 - https://docs.openstack.org/infra/git-review/

1 - https://source.android.com/source/using-repo

Git is a product where there are 5 opinions on how best to use it for every 4 people.

AKA "Two git users, three opinions".

No one's perfect with this stuff.

If you don't believe me, clone git itself ($ git clone https://github.com/git/git) and open up the repo in tig(1)


To be honest, I only in the past 2 years even bothered to view ($ git log --graph). Regardless of --graph getting wide now and then, I always visualize my git history as a straight line.

Also, sometimes having a non-linear history is inevitable. Especially in large open source projects where you're pulling in patches to a branch, and patches on top of that. You're not always going to be merging a branch with a single author straight onto master.

Despite posts like this (http://www.bitsnbites.eu/a-tidy-linear-git-history/) encouraging good git hygiene, I've had multiple open source projects merge in code via GitHub and never had negative consequence for it :P

Maybe there are corner cases where git bisect wouldn't work? Though I never used git bisect even once. Most I do is scroll through tig and view diffs. Also used to play with a cool git plugin in vim (https://github.com/tpope/vim-fugitive).

Also, GitHub has (since that linear git history post) introduced Rebase + Merge https://github.com/blog/2243-rebase-and-merge-pull-requests. I think that'll get you what you want.

I do keep branches ("pull requests" if you're using GH lingo) up to date with ($ git pull --rebase). That does mean a force push ($ git push --force), but it's ok if it's your personal branch. I also use interactive mode ($ git rebase -i <sha>) to edit/blend multiple commits.

Also, when I do merge, if I go through CLI, I'll preserve the history of the branch by not doing fast forwards ($ git merge <branch> --no-ff).
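
A short sketch of what `--no-ff` buys you, in a scratch repository with invented branch and file names. Without the flag this merge would be a plain fast-forward; with it you get a merge commit with two parents, and the branch's own commits can still be listed as the range between them.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email demo@example.com
git config user.name Demo

echo base > base.txt && git add base.txt && git commit -q -m "base"

git checkout -q -b feature
echo one > one.txt && git add one.txt && git commit -q -m "feature: part one"
echo two > two.txt && git add two.txt && git commit -q -m "feature: part two"

# master has not moved, so a plain merge would fast-forward.
git checkout -q master
git merge -q --no-ff -m "Merge branch 'feature'" feature

# First parent is the old master tip, second parent is the feature tip.
parents=$(git log -1 --format=%P | wc -w | tr -d ' ')
branch_commits=$(git rev-list --count HEAD^1..HEAD^2)
echo "merge commit has $parents parents; the branch contributed $branch_commits commits"
```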

Just one small thing, I have found a few times git rebase breaking git bisect.

Git rebase can create large sequences of commits which no one has ever checked out, and which often don't build; after a git rebase most people check that their new head builds and passes tests, but I've never seen anyone bother to check that their new history works.

git rebase --exec <test command> runs your test suite on each commit during the rebase. It is super useful.
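
A rough illustration in a scratch repository. The "test command" here is only a stand-in that appends a line to a log file outside the repository, so you can count how many times it ran; in real use you would pass your build or test command, and a failing command stops the rebase at that commit.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email demo@example.com
git config user.name Demo

echo base > base.txt && git add base.txt && git commit -q -m "base"

git checkout -q -b feature
for i in 1 2 3; do
  echo "$i" > "file$i.txt"
  git add "file$i.txt"
  git commit -q -m "feature commit $i"
done

git checkout -q master
echo m > master.txt && git add master.txt && git commit -q -m "master work"
git checkout -q feature

# Stand-in for a real test suite: record one line per rebased commit.
log=$(mktemp)
git rebase -q --exec "echo tested >> '$log'" master

runs=$(wc -l < "$log" | tr -d ' ')
echo "test command ran $runs times"
```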

I have been using that in every branch: I commit regularly and end up with 10+ commits. Then I rebase + squash them and at the same time write a summary commit. Eventually I merge. This has multiple good effects. First, you get a clean, feature-based history. Second, although your commit message is the one you wrote when you rebased, you can keep the old commit messages and you get a better summary of what happened during the development process of that branch.

Why have a history then at all? The idea is not to squash history but make it reasonable chunks. Remove chaff so to speak while keeping the general history. A single feature is very rarely a reasonable chunk. (For example, see Linux kernel patch series per feature.)

Otherwise you may lose the "why" unless code is extremely well commented and that never happens.

Not the OP but with my own personal usage, sometimes I'll end up with multiple related commits where I've had to go back and fix a test or correct an issue I missed with an earlier commit. Therefore I commit and squash that into the earlier commit with related changes.

I don't buy that every change should be a separate commit - if changes are related and the commit message covers the changes it makes sense to squash.

Not sure if OP is talking about squashing all commits into one. I have seen people do this and I'm always a bit confused why, similar to your point.

ok depends on the size of a feature I guess but usually I sit in a branch for 2-3 days before I finish the feature.

Also what's the point of preserving the history of a branch that is used for the development of a requested feature ? For each commit before the final, the feature is incomplete, possibly not working at all.

Not everyone is a believer in shippable increments, but it is a good practice nonetheless. (not necessarily fully working)

Splitting per feature is indeed ok if it is small including effects on dependencies. But in my practice there are rarely small enough features - these map to something akin to user stories which have to be further broken down.

For God's sake, DON'T TEACH IF YOU DON'T UNDERSTAND YOURSELF what's happening.

"Commit C's revision number has changed." - NO IT DID NOT. The original commit (f4ba6b) is still there, it still points to B, but a totally new commit has been created with a different content. To avoid confusion, it's better to name it C'. C is now a dangling commit (has no branch or tag pointing to it, and will eventually be garbage collected).

I tell everyone to save/commit as much as possible; I don't want to hear "I lost 2 weeks of work". Afterwards I just squash the feature into master, named after whatever the short feature is: "feature modal", "new admin view", whatever... Git history without squash or rebase is a nightmare when everyone has commits like "typo 1", "typo 2".

Try selling git commit --squash or --fixup. Your colleagues can use this without actually rebasing, and you can help them out with the final step by doing the rebase. Especially when you have:

    git config --global rebase.autoSquash true

These commits are generally much easier to deal with than typo123 ones. But ymmv.
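
A sketch of the fixup flow in a scratch repository, with invented file names. The `--autosquash` flag is passed explicitly here instead of relying on the config setting above, and `GIT_SEQUENCE_EDITOR=true` simply accepts the generated todo list so the demo runs without opening an editor.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name Demo

echo base > base.txt && git add base.txt && git commit -q -m "base"

echo "helo" > greeting.txt && git add greeting.txt && git commit -q -m "Add greeting"
target=$(git rev-parse HEAD)       # the commit that contains the typo

echo other > other.txt && git add other.txt && git commit -q -m "Add other file"

# Fix the typo and record it as a fixup of the earlier commit.
echo "hello" > greeting.txt
git commit -q -a --fixup "$target"

# Autosquash moves the "fixup!" commit next to its target and folds it in.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$target"^

total=$(git rev-list --count HEAD)
fixed=$(git show HEAD~1:greeting.txt)
echo "commits: $total; greeting in 'Add greeting': $fixed"
git log --format=%s
```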

Something that somehow never ever makes it into these tutorials but is important: after you do the rebase locally, you must force push, don't pull then push. (After you rebase, the git client will suggest you pull.) This simple thing caused me a lot of unnecessary confusion about rebasing for a long time.

The video in the blogpost is all about it :)

Does this work even if we keep pushing commits to a remote repository?

We extensively use GitHub Pull Requests for code reviews, and most of the time we squash our PRs into a single commit, but I would love to have a way to merge into multiple smaller squashed commits (as OP did) instead of one big one.

The Interactive Rebase tool in SourceTree is very good and understandable/usable. I use it all the time to squash some commits, leave others.

You can watch the video, it shows examples of rebase with feature branches.

git rebase -i on a master with no changes.

As a sole developer, I am pulled in two different directions when using git feature-branch style. I want to make sure I never lose my work so I check in frequently- any time I start a change that might ultimately fail, I commit my current code so I can recover it. But I also want a clean, concise and useful history.

After reading this article I tried 'git rebase -i master' on my feature branch even though the master had no changes since starting the feature branch. This seems to work and allows me clean up my feature branch before I merge it to master.

Is there a better way to do this or are there problems with this?

The only potential problem is that you should instead run git fetch and rebase onto origin/master. (Run with no argument, rebase uses the branch's configured upstream, but it won't fetch for you.)

There is no problem with this approach! The blogpost also contains a video which explains how to use rebase in sample use-cases.

Keeping a clean commit log or even strong commit rules is something which I only see as a barrier to improving existing code... just thinking of code improvements by PRs which got rejected by bigger projects because the commit history was not "clean enough". History is not always good or pretty in real life... it should be the same on git commits.

To some extent I agree, enforcing it at a project level seems like bikeshedding and not necessarily that useful. On the other hand, like walking people through your house, it's nice to tidy up first to give people a good experience, and the same applies to walking through the history of your change too IMO

Even software with good and beautiful git histories can be ugly as hell (seen it all) and a good and beautiful house can have some really bad ugly history ;)

Is this a fixed-width comic font? :-/

Looks like Fantasque Sans Mono. It takes a bit of getting used to, but I've found it amazing once you do get used to it. YMMV.

Looks like Inconsolata or Consolas font to me. Not comic font at all.

What is the tool/command used to show git graphs in the screenshots of this article?

It looks like tig: https://github.com/jonas/tig

nice work Tarik!
