Git rebase, what can go wrong (jvns.ca)
306 points by kens on Nov 6, 2023 | 394 comments



I like how Atlassian puts it:

> The golden rule of rebasing

> Once you understand what rebasing is, the most important thing to learn is when not to do it. The golden rule of git rebase is to never use it on public branches.

https://www.atlassian.com/git/tutorials/merging-vs-rebasing#...

For me, even though rebasing comes with some trappings, I still greatly prefer it to the alternative, which is to have merge commits cluttering up the commit history.


The way I phrase and teach what I consider to be the important rule of git is:

> Don't rewrite history on shared branches with proper communication.

I don't teach "never", I don't teach that `main` is special, I don't teach that force pushing is forbidden, because I don't believe in those things.

I highly prefer a rebase-heavy workflow. In addition to not "cluttering" the history, it's an invaluable tool to keep commits focused on "the right level" of atomic changes.


You can simply pass flags to “git log” to hide merge commits, without needing to rewrite history to “destroy” that information. While they are often noisy, sometimes they can be useful. I usually prefer to hide information rather than destroy it.
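For anyone who hasn't tried it, here is a minimal sketch of hiding merge commits at read time, assuming a throwaway repository. The branch name, file names, and commit messages are made up for illustration; `--no-merges` is the flag that does the hiding.

```shell
set -e
# Throwaway repository; every name and message here is illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "base" > app.txt
git add app.txt
git commit -q -m "initial commit"
main_branch=$(git symbolic-ref --short HEAD)

# A feature branch merged back with an explicit merge commit.
git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "add feature"
git checkout -q "$main_branch"
echo "other" > other.txt
git add other.txt
git commit -q -m "unrelated work"
git merge -q --no-ff -m "Merge branch 'feature'" feature

# Full history, merge commit included.
git log --oneline
# Same history with merge commits hidden; nothing is rewritten.
git log --oneline --no-merges

total=$(git rev-list --count HEAD)
without_merges=$(git rev-list --count --no-merges HEAD)
echo "total=$total without_merges=$without_merges"
```

The merge commit is still in the repository afterwards; only the view changes.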


I read this justification in nearly every thread that brings up git rebase. I feel like a fool because I cannot think of a real-world example where this information crosses from signal to noise. Generally, branches that are not ready to merge tend to have enormous amounts of noise commits. Is there a blog post or some concrete examples I could work through that illustrate these benefits? I feel like workflows dramatically different from mine are likely the source of my struggle.


It's a log of what happened in dev and supports reconstructing history to understand why something worked or didn't in retrospect. "It worked when we tried it" "oh this dependency was updated in this merge commit that could have changed the behaviour"


I am not sure how this is unique to a merge commit. The commit with the dependency change still exists in the main branch. The commit should never have gotten into the main branch if it failed tests. If I take a positive action to rebase, I am accepting my fate from master anyway. If I merge into my working branch instead of rebasing, that historical context is only useful for that moment in time of reconstructing history and is not useful anymore. Once a branch goes into master, I want commits to main to have a 1:1 ratio of committed code for a task to positive action taken by a human.


It's not unique to a merge commit of course, but a point in favour of preserving history.


I assume that “with” is meant to be “without”?


It’s annoying when someone force pushes to a branch that you just reviewed, but you can no longer see the history so you have to scan through the whole PR you already reviewed looking for the change. Please just commit the fix, let me see it, then squash it.


You can just diff the previous head with the new one. In GitLab, it's simply a matter of clicking "Compare with previous version". Locally, it's `git diff branch@{1}..branch`.

It's only becoming tricky if the MR has been rebased onto a different base in the process, but it's not very hard to deal with that too if needed (just annoying).


Actually, it's not that annoying at all - TIL about `git range-diff`.
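A minimal sketch of both approaches, assuming a throwaway repository where an amended commit stands in for the force push. The branch name `feature` and the file contents are made up; `feature@{1}` is the branch's previous tip from the local reflog, and `git range-diff` needs Git 2.19 or newer.

```shell
set -e
# Throwaway repository; names and contents are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "base" > file.txt
git add file.txt
git commit -q -m "initial commit"
base=$(git rev-parse HEAD)

git checkout -q -b feature
echo "first draft" >> file.txt
git commit -q -am "add feature"

# The author rewrites the already-reviewed commit (what a force push would publish).
echo "review fix" >> file.txt
git commit -q -a --amend --no-edit

# Plain diff between the old tip and the new tip of the branch.
interdiff=$(git diff 'feature@{1}' feature)
printf '%s\n' "$interdiff"

# Commit-by-commit comparison of the old and new versions of the series.
rangediff=$(git range-diff "$base" 'feature@{1}' feature)
printf '%s\n' "$rangediff"
```

The three-argument form of `range-diff` compares `base..old` against `base..new`, which is the case that stays readable when the branch was also rebased onto a newer base.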


Unfortunately I haven't seen a git forge that will let you do "autosquash on merge" so I could just push up fixup commits as part of a merge request.



That always squashes the whole PR into a single commit, making it not very useful in practice. Git's autosquashing is much more powerful than that.


Squash merges cut down the noise considerably.


I think squash merges are a last resort heavy-handed tool for dealing with developers who refuse to clean up their commit history before merging. Most developers can do better by hand.

Git history should tell a simple, understandable story of each change. For example: 1) refactor existing code, 2) add feature. Or 1) add missing tests, 2) refactor existing code, 3) add feature.

But since you're working on the fly with imperfect knowledge, it doesn't happen in such neat steps. Refactorings and behavior changes end up interleaved in your raw git history, so you need to do a little bit of cleanup by hand in order to present a simple story in the commit log.

Of course if you have developers that don't do that and instead merge dozens of commits that just say wip, wip, wip, lol, fml, wip, wip, lol, yolo and you can't fire them or get them to change, then squash merges ftw.


I see it the other way around - why spend time on a ‘nice’ commit history in a (smallish) feature branch when you can squash merge later.

I prefer one commit to main per feature, along with a good description on the GitHub PR.

Sometimes I’ll branch out from a feature branch for the occasional and infamous ‘get CI working’ round of 10 one-line commits though, to not make it too muddy.


> why spend time on a ‘nice’ commit history in a (smallish) feature branch when you can squash merge later.

Several reasons:

    * facilitates much better code review discussions
    * enables use of git bisect to locate bugs
    * allows for informative commit messages associated with the changes
    * communicates clearly to future self about why changes were made


> * enables use of git bisect to locate bugs

This is really only viable if each intermediate commit on a development branch is intended to be bug free. If that's the standard you and your team work with, that's fine, but it's not usually my standard; in a development branch, I may commit things that don't even compile, let alone work, if it's a good point to commit.


The point of the parent comment is exactly that you should clean up the history before merging to a public branch, so that you can use bisect, even if so far you had wip wip doh wip as the commit messages. The way to get there is to have a mix of proper and wip commits.


Frankly, people lately spend more time managing commit history than using it. Like, commit history is useful maybe once a year, a little bit, but we spend an absurd amount of time trying to make it look nice.


I use commit history a little bit more than that, but mostly agree. I had another dev recently give me crap about the mess of "WIP" commits on a feature branch because they review by clicking through the commits and my commits don't tell much of a useful story other than I apparently did some shit and eventually it all worked.

That said, I've also come to the conclusion there's basically two classes of Git users: people who really understand Git and use it fully, and those of us who basically use it as a place to shove source code before quitting for the night.


> Frankly, people lately spend more time managing commit history than using it.

At one company with a Giant Custom Enterprise App, I ended up occasionally acting as a historian for pieces of the company with bad communication/institutional-memory, ex: "Oh, the +5% Foo charge was because of a request 3 years ago by vice-president X, here's the ticket number, before that it used to be +3%."

In those circumstances--where the implementation is the source of truth for business process--a well-maintained stream of commit messages becomes quite useful.


Comment with a ticket-id would be more efficient.


On a long-lived codebase you're going to end up with nearly as many such comments as there are lines of code. Now that's a cluttered mess.


When/how to comment is an art in itself, in no conflict to what I wrote.

Either a short quip, doc string, or link to the full story is an accessible combo. Nothing is the correct choice for unsurprising code.


While it's often said that comments should capture the "why" of code, I don't usually think that ought to extend to "cuz ticket#" except when that ticket number is a significant bug/limitation that explains a nasty hack.

Noting each feature ticket that ever affected a line--or even just at the function level--is sort of like maintaining a few thousand incomplete micro-changelogs. Doing it "acceptably well" takes much more effort than grooming the commit history so that someone can click "show change history for selected lines" in their IDE.

Plus consider all the unnecessary noise it makes for people reading the code, or reviewing a PR.


No, it's not difficult (cut/paste), nor a burden for reading in docstrings or even comments. The point is a short link that tells a long story, which should be accessible to non-developers.


And the commit message is an amazing place to put that.


Not if you want them read by non-developers.


Why would you have non-developers reading code?


Curating the commit history takes about 10 minutes per PR and can easily repay itself in hours of work saved when some bug hits. Or when you want to tell the junior who wants to implement X for A: why don't you take a look at this one commit where we implemented X for B?


Is there something wrong with the latest version of the method? Instead of one from ~18 months ago which may not work any longer?


I'm not sure I get you. What do you mean by the "latest version of the method"? And why wouldn't code from 18 months ago work? Some minimal regression testing should be in place for production code, and it is probably also used regularly.

So, yes seeing e.g. how "CSV export for class A" is implemented is a great guide for implementing "CSV export for class B".


Most recent. Interfaces change over time.

Everything you need is in the most recent copy. Showing an old one invites errors for no benefit.


A commit is not a method, it is a change set potentially affecting many files. Pointing people to the commit used to implement feature A lets them understand the whole story of which components need to change (and how) to implement similar feature B, in a way that pointing them to a single method or file can't necessarily do.

You would then typically supplement reading the commit with reading the current version of the affected code, but looking at the commit points you in the direction of the files and methods you need to look at.


The idea is for junior to learn from massive commit that affected many files?


This reply is overstating a bit, but it does sound like a lot of work simply to avoid saying, "here, look at these methods in this file" and the unstated "and trace the imports yourself".

Not to mention just finding the right commit months later sounds like more work than that already.

Personally, even if I were to make the history absolutely perfect, I never get the code right (interfaces etc), the first time. It might be hours or days before I'm 98% happy with the final implementation. Sometimes big refactor opportunities come to me months later, e.g. where I move code needed multiple times into a more central mixin location.


Not my experience, nor my team's experience over almost 10 years of using this approach.


I’m firmly in your camp on this one, but I’ve noticed that advocating a tidy history gets a lot of push-back online. I think there is an element of self-fulfilling prophecy here. If a team habitually leaves a messy history behind, that history is rarely going to be useful, so naturally the team has low expectations and sees little value in doing anything to curate it. And if a team isn’t used to making an effort to curate its history, they may assume that doing so is expensive because `git rebase -i` is scary and not something they use on auto-pilot for a few seconds at a time.

In other news, our developers also create several small PRs every day but each is for an incomplete change that doesn’t stand alone so we’re never quite sure which features are finished in any given build, everyone keeps complaining about being interrupted to do code reviews all the time when the code reviews have no value anyway because they always just say LGTM :+1:, and we have targets that no more than 15% of commits should break production when CI/CD deploys them and that we recover fully within an hour each time that happens. If only there were something we could do to improve all this…


> I think there is an element of self-fulfilling prophecy here.

This too, but there's another thing at play as well: many developers don't know git at all. They just memorized enough commands to let them do their work. They don't understand what they're doing, so they can't reap the benefits of the tool they use. You won't get much use out of RAW photos if all you can do in a graphics editor is click the "auto enhance" button.


I worked in a team where tech lead insisted on nice history. It was a lot of effort all the time and very little to no benefit.

He was lead and could influence salaries, his opinion mattered. So, in real life, people rarely pushed back. That is not the same as us sharing the same opinions tho. I became more verbal about history not being useful online.


If a strategy requires humans to be virtuous AND vigilant it is doomed to failure.

I rarely use history and prefer merge/squash, with automated CI tools, and tests. "Why" is kept in doc strings, comments, specs, and story tickets. Everything viewable in gitlab with automatic links. All this gets out of the critical path, every day.

I submit that, if your code is so complex that diagnosing a bug is a major research project rather than moving forward with a few extra/modified lines of obvious fix, then that is the problem to focus on.


> If a strategy requires humans to be virtuous AND vigilant it is doomed to failure.

Sorry, but I don’t buy that. By the same principle, there’s also no point in writing unit tests or defining static types or having code reviews, all of which require thought and extra work, yet can yield considerable dividends when done even moderately well.

> I rarely use history and prefer merge/squash, with automated CI tools, and tests. "Why" is kept in doc strings, comments, specs, and story tickets.

The argument for a tidy history isn’t just about a different place to explain a change. It’s about presenting work in clearly defined, meaningful steps to other readers like code reviewers, or perhaps someone who found these commits later through `git blame` on a problematic line of code or `git bisect` after a regression. It’s about each commit representing a complete, self-contained change that could later be reverted, or cherry-picked or merged to another branch.

> I submit that, if your code is so complex that diagnosing a bug is a major research project rather than moving forward with a few extra/modified lines of obvious fix, then that is the problem to focus on.

Some problems have a lot of essential complexity. The code to solve them necessarily has at least the same degree of complexity. Sooner or later, there will probably be a change to that code with an unintended consequence for something else. Keeping the code and its history tidy and systematic is, IMHO, how you avoid those investigations becoming major research projects.


One of these things is not like the other. (Journey vs. final destination.)

As an industry we get paid primarily for 1) working software and 2) communicating with stakeholders.

Tidy yet inaccessible (to non-dev) construction stories are not on that path. I would argue unit tests et al are, to ensure #1.

No stakeholders? Put why into a readme, where it can be seen at a glance. Comments can reference docs.

Complexity must be broken down into bite-sized chunks for a solution to be feasible in the first place, reliable in the second. i.e. skull-size limits. If there’s any code I don’t understand I rewrite it until I can. With tests of course.


Sorry again, but I’m still not seeing the distinction I think you’re trying to make here.

I see version history as an asset, just like the code itself, tests, developer documentation, the bug tracker database… None of these things are directly visible to end users under normal circumstances, but they are useful sources of information and organisation and collaboration that help developers to create the software that users do see.

To me, a repo with a messy version history is like code full of superficial comments, a test suite with high coverage metrics that still doesn’t exercise the most important functionality, a dev team where the only documentation is some auto-generated static site that reproduces what any decent IDE would show in real time anyway, or a tracker where all the tickets are vague one-liners. You can produce useful software despite those things, but why would you?


It's not a black and white distinction, I agree.

Also, I said/meant stakeholders not end-users. Ours definitely do write bugs, look at docs and generate db reports etc.

The main distinction is that things on that list have a high benefit-to-cost ratio for goals 1 & 2, where history maintenance does not. The cost is high and utilization isn't. Additionally it can't be used to communicate with anyone but developers.


> there’s also no point in writing unit tests or defining static types or having code reviews

Not true. I do not do those to have a nice clean process. I do unit tests because without them the code is unstable and it is hard to fix bugs without causing unrelated ones. If the code is super simple and unlikely to break, I don't write tests. I like to use static types because I am much faster when writing with them. The code is more readable and I have fewer bugs. Now, I have seen both useless and useful code reviews.

But, in all of those cases, things are done because they have a beneficial impact on the final code and speed of delivery. A beautiful git history does not have such a tangible, measurable benefit. Git blame and bisect work without it; you just need one more step once in a while.


> But, in all of those cases, things are done because they have a beneficial impact on the final code and speed of delivery. A beautiful git history does not have such a tangible, measurable benefit.

I respectfully disagree. In my experience, a tidy history directly benefits both efficiency and outcomes of code reviews, speeds up investigations of both bug reports and sometimes general background before starting new development, makes development much easier in situations where changes may need to be isolated and deployed to specific environments (not all software is a web app using CI/CD…), makes it much easier to back out a problematic change without causing unnecessary collateral damage, and helps to verify which development has actually been completed and deployed to which stages/environments, which can be useful for general awareness around the team but is particularly important if you’re operating in any kind of regulated field. All of that in exchange for usually spending less time in `git rebase -i` than it’s taken me to write this comment seems like a bargain to me, but YMMV.


Fully agreed. Seems to me that most people who don't care about curating their commits (and either leave a mess or squash everything at merge regardless of context) simply don't work on projects that are either complex, distributed or long-living enough, so they can easily afford such carelessness.


Nope, ~15 year project here. No one is reading obsolete commit messages when there are hundreds of files to get familiar with today.

Not to mention quality is significantly higher now so you wouldn't want to refer to a granular history of crap anyway. Any time spent on that would have been completely wasted as I wipe out a thousand line file for a new one with a hundred lines because requests hadn't been invented yet and the original implementer didn't understand network protocols or how to use argparse and implemented it from scratch poorly.


Yeah, as I suspected.

If you can afford your first instinct to be reimplementing things from scratch, your understanding of the value provided by proper version control will be limited. Some of us work with constantly changing code developed by thousands of people from all around the world in projects that 15 years ago were migrating to git and that have tons of downstreams, and are thankful for maintainers and processes that keep their commit graphs useful.

Though that said, once you're comfortable enough with git you'll be thanking yourself for commit hygiene even when coming back to your few years old single-person codebases.

In my experience, developer's work consists mostly of gaining understanding of codebases. It's like being a detective. Writing new code happens too, but not as often and it's not as impactful (and usually can and should be handled by less experienced devs wherever possible). Among the most impactful things are single line changes that took a week to write, or a few dozen lines that took months. Rewriting existing code from scratch is something that happens only as a last resort and after very careful consideration. Maintaining some basic version control hygiene makes a whole world of difference in such work. Sure, you can live without it, but you can also live without docs, comments or tests (and sometimes have to - which makes you appreciate them when they're there).


If I can invest business hours and get back minutes during an outage at 3 AM, I should do that.


Yeah I just don't care really. I have noisy dev branches, force pushing rebases from an upstream branch, and usually just squash merge and throw the history away when merging back to master. I've never come across a situation where I needed a preserved, fine-grained commit history for every dev branch after they've already been put in working order and merged to master. I guess I just don't use commit histories very much after the fact, like months later trying to find what original commit changeset a line of code was changed in. It's never mattered.


I use the commit history every time I open a file I haven't worked on for a while. Just seeing the output of git annotate in the border of IntelliJ gives me a sense of how old the code is, what changed together with what, and whom to ask in case of questions.


If merge points are your "known good" points anyway you can just use the powers of the git DAG and `git bisect start --first-parent` in your main branch to bisect only the merge points. There's no need for rebase/squash and you still get useful git bisect results.
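A minimal sketch of that, assuming a throwaway repository where a feature branch with WIP commits is merged with `--no-ff`. The file names, messages, and the `grep` check standing in for a real test suite are all made up; `--first-parent` is passed to `git bisect start` and needs Git 2.29 or newer.

```shell
set -e
# Throwaway repository; names, messages, and the "bug" are illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "ok" > app.txt
git add app.txt
git commit -q -m "initial commit"
good=$(git rev-parse HEAD)
main_branch=$(git symbolic-ref --short HEAD)

# Feature branch: the second WIP commit introduces the pretend bug.
git checkout -q -b feature
echo "wip" > notes.txt
git add notes.txt
git commit -q -m "wip"
echo "bug" >> app.txt
git commit -q -am "wip 2"

git checkout -q "$main_branch"
git merge -q --no-ff -m "Merge branch 'feature'" feature
merge_commit=$(git rev-parse HEAD)
echo "later" > later.txt
git add later.txt
git commit -q -m "later work on main"

# Bisect along the first-parent chain only, so WIP commits are never tested.
git bisect start --first-parent HEAD "$good" >/dev/null
# The command's exit status decides: 0 means good, 1 means bad.
git bisect run sh -c '! grep -q bug app.txt' >/dev/null
first_bad=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

echo "first bad commit on the main line: $first_bad"
```

Bisect lands on the merge commit rather than on `wip 2`; a second, ordinary bisect inside that merge would narrow it down further.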


All commits are good points and potentially useful points. Was the bug in the refactoring? In the feature itself? In the resolution of merge conflicts? You can only answer if you don't squash, and it becomes easier to fix the bug if you know the answer.


Sure, but also no one particularly wants to CI every commit inside a PR, so there is a usefulness in `git bisect start --first-parent` as the "first pass" of known CI points (merge commits presumably from PRs) to find the "PR that introduced the problem" and then drill down into every smaller commit to see if you can get additional bisect information (from commits that may or may not have passed CI in the first place in development work-in-progress).


I think the point the GP message is making is that, prior to review/merge you extract atomic commits from your WIP that tell a clear, concise story of how the change was made. The reviewer has less built up context so by chunking it like this they can step through each commit one at a time.

IMHO the expectation is that each commit would 100% pass CI, so if you decided to extract some commits and merge that early you can. This is especially useful when a 6 commit PR is reviewed, and the first 3 commits are fine but there is more feedback on the last three. The reviewer can split the first 3 good ones out, get them merged and whittle down the PR to the remaining three. The subsequent follow up will be less.

IME team velocity goes up with this too, and it encourages small and easy to review commits like a Remove to be extracted and merged early.

Since PRs are always as large or larger than commits, I would much rather have a specific commit flagged than have to wade through the whole PR diff. If the PR is not familiar to me, I want to increase my effectiveness narrowing down the cause, so I can fix it faster.


I don't do full CI for every commit but I do run the relevant unit tests (or all of them depending on the change and the project) and ensure that they pass.


> even if so far you had wip wip doh wip as the commit messages

Aside, `git commit --fixup HEAD` is often better than `git commit -m "oops, one more thing"`, since it means you can easily `git rebase -i --autosquash`.
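A minimal sketch of the fixup flow, assuming a throwaway repository. File names and messages are made up, and `GIT_SEQUENCE_EDITOR=true` is only there so the example runs without opening an editor; interactively you would just review the todo list and save.

```shell
set -e
# Throwaway repository; names and messages are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "base" > app.txt
git add app.txt
git commit -q -m "initial commit"

echo "feature" > feature.txt
git add feature.txt
git commit -q -m "add feature"

# Oops, one more thing: record it as a fixup of the previous commit.
echo "forgotten line" >> feature.txt
git commit -q -a --fixup HEAD   # subject becomes "fixup! add feature"

# --autosquash reorders and marks the fixup in the todo list;
# `true` as the sequence editor accepts that list unchanged.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~2

count=$(git rev-list --count HEAD)
subject=$(git log -1 --format=%s)
echo "commits=$count last_subject=$subject"
```

The fixup disappears into "add feature", so the branch ends up with the same two commits a reviewer would expect.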


Of course I commit a lot of garbage commits that don't work, it's super useful to do so. Those never get pushed out into branches that I share to others though - why would I waste their time having them look at those?

What I push out are atomic commits that make sense logically, not an external undo log of my text editor; squashing those on merge provides no benefit and only loses useful information. Squashing should happen before push, not on merge, and there's no reason to have buggy "intermediate" commits recorded in your central remote branch at all.


> > * enables use of git bisect to locate bugs

> This is really only viable if each intermediate commit on a development branch is intended to be bug free.

git rebase has an --exec option that allows you to run a command or set of commands for each commit in the branch. You could rebase your development branch before pushing it up for review and ensure each commit passes coffee linting and tests.
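A minimal sketch, assuming a throwaway repository. The `test -s app.txt` check is a stand-in for whatever lint or test command your project actually uses, and the log file exists only to show that the command ran once per commit.

```shell
set -e
# Throwaway repository; names and the check command are illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "base" > app.txt
git add app.txt
git commit -q -m "initial commit"
base=$(git rev-parse HEAD)

git checkout -q -b feature
echo "step 1" >> app.txt
git commit -q -am "refactor"
echo "step 2" >> app.txt
git commit -q -am "add feature"

# Placeholder for a real lint/test command; it also records each run.
log=$(mktemp)
check="test -s app.txt && echo checked >> '$log'"

# Replays each commit since $base and runs the check after every one.
# A failing check stops the rebase at that commit so it can be fixed.
git rebase -q --exec "$check" "$base"

runs=$(grep -c checked "$log")
echo "check ran $runs times"
```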


Another good reason is that having a small commit that changes just one thing is a lot easier to revert without encountering conflicts, even after other features have been committed to the main/master branch.


There are a multitude of Git workflows, and opinions on what the basic unit of change is: for some, a feature is atomic, so squash-merging feature branches is perfectly natural.

> facilitates much better code review discussions

This can be done while adding code to the feature branch

> allows for informative commit messages associated with the changes

I'm assuming you consider individual commits to be the basic unit of change? This isn't always the case. Some products are not amenable to adding features fractionally

> communicates clearly to future self about why changes were made

You can do that with a squash-merge too!

I've noticed people who work on an evergreen deployment can afford to work on a very granular, commit-level. However, if you have to support multiple production branches concurrently and often have to cherry-pick features and fixes across them, features will naturally become the basic unit of change you will find yourself gravitating towards, and will liberally use squash-merging just to keep your sanity.


> facilitates much better code review discussions

Hmm, I usually mark PR’s as draft until ready for review, and then I expect the discussion to be about the current state, not a previous intermediate state. Easiest with small PR’s.

> enables use of git bisect to locate bugs

Interesting. I know _of_ git bisect, but haven’t used it as part of my workflow. Have you found it useful to bisect commits on a feature branch (which, presumably, represents unfinished work)?

> allows for informative commit messages associated with the changes

I find using the PR title and accompanying info in GitHub or similar to be quite informative - that should convey the purpose of the change.

> communicates clearly to future self about why changes were made

See above. Perhaps we work differently, but I find it clearer to read a git history where each commit represents a single, complete feature/fix/refactor instead of intermediate steps.


The classic example where this fails is when needing to revert something. An atomic commit for the migrations + some atomic commits for the implementation mean you can easily revert the implementation, and leave the migration intact (as should be) and add a reverse migration.
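A minimal sketch of that situation, assuming a throwaway repository in which the migration and the implementation were committed separately. File names and messages are made up, and the reverse migration itself is left out.

```shell
set -e
# Throwaway repository; file names and messages are illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "schema v1" > migration.sql
git add migration.sql
git commit -q -m "add migration"

echo "feature code" > feature.txt
git add feature.txt
git commit -q -m "implement feature"
impl=$(git rev-parse HEAD)

echo "docs" > README.txt
git add README.txt
git commit -q -m "unrelated docs change"

# Back out only the implementation; the migration commit stays in place.
git revert --no-edit "$impl" >/dev/null

ls
```

Had the migration and the implementation been squashed into one commit, the same `git revert` would have removed the migration file too.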


> why spend time on a ‘nice’ commit history in a (smallish) feature branch when you can squash merge later.

Agreed, it has been standard at most shops I've worked at in the past 8-10 years.


Precisely, you want to keep it about one commit per feature. I think the parent comment was worried about monster merges that squash many features together.


If things are heading towards a single commit an amend commit works fine for me. If I need a previous state I just get it from reflog.
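A minimal sketch of that workflow, assuming a throwaway repository. File names and contents are made up; `HEAD@{1}` is the reflog entry for where `HEAD` pointed just before the amend.

```shell
set -e
# Throwaway repository; names and contents are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "v1" > app.txt
git add app.txt
git commit -q -m "initial commit"

echo "first attempt" > feature.txt
git add feature.txt
git commit -q -m "add feature"

# Keep folding further work into the same commit.
echo "second attempt" > feature.txt
git commit -q -a --amend --no-edit

# The pre-amend commit is still reachable through the reflog.
git reflog -3
previous=$(git show 'HEAD@{1}:feature.txt')
count=$(git rev-list --count HEAD)
echo "previous version: $previous"
```

The reflog is local and its entries expire eventually, so this is a safety net for your own clone rather than shared history.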


> I see it the other way around - why spend time on a ‘nice’ commit history in a (smallish) feature branch when you can squash merge later.

Squash-merge is a scourge. I've seen squash merged commits 30 lines long ("try 15", empty line, "try 14", empty line...). I'm not even sure if you can do anything about such commits because squash-merge is a github/gitlab thing. So, I'm not sure if there are hooks to block it via a commit message linter.

And I've seen people going through some intense mental gymnastics to justify avoiding squashing locally, writing a proper commit message and then merging.


You can simply ask “git log” to show you one coarse entry per PR rather than “destroying” the more granular history.
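A minimal sketch, assuming a throwaway repository where the PR is merged with a real merge commit. Branch names and messages are made up; `--first-parent` is the flag that follows only the main line.

```shell
set -e
# Throwaway repository; names and messages are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "base" > app.txt
git add app.txt
git commit -q -m "initial commit"
main_branch=$(git symbolic-ref --short HEAD)

# A "PR" with noisy commits, merged without squashing.
git checkout -q -b feature
echo "draft" > feature.txt
git add feature.txt
git commit -q -m "wip"
echo "done" > feature.txt
git commit -q -am "fix typo"
git checkout -q "$main_branch"
git merge -q --no-ff -m "Merge PR: add feature" feature

# Coarse view: one entry per merge on the main line.
git log --oneline --first-parent
# Granular view is still there when you want it.
git log --oneline

coarse=$(git rev-list --count --first-parent HEAD)
full=$(git rev-list --count HEAD)
echo "coarse=$coarse full=$full"
```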


This. It bugs me that people permanently throw away details of changes rather than show just the log of merge commits.


Why would I want to keep the details? Umpteen "tmp", "fix" and "fixed typo" provide negative value. When I check the blame of a line, I need to see the context of the change, meaning a description of all the work that was done as part of that change, and perhaps a ticket number. Anything else is noise that actively detracts from the value of the log.

It's like 4K porn. It's less appealing when you see everything.


No yeah, absolutely do squash those commits, they are actually polluting the git history.

The issue for me is when commits that are about adding a new value to the env file get mixed with template and responsive handling, and potentially with a bug fix.


I get why you would prefer it clean but it's just too much overhead for me. I naturally make lots of changes together - especially on a complex feature, you need to build things in a "full-stack" way horizontally so you can test as you go. Then pulling things apart into "clean" atomic commits later just takes too much time and I don't really know how to do it efficiently.


The thing is that when you get good at it the overhead will go down drastically. And the practice will make it easier and faster to extract small pull requests out of your main work that can be reviewed separately.


It's like if someone said they wanted to invite you over for dinner. Then they started texting you. "at the grocery store". "bought a pound of beef". "bought some carrots". "Checking out now". "Arrived at home". "Turned oven on". "Turned oven to 450 degrees". "Turned oven up to 460 based on different recipe". "Starting to prep the beef now". and so on, and so on. I mean, just cook the damn dinner - I don't need to be needled about all the steps. I'll come over and bring the wine, and we'll eat a meal. I don't need to know every minor implementation detail in a commit log, to review and merge the branch. Arghhhh I have one dev on my team like this right now. I'll have to have a talk with them.


I disagreed with you up to this point:

> Of course if you have developers that don't do that and instead merge dozens of commits that just say wip, wip, wip, lol, fml, wip, wip, lol, yolo and you can't fire them or get them to change, then squash merges ftw.

Yes, any large organization has plenty of devs who all have their own style and preferences, for better or worse.

Whoever demands they all bend to the one true way is a fascist (lol not really but you know).

Just set up your CI/CD in such a way that PRs with weird git logs get squashed into one pretty message, preferably the PR description since other devs have to review the PR it's often given more effort. Set it up so that if things weren't formatted the "right way" they get auto-formatted or a test fails and the dev says "ah, I have to run that one task and then update the PR".

I don't think a big organization is going to scale with developer "evangelists" demanding people write their commits a certain way either.

conventionalcommits.org was the worst. I worked at one "big co" that tried to get devs to do this. Even after we had been doing it for a while, nobody ever went back to look at the history in such a way that it was worth it. We ended up throwing in the towel rather than the company trying to get all other teams to do it.


Eh, commit history is handy to have but if you spend all your time crafting the perfect commit messages and history you’ve likely lost sight of what matters. Great is the enemy of good and make the tools do the work for you. Commit as many WIPs or whatever then just hit squash and merge - it saves a lot of time and keeps momentum up.


There is a reason so many open source projects require squashing.

Ain't nobody got time for that shit.


I think squash merges are a last resort heavy-handed tool for dealing with developers who refuse to clean up their commit history before merging. Most developers can do better by hand.

This is too much thought put into a VCS. I don’t want to have to think about my VCS at all beyond the commit message. For all of Git’s popularity, I’ve never seen benefits that justify the absurd amount of work and knowledge it takes to perform simple actions. It’s the VCS equivalent of Scheme or emacs.


It really pays the effort back, though, when you can figure out why something was done, beyond knowing the feature it was related to, which is all you get with a squash merge.


But if you're using pull requests, you can just look up the PR to get the reasoning and details of the squash commit. I would argue that if you need it to be separate commits after merging, you should create separate PRs most of the time.


Why not document it then? In a place where everyone can read it, instead of only developers.


...are you suggesting spending hours searching through documentation, hoping to possibly find something relevant, instead of just being able to run "git blame" to see why a specific line was changed?


I use blame often to blame.

Code tells how, not why—the domain of specs and comments. Commit messages effectively don’t exist for non-developers.

I put spec links in doc strings whenever possible. They are accessible to everyone—devs, PMs, SMEs, stakeholders that pay bills, and myself when at a web browser.

Searching is not required but even if it was it would be a tiny fraction of “hours.”


It's not about VCS, it's about code, both now and later, and context.

If you need to do a workaround, or a complicated feature, sometimes it's nice to explain it as a comment in the code, but sometimes it's better to put it in the commit message. But if it's all merged in the end, along with lots of template changes, README changes, and refactors irrelevant to the current changes, then you're losing an important way of navigating a codebase.


> I don’t want to have to think about my VCS at all beyond the commit message.

Fine as an opinion.

> For all of Git’s popularity, I’ve never seen benefits that justify the absurd amount of work and knowledge it takes to perform simple actions. It’s the VCS equivalent of Scheme or emacs.

This is just wrong. When people talk about this, it's not about git at all.

You write some code, and you make it into commits. Part of that is choosing if/how to organize it with multiple commits, and how much effort you want to put into that. This is fundamental to using a VCS, any VCS.

Or by analogy, if a lot of emacs users complain about your spelling, that's not because emacs is overly demanding.


That was me.

Got fired. Kinda. I was laid off.


I actually hate squash merge because of all the noise it adds. Sure, the commit graph looks nicer, but it comes with a terrible loss of information when doing git blame.

I'm a big proponent of rebase and squash if it helps to make a commit more coherent, but we use squash merges by default in the current project I'm working on, and I die a little bit each time I try to understand what changes were related to a line when tracking down a bug.


This is the big one for me. Destroying commit information just to keep the graph tidy is a bad idea in my opinion. It would be better if Git provided better tools for filtering the log, e.g. providing some mechanism to elide commits from parents of any merge commit other than the 1st.


> Destroying commit information just to keep the graph tidy is a bad idea in my opinion

The commit information I see when telling teams to squash their branches on merge is not valuable.

* "fixing whitespace"
* "incorporate review comments"
* "fix broken test"
* "fix other broken test"

(note, the broken tests were broken by the changes in the PR)

As soon as that PR is merged those commits are worthless. And there are branches with dozens of those "fixing X" commits that would otherwise pollute the commit graph.


> * "fixing whitespace" * "incorporate review comments" * "fix broken test" * "fix other broken test"

Things like this should not be standalone commits though, they should be incorporated into the previous branch by amending the original work. It takes some effort to have a useful git history, it does not just happen on its own.


Sounds like six vs half-dozen. Why does it matter if somebody amends vs squashes?


It does not matter if you have one commit. If your change is split into a few commits for increased readability, then it does matter.

Do you really believe that if, for example, this change to btrfs filesystem https://lore.kernel.org/linux-btrfs/cover.1699470345.git.jos... would be squashed, nothing of value would be lost?


You can very easily rewrite your commit message on GitHub when squash merging. Since the organizations I work with exclusively use squash merge, I often just update the commit to be more valuable, listing the important changes it contains. (And of course the PR in GitHub will contain the commit history of the branch that was squashed, as well as any discussion.)

IMO, this is a lot simpler and easier to do than rebasing your branch to have a flawless history.


I rather strongly disagree here.

Having whitespace changes muck up a commit causes you to lose focus on what's actually important.

I have `git blame` aliased to `git blame -w` which ignores whitespace-only changes.

You can also reblame when you come across these formatting commits.


Yep, intermediate commits on a branch tend to be completely worthless. I'd much rather have "git blame" point to the commit that contains the entire change together.


Agree strongly. It's nice in theory to view the intermediate commits, but in practice I have never needed to look at them.


Those commits would be the bathwater one casts out alongside the useful commits in using squash merges.


If the useful commits are the "baby" in your bathwater analogy, all the useful information in those commits is in the squashed commit.

This assumes a branch being merged in represents one logical change (a feature/bugfix/etc) that is "right sized" to be represented by one commit.


Yes, but now it's mixed with the bathwater, and, to morph into another metaphor, it becomes the needle in the haystack.

It's okay to have 'low information' commits one can easily ignore in your history, as long as the 'high information' ones stay readable and coherent.


You can usually see that in whatever tool you're using anyway. Blame -> find the PR -> see commit history.


You mean like for example `git log --first-parent`?
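
For anyone who hasn't tried it, here's a minimal sketch in a throwaway repo. The branch name, commit messages and identity are made up for illustration. It shows that the noisy branch commits stay in the history but drop out of the `--first-parent` view:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # don't depend on the default branch name
git config user.name "Example Dev"         # placeholder identity
git config user.email "dev@example.com"

git commit -q --allow-empty -m "initial commit"

# A feature branch with noisy intermediate commits.
git checkout -q -b feature
git commit -q --allow-empty -m "wip"
git commit -q --allow-empty -m "fix broken test"

# --no-ff forces a real merge commit onto main.
git checkout -q main
git merge -q --no-ff -m "feat: add feature X" feature

# Everything is still reachable...
all_count=$(git rev-list --count HEAD)
# ...but following only first parents shows main's own line of history.
first_parent_log=$(git log --first-parent --format=%s)
echo "$first_parent_log"
```

The "wip" and "fix broken test" commits are still there for blame and bisect; they are just not shown in this view.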


TIL, thank you! Now I know for a fact that squash-mergers have no excuse and can brandish the man page at them.


People in general just have no idea how much version control is able to do. For one example try running "git help log" and just tap page-down a few (dozen) times to get an idea what's in there.

For another example, you know how people hate aligning code vertically so much that linters don't allow it nowadays, the primary reason being that if you have to change the spacing then the diffs will identify far too many lines as having changed? Both git and svn have options to ignore whitespace changes:

  git diff -w
  svn diff -x -w
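
A rough sketch of the git half of that, in a scratch repo (the file name and contents are invented for the example). It counts diff hunks before and after adding `-w` for a change that only re-aligns code:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Dev"         # placeholder identity
git config user.email "dev@example.com"

printf 'a = 1\nlonger_name = 2\n' > vars.txt
git add vars.txt
git commit -q -m "add vars"

# Re-align the assignments vertically: a whitespace-only change.
printf 'a           = 1\nlonger_name = 2\n' > vars.txt

# Count hunk headers in each diff (grep -c exits 1 on zero matches, hence || true).
plain_hunks=$(git diff | grep -c '^@@' || true)
ws_hunks=$(git diff -w | grep -c '^@@' || true)
echo "hunks without -w: $plain_hunks, with -w: $ws_hunks"
```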


`--first-parent` also works for blame and bisect these days.


> I die a little bit each time I try to understand what changes were related to a line when tracking down a bug

A change/feature/bug is a branch, which is squashed into a commit on your main branch, right? So your main branch should be a linear history of changes, one change per commit.

How does that impact the ability to git blame?


Because unless it's the most trivial of features, you'll break it up into smaller commits which each explain what they are doing and make reviewing the change easier.

As a simple example, I recently needed to update a json document that was a list of objects. I needed to add a new key/value to each object. The document had been hand edited over the years and had never been auto-formatted. My PR ended up being three commits:

1. Reformat the document with jq. Commit title explains it's a simple reformat of the document and that the next commit will add `.git-blame-ignore-revs` so that the history of the document isn't lost in `git blame` view.

2. Add `.git-blame-ignore-revs` with the commit ID of (1).

3. Finally, add the new key/value to each object.

The PR then explains that a new key/value has been added, mentions that the document was reformatted through `jq` as part of the work, and recommends that the reviewer step through the commits to ignore the mechanical change made by (1).

A followup PR added a pre-commit CI step to keep the document properly linted in the future.
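
Roughly what that looks like, as a sketch in a scratch repo. The file contents are invented, and a plain re-indent stands in for the real `jq` pass. It passes the file explicitly with `--ignore-revs-file`; setting `blame.ignoreRevsFile` in config is the other way to opt in. Needs a reasonably recent git (2.23 or later, as far as I know):

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Dev"         # placeholder identity
git config user.email "dev@example.com"

printf '"name": "a"\n' > data.json
git add data.json
git commit -q -m "add data"
original=$(git rev-parse HEAD)

# (1) Mechanical reformat: only the indentation changes.
printf '  "name": "a"\n' > data.json
git commit -q -am "reformat data.json (mechanical only)"
reformat=$(git rev-parse HEAD)

# (2) Record the reformat commit so blame can skip over it.
echo "$reformat" > .git-blame-ignore-revs
git add .git-blame-ignore-revs
git commit -q -m "add .git-blame-ignore-revs"

# First field of the porcelain output is the commit blamed for line 1.
blamed_plain=$(git blame --porcelain data.json | head -n 1 | cut -d' ' -f1)
blamed_ignoring=$(git blame --ignore-revs-file .git-blame-ignore-revs \
  --porcelain data.json | head -n 1 | cut -d' ' -f1)
```

Plain blame lands on the reformat commit; with the ignore file it falls through to the commit that actually wrote the line.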


In general I agree with you, there are absolutely times where you want to retain commit history on a particular branch (although I try to keep the source tree from knowing about things like commit IDs).

I would argue that those are by far the minority of PRs that I see. As I mentioned in another comment, _most_ PRs that I see have a ton of intermediary commits that are only useful for that branch/PR/review process (fixing tests, whitespace, etc). Generally the advice I give teams is, "squash by default" and then figure out where the exceptions to that rule are. That's mainly because, in my opinion, the downsides of a noisy commit graph filled with "addressing review comments" (or whatever) commits are a much bigger/frequent issue than the benefits you talk about. It really depends on the team.


> As I mentioned in another comment, _most_ PRs that I see have a ton of intermediary commits that are only useful for that branch/PR/review process (fixing tests, whitespace, etc).

Right, but that's only because developers don't amend and force push their commits to the PR branch as they receive feedback. Which is largely encouraged by GitHub being a terrible code review tool.

To me, git is part of the development process, it's not an extra layer of friction on top. So I compose my commits as I go. I find it helpful for recording what I'm thinking as I write the code. If I wait till the very end, I'll have forgotten some important bit of context I wanted to include. So during the day I may use the commits like save points. But before I push anything I'll often check out a new branch and create an incremental set of commits that have the change broken down into digestible pieces. And if I receive feedback, I'll usually amend those changes into the PR and force push it.

I'd like to add that I spend a lot of time cleaning up tech debt. And I deal with a ton of commits and PRs that don't explain themselves. So I'm really biased toward a clean development workflow because I hope to make the lives of those who come after me easier.

I was also trained on this workflow by being an early git contributor and it had extremely high standards for documenting its work. There's a commit from Jeff King that's a one line change with about six paragraphs of explanation.

There's no right answer here. I value the "meta" part of writing code. Not everyone does and that's okay.


When the word "force" is involved, it's time to take a step back and re-evaluate things.


It's due to GitHub lacking change set support. With Gerrit, force pushing isn't required.


> only useful for that branch/PR/review process (fixing tests, whitespace, etc).

I have had bugfix cases where, digging through the repo history, both of those examples accidentally introduced the bug (the first because the person who made the original change didn't completely understand a business rule so it changed both the code and the test, the second because of a typo in python that only affected a small subset of the data). Keeping the commit separate let me see very quickly what happened and what the intent actually was.


Because now, instead of having a line changed within a granular level of changes, it's lost among the other changes from the same feature branch, which is a more macro level. So if a change in config is needed for the feature, the point where this config change actually needs to be handled, or where it would impact the data flow, is harder to evaluate now that you mix it with template changes, style changes, new interactions needed for the users, etc...

EDIT: On top of that, there's usually a bit of 'related' work you need for a task, for example when you find an edge case related to your feature, and now you also need to fix a bug, or you did a bit of refactoring on a related service, or needed to change the data in a badly formatted JSON file.

Unbeknownst to you, you added a bug when refactoring the related service, a bug that is spotted a few months after, only on a very specific edge case. If the cause is not obvious, you might want to reach for git bisect, but that won't be very useful now that everything I've talked about is squashed into a single commit.


> EDIT: On top of that, there's usually a bit of 'related' work you need for a task, by example when you find an edge case related to your feature, and now you also needed to fix a bug, or you did a bit of refactoring on a related service, or needed to change the data on a badly formatted JSON file.

I agree that's related work, but I'd argue that work doesn't belong in that branch. If you find a bug in the process of implementing a feature, create a bugfix branch that is merged separately. If you need to refactor a service, that's also a separate branch/PR.

That's actually the most common pushback I get from people when I talk about squashing. They say "but then a bunch of unrelated changes will be lumped together in the same commit", to which I respond, "why are a bunch of unrelated changes in the same branch/PR?"


I agree with you in principle, but it's usually because of process and friction. In the place I'm working right now, that would result in days lost, as I'd need to create a new Jira ticket, which obviously requires a team meeting for grooming (because Agile!), then chase colleagues so that the PR is accepted, which in the best case still needs the CI/CD pipeline to finally deploy, then merge it to the dev branch, and finally rebase the current feature branch... and all this multiple times.


Because sometimes a PR touches more code than a single commit, and you lose the more granular context surrounding the more granular changes. You can always ask git to make the log more coarse, but once you “destroy” the granular history it is for all intents and purposes gone.


Me too. I care a LOT about provenance. And squash merge completely breaks that.

When my branch is up to date with `main` I can build an artifact, fast forward merge that branch into `main` and RETAIN the artifact, and merely update its tags to mark it as `merged` in.

With a squash I lose that information.

Now, GitHub does not allow me to do a fast-forward merge but I can still trace the 2 commits that are the parent of the resultant merge, and find the artifact based on that, and retag.
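
A quick sketch of why the fast-forward case preserves provenance and the squash case doesn't (scratch repo, invented names). The commit ID the artifact was built from is still the tip of `main` after a fast-forward, but not after a squash:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # don't depend on the default branch name
git config user.name "Example Dev"         # placeholder identity
git config user.email "dev@example.com"
git commit -q --allow-empty -m "initial commit"

git checkout -q -b feature
echo "x" > feature.txt
git add feature.txt
git commit -q -m "add feature"
built=$(git rev-parse HEAD)                # the commit an artifact would be built from

# Fast-forward: main simply moves to the very same commit.
git checkout -q main
git merge -q --ff-only feature
after_ff=$(git rev-parse HEAD)

# Rewind main and squash instead: same content, brand-new commit ID.
git reset -q --hard HEAD~1
git merge --squash feature >/dev/null 2>&1
git commit -q -m "add feature (squashed)"
after_squash=$(git rev-parse HEAD)
```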


I do squash merges but keep the feature branches. So after determining that I made a change as part of a big pull request, I can then look at the commit/blame history for the pull request source branch if necessary.


I'm guessing you don't work on large projects. This would create an outrageous amount of noise in a busy repository.


This means having to keep these branches around cluttering up everything, and makes git bisect a lot more complicated.


Git blame confuses people even without squash merges.

I've seen people forget to go back more than one commit and then blame the person who last indented a file instead of going back to the commit that actually wrote the code many times.


> I've seen people forget to go back more than one commit and then blame the person who last indented a file instead of going back to the commit that actually wrote the code many times.

I default my `git blame` to `git blame -w` which ignores whitespace commits. Though knowing how to jump back commits should be required knowledge.
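
Small demonstration of the difference, in a scratch repo with an invented file: one commit writes a line, a second one only re-indents it.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Dev"         # placeholder identity
git config user.email "dev@example.com"

printf 'value = compute()\n' > app.txt
git add app.txt
git commit -q -m "write the code"
wrote=$(git rev-parse HEAD)

printf '    value = compute()\n' > app.txt
git commit -q -am "re-indent"
indented=$(git rev-parse HEAD)

# First field of the porcelain output is the commit blamed for line 1.
plain=$(git blame --porcelain app.txt | head -n 1 | cut -d' ' -f1)
ignoring_ws=$(git blame -w --porcelain app.txt | head -n 1 | cut -d' ' -f1)
```

Without `-w` the re-indent gets the blame; with it, blame goes to the commit that wrote the code.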


We shouldn’t tamper with code we don’t actually need to fix, it’s not a good use of time and it makes history less useful. Just because it doesn’t look like I wrote it doesn’t make it wrong.


I’m thinking of situations where the surrounding structure of the code has been changed to correct a problem.

That’s done by an automated tool. Correction of indentation is just a byproduct.

I don’t consider that “tampering”.


You can deal with that through tooling.

In a lot of my work I call those types of automated tool commits "wrench" commits personally and even have a simple shell script to help automate committing them. In my case I prefix the command line with a wrench emoji. At that point it's very obvious in git blame that if a line starts with a wrench it was last touched by an automated tool of some sort.

You can also very easily at that point grep your git log for wrenches to dump commit hashes into a `.git-blame-ignore-revs` file and automate that part too, so that those commits don't even show up in git blame at all.
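
Something along these lines, as a sketch. To keep it portable it assumes a plain-text `wrench:` prefix as a stand-in for the emoji, and the commits themselves are invented:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Dev"         # placeholder identity
git config user.email "dev@example.com"

echo "a " > f.txt
git add f.txt
git commit -q -m "add f"
printf 'a\n' > f.txt
git commit -q -am "wrench: strip trailing whitespace"
echo "b" >> f.txt
git commit -q -am "add b"
printf 'a\n\nb\n' > f.txt
git commit -q -am "wrench: run formatter"

# Collect the hashes of every tool commit, one per line.
git log --format=%H --grep='^wrench:' > .git-blame-ignore-revs
tool_commits=$(wc -l < .git-blame-ignore-revs | tr -d ' ')
echo "ignored commits: $tool_commits"
```

The resulting file can then be handed to `git blame --ignore-revs-file`.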


This all depends on the project. Sometimes you don’t look at history all that much. Sometimes the loss of information is acceptable.


If you don't look at history much, why would you care about keeping it "clean"? Just keep the truth of the changes in the history for those of us who do use it, and you can continue ignoring it.


No one is mentioning:

    $ git log --merges

Now you can see your features in a nice history and also have the added benefit of seeing intermediary commits. Pro tip: merge commits aren't required to use the canned "Merge branch into..." message, you can give it any message you want, such as "feat: ..." or whatever your convention is.
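
A minimal sketch of that workflow in a scratch repo; the branch names and the "feat:" convention are just examples:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # don't depend on the default branch name
git config user.name "Example Dev"         # placeholder identity
git config user.email "dev@example.com"
git commit -q --allow-empty -m "initial commit"

for name in login search; do
  git checkout -q -b "$name" main
  git commit -q --allow-empty -m "wip on $name"
  git checkout -q main
  # -m replaces the canned "Merge branch ..." message.
  git merge -q --no-ff -m "feat: add $name" "$name"
done

features=$(git log --merges --format=%s)
echo "$features"
```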

I hate that branch squashing has become something of a de facto standard. I actually do rewrite my history and often add context to my commits. `git blame` can be an incredibly useful tool to get context about a given small change. Getting a massive diff for a whole feature is much less so, especially since you can just look at the diff of the merge commit.


What I think I see these days is squash merges being used lazily to avoid having to do anything to build a clean history with clearly semantically delineated commits. Squash merges are good compared to an alternative where people check in super messy noisy branches, but they unfortunately have a big downside because squash merges can make bisecting and history spelunking more difficult, when the branches that are squash merged were big.


What is a "semantically delineated commit"? What is a "clean history"? Why are these two things important?


Not parent: there are technical commits, such as "fix review", "fix jenkins", "fix typo" etc. Those don't delineate a particular feature but a fix for a problem that arose from the workflow. This ends up with a history of "big feature commit that is wrong in three trivial ways" + "fix 1" + "fix 2" + "fix 3". Of those, "big feature commit" is the important one, but "fix 3" is the only working one. This is clearly silly; you should pretend you were perfect from the start and squash "fix 1" through "fix 3" into "big feature commit". Your typos and brainfarts are not of historical relevance.


Perhaps I'm missing something, but I don't see how your comment answers my questions. Do you mean that a "clean history" is one without "fix 1", "fix 2" and "fix 3"? Or is that a "semantically delineated commit"?


A clean history is one where there is a single commit, "big feature commit", that produces a worktree that is the same as the one produced by "fix 3" in the "unclean" history.


How is this possible, while sharing code? Doesn't this require that pushed code is perfect? What about everyone else working on the same code? Do they wait until you've reached perfection? Or, do you squash the branch once it's complete, with the assumption that there's no other development on/from that temporary branch (I envy you if so)?

(I ask these questions fully assuming I'm doing it wrong.)


> Doesn't this require that pushed code is perfect?

We aren’t talking about pushed code. We are talking about cleaning up the local commit history before pushing it into a shared branch.


And that's the one--and only--reasonable use of rebasing, to squash commits from a branch before merging into main. If engineers find themselves using rebase in any other context than squashing a merge, it's time to re-evaluate the processes/culture around workflow.


What about the context where one works with other people, while sharing code?


When working with published/shared branches with other people, the advice with git has always been that history is history and not to be changed after publishing, unless there is an emergency like a security incident.

Aside from that, we might need to clarify what the question is. With shared code & git, it’s nice to use a branch & merge workflow, and it’s nice to make incoming merges as clean / nice as you can, so the resulting history is as smooth as it can be while capturing what happened at a reasonable granularity. These are today’s conventions though, and it’s really up to the team to decide how to balance shared work, and what people feel are the most important workflows and tools.


You fix your local tree before sharing it. Alternatively, you can communicate with your team and tell them they'll need to run git fetch && git rebase -i origin/main to drop your erroneously merged commits.


They are. Or at least can be. Typos I probably agree with, but I've seen plenty of logic bugs introduced in those "fix" commits and keeping them separate from the big one is useful when figuring out what was supposed to happen.


A bunch of wip wip2 wip3 commits don't add any value, and make the log harder to read. But if you break a bigger PR down into "added feature x", "tests for feature x", "refactored y to support x" -- the commits are easier to read and provide valuable "why" history when you're trying to figure out what happened two years later.


That's more about the contents of the merged commits than anything else. Modifying the commit message(s) fixes that, as long as that's what the commits actually did.

Aside from that, how are "a fix for a bug" style commits not "clean"? If merge 123 into master contains a bug that is fixed in a future merge 1234, it doesn't seem "dirty" to me; quite the opposite actually, as it tracks what actually happened.

Now, "wip" style commits shouldn't be on whatever main branch everyone is working on: that's what branches are for. And if everyone is just working off the main branch and committing directly to it, that's an organizational deficiency; not one that VCS can solve.


Modifying commit messages is rewriting history, right?

> “wip” style commits shouldn’t be on whatever branch everyone is working on

Agreed! We aren’t talking about rewriting shared branch history, we are talking about removing the “wip” commits made hastily and locally before pushing them. Sounds like we agree!


> they unfortunately have a big downside because squash merges can make bisecting and history spelunking more difficult, when the branches that are squash merged were big

can you help me understand this? It is the exact opposite of my experience. The flow I see is: bug reported, write a git bisect test, identify the feature that introduced it, reach out to that developer/team.

This is allowed by squash merges. When I've seen these more "clean" histories, they have commit points that won't even compile or have runnable tests, causing git bisect to fail.

> branches that are squash merged were big

it must be this - how big are your merges? All the projects I've worked on strive for smaller PRs. Large PRs are usually broken up into smaller pieces. Large PRs are an anti-pattern.


I maybe don’t know what you mean about “clean histories”. Speaking for myself, I always expect a history that’s called “clean” to compile error-free at every commit, unless otherwise noted; one of my personal criteria for calling history ‘clean’ is that efforts are made to keep the main branch up and running for every commit.

> how big are your merges? […] Large PRs are an anti-pattern.

Depends, but they can occasionally get pretty big, if there’s a big refactor and/or multiple people in the branch. Small enough PRs are a nice goal - it’s a goal I agree with, and one that might exist in part because squash merges on large PRs lose too much. It’s just that the real world routinely gets in the way. It’s very easy for someone who needs to do an ‘atomic’ refactor to touch a ton of files. It’s very easy for a planned feature to end up way bigger than intended. You can’t always keep PRs small or enforce it on other people. Sometimes stuff happens, and when it does, sometimes squash merging feels less good than merging a branch with multiple commits. The good news is that it’s always optional. The bad news is that I can’t necessarily babysit or dictate what others do, and some people prefer squash-merging to spending any time doing cleanup on a messy branch.


Squash merges are rebases.


Only metaphorically, maybe. You can squash merge in lots of cases where a rebase will fail.


They are essentially rebase+squash, despite the name. There is no actual merge taking place.

And for that matter, you'd manually do a squash with the interactive rebase tool anyway ("git rebase -i").


Imagine a feature branch where someone has been keeping it up to date by merging main into it regularly. Now the feature is ready to go into main. You can easily `git merge --squash` that branch into main. You can likely do the same thing manually (as you point out) by running `git rebase -i` if you squash all the commits in the branch. But you’ll never manage to do a genuine rebase, where every commit in the branch gets turned into a clean non-merge commit onto main.
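
Here's that scenario as a sketch in a scratch repo (file and branch names are invented). The feature branch contains a merge from main, and `git merge --squash` still turns it into one ordinary single-parent commit:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # don't depend on the default branch name
git config user.name "Example Dev"         # placeholder identity
git config user.email "dev@example.com"
echo "base" > base.txt && git add base.txt && git commit -q -m "base"

git checkout -q -b feature
echo "one" > feature.txt && git add feature.txt && git commit -q -m "feature part 1"

git checkout -q main
echo "more" > main.txt && git add main.txt && git commit -q -m "unrelated work on main"

# Keep the feature branch up to date by merging main into it.
git checkout -q feature
git merge -q --no-edit main
echo "two" >> feature.txt && git commit -q -am "feature part 2"

# Squash the whole branch, merge commit and all, into one commit on main.
git checkout -q main
git merge --squash feature >/dev/null 2>&1
git commit -q -m "feat: whole feature as one commit"

parents=$(git log -1 --format=%P | wc -w | tr -d ' ')
files=$(git diff --name-only HEAD~1 HEAD)
```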


FWIW I consider `git rebase -i` to be a "genuine rebase"


I do too, except in cases where it’s being used simply as a more complicated UI for `git merge --squash` and there’s no actual “generate a diff and apply it to a different base commit” going on.


I think we have a rose by any other name situation.

I call that a rebase.


That's a badly fitting analogy because there's only one type of flower involved. In this situation, they're saying that most things you might do with "rebase -i" are rebases, except for one.

I'll make a math analogy. Technically a rectangle is a trapezoid, but if someone tries to draw a distinction between rectangles and proper trapezoids, it's not hard to figure out what they mean.

When rebase -i outputs a single commit, that's a degenerate case. There are statements about rebases that are generally true but not true for that specific kind.


Just when I thought I was starting to understand git...


Rebase essentially means "create new commits out of old commits", the original use being to "move" a set of commits from one branch to another (think of the name as meaning, to change the base these commits started from).

There's a few special cases that have their own names, a common one is when you amend a commit - to do that manually you'd make a new commit, then use interactive rebase to squash the two commits together into a new one (or, use the "fixup" command available in that tool, which is a squash that automatically picks the first commit message instead of asking for a new one).

Squash merges will squash a whole branch into a single commit, rebasing it onto the target in the process, and then fast-forward the target to the new commit. It's a tightly controlled use of rebase, and can be thought of a bit like how "for", "foreach", and "while" loops are a tightly controlled use of "goto", an abstraction built on top of a far more flexible tool.
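
For the fixup case, a sketch of the non-manual route using `git commit --fixup` plus `--autosquash`. The repo and its contents are made up, and `GIT_SEQUENCE_EDITOR=true` is only there so the generated todo list is accepted without opening an editor:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Dev"         # placeholder identity
git config user.email "dev@example.com"
echo "base" > base.txt && git add base.txt && git commit -q -m "base"

echo "hello wrold" > greeting.txt          # typo on purpose
git add greeting.txt && git commit -q -m "add greeting"
echo "docs" > README && git add README && git commit -q -m "add readme"

# Fix the typo and mark the fix as belonging to "add greeting" (HEAD~1).
echo "hello world" > greeting.txt
git add greeting.txt
git commit -q --fixup=HEAD~1               # subject becomes "fixup! add greeting"

# Autosquash reorders the todo list and folds the fixup into its target.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~3

subjects=$(git log --format=%s)
fixed=$(git show HEAD~1:greeting.txt)
```

The history ends up as three commits again, with the corrected text already in "add greeting".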


My rule of thumb for commits is that they should be of a size and scope suitable for cherry-picking. So, maybe I'm working on a small feature that entails three changes, and each of those three changes is useful in and of itself and could conceivably be cherry-picked by others. I would create three separate commits, generate a PR with all three, and merge in the work. Sure, I could squash merge and end up with one merge commit encompassing all three changes, but now none of those three changes is cherry-pickable.
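
As a sketch of what that buys you (scratch repo; the `release` branch and file names are hypothetical), here's picking just the middle one of three standalone changes onto another branch:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # don't depend on the default branch name
git config user.name "Example Dev"         # placeholder identity
git config user.email "dev@example.com"
echo "base" > base.txt && git add base.txt && git commit -q -m "base"

git checkout -q -b feature
echo "a" > a.txt && git add a.txt && git commit -q -m "change A"
echo "b" > b.txt && git add b.txt && git commit -q -m "change B"
pick=$(git rev-parse HEAD)                 # remember change B
echo "c" > c.txt && git add c.txt && git commit -q -m "change C"

# Someone else only wants change B on their branch.
git checkout -q -b release main
git cherry-pick "$pick" >/dev/null

picked=$(git log -1 --format=%s)
```

After a squash merge there would be only one commit to pick, containing A, B and C together.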


They also cut down the signal.


> Squash merges cut down the noise considerably.

They do, but they have their own issues, e.g. having to delete local branches with `git branch -D` instead of `git branch -d`, which means giving up the protection against deleting unmerged work.

I still agree that on balance annoyances like that might still be worth putting up with for larger teams with mixed skill levels.
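
The `-d` vs `-D` annoyance, reproduced as a sketch in a scratch repo with invented names:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # don't depend on the default branch name
git config user.name "Example Dev"         # placeholder identity
git config user.email "dev@example.com"
echo "base" > base.txt && git add base.txt && git commit -q -m "base"

git checkout -q -b feature
echo "x" > x.txt && git add x.txt && git commit -q -m "feature work"

git checkout -q main
git merge --squash feature >/dev/null 2>&1
git commit -q -m "feature (squashed)"

# The content is on main, but feature's own commit is not an ancestor of main,
# so the safe delete refuses.
if git branch -d feature >/dev/null 2>&1; then
  safe_delete=ok
else
  safe_delete=refused
fi

git branch -D feature >/dev/null           # force delete is the only way
remaining=$(git for-each-ref --format='%(refname:short)' refs/heads)
```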


I don't mind merge commits, it's the 100 tiny individual commits some developers seem to like to do that really clutters things up. Yes, I know, git squash is a thing, but not committing until the feature is working and ready to commit is also a thing.


> not committing until the feature is working and ready to commit is also a thing

That leaves you prone to losing work if you have a false start that you need to back out of. I prefer to commit early and often on my private branches, then before submitting a pull request I clean up the history to where there are a few good commits that form useful, standalone chunks (ideally the test suite fully passes on each commit).


>That leaves you prone to losing work if you have a false start that you need to back out of.

Hasn't happened to me in over 20 years of using version control. I always keep moving forward, there's really never been a need to go back to a previous commit that hitting Ctrl-Z wouldn't accomplish just the same. If I wanted to try a new direction I'd just clone the repo again and do the work there. Littering the git history with dozens of superfluous commits just seems pointless. Having to stop and think about writing a commit comment is also just a waste of time - in aggregate it wastes a lot of time. It adds a lot of churn to a workflow for something that may never really be of any value.


> Littering the git history with dozens of superfluous commits just seems pointless.

This is where the final rebase comes in—you should be combining all the small commits into one.

> Having to stop and think about writing a commit comment is also just a waste of time

Most of my commits when I'm working like this are named "draft". The names don't matter when you're going to redo the history later.

> I always keep moving forward, there's really never been a need to go back to a previous commit that hitting Ctrl-Z wouldn't accomplish just the same.

You've never started down one path for solving a subproblem only to realize 30 minutes in that it's not going to work?
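
For what it's worth, the cleanup at the end doesn't have to be elaborate. A sketch of one way to collapse a pile of "draft" commits into a single commit; note it uses `git reset --soft` rather than an interactive rebase, and the repo, branch and messages are made up:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # don't depend on the default branch name
git config user.name "Example Dev"         # placeholder identity
git config user.email "dev@example.com"
echo "base" > base.txt && git add base.txt && git commit -q -m "base"

# Commit early and often on a private branch, without caring about messages.
git checkout -q -b feature
for n in 1 2 3; do
  echo "step $n" >> work.txt
  git add work.txt
  git commit -q -m "draft"
done

# Move the branch pointer back to where it forked from main, keeping all
# the changes staged, then record them as one proper commit.
fork_point=$(git merge-base main HEAD)
git reset -q --soft "$fork_point"
git commit -q -m "add feature X"

count=$(git rev-list --count main..HEAD)
lines=$(wc -l < work.txt | tr -d ' ')
```

The draft commits remain reachable through the reflog for a while if you need to go back.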


>This is where the final rebase comes in—you should be combining all the small commits into one.

Sorry, but I'm a software engineer, not a git engineer, and the less I have to do with git, the better. KISS applies to git, too. A simple thing like not creating a commit for every stupid thing keeps the history clean, doesn't bog down the developer by requiring to think about writing a commit message every 2 minutes, and keeps git simple.

>Most of my commits when I'm working like this are named "draft". The names don't matter when you're going to redo the history later.

But then what value have you added by naming everything "draft" and creating a commit? There is no value in doing this.

>You've never started down one path for solving a subproblem only to realize 30 minutes in that it's not going to work?

Sure I have, but I don't need to enter it into the git logs. I'll either start over in a clone of the repo if I want to save the bad work for whatever reason (which is very unlikely), or I'll just stash the work, or whatever. The thing I don't need to do is commit the bad work.


> For me, even though rebasing comes with some trappings, I still greatly prefer it to the alternative, which is to have merge commits cluttering up the commit history.

The purpose of history is to remember. Rewriting history, whether in git or in life, is bad, even outside the context of "don't use it on public repos". Such advice is similar to saying, only point the shotgun away from you when firing. If you have to remember such a rule, it's best to avoid it.


But in unmerged branches, you aren't rewriting history, you're starting your work on a more recent commit in history.


Shush, don’t say those things, because maybe people discussing merge vs rebase will realize they aren’t having a discussion, just talking past each other.

Neither side cares what the context is or what the other is actually discussing, but apparently each one just knows better.

I also don’t mean those specific users - but in general any git discussion I’ve seen in the last 10+ years.


A history you can't understand is a history you can't remember.


> I still greatly prefer it to the alternative, which is to have merge commits cluttering up the commit history.

I've heard this many times before, but haven't been able to figure out why this is a problem. In your workflow is it a problem to have a cluttered commit history? If so, could you explain how?


> I still greatly prefer it to the alternative, which is to have merge commits cluttering up the commit history.

GitHub recently added a feature that prompts people to update their branches via merge. It's frustrating because every PR now has dozens of merge commits polluting the history.


A PR with merges is fine by me, it lets me see how the PR has evolved.

What I want is for GitHub to track changes between sets of commits in a PR so that you can do most of the review with merges and "address review comments" commits, and then rebase into well organized, logical commits and review that those have the same diff as the messy history after a force push.
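The check this comment wants from GitHub can be done by hand: note the tip of the messy branch before rewriting it, then compare contents afterwards. A sketch under assumptions: the repo, file name, and messages are invented, and `git reset --soft` stands in for whatever cleanup you actually do.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "dev@example.com"      # placeholder identity
git config user.name "Example Dev"

echo "v1" > lib.txt
git add lib.txt
git commit -q -m "Initial commit"

# The history as it was reviewed: messy, but already looked at.
git checkout -q -b review
echo "v2" > lib.txt
git commit -q -am "wip"
echo "v3" > lib.txt
git commit -q -am "address review comments"
messy_tip="$(git rev-parse HEAD)"

# Rewrite into one organized commit (this is the step that needs a force push).
git reset -q --soft main
git commit -q -m "Bump lib.txt to v3"
clean_tip="$(git rev-parse HEAD)"

# Different commits, same final content: an empty diff means nothing slipped in.
if git diff --quiet "$messy_tip" "$clean_tip"; then
  same_content=yes
else
  same_content=no
fi
echo "same content after rewrite: $same_content"
```

When the rewrite keeps several commits rather than one, `git range-diff` can compare the old and new series commit by commit.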


The problem is PRs that have <5 lines of changes followed by dozens of pointless merges, because users are prompted to merge every time another PR is merged.

It wouldn't be a problem if people took the time to organize the history prior to merging as you said, but most people don't do this.


At least Gitlab does that when you push a new commit (force or not) to the branch: it'll show a list of the values of the branch head commit and you can diff between them.


So does Github, but it breaks if you fix via rebase and push -f. Gerrit and some other competitors manage this better.


Some people have pull set to merge, which is an "I don't know how you can live like that" feature.


Some of us like our history tracking tools to.. track history.


I find it fascinating that people talk about "Having a history of what people did" in such emotive terms - "Cluttering", "Polluting".

What matters is that you end up with working systems. That a lot of change happened is just, well, what happened. It doesn't need to be prettied up and made to look like your development occurred in a clockwork march of cleanliness. It literally does not matter unless you spend a lot of time doing git-bisect.

Let it go. Accept that coding is not a smooth, robotic, endeavour, where everything is always tidy. And that's just fine.


I accepted this a decade ago. I put my ego aside, and now I don't care if my git history doesn't look "beautiful" in the commit graph.

I've been working on dozens of projects since, and probably did thousands of commits. Some of the teams of those projects included dozens of developers working concurrently on the same codebases. We always merged the upstream branches into our development branches and never did any rebases.

I have NEVER ended up in a situation where I thought rebases would have been better. The git tools and IDE integrations of our current age allow me to find any information I need from the history without pain.


Have you ever had to use git bisect? That's really where a 'clean' git history is important. Plenty of people never use git bisect, and that's fine too. That said it's a very useful tool when you do need it, and can drastically simplify finding when and where a regression was introduced.


You can `git bisect start --first-parent` and only bisect top-level merge commits. In most cases that gets you to the ballpark of "PR that introduced the bug" no matter how dirty the commit history inside that PR was, and whether or not you can git bisect further in that branch. In my experience that is most of what you want anyway, "PR that introduced the bug" gives more than enough context.
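A sketch of what that looks like, assuming Git 2.29 or newer (where `git bisect start` accepts `--first-parent`). The repo, the `merge_feature` helper, the branch names, and the "BUG" marker are all made up for the example; the "test" is just a grep.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "dev@example.com"      # placeholder identity
git config user.name "Example Dev"

echo "start" > app.txt
git add app.txt
git commit -q -m "Initial commit"
good="$(git rev-parse HEAD)"

# Hypothetical helper: a two-commit "PR" merged into main with a merge commit.
merge_feature() {  # $1 = branch name, $2 = line added by the second commit
  git checkout -q -b "$1" main
  echo "$1 scaffolding" >> app.txt
  git commit -q -am "$1: wip"
  echo "$2" >> app.txt
  git commit -q -am "$1: more wip"
  git checkout -q main
  git merge -q --no-ff -m "Merge $1" "$1"
}

merge_feature feature-a "feature a done"
merge_feature feature-b "BUG sneaks in here"
merge_feature feature-c "feature c done"

# Bisect over main's first-parent chain only, skipping the wip commits.
git bisect start --first-parent >/dev/null
git bisect bad HEAD >/dev/null
git bisect good "$good" >/dev/null
git bisect run sh -c '! grep -q BUG app.txt' >/dev/null

first_bad="$(git log -1 --format=%s refs/bisect/bad)"
git bisect reset >/dev/null 2>&1
echo "first bad commit on main: $first_bad"
```

Because the wip commits are still in the history, a second, ordinary bisect limited to that merge's branch can narrow things down further if the PR is large.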


You can bisect across the more coarse merge commits, without “destroying” history and losing the ability to bisect across more granular constituent commits. Bisect is more robust when more information is preserved.


This exactly. I'd rather pinpoint the issue to a small commit with only a few changes vs. "well I know which feature caused the issue, now to wade through 65 changed files."


I have never used git bisect, which is maybe why I'm wondering why people care so much about curating and cleaning up git history.


The point of a clean git history is not to have a clean git history. The point is to make it possible to debug later, via bisect, or show, or even just a diff. The point is to make the workspace clean for the next guy.

Instead of letting it go, maybe we should have more discipline and organization in our lives and not less.


It's hard to tell what side you're on, because both sides refer to their stance as "clean history".

The pro-revisionists (squash, rebase) say they do what they do so the history looks clean (no intermediate commits breaking stuff, a "straight line" graph, etc)

The anti-revisionists say they do what they do so the history looks clean (can see the actual development, can safely diff different commits to see what changed in between, see the log in chronological order, etc).

> Instead of letting it go, maybe we should have more discipline and organization in our lives and not less.

Again, both sides could argue that they're the ones with more discipline.

> The point is to make it possible to debug later, via bisect, or show, or even just a diff.

This sounds anti-revisionist.

> The point is to make the workspace clean for the next guy.

This is one of the most common pro-revisionist arguments.


> > The point is to make it possible to debug later, via bisect, or show, or even just a diff.

> This sounds anti-revisionist.

That’s not how I see it. What makes debugging via bisecting easier is self-contained changes, not exactly chronological changes where you temporarily broke stuff and then fixed it before submitting your PR.


100% agree, but nobody gives a shit, and I’ve learned to just let it go. I’ve been in so many meetings, seen so many PSAs, and you know what happens every single time? Nothing. Maybe a couple people learn what interactive rebase is for the first time, try it once, say “it lost all my code” and never try it again. Good luck explaining the reflog in these cases.
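For what it's worth, the reflog explanation can be fairly short. A minimal sketch, assuming a throwaway repo with invented file and branch names: whatever a rebase did to a branch, the branch's own reflog still records where it pointed before, and `git reset --hard` puts it back.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "dev@example.com"      # placeholder identity
git config user.name "Example Dev"

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"

git checkout -q -b feature
echo "my work" > feature.txt
git add feature.txt
git commit -q -m "Add feature"
before="$(git rev-parse feature)"

# Meanwhile main moves on, and the feature branch gets rebased onto it.
git checkout -q main
echo "more" >> base.txt
git commit -q -am "Update base"
git checkout -q feature
git rebase -q main
after="$(git rev-parse feature)"

# The branch reflog lists its previous positions; entry 1 is the pre-rebase tip.
git reflog show feature
restored="$(git rev-parse 'feature@{1}')"

# Undo the rebase by moving the branch back to that entry.
git reset -q --hard 'feature@{1}'
```

The same approach works after an interactive rebase that went wrong, as long as the reflog entries have not expired.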


Did you notice, though, that rebase advocates use very "emotive" terminology when talking about git history? Like it's a subject they care about? Seems awfully touchy feely.


You say that like it's a bad thing. If there are two groups of people, and one of them is indicating (via words or behavior) that they don't care about something all that much and the other is indicating that they do care about that something quite a bit, why would I ever listen to the ones that don't care? It is almost tautological that the group that actually cares is going to have the more persuasive arguments and is thus far more likely to be right than the apathetic group.


I don't know if "emotive" is the right word, because to me this whole discussion is like trying to tell someone to be less sloppy because they make a mess when eating at their desk, knowing that the custodians will clean up after them.


> What matters is that you end up with working systems. That a lot of change happened is just, well, what happened. It doesn't need to be prettied up and made to look like your development occurred in a clockwork march of cleanliness. It literally does not matter unless you spend a lot of time doing git-bisect.

And git blame. And git checkout to a past state. It "doesn't matter" only if ease of understanding your project history doesn't matter.


How often is "understanding your project history" something that actually comes up for you? In all my years of working with projects in git, I will occasionally look at my history to help me find a change that may have led to a bug, but it really only comes up for me once or twice a year and even then, it is rarely an extensive deep dive and never very far back in time.


>how often is "understanding your project history" something that actually comes up for you?

Frequently, for any long and complex project. Large amounts were written by people no longer working on it, and the history of how things came to be can help fill in documentation gaps and make intent clear.

By "frequently" I mean something like "I check history for about 2/3rds of bug fixes, and 1/4 of adding features" to understand the surroundings better, when writing or reviewing. Anything that makes that better saves me hours per week.

It catches and prevents more than enough subtle issues to be worth the effort.


I'm on a long and complex project. However, most of the previous folks were not very good, which is one reason I'm here to fix it. Their history is not particularly useful except to giggle at.


Do you work with other people or on large codebases at all? It comes up pretty much weekly for me.


> I will occasionally look at my history

It's others' history that I'm usually interested in. I can easily follow the small diffs of individual commits, but have a much harder time grokking a wall of red and green.


When I’m on call and discover at 3 AM that we’re doing something weird, I need to know whether we meant to do that and especially why. In theory you could write all that down, but the people who aren’t doing that in git also won’t do it outside of git. The more you write down, the less likely it is that I need to page you to ask WTF.


I read git commits in either the repo I am working on or a dependency repo almost every day


It comes up often enough. I run "git blame" frequently to figure out why something odd-looking was introduced. It may not be a bug, but a WTF. This is in an environment with few code reviews, despite my attempts to introduce them. It is frustrating.


Sometimes. Once every few months. Sometimes it conveys useful information. Sometimes it just hits the "product imported from previous VCS a decade ago" commit.


Same, and I've worked on some large, long-running projects. But never at the scale of a big tech company.

I've used git bisect the few times I've had to diagnose an issue that wasn't detected immediately, which gets you down to the exact granular commit.


I never use rebase, and I've never once had trouble understanding who did what where and when, even in a large project with 500+ users.

That being said, after reading this stuff, I may start using it on my local branches to clean up multiple commits into one tidy one, but that's about it.


> I never use rebase, and I've never once had trouble understanding who did what where and when

And a well-organized commit can also tell you the “why.”


Every time I try to blame or bisect and just end up stuck on an irrelevant megacommit I curse the Git maintainers that don't have the backbone to just get rid of --squash.

Every time I try to review a PR and the bookmark resets because they decided to force push I curse the Git maintainers that don't have the backbone to just get rid of rebase.


I think if the definition of a “good history” is “clean and not messy”, then yes I agree that’s pointless. If the definition is “a clear ability to see what changes were made, by who, and most importantly why” I think that’s incredibly necessary and would even go so far as to say it’s naive at best to not support.

The amount of time that has been saved in my life by someone leaving an explanation in their commit (for some weird edge case or context I’d have no way of gleaning because they’ve since left the company) is SO much more than the extra time I’ve put in to make sure the history has this extra info in it.


What's worse, the desire for cleanliness ends up making things like `git bisect` less useful.

If I had a bad day and introduced something stupid, I want a bisect to point me at the code I wrote on that bad day. If you squash liberally, perhaps because you want each commit to correspond with a release-note, you're going to lose that debugging granularity.


The git history of a project is the main source of knowledge on that project, once the people that wrote it are gone. The git history answers questions such as "wtf is that supposed to do?", "what's this code connected to?", and "why did they do it that way?". You can use other kinds of documentation, but the git history is always there, so it makes sense to make it semi-useful.


This is such a strange thing to say. I'd be curious if you feel the same way about cleaning up your code, or cleaning up your room. I think you have an unfair advantage in this argument because it's difficult to defend such intangible benefits. We have to resort to making up logical explanations, or sounding unhinged or emotional as you suggest.

But it's simply intangible. My instinct tells me that it's helpful and that's okay. I don't owe anyone a justification for how I organize things, and there's nothing controversial about this. (Or maybe I could even come up with a logical example of a benefit, but that's a trap I'm not going to fall into) And a lot of people agree, and they know what I mean, so it's not merely an individual preference. If I have to work with someone who has strong preference against it I'll worry at that point about negotiating.


> I'd be curious if you feel the same way about cleaning up your code, or cleaning up your room

Very genuinely: I do not care at all whether you clean your room starting from left and continuing to right. Or, starting from doors and continuing toward window. Or whether you clean it in a random order. I also do not care about whether you clean every Friday or whenever you feel like. That is the equivalent of git history. Because this excessive care about git history is just that - insisting that room is cleaned from left to right as if any other order was an issue.

The reason why it is hard to defend the tangible benefits of this or that git history strategy is that there are very few benefits.


> I do not care at all whether you clean your room starting from left and continuing to right

But you didn't say that you don't want it clean. It sounds like you're talking about how it's organized rather than whether it's organized.

> The reason why it is hard to defend

I'm talking about intangible benefits and no that's not the reason. Intangible benefits are inherently difficult to defend in words. Citing this as evidence of anything is akin to a debater's trick.


Insisting on highly organized git history is like insisting on particular order of cleaning. History is not the product itself. It is not the code itself. It is less important and matters only a little.

In the rare situation when I have to read it, I am perfectly ok looking at previous commit too or whatever. It is still less overall work than what people describe in here.

Even with room, I do not want my room infinitely clean. I am ok when books are not ordered by height and color for example. I do not need t-shirts ordered by color either.


A clean git history on a pull request also makes it easier for the reviewer to understand your code. Small, concise commits will tell the reviewers about your train of thought or what issues you ran into, making it easier to pick up the context. I start every code review by looking at the commit history.

I prefer not to have squash commits in our team for this reason. It makes master look good, but usually nobody ever looks at the master commit history first, they look at the merged pull requests. However, everybody must look at the commits you made in a pull request. If you have squash commits, you are encouraged to have messy commit history in your pull requests, leading to meaningless commit messages and even large commits (causing other problems...).

IMO the only advantage of squashing is that it makes it easy to roll forward when you accidentally deploy something that causes problems.


Yeah we use pull requests for the coarse-grained stuff and leave the small commits, which should also have good comments, intact. Maybe other shops use pull requests differently.


Agree, plus let's avoid having the CI pipeline create commits in the remote repo. I like CI/CD to be stateless with regard to the files in the repository. I tried to plead for this today with my colleagues, with very mixed results.


It’s ego


I’ve never understood the tradeoff of rebasing, squashing or otherwise “keeping a clean history”. It always seemed like tons of sometimes highly error prone work (sometimes you can wipe out a colleague’s work with it! Wtf!), for almost no gain (why does it matter that the git history is “clean”?).


It matters because when I:

* use filtering commands like "git log -S"

* press the "annotate" button in my IDE and can see which commit introduced each line

* run "git bisect"

* use "tig" to drill down through the history of a file (shortcut "," is "move to commit preceding current line's blame commit")

...every step of the way, I get a meaningful description of why a change was made and what other diffs were necessary to achieve that change. And not just "fix", "bug", "PR comments".
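To make the first bullet concrete, here is a small sketch of a pickaxe search in a throwaway repo. The file `settings.conf`, the `retry_limit` setting, and the commit messages are hypothetical. Note that `-S` reports commits that change how many times the string occurs, so it finds the commit that introduced a setting, not every commit that touched its line.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "dev@example.com"      # placeholder identity
git config user.name "Example Dev"

echo "timeout=30" > settings.conf
git add settings.conf
git commit -q -m "Add settings file with a default timeout"

echo "retry_limit=3" >> settings.conf
git commit -q -am "Add retry_limit so flaky uploads are retried"

echo "# see the ops runbook" >> settings.conf
git commit -q -am "Point readers at the ops runbook"

# Which commit introduced the string, and what did its author say about it?
found="$(git log -S"retry_limit" --format=%s)"
echo "$found"

# Per-line view of the same history (-s hides author and date).
git blame -s settings.conf
```

With messages like these the search result already answers "why"; with three commits all named "fix" it would only answer "when".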


* `git log --first-parent -S`

* `git blame --first-parent`

* `git bisect --first-parent`

* At least one "tig-like" with a --first-parent first UI: https://github.com/kalkin/git-log-viewer


--first-parent is incredibly useful, but having both merge commits and properly curated atomic commits under them is even more useful.


> * press the "annotate" button in my IDE and can see which commit introduced each line

In PyCharm, I can see which commit introduced each line, regardless of branching. Same with drilling down through a file's history. Is this an IDE limitation you're seeing?

> every step of the way, I get a meaningful description of why

Isn't this more about commit messages, than anything else?


The context of my comment is the usefulness of a clean history, not about merging vs rebasing.

> Is this an IDE limitation you're seeing?

I'm using Jetbrains too.


> why does it matter that the git history is “clean”?

Makes reviewing a set of changes prior to a merge much easier. It's nice if there's a 1:1 correlation between a commit message and the actual patch contents.

I'm sure you've dealt with the case of reviewing a colleague's changes with a commit message like "Enable logging in foobar module" and the patch is actually enabling foobar logging and a bunch of other stuff.

This makes bisecting your git history to identify and fix bugs much more difficult.

If the git history is clean, you can just read the commit messages and implicitly trust the developer if clean git hygiene is in place (as opposed to actually needing to read the whole diff on a per-commit basis to find out what _actually_ happened at commit XYZ, despite its message).


For me the big gain is at the code review stage. It's much easier to review a set of patches that are a clear and distinct sequence of changes without "oops, fix bug" changes later in the series. It does require extra work by the code author, but it means less work for the code reviewer. Depending on the project and the organisation and the workflow, that can be a worthwhile tradeoff.


Never understood why you wouldn't want it clean. There's no benefit whatsoever to it being messy and it's a liability for a lot of reasons, whereas the clean version is free and easy and makes everything you do that interacts with git history simpler.


There is a giant benefit to it being messy. And that is that the mess is the actual history.

Every time you do a git rebase, you are literally asking your source control system to lie about history. If you mess up, and you eventually will, you're then forced to manually figure out what the history really was despite being lied to. If you mess it up, well, good luck.

I used to work at a company where someone (we never figured out who) in another group would rebase every few weeks. We didn't find out about it until their stuff was pushed then released. The result was that features which we'd written, QAed, and released to production would simply disappear a few weeks later. With no history suggesting that it ever existed.

Have you ever been pulled off of a project to go fix a project from a month ago which has disappeared from source control? You don't know what happened, you no longer have context, you've just got complaints because your stuff no longer works.

Is your desire for a "clean history" worth potentially creating THAT disaster for other developers on your team???


> the mess is the actual history.

The true history is not recorded in your normal commits either. Every time you modify your source buffer, that is the true sequence of events. This truth is lost already as you undo/rework things before you commit. You're ALWAYS manipulating and telling a false story of history whether you realize it or not.

Commits are a tool that gives stronger backup/undo protection than simple file saves and in-memory editor undo lists. Just because you happened to save your work in a commit doesn't mean it should instantly be regarded as holy history, any more than if you had simply saved the file.

I think the bar for "holy" history should be whether it is published to a shared branch.


I believe that you are confusing "full history" and "true history".

Every single point in the commit history represents an actual state of a repository at a specific point of time, along with the information of which point or points were next before it. This is all part of the true history.

This is not a full history - you don't have every keystroke, abandoned commit, switch between branches and so on. But nothing that you're being told is wrong.

As soon as you do a rebase, you're rewriting history. You're claiming that there were specific points of time with specific states that never actually existed. You're losing information about points of time and specific states that actually existed, which someone once considered important enough to do a git commit over.

The difference becomes important if that someone, which at a previous job was me far more often than I would like, tries to go back to the historical commit. And finds that it is gone without a trace.


> As soon as you do a rebase, you're rewriting history.

Agreed. But the rewrite occurs in your private branch. Its history is just as private as the undo list in your editor. No one cares about what's going on in your editor's undo list. And by the same logic they shouldn't care about commits in a private branch.

> You're losing information about points of time and specific states that actually existed

If you avoid rebase, then you end up "rebasing" without rebasing. You "squash" intermediate states by never recording them to begin with.

Failing to record history is not superior to squashing it.

> And finds that it is gone without a trace.

I don't have the details, but it sounds like someone rebased a public branch. Yes that is bad. But it's sort of like saying we shouldn't drive cars because someone chose to drive the wrong direction down a 1 way road.


> Agreed. But the rewrite occurs in your private branch. Its history is just as private as the undo list in your editor. No one cares about what's going on in your editor's undo list. And by the same logic they shouldn't care about commits in a private branch.

The rebase becomes part of the public branch eventually, inflicting your lies on everyone else.

> If you avoid rebase, then you end up "rebasing" without rebasing. You "squash" intermediate states by never recording them to begin with.

If only Git had a third alternative, a way to... entangle two diverging branches of history without destroying or rewriting either. You could say it would be a bit like a car merging into a highway.

> Failing to record history is not superior to squashing it.

"We don't know" is at least an honest statement. Claiming that you do but then making up some nonsense is something that the LLMs do enough of already.


I think the crux of the argument is what you think about private git commits. You may think of them as "holy" history. Assuming the commits are still private, I give them no more prestige than the editor's undo log.

What do you think of the editor's undo log? It's a very real historical log. Should it be treated as "holy" history too? If not, what makes the undo log less true/important than a private git commit log?


If you did X then Y then Z, there's a difference between saying "I did Y, Z, and X" (squashing/summarizing) and "I did Y then Z then X" (rebasing).

Squashing is often dumb and unhelpful, because you're now re-summarizing the points in time that you already considered worth highlighting when they happened (when you had the most context to judge them!).

Rebasing is lying about the order and/or context that those changes happened in.

Your undo log is comparable to squashing, but not at all to rebasing.

And then again, the first-order vs second-order summarizing distinction matters, and you already capture the second-order summary in your merge commit. Squashing is just destroying information for zero practical benefit.

> private

You keep using that word, but branches are often a lot less private than you think. Push it to get a colleague's input on something? Congratulations, it's now public. Created a pull request that you want to revise? Already public.


Do you consider it lying when a commit doesn't include all changes in the working tree at the time it was committed? How about when a committer adds a file to .gitignore?


My advice is don’t engage with the ‘rebase is a lie’ argument. It is a textbook bad-faith argument, since it deliberately and explicitly ignores the stated intention behind rebase. It’s a talking point that people like to parrot without fully understanding what the author of the argument (Fossil developers) meant, and without fully understanding the implications of the argument. FWIW, HN mods have previously confirmed that repeating this hyperbolic claim goes against HN guidelines.

Fun note though, I argued this directly with Dr Hipp (principal author of SQLite and of Fossil, inventor of the ‘git rebase is a lie’ argument) and during that discussion, he agreed to soften the language on the Fossil pages. They are still hyperbolic, using the word ‘dishonest’, and continue to distort the reasons and usage behind rebase, but he did remove some instances of the word ‘lie’ and ‘lying’, which is progress.

It’s a bit of a shame that they haven’t found the strength to frame Fossil in a positive light without trash-talking the competition. There is a good-faith argument for Fossil vs git, but they’re choosing not to use it.


> Every single point in the commit history represents an actual state of a repository at a specific point of time, along with the information of which point or points were next before it. This is all part of the true history.

Committers always have a choice of which of the changes present in their working tree they stage and then commit. The commit history is always a flat approximation of the real evolution of the files in the repo.


Git was never intended to capture and preserve the order of git commit commands, and you’re making incorrect assumptions that it was a goal. Maybe the git devs knew there is no useful information and no benefit to being so strict about something so arbitrary, or maybe they knew that forcing people to keep their commits set in stone in whichever arbitrary way they were first added is a big disincentive to making commits at random points in time, which somewhat undermines the point of having a version control system at all.

BTW rebase produces a new commit ordering, but does not modify the old one.

> You’re claiming that there were specific points of time with specific states that never actually existed.

No. You are asserting intent on the part of git users and git that has never existed, you have misunderstood what git history is. The git history is not a claim that the state at that point existed during development, you are projecting your own goals that are not shared by git or git users.

> You're losing information about points of time and specific states that actually existed, which someone once considered important enough to do a git commit over.

Hehe this is so full of assumption. You write it like I’m rebasing someone else’s work, but you already know I’m only rebasing my own commits, and I’m the one who decides what’s important enough to do a commit over.

I like commit early, commit often. I want to make small incremental commits that don’t display to others that way and I expect to put small commits and fix ups together later into a single useful commit with only one commit message.


> the mess is the actual history.

This argument reminds me of a scene from Yes Prime Minister [0]:

> Humphrey: The minutes do not record everything that was said at a meeting do they?

> Bernard: Well of course not.

> Humphrey: And people change their minds during a meeting don't they?

> Bernard: Well, yes.

> Humphrey: The actual meeting is a mass of ingredients for you to choose from.

> Bernard: Oh, like cooking?

> Humphrey: No, not like cooking. Better not to use that word in connection with books or minutes. You choose, from a jumble of ill-digested ideas, a version which represents the Prime Minister's views, as he would, on reflection, have liked them to emerge.

> Bernard: But if it's not a true record...

> Humphrey: The purpose of minutes is not to record events, it is to protect people. You do not take notes if the Prime Minister says something he did not mean to say, particularly if it contradicts something he has said publicly. You try to improve on what has been said, to put it in a better order. You are tactful.

> Bernard: But how do I justify that?

> Humphrey: You are his servant

> Bernard: Oh, yes.

> Humphrey: A minute is a note for the records and a statement of action if any that was agreed upon.

I think the analogy is pretty clear. A pull request does not record every single little change you made when writing it. You choose, from a jumble of ill-digested ideas, a version which better reflects your intent as you would, on reflection, have liked it to emerge. It doesn't matter that it's not a true record, since its purpose is not to record events but to communicate ideas. You try to improve on the commits as they have been written, to put it in a better order. You are tactful.

[0]: https://youtu.be/MF-Qnv2Srfs?si=U6xKNLrTAIn5h1Iz&t=118


You do realize that Yes, Minister is satire...?


IMO the relevant 'true history' is at the plane of 'things relevant to other developers'. As far as anyone is concerned, my local work history is that I atomically wrote all my code in a single instant. Commits are that unit of atomicity.

But I have never seen a commit disappear while rebasing, ever. That workflow is busted somehow. They were doing it wrong.


The golden rule is "do not rewrite history of a public branch". Rebase/squash your PR branches to your heart's content, but once it's merged that's it.

You get clean history by not merging branches with 50 intermediary "fiddling with X" commits in them.


Related, it is a bad idea to use long-running per team branches. Merge early, merge often, get commits to trunk ASAP. With best practices on unit testing, code review, and so on, this scales to many thousands of developers. And will save a lot of pain over time.


> I used to work at a company where someone (we never figured out who)

Wouldn't this be trivially solvable by git bisecting your deploy branch?


No, because git bisect operates off of the information in the history. And thanks to the bad rebase, the history no longer existed in the branch.


Just clarifying ... they took a version of master from say a month ago, did their work on it, then, they force pushed their work out, wiping everyone else's work that was added to master since one month ago?

I mean that's the equivalent of reversing your JCB through a house on a building site because "the house was not there a week ago when I last moved the JCB".

or am I missing something?


Basically equivalent to that. This was about a decade ago.

We had 4 teams, each released on a schedule. Each team had a branch. When a team released, it was merged from master, then each team pulled. The person who did the pull would change, and it was often a rebase.

So my team released, some other team rebased badly. There would be no sign of problems for us until after they released. But since 2 teams generally released at once, and people didn't remember who actually did the merge a few weeks earlier, it was hard to figure out who was actually messing up. (I had suspicions, but no proof.)

I've seen rebase used appropriately since. But that disaster left scar tissue.


So (sorry for picking on this scab) there were 4 branches, each of which was, for the team concerned, the next beautiful branch. Then, say, each week a team would merge into master and everyone would rebase, but fuck that up once and now the beautiful branch team B is working on is out of step with the real master ... oh god.

Yeah that would leave a mark.


It looks like you’re talking about rewriting history that has already been shared. Pretty much everyone agrees that is a bad idea. It even has a prominent warning in the git book.

What others here, including myself, are advocating is rewriting your own history before you share it, which makes a very different set of trade-offs.


But that makes no sense, as the second someone pulls from that branch it would be noticed.


Wasn’t `git reflog` viable?


How do you find the right commit from a month ago to reflog to?

How do you determine which combination of changes are yours, and not accidentally undo changes that other developers made in the last month?

Could git reflog have helped? Maybe sometimes. If we had the foresight to have saved the right commits from the past, sure. I think we did start saving old branches just in case it happened again, but then we had to sort through a month of changes to figure out what to keep, what to change, and what conflicts there might be.

Remember, the person screwing up didn't know he screwed up. And the person trying to fix it is doing so a long time after the fact. It was a disaster.


If you think the mess (i.e. 50 mixed up mini commits as you work on a feature and iterate on it) is useful history (it usually isn't) then use a different DVCS because git was not designed around this mode of operation and a lot of its advanced features work poorly when applied to such a history.

Alternative DVCSes which support this workflow include: fossil


this exact problem led us to block all --force pushes.
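
For anyone wondering what that can look like on a self-hosted remote, here is a minimal sketch using git's `receive.denyNonFastForwards` setting on a throwaway bare repo. This is an assumption about one way to do it, not a description of our actual setup; hosted services usually expose it as a branch-protection option instead. The repo and file names are made up for illustration.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# "Server" side: a bare repo that refuses non-fast-forward updates.
git init -q --bare server.git
git -C server.git config receive.denyNonFastForwards true

# "Client" side: a throwaway repo with two commits pushed to the server.
git init -q client
cd client
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"
git remote add origin ../server.git
echo one > f.txt
git add f.txt
git commit -q -m "one"
echo two > f.txt
git commit -q -am "two"
git push -q origin main

# Rewrite the last commit locally, then try to force it onto the server.
git reset -q --hard HEAD~1
echo rewritten > f.txt
git commit -q -am "rewritten"
if git push -q --force origin main 2>/dev/null; then
  force_push=accepted
else
  force_push=rejected
fi
echo "force push was $force_push"
```

The server keeps its original history, and the client has to merge or rebase onto it before pushing again.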


Yeah, that's a standard feature in large organizations afaik.


You can have it clean without rebasing. This is simply a matter of properly visualizing the history. Unfortunately, most of the major version control systems have decided to basically just dump a raw graph rather than presenting the history in a more user-friendly fashion.


I want it clean. But I don't want to do the work to make it clean. The juice is not worth the squeeze.


As your parent already said, (and it matches my experience), we didn't have any of these problems when using systems without history rewriting (mercurial, for example).

I recall when I first switched to git at work and the team was insisting on a "linear history", I was bemused: Could these developers really not handle a merge graph? It was bizarre how something straightforward in other VCS is suddenly "messy" amongst git folks.


It's not that we/they 'can't' handle it. It's that we choose not to because it's a better life.

It is like moving to one's favorite part of town.


> It's not that we/they 'can't' handle it. It's that we choose not to because it's a better life.

As someone who has had to put up with rebase for several years now, my life is definitely not better. And things that were trivial in mercurial are now complicated (seeing the actual chronology - both in the log and in the graph). That graph with multiple branches that git developers find messy can actually be really useful.


Is it free though?

I'm a fan of rebase myself, but understand the point made above. For me, the biggest pro of a clean history is when doing `git blame`. If the history is clean and the commits are good, it might solve my issue. On the other hand, if the commit in question is a huge mess of unrelated things it doesn't help me at all. I also find it way easier to review a PR with a clean, well-described history.


Well I also object to multiple unrelated changes in the same commit. Those should be separate commits, and separate code reviews for that matter.


You can run `git log --first-parent`, which gives you the same output as squash merging without losing the ability to effectively manage stacked branches/PRs.

But because the history views in GitHub and other tools just flatten merge commits into spaghetti, we’re stuck with squash merge. Thanks GitHub.
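
A minimal sketch of the first-parent view, assuming a throwaway repo with one feature branch merged with `--no-ff` (branch names and commit messages are made up for illustration):

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

git commit -q --allow-empty -m "Initial commit"

# A feature branch full of noisy intermediate commits.
git checkout -q -b feature
for n in 1 2 3; do
  git commit -q --allow-empty -m "wip $n"
done

# Merge with an explicit merge commit so main stays the first parent.
git checkout -q main
git merge -q --no-ff -m "Merge feature: add widget" feature

full_count=$(git rev-list --count HEAD)
first_parent_count=$(git rev-list --count --first-parent HEAD)
echo "all commits: $full_count, first-parent only: $first_parent_count"

# The linear view: only the merge commit and the initial commit show up.
git log --oneline --first-parent
```

The wip commits are still in the repo for archeology; they are just hidden from this view.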


Another thing is if you keep commits in a clean "state", it is easier to revert a commit, when you squash or keep them messy it can make it harder to revert.

Also sometimes you decide you want to backport some change to other releases, and if commits are in a good state, it is much easier to do this.


Sometimes when working on an old code base built by developers that came and went, one needs to perform what I call "code archeology": going back in time to understand why a feature was implemented the way it was.

Whether this is feasible at all depends largely on the care developers put in structuring their commits.


Yes! The source code is itself the first level of documentation. The commit history is the second.

A free form textual interface to document everything about why you made the changes you just made? Why not maximize the value of this resource!


This has become a large chunk of my job over the past few years, as part of fixing/upgrading systems no one has touched in a decade, and none of those original people are still here. There are some weird things in there I've only been able to figure out because all the svn history still exists.


When an engineer made a change is of no consequence to me. When it got merged into the main branch does matter a whole lot if you're doing trunk-based development.


> sometimes you can wipe out a colleague’s work with it! Wtf!

I’m not necessarily on Team Rebase, but isn't this just as likely with merging gone wrong?


Not nearly as badly.

A badly done merge can indeed ruin code. But you'll always have the versions that went into the merge, and the merge itself. Your history has all of the information to recreate exactly what happened, find what changed, and then figure out how to fix it.

A badly done rebase not only ruins your work, it also removes from the branch any record of your work having been done. Unless you can find the right stray old commit which is not yet cleaned up, there is no choice but to start doing it again from scratch.


It makes your git graph instagrammable.


It's for humans. You can more easily cycle to a specific point. I find linear history easier to comprehend. But it's not like a game ender. People will do whatever they will.

I find it easier to run git binary search with it like this too.
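
By "git binary search" I mean `git bisect`. A minimal sketch on a throwaway linear history, assuming the bug can be detected by a script (here a hypothetical check that greps for the word "bug"):

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

# Five linear commits; the third one introduces the "bug".
for n in 1 2 3 4 5; do
  if [ "$n" -eq 3 ]; then
    echo "bug" >> app.txt
  else
    echo "line $n" >> app.txt
  fi
  git add app.txt
  git commit -q -m "change $n"
  case "$n" in
    1) good=$(git rev-parse HEAD) ;;
    3) culprit=$(git rev-parse HEAD) ;;
  esac
done

# Bad = current HEAD, good = first commit. The check exits 0 when the
# tree is fine and 1 when the bug is present.
git bisect start HEAD "$good" >/dev/null
git bisect run sh -c '! grep -q bug app.txt' >/dev/null
found=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

echo "first bad commit: $(git log -1 --format=%s "$found")"
```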


> keeping a clean history

This being a principal reason for VCS, I very much understand the motivation.


"Clean history" is not a principal reason for VCS. "Full history so you don't accidentally lose something and can revert to any point" is the principal reason. When "clean history" conflicts with "full history", the choice should always defer to the latter. Rebase clearly breaks the full history principle.


There is obviously a point where you do not want full history. For example, it would be absurd to use your editor's undo history tree with keystroke-level granularity as your VCS, because at most of those points in history the code won't even compile because you were in the middle of typing a word.

If your goal is to be able to revert the codebase to a previous version, then you want your history to be a series of well prepared, atomic changes where at each point the software is actually functional.


Clearly that's the reason you make a commit to begin with. Erasing that commit history is just silly.


I make a commit whenever I want to save my progress, and rarely is it ever in a state that could even be called a "version" of the software that one would want to revert to, until my feature branch is complete and ready for code review.

At that point I have a much better idea of the scope of my changes, and I can revise them into a few coherent commits, rather than a mess of "WIP" commits that are not a useful history to keep.


> and I can revise them into a few coherent commits,

Why bother with this step at all? It's literally pointless and serves only to stroke an (IMO) silly aesthetic preference.

> rather than a mess of "WIP" commits that are not a useful history to keep.

I also disagree that WIP commits are not useful history. You might have explored 2 or 3 different abstractions to solve a problem, and picked one but it turned out to be the wrong one, and one of the others would have been a better choice, but now you've lost the history where you explored these options. Are you suggesting it's no loss to erase those other commits and the context around which you thought it wasn't a good choice at the time?


There are a lot of reasons to make a commit other than wanting to make an atomic change meant to be integrated onto shared repository. Ignoring those is just silly.


I agree 100%, and never said otherwise, but this seems like an orthogonal point to what we're discussing here.


The principal reason for VCS has nothing to do with "history" at all. It has all to do with "versions".

What exactly the word "version" means depends on context. If you put your essay under git, you probably want to track how it was changing with time. When you hack on some codebase and throw things at wall to see what sticks, you want to be able to go back to the previous attempt should your next one turn out to be useless. You may even want to commit things just because it's a convenient way to send things to be built by CI - so you basically produce "versions" to test.

But when you collaborate with others to develop a project, nobody cares about whether you made typos during your hacking session and had to come back to fix them. It's not useful information and it never becomes a "version" of a shared project, because why would it? It's just confusing and wastes people's time on review, and makes things like blame and bisect harder to use.

Or when you put a tutorial under a git repo, with each commit representing the next step to achieve a certain outcome. You may have tweaked each step gazillion of times to perfect it, but that "history" is completely irrelevant for the resulting repo. It's meant to store "versions", not "history". Those may correlate, but don't have to.

A git repo is a data structure that you operate on. Treat it as such and use to achieve your goals.

What's funny is that "squash on merge" strategy gets you worst of both worlds. You don't get nicely curated versions in your project because if someone actually cared to fine-tune their MR you just throw that information away, and the rest doesn't care in the first place anyway. Rebasing and squashing is an incredibly useful tool for everyday use by developers in their work, but it often gets used as a band-aid for lazy developers instead.


> What exactly the word "version" means depends on context. If you put your essay under git, you probably want to track how it was changing with time.

In other words, the history.

> It's not a useful information and it never becomes a "version" of a shared project, because why would it?

Because the real world doesn't match your ideal of how software development works. You need the full history because sometimes innocuous looking "simple fixes" are neither innocuous nor fixes, and because obscuring the history makes auditing and merges more difficult, not less.

> It's just confusing and wastes people's time on review, and makes things like blame and bisect harder to use.

So blame and bisect are poorly written therefore you should hack around them using a convoluted process that obscures the true history and raises the risk of requiring people to redo weeks worth of work by overwriting branch histories on shared repos. Great idea.

The authors of Fossil and Sqlite did a complete breakdown of everything wrong with rebase and what a proper tool should do, so I won't belabour the point further:

https://fossil-scm.org/home/doc/trunk/www/rebaseharm.md


> In other words, the history.

Yes, that was my point. Have you read it?

> You need the full history because sometimes innocuous looking "simple fixes" are neither innocuous nor fixes, and because obscuring the history makes auditing and merges more difficult, not less.

Obscuring logically split and curated commits makes auditing and merges more difficult. Obscuring pointless edit history of the developer makes them easier.

> So blame and bisect are poorly written therefore you should hack around them using a convoluted process that obscures the true history and raises the risk of requiring people to redo weeks worth of work

blame and bisect are powerful tools that can work sensibly in various repository topologies. However, if you're either intentionally putting garbage into your topology, or not utilizing it well because of ill-defined idea of "clean history", you're simply doing yourself a disservice and induce unnecessary mental load.

> by overwriting branch histories on shared repos. Great idea.

Who said anything about overwriting branch histories on shared repos? It has its uses too (it can be useful in some cases when you're a downstream working on a project maintained upstream, for example), but it's not what most projects will ever want to do. That's not what rebase is there for.


Part of not losing stuff and reverting is being able to find stuff and precisely revert.


I love rebase (I'm a tip-of-master-only person, no merges ever, squash all your commits with `rebase -i` before pushing and write one good commit message for the group). But there's one really, really irritating thing about them:

You should not be able to use `--amend` during a rebase.

For me editing all my changes onto the commit I'm working on with `git commit -a --amend` (or as I've aliased it, `gcaa`) is automatic; I do it 500 times a day, just to save my work. But I can't count how many times I've been in the middle of squashing commits and accidentally typed `gcaa` and amended someone else's commit after fixing a merge conflict, and it's super annoying to unwind (if you realize after typing `rebase --continue`) so usually I end up just giving up and starting over. I really wish amending to a commit that wasn't one of the ones you're rebasing was just totally disabled.

I guess there are some other small complaints, like the annoying reversing of `--ours` and `--theirs` from what makes sense (yes, it makes sense if you have the internal model of rebase instead of the intuitive one, but that's stupid), rebase's tendency to pick the wrong parent commit if you've accidentally amended someone else's commit (and therefore lag a while and then produce a rebase log of 1000 commits or something), and the utter tedium of editing the rebase log to replace every instance of "pick" with "s" for squash except the first, since almost 100% of the time what I want to do is squash everything (and use the last commit message, not the first, and definitely not all of them munged together which is the default).

I would love a separate command or a flag, like "git rebase --tip" that does all of this automatically for my otherwise extremely elegant workflow (and I'm gonna be really bummed if it turns out it exists and I didn't know about it for the last 5 years...).


Random thought: given you already have the gcaa alias, perhaps you could include a check that .git/REBASE_HEAD doesn't exist in that?

Probably easiest as a little shell function like

    gcaa() {
      local GIT_DIR
      if ! GIT_DIR=$(git rev-parse --git-dir); then
        return 1
      elif test -f "$GIT_DIR/REBASE_HEAD"; then
        printf 'Rebase in progress: commit --amend is disabled\n' >&2
        return 1
      fi
      git commit -a --amend "$@"
    }
rather than an alias?

[Edit] I forgot about rev-parse --verify, which simplifies this further:

    gcaa() {
      if git rev-parse --verify REBASE_HEAD >/dev/null 2>&1; then
        printf 'Rebase in progress: commit --amend is disabled\n' >&2
        return 1
      fi
      git commit -a --amend "$@"
    }
This also leaves you still able to use commit --amend long-hand if (for example) you want to edit one of your own commits during rebase -i.


Dude your "random thoughts" are better than a lot of folks' work output ツ


That's a great idea. Git problems are sorta in the category of problems I've avoided trying to solve because, can't solve everything, and I've been hacking around it successfully instead.


You can shorten `>/dev/null 2>&1` to `&>/dev/null`.


Unfortunately that's bash-specific I think: the POSIX shell spec still hasn't picked it up. (Though strictly speaking, my 'local' is out of spec too.)


Also works in Zsh. Can’t speak for other shells.


I’m curious how this workflow differs from `git merge --squash`.


I dunno, I've literally never used merge ever since realizing what it does to the commit history.


Then you should probably look up squash “merges,” because I believe that’s your `git rebase --tip`. I use quotation marks because it’s not an actual merge. It just takes all the changes from the feature branch and stages them in the index, ready for you to commit as one single non-merge commit.
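
A minimal sketch of what that looks like, assuming a throwaway repo with a `feature` branch of three wip commits (all names are made up for illustration):

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "base"

# Feature branch with several small commits.
git checkout -q -b feature
for n in 1 2 3; do
  echo "step $n" >> widget.txt
  git add widget.txt
  git commit -q -m "wip $n"
done

# Stage the combined changes on main, then commit them as one commit.
git checkout -q main
git merge -q --squash feature
git commit -q -m "Add widget"

main_count=$(git rev-list --count HEAD)
# A real merge commit would have a second parent; this one does not.
if git rev-parse -q --verify "HEAD^2" >/dev/null; then
  is_merge=yes
else
  is_merge=no
fi
echo "commits on main: $main_count, merge commit: $is_merge"
```

You write the single commit message yourself at the `git commit` step, so there is no todo list to edit.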


> accidentally typed gcaa and amended someone else's commit after fixing a merge conflict

You could try reverting the first commit on the HEAD once you finish the rebase. This is of course assuming your branch and the last commit don't touch the same files.


I have a similar workflow, but make a bunch of commits with "git commit --fixup HEAD". (Or --squash REF).

Then on the final rebase the commits are automatically ordered with s and f as appropriate.

Although I do a fair bit of amending, too.
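
A minimal sketch of that flow in a throwaway repo. Setting `GIT_SEQUENCE_EDITOR=true` just accepts the generated todo list without opening an editor, which is an assumption for the demo; normally you would look at it. File names and messages are made up.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "base"

echo feature > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# Later tweak, recorded as a "fixup! Add feature" commit.
echo "feature, fixed" > feature.txt
git commit -q -a --fixup HEAD

# --autosquash moves the fixup under its target and marks it "fixup".
GIT_SEQUENCE_EDITOR=true GIT_EDITOR=true \
  git rebase -q -i --autosquash HEAD~2

commit_count=$(git rev-list --count HEAD)
git log --oneline
```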


I also encounter this issue. I would also like to forbid amending a commit that is not part of the working branch.


Since everyone is bringing up squashing...

There's a false dichotomy nobody addresses here, which is the notion that there needs to be such a thing as "the" history for you to get the benefits of a clean history.

If all you really want is a linear history, then just do merges, and make sure the "first parent" is the main branch (which you can enforce with tooling). Now you can just traverse solely the (linear!) sequence of first parents, which is exactly the same view squashing would have given you, except without the information loss.

If for some reason you can't stand the idea of something branching off your main branch at all, then set up a separate job that automatically squashes everything onto a branch that only it can write to (or branch from). Now you have a truly linear history with nothing branching off it, exactly as you would've had with squashing. And you can always reproduce it on demand.

That way you avoid the information loss, and can always do archeology on the full evolution graph if needed.


If I were to write a blog post on this I’d make a few do’s and don’ts (why make a blog post when you can blog in HN comments?)

Don’t merge the base branch into a feature branch. Rebase to “update”.

Do use rerere and the curse of fixing the same conflict over and over is (almost) gone.

Don’t rebase (or force push for other reasons) a shared branch. Rule of thumb here is you can probably rewrite history if you work with _one_ coworker in a branch but any more than that and you’re more likely than not to upset someone.

Do `git rebase -i HEAD~N` to reorder/reword/squash into easily reviewable sequences of commits.

Don’t force push after review, until the review is complete. This keeps the history of the review process but you can later merge the fixups with the commits they logically belong in right before merging.

Do use Merge, Squash and “Rebase+FF” as appropriate for merging PR. There is no best solution for every scenario so prescribing “always merge” or “never merge” or similar isn’t helpful. A good rule of thumb though is that IF a branch has merged from the parent branch to update (which I suggested was a “don’t”) then avoid merging it back. A branch that was updated that way is better to e.g squash when merging back.


> Don’t force push after review, until the review is complete. This keeps the history of the review process but you can later merge the fixups with the commits they logically belong in right before merging.

I've had to talk to soooo many developers about this. I want to see what changed since my last review, not restart my review.


When I force push to a branch on GitLab with an open merge request, GitLab retains the previous set of commits, and provides an interface within the merge request to compare the current set of commits to previous sets. I love this feature.


`git range-diff` can be handy for the reviewer when trying to "recover from" a mid-review force-push.
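
A minimal sketch, assuming the reviewer still has the pre-force-push tip somewhere (here it is simply saved in a variable; the branch and file names are made up):

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "base"

# Version of the branch that was reviewed first.
git checkout -q -b feature
echo "v1" > f.txt
git add f.txt
git commit -q -m "add f"
old_tip=$(git rev-parse HEAD)

# Simulate the rewrite that a force-push would publish.
echo "v2" > f.txt
git commit -q -a --amend -m "add f"
new_tip=$(git rev-parse HEAD)

# Compare main..old_tip against main..new_tip, commit by commit.
report=$(git range-diff main "$old_tip" "$new_tip")
printf '%s\n' "$report"
```

The output pairs up old and new commits and shows a diff of their diffs, so you only re-read what actually changed.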


> fixing the same conflict repeatedly is annoying

This is usually caused by merging an upstream branch (e.g. develop) into your feature branch and then later trying to rebase it.

Effectively the commits you've merged in from develop undo the changes you've made in your feature branch. You fix them but the foreign commits undo the changes again.

The solution is actually pretty easy. Use git rebase --interactive to remove any commits from the rebase that aren't directly part of the feature work.

You may still have an odd merge conflict to fix but you'll only have to do it the once and everything should go smoothly.

I would also recommend never using the same commit message twice. When you have a list of 10 commits all called "Wip" it's hard to tell which are obviously duplicates that can be deleted.


> The solution is actually pretty easy.

git rerere is the even easier solution.


I've had to force people to clear their git rerere caches as debugging the situation when that cache is FUBAR is gross.


I don't know, you still end up with a dirty branch full of commits you didn't write.


I was talking about a solution to the problem you quoted:

   > fixing the same conflict repeatedly is annoying


Git rebase is stupid, I’ve seen countless f ups because someone needed the git history to look good


git-rebase is stupid because somebody doesn't know how to use it?

I use it all the time and I really like how I can make garbage commits (wip, test) and then squash them into atomic commits which are easy to review and later on easy to bisect when inevitably mistakes happen. Sure I've fucked up too when I was learning on how to use it and those were some painful mistakes but only through using it and making those mistakes have I learned to use the tool to great advantage (clean history).


> git-rebase is stupid because somebody doesn't know how to use it?

The whole purpose of source control is to reliably track code changes so you don't lose anything and can revert to any point or recover from bad merges. Since rebase permits you to violate this core purpose and literally lose the entire history of code changes, then yes, it is stupid.


It doesn't. Reflog still exists.


No. Reflog exists locally.

Getting away from dependence on the ephemeral is why git exists.


Git rebase is also local.


The trouble with git rebase is that it can create havoc by those who don't understand what it is doing conceptually, and (probably more importantly) how to recover when things go wrong.

When I hear people griping about rebase, I assume that nobody took the time to teach them how to use reflog first. Once I had an understanding of reflog, I could mess up all I wanted (without pushing) and recover. In that environment, rebase can become a very useful tool. Without being able to recover, rebase becomes a tool of confusing irreversible destruction.
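
A minimal sketch of that recovery path in a throwaway repo, assuming nothing has been pushed yet (branch and file names are made up for illustration):

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

echo a > a.txt
git add a.txt
git commit -q -m "base"

git checkout -q -b feature
echo b > b.txt
git add b.txt
git commit -q -m "feature work"
before=$(git rev-parse HEAD)

git checkout -q main
echo c > c.txt
git add c.txt
git commit -q -m "main moves on"

# Rebase rewrites the feature commit onto the new main.
git checkout -q feature
git rebase -q main
after=$(git rev-parse HEAD)

# The branch's reflog still remembers where it pointed before the rebase.
git reflog show feature
recovered=$(git rev-parse 'feature@{1}')

# Undo the rebase by pointing the branch back at the old tip.
git reset -q --hard "$recovered"
restored=$(git rev-parse HEAD)
```

The same `reset --hard` works for a botched interactive rebase, as long as the reflog entry has not expired.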


> git-rebase is stupid because somebody doesn't know how to use it?

No, it's stupid because it's really common for people to fuck it up, and because the purported benefits (clean history) are not something which matters.


If I tell you it matters to me, will that change your mind?

You've told me that it doesn't matter to you, but that hasn't changed my mind.

It's not a matter of objective truth, it's a preference.


Most of us are paid for working software, not the story of its creation. One is definitely more important than the other.


my case for good-looking history is pride of ownership. Are you proud of this "thing that makes money and ultimately pays your paycheck" or do you leave it polluted and full of crumbs and detritus?

I would have made this part of a root-level comment but I doubt anyone would read it, but: I think what gets lost in all these git debates is what language/context we are talking about. A shop that churns out JavaScript and releases to prod every 8 hours is very different from a C++ shop that writes safety-critical software. Their git needs are very different. An "I make my bed every day at 5:30am before I go for a 5mi run and come back and drink my juice and eat avocado toast" git regimen may be appropriate for some codebases but not for others, where an "I woke up hungover at 10am with a partner whose name I cannot remember, in a bed that is not mine" regimen is a better fit. I think countless human-brain cycles are lost to bickering between these 2 camps.


Perhaps the danger is the appeal. Pulling off a successful rebase that passes all of the unit tests first try, that's a thrill.


Mercurial fixes pretty much all of these problems via changeset evolution: commits are marked as obsolete and the obsolescence marker says which commit replaces the obsolete commit. So you have a meta-graph of commits as they change. You can therefore undo, you can trace the history, and since obsolete commits aren't shared by default, they slowly fade away.

https://www.mercurial-scm.org/doc/evolution/

It's a good idea, and there has been an attempt to port it to git:

https://lwn.net/Articles/914041/


There are just too many things you have to know about which commits are what and what's going on - especially with a larger team - it's not a good use of time imho to be fixing rebases when they go wrong. Like the big list of dos and don'ts at the end of this should be a red flag.

A better alternative I've found is squash merge: topic/feature branches are squashed into a single commit instead of bringing down each individual commit or creating a merge commit. Your history is cleaner, you're able to revert stuff easily, and it's really hard to mess up since it's just an atomic last step you do in your workflow.


The trick to using `rerere` with `rebase` is to merge first, resolve the conflict, record the resolution, then go back and do the rebase. It's explained here:

https://www.git-scm.com/book/en/v2/Git-Tools-Rerere

It's often easier to resolve a conflict during a merge than during a rebase because it presents you with left, right, and the common ancestor. You're also only looking at the tips of each branch. With rebasing, you're replaying each commit one on top of the next so you lose the common ancestor information and you may also have conflicts that won't exist at the end.

Another tip: if the other branch has changed a lot since you last rebased, even a single merge may have a lot more conflicts than you want to deal with all at once. In this case, consider a series of intermediate merges since you're going to throw them all away anyway.


It's all fun and games until someone's rerere cache is spitting out spurious bug regressions that make little sense.


* Regarding "splitting commits in an interactive rebase is hard" - I actually use `git reset` (to unmake a commit) followed by several instances of `git add -i` (to add individual changes into a commit) + `git commit` (to actually make the commits). If the commits to be split are in a middle of something, it's possible to do all of this inside a `git rebase -i`...

...which is exactly what is suggested in the section linked in the article, https://github.com/kimgr/git-rewrite-guide#split-a-commit.

* Regarding "weird interactions with merge commits" - `git rebase --rebase-merges` tends to help most of the time, since, during a rebase, merge commits are skipped by default (even if they contain changes).
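
A minimal sketch of the split described in the first point, done on the tip commit of a throwaway repo. To keep it non-interactive it stages whole files with plain `git add` rather than `git add -i`, which is a simplification; the file names are made up.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "base"

# One commit that mixes two unrelated changes.
echo a > a.txt
echo b > b.txt
git add a.txt b.txt
git commit -q -m "add a and b"

# Unmake the commit but keep its changes in the working tree.
git reset -q HEAD~1

# Re-commit the changes as two separate commits.
git add a.txt
git commit -q -m "add a"
git add b.txt
git commit -q -m "add b"

commit_count=$(git rev-list --count HEAD)
git log --oneline
```

For a commit in the middle of a branch, the same reset/add/commit steps would run while `git rebase -i` is stopped at an `edit` entry.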


> force pushing makes code reviews harder

On any code base I've worked on that's larger than a small FOSS project, I've found that this simply isn't avoidable. Yes, there's merge commits but, for reasons I won't go into, I think those are worse than the alternative of rebasing and making code reviews difficult.

> One way to avoid this is to push new commits addressing the review comments, and then after the PR is approved do a rebase to reorganize everything.

Not realistic when working on a code base where PRs are being squash-merged every hour and the code review lasts for days.

The best middle-ground is to avoid rebasing until the current wave of feedback has been resolved, even if no one has actually approved yet.


> Not realistic when working on a code base where PRs are being squash-merged every hour and the code review lasts for days.

But if they collide, you have to resolve all the merge conflicts anyway, and then with rerere you should have relatively little additional work on the final rebase.


Using squash merges has reduced my need to rebase a lot. I don't really care if I have merge commits on a feature branch for a PR if there's a reasonable history on the main branch when I'm troubleshooting an issue with git blame.

Why squash merges? I have a number of team-mates who make local commits on feature branches that make the history look like a series of less-than-useful commit messages. E.g. wip, wip, wip, wip, make it work, wip, wip. All of the context for the change is actually on the PR, so it's really only helpful to see the PR message and have a link to the PR for the discussion on the change.


Squash merges are rebases.


Maybe under the hood, but it's an option on github if you use that, or you can run `git merge --squash` as well.


git rerere only "automates" conflict solving after you already solved it. As in, it remembers previous merge resolutions, even if you undo the merge/rebase.

It is particularly useful when doing difficult merges regularly. Invariably I'll find a mistake in the merge and start over (before pushing, obviously); the second "git merge" remembers the previous resolutions so I don't have to solve all the same conflicts again.

Similar for difficult rebases that may need multiple attempts.

Git remembers resolutions across branches and commits, so in the rare case where (say) a conflict was solved during a cherry-pick, rerere will automatically apply the same resolution for a merge with the same conflict.

I think the reason it's not on by default is that the UI is confusing: when rerere solves for you, git still says there is a conflict in the file and you have to "git add" them manually. There is no way of seeing the resolutions, or even the original conflicts, and no hint that rerere fixed it for you.

You just get a bunch of files with purported conflicts, yet no ==== markers. Have fun with that one if you forget that rerere was enabled.
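
For anyone who wants to see that behaviour for themselves, here is a small sketch in a scratch repository (branch and file names are made up). It resolves a conflict once, throws the merge away, and merges again with `rerere.enabled` set.

```shell
set -e
# Hypothetical identity and scratch repo, for illustration only.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d) && cd "$repo" && git init -q
git config rerere.enabled true
main=$(git symbolic-ref --short HEAD)

echo "base" > f.txt && git add f.txt && git commit -q -m "base"
git checkout -q -b topic
echo "topic" > f.txt && git commit -q -am "topic change"
git checkout -q "$main"
echo "main" > f.txt && git commit -q -am "main change"

# First attempt: the merge conflicts, we resolve by hand and commit,
# which lets rerere record the resolution.
git merge topic || true
echo "merged" > f.txt
git add f.txt
git commit -q -m "merge topic"

# Throw the merge away and try again.
git reset -q --hard HEAD^
git merge topic || true

# The working tree now holds the recorded resolution, but the file is
# still listed as unmerged until it is added.
reused=$(cat f.txt)
git status --short
git add f.txt
git commit -q --no-edit
```

Setting `rerere.autoupdate` to true makes git stage the reused resolution automatically, which may explain why some people never see the manual `git add` step.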


> when rerere solves for you, git still says there is a conflict in the file and you have to "git add" them manually.

been using rerere for years, never seen this behavior.


"Don't make me think" has been the best principle of coding for me for a long time. Looking at how much thinking overhead rebase is producing, I'd prefer to avoid it.


One day you can suddenly get rebase, and then you almost never need to think about it ever again. Different people take longer/shorter to get to that point, and some people never make it.


I maintain a few projects and have had really bad experiences with rebase. It depends on everyone in the team rebasing correctly and according to team rules, and it's just one mistake away from drama, which I've been through a few times. Once rebase was disallowed, I never had to solve similar issues again.


I think it applies to using git as a whole. It's best to use a VCS that, instead of making you work, gets out of your way.


I'm jealous of people who enjoy rebasing. Such a simple life they must lead. When I'm tasked with rebasing a feature branch with 1,000 commits, written by 10 different people, onto a new release branch with another 1,000 new unrelated commits, written by 10 different people, I really start to question my life choices.


The linked article actually has a section "rebasing a lot of commits is hard" and its conclusion is to rather not do it, mostly regarding what is mentioned above that section. But if you know how to use rerere properly, like some of the commenters here have pointed out, it could very well make your life a lot easier if you really want to rebase so many commits. And btw, make sure nobody is working on any of these branches or forks of those branches anymore, because after the rebase they won't be able to merge anymore.


Unfortunately rerere only solves the tedium of this task. I'm much more annoyed by cases like this: Person1 modifies line of code A (commit 1). In the other branch, Person2 modifies that line (commit 2). Then, Person2 notices that the first person has made a change, and takes that into account by rewriting their own change (commit 3). When you try to rebase these two branches, you might have to resolve a conflict between Commits 1 and 2, which is pointless work because those were never intended to coexist.


Doesn't what this comment says solve this for you? https://news.ycombinator.com/item?id=38166928


I hope Julia keeps writing about git, because I'm sure it will teach me something!

I'm still searching for a way to manage long-lived Postgres submissions, the most challenging git scenario I've encountered. Julia's post finally got me to brain-dump my current process, something I've meant to write down for a while now:

https://illuminatedcomputing.com/posts/2023/11/git-for-postg...

This link could almost be an "Ask HN": if any of you have suggestions to improve my workflow, I'm all ears. (I asked around a bit last May at PGCon, but didn't get any concrete advice there. Maybe it's too complicated for a hallway off-the-cuff discussion.)


I prefer rebase over merge because when merging since the commits are added to HEAD, when I add new commits after merge, I find it harder to understand what is being added with the current pull request.

One thing that annoys people when rebasing is, if you need to rebase a couple of times and also you change the commit history of current branch, you might end up solving the same "conflicts". To avoid this, you can use git rerere, this basically saves your conflict resolution, and if the same conflict is encountered, it resolves it automatically: https://mirrors.edge.kernel.org/pub/software/scm/git/docs/gi...


I've largely come to a workflow of creating a feature branch and periodically merging main out to that branch. When it's done, I use the github squash and merge feature to bring changes back in. Cutting non-ancestor rebases out of my workflow has been great for my personal sanity.


FWIW running `git commit` after fixing a conflict during rebase is fine. git'll pick up the "in progress" commit's message by default so you'll get the exact same message editor as you would have with rebase --continue; and you can just run git rebase --continue afterwards to continue.

If that was during an 'edit' phase and not a conflict, then you get a blank state, but you need to commit any change there anyway before applying the other patches so commit is also the right thing to do... If you had intended to amend instead of a new commit then you can just squash the commit again later, or reset (soft!) HEAD^ to undo the commit and add again/amend.


Just don't rebase something that you've shared with others. It's just rude.


Do people not collaborate on feature branches?


Don't rebase ESPECIALLY stuff that people are collaborating on.


Well aware of that, but in my experience I sometimes collaborate on feature branches with 1-2 people and we just communicate when there's a rebase happening (For example if there was a bigger change in main and we want to align our feature branch on it). I don't think that's very uncommon.


Pull requests after comments and revision?


Doing a squash merge on the actual merge. At least on GitHub, doing a rebase mid-PR makes the PR act funny (comments point at non-existent revisions and such).

When I was at amazon their internal PR tool handled them just fine so I would do them in that case.

There's more nuance to the OPs comment along the lines of

> Don't rebase on branches others are working on

On a pr branch I usually at most expect others to pull and do a build to run it locally so I'm not very worried about wiping out changes.


> undoing a rebase is hard

I do a lot of rebases, but they are trivial rebases in a feature branch, like changing the order of new features and fixups, and then squashing the fixups to make a nice PR. Don't try to do smart weird rebases!!!

From time to time I have to make a smart weird rebase because my small commits order is just a mess, or I have to split a commit or something unusual. The important first step is to make a new branch as a backup to hold the version before the rebase. If I mess up the rebase process I can just go back to my version before the rebase and start again.


> Don't try to do smart weird rebases!!!

If I need to do a non-trivial rebase I always start by creating a backup branch so I can delete my fuck up and start over trivially, which was a hard learned lesson.
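
A minimal sketch of that backup-branch habit, in a scratch repository; `feature` and `feature-backup` are made-up branch names.

```shell
set -e
# Hypothetical identity and scratch repo, for illustration only.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d) && cd "$repo" && git init -q
main=$(git symbolic-ref --short HEAD)

echo base > base.txt && git add base.txt && git commit -q -m "base"
git checkout -q -b feature
echo f > feature.txt && git add feature.txt && git commit -q -m "feature work"
git checkout -q "$main"
echo m > main.txt && git add main.txt && git commit -q -m "main moves on"
git checkout -q feature

# Bookmark the pre-rebase state before touching anything.
git branch feature-backup
before=$(git rev-parse HEAD)

git rebase -q "$main"

# Not happy with the result? Jump back to the bookmark and start over.
git reset -q --hard feature-backup
after=$(git rev-parse HEAD)

# Once the rebase is known to be good: git branch -D feature-backup
```

The reflog (`git reflog`) offers the same escape hatch without a named branch, but a branch is easier to find again under stress.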


also you can use git rerere if you need multiple goes for the rebase, it saves your conflict resolutions and applies them automatically when they occur again(you need to resolve the conflict when it happens for the first time).


I like to look at my repository and find a clear history, the project's story. Communication is extremely important for me, thus I use rebase wisely and squash before merging (fast forward) to develop and main. Conflicts are solved under feat/chore commit and never in develop or main.

A repository shouldn't be a dump of unmeaningful commit messages, but a curation of best contributions at the time of commitment.


Another great point in this discussion is the page on why fossil deliberately does not have rebase: "Rebase considered Harmful" https://fossil-scm.org/home/doc/trunk/www/rebaseharm.md


Responses would make more sense if people included:

- if they have done any cherry-picking
- if they have done any backporting
- if they have reverted any commit that made it to main
- if they have ever used bisect

I think rebase is harder to learn than merge, but once you get used to it, it lets you have history that is easier to debug, and use without extra effort


A semi-relevant tip, instead of "HEAD^^^^^^", you can write "HEAD~6" to the same effect.
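
A quick way to convince yourself the two spellings are equivalent, sketched in a scratch repository with seven empty commits:

```shell
set -e
# Hypothetical identity and scratch repo, for illustration only.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d) && cd "$repo" && git init -q

for i in 1 2 3 4 5 6 7; do
  git commit -q --allow-empty -m "commit $i"
done

# Six carets and ~6 both walk six first-parent steps back from HEAD.
carets=$(git rev-parse 'HEAD^^^^^^')
tilde=$(git rev-parse 'HEAD~6')
subject=$(git log -1 --format=%s 'HEAD~6')
echo "$carets"
echo "$tilde"
echo "$subject"
```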


Nice to see no one agrees on how to use git correctly.


Do I have the wrong mental model of rebase?

You want to "pretend" you all took turns at making the code better, like Alice goes first then everyone downs tools while Alice makes her changes and then Bob picks up Alice's work and does his changes, then Charlie starts

The difference is that Bob can start while Alice is working and all he needs to do is right before checking his stuff in, he grabs her fixes from master, and then applies his fixes on top of hers as if he had started after she had finished. Sometimes they both worked on the same code and he needs to figure out what's safe but hey that happens any which way.

As long as you ensure fast-forward merging onto master only it's kind of simple.

Annoying. but simples.


> It also makes me wonder if there’s an easier workflow for cleaning up your commit history that’s harder to accidentally mess up.

Use real merge commits and `--first-parent` as your default view.

It's unfortunate that to make `--first-parent` default you have to either edit your git config or grow a few new habits, and I still think there should be at least few more UIs that are focused on `--first-parent` with optional "drill down" instead of raw subway map diagrams. Subway diagrams look cool in screenshots, but so much of the complaints about "clutter" and "mess" in git seem to be just that people don't actually want to read the subway diagrams.
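
A rough sketch of what that looks like in practice, in a scratch repository. The alias name `fp` is made up for illustration; it is just one way to get the "edit your git config" variant.

```shell
set -e
# Hypothetical identity and scratch repo, for illustration only.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d) && cd "$repo" && git init -q
main=$(git symbolic-ref --short HEAD)

git commit -q --allow-empty -m "initial"
git checkout -q -b feature
git commit -q --allow-empty -m "wip 1"
git commit -q --allow-empty -m "wip 2"
git checkout -q "$main"
git merge -q --no-ff feature -m "Merge feature"

# Full history: the merge plus every commit from the branch.
all_count=$(git rev-list --count HEAD)
git log --oneline

# First-parent view: only what landed on the main line.
mainline_count=$(git rev-list --count --first-parent HEAD)
git log --first-parent --oneline

# Repo-local alias (hypothetical name) so the flag need not be retyped.
git config alias.fp "log --first-parent --oneline"
git fp
```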


The most widely useful method to manage changes in repos is:

  1. Open a new branch and do development there. You can rebase and force-push all you want.
  2. When ready for review, open a PR of the branch, and never rebase again.
  3. On approval, Squash-Merge your PRs and include a merge comment linking to the PR.
This way you can "clean up" your development history in your branch, maintain history of changes requested in a PR, the entire development history is available at the PR link later, and you can revert an entire PR by reverting a single commit.
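
Step 3 and the "revert a single commit" benefit can be sketched locally like this. It is only an approximation of what a hosting platform's squash-merge button does, and the PR number in the message is invented.

```shell
set -e
# Hypothetical identity and scratch repo, for illustration only.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d) && cd "$repo" && git init -q
main=$(git symbolic-ref --short HEAD)

echo base > app.txt && git add app.txt && git commit -q -m "base"
git checkout -q -b feature
echo "step 1" >> app.txt && git commit -q -am "wip"
echo "step 2" >> app.txt && git commit -q -am "wip again"

# Squash-merge: stage the branch's combined changes, then make one commit.
git checkout -q "$main"
git merge -q --squash feature
git commit -q -m "Add feature (PR #123, hypothetical)"

# The whole piece of work can later be undone by reverting that one commit.
git revert --no-edit HEAD >/dev/null
after_revert=$(cat app.txt)
git log --oneline
```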


Good list of best practices. I only miss 'git rebase --preserve-merges' from the text. It is pretty nifty sometimes when you actually want the merges to stay around.


git push --force needs to just be an alias of --force-with-lease by default. Having the default be the unsafer option seems... backwards. It should be something more like:

    git push --force (acts like git push --force-with-lease does now)
    git push --force-anyway (acts like --force does now, "anyway" is just an example and should be harder/longer than -f/--force)
I understand that force-with-lease didn't exist first but this needs to be rectified.
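
To make the difference concrete, here is a sketch with a local bare repository standing in for the server and two made-up clones, `alice` and `bob`. Alice rewrites her commit without fetching Bob's push, and `--force-with-lease` refuses to overwrite it.

```shell
set -e
# Hypothetical identity and scratch directories, for illustration only.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
work=$(mktemp -d) && cd "$work"
git init -q --bare remote.git

git clone -q remote.git alice 2>/dev/null
cd alice
branch=$(git symbolic-ref --short HEAD)
echo v1 > file.txt && git add file.txt && git commit -q -m "alice: v1"
git push -q -u origin HEAD
cd ..

# Bob pushes a new commit that Alice never fetches.
git clone -q -b "$branch" remote.git bob
cd bob
echo v2 > file.txt && git commit -q -am "bob: v2"
git push -q origin HEAD
cd ..

# Alice rewrites history and force-pushes "safely".
cd alice
git commit -q --amend -m "alice: v1 (rewritten)"
if git push -q --force-with-lease origin HEAD 2>/dev/null; then
  result="overwrote"
else
  result="rejected"
fi
echo "$result"
remote_tip=$(git -C "$work/remote.git" log -1 --format=%s "$branch")
```

A plain `git push --force` at the same point would have silently discarded Bob's commit.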


I've found that consistently using the --onto flag when rebasing helps prevent errors by ensuring the target hash of the rebase is always explicitly specified. Particularly in situations where you may have a feature branch based on another feature branch, and that other feature branch had commits squashed into main so history/commit hashes have changed between main and your feature branch.
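
The stacked-branch situation described here might look like the sketch below. Branch names `feature-a` and `feature-b` are made up, and a local `git merge --squash` stands in for the platform's squash merge.

```shell
set -e
# Hypothetical identity and scratch repo, for illustration only.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d) && cd "$repo" && git init -q
main=$(git symbolic-ref --short HEAD)

echo base > base.txt && git add base.txt && git commit -q -m "base"

git checkout -q -b feature-a
echo a > a.txt && git add a.txt && git commit -q -m "a1"
echo a2 >> a.txt && git commit -q -am "a2"

# feature-b is stacked on top of feature-a.
git checkout -q -b feature-b
echo b > b.txt && git add b.txt && git commit -q -m "b1"

# feature-a lands on main as a single squashed commit.
git checkout -q "$main"
git merge -q --squash feature-a
git commit -q -m "feature A (squashed)"

# Replay only the commits after feature-a (just b1) onto the new main.
git rebase -q --onto "$main" feature-a feature-b

onto_count=$(git rev-list --count feature-b)
git log --oneline feature-b
```

Spelled out, the form is `git rebase --onto <new-base> <old-base> <branch>`: everything in `<old-base>..<branch>` gets replayed on `<new-base>`, so the already-squashed `a1` and `a2` are left behind.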


This is nice to read because I’ve been recommended rebase by so many people who clearly have no idea what it does and what to do if it goes wrong.


Never have rebased.

Get some updates, then merge master branch. Seems to be pretty straightforward to me.

  git fetch -p --all

  git pull

  git merge master
(Handle merge conflicts in WE of choice.)

  git push
Boom. I'll let the CI platform squash commits post-merge.

Option for gitlab if it's a small change and you don't want to run a full suite of tests.

  git push -o ci.skip


My team of hardware and software engineers is relatively new to Git. I’m trying to get them to commit and push small changes daily to a shared personal branch e.g. james-wip instead of one monster commit every 2 or 3 weeks. I encourage them to rebase their “named” personal branches so they’re easier to review and merge into master when they’re ready.


I used to commit and then rebase -i to squash interactively, but I always had trouble with which commit is the oldest vs youngest and whether I need to pick one and squash the rest or squash them all.

Now I make a single commit per PR that I `commit --amend --no-edit` until it’s merged. I sometimes have to rebase it onto main for conflicts but that’s easy.
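
A small sketch of that single-commit loop in a scratch repository (the branch name `pr-branch` is made up):

```shell
set -e
# Hypothetical identity and scratch repo, for illustration only.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d) && cd "$repo" && git init -q

git commit -q --allow-empty -m "base"
git checkout -q -b pr-branch
echo draft > feature.txt && git add feature.txt
git commit -q -m "Add feature"

# Address review feedback by folding it into the same commit.
echo "review fix" >> feature.txt
git commit -q -a --amend --no-edit

amend_count=$(git rev-list --count HEAD)
amend_subject=$(git log -1 --format=%s)
git log --oneline
```

Each amend rewrites the commit, so an already-pushed branch needs a force push afterwards.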


I lost patience with the various git commit cleanup tools and now I just go nuclear. I use git diff > output.file, make a new branch, git apply output.file.

Fresh clean branch, no commit history, create pull request.

I'm not convinced there's any value to incremental commit messages. This is simple, clean, and undoable as long as I keep my initial branch.
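
Roughly, the "nuclear" approach could look like this. It is a sketch with made-up branch names (`messy`, `clean`), and it assumes the diff is taken against the main branch and written outside the repository so the patch file does not get committed.

```shell
set -e
# Hypothetical identity and scratch repo, for illustration only.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d) && cd "$repo" && git init -q
main=$(git symbolic-ref --short HEAD)

echo base > app.txt && git add app.txt && git commit -q -m "base"
git checkout -q -b messy
echo one >> app.txt && git commit -q -am "wip"
echo new > extra.txt && git add extra.txt && git commit -q -m "more wip"

# Capture everything the messy branch changed relative to main.
patch=$(mktemp)
git diff "$main" messy > "$patch"

# Fresh branch from main, apply the patch, commit once.
git checkout -q -b clean "$main"
git apply "$patch"
git add -A
git commit -q -m "Add feature as one commit"

clean_tree=$(git rev-parse 'clean^{tree}')
messy_tree=$(git rev-parse 'messy^{tree}')
```

Plain `git diff` output does not carry binary file contents; `git diff --binary` would be needed if the branch touches any.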


What you're describing is just doing a 'squash' merge


git rebase is overrated... change my mind!


As far as I can tell, all of these are problems with deferring integration, not rebase per se. I prefer trunk-based development, partly because of all of this: https://trunkbaseddevelopment.com


IMHO `--fixup` commits and `--autosquash` rebases make rebase much easier.
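
For anyone who hasn't tried them, a minimal sketch in a scratch repository. `GIT_SEQUENCE_EDITOR=true` is used here only to accept the generated todo list without opening an editor.

```shell
set -e
# Hypothetical identity and scratch repo, for illustration only.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d) && cd "$repo" && git init -q

git commit -q --allow-empty -m "initial"
echo one > a.txt && git add a.txt && git commit -q -m "add a"
echo two > b.txt && git add b.txt && git commit -q -m "add b"

# Later: a fix that really belongs in "add a".
echo fixed > a.txt
git add a.txt
git commit -q --fixup=HEAD~1      # creates a commit titled "fixup! add a"

# Autosquash moves the fixup next to its target and folds it in.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~3

fixup_count=$(git rev-list --count HEAD)
fixed_content=$(git show HEAD~1:a.txt)
git log --oneline
```

Setting `rebase.autoSquash` to true in the config makes `--autosquash` the default for interactive rebases.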


Please write the spec for next generation version control. I feel like Git is great but we could have one which is easier to use and has built in concepts like Pull Request or deployed state.


This is a short-sighted view in my opinion. Similar views often lead to reimplementing old bugs, rediscovering hidden requirements, and ultimately re-implementing most of the things that made the old system complicated.

We've seen this in npm. Npm was supposed to be simpler than maven - then it slowly rediscovered the reasons we need package signing, support for circular dependencies, and all the other messy things that go into package management.

While I don't have numbers, it seems like a huge percentage of the "this problem should be simple, let's build a new app" projects either fail or recreate the same gnarly problems that led the existing projects to be complicated.


There are plenty of better*-designed systems; the problem is no one (except a few opinionated geeks) cares [enough], so they're unlikely to ever beat Git unless a miracle happens (aka a major industry leader makes something trendy by promoting it).

For example, Darcs (older) and Pijul (newer) are based on patch theory, so Git's rebase issues are moot there.

*) Of course, "better" is subjective.


> I feel like Git is great but we could have one which is easier to use

FWIW, Mercurial's been there for almost 20 years.


I might be lucky but in my whole developer life I have only used like 3 commands git stash, git pull --rebase and git merge. I'm not even sure I used git rebase once.


Well there are lots of useful commands out there you might want to try

  git branch
  git checkout
  git clone
  git diff
  git push


what do you think git pull --rebase does..


I think the most cited drawback of rebase - public branch refs becoming obsolete - is not applicable when you only use it to pull from a canonical remote.


Magic hocus pocus. What else?


Well, it rebases, but I never had to deal with everything in this article. It puts my commits on top of HEAD and that's it, never had any issue.


Sure, there are situations when you can get by without rebase. I would even say 'stay away from rebase if you do not have a solid understanding of what it does'.

However, life is much better with rebase than without it.


Looking at the comments here, it strikes me once again that arguing about the best way to use git has quietly replaced arguing about coding conventions :)


First time I tried that because fixing conflicts in merge was too hard. I rebased the wrong way round.

Shared branches is a bit of a faff just due to coordinating.


I never rebase, ever.

I branch, merge main in regularly, make a PR, squash back to main and delete the branch.

Linear history, single commits for a single piece of work.

Easy, clear, no fuss.


Squash merge your PRs if using GH. Gives a nice linear history for each piece of work. Keep PRs small and focused to one kind of change.

The PR itself contains full history of subwork. GH can recover branch from PR. No work is lost.


Everything. Everything can go wrong! :)


> HEAD^^^^^^^

Dude! Learn the tilde notation


I'm someone who was programming long before CVS was a thing. To me this is akin to a discussion of "1st world problems". We got a remarkable amount of code written without version control, and we saved a lot of time on discussions like these. (They seem depressingly common.) That said, I would not be without git now - but it's the cherry on top, not the cake. Perhaps it gives me a different perspective.

I'd rate the benefits of version control, in rough order of importance, as:

* It allows multiple programmers to work on one body of code. This has always been its main use. Rebase screws with this because multiple authors updating a rebased branch creates a cluster. The simple fix is rebases to a branch are only allowed when you are the sole person working on it. (Atlassian is wrong. The branch can be public. It's multiple writers that creates the problem, other people reading and reviewing a public branch isn't.)

* A backup. I've lost more than enough work to know the importance of backups yet if I have to do it manually I still don't do it regularly enough. git==backup. Wonderful.

* Assist in reviews. Actually, I'm not sure how you would do reviews without it, as a version control system both highlights the differences and serves as a communication medium. Rebases help here, as they let the author parcel up a body of work to make the reviewer's job a lot easier.

* Make open source contributions auditable. To be fair I've never used this personally, but I use software that depends on it - like the kernel, so I rank it pretty highly. It becomes very difficult to anonymously introduce malicious changes when the version control system is tracking who made every modification. In an amazing coincidence, a branch is effectively a block chain which makes it hard to change. Rebases could muck this up of course if the "only personal branches may be rebased" rule isn't enforced - but it normally is.

* Bug hunting. Blame and bisect are the main tools. In my experience compared to the previous points this gets used very rarely, but bisect in particular can save a lot of time on those rare occasions. Before version control we did bisects by restoring backups. Notably git blame still works perfectly if you follow the "only rebase branches you own" rule.

* Going by the discussion here, some people spend time on archaeological digs through source code repositories. It seems some of them prefer their digs to be dirty (aka rebase free) and others like it clean (rebased).

Interestingly, most of the noise here comes from people arguing about the last point. That strikes me as about as important as the colour of the bike shed. The only other place rebase affects is reviews. Reviews are a problem everywhere I've worked. Everybody prefers to be doing something else, so making them as friction free as possible is a worthwhile goal. Rebasing does that (and so do unit tests).


other developer: I hate rebase.

me: Do you even git bro?


Haven't read the article yet. Whenever I'm working on a feature branch, I always tend to "merge master into the feature branch". The effect of this obviously being that I want to have the latest changes incorporated into my work to avoid conflicts and/or to proceed with my own feature work. This has always worked well for me and never failed me.

If I understand what rebase does correctly, it just adds all my commits in my remote feature branch to the HEAD of the branch that's being rebased onto. That's why after doing it and finishing it, one needs to do a push force because the head of the remote feature branch would diverge. But... Why? How is this better or different than just merging master into the branch? Gitlab for example and Intellij show all the branch changes and commits with their hashes so it all can be cherry picked or reverted if needed quite easily...

Is there anybody who used to "merge master into" and now uses rebase who has a much different view on it being better?


This is a simple example with only two feature branches, and it's already a bit hard to follow. Imagine it with even twelve which is not even that many branches:

    *      Merge Branch 'B' into 'main'
    |\
    * \    Merge Branch 'A' into 'main'
    |\ \
    | | |
    | | *  B Commit 2
    | | *  Merge remote-tracking branch 'upstream/main'
    | * |  A commit 2
    | * |  Merge remote-tracking branch 'upstream/main'
    | | *  B commit 1
    | |/
    | *    A commit 1
    |/
    *
Compared to a commit graph where feature branches are rebased to replay their commits against the tip of main before merging:

    * Merge branch 'B' into 'main'
    |\
    | * B commit 2
    | * B commit 1
    |/
    * Merge branch 'A' into 'main'
    |\
    | * A commit 2
    | * A commit 1
    |/
    *
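
The second graph can be reproduced with something like the sketch below, in a scratch repository with one commit per branch to keep it short. Branch names `A` and `B` follow the diagrams above.

```shell
set -e
# Hypothetical identity and scratch repo, for illustration only.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d) && cd "$repo" && git init -q
main=$(git symbolic-ref --short HEAD)

git commit -q --allow-empty -m "initial"
git checkout -q -b A
echo a > a.txt && git add a.txt && git commit -q -m "A commit 1"
git checkout -q "$main"
git checkout -q -b B
echo b > b.txt && git add b.txt && git commit -q -m "B commit 1"

# A lands first, with an explicit merge commit.
git checkout -q "$main"
git merge -q --no-ff A -m "Merge branch 'A' into 'main'"

# B is replayed onto the new tip of main before it is merged.
git rebase -q "$main" B
git checkout -q "$main"
git merge -q --no-ff B -m "Merge branch 'B' into 'main'"

# B's commit now sits directly on top of the merge of A.
b_parent=$(git rev-parse 'HEAD^2~1')
merge_a=$(git rev-parse 'HEAD^1')
git log --graph --oneline
```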


Some people don't like having standalone merge commits that show up in the git history when merging master into feature


Why? Is it a visual thing, or practical?



