[flagged] The merge vs. rebase debate (graphite.dev)
52 points by thunderbong on Dec 29, 2023 | 201 comments



I'm a big fan of the rebase workflow, but not of squashing. I wrote it as several separate commits for a reason: documenting each step, making each step revertible, separating refactors from semantic changes, etc. Squashing makes the diffs much more painful, such as by burying an important change in a sea of mechanical ones when they were originally in separate commits.

(A workflow that preserves those commits requires actually having useful commits, and obviously if you have PRs with commits like "fix it" and "fix it more" then it might as well get squashed.)


Most developers I've worked with think of Git as a CLI front-end to GitHub. They aren't writing commits with the intention of showing their work; they're using commits to save their work before signing off, or to trigger CI. They aren't proficient enough with Git to go back over these and clean them up, as would be expected in a project like the Linux kernel.

If that's the state of developer proficiency with version control then, ideal workflows aside, we need a way to prevent those noise commits from ending up in master, and the easiest way is to squash with every pull request, as part of the "merge" automation.


Why do you need to squash & merge literally every PR? Why not use squash & merge when appropriate and rebase & merge otherwise? That's what I do for my projects. Most contributions coming from others tend to get squashed & merged, whereas I'll rebase & merge my commits. But I'll sometimes rebase & merge PRs from other contributors who have broken their change down into thoughtful commits.


This is it. The real answer to this whole debate is "it depends" but for some reason that doesn't seem to satisfy people.


The effective management approach (in opposition to the limp-souled "servant leader" anemia) is to make decisions for your team to eliminate stupid work.

What, you're going to let anyone waste time on tabs versus spaces?


You've presupposed that this is all about stupid stuff in the first place. Moreover, mixing tabs and spaces for indentation leads to sadness. But squash and rebase are perfectly compatible with one another.

Programmers sometimes have to decide stuff. That's okay.


> Why do you need to squash & merge literally every PR? Why not use squash & merge when appropriate and rebase & merge otherwise?

"When appropriate" is the key here. That's asking people to make a decision, take on additional cognitive load, which does not have a clear deliverable or success criteria. In practice the default is what happens.


We need to make decisions about code and maintainability all the time. I don't see why this is any different.


Every opportunity to eliminate cognitive overhead is a win for the team.

As I say to my children on the reg, "What's the best way to clean up a mess? Don't make it in the first place."


Telling my child not to make a mess so he can avoid cleaning it up would not be an effective tactic at anything other than acting smug.

You're presupposing that this decision is worth making for programmers in the first place. I don't see why that ought to be true.


As my children repeatedly prove: one person’s “not a mess” is another person’s disorganized chaos.


I'm a fan of the workflow where the PR gets squashed in the upstream git repo, but the individual commits are preserved in the PR in the code review tool. I feel that Phabricator handles this well.


But does that still lose the source commit long term? What I'd love to have is a mechanism that keeps references to the pre-squash commits at blame granularity, allowing one to dig deeper into the commit messages associated with a given line. Kind of like a sourcemap, but for squash instead of transpile.


You don't lose it long term if you're using GitHub PRs — GitHub keeps the "reflog" (quoted because I imagine their implementation may not actually use reflog) of the branch indefinitely, even after force pushes. Graphite (built to replicate the Phabricator workflow) enables viewing and diffing these versions. (disclaimer, I helped build this)


When you squash merge on GitHub, the new commit references the old PR. If you don't delete branches on merge you keep the commit history on that branch, but then you have to battle with branches persisting forever.


Branches are mostly free, so this isn’t a problem if they are properly named.

“try-again-something5” doesn’t cut it but “$ticket-at-least-five-words-here” does.


Branches are not cognitively free. Searching through the haystack of hundreds of branches to find a particular needle is a pain.


You’re translating the problem from searching through branches that are named according to their ticket and what they are meant to accomplish, to a complex and not-context-free git bisect.


GitHub and Azure DevOps also do that, you just need to know where to look.

I don’t mind squashing either; unless I’m being really intentional or rewriting my history, my intermediate commits couldn’t be reverted without leaving stuff broken (totally a me problem of course).


Intermediate commits being snapshots of “broken” state isn’t a problem at all. When I quit for the day, I commit, broken or not, and pick it up in the morning. I want to be able to drop my laptop in a puddle and still pick up where I left off when I get a new one.


I think the problem is putting those broken commits into the trunk. Ideally you want to clean up your commits so if you need to revert you don’t accidentally break your build and so reading thru history isn’t awful.


Nobody anywhere suggested putting broken commits in trunk. This is why branches exist.


Squashing is nice IMHO, and even a must after a while. For one recent very small project a squash of the commit history reduced the storage from tens of kilobytes to a few hundred bytes total. Orders of magnitude. That was a very small project, so imagine the storage space savings for larger projects.

I find that the commit history tends to grow viciously for anything I've been involved with. And I fail to see the benefit of amassing that amount of detail once you are past the stages where each individual commit is reversible (or even interesting)

So, for a project that runs for, say, three months, the commits of the first few weeks aren't really very interesting or valuable at all at the end of the period. Just hard drive space being eaten up. YMMV.


Real example: I had to mitigate an outage in the middle of the night, and I found the root cause in ten lines of code. I needed very badly to “git blame” that code and find a specific commit message from three years ago and its author (a former colleague), to figure out what he had been trying to do.

Right now I have a full clone of a pretty large monorepo dating back almost nine years, and the .git dir is less than half of the total space. Sparse checkouts and shallow clones can make clown car hardware sort of work, but I do not want to go back to the pre-git days and try to work without full history to conserve 0.008 TB of SSD. We spend more than that on coffee.


Ok, so it appears we just approach some problems differently. If I had to work in the middle of the night and found a problem spanning ten LOC my procedure would be "fix, commit, back to sleep". I would never feel any need (nor have the time) to investigate the origin of the problem, much less feel a need to "git blame" anyone. Likely I would not even look at the commit history as (to me) historical code is not really relevant from a current-problem-solving standpoint. But that's just me I guess.

> make clown car hardware

I'm not a native English speaker so this appears to me as a bit confrontational and/or an attempt to ridicule me or the points that I am making? I do not work with clowns, cars, or hardware.

> full history

I can see one benefit, and that is the case of sensitive software for special use (e.g. govt., mil., etc.). Here, I agree that in the case of a vuln being discovered it could be a good thing to go back and trace the origin. So, I'm not opposed to keeping history, it's just not relevant to any particular extent for the types of tasks I do in my current $job.

My stance is more like: if I'll never use any of this stuff, why keep it at all? It's not about costs; it's just about keeping things simple.


A clown car is comically small for its use (the joke is that it arrives and clowns keep getting out, seemingly more than could have been inside). I’m sorry, I didn’t mean you as the poster, but whoever is hiring professionals while under-specifying the hardware they provide. Storing source code is not a problem you should have reason to worry about.

The problem with “fix it now” is that I didn’t know for sure that our behavior was wrong, I just knew that a consumer of our microservice had begun alerting on errors. I had to find out whether this was a mistake (which maybe I could safely fix) or important and intended (and any change must be negotiated with other consumers). It comes with having an old, complex system with a lot of dependencies and without exhaustive documentation.


I can't count how many times this has happened to me. Trace an issue down to some piece of code that makes it look very intentional, then have to go spelunking through whatever history I can find to figure out what the actual correct specified behavior is. Bonus fun when you have to start reaching out to clients to find out if anyone is relying on it acting that way.


Exactly. Storage space is much cheaper than human time and brainpower.


If you want to preserve separate commits you can just make them separate PRs?

Small PRs are better since they're easier to review.


There are many reasons for having several commits in the same PR.

PRs often have a lot of overhead. They need a separate branch, CI jobs need to run, there are more notifications for everyone, separate approvals, etc.

Sometimes there's a need for keeping separate commits if they're all related to a single change. Proposing them all as part of the same PR helps maintain that context while reviewing as well. Reviewers can always choose to focus on individual commits, and see the progression easily.

Sometimes it does make sense to squash a PR if the work is part of the same change, but the golden rule of atomic commits always applies. Never just blindly squash PRs.

In fact, if the PR is messy and contains fixes and changes from the review, and the PR should have more than one commit, I go back and clean up the history by squashing each change to its appropriate commit. `git commit --fixup` helps with this, so I also prefer addressing comments locally rather than via GitHub's UI. Then it's a simple matter of running `git rebase --autosquash`.
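For anyone unfamiliar, a rough sketch of that workflow (the hash is a placeholder for whichever commit the fix belongs to):

    # stage the follow-up change and attach it to the commit it amends
    git add -u
    git commit --fixup abc1234   # creates "fixup! <subject of abc1234>"

    # later, fold every fixup commit into its target
    git rebase -i --autosquash main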


I do the same. Given that you didn't mention git-absorb[1], I wonder if you've tried it. If not, you might be very happy to discover it.
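Roughly, it guesses the --fixup targets for you; a sketch of the basic usage, going from memory of its README:

    # stage the fixes, let git-absorb work out which commits they amend
    git add -u
    git absorb                      # creates the fixup! commits automatically
    git rebase -i --autosquash main

    # or do both steps in one go
    git absorb --and-rebase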

[1]: https://github.com/tummychow/git-absorb


Interesting. I'll give it a try, thanks. At first glance it reads like too much magic that I'd have to undo manually, but if it works reliably, it would save some time. Though usually it's not much work to figure out which commit to reference in `--fixup`. In short histories, I usually just use `HEAD~<n>`, instead of hashes.

BTW, thanks for all your open source work. <3 You're an inspiration!


It is a very nice bit of magic. I've been using it for years, and I don't think it has ever given me a false positive. Plenty of false negatives though, but it tells you. And then you just fall back to what you've been doing anyway. It is literally a tool that has removed a fair amount of tedium from my workflow.


> making each step revertible

...

> CI jobs need to run

A step is only revertible if the previous state has passed CI, and as you noted that only happens if it is a separate PR.

The only reason to split a PR into separate commits is to make it easier to review and understand. But if it's so big you need to do that, it should be separate PRs anyway really.

IMO the only time you should ever preserve branches when merging them is if they're long-lived ones that multiple people have worked on and the commits in them have passed CI.


> But if it's so big you need to do that, it should be separate PRs anyway really.

The problem with that in practice is that the commits often depend on each other and then you have a choice of:

- serializing the PRs, i.e. only have one PR outstanding at a time, which increases your development latency; or

- no native UI for tracking how the PRs relate to each other

Yes, there are people trying to hack around the second problem. Still, the story there is far from great.

The CI thing is a real issue, I'll give you that. Though it is obviously also solvable, if the will for it were to exist.


> serializing the PRs, i.e. only have one PR outstanding at a time, which increases your development latency

I don't see a significant downside to this. It doesn't affect my development latency - if the commits depend on each other serially then you have to wait for them to be reviewed in order whether or not they are separate PRs.

I do agree that GitHub/Gitlab don't support dependent PRs very well. You pretty much have to wait for one to be merged before submitting the next. Not a big problem in practice but it could definitely be improved.


Once you start doing serialized PRs then doing squash merges increases your chance of a nightmare.

For example you have PR 1 and PR 2 (based on 1). Squash merge PR 1 into main. But while 2 is under review someone merges in 3 into main and you need to bring it into 2 to resolve some conflicts. You’re now hosed because the changes done in 1 are now present TWICE. First in 1’s commit, and also in the squash commit. Now you’re resolving all of 1’s changes as merge conflicts.

This may sound like a contrived example but it has happened to me EVERY time I’ve worked somewhere that demands squash commits. And this is one of the reasons I hate squash commits.

“Makes the history prettier” vs “Rewriting the actual commit history”. Telling the truth > pretty.


Ah I think where you're going wrong is trying to merge master into PR 2. Instead you should just rebase PR 2 on top of master. Git will automatically figure out that PR 1 has been merged and drop it. If it doesn't for some reason you can just do an interactive rebase.

Git history isn't "the truth" until it's shared. There's absolutely no issue rewriting history history for your own edits that nobody else is using yet. You do that every time you press undo!


> Git will automatically figure out that PR 1 has been merged and drop it.

Unfortunately, that part doesn't work if you let GitHub do a squash, because then the commit on main has no corresponding commit on (the branch of) PR2.

When cherrypicking the commits corresponding to PR1 during the rebase, the merge algorithm will notice that the change is already there and notify you of the empty commit if the change is completely identical. But if it isn't, that falls down.

It's still not too difficult to recover, but it's annoying.
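For the record, the least painful recovery I know of is to tell git exactly which commits to replay; a sketch, assuming you still have PR 1's branch locally (branch names are placeholders):

    # replay only the commits pr-2 added on top of pr-1, onto the new main
    git rebase --onto main pr-1 pr-2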


> Unfortunately, that part doesn't work if you let GitHub do a squash, because then the commit on main has no corresponding commit on PR2.

Did you try it? It does still work. Git will detect that the changes are identical and drop it, even if the commit hash is different.


If you're doing proper atomic commits, as most people critiquing squashing probably are, the overhead of making separate PRs for each would be ludicrous. It depends heavily on the situation, but in favorable conditions (greenfield development of some simple CRUD app, for instance) you can easily produce dozens of clean, atomic commits a day. As part of the same PR, those take almost no time at all to review. Put into separate PRs, you'd be wasting a lot of time and effort both on the reviewer's and reviewee's side.


You can "just" do anything. The problem is that things don't "just" work that smoothly. What happens when the feature or bug fix I'm working on demands a refactor or a name change or something of that sort. I could put the refactor in a separate PR, but what if I'm not sure of the changes until I get far enough along with implementing the feature or bug fix? I might want to go back and tweak the refactor I did. So if I put up a PR as soon as the refactor was done, I'll then need to put up another PR with tweaks to it and then a third PR with the actual feature or bug fix. Or I could just put them all up in one PR together broken down by commit. Reviewers can review commit-by-commit. Or I could wait until I've finished everything, and then I'm left with submitting a single PR or splitting them into multiple PRs are submitting them simultaneously. (And dealing with stacking them appropriately.)

This is of course a balancing act. Which is my point. Sometimes it makes sense to split things up into multiple PRs. But sometimes it makes sense to fatten a PR a little bit with multiple commits. You can't just say "small PRs are better." Size is but one dimension of what makes a good PR.

This is why I personally use both "squash & merge" and "rebase & merge." If a PR has a bunch of commits but is really just one logical change, then I'll squash it. But if a PR has thoughtful commits, then I'll rebase it and do a fast-forward merge.

My bottom line is that I try to treat the source history as well as I treat the source. The source history is a tool for communicating changes to other humans, both for review and for looking back on. Squash & merge has a place in that worldview, but so does rebase & merge.


The problem there is that toy systems like Github's review tool don't have good flows for stacking dependent changes, and then people end up not bothering.

For this reason I don't bother with Github and the like and just use Gerrit ;)


If you actually have commits which stand alone - i.e. the build succeeds, the tests pass etc - I see no reason not to land them altogether with a rebase. What I object to is commits which do _not_ meet those criteria, and make life much harder for the next person who has to spelunk through the history trying to work out what has happened and why.


We're in full agreement there. If you run an "every commit should compile" project, rebase. If you have "fix" and "another fix" commits stacked atop the original in a PR, by all means squash them all.


Rebasing would be so much better if Git just had better defaults.

Having to fix the same merge conflict for each of your commits is one of the leading causes of developer burnout :D


Yeah it can be a bit of a pain. If you have long running branches or do keep hitting having to re-resolve, getting good at git rerere is recommended; "reuse recorded resolution"!

https://hn.algolia.com/?q=git+rerere
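For reference, it's off by default; enabling it is just configuration:

    # record each conflict resolution and replay it next time it comes up
    git config --global rerere.enabled true
    # optionally, auto-stage files that rerere resolved on its own
    git config --global rerere.autoUpdate true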


Rebasing generally doesn't require repeated merge conflicts though? Since the formerly conflicting commit disappears and the non-conflicting one is now baked into a linear history.

Regardless, `git rerere` is supposed to solve that problem, but I don't do enough conflicting merges to be intimately familiar with it in practice.


I’ve known developers who left the company over the rebase mandate for this exact reason.


Squash before rebase?


It sounds like you're more describing a stacked PR workflow. I achieve the same thing using stacked PRs and can still have a bunch of the "fix stuff" intermediate commits that get squashed away, because who can really predict whether everything will pass CI in the first attempt. :)


I don't want to have a stack of PRs where each one depends on the previous, where each PR needs to justify itself separately while at the same time being interdependent and ordered. That adds cognitive overhead and makes it take longer to get merged, if it gets merged at all.

It's possible the tooling could handle that case much better, but until it's sufficiently better that it's as simple as `gh pr create` by the author and one click of a merge button (or equivalent "@somebot merge") by the reviewer, that's still too much.


If you're using GitHub or gitlab and merging through pull requests, I've found that these commits become duplicative and, given GitHub's rich comments-based collaboration UX, somewhat lossy. It is much easier/more valuable for me to view a commit off main that points to the PR that brought it in and the discussion that took place (along with the original commit in that branch) than to see the individual commit rebased into main without context.

Also, a lot of people (myself included) write really crappy commit messages that don't tell the whole story behind a change. This is another reason why falling back on the PR has been valuable for me.


Historically any time I try to rebase a branch with more than a few commits in it, it effectively fails unless I squash it because resolving all the conflicts for every commit would take me over an hour. Maybe this is just a 'many contributors' problem though, since our repo lands many large PRs each day.

(Merging has the same problem, so I squash frequently and then rebase.)


How does merging have the same problem? By definition, merging only requires you to resolve the conflicts once, at the point of the merge commit.


Me too. I think git blame is the ultimate documentation tool - you can have a description for each block of code, tied to who wrote it and when. If you squash, all of a sudden you have a single explanation for hundreds or thousands of lines.
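Something like this, with a hypothetical file and line range:

    # who last touched these lines, and in which commit?
    git blame -L 120,130 src/server.c
    # read the full story behind a blamed commit
    git show <sha>
    # skip past a known noisy commit (e.g. a mass reformat) when blaming
    git blame --ignore-rev <sha> src/server.c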


IMO you should always squash; when merged, a PR/MR should act as a single unit, not many commits.

Also, having many commits does not mean it's going to be easier to revert/fix than a single big one.


> Squashing makes the diffs much more painful, such as by burying an important change in a sea of mechanical ones when they were originally in separate commits.

Can you give an example? I don't even think I understand what you're saying. You control your overall summary, so it's up to you to make it useful.

I would cite clarity as my reason for squashing! I think most people are just bad at organizing (& naming) their commits, so they make A LOT of vacuous ones, crowding out the forest for the trees. It's never helpful to see the git blame saying things like "Addressed PR comments" or "Resolved merge conflict", etc.

I do prefer a merge commit when the author has done a good job with their own commits, but that's rare. In all other cases, squashing is greatly preferable to me.


> > Squashing makes the diffs much more painful, such as by burying an important change in a sea of mechanical ones when they were originally in separate commits.

> Can you give an example?

Sure. Here's a common pattern I've used and seen others use:

Commit 1: "Introduce a helper for XYZ", +35,-8, changes 1 file. Introduce a helper for a common operation that appears in various places in the codebase.

Commit 2: "Use the XYZ helper in most places", +60,-260, changes 17 files. Use the helper in all the mechanically easy places.

Commit 3: "Rework ABC to fit with the XYZ helper", +15,-20, changes 1 file. Use the helper in a complicated case that could use some extra scrutiny.

I don't want those squashed together; I want to make it easy, both in the PR and later on, to be able to see each step. Otherwise, the mechanical changes in commit 2 will bury the actual implementation from commit 1 in the middle (wherever the file sorts), and will mask the added complexity in the commit 3 case.
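Kept separate, anyone reviewing (or spelunking later) can walk the series one step at a time; a sketch, with a hypothetical branch name:

    # list the series oldest-first, then inspect each step on its own
    git log --reverse --oneline main..xyz-helper
    git show <commit>   # e.g. commit 3, the one that deserves extra scrutiny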


> rebasing creates a cleaner, more understandable history & state of the world without the clutter of merge commits

"Cleaner", for some definition of "clean". In this case, pretty, not accurate.

I just can't understand the draw of rebase based workflows. It seems to be an expression of a preference for aesthetics over accuracy. What is the point of source control, other than to reliably capture what actually happened in history? As soon as you start rewriting that, you compromise the main purpose.

Using merge commits preserves what you actually committed. If you ran tests before you committed, rewriting that commit invalidates that testing. If you need to go back and discover where a problem was introduced or what actually happened, with certainty, in a commit history, rebase undermines that, because it retroactively changes your commits.

It's like a huge portion of the industry is collectively engaging in a lie, so that our commit histories look prettier.


> What is the point of source control, other than to reliably capture what actually happened in history?

Unless you're committing every keystroke, you're recording a curated history. You choose when to commit, and by choosing you declare some historical states to be worth keeping and the rest to be merely incidental.

I think usually history "rewriting" (eg, rebasing) is much more about curation - choosing which aspects of the history you care to record - than it is about presenting a false record.


Exactly. To analogize to actual history: OP wants the version control history to look like a collection of primary sources. Here's the president's daily calendar, there's the letter he received on April 24 from a small child in Wisconsin. In this model, it's up to future code historians to piece it all together into a story.

When I go back and look at the git history, I would much rather have had someone do the work of compiling the story for me at the time. Commits are your chance to document what you did for future programmers (including future you). If you insist on them faithfully reflecting every change you made over the course of three days, then future you will have to piece that all back together into a coherent story.

Why not take the chance to tell the story now, so that future you can skip all the false starts and failed experiments and just see the code that actually made it into main?


This isn't a novella. We're talking about executable code. What you're suggesting is the equivalent of using an encyclopedia as a legal reference.

Merge commits tell the coherent story. Commits reveal the messy history that got you there, which is critical exactly when you need to look at history. If you're not trying to track down the source of a problem and how it was introduced, in a deterministic way, why do you bother keeping source history? Publish pretty changelogs instead.


Can you give a concrete example of when you've used the messy details of how a change was introduced at a sub-PR level?

I'm strongly opposed to squashing, but when have you found that a chronological sequence of commits-as-they-were-committed has been helpful where a sequence of heavily-cleaned-up patches would have obscured useful information?

In my experience spelunking through git history, I've only ever been frustrated at the number of different red herrings I've found in a git blame that turned out to be a failed experiment that never got merged in.

Concretely: API changes are a big one, where in the history it looks like we may have once accepted something different than we do now, but then it turns out that that change was reverted before ever making it to production. This information being in the log clutters the git blame (the function was actually last changed in 2016, but someone modified it last month only to revert the change before submitting a PR), without providing an ounce of useful information about the history of the production app.

As a rule, when debugging problems, I don't care about how your private branches changed over time, I care about how the production code changed over time.


> the function was actually last changed in 2016, but someone modified it last month only to revert the change before submitting a PR

I can't think of a specific example from my own history, but something like this is what has happened. A function was changed in order to support a different change elsewhere in the code. That other change was later modified, incompletely, to remove the need to modify the first function, and the change to the first function was subsequently reverted. Down the road, it's discovered that the modification was incomplete, and when reviewing the new code, you wonder, "how could this possibly have ever worked?" The answer is that it didn't, and when it was committed, there was another supporting change that made it work. By erasing the history of that other change, you remove the possibility of discovering the reasoning behind the change and the source of the introduction of a problem.

If I had seen that intermediate state that's been erased, remembered it, and tried to find it, I'd now be gaslit by source control, because I remember a real change that was there in a commit, but source control will now lie to me and tell me that it never existed.


> I'm strongly opposed to squashing

> As a rule, when debugging problems, I don't care about how your private branches changed over time, I care about how the production code changed over time.

Ironically, squashing is probably the best tool you have to deal with developers who won't clean up their PRs. It's a pretty blunt tool though.


It's better to have the full detail in the case of an audit. It's almost guaranteed to be to the developer's benefit.


Can you provide more details of what you're referring to? I understand the importance of an auditable trunk/production branch, but I'm having a hard time imagining why the sequence of commits on feature branches would matter in an audit.

The commit history is not an audit log, it's very easy to make it look like whatever you want it to look like, even if rebasing as such is banned. I have a hard time picturing a scenario where the commit history is trusted as an audit trail and it matters that every detail is present.


I'm referring to an outside certified audit of your code. You can make it look worse for yourself with rebases/squash merges, but assuming you are working legitimately, those would tend to obscure your work in real time. What you as a developer would want is to be able to mirror your code changes along with the change requests.


Okay, but rebasing is changing each point in time of that history–that you curated by choosing when to commit–to be something different from what it ever was, retroactively. It's literally creating an entirely new history that nobody has ever actually examined, introducing the possibility that points along that history are inconsistent with what was intended at the point of each commit.


> creating an entirely new history that nobody has ever actually examined

I think the confusion here is that you're assuming that OP's commit history looks like yours, with dozens of commits per PR that no one could possibly examine in detail with each rebase. At least for me, since I'm okay with rewriting history on local branches, I have a very small number of commits that do get examined each time I rebase.

I average 3-4 commits per PR. There's usually one that refactors the existing code to lay the foundation for a new feature, maybe one that just moves a few files around (to ensure git recognizes them as moves and not delete/recreate), and 1-2 that introduce the new feature.

When I rebase on main, I examine the diff for each commit before pushing to my branch. If something has meaningfully changed, then I adjust the commits appropriately.

My commits aren't a history of what actually happened, they're a description of the steps that it takes to add a feature to (or fix a bug in) main. If main changes in a way that introduces a conflict, I want to reevaluate each step that I'd previously laid out.


Try it like this, see what you think:

Commits serve two needs: saving your work and publishing it. Adopting an "early and often, explain what you did" approach is effective for saving, but when it comes to publication a "refine before release, explain why you did it" strategy is more valuable.

The commit history is an artifact of the development process, just like documentation, tickets, or even code. I'm sure you wouldn't complain about people taking the time to write better comments, and a commit message is like a super-comment, because it can apply across multiple files.

Honestly, do a maintenance programmer a favour - fix up your commits before publishing them. A linear history makes tools like bisect easier to work with.
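As a sketch of that payoff (the tag and test script are placeholders), bisecting a linear history can even be fully automated:

    git bisect start
    git bisect bad HEAD
    git bisect good v1.2.0          # last known-good release
    git bisect run ./run-tests.sh   # exits 0 for good, non-zero for bad
    git bisect reset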


I wonder if the difference here is in what your quality threshold for a commit is. I commit when I reach a point of coherence in the code, and ensure that the code passes tests before I commit. Each commit is thus a checkpoint of coherence, where the points in between may be out of order or failing tests.

Maybe I just don't consider "saving your work" to be a valid use case for commits. Use an IDE or other local tools for that. Commits are points that are worth saving (or "publishing" if you prefer) beyond your local workspace.


So you're already doing curation of what the source history is! Us rebasers just do a little more, and we aren't afraid to rewrite history (before merging to master) to do it.

What happens when you're a few commits deep and realize one of your prior points of coherence could benefit from revision? Perhaps an extra line of documentation. Or a small bug fix. Or a new helper routine. I would go back to the commit where it belongs and put it there. Or, if it deserves its own commit, then create a new one. But the point is that the source history is itself a tool I use to communicate with others (including my future self).
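Mechanically, going back is just an interactive rebase with an "edit" stop; a sketch:

    # revise an earlier commit in the series
    git rebase -i HEAD~3     # mark the target commit as "edit"
    # ...add the line of documentation or the fix, then:
    git add -u
    git commit --amend
    git rebase --continue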


Agreed. This is why I rewrite history: I curate commits so that I have only 1 commit in main, ever. You’re already doing curation, I just do a little more!


I realize you're trying to be cute, but my argument isn't "more curation is always better." My argument is, "if you're going to do curation anyway, you might as well acknowledge as such and maybe even be intentional about it."

Curation is a means to an end, not an end itself. And rewriting history on main would violate the obvious rule of not rewriting history that you collaborate with others on.

If you're genuinely curious, see my other comments in this thread. That should clarify things.


No, I’m reducing your argument to the absurd extreme. We both acknowledge there’s a line to be drawn. I would personally draw it at “the commit is the finest level of curation”, which reasonable people can disagree on.

I just find it absurd of you to argue that “we’re both curators if you think about it” as if that has anything pertinent to add to the conversation.


I don't see what's so absurd about it. On the one hand, we have people talking about the "actual" history and "coherence points." And on the other, we have people talking about rewriting history so that it is curated. For example, in the follow-up comment, they said, "So now you've erased the record of your actual process." As if there is one "actual" history and one that isn't. But neither are actual history, and that's what I'm pointing out. Pointing out that both are forms of curation is important because it makes it clear that the difference is a matter of degree, not of something categorical.

But no part of this leads one to conclude that the most possible curation is the best. So your "reducing your argument to the absurd extreme" does not follow. If you're trying to use it as a rhetorical device, then try harder. If you already acknowledge there's a line to be drawn and that both are forms of curation, then I don't see what we're disagreeing with.

> No, I’m reducing your argument to the absurd extreme. We both acknowledge there’s a line to be drawn.

I just explained in my previous comment why my argument doesn't let you draw the extreme conclusion. If you don't want to engage with it directly, then don't bother.

> as if that has anything pertinent to add to the conversation

Pot, meet kettle.


So now you've erased the record of your actual process, which might be revealing later to someone who's trying to figure out what the heck you were thinking, for the sake of creating a history that looks more linear or tidy than the reality of what happened. And, if you're not running tests and re-evaluating all the intermediate steps along your history, you're introducing the possibility that you've invalidated something that worked at one of those points in history and no longer does after you rewrite it.

This strikes me as a crazy fastidiousness over making your history look the way that you want it to look, rather than preserving the actual history, which is detrimental to the value of being able to find out what actually happened when something goes wrong.


> So now you've erased the record of your actual process

You have too! Unless you're recording every keystroke, which I assume you are not.

We are both curating source history. The only difference is that I'm intentional about it.

> that might be revealing later to someone who's trying to figure out what the heck you were thinking

More curation makes this easier, not harder.

> for the sake of trying to create a history that looks more linear or tidy than the reality of what happened

No. For the sake of communicating changes. Linear history and curated source history are just means to an end. They aren't an end in themselves.

> if you're not running tests and re-evaluating all the intermediate steps along your history, introducing the possibility that you've invalidated something that worked at one of those points in history and no longer does after you rewrite it.

A risk for sure. Not a big one in practice in my experience. And you can always configure CI to run on each commit, although the tooling to do this isn't great these days.
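One local approximation that needs no CI tooling at all is rebase's --exec flag, which replays the branch and runs a command after each commit, stopping at the first failure ('make test' is a stand-in for your actual test command):

    # verify that every commit on the branch builds and passes tests
    git rebase --exec 'make test' main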

It's a downside for sure. But I'm very happy to pay it. Usually the worst thing that happens is you have to skip a commit now and then when doing a bisect. Reverts can also be more painful depending. If the pain becomes too great, then absolutely reevaluate. I wouldn't spend so much effort curating history if it just led to me fighting with it all the time. But it doesn't.

> This strikes me as a crazy fastidiousness

To be honest, based on your comments, it doesn't look like you've given that much thought to this. Firstly, you think the choice is between "actual" history and curated history, when in reality, the choice is between some incidental curation and intentional curation. Secondly, you seem to think I'm just doing this for the fun of it, or for the sake of it. But I'm doing it for the same reason I try to write code in a way that can be understood by others. That's it.

> which is detrimental to the value of being able to find out what actually happened when something goes wrong.

This tells me you've likely never worked in an environment where intentional curation was prevalent. Intentional curation makes this easier, not harder. It's one of its benefits and one of the reasons I do it. Intentional curation makes it much easier to understand the sequence of logical changes over time that has brought the code into its current state.


> You have too! Unless you're recording every keystroke, which I assume you are not.

Surely you can understand the difference between omitting less interesting points along a timeline and literally changing what was recorded retroactively for points that have been selected as meaningful along that timeline?

> More curation makes this easier, not harder.

Not when "curation" is revision after the fact. What you're describing as "curation" is changing the recorded/published history from states that were intentionally recorded and examined to new states that never were even run or examined anywhere. When I'm trying to answer the question "how did this ever work?" or "what were they thinking" and the answer is "it didn't", because they committed something different, this makes troubleshooting and determining intent infinitely more difficult and complicated.

> A risk for sure. Not a big one in practice in my experience.

I've definitely spent days of my life trying to track down inexplicable problems in other people's code as a result of their rebasing, that cannot be fully explained because the history of what they actually committed was erased.

> And you can always configure CI to run on each commit, although the tooling to do this isn't great these days.

What are you even calling "continuous integration" if you're not running tests on every commit? This also highlights that if you were doing that, which I do, and you should be, that history becomes misleading after a rebase unless you re-run tests against every commit.

> you think the choice is between "actual" history and curated history, when in reality, the choice is between some incidental curation and intentional curation

Again, do you not understand the difference between capturing something that actually occurred and changing that capture to be something that never occurred? Your curation is literally a series of lies about the code (that I understand you may find easier to read and more convenient for the goal of forming a high level understanding of the changes over time), whereas what I prefer is a faithful recording of history. The integrity of this captured history matters a lot when you're dealing with executable, deterministic code, and the outcome of running a program can be changed by your "curation".


> Surely you can understand the difference between omitting less interesting points along a timeline and literally changing what was recorded retroactively for points that have been selected as meaningful along that timeline?

You know, the word "history" might be abused by git as much as the word "friend" is by facebook.

BTW, I never really got an answer to this one: what do you do when you notice a typo ten seconds after you committed? A new commit that says "typo fix", or squash the commit on to the previous commit?


If I've already pushed my branch, typically I have a draft PR open and tests are being run against it, and I'm not going to force push, because that's obnoxious and I've probably prohibited it in the repo.

If it's a local commit and I catch it in ten seconds, I will sometimes throw up in my mouth a little bit as I amend the commit.

I view source control as one of the best places for an immutable log of events, and prefer immutable logs for many things in general, for a variety of reasons. So yes, I fix forward.


> Surely you can understand the difference between omitting less interesting points along a timeline and literally changing what was recorded retroactively for points that have been selected as meaningful along that timeline?

Yes? But that isn't what you said. You said "actual" history. Using that phrasing makes this conversation extremely difficult because it doesn't acknowledge that our positions are different by degrees rather than by categories. You said "actual" as if it were somehow inherently better because it's the "actual" history. But it isn't the "actual" history. So your communication on this point just becomes befuddled. Please be more precise.

> Not when "curation" is revision after the fact.

How many times do we have to go over this? Unless you're recording every single keystroke, then you are also doing "revision after the fact." There are differences between our approaches, for sure, but "revision after the fact" does not capture them.

> When I'm trying to answer the question "how did this ever work?" or "what were they thinking" and the answer is "it didn't", because they committed something different, this makes troubleshooting and determining intent infinitely more difficult and complicated.

What are you talking about, "committed something different"? I don't take a PR, rewrite history and then merge it. I rewrite the history, push it back up to the PR branch and only merge (via rebase if appropriate, or sometimes via squash) when CI passes. The collection of commits still passes. CI doesn't guarantee that each individual commit does, but I already acknowledged and discussed that downside. The curation of commits is specifically all about making intent and understanding the change easier. That's the entire point!

> I've definitely spent days of my life trying to track down inexplicable problems in other people's code as a result of their rebasing, that cannot be fully explained because the history of what they actually committed was erased.

I can't even conceive of an example of this. Can you give one? Even if it's hypothetical, that's fine.

To be clear, I can imagine the following examples of things going awry:

* Squashing is used which causes many commits to get squashed into one, and thus can make the history of changes less clear depending on the commits. For example, if a PR contains 2 commits where there's a thousand lines as a result of adding a new function parameter in the first commit, and then a second commit with one additional line calling the function using that new parameter in an interesting way, then squashing those two commits into 1 will lead to history that is less clear. But this is why I don't advocate for squash & merge in all cases.

* Since CI doesn't run on every commit, if you need to revert a PR that merged multiple commits via rebasing only, then you might need to revert all of the commits that came in from that PR individually. That can be a pain and it can be difficult to discover which commits you need to revert.

* Since CI doesn't run on every commit, it's possible that `git bisect` can be more annoying than it otherwise would be. Maybe tests don't build on one commit. Then you need to do `git bisect skip`.

But none of those are about browsing the source history when using rebase & merge. I can't even begin to imagine a single example of browsing the source history where I would specifically want an "actual" accounting of the history without any intentional curation. In literally every instance of me browsing source history in over 20 years of programming, I cannot imagine a single instance where I found curation to be unhelpful and wished that the source history was somehow more faithful to how the programmer arrived at the change instead of focusing on communicating the change to other programmers.

> What are you even calling "continuous integration" if you're not running tests on every commit? This also highlights that if you were doing that, which I do, and you should be, that history becomes misleading after a rebase unless you re-run tests against every commit.

If you open a PR on a GitHub project with 5 commits, GitHub Actions will not run on each commit by default. I'm not aware of an easy way of changing that behavior. If you "rebase & merge" that PR, CI still won't run on every commit merged. Here's an example from one of my projects, where you can clearly see that not every commit has a green checkmark: https://github.com/BurntSushi/ripgrep/commits/master/

I run dozens of projects this way. I've never had a major issue because it just isn't a big deal if one commit now and then doesn't pass tests. If it were a bigger deal, then I'd absolutely either reconsider my curation or invest more in improving CI tooling.

> Again, do you not understand the difference between capturing something that actually occurred and changing that capture to be something that never occurred?

We're speaking past each other. I don't know how else I can explain that there is no such thing as "capturing what actually occurred." You keep saying that, but even in that case, you aren't capturing what actually occurred. You're capturing an ad hoc curation of what actually occurred.

> Your curation is literally a series of lies about the code (that I understand you may find easier to read and more convenient for the goal of forming a high level understanding of the changes over time), whereas what I prefer is a faithful recording of history. The integrity of this captured history matters a lot when you're dealing with executable, deterministic code, and the outcome of running a program can be changed by your "curation".

You don't have a faithful recording of history though. Your source history is also a lie. And the thing you call a "faithful recording of history" is more like a meandering series of "fix typo" or "fix lint" or whatever commits. The only benefits it has that I'm aware of are the following:

* It's easier, in the sense that you don't pay any attention to how a patch series is structured. You just code and commit and don't worry about anything. To me, this is like writing code without caring about whether someone else (including you) can read & understand it. Which is a thing. Lots of people do that. Let's just be open and transparent about it.

* In some cases, there is less friction with the tooling.

I still don't think you've actually tried the type of curation I'm talking about. On the other hand, I arrived at my position on curation after years of doing your approach of capturing a "faithful recording of history" and realized it was just about useless.


> Unless you're recording every single keystroke, then you are also doing "revision after the fact." There are differences between our approaches, for sure, but "revision after the fact" does not capture them.

He corrects mistakes forward (new commit fixes old commit), we correct them backwards when possible (just fix the old commit directly), otherwise forwards. I know which I prefer, but nobody's going to convince anybody.

I just wish I didn't have to wade through dozens of pointless "Fold in John's suggestions from the PR" commits when trying to get to the meat. Or have git bisect land on a merge commit with two parents, throw up its hands and say "over to you, pal".

Funny thing is, I'm normally a proponent of "worse is better". I wonder why I'm not in this case. Probably because a rebase repo is a single train track, and so much easier to reason about.

(I bet the answer to your "when did rebase screw up so badly it took days to unpick" is something to do with "push --force". With great power...)


Yeah I actually don't necessarily mean to convince anyone of my way of doing things. I'm more or less just trying to convince others to both convey their ideas more clearly, and more importantly, recognize the trade offs in each of the approaches accurately. I do not feel either one of those things is really being done here.

I feel like this sort of confusion comes up every time there's a discussion about rebase versus merge. My best explanation for it is the combination of overloaded terminology (jargon versus layspeak, e.g., "history" and "merge") and inexperience. I don't jump in every time, but when I do, it feels like I'm banging my head against the wall. Sigh.

> He corrects mistakes forward (new commit fixes old commit), we correct them backwards when possible (just fix the old commit directly), otherwise forwards. I know which I prefer, but nobody's going to convince anybody.

I understand this. But I don't like the phrasing because it doesn't tell you anything about the differences in the approaches, when you might want to use one over the other and the trade-offs.

> I just wish I didn't have to wade through dozens of pointless "Fold in John's suggestions from the PR" commits when trying to get to the meat. Or have git bisect land on a merge commit with two parents, throw up its hands and say "over to you, pal".

Yeah those pointless commits are why I curate. And the overhead of submitting one-PR-per-commit is why I use both rebase & merge and squash & merge on GitHub.

> Funny thing is, I'm normally a proponent of "worse is better". I wonder why I'm not in this case. Probably because a rebase repo is a single train track, and so much easier to reason about.

I definitely don't chase perfection here. I don't mind having a commit that doesn't build or pass tests now and then. What I'm after is communicating clearly. Both to folks reviewing my code and to folks looking at the source history 6 months from now. I quite literally try to treat source history like I treat the code itself. Both things benefit from thinking about how other humans are going to interpret it in the future.

> (I bet the answer to your "when did rebase screw up so badly it took days to unpick" is something to do with "push --force". With great power...)

I'm sure I bungled things up pretty badly when I was first learning `git rebase`, but that was so long ago I can't remember. In working memory, the worst fuckups with rebase have been with `push --force` (well, `--force-with-lease`) and dependent PRs. But I just recently learned about `git rebase --update-refs`, and that's already made things a lot nicer.
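For anyone who hasn't seen it: --update-refs (Git 2.38+) moves the refs of stacked branches along with their rebased commits. A sketch, with hypothetical branch names:

    # with a stack main <- feature-1 <- feature-2
    git checkout feature-2
    git rebase --update-refs main   # feature-1 now points at its rebased commit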


Oh, sweet. That's going to make the branch I'm working on right this minute easier to deal with.


> You said "actual" as-if it was somehow inherently better because it's the "actual" history. But it isn't the "actual" history. So your communication on this point just becomes befuddled. Please be more precise.

actual: "Existing in reality and not potential, possible, simulated, or false: synonym: real." history: "A chronological record of events, as of the life or development of a people or institution, often including an explanation of or commentary on those events."

One of these things is a representation of the state of a codebase at a point in time. The other is a representation of a state of the codebase that never existed at any point in time.

> our positions are different by degrees rather than by categories

Absolutely not. I'm not sure how to interpret your comments as other than that you may not understand what rebase is actually doing.

A commit is a snapshot of a codebase at a point in time. If you commit when you've run your program, recording a point along the path of modifying the code where you've observed the codebase to be consistent, rebasing retroactively changes the snapshot of the codebase to something that you have never examined.

If foo.c defines a function foo that calls a function bar in bar.c, and you've updated the way that you call foo in foo.c and someone else updated the behavior of bar in bar.c, the act of rebasing in itself can change the output of your program without recording the step of making that change, and without you ever observing the program's behavior after that change (and before any other commits you've presumably made to get your code to its current state).

Are we at least on the same page that rebasing in itself makes changes to the atomic bits of recorded history, irrespective of the size of those atoms? You seem to be fixated on the size of the steps being recorded, which is completely irrelevant to the point that rebase is retroactively changing the composition/snapshot of each step. The difference is between an immutable log of immutable events and a mutable log of mutable events. One of those is easier to reason about.


Kind of. I've been thinking about it since I wrote that, and what I say and what I do are a bit different. I don't rebase when I'm done, I rebase as I go as a constant background hum.

So I start by creating a few empty-ish commits that are roughly analogous to the tasks you'd break a ticket down into. Then I create many small WIP commits, but in the commit message I note which task they belong to. So I might have two initial commits in a branch that say "[#123] Refactor foo" and "[#123] Upgrade bar", followed by a bunch of "[WIP] typo fix, merge against foo" and "[WIP] Preparing baz to upgrade bar". Then when I feel I've reached a point of sanity I pull the main branch, rebase my feature branch on top of it, merging my WIP commits as I go. Occasionally I'll even go back and split a WIP commit in half if there's a better logical mapping to the tasks.

If I haven't pushed I don't consider it saved, so I wouldn't like to rely on local-only tools in that way. I'd much rather push to a remote repo daily. It's not like anyone's going to see it until I raise a PR.

What do you do when you spot a typo ten seconds after you committed something? A separate typo fix commit? I prefer to merge it on to the previous commit. Nobody needs to run "git blame" and see "typo fix" as the last time that line was touched. It's noise.
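Concretely, that's a two-liner, as long as the commit hasn't been pushed anywhere shared:

    # fold the typo fix into the commit you just made
    git add -u
    git commit --amend --no-edit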


> So I start by creating a few empty-ish commits that are roughly analogous to the tasks you'd break a ticket down into.

This is absolutely wild to me, and admittedly not a way I've ever imagined source control being used. I can't say that I have a fully developed opinion of it, but I have a feeling this would drive me nuts as a reviewer. It seems like you're using source control to craft a descriptive history around your changes, designed to tell a story you wanted to tell rather than the messy, authentic history that reveals the struggles you went through and problems you solved along the way. But by doing so, you're creating a fabricated history and losing the aspect that is more like an audit log. So I would just not trust any of it other than the outcome.

I simply don't give much value to human narratives about code, so that's why I prefer a messy history that's a reliable log of the steps you actually went through over a narrative history that might be nicer to read.


I think the whole [WIP] approach is quite a good workaround for saving vs. feature-complete commits. TBH I don't bother rebasing [WIP]s but I understand why that might be desirable. Each non-[WIP] commit should be a complete, fully integrated feature (and ideally only one "feature").


I don’t look at anything other than the merge to a trunk or main as part of the history. It’s not an audit log. I often do checkpoint commits to move local state to a central git as a backup, or commit when I simply want to have a rollback option for something I’m not confident in. I always commit at the end of a day, for instance, and push to a remote, as I don’t trust my laptop or whatever, or worse some cloud dev machine.

None of these commits are useful for anyone, not even myself, beyond the immediate utility. I squash intermediate commits between change sets, and try to only reveal atomic change sets on any shared branch.

It’s absolutely the history of what has changed, but it is not some sort of journal log of every event in my development workflow. The shared branch should absolutely be the evolved history of the source code, but without reflecting the work style of any one developer. It should be a comprehensible history of meaningful changes that can be independently reasoned about and cherry-picked or reverted as necessary. Every other commit is noise to everyone, including yourself, once it leaves your own branch. Since it didn’t even run in production, there’s not even a plausible regulatory reason to keep them.


Why not have both? If you can filter by merges, what's the harm in having intermediate positions? There have been various points I actually wanted to have the vim undo log as well. That's what I'd really like - essentially a way of undoing back to time zero, with commits denoting feature complete positions and merges denoting, well, mergeable positions that have passed review.


In branch-based development you do have both, in your branch. But if you’re working with other people, do they want your undo log? Is there any value to your undo log in, say, 10 years? The git repo on the main branch is your shared artifact, and as a matter of good practice it should be treated as a shared resource that’s presentable to all, presents an easily understood and consumed interface, and is free of individual noise. If you want an undo log for your work on X, then when you merge your branch with main, don’t delete your branch. But the shared artifact shouldn’t be filled with everyone’s individual work-process artifacts.


This. I really don't mind merge commits; it's nice to see what happened when. Especially if you run into conflicts and issues caused by bad resolution, it's much better to have a clear, true history.


the point of git is to enable linus or al viro or whoever to review your proposed changes as quickly and efficiently as possible, so they can be confident that what they're merging into their kernel tree is relatively sane, and then to actually do the merge in a reliable way that won't introduce other unintentional changes, and to be able to reproduce their own previous state

in that context it makes sense to use rebase to present linus with the cleanest, most comprehensible patch set possible, not your lab notebook of all the experiments you tried and the obvious bugs you had. you don't want to waste linus's time saying 'you have an obvious bug in commit xyz' followed by 'oh, never mind, you fixed that in commit abc'

but for my own stuff i prefer merge over rebase because i'm both the producer and the consumer of the feature branch, and rebase seems like more work and more risk


I see no issues here.

If you run tests before committing then you also run them after a rebase, the same way as after a merge. If tests fail, you can force-pull your branch from the remote and have the same state as before the rebase.


You run tests against each commit in the history that you're rebasing? I doubt it, and I guarantee that nearly nobody using rebase does that.
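For what it's worth, git can automate per-commit test runs during a rebase, assuming some test command like `make test`:

    # replays each commit onto origin/main, running the tests after every
    # one; the rebase stops at the first commit whose tests fail
    git rebase --exec "make test" origin/main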


I agree. But that was what was being described.

In my experience people are not running tests locally at all. Push to the remote, open pull request and wait for pipeline results.

In such a situation the result will be the same: you will never know which commit from the merge/rebase breaks your pipeline tests.


> "Cleaner", for some definition of "clean". In this case, pretty, not accurate.

What do you mean "accurate"? The developer decides when to commit and what message to write, rebasing just enables more control over the final artifact that is shared.

Have you ever heard the writing advice: don't write and edit at the same time?

Rebasing allows one to use the full power of git during development, committing frequently, and creating a very fine grained record of progress while working, without committing to leaving every commit on the permanent record. The official record of development history is more useful if it's distilled down to the essence of changes, with a strictly linear history, and no commits that break CI or were not shippable to production (at least in theory). Doing so makes future analysis and git-bisect operations much more efficient, and allows future developers to better understand the long arc of the project without wading through every burp and fart the programmers did during their individual coding process.

To those who say, "don't commit until you have a publishable unit of work," I say, you are depriving yourself of a valuable development tool. To those who say, "don't rebase, just squash", I say, squashing is rebasing, just without curation. To those who say, "rebasing is more error prone than merging", I say, if a merge commit turns out to have a problem you will have a much harder problem debugging it because it could be caused by either branch, or an interaction which no one considered.

The beauty of rebasing is that it forces the developer to think about all the intervening changes commit by commit as if they started their feature development from the current state of the main branch. This is a more healthy mental model and puts more responsibility on the developer to ensure their code reflects the current state of the world, and not just hastily merging without recognition of what has changed since then. After all, production can only have one commit on it at a time, and given many investigations hinge on understanding what SHAs were in production at what point in time, it makes everything a lot easier with a linear history that hews closely to what was actually shipped.

I realize that there's a learning curve for rebasing, but once you understand it, it allows conflicts to be resolved much more precisely with roughly the same level of effort. You can dismiss this as an aesthetic preference, along with good commit messages, changelogs, and other points of software craftsmanship, but in my experience there is real value in maintaining a high-quality history on a long-lived project.


This. It's a dirty lie, that's not what actually happened!


Why does it matter what actually happened? Can you give a concrete example of when you care about the exact sequence of experiments, false starts, and refinements that a feature went through before making it into a PR?


Realistically, how much does merging vs rebasing actually matter - do you save days of time over the year, or just a few minutes cumulatively because the commit graph is prettier?

I understand that it makes the history "cleaner," but how frequently do you end up bisecting or manually searching the repo's commit history?

Even on large projects with dozens of feature branches that eventually make it through a dev / main / prod branch, I've never had a problem when merge was the default rule. But maybe we never hit an otherwise common problem.


Staying consistent matters. Once I joined a new team and my first task was to take care of a large code reformat everyone was afraid of taking over. I had done a similar thing many times; it's a super easy thing once you know git and are focused. It turned out to be a terrible experience for a combination of reasons, but one important reason was that the team used a merge workflow, and I had always done this with rebase. I don't remember the exact details though.


In practical terms, fixing merge conflicts and running git bisect are a lot more time consuming when you have a lot of merge commits.


It's just plain harder to reason about the history of a branch when there are a bunch of merge commits in it. Even if I'm not using bisect (I rarely do), having the 'git log' be polluted by merges makes it harder for me to fit everything into my head.

I also prefer to think about my branches as 'here is a stack of commits on top of a fixed point in time', so having merges in the middle of that flow makes it much harder to reason about that way. Rebasing to choose a new fixed point is much simpler.


> Even if I'm not using bisect (I rarely do), having the 'git log' be polluted by merges makes it harder for me to fit everything into my head.

How and why? Done properly, a merge commit of a PR branch is semantically equivalent to a squashed PR commit, from the perspective of the "trunk"/main/master branch. It is the commit introducing all of the changes of the PR branch into the trunk branch.

If you're able to reason about squashed PR commits, you should be able to reason about merge commits. The only difference is that with merge commits you still have the individual commit granularity available, should you be interested in it.


Both the git log and git bisect commands accept the --first-parent flag, which eliminates the complexities of dealing with merge commits in the history.
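e.g., assuming `main` is the trunk (`--first-parent` for bisect needs a reasonably recent git, 2.29 or so):

    git log --first-parent main      # trunk history: one entry per landed change
    git bisect start --first-parent  # bisect along the trunk, skipping branch internals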


> Realistically, how much does merging vs rebasing actually matter - do you save days of time over the year, or just a few minutes cumulatively because the commit graph is prettier?

Excellent question! This is one of those how-to-measure-intangibles sort of question, so there's really no good answer to it. The issue is that bisection is really useful when you have a hard-to-find bug, and so the question is really "how often do you have such bugs", and the answer is hard to find because few companies require recording of such metadata in their bug reporting systems.


Our git workflow:

master branch - code currently deployed to production. never used as the base branch except rare hot fixes

staging branch - the base branch for all feature branches. On prod deploy, staging is merged to master with a merge commit (probably could (should?) be a rebase)

feature branches - always use staging as base branch

Most critically: all feature branches are squashed and merged, so that each single commit in master corresponds to a single PR.

Makes it easy to revert a PR but difficult to cherry-pick after squashing. Also keeps the git history extremely clean and condensed. Not sure if this method will scale, but it’s working well at our company with 6 engineers and 20 or so feature branches open at any given time.

Edit: one reason this works for us is we keep feature branches short-lived (ideally at most 2-3 weeks) and staging gets merged to master twice a week (we do a deploy Mondays and Thursdays)


This sounds like a classic “release” branch flow. This is more scalable than trunk-based on large codebases that need lots of tests/verification


how often do you cut production from staging in this "gitflow light"?

in my opinion your staging/production parity needs to be really good if you do large iterations in prod, deploying smaller changes constantly will get you little oops moments more often, but you'll be able to fix them immediately since it's clear what caused it, as opposed as 2+ weeks of commits going to prod at once.

we had every feature branch create its own little minimal staging environment and started bugging developers to finish up after a week (the staging environment would tear itself down if not told to stay up explicitly via PR labels). and those feature environments went straight into main/production.


> we keep feature branches short-lived (ideally at most 2-3 weeks)

People that do trunk-based development would consider 2-3 weeks as quite long. My definition of short-lived is about 1 day.


In case it matters, we also push in progress code to remote at end of day. (We don’t rely on local branches for in progress work)

I’m guessing with a 1 day PR length, you’re mostly pushing finished code ready for peer review.


Out of interest, what's the value of having distinct master and staging branches, rather than just tagging prod releases on staging?


One benefit is you can have different branch restrictions. Staging can be less strict (e.g. allow merging even if the test suite is failing) while master can have stricter requirements, like no merging unless tests are passing.

Or only require code owner approval on staging->master (or only requiring code owner approval on merging to staging)

I’m sure there are ways to accomplish the same sort of thing with tags, I’m not hugely tied to this workflow (other than it seems to work for our team)


That seems sane, is there a cohort of engineers pushing for trunk based development?


That’s the same method as what we use


I’m confused. Is the workflow to rebase branches and squash merge into main? Because that’s what I do at work and it works quite well. You get atomic PR merges so reverts are easy and you get the clean history for a PR so people can in theory review commit by commit. Although if you want to use merge commits in your own branch, I don’t care because it all gets squashed. I don’t fully get using rebase to merge PRs cause then it’s exposing the commits of the PR, when in fact the PR should be considered atomic code changes. But I suppose for workflows where PRs are not considered atomic code changes, rebasing could make sense.

Really, what this boils down to is a confusion between commits as a save point and commits as an atomic code change. With my aforementioned process, commits inside a PR are save points, i.e., I need to just save my code before leaving work, while commits on main are atomic code changes (and therefore should correspond to a single pull request). In the rebase-everything approach all commits are atomic code changes, which I find a little too obsessive, since you need to make sure your code is always working when you commit, or rewrite your history so that it is true.


The article author appears to weasel-word merge commits and squash-merge together when they are very different things. Squash-merge into main / feature branch is almost equivalent to rebase and is the workflow Github / Gitlab / etc supports well in the UI. The article author might be conflating rebase and squash-merge in order to create clickbait. In particular the author cites lots of “private repos” but gives no evidence because I guess they’re private haha.


I like to use pure merge commits on my solo projects (where I actually do use feature branches), because I practice good commit hygiene (and clean up sloppy commits with interactive rebase). But for collaborating with others who can't be bothered to practice good commit hygiene, blindly squashing every feature branch before merging is definitely the lesser evil.


I feel the same way.

My commits on a PR are always rebased as I go, into one or two or at most three neat changes. Meanwhile (some) others I work with seem to have no problem creating PRs consisting of a dozen or more changes, most of which with messages like “wip”, “typo”, “fix comment” etc.


I think at scale, merges are too problematic; however, for a long time I worked on a (small) team who took the approach that git history should represent “what you actually did”, and the thought process behind that, rather than the “perfect ideal” of the changes being made.

This brings its own benefits: it is often easier to learn from the commits, and it’s often easier to review because you have more granular commits and can follow a dev’s thought process. And if you’re doing post-merge review, as we did in the early days, you don’t lose that granularity to squashing. A nice bonus was that because there was no rebasing, no one ever really “broke git”, a classic issue for more junior developers. Ultimately the approach didn’t scale beyond ~8 devs/~500k lines/~15 PRs a day, but it was good for a long time.

The important thing though is: have a git style guide, make decisions for reasons that matter to your team, and stick to the style guide.


Aside from the obvious advantages, `git bisect` with rebase-managed trees works to single-patch resolution as you would hope. On merge-damaged histories it only traces it to one giant hairball or another, i.e., bisect is made useless.
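A typical session on such a linear history (the tag is illustrative):

    git bisect start
    git bisect bad HEAD      # current tip is broken
    git bisect good v1.2.0   # last known-good point
    # git checks out the midpoint; keep marking good/bad until it
    # names the single patch that introduced the regression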


> it only traces it to one giant hairball or another, ie, bisect is made useless

The blog post is advocating a "squash, rebase, and merge" flow, which leaves equally giant hairballs.


Right, squash-and-rebase is not very good. You want meaningful history. So if you have a feature branch that adds some feature consisting of N sub-features, R refactorings, B bug fixes, then you want N+R+B commits. The commits you squash are the "fixup" commits; all others stay. You also want good commit order. Basically, write your code, commit, then refactor your commits so that you get a nice history of all the changes you made as if you were starting from the final state of play (because you are).
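Git has direct support for exactly this fixup-then-squash curation (the SHA is illustrative):

    git commit --fixup=abc1234       # record this change as a fix for commit abc1234
    git rebase -i --autosquash main  # reorders each fixup under its target and squashes it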


Oh... well, for the benefit of others, since you seem to know this well already: there's no need for merge at all if you are just adding patches that have already been rebased onto the HEAD of the tree they're going on. They just go on top cleanly and naturally, and all is right with the world. Consumers of the branch fast-forward as normal. There's a pure, linear history with no hairballs that people can bisect to patch granularity. Just say no to merge.


At some point, we decided that proposed changes (patches) should live in source control before they were approved and brought into history. In my view, this is where the problem lives.

I don't care about having an immutable record of the history of patches the same way I don't care about having an immutable record of your keystrokes as you produced the change. That's an implementation detail.

However, once something is merged into trunk, I want to know what was merged, when it was merged, who merged it, and any people who approved it.


The funny thing is — this is actually different from how OSS (the initial model that Git was built for) worked. Patches are just applied as a single linear commit to the trunk branch.

Committing the history of individual work to the source of truth remote was definitely not the intent of merges.


On a project I worked on, there were a few fans of rebase who liked rebasing long-lived branches on top of trunk a lot. They also used to think that resolving conflicts was very simple and clear-cut.

And it had indeed been so simple for them that when they finally merged their work into trunk, several bugfixes that had been released a while before the merge, just disappeared.

When the users had reported that some old bugs reappeared, it took me some time to first confirm that I wasn’t going senile and that those bugs had been fixed before, then I couldn’t find the code I had put in, then I recreated some old PRs by hand and reapplied them.

I don’t know what went wrong then and who was the idiot, and I think that my life is too short to find out exactly how it may have happened, but no commit has vanished since a blanket ban on rebasing and force-pushing.


> Merge commits have more than one parent, and Git doesn’t automatically know which parent was the trunk, and which parent was the branch you want to un-merge.

Yes git knows: The first parent is the trunk.
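Which is why reverting a merge only needs the mainline parent spelled out (the SHA is illustrative):

    git revert -m 1 abc1234   # undo the merge, treating parent #1 (the trunk) as mainline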


Shout out https://sapling-scm.com/ who makes this workflow even better.


Interesting; first I've heard of Sapling SCM. Any other comments about it?


I’ve been using it for a while, primarily to get all the benefits of the mercurial UX while still using github as a backend, but yes it also makes the stacked-diff workflow[0] really nice thanks to features like “absorb” which will smartly amend changes to a branch (eg if your branch contains one commit to add a server and another commit to add a client, you make changes to both files, then run “absorb”, your server changes will be amended to the server commit and client changes are amended to the client commit) :)

[0] Rather than a PR containing three commits “Add server” -> “Add client” -> “fix server”, it is encouraged to submit a PR of “Add server (v2)” -> “Add client” (with the original buggy “Add server (v1)” still visible in the code-review tool, so that you can see what changed in v1 vs v2, while what gets merged into the master branch is only the final bug-free v2)
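A rough sketch of absorb in action, assuming Sapling's `sl` CLI:

    # edit files that different commits in your stack touched, then:
    sl absorb   # each hunk is amended into the stack commit that last touched those lines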


Does it support phases? If it has phases, absorb, hg grep --diff, hg fa --deleted, and revsets, that would be pretty much all I need :)


Thanks! That's helpful.


I think that is 2/3 of the story. The other 1/3 is the "patch" workflow.

In linux kernel development, it is all about sending patch sequences to a mailing list, comprising well-crafted logical steps. Then the maintainer applies the changes. This side-steps the rewrite merge history problem.

This is, I argue, what the default workflow was supposed to be, as the workflow came out of the LKML working practices in developing git.

The way I see it from a project-scaling perspective, the patch-based workflow is the most scalable. Written another way: patches > rebase > merge

I think it also gives the best change history, again patches > rebase > merge

Pull Requests were really a GitHub thing. I like them. I wish people made the best out of them. When they do the atomic change to the trunk, it is worthwhile using a hand-crafted meaningful message explaining the goals, and reasons for making the change, together with a terse heading sentence ahead of the detail. Why many people advocate for rebasing for clean history, but leave a trash default-created PR merge commit message has always puzzled me.


I really like the idea of rebasing because it gives you a much easier-to-parse revision history (assuming you enforce good commit messages, structure, etc).

The problem I have with it though is early development. The history can be a total mess since lots of things are changing quickly. I often feel like it's not worth the effort of good commit discipline so early on, but once we get to an alpha or beta state, I find myself wishing we had a clean commit history where every commit on main passes CI (which may not have existed in the early commits).

Does anyone have suggestions on how to address this? The two easy answers that come to mind are 1) squash everything into an "initial commit" once possible, or 2) just don't worry about old commits passing CI. What I've ended up doing is rewriting the history to move everything on main to a "legacy" branch and have a single merge commit start what ends up being the production version of main. Even with a script, it's still error prone and a little scary to me...
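For what it's worth, option 1 can be done in a few commands while still keeping the old history reachable (branch names illustrative; note this implies a force-push for anyone tracking main):

    git branch legacy main          # keep the messy early history reachable
    git checkout --orphan fresh     # new branch with no parent commits
    git commit -m "Initial commit"  # snapshot the current tree as the new root
    git branch -f main fresh        # point main at the fresh root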


We use rebase on solvespace, along with sensible squashing so most commits along master are pretty self contained. You can see the clean history here:

https://github.com/solvespace/solvespace/commits/master/


The problem with rebase IMHO is that it's very hard to grasp and explain in a "how does it work in practice" way, and it's very easy to mess up into a state that permanently loses data, even for seasoned developers.

The combination of both makes rebasing completely unusable for teams largely comprised of junior and intermediate developers - your seniors will spend an awful lot of their time helping juniors revert git rebase fuck-ups, which is why I've seen it banned in quite a few places. We've got enough of our own crap to do without having to dig into what exactly the juniors did prior to noticing they've screwed up.

IMHO, it's therefore not surprising to see Facebook using rebase - they have money in abundance, they don't care about some senior dev wasting half a day wrangling with basic tooling.


> shifting to a squash-rebase-and-merge workflow. The benefits are clear: rebasing creates a cleaner, more understandable history & state of the world without the clutter of merge commits. Trunk branches remain linear, and branches function as brief, atomic diffs off the trunk.

Arguably, you could achieve the same result of a trunk that looks clean and linear, if the tooling just had an option to hide the feature branches, and show you only the merge commits. But you could also keep the possibility of looking inside the old branches and their individual commit messages, if you wanted to.


a funny but easy-to-overlook aspect of rebase+squash+ff-only merge is that it's basically a patch-based workflow in disguise

(though with some additional metadata during reviews and similar)


A change should be atomic.

Whether the unit of change is a commit or a pull request that gets squashed is irrelevant, what matters is that the resulting history can be understood & bisected, that changes can be reverted, and that changes can be cherry-picked.

Work-in-progress changes should be able to be backed up. This doesn't have to happen in the version control system, but it can, and if it does this should not interfere with the goals of atomic changes.


I know it is not possible, and I know it would be hard to implement correctly and have it work in all cases, but if we could have a "git willRebaseFail" test that just checks whether a clean and direct rebase is possible without touching anything, then rebase would be used a lot more. We could even have a git rebaseIfPossibleOrMergeOtherwise and set it as the default...
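Recent git (2.38+) can get part of the way there without touching the working tree, though it checks a single three-way merge rather than replaying each commit, so it's only an approximation of "willRebaseFail":

    # exits non-zero and lists the conflicted paths if main and my-feature
    # cannot be combined cleanly; nothing is checked out or modified
    git merge-tree --write-tree main my-feature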


The fundamental issue with git is that it doesn’t have branches. What git calls a “branch” is a misnomer - it should be called a version/commit pointer. In a tree of versions/commits a single pointer does not identify a "path" through the tree of versions.

When talking about VCSs in general, the word “branch” is used to mean a linear sequence or chain of versions connected by a parent/child relationship. Unsurprisingly, many git users also have this mental picture of "branches" in git, which causes confusion - but it's not really the fault of the users.

For example, if the current main/master commit in git has two parents (which it will have after a merge) then it is not possible to know with certainty which of the two parents was previously pointed to by main/master and which came from the feature "branch". Thus git is unable to present a picture/log of the main/master branch as a linear sequence of commits - all it knows about main/master is what it currently points to. So when you log, it has to present the entire tree which means mingling up commits which were master/main commits and which represent work on feature "branches".

With other "true" branching VCSs, a branch identifies an actual sequence of commits and is not just a pointer to a single version. Thus you can navigate the history even after merging. You can ask it to only show main/master commits or only show commits created on a specified branch. While once a git "branch" is advanced or points to a new commit/version, the fact that it previously pointed to a specific parent is lost.

So git "branches" can only identify a linear sequence of versions if every commit only has a single parent. And this is what the rebasing workflow gives you. And this is why the rebasing workflow, dissatisfying as it is, provides the least friction when using git.

If you use merges, then you have commits with multiple parents and now your main/master _pointer_ can no longer be used to identify a single path/sequence of commits. If main/master points to a commit with two parents, git does not record which of the parents came from main/master and which represents work which was, at one time, pointed to by the feature branch.

So while I prefer VCSs with true branches, so that merge workflows can be supported, I've given up trying to use such workflows with git - they're just not supported in a useful way.


How do people with workflows that don't do any squashing do code review? To me it always seems the main consideration deciding on commit size is about being considerate of the reviewer. Don't want to harass them with huge commits but also don't want to send barrages of tiny uncontextualized changes.


Generally, I look at the diff with the base branch, not individual commits (unless there is a reason to...)


> How do people with workflows that don't do any squashing do code review?

Going through the commits one-by-one or just looking at the entire diff both work just fine in most cases. In the former case, the commit messages (even if short one-liners) actually help understand the story of how and why the changes ended up taking the shape they did, so it's usually actually easier than just reading through one big diff.

> To me it always seems the main consideration deciding on commit size is about being considerate of the reviewer. Don't want to harass them with huge commits but also don't want to send barrages of tiny uncontextualized changes.

Commits should be atomic. Their size is irrelevant to that consideration. An atomic change may be one character, or twenty thousand lines (if those changes constitute an atomic (= singular and indivisible) change). This usually doesn't result in a barrage of tiny changes that are difficult to understand on their own, but even if it did, commit messages, the sequence of commits, the full diff of the MR/PR, as well as the attached information on the issue tracker you use (which you presumably have if you're doing code review) all provide more than enough context.


My solution to this is "stacked PRs."

main <- PR #1 <- PR #2 <- PR #3 <- PR #4

You can review them in order where the diff view in the PR shows chunks of changes logically grouped together and can be commented on/amended separately. Once everyone is satisfied with the patch set #4 is pulled into #3 is pulled into #2 ... and it shows up in main as 4 commits each with an independent PR history attached.


That then becomes a problem if the first PR is accepted and rebased while the others are outstanding.

Or at least it is with bitbucket. If the first PR is instead merged, the commits vanish from the start of the sequence in the other PRs.


When I do code reviews,

- first I review the commit list (including commentary),

- then I review all the diffs in one go unless it turns out that it's better to use a different approach, in which case I review all the small commits, then all the large commits.


It seems like git itself ought to be able to combine multiple commits into a single ubercommit for the sake of PRs, but allow pulling up each of the sub-commits that constituted it afterwards for debugging and QA purposes. But I guess these third-party tools achieve that as well.


It does, if you interact at the CLI, where a "PR" concept does not exist.

    git diff initial-commit~1 final-commit
However for that to be available after a contribution has been included in the master branch, one has to use merge commits.

Otherwise there are no inherent markers as the graph has been flattened in to a linear sequence. One then has to make use of external tools, or commit comment conventions to recreate what was actually done or intended.


You can basically achieve that with a rebase, then a merge with --no-ff. You can then just bisect etc. with "--first-parent" to see the big chunks as they landed.
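Roughly (branch names illustrative):

    git checkout my-feature
    git rebase main               # linearize the branch on top of the trunk
    git checkout main
    git merge --no-ff my-feature  # force a merge commit marking the PR boundary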


YES. I've been saying that for years. The fact that it doesn't work this way is really pretty braindead.


> The most common critique of the rebase workflow can be boiled down to inadequate support from native Git and GitHub. Rebase commands can be intimidating to execute, and recovering previous commit versions often requires unnerving dives into the Git ref-log.

Since branches are effectively free, one "trick" is to backup your branch as a new branch (e.g. "git branch -f my-feature-branch-bk") before starting any voodoo^Wrebasing.

There's also the tried and true fallback: https://xkcd.com/1597/

> Even with everything done right, a rebase-centric workflow will flood your GitHub PR timeline with force push events. These force pushes happen even when you've merely rebased your changes onto a newer trunk, leaving the diff unchanged.

Yes that's the price you pay for a linear history.

For a high churn PR there's no need to rebase on every push, if the PR itself is still in flux and/or you're pushing purely to leverage the centralized CI system, it's fine to tack on additional "wip" commits without rebasing.

It's only the final product that must be rebased to get that silky smooth linear commit history. And you should be doing that anyway so that the logical changes in your PR are unified into a single cohesive unit.


i prefer merge over rebase, but they're both pretty okay; merge is safer, but it's rare for rebase to confuse things so badly i can't figure out what went wrong

squash merges, by contrast, are a terrible idea. unless for some reason you make commits with comments like 'another change' or 'try again', maybe because your ci system needs a commit to work from. those should be squashed. but squashing a whole branch means that the refactoring work you had to do is mixed into the same commit with the actual functional changes, which makes it much harder to read the commit later. also, obviously, giant commits in your history cripple git bisect


This is the strangest debate, and it's not really clear from the blog post what benefit picking one gets you. Why does it matter what happens on a dev's branch? Just do a squash and merge when merging a branch to trunk.


Actual title: Why large companies and fast-moving startups are banning merge commits

I love that title.


Try getting git bisect working with a merge or non-squash workflow.

I don't care how the other developers came up with the solution, I want their changeset neatly tagged in the commit with ticket reference.


My rule of thumb is basically: will someone else view (or especially be affected by) the change? Then use merge. If not? Then use rebase.

On a local branch? Definitely rebase. On a branch from an active PR? Definitely merge. Most situations are not so straightforward (what about a pushed branch without a PR? Or with a draft?), but in those cases it mostly depends on the team. If you know/think someone used/reviewed that branch, never rewrite history. Otherwise, please do.


Squashing and rebasing is lying. Don’t lie.


is it just me, or does whatever workflow you're using greatly depend on what kind of product you are developing, and on what language you are using to develop it? it might be useful if people stated both, here and elsewhere.


In my opinion if one could choose between a well executed and instant rebase versus a well executed and instant merge, one would be a fool not to choose a rebase every single time.

In practice too, I also almost always choose rebases, BUT there are a few caveats here that are worth making…

As a general rule, rebases of more than a small number of commits, in cases where conflicts are non-trivial, can take significantly more time and patience from the developer performing them. Instead of fixing the conflicts in one swoop, you sometimes have to resolve conflicts repeatedly in the same areas of code. If the conflicts are extensive this can be a very unpleasant experience even for experienced developers.
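One mitigation worth knowing about is git's rerere ("reuse recorded resolution"), which is aimed at exactly this repeated-conflict problem:

    # record each conflict resolution and replay it automatically
    # whenever the identical conflict shows up again
    git config rerere.enabled true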

For less experienced or less skilled developers each of these conflicts poses an opportunity to mess up. Since there are more of them when rebasing, there are also more opportunities to mess up. If the developer is in a rush or is simply impatient, I would also expect them to mess up more in a rebase than a merge.

Yes, ideally we wouldn’t have massive branches that need to be rebased with many conflicts, but if you are working on a codebase where a lot is happening frequently, by lots of people, in the same parts of the code, or where code is moving around, there are often no really good alternatives. While perhaps more changes can be made incrementally than people often admit (especially if you include feature flags and the like to allow for transitions, though those can come with their own pitfalls), in practice not everything can be done without longer-lived branches (short of unreasonably large changes in a single commit). Sometimes large changes just need to be brought in together.

Slow code reviews (at the same level of review effectiveness) can pose problems as well. If you want a change properly reviewed in the safest possible way, you really want no new code landing on the branch being merged to after the review begins, allowing a clean fast-forward merge from the rebased branch; but that can be impractical in codebases with a lot of activity.

Merges can still be messed up and until recently it was quite hard to even see the merge diff to debug a bad merge, but there are admittedly fewer steps where you can mess up.

All that said, I’m still a huge advocate of rebasing, but I get that not everyone is up to the task. Sadly most developers have tons of awful, broken, and incoherent commits on their own branches that they never clean up. Until you can convince them to at least do fixups and clean up the commit histories on their own branches with rebasing, arguing for rebasing over merges towards trunks is pointless.

It requires more from the developer upfront. If your developers are good it will require much less from a maintenance perspective but the reality is that having a team of capable developers who are willing to do the work is not something one can take for granted in most places.


The fact that rebase would ever be preferred over merge just shows how bad Git really is under the covers. It's insane that everyone is re-writing their development history to avoid issues with their source control system. There's no reason that should be necessary at all. Manually intervening and re-writing history should be completely antithetical to a good source control system.

And squashing should never be preferred by anyone if the tooling was actually any good. For PRs? You should be able to define a group of commits and look at it as a whole, not just operate on a single commit. For clutter? You should be able to "zoom" in and out, or collapse/expand branches in your UI. For disk space? If the size of each commit was O(size of change + small constant), then squashing wouldn't help, because you'd just be cutting out a few bytes per commit -- assuming you're not thrashing back and forth with your changes. The fact that squashing saves so much space just shows how inefficiently Git is storing things.

We're all suffering that Mercurial lost the war. Of course, even Mercurial isn't perfect. Why did we stop innovating on source control systems a decade ago?


> Manually intervening and re-writing history should be completely antithetical to a good source control system.

The way I use git, it's not "history" until it's in main. Up until that point, branches are mutable sequences of patches that may or may not ever get merged in.

These patches bear little resemblance to the actual sequence in which I developed the code: as often as not, I'll amend a commit rather than creating a new one, because the point of the version control system for me isn't to faithfully log what I did, it's to represent my change as a discrete sequence of steps to get from main to main+feature.

Viewed that way, rebase is obviously the preferred way to integrate main->feature branch. Main has changed under me in some way that causes a conflict, so I need to reevaluate my sequence of patches in light of the new information.

When it comes to integrating feature->main, I don't have a strong opinion, except that squash is terrible because I specifically designed the commits to make sense as distinct steps.


I cannot grasp that at all.

To rebase I take your patches, apply them, then I apply my patches, fixing any trip ups. You "got in" first, so I get to do the work.

It seems simple to me - far more than hoping merges do what we both meant.

Conceptually, it's patches all the way down - anything else is to me suspect ... might be me not being big brained enough :-)


In both cases we're just "hoping" that the result is what we want. Merging is acknowledging the parallel development and explicitly dealing with it; rebasing is pretending it never happened and applying my patches on top of a codebase that I have never seen before. In my experience, rebasing is much more likely to fail in weird and dangerous ways. But I'll admit Git's merging leaves a lot to be desired too -- that's part of what I'm complaining about, and what attracts people to rebase.


Then I think that the complexity is kind of inherent.

Firstly, it's only a codebase I have never seen before because I am not keeping up with the changes. Ideally I will pause during the rebase and carefully read the commits, which will include long thoughtful messages laying out important implications, and my tooling will make clear where the conflicts lie and how test coverage is affected.

We won't ever just grep for <<<<<< and try to remember what HEAD means - is that mine or yours?

I am making changes against an evolving codebase. Either I am talking with people editing that codebase and we are in sympathy or we are hoping a VCS will solve our political and organisational problems.

I honestly think having time to argue about the best way to solve a problem among a team is the best solution - let the team talk, argue, and decide, and they will barely need a VCS. If you cannot spare the time ("we must deliver"), you've got other problems.

Edit: Look, honestly, it's all patches. That's the atomic level. And to my mind, having you gather the freshest version of the codebase and then apply your changes on top is the sanest, cleanest approach.

The same merge conflict resolution applies - I agree (basically line conflicts). But what's the alternative? Anything else gets weirder faster (I've seen some tools try to understand conflicts at the function level, etc.).


I've been keeping an eye on the Pijul[1] VCS, which goes all-in on the whole "theory of patches" thing. I don't expect Git will be going anywhere for a long time, but it's interesting to think about what could be next.

[1]https://pijul.org/


> The fact that rebase would ever be preferred over merge just shows how bad Git really is under the covers. It's insane that everyone is re-writing their development history to avoid issues with their source control system.

It's local history that one rewrites, so there's nothing "insane" about it. What your comment does is expose your lack of understanding of this matter.

> We're all suffering that Mercurial lost the war. Of course, even Mercurial isn't perfect. Why did we stop innovating on source control systems a decade ago?

Mercurial supports rebasing and history rewriting just fine.


> It's local history that one rewrites

I don't understand this. You do not back up your code to any shared repository while still working on a feature? What if it takes 2+ weeks? You have 2 weeks of code replicated only locally on your laptop? Or what do you mean by local?


Yes, local is the wrong word. Or rather, it's imprecise. A better word is unshared. But even that is imprecise, because you might consider pushing a set of commits to the Internet as sharing. Maybe an even better word is collaborate. I don't rewrite history of source code that I am collaborating with someone else on. For example, the history on master never gets re-written. But that feature branch I pushed to get review before being merged? Absolutely. I'll rewrite its history without a second thought before it gets merged.

It doesn't matter what source control system you hand me. It could be your ideally perfect system. I would still rewrite history that I'm not collaborating with others on. I try to treat source history (as much as is practical) as a sequence of logical changes to the code. I treat it like I treat the code. I try to optimize for making it intelligible and understandable to other humans (including myself).

You can look at any of my projects on GitHub for examples.


You can push your local work to a private remote branch. (By private, I don't mean that other people can't see it, I just mean some convention which means other developers will ignore it in normal circumstances.)

If a person commits a lot, they can end with all kinds of little commits which are just fixing typos or syntax errors, exploring half-baked ideas, can't be bothered writing a commit message right now, etc. What's the point of keeping all that stuff long-term?

Then, when it is ready to be reviewed, you rewrite that real history into something more logical - e.g. the first commit creates an interface, the second commit implements it, the third makes other parts of the code call it, etc. - and then submit that for review as a PR. That "fake" history is likely to be much more useful to others (and to one's own future self) than the real development history was.
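A sketch of that in git terms (branch names illustrative):

    git push origin HEAD:wip/alice/feature       # back up messy in-progress work
    # ...later, after rewriting it into logical commits...
    git push --force-with-lease origin feature   # replace the published branch, refusing
                                                 # to clobber anything you haven't seen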


The reason you are saying this is because git is so awful at managing many tiny commits. So you work around it by squashing. However, this is a defect of your version control system.

Back when I used hg, small commits were not a problem, as the branch name remained after the merge, so we were mostly able to treat the branch name as the squashed change while still having access to the typo commits if needed.


> The reason you are asving this is because git is so aweful at managing many tiny commits. So you work around it by squashing.

I don’t think the real issue is Git’s features. Even if Git had whatever additional features you wish it had, it still doesn’t change the fact that a history of typos and stupid mistakes isn’t very useful to anybody (especially in the long-run), and hence there is little or no value in preserving it; while a faked history which breaks the change down into logical steps does have some real long-term value, in terms of helping anyone who needs to review and understand the change in the future


I'm not the person you asked, but I think they meant: changes that are not shared with others. You can back up your own stuff as you please, but once you share it with someone else, it's best to treat that bit of repo history as immutable from that point onward (because they may have made their own copies, shared it with other people, etc).

So I think what people are saying (and I largely agree with them) is that it's fine to rewrite history as long as you never share the ugly mess that existed before the rewrite with others.

A typical example of this is having un-shared ("local" - but could be backed up) streams of "tweak a comment, fix a semicolon, xxx" changesets that aren't really going to be useful to other people. Before you share with others, you squash them down to a single (or few) "implement feature X" kind of changes, often with linear history.

Having that less-branchy and less-noisy linear history available is useful in the future when you want to examine the history of the repo. Future people don't need to know that you fixed a comment, added a semicolon, etc. They want to know the big coherent logical changes that were applied to the repo.

Squashing/rebasing/fixing history before you share it allows you to have your cake and eat it too, more or less.

----

EDIT: Here's a decent email from Torvalds on the subject (not that I agree w/him about everything, but in this case): https://www.mail-archive.com/dri-devel@lists.sourceforge.net...


There's an easy way around this, and that's to namespace in some way branches that are free to be overwritten, ie WIP work not yet merged into the canonical branches. Works just fine.

If the project is too large for this, you can also fork off the repo itself and then merge in your changes to the main canonical version when you're ready.


I'm not a fan of changing history at all, but:

> to any shared repository while still working on a feature? What if it takes 2+ weeks?

A lot of projects don't want our partially implemented changes (if only one person is working on them), just the first ready-for-others result and subsequent smaller fix updates.

> Or what do you mean by local?

This is key: remember that git is distributed, so backups of your really local repository should be as easy as pushing to a remote copy. If you aren't doing that (or even if you are) keeping an automatic off-machine backup of your working directories is a good idea in case of drive failure.

If you are working on a feature with a team, perhaps a team-local shared repository is a valid answer.


"Local" probably wasn't quite the word he meant. "Personal" is maybe a better word. In other words, only you are working on it. As soon as you share a branch with other people and they start adding commits - that's when you should start avoiding "rewriting history" (though if you coordinate well even then it's fine).

And to answer your actual questions. Yes, you either:

1. Push it to a shared repo but with a name that makes clear it is a branch belonging to you (e.g. `<yourname>/branch`), or

2. Back it up with your standard laptop backup (the one you have set up that backs up non Git stuff too).


Local in the sense that every developer has a full history, and can edit their own history after the fact before pushing changes.

Basically it's the equivalent of bringing patches up to date before upstreaming them. Upstream isn't going to apply a patch that no longer applies cleanly.


Local == not published (on the trunk or whatever you call the main branch).

Local != not pushed to a remote. You can totally push to a per-developer remote, or to branches under a per-developer namespace.


> It's local history that one rewrites, so there's nothing "insane" about it.

Squashing PRs on remote is quite common.

You could also do without the disparaging comments.


> You could also do without the disparaging comments.

You're right.


Doing the extra work to rebase makes git bisect a lot more tenable, and it simplifies the view of history. You don't have to make this tradeoff, but some might want to.


I agree. Mercurial at Meta was THE BEST version control experience I ever had. In 2023, git is worse than mercurial. Stack sets are superior.


Yes. The absence of a staging area and the single path of development controlled by diff approvals and landing were major improvements.

But Meta's "hg" wasn't really FOSS hg for a long time and looks nothing like it today. It started as hg but was hacked to become what is now sort of Sapling (+ Mononoke + Eden + Buck + Phabricator + Sandcastle + Landcastle) integrated together.

The lore was that the git people didn't want to help improve monorepo performance, but the hg people were somewhat more interested so that was the reason behind hg. Then later, Microsoft got git people to upstream changes to accomplish similar performance improvements.

IMO, efficient monorepo dev cannot happen without an SCM-integrated synthetic FS. Also, efficient dev cannot happen without dependencies-aware testing and build caching. It further helps if dev tools are tightly integrated together and documented. Meta didn't do documentation well and didn't name things all that well either. Most internal tools also lacked man pages and generally depended on in-group knowledge hoarding rather than self-discovery and the principle of least surprise.

Every now and then, the group owning some tool critical to your workflow would come along, break it, and leave it broken for weeks with a flimsy excuse like "Oh yeah, our new people are rewriting it in Rust" rather than reverting to a working build.


I dunno…leveraging retrospect is often useful.


Extremely useful almost every time. Why wouldn’t we re-write our personal history just before merging into the main permanent history?


I'm not a proponent of it, but some people would say: including all the messy history is a more honest representation of what actually happened.

By that school of thought, a repo should be recording what happened, not what you want to look like happened.

I haven't found it to be terribly useful though. IME, it adds noise for future code archeologists.

In very rare cases, having the messy long history can be slightly more useful for bisecting to find precisely where bugs were introduced, simply because it's more fine-grained.


> including all the messy history is a more honest representation of what actually happened

I don’t think anyone needs to see the hundred incantations of `git commit -a -m fuckfuckfuck` I do before I make a PR.

To those who say “you should make meaningful commit messages” I say “yeah that’s what you see in the single commit when I make the PR”. If you say “you should just not commit until your local work is PR worthy”, I would say you’re not using git effectively.


> I'm not a proponent of it, but some people would say: including all the messy history is a more honest representation of what actually happened.

By that logic you should leave typos in, because, you know, it's "more honest". Or you should not edit yourself in any way, because it's "more honest".

The reality is that not editing our communications, or our commit history, is a lack of respect to others, especially when that lack of editing imposes a higher cognitive burden on others (e.g., having to sift through more hay to find needles).


I think those proponents would indeed say to include the typos. And then also include the change where you fixed the typos, and so on.

I didn't mean to imply any moral or dogmatic concept of "honesty" here. It only means there is less guessing about what events actually transpired.

But as you say, I agree it most often just adds noise, in my experiences.


> But as you say, I agree it most often just adds noise, in my experiences.

But critically, that noise is a lack of respect towards others.


As with all things, context matters. It would depend on what those others want, what the team culture is, etc. I could imagine a group of these hypothetical "maximum honesty" people working together happily, and respectfully, in their way.


> Why wouldn’t we re-write our personal history just before merging into the main permanent history?

Mainly because it's more work. But it's work worth doing. It's also enjoyable.


> It's insane that everyone is re-writing their development history to avoid issues with their source control system

People rewrite their history because they're irresponsible with pushes. It's got nothing to do with the source control system. You're supposed to push small, isolated, encapsulated commits with good names. I never do that while developing. I would have the same problem on any VCS.

> And squashing should never be preferred by anyone if the tooling was actually any good. For PRs? You should be able to define a group of commits and look at it as a whole, not just operate on a single commit. For clutter?

Once again, responsible commits are supposed to be small, isolated, single-change commits, so in a large feature a good developer would have tons of commits. This makes the PR easier to understand and review... and when the feature is finally merged, you don't necessarily need those commits; you just need the feature.

I think your issue might just be with understanding how version control is supposed to work.


> I think your issue might just be with understanding how version control is supposed to work.

This has nothing to do with how it's "supposed to" work, and all to do with the standards and constraints we're putting on ourselves here. No god has made any commandments about how thou shalt organize your commits.

> I would have the same problem on any VCS ... commits are supposed to be small, isolated, single change commits

It sounds to me like you've got some self-imposed issues. Nobody told you every commit has to be a "single change", unless you're very loose with what you mean by that. Sure, a commit should be a group of related functionality, such that you either want all of it or none of it. But it's perfectly fine for one commit to involve 20+ files. In fact one of the (many) things people hated about CVS was that you couldn't bundle changes like this! And if you're that concerned about commits being single changes, then you should never squash! It's hypocritical to say every commit should be tiny, but then squash them!

> this makes the PR easier to understand and review

I already addressed this. The problem is the 1:1 mapping between commits and PRs. There's no reason for that.


> people rewrite their history because theyre irresponsible with pushes. it's got nothing to do with the source control system.

Corporate policy often is to push at the end of the day, so in the event you spill some coffee over your laptop, all you'll lose is a day of lost work and a day to set up a new laptop.


> people rewrite their history because theyre irresponsible with pushes. it's got nothing to do with the source control system. You're supposed to push small isolated encapsulated commits with a good names. I never do that while developing.

It's a good ideal, but not realistic.


Why isn’t it realistic? I’m not trying to be glib. With “git add -p” and “git checkout -p” I find it fairly straightforward to do. I think usage of “git commit -a” or “git commit .” is the source of a lot of messy commits, but is often easily avoidable. Good commit hygiene is just as important as good tests, good method names, etc.
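For anyone unfamiliar, those flags select individual hunks interactively (the commit message is illustrative):

    git add -p        # pick, hunk by hunk, what goes into the next commit
    git commit -m "Extract config parsing into its own module"
    git checkout -p   # interactively discard (or keep) leftover unstaged edits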



