Extremely Linear Git History (westling.dev)
498 points by zegl on Nov 22, 2022 | 358 comments

Github-style rebase-only PRs have turned out to be the best compromise between 'preserve history' and 'linear history' strategies:

All PRs are rebased and merged in a linear history of merge commits that reference the PR#. If you intentionally crafted a logical series of commits, merge them as a series (ideally you've tested each commit independently), otherwise squash.

If you want more detail about the development of the PR than the merge commit, aka the 'real history', then open up the PR and browse through Updates, which include commits that were force-pushed to the branch and also fast-forward commits that were appended to the branch. You also get discussion context and intermediate build statuses etc. To represent this convention within native git, maybe tag each Update with pr/123/update-N.
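One way to pin those Updates down in plain git (a sketch; the PR number follows the convention proposed above, and the fetch relies on GitHub's hidden refs/pull/* refs):

```shell
# GitHub exposes the current head of PR 123 at a hidden ref;
# snapshot it as a tag before the next force-push overwrites it.
git fetch origin refs/pull/123/head
git tag pr/123/update-1 FETCH_HEAD
```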

The funny thing about this design is that it's actually more similar to the kernel development workflow (emailing crafted patches around until they are accepted) than BOTH of the typical hard-line stances taken by most people with a strong opinion about how to maintain git history (only merge/only rebase).

What's weird about most of these discussions is how they're always seen as technical considerations distinct from the individuals who actually use the system.

The kernel needs a highly-distributed workflow because it's a huge organization of loosely-coupled sub-organizations. Most commercial software is developed by a relatively small group of highly-cohesive individuals. The forces that make a solution work well in one environment don't necessarily apply elsewhere.

I wholeheartedly agree!

With this, you can also push people towards smaller PRs which are easier to review and integrate.

The downside is that if you want to work on feature 2 based on feature 1, either you wait for the PR to be merged into main (easiest approach) or you fork from your feature branch directly and will need to rebase later (this can get messier, especially if you need to fix errors in feature 1).

Git recently added a --update-refs option to rebase that makes dealing with this scenario a lot easier. This post does a good job explaining how to use it: https://andrewlock.net/working-with-stacked-branches-in-git-...

oh my... I was at the point of making a git plugin to do this. This pretty much was the bane of my existence.


Git-branchless has a restack command that restacks whole trees/branches of commits.

The second part explains exactly what I'm juggling with right now.

What about commit signatures? If you rebase, you lose the original signature, don't you?

If you let Github do the rebase, yes, you do. But you can do it manually yourself, squashing the branch down to a single commit that you then sign.
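A sketch of that manual flow, with `main` and `feature` as placeholder names; `-S` assumes a signing key is already configured in git:

```shell
git switch main
git merge --squash feature      # stage the branch's combined changes
git commit -S -m "feature: squashed and signed"
```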

This is a tooling issue that needs to be solved client-side (i.e. where the signing key lives). It's an important one but actually really simple.

I wonder why GitHub doesn’t apply their own signature when they rebase a commit with a valid signature from one of their users. They do that when you edit a file through their Web UI.

This is completely insane under any proposed use case for commit signatures besides "tick some bureaucrat's box that asks 'was the commit signed?'".

Did you even read the article? This article is about perversely forcing the commit hashes to come out a certain way for lulz.

From the guidelines[1]:

> Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that".

[1]: https://news.ycombinator.com/newsguidelines.html

But why do you "squash" it! Why do people do this?

Ever seen a PR that implements something in a GitHub Actions workflow? The history usually looks like: clear cache, fix path, fix variable expansion, fix command, fix command again, fix syntax, […].

The best way IMO is to interactive-rebase the branch locally (or force-push a rebased version later), but sometimes 50 commits merge into a 30-line single-file change and nothing beats squash.

I want the 'merge' function completely deprecated. I simply don't trust it anymore.

If there are no conflicts, you might as well rebase or cherry-pick. If there is any kind of conflict, you are making code changes in the merge commit itself to resolve it. Developers end up fixing additional issues in the merge commit instead of the actual commits.

If you use merge to sync two branches continuously, you completely lose track of what changes were done on the branch and which were done on the mainline.

> I want the 'merge' function completely deprecated. I simply don't trust it anymore.

Merge is perfectly fine and it is the only way to synchronize repositories without changing the history, which is very important for a decentralized system. It certainly has the potential to make a mess if used improperly, but so do rebase, cherry-pick, and basically every other command.

> If you use merge to sync two branches continuously, you completely lose track of what changes were done on the branch and which were done on the mainline.

If you do things correctly, that is by making sure that when you merge changes from a feature branch into the mainline, the mainline is always the first parent, you shouldn't have any problem. Git is designed this way, so normally, you have to go out of your way to mess things up. If you did it like that and you don't want to see the other branch's commits, git-log has the --first-parent option.
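To illustrate: if merges are always made from the mainline side, the mainline-only view is one option away:

```shell
# Each feature shows up as a single merge commit; the feature
# branch's internal commits are hidden from the listing.
git log --first-parent --oneline main
```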

Proper use of merge is table stakes. You get warned in your PR if your non-main branch is out of date with your main branch, and after you rebase and force push your non-main branch, you review the diff in the PR.

Then you don't actually need merge? Am I missing something?

If you always rebase the branch, the commits can be applied directly.

You're missing the merge commits. `git log` is a key feature and merges belong there, at least in my workflow.

if you do this (starting on main):

    git checkout -b feature
    # do work
    git commit -a
    git checkout main
    git pull
    git checkout feature
    git rebase main
    # publish code review, get approval
    git checkout main
    git merge feature

You still use merge at the end, even though it's not actually doing anything that'll result in a conflict.

Rather than switching to "main" and pulling it, you can just stay on "feature" and do a fetch followed by "rebase origin/main". Then pull "main" before you merge the feature.

I'd also use "merge --no-ff" to force an empty commit that visualizes where a feature begins and ends.
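Sketched out (with `feature` as a stand-in branch name):

```shell
git switch main
git merge --no-ff feature   # forces a merge commit even when a
                            # fast-forward would have been possible
```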

Totally on board (except the forbidden -a switch).

The last command is:

    cp .git/refs/heads/feature .git/refs/heads/main
No merge needed.

Touching files in .git (outside of like, .git/config) directly gives me the heebie-jeebies

Fair point, and I would not advocate that specific workflow! My point was just to illustrate that we can live without merge.

    git checkout main
    git reset --hard feature

way easier to mess up a rebase

way easier to tell that you've messed up a rebase

it's actually way easier to accidentally mess up a rebase, especially if rearranging commits.

I can recommend `git diff @{1}` after a rebase; I alias it to d-
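For anyone unfamiliar: after a rebase, `@{1}` resolves via the reflog to where the branch pointed before, so the diff shows what changed across the rebase. A sketch of the setup:

```shell
git diff @{1}                              # compare against the pre-rebase tip
git config --global alias.d- 'diff @{1}'   # the d- alias mentioned above
```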

I'm not sure if you misread my comment, but my point was that it's far too easy to accidentally introduce bugs in merge commits that go unnoticed for a long time.

I've never seen a rebase gone awry introduce production bugs, but I've known multiple gnarly bugs caused by errant merges. YMMV.

as always, different circumstances can generate different results.

In a merge, you solve conflicts once. Whereas in a rebase, those conflicts will turn into incremental conflicts.

If the branch history is "tidy", with discrete, purposeful commits, this can be easier. Especially if incrementally rebasing.

The main difference is that one rewrites history and the other does not. A rebase is by nature destructive and as such can introduce subtle changes, especially if commits are reordered or modified in the process

It's not really destructive, though! That's not really the main difference.

The main difference is that a merge sticks around in your repo forever, a commit that people assume has no real code changes in it but actually sometimes it does. A rebase is done once, and then your git history doesn't have to deal with it ever again.

Yes, you raise a fair point that if you've dug yourself into a deep pit already with long-lived branches and overlapping work, it might be slightly easier to extract yourself from the pit with a merge. But then you're leaving that fetid pit in your repository forever.

When I said destructive, I meant in the literal sense, in that it rewrites history.

Don't get me wrong, I _often_ rebase, about a dozen times a day and it's been a core part of my workflow for 2 years. In that time I have learnt a lot, silently lost changes and ended up in a few mishaps.

I am in no way against the idea of rebasing; I frequently do it, and personally I often rebase && merge --no-ff. But IMHO it's far too easy to mess up for me to adopt it as a dogma.

I also question the notion that VC history is best thought of in linear terms. I'd argue it's fundamentally flawed to force a DAG into a more linear structure.

In my experience, the desire to do this stems from wanting a DAG that looks pretty in log viewer XYZ, rather than anything else. I consider this highly overrated. Just look at the DAG of the git project. Yes, it's intense, but the primary purpose of the history DAG isn't to immediately present a simple linear history.

Rather, it's to preserve a common, shared, decentralized history where it's easy to go back to a precise moment and see what was done to what and why. A dogmatic always-rebase history is in my experience often a relatively pointless pursuit of constructing a git log --graph that's "simple" by default. I.e., rather than solving the problem of "how to overview a DAG", the solution is to reformulate the DAG in a linear way, which often conforms better to how we humans like to overview information.

Again, this is a personal viewpoint and I don't mean to pass judgement, but I often find that such approaches, which one might liken to treating symptoms instead of curing the cause, are better solved the other way around. It's a real joy to delve into the git project's history, despite the fact that it's _littered_ with merge commits.

Basically, I think the quest for a "simpler" looking history DAG is somewhat overrated and not something I'd personally recommend pursuing.

First time I've seen aliases using chars other than a-z; care to share your dotfiles?

It's a neat trick to explode your alias namespace, since you'll never see a published tool named `ls-`. So you have reserved a huge "address block" for your personal aliases :)

I'd be happy to, this is (roughly) my git config.


NB: some are personal custom scripts, like git-branch-status, which I also publish in the same public repository.

It's very much opinionated and geared to my use, but feel free to use it, submit feedback, and/or open PRs

Thank you. I collected some from kristopolous recently [1] too.

I see d- there, so the one use of dash is used sparingly. There's a lot of git functionality I'm leaving on the table, looks like.

Currently, I'm using a shell script to help with a git conflict resolution flow that does something like

    read -p 'Conflict. Resolve and press [Enter]'
And fix in a separate tmux window (git add, git cherry-pick --continue).

TUIs and autocomplete popups are nice, but there's opportunity for a deeper understanding writing one's own tools. So I'm hoping to combine ZZ and `read -p` (or similar) to coax nvi (Keith Bostic) to something for Java stuff. Or at least build some primitives around that.

Encountering the equivalent of "flash of unstyled content" when switching from text editor to a--for example--Java method chooser feels like the philosophical difference between "Let's SPA" versus "Click flashes between pages is fine."

The flow would be something like

  1. In nvi, keystroke equivalent of
     Ctrl-Space brings up an Intellisense
  2. The tool loads up a list of
     autocomplete methods as well as its
     own hotkeys.
  3. Pressing up and down manipulates a
     temporary text file that just prepends
     ">" next to the line, for example.
  4. And Enter somehow brings back nvi with
     the method added, right after the
     period (with our partial typing).
All this to say, dotfiles and git config are no big deal, but in CLI it's an escape hatch to molding a custom environment.

[1] https://news.ycombinator.com/item?id=33628204

Unfortunately, git rebase has a very very annoying limitation that git merge doesn't. If you have a branch with, say, masterX + 10 commits, and commit 1 from your branch is in conflict with masterX+1, then when you rebase your branch onto masterX+1, you will have to resolve the conflict 10 times (assuming all 10 commits happen in the same area that had the original conflict). If instead you merge masterX+1 onto your branch, you will only have to resolve the conflict once.

Even though I much prefer a linear history, losing 1h or more to the tedious work of re-resolving the same conflict over and over is not worth it, in my opinion.

In your example, you pretty much have to change the same line, or a neighbouring line, all 10 times to end up in that scenario. If it's just somewhere else in the file, git auto-merging will handle it just fine.

It seems like a very contrived example to me. We have been running rebase/fast-forward only for close to 10 years now, and I have never experienced anything that unfortunate.

> It seems like a very contrived example to me.

I run into this quite frequently, even on projects where I'm the only one working on them (I tend to have a lot of things going on in parallel). Once branches diverge and commits accumulate it can become a right pain. Usually my solution is to merge master into the branch just to keep up to date, then undo everything, make one new commit on master, and rebase that. But in some more difficult cases it was "just merge and fuck it because life's too short". I've also just manually "copy/paste merged" things to a new branch, because that seemed quicker than dealing with all the merges/conflicts.

Maybe there are better ways of doing this, and arguably I shouldn't have all these long-lived branches in the first place (but it works well for me, so...), but it's not that much of a contrived edge case.

> arguably I shouldn't have all these long-lived branches in the first place

This is the problem here. If you have multiple long-lived branches, there's no technical solution to preventing rot -- you must actively keep them in sync.

Regularly merging in main is the opposite of the proper solution. Constantly rebasing on top of main is the proper solution.

  > If you have multiple long-lived branches, there's no technical
  > solution to preventing rot -- you must actively keep them in sync.
Rebasing isn't an alternative to this, it's just a different way of manually keeping in sync.

  > Regularly merging in main is the opposite of the proper solution.
  > Constantly rebasing on top of main is the proper solution.
Why? You've given no justification for your preference.

> Rebasing isn't an alternative to this, it's just a different way of manually keeping in sync.

I never said it was, I said it was the right way to keep them in sync.

> Why? You've given no justification for your preference.

I don't need to, the GGGGP said it perfectly: https://news.ycombinator.com/item?id=33705026

A rebase and a merge result in the same code. A rebase is more error prone though. Just because someone "feels" a merge isn't as safe doesn't make it so.

> A rebase and a merge result in the same code.

If done correctly, that's true, but it's beside the point. The reason to prefer one over the other is the failure mode.

> A rebase is more error prone though.

On what metric? In my experience, a merge is far, far more likely to silently introduce a production bug. I've never seen a rebase fail that way.

Doesn't rebase use the exact same automatic merge algorithm as a merge? They are equally likely to introduce a production bug. Especially if adding a tool like rerere into the mix to do even more auto-magic merging when you hit the differences between rebase and merge.

There are two distinct possible problems:

1) The merge auto-applies cleanly, but the merged code is wrong. This is pretty niche, usually, but happens in certain edit patterns. I've never seen this produce a syntactically-valid, semantically-invalid construct (but I suppose it's possible) so generally these are caught by the compiler.

2) The merge does not auto-apply, so you get into manual resolution. This is where things get hairy.

The merge commit really ought not have any changes of its own, but lots of people consider minor conflict resolution legal. So you end up with a bunch of code changes that logically belong to another commit, and are grouped together for purposes of expediency.

Rebase applies your changes to another branch as though they had been made there originally. If a conflict comes up, you already have all the context needed for how to resolve it, because you just wrote that code. The fix goes where it belongs.

All I can tell you is that I've been bit by merge-induced production bugs enough times that I now work to avoid that particular failure mode.

> The merge commit really ought not have any changes of its own, but lots of people consider minor conflict resolution legal.

I'm not sure where this rule comes from. For code review, I for one normally review all of the changes that are going into master, and only look commit-by-commit if it becomes overwhelming - so, unless this is a huge merge (which should generally be avoided anyway), I wouldn't really see how this is a problem.

The only real problem I have with merging into your local branch to keep it in sync with master is the way it pollutes history when it is finally merged back into master. This is enough of a problem that I and my team always rebase unless we end up in one of these rare cases that I was highlighting.

> This is the problem here. If you have multiple long-lived branches, there's no technical solution to preventing rot -- you must actively keep them in sync.

Well, merge actually works much more smoothly and rebase gives a lot more grief, so the problem is with rebase.

> Regularly merging in main is the opposite of the proper solution. Constantly rebasing on top of main is the proper solution.

The "proper" solution is the one that allows me to get stuff done. The only thing that matters is how the main branch ends up looking in the end, and what I do before that isn't really all that important.

Another problem with rebase is when multiple people are working on the branch; it requires careful coordination if you don't want to lose work. Overall, just merge in main is usually the best strategy here.

Always surprising when folks are confused about how to collaborate on git branches... I'd expect the recursive solution to be more obvious!

> The "proper" solution is the one that allows me to get stuff done.

Yeah, but the stuff that needs to get done doesn't end with your commit, it starts there. Merge commits are prone to introduce unexpected and uncaught bugs: rebases just don't.

> Merge commits are prone to introduce unexpected and uncaught bugs: rebases just don't.

How so? If I make an error with a rebase then I risk losing my changes. You can fetch it from the local reflog, but that's not so easy. With a merge I have a merge commit which records what was merged.

We're talking past each other. You're describing issues that come up during the git workflow. I'm talking about production bugs.

You're being quite cryptic in this entire thread and, to be honest, I have no idea what you're talking about any more.

How do you constantly rebase on top of main if more than one person is working on the feature branch?


What does that mean in the context of git?

There is nothing special about the main branch! This is the recipe to collaborate in git:

- Pick a shared branch to work on.

- Work.

- If you complete within a day, push to shared branch.

- If you need to hold onto it longer, make a new branch, switch to it.

- (Possibly recurse.)

- Complete work, rebase new branch on shared branch, push.

And of course feel free to replace branch with remote/branch. It is distributed, after all, nothing special about any particular server.

A worked example, to make it more concrete:

- Pick the shared branch main.

- Work on a feature for more than a day, so:

- Create a feature branch feature/e2ee, switch to it.

- Recurse, since you'll be doing database updates and I'm adding the UI.

- Pick the shared branch feature/e2ee.

- I create a branch git.sr.ht/~couch/new-twitter/feature/e2ee

- You work and push to feature/e2ee.

- I complete my work, rebase the branch, and push to feature/e2ee.

- We are satisfied that we've completed the feature, rebase and push to main.

That doesn't really solve much: if both you and I rebase our personal feature branches onto master at different places, when we both try to push to the shared feature branch, we'll have a REALLY bad time - especially if we actually had to do conflict resolution.

> arguably I shouldn't have all these long-lived branches in the first place (but it works well for me, so...)

Given that this scenario is common for you but sounds contrived to others, I would argue that this doesn't work well for you. It's just familiar enough that you're willing to deal with some pain.

Short-lived feature branches sidestep this hell. Longer-lived projects can almost always be partitioned into a series of shorter mergeable steps. You may need support/buy-in from your manager, I hope you get it.

It's not a organisational/manager problem; it's just how I like to work. I often work on something and then I either get bored with it or aren't quite sure what the best way is to proceed, so I work on something else and come back to it later (sometimes hours later, sometimes days, weeks, sometimes I keep working on it until I get it right). I do this with my personal projects as well where I can do whatever I want.

I know some people think this is crazy, but it works well for me and I'm fairly productive like this, usually producing fairly good code (although I'm not an unbiased source for that claim).

In the end I don't want to radically change my workflow to git or other tooling; I want the tooling to adjust to the workflow that works well for me.

Doesn't git's rerere help here?

I looked at it before and decided it was too "magic" and it frightened me.

So probably? But I want to avoid https://i.redd.it/jdqjhi8qv3x71.jpg

Sounds like you've never worked on a project with a file everyone wants to append to :)

If every error in your system needs a separate entry in the error enum, or every change needs an entry in the changelog - loads of changes will try to modify the last line of the file.

Even multiple appends are not that bad for rebasing - if you put the remote changes before your own then after the first commit the context for your remaining commits will be the same.

If order actually matters then yeah, git can't magically know where each new line should go.

Oh, I have. :-)

I'm not saying these situations are impossible. But you can work towards reducing when they arise. If everyone needs to change the same file, then it sounds like something should be refactored (it's probably a quite big file as well?).

If every error needs to go to the same error enum, that sounds like an error enum that might benefit from being split up.

And if every change needs to write to a common changelog file, I would personally find a new way to produce that changelog.

If it's that big a painpoint, then I would look into different ways to get around it.

Depending on the format of your files, entries like "changelog merge=union" in your .gitattributes file might work for you.
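Concretely, the union driver keeps the added lines from both sides instead of raising a conflict. A sketch, with `CHANGELOG.md` as a placeholder filename:

```shell
# In the repo's .gitattributes:
echo 'CHANGELOG.md merge=union' >> .gitattributes
git add .gitattributes
git commit -m "Use union merge driver for the changelog"
```

Note the union driver simply concatenates both sides' new lines, so it's only safe where line order doesn't matter.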

It happens pretty often when two different people are adding a new function in the same area of a file. It's likely that as you're working on that function, you'll be modifying the surrounding lines a few times (say, you have a first pass for the happy path, then start adding error handling in various passes; or, handling one case of an algorithm in each commit).

Rebase is still by far the most common case in our repo, as yes, these cases appear very rarely. But when they do happen, it's often worth it to do a merge and mess up a history a little bit (or squash, which messes with history in another way) rather than resolving conflicts over and over.

Someone else was also suggesting rerere for this use case, but I've never used it myself and I don't know how well it actually handles these types of use cases.

It definitely can, and it also sometimes happens to us.

But we try to reduce the chance this happens quite a bit, by avoiding letting files grow too big, for example.

Other things we do, is use codeformatting with rules that reduce the chance of merge conflicts. For instance, instead of having imports like:

  import SomePackage.{A, B, C}
we format it to:

  import SomePackage.A
  import SomePackage.B
  import SomePackage.C
That alone helps a lot. Other formatting rules that avoid dense lines, and instead splits over multiple lines also have a huge impact on merge-conflicts.

It's not as contrived as you may think. I, along with what I imagine are many others, do a lot of frequent micro-commits as time goes on and the feature becomes more complete, with a lot of commits in the same area of any given file. Rebasing a development branch in this state is pretty gnarly when a conflict arises.

Sadly, my current approach is to just reset my development branch to the merge base and make one huge commit, and then rebase.

I do a lot of micro-commits as well, though I rarely find that other members of my team are doing the same, to the same files, at the same time.

When that happens, we look into if it's possible to do more frequent merges (fast-forward rebases through Gerrit, to be specific) of our smaller commits to master, so we don't accumulate too much in isolation.

I find it helps reducing bugs as well, if two or more members are doing active work in the same area in that way, it's not good to be working in complete isolation as it just opens up for bugs because of incompatibility with the work going on in parallel.

Yeah that scenario only ever happens if you have an extremely large branch that hasn't been merged into the target branch for a long time (like a feature branch that takes months to develop), which btw isn't really something that should be done anyway (always try for more frequent merge with small side branches).

As sibling mentioned, this is totally solved by git-rerere.

Partially. Unfortunately, rerere is not perfect, and will only solve maybe 80% of the cases.

For a big rebase, this can add up to a lot, a price I just paid last week.

When can we move to Sapling again?

How would Sapling avoid this? As I understand it it uses the same data model as Mercurial which is really the same as Git's. I think you would need something like Pijul to solve it nicely. At least as far as I can tell.

I might actually try this in Pijul because I too encounter this semi-regularly (it's not a freak occurrence at all) and my solution is basically to give up and squash my branch before rebasing.

I already have - it’s pretty great :D

you can often solve this by squashing before rebasing.

That has its own problems. Separating whitespace-only reformatting commits from substantive commits makes it much easier to inspect the real changes, for instance.

Also, more fine-grain commits can help you trace down a bug, perhaps with the help of git bisect. Once you've tracked down the commit that introduced the bug, things will be easier if that commit is small.

Fortunately you can just merge from master, bringing your code back in sync with master without touching master itself. I see Beltalowda has mentioned this.

Reviewing a squashed branch is much harder than reviewing one set of closely related deltas, and then reviewing a different set of closely related deltas that happen to overlap.

You mean you can often give up and avoid solving the problem by squashing before rebasing?

To be fair, if you have 10 commits that all change the same file: squash with respect to your first commit, _then_ rebase. If you have lots of commits, always first squash-rebase to your own first commit, and only rebase to current main once that's done.

Rebase is being annoying here mostly because it's doing exactly what you want it to do: warn you about merge conflicts for every commit in the chain that might have any.

If you squash you decrease the granularity of your git history, though.

If you have ten different commits all touching the same part(s) of the same file(s), dial down your granularity a little: you've over-committed.

Either that, or you lobbed 10 different issues into the same branch, which is a whole different barrel of "no one benefits from this, you're just making it harder to generate a changelog, can you please not" fish.

man git-rerere

It often amuses me that some people will say "git is actually easy, you just need to know git commit, git pull, git push, and git branch", but when you go into the details, you find out you have to learn a hundred other rarer tools to actually fix the 5% or 1% use cases that everyone eventually hits.

For what it's worth, I had heard of git rerere before, and have looked at the man page, but haven't understood how it's supposed to work, and haven't had time to play with it to see how well it actually works in practice. `git merge` or `git squash` and accepting a little bit of a mess in history seems much easier than spending time to learn another git tool for some use case, but I fully admit I may be missing out.

When you hit a merge conflict, rerere (re)members how you (re)solved it and (re)applies the same fix when the same conflict happens again. But using it can create a new problem/annoyance: if you make a mistake with the initial resolution, and revert the merge/rebase to try again, it'll remember the wrong one next time. So you have to find that resolution and tell rerere to forget it.
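For reference, the relevant commands look like this (the path is a placeholder):

```shell
git config rerere.enabled true   # per-repo; add --global to enable everywhere
git rerere forget path/to/file   # drop a wrongly recorded resolution
```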

Hmmm I think in your scenario you could avoid resolving the conflict 10 times by using `git rebase --onto`

Suppose "masterX+1" is called latest

Suppose "masterX" is the SHA of your mergebase with master (on top of which you have 10 commits)

`git rebase --onto latest masterX`

Yes. Usually I just squash-merge to main and then `git checkout my-branch; git reset --hard main`. Sure it squashes all the commits, but keeping them all is nearly never needed.

I was converted to rebase by my current team, and this hit every time.

I wish it worked like merge, or that there were a way to merge, resolve the conflict, then rebase?

Can I ask how they converted you (or do you mean by dictate, as opposed to becoming convinced it was better)? I find myself loving merges and never using rebases. It's not that I cannot describe technically what's happening, but I just don't understand the love.

(Not the person you replied to, but a passionate rebase-preferred) For me there are two reasons - one aesthetic, one practical.

The aesthetic reason is that it tells a more coherent story. The codebase is a single entity, with a linear history. If I asked you "how old were you last year", and you asked "which me are you asking about?", I'd be confused. Similarly, if I want the answer to the question "what was the codebase like at this point in time // immediately prior to some point?", you shouldn't need to ask clarifying questions. `HEAD^` should only ever point to a single commit.

The practical reason is that it discourages a bad-practice - long-lived branches. The only vaguely compelling reason I have heard for merge commits is that they preserve the history of the change, so that when you look at a change you can see how it was developed. But that's only the case if you're developing it (in isolation) for a long-enough time that `main` will get ahead of you. You should be pushing every time you have a not-incorrect change that moves you closer towards the goal, not waiting until you have a complete feature! If you make it difficult to do the wrong thing _while also_ making it easy to do the right thing (too many zealots forget the second part!), you will incentivize better behaviour.

(Disclaimer - I've been lucky enough to work in environments where feature flagging, CI/CD, etc. were robust enough that this was a practical approach. I recognize this might not be the case in other situations)

And yeah, I'm kinda intentionally invoking Cunningham's Law here, hoping that Merge-aficionados can tell me what I'm missing!

> what was the codebase like at this point in time // immediately prior to some point?", you shouldn't need to ask clarifying questions

I would assume that such a question would talk only about the main branch. However, I will point out that "what was the state of feature X" is only answerable with a non-linear story.

> The practical reason is that it discourages a bad-practice - long-lived branches.

Wait, long-lived branches are bad? Merging in partially done features is good? That seems insane.

First, if the feature is small enough to knock out in an hour, that's great. But sometimes it can take a couple of days. I should hope you have enough activity that the main branch will move in that time.

But committing partial features is crazy. Sometimes you realize the way you are implementing it (or the whole feature) is a bad idea and all the work should be orphaned. Other times, a feature requires changing something (e.g. an API) where a partial change cannot really work - and sometimes where you need to have a meeting before you do it. Consider the feature to be "update dependency X", which means you now have some number of bugs to track down due to the new version.

Heck, sometimes a feature might need to be mothballed. Sometimes you have to wait for an external dependency to be fixed. Your options are to chuck your work, or to mothball it (rather than commit something broken) and come back when the external dependency is fixed, or to switch your dependency.

> long-lived branches are _bad_? Merging in partially-done features is _good_?

...uhhh, yes? I've never heard anything to the contrary. Can you explain why you think the opposite?

For long-lived branches: The longer a branch exists separately and diverges from main, the more pain you'll create when you try to merge it back in - both because of changes that someone else has made in the meantime (and so, conflicts you'll (possibly) have to resolve), and because you are introducing changes that someone else will have to resolve. The pain of resolving conflicts scales super-linearly - it's much better to resolve lots of small conflicts (ideally, so small that they can be algorithmically resolved) than to resolve one large one. Plus all the arguments from the point below...

For checking-in early and often: flip it around - what is _better_ about having the change only on your local repo, as opposed to pushed into the main codebase? If the code's checked in (but not operational - hidden behind a feature flag), then:

* your coworkers can _see_ that it exists and will not accidentally introduce incompatible changes, and will not start working on conflicting or overlapping work (yes, your work-planning system should also account for that - but extra safety never hurts!)

* if you have introduced a broken dependency, or a performance black-hole (which might only be possible if you're running your code in "shadow mode", executing but not affecting the output until it's fully ready - which, again, is only possible if you check in early-and-often!), you can discover that breakage _early_ and start work on finding an alternative (or, if necessary, abandon the whole project if it's intractable) earlier than otherwise

In fact, to take your example - "sometimes you realize the way you are implementing it (or the whole feature) is a bad idea and all the work should be orphaned" - yep! This happens! This is not a counter-example to my claim! Orphaning an inactive "feature" that has been pushed to (but not fully activated in) production has no more impact than abandoning a local branch. Even orphaning a feature that has been partially activated is still fine, so long as it didn't result in irreversible long-term state-updates to application entities (e.g. if it added a "fooFeatureStatus" to all the users in your database, rolling it back will be tricky. But not impossible!). So there are very few (or no) downsides, and all the advantages I described above.

I do agree that API changes are the one exception to this rule - you should have those reasonably nailed down and certain before you make changes, since those affect your clients. But any purely-internal change which can be put behind a feature flag, on an inactive code path, in shadow mode, in canary/onebox testing, or any other kind of "safe to deploy in prod, but not _really_ affecting all of prod" - do it!

I'm not advocating branches should be made longer for no reason, but I see no reason to avoid them. I do think they should be made long if they need to be to encapsulate a feature. I don't think that the pain of resolving conflicts scales super-linearly and that idea doesn't make sense to me. In fact, I think the opposite is true. I admit, that could be a taste issue.

I mistyped at one point by saying to avoid a partial-feature commit when I meant partial-feature merge onto the main branch. Yes, commit to the feature branch often. Hopefully clarifying that resolves most of the issues that you raised as advantages.

Meanwhile, managing partially built features with feature flags seems worse. It lets orphaned code migrate into the main codebase and stay there. You brought up a broken dependency. What happens if a dependency is broken and not likely to get fixed for a month? Just leave that code orphaned in the main codebase for a month? Further, having multiple partial-feature commits complicates bisecting or simply reading a feature's history.

I concede feature flags for deployment have some advantages, especially for feature-specific elevation through testing.

> I don't think that the pain of resolving conflicts scales super-linearly and that idea doesn't make sense to me. In fact, I think the opposite is true. I admit, that could be a taste issue.

Then we'll have to agree to disagree, as this is pretty fundamental to my argument - everything else ("Your coworkers get to see what you're working on and will notice clashes of intention earlier", "You can run incomplete features in shadow-mode to ensure they don't affect performance in production", etc.) is just sugar.

I really appreciate your well-reasoned and civil discussion!

Maintaining orphaned code has a cost. Keeping a change you've made to a function (and its callers) that's no longer needed obscures both the history and probably what it does.

Not saying trunk-based is wrong, but to say abandoning a feature is as cheap as in branch-based development fails to account for everything.

In my case, I switched rapidly to git-rebase because it produces history that is much cleaner and easier to understand. I only do merge if there is a good reason to preserve history (e.g. some other branches depend on it, or some test reports refer to a given commit).

I guess I find it easier to parse with feature branches than all on main.

Mostly it's about the way they do things, and I always adopt the team's ways (also: my initial PRs look weird to them!).

merge, then resolve conflicts with rerere, undo the merge and rebase

You should do a reverse rebase (if that makes sense, lol) for this. Instead of rebasing the branch onto master, rebase master onto the branch. The only downside is that it requires many force pushes to the branch.

Yeah, force pushing to master is a huge no-no - I can't even remember the number of times I've force pushed the wrong shit in a hurry. I can't imagine an entire team dealing with this.

I would first say that I would sooner re-code the whole feature by hand from memory than ever rebasing master onto anything for any serious project.

Even if we were to do that, rebasing master is likely to lead to the same issue.

My preferred solution is rebase featureB onto master for the 99% or 99.9% of use cases where this is smooth, and in the rare case that you have too many conflicts to resolve, merge master into featureB (and/or squash featureB then rebase onto master, depending on use case).

> it requires many force push in the branch

Can't say I recommend this approach.

> you might as well rebase or cherry-pick

Both tools are pure vandalism compared to merge. Among the two, cherry-picking is preferable in this case because you're "only" destroying your own history, so in the end, it's your funeral.

> Developer end up fixing additional issues in the merge commit instead of actual commits.

A merge commit IS an actual commit, in every sense of the word. The notion it somehow isn't, is what you need to get rid of.

Rebasing / rerolling is completely fine if done right, no need to be overly zealous. But merges are often more elegant

I think merge is great, having a “unit” for a feature branch being integrated is nice and not all things can be done in commits which are individually justifiable. The ability to bisect cleanly through the first ancestor is precious.

I do agree that resolving conflicts in merges is risky though. It can make sense when merging just one way between permanent branch (e.g. a 1.x branch into a 2.x), but as soon as cross merges become a possibility it’s probably a mistake.
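On bisecting through the first ancestor: `git bisect` has a `--first-parent` flag (since git 2.29) that skips the interior of merged feature branches. A sketch in a disposable repo (all names invented):

```shell
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q -b main .
git config user.email you@example.com && git config user.name you
echo ok > f.txt && git add f.txt && git commit -qm good
good=$(git rev-parse HEAD)
echo other > o.txt && git add o.txt && git commit -qm "unrelated main work"
git checkout -qb feature "$good"
echo wip > g.txt && git add g.txt && git commit -qm wip
echo broken > f.txt && git commit -qam "introduces the bug"
git checkout -q main
git merge -q --no-ff -m "merge feature" feature
merge=$(git rev-parse main)
# bisect along first-parent history only: merges are the testable units
git bisect start --first-parent main "$good" >/dev/null 2>&1
git bisect run sh -c 'grep -q ok f.txt' >/dev/null 2>&1
culprit=$(git rev-parse refs/bisect/bad)       # the merge, not an interior wip commit
git bisect reset >/dev/null 2>&1
```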

> I do agree that resolving conflicts in merges is risky though.

How do you do otherwise, though? Or is your workflow a combination of rebases and merges? Continual rebasing of the feature branch onto `main` and then a final merge commit when it's ready to go?

> Or is your workflow a combination of rebases and merges? Continual rebasing of the feature branch onto `main` and then a final merge commit when it's ready to go?

Yes. You don't usually need "continual" rebasing, most commonly just once just before merging.

In fact a good merge tool can do it for you (and refuse to merge if there are conflicts when rebasing).

This is what I do unless I’m working with a large number of people on the feature branch (which is rare, usually that would be multiple branches).

You get the equivalent of “mergeless” history (just restrict your git log to merge commits) but can dig into the individual feature histories easily.

This works if you only have experienced professional developers on the team. If you have juniors or non-devs (mathematicians, geographers, quants...) who just happen to also code, rebase is a minefield. This is especially true for open source contributions.

Merge and conflict resolution is a minefield if inexperienced developers do it too. Fortunately it can (often) be arranged that those with some understanding of the issues involved do the resolution.

you get the same, and often more, conflicts using rebase

Always clone to another folder before merging ;)

If you can’t rebase, I don’t want you pushing to my main branches. I would rather teach everyone how to rebase before I cave and allow merge commits.

In a perfect world with infinite resources and no time constraints, sure.

I've done several projects rebase-only with limited resources and time constraints. In fact, it saved resources and time when we did it.

The time invested in learning the rebase commands is repaid the first time it lets you avoid a really bad merge conflict.

I usually rebase the branch onto the upstream branch (master or main or whatever) if there are merge conflicts. You can then resolve the conflicts commit by commit. This requires force pushes, but they are not normally a problem because only one dev tends to work on a particular branch before it's merged.

If you do have multiple devs working on the same branch, use `git pull --rebase` to stay in sync with each other, don't use merges and leave lots of merge commits. If you need to resolve conflicts with upstream, make sure other people have stopped working on the branch, rebase it, then merge.

Rebasing is a tool of last resort, for when something has so fouled up the code that merging a large-scale refactor is even more time consuming.

Rebasing takes longer and is actually more prone to error because of the clunky interface. There is absolutely nothing wrong with squashing commits in a feature branch and merging that into master/main. In fact, it's generally better for the health of the repo and the mental health of developers.

in my experience, rebase works great if the commits are structured, and is much more painful with lots of overlapping changes, say from continuously making _wip_ commits every hour

I certainly am not perfect to the degree that I make a single commit or a relatively small number of “structured commits” to any branch I’m working on. Neither is anybody else (regardless of whether they think they are). Anyone who tailors their commit structure around a poorly designed tool interface is just wasting their own time, and therefore the company’s, in my opinion.

Well, I do, more or less. Whether it's a waste of time or not is dependent on how efficient you can do it, in my workflow it's efficient, and to which degree you can avoid waste as a consequence of a "messier" approach. For me, it evens out.

I don't understand the hate for merges, or the love for rebases. Let's consider what may happen using a github flow strategy (main branch, feature branches based solely on main):

* If you screw up a merge, you undo the merge commit. Now your branch is exactly as it was. That may not be possible with a rebase.

* If you push some code to the remote, and later find out it was outdated, you can merge it with main and push again: no need to force, github can distinguish what's already been reviewed and what hasn't. With rebase, you may need to push --force, and if someone already reviewed the code they're going to be shit out of luck, as github will lose the capability to review "changes since last review", as the reference it had may have been lost.

I also merge these features using squash commits, which provides a very linear history. This also saves some effort (you don't need to rebase the commits in the feature branch, which can be a pain in the ass for unorganized people and git newbies, and you are pushed towards making smaller, granular PRs that make sense for the repo history).

I usually do `git merge --no-ff development` when working on my feature branch. We do not leave feature branches "open/live" for too much time, so merge conflicts are not usually a problem, but sometimes they do happen.

I like cherry-pick, but I barely use it (e.g., I need to cherry-pick one commit from branch X into my branch). I don't like rebase much because it requires force-push.

rebase only requires a force push if you're rebasing something already pushed

What I would like to see is a way to enforce fast-forward only merges along with the forced creation of a merge commit that references the same git tree as the HEAD commit of the branch that was just merged.

This way, you know which set of commits was in the branch by looking at the parent commits of the merge commit, but the merge commit itself did not involve any automated conflict resolution.

I've wanted this for a while as well. Squash-only merges, which are enforceable in github, get you close but leave you without any automated way to determine if a given branch was ever merged to main or not ...

merge is the only way to reliably determine if a branch was merged ¯\_(ツ)_/¯

Yes, it is a shame that you can't combine `git merge --ff-only --no-ff`.

git rebase && git merge --no-ff

(a) Because rebase is run on the branch to be rebased, but merge is run on the branch being merged to, that reverses the parents of the merge commit and puts it on the rebased branch rather than the parent. (b) Even if that were fixed, it alters the rebased branch, rather than stopping and warning in an unexpected case.

I really do want the natural semantics of merge --ff-only --no-ff.

you can do this

git rebase -r target source && git checkout target && git merge --no-ff source

since what you're asking for is not a single merge, you'd have to glue it together yourself. Perhaps add it as a script
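A sketch of such a script, assuming the desired semantics are "refuse unless fast-forwardable, but still record a merge commit": `git merge-base --is-ancestor` provides the --ff-only check. (Demo repo and names are invented.)

```shell
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q -b main .
git config user.email you@example.com && git config user.name you
echo a > a.txt && git add a.txt && git commit -qm base
git checkout -qb feature
echo b > b.txt && git add b.txt && git commit -qm feat
git checkout -q main
# "--ff-only --no-ff": bail out unless feature is a descendant of main,
# then force an explicit merge commit anyway
git merge-base --is-ancestor main feature \
  && git merge -q --no-ff -m "merge feature" feature
```

Because the merge was fast-forwardable, the merge commit's tree is identical to the branch head's tree, which is exactly the "no automated conflict resolution" property asked for upthread.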

It is also a security risk. Someone could add whatever unreviewed code and it would get glossed over as a merge commit. Put your payload in an innocuous file not likely to be touched and call a boilerplate-looking function as a side effect from somewhere.

> If there is any kind of conflict, you are making code changes in the merge commit itself to resolve it.

I don't get it. If you rebase, you get 20 chances to do the same.

Reviewing merge commits is harder because they will sometimes have huge diffs against both branches.

Rebasing is the process of redeveloping your feature based on the current master. This yields smaller, easier steps to review later.

It is a pity that we don't have tooling to create "hidden" merge commits that connect rebased branches; this would retain the history better and allow pulling more easily.

If you have a bunch of related commits in a feature, it is easier to revert merges (even if you do a pre-merge rebase from master and then merge with --no-ff).

I'd like a "quick ff" that will ff if there are no conflicts, or ff as far as it can with no conflicts - and an easy way to apply to many branches.

Also, a way to "rebase" that works the same as cherry-picking commits on top of the target. As far as I can see, the regular rebase works its way up the target branch, so that I end up resolving conflicts in code that eventually changed in the target.

> Developer end up fixing additional issues in the merge commit instead of actual commits.

As long as the merge commit is being reviewed with the rest of the PR, that's fine, right? (We use rebase while working on feature branches, and then squash & merge for completed PRs, which seems to be the best of both worlds)

Personally, I believe merging 'master' to your feature branch is the wrong model... what one should do is create a new branch from master and merge the old branch into it.

Why? Merging master into the feature branch is done so that you can test the conflict resolution in the branch before inflicting it on everyone. It's also done on a regular basis in longer-running feature branches to prevent large conflicts from accumulating: you can merge master into your branch multiple times to stay current with master before ever merging back into master.

I'm not sure why parent says this causes them to lose track of which changes happened in which branch. The history does get a bit more complex at a glance, but for any given commit, it's easy to pinpoint its origin if using only merge commits. It only gets harder if you accidentally rebase someone else's commits along the way.

For smaller feature branches and smaller projects, it's okay to merge branches into master, but for large branches, large projects, large teams, and teams that care about testing, merging master into feature branches is a best practice. What makes you consider it "wrong"?

A merge commit is just a commit with two parents. You're not affecting the master branch at all when you "merge in master", you're just creating a new commit where the first parent is your branch, and the second parent is the master branch.

If you do things the way you're suggesting, you'll make it really hard to tell what commits were made on your branch. Git clients tend to assume the first parent is the branch you care about.

If you’re merging (and not rebasing) it’s the same exact thing. You’re just switching the “incoming” version, but conflicts will be identical.

I have never had issues with merge, unless rerere was enabled. I've had some extremely surprising results recently with it enabled and I finally disabled it for good.

What about the commit signatures? Only merge keeps them, right?

I don't know how stupid this is on a scale from 1 to 10. I've created a wrapper [1] for git (called "shit", for "short git") that converts non-padded revisions to their padded counterpart.


"shit show 14" gets converted to "git show 00000140"

"shit log 10..14" translates to "git log 00000100..00000140"

[1]: https://github.com/zegl/extremely-linear/blob/main/shit
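Not the linked script, just a minimal sketch of the padding step it needs (this version left-pads, i.e. 14 becomes 00000014):

```shell
# Hypothetical helper: left-pad a bare revision number to 8 characters.
pad() { printf '%8s' "$1" | tr ' ' '0'; }

# A wrapper would rewrite revision-looking arguments before delegating, e.g.:
#   shit show 14     => git show "$(pad 14)"
#   shit log 10..14  => git log "$(pad 10)..$(pad 14)"
```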

Other customers also brew-installed: fuck [1]

[1]: https://github.com/nvbn/thefuck

You may want to take a look at the monotonic commit numbering scheme that Git already has, before trying to hack one into the hashes:


Why the trailing zero? The article quotes hashes starting with "0000001", or "0000014".

Shouldn't "shit show 14" get converted to "git show 0000014"?

Thank you for addressing my one and only concern with this scheme! No notes.

Mercurial always has had sequential revision numbers in addition to hashes for every commit.

They aren't perfect, of course. All they indicate is in which order the current clone of the repo saw the commits. So two clones could pull the commits in different order and each clone could have different revision numbers for the same commits.

But they're still so fantastically useful. Even with their imperfections, you know that commit 500 cannot be a parent of commit 499. When looking at blame logs (annotate logs), you can be pretty sure that commit 200 happened some years before commit 40520. Plus, if your repo isn't big (and most repos on Github are not that big by numbers of commits), your revision numbers are smaller than even short git hashes, so they're easier to type in the CLI.
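You can approximate Mercurial-style local revision numbers in git with `git rev-list`; a sketch (the helper names `rev` and `num` are made up, and the ordering has the same caveats as hg's local numbers):

```shell
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q -b main .
git config user.email you@example.com && git config user.name you
for i in 1 2 3; do echo "$i" > f.txt; git add f.txt; git commit -qm "commit $i"; done

# local revision number -> hash (rev 0 = root commit, like hg)
rev() { git rev-list --reverse HEAD | sed -n "$(( $1 + 1 ))p"; }
# hash -> 1-based sequential position (count of ancestors, itself included)
num() { git rev-list --count "$1"; }
```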

Seems like a design fault in git that commits only have a single id (sha1 hash) and that hashes are written without any prefix indicating which type of id it is.

If all hashes were prefixed with "h", it would have been so simple to add another (secure) hash and a serial number.

E.g. h123456 for the sha1, k6543 for sha256 and n100 for the commit number.

See also Lucky Commit [0], which uses various types of whitespace characters instead of a hash inside the commit, which makes it look more magical.

I wonder about performance, though. Why is the author's method slower than the package I linked?

[0]: https://github.com/not-an-aardvark/lucky-commit

Thanks for sharing, this is really cool! Using whitespace is a really clever trick, and running on the GPU makes it even more impressive.

I've been using githashcrash [1], but it's only running on the CPU, which is why it's a bit slower. :-)

[1]: https://github.com/Mattias-/githashcrash

Using whitespace is cool, but you know what would be really cool? Using a thesaurus to reword the commit message until it matches the hash :)

... or refactor the code using an automated thesaurus and a bit of AI in a way to generate a particular hash.

- Hey Bob, why did you rename the 'pick_person' function to 'choose_desirable_candidate'?

- git made me do it

I was going to solve some business problems today but instead there became an urgent need to GPU accelerate the task of making my commit hash appear to have the rich semantics of "a number that goes up". Hm, I bet this old FPGA could be repurposed to add a 2x speedup...

"Indubitably overhaul insect"

Only works if your commit message is written in hexadecimal characters

I don't understand — the example in the article adds the string "magic: MTQIpN2AmwQA" to the commit message. The final hash is hexadecimal, but what you feed into it isn't.

"it" (in "it matches the hash") = "the next sequential number", not "the commit message", afaict. Not very clear, I agree.

Update: git-linearize now uses lucky_commit as its backend!

I haven't checked your codebase so I don't know how easy it was but damn, you replaced your backend within 16 minutes according to your comment timings.

That's some nice modularization. Good job!

You're giving me too much credit. The script [1] is only 50 lines of bash.

[1]: https://github.com/zegl/extremely-linear/blob/0011003da13132...

Git also supports extra headers in commits. Interesting that neither tool went with that.

What do you mean by "extra headers"?

Exactly what the name says.

A git commit is composed of a number of headers (key: value fields) and a commit message.

There is a set of "standard headers" (tree, parent*, author, committer, encoding?), but then you can add more. In fact there's a set of semi-standard headers, as in headers git itself will add under some conditions: `gpgsig` and `gpgsig-sha256` if the commit is signed, and I think `mergetag` for signed tags. They are documented as part of the signature format but they're not "baseline" features: https://git-scm.com/docs/signature-format#_commit_signatures

But because of this, a git client should support arbitrary commit headers, and round-trip them even if it does not expose them.

I was trying to make something like this post a couple of years back and used custom headers, even made this repo with a few zeroes with no salt on the commit message and no shenanigans in the files: https://gitlab.com/pedroteosousa/teste/-/commit/000000005093...

I have this ugly code that finds the salt given the target hash, and another that actually creates the commit given the salt. It's not very usable, but I'll leave it here for anyone who finds it interesting: https://gitlab.com/pedroteosousa/useless-git-tools/-/tree/ma...

I figured that a good option would be to slightly change the date. I don't know what the date resolution is, but shuffling it around a bit shouldn't be an issue.

Of course, if the date only has seconds resolution, it may be too big of a shift to be reasonable.

I fail to see the point of this, in fact, I think this is a fundamentally flawed approach to dealing with your revision history. The problem is that rebasing commits has the potential of screwing up the integrity of your commit history.

How are you going to deal with non-trivial feature branches that need to be integrated into master? Squash them and commit? Good luck when you need to git bisect an issue. Or rebase and potentially screwing up the integrity of the unit test results in the rebased branch? Both sound unappealing to me.

The problem is not a history with a lot of branches in it, it is in not knowing how to use your tools to present a view on that history you are interested in and is easy for you to understand.

It's a joke. The swooshing sound you heard was it going past you.

> The problem is not a history with a lot of branches in it, it is in not knowing how to use your tools to present a view on that history you are interested in and is easy for you to understand.

To me this is like saying to a construction worker: “The problem is not that your hammer has sharp spikes coming out of the handle at every angle. The problem is that you don’t put on a chain mail glove when using it.” That’s certainly one way to look at it.

Pretty analogy, but I don't see how a specific functionality of git (commit history) that has no use case other than looking tidy compares to the handle of a hammer.

This somewhat depends on how big your features are. Arguably, large long-lived feature branches are the problem themselves. If larger features are broken down and developed/merged piecemeal, then you still have smaller commits you can fall back on.

IIRC, GitHub uses a development model where partially implemented features are actually deployed to production, but hidden behind feature flags.

> I fail to see the point of this

I'm pretty sure the point is that this is a one-person project and the author can play around. He's not suggesting your team of 100 people to adopt this for the development of your commercial product.

Quite the opposite. The largest companies just about all use linear commit histories.

That's not the opposite of what I wrote. "This" referred to brute forcing the hashes, not to linear history.

I think the fundamental misunderstanding that people with your point of view have regarding linear commit histories is that it's not just different VCS usage; the entire development process is changed.

When you are using linear histories and rebasing you don't do monolithic feature branches. You land smaller chunks and gate their functionality via some configuration variable. `if (useNewPath) { newPath(); } else { oldPath(); }` and all your new incremental features land in `newPath`. All tests pass on both code paths and nothing breaks. When the feature is fully done then you change the default configuration to move to the `newPath`.

> How are you going to deal with non-trivial feature branches that need to be integrated into master?

That's the point -- this isn't a thing in rebase workflows. That's a feature. You don't have to deal with megapatches for massive features. It's incrementally verified along the way and bisect works flawlessly.

It is amazing how much time projects seem to spend on rewriting history for the goal of displaying it in a pretty way. Leaving history intact and having better ways to display it seems far saner. Even after a merge, the history in the branch may be useful for bisect, etc.

Yes, a thousand times yes.

If people knew about --first-parent everyone could stop complaining about merge commits in the history.
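For anyone who hasn't seen it, a quick demo of how `--first-parent` collapses the merged-in noise (invented repo):

```shell
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q -b main .
git config user.email you@example.com && git config user.name you
echo a > a.txt && git add a.txt && git commit -qm base
git checkout -qb feature
echo w1 > w1.txt && git add w1.txt && git commit -qm wip1
echo w2 > w2.txt && git add w2.txt && git commit -qm wip2
git checkout -q main
git merge -q --no-ff -m "merge feature" feature
full=$(git rev-list --count HEAD)               # base + 2 wip + merge = 4
fp=$(git rev-list --count --first-parent HEAD)  # base + merge = 2
git log --oneline --first-parent                # the wip commits are hidden
```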

This is a fun idea, but it will mess with your GC heuristics.


Git does something called "packing" when it detects "approximately more than <X (configurable)> loose objects" in your .git/objects/ folder. The key word here is "approximately". It will guess how many total objects you have by looking in a few folders and assuming that the objects are uniformly distributed among them (these folders consist of the first 2 characters of the SHA-1 digest). If you have a bunch of commits in the .git/objects/00/ folder, as would happen here, git will drastically over- or under-approximate the total number of objects depending on whether that 00/ folder is included in the heuristic.
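To see the fan-out being discussed (a sketch; exactly which folders git samples for the estimate is an implementation detail):

```shell
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q -b main .
git config user.email you@example.com && git config user.name you
echo x > x.txt && git add x.txt && git commit -qm one   # blob + tree + commit
# loose objects are sharded into .git/objects/<first two hex chars>/
ls .git/objects | grep -E '^[0-9a-f]{2}$'
loose=$(git count-objects | cut -d' ' -f1)              # "N objects, K kilobytes"
```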

This isn't the end of the world, but something to consider.

Could use little-endian numbers to avoid this: 0000, 1000, 2000, 3000, …, e000, f000, 0100, …

I think the sweet spot in Developer productivity was when we had SVN repos and used git-svn on the client. Commits were all rebased on git level prior to pushing. If you committed something that broke unit tests your colleagues would pass you a really ugly plush animal of shame that would sit on your desk until the next coworker broke the build.

We performed code review with a projector in our office jointly looking at diffs, or emacs.

Of course it's neat to have GitHub Actions now and pull requests for asynchronous code review. But I learned so much from my colleagues directly in that now-obscure working mode, and I am still grateful for it.

> If you committed something that broke unit tests your colleagues would pass you a really ugly plush animal of shame that would sit on your desk until the next coworker broke the build.

We did have an ugly plush animal, but it served more obscure purposes. For blame of broken builds, we had an info screen that counted the number of times a build had passed, and displayed below the name of the person who last broke it.

Explaining to outsiders and non-developers that "Yes, when you make a mistake in this department, we put the person's name on the wall and don't take it down until someone else makes a mistake" sounds so toxic. But it strangely enough wasn't so harsh. Of course there was some stigma that you'd want to avoid, but not to a degree of feeling prolonged shame.

I once interviewed a junior-ish developer who told me that his then-current team had a dunce cap to be worn by whoever broke the build. I copied it immediately. There was no toxicity, it was a good laugh, and as manager I wore it more than once, being a bit too liberal with my commits.

On another team I was on, in 2002 using CVS, we had an upside-down solo cup as a base for a small plastic pirate flag. If you were ready to commit, you grabbed the pirate flag as a mutex on the CVS server. Of course, this turned competitive… and piratical.

I despair about long-lived git feature branches and pull requests. The pull request model is fine for open source development, but it’s been a move backwards for internal development, from having a central trunk that a team commits to several times a day. The compensating factors are git’s overall improvements (in speed and in principled approach to being a content addressable filesystem) and all of the fantastic improvements in linters and static analysis tools, and in devops pipelines.

> I despair about long-lived git feature branches and pull requests

This comes up a lot - multiple people on this thread have even said that it's a bad idea to have a long running feature branch.

This seems like a case of the tool imposing its will on workflows, rather than enabling them. Not all features are tiny. I don't see anything wrong with a long-lived branch if the feature is in fact large. After all, it may be completely redesigned multiple times over before being merged into the main branch. Or it may never make it.

And no I don't think it always works to break down a large feature into smaller ones, because your course may change as you go, and it's much easier not to have to revert incremental features when it does.

But people are so worried about having a perfect history. So they rebase. But if it's a long lived (shared) branch you don't want to do that. So now what? A merge would be ugly, can't do that. So now you've painted yourself in a corner for no good reason.

A long lived branch was a pain even in the CVS days. I'm in particular thinking about the "aviary" branch (for Phoenix/Thunderbird) Mozilla had for quite a while.

Of course tooling can make it harder — there was no such thing as rebasing on CVS.

A long-lived feature branch is not a problem if you rebase it onto master often. Move all refactorings to the beginning of the branch and merge them to master if they become too many.

> A long lived feature branch is not a problem if you rebase it to master often.

Yes but if it's a shared branch then you may have problems with this.

The safer way is to merge from master into the branch but nobody wants to do that because it's ugly.

For long-lived feature branches that are the target of multiple smaller PRs, history should never be rewritten. I call these branches integration branches. I agree with you wholeheartedly that master should be merged into the branch. It's also so much easier to resolve merge conflicts all at once in the merge commit rather than iteratively for each commit. Also, the information on how merge conflicts are resolved is then preserved in the merge commit. It's critical however that when you merge the branch back into master, you use --no-ff. It gets really confusing when you fast forward master to the feature branch.

The solution for it being ugly is to look at main/master's history with the --first-parent option. This lets you cut through the noise and just see the true history of the main branch. Without --first-parent, you see a bunch of commits in the log that master was never actually pointing at. This is why it's critical that you use --no-ff when merging these 'integration' branches as I call them. It's important that the integration branch is the second parent of master.
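Concretely, the flow might look like this (branch names are illustrative; the script sets up a throwaway repo so the commands are actually runnable):

```shell
#!/bin/sh
# Integration-branch flow sketch: merge master INTO the branch,
# then merge back with --no-ff.
set -e
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.email you@example.com
git config user.name you
echo base > file && git add file && git commit -qm "base"

git checkout -qb integration
echo feat > feat.txt && git add feat.txt && git commit -qm "feature work"

git checkout -q master
echo main > main.txt && git add main.txt && git commit -qm "mainline work"

# Merge master INTO the integration branch -- conflicts are resolved
# once, here, and the resolution is preserved in this merge commit:
git checkout -q integration
git merge -q -m "merge master into integration" master

# Merge back with --no-ff so integration stays the SECOND parent:
git checkout -q master
git merge -q --no-ff -m "merge integration" integration

# The clean mainline view:
git log --first-parent --oneline
```

With `--first-parent`, the intermediate "feature work" commits reachable only through the second parent disappear from the mainline log.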

I agree with you here.

But what you are describing still isn't good enough for a lot of people, because even though `--first-parent` hides the noise it's still there and just knowing there's a mess under the rug is enough to be problematic.

I don't think it's really the fault of the tooling; it's more about the common interpretation of what is a mess and what isn't. If the GitHub commit history view allowed you to show `--first-parent`, maybe it would be less of a problem.

The github history view is garbage and people should stop using it.

Can one merge master into the feature branch often, and then interactive-rebase onto master, removing all the merges?

> The pull request model is fine for open source development, but it’s been a move backwards for internal development

The more paranoid would claim that requiring PRs that then require approvals prevents a malicious engineer from adding an obvious back door to the code.

You would hope you can trust your co-workers, but sometimes a hack is an inside job.

There are all sorts of workflows that can be arranged to prevent that while still having optimistic continuous integration on trunk.

No joke, a few weeks ago a colleague from university shared a few anecdotes about his mentor-coworker-boss at work with me, and it's similar. Every time they broke the production branch and the boss had to change the code or pull out some AWS magic to restore a database, he would give the fixed commits names like "Cagada de [Employee Name]", which roughly translates to "[Employee Name] F*ed Up", since he knew they wouldn't forget it that way.

It's especially cool given that he would always see his employees' f*k-ups as learning opportunities. He would always teach them what went wrong and how to fix it before shaming them in the git history. He always told them he did it to ensure they wouldn't forget both the shameful f*k-up and the bit of learning that came along with it. They always laugh it off and understand the boss's intentions. It isn't harsh or anything.

Yes, it's all about context. Good intentions matter a lot here.

Additionally, it keeps developers humble, because their mistakes are in the codebase "forever".

That said, it is a fine line - things can easily get toxic very quickly, so it's important that everyone sees it as a (half-serious) joke.

Thankfully, modern development practice is to run tests before committing, so ideally the build should never be broken.

With good infra, everything from unit tests to integration to acceptance tests gets run before code hits main.

The only excuse for builds breaking nowadays is insufficient automated safeguards.

Our whole practice revolved around not pushing broken code because all code was tested locally prior to the push. In fact we practiced continuous integration as in its original meaning, integrating code multiple times per day. Releases were performed from a release branch so the occasional hiccup wasn’t worse than let’s say a PR that does not build. However fixing the broken build was the TOP priority if it happens (like every two months)

It's not toxic because every single developer knows that it could be them next time around.

It is literally the definition of toxic. It is the antithesis of making it okay to fail, having the entire team to take responsibility. Instead individual mistakes are highlighted and publicly shamed. How can you possibly not think this is toxic?

Toxic is not the highlighting of breaking the build with a trophy, it's what gets associated with it.

Imagine an "ugliest shirt" trophy, given out to whoever wears the ugliest shirt of the week. At a fashion magazine, this may be toxic shaming. At a tech-heavy startup it might have people start buying the worst shirts they can to try to win it.

If the attitude associated with getting the trophy is condemnation, that's bad. If it's a reminder that everyone fucks up and should be careful, that's fine.

Oof that hits a sore spot. I was the 2016 Winner of the Ugliest Shirts Award at one of the first technology companies I worked at. Being singled out in front of all your peers for poor fashion sense and then the ensuing obligatory laugh ruined my opinion of that company's leadership. I would strongly encourage anyone in a professional environment (especially those in leadership roles) to keep comments on appearance to yourself.

I'm sorry to remind you of a bad time. I would like to point out that "2016 Ugliest Shirts" is a pretty different concept from "Person who wore the ugliest shirt this week" with a picture of you in a ratty beloved tee. It sounds like those were year end remarks, which means instead of judging an act they were judging your long term taste. Also it implies the most memorable thing about you was your shirt choice. And lastly, you weren't anticipating it, so you found out everyone was secretly judging you on something.

If, during orientation you were told a trophy gets given out every week for it, and some people wear really ugly shirts each Friday to try to win it, it would have felt very different.

But yeah, year end humorous awards like that probably belong confined to episodes of The Office.

Sounds like the definition of making it okay to fail.

The only consequence is a plush toy of shame on your desk until the next person fails? Yes, please.

Sounds like a great way to lighten the mood about failure.

Uh no, and please never work with me. The definition of "making it okay to fail" is a pat on the back and a retrospective to figure out what went wrong and prevent it from happening again.

Thank you for distinguishing yourself.

I'm not sure why you think a humorous plush toy precludes any of the other things you mention (retrospective, etc). I see a plush toy as something that makes failure an amusing thing to laugh at, rather than something to be hung up about.

But don't worry. At your request, I will not work with you.

What I meant is that we know, inside our blood cells, that breaking the build can happen to anyone, and probably will. The trophy is not public shaming, it's the camaraderie that comes from shared humility.

You say to somebody downthread "remind me never to work with you". I would find it difficult to work with someone as hyper sensitive -- on other people's behalf, yet! -- as you seem to be in this thread.

Think of it more as a fun and gentle ribbing than public shaming.

we did something similar, but everyone knew it was a joke and we all took turns with it. I guess we didn't take ourselves as seriously

I have my old team's rubber chicken and I'm never giving it up.

In-person code review is the only way to do it. Pull requests optimize for the wrong part of code review, so now everyone thinks it's supposed to be a quality gate.

Yep. It makes a lot of sense for open source where gate keeping makes sense (to reduce feature bloat, back doors and an inflated API surface that needs to be maintained almost indefinitely).

Most corporate code bases are written by a smallish team operating under tight time constraints so most contributions are actually improving on the current state of the code base. Then PRs delay the integration, and lead to all kinds of follow up activities in keeping PR associated problems at bay. For example the hours wasted by my team in using stacked PRs to separate Boy Scout rule changes to the code from the feature is just abnormal.

Commit queues are so far superior to shaming broken builds that I think it's only nostalgia that makes you miss it.

Absolutely. In my experience, it’s only “not toxic” to a few people, and for most others it is toxic, but the people who like it won’t ever be able to see that.

exactly. even if the current team is cool with it, team+1 may not be, and now they're in a position that feels shitty to them. it's good 'ole boys club shit.

people brag about their dunce caps, "john's fault" commit messages from managers, and other forms of public shame as a badge of honor when it would be so much more interesting to hear about how they fixed the broken processes that led to the problems in the first place.

"oops, a developer fucked up the prod db" says more about the org and its processes than it does about the developer.

For the record: I am not recommending people adopt a toxic culture.

What I would like people to take away from these discussions is the curiosity to question established practices and processes, and to re-evaluate the cost-benefit ratio of process steps, just like the manufacturing people I write software for continue to optimize their working mode again and again.

I think CI (even better, commit queue) is pretty much table stakes for a 3+ person project at this point.

This (plush toy and projector) has “feel good” all over it :)

Next step: a svn-git proxy that allows one to use a subversion client with a remote git repository.

Two years from peak Covid, and the plushies are the object of nostalgia.

I am literally in the middle of trying to convince my group from moving away from all this. Would you recommend going back to this system?

In this case I reminisced about the toolset, but the workflow is what brought the value, so of course I advise against using Subversion itself.

Look up trunk-based development and read the continuous integration book published by Addison-Wesley (is it the Jez Humble book or the Duvall book? I always confuse the authors; both books are great though).

The hard part will be to convince people to explore a different working mode AND to learn that what is proposed is not an anarchist style of development but a development model that optimizes for efficiency.

So I'm thinking about my approach, which is "use commits as game save points, mostly WIP, then use rebase to tidy things up before publishing".

Wouldn't working on trunk still mean I'm working on a feature branch, but it all ends up squashed into a single commit? Or do I lose my opportunity to polish?

It has been my habit for a while to make the root commit 0000000 because it’s fun, but for some reason it had not occurred to me to generalise this to subsequent commits. Tempting, very tempting. I have a couple of solo-developed-and-publicly-shared projects in mind that I will probably do this for.

How do you make the first commit 0000000? (Without using this project, obviously).

You only need to do it once if it's the first commit and you make it empty...

Might be by using that hashcrash tool.
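Under the hood, a tool like that just grinds: mutate something invisible (a message trailer, whitespace, timestamps) and re-hash until the digest starts with the desired digits. A rough sketch using `git hash-object` (the nonce trailer and fixed author line are made up; a 2-hex-digit prefix keeps the demo fast, while the 7 digits from the article would take ~16^7 attempts):

```shell
#!/bin/sh
# Brute-force a commit whose hash starts with a given prefix by varying
# a nonce in the commit message. Runs in a throwaway repo.
set -e
cd "$(mktemp -d)"
git init -q .

grind() {
    prefix=$1
    tree=4b825dc642cb6eb9a060e54bf8d69288fbee4904   # git's well-known empty tree
    n=0
    while :; do
        h=$(printf 'tree %s\nauthor A <a@a> 0 +0000\ncommitter A <a@a> 0 +0000\n\ninitial (nonce %d)\n' \
                "$tree" "$n" | git hash-object -t commit --stdin)
        case $h in "$prefix"*) echo "$h"; return 0 ;; esac
        n=$((n + 1))
    done
}

grind 00   # prints a commit hash starting with 00
```

Since the first commit has no parent, this is the only one you ever need to grind by hand; the linked tool does the same thing for every subsequent commit.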


I bet I wasn't the first person who thought this would have to be done by modifying actual file content — e.g. a dummy comment or something. That would clearly have been horrible, but the fact that git bases the checksum off the commit message is... surprising and fortunate, in this case!

It's a hash of everything that goes into a commit, including the commit message. The idea is that nothing that makes up a commit can change without changing the hash.

> It's a hash of everything that goes into a commit, including the commit message

... and, very notably, the hash of the parent commit. That is also part of the commit, which means that changing a parent commit would also imply changing the hashes of all later commits. This is sort of the whole point of git/version control.
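You can see this chaining directly with git's plumbing (throwaway repo; the fixed dates just make the demo deterministic):

```shell
#!/bin/sh
# Two commits with identical tree and message but different parents get
# different hashes, because the parent hash is part of the commit body.
set -e
cd "$(mktemp -d)"
git init -q .
git config user.email you@example.com
git config user.name you
export GIT_AUTHOR_DATE="2000-01-01T00:00:00Z"
export GIT_COMMITTER_DATE="2000-01-01T00:00:00Z"

tree=$(git mktree < /dev/null)                      # the empty tree
a1=$(git commit-tree -m "first" "$tree")
a2=$(git commit-tree -m "first, reworded" "$tree")  # message changed -> new hash
b1=$(git commit-tree -p "$a1" -m "second" "$tree")
b2=$(git commit-tree -p "$a2" -m "second" "$tree")  # same content, other parent

test "$a1" != "$a2" && test "$b1" != "$b2" && echo "parent is part of the hash"
```

This is why rewriting any commit (rebase, amend) necessarily rewrites every descendant.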

This might be a stupid question, but does anyone call git history a blockchain, then? A centralized blockchain, without proof of work or proof of anything really of course, but still, it sounds like the basic blockchain idea is there

Git branches form a https://en.wikipedia.org/wiki/Merkle_tree. Blockchain forks do also, but the goal is usually to ignore all but the longest.

Poor man’s blockchain it is, then :)

I feel like it would be better to have some dummy file in your repo that the tool modifies than mucking up your commit messages

I wonder if Git provides a pluggable hashing mechanism as part of SHA2 migration.

I imagine stuff like this and SVN to Git mirroring to work nicely with identical hashes.

Not currently, it’s a repo-level flag and you get one or the other.

It’ll undoubtedly be easier to further expand, but it’s nowhere near pluggable.

> Full collision (entire hash is zeros, then 000...1, etc.) — `git linearize --format "%040d"` (takes ~10³³ years to run per commit)

Hah :D

This is horrible and I like it.

I find tags to be a fairly useful way of providing a linear progression, but I guess that's no fun.

> but it can also mean to only allow merges in one direction, from feature branches into main, never the other way around. It kind of depends on the project.

That sounds like the Mainline Model, championed by Perforce[0]. It's actually fairly sensible.

[0] https://www.perforce.com/video-tutorials/vcs/mainline-model-...

Yeah, I think tags are a more practical way of accomplishing this. If you’re really interested in having a linear history, it might also make sense to switch to an alternative. Mercurial has linear version numbers and can even push to Git repositories.

At risk of coming across as a humorless Hacker News commenter, I will add that I enjoyed this post. It’s a neat hack!

Yes, it is a cool hack. I enjoy these, even if I can't find a practical application.

I thought I was a very tidy person, then I saw this.

I'm not sure it is tidy to inject random junk into your commit message to get a hash prefix.

Or maybe it is _extremely_ tidy.

I think the question boils down to "where is the junk?" ;) I.e, there is always junk, some put it in a commit hash, others into the files committed.

And now this uses invisible junk (white space). See update: https://news.ycombinator.com/item?id=33704810

Sane revision numbers are among the many reasons I prefer SVN to Git.

You could automatically tag each uploaded commit with a number drawn from a sequence - using a git post-update hook. The only problem is that this centralizes the process. It's not possible to have fully "blessed" commits without pushing them first. And that's how SVN works, too.

For local repositories, you can do it as a post-commit hook.

In the hook:

  old=$(git rev-parse HEAD)
  new=$(brute-force-tool "$prefix")  # placeholder for something like the hashcrash tool
  git update-ref -m "chose prefix $prefix" --create-reflog HEAD "$new"
Of course, it's pretty silly and slow.
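The sequence-number idea can likewise be done as a local post-commit hook. A self-contained demo (the r1, r2, … tag naming is an invented convention):

```shell
#!/bin/sh
# Demo of a local post-commit hook that gives every commit an SVN-style
# sequence tag. Runs in a throwaway repo so it's self-contained.
set -e
cd "$(mktemp -d)"
git init -q .
git config user.email you@example.com
git config user.name you

mkdir -p .git/hooks
cat > .git/hooks/post-commit <<'EOF'
#!/bin/sh
# Count existing r* tags and tag HEAD with the next number
n=$(git tag -l 'r*' | wc -l)
git tag "r$((n + 1))" HEAD
EOF
chmod +x .git/hooks/post-commit

git commit -q --allow-empty -m "one"
git commit -q --allow-empty -m "two"
git tag -l
```

Note the numbers only stay gapless and unambiguous because a single local repo assigns them, which is exactly the centralization point made above.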

... What are the other reasons?

Basically, not to put too fine a point on it, I believe that distributed version control is a problem no one ever truly had, and no one intends to ever have in the future.

I mean: Imagine going back in time 20 years to when git, hg, and bzr were created and telling the creators of those tools: "Hey, while designing your technology, you should be aware that it'll end up being used as a worldwide centralized monorepo run by Microsoft, and no one will ever use any of that distributed stuff."

They'll either laugh you out of the room or you'll be in trouble with the Department of Temporal Investigations for polluting the time line, because what we currently understand as git sure as hell won't be the design they'll come up with.

So for me: I prefer centralized. And SVN is just a reasonable one to use.

It's worth having distributed version control just so you can work on your own with your own branches and crap and only bother others when you're ready to share. And so you can work seamlessly when offline.

SVN feels like working in someone else's kitchen while several other people are trying to cook in it, too. It's hell. I prefer that we each have our own kitchen and bring our dishes to the table when they're ready.

I've also repeatedly found git a suitable (if not great—if they'd put all their effort behind libgit2 and make that the official implementation, that'd help a ton) tool to form the foundation of larger systems. It's a damn fine toolbox for attacking certain problems. SVN wouldn't have been much help in any of those situations.

> it'll end up being used as a worldwide centralized monorepo run by Microsoft, and no one will ever use any of that distributed stuff.

And I thought I use git in a decentralized fashion all the time … at least I don't need to connect to any other machine when committing, switching branches, merging, rebasing, etc. And my colleagues can do the same without any network connection at the same time.

Also, while it has the biggest brand recognition, not everyone is using GitHub for all their repositories, are they?

> I believe that distributed version control is a problem no one ever truly had, and no one intends to ever have in the future.

Sure. The problem is not "distributed version control", some problems are:

- I'm on a train with no internet, finished working on a thing and want to start working on another thing and don't want to mix them up.

- I want to make a branch and don't want to wait for eons while everything gets copied on the server.

- Oops there's a problem with the server now no one can perform any work.

Yes, SVN might have simpler commands, but its internals are messed up. Git's UI sucks, but just learn about blobs, trees, commits, and branches (pointers to commits), and you basically understand how Git works.
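Those four concepts are easy to poke at directly (throwaway repo; file and branch names are just for the demo):

```shell
#!/bin/sh
# Peek at git's object model: a commit points at a tree, the tree at
# blobs, and a branch is just a movable pointer to a commit.
set -e
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.email you@example.com
git config user.name you
echo hello > file && git add file && git commit -qm "first"

git cat-file -t HEAD            # commit
git cat-file -t 'HEAD^{tree}'   # tree
git cat-file -t HEAD:file       # blob
# The branch resolves to the same object HEAD does:
test "$(git rev-parse master)" = "$(git rev-parse HEAD)" && echo "branch == pointer to commit"
```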

All those things could be done on a centralized version control system as well. Just not on SVN.

Working offline could be done on a centralised VCS... it's just a bad idea? You'd need separate mechanisms to work offline vs online.

Tbh I'm not sure why git is called "distributed", it's a local system with remote sync capabilities.

It is distributed because everyone has a copy of the full source (nobody's copy is the copy) and you can push and pull from any machine. I can literally push from my laptop to yours (if I have an account on your machine) and you can pull from mine to yours. Github's copy of my code is exactly the same as Dave's copy. It just happens to have a fancy web interface.

In practice of course almost nobody uses Git to push/pull from other people's personal machines (I think I've done it once ever). But it's pretty common to push and pull from multiple hosted repos (e.g. Github and an internal company Gitlab). I imagine doing that sort of thing with SVN would be a right pain.

Oh we use distributed day in and day out for everything. Once you start battling censorship you’ll get it.

...so you are among the 1% who use the functionality that causes 99% of what makes git's mental model so convoluted and hard to learn (for everyone, not just the one-percenters).

Fair point! I would love to use the Extremely Linear Git History of the parent post.

I actually wrote a new layer on top of Git years ago (I called it git4 IIRC) and I pitched it to both GitHub and GitLab but they ignored it.

I guess I should have pitched it to the mailing list. I think I was too afraid it was dumb. Will do that at some point.

the mental model is hard for so many people precisely because all they know of git is github

That does sound like the 99% are pretty dumb then for using a tool that's not suitable for them... Or maybe it's not as binary, and Git's model with its complexity has more useful properties, making the trade-off worth it.

> That does sound like the 99% are pretty dumb then for using a tool that's not suitable for them...

Computing history is full of examples where technologies that are objectively not the best technologies end up being dominant. It's more about economics. (Network externalities, switching costs, ...)

Although I will admit that, with version control, there isn't even an alternative out there that is anything like an "objective winner". Each one has its problems, and it's a matter of choosing the least of the evils. -- I haven't tried any of the commercial ones though.
