GitHub-style rebase-only PRs have proven to be the best compromise between the 'preserve history' and 'linear history' strategies:
All PRs are rebased and merged in a linear history of merge commits that reference the PR#. If you intentionally crafted a logical series of commits, merge them as a series (ideally you've tested each commit independently), otherwise squash.
If you want more detail about the development of the PR than the merge commit, aka the 'real history', then open up the PR and browse through Updates, which include commits that were force-pushed to the branch and also fast-forward commits that were appended to the branch. You also get discussion context and intermediate build statuses etc. To represent this convention within native git, maybe tag each Update with pr/123/update-N.
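In plain git that could look something like this (the SHAs are placeholders for each Update's head):

    git tag pr/123/update-1 <sha-of-first-update>
    git tag pr/123/update-2 <sha-of-second-update>
    git push origin --tags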
The funny thing about this design is that it's actually more similar to the kernel development workflow (emailing crafted patches around until they are accepted) than BOTH of the typical hard-line stances taken by most people with a strong opinion about how to maintain git history (only merge/only rebase).
What's weird about most of these discussions is how they're always seen as technical considerations distinct from the individuals who actually use the system.
The kernel needs a highly-distributed workflow because it's a huge organization of loosely-coupled sub-organizations. Most commercial software is developed by a relatively small group of highly-cohesive individuals. The forces that make a solution work well in one environment don't necessarily apply elsewhere.
With this, you can also push people towards smaller PRs which are easier to review and integrate.
The downside is that if you want to work on feature 2 based on feature 1, either you wait for the PR to be merged into main (easiest approach) or you fork from your feature branch directly and will need to rebase later (this can get messier, especially if you need to fix errors in feature 1).
If you let GitHub do the rebase, yes, you do. But you can do it manually yourself, taking the branch down to a single squashed commit that you then sign.
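For example, roughly (remote and branch names assumed):

    git rebase -i origin/main         # mark everything after the first commit as "squash"
    git commit --amend -S --no-edit   # (re-)sign the resulting single commit
    git push --force-with-lease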
This is a tooling issue that needs to be solved client-side (i.e. where the signing key lives). It's an important one but actually really simple.
I wonder why GitHub doesn’t apply their own signature when they rebase a commit with a valid signature from one of their users. They do that when you edit a file through their Web UI.
> Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that".
Ever seen a PR that implements something in a GitHub Actions workflow? The history usually looks like: clear cache, fix path, fix variable expansion, fix command, fix command again, fix syntax, […].
The best way IMO is to interactive-rebase the branch locally (or force-push a rebased version later), but sometimes 50 commits merge into a 30-line single-file change and nothing beats squash.
I want the 'merge' function completely deprecated. I simply don't trust it anymore.
If there are no conflicts, you might as well rebase or cherry-pick. If there is any kind of conflict, you are making code changes in the merge commit itself to resolve it. Developers end up fixing additional issues in the merge commit instead of in actual commits.
If you use merge to sync two branches continuously, you completely lose track of what changes were done on the branch and which were done on the mainline.
> I want the 'merge' function completely deprecated. I simply don't trust it anymore.
Merge is perfectly fine and it is the only way to synchronize repositories without changing the history, which is very important for a decentralized system. It certainly has the potential to make a mess if used improperly, but so do rebase, cherry-pick, and basically every other command.
> If you use merge to sync two branches continuously, you completely lose track of what changes were done on the branch and which were done on the mainline.
If you do things correctly, that is by making sure that when you merge changes from a feature branch into the mainline, the mainline is always the first parent, you shouldn't have any problem. Git is designed this way, so normally, you have to go out of your way to mess things up. If you did it like that and you don't want to see the other branch commits, git-log has the --first-parent option.
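For example, assuming merges into main were always made with main as the first parent:

    # show only the mainline's own history, hiding the feature-branch commits
    # pulled in by merges
    git log --first-parent --oneline main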
Proper use of merge is table stakes. You get warned in your PR if your non-main branch is out of date with your main branch, and after you rebase and force push your non-main branch, you review the diff in the PR.
Rather than switching to "main" and pulling it, you can just stay in "feature" and do a fetch followed by "rebase origin/main". Then pull "main" before you merge the feature.
I'd also use "merge --no-ff" to force an empty commit that visualizes where a feature begins and ends.
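Concretely, something like this (branch and remote names assumed):

    # while still on "feature"
    git fetch origin
    git rebase origin/main
    # when the feature is ready to go in
    git checkout main
    git pull
    git merge --no-ff feature   # forces a merge commit marking where the feature begins and ends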
I'm not sure if you misread my comment, but my point was that it's far too easy to accidentally introduce bugs in merge commits that go unnoticed for a long time.
I've never seen a rebase gone awry introduce production bugs, but I've known multiple gnarly bugs caused by errant merges. YMMV.
as always, different circumstances can generate different results.
In a merge, you solve conflicts once. Whereas in a rebase, those conflicts will turn into incremental conflicts.
If the branch history is "tidy", with discrete, purposeful commits, this can be easier. Especially if incrementally rebasing.
The main difference is one rewrites history and the other does not. A rebase is by nature destructive and as such can introduce subtle changes, especially if commits are reordered or modified in the process.
It's not really destructive, though! That's not really the main difference.
The main difference is that a merge sticks around in your repo forever, a commit that people assume has no real code changes in it but actually sometimes it does. A rebase is done once, and then your git history doesn't have to deal with it ever again.
Yes, you raise a fair point that if you've dug yourself into a deep pit already with long-lived branches and overlapping work, it might be slightly easier to extract yourself from the pit with a merge. But then you're leaving that fetid pit in your repository forever.
When I said destructive, I meant in the literal sense, in that it rewrites history.
Don't get me wrong, I _often_ rebase, about a dozen times a day and it's been a core part of my workflow for 2 years. In that time I have learnt a lot, silently lost changes and ended up in a few mishaps.
I am in no way against the idea of rebasing, I frequently do. And personally, I often rebase && merge --no-ff. But IMHO it's far too easy to mess up for me to adopt it as dogma.
I also question the notion that VC history is best thought of in linear terms. I'd argue it's fundamentally flawed to force a DAG into a more linear structure.
In my experience, the desire to do this is to construct a DAG that's pretty in log viewer XYZ, rather than anything else. I consider this highly overrated. Just look at the DAG of the git project. Yes, it's intense, but the primary purpose of the history DAG isn't to immediately present a simple linear history.
Rather, it's to preserve a common, shared, decentralized history where it's easy to go back to a precise moment and see what was done to what and why. A dogmatic always-rebase history is in my experience often a relatively pointless pursuit of constructing a git log --graph that's "simple" by default. I.e., rather than solving for the problem of "How to overview a DAG", the solution is to reformulate the DAG in a linear way which often conforms better to how we humans like to overview information.
Again, this is a personal viewpoint and I don't mean to pass judgement, but I often find that such approaches, which one might liken to treating symptoms instead of curing the cause, are better solved the other way around. It's a real joy to delve into the git project's history, despite the fact that it's _littered_ with merge commits.
Basically, I think the quest for a "simpler" looking history DAG is somewhat overrated and not something I'd personally recommend pursuing.
First time I've seen aliases using other chars other than a-z; care to share your dotfiles?
It's a neat trick to explode your alias namespace, since you'll never see a tool published named `ls-`. So you have reserved a huge "address block" for your personal aliases :)
Thank-you. I collected some from kristopolous recently [1] too.
I see d- there, so the one use of dash is used sparingly. There's a lot of git functionality I'm leaving on the table, looks like.
Currently, I'm using a shell script to help with a git conflict resolution flow that does something like
read -p 'Conflict. Resolve and press [Enter]'
And fix in a separate tmux window (git add, git cherry-pick --continue).
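A sketch of that kind of loop, assuming the commits to pick are passed as arguments (the prompt text and script shape are made up):

    #!/usr/bin/env bash
    # cherry-pick each given commit; on conflict, pause so the fix can be done
    # in another tmux window (edit, git add, git cherry-pick --continue there)
    for sha in "$@"; do
      if ! git cherry-pick "$sha"; then
        read -p "Conflict on $sha. Resolve it elsewhere, then press [Enter] "
      fi
    done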
TUIs and autocomplete popups are nice, but there's opportunity for a deeper understanding writing one's own tools. So I'm hoping to combine ZZ and `read -p` (or similar) to coax nvi (Keith Bostic) to something for Java stuff. Or at least build some primitives around that.
Encountering the equivalent of "flash of unstyled content" when switching from text editor to a--for example--Java method chooser feels like the philosophical difference between "Let's SPA" versus "Click flashes between pages is fine."
The flow would be something like
1. In nvi, keystroke equivalent of Ctrl-Space brings up an Intellisense tool.
2. The tool loads up a list of autocomplete methods as well as its own hotkeys.
3. Pressing up and down manipulates a temporary text file that just prepends ">" next to the line, for example.
4. And Enter somehow brings back nvi with the method added, right after the period (with our partial typing replaced).
All this to say, dotfiles and git config are no big deal, but in CLI it's an escape hatch to molding a custom environment.
Unfortunately, git rebase has a very very annoying limitation that git merge doesn't. If you have a branch with, say, masterX + 10 commits, and commit 1 from your branch is in conflict with masterX+1, then when you rebase your branch onto masterX+1, you will have to resolve the conflict 10 times (assuming all 10 commits happen in the same area that had the original conflict). If instead you merge masterX+1 onto your branch, you will only have to resolve the conflict once.
Even though I much prefer a linear history, losing 1h or more to the tedious work of re-resolving the same conflict over and over is not worth it, in my opinion.
In your example, you pretty much have to change the same line, or neighbouring line, those 10 times to end in that scenario. If it's just somewhere else in the file, git auto-merging will handle it just fine.
It seems like a very contrived example to me. We have been running rebase/fast-forward only for close to 10 years now, and I have never experienced anything that unfortunate.
I run in to this quite frequently, even on projects where I'm the only one working on it (I tend to have a lot of things going on in parallel). Once branches diverge and commits accumulate it can become a right pain. Usually my solution is to merge master into the branch just to keep up to date and then just undo everything, make one new commit in the master, and rebase that. But in some more difficult cases it was "just merge and fuck it because life's too short". I've also just manually "copy/paste merged" things to a new branch, because that seemed quicker than dealing with all the merges/conflicts.
Maybe there are better ways of doing this, and arguably I shouldn't have all these long-lived branches in the first place (but it works well for me, so...), but it's not that much of a contrived edge case.
> arguably I shouldn't have all these long-lived branches in the first place
This is the problem here. If you have multiple long-lived branches, there's no technical solution to preventing rot -- you must actively keep them in sync.
Regularly merging in main is the opposite of the proper solution. Constantly rebasing on top of main is the proper solution.
A rebase and a merge result in the same code. A rebase is more error prone though. Just because someone "feels" a merge isn't as safe doesn't make it so.
Doesn't rebase use the exact same automatic merge algorithm as a merge? They are equally likely to introduce a production bug. Especially if adding a tool like rerere into the mix to do even more auto-magic merging when you hit the differences between rebase and merge.
1) The merge auto-applies cleanly, but the merged code is wrong. This is pretty niche, usually, but happens in certain edit patterns. I've never seen this produce a syntactically-valid, semantically-invalid construct (but I suppose it's possible) so generally these are caught by the compiler.
2) The merge does not auto-apply, so you get into manual resolution. This is where things get hairy.
The merge commit really ought not have any changes of its own, but lots of people consider minor conflict resolution legal. So you end up with a bunch of code changes that logically belong to another commit, and are grouped together for purposes of expediency.
Rebase applies your changes to another branch as though they had been made there originally. If a conflict comes up, you already have all the context needed for how to resolve it, because you just wrote that code. The fix goes where it belongs.
All I can tell you is that I've been bit by merge-induced production bugs enough times that I now work to avoid that particular failure mode.
> The merge commit really ought not have any changes of its own, but lots of people consider minor conflict resolution legal.
I'm not sure where this rule comes from. For code review, I for one normally review all of the changes that are going into master, and only look commit-by-commit if it becomes overwhelming - so, unless this is a huge merge (which should generally be avoided anyway), I wouldn't really see how this is a problem.
The only real problem I have with merging into your local branch to keep it in sync with master is the way it pollutes history when it is finally merged back into master. This is enough of a problem that I and my team always rebase unless we end up in one of these rare cases that I was highlighting.
> This is the problem here. If you have multiple long-lived branches, there's no technical solution to preventing rot -- you must actively keep them in sync.
Well, merge actually works much smoother and rebase gives a lot more grief, so the problem is with rebase.
> Regularly merging in main is the opposite of the proper solution. Constantly rebasing on top of main is the proper solution.
The "proper" solution is the one that allows me to get stuff done. The only thing that matters is how the main branch ends up looking in the end, and what I do before that isn't really all that important.
Another problem with rebase is when multiple people are working on the branch; it requires careful coordination if you don't want to lose work. Overall, just merge in main is usually the best strategy here.
Always surprising when folks are confused about how to collaborate on git branches... I'd expect the recursive solution to be more obvious!
> The "proper" solution is the one that allows me to get stuff done.
Yeah, but the stuff that needs to get done doesn't end with your commit, it starts there. Merge commits are prone to introduce unexpected and uncaught bugs: rebases just don't.
> Merge commits are prone to introduce unexpected and uncaught bugs: rebases just don't.
How so? If I make an error with a rebase then I risk losing my changes. You can fetch it from the local reflog, but that's not so easy. With a merge I have a merge commit which records what was merged.
That doesn't really solve much: if both you and I rebase our personal feature branches onto master at different places, when we both try to push to the shared feature branch, we'll have a REALLY bad time - especially if we actually had to do conflict resolution.
> arguably I shouldn't have all these long-lived branches in the first place (but it works well for me, so...)
Given that this scenario is common for you but sounds contrived to others, I would argue that this doesn't work well for you. It's just familiar enough that you're willing to deal with some pain.
Short-lived feature branches sidestep this hell. Longer-lived projects can almost always be partitioned into a series of shorter mergeable steps. You may need support/buy-in from your manager, I hope you get it.
It's not an organisational/manager problem; it's just how I like to work. I often work on something and then I either get bored with it or am not quite sure what the best way is to proceed, so I work on something else and come back to it later (sometimes hours later, sometimes days, weeks, sometimes I keep working on it until I get it right). I do this with my personal projects as well where I can do whatever I want.
I know some people think this is crazy, but it works well for me and I'm fairly productive like this, usually producing fairly good code (although I'm not an unbiased source for that claim).
In the end I don't want to radically change my workflow to git or other tooling; I want the tooling to adjust to the workflow that works well for me.
Sounds like you've never worked on a project with a file everyone wants to append to :)
If every error in your system needs a separate entry in the error enum, or every change needs an entry in the changelog - loads of changes will try to modify the last line of the file.
Even multiple appends are not that bad for rebasing - if you put the remote changes before your own then after the first commit the context for your remaining commits will be the same.
If order actually matters then yeah, git can't magically know where each new line should go.
I'm not saying these situations are impossible. But you can work towards reducing when they arise. If everyone needs to change the same file, then it sounds like something should be refactored (it's probably a quite big file as well?).
If every error needs to go to the same error enum, that sounds like an error enum that might benefit from being split up.
And if every change needs to write to a common changelog file, I would personally find a new way to produce that changelog.
If it's that big a painpoint, then I would look into different ways to get around it.
It happens pretty often when two different people are adding a new function in the same area of a file. It's likely that as you're working on that function, you'll be modifying the surrounding lines a few times (say, you have a first pass for the happy path, then start adding error handling in various passes; or, handling one case of an algorithm in each commit).
Rebase is still by far the most common case in our repo, as yes, these cases appear very rarely. But when they do happen, it's often worth it to do a merge and mess up a history a little bit (or squash, which messes with history in another way) rather than resolving conflicts over and over.
Someone else was also suggesting rerere for this use case, but I've never used it myself and I don't know how well it actually handles these types of use cases.
That alone helps a lot. Other formatting rules that avoid dense lines, and instead split things over multiple lines, also have a huge impact on merge conflicts.
It's not as contrived as you may think. I, along with what I imagine are many others, do a lot of frequent micro-commits as time goes on and the feature becomes more complete, with a lot of commits in the same area of any given file. Rebasing a development branch in this state is pretty gnarly when a conflict arises.
Sadly, my current approach is to just reset my development branch to the merge base and make one huge commit, and then rebase.
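In git terms that's roughly (assuming main is the upstream branch; the commit message is a placeholder):

    git reset --soft $(git merge-base main HEAD)   # fold the branch into one staged change
    git commit -m "feature X as a single commit"
    git rebase main                                # now there's only one commit to replay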
I do a lot of micro-commits as well, though I rarely find that other members of my team are doing the same, to the same files, at the same time.
When that happens, we look into if it's possible to do more frequent merges (fast-forward rebases through Gerrit, to be specific) of our smaller commits to master, so we don't accumulate too much in isolation.
I find it helps reducing bugs as well, if two or more members are doing active work in the same area in that way, it's not good to be working in complete isolation as it just opens up for bugs because of incompatibility with the work going on in parallel.
Yeah that scenario only ever happens if you have an extremely large branch that hasn't been merged into the target branch for a long time (like a feature branch that takes months to develop), which btw isn't really something that should be done anyway (always try for more frequent merge with small side branches).
How would Sapling avoid this? As I understand it it uses the same data model as Mercurial which is really the same as Git's. I think you would need something like Pijul to solve it nicely. At least as far as I can tell.
I might actually try this in Pijul because I too encounter this semi-regularly (it's not a freak occurrence at all) and my solution is basically to give up and squash my branch before rebasing.
That has its own problems. Separating whitespace-only reformatting commits from substantive commits makes it much easier to inspect the real changes, for instance.
Also, more fine-grain commits can help you trace down a bug, perhaps with the help of git bisect. Once you've tracked down the commit that introduced the bug, things will be easier if that commit is small.
Fortunately you can just merge from master, bringing your code back in sync with master without touching master itself. I see Beltalowda has mentioned this.
Reviewing a squashed branch is much harder than reviewing one set of closely related deltas, and then reviewing a different set of closely related deltas that happen to overlap.
To be fair, if you have 10 commits that all change the same file: squash with respect to your first commit, _then_ rebase. If you have lots of commits, always first squash-rebase to your own first commit, and only rebase to current main once that's done.
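For instance (a sketch, assuming main is the target branch):

    # squash within the branch first (no conflicts possible here, since you're
    # only rewriting your own commits), then rebase the single result onto main
    git rebase -i $(git merge-base main HEAD)   # mark later commits as "squash"/"fixup"
    git rebase main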
Rebase is being annoying here mostly because it's doing exactly what you want it to do: warn you about merge conflicts for every commit in the chain that might have any.
If you have ten different commits all touching the same part(s) of the same file(s), dial down your granularity a little: you've over-committed.
Either that, or you lobbed 10 different issues into the same branch, which is a whole different barrel of "no one benefits from this, you're just making it harder to generate a changelog, can you please not" fish.
It often amuses me that some people will say "git is actually easy, you just need to know git commit, git pull, git push, and git branch", but when you go into the details, you find out you have to learn a hundred other rarer tools to actually fix the 5% or 1% use cases that everyone eventually hits.
For what it's worth, I had heard of git rerere before, and have looked at the man page, but haven't understood how it's supposed to work, and haven't had time to play with it to see how well it actually works in practice. `git merge` or `git squash` and accepting a little bit of a mess in history seems much easier than spending time to learn another git tool for some use case, but I fully admit I may be missing out.
When you hit a merge conflict, rerere (re)members how you (re)solved it and (re)applies the same fix when the same conflict happens again. But using it can create a new problem/annoyance: If you make a mistake with the initial resolution, and revert the merge/rebase to try again, it'll remember the wrong one next time. So you have to find that resolution and tell it to forget it.
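The relevant commands, roughly:

    git config rerere.enabled true   # record each manual resolution and replay it on repeats
    # if a recorded resolution turns out to be wrong, make git forget it:
    git rerere forget path/to/conflicted-file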
Yes. Usually I just squash merge to main and then `git checkout my-branch; git reset --hard main`. Sure it squashes all the commits, but keeping them all is nearly never needed.
Can I ask how they converted you (or do you mean by dictate, as opposed to becoming convinced it was better)? I find myself loving merges and never using rebases. It's not that I cannot describe technically what's happening, but I just don't understand the love.
(Not the person you replied to, but a passionate rebase-preferred) For me there are two reasons - one aesthetic, one practical.
The aesthetic reason is that it tells a more coherent story. The codebase is a single entity, with a linear history. If I asked you "how old were you last year", and you asked "which me are you asking about?", I'd be confused. Similarly, if I want the answer to the question "what was the codebase like at this point in time // immediately prior to some point?", you shouldn't need to ask clarifying questions. `HEAD^` should only ever point to a single commit.
The practical reason is that it discourages a bad-practice - long-lived branches. The only vaguely compelling reason I have heard for merge commits is that they preserve the history of the change, so that when you look at a change you can see how it was developed. But that's only the case if you're developing it (in isolation) for a long-enough time that `main` will get ahead of you. You should be pushing every time you have a not-incorrect change that moves you closer towards the goal, not waiting until you have a complete feature! If you make it difficult to do the wrong thing _while also_ making it easy to do the right thing (too many zealots forget the second part!), you will incentivize better behaviour.
(Disclaimer - I've been lucky enough to work in environments where feature flagging, CI/CD, etc. were robust enough that this was a practical approach. I recognize this might not be the case in other situations)
And yeah, I'm kinda intentionally invoking Cunningham's Law here, hoping that Merge-aficionados can tell me what I'm missing!
> what was the codebase like at this point in time // immediately prior to some point?", you shouldn't need to ask clarifying questions
I would assume that such a question would talk only about the main branch. However, I will point out that "what was the state of feature X" is only answerable with a non-linear story.
> The practical reason is that it discourages a bad-practice - long-lived branches.
Wait, long-lived branches are bad? Merging in partially done features is good? That seems insane.
First, if the feature is small enough to knock out in an hour, that's great. But sometimes it can take a couple of days. I should hope you have enough activity that the main branch will move in that time.
But committing partial features is crazy. Sometimes you realize the way you are implementing it (or the whole feature) is a bad idea and all the work should be orphaned. Other times, a feature requires changing something (e.g. an API) where a partial change cannot really work - and sometimes where you need to have a meeting before you do it. Consider the feature to be "update dependency X", which means you now have some number of bugs to track down due to the new version.
Heck, sometimes a feature might need to be mothballed. Sometimes you have to wait for an external dependency to be fixed. So you can either chuck your work, commit something broken, mothball it and come back when the external dependency is fixed, or switch your dependency.
> long-lived branches are _bad_? Merging in partially-done features is _good_?
...uhhh, yes? I've never heard anything to the contrary. Can you explain why you think the opposite?
For long-lived branches: The longer a branch exists separately and diverges from main, the more pain you'll create when you try to merge it back in - both because of changes that someone else has made in the meantime (and so, conflicts you'll (possibly) have to resolve), and because you are introducing changes that someone else will have to resolve. The pain of resolving conflicts scales super-linearly - it's much better to resolve lots of small conflicts (ideally, so small that they can be algorithmically resolved) than to resolve one large one. Plus all the arguments from the point below...
For checking-in early and often: flip it around - what is _better_ about having the change only on your local repo, as opposed to pushed into the main codebase? If the code's checked in (but not operational - hidden behind a feature flag), then:
* your coworkers can _see_ that it exists and will not accidentally introduce incompatible changes, and will not start working on conflicting or overlapping work (yes, your work-planning system should also account for that - but extra safety never hurts!)
* if you have introduced a broken dependency, or a performance black-hole (which might only be possible if you're running your code in "shadow mode", executing but not affecting the output until it's fully ready - which, again, is only possible if you check in early-and-often!), you can discover that breakage _early_ and start work on finding an alternative (or, if necessary, abandon the whole project if it's intractable) earlier than otherwise
In fact, to take your example - "sometimes you realize the way you are implementing it (or the whole feature) is a bad idea and all the work should be orphaned" - yep! This happens! This is not a counter-example to my claim! Orphaning an inactive "feature" that has been pushed to (but not fully activated in) production has no more impact than abandoning a local branch. Even orphaning a feature that has been partially activated is still fine, so long as it didn't result in irreversible long-term state-updates to application entities (e.g. if it added a "fooFeatureStatus" to all the users in your database, rolling it back will be tricky. But not impossible!). So there are very few (or no) downsides, and all the advantages I described above.
I do agree that API changes are the one exception to this rule - you should have those reasonably nailed down and certain before you make changes, since those affect your clients. But any purely-internal change which can be put behind a feature flag, on an inactive code path, in shadow mode, in canary/onebox testing, or any other kind of "safe to deploy in prod, but not _really_ affecting all of prod" - do it!
I'm not advocating branches should be made longer for no reason, but I see no reason to avoid them. I do think they should be made long if they need to be to encapsulate a feature. I don't think that the pain of resolving conflicts scales super-linearly and that idea doesn't make sense to me. In fact, I think the opposite is true. I admit, that could be a taste issue.
I mistyped at one point by saying to avoid a partial-feature commit when I meant partial-feature merge onto the main branch. Yes, commit to the feature branch often. Hopefully clarifying that resolves most of the issues that you raised as advantages.
Meanwhile, managing partially built features by feature flags seems worse. It lets orphaned code migrate into the main codebase and stay there. You brought up a broken dependency. What happens if a dependency is broken and not likely to get fixed for a month? Just leave that code in the main codebase orphaned for a month? Further, having multiple partial feature commits complicates bisecting or simply reading a feature's history.
I concede feature flags for deployment has some advantages, especially for feature specific elevation through testing.
> I don't think that the pain of resolving conflicts scales super-linearly and that idea doesn't make sense to me. In fact, I think the opposite is true. I admit, that could be a taste issue.
Then we'll have to agree to disagree, as this is pretty fundamental to my argument - everything else ("Your coworkers get to see what you're working on and will notice clashes of intention earlier", "You can run incomplete features in shadow-mode to ensure they don't affect performance in production", etc.) is just sugar.
I really appreciate your well-reasoned and civil discussion!
Maintaining orphaned code has a cost. Keeping a change you've made to a function (and its callers) that's no longer needed obscures both the history and probably what it does.
Not saying trunk-based is wrong, but to say abandoning a feature is as cheap as in branch-based development fails to account for everything.
In my case, I switched rapidly to git-rebase because it produces history that is much cleaner and easier to understand. I only do merge if there is a good reason to preserve history (e.g. some other branches depend on it, or some test reports refer to a given commit).
You should do a reverse rebase (if that makes sense, lol) for this. Instead of rebasing the branch onto master, rebase master onto the branch. The only downside is that it requires many force pushes on the branch.
Yeah, force push on master is a huge no-no - I can't even remember the number of times I've force pushed wrong shit in a hurry - I can't imagine an entire team dealing with this.
I would first say that I would sooner re-code the whole feature by hand from memory than ever rebase master onto anything for any serious project.
Even if we were to do that, rebasing master is likely to lead to the same issue.
My preferred solution is rebase featureB onto master for the 99% or 99.9% of use cases where this is smooth, and in the rare case that you have too many conflicts to resolve, merge master into featureB (and/or squash featureB then rebase onto master, depending on use case).
Both tools are pure vandalism compared to merge. Among the two, cherry-picking is preferable in this case because you're "only" destroying your own history, so in the end, it's your funeral.
> Developers end up fixing additional issues in the merge commit instead of in actual commits.
A merge commit IS an actual commit, in every sense of the word. The notion it somehow isn't, is what you need to get rid of.
I think merge is great, having a “unit” for a feature branch being integrated is nice and not all things can be done in commits which are individually justifiable. The ability to bisect cleanly through the first ancestor is precious.
I do agree that resolving conflicts in merges is risky though. It can make sense when merging just one way between permanent branch (e.g. a 1.x branch into a 2.x), but as soon as cross merges become a possibility it’s probably a mistake.
> I do agree that resolving conflicts in merges is risky though.
How do you do otherwise, though? Or is your workflow a combination of rebases and merges? Continual rebasing of the feature branch onto `main` and then a final merge commit when it's ready to go?
> Or is your workflow a combination of rebases and merges? Continual rebasing of the feature branch onto `main` and then a final merge commit when it's ready to go?
Yes. You don't usually need "continual" rebasing, most commonly just once just before merging.
In fact a good merge tool can do it for you (and refuse to merge if there are conflicts when rebasing).
This works if you only have experienced professional developers on the team. If you have juniors or non-devs (mathematicians, geographers, quants...) who just happen to also code, rebase is a minefield. This is especially true for open source contributions.
Merge and conflict resolution is a minefield if inexperienced developers do it too. Fortunately it can (often) be arranged that those with some understanding of the issues involved can do the resolution.
I usually rebase the branch onto the upstream branch (master or main or whatever) if there are merge conflicts. You can then resolve the conflicts commit by commit. This requires force pushes, but they are not normally a problem because only one dev tends to work on a particular branch before it's merged.
If you do have multiple devs working on the same branch, use `git pull --rebase` to stay in sync with each other, don't use merges and leave lots of merge commits. If you need to resolve conflicts with upstream, make sure other people have stopped working on the branch, rebase it, then merge.
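For example (branch and remote names assumed):

    # on a shared feature branch: pick up teammates' commits without creating merge commits
    git pull --rebase origin feature
    # or make rebase-pulls the default for this branch
    git config branch.feature.rebase true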
Rebasing is a tool of last resort, when something has so fouled up the code that merging a large-scale refactor is even more time consuming.
Rebasing takes longer and is actually more prone to error because of the clunky interface. There is absolutely nothing wrong with squashing commits in a feature branch and merging that into master/main. In fact, it's generally better for the health of the repo and the mental health of developers.
In my experience, rebase works great if the commits are structured, and is much more painful with lots of overlapping changes, say from continuously doing _wip_ commits every hour.
I certainly am not perfect to the degree that I make a single commit or a relatively small number of “structured commits” to any branch I’m working on. Neither is anybody else (regardless of whether they think they are). Anyone who tailors their commit structure around a poorly designed tool interface is just wasting their own time, and therefore the company’s, in my opinion.
Well, I do, more or less. Whether it's a waste of time or not is dependent on how efficient you can do it, in my workflow it's efficient, and to which degree you can avoid waste as a consequence of a "messier" approach. For me, it evens out.
I don't understand the hate for merges, or the love for rebases. Let's consider what may happen using a github flow strategy (main branch, feature branches based solely on main):
* If you screw up a merge, you undo the merge commit. Now your branch is exactly as it were. May not happen with a rebase.
* If you push some code to the remote, and later find out it was outdated, you can merge it with main and push again: no need to force, github can distinguish what's already been reviewed and what hasn't. With rebase, you may need to push --force, and if someone already reviewed the code they're going to be shit out of luck, as github will lose the capability to review "changes since last review", as the reference it has may have been lost.
I also merge these features using squash commits, which provides a very linear history. This also saves some effort (you don't need to rebase the commits in the feature branch, which can be a pain in the ass for unorganized people and git newbies, and you are pushed towards making smaller, granular PRs that make sense for the repo history).
I usually do `git merge --no-ff development` when working on my feature branch. We do not leave feature branches "open/live" for too much time, so merge conflicts are not usually a problem, but sometimes they do happen.
I like cherry-pick, but I barely use it (e.g., I need to cherry-pick one commit from branch X into my branch). I don't like rebase much because it requires force-push.
What I would like to see is a way to enforce fast-forward only merges along with the forced creation of a merge commit that references the same git tree as the HEAD commit of the branch that was just merged.
This way, you know which set of commits was in the branch by looking at the parent commits of the merge commit, but the merge commit itself did not involve any automated conflict resolution.
I've wanted this for a while as well. Squash only merges, which are enforceable in github, get you close but leave you without any automated way to determine if a given branch was ever merged to main or not ...
(a) Because rebase is run on the branch to be rebased, but merge is run on the branch being merged to, that reverses the parents of the merge commit and puts it on the rebased branch rather than the parent.
(b) Even if that were fixed, it alters the rebased branch, rather than stopping and warning in an unexpected case.
I really do want the natural semantics of merge --ff-only --no-ff.
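A rough sketch of those semantics with plain git (branch names assumed):

    # "ff-only, but still record a merge commit": refuse unless the branch is
    # already based on main, then merge with --no-ff; the merge commit's tree is
    # identical to the branch tip's, so it contains no conflict resolution
    git checkout main
    git merge-base --is-ancestor main feature || { echo "rebase feature first"; exit 1; }
    git merge --no-ff feature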
It is also a security risk. Someone could add whatever unreviewed code and it would get glanced over as a merge commit. Put your payload in an innocuous file not likely to be touched and call a boilerplate-looking function as a side effect from somewhere.
Reviewing merge commits is harder because they will sometime have huge diffs to both branches.
Rebasing is the process of redeveloping your feature based on the current master. This gives smaller, easier steps to review later.
It is a pity that we can't have tooling to create "hidden" merge commits to connect rebased branches; this would retain the history better and allow pulling more easily.
If you have a bunch of related commits in a feature, it is easier to revert merges (even if you do a pre-merge rebase from master and then merge with --no-ff).
I'd like a "quick ff" that will ff if there are no conflicts, or ff as far as it can with no conflicts - and an easy way to apply to many branches.
Also, a way to "rebase" that works the same as cherry-picking commits on top of the target. As far as I can see, the regular rebase works its way up the target branch, so that I end up resolving conflicts in code that eventually changed in the target.
> Developers end up fixing additional issues in the merge commit instead of in actual commits.
As long as the merge commit is being reviewed with the rest of the PR, that's fine, right? (We use rebase while working on feature branches, and then squash & merge for completed PRs, which seems to be the best of both worlds)
Personally, I believe merging 'master' to your feature branch is the wrong model... what one should do is create a new branch from master and merge the old branch into it.
Why? Merging master into the feature branch is done so that you can test the conflict resolution in the branch before inflicting it on everyone. It’s also done on a regular basis in longer running feature branches to prevent large conflicts from accumulating: you can merge master into your branch multiple times to stay current with master before ever merging back into master.

I’m not sure why parent says this causes them to lose track of which changes happened in which branch. The history does get a bit more complex at a glance, but for any given commit, it’s easy to pinpoint its origin if using only merge commits. It only gets harder if you accidentally rebase someone else’s commits along the way.

For smaller feature branches and smaller projects, it’s okay to merge branches into master, but for large branches, large projects, large teams, and teams that care about testing, merging master into feature branches is a best practice. What makes you consider it ‘wrong’?
A merge commit is just a commit with two parents. You're not affecting the master branch at all when you "merge in master", you're just creating a new commit where the first parent is your branch, and the second parent is the master branch.
If you do things the way you're suggesting, you'll make it really hard to tell what commits were made on your branch. Git clients tend to assume the first parent is the branch you care about.
I have never had issues with merge, unless rerere was enabled. I've had some extremely surprising results recently with it enabled and I finally disabled it for good.
I don't know how stupid this is on a scale from 1 to 10. I've created a wrapper [1] for git (called "shit", for "short git") that converts non-padded revisions to their padded counterpart.
Examples:
"shit show 14" gets converted to "git show 00000140"
"shit log 10..14" translates to "git log 00000100..00000140"
Mercurial always has had sequential revision numbers in addition to hashes for every commit.
They aren't perfect, of course. All they indicate is in which order the current clone of the repo saw the commits. So two clones could pull the commits in different order and each clone could have different revision numbers for the same commits.
But they're still so fantastically useful. Even with their imperfections, you know that commit 500 cannot be a parent of commit 499. When looking at blame logs (annotate logs), you can be pretty sure that commit 200 happened some years before commit 40520. Plus, if your repo isn't big (and most repos on GitHub are not that big by number of commits), your revision numbers are smaller than even short git hashes, so they're easier to type in the CLI.
Seems like a design fault in git that commits only have a single id (sha1 hash) and that hashes are written without any prefix indicating which type of id it is.
If all hashes were prefixed with "h", it would have been so simple to add another (secure) hash and a serial number.
E.g. h123456 for the sha1, k6543 for sha256 and n100 for the commit number.
I was going to solve some business problems today but instead there became an urgent need to GPU accelerate the task of making my commit hash appear to have the rich semantics of "a number that goes up". Hm, I bet this old FPGA could be repurposed to add a 2x speedup...
I don't understand — the example in the article adds the string "magic: MTQIpN2AmwQA" to the commit message. The final hash is hexadecimal, but what you feed into it isn't.
I haven't checked your codebase so I don't know how easy it was but damn, you replaced your backend within 16 minutes according to your comment timings.
A git commit is composed of a number of headers (key: value fields) and a commit message.
There is a set of "standard headers" (tree, parent*, author, committer, encoding?), but then you can add more. In fact there's a set of semi-standard headers, as in headers git itself will add under some conditions: `gpgsig` and `gpgsig-sha256` if the commit is signed, and I think `mergetag` for signed tags. They are documented as part of the signature format but they're not "baseline" features: https://git-scm.com/docs/signature-format#_commit_signatures
But because of this, a git client should support arbitrary commit headers, and round-trip them even if it does not expose them.
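A rough sketch of doing that by hand (the `magic` header is just an illustrative name, echoing the example above):

    git cat-file commit HEAD > commit.txt
    # edit commit.txt: add a line like "magic MTQIpN2AmwQA" among the headers,
    # i.e. before the blank line that starts the message (raw headers have no colon)
    git hash-object -t commit -w --literally commit.txt   # prints the new commit's sha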
I was trying to make something like this post a couple of years back and used custom headers, even made this repo with a few zeroes with no salt on the commit message and no shenanigans in the files: https://gitlab.com/pedroteosousa/teste/-/commit/000000005093...
I have this ugly code that finds the salt given the target hash, and another that actually creates the commit given the salt. It's not very usable, but I'll leave it here for anyone that finds it interesting: https://gitlab.com/pedroteosousa/useless-git-tools/-/tree/ma...
I figured that a good option would be to slightly change the date. I don't know what the date resolution is, but shuffling it around a bit shouldn't be an issue.
Of course, if the date only has seconds resolution it may be too big a shift to be reasonable.
I fail to see the point of this, in fact, I think this is a fundamentally flawed approach to dealing with your revision history. The problem is that rebasing commits has the potential of screwing up the integrity of your commit history.
How are you going to deal with non-trivial feature branches that need to be integrated into master? Squash them and commit? Good luck when you need to git bisect an issue. Or rebase and potentially screw up the integrity of the unit test results in the rebased branch? Both sound unappealing to me.
The problem is not a history with a lot of branches in it; it is not knowing how to use your tools to present a view of that history that you are interested in and that is easy for you to understand.
> The problem is not a history with a lot of branches in it; it is not knowing how to use your tools to present a view of that history that you are interested in and that is easy for you to understand.
To me this is like saying to a construction worker: “The problem is not that your hammer has sharp spikes coming out of the handle at every angle. The problem is that you don’t put on a chain mail glove when using it.” That’s certainly one way to look at it.
Pretty analogy, but I don't see how a specific functionality of git (commit history) that has no use case other than looking tidy compares to the handle of a hammer.
This somewhat depends on how big your features are. Arguably, large long-lived feature branches are the problem themselves. If larger features are broken down and developed/merged piecemeal, then you still have smaller commits you can fall back on.
IIRC, GitHub uses a development model where partially implemented features are actually deployed to production, but hidden behind feature flags.
I'm pretty sure the point is that this is a one-person project and the author can play around. He's not suggesting your team of 100 people to adopt this for the development of your commercial product.
I think the fundamental misunderstanding people with your point of view have regarding linear commit histories is that it's not just about different VCS usage, the entire development process is changed.
When you are using linear histories and rebasing you don't do monolithic feature branches. You land smaller chunks and gate their functionality via some configuration variable. `if (useNewPath) { newPath(); } else { oldPath(); }` and all your new incremental features land in `newPath`. All tests pass on both code paths and nothing breaks. When the feature is fully done then you change the default configuration to move to the `newPath`.
> How are you going to deal with non-trivial feature branches that need to be integrated into master?
That's the point -- this isn't a thing in rebase workflows. That's a feature. You don't have to deal with megapatches for massive features. It's incrementally verified along the way and bisect works flawlessly.
It is amazing how much time projects seem to spend on rewriting history just to display it in a pretty way. Leaving history intact and having better ways to display it seems far saner. Even after a merge, history in the branch may be useful for bisect, etc.
Git does something called "packing" when it detects "approximately more than <X (configurable)> loose objects" in your .git/objects/ folder. The key word here is "approximately". It will guess how many total objects you have by looking in a few folders and assuming that the objects are uniformly distributed among them (these folders consist of the first 2 characters of the SHA-1 digest). If you have a bunch of commits in the .git/objects/00/ folder, as would happen here, git will drastically over- or under-approximate the total number of objects depending on whether that 00/ folder is included in the heuristic.
This isn't the end of the world, but something to consider.
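If you want to poke at this yourself, the relevant commands are:

    git count-objects -v       # the "count" line is the number of loose objects on disk
    git config --get gc.auto   # the (approximate) auto-pack threshold, default 6700
    git gc                     # pack loose objects explicitly instead of waiting for the heuristic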
I think the sweet spot in Developer productivity was when we had SVN repos and used git-svn on the client. Commits were all rebased on git level prior to pushing. If you committed something that broke unit tests your colleagues would pass you a really ugly plush animal of shame that would sit on your desk until the next coworker broke the build.
We performed code review with a projector in our office jointly looking at diffs, or emacs.
Of course it’s neat to have GitHub actions now and pull-requests for asynchronous code review. But I learned so much from my colleagues directly in that nowadays obscure working mode which I am still grateful for.
> If you committed something that broke unit tests your colleagues would pass you a really ugly plush animal of shame that would sit on your desk until the next coworker broke the build.
We did have an ugly plush animal, but it served more obscure purposes. For blame of broken builds, we had an info screen that counted the number of times a build had passed, and displayed below the name of the person who last broke it.
Explaining to outsiders and non-developers that "Yes, when you make a mistake in this department, we put the person's name on the wall and don't take it down until someone else makes a mistake" sounds so toxic. But it strangely enough wasn't so harsh. Of course there was some stigma that you'd want to avoid, but not to a degree of feeling prolonged shame.
I once interviewed a junior-ish developer who told me that his then-current team had a dunce cap to be worn by whoever broke the build. I copied it immediately. There was no toxicity, it was a good laugh, and as manager I wore it more than once, being a bit too liberal with my commits.
On another team I was on, in 2002 using CVS, we had an upside-down solo cup as a base for a small plastic pirate flag. If you were ready to commit, you grabbed the pirate flag as a mutex on the CVS server. Of course, this turned competitive… and piratical.
I despair about long-lived git feature branches and pull requests. The pull request model is fine for open source development, but it’s been a move backwards for internal development, from having a central trunk that a team commits to several times a day. The compensating factors are git’s overall improvements (in speed and in principled approach to being a content addressable filesystem) and all of the fantastic improvements in linters and static analysis tools, and in devops pipelines.
> I despair about long-lived git feature branches and pull requests
This comes up a lot - multiple people on this thread have even said that it's a bad idea to have a long running feature branch.
This seems like a case of the tool imposing its will on workflows, rather than enabling them. Not all features are tiny. I don't see anything wrong with a long-lived branch if the feature is in fact large. After all, it may be completely redesigned multiple times over before being merged into the main branch. Or it may never make it.
And no I don't think it always works to break down a large feature into smaller ones, because your course may change as you go, and it's much easier not to have to revert incremental features when it does.
But people are so worried about having a perfect history. So they rebase. But if it's a long lived (shared) branch you don't want to do that. So now what? A merge would be ugly, can't do that. So now you've painted yourself in a corner for no good reason.
A long lived branch was a pain even in the CVS days. I'm in particular thinking about the "aviary" branch (for Phoenix/Thunderbird) Mozilla had for quite a while.
Of course tooling can make it harder — there was no such thing as rebasing on CVS.
A long lived feature branch is not a problem if you rebase it to master often. Move all refactoring to the beginning of the branch and merge them to master if they become too many.
For long-lived feature branches that are the target of multiple smaller PRs, history should never be rewritten. I call these branches integration branches. I agree with you wholeheartedly that master should be merged into the branch. It's also so much easier to resolve merge conflicts all at once in the merge commit rather than iteratively for each commit. Also, the information on how merge conflicts are resolved is then preserved in the merge commit. It's critical however that when you merge the branch back into master, you use --no-ff. It gets really confusing when you fast forward master to the feature branch.
The solution for it being ugly is to look at main/master's history with the --first-parent option. This lets you cut through the noise and just see the true history of the main branch. Without --first-parent, you see a bunch of commits in the log that master was never actually pointing at. This is why it's critical that you use --no-ff when merging these 'integration' branches as I call them. It's important that the integration branch is the second parent of master.
But what you are describing still isn't good enough for a lot of people, because even though `--first-parent` hides the noise it's still there and just knowing there's a mess under the rug is enough to be problematic.
I don't think it's really the fault of the tooling, more a matter of the common interpretation of what is a mess and what isn't. If the github commit history allowed you to show `--first-parent`, maybe it would be less of a problem.
> The pull request model is fine for open source development, but it’s been a move backwards for internal development
The more paranoid would claim that requiring PRs that then require approvals prevents a malicious engineer from adding an obvious back door to the code.
You would hope you can trust your co-workers, but sometimes a hack is an inside job.
No joke, a few weeks ago a colleague from university shared a few anecdotes about his mentor-coworker-boss at work with me, and it's similar. Every time they broke the production branch and the boss had to change the code or pull out some AWS magic to restore a database, he would give the fixed commits names like "Cagada de [Employee Name]", which roughly translates to "[Employee Name] F*ed Up", since he knew they wouldn't forget it that way.
It's especially cool given that he would always see his employees' f*k-ups as learning opportunities. He would always teach them what went wrong and how to fix it before shaming them in the git history. He always told them he did it to ensure they wouldn't forget both the shameful f*k-up and the bit of learning that came along with it. They always laugh it off and understand the boss' intentions. It isn't harsh or anything.
Our whole practice revolved around not pushing broken code, because all code was tested locally prior to the push. In fact we practiced continuous integration in its original meaning, integrating code multiple times per day. Releases were performed from a release branch, so the occasional hiccup wasn't worse than, say, a PR that does not build. However, fixing a broken build was the TOP priority when it happened (maybe every two months).
It is literally the definition of toxic. It is the antithesis of making it okay to fail and of having the entire team take responsibility. Instead individual mistakes are highlighted and publicly shamed. How can you possibly not think this is toxic?
Toxic is not the highlighting of breaking the build with a trophy, it's what gets associated with it.
Imagine an "ugliest shirt" trophy, given out to whoever wears the ugliest shirt of the week. At a fashion magazine, this may be toxic shaming. At a tech-heavy startup it might have people start buying the worst shirts they can to try to win it.
If the attitude associated with getting the trophy is condemnation, that's bad. If it's a reminder that everyone fucks up and should be careful, that's fine.
Oof, that hits a sore spot. I was the 2016 Winner of the Ugliest Shirts Award at one of the first technology companies I worked at. Being singled out in front of all your peers for poor fashion sense, and then the ensuing obligatory laugh, ruined my opinion of that company's leadership. I would strongly encourage anyone in a professional environment (especially those in leadership roles) to keep comments on appearance to themselves.
I'm sorry to remind you of a bad time. I would like to point out that "2016 Ugliest Shirts" is a pretty different concept from "Person who wore the ugliest shirt this week" with a picture of you in a ratty beloved tee. It sounds like those were year end remarks, which means instead of judging an act they were judging your long term taste. Also it implies the most memorable thing about you was your shirt choice. And lastly, you weren't anticipating it, so you found out everyone was secretly judging you on something.
If, during orientation you were told a trophy gets given out every week for it, and some people wear really ugly shirts each Friday to try to win it, it would have felt very different.
But yeah, year end humorous awards like that probably belong confined to episodes of The Office.
Uh no, and please never work with me. The definition of "making it okay to fail" is a pat on the back and a retrospective to figure out what went wrong and prevent it from happening again.
I'm not sure why you think a humorous plush toy precludes any of the other things you mention (retrospective, etc). I see a plush toy as something that makes failure an amusing thing to laugh at, rather than something to be hung up about.
But don't worry. At your request, I will not work with you.
What I meant is that we know, inside our blood cells, that breaking the build can happen to anyone, and probably will. The trophy is not public shaming, it's the camaraderie that comes from shared humility.
You say to somebody downthread "remind me never to work with you". I would find it difficult to work with someone as hyper sensitive -- on other people's behalf, yet! -- as you seem to be in this thread.
I have my old team's rubber chicken and I'm never giving it up.
In-person code review is the only way to do it. Pull requests optimize for the wrong part of code review, so now everyone thinks it's supposed to be a quality gate.
Yep. It makes a lot of sense for open source, where gatekeeping makes sense (to reduce feature bloat, back doors, and an inflated API surface that needs to be maintained almost indefinitely).
Most corporate code bases are written by a smallish team operating under tight time constraints, so most contributions actually improve the current state of the code base. PRs then delay the integration and lead to all kinds of follow-up activity spent keeping PR-associated problems at bay. For example, the hours wasted by my team in using stacked PRs to separate Boy Scout rule changes from the feature are just absurd.
Absolutely. In my experience, it’s only “not toxic” to a few people, and for most others it is toxic, but the people who like it won’t ever be able to see that.
Exactly. Even if the current team is cool with it, team+1 may not be, and now they're in a position that feels shitty to them. It's good ol' boys club shit.
People brag about their dunce caps, "john's fault" commit messages from managers, and other forms of public shame as a badge of honor, when it would be so much more interesting to hear about how they fixed the broken processes that led to the problems in the first place.
"oops, a developer fucked up the prod db" says more about the org and its processes than it does about the developer.
For the record: I am not recommending that people adopt a toxic culture.
What I would like people to take away from these discussions is the curiosity to question established practices and processes, and to re-evaluate the cost-benefit ratio of process steps, just like the manufacturing people I write software for continue to optimize their working mode again and again.
In this case I reminisced about the toolset, but the workflow is what brought the value, so of course I advise against using Subversion.
Look up trunk-based development and read the Continuous Integration book published by Addison-Wesley (is it the Jez Humble book or the Duvall book? I always confuse the authors; both books are great though).
The hard part will be to convince people to explore a different working mode AND to learn that what is proposed is not an anarchist style of development, but a development model that optimizes for efficiency.
So I'm thinking about my approach, which is "use commits as game save points, mostly WIP, then use rebase to tidy things up before publishing".
Wouldn't working on trunk still mean I'm working on a feature branch, but it all ends up squashed into a single commit? Or do I lose my opportunity to polish?
It has been my habit for a while to make the root commit 0000000 because it’s fun, but for some reason it had not occurred to me to generalise this to subsequent commits. Tempting, very tempting. I have a couple of solo-developed-and-publicly-shared projects in mind that I will probably do this for.
I bet I wasn't the first person who thought this would have to be done by modifying actual file content — e.g. a dummy comment or something. That would clearly have been horrible, but the fact that git bases the checksum off the commit message is... surprising and fortunate, in this case!
It's a hash of everything that goes into a commit, including the commit message. The idea is that nothing that makes up a commit can change without changing the hash.
> It's a hash of everything that goes into a commit, including the commit message
... and, very notably, the hash of the parent commit. That is also part of the commit, which means that changing a parent commit would also imply changing the hashes of all later commits. This is sort of the whole point of git/version control.
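For the curious, you can see exactly what goes into that hash with git cat-file; the output below is illustrative, not from a real repository:
$ git cat-file -p HEAD
tree 83baae61804e65cc73a7201a7252750c76066a30
parent 3430e13a1b2c...
author Jane Doe <jane@example.com> 1700000000 +0000
committer Jane Doe <jane@example.com> 1700000000 +0000

one
The object name is the SHA-1 of exactly this content (plus a small header), which is why amending the message, the author, or the parent all change the hash.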
This might be a stupid question, but does anyone call git history a blockchain, then? A centralized blockchain, without proof of work or proof of anything really of course, but still, it sounds like the basic blockchain idea is there
I find tags to be a fairly useful way of providing a linear progression, but I guess that's no fun.
> but it can also mean to only allow merges in one direction, from feature branches into main, never the other way around. It kind of depends on the project.
That sounds like the Mainline Model, championed by Perforce[0]. It's actually fairly sensible.
Yeah, I think tags are a more practical way of accomplishing this. If you’re really interested in having a linear history, it might also make sense to switch to an alternative. Mercurial has linear version numbers and can even push to Git repositories.
At risk of coming across as a humorless Hacker News commenter, I will add that I enjoyed this post. It’s a neat hack!
You could automatically tag each uploaded commit with a number drawn from a sequence - using a git post-update hook. The only problem is that this centralizes the process. It's not possible to have fully "blessed" commits without pushing them first. And that's how SVN works, too.
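As a sketch of how that hook could look (the seq- tag naming and the placement in the server-side repository are my assumptions), a post-update hook might be:
#!/bin/sh
# runs in the server-side repo after a push; "$@" is the list of updated refs
for ref in "$@"; do
    [ "$ref" = "refs/heads/main" ] || continue
    n=$(git rev-list --count --first-parent "$ref")
    git tag "seq-$n" "$ref"
done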
Basically, not to put too fine a point on it, I believe that distributed version control is a problem no one ever truly had, and no one intends to ever have in the future.
I mean: Imagine going back in time 20 years to when git, hg, and bzr were created and telling the creators of those tools: "Hey, while designing your technology, you should be aware that it'll end up being used as a worldwide centralized monorepo run by Microsoft, and no one will ever use any of that distributed stuff."
They'll either laugh you out of the room, or you'll be in trouble with the Department of Temporal Investigations for polluting the timeline, because what we currently understand as git sure as hell won't be the design they'll come up with.
So for me: I prefer centralized. And SVN is just a reasonable one to use.
It's worth having distributed version control just so you can work on your own with your own branches and crap and only bother others when you're ready to share. And so you can work seamlessly when offline.
SVN feels like working in someone else's kitchen while several other people are trying to cook in it, too. It's hell. I prefer that we each have our own kitchen and bring our dishes to the table when they're ready.
I've also repeatedly found git a suitable (if not great—if they'd put all their effort behind libgit2 and make that the official implementation, that'd help a ton) tool to form the foundation of larger systems. It's a damn fine toolbox for attacking certain problems. SVN wouldn't have been much help in any of those situations.
> it'll end up being used as a worldwide centralized monorepo run by Microsoft, and no one will ever use any of that distributed stuff.
And I thought I use git in a decentralized fashion all the time … at least I don't need to connect to any other machine when committing, switching branches, merging, rebasing, etc. And my colleagues can do the same without any network connection at the same time.
Also, while it has the biggest brand recognition, not everyone is using GitHub for all their repositories, are they?
> I believe that distributed version control is a problem no one ever truly had, and no one intends to ever have in the future.
Sure. The problem is not "distributed version control", some problems are:
- I'm on a train with no internet, finished working on a thing and want to start working on another thing and don't want to mix them up.
- I want to make a branch and don't want to wait for eons while everything gets copied on the server.
- Oops there's a problem with the server now no one can perform any work.
Yes, SVN might have simpler commands, but its internals are messed up. Git's UI sucks, but just learn about blobs, trees, commits, and branches (pointers to commits), and you basically understand how Git works.
It is distributed because everyone has a copy of the full source (nobody's copy is the copy) and you can push and pull from any machine. I can literally push from my laptop to yours (if I have an account on your machine) and you can pull from mine to yours. Github's copy of my code is exactly the same as Dave's copy. It just happens to have a fancy web interface.
In practice of course almost nobody uses Git to push/pull from other people's personal machines (I think I've done it once ever). But it's pretty common to push and pull from multiple hosted repos (e.g. Github and an internal company Gitlab). I imagine doing that sort of thing with SVN would be a right pain.
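Something like this, for instance (the remote name, host, and path are hypothetical):
$ git remote add daves-laptop ssh://dave@daves-laptop.local/home/dave/project.git
$ git pull daves-laptop main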
...so you are among the 1% who use the functionality that causes 99% of what makes git's mental model so convoluted and hard to learn (for everyone, not just the one-percenters).
That does sound like the 99% are pretty dumb then for using a tool that's not suitable for them... Or maybe it's not that binary, and Git's model, with its complexity, has more useful properties that make the trade-off worth it.
> That does sound like the 99% are pretty dumb then for using a tool that's not suitable for them...
Computing history is full of examples where technologies that are objectively not the best technologies end up being dominant. It's more about economics. (Network externalities, switching costs, ...)
Although I will admit that, with version control, there isn't even an alternative out there that is anything like an "objective winner". Each one has its problems, and it's a matter of choosing the least of the evils. -- I haven't tried any of the commercial ones though.
> So we only have one option: testing many combinations of junk data until we can find one that passes our criteria.
I have a somewhat related interest in trying to find sentences that have low SHA-256 sums.
I made a Go client that searches for low-hash sentences and uploads winners to a scoreboard I put up at https://lowhash.com
I am not knowledgeable about GPU methods or crypto mining in general; I just tried to optimize a CPU-based method. Someone who knows what they are doing could quickly beat out all the sentences there.
The article talks about eight-character prefixes later on, but Git short refs actually use seven-character prefixes when there is no collision at that length (and that's what's shown earlier in the article). So you can divide the time by 16.
For me on a Ryzen 5800HS laptop, lucky_commit generally takes 11–12 seconds. I’m fine with spending that much per commit when publishing. The three minutes eight-character prefixes would require, not quite so much.
I left some details out of the post to make it shorter.
What I’m actually doing is generating a 7-digit incremental number followed by a fixed 0. Some UIs show 7 characters and some show 8, so this felt like a nice compromise. Plus, it’s easier to distinguish the prefix from the suffix when looking at the full SHA, since they are always separated by a 0.
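Roughly, that scheme could be scripted like this (a sketch only; it assumes the counter is just the first-parent commit count and that lucky_commit takes the desired prefix as its argument):
$ n=$(git rev-list --count --first-parent HEAD)
$ prefix=$(printf '%07d0' "$n")
$ lucky_commit "$prefix"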
Git hasn't used "seven-character prefixes when there are no collisions" in a long time.
It's a combination of the "repo size" (as in, estimated number of objects) and a hard floor of seven characters.
You can see this by running "git log --oneline" on any non-trivially sized repository (e.g. linux.git). There are plenty of hashes that uniquely abbreviate to 7 characters, but they're currently all shown with 12 by default.
There may be some extra trigger that causes it to go beyond seven for everything, I don’t know (never worked on a repository anywhere near that large), but there’s certainly still at least some form of collision logic in there (and this is why I said what I said, because I’ve used lucky_commit enough to experience it):
$ git init x
Initialized empty Git repository in /tmp/x/.git/
$ cd x
$ git commit --allow-empty -m one
[master (root-commit) 4144321] one
$ git log --oneline
4144321 (HEAD -> master) one
$ lucky_commit
$ git log --oneline
0000000 (HEAD -> master) one
$ git commit --amend --no-edit --reset-author --allow-empty
[master 3430e13] one
$ git log --oneline
3430e13 (HEAD -> master) one
$ lucky_commit
$ git log --oneline
0000000f (HEAD -> master) one
$ git reflog --oneline
0000000f (HEAD -> master) HEAD@{0}: amend with lucky_commit
3430e13 HEAD@{1}: commit (amend): one
00000005 HEAD@{2}: amend with lucky_commit
4144321 HEAD@{3}: commit (initial): one
$ git reflog expire --expire=now --all
$ git reflog --oneline
$ git log --oneline
0000000f (HEAD -> master) one
$ git gc --aggressive --prune=now
Enumerating objects: 2, done.
Counting objects: 100% (2/2), done.
Writing objects: 100% (2/2), done.
Total 2 (delta 0), reused 0 (delta 0), pack-reused 0
$ git log --oneline
0000000 (HEAD -> master) one
Yes, there's also a collision check, but it's not truncating to 7 characters and adding as needed to get past collisions. Rather it's truncating to N and then adding as needed. N=7 for a new repository, but it'll relatively quickly bump that to 8, then 9, etc.
You don't need a very large repository to start bumping it up to 8. E.g. my local redis.git is at 9, and some local few-thousand-commit repos (partially automated) that I've only ever added to are at 8.
This changed in v2.11 released in late 2016[1], but because the observable default on a new repository is 7 the "it's 7 unless collisions" has persisted in various places online.
All of which is to say that if you brute-force the first commit to be 0000001..., it'll start being displayed as <that><x>, where <x> is a random 0..9a..f character, unless you brute force to 8, 9 etc.
In the past I had some teammates merge master into a very old branch, which then got pushed back into master with all of its old history. Could someone suggest a series of commands so that only their latest updates are moved onto the latest master, on the current or a new branch?
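Without knowing the exact shape of the branch, a rough sketch (all names and placeholders below are hypothetical): start a new branch from the current master and cherry-pick only their own commits onto it.
$ git fetch origin
$ git switch -c rescued-work origin/master
$ git cherry-pick <their-first-commit>^..<their-last-commit>
If their commits are interleaved with merge commits, cherry-pick the individual SHAs instead of a range, or use git rebase --onto with the commit where the branch originally diverged.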
Clever and tempting. I would maybe like to use a smaller prefix but ensure a 0 suffix to the number too, to make it easy to read anyway. Like 00010bad, 00020fed, 00030be1, etc.
Wraparound doesn't really matter, as long as it's spaced long apart.
I love linear git! Branches are very confusing for a nonempty set of people. For us, it is always clearer to work with explicit files in the main branch. You are implementing a new feature? Nice: just create a new file on the main branch and keep updating it until you add it to the tests, and later you call it from the main program. This system may break down on large teams, but when you are just a handful of grug-brained developers, it's perfectly appropriate.
This doesn't handle the reasonably common case very well where someone is working on changes which are constantly breaking the branch for everyone else. They should have their own branch and be frequently rebasing/merging so as to not disrupt others.
Also exploratory branches where any nonsense may go on (that may end up being merged, at least partially!). Also test/development vs. production branches! One may be broken, the production branch should ideally never be in a state that cannot be deployed.
That said, keep the branches limited and try to keep them 'linear' in the sense that you don't want to be merging between 100 different non-main branches in some byzantine nightmare. Perhaps encourage merges only to the development branch and then rebranching.
> Also exploratory branches where any nonsense may go on (that may end up being merged, at least partially!). Also test/development vs. production branches! One may be broken, the production branch should ideally never be in a state that cannot be deployed.
Well, why don't you simply copy the code into a new directory and commit that? Then you can do whatever you want in the scratch directory.
To me that seems messier than a new branch. For one, how are others to know my files are test/scratch/feature branch files? I'd have to use a naming scheme, and some kind of other signal to make sure nobody imports them before they're ready or mistakes them for deployable files - and at that point I'm just replicating a branch!
> and at that point I'm just replicating a branch!
Yes. My entire point is that you can always replicate branches with actual, explicit files, and that this is a good thing to do because files are (very often) better than branches. Files plus some editor discipline are essentially equivalent to branches.
Files+discipline are better than branches, according to standard unix philosophy: (1) everything is a file, (2) protocol not policy.
I don't see why this would be bad practice. If you complete half a feature during your work day then you may well want to commit it (and likely push it to a remote). Merging it to a branch others are working on is likely to be worse than not merging it until it is complete as it may simply be in a not working state.
There's nothing special about commits. Feel free to commit as many broken and non-working things as you feel like. It's not much different from saving in your editor.
It's in master (or your production branch etc) where you only want Commits That Work.
Btw, if you are refactoring your types, you won't be able to hide that from your compiler via a simple feature flag.
Do it on a single commit? (or contiguous series of commits) It's not going to conflict with any other branch because there are no other branches ;)
I guess that large refactorings/reorganizations are harder if you have many branches, because they will inevitably lead to merging conflicts. On a linear setup, you don't have this problem.
I don't understand your comment. The method that I describe only requires that the programming language ignores unused files. As far as I know, all modern programming languages have this feature.
The worst is when you move a bunch of files around in Solution Explorer and commit, maybe do a merge and push, before you realise the MSBuild/csproj files were never saved (gotta press Save All for some reason) - now you have a change you need to apply to a pre-merge commit. Good luck with that.
This feels like a lot of extra work to throw away the benefits you actually get out of version control. I would very much not like to work on this team.
I prefer a linear version number on the main branch and I have a really tiny version file that gets incremented on every change to the src/ directory. That's not entirely automated, but a commit queue could do that.
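One way to automate that locally (a sketch; the VERSION filename and the use of a pre-commit hook rather than a commit queue are my assumptions):
#!/bin/sh
# pre-commit hook: bump VERSION whenever the staged changes touch src/
if git diff --cached --quiet -- src/; then
    exit 0
fi
n=$(cat VERSION 2>/dev/null || echo 0)
echo $((n + 1)) > VERSION
git add VERSION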
Brute-forcing hash collisions seems like an April Fool's joke. You can't really be serious that people are going to do this regularly?
I wish Git had more support for "linear" revisions in the main branches. It's great for continuous delivery where you can get a unique identifier that's also human-friendly.
I emulate this by counting the number of merges on main:
git rev-list --count --first-parent HEAD
But it's not that traceable (hard to go from a rev back to a commit).
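Going back the other way is doable, though; a sketch, assuming the number was produced by that same --first-parent count:
$ N=1234
$ git rev-list --reverse --first-parent HEAD | sed -n "${N}p"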
We do this at TVL, and push the corresponding revision as a ref (refs/r/$n) back to git. See for example our cgit log view: https://code.tvl.fyi/log/
This way, a correctly configured git client (which pulls those refs) can use `git checkout r/1234` to get to that revision. It's also noteworthy that this is effectively stateless, so you can reproduce the exact revisions locally with a single shell command without fetching them from the remote.
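A minimal sketch of reproducing such refs locally (the loop and everything beyond the refs/r/ naming are my assumptions):
# create refs/r/<n> for every first-parent revision of the current branch
n=0
git rev-list --reverse --first-parent HEAD | while read -r commit; do
    n=$((n + 1))
    git update-ref "refs/r/$n" "$commit"
done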
Now, merge only the next-in-line hashes and the contributions to your repo can reach Cloud Scale™. Harness the ultimate power of distributed intelligent agents to create the future, backed by strong mathematical foundations and an ecosystem of innovative technologies. Just at your fingertips
I am writing a solo project. I only use main (aka master) and never use branching. Otherwise, I inevitably screw something up. It is good enough to keep me from losing stuff most of the time, and I almost never have to struggle with understanding what the heck Git is doing.
The fact that we use a hash as the main way to interact with commits shows how bad git's interface is. Sure, you should be able to easily check the SHA anytime, but exposing the plumbing to end users on almost every interaction is mad. We just got used to it.
The Webkit project would love this. Can't help but feel that half the reason they spent all the extra effort with subversion was user-friendly commit revisions.
Wouldn't it have been better if we could use something other than SHA1 as the actual name of something?
Where in the worst dystopian parts of software do we do this?
The SHA1 is kind of a security feature if anything, a side-show thing that should be nestled 1-layer deep into the UI and probably most people are unaware of.
Whereas commits and branches should be designed specifically for the user - not 'externalized artifacts' of some acyclic graph implementation.
Git triggers a product designer's OCD so hard that it's hard for some of us not to disdain it out of spite.
I don’t want to make up a good name for every commit. Good comments are hard enough.
A SHA-1 might not look friendly to a dev who doesn’t understand it, but as someone who works with hash values all the time, having my repo be a Merkle tree gives me a warm fuzzy.
Git has problems: stipulated. Improvements in design are possible: also stipulated.
But, your reply is annoying in opining about my mental state and preferences. How am I confused by the SHA-1 commits, exactly? And how am I unclear that I’m looking at commits when I issue a “git log”?
It just means you have to coordinate more. Or just have one person in charge of the master branch. I don't think the post is supposed to be taken so seriously, though.
I would not call this a joke project. It’s a fun and optional sort of a thing, but there’s no reason why you shouldn’t take it seriously, provided your approach to work makes it compatible.
This is a serious criticism... even if it's not likely to catch on enough to have a real effect on the sea level, it is a complete waste of energy to accomplish something that could be done much more efficiently some other way, if it is indeed worth doing at all.