The single best command that made working with `git rebase` infinite times easier is
git reflog
At the very least it will stop you from "oh I fucked up the <rebase/merge/cherrypick/whatever> so bad that I must delete and clone the repo again".
At best you can salvage commits that have been rebased out, get back whatever you happened to have checked out, and it's basically the best command I've found that stays true to the "it's almost impossible to lose something in git once you've committed it" promise.
You can also skip the reflog if you have a reference to the commit (the commit hash being the most reliable, since it can't change): `git reset` is like a teleportation device for the git graph. Even if a commit is orphaned or dereferenced (due to a rebase), it takes quite a while before git's garbage collection will remove it (two weeks is the default, I think).
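For anyone who hasn't tried it, here's a minimal, self-contained sketch of that recovery loop (throwaway repo; file names and identities are made up):

```shell
# Throwaway repo; identities and file names are made up.
set -e
cd "$(mktemp -d)" && git init -q
git config user.email you@example.com && git config user.name you
echo one > file && git add file && git commit -qm "first"
echo two > file && git commit -qam "second"

git reset -q --hard HEAD~1       # oops: "second" no longer reachable from the branch
git reflog                       # shows the reset *and* the old tip right above it
git reset -q --hard 'HEAD@{1}'   # teleport the branch straight back
cat file                         # prints "two" again
```

The `HEAD@{1}` syntax just means "where HEAD was one reflog entry ago", which is exactly what you want right after a botched reset or rebase.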
I feel like this should be introduced before any rebasing tutorial, or quite frankly any kind of more advanced git commands than commit/pull/push.
It's really important to know that it's OK to fuck (almost) everything up in order to make progress in learning. This goes beyond git itself and onto the code it tracks - I've noticed some newbies get stuck tiptoeing around the code, worrying about "breaking it too much" when debugging something tricky... When I see that I tell them to park their work in a commit, then show them how to reliably restore the working tree to that point (`reset --hard`), and then they can rip the codebase to shreds to properly zero in on the issue in short order.
Important to note, however, is that the reflog is local and isn't pushed. Sometimes people panic when things aren't working; once someone asked me to fix their branch, but it turned out they had deleted the local repo because they wanted to "start over again". Bad idea!
One thing I didn't know is that certain macOS apps keep their own version history of files (Xcode, for instance, will do this), which you can access by opening the file in TextEdit and then File > Revert To > Browse All Versions. So if you've truly lost work and can't restore it with reflog or another git command, it's worth trying this.
I always read `reflog` as `re-flog`, not `ref-log`, and I was confused about the connection between flogging and finding lost git commits. It was much later that I made the connection.
Yes, this is another thing that doesn't get said enough to beginners:
A branch is nothing more than a pointer (or mark) to a commit.
This realisation, along with the `git reflog` command I wrote about above, made for a tremendous mental shift, and suddenly almost everything around git just clicked together.
This (along with the fact that a commit is a state, not a patch) is really the most basic thing that every git tutorial should start with and you shouldn't move anywhere past that until you grasp it.
But what you usually get instead is "this is a command to commit, this is a command to push, and this one to pull, here's how you make a branch, have fun" - and then people complain that git is complicated because they end up with situations they don't understand. Well, duh - any new concept seems complicated until you take a closer look at it, and if you learn git this way you never actually did try to understand it. It's a data structure manipulation tool, so it's in your best interest to have at least some rough understanding of what data structure it operates on. Otherwise it's like trying to use a word processor without knowing how to write.
As another example, I find it completely backwards to teach `git pull` before `git fetch <remote> <ref>:<local_ref>`. How is one supposed to understand what `pull` does this way? However you imagine it at this stage, it's going to be wrong and will make you end up confused later on. Once you know how fetch works and what merge is, `git pull` becomes obvious. It doesn't work the other way around.
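To make that concrete, here's a throwaway sketch where a local directory stands in for the remote (all names are made up); the last two commands together are more or less what `git pull` does:

```shell
# Throwaway repos; a local directory stands in for the remote.
set -e
tmp="$(mktemp -d)"
git init -q -b main "$tmp/remote" && cd "$tmp/remote"
git config user.email you@example.com && git config user.name you
echo a > f && git add f && git commit -qm "a"
git clone -q "$tmp/remote" "$tmp/local"
echo b > f && git commit -qam "b"     # the remote moves ahead

cd "$tmp/local"
git fetch -q origin                   # updates origin/main only; your branch is untouched
git log --oneline main..origin/main   # now you can *see* what came in
git merge -q origin/main              # explicit integration; fetch + merge ≈ git pull
```

The nice part of doing it in two steps is the middle line: between fetch and merge you can inspect what actually arrived before deciding how to integrate it.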
> A branch is nothing more than a pointer (or mark) to a commit.
Yep; the way I like to phrase it is "symlink". As in, `.git/refs/heads` literally used to contain symlinks pointing to commits. (It doesn't anymore, because Windows.)
Or you can simply never rebase and always merge. I still don't understand why people think history-destroying rebase is a good idea, but that's just me.
in my global gitignore, so that whenever I have the desire to litter a repo with little working files there's no chance of accidentally committing them. For things like curl or debug output, that kind of thing.
Right. If working with other people, a global gitignore is probably most sensible if you have, say, an uncommon editor that likes to create a subfolder with metadata, or has a weird convention for naming temporary in-place rename files. In those sorts of situations, adding the paths from your unusual setup to every .gitignore file you work with might be overkill, and potentially unwelcome by the maintainers of some projects you work on.
But for more common stuff, like Visual Studio cruft in a Windows C++ or .NET project, or the cache folders in a Python project, etc., it probably makes more sense to have that stuff in the repository-level .gitignore.
Or even if you have a common editor! Having to add several editors' worth of excludes to .gitignore on every project is annoying. Much easier if everyone takes care of their own local droppings, and the committed .gitignore can focus on project-specific ignores.
> Patterns which should be version-controlled and distributed to other repositories via clone (i.e., files that all developers will want to ignore) should go into a .gitignore file.
> Patterns which are specific to a particular repository but which do not need to be shared with other related repositories (e.g., auxiliary files that live inside the repository but are specific to one user’s workflow) should go into the $GIT_DIR/info/exclude file.
> Patterns which a user wants Git to ignore in all situations (e.g., backup or temporary files generated by the user’s editor of choice) generally go into a file specified by core.excludesFile in the user’s ~/.gitconfig. Its default value is $XDG_CONFIG_HOME/git/ignore. If $XDG_CONFIG_HOME is either not set or empty, $HOME/.config/git/ignore is used instead.
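The third case can be tried out in a sandbox. This sketch replaces HOME so it won't touch your real config, and uses the default ignore-file location from the docs quoted above (the `*.scratch` pattern is just an example):

```shell
# Sandbox: HOME is replaced so your real config is untouched; *.scratch is an example.
set -e
export HOME="$(mktemp -d)"
export XDG_CONFIG_HOME="$HOME/.config"
mkdir -p "$XDG_CONFIG_HOME/git"
echo '*.scratch' >> "$XDG_CONFIG_HOME/git/ignore"   # the default path; no config key needed

cd "$(mktemp -d)" && git init -q
touch notes.md junk.scratch
git status --porcelain    # lists notes.md, but junk.scratch is silently ignored
```

If you'd rather keep the file somewhere non-default, `git config --global core.excludesFile <path>` points git at it instead.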
Creating empty git commits and pushing them to a PR might be all you need to scratch that “let’s just try it once more, without changing anything” itch.
Please just don't. If your CI / CD pipelines do not offer the option to retrigger builds in some way other than modifying the repository, rather spend some time improving them.
I am aware of the "Squash commits" and similar features most CI systems have, but those boxes are easily forgotten. Plus, you end up squashing everything, and there are valid reasons for having more than one commit in the same PR and keeping them separate.
A workaround I've sometimes used to retrigger CI is `git commit --amend --no-edit --reset-author`. This changes the sha by changing the author timestamp. Then you can force-push it to the PR branch.
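A self-contained sketch of why this works: the amend produces a new sha while the tree and message stay identical (the `sleep` just guarantees the timestamps actually differ; the branch name in the final comment is yours to fill in):

```shell
# Throwaway repo: prove the sha changes while the tree does not.
set -e
cd "$(mktemp -d)" && git init -q
git config user.email you@example.com && git config user.name you
echo x > f && git add f && git commit -qm "trigger CI again"

before="$(git rev-parse HEAD)"
sleep 1                                          # make sure the timestamps differ
git commit -q --amend --no-edit --reset-author   # same tree, same message, new sha
after="$(git rev-parse HEAD)"
[ "$before" != "$after" ] && echo "new sha, same content"
# on a real PR branch you'd follow with: git push --force-with-lease origin <branch>
```

`--force-with-lease` is the safer force-push here, since it refuses to clobber remote commits you haven't seen.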
That's absolutely preferable, but (a) you might not have done that (yet), and (b) you might not be the person with permissions to do it, and they might be too busy and/or unwilling to do so.
The title of the article does rather make it clear that most of these are "in extremis" rather than everyday techniques.
I make empty commits when I want to update a commit message without changing anything in the referenced tree. I do this by prefixing the title of the commit message I want to update with `squash!`. Then, when I run an interactive rebase with the `--keep-empty` flag, git will open my editor and I can just delete the original commit message, leave the new commit message I made with the `squash!`-prefixed commit, and then continue with the rebase.
This essentially lets me show, in a code review, how I would change a commit message without actually running a rebase or amending the commit, which would hide the original.
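A rough, non-interactive approximation of that workflow (the editors are stubbed out with `true`, so the combined message ends up keeping both texts rather than letting you delete the original; repo and messages are made up):

```shell
# Throwaway repo; editors are stubbed with `true` so the rebase runs unattended.
set -e
cd "$(mktemp -d)" && git init -q -b main
git config user.email you@example.com && git config user.name you
echo x > f && git add f && git commit -qm "root"
echo y >> f && git commit -qam "old message"
git commit -q --allow-empty -m "squash! old message" -m "new message"

GIT_SEQUENCE_EDITOR=true GIT_EDITOR=true \
  git rebase -q -i --autosquash --keep-empty HEAD~2
git log -1 --format=%B    # the squashed commit now carries both texts
```

Interactively, that last editor session is where you'd delete the old message and keep only the new one.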
Ah, what a stellar use case. It's been forever since I looked into git notes (or was it annotations? can't check right now), but I guess those had significant flaws. Empty commits whose messages you can use as a todo list, fixing up/amending in changes as you go. Might try this.
I occasionally use empty commits to leave notes to myself that I want to be shown when I go to clean up my history (generally with rebase -i) and get it ready for review. It works well as rebase -i will default to commenting out empty commits, thus drawing attention to them.
Awesome, I really needed to be reminded of .mailmap. At $dayjob, the history is a mess. I just did a PR to add a .mailmap and clean that up.
But other than that, yeah, I've encountered most of these. We had 2 unrelated trees in our monorepo for a long time (technically still do, but no longer used). When the company migrated to git ~10 years ago, they were doing deployments with patch files by hand. Prod and staging had diverged for various reasons. They decided they wanted to track both so that patches could be created locally from the known prod state, so they put prod's source into an orphan branch. When we'd want to deploy, we'd cherry-pick commits from main onto the prod branch, then generate a patch file for the range of commits to go out. Eventually we fixed that by getting code parity across environments, and now we just do an ff-merge to the prod branch, then do a git pull in prod to update things. (Note this works because it's a PHP codebase and we don't have a build step... sometimes we need downtime for a change, but that's fine.)
Ah, that's unfortunate. I checked if GitLab has support, and it seems not yet either. Some of the comments on the issue were from some trans folk saying they were hoping to see it merged so they could hide their deadname from repos. Pretty valid reason.
I think most people who have configured `semantic-release` on a repo can cop to using `--allow-empty` at least once. Or if you’re like me, every other commit.
Regarding git filter-branch, that big warning does point you to try git filter-repo [1] instead. In my experience, every case that I've needed git filter-branch for (or BFG for a bit) has been handled easily and well (including performance-wise) by git filter-repo. Don't let the fact that it needs Python and is still a separate download from the git suite (in part because it needs Python, and that is not otherwise a git suite dependency) dissuade you from using an easier and faster alternative to git filter-branch.
Absolutely not. cat-file reformats the data, but even if you ignore that loose objects are compressed on-disk, and that they contain a nul byte, a tree object is binary data. Amongst other things it’s the first time you’d encounter an oid as raw bytes instead of hex (though that’s also the case in pack files).
> Filter branch
When even git’s own documentation tells you that something is riddled with footguns, you probably want to avoid it. If they cover your use case, prefer git-filter-repo or BFG; they’re much more reliable and easier to use.
If there's any command that's not needed, it's git pull. This command basically combines git fetch and git merge. But merging the remote changes into your local branch isn't always appropriate. I know that you can change the second action to rebase if you use the --rebase flag, but you have to remember to do that.
I personally prefer just running git fetch and then deciding whether I want to merge, rebase, or just reset my local branch to the upstream commit, and running the appropriate command for that.
You can at least configure rebase to be the default:
git config --global pull.rebase true
I understand why this isn't the "safe" default option, but at the same time it's the sane default option for anyone working with others. Maybe it would make more sense for the default to be a prompt: "Your local and upstream both have changes: do you want to rebase (replay) your work on top, or generate a noisy merge commit?"
Luckily in many of the common usages (git flow, github flow, scaled trunk-based development) where developers are doing actual work on independent branches and merging via pull requests, this doesn't happen often.
Not a fan of that, either. It just seems like the kind of command that new users should avoid because they won't anticipate the consequences, and experienced users do avoid because they can anticipate the consequences.
I think experienced users use the common git workflows, which usually avoids this situation.
But, FWIW, this is a litmus test for me of one's true git prowess. Merge commits like "Merged origin/master into master" tell me the user doesn't know how to use git (as in: doesn't know how to rebase).
If you know how to rebase, I don't know why you wouldn't have this set by default, because undoing it if it accidentally happens is more work -- and requires rebasing anyway.
Context dependent. With the "right" git workflow in use on a project/group/company/whatever, "rebase on pull" is what you want, always, no questions. With a different workflow, it may be what you absolutely do not want. So for some new users, its the right thing to do so they don't forget, for other new users its something they should avoid. Only the git workflow can determine that.
This doesn't affect anyone else! Other than they don't see noisy, meaningless merge commits.
It only takes effect when you are both behind AND ahead of the remote (as in: you have local unpushed commits, and there are remote commits you don't have).
It pulls the remote commits first, then rebases your commits using those as a base.
You are left with a repository where you are ahead of remote, and can just push your commits normally.
Only your local history is affected (insofar as your hashes will change).
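A throwaway demo of exactly that situation (a local directory stands in for the remote; both sides gain a commit, then pull --rebase leaves a linear history):

```shell
# Throwaway repos: one "remote", one clone, each with its own new commit.
set -e
tmp="$(mktemp -d)"
git init -q -b main "$tmp/remote" && cd "$tmp/remote"
git config user.email you@example.com && git config user.name you
echo base > f && git add f && git commit -qm "base"
git clone -q "$tmp/remote" "$tmp/local"
echo r > r && git add r && git commit -qm "remote work"   # remote moves ahead

cd "$tmp/local"
git config user.email you@example.com && git config user.name you
echo l > l && git add l && git commit -qm "local work"    # ...and so do we
git pull -q --rebase origin main   # fetch, then replay "local work" on top
git log --oneline                  # linear: local work, remote work, base
```

After this, the local branch is simply ahead of the remote, so a plain push works with no merge commit.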
No, you're now pushing up a history that is a lie. Those commits represent states that never existed, and which you never tested. They certainly don't represent your headspace at the time.
Those merge commits are the truth. The histories diverged, and had to be reconciled at some point. If I'm trying to understand the history then I want to see that, not your idealized version where you try to pretend that the divergence never happened.
It can be hard to accept sometimes, but you're putting in a lot of effort for an end result that just makes life worse for everyone involved.
> Yet, I know off the top of my mind at least two occasions where creating an empty commit can make quite a bit of sense:
> 1. Initializing new repositories.
> 2. Triggering continuous deployment pipelines.
We use a third: changing the branch timestamp so that the old-branches cleanup job won't sweep your branch.
> Honestly, in practice I haven’t found a single valid use-case for octopus merges which aren’t already covered by sequencing a series of merges, one after the other. Perhaps there are some integration use-cases out there which really let’s the octopus merge strategy shine. Let me know!
Functionally, the octopus strategy is slightly more conservative than a sequence-of-merges-with-abort-on-conflict, protecting you a bit more from the rough edges of Git’s unsound patch theory; but in practice you’re unlikely to ever trigger or observe the difference (though I did encounter it a number of times as part of a major refactoring—the diffs were just right to trigger it). I don’t think this alone would be particularly worthy of note (though for the sequence of merges, it might still be worth switching to the resolve strategy).
The difference I value is performance.
A few years ago I was working at a place that used labels on merge requests in GitLab to specify which of several pre-production environments that MR should be deployed to, and then on all head or label changes, reconstructed a branch for each environment based on merging the appropriate heads. When I started, it was taking each MR in sequence, fetching from its source repository and branch, and merging it, but this took several seconds per MR, and it wasn’t rare to end up with ten or even twenty include-in-beta MRs, and you couldn’t deploy the beta branch until that process was finished. It was also fiddly to diagnose the problem when you had conflicts (one MR conflicting with master, or two mutually-conflicting MRs), though this was more just a “no one got round to making the script perform the reverse correlation itself” thing. I got fed up when the process was sometimes taking two minutes, so I rewrote it to use a single specific fetch of all the heads (basically `git fetch origin refs/merge-requests/123/head refs/merge-requests/456/head …`, the GitLab API gave the commit hashes already so I let the fetches dangle rather than giving them a local ref, and used their commit hashes in the merge), followed by a single octopus merge. Since an octopus merge failure’s output was basically useless for identifying the problematic MRs, I made it fall back to sequential merges along with implementing the required reverse correlation and telling you which MRs conflicted. (And if the sequential merge succeeded, which is theoretically quite possible, I made it fail noisily, on principle).
The end result of it all:
• What had taken one and a half or two minutes now took under 10 seconds. (And even in the conflict case, with its sequential merge fallback, the single fetch meant it'd be up to twenty or thirty seconds faster.)
• The job server performing the branch rebuild experienced vastly slower repository growth, no longer needing periodic aggressively-pruning garbage collection to avoid running out of disk space.
Thank you for this elaborate explanation and very valid use-case. I did find out that octopus merges have been used quite extensively in e.g. the Linux history as I was trawling the archives for one of the anecdotal remarks I made. I eventually found them both, by the way (post updated):
I suppose in scenarios where you're integrating a significant volume of unrelated work (with a low probability of conflicts), it starts to make a lot more sense. I guess it's just not really been that relevant to me personally.
Both git fetch and ordinary git push update refs/remotes (from the remote's refs/heads), but they don't change anything under your local refs/heads at all.
Ordinary git pull will update refs/heads/<currentbranch> because by default it merges/rebases the current branch after fetching, but that's not always desirable when branch switching is expensive.
Pushing locally updates refs/heads/<somebranch> from refs/remotes/<somebranch> without switching to that branch.
Now I also understand @myme's sentence "In fact, in order to properly “push to the local repository” it’s necessary to invoke a git fetch first to ensure that the remote tracking branches are updated."
Thanks to @myme and other responders in this thread!
Yes. This one was hard for me to explain. I've discussed it with others too and might try to rephrase it again in the post.
As far as I know, there's no other git command that lets you (as easily) change which commit a local branch is pointing to without first checking it out. And "without checking it out" is the crucial part.
If, as the example tried to explain, you have a huge piece of software (think C++ monorepo) that takes ages to build, unnecessary switching of files back and forth easily messes up build tools relying on modified timestamps to determine what needs to be rebuilt. (Sure, blame the build system, but that might not be a trivial fix.)
So, in that example, the use-case is to make sure the branch is up to date before switching to it. (And you could of course argue that why don't you just create a new branch, which is also a completely valid approach).
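For the record, the move being described can be sketched like this (throwaway repos; `git fetch origin main:main` fast-forwards the local branch without touching the working tree):

```shell
# Throwaway repos: update local main while parked on another branch.
set -e
tmp="$(mktemp -d)"
git init -q -b main "$tmp/remote" && cd "$tmp/remote"
git config user.email you@example.com && git config user.name you
echo a > f && git add f && git commit -qm "a"
git clone -q "$tmp/remote" "$tmp/local"
echo b > f && git commit -qam "b"          # remote main moves ahead

cd "$tmp/local"
git switch -qc topic                       # we're busy over here
git fetch -q origin main:main              # fast-forward main; no checkout, no file churn
git log -1 --format=%s main                # prints "b"
```

The `main:main` refspec only allows fast-forward updates by default, so it can't silently lose local commits on that branch.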
re: using `git commit --allow-empty` for initializing new repositories - I used to do that for the same reason (being able to rebase), but eventually I started checking in a LICENSE file and skeleton README.md file (name of project + LICENSE blurb) as the first commit instead.
Don't get me wrong, I'd say that's a very good approach too. A lot of my personal repos are private and synced through personal infrastructure so I don't always bother with those kind of files.
I habitually start with 'touch .gitignore' and then committing that.
Since an empty .gitignore doesn't actually do anything and has never been an issue to rebase onto, it works out fine for me.
I should probably, arguably, switch to using --allow-empty but my current workflow is long since baked into my fingers and has yet to cause a problem for me or anybody else, so if I switch it'll be for reasons of pure pedantry (which me being me may eventually be quite sufficient motivation, but hasn't just yet).
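For comparison, the `--allow-empty` initialization being discussed is just this (sandbox sketch; names are made up):

```shell
# Throwaway repo; the first commit carries no files at all.
set -e
cd "$(mktemp -d)" && git init -q -b main
git config user.email you@example.com && git config user.name you
git commit -q --allow-empty -m "root"
echo '# demo' > README.md && git add README.md && git commit -qm "add README"
git log --oneline    # two commits; everything above the root can be rebased freely
```

Either way (empty root, LICENSE, or empty .gitignore), the point is the same: having a first commit you never need to rewrite makes later rebases simpler.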
Yes and no. In the case of changes to the runner environment and a complex pipeline with many manual jobs, expiring artifacts, etc., it's easier to make an empty commit than to go around deleting what is obsolete and trying to find a working sequence.
Empty commits are useful to indicate that lengthy, manually run tests match some prior good state in development branches. For example commits like...
1. Good
2. Refactor
3. Rewrite
4. Empty with goodness re: 1
5. Rewrite
6. Refactor
...and now you retest and something is broken. You have an easy way to isolate any issues as happening after #4. And you have left markers for yourself 6 months from now if something goes wrong after a merge (i.e. confirm 1 and 4 match).
> If you use any others and you are a normal developer, chances are you are overcomplicating things and should rethink your way of using git.
I strongly disagree with this one.
Git is one of those constants in our developer lives that is bound to stay around for a long time. Probably longer than most other tech/tools that most of us use and learn on a daily basis.
The fundamentals of git aren't that complicated, the datamodel isn't any more complex than what you get taught in an introductory datastructure course. And once you understand it, you can leverage it to create an actually readable and usable commit history instead of the unreadable mess that your average enterprise project is. In my opinion, it's well worth the trouble of taking an afternoon to understand git beyond the commands you've mentioned.
> The fundamentals of git aren't that complicated,
Agreed
> the datamodel isn't any more complex than what you get taught in an introductory datastructure course.
Also true. The problem is that people love to shoot themselves in the foot by overcomplicating normal use of git
That's all fine, until you try to push to someone else's repo and things start to crack.
So as a person who had to maintain a fairly large project, and had to fix a lot of madness from developers I nowadays suggest everyone keep their git adventures to a minimum.
git status
git log
git reflog
git reset
git merge
git cherry-pick
git rebase
These days you can use the new (and less confusing) "git switch" and "git restore" commands instead of "git checkout". I still use "checkout" because it's ingrained in my muscle memory. But I wouldn't recommend it for new developers.
… which results in either losing work sooner or later (because you don't commit often and early enough to always have something to roll back to), or pollutes history with junk WIP commits and merges. Maybe this is because I am not a "normal developer" (more like below normal), but this "overcomplication" both produces a clean and easy to read history (that is also easy to bisect when anything goes wrong and you need to find the offending commit), and also saved my bacon countless times.
> … which results in either losing work sooner or later (because you don't commit often and early enough to always have something to roll back to),
You can just git add without committing. The changes will be saved to the index and live until git decides to GC its files (so, a few months). Extracting them might be a PITA (as it requires playing with some of the more arcane git commands), but you can't really lose data.
git stash is also a lifesaver if you need to quickly jump to another branch to do something else.
Lastly, there is the "git commit --amend" way of just incrementally changing the current commit, but that requires some self-discipline (or hooks preventing you from doing it) to not amend an already-pushed commit and then have to fix that.
One command I use that I haven't seen anyone mention so far is git apply. I use this to stage or edit diff hunks before committing. Basically, at least in vim, I can highlight part of the diff (along with the diff header) and pipe it to the git apply command with the --cached and --recount flags to stage or unstage changes.
This, at least in my opinion, is easier to use than the -p flag for git add or git reset, and provides much finer grained control over staging and unstaging compared to those commands.
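Here's a rough, self-contained sketch of that flow (in real use you'd pare `hunk.patch` down to the lines you want in your editor before applying; file names are made up):

```shell
# Throwaway repo; in real use you'd edit hunk.patch down to the wanted hunks first.
set -e
cd "$(mktemp -d)" && git init -q
git config user.email you@example.com && git config user.name you
printf 'a\n' > f && git add f && git commit -qm "base"

printf 'a\nb\n' > f                        # a working-tree edit
git diff > hunk.patch                      # grab the diff (then trim to taste)
git apply --cached --recount hunk.patch    # stage exactly those lines; worktree untouched
git diff --cached --stat                   # f now staged, no `git add -p` dance
```

`--recount` is what makes the hand-edited patch workable: git recomputes the hunk line counts instead of trusting the (now stale) header numbers.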
Worth noting that `git switch` and `git restore` were introduced (a few years ago now) as easier-to-use alternatives to `git checkout`. Even in `git status` you can see it's restore that is shown in the help messages. From the commands mentioned, it's missing (one of) my most often used one(s): `git diff`. What's the point of using version control if you never check what changes you've made?
> In my experience the circles of "complains about git" and "read the manual properly" barely touch
Yuuup. For anyone reading this who thinks of Git as a bunch of magic commands, go read all of gittutorial, gittutorial-2 and gitcore-tutorial. That last one is particularly important to read & follow along with. It gives you Git's underlying data model, which makes it much easier to reason about what the UI-level commands are doing.
I even used that knowledge once to recover my boss's lost data at a previous job! He had git-added a new file, and then lost it (git-clean/git-reset/something) before committing. I knew from reading that tutorial that git-add is sufficient to add it to git's object store, so we managed to find the right object using file timestamps and git-cat-file to retrieve it.
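That recovery can be reproduced in a scratch repo; `git fsck --unreachable` finds the orphaned blob and `git cat-file -p` prints it back out (file name and contents are made up):

```shell
# Throwaway repo: add a file, "lose" it before committing, then dig it back out.
set -e
cd "$(mktemp -d)" && git init -q
echo precious > notes.txt && git add notes.txt   # blob now lives in .git/objects
rm notes.txt && git read-tree --empty            # simulate the file being lost
blob="$(git fsck --unreachable 2>/dev/null | awk '/blob/ {print $3}')"
git cat-file -p "$blob"                          # prints "precious"
```

With many candidate objects (as in the story above), file timestamps under `.git/objects` help narrow down which blob to inspect.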
... although these days rebase is mostly handled on the server. By the time your PR is reviewed it's already behind main, so the server side needs to rebase it anyway before the merge can be done.
If you do rebase to squash commits, most projects prefer you didn't
> If you do rebase to squash commits, most projects prefer you didn't
I've been using rebase to fix bugs in earlier commits (on a feature branch that hasn't yet been merged into main) so that you don't end up with broken commits in the history.
I've also been using filter-branch to retroactively apply autoformatting to all commits in the branch.
If your work-in-progress files are trivial, it's easy to create a temp directory with a .gitignore with just a '*' in it and leave your files there.
If it's not so trivial, create a local temp branch so you can cherry-pick it or merge it back in later, or maybe convert your temp branch into a more enduring one.
Stashes are pseudo-commits with no branch context, and are more of a conceptual burden than they're worth.
`git stash` is useful, but the fact that by default `git status` doesn't show anything related to stashed changes makes it easy to forget you have them. You can replace the stash workflow with commit, branch, etc. and have one less thing to think about.
I mean, I can configure my shell so that isn't a problem in the first place:
-> ᛯ date >> asd
[23:57:10] ^ [/tmp/a] {master *}
-> ᛯ git stash
Saved working directory and index state WIP on master: bf5a0f8 fix
[23:57:13] ^ [/tmp/a] {master $}
-> ᛯ git pop
On branch master
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: asd
no changes added to commit (use "git add" and/or "git commit -a")
Dropped refs/stash@{0} (4f46bc00a3d6032c2726f04540a44656090e05ca)
[23:57:16] ^ [/tmp/a] {master *}
-> ᛯ
it's pretty non-missable (`git pop` is just an alias for `stash pop`)