These add no benefit to history, and actually provide an impediment to bisecting (since a lot of these intermediate revisions will not even compile).
At my previous job, we used gerrit. The nice thing about gerrit from my perspective is that it kind of "hid" all of the intermediary stages in the gerrit review. So you could push all you wanted to your gerrit review fake-branch thing, and when you finally pushed to master, there would just be a nice, clean atomic change for your feature. If you needed more detailed history of the steps during review, it was there in gerrit. But master was clean, and bisectable.
Is there any other git tool or workflow which both allows people to back things up to the central git repo AND allows squashing changes down to meaningful bits, AND does not lose the history of review iterations?
If you use feature branches, then it might help to
- rebase interactively to clean up/edit/remove commits that are not relevant before merging
- merge into master with the `--no-ff` flag - this forces Git to create _one_ merge commit, even when a fast-forward would have been possible
FWIW the two above can be used individually or together. The way I work (and many others I work with) is
- create a feature branch off master
- hack, commit, hack, commit
- rebase interactively to clean up history
- issue a PR on GitHub
- merge using GitHub (which under the covers does a `--no-ff` merge so you get ONE merge commit; see the sketch below)
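Locally, that flow boils down to something like this (branch names are made up, and the last two lines are just what GitHub's merge button does under the covers):

    # create a feature branch off master
    git checkout -b my-feature master
    # ... hack, commit, hack, commit ...
    git rebase -i master            # clean up history before the PR
    git push -u origin my-feature   # then open the PR on GitHub
    # GitHub's merge is roughly equivalent to:
    git checkout master
    git merge --no-ff my-feature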
Bisecting with this workflow is a bit more coarse-grained than with a workflow allowing fast-forward merges, b/c usually the closest I get is knowing that a PARTICULAR merge introduced a bug or a regression. That merge commit might have any number of commits that constitute it.
Hope that helps.
[Update 1] - If you use the CLI to merge, perhaps alias merge to `merge --no-ff` so you don't forget?
Or set it in your git config, under `[merge]`: `ff = no`
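i.e., a minimal sketch of both options (the alias name is made up; note git aliases can't shadow the built-in `merge` command, so a shell alias or a new name is needed):

    # make every merge create a merge commit, even when fast-forward is possible
    git config --global merge.ff false

    # or an explicit alias under a new name, leaving plain "git merge" untouched
    git config --global alias.mnf "merge --no-ff"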
I'm confused though -- I thought that if you rebased & squashed something after you pushed it, then it would confuse the git clients of anybody who had pulled before the squash?
Thanks so much for all the suggestions!
When a single developer is working on a feature in their own feature branch, and using that branch also as a backup (e.g. pushing "going to lunch" commits), there should be no need for anyone else to pull this branch while the work is unfinished and ongoing.
Well, they can pull it to e.g. take a look at the code, but as long as they don't contribute to the branch themselves, a later rebase does them no harm (and why would you base your work on someone else's "going to lunch" commit?).
If a group of people works on the same feature, then they should set up a "master-feature" branch, in addition to their personal branches.
But before I share that work with anyone, it needs to be squashed and then broken out into a logical progression of cleaned up commits with good descriptions that are bisect friendly.
By all means use the power of Git to give you great freedom while coding locally -- but don't push the resulting cruft to shared repos.
I personally prefer to rebase on a new branch, naming the new branches with a suffix in the form "-vN". Should something go wrong with the rebase, it is way simpler to reset the new branch to the head of the old one than to recover from the reflog. Nowadays I rely heavily on autosquash, interactive add, interactive rebase and Magit (which makes the latter two a breeze).
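Something like this, with hypothetical branch names:

    # rebase on a fresh copy so the old tip stays reachable by name
    git checkout -b myfeature-v2 myfeature
    git rebase -i --autosquash master

    # if the rebase goes sideways, resetting to the old branch beats the reflog
    git reset --hard myfeature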
a) Never change pushed history
b) Have the Pull Request act as a single source of what changed in each feature branch. Since Stash also includes comments from code review, we find it offers more context than straight commit history
c) Don't have to worry about team commit-early-commit-often vs. team commit-only-when-it-works-perfectly. The changes will all be centrally documented in the PR, no matter the individual styles
It seems pretty clear to me that the traditional git workflow is the one used by the Linux kernel, which is after all the reason why git exists in the first place. The traditional git workflow strikes the correct balance: rebase locally to create a logical sequence of meaningful commits, without squashing everything when your work gets published and merged.
A lot of the noise on the git-related threads here on HN would probably be avoided or at least reduced if everybody was actually aware of how the traditional git workflow - i.e. the Linux kernel workflow - works.
It turns out that the Github "Squash and Merge" option for Pull Requests does basically this – you review the PR as a bunch of separate commits; when you click "Squash and Merge", a single commit is created in the target branch with all the commits squashed, but you can still go back to the original closed and merged PR (e.g. by following a link in the commit message) to view the individual commits.
Obviously this doesn't help if you want to bisect among more granular commits, but (without having actually used it), it sounds like a good middle ground to me. The previous workflow I was used to involved squashing and force pushing to your branch, therefore overwriting the individual commits in the PR forever.
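For reference, the rough CLI analogue of that button (minus the link back to the PR) would be:

    git checkout master
    git merge --squash my-feature   # stage the whole branch as one pending change
    git commit                      # one commit on master; the branch's commits survive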
Or you can use git to track this with merge commits and it comes along wherever you go.
It's only helpful if the 'un-expanded' commit is used through the whole system, from git blame, to rebases, etc.
I'm rather new to the --first-parent flag myself but I think it works with anything that supports revision options.
Here's what I see in git 2.8.1:
usage: git blame [<options>] [<rev-opts>] [<rev>] [--] <file>
<rev-opts> are documented in git-rev-list(1)
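So since `--first-parent` is a rev-list option, something like this should work (the path is just an example):

    # blame through first-parent history only, so each merge reads as one change
    git blame --first-parent -- src/main.c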
This, unfortunately, is not a great thing to propose.
Many of git's operations on refs are basically O(n). You really don't want to accumulate an unbounded number of them over time.
I can see how, in the second workflow (feature branch workflow), this can become a problem, when the history is a chain of merge commits, each with one parent in master and one at the tip of a feature branch.
This gets more complicated in the third (gitflow) workflow, when all feature branches also have their own "feature master" branch and developer branches. Then all commits in the several "feature master" branches are merge commits, and all the commits in the main master are merges from those merge commits. Then a commit in the master branch has only merge commits as its parents, so how can you tell which one of the parents is the previous commit in the master, and which one is in a closed, finished feature branch?
If you could tell, then when bisecting or whatever, you could first just go back in history in the master, and only when needed, take a look at more details in the feature branch commits which contributed to the single merge commit in the master.
All this is solved by the named branches in Mercurial, where a commit forever carries a nametag for the branch it originally belonged to. You can add informal extra info to git commits in commit messages, so with some extra tooling you could make the history in these branching models usable again.
Then again, in the fourth workflow (forking workflow) when developers pull directly from each other, and there is no central repo, permanently tagging a commit with a branch name would make no sense. And this is the use case with Linux kernel work, for which git was originally designed.
> ... how can you tell which one of the parents is the previous commit in the master ...
You can use --first-parent  to disambiguate that. In a nutshell, the master branch in your example would be the first parent, letting you disambiguate (and there's support for this in a lot of other git tools!)
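For example (revs and branch names are illustrative):

    # each merge appears as a single step on the main line
    git log --first-parent --oneline master

    # recent Git (2.29+) can bisect at merge granularity too
    git bisect start --first-parent HEAD v1.0   # bad rev first, then good rev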
Someone needs to start enforcing proper commits, and a CI system needs to be set up to disallow broken commits.
You have a problem with commits here, not the lack of squashing. A messy history is only messy if you make it so.
> These add no benefit to history, and actually provide an impediment to bisecting (since a lot of these intermediate revisions will not even compile).
This is also a problem---every commit should compile, even if only for precisely the reason of bisecting.
Squashing commits is avoiding the problem; instead, perhaps educate the others on proper practices for committing, and enforce those practices.
No, that advice is wrong and not relevant to a git private branch. That strict & disciplined attitude about commits was relevant for older centralized source control tools with lock/checkout/checkin/unlock such as CVS and SVN. In that previous scenario, your colleagues depended on the shared repo to properly compile and therefore, you shouldn't "break the build" and derail the team.
>educate the others on proper practices for committing,
On a private branch, people should commit whenever they want on any whim of a reason. This will result in many commits that don't compile/build. That's ok. That's what the later step of "squash" into "logical" commits is for.
To repeat a previous comment about it:
The confusion is that the same "git commit" command is used for 2 very different semantic purposes:
(1) git commit -m "fixed bug #23984" --> as Logical-Unit-Work and worthy of bisect
(2) git commit -m "wip" --> as meaningless backup/savepoint like Ctrl+S save
The type (2) was for the programmer's internal purposes of safety backups, cleaning up whitespace, typos in comments, reflexive muscle memory of saving often, etc. Type (2) commits can have deliberate broken syntax and they're not meant to be built or be bisected.
Type (2) commits should never be discouraged because saving work often (including broken midstream work) is a good habit, but from the outsider's perspective of the reviewers upstream, they are way too noisy. The spurious commits could be less than 30 seconds apart with no compile/build step in between.
>Squashing commits is avoiding the problem
I hope it's now clear that "squashing" is the correct tool for Type 2 commits.
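A minimal sketch of folding Type (2) savepoints into a Type (1) commit before publishing (reusing the bug number from above):

    # collapse everything on the private branch since master into one commit
    git reset --soft master
    git commit -m "fixed bug #23984"

    # or pick and choose: mark the "wip" lines as squash/fixup in the editor
    git rebase -i master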
The concern was when that private branch isn't cleaned up before it is made public; that's when it becomes an issue.
Though see my reply to developer2 (sibling of your post) for rationale against garbage commits to begin with.
The problem is that using rebase to clean up your local commit log before sharing with others isn't the easiest thing to learn. Even once you supposedly "know what you are doing", it is still possible to make a mistake. Git has a lot of amazing functionality, but the majority of commands are not intuitive to use out of the box. Even GUI frontends to git don't manage to simplify the more complex commands all that much. I'd like to believe I'm an "intermediate" git user, but I run into issues often enough that I'm sure I overestimate my knowledge.
Most teams I've seen using git wind up using only the core commands (clone, commit, fetch/pull, merge, push), essentially using git as a drop-in replacement for svn without taking advantage of the additional possibilities git offers. Again, this is because becoming a git guru is a steep learning curve. It doesn't matter if 19 people on a project know everything about git; it only takes a single 20th person to make a tangled mess out of a centrally shared repository. I've spent many an afternoon working to rectify botched rebases and similar issues.
One example is forcing a push for a specific branch, knowing that the resulting destruction of history is desired and not harmful to others; only to forget to specify the branch name on the command line, which results in force pushing all branches. Whoops!
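Two guards against exactly that slip (remote and branch names assumed):

    # push only this branch, and only if nobody else pushed to it in the meantime
    git push --force-with-lease origin my-feature

    # make a bare "git push" touch only the current branch
    git config --global push.default current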
That's a dangerous practice. When you first commit the changes, you have all the necessary context, and hopefully flow hasn't been interrupted. If you rebase hours, days, maybe weeks later, then that context is completely lost---it's just like trying to get back into a project that amount of time later.
The reason the detail you'd put into your commits is so useful is because you have a perspective that others won't; if you return to it later, you'll be reading the diff and suffering just like others, albeit with a bit more knowledge.
Git commits are so fantastically cheap, and especially easy if it's a one second task via keyboard shortcut in your IDE rather than terminal commands. You wind up with a lot of commits throughout the day all related to accomplishing a simple task, and then rebase them down into a smaller number of meaningful chunks of work.
I agree that this should not be the workflow for larger units of work. If you're working on a week-long task, you would use this flow to refactor each day's 50 commits down to what you would have normally committed (maybe 1-5 commits for the day). This allows your feature branch's log to contain meaningful information. At the end of the week, when your feature branch is complete, you then have the additional choice as to whether the branch's already-more-compact history is meaningful; if not, you can further squash the merge commit.
>> When you first commit the changes, you have all the necessary context
Rebasing your local changes before sharing is for the case where you're spamming loads of meaningless commits that don't have - or deserve - their own context.
But since context has been lost, that "useful" commit might just be a single, squashed one; I've observed this situation at work from others.
So, this is a situation where those users are digging their own hole.
Also, if I am working on something very complex, I might go hours or even days with code that can't compile. I don't want to risk losing that because I was avoiding committing.
A lot of these problems can be resolved with the proper use of `--fixup`, `--squash`, and autosquash rebasing.
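For instance (the hash is a placeholder):

    git commit --fixup=abc1234         # records "fixup! <subject of abc1234>"
    git commit --squash=abc1234        # like fixup, but keeps a message to merge in
    git rebase -i --autosquash master  # reorders and folds them into their targets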
I have also pushed for the purpose of pulling on another PC; that commit then gets `reset --soft`'d.
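i.e. something like (branch and message made up):

    git commit -am "wip: moving machines" && git push   # on PC 1
    git pull                                            # on PC 2
    git reset --soft HEAD~1   # drop the wip commit, keep the work staged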
Seriously, I think that the cost/benefit ratio of a perfect history is so terrible that there are very very few people that follow through with it.
If one submits a PR and it has a messy history, what is the reviewer supposed to do? Make them go through the commits and clean them up? How will the reviewer verify that every commit compiles? Make them squash it? We are back to square one. If you only have squash commits from PRs and run CI on them, you are sure that every commit compiles.
That's up to the project. Some do, yes, for the reasons that I described---it's much easier to review a series of patches that can be comprehended, and has benefits later on (e.g. bisecting). If the patch is large from a contributor, and a bisect arrives at that commit, and the contributor is not the one doing the debugging, it can be a frustrating and inefficient experience. I've wasted many nights on something that could have otherwise been immediately obvious.
> If you only have squash commits from PRs and run CI on them, you are sure that every commit compiles.
It's not difficult to loop through commits and make sure they compile.
In fact, bisect can do it for you, if you aren't fond of command-line loops.
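A sketch of both, assuming a make-based build and made-up branch names:

    # try to build every commit on the feature branch
    for rev in $(git rev-list --reverse master..my-feature); do
      git checkout --quiet "$rev"
      make || echo "does not build: $rev"
    done
    git checkout --quiet my-feature

    # or let bisect drive the build; make's exit status marks commits good/bad
    git bisect start my-feature master   # bad rev first, then good rev
    git bisect run make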
Where I work, we don't enforce that each commit builds---it's known to be a good practice, and they're going to have someone flip out on them if they're trying to debug something and have to skip a dozen commits when bisecting (incidentally, I had that problem today).
In that case, it's a cultural thing. If someone consistently commits code that doesn't build, then they should be addressed.
If it's a random contributor to a project, more care should probably be taken. If building each commit isn't feasible, then maybe only building modified files (that is, a normal `make` for example rather than a fresh workspace) is better than nothing. If that's too long, maybe build a few sample commits. Etc.
In any case, even if you don't build each commit, the history is still useful.
I find that especially important for bisecting, and on that merit alone (I have other reasons as well) reject squashing commits---it makes bisecting useless for large changes. When someone commits what you can better call a project, you're going to struggle, even if it's easy to comprehend the code. Same goes for review requests---we encourage our team to post small reviews, and all of us get frustrated when we have to try to grok a 500+ line diff (unless those lines are entirely new or entirely removed large hunks).
At the end of the day, my WIP is usually quite small, with some exceptions. Even on complicated changes---much of that planning is on paper (or mental paper) or writing test cases. WIP can be hard to get back into when flow is broken, let alone the next day, or after the weekend.
It feels to me that "what really happened" only really applies to "what did we release?" rather than "how did we get to what we released?". Completely agreed that every release should be tracked in version control without modifications, but I'm skeptical that auditors care that you forgot to run the linter before committing and then you did a follow-up commit to add a semicolon where one was missed, but all of that happened between releases.
To avoid cluttering the main repo but still have a backup, you can just use a fork (even just a directory on a remote server).
The next day, do a "reset --mixed" from the tip of your backup branch back to your last good commit on the feature branch. This brings all of the remote backup code into your working copy. From there, start work again and begin making staged commits of atomic changes.
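Concretely, something like this (branch names hypothetical):

    # end of day: dump everything onto a throwaway backup branch and push it
    git checkout -b wip-backup
    git commit -am "wip" && git push -u origin wip-backup

    # next day: rewind to the last good commit; the diff lands in the working tree
    git reset --mixed feature
    git checkout feature   # same commit now, so the changes carry over
    git add -p             # stage atomic hunks
    git commit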
In a private environment squashing history is the precise opposite of what I want. I want immutable history. Anything anyone ever checks in is there forever. Safe and secure. Impossible to lose. Impossible to screw up.
Furthermore, I want all the changes they made along the way to their feature. Because lord knows there will be a moment down the road where there's a line of code that doesn't quite make sense. And I'll want _full_ history to understand where that line came from. See how it evolved.
Git still doesn't have anything as good as p4 timelapse view. Which is deeply unfortunate. That's a great tool for spelunking the past.
This is what code reviews and code comments are for. You shouldn't need to dig into the nitty-gritty of multiple commits of a single feature to understand a single line. Written once, read hundreds of times—right?
> And I'll want _full_ history to understand where that line came from. See how it evolved.
So you want a full keystroke history as well, then? Because as far as I'm concerned, this is essentially what an unsquashed commit history is.
Don't get me wrong, I'm not saying you should have 15,000 line commits that encompass a single feature—these lines should have made their way back into the code a long time ago—but seeing dozens of one-line changes is useless: "fix ci by including dependency"; "&& instead of ||"; "tidy up style"; "strings should be UTF-8"—and these are just the useful names for useless commits, to say nothing of commits like "..."; "shit"; "fixes stuff"; "work"; "progress".
These commits just fragment the history, and especially git blame, because it becomes difficult to see many changes cleanly wrapped into a single, logically grouped block.
The obvious answer is to squash commits, but keep the information. Intermediate commits aren't needed very often. But they can be exceptionally useful when they are needed. Especially if the commits were from an employee who is no longer working at your private company.
Git still needs a p4 timelapse view tool. It's wonderful. All that tool needs is a little check box called "expand squash". That's the best of both worlds. Everyone gets what they want. But that tool and checkbox don't exist. So given the choice between squashed vs unsquashed I'll take unsquashed 100% of the time. At a private company that is. Open source projects, especially large projects, might choose differently.
> In a private environment squashing history is the precise opposite of what I want. I want immutable history. Anything anyone ever checks in is there forever. Safe and secure. Impossible to lose. Impossible to screw up.
> Furthermore, I want all the changes they made along the way to their feature. Because lord knows there will be a moment down the road where there's a line of code that doesn't quite make sense. And I'll want _full_ history to understand where that line came from. See how it evolved.
> Git still doesn't have anything as good as p4 timelapse view. Which is deeply unfortunate. That's a great tool for spelunking the past.
Seems like this is an unpopular opinion here. I can only imagine the downvoters are Linux kernel developers? With 600k+ commits, they have some standing to make rules forcing people to squash commits.
Of course nobody can force (for example) the discourse project to accept unsquashed commits. Any project is free to refuse contributions for any reason. Who am I to say anything if a project won't accept my pull request because of my gender or race, or because I am a moron who uses spaces in some places and tabs in others? As I've said before, a developer is free to squash their own commits before they push. I will indulge in a lot of bike-shedding (the latest of which has been GPG signing commits).
I will say though that there is some merit to the idea of the people who want to force squashing down our throats. Firstly, I'd like to present my point of view: if we keep all commits, then we have all commits. We can drill down the list of commits if we ever need to do so. Why would I want to throw away information? Unlike the kernel project, which is 1.6 GB of source code and 600k+ commits just on torvalds/master, we will probably never have this issue.
This is not to say I think the opposite idea is completely without merit though. I understand the opposition to keeping all commits. In keeping all commits, you could argue that we are actually throwing away information because we can keep only so much information in the "staging area" of our brains. If we just start using commit as an alternative to ctrl + s, we might be throwing away information because we will lose the context of what we were thinking when we made that commit when we come back and look at it in six years.
I'm a big fan of Uncle Bob. I don't always do TDD but I think only committing once unit tests pass locally is a good compromise here.
Also, would the downvoters please care to elaborate why they downvoted forestthewoods?
I specified "private environment" for good reason!
I readily accept that source control needs for large, public projects is unique. I think it's important to recognize that Git was made for a very particular purpose. Other environments, such as private work environments, have very different wants and needs.
* saves the full history of work
* shows only meaningful history on master
* keeps the history of pull requests
I've never done this, and I'm only working from approximation, but you actually made me curious enough to think about it. I must admit that I'd seriously like the more successful git hosting platforms to consider point 3, because the history of a given pull request is impossible to follow (you never know if the committer added a new commit, or if they amended the last commit).
1. Hack and commit along the way
    [ASCII history graph: several parallel feature branches merging back into master]
Unfortunately, standard git isn't smart enough to show two parallel branches as truly parallel; it tries to squeeze the graph into as few parallel lanes as it can, meaning they all needlessly look intertwined. Moreover, it requires manual fiddling in merge commits to properly tag them and edit their messages for them to be meaningful, but nothing that can't be easily programmed.
We're a small shop with only 1 or 2 folks working on the code at a time, and mostly are using git as part of our deploy process... so simplest is the best for us.
And if you're going to backport to historical releases and hotfix and/or patch them, it's better to fork off master and then cherry-pick back, and then eventually end those branches.
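e.g. (release branch name and SHA are illustrative):

    # the fix lands on master first, then gets carried back
    git checkout release-1.2
    git cherry-pick abc1234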
As an example: I've been working on a new spike for the past 2 weeks with one other developer. Maybe 10 times a day we'll need something that the other person has committed, so we work against one branch (master). The workflow suits this extremely rapid iteration.
One repo has now matured to the point where developer branches make sense. We created "develop" on it as well as our own branches off that. We're not close to a v0.1 yet - but we'll be evolving to git flow the minute we want to ship.
Eventually as more devs join, we'll need the full-blown PR workflow, that also naturally stems from its predecessor.
There's a "meta-workflow" here, which implies which workflow to use.
Are you kidding? :)
CVS, Subversion, Perforce, and some proprietary revisioning systems were a HUGE PITA. People used to lock files, so no one else could change them while they were working on them. Merging different feature branches was like locking the whole repository with a huge lock, so no one else could touch anything at the time.
Any time I had to work with legacy revisioning systems I wanted to kill myself. Git workflows are conventions to help improve small issues with stuff that wasn't even possible before distributed VCS. And a lot of these problems are superficial ("oh, those merge commits are annoying in `git log`").
It reminds me: https://www.youtube.com/watch?v=ZFsOUbZ0Lr0
I (have to) use one of these systems you speak of without the horrors of having to worry about merge histories or organizing any branches whatsoever.
I have to worry that someone in California broke some code, somewhere, right before lunch, and it'll take me as long to work around as it will to just take a second lunch.
I have to worry that if I miss a release because of QA issues I need to spend a bunch of time reversing all my changes temporarily and then redoing them in a new release and hoping nothing got borked in between.
I have to worry about how to show someone out of the office what my changes look like because they're still too janky to push into the release branch.
If I'm working from home and Comcast decides I'm due for my weekly outage then I'm SOL.
I have to worry about a failed commit going and locking files on other people because apparently I'm the only person ever to run a command that fails (granted, PEBCAK), so let's not worry about making that failure impact a centralized shared state in a negative way.
You can spend a lot of time thinking about the best way to do something in git because it lets you do a lot. The choice is essentially "Elaborate workflows" vs "bridge the inadequacies of the tooling by brittle ritualized human processes".
Git problems are VCS' first-world problems.
They generally don't exist with Git, either, if you limit yourself to the same set of features and capabilities that other revision control systems have.
Take Microsoft Team Foundation version control for example. All of the "problems" with Git disappear if you use a workflow that gives you the same features as TFVC and have the same expectations (especially around history): collaborate around a single shared repo instance, push every commit when it's made, and allow fast-forward merges only. Conflict? Rebase locally.
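A sketch of that configuration, using standard Git options:

    git config pull.ff only   # a plain "git pull" refuses to create merge commits
    git pull --rebase         # on divergence, rebase locally instead
    git push origin master    # push every commit to the single shared repo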
Personally, I think a good idea for a Git tutorial would be to recreate the workflow of another version control product in Git, in order to introduce some of the new ideas in a familiar way.
Subversion instead is a tree with projects as nodes toward the leaves, each project with its own trunk, branch, tags. It's each of these projects that corresponds to a git repo. Teams I worked on always treated each project as its own "repo" ... so the central single Subversion tree became like our 'github' or 'bitbucket' ... and one could do all the lockless branching within each project, no problem. YOU COULD BE AS NON-LINEAR IN THIS APPROACH AS YOU NEED TO BE, with full support for branching, tagging, merging, etc.
Where Subversion was much better was in supporting consolidated views of multi-project build / release environments, or in mixing sub-project code into parent-project code. Using svn:externals it was easy to put "that subproject in this place in my own project". Using git submodules and other approaches is a pain. You end up having to check out a bunch of git repos and manage your own glue.
Over time, my workflow has become simpler and simpler. I've worked with some weird and wacky workflows before, which have been born from a given requirement, such as quick deployment to a number of different environments, or two separate teams working on separate parts of one codebase while maintaining separate CI workflows. Some of these workflows have seemed absolutely mental, but I've seen them several times over in different places, so there must be some kind of logic to the madness.
Different dev teams have wildly different practices, so it'd be good to acknowledge the "typical" way of doing things, and embrace the workflows that work if you need to do something out of the ordinary.
Without a firm grasp of one's intent (workflow), learning git commands is pointless and leads to people desperately flailing at commands.
What do you think?
Much simpler, opens the door for feature toggles, continuous delivery and more without any merge headaches.