Hacker News
Comparing Git Workflows (atlassian.com)
268 points by AJAlabs on May 26, 2016 | 101 comments



One of the things I hate about the traditional git workflows described there is that there is no squashing, and the history is basically unusable. We have developers where I work who use our repo as a backup. Then, when things are merged to master, the history is littered with utter garbage commits like the following: "commit this before I get on the plane", "whoops, make this compile", "WTF?"

These add no benefit to history, and actually provide an impediment to bisecting (since a lot of these intermediate revisions will not even compile).

At my previous job, we used gerrit. The nice thing about gerrit from my perspective is that it kind of "hid" all of the intermediary stages in the gerrit review. So you could push all you wanted to your gerrit review fake-branch thing, and when you finally pushed to master, there would just be a nice, clean atomic change for your feature. If you needed more detailed history of the steps during review, it was there in gerrit. But master was clean, and bisectable.

Is there any other git tool or workflow which allows people to back things up to the central git repo AND allows squashing changes down to meaningful bits, AND does not lose the history of review iterations?


Do you use feature branches, or does everyone work off `master`? (From your comment it seems like you do.)

If you use feature branches, then it might help to

- rebase interactively to clean up/edit/remove commits that are not relevant before merging

- merge into master with the `--no-ff` flag - this forces Git to create _one_ merge commit, even when a fast-forward merge would have been possible

FWIW the two above can be used individually or together. The way I (and many others I work with) work is:

- create a feature branch off master

- hack, commit, hack, commit

- rebase interactively to clean up history

- issue PR on Github

- Merge using Github (which under the covers does a `--no-ff` merge so you get ONE merge commit)

Bisecting with this workflow is a bit more coarse-grained than with a workflow that allows fast-forward merges, b/c usually the closest I get is knowing that it was a PARTICULAR merge that introduced a bug or a regression. That merge commit might have any number of commits that constitute it.
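To make the fast-forward vs. `--no-ff` distinction concrete, here is a toy Python model of a commit graph. It is a sketch, not Git's implementation. `Commit`, `is_ancestor` and `merge` are hypothetical names, and commits are identified by name only. A real merge of diverged branches also combines file contents, which this model ignores.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Commit:
    """A toy commit: a name plus its parent commits (first parent first)."""
    name: str
    parents: Tuple["Commit", ...] = ()


def is_ancestor(ancestor: Commit, descendant: Commit) -> bool:
    """True if `ancestor` is reachable from `descendant` by following parents."""
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit.name == ancestor.name:
            return True
        if commit.name not in seen:
            seen.add(commit.name)
            stack.extend(commit.parents)
    return False


def merge(target: Commit, feature: Commit, no_ff: bool = False) -> Commit:
    """Return the new tip of the target branch after merging `feature` into it."""
    if is_ancestor(feature, target):
        return target  # already up to date: nothing to do
    if not no_ff and is_ancestor(target, feature):
        return feature  # fast-forward: the branch pointer simply moves
    # --no-ff (or diverged history): record a merge commit whose first parent
    # is the old target tip and whose second parent is the feature tip
    return Commit(f"Merge {feature.name}", (target, feature))


# master is at A; the feature branch adds B and C on top of it
A = Commit("A")
B = Commit("B", (A,))
C = Commit("C", (B,))

print(merge(A, C).name)              # fast-forward: master now points at C
print(merge(A, C, no_ff=True).name)  # a new merge commit is created instead
```

With `no_ff=True` the old master tip stays the first parent of the merge commit. That is what keeps the whole feature grouped under one commit on master.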

Hope that helps.

[Edit: Formatting]

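[Update 1] - If you use the CLI to merge, perhaps set up an alias for merge --no-ff so you don't forget? (Git ignores aliases that shadow built-in commands, so the alias needs a name other than `merge`.)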


Re --no-ff, you can flip the behavior so that you need --ff when you really want to fast forward by adding this config:

    [merge]
      ff = no


We use feature branches, using atlassian stash rather than github.

I'm confused though -- I thought that if you rebased & squashed something after you pushed it, then it would confuse the git clients of anybody who had pulled before the squash?

Thanks so much for all the suggestions!


It does confuse anyone who has pulled before the squash, but if you're working on feature branches, there shouldn't be anyone else working on the branch. If the feature is too large for one person (say you have 5 people working on a feature branch), then each of those 5 people should work on their own branch off the feature branch, for the same reason.


> I thought that if you rebased & squashed something after you pushed it, then it would confuse the git clients of anybody who had pulled before the squash?

When a single developer is working on a feature in their own feature branch and is also using that branch as a backup (e.g. pushing "going to lunch" commits), there should be no need for anyone else to pull this branch while the work is unfinished and ongoing.

Well, they can pull it to e.g. take a look at the code, and that's fine as long as they don't contribute to the branch themselves (and why would you base your work on someone else's "going to lunch" commit?).

If a group of people works on the same feature, then they should set up a "master-feature" branch, in addition to their personal branches.


One of the pure joys of working with Git is being able to commit early and commit often. I'll often commit 100 times in a day, and not always with cogent commit messages. The freedom to make a mistake or a false start and know you can rewind to an earlier point is very liberating.

But before I share that work with anyone, it needs to be squashed and then broken out into a logical progression of cleaned up commits with good descriptions that are bisect friendly.

By all means use the power of Git to give you great freedom while coding locally -- but don't push the resulting cruft to shared repos.


I use the index as a sort of quasi-commit, which lets me effectively step back one revision if needed. I find this to be a lighter-weight option than many commits that may be meaningless, while providing probably 90% of what those commits would give me throughout the day. As a result I typically make 4-5 actual commits but may have had many transient points at which I could have - and sometimes did - revert a set of changes.


To avoid this, the team could agree on which branches are OK to rebase, maybe adopting a naming scheme that makes clear which branches might pull the rug out from under you (e.g. branches namespaced with the developer's username, like drewg123/fix-for-bug).

I personally prefer to do my rebases on a new branch, naming the new branches with a suffix in the form "-vN". Should something go wrong with the rebase, it is way simpler to reset the new branch to the head of the old one than to recover from the reflog. Nowadays I rely heavily on autosquash, interactive add, interactive rebase and Magit (which makes the latter two a breeze).


Usually, a feature branch is a branch that only a single person works on. So other git clients aren't part of the equation; there is only you. As such, rewriting history/git push --force is allowed there.


We also use Stash and just enforce that PR descriptions should be the "clean" version of your combined commit messages. This way you:

a) Never change pushed history

b) Have the Pull Request act as a single source of what changed in each feature branch. Since Stash also includes comments from code review, we find it offers more context than straight commit history

c) Don't have to worry about team commit-early-commit-often vs. team commit-only-when-it-works-perfectly. The changes will all be centrally documented in the PR, no matter the individual styles


I haven't mastered rebasing yet (interactive or otherwise), but I've started creating feature branches off master, committing (more than once if needed), then checking out master, running `git merge --squash feature/<branch>` and committing the result as a single commit. Seems alright so far.
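Conceptually, `git merge --squash` takes the net effect of all the branch's commits and stages it without committing. The commit you then make has a single parent and no link back to the branch. Below is a minimal Python sketch of that folding. It assumes each commit is modelled as a dict mapping a path to its new content (`None` meaning deleted), and that master has not moved since the branch was created. `squash` and `apply_change` are hypothetical helpers, not Git APIs.

```python
from typing import Dict, List, Optional

Change = Dict[str, Optional[str]]  # path -> new content, None = file deleted
Tree = Dict[str, str]              # path -> content


def squash(changes: List[Change]) -> Change:
    """Fold per-commit changes (oldest first) into one combined change.

    Later commits win, so intermediate states such as "WIP" drafts vanish.
    """
    combined: Change = {}
    for change in changes:
        combined.update(change)
    return combined


def apply_change(tree: Tree, change: Change) -> Tree:
    """Return a new tree with `change` applied (the input tree is not modified)."""
    result = dict(tree)
    for path, content in change.items():
        if content is None:
            result.pop(path, None)
        else:
            result[path] = content
    return result


master = {"README": "v1"}
feature_commits = [
    {"app.py": "draft"},           # "commit this before I get on the plane"
    {"app.py": "print('hello')"},  # "whoops, make this compile"
    {"notes.txt": "todo"},         # a scratch file...
    {"notes.txt": None},           # ...removed again
]

one_change = squash(feature_commits)
print(apply_change(master, one_change))
```

The result is the same final tree as replaying the four commits one by one. Only the intermediate states disappear.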


I find it mildly amusing what you're calling "traditional git workflows".

It seems pretty clear to me that the traditional git workflow is the one used by the Linux kernel, which is after all the reason why git exists in the first place. The traditional git workflow strikes the correct balance between locally rebasing to create a logical sequence of meaningful commits without squashing everything when your work gets published and merged.

A lot of the noise in git-related threads here on HN would probably be avoided, or at least reduced, if everybody was actually aware of how the traditional git workflow (i.e. the Linux kernel workflow) works.


I completely agree. The Linux kernel workflow is the only way to use git effectively and the history is always useful and each commit is nice and self-contained. Some people apparently don't understand that "no squashing of patchsets" means that the merger shouldn't squash a patchset (not that the developer can't make the patchset clean) and that "no rewriting history" refers to master once things have been merged into it.


It's not the only way to use git effectively, but it's definitely the canonical one. I think a Chesterton's fence rule should apply to git: if you're doing something different from LKML, you should be able to clearly explain exactly why. No linking to some blog post you found, either, unless it's by Junio Hamano (and you still have to explain, you can't just link).


Are there any reasonably concise guidelines or posts about how the Linux kernel developers use git?



I wonder if 'expandable commits' would be useful. In the full history, a change (with all its code review fixes, etc.) would appear as one commit, but if you wanted to dig deeper, you could 'expand' that commit into all its gory details: 'draft version before review', 'tried refactoring this part but gave up', etc.


I thought the very same thing, as I can see the benefits to both having a single atomic commit for ease of reading history, and to being able to see the individual commits that made it up to avoid losing valuable information about changes.

It turns out that the Github "Squash and Merge" option for Pull Requests does basically this. You review the PR as a bunch of separate commits. When you click "Squash and Merge", a single commit is created in the target branch with all the commits squashed, but you can still go back to the original closed and merged PR (e.g. by following a link in the commit message) to view the individual commits.

Obviously this doesn't help if you want to bisect among more granular commits, but (without having actually used it) it sounds like a good middle ground to me. The previous workflow I was used to involved squashing and force pushing to your branch, therefore overwriting the individual commits in the PR forever.


Now your history is in two places and can't be inspected from a cli. If you ever stop paying github (for a private repo) or github goes out of business your data is lost.

Or you can use git to track this with merge commits and it comes along wherever you go.


I think you can do that if you merge with --no-ff, then use git log --first-parent. Only the merge commit will appear on the list, hiding the commits from the merged branch.
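Here is a toy model of what `--first-parent` does. Starting at the tip, it follows only each commit's first parent, which for a `--no-ff` merge made on master is the previous master commit. The `Commit` class and both functions are illustrative assumptions, not Git internals.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Commit:
    name: str
    parents: Tuple["Commit", ...] = ()


def first_parent_log(tip: Commit) -> List[str]:
    """Like `git log --first-parent`: walk first parents only, newest first."""
    names = []
    commit = tip
    while commit is not None:
        names.append(commit.name)
        commit = commit.parents[0] if commit.parents else None
    return names


def reachable(tip: Commit) -> Set[str]:
    """Every commit a plain `git log` would show (ordering ignored here)."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit.name not in seen:
            seen.add(commit.name)
            stack.extend(commit.parents)
    return seen


# master: A, then a --no-ff merge M of the feature branch B-C-D, then E
A = Commit("A")
B = Commit("B", (A,))
C = Commit("C", (B,))
D = Commit("D", (C,))
M = Commit("M", (A, D))  # first parent A (master), second parent D (feature)
E = Commit("E", (M,))

print(first_parent_log(E))   # the merge stands in for the whole feature
print(sorted(reachable(E)))  # the detailed history is still there
```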


Does git blame let you see the same thing? I don't see --first-parent as an argument to it.

It's only helpful if the 'un-expanded' commit is used through the whole system, from git blame, to rebases, etc.


I have used git blame with --first-parent to get the 'un-expanded' commit identifier in the blame view.

I'm rather new to the --first-parent flag myself but I think it works with anything that supports revision options.

Here's what I see in git 2.8.1:

    usage: git blame [<options>] [<rev-opts>] [<rev>] [--] <file>
        <rev-opts> are documented in git-rev-list(1)

And --first-parent is listed as an option here: https://www.git-scm.com/docs/git-rev-list


This is exactly what merge commits are.


Phabricator does this, effectively. The Diff URL is in the commit message for the squashed commit to master; you can go there and see all its constituent commits in their gory detail.


A couple weeks ago GitHub added merge squash as an option for accepting pull requests (via the UI, you could always do it manually). As long as you don't delete the feature branch you'll still have the history of iterations in that branch.


> As long as you don't delete the feature branch

This, unfortunately, is not a great thing to propose.

Many of git's operations on refs are basically O(n). You really don't want to accumulate an unbounded number of them over time.


I agree with you, but I was responding to the request to keep "all the history". My understanding is the ref count would be the same if you merge in all the history or keep the branches around, so in those cases it wouldn't make a difference. I could be wrong though -- I don't know enough about git internals to know if one is worse than the other.


TAKE HEED! This is the solution to everything


> history is basically unusable.

I can see how, in the second workflow (feature branch workflow), this can become a problem. If the history looks like this

        ---------
       /         \
    --A           E--
       \         /
        B---C---D

then we can sort of guess that commit B started a feature branch and E is the merge commit that merged the feature back to master. And commits B, C and D can be "unusable" intermediate work, maybe even uncompilable. But suppose nobody ever works directly on master and all development is done in feature branches. Then, when E has two parents, A and D, and A is a merge commit and D is not, you know that A is on the master branch and D was on a finished feature branch.

This gets more complicated in the third (gitflow) workflow, when all feature branches also have their own "feature master" branch and developer branches. Then all commits in the several "feature master" branches are merge commits, and all the commits in the main master are merges of those merge commits. A commit on the master branch then has only merge commits as its parents, so how can you tell which of the parents is the previous commit on master and which one is on a closed, finished feature branch?

If you could tell, then when bisecting or whatever, you could first just go back in history in the master, and only when needed, take a look at more details in the feature branch commits which contributed to the single merge commit in the master.

All this is solved by named branches in Mercurial, where a commit forever carries a name tag for the branch it originally belonged to. In git you can add informal extra info in commit messages, so with some extra tooling you could make the history in these branching models usable again.

Then again, in the fourth workflow (forking workflow) when developers pull directly from each other, and there is no central repo, permanently tagging a commit with a branch name would make no sense. And this is the use case with Linux kernel work, for which git was originally designed.


> ...this is solved by the named branches in Mercurial...

I'm not super-familiar with Mercurial (and only somewhat-familiar with git), but couldn't you get the same effect as named branches by just not deleting fully merged branches in git?

> ... how can you tell which one of the parents is the previous commit in the master ...

You can use --first-parent [1] to disambiguate that. In a nutshell, the master branch in your example would be the first parent, letting you disambiguate (and there's support for this in a lot of other git tools!)

[1]: https://git-blame.blogspot.ca/2012/03/fun-with-first-parent....


The --first-parent history can be ruined pretty easily by an inexperienced Git user. They'll use their local master as their feature branch for days and then they'll try to push it. Git will unhelpfully tell them to first do a "git pull" before pushing. So they do, which leaves them with a bullshit "merged master into master" merge commit, and then they push that shit. That guarantees that --first-parent will always omit the actual history of where origin/master was and instead take a trip through this dude's little feature adventure.
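The failure mode above can be reproduced in a toy model. The names are hypothetical, and real Git also merges file contents. A `git pull` that merges records the local `master` tip as the first parent and the fetched `origin/master` as the second, so after the push, first-parent history runs through the local work. Rebasing the local commits onto `origin/master` instead (for example with `git pull --rebase`) keeps the already-published commits on the first-parent chain.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Commit:
    name: str
    parents: Tuple["Commit", ...] = ()


def first_parent_log(tip: Commit) -> List[str]:
    """Walk first parents only, like `git log --first-parent`."""
    names, commit = [], tip
    while commit is not None:
        names.append(commit.name)
        commit = commit.parents[0] if commit.parents else None
    return names


base = Commit("A")
# what everyone else pushed to origin/master in the meantime
o1 = Commit("O1", (base,))
o2 = Commit("O2", (o1,))
# days of feature work committed directly on the local master
l1 = Commit("L1", (base,))
l2 = Commit("L2", (l1,))

# `git pull` (merge): first parent = local HEAD, second parent = origin/master
pulled = Commit("Merge origin/master", (l2, o2))

# alternative: replay L1 and L2 on top of origin/master as new commits
l1_rebased = Commit("L1'", (o2,))
l2_rebased = Commit("L2'", (l1_rebased,))

print(first_parent_log(pulled))      # O1 and O2 are skipped
print(first_parent_log(l2_rebased))  # origin/master's history stays on the chain
```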


Yikes. Fair enough; that would definitely trip it up!


Commit messages are very rarely read. I'd say it's well worth committing anything that compiles (that can only help bisection, which is a much more important use for history than human readability). Even a non-compiling commit isn't really a hindrance - if it doesn't compile mark it as unknown (you're using an automated bisect script that does that, right?), and you end up with the same result as if that commit hadn't been made.
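To illustrate the "mark it as unknown" idea: `git bisect run` treats a script exit code of 125 as "this commit cannot be tested" and skips it. The sketch below is a simplified Python model of that behaviour over a linear history. It is not Git's actual algorithm, which also handles merges and chooses replacement commits differently. `bisect_with_skips`, `check` and the commit names are hypothetical. When skipped commits sit right next to the culprit, the answer becomes a short range rather than a single commit.

```python
from typing import Callable, List


def bisect_with_skips(commits: List[str], test: Callable[[str], str]) -> List[str]:
    """Find the first bad commit in a linear history.

    commits[0] must be known good and commits[-1] known bad.
    test(commit) returns "good", "bad" or "skip" (e.g. it does not compile).
    Returns the candidates for the first bad commit: one element is an exact
    answer, several mean that skipped commits made the result ambiguous.
    """
    good, bad = 0, len(commits) - 1
    skipped = set()
    while True:
        untested = [i for i in range(good + 1, bad) if i not in skipped]
        if not untested:
            return commits[good + 1:bad + 1]
        mid = untested[len(untested) // 2]
        verdict = test(commits[mid])
        if verdict == "skip":
            skipped.add(mid)
        elif verdict == "good":
            good = mid
        else:
            bad = mid


history = [f"c{i}" for i in range(8)]  # c5 introduced the bug


def check(commit: str, broken_builds=("c3", "c4")) -> str:
    """Stand-in for a build-and-test script."""
    if commit in broken_builds:
        return "skip"  # a `git bisect run` script would exit with 125 here
    return "bad" if int(commit[1:]) >= 5 else "good"


print(bisect_with_skips(history, check))                                 # a range
print(bisect_with_skips(history, lambda c: check(c, broken_builds=())))  # exact
```

If c3 and c4 had been squashed into c5, the first search would also end at that one commit. Unbuildable commits cost some precision, but they do not derail the search.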


I am always reading commit messages to find out why such-and-such a thing changed. I agree that making frequent commits is really great, but I always want to see every commit have a meaningful message, either at the time it is made or in cleanup before pushing to the remote branch.


Gerrit by itself won't fix the problem you are describing. It is still possible for a commit to pass code review and not compile at all.

Someone needs to start enforcing proper commits, and a CI system needs to be set up to disallow broken commits.


> the history is littered with utter garbage commits like the following: "commit this before I get on the plane" "whoops, make this compile" "WTF?"

You have a problem with commits here, not the lack of squashing. A messy history is only messy if you make it so.

> These add no benefit to history, and actually provide an impediment to bisecting (since a lot of these intermediate revisions will not even compile).

This is also a problem---every commit should compile, even if only for precisely the reason of bisecting.

Squashing commits is avoiding the problem; instead, perhaps educate the others on proper practices for committing, and enforce those practices.


>--every commit should compile,

No, that advice is wrong and not relevant to a git private branch. That strict & disciplined attitude about commits was relevant for older centralized source control tools with lock/checkout/checkin/unlock such as CVS and SVN. In that previous scenario, your colleagues depended on the shared repo to properly compile and therefore, you shouldn't "break the build" and derail the team.

>educate the others on proper practices for committing,

On a private branch, people should commit whenever they want on any whim of a reason. This will result in many commits that don't compile/build. That's ok. That's what the later step of "squash" into "logical" commits is for.

To repeat a previous comment about it:

The confusion is that the same "git commit" command is used for 2 very different semantic purposes:

(1) git commit -m "fixed bug #23984" --> as Logical-Unit-Work and worthy of bisect

(2) git commit -m "wip" --> as meaningless backup/savepoint like Ctrl+S save

The type (2) was for the programmer's internal purposes of safety backups, cleaning up whitespace, typos in comments, reflexive muscle memory of saving often, etc. Type (2) commits can have deliberate broken syntax and they're not meant to be built or be bisected.

Type (2) commits should never be discouraged, because saving work often (including broken midstream work) is a good habit. But from the outsider's perspective of the reviewers upstream, they are way too noisy. The spurious commits could be less than 30 seconds apart with no compile/build step in between.

>Squashing commits is avoiding the problem

I hope it's now clear that "squashing" is the correct tool for Type 2 commits.


> On a private branch, people should commit whenever they want on any whim of a reason.

The concern was when that private branch isn't cleaned up before it is made public; then it becomes an issue.

Though see my reply to developer2 (sibling of your post) for rationale against garbage commits to begin with.


But if I commit on a whim, e.g. before jumping on the plane, how can I then split the changes into two commits?


I'll often use WIP commits at the end of the day, or when context switching to something else, then when I'm ready to carry on I reset back to the previous commit and it's like nothing happened.


git rebase -i master


The nice thing about git is that your commits can be garbage. Until you push (or, much less commonly on most teams, let someone pull from you), your local commits can be an utter disaster, and it just does not matter. From a purely technical standpoint, there is no excuse for messy logs. Any mess is the result of the user, not the software.

The problem is that using rebase to clean up your local commit log before sharing with others isn't the easiest thing to learn. Even once you supposedly "know what you are doing", it is still possible to make a mistake. Git has a lot of amazing functionality, but the majority of commands are not intuitive to use out of the box. Even GUI frontends to git don't manage to simplify the more complex commands all that much. I'd like to believe I'm an "intermediate" git user, but I run into issues often enough that I'm sure I overestimate my knowledge.

Most teams I've seen using git wind up using only the core commands (clone, commit, fetch/pull, merge, push), essentially using git as a drop-in replacement for svn without taking advantage of the additional possibilities git offers. Again, this is because becoming a git guru is a steep learning curve. It doesn't matter if 19 people on a project know everything about git; it only takes a single 20th person to make a tangled mess out of a centrally shared repository. I've spent many an afternoon working to rectify botched rebases and similar issues.

One example is force-pushing a specific branch, knowing that the resulting destruction of history is desired and not harmful to others, only to forget to specify the branch name on the command line. With `push.default` set to `matching` (the default before Git 2.0), that force-pushes all matching branches. Whoops!


> The nice thing about git is that your commits can be garbage.

That's a dangerous practice. When you first commit the changes, you have all the necessary context, and hopefully flow hasn't been interrupted. If you rebase hours, days, maybe weeks later, then that context is completely lost---it's just like trying to get back into a project that amount of time later.

The reason the detail you'd put into your commits is so useful is because you have a perspective that others won't; if you return to it later, you'll be reading the diff and suffering just like others, albeit with a bit more knowledge.


I could have been more specific about the scenario I was referring to. Many people commit to git very frequently in such a way that their local git history is more like an IDE's undo log. You can commit 10+ times in the span of an hour, using the log as nothing more than a savepoint system. This is really common from people who for some unfathomable reason perform a commit before testing, which results in a lot of those one-liner commits that really do not deserve their own commit (this specific task can be done via amend, but you don't see it from most devs).

Git commits are so fantastically cheap, and especially easy if it's a one second task via keyboard shortcut in your IDE rather than terminal commands. You wind up with a lot of commits throughout the day all related to accomplishing a simple task, and then rebase them down into a smaller number of meaningful chunks of work.

I agree that this should not be the workflow for larger units of work. If you're working on a week-long task, you would use this flow to refactor each day's 50 commits down to what you would have normally committed (maybe 1-5 commits for the day). This allows your feature branch's log to contain meaningful information. At the end of the week, when your feature branch is complete, you then have the additional choice as to whether the branch's already-more-compact history is meaningful; if not, you can further squash the merge commit.

tldr:

>> When you first commit the changes, you have all the necessary context

Rebasing your local changes before sharing is for the case where you're spamming loads of meaningless commits that don't have - or deserve - their own context.


I see; this is a very different workflow than I am describing (or am used to). While I understand it's useful as a snapshot, that's not a useful use of history in Git. In fact, it's quite useless, and I'd agree that those commits should not remain, and should be replaced with something useful.

But since context has been lost, that "useful" commit might just be a single, squashed one; I've observed this situation at work from others.

So, this is a situation where those users are digging their own hole.


I commit for lots of reasons, though. Sometimes, I am about to try something crazy with the code, and I want a save point to go back to. Sometimes, I want to switch computers in the middle of working on something (going from desktop to laptop, for example), so I commit and push to the other computer.

Also, if I am working on something very complex, I might go hours or even days with code that can't compile. I don't want to risk losing that because I was avoiding committing.

A lot of these problems can be resolved with the proper use of --fixup and --squash and auto squash rebasing.
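`git commit --fixup=<commit>` (or `--squash=<commit>`) creates a commit whose subject is `fixup! ` (or `squash! `) followed by the target's subject. `git rebase -i --autosquash` then moves each such commit directly after its target in the todo list. Below is a simplified Python model of that reordering. The `autosquash` function is hypothetical, and real Git also matches on commit hashes and subject prefixes.

```python
from typing import Dict, List, Tuple

PREFIXES = {"fixup! ": "fixup", "squash! ": "squash"}


def autosquash(subjects: List[str]) -> List[Tuple[str, str]]:
    """Build a rebase todo list (oldest first) from commit subjects.

    A "fixup! X" / "squash! X" commit is attached after the earlier commit
    whose subject is exactly X; anything unmatched stays a plain "pick".
    """
    seen = set()
    attached: Dict[str, List[Tuple[str, str]]] = {}
    picks: List[str] = []
    for subject in subjects:
        for prefix, action in PREFIXES.items():
            target = subject[len(prefix):]
            if subject.startswith(prefix) and target in seen:
                attached.setdefault(target, []).append((action, subject))
                break
        else:  # no prefix matched an earlier commit
            seen.add(subject)
            picks.append(subject)
    todo = []
    for subject in picks:
        todo.append(("pick", subject))
        todo.extend(attached.get(subject, []))
    return todo


log = ["Add parser", "Add CLI", "fixup! Add parser", "squash! Add CLI", "fixup! Add parser"]
for action, subject in autosquash(log):
    print(action, subject)
```

Committing fixups as you go and letting the rebase sort them out means the save-point commits never need to be reordered by hand.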


My argument was for not retaining those commits, or not committing them in the first place; drewg123 was complaining about the situation from others, meaning that their private branches were not cleaned up prior to pushing. You wouldn't push your WIP branch, I would assume.

I have also pushed for the purpose of pulling on another PC; that commit then gets `reset --soft`'d.


I usually want to push it somewhere, to maintain a backup in case of hardware failure.


It'd be perfect, if only we could take people out of the equation.

Seriously, I think that the cost/benefit ratio of a perfect history is so terrible that there are very very few people that follow through with it.

If one submits a PR and it has a messy history, what is the reviewer supposed to do? Make them go through the commits and clean them up? How will the reviewer verify that every commit compiles? Make them squash it? We are back to square one. If you only have squash commits from PRs and run CI on them, you are sure that every commit compiles.


> If one submits a PR and it has a messy history, what is the reviewer supposed to do? Make them go through the commits and clean them up?

That's up to the project. Some do, yes, for the reasons that I described---it's much easier to review a series of patches that can be comprehended, and has benefits later on (e.g. bisecting). If the patch is large from a contributor, and a bisect arrives at that commit, and the contributor is not the one doing the debugging, it can be a frustrating and inefficient experience. I've wasted many nights on something that could have otherwise been immediately obvious.

> If you only have squash commits from PRs and run CI on them, you are sure that every commit compiles.

It's not difficult to loop through commits and make sure they compile.

In fact, bisect can do it for you, if you aren't fond of command-line loops.
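Here is a sketch of such a loop in Python. It assumes `git` is on the PATH and a build command such as `make`. `list_commits`, `builds` and `broken_commits` are hypothetical helpers. The loop checks out each commit in `base..tip` in turn, so it leaves you on a detached HEAD. Git can also do this natively with `git rebase --exec "make" <base>`, which runs the command after each replayed commit and stops at the first failure.

```python
import subprocess
from typing import Callable, List


def list_commits(base: str, tip: str) -> List[str]:
    """Hashes of the commits in base..tip, oldest first."""
    out = subprocess.run(
        ["git", "rev-list", "--reverse", f"{base}..{tip}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()


def builds(sha: str, build_cmd: List[str]) -> bool:
    """Check out `sha` (detached HEAD) and report whether the build succeeds."""
    subprocess.run(["git", "checkout", "--quiet", sha], check=True)
    return subprocess.run(build_cmd).returncode == 0


def broken_commits(shas: List[str], check: Callable[[str], bool]) -> List[str]:
    """Return, in order, the commits for which `check` fails."""
    return [sha for sha in shas if not check(sha)]


# Example (run inside a repository, then check out your branch again):
#   shas = list_commits("origin/master", "HEAD")
#   print(broken_commits(shas, lambda sha: builds(sha, ["make"])))
```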


For larger projects this takes literally hours


It's not a one-size-fits-all type of thing; if it doesn't work for your project, then you need to try something else.

Where I work, we don't enforce that each commit build---it's known to be a good practice, and they're going to have someone flip out on them if they're trying to debug something and they have to skip a dozen commits when bisecting (incidentally, I had that problem today).

In that case, it's a cultural thing. If someone consistently commits code that doesn't build, then they should be addressed.

If it's a random contributor to a project, more care should probably be taken. If building each commit isn't feasible, then maybe only building modified files (that is, a normal `make` for example rather than a fresh workspace) is better than nothing. If that's too long, maybe build a few sample commits. Etc.

In any case, even if you don't build each commit, the history is still useful.


Ya, I do an end-of-the-day commit of a lot of things (to start fresh in the morning), but that doesn't mean committing trash at the end of the day. It means stopping 5 minutes early, making a clean commit of the WIP (which compiles, or throws no syntax errors in the PHP world) and calling it a day.


To aid in comprehension, fine-grained bisecting, and debugging, I commit as small a patch as I can for a given change; if I have to use "and" in the commit message, I might have gone too far. I also write a ChangeLog-style commit message describing the changes so that they can be eyeballed without having to read through the diff.

I find that especially important for bisecting, and on that merit alone (I have other reasons as well) I reject squashing commits: it makes bisecting useless for large changes. When someone commits what you could better call a project, you're going to struggle, even if it's easy to comprehend the code. Same goes for review requests. We encourage our team to post small reviews, and all of us get frustrated when we have to try to grok a 500+ line diff (unless those lines are entirely new or entirely removed large hunks).

At the end of the day, my WIP is usually quite small, with some exceptions. Even on complicated changes, much of that planning is on paper (or mental paper) or in writing test cases. WIP can be hard to get back into when flow is broken, let alone the next day, or after the weekend.


Depending on what you are working on, this sort of "cleanup in 5 minutes" task might be impossible. There are lots of reasons you might need to go a longer time between working code than the length of time you want to go with no backup or save point.


This. If your team writes bad comments and cryptic names, you don't respond by deleting all comments and obfuscating all names, you tell them to do better work. Why are commits taken less seriously?


For those of us working in domains where auditors might care what has happened, there's a constraint: Auditors want the commit history to reflect what really happened, not the sanitized version of same. One of the nice things about git is that you can have a fork with a clean commit history and an "official" repo that reflects the real history of its merges.


That would seem to suggest that you commit either after every keypress or at least before using the backspace or delete keys, right?

It feels to me that "what really happened" only really applies to "what did we release?" rather than "how did we get to what we released?". Completely agreed that every release should be tracked in version control without modifications, but I'm skeptical that auditors care that you forgot to run the linter before committing and then you did a follow-up commit to add a semicolon where one was missed, but all of that happened between releases.


It really doesn't matter what you (or I) think. It matters what the auditors will accept.


And I've worked at places where developers are so intimidated by "don't check in a commit that so-and-so wouldn't approve of" that they have lost significant work. I would say help people work in a branch or some such, and encourage micro-commits. A clean, bisectable history is a laudable goal, but it is one concern among many.


You could use `git merge --squash` with Feature Branch or Gitflow. But IMHO it's better not to squash.


Git reflow does this (and more). If you want to do presubmit code reviews and squash merge to master to preserve history, you might like it: https://github.com/reenhanced/gitreflow


I am guilty of that. But what can I do? I use Dropbox as a sort of backup (in case of a lost laptop or HDD failure), which doesn't work with git. When pushing to personal & private projects that's fine, but I need a better backup solution when pushing to a public repo.


You can squash commits locally before pushing. I'm not a fan of "one commit per feature", but I still use interactive rebase on my personal branch to combine (or split) commits before merging it with an upstream branch.

To avoid cluttering the main repo but still have a backup, you can just use a fork (even just a directory on a remote server).


If you're working on branch `myFeature`, simply branch to `myFeature_BACKUP` and commit your entire working copy with commit message "WIP for the night; for remote backup only". Push this to the remote to back it up and protect against a lost laptop.

The next day, do a "reset --mixed" from the tip of your backup branch back to your last good commit on the feature branch. This brings all of the remote backup code into your working copy. From there, start work again and begin making staged commits of atomic changes.


Gross.

In a private environment, squashing history is the precise opposite of what I want. I want immutable history. Anything anyone ever checks in is there forever. Safe and secure. Impossible to lose. Impossible to screw up.

Furthermore, I want all the changes they made along the way to their feature. Because lord knows there will be a moment down the road where there's a line of code that doesn't quite make sense. And I'll want _full_ history to understand where that line came from. See how it evolved.

Git still doesn't have anything as good as p4 timelapse view. Which is deeply unfortunate. That's a great tool for spelunking the past.


> Because lord knows there will be a moment down the road where there's a line of code that doesn't quite make sense.

This is what code reviews and code comments are for. You shouldn't need to dig into the nitty-gritty of multiple commits of a single feature to understand a single line. Written once, read hundreds of times, right?

> And I'll want _full_ history to understand where that line came from. See how it evolved.

So you want a full keystroke history as well, then? Because as far as I'm concerned, this is essentially what an unsquashed commit history is.

Don't get me wrong, I'm not saying you should have 15,000-line commits that encompass a single feature (those lines should have made their way back into the code a long time ago), but seeing dozens of one-line changes is useless: "fix ci by including dependency"; "&& instead of ||"; "tidy up style"; "strings should be UTF-8". And these are just the useful names for useless commits, which totally ignores commits like "..."; "shit"; "fixes stuff"; "work"; "progress".

These commits just fragment the history, and especially git blame, because it becomes difficult to see many changes cleanly wrapped into a single, logically grouped block.


There exists a spectrum from full keystroke history to squashed commits. Squashing commits throws away information; it is lost forever. Full keystroke history contains all that information, but it is incredibly noisy. Per-commit state snapshots are a pretty happy medium. They don't tell you all the things that keystrokes could tell you, but they're very easy to use, and they tell you a lot of things that squashed commits do not.

The obvious answer is to squash commits, but keep the information. Intermediate commits aren't needed very often. But they can be exceptionally useful when they are needed. Especially if the commits were from an employee who is no longer working at your private company.

Git still needs a p4 timelapse view tool. It's wonderful. All that tool needs is a little check box called "expand squash". That's the best of both worlds. Everyone gets what they want. But that tool and checkbox don't exist. So given the choice between squashed and unsquashed, I'll take unsquashed 100% of the time. At a private company, that is. Open source projects, especially large projects, might choose differently.


Data is not necessarily information. If three commits to fix a typo make you happy, so be it. I don't need, and do not want, this kind of history and information.


In throwing away three commits to fix a typo you also throw away 10 commits that show how the parameters in a function declaration changed over time. You may need that information. And you're choosing to throw the baby out with the bathwater.


Doesn't an undeleted feature branch essentially offer the same functionality as "expand squash"?


> Gross.

> In a private environment squashing history is the precise opposite of what I want. I want immutable history. Anything anyone ever checks in is there forever. Safe and secure. Impossible to lose. Impossible to screw up.

> Furthermore, I want all the changes they made along the way to their feature. Because lord knows there will be a moment down the road where there's a line of code that doesn't quite make sense. And I'll want _full_ history to understand where that line came from. See how it evolved.

> Git still doesn't have anything as good as p4 timelapse view. Which is deeply unfortunate. That's a great tool for spelunking the past.

Seems like this is an unpopular opinion here. I can only imagine the down voters are Linux kernel developers? With 600k+ commits, they have some standing to make rules forcing people to squash commits.

Of course nobody can force (for example) the Discourse project to accept unsquashed commits. Any project is free to refuse contributions for any reason. Who am I to say anything if a project won't accept my pull request because of my gender or race, or because I am a moron who uses spaces in some places and tabs in others? As I've said before, a developer is free to squash their own commits before they push. I will indulge in a lot of bike-shedding (the latest of which has been GPG-signing commits).

I will say, though, that there is some merit to the idea of people who want to force squash down our throats. First, I'd like to present my point of view. My point of view is that if we keep all commits, then we have all commits. We can drill down the list of commits if we ever need to. Why would I want to throw away information? Unlike the kernel project, which is 1.6 GB of source code and 600k+ commits just on torvalds/master, we will probably never have this issue.

This is not to say I think the opposite idea is completely without merit though. I understand the opposition to keeping all commits. In keeping all commits, you could argue that we are actually throwing away information because we can keep only so much information in the "staging area" of our brains. If we just start using commit as an alternative to ctrl + s, we might be throwing away information because we will lose the context of what we were thinking when we made that commit when we come back and look at it in six years.

I'm a big fan of Uncle Bob. I don't always do TDD but I think only committing once unit tests pass locally is a good compromise here.

Also, would the downvoters please care to elaborate why they downvoted forestthewoods?


> I can only imagine the down voters are Linux kernel developers

I specified "private environment" for good reason!

I readily accept that the source control needs of large, public projects are unique. I think it's important to recognize that Git was made for a very particular purpose. Other environments, such as private work environments, have very different wants and needs.


git flow has added a -S option in v0.4.2, which hasn't been released yet (see https://github.com/nvie/gitflow/issues/14 ). It should do what you want if combined with -k (to keep the feature branch)?


What about just using --fixup and --squash with your commits, and then using autosquash rebasing?
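
A small sketch of that suggestion. `git commit --fixup` and `git commit --squash` record commits whose subjects start with `fixup!` and `squash!`, and `git rebase -i --autosquash` moves them next to their targets and folds them in. Setting `GIT_SEQUENCE_EDITOR` and `GIT_EDITOR` to `:` (a no-op command) is just a way to accept the generated todo list and combined message unattended; file names and messages are made up:

```shell
#!/usr/bin/env bash
# Sketch: record fix-ups as you go, fold them in with an autosquash rebase.
set -euo pipefail
export GIT_AUTHOR_NAME="Demo Dev" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo Dev" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
echo "base" > README
git add README
git commit -q -m "Initial commit"

git checkout -q -b feature
echo "add(a, b)" > add.txt
git add add.txt
git commit -q -m "Add add()"
add_commit=$(git rev-parse HEAD)

echo "sub(a, b)" > sub.txt
git add sub.txt
git commit -q -m "Add sub()"
sub_commit=$(git rev-parse HEAD)

# Later fixes that logically belong to the earlier commits.
echo "add(a, b)  # returns a + b" > add.txt
git commit -q -a --fixup="$add_commit"                       # subject: "fixup! Add add()"
echo "sub(a, b)  # returns a - b" > sub.txt
git commit -q -a --squash="$sub_commit" -m "Document sub()"  # subject: "squash! Add sub()"

# Reorder and fold; ':' accepts the todo list and combined message unchanged.
GIT_SEQUENCE_EDITOR=: GIT_EDITOR=: git rebase -q -i --autosquash master

git log --format=%s master..feature   # prints "Add sub()", then "Add add()"
```

The difference between the two: a `fixup!` commit's message is discarded, while a `squash!` commit's message is offered for merging into the target's message.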


OK, I think there's something that could be tried with git that:

* saves the full history of work
* shows only meaningful history on master
* keeps the history of pull requests

I've never done this, and I'm only working from approximation, but you actually made me curious enough to think about it. I must admit that I'd really like the more successful git hosting platforms to seriously consider point 3, because the history of a given pull request is impossible to follow (you never know if the committer added a new commit or amended the last one).

So:

1. Hack and commit along the way

  ----A----B----C
                  \
                   --D---E---F
2. When you're ready for a pull-request, create a merge commit and put a tag on it saying "PR 123 attempt 1". The merge commit should contain a summary of whatever happened in the initial branch. Put the merge commit in its own branch, like "PR 123" and create a pull-request from this merge commit to master

  ----A----B----C
                  \
                   \-------------M
                    \           /
                     --D---E---F
3. If you want to modify the commits, do the changes on the feature branch, and when you're done create another merge commit. Put it after the first one in the "PR 123" branch, and put a "PR 123 attempt 2" tag to it. Using standard tools such as github or bitbucket, by putting the merge commit in the same branch the PR will automatically be updated

  ----A----B----C
                  \
                   \-------------M1-----M2
                    \           /       /
                     --D---E---F---G---H
4. You don't have to put changes after the ones in the previous attempt, they can totally be in parallel

  ----A----B----C
                  \
                   \-------------M1-----M2--M3
                    \           /       /   /
                     |-D---E---F---G---H   /
                     \                    /
                      \                  /
                       ---I---J---K------
5. When you like the changes, just merge Mx into master

  ----A----B----C -----------------------------L--
                  \                           /
                   \-------------M1-----M2--M3
                    \           /       /   /
                     |-D---E---F---G---H   /
                     \                    /
                      \                  /
                       ---I---J---K------
With this scheme you have the history of the PR itself through the branch, where you can see the different attempts and the summary at each attempt. When you're on master and want a quick overview of the history, you can use "git log --first-parent" to see only the commits that are part of this branch and not those reached through the other parents. In the poor drawings I attempted, you'd see L, then C, then B, then A. The history is easy to see; however, a little googling shows that bisecting this is not exactly trivial. You'll have to instruct git bisect to skip over the range C..M3, i.e. "everything that is in M3 but not in C".

Unfortunately, standard git isn't smart enough to show two parallel branches as truly parallel; it tries to squeeze the graph into as few parallel lanes as it can, so they all needlessly look intertwined. Moreover, it requires manual fiddling with the merge commits to tag them properly and edit their messages so they are meaningful, but nothing that can't be easily programmed.
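
A rough script of steps 1, 2, 3 and 5 above (the parallel attempt in step 4 is left out), in a scratch repository. Tag names cannot contain spaces, so "PR 123 attempt 1" becomes the hypothetical tag `pr-123-attempt-1`; `pr-123` and `feature` are likewise illustrative branch names, and the `commit` shell function is a made-up helper. The last command shows the `git log --first-parent` view of master:

```shell
#!/usr/bin/env bash
# Sketch: keep each PR "attempt" as a tagged --no-ff merge commit on a PR branch.
set -euo pipefail
export GIT_AUTHOR_NAME="Demo Dev" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo Dev" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master

commit() {   # hypothetical helper: one commit adding a file named after the message
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit A; commit B; commit C                  # master: A-B-C

git checkout -q -b feature                    # step 1: hack and commit
commit D; commit E; commit F

git checkout -q -b pr-123 master              # step 2: PR branch starts at C
git merge -q --no-ff -m "PR 123 attempt 1: summary of D..F" feature
git tag pr-123-attempt-1

git checkout -q feature                       # step 3: more work, another merge
commit G; commit H
git checkout -q pr-123
git merge -q --no-ff -m "PR 123 attempt 2: summary of G..H" feature
git tag pr-123-attempt-2

git checkout -q master                        # step 5: merge the PR branch
git merge -q --no-ff -m "L: merge PR 123" pr-123

git log --first-parent --format=%s master     # L, C, B, A
```

`--no-ff` matters here because every one of these merges could otherwise fast-forward and no merge commit would be recorded. For the bisect problem, Git 2.29 and later also accept `git bisect start --first-parent`, which only walks the first-parent chain.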


This article has actually been around for a while. It explains things really well. One piece of advice from me, though: choose only what is sufficient for your project and team. There's no benefit in being overequipped for a simple job.


I agree. I've worked with several of these, but for most of the (smallish) projects we do, we end up coming back to a centralized repository, occasionally using feature branches if necessary.

We're a small shop with only 1 or 2 folks working on the code at a time, and mostly are using git as part of our deploy process... so simplest is the best for us.


This. Overcomplicating the git workflow will make people stop using it. In my experience the feature branch workflow works for 80% of projects, gitflow works for big projects, and I guess (no experience there) that the forking workflow works for open source projects or HUGE code bases.


even reasonably large open source projects don't need anything beyond feature branches and one long-lived master branch.

and if you're going to backport to historical releases and hotfix and/or patch them, it's better to fork off master and then cherry-pick back, and then eventually end those branches.
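
One way to read that backport advice, sketched with made-up names (`release-1.0`, the tag `v1.0` and the file names are illustrative): the release branch is cut from master, fixes land on master first, and each fix is then cherry-picked onto the release branch with `-x`, so the backport records which master commit it came from.

```shell
#!/usr/bin/env bash
# Sketch: fix on master, cherry-pick the fix onto a release branch.
set -euo pipefail
export GIT_AUTHOR_NAME="Demo Dev" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo Dev" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master

echo "v1 code" > app.txt
git add app.txt
git commit -q -m "Release 1.0 code"
git tag v1.0
git branch release-1.0 v1.0          # branch for patching the old release

# master moves on: a new feature, then a bug fix.
echo "new feature" > feature.txt
git add feature.txt
git commit -q -m "Add new feature"
echo "security fix" > fix.txt
git add fix.txt
git commit -q -m "Fix security bug"
fix=$(git rev-parse HEAD)

# Backport only the fix; -x appends "(cherry picked from commit ...)".
git checkout -q release-1.0
git cherry-pick -x "$fix" >/dev/null
ls                                    # app.txt and fix.txt, but no feature.txt
```

When that release line is retired, the branch can simply be deleted with `git branch -D release-1.0`; the tag keeps the original release point reachable.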


You can also evolve, basically, to each model in the order that they appear in the article.

As an example: I've been working on a new spike for the past 2 weeks with one other developer. Maybe 10 times a day we'll need something that the other person has committed, so we work against one branch (master). The workflow suits this extremely rapid iteration.

One repo has now matured to the point where developer branches make sense. We created "develop" on it as well as our own branches off that. We're not close to a v0.1 yet - but we'll be evolving to git flow the minute we want to ship.

Eventually as more devs join, we'll need the full-blown PR workflow, that also naturally stems from its predecessor.

There's a "meta-workflow" here, which implies which workflow to use.


developer branches never make sense


Tell those guys ;)

https://git.kernel.org/cgit/


There are many people in this conversation describing them as the savior of commit comments, as if not using them were a disservice to the team. I'm not much of a coder myself, but the differing opinions are interesting.


Excluding the possibility of a laptop getting lost or stolen.


git is not [necessarily] a backup.


It amazes me how the entire software industry seems to be adapting its workflows around the necessity of making Git usable. While there are certainly other positive attributes about some of these workflows, the main reason people use them in my experience is because "if you don't use workflow X you get undesirable problem Y with Git". Most of these problems simply didn't exist or were not nearly as severe with previous revision control systems, so we never needed these elaborate workflows. Now suddenly Git is considered a de facto tool, and downstream effects of using it are transforming the entire software development process.


"Most of these problems simply didn't exist or were not nearly as severe with previous revision control systems"

Are you kidding? :)

CVS, Subversion, Perforce, and some proprietary revision control systems were a HUGE PITA. People used to lock files so no one else could change them while they were working on them. Merging different feature branches was like locking the whole repository with a huge lock, so no one else could touch anything at the time.

Any time I had to work with legacy revision control systems I wanted to kill myself. Git workflows are conventions that help with small issues in things that weren't even possible before distributed VCS. And a lot of these problems are superficial ("oh, those merge commits are annoying in `git log`").

It reminds me: https://www.youtube.com/watch?v=ZFsOUbZ0Lr0


In Subversion you have to lock a file explicitly if you want a lock; it doesn't require it. Merging is also not as bad as you're describing. I actually had more headaches with merges in Git than in Subversion.


> these problems simply didn't exist or were not nearly as severe with previous revision control systems

I (have to) use one of these systems you speak of without the horrors of having to worry about merge histories or organizing any branches whatsoever.

I have to worry that someone in California broke some code, somewhere right before lunch and it'll take me as long to workaround as it will to just take a second lunch.

I have to worry that if I miss a release because of QA issues I need to spend a bunch of time reversing all my changes temporarily and then redoing them in a new release and hoping nothing got borked in between.

I have to worry about how to show someone out of the office what my changes look like because they're still too janky to push into the release branch.

If I'm working from home and Comcast decides I'm due for my weekly outage then I'm SOL.

I have to worry about a failed commit going and locking files on other people because apparently I'm the only person ever to run a command that fails (granted, PEBCAK), so let's not worry about making that failure impact a centralized shared state in a negative way.

You can spend a lot of time thinking about the best way to do something in git because it lets you do a lot. The choice is essentially "Elaborate workflows" vs "bridge the inadequacies of the tooling by brittle ritualized human processes".

Git problems are VCS' first-world problems.


>> Most of these problems simply didn't exist or were not nearly as severe with previous revision control systems

They generally don't exist with Git, either, if you limit yourself to the same set of features and capabilities that other revision control systems have.

Take Microsoft Team Foundation version control for example. All of the "problems" with Git disappear if you use a workflow that gives you the same features as TFVC and have the same expectations (especially around history): collaborate around a single shared repo instance, push every commit when it's made, and allow fast-forward merges only. Conflict? Rebase locally.

Personally, I think a good idea for a Git tutorial would be to recreate the workflow of another version control product in Git in order to introduce some of the new ideas in a familiar way.
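
A sketch of that TFVC-style centralized setup with two clones of one shared bare repository (the names `alice`, `bob` and `shared.git` are illustrative). Git refuses a non-fast-forward push by default, so when Bob's push is rejected he rebases onto the new remote tip and pushes again, which keeps the shared history linear:

```shell
#!/usr/bin/env bash
# Sketch: single shared repo, push every commit, rebase locally on rejection.
set -euo pipefail
export GIT_AUTHOR_NAME="Demo Dev" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo Dev" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
git init -q --bare "$work/shared.git"

# Alice seeds the shared repo.
git init -q "$work/alice"
cd "$work/alice"
git symbolic-ref HEAD refs/heads/master
echo "base" > a.txt
git add a.txt
git commit -q -m "Initial commit"
git remote add origin "$work/shared.git"
git push -q origin master

git clone -q -b master "$work/shared.git" "$work/bob"

# Alice commits and pushes first.
echo "alice" > alice.txt
git add alice.txt
git commit -q -m "Alice's change"
git push -q origin master

# Bob committed against the old tip, so his push is rejected as non-fast-forward.
cd "$work/bob"
echo "bob" > bob.txt
git add bob.txt
git commit -q -m "Bob's change"
if ! git push -q origin master 2>/dev/null; then
  git pull -q --rebase origin master   # replay Bob's commit on top of Alice's
  git push -q origin master
fi

git --git-dir="$work/shared.git" log --format=%s master
```

Setting `git config pull.rebase true` in each clone makes a plain `git pull` behave like the `--rebase` call above.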


The article completely mischaracterizes Subversion workflows, making the mistake of treating a Subversion repo just like developers typically use git repos ... one-repo-per-project.

Subversion instead is a tree with projects as nodes toward the leaves, each project with its own trunk, branch, tags. It's each of these projects that corresponds to a git repo. Teams I worked on always treated each project as its own "repo" ... so the central single Subversion tree became like our 'github' or 'bitbucket' ... and one could do all the lockless branching within each project, no problem. YOU COULD BE AS NON-LINEAR IN THIS APPROACH AS YOU NEED TO BE, with full support for branching, tagging, merging, etc.

Where Subversion was much better was in supporting consolidated views of multi-project build/release environments, or in mixing sub-project code into parent-project code. Using svn:externals it was easy to put "that subproject in this place in my own project". Using git submodules and other approaches is a pain. You end up having to check out a bunch of git repos and manage your own glue.
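
For comparison, the closest built-in Git counterpart to placing a subproject with svn:externals is `git submodule add`. A minimal sketch with local repositories (`lib`, `app` and `vendor/lib` are made-up names); the `-c protocol.file.allow=always` setting is only there because recent Git releases refuse local-path submodule clones by default. The final clone also shows the extra opt-in step for anyone consuming the parent project:

```shell
#!/usr/bin/env bash
# Sketch: embed one repository inside another with git submodule.
set -euo pipefail
export GIT_AUTHOR_NAME="Demo Dev" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo Dev" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)

# The subproject.
git init -q "$work/lib"
git -C "$work/lib" symbolic-ref HEAD refs/heads/master
echo "shared code" > "$work/lib/lib.txt"
git -C "$work/lib" add lib.txt
git -C "$work/lib" commit -q -m "Library code"

# The parent project pins lib at a specific commit under vendor/lib.
git init -q "$work/app"
cd "$work/app"
git symbolic-ref HEAD refs/heads/master
echo "app code" > app.txt
git add app.txt
git -c protocol.file.allow=always submodule add "$work/lib" vendor/lib
git commit -q -m "Add app with lib submodule"

# A consumer has to opt in to fetching the submodule.
git -c protocol.file.allow=always clone -q --recurse-submodules "$work/app" "$work/app-copy"
cat "$work/app-copy/vendor/lib/lib.txt"   # shared code
```

After a plain `git clone` without `--recurse-submodules`, `git submodule update --init` fetches the subproject instead.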


It'd be nice to see someone collate more git workflows, and what the advantages and disadvantages of these are.

Over time, my workflow has become simpler and simpler. I've worked with some weird and wacky workflows before, which have been born from a given requirement, such as quick deployment to a number of different environments, or two separate teams working on separate parts of one codebase while maintaining separate CI workflows. Some of these workflows have seemed absolutely mental, but I've seen them several times over in different places, so there must be some kind of logic to the madness.

Different dev teams have wildly different practices, so it'd be good to acknowledge the "typical" way of doing things and embrace the workflows that work if you need to do something out of the ordinary.


Kudos to Atlassian for bringing some much-needed clarity to a confusing topic. So many people that claim mastery of git only know particular workflows and, when attempting to mentor others, just mansplain whatever they know without consideration that there are alternative valid ways of doing things.

Without a firm grasp of one's intent (workflow), learning git commands is pointless and leads to people desperately flailing out commands.


I think the ideal workflow depends on the complexity you need. I've tried to write about what kind of requirements cause what kind of workflow in http://docs.gitlab.com/ee/workflow/gitlab_flow.html

What do you think?


Or you could actually practice continuous integration and let everyone work on master.

Much simpler, opens the door for feature toggles, continuous delivery and more without any merge headaches.


Those are general development models and not specific to git.


Was this updated recently? This has been up for a while.


Yes, the article has been up for a while


What was the workflow in mind when Git was designed?



