I never understood the point of squashing. If you just want the short version, only read the merge commits; if you want the full details to figure out some bug or whatever, then yes, I most definitely want those "fix typo" commits, because as often as not, those are at fault.
Squashing buys you next to nothing, and costs you the ability to dive into the history in greater detail.
I suppose if your project is truly huge, it becomes worth it to reduce load on your VCS, but beyond that...
Squashing for example allows me to have a history where each commit builds. This has been very useful for bisecting for me. I wouldn't call it "next to nothing".
The "greater detail" part can cost me a lot of time.
Having commits that do not build represents the history more accurately. It could very well be a “fix” to some build error that silently introduces an issue, that context is lost when you squash.
> Having commits that do not build represents the history more accurately.
Sure it does, but sometimes that level of detail in history is not helpful. Individual keystrokes are an even finer/"more accurate" representation of history; but who wants that? At some point, having more granular detail becomes noise - the root of the disconnect is that people have a difference in opinion on which level that is: for some (like you), it's at individual commit-level. For others (like me), it's at merge-level: inspecting individual commits is like trying to parse someone's stream-of-consciousness garbage from 2 years ago. I really don't care to know you were "fixing a typo" in a0d353 on 2019-07-15 17:43:32, but your commit is just tripping-up my git-bisect for no good reason.
Sure, I can - but should I? That's the fundamental difference in opinion (which I don't think can be reconciled). I don't need to know what the developer was thinking or follow the individual steps they took when developing a feature or fixing a bug; for me, the merge is the fundamental unit of work, not the individual commit. Caveat: I'm the commit-as-you-go type of developer, as most developers are (branching really is cheap in Git). If everyone were disciplined enough not to make commits out of WIP code, and every commit was self-contained and complete, I'd be all for taking commits as the fundamental unit of code change.
If the author did something edgy or hard-to-understand with the change-set, I expect to see an explanation of why it was done that way as a comment near the code in question, rather than as a sequence of commit messages; that is the last place I will look. But that's just me.
But the problem with this approach is that you're making it impossible to extract the actual changes when you do want them, whereas simply skipping non-merge commits is a minor inconvenience (`--first-parent` tends to cover it).
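For anyone who hasn't tried it, here is a rough sketch of what `--first-parent` does on a merged (not squashed) branch. The throwaway repo, file name and commit messages are all made up for illustration:

```shell
set -e
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # pin the branch name regardless of local defaults
git config user.email "dev@example.com"
git config user.name "Dev"
git config commit.gpgsign false

echo "base" > app.txt
git add app.txt
git commit -q -m "initial commit"

# A feature branch with a "noisy" follow-up commit.
git checkout -q -b feature
echo "feature X" >> app.txt
git commit -q -am "add feature X"
echo "feature X, typo fixed" >> app.txt
git commit -q -am "fix typo"

# Merge with an explicit merge commit so the branch history is kept.
git checkout -q main
git merge -q --no-ff -m "Merge feature X" feature

full_log=$(git log --format=%s)                      # all four commits
mainline_log=$(git log --first-parent --format=%s)   # merge-level view only
echo "$mainline_log"
```

The detailed commits are still there in `full_log`; the mainline view just doesn't walk into the second parent.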
I mean, granted, it's not ideal. I think this is a bit of a problem with the low-level nature of git. Ideally it'd be easier to semantically bundle such sequences of commits so that they'd be dealt with more reliably in the broader ecosystem (not every tool supports `--first-parent`), and in any case, there's nothing forcing you to maintain the first-parent-is-linear-mainline-history; that's just a tradition which, again, many common tools follow. Then of course there's the poor integration between git hosting (such as GitHub) and git: I can blame a file, but I can't easily correlate that with the discussions in the PRs, and whatever correlation there is is purely online, with all the limitations that a single-vendor, non-distributed system like that entails.
Ideally this wouldn't even be a tradeoff at all; it would be obvious how to track history both at the small scale and the larger scale (and perhaps even more?), but alas, it's what we have.
Out of curiosity - when you merge via squash, what kind of commit messages do you retain? Do you mostly concatenate the commit messages, or rewrite the whole thing?
> when you merge via squash, what kind of commit messages do you retain? Do you mostly concatenate the commit messages, or rewrite the whole thing?
Context-specific. For a bigger PR that deals with an extensive refactor I'll prefer to have a descriptive title and hand-curated task list below (so definitely not 1:1 to commit messages). For smaller PRs -- or more focused ones, like those dealing with a single feature or bug -- I'll only leave a descriptive title.
But I almost never leave a list of commit messages. Not because I have no discipline; sometimes a refactoring requires 4-5 steps and all the commits have 99% identical messages, which is not useful when you aggregate them into a single list of bullet points at the end.
---
> But the problem with this approach is that you're making it impossible to extract the actual changes when you do want them, whereas simply skipping non-merge commits is a minor inconvenience (`--first-parent` tends to cover it).
Again, that's not the issue here. The issue is that when you work on a big project (like many of us do) you get something like 4-7 merged PRs a day; don't pull/fetch for 3 days and you'll get 60+ lines in your terminal when you get to it.
There are people who manage releases and people who chase subtle regressions. Having git bisect narrow it down to a big PR squashed commit is actually a win; it gives them a localized area inside which they can work with other tools (not bisect).
In the end I suppose we can say it's a matter of taste. But I have always appreciated the main branch's history consisting only of squashed commits. Again, it gives you a good bird's-eye view.
Storing every single version of the file which ever hit disk locally on my machine in the history would be the most accurate, yet no one seems to advocate for that. Even with immutable history, which versions go into the history is a choice the developer makes.
> Storing every single version of the file which ever hit disk locally on my machine in the history would be the most accurate, yet no one seems to advocate for that.
You'd be surprised. It's only because we understand (and are used to) tool limitations (regarding storage, load, etc) that we don't advocate for that, not because some other way is philosophically better.
I'd absolutely like to have "every single version of the file which ever hit disk locally on my machine in the history".
I'd be very surprised if you used such a feature on a daily basis indeed.
I understand the rationale but the balance tilts too far into the "too much details" territory for me and that can slow me down while digging.
What I found most productive for myself is that searching for a problematic piece should happen on a two-tiered tree, not a flat list. What I mean is: first find the big squashed PR commit that introduces the problem, then dig in more details inside of it.
Not claiming my way is better, but over almost 20 years of my career I have observed that it works well for many others too, so I am not exactly an aberration either.
To me a very detailed history is mostly a distraction. Sure `git-bisect` works best on such a detailed micro-history but that's a sacrifice I am willing to make. I first use bisect to find the problematic squashed commit and then work on its details until I narrow down the issue.
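A minimal sketch of the first tier, assuming a linear main branch made only of squashed PR commits. The repo, the `cache_age` setting and the commit messages are invented; `git bisect run` treats exit code 0 as good and 1 as bad:

```shell
set -e
# Throwaway repository; all names and values are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "dev@example.com"
git config user.name "Dev"
git config commit.gpgsign false

echo "cache_age=86400" > config.txt
git add config.txt
git commit -q -m "initial commit"
good=$(git rev-parse HEAD)

# Five squashed PR commits; the third one silently changes the cache age.
for n in 1 2 3 4 5; do
  echo "feature $n" >> features.txt
  if [ "$n" -eq 3 ]; then
    echo "cache_age=10" > config.txt
  fi
  git add -A
  git commit -q -m "Add feature $n (squashed PR)"
done

# Tier one: let bisect find the squashed commit that broke the setting.
git bisect start HEAD "$good" >/dev/null
git bisect run grep -q "cache_age=86400" config.txt >/dev/null
culprit=$(git show -s --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad commit: $culprit"
```

Tier two is then whatever you prefer inside that one commit: reading the diff, a debugger, or the original PR branch if it still exists.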
I mean, this isn't even really all that far-fetched, other systems do work like that, such as e.g. word's track changes or gdoc's history - or even a database's transaction log.
And while those histories are typically unreadable, it is possible to label (even retroactively) relevant moments in "history"; and in any case just because a consumer-level wordprocessor doesn't export the history in a practical way doesn't mean a technical VCS couldn't do better - it just means we can't take git-as-is or google-docs-as-is and add millions of tiny meaningless "commits" and hope it does anything useful.
All of that is completely valid and I appreciate it. But you are addressing a group of people, most of whom are already overbooked and have too much on their plate every day. How viable is it to preach this approach to them?
Why not? I do not see any reason to have a history at all for anything except to be able to go back to a specific version to track down a problem. Inaccurate history makes that less useful.
Typo commits and the usual iteration during development aren't "accurate history". Noise in commit logs provides negative value.
Ideally, each commit should be something that you could submit as a stand-alone patch to a mailing list; whether it's a single commit that was perfect from the get-go or fifty that you had to re-order and rewrite twenty times does not matter at all; the final commit message should contain any necessary background information.
It would be needlessly restrictive to prevent users from making intermediate commits if that helps their workflow: I want to be able to use my source-code management tool locally in whichever way I please, and what you see publicly does not need to have anything to do with my local workflow. Thus, being able to "rewrite history" is a necessary feature.
Patch-perfect commits are an idealistic goal. The truth is that, as already mentioned, many of those typos and “dirty” commits can be the source of bugs that you’re looking for. Hiding them hides the history.
Indeed they are, hence the need to rewrite the history. Managers, tech leads, or users of your OSS project don't care about the "fix typo" comments. They are interested in a meaningful history that tells a bigger story.
And to be frank, I am interested in the same, mid-term and long-term. While I am grappling with a very interesting problem for a month then yes, I'd love my messy history! But after I nail the problem and introduce the feature I'll absolutely rewrite history so the squashed PR commit simply says "add feature X" or "fix bug Y".
> Managers, tech leads, or users of your OSS project don't care about the "fix typo" comments. They are interested in a meaningful history that tells a bigger story.
Why would they be looking at the version control system for this? That is not what it's there for.
GitHub in particular is widely used by managers -- not the higher-level managers of course, but a lot of engineering managers have mastered the usage of GitHub issues, Markdown task lists inside PR descriptions, and reviewing results of CI/CD pipelines.
And many tech leads simply don't have the time to review every single WIP commit. They want meaningful message/description of the big squashed PR commit. If you just post a merged list of all commit messages with 10x "fix stuff" inside you'll be in big trouble the next time around and your work will be inspected very closely.
The practices I am describing to you are reality in many tech companies. Writing code there is not about you at all. And almost nobody will read your code and PR descriptions unless they really have to. Hence it's a professional courtesy to make those as small and meaningful as possible.
Why? You don't need to. GitHub will show you that diff, git itself will show you that diff. There is no need to permanently rewrite history to do this. That makes no sense.
For the last time: squashed commits help teams who need the main branch be a high-level history of delivered features and fixed bugs. Where one commit is one delivered feature or a bug fixed. That's the idea.
Whether you think that "makes no sense" is inconsequential. Many people find it very meaningful.
That is absolutely what it's there for! `git blame` and `git log` are extremely commonly used, `git bisect` is only convenient in a commit history where breakage is uncommon, etc.
Problem is, when you're debugging later on you need to understand what a breaking commit does (or is intended to do) before knowing what it did wrong.
Let alone the code review issue. In any multi-person project, it is just as important that your commit history be readable by a third party for information as that it be useful for your own personal debugging.
That's not realistic at all. Programmers are oftentimes "in the zone", and having to craft perfect messages while you're rushing on to the next problem kills productivity.
Compromises with human nature must be made. Hence -- we need to be able to rewrite history.
Your academic purism is out of place, dude. Real humans don't work like you say they do. Some do -- most don't.
I'm of the opinion that you should commit WIP stuff. Use the SCM for managing your source code, damn it!
Just don't publish WIP crap; fortunately, you can have your cake and eat it too, with git.
The biggest reason git (and any similarly advanced SCM) is superior to non-distributed alternatives like Subversion is that I can use it to manage my own workflow, instead of just as the final off-site backup of whatever I decide to publish. I get to actually use everything git offers for shuffling commits and code around while coding.
Want to switch contexts quickly? git commit the whole thing and just switch a branch.
How about untangling a hairy merge? Do it piecemeal and commit when you're done with each bit; it's trivial to then undo mistakes, redo, combine or reorder stuff and you cannot lose any work by accident because git commits are immutable.
All of these features essentially require history rewriting; sure, you're free to rebrand and not call "store WIP state in repository" a commit even though it is one, but I would consider any SCM without these features nigh useless for most work.
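One way this can look in practice, as a sketch only: commit WIP freely with `git commit --fixup`, then fold it away with `git rebase -i --autosquash` before publishing. Setting `GIT_SEQUENCE_EDITOR=true` here just accepts the generated todo list so the example runs unattended; the branch and file names are hypothetical:

```shell
set -e
# Throwaway repository; all names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "dev@example.com"
git config user.name "Dev"
git config commit.gpgsign false

echo "base" > app.txt
git add app.txt
git commit -q -m "initial commit"

# Local work: one real commit plus a WIP fix recorded as a fixup.
git checkout -q -b feature
echo "feature X" > feature.txt
git add feature.txt
git commit -q -m "Add feature X"
echo "feature X, typo fixed" > feature.txt
git commit -q -a --fixup=HEAD

before=$(git rev-list --count main..feature)

# Fold the fixup into its target before anyone else sees the branch.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash main

after=$(git rev-list --count main..feature)
subject=$(git log -1 --format=%s)
echo "$before commits before, $after after: $subject"
```

The old commits stay reachable through the reflog for a while, so an over-eager cleanup can still be undone locally.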
Because when debugging with said history a week or a month down the line, or when someone else is debugging with said history, their human comprehension is essential for understanding why a certain event in the history is causing problems and what that event was intended to do.
In the Linux kernel mailing lists, which are the submit-by-email culture I'm most familiar with and with the best documentation of norms, the main criterion for individual patches is that they be comprehensible to a reviewer. Reviewers and bugfixers face similar reading comprehension constraints.
But it absolutely should when you work in a team. I'll personally scold you if you waste my time with merging a 20-commit PR of which 10 commits are "fix stuff" or "fix typo" or "oops forgot variable" etc. It won't pass code review.
You are free to disagree. I am only telling you how it is in many companies.
2. Waste of later coders' time when running git blame on a line when trying to figure out the purpose of code
3. Waste of later debuggers' time when they need to decide whether this error is the thing they're looking for, an unrelated error to bisect skip, or an unrelated error in possibly-related code that they need to manually fix and then re-test.
As the other sibling poster says, it's about not wasting other people's time. Make the messages/descriptions ruthlessly short and to the point and your colleagues will like you.
Leaving aside the readability and code-review concerns, bisection as a process is supremely painful when you have to separate out your targeted bug from the usual intermittent compilation and runtime issues that show up during local development.
You're not saying anything new as far as I can tell. Your grandparent already said what you said. I only disputed the "this costs next to nothing" part, which you don't seem to comment on.
Ideally your branch commits would usually build too; but admittedly, I tend to bisect by hand, following the `--first-parent` history. I suppose if the set of commits were truly huge that might be more of an issue. And of course, there are people who have hacked their way to success here: https://stackoverflow.com/a/5652323, but that sounds a little fragile.
Who squashes like this? I rebase all of my PRs not because I want to trash history, but because I want history to be meaningful. If I include all of my "whoops typo fix" and "pay respect to the linter gods" commits, I have made my default branch history much less readable.
I would say what you're describing is a breakdown in CI/CD and code review. How is code that is that broken getting into your default branch in the first place?
I certainly don't, but it's a standard option in git merge (and IIRC github supports it), so I'm pretty sure some teams do.
As to rebases to clean up history (and not just the PR itself)... personally, I don't think that's worth it. My experience with history like this is that it's relevant during review, and then around 95% of it is irrelevant - you may not know which 5% is relevant beforehand, but it's always some small minority. It's worth cleaning up a PR for review, but not for posterity. And when it comes to review, I like commits like "pay respect to the linter gods" and the like, because they're easy to ignore, whereas if you touch code and reformat even slightly in one commit, it's often harder to skip the boring bits; to the point that I'll even intentionally commit poorly formatted code so that the diff is easy to read and then do the reformat later. Removing clear noise (as in code that changes back and forth for no good reason) is of course nice, but it's easy to overdo; a few typo commits barely impact review-ability (imho), and rebases can and do introduce bugs. You must have encountered semantic merge conflicts before, and those are 10 times as bad with rebases, because they're generally silent (assuming you don't test each commit post-rebase) but leave the code in a really confusing situation, especially when people fix the final commit in the PR but not the one where the semantic merge conflict was introduced, and laziness certainly encourages that.
It also depends on how proficient you are with merge conflicts and git chicanery. If you are, then the history is yours to reshape; but not everybody is, and then I'd rather review an honest history with some cruft than a Frankenstein history with odd seams and mismatched stuff in a commit that basically exists because "I kept on prodding git till it worked".
But that's what a merge commit is - the diff along the merge commit is what the squashed commit would be; the only thing squashing does is "forget" that the second parent exists (terminology for non-git VCS's may be slightly different).
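To make that concrete, a small sketch (throwaway repo, made-up names) that merges the same branch once with `--no-ff` and once with `--squash`, then compares the resulting trees and parent counts:

```shell
set -e
# Throwaway repository; all names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "dev@example.com"
git config user.name "Dev"
git config commit.gpgsign false

echo "base" > app.txt
git add app.txt
git commit -q -m "initial commit"

git checkout -q -b feature
echo "feature X" >> app.txt
git commit -q -am "add feature X"
echo "typo fixed" >> app.txt
git commit -q -am "fix typo"

# Variant 1: a real merge commit that keeps the second parent.
git checkout -q -b with-merge main
git merge -q --no-ff -m "Merge feature X" feature

# Variant 2: a squash, which stages the same changes as one ordinary commit.
git checkout -q -b with-squash main
git merge -q --squash feature
git commit -q -m "Add feature X (squashed)"

merge_tree=$(git rev-parse 'with-merge^{tree}')
squash_tree=$(git rev-parse 'with-squash^{tree}')
merge_parents=$(( $(git show -s --format=%P with-merge | wc -w) ))
squash_parents=$(( $(git show -s --format=%P with-squash | wc -w) ))
echo "same tree: $merge_tree / $squash_tree; parents: $merge_parents vs $squash_parents"
```

Both variants end up with identical content; the only structural difference is whether the second parent, and therefore the branch's commits, stays reachable from main.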
Because in big commercial projects several people or teams merge into the main branch on a regular basis. It's easier to look at a history that only includes squashed commits, each encompassing an entire PR (a feature or a bug fix).
It gives you better high-level observability and a good bird's-eye view. And again -- if you need more details you can go and check all separate commits in the PR/branch anyway.
And finally, squashed commits are kind of atomic commits. Imagine history of three separate PRs happening at roughly the same time. And now all those commits are interspersed in the history of the main branch.
How is that useful or informative? It's chaos.
EDIT: my bad, I conflated merging with rebasing. Still, I prefer a single squashed commit for most of the reasons above, plus those of the other two commenters (useful git blame output and buildable history).
Oh yeah, rebasing branches as a "merge" policy is definitely tricky like that. (I mean, I'm sure some people do that, and perhaps with good reason, but it makes this kind of stuff clearly worse.)
This is fine if it's how your team develops, but it's not for everyone. We don't care about full history in branches; maybe it has more detail than main/master but it should still be contextually meaningful. I'd never approve a commit into main with the message "merging PR #xxx" either; it's redundant (merging), has no summary of what it actually does, and relies on an external system (your PR/MR process) for details. I do agree that keeping noise out of your main is key, but would go even further than you to keep it clean AND self-contained.
Greater detail can ultimately lead to less information if noisy commits crowd out the important ones. IME you want a commit to mean something, and that usually leads to tweaking the work/change/commit relationship, which is where squashing helps.
I also want to be able to tell _why_, which is why I dislike working on codebases that squash commits. Too many times I've done a blame to see why a change was made, and it's a giant (> 10) list of commit messages. Oftentimes, the macro description of what was going on does not help me with the line-level detail.
Also, in case it helps you in the future, `blame -wC` is what I use when doing blame; it ignores whitespace changes and tracks changes across files (changes happened before a rename, for example.)
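A quick sketch of the whitespace part, using a made-up two-line file. Plain blame pins the line on the "fix indentation" commit, while `-w` looks through it to the commit that actually wrote the line (`-C` is included to mirror the flags above, though this example has no cross-file moves):

```shell
set -e
# Throwaway repository; file name and commit messages are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "dev@example.com"
git config user.name "Dev"
git config commit.gpgsign false

printf 'if ready:\nrun()\n' > job.py
git add job.py
git commit -q -m "add job runner"
added=$(git rev-parse HEAD)

# A whitespace-only follow-up commit touching line 2.
printf 'if ready:\n    run()\n' > job.py
git commit -q -am "fix indentation"
reindented=$(git rev-parse HEAD)

# The first field of the first porcelain line is the blamed commit hash.
plain=$(git blame --porcelain -L 2,2 job.py | head -n 1 | cut -d' ' -f1)
ignoring_ws=$(git blame -w -C --porcelain -L 2,2 job.py | head -n 1 | cut -d' ' -f1)
echo "plain: $plain"
echo "with -w -C: $ignoring_ws"
```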
Neither does squashing though: you still can't tell if that line was introduced or modified.
I've come across "fix indentation" or "fix typo" commits where a bug was introduced, like someone accidentally committed a change (maybe they were debugging something, or just accidentally modified it).
For example: I'm tracing a bug where a value isn't staying cached. I find a line of code `DefaultCacheAge=10` (which looks way too short) and `git blame` shows the last change was modifying that value from 86400. What I do next will be very different if the commit message says "fix indentation" vs "added new foobar feature" or "reduced default cache time for (reason)".