Theodore Ts'o on how he uses Git when working on Linux (2017) (kernel.org)
48 points by nativecoinc on March 5, 2023 | 7 comments



The OP[1] wants the best of both git-merge and git-rebase: to be able to rewrite history while also having a new “replaces” pointer which points back to the commit as it was before the rebase (basically).

> I've been calling this proposal `git replay` or `git replace` but I'd like to hear other suggestions for what to name it. It works like rebase except with one very important difference. Instead of orphaning the original commit, it keeps a pointer to it in the commit just like a `parent` entry but calls it `replaces` instead to distinguish it from regular history. In the resulting commit history, following `parent` pointers shows exactly the same history as if the commit had been rebased. Meanwhile, the history of iterating on the change itself is available by following `replaces` pointers. The new commit replaces the old one but keeps it around to record how the change evolved.

Ts'o thinks[1] that this is too simplistic, citing some workflows he has experience with from working on Linux. He says that they use metadata in the form of key-value pairs in the commit messages to track cherry-picks across trees, record how to test the commit, and even note that a commit has been dropped in the current tree.[3]

> My experience, from seeing these much more complex use cases --- starting with something as simple as the Linux Kernel Stable Kernel Series, and extending to something much more complex such as the workflow that is used to support a Google Kernel Rebase, is that using just a simple extra "Replaces" pointer in the commit header is not nearly expressive enough. And, if you make it a core part of the commit data structure, there are all sorts of compatibility headaches with older versions of git that wouldn't know about it. And if it then turns out it's not sufficient for the more complex workflows anyway, maybe adding a new "replace" pointer in the core git data structures isn't worth it. It might be that just keeping such things as trailers in the commit body might be the better way to go.
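
For illustration (the subject line, hashes, and email below are made up), this kind of key-value metadata lives as trailers at the end of the commit message. The "cherry picked from" line is what `git cherry-pick -x` appends; Fixes: and Tested-by: are existing kernel conventions:

    ext4: fix a hypothetical race in the writeback path

    (longer description of the change)

    Fixes: 1a2b3c4d5e6f ("ext4: the earlier commit this one repairs")
    Tested-by: Some Developer <dev@example.com>
    (cherry picked from commit 9f8e7d6c5b4a...)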

[1] https://lore.kernel.org/git/CALiLy7pBvyqA+NjTZHOK9t0AFGYbwqw...

[2] Submission link

[3] How? By making an “empty commit” (I presume: a commit which doesn’t change the tree) and adding the “dropped” metadata to the commit message.
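
If that guess is right, the mechanics are just this (the "Dropped:" key and the message text are hypothetical, only to show the shape of it):

    # record that a change was dropped from this tree, without touching any files
    git commit --allow-empty -m "Drop: frobnicate fix" \
        -m "Dropped: 9f8e7d6c5b4a (no longer applies after the rework)"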


There are two competing needs to be considered when figuring out what your workflow should be in regard to history.

Both come from the fundamental question: “When (if) we look back in history, what are we looking for?” Keeping everything as it was reduces the risk of deleting something that will later turn out to be important; consolidating is supposed to reduce the risk of missing the needle in the haystack, or of the noise discouraging looking back at all.

Curating the past is 99% wasted effort since looking back is rare. I think the best compromise is to add some automation if you really care, as Ted suggested.


>Curating the past is 99% wasted effort since looking back is rare.

This is the worst kind of self-fulfilling prophecy. It is exactly like saying 'tidying your home is not worth it because you will end up searching for things anyway'. For history to be useful, you need proper atomic commits and useful messages, the same way a home needs good organization; otherwise looking back has little added value. And it is something with little overhead once the discipline is integrated into your workflow.

The only reason you wouldn't need a good history is if your git repository is 100% bug-free. Otherwise you need to understand why and how a bug was introduced: is some weird piece of code handling a very specific edge case, or was it just poorly written? Is the bug generalizable? That is often something you can see quickly just by knowing whether it was introduced in a local commit or in a refactoring one.
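
Concretely, the "looking back" here is just the standard digging that a good history makes worthwhile (paths and search strings below are hypothetical):

    # who last touched these lines, and in which commit?
    git blame -L 120,140 fs/ext4/inode.c

    # when did this odd special case first appear?
    git log -S 'weird_special_case' --oneline

    # which commit introduced the regression?
    git bisect start <bad-commit> <good-commit>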

Code-wise, 'history-obliviousness' (that is, the lack of proper Git hygiene) is among the worst banes of programmers, I believe.


I guess "curate" could be interpreted either way. It would seem possible you are both arguing for preserving history but interpreting the word differently...


I agree the parent comment is making a great point, but I disagree with the conclusion that "curating the past is wasted effort": it is tremendously useful if done right, and thinking that way is precisely what creates the problem, because a bad history is useless. You can argue the same about documentation: bad documentation is not read, so it is "useless" to write it (when in fact you just have to write it properly).

The definition of curate is 'to apply selectivity and taste to' a collection, so I'd say it does mean both. I have not built a theory of what is most useful, but let's take the two extremes. On one end you have 'push-only' development, which commits bad program states along with all their fixes; it's bad because it adds way too much noise to the history. On the other end you have 'squash-only' development, where one polished feature lands as a single commit; that's a huge diff that carries little more information than the code itself and loses all the sub-feature milestones and discussions, so it is mostly useless.

In a way, imagine you have to teach something by demonstration. You don't want the student to get lost because you keep screwing up the details, and you need to chunk the information into a set of simpler, well-articulated parts. If done well, your git history carries the information about your process in much the same way.

You have to be somewhere in the middle, so I'd say do a semantic rebase as the last step before merging. A fantastic tool that is not so well known is git-absorb, which helps a lot with doing that cleanly and automatically:

https://github.com/tummychow/git-absorb
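
For anyone who hasn't used it, the flow is roughly this (treat the exact invocation as an approximation; `<base>` is whatever branch you are about to merge into):

    # stage the small fixes made while addressing review comments
    git add -p

    # git-absorb creates fixup! commits targeting the earlier commits
    # that each staged hunk belongs to
    git absorb

    # then fold the fixups back into their target commits
    git rebase -i --autosquash <base>

If I remember correctly there is also an --and-rebase flag that does the last step for you.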


If you search through the commits on the Git project you’ll notice that they often reference previous commits in their commit messages. So yes: past history often comes in handy.


There’s another concern: repository bloat.[1] Ts'o does not want this link to be a mandatory part of the final commit which everyone needs to get on every pull.[2]

His own proposal does not necessitate keeping “history blobs” around: just use Git commit metadata (trailers) and leave the pointed-at data (beyond the cherry-pick backlinks) in an external store like Gerrit.

I think other commenters who suggested no-changes-to-core-git solutions might have mentioned git-notes, which fits the external-store point: notes are completely optional refs, so if you have notes on your commits in your tree, no one else needs to know about it; those who want the metadata can fetch it, and those who don't can save their bandwidth.
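
A rough sketch of how that looks in practice (the note text is made up; a review-metadata setup would presumably use its own ref under refs/notes/ rather than the default):

    # attach metadata to an existing commit without rewriting it
    git notes add -m "Reviewed-in: <Gerrit change URL>" <commit>

    # notes live under refs/notes/* and are not fetched by default;
    # only people who want them pull them explicitly
    git fetch origin 'refs/notes/*:refs/notes/*'

    # once fetched, git log/show display notes from the default ref
    git log -1 --notes <commit>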

[1]:

> If the complaint about Gerrit is that it's not a core part of Git, the challenge is (a) how to carry the code review comments in the git repository, and (b) do so in a way that it doesn't bloat the core repository, since most of the time, you don't want or need to keep a local copy of all of the code review comments going back since the beginning of the project.

[2]: I'm guessing that people who want a “replaces” link would also want to make it optional, in keeping with the best-of-both-worlds mantra.



