
Linus on keeping a clean git history (2009) - pushingbits
http://www.mail-archive.com/dri-devel@lists.sourceforge.net/msg39091.html
======
lukev
This highlights the only thing I don't like about Git. It's an immensely
capable tool, but it gives no guidance regarding the _right_ way to do things.

Our own teams have a set of practices which are similar to, but different from,
what Linus outlines here. And different projects at my company use different
practices from those.

The worst thing is that there's no way of enforcing these workflows or
practices other than out-of-band social conventions. And so minor mistakes
happen, all the time. Our Git projects are never as pretty as they should be.

In other words, Git provides an awesome set of primitives for source control.
I'm not sure what it'd look like, but I'd like to see a product that built on
those primitives to enforce a little more order on projects.

~~~
LnxPrgr3
On a repo I've been maintaining, I'm horribly tempted to revoke almost
everyone's commit access, for two reasons: to add a code review step to the
process, and to be able to keep the commit history reasonably clean.

It's low-tech, but a human gatekeeper's really your only hope for enforcing
whatever conventions your project has.

~~~
mattdeboard
Based on what I've seen from popular open-source Python projects I've used in
the past (Fabric, Haystack, and basically anything else by Daniel Lindsley),
having a _single_ human gatekeeper is the express lane to hell. If you do that,
make sure you have at least a few core contributors who can approve commits.

~~~
qznc
Linus is the single gatekeeper of Linux.

~~~
EternalLight
No, he isn't. He's the final gatekeeper, but there are several lower-level
gatekeepers/maintainers for different parts.

At least according to rumor; I've never invested the time to get involved
myself.

~~~
dfc
There are an awful lot of maintainers for smaller individual pieces/subsystems
of the kernel. Take a look at the MAINTAINERS file to see who is responsible
for the smaller chunks:

    grep -B1 ^M: /usr/src/linux-kernel/MAINTAINERS

------
mattdeboard
Like lukev said, git is "an awesome set of primitives". How you build a
workflow out of those primitives isn't set in stone (though, like most things,
Linus has strong opinions on exactly how to use his products). This is
basically what Github has done, with an extra layer of UI glitz, social
features, and (much-improved) notifications.

That said, IMO there is still quite a lot of room for customization in git
workflow when using Github. For example, we don't "send patches around" as
Linus says. Our private feature branches live on Github but we've adopted the
convention that the "private" branch name is prefixed by who's working on it,
e.g. mdeboard-oauth, jschmoe-url-routes. If it has someone's name at the
front, don't touch it. That enables us to still use the "D" in DVCS while
retaining the ability to safely rebase our own work to keep our history clean.

The only reason I'd want a git-based product to "enforce order" is a culture-
related one: ensure that contributors/collaborators do things in line with the
conventions we've established. However, IMO it's always better to have a
conversation about that than work with an overly prescriptive tool.

------
silverlake
I'm still new-ish to git and don't get why rebase is popular. If I do my work
on a branch B, I can merge this branch into the master M. The merge commit will
have a succinct message like "Bug Fix #1". You can print the history so it only
shows these merge messages and not the messy history in the branches. Isn't
this the same as rebase? That is, rebase _removes_ the messy branch history,
whereas I'd prefer to keep that history but rarely use or display it. bisect
can also ignore those branches and only use the merge points. Saving the branch
history shouldn't be a problem. What am I missing?

~~~
Jacquass12321
There are two major things I really gain out of rebasing frequently.

First and most importantly, thanks to rebase I'm constantly working against
the most recent mainline. Merge pains are reduced by frequently dealing with
smaller rebase merges instead of trying to do one massive merge at the end,
when I'm finished with a longer-lived task that might last a week or two. The
more often you merge, the less painful it is.

Second, there's the history-cleaning part, involving squashing. I believe the
issue with your approach is that viewing only the merge history of the
mainline will miss changes that were introduced by fast-forward, without a
merge commit. And frankly, no one else on the team cares that I committed 6
times in the process of one task; they want to see all the code relevant to
that task, and ideally it's all in one change set.

There's a pretty reasonable summary over here
<http://blog.sourcetreeapp.com/2012/08/21/merge-or-rebase/>

For certain teams rebase just makes a lot of sense.
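Concretely, the loop from the first point looks something like this toy sketch
(repo and branch names are made up; assumes git >= 2.28 for `init -b`):

```shell
# Simulate a mainline that moves while a feature branch is in progress.
tmp=$(mktemp -d) && cd "$tmp"
git init -q -b master upstream
(cd upstream &&
  git config user.name demo && git config user.email demo@example.com &&
  echo a > a && git add a && git commit -qm "base")
git clone -q upstream work && cd work
git config user.name demo && git config user.email demo@example.com
git checkout -qb feature-x
echo f > f && git add f && git commit -qm "feature work"

# Meanwhile, someone lands a commit on the mainline...
(cd ../upstream && echo b > b && git add b && git commit -qm "mainline change")

# ...so replay the feature branch onto the latest mainline now,
# while the difference is one small commit, not a week of drift.
git fetch -q origin
git rebase -q origin/master
```

Any conflicts surface during the rebase, one small commit at a time, instead
of all at once in a final merge.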

~~~
jrochkind1
> merge pains are reduced by frequently dealing with smaller rebase merges
> instead of trying to do one massive merge at the end when I'm finished with
> a longer life task that might last a week or two. The more often you merge
> the less painful it is.

You can take care of that just by doing frequent regular merges; there's no
need to ever rebase, and rebase doesn't make this part any easier, does it?

I think the 'cleaning part of history', and trying to avoid those annoying
merge commits in the logs, is in fact the only reason to do rebases, no? It's
obviously an important one to many people.

------
smithzvk
So I'm relatively new to version control entirely, but in the last few years
my group has been making a big push to institute Git. I have been wondering
lately, however: how much history cleaning is expected/desirable?

When I develop, I split my commits into as many small changes as I can so that
the commit messages are single-topic. I thought that was basically the idea.
Every once in a while I use rebase to combine a few commits that should have
been done together, as they all addressed the same issue. This all seems right
to me. I am left with a clean history of everything I have done on a very
fine-grained time scale. But the large number of commits, each with little
significance to the whole program, hides the large-scale structure of the
development.

However, I could use rebase to start combining loosely related commits,
trading the time resolution for clarity in the commit history. There seems to
be a continuum along this scale. Where is the proper place in that continuum
to say this is clean enough? Also, I don't like making changes where I am
losing perfectly good information.

I know that I can group certain commits by defining a branch, developing on
it, then merging (non-fast-forward) back to the original. The branch should
keep the grouping in the commit history. I suppose this can even be done after
the fact using rebase with the proper amount of git-fu. Are branching and
non-fast-forward merges the preferred method of grouping related commits in
the history?

If so, this seems troubling, as it means that partially fixing something is
difficult to do with a clean history. Until the piece of the program you wish
to fix is completely working, it shouldn't be merged into master, because that
would ruin the grouping of the related commits. This means that there can't be
any partial thoughts, like fixing bugs as you find them, because presumably
you might want to group all bug fixes of a function together but have a
distinct commit for each.

Now I'm more confused than when I started. Seriously, any references or advice
on this sort of topic are welcome.

~~~
wickedchicken
> However, I could use rebase to start combining loosely related commits,
> trading the time resolution for clarity in the commit history.

In general, your commits should be the smallest atomic operation that makes
sense. When people talk about 'clean history,' they're talking about the
workflow git enables:

1. Write half-written broken code.
2. Fix that code up.
3. Add some more onto that.
4. Fix a typo!
5. Forgot to update the README.

Now, you _could_ push that to master, but then master is littered with commit
messages like 'oops' and 'typo.' Instead, you can rebase commits 1-5 onto the
latest master, squash them together, and have one 'nice' commit that only
contains the cleaned-up final changes.

This is one of the most powerful things about git: in a private repo, you can
commit all kinds of garbage and half-written stuff without caring. When you
want to make your stuff public, rebase and squash, then send it out. Be
careful though! Only rebase your own private branches, or you're gonna have a
bad time™.
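A minimal sketch of that squash step (toy repo and made-up messages; assumes
git >= 2.28, and `git reset --soft` is used here as a scriptable stand-in for
`rebase -i` plus squash):

```shell
# Throwaway repo: three messy commits get collapsed into one
# clean commit before publishing.
tmp=$(mktemp -d) && cd "$tmp"
git init -q -b master
git config user.name demo && git config user.email demo@example.com
echo base > f && git add f && git commit -qm "initial"
echo v1 > f && git commit -qam "half-written"
echo v2 > f && git commit -qam "oops"
echo v3 > f && git commit -qam "typo"

# Interactively you'd run: git rebase -i HEAD~3  (marking two commits
# as "squash"). The scriptable equivalent: soft-reset keeps the final
# tree staged, then commit it once with a proper message.
git reset --soft HEAD~3
git commit -qm "Add the feature, cleaned up"
git log --oneline
```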

~~~
smithzvk
Okay, that is basically in keeping with my current understanding (though I'm
not sure how much I live up to the "only have working history in the public
repo" rule).

There is the other issue I raised, however: is there a good way to group a
series of commits that happen to be towards a single distinct goal? Using
branches is a clear step in that direction, but it seems like a nightmare to
perform a rebase like you described if the commits are mixed and I would like
the end result to involve grouping via branches. That was confusing; hopefully
this will clear it up:

1. Bugfix in function1.
2. Bugfix in function2.
3. New feature in function2.
4. Bugfix in function1.
5. Bugfix in function2.

...and we want in the end:

        /-- 1 ---- 4 ---\
    ---<                 >--HEAD
        \- 2 -- 3 -- 5 -/


Can rebase do this easily? Is this a good idea (it seems like it is to me)?
The programmer would have to confirm that the code works at every state.

~~~
lmm
Switching branches is cheap; I'd say the "right" way to get a tree like you
want is to have two or even five branches going the whole time you're working.
But I suspect you could make two branches and cherry-pick different sets of
commits onto them to get the result you're after. To my mind it wouldn't be
worth the effort, though; how often do you really care whether the code worked
with only 1 and 4 applied?
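If you did want to try it, the cherry-pick version would look roughly like
this toy sketch (all names made up; assumes git >= 2.28):

```shell
# Throwaway repo: five mixed commits regrouped onto two topic
# branches with cherry-pick, then merged back.
tmp=$(mktemp -d) && cd "$tmp"
git init -q -b master
git config user.name demo && git config user.email demo@example.com
echo f1 > function1 && echo f2 > function2
git add . && git commit -qm "base"

echo fix-a >> function1 && git commit -qam "1: bugfix in function1"
echo fix-b >> function2 && git commit -qam "2: bugfix in function2"
echo feat >> function2 && git commit -qam "3: new feature in function2"
echo fix-c >> function1 && git commit -qam "4: bugfix in function1"
echo fix-d >> function2 && git commit -qam "5: bugfix in function2"

base=$(git rev-parse HEAD~5)
c1=$(git rev-parse HEAD~4); c2=$(git rev-parse HEAD~3)
c3=$(git rev-parse HEAD~2); c4=$(git rev-parse HEAD~1)
c5=$(git rev-parse HEAD)

# One branch per group, both starting from the base commit.
git checkout -qb fix-function1 "$base"
git cherry-pick "$c1" "$c4"
git checkout -qb work-function2 "$base"
git cherry-pick "$c2" "$c3" "$c5"

# Merge both groups; the final tree matches the original tip, but the
# history now has the two-lane shape from the diagram.
git checkout -qb grouped "$base"
git merge -q --no-ff -m "merge function1 fixes" fix-function1
git merge -q --no-ff -m "merge function2 work" work-function2
```

This only applies cleanly because each group touches its own file; with
genuinely interleaved edits the cherry-picks would conflict.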

~~~
smithzvk
Right, I would say that it isn't worth the effort. Also, I probably never care
about the code with only 1 and 4 applied. So perhaps branches aren't the right
way to do what I am describing.

I always saw VC as a systematic way to keep a log of my development so that I
could figure out where I may have broken my code. For this purpose, having
some sort of meta-data where commits can be grouped would be nice. It would
also work to do something like always end my commit messages with some kind of
meta-data tag that I could grep the log for. I was just wondering if there was
a prescribed/built-in way for Git to handle this.

~~~
lmm
git-bisect is the standard tool for figuring out where you broke something. I
don't know what it does with branching histories though, I tend to effectively
linearise my history by rebasing each branch on the trunk head before merging
it.
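For what it's worth, on a linear history the whole hunt can be automated with
`git bisect run`; a toy sketch (all names made up; the "bug" here is just a
grep-able line; assumes git >= 2.28):

```shell
# Throwaway repo: commit 4 of 6 introduces a "bug" line, and
# `git bisect run` finds it automatically.
tmp=$(mktemp -d) && cd "$tmp"
git init -q -b master
git config user.name demo && git config user.email demo@example.com
for i in 1 2 3 4 5 6; do
  echo "rev $i" > f
  if [ "$i" -ge 4 ]; then echo bug >> f; fi
  git add f && git commit -qm "commit $i"
done

# Bad revision first, then a known-good one.
git bisect start HEAD HEAD~5
# git checks out midpoints and runs the test; non-zero exit = bad.
git bisect run sh -c '! grep -q bug f'
git bisect reset
```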

------
easy_rider
Funny. I was just finishing a chat with a colleague about a git strategy for a
coming new release of a production product, then saw this post on top. I've
been working on it without collaboration for about half a year now, so that's
easy. I've had mixed experience with both rebase and pull strategies before
that. I've found rebasing to be a lot better when working with tightly coupled
code, and pulling to be a lot cleaner for being able to cherry-pick and revert
to previous states more easily. rebase is indeed a destroyer.

We've now decided to use this model, while only deleting feature branches
after RC acceptance.

<http://nvie.com/posts/a-successful-git-branching-model/>

My colleague just suggested rebasing regularly from the develop branch while
developing features: "I'm working on a branch. Someone - e.g. you - updates
the develop branch. I will have no info on whether that is related to my stuff
or not, so I should rebase regularly onto the latest version of the develop
branch."

I'm kinda clueless now. Git is really powerful and flexible in its strategies,
and that adds to complexity.

------
leeoniya
here's a more recent rant:
[https://github.com/torvalds/linux/pull/17#issuecomment-56599...](https://github.com/torvalds/linux/pull/17#issuecomment-5659933)

------
jrochkind1
oh yeah, perfectly straightforward, only took several thousand words to
confusingly explain.

Nope, not simple. Yep, this is a git usability problem.

In the ruby/github world, people generally violate this and DO rewrite
'public' history in order to get 'cleanness', primarily because almost ALL
history is 'public', since you tend to show people work in progress on github,
or just push it there to have a reliable copy in the cloud. And yes, this
sometimes leads to madness.

------
chris_wot
Unintentional contradiction two messages down the thread: Linus says "But
note: none of these rules should be absolutely black-and-white. Nothing in
life ever is."

Or perhaps intentional. I can never tell when I read a Linus fiat.

[http://www.mail-archive.com/dri-devel@lists.sourceforge.net/...](http://www.mail-archive.com/dri-devel@lists.sourceforge.net/msg39094.html)

------
mibbitier
git is so overly complex (coming from svn).

~~~
pm215
I think that for people with an svn background there are three different
issues that all hit at once:

* distributed rather than centralised version control brings a new set of concepts to understand

* git is flexible enough to support many different workflows. This means you have to actually _choose_ one, and choice is difficult especially when you're just trying to get to grips with a new tool. svn has much more of a "one standard way to do it" approach

* git's UI is in places confusing, inconsistent and occasionally just randomly and unnecessarily different from most other version control systems

The first two are 'essential complexity'; the third is more 'accidental
complexity'. In any case I feel it's having to deal with all three sources of
confusion that makes the svn->git transition tricky for many people.

~~~
mibbitier
Don't most people actually end up using git in a centralised manner, though?
E.g., the rise of github.

I can totally see git is ridiculously powerful, and general purpose. I just
wish it'd default to what most people want a bit more.

~~~
kbolino
"Distributed" is not the same as "ad hoc". In virtually all workflows, whether
using distributed or centralized RCS, there will be a master copy. The
difference between distributed and centralized is whether that master copy is
the _only_ copy.

------
gosub
git needs a "git propaganda" command. Instead of changing history, it would
tell it in a different manner.

------
3825
I've heard some of these words...

------
jebblue
I have tried to get git. Some people say one project per repo (which seems
crazy, but I did it); others say many projects are OK. You do need a main
master repo; no, you don't need one. Then there's the half dozen commands
where with SVN it's one.

Now the most valuable thing to me in source control, history, I'm supposed to
keep clean? That's like a sacred cow: you _don't_ mess with history.

>> That's fairly straightforward, no?

No, _Linus_, it isn't. Git is hard to get right. If it weren't for EGit I'd be
lost. I tried Canonical's bzr, and it is more understandable for ordinary
humans.

All that aside I really like Linux. :)

~~~
klj613--
The best way to learn git is on the command line (get away from any GUI).
Then play with repositories to see what the commands actually do.

"Don't mess with history"? I don't have to commit to my commits as long as my
commits ain't public.

Rewriting history is a lie? Well, if you want to keep everything you do in
history, maybe commit on each keystroke? That's insane.

Don't commit unless you're ready to commit? Then it'd be hard to keep track
of. Come time to commit, you've got 50+ files modified; good luck writing
decent commit messages.

~~~
jebblue
>> Best way to learn git is in the command line (get away from any GUI).

I've used a lot of source control systems and the best always have a GUI and
so guess what? I want a GUI unless the CLI for such system is inherently
intuitive which if you read my comments I do not think git is intuitive at
all.

>> I don't have to commit to my commits as long as my commits ain't public.

Huh?!?! I don't get that, it like makes no sense to me whatsoever. Why do you
think I should even try to comprehend it?

>> Don't commit unless you're ready to commit?

Are you suggesting I said or asked that??? Are you advising me? Seriously
what?

>> Then it'd be hard to keep track of. Come time to commit, you've got 50+
files modified; good luck writing decent commit messages.

Huh? I'm sorry, is that English? It doesn't make sense to me at all. Is it 50
lines changed, all clearly related? Is it 50 totally different changes?

~~~
klj613--
Personally, I think it's better to use the CLI for git.

I commit very often, but I rewrite the commits. In other words, I mess with
my history, and it is a good thing (my commits ain't final; in other words,
"I do not commit to my commits").

In SVN I try not to commit too often, because I do not want to commit
(publish) changes which I may not want to keep.

With git I commit very often, in stages. Then I can remove or change those
commits at a later stage. If I do not do this, I will end up with a load of
modified files (e.g. 50+) and either commit them all in one (bad) go or try
to separate out each step I've taken over the past 12 hours and write decent
commit messages (good).

Of course, you could commit very often and create new commits to fix mistakes
you've made in recent commits (rather than rewriting history). You could also
merge master into feature-x every day (rather than rebasing), but then you'd
have history which looks like chaos and is hard to follow.

-

Honestly, when I started with git I was lost (it was the first VCS I learnt).
Until one day I figured out how simple git is to use.

