

Lessons from PostgreSQL's Git transition - corbet
http://lwn.net/SubscriberLink/409635/d605f07a2dc7bf27/

======
timmorgan
GitHub makes a tarball of each repository available for download, which is
nice for those who don't have (or don't want to install) Git.

I understand that the Postgres team isn't using GitHub, but it makes me think
-- could they use an old-school tarball to update the BuildFarm nodes that
cannot run Git? Or better yet, rsync, so only changes are transferred?

Anyway, this is a decent writeup. I've wondered before about large project
migration to Git.

~~~
plaes
IIRC, they even provide a two-way (commit and fetch) Subversion->Git gateway...

~~~
mrduncan
You're correct:

<http://github.com/blog/626-announcing-svn-support>

<http://github.com/blog/644-subversion-write-support>

------
DannoHung
Great pains taken to preserve the full commit history... does anyone know how
valuable it really is? (Not to imply that it is worthless, but how much effort
went into preserving it vs how much effort would be spent if it were not
available). More of a general VCS conversion question, I suppose.

~~~
jacobian
In my experience with Django, the commit history has been incredibly critical.
There's been a good number of bugs that required digging back to nearly the
beginning of the public commit history (July 2005). There have even been a few
bugs that we only really understood after prying into the old private repo
(which was converted from CVS and goes back to 2003).

~~~
kingkilr
+1, you can tell how valuable it is because I always end up crying when I'm
trying to track the history of something and it turns out the change is from a
branch merge that wasn't properly tracked, and thus you can't get the precise
commit that introduced it.

------
jarin
"Yes, that's correct. No merge commits. To submit a patch, extract it as a
context diff and e-mail it. Committers are to apply the patch under their own
names, without branch history. The project has decided, more-or-less, to use
Git like it was CVS as far as commits to the main repository are concerned.
Rather than adapt the PostgreSQL project's workflow to Git, Git would be
adapted to the project's workflow."

I hope they fix that culture issue soon; one of the biggest strengths of Git
is its merging.

~~~
selenamarie
Peter notes in the LWN comments that committers use merging in their personal
repos regularly. The restriction only applies to commits to the master. Many
people share their individual repos and branches to git.postgresql.org (and
GitHub).

The point is to keep our commit history clean while we figure out best
practices for the project.

~~~
Groxx
I've been wondering what the point of keeping a "clean" history is. In what
way is _less_ information better? All this means is you're destroying the
_real_ history, rewriting it to look more like a straight line, which is not
at _all_ how it was developed.

If you want such a history to reflect how it was developed, there are
wonderful version control programs out there. Like RCS. Why not just use that?

~~~
silentbicycle
While I can't speak for the parent comment, one reason people try to keep
clean commit histories in git is that you can easily undo, cherry pick, etc.
individual commits.

If all of the changes necessary to add a feature / fix a major bug are in one
commit, backporting that commit becomes much easier. If those changes are
broken up in multiple commits ("fix this issue" + "oops, forgot that file (and
unrelated changes in the same file)"), then it's more trouble. It usually
doesn't make sense to have the repository in those intermediate states anyway,
once the primary commit is done. History rewriting can make those commits
atomic.

Generally speaking, "clean" git histories have relevant history combined, but
no significant information lost.
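
To make the backporting point concrete, here is a toy sketch. It is not git's
real API: commits are just dicts of file path -> content, and `squash` and
`apply` are hypothetical helpers made up for illustration.

```python
# Toy model only: a "commit" is a dict mapping file path -> new content.
# squash() and apply() are hypothetical helpers, not anything git provides.

def squash(commits):
    """Combine a series of commits into one; later changes win per file."""
    combined = {}
    for commit in commits:
        combined.update(commit)
    return combined


def apply(tree, commit):
    """Return a new tree with the commit's changes applied."""
    new_tree = dict(tree)
    new_tree.update(commit)
    return new_tree


fix = {"parser.c": "fixed parser"}
oops = {"parser.h": "updated header"}  # the "oops, forgot that file" follow-up

stable_branch = {"parser.c": "old parser", "parser.h": "old header"}

# Backporting only the first commit leaves the branch half-fixed.
half_fixed = apply(stable_branch, fix)

# Backporting the squashed commit brings the whole fix over in one step.
atomic_fix = squash([fix, oops])
fully_fixed = apply(stable_branch, atomic_fix)
```

With the two commits kept separate, whoever backports has to know that both
belong together; with the squashed commit there is only one thing to pick.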

~~~
chousuke
I don't think there's any reason you couldn't have both merges and a clean
history.

IIRC the git project itself accomplishes this by having the maintainer form
topic branches from patch series sent to the mailing list (a single patch must
still be a self-contained change, so "cleanliness" and bisectability are
retained) and then merge those topic branches into whatever branch they're
approved for: "pu" for "proposed updates", "next" for the next "major" release,
and "master" for the stable branch. The system is interesting in that "pu"
gets completely rebased regularly while "next" and "master" have immutable
history; and "master" is periodically merged into "next", so that any fixes for
the stable branch get propagated into "next" as well.
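
Why bisectability matters, as a rough sketch: if every commit is a
self-contained change that can be tested, you can binary-search the history
for the first bad one. This is only the idea, not how `git bisect` is
implemented; `first_bad` and `is_bad` are hypothetical names, and it assumes
a linear history where the first commit is good, the last is bad, and the
bug never disappears once introduced.

```python
def first_bad(commits, is_bad):
    """Return the index of the first commit for which is_bad() is true.

    Assumes commits[0] is good, commits[-1] is bad, and that every commit
    after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the culprit is at mid or earlier
        else:
            lo = mid  # the culprit is after mid
    return hi


history = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
broken_since = 5  # pretend the bug was introduced in c5

culprit = first_bad(history, lambda c: history.index(c) >= broken_since)
```

If intermediate commits don't build or aren't self-contained, `is_bad` can't
give a meaningful answer for them, which is where the search breaks down.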

~~~
silentbicycle
Indeed. I don't understand the problem the Postgres people had with merges.
I'm sure they have their reasons, though.

When people talk about clean commit histories in git, they usually mean
rebasing / merging topic branches atomically like we said, though, and Groxx's
comment asked about the rationale.

------
dochtman
Everyone here should subscribe to LWN and get this kind of content sooner.

