

SQLAlchemy Migrated from Mercurial to Git  - dbader
http://www.sqlalchemy.org/blog/2013/05/25/sqlalchemy-migrated-to-git/

======
ezquerra
Every project has the right to choose the VCS that they feel is right for them.
However I find the reasons they give for the switch are pretty weak except for
the last one (github is more popular).

1\. Repository size: Such size differences are uncommon, and most likely due
to the fact that the git repo was a brand new repo created from scratch by
converting the original hg repo. If they repacked the mercurial repo it would
probably be greatly reduced in size. They could also do:

    
    
        hg convert --config "extensions.convert=" --branchsort sqlalchemy sqlalchemy-smaller
    

Which results in a reduction of the repository size from 45 MB in the original
to 24 MB in the converted repo. This is not as good as git but quite close
IMHO.

2\. This is not correct: Mercurial bookmarks _used_ to be an extension (a long
time ago!) but they have been built into mercurial for a long time now. They
are basically equivalent to git branches. So nowadays this is just a process
issue. If using named branches is a problem just tell your devs to not use
them and use bookmarks instead!

3\. Mercurial's rebase and mq extensions are part of core mercurial. They are
packaged and distributed with mercurial, their test suites are part of the
mercurial test suite and they are supported and developed by the core
mercurial developers. The only difference between a core command and one of
the official extensions is that these must be manually enabled to be used (by
adding a line to the mercurial config file). This is a good thing! This makes
it harder to mess up your repo if you do not know what you are doing. If you
are not knowledgeable enough to find and edit the mercurial config file you
should definitely not be using rebase or collapsing your changesets.

The last reason on the other hand is one that I can understand and which I
think has merit. Is it worth the hassle of changing their DVCS system?
Perhaps. However it bothers me when legitimate _social_ reasons to switch from
mercurial to git are supported by not quite as clear cut technical reasons
such as the ones put forward in this blog post.

~~~
zzzeek
> If using named branches is a problem just tell your devs to not use them and
> use bookmarks instead!

but they do, because all the devs know git and are only using mercurial
because they want to send you a pull request, and now their feature branch is
_forever_.

Mercurial is unforgiving of mistakes - once in the repo, they are forever.
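
To make the "forever" part concrete, here is a toy Python model. It is not Mercurial's real storage format; the `Changeset` class and `changeset_id` helper are made up for illustration. The only thing it assumes is that a named branch is recorded inside each changeset, while a bookmark is a movable pointer kept next to the history, much like a git branch.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Changeset:
    """Hypothetical, simplified changeset; real Mercurial stores more fields."""
    parent: Optional[str]
    message: str
    branch: str = "default"  # the named branch lives *inside* the changeset


def changeset_id(cs: Changeset) -> str:
    # The branch name is hashed together with everything else, so it cannot
    # be renamed or dropped without producing a different changeset.
    payload = f"{cs.parent}|{cs.branch}|{cs.message}".encode()
    return hashlib.sha1(payload).hexdigest()


on_named_branch = Changeset(parent=None, message="add feature", branch="my-feature")
on_default = Changeset(parent=None, message="add feature")

# Same change, different branch label: the two identities differ.
ids_differ = changeset_id(on_named_branch) != changeset_id(on_default)

# A bookmark is just a name -> id mapping stored outside the history.
bookmarks = {"my-feature": changeset_id(on_default)}
del bookmarks["my-feature"]  # deleting it leaves the changeset untouched
```

In this sketch, getting rid of the bookmark costs nothing, whereas getting rid of the named branch would mean rewriting the changeset itself.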

As for MQ, I've really tried to understand it, but it's just profoundly
awkward IMHO. Mercurial's main benefit is that it's much easier to use and
understand than git, so if I'm going to need to fire up the extra brainpower,
I might as well learn git where these advanced patterns are extremely
commonplace, and are part of the core functionality, not as a series of
optional extensions that have been bolted on over the years. I've almost never
seen anyone using MQ or rebase with mercurial (or even bookmarks for that
matter).

I've tried carefully in my post to express that my points are all based on an
overlap of social with technical issues. I am aware that _technically_ , hg
has answers for all these things. But these answers don't pan out in the real
_social_ world, for better or for worse. If git didn't exist, I'm sure there
would be a much larger knowledge share for mercurial, and I was even betting
on that in the beginning (certainly the "easier" tool will be more popular?)
but it hasn't turned out that way.

~~~
dlitz
> Mercurial's main benefit is that it's much easier to use and understand than
> git

That's really only true if you're a beginner _and_ you weren't properly taught
how git actually works. Fundamentally, git's data model is really simple and
transparent. The commands are somewhat arbitrary, but no more so than Vim's or
Emacs's. As with Vim and Emacs, you don't need to know most of the commands
anyway, except for convenience.

~~~
marssaxman
The simplicity of git's data model is the root of the problem; that is the
reason it is hard to learn and hard to use. It operates at a lower level of
abstraction than the level we are thinking at when we are actually developing
code. Git is less of a version-control system and more of a library one could
use to build one, but in absence of any canonical high-level git wrapper, we
all end up implementing and running one in our heads. This is a waste of
brain-power.

~~~
lotyrin
To say I disagree is a massive understatement. I've seen teams try to use
wrappers and just mess everything up because they don't understand what's
actually going on in the underlying system.

Git has the concepts it has because they're all important. It shows them all
to you unfiltered because doing so is vital.

~~~
marssaxman
HG seems to manage the same problem with less complexity.

------
zalew
> Largely due to the popularity of Github, Git has achieved a much higher
> userbase, to the degree where we regularly have users requesting us to move
> to Git so they can provide pull requests.

I too am a Mercurial refugee and prefer Git, but how is that even a reason?
People don't want to contribute if it's on a Mercurial repo?? Python seems to
be doing fine on HG.

Meanwhile, thanks for your amazing work on SqlAlchemy(&co), zzzeek!

~~~
gamache
I think what those users mean is that they want the project on Github. There's
no concept of "pull requests" within Git.

Anyway, most Git users aren't Mercurial refugees; I wager we're Subversion
refugees and kids.

~~~
zalew
> There's no concept of "pull requests" within Git.

<http://git-scm.com/book/ch5-2.html> "run the git request-pull command and
e-mail the output to the project maintainer manually."

[http://www.wired.com/wiredenterprise/2012/05/torvalds_github...](http://www.wired.com/wiredenterprise/2012/05/torvalds_github/)
"Git comes with a nice pull-request generation module, but GitHub instead
decided to replace it with their own totally inferior version."

> I think what those users mean is that they want the project on Github.

that would be even worse. luckily as zzzeek explained below, it wasn't the
case.

~~~
gamache
Wow. I had no idea about git pull requests, so I even googled it, and didn't
see it on page one. My mistake, thanks!

------
SEJeff
Very smart move. Some time ago Riak did the same thing. Their rationale was
more detailed, but came to the same conclusion:
[http://basho.com/a-few-more-details-on-why-we-switched-to-gi...](http://basho.com/a-few-more-details-on-why-we-switched-to-github/)

------
develop7
1\. What a huge saving — 0,002¢ per repo copy. Don't spend it all in one
place.

2\. Bookmarks have been a core feature since March 2011. This basically means
that those who pushed the "move to Git" decision haven't checked whether
anything in Mercurial (bookmarks in particular) has been updated since then, at
least. This is pretty understandable: it is well known that the only software
that gets updated is the software you're paying attention to; the rest doesn't.

3\. As ezquerra mentioned, the extensions Mike is probably referring to are
shipped with Mercurial, and are enabled by adding a single line to .hgrc.
Not to mention that those extensions do not _emulate_ Git features, they
_reproduce_ them.

4\. The only reason that makes sense.

------
etanol
Too bad there are no comments on the blog:

    
    
        SQLAlchemy's issue repository will remain hosted on Trac;
        while a Git repository can be mirrored in any number of
        places, an issue repository cannot (for now! Can someone
        please create a distributed issue tracker? Should be
        pretty doable, though getting Github/Bitbucket to use it,
        not so much...), so SQLAlchemy's long history of issue
        discussion remains maintained directly by the project.
    

<http://fossil-scm.org> !!!

~~~
qznc
There are tools like Bugs Everywhere [0], which mix well with distributed VCS.
However, the question is if distributed bug tracking makes sense.

[0] <http://bugseverywhere.org/>

~~~
travisb
There are many different projects which implement distributed bug tracking[0],
some better than others. The question of how much distributed bug tracking
makes sense depends strongly on the structure of the project. If the project
is developer heavy (such as developers responding to user help requests) it
can work well, but if the project has several strata of developers and support
people and users then it might not make as much sense.

[0]
[http://travisbrown.ca/blog.html#TooMuchAboutDistributedBugTr...](http://travisbrown.ca/blog.html#TooMuchAboutDistributedBugTracking2013-04-20-83)

------
trosenbaum
Would something like Kiln Harmony work for storing the main upstream
repository?

<https://secure.fogcreek.com/kiln/>

It seems that if KH translates seamlessly between HG and Git, a project
could accept pushes from both, right?

I assume since KH is not open source / free that this would not be an
acceptable method for maintaining the main repo, or that the translation would
introduce interesting wrinkles, but it does seem like one could potentially
have his cake and eat it too.

~~~
qznc
Thanks to hg-git, Mercurial users should have no problem using the git
repository.

~~~
gecko
hg-git and Kiln Harmony serve different, though overlapping, purposes.

hg-git allows people who know both Git and Mercurial, but who prefer
Mercurial, to work with Git repositories from Mercurial. No effort is made to
hide the Git _model_ , and no effort is made to ensure that the Mercurial
repository generated from the Git one is idempotent (i.e., a given Git repo
will always generate the same Mercurial repo) or can round-trip (i.e., I can
trivially craft repos that hg-git can work with just fine, but where pushing
to a bare Git repo will result in a different Git repository from the
original). The benefit to this model is that the Git users don't need to do
anything different, ever, and no one needs to know that you were using
Mercurial, ever.

Kiln Harmony does something different: it's designed to let a team use
whatever tools they want. This means that Mercurial users don't have to learn
the Git model, Git users don't have to learn the Mercurial model, and
generally, everything "just works" in that situation. Doing that requires a
lot more processing power than hg-git requires, which is part of why we only
offer it as a hosted solution for the moment. It also works best when your
central repository _is_ a Kiln Harmony repository.

I obviously think that's a fine trade-off, but if part of the main motivation
of SQLAlchemy was GitHub, then Kiln Harmony probably isn't a good solution.

------
octo_t
> Git manages the size of the repository more efficiently; while the Mercurial
> repository has been approaching 50M in size, the Git repository is only 17M.

17MB vs 50MB - almost a third of the size. That is definitely quite
impressive.
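
For what it's worth, a quick back-of-the-envelope check of those two figures. The sizes are the ones quoted from the blog post; the variable names are just for illustration.

```python
# Sizes quoted from the blog post, in megabytes.
hg_size_mb = 50
git_size_mb = 17

ratio = git_size_mb / hg_size_mb      # git size as a fraction of the hg size
saved_mb = hg_size_mb - git_size_mb   # absolute difference on disk

print(f"git repo is {ratio:.0%} of the hg repo, saving {saved_mb} MB")
# prints "git repo is 34% of the hg repo, saving 33 MB"
```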

~~~
micampe
I’m not the biggest git fan and I still think it’s the right choice, but how is
this even a reason to switch? Unless you get into (older) Subversion levels of
ridiculousness, how are 40MB of disk space even a concern? They even listed it
first. At best it would be "Oh, and we saved 40MB, the size of two raw pictures
from a digital camera."

~~~
etanol
The size issues of the SQLAlchemy repository come from the way Mercurial
handles copies and renames.

I prefer Mercurial because it is much easier to use, but this file rename issue
always makes me feel uncomfortable when reorganizing code.

These days I'm giving Fossil a try, which still is easier to use than Git and
the repository size sits between Git and Mercurial.

~~~
gbog
From the other side of the mirror, I mean from the position of someone used to
git, this comment seems weird.

Git is not hard to use. It is adding a few articulations in your workflow, and
they are just allowing you to run faster.

One example: interactive staging with git add -p. This articulation makes it
much easier to debug: add prints all over the place, try some tweaks, find the
one that works, stage that one snippet, check out the files, run the tests, and
you're done.

------
k_bx
Could someone please explain to me the argument about removing a branch in git
vs closing it in mercurial? As I understand it, when you "remove" a branch in
git, just as when you close it in mercurial, it doesn't delete commits from the
tree either, since that could lead to disaster (if those commits were pushed
already). So someone else could actually start a new branch from a commit which
is in the branch you just "deleted". Am I wrong here?

~~~
csense
> Am I wrong here?

It sounds like you may be a little fuzzy on how Git works.

> when you "remove" branch in git...it doesn't delete commits from
> tree...since it can lead to disaster

When you delete a branch, only commits and filesystem states that are no
longer accessible from any other branch, tag, index, or stash become eligible
for garbage collection. So in normal usage [1] [2], you shouldn't ever be able
to delete the contents of any commit that exists in the history of any named
commit.
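
A rough way to picture that rule is a reachability walk over the commit graph. The sketch below is a toy model in Python with a made-up five-commit history. It only looks at branches, so it ignores the index, stash, reflog and grace period that real git also takes into account.

```python
from typing import Dict, List, Set

# Hypothetical commit graph: commit id -> list of parent ids.
parents: Dict[str, List[str]] = {
    "A": [],
    "B": ["A"],
    "C": ["B"],  # tip of master
    "D": ["B"],  # first commit on the feature branch
    "E": ["D"],  # tip of the feature branch
}

# Refs (branches, tags, ...) are just names pointing at a commit.
refs: Dict[str, str] = {"master": "C", "feature": "E"}


def reachable(refs: Dict[str, str]) -> Set[str]:
    """Return every commit that can be reached from at least one ref."""
    seen: Set[str] = set()
    stack = list(refs.values())
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


before = reachable(refs)
del refs["feature"]  # "delete the branch"
after = reachable(refs)

# Only commits that nothing else points to become collectable: D and E.
collectable = set(parents) - after
```

Commits A and B were part of the deleted branch's history too, but they stay because master still reaches them.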

A commit that is newly eligible for garbage collection will not actually be
garbage collected during a grace period, I believe one week. During this time
the commit is still accessible by commit hash [3].

Branches are kind of like variables in a garbage-collected language, and
commits are like objects in that language. An object is eligible for GC if and
only if there are no variables that point to it. Making new variables that
point to the same object is a really lightweight operation since most of the
data's shared. _Unlike_ many OO languages, objects in Git are immutable, which
means git can safely collapse identical copies of the same object to a single
instance (an optimization which git is quite aggressive about).
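
The "collapse identical copies" part falls out of naming objects by a hash of their content. Here is a minimal Python sketch of that idea. The `ObjectStore` class is hypothetical, and it skips compression and packfiles entirely; only the `blob <size>\0` header mirrors how git hashes file contents.

```python
import hashlib
from typing import Dict


class ObjectStore:
    """Toy content-addressed store; a hypothetical stand-in for .git/objects."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        # Git hashes a small header plus the content; "blob <size>\0" is the
        # header used for file contents.
        header = f"blob {len(content)}\0".encode()
        object_id = hashlib.sha1(header + content).hexdigest()
        # Objects never change, so storing the same bytes again is a no-op.
        self._objects.setdefault(object_id, content)
        return object_id

    def __len__(self) -> int:
        return len(self._objects)


store = ObjectStore()
first = store.put(b"print('hello')\n")
second = store.put(b"print('hello')\n")  # identical file in another commit
third = store.put(b"print('bye')\n")
```

Because the id is derived from the bytes, `first` and `second` are the same id and the store ends up holding two objects, not three.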

> it can lead to disaster (if those commits were pushed already)

If you want to cause a remote repository to delete a branch, look at the
--mirror and --delete options for the push subcommand. AFAIK, pushing the
deletion of a branch has the same effect as if you deleted the branch locally
on the machine you're pushing the deletion to.

> someone else could actually start new branch from commit which is in branch
> you just "deleted"

If another developer has merely fetched the deleted branch, their remote
tracking branch will be deleted when they fetch again. If your colleague has
created an actual local branch with the changes you deleted (as opposed to a
remote-tracking branch), when or whether your colleague deletes that branch is
entirely under his/her control [4].

[1] Using low-level git subcommands to force git to delete objects which are
still referenced by other objects is abnormal usage.

[2] Corrupting the object database in your .git directory through unclean
shutdown, hard drive failure, or vindictive hex editing is also abnormal
usage.

[3] If you neglected to write down the commit hash before you deleted the
branch, you can likely find it in the reflog. The reflog is a history of the
commit hash at the tip of each branch. Alternatively, you can use git
subcommands to browse the object database and find orphaned commits.

[4] At this point, git has no _technological_ way [5] for anyone but your
colleague to delete the local branch from your colleague's machine.
_Socially,_ of course, you can use project management techniques to encourage
your colleague to delete the branch (e.g., if the official maintainers
announce that the deleted branch is considered obsolete and will not support
it or accept changes based on the version of the project in the deleted
branch, that can be a compelling reason for people to stop using the copies
that are floating around. Or your colleague's boss can tell him to delete it
or be fired.)

[5] If you have git push access to your colleague's repository, or you have
shell access to your colleague's user account, you can delete their branches.
Footnote [4] is referring to the _usual_ case where your colleague's machine
is a private box where you have no access.

~~~
k_bx
Thanks for your answer. I was never really aware of git's GC mechanisms
before, which is why I was sure that commits always stayed in the tree. And you
perfectly answered my question (well, at least the parts whose details I
understood; I clearly see I'll need to read more some time later).

I still find mercurial's branching model "the right thing" in terms of tree
showing development history inside a branch, or overall (multi-branch)
history-review (where you clearly see which commit was made in which branch),
but I now see that it's really nice from repository-cleanup perspective to
have features git has, to deal with non-needed branches and commits (via
different mechanisms).

~~~
csense
Git's branching model is "the right thing" once you learn the idioms of git
development.

For example, if you want to build a new feature, make a local branch so your
changes are isolated. You can make work-in-progress commits that split up a
large change into pieces, some of which may be experimental, contain debugging
statements, or temporarily break things.

Then when you're satisfied your changes are bug-free, you can use an
interactive rebase (git rebase -i) to turn the commits into clean patches that
would be something you might e.g. send to the mailing list if you're working
on the flagship Git project, the Linux kernel.

I like to keep my history clean: six months from now I won't be interested in
all the bugs I wrote when I implemented a feature, and all the fixes that I
came up with for them during initial testing. I won't want to see the
implementation of that feature scattered over multiple commits. I just want to
see one clean, bug-free patch that implements the feature.
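
As a loose illustration of what squashing buys you (this models the end result only, not what git rebase -i does internally), here is a Python sketch. The commit contents, the `squash` helper and the convention that an empty string means "file removed" are all made up for the example.

```python
from typing import Dict, List

# Hypothetical work-in-progress commits: each maps a path to its new content.
wip_commits: List[Dict[str, str]] = [
    {"feature.py": "def f(): return 1  # first attempt\n"},
    {"feature.py": "def f(): print('debug'); return 2\n", "debug.log": "noise\n"},
    {"feature.py": "def f(): return 2\n", "debug.log": ""},  # "" marks removal here
]


def squash(commits: List[Dict[str, str]]) -> Dict[str, str]:
    """Fold a series of changes into one net change."""
    combined: Dict[str, str] = {}
    for commit in commits:
        combined.update(commit)  # later commits win
    # Drop files that ended up removed, so they never show up in the result.
    return {path: text for path, text in combined.items() if text}


clean_patch = squash(wip_commits)
```

The debug print and the temporary log file existed in the intermediate commits but leave no trace in `clean_patch`, which is the "one clean patch" I'd want to read later.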

------
Ziomislaw
TL;DR - we didn't read the HG manual, so we had to change as we didn't know how
to use its features.

~~~
qu4z-2
Or because none of their contributors who only installed hg to submit a patch
had read the entire hg manual.

~~~
develop7
Of course, all they had to read and follow was "contributing to sqlalchemy"
manual.

