

Ask HN: Which revision control system? - cconstantine

For the past few years we've been using Perforce at the company I work for.  For a number of reasons, we've been given the OK from management to switch to a free SCM system.  This is supposed to be a big opportunity to improve our process and save us money, but we're having problems finding a good replacement.

Our must-have features include: the ability to import our Perforce history, easy/trivial branching and merging (preferably with the ability to use an external merge tool), large-file support, the ability to pass changes from one workstation to another without affecting everyone, and the ability to work with multiple gigs of binary files (most are roughly 500k, some are 100+ megs).  The binary files are mostly build artifacts, but for a large number of very good reasons we can't expect developers to generate them.  Each developer having their own branch would be nice, but isn't critical.

Our products only run and compile on Windows, so decent Windows support would be nice.  We aren't exactly an MS house; our product works with the artifacts of programs that only run on Windows, so it runs on Windows.  Our products are very-much closed source, so things like GitHub are of no use to us.

Distributed systems like Git are compelling for the workstation-to-workstation support.  Unfortunately Git chokes on our binary files (I get an out-of-memory error), and I'm afraid that the other DSCM projects will stagnate in favor of Git.  Centralized systems like Subversion are in line with the way we've always done business, but they lack the ability to view/review/pull changes on another workstation, and branching/merging isn't as easy.

I'm a big fan of Git, but if it can't work with large files it won't work for us.  Subversion seems like a step backwards, but I could be wrong.  Whatever we choose, we'll be with it for a long time and use it every day.

So, HN: what revision control system would you suggest?
======
etal
Out of the three leading free DSCMs -- Git, Mercurial, Bazaar -- none seem in
danger of going extinct any time soon. Bazaar is the foundation for Launchpad
and Ubuntu, almost irrevocably. Mercurial has less buzz than Git for open-
source projects, but I suspect it's even more popular in the business world,
and its Windows support is solid. Mozilla uses it, anyway. So you might want
to give the other distributed systems, particularly Mercurial, another look.

------
rmaccloy
Git _isn't_ very good for large files, unfortunately. In my experience,
"large" means over 100MB or so, though.

I'd love to convince everyone to switch to git, but I want to note that SVN's
branching/merging capabilities are on par with p4's, since at least 1.4 using
svnmerge and probably standalone in 1.5. (My current employer uses p4, and I
switched my last workplace over to SVN from CVS.)

It does suck to go back to centralized SCM after using a DVCS though (I clone
my work p4 checkout into git for minute-to-minute use). A possible solution:
if your build artifacts are most of the large-file problem, consider storing
them out of band and having workstations fetch them using Ant/Ivy or something
similar?

~~~
cconstantine
I would love to have the build artifacts stored in a system for managing build
artifacts, but I don't really know of any.

We could (theoretically) store the build artifacts in Subversion and the
source in Git, but that would be very unpopular. No one in the company is
interested in having multiple revision control systems to maintain.

Our main product is built in Visual Studio, but we use ant for the surrounding
build tasks. I think of ant as being a 'better make', what does it have to do
with grabbing remotely stored build artifacts?

~~~
rmaccloy
Ivy is an add-on of sorts for ant that does dependency management (libraries
and the like). It sort of jacks the Maven dependency resolution bits and
leaves the rest. It's definitely not a 'system for managing build artifacts'
(and it's pretty java-focused) but you can set up your own repository and have
a build target fetch them, put them in the right place, etc.

I wouldn't necessarily advocate this since you'd inevitably end up having to
build some system around it, but conceptually it seems like what you'd want.
YMMV; I've only used it for pretty straightforward Java stuff.

------
zacharydanger
Branching/merging on Subversion has been absolutely trivial since the version
1.5 release. Prior to that it was a nightmare, but now it's ridiculously easy.

Can we finally put this to rest now?
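
For readers who haven't seen the 1.5-era workflow, here is a minimal sketch
against a throwaway local repository (all paths and names are made up):

```shell
# Hedged sketch of Subversion branching plus 1.5+ merge tracking.
svnadmin create repo                  # throwaway local repository
REPO="file://$PWD/repo"
svn mkdir -m "layout" "$REPO/trunk"
svn copy  -m "branch" "$REPO/trunk" "$REPO/feature"   # cheap server-side copy
svn checkout "$REPO/feature" wc
svn merge "$REPO/trunk" wc            # 1.5+ records svn:mergeinfo automatically
# later, from a trunk working copy: svn merge --reintegrate "$REPO/feature"
```

With svn:mergeinfo doing the bookkeeping, repeated syncs from trunk no longer
need hand-tracked revision ranges, which is the pre-1.5 pain svnmerge papered
over.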

~~~
litewulf
Agreed. Though, one caveat: SVN will slow down as it diffs large binary
files. I remember reading a developerWorks article about this
(<http://www.ibm.com/developerworks/java/library/j-svnbins.html> -- thanks,
Google!)

(PS: if I'm misinformed or out of date, I apologize. I use SVN, but never
upgrade, and have PSDs in the ~100-200MB range, and it's not terribly fast.)

------
sh1mmer
I really love Bazaar.

I love that it's distributed and I can version a local folder with a single
command. I love the simple integration with Launchpad.net (even if the site
itself is annoyingly mediocre). I love that it's a single command with all the
inline help built in. I love the conflict management.

However, people with a much better understanding of the details wrote
comparisons with each of the popular DVCSes currently in use. I suggest that
if you use another VCS you at least read the bzr side
(<http://bazaar-vcs.org/BzrWhy>).

_Specific comparisons_:

Subversion: <http://bazaar-vcs.org/BzrVsSvn>
Git: <http://bazaar-vcs.org/BzrVsGit>
Mercurial: <http://bazaar-vcs.org/BzrVsHg>

~~~
gecko
Their bzr v. Mercurial comparison is a bit disingenuous at points, if not
outright misleading.

Firstly, having used bzr 1.8 and 1.9, and Mercurial 1.0 and 1.1, the claim
that bzr's speed is close to Mercurial's is downright hilarious. There simply
is no comparison. bzr's speed is so atrocious that, when Python was looking at
using bzr for its version control system, the only technique that bzr's
fanboys could come up with to ensure fast checkout was to have you download a
tarball of the pre-checked-out sources. You can see more at
<http://www.python.org/dev/bazaar/>. Conversely, I routinely work with
Python-sized repositories in Mercurial without incident. If you have one take-
away, this should be it.

They also claim that bzr can swap out its backend and hg can't (patently false
-- the backend has in fact changed for 1.1, due out very soon; see
[http://www.selenic.com/mercurial/wiki/index.cgi/fncacheRepoFormat](http://www.selenic.com/mercurial/wiki/index.cgi/fncacheRepoFormat));
that Mercurial cannot be served over vanilla HTTP (it can); that Mercurial
does not let you change your merge algorithm easily (it does, see
[http://www.selenic.com/mercurial/wiki/index.cgi/MergeToolConfiguration](http://www.selenic.com/mercurial/wiki/index.cgi/MergeToolConfiguration));
that Launchpad, which even you admit is mediocre, is a bzr exclusive, without
noting that Mercurial has <http://bitbucket.org> and similar sites; and so on.

There actually are a few advantages of bzr over hg. That site just happens to
make most of them up.
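
For reference, pointing Mercurial at an external merge tool is a small hgrc
change. A sketch following the wiki's kdiff3 example (the tool choice here is
arbitrary):

```ini
[ui]
merge = kdiff3                 # name of the tool to invoke on conflicts

[merge-tools]
kdiff3.args = $base $local $other -o $output
```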

~~~
sh1mmer
That's pretty bad. Is it out of date or just overly fanboy-esque? It would be
cool to put some feature list comparisons somewhere.

Right now most of this stuff seems really anecdotal because people that
understand one system mostly use that system exclusively.

~~~
gecko
I honestly think most of it's just ignorance. I don't believe that anyone
who has used Mercurial in anger can seriously be unaware of merge
customization, or of the fact that Mercurial has simple revision numbers in
addition to hex codes, or that Mercurial can clone from HTTP, but I can
definitely see missing those features if you just messed around with it for an
hour when you were trying to evaluate the two systems. Likewise, even someone
who had used Mercurial fairly heavily could easily be unaware of Launchpad-
like sites, or new back-ends in upcoming Mercurial versions and the like. At
any rate, I don't think that the bzr team is being malicious; just uninformed.

The one point where I do think they're guilty of intentionally bending the
truth is their claims about speed. bzr goes to great lengths to have more
thorough history and merge tracking than either Mercurial or git. It pays for
that by being slower. That's a perfectly viable trade in some cases, but bzr
needs to own up to it. Instead, rather than simply admit that bzr is slower,
its supporters generally try to explain ways to get around the fact bzr is
slow, and then claim that the slowness doesn't matter. For example, initial
clones ("bzr branch") with bzr are simply atrocious. A bzr supporter will tell
you that you don't do clones that often, and besides, there are workarounds--
e.g., since bzr works with whole-file hashes like git, it can share file
objects and revisions across all of the repositories on your system, making
branching repositories you've already branched at least once go more quickly;
and for ones you haven't, you can just download a tarball of an existing
clone and use that as the base. These arguments are weak, sidestepping the
issue. I'm reminded of git in its early days, when you had to manually run
very, very slow gcs every once in a while, and its supporters shot back with,
"Well, yeah, but you can totally just stick that in a cron job." Mercurial's
not innocent, either; I remember its dev team trying to argue why it didn't
need named branches, when everyone using git (including me!) thought that
git's branching was one of its killer features. bzr needs to do what those
projects did: quit explaining away the bug, and just fix it.

bzr's merge algorithm is better than either Mercurial or git's, its GUI (QBzr)
is best-of-breed, and its online operational mode makes it a drop-in
replacement for Subversion for sites that are trying to migrate away from old
habits slowly. It also happens to be significantly slower than the
competition, to the point that, much as I personally believe that Mercurial
has a better power-to-usability ratio than git, I think bzr has a bad power-
to-speed ratio compared to either git or Mercurial. Whether you agree with my
assessment is up to you. I just wish that their comparison pages were more
honest about what the trade-offs are.

------
shutter
I use Mercurial, but have been exploring Git a little because of what you
noted -- the community momentum seems to be headed that way.

~~~
thomasmallen
Another vote for Mercurial. Whenever a project gets that "KoolAid" feel (Git,
Rails, even Haskell at times) I tend to gravitate towards its competitors, so
take that with a grain of salt, but I can say that Mercurial's been great for
the small teams I've used it with. On larger projects, I've unfortunately only
used Subversion, but I can say that it works, at least.

~~~
gecko
I'll third the recommendation for Mercurial. There is no question that git has
more community momentum--something which I hope will begin to change--but
Mercurial is nevertheless an outstanding distributed version control system.

Mercurial's default mode of operation has the benefit of being extremely
Subversion-like--in a good way. Indeed, if you never interact with anyone
else, most of the normal Subversion commands--ci, mv, rm, add, log--"just
work." There is no git-style index to worry about, no rebasing, and the
command set is small and regular. That means that the learning curve is far,
far shallower than git. Yet it still provides the same distributed branching
and merging as git, and, though it _is_ slightly slower, the difference is
truly negligible in my experience--maybe a couple percent difference at most.

What about the incredible power of git, though? What if you actually want to
be rebasing and rewriting your history all the time? When you _want_
unfettered power in Mercurial, you've still got it. The `mq` extension, which
can be enabled by adding a single line to your configuration file, allows you
to do all the crazy patch rewriting, merging, splitting, and rebasing that git
does.[1] But you can ignore that functionality if you want, and still have a
very powerful and fast distributed change system.
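
For the curious, enabling mq really is a one-stanza config change; a sketch
(the file path is the usual default):

```ini
# ~/.hgrc (or Mercurial.ini on Windows): enable the bundled mq extension
[extensions]
mq =
```

After that, `hg qnew` starts a patch, `hg qrefresh` folds working-directory
changes into it, and `hg qpop`/`hg qpush` move through the patch stack.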

When Fog Creek looked at going to a distributed source control system last
year, I advocated Mercurial over the competitors. Though the transition wasn't
seamless, I've been extremely happy with the result. The Unix experience is
extremely pleasant, and TortoiseHg (<http://tortoisehg.sourceforge.net/>)
provides a surprisingly solid Windows experience out-of-the-box.

If you can look past the fanboyism, I'd strongly encourage you to give
Mercurial a try. I think it strikes a much better power-vs.-usability balance
than git does.

[1] Except microbranching. Mercurial doesn't currently support local named
branches. You can achieve similar things by using mq with qguards, if you
really need them, but in practice I find it's usually easier to just clone a
second repository. Although I do miss microbranches sometimes, I've found I
greatly prefer the streamlined workflow of Mercurial.

~~~
cconstantine
The momentum here seems to be pretty heavily for Mercurial... maybe I should
consider git to support the underdog ;)

All kidding aside, Hg might be what we need. I'm looking into it now. Is it
easy to import p4 history into Hg?

~~~
thomasmallen
Don't take our advice like that! Try it first: You may hate the way that one
VCS works compared to another. Technical merits have less to do with this
choice than your own preferences.

~~~
cconstantine
Hehe, yeah I'm trying to try it.

Unfortunately it doesn't like any of the large files (anything over 10megs),
and the windows shell plugin causes explorer to run like a dog.

I've done some research on it and I want to like Hg. Any advice on having it
play nice with large files?

~~~
silentbicycle
Maybe this is too far outside of the question, but particularly if you aren't
going to be merging the large files, have you considered mirroring them with
rsync (<http://www.samba.org/rsync/>) instead? I'm not sure tracking really
large binary files is best handled by a VCS. I'm trying to read between the
lines in your question, but would e.g. periodically making dated snapshots of
the binaries and otherwise automatically mirroring around the newest version
suffice?

The Mercurial page on binary files
(<http://www.selenic.com/mercurial/wiki/index.cgi/BinaryFiles>) doesn't say
anything about especially large ones.

Also, has anybody had good experience importing from p4 to hg on Windows? I've
tried using tailor and some scripts from the mercurial wiki, but no success
yet. One of these days I might write my own importer script (mostly because I
need to import from five or six major branches, about 50k commits), but
haven't had the time yet. (I'm working on Windows for similar reasons.
Mercurial has been great for typical VC usage.)

~~~
cconstantine
Our import from p4 doesn't have to work in Windows. We're comfortable in
Linux; we just can't develop in it.

You are correct, the binary files will not be merged. Some of them are large
encrypted databases and can't be merged. The database is built based on source
that may be merged, but the merge will happen in the source, and when the
source merge is complete we'd rebuild the databases and check in the result.
The largest number of binary files are compiled programs and very rarely
change.

You're suggesting something a lot of other people in this topic have
suggested. I would love to implement some kind of binary file management
system; it just isn't going to happen. These binary files don't need to be
merged, but they will be changing semi-frequently and are likely to be
different between branches. We don't have the time and manpower to implement a
system that would work for us. Either the VCS needs to handle these files or
we can't use it.

I think I found a config setting deep in the dark heart of git that can make a
repository friendly to large binary files
(<http://www.gelato.unsw.edu.au/archives/git/0607/24058.html>). I'll check it
out, and if it works we could create a set of repositories for these binary
files.
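
For the archives, the knobs that thread circles around look roughly like this;
the values are illustrative guesses, not tuned recommendations:

```shell
# Hedged sketch: git settings commonly suggested for big-binary repositories.
git init bigrepo
git -C bigrepo config pack.windowMemory 256m   # cap RAM used while delta-packing
git -C bigrepo config pack.packSizeLimit 512m  # keep individual packfiles bounded
git -C bigrepo config core.compression 0       # skip zlib on incompressible blobs
echo '*.bin -delta' >> bigrepo/.gitattributes  # never delta-compress these files
```

The `.gitattributes` line is the big one for out-of-memory errors: it stops
git from trying to delta-compress the binaries at all.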

I'm really trying to find a way to use Hg, but if it can't handle our use-case
we can't use it. That doesn't mean it's a bad VCS. I really like some of its
features and the fact that it is far simpler than git. I also really like that
it has file explorer integration. It just needs to handle projects with an
obnoxiously large code base and large binary files.

~~~
silentbicycle
Did you look at rsync? You don't really need to implement anything.

My point, though, was that in some sense you're trying to find a way to bend a
VCS into doing something well outside the strengths of VCSs, so it is worth
looking into categories of tools better suited to the problem.

~~~
cconstantine
Yes, I'm familiar with rsync. We've talked with Perforce Support a fair amount
regarding large files, and they remind us that we are abusing their system.
It's like using Harley Davidsons on a worksite to move around loads of dirt.
It may work, but it's not the intended use.

We might be able to get that to work for some of the binary files relatively
easily. The problem is that the majority of these files are build artifacts
for our test programs and the builds happen in the same directory as the
source (yes, this is a stupid way to do things, but we need to do what
customers do, and our customers do this). It would be very hard to distinguish
between build sources and build artifacts. This leaves the issue of branches.
Each branch would have to 'know' which of the binaries to grab in the shared
space, and update other branches at integration time.

This could be a valid way of doing things, but it would take a fair amount of
effort because our environment is like the real world; dirty and complicated.
For the past couple of years it's been on the backlog to clean and simplify
our environment, but something more important always comes up.

~~~
silentbicycle
Good analogy.

How do you distinguish between the build source and artifacts now? It's not
difficult to specify (via filename regexes) what should and shouldn't be
examined by Mercurial as potential VC files, beyond whether or not you
explicitly add them: look at the .hgignore file. (Not sure that's directly
helpful, but for the archives.)
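
As a concrete (and entirely hypothetical) example of such an .hgignore:

```
syntax: glob
*.obj
*.pdb
build/*

syntax: regexp
^artifacts/
```

Patterns after each `syntax:` line are interpreted in that style until the
next `syntax:` line.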

~~~
cconstantine
We don't distinguish between them. The test programs were built and all the
resulting files were checked in. Unfortunately some of the build artifacts
share extensions and directories with build source, and some of the build
artifacts have no extension. The only way we could add the build artifacts to
.hgignore (or .gitignore) would be to manually add them one at a time, and
that would be a huge task.

~~~
silentbicycle
You can also add * to the ignore file and explicitly specify what to track,
either by hand or by "hg add [fname]" in some sort of script.
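
Spelled out, the ignore-everything variant is just two lines; anything you
want tracked then has to be added explicitly:

```
# .hgignore: hide everything from "hg status"; only "hg add"-ed files count
syntax: regexp
.*
```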

I track/sync my home directory with mercurial, and did that to keep it from
scanning most of my drive for updates. (You can probably do the same with
git.)

(For the archives as much as you, though I hope it's useful.)

------
s3graham
Personally, I use Git and hg at home and for small work projects.

But, for the main work projects (console games: lots of code, but also many
large assets), none of the alternatives to p4 are reasonable. (Yes, I'm well
aware Perforce has vast and sundry problems too).

Might be worth doing some testing on hg; if it can handle your data sizes
reasonably, I think it meets all your other requirements.

------
mhartl
_Our products are very-much closed source, so things like Github are of no use
to us._

For what it's worth, GitHub offers paid repositories for just this case; open-
source repositories are free, but you can pay to make your repositories
private. Even the biggest plan is only $200/month; see
<http://github.com/plans> for more information.

~~~
cconstantine
What makes you think we trust GitHub? ;) The only way GitHub could help us is
if they were to open-source their site and let us run an instance of GitHub in
our datacenter.

We have some very strict rules about the code never leaving company computers,
and those rules will not be changing.

~~~
ptman
Gitorious is like a subset of GitHub, and it is open source.

~~~
cconstantine
Thanks! That looks like it might be something we can use :)

------
artificer
A valuable resource comparing major SCMs that I point people to when they ask
me is the FreeBSD project's wiki page:
<http://wiki.freebsd.org/VersionControl> Hope it helps. Regarding SVN versus
Git specifically, be sure to look at the VCSWhy link at the bottom.

~~~
cconstantine
Thanks, that was helpful :)

------
ryanbooker
Do not go from Perforce to Subversion. Whatever you decide. That is a massive
step backwards.

~~~
Tritis
I wouldn't call it a step backwards. Perhaps a step to the side.

------
thorax
I'm actually a big fan of Perforce. I don't feel that any system I've tried
(~dozen) really beats it when it comes to internal software development inside
of an organization. For open source, distributed version control makes a lot
more sense, but internally, Perforce is pretty hard to beat and modified
versions of it are used at places like Google and Microsoft.

What reasons do you dislike Perforce? Maybe we can help you pick by comparing
what you're trying to move away from.

If by "given the OK" you mean you need to swap because your company doesn't
want to admin/pay for Perforce licenses, then maybe you can clarify that a
bit, too.

~~~
cconstantine
Perforce is not terrible. I'd even go so far as to say that, of the
centralized systems, it's better than any other I've tried (including
Subversion and CVS). It could be better, but it could be much worse.

    
    
      Things we like about perforce:
       - Client side changelists for modified client side code.  This helps organize our local 
         changes to make sure when we're working on multiple things they stay separate.
       - The server is solid.
       - Can handle large files, including the ability to 'forget' previous revisions to save space
         on the server.  It's abusing the system, and we've been told as much by Perforce.  It just
         happens to be the way we do business, and changing that is another project.
       - It can do branching/merging
       - The visual diff and merge tools are pretty good (mostly, more below).
       - Everything that's under revision control is in one place.  If you don't want to have the 
         files on your workstation you can simply remove that tree from your view.
    
      Things we dislike about perforce:
       - It costs money.  At the size of our organization it costs about as much as another employee.  
         This is the reason we were 'given the OK'; developers don't really care about cost as long as 
         we're employed, and management doesn't really care about features as long as we're productive.
       - We've had significant issues when merging.  Conflicts are not properly flagged, and "ghost" code 
         (code that didn't exist in either branch) sometimes appears in the merge result.
       - The clients are very iffy.  Crashes are frequent, and the merge problems are related to 
         client-side bugs.
       - No one in the company is a fan of being required to 'check out' files to get them to be writable. 
         This is how perforce knows when files are modified, and because there is no equivalent for new 
         files, people frequently forget to add files and break the build.
       - It can only show you < 1000 files in a given changelist.  Big changes like that are when it's 
         most important to see what you're doing.  This is pretty common when doing branch integrations.
         When this happens you have to hit the "auto-merge" and hope for the best.
       - Branching/integrating isn't streamlined enough to really support every developer having their 
         own branch.  If it was dead-simple we could support a distributed development model with a 
         centralized server.
       - No real way to share code (for reviews, or collaboration) without going through the shared 
         depot.  We use p4tar, but it doesn't play nice with cygwin and has its own problems.
       - No direct way to revert code.  Reverting is a 5 step process that isn't entirely obvious.
    

I think that's it.

~~~
thorax
Excellent list there. I think your issues are all things I also see as
drawbacks to Perforce. I think the main breaking point on your list would
be the merging instability you mention, which we haven't had at either of my
last two companies.

We do have merging issues in a different way, but it's always because someone
used the wrong flag when merging the two branches or didn't properly create
the initial branch. Still, this is significant anyway because merging/branching
really needs to be easy to manage (i.e. hard to screw up) for a tool like this.

In the mega integrations, we came up with a few tricks to work around the
changelist file-count limit--often involving merging subtrees one by one from
the target
branch. This is often needed organizationally for us anyway because the
teams/experts are often different when it comes to resolving those merges. It
results in more changelists, but it tends to work out okay and the integration
history is actually in a better state regarding a proper contact for
discussing it later.

We wrote our own Perforce tools at our company for sharing code for code
reviews, or some teams use user/feature/"pre" branches for that. I'd like to
see that improve in Perforce, but it's been pretty bearable.

What I really want is something with the maturity of Perforce in terms of
tools/API/integration/monitoring/history/etc but is built on the premise of
needing to do lots of merges and branches easily. I know some major
organizations have gone with things like Mercurial because it felt to them
like it would mature the fastest in terms of corporate needs, but I've yet to
see any of the distributed version control systems that has gotten over the
curve: <http://en.wikipedia.org/wiki/Image:Gartner_Hype_Cycle.svg>

I can't wait until they do, really, because the perspective that merging is
central to version control is something I agree with.

------
mace
I evaluated Mercurial some months ago and found it very easy to migrate to
from Subversion.

Here are some benchmarks on the Linux source tree that might be useful:

<http://laserjock.wordpress.com/2008/05/09/bzr-git-and-hg-performance-on-the-linux-tree/>

This blog post is pretty good at summarizing git and Mercurial:
<http://importantshock.wordpress.com/2008/08/07/git-vs-mercurial/>

------
rms
Why does Git choke on large files?

~~~
gaika
Linus Torvalds: "The git architecture simply sucks for big objects":
<http://kerneltrap.org/mailarchive/git/2006/2/8/200591/thread>

------
zitterbewegung
I use darcs for personal use and it is very easy to use.

~~~
babo
I'm sorry to say, but it starts to get messy when it's used by a team. Darcs
suited me fine while I worked alone on projects, but as others joined we faced
serious problems, including loss of data. I gave it up, switched to Mercurial,
and never ever missed darcs.

~~~
yummyfajitas
Even when used by yourself. Branch a project, build a complicated new feature
(e.g. 50 patches), then try to merge.

The exponential merge problem really sucks.

~~~
gecko
I gave up on darcs a while ago, but darcs 2 honestly does greatly reduce the
merge issues that used to plague it. If you're still on darcs, get the
upgrade. You may find you no longer need to move to a new product.

------
mseebach
Would it perhaps be practical to pull the binary files out of SCM and build a
small tool that figures out dependencies and downloads the relevant files
from the build server?

It sounds like that might ease your constraints.

~~~
cconstantine
Something like this might be exactly what we need, but we don't make money
building distributed build artifact caching systems. If we don't make money
doing it, we aren't doing it :(

"Is this good for the company?"

~~~
mseebach
It's often said that developers need to understand the business, and bring
forward a business-case, not a technical case.

So: how much money will you lose by choosing the wrong SCM today because
you're unwilling to change a process you even agree is sub-optimal?

Try to count the number of check-ins, branches, merges, etc. the entire team
does in a year, multiply by five, and then multiply by even a few _seconds_ of
lost time, annoyance, and agony per event.

I would guess that in that perspective you can afford spending a few days
seeing if you can change the build-artifact process to allow you a wider
selection of SCMs.
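
Plugging in made-up numbers to show the shape of the estimate:

```shell
# All inputs are hypothetical: 20 devs, 250 workdays, 10 VCS operations a day.
events=$((20 * 250 * 10))            # team-wide events per year
seconds=$((events * 5 * 5))          # times five years, times 5 seconds each
echo "$((seconds / 3600)) hours"     # -> 347 hours of accumulated friction
```

Swap in real numbers from your own team; even conservative inputs tend to
dwarf a few days of build-process work.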

~~~
cconstantine
I fully understand this argument, but during the last quarterly all-hands
meeting our CEO made it abundantly clear that we aren't doing anything unless
it directly brings in more revenue or directly reduces overhead. One of the
major reasons for switching from p4 is to reduce overhead; if we have to build
a new system for its replacement to work for us, we aren't doing a very good
job of reducing overhead.

------
babo
I'd recommend either Mercurial or Git, but be warned that the learning curve
of Git is steep.

~~~
icky
If you set up git's bash completions, and run through a git tutorial, you'll
know enough git to work on your projects. The more esoteric subcommands are
generally useful in specific situations and simply don't have equivalents in
most other VCSes.

------
vegai
Darcs!

I mean mercurial!

No, git!

(Aww, these computer science problems are so hard!)

