This could actually give Mercurial a big edge over Git for development environments where large binary files are a core part of your workflow - like game development. Products like Perforce are a big hit in games precisely because they are really good at handling this specific class of file.
It's a shame, because I hate using Mercurial, but this would give me a very strong reason to use it for my game projects instead of Git.
Mercurial "just works", and its commands are less arcane.
To be fair, git is now much easier to use. But also to be fair, mercurial has become much more powerful. In mercurial, doing straightforward things is simple, and doing complicated things is more complex, which is the way it should be, IMHO.
It even has rebase, although one might argue that is not a great differentiating feature for a REPOSITORY.
Why would I have to type that? I have been using git for 4 years and have never had to type git-update-cache --add or copy .git/FETCH_HEAD. Are you spreading FUD?
Union-merging is insane and thus should take some insane understanding of what you're doing before you can do it. Anyone that's merging two unrelated repositories should understand what's going on under the hood anyways.
What you're doing in git actually makes sense, since a normal pull is just fetch+checkout+merge+commit, you have to go into the plumbing and trick every step of the way to fetch, checkout, and merge using the wrong repositories. In the future, this could be made porcelain as it has been in hg, but do people really want to do this? It's much safer and saner to use submodules for 99% of use cases.
I still think people prefer git to mercurial these days (if we ignore github) for the same reason they prefer vim: because real programmers use butterflies. http://xkcd.com/378/
What in particular do you hate about Mercurial? I'm curious as a contributor to hg if there's anything in particular that might be fixable (obviously there are some sacred cows, but many things can be fixed just by turning on an extension).
There are lots of little usability issues I hate, but it's not really fair to call those the reason, since git has similar usability issues.
It really all comes down to patch queues: They're awful. They're so awful that they make me dread writing code and make me wish I was using Subversion or CVS. Any time I need to move changes that aren't ready for trunk yet from one machine to another I can expect to spend hours figuring out what went wrong with mq this time and probably lose some work.
Exporting/importing patches, etc - none of it has ever worked right for me on the first try and I've never found documentation that explains a way to use mq without pain and suffering.
In comparison, importing/exporting patches with git is a snap.
The lack of a good equivalent to git's rebase -i is also unfortunate, but not really a showstopper. Hell, maybe mercurial has an equivalent by now.
You don't have to use mq ever. I don't really use it anymore. I still have it enabled for the 'strip' subcommand (so I can discard revisions if I really need to), but I don't even use that. Instead, it's all rebase and histedit, with a sprinkling of bookmarks so I can keep track of many lightweight branches at once.
And that is another problem -- extensions. You go and read how to do something in mercurial, try it on your machine, and it doesn't work. After a while you realize you have to enable an extension. Why not just enable all the useful extensions?
For one: Because many, many features have sharp edges I don't want exposed to newbies. 'git reset --hard HEAD^' for example.
Another: anything that's in hg proper has really strict backwards compatibility promises. Extensions let us screw around with a concept until the UI is _right_.
Maybe it is already implemented, but I imagine it would be possible to let the user know that they are attempting to use a command from an extension that is disabled, and to suggest how to enable it.
Something like the way Ubuntu's command-not-found package works. If I type a command that is missing but is installable from a package in apt, a suggestion is printed: "It looks like you are trying to use this command, it can be installed via apt-get ..."
% hg fetch
hg: unknown command 'fetch'
'fetch' is provided by the following extension:
fetch pull, update and merge in one command
use "hg help extensions" for information on enabling extensions
but yeah, the story for out-of-tree extensions could probably be better. I'll mull that over (and also try to upstream some of my extensions again).
I'm the developer of [git-annex](http://git-annex.branchable.com/), which is AFAIK the closest equivalent for git. I only learned about the mercurial bfiles extension (which became the largefiles extension) after designing git-annex.
The designs are obviously similar at a high level, but one important difference is that git-annex tracks, in a fully distributed manner, which git repositories currently contain the content of a particular large file. The mercurial extension is, AFAIK, rather more centralized; while it can transfer large file content from multiple stores it can't, for example, transfer a large file from a nearby client that happens to currently have a copy, which git-annex can do (if a remote is set up). This location tracking also allows me to have offline archival disks whose content is tracked with git-annex. If I ask for an archived file, git-annex knows which disks I can put online to retrieve it.
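To make the distributed location tracking concrete, here is a toy sketch (this is invented for illustration, not git-annex's real on-disk format): each large file's content key maps to the set of repository identifiers known to hold a copy. Because presence information is a set, two repositories' logs can be reconciled with a plain union, which is what lets the tracking stay fully distributed.

```python
# Hypothetical sketch of git-annex-style location tracking.
# Each content key maps to the set of repo names known to hold a copy.
laptop_log = {"SHA1-img-0124": {"turtle", "passport"}}
server_log = {"SHA1-img-0124": {"turtle", "archive-panama"}}

def merge_logs(a, b):
    """Reconcile two location logs by unioning the known locations per key."""
    merged = {}
    for key in a.keys() | b.keys():
        merged[key] = a.get(key, set()) | b.get(key, set())
    return merged

combined = merge_logs(laptop_log, server_log)
print(sorted(combined["SHA1-img-0124"]))
# ['archive-panama', 'passport', 'turtle']
```

Since a union is commutative and associative, it doesn't matter in which order repositories sync with each other; they all converge on the same picture of where content lives.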
Another difference is that the mercurial extension always makes available all the large files for the currently checked out tree. git-annex allows a tree to be checked out with large files not present (they appear as broken symlinks); you can ask it to populate the tree and it retrieves the files as a separate step. This is both more complex and more flexible. For example, I have a git repository containing a few terabytes of data. It's checked out on my laptop's 30 GB SSD. Only the files I'm currently using are present on my laptop, but I can still manage all the other files, reorganizing them, requesting ones I need, etc.
git-annex also has support for special remotes, which are not git repositories, but in which large files are stored. So large files can be stored in Amazon S3 (or the Internet Archive S3), in a bup repository, or downloaded from arbitrary urls on the web.
Content in special remotes is tracked the same as in other remotes. This lets me do things like the following (the file shown is one of my grandfather's engineering drawings of the Panama Canal locks):
joey@gnu:~/lib/big/raw/eckberg_panama>git annex whereis img-0124.png
whereis img-0124.png (5 copies)
5863d8c0-d9a9-11df-adb2-af51e6559a49 -- turtle (turtle internal drive)
7e55d8d0-81ab-11e0-acc9-bfb671110037 -- archive-panama (internet archive http://www.archive.org/details/panama-canal-lock-design-papers)
905a3a64-4149-11e0-8b3f-97b9501cdcd3 -- passport (passport usb drive 1 terabyte)
9b22e786-dff4-11df-8b4c-731a6178061c -- archive-leech (archive-6 sata drive)
f4c185e2-da3e-11df-a198-e70f2c123f40 -- archive (archive-5 sata drive)
ok
joey@gnu:~/lib/big/raw/eckberg_panama>git annex get img-0124.png --from archive-panama
get img-0124.png (from archive-panama...) ok
I'm hopeful that git will grow some internal hooks for managing large files that will improve git-annex and also allow others to develop extensions that, perhaps, behave more like the mercurial largefiles extension. I recently attended the GitTogether and this stuff was a major topic of discussion.
does git-annex work on windows? if not, do you have plans to port it? it's an important problem and it'd suck to have a solution which doesn't work on all major platforms.
have you seen: https://github.com/apenwarr/bup
"file backup system based on the git packfile format. Capable of doing fast incremental backups of virtual machine images. "
Sure, git-annex can use bup as a special remote. This way you get bup's nice properties of storing big stuff in git with binary deltas, with git-annex's nice properties of a normal-looking file in a git clone.
That option works fine, provided that all of your binary assets sit at one location in the tree, and that you just happen to have a Subversion server lying around at the time. largefiles allows you to put your assets where you want, and allows you to avoid setting up a Subversion server.
Binary files are already diffable, both in how they're stored (in fact, the only thing that Mercurial stores internally are binary diffs), and in terms of sending around patches (that's what the Git patch format is for).
There are two problems that largefiles tries to solve: first, that while binary files are technically diffable, most of the popular ones store large amounts of compressed data, which means that their diffs are insanely poor. Combine that with the second problem, which is that distributed version control systems tend to include the entire history in every repo, and you've got a recipe for disaster: those 200 MB worth of textures that you just color-corrected are now going to be another 200 MB of data that every last developer needs to get whenever they attempt to fetch your repository.
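The first problem can be shown with a small self-contained experiment (zlib standing in here for the compression inside a PNG or similar asset format; the "texture" is just synthetic bytes): a tiny edit to the uncompressed input makes the two compressed streams diverge almost immediately, so a byte-level delta between them can reuse very little.

```python
import zlib

# Toy "texture": 16 KiB of repeating byte values.
original = bytes(range(256)) * 64
edited = bytearray(original)
edited[0:4] = b"\xff\xff\xff\xff"  # a tiny "color correction"

a = zlib.compress(bytes(original))
b = zlib.compress(bytes(edited))

# Count how many leading bytes the two compressed streams share.
common_prefix = 0
for x, y in zip(a, b):
    if x != y:
        break
    common_prefix += 1
print(f"{common_prefix} of {min(len(a), len(b))} compressed bytes shared")
```

Version control delta compression works on the stored (compressed) bytes, so even though the logical change was 4 bytes, the stored diff is close to a full copy of the file.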
largefiles solves this by saying that certain user-designated files are not actually stored in the repository. Instead, stand-ins -- one-line text files containing the SHA-1 hash of the file they represent -- are stored. Whenever you update (checkout, in Git parlance) to a given revision, largefiles fetches your missing files on demand, either from the central store or (if available) from a per-user cache.
The benefit of this approach is that, if you just want the newest revision, you don't have to also fetch all the historical versions of all the assets. The downside is that a clone doesn't, by default, have the full, reconstructable history of the entire repository. Whether this trade-off works for you will largely depend on who you are and what your workflow is, but we've found many Kiln customers who consider it an excellent trade-off.
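The standin mechanism itself is simple enough to sketch in a few lines. This is a toy model, not the extension's actual code; the function names and the dict standing in for the central store are invented. Only the 41-byte standin is versioned; the content round-trips through the store on demand.

```python
import hashlib

store = {}  # stand-in for the central large-file store: sha1 hex -> content

def commit_largefile(content: bytes) -> str:
    """Upload the content to the store; return the one-line standin text
    that gets versioned in the repository in place of the real file."""
    sha = hashlib.sha1(content).hexdigest()
    store[sha] = content
    return sha + "\n"

def update_largefile(standin: str) -> bytes:
    """At update (checkout) time, resolve a standin back to real content."""
    return store[standin.strip()]

standin = commit_largefile(b"200 MB of texture data, notionally")
print(len(standin.strip()))  # 40: the SHA-1 hex digest
assert update_largefile(standin) == b"200 MB of texture data, notionally"
```

Because the standin is just a hash, every historical revision still records exactly which content it needs, even in clones that never downloaded that content.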
That's exactly what it doesn't do. It versions the checksum referring to a largefile along with the rest of your repository, meaning it's little more than sugar for a fancy "network symlink".
However, this should be enough in most circumstances, e.g. to allow representing the complete state of a Debian package repository without causing Mercurial to slow to a crawl. (Note I just made this use case up; it's just an example.)
Of course! largefiles is a direct descendant of kbfiles (our initial, Kiln-specific version of this functionality). We are really happy to see it integrated into the official Mercurial release, and will be supporting it within the next couple of weeks. We're just working on making sure that the switch is painless and transparent for everyone who's currently using kbfiles.
/me wonders when Mercurial will ever do anything other than copy BitKeeper.
We've been doing this for years, my photos are in a ~100GB BK/BAM repo.
Release notes for BitKeeper version 4.1 (released 12-Oct-2007)
Major features
BAM support. BAM stands for "Binary Asset Management" and it adds
support to BK for versioning large binaries. It solves two problems:
a) one or more binary files that are frequently changed.
b) collections of many large binaries where you only need a subset.
The way it solves this is to introduce the concept of BAM server[s].
A BAM server manages a collection of binaries for one or more BAM
clients. BAM clients may have no data present; when it is needed
the data is fetched from the BAM server.
In the first case above, only the tip will be fetched. Imagine that
you have 100 deltas, each 10MB in size. The history is 1GB but you
only need 10MB in your clone.
In the second case, imagine that you have thousands of game assets
distributed across multiple directories. You typically work only
in one directory at a time. You will only need to fetch the subset
of files that you need, the rest of the repository will have the
history of what changed but no data (so bk log will work but
bk cat will have to go fetch the data).
To really copy BitKeeper Mercurial would need to start charging for use and come out in Basic, Pro and Enterprise editions, with things like BAM support disabled at the lowest level. Fortunately they don't do this.
It's easier to copy than it is to invent stuff, sad but true. hg has a long history of illegally copying BK tech; we could have sued them out of existence years ago. Imitation is the sincerest form of flattery, so I guess we should be flattered :)
Legalities aside, the point I've made for about a decade now is that it would be interesting to see a release announcement from hg where I went "That's cool! Why didn't we think of that?"
As for the 3 levels of product, um, when you build commercial products with commercial support, you don't do that? You really want only one offering? That doesn't play well in the commercial world, we tried that. If you are just taking a dig at commercial software, sorry about that, but we have to pay for dev somehow. I'd love a way to open source the thing and make money, haven't found it.
It's a fairly long and pretty sordid story that ended with a certain hacker's employer sitting down with us and saying "We're not admitting that he did it, but just hypothetically speaking, suppose he did. What do you want?"
And we said "we want him to stop his illegal activity". And he finally did and we dropped it; we're not in the lawyering business. We could have made a pretty big stink about it (it's not one of open source's finer moments), but all we wanted was a level playing field, we got that, we moved on.
I remember that from long ago; I don't think there's anything wrong with not wanting users of your product to leverage it to build a replacement, nor with disallowing it in the license.
I think where people (including myself) have a problem, is claiming that mercurial somehow has ripped off something from BK. From vague memory, BK laid claim to using a DAG for a DVCS? I assume you must also think that if someone is first to implement something that can be trivially found in an introductory computer science text, they have a patentable claim?
Ironic that "largefiles" is a user contributed extension, and that only mercurial seems to be guilty of "copying," but git must be so different that it's unworthy of being sued?
This is a very serious allegation. It would be nice if you substantiated your claims. One could be tempted to write off unsubstantiated claims otherwise.
Hg 0.1 was released around 6 years ago, not a decade ago.
I'd be happy to do so if you can show me an outcome that is anything other than bad PR for us and an open source hacker looking like a jerk. We looked at this hard, and for us the least bad outcome was just to get the guy to stop and move on. Anything else was going to be like this thread: everyone saying it's not true, and then, if they ever believed it was true, they'd still be pissed at us.
If some credible person in the open source community wants to talk to us about it, look at the evidence, and confirm the facts and relay that back without naming names, that's fine with me.
I'll think about it. It's not an easy choice; the guy in question is someone I liked a lot, I tried to hire him, and he's a good guy apart from this one issue. As much as I'd like to show you all that I'm right, I'm not sure that dragging someone through the mud is worth it.
After a bit of digging, I learned that BitKeeper has, as part of its EULA, a provision that disallows its users from contributing to other source control projects[1]. Larry McVoy (aka luckydude) has actually tried to enforce the EULA by contacting the users' employers to get them to stop contributing, but my guess is that he never took it to court because, well, he would be laughed right back out again.
As for Mercurial, one of the early developers, Bryan O'Sullivan, apparently worked for a company that used BK, and McVoy told him that he had to stop contributing[2] to the project, which he did[3]. O'Sullivan is now showing up in the commit history again, which I presume means he's no longer working for a company that uses BK.
We didn't see the upside of suing, and we could sure as heck see the PR downside.
What would you have done? You got a well known hacker who all of the open source guys will side with, so you lose the PR battle, but the guy is ripping off your technology, his employer basically admitted that in front of your lawyer. What would you do?
Yeah, not sure, not an easy problem. Maybe go public? In the open source world, reputation is almost everything, and honesty figures into that heavily. If you have good proof, their reputation would be ruined.
I personally don't care if this was RMS himself or Torvalds, if they are copying code from others as their own, I lose respect for them and will refuse to use or promote their projects.
Maybe present it in an exploratory kind of blog post -- "So yeah we found this out and we don't know what to do, what does the community think?" Just make it public.
This is either him reminding everyone of the hissy-fit Bitkeeper threw when Tridgell telnet'd into a bitkeeper server and typed HELP, or the hissy-fit Bitkeeper threw when Bryan O'Sullivan dared to contribute to Mercurial while his employer held a license for Bitkeeper.
Either way, the reminder that Bitkeeper's licensing terms are so utterly ridiculous that either of the above cases were considered in any way, shape, or form "nefarious" only strikes me as a good way to scare away even more potential customers from their product.
Tridge is not exactly telling the whole story. That telnet thing is true as far as it goes, but the part he left out is that Linus was at his house running BK commands while Tridge was snooping the network to figure out the protocol. There isn't any chance that Tridge figured out what to do by telnetting to bkbits, and I'll back that up with a $10K challenge for anyone to write the same code Tridge wrote, in the same time frame, with the only resource being telnet to bkbits.net.
Go talk to Linus and see what he says about all this, don't take my word for it.
People don't take your word for it though, so it's much ado about nothing. The burden of proof is on you, not anyone else ("Extraordinary claims require extraordinary evidence") hence no need to "go talk to Linus," because your claims are so highly suspect.
Because, really, nobody thinks BK contains some kind of rocket science for anyone to rip off in the first place. If there is, IMNSHO I don't see it in the mercurial or git sources, which are readily available to anyone. Hence nobody takes seriously what you, or even a supposed "employer" or "lawyer", says either (and what your lawyer would say is even more highly suspect :)).
And if all this supposed restraint is due to some kind of self-interested, game theoretic calculation as you imply, why does that same thinking not restrain you from touting unverifiable claims? It's only generating "bad PR" for you which you wish to avoid, makes you look bad, and by extension your claims ever more doubtful. That can't be good for business.
Alright luckydude, moment of truth. Does your EULA say what another poster here quoted -- that it prevents the employees of the company from contributing to Open Source projects?
Here's the scoop. We kinda invented the whole distributed source management thing, go to google groups (if they are still there) and do a date search for changeset before 1998. Now do one today. That's all us. (in case they aren't still there, there were something like 16 hits for changeset before us and now there are zillions.) Is that proof we invented it, I dunno, it's something. We certainly raised awareness that Subversion, which started when we started, wasn't the way to go.
What our EULA said, and says, is while you are using our system you can't contribute to an open source system that competes with us. If you want to work on some other open source you are fine. When you stop using our system, have the big fun, but while using it, no copying our stuff to some open source clone. Yeah, I know that makes many (all?) of you insanely pissed off. But we invented this stuff, we've been ahead of the curve a bunch of times, and open source is always right there ready to "rewrite" whatever we invent. Is it really so unreasonable that we don't want to hand whatever is next cool thing (in source management, right, that's cool :) to people and say "here ya go, have at it, copy it"?
We don't think so because we spend tons of time and money figuring out a right answer. It was a lot of work and just like you, we'd like to reap some benefits from our work.