Mercurial 4.0 Sprint Notes (groups.google.com)
148 points by steveklabnik on Oct 18, 2016 | 127 comments



I know this is slightly tangential, but I'm always a little shocked that Facebook (and I think Google to some extent) have massive mono repositories.

The benefits of having one repository do not seem to be worth the serious performance issues, the potential coupling that can happen in a gigantic code base, or the added difficulty of OSSing certain parts.

E.g. why doesn't FB use dependency management (or binary package management, a la Maven, npm, etc.)? That is, have multiple repositories and cut releases, with build tools to help cut and manage the releases and the dependency graph of the releases.

There are even plugins and shell scripts that will make Mercurial act like a mono repository for many small repositories (I use it for our own Java Maven code base).

I must be missing some killer features and would love to see it in action (FB's main repository).


Anyone who is old enough to remember the pain of VCSes like CVS and RCS will note that the number one feature touted as a reason to move to any other VCS was invariably "atomic changesets"--all the necessary changes to files are captured in a single changeset. Monorepos are nothing less than remembering the value of those atomic changesets.

An example of the utility of monorepos is automation--if you change, for example, how you publish packages, and you need to maintain multiple stable branches, it is immensely useful to keep the automation steps in the same repository as the code. If you don't, your automation repository then looks like

    if version < 31:
        step_a()
        step_b()
        step_c()
    elif version < 35:
        step_a()
        step_bv2()
        step_d()
    else:
        ...
which quickly grows unmaintainable at scale.


I can only imagine that the only way to scale that would be to make whoever owns the repo responsible for automation as well. A dedicated team can provide the tooling/frameworks (basically a "golden path"), but whether/how to use them would need to be up to the team. I think in a general sense this is the approach Netflix takes to services.

That said I have no idea how Netflix does source control, only that they use BitBucket/Stash on-prem and that supports both mercurial and git.


If you do it right, your automation steps are just another versioned package dependency (I know because this is how we do it). We have a single bash script that will auto-update itself (think something akin to Homebrew, albeit in bash).

The only immensely dangerous thing that can happen is if you drop your package repository or change formats of the repository, which rarely happens.


When you split your libraries into separate projects, you have to start versioning them.

And every update to the code requires other teams to then update their library/app to use the new version, and apps depending on that dependency... you get my drift.

You end up with a complicated dependency headache which hurts productivity.

With a monorepo, you can statically identify all places where your library is used and update those automatically with tooling, or manually. You can also monitor where and how a piece of code is used across the company, etc.

If you then ensure that only commits get accepted that pass the relevant tests, you end up with a sane and working HEAD that's always up to date.
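One common way to enforce that gate is a server-side hook that runs the tests before accepting a push. A minimal sketch for git, assuming a hypothetical `run-tests.sh` test runner checked in at the repository root:

```shell
#!/bin/sh
# pre-receive hook (sketch): reject any push whose new tip fails the tests.
# Git feeds the hook "old new ref" lines on stdin, one per updated ref.
while read -r old new ref; do
    tmp=$(mktemp -d)
    git archive "$new" | tar -x -C "$tmp"     # export the pushed revision
    if ! (cd "$tmp" && sh run-tests.sh); then
        echo "rejected: tests failed for $ref" >&2
        rm -rf "$tmp"
        exit 1
    fi
    rm -rf "$tmp"
done
```

Real setups usually hand this off to a CI system rather than running tests synchronously in the hook, but the effect is the same: HEAD only ever contains commits that passed.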

Of course there are lots of drawbacks as well.

But Google and FB seem to have concluded that this is the better approach for them.


> You end up with a complicated dependency headache which hurts productivity.

>With a monorepo, you can statically identify all places where your library is used and update those automatically with tooling, or manually. You can also monitor where and how a piece of code is used across the company, etc.

I.e. you have to figure out dependencies regardless of monorepo or not; you need dependency management either way. Of course you could make the argument that everyone has to use the latest and greatest, but then you have the possibility of changing one dependency and requiring a redeploy of the whole company... I have seen what people do in these cases during desperate times: they copy the code and put it in their own project, which sort of defeats the purpose and, worse, is now not tracked (well, I suppose if they leave a good enough SCM comment it sort of is).

> If you then ensure that only commits get accepted that pass the relevant tests, you end up with a sane and working HEAD that's always up to date.

It depends on workflow. For some, HEAD is what is actually deployed. If that is the case, it is fairly difficult to achieve that goal with a giant mono-repository if you have tons of teams doing microservices and deploying often.


> When you split your libraries into separate projects, you have to start versioning them.

Or you can have your tooling use the convenient built-in versioning provided by the VCS. It's not called a "version control system" for nothing.

> And every update to the code requires other teams to then update their library/app to use the new version, and apps depending on that dependency... you get my drift.

This happens regardless of whether the codebase is organized as a monorepo or not. I've been subjected to monorepos at my previous and current employers, and we run into difficulties with this all the time. At least once a quarter, I get an e-mail from someone telling me that they updated such-and-such, and now the build fails in one of my projects. So I have to sit there and figure out wth they changed, why it's causing my code to fail the build, and how to fix it. Only once was the problem actually in my code (my Makefile, actually, which failed to bind some variable that was expected by the build system but wasn't documented anywhere, and of course the build system spits out a worthless error message, but I digress).

> You end up with a complicated dependency headache which hurts productivity.

Again, monorepos aren't immune from this, nor do polyrepos inherently suffer from it.

> With a monorepo, you can statically identify all places where your library is used and update those automatically with tooling, or manually. You can also monitor where and how a piece of code is used across the company, etc.

There is literally nothing preventing this from being doable with a polyrepo. In the case of updates, you may need to have the updated repos checked out, but it's not like you can only ever have one repo checked out at a time.

> If you then ensure that only commits get accepted that pass the relevant tests, you end up with a sane and working HEAD that's always up to date.

I've never worked anywhere on any project that disallowed commits that didn't pass the tests. It's typically been up to the committer to ensure that what they commit is acceptable. Furthermore, while my employers thus far have all been customers of AccuRev or Perforce (whose only real feature beyond what Subversion offers is that they cost lots of money), and I have not personally had the pleasure of working with a DVCS at my day job, this workflow of "allow only test-passing commits" is wholly contrary to the "many small commits in quick succession" workflow afforded by DVCSs, which I contend is one of their biggest productivity boosts (along with cheap branches and not needing the network) and one of their main draws. And even so, it's still rather easy to maintain a sane and working HEAD with a DVCS (thanks to the cheap branches if nothing else).

If you find that you can't do anything you've listed here without a monorepo, that's a tooling issue rather than an organizational one. Frankly, I've never seen a large monorepo I felt was justified. Every single one was just baggage held over from a time long past when a single repo still made sense for the codebase.


As someone who works at another company with a big monorepo, briefly:

* Ability to change an API and all its users at the same time.

* Circular dependencies become a non-issue in a lot of cases where they would be if you vendor your dependencies.

* Even if you vendor your dependencies, hunting for bugs is a lot easier: without a monorepo, your bisect of a bug in a library will just come down to the commit that upgraded it from 1.0 to 2.0; with a monorepo you can bisect anything down to a specific commit.

* Code discoverability / mass edits are a lot easier, and integrate seamlessly into a lot more tools than if you need to manage multiple checkouts etc.


The big thing I miss about Subversion is that you could check out one subtree. On the biggest SVN project I worked on, we built 5 binaries from the same code tree, but only the leads and a couple of senior devs that worked on crosscutting concerns (ie, the people who would do the mass edits you mention) had the entire tree checked out. Everyone else had just the one or two modules they were working on.

The second biggest, we used just enough of a module system to enforce modularity by having a single code repo but separate compilation units. Your circular dependencies between subprojects would show up at compile time (or at least, on a clean build, as our CI machine did). Stopped a lot of obscure runtime issues and made people think about what they were trying to do.


This post is about Mercurial, with which I'm not familiar, but in Git you have been able to check out a subtree since version 1.7[1]. It's commonly called "sparse checkout".

[1] http://jasonkarns.com/blog/subdirectory-checkouts-with-git-s...
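For reference, the 1.7-era recipe looks something like this (the URL and subtree path are hypothetical):

```shell
git clone --no-checkout https://example.com/big.git big    # hypothetical monorepo
cd big
git config core.sparseCheckout true
echo "services/web/" >> .git/info/sparse-checkout          # the subtree you want
git read-tree -mu HEAD                                     # materialize only that subtree
```

Newer Git versions wrap this in a `git sparse-checkout` subcommand, but it's the same skip-worktree machinery underneath.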


But you still need to clone the whole thing. Sparse clones are theoretically possible, but aren't implemented AFAIK.


You can do a shallow clone; see the --depth option.
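For example (hypothetical URL):

```shell
# Shallow clone: the full tree at the tip, but history truncated to one commit.
git clone --depth 1 https://example.com/big.git
```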


That reduces the history depth but still gets the whole tree. Arguably, for many people the history is much more interesting than having the entirety of the monorepo.


> e.g. why doesn't FB use dependency management (or binary package management aka maven, npm, etc)? That is have multiple repositories and cut releases. Build tools to help cut and manage the releases and dependency graph of the releases.

Because it's a pain in the ass, and a monorepo means you don't have to deal with that crap for internal code. It's especially important for refactorings or API changes, where you can perform a change atomically rather than have to wait for the changes to trickle down across hundreds of repositories over days or weeks.

> potential coupling that can happen with a gigantic code base

The coupling is a feature, not a bug. Decoupling is a means, not an end, and it has costs. If you don't need it, you'd rather not pay them. Same as with generic collections: if you don't need them, there's nothing wrong with specific ones, quite the opposite.


Aside from all the other technical bits, one killer feature for my company is TortoiseHG (http://tortoisehg.bitbucket.org/). Unlike a number of the other Tortoise-* projects (which don't seem related actually), TortoiseHG is a really great cross-platform gui, written in python & qt -- runs just about everywhere, and makes working with mercurial uniformly easy.

The fact that instead of "hg ci" I can type "thg ci" and get a commit window with cherry-picking and meld integration right off the bat is really powerful. I've not seen any good git GUIs that come close to its feature set.


Yeah, THg is the absolute best DVCS GUI client I've had the pleasure of using. Everything else is awful and confusing in comparison IMO.

I even use it for Git with hggit sometimes.


Been at Google for coming up on 5 years, and I don't think I'm giving away any secrets when I say I don't recall there being any performance issues with the mono repository.

And coupling is what is explicitly being sought, not rejected. The whole point is to build everything off head, keep head sane at all times, and avoid version dependency hell.


Then maybe you can answer a few questions:

- Do you have zero external dependencies on 3rd party libraries not owned by Google? These must still be managed anyway? How do you deal with project X not being ready to move to external library version N while project Y needs version N?

- What about release branches? If you need to integrate a bug-fix in a sub-system, the magic mono-repo now means merging a fix is harder, as it may depend on other unrelated changes all over the repo? The laissez-faire attitude of not having to care about details in HEAD would seem to bite back in release branches.


"- Do you have zero external dependencies on 3rd party libraries not owned by Google? These must still be managed anyway? How do you deal with project X not being ready to move to external library version N while project Y needs version N?"

I own third party policy, so i can answer this. I'll stick to public info.

There are thousands of 3rd party libraries. The rule of the shared codebase is "third party libraries are not free headcount". You choose whether you use them or not (implicitly or explicitly). If you add a third party library, you get to maintain it and stay within the support horizon of upstream. If you choose to use one, you get to stay up to date with the upgrades others make. If you need features not in the current version, you get to upgrade the library (and work with teams, who are not allowed to block you).

This is pretty much the only way to make it all work in practice (I'm aware of how it sounds. In practice, even upgrading stuff literally every Google target depends on takes a week. It's only stuff where folks let it go for 6 years [we now have better detection of out-of-date code] that becomes a problem to upgrade).

If you don't like it, don't use third party code.

Note that these rules make it just like any other code, because your problem is not specific to third party code.

Note that binary versioning, etc., is pretty much always a complete disaster in practice on a large scale.

"the magic mono-repo now means merging a fix is harder as it may depend on other unrelated changes all over the repo?"

This is rarely true in practice, because usually fixes are targeted.


From the ACM link someone else provided:

> An area of the repository is reserved for storing open source code (developed at Google or externally). To prevent dependency conflicts, as outlined earlier, it is important that only one version of an open source project be available at any given time. Teams that use open source software are expected to occasionally spend time upgrading their codebase to work with newer versions of open source libraries when library upgrades are performed.

So if an external library has to be updated, the entire codebase must be migrated all at once. Then Rosie is used to split the change into a lot of smaller changes (to be reviewed by all affected teams). Once all the smaller changes are LGTM'ed, it's submitted all at once.


I think it's insane not to have dependencies and third party code in your SCM. Although it seems Facebook plans to move those to a package manager due to performance problems and libs touching ten thousand files in a patch release.


Yes but the question is where do you stop. Are you going to check in GNU Make and the compilers as well? How about whole operating systems?

I also would say "only check in what you change" is a simpler rule than "check in all dependencies needed".

I understand company source code needs to be different because of security reasons and convenience but should it really be that much different than any OSS project (which I guarantee would be pissed if you checked in all your dependencies).

> I think it's insane not to have dependencies and third party code in your SCM

Conversely, there are some who might think the complete opposite (I wouldn't call it insanity, but I would say in most cases it is probably not the right thing to do).


It all depends. Google might want to keep track of the full stack, whereas your average small software company will quickly adapt when their platform pulls the rug.


I am not sure how much of that I can answer in detail, sorry. But yes we have working strategies for dealing with both situations.


http://m.cacm.acm.org/magazines/2016/7/204032-why-google-sto... has a lot of answers (was published last summer).


Maybe I am not understanding the problem correctly, but I wonder why you couldn't use separate repos and build a script to record the current commit hash of every other repo whenever you make a commit in any one of them.

So every commit in every repo has a map that maps the repo path to a commit hash.

You can use this info to sync the versions of other repos when you update/checkout a version in any of the repos. And all of these can be scripted.
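A minimal sketch of that kind of scripting for git repos (the repo directory names are hypothetical):

```shell
# record-pins.sh (sketch): snapshot each sibling repo's current commit.
for r in libfoo libbar app; do                  # hypothetical repo checkouts
    printf '%s %s\n' "$r" "$(git -C "$r" rev-parse HEAD)"
done > pins.txt

# restore-pins.sh (sketch): return every repo to a recorded state.
while read -r r rev; do
    git -C "$r" checkout -q "$rev"
done < pins.txt
```

This is roughly what git submodules and manifest-based tools formalize: a commit-hash map that rides along in version control.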


Why would you resort to a set of hacky scripts that everyone who wants to use the repository must follow when you can just let the VCS itself manage everything for you?


>Why would..

Because then you can keep the repos separated. Not saying that you should. But if this is the only problem that forces you to use a mono repo, then I am just asking whether that problem can be solved in this manner.

>hacky scripts that everyone who wants to use the repository must follow when you can just let the VCS itself manage everything for you?..

What hacky scripts? Do you think they (Google, Facebook, etc.) don't have enough resources to build a 'non-hacky' script for whatever they need done?


I haven't worked on a monorepo, but I will say that Facebook _does_ use at least some of these tools, apparently: enough so that they created a whole replacement for the npm client, yarn.

From talking to some companies that use monorepos that want to use Rust, they _do_ use these kinds of tools, but want support for it in the tool. Firefox, for example, is pretty much a monorepo, and we built tooling in Cargo to help.

I think that part of it is that "monorepo" can mean slightly different things. Does it only contain your company's code, or also your dependencies? Do you modify those dependencies in-repo?


Yeah I'm not even sure I understand what FB's monorepository is so it is probably unfair critique.

I can definitely see an "app" (Firefox) as a monorepo but not giant web companies (particularly with the surge of microservices).

BTW Cargo is fantastic (and I'm fairly picky on package management / build tools).


Regarding performance: these mono-repo tools work differently and are optimised for their workload.

Regarding why the hell you would do it: in a corporate setting you tend to throw all kinds of stuff into your repo. In my workplace that includes network configurations, 200 MB binary tools, and usernames (and who knows, probably passwords too).

We do this because we want all the different things, including external dependencies and "ops stuff", to be synchronised. That would be totally insane in the open source world, where you are releasing to the public -- but it seems corporations can make it work internally.



OSSing certain parts is certainly an issue, but what do you mean by "potential coupling" being an issue? That's exactly the advantage, right?


I suppose you could make the argument the other way, but my thought is that with all the code together it may be confusing what is public API (i.e. interfaces/contracts) and what is not.

I know personally that I have had to move components into separate projects (not necessarily repositories, but compile units) to keep developers from accessing things they shouldn't (e.g. accessing the database directly instead of going through a different layer).

Separate repositories would enforce it to a greater extent. But like I said earlier, I suppose I could see it the other way (i.e. visibility and thus prevention).


I don't use version control systems to any extent, but it seems like your developer problem is an issue of team coherence and code review, and not a matter that your VCS should/needs to enforce.

Even in a distributed team like Linux kernel development, code maintainers have the ability to say "this is how we (as core maintainers) want you to access a certain data structure. If we catch you doing anything else illegally, your code won't be accepted, and we will keep rejecting your code until you adhere to our coding style and API design."


While I appreciate "code review should be improved", in many IDEs it is fairly easy to accidentally import something you should not have (they merge the class path), along with other accidents. These issues, along with many others including code formatting, can be caught with automation.

Separate projects allow fewer namespace-collision accidents.

You could achieve all of this with a monorepo, but it requires proper tooling.


As an open-source example, the Linux kernel is a mono repository. It makes refactoring easier, and it makes it much easier to drop support for old (or not so old) unused features.


The Linux Kernel and Firefox (mentioned earlier) are not really good examples. I would put those in a single repository as I would imagine most people would.

But a whole company as big as FB and Google using a single tree seems like an incoherent nightmare without proper (somewhat proprietary) tooling (filtering of logs, branches, tags, etc).


> There are even plugins and shell scripts that will make Mercurial act like a mono repository for many small repositories

Seems kinda backwards to me, why not project portions of a monorepo to behave like individual repositories? That way you get atomic commits too.


> Seems kinda backwards to me, why not project portions of a monorepo to behave like individual repositories? That way you get atomic commits too.

But on the other hand you now need custom scripts to filter out other projects that are completely irrelevant to you. Just the .hgtags file alone must be a 10k-line nightmare... or maybe the big boys just don't deploy/release often. It is a lot more tooling than you might think.

Linux and Firefox having a monorepository is not the same as a whole company as big as Google or Facebook having a single repository for all of their projects.

I'm not even sure if these companies really are doing that. "Monorepository" could just mean a single place of storage; maybe they do allow a couple of trees (or maybe not).


Maybe Docker is the solution.

Your "working state" is now in an image (maybe a Dockerfile that checks out particular revisions of different repos), so putting everything into one giant VC repo is not necessary.

A versioned Dockerfile that says "this is how we built SystemX at version 1.5" is much better than doing it in Git, as it also covers how the underlying server was built, while a giant git repo might have every application you need at the correct version, but won't say anything about the server it is deployed to.


The advantage of a monorepo is that you have atomic commits. What you're suggesting does not provide that.


I had no idea Google and FB were dabbling with Mercurial.

I checked it out years ago, but pretty much settled on Git.

What are the advantages?


1) The .git directory doesn't play nicely with mono-repos. Since all files are just hashed files that live in the .git dir, knowing which files in there are part of a subtree is hard. On the other hand, Mercurial's .hg dir uses a tree structure to track files, so you can do things like NarrowHG[0].

2) As well, Git has multiple client implementations (git, EGit, JGit, etc.). Adding new features is a bit more complicated, as all the implementations need to add them before they can be more widely used. Mercurial has one implementation that everyone uses, so new features are easier to add.

3) The .git structure is simple, which is great, but it has become the API for Git in a way, while Mercurial explicitly says you should never rely on the structure of the .hg directory. If you want to interact with the .hg dir from other software, you should either issue 'hg' commands or start up a command server[1] to talk with it. So it creates a cleaner API barrier. Because of this, the Mercurial team can make changes to the .hg dir to better serve different needs (like those of a mono-repo) without breaking the world.

[0] https://bitbucket.org/Google/narrowhg

[1] https://www.mercurial-scm.org/wiki/CommandServer


The .git file format has also changed multiple times: packed refs, multiple pack file formats, etc. There are even WIP ref backends now which store the whole thing in some embedded database format.


There is no ".git" file. It's a directory with a lot of files. Some of the packing has been storage optimisations but the logical model (objects identified by hash) has remained the same throughout.

The nice thing about this is you can present the same logical model while being flexible about the way that model is persisted, unlike Mercurial which has a fixed file format upon which operations are based.


Mercurial also has a proper plugin architecture, which means that new features can be developed in the wild as needs arise, and then either become widely used or rolled into the standard distribution.


narrowhg looks like a good replacement for subrepo, thanks for the tip


Keep in mind that narrowhg is extremely experimental!


The possibility of extending the Mercurial core with a well-defined API is one major reason. In fact, most interesting features are extensions bundled with Mercurial, and after several releases and some experience, the functionality usually gets integrated into Mercurial proper (more often than not still as an extension). This is unfortunately a Python API rather than a C API, but it's still an advantage Mercurial has for now. Git has various efforts to build reusable libraries to write tools with, but there isn't an officially sanctioned _and_ complete one that works across all platforms. Microsoft has removed their reliance on libgit2 and shells out to git in Visual Studio now. If the git project had an official libgit, which exposed all functionality, was reused by git itself, and worked across all platforms, the situation would be in favor of git, because consuming a C API is more broadly supported than, say, using a Python API inside a .NET application.

tl;dr: Git is still faster overall, but those who want to extend a DVCS choose Mercurial for an API that exposes the data structures in a stable manner, albeit in Python, which limits use cases.


Actually the fact that Mercurial uses Python is what made it quite usable on Windows from the get go, before Microsoft and others bothered to step up and improve the experience.

As for using Python on .NET, that is what IronPython is for.


Of course Python as an abstraction layer made it more readily available on Windows and allowed allocation of developer resources to hgtk, including a Windows Explorer extension.

Still, despite its flaws, C is the common layer we have to expose an API that you want to be consumed everywhere. That, or a message passing interface with a client/server architecture. A client/server design may lead to zombie servers, while a tightly coupled C API might crash your application, though you can isolate the C API consumer in a supervised and automatically restarted server you talk to with messages, so that's the more flexible API to have.

The Rust rewrite of parts of Mercurial by Facebook is a no-brainer, and given the possibility of a GC-less C API in Rust, I wouldn't be surprised if a built-in-Rust C API for Mercurial were to follow. I don't like Rust when compared to high-level languages, but it's a viable C replacement with compile-time exclusion of certain bug classes, so I can get behind such a project. That said, the soundness bugs reported on GitHub are worrisome, so I wouldn't trust Rust's checker to be correct or exhaustive just yet. It's still a step up from C, that's undeniable.


Well, on Windows we also have COM, but I get your point.

Going off topic, maybe someone will eventually do a SQLite rewrite as well, along with other critical projects in our modern stacks that still rely on C.


> Going off topic, maybe someone will eventually do a SQLite rewrite as well, along with other critical projects in our modern stacks that still rely on C.

SQLite does not need to be rewritten. It has the best and most comprehensive test suite in the history of software development -- I would go so far as to say that there are no implementation bugs in SQLite (every single branch in the code has been extensively tested and also extensively tested with dummy failures and so on). So a rewrite in a safer language would benefit nobody (and would just be a huge time sink).


I'm not advocating one way or the other, but I will say that I don't believe 100% code coverage guarantees that you have no issues lurking that a safer language would prevent.

Now it probably isn't worth the effort for a very well tested project like sqlite, but that doesn't validate the premise.


> but I will say that I don't believe 100% code coverage guarantees that you have no issues lurking that a safer language would prevent.

It's 100% branch coverage, with 100% fault coverage as well. If there is an "issue lurking that a safer language would prevent" I would honestly be shocked. SQLite is not a good project to mention rewriting, because it is an incredible technical achievement in terms of how well tested it is.


> SQLite is not a good project to mention rewriting

Which, as I said, is not what I'm doing. I'm only disagreeing with the premise that 100% test coverage means 0% chance of an unsafe bug existing in the code base (for any code base, not just for SQLite).


Personally, I consider 100% branch coverage to mean an effectively 0% chance of an unsafe bug existing in the code base. Does Rust have 100% branch coverage in their compiler? So how can you be absolutely sure their compiler doesn't have bugs when doing borrow checking and other tomfoolery? 100% branch coverage is an _insanely_ high standard and you can't compare it to any other project unless that project also has 100% branch coverage or similar.


Python for .NET (pythonnet) allows bridging the CPython and .NET/Mono runtimes.


The biggest technical difference is the mostly immutable history, which is a feature or a drawback depending on who you talk to.

More subjectively, most people I've chatted to about it seem to find Mercurial's interface much easier to grok / pick up as a new user than Git's (which is somewhat notorious for its quirks).

There are other differences, but these stand out to me. That said, I use Git because adoption + community (and my experience with hg-git has been less successful than some... though it's been a while since I gave it a spin)


CTO of RhodeCode here.

If you use phases (draft/secret/public) correctly, it's really close to being mutable. For example, all our devs' forks are non-publishing repositories, since a fork is a private, non-shared space. By keeping all commits as drafts, it's easy to do a rebase and then push the new, changed commits.

The main repo, meanwhile, is publishing, and once pushed, commits are never mutated. We also keep a workflow where only dev forks can have multiple heads, while the production repo can't.

IMHO it's the nicer form of mutability


Do you have or know of writeups describing this workflow in more detail? I'm interested in learning more.


This workflow is described in the documentation for the mercurial evolve extension: https://www.mercurial-scm.org/doc/evolution/


Here's a blog post on how to use bookmarks in a Mercurial workflow with code review. It comes from one of the guys who works for us and describes our workflow:

https://rhodecode.com/blog/120/mercurial-workflow-using-book...


Can you elaborate on what you mean by immutable history, and how git lacks it?


In git you can edit past commits and rebase, which completely destroys historical data and traces of the rebase. This can be very useful, e.g. for keeping a frequently-committed-to branch "clean" and informative by squashing commits, for removing accidentally committed sensitive credentials, etc.

Mercurial, on the other hand, is architecturally set up in a way that considers the repo history to be a somewhat "sacred" truthful account.

You still get the same flexibility as git if/when you need the above mutability: it's not 100% immutable, it supports local rebases, and also global mutability via "phases" - see Marcin's post on this. In an absolute worst case scenario you can also coordinate reclones of a repo, of course, but generally speaking the point is that immutability is "on by default".


I'm sorry, I still don't see the distinction. It sounds like in both systems the default is immutability, with commands to override that if necessary.

And by the way, doesn't git rebase not mutate history? I believe it creates a parallel history (and updates the branch and head refs to point to the new history) but the old history still exists in git's storage and can be recovered (until you GC your storage).


In git, `git rebase` will always do whatever you tell it to do, to any commit.

In Mercurial, `hg rebase` will abort if you're trying to make it do something to public commits. Public commits are commits that have been shared on a publishing server. They contrast with draft commits, which are commits that have not been shared, or have only been shared on a non-publishing server.

If you really want to edit public commits, you have to manually force them back to the draft phase before you can rebase or rewrite them.


If you "git push" after rebasing published commits, you get an error unless you use "--force" or you pull and merge manually beforehand.

So I'm still not sure the distinction you are making between which one is "immutable" and which one isn't.
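For what it's worth, where git's check actually lives is easy to see with a throwaway pair of repos (a sketch; all names here are invented for the demo):

```shell
# Sketch: git only objects to rewritten published history at push time.
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/origin.git"
git clone -q "$tmp/origin.git" "$tmp/work"
cd "$tmp/work"
git checkout -q -b demo
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first"
git push -q origin demo        # the commit is now "published"
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --amend --allow-empty -m "rewritten"
# A plain push of the rewritten history is rejected as non-fast-forward...
git push origin demo 2>&1 | grep -q "rejected" && echo "push rejected"
# ...but nothing stops you from forcing it through:
git push -q --force origin demo && echo "force push accepted"
```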


Phases catch this problem early before `git push`. You don't want to run into the problem too late. Then you'll be faced with the decision to throw out your work because you accidentally rewrote something that should not have been rewritten. Or you'll have to decide at the last minute that you really did mean to make problems for everyone else.

Btw, with Mercurial Evolve, there's no need to force-push, as Evolve will propagate meta-history to other users that indicates what commits replace which ones.


Ok so it has nothing to do with immutability, it's just one warns earlier than the other.

Thanks for clarifying.


It is about immutability. No Mercurial command (histedit, rebase, uncommit, amend) will allow you to change an immutable (public) commit unless you first force it back into the draft phase.

You can say that because you can always force public commits into drafts that they're not really immutable, but that's a bit of a perversion of what Mercurial's phase system is intended to do.


And no git command lets you change what public history looks like unless you --force the push.

So the check is in a different place. That doesn't seem to me to imply that Mercurial's history is "immutable" and git's is "mutable", especially since those words have precise meanings, and even WITH the --force, git doesn't change anything in the history, it just writes out a new history and updates the branch ref to point to the new history.

The only thing mutated (in both systems) is the ref to the branch head, right? So aside from warnings and errors being in different places (both before publish time), what is the difference between the two that leads you to argue that one is immutable and one isn't?


I think what Mercurial brings here is that even if you do a force-push, you keep all the heads, so it's almost like a history of changes.

I think if you do a code review and you get the final state of the repo to check, you can nicely tell how it evolved / was squashed / was re-ordered.

I find that workflow much more useful than just rewriting history like in git (I know about reflog).


To add my 2 cents, immutable history is more of an annoyance than anything. If someone checks in something by mistake that absolutely must be removed (e.g. something that contains a password), it is a considerable undertaking to actually remove that commit from the repository.


It is not hard once you know how to do it; it's just not readily documented (i.e. histedit and force push).

BTW if you checked in a password... you really should now go change that password :)
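Roughly, the recipe alluded to looks like this (a sketch, not a canned command: the revision number is hypothetical, and on a publishing server an admin may still need to strip the old commits server-side):

```shell
# Hypothetical sketch of removing a leaked credential from hg history.
# Force the offending (public) commit back to the draft phase:
hg phase --force --draft -r 1234
# Interactively rewrite history from that point (histedit ships with
# Mercurial but must be enabled); choose "edit" or "drop" for the bad commit:
hg --config extensions.histedit= histedit -r 1234
# Push the rewritten history; a publishing server will still hold the old
# commits until someone strips them there too.
hg push --force
```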


I know better than to check in passwords, thank you very much. Regrettably, some of my colleagues do not.


A sane distributed source control system implemented in Python.

I am still disappointed that Git won.


Agreed. And yet it is encouraging to see the continued investment and upstream engagement by Facebook and Mozilla. (I knew Facebook and Mozilla were heavily invested; I'm not sure what Google's involvement is since code.google.com was abandoned.)

Things could get more competitive in terms of global mindshare if more Facebook engineers (some of whom the OP mentioned have forgotten how to use git) start speaking out in favor of Mercurial.


Google is working on adding mercurial as a frontend for their repository management system (Piper) and development environment (CitC).


Depends on your definition of won, I guess...fourth job in a row using Hg. I've never used git professionally, just for OSS stuff.


Also C won, Unix won, Android won. It seems technically subpar options make for better business options.


You forgot JavaScript and PHP.



The big advantage to call out is that they found Mercurial much easier to add extensions to and more receptive to their issues with large monorepos. (For an idea of the scale, Facebook's repository I have heard is ~10× the size of mozilla-central, which is probably the largest public Mercurial repository.)


Much faster on Windows for large repos.

Better extension system. Some, like hg evolve, are science fiction compared to typical git workflows. Sounds like hg absorb is similar.


If you have a ton of files inside of one gigantic repo, the work facebook is doing on mercurial might help you out.

If you live in a saner world though, you'll probably benefit more from git's superior cli & tooling


I think you mostly call it a "saner world" because git tooling and performance don't work with unified repos, so people avoid them.

But it can be fixed, too, which is what Facebook and Google are doing.


It's weird to hear git's cli called superior. I have much more often heard the opposite.


The Mercurial CLI is safe by all means! I have never lost a commit in hg. The Git CLI is an abomination and a nightmare to deal with. As a colleague of mine once said, "It is easier to develop programs than to version-control them using git".


What about speed?

Our team just moved from hg to git for an enormous project that has ~25 years of history (CVS -> SVN -> hg|git). The biggest improvement to my daily life is that a git pull takes seconds, while an hg pull takes minutes (or even large fractions of hours when I've spent a week or two away from work).


What version of Mercurial were you using? Additionally, was your repository very branchy? Were your pulls stuck for a very long time on 'adding manifests'?

It's possible that your slow pulls were due to the initial storage format being inefficient for very branchy repositories. I've documented migrating to generaldelta to solve this here: https://book.mercurial-scm.org/read/scaling.html#scaling-rep...

Additionally, using the 'clonebundles' feature, it's possible to speed up your initial clone by a huge amount (making it way faster than non-clonebundles Mercurial or Git): https://book.mercurial-scm.org/read/scaling.html#improving-s...

Of course, this is too late for you, I guess...
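For anyone else hitting this, the two server-side tweaks boil down to a small bit of configuration. A hypothetical sketch of a server repo's .hg/hgrc (existing repos additionally need their history re-encoded, e.g. with `hg clone --pull --config format.generaldelta=1 oldrepo newrepo`, and clonebundles needs a manifest pointing at pre-generated bundles):

```ini
# Hypothetical server-side .hg/hgrc sketch for the two optimizations above:
# generaldelta makes storage efficient for branchy history (new repos only),
# clonebundles serves pre-built bundles so initial clones are fast.
[format]
generaldelta = true

[extensions]
clonebundles =
```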


That is not normal. We at RhodeCode work with some of the biggest behind-the-firewall Mercurial setups. We have one customer which uses a global VPN for their few instances.

They always measure Mercurial pull performance using load tests. For example, between versions 4.2.X and 4.4.X of our software, we went from 1.8s to 1.4s average time for a pull under load.

We only do this via HTTP, since that can be really optimized for speed. Pulls taking minutes sounds like a backend problem: an overloaded server, not enough workers to handle connections, etc.


Where were and are you hosting it? SSH vs HTTPS? Client? OS? Any hooks?


We were hosting both on an ESX VM and clones / pulls were done via SSH. OS was FreeBSD on both ends. Hookless.

Part of the reason for the move is that we wanted to take advantage of the corp-wide infrastructure of an Atlassian stash server hosted in the cloud and professionally maintained, so as to get away from maintaining our own repo.

But the speeds I quote above were for the initial phase of the conversion, when both repos ran on the same ESX VM and direct comparisons were meaningful. Now that it's hosted professionally in the cloud, it seems even faster.

Note that I'm across ~2500 miles of VPN from the home office, and that surely has something to do with it.


Interesting...I've noticed mercurial can be a little slow at times, but never had a problem with that much pull lag. Could be an overloaded server maybe? Or just outside the scope of my experience.


You don't have much history in git at the moment.


Didn't he say he imported a repository with 25 years of history?


We have 25 years of history.

The initial clone time for hg and git is within the same order of magnitude (order of an hour), though git manages to be about 50% faster at that too.


> 25 years of history

Why do you need to keep all 25 years of commits?


Because it's useful to figure the intent of strange-looking code that was last touched 15 years ago. (Don't know the parent's situation, but that's the case for me here. Perforce isn't too bad for that)


I hear you. But for me, sometimes it's better to just start fresh than to try to figure out the intent of some ancient history. In this case, I guess that's BDD over TDD.


That was a busy meeting and the developer mailing list is very busy. It's great to see continued investment from so many interested parties.

Judging by the notes in the wiki [1], however, the purveyors of my preferred server, Kiln, are not so engaged lately:

> Available hosting solutions: Bitbucket, Kallithea (self-hosted), Kiln (still exists?)

I believe it is maintained and even if not maintained would continue to work for ages, but I suspect we will be held to 3.x for a while.

[1]: https://www.mercurial-scm.org/wiki/4.0sprint


Phabricator also supports hosting Mercurial, but there are some rough edges. Its need to do some basic monitoring of the wire protocol (primarily to determine whether actions are read or write, for ACLs and cluster resolution) has uncovered that interpreting the underlying wire protocol is difficult and challenging [0]. I wonder if Kiln or others have similar issues? I would guess that this sort of problem is one of the reasons hosted tooling around hg falls behind git, even though hg touts a great extension API (aside from the large gap in user base).

[0] https://secure.phabricator.com/T9548


Mercurial doesn't use semantic versioning. If tooling that depends on mercurial breaks in a mercurial update, this is often considered a mercurial regression, so long as the tooling isn't using mercurial's internal APIs which have no backward compatibility guarantees.

I do know that the person primarily behind kiln harmony left a while ago and now works at Khan Academy.


What's your best Kiln feature? I wonder if we can adopt it at RhodeCode.


Hmm. I haven't looked at RhodeCode. For me Kiln has provided a good UI and workflow and just worked ever since their v2.4 or so. I'm not sure there is a particular best feature, but roughly these features are what come to mind:

- Code review UI including comment pane with one level of sub-threading and easy click and drag linking to lines of code, small changeset selection pane, and large scrolling code pane are simple and work well. By contrast, I'm less sure about github and bitbucket pull requests UI.

- largefiles

- Single sign-on and linking with FogBugz (if we reference a FogBugz case in a commit message, then the commits show up in the case notes)


Thanks for sharing; actually, those first two were our main focus from the beginning.

Back in the day we worked with Unity, which was using Kiln, to make our code-review workflows similar.

If you miss Kiln, you should check RhodeCode out; our community edition is even open source.


Kiln's biggest omission, which is what made me choose Bitbucket, is the lack of branch pull requests. Kiln's commit-based PRs are useless in a feature-branch workflow.

Bitbucket's code review implementation doesn't handle big changes gracefully though, so I primarily inspect the changes in Beyond Compare.


What's hilarious/sad to me is that Kiln started out (as the prototype that won Django Dash) doing only pull requests. That was literally all it did. Pushing directly to a repo would instead, behind-the-scenes, make a branch repo and put your commits there. When you accepted the review, it'd automatically get merged. Kiln would even warn you if you could safely do the merge without conflicts. This was all back in 2008, and I believe predated GitHub launching entirely, but certainly predated PRs being common.

Hilariously, we concluded internally that doing things that way was too complicated/weird for people to use, while GitHub concluded the exact opposite, and the rest is what you see.


True. As a workaround you can bridge your feature branch workflow to Kiln reviews by using the kiln extension locally with the push command to attach the changesets to a review, as in:

    hg push --review -r [REV|BOOKMARK|BRANCH]
Theoretically. The extension seems to be broken with respect to hg 3.8 and with the Kiln API itself.


Kiln and FogBugz have a nice integration.


hg absorb sounds very useful. I'd like to know details, since I can imagine undesired results it might produce.


Looks like it uses annotate information, seems pretty handy.

https://bitbucket.org/facebook/hg-experimental/src/default/h...
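From the source, the flow seems to be roughly this (a sketch: the extension lives in hg-experimental so it has to be installed and enabled first, and the file name is made up):

```shell
# Hypothetical hg absorb session: given a stack of draft commits plus some
# uncommitted fixes in the working directory, absorb uses annotate data to
# fold each changed hunk into the draft commit that last touched those lines.
hg log -r 'draft()' -T '{rev} {desc|firstline}\n'   # the in-progress stack
$EDITOR somefile.py                                 # fix a typo from an earlier draft commit
hg absorb                                           # amends the matching commit in the stack, no rebase dance
```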


So is this a comeback for Mercurial?

I know Subversion is pretty much dead, Mercurial was sidelined, and Git seems to have conquered the world. The only thing I think would shake things up a little would be Perforce going open source, but I don't see that happening, as they seem to be very comfortable in their niche.


Anyone know if Mercurial's subrepo feature is used by big corporations?


Relevant quote:

  > Facebook is writing a Mercurial server in Rust. It will be distributed and
  > will support pluggable key-value stores for storage (meaning that we could
  > move hg.mozilla.org to be backed by Amazon S3 or some such). The primary
  > author also has aspirations for supporting the Git wire protocol on the
  > server and enabling sub-directories to be git cloned independently of a
  > large repo. This means you could use Mercurial to back your monorepo while
  > still providing the illusion of multiple "sub-repos" to Mercurial or Git
  > clients. The author is also interested in things like GraphQL to query repo
  > data. Facebook engineers are crazy... in a good way.


A suggestion if I may... Don't use monospaced formatting (two-space prefix) as a way to put a '>' on every line of a long quote, as it becomes unreadable on mobile. One '>' at the beginning of each paragraph is good enough (with blank lines to separate paragraphs), or italics are fine too.

Here's a copy that should be readable on any device:

> Facebook is writing a Mercurial server in Rust. It will be distributed and will support pluggable key-value stores for storage (meaning that we could move hg.mozilla.org to be backed by Amazon S3 or some such). The primary author also has aspirations for supporting the Git wire protocol on the server and enabling sub-directories to be git cloned independently of a large repo. This means you could use Mercurial to back your monorepo while still providing the illusion of multiple "sub-repos" to Mercurial or Git clients. The author is also interested in things like GraphQL to query repo data. Facebook engineers are crazy... in a good way.


This is offtopic so I'll leave it at just this one reply, but thanks! Lack of quoting in HN's markdown is the only part of it I'm actually frustrated with; I usually use the two-space + wrap format because it makes it much clearer that it's a quote; with your version I find that hard to tell. But that said, I checked on my phone, and I see what you're saying about mobile. Ugh.


HN markdown does support italics, so italicizing entire paragraphs is a viable alternative to differentiate what you're quoting from your own comments. I've seen folks do it now & then.


CTO of RhodeCode here. This is exciting, we need to take a look from our side how this would affect our product.

I really like that Mercurial is gaining some traction with the big guys, which tries to solve some nice problems at scale.


Over here: https://www.mercurial-scm.org/wiki/4.0sprint

  >  - Goal is to open source the server, once it's more than just slideware.
So you might not be able to look at it just yet, but hopefully soon?


Hopefully!

We spent a lot of time on our own scaling our Mercurial backend. Currently, with the HTTP-based vcs-server and gevent, we can support a lot of concurrent hg operations, but IMHO that thing could take it to the next level...

I wonder if it will support all things like phases etc ootb.


That's exciting, sounds similar to the git fusion stuff Perforce is doing.

I'd be happy to see a bit more diversity in source control. Git is great but falls over in more than a few scenarios (unmergeable large binary files come to mind).


Congratulations, this is the kind of stuff that will improve Rust's adoption.


It's interesting they're doing this in Rust. I would have expected them to do it in D.


Warp [0], which was a huge marketing win for DLang, is not maintained anymore. Not sure if it is being used at all. Will there be any D projects at Facebook?

[0] https://github.com/facebookarchive/warp



