Mononoke – A Mercurial source control server designed to support large monorepos (github.com)
106 points by guifortaine 9 months ago | 31 comments

Git Community PM at Microsoft here: I'm very impressed by Facebook's source control team in general (and Mononoke in particular). They're doing a lot of interesting things, and they've developed a very opinionated approach to working in their monorepo that serves their workflows well.

Another comment suggests that there's "fierce competition" in the version control space. In fact, I think that there's fierce collaboration in the version control space! Durham Goode from their team gave an interesting talk at Git Merge 2017 about how they're using Hg and some of the interesting user-experience tools that they've built on top of it: https://m.youtube.com/watch?list=PL0lo9MOBetEGRAJzoTCdco_fOK...

I don't personally prefer Mercurial (I like Git, as you might have guessed), but I think that there are some opportunities to learn from their user experiences and perhaps add or improve some high-level Git commands.

Tech lead of Mononoke here. Thanks for your comments!

The experience of using Mercurial within FB is quite different from the stock open source one. We have many local extensions, as well as integrations with the rest of our developer infrastructure. We're definitely optimizing for the linear-history monorepo model.

We don't see things as a competition between hg and git, but between good developer experiences and bad. Thus far, we've found it easier to improve the developer experience with hg than with git, so that has been our main focus.

But we feel we're reaching the limits of what we can do with hg, which is why we're investing in Mononoke. We feel that version control in general has been pretty stagnant, and we're hoping to do some neat stuff over the next few years (based on Mononoke and other projects).

While you're here, can you say anything about when beta users outside of MSFT/GH might be able to try GVFS?

Do you think GVFS will also be able to support "thousands of commits an hour"? (It prominently claims to support "millions of files").

Your second question about "thousands of commits an hour" is actually a server scale question, as opposed to a GVFS question. The main difference is that GVFS is primarily client software plus some extra server protocols that a server must support.

I find the "thousands of commits an hour" measurement to be conflating multiple things. I'll break it down using our experience with the Windows repo.

The Windows repository is hosted in VSTS and handles over 4,000 developers doing their daily work. This includes an average of 3,000+ pushes per day via the Git protocol. Each push may include multiple commits. More than 2,300 pull requests are completed per day, each of which creates a new merge commit. Also, the build machines run daily and kick off a push as their first action to update a version number. This creates 250+ parallel pushes within a 10-second window, so VSTS supports a spike of concurrent pushes.
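As a rough illustration of how these daily figures compare with a "per hour" headline number (this is not from the original comment; the even 24-hour spread is an assumption, and the number of commits per push is not stated, so only pushes and pull requests are counted):

```python
# Daily figures quoted in the comment above (each is a lower bound).
PUSHES_PER_DAY = 3000
PULL_REQUESTS_PER_DAY = 2300
BUILD_PUSHES = 250          # parallel pushes from the build machines
BUILD_WINDOW_SECONDS = 10   # window in which those pushes arrive


def per_hour(daily_count: int, active_hours: float = 24.0) -> float:
    """Average hourly rate, ASSUMING activity is spread evenly over active_hours."""
    return daily_count / active_hours


avg_pushes_per_hour = per_hour(PUSHES_PER_DAY)
avg_prs_per_hour = per_hour(PULL_REQUESTS_PER_DAY)

# The build-machine spike, expressed as a rate rather than a daily total.
burst_pushes_per_second = BUILD_PUSHES / BUILD_WINDOW_SECONDS
burst_equivalent_per_hour = burst_pushes_per_second * 3600

print(f"average pushes/hour:        {avg_pushes_per_hour:.1f}")
print(f"average merged PRs/hour:    {avg_prs_per_hour:.1f}")
print(f"burst pushes/second:        {burst_pushes_per_second:.1f}")
print(f"burst rate as pushes/hour:  {burst_equivalent_per_hour:.0f}")
```

The gap between the average rate and the burst rate is one reason a single "commits per hour" number can conflate several different things.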

Thanks, that's very helpful color!

(Plus, a nice ad for VSTS – well done).

Btw, Mercurial has a longer-term goal to rewrite parts of hg in Rust.


There's some irony that this lives on... GitHub.

/git evangelism strike force

They really should have put it on Bitbucket. Hosting code for a VCS publicly and using the most fiercely competing system for it just sends the wrong message. How did nobody catch this?

I think that the message it sends is that we collaborate with each other - Mercurial and Git, Facebook's Mononoke and Microsoft's GVFS. And since GitHub is _the_ place where open source software development happens, that's where this is hosted.

We're not enemies. We're not competing. We're trying to make developers' lives better, and there are a number of techniques that can work, and a number of tools that can help them.

Competition does not stop when you don't want to see it. It exists already by the virtue of having multiple similar products in the first place. You can't wish that away. The world doesn't work that way.

Well, then I wish Mercurial nothing but luck in winning the lucrative free version control client market.

Edit: sarcasm aside, you're right in saying that there is competition here. And competition is healthy, it's what drives us to solve problems in different ways and _allows_ us to collaborate. But to suggest that we shouldn't use each other's technologies feels short-sighted and like the wrong kind of competition.

GitHub is proof that the market is lucrative. There is no sarcasm here.

You have to be careful what you use other technology for. If you don't show that you're relying on your own product when you clearly have a chance to do so, you immediately lose trust. First impressions matter a ton there.

GitHub isn't Git; if hg takes off, GitHub (the company) can just add support for it.

SCM hosting is very different than SCM software; the latter is usually... not lucrative.

E.g., I'm sure MSFT hopes to make money from their GVFS effort, but not by selling it – simply by using it as a marketing tool to establish credibility (e.g., for Team Foundation Server and other developer products).

Even if nobody _used_ GVFS, it could still be a profitable project for MSFT if it made everyone think MSFT an expert in source control. (To be clear, I love this incentive structure, and it's working on me – my respect for Microsoft is growing, thanks in part to GVFS).

Most people might know the name from the 1997 anime [0], not really know what the word means, but recognize the "mono" part as "singular". But really, it means monster [1]

[0] https://en.wikipedia.org/wiki/Princess_Mononoke
[1] http://jisho.org/word/%E7%89%A9%E3%81%AE%E6%80%AA

Mono-repo -> we're focusing our efforts on a few large repos

Mono-tone -> The first cryptographic Merkle-tree distributed source control system, which was a large influence on both Git and Mercurial.

Monotone was written by Graydon Hoare, who later went on to design Rust (I think he was actually designing proto-Rust at the time), so the reference to Monotone is to both Mononoke's function and implementation.

Why did Facebook choose Mercurial over Git?

From a previous blog post:

"Our engineers were comfortable with Git and we preferred to stay with a familiar tool, so we took a long, hard look at improving it to work at scale. After much deliberation, we concluded that Git's internals would be difficult to work with for an ambitious scaling project."


Note that Microsoft arrived at the opposite conclusion and wrote GVFS: https://github.com/Microsoft/GVFS

Reading that, it sounds like they reached the same conclusion: that they didn't want to mess with git's internals. They wrote a filesystem to run underneath git.

Hearing from friends that work there, the RTT for the underlying filesystem adds quite a lot of time for daily operations, especially if they are not working on the west coast of the USA. It was said a pull takes 45 minutes.

This leads me to believe that to handle large repos and files within a tool like git, its internals should be changed a bit so that fewer file accesses (fstat, read, write) are needed. Certain operations should also be batched together to better hide the latency involved in global communications.
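To make the batching argument concrete, here is a minimal back-of-the-envelope model (a sketch only; the operation count, round-trip time, and batch size are made-up illustrative values, not measurements from GVFS or any real deployment):

```python
import math


def sequential_time(num_ops: int, rtt_s: float, service_s: float = 0.0) -> float:
    """Total time when every file operation pays its own network round trip."""
    return num_ops * (rtt_s + service_s)


def batched_time(num_ops: int, rtt_s: float, batch_size: int,
                 service_s: float = 0.0) -> float:
    """Total time when operations are grouped: one round trip per batch,
    while the per-operation service time is still paid for every operation."""
    num_batches = math.ceil(num_ops / batch_size)
    return num_batches * rtt_s + num_ops * service_s


# Hypothetical numbers, chosen only for illustration.
OPS = 100_000        # e.g. stat calls needed to scan a large working tree
RTT_SECONDS = 0.150  # a cross-continent round trip
BATCH_SIZE = 1_000

unbatched = sequential_time(OPS, RTT_SECONDS)
batched = batched_time(OPS, RTT_SECONDS, BATCH_SIZE)

print(f"one round trip per op: {unbatched / 60:.1f} minutes")
print(f"batches of {BATCH_SIZE}:       {batched:.1f} seconds")
```

Under this simple model the round-trip count, not the amount of data, dominates the total time, which is why reducing or batching file accesses matters more as the server gets farther away.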

If you do it right, it will eventually work. Google does something similar with FUSE and their mono repo (CITC is magic).


If you're working with a Git repo on a remote filesystem, you're doing it wrong. Git is not designed for remote filesystems. It relies on certain file operations being extremely fast, so Git works best with local filesystems. With Git, you want to clone the entire repository locally and work with it locally. That's the idea behind distributed version control: every committer has a full copy of the repository. With a remote filesystem you're effectively centralizing your repository.

Yeah, the git metadata is on a remote file server; that's the entire point of the Microsoft file system extensions for running a huge git monorepo. The code and assets you are working on are local, but the history and other metadata elements are stored remotely and lazily loaded.
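The "stored remotely and lazily loaded" idea can be sketched roughly as follows. This is a generic illustration of lazy loading with a local cache, not GVFS's actual design or protocol; `LazyObjectStore`, `REMOTE`, and the object IDs are hypothetical names invented for the example.

```python
from typing import Callable, Dict


class LazyObjectStore:
    """Fetches an object from a remote source only on first access,
    then serves later requests from a local cache."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self._fetch_remote = fetch_remote
        self._cache: Dict[str, bytes] = {}
        self.remote_fetches = 0  # counts how often the slow path was taken

    def get(self, object_id: str) -> bytes:
        if object_id not in self._cache:
            # Slow path: this is where a network round trip would happen.
            self._cache[object_id] = self._fetch_remote(object_id)
            self.remote_fetches += 1
        return self._cache[object_id]


# A dictionary standing in for the remote server (hypothetical contents).
REMOTE = {"abc123": b"commit metadata", "def456": b"tree listing"}

store = LazyObjectStore(lambda object_id: REMOTE[object_id])
first = store.get("abc123")    # fetched from the "remote"
second = store.get("abc123")   # served from the local cache

print(store.remote_fetches)
```

Only the first access to each object pays the remote cost, so how noticeable the lag is depends on how far away the server is and how many objects are being touched for the first time.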

So if your remote file server is relatively close it doesn't matter too much and the lag is not noticeable, but if it's across the country or across the world...

If you look at that same repo, they're trying a similar approach, https://github.com/facebookexperimental/eden

There were some interesting answers around this in the original thread for GVFS: https://news.ycombinator.com/item?id=15725497

The Microsoft PM for GVFS commented there.

As a game dev in a small studio that runs on git, I'm so excited to see where GitVFS goes, especially since the announcement that they're working on macOS drivers. We're pushing git to its limit, and it would be really nice if we didn't have to switch to another system (git is mostly great, and we're all good at using it, even the artists). GitVFS seemingly solves all our problems in a really slick way.

Probably because they prefer to use sane SCM systems.

What is insane with the other ones?

Do you really need to ask regarding git, given the number of tutorials on how to recover your working state?


I’ve taught an intro to git class a number of times to scientists wanting to build some basic software development skills (through Software Carpentry: https://software-carpentry.org/).

It can be easy to forget that the cognitive overhead of version control is pretty extreme. Looking at git through beginners’ eyes is telling. And our curriculum basically gets people to a baseline proficiency with change, add, and commit plus pushing, pulling, and a little merging. In other words, scratching the surface.

Scientists then start asking all the reasonable questions about restoring work under various scenarios, and whether they should store large amounts of data with their code. It’s in those moments where I start asking myself hard questions about git, its usability, and its architecture. Subversion at least had binary diffs and reasonable support for big files.

Git may have won the day, but I cannot help but think that it’s a pretty major compromise (even a retrograde step) compared to what could have / should have been.

In my experience the most confusing thing for beginners is adding new files, committing, and pushing in separate steps. It also doesn't help that Google returns links to the "right way" to use Git - meaning advanced large enterprise usage.
