
Scaling Mercurial at Facebook - chairmanwow
https://code.facebook.com/posts/218678814984400/scaling-mercurial-at-facebook/
======
ktRolster
_Our code base has grown organically and its internal dependencies are very
complex._

So instead of cleaning up the internal dependencies, they decided to rewrite
Mercurial. That is the kind of thing Facebook likes to do: for example, when
PHP got too slow, they wrote a PHP compiler....

~~~
loeg
I think you may not grasp how many thousands of engineers Facebook employs at
this point. It is literally easier and less risky for them to assemble a 5-10
person team to fix Hg than to fix the code of thousands of other engineers.

~~~
gaius
While that is true, it is also true that it would have made sense for them to
pay for Perforce upfront. I would say that hindsight is 20/20 but I'll wager
there was an engineer there who did say this at the time...

~~~
cmrdporcupine
Perforce won't scale to that level either. As Google has shown. Google's
"Piper" is based on Perforce but isn't Perforce, because Perforce couldn't
hack it.

------
joshbaptiste
[https://news.ycombinator.com/item?id=7019673](https://news.ycombinator.com/item?id=7019673)
871 days ago - 241 comments

------
Smudge
I generally see two reactions to the "one codebase to rule them all" approach
(used by Facebook, Google, et al.):

1. Holy god why would you let your code grow to such a massive,
interdependent scale? Just release everything separately and versioned so that
breaking changes don't affect everyone all at once. The idea of git being a
bottleneck is absurd and you are using it wrong.

2. This is a very reasonable, practical approach to sharing code across a
company. It reduces siloing and ensures that major refactors can happen in one
pass without a ton of coordination. Better to fix the version control system
than waste endless resources refactoring millions of lines of code.

Both reactions are valid. Having worked in both styles of codebase, I recognize
that there are trade-offs in either case. The optimal solution depends on the
project and the team.

Sometimes the path of least resistance--that is to say, the path to getting
things shipped and, in turn, making money--is to let the codebase grow
organically and worry about cleaning up any messy interdependencies later,
once you have a better idea of what code you even need to keep around. In this
scenario, it's important to recognize that developer efficiency is going to be
an uphill battle in the long run, but if you are proactive about maintenance
and tooling improvements then this approach can still be relatively painless.

Other times, especially when you're working on a tried-and-tested product with
a clear API and a dedicated team, it can be productive to split it out and let
the team manage their own versioning and releases. This becomes especially
useful if the product is open source. (For instance, I wonder how Facebook
manages its open source releases relative to its shared Mercurial codebase.)
In this scenario, developer efficiency is usually less of a problem, as proper
use of versioning can ensure faster, more agile updates to each product. But
the downside is that your company as a whole can end up in a kind of
versioning hell, where every project depends on a different version of every
other project, and keeping everything up to date can require a huge amount of
coordination.

So, in the end, pick your poison. My reaction, years ago, was more along the
lines of #1, but I used to be much more of an idealist earlier in my career.

~~~
philwelch
Approach #1 lands you in dependency hell: you end up maintaining multiple
incompatible versions of each internal library or framework (and of each
external library in use), it multiplies the work involved in upgrading
dependencies, and in various other ways it creates its own problems. There's
no panacea, but I can see the appeal of having a single shared codebase.

~~~
Shorel
Still, much easier to manage with branches than before git and mercurial were
around.

~~~
philwelch
It's not a question of managing source branches, it's a question of whether
you have to update your OpenSSL dependency once for the whole company or a
thousand times over for each individual software package.

------
chriscool
The same kind of work is being done on Git these days:

[http://thread.gmane.org/gmane.comp.version-control.git/295106/](http://thread.gmane.org/gmane.comp.version-control.git/295106/)

~~~
elgenie
I believe this is Twitter once again attempting to integrate their use of
watchman (open sourced by FB) into the git core.

Their first attempt was in May 2014:
[http://www.spinics.net/lists/git/msg230487.html](http://www.spinics.net/lists/git/msg230487.html)
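
For context, the mechanism being wired in is roughly: instead of stat()ing
every file in the working tree, ask the watchman daemon which files changed
since the last query. A rough sketch with the watchman CLI (the repo path and
cursor name here are placeholders):

    # Tell watchman to watch the repository root (the daemon starts on demand).
    watchman watch-project /path/to/repo

    # Ask for files changed since the previous query made with this named
    # cursor; watchman answers from its in-memory state instead of crawling
    # the whole tree.
    watchman since /path/to/repo n:example-cursor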

~~~
chriscool
Yeah, David Turner has been working for Twitter. But his work is based on
previous work on an "index-helper" daemon by Nguyễn Thái Ngọc Duy, who is not
working for any company as far as I know.

~~~
elgenie
He's also involved in the thread from May 2014.

------
totally
> January 7, 2014

This is years old. Am I missing why this is newly relevant?

------
z3t4
This seems to be an old article (January 7, 2014). Does anyone know if
Facebook still uses Mercurial?

About securing different parts of the repo, the Mercurial server actually
doesn't have any user authentication! You are meant to handle that yourself
with SSH or a web server, where you should be able to set up more restricted
access to some special folders.
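
For example, a minimal hgrc sketch for a repo served over SSH, using the
bundled acl extension to limit who may push changes touching a given folder
(paths and usernames are made up; read access still has to be handled at the
SSH or web-server layer):

    [extensions]
    acl =

    [acl]
    # Only enforce the rules for changesets arriving via push ("serve").
    sources = serve

    [acl.allow]
    # When an allow section exists, every pushed file must match a pattern
    # that lists the pushing user; files matched by no pattern are rejected.
    src/** = *
    docs/** = *
    # Only these two (hypothetical) users may touch the sensitive subtree.
    secret/** = alice, bob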

About using a single repo, it does make sense to keep all code that interacts
with each other in the same place. Imagine changing a variable name in some
API and at the same time updating every usage of that name in the whole
codebase. And imagine the bureaucracy and people management needed just to
make a variable change if there were separate repos which you might not even
have access to.
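
As a toy example of that workflow (identifier, file pattern, and commit
message are hypothetical), the API rename and every call-site update land in
one atomic commit:

    # Rewrite the identifier across the whole repository (GNU sed)...
    hg locate -0 'glob:**.py' | xargs -0 sed -i 's/\bgetUserId\b/get_user_id/g'

    # ...and commit the API change together with all of its callers.
    hg commit -m "Rename getUserId to get_user_id everywhere"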

------
dreamcompiler
I kind of like to hate on Facebook because they waste my time and (try to)
track me everywhere. But I'm also a big Hg fan, and Watchman looks awesome. So
it's hard to completely hate them. Sigh. They're kinda like Google.

------
jbyers
(2014)

------
cheez
It's funny how they did not use Perforce, which works really well for large
repositories.

~~~
curtis
They did mention Perforce in passing:

> For a repository as large as ours, a major bottleneck is simply finding out
> what files have changed. Git examines every file and naturally becomes
> slower and slower as the number of files increases, while Perforce "cheats"
> by forcing users to tell it which files they are going to edit. The Git
> approach doesn't scale, and the Perforce approach isn't friendly.

I don't know if Perforce's "unfriendliness" is enough reason not to choose it,
nor do I know if that was the only reason they rejected it. However, speaking
for myself, I vastly prefer Subversion, Git, and Mercurial over Perforce
precisely because Perforce requires you to ask for permission before you can
edit a file.
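
For comparison, the day-to-day difference looks roughly like this (file name
and messages are made up):

    # Perforce: files are read-only until opened for edit, so the server
    # always knows exactly which files you intend to change.
    p4 edit src/main.c
    vi src/main.c
    p4 submit -d "Fix null check"

    # Git/Mercurial: just edit; the tool finds changes by scanning the
    # working copy, which is the part that gets slow at Facebook's scale.
    vi src/main.c
    git commit -am "Fix null check"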

~~~
dhd415
For any of the major IDEs with Perforce plugins, this happens automatically
when you start editing a file. It's essentially a non-issue for normal use
cases. Things work a little differently when you're offline, but for most
people, offline work is uncommon.

