Hg is a bit of a nightmare in the WFH situation. Really slow, and it hangs for a long time if you haven't synced in a few days. Yes, I'm sure there are ways to tweak it, but I'm not sure you can tweak it enough!
For the benefit of everyone else on this thread, note that Facebook uses https://github.com/facebookexperimental/eden. From the README:
> Despite having originally evolved from Mercurial, EdenSCM is not a distributed source control system.
i.e., this is not "stock" Mercurial. For example, the Eden server (Mononoke) is written in Rust and the complementary virtual filesystem is written in C++. It has very different performance characteristics from stock Mercurial.
EdenSCM works with both EdenFS (the custom filesystem) and a traditional filesystem. If you use EdenFS, pulls will be much cheaper because you only fetch what you use. If you use a traditional filesystem, EdenSCM supports the same "sparse checkouts" feature as stock Mercurial (https://firefox-source-docs.mozilla.org/build/buildsystem/sp...), which can also be used to reduce the size of the slice of the monorepo you pull down.
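For a rough idea of what a sparse checkout looks like with stock Mercurial's (experimental) sparse extension, here is a sketch; exact command names and flags vary by Mercurial version, and the include path and profile path below are made up:

```
# Enable the experimental sparse extension in stock Mercurial.
cat >> ~/.hgrc <<'EOF'
[extensions]
sparse =
EOF

# Restrict the working copy to an ad-hoc set of include rules...
hg debugsparse --include services/frontend

# ...or to a sparse profile checked into the repo (hypothetical path).
hg debugsparse --enable-profile tools/sparse-profiles/frontend
```

Either way, only the slice you list gets materialized in the working copy, which is what keeps big checkouts workable.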
Last I checked, Perforce (and Google's "implementation" of Perforce, Piper) did not provide nearly the same level of support for stacked diffs as Eden. As both Google and Facebook have cultures of pre-commit code review, working with stacked diffs makes it much easier to make progress while waiting for approvals on earlier diffs.
I believe there are relative advantages/disadvantages of Eden vs. Piper+CitC and that both projects aspire to have the best of each in the limit.
It also regularly overwhelms HTTP header size norms and runs up memory when attempting to use those bundles.
Chunks/bundles are, in fact, strictly worse unless you get your initial clones with a good piece of software like curl.
(I also worked at Google but not the google3 monorepo - instead I got emerge, which was misery.)
I do Android work occasionally, from a laptop, given WFH.
At my extremely small startup, in the past ~6 months I've created about 1,000 files (at HEAD, not counting deletions). So, at the absolute worst case, each engineer might produce ~150 files/month. Facebook likely has ~1,000 engineers. That's a worst case of ~1,800,000 new files/year on top of an existing massive code base.
Good luck to the engineers who are forever battling the scaling issues underlying these orgs! It's extremely impressive to even hear about overcoming these huge hurdles.
I used to do vendor support for a company that does version control for chip design. Basically Perforce + some metadata and applications that automate away a bunch of the administrative grunt work.
Some of our larger customers had quite a few layers of abstraction to try to speed it up (to varying degrees of success). But I don't know how Google's scale would compare to a company that routinely had check-ins they claimed to be hundreds of GB, or another that checked in its giant PDKs to distribute them to each of its sites globally.
Discussed at the time: https://news.ycombinator.com/item?id=7019673
GitHub won, so we're all forced to use git now, but Mercurial really has a simpler interface and a more straightforward mental model.
It's a much nicer experience IMO to be able to work on various different pieces, then iteratively run `git add -p` or similar to stage individual files or hunks, verify everything I'm adding with `git diff --cached`, then `git commit` when I'm actually ready and happy. It lets me fix multiple parts of my project at the same time, without interruption, while still letting me compose my batch of changes into multiple, self-contained changesets.
`hg record` feels really limited by comparison: it rushes you immediately into committing your selected changes, and then makes it difficult to selectively amend further (partial) changes to that commit afterwards.
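For anyone who hasn't used that workflow, here's a minimal sketch of the git side (the file name and commit message are made up):

```
# Stage only the hunks that belong to one logical change.
git add -p src/parser.c        # pick hunks interactively
git diff --cached              # review exactly what will be committed
git commit -m "parser: fix off-by-one in token lookahead"

# Keep working, then later fold more partial changes into that same commit.
git add -p src/parser.c
git commit --amend --no-edit
```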
But hg makes it easy to walk up and down a stack, putting fixes in the right place.
At the risk of overreach, this feature would lead to better software. If your tools have better support for preparing a stack of commits, then code review will be easier, and software will be higher quality.
I wrote git-prev-next to scratch my itch but IMO git should embrace it as a core feature. http://github.com/ridiculousfish/git-prev-next
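For contrast, here is roughly the dance in plain git (not git-prev-next) when a fix needs to land in the middle of a stack; the placeholder SHA is whichever commit in the stack you are targeting:

```
# Three commits up the stack, you find a bug in the bottom commit.
git add -p
git commit --fixup <sha-of-bottom-commit>

# Rewrite the stack so the fixup gets squashed into the right place.
git rebase -i --autosquash <sha-of-bottom-commit>^
```

It works, but interactive rebases are exactly the kind of friction that `hg absorb` and stack-aware tooling remove.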
Can you expand on this? I thought partial checkouts had been accepted, and that the user-facing part was even released in 1.7?
It shouldn’t be a surprise that FB decided to go elsewhere.
I've heard all these stories about build servers or source control servers where, for the longest time, the solution to scalability problems was to throw larger machines at them. At some point the machines each have well into the TBs of RAM, and it becomes a major priority to make the system distributed.
That's why neither Google nor Facebook use Perforce for most of their code. The repos are just too large.
The Perforce servers were… large, and backed by big, extremely expensive SANs that used RAM as the backing store. Yet they were slow as molasses at Google's scale.
I enjoy the patches that make Mercurial usable without the crazy extensions that few of us will ever use.
I feel like I saw a good video in 2017 or so but cannot find much these days.
Edit: Why am I getting downvoted for asking a question?
We use multiple repos and we have a pretty large codebase consuming a TB a day.
1. Completely decoupled code, i.e. teams ship libraries, like a third-party API. This slows down development considerably and makes code re-use harder.
2. Keep everyone on one repo, and allow changes to stream in.
The downsides of option 2: 1. it slows down dev (it suddenly becomes more bureaucratic); 2. it requires scaling your version control tool (git/mercurial, etc.) and the processes associated with it.
Also, single repos make sense within one domain (e.g. server side, iOS, Android, etc.). You can have different repos for different 'domains' whose code doesn't intersect much.
I've found this to be a feature of polyrepos because monorepos can easily become a rat's nest of dependencies. Polyrepos make you think harder about what should really be exposed and shared.
Allowing fine-grained visibility at the level of a file, package, or artifact is better than visibility at the granularity of a whole repo.
(And at least at Google visibility changes required the approval of the team who you want to depend on)
Having seen "a bunch of repos" in action, I think people don't talk enough about just how awful the experience can be, and how much more work it is to manage multiple repos. As you increase the number of repos, the pain gets worse at a rate which is faster than linear. There are plenty of articles talking about how wonderful a monorepo is, just not many articles about how bad multirepo is.
NPM hell is a close approximation of the multirepo experience. Try upgrading the dependencies in a large NPM project and you'll see all sorts of problems. You might find that upgrading X breaks Y, but you have to upgrade X in order to upgrade Z, and you need to upgrade Z for some reason.
With a monorepo, all the versions march forward in sync. If you fix trunk, you will probably ship the fix eventually. With multirepo, you have to fight the tooling just to show (from the example above) that a patch to Y will let your project use a newer version of Z.
Not to mention that Google internally avoids multiple repositories (and instead runs off a single monorepo) - it's just the android/chrome/public stuff that's mostly split up.
This thinking right here sounds like a reason why Google retires services at a much higher frequency than other companies do.
In https://thehftguy.com/2019/12/10/why-products-are-shutdown-t..., the HFT guy brought up the "5-year upgrade pain" as the reason a service gets retired: the point at which the pain of upkeep outweighs the gain from revenue.
By regularly making backward-incompatible API changes, as the monorepo affords, while also having engineers move freely between projects, Google shortens the upgrade-pain cycle for its services considerably.
By the way, this line of thought also leaks into public-facing source code, with Guava being the poster child for breaking backward compatibility, although it learned its lesson starting with version 21.
Of course Google, Facebook, etc. have given back a lot to OSS, just not contemporaneously with when they received the value. It may be years before the internal rebuild of something some guy copied from somewhere is re-released.
Git was designed for OSS. That includes its radically transparent form of development. Then again with submodules you can easily vendor your private stuff, rather than doing things the other way around.
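A minimal sketch of that kind of submodule vendoring (the repo URL and path here are invented):

```
# Pin a private dependency inside the public repo as a submodule.
git submodule add git@github.com:example/private-lib.git vendor/private-lib
git commit -m "Vendor private-lib as a submodule"

# Collaborators then fetch the pinned revision with:
git submodule update --init --recursive
```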
They do vendor, but that's for security and efficiency reasons mostly, not to avoid giving back.