
Having a local "bare" copy (to use Git's term; `hg clone -U`) of the Mozilla repo, I'd like to create a checkout of it (`hg clone`), and build Firefox from it.

This results in 90 minutes of waiting with hg pegging one CPU core (CPU-bound, not disk-bound!), then a few minutes of hg actually writing files, then the actual build taking 60 minutes.

In the git community, it was always taken as a sign that they were doing something right that `git clone` was faster than `cp -r`. I'm having a hard time not taking it as a sign that Mercurial is doing something wrong that `hg clone` is so much slower.

edit: The saying was that `git checkout` is faster than `cp -r`, not that `git clone` is. The sentiment stands.

Mercurial tries to use hardlinks to share as much actual repository data as possible when doing a local clone. Maybe that's the part that's interfering here. You can try using `hg clone --pull` instead of just `hg clone`, which will not do hardlinks. I don't have much experience with terribly large repositories in Mercurial, though. The ones we have at work have maybe a few thousand files and fewer than 20k commits, and they're not particularly taxing.

`git clone` will also use hardlinks by default.

       --local, -l
           When the repository to clone from is on a local machine, this flag
           bypasses the normal "Git aware" transport mechanism and clones the
           repository by making a copy of HEAD and everything under objects and refs
           directories. The files under .git/objects/ directory are hardlinked to
           save space when possible.

           If the repository is specified as a local path (e.g., /path/to/repo), this
           is the default, and --local is essentially a no-op. If the repository is
           specified as a URL, then this flag is ignored (and we never use the local
           optimizations). Specifying --no-local will override the default when
           /path/to/repo is given, using the regular Git transport instead.
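For anyone unsure what "hardlinked" buys you here, a minimal Python sketch of the idea follows. It is not what git or hg actually run; the `hardlink_copy` helper and the `objects/blob` file name are made up for illustration. It mirrors a directory tree by creating hard links, so the "copy" shares inodes with the original and no file data is rewritten.

```python
import os
import tempfile


def hardlink_copy(src_dir, dst_dir):
    """Mirror src_dir into dst_dir with hard links instead of copying bytes.

    Hypothetical helper for illustration only; both directories must live
    on the same filesystem, since hard links cannot cross filesystems.
    """
    linked = 0
    for root, _dirs, files in os.walk(src_dir):
        rel = os.path.relpath(root, src_dir)
        target_root = os.path.normpath(os.path.join(dst_dir, rel))
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            # os.link adds a second directory entry for the same inode.
            os.link(os.path.join(root, name), os.path.join(target_root, name))
            linked += 1
    return linked


def demo():
    """Build a tiny fake object store, link-copy it, and inspect the inodes."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        os.makedirs(os.path.join(src, "objects"))
        original = os.path.join(src, "objects", "blob")  # made-up file name
        with open(original, "wb") as f:
            f.write(b"x" * 4096)

        dst = os.path.join(tmp, "dst")
        count = hardlink_copy(src, dst)

        a = os.stat(original)
        b = os.stat(os.path.join(dst, "objects", "blob"))
        return count, a.st_ino == b.st_ino, b.st_nlink
```

Calling `demo()` returns the number of linked files, whether both paths share an inode, and the link count of the file. The point is that a link-based clone of the store costs roughly one directory entry per file, regardless of file size.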

How long ago was this? And what OS? I know we're faster than unzip on some platforms, but IIRC we just found a pretty awful performance issue on... Linux? I think? ...that meant our parallelized approach to writing files was burning a ton of time in the kernel filesystem locks or something.

Just a few days ago. Parabola GNU/Linux-libre (like Arch Linux). On btrfs, if that makes a difference.

From what I could tell, whatever the slow part is, it wasn't parallelized at all; of 16 cores, 15 were idle, and 1 was pegged at 100% (and very low disk-wait).
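One rough way to put numbers on "CPU-bound, not disk-bound" is to compare a command's CPU time with its wall-clock time, and to split the CPU time into user and system parts. The sketch below assumes a Unix-like system (child CPU times from `os.times()` are not populated on Windows); the `profile_command` and `classify` names and the 0.8 threshold are arbitrary choices for illustration, not anything hg provides.

```python
import os
import subprocess
import sys
import time


def profile_command(argv):
    """Run argv and return (wall, user_cpu, sys_cpu) in seconds for the child.

    Relies on os.times() accumulating the CPU time of waited-for children,
    which holds on Unix-like systems.
    """
    before = os.times()
    start = time.perf_counter()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    wall = time.perf_counter() - start
    after = os.times()
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return wall, user, system


def classify(wall, user, system, threshold=0.8):
    """Label a run by the share of wall time spent on one core's worth of CPU.

    The threshold is an arbitrary cut-off chosen for this sketch.
    """
    if wall <= 0:
        return "unknown"
    return "cpu-bound" if (user + system) / wall >= threshold else "mostly waiting"


if __name__ == "__main__":
    # Usage: python profile.py hg clone /path/to/bare /path/to/checkout
    wall, user, system = profile_command(sys.argv[1:])
    print(f"wall={wall:.1f}s user={user:.1f}s sys={system:.1f}s "
          f"-> {classify(wall, user, system)}")
```

A high user share points at work inside the process itself (for example decompression or delta application), while a high system share points at time spent in the kernel on the process's behalf.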

I've been meaning to dig into this more. I'd tried adding --stream, figuring that maybe it was re-compressing the entire repo; but that just led to a warning about "server doesn't support streams" or something like that. I'll definitely try ygra's suggestion to try --pull, and if that doesn't yield good results, I'll actually dig into the sources and see what's going on.

Sounds a lot like https://twitter.com/indygreg/status/1028097388261433344 - I'm pretty sure Greg was working with hg when he noticed this issue.

In any event, if you're enthusiastic about this kind of thing, we'd love to have more eyes on making hg checkouts consistently fast, even in the face of filesystems undermining those efforts. ;)

If it were just btrfs being silly, I'd expect the high CPU usage to be attributed to a kernel thread, not to the hg process. But we'll see :)

I have pulled and built Firefox from scratch over Australian internet in less than half that time. In general I have found all Mozilla repos to be an order of magnitude faster for this type of operation than the equivalent Google/Chrome repos (not using build caches).

The actual network clone is faster than that. The slow clone that I'm speaking of is cloning from one place on my hard drive to another place on the same hard drive.

How is git clone faster than `cp -r`?

My first guess would be that git is faster because it doesn't need a huge number of stat() and similar calls to gather metadata about the files it's creating; instead, git has that data in a single data structure it can query efficiently. Of course the filesystem has this too, but git has it in a single location and probably more compactly, so it can get by with fewer reads. The file contents are also stored compressed, so reading them is faster too. All this should mean that git can get to writing the files sooner than cp can.
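To make that guess concrete, here is a toy Python sketch of the "one compact, compressed structure" idea. It is emphatically not git's packfile or index format: the `pack`/`unpack` helpers and the layout (a 4-byte header length, a JSON index, then the concatenated file bodies, all zlib-compressed) are invented for illustration. The writer side does one read and one decompression, then only writes files.

```python
import json
import os
import tempfile
import zlib


def pack(files):
    """Pack {relative_path: bytes} into one compressed blob with an index.

    Made-up layout: 4-byte big-endian header length, JSON index mapping
    path -> [offset, length], then the concatenated file bodies.
    """
    index, bodies, offset = {}, [], 0
    for path, data in files.items():
        index[path] = (offset, len(data))
        bodies.append(data)
        offset += len(data)
    header = json.dumps(index).encode()
    payload = len(header).to_bytes(4, "big") + header + b"".join(bodies)
    return zlib.compress(payload)


def unpack(blob, dest):
    """Decompress the blob once and write every file under dest."""
    payload = zlib.decompress(blob)
    header_len = int.from_bytes(payload[:4], "big")
    index = json.loads(payload[4:4 + header_len])
    base = 4 + header_len
    for path, (offset, length) in index.items():
        target = os.path.join(dest, path)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as f:
            f.write(payload[base + offset:base + offset + length])
    return len(index)
```

Compare this with `cp -r`, which has to walk the source tree and open, stat, and read each source file individually before it can write anything.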

It may be in relation to when you duplicate a repository.

When you `cp` an active git repository you receive an identical file state; when you `git clone` you receive only what is necessary. Old objects, reflogs, and the like aren't in the clone. A clone of an existing local repository may also make hard links, if your file system supports it.
