
Speed Up Git Pull - nahname
http://interrobeng.com/2013/08/25/speed-up-git-5x-to-50x/
======
lsb
A 50x speedup is pretty cool in its own right. Kudos.

However, I wonder if this isn't treating a symptom versus a root cause.

Is saving that 5s round-trip so common in your workflow that you needed to
optimize it, and would it be more productive to refactor the app so you and
collaborators are working on different files?

Also, this has the entirely valuable guidance that pushing to a local server
is much faster than a remote server. There's a Github Enterprise product, for
you to run close to you. It'd be an interesting calculation to see the
performance hit from waiting 5s to push to a remote server, versus the
performance hit from keeping your nearby server up and patched.

But nice to read, kudos all the same!

~~~
robin_reala
[http://xkcd.com/1205/](http://xkcd.com/1205/)

If you pull 10 times a day (conservative for me) then you’ll save ~5 hours
over a year.
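
The arithmetic behind that estimate, as a quick sketch. It assumes roughly 5 seconds saved per pull (the round-trip figure mentioned upthread) and a pull habit that runs every day of the year:

```shell
# Assumed figures: 10 pulls/day, ~5 s saved per pull, 365 days/year.
pulls_per_day=10
seconds_saved_per_pull=5
days_per_year=365

total_seconds=$(( pulls_per_day * seconds_saved_per_pull * days_per_year ))
# Integer division, so the result is rounded down to whole hours.
echo "$(( total_seconds / 3600 )) hours"
```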

~~~
gcr
But couldn't it very well take more than 5 hours to set this up?

~~~
nilved
It took me less than five seconds to make the SSH changes and less than five
minutes to set up the intermediate server. Now I also have a cache for when
GitHub goes down (all the time.)

~~~
gcr
> Now I also have a cache for when GitHub goes down (all the time.)

Isn't the `.git` a cache for when github goes down? Git keeps all of your
history inside your repository; you don't need network access to do anything.

~~~
nilved
Sort of. I've automated synchronizing it with the GitHub repo, so my local
repo may be behind the cached repo. I'm also protected if a repository is
deleted/moved/DMCAed.

~~~
marblar
What benefits does this provide that a cron job running `git fetch` doesn't?

~~~
nilved
Redundancy, public access and not having hundreds of unused repositories on my
computer. I'm not trying to sell anyone on the idea, it just works for me and
I like it so whatever.

------
shizcakes
If you're a heavy SSH user, using multiplexing in this manner can have
negative consequences [1]. Downsides include having all your multiplexed
connections exiting if the master exits!

[1] [http://www.anchor.com.au/blog/2010/02/ssh-controlmaster-the-...](http://www.anchor.com.au/blog/2010/02/ssh-controlmaster-the-good-the-bad-the-ugly/)

~~~
andrewaylett
More recent SSH clients can use "ControlPersist" to establish the master
connection in the background, so the first session doesn't control the
lifetime of the connection. This makes using ControlMaster workable.

I usually set ControlPersist to 30 seconds, which may not be long enough for
people hoping to get performance improvements from GitHub. Setting it to too
large a value increases the risk that you'll have stale server sockets after a
network outage.
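
For reference, a minimal sketch of the settings being discussed. The `Host` pattern, the socket path, and the 30-second timeout are assumptions chosen for illustration, and `ControlPersist` needs a client that supports it. The hypothetical helper only appends the fragment to whatever file it is given, so `~/.ssh/config` is not touched unless you pass it explicitly.

```shell
# Append an illustrative connection-sharing block to the given ssh config file.
# Host pattern, ControlPath and timeout are assumptions, not recommendations.
write_mux_config() {
    cat >> "$1" <<'EOF'
Host github.com
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h:%p
    ControlPersist 30
EOF
}

# Example: write to a scratch file and inspect it before merging by hand.
scratch_config=$(mktemp)
write_mux_config "$scratch_config"
cat "$scratch_config"
```

A shorter timeout keeps stale sockets rare; a longer one keeps the master alive between less frequent pulls.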

~~~
craigyk
> This makes using ControlMaster workable.

Best part of reading this article. I had turned off connection sharing because
of this.

So what, in more detail, are the downsides to ControlPersist?

~~~
georgebashi
One that I frequently run into is that if you use SSH tunneling (like -L), you
have to specify it the first time you ssh to that machine (i.e. when the
ControlMaster is connected) and can't change it later. Using -L on later ssh
invocations to the same machine silently fails, which can be infuriating if
you don't realise it's happening. The best you can do at that point is to kill
the ControlMaster ssh (disconnecting you across all your sessions), and then
reconnect with the right -L.

~~~
croikle
You can skip the master and spawn a fresh connection for your tunnel using `-o
ControlPath=none`.

~~~
croikle
In fact, even better: you can add forwarding to your existing connection.
<newline>~C opens a command line, which accepts the following commands:

    
    
        ssh> help
        Commands:
          -L[bind_address:]port:host:hostport    Request local forward
          -R[bind_address:]port:host:hostport    Request remote forward
          -D[bind_address:]port                  Request dynamic forward
          -KR[bind_address:]port                 Cancel remote forward
    

(If you're not familiar with them, some of the other escape sequences are
useful too. ~? lists them all.)

[EDIT] Apparently, if you have a recent enough version, you can add a forward
to the master with `ssh -O forward ...` [1]

[1] [http://serverfault.com/questions/237688/adding-port-forwardi...](http://serverfault.com/questions/237688/adding-port-forwardings-programmatically-on-a-controlmaster-ssh-session)
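
A sketch of the `-O` control commands, assuming an OpenSSH client recent enough to support them and a master connection that is already running. The host alias `devbox` and the ports are hypothetical. The `run` helper is also hypothetical: it prints each command while `DRY_RUN=1`, so the sequence can be read without a live connection.

```shell
# Print the command instead of running it while DRY_RUN=1 (the default here).
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

host=devbox   # hypothetical host alias with ControlMaster enabled

run ssh -O check "$host"                          # is a master running?
run ssh -O forward -L 8080:localhost:80 "$host"   # add a local forward
run ssh -O cancel -L 8080:localhost:80 "$host"    # remove that forward again
run ssh -O exit "$host"                           # shut the master down
```

Set `DRY_RUN=0` to actually send the commands to the master.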

------
CoffeeDregs
Meta question: assuming that lots of GH users do this (nice trick), would GH
have loads of dormant SSH connections? At scale, this could be a huge number.
Would this be an issue?

~~~
ceejayoz
No. I have this enabled and Github closes my connections after a very short
period. I use it primarily for SSHing into my cluster of EC2 instances (which
does massively speed things up).

~~~
susi22
Same here. I do a

    
    
        (ssh -fqN -o "StrictHostKeyChecking no" git@bitbucket.org >&/dev/null &)
    

with bitbucket and github in my zshrc. The bitbucket connection stays open,
but the github one gets closed at some point. It used to stay open, however.

~~~
eru
They could offer it as a premium feature.

------
nkuttler
Hrm, so this isn't about making git 50x faster but about faster network
communication.

> Establishing an SSH connection every time you perform a Git operation costs
> many round-trips

I don't really understand why the author is saying this. The whole point of
git is to be distributed and not to push/pull at each commit.

That being said, he found something that speeds up his workflow tremendously,
so congratulations.

~~~
dlitz
Even with a short timeout (say, 10 seconds), it could be useful for a
maintainer who pulls from several other users' GitHub repositories. Instead of
establishing a new SSH connection for each remote, a complete "git remote
update" of multiple repositories could be done over a single connection.

------
dminor
Note that if you are on CentOS 6, the OpenSSH version isn't new enough to
support this feature.

~~~
er0k
this is not true

~~~
dminor
Yes it is. ControlPersist was introduced in OpenSSH 5.6.

CentOS 6 ships with a patched version of 5.3.
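
One way to check this locally is to compare the client version against 5.6. The sketch below assumes a version string in the `OpenSSH_X.Y...` form that `ssh -V` prints; the function name is hypothetical, and since distributions patch their packages, the number is only a first hint.

```shell
# Succeeds if the given "OpenSSH_X.Y..." string is version 5.6 or newer.
supports_control_persist() {
    version=${1#OpenSSH_}      # strip the prefix       -> "5.3p1, ..."
    major=${version%%.*}       # text before first dot  -> "5"
    rest=${version#*.}         # text after first dot   -> "3p1, ..."
    minor=${rest%%[!0-9]*}     # leading digits only    -> "3"
    [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 6 ]; }
}

# ssh -V writes its version to stderr, hence the 2>&1.
if command -v ssh >/dev/null 2>&1; then
    if supports_control_persist "$(ssh -V 2>&1)"; then
        echo "ControlPersist should be available"
    else
        echo "client looks older than 5.6"
    fi
fi
```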

~~~
jlgreco
Both ssh and sshd I guess? Do both ends need to support the feature?

------
scotty79
...and keep your development copy of the project on a ramdisk. Have a script
that you launch as you start work that creates the ramdisk and copies files
from a persistent location onto it using rsync, then periodically runs rsync
to copy the changes you make on the ramdisk back to the persistent location.
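
A rough sketch of such a script, assuming Linux with a tmpfs mounted at `/dev/shm` and `rsync` installed. The paths, the function names, and the 60-second interval are hypothetical, and anything written between syncs is lost if the machine goes down.

```shell
# Mirror src into dst, deleting files in dst that no longer exist in src.
sync_tree() {
    mkdir -p "$2"
    rsync -a --delete "$1"/ "$2"/
}

# Hypothetical locations: persistent checkout and its RAM-backed working copy.
persistent=${PERSISTENT:-$HOME/projects/myapp}
ramdisk=${RAMDISK:-/dev/shm/myapp}

start_work() {
    # Load the project into RAM; stop here if that fails, so a partial
    # copy is never synced back over the persistent tree.
    sync_tree "$persistent" "$ramdisk" || return 1
    while sleep 60; do
        sync_tree "$ramdisk" "$persistent"   # copy changes back periodically
    done
}
```

Run `start_work &` at the start of a session, work inside `$ramdisk`, and run `sync_tree "$ramdisk" "$persistent"` once more before shutting down.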

I used this setup for almost a year. It saved me a lot of time and sanity.

~~~
gte910h
Does this buy you that much over an SSD?

~~~
txutxu
swap on ssd is (pseudo and slower) increased ram on big needs.

Example: $5 digital ocean droplet (comes without swap, but it's over SSD, so
you can create a swap file, and is less painful than mechanical swap).

~~~
gte910h
Apologies, I think you know the answer, but I couldn't understand what it was
from this post. Could you reword that?

------
dkl
Too bad it doesn't work on Cygwin. (I share my ssh config between Linux and
Windows.) Too bad ssh doesn't have conditional configuration. (Yes, I know I
could script this, but it's a little more pain than I want for this gain.)

~~~
AceJohnny2
How about VirtualBox in seamless mode instead of Cygwin?

------
voltagex_
Are there any "shorter" options for ControlPath that are still unique? I've
had a few instances of silly hostnames that have caused an error about the
name being too long for the socket.

------
verbatim
Doesn't this only help if github lets you leave ssh connections open and not
doing anything for long periods of time?

Surely if they do, they won't for too long if lots of people start doing this.

------
nilved
What can be done to prevent the "Connection to github.com closed by remote
host" error that comes a few minutes after a push/pull with the Control
settings enabled? Since I'm normally in vim by then, it ruins the layout.

~~~
frankil
Not sure how you can stop that error, but you can press "Ctrl-L" or run
":redraw" in vim to fix your layout.

------
hesselink
I just tried the first part of this (the ssh multiplexing) and instead of
getting faster, 'git fetch' got slower (1.9 to 2.4 seconds). Any ideas why,
and how I can debug/improve it?

------
alexchamberlain
Dear GitHub,

Please set up local ssh termination to your network.

Thanks,

Alex

------
warmwaffles
git fetch

git merge origin/master

A much preferred workflow.

