

Git performance results on a large repository - justinmk
http://comments.gmane.org/gmane.comp.version-control.git/189776

======
beagle3
If anyone changes anything in the git format to solve this, there are two
changes that need to be piggybacked on the next format change:

1) Add a "generation number" - Linus mentioned this is a deficiency that he
would like to fix, but that it is not worth introducing a backward-incompatible
change for. (Instead, generation numbers are currently computed by scanning
through history; luckily, they are only needed in some merge scenarios, so it's
not too bad.)

2) Add support for "chunked files" - a file that is represented as a text file
of chunk ids (like a directory, sans the file names); to get the file
contents, you recursively unpack each chunk and concatenate the results.
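
To make (1) concrete, here is a rough sketch of what a generation number buys you. It assumes the usual definition (a root commit gets 1, every other commit gets one more than the largest generation among its parents); the `parents` dict and both function names are made up for illustration and are not git's internals.

```python
from typing import Dict, List


def generation_numbers(parents: Dict[str, List[str]]) -> Dict[str, int]:
    """Compute a generation number for every commit in a DAG.

    `parents` maps a (hypothetical) commit id to the ids of its parents.
    Uses an explicit stack so deep histories do not hit the recursion limit.
    """
    gen: Dict[str, int] = {}
    for start in parents:
        stack = [start]
        while stack:
            commit = stack[-1]
            if commit in gen:
                stack.pop()
                continue
            missing = [p for p in parents[commit] if p not in gen]
            if missing:
                # Parents must be numbered before the commit itself.
                stack.extend(missing)
            else:
                gen[commit] = 1 + max((gen[p] for p in parents[commit]), default=0)
                stack.pop()
    return gen


def is_ancestor(a: str, b: str, parents: Dict[str, List[str]], gen: Dict[str, int]) -> bool:
    """Return True if commit `a` is reachable from commit `b` via parent links."""
    if a == b:
        return True
    if gen[a] >= gen[b]:
        # An ancestor always has a strictly smaller generation number.
        return False
    stack, seen = [b], set()
    while stack:
        commit = stack.pop()
        if commit == a:
            return True
        if commit in seen or gen[commit] <= gen[a]:
            # Nothing at or below a's generation (other than a) can lead to a.
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return False


# Tiny made-up history: a is the root, d merges b and c, e sits on top of c.
history = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"], "e": ["c"]}
gens = generation_numbers(history)
```

If the number were stored in each commit, the first function would be unnecessary and the pruning in the second would come for free; without it, the numbers have to be recomputed by walking history.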

bup does (2) "artificially" without notifying git. As a result, it can quickly
and efficiently handle small changes to a 100GB file. It's a shame that git
can't do that - the rebuilding of such files needs to be in the core, but their
encoding can be left to a bup-style extension (and different break algorithms
may match different usage scenarios).
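
A toy version of the chunked-file idea, to show why a small edit to a huge file stays cheap. This is only a sketch: the rolling sum below is a deliberately simple stand-in for a real break algorithm (it is not bup's), the object encoding is invented rather than git's, and every name (`split`, `put`, `add_file`, `read`, `WINDOW`, `MASK`) is hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

WINDOW = 32   # number of trailing bytes covered by the rolling sum
MASK = 0x3F   # cut a chunk when the low 6 bits of the sum are all set


def split(data: bytes) -> List[bytes]:
    """Split data at content-defined boundaries.

    Whether a position is a boundary depends only on the last WINDOW bytes,
    so boundaries re-synchronise shortly after an insertion or deletion.
    """
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]   # slide the window forward
        if (rolling & MASK) == MASK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])       # trailing partial chunk
    return chunks


Store = Dict[str, Tuple[str, object]]


def put(store: Store, kind: str, payload) -> str:
    """Store an object under the SHA-1 of its (made-up) encoding."""
    if kind == "blob":
        raw = b"blob\0" + payload
    else:
        # A chunked file is just a text list of child ids, one per line.
        raw = b"chunked\0" + "\n".join(payload).encode()
    oid = hashlib.sha1(raw).hexdigest()
    store[oid] = (kind, payload)
    return oid


def add_file(store: Store, data: bytes) -> str:
    """Store data as a chunked file and return the id of its chunk list."""
    return put(store, "chunked", [put(store, "blob", c) for c in split(data)])


def read(store: Store, oid: str) -> bytes:
    """Rebuild contents by recursively unpacking children and concatenating."""
    kind, payload = store[oid]
    if kind == "blob":
        return payload
    return b"".join(read(store, child) for child in payload)
```

Adding a second version of a file with a few bytes inserted in the middle only stores the handful of chunks around the edit plus a new chunk list; everything else is already in the store under the same ids. A real implementation would also want minimum and maximum chunk sizes and possibly nested chunk lists, which `read` already tolerates.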

With (2), you'd be able to "git add" your 30GB virtual machine file to your
git repository, and version it. How cool would that be?

------
gms
As of August, they think they are on their way to a solution:
[http://www.quora.com/Facebook-Engineering/Has-Facebook-solve...](http://www.quora.com/Facebook-Engineering/Has-Facebook-solved-their-large-Git-repository-problem-yet/answer/Chuck-Rossi)

~~~
justinmk
Fascinating. Thanks for the link. Sam mentions in the thread:

    
    
      With the hard-working part of git on the other end of a network service, 
      you could back it by a re-implementation of git which is written to be 
      distributed in Hadoop.  There are at least two similar implementations 
      of git that are like this: one for cassandra which was written by github 
      as a research project, and Google's implementation on top of their 
      BigTable/GFS/whatever.
    

It will be interesting to see how FB approaches it.

