
How to Handle Big Git Repositories - brudgers
http://blogs.atlassian.com/2014/05/handle-big-repositories-git/
======
yeukhon
Submodules are a curse, IMO. I used submodules in the past and I hated it. Some
of our projects still use submodules and I am vocal about removing them. My
reasons for disliking them: 1) I always forget to clone recursively, 2) I hate
muddling my root repository with another repository inside of it, 3) I forget
to update the pinned commit, and 4) most of the time I don't need submodules.
I can live with more git clone commands. Oh, add one more: a lot of people use
git@ URLs rather than https://. I don't use SSH to clone on my laptop (I do
enable two-factor auth, however), so whenever I clone one of our org's projects
that has a submodule, I have to fix the submodule config....
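
For the git@ vs https point, here is a minimal sketch of the fix-up being described: rewriting scp-style SSH submodule URLs to HTTPS. The function names and the example/plugins repository are hypothetical, made up for illustration. Git's own `url.<base>.insteadOf` config setting can do a similar rewrite without touching .gitmodules, which may be the better route in practice.

```python
import re

# Matches scp-style SSH remotes such as git@github.com:org/repo.git
_SSH_REMOTE = re.compile(r"^git@(?P<host>[^:/]+):(?P<path>.+)$")


def ssh_to_https(url: str) -> str:
    """Return the HTTPS form of an scp-style SSH remote.

    URLs that do not look like git@host:path are returned unchanged
    (apart from surrounding whitespace being stripped).
    """
    url = url.strip()
    match = _SSH_REMOTE.match(url)
    if match is None:
        return url
    return f"https://{match.group('host')}/{match.group('path')}"


def rewrite_gitmodules(text: str) -> str:
    """Rewrite every `url = ...` line in .gitmodules content to HTTPS.

    Hypothetical helper: it only handles the simple `key = value` lines
    that git itself writes into .gitmodules.
    """
    out = []
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "url":
            line = f"{key.rstrip()} = {ssh_to_https(value)}"
        out.append(line)
    return "\n".join(out)
```

After rewriting .gitmodules this way, `git submodule sync` copies the new URLs into the local config.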

Personally I really dislike the fact that Ansible's plugins are now submodules,
but it's not up to me to decide.

The only large repo I have worked with was Firefox (mozilla-central), but there
I used Mercurial. Either way, speed was never an issue for me on a fresh clone;
I expect it to take a while. In git I could clone a specific branch or specify
a depth if I know I only need that branch to work with. I guess when you work
with a really big repo like Facebook's or Google's, then maybe there is a
concern.
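
To make the branch/depth idea concrete, here is a rough sketch that only builds the `git clone` argument list (it does not run git). The helper name and the example URL are hypothetical; the flags `--branch`, `--single-branch` and `--depth` are standard `git clone` options.

```python
from typing import List, Optional


def clone_command(url: str, dest: str,
                  branch: Optional[str] = None,
                  depth: Optional[int] = None) -> List[str]:
    """Build a `git clone` command limited to one branch and/or a history depth.

    Hypothetical helper for illustration; it returns the argument list
    and leaves running it to the caller.
    """
    cmd = ["git", "clone"]
    if branch is not None:
        # Fetch only the named branch instead of every branch.
        cmd += ["--branch", branch, "--single-branch"]
    if depth is not None:
        if depth < 1:
            raise ValueError("depth must be a positive integer")
        # Truncate history to the most recent `depth` commits.
        cmd += ["--depth", str(depth)]
    cmd += [url, dest]
    return cmd
```

The resulting list can be passed to `subprocess.run(..., check=True)` to perform the clone.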

------
sytse
Git Annex is a great way to handle large files. You can version them with git
but they don't make the repository larger. And they are synced at rsync
speeds. GitLab.com and GitLab EE have built-in support for it:
[https://about.gitlab.com/2015/02/17/gitlab-annex-solves-the-...](https://about.gitlab.com/2015/02/17/gitlab-annex-solves-the-problem-of-versioning-large-binaries-with-git/)

------
res0nat0r
BFG Repo Cleaner is also a good tool that I don't see mentioned in the
article:
[https://rtyley.github.io/bfg-repo-cleaner/](https://rtyley.github.io/bfg-repo-cleaner/)

~~~
e40
This looks very interesting, but I'm a little nervous about it since it
appears to operate directly on the repo's internal files.

Anything you can say to make me feel better about using it?

~~~
robertotyley
I'm the author of the BFG, I'll try to help with that.

The BFG uses the JGit library to act on Git repository internals - JGit's the
library used by Google for hosting the Android codebase, handling thousands of
commits a day, and it's basically a pretty serious library. As for the BFG
itself, I've used it myself on several critical repositories at the Guardian,
and it's been used on many major projects around the world - here's a hundred
tweets by different people who've used the BFG:

[https://twitter.com/rtyley/timelines/464727264345993216](https://twitter.com/rtyley/timelines/464727264345993216)

...and here's a comment by the head maintainer of Git itself, Junio Hamano:

[https://plus.google.com/+JunioCHamano/posts/Lm7iBwSLvoo](https://plus.google.com/+JunioCHamano/posts/Lm7iBwSLvoo)

So, I guess, most people like it.

------
m0th87
We made git-fit to manage big assets as well:
[https://github.com/dailymuse/git-fit](https://github.com/dailymuse/git-fit)

Didn't go with git annex because, frankly, I never felt like I grasped what
was going on in the background when using it.

------
shurcooL
Someone should show this article to Jonathan Blow.

------
aikinai
Does anyone know if mainline git has any plans to eventually add better
support for binary files? There are a number of hacks and bolt-on solutions,
but it would be a lot easier to propose git for projects if it would just work
as-is.

------
skeletonjelly
Wish Atlassian's Stash supported Git Annex natively.

