

Rewriting RubyGems.org Git History - arthurnn
http://blog.rubygems.org/2015/02/01/rewriting-history.html

======
akerl_
Obviously I'm not in their place, but in hindsight, my first thought when
reading their initial issue was "Why not make a container that serves the
minimal subset of Rubygems stuff necessary to bootstrap Rubygems, and use that
as a source in the event that the main site died and needed to be kickstarted
again?"

Docker itself used a similar strategy for their build process: the recommended
build process uses containers to build the Docker binary, so they made a
precompiled binary available that was crafted "the hard way" that could then
be used for running the saner build process.

~~~
Xylakant
The point is that they want to track the versions of gems rubygems.org depends on
in a VCS and track the dependency for each version of rubygems.org. You can
build a container that includes those, but then you'd have to keep the
container in a VCS - that's obviously possible but tangential to the problem.

------
jbergknoff
It would be really helpful to go into some detail on the steps taken to clean
up the git history.

~~~
jerf
It isn't what they did, but my recommendation for most such use cases is the
BFG Repo-Cleaner:
[https://rtyley.github.io/bfg-repo-cleaner/](https://rtyley.github.io/bfg-repo-cleaner/)

Don't be deceived by the "java" invocations, it's written in Scala, so it's
HN-approved!

------
anton_gogolev
The-other-good-DVCS-that-everyone-almost-forgot-about, Mercurial, has exactly
two solutions to the problem of overly large repos which, I presume, contain a
lot of BLOBs.

The first is Streaming Clones[1], whereby the Mercurial server just takes all the
files from the repository and transmits them over the transport connection,
significantly decreasing TTFB.

The second is the Largefiles extension[2], which turns Mercurial into more of a
CVCS, but allows for efficient storage and retrieval of large binary files.

[1]: [https://hglabhq.com/blog/2014/6/20/working-with-mercurial-ov...](https://hglabhq.com/blog/2014/6/20/working-with-mercurial-over-unreliable-connections)

[2]: [http://mercurial.selenic.com/wiki/LargefilesExtension](http://mercurial.selenic.com/wiki/LargefilesExtension)

------
apetresc
I find it funny that the problem was they didn't think enough people know
about the `--depth 1` flag to git clone, so the solution is to use submodules
(which far fewer people know how to use properly).
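
For anyone who hasn't used the flag, a shallow clone fetches only the most recent commit(s) instead of the full history. A rough sketch in Python, assuming `git` is on the PATH; the helper names `shallow_clone_command` and `shallow_clone` are made up for illustration and are not from the post:

```python
import subprocess


def shallow_clone_command(url, dest=None, depth=1):
    """Build the argv for a shallow `git clone` (hypothetical helper)."""
    if depth < 1:
        raise ValueError("depth must be a positive integer")
    # --depth N truncates the fetched history to the last N commits.
    cmd = ["git", "clone", "--depth", str(depth), url]
    if dest is not None:
        cmd.append(dest)
    return cmd


def shallow_clone(url, dest=None, depth=1):
    """Run the shallow clone; raises CalledProcessError if git fails."""
    subprocess.run(shallow_clone_command(url, dest, depth), check=True)
```

Calling `shallow_clone(url)` with the repository's clone URL would then run `git clone --depth 1 <url>`.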

~~~
Xylakant
You can check for a missing submodule (folder) in your setup script and print
helpful advice, but you can't do the same for the large repo: the damage is
already done by the time you get to print instructions. In this case things
are even easier: the submodule is just a backup for the thing that bundler
will do anyways when running "bundle install".
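
The submodule check could look roughly like this. It is only a sketch in Python, not what rubygems.org's setup script actually does; the function name `submodule_advice` and the `vendor/cache` path in the usage note are assumptions for illustration:

```python
import os


def submodule_advice(repo_root, submodule_path):
    """Return a help message if the submodule looks uninitialized, else None."""
    full_path = os.path.join(repo_root, submodule_path)
    # An uninitialized submodule shows up as a missing or empty directory.
    if os.path.isdir(full_path) and os.listdir(full_path):
        return None
    return (
        f"Submodule '{submodule_path}' is missing or empty.\n"
        "Run: git submodule update --init\n"
        "or just run `bundle install` to fetch the gems instead."
    )
```

A setup script would call something like `submodule_advice(".", "vendor/cache")` and print the result if it is not `None`. There is no equivalent hook for the large repo, since the clone has already happened by the time any script runs.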

