
Notes on using git-replace to get rid of giant objects - lainon
https://blog.plover.com/prog/git-replace.html
======
leipert
If you want to go the mentioned "rewrite-whole-history" route, I have used BFG
Repo Cleaner [0] successfully in the past. It's way faster than git-filter-
branch [1].

Another thing: You can work with shallow clones. If the commit is ancient, no
need to pull the whole history of a project.
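A self-contained sketch of the shallow-clone idea, using a throwaway local repository (the `file://` URL stands in for a real remote; `--depth` is silently ignored for plain local paths):

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"

# Build a small throwaway repository with three commits.
git init -q origin
cd origin
git config user.email you@example.com
git config user.name You
for i in 1 2 3; do echo "$i" > file.txt; git add file.txt; git commit -qm "commit $i"; done
cd ..

# Shallow-clone it: only the newest commit is fetched.
git clone -q --depth 1 "file://$tmp/origin" shallow
cd shallow
git rev-list --count HEAD    # prints 1: the older history never came over
```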

[0]: https://rtyley.github.io/bfg-repo-cleaner/

[1]: https://git-scm.com/docs/git-filter-branch

~~~
gbacon
_[ Addendum 20181009: A lot of people have unfortunately missed the point of
this article, and have suggested that I use BFG or reposurgeon. I have a small
problem and a large problem. The small problem is how to remove some files
from the repository. This is straightforward, and the tools mentioned will
help with it. But because of the way Git works, the result is effectively a
new repository. The tools will not help with the much larger problem I would
have then: How to get 350 developers to migrate to the new repository at the
same time. The approach I investigated in this article was an attempt to work
around this second, much larger problem. ]_

------
tln
Maybe create a `new-master` branch with the offending commits rebased out,
then automate cherry-picking new commits between `master` and `new-master`.

Or change all CI / dev docs to use `--depth`, and save more than 350MB per
checkout in the process.
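A sketch of the one-shot `new-master` creation, here done with `git rebase --onto` in a throwaway repository (`BAD` is a placeholder for the offending commit's hash, and the tiny `big.bin` stands in for the 350MB file):

```shell
set -e
cd "$(mktemp -d)"
git init -q repo && cd repo
git config user.email you@example.com
git config user.name You

echo 1 > a; git add a; git commit -qm "good 1"
git branch -M master
echo "pretend this is 350MB" > big.bin; git add big.bin; git commit -qm "bad: adds big.bin"
BAD=$(git rev-parse HEAD)
echo 2 >> a; git commit -qam "good 2"

# new-master = master with the bad commit rebased out: replay
# everything after BAD onto BAD's parent.
git branch new-master master
git rebase --onto "$BAD^" "$BAD" new-master

git log --oneline new-master   # the bad commit is absent from new-master
```

The catch is what the rest of the thread is about: the replayed commits all get new hashes, so `master` and `new-master` permanently diverge and must be kept in sync until everyone has switched.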

~~~
WorldMaker
The rebasing approach is effectively what tools like BFG and git filter-
branch automate. Whole-branch rebases are dangerous, especially with 350
developers you can't tell to stop working while you do it; that way lies a
merge hell that can take years to dig out of. But if you are going to rebase
an entire branch like that, BFG and git filter-branch are faster and easier
than a manual rebase.

`--depth` is a useful tool. There are still some issues with things like
log/blame/annotate when working with shallow clones. Hopefully the ongoing
work on the commit-graph cache will make this a lot better.

~~~
tln
Sure, so BFG/filter-branch/rebase on a NEW branch, that can't affect existing
developers, then automate cherry-picking `master` <=> `new-master`, get all
developers to switch at their own pace...

Good point on log/blame/annotate.

~~~
WorldMaker
"automate cherry-picking" is where I feel the migraine starts in git. I've
seen three-way merge hell and it starts with something like that.

------
TekMol
What would happen if you:

1. checkout the last commit before the bad one.

2. cherry-pick every commit after the bad one.

Would that get rid of the bad commit?
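Yes. Here is a sketch of exactly those two steps in a throwaway repository (`BAD` is a placeholder for the bad commit's hash, and the tiny `big.bin` stands in for the giant file):

```shell
set -e
cd "$(mktemp -d)"
git init -q repo && cd repo
git config user.email you@example.com
git config user.name You

echo 1 > a; git add a; git commit -qm "good 1"
git branch -M master
echo "pretend this is 350MB" > big.bin; git add big.bin; git commit -qm "bad: adds big.bin"
BAD=$(git rev-parse HEAD)
echo 2 >> a; git commit -qam "good 2"
echo 3 >> a; git commit -qam "good 3"

# 1. checkout the last commit before the bad one
git checkout -q -b fixed "$BAD^"
# 2. cherry-pick every commit after the bad one
git cherry-pick "$BAD..master"

git log --oneline   # the bad commit is gone from the new branch
```

It works, but every cherry-picked commit gets a new hash, so everyone downstream has to migrate to the rewritten history -- which is the larger problem the article is about.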

~~~
rzzzt
Interactive rebase allows you to do that (it will list all affected commits in
the editor, you can remove the line corresponding to the bad commit, and by
default it picks all other ones to the rebased branch): [https://git-
scm.com/book/en/v2/Git-Tools-Rewriting-History#_...](https://git-
scm.com/book/en/v2/Git-Tools-Rewriting-History#_changing_multiple)
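A sketch of that workflow in a throwaway repository. The editor step is scripted via `GIT_SEQUENCE_EDITOR` so the example runs unattended; interactively you would just delete the `pick` line for the bad commit by hand (`BAD` is a placeholder hash):

```shell
set -e
cd "$(mktemp -d)"
git init -q repo && cd repo
git config user.email you@example.com
git config user.name You

echo 1 > a; git add a; git commit -qm "good 1"
echo "pretend this is 350MB" > big.bin; git add big.bin; git commit -qm "bad: adds big.bin"
BAD=$(git rev-parse HEAD)
echo 2 >> a; git commit -qam "good 2"

# Equivalent to running `git rebase -i BAD^` and deleting the bad
# commit's `pick` line in the editor (sed -i.bak works on both GNU
# and BSD sed).
GIT_SEQUENCE_EDITOR="sed -i.bak '/adds big.bin/d'" git rebase -i "$BAD^"

git log --oneline   # the bad commit has been dropped; later hashes changed
```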

~~~
WorldMaker
git filter-branch and BFG are faster/easier than an interactive rebase for
this particular problem.

The author wants to avoid rebases because they cannot "stop the world" on the
project: there are around 350 developers who can't be told to stop working
while the rebase is done and then rebase all of their in-flight work on top
of the new branch history.

------
gorkish
Fire the BFG. It's a no-brainer.

~~~
mjd
Cleaning up the mess _after_ firing the BFG is less of a no-brainer. Getting
rid of the bad object is easy. But afterward I have to arrange for 350
developers to switch simultaneously to the rewritten repository. The article
discusses an attempt to find a way to avoid this much larger and much more
difficult problem.

~~~
gbacon
Handling intentionally deleted objects does seem like a missing use case in
git and not a huge leap in terms of the implementation.

~~~
mjd
I think the concern here is the security implications. Suppose commit X
contains some family of configuration files F1, F2, ... that lock down
security. Someone, intentionally or otherwise, retroactively destroys F23 in
commit X using your proposed feature. Now the security configuration at commit
X could have an exploitable loophole. Later on, a serious problem is
discovered in the production deployment and the deployed version is rolled
back to commit X, but without crucial file F23.

Git's current design makes this sort of scenario impossible. Any tampering
with commit X after the fact is immediately detectable, and Git will raise an
alarm and refuse to check out the commit.
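That tamper-evidence is easy to demonstrate in a throwaway repository: destroy a blob behind git's back and both `git fsck` and `git checkout` complain rather than silently producing a tree without the file:

```shell
set -e
cd "$(mktemp -d)"
git init -q repo && cd repo
git config user.email you@example.com
git config user.name You

echo 'deny-all' > F23
git add F23
git commit -qm "security config"

# Destroy the blob behind git's back (simulating retroactive tampering).
blob=$(git rev-parse HEAD:F23)
rm ".git/objects/$(echo "$blob" | cut -c1-2)/$(echo "$blob" | cut -c3-)"

git fsck 2>&1 | grep missing   # fsck reports the missing blob

rm F23
git checkout HEAD -- F23 || echo "checkout refused"   # git will not fake it
```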

~~~
gbacon
I was thinking more along the lines of what you were attempting to do locally
but with downstream effect: mark the blob itself as intentionally deleted
rather than missing or corrupted. This would leave all hashes the same, which
sidesteps the much larger coordination problem.
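The closest thing git offers to that today is `git replace` itself, the subject of the article: swap the giant blob for a tiny tombstone while every commit and tree hash stays the same. A throwaway-repo sketch:

```shell
set -e
cd "$(mktemp -d)"
git init -q repo && cd repo
git config user.email you@example.com
git config user.name You

echo "pretend this is 350MB of junk" > big.bin
git add big.bin
git commit -qm "adds big.bin"
before=$(git rev-parse HEAD)

# Stand a tiny tombstone blob in for the giant one. No commit or tree
# hash changes; git just records a ref under refs/replace/.
big=$(git rev-parse HEAD:big.bin)
tomb=$(printf 'intentionally deleted\n' | git hash-object -w --stdin)
git replace "$big" "$tomb"

git cat-file -p HEAD:big.bin   # reads come back through the replacement
[ "$(git rev-parse HEAD)" = "$before" ] && echo "history untouched"
```

The caveats the article explores still apply: the replacement lives only in a local `refs/replace/` ref (not fetched by default), and the original object keeps taking up space until it is actually pruned.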

To cover the rollback case in your comment, give `git checkout` a safe mode
(either with safe as the default, requiring `--force` to override, or with an
explicit `--safe` flag) that fails if the tree-ish points to any deleted blob.

How to propagate this change is a delicate matter because any fetch may reach
back and destructively update my local clone. When I’ve wished for this
feature, it has always been in the context of a central team repository. Being
able to delete an object from the central repository at least has the benefit
of _subsequent_ clones not being bloated by the 350-megabyte file. Presumably
people with access to the team’s central repository can be trusted to wield
this power responsibly. Other members of the team may optionally run the
hypothetical `git delete-object ffff9999` to shrink their respective clones.
Leaving it as plumbing to be run on a repository to which one has
administrative access seems like a decent balance.

This could be an expensive operation, because replacing a packed object with
a known-but-deleted object would trigger a repack. The ripple effect of this
new special case (a new object type, really) would be nontrivial, and the
change would break backward compatibility.

