The problem here is the unbounded growth of a git repo. In this specific case, a size limit was triggered; in other circumstances, the failure mode would instead be transfers that take too long or storage running out.
Anyway, the problem is that git stores all changes, forever.
A better approach would be to clean up old commits, or somehow merge them into snapshots covering fixed timespans (say, anything older than a year gets compressed into monthly changesets).
A more conservative approach would be some sort of layered storage/archiving, I guess. The older the commit, the less likely it is to be used, so it could be archived in different storage, optimized for the long term. This way you keep the "hot" history small while still keeping the full history available.
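As a rough sketch of that layered idea using nothing but stock git (the URL and cutoff date are invented for the example): archive the complete history into a bundle for cold storage, then do day-to-day work from a shallow clone.

    # Cold storage: capture the complete history in a single archive file.
    git clone --mirror https://example.com/project.git project-full.git
    git -C project-full.git bundle create ../project-archive.bundle --all

    # Hot copy: a shallow clone containing only recent history.
    git clone --shallow-since=2023-01-01 https://example.com/project.git project

    # If the old history is ever needed again, clone it back out of the bundle.
    git clone project-archive.bundle project-restored

The bundle is just a file, so it can sit on whatever long-term storage is convenient while the working clones stay small.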
Accidentally publish secrets/credentials? Rotate them, yes, but also remove them from the published history (a sketch of how follows below).
Accidentally publish a binary for a build tool without the proper license? Definitely remove it (and add it to your .gitignore so it doesn't happen again!)
You discover a major flaw that bricks certain systems or causes data loss? Retroactively replace the Makefile/configure script/whatever to print out a warning instead of producing the bad build.
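For the secrets and unlicensed-binary cases above, here is roughly what the scrub looks like with the third-party git-filter-repo tool (BFG Repo-Cleaner is a similar option); the file names and replacement rules are made up for the example.

    # Run in a fresh clone; filter-repo refuses to run elsewhere unless forced.

    # Drop an accidentally committed binary from every commit in history.
    git filter-repo --invert-paths --path tools/build-helper.exe

    # Scrub a leaked credential everywhere it ever appeared.
    # replacements.txt holds rules like:  hunter2==>***REMOVED***
    git filter-repo --replace-text replacements.txt

    # filter-repo removes the 'origin' remote as a safety measure;
    # re-add it and force-push the rewritten history.
    git remote add origin https://example.com/project.git
    git push --force --all origin

The force push overwrites the published history, so everyone with a clone has to re-sync afterwards.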
AFAIK, copyright problems are fixed by just making another commit, without a history rewrite. There's also no need to care about outdated credentials. Should bugs be fixed by deleting all the history too? That code was buggy, bad, bad, must delete? A VCS just becomes glorified FTP this way.
I don't think that your points are actually in conflict.
If this is my source code, I want the whole history. I want that 10-year-old commit that isn't used in any current branch. A build machine may not need any history: it just wants to check out a particular branch as it is right now, and that works too.
But there is an intermediate case: Let's say that I have an issue with a dependency. I might check out that code and want some history to know what has changed recently, but I don't need that huge zip file that was accidentally checked in and then removed 4 years ago. If it were a consistent problem, perhaps you'd invent some sort of 'shallow' or 'partial' clone, something like this:
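As it happens, git ships both of these today. A minimal illustration, with the URL, date, and size threshold invented; each line is an alternative invocation:

    # Shallow clone: history truncated to a depth or a date.
    git clone --depth=50 https://example.com/dependency.git
    git clone --shallow-since=2024-01-01 https://example.com/dependency.git

    # Partial clone: full commit history, but blobs are fetched lazily,
    # so that long-gone zip file is never downloaded at all.
    git clone --filter=blob:none https://example.com/dependency.git
    git clone --filter=blob:limit=1m https://example.com/dependency.git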
The value of a commit approaches zero as it gets older. Past a certain threshold, no one will ever look at it again. Never say never, sure, but is there any reason we should keep that deadweight around?
Your premise is incorrect. The other day I was looking around a repository that's been through many migrations, and found a commit from 2004 that was relevant to my interests.
Isn't that what packs are for? The raw, content-addressable object store has no inherent optimization for reducing repo size; any changed file is copied in full until a higher level does something to compress that down.
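For anyone curious, that higher-level compression can be inspected and triggered by hand on any local repo; nothing here is specific to the repository in this thread.

    # How many loose (uncompressed) objects vs. objects already in packfiles?
    git count-objects -v

    # Rewrite loose objects and existing packs into one delta-compressed packfile.
    git repack -a -d

    # Or let git decide; gc also prunes unreachable objects after a grace period.
    git gc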
I've never used it before, but from what I understand it's very powerful but also very confusing and easy to mess up, and of course it has a sort of vague, ambiguous name that makes it hard to discover; in other words, it's quintessentially git.
GitHub can't rewrite the refs on their own without breaking users' stuff. They can only repack the existing objects; the squashing needs to be done by the developers. It's also a non-fast-forward change, so it needs to be coordinated between the git users anyway.
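To make that coordination concrete: after someone force-pushes the rewritten branch, every other clone has to throw away its old copy, roughly like this (the remote and branch names are just the usual defaults, and my-feature is a made-up local branch):

    # On each collaborator's machine, once the rewritten branch is on the server:
    git fetch origin
    git checkout main
    git reset --hard origin/main      # replace the pre-rewrite history

    # Local branches started before the rewrite need replaying onto the new history.
    git checkout my-feature
    git rebase origin/main            # git rebase --onto handles messier cases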