

Ask HN: How does GitHub minimise wasted disk space from duplicate files? - andrewstuart

There must be enormous potential for duplicate files on GitHub.

How does GitHub minimise the impact of duplicate files?
======
brown-dragon
I am not sure exactly how GitHub is set up, but there is a simple way of
limiting the impact of duplicate files: use hardlinks.

When cloning from a local path, `git clone` hardlinks the files under
`.git/objects` when possible, so if the fork is on the same filesystem,
duplication is limited automatically. The caveat here is that `git gc` has to
be handled with more care (all references have to be updated).
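
To make the hardlink idea concrete, here is a minimal sketch in Python rather
than anything GitHub is known to run. It creates one file, hardlinks a second
name to it, and checks that both names point at the same inode, so the bytes
are stored once. The file names `original.obj` and `fork.obj` and the helper
`hardlink_copy` are made up for illustration.

```python
import os
import tempfile


def hardlink_copy(src, dst):
    """Create dst as a hardlink to src and report whether they share an inode."""
    os.link(src, dst)
    a, b = os.stat(src), os.stat(dst)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def demo():
    """Return (shared, link_count) for one file reachable under two names."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical stand-ins for an object file in the original repo
        # and the same object in a fork on the same filesystem.
        original = os.path.join(root, "original.obj")
        fork = os.path.join(root, "fork.obj")
        with open(original, "wb") as f:
            f.write(b"x" * 4096)
        shared = hardlink_copy(original, fork)
        # st_nlink counts how many directory entries refer to the inode.
        return shared, os.stat(original).st_nlink


if __name__ == "__main__":
    print(demo())
```

Hardlinks only work within a single filesystem, which is why the fork has to
live on the same filesystem as the original for this to help.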

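`git clone` also provides a `--reference` option for more explicit sharing of
duplicate objects: the new repository borrows objects from the reference
repository through `.git/objects/info/alternates` instead of copying them.
This also requires some care during cleanup (objects the clone still borrows
must not be pruned from the reference repository), but otherwise it can work
pretty well.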
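
As a rough illustration of how borrowing through a reference repository can
work, the sketch below mimics the lookup order implied by an
`objects/info/alternates` file: check the local object directory first, then
each directory listed in the alternates file. It is a simplification under
stated assumptions: it only handles loose objects (no pack files), does not
follow alternates of alternates, and the names `object_dirs`,
`find_loose_object` and `demo` are hypothetical helpers, not part of git.

```python
import os
import tempfile


def object_dirs(objects_dir):
    """Yield the local objects directory, then any listed in info/alternates."""
    yield objects_dir
    alternates = os.path.join(objects_dir, "info", "alternates")
    if os.path.exists(alternates):
        with open(alternates) as f:
            for line in f:
                path = line.strip()
                if path and not path.startswith("#"):
                    # Relative entries are resolved against the objects
                    # directory; absolute entries are used unchanged.
                    yield os.path.normpath(os.path.join(objects_dir, path))


def find_loose_object(objects_dir, sha):
    """Return the path of a loose object, falling back to alternates."""
    for directory in object_dirs(objects_dir):
        candidate = os.path.join(directory, sha[:2], sha[2:])
        if os.path.exists(candidate):
            return candidate
    return None


def demo():
    """Store an object once in a 'reference' store and find it from a 'fork'."""
    with tempfile.TemporaryDirectory() as root:
        reference = os.path.join(root, "reference", "objects")
        fork = os.path.join(root, "fork", "objects")
        sha = "ab" + "0" * 38  # made-up object id, for illustration only
        os.makedirs(os.path.join(reference, sha[:2]))
        os.makedirs(os.path.join(fork, "info"))
        with open(os.path.join(reference, sha[:2], sha[2:]), "wb") as f:
            f.write(b"stored once")
        with open(os.path.join(fork, "info", "alternates"), "w") as f:
            f.write(reference + "\n")
        found = find_loose_object(fork, sha)
        return found is not None and found.startswith(os.path.normpath(reference))


if __name__ == "__main__":
    print(demo())
```

The sketch also shows why cleanup needs care: if the object were deleted from
the reference store, the fork's lookup would return nothing, because the fork
never held its own copy.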

