Hacker News

GitHub only shares objects among forks. Source: I used to work on GitHub's Git Infrastructure team, but this is publicly available information.
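
Objects can be shared at all because Git object IDs come from content: a fork that has the same file as its parent gets the same blob ID, so one stored copy can serve both. A minimal sketch, assuming a hypothetical `NetworkObjectStore` class (not GitHub's storage code) that deduplicates only within a fork network. The blob ID is computed the way Git does in its default SHA-1 object format: SHA-1 over a `blob <size>\0` header plus the bytes.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Blob ID as Git computes it in the default SHA-1 object format."""
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


class NetworkObjectStore:
    """Hypothetical store that deduplicates objects only within one fork network."""

    def __init__(self) -> None:
        # network id -> {object id -> raw content}
        self._networks: dict[str, dict[str, bytes]] = {}

    def put(self, network: str, data: bytes) -> str:
        oid = git_blob_id(data)
        # Storing the same content twice in one network keeps a single copy
        self._networks.setdefault(network, {}).setdefault(oid, data)
        return oid

    def object_count(self, network: str) -> int:
        return len(self._networks.get(network, {}))

    def total_objects(self) -> int:
        return sum(len(objects) for objects in self._networks.values())


store = NetworkObjectStore()
readme = b"hello\n"
store.put("octo/widget", readme)    # the original repository
store.put("octo/widget", readme)    # a fork: same network, no new copy
store.put("other/project", readme)  # an unrelated repository: its own copy
print(store.total_objects())  # 2
```

The network names are made up; the point is that identical content dedupes inside a network but not across networks.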

You can read about their architecture in their discussions of Spokes, which replicates repository networks (the original repository and its forks) across data centers, e.g.: https://githubengineering.com/stretching-spokes/
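
The thread doesn't say how Spokes picks the servers that hold a network, so as a stand-in here is rendezvous (highest-random-weight) hashing, a common way to choose a stable set of replicas. What it illustrates is that placement is keyed on the network, so the original repository and all of its forks land on the same replicas and can share objects there. The server names and the replica count of 3 are assumptions.

```python
import hashlib


def _weight(network: str, server: str) -> int:
    # Deterministic pseudo-random weight for a (network, server) pair
    digest = hashlib.sha256(f"{network}|{server}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place_network(network: str, servers: list[str], replicas: int = 3) -> list[str]:
    """Choose `replicas` distinct servers for an entire repository network
    using rendezvous (highest-random-weight) hashing."""
    if replicas > len(servers):
        raise ValueError("not enough servers for the requested replica count")
    ranked = sorted(servers, key=lambda s: _weight(network, s), reverse=True)
    return ranked[:replicas]


# Hypothetical file servers spread over three data centers
servers = ["dc1-fs1", "dc1-fs2", "dc2-fs1", "dc2-fs2", "dc3-fs1"]
# Every fork is looked up by its network id, not by its own name
print(place_network("octo/widget", servers))
```

A nice property of this scheme is that removing a server which wasn't chosen leaves the placement unchanged. A real system would also want replicas in distinct data centers; this sketch does not enforce that.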

Trying to put all of GitHub's objects in a single packfile, or even just on a single server, would be impossible.

But even for hosting providers with a bespoke implementation that does not use core Git to manage repositories, this would be challenging. We have a custom Git server implementation in Visual Studio Team Services, but it still makes sense to shard object storage along with the repository: you have to worry about scalability and performance, but also about things like data sovereignty. We can't just put a user's Git repository in some global SQL Azure database that contains every repository in Visual Studio Team Services: repositories need to be stored in the same geography as the VSTS account they were created in.
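
A minimal sketch of that constraint, assuming a hypothetical `REGION_CLUSTERS` table and `Account` type (neither is part of VSTS): the account's region is fixed at creation, and a repository may only be sharded among storage clusters in that region.

```python
import zlib
from dataclasses import dataclass

# Hypothetical: storage clusters provisioned in each geography
REGION_CLUSTERS: dict[str, list[str]] = {
    "eu-west": ["eu-store-1", "eu-store-2"],
    "us-east": ["us-store-1", "us-store-2", "us-store-3"],
}


@dataclass(frozen=True)
class Account:
    name: str
    region: str  # chosen when the account is created


def storage_cluster_for(account: Account, repo: str) -> str:
    """Shard a repository's objects across clusters in its account's region only."""
    clusters = REGION_CLUSTERS.get(account.region)
    if not clusters:
        raise LookupError(f"no storage provisioned in region {account.region!r}")
    # crc32 is stable across processes, unlike Python's salted str hash()
    index = zlib.crc32(f"{account.name}/{repo}".encode()) % len(clusters)
    return clusters[index]


print(storage_cluster_for(Account("contoso", "eu-west"), "website"))  # an eu-west cluster
```

Sharding still spreads load across clusters, but the region lookup happens first, so no choice of shard can move data out of the account's geography.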




I vaguely remember someone from GitHub claiming in an HN comment that all objects are shared.

Putting all of GitHub in a single packfile is obviously impractical, but keeping all of GitHub in some bespoke Venti/IPFS-style content-addressable object store is not.
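
The core of a Venti/IPFS-style store is small: data is split into blocks, each block is stored under the hash of its bytes, identical blocks are stored once, and readers can verify every block against its address. This is a toy sketch with a hypothetical `BlockStore` class, SHA-256 as the hash, and fixed 8-byte blocks for readability; Venti and IPFS differ from it in hash choice, block sizes, and chunking.

```python
import hashlib

BLOCK_SIZE = 8  # tiny for readability; real stores use far larger blocks


class BlockStore:
    """Hypothetical write-once store that addresses each block by its SHA-256."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def put_block(self, block: bytes) -> str:
        key = hashlib.sha256(block).hexdigest()
        self._blocks.setdefault(key, block)  # identical blocks are stored once
        return key

    def get_block(self, key: str) -> bytes:
        block = self._blocks[key]
        # Content addressing lets every read be verified against its key
        if hashlib.sha256(block).hexdigest() != key:
            raise ValueError(f"block {key} is corrupt")
        return block

    def put(self, data: bytes) -> list[str]:
        # Split into fixed-size blocks; the list of keys addresses the whole object
        return [self.put_block(data[i:i + BLOCK_SIZE]) for i in range(0, len(data), BLOCK_SIZE)]

    def get(self, keys: list[str]) -> bytes:
        return b"".join(self.get_block(k) for k in keys)

    def __len__(self) -> int:
        return len(self._blocks)


store = BlockStore()
keys = store.put(b"AAAAAAAA" * 4 + b"tail")
print(len(keys), len(store))  # 5 2
```

Nothing here is specific to one repository, which is what would let such a store deduplicate across all of GitHub; the cost is that reading an object means fetching many small, scattered blocks.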


It's not impossible, but it too is impractical for performance reasons. Providers that run core Git repack repositories regularly because packfiles are efficient; loose objects are horribly inefficient, even on a local filesystem. Moving them to a network filesystem is impractical.
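
To make the packfile point concrete: Git zlib-compresses each loose object individually and stores it as its own file, while a packfile stores objects together and can encode similar objects as deltas against each other. The toy comparison below approximates that by compressing twenty near-identical file revisions one at a time versus as a single stream. It is not Git's pack format, and it ignores per-file filesystem overhead, which would only add to the loose side's cost (and more so over a network filesystem).

```python
import zlib


def loose_size(objects: list[bytes]) -> int:
    # Loose objects: each one is zlib-compressed on its own (and is its own file on disk)
    return sum(len(zlib.compress(obj)) for obj in objects)


def packed_size(objects: list[bytes]) -> int:
    # Crude stand-in for a packfile: one compressed stream, so redundancy
    # between objects is exploited (real packs use explicit deltas instead)
    return len(zlib.compress(b"".join(objects)))


# Twenty revisions of a ~9 KB file that differ only in their last line
base = b"".join(b"line %d of a config file that rarely changes\n" % i for i in range(200))
revisions = [base + b"version = %d\n" % v for v in range(20)]
print(loose_size(revisions), packed_size(revisions))
```

The loose total grows roughly linearly with the number of revisions, while the single stream mostly pays for the first copy and then for small back-references.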



