
Couldn't they use git over IPFS?



No. The problem isn't only the storage or fetching of the files (this is the easy bit :) ); it's the operations that detect changes in the working tree. If you have a large tree, scanning it becomes slow.

Using a VFS allows you to track which files have changed so that these operations no longer need to scan. Now they are O(changed files), which is generally small.
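
A minimal sketch of that idea (hypothetical names, not GVFS's actual code): the virtualization layer intercepts writes and records each touched path in a journal, so status consults the journal instead of walking millions of files.

    # Hypothetical sketch of VFS-style change tracking (not GVFS's real code).
    class TrackingFS:
        def __init__(self, base_revision):
            self.base = base_revision   # mapping: path -> content hash
            self.dirty = set()          # journal of paths touched since checkout

        def write(self, path, data):
            # The filesystem driver sees every write, so the journal
            # stays complete -- no full-tree scan is ever needed.
            self.dirty.add(path)
            # ... persist data ...

        def status(self):
            # O(changed files): only the journal is consulted,
            # never the whole working tree.
            return sorted(self.dirty)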

Now, IPFS has a VFS, but it's just a simple read/write interface. That VFS would need slightly more logic to do things like switching the base revision and tracking changes.


IPFS clearly does a lot more than storing and fetching files. Seriously, go have a read. A single hash can represent an arbitrarily large subtree of data (Microsoft's entire repo). Using an IPLD selector (in its simplest form, a path beyond the hash), an arbitrary sub-component can be addressed. This can be used to avoid scanning entire subtrees (maintaining your O(changed files)). Committing your modifications is O(changed files + tree depth to the root of your modifications); you never need to touch the rest of the repo.
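
A rough illustration of that cost model (a toy in-memory content-addressed tree, not IPFS's real IPLD API): resolving walks only the nodes along the path, and a commit re-hashes only the path from the changed file back up to the root, reusing every untouched sibling hash.

    # Toy content-addressed tree: directories are dicts of name -> hash,
    # files are strings. An illustration only, not IPFS's data layout.
    import hashlib, json

    store = {}  # hash -> node

    def put(node):
        h = hashlib.sha256(json.dumps(node, sort_keys=True).encode()).hexdigest()
        store[h] = node
        return h

    def resolve(root_hash, path):
        # "Path beyond the hash": touch only the nodes along the path.
        node = store[root_hash]
        for part in path.split("/"):
            node = store[node[part]]
        return node

    def commit(root_hash, path, new_file_hash):
        # Rewrite hashes from leaf back to root:
        # O(changed files + tree depth); siblings are reused as-is.
        parts = path.split("/", 1)
        node = dict(store[root_hash])
        if len(parts) == 1:
            node[parts[0]] = new_file_hash
        else:
            node[parts[0]] = commit(node[parts[0]], parts[1], new_file_hash)
        return put(node)

    root = put({"src": put({"main.c": put("int main(){}")}),
                "docs": put({"README": put("hello")})})
    root2 = commit(root, "src/main.c", put("int main(){return 0;}"))
    # The "docs" subtree hash is carried into root2 unchanged.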

For tracking changes (i.e. mutable data) you can use IPNS and create a signed commit history. This will be built on IPFS eventually, so it's only a matter of time.
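
Roughly what a signed, mutable head pointer looks like (a sketch; hmac stands in for the public-key signatures IPNS actually uses, and the record format is invented for illustration):

    # Sketch of an IPNS-style mutable pointer: the name stays stable,
    # the signed record inside it points at the latest immutable root.
    import hmac, hashlib, json

    SECRET = b"demo key -- IPNS really uses public-key crypto"

    def publish(prev_record, new_root_hash):
        body = {"root": new_root_hash,
                "parent": prev_record["sig"] if prev_record else None}
        payload = json.dumps(body, sort_keys=True).encode()
        sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
        return {"body": body, "sig": sig}

    def verify(record):
        payload = json.dumps(record["body"], sort_keys=True).encode()
        expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, record["sig"])

    # Each publish chains to the previous record: a signed commit history.
    head = publish(None, "QmRootV1")
    head = publish(head, "QmRootV2")
    assert verify(head)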


It was explained in the talk at Git-Merge that their problem is not large files per se. The codebase is huge in the number of source files alone: it was stated that the repo contains about 3.5 million files. Having IPFS here wouldn't help, would it?


Yes, IPFS is designed to host the entire internet. You can selectively mount sub-graphs arbitrarily, which means downloading locally exactly what you need and nothing more.


Unfortunately that alone would not make Git fast on such a huge repository. Normally (without tools such as sparse checkout) Git reads all files on, for example, git status. IPFS would therefore also download all files locally, making it a moot addition.


You would probably still need the changes they made to Git itself. But fundamentally IPFS is also a filesystem virtualization layer (so it should be able to do everything their filesystem virtualization is doing, if it doesn't already), and it inherently has lazy checkouts.
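
The lazy-checkout idea as a minimal sketch (hypothetical names, not IPFS's mount implementation): a checkout materializes only metadata up front, and file contents cross the wire on first read.

    # Sketch of a lazily materialized checkout (hypothetical code).
    class LazyCheckout:
        def __init__(self, manifest, fetch):
            self.manifest = manifest   # path -> content hash (metadata only)
            self.fetch = fetch         # hash -> bytes, hits the network/peers
            self.cache = {}            # blocks materialized so far

        def read(self, path):
            h = self.manifest[path]
            if h not in self.cache:
                # Only now does data transfer happen -- files you never
                # read in the 3.5M-file tree cost nothing locally.
                self.cache[h] = self.fetch(h)
            return self.cache[h]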

The main added benefit is that if your friend on the LAN has also checked out the parts you need, you can get them directly from them rather than from some central repo, which could make a big difference in a company of tens of thousands of employees.


This is why I think that extending the protocol GVFS uses for downloads with an IPFS backend might be an interesting way to make everything distributed again.


Not sure why I'm being down voted. It was a serious question. IPFS solves the handling of large files (by chunking them), and works in a P2P way in which you can locally mount remote merkle trees (the core data structure of git). I believe this use case is also actually one of the original design goals of IPFS.
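
For what it's worth, the chunking is easy to picture (a fixed-size sketch for brevity; IPFS defaults to 256 KiB chunks and also supports content-defined chunking):

    # Fixed-size chunking sketch: a big file becomes a list of content
    # hashes, so identical chunks are stored and transferred only once.
    import hashlib

    CHUNK = 256 * 1024  # IPFS's default chunk size

    def chunk_file(data, store):
        hashes = []
        for i in range(0, len(data), CHUNK):
            block = data[i:i + CHUNK]
            h = hashlib.sha256(block).hexdigest()
            store.setdefault(h, block)   # dedup: repeated chunks kept once
            hashes.append(h)
        return hashes  # in IPFS this list itself becomes a Merkle DAG node

    store = {}
    refs = chunk_file(b"x" * (3 * CHUNK), store)
    assert len(refs) == 3 and len(store) == 1  # three refs, one unique block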



