
Announcing GVFS: Git Virtual File System - janwh
https://blogs.msdn.microsoft.com/visualstudioalm/2017/02/03/announcing-gvfs-git-virtual-file-system/
======
greg7mdp
This is similar to what Google uses internally. See
[http://cacm.acm.org/magazines/2016/7/204032-why-google-store...](http://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext):

"Most developers access Piper through a system called Clients in the Cloud, or
CitC, which consists of a cloud-based storage backend and a Linux-only FUSE
file system. Developers see their workspaces as directories in the file
system, including their changes overlaid on top of the full Piper repository.
CitC supports code browsing and normal Unix tools with no need to clone or
sync state locally. Developers can browse and edit files anywhere across the
Piper repository, and only modified files are stored in their workspace. This
structure means CitC workspaces typically consume only a small amount of
storage (an average workspace has fewer than 10 files) while presenting a
seamless view of the entire Piper codebase to the developer."

This is a very powerful model when dealing with large code bases, as it solves
the issue of downloading all the code to each client. Kudos to Microsoft for
open sourcing it, and under the MIT license no less.

~~~
general_ai
Google is far more advanced than this. They have one giant monorepo (Piper)
that's backed by Bigtable (or at least it was, when I was there). Piper was
mostly created in response to Perforce's inability to scale and be fault
tolerant. Until Piper came along, they would have to periodically restart The
Giant Perforce Server in Mountain View. Piper is 24x7x365 and doesn't need any
restarts at all. But the key bit here is not Piper per se. Unlike Microsoft,
Google also has a distributed, caching, incremental build system (Blaze), and
a distributed test system (Forge), and they are integrated with Piper. The
vast majority of the code you depend on never actually ends up on your
machine. Thanks to this, what takes hours at Microsoft takes seconds at
Google. This enables pretty staggering productivity gains. You don't think
twice about kicking off a build, and in most cases no more than a minute or
two later you have your binaries, irrespective of the size of your transitive
closure. Some projects take longer than that to build, most take less time.
Tests are heavily parallelized. Dependencies are tracked (so tests can be re-
run when dependencies change), there are large scale refactoring tools that
let you make changes that affect the entire monorepo with confidence and
without breaking anyone.

Google's dev infra is pretty amazing and it's at least a decade ahead of
anything else I've seen. Every single ex-Googler misses it quite a bit.

~~~
d0vs
> Google's dev infra is pretty amazing and it's at least a decade ahead of
> anything else I've seen. Every single ex-Googler misses it quite a bit.

This may be naive but why not recreate it as an open source project?

~~~
kentonv
Blaze has been: [https://bazel.build/](https://bazel.build/)

Forge and Piper are built on Google's internal tech stack and designed for
Google's production infrastructure, so open sourcing them would be a very big
project. I think it would be a lot more likely for them to be offered as a
service -- and that might be more useful to users anyway, since you'd be able
to share resources with everyone else doing builds, rather than try to get
your own cluster running which might sit idle a lot of the time. Of course,
there are privacy issues, etc.

(Disclaimer: I'm purely speculating. I left Google over four years ago, and
have no idea what the tools people are up to today.)

------
chokolad
There is a discussion thread on r/programming where the MS folks who
implemented this answer questions. A lot of questions, like why not use
multiple repos, why not git-lfs, why not git subtree, etc., are answered there:

[https://www.reddit.com/r/programming/comments/5rtlk0/git_vir...](https://www.reddit.com/r/programming/comments/5rtlk0/git_virtual_file_system_from_microsoft/)

~~~
stinos
Thanks for bringing this up; it was actually a more interesting read than this
thread. Less trolling, more facts, and also interesting to read stuff I didn't
happen to know. Like:

 _One of the core differences between Windows and Linux is process creation.
It's slower - relatively - on Windows. Since Git is largely implemented as
many Bash scripts that run as separate processes, the performance is slower on
Windows. We’re working with the git community to move more of these scripts to
native cross-platform components written in C, like we did with interactive
rebase. This will make Git faster for all systems, including a big boost to
performance on Windows._

~~~
EdHominem
> "We’re working with the git community to move more of these scripts to
> native cross-platform components written in C"

Sad. Rather than fix the root problem they rewrite the product in a less-agile
language and require everyone to run opaque binaries.

They probably even think they're doing a good thing.

~~~
Analemma_
C is portable, bash scripts are not.

~~~
EdHominem
Bash is portable across other OSes... They could work on a good port. Or,
remove some bash-isms from the code so it would work in another shell if that
was an issue.

I understand they took the initially easy route. But it'll be harder for
everyone to use that code now, including them.

------
tambourine_man
It's interesting how all the cool things seem to come from Microsoft these
days.

I still think we need something better than Git, though. It brought some very
cool ideas and the inner workings are reasonably understandable, but the UI is
atrociously complicated. And yes, dealing with large files is a very sore
point.

I'd love to see a second attempt at a distributed version control system.

But I applaud MS's initiative. Git's got a lot of traction and mind share
already and they'd probably be heavily criticized if they tried to invent its
own thing, even if it was open sourced. Will take a long time to overcome its
embrace, extend and extinguish history.

~~~
sytse
Maybe something that has the data model of git but a more consistent
interface? Today at Git Merge there was a presentation about
[http://gitless.com/](http://gitless.com/)

For example, one of the goals is to always allow you to switch branches. Stash
and stash pop would happen automatically, and it would even work if you're in
the middle of a merge.

~~~
Ajedi32
I'm still waiting for a decent GUI that takes full advantage of the simplicity
of git's underlying data model. The CLI is okay and I've gotten really good
with it, but fundamentally I think git's DAG is something that would be best
represented and manipulated graphically.

[Reinventing the Git Interface][1] was written almost 3 years ago, and yet to
my knowledge nobody has implemented anything quite like it.

[1]: [http://tonsky.me/blog/reinventing-git-interface/](http://tonsky.me/blog/reinventing-git-interface/)

~~~
daxelrod
Git Kraken has some neat ideas around dragging bits of the DAG around to
manipulate them.

~~~
Ajedi32
Yeah, I've been meaning to try that for a while now. Unfortunately I can't use
it at work, because Kraken still doesn't support connecting to the internet
through a proxy, and they won't let you use it offline.

------
gvb
Using git with large repos and large (binary blob) files has been a pain point
for quite a while. There have been several attempts to solve the problem, none
of which have really taken off. I think all the attempts have been (too)
proprietary – without wide support, it doesn’t get adopted.

I'll be watching this to see if Microsoft can break the logjam. By open
sourcing the client and protocol, there is potential...

Other attempts:

* [https://github.com/blog/1986-announcing-git-large-file-stora...](https://github.com/blog/1986-announcing-git-large-file-storage-lfs)

* [https://confluence.atlassian.com/bitbucketserver/git-large-f...](https://confluence.atlassian.com/bitbucketserver/git-large-file-storage-794364846.html)

Article on GitHub’s implementation and issues (2015):
[https://medium.com/@megastep/github-s-large-file-storage-is-...](https://medium.com/@megastep/github-s-large-file-storage-is-no-panacea-for-open-source-quite-the-opposite-12c0e16a9a91)

~~~
cies
I think Joey Hess' attempt at "solving the problem" deserves a mention.

It is open source, GPLv3 licensed. [not proprietary]

Written in Haskell. [cool aid]

Currently has 1200+ stars on GitHub and is part of at least Ubuntu
([http://packages.ubuntu.com/search?keywords=git-annex](http://packages.ubuntu.com/search?keywords=git-annex))
since 12.04. [shows something for support and adoption]

edit: Link to GitHub:
[https://github.com/joeyh/git-annex](https://github.com/joeyh/git-annex) --
thanks dgellow

~~~
Ajedi32
For the problem of large files I think Git LFS has largely won out over git
annex, mostly because it's natively supported by GitHub and GitLab and
requires no workflow changes to use.
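(For context on how LFS hooks in: it registers clean/smudge filters via
`.gitattributes`. Running `git lfs track "*.psd"` -- the `*.psd` pattern here
is just an illustrative example -- writes a rule like the one below, after
which matching files are committed as small pointer files:)

```
*.psd filter=lfs diff=lfs merge=lfs -text
```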

~~~
WorldMaker
Atlassian's Bitbucket and Microsoft's Visual Studio Team Services both also
support Git LFS.

------
kentt
It's disappointing that all the comments are so negative. This is a great idea
and solves a real problem for a lot of use cases.

I remember years ago Facebook said it had this problem. A lot of the comments
back then centered on changing your codebase to fit what git can do. I'm glad
there's another option now.

~~~
anon987
It's because the 'problem' it solves is a corner case that's rarely
encountered. I love their absurd examples of repos that take 12 hours to
download. How many people have that problem, really?

All they did is create a caching layer.

~~~
nine_k
If you deal with code, the case is marginal for you.

If you deal with graphics, audio assets, etc, the binary-blob type of data,
the case is central.

~~~
aanm1988
This is about code, and code history. Just insane volumes.

------
wyldfire
I'm immediately reminded of MVFS and clearcase. Lots of companies still use
clearcase, but IMO it's not the best tool for the job. git is superior in most
dimensions. From what this article says, it's not quite the same as clearcase,
but there are certainly some hints of similarity.

The biggest PITA with clearcase was keeping their lousy MVFS kernel module in
sync with ever-advancing linux distros.

I really liked Clearcase in 1999, it was an incredible advancement over other
offerings then. MVFS was like "yeah! this is how I'd design a sweet revision
control system. Transparent revision access according to a ranked set of
rules, read-only files until checked out." But with global collaborators,
multi-site was too complex IMO. And overall, clearcase was so different from
other revision control systems that training people on it was a headache.
Performance for dynamic views would suffer for elements whose vtrees took a
lot of branches. Derived objects no longer made sense -- just too slow. Local
disk was cheap by then, and it grew much faster than object files did.

> However, we also have a handful of teams with repos of unusual size! ... You
> can see that in action when you run “git checkout” and it takes up to 3
> hours, or even a simple “git status” takes almost 10 minutes to run. That’s
> assuming you can get past the “git clone”, which takes 12+ hours.

This seems like a way-out-there use case, but it's good to know that there's
other solutions. I'd be tempted to partition the codebase by decades or
something.

~~~
tcbawo
I used Clearcase (on Solaris) in 1999 and was not a fan. It slowed our build
times by at least 10x. I'm sure it was probably set up wrong, but this was a
Fortune 100 company with lots of dedicated resources.

~~~
foobiekr
Clearcase performance for many builds was specifically impacted by the very
poor performance of stat(). You could make very real improvements on build
times by reducing the number of calls to stat(). It was sort of amazing.

Clearcase also suffered, at least in my experience, from a clumsy and ugly
merging process and deeply unintuitive command set which meant everyone who
"used clearcase" actually tended to use some terrible homegrown wrapper
scripts.

Still, considering it was the last remaining vestige of the Apollo Domain OS,
not bad.

------
dewyatt
I think they could have picked a name that doesn't conflict with GNOME Virtual
File System (GVfs).

~~~
wslh
They used to choose very bad names, like .NET or COM [1] (which predates the
Internet), that make searching for information very tricky. MSDN doesn't help.

[1]
[https://en.wikipedia.org/wiki/Component_Object_Model](https://en.wikipedia.org/wiki/Component_Object_Model)

~~~
foxrob92
A few years ago I had to do some interfacing between python and some modelling
software. I went through a COM interface, and it was a bloody nightmare to
find docs.

I later found out I could have looked for "ActiveX" and found similar results.

~~~
wslh
A few years ago your best friend would have been
[https://www.codeproject.com](https://www.codeproject.com). The problem with
searching for difficult questions under another keyword (e.g. ActiveX) is that
you can miss the only answer available. For common questions (with answers!)
you can find an answer under all the variations.

------
daigoba66
The article doesn't directly say it, but are they migrating the Windows source
code repository to git? That seems like a big deal.

I seem to recall that Microsoft has previously used a custom Perforce "fork"
for their larger code bases (Windows, Server, Office, etc.).

~~~
vtbassmatt
Yes, Windows is migrating to Git.

~~~
adrianN
Do you have a citation for that?

~~~
janwh
It was stated in Saeed Noursalehi's talk "Scaling Git at Microsoft", held at
Git-Merge 2017. Until the conference recordings are available, here is the
closest thing to a "source":
[https://twitter.com/no_more_ducks/status/827479795185364993](https://twitter.com/no_more_ducks/status/827479795185364993)

------
Ericson2314
If I understand this correctly, unlike git-annex and git lfs, this is not
about extending the git format with special handling for large files, but
about changing the algorithms on top of the current data format.

A custom filesystem is indeed the correct approach, and one that git itself
should have probably supported long ago. In fact, there should really only be
one "repo" per machine, name-spaced branches, and multiple mountpoints a la
`git worktree`. In other words there should be a system daemon managing a
single global object store.
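A rough sketch of that shape is already possible with stock git's `git
worktree` (all repo and branch names below are made-up throwaway examples):

```shell
cd "$(mktemp -d)"                  # throwaway playground
git init demo && cd demo
git config user.name demo && git config user.email demo@example.com
git commit --allow-empty -m "root"
git branch feature
# Second mountpoint onto the same object store, checked out to another branch.
git worktree add ../demo-feature feature
git worktree list                  # both working trees, one shared .git
```

Each linked worktree shares the single object database under `.git` -- the
"one repo, many mountpoints" idea, minus the system-wide daemon.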

I wonder/hope IPFS can benefit from this implementation on Windows, where FUSE
isn't an option.

~~~
manojlds
The blog post does mention that some changes have been made to git (in their
fork)

~~~
sdesol
I did a quick comparison of Microsoft's fork and it appears they have done
quite a bit with it.

Microsoft's fork contains 67,522 commits. The official Git repo contains
45,810. It appears the bulk of the work started in 2010, with significant ramp
up of development in 2015.

[https://gitsense.com/mgit-vs-git/history.png](https://gitsense.com/mgit-vs-git/history.png)

Looks like Microsoft only really introduced about 100 more new files.

[https://gitsense.com/mgit-vs-git/files.png](https://gitsense.com/mgit-vs-git/files.png)

Microsoft's repo contains 1712 contributors. Git's repo contains 1685
contributors. So it looks like 20-30 employees worked on Microsoft's fork.

[https://gitsense.com/mgit-vs-git/mgit-contributors.png](https://gitsense.com/mgit-vs-git/mgit-contributors.png)
[https://gitsense.com/mgit-vs-git/git-contributors.png](https://gitsense.com/mgit-vs-git/git-contributors.png)

------
hoov
This is pretty big news. I know that when I was at Adobe, the only reason
Perforce was used for things like Acrobat is that it was simply the only
source control solution that could handle the size of the repo. Smaller
projects were starting to use Git, but the big projects all stuck with
Perforce.

------
kevincox
I love this approach. From working at Google I appreciate the virtual
filesystem; it makes a lot of things a lot easier. However, all my repos are
small enough to fit on a single machine, so I wish there were a mode where it
was backed by a local repository but the filesystem still allowed git to avoid
tree scans.

Basically, most operations in git are O(modified files), but there are a few
that are O(working tree size) -- for example, the checkout and status commands
mentioned in the article. These operations can also be made O(modified files)
if git doesn't have to scan the working tree for changes.
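(An aside from a later vantage point: git 2.37+ ships a builtin filesystem
monitor on supported platforms that gets `git status` close to the O(modified
files) behavior described above. A sketch, with a made-up repo name:)

```shell
cd "$(mktemp -d)"
git init demo && cd demo
# Opt in per repository: a daemon watches the worktree and hands `git status`
# the list of changed paths, so it no longer has to stat every file.
git config core.fsmonitor true
git config core.untrackedCache true
```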

So pretty much I would be all over this if:

\- It worked locally.

\- It worked on Linux.

Maybe I'll see how it's implemented and see if I could add the features
required. I'm really excited for the future of this project.

------
rethab
Assuming that the repo was this big from the beginning, I wonder why they ever
migrated to git (I'm assuming they did, because they can tell how long a
checkout takes). When somebody "tried" the migration, wouldn't they have
realized that maybe git is not the right tool for them? Or did they actually
migrate and then work with a "git status" that takes 10 minutes for some time,
until they realized they needed to change something?

Also, it would have been interesting if the article mentioned whether they
tried other approaches taken by facebook (mercurial afaik) or google.

~~~
xearl
To me it sounds like these numbers are from a migration-in-progress. So they
are trying, but instead of giving up and saying "not the right tool for us"
they are trying to improve the tool.

------
imron
> repos of unusual size

Sounds like they've almost solved the secrets of the fire swamp!

~~~
krallja
Repos of Unusual Size? I don't think they exist.

------
rbanffy
Did they really need to make a name collision?

[https://en.wikipedia.org/wiki/GVfs](https://en.wikipedia.org/wiki/GVfs)

------
Navarr
This sounds like a solid use case and a solid extension for that use case -
but definitely not the end-all-be-all.

For one, it's not really distributed if you're only downloading when you need
that specific file.

But that doesn't change the merits of this at all, I think.

------
cafebabbe
My sysadmin: "we won't switch to git because it can't handle binary files and
our code base is too big"

Our whole codebase is 800MB.

~~~
alkonaut
Our codebase (latest tree) is a similar size, but when switching to git it's
the total history size that is the problem. Our history is well over 25GB,
which git doesn't handle very gracefully.

~~~
kevincox
History shouldn't be a problem, you can do a shallow checkout. But you will
have to store the working tree at least on your workstation.

This solves the next scaling problem: avoiding managing the whole working tree
(without requiring narrow clones, which have significant downsides).

~~~
alkonaut
Yeah, the working tree works well to have locally, and that's what's done with
svn currently.

The problem is that I also want a fast log/blame for any file back to the
beginning of time - but I'm ok with that requiring devs connecting to the
server containing the history (as with svn).

I also haven't found a way to make git work smoothly with shallow mode as the
default. E.g., can I make checkout of a branch always remember it must be
shallow? Can I make log use remote history when necessary, etc.? I don't want
to fight the tool all the time because I'm using a nonstandard approach.
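(A note from later git versions, since this comment predates them: "partial
clone", git 2.19+, is close to what's being asked for -- the full commit
history locally, so log is fast, with file contents fetched on demand. A
self-contained sketch using throwaway local repos:)

```shell
cd "$(mktemp -d)"
git init upstream && cd upstream
git config user.name demo && git config user.email demo@example.com
git config uploadpack.allowFilter true   # let this "server" repo honor --filter
echo one > file && git add file && git commit -m "c1"
echo two > file && git commit -am "c2"
cd ..
# Blobless clone: all commits and trees come down, blobs only on demand.
git clone --filter=blob:none "file://$PWD/upstream" partial
git -C partial log --oneline             # full history is available locally
```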

------
yakk0
I appreciated the Princess Bride reference with "repos of unusual size"

~~~
wyldfire
I don't believe in them.

------
0X1A
Just to make sure I have this right: this has to do with the _number_ of files
in their repo and not the _size_ of the files? So projects like git annex and
LFS would not help the speed of the git repos?

~~~
WorldMaker
That's how I read it, that this is about monorepos with file trees with large
numbers of files where users don't necessarily need every single file in their
local worktree to get work done.

I'd assume this GVFS would work hand in hand with Git LFS for the use case of
large files.

------
OJFord
> _when you run “git checkout” and it takes up to 3 hours, or even a simple
> “git status” takes almost 10 minutes to run. That’s assuming you can get
> past the “git clone”, which takes 12+ hours._

How on Earth can anybody work like that?

I'd have thought you may as well ditch git at that point, since nobody's going
to be _using_ it as a tool, surely?

    
    
    git commit -m "Add today's work - night all!" && git push; shutdown

~~~
stinos
_How on Earth can anybody work like that?_

Since it looks like they are still migrating, I don't think a lot of people
actually did work like that. Maybe just a couple of times, to figure out how
long it would actually take. Or maybe those who really use it are doing
shallow clones, which would probably take much less time. Shallow clones are
nice but don't seem to be very well known. I use them often when I know I
won't ever need the full history anyway. Also great for shaving time off CI
builds.
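A minimal local demonstration of the effect (repo names are invented for the
demo; real use would point at a remote URL):

```shell
cd "$(mktemp -d)"
git init upstream && cd upstream
git config user.name demo && git config user.email demo@example.com
for i in 1 2 3; do git commit --allow-empty -m "commit $i"; done
cd ..
# --depth needs a real transport, so use file:// rather than a bare local path
git clone --depth 1 "file://$PWD/upstream" shallow
git -C shallow rev-list --count HEAD     # 1 commit instead of 3
# deepen later if the history turns out to be needed after all
git -C shallow fetch --unshallow
```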

~~~
OJFord
Shallow clones are great, until they're not. I don't think I've ever (having
tried a few times) cleanly cloned 'below' the graft point when I've needed to,
or a different branch.

------
mortdeus
Or how about we start compartmentalizing your codebase so that you can, like,
you know, organize your code and restore sanity to the known universe.

I think when the powers that be said that whole thing about geniuses and
clutter, they were specifically talking about their living spaces and not
their work...

------
zwischenzug
Does anyone know how Microsoft's open source policy works internally? I'm thinking
from a governance perspective, as I'm involved in a similar effort at $WORK.

------
scotty79
I had a medium-sized Ruby on Rails project as a git repo inside a VM.

It was slow to do 'git status' and other common commands. Restarting the RoR
app was also slow. I put the repo on a RAM disk, which made the whole
experience at least a few times faster.

Since it was all in a VM that I rarely restarted, I didn't have to recreate
the files on the RAM disk very often. I synced changes to the persistent disk
with rsync running periodically.

------
myrandomcomment
"For example, the Windows codebase has over 3.5 million files and is over 270
GB in size."

Okay, so is this a networking issue, or a stick-everything-in-the-same-branch
issue?

Whatever the reason, the issue here is pure size vs. network pipe, plain and
simple. Hmm, when can I get a laptop with a 10GBaseT interface?

One of the issues with the way they are doing this (only grabbing files when
needed) is that you cannot really work offline anymore.

------
amingilani
I'm no expert, but if most individual developers only use 5-10% of the
codebase in their daily work, wouldn't it make sense to break the project into
multiple codebases of about 5% each and use a build pipeline that combines
them when needed?

Although I could definitely be wrong, this sounds a lot like monolith vs.
microservices to me.

------
nojvek
Microsoft is moving away from Source Depot to git, it seems. I think it's
fantastic that a company like Microsoft is adopting git for its big king and
queen projects such as Office and Windows. Also, open sourcing the underlying
magic says a lot about the new Microsoft. They're really moving away from
not-invented-here syndrome.

------
krishoog
Does this article imply that Microsoft itself is also moving towards Git,
instead of using their own products like TFS?

~~~
daigoba66
TFS has first-class support for Git repositories (in addition to the classic
TFSVC repositories). So yes, they're moving more and more to Git. But no,
they're not abandoning TFS.

Interestingly, however, most of their "open source" efforts (.NET, C#, and
related) are all on GitHub rather than their own hosted offerings: CodePlex
(which is basically dead) or "Visual Studio Team Services".

~~~
vtbassmatt
Not sure why Visual Studio Team Services is in scare quotes -- that's the
product's name. And it's not an open source hosting service, which handily
explains why Microsoft's open source isn't hosted there.

Disclosure: I'm a PM on VSTS/TFS, and I own part of version control.

~~~
sterwill
Are those scare quotes or just regular old quotes? TFS and associated
technologies have been through a lot of names (Visual Studio Team System, TFS,
Team Services, and probably a few I can't remember).

Disclaimer: used to work on TFS team.

~~~
vtbassmatt
Ha, fair point :)

------
b1gtuna
MS has been doing really neat stuff lately. I've never worked on a project
that takes hours to clone. The largest repository I regularly clone is the
Linux repo, and it still takes only a few minutes. Yet I can see GVFS being
beneficial for me, as I spend most of my time just reading the code (so no
need to compile) on my laptop.

------
alkonaut
Could this also help a smaller repo with a long history that makes the total
repo size too large?

The whole tree is needed by every developer -- i.e. it's not possible to do a
sparse checkout -- but I would prefer to keep the many gigs of old versions of
small binaries only on the server until I need them (which is never).

------
acqq
And for all those who still try to stick to anything older:

[https://github.com/Microsoft/gvfs](https://github.com/Microsoft/gvfs)

"GVFS requires Windows 10 Anniversary Update or later."

------
srott
I remember that a few years ago Git under Windows was very slow; is that still true?

~~~
WorldMaker
Git on Windows has gotten very fast and stable in the last few years.
Microsoft employees themselves, among others of course, have directly
contributed to a much better Git experience on Windows.

~~~
Groxx
The reddit thread has quite a few people with opposing opinions, fwiw. Mostly
"stuff that's ~instant on unix takes many seconds on Windows" and the like.
It's true that Microsoft has contributed a lot (to the benefit of all), but
from what I'm seeing it sounds like it's still lagging quite a bit.

I haven't touched Windows in quite a while, so I can't really make a claim
either way.

~~~
WorldMaker
I'm at least speaking from daily use in my anecdotes. Apples to apples, yes,
Windows is going to lag behind Linux. [1] That doesn't mean it isn't fast and
stable from the perspective of day-to-day Windows usage, and, as I stated in
the previous comment, it is definitely much faster and more stable on Windows
today compared to Windows a few years ago.

Several of the anecdotes in the reddit thread don't even seem to take into
account what version the offending slowness was happening in, and anecdotally every
time I've helped a Windows user experiencing slowness enough to complain about
it, they've been years behind on their git version and installing the latest
removed the complaints.

[1] ...and is just about guaranteed to in the many places in git where a
command is still built as a tower of bash scripts calling perl scripts calling
more bash scripts... If you read the changelogs, a lot of the performance
optimizations that are helping _every platform_ are the places where entire
commands are getting replaced with C versions of themselves.

~~~
Groxx
Ah, you're right, you were referring to on-windows progress.

And "years behind on their git version" is I think the norm for git users :) I
pretty regularly have to recommend that coworkers / etc upgrade from git 1.7
(or 1.8 or something similar) to an even-remotely-modern version.

------
dstaheli
Check out the GVFS back story and details here:
[https://news.ycombinator.com/item?id=13563439](https://news.ycombinator.com/item?id=13563439)

------
pjmlp
Quite a nice use of C# and C++/CX for a virtual file system implementation.

~~~
contextfree
looks like C++/CLI (C++/CX reused its syntax and maybe parsing code, but
they're still distinct)

~~~
pjmlp
Yes, which is why Microsoft had a lot of trouble convincing developers who
don't read documentation that they are distinct and that C++/CX gets compiled
to just pure native code; those developers were spreading misinformation about
it.

In any case, when C++/WinRT reaches feature parity, I imagine one of them will
eventually be deprecated, depending on which gets more developer love.

------
lolikoisuru
Is it really that fucking hard to check if your package name is unique?

Here is another virtual filesystem with the exact same name:
[https://wiki.gnome.org/Projects/gvfs](https://wiki.gnome.org/Projects/gvfs)

Debian package for it:
[https://packages.debian.org/jessie/gvfs](https://packages.debian.org/jessie/gvfs)

------
mfontani
So... what happens when one runs "git grep foo" on it?

~~~
kevincox
It will be slow. Small steps. But in practice companies with large repos have
other search solutions so that each user doesn't have to do a raw search on
the entire working tree.

------
igtztorrero
Anybody know what Linus thinks about it?

------
cikey
Can we use this together with git LFS?

------
ianopolous
Couldn't they use git over IPFS?

~~~
kevincox
No. The problem isn't only the storage or fetching of the files (that's the
easy bit :) ); it's the operations that detect changes in the working tree. If
you have a large tree, scanning it becomes slow.

Using a vfs allows you to track which files have changed so that these
operations no longer need to scan. Now they are O(changed files) which is
generally small.

Now IPFS has a vfs, but it is just a simple read/write interface. This vfs
needs slightly more logic to do things like change the base revision and track
changes.

~~~
ianopolous
IPFS clearly does a lot more than storing and fetching files. Seriously, go
have a read. A single hash can represent an arbitrarily large subtree of data
(Microsoft's entire repo). Using an IPLD selector (in its simplest form, a
path beyond the hash) an arbitrary sub component can be addressed. This can be
used to avoid scanning entire subtrees (maintaining your O(changed files)). To
commit your modifications is O(changed files + tree depth to the root of your
modifications) you never need to do anything with the rest of the repo.

For tracking changes (i.e. mutable data) you can use IPNS and create a signed
commit history. This will be built on IPFS eventually so it's only a matter of
time.

------
zahreeley
Don't believe in modular development with smaller repos?

~~~
Groxx
Yeah, I see things like this, and I always wonder why they don't make a
submodule tree.

It wasn't an option a couple years ago, but submodules work fine now. With a
little bit of scripting to wrap common uses, they're practically pain-free.

~~~
shandor
Could you elaborate a little what has changed there? My understanding is that
submodules are still considered a mess, but would be really nice if some
actual improvements have happened.

~~~
Groxx
[https://github.com/blog/2104-working-with-submodules](https://github.com/blog/2104-working-with-submodules)
is a decent overview of what was available a couple years ago (things have
improved a bit since then too), though it needs a tl;dr. So here's an attempt.

1) When you cd into a submodule, it's the same as if you had just cloned into
there; all normal git commands work. Need to update your submodule-lib? cd, do
stuff, git push, _at worst_.

2) `git clone --recursive` instead of just `git clone`; no need for `git
submodule update --init` / etc.

3) `git pull` will automatically pull submodules when the parent repo changes
which commit it's using. `git push` should push any changes too, though the
manpage isn't explicit (there's an identical flag/config value for push as for
pull to control this). also solvable with `pushall` and `pullall` aliases,
which is a very minor re-education.

4) submodules can track submodule-repo branches, not just commits. auto-
updating ftw? if you want it.

5) there are some somewhat-unhappy defaults / you probably want `git diff
--submodule=log` and `git config --global status.submoduleSummary true`, etc.
these (and aliases) are easily fixed the same way as you probably already have
for templated .gitignore / etc - just generate some company-wide defaults, and
move on with your life.

---

A lot of the "you have to git submodule command everything all the time" is a
thing of the past, the difficulty now is largely related to it being a minor
conceptual difference from a monorepo. It's a repo in a repo, and you're
manipulating the pointer to the version. There are more options because of
this, but they exist for good reasons, and they're not too hard to wrap your
head around.

[https://git-scm.com/book/en/v2/Git-Tools-Submodules](https://git-scm.com/book/en/v2/Git-Tools-Submodules)
also has some nice examples, and e.g.
`git submodule foreach` can simplify a lot if you actually dive into
submodules and make changes across multiple simultaneously (big refactor
maybe?).
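To make the workflow above concrete, here's a throwaway end-to-end run (repo
names are invented; the `protocol.file.allow` override is only needed because
newer gits block local-path submodules by default):

```shell
cd "$(mktemp -d)"
# A library repo to be embedded.
git init lib && cd lib
git config user.name demo && git config user.email demo@example.com
git commit --allow-empty -m "lib v1"
cd ..
# The superproject, pinning lib at a specific commit under vendor/lib.
git init app && cd app
git config user.name demo && git config user.email demo@example.com
git commit --allow-empty -m "app root"
git -c protocol.file.allow=always submodule add ../lib vendor/lib
git commit -m "pin lib"
git submodule status   # shows the pinned commit and path
```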

------
testUser69
Why is that so hard to believe? America is run by Donald Trump.

The problems with these companies is that developers aren't making technical
decisions, it's executives who know nothing about computer science. That's why
Windows 10 is such a mess with spyware and adware.

Now they have some FOSS advocate who doesn't really know anything about
software or VCS but saw that an internal problem they were trying to solve was
making their code base work with git. So he decided it would be really cool
for Microsoft's image to develop an open source extension of git, instead of
actually solving the underlying problems (because he didn't recognize them).
Now he's probably got a promotion at Microsoft for "fixing" their problem with
git.

~~~
dang
We detached this subthread from
[https://news.ycombinator.com/item?id=13559893](https://news.ycombinator.com/item?id=13559893)
and marked it off-topic.

------
ksec
Interesting: M$ is moving to Git, and the rest of the world is pretty much on
GitHub & alternatives, while Facebook and Google are going with Mercurial. I
actually liked Mercurial, apart from its name being a little hard to
pronounce, but it doesn't seem to get used anywhere.

So are the DVCS converging to Git and Git only?

~~~
jgalt212
In the open source arena, it certainly seems like the game is over. But as
others mentioned, both FB and GOOG are big Mercurial users and contributors.

Our shop uses Mercurial because of its Python basis, and the amount of time
and effort it takes to master Git makes me draw strong and uncomfortable
parallels to emacs.

