"Most developers access Piper through a system called Clients in the Cloud, or CitC, which consists of a cloud-based storage backend and a Linux-only FUSE file system. Developers see their workspaces as directories in the file system, including their changes overlaid on top of the full Piper repository. CitC supports code browsing and normal Unix tools with no need to clone or sync state locally. Developers can browse and edit files anywhere across the Piper repository, and only modified files are stored in their workspace. This structure means CitC workspaces typically consume only a small amount of storage (an average workspace has fewer than 10 files) while presenting a seamless view of the entire Piper codebase to the developer."
This is a very powerful model when dealing with large code bases, as it solves the issue of downloading all the code to each client. Kudos to Microsoft for open sourcing it, and under the MIT license no less.
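For anyone who wants to see the overlay idea from the quote in miniature, here is a rough Python sketch. It is only a toy model, not how CitC or GVFS is actually built: the `OverlayWorkspace` class, the dict standing in for the repository, and the file paths are all made up for illustration.

```python
class OverlayWorkspace:
    """Toy model of a workspace overlaid on a shared, read-only repository."""

    def __init__(self, repository):
        self._repository = repository  # shared snapshot, never mutated here
        self._modified = {}            # path -> content; the only per-workspace state

    def read(self, path):
        # Local changes win; everything else falls through to the repository.
        if path in self._modified:
            return self._modified[path]
        try:
            return self._repository[path]
        except KeyError:
            raise FileNotFoundError(path) from None

    def write(self, path, content):
        # Only the files a developer actually edits take up workspace storage.
        self._modified[path] = content

    def stored_paths(self):
        return sorted(self._modified)


# Hypothetical repository contents.
repo = {"search/index.cc": "v1", "maps/tiles.cc": "v1", "mail/smtp.cc": "v1"}
ws = OverlayWorkspace(repo)
ws.write("maps/tiles.cc", "v2-local")

print(ws.read("maps/tiles.cc"))  # the local edit
print(ws.read("mail/smtp.cc"))   # served from the shared repository
print(ws.stored_paths())         # only the modified file is stored
```

The workspace sees every path in the repository, but its own storage grows only with the number of files it has changed.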
ClearCase did what a lot of Enterprise companies needed at the time, and most importantly, it created hooks that were mostly too difficult to remove. Once you create deep integration with ClearCase, you are very much committed to using it long term.
For those who have never worked with/administered ClearCase before, you may not fully appreciate how insanely complex it is. In order to use it, you first have to apply kernel patches from IBM, which shows how committed you had to be. ClearCase provided something that others couldn't, which is why it was so expensive. With Git, everything has changed.
Since nobody owns Git and its implementation, the differentiating factor right now is mostly superficial. There really isn't anything, other than hosting repos at a massive scale, that can't be easily duplicated. Git hosting, in my opinion, is now officially a commodity product. And what differentiates GitHub, GitLab, Bitbucket, etc. is mostly marketing.
With GVFS, things could change. This could be the first step in Microsoft owning the hard part that can't be easily duplicated by others. I really don't know what is on their roadmap, but views in ClearCase were pretty powerful and if they are looking at the level of integration, then it could be tough for GitLab, GitHub and others to follow.
I ruminated about ccase/git elsewhere in this thread:
Google's dev infra is pretty amazing and it's at least a decade ahead of anything else I've seen. Every single ex-Googler misses it quite a bit.
This would be like if I worked on the core Windows SDKs and I could routinely test my changes against everything from Microsoft Flight Simulator to the Bing server code before I submit.
Because the build server is centralized it can be aggressive about caching intermediate build steps. Incremental builds aren't just incremental for you, but incremental for everybody.
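A small sketch of why a shared cache makes builds "incremental for everybody": if build steps are keyed by a hash of the command plus the exact input contents, anyone who asks for a step that somebody else already ran gets the stored output back. This is a guess at the general technique, not a description of Forge; `SharedBuildCache` and the fake `_run` step are hypothetical.

```python
import hashlib


class SharedBuildCache:
    """Content-addressed cache of build step outputs, shared by all users."""

    def __init__(self):
        self._outputs = {}
        self.compiles = 0  # how many steps actually had to run

    def build(self, command, sources):
        key = self._key(command, sources)
        if key not in self._outputs:
            self.compiles += 1
            self._outputs[key] = self._run(command, sources)
        return self._outputs[key]

    @staticmethod
    def _key(command, sources):
        # Same command and same input contents give the same key for every user.
        digest = hashlib.sha256(command.encode())
        for path in sorted(sources):
            digest.update(b"\0" + path.encode() + b"\0" + sources[path].encode())
        return digest.hexdigest()

    @staticmethod
    def _run(command, sources):
        # Stand-in for a real compile step.
        return f"{command}({', '.join(sorted(sources))})"


cache = SharedBuildCache()
alice = cache.build("cc", {"a.c": "int a;"})
bob = cache.build("cc", {"a.c": "int a;"})        # same inputs: cache hit
carol = cache.build("cc", {"a.c": "int a = 1;"})  # edited file: rebuilt
print(cache.compiles)
```

Bob never pays for the compile Alice already triggered; only Carol's edit causes new work.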
You could. It would just not leave your branch for a while. Around the scheduled merges it would run against the tests of progressively more of the larger organization.
Parts of this actually constituted a good way to prevent being distracted by the bugs of faraway teams. If something reached your branch, where you were working, it was vetted by the tests required to make it into winmain.
The downside was that people got fairly political about what goes into the branch and when, even for small things.
Googlers like to joke internally that Google looks like a race car from the outside and like the Moving Castle from Hayao Miyazaki's cartoon from the inside, but that's not the case at all. Comparatively speaking it's a race car inside and out, it's just that the insiders don't know how shitty things are elsewhere.
P.S. I heard Bing is different, but I have no visibility into it, so can't comment.
At the time, Azure was a joke (partly due to the fact that the initial teams were headed up by ex Office devs with no cloud experience, if I remember correctly). But Azure was cannibalizing the Bing team pretty hard. I hear that strategy worked and that Azure is in much more capable hands now.
I dunno. I don't miss the 1-minute incremental builds. (Maybe they've improved since I left, though.)
BTW Forge is not just the test runner, but the thing that runs all build tasks, farmed out to all servers. Blaze interprets the build language and does dependency tree analysis but then hands off the tasks to Forge. Blaze has been (partially) open sourced: https://bazel.build/
This may be naive but why not recreate it as an open source project?
Forge and Piper are built on Google's internal tech stack and designed for Google's production infrastructure, so open sourcing them would be a very big project. I think it would be a lot more likely for them to be offered as a service -- and that might be more useful to users anyway, since you'd be able to share resources with everyone else doing builds, rather than try to get your own cluster running which might sit idle a lot of the time. Of course, there are privacy issues, etc.
(Disclaimer: I'm purely speculating. I left Google over four years ago, and have no idea what the tools people are up to today.)
For smaller projects, Git+Bazel (open source, non-distributed version of Blaze) works fine if you're working with C++, and other build systems work OK as well, if you're working with other languages.
That's just, like, your opinion, man. There are other bits of infra that integrate with it quite nicely, and would integrate with something like Git quite poorly. One of those things is their code review system. The closest thing I could find to it outside Google is Gerrit, but it's a tremendous pain to set up and use, and it's but a pale shadow of Google's internal tool (Critique).
And also, one does not preclude another: Google has a git wrapper on top of Piper, so you can spend your entire Google career not even touching Piper directly if that's what you prefer. And Piper went beyond the "Perforce model" in ways I can't disclose here.
I've used lots of review tools and worked a bit on Google's review tool and on ReviewBoard in the past, and Reviewable is better than all of them in my opinion (or at least, better than when I last used the others).
One of the core differences between Windows and Linux is process creation. It's slower - relatively - on Windows. Since Git is largely implemented as many Bash scripts that run as separate processes, the performance is slower on Windows. We’re working with the git community to move more of these scripts to native cross-platform components written in C, like we did with interactive rebase. This will make Git faster for all systems, including a big boost to performance on Windows.
Sad. Rather than fix the root problem they rewrite the product in a less-agile language and require everyone to run opaque binaries.
They probably even think they're doing a good thing.
I understand they took the initially easy route. But it'll be harder for everyone to use that code now, including them.
I still think we need something better than Git, though.
It brought some very cool ideas and the inner workings are reasonably understandable, but the UI is atrociously complicated. And yes, dealing with large files is a very sore point.
I'd love to see a second attempt at a distributed version control system.
But I applaud MS's initiative. Git's got a lot of traction and mind share already and they'd probably be heavily criticized if they tried to invent their own thing, even if it was open sourced. It will take a long time to overcome its embrace, extend and extinguish history.
Note that Google and Facebook ran into the same problems Microsoft did, and their solution was to use Mercurial and build similar systems on top of it. Microsoft could've done that too, but instead decided to improve Git, which deserves some commendation. I'd rather Git and hg both got better rather than one "taking over".
They didn't improve git, they only made this for themselves and for their product users. Git doesn't restrict you to a single operating system.
Given Microsoft's recent form, I'd expect this to appear on Linux before long, and possibly osx too. In any case, it's open source so you could always port it yourself.
Of course, but that would be me, and not Microsoft, who's improving git ;-)
I've assumed Microsoft have been making all this stuff all along, but keeping it internal then throwing it away on the probably false assumption that every bit of it is some sort of competitive advantage. I think they're coming around to the idea that at least appearing constructive and helpful to the developer community will help with trying to hire good developers.
For example one of the goals is to always allow you to switch branches. Stash and stash pop would happen automatically and it would even work if you're in the middle of a merge.
[Reinventing the Git Interface] was written almost 3 years ago now and yet to my knowledge nobody's implemented anything quite like that yet.
Out of curiosity, why a whole new attempt? Personally, I'd prefer the approach of "making our current tools better."
Until 1997, forking a project was considered a tragedy. I think things have improved since then :-).
> I'd love to see a second attempt at a distributed version control system.
The story of git is a good case-study for people interested in group dynamics.
I was a heavy darcs user at the time and the impression I got was that part of the reason for the name git in the first place was that it was intentionally the "dumb, dirty, get things done" answer to darcs' (sometimes problematic) smarts. (Remember, the British slang definition of git is "an unpleasant or contemptible person".)
It's also interesting that both Mercurial and git were spun out of the BitKeeper fiasco (BitKeeper was a commercial product that allowed free hosting for Open Source projects, up until the fiasco where they decided they were bored hosting Open Source) by Linux kernel members. Mercurial actually wound up with the lead and, if I recall correctly, became usable much faster than git did. The problems with Mercurial were that it was written in Python, while git was led by Linus himself and written in the C, perl, bash, awk, sed, spit, and duct tape development environment that kernel hackers apparently prefer.
It's like a whole'nother company after they got rid of Steve Ballmer.
Linus himself admitted that he isn't good at UI. Anyway, I think git just wasn't designed to be used directly, but via another UI. For example, I use it within Visual Studio Code, and that covers about 90 percent of use cases, and then Git Extensions can take care of almost everything else. Sometimes the CLI is needed, though.
I've still not yet seen a stand-alone GUI for Git that is better than the one that ships with Git Extensions, though.
I'll be watching this to see if Microsoft can break the logjam. By open sourcing the client and protocol, there is potential...
Article on GitHub’s implementation and issues (2015):
It is open source (GPLV3) licensed. [not proprietary]
Written in Haskell. [cool aid]
Currently has 1200+ stars on Github and is part of at least Ubuntu (http://packages.ubuntu.com/search?keywords=git-annex) since 12.04. [shows something for support and adoption]
edit: Link to Github https://github.com/joeyh/git-annex -- thanks dgellow
Pros of git-annex:
- it is conceptually very simple: use symlinks instead of ad-hoc pointer files, a virtual file system, etc. to represent a symbolic pointer that points to the actual blob file;
- you can add support for any backend storage you want. As long as it supports basic CRUD operations, git-annex can have it as a remote;
- you can quickly clone a huge repo by just cloning the metadata of the repo (--no-content in git-annex) and just download the necessary files on-demand;
And many other things that no other attempt even considers having, like client-side encryption, location tracking, etc.
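To make the symlink idea concrete, here is a toy Python sketch of "the file in the tree is a symlink to a content-addressed blob, and the blob can be dropped or fetched separately". It is not git-annex's real on-disk layout or key format; the `objects` directory, the function names, and the dict used as a "remote" are all assumptions for illustration, and it needs a platform where symlinks work.

```python
import hashlib
import os
import tempfile


def add_file(repo_dir, name, data):
    """Store data under its hash and leave a symlink named `name` pointing at it."""
    key = hashlib.sha256(data).hexdigest()
    store = os.path.join(repo_dir, "objects")  # hypothetical layout
    os.makedirs(store, exist_ok=True)
    with open(os.path.join(store, key), "wb") as blob:
        blob.write(data)
    # Relative target, resolved from the directory that holds the link.
    os.symlink(os.path.join("objects", key), os.path.join(repo_dir, name))
    return key


def drop_content(repo_dir, name):
    """Delete the local blob; the symlink (the metadata) stays behind, dangling."""
    os.remove(os.path.realpath(os.path.join(repo_dir, name)))


def get_content(repo_dir, name, remote):
    """Fetch the blob on demand from `remote`, a dict of key -> bytes."""
    key = os.path.basename(os.readlink(os.path.join(repo_dir, name)))
    with open(os.path.join(repo_dir, "objects", key), "wb") as blob:
        blob.write(remote[key])


repo_dir = tempfile.mkdtemp()
payload = b"pretend this is a 2 GB video"
key = add_file(repo_dir, "video.mov", payload)
remote = {key: payload}  # any store with basic get/put could play this role

drop_content(repo_dir, "video.mov")
present_after_drop = os.path.exists(os.path.join(repo_dir, "video.mov"))
get_content(repo_dir, "video.mov", remote)
```

After `drop_content` the tree still lists `video.mov`, but the link dangles until the content is fetched again, which is roughly the "clone the metadata, download on demand" workflow described above.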
The other half is that almost all of the binary formats can't be merged and so you need a mechanism to lock them to prevent people from wiping out other people's changes. Unfortunately that runs pretty much counter to the idea of a DVCS.
Don't their artists and designers use version control too? Maybe they just have one such person per team, or each person owns one file, or something like that. Hard to say.
Maybe it's like how I used to work on teams that never used branches - you have various problems that you figure there's probably a solution for, but there's never time to (a) figure out what the solution looks like, (b) shift the whole team over to a brand new workflow and set of tools, and (c) clean up the inevitable mess. So you just work around the problems the same way you always have - because at least that's a known quantity.
Perforce was always the gold standard for stuff like this. Did a great job at not only providing locking but stuff like seamless proxies and other solutions to common problems in that domain (like a usable UI).
This way you can look at both and resolve the conflict.
If two people touch the same file at the same time someone is going to drop their work on the floor and that's a bad thing(tm). You need to synchronize their work with a locking mechanism that informs the user at the edit (not sync) point in the workflow.
I remember that years ago Facebook said it had this problem. A lot of the comments centered on the idea that you could change your codebase to fit what git can do. I'm glad there's another option now.
As well, the Mercurial team does quarterly sprints (I believe), and Google is hosting the next one.
We have new workflows based on some of the underpinnings of Evolution, but without the UI confusion.
At Splunk we had the same problem, our source code was stored in CVS (perforce), but we wanted to switch to git. And not only because we really wanted to use git, but to simplify our development process, mainly because of the much easier branching model (lightweight branching is also available in perforce, but to get it we still needed to do some upgrades on our servers). We also had a problem that at the beginning we had a very large working tree. I don't think it was 200-300GB, I believe it was 10x less, and it actually required 4-5 seconds for git status. This was not appropriate for us, so we worked on our source code and release builds to split it into several git repos to make sure that git status would take no more than 0.x seconds.
My point is use right tools for right jobs. 4-5 seconds for git status is still a huge problem, I would prefer to use CVS instead if that will not require me to wait 5 seconds for each git status invocation.
How many of them have you used? I've used a couple, to interact with large code bases on the rough order of 300GB. In my experience they don't work very well, because you have to be hygienic about the commands you run or some part of your Git state gets out of sync with some part of your state for the other source control system. So I gave up on those, and I use something similar to Microsoft's solution at work on a daily basis. It's a real pleasure by comparison, and in spite of that I still call myself a Git fan (about 10 years of heavy Git use now). At work the code base is monolithic and everyone commits directly to trunk (at a ridiculous rate, too).
I've heard horror stories about back when people had to do partial checkouts of the source code, and I'm glad that the tooling I use is better.
The idea of breaking up a repository merely because it is too large reminds me of the story of the drunkard looking for his keys under the streetlights. The right tools for the right job, sometimes you change the job to match the tools, and sometimes you change the tools to match the job.
Do you need to do anything special? Or is this just a non-issue? Do you push to master or do you use some sort of pull request GUI (like github or phabricator)?
But I was always just using Git as an interface to something else, usually Perforce or something similar. When pushing with these tools, you'd only get conflicts if other people changed the same files. Git was just used to create a bunch of intermediate commits and failed experiments on your workstation, which is something that it really excels at.
The only real problem is when the file you're changing is modified by many people on different teams, which often means that it's used for operations, and when that becomes a bottleneck it'll get refactored into multiple files or the data will be moved out of source control.
My point was that with GVFS they are not really solving the problem they had - git status still takes 4-5 seconds, to me that is a lot.
Well, yeah. It's pre-production, and let Microsoft worry about their own problems anyway. But it sounds like GVFS will be killer for people who have large repos that aren't as large as the Windows repo. Even if 4-5s for the 270GB/3.5M file repo is too long, 400-500ms for the 27GB repo is fantastic.
At some point you ask yourself, "Would I split this repo if my tools could handle the combined repo just fine?" If the answer is no, then you're going to be happy that the tools are getting better at handling big repos. Microsoft's choice to exploit the filesystem-as-API and funnel all filesystem interaction through the VCS is a smart choice and there are a ton of opportunities for optimization that don't exist when you're just writing to a plain filesystem.
It sounds like they answered that:
> In a repo that is this large, no developer builds the entire source tree. Instead, they typically download the build outputs from the most recent official build, and only build a small portion of the sources related to the area they are modifying. Therefore, even though there are over 3 million files in the repo, a typical developer will only need to download and use about 50-100K of those files.
Source will still be distributed among the developers that touch it. Seems like a decent compromise.
> just to give cool kids access to cool tools
Yes. DVCS with the huge code bases, large binary objects and large teams is hardly the optimal approach. But the "cool kids" are just used to what they use. And now they can pretend to do it even when they have to be always connected, because the files are virtual and remain on the server until really used.
If Microsoft is giving the solution to the "cool kids," no reason to complain about the fact that Microsoft is willing to care for them.
And if you'd ask the "cool kids" why do they need git at all for such scenarios, have fun with the amount of arguments you'll get. Why this one "needs" vi and another "Emacs" etc. The same reasons. You'll find the arguments also in the comments here. Including mentions of Mercurial, the competition, just like "vi or Emacs". Because. Don't ask.
And no, as far as I understand, Google doesn't primarily "use Mercurial", they use something called Piper, and before they used a customized Perforce just like Microsoft did.
"Piper spans about 85 terabytes of data" "and Google’s 25,000 engineers make about 45,000 commits (changes) to the repository each day. That’s some serious activity. While the Linux open source operating system spans 15 million lines of code across 40,000 software files, Google engineers modify 15 million lines of code across 250,000 files each week."
Sure, GVFS downloads files only when first read; but maybe it keeps them cached? Maybe you can still work on them and commit changes after you get offline? At least in principle, nothing prevents that.
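To show what "in principle" could look like, here is a toy read-through cache in Python: a file is fetched on first read, kept locally, and remains readable and editable once the connection is gone. This is speculation in code form, not GVFS's actual behavior; `LazyFileCache`, `FakeServer`, and the file names are invented for the sketch.

```python
class LazyFileCache:
    """Fetch a file on first read, then serve it from the local copy."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable(path) -> content; may raise ConnectionError
        self._local = {}

    def read(self, path):
        if path not in self._local:
            # Nothing is stored if the fetch raises, so a failed read leaves no stale entry.
            self._local[path] = self._fetch(path)
        return self._local[path]

    def write(self, path, content):
        # Edits only touch the local copy; no network needed.
        self._local[path] = content


class FakeServer:
    """Stand-in for the remote repository, with an on/off network switch."""

    def __init__(self, files):
        self.files = files
        self.online = True
        self.requests = 0

    def fetch(self, path):
        if not self.online:
            raise ConnectionError(f"offline, cannot fetch {path}")
        self.requests += 1
        return self.files[path]


server = FakeServer({"kernel/sched.c": "sched", "shell/explorer.c": "explorer"})
cache = LazyFileCache(server.fetch)

cache.read("kernel/sched.c")  # first read goes to the server
server.online = False         # unplug the network

print(cache.read("kernel/sched.c"))  # still works, served from the cache
```

In this model, going offline only hurts for files you have never opened: reading `shell/explorer.c` now would raise `ConnectionError`, while anything already cached can still be read and modified.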
That being said, you can see more and more people getting off the "Microsoft is evil" train. It's super slow and every bone headed thing that Microsoft does resets the needle for lots of people.
I've always been surprised how much sympathy a company like IBM or Intel gets on HN. They both sue people over patents. They both contribute to non-free software. They were early backers of Linux, though, and that is what people care about superficially.
So, I'm very, very, very sorry that I can't hear their words over the noise of their actions; and in the light of this, I eye each new gift-bearing Redmondian with suspicion.
for example, the majority of their money still comes from windows and office, but open source and hologram BS impress the most vocal anti-MS voices in the media.
my point, though, is that there are other companies that don't draw nearly as much ire that engage in the exact same practices. i think that early antagonism between MS and Linux users has become a tribal signifier for some people. Microsoft people used to have the same kind of relationship with IBM. They also kept flogging that longer than it really made sense...just like linux and mac fans.
I don't know if "a lot" is the right qualifier. Solitary repos of millions of files have scalability problems even outside the source control system (I mean: how long does it take your workstation to build that 3.5 million-file windows tree?)
A full Android system tree is roughly the same size and works fine with git via a small layer of indirection (the repo tool) to pull from multiple repositories. A complete linux distro is much larger still, and likewise didn't need to rework its tooling beyond putting a small layer of indirection between the upstream repository and the build system.
Honestly I'd say this GVFS gadget (which I'll fully admit is pretty cool) exists because Microsoft misapplied their source control regime.
All they did is create a caching layer.
How many people have that problem, really?
If you deal with graphics, audio assets, etc, the binary-blob type of data, the case is central.
Lacking support for large binary blobs is, like, THE #1 reason that an engineer might have to use an alternative.
All you need is several hundred engineers and your monorepo becomes unwieldy for git to handle.
Microsoft has historically been one of the worst tech companies for tech enthusiasts. We can ignore all the awful things they did in the 90's that stifled open standards (because apparently that doesn't matter anymore?) and just look at 2013, when they were exposed to have been participating in the NSA PRISM project. That means there is a whole team at MS that worked on a secret government project to help violate our fourth amendment rights. Even much of congress didn't know about NSA mass data collection, but Microsoft did.
People who trust MS these days are either naive or employed by them.
The biggest PITA with clearcase was keeping their lousy MVFS kernel module in sync with ever-advancing linux distros.
I really liked Clearcase in 1999, it was an incredible advancement over other offerings then. MVFS was like "yeah! this is how I'd design a sweet revision control system. Transparent revision access according to a ranked set of rules, read-only files until checked out." But with global collaborators, multi-site was too complex IMO. And overall, clearcase was so different from other revision control systems that training people on it was a headache. Performance for dynamic views would suffer for elements whose vtrees took a lot of branches. Derived objects no longer made sense -- just too slow. Local disk was cheap now, it got bigger much faster than object files.
> However, we also have a handful of teams with repos of unusual size! ... You can see that in action when you run “git checkout” and it takes up to 3 hours, or even a simple “git status” takes almost 10 minutes to run. That’s assuming you can get past the “git clone”, which takes 12+ hours.
This seems like a way-out-there use case, but it's good to know that there's other solutions. I'd be tempted to partition the codebase by decades or something.
Clearcase also suffered, at least in my experience, from a clumsy and ugly merging process and deeply unintuitive command set which meant everyone who "used clearcase" actually tended to use some terrible homegrown wrapper scripts.
Still, considering it was the last remaining vestige of the Apollo Domain OS, not bad.
NeWS - https://en.m.wikipedia.org/wiki/NeWS
Remember the mess on usenet?
comp.windows.news - not news about Microsoft Windows
I can imagine people at a forum:
"Hey, GVFS isn't working for me. It crashes with error -504 when I try to mount /nfs/company_data".
Try guessing which GVFS that is.
Our internal libraries need to be compatible with the Core Runtime, so we have to have them target .NET Standard, which is compatible w/ the full .NET Framework or .NET Core. To target .NET Standard, you need the .NET Core SDK/CLI which includes the `dotnet` tool, which is almost never clarified as "the SDK/CLI" in documentation or in talks, but usually just ".NET Core".
Another minor annoyance: to build a .NET Standard-compatible library, you reference the "NETStandardLibrary" NuGet package. Makes a fair amount of sense, but is hard to talk about.
If you're running on Windows and want a smaller server footprint, you can use Windows Server Nano, which requires your apps to target .NET Core Runtime (not .NET Full Framework). Note that this requirement is not true for Windows Server Core. -_-
I later found out I could have looked for "ActiveX" and found similar results.
They have a product in Azure named simply DocumentDB. I don't think "used to" is necessarily the best tense here (:
What could you possibly mean by that? The .com TLD was introduced in 1985, with microsoft.com registered already in 1991. Microsoft COM was created in 1993. (Of course, "the Internet" in any sense of the word predates all of this.)
They just came to the conclusion that GNOME's product is no threat and that they can just claim the name. Smaller companies tried that before.
I mean, git itself did this in the beginning.
Microsoft, under Nadella has made me not hate Microsoft again, and that's a tall order because I'm over 40. This is an impressive move, and if they effectively execute all the bits that are possible here, this is just some great work.
(Oh, and I can't even use the word nix now as a catch all for all the POSIX/ POSIX(like) OSs because of nixOS.)
I think going forward, we just have to accept name collision.
I'm over 40 as well, and I can honestly say I've never hated Microsoft or Bill Gates - what I hated (hate?) were/are their business practices.
I honestly wish I could (somehow) just get an apology from the company - something like "we were wrong, we're sorry, and we're working to make things right". Instead, it feels like a person you thought of as a friend, after they've put you down, did bad things to you directly and behind your back, you dropped them...then years later starting to do nice things toward you and others, trying to get back into your "good graces" - but never once apologizing for their past actions.
I want to see Nadella's and Microsoft's actions in a good light, I want to see them as an unvoiced apology. At the same time, though, if it were a person doing this, I don't know if I could trust their motives, no matter how sincere or enticing it might look.
If you or anybody else could point me to a video of Nadella or someone else representing Microsoft making an apology regarding their past actions, it would go a long way toward me accepting their present behavior.
It probably won't make me ever install Windows 10, but I will probably see them in a better light.
GNOME Virtual Filesystem is first search result for "gvfs". Even if you use bing!
In this case, they're both called GVFS AND three of the letters have the same meaning, and they both do relatively similar things.
Even the tooling, and the output of `mount`, is bound to be incredibly confusing.
I seem to recall that Microsoft has previously used a custom Perforce "fork" for their larger code bases (Windows, Server, Office, etc.).
A custom filesystem is indeed the correct approach, and one that git itself should have probably supported long ago. In fact, there should really only be one "repo" per machine, name-spaced branches, and multiple mountpoints a la `git worktree`. In other words there should be a system daemon managing a single global object store.
I wonder/hope IPFS can benefit from this implementation on Windows, where FUSE isn't an option.
Microsoft's fork contains 67,522 commits. The official Git repo contains 45,810. It appears the bulk of the work started in 2010, with significant ramp up of development in 2015.
Looks like Microsoft only really introduced about 100 more new files.
Microsoft's repo contains 1712 contributors. Git's repo contains 1685 contributors. So it looks like 20 - 30 employees worked on Microsoft's fork.
Basically most operations in git are O(modified files); however, there are a few that are O(working tree size). For example, checkout and status were mentioned by the article. However, these operations can be made O(modified files) if git doesn't have to scan the working tree for changes.
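A toy Python comparison of the two costs, assuming the file system layer can tell the VCS which paths were written. Neither function is git's real algorithm (real status also handles deletions, untracked files, and stat caching); `status_full_scan`, `TrackedTree`, and the synthetic file names are made up for the sketch.

```python
def status_full_scan(index, working_tree):
    """O(working tree size): compare every file against the index."""
    return sorted(p for p, content in working_tree.items() if index.get(p) != content)


class TrackedTree:
    """Working tree that records writes, as a virtual file system could."""

    def __init__(self, index):
        self.index = dict(index)  # what the last commit looked like
        self.files = dict(index)  # current working tree contents
        self.touched = set()      # paths written since the last commit

    def write(self, path, content):
        self.files[path] = content
        self.touched.add(path)

    def status(self):
        """O(modified files): only re-examine paths that were written."""
        return sorted(p for p in self.touched if self.index.get(p) != self.files[p])


index = {f"src/file{i}.c": "original" for i in range(10_000)}
tree = TrackedTree(index)
tree.write("src/file42.c", "changed")
tree.write("src/file7.c", "original")  # touched, but content is unchanged

print(tree.status())                          # looks at 2 paths
print(status_full_scan(index, tree.files))    # looks at all 10,000 paths
```

Both calls report the same modified file, but the tracked version does work proportional to the number of writes rather than the size of the tree.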
So pretty much I would be all over this if:
- It worked locally.
- It worked on Linux.
Maybe I'll see how it's implemented and see if I could add the features required. I'm really excited for the future of this project.
Also, it would have been interesting if the article mentioned whether they tried other approaches taken by facebook (mercurial afaik) or google.
Sounds like they've almost solved the secrets of the fire swamp!
For one, it's not really distributed if you're only downloading when you need that specific file.
But that doesn't change the merits of this at all, I think.
Our whole codebase is 800MB.
Otherwise, I hope you replaced your sysadmin.
This solves the next scaling problem of avoiding managing the whole working tree. (without requiring narrow clones which have significant downsides)
The problem is that I also want a fast log/blame for any file back to the beginning of time - but I'm ok with that requiring devs connecting to the server containing the history (as with svn).
I also haven't found a way to make git work smoothly in shallow mode as the default, e.g. can I make checkout of a branch always remember it must be shallow? Can I make log use remote history when necessary, etc.? I don't want to fight the tool all the time because I'm using a nonstandard approach.
I'd assume this GVFS would work hand in hand with Git LFS for the use case of large files.
How on Earth can anybody work like that?
I'd have thought you may as well ditch git at that point, since nobody's going to be using it as a tool, surely?
git commit -m 'Add today'\''s work - night all!' && git push; shutdown
Since it looks like they are still migrating, I don't think a lot of people actually did work like that. Maybe just a couple of times to figure out how long it would actually take. Or maybe those who really use it are actually doing shallow clones, which would probably take much less time. Actually shallow clone is nice but doesn't seem to be known very well. I use it often if I know I won't ever need the full history anyway. Also great to shave time off CI builds.
I think when the powers that be said that whole thing about geniuses and clutter, they were specifically talking about their living spaces and not their work...
It was slow to do 'git status' and other common commands. Restarting the RoR app was also slow. I put the repo on a RAM disk, which made the whole experience at least a few times faster.
Since everything was in a VM that I rarely restarted, I didn't have to recreate the files on the RAM disk all that often. I was syncing changes to the persistent disk with rsync running periodically.
Okay, so this is a networking issue. Or is it a stick everything in the same branch issue?
Whatever the reason here the issue is pure size vs. network pipe, pure and simple. Hum, when can I get a laptop with a 10GBaseT interface?
One of the issues with the way they are doing this (only grab files when needed) is you cannot really work offline anymore.
Although I could definitely be wrong but this sounds a lot like monolith vs microservices to me.
Interestingly, however, most of their "open source" efforts (.NET, C#, and related) are all on GitHub rather than their own hosted offerings: CodePlex (which is basically dead) or "Visual Studio Team Services".
Disclosure: I'm a PM on VSTS/TFS, and I own part of version control.
Disclaimer: used to work on TFS team.
Microsoft is moving to Git and we use Team Services / TFS as our Git server for all private repositories. GitHub is only used for OSS since that's where the OSS community is.
> that's where the OSS community is
It's also not just the community; GitHub provides significantly better integration support than GitLab. Since GitHub has such a robust API, it's easier to create bots and whatnot to help manage large open source projects.
The whole repo is needed for every developer, i.e. it's not possible to do a sparse checkout, but there are many gigs of old versions of small binaries that I would prefer to keep only on the server until I need them (which is never).
"GVFS requires Windows 10 Anniversary Update or later."
I haven't touched Windows in quite a while, so I can't really make a claim either way.
Several of the anecdotes on the reddit thread don't even seem to take into account which version the offending slowness was happening in, and anecdotally, every time I've helped a Windows user who was experiencing enough slowness to complain about it, they've been years behind on their git version, and installing the latest removed the complaints.
 ...and is just about guaranteed to in the many places in git where a command is still built as a tower of bash scripts calling perl scripts calling more bash scripts... If you read the changelogs, a lot of the performance optimizations that are helping every platform are the places where entire commands are getting replaced with C versions of themselves.
And "years behind on their git version" is I think the norm for git users :) I pretty regularly have to recommend that coworkers / etc upgrade from git 1.7 (or 1.8 or something similar) to an even-remotely-modern version.
In any case, when C++/WinRT gets feature parity, I imagine it will eventually be deprecated, depending which one gets more developer love.
Here is another virtual filesystem with the exact same name: https://wiki.gnome.org/Projects/gvfs
Debian package for it: https://packages.debian.org/jessie/gvfs
Using a vfs allows you to track which files have changed so that these operations no longer need to scan. Now they are O(changed files) which is generally small.
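To make the O(changed files) point concrete, here's a rough Python sketch. It is not how GVFS is implemented; the `TrackingFS` class and its method names are made up for illustration. The idea is just that if every write goes through a layer you control, that layer can keep a set of dirty paths, and a status query reads that set instead of walking the whole tree.

```python
class TrackingFS:
    """Toy in-memory file layer that records which paths were modified.

    Hypothetical illustration only; not GVFS's actual design or API.
    """

    def __init__(self, base_files):
        # base_files: dict of path -> content at the checked-out revision
        self._base = dict(base_files)
        self._overlay = {}   # path -> new content (None marks a deletion)
        self._dirty = set()  # paths touched since the base revision

    def read(self, path):
        # Local changes win over the base revision.
        if path in self._overlay:
            content = self._overlay[path]
            if content is None:
                raise FileNotFoundError(path)
            return content
        if path not in self._base:
            raise FileNotFoundError(path)
        return self._base[path]

    def write(self, path, content):
        # Every write passes through here, so we can note the path.
        self._overlay[path] = content
        self._dirty.add(path)

    def delete(self, path):
        self.read(path)  # raises FileNotFoundError if the file is absent
        self._overlay[path] = None
        self._dirty.add(path)

    def status(self):
        """Return changed paths; work is proportional to the dirty set."""
        changed = []
        for path in sorted(self._dirty):
            # A path written back to its original content is not a change.
            if self._overlay[path] != self._base.get(path):
                changed.append(path)
        return changed


# A "repo" with 100,000 files, of which we touch two.
fs = TrackingFS({f"src/file{i}.c": "int x;" for i in range(100_000)})
fs.write("src/file42.c", "int y;")
fs.write("README", "hello")
print(fs.status())  # ['README', 'src/file42.c']
```

Note that `status()` never iterates over the 100,000 base files; it only looks at the two paths that were written.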
Now IPFS has a vfs, but it is just a simple read/write interface. This vfs needs slightly more logic to do things like change the base revision and track changes.
For tracking changes (i.e. mutable data) you can use IPNS and create a signed commit history. This will be built on IPFS eventually so it's only a matter of time.
The main added benefit is that if your friend on the LAN has also checked out the parts you need, you can get them directly from them rather than from some central repo, which could make a big difference in a company of tens of thousands of employees.
It wasn't an option a couple years ago, but submodules work fine now. With a little bit of scripting to wrap common uses, they're practically pain-free.
1) When you cd into a submodule, it's the same as if you had just cloned into there, all normal git commands work. need to update your submodule-lib? cd, do stuff, git push, at worst.
2) `git clone --recursive` instead of just `git clone`, no need to `git submodule update --init` / etc.
3) `git pull` will automatically pull submodules when the parent repo changes which commit it's using. `git push` should push any changes too, though the manpage isn't explicit (there's an identical flag/config value for push as for pull to control this). also solvable with `pushall` and `pullall` aliases, which is a very minor re-education.
4) submodules can track submodule-repo branches, not just commits. auto-updating ftw? if you want it.
5) there are some somewhat-unhappy defaults / you probably want `git diff --submodule=log` and `git config --global status.submoduleSummary true`, etc. these (and aliases) are easily fixed the same way as you probably already have for templated .gitignore / etc - just generate some company-wide defaults, and move on with your life.
A lot of the "you have to git submodule command everything all the time" is a thing of the past, the difficulty now is largely related to it being a minor conceptual difference from a monorepo. It's a repo in a repo, and you're manipulating the pointer to the version. There are more options because of this, but they exist for good reasons, and they're not too hard to wrap your head around.
https://git-scm.com/book/en/v2/Git-Tools-Submodules also has some nice examples, and e.g. `git submodule foreach` can simplify a lot if you actually dive into submodules and make changes across multiple simultaneously (big refactor maybe?).
They aren't copying all the files like you do with Git. They have a custom setup that sounds like it lets you check out just the parts you need. I don't have time to read the whole thing, but it sounds like it works by breaking down a "super repo" into small "sub repos". This actually makes sense.
There is no way working with a 300gb git repo is fun or efficient, and they've probably been doing that for years at Microsoft.
They're explicitly not doing that. They have a massive, monolithic repo, and then tooling for interacting with that monolithic repo without having to grab the whole thing. They are not using Git. You just read the section titled "Alternatives".
The merits of a "monorepo" have been hashed out previously, it's more nuanced than "lol, M$".
They may know something we don't :)
The numbers include test code, utilities, two entire web browsers, UI frameworks, etc.
Big companies, even successful ones, aren't incapable of making, and continuing to make, stupid decisions.
I'm sure each company has a reason for using a single massive repo. I doubt I would agree with their reason, but I'm sure they have one.
These may not be problems that everyone feels as sharply, but they can be problems that nearly everyone might face at some point. Having these problems isn't even necessarily a sign of bad architecture: at some point all of your software likely has to play nice together on the same machine.
Certainly there are solutions beyond just monorepos, but monorepos are a very well understood, ancient solution (that seems to be making a comeback of sorts, despite many of the other solutions being easier and more powerful today than they were back when monorepos were about the only solution).
I've seen tools like Lerna (https://github.com/lerna/lerna) used for managing several relatively small "monorepos" lately.
I'm also curious as to how they used to do it without git, maybe using TFS? I wonder what the timings on that were.
Anyway, I don't think GVFS is the way to go, and I hope that it either doesn't get accepted or doesn't play a role outside of Windows. It's good to see more Git usage, but hacking away instead of fixing the problematic project seems somewhat idiotic. I can imagine other tools having problems with a single project that size, are they going to hack those as well?
Microsoft: "We, only one of the most technologically advanced companies, with only the 2nd or 3rd highest market cap of any public company on the planet depending on the day, had a problem with infrastructure trying to manage possibly the largest software project that anyone has ever made. And then we solved it."
You: "Stop doing what you're doing and pay attention to meeeeeeeeeeee."
So there is no reason to fix their mistake of a code base.
However, when you see that a large company is doing something in a way that you think is silly or strange, the logical thing to ask first is "what do they understand that I don't understand?". It won't always be the case, of course, but most of the time it will turn out you were missing something.
Assuming off the bat that they are idiots and you know a better way is staggeringly naive.
There's even a name for this: "Going from D to C - from disparagement to curiosity". I think I first heard it from @patio11.
If no company is using many small repos for truly massive projects, then it's hard to argue it would be a good idea. Could everyone who has looked at this problem make the wrong choice?
Amazon. They built an entire system around managing versions so that they can make it work.
Google reportedly used to use Perforce for their monolithic codebase, and Facebook is supposed to use Mercurial with a bunch of modifications. They all have huge code bases mostly in one repo (I've heard Facebook had a >50GB Git repository, and Google's codebase is supposed to be in the TB range).
I recently ran some experiments, one file per experiment, one result file per experiment. At around 100,000 files, git started getting very upset. Why shouldn't I be allowed to have 100,000 files, or a million files, in a directory? Why should it be my job, as a user, to manually rearrange my data into a format my computer is happier with?
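For what it's worth, the manual rearranging usually amounts to something like the sketch below: hash the file name and use the first couple of hex digits as a subdirectory, similar in spirit to how Git fans out its own loose objects under `.git/objects`. The `sharded_path` helper and the `results` directory name are made up for illustration, and whether this actually helps depends on the filesystem and tools involved.

```python
import hashlib
from pathlib import PurePosixPath


def sharded_path(root, filename, width=2):
    """Map filename to root/<hash prefix>/filename (hypothetical helper)."""
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    return PurePosixPath(root) / digest[:width] / filename


# 100,000 result files spread over at most 16**2 = 256 subdirectories,
# instead of all sitting in one flat directory.
paths = [sharded_path("results", f"experiment_{i}.json") for i in range(100_000)]
shards = {p.parent.name for p in paths}
print(len(shards))
```

The mapping is deterministic, so a file can always be found again from its name alone, with no index to maintain.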
I put all the records for https://www.findlectures.com in it, because then I can use diff tools for testing changes. Obviously this is nowhere near the size of the Windows codebase, but I could see a world where GVFS would be helpful for collaboration on this project.