These sentences gloss over the details in some parts and are flat-out wrong in others. Git was designed for tens of thousands of developers (the Linux kernel), and it was designed for huge numbers of files (large numbers of files work fine on Linux thanks to the dentry cache; it sucks on Windows because Windows doesn't have a cache with the same behaviour as the Linux dentry cache). Admittedly it is slow for files that are large in size, but it was designed for source code; what sane developer would have source files that are hundreds of MB in size?
That said, monorepos with several dozen software projects can be very slow due to O(n) functions such as git blame and the like. It's true that git wasn't designed for large numbers of projects in one repo that are only indirectly shared by library code and the like.
Unfortunately that was not written in the article.
This is easy to check, and the article is right. The Linux kernel does not have tens of thousands of active developers. Over a little more than the last year:
titan:~/src/linux geofft$ git log v4.8..v4.14 --format='%aE' | sort -u | wc -l
titan:~/src/linux geofft$ git log v4.13..v4.14 --format='%aE' | sort -u | wc -l
titan:~/src/linux geofft$ git log v4.8..v4.14 --format='%aE' | sort | uniq -c | awk '$1 > 5' | wc -l
> Admittedly it is slow for files that are large in size, but it was designed for source code; what sane developer would have source files that are hundreds of MB in size?
Sometimes that's the best way to get things done? The best Debian workflow I've ever worked with (and I've worked with a lot) involved actually committing binary .debs to SVN alongside their source code, because that meant that the SVN revision number was the single source of truth. There wasn't some external artifact system, nor was there a risk of picking up the wrong packages from an apt repo when rebuilding an older SVN revision.
I'm not defending this as pretty. But I will defend this as sane. It got the job done reliably and let me work on actually shipping the product and not yak-shaving workflows.
I can't see MS having tens of thousands of developers working on any single component of Windows, so I'm guessing they _do_ want a giant monorepo. If they were to break it up into separate repos per OS component, I doubt they'd have scaling issues (of course, breaking things up introduces coordination and dependency issues).
Instead, we decided to - as you put it - change Git to fit our needs.
In particular, if my repo is big enough, I often don't have the entire tree in memory (because I'm doing other useful things with memory and caches got evicted). core.untrackedCache makes things a little better, but it's still not great.
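For anyone who hasn't tried it, enabling it is roughly this (assuming your filesystem's mtime behaviour passes the built-in test):

$ git update-index --test-untracked-cache   # check the filesystem's mtime behaviour is usable
$ git config core.untrackedCache true       # cache untracked-file scans in the index
$ git status                                # first run repopulates the cache; later runs skip unchanged directories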
GVFS is part of our effort within the company to standardize on a single set of best-of-breed tools, and use the same tools inside Microsoft that we deliver to customers. So we're adopting one engineering system throughout the company and we're moving everybody to Visual Studio Team Services.
As they say in: https://blogs.msdn.microsoft.com/bharry/2017/02/03/scaling-g...
The first big debate was – how many repos do you have – one for the whole company at one extreme or one for each small component? A big spectrum. Git is proven to work extremely well for a very large number of modest repos so we spent a bunch of time exploring what it would take to factor our large codebases into lots of tenable repos. Hmm. Ever worked in a huge code base for 20 years? Ever tried to go back afterwards and decompose it into small repos? You can guess what we discovered. The code is very hard to decompose. The cost would be very high. The risk from that level of churn would be enormous. And, we really do have scenarios where a single engineer needs to make sweeping changes across a very large swath of code. Trying to coordinate that across hundreds of repos would be very problematic.
I can't see it being desirable either; they already have to deal with outdated external APIs, surely they don't want outdated internal APIs. A single repository lets you replace an API throughout the codebase at once and remove the old one entirely; if you start "modularising", then every "internal API" is suddenly external, with all that entails.
This is totally armchair-devops, but it sounds like a more ideal case would be CI that runs based on your commit. In my experience, developers get lazy and are likely to commit binaries that don't match the code or miss some dependencies or steps.
I completely agree with your point; using the tools as intended isn't always the best way to get things done. Video games are an easy example: code is often dependent on, and intermixed with, binary assets, and you want those to be versioned together.
For games, the binary data isn't generated from code. They're textures and models generated in a package (like Photoshop). The bits of code are often tightly coupled with the textures--like a procedural waterfall that uses still images. You'd need to version both of those together.
But for the product we were building, it was important that developers were able to test their .debs before committing to the repo. That means that CI can't build post-commit: you have to have some way to take a local modification, turn it into a .deb, test it and see if it does what you want, and then commit that.
I've thrown something together for my current job that supports both building a package interactively for testing and building one in the CI pipeline post-commit (using a git repo as the source in both cases) but it's definitely more cumbersome, and I haven't figured out quite how I want to fix it.
(Also, yes, games are a more obvious example of this, but there are lots of purists who've never worked in game development who are somehow convinced that assets aren't code and shouldn't be in the code repo. Having not worked in game development myself either, I figured it was safer not to invite that argument.)
The commit is merged only after a developer gives the review a +2.
Are there really ten thousand active developers on the Linux Kernel?
According to  there were ~ 2,000 over a 15 month period. It's an order of magnitude difference that seems like it would have an impact.
The nearest thing you could compare the development model to is the set of base packages for the BSDs: each subsystem/package has its own upstream development (like the Linux subsystems with their maintainers); and then Linus serves as the release engineer, taking updates to those packages and merging them into the release. (I compare to the BSDs specifically because Linux kernel subsystems, like non-cross-platform BSD base-system packages, have no other life outside of their potential integration into the release. People aren’t using the upstreams directly, other than in development of the new feature.)
So I don't think bringing up the way Linux is developed with git by its original intended users really contradicts the idea that it was designed for a large number of developers?
This scenario is nicely handled in svn, mercurial, tfvc and similar. If this lets git do it then it looks like a great idea.
Game developers with lots of churn in huge binary assets also have the same problem, but probably even worse.
The issue is that these giant binary files cannot be merged. Because of this you need a locking mechanism (which SVN and Perforce both support).
Without that you have two people working on the same file and someone having to throw away work which makes for a very unhappy team.
DVCS is great but not every problem domain maps to it.
Even then it looks like an implementation that misses the mark. Both Perforce and SVN mark "lockable" files as Read-Only on a sync of the local copy. It looks like all this does is prevent a push of the file which will be the same result as if someone merged it ahead of you.
Without that guard in place it makes it really easy to make a mistake and then have changes to a binary file that will collide. Having a checkout process as part of locking workflow is pretty much mandatory.
Never mind, I finally found this, which explains that Git LFS does it correctly -- kudos! It could do with some more detailed documentation, since from the man pages it looks only partially implemented.
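Roughly, the locking workflow looks like this (file pattern and path are just examples):

$ git lfs track --lockable "*.psd"      # lockable files are checked out read-only
$ git lfs lock art/waterfall.psd        # take the lock; the file becomes writable for you
[edit, commit and push as usual]
$ git lfs unlock art/waterfall.psd      # release the lock for the next person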
What we need to move off large repos in tfvc or svn is something where the inconvenience is small enough that it doesn’t weigh heavier than the benefits of git. LFS was not that, this could be.
I made a stupid decision to use git-lfs to track a pre-initialized Docker database image (using LFS to store initial_data.sql.bz2). Now I have to live with a few dozen gigabytes of the past I don't really need (it's a small database).
I know I can rewrite the history and then find something to garbage-collect the unreferenced blobs (I haven't investigated it). The repo allows doing so, as it only keeps those dumps, the Dockerfile and a pair of small shell scripts. And, most importantly, it has only 2 users - me and my CI. If it were large assets and lots of code, checked out and worked on by a large team, I guess rewriting the history wouldn't be an option.
Storage is cheap, but still...
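If I ever get around to it, the crude version is probably just to start history over (a sketch; the branch name is arbitrary):

$ git checkout --orphan fresh                        # new root commit, same working tree, no history
$ git add -A && git commit -m "drop the old dumps"
$ git branch -M master                               # make this the new master
$ git push --force origin master                     # the CI just re-clones
$ git lfs prune && git gc --prune=now --aggressive   # reclaim local space; the server needs its own GC

With only two users that's tolerable; with a real team it obviously isn't.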
The alternative to doing it this way is to introduce version management between components, which is totally unnecessary complexity. Windows is one product and all of the components (or subsets thereof) ship together. As far as internal-only interfaces are concerned, version N does not need to work with N-1 or N+1, only N. That is a massively simplifying assumption that makes a lot of things much easier.
There is a reason why all the big tech companies operate this way. Microsoft, Google, and Facebook all have one giant repository with their entire product.
Cool, but that doesn't explain why Office and Windows need to share the same repo. Ditto for gmail and chrome os. Ditto for fb messenger and whatever face recognition algorithm they use.
git was (initially) written by Torvalds to serve his own purpose -- he needed a new VCS for Linux. Let's say that it "doesn't scale for org-wide mono repos". Unless that was a design goal (and I'm pretty sure it wasn't), how is that a "failure"? It apparently works just fine for the kernel developers.
Yet because it didn't work well for Microsoft's 300 GB worth of tens of millions of lines of code, it is somehow a "failure"?
Apparently Microsoft's own TFS is a failure too, then, because it seems that it wasn't up to the task either.
(To quickly clarify nomenclature: TFVC is "Team Foundation Version Control", the name of the centralized version control system in TFS and VSTS, which are our on-premises and cloud hosted development platforms, respectively. They both support both TFVC and Git - Git with GVFS, and they do more than just version control, hence the naming clarification. Apologies for all the three letter acronyms.)
Anyway, TFVC is totally different: a centralized version control system, with all the workflows that go along with that, like expensive branching. The desire to move to Git opens up all the awesome workflows that Git offers.
The impetus here was to standardize the entire company on a modern set of development tools - one engineering system across the company - hosted in Visual Studio Team Services. We could have moved them to TFVC; it's remarkably similar to its predecessor, "Source Depot", a Microsoft-internal tool that the Windows team was using.
But that's a lateral move. There aren't really any benefits to TFVC over Source Depot except that we sell and support one to end users and don't with the other. So that has some nice organizational benefits but not enough to warrant moving everybody and the build/release farms over.
But moving to Git unlocks all sorts of new workflows - lightweight branching, pull requests, all that good stuff that everybody who uses Git is totally used to.
Anyway, that's the background on Git vs TFVC. And, no, I totally agree that Git not supporting giant monorepos is not a failing, per se. But it is a limitation, and thankfully one that we're working to help overcome.
Currently I'm evaluating on-premise git hosting, and another downside is that git on TFS takes a lot of horsepower (this looks unsuitable for a small team).
When your screwdriver does a bad job at pounding nails, you're doing it wrong.
When I was at Amazon, Perforce couldn't scale (it had already been partitioned into a few depots, but those had reached their reasonable capacity). Amazon moved to a per "module" git-repo system. It was convenient to push across multiple repos simultaneously, but it wasn't a deal breaker when we lost it (I never heard of any workflows being completely unworkable following the transition).
Unfortunately, it's not available externally.
Even working at Google, my jaw still dropped at this:
> Google's codebase is shared by more than 25,000 Google software developers from dozens of offices in countries around the world. On a typical workday, they commit 16,000 changes to the codebase, and another 24,000 changes are committed by automated systems. Each day the repository serves billions of file read requests, with approximately 800,000 queries per second during peak traffic and an average of approximately 500,000 queries per second each workday. Most of this traffic originates from Google's distributed build-and-test systems.
What advantage does a company wide mono repo provide?
My experience with farms of git repos is that the lack of atomic operations over many tiny repos leads to things like version sets and having to periodically merge dependencies. I've worked on teams where that was inevitably neglected during hectic periods resulting in painful merges of large numbers of changes. That problem simply doesn't exist with working at HEAD and high quality presubmit test automation/admission control. The single repo also allows for single code reviews spanning multiple packages which makes it MUCH simpler to re-arrange code (Bazel again helps here since a "package" is any directory with a BUILD file). Package creation is lighter weight for the same reason, and has fewer consequences for poor name choices since rearrangement is easy and well supported by automated tools.
Sharing one build system where a build command implicitly spans many packages also results in efficient caching of build artifacts and massively distributed builds (think a distributable and cacheable build action per executable command rather than a brazil-build per package). Each unit test result can be cached and only dependent tests re-run as you tweak an in-progress change. This is fantastic for a local workflow (flaky tests can be tackled with --runs_per_test=1000, which with a distributed build system is often only marginally slower than a single test run). Also, you can query all affected tests for a given change with a single "local" bazel query command. The list goes on from here -- I keep thinking of new things to add (finer grained dependencies, finer grained visibility controls, etc.).
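To make that concrete, the day-to-day looks roughly like this (target labels are made up):

$ bazel test --runs_per_test=1000 //myapp/server:server_test            # hammer a flaky test; cached, distributed runs keep it cheap
$ bazel query 'rdeps(//..., //base/strings)'                            # everything in the repo that depends on this package
$ bazel test $(bazel query 'tests(rdeps(//..., //base/strings))')       # run only the tests a change to it can affect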
It's not that you can't build most of this for distributed repos, but I'd argue it's harder and some things (like ease of code reorg) are nearly impossible.
Subjectively, having worked with both approaches at scale, Google's seems to result in much better code and repo hygiene.
There's a new project that allows you to use hg with Piper/CitC; I've been using it for about a month and it's been working really well. No clue how it works, though.
My current client has a 17-year-old multi-GB TFS monolith. New components go into git, and as features evolve it's common for the respective team to hive them off into their own git repos as stand-alone components. Nevertheless TFS is painful, and if I could have an effective git repo with the monolith too, then life would be far easier.
Also how feasible is migrating from TFS to Git?
The only real complaint I have is speed. Dunno if it's our TFS server or the nature of TFS, but every checkin pulled into git takes anywhere from 5-30 seconds. In large projects with long histories you can easily spend 30+ minutes exporting history before running into a problem that prevents the export from completing.
If it makes you feel any better I had to spend two continuous weeks exporting our huge TFS monolith before I managed to have a working git copy.
But yeah, Git-TFS is a godsend.
It would probably have been better if they had assigned a team of highly skilled engineers to design a distributed source control from scratch specifically to solve the problems faced by the windows team.
After all, `git` itself was developed for Linux because Linus was not satisfied with any of the existing solutions.
(Why in the world would that have been better?)
It seems like Windows is not really one of these use cases.
It’s tough to scale VCS to very large teams, regardless of the software. No need to stand up for git.
And I would encourage you to read them, so I won't belabor the details here, but to give you just one example: git blame is not the issue that we're most concerned about. The basic, highest-priority functionality in Git - things like git add - are the issues we're most concerned about.
git add is O(n) on the number of files in the repository. When you run `add`, git reads the index, modifies the entry (or entries) that you're changing, and writes the index back out.
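You can see the scale of this on any repo with a couple of read-only commands:

$ git ls-files | wc -l     # number of entries in the index
$ ls -lh .git/index        # the file that `git add` reads and rewrites in full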
The Windows repository has about 3.5 million files. Running `git add` - when we started thinking about moving Windows into Git - took minutes.
Can you imagine running `git add` and having it take 10 minutes?
Now obviously there are some inefficiencies here - there are quadratic operations and the like that went in assuming that nobody would ever put 3.5 million files into a Git repository. And we've been cutting those out over time.
Thankfully, Git does have some functionality to support very large repositories - using shallow clones, sparse checkouts and the like. We've added narrow clones, to only download portions of the tree, and automation to handle this automatically without user intervention.
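For reference, the stock shallow-clone and sparse-checkout mechanics look roughly like this (URL and path are placeholders); the narrow clones and the automation around them are the new work:

$ git clone --depth 50 https://example.com/big.git        # shallow clone: only recent history
$ cd big
$ git config core.sparseCheckout true                     # sparse checkout: only materialize listed paths
$ echo 'src/mycomponent/' >> .git/info/sparse-checkout
$ git read-tree -mu HEAD                                  # re-checkout honouring the sparse patterns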
That's the scaling work that we're doing with GVFS. And these changes bring the P80 time for git add down to 2.5 seconds. We've been contributing these changes back to git itself, and we're thrilled to work with industry partners like GitHub who are also interested.
Sorry to go on - version control is a passion of mine, as it is for many of us working on Visual Studio Team Services. Your conclusion is very much correct: Git wasn't designed for this from the beginning. Thankfully, software isn't immutable, so we're scaling it up.
Microsoft took the same look and went the opposite direction, and decided that Git was easier for us to hack on. This was a no-brainer for us. We build multiple tools that host people's Git repositories (TFS and VSTS) and have Git repositories as deployment mechanisms for Azure. We have several contributors to tools in the Git ecosystem on staff, including people contributing to core git, and the maintainer for Git for Windows, libgit2 and LibGit2Sharp. But we have comparatively little institutional knowledge of Hg.
That post you linked is awesome. Facebook has done a lot of really impressive work scaling Hg and a lot of the lessons learned at Facebook and implementations in Hg are very similar to what we've done with Git.
This video from Durham Goode on Facebook's version control team (from Git Merge, amusingly) is also awesome:
The ability to extend, swap and replace deep into the stack was always a core component of Mercurial, hence having to enable extensions for things many users would consider core features, e.g. terminal colors or graph log.
Facebook didn't hack around like monkeys, they built extensions. And when they could not do it as extensions, they upstreamed improvements to the core.
The alternative would have been to fork the codebase entirely.
Monkey-patching the internals would have been significantly less maintainable: having used it as a library, I can tell you that none of the internal stuff is considered stable, and pretty major components will change between point releases. (I was using the diff-parsing and patch-application routines for something else; the API changed basically every minor release. Forking would at least give you a heads-up conflict when upstream changed; monkey-patching would either blow up at runtime or silently not go through the patch anymore.)
> They're now talking about implementing parts in rust, which, ironically, would have prevented them from doing what they originally chose mercurial for.
GitHub doesn't develop git; it's a service built on top of git. Any modifications to git have nothing to do with GitHub. GVFS is a GitHub project, not part of git.
I'd be surprised if none of their employees have contributed to Git. I didn't interpret anything in the article as saying that GitHub develops Git, in its entirety (or even largely).
> GVFS is a GitHub project
GVFS is a Microsoft project and GitHub seems to be contributing. And the GVFS developers have been successfully submitting their changes to Git itself, upstream to the actual Git project. So it is becoming a part of Git.
In git.git, there are 7 commits with @github.com authors, last one from 2014.
(I hope it just means GitHub folks are trying to blend into the crowd by using personal addresses, or something...)
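(If you want to check for yourself, something like this is presumably how it was counted:)

$ git log --format='%H %aE' --author='@github.com' | wc -l   # commits whose author email matches github.com
$ git shortlog -sne --author='@github.com'                   # the same, grouped by author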
> Shortly after our initial deploy, we also started the process of upstreaming the changes to Git so the whole community could benefit from them.
But I guess most of their git related work time goes to libgit2.
Jeff King (peff) has worked for GitHub for a long time:
Michael Haggerty (mhagger) is also a GitHub employee.
"Microsoft [...] wanted to get these modifications accepted upstream and integrated into the standard Git client.
That plan appears to be going well. Yesterday, the company announced that GitHub was adopting its modifications and that the two would be working together to bring suitable clients to macOS and Linux."
This hints that Git upstream = GitHub. I mean, why mention GitHub at all if they aren't upstream? The rest of the article doesn't explain GitHub's role in this story either.
Microsoft made some contributions and they're working to get them accepted upstream. They're maintaining a fork basically.
GitHub is adopting their modifications, i.e. GitHub is running Microsoft's fork.
Microsoft's modifications are necessary for them, particularly so they can use Git with the giant Windows repo. Presumably, GitHub also has a need or desire for those same or similar modifications. IIRC, some of GitHub's largest customers need or want a version of Git that can also handle large repos, or repos with large files or large numbers of files.
(example, the '768 conflict' bug here: https://githubengineering.com/move-fast/)
TL;DR pretty sure it's not touching git source at all.
Looking through the GVFS readme: https://github.com/Microsoft/GVFS it appears that it actually wraps git, which would explain why they have to explicitly add support for each git host and provide individual platform ports of GVFS (all of which would be unnecessary if it were actually upstreamed into git).
So that's good... git is one of my favourite open source tools and I would really hate for M$ to start polluting it. I don't care if GVFS is a good idea or not I just don't trust the fuckers and they will always deserve that suspicion.
Not really polluting but rather having some objects be fetched only on demand.
> In addition to the GVFS sources, we’ve also made some changes to Git to allow it to work well on a GVFS-backed repo, and those sources are available at https://github.com/Microsoft/git.
For the record as far as I understand GVFS the article is correctly using git vs Github.
See previous articles and their discussions:
If so, the comparable code base to check against is Android AOSP, Ubuntu, RedHat or FreeBSD.
If that is true, I believe the source code base for an Ubuntu/RedHat distribution with all the apps is likely bigger than Windows in terms of number of files, source repos and number of engineers (open source developers for all the packages such as Firefox, Chrome, OpenOffice).
Microsoft folks feel free to correct me here.
It seems that git's existing process and dev model already work well for much bigger projects by using a different git repo for each app.
Still not sure what pain point the new GVFS solves...
So yeah, all of the code and data actually stored in source control for various Linux distributions is pretty small.
AOSP is closer to what you're imagining, but I haven't met anyone who thinks Repo (Android's meta-repo layer) and Gerrit (their Repo-aware code review and merge queue tool) are pleasant to work with. E.g. it takes forever and a day to do a Repo sync on a fresh machine. A demand-synced VFS would be very nice for AOSP development, even though it's not a monorepo but a polyrepo where Repo ties everything together.
You can think of it as giving you somewhat flexible control of the "torrenter/seeder ratio" in BitTorrent: how many complete copies of the repo are available/accessible at a given time.
That seems like a reasonable compromise for a workload which Git is otherwise completely unable to handle.
So in the end you have your local repo, your fork, and the organization repo. During a pull request process, third parties might make pull requests to your fork to try and fix things.
The typical Github workflow is to treat the Github repo as a preferred upstream
Speaking for myself I have only ever used triangular workflows: fork upstream; set local remotes to own fork; push to own fork; issue pull request; profit
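In plain git that's roughly (remote names and URLs are just examples):

$ git clone https://github.com/me/project.git                 # my fork is "origin"
$ cd project
$ git remote add upstream https://github.com/them/project.git
$ git fetch upstream
$ git checkout -b fix upstream/master                         # branch off upstream...
$ git push -u origin fix                                      # ...push to my fork, then open the pull request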
It's very rare for anyone on GitHub to do the sort of tiered collaboration that the Linux kernel uses. If, say, I want to contribute to VSCode, pretty much the only way to get my changes upstream is to submit pull requests directly to github.com/Microsoft/vscode.
Compare to the tiered approach, where I notice someone is an active and "trusted" contributor, so I submit my changes to their fork, they accept them, and then those changes eventually make their way into the canonical repo at some future merge point. That's virtually unheard of on GitHub, but it's the way the Linux kernel works.
Pretty much the only way you could get away with something even remotely similar and not have people look at you funny in the GitHub community is maybe if you stalked someone's fork, noticed they were working on a certain feature, then noticed there was some bug or deficiency in their topic branch, then they wake up in the morning to a request for review from you regarding the fix. Even that, which would be very unusual, would really only work in a limited set of cases where you're collaborating on something they've already undertaken—there's not really a clean way in the social climate surrounding GitHub to submit your own, unrelated work to that person (e.g., because it's something you think they'd be interested in), get them to pull from you, and then get upstream to eventually pull from that person.
To put it another way: A server with multiple repos on it is a central server with respect to the repos you don’t clone! This is the same, but for parts of repos.
Smart people are working on all this, so I'm sure there are reasons, but in all the instances where I've had to interact with a monorepo, it was because the tech debt was too high to pay off and break it apart, not because it was better.
And if you're indebted to the point where you have no point in paying it off, you damn well better have leveraged assets against it (i.e. a cash cow of a business, like MS Word or Facebook)
As far as Git's killer feature I think decentralization is only one part of it. Being able to deal with the change graph directly is sometimes handy. And knowing the basis of the model, I find Git pretty easy to understand even when history gets fuzzy. Not sure but I think Darcs might be even better for that, just without the mindshare.
Why spend so much effort bashing a distributed peg into a centralized hole?
This, above all else. That neither I nor anyone else on our team ever has to worry about whether they are running an "older version" of git is such a win.
Making widespread changes to all clients of your code when you make changes that break things doesn't require discipline.
It has to fail gracefully on version mismatch, other than that it's just a bad decision.
It would be pretty cool to use as a $GOPATH. Or just for making drive-by github contributions super easy.
Usage: `h <owner>/<repo>` clones the repo, if it doesn't exist, and takes you there.
This and hub are integral to my developer workflow:
$ h some/repo
$ git checkout -b some-branch
[hack and commit]
$ hub fork
$ git push -u zimbatm
$ hub pull-request
That mechanism can also be combined with permissions on the server side, so contractor X who is fixing a driver need only have access to the driver and some supporting code while still checking into the monorepo.
It also means that every developer’s laptop won’t have a full copy of the repo (so if said laptop is lost, the risk is more contained)
And it’s similar to parts of LFS, bigfiles and other “blob” add-ons to git... I could see in future a way to put .doc files and the like into repos.
They might want to fix that sentence. It reads as saying that Github is open source. Perhaps it should read
> The free GitHub hosting that is typically used by open source projects doesn't need the scaling work Microsoft has done
I'd love to read more technical details on this. How can Kauth support a virtual filesystem?
This style of working is needed in large code bases, where not all files are checked out on developer workstations (for performance or privacy reasons).
An open-source cross-platform virtual file system API that was also fast (as opposed to FUSE) would be amazing.
(a) the result of the work of a major Git player like GitHub, and a major software company like Microsoft, with tens of excellent engineers devoted to it, and that solves a real pain point they have, is a centralized abomination that merely replicates a feature SVN already had
(b) your description is a crude knee jerk reaction
If SVN is a terrible application with no merits to large developers like MS, why did it exist in the first place?
If the features SVN brings over git are not considered detrimental by Torvalds, why did he create his own VCS with them removed, and why hasn't he added them back in?
Where's the contradiction? All kinds of terrible apps exist. Terribleness and existence are not mutually exclusive qualities.
(Assuming SVN is terrible, of course, which I didn't say. I'd say SVN was an attempt to go beyond CVS, with some shortcomings that don't make it the best available option today).
>If the features SVN brings over git are not considered detrimental by Torvalds, why did he create his own VCS with them removed, and why hasn't he added them back in?
Lots of possible answers (given the assumption in your "if"):
E.g. he might not consider them detrimental for other people and use cases, but he doesn't need them for his use case (Linux kernel development) either.
Or he thinks that while they might be good, they complicate things too much, and he prefers a more minimum feature set.
Microsoft decided to stick with git but add non-strict fetching to it. On the plus side, you still get all the advantages distributed VCSs bring: easy branching/merging, and working offline as long as you've already touched the required files. But you still need to be connected to work with parts of the code base you haven't used before.
So I guess if you run the test suite for your current task first then you can work offline since all relevant files will be fetched?
Which seems like a fair tradeoff to me. The existing "native solution" for a very very large codebase would be to have it split into multiple, logical repositories. If you fetched one repo you needed to work on, but not all the dependencies or sibling repos, you still wouldn't be able to work on those other parts of the codebase until you connected.
Compared to Mercurial, which Facebook uses?
I was pretty surprised when I made my own build of OpenJDK 9 how easy it was to work with, `hg tclone blah`, `hg tup jdk9.0.1+11`, `./configure blah; make images` and done. Even if git submodules were closer in functionality (checking out the same tag across multiple modules at once with ease) the song-and-dance with actually downloading the modules after cloning is annoying.
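For comparison, the git submodule equivalent is roughly (URL is a placeholder); newer gits can at least fold it into the clone:

$ git clone https://example.com/project.git
$ cd project
$ git submodule update --init --recursive     # the extra download step hg's tclone doesn't need
$ # or in one go:
$ git clone --recurse-submodules https://example.com/project.git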
Even if a CDN is more "hierarchical" rather than P2P, it's still distributed, it's just distributed on a different axis than you are perhaps expecting.
Furthermore, to a very large extent that's an implementation detail. The GVFS protocol itself is a very simple REST API, and there is absolutely nothing stopping you from building a GVFS "server" that is literally backed by IPFS or BitTorrent or some other P2P file system.
- X software package name reminds/confuses me of Y product with similar name
- X title is garbage. Here is the title I would write...
- I don't understand [basic concept available on Wikipedia]. Someone explain it to me (aka lazyweb)
- I'm not an expert on this, but [completely unqualified and uncited conjecture]
- I also once did X from TFA and [unrelated personal anecdote with no insight] (aka long form "me too" comment)
I come to HN so that I can read informed discussion from people who work in the field of TFA. To read discussion of basic topics between the uninformed, there is everywhere else on the internet.
Has infected the hell out of our company Slack. It's considered very rude to not answer.
I feel like when I was coming up, back when IRC or mailing lists were a thing, having done all possible research before asking humans was an absolute cultural requirement.
It sucked at first, but I definitely miss that and long for the old days. It was so much more respectful and efficient.
Sure, in some cases, implementation of this policy fails, and something is rewritten that shouldn't be, or something should be rewritten that isn't. But, I'd estimate maybe 80% of all discussions about titles, that I've seen, have been valid concerns, and typically resulted in a rewrite.
- this project/library/company name is terrible
- this website hijacks my scroll bar
- this website doesn’t render well in mobile
- I can’t read this site because it’s too narrow
- I can’t read this site because it’s too wide
It defines a "CDN protocol" for downloading those objects as needed (which Bitbucket and GitHub are both supporting in various alpha/beta stages), which is essentially a cache offered as a paid service to big enterprise projects, but the GVFS project also has to make sure that git operates as efficiently as possible with sparse object databases, and implements how those sparse object databases work at all (which to this point was not something git concerned itself with, and partly why the work is being done as a filesystem proxy using placeholder files on the user's machine).
The project has included work in making sure that git commands touch as few objects from the object database as they can to get their work done (minimizing downloads from a remote server).
Every name is taken. There is nothing to be done about it. It's okay. I mean, it's sad, and terrible, but it's also fine.
Maybe GNOME should start trademarking project names too :/...
Am I on some kind of a black/gray list? Either that, or you specifically remember my name, which sounds odd.
- Branching in SD sucked, the only way to work on multiple features at once was to either have multiple copies of the repo or constantly fiddle with "package files" that contained your changes, and the whole thing shit the bed if the server went down.
- All the new hires both from college and the industry all know git and there were an increasing number of them who just found it bizarre that Microsoft was still on a proprietary, non-distributed system.
- Higher-ups (correctly, IMO) decided that VSTS/TFS had to have git in order to remain competitive, and that we should be eating our own dogfood. That's partly a marketing concern but also a legitimate technical decision.
The marketing here is: it's cool to work at Microsoft again!
But that makes it a technical decision! If it's easier to attract talented engineers because you're using git, that improves the product. I was trying to make this point in my other comment too: OP is drawing a sharp line between "marketing decisions" and "engineering decisions" and sneering at the former, but actually the boundary can be pretty fuzzy.
Exhibit A: GVFS was solely invented as a hack to make git usable by the Windows team according to the Windows team blog.
(Yes, there is some sour grapes and sarcasm in this post)
I seriously did that when first learning Git. And there are plenty of niches and side cases that I'm still not quite sure of. Going from client-server to distributed has a level of complexity that usually isn't discussed until you implement it.
EDIT: My further understanding is that this provides a Git-filesystem-based connection so that one can work on a multi-TB repo without downloading everything locally.
This seems to be the result of choosing to have all the software in one OMG-sized repo, rather than 1 project/repo. And evidently they need a "keep on server" for this. Makes me wonder more why they even went with Git or a distributed model at all. This seems more like they 'screwed up and now have tons of bandaids'.
There's also lots of interesting reads about 'monorepos' and their pros and cons. Note tho that Google and Facebook, and lots of other companies, use mono-repos. It's not just Microsoft and, not surprisingly, they've all made (more or less) reasoned, thoughtful decisions taking into account lots of factors that almost no one else in the world would ever think to do.
I suspect one factor for Microsoft is that many patches will be cross-cutting, and tooling with multiple repositories or even submodules remains poor.
Disclosure: Work at MS
I would think that versioning and compatibility issues are the main driver of monorepos - if you are ultimately shipping one product, why get wrapped up in all the labor that can be involved in breaking it down while still being able to pull off working versions to test? Might be a much better decision to just treat it as one giant repo that always stays in a working state.
And finally, as a more soft-factors issue, I think that a monorepo can help to reduce siloization. We sometimes have issues with teams not liking it when people mess with "their" repo, which slows down cross-cutting changes by a huge factor. A monorepo would probably be a powerful factor against this kind of thing.
If the problem is the usability of multiple repositories, could this be solved with better tooling? Projects like GVFS suggest that monorepos do not avoid a need for strong tooling.
It's easier for junior developers to deal with monorepos and it takes certain architecture considerations to plan for a strong component model and version management of that. Would you expect to have the right mix of senior-level staff to junior-level staff to handle that? What sort of turnover might you expect?
Furthermore, many monorepos sometimes don't happen intentionally, they just grow organically. It's sometimes only in hindsight where you realize that what you thought of as one system, one component, could have been cut into smaller pieces. It's sometimes only in hindsight where you realize that something you thought of as an internal-only API you didn't wish to version and package and support as such should have been componentized and versioned and packaged separately.
On both sides of the monorepo/small-packages spectrum there are continual trade-offs of time versus planning versus skill level, and neither is necessarily the "right" answer, and likely what you end up doing is somewhere in the middle, some combination of both, based as much on pragmatic needs as anything else.
It is extremely unlikely that your company's code size would become too much to handle in a single repo. And before that, imho it's just premature optimization to split things up.
Likewise, if you don't write at least some documentation up-front, you're much more likely to never get it done at all.
Just split with folders within the repo.
Anyways this is my opinion, and I know many wise people that disagree with me. And I've seen companies work well with both multiple and single repo.
I.e. your mileage may vary.
Microsoft goes one step further and checks the entire build system into the source tree too, compiler and all.
The effect of all this is that you can sync to any known-good version of Windows and just build it without worrying about other dependencies.
Microsoft is capable of doing that kind of work (and they sort of are with some of the azure stuff they're working on) but it's a non-trivial project that requires a tremendous amount of investment of time and resources. And that's just to get to square one, let alone something that is competitive with their other infrastructure (keep in mind that microsoft also has tremendous investments in build and CI infrastructure). That's a very hard sell, especially from a risk management perspective.