
Scaling Git, and some back story - dstaheli
https://blogs.msdn.microsoft.com/bharry/2017/02/03/scaling-git-and-some-back-story/
======
sdesol
> 1st party == 3rd party

This was the first thing I noticed about Visual Studio Team Services when I
first looked at integrating my search and code analytics engine with VSTS. It
was quite apparent that they wanted to make 3rd party developers first-class
citizens.

Anybody who has ever worked in enterprise software knows that feature
requirements are heavily driven by politics. And if you can't support the
weirdest edge cases, resistance to adoption can become insurmountable. Having
looked at VSTS, you could easily tell they wanted to reduce push back as much
as possible.

~~~
daxfohl
It's a lofty goal to create tools with the view of supporting the direction of
one of the world's largest software companies, with the ability to support
single-person dev shops just as seamlessly. I don't know if it makes sense
(i.e. would Google's monorepo scale down like that? Is Microsoft hamstringing
themselves in this way?), but I applaud the effort.

------
contextfree
Related backstory/POV from one of the lead developers behind this effort who's
now outside MSFT in this tweet thread:
[https://twitter.com/xjoeduffyx/status/827633982116212736](https://twitter.com/xjoeduffyx/status/827633982116212736)

------
luckydude
I think they sort of gave up too soon on splitting up their repos. We've been
through this before and made BitKeeper support a workflow where you can start
with a monolithic repo, have ongoing development in it, and have another
"cloud" of split-up repos, sort of like submodules except with full-on DSCM
semantics.

You might take a look at section 5 of this:

[http://mcvoy.com/lm/bkdocs/productline.pdf](http://mcvoy.com/lm/bkdocs/productline.pdf)

which has some Git vs BK performance numbers. We actually made BK pretty
pleasant in large repos even over NFS (which has to be slower than NTFS,
right?).

And BK is open source under the Apache 2 license so there are no licensing
issues.

I get it, Git won, clearly. But it's a shame that it did, the world gave up a
lot for that "win".

------
protomok
Great to see MS working on this, and also posting the code!

"As a side effect, this approach also has some very nice characteristics for
large binary files. It doesn’t extend Git with a new mechanism like LFS does,
no turds, etc. It allows you to treat large binary files like any other file
but it only downloads the blobs you actually ever touch."

It seems every day I see another attempt to scale Git to support storage of
large files. IMHO lack of large file support is the Achilles' heel of Git. So
far I am somewhat happy with Git LFS despite some pretty serious limitations -
mainly the damage a user who doesn't have Git LFS installed can inflict when
they push a binary file to a repo.
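For context, Git LFS routes files through a clean/smudge filter declared in
`.gitattributes`. A typical setup looks something like this (the file patterns
here are illustrative, not from the article):

```
# .gitattributes — send matching files through the LFS filter
# so commits store small pointer files instead of the binaries
*.psd filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
```

The pitfall described above follows from this design: a user without the
`git-lfs` client has no `lfs` filter registered, so `git add` silently stores
the full binary blob in the repository history instead of a pointer file.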

I'm curious what other folks on HN use to store large files in Git without
allowing duplication?

~~~
herbst
I have yet to run into an issue where I even want a large file in Git.

~~~
bluejekyll
The common case that I see are binaries that are being versioned through the
VCS. Large binaries, be they libraries or the application, are stored in the
VCS and it is used as the source of truth from that point on.

The Git community explicitly calls this out as a bad practice. Other vendors,
like Perforce, never really did an amazing job with it, but it worked and, on
top of that, created more reliance on the vendor's system. Now that most
people see the sheer productivity gains of Git over centralized VCS systems,
everyone wants to move, but that comes with a catch. Many of these companies
have large workflows built around the way their old VCS works, and some even
have that methodology written into compliance rules, like their SOX
requirements.

It's this set of people that generally has this issue. For everyone else,
using just boring old disks is fine for packages and built product as that can
be recreated from the SCM. Now, with LFS in Git, you can maintain the same
workflow as you had in your old VCS, without changing the entire structure of
the organization to work with it.

~~~
luckydude
BitKeeper did stuff like LFS except you can have more than one binary server:

[http://www.mcvoy.com/lm/bkdocs/HOWTO-BAM.html](http://www.mcvoy.com/lm/bkdocs/HOWTO-BAM.html)

------
bostand
The sooner they admit TFS is dead and commit 100% to git the better.

In fact, I think everyone should use git :)

~~~
justanotheratom
I think you mean Visual Source Safe ;)

~~~
D_Guidi
I think he means Team Foundation Version Control

------
quanticle
Related discussion:
[https://news.ycombinator.com/item?id=13559662](https://news.ycombinator.com/item?id=13559662)

This article covers the end-to-end approach, whereas the other article and
discussion are more focused on the GVFS filesystem driver used to support
scaling git to repositories with hundreds of thousands of files and hundreds
of gigabytes of history.

------
microcolonel
Good story. It'd be interesting to see a portable version (which I guess would
have to either run on Mono or be rewritten in something else); or maybe Google
will release some of theirs. I'm impressed that Microsoft had the courage to
scale mostly-vanilla git instead of hacking Mercurial.

~~~
sytse
During the presentation at Git Merge, Microsoft mentioned that they are hiring
Linux and OS X driver experts. This suggests that they plan to release the
FUSE driver themselves.

------
sigmonsays
These articles are less genuine and interesting because it's the same person
with the same theme:
[https://news.ycombinator.com/submitted?id=dstaheli](https://news.ycombinator.com/submitted?id=dstaheli)

~~~
grzm
It's likely just a topic of interest. The frequency of submission is not very
high. Judge the piece on its merits.

