
Git Large File Storage 1.0 - kccqzy
https://github.com/blog/2069-git-large-file-storage-v1-0
======
onionjake
While I'm sure this will help some people use git for a use case that was
previously impossible, I can't help but feel that it is a bad step overall for
the git ecosystem.

It appears to centralize a distributed version control system, with no option
to continue using it in a distributed fashion. What would be wrong with
fixing/enhancing the existing git protocols to enable shallow fetching of a
commit (I want commit A, but without objects B and C, which are huge)? Git
already fully supports working from a shallow clone (not the full history), so
it wouldn't be too much of a stretch to make it work with shallow trees (I
didn't fetch all of the objects).
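
For what it's worth, the shallow-history support mentioned above is easy to
demo (the repo and author names here are made up):

```shell
# Build a throwaway repo with two commits, then shallow-clone it.
git init -q origin-repo
git -C origin-repo -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "first"
git -C origin-repo -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "second"

# Using a file:// URL forces the real transport, so --depth is honored.
git clone -q --depth 1 "file://$PWD/origin-repo" shallow
git -C shallow rev-list --count HEAD   # prints 1: only the tip came over
```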

I'm sure git LFS was the quickest way for github to support a use case, but
I'm not sure it is the best thing for git.

~~~
kyrra
Mercurial marks their Largefiles[0] support as a "feature of last resort",
i.e. enabling it breaks the core concept of what a DVCS is, since you now have
a central authority you need to talk to. But at the same time, many people who
use Git and Hg use them with a central authoritative repo.

[0] [https://www.mercurial-scm.org/wiki/LargefilesExtension](https://www.mercurial-scm.org/wiki/LargefilesExtension)

~~~
novaleaf
++! When I was in the games industry, it was extremely important to have this
feature (and yes, it was a last resort!). This is why we chose Mercurial over
Git at the time.

Unfortunately there was a lot of wackiness, and far too often assets got out
of sync. We ended up regressing and put large assets (artwork) into a
Subversion repo instead.

I wish there were a better option, such as truncating the history of large
files, but that seems to break the concept of Git/Mercurial even more than the
current "fix".

~~~
kyrra
How long ago were you having problems with it? I've heard it's gotten a lot
better in recent years.

~~~
novaleaf
Indeed, that was about 5 years ago. The problems were generally around assets
getting out of sync, and occasionally corruption when uploading to the
large-file storage server.

Problems generally occurred when a client timed out in the middle of an upload
or download.

They were troublesome issues (and silent failures) which made it unusable for
production use.

I hope they got it fixed; it was a great concept, and well ahead of Git in
attempting to solve this problem!

------
vvanders
Without locking this is largely useless.

Usually large files are binary blobs (PSD, .ma, etc.) and it becomes
_incredibly_ easy to blow away someone's work by not pulling before every file
you edit (or when two people edit at the same time).

As much as some people hate Perforce, that's exactly what it is set up to do.
Plus its binary syncing algorithms are top-notch: we used to regularly pull a
~300GB art repo (for gamedev) in ~20 minutes.

Git is great for code but this seems like square peg, round hole to me.

~~~
icebraining
Git-annex solves this without locking or losing any versions: the actual
files get different names (based on hashes of their contents), which are
referenced by symlinks tracked by git. If two people edit the same file -
pointing the symlink at different filenames - you get a regular git merge
conflict.
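
A stripped-down sketch of that scheme in plain shell (not real git-annex; the
real thing stores objects under `.git/annex/objects` with a fancier key
layout):

```shell
# The file body moves into a content-addressed store, and a symlink
# (a tiny text entry that git tracks happily) points at it.
mkdir -p .annex/objects
printf 'big binary payload' > texture.psd
key="SHA256-$(sha256sum texture.psd | cut -d' ' -f1)"
mv texture.psd ".annex/objects/$key"
ln -s ".annex/objects/$key" texture.psd

# Concurrent edits produce symlinks with different targets, which git
# surfaces as an ordinary text merge conflict rather than silent data loss.
readlink texture.psd
```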

~~~
vvanders
That's the thing with binary assets, there is no merge path due to them being
binary by nature.

~~~
LukeShu
No /automatic/ merge resolution. Obviously you have a tool that can open them
(if you edited it in the first place), and you can use that to view the
differences, and replay one set of changes. The fact that the SCM detected the
conflict, and alerted you, allowing you to resolve it, is a solid improvement
over not using an SCM. Further, automatic merge resolution isn't always
possible with text-based assets either (and even when it is, it isn't always
the correct option!).

~~~
vvanders
I don't think you follow. Programs like Photoshop, Maya and 3DSM don't merge.

Period.

Your only option in this case is to throw away someone's work and force them
to redo the work on the file that you decided to keep.

~~~
LukeShu
Programs like gedit don't merge. Period.

Yet I can still use it to resolve a merge conflict.

In Photoshop you would do this by opening both images, visually comparing
them to see what's different, then copying the appropriate parts from one to
the other. Instead of just visually comparing them, you might combine them
into one file as separate layers and use a transform to see the difference. If
the tool doesn't support that, use ImageMagick to generate a graphical diff
(e.g. the `compare` or `composite` commands), and then copy the relevant parts
from one to the other.
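
For instance (stand-in images generated on the spot with ImageMagick;
`compare` exits non-zero when the inputs differ, hence the `|| true`):

```shell
# Two deliberately different stand-in images.
convert -size 16x16 xc:white a.png
convert -size 16x16 xc:black b.png

# Write a third image highlighting the changed pixels.
compare a.png b.png diff.png || true
identify -format '%f %wx%h\n' diff.png
```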

We have fancy tools to help us, but fundamentally, merging is a /human/
operation that requires human judgment to see how multiple sets of changes
can be made to coexist. And that doesn't require a tool (though one can
certainly help).

~~~
vvanders
Cool, how do you merge After Effects, AutoCAD, Cinema 4D, Unreal 3 packages,
Illustrator, Sketch, Blender, XSI, Lightwave, or any of the other production
packages that I've seen used in shipping actual products? What happens if no
one used layers in your Photoshop file and the history was collapsed to save
performance?

There's a reason Pixar and most mid-to-large game dev studios use Perforce or
similar tech: fundamentally, you need locking if you're working with binary
assets.

~~~
drewm1980
Your intelligent merge tool has access to the file history. If one user
modifies a layer, and the other one squashes them down, in the merge you
probably want to apply the changes in that order, even if it's out-of-order
chronologically. If the file format has a full edit history baked in, great;
even more info for the intelligent merge. Maybe the in-file history can even
be kept in sync with the repository level history.

In the current ecosystem you probably need locking for your sanity, but some
day software will suck less.

------
sytse
This is great. We plan to ship alpha LFS support in GitLab CE, EE, and .com in
8.1 or 8.2. That is in addition to the git-annex support that EE and .com have
already had for a while:
[https://about.gitlab.com/2015/02/17/gitlab-annex-solves-the-problem-of-versioning-large-binaries-with-git/](https://about.gitlab.com/2015/02/17/gitlab-annex-solves-the-problem-of-versioning-large-binaries-with-git/)

~~~
zertrin
Is it safe to assume that GitLab's implementation of Git LFS will allow
hosting the file storage server on premises, potentially on a different
machine than the one running GitLab?

~~~
sytse
It will certainly allow you to host it on premises. We'll open-source our LFS
server so people can use it for other purposes.

------
et1337
I haven't been following the various Git large file solutions - can someone
comment on how this implementation compares to git-annex or whatever else is
out there?

~~~
sytse
There are a lot of comparisons in the original announcement of LFS on HN
[https://news.ycombinator.com/item?id=9343021](https://news.ycombinator.com/item?id=9343021)

~~~
sytse
Also interesting is the comparison in [http://git-annex.branchable.com/not/](http://git-annex.branchable.com/not/)

------
wscott
The complaints about git-lfs make me think we need to tell people about BAM
for BitKeeper.

[http://www.bitkeeper.com/features_binary_asset_management_ba...](http://www.bitkeeper.com/features_binary_asset_management_bam)

BAM works on a similar idea: instead of saving large files in the local
repository, users can save them on a centralized server. This saves disk
space and network transfer time.

However, unlike other solutions, BAM preserves the semantics of distributed
development.

Instead of requiring a single or standardized set of servers, every user can
have a different BAM server. Data is moved between servers automatically and
on demand.

One group in an office might use a single BAM server to keep all their data
stored close by. When another development group is started in India, it can
use a server local to it, and the binary assets will automatically transfer
to the India server as commits are pulled between sites.

This allows centralized storage of your data while still letting a team work
completely disconnected from the internet.

~~~
luckydude
I've been using BAM for quite a while (I'm one of the developers of it). I use
it to store my photos. I've got 55GB of photos in there, and backing them up
is just

    cd photos
    bk push

Works pretty well. When my mom was still alive, we pushed the photos to her
iMac and pointed the screen saver at that directory, so she got to see the
kids and I got another backup.

------
m0th87
If for whatever reason LFS doesn't work for you, check out our solution to
large file storage on git: [https://github.com/dailymuse/git-fit](https://github.com/dailymuse/git-fit)

We wrote it because of various issues with the tools at the time, basically
boiling down to wanting a dead simple solution.

I haven't tried lfs, but if it's anything like github's other software, then
I'm sure it's substantially better than our tool.

------
m12k
I worked at Unity for a couple years and they are one of the biggest users of
(and maintainers of) the Mercurial LargeFiles extension, so I was using that
on a daily basis.

I agree that it should be a measure of last resort, but if you can't avoid
working with big binary files, it makes the difference between a workflow that
is a bit more cumbersome and one that just grinds to a halt. Getting this
functionality in git is great, and it'll mean a huge step forward in
collaboration tools for game developers. You pretty much can't avoid big
binary files when making games - and so far developers have been stuck with
SVN or Perforce (or the more adventurous ones are trying out PlasticSCM, which
apparently is pretty nice too, but is proprietary and doesn't have a big
ecosystem around it like git does). I hope this can lead to a boom of game
developers using git.

~~~
babuskov
Yup. I'm using git for game source code, and I often hold off any commits to
graphics/music until the project is done. Any workaround outside git means you
have two systems to manage, and it can get really painful.

------
Amorymeltzer
We had a pretty good discussion here when this was initially announced six
months ago:
[https://news.ycombinator.com/item?id=9343021](https://news.ycombinator.com/item?id=9343021)
Looks like they've had some success!

------
entitycontext
Not sure if they were working together with GitHub on this, but Microsoft also
announced today that Visual Studio Online Git repos now support Git-LFS with
unlimited free storage:

[http://blogs.msdn.com/b/visualstudioalm/archive/2015/10/01/a...](http://blogs.msdn.com/b/visualstudioalm/archive/2015/10/01/announcing-git-lfs-on-all-vso-git-repos.aspx)

------
rch
I wonder if it could leverage the HDF5 diff tool somehow...

[https://www.hdfgroup.org/HDF5/doc/RM/Tools.html#Tools-Diff](https://www.hdfgroup.org/HDF5/doc/RM/Tools.html#Tools-Diff)

And perhaps GridFTP:

[http://toolkit.globus.org/toolkit/docs/latest-stable/gridftp/](http://toolkit.globus.org/toolkit/docs/latest-stable/gridftp/)

------
res0nat0r
Where are the files actually stored? I hear "git lfs server" in the demo
video, can this be changed? Can I init my repo and tell it to push all my
objects to my own private s3 bucket, or can I only rely on some outside lfs
server I don't control?

~~~
thristian
Not with GitLFS, since it's designed under the assumption that the hostname
serving your repo over ssh is also the GitLFS server over HTTP.

However git-fat[1] is an alternative system that works in much the same way,
but lets you configure where the files are stored.

[1]: [https://github.com/cyaninc/git-fat](https://github.com/cyaninc/git-fat)
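
For reference, git-fat's storage location is just a couple of lines in a
`.gitfat` file at the repo root; the host and path below are made up:

```shell
# git-fat reads its rsync target from .gitfat (per its README):
cat > .gitfat <<'EOF'
[rsync]
remote = storage.example.com:/share/fat-store
EOF
grep '^remote' .gitfat
```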

------
kazinator
This could be a corollary to P. Graham's "don't do anything that scales":
don't do anything that involves plonking stupidly large files into version
control.

------
mindprince
Good thing that Atlassian/Bitbucket will also be supporting it:
[https://blog.bitbucket.org/2015/10/01/contributing-to-git-lfs/](https://blog.bitbucket.org/2015/10/01/contributing-to-git-lfs/)

And very glad to read that they decided to contribute to this instead of
working on their own solution for the same problem. Kudos!

~~~
kannonboy
The fact that both Atlassian and GitHub intended to unveil their own almost
identical competing solutions, both built in Go, in consecutive sessions at
the Git Merge conference (without either being aware of the other) is pretty
hilarious.

------
elcritch
Git-lfs has been helpful for managing my repo of scientific research data.
Hundreds of large-ish Excel files, PNGs, and HDF5 files add up quickly if
you're doing lots of small edits.

There are still some warts (don't forget `git lfs init` after cloning!), but
it's mostly fast and transparent. I also ponied up $5 a month for 50 gigs or
so of LFS storage. Decent deal imho.
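
The per-repo state LFS keeps is refreshingly small, by the way: tracking a
pattern just appends an attribute line. Sketched here without needing git-lfs
installed (this is the line `git lfs track "*.hdf5"` would write):

```shell
# The filter attributes that route *.hdf5 files through the LFS clean/smudge
# filters instead of storing their contents in the git object database:
echo '*.hdf5 filter=lfs diff=lfs merge=lfs -text' >> .gitattributes
cat .gitattributes
```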

------
chejazi
As someone new to this idea, the README helped clarify the workflow:
[https://github.com/github/git-lfs/blob/master/docs/api/README.md](https://github.com/github/git-lfs/blob/master/docs/api/README.md)

------
lspears
That video is hilarious! I wish we had more awesome videos like this for new
technologies!

------
k__
Is there a solution that doesn't depend on external storage?

I have data that belongs with my source, but is rather big, and I want it
inside my repo.

~~~
m12k
They do have a reference implementation of the server side here:
[https://github.com/github/lfs-test-server](https://github.com/github/lfs-test-server) - though they themselves don't
consider it production-ready. But I'm sure it'll either get there in time, or
another open source implementation will rise to the challenge (cf. sytse's
comment about GitLab planning support for this:
[https://news.ycombinator.com/item?id=10313495](https://news.ycombinator.com/item?id=10313495))

------
tyoverby
What happens if someone who hasn't downloaded the command line tools tries to
clone your repo? Will they get the big files too?

~~~
deevus
I believe they just get the references to the big files, not the files
themselves.
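
Right - the pointer is a small text stanza committed in place of the file. The
oid below is a made-up hash, but the three-line shape matches the LFS pointer
spec:

```shell
# What a Git LFS pointer file looks like when checked out without the client:
cat > pointer.txt <<'EOF'
version https://git-lfs.github.com/spec/v1
oid sha256:4665a5ea423c2713d436b5ee50593a9640e0018c1550b5a0002f74190cf08b8c
size 132735
EOF
wc -l < pointer.txt   # 3 lines: version, oid, size
```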

------
anotherevan
Was I the only one who expected that bear to move on its own?

------
jonesetc
Any information on GitHub Enterprise support?

~~~
rlegit
GitHub Enterprise has supported LFS since 2.2 (current latest is 2.3.3) in
technical preview mode. See here:
[https://enterprise.github.com/releases/2.2.0/notes](https://enterprise.github.com/releases/2.2.0/notes)

