It appears to centralize a distributed version control system with no option to continue to use it in a distributed fashion. What would be wrong with fixing/enhancing the existing git protocols to enable shallow fetching of a commit (I want commit A, but without objects B and C, which are huge)? Git already fully supports working from a shallow clone (not the full history), so it wouldn't be too much of a stretch to make it work with shallow trees (I didn't fetch all of the objects).
I'm sure git LFS was the quickest way for github to support a use case, but I'm not sure it is the best thing for git.
Unfortunately there is a lot of wackiness, and far too often assets got out of sync. We ended up regressing, and put large assets (artwork) into a Subversion repo instead.
I wish there was a better option, such as truncating the history of large files, but that seems to break the concept of Git/Mercurial even more than the current "fix".
Problems generally occurred when a client timed out in the middle of an upload or download.
They were troublesome issues (and silent failures) which made it unusable for production use.
Hope they got it fixed; it was a great concept, and well ahead of Git in attempting to solve this problem!
A shallow clone gives you some `n` commits of history (and the objects that they point to). Using LFS allows you to have some `m` commits worth of large files.
If you want a completely distributed workflow, and have infinite local storage and infinite bandwidth, you can fetch all the large files when you do a normal `git fetch`. However, most people don't, so you can tweak this to get only parts of the local file history that you're interested in.
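For example, something along these lines (a rough sketch using git-lfs fetch config knobs as I understand them; the `art/` path and the 7-day window are just made-up illustrations):

    # only fetch LFS objects for files under art/, and only for recent history
    git config lfs.fetchinclude "art/*"
    git config lfs.fetchrecentcommitsdays 7
    git lfs fetch --recent origin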
Indeed this is a trade-off that requires some centralization, but so does your proposed solution of a shallow clone. This adds some subtlety and configurability around that.
The additional command and configuration (and perhaps object storage) would be needed either way.
If you didn't want large files in the work tree, you would use sparse checkout. These are orthogonal problems.
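Roughly, with the classic sparse-checkout mechanism (a minimal sketch; `assets/` is just an example of a directory full of large files):

    # keep the assets/ directory out of the work tree
    git config core.sparseCheckout true
    printf '/*\n!assets/\n' > .git/info/sparse-checkout
    git read-tree -mu HEAD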
If it was fully integrated with git, you could do 'git fetch --all-objects-ever' and your repository could then be cloned and fetched from.
To us, things like git lfs encourage us to ensure there is an open source package that supports it, to keep the "d" in DVCS, so we'll add support to our community edition too.
Let me give an example of one way it hurts the existing git ecosystem. Someone decides to include their external source dependencies for their project as tarballs using lfs (which is probably dumb and not the use case that lfs is trying to support, but people will do it nonetheless). Now I want to mirror that repository inside of my company's firewall, which hosts its git repositories using just git over ssh. Without lfs, I would just do 'git clone --mirror && git push --mirror' and I internally have a mirror that is easy to keep up-to-date, is dependable, supports extending the source in a way that is easy to contribute back, etc.
Now what options do I have with lfs (outside of gitlab)? Create a tarball with all the source +lfs in it? Create a new repository that doesn't use lfs and commit the files to that? Each of these is less than ideal and makes contributing back to the project harder.
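For comparison, roughly what the two workflows look like (remote names are made up, and the lfs variant assumes the internal host actually speaks the LFS API, which is exactly the part a plain git-over-ssh server doesn't have):

    # without lfs: trivially mirrored and kept up to date
    git clone --mirror git://example.com/project.git
    cd project.git && git push --mirror internal

    # with lfs: the large objects have to be shuttled separately,
    # and the push only works if 'internal' runs an LFS server
    git lfs fetch --all origin
    git lfs push --all internal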
Imagine instead a world where this happened: Github.com announces that they are adding large file support to git. These large files will be using an alternative object storage, but the existing git, http, and ssh protocols will be extended to support fetching these new objects. When support for it lands in the git mainline repository, suddenly everyone will be able to take advantage of it, regardless of how they choose to host their repositories!
I admire gitlab for creating an open source server implementation. I just wish that github would have done it a different way that would have been better for the overall git community (not just github users).
Usually large files are binary blobs (PSD, .ma, etc.) and it becomes incredibly easy to blow away someone's work by not pulling before every file you edit (or two people edit at the same time).
As much as some people hate Perforce, that's exactly what they are set up to do. Plus their binary syncing algorithms are top-notch. We used to regularly pull a ~300GB art repo (for gamedev) in ~20 minutes.
Git is great for code but this seems like square peg, round hole to me.
I believe his (her?) point is that for a very large class of binaries there is just no upside in parallel development; one guy is going to squash the other guy's work. You want to serialize those efforts.
We don't have global locks yet but we know how to do them, just waiting for the right sales prospect to "force" us to do them. I'm 90% sure we could add them to BK in about a week.
Your only option in this case is to throw away someone's work and force them to redo the work on the file that you decided to keep.
Yet I can still use it to resolve a merge conflict.
In Photoshop you would do this by opening both images, and visually comparing them to see what's different, then copying the appropriate parts from one to another. Instead of just visually comparing them, you might combine them into one file as separate layers, and use a transform to see the difference. If the tool doesn't support that, use ImageMagick to generate a graphical diff (either the `compare` or `composite` commands), and then copy the relevant parts from one to another.
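For instance, a rough ImageMagick sketch (file names made up; `compare` highlights the pixels that differ, `composite` with a difference blend gives much the same effect):

    # highlight the pixels that differ between the two versions
    compare mine.png theirs.png diff.png

    # or overlay them with a difference blend
    composite -compose difference mine.png theirs.png diff.png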
We have fancy tools to help us, but fundamentally, merging is a /human/ operation that requires human judgment to see how multiple sets of changes can be made to coexist. And that doesn't require a tool (though it can certainly help).
The point the parent was trying to make is that the lock operation of SVN was quite convenient for preventing a dual-edit scenario of assets that aren't easy to merge, like 3D meshes, scenes, PSDs, etc. It's easy to sit in the ivory tower of text merge resolution given how easy it is in comparison. The atom of change in other tools is quite a bit less obvious. Sure you can diff a mesh, but merging usually just means redoing it or picking one or the other.
There's a reason Pixar and most mid-large game dev studios use Perforce or a similar tech, it's because fundamentally you need locking if you're working with binary assets.
In the current ecosystem you probably need locking for your sanity, but some day software will suck less.
Yeah, good luck with that. With software like Photoshop, not every change is obvious or easily visible. Maybe the other guy tweaked blending parameters of a layer, or reconfigured some effects layers. Or modified the document metadata. Or did thousands of other things that are not immediately visible, or for which it is hard to determine the sequence in which changes should be reapplied.
Maybe you can manually merge the two files to some reasonably good approximation of the intended result, but you can never be sure you didn't miss something important.
Merging tools for text files show enough information for you to know when you've seen every change made. You can't have that with complex binary formats used by graphics programs, mostly because those formats were never explicitly designed to support merging.
Well, there's the time-honored technique of "rapidly switch between windows that are zoomed to the same place." But, more rigorously, I mentioned a way to do this; there are tools that can do a diff of raster images--which is what you are making at the end of the day with Photoshop. Sure it can't tell you what blurring parameters someone changed, but you can see that the blur changed, then you can go look at the parameters.
> Or did thousands of other things that are not immediately visible
The trickiness of that situation isn't unique to binary formats. It comes up with code too.
> Maybe you can manually merge the two files to some reasonably good approximation of the intended result, but you can never be sure you didn't miss something important.
That's just as true with code as it is with other formats!
> because those formats were never explicitly designed to support merging.
Neither was text. We just ended up making some tools that were reasonably decent at it.
I've been there, I've done that. I've done the 3-way merge with Photoshop files, and resolved the conflicts with 2 different people working on an InDesign file, and broken down to running `diff` on hexdumps of PDF files. Resolving merges with things that don't have nice tools for it isn't fun.
But it's a /lie/ to claim that a conflict for binary formats is "game over, you're just going to steamroll someone's work, there is no path to merge resolution". It's not a fun path, but it's not game over. Which is all I was really trying to refute.
(aside: It's interesting to me that this chain of comments went from being upvoted last night to downvoted this morning.)
I guess this could work in simple cases and if you accept a less-than-pixel-perfect standard; I can see how this will fail when several people are working on a single file for long (because not everything that is important is visible to a visual diff, and at the very least you'd end up overwriting whatever scaffolding the other guy set up for his work), but at this point I'd be questioning the workflow that requires two or more people to work simultaneously on a single asset.
> The trickiness of that situation isn't unique to binary formats. It comes up with code too.
> That's just as true with code as it is with other formats!
Not really - text files don't contain any more data than you can see when you open them in your editor. With text, you see everything. When you open a 3D model or a PSD file, or even a Word document, what you see is just the tip of the iceberg.
> But it's a /lie/ to claim that a conflict for binary formats is "game over, you're just going to steamroll someone's work, there is no path to merge resolution". It's not a fun path, but it's not game over. Which is all I was really trying to refute.
I can agree with that. It's not impossible to do such merges; worst case scenario, one will end up praying to a hex editor like you say you did. It can even be fun sometimes. I guess what 'vvanders was arguing about is practicality - you can do it if you're willing to invest the time, but it's much better to not have to do it at all.
> (aside: It's interesting to me that this chain of comments went from being upvoted last night to downvoted this morning.)
HN moves in a mysterious way
its voting to perform;
A reader questions his comment's downvotes,
And thus ensues shitstorm.
Source code doesn't ~really merge all that well either; there's just been a big community of software developers collaboratively fixing up their tools for collaboration.
Of course we have it easy cause the tools we're using are ~made of the same stuff we work with daily. No amount of photoshop filter experience will enable you to write a program that intelligently merges two photoshop files.
I also haven't seen what the pull performance is like; P4 is a pretty known quantity (with caching as well).
A remote-only locking system should be pretty easy to implement, e.g. by just throwing a "filename.userid.lock" file into the filesystem next to the file in question.
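A rough sketch of that scheme (hypothetical host and paths, and it doesn't even try to close the race where two people grab the lock at once):

    # claim an asset by dropping filename.userid.lock next to it on the server
    lock_asset() {
        f="$1"
        if ssh assets-host "ls /srv/assets/$f.*.lock" 2>/dev/null; then
            echo "already locked"; return 1
        fi
        ssh assets-host "touch /srv/assets/$f.$USER.lock"
    }

    # release it again when you're done editing
    unlock_asset() {
        ssh assets-host "rm /srv/assets/$1.$USER.lock"
    }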
That's nonsense of course.
There are a lot of use cases where this would be very helpful without locking (e.g. jars/dlls).
This is useful now, locking can come later. We don't have to solve every conceivable problem all at once. More progress is made in small incremental steps than big bang leaps.
BAM works with a similar idea: instead of saving large files in the local repository, users are allowed to save them on a centralized server. This saves disk space and network transfer time.
However, unlike other solutions, BAM preserves the semantics of distributed development.
Instead of requiring a single or standardized set of servers, every user can have a different BAM server. Data is moved between servers automatically and on demand.
One group in an office might use a single BAM server for storing all their data close by and locally. When another development group is started in India, they can use a server local to them. The binary assets will automatically transfer to the India server as commits are pulled between sites.
This allows centralized storage of your data and yet still supports having a team work while completely disconnected from the internet.
We wrote it because of various issues with tools at the time, basically boiling down to an inclination to have a dead simple solution.
I haven't tried lfs, but if it's anything like github's other software, then I'm sure it's substantially better than our tool.
I agree that it should be a measure of last resort, but if you can't avoid working with big binary files, it makes the difference between a workflow that is a bit more cumbersome, and one that just grinds to a halt. Getting this functionality in git is great. And it'll mean a huge step forward in collaboration tools for game developers. You pretty much can't avoid big binary files when making games - and so far they've been stuck with SVN or Perforce (or the more adventurous ones are trying out PlasticSCM, which apparently is pretty nice too, but is proprietary and doesn't have a big ecosystem around it like git does). I hope this can lead to a boom of game developers using git.
And perhaps GridFTP:
However, git-fat is an alternative system that works in much the same way, but lets you configure where the files are stored.
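If I remember right, the setup looks roughly like this (the rsync host/path is made up):

    # .gitfat: point git-fat at whatever rsync-reachable store you like
    [rsync]
    remote = storage.example.com:/share/fat-store

    # .gitattributes: route big file types through the fat filter
    *.psd filter=fat -crlf

After that it's `git fat init` once per clone, and `git fat push` / `git fat pull` to move the actual blobs around.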
And very glad to read that they decided to contribute to this instead of working on their own solution for the same problem. Kudos!
There are still some warts (don't forget `git lfs init` after cloning!), but it's mostly fast and transparent. I also ponied up $5 a month to get 50 gigs or so of LFS storage. Decent deal imho.
I have data that belongs to my source, but is rather big and I want it inside of my repo.
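The setup for that is about as minimal as it gets (the patterns and paths are just examples):

    git lfs track "*.bin"     # writes the pattern to .gitattributes
    git add .gitattributes
    git add data/huge.bin     # stored as an LFS pointer from now on
    git commit -m "track big data files with LFS"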