
Okay, so I should avoid it. What is the alternative?

I see so many git repos with READMEs saying "download this huge pretrained weights file from {Dropbox link, Google Drive link, Baidu link, ...}", and I don't think that's a very good user experience compared to LFS.

LFS itself sucks (it should work transparently, without having to install anything extra), but it's slightly better than downloading stuff from Dropbox or Google Drive.




According to the article, you should use Mercurial or PlasticSCM, because otherwise you might have to rewrite your history to get to some hypothetical git solution that isn't even on the roadmap.

I think I'll stick to LFS.


Some combination of the following two features:

Partial clones (https://docs.gitlab.com/ee/topics/git/partial_clone.html)

Shallow clones (see the --depth argument: https://linux.die.net/man/1/git-clone)
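
Roughly, the two look like this (repo URL is just a placeholder):

    # shallow clone: only the most recent commit, no deep history
    git clone --depth 1 https://example.com/big-repo.git

    # partial clone: full history, but blobs are only fetched when needed
    git clone --filter=blob:none https://example.com/big-repo.git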

The problem with large files is not so much that putting a 1 GB file in Git is an issue. If you have just one revision of it, you get a 1 GB repo and things run at a reasonable speed. The problem is when you have 10 revisions of that 1 GB file and end up dealing with 10 GB of data when you only want one, because the default git clone model is to give you the full history of everything since the beginning of time. That's fine for (compressible) text files, less fine for large binary blobs.

Git-lfs is a hack and it has caused me pain every time I've used it, despite GitLab having good support for it. Some of this is implementation detail: the command-line UI has some weirdness to it, and there's no clear error if someone clones without git-lfs installed, so something in your build process further down the line breaks with a weird error because you've got a pointer file instead of the expected binary blob. Some of it is inherent, though: the hardest problem is that we now can't easily mirror the git repo from our internal GitLab to the client's GitLab, because the config has to hold the address of the HTTP server that has the blobs. We have workarounds, but they're not fun.
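
One cheap mitigation for the pointer-file failure mode is a sanity check early in the build, something like this (the path is hypothetical):

    # fail early if the LFS object wasn't fetched and we only have the
    # small pointer file ("version https://git-lfs.github.com/spec/v1" ...)
    if head -c 200 models/weights.bin | grep -q 'git-lfs.github.com/spec'; then
      echo "models/weights.bin is still an LFS pointer; run 'git lfs install && git lfs pull'" >&2
      exit 1
    fi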

The solution is to get over the 'always have the whole repository' thing. This is also useful for massive monorepos, because you can clone and check out just the subfolder you need rather than all of everything.
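
With recent git, that subfolder-only workflow looks roughly like this (repo URL and path are placeholders):

    # blobless partial clone, with only top-level files checked out
    git clone --filter=blob:none --sparse https://example.com/monorepo.git
    cd monorepo

    # only materialise the subfolder you actually need
    git sparse-checkout set services/my-service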

I say this, but I haven't yet used partial clones in anger (unlike git-lfs). I have high hopes, though; it's still a feature in its early days.


I found using git-lfs only in a subrepo worked well, since subrepos by default are checked out shallow.
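
If "subrepo" here means a git submodule, I'd expect the pattern to look something like this (names are made up): keep the heavy LFS-tracked assets in their own repo and only pull them in when you actually need them.

    # top-level clone stays small; the assets submodule isn't fetched automatically
    git clone https://example.com/app.git
    cd app

    # opt in to the heavy submodule only when needed, and shallowly
    git submodule update --init --depth 1 assets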


DVC [0] is great for data science applications, but I don't see why you couldn't use it as a general-purpose LFS replacement.

It doesn't fix all of the problems with LFS, but it helps a lot with some of them (and happens to also be a decent Make replacement in certain situations).

[0]: https://dvc.org/
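
For reference, a minimal DVC workflow looks roughly like this (remote URL and file names are placeholders):

    pip install dvc
    dvc init                          # creates .dvc/ alongside .git/
    dvc add data/weights.bin          # writes data/weights.bin.dvc, gitignores the real file
    git add data/weights.bin.dvc data/.gitignore
    git commit -m "Track weights with DVC"

    dvc remote add -d storage s3://my-bucket/dvc-cache
    dvc push                          # upload the blob to the remote
    dvc pull                          # fetch it on another machine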


If you are like most people, you use systems that speak git (from Microsoft, JetBrains, GitHub, Atlassian…) but rarely, or less fluently, anything else. So the problem I'm trying to solve isn't “which VCS lets me work well with large files” but rather “I'm stuck with git, so what do I do with my large files?”

Your options are basically Git LFS, possibly also VFSForGit, or putting your large files in separate storage.


It's easy enough to script the download of external files; I'm not sure I see what the big deal is here.
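
For example, something like this (URL, checksum, and path are placeholders):

    #!/bin/sh
    set -e
    URL="https://example.com/models/weights.bin"
    SHA256="0000000000000000000000000000000000000000000000000000000000000000"
    OUT="models/weights.bin"

    mkdir -p "$(dirname "$OUT")"
    # skip the download if we already have a file with the right hash
    if [ ! -f "$OUT" ] || ! echo "$SHA256  $OUT" | sha256sum -c --quiet -; then
      curl -L -o "$OUT" "$URL"
      echo "$SHA256  $OUT" | sha256sum -c -
    fi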

To me, most cases of large files in VCS seem like using a hammer as a screwdriver.



