I've heard this about game dev before. My (probably only somewhat correct) understanding is that it's more than just source code--are they checking in assets, textures, etc.? Is Perforce more appropriate for this than, say, git-lfs?
I'm not sure about the current state of affairs, but I've been told that git-lfs performance was still not on par with Perforce on those kinds of repos a few years ago. Microsoft was investing a lot of effort in making it work for their large repos though so maybe it's different now.
But yeah, it's basically all about having binaries in source control. It's not just game dev, either - hardware folk also like this for their artifacts.
Interesting. Seems antithetical to the 'git-centered' view of git being (mostly) for source code only.
I think I read somewhere that game dev teams would also check in the actual compiler binary and things of that nature into version control.
Usually it's considered "bad practice" when you see, like, an entire sysroot of shared libs in a git repository.
I don't even have any feeling one way or another. Even today "vendoring" cpp libraries (typically as source) isn't exactly rare. I'm not even sure if this is always a "bad" thing in other languages. Everyone just seems to have decided that relying on a/the package manager and some sort of external store is the Right Way. In some sense, it's harder to make the case for that.
It's only considered a bad idea because git handles it poorly. You're already putting all your code in version control - why would you not include the compiler binaries and system libraries too? Now everybody that gets the code has the right compiler to build it with as well!
The better organised projects I've worked on have done this, and included all relevant SDKs too, so you can just install roughly the right version of Visual Studio and you're good to go. Doesn't matter if you're not on quite the right point revision or haven't got round to doing the latest update (or had it forced upon you); the project will still build with the compiler and libraries you got from Perforce, same as for everybody else.
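So a fresh workspace might look something like this (layout entirely hypothetical), with the build script pointing at the checked-in toolchain rather than whatever happens to be installed locally:

//depot/mygame/
    src/...                   # game source
    assets/...                # art, audio, levels
    tools/msvc-14.29/         # the compiler and linker binaries themselves
    sdk/platform-sdk-3.2/     # console SDK headers and libs
    build/build.bat           # invokes tools/msvc-14.29/bin/cl.exe explicitly

Sync that and run build.bat, and everybody is building with byte-for-byte the same compiler and libraries.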
I would say it's considered a bad idea because git comes from the open-source world and its implied understanding that you don't depend on a specific binary release to reproduce your results - your results can always be reproduced purely from source, and binaries are just considered a product of "the plaintext truth recorded in your repository".
Sure, there are nuances to it, as the need for the ReproducibleBuilds project [1] demonstrated, or people pushing the concept of "bootstrapped reproducibility" to the extreme, like Guix's full-source bootstrap [2] - but I believe the fundamental understanding is the same.
> It's only considered a bad idea because git handles it poorly. You're already putting all your code in version control - why would you not include the compiler binaries and system libraries too? Now everybody that gets the code has the right compiler to build it with as well!
No arguments here, it makes perfect sense to me as a practice. It's shortsighted to consider only "libraries" (typically as source) to be "dependencies"--implicitly you're also relying on a compatible compiler version/runtime/interpreter (and whatever those depend on), and so on.
What was the nature of this project? Was this something related to game development?
That seems to be the only domain where this approach is used (from what I've heard).
Yes, this was a video game, targeting games consoles - so the executable statically links with pretty much everything, including stuff that would count as part of the OS on a PC. And then, if you can make sure everybody builds with the same libraries then you can be sure that everybody is running the same code. (I imagine this approach is also used for embedded development sometimes, another case where the build artefacts cover a much larger percentage of what gets run on the target than is the case with a PC.)
I will admit that even though I think it might be worth doing (and not doing it because your version control system doesn't make it easy is the worst reason for not doing it!), I'm not sure I'd consider it absolutely mandatory, certainly not for video games anyway. Most projects I've worked on haven't done this, and it's been fine, especially if you have a CI system doing the builds - and now the individual developers' systems just have to be roughly right. But it is nice to have, if you've got the right combination of personpower and institutional will to actually do it. It eliminates a whole category of possible issues straight away.
(I'd like to think that for safety-critical stuff like medical or automotive, this sort of approach, but done even more carefully, would be par for the course...!)
I've been checking large (tens to hundreds of MB) tarballs into one git repo that I use for managing a website archive for a few years now, and it can be made to work, but it's very painful.
I think there are three main issues:
1. Since it's a distributed VCS, everyone must have a whole copy of the entire repo. But that means anyone cloning the repo or pulling significant commits is going to end up downloading vast amounts of binaries. If you can directly copy the .git dir to the other machine first instead of using git's normal cloning mechanism (sketched after this list) then it's not as bad, but you're still fundamentally copying everything:
$ du -sh .git
55G .git
2. git doesn't "know" that something is a binary (although it seems to in some circumstances), so some common operations try to search them or operate on them in other ways as if they were text; the .gitattributes sketch after this list is the usual way to mark them explicitly. (I just ran git log -S on that repo and git ran out of memory and crashed, on a machine with 64GB of RAM.)
3. The cure for this (git lfs) is worse than the disease. LFS is so bad/strange that I stopped using it and went back to putting the tarballs in git.
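For what it's worth, the "copy the .git dir first" trick from point 1 is roughly this (hostname and paths made up, and it assumes the remote config that gets copied over still makes sense on the new machine):

# seed the new checkout from an existing machine instead of git clone
$ mkdir site-archive && cd site-archive
$ rsync -a other-box:/srv/site-archive/.git .
$ git reset --hard     # rebuild the working tree from the copied object store
$ git pull             # from here on, only new commits cross the network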
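And for point 2, the standard way to declare files binary up front is a .gitattributes entry ("binary" is a built-in macro for -diff -merge -text); it at least keeps diff and merge from treating the archives as text, though I can't promise it fixes every operation (filenames made up):

# mark the tarballs as binary explicitly
$ echo '*.tar.gz binary' >> .gitattributes
$ echo '*.tgz binary' >> .gitattributes
$ git add .gitattributes && git commit -m 'mark tarballs as binary'
# diffs now just report "Binary files differ" instead of scanning the contents
$ git diff HEAD~1 -- archive/site-2021.tar.gz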
Source control for large data.
Currently our biggest repository is 17 TB.
Would love for you to try it out. It's open source, so you can self-host as well.
Why would someone check binaries into a repo? The only time I came across checked-in binaries in a repo was because that particular dev could not be bothered to learn nuget / Maven (and the dev that approved that PR did not understand it either).
Because it’s way easier if you don’t require every level designer to spend 5 hours recompiling everything before they can get to work in the morning, because it’s way easier to just check in that weird DLL than provide weird instructions to retrieve it, because onboarding is much simpler if all the tools are in the project, …
Hmm, I do not get it... "The binaries are checked into the repo so that the designer would not spend 5 hours recompiling" vs "the binaries come from a nuget site so that the designer would not spend 5 hours recompiling".
In both cases the designer does not recompile, but in the second case there are no checked-in binaries in the repo... I still think nuget / Maven would be more appropriate for this task...
Everything is in P4: you checkout the project to work on it, you have everything. You update, you have everything up to date. All the tools are there, so any part of the pipeline can rely on anything that's checked in. You need an older version, you just check that out and off you go. And you have a single repository to maintain.
VCS + Nuget: half the things are in the VCS, you checkout the project and then you have to hunt down a bunch of packages from a separate thing (or five), when you update the repo you have to update the things, hopefully you don't forget any of the ones you use, scripts run on a prayer that you have fetched the right things or they crash, version sync is a crapshoot, hope you're not working on multiple projects at the same time needing different versions of a utility either. Now you need 15 layers of syncing and version management on top of each project to replicate half of what just checking everything into P4 gives you for free.
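For comparison, the entire P4 side of that is concretely just (depot path made up):

# latest of everything - code, assets, tools, SDKs - in one step
$ p4 sync //depot/mygame/...
# need the project exactly as it was at changelist 12345?
$ p4 sync //depot/mygame/...@12345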
> VCS + Nuget: half the things are in the VCS, you checkout the project and then you have to hunt down a bunch of packages from a separate thing
Oh, and there are things like x509/proxy/whatever errors when you're on a corpo machine that has Zscaler or some such, so you have to use the internal Artifactory or whatever, but that doesn't have the version you need, or you need permissions to access it... and so on.
I have no idea what environment / team you worked on, but nuget is pretty much rock solid. There are no scripts running on a prayer that everything is fetched. Version sync is not a crapshoot, because nuget versions are updated during merges, and with proper merge procedures (PR build + tests) nuget versions are always correct on the main branch.
One does not forget which nuget packages are used: VS projects do that bookkeeping for you. You update the VS project with the new packages your task requires, and that bookkeeping carries over when you merge your PR.
I have seen this model work with no issues in large codebases: VS solutions with upwards of 500,000 lines of code and 20-30 engineers.
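i.e. the package reference lands in the project file and gets committed like any other change, and a restore on any machine pulls exactly those versions - whether you do it through the VS UI or the CLI (package name here is just an example):

# adding a dependency records the exact version in the .csproj
$ dotnet add package Newtonsoft.Json --version 13.0.3
# anyone who pulls the branch just restores and gets the same packages
$ dotnet restore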
But if you have to do this via Visual Studio, it's no good for the people that don't use Visual Studio.
Also, where does nuget get this stuff from? It doesn't build this stuff for you, presumably, and so the binaries must come from somewhere. So, you just got latest from version control to get the info for nuget - and now nuget has to use that info to download that stuff?
And that presumably means that somebody had to commit the info for nuget, and then separately upload the stuff somewhere that nuget can find it. But wait a minute - why not put that stuff in the version control you're using already? Now you don't need nuget at all.
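(To spell out that two-step dance - project name and feed URL made up:)

# step 1: build the package and upload it somewhere nuget can see
$ dotnet pack MyInternalLib.csproj -c Release
$ dotnet nuget push bin/Release/MyInternalLib.1.4.0.nupkg --source https://nuget.example.com/v3/index.json --api-key $KEY
# step 2: commit the reference that tells consumers which version to fetch
$ dotnet add package MyInternalLib --version 1.4.0
$ git commit -am "bump MyInternalLib to 1.4.0"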
Because it's (part of) a website that hosts the tarballs, and we want to keep the whole site under version control. Not saying it's a good reason, but it is a reason.