
Mercurial with Largefiles: Why it is not a solution for game development - noch
http://www.ennoble-studios.com/tuts/mercurial-with-largefiles.html
======
pjc50
The important point here is that if:

- you have lots of large files which are not amenable to diffing and which
change frequently;

- everyone is working within the same company, and usually on the same
network;

then a DVCS is unhelpful, because you have to pay the disk cost of every
machine holding a full copy of everything that's ever been checked in,
regardless of whether it needs it or not.

Many games are tens of gigabytes when shipped. It's easy to imagine a process
which accumulates hundreds of megabytes of asset changes every single day over
a multi-year development cycle. Then you can imagine needing expensive
terabyte SSDs just to work on it with all your tools.
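A quick back-of-envelope with made-up but plausible numbers shows how fast that adds up:

    # hypothetical figures: 300 MB of changed assets per working day,
    # ~250 working days per year, 4 years of development
    echo $((300 * 250 * 4))   # 300000 MB, i.e. roughly 300 GB of history in every full clone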

I'm actually looking at this problem at work, considering converting a large
repository from SVN that is a decade old and merely tens of gigabytes.
Frankly, SVN handles it just fine, so I'm going to defer the problem until I
absolutely have to migrate.

~~~
Negitivefrags
As a point of reference, here are some sizes for the repos of the game I work
on which has been developed over around 11 years.

120 GB for the game data repo. This is the repo for the source and art that
actually ships.

2.5 TB for the raw art repo. This is where the pre-export raw files go.

Obviously it would be impractical to require that the artists all have the
entire repo. We use SVN.

~~~
merb
> which has been developed over around 11 years.

Are there that many games that have a lifetime of over 11 years? I only know
about one...

~~~
Negitivefrags
There are many. Online games have very long lifetimes.

Think of League of Legends, Team Fortress 2, EVE Online, World of Warcraft.
And there are hundreds of smaller games besides.

~~~
merb
Well, lifetime = the time during which stuff is still under development. LoL is
not more than 11 years old (heck, it's not even 10 years old), and TF2 had its
10th anniversary this year. That leaves EVE Online, WoW and RuneScape, which
still get patches/updates/etc.

As I said, there aren't that many games that have a lifespan of over 11 years.

~~~
islanderfun
You've been given plenty of examples showing it's been done. What's the point
you're trying to make?

------
vostok4
I actually set up a small studio to work exclusively with a Mercurial
largefiles-based VCS. I would say it's one of the best solutions on the market
today for users who want free self-hosting.

Why? Perforce's integration with Unity is quite poor (they have a huge untapped
market here), so you end up having to resolve a lot of things slowly in their
tools. Git/Hg are much faster in my experience at detecting changes and
interacting sanely with them. Also, the team could never learn why a file was
checked out, why they couldn't commit, etc.

We regularly clean out our largefiles cache on disk, so most of the time
everyone just has the latest version of a given binary file on disk. The
server of course has every revision, but I want that.
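There's no dedicated command for this as far as I know; a rough sketch of what the cleanup can look like (the path is the Linux default, configurable via largefiles.usercache):

    # drop cached largefile blobs that haven't been accessed in 30 days
    find ~/.cache/largefiles -type f -atime +30 -delete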

And most important of all: with small tweaks we're able to use Phabricator for
our entire task-management/documentation workflow. Getting VCS hooks out of
the box that let artists say "Adding typewriter model, please review T555" in
their commit message, and having that task automatically get assigned to the
reviewer, is priceless.

Most of my team doesn't have any idea what a VCS is, but they've learned to use
TortoiseHg (they call it "the turtle") and Phabricator to keep us organized.

While Mercurial isn't the only way to get there, it's free, it's fast, it's
simple, and it unlocks the power of Phabricator (as does Git+LFS, I believe).

So in my experience, I would say hg+largefiles is an excellent solution for
game development.

------
zubspace
The only free option with unlimited storage seems to be Microsoft Team Services
with Git-LFS support. (1) (2)

I have been using Bitbucket and Mercurial for all my side projects for quite a
while. But when you start with game development, you reach the repository
limits quite fast. Textures, meshes, sound, music, concept art and other
binary blobs eat a lot of storage.

Git-LFS is a bit of a pain to set up, because you need to define, _before_
checking in, which extensions need to be stored as large files. And then there
are check-in hooks, which sometimes seemed unreliable. Visual Studio's git
integration is also quite limited, but SourceTree has served me well.
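For reference, the tracking has to be declared up front, roughly like this (the file extensions here are just examples):

    git lfs install                           # install the LFS hooks for this repo
    git lfs track "*.psd" "*.png" "*.wav"     # writes the patterns to .gitattributes
    git add .gitattributes                    # commit the patterns before the binaries
    git commit -m "Track binary asset types with LFS"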

It's quite liberating if you're able to check in code and assets together
without having to worry about the space required.

1)
[https://blogs.msdn.microsoft.com/devops/2015/10/01/announcin...](https://blogs.msdn.microsoft.com/devops/2015/10/01/announcing-git-lfs-on-all-vso-git-repos/)

2) [https://www.visualstudio.com/vso/](https://www.visualstudio.com/vso/)

------
luckydude
BitKeeper solved this years ago with one or more centralized servers that hold
the binaries. When you clone you only get the tip, but you can retrieve any
version you want when you need it. It scales to terabytes easily.

Free and open source (Apache v2) at
[http://bitkeeper.org](http://bitkeeper.org)

------
corysama
Blizzard gave a great GDC talk titled “The Data Building Pipeline of
'Overwatch'”. It covered their in-house, http-based asset distribution system.

Take-aways: [http://seanmiddleditch.com/my-gdc-17-talk-retrospective/](http://seanmiddleditch.com/my-gdc-17-talk-retrospective/)

[https://twvideo01.ubm-us.net/o1/vault/gdc2017/Presentations/...](https://twvideo01.ubm-us.net/o1/vault/gdc2017/Presentations/Clyde_David_TheDataBuilding.pdf)

------
LeoJiWoo
Pretty interesting.

Game development seems to have such a different workflow from most of the
stuff I'm familiar with, like backend, web dev, and the occasional network
programming.

[https://gamedev.stackexchange.com/questions/480/version-cont...](https://gamedev.stackexchange.com/questions/480/version-control-for-game-development-issues-and-solutions) This link recommends Perforce as the standard.

What do people in the industry actually use?

~~~
Jare
Perforce, PlasticSCM, SVN or git + SVN. That probably covers 95%.

------
b0rsuk
For 2D graphics, SVG (and vector graphics more generally) fits very well in
git. For 3D there are text-based model formats, but there's still the problem
of textures.

Is there anything close to a text-based procedural texture format? Textures
could be procedurally generated at startup and transformed into bitmaps. I am
aware of kkrieger, but is there anything beyond a proof of concept? No one
takes voxels seriously anymore...

------
kuschku
I've been checking gigabyte-sized assets into my git repos with Git-LFS and a
self-hosted GitLab, and it's been working fine so far.

Are there any issues with this approach I should be aware of, considering that
Hg with Largefiles seems to have some, too?

~~~
kevincox
Not explicitly. AFAIK the main slowness/limits you risk hitting are:

1. Git clones are deep and wide by default, so you end up with local repos
that are the size of the total history. (Easy to work around; see the sketch
at the end of this comment.)

2. Git's narrow-clone support isn't the greatest, so you will probably need to
fit the "working directory" on one computer. (Somewhat painful to work around.)

3. Git has to check out all files every time you change branches, which can be
slow if you have a large working directory. (Very painful.)

4. Many operations (status, diff and commit, for example) require git to scan
the working directory to see what changed. This will be very slow if you have
many files in your working directory. (Very painful.)

Luckily all of these problems can be solved seamlessly with a virtual
filesystem approach. For example
[https://github.com/Microsoft/GVFS](https://github.com/Microsoft/GVFS)
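For points 1 and 2 specifically, stock git also has partial mitigations; a rough sketch (the URL and paths are placeholders, and sparse-checkout needs a newer git, 2.25+):

    # 1: shallow clone, so local history doesn't grow with the full repo history
    git clone --depth 1 https://example.com/big-game.git

    # 1/2: partial clone that fetches blobs lazily (requires server-side support)
    git clone --filter=blob:none https://example.com/big-game.git

    # 2: sparse-checkout to keep only part of the tree in the working directory
    git sparse-checkout init --cone
    git sparse-checkout set code/engine art/characters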

------
twic
> Okay, so once we have started, the most important thing to know is: This
> does not in any way change how Hg handles files in memory. [...] Hg has to
> take the file and consume many times more memory during the commit then the
> size of the file, to try to figure out what the differences are.

This isn't true. Largefiles aren't stored as deltas, but as complete blobs.
Mercurial still reads them in their entirety, so it still uses a lot of
memory, but it's not diffing.

> The next problem is everyone collaborating on the project would have to take
> a huge Pull with the new large files, for every version of the large file
> _they don 't yet have_ [...] if you want to go back to a revision you
> haven’t pulled yet and the Server is not up you’re out of luck. That means
> you should get all the commits at some point anyway (because you want all
> the code versions at your side), so what’s the point?

Largefiles's _mechanism_ doesn't require that you download every version of
every large file. That's a key part of its design. If you decide, as a matter
of _policy_, that you want to download them all anyway, then no, largefiles
won't help much.
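Concretely, the default behaviour, as I understand it, is roughly this (a sketch; revision numbers are made up):

    hg pull                # pulls changesets and standin hashes, not the binary blobs
    hg update -r 1234      # fetches only the largefile revisions needed for that checkout
    hg lfpull --rev 1200   # optionally prefetch largefiles for another revision while online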

> Well, they handle files by placing them outside the repo, and storing only
> the hash of the file in the repo itself (all bigfile hashes inside one
> file). This has an unfortunate effect that you won't be able to tell which
> exact bigfile/largefile has actually been modified when looking in the
> history – the only thing you'd see in the repo is the cumulative file that
> holds the hashes of all bigfiles as having a change.

This isn't true either. Largefiles stores the hashes in separate files, and
the history machinery is able to interpret the records properly.

I've written a little script to demonstrate the structure of a largefiles
repository:

[https://bitbucket.org/snippets/twic/7eeAxy](https://bitbucket.org/snippets/twic/7eeAxy)
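The short version, for anyone who doesn't want to open the snippet (a sketch; it assumes the largefiles extension is enabled in your hgrc):

    hg init demo && cd demo
    dd if=/dev/urandom of=big.bin bs=1M count=20   # stand-in for a real binary asset
    hg add --large big.bin
    hg commit -m "add big.bin as a largefile"
    cat .hglf/big.bin   # the tracked "standin" is just the SHA-1 of big.bin's contents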

One thing that would be really useful, and that largefiles doesn't do (that I
know of), would be the ability to opt out of downloading some largefiles at
all. If I'm checking out an old revision just to read some old code, I don't
want to spend ages pulling largefiles that I'm not going to look at. You can
do this with Facebook's remotefilelog extension, which lets you make shallow
clones that omit the large files, but it's awkward:

[https://bitbucket.org/facebook/hg-experimental/src/default/r...](https://bitbucket.org/facebook/hg-experimental/src/default/remotefilelog/)

------
raugustinus
Would a Maven-like repository possibly be a solution to this problem? Simply
have dependencies on binary files/artifacts distributed by Nexus?

------
z3t4
File system snapshots (like in ZFS) would work nicely with binary data.
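Something along these lines, for example (the dataset name is made up):

    zfs snapshot tank/assets@before-texture-rework   # cheap, copy-on-write snapshot
    zfs rollback tank/assets@before-texture-rework   # restore the whole asset tree
    zfs list -t snapshot                             # list what you have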

