Has anyone come out with a reason for the size? I mean they seem to reuse a lot of the same textures throughout the game so I have a hard time understanding how it managed to swell to that.
But a lot of these games not only reuse textures but also deliberately duplicate them. They do this to improve load times for levels/etc. (less seeking).
Playstation DOOM is an early example of this pattern. The WAD files were actually 'per-level' and contained all sprites/textures/etc used in said level.
Wouldn't duplicated textures get de-duplicated by any halfway-decent compression algorithm, though? I'd assume the game is sent over the wire compressed.
Most compression algorithms have a limited back-reference window (dictionary size), with tens to hundreds of MB being typical upper limits. If two copies of the same data sit farther apart than that window, the later copy can't be referenced and gets compressed again from scratch.
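You can see the window limit with plain zlib, whose DEFLATE window is only 32 KB. A small sketch (sizes chosen just to straddle the window):

```python
import os
import zlib

# A 10 KB "texture": random bytes are effectively incompressible, so
# any savings must come from back-references to an earlier copy.
texture = os.urandom(10_000)
filler = os.urandom(50_000)  # unrelated data between the two copies

# Copies adjacent: the second copy is inside zlib's 32 KB window
# and is encoded as a cheap back-reference.
near = zlib.compress(texture + texture, 9)

# Copies 50 KB apart: the second copy has fallen out of the window,
# so it gets stored again almost in full.
far = zlib.compress(texture + filler + texture, 9)

print(len(near))  # roughly 10 KB: one copy plus a cheap reference
print(len(far))   # roughly 70 KB: both copies stored, plus the filler
```

Modern codecs like zstd support much larger windows (`--long` mode), but the principle is the same: the window is finite, and duplicates beyond it are re-stored.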
If the data isn't pre-compressed, or doesn't otherwise significantly vary, then using the right tools should de-duplicate the data. Things like lrzip, rsync batch mode, bdiff should work well. If large blocks are identical, many filesystem archivers/backup solutions should do good deduplication, and squashfs can be kind of nice.
There are possibly challenges in that, however. My understanding of most compression algorithms (which may be inaccurate) is that you'd run into limitations either with the overall size of the data or with the memory requirements.
Of course, I'm sure there's other ways around that (And I have no clue if they are in use). One option would be to send over the assets in a master 'assets' package and then duplicate/assemble the data as intended during installation.
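That install-time assembly idea can be sketched in a few lines: ship every shared asset once in a master package plus a manifest, then duplicate assets into self-contained per-level packs during installation. (The manifest format and names here are made up for illustration.)

```python
import shutil
import tempfile
from pathlib import Path

def install(master_dir: Path, manifest: dict, out_dir: Path) -> None:
    """Assemble per-level asset packs from a single downloaded master set.

    `manifest` maps a level name to the asset filenames it needs; shared
    assets are copied (duplicated) at install time so each level pack is
    self-contained for fast sequential loading off disk.
    """
    for level, assets in manifest.items():
        level_dir = out_dir / level
        level_dir.mkdir(parents=True, exist_ok=True)
        for name in assets:
            shutil.copyfile(master_dir / name, level_dir / name)

# Tiny demo: one texture shipped once, duplicated into two level packs.
root = Path(tempfile.mkdtemp())
master = root / "master"
master.mkdir()
(master / "brick.tex").write_bytes(b"\x89fake-texture-bytes")
install(master, {"level1": ["brick.tex"], "level2": ["brick.tex"]}, root / "levels")
```

The download then only pays for each asset once; the duplication cost moves to local disk and install time.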
That's not how compression would work. Think about what happens when they add a new level. The algorithm has nothing to compare the files against except what is actually being sent. If the texture appears in each level, and each level comes in a separate update, there's no way for the algorithm to reference or incorporate the files that are already on your computer, so it has to compress the texture like it's a brand-new file. That is, unless you have a custom installer which duplicates the file once it runs, but that would probably be a ton more work for the developers.
You have this completely wrong. "there's no way"? It's extremely easy to make a patch file that references existing data. This is a problem that has been solved many times.
And that's not relevant to initial install anyway, which can/should be a single download.
Having a reference dictionary is a type of compression.
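Right, and zlib exposes exactly this as a "preset dictionary": the sender compresses against data the receiver already has, and only the genuinely new bytes travel. A sketch (the "installed texture" here is a made-up stand-in for an existing game asset):

```python
import os
import zlib

# Pretend this 20 KB blob is a texture the player already has installed.
installed_texture = os.urandom(20_000)

# A new level update reuses that texture plus a little unique data.
update = installed_texture + b"new level geometry" * 200

# Without a dictionary, the duplicated texture is resent nearly in full.
plain = zlib.compress(update, 9)

# With the installed file as a preset dictionary, the compressor can
# back-reference it, and the transfer shrinks dramatically.
co = zlib.compressobj(level=9, zdict=installed_texture)
with_dict = co.compress(update) + co.flush()

# The receiver decompresses with the same dictionary it already holds.
do = zlib.decompressobj(zdict=installed_texture)
assert do.decompress(with_dict) == update
```

zlib's dictionary is capped at its 32 KB window; tools built for patching (bsdiff, xdelta, zstd's `--patch-from`) apply the same idea against arbitrarily large existing files.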
You need a program that decompresses no matter what, and adding the ability to reference existing files barely changes it. It's not a post-processing stage where files get copied around; it's the ability to reference arbitrary chunks of data inter-file, just like the compressor can already reference arbitrary chunks of data intra-file.
At the most basic level it's like taking a compressed file and chopping it in half. The user already has the first half, then they download the second half and "resume" decompressing.
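The core idea behind delta tools like bsdiff or VCDIFF can be sketched as a patch made of two operations: COPY (reuse a range of bytes the user already has) and INSERT (ship only the new bytes). A minimal, illustrative decoder:

```python
def apply_patch(old: bytes, patch: list) -> bytes:
    """Rebuild a new file from the old file plus a list of operations.

    ("copy", offset, length) references bytes the user already has;
    ("insert", data) carries only the genuinely new bytes over the wire.
    """
    out = bytearray()
    for op in patch:
        if op[0] == "copy":
            _, offset, length = op
            out += old[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)

# The old build already contains the texture data; the patch for a new
# level references it in place instead of resending it.
old_build = b"HEADER" + b"\x89TEXTURE-BYTES" * 100
patch = [
    ("insert", b"LEVEL2"),
    ("copy", 6, len(b"\x89TEXTURE-BYTES") * 100),  # reuse the existing texture
]
new_level = apply_patch(old_build, patch)
```

The downloaded patch here is a few dozen bytes even though the rebuilt file is kilobytes, because the bulk of it is a reference into data already on disk.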
I assume it must be a massive refusal to deduplicate. For example in the Verdansk map, many of the houses and buildings are cut and pasted. In theory, these could be aliases of the same model, but I suspect they are all baked in.
Other than that I just don't know. Maybe they used bad settings for whatever is their compression algo and they don't have any experts who could actually identify the problem? It's a preposterous idea but I can't explain it otherwise.
The house models themselves might use the same geometry and textures, but if you pre-bake any lighting then that’s a unique texture across every single surface in the world that can’t be deduplicated.
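Some back-of-the-envelope arithmetic shows why baked lighting dominates even with perfect model instancing (all numbers are illustrative, not from any real game):

```python
# Illustrative numbers only: one shared house model, many placements.
houses = 500           # instanced copies of the same house in the map
shared_model_mb = 20   # geometry + material textures, stored once
lightmap_mb = 8        # baked lighting, unique per placed instance

total_mb = shared_model_mb + houses * lightmap_mb
print(total_mb)  # 4020 MB: ~4 GB, almost all of it unique lightmaps
```

Instancing saves the 20 MB model 499 times over, but the per-instance lightmaps swamp that saving entirely.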
Depending on how the models are expressed in the environment, there's a good chance they end up as triangle meshes with unique coordinates by the time the running game ever sees them.
Unless it’s changed a lot since I worked in game dev, models absolutely are instanced, but poly counts, texture resolutions and the number of textures have gone through the roof. You used to have a texture and maybe a light map; now you have separate diffuse, specular, bump, glow, decals, etc. etc.
Wouldn’t they want to keep reused objects instanced for performance, or is there some other advantage to just meshing everything out into a unique set of level parts?
Games have always had this dilemma. Content creators and level makers will simply try to push in as much as they possibly can until something says stop. Historically this has been the size of one CD but nowadays there are no such clearly defined limits.
If the engine starts compressing better then the map creators will just use that newly gained savings to add even more content.
I can think of two reasons which are purely non-technical:
- The bigger the game, the less space for other games.
- Much like adding metal inside consumer headphones purely to weigh them down and give a premium feel (they do this with other products as well), a larger program may be perceived as more significant.