I recently upgraded from an Athlon II with 1.5MB of L2 cache to a Core i5 with 6MB of L3, and surprisingly, game loading times are still just as slow. I guess copying asset files into RAM doesn't get any faster with a bigger cache?
So if I understand the problem right, it's because copying data to the GPU goes over the PCI Express bus and is done "piece by piece" instead of in larger batches? A bit like grouping draw calls? It's funny how that problem shows up everywhere in hardware: multiplying requests makes the latencies snowball.
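To put rough numbers on that batching intuition, here is a minimal sketch of a toy cost model: each transfer pays a fixed overhead plus a bandwidth-limited copy time. The function name `transfer_time` and the default figures (10 µs per transfer, 8 GB/s) are made-up assumptions for illustration, not measurements of any real PCI Express link or driver.

```python
import math


def transfer_time(total_bytes, chunk_bytes,
                  latency_s=10e-6, bandwidth_bytes_per_s=8e9):
    """Estimate seconds needed to move total_bytes in chunks of chunk_bytes.

    Toy model: every transfer pays a fixed latency (assumed value), and the
    payload itself moves at a constant bandwidth (assumed value).
    """
    n_transfers = math.ceil(total_bytes / chunk_bytes)
    return n_transfers * latency_s + total_bytes / bandwidth_bytes_per_s


total = 256 * 1024 * 1024  # 256 MiB of assets (hypothetical workload)

piecewise = transfer_time(total, 4 * 1024)          # many 4 KiB copies
batched = transfer_time(total, 16 * 1024 * 1024)    # a few 16 MiB copies

print(f"4 KiB chunks:  {piecewise * 1000:.1f} ms")
print(f"16 MiB chunks: {batched * 1000:.1f} ms")
```

In this model the bandwidth term is identical in both cases, so the whole difference comes from how many times the fixed overhead is paid. Whether that is what actually dominates game loading is a separate question.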
I think it has more to do with the fact that the GPU's memory accesses aren't cache coherent with the CPU, so a larger L2 doesn't really bring much to the table.
Generally DMA to/from the GPU is cache coherent (either via DMA snooping for cache invalidation, or via software managing the regions used for DMA, e.g. marking the relevant PTEs as nocache).
So accesses are _coherent_, but the cache is simply irrelevant (or even more costly, if it's using snooping).